Globally optimal parameter estimates for nonlinear diffusions
arXiv [math.ST]
The Annals of Statistics © Institute of Mathematical Statistics, 2010
By Aleksandar Mijatović and Paul Schneider
Imperial College London and Warwick Business School
This paper studies an approximation method for the log-likelihood function of a nonlinear diffusion process using the bridge of the diffusion. The main result (Theorem 1) shows that this approximation converges uniformly to the unknown likelihood function and can therefore be used efficiently with any algorithm for sampling from the law of the bridge. We also introduce an expected maximum likelihood (EML) algorithm for inferring the parameters of discretely observed diffusion processes. The approach is applicable to a subclass of nonlinear SDEs with constant volatility and drift that is linear in the model parameters. In this setting, globally optimal parameters are obtained in a single step by solving a linear system. Simulation studies to test the EML algorithm show that it performs well when compared with algorithms based on the exact maximum likelihood as well as closed-form likelihood expansions.
1. Introduction.
In the natural and social sciences, diffusion processes are widely used as models for random phenomena that evolve continuously in time. They are popular because they arise as solutions of stochastic differential equations, which are natural probabilistic generalizations of the deterministic models described by ordinary differential equations. It is well known that if the data are recorded at discrete times, parametric inference for diffusions using maximum likelihood estimates is difficult, primarily because it is usually impossible to find the corresponding likelihood function [see Sørensen (2004) for a review of methods of inference in the diffusion setting]. In this paper, we are concerned with the estimation of the parameters in the stochastic differential equation (SDE)

$$dX_t = \mu(X_t,\theta)\,dt + dW_t \quad\text{where}\quad \mu(\cdot,\cdot):\mathbb{R}\times\mathbb{R}^{N+1}\to\mathbb{R},\qquad(1.1)$$

Received February 2008; revised April 2009.
AMS 2000 subject classifications.
Key words and phrases.
Nonlinear diffusion, maximum likelihood, EM algorithm, estimation, global optimization.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2010, Vol. 38, No. 1, 215–245. This reprint differs from the original in pagination and typographic detail.
is an arbitrary continuous (possibly nonlinear) function and $W_t$ is a standard Brownian motion. In order to guarantee the existence of a nonexplosive solution of (1.1), we need to assume that for each parameter value $\theta\in\mathbb{R}^{N+1}$, the function $\mu(\cdot,\theta):\mathbb{R}\to\mathbb{R}$ is locally Lipschitz with linear growth [see Kloeden and Platen (1999), Chapter 4]. The task is to infer the vector of coefficients $\theta\in\Theta\subseteq\mathbb{R}^{N+1}$ in the drift $\mu(\cdot,\theta)$ from $K+1$ observed realizations $x_0,\dots,x_K$ of the diffusion $X_t$, where $\Theta$ is some compact subset of the parameter space. When the exact likelihood function is available, the parameters can be determined by maximizing the joint likelihood of the observations. The true likelihood function is, however, available only in very few cases. A variety of approximations exist and are well documented in the literature [see Aït-Sahalia (2002), Jensen and Poulsen (2002), Hurn, Jeisman and Lindsay (2007) and Schneider (2006) for empirical comparisons of the available methods]. A general method for parameter inference based on the EM algorithm is to maximize an approximate likelihood function, which can be defined if one can simulate the bridge of the diffusion in (1.1) (see Section 2 for the precise definition of this approximation). Recently, an exact simulation approach for diffusion bridges was developed in Beskos et al. (2006) and an efficient algorithm for sampling from bridges of ergodic diffusions was proposed in Bladt and Sørensen (2007). Either of these simulation methods can be used to define the approximate likelihood function mentioned above. The main theoretical contribution of the present paper is Theorem 1, which states that, under some additional regularity conditions on the drift $\mu(\cdot,\theta)$ in (1.1), the approximation for the likelihood function obtained by simulating the bridge of the diffusion in (1.1) is justified because it approximates the true likelihood uniformly.
For the precise statement of the result, see Section 2. In this paper, we also propose a new algorithm for the inference of parameters when diffusion (1.1) takes a simpler form, as given in (3.1). Our method circumvents the use of numerical optimization to determine the parameters for diffusion models of the form (3.1). The estimation algorithm transforms the original problem into a related inference problem that has a unique global solution $\theta^*\in\mathbb{R}^{N+1}$, which is obtained in a single step by solving a linear system of dimension $(N+1)\times(N+1)$ given in (3.5). By Theorem 1, the related inference problem approximates the original inference problem uniformly on compact subsets of the parameter space. We also show that the approximations of the expectations that feature in linear system (3.5) converge uniformly on bounded subsets of the parameter space as the time interval between consecutive data points goes to zero (i.e., the number of observations $K+1$ goes to infinity). This property is implied by Theorem 3. Diffusions that are not of the form (1.1) can often be transformed into the required structure by a well-known change-of-variable method [see (3.6) at the end of Section 3]. The constant diffusion coefficient requirement in (1.1) is therefore not as restrictive as it may seem at first glance. Many of the widely used diffusion processes with state-dependent volatility fall into this class. The square root process and the flexible diffusion used in Aït-Sahalia (1996) and Jones (2003b) [see (4.3) and (4.4) for the SDEs describing the models] can be dealt with in this fashion. The likelihood functions of both processes, conditional upon S&P 100 implied volatility index data, are analyzed using the EML algorithm in Section 4.2.
In the case of the square root process, direct maximum likelihood estimation is also performed and it is shown that the parameter values obtained agree with the ones found using an algorithm based on the EML procedure. In this paper, we consider only one-dimensional diffusion processes, even though the EML algorithm can be applied to the multidimensional case without introducing additional computational complexity when the underlying process is reducible [see Aït-Sahalia (2008) for the precise definition]. However, extending the result of Theorem 1 to higher dimensions is a much harder problem. The paper is organized as follows. Section 2 describes how to approximate the likelihood function and states our main theoretical result (Theorem 1). Section 3 defines and derives the EML algorithm for diffusions given by (3.1). Section 4 consists of two subsections: in Section 4.1, a comparison of the EML algorithm with exact maximum likelihood estimation and the analytic likelihood approximation method from Aït-Sahalia (2002) is performed; Section 4.2 estimates the square root and flexible diffusion processes conditional on implied volatility data. Section 5 concludes the paper. Appendices A and B contain the proofs of Theorems 1 and 3.
2. The main result.
Let $x_0,\dots,x_K$ denote $K+1$ realizations of the diffusion $X_t$ given in (1.1), observed at times $t_0,\dots,t_K$. To avoid notational complexity, we assume evenly spaced time intervals between consecutive data points: $\Delta = t_k - t_{k-1}$ for all $k\in\{1,\dots,K\}$. Since we are assuming that the drift $\mu(\cdot,\theta):\mathbb{R}\to\mathbb{R}$ is locally Lipschitz and has linear growth, SDE (1.1) has a weakly unique solution for any starting point $x_0$ in its domain and any parameter value $\theta\in\Theta\subseteq\mathbb{R}^{N+1}$. The starting point of our approach is the EM algorithm, which we now briefly review [see Dempster, Laird and Rubin (1977) or McLachlan and Krishnan (1997) for the general theory]. We begin by considering two consecutive data points and then apply our analysis to the entire data set. Between two consecutive observations $x_{k-1}$ and $x_k$ at times $t_{k-1}$ and $t_k$, respectively, we introduce $M-1$ auxiliary data points $u_1,\dots,u_{M-1}$ and define $u_0 := x_{k-1}$, $u_M := x_k$. Note that the length of the time interval between $u_{m-1}$ and $u_m$, for any $m\in\{1,\dots,M\}$, equals $\delta := \Delta/M$. Given two observed realizations, $u_0$ and $u_M$, the task is to find the parameter $\theta = (a_0,\dots,a_N)$ such that the value $\pi(u_M\mid u_0,\theta)$ of the conditional transition density of the diffusion $X_t$ is maximized. Consider the joint likelihood $\pi(u_M,\dots,u_1\mid u_0,\theta)$ of the variable $u_M$ and the latent auxiliary variables $u_1,\dots,u_{M-1}$ conditional on $u_0$. The Markov property of the diffusion $X_t$ implies the following representation:

$$\pi(u_M,\dots,u_1\mid u_0,\theta) = \prod_{m=1}^{M}\pi(u_m\mid u_{m-1},\theta).\qquad(2.1)$$

In order to formulate the EM algorithm in our setting, we need to introduce the following notation. Let the random variables $U_m := X_{t_{k-1}+m\delta}$ for all $m\in\{1,\dots,M-1\}$ correspond to the auxiliary states between the consecutive observations and let the random vector $U := (U_1,\dots,U_{M-1})$ be the auxiliary state vector.
The joint distribution of $U$ is given by the law of the bridge of the diffusion $X_t$, which starts at time $t_{k-1}$ at the level $u_0$ and finishes at the level $u_M$ at time $t_k$, denoted by $\mathbb{Q}_\theta$ (or, more accurately, $\mathbb{Q}^{\Delta,u_0,u_M}_\theta$). The subscript $\theta$ signifies the dependence of this probability law on the parameters in the model. The EM algorithm starts with some feasible value $\theta_0$ of the parameter $\theta$ and repeats the following two steps:

E-step: determine the conditional expectation $\theta\mapsto E^{\mathbb{Q}_{\theta_n}}[\log\pi(U,u_M\mid u_0,\theta)]$;
M-step: maximize this expression with respect to $\theta$ to obtain $\theta_{n+1}$.

The function in the E-step of the algorithm is known as the complete likelihood. The important observation is that the expectation defining the complete likelihood is taken with respect to the distribution of the vector $U$, which is given by the law $\mathbb{Q}_{\theta_n}$ of the diffusion bridge. With each iteration, the value $\pi(u_M\mid u_0,\theta_n)$ is increased and the algorithm is therefore guaranteed to converge to a stationary point which, in some pathological cases, is not a local maximum [see McLachlan and Krishnan (1997) for convergence properties of the EM algorithm]. It is thus key to understand the behavior of the complete likelihood $\theta\mapsto E^{\mathbb{Q}_{\theta_n}}[\log\pi(U,u_M\mid u_0,\theta)]$ for any fixed parameter value $\theta_n$. There are two problems associated with the E-step of the algorithm in our setting. The first is that the joint density $\pi(u_M,\dots,u_1\mid u_0,\theta)$ for the law of the process $X_t$ given by SDE (1.1) cannot be obtained in closed form. The second problem is that the law of the bridge of the diffusion $X_t$ (i.e., the process $X_t$ conditional upon $X_{t_{k-1}} = u_0$ and $X_{t_k} = u_M$) which arises in the expectation is also unknown. Using an Euler scheme approximation for the solution of SDE (1.1), together with Markov property (2.1), one can obtain an approximation for the joint likelihood function $\pi(u_M,\dots
,u_1\mid u_0,\theta)$ in the following way. Over any short time period of length $\delta$, the Euler scheme approximates the solution $X_{t+\delta}$ of SDE (1.1) at time $t+\delta$, conditional upon the level $X_t$, by the normal random variable $X_t + \mu(X_t,\theta)\delta + W_\delta$ with mean $X_t + \mu(X_t,\theta)\delta$ and variance $\delta$. Over a longer time period $\Delta$, a succession of such normal variables is used to approximate the original process [see Kloeden and Platen (1999), Section 10.2, for the general theory and convergence properties of Euler schemes for SDEs]. Each transition density $\pi(u_m\mid u_{m-1},\theta)$, $m\in\{1,\dots,M\}$, in (2.1) can therefore be approximated by the normal density $\phi(u_m; u_{m-1}+\mu(u_{m-1},\theta)\delta, \delta)$ with mean $u_{m-1}+\mu(u_{m-1},\theta)\delta$ and variance $\delta = \Delta/M$ defined above. Identity (2.1) implies that an approximation for the joint likelihood $\pi(u_M,\dots,u_1\mid u_0,\theta)$ is given by the product $\prod_{m=1}^M \phi(u_m; u_{m-1}+\mu(u_{m-1},\theta)\delta, \delta)$. This approximation is useful because it depends explicitly [through the known drift function $\mu(\cdot,\theta)$] on the model parameter $\theta$ and has been used in Pedersen (1995) and Brandt and Santa-Clara (2002) to obtain an approximation for the transition density. The method, known as simulated maximum likelihood (SML), is based on the following convergence result, which holds under global Lipschitz and linear growth conditions [see Theorem 2 in Pedersen (1995)]:

$$\pi(u_M\mid u_0,\theta) = \lim_{M\to\infty}\int_{\mathbb{R}^{M-1}} \prod_{m=1}^{M}\phi(u_m; u_{m-1}+\mu(u_{m-1},\theta)\delta,\delta)\,du_1\cdots du_{M-1}.\qquad(2.2)$$

The SML algorithm uses this relationship to obtain an approximation for the likelihood function directly.
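To make relation (2.2) concrete, the following sketch implements the Pedersen-style Monte Carlo estimate of the transition density for a scalar diffusion with unit volatility; the function names, the drift and the parameter values are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sml_transition_density(x0, xM, theta, mu, dt, M=16, n_paths=20000, rng=None):
    """Monte Carlo estimate of pi(xM | x0, theta) via (2.2): simulate M-1
    Euler steps from x0, then average the Gaussian density of the final
    Euler transition evaluated at xM."""
    rng = np.random.default_rng(0) if rng is None else rng
    delta = dt / M
    u = np.full(n_paths, float(x0))
    for _ in range(M - 1):                     # Euler scheme for the first M-1 steps
        u = u + mu(u, theta) * delta + np.sqrt(delta) * rng.standard_normal(n_paths)
    # Gaussian density of the last Euler step u -> xM, averaged over paths
    mean = u + mu(u, theta) * delta
    phi = np.exp(-(xM - mean) ** 2 / (2 * delta)) / np.sqrt(2 * np.pi * delta)
    return phi.mean()

# illustrative Ornstein-Uhlenbeck drift, as in model A of Section 4
mu_ou = lambda x, th: th[0] - th[1] * x
est = sml_transition_density(0.5, 0.6, (1.0, 2.0), mu_ou, dt=1 / 12)
```

For the Ornstein–Uhlenbeck case the estimate can be checked against the known Gaussian transition density, which is what makes this model a convenient base case later in the paper.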
As we will now show, it is possible to circumvent the difficult issue of the computation of high-dimensional integrals and obtain optimal parameter values without having to compute approximations of transition densities. In the same spirit as in Jones (1998), Eraker (2001), Elerian, Chib and Shephard (2001) and Roberts and Stramer (2001), the auxiliary data points introduced at the beginning of this section are there to exploit the convergence of the discrete-time approximation to the diffusion $X_t$. As shown above, the main problem is to find the maximum of the complete likelihood $\theta\mapsto E^{\mathbb{Q}_\theta}[\log\pi(U,u_M\mid u_0,\theta)]$ for any value $\theta$ in the parameter space. Instead of doing this, we solve an “approximate” problem, where the function $\pi(u_M,\dots,u_1\mid u_0,\theta)$ under the expectation is replaced by the density $\prod_{m=1}^M\phi(u_m;u_{m-1}+\mu(u_{m-1},\theta)\delta,\delta)$ of the Euler scheme approximation. The approximate likelihood function can then be obtained as soon as we can simulate the trajectories of the diffusion bridge [using, e.g., the algorithm in Beskos et al. (2006) or Bladt and Sørensen (2007)] governed by the law $\mathbb{Q}_\theta$. In Section 3, we will show that the approximate problem has a unique maximum that can be obtained as a solution of a linear system of size $(N+1)\times(N+1)$, where $N+1$ is the dimension of the parameter $\theta$, if the underlying diffusion takes the form (3.1). A natural question that arises at this point concerns the quality of the approximation of the complete likelihood by the sequence of functions $(\theta\mapsto E^{\mathbb{Q}_\theta}[\sum_{m=1}^M \log\phi(U_m; U_{m-1}+\frac{\Delta}{M}\mu(U_{m-1},\theta), \frac{\Delta}{M})])_{M\in\mathbb{N}}$. Numerical experiments in Section 4 suggest that this approximation works well. Under some additional regularity hypotheses on the drift $\mu(\cdot,\theta)$, this intuitive claim is justified by the following theorem.

Theorem 1.
Suppose that, in addition to the local Lipschitz condition with linear growth, we also assume that the function $\mu:\mathbb{R}\times\mathbb{R}^{N+1}\to\mathbb{R}$ in (1.1) is twice differentiable in the state variable with bounded second derivative. Let $\theta$ be a fixed value in the parameter space. The following equality then holds:

$$\lim_{M\to\infty} E^{\mathbb{Q}_\theta}\Biggl[\sum_{m=1}^{M}\biggl(\log\pi(U_m\mid U_{m-1},\theta) - \log\phi\Bigl(U_m; U_{m-1}+\frac{\Delta}{M}\mu(U_{m-1},\theta), \frac{\Delta}{M}\Bigr)\biggr)\Biggr] = 0$$

for all $\theta$ in the parameter space, where $\phi(y;x,\delta)$ is the normal density function with mean $x$ and variance $\delta$. Furthermore, the limit is uniform in $\theta$ on each compact subset of the parameter space.

Note that the nature of Theorem 1 is fundamentally different from that of result (2.2) above, proved in Pedersen (1995), because the expectation in the theorem is taken with respect to the law of the bridge of the diffusion $X_t$, rather than the law of the diffusion $X_t$ itself (i.e., conditional upon $X_{t_k} = u_M = x_{t_k}$ and $X_{t_{k-1}} = u_0 = x_{t_{k-1}}$). The proof of Theorem 1 is found in Appendix A. It is based on the fact that in the one-dimensional case, there exists an explicit formula for the transition density of the diffusion in terms of the Brownian bridge [see Rogers (1985)]. Because this is a special property of the one-dimensional case, the proof does not easily generalize to the multidimensional setting. The main contribution of Theorem 1 is that it provides the theoretical basis for using the approximate complete likelihood described above with any method capable of simulating the diffusion bridge of the process defined by (1.1), including the algorithms in Beskos et al. (2006) and Bladt and Sørensen (2007), provided the regularity conditions on the drift are met. In Section 3, we use Theorem 1 to justify a key step in a new estimation algorithm for discretely observed diffusions.
3. Expected maximum likelihood (EML) algorithm.
In this section, we are concerned with the estimation of the parameters in the SDE

$$dX_t = \mu(X_t,\theta)\,dt + dW_t \quad\text{where}\quad \mu(x,\theta) := g(x) + \sum_{i=0}^{N} a_i f_i(x),\qquad(3.1)$$

driven by the standard Brownian motion $W_t$. The drift $\mu(\cdot,\theta):\mathbb{R}\to\mathbb{R}$ is given by an arbitrary family of independent, possibly nonlinear, Lipschitz functions $g, f_i:\mathbb{R}\to\mathbb{R}$ with linear growth. The task is to infer the vector of coefficients $\theta := (a_0,\dots,a_N)\in\Theta\subseteq\mathbb{R}^{N+1}$ in the drift $\mu(\cdot,\theta)$ from $K+1$ observed realizations, $x_0,\dots,x_K$, of the diffusion $X_t$. As we shall see, the EML algorithm consists of solving a linear system of size $(N+1)\times(N+1)$ given in (3.5) and converges to the global maximum in a single step. Having constructed the approximation to the complete likelihood in Section 2, we now turn to the initial estimation problem. By the M-step of the EM algorithm, our task is to maximize the approximate complete likelihood function $\theta\mapsto E^{\mathbb{Q}_\theta}[\sum_{m=1}^M \log\phi(U_m; U_{m-1}+\delta\mu(U_{m-1},\theta), \delta)]$ for any fixed value of the model parameter $\theta$. The following obvious proposition is crucial to all that follows.

Proposition 2.
The complete likelihood $\theta\mapsto E^{\mathbb{Q}_\theta}[\sum_{m=1}^M \log\phi(U_m; U_{m-1}+\delta\mu(U_{m-1},\theta), \delta)]$ is a nondegenerate quadratic form with a unique global maximum.

It is clear that the complete likelihood in Proposition 2 is a nondegenerate quadratic form in $\theta$, bounded above by a constant, which implies that all of its eigenvalues must be negative. Therefore, there exists a unique global maximum. The following simple calculation yields the globally optimal parameter value $\theta^\star = (a_0^\star,\dots,a_N^\star)$, which exists by Proposition 2. By setting the partial derivative with respect to each coordinate $a_j$, $j\in\{0,\dots,N\}$, of $\theta$ in the function given in Proposition 2 to zero, we obtain the linear system $A\theta^\top = b$, where $\theta = (a_0,\dots,a_N)$, $A = \delta\sum_{m=1}^M A_m$, $b = \sum_{m=1}^M b_m$ and, for any $m\in\{1,\dots,M\}$,

$$A_m := \begin{pmatrix} E^{\mathbb{Q}_\theta}[f_0(U_{m-1})f_0(U_{m-1})] & \cdots & E^{\mathbb{Q}_\theta}[f_N(U_{m-1})f_0(U_{m-1})]\\ \vdots & \ddots & \vdots\\ E^{\mathbb{Q}_\theta}[f_0(U_{m-1})f_N(U_{m-1})] & \cdots & E^{\mathbb{Q}_\theta}[f_N(U_{m-1})f_N(U_{m-1})] \end{pmatrix},\qquad(3.2)$$

$$b_m^\top := \bigl(E^{\mathbb{Q}_\theta}[(U_m - U_{m-1} - g(U_{m-1})\delta)f_0(U_{m-1})], \dots, E^{\mathbb{Q}_\theta}[(U_m - U_{m-1} - g(U_{m-1})\delta)f_N(U_{m-1})]\bigr).\qquad(3.3)$$

Since there exists a unique global maximum of the approximate complete likelihood, the inverse $A^{-1}$ must also exist and the unique optimal parameter value is given by $\theta^\star = (A^{-1}b)^\top$. For $K+1$ observations of the process $X_t$, the globally optimal parameter value $\theta^\star$ is obtained in the same way. The only difference is that matrix (3.2) and vector (3.3) are computed using $M\cdot K$, rather than $M$, auxiliary and observed realizations [see (3.5)].
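For readers who want to see the first-order conditions written out, the following display (a sketch in the notation above, with additive constants independent of $\theta$ dropped) shows how setting the gradient of the approximate complete likelihood to zero produces the system $A\theta^\top = b$:

```latex
% Up to an additive constant in theta, the approximate complete likelihood is
\begin{aligned}
\ell(\theta) &= -\frac{1}{2\delta}\sum_{m=1}^{M}
  \mathbb{E}^{\mathbb{Q}_\theta}\Bigl[\bigl(U_m - U_{m-1} - g(U_{m-1})\delta
  - \textstyle\sum_{i=0}^{N} a_i f_i(U_{m-1})\,\delta\bigr)^{2}\Bigr],\\
\frac{\partial \ell}{\partial a_j} = 0
 \;&\Longleftrightarrow\;
 \sum_{i=0}^{N} a_i\,\delta\sum_{m=1}^{M}
   \mathbb{E}^{\mathbb{Q}_\theta}[f_i(U_{m-1})f_j(U_{m-1})]
 = \sum_{m=1}^{M}\mathbb{E}^{\mathbb{Q}_\theta}
   [(U_m-U_{m-1}-g(U_{m-1})\delta)f_j(U_{m-1})],
\end{aligned}
```

for each $j\in\{0,\dots,N\}$, which is exactly $A\theta^\top = b$ with $A = \delta\sum_m A_m$ as in (3.2) and $b = \sum_m b_m$ as in (3.3).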
The globally optimal value $\theta^\star$ of the parameter vector solves the linear system whose coefficients are yet to be determined. Computing the expectations $E^{\mathbb{Q}_\theta}[\,\cdot\,]$ in closed form is impossible because it requires the unknown transition density $\pi(u_{M-1},\dots,u_1\mid u_M,u_0,\theta)$ of the bridge of the diffusion $X_t$. The key idea that circumvents this problem is to replace the law of the bridge of $X_t$ with the law of the corresponding Brownian bridge in all of the coefficients of matrix (3.2) and vector (3.3). The crucial additional benefit of this substitution is that it removes the dependence of the coefficients of the linear system on the parameter $\theta$, which implies that the EM procedure terminates after only one iteration. Therefore, by Proposition 2, the EML algorithm is guaranteed to converge to the globally optimal parameter value $\theta^\star$ in a single step. A recent parameter estimation algorithm for general one-dimensional diffusion models given in Beskos et al. (2006), based on a sophisticated sampling method known as retrospective sampling, also employs the EM approach. The EM algorithm is also used in Bladt and Sørensen (2007), where the time-reversal symmetry of ergodic diffusions is exploited to sample from the corresponding bridge. Unlike the EML algorithm, both of those settings require iterations of E-steps and M-steps in order to obtain the stationary value of the model parameter. We now need to consider the quality of the weak approximation of the law of the bridge of the diffusion $X_t$ (i.e., a process $X_t$ conditioned to start at $X_0 = x$ and finish at $X_\Delta = y$, where $\Delta$ is the length of the time interval between consecutive observations in the data) by the law of the Brownian bridge (i.e., a Brownian motion $W_t$ conditioned to start at $W_0 = x$ and finish at $W_\Delta = y$).
This question is of importance because it tells us how far the coefficients of the linear system given by the matrix (3.2) and the target vector (3.3) are from the ones used in the EML algorithm (3.5). It is intuitively clear that when $\Delta$ goes to zero, the Brownian bridge approximation must improve in quality. Since the law of the diffusion bridge is absolutely continuous with respect to the law of the Brownian bridge, it is possible to bound the approximation error explicitly in terms of $\Delta$ and the model parameter $\theta$.

Theorem 3.
Assume that the functions $g, f_i:\mathbb{R}\to\mathbb{R}$, $i\in\{0,\dots,N\}$, in the drift of SDE (3.1) satisfy the linear growth condition and are twice differentiable with bounded second derivatives, and let $G:\mathbb{R}^M\to\mathbb{R}$ be a polynomially bounded measurable function for some integer $M\in\mathbb{N}$. Let $\mathbb{Q}^{\Delta,x,y}_\theta$ denote the law of the bridge, starting at $x$ and finishing at $y$, of the diffusion $X_t$ that solves SDE (3.1) with the parameter value $\theta = (a_0,\dots,a_N)\in\mathbb{R}^{N+1}$ and let $W_t$ denote the standard Brownian motion. The measure $\mathbb{Q}^{\Delta,x,y}_\theta$ is then absolutely continuous with respect to the law of the corresponding Brownian bridge $\mathbb{W}^{\Delta,x,y}$ and the Radon–Nikodym derivative is given by

$$\frac{d\mathbb{Q}^{\Delta,x,y}_\theta}{d\mathbb{W}^{\Delta,x,y}} = \frac{L^\Delta_\theta}{E^{\mathbb{W}^{\Delta,x,y}}[L^\Delta_\theta]} \quad\text{where}\quad L^\Delta_\theta := \exp\biggl(-\frac{1}{2}\int_0^\Delta \bigl(\mu(W_s,\theta)^2 + \mu'(W_s,\theta)\bigr)\,ds\biggr).$$

(a) The following inequality holds for all $x, y$ in the domain of $X_t$ and all times $0 < t_1 < \cdots < t_M < \Delta$:

$$\bigl|E^{\mathbb{Q}^{\Delta,x,y}_\theta}[G(X_{t_1},\dots,X_{t_M})] - E^{\mathbb{W}^{\Delta,x,y}}[G(W_{t_1},\dots,W_{t_M})]\bigr| \le E^{\mathbb{W}^{\Delta,x,y}}\biggl[\biggl(\frac{L^\Delta_\theta}{E^{\mathbb{W}^{\Delta,x,y}}[L^\Delta_\theta]} - 1\biggr)^{\!2}\biggr]^{1/2}\|G\|_2,$$

where $\|G\|_2 := E^{\mathbb{W}^{\Delta,x,y}}[G(W_{t_1},\dots,W_{t_M})^2]^{1/2}$ denotes the $L^2$-norm of the random variable $G(W_{t_1},\dots,W_{t_M})$.

(b) Let $S(\theta) := \sup_{z\in\mathbb{R}}\{\mu(z,\theta)^2 + \mu'(z,\theta)\}$ and $I(\theta) := \inf_{z\in\mathbb{R}}\{\mu(z,\theta)^2 + \mu'(z,\theta)\}$ be the maximum and minimum of the integrand in $L^\Delta_\theta$, respectively. The following inequality then holds:

$$\bigl|E^{\mathbb{Q}^{\Delta,x,y}_\theta}[G(X_{t_1},\dots,X_{t_M})] - E^{\mathbb{W}^{\Delta,x,y}}[G(W_{t_1},\dots,W_{t_M})]\bigr| \le \biggl(\exp\biggl(\frac{\Delta}{2}\bigl(S(\theta) - I(\theta)\bigr)\biggr) - 1\biggr)\|G\|_2.$$

The absolute continuity of the measures $\mathbb{Q}^{\Delta,x,y}_\theta$ and $\mathbb{W}^{\Delta,x,y}$ is well known and the form of the Radon–Nikodym derivative in Theorem 3 follows from Lemma 1 in Beskos et al. (2006) and expressions (A.1) and (A.2) in Appendix A.
The inequality in part (a) of the theorem bounds the error arising from the approximation of the law $\mathbb{Q}^{\Delta,x,y}_\theta$ by the measure $\mathbb{W}^{\Delta,x,y}$ in terms of the variance of the Radon–Nikodym derivative and the $L^2$-norm of the integrand. Since the latter is independent of the model parameter $\theta$, this inequality provides a way of bounding the error for a general integrand $G$ in terms of the second moment of the Radon–Nikodym derivative. In practice, the second moment can always be estimated by simulation, thus yielding a model-specific bound on the error of the coefficients in linear system (3.5) used in the EML algorithm. Figure 4 contains the graphs of the densities of the Radon–Nikodym derivative for the nonlinear SDE in (4.4) used in Section 4. A cursory inspection of the scale of the domains of these densities shows how tight the bound in part (a) of Theorem 3 really is, even for relatively large time steps $\Delta$.
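The second-moment factor in part (a) can indeed be estimated by straightforward simulation. The sketch below does this for an illustrative Ornstein–Uhlenbeck drift; all names and parameter values are assumptions for the example, and the bridge paths live on a discrete grid, so the time integral in $L^\Delta_\theta$ is itself approximated by a Riemann sum.

```python
import numpy as np

def rn_second_moment(mu, dmu, theta, x, y, Delta, M=64, n_paths=4000, rng=None):
    """Monte Carlo estimate of E[(L/E[L] - 1)^2]^(1/2) under the Brownian
    bridge law W^{Delta,x,y}, with
        L = exp(-0.5 * int_0^Delta (mu(W_s)^2 + mu'(W_s)) ds)
    approximated by a left Riemann sum on an M-step grid."""
    rng = np.random.default_rng(0) if rng is None else rng
    dt = Delta / M
    t = np.linspace(0.0, Delta, M + 1)
    # Brownian motion paths started at 0, then pinned into a bridge x -> y
    W = np.concatenate(
        [np.zeros((n_paths, 1)),
         np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, M)), axis=1)],
        axis=1)
    B = x + W - (t / Delta) * (W[:, -1:] - (y - x))   # B_0 = x, B_Delta = y
    integrand = mu(B, theta) ** 2 + dmu(B, theta)
    logL = -0.5 * integrand[:, :-1].sum(axis=1) * dt  # left Riemann sum
    L = np.exp(logL - logL.max())                     # rescaling cancels in L/mean(L)
    ratio = L / L.mean()
    return np.sqrt(np.mean((ratio - 1.0) ** 2))

# illustrative OU drift mu(z) = a0 - a1*z and its state derivative
mu  = lambda z, th: th[0] - th[1] * z
dmu = lambda z, th: -th[1] * np.ones_like(z)
bound_factor = rn_second_moment(mu, dmu, (1.0, 2.0), x=0.5, y=0.6, Delta=1 / 12)
```

For this nearly linear drift and a one-month time step the factor is tiny, consistent with the tightness of the bound discussed above.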
It is intuitively clear that the Brownian bridge approximation works well for short time intervals $\Delta$ and less well as the time step grows. This view is supported by the inequality in part (b) of Theorem 3, which is a consequence of the bound in part (a). Furthermore, (b) implies that the approximation of the law $\mathbb{Q}^{\Delta,x,y}_\theta$ by $\mathbb{W}^{\Delta,x,y}$ is a good one, even for larger time steps $\Delta$, provided that the drift $\mu(\cdot,\theta)$ does not vary much as a function of the state. This implies that the method of approximation proposed in the EML algorithm would work well in the case of the diffusion with a periodic drift used in Example 1 in Beskos et al. (2006), even for large time steps $\Delta$. Also, note that the norm of the random variable $G(W_{t_1},\dots,W_{t_M})$ in the Hilbert space $L^2(\mathbb{W}^{\Delta,x,y})$ is finite for a polynomially bounded function $G$ because the law $\mathbb{W}^{\Delta,x,y}$ of the Brownian bridge is Gaussian with bounded variance. The proof of Theorem 3 can be found in Appendix B. Having replaced the law of the diffusion bridge $\mathbb{Q}^{\Delta,x,y}_\theta$ by the law of the Brownian bridge $\mathbb{W}^{\Delta,x,y}$, which is independent of the parameter $\theta$, we are left with the task of calculating the expectations in the coefficients of matrix (3.2) and vector (3.3). A numerical integration approach would be feasible because we have an explicit formula for the normal density of the marginals of the probability measure $\mathbb{W}^{\Delta,x,y}$. However, because of the numerous two-dimensional integrals in (3.3), the problem does not lend itself well to this approach. An alternative approach is to simulate the paths of the Brownian bridge and use Monte Carlo methods to obtain the relevant expectations.
This can be done by using the modified Brownian bridge sampler defined in Durham and Gallant (2002) and Chib and Shephard (2002), given by the following recursive formula:

$$u_{m+1} = u_m + \frac{u_M - u_m}{M - m} + \sqrt{\frac{M - m - 1}{M - m}}\,\delta^{1/2} Z_m,\qquad(3.4)$$

where $\delta = \Delta/M$ is the length of the time interval between consecutive auxiliary states, $u_0 = x$, $u_M = y$ are the initial and final points of the Brownian bridge and $Z_m\sim N(0,1)$ are independent random variables for all $m\in\{0,\dots,M-1\}$. It is proved in Stramer and Yan (2007b) (see Proposition 1) that the joint density of the modified Brownian bridge equals the joint law of the Brownian bridge at the discretization times, which implies that scheme (3.4) introduces no discretization bias and is therefore preferable to the Euler approximation. Since the parameter $\theta$ does not appear in the evolution equation (3.4) of the modified Brownian bridge, the EML algorithm can be described as follows. Let $x_0,\dots,x_K$ be the $K+1$ observations at times $t_0,\dots,t_K$ of the diffusion $X_t$ given by SDE (3.1) and let $\Delta = t_k - t_{k-1}$ for all $k\in\{1,\dots,K\}$. Introduce $M-1$ auxiliary data points $u_{mk}$, $m\in\{1,\dots,M-1\}$, between the observed data points $x_{k-1}$ and $x_k$ such that $u_{0k} = x_{k-1}$ and $u_{Mk} = x_k$ for all $k\in\{1,\dots,K\}$. Let $S$ be the number of simulations used. The EML algorithm then consists of the following simple steps.

Step 1. For each $k\in\{1,\dots,K\}$ and each $s\in\{1,\dots,S\}$, generate a Brownian bridge path $(u^{(s)}_{mk})_{m=0,\dots,M}$ using (3.4).

Step 2. Find the unique solution of the linear system $A\theta^\top = b$, where

$$A_{ij} := \frac{\Delta}{M}\sum_{k=1}^{K}\sum_{m=1}^{M}\sum_{s=1}^{S} f_i(u^{(s)}_{m-1,k}) f_j(u^{(s)}_{m-1,k}) \quad\text{and}\quad b_i := \sum_{k=1}^{K}\sum_{m=1}^{M}\sum_{s=1}^{S} \bigl(u^{(s)}_{mk} - u^{(s)}_{m-1,k} - g(u^{(s)}_{m-1,k})\delta\bigr) f_i(u^{(s)}_{m-1,k}),\qquad(3.5)$$

with $i, j\in\{0,\dots,N\}$, to obtain the globally optimal parameter value $\theta^\star = (a_0^\star,\dots,a_N^\star)$.

An appealing feature of the EML algorithm described above is that it circumvents the iterative process that is ubiquitous in the general EM framework. The invertibility of matrix $A$ is, by Proposition 2, equivalent to the nondegeneracy of the complete likelihood function, which is implied by the linear independence of the functions $f_i$ in the drift (3.1). Note that if the auxiliary state variables $u_{1k},\dots,u_{M-1,k}$, for all $k\in\{1,\dots,K\}$, are not introduced, then we can remove the expectation operators in (3.2) and (3.3) or, equivalently, the sums over $s$ and $m$ in Step 2 of the above algorithm. In this case, the EML algorithm reduces to classic linear regression. Under a conjugate normal prior, the estimates of the drift parameters then coincide with the posterior mean in a Bayesian analysis of the coefficients. We conclude this section with a brief comment about diffusion models with state-dependent diffusion functions. A scalar diffusion

$$dX_t = \mu(X_t,\theta)\,dt + \sigma(X_t,\vartheta)\,dW_t$$

can always be transformed, using the change of variable $Y = \gamma(X,\vartheta) = \int^X \frac{du}{\sigma(u,\vartheta)}$, which depends on the diffusion parameter vector $\vartheta$, into

$$dY_t = \mu_Y(Y_t,\theta,\vartheta)\,dt + dW_t \quad\text{where}\quad \mu_Y(y,\theta,\vartheta) = \frac{\mu(\gamma^{-1}(y,\vartheta),\theta)}{\sigma(\gamma^{-1}(y,\vartheta),\vartheta)} - \frac{1}{2}\frac{\partial\sigma}{\partial x}(\gamma^{-1}(y,\vartheta),\vartheta).\qquad(3.6)$$

Note that if the original drift $\mu(\cdot,\theta)$ is affine in the parameter $\theta$, then so is the transformed drift $\mu_Y(\cdot,\theta,\vartheta)$. Therefore, an application of the EML algorithm is feasible for any fixed value of the diffusion parameter $\vartheta$.
In Section 4, we will discuss how to apply the EML algorithm to diffusions of this kind [see the models given by (4.3) and (4.4)].
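To illustrate the change of variable (3.6), consider a square-root model $dX_t = (a_0 + a_1 X_t)\,dt + \sigma\sqrt{X_t}\,dW_t$ (an illustrative parametrization, not necessarily the one used in (4.3); all function names below are ours). The sketch computes the transform in closed form and checks formula (3.6) numerically.

```python
import numpy as np

# Lamperti transform for dX = (a0 + a1*X) dt + sigma*sqrt(X) dW:
# gamma(x) = int dx / (sigma*sqrt(x)) = 2*sqrt(x)/sigma, so Y = gamma(X) has
# unit diffusion, dY = mu_Y(Y) dt + dW, with drift (still affine in (a0, a1)):
#   mu_Y(y) = (2*a0/sigma**2 - 1/2)/y + (a1/2)*y.

def gamma(x, sigma):
    return 2.0 * np.sqrt(x) / sigma

def gamma_inv(y, sigma):
    return (sigma * y / 2.0) ** 2

def mu_Y(y, a0, a1, sigma):
    """Transformed drift, simplified by hand from (3.6)."""
    return (2.0 * a0 / sigma ** 2 - 0.5) / y + 0.5 * a1 * y

def mu_Y_via_36(y, a0, a1, sigma):
    """Direct evaluation of (3.6): mu(gamma^{-1})/sigma(gamma^{-1}) - sigma'/2,
    where sigma'(x) = d/dx [sigma*sqrt(x)] = sigma/(2*sqrt(x))."""
    x = gamma_inv(y, sigma)
    return (a0 + a1 * x) / (sigma * np.sqrt(x)) - 0.5 * sigma / (2.0 * np.sqrt(x))
```

For any fixed $\sigma$, the transformed drift is of form (3.1) with $g \equiv 0$ and basis functions proportional to $1/y$ and $y$, so the EML linear system applies directly, exactly as claimed above.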
4. Applications.
There are at least two potential applications for the EML algorithm. The first is the classical parameter estimation problem for diffusion models. The advantage of the EML approach is that the resulting parameter estimates are globally optimal and the bias introduced through the Euler approximation is, by Theorem 1, arbitrarily small. The second application is based on the fact that the EML algorithm is computationally very fast. The speed of the algorithm enables one to easily explore the dependence of the likelihood function on the diffusion parameter $\vartheta$ [see (3.6)], along with the dependence of the globally optimal drift parameter $\theta$ as a function of $\vartheta$ [see examples (4.3) and (4.4)]. Both of these applications will be illustrated in the present section.

4.1. Base cases.
To test the EML algorithm for potential biases arising in the Euler and the Brownian bridge approximations, we start by establishing two base cases. The first case is an Ornstein–Uhlenbeck diffusion [see (4.1)], where the true transition density, and therefore the likelihood function, is known. The second case is a diffusion with a nonlinear drift [see (4.2)], where we employ the closed-form likelihood expansion from Aït-Sahalia (2002) as a benchmark, because this method is known to produce very accurate approximations of the true transition density [see, e.g., Schneider (2006), Hurn, Jeisman and Lindsay (2007), Jensen and Poulsen (2002)]. To put the EML algorithm to the test, we simulate 1000 data sets from each of the two models [the Ornstein–Uhlenbeck process in (4.1) and the nonlinear diffusion in (4.2)] with $K+1 = 500$ observations in each data set. As mentioned above, in the first case, we perform an exact ML estimation on each of the data sets using the exact transition density in the likelihood function; in the second case, we first apply the closed-form likelihood expansion from Aït-Sahalia (2002) to obtain an approximation to the likelihood function, which is then used in quasi-ML estimation. The two models are given by

$$dX_t = (a_0 - a_1 X_t)\,dt + dW_t, \qquad\text{model A},\qquad(4.1)$$
$$dX_t = (a_0 + a_1 X_t + a_2 X_t^3)\,dt + dW_t, \qquad\text{model B},\qquad(4.2)$$

with parameter values $a_0 = 10$ and $a_1 = 2$ for model A, and $a_0 = 1$ together with negative values of $a_1$ and $a_2$ for model B. Time $t$ is measured in years and the consecutive observations in the generated data sets are one month apart. In other words, $\Delta = 1/12$, and we choose an auxiliary state variable for each day of the month, that is, $M-1 = 30$. The EML procedure is run once with $S = 1000$ and once with $S = 200$ simulated paths. Figure 1 displays the comparison of the EML procedure using 1000 simulations with the direct ML estimation for model A. The biases and standard deviations of the parameter estimates are shown in Table 1.
Empirical distribution of estimates from model A. This figure shows the empirical distribution of the maximum likelihood estimator and the EML estimator over 1000 estimations for the Ornstein–Uhlenbeck model (4.1). Plotted are the maximum likelihood estimator, obtained by maximizing the true transition density using a gradient solver, and the EML estimator, obtained by using formula (3.5).

Table 1
Base cases A and B: bias and standard deviation. This table shows the estimation results for the parameters from equations (4.1) and (4.2) on 1000 data sets generated using the true transition density for model A and a very fine Euler approximation for model B (100 auxiliary data points). The benchmark estimations for model A are performed using exact maximum likelihood. The benchmark estimations for model B are obtained using closed-form likelihood expansions. The column "Bias" shows the mean bias of the estimated parameters; bias is defined as θ̂^(i) − θ_0 for the ith estimation. The column "Std. Dev." shows the standard deviation of the parameter estimates. For the EML estimation, 30 auxiliary data points were used.

ML/Aït-Sahalia | EML S = 1000 | EML S = 200
θ | Bias | Std. Dev. | Bias | Std. Dev. | Bias | Std. Dev.
[Table body: bias and standard-deviation rows for a_0 and a_1 of model A and for a_0, a_1 and a_2 of model B; the numeric entries are not legible in this extraction.]
Explicit gradients are used in the likelihood search in the case of both the exact likelihood function and the closed-form likelihood expansion. A striking observation is the higher bias of the ML estimator relative to the EML estimator. The closed-form likelihood expansion and EML display similar biases. No noticeable difference can be seen between choosing 200 or 1000 simulations to approximate the expectations in the EML algorithm. In Stramer and Yan (2007a), the authors suggest that, in the related problem of Monte Carlo estimation of the transition densities of diffusions, the optimal number of simulations S is of the order M^2, which, in the two cases discussed here, amounts to approximately 900 simulations. Also, note that the EML procedure takes about a second to produce the optimal parameter values for each of the data sets described in this subsection. The hardware used to perform these experiments was a PC with a 64-bit Xeon 2.8 GHz processor.

4.2. Exploring the likelihood function.
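This section repeatedly uses the SML transition-density estimator with the modified Brownian bridge importance sampler of Durham and Gallant (2002). A minimal sketch for a unit-diffusion SDE is given below (illustrative values; in the driftless case the sampler coincides with the exact Brownian bridge, so the estimator reproduces the Gaussian transition density exactly, which makes the sketch easy to check).

```python
import numpy as np

rng = np.random.default_rng(2)

def sml_density(x0, xT, T, M=10, S=5_000, mu=lambda x: 0.0):
    """Simulated ML estimate of the transition density p(xT | x0) of
    dX = mu(X) dt + dW, using a modified Brownian bridge sampler over
    M subintervals and S importance-sampling paths."""
    d = T / M
    logw = np.zeros(S)
    x = np.full(S, x0)
    for j in range(M - 1):
        rem = T - j * d                      # time remaining until xT
        m = x + (xT - x) * d / rem           # sampler mean (pulled toward xT)
        v = d * (rem - d) / rem              # sampler variance
        xn = m + np.sqrt(v) * rng.standard_normal(S)
        # log Euler transition density minus log sampler density
        logw += -(xn - x - mu(x) * d) ** 2 / (2 * d) - 0.5 * np.log(2 * np.pi * d)
        logw -= -(xn - m) ** 2 / (2 * v) - 0.5 * np.log(2 * np.pi * v)
        x = xn
    # last subinterval: the endpoint xT is fixed, only the Euler density enters
    logw += -(xT - x - mu(x) * d) ** 2 / (2 * d) - 0.5 * np.log(2 * np.pi * d)
    return np.mean(np.exp(logw))

# For zero drift the estimator is exact: compare with the N(x0, T) density.
est = sml_density(0.0, 0.5, 1.0)
exact = np.exp(-0.5**2 / 2) / np.sqrt(2 * np.pi)
print(est, exact)        # the two values coincide in the driftless case
```

For a nonzero drift `mu`, the same routine gives an unbiased Monte Carlo estimate of the Euler-scheme transition density, and the sampled bridge paths can be reused across parameter values, which is the computational advantage exploited below.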
The empirical features of the dynamics of equity indices such as the S&P 100 include time-varying volatility, a level effect for the volatility of the variance of returns [see Jones (2003a)] and evidence for jumps [Andersen, Benzoni and Lund (2002)]. We are now going to investigate how a diffusion model, specified by a nonlinear SDE, fits the implied volatility data. In this section, we study the relation between the diffusion and drift parameters for each of the processes

dV_t = κ(θ − V_t) dt + σ√V_t dW_t,   model I, (4.3)
dV_t = (a_0 + a_1 V_t + a_2 V_t^2) dt + √(σ_1 V_t + σ_2 V_t^2) dW_t,   model II, (4.4)

conditional on S&P 100 implied volatility data given by a time series of the volatility index VXO. Empirical studies in Jones (2003a) and Aït-Sahalia and Kimmel (2007) have rejected the square root process (4.3) as a specification for the variance dynamics of the S&P 100. Nevertheless, the relation between the parameters of the square root process, conditional upon real data, can be investigated using the EML algorithm. The second model is a nonlinear diffusion (4.4), introduced by Aït-Sahalia (1996), and is potentially flexible enough to accommodate the rich dynamics exhibited by the S&P 100 implied volatility index data. We start by transforming the SDEs in (4.3) and (4.4) into a form with a unit diffusion coefficient using formula (3.6). For model I, we apply the transformation y(x) = 2√x/σ, which yields

dY_t = (b_0/Y_t + b_1 Y_t) dt + dW_t.  (4.5)

Model II requires the change of variable y(x) = (2/√σ_2) log(√(σ_2 x) + √(σ_1 + σ_2 x)), which transforms it into dY_t = μ(Y_t) dt + dW_t with

μ(y) = (a_0 + a_1 x + a_2 x^2)/√(σ_1 x + σ_2 x^2) − (σ_1 + 2σ_2 x)/(4√(σ_1 x + σ_2 x^2)),

where x = x(y) denotes the inverse of the transformation. In the case of model I, we also perform maximum likelihood estimation with the true transition density of the square root variance (4.5).
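The unit-diffusion reduction for model I can be sanity-checked numerically. The generic formula from the transformation is μ_Y(y) = μ_V(x)/σ_V(x) − σ_V′(x)/2 evaluated at x = (σy/2)², and a short substitution gives the closed form b_0/y + b_1 y with b_0 = 2κθ/σ² − 1/2 and b_1 = −κ/2 (these expressions for b_0 and b_1 are derived here for illustration, not quoted from the paper). The two should agree to roundoff:

```python
import numpy as np

# Model I: dV = kappa*(theta - V) dt + sigma*sqrt(V) dW (illustrative values).
kappa, theta, sigma = 5.0, 0.1, 0.3

def mu_V(x):             # drift of V
    return kappa * (theta - x)

def sig_V(x):            # diffusion coefficient of V
    return sigma * np.sqrt(x)

def dsig_V(x):           # derivative of the diffusion coefficient
    return sigma / (2.0 * np.sqrt(x))

# Generic unit-diffusion drift: mu_Y(y) = mu_V(x)/sig_V(x) - dsig_V(x)/2,
# evaluated at x = x(y), the inverse of y(x) = 2*sqrt(x)/sigma.
def mu_Y_generic(y):
    x = (sigma * y / 2.0) ** 2
    return mu_V(x) / sig_V(x) - 0.5 * dsig_V(x)

# Closed form b0/y + b1*y obtained by substituting x = (sigma*y/2)^2.
b0 = 2.0 * kappa * theta / sigma**2 - 0.5
b1 = -kappa / 2.0
def mu_Y_closed(y):
    return b0 / y + b1 * y

ys = np.linspace(0.5, 5.0, 10)
print(np.max(np.abs(mu_Y_generic(ys) - mu_Y_closed(ys))))  # roundoff level
```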
The resulting parameters are θ = 0.…, κ = 5.… and σ = 0.… . For a fixed value of σ, we can transform the time series for the volatility index using the change of coordinates y(x) = 2√x/σ and perform the estimation of the parameters in (4.5) using EML. This operation takes about one second on a personal computer (64-bit Xeon 2.8 GHz, running Linux) for the given data set. By repeating this process for each value of σ on a finite grid in the interval [0.…, 0.…], we can regard the model (4.5) as having a single parameter σ and apply the SML algorithm [as described in Durham and Gallant (2002)] or some analytic likelihood approximation to find the likelihood function for σ [see Figure 2(a)]. This application of EML has therefore reduced the dimension of the parameter space from 3 to 1. Note that, in this approach, SML affords an additional computational advantage over other likelihood approximation methods because one can reuse the modified Brownian bridge paths that were generated for EML as an importance sampler when computing the likelihood function. Using the time series of implied volatilities, an analogous estimation can be performed for the model given by SDE (4.4).

Fig. 2. Model I. (a) displays the maximum likelihood function (up to a proportionality factor) of specification (4.3) as a function of σ on the x-axis, conditional on S&P 100 implied volatility data. For a given σ, the optimal values of θ⋆ and κ⋆ are computed using EML. The likelihood function is computed using the SML algorithm with the Brownian bridge importance sampler [see Durham and Gallant (2002)]. (b) displays globally optimal values θ⋆ and κ⋆ as functions of σ.

The noncentral chi-square density is given in terms of special functions that are difficult to handle numerically for nonsymbolic computational tools with finite precision, such as Fortran, C/C++ and MATLAB. We perform this estimation using Mathematica.

The analysis of the likelihood functions, together with the optimal drift parameters for the processes (4.3) and (4.4), provides some interesting insights into the process specification, as well as the estimation method. The first observation is that the ML estimates obtained from the true transition density of the square root process (4.3) agree with the EML estimates. This is clear from Figures 2(a) and 2(b) at the point σ = 0.…, where the globally optimal drift parameters and the likelihood function are smooth in the parameters of the state-dependent volatility function.
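The profiling loop just described can be sketched end to end on simulated data (illustrative parameter values; the plain Euler quasi-likelihood stands in for the paper's SML/EML machinery): for each σ on a grid, transform the data by y = 2√x/σ, fit the drift parameters of (4.5) by the one-step linear solve, and record the resulting Gaussian quasi-likelihood, including the Jacobian term from the change of variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a square-root (model I) path with a fine Euler scheme, then subsample.
kappa, theta, sigma = 5.0, 0.1, 0.3
h, n_fine, thin = 1e-3, 200_000, 10
dW = rng.normal(0.0, np.sqrt(h), n_fine)
v = np.empty(n_fine + 1)
v[0] = theta
for k in range(n_fine):
    v[k + 1] = abs(v[k] + kappa * (theta - v[k]) * h + sigma * np.sqrt(v[k]) * dW[k])
x = v[::thin]                       # observations with spacing delta
delta = h * thin

def profile_loglik(s):
    """Quasi-log-likelihood of the data at diffusion parameter s, with the
    drift parameters (b0, b1) of dY = (b0/Y + b1*Y) dt + dW profiled out
    by a linear solve (Euler-scheme normal equations)."""
    y = 2.0 * np.sqrt(x) / s        # candidate unit-diffusion coordinates
    dy = np.diff(y)
    f = np.column_stack([1.0 / y[:-1], y[:-1]])
    b = np.linalg.solve(f.T @ f * delta, f.T @ dy)   # globally optimal drift
    resid = dy - (f @ b) * delta
    # Gaussian transition in y, plus log-Jacobian |dy/dx| = 1/(s*sqrt(x)).
    ll = -np.sum(resid**2) / (2.0 * delta) - 0.5 * len(dy) * np.log(2 * np.pi * delta)
    ll -= np.sum(np.log(s * np.sqrt(x[1:])))
    return ll

grid = np.linspace(0.15, 0.6, 91)
s_hat = grid[np.argmax([profile_loglik(s) for s in grid])]
print(s_hat)                        # close to the true sigma = 0.3
```

The profiled likelihood is maximized near the true σ, mirroring the dimension reduction from 3 parameters to 1 described above; on real data one would replace the Euler quasi-likelihood with the SML estimate and the observed VXO series.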
Preliminary EML estimates also suggest that the Euler approximation on a daily level is not sufficient for a nonlinear diffusion like the one in (4.4). Even a discretization of 5 subintervals per day appears too coarse. The estimates stabilize between 10 and 30 subintervals. This is in line with the findings in Roberts and Stramer (2001), Figure 4, in a similar setting. The likelihood function for model (4.4) is extremely flat in the diffusion parameters close to the optimal region (see Figure 3), so care should be taken with the estimation. Finally, a likelihood-ratio test applied to the two variance models reveals that the specification (4.4) is preferable to the square root specification.
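The likelihood-ratio comparison of the two nested variance specifications can be sketched as follows. If the square-root model is taken to correspond to the restrictions a_2 = 0 and σ_2 = 0 in (4.4) (an assumption made here for illustration), the test has 2 degrees of freedom; the log-likelihood inputs below are placeholders, not values from the estimation above, and 5.991 is the standard 95% quantile of the χ² distribution with 2 degrees of freedom.

```python
# Likelihood-ratio test of a restricted model against a nesting full model.
# Chi-square 95% critical values for small degrees of freedom (standard tables).
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}

def lr_test(loglik_restricted, loglik_full, df):
    """Return the LR statistic 2*(ll_full - ll_restricted) and whether the
    restricted model is rejected at the 5% level."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, stat > CHI2_95[df]

# Placeholder log-likelihood values, for illustration only:
stat, reject = lr_test(loglik_restricted=1510.2, loglik_full=1523.7, df=2)
print(round(stat, 1), reject)   # 27.0 True
```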
5. Conclusion.
This paper is concerned with an approximation procedure for the likelihood function of a nonlinear diffusion, given a discrete set of observations. The method can be used with any algorithm for sampling from the law of the diffusion bridge [e.g., Beskos et al. (2006) or Bladt and Sørensen (2007)] and is shown to converge uniformly on compact subsets of the parameter space (see Theorem 1).

We also develop a new expected maximum likelihood (EML) algorithm for the estimation of parameters governing a nonlinear diffusion process. It provides globally optimal parameter values when the drift is affine in the coefficients and the diffusion function is constant. For diffusions with a state-dependent volatility function, our method is used to express the likelihood as a function of the volatility parameters only, thereby significantly reducing the dimension of the parameter space for a gradient-based solver. The framework is easy to implement and is guaranteed to solve the expectation maximization problem in a single iteration. It uses auxiliary data points and is based on two observations: first, that the complete likelihood (i.e., the joint likelihood of the observed and auxiliary data points) of the Euler scheme approximates uniformly the complete likelihood of the diffusion as the time interval between the consecutive auxiliary data points goes to zero; second, that the law of the Brownian bridge approximates well the law of the diffusion bridge. Global optimality (Proposition 2), theoretical bounds (Theorem 3) and asymptotic results (Theorem 1) are established, quantifying the quality of the approximations. Additional numerical experiments suggest that the method works very well for multivariate nonlinear diffusions, even for large time intervals between observed data points.

It is beneficial for the stability of the method to keep the random numbers fixed in all of the expectations arising in (3.2) and (3.3). This principle is shown to guarantee the convergence of the MCEM algorithm in Papaspiliopoulos and Sermaidis (2007).

Fig. 3. Model II. The likelihood of model (4.4) as a function of σ_1 (on the x-axis) and σ_2 (on the y-axis), conditional on S&P 100 implied volatility data. For given σ_1 and σ_2, the values a_0⋆, a_1⋆ and a_2⋆ are computed using EML. The likelihood function for model II given in (4.4) is computed using the SML algorithm with the Brownian bridge importance sampler [see Durham and Gallant (2002)].

Fig. 4. Simulated densities of the Radon–Nikodym derivative dQ_θ^{∆,x,y}/dW^{∆,x,y} of the law of the bridge of the diffusion X_t with respect to the law of the Brownian bridge (see Theorem 3 for the precise definition). The diffusion X_t used in this example is given by the nonlinear SDE (4.4). The parameter value θ is implied by the VXO data. The coordinates of θ are approximately given by a_0 = 0.…, a_1 = −…, a_2 = −… and the diffusion coefficients are σ_1 = 0.… and σ_2 = 3.… The graphs correspond to time-horizons ∆ equal to two weeks and one month, and a fixed starting point x = 0.… for the diffusion bridge and the Brownian bridge. Three different end points y of the respective bridges are chosen for each time-horizon. Note that the density in all of the cases is concentrated in a small neighborhood of 1, thus making the bound in (a) of Theorem 3 very tight, even for time intervals ∆ as long as one month.

A topic for further research is the possible extension of the EML framework to the estimation of jump-diffusions.
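The bridge construction underlying such extensions rests on the Markov factorization p(x_{τ_1} | x_{τ_0}, x_{τ_2}) = p(x_{τ_1} | x_{τ_0}) p(x_{τ_2} | x_{τ_1}) / p(x_{τ_2} | x_{τ_0}). As a sketch, this identity can be checked against the known Gaussian bridge density in the Brownian case (standard formulas; the numerical values are illustrative):

```python
import math

def norm_pdf(z, mean, var):
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Transition density of standard Brownian motion.
def p(x1, x0, dt):
    return norm_pdf(x1, x0, dt)

# Markov factorization of the bridge density at an intermediate time t1:
x0, x1, x2 = 0.0, 0.3, -0.2
t0, t1, t2 = 0.0, 0.4, 1.0
bridge_factorized = p(x1, x0, t1 - t0) * p(x2, x1, t2 - t1) / p(x2, x0, t2 - t0)

# Known closed form: the Brownian bridge from (t0, x0) to (t2, x2) at time t1
# is Gaussian with mean x0 + (x2 - x0)*(t1 - t0)/(t2 - t0) and
# variance (t1 - t0)*(t2 - t1)/(t2 - t0).
mean = x0 + (x2 - x0) * (t1 - t0) / (t2 - t0)
var = (t1 - t0) * (t2 - t1) / (t2 - t0)
bridge_exact = norm_pdf(x1, mean, var)

print(abs(bridge_factorized - bridge_exact))  # roundoff level
```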
Instead of using the law of the Brownian bridge, a semi-nonparametric density [see Gallant and Tauchen (2009, 2006)] could be used to approximate the conditional density p(x_{τ_1} | x_{τ_0}, x_{τ_2}) = p(x_{τ_1} | x_{τ_0}) p(x_{τ_2} | x_{τ_1}) / p(x_{τ_2} | x_{τ_0}), 0 ≤ τ_0 < τ_1 < τ_2, of the corresponding bridge process with jumps. The complete likelihood function would then be obtained by conditioning on a high-frequency discretization (i.e., using many auxiliary state variables) of the jump-diffusion, which would identify well the jump parameters.

APPENDIX A: PROOF OF THEOREM 1

Recall that we have a diffusion X_t which is a solution of the SDE dX_t = μ(X_t, θ) dt + dW_t, where μ(·, θ) : R → R is a bounded Lipschitz function with bounded first and second derivatives. Without loss of generality, we can assume that the parameter space Θ is a compact subset of R^{N+1}. For s > t, let π_{t,s}(x | x_0, θ) denote the probability density function of X_s conditional on X_t = x_0. It is well known that such a density exists [see (7) and (8) in Rogers (1985)] and, by Girsanov's theorem, can be expressed as

π_{t,t+δ}(x_1 | x_0, θ) = (1/√(2πδ)) e^{−(x_1 − x_0)^2/(2δ)} e^{∫_{x_0}^{x_1} μ(u,θ) du} Φ_θ(δ, x_0, x_1),  (A.1)

where

Φ_θ(δ, x_0, x_1) := E[ exp( −(δ/2) ∫_0^1 g_θ(x_0 + u(x_1 − x_0) + √δ W_u^0) du ) ].  (A.2)

Here, W^0 denotes the Brownian bridge W_u^0 := W_u − uW_1, for u ∈ [0, 1], and the function g_θ is given by g_θ(u) := μ′(u, θ) + μ(u, θ)^2. Our task is to prove that the sum

∑_{k=1}^{∆/δ−1} E_{Q_θ}[ log( π_{t,t+δ}(X_{δ(k+1)} | X_{δk}, θ) / φ(X_{δ(k+1)}; X_{δk} + δμ(X_{δk}, θ), δ) ) ]  (A.3)

converges to zero uniformly for all θ in the compact parameter space Θ, where Q_θ is the law of the bridge of the diffusion X_t, for t ∈ [0, ∆], which starts at X_0 = x_0 and finishes at X_∆ = x_∆ for any fixed pair of real numbers x_0, x_∆ ∈ R.
The function φ(y; x, δ) in this expression is the normal density function with mean x and variance δ.

The first step in the proof of Theorem 1 is to understand the integrand

log( π_{t,t+δ}(y | x, θ) / φ(y; x + δμ(x, θ), δ) ) = (δ/2) μ(x, θ)^2 − (y − x) μ(x, θ) + ∫_x^y μ(u, θ) du + log(Φ_θ(δ, x, y)).  (A.4)

We start with the following claim.
Claim 1.
The integral of the drift can be expressed as ∫_x^y μ(u, θ) du = (y − x) μ(x, θ) + ((y − x)^2/2) μ′(x, θ) + (y − x)^3 h_θ(x, y), where h_θ(x, y) is a bounded measurable function which is linear in θ.

Let μ(u, θ) = μ(x, θ) + (u − x) μ′(x, θ) + ((u − x)^2/2) μ′′(ξ_{x,u}, θ) be the Taylor approximation of order 2 of the drift μ(·, θ). The point ξ_{x,u} lies in the interval (x, u) ⊂ R [or in (u, x), if x is larger than u]. For any fixed y in R, we can integrate this representation of μ(·, θ) to obtain the representation of the integral which is given in Claim 1. We need to check that the function h_θ(x, y) := (1/(2(y − x)^3)) ∫_x^y (u − x)^2 b_θ(u, x) du, for x ≠ y, is bounded and measurable. Here, b_θ(u, x) is given by the quotient b_θ(u, x) := 2[μ(u, θ) − μ(x, θ) − (u − x) μ′(x, θ)]/(u − x)^2 if u ≠ x, and is zero otherwise. Note that the function b_θ is bounded, since μ′′(·, θ) is bounded and linear in θ. Since the set Θ is compact, the estimate |h_θ(x, y)| ≤ (C/(y − x)^3) ∫_x^y (u − x)^2 du = C/3 holds for all y > x and some constant C. A similar bound holds for y < x. It follows from the definition of b_θ that it is measurable on R × R, since it is continuous outside the zero measure set {(x, x) : x ∈ R}. Fubini's theorem implies that h_θ(x, y) must therefore also be measurable. This proves Claim 1.

The next task is to relate the asymptotic behavior of the function log(Φ_θ(δ, x, y)) to the drift μ(·, θ). This will be achieved in Claims 2 and 3.

Claim 2.