The computational cost of blocking for sampling discretely observed diffusions
Marcin Mider, Paul A. Jenkins, Murray Pollock, Gareth O. Roberts
A Preprint
Marcin Mider
Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
[email protected]

Paul A. Jenkins
Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K.
[email protected]

Murray Pollock
School of Mathematics, Statistics and Physics, Newcastle University, Newcastle-upon-Tyne NE1 7RU, U.K.
[email protected]

Gareth O. Roberts
Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K.
[email protected]

Abstract
Many approaches for conducting Bayesian inference on discretely observed diffusions involve imputing diffusion bridges between observations. This can be computationally challenging in settings in which the temporal horizon between subsequent observations is large, due to the poor scaling of algorithms for simulating bridges as observation distance increases. It is common in practical settings to use a blocking scheme, in which the path is split into a (user-specified) number of overlapping segments and a Gibbs sampler is employed to update segments in turn. Substituting the independent simulation of diffusion bridges for one obtained using blocking introduces an inherent trade-off: we are now imputing shorter bridges at the cost of introducing a dependency between subsequent iterations of the bridge sampler. This is further complicated by the fact that there are a number of possible ways to implement the blocking scheme, each of which introduces a different dependency structure between iterations. Although blocking schemes have had considerable empirical success in practice, there has been no analysis of this trade-off nor guidance to practitioners on the particular specifications that should be used to obtain a computationally efficient implementation. In this article we conduct this analysis (under the simplifying assumption that the underlying diffusion is a Gaussian process), and demonstrate that the expected computational cost of a blocked path-space rejection sampler scales asymptotically at an almost cubic rate with respect to the observation distance. Numerical experiments suggest applicability of both the results of our paper and the guidance we provide beyond the class of linear diffusions considered.
Keywords
Bayesian inference ⋅ Blocking ⋅ Diffusion ⋅ Gaussian process ⋅ Markov chain Monte Carlo ⋅ Rejection sampling
Diffusions have been widely applied to model continuous-time phenomena of interest, including in molecular dynamics (Boys et al. 2008), neuroscience (Lansky & Ditlevsen 2008), and finance (Karatzas & Shreve 1998). In general, a diffusion on R^d is a Markov process X defined to be the solution, with law we will denote by P, to a stochastic differential equation of the following form:

d X_t = b(X_t) d t + σ(X_t) d W_t,   X_0 = x_0,   t ∈ [0, T],   (1)
where b : R^d → R^d and σ : R^d → R^{d×d′} denote the drift and volatility coefficient respectively, and W is a standard d′-dimensional Brownian motion. Throughout we assume standard regularity conditions hold which ensure the existence of a unique, global, weak solution to (1) (see for instance Øksendal 2003).

In practice we will typically only have access to discrete observations of (1), and so for practitioners the statistical problem of interest is to use these observations to draw inference on the parameters of b and σ of (1). A common Bayesian strategy is to augment the parameter space with the space describing the complete underlying diffusion trajectory. A Markov chain Monte Carlo algorithm can then explore this augmented space by alternating between updates of the parameters and updates of the unobserved sample path connecting observations (sampling of diffusion bridges) (Roberts & Stramer 2001). As a consequence a considerable and methodologically diverse literature has developed concerned with simulating diffusion bridges (the law of (1) conditioned to terminate at the subsequent observation (for instance, X_T = x_T), which we denote by P^{(T, x_0, x_T)} or generically by P⋆), including Beskos & Roberts (2005), Bladt et al. (2014), Delyon & Hu (2006), Durham & Gallant (2002), Golightly & Wilkinson (2008), Hairer et al. (2011), Roberts & Stramer (2001), Schauer et al. (2017).

One of the common difficulties with Markov chain Monte Carlo strategies is sampling diffusion bridges between distant observations, i.e. when the duration of the bridge, which we denote by T, is large. This setting naturally arises when the underlying diffusion (1) is sparsely observed (or high-dimensional), for instance in shape analysis applications (Arnaudon et al. 2020), or in the case of diffusions on graphs (Freidlin & Wentzell 1993).
The problem here is that methodologies for sampling diffusion bridges scale poorly with T, and many of the most widely used approaches have exponential computational cost in T. Consequently, addressing the poor scaling in T has drawn considerable interest. One popular approach is the blocking scheme introduced by Shephard & Pitt (1997), which has been employed in a number of practical problems with strong empirical evidence of its efficacy (Chib et al. 2004, Golightly & Wilkinson 2008, Kalogeropoulos 2007, Kalogeropoulos et al. 2010, van der Meulen & Schauer 2018, Stramer & Roberts 2007). Blocking is a conceptually simple idea in which the time domain of the diffusion bridge is overlaid with a set of temporal anchors (0 =: k_0 < k_1 < ⋯ < k_m < k_{m+1} := T), and the values of the bridge are taken for some initialisation trajectory at those points (which are known as knots, and for which we will denote X_i := X_{k_i} to simplify notation). Simulation from P^{(T, x_0, x_T)} is then achieved by constructing a Gibbs sampler which alternates between updating knots and updating the segments of the trajectory conditional on the knots, a number of times. For instance, we could begin by simulating from the conditional law P^{(k_2 − k_0, X_0, X_2)} (updating the trajectory on [k_0, k_2], which includes the knot X_1, conditional on the knots X_0 and X_2), and then P^{(k_3 − k_1, X_1, X_3)} (updating the trajectory on [k_1, k_3] and containing the knot X_2, conditional on the knots X_1 and X_3), and so on sweeping across all anchor points. This sweep would then be iterated a number of times to reduce the dependency between the resulting bridge and the initial (or previous) trajectory.
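The sweep just described can be sketched at the level of the knots when P is a standard Brownian motion, for which the conditional law of a knot given its two neighbours is Gaussian with mean the neighbour average and variance δ/2 (the continuous segments can then be filled in by Brownian-bridge interpolation). A minimal illustrative sketch, assuming equidistant anchors and unit diffusivity; the function names are ours:

```python
import numpy as np

def gibbs_sweep(knots, delta, rng):
    """One sweep over the interior knots X_1, ..., X_m, in temporal order.

    For standard Brownian motion, X_i given its neighbours X_{i-1}, X_{i+1}
    (equidistant anchors, spacing delta) is Gaussian with mean equal to the
    average of the neighbours and variance delta / 2.
    """
    m = len(knots) - 2                     # knots[0], knots[-1] are x_0, x_T
    for i in range(1, m + 1):
        mean = 0.5 * (knots[i - 1] + knots[i + 1])
        knots[i] = rng.normal(mean, np.sqrt(delta / 2.0))
    return knots

rng = np.random.default_rng(0)
T, m = 4.0, 7
delta = T / (m + 1)
knots = np.linspace(0.0, 1.0, m + 2)       # initial trajectory, x_0 = 0, x_T = 1
for _ in range(200):                       # iterate sweeps to reduce dependency
    gibbs_sweep(knots, delta, rng)
```

A checkerboard-style sweep would instead update the odd-indexed knots and then the even-indexed ones; the end-points remain fixed throughout.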
In this article we consider the three canonical blocking schemes of Roberts & Sahu (1997) with equidistant anchors: the checkerboard scheme, in which the odd- and even-indexed knots are alternately updated; the lexicographic scheme, in which the knots are updated in temporal order; and the random scheme, in which at each step a random knot is updated. We more formally introduce blocking and define these schemes in Section 2.

From a computational perspective, blocking substitutes the expensive simulation of a (single independent) draw from P^{(T, x_0, x_T)} with the cost of simulating repeated sweeps of the m + 1 shorter (and computationally more efficient) bridges for each segment given by the temporal anchors. Any analysis of this trade-off needs to take into account the serial correlation induced by the blocking strategy. Despite widespread adoption of blocking in practice to mitigate the computational cost of simulating diffusion bridges (as indicated above), there is little theoretical support for its efficacy. Furthermore, there is little concrete guidance on how to implement, and then appropriately tune (selecting for instance the number and locations of the anchor points), a blocking scheme.

In this article we provide general guidance for implementing blocking schemes by addressing these practical considerations for particular classes of diffusion process. We analyse the computational cost of several rejection sampling algorithms for bridges as a function of block size and bridge duration.
We analyse the expected cost of a single iteration of various algorithms, and then to capture the trade-off described above we consider the cost of the algorithm which comprises both the cost of one iteration and the total number of iterations required to obtain an ‘independent’ sample. We give a more formal description of what we mean by achieving independence below, in terms of the relaxation time of the underlying Markov chain.

In this article we work under the assumption that the underlying measure is a Gaussian diffusion (i.e. P is the law of a scaled Brownian motion or the law of the Ornstein–Uhlenbeck process). Under this simplification the Gibbs step for updating the bridge segments can be implemented without error, i.e. without discretising time, for example by means of a rejection sampler directly on the path-space of the diffusion (see Appendix A for full details). In this setting we prove that Theorem 1 below holds, as the culmination of the results in Section 3. We gather all proofs in the appendices.
Theorem 1.
Suppose P⋆ is the conditional law of a Gaussian diffusion which is sampled by rejection on path-space and using a checkerboard or lexicographic blocking scheme. Let χ_1, χ_2 be arbitrarily small, positive constants, and suppose the m anchors are spaced equidistantly such that m = c T^{1+χ_1} (for some constant c > 0). Then the expected computational cost of the blocked rejection sampler, C_blocking(T), satisfies:

C_blocking(T) = O(T^{3+χ_2}), as T → ∞.   (2)

Where P denotes the law of a scaled Brownian motion the above statement remains valid also with χ_1 = χ_2 = 0.

Although what we prove in Theorem 1 addresses a somewhat idealised setting, the requirement m = c T^{1+χ_1} acts as a concrete guide for choosing the number of blocks. Furthermore, our empirical results in Section 4 indicate that the guidance we establish can be more broadly useful beyond the class of linear diffusions. Thus we demonstrate that blocking can lead to significantly improved computational efficiency when conducting inference for discretely observed diffusions.

In this section we provide a systematic definition of blocking for sampling a diffusion path. Define a set of anchors spread across the time domain: 0 < k_1 < ⋯ < k_m < T, and knots as the values of the path taken at the anchors: K(ω) := {X_{k_1}(ω), …, X_{k_m}(ω)}. Each anchor is now assigned to one of k disjoint subsets A_i, i = 1, …, k, each comprising m_i anchors:

{k_1, …, k_m} = ⊍_{i=1}^{k} {r_{i1}, …, r_{i m_i}} =: ⊍_{i=1}^{k} A_i,   (with m_i ∈ N_+, i ∈ {1, …, k}).

This allows us to group the knots by associating them with the corresponding subsets of anchors:

K_i(ω) := {X_r(ω); r ∈ A_i}, i ∈ {1, …, k}.

For convenience of notation we let K_{−i} (resp. A_{−i}) denote all knots (resp. anchors) that do not belong to K_i (resp. A_i):

K_{−i}(ω) := ⋃_{j ≠ i} K_j(ω),   A_{−i} := ⊍_{j ≠ i} A_j,   i ∈ {1, …, k},

and assign labels to an ordered collection of all anchors in A_{−i}, plus the end-points:

{e_{i1}, …, e_{i(m+2−m_i)}} = A_{−i} ∪ {0, T},   (with e_{ij} < e_{i(j+1)}), i ∈ {1, …, k}.

Further, define

B_i := {(e_{ij}, e_{i(j+1)}) | ∃ r ∈ A_i s.t. r ∈ [e_{ij}, e_{i(j+1)}]}_{j=1}^{m+1−m_i},

to be only those intervals between the end-points or anchors in A_{−i} which contain at least one anchor belonging to A_i. The path segments X|_{B_i}, obtained through restricting X to B_i, are termed blocks. Finally, in the case k = 2 we say that A_1 and A_2 are interlaced if whenever a, c ∈ A_i, with a < c, then there exists b ∈ A_{(i mod 2)+1} s.t. a < b < c, i = 1, 2.

A sampler for a path equipped with a blocking technique is a Gibbs sampler that updates the full path only one block at a time by drawing from the conditional laws P⋆|_{B_i}(⋅|K_{−i}), i.e. the target laws restricted to blocks B_i and conditioned on the knots in K_{−i}. For simplicity we refer to this technique as a blocked sampler in the remainder of the text, and present general pseudo-code for it in Algorithm 1.

Algorithm 1:
Blocked sampler on path-space

Initialise X;
for n = 1, …, N do
    for i = 1, …, k do
        Draw I ∼ q(i, ⋅) (various choices for q are defined below, in Definitions 1–3);
        Update X|_{B_I} by sampling X|_{B_I} ∼ P⋆|_{B_I}(⋅|K_{−I});
return X

There are a number of ways we can update the blocks, and in this article we consider the three canonical blocking schemes of Roberts & Sahu (1997). In particular, we refer to a single, full Gibbs sweep of Algorithm 1 (the inner for-loop) as a:
Definition 1.
Checkerboard blocking update scheme if k = 2, A_1 and A_2 are interlaced, and q(i, j) := 1_{{i}}(j).

Definition 2. Lexicographic blocking update scheme if k = m, A_i := {k_i}, i ∈ {1, …, m}, and q(i, j) := 1_{{i}}(j).

Definition 3. Random blocking update scheme if k = m, A_i := {k_i}, i ∈ {1, …, m}, and q(i, j) := (1/m) 1_{{1,…,m}}(j).

The above are not exhaustive, but characterise the most widely used, and are tractable enough for analysis. We further simplify various computations by assuming the anchors are equidistant, and defer discussion of this assumption and its relaxation to Section 5.

Assumption 1.
The anchors are placed on an equidistant grid: k_{i+1} − k_i = T/(m+1) =: δ_{m,T}, i ∈ {0, …, m}.

We begin by quantifying the computational cost of a rejection sampling algorithm for diffusion bridges in the absence of blocking.
Proposition 1.
Under Assumptions 3–9 enumerated in Appendix A, the expected computational cost as a function of T of obtaining a single draw with a path-space rejection sampling algorithm, denoted by C_rej(T), is given by

C_rej(T) = f(T) T e^{cT},   (3)

where c > 0 is some constant independent of T, and the function f : R_+ → R is continuous and such that f(T) ∼ T^{−d/2} as T → ∞. In particular, for large enough T there is a constant c_1 > 0 such that: C_rej(T) ≥ c_1 T^{1−d/2} e^{cT}.

Remark. If P is the law of a drifted Brownian motion, then Proposition 1 cannot be applied directly, because Assumption 9 does not hold. However, for this case an easy calculation shows that the acceptance probability of a rejection sampler with Brownian bridge proposals is equal to 1, implying (under Assumption 8) that C_rej(T) is proportional to T.

Now considering a single sweep of the blocking schemes introduced in Section 2, note that we have substituted sampling a single diffusion bridge (of length T) with sampling a number of diffusion bridges of shorter time horizon, 2δ_{m,T} (as required by X|_{B_I} ∼ P⋆|_{B_I}(⋅|K_{−I})). By application of Proposition 1, the expected computational cost of simulating each of these shorter bridges is therefore C_rej(2δ_{m,T}), and hence the expected cost of a single Gibbs sweep is:

C_sweep(T, m) := m ⋅ C_rej(2δ_{m,T}) = f(2δ_{m,T}) (2mT/(m+1)) exp{2c δ_{m,T}}.   (4)

In particular, if 2δ_{m,T} < c_2 as T → ∞ for some constant c_2, then

C_sweep(T, m) ∼ c_3 T,   (5)

for some c_3 > 0. Equation (4) holds for all m and T and follows from (3); however, as the behaviour of f(t) for small t is not immediately transparent, to learn something about C_sweep(T, m) when δ_{m,T} is small, we may use the fact that the acceptance probability of the rejection sampler approaches 1 as the bridge duration decreases to 0. This fact implies that for small enough t, C_rej(t) ∼ c_4 t, and it yields (5). For instance, upon setting m = ⌊T⌋, the cost in (4) becomes O(T) as T → ∞.
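The gap between the exponential cost of a single unblocked draw and the roughly linear cost of one sweep can be illustrated numerically. The sketch below is purely illustrative: it assumes d = 1, c = 1 and f(t) = t^{−1/2}, constants of our own choosing rather than estimates for any particular diffusion:

```python
import math

def c_rej(t, c=1.0, d=1):
    # assumed instance of (3): C_rej(t) = f(t) * t * exp(c*t), f(t) = t**(-d/2)
    return t ** (1.0 - d / 2.0) * math.exp(c * t)

def c_sweep(T, m, c=1.0, d=1):
    # (4): one sweep simulates m bridges, each over a block of length 2*delta
    delta = T / (m + 1)
    return m * c_rej(2.0 * delta, c=c, d=d)

# with m proportional to T the block length stays bounded, so the sweep cost
# grows roughly linearly while the unblocked cost explodes exponentially
savings = [c_rej(T) / c_sweep(T, int(T)) for T in (10.0, 20.0, 40.0)]
```

Under these assumed constants the per-draw saving of a sweep over an unblocked draw grows extremely quickly with T, while `c_sweep` itself grows only about linearly.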
Direct comparison of the exponential cost C_rej(T) of direct rejection sampling (as given by Proposition 1) with the linear cost C_sweep(T, m) of a single sweep of a blocking scheme (as given by (5)) does not capture the remnant dependency structure introduced by the blocking scheme. In addition we need to consider the number of sweeps required to render this dependency negligible. In order to do that we first introduce the following notion.

Definition 4. (Roberts & Sahu 1997) The [L²-]convergence rate ρ of a Markov chain {X^{(n)}; n = 0, …, N} with the transition kernel P and an invariant density π is defined as the minimum number for which, for all square π-integrable functions f and for all r > ρ,

‖P^n f − π(f)‖²_{L²(π)} := ∫ [P^n f(X^{(0)}) − π(f)]² π(d X^{(0)}) ≤ V_f r^{2n},

where P^n f(X^{(0)}) := E_π[f(X^{(n)}) | X^{(0)}], π(f) := E_π[f(X)] and V_f is a positive number that depends on f.
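Definition 4 can be checked on a toy scalar Gaussian AR(1) chain X^{(n+1)} = b X^{(n)} + (1 − b²)^{1/2} Z_n with stationary law π = N(0, 1): for f(x) = x one has P^n f(x) = b^n x and π(f) = 0, so the squared L²(π)-norm equals b^{2n} and the convergence rate is ρ = |b|, the spectral radius of the (here scalar) autoregressive matrix. A small sketch of this toy example (ours, not from the paper):

```python
# Toy check of Definition 4 for a scalar Gaussian AR(1) chain:
# X^{(n+1)} = b X^{(n)} + sqrt(1 - b^2) Z_n, stationary law pi = N(0, 1).
# For f(x) = x: P^n f(x) = b^n x and pi(f) = 0, so
#   ||P^n f - pi(f)||^2_{L2(pi)} = int (b^n x)^2 pi(dx) = b^(2n),
# i.e. V_f = 1 and the convergence rate is rho = |b|.
b = 0.8

norms_sq = [b ** (2 * n) for n in range(1, 5)]
rate = (norms_sq[1] / norms_sq[0]) ** 0.5   # recovers |b| = 0.8
```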
We can now capture the cost of reducing the dependency on the past by considering the relaxation time, denoted 𝒯 = 𝒯(T, m), and defined as:

𝒯 := −1/log(ρ).   (6)

It represents the time required by the underlying Markov chain to output a draw from its stationary distribution (Levin & Peres 2017). This makes it possible to compare C_rej(T) with the expected computational cost of the blocked rejection sampler as follows:

C_blocking(T, m) := 𝒯(T, m) ⋅ C_sweep(T, m).   (7)

We will later consider the most appropriate choice of blocking scheme, and how to optimise m. Instead of analysing the chain targeting the law P⋆ it is sufficient to consider a related chain that targets the marginal law of the vector

G := (X_{k_1}, …, X_{k_m}) | (X_0, X_T) = K | (X_0, X_T),   (8)

which we denote by 𝒢. To see this, notice that conditionally on the knots K being distributed according to 𝒢, a path X returned after a single Gibbs sweep of a blocking scheme is distributed exactly according to P⋆. The object of interest becomes a Markov chain with a transition kernel P denoting a single Gibbs sweep, and with stationary distribution π = 𝒢 (Roberts & Rosenthal 2001). Throughout, we additionally assume that the following condition holds, which makes the subsequent required calculations tractable.

Assumption 2.
The target law P⋆ is such that 𝒢 is a Gaussian process.

We discuss this key technical assumption in Section 4, where we note that the established results seem to hold empirically more broadly. Under Assumption 2 and using either the lexicographic or checkerboard updating scheme, a single Gibbs step (i.e. an update G|_{B_I} ∼ P⋆|_{B_I ∩ {X_{k_1},…,X_{k_m}}}(⋅|K_{−I})) has a tractable, Gaussian transition density, and thus so does the entire Gibbs sweep G^{(n)} ↦ G^{(n+1)}, with mean and covariance

μ := E[G],   Σ := Cov[G].

As a consequence it is possible to explicitly characterise the transition kernel P, as follows.

Lemma 1. Under the lexicographic and checkerboard updating schemes, the n-step transition kernel P^n of the Markov chain {G^{(l)}; l = 0, 1, …} is Gaussian, with mean and covariance matrix given respectively by:

E[G^{(l+n)} | G^{(l)}] = B^n G^{(l)} + (I − B)^{−1}(I − B^n) b,   Cov[G^{(l+n)} | G^{(l)}] = Σ − B^n Σ (B^n)^T,   (9)

with B ∈ R^{m×m} and b ∈ R^m.

Under the lexicographic or the checkerboard updating schemes {G^{(l)}; l = 0, 1, …} is an AR(1) process, and so the spectral radius ρ_spec(B) of the matrix B must satisfy ρ_spec(B) < 1 for the process to converge, and equals the L²-convergence rate (Amit 1991). This connection extends to the random updating scheme. In the following lemma we derive the spectral radius of each blocking scheme as a function of m and T, which aids in optimising their parameterisation and analysing their scaling. We denote by Λ := Σ^{−1} the precision matrix of G and define A := I − diag{Λ_{11}^{−1}, …, Λ_{mm}^{−1}} Λ.

Lemma 2. (Roberts & Sahu 1997) Under the checkerboard and lexicographic updating schemes, the spectral radius of the matrix B and the L²-convergence rate of a blocked rejection sampler coincide. More explicitly, under the checkerboard, lexicographic, and random updating schemes respectively, the L²-convergence rates (ρ_check, ρ_lex, and ρ_rand resp.) are equal to:

ρ_{m,T} := ρ_check = ρ_lex = ρ_spec(B_lex) = ρ_spec(B_check) = λ_max(A)²,   ρ_rand = [(m − 1 + λ_max(A)²)/m]^m,

where λ_max(A) denotes the maximum eigenvalue of the matrix A and where we write B_check (resp. B_lex) to denote a matrix B corresponding to the checkerboard (resp. lexicographic) updating scheme.

λ_max(A) can be found more explicitly by exploiting the close connection between the precision matrix Λ and the matrix of partial correlations (given precisely in (B.2), in Appendix B).
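These quantities can be checked numerically when P is a standard Brownian motion: the knots of the bridge then have a tridiagonal precision matrix Λ with diagonal 2/δ and off-diagonal −1/δ, so A has zero diagonal and off-diagonal entries 1/2, λ_max(A) = cos(π/(m+1)), and the relaxation time −1/log ρ grows like ((m+1)/π)². A sketch under these assumptions:

```python
import math
import numpy as np

def A_matrix_bm(m, delta):
    """A = I - diag(Lambda_11^-1, ...) Lambda for Brownian-bridge knots."""
    Lam = np.zeros((m, m))
    np.fill_diagonal(Lam, 2.0 / delta)
    i = np.arange(m - 1)
    Lam[i, i + 1] = Lam[i + 1, i] = -1.0 / delta
    return np.eye(m) - np.diag(1.0 / np.diag(Lam)) @ Lam

m, delta = 25, 0.2
lam_max = np.max(np.linalg.eigvalsh(A_matrix_bm(m, delta)))
rho = lam_max ** 2                 # checkerboard / lexicographic rate
relax = -1.0 / math.log(rho)       # relaxation time, grows like m^2
```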
Theorem 2.
We have

λ_max(A) = 2 |c(δ_{m,T})| cos(π/(m+1)),   with c(δ_{m,T}) := Corr(X_δ, X_{2δ} | X_0, X_{3δ}).

In particular:

ρ_{m,T} = 4 c(δ_{m,T})² cos²(π/(m+1)),   ρ_rand = [(m − 1 + 4 c(δ_{m,T})² cos²(π/(m+1)))/m]^m.

The form of c(δ_{m,T}) will, in general, depend on the type of Gaussian process that is being considered. In the following corollaries we present more explicit versions of the statements from Theorem 2 for the two choices of P: scaled Brownian motion σW, with σ > 0; and the Ornstein–Uhlenbeck process. Without loss of generality we centre the latter at 0:

d X_t = −θ X_t d t + σ d W_t,   X_0 = x_0,   t ∈ [0, T].   (10)

Corollary 1. If P is the law of a scaled Brownian motion σW, σ > 0, then:

ρ_{m,T} = cos²(π/(m+1)),   ρ_rand = [(m − 1 + cos²(π/(m+1)))/m]^m.

In particular, independently of T, as m → ∞,

ρ_{m,T} = 1 − (π/(m+1))² + O(m^{−4}),   ρ_rand = 1 − (π/(m+1))² + O(m^{−4}).

Corollary 2. If P is the law of the Ornstein–Uhlenbeck process (10), then the correlation c(δ_{m,T}) can be computed in closed form (see (B.6)), and in particular, when T = o(m²):

lim_{m→∞} (1 − ρ_{m,T})(π/(m+1))^{−2} = lim_{m→∞} (1 − ρ_rand)(π/(m+1))^{−2} = 1.

We can now combine these results with (6) to find the relaxation time:
Theorem 3.
Under the checkerboard, lexicographic, and random updating schemes (and, if P is the law of the Ornstein–Uhlenbeck process (10), also assuming T = o(m²) as m → ∞), we have 𝒯(m) = O(m²).

We can minimise the cost of blocking C_blocking(T, m) over the remaining parameter, m, using Theorem 3 and (7). This leads to Theorem 1, which is the main result of this paper (as presented in Section 1, with accompanying proof in Appendix B).

Consider a target process defined to be the solution of the following stochastic differential equation (with law P):

d X_t = sin(X_t) d t + d W_t,   X_0 = 0,   t ∈ [0, T].   (11)

This diffusion exhibits highly multimodal behaviour, and so in practice it is challenging to simulate trajectories of P (and in particular the conditioned bridge law P⋆ over large time horizons). It is possible to simulate trajectories exactly by means of path-space rejection sampling (as detailed in Appendix A). However, X is not a Gaussian process (it violates Assumption 2), and so Theorem 1 does not hold in a rigorous sense. As such, (11) makes an interesting case to investigate the practical limitations of Theorem 1. As we show below, the empirical results would suggest the theory holds more broadly.

We consider six problems (increasing in difficulty) of simulating paths according to laws P^{(T, x_0, x_T)}, for six choices of the pair (T, x_T) ranging from short to long bridges. The values of the end-points were chosen by fixing T, simulating multiple paths according to (11) and picking x_T to be some point in the vicinity of the (largest) mode. For the smallest T the plotted paths resemble Brownian bridges, but as T increases the non-linear dynamics become pronounced: the diffusion is effectively attracted to a ladder of values and it is repelled at the intermediate points, leading to multimodal behaviour of the trajectories. Drawing paths from the last three laws using path-space rejection sampling but without blocking (an unmodified rejection sampler) is computationally infeasible.
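The attraction to a ladder of values can be seen even with a crude Euler–Maruyama discretisation of the sine diffusion (this is not the exact method used in the paper; the drift follows our reading of (11), and the step size, horizon and seed below are arbitrary illustrative choices):

```python
import math
import random

def euler_sine(T, x0=0.0, dt=0.01, seed=7):
    """Crude Euler-Maruyama scheme for dX_t = sin(X_t) dt + dW_t."""
    rng = random.Random(seed)
    n = int(T / dt)
    x, path = x0, [x0]
    for _ in range(n):
        x += math.sin(x) * dt + rng.gauss(0.0, math.sqrt(dt))
        path.append(x)
    return path

path = euler_sine(T=50.0)
# stable points of the drift sit at odd multiples of pi, 2*pi apart, so long
# trajectories tend to linger near this "ladder" of values
```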
Figure 1: Time (in seconds; log-transformed) required to sample a single path of the sine diffusion (11) as a function of the number of knots used.

For each of the six examples we ran a blocked rejection sampler with the checkerboard updating scheme for a fixed number of iterations and with various numbers of knots. For the first three problems we also employed an unmodified rejection sampler. We recorded the time required to sample a single path (which for a blocked rejection sampler is counted as one execution of the inner for-loop of Algorithm 1) and plotted it in Figure 1 against the number of knots used.

For the smallest value of T the unmodified rejection sampler clearly outperforms any blocking scheme. This is unsurprising, as paths under the corresponding bridge law closely resemble Brownian bridges (and indeed every diffusion behaves as a drifted Brownian motion on a small-enough time-scale). However, as T increases this pattern changes, and blocking reduces the cost of obtaining any single sample path. In particular, notice a steep, exponential reduction in cost that is especially pronounced for the longest bridge (this would be illustrated even more emphatically by the two most difficult examples had the corresponding experiments with a lower number of knots been run; however, their costs are prohibitively high and had to be omitted).

Figure 1, though helpful in confirming Proposition 1, does not take into account the cost due to decreased speed of mixing, which is the main motivation for the developments presented in Section 3. To incorporate also this cost, we plot in Figure 2 the time-adjusted effective sample size (taESS), with taESS := [effective sample size]/[elapsed time in seconds to sample an entire chain] (and ESS was computed according to Gelman et al. (2013, Section 11.5)), against the half-length of blocks (i.e. δ_{m,T}). As defined, taESS is approximately equal to the number of independent samples that can be drawn in one second. Clearly, the larger taESS is, the more efficient the algorithm is.

Figure 2: Time-adjusted effective sample size vs half-length of blocks (i.e. δ_{m,T}).

First, for any experiment we expect there to be a point beyond which increasing the number of knots any further will only lead to a decrease in taESS; this corresponds to all costs being dominated by the cost due to a slowdown in mixing, and it is clearly illustrated by the sharp dips of the curves on the left side of Figure 2. Second, for examples for which the target law is sufficiently different from the law of Brownian bridges we expect that some level of blocking will improve the overall computational cost. This is also confirmed by the declines of the taESS curves toward the right side of Figure 2. We note that under the most difficult sampling regimes it was impractical to run the algorithm with even fewer blocks due to excessive execution times; had the examples been run and the curves continued, the decline in performance would have been even starker. Additionally, Figure 2 is suggestive of there being an optimal value of δ_{m,T} that is almost independent of T and m and that yields the highest taESS in each experiment. This is consistent with the results of Section 3, where an optimal number of knots was found to be roughly m = c T^{1+χ} for some arbitrarily small χ > 0 and some c > 0, which implies the claim about the dependence of the optimal δ_{m,T} on T and m.

Finally, we verify the bound from (2) empirically. To this end, notice that taESS^{−1} is approximately equal to the amount of time needed to obtain a single independent sample. This is consistent with the characterisation of the computational cost of a blocked rejection sampler as given in (7). Theorem 1 asserts that this cost scales at most (slightly in excess of) cubically in the duration of the bridge, so long as δ_{m,T} is taken as approximately constant when T → ∞. Consequently, taESS^{−1}(T) should be at most a cubic function of T, and if plotted on a log-log scale this would be equivalent to taESS^{−1}(T) tracking some line with slope 3. Figure 3 gives this precise plot, showing that the prediction (2) is indeed satisfied.

Figure 3: Computational cost as a function of time for the sine example.

In this article we have analysed and provided practical guidance for using blocking schemes when conducting Bayesian inference for discretely observed diffusions. We achieved this by studying the computational cost of diffusion bridge sampling algorithms. We have shown rigorously that the computational cost of rejection sampling on path-space (modified with blocking) targeting the law of a Gaussian diffusion scales at most as O(T^{3+χ}) as T → ∞ (for an arbitrarily small χ > 0), so long as the number of equidistant anchors is m = c T^{1+χ} (for some χ > 0 and c > 0). Furthermore, using the example of a non-linear sine diffusion we provide empirical evidence which would suggest that these same conclusions hold for diffusions outside of this restrictive class.

Theorem 1 indicates that choosing the number of knots in excess of the guideline we provide results in a penalty which is polynomial in T, whereas critically choosing too few knots results in the computational cost being dominated by the exponential cost for imputing diffusion bridges between successive knots (see Proposition 1). As such, our guideline of choosing m = c T^{1+χ} (for some small χ > 0 and some c > 0) is useful for ensuring the robustness of blocking schemes, and the simplification of choosing m ∝ T is a reasonable heuristic for practitioners. Note that although choosing too many knots is likely to be penalised less than choosing too few, choosing an excessive number of knots can negatively impact the mixing of the underlying chain.

Naturally, for more general laws P it might be useful to consider using irregularly spaced anchors (and so relaxing Assumption 1). Heuristically, we may wish to place more knots in areas in which the proposal law does not approximate the target law well. Developing more general theory to support the use of an irregular spacing of anchors is likely to require more knowledge of the specific diffusion under study. Of course, from a methodological perspective this motivates future research looking at how to place knots by assessing proposal-target discrepancy, or developing adaptive schemes. Finally, it is worth recalling that within the context of Bayesian inference for discretely observed diffusion processes, the full chain in this setting is a Gibbs sampler that alternates between updating the unknown parameters and imputing the unobserved path. Since the mixing time of the unobserved path influences the mixing time of the parameter chain, in light of the work in this paper it may as a future extension also be possible to study the mixing behaviour of the parameter chain.

A Rejection sampling on path-space
In this article we have restricted our attention to the class of diffusion bridges which can be sampled by means of path-space rejection sampling. In particular, to sample from P⋆ we sample trajectories from an accessible and absolutely continuous proposal law (denoted Q∗), and accept with probability proportional to the Radon-Nikodým derivative of P⋆ with respect to Q∗ (Beskos & Roberts 2005, Beskos et al. 2006, 2008, Pollock et al. 2016). To find an appropriate Q∗ we impose the following common assumption (Kloeden & Platen 2013, Section 4.4).

Assumption 3.
There exists η ∶ R d → R d such that ∇ η = σ − .Under Assumption 3 the process Y ∶= { η ( X t ) , t ∈ [ , T ]} satisfies the following stochastic differential equation, d Y t = α ( Y t ) d t + d W t , Y = y ∶= η ( x ) , t ∈ [ , T ] , for a known, closed-form drift α . With unit volatility, the law of Y is now absolutely continuous with respect toBrownian motion, and so Brownian motion is a viable proposal law for sampling from P (and the law induced by theBrownian bridge is a viable proposal for P ⋆ ).In order to avoid unnecessary inflation of notation we assume throughout the article that σ ≡ in (1), so that α ≡ b , η becomes an identity map, and X ≡ Y . The general case of σ (that satisfies Assumption 3) follows without additionaleffort. Assumption 4. α is at least once continuously differentiable. Assumption 5.
There exists a potential function A ∶ R d → R such that ∇ A = α . Assumption 6.
The function φ(y) := (‖α(y)‖² + ∆A(y))/2 is bounded from below by Φ := inf{φ(y) : y ∈ R^d} ∈ R.

Under Assumptions 4–6 we have (Beskos & Roberts 2005, Section 3):

dP⋆/dQ∗({η^{−1}(Y_t), t ∈ [0, T]}) ∝ exp{ −∫_0^T (φ(Y_t) − Φ) dt } =: p(Y) ≤ 1. (A.1)

It follows that sampling from P⋆ can be accomplished using Algorithm 2. Note that computing the integral in (A.1) required for Algorithm 2 can be achieved either (i) approximately, by simulating a candidate Y○ over a fine mesh and computing the integral in (A.1) numerically; or (ii) exactly, via an additional randomization step that utilises a Poisson point process (Beskos & Roberts 2005, Beskos et al. 2006, 2008). We refer to these two methods as, respectively, approximate and exact path-space rejection samplers.

Algorithm 2:
Rejection sampling on path-space

while True do
    Draw Y○ ∼ W(T, η(x_0), η(x_T)), i.e. a d-dimensional Brownian bridge joining η(x_0) and η(x_T) on [0, T];
    Draw U ∼ Unif([0, 1]);
    if U ≤ p(Y○) then
        Set X ← {η^{−1}(Y○_t), t ∈ [0, T]};
        return X
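As an illustration of the approximate variant, the sketch below implements Algorithm 2 for a one-dimensional Ornstein–Uhlenbeck target with drift α(y) = −θy, for which φ(y) = (θ²y² − θ)/2 and Φ = −θ/2, discretising the integral in (A.1) with the trapezoidal rule. The function names, the fixed mesh size, and the choice of target are illustrative assumptions, not part of the paper.

```python
import numpy as np

def brownian_bridge(x0, xT, T, n, rng):
    """Sample a Brownian bridge from x0 to xT on a grid of n+1 points."""
    t = np.linspace(0.0, T, n + 1)
    dW = rng.normal(scale=np.sqrt(np.diff(t)))
    W = np.concatenate([[0.0], np.cumsum(dW)])
    # Standard construction: tilt the Brownian path so it hits both endpoints.
    return t, x0 + W - (t / T) * (W[-1] - (xT - x0))

def sample_bridge_rejection(x0, xT, T, theta=1.0, n=200, rng=None):
    """Approximate path-space rejection sampler for an OU bridge (drift -theta*y)."""
    rng = np.random.default_rng() if rng is None else rng
    phi = lambda y: 0.5 * (theta**2 * y**2 - theta)  # (||alpha||^2 + div alpha)/2
    Phi = -0.5 * theta                               # infimum of phi over R
    while True:
        t, Y = brownian_bridge(x0, xT, T, n, rng)
        f = phi(Y) - Phi
        # Trapezoidal approximation of the integral in (A.1).
        log_p = -np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))
        if np.log(rng.uniform()) <= log_p:
            return t, Y
```

In line with the assumptions that follow, the mesh width (and hence the per-proposal cost) is kept fixed as T grows; only the acceptance probability deteriorates.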
To prove Proposition 1, which quantifies the computational cost of Algorithm 2, we impose the following natural assumptions on the cost of simulating a proposal trajectory.
Assumption 7.
The cost of generating any proposal sample X is independent of the value of p(X) as defined in (A.1).

Assumption 8.
The cost c(X) of simulating X has expectation growing linearly in T: E[c(X)] = c_1 T, c_1 ∈ R_+.

Assumptions 7 and 8 are always satisfied if rejection sampling is performed with the approximate method described above, so long as the mesh width is kept constant as T → ∞. For the exact method, Assumption 7 will in general be violated (for instance, if the number of simulated Poisson points is zero, then conditional on this information p(X) = 1 a.s.), but for T → ∞ it is a good enough approximation. Assumption 8 is satisfied if φ is bounded. We can now derive the cost of a single draw using a path-space rejection sampler as follows:

Lemma 3.
Under Assumptions 3–8:

C_rej(T) = c (q_T(x_0, x_T) / p_T(x_0, x_T)) T e^{−ΦT},

where c > 0 is some constant independent of T, p_T(x_0, x_T) is the transition density under P for going from x_0 to x_T over the interval [0, T], and q_T(x_0, x_T) is the same transition density, but under the proposal law Q instead.

Proof. Denote by X^{(i)}, i ∈ {1, 2, ...}, independent samples from Q∗ and by c(X^{(i)}) the cost of sampling path X^{(i)}, i ∈ {1, 2, ...}. Rejection sampling requires a geometrically distributed number of simulations (with a randomly distributed parameter at each trial), so its expected cost is

C_rej(T) := E[ Σ_{i=1}^∞ ( Σ_{j=1}^i c(X^{(j)}) ) p(X^{(i)}) Π_{j=1}^{i−1} (1 − p(X^{(j)})) ]
= E[c(X^{(1)})] Σ_{i=1}^∞ i E[p(X^{(i)})] Π_{j=1}^{i−1} (1 − E[p(X^{(j)})])
= E[c(X)] / E[p(X)], (A.2)

where the measures with respect to which the expectations above are taken should be clear from the context (and include Brownian bridge measures, products of Brownian bridge measures, and any additional randomness needed to simulate events of probability p(X^{(i)})). We now have:

E_{Q∗}[p(X)] = E_{Q∗}[ exp{ −∫_0^T (φ(X_t) − Φ) dt } ] (A.3)
= E_{Q∗}[ exp{ [A(X_T) − A(X_0)] − [A(X_T) − A(X_0)] − ∫_0^T (φ(X_t) − Φ) dt } ]
= exp{ −A(x_T) + A(x_0) + ΦT } (p_T(x_0, x_T) / q_T(x_0, x_T)) E_{Q∗}[ dP⋆/dQ∗ (X) ]
= c_2 (p_T(x_0, x_T) / q_T(x_0, x_T)) e^{ΦT}, (A.4)

where c_2 := exp{ −A(x_T) + A(x_0) } and where the third equality follows from Dacunha-Castelle & Florens-Zmirou (1986, Eq (3.1)). The result now follows by substituting (A.4) into (A.2) and noting that, by Assumption 8, we have E[c(X)] = c_1 T.

To better understand the scaling with T of the ratio of transition densities under the laws P and Q in Lemma 3 we impose the following final assumption, which allows us to establish Lemmata 4 and 5 required for proving Proposition 1.

Assumption 9.
The target diffusion is ergodic and defined on R^d.

Lemma 4.
Under Assumption 9, for f : T ↦ q_T(x_0, v)/p_T(x_0, v), v ∈ R^d, we have that f(T) ∼ T^{−d/2} as T → ∞, where d denotes the dimension of the process.
Proof. q and p are well-behaved densities, which means f is continuous. As the target diffusion is ergodic, p_T(x_0, v) → p̂(v) as T → ∞, where p̂ is the stationary density of the diffusion law. On the other hand, q_T(x_0, v) is just a Gaussian density with variance TI_d, which for T → ∞ behaves as ∼ T^{−d/2}.

Lemma 5.
Assumption 9 implies that Φ < 0.

Proof. From Lemma 4 we have that the RHS of (A.4) is ∼ T^{d/2} e^{ΦT} as T → ∞. As the LHS of (A.4) represents an expected probability, we must have Φ < 0 for this expression to take values in [0, 1].

We are now in a position to prove Proposition 1.

Proof of Proposition 1.
This follows directly by combining Lemmata 3, 4 and 5.
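The expected-cost identity (A.2) underpinning Lemma 3 can be checked with a quick simulation. For simplicity the check below uses a constant acceptance probability p rather than the random p(X^{(i)}); the constants c1 and p are arbitrary illustrative values.

```python
import numpy as np

# Empirical check of (A.2): if each proposal costs c1 and is accepted
# independently with probability p, the number of proposals until the first
# acceptance is Geometric(p), so the expected total cost is c1 / p.
rng = np.random.default_rng(0)
c1, p = 2.0, 0.25
trials = rng.geometric(p, size=500_000)  # proposals needed per accepted draw
mean_cost = c1 * trials.mean()
assert abs(mean_cost - c1 / p) < 0.1     # c1 / p = 8
```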
B Proofs
Proof of Lemma 1.
The chain {G^{(l)}; l = 0, 1, ...} coincides with the chains considered in Roberts & Sahu (1997). In particular, the 1-step transition kernel under the lexicographic and checkerboard updating schemes is stated explicitly in Roberts & Sahu (1997, Lemma 1). We provide a proof for completeness. {G^{(l)}; l = 0, 1, ...} behaves like an AR(1) process, therefore

G^{(l+1)} = B G^{(l)} + ε,  ε ∼ N(b, V),

for some B, b, and V, and

G^{(l+n)} = B^n G^{(l)} + ε^{(n)},  ε^{(n)} ∼ N(b^{(n)}, V^{(n)}), (B.1)

with b^{(n)} := (I + B + ⋯ + B^{n−1})b = (I − B)^{−1}(I − B^n)b and some V^{(n)} that we are about to derive. Under either scheme b and B can be found in closed form (which we omit for brevity). If the chain has reached stationarity, i.e. if G^{(l)} ∼ N(µ, Σ), then also G^{(l+n)} ∼ N(µ, Σ). On the other hand, if G^{(l)} ∼ N(µ, Σ), then by (B.1)

G^{(l+n)} | G^{(l)} ∼ N( B^n G^{(l)} + (I − B)^{−1}(I − B^n)b, V^{(n)} ),

so that marginally the covariance matrix of G^{(l+n)} is B^n Σ (B^n)^T + V^{(n)}. Consequently:

Σ = B^n Σ (B^n)^T + V^{(n)},

and this yields (9).

Proof of Lemma 2.
Since the chain {G^{(l)}; l = 0, 1, ...} coincides with the chains considered in Roberts & Sahu (1997), the statement of Roberts & Sahu (1997, Theorem 1) applies under the checkerboard and lexicographic updating schemes: i.e. the L² convergence rates under the two regimes are given by ρ_check = ρ_spec(B_check) and ρ_lex = ρ_spec(B_lex) respectively. Due to the tridiagonal structure of the precision matrix Λ (which follows from the Markov property of the process G; see also a short explanation in the proof of Theorem 2 that leads up to (B.3)), Roberts & Sahu (1997, Corollary 3) implies that the two spectral radii coincide, i.e. ρ_spec(B_lex) = ρ_spec(B_check). By the same token, Roberts & Sahu (1997, Theorem 5) applies as well, yielding ρ_spec(B_check) = λ_max(A)². Finally, the L² convergence rate of the random updating scheme follows from Roberts & Sahu (1997, Theorem 2).

Proof of Theorem 2.
The precision matrix Λ of any random vector G with non-degenerate covariance matrix can be related to a matrix of partial correlations via (Lauritzen 1996, p. 130):

Corr(G[i], G[j] | G∖{G[i], G[j]}) = −Λ[i, j] / √(Λ[i, i] Λ[j, j]). (B.2)

By the definition of G in (8), it is easy to see that Corr(G[i], G[j] | G∖{G[i], G[j]}) = 0 whenever |i − j| > 1; that by symmetry Λ[i, i+1] = Λ[i+1, i], (i = 1, ..., m − 1); and that Λ[i, i] = Λ[j, j], (i, j = 1, ..., m), because Var(G[i] | G∖G[i]) = (Λ[i, i])^{−1}, (i = 1, ..., m) (Roberts & Sahu 1997, p. 296). In addition, under Assumption 2 the covariance matrix depends only on time and not on the state variable; thus, combining this with Assumption 1: Corr(G[i], G[i+1] | G∖{G[i], G[i+1]}) =: c(δ_{m,T}), (i = 1, ..., m − 1). Consequently, Λ is a Toeplitz matrix whose
non-zero entries are related via Λ[i, i+1] = Λ[i+1, i] = −Λ[i, i] c(δ_{m,T}), (i = 1, ..., m − 1). The form of the matrix A now follows:

A =
⎛ 0          c(δ_{m,T})  0           ...         0          ⎞
⎜ c(δ_{m,T}) 0           c(δ_{m,T})  ⋱           ⋮          ⎟
⎜ 0          ⋱           ⋱           ⋱           0          ⎟
⎜ ⋮          ⋱           c(δ_{m,T})  0           c(δ_{m,T}) ⎟
⎝ 0          ...         0           c(δ_{m,T})  0          ⎠. (B.3)

The eigenvalues of such tridiagonal Toeplitz matrices may be found in closed form (Smith 1985, Kulkarni et al. 1999); in particular, those of the matrix A are given by

−2c(δ_{m,T}) cos(πl/(m + 1)),  l = 1, ..., m.

Depending on the sign of c(δ_{m,T}), the maximal eigenvalue of A is therefore given by:

λ_max(A) = −2c(δ_{m,T}) cos(πm/(m + 1)) = 2c(δ_{m,T}) cos(π/(m + 1)), if c(δ_{m,T}) > 0;
λ_max(A) = −2c(δ_{m,T}) cos(π/(m + 1)), if c(δ_{m,T}) < 0,

and the result concerning λ_max(A) follows. The remaining statements follow as well by substituting the expression for λ_max(A) into Lemma 2.

Proof of Corollary 1.
By Theorem 2, only c(δ_{m,T}) needs to be computed. This follows from standard properties of Brownian motion and bridges:

c(δ_{m,T}) = Cov(X_δ, X_{2δ} | X_0, X_{3δ}) / √( Var(X_δ | X_0, X_{3δ}) Var(X_{2δ} | X_0, X_{3δ}) ) = (δσ²/3) / √( (2δσ²/3)² ) = 1/2. (B.4)

The asymptotic behaviour of ρ_{m,T} follows immediately from a Taylor expansion of cos(x) around 0. For the asymptotic behaviour of ρ_rand, notice that by Taylor expansions of cos(x) around 0, log(1 − x) around 0, and exp(x) around 0 respectively:

ρ_rand = exp{ m log[ m^{−1} { m − 1 + cos²(π/(m + 1)) } ] }
= exp{ m log[ 1 − m^{−1} (π/(m + 1))² + O(m^{−5}) ] }
= exp{ −(π/(m + 1))² + O(m^{−4}) }
= 1 − (π/(m + 1))² + O(m^{−4}).

Proof of Corollary 2.
For the Ornstein–Uhlenbeck process we have, for 0 < s < t < T:

Cov[ (Y_s, Y_t)^T | Y_0, Y_T ] = (σ²/θ) ⎛ e^{−θs} sinh(θs)   e^{−θt} sinh(θs) ⎞
                                        ⎝ e^{−θt} sinh(θs)   e^{−θt} sinh(θt) ⎠
− (σ²/θ) ⎛ e^{−θT} sinh²(θs)/sinh(θT)         e^{−θT} sinh(θs) sinh(θt)/sinh(θT) ⎞
         ⎝ e^{−θT} sinh(θs) sinh(θt)/sinh(θT) e^{−θT} sinh²(θt)/sinh(θT)         ⎠. (B.5)

It now follows from direct substitution of the relevant terms of (B.5) (with s = δ, t = 2δ and T = 3δ) into the definition of the partial correlation (used for instance in (B.4)) that:

c(δ_{m,T}) = [ e^{−2θδ} sinh(θδ) − e^{−3θδ} sinh(θδ) sinh(2θδ)/sinh(3θδ) ]
/ √( [ e^{−θδ} sinh(θδ) − e^{−3θδ} sinh²(θδ)/sinh(3θδ) ] [ e^{−2θδ} sinh(2θδ) − e^{−3θδ} sinh²(2θδ)/sinh(3θδ) ] ). (B.6)

To uncover the asymptotic behaviour of ρ_{m,T} (as m → ∞), we first rely on Taylor expansions to derive the estimate
e^{−aδ} sinh(bδ) sinh(dδ) = bdδ² − abdδ³ + ( (b³d + bd³)/6 + a²bd/2 )δ⁴ − ( ab³d + abd³ + a³bd )δ⁵/6 + O(δ⁶), δ → 0,

from which it follows (applying the expansion to each term of (B.6), where the numerator and both variance terms share the same leading behaviour) that

c(δ_{m,T}) = ( θδ/3 − 7θ³δ³/18 + O(δ⁵) ) / ( 2θδ/3 − 4θ³δ³/9 + O(δ⁵) )
= (1/2) ( 1 − 7θ²δ²/6 + O(δ⁴) ) / ( 1 − 2θ²δ²/3 + O(δ⁴) )
= 1/2 − θ²δ²/4 + O(δ⁴), δ → 0.

This yields asymptotically:

ρ_{m,T} = ( 1 − θ²δ² + O(δ⁴) ) ( 1 − (πδ/T)² + O(δ⁴) )
= 1 − δ²[ π²/T² + θ² ] + O(δ⁴), δ → 0.

The asymptotic convergence rate ρ_rand follows similarly:

ρ_rand = exp{ m log[ m^{−1} { m − 1 + ( 1 − θ²δ² + O(δ⁴) )( 1 − (πδ/T)² + O(δ⁴) ) } ] }
= exp{ m log[ 1 − m^{−1} δ²[ π²/T² + θ² ] + m^{−1} O(δ⁴) ] }
= exp{ −δ²[ π²/T² + θ² ] + O(δ⁴) }
= 1 − δ²[ π²/T² + θ² ] + O(δ⁴), δ → 0.

Proof of Theorem 3.
The result follows immediately from (6) and Corollaries 1 and 2.
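The eigenvalue formula at the heart of the proof of Theorem 2 is easy to verify numerically; the value c(δ_{m,T}) = 1/2 below corresponds to the Brownian bridge case of Corollary 1, and the choice m = 50 is illustrative.

```python
import numpy as np

# Verify the closed-form eigenvalues of the tridiagonal Toeplitz matrix (B.3)
# for c = 1/2, in which case lambda_max(A) = cos(pi/(m+1)) and
# lambda_max(A)^2 = 1 - (pi/(m+1))^2 + O(m^-4).
m, c = 50, 0.5
A = c * (np.eye(m, k=1) + np.eye(m, k=-1))   # zero diagonal, c off-diagonals
lam_max = np.linalg.eigvalsh(A).max()
assert abs(lam_max - 2 * c * np.cos(np.pi / (m + 1))) < 1e-10
assert abs(lam_max**2 - (1 - (np.pi / (m + 1))**2)) < 1e-5
```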
Proof of Theorem 1.
This follows from Theorem 3 and (7). We minimize the cost of blocking C_blocking(T, m) over the remaining hyperparameter m and derive its final form as a function of T.

1. If P is the law of the Ornstein–Uhlenbeck process, then the requirement T = o(m) from Theorem 3 already implies that δ_{m,T} < c for large enough m and T, and thus C_blocking(T, m) ∼ T(m)T. As T(m) = O(m), in order to minimize C_blocking(T, m) we should take m = cTg(T), for some c > 0, and g(T) such that g(T) → ∞ as T → ∞ at a rate that is as slow as possible.

2. On the other hand, if P is the law of the scaled Brownian motion, then the fastest growing contribution is that from the exponential term in (7), and in order to annul it we should take m = cT.
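The partial correlation (B.6) for the Ornstein–Uhlenbeck bridge can be evaluated numerically as a sanity check; the helper below is an illustrative implementation of (B.6), and the comparison value 1/2 − (θδ)²/4 is the small-δ leading-order behaviour (consistent with the Brownian-bridge value 1/2 as θδ → 0).

```python
import numpy as np

def c_ou(theta, d):
    """Partial correlation (B.6) for the OU bridge with knot spacing d."""
    s3 = np.sinh(3 * theta * d)
    num = (np.exp(-2 * theta * d) * np.sinh(theta * d)
           - np.exp(-3 * theta * d) * np.sinh(theta * d) * np.sinh(2 * theta * d) / s3)
    v1 = (np.exp(-theta * d) * np.sinh(theta * d)
          - np.exp(-3 * theta * d) * np.sinh(theta * d) ** 2 / s3)
    v2 = (np.exp(-2 * theta * d) * np.sinh(2 * theta * d)
          - np.exp(-3 * theta * d) * np.sinh(2 * theta * d) ** 2 / s3)
    return num / np.sqrt(v1 * v2)

# Agreement with the expansion 1/2 - (theta*d)^2/4 for small d.
theta, d = 1.0, 0.1
assert abs(c_ou(theta, d) - (0.5 - (theta * d) ** 2 / 4)) < 5e-4
assert abs(c_ou(theta, 1e-3) - 0.5) < 1e-5
```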
References
Amit, Y. (1991), 'On rates of convergence of stochastic relaxation for Gaussian and non-Gaussian distributions', Journal of Multivariate Analysis (1), 82–99.
Arnaudon, A., van der Meulen, F., Schauer, M. & Sommer, S. (2020), 'Diffusion bridges for stochastic Hamiltonian systems with applications to shape analysis', arXiv preprint arXiv:2002.00885.
Beskos, A., Papaspiliopoulos, O. & Roberts, G. O. (2006), 'Retrospective exact simulation of diffusion sample paths with applications', Bernoulli (6), 1077–1098.
Beskos, A., Papaspiliopoulos, O. & Roberts, G. O. (2008), 'A factorisation of diffusion measure and finite sample path constructions', Methodology and Computing in Applied Probability (1), 85–104.
Beskos, A. & Roberts, G. O. (2005), 'Exact simulation of diffusions', The Annals of Applied Probability (4), 2422–2444.
Bladt, M., Sørensen, M. et al. (2014), 'Simple simulation of diffusion bridges with application to likelihood inference for diffusions', Bernoulli (2), 645–675.
Boys, R. J., Wilkinson, D. J. & Kirkwood, T. B. L. (2008), 'Bayesian inference for a discretely observed stochastic kinetic model', Statistics and Computing (2), 125–135.
Chib, S., Pitt, M. K. & Shephard, N. (2004), 'Likelihood based inference for diffusion driven models'.
Dacunha-Castelle, D. & Florens-Zmirou, D. (1986), 'Estimation of the coefficients of a diffusion from discrete observations', Stochastics: An International Journal of Probability and Stochastic Processes (4), 263–284.
Delyon, B. & Hu, Y. (2006), 'Simulation of conditioned diffusion and application to parameter estimation', Stochastic Processes and their Applications (11), 1660–1675.
Durham, G. B. & Gallant, A. R. (2002), 'Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes', Journal of Business & Economic Statistics (3), 297–338.
Freidlin, M. I. & Wentzell, A. D. (1993), 'Diffusion processes on graphs and the averaging principle', The Annals of Probability pp. 2215–2245.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. & Rubin, D. B. (2013), Bayesian Data Analysis, CRC Press.
Golightly, A. & Wilkinson, D. J. (2008), 'Bayesian inference for nonlinear multivariate diffusion models observed with error', Computational Statistics & Data Analysis (3), 1674–1693.
Hairer, M., Stuart, A. M., Voss, J. et al. (2011), 'Sampling conditioned hypoelliptic diffusions', The Annals of Applied Probability (2), 669–698.
Kalogeropoulos, K. (2007), 'Likelihood-based inference for a class of multivariate diffusions with unobserved paths', Journal of Statistical Planning and Inference (10), 3092–3102.
Kalogeropoulos, K., Roberts, G. O. & Dellaportas, P. (2010), 'Inference for stochastic volatility models using time change transformations', The Annals of Statistics (2), 784–807.
Karatzas, I. & Shreve, S. E. (1998), Methods of Mathematical Finance, Vol. 39, Springer.
Kloeden, P. E. & Platen, E. (2013), Numerical Solution of Stochastic Differential Equations, Vol. 23, Springer Science & Business Media.
Kulkarni, D., Schmidt, D. & Tsui, S. K. (1999), 'Eigenvalues of tridiagonal pseudo-Toeplitz matrices', Linear Algebra and its Applications (1-3), 63–80.
Lansky, P. & Ditlevsen, S. (2008), 'A review of the methods for signal estimation in stochastic diffusion leaky integrate-and-fire neuronal models', Biological Cybernetics (4-5), 253.
Lauritzen, S. L. (1996), Graphical Models, Vol. 17, Clarendon Press.
Levin, D. A. & Peres, Y. (2017), Markov Chains and Mixing Times, Vol. 107, American Mathematical Soc.
Øksendal, B. (2003), Stochastic Differential Equations, Springer.
Pollock, M., Johansen, A. M. & Roberts, G. O. (2016), 'On the exact and ε-strong simulation of (jump) diffusions', Bernoulli (2), 794–856.
Roberts, G. O. & Rosenthal, J. S. (2001), 'Markov chains and de-initializing processes', Scandinavian Journal of Statistics (3), 489–504.
Roberts, G. O. & Sahu, S. K. (1997), 'Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler', Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2), 291–317.
Roberts, G. O. & Stramer, O. (2001), 'On inference for partially observed nonlinear diffusion models using the Metropolis–Hastings algorithm', Biometrika (3), 603–621.
Schauer, M., van der Meulen, F., van Zanten, H. et al. (2017), 'Guided proposals for simulating multi-dimensional diffusion bridges', Bernoulli (4A), 2917–2950.
Shephard, N. & Pitt, M. K. (1997), 'Likelihood analysis of non-Gaussian measurement time series', Biometrika (3), 653–667.
Smith, G. D. (1985), Numerical Solution of Partial Differential Equations: Finite Difference Methods, Oxford University Press.
Stramer, O. & Roberts, G. O. (2007), 'On Bayesian analysis of nonlinear continuous-time autoregression models', Journal of Time Series Analysis (5), 744–762.
van der Meulen, F. & Schauer, M. (2018), 'Bayesian estimation of incompletely observed diffusions', Stochastics 90.