Unbiased simulation of rare events in continuous time
James Hodgson, Adam M. Johansen, and Murray Pollock
Department of Statistics, University of Warwick, Coventry, CV4 7AL; School of Mathematics, Statistics and Physics, Newcastle University, Newcastle-upon-Tyne, NE1 7RU; The Alan Turing Institute, The British Library, London, NW1 2DB

*[email protected]

February 17, 2021
Abstract
For rare events described in terms of Markov processes, truly unbiased estimation of the rare event probability generally requires the avoidance of numerical approximations of the Markov process. Recent work in the exact and ε-strong simulation of diffusions, which can be used to almost surely constrain sample paths to a given tolerance, suggests one way to do this. We specify how such algorithms can be combined with the classical multilevel splitting method for rare event simulation. This provides unbiased estimates of the probability in question. We discuss the practical feasibility of the algorithm with reference to existing ε-strong methods and provide proof-of-concept numerical examples.

Keywords: epsilon-strong simulation, exact simulation, Feynman-Kac, sequential Monte Carlo, splitting
Rare events are those which have (very) low probability of occurrence. Estimating the probability of rare events is important, among other places, throughout the natural and social sciences; see, for example, [35, Part II] for a broad range of applications. The case of interest here is that where the rare event corresponds to a continuous-time Markov process hitting a particular set before it enters some other recurrent set. This setting has attracted considerable attention in the literature, and general solutions centre around simulation-based methods.

The principal approaches fall into two broad categories, importance sampling and splitting. In importance sampling, one simulates from a process for which the event of interest is more likely to occur, and corrects for the change of sampling distribution using importance weights. With splitting methods, trajectories which approach the rare set (in an appropriate sense) are replicated to allow lower-variance estimation of the target probability. This paper is concerned with splitting methods, in particular with implementing such methods with no bias for a broad class of continuous-time processes. Existing methods depend upon time-discretisation and hence introduce a difficult-to-quantify bias. It is shown in this article that the adaptation of ideas from the field of ε-strong simulation to this context allows this bias to be avoided.

Let (X(t) : t ≥ 0) be a continuous-time Markov process in R^d, and let A, B ⊂ R^d be disjoint sets, with A positive recurrent for X. The problem of interest is that of efficiently estimating the probability that X reaches set B before set A when this probability is very small. That is, writing τ_S for the first hitting time of a set S, the objective is to estimate p = P(τ_B < τ_A) ≪ 1. The assumption that p ≪ 1 means that the number of naive simulations needed to observe {τ_B < τ_A} enough times to get a reliable estimate will be impractically high.
In fact, the relative variance of the naive estimator obtained from N Monte Carlo simulations is p(1 − p)/N · p^{−2} ≈ 1/(Np), which suggests that a large multiple of p^{−1} ≫ 1 simulations would be required.

Multilevel splitting (MLS) is a popular algorithm based on targeting the rare event via a sequence of more likely events. The idea goes back to the 1951 paper of Kahn and Harris [25] (who in turn attribute the idea to von Neumann), which discusses an application to the transmission of particles through an impeding barrier in the context of nuclear shielding. The method is to choose a sequence of nested sets B_1 ⊃ B_2 ⊃ · · · ⊃ B_m = B, all disjoint from A, and use a particle system to sequentially estimate the probabilities P(τ_{B_i} < τ_A). Starting with a particle system of a large enough size, N, a reasonable fraction will reach B_1, allowing an estimate of p_1. Then, by branching (or “splitting”) those which do into R_1 copies, a healthy population can be maintained to estimate the subsequent probabilities. A detailed presentation is given in Section 2.1.

Splitting algorithms have been independently rediscovered many times and in many variants since the work of [25]. Prominent examples include the repetitive simulation trials after reaching thresholds (RESTART) algorithm of [36], developed for modelling packet loss probabilities in telecommunications, and the pruned-enriched Rosenbluth method (PERM) of [22] for simulating polymer chains.

The MLS algorithm we present in Section 2.1 is that found in [18], which addresses various implementation issues such as the choice of levels and importance function. The unbiasedness of the algorithm for discrete-time processes is shown rigorously in [1], which identifies and resolves an issue in the original argument of [18]. The construction of confidence intervals and the optimal choice of tuning parameters under cost constraints are addressed in [28, 29].
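The 1/(Np) behaviour of the naive estimator's relative variance is easy to verify directly. The following minimal sketch (our own illustration, not taken from the paper) computes the exact relative variance p(1 − p)/(N p²) and compares it with the approximation 1/(Np):

```python
def naive_relative_variance(p: float, n: int) -> float:
    """Relative variance Var(p_hat)/p^2 of the naive Monte Carlo estimator
    p_hat = (1/n) * sum of n i.i.d. Bernoulli(p) indicators."""
    return p * (1.0 - p) / n / p**2

# For a rare event, p(1-p)/(n p^2) is essentially 1/(n p): to reach even a
# relative variance of 1 (roughly 100% relative error) we need n of order 1/p.
p, n = 1e-6, 10**6
exact = naive_relative_variance(p, n)
approx = 1.0 / (n * p)
print(exact, approx)  # both very close to 1.0
```

So for p = 10⁻⁶, a million simulations still give an estimator whose standard deviation is as large as the quantity being estimated, which is the motivation for splitting.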
A characterization of the asymptotic properties of this algorithm, including a central limit theorem, is given in [16].

Choosing the nested sets and other parameters of the algorithm to maintain a particle population of stable size, rather than one which dies out or explodes, can be difficult. One practical variant which removes the difficulty of choosing the splitting ratios R_i in advance is that of [30], in which a first particle system is used to estimate the R_i, and a second system uses these estimated values to estimate p. An alternative idea is to construct the levels B_i adaptively, for example via the scheme of [11]. A generalisation of this scheme has recently been shown to be unbiased in [7].

The proof of unbiasedness in [1] also holds for a variant in which the initial system of N particles is kept at fixed size by sampling new trajectories uniformly at random (with replacement) from the surviving trajectories at each level. This variant is a simple type of sequential Monte Carlo method and can be understood within the framework of [13]; this approach is also discussed in [18] under the name of fixed effort splitting. In this version the sets B_i are still chosen in advance, but the number of particles is fixed at N for the duration of the algorithm. Rather than independently “splitting” each path which survives to B_i into a pre-determined number of offspring, exactly N particles are resampled (i.e. sampled with replacement) from among the surviving particles. This removes the difficulty of choosing a suitable splitting ratio in order to arrive at a stable population size, but it is more difficult to understand the variance properties of this algorithm, and even the asymptotic variance expression is somewhat more complex than the variance of the simple algorithm.

As discussed, in principle at least the estimate obtained using MLS is unbiased.
But in order to implement the algorithm, it is necessary to simulate hitting times and locations of the sets B_i for the process X. When dealing with discrete-time processes, there is often no difficulty in representing a full sample path, and so a direct implementation gives unbiased estimates. However, when the process of interest evolves in continuous time, Monte Carlo sample paths are usually constructed using a discrete-time numerical scheme, causing bias in the resulting estimates, at least outside the setting of finite activity pure jump processes. We discuss this issue and some related literature in Section 2.2. This paper introduces a framework which combines algorithms for the ε-strong simulation of sample paths of X (sometimes termed tolerance-enforced simulation) with MLS in order to obtain a truly unbiased algorithm.

For a chosen time horizon [s, t], ε-strong methods are explicit constructions of a family of random processes X̃_ε[s, t] indexed by the parameter ε > 0. These processes are defined jointly on the same probability space as X, admit a finite-dimensional representation, and satisfy

sup_{r ∈ [s,t]} ‖X(r) − X̃_ε(r)‖ < ε

almost surely, for an appropriate norm ‖·‖, typically the supremum norm ‖·‖_∞. Such paths constrain, almost surely, the range of X over the time interval [s, t] to within the tolerance ε. Moreover, for η < ε, such schemes allow one to sample X̃_η conditional on X̃_ε. This means it is possible to sample, exactly, Bernoulli random variables which indicate whether a path of X enters given subsets of R^d. We use such ε-strong samplers to construct a modified procedure for MLS, and establish that this modified procedure is unbiased and, in contrast to the abstract MLS algorithm without time discretisation, can be implemented for a class of continuous-time processes.

The literature on ε-strong simulation is closely related to that on the exact simulation of diffusions [4, 2]. Both constructions are rejection samplers on the space of continuous paths, using Brownian-like proposals, in which the acceptance probability for a given path can be assessed by looking at only finitely many points. The incorporation of exact simulation into an SMC algorithm has been successfully carried out in the filtering setting [17]; our approach here is different, using ε-strong simulation algorithms within the SMC context to allow for unbiased estimation of rare event probabilities. The unbiasedness of our approach follows from the possibility of exactly determining whether sample paths cross particular barriers, rather than from the use of unbiased random weights as in the random weight particle filter setting of [17].

The first ε-strong algorithm, for Brownian motion, was given in [3]. The construction exploits a certain representation of the escape probability of one-dimensional Brownian motion from a bounded interval.
The contributions of [33] showed that exact simulation was possible for a much wider class of diffusions and jump-diffusions than merely Brownian motion (including multidimensional diffusions amenable to the Lamperti transformation; see [31]). This construction is what we use for the development of our approach applied to MLS.

Although we focus in our implementation upon the algorithmic construction of [33], our framework can employ any ε-strong approach with certain basic properties, and we note that this area is being actively researched and a number of other such schemes can be found in the literature. Thus, in principle, the approach developed in this paper provides a mechanism for the unbiased estimation of rare event probabilities associated with a broad and increasing category of stochastic processes. A more general multidimensional ε-strong construction appears in [5], which employs altogether different techniques inspired by the theory of rough paths. Rather than sampling the diffusion path itself exactly, one instead discretely samples the Brownian path driving the diffusion. A rough path-type continuity result guarantees that passing this discrete Brownian sample through a modified Euler scheme gives a sample within ε-tolerance of the diffusion path. [12] contains a related construction for fractional Brownian motion of any Hurst index, and for solutions to SDEs driven by fractional Brownian motion with Hurst index H > 1/2. An ε-strong algorithm for a process which is not the solution to an SDE is described in [8]. Recent applications of this methodology include [32].

This paper will demonstrate that the information provided by ε-strong algorithms can be exploited within MLS algorithms to obtain an exact estimate of the desired rare event probability.
In Section 2, we introduce multilevel splitting formally, discuss the problem of estimation bias in practical implementations, and present ε-strong sampling as a way to overcome this. In Section 3, we develop in detail a method for implementing epsilon-strong samplers inside MLS algorithms and establish the unbiasedness of the resulting estimates. In Section 4, we offer some simple numerical simulations to illustrate the method. We conclude with a brief discussion in Section 5.

This section describes the existing work upon which the developments of Section 3 depend. In Section 2.1, we give a full description of multilevel splitting and its SMC variant, together with discussion of the unbiasedness of the corresponding estimators. In Section 2.2, we explain why unbiasedness generally does not hold in any existing implementation of these methods. Finally, Section 2.3 introduces a formalism of ε-strong simulation (following [5]), and describes how it can be used to address the shortcomings detailed in Section 2.2.

Recall that multilevel splitting requires the specification of a sequence of nested events R^d ⊃ B_1 ⊃ B_2 ⊃ · · · ⊃ B_m = B in such a way that the probabilities

p_i := P(τ_{B_1} < τ_A) for i = 1, and p_i := P(τ_{B_i} < τ_A | τ_{B_{i−1}} < τ_A) for 2 ≤ i ≤ m,

are large relative to p, and may consequently be estimated more efficiently.

In order to do this, it is convenient to assume the existence of a continuous function ξ : R^d → R of which the boundaries of A and the B_i are level sets. That is, we suppose that there are real numbers z_A < z_1 < z_2 < · · · < z_m = z_B such that

A = ξ^{−1}((−∞, z_A]),  B_i = ξ^{−1}([z_i, ∞)),

with boundaries ∂A = ξ^{−1}({z_A}), ∂B_i = ξ^{−1}({z_i}). The function ξ has numerous names in the literature, including the reaction co-ordinate, and is typically defined such that higher values represent locations closer to B, as it is throughout this paper.

We take (X(t) : t ≥ 0) to be a continuous-time Markov process with almost surely continuous sample paths (the central example of diffusion processes will be the focus of our algorithmic constructions). We take X(0) to be distributed according to an initial distribution λ, where the support of λ is contained in ξ^{−1}([z_A, z_B)) (so X may begin on the boundary ∂A of A; this choice is made for notational convenience, and the support of λ may be taken instead to be all of R^d with only minor modifications). Except where necessary, dependence upon λ will be suppressed from our notation.

Write τ_i for τ_{B_i}, and define σ_i = τ_A ∧ τ_i to be the first hitting time of A ∪ B_i for X. Note that τ_i, τ_A, σ_i are equivalently the hitting times for the one-dimensional process [ξ(X)]_s = ξ(X_s) of {z_i}, {z_A} and {z_A, z_i} respectively. In an algorithmic implementation, it is more convenient to use ξ(X) to decide when a level has been crossed if X has dimension greater than one. We remark that the process ξ(X) is not in general a Markov process, so it is not possible to reduce all problems of this form to the univariate Markovian process setting.

Figure 1: An illustration of multilevel splitting for a single particle system. The particle begins at the black node, and each branch splits into two i.i.d. copies upon reaching a set B_i for the first time. Branches which reach A terminate there. Those which reach B are used to form an estimate of the rare event probability P(process hits B before A).

The idea of MLS is to run a particle system in which each particle splits into several, say R_i, copies immediately upon first reaching a set boundary ∂B_i. Alternatively, it is terminated upon reaching ∂A. The splitting is managed so that a healthy system of particles is available for estimating each p_i. Since p = ∏_{i=1}^m p_i, if we have an estimator p̂_i for each p_i, then ∏_{i=1}^m p̂_i is a natural estimator for p.
The p̂_i are defined as follows: suppose that we begin with N particles, of which N_1 reach level B_1 before A. Then we estimate p_1 with

p̂_1 = N_1 / N.

Suppose that each of these N_1 surviving particles is split into a constant number R_1 of copies immediately upon reaching B_1, and that N_2 of the branched trajectories reach B_2 before A. Then we estimate p_2 with

p̂_2 = N_2 / (R_1 N_1).

Continuing in this way, one obtains a sequence of estimators p̂_1, . . . , p̂_m of p_1, . . . , p_m which may be multiplied together to give the estimate

p̂ = ∏_{i=1}^m p̂_i = N_m / (N ∏_{i=1}^{m−1} R_i)

for p. This estimator is unbiased; a proof may be found in [1, Section 3, Proposition 3.1]. We use a similar argument in Proposition 3.3 in Section 3.3 to establish the unbiasedness of the splitting-type algorithm which we develop there.

Provided that the z_i and R_i are well-chosen, this estimate can be much more efficient than naïve Monte Carlo estimation. For instance, choosing z_i such that p_i = p^{1/m}, and choosing R_i with small variance such that E[R_i] = p_i^{−1}, the relative variance of p̂ is reduced to approximately O(p^{−1/m}). See [20, 28] for more detail on parameter choice and asymptotic variance calculations.

It is convenient to work with the discrete-time pair-process U_i = (σ_i, X(σ_i)), i = 1, . . . , m, i.e. the observations of X at its splitting times. Let S be the Borel sigma-algebra associated with R_{≥0} × R^d, and let M_i : (R_{≥0} × R^d) × S → [0, 1] denote the Markov kernels of this discrete-time process. Finally, we define potential functions G_i : R_{≥0} × R^d → {0, 1} on the state space of this pair process as indicators of the sets B_i for the process (U_i):

G_i(t, x) = 1 if ξ(x) ≥ z_i, and 0 otherwise.

A full description of multilevel splitting is given in Algorithm 1.

Algorithm 1 Idealised Multilevel Splitting
Given λ together with G_i, M_i for i = 1, . . . , m, an initial number of particles N, and splitting ratios R_1, . . . , R_{m−1}:
1. For each j = 1, . . . , N, draw independently X^j(0) ∼ λ and U^j_1 ∼ M_1((0, X^j(0)), ·).
2. Let S_1 = {U^j_1 : G_1(U^j_1) = 1} be a list of the surviving paths, and set N_1 = |S_1|.
3. For i = 2, . . . , m:
   (a) If N_{i−1} = 0, return p̂ = 0.
   (b) Else, given S_{i−1} = {Ū^j_{i−1}}_{j=1}^{N_{i−1}}, for all (j, k) ∈ {(j′, k′) : 1 ≤ j′ ≤ N_{i−1}, 1 ≤ k′ ≤ R_{i−1}} sample independently U^{j,k}_i ∼ M_i(Ū^j_{i−1}, ·).
   (c) Let S_i = {U^{j,k}_i : G_i(U^{j,k}_i) = 1}, and N_i = |S_i|.
4. Return p̂ = N_m / (N ∏_{i=1}^{m−1} R_i).

A small modification to Algorithm 1 gives a variant commonly known as fixed effort splitting, which can be viewed as a sequential Monte Carlo algorithm. This connection has been exploited previously by [10] (note that these algorithms are distinct from those which use SMC to approximate static rare events which depend upon the trajectory of a process only over a fixed time interval [9, 14, 24]). In this variant, a number of particles N to be maintained throughout is chosen in advance, and the splitting of each individual surviving particle is replaced with resampling from the set of surviving particles. This is useful in that the procedure does not require the specification of tuning parameters R_1, . . . , R_{m−1}.
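As an illustrative sketch only, Algorithm 1 can be realised for a toy one-dimensional example in which the kernels M_i are a Gaussian random walk and ξ is the identity; the walk dynamics, level values and splitting ratios below are our inventions for illustration and are not from the paper:

```python
import random

def run_to_level(x, z_a, z_i, rng, step=0.25):
    # Gaussian random walk from x until it first reaches (-inf, z_a]
    # (absorption in A) or [z_i, inf) (crossing of level i).
    while z_a < x < z_i:
        x += rng.gauss(0.0, step)
    return x

def mls_estimate(n, z_a, levels, ratios, seed=1):
    # Idealised multilevel splitting (Algorithm 1) for a 1-d walk from 0:
    # levels = (z_1, ..., z_m), ratios = (R_1, ..., R_{m-1}).
    rng = random.Random(seed)
    survivors = []
    for _ in range(n):                       # Step 1: N initial particles
        x = run_to_level(0.0, z_a, levels[0], rng)
        if x >= levels[0]:
            survivors.append(x)
    for i in range(1, len(levels)):          # Step 3: split and propagate
        if not survivors:                    # (a) extinction: estimate is 0
            return 0.0
        nxt = []
        for x in survivors:
            for _ in range(ratios[i - 1]):   # (b) R_{i-1} copies per survivor
                y = run_to_level(x, z_a, levels[i], rng)
                if y >= levels[i]:           # (c) keep those reaching B_i
                    nxt.append(y)
        survivors = nxt
    denom = n
    for r in ratios:
        denom *= r
    return len(survivors) / denom            # Step 4: N_m / (N prod R_i)

p_hat = mls_estimate(n=200, z_a=-1.0, levels=(1.0, 2.0, 3.0), ratios=(4, 4))
print(p_hat)
```

With these toy settings the product of the level-crossing probabilities is roughly 0.25; the point of the construction is that each p̂_i is estimated from a healthy sub-population rather than by waiting for direct hits on the rare set.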
A full description of this fixed-effort variant is given in Algorithm 2.

The particle system in Algorithm 2 has the familiar structure of an SMC sampler, with step 3b combining multinomial resampling with propagation via M_i. The numerical example in Section 4.2 is carried out using this simple scheme, but it should be noted that other resampling schemes from the SMC literature can also be used; see [19] for a detailed analysis of schemes which might be expected to reduce estimator variance without introducing any bias. The rare event probability of interest can be interpreted as the normalizing constant of an excursion-valued Feynman-Kac flow in the sense of [13, Section 12.2.6]. This flow has transition densities specified in terms of the underlying dynamics and stopping times, and zero-one-valued potential functions indicate whether crossing occurs into B_i or A at each level. Consequently, the SMC variant of MLS admits an interpretation as a mean field approximation of this flow, and the estimator benefits from the usual theoretical analysis of such approximations; see [13]. This includes inheriting a strong law of large numbers, a central limit theorem and a proof of unbiasedness. This theory does not apply directly, however, to the estimator of Algorithm 1.

The unbiasedness proof of [1] also applies to Algorithm 2. An alternative but more general point of view from which this derives is the general theory of SMC estimators in the Feynman-Kac framework described above; see in particular [13, Theorem 7.4.2].

Remark 1.
We have presented all of the algorithms in this paper from the perspective of estimating rare event probabilities. One may be interested in estimating other quantities, such as the law of the process conditional on its hitting B before A. As with standard MLS, these may be estimated by a direct extension of the splitting algorithm.

Algorithm 2 Idealised MLS via SMC
Given λ together with G_i, M_i for i = 1, . . . , m, and a fixed number of particles N:
1. For each j = 1, . . . , N, draw independently X^j(0) ∼ λ and U^j_1 ∼ M_1((0, X^j(0)), ·).
2. Record N_1 = Σ_{j=1}^N G_1(U^j_1).
3. For i = 2, . . . , m:
   (a) If N_{i−1} = 0, return p̂_SMC = 0.
   (b) For j = 1, . . . , N sample independently U^j_i ∼ N_{i−1}^{−1} Σ_{k=1}^N G_{i−1}(U^k_{i−1}) M_i(U^k_{i−1}, ·).
   (c) Record N_i = Σ_{j=1}^N G_i(U^j_i).
4. Return p̂_SMC = ∏_{i=1}^m (N_i / N).

So far, we have assumed that we are able to simulate without approximation the pair U_i = (σ_i, X(σ_i)). Since these depend upon full sample paths of X, it is not apparent that this can be done except when X has an exceptionally simple form, for example when X is a piece-wise deterministic process. In practice it is usual to resort to a discretisation scheme. For example, suppose that X is described by X(0) ∼ λ and

dX(t) = μ(X(t)) dt + σ(X(t)) dW(t)    (1)

for t ∈ [0, T], where W is e-dimensional Brownian motion for some e ∈ N, and μ : R^d → R^d, σ : R^d → R^{d×e} are sufficiently regular to guarantee the existence of a strong Itô solution (see, for example, [26, Section 4.5] for suitable conditions).

We might use an Euler-Maruyama scheme such as the following, defined on a chosen time-grid t_j ∈ P for a partition P of [0, T]:

X̂(t_{j+1}) = X̂(t_j) + μ(X̂(t_j))(t_{j+1} − t_j) + σ(X̂(t_j))(W(t_{j+1}) − W(t_j)),    (2)

with X̂(0) ∼ λ.
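A minimal sketch of scheme (2) for a scalar diffusion, run until the discrete skeleton first leaves the interval between the two barriers; the drift, volatility and grid below are invented for illustration and the function is ours, not the paper's:

```python
import random

def euler_hitting_time(x0, mu, sigma, z_a, z_b, dt, t_max, seed=0):
    # Euler-Maruyama scheme (2), iterated until the discrete skeleton
    # X_hat(t_j) first lands in A = (-inf, z_a] or B = [z_b, inf),
    # or until t_max is reached.
    rng = random.Random(seed)
    x, t = x0, 0.0
    while z_a < x < z_b and t < t_max:
        x += mu(x) * dt + sigma(x) * rng.gauss(0.0, dt ** 0.5)
        t += dt
    return t, x  # the approximation (sigma_hat, X_hat(sigma_hat))

# Ornstein-Uhlenbeck-type example: mean-reverting drift, unit volatility.
t_hit, x_hit = euler_hitting_time(
    x0=0.0, mu=lambda x: -x, sigma=lambda x: 1.0,
    z_a=-1.0, z_b=1.0, dt=0.01, t_max=100.0)
print(t_hit, x_hit)
```

Note that a crossing can only ever be declared at a grid point t_j, which is precisely the source of the bias discussed next: excursions over the barrier inside an open interval (t_j, t_{j+1}) are invisible to the scheme.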
Such a scheme can then be used to implement an approximation of Algorithm 1 as follows: rather than drawing samples from M_i in Steps 1 and 3b, one runs the discrete scheme until a crossing into A or B_i is observed at time σ̂_i = min_j {t_j : X̂(t_j) ∈ A ∪ B_i}, and approximates U_i using (σ̂_i, X̂(σ̂_i)).

This is the case for instance in [11], in which a formal algorithm is developed in a continuous-time setting but the numerical example is discretised “[finely] enough to avoid clipping the process, which could introduce a bias in the estimation”. [30] acknowledges explicitly the bias induced by discretisation in their application, and proposes a small modification to reduce, but not eliminate, it. Even [7], which focuses on establishing the unbiasedness of a particular adaptive multilevel splitting framework in some generality, ultimately invokes time-discretisation to apply the framework to continuous-time processes such as over-damped Langevin diffusions.

With such a numerical scheme, one is forced to assess the level-crossing problem according to the discrete sample paths of X̂. But the law P̃ of X̂ will not in general coincide with the true finite-dimensional marginal law P of (X_{t_1}, . . . , X_{t_k}) induced by (1). And even if it were possible to get a finite-dimensional sample from P restricted to the times of the partition, for example by exact simulation, this would give no information about the sample path over the open intervals (t_j, t_{j+1}), during which a crossing may (or may not) occur.

The quantities that are needed to carry out Algorithms 1 and 2 are U^j_i = (σ^j_i, X^j_i(σ^j_i)) and G_i(U^j_i). Using ε-strong simulation, we show that it is possible to sample exactly G_i(U^j_i) without access to U^j_i itself, using instead an ε-strong approximation to it.
This is the focus of the next section; later, we show also that the modification to Algorithms 1 and 2 made necessary by using this approximation does not affect the unbiasedness of the resulting estimates.

2.3 ε-strong simulation

Formally, following the definition given in [5], an ε-strong algorithm is a joint construction of X together with a family of processes X̃_ε indexed by ε > 0 on [s, t] such that the following four properties hold:

1. Almost surely, sup_{r ∈ [s,t]} ‖X(r) − X̃_ε(r)‖ ≤ ε for an appropriate norm ‖·‖;
2. X̃_ε is piece-wise constant and left-continuous on [s, t], taking only finitely many values, and so can be fully stored on a computer;
3. X̃_ε can be simulated exactly. That is, to sample X̃_ε it is necessary to sample certain intermediate random variables, and this criterion requires that this can be done without approximations; and
4. Given a finite sequence of tolerances ε_1 > ε_2 > · · · > ε_m > 0, for 1 ≤ ℓ_1 < ℓ_2 ≤ m it holds almost surely for all r ∈ [s, t] that

{x : ‖X̃_{ε_{ℓ_2}}(r) − x‖ ≤ ε_{ℓ_2}} ⊂ {x : ‖X̃_{ε_{ℓ_1}}(r) − x‖ ≤ ε_{ℓ_1}},

and moreover it is possible to sample explicitly X̃_{ε_{ℓ_2}} conditional on X̃_{ε_{ℓ_1}}.

An ε-strong algorithm produces a chain (in time) of finitely many ‖·‖-balls, each of which almost surely constrains the sample path of X over the corresponding interval of time. Moreover, by applying Property 4 the radius of these balls can be iteratively reduced, constraining X progressively more tightly by employing a greater number of balls. An example (in two spatial dimensions) of how ε-strong sampling may be used is given in Figure 2. It is often advantageous to apply Property 4 selectively in order to get tight constraints on X at certain locations of interest, while allowing looser constraints elsewhere. For example, Figure 2(c) shows the result of applying Property 4 to the first two balls of the initial ε-sample in 2(b).

Remark 2.
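The containment in Property 4 is easy to check for a concrete pair of balls: a ball of radius ε_fine lies inside a ball of radius ε_coarse precisely when the distance between the two centres is at most ε_coarse − ε_fine. The helper below is an illustrative utility of ours (using the sup-norm, as in existing multidimensional schemes), not part of any published ε-strong algorithm:

```python
def sup_norm(u):
    # l-infinity norm of a vector given as an iterable of coordinates.
    return max(abs(c) for c in u)

def ball_nested(c_fine, eps_fine, c_coarse, eps_coarse):
    # True iff {x : ||x - c_fine|| <= eps_fine} is contained in
    # {x : ||x - c_coarse|| <= eps_coarse}; for norm balls this holds
    # exactly when ||c_fine - c_coarse|| + eps_fine <= eps_coarse.
    gap = sup_norm(a - b for a, b in zip(c_fine, c_coarse))
    return gap + eps_fine <= eps_coarse

# A refined centre may move by at most eps_coarse - eps_fine:
print(ball_nested((0.3, -0.1), 0.5, (0.0, 0.0), 1.0))  # True
print(ball_nested((0.8, 0.0), 0.5, (0.0, 0.0), 1.0))   # False
```

In an implementation, a check of this kind can serve as a sanity test that a conditional refinement step has produced balls consistent with Property 4.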
The choice of a weak inequality in Property 1 differs slightly from the presentation in [5]. The reason is simply that our application requires calculating suprema and infima of the continuous function ξ over regions C(t) = {x : ‖X̃(t) − x‖ ≤ ε}, and the weak inequality ensures that these extrema are attained in the regions C(t).

Remark 3.
The insistence in Property 2 that X̃_ε be piece-wise constant is not strictly necessary, since other processes which admit finite-dimensional representations could fill the same role. For example, continuous and piece-wise linear/polynomial X̃_ε are possible alternatives. However, we will assume throughout that X̃_ε is piece-wise constant, both for convenience and for consistency with the ε-strong schemes which are employed in Section 4.

Remark 4. In existing multi-dimensional ε-strong algorithms, the norm in Property 1 is always the ℓ_∞-norm, so the representation in Figure 2 corresponding to the ℓ_2-norm should be taken to be schematic.

With Property 4 in mind, let (ε_ℓ)_{ℓ=1}^∞ be a decreasing sequence of tolerances converging to 0. Write X̃^ℓ[s : t] for the ε-strong path X̃_{ε_ℓ}[s : t]. Let (t_1, . . . , t_K) be the jump-times of this path, and let t_0 = s, t_{K+1} = t. It is useful to define the associated discrete-time process (X̃^ℓ_k)_{k=0}^K where X̃^ℓ_k = X̃^ℓ(t_k). We define also an augmented process called the skeleton of X̃^ℓ to be the discrete-time process (Z̃^ℓ_k)_{k=0}^K such that

Z̃^ℓ_k = (t_k, t_{k+1}, X̃^ℓ_k, ℓ).

(This is, in a way, rather a backwards definition, since an ε-strong path itself is typically constructed from its skeleton.) Given the skeleton Z̃ as defined above, the (almost) unique ε-strong path associated with it is defined by

X̃^ℓ(u) = Σ_{k=0}^K I_{[t_k, t_{k+1})}(u) X̃^ℓ_k

for u ∈ [s, t), and we may take X̃^ℓ(t) = X̃^ℓ_K. The skeleton is somehow a more computationally-motivated object than its associated path, and we will refer primarily to the paths themselves outside of our algorithmic pseudo-code.
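In code, a skeleton is simply a finite list of tuples (t_k, t_{k+1}, X̃_k, ℓ), and evaluating the associated piece-wise constant path is a lookup. The following sketch (our own illustrative representation, with scalar state and invented data) follows the displayed definition, using the final value at the right endpoint:

```python
def eval_skeleton(skeleton, u):
    # skeleton: list of tuples (t_k, t_k1, x_k, level); the path takes the
    # constant value x_k on [t_k, t_k1), with the final value used at the
    # right endpoint of the overall interval.
    t_end = skeleton[-1][1]
    if u == t_end:
        return skeleton[-1][2]
    for t_k, t_k1, x_k, _level in skeleton:
        if t_k <= u < t_k1:
            return x_k
    raise ValueError("u outside the skeleton's time interval")

# Toy skeleton on [0, 1] at tolerance level 1, with three constant pieces.
skel = [(0.0, 0.4, 0.10, 1), (0.4, 0.7, -0.05, 1), (0.7, 1.0, 0.20, 1)]
print(eval_skeleton(skel, 0.5))  # -> -0.05
print(eval_skeleton(skel, 1.0))  # -> 0.2
```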
It is useful to define also C_k = {x : ‖x − X̃^ℓ_k‖ ≤ ε_ℓ}, the constraining region for X over [t_k, t_{k+1}].

Say two skeletons (Z̃^{i,1}_k)_{k=0}^K, (Z̃^{j,2}_m)_{m=0}^L defined on [r, s] and [s, t] respectively are compatible if C^1_K ∩ C^2_0 ≠ ∅. For two compatible skeletons, we define their concatenation Z̃ = Z̃^{i,1} ⊕ Z̃^{j,2} to be the process (Z̃_n)_{n=0}^{K+L+1} with

Z̃_n = Z̃^{i,1}_n for 0 ≤ n ≤ K, and Z̃_n = Z̃^{j,2}_{n−K−1} for K + 1 ≤ n ≤ K + L + 1.

Analogously, for two compatible ε-strong paths X̃^{i,1}, X̃^{j,2} defined on [r, s], [s, t] respectively, and with skeletons (Z̃^{i,1}_k)_{k=0}^K, (Z̃^{j,2}_m)_{m=0}^L, we define their concatenation as follows. Take the skeleton Z̃ = Z̃^{i,1} ⊕ Z̃^{j,2}, and writing Z̃_n = (t_n, t_{n+1}, X̃^{ℓ(n)}_n, ℓ(n)), set

(X̃^{i,1} ⊕ X̃^{j,2})(u) = Σ_{n=0}^{K+L+1} I_{[t_n, t_{n+1})}(u) X̃^{ℓ(n)}_n

for u ∈ [r, t), with (X̃^{i,1} ⊕ X̃^{j,2})(t) = X̃^{ℓ(K+L+1)}_{K+L+1}.

Two paths which are themselves concatenations of ε-strong paths may be concatenated analogously. We will exploit the fact that the binary concatenation operation is associative to allow us to write concatenations of more than two processes without ambiguity. We also find it convenient to adopt the convention that for k_1 > k_2, the (empty) sub-skeleton (Z̃_k)_{k=k_1}^{k_2} acts as the identity for this binary operation, so that for any skeleton Ỹ, (Z̃_k)_{k=k_1}^{k_2} ⊕ Ỹ = Ỹ ⊕ (Z̃_k)_{k=k_1}^{k_2} = Ỹ.

Remark 5.
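Representing skeletons as lists of tuples (t_k, t_{k+1}, X̃_k, ℓ), compatibility reduces to an intersection check on the adjoining constraining balls and concatenation to list concatenation. A sketch with scalar state and invented data (for intervals, the balls [x − ε_1, x + ε_1] and [y − ε_2, y + ε_2] intersect exactly when |x − y| ≤ ε_1 + ε_2):

```python
def compatible(skel_a, skel_b, eps_a, eps_b):
    # Skeletons are lists of (t_k, t_k1, x_k, level) with scalar state x_k.
    # Compatibility requires the last ball of skel_a to intersect the first
    # ball of skel_b: |x_K - x_0| <= eps_a + eps_b.
    return abs(skel_a[-1][2] - skel_b[0][2]) <= eps_a + eps_b

def concat(skel_a, skel_b):
    # Concatenation of skeletons defined on [r, s] and [s, t].
    assert skel_a[-1][1] == skel_b[0][0], "time intervals must adjoin"
    return skel_a + skel_b

a = [(0.0, 0.5, 0.1, 1), (0.5, 1.0, 0.3, 1)]   # on [0, 1], tolerance eps_a
b = [(1.0, 1.5, 0.4, 2), (1.5, 2.0, 0.2, 2)]   # on [1, 2], tolerance eps_b
if compatible(a, b, eps_a=0.5, eps_b=0.25):
    print(concat(a, b))  # a single skeleton on [0, 2]
```

Associativity of `concat` is inherited from list concatenation, matching the convention adopted above.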
Typically ε-strong algorithms require more information about the process than we have made explicit in our definition of a skeleton, for example those of [33]. For ease of exposition, we have suppressed this since we do not need to refer to it for the development of the algorithms in this paper, but it should be understood that our skeletons contain any extra information required for the stated ε-strong conditions to hold.

Figure 2: An illustration of ε-strong simulation. Panels: (a) initial value X(0) = x_0 and target path X; (b) ε_1-strong sample and constraining regions; (c) partial ε_2-strong sample; (d) full ε_2-strong sample. The left column shows the ε-strong process X̃ developing as conditional samples are made first with tolerance ε_1, followed with ε_2 < ε_1. The right column shows the fixed target path X, and how the ε-strong constraints relate to it. Pale circles indicate superseded constraints from the previous step.

3 Exact simulation of rare events
In multilevel splitting, one tracks the progress of X towards A and B using the reaction co-ordinate ξ, declaring a crossing at level i when the process ξ(X) reaches either z_A or z_i. In this section we describe methods for sampling such barrier crossing events exactly, for Markov processes X with almost surely continuous sample paths for which an ε-strong method for sampling X exists. Combining these with a slight modification of Algorithms 1 & 2 provides a method of obtaining unbiased estimates of rare event probabilities. Much of what follows is geometrically intuitive, though notationally cumbersome, and Figures 2 and 3 are intended to illustrate the intuition which motivates the accompanying specifications.

Throughout this section we take X to be a diffusion as described in (1), together with the conditions assumed there. For simplicity, we assume further that X has volatility bounded away from 0, which ensures that X crosses any given boundary with positive probability over any time interval. We assume also that an ε-strong algorithm as described in Section 2.3 has been chosen and is used to carry out the sampling in Algorithms 3, 4 and 6.

3.1 Crossing a single barrier

We begin with the simpler problem of sampling exactly an indicator random variable for the event that X crosses into a set D = ξ^{-1}([z_D, ∞)) when started from its complement D^c = ξ^{-1}((−∞, z_D)), over the fixed time interval [0, t]. To this end, we suppose that X(0) ∼ λ, where the support of λ is contained in D^c. Almost surely, a path X[0 : t] which crosses into D attains a maximum distance d_max(X, D^c) > 0 from D^c. Conversely, a path X[0 : t] which does not cross into D has (almost surely) a minimum distance d_min(X, D) > 0 from D. Consider the ε-strong path X̃_ε(0 : t) for a tolerance ε, and let 0 = t_0 < · · · < t_{K+1} = t be its jump-times. For k = 0, . . . , K, inside each time interval [t_k, t_{k+1}] the ball C_k = {x : ‖x − X̃_ε(t_k)‖ ≤ ε} almost surely constrains the path of X associated with X̃_ε. So if ε < max(d_max(X, D^c), d_min(X, D)), then either i) C_k ⊂ D for some k (if X does make a crossing), or ii) C_k ⊂ D^c for all k (if X does not make a crossing). By checking each C_k in turn, we can determine which of these conditions holds, and thereby construct the desired indicator random variable.

Of course, it is not possible to choose a suitable ε in advance, since the underlying path X and its minimum and maximum distances from D and D^c are not known. Instead, we can specify a sequence (ε_ℓ)_{ℓ=1}^{∞} of tolerances with ε_ℓ → 0. If X̃_ℓ turns out to be insufficient to determine the crossing, we can apply Property 4 of Section 2.3 to sample X̃_{ℓ+1} conditional on X̃_ℓ, as necessary, until a sufficiently small tolerance is found.

It is very wasteful, however, to construct a finitely-representable path X̃[0 : t] which is very close to X[0 : t] on the whole interval [0 : t]. It is likely that, even when X crosses into D, much of the time X is not near the boundary ∂D, and we need only approximate X closely where it is near ∂D. For this reason, as suggested in Section 2.3, it is useful to work instead with paths of mixed tolerance

X̃ = ⊕_{j=1}^{J} X̃_{ℓ(j)}[s_{j−1}, s_j],

where (0 = s_0, s_1, . . . , s_J) is a partition of [0, t] with s_J = t, and (ε_{ℓ(j)})_{j=1}^{J} is a selection from (ε_ℓ)_{ℓ=1}^{∞}. Such a path is the result, for example, of applying Property 4 with ε_2 < ε_1 to a constant segment X̃_1(t_{k−1}, t_k) of X̃_1, and the result in this case would be

X̃ = X̃_1[0 : t_{k−1}] ⊕ X̃_2[t_{k−1} : t_k] ⊕ X̃_1[t_k : t].

For later convenience, our formalisation in Algorithm 3 of the algorithm under description takes an X̃ of this kind, or rather the skeleton of such a path, as input.

The three possible relationships between C_k and D, D^c can be described in terms of ξ, the reaction co-ordinate: sup_{x∈C_k} ξ(x) < z_D is equivalent to X[t_k, t_{k+1}] ⊂ D^c, and similarly inf_{x∈C_k} ξ(x) ≥ z_D is equivalent to X[t_k, t_{k+1}] ⊂ D. The third possibility, that

inf_{x∈C_k} ξ(x) < z_D ≤ sup_{x∈C_k} ξ(x),

gives no definite information about the location of X[t_k, t_{k+1}] with respect to D. It is consistent with X[t_k, t_{k+1}] falling entirely in D, entirely in D^c, or partially in both. We categorise the behaviour of the process in this time interval by defining:

n_k := −1, if sup_{x∈C_k} ξ(x) < z_D;  0, if inf_{x∈C_k} ξ(x) < z_D ≤ sup_{x∈C_k} ξ(x);  +1, if inf_{x∈C_k} ξ(x) ≥ z_D.  (3)
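As a concrete illustration, the following Python sketch (the function name and interface are ours, not the paper's) computes this categorisation for a single constraining ball under the L∞-norm, for the reaction co-ordinate ξ(x, y) = min(x, y) used later in Section 4.2, whose extrema over such a ball are available in closed form:

```python
def categorise_ball(centre, eps, z_D):
    """Classify C_k = {x : ||x - centre||_inf <= eps} against the barrier z_D
    for the reaction co-ordinate xi(x, y) = min(x, y).  Since xi is monotone
    in each argument, its extrema over an L-infinity ball sit at corners:
    sup xi = min(centre) + eps and inf xi = min(centre) - eps.
    Returns n_k: -1 (C_k in D^c), +1 (C_k in D), or 0 (indeterminate)."""
    xi_centre = min(centre)
    sup_xi = xi_centre + eps
    inf_xi = xi_centre - eps
    if sup_xi < z_D:
        return -1
    if inf_xi >= z_D:
        return +1
    return 0
```

A ball classified as 0 is exactly one that would trigger the refinement step of Algorithm 3.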
Algorithm 3 samples exactly an indicator for the event that X crosses into D.

Algorithm 3: Single barrier crossing

function((Z̃_k)_{k=0}^{K}, D):
1. Calculate the sequence (n_k)_{k=0}^{K}.
2. If n_k = −1 for all k = 0, . . . , K, return (0, Z̃) to indicate no crossing into D.
3. If n_k = +1 for some k, return (+1, Z̃) to indicate a crossing into D.
4. Else:
(a) Set j = min{k ∈ {0, . . . , K} : n_k = 0}, and consider Z̃_{ℓ(j)} := (t_j, t_{j+1}, X̃_j, ε_{ℓ(j)}). Use the refining Property 4 of Section 2.3 to sample Z̃_{ℓ(j)+1} conditional on Z̃_{ℓ(j)}.
(b) Update Z̃ ← (Z̃_k)_{k=0}^{j−1} ⊕ Z̃_{ℓ(j)+1} ⊕ (Z̃_m)_{m=j+1}^{K}, update K to index the final segment of the updated skeleton, and return to Step 1.

Remark 6.
It may be noted that in Step 4(a) of Algorithm 3, it is not strictly necessary to choose k minimal. There may be computational advantages to using a different system, such as attempting to choose a k for which C_k ∩ A is large (indicating a high probability of crossing). This can be computationally preferable, at the expense of providing less information about τ_A (see Section 3.3).

The assumption that sup_{x∈C_k} ξ(x) and inf_{x∈C_k} ξ(x) can be calculated is rather strong, but holds for many realistic scenarios. For example, supposing X takes values in R^d and taking the L∞-norm, ‖x‖ = max_{i∈1,...,d} |x_i|, these quantities can be calculated if ξ is monotonic in each argument. As a specific example, in Section 4 we take d = 2 and ξ(x, y) = min(x, y). Another example of a tractable reaction co-ordinate, which illustrates that monotonicity is not necessary, is ξ(x, y) = |x − y|.

3.2 Crossing a two-sided barrier

We consider next a two-sided barrier problem, with regions A = ξ^{-1}((−∞, z_A]), B = ξ^{-1}([z_B, ∞)), with X(0) ∼ λ such that z_A ≤ ξ(X(0)) < z_B, and the problem of sampling an indicator random variable for the event that X crosses into B before A. Here we work over a random interval [0, σ], where σ is the hitting time of A ∪ B for X, rather than over a fixed interval as in the previous section.

In this case, we can declare a level crossing into A (for example) at the first k for which sup_{x∈C_k} ξ(x) < z_A, provided that no earlier interval permits a crossing into B. To track the possibilities, we extend the categorisation (3) so that n_k takes values in {−2, −1, 0, +1, +2}: n_k = −2 when sup_{x∈C_k} ξ(x) < z_A (a definite crossing into A); n_k = +2 when inf_{x∈C_k} ξ(x) ≥ z_B (a definite crossing into B); n_k = −1 or +1 when C_k straddles z_A or z_B respectively (a possible crossing); and n_k = 0 when z_A < inf_{x∈C_k} ξ(x) and sup_{x∈C_k} ξ(x) < z_B (no crossing). We then partition the sequence (n_k) according to the change-points

κ(j) := min{k > κ(j−1) : n_k = 0} ∧ (K + 1), if n_{κ(j−1)} ≠ 0;  min{k > κ(j−1) : n_k ≠ 0} ∧ (K + 1), if n_{κ(j−1)} = 0,

with κ(0) = 0. Each element of this sequence is taken to denote the beginning of a block B_j ⊂ (n_k) of consecutive elements, so B_j = {n_k : κ(j−1) ≤ k < κ(j)}. By construction, each B_j consists of a string of elements of exactly one of the sets {−1, −2}, {0}, {1, 2}. Each block therefore corresponds to a segment of X̃[s : t] in which X crosses into at most one of A and B. For example, in the case that (n_k) = (1, 1, 0, 0, −1, −1, −2), J = 3 and the blocks are B_1 = (1, 1), B_2 = (0, 0), B_3 = (−1, −1, −2), corresponding to "possible crossing into B", "no crossing" and "definite crossing into A", respectively.

The two-sided barrier crossing procedure is given in Algorithm 4, in which the output is an indicator random variable for the event that X hits B before A. An illustration is given in Figure 3.

Figure 3: Two illustrations of Algorithm 4. The first row shows: 3(a) a realisation of X over a finite time horizon, 3(b) an initial ε-strong simulation with a possible crossing in a highlighted ball, and 3(c) a refinement which is sufficient to show the process crossing into B. The second row shows 3(d) an alternative sample path consistent with the same initial ε-strong simulation, 3(e) an inconclusive refinement and 3(f) a further refinement sufficient to conclude that the process has crossed into A.

Remark 7 (Implementation). In general, it may be computationally inefficient to use a sufficiently small initial ε for the whole sample path of X, for example when X crosses a barrier at a very early time.
We note that there are many variations of Algorithm 4 which will also sample the outcome correctly, and which may be preferable for reasons of computational efficiency. Our choice has been made for clarity of exposition.

Remark 8 (Discontinuous processes and jump-diffusions). We have assumed throughout, for convenience, that the sample paths with which we deal are almost surely continuous. Relaxing this requirement is straightforward but slightly complicates the implementation. Given an ε-strong algorithm for a jump-diffusion or similar piecewise-continuous process, an appropriate alteration to the rule for beginning a new block will give an equally correct algorithm.

3.3 Exact multilevel splitting

Finally, we turn to an exact implementation of multilevel splitting. The main point of difference with Algorithm 1 is that, since an ε-strong sample X̃[s, t] merely constrains the corresponding path X[s, t], there is no easy way to determine the hitting location and time of any given barrier.

Algorithm 4: Two-sided barrier crossing

function((Z̃_k)_{k=0}^{K}, A, B), for Z̃ a skeleton over the interval [s, t]:
1. Initialise Z̃_full as an empty skeleton.
2. Calculate the sequence (n_k)_{k=0}^{K} associated with Z̃.
3. Divide (n_k)_{k=0}^{K} into blocks B_1, . . . , B_J using the sequence (κ(j))_{j=0}^{J+1} as described above.
4. For j = 1, . . . , J:
(a) If (−2) ∈ B_j, set D = (−1) to indicate a crossing into A, and skip to 6.
(b) If (+2) ∈ B_j, set D = (+1) to indicate a crossing into B, and skip to 6.
(c) If (−1) ∈ B_j, sample (I, Z̃′) ← Algorithm 3((Z̃_k)_{k=κ(j−1)}^{κ(j)−1}, A) to decide if the first crossing is into A, and set Z̃ ← (Z̃_k)_{k=0}^{κ(j−1)−1} ⊕ Z̃′ ⊕ (Z̃_k)_{k=κ(j)}^{K}, updating K accordingly. If I = (+1), set D = (−1) and skip to 6.
(d) If (+1) ∈ B_j, sample (I, Z̃′) ← Algorithm 3((Z̃_k)_{k=κ(j−1)}^{κ(j)−1}, B) to decide if the first crossing is into B, and update Z̃, K as in (c). If I = (+1), set D = (+1) and skip to 6.
5. Here, we know that no crossing is made in the interval [s, t] spanned by Z̃, so it is necessary to sample a continuation of the path. Let Z̃_full ← Z̃_full ⊕ Z̃. Writing (t_K, t, x, ε) = Z̃_K, update (s, t) ← (t, 2t − s), and sample a new X̃_ε(s : t]. Record its skeleton (Z̃_k)_{k=0}^{K}, and return to 2.
6. Set Z̃_full ← Z̃_full ⊕ Z̃, and return (D, Z̃_full).

In particular, we will not have access to the hitting times σ_i, τ_i nor the hitting locations X(τ_i) defined in Section 2.1.

Suppose we use Algorithm 4 to sample an indicator random variable for the event that X hits B before A, for instance, with initial simulation interval [0, T]. Suppose that a positive result is returned over the interval [0, cT], for some random c ∈ N corresponding to the number of passes through Algorithm 4. We must then choose when and where to split this path of X̃. In this section, we show that if the splitting is carried out at a (random) multiple of T, this does not affect the unbiasedness of the MLS estimate.

In general, write σ̃_i for a random time which serves as an upper bound on the first hitting time of A ∪ B_i for X̃, which is defined as:

σ̃_i = T · min{m ∈ N : mT ≥ σ_i},

i.e. the time to which X̃ is sampled in Algorithm 4 (so σ̃_i is a multiple of T) in order to establish that a crossing into A or B_i has occurred. Similarly, let τ̃_i be the corresponding upper bound on the first hitting time of B_i.
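The rounding up to the simulation horizon can be stated compactly; the following snippet (an illustration under our own naming, not from the paper) computes σ̃_i from σ_i and T:

```python
import math

def sigma_tilde(sigma, T):
    """The first multiple of the simulation horizon T that is >= sigma,
    i.e. T * min{m in N : m*T >= sigma}: the time to which the
    epsilon-strong path has been extended once the crossing is established."""
    return T * math.ceil(sigma / T)
```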
To understand the exact MLS we describe later in this section, it is helpful to have in mind an idealised splitting scheme slightly different from the MLS presented in Section 2.1, which we call idealised splitting with coupling.

As in idealised MLS, we assume that it is possible to sample complete continuous paths of X up to a given stopping time. But rather than split these paths into independent copies at the times τ_i, the split paths are coupled in the following way: from time τ_i until time τ̃_i, the "split" paths are set to be identically equal, and after this time they evolve conditionally independently given X(τ̃_i).

For i = 1, . . . , m, let M̃_i denote the transition kernels for the discrete-time quadruple process V_i = (σ_i, σ̃_i, X(σ_i), X̃(σ̃_i)). Define also G̃_i(V_i) = I_{B_i}(X(σ_i)). The details are given in Algorithm 5. Call the estimator for p resulting from this algorithm p̃.

Figure 4: An illustration of idealised splitting with coupling for a single particle system. The particle begins at the node labelled x_0. Level crossings are indicated by empty nodes, whereas splittings occur at the filled nodes. Between any empty node and the following filled node, the particle trajectories are coupled identically. Compare with Figure 1.

Of course, it is not possible to implement Algorithm 5 as written, since we cannot simulate full paths of X, nor make splits at the times τ_i. But the construction of MLS with couplings means that an algorithm which splits paths at the tractable time τ̃_i instead (and allows them to propagate independently from that point) produces identical estimators q̃_i for p_i.

It is possible that a particle crosses several barriers z_i, z_{i+1}, . . . before time τ̃_i. In this case, the particle splits as normal at each barrier, each new copy remaining identically coupled, until the splitting time τ̃_i, after which they proceed independently.

The following proposition establishes that this framework gives rise to unbiased estimates. In order to simplify the analysis, we make a further assumption about the ε-strong method being used: that over any interval [s, t], (X̃(t) | X̃(s) = x_s) is equal in distribution to (X(t) | X(s) = x_s). In other words, we assume that the end-points of ε-strong samples are from the true distribution of X. This holds in the schemes developed in [3] and [33], for example.

Algorithm 5: Idealised Splitting with Coupling

Given λ together with G̃_i, M̃_i for i = 1, . . . , m, an initial number of particles N, and splitting ratios R_1, . . . , R_{m−1}:
1. For j = 1, . . . , N:
(a) Draw X^j(0) ∼ λ, and V_1^j ∼ M̃_1((0, 0, X^j(0), X^j(0)), ·).
2. Let S_1 = {V_1^j : G̃_1(V_1^j) = 1} be a list of the surviving paths, and N_1 = |S_1|.
3. For i = 2, . . . , m:
(a) If N_{i−1} = 0, return p̃ = 0.
(b) Otherwise, given S_{i−1} = {V_{i−1}^j}_{j=1}^{N_{i−1}}, for each (j, k) ∈ {(j′, k′) : 1 ≤ j′ ≤ N_{i−1}, 1 ≤ k′ ≤ R_{i−1}}:
i. Sample V_i^{(j,k)} ∼ M̃_i(V_{i−1}^j, ·).
(c) Let S_i = {V_i^{j,k} : G̃_i(V_i^{j,k}) = 1}, and set N_i = |S_i|.
4. Estimate p̃ = N_m / (N ∏_{i=1}^{m−1} R_i).

And given the close relation of ε-strong simulation with exact simulation, it may often be the case that an ε-strong algorithm of this form can be constructed (see for instance [5, 6]).

Given the MLS with couplings scheme, it is now possible to establish this result with only minor modification of the existing arguments of [1]; this result paves the way for the methodology which we introduce and in principle allows for unbiased estimation of rare event probabilities in continuous time whenever ε-strong simulation of the process of interest is possible.
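To make the shape of Algorithm 5 concrete, here is a minimal Python sketch of the generic splitting recursion, with the kernel/potential pair (M̃_i, G̃_i) abstracted into a user-supplied `sample_stage` callback (a hypothetical stand-in; all of the continuous-time machinery is hidden inside it):

```python
def splitting_estimate(sample_stage, m, N, R, rng):
    """Generic multilevel splitting skeleton: N initial particles, and each
    survivor of level i-1 is split into R[i-2] copies before attempting
    level i.  Returns N_m / (N * prod(R)), as in Algorithm 5.
    sample_stage(i, state, rng) -> (new_state, survived) stands in for
    sampling from the kernel M~_i and evaluating the potential G~_i."""
    survivors = []
    for _ in range(N):
        state, ok = sample_stage(1, None, rng)
        if ok:
            survivors.append(state)
    for i in range(2, m + 1):
        if not survivors:
            return 0.0
        next_survivors = []
        for state in survivors:
            for _ in range(R[i - 2]):  # R_{i-1} copies of each survivor
                new_state, ok = sample_stage(i, state, rng)
                if ok:
                    next_survivors.append(new_state)
        survivors = next_survivors
    denom = float(N)
    for r in R:
        denom *= r
    return len(survivors) / denom
```

With stages that succeed independently with probability q, this estimator has expectation q^m, which is easily checked empirically.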
Proposition 1. p̃ is an unbiased estimator for p: E[p̃] = p.

Proof. We consider the discrete-time Markov process (V_i)_{i=0}^{m}. Let M̃_i denote the transition kernels of this process at time i. Let also M̃_{i:j} = M̃_j ◦ M̃_{j−1} ◦ · · · ◦ M̃_{i+1} denote the elements of the associated two-parameter dynamic semigroup, describing the evolution of the process from time i to j. Note that for every i, ∆ = R_+^2 × A × R^d is an absorbing state for M̃_i, since if X(σ_i) ∈ A, then σ_j = σ_i for all j > i.

Define G_1 to be the sigma-algebra generated by {V_1^j : j = 1, . . . , N}, and G_k = G_{k−1} ∨ σ(V_k^j : 1 ≤ j ≤ N_k) for k = 2, . . . , m, so that (G_k)_{k=1}^{m} is the natural filtration of this process.

We observe the following recursion for the number of particles successfully reaching B_i given G_{i−1}, which is immediate from the definition of M̃_i: for any function h_i : R_+^2 × (R^d)^2 → R which is equal to 0 on ∆,

E[ R_{i−1}^{−1} Σ_{j=1}^{R_{i−1} N_{i−1}} h_i(V_i^j) | G_{i−1} ] = Σ_{k : G̃_{i−1}(V_{i−1}^k)=1} ∫ h_i(u) M̃_i(V_{i−1}^k, du),  (4)

(taking R_0 = 1), which follows since each V_i^j which is a descendant of any particular V_{i−1}^k has the same marginal law.

Define estimators p̃_i for p_i in Algorithm 5, namely

p̃_i = N_1/N for i = 1, and p̃_i = N_i/(R_{i−1} N_{i−1}) for 2 ≤ i ≤ m.

We show that for 1 ≤ k ≤ m,

E[ ∏_{i=k}^{m} p̃_i | G_{k−1} ] = N_{k−1}^{−1} Σ_{j : G̃_{k−1}(V_{k−1}^j)=1} (1 − M̃_{(k−1):m}(V_{k−1}^j, ∆)),

by backwards induction on k, starting with the case k = m. Note that for the case k = 1, each term in the sum on the RHS is the probability that a particle with a given starting value successfully reaches B before A. The result is then obtained upon taking a further expectation over the starting value.

The case k = m:

p̃_m = N_m/(R_{m−1} N_{m−1}) = (R_{m−1} N_{m−1})^{−1} Σ_{j=1}^{R_{m−1} N_{m−1}} G̃_m(V_m^j),

and the result then follows from taking the conditional expectation and applying (4) with h_j(σ, σ̃, X(σ), X̃(σ̃)) = I_{B_j}(X(σ)).

Now, supposing the result holds for k + 1, we show that it holds also for k. We have the following chain of equalities (with the convention that G̃_0(V_0^j) = 1):

E[ ∏_{i=k}^{m} p̃_i | G_{k−1} ]
= E[ p̃_k E[ ∏_{i=k+1}^{m} p̃_i | G_k ] | G_{k−1} ]  (5)
= E[ (N_k/(R_{k−1} N_{k−1})) · N_k^{−1} Σ_{j : G̃_k(V_k^j)=1} (1 − M̃_{k:m}(V_k^j, ∆)) | G_{k−1} ]  (6)
= N_{k−1}^{−1} Σ_{j : G̃_{k−1}(V_{k−1}^j)=1} ( ∫ (1 − M̃_{k:m}(u, ∆)) M̃_k(V_{k−1}^j, du) )  (7)
= N_{k−1}^{−1} Σ_{j : G̃_{k−1}(V_{k−1}^j)=1} ( 1 − ∫ M̃_{k:m}(u, ∆) M̃_k(V_{k−1}^j, du) )  (8)
= N_{k−1}^{−1} Σ_{j : G̃_{k−1}(V_{k−1}^j)=1} ( 1 − M̃_{(k−1):m}(V_{k−1}^j, ∆) )  (9)

where (5) follows from the tower rule, noting that p̃_k is G_k-measurable; (6) from the induction hypothesis; (7) from (4); (8) from expanding and integrating over M̃_k(V_{k−1}^j, du); and (9) from the semigroup property.

Moreover, putting k = 1, we have

(9) = N^{−1} Σ_{j=1}^{N} ( 1 − M̃_{0:m}(V_0^j, ∆) ).

Since (1 − M̃_{0:m}(V_0^j, ∆)) = P(τ_B < τ_A | X^j(0)), we see that

E[p̃] = E[ ∏_{i=1}^{m} p̃_i ] = P(τ_B < τ_A),

as desired.

Remark 9.
In this algorithm we have taken the simple choice to split paths of X̃ at the first (random) multiple of T after which a crossing is guaranteed, τ̃_i. Since this could result in a long gap between the crossing time and the splitting time, i.e. a large (τ̃_i − τ_i), this may introduce some unwanted variance into the estimation, and various other approaches to splitting could be implemented. One simple alternative is to choose in advance a reasonably fine deterministic time-grid, and to split at the first location on the time-grid after which crossing is guaranteed.

Algorithm 6: Exact Multilevel Splitting

Given λ together with G̃_i, M̃_i for i = 1, . . . , m, an initial number of particles N, and splitting ratios R_1, . . . , R_{m−1}:
1. Initialise S_1 = · · · = S_m = ∅.
2. For j = 1, . . . , N:
(a) Draw X̃^j(0) ∼ λ, and simulate X̃^j[0 : T] together with its skeleton Z̃^j as per Section 2.3.
(b) Sample (G̃_1(Z̃^j), Z̃^j) ∼ Algorithm 4(Z̃^j, A, B_1). If G̃_1(Z̃^j) = 1, add Z̃^j to S_1.
3. Record N_1 = |S_1|.
4. For i = 2, . . . , m:
(a) If N_{i−1} = 0, return the estimate p̂_ex = 0 of p.
(b) Otherwise, given S_{i−1} = {Z̃_{i−1}^j}_{j=1}^{N_{i−1}}, for all pairs {(j, k)}_{1 ≤ j ≤ N_{i−1}, 1 ≤ k ≤ R_{i−1}} sample independently (G̃_i(Z̃_i^{j,k}), Z̃_i^{j,k}) ∼ Algorithm 4(Z̃_{i−1}^j, A, B_i), and if G̃_i(Z̃_i^{j,k}) = 1, add Z̃_i^{j,k} to S_i.
(c) Set N_i = |S_i|.
5. Return p̂_ex = N_m / (N ∏_{i=1}^{m−1} R_i).

The estimate p̂_ex given by Exact Multilevel Splitting (Algorithm 6) is exactly the same as that given by Idealised Splitting with Coupling (Algorithm 5), since the particle systems defined by these algorithms are essentially identical.

Corollary 1. E[p̂_ex] = p for all N ≥ 1.

Similarly to Algorithm 2, a sequential Monte Carlo variant of Algorithm 6 may be constructed by replacing the splitting step with resampling: this is illustrated in Algorithm 7. Its advantages over Algorithm 6 are the same as the advantages of Algorithm 2 in Section 2.1, namely that there are no splitting ratios R_i which need to be calibrated to ensure a stable particle system.

Algorithm 7
Exact Multilevel Splitting via SMC

Given λ together with G̃_i, M̃_i for i = 1, . . . , m, and a fixed number of particles N:
. . .
(b)′ Otherwise, given S_{i−1} = {Z̃_{i−1}^j}_{j=1}^{N_{i−1}}, sample a_1, . . . , a_N independently and uniformly at random (with replacement) from 1, . . . , N_{i−1}, and sample independently (Z̃_i^j, G̃_i(Z̃_i^j)) ∼ Algorithm 4(Z̃_{i−1}^{a_j}, A, B_i).
. . .
5. Estimate p̂_SMC = ∏_{i=1}^{m} N_i / N^m = ∏_{i=1}^{m} (N_i / N).

4 Numerical examples

The examples in this section were carried out using a single core on an Intel Xeon E5-2440 processor with an advertised clock speed of 2.40GHz.
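The resampling step that distinguishes Algorithm 7 from Algorithm 6 can be sketched as follows, again with a hypothetical `sample_stage` callback standing in for Algorithm 4 (only the estimator structure is the point here):

```python
def smc_estimate(sample_stage, m, N, rng):
    """SMC variant of multilevel splitting: at each level, N particles are
    propagated from ancestors drawn uniformly with replacement from the
    previous level's survivors, and the estimator is the product of the
    survival fractions N_i / N, as in Algorithm 7."""
    survivors = []
    for _ in range(N):
        state, ok = sample_stage(1, None, rng)
        if ok:
            survivors.append(state)
    estimate = len(survivors) / N
    for i in range(2, m + 1):
        if not survivors:
            return 0.0
        ancestors = [rng.choice(survivors) for _ in range(N)]  # uniform, with replacement
        survivors = []
        for parent in ancestors:
            state, ok = sample_stage(i, parent, rng)
            if ok:
                survivors.append(state)
        estimate *= len(survivors) / N
    return estimate
```

No splitting ratios appear: the population size is held at N by the resampling, which is what makes the particle system stable without calibration.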
4.1 A one-dimensional example

Our first illustrative example uses the ε-strong scheme for Brownian motion of [3, 33], in a setting in which the exact solution is known. In one dimension, the reaction co-ordinate may be taken to be the identity function. We choose A = (−∞, 0], B = [3^{17}, ∞), B_i = [3^i, ∞) for i = 1, . . . , 17, with initial point x_0 = 1. It is well known that for real 0 < a < b, the probability that a Brownian path started at a reaches b before 0 is a/b, as can be verified with a simple optional stopping argument. Therefore the target probability is 3^{−17} ≈ 7.7 × 10^{−9}.

The ε-strong scheme in question has some additional features which allow a substantial improvement in speed over that given in Algorithm 7 in this exceptionally simple setting. Over a given time interval [s, t], for a Brownian motion (X(r))_{r∈[s,t]} it is possible to sample bounds

L↓(s, t) ≤ inf_{r∈[s,t]} X(r) ≤ L↑(s, t),  U↓(s, t) ≤ sup_{r∈[s,t]} X(r) ≤ U↑(s, t).

An ε-strong algorithm over [0, T] in the sense given in Section 2.3 can be recovered by partitioning [0, T] into suitably small time intervals [s_i, t_i], and taking

X̃_ε(r) = ½ (U↑(s_i, t_i) + L↓(s_i, t_i))

for r ∈ [s_i, t_i], with tolerance ½ (U↑(s_i, t_i) − L↓(s_i, t_i)). This corresponds to taking C_i = [L↓(s_i, t_i), U↑(s_i, t_i)] in the notation of Section 3.1. This does not make use of the extra information available in (L↑, U↓). In particular, to assess whether X has crossed above the point b over [s_i, t_i], it is sufficient to find that U↓(s_i, t_i) > b, since this guarantees the maximum of X is large enough. This is easier to check than the stricter condition that C_i ⊂ (b, ∞), or equivalently that L↓(s_i, t_i) > b.

We compare the exact estimator to Euler-Maruyama-type schemes (see (2)) with three levels of resolution. The initial step sizes for the schemes are taken to be 0.01, 0.005 and 0.001, and each time a level B_i is reached the step size is multiplied by 3^2 = 9. This scaling, which depends upon analytical knowledge that would not be available in more realistic problems, was essential in order to achieve a reasonable calculation time (this choice ensures the expected number of Euler-Maruyama steps until a crossing is decided remains constant as the scale of the problem grows). 500 estimators were produced for each procedure, each with a population of 1000 particles. The results are shown in Figure 5.

Figure 5: Kernel density estimates for the various schemes. The mean of each density estimate is shown as a coloured vertical line. The black dot on the ordinal axis indicates the true probability.

Typical run-times for a single sample were 19s for the exact-MLS algorithm, 32s for the Euler-Maruyama-0.005 scheme, and 157s for the Euler-Maruyama-0.001 scheme. This demonstrates that in favourable circumstances exact MLS can yield estimates of rare event probabilities at significantly lower computational cost than that at which discretisation-based methods can reach an acceptable level of bias. In other settings the cost of exact methods can be somewhat higher, as the next example will demonstrate.

4.2 A two-dimensional example

Our second example, which illustrates Algorithm 7 in a pure form, is a two-dimensional problem. The random process is again taken to be Brownian motion, initialised at W_0 = (1, 1). The reaction co-ordinate is chosen to be ξ(x, y) = min(x, y), and the levels are chosen to be A = ξ^{-1}((−∞, 0]), B = ξ^{-1}((2^{19}, ∞)), B_i = ξ^{-1}((2^{i+1}, ∞)) for i = 1, . . . , 18. We are not aware of any simple means by which the rare event probability can be analytically obtained in this case.

Figure 6: Kernel density estimates for exact MLS and for an Euler-Maruyama-discretisation-based method with three different discretisation step sizes. The mean of each density estimate is shown as a coloured vertical line.
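For intuition (not as part of the method), a crude Euler-Maruyama simulation of a two-sided problem of this kind is easy to write down; here we use the non-rare barrier z_B = 2 so that naive simulation is feasible. Such discretisation introduces exactly the bias that the exact scheme avoids:

```python
import math
import random

def euler_two_sided(x0=(1.0, 1.0), z_A=0.0, z_B=2.0, dt=1e-3, seed=1):
    """Euler-Maruyama baseline for a two-dimensional two-sided problem:
    simulate planar Brownian motion until xi(x, y) = min(x, y) leaves
    (z_A, z_B), and report whether it exited through the upper barrier.
    The barrier z_B = 2 is illustrative (non-rare), not the level used
    in the paper's experiment."""
    rng = random.Random(seed)
    x, y = x0
    s = math.sqrt(dt)
    while z_A < min(x, y) < z_B:
        x += s * rng.gauss(0.0, 1.0)
        y += s * rng.gauss(0.0, 1.0)
    return min(x, y) >= z_B
```

Averaging this indicator over many seeds gives the usual biased Monte Carlo estimate against which the exact method is compared below.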
The ε-strong algorithm used is that of [3, 33], in the same fashion as Section 4.1. Again, the exact estimator is compared to three Euler-Maruyama schemes of (2) with increasing degrees of fineness, in this case with initial step-sizes 0.1, 0.05 and 0.01. The step-sizes were again re-scaled at each new level to ensure an approximately constant relative error, this time by a factor of 2^2 = 4. For each scheme, 250 trials were simulated, each using 100 particles. Typical running times were 13s for the Euler-Maruyama-0.1 scheme, 21s for the Euler-Maruyama-0.05 scheme, 1m 45s for the Euler-Maruyama-0.01 scheme and 534m for the exact scheme.

Although the running time for exact MLS is significantly longer than that of the discrete schemes in this case, we believe that a more careful attempt to tune and adapt its parameters could substantially reduce the difference. However, the optimal choice of parameters will depend on the particular application, and since our aim is to provide proof-of-concept validation of the generic methodology developed in this paper, we have not attempted to do so.

Figure 6 shows a kernel density estimate of the sampling distribution of the resulting estimators, obtained using the geom_density function of ggplot2 [37] with its default choice of bandwidth. As the Euler-Maruyama scheme increases in fineness, the resulting estimate appears closer to the estimate from exact MLS.

5 Conclusion

We have presented the first algorithm for the exact estimation of a class of rare event probabilities for continuous-time processes. Our method has been to directly replace discrete approximations of continuous-time sample paths with ε-strong samples of the same paths, in order to obtain the sequence p̂_i of conditionally unbiased estimates required for multilevel splitting.

There is considerable ongoing effort in the development of ε-strong simulation methods for a broad class of stochastic processes, and in their application (see for instance [32]). It is likely that further development in this area will allow the approach described within this paper to be applied with greater efficiency to a broader class of stochastic processes. Naturally, it is likely that the exactness of this approach will always come at the expense of a computational cost that exceeds that of discretisation-based schemes. This cost can be partially offset against the need to assess the bias inherent in such schemes.

One important assumption that we have made is the ability to calculate infima and suprema of the reaction co-ordinate over the sets C_k which constrain a sample path X. The problem of assessing whether these C_k intersect the MLS sets B_i has links to the problem of collision detection studied in computer graphics (see e.g. [27]). Insight from this field might allow a more careful classification of suitable ξ for a given C_k, or allow the assumption to be weakened in certain circumstances.

The alive particle filter [15] provides an alternative approach to implementing SMC algorithms with {0, 1}-valued potential functions which also provides unbiased estimates of normalizing constants and thus, in the present context, of the rare event probability. We did not experience difficulties with extinction of the particle system in our experiments, but incorporating that approach within the exact MLS framework that we present would be interesting, because it would automatically mitigate the influence of poorly chosen intermediate levels, albeit at the cost of further randomizing the computational cost.

As we noted in Section 2.1, one strength of this method is that other quantities relating to the distribution of paths which reach the rare event set B may also be estimated. These require the sort of detailed information given by ε-strong simulation.
However, if the rare event probability is the only quantity of interest, it is possible that approaches to obtaining unbiased estimates of rare event probabilities without requiring the full machinery of unbiased MLS implementations are sufficient.

The construction of computable unbiased estimators via a sequence of asymptotically biased estimators has received much attention in recent years, following [21]. This technique has been directly extended to estimating expectations of functionals of SDE paths in [34], and recently this approach has been further extended to non-linear filtering problems in [23]. In ongoing work we have started to explore the possibility that a related approach could allow for exact rare event estimation in a more general setting, and with greater scope for practical applications.

Acknowledgements
This work was supported by the Engineering and Physical Sciences Research Council under grant numbers EP/L016710/1, EP/R034710/1 and EP/T004134/1, and by two Alan Turing Institute programmes: the Lloyd's Register Foundation programme on 'Data-centric engineering' and the UK Government's 'Defence and security' programme.
References

[1] M. Amrein and H. Künsch. A variant of importance splitting for rare event estimation: Fixed number of successes. ACM Transactions on Modelling and Computer Simulation, 21(2):Article 13, 20pp., 2011.

[2] A. Beskos, O. Papaspiliopoulos, and G.O. Roberts. Retrospective exact simulation of diffusion sample paths with applications. Bernoulli, 12(6):1077–1098, 2006.

[3] A. Beskos, S. Peluchetti, and G.O. Roberts. ε-Strong simulation of the Brownian path. Bernoulli, 18(4):1223–1248, 2012.

[4] A. Beskos and G.O. Roberts. Exact simulation of diffusions. Annals of Applied Probability, 15(4):2422–2444, 2005.

[5] J. Blanchet, X. Chen, and J. Dong. ε-Strong simulation for multidimensional stochastic differential equations via rough path analysis. Annals of Applied Probability, 27(1):275–336, 2017.

[6] J. Blanchet and F. Zhang. Exact simulation for multivariate Itô diffusions. Advances in Applied Probability, 52(4):1003–1034, Dec 2020.

[7] C-E. Bréhier, M. Gazeau, L. Goudenège, T. Lelièvre, and M. Rousset. Unbiasedness of some generalized adaptive multilevel splitting algorithms. Annals of Applied Probability, 26(6):3559–3601, 2016.

[8] J. I. G. Cázares, A. Mijatović, and G. U. Bravo. ε-strong simulation of the convex minorants of stable processes and meanders, 2019. arXiv:1910.13273 [math.PR].

[9] F. Cérou, P. Del Moral, T. Furon, and A. Guyader. Sequential Monte Carlo for rare event estimation. Statistics and Computing, 22(3):795–808, 2012.

[10] F. Cérou, P. Del Moral, F. Le Gland, and P. Lezaud. Genetic genealogical models in rare event analysis. ALEA: Latin American Journal of Probability and Mathematical Statistics, 1:181–203, 2006.

[11] F. Cérou and A. Guyader. Adaptive multilevel splitting for rare event analysis. Stochastic Analysis and Applications, 25(2):417–443, 2007.

[12] Y. Chen, J. Dong, and H. Ni. ε-Strong simulation of fractional Brownian motion and related stochastic differential equations. Mathematics e-print 1902.08824, ArXiv, 2019.

[13] P. Del Moral. Feynman-Kac Formulae. Springer Verlag, New York, 2004.

[14] P. Del Moral and J. Garnier. Genealogical particle analysis of rare events. Annals of Applied Probability, 15(4):2496–2534, 2005.

[15] P. Del Moral, A. Jasra, A. Lee, C. Yau, and X. Zhang. The alive particle filter and its use in particle Markov chain Monte Carlo. Stochastic Analysis and Applications, 33(6):943–974, 2015.

[16] P. Del Moral and P. Lezaud. Branching and interacting particle interpretations of rare event probabilities. In Stochastic Hybrid Systems, pages 277–323. Springer, 2006.

[17] P. Fearnhead, O. Papaspiliopoulos, and G.O. Roberts. Particle filters for partially-observed diffusion. Journal of the Royal Statistical Society, Series B: Methodology, 70(4):755–777, 2008.

[18] M. J. J. Garvels. The Splitting Method in Rare Event Simulation. PhD thesis, University of Twente, Twente, 2000.

[19] M. Gerber, N. Chopin, and N. Whiteley. Negative association, ordering and convergence of resampling methods. Annals of Statistics, 47(4):2236–2260, 2019.

[20] P. Glasserman, P. Heidelberger, P. Shahabuddin, and T. Zajic. Multilevel splitting for estimating rare event probabilities. Operations Research, 47(4):585–600, 1999.

[21] P. W. Glynn and C-H. Rhee. Exact estimation for Markov chain equilibrium expectations. Journal of Applied Probability, 51(A):377–389, 2014.

[22] P. Grassberger. Pruned-enriched Rosenbluth method: Simulations of θ polymers of chain length up to 1000000. Physical Review E, 56(3):3682–3693, 1997.

[23] A. Jasra, F. Yu, and J. Heng. Multilevel particle filters for the non-linear filtering problem in continuous time. Statistics and Computing, 30(5):1381–1402, 2020.

[24] A.M. Johansen, P. Del Moral, and A. Doucet. Sequential Monte Carlo samplers for rare events. In Proceedings of the 6th International Workshop on Rare Event Simulation, pages 256–267, Bamberg, Germany, October 2006.

[25] H. Kahn and T. E. Harris. Estimation of particle transmission by random sampling. National Bureau of Standards Applied Mathematics Series, 12:27–30, 1951.

[26] P. E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations, volume 23. Springer Science & Business Media, 2013.

[27] S. Kockara, T. Halic, K. Iqbal, C. Bayrak, and R. Rowe. Collision detection: A survey. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pages 4046–4051, 2007.

[28] A. Lagnoux-Renaudie. Rare event simulation. Probability in the Engineering and the Informational Sciences, 20:45–66, 2006.

[29] A. Lagnoux-Renaudie. Effective branching splitting method under cost constraint. Stochastic Processes and their Applications, 118(10):1820–1851, 2008.

[30] A. Lagnoux-Renaudie. A two-step branching splitting model under cost constraint for rare event analysis.
Journal of Applied Probability , 46(2):429–452, 2009.[31] J. Lamperti. A simple construction of certain diffusion processes.
J. Math. Kyoto Univ. ,4(1):161–170, 1964.[32] M. Mider, P.A. Jenkins, M. Pollock, G.O. Roberts, and M. Sørensen. Simulating bridgesusing confluent diffusions. Mathematics e-print 1903.10184, ArXiv, 2019.[33] M. Pollock, A.M. Johansen, and G.O. Roberts. On the exact and ε -strong simulation of(jump) diffusions. Bernoulli , 22(2):794–856, 2016.[34] C-H. Rhee and P. W. Glynn. Unbiased estimation with square root convergence for SDEmodels.
Operations Research , 63(5):1026–1043, 2015.[35] G. Rubino and B. Tuffin, editors.
Rare event simulation using Monte Carlo methods ,volume 73. Wiley Online Library, 2009.[36] M. Vill´en-Altamirano and J. Vill´en-Altamirano. RESTART: a straightforward methodfor fast simulation of rare events. In
Proceedings of Winter Simulation Conference , pages282–289. IEEE, 1994.[37] H. Wickham. ggplot2: Elegant Graphics for Data Analysis . Springer-Verlag New York,2016.. Springer-Verlag New York,2016.