A Koopman framework for rare event simulation in stochastic differential equations
Benjamin J. Zhang, Tuhin Sahai, and Youssef M. Marzouk
Department of Aeronautics & Astronautics, Center for Computational Science & Engineering, Massachusetts Institute of Technology
Raytheon Technologies Research Center
Abstract
We exploit the relationship between the stochastic Koopman operator and the Kolmogorov backward equation to construct importance sampling schemes for stochastic differential equations. Specifically, we propose using eigenfunctions of the stochastic Koopman operator to approximate the Doob transform for an observable of interest (e.g., associated with a rare event), which in turn yields an approximation of the corresponding zero-variance importance sampling estimator. Our approach is broadly applicable and systematic, treating non-normal systems, non-gradient systems, and systems with oscillatory dynamics or rank-deficient noise in a common framework. In nonlinear settings where the stochastic Koopman eigenfunctions cannot be derived analytically, we use dynamic mode decomposition (DMD) methods to compute them numerically, but the framework is agnostic to the particular numerical method employed. Numerical experiments demonstrate that even coarse approximations of a few eigenfunctions, where the latter are built from non-rare trajectories, can produce effective importance sampling schemes for rare events.
Understanding and quantitatively characterizing rare phenomena is important to modeling, design, and decision making in a variety of science and engineering disciplines. Examples include studying the failure of materials [1], predicting the insolvency of financial institutions [2], understanding the occurrence of rogue waves [3, 4], estimating reaction rates in computational chemistry [5], and assessing the reliability of aerospace systems [6]. Many of these examples involve dynamical systems forced by random noise, which is captured in the form of Brownian motion, and a key challenge is to compute the probabilities of noise-induced rare events and the predominant mechanisms by which they occur. These rare events are often associated with adverse outcomes and failures.

In this paper, we present a general framework for constructing Monte Carlo estimators of rare event probabilities, and of other expectations associated with rare events, in nonlinear stochastic differential equations (SDEs). Rare event simulation for SDEs is particularly challenging for two reasons. It first requires a faithful model, i.e., one that exhibits the rare phenomena of interest with sufficiently accurate probability. Second, a computationally efficient methodology is needed to produce the rare event, i.e., to characterize the tails of the relevant distributions. Performing the latter also elucidates the pathways or mechanisms leading to a rare event.

For SDEs, expectations with respect to the induced path-space probability measures are, generally, difficult to compute directly. Hence Monte Carlo methods are often used to estimate these expectations instead. Simple Monte Carlo methods, while robust, are inefficient for estimating expectations sensitive to rare events [7]. Since rare events by definition occur infrequently, the variance of a simple Monte Carlo estimator can be very large relative to the quantity of interest.
Furthermore, for rare events in SDEs that obey a large deviations principle, simple Monte Carlo methods require an exponentially increasing number of samples to maintain a constant relative error as the noise factor in the Brownian motion decreases linearly [8, 7]. For these reasons, a vast body of literature has focused on devising sampling methods that improve on simple Monte Carlo for rare event simulation [9].

Importance sampling for SDEs constitutes a major class of Monte Carlo methods for simulating rare events. Here, one simulates an alternative dynamical system whose trajectories reach the rare event more often. Each of these samples is then re-weighted according to its importance relative to the original SDE's distribution. These weights are given by the celebrated Girsanov theorem [10]. Multi-level splitting and subset simulation comprise a different class of adaptive Monte Carlo methods for rare events in which, over a series of iterations, one creates an artificial drift towards the rare event of interest. Splitting methods were first conceived in [11], with more computationally efficient methods proposed recently [12]. Subset simulation, on the other hand, was originally proposed in the engineering reliability literature [13] and has been widely adopted and improved upon by the civil engineering community [14]. While subset simulation can be used to estimate rare event probabilities in dynamical systems, it is typically used to sample static models. Links between subset simulation and sequential Monte Carlo are described in [15], and particle splitting has been extended to static and non-Markovian models in [16]. Importance sampling for SDEs, while simple to implement and easily parallelizable, is intrusive: in the context of SDEs, it requires altering the drift term of the model, which may be impossible when the model is given as a black box.
In contrast, particle splitting methods are applicable in black-box settings and often more stable than their importance sampling counterparts [9]. That said, splitting is generally more difficult to implement than importance sampling. Furthermore, to the authors' knowledge, there is no simple characterization of a multilevel splitting scheme that yields a zero-variance estimator.

Importance sampling and particle splitting have been further enhanced by large deviations theory [8, 17]. These methods exploit the large deviations principle as an alternative mechanism for characterizing rare events in dynamical systems, inform the implementation of splitting and importance sampling, and provide theoretical guarantees on estimator efficiency [7, 18, 19]. These methods also appear in the literature as so-called instanton-based sampling methods, where minimizers of the system's large deviations rate function describe how to push the system towards the rare event of interest [20, 21]. This approach was first noted in queuing theory [22], but has also been used for problems in the physical sciences. These enhancements are also related to variational approaches to importance sampling, where the alternative SDE is posed as the solution to a stochastic optimal control problem, which, in principle, can yield zero-variance estimators [23, 24, 25, 26]. The drawbacks of these approaches are also well noted. Large deviations-based approaches for sampling are optimal in an asymptotic sense, but counter-examples have been constructed to show that they can lead to larger variance than direct Monte Carlo [27]. And the computational effort required to solve stochastic optimal control problems can be untenable in high-dimensional settings.

In this paper, we propose a novel framework for rare event simulation that uses tools originating from dynamical systems theory and combines them with importance sampling.
Specifically, we show that the stochastic Koopman eigenfunctions associated with a given SDE can be used to accurately and efficiently approximate zero-variance importance sampling (IS) estimators. The approach leverages recent developments in Koopman operator approximation techniques and only assumes that the SDE is amenable to numerical Koopman analysis. We emphasize that our framework is agnostic to the numerical method used for approximating the Koopman operator.

The last decade has witnessed considerable interest in operator-theoretic and data-driven computational approaches for analyzing and manipulating nonlinear dynamical systems. The Koopman operator is a linear mapping on the space of observables of a given dynamical system [28, 29, 30]. Its existence provides a global linearization of the dynamics and enables spectral analysis for nonlinear systems. Moreover, the discovery that data-driven methods for dynamical systems such as dynamic mode decomposition (DMD) [31] (originally conceived in the fluid mechanics community) can effectively compute spectral objects of the Koopman operator [32] has led to their further development and widespread application.

In this work, we exploit the relationship between zero-variance sampling and the Koopman operator to show that DMD methods can be integrated with importance sampling to create new rare event simulation techniques. Our framework provides a systematic approach with general applicability. For example, existing rare event simulation techniques are often demonstrated on gradient systems, or on systems with normal dynamics. Our approach is also applicable to non-gradient systems, non-normal systems that display transient growth, and oscillatory dynamics.
This flexibility is critical for extending efficient dynamic rare event simulation to realistic engineering problems [6].

A key feature of our approach is that we leverage the data-driven nature of Koopman numerics to provide insight into rare events via simulation of non-rare trajectories. The ability to resolve Koopman eigenfunctions near the rare event using non-rare trajectories enables computation of a biasing that "pushes" importance sampling trajectories into the rare event regions. We show that even coarse approximations of the Koopman eigenfunctions using non-rare trajectories can produce good importance sampling estimators for rare event simulation. The method is asymptotically exact in the sense that as one employs a larger number of Koopman eigenfunctions, the variance of the corresponding importance sampling estimator tends towards zero. We provide a non-asymptotic analysis that describes how, under certain conditions, the second moment of the importance sampling estimator is bounded by a term that depends on how well the Koopman eigenfunctions approximate the observable of interest.

Let $\{X_t\}_{t \in [0,T]}$ be a time-homogeneous diffusion process evolving according to the SDE

$$ dX_t = A(X_t)\,dt + B(X_t)\,dW_t, \qquad X_0 = x, \qquad (1) $$

where $X_t$ is an element of $\mathbb{R}^d$, $A$ is a function from $\mathbb{R}^d$ to itself, $B$ is a function from $\mathbb{R}^d$ to the space of $d \times r$ real-valued matrices, and $W_t$ is a standard $r$-dimensional Brownian motion. To guarantee existence and uniqueness of a strong solution to the SDE, we assume the drift vector and diffusion matrix are locally Lipschitz in space [33]. We wish to estimate

$$ \rho = \mathbb{E}[f(X_T) \mid X_0 = x] = \int_{\mathbb{R}^d} f(x)\,\pi_T(x)\,dx, \qquad (2) $$

where $f(x)$ is non-negative and $\pi_T(x)$ is the probability density of the state at time $T$. Note that if $f(x)$ were an indicator function over some rare event of interest $E$, then $\rho$ would equal the probability of the state being in region $E$ at time $T$.
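To make the setup concrete, the following sketch estimates (2) by simple Monte Carlo, simulating (1) with an Euler–Maruyama discretization. The drift, diffusion, event, and all parameter values are illustrative choices for a 1D Ornstein–Uhlenbeck process, not quantities from the paper.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, T, n, rng):
    """Simulate one path of dX_t = A(X_t) dt + B(X_t) dW_t and return X_T."""
    dt = T / n
    x = x0
    for _ in range(n):
        x += drift(x) * dt + diffusion(x) * rng.gauss(0.0, math.sqrt(dt))
    return x

# Illustrative 1D example: Ornstein-Uhlenbeck drift with constant noise,
# and rho = P(X_T > 1) as the (moderately) rare event probability.
rng = random.Random(0)
drift = lambda x: -x           # A(x) = -x
diffusion = lambda x: 0.5      # B(x) = 0.5
f = lambda x: 1.0 if x > 1.0 else 0.0   # indicator of the event {X_T > 1}

M = 20000
rho_hat = sum(f(euler_maruyama(drift, diffusion, 0.0, 1.0, 100, rng))
              for _ in range(M)) / M
```

For these parameters the stationary fluctuations are small relative to the threshold, so most paths never reach the event; the estimate `rho_hat` is on the order of $10^{-3}$, previewing the inefficiency of simple Monte Carlo discussed next.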
We also assume that the system has an invariant distribution $\eta_\infty$.

In the next section, we review some theoretical tools for importance sampling in stochastic differential equations. In Section 3, we discuss the Koopman operator and related numerical methods, and we present our framework for constructing importance sampling estimators. In Section 4, we demonstrate the methodology on a range of illustrative stochastic dynamical systems. We analyze the variance of the importance sampling estimators produced by our methodology in Section 5. We conclude and discuss future work in Section 6.

We start with an overview of analytical tools for studying stochastic differential equations, including the infinitesimal generator and the Kolmogorov equations. Much of this discussion is based on Karatzas and Shreve [33], Øksendal [10], and Pavliotis [34]. We also review importance sampling in the context of SDEs and describe related approaches to rare event simulation based on stochastic optimal control and large deviations theory.

2.1 Kolmogorov equations

Let $X_t$ be defined by the SDE in (1). One of the primary tools for studying stochastic processes is the infinitesimal generator, defined as

$$ \mathcal{A} f = \lim_{t \to 0} \frac{\mathbb{E}[f(X_t) \mid X_0 = x] - f(x)}{t}, \qquad (3) $$

for $f \in \mathcal{D}_{\mathcal{A}}$, where $\mathcal{D}_{\mathcal{A}}$ is the set of functions for which the above limit exists for all $x \in \mathbb{R}^d$. For SDEs, a closed-form expression of the limit involves the drift and diffusion terms as follows:

$$ \mathcal{A} f = \langle A(x), \nabla f \rangle + \frac{1}{2} \mathrm{Tr}\big[ Q(x) \nabla^2 f \big] = \sum_{i=1}^d A_i(x) \frac{\partial f}{\partial x_i} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d Q_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}, \qquad (4) $$

where $Q(x) = B(x) B(x)^*$ and $f$ is a twice-continuously differentiable function on $\mathbb{R}^d$. The infinitesimal generator appears in the Kolmogorov equations, which are two PDEs that describe the evolution of densities and statistics of a given SDE. The Kolmogorov backward equation (KBE) describes the time evolution of expectations of functions of the state.
Let $\Phi(t,x) = \mathbb{E}^{t,x}[f(X_T)] := \mathbb{E}[f(X_T) \mid X_t = x]$ be defined on $t \in [0,T]$, where $T > 0$. Then

$$ \frac{\partial \Phi}{\partial t} + \mathcal{A} \Phi = 0, \qquad \Phi(T,x) = f(x). \qquad (5) $$

The Kolmogorov forward equation (KFE), also known as the Fokker–Planck equation, describes the evolution of the probability density function of the state. The equation is found by considering the $L^2$-adjoint of the infinitesimal generator. Let $\pi(t,x)$ be the probability density of $X_t$. Then

$$ \frac{\partial \pi}{\partial t} = \mathcal{A}^* \pi(t,x), \qquad \pi(0,x) = \pi_0(x), \qquad (6) $$

where the adjoint is

$$ \mathcal{A}^* \pi = -\nabla \cdot \big( A(x)\, \pi \big) + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2}{\partial x_i \partial x_j} \big( Q_{ij}(x)\, \pi \big). \qquad (7) $$

Theoretically, expectations such as (2) can be found via a direct solution of the KBE. The quantity of interest is simply an evaluation of the solution: $\rho = \Phi(0,x)$. However, solving the KBE exactly is expensive and increasingly intractable as the dimension of the state space grows. Furthermore, when one is interested in quantities such as rare event probabilities, the required solution accuracy typically becomes prohibitive. For this reason, we turn to sampling methods, in which multiple independent simulations of an SDE are performed to estimate expectations through a sample average. While a direct solution of the Kolmogorov equations may not be feasible, in what follows, we show that these equations can be used to approximate zero-variance estimators.

We now review some basic notions of Monte Carlo and importance sampling methods for SDEs. Let $P$ be the path-space measure induced by the SDE in (1). A simple Monte Carlo method for estimating $\rho$ involves generating $M$ independent simulations of the SDE, evaluating the function of interest $f(x)$ at the end of each sample path, and computing the sample average. We then have

$$ \rho \approx \hat\rho = \frac{1}{M} \sum_{i=1}^M f\big(X_T^{(i)}\big), \qquad (8) $$

where the samples $X^{(i)}$ are drawn independently from $P$. The efficiency of a Monte Carlo estimator is typically evaluated by considering its variance and relative error (also known as the coefficient of variation, i.e., the standard error divided by the quantity of interest) [35]. They are, respectively,

$$ \mathbb{V}[\hat\rho] = \frac{1}{M} \mathbb{V}[f(X_T)], \qquad (9) $$

$$ r_e = \frac{1}{\rho} \sqrt{\mathbb{V}[\hat\rho]}. \qquad (10) $$

A good unbiased estimator should have low variance, but when estimating a small $\rho$ such as a rare event probability, relative error is the better metric. This is because $r_e$ can still be large if $\rho$ is orders of magnitude smaller than $\sqrt{\mathbb{V}[\hat\rho]}$. In other words, our goal is to ensure that the standard deviation of the estimator scales in proportion with the probability of interest.

The inefficiency of simple Monte Carlo methods is clear when used to estimate rare event probabilities. Let $f(x) = \mathbf{1}_E(x)$, where $E \subset \mathbb{R}^d$ is a region of phase (or observable) space visited infrequently. The variance and relative error of the estimator are

$$ \mathbb{V}[\hat\rho] = \frac{\rho - \rho^2}{M} \approx \frac{\rho}{M}, \qquad r_e \approx \frac{1}{\sqrt{M \rho}}. $$
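The $1/\sqrt{M\rho}$ scaling of the relative error can be checked empirically. The sketch below compares the predicted relative error against the scatter of repeated simple Monte Carlo estimates of an illustrative Bernoulli "rare event" probability $p$; all values are assumptions for the demonstration.

```python
import math
import random

def mc_estimate(p, M, rng):
    """Simple Monte Carlo estimate of a Bernoulli probability p from M samples."""
    return sum(1 for _ in range(M) if rng.random() < p) / M

rng = random.Random(1)
p, M = 1e-3, 2000                        # illustrative rare-event probability and sample size
predicted_re = 1.0 / math.sqrt(M * p)    # r_e ~ 1/sqrt(M*rho) for small rho

# Empirical relative error across repeated independent estimates.
reps = 200
ests = [mc_estimate(p, M, rng) for _ in range(reps)]
mean = sum(ests) / reps
var = sum((e - mean) ** 2 for e in ests) / (reps - 1)
empirical_re = math.sqrt(var) / p
```

With $Mp = 2$ expected hits per estimate, both predicted and empirical relative errors come out near $1/\sqrt{2} \approx 0.71$: even at 2000 samples, the estimator's standard deviation is comparable to the probability being estimated.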
We can see that the number of samples required to keep the relative error below 1 is $O(1/\rho)$. This task is particularly intractable when it is computationally expensive to procure samples from the dynamical system. In simple Monte Carlo, the only way one can reduce the variance of the estimator is by increasing the number of samples in each estimate. Variance reduction methods pursue different mechanisms for reducing the variance beyond simply increasing the number of samples.

One common variance reduction approach is importance sampling, which involves drawing samples from an alternative probability measure, $Q$, that is absolutely continuous with respect to the original probability distribution, such that the variance of the resulting estimator is reduced. To account for the bias introduced when sampling from the alternative probability distribution, each sample is weighted according to its relative importance with respect to the original measure $P$. In particular,

$$ \hat\rho_{IS} = \frac{1}{M} \sum_{i=1}^M f\big(\tilde X_T^{(i)}\big) \frac{dP}{dQ}\big(\tilde X^{(i)}\big), \qquad (11) $$

where $\tilde X^{(i)}$ are drawn independently from $Q$. The variance of this estimator is dependent on the product of the function $f(x)$ and the likelihood ratio between $P$ and $Q$, and on the number of samples drawn:

$$ \mathbb{V}[\hat\rho_{IS}] = \frac{1}{M} \mathbb{V}_Q\left[ f(\tilde X_T)\, \frac{dP}{dQ} \right]. \qquad (12) $$

Thus, designing a measure $Q$ provides an additional mechanism to reduce the variance of the sampling method. For SDE systems, the only admissible choice of $Q$ is induced by another SDE system $\{\tilde X_t\}_{t \in [0,T]}$ with the same diffusion term as the original SDE and a different drift term [10, 36]:

$$ d\tilde X_t = \big[ A(\tilde X_t) + B(\tilde X_t)\, u(t, \tilde X_t) \big]\, dt + B(\tilde X_t)\, dW_t, \qquad \tilde X_0 = x. \qquad (13) $$

Here $u(t,x)$ is called the biasing function. This function serves as a feedback controller that guides the system such that the resulting importance sampling estimator has lower variance.
The likelihood ratio is now given by Girsanov's theorem [10, 33]:

$$ Z(\tilde X) \equiv \frac{dP}{dQ}(\tilde X) = \exp\left( -\int_0^T \langle u(t, \tilde X_t), dW_t \rangle - \frac{1}{2} \int_0^T \| u(t, \tilde X_t) \|^2\, dt \right). \qquad (14) $$

The task now is to choose $u(t,x)$ such that the variance of the resulting importance sampling estimator is smaller, or better yet, zero. Assuming that $f(x)$ is twice-continuously differentiable and strictly positive, there exists a choice of $u(t,x)$ that leads to a zero-variance importance sampling estimator. This choice is the celebrated Doob h-transform [7, 37, 38].
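As a sketch of how (13) and (14) combine in practice, the following code simulates a controlled 1D Ornstein–Uhlenbeck process with a crude, hand-chosen bias and accumulates the Girsanov log-weight step by step. The bias `u` and all parameter values are illustrative assumptions, not the Doob-transform biasing derived below.

```python
import math
import random

def is_path(drift, sigma, u, x0, T, n, rng):
    """Simulate the controlled SDE (13); return (X_T, Girsanov weight dP/dQ) per (14)."""
    dt = T / n
    x, log_w = x0, 0.0
    for k in range(n):
        t = k * dt
        dW = rng.gauss(0.0, math.sqrt(dt))
        b = u(t, x)
        log_w += -b * dW - 0.5 * b * b * dt   # accumulate -int <u,dW> - 1/2 int |u|^2 dt
        x = x + (drift(x) + sigma * b) * dt + sigma * dW
    return x, math.exp(log_w)

rng = random.Random(2)
drift = lambda x: -x            # original drift A(x) = -x (illustrative)
sigma, T, n = 0.5, 1.0, 100
f = lambda x: 1.0 if x > 1.0 else 0.0       # rare event {X_T > 1}
u = lambda t, x: 2.0 if x < 1.0 else 0.0    # crude push toward the event (assumption)

M, total = 5000, 0.0
for _ in range(M):
    xT, w = is_path(drift, sigma, u, 0.0, T, n, rng)
    total += f(xT) * w
est = total / M
```

Under the biased dynamics many paths reach the event, and the Girsanov weights restore unbiasedness, so `est` recovers the small probability with far fewer samples than simple Monte Carlo would need.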
Theorem 1 (Doob $h$-transform). Let $f \in C^2$ be strictly positive. Let $\Phi(t,x) = \mathbb{E}^{t,x}[f(X_T)]$ be the solution to

$$ \frac{\partial \Phi}{\partial t} + \mathcal{A} \Phi = 0, \qquad \Phi(T,x) = f(x). \qquad (15) $$

Then using the biasing function

$$ u(t,x) = B^*(x)\, \nabla \log \Phi(t,x) \qquad (16) $$

in (13) will satisfy

$$ f(\tilde X_T) \exp\left( -\int_0^T \langle u(t, \tilde X_t), dW_t \rangle - \frac{1}{2} \int_0^T \| u(t, \tilde X_t) \|^2\, dt \right) = \Phi(0,x). \qquad (17) $$

The proof is provided in Appendix A. This choice of biasing results in a zero-variance estimator for $\rho$, since $\rho = \Phi(0,x)$. This result should not be surprising: having access to the exact solution to the KBE enables construction of a Monte Carlo estimator with zero variance, since an evaluation of the solution is, itself, a zero-variance estimator. Though this relationship might seem tautological, it provides useful insights for devising efficient rare event simulation techniques.

Previous approaches recast this problem in terms of optimal control. By defining a new function $U(t,x) = -\log \Phi(t,x)$, one can obtain a PDE for $U(t,x)$ by performing a change of variables on the KBE. The resulting PDE is known as a stochastic Hamilton–Jacobi–Bellman (HJB) equation, which can be reformulated as a stochastic optimal control problem. In [23], the authors opt to solve the stochastic optimal control problem directly by using this formulation in conjunction with the Donsker–Varadhan variational formula. One can further recast the problem in terms of the solution of a system of forward–backward SDEs [25]. This approach also admits a cross-entropy interpretation for importance sampling for SDEs [26].

A similar approach incorporates the theory of large deviations, specifically the Freidlin–Wentzell theory for small-noise diffusions [39]. Here, one considers a noise parameter $\epsilon$ that scales the diffusion term, by replacing $B(x)$ with $\sqrt{\epsilon}\, B(x)$.
Then by considering the variable transformation $U^\epsilon(t,x) = -\epsilon \log \Phi(t,x)$ and sending $\epsilon$ to zero, one obtains a Hamilton–Jacobi equation whose solution is related to the large deviations rate function of the system [7]. It was found that subsolutions of this Hamilton–Jacobi equation [18, 40, 41], for diffusion processes on $\mathbb{R}^d$ and in function space [42], result in provably asymptotically efficient estimators. However, the drawback of this approach is that the subsolution of the Hamilton–Jacobi equation must be "guessed," which is not always straightforward. Note that exact solutions of the resulting deterministic optimal control problem lead to strongly efficient estimators for SDEs [7].

As written, Theorem 1 does not apply when $f$ is an indicator function. This is an artifact of the simple way we have chosen to express the result. It is straightforward to modify it by conditioning on the event that $X_T$ enters a particular region; indeed, the Doob transform was originally derived just for conditioned processes [37]. Also, our numerical experiments will mollify the indicator function in constructing numerical approximations of the Doob transform, such that Theorem 1 applies directly.

Our approach, described below in Section 3, will avoid both of the above reformulations by directly computing approximate Doob transforms using approximate solutions to the KBE. These solutions of the KBE will be expressed in terms of the eigenfunctions of the stochastic Koopman operator. Our approach can also be related to the work of [43], in which the authors combine trajectory data from molecular dynamics simulations with nonlinear manifold learning techniques to inform the exploration of rare regions of state space. However, their technique is restricted to gradient systems for computational chemistry applications.

The problem posed in (2) is just one of many scenarios that are of interest in rare event simulation.
In this paper, we only consider the problem of the state being in some region of interest at some fixed future time $T$. Another common problem is to compute the probability of entering some region, $E$, before another, $F$: hence $P(X_\tau \in E)$, where $\tau = \inf\{ t > 0 : X_t \in E \cup F \}$. A variation of this problem considers path-dependent quantities, which involve functionals of sample trajectories. These problems are well studied in the computational chemistry community, where one seeks rare paths between long-lived molecular configurations [5, 24]. This quantity of interest is associated with the solution of a boundary value problem, and its approximation can also be used for sampling. The application of data-driven dynamical systems methodologies to this problem has been studied in [44].

Another quantity of interest is the probability of entering some set of interest within a fixed finite time interval, i.e., $P(\tau \le T)$ where $\tau = \inf\{ t > 0 : X_t \in E \}$. This problem is associated with escaping from attracting sets of a dynamical system and is well studied in [41, 45]. In this case, asymptotically efficient importance sampling estimators are designed by considering an initial-boundary value problem associated with the KBE.

We first review the deterministic and stochastic Koopman operators, and discuss how they can be used to approximate expectations and probabilities. We then describe how we will use the stochastic Koopman operator to construct importance sampling schemes for SDEs.
A traditional approach to analyzing dynamical systems involves simulating the evolution of states. The Koopman operator [28] provides an alternative perspective: it represents the dynamical system in terms of the evolution of observables. The key advantage is that the evolution of observables is linear even when the underlying system is nonlinear, thus enabling spectral analysis of nonlinear systems [29].

Let $x_t$ be an autonomous dynamical system on $\mathbb{R}^d$ evolving according to $\dot x = a(x)$. Let $F^t$ be the flow map; that is, if $x_0$ is the initial condition, then $x_t = F^t x_0$. Let $f : \mathbb{R}^d \to \mathbb{R}$ be an observable in some space of functions $\mathcal{H}$. The Koopman operator (KO) is defined as

$$ \mathcal{K}^t f(x) = (f \circ F^t)(x). \qquad (18) $$

It is trivial to show that the KO is linear even when the dynamical system is nonlinear. This property allows one to study the eigenfunctions and eigenvalues of the operator. A function $\phi(x)$ is a Koopman eigenfunction if it satisfies $\mathcal{K}^t \phi(x) = e^{\lambda t} \phi(x)$, where $\lambda$ is the corresponding Koopman eigenvalue.

The stochastic Koopman operator (sKO) is defined in a similar fashion [46]. We focus our attention on random dynamical systems that evolve according to SDEs as defined in (1). Let $\{X_t\}_{t \in [0,T]}$ be a stochastic process and let $f$ be a twice continuously differentiable real-valued observable. Then the stochastic Koopman operator is defined as

$$ \mathcal{K}^t f(x) = \mathbb{E}[f(X_t) \mid X_0 = x] = \mathbb{E}^{0,x}[f(X_t)], \qquad (19) $$

where the expectation is taken over the distribution of the state of the stochastic process at time $t$. Analogous to the deterministic setting, the sKO is also linear, enabling the spectral analysis of nonlinear SDEs. The evolution of the expectation of the sKO's eigenfunctions at future times is simple to determine: if $\phi(x)$ is an eigenfunction of the sKO, then $\mathbb{E}[\phi(X_t) \mid X_0 = x] = e^{\lambda t} \phi(x)$. Thus, the time evolution of certain observables of the dynamical system can be determined computationally.
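The eigenfunction property $\mathbb{E}[\phi(X_t) \mid X_0 = x] = e^{\lambda t}\phi(x)$ can be verified by simulation. For the 1D Ornstein–Uhlenbeck process $dX_t = -\theta X_t\, dt + \sigma\, dW_t$, the observable $\phi(x) = x$ is an sKO eigenfunction with eigenvalue $\lambda = -\theta$; the sketch below (illustrative parameters) compares a Monte Carlo average against $e^{-\theta t} x_0$.

```python
import math
import random

theta, sigma, x0, t_final = 1.5, 0.7, 2.0, 0.5   # illustrative parameters
rng = random.Random(3)

def ou_endpoint(x, t, n):
    """Euler-Maruyama simulation of dX = -theta*X dt + sigma dW up to time t."""
    dt = t / n
    for _ in range(n):
        x += -theta * x * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
    return x

# phi(x) = x is an sKO eigenfunction of the OU process with eigenvalue -theta,
# so E[phi(X_t) | X_0 = x0] = exp(-theta * t) * x0.
M = 40000
mc = sum(ou_endpoint(x0, t_final, 50) for _ in range(M)) / M
exact = math.exp(-theta * t_final) * x0
```

Up to Monte Carlo noise and discretization bias, `mc` matches `exact`, confirming the advertised exponential evolution of eigenfunction expectations.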
Assuming that the sKO eigenfunctions exist and form a basis for a suitable function space, expectations and probabilities associated with an SDE can, in principle, be calculated from all the eigenfunctions. Specifically, we can write the expectation of an observable at some fixed time in terms of the expectations of the sKO eigenfunctions, by first expressing the observable as a linear combination of these eigenfunctions.

A finite collection of eigenfunctions can thus provide an approximation to the expectations and probabilities of interest. Let $f$ represent some observable of interest and $\{\phi_i(x)\}_{i=1}^N$ be a collection of $N$ eigenfunctions of the sKO with corresponding eigenvalues $\{\lambda_i\}_{i=1}^N$. Approximating the observable in terms of the eigenfunctions gives

$$ f(x) \approx \sum_{i=1}^N f_i\, \phi_i(x), \qquad (20) $$

and hence,

$$ \mathbb{E}[f(X_t) \mid X_0 = x] \approx \sum_{i=1}^N f_i\, \mathbb{E}^{0,x}[\phi_i(X_t)] = \sum_{i=1}^N f_i\, \mathcal{K}^t \phi_i(x) = \sum_{i=1}^N f_i\, e^{\lambda_i t} \phi_i(x). \qquad (21) $$

For rare event probabilities, it suffices to replace $f$ with an indicator function over the rare set of interest; that is, to compute $P(X_T \in E \mid X_0 = x)$, one would choose $f(x) = \mathbf{1}_E(x)$. In general, making this approximation accurate may require accurately computing many sKO eigenfunctions, which may not be practical in most settings. Instead we can combine this idea with importance sampling, as follows.

The sKO eigenfunctions can be used to create approximate solutions to the Kolmogorov backward equation. For continuous-time autonomous dynamical systems, the set of stochastic Koopman operators $\{\mathcal{K}^t\}_{t \in [0,\infty)}$ forms a one-parameter semigroup indexed by time. All elements of the semigroup share the same eigenfunctions, with eigenvalues that vary with the parameter value. The generator of the semigroup is identically the infinitesimal generator of the SDE.
That is, the generator of the sKO semigroup is exactly the evolution operator of the KBE. While this connection has been studied in stochastic analysis since the time of Kolmogorov, it is made most explicit in [46].

For SDEs that admit an invariant measure and whose generators are compact and self-adjoint, the spectral theorem guarantees the existence of eigenvalues and a complete orthonormal set in $L^2(\eta_\infty)$, where $\eta_\infty$ is the invariant measure. A frequently studied class of systems that admits a complete set of eigenfunctions is reversible diffusions. One example of a reversible diffusion occurs when the drift term is the gradient of a potential function and the diffusion matrix is the identity. In these cases, the solutions to the Kolmogorov equations can be found via eigenfunction expansions; see [34] for further details. Irreversible Ornstein–Uhlenbeck processes with invariant measure $\nu$ have also been shown to admit a complete basis of eigenfunctions on $L^p(\nu)$ for $p > 1$.

With this knowledge, we can construct importance sampling estimators for nonlinear SDEs. Observe that (21) provides an approximation to the quantity of interest in (2). Rather than using it directly to estimate the probability of the rare event, we use it to approximate the Doob transform. Observe that

$$ \tilde\Phi(t,x) = \sum_{i=1}^N f_i\, e^{\lambda_i (T - t)} \phi_i(x) \qquad (22) $$

is an approximate solution to the KBE in (5). Then we can use the approximate Doob transform,

$$ \tilde u(t,x) = B(x)^* \, \frac{ \sum_{i=1}^N f_i\, e^{\lambda_i (T-t)} \nabla \phi_i(x) }{ \sum_{i=1}^N f_i\, e^{\lambda_i (T-t)} \phi_i(x) }, \qquad (23) $$

to construct a new importance sampling scheme via (13) and (14). Intuitively, if $\tilde\Phi$ is a good approximation of the true solution, then the approximate Doob transform will be a good approximation to the true Doob transform, with the guarantee that if there exists a complete set of eigenfunctions, then the estimator will have zero variance as $N \to \infty$.

In practice, this framework offers considerable flexibility.
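A minimal sketch of evaluating the approximate Doob transform (23), assuming two analytically known eigenpairs of the 1D Ornstein–Uhlenbeck process (the constant function with $\lambda = 0$, and $\phi(x) = x$ with $\lambda = -\theta$); the expansion coefficients are placeholders rather than regression output.

```python
import math

theta, sigma, T = 1.0, 0.5, 1.0   # illustrative parameters; B(x) = sigma (scalar)

# Two assumed sKO eigenpairs of the 1D OU process:
#   phi1(x) = 1 with lambda1 = 0, and phi2(x) = x with lambda2 = -theta.
# Each tuple stores (lambda, phi, phi').
eigs = [(0.0, lambda x: 1.0, lambda x: 0.0),
        (-theta, lambda x: x, lambda x: 1.0)]
coeffs = [1.0, 0.8]   # placeholder expansion coefficients f_i

def u_tilde(t, x):
    """Approximate Doob biasing (23): sigma * sum f_i e^{l_i(T-t)} phi_i'(x) / sum f_i e^{l_i(T-t)} phi_i(x)."""
    num = sum(f * math.exp(lam * (T - t)) * dphi(x)
              for f, (lam, phi, dphi) in zip(coeffs, eigs))
    den = sum(f * math.exp(lam * (T - t)) * phi(x)
              for f, (lam, phi, dphi) in zip(coeffs, eigs))
    return sigma * num / den
```

The resulting `u_tilde` is exactly the feedback controller plugged into the controlled SDE (13); note how the time-dependent factors $e^{\lambda_i (T-t)}$ reweight the eigenfunctions as the horizon $T$ approaches.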
While (21) provides an approximation to the quantity of interest, the errors introduced by truncation, and any additional errors resulting from numerical approximations of the eigenfunctions themselves, cannot easily be characterized. Instead, using the approximation within the Doob transform allows us to resolve these errors through Monte Carlo simulation. Our numerical experiments will demonstrate that even crude approximations of a few sKO eigenfunctions, where the latter are built from non-rare trajectories, can be used to build effective importance sampling methods for rare event probabilities. Moreover, the dynamics of the controlled SDE system (13) naturally reveal the most likely paths to the rare event.

Next we discuss numerical techniques for approximating the sKO eigenfunctions. Dynamic mode decomposition (DMD) methods are a class of data-driven methods that can compute eigenvalues and eigenfunctions of a (stochastic) dynamical system's (stochastic) Koopman operator. The original DMD method was presented in [31] as a means of model reduction for complex fluid flows. Low-dimensional behavior was extracted from time series data comprising snapshots of high-fidelity fluid dynamics simulations. The connection between DMD and the spectral objects of the Koopman operator was made clear by [32, 48], and there has since been considerable interest in developing more effective and efficient DMD methodologies and variants.

DMD methodologies typically use only sample trajectories of the system to compute the Koopman eigenvalues and eigenfunctions, by indirectly approximating the infinitesimal generator [46, 49]. To avoid introducing errors due to these approximations of the generator, we use the analytical form of the SDE, which in turn provides access to the exact form of the generator of the sKO semigroup. In particular, we compute Koopman eigenfunctions and eigenvalues using a recently developed variant of DMD called infinitesimal generator EDMD [50].
The approach is based on using the stochastic Koopman generator in (3) directly. We summarize the main steps in the approach here.

Fix a set of test points $\{x_i\}_{i=1}^m$ drawn from a probability measure $\mu$ and a set of twice continuously differentiable basis functions $\{\psi_k(x)\}_{k=1}^n$. Suppose the stochastic process $\{X_t\}$ evolves according to (1). The main idea is to project the action of the Koopman generator onto the basis functions. (In our numerical experiments, we find that collecting test points from sample trajectories tends to produce better results, when validated on separate testing data, than prescribing some arbitrary measure $\mu$.) Following the notation of [50], let $\psi(x) = [\psi_1(x), \ldots, \psi_n(x)]^T$, define $d\psi_k(x) := (\mathcal{A} \psi_k)(x)$, and define

$$ d\Psi_X = \begin{bmatrix} d\psi_1(x_1) & \cdots & d\psi_1(x_m) \\ \vdots & \ddots & \vdots \\ d\psi_n(x_1) & \cdots & d\psi_n(x_m) \end{bmatrix}, \qquad \Psi_X = \begin{bmatrix} \psi_1(x_1) & \cdots & \psi_1(x_m) \\ \vdots & \ddots & \vdots \\ \psi_n(x_1) & \cdots & \psi_n(x_m) \end{bmatrix}. \qquad (24) $$

Let $K$ be the finite-dimensional representation of $\mathcal{A}$. The task is to find the matrix $K \in \mathbb{R}^{n \times n}$ such that the residual $\| d\Psi_X - K \Psi_X \|_F$ is minimized, where $\| \cdot \|_F$ is the Frobenius norm. Each column of $K$ is the solution to a least-squares problem, and it can be shown that $K = d\Psi_X \Psi_X^+$, where $+$ denotes the pseudoinverse. Furthermore, [50] shows that as the number of test points $m \to \infty$, this DMD method converges to a Galerkin projection onto the span of the basis functions with respect to $\mu$. Specifically, it is shown that

$$ K = d\Psi_X \Psi_X^+ = (d\Psi_X \Psi_X^T)(\Psi_X \Psi_X^T)^+ = \hat A \hat G^+, \qquad (25) $$

where

$$ \hat A = \frac{1}{m} \sum_{i=1}^m d\psi(x_i)\, \psi(x_i)^T, \qquad \hat G = \frac{1}{m} \sum_{i=1}^m \psi(x_i)\, \psi(x_i)^T. \qquad (26) $$

And as the number of test points goes to infinity,

$$ \lim_{m \to \infty} \hat A_{ij} = \int (\mathcal{A} \psi_i)(x)\, \psi_j(x)\, d\mu, \qquad \lim_{m \to \infty} \hat G_{ij} = \int \psi_i(x)\, \psi_j(x)\, d\mu. \qquad (27) $$

The quality of the approximated eigenfunctions and eigenvalues will depend on the choice of basis functions and test-point measure $\mu$.
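The construction $K = d\Psi_X \Psi_X^+$ can be sketched in a few lines. The example below uses an assumed 1D Ornstein–Uhlenbeck process with the monomial basis $\{1, x, x^2\}$, for which the generator maps the basis into its own span, so the finite-dimensional representation recovers the generator eigenvalues $\{0, -\theta, -2\theta\}$ essentially exactly.

```python
import numpy as np

theta, sigma = 1.0, 0.5   # illustrative OU parameters
rng = np.random.default_rng(4)

# Basis {1, x, x^2}; OU generator: (A psi)(x) = -theta*x*psi'(x) + (sigma^2/2)*psi''(x).
def psi(x):
    return np.array([np.ones_like(x), x, x**2])

def dpsi(x):
    return np.array([np.zeros_like(x),
                     -theta * x,
                     sigma**2 * np.ones_like(x) - 2.0 * theta * x**2])

x = rng.normal(0.0, 1.0, size=500)      # test points from a measure mu (assumed Gaussian)
Psi, dPsi = psi(x), dpsi(x)             # the n-by-m matrices in (24)
K = dPsi @ np.linalg.pinv(Psi)          # K = dPsi_X Psi_X^+, as in (25)

evals = np.sort(np.linalg.eigvals(K).real)   # approximates {-2*theta, -theta, 0}
```

Because $\mathcal{A}$ leaves the span of this basis invariant, the least-squares residual is zero and the computed spectrum is exact; for a generic basis, the eigenvalues would instead be Galerkin approximations whose quality depends on the basis and on $\mu$, as noted above.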
Given a finite collection of sKO eigenfunctions, we can approximate solutions to the KBE, and hence the Doob transform, without having to solve the stochastic optimal control problems associated with existing rare event sampling methods. Computing these sKO eigenfunctions, as described in the previous section, is the first numerical challenge of our approach. The second challenge is to approximate the observable \( f \) as a linear combination of the eigenfunctions. We tackle this very simply, using linear regression with a least-squares objective.

Formulating this regression problem precisely, and ensuring that the results can be used to define an appropriate biasing function via (23), requires resolving two issues. First is the choice of regression points. We construct the regression problem using the same point set used to compute the sKO eigenfunctions, as described in the previous subsection. For rare event simulation, i.e., \( f(x) = \mathbf{1}_E(x) \), properly representing the indicator function demands that some regression points lie inside the event of interest \( E \). In our approach, we simulate many trajectories of the original system with different initial conditions throughout the domain and then subsample each trajectory to generate the regression (and EDMD) points. We assume that the user knows where the rare event \( E \) lies in state space, but has little idea how the system reaches it. To ensure that we have regression points inside \( E \), we begin many of the sample trajectories inside the event of interest. In our numerical experiments, for instance, the initial conditions are uniformly spaced over some subset of the state space that contains a portion of the rare event and the initial condition \( x_0 \). (Further details are given in Section 4.)

Now suppose that \( \{\phi_i\}_{i=1}^N \) are the computed sKO eigenfunctions, and let \( \{x_j\}_{j=1}^m \) denote the regression points. Let \( f = (f_1, \ldots, f_N) \in \mathbb{R}^N \) be the expansion coefficients in (20), \( F \in \mathbb{R}^m \) be the evaluations of \( f \) at the regression points, and \( C \in \mathbb{R}^{m \times N} \) be the design matrix with \( C_{ji} = \phi_i(x_j) \). We then solve the least-squares problem

\[ \min_{f \in \mathbb{R}^N} \| F - Cf \|^2. \quad (28) \]

Next, recall that the Doob transform requires the approximate KBE solution (22) to be strictly positive. This property is not guaranteed by linear regression onto the eigenfunctions. One could add positivity constraints at the regression points to (28), but instead we correct "afterwards" by adding a constant to the approximate KBE solution produced by the regression. This correction does not affect the consistency of the sampling approach, because the constant function is always an sKO eigenfunction. The value of the approximated observable \( \tilde{f}(x) = \sum_{i=1}^N f_i \phi_i(x) \) at each of the regression points can be found by computing \( Cf \). Assuming that the regression points sufficiently sample the relevant parts of the state space, we simply take the minimum of these values, denoted by \( -\epsilon \), and replace the coefficient \( f_1 \) of the constant sKO eigenfunction with \( f_1 + \max(\epsilon, 0) \). Adding a constant will not affect the direction of the biasing function \( \tilde{u}(t, x) \). The magnitude of \( \tilde{u} \) will be diminished, however, since adding a constant increases the magnitude of the denominator in (23). This correction may thus cause the biasing to be too small to push the state into the rare event. To address this issue, we scale the biasing function by a multiplicative factor \( c \geq \)
1, to ensure that a sufficient number of trajectories reach the rare event when performing importance sampling; our final biasing is thus \( \tilde{u}(t, x) = c\,B(x)^* \nabla \log \tilde{\Phi}(t, x) \). In practice, we adjust \( c \) after finding the Doob transform, by simulating small batches of the controlled system with different \( c \) values and choosing a value such that a sufficient fraction of samples reach the rare event. Our approach is summarized in Algorithm 1. In the next section, we provide further details on our implementation of the regression method and explore how the numerical choices above affect the performance of importance sampling. Algorithm 1:
Approximating the Doob transform.
Input: SDE \( \mathrm{d}X_t = A(X_t)\,\mathrm{d}t + B(X_t)\,\mathrm{d}W_t \) and observable \( f(x) \).
Output: Approximate Doob transform \( \tilde{u}(t, x) \).
1. Generate test points \( \{x_j\}_{j=1}^m \) from sample trajectories with different initial conditions.
2. Apply generator EDMD to obtain sKO eigenfunctions \( \{\phi_i(x)\}_{i=1}^N \) and eigenvalues \( \{\lambda_i\}_{i=1}^N \). Alternatively, for linear systems, the OU eigenfunctions are computed exactly.
3. Approximate \( f(x) \approx \tilde{f}(x) = \sum_{i=1}^N f_i \phi_i(x) \) via regression.
4. If necessary, increase \( f_1 \) so that \( \tilde{f}(x_j) > 0 \) for all \( j \).
5. The approximate solution to the KBE is \( \tilde{\Phi}(t, x) = \sum_{i=1}^N f_i e^{\lambda_i (T - t)} \phi_i(x) \).
6. The approximate Doob transform (biasing) is \( \tilde{u}(t, x) = c\,B(x)^* \nabla \log \tilde{\Phi}(t, x) \). Choose \( c \) such that a sufficient number of trajectories reach the rare event.

We demonstrate our framework on a series of linear and nonlinear stochastic dynamical systems. The impact of numerical parameters used to construct the biasing is first explored through a simple example involving a one-dimensional Ornstein–Uhlenbeck process. We then demonstrate the generality of our approach for linear dynamical systems with additive noise, by applying it to a non-normal linear SDE, a noisy Brownian oscillator, and the stochastic advection-diffusion equation (which is an infinite-dimensional system). We then turn to several nonlinear SDEs, where we show how the approach enables escape from different types of attractors. The stochastic ODE systems are integrated numerically using a stochastic Runge–Kutta scheme [38, 51]. The stochastic PDE system is integrated using exponential Euler methods [52]. Since our importance sampling estimator is unbiased, it suffices to report the variance of the importance sampling weight, as seen in (12). Without loss of generality, we will also report the relative error defined in (10) with M = 1, i.e., the relative error per sample.
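The regression and positivization steps of Algorithm 1 can be sketched as follows; the basis, points, and event threshold below are synthetic stand-ins for illustration, with column 0 of the design matrix playing the role of the constant eigenfunction.

```python
import numpy as np

# Sketch of the regression and positivization steps. The basis (1, He_1, He_2),
# the regression points, and the event threshold are illustrative assumptions.

x = np.linspace(-3.0, 3.0, 400)                        # regression points x_j
C = np.column_stack([np.ones_like(x), x, x**2 - 1.0])  # C_{ji} = phi_i(x_j)
F = (x > 1.5).astype(float)                            # observable f at the points

f, *_ = np.linalg.lstsq(C, F, rcond=None)              # min ||F - C f||^2, cf. (28)

vals = C @ f                                           # f~ at the regression points
eps = -vals.min()                                      # minimum value, written as -eps
f[0] += max(eps, 0.0)                                  # shift the constant coefficient

print((C @ f).min())                                   # now nonnegative at every point
```

A plain quadratic fit of a step function dips below zero, so the shift is active here; after it, the fitted observable is nonnegative at all regression points, as the positivization step requires.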
4.1 Illustrative one-dimensional SDE

We first consider a simple one-dimensional Ornstein–Uhlenbeck (OU) process to illustrate our approach and to highlight numerical challenges that occur in more complex examples as well. Let \( X_t \in \mathbb{R} \) evolve according to

\[ \mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t, \qquad X_0 = 0. \quad (29) \]

Our goal is to estimate \( \rho = \mathbb{P}(X_T \geq \cdot \mid X_0 = 0) = \mathbb{E}[\mathbf{1}_{X_T \geq \cdot} \mid X_0 = 0] \), where T = 1. For this problem, the marginal density at time T can be derived analytically, and the exact value of \( \rho \) (to five digits) is 1.· × 10^-·.

The stochastic Koopman generator of the system is

\[ \mathcal{A}\psi = -x\psi' + \psi'', \quad (30) \]

and the associated eigenvalue problem is known as the Hermite differential equation, whose solutions can be found in closed form. The eigenfunctions are the probabilists' Hermite polynomials \( \phi_n(x) = \mathrm{He}_n(x) \) with eigenvalues \( \lambda_n = -n \) for \( n \in \mathbb{N} \). We use least-squares regression to find the expansion coefficients in (20). In this case, the eigenfunctions are orthogonal with respect to the standard Gaussian distribution, which implies that an optimal approximation of the indicator function in a weighted \( L^2(\nu) \) sense (where \( \nu \) is the standard Gaussian measure) could be found by integrating the product of the indicator and an eigenfunction over the standard Gaussian measure. More generally, if the diffusion is reversible, then its eigenfunctions will be orthogonal with respect to the invariant distribution of the system [34]. However, this approach is impractical for higher-dimensional systems, as it would require computing several high-dimensional integrals, each of which is sensitive to the rare event. Therefore, to keep this example consistent with the results for the more complicated systems, we perform regression as described in Section 3.4. We perform our regression with test points drawn from a distribution with more probability mass in the rare event than the invariant distribution.
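The construction for this example can be sketched in a few lines. The code below, an illustration with an assumed event threshold of 2.0 and low polynomial degree (not necessarily the values used in our experiments), fits a mollified indicator in the Hermite basis and verifies that the resulting \( \tilde{\Phi}(t, x) = \sum_i f_i e^{\lambda_i (T - t)} \mathrm{He}_i(x) \) satisfies the KBE \( \partial_t \Phi - x\,\partial_x \Phi + \partial_x^2 \Phi = 0 \) away from the terminal time.

```python
import numpy as np
from numpy.polynomial.hermite_e import HermiteE

T, p = 1.0, 2                                    # horizon and polynomial degree (illustrative)
lam = -np.arange(p + 1)                          # He_n has generator eigenvalue -n
He = [HermiteE.basis(n) for n in range(p + 1)]

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.5, 500)                    # regression points biased toward the tail
target = 0.5 * (1.0 + np.tanh(3.0 * (x - 2.0)))  # mollified indicator (threshold assumed)
C = np.column_stack([h(x) for h in He])
f, *_ = np.linalg.lstsq(C, target, rcond=None)

def Phi(t, x):
    # Phi(t, x) = sum_i f_i exp(lam_i (T - t)) He_i(x), cf. (22)
    return sum(fi * np.exp(li * (T - t)) * h(x) for fi, li, h in zip(f, lam, He))

# Each term solves the KBE  dPhi/dt - x dPhi/dx + d2Phi/dx2 = 0; check the residual.
xs, t, dt = np.linspace(-2.0, 3.0, 50), 0.4, 1e-5
dPhidt = (Phi(t + dt, xs) - Phi(t - dt, xs)) / (2 * dt)
dPhidx = sum(fi * np.exp(li * (T - t)) * h.deriv()(xs) for fi, li, h in zip(f, lam, He))
d2Phidx2 = sum(fi * np.exp(li * (T - t)) * h.deriv(2)(xs) for fi, li, h in zip(f, lam, He))
residual = dPhidt - xs * dPhidx + d2Phidx2
print(np.abs(residual).max())                    # near zero: the PDE is satisfied exactly
```

Only the terminal condition is approximated; the PDE itself is solved exactly by the eigenfunction expansion, which is the key property exploited in this section.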
Specifically, we draw m = 50 independent samples \( x_i \) from a zero-mean Gaussian whose variance is inflated so that roughly 15% of points fall inside the region of interest. To mitigate the Gibbs phenomenon, we use a mollified version of the indicator in the regression problem, of the form \( f(x) \propto 1 + \tanh(3(x - \cdot)) \). The resulting approximations, with polynomial degree \( p = N - 1 \in \{1, 2, 11, 21\} \), are plotted in Figure 1a. Notice that least-squares regression often leads to an approximating function that is not strictly positive over the domain. As explained in Section 3.4, we then add a constant to the approximation so that it is strictly positive. We show the resulting approximations of the indicator function in Figure 1b.

Now we use the approximated observable \( \tilde{f} \) to build an approximate Doob transform (23) and perform importance sampling via (13) and (14). To account for the diminished biasing magnitude due to positivization, as discussed in Section 3.4, we multiply the biasing function by a factor \( c \geq 1 \); below we explore how \( c \) impacts the performance of the importance sampling estimator. First, Figure 2 shows the time-T marginal distributions of the biased and unbiased systems, along with the optimal (zero-variance) importance sampling distribution for the expectation of interest. Notice that as the number of eigenfunctions increases, the shape of the histogram tends towards the zero-variance importance sampling density. Table 1 reports the variance and relative error per sample of the importance sampling estimators resulting from each approximation of \( f \). In this simple example, we see that increasing the number of basis functions does not meaningfully increase the efficiency of the estimator. This is likely because increasing the polynomial degree of the approximation leads to more local minima and maxima, causing some sample trajectories to be driven away from the rare event of interest. On the other hand, this result demonstrates how even a small number of eigenfunctions can significantly improve the efficiency of importance sampling. For instance, using just two eigenfunctions reduces the variance by a factor of 20 compared to simple Monte Carlo.

Figure 1: Approximating the indicator function. (a) Without positivity constraint. (b) With positivity constraint.

Figure 2: Sample distribution at time T = 1 for the one-dimensional OU example.
Blue curve is the optimal importance sampling density.

                Variance        Relative error
Monte Carlo     1.· × 10^-·     ·.·
IS, p = 1       6.· × 10^-·     ·.·
IS, p = 2       7.· × 10^-·     ·.·
IS, p = 11      5.· × 10^-·     ·.·
IS, p = 21      2.· × 10^-·     ·.·
ρ_true = 1.· × 10^-·

Table 1: One-dimensional OU example: IS estimator variance with increasing polynomial degree p. The multiplier c and offset ε are tuned.

Ideally, the multiplier c should be chosen so that the variance of the resulting importance sampling estimator is as small as possible. In Table 2 we demonstrate the impact of c on this variance, fixing p = 1. The variance of the estimator initially decreases with increasing c, up to a threshold beyond which the performance degrades. Intuitively, too small a multiplier results in a larger variance, as too few samples reach the rare event. On the other hand, biasing with a very large multiplier pushes many samples deep into the tails. This implies that most of the resulting weights (14) will be small. Since the estimator is unbiased, however, a few samples will have very large weights, leading to a large estimator variance overall. In the following numerical examples, we choose c such that roughly 50% of the resulting samples enter the rare event; this rule of thumb is justified by the trends observed in Table 2. If, however, most of the weights resulting from a given value of c are very small, and certainly if c is so large that the relative error exceeds that of simple Monte Carlo, then the value of c should be reduced so that trajectories are not pushed as deeply into the tails.

                Variance        Relative error      Proportion in rare event
Monte Carlo     1.· × 10^-·     ·.07                0.·
IS, c = 1       6.· × 10^-·     ·.05                0.·
IS, c = 2       2.· × 10^-·     ·.31                0.·
IS, c = 4       8.· × 10^-·     ·.89                0.·
IS, c = 6       7.· × 10^-·     ·.79                0.558
IS, c = 8       4.· × 10^-·     ·.16                0.·
IS, c = 16      3.· × 10^-·     ·.·                 ·.·
ρ_true = 1.· × 10^-·

Table 2: One-dimensional OU example: impact of the multiplier on importance sampling performance, with fixed p = 1. Rightmost column reports the proportion of sample trajectories terminating in the rare event region.

We now consider linear SDEs of the form

\[ \mathrm{d}X_t = A X_t\,\mathrm{d}t + B\,\mathrm{d}W_t, \quad (31) \]

where \( A \in \mathbb{R}^{d \times d} \) is diagonalizable, has eigenvalues \( \{\mu_i\}_{i=1}^d \) with strictly negative real parts, and right and left unit eigenvectors denoted by \( \{e_i\}_{i=1}^d \) and \( \{w_i\}_{i=1}^d \), respectively. Here \( B \in \mathbb{R}^{d \times r} \) and \( W_t \in \mathbb{R}^r \) is an r-dimensional Brownian motion. We also assume that the left eigenvectors of A are not in the null space of B.

For linear stochastic dynamical systems, the sKO eigenfunctions can be found exactly. The generator of the sKO semigroup for linear SDEs is known as the Ornstein–Uhlenbeck (OU) operator. It was shown in [47] that, under mild conditions, the OU operator has a discrete spectrum in \( L^p(\nu) \), where \( \nu \) is the stationary measure of the process. Furthermore, [47] shows that the eigenfunctions are complete in \( L^p(\nu) \) for p ≥
2, they have a polynomial structure, and the eigenvalues and eigenfunctions are the same for all p. Computing the eigenfunctions, however, presents a separate challenge. It is well known that if the OU operator is self-adjoint, which is the case if A and B are symmetric and commute, then the eigenfunctions are the tensorized Hermite polynomials [34],

\[ \phi_n(x) = \prod_{i=1}^d \mathrm{He}_{n_i}\!\left( \frac{\sqrt{2|\mu_i|}}{\|B^* e_i\|} \langle x, e_i \rangle \right). \quad (32) \]

If A is normal with only complex eigenvalues, and B commutes with A, then the eigenfunctions are a tensor product of Hermite–Laguerre–Itô polynomials, first noted in [53]. Otherwise, one has to consider numerical methods for computing the eigenfunctions [50, 54, 55].

We begin with a two-dimensional non-normal system, where

\[ A = \begin{bmatrix} -\cdot & \cdot \\ \cdot & -\cdot \end{bmatrix}, \qquad B = 0.1\,I. \quad (33) \]

The dot product of the two eigenvectors of the system is 0.·. The eigenvalues are \( \mu_1 = -0.\cdot \) and \( \mu_2 = -1 \), with left eigenvectors \( w_1 = [0.\cdot,\ 0.\cdot]^T \) and \( w_2 = [1,\ \cdot]^T \), respectively. We consider the problem of escaping from a ball of radius L at a fixed finite time T,

\[ \rho = \mathbb{P}\left[ \|X_T\| \geq L \mid X_0 = 0 \right], \quad (34) \]

where T = 10, L = 0.
75. This is a problem of escaping from an attractor, which is well studied in the computational chemistry literature. Methods such as transition path theory [5] and the string method [56] characterize the most likely pathways for trajectories to transition between metastable states. Related methods such as the gentlest ascent dynamics [57] find transition paths by pushing the system along the direction of the most slowly decaying right eigenvector of the Jacobian at the stable point. In a separate effort, [42] justifies using the most slowly decaying right eigenvector to construct efficient importance sampling estimators for linear stochastic PDEs, in the presence of a suitable spectral gap. Yet these methods are typically restricted to gradient systems (noisy diffusions on a potential energy surface) or self-adjoint linear systems. Using the Koopman approach, we will demonstrate below that biasing a non-normal linear system along the left eigenvector that corresponds to the most slowly decaying mode leads to a significantly better importance sampling estimator.

For the system in (33), one can easily check that the first six eigenfunctions, ordered according to the magnitudes of the eigenvalues and with total polynomial degree up to two, are

\[ \phi_1(x) = 1, \qquad \lambda_1 = 0, \]
\[ \phi_2(x) = \sqrt{200 |\mu_1|}\, \langle x, w_1 \rangle, \qquad \lambda_2 = \mu_1, \]
\[ \phi_3(x) = \sqrt{200 |\mu_2|}\, \langle x, w_2 \rangle, \qquad \lambda_3 = \mu_2, \]
\[ \phi_4(x) = 200 |\mu_1|\, \langle x, w_1 \rangle^2 - 1, \qquad \lambda_4 = 2\mu_1, \]
\[ \phi_5(x) = 200 \sqrt{\mu_1 \mu_2}\, \langle x, w_1 \rangle \langle x, w_2 \rangle - \frac{\sqrt{\mu_1 \mu_2}}{\mu_1 + \mu_2} \langle w_1, w_2 \rangle, \qquad \lambda_5 = \mu_1 + \mu_2, \]
\[ \phi_6(x) = 200 |\mu_2|\, \langle x, w_2 \rangle^2 - 1, \qquad \lambda_6 = 2\mu_2, \]

where \( w_1 \) and \( w_2 \) are left eigenvectors of A. The function of interest is an indicator on the ball of radius 0.
75 centered at the origin, which is projected onto the set of eigenfunctions. Since the indicator is an even function along the direction of any solitary left eigenvector, we can omit eigenfunctions with odd polynomial degree prior to projection. Thus, the indicator function over the ball is projected onto the span of \( \{\phi_1, \phi_4, \phi_5, \phi_6\} \).

Figure 3: Eigenfunctions of the non-normal system A. Right: regression points generated from sample trajectories of the unbiased system.

We plot the eigenfunctions in Figure 3 and highlight the left eigenvector directions in red. To generate the regression points that are used to approximate the indicator function as a linear combination of eigenfunctions, we simulate 121 independent trajectories of length T, beginning with uniformly spaced initial conditions on \( [-0.\cdot,\ 0.\cdot]^2 \). The state is extracted at time intervals of Δt = 0.
02, and the resulting 60621 points are shown in Figure 3. Figure 4 then shows the vector fields produced by the resulting biasing function at \( t \in \{\cdot, \cdot, \cdot.\cdot\} \). Notice that the biasing pushes in the direction of the slowest-decaying left eigenvector for most of the simulation period [0, T], until the end of the interval, when T − t becomes small and the biasing (23) begins to push in all directions away from the attracting point. In Figure 5, we show sample trajectories of the unbiased and biased systems. Notice that the exit directions do not generally align with any eigenvector directions. Performance of the importance sampling estimator is summarized in Table 3, where we observe that the variance is reduced by a factor of over 2500. Figure 6 shows the histogram of the norm of the state at time T for simple Monte Carlo and importance sampling. Notice that far more samples reach the rare event when importance sampling is applied.

The effectiveness of biasing in the direction of the slowest-decaying left eigenvector can be explained intuitively by considering the phase portrait of a deterministic non-normal linear dynamical system. In Figure 7, we plot trajectories of a highly non-normal stable linear system with initial conditions on the unit circle. We also plot the left eigenvector with the least negative eigenvalue. First, note that there are initial conditions for which the norm of the state initially grows before decaying towards zero; this is a hallmark of highly non-normal systems. Second, notice that pushing outwards in the direction of the left eigenvector naturally exploits the system's transient growth to move the state even further from the attracting point at the origin. In non-normal systems, left and right eigenvectors corresponding to different eigenvalues are orthogonal. Therefore, the slowest-decaying left eigenvector is orthogonal to the (fast-decaying) manifold spanned by all but the slowest-decaying right eigenvector.
When pushing in the direction of the left eigenvector, trajectories are driven away from the attracting point by the fast-decaying manifold of the dynamical system. The left eigenvector direction thus harnesses the system's underlying dynamics to reach the rare event region with the least biasing effort.

Figure 4: Vector fields produced by the biasing function for the non-normal system A. Note that the lengths of the vectors for a given time are plotted relative to each other and are not comparable for different times.

Figure 5: Samples of the nominal and biased trajectories of the non-normal linear system. Red circle denotes the boundary of the rare event.

Figure 6: Distribution of the norm of \( X_T \) for simple Monte Carlo and importance sampling of the linear non-normal system. Red line denotes the boundary of the rare event region.

Figure 7: Example phase portrait of a highly non-normal system. The red vector points in the direction of the left eigenvector with the least negative eigenvalue. Notice that initial conditions that lie on the line defined by this eigenvector will initially experience transient growth before decaying to the origin.

                Variance        Relative error
Monte Carlo     1.· × 10^-·     ·.·
IS              ·.· × 10^-·     ·.·
ρ_true = 1.· × 10^-·

Table 3: Importance sampling performance for the SDE with non-normal dynamics.
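The mechanism described above is easy to verify numerically. The sketch below uses an assumed non-normal matrix (not the A of (33)) to check the two facts invoked here: biorthogonality of left and right eigenvectors across distinct eigenvalues, and transient amplification of a push along the slowest-decaying left eigenvector.

```python
import numpy as np

# Check (with an assumed matrix) that (i) left and right eigenvectors of distinct
# eigenvalues are orthogonal, and (ii) pushing along the slowest-decaying left
# eigenvector triggers transient growth even though the system is stable.

A = np.array([[-0.2, 5.0],
              [0.0, -1.0]])                  # both eigenvalues negative, but non-normal

evals, V = np.linalg.eig(A)                  # columns of V: right eigenvectors
evalsL, W = np.linalg.eig(A.T)               # columns of W: left eigenvectors of A
fast = np.argmin(evals.real)
w_slow = W[:, np.argmax(evalsL.real)]
w_slow = w_slow / np.linalg.norm(w_slow)

print(abs(w_slow @ V[:, fast]))              # biorthogonality: essentially zero

def expm(M, t):
    # dense matrix exponential via eigendecomposition (M assumed diagonalizable)
    lam, S = np.linalg.eig(M)
    return (S @ np.diag(np.exp(lam * t)) @ np.linalg.inv(S)).real

# Despite stability, the flow initially amplifies a unit push along w_slow.
print(np.linalg.norm(expm(A, 0.5) @ w_slow))
```

The norm of the evolved state exceeds one at intermediate times, which is exactly the transient growth that the left-eigenvector biasing exploits.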
Next we consider a damped harmonic oscillator forced by Brownian motion:

\[ \ddot{x} + 2\eta\omega\,\dot{x} + \omega^2 x = \dot{W}_t, \qquad x(0) = x_0,\ \dot{x}(0) = 0. \quad (35) \]

This example will show that our framework works well with complex eigenvalues and rank-deficient noise. The oscillator can be put in the form of (31) with

\[ A = \begin{bmatrix} 0 & 1 \\ -\omega^2 & -2\eta\omega \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \quad (36) \]

We compute \( \mathbb{P}[\,|x(T)| > L \mid x(0) = \dot{x}(0) = 0\,] \), i.e., the probability that the position of the oscillator exceeds some threshold by a fixed time, given that it was initially at rest. We set ω = 1, η = 0.·, L = 3, and T = 10.

To the best of the authors' knowledge, for the oscillatory case, all existing rare event simulation algorithms require solving an associated optimal control problem. Here, we instead project an indicator function depending on the first component of the state, \( \mathbf{1}_{|x_1| > L} \), onto the first nine sKO eigenfunctions. We plot the real parts of these eigenfunctions in Figure 8. Regression points are generated by simulating 121 independent trajectories of length T with uniformly spaced initial conditions on \( [-\cdot,\ \cdot]^2 \) and extracting the state every Δt = 0.
02 time units. In Figure 9, we show sample trajectories of the unbiased and biased systems. Notice that the impact of the biasing only becomes prominent towards the end of the simulation, e.g., from t = 8 onwards. Intuitively, this is because the system has an attracting point at zero, and since we want samples to escape at the end of the simulation, it is not advantageous to bias early. In Figure 10 we show histograms of the absolute value of the position of the two systems at time T. The estimator performance is summarized in Table 4, where we observe that biasing reduces the variance by a factor of nearly 5000.

                Variance        Relative error
Monte Carlo     2.· × 10^-·     ·.·
IS              ·.· × 10^-·     ·.·
ρ_true = 2.· × 10^-·

Table 4: Importance sampling performance for the Brownian oscillator.
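For reference, the per-sample relative error reported throughout these tables can be computed from importance sampling outputs as follows; the function name and the synthetic sanity check are our own, and the quantity corresponds to (10) with M = 1.

```python
import numpy as np

# Per-sample relative error from importance sampling outputs y_k = 1_E(X_T^(k)) w_k,
# where w_k is the kth likelihood-ratio weight (our own notation).

def relative_error_per_sample(y):
    return np.sqrt(y.var(ddof=1)) / y.mean()

# Sanity check on simple Monte Carlo (all weights equal to 1): for an event of
# probability p, the exact per-sample relative error is sqrt((1 - p) / p).
p = 1e-3
rng = np.random.default_rng(7)
y = (rng.random(2_000_000) < p).astype(float)
print(relative_error_per_sample(y), np.sqrt((1 - p) / p))
```

The two printed values agree closely, which illustrates why simple Monte Carlo needs on the order of \( 1/p \) samples per digit of accuracy while a well-biased estimator does not.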
The stochastic advection-diffusion equation is an infinite-dimensional non-normal linear system. We have

\[ v_t = b v_x + \alpha v_{xx} + \sqrt{\epsilon}\,\eta, \qquad v(t, 0) = v(t, 1) = 0, \qquad v(0, x) = 0, \quad (37) \]

where η is space-time white noise. Following the approach in [42], this system can be converted into the form of (31), where \( \mathcal{A} v = b v_x + \alpha v_{xx} \) acts on the space of \( L^2 \) functions over \( x \in [0, 1] \), B is the identity map, and \( W_t \) is a cylindrical Wiener process. The system is discretized using an exponential Euler method [52]. We estimate \( \mathbb{P}\left[ \|v(T, \cdot)\|_{L^2([0,1])} \geq L \right] \) given that the system initially started at \( v(t = 0, x) = 0 \). We have b = 1, α = 0.·, ε = 1, T = 10, and L = 2.·. We use the eigenfunctionals \( \phi_1(v) = 1 \) and \( \phi_2(v) = \sqrt{\cdot\,\mu}\,\langle v, w \rangle \), where w is the eigenfunction of the \( L^2 \)-adjoint of \( \mathcal{A} \), −μ is the leading eigenvalue of \( \mathcal{A} \), and \( \langle u, v \rangle = \int uv\,\mathrm{d}x \).

Figure 10: Histograms of |x(T)| obtained using simple Monte Carlo and importance sampling for the Brownian oscillator.

Figure 11: Histograms of \( \|v(T, \cdot)\|_{L^2([0,1])} \) computed using simple Monte Carlo and dynamic importance sampling for the stochastic advection-diffusion equation.

We plot the histogram of the norm of the system for the biased and unbiased systems in Figure 11, and present the results of the sampling methods in Table 5. Using only two eigenfunctionals, the variance of the estimator is reduced by a factor of 12 over Monte Carlo.
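The leading eigenpair used here can also be obtained numerically. The sketch below discretizes \( \mathcal{A} v = b v_x + \alpha v_{xx} \) with central finite differences and Dirichlet boundaries, and compares the leading eigenvalue to the known closed form \( -\alpha\pi^2 - b^2/(4\alpha) \); the values of b and α are assumed for illustration, not necessarily those of our experiment.

```python
import numpy as np

# Finite-difference sketch of the generator A v = b v_x + alpha v_xx on [0, 1] with
# homogeneous Dirichlet boundaries; b and alpha are assumed values for illustration.

b, alpha, N = 1.0, 0.1, 400
h = 1.0 / N

diff = alpha / h**2                               # v_xx: (v[i+1] - 2 v[i] + v[i-1]) / h^2
adv = b / (2 * h)                                 # v_x:  (v[i+1] - v[i-1]) / (2 h)
L = (np.diag(-2 * diff * np.ones(N - 1))
     + np.diag((diff + adv) * np.ones(N - 2), 1)
     + np.diag((diff - adv) * np.ones(N - 2), -1))

lead = np.linalg.eigvals(L).real.max()            # slowest-decaying mode
exact = -alpha * np.pi**2 - b**2 / (4 * alpha)    # known leading eigenvalue
print(lead, exact)
```

The advective term shifts the whole spectrum down by \( b^2/(4\alpha) \) relative to pure diffusion, which is why the leading left eigenfunction, rather than a symmetric sine mode, drives the biasing in this non-normal setting.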
                Variance        Relative error
Monte Carlo     2.· × 10^-·     ·.·
IS              ·.· × 10^-·     ·.·
ρ_true = 2.· × 10^-·

Table 5: Importance sampling performance for the stochastic advection-diffusion equation.
We now demonstrate our approach on nonlinear stochastic systems. Consider the noisy Van der Pol oscillator given by

\[ \mathrm{d}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ \mu(1 - x_1^2)x_2 - x_1 \end{bmatrix} \mathrm{d}t + \sqrt{\epsilon}\, \begin{bmatrix} \mathrm{d}W_1 \\ \mathrm{d}W_2 \end{bmatrix}. \quad (38) \]

In the absence of noise, the system exhibits a limit cycle, and all initial conditions converge to it (except the origin, which is an unstable equilibrium). In the presence of noise, trajectories cluster on a band centered on the limit cycle of the deterministic system. We consider the problem of 'peeling' a solution of the stochastic system away from this band. Let μ = 0.·, ε = 0.·, and T = 10; our task is to estimate

\[ \mathbb{P}\left[ x_1(T)^2 + x_2(T)^2 > \cdot \mid x_1(0) = 2,\ x_2(0) = 0 \right]. \quad (39) \]

The initial condition lies on the limit cycle of the deterministic system. The rare event is a region that lies outside of it.

We first find the sKO eigenfunctions of the system. As described in Section 3.3, we apply gEDMD, using a basis \( \{\psi_k(x_1, x_2)\}_{k=1}^n \) of bivariate Legendre polynomials with total degree up to 10. This basis is constructed such that it is orthonormal with respect to the uniform measure on \( D = [-\cdot,\ \cdot]^2 \subset \mathbb{R}^2 \). There are n = 66 elements in this basis. We generate test points by using trajectory data beginning at 400 initial conditions uniformly spaced on D. Each trajectory is simulated on the interval \( t \in [0, \cdot] \) and sampled every Δt = 0.
05, for a total of 8 × 10^· test points. We find that the quality of the eigenvalues and eigenfunctions obtained via generator EDMD is highly sensitive to the polynomial degree and the choice of basis. Indeed, it is well noted that EDMD methods can often produce spurious, i.e., non-physical, eigenvalues when the choice of basis is poor [50, 58]. We find that the same is true for eigenfunctions obtained via EDMD, when either the basis is not sufficiently representative of the eigenfunctions or the test points do not sufficiently cover the state space. Obtaining a good approximation of the eigenfunctions is critical to our sampling framework. Given a set of candidate eigenfunctions produced by EDMD, we cross-validate them with an independent dataset generated in the same fashion as the test points. In particular, the mean-square error of a candidate eigenfunction \( \phi(x) \) with eigenvalue λ is defined as \( \frac{1}{m} \sum_{i=1}^m |\mathcal{A}\phi(x_i) - \lambda\phi(x_i)|^2 \). Only candidate eigenfunctions with a testing error below some threshold (here chosen to be 0.04) are used to approximate the Doob transform. In Figure 12 we show the first nine approximated (and validated) sKO eigenfunctions, alongside a scatterplot of the test points.

Figure 12: First nine stochastic Koopman eigenfunctions for the Van der Pol oscillator. Eigenfunctions are ordered according to the magnitude of the real part of the Koopman eigenvalues. Right figure shows the test points.
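The cross-validation criterion can be sketched as follows, on the 1D OU generator where the ground truth is known: a true eigenpair gives a vanishing residual, while a near-miss fails the 0.04 threshold by a wide margin. The helper names are our own.

```python
import numpy as np

# Cross-validation residual for candidate eigenpairs, illustrated on the 1D OU
# generator A phi = -x phi' + phi''.

def residual(phi, dphi, d2phi, lam, x):
    # mean-square generator residual (1/m) sum_i |A phi(x_i) - lam phi(x_i)|^2
    r = (-x * dphi(x) + d2phi(x)) - lam * phi(x)
    return np.mean(np.abs(r) ** 2)

rng = np.random.default_rng(4)
x = rng.standard_normal(1000)                 # held-out validation points

# He_2(x) = x^2 - 1 is a true eigenfunction with lambda = -2 ...
good = residual(lambda x: x**2 - 1, lambda x: 2*x, lambda x: 2.0 + 0*x, -2.0, x)
# ... while x^2 with the same lambda is not, and fails a 0.04 threshold badly.
bad = residual(lambda x: x**2, lambda x: 2*x, lambda x: 2.0 + 0*x, -2.0, x)
print(good, bad)
```

For gEDMD candidates, the generator action would be evaluated through the basis expansion rather than analytically, but the accept/reject logic is the same.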
Approximating the Doob transform to estimate (39) requires approximating the indicator function over the rare event region in the sKO eigenbasis. We first express the indicator function in the Legendre basis by solving a least-squares problem on the gEDMD test points. Since the Koopman eigenfunctions are approximated in the same basis, we can immediately compute the coefficients of the indicator's sKO eigenfunction expansion. Just as in the linear case, if the expansion in the sKO eigenfunction basis is negative in some region of the domain of interest, we add a constant to ensure positivity. A scaling factor is again applied to the biasing so that enough samples reach the rare event. In this example, we use the nine eigenfunctions plotted in Figure 12 to approximate the Doob transform.

In Figure 13, we show 25 unbiased and biased sample paths of the oscillator. Notice that none of the unbiased sample paths reaches the rare event (they all remain inside the red circle demarcating the rare event region), while many of the biased paths do reach it. In Figure 14, we show the histogram of the norm of the state at time T = 10 for the two systems. We report simulation results for the estimators in Table 6, and observe that the importance sampling estimator reduces the variance by a factor of more than 400.

Figure 14: Histograms of the norm of the state at time T = 10. Red line denotes boundary of the rare event.

                Variance        Relative error
Monte Carlo     1.· × 10^-·     ·.·
IS              ·.· × 10^-·     ·.·
ρ_true = 1.· × 10^-·

Table 6: Importance sampling performance for the Van der Pol oscillator.
Now we consider the noisy Duffing oscillator,

\[ \ddot{x} + \delta\dot{x} + x(\beta + \alpha x^2) = \sqrt{\epsilon}\,\dot{W}_t, \]

which can be rewritten in standardized form as

\[ \mathrm{d}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ -\delta x_2 - x_1(\beta + \alpha x_1^2) \end{bmatrix} \mathrm{d}t + \sqrt{\epsilon}\, \begin{bmatrix} 0 \\ \mathrm{d}W_t \end{bmatrix}. \quad (40) \]

The deterministic Duffing oscillator has three equilibria. The origin is an unstable equilibrium, while \( x^* = \pm\sqrt{-\beta/\alpha} \) are two stable equilibria. In the basins of attraction of the stable equilibria, the system exhibits damped oscillatory dynamics. In the stochastic setting, noise can infrequently cause trajectories to transition between the basins of attraction. For a transition to occur, the stochastic forcing must "kick" the system in the correct direction and with the correct magnitude, in critical regions of the state space. We thus consider the rare event of transitioning from one basin of attraction to the other:

\[ \mathbb{P}\left[ x_1(T) > 0 \mid x_1(0) = -0.\cdot,\ x_2(0) = 0 \right]. \]

Here we use parameter values α = 1, β = −·, δ = 0.·, ε = 0.·, and T = 10. The study of noise-induced transitions between attractors in dynamical systems is an important problem that arises in protein folding and chemical kinetics [5, 7, 43].

Similar to the Van der Pol oscillator, we find the stochastic Koopman eigenfunctions by applying gEDMD with a basis of bivariate scaled Legendre polynomials of total degree up to 12. This leads to 91 basis functions. To create test points for gEDMD, we simulate 400 independent trajectories over the interval \( t \in [0, \cdot] \) with initial conditions uniformly spaced on \( D = [-\cdot.\cdot,\ \cdot.\cdot]^2 \). The data set is then generated by sampling each trajectory at intervals of Δt = 0.
2. In Figure 15, we show the first nine eigenfunctions of the stochastic Duffing oscillator, along with the scatterplot of the test points. We approximate the indicator function \( f(x) = \mathbf{1}_{x_1 > 0}(x_1, x_2) \) via a linear combination of these nine sKO eigenfunctions, using regression on the same test points.

In Figure 16, we show 25 of the resulting biased sample trajectories of the Duffing oscillator, compared to unbiased paths. The few unbiased trajectories shown here do not transition to the opposite basin of attraction. We plot a histogram of the final positions of the unbiased and biased sample trajectories in Figure 17. The figure demonstrates that, unlike simple Monte Carlo, the biased trajectories sample the transition paths with much greater success. Quantitative performance of the estimators is compared in Table 7. In particular, the importance sampling estimator reduces the variance by a factor of nearly 5000.

Figure 17: Histogram of \( x_1 \) at time T = 10 for the unbiased and biased systems.

                Variance        Relative error
Monte Carlo     2.· × 10^-·     ·.·
IS              ·.· × 10^-·     ·.·
ρ_true = 2.· × 10^-·

Table 7: Importance sampling performance for the noisy Duffing oscillator.
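The phase-space picture invoked here is easy to check directly. The sketch below verifies the equilibria and their stability types from the Jacobian of the deterministic drift; β and δ are assumed stand-ins for digits lost in the source, with α = 1 from the text.

```python
import numpy as np

# Equilibria and stability of the deterministic Duffing drift. alpha is from the
# text; beta and delta are assumed illustrative values.

alpha, beta, delta = 1.0, -1.0, 0.5

def drift(x1, x2):
    return np.array([x2, -delta * x2 - x1 * (beta + alpha * x1**2)])

def jacobian(x1, x2):
    return np.array([[0.0, 1.0],
                     [-(beta + 3 * alpha * x1**2), -delta]])

for s in [0.0, np.sqrt(-beta / alpha), -np.sqrt(-beta / alpha)]:
    assert np.allclose(drift(s, 0.0), 0.0)            # each point is an equilibrium
    print(s, np.linalg.eigvals(jacobian(s, 0.0)))
# The origin has one positive eigenvalue (a saddle), while x = +/- sqrt(-beta/alpha)
# have eigenvalues with negative real part (damped oscillatory attractors).
```

The saddle at the origin is the gateway between the two basins, which is why the noise must "kick" the system with the correct direction and magnitude precisely there.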
Analyzing the second moment
It is useful to understand how the approximation of the Doob transform impacts the variance of the resulting importance sampling estimator. Here we provide some simple analytical results to that end. For this analysis, we assume that the sKO eigenfunctions are obtained exactly. Consequently, the only error in the solution to the KBE originates from the accuracy of the approximation of the terminal condition.

We perform a non-asymptotic analysis of the importance sampling scheme based on the approach outlined in [41, 59]. Assume \( f(x) \geq 0 \) and let \( h(x) = -\log f(x) \); recall that \( f(x) \) is the terminal condition of the KBE in (5), which is typically the indicator function over the rare event. In contrast to our earlier presentation of the Doob transform, here f is allowed to be a true indicator function, rather than a mollified version of it. Indeed, the analysis in [41, 59] takes this scenario into account. Recall that the importance sampling estimator (cf. (11)) of \( \rho = \mathbb{E}_{0, x_0}[f(X_T)] \) can be written as

\[ \Gamma(x_0) = e^{-h(\tilde{X}_T)}\,\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}(\tilde{X}), \]

where \( x_0 \) is the initial condition. The second moment of the importance sampling estimator corresponding to any control \( u(t, x) \) in the SDE system (13) is

\[ Q(x_0; u) = \mathbb{E}_{\mathbb{Q}}\!\left[ e^{-2h(\tilde{X}_T)} \left( \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}} \right)^2 \right]. \]

Using [60] and the subsequent analysis in [41, 59], we obtain the following variational representation of the second moment of the importance sampling estimator:

\[ -\log Q(x_0; u) = \inf_{v \in \mathcal{V}} \mathbb{E}\!\left[ \int_0^T \|v(s)\|^2\,\mathrm{d}s - \int_0^T \|u(s, \hat{X}_s)\|^2\,\mathrm{d}s + 2h(\hat{X}_T) \right], \quad (41) \]

where \( \hat{X} \) solves

\[ \mathrm{d}\hat{X}_s = \left[ A(\hat{X}_s) - B(\hat{X}_s) u(s, \hat{X}_s) + B(\hat{X}_s) v(s) \right] \mathrm{d}s + B(\hat{X}_s)\,\mathrm{d}W_s, \qquad \hat{X}_0 = x_0, \]

and \( \mathcal{V} \) is the set of progressively measurable admissible processes. Recall that when f is the indicator function over the set E, \( h(\hat{X}_T) \) is infinite if \( \hat{X}_T \) does not enter E and zero otherwise. We can, therefore, restrict the set of admissible processes so that \( \mathcal{V} \) only contains controls ensuring \( \hat{X}_T \in E \) with probability one.
Then by Lemma A.1 in [41], we have that for any sufficiently regular functions \( Z(t, x) \) and \( U(t, x) \), where the control is \( u = -B^* \nabla U \),

\[ -\log Q(x_0; u) \geq \inf_{v \in \mathcal{V}} \left\{ 2U(0, x_0) - 2\mathbb{E}[U(T, \hat{X}_T)] + 2\int_0^T \mathcal{G}[Z](s, \hat{X}_s)\,\mathrm{d}s - \int_0^T \|B^*(\nabla Z - \nabla U)\|^2\,\mathrm{d}s \right\}, \quad (42) \]

where \( \mathcal{G}[Z] = \partial_t Z + \langle A(x), \nabla Z \rangle - \tfrac{1}{2}\|B^* \nabla Z\|^2 + \tfrac{1}{2}\mathrm{Tr}(BB^* \nabla^2 Z) \). The operator \( \mathcal{G} \) can be obtained from the partial differential operator of the KBE, \( \partial_t[\cdot] + \mathcal{A}[\cdot] \), via the change of variables \( Z = -\log \Phi \).

In our approach, the controller is derived from an approximation to the solution of the KBE: \( \tilde{u}(t, x) = B^* \nabla \log \tilde{\Phi}(t, x) \). Therefore, if we choose \( U(t, x) = -\log \tilde{\Phi}(t, x) \), then we can use (42). Recall that we have assumed \( \tilde{\Phi} \) (22) to be constructed with the exact sKO eigenfunctions. Therefore, it is an exact solution of the KBE of the system for \( t \in [0, T) \), but does not match the terminal condition at t = T. Nonetheless, we have \( \mathcal{G}[U] = 0 \) exactly. Taking Z = U then gives

\[ -\log Q(x_0; u) \geq 2U(0, x_0) - 2\sup_{v \in \mathcal{V}} \mathbb{E}\!\left[ U(T, \hat{X}_T) \right]. \quad (43) \]

Note that the above bound is tight if \( \tilde{\Phi}(t, x) \) in fact exactly matches the terminal condition, \( \tilde{\Phi}(T, x) = f(x) \). In this case, we have \( U(T, \hat{X}_T) = h(\hat{X}_T) = 0 \), which in turn implies that the right-hand side equals \( 2U(0, x_0) = -2\log\rho \), and therefore \( Q(x_0; u) \leq \rho^2 \). Since \( Q(x_0; u) \geq \rho^2 \) by Jensen's inequality, we conclude \( Q(x_0; u) = \rho^2 \). In other words, the variance of the estimator is zero.

On the other hand, when the biasing is imperfect but based on the true sKO eigenfunctions, (43) implies that the bound on the second moment depends on the accuracy of the approximations of \( f(x) \) and of the KBE solution at the initial condition \( x_0 \) (i.e., the quantity of interest \( \rho = \Phi(0, x_0) \)) using the eigenfunctions. Recalling that \( U(T, \hat{X}_T) = -\log \tilde{\Phi}(T, \hat{X}_T) \), observe that

\[ -2\sup_{v \in \mathcal{V}} \mathbb{E}\!\left[ U(T, \hat{X}_T) \right] = 2\inf_{v \in \mathcal{V}} \mathbb{E}\!\left[ \log \tilde{\Phi}(T, \hat{X}_T) \right]. \]
Appealing to the properties of the expectation and the fact that $\hat{X}_T \in E$ with probability one,
\[ \mathbb{E}\big[ \log \tilde{\Phi}(T, \hat{X}_T) \big] \geq \inf_{y \in E} \mathbb{E}\big[ \log \tilde{\Phi}(T, y) \big] = \inf_{y \in E} \log \tilde{\Phi}(T, y). \]
Then (43) can be bounded from below as follows:
\[ -\log Q(x; u) \geq -2 \log \tilde{\Phi}(0, x) + 2 \inf_{y \in E} \log \tilde{\Phi}(T, y). \qquad (44) \]
This relation is an upper bound on the second moment of the importance sampling estimator. The first term reflects how well the approximate solution captures the quantity of interest $\rho$. The second term reflects how well the approximate KBE solution matches the terminal condition on the rare event set. It is important to emphasize that these two terms are coupled to one another, since the solution at the initial condition depends on how well the terminal condition is approximated.

Further refinement of these bounds is difficult. The framework we have presented is rather general, in the sense that we did not make strong assumptions on the properties of the stochastic dynamical system. Moreover, without prescribing closed-form or otherwise very specific approaches to positivization or scaling (i.e., choosing $\epsilon$ and $c$), it is difficult to characterize precisely how well the sKO eigenfunctions approximate the solution to the KBE. For specific classes of dynamical systems, one might be able to elucidate these bounds further, but since the emphasis of this paper has been on a generally applicable computational approach, we leave such analyses to future work.

Conclusions

We have presented a framework for constructing importance sampling schemes for stochastic dynamical systems, using eigenfunctions of the associated stochastic Koopman operator (sKO). We use sKO eigenfunctions to approximate the Doob transform for the observable of interest, which in turn yields an approximation of the corresponding zero-variance importance sampling estimator. Our approach is broadly applicable, and we demonstrate the computation of rare event probabilities in a wide variety of linear and nonlinear SDEs.
These numerical examples highlight how one can exploit non-rare (bulk) trajectories of the dynamical system to inform biasing strategies for rare event simulation. For systems where the sKO eigenfunctions cannot be derived analytically, we used generator EDMD to compute them numerically. Our approach is agnostic to the numerical method used to approximate the sKO eigenfunctions, however, and thus as state-of-the-art methods for numerical approximation of the Koopman operator improve, our framework too will improve in accuracy and efficacy. Moreover, even imprecise applications of our approach can still lead to significant variance reduction. We demonstrate that crude approximations to the Doob transform, using only a few numerically approximated eigenfunctions, can lead to variance reduction of several orders of magnitude over simple Monte Carlo.

We note that our approach is applicable to a wide range of stochastic dynamical systems, including many that are not typically handled by existing rare event simulation methods. Methods inspired by computational chemistry, for example, typically consider high-dimensional diffusion processes governed by a potential, i.e., gradient systems. We instead propose a single framework that enables rare event simulation in systems with non-normal dynamics, oscillatory behavior, limit cycles, and degenerate noise, which appear in a variety of scientific and engineering settings [4, 6, 61]. This framework often "recovers" solutions proposed for specific cases. For instance, in non-normal linear systems, we find that the dominant direction of biasing away from an attracting point is aligned with the leading left eigenvectors of the drift term.
This is consistent with the rigorous theoretical results of [42] for infinite-dimensional self-adjoint linear systems, where the left and right eigendirections coincide; there, the authors found that (given a sufficient spectral gap) the best way to escape from an attractor is again to bias in the direction of the most slowly decaying eigenmode.

We can also contrast our approach with rare event simulation methods based on stochastic optimal control [23, 24, 26]. The goal of these efforts is the same as ours: to find a controller for the dynamical system that approximates the zero-variance importance sampling estimator. However, the stochastic optimal control formulation requires solving optimization problems or the associated nonlinear Hamilton-Jacobi-Bellman equation, both of which may be intractable in high dimensions. These methods attempt to precisely compute the Doob transform locally, depending on where trajectories lie in state space. In contrast, we consider the Kolmogorov backward equation, which, due to linearity, enables efficient computation based on eigenfunction information. Our approach thus crudely computes the Doob transform globally, using sKO eigenfunctions approximated via non-rare trajectories.

There are several avenues for future work. For example, approximation of the terminal condition of the KBE via sKO eigenfunctions presents some outstanding questions. We currently construct this approximation by combining regression with a post hoc numerical correction to ensure positivity. A single integrated, consistent procedure for constructing positive approximations would be preferable: not only might it improve the efficiency of rare event simulation, but it could also enable further theoretical analysis of approximation error and hence estimator variance.

Importance sampling is known to be, in many cases, an unstable method, as there might be no guarantees that the variance of the resulting estimator is finite [9].
For future work, we plan to adapt our approach to more robust sampling methods for rare event simulation, such as multilevel splitting. The connection between efficient importance sampling estimators and multilevel splitting has been well established [9]. Particle splitting also has the virtue of being non-intrusive, meaning that one is not required to alter the system dynamics to perform rare event simulation. Since Koopman numerical methods enable us to construct crude approximations to the KBE non-intrusively, combining these methods with particle splitting will lead to more efficient black-box approaches for rare event simulation. We hope that these avenues for future research will produce new algorithms for robust, data-driven rare event simulation in diverse applications.

Acknowledgements
This material is based upon work supported by the DARPA EQUiPS program, and by the United States Air Force and Air Force Office of Scientific Research under contract numbers FA8650-16C-7646 and FA9550-20-1-0397. We would also like to thank Kostas Spiliopoulos, Paul Dupuis, and Jose Blanchet for their helpful advice and insights.

A Proof of Theorem 1
We now prove the main theorem associated with the Doob transform that enables the construction of the rare event estimator presented in this work.
Proof.
We compute the stochastic integral $\int_0^T \langle u(t, \tilde{X}_t), dW_t \rangle$. Let $g(t,x) = \log \Phi(t,x)$, and apply Itô's formula:
\begin{align*}
dg &= \frac{\partial}{\partial t} \log \Phi(t, \tilde{X}_t)\, dt + \big\langle \nabla \log \Phi(t, \tilde{X}_t), d\tilde{X}_t \big\rangle + \frac{1}{2} \operatorname{Tr}\Big[ \nabla^2 \log \Phi(t, \tilde{X}_t)\, (d\tilde{X}_t)(d\tilde{X}_t)^* \Big] \\
&= \frac{1}{\Phi} \frac{\partial \Phi}{\partial t}\, dt + \left\langle \frac{\nabla \Phi}{\Phi},\, A(\tilde{X}_t) + BB^* \frac{\nabla \Phi}{\Phi} \right\rangle dt + \left\langle \frac{\nabla \Phi}{\Phi},\, B\, dW_t \right\rangle + \frac{1}{2} \operatorname{Tr}\left[ BB^* \frac{\nabla^2 \Phi}{\Phi} \right] dt - \frac{1}{2} \operatorname{Tr}\left[ BB^* \frac{(\nabla \Phi)(\nabla \Phi)^*}{\Phi^2} \right] dt \\
&= \frac{1}{\Phi} \left( \frac{\partial \Phi}{\partial t} + \big\langle \nabla \Phi, A(\tilde{X}_t) \big\rangle + \frac{1}{2} \operatorname{Tr}\big[ BB^* \nabla^2 \Phi \big] \right) dt + \frac{1}{2} \left\langle \frac{B^* \nabla \Phi}{\Phi},\, \frac{B^* \nabla \Phi}{\Phi} \right\rangle dt + \left\langle \frac{\nabla \Phi}{\Phi},\, B\, dW_t \right\rangle \\
&= \frac{1}{\Phi} \left( \frac{\partial \Phi}{\partial t} + \mathcal{A} \Phi \right) dt + \frac{1}{2} \big\| B^* \nabla \log \Phi(t, \tilde{X}_t) \big\|^2\, dt + \big\langle B^* \nabla \log \Phi(t, \tilde{X}_t),\, dW_t \big\rangle \\
&= \frac{1}{2} \| u(t, \tilde{X}_t) \|^2\, dt + \big\langle u(t, \tilde{X}_t),\, dW_t \big\rangle,
\end{align*}
since $\Phi$ solves the KBE. This implies that
\[ \int_0^T \big\langle u(t, \tilde{X}_t),\, dW_t \big\rangle = \log \Phi(T, \tilde{X}_T) - \log \Phi(0, x) - \frac{1}{2} \int_0^T \| u(t, \tilde{X}_t) \|^2\, dt. \]
Plugging this into (17), we have
\begin{align*}
f(\tilde{X}_T) \exp\left[ -\int_0^T \big\langle u(t, \tilde{X}_t),\, dW_t \big\rangle - \frac{1}{2} \int_0^T \| u(s, \tilde{X}_s) \|^2\, ds \right] &= f(\tilde{X}_T) \exp\big[ \log \Phi(0, x) - \log \Phi(T, \tilde{X}_T) \big] \\
&= f(\tilde{X}_T)\, \frac{\Phi(0, x)}{f(\tilde{X}_T)} = \Phi(0, x).
\end{align*}
Thus, the biasing leads to a zero-variance estimator.
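As a numerical sanity check of this result (our own illustration, not part of the paper's development), consider the toy case $dX = dW$ with observable $f(x) = e^x$, for which the KBE solution is available in closed form: $\Phi(t,x) = \mathbb{E}[e^{W_T} \mid W_t = x] = e^{x + (T-t)/2}$, so the control $u = B^* \nabla \log \Phi \equiv 1$ is constant. Because $u$ is constant, the discretized estimator reproduces $\Phi(0,x)$ exactly, sample path by sample path:

```python
import numpy as np

# Toy case: dX = dW (A = 0, B = 1) with observable f(x) = exp(x).
# Then Phi(t, x) = exp(x + (T - t)/2), so the Doob control u = d/dx log Phi
# is identically 1, and the theorem says the weighted estimator equals
# Phi(0, x0) = exp(x0 + T/2) on every sample path.
T, x0, n_paths, n_steps = 1.0, 0.3, 10_000, 50
dt = T / n_steps
rng = np.random.default_rng(1)

x = np.full(n_paths, x0)
stoch_int = np.zeros(n_paths)   # accumulates int_0^T u dW with u = 1
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    x += 1.0 * dt + dW          # biased dynamics dX~ = u dt + dW
    stoch_int += dW

# f(X~_T) * exp(-int u dW - 0.5 int u^2 dt): constant across all paths
samples = np.exp(x) * np.exp(-stoch_int - 0.5 * T)
```

Up to floating-point roundoff, every entry of `samples` equals $e^{x_0 + T/2}$, i.e., the empirical variance is zero, as the proof predicts.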
B Numerical Solution to Stochastic PDEs
To make this paper more self-contained, we provide a brief review of methods for simulating stochastic PDEs. Much of this presentation is based on [52, 62]. We use these methods when discretizing the stochastic advection-diffusion equation in Section 4.2.3.

Stochastic PDEs are typically solved by formulating the equation as a stochastic differential equation on a Hilbert space of functions defined over some subset of $\mathbb{R}^d$. Let $H(D)$ be a Sobolev space over an open set $D \subset \mathbb{R}^d$, let $A$ be a compact self-adjoint linear operator that maps $H(D)$ to itself, and let $f$ be a possibly nonlinear function from $H(D)$ to itself. Stochastic PDEs are typically formulated in the following semilinear form,
\[ dX_t = \left[ A X_t + f(X_t) \right] dt + dW_t, \qquad (45) \]
where $X_t \in H(D)$ for all $t$, and $W_t$ is an infinite-dimensional Wiener process. The inner product over this Sobolev space is
\[ \langle \psi, \phi \rangle = \int_D \psi \phi \, dx. \qquad (46) \]
Theoretical details on infinite-dimensional Wiener processes can be found in [63].

B.1 Simulating infinite-dimensional Wiener processes
Let $H$ be a separable infinite-dimensional Hilbert space and $W_t$ be an $H$-valued $Q$-Wiener process. One may simulate this process using a series expansion. Let $\{e_k\}_{k \in \mathbb{N}}$ be an orthonormal basis of $H$ comprised of eigenvectors of $Q$ with eigenvalues $q_k > 0$. One can then represent $W_t$ as follows,
\[ W_t = \sum_{k=1}^{\infty} \sqrt{q_k} \, \beta_k(t) \, e_k, \qquad (47) \]
where the $\beta_k$ are independent real-valued Wiener processes.

B.2 Exponential Euler schemes
The exponential Euler scheme is a type of Galerkin method in which the linear part of the projected SPDE is solved exactly; the nonlinear part is integrated forward in time using Duhamel's principle. We assume $A$ admits an orthonormal basis $\{\phi_i\}_{i=1}^{\infty}$ of $L^2(D)$ with eigenvalues $-\lambda_i$, $\lambda_i > 0$, where $\phi_i \in H_0^1(D) \cap H^2(D)$. Here, $H_0^1(D)$ denotes the Sobolev space whose functions satisfy Dirichlet boundary conditions.

Define a finite-dimensional subspace of $H(D)$ via a subset of the orthonormal basis, $\{\phi_i\}_{i=1}^N$. We project the SPDE onto the resulting $N$-dimensional space and obtain a finite-dimensional representation of the SPDE,
\[ dX_t^N = \left[ A_N X_t^N + F_N(X_t^N) \right] dt + dW_t^N, \qquad (48) \]
where
\[ A_N v = \sum_{i=1}^N -\lambda_i \langle v, \phi_i \rangle \phi_i, \qquad (49) \]
\[ F_N = P_N f \big|_{X^N} = \sum_{i=1}^N \big\langle f(X_t^N), \phi_i \big\rangle \phi_i. \qquad (50) \]
The result of the projection is called the Itô-Galerkin stochastic ODE. The mild representation of the solution is
\[ X_t^N = e^{A_N t} x^N + \int_0^t e^{A_N (t-s)} F_N(X_s^N)\, ds + \int_0^t e^{A_N (t-s)}\, dW_s^N. \qquad (51) \]
Discretizing in time, we arrive at the following recurrence formula. Let $Y_k^N = X_{k \Delta t}^N$; then we have
\[ Y_{k+1}^N = e^{A_N \Delta t} Y_k^N + A_N^{-1} \left( e^{A_N \Delta t} - I \right) F_N(Y_k^N) + \int_{t_k}^{t_{k+1}} e^{A_N (t_{k+1} - s)}\, dW_s. \qquad (52) \]
This recurrence equation can be decoupled into $N$ equations describing the evolution of the coefficients of the solution. Let $Y_{k+1,i}^N$ and $F_N^i$ denote the $i$th components of $Y_{k+1}^N$ and $F_N$, respectively. Each component of $Y_{k+1}^N$ evolves according to
\[ Y_{k+1,i}^N = e^{-\lambda_i \Delta t} Y_{k,i}^N + \frac{1 - e^{-\lambda_i \Delta t}}{\lambda_i} F_N^i(Y_k^N) + \sqrt{q_i} \int_{t_k}^{t_{k+1}} e^{-\lambda_i (t_{k+1} - s)}\, d\beta_i(s). \qquad (53) \]
As for the last integral, note that any stochastic integral whose integrand does not depend on the Brownian motion is Gaussian [10]. Furthermore, any Itô integral has mean zero (by virtue of being a martingale) and its variance can be computed via the Itô isometry. That is,
\[ \mathbb{E}\left[ \left( \int_{t_k}^{t_{k+1}} e^{-\lambda_i (t_{k+1} - s)}\, d\beta_i(s) \right)^2 \right] = \int_{t_k}^{t_{k+1}} e^{-2\lambda_i (t_{k+1} - s)}\, ds \qquad (54) \]
\[ = \frac{1}{2\lambda_i} \left( 1 - e^{-2\lambda_i \Delta t} \right). \qquad (55) \]
Therefore, the numerical algorithm for simulating the SPDE is
\[ Y_{k+1,i}^N = e^{-\lambda_i \Delta t} Y_{k,i}^N + \frac{1 - e^{-\lambda_i \Delta t}}{\lambda_i} F_N^i(Y_k^N) + \sqrt{\frac{q_i}{2\lambda_i} \left( 1 - e^{-2\lambda_i \Delta t} \right)}\, \Delta W_k^i, \qquad (56) \]
where $\Delta W_k^i \sim \mathcal{N}(0, 1)$. For the stochastic advection-diffusion equation of Section 4.2.3, $A v = \alpha v_{xx}$ and $f(v) = b v_x$.

References

[1] Jingchen Liu, Jianfeng Lu, and Xiang Zhou. Efficient rare event simulation for failure problems in random media.
SIAM Journal on Scientific Computing, 37(2):A609–A624, 2015.
[2] Paul Embrechts and Noël Veraverbeke. Estimates for the probability of ruin with special emphasis on the possibility of large claims. Insurance: Mathematics and Economics, 1(1):55–72, 1982.
[3] Giovanni Dematteis, Tobias Grafke, and Eric Vanden-Eijnden. Rogue waves and large deviations in deep sea. Proceedings of the National Academy of Sciences, 115(5):855–860, 2018.
[4] Will Cousins, Miguel Onorato, Amin Chabchoub, and Themistoklis P Sapsis. Predicting ocean rogue waves from point measurements: An experimental study for unidirectional waves. Physical Review E, 99(3):032201, 2019.
[5] Eric Vanden-Eijnden. Transition path theory. In Computer Simulations in Condensed Matter Systems: From Materials to Chemical Biology Volume 1, pages 453–493. Springer, 2006.
[6] Benjamin Zhang, Youssef Marzouk, Byung-Young Min, and Tuhin Sahai. Rare event simulation of a rotorcraft system. In , page 1181, 2018.
[7] Eric Vanden-Eijnden and Jonathan Weare. Rare event simulation of small noise diffusions.
Communications on Pure and Applied Mathematics, 65(12):1770–1803, 2012.
[8] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Applications of Mathematics. Springer, 1998.
[9] Amarjit Budhiraja and Paul Dupuis. Analysis and Approximation of Rare Events: Representations and Weak Convergence Methods, volume 94. Springer, 2019.
[10] Bernt Øksendal. Stochastic differential equations. In Stochastic Differential Equations, pages 65–84. Springer, 2003.
[11] Herman Kahn and Theodore E Harris. Estimation of particle transmission by random sampling. National Bureau of Standards Applied Mathematics Series, 12:27–30, 1951.
[12] M Villen-Altamirano. Restart: A method for accelerating rare event simulations. Analysis, 3:3.
[13] Siu-Kui Au and James L Beck. Estimation of small failure probabilities in high dimensions by subset simulation.
Probabilistic Engineering Mechanics, 16(4):263–277, 2001.
[14] Iason Papaioannou, Wolfgang Betz, Kilian Zwirglmaier, and Daniel Straub. MCMC algorithms for subset simulation. Probabilistic Engineering Mechanics, 41:89–103, 2015.
[15] Frédéric Cérou, Pierre Del Moral, Teddy Furon, and Arnaud Guyader. Sequential Monte Carlo for rare event estimation. Statistics and Computing, 22(3):795–808, 2012.
[16] Zdravko I Botev and Dirk P Kroese. Efficient Monte Carlo simulation via the generalized splitting method. Statistics and Computing, 22(1):1–16, 2012.
[17] SR Srinivasa Varadhan. Large Deviations and Applications. SIAM, 1984.
[18] Paul Dupuis and Hui Wang. Importance sampling, large deviations, and differential games. Stochastics: An International Journal of Probability and Stochastic Processes, 76(6):481–508, 2004.
[19] Thomas Dean and Paul Dupuis. Splitting for rare event simulation: A large deviation approach to design and analysis. Stochastic Processes and their Applications, 119(2):562–587, 2009.
[20] Lasse Ebener, Georgios Margazoglou, Jan Friedrich, Luca Biferale, and Rainer Grauer. Instanton based importance sampling for rare events in stochastic PDEs. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(6):063102, 2019.
[21] G Margazoglou, L Biferale, R Grauer, K Jansen, D Mesterházy, T Rosenow, and R Tripiccione. Hybrid Monte Carlo algorithm for sampling rare events in space-time histories of stochastic fields. Physical Review E, 99(5):053303, 2019.
[22] David Siegmund. Importance sampling in the Monte Carlo study of sequential tests.
The Annals of Statistics, pages 673–684, 1976.
[23] Carsten Hartmann, Lorenz Richter, Christof Schütte, and Wei Zhang. Variational characterization of free energy: Theory and algorithms. Entropy, 19(11):626, 2017.
[24] Carsten Hartmann, Omar Kebiri, Lara Neureither, and Lorenz Richter. Variational approach to rare event simulation using least-squares regression. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(6):063107, 2019.
[25] Omar Kebiri, Lara Neureither, and Carsten Hartmann. Adaptive importance sampling with forward-backward stochastic differential equations. arXiv preprint arXiv:1802.04981, 2018.
[26] Wei Zhang, Han Wang, Carsten Hartmann, Marcus Weber, and Christof Schütte. Applications of the cross-entropy method to importance sampling and optimal control of diffusions. SIAM Journal on Scientific Computing, 36(6):A2654–A2672, 2014.
[27] Paul Glasserman, Yashan Wang, et al. Counterexamples in importance sampling for large deviations probabilities. The Annals of Applied Probability, 7(3):731–746, 1997.
[28] Bernard O Koopman. Hamiltonian systems and transformation in Hilbert space.
Proceedings of the National Academy of Sciences of the United States of America, 17(5):315, 1931.
[29] Marko Budišić, Ryan Mohr, and Igor Mezić. Applied Koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(4):047510, 2012.
[30] A. Mauroy, I. Mezić, and Y. Susuki, editors. The Koopman Operator in Systems and Control. Springer.
[31] Peter J Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics, 656:5–28, 2010.
[32] Clarence W Rowley, Igor Mezić, Shervin Bagheri, Philipp Schlatter, and Dan S Henningson. Spectral analysis of nonlinear flows.
Journal of Fluid Mechanics, 641:115–127, 2009.
[33] Ioannis Karatzas and Steven Shreve. Brownian Motion and Stochastic Calculus, volume 113. Springer Science & Business Media, 2012.
[34] Grigorios A Pavliotis. Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations, volume 60. Springer, 2014.
[35] Søren Asmussen and Peter W Glynn. Stochastic Simulation: Algorithms and Analysis, volume 57. Springer Science & Business Media, 2007.
[36] Daniel W Stroock. Elements of Stochastic Calculus and Analysis. Springer.
[37] L Chris G Rogers and David Williams. Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus, volume 2. Cambridge University Press, 2000.
[38] Simo Särkkä and Arno Solin. Applied Stochastic Differential Equations, volume 10. Cambridge University Press, 2019.
[39] Mark Iosifovich Freidlin and Alexander D Wentzell. Random perturbations. In Random Perturbations of Dynamical Systems, pages 15–43. Springer, 1998.
[40] Paul Dupuis and Hui Wang. Subsolutions of an Isaacs equation and efficient schemes for importance sampling.
Mathematics of Operations Research, 32(3):723–757, 2007.
[41] Paul Dupuis, Konstantinos Spiliopoulos, and Xiang Zhou. Escaping from an attractor: importance sampling and rest points I. The Annals of Applied Probability, 25(5):2909–2958, 2015.
[42] Michael Salins and Konstantinos Spiliopoulos. Rare event simulation via importance sampling for linear SPDEs. Stochastics and Partial Differential Equations: Analysis and Computations, pages 1–39, 2017.
[43] Eliodoro Chiavazzo, Roberto Covino, Ronald R Coifman, C William Gear, Anastasia S Georgiou, Gerhard Hummer, and Ioannis G Kevrekidis. Intrinsic map dynamics exploration for uncharted effective free-energy landscapes. Proceedings of the National Academy of Sciences, 114(28):E5494–E5503, 2017.
[44] Erik H Thiede, Dimitrios Giannakis, Aaron R Dinner, and Jonathan Weare. Galerkin approximation of dynamical quantities using trajectory data. The Journal of Chemical Physics, 150(24):244111, 2019.
[45] Konstantinos Spiliopoulos. Nonasymptotic performance analysis of importance sampling schemes for small noise diffusions. Journal of Applied Probability, 52(3):797–810, 2015.
[46] Nelida Črnjarić-Žic, Senka Maćešić, and Igor Mezić. Koopman operator spectrum for random dynamical systems. Journal of Nonlinear Science, pages 1–50, 2019.
[47] Giorgio Metafune, Diego Pallara, and Enrico Priola. Spectrum of Ornstein-Uhlenbeck operators in L^p spaces with respect to invariant measures. Journal of Functional Analysis, 196(1):40–60, 2002.
[48] Jonathan H Tu, Clarence W Rowley, Dirk M Luchtenburg, Steven L Brunton, and J Nathan Kutz. On dynamic mode decomposition: theory and applications. arXiv preprint arXiv:1312.0041, 2013.
[49] Matthew O Williams, Ioannis G Kevrekidis, and Clarence W Rowley. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25(6):1307–1346, 2015.
[50] Stefan Klus, Feliks Nüske, Sebastian Peitz, Jan-Hendrik Niemann, Cecilia Clementi, and Christof Schütte. Data-driven approximation of the Koopman generator: Model reduction, system identification, and control, 2019.
[51] Andreas Rößler. Runge–Kutta methods for the strong approximation of solutions of stochastic differential equations. SIAM Journal on Numerical Analysis, 48(3):922–952, 2010.
[52] Arnulf Jentzen and Peter E Kloeden. The numerical approximation of stochastic partial differential equations. Milan Journal of Mathematics, 77(1):205–244, 2009.
[53] Yong Chen, Yong Liu, et al. On the eigenfunctions of the complex Ornstein–Uhlenbeck operators. Kyoto Journal of Mathematics, 54(3):577–596, 2014.
[54] Todd K Leen, Robert Friel, and David Nielsen. Eigenfunctions of the multidimensional linear noise Fokker-Planck operator via ladder operators. arXiv preprint arXiv:1609.01194, 2016.
[55] Benjamin Zhang, Tuhin Sahai, and Youssef Marzouk. Computation of eigenfunctions of the multidimensional Ornstein-Uhlenbeck operator, in preparation.
[56] E Weinan, Weiqing Ren, and Eric Vanden-Eijnden. String method for the study of rare events. Physical Review B, 66(5):052301, 2002.
[57] Xiang Zhou et al. The gentlest ascent dynamics. arXiv preprint arXiv:1011.0042, 2010.
[58] Steven L Brunton and J Nathan Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2019.
[59] Paul Dupuis, Konstantinos Spiliopoulos, and Hui Wang. Importance sampling for multiscale diffusions. Multiscale Modeling & Simulation, 10(1):1–27, 2012.
[60] Michelle Boué, Paul Dupuis, et al. A variational representation for certain functionals of Brownian motion. The Annals of Probability, 26(4):1641–1659, 1998.
[61] Steven H Strogatz. Nonlinear Dynamics and Chaos with Student Solutions Manual: With Applications to Physics, Biology, Chemistry, and Engineering. CRC Press, 2018.
[62] Zhongqiang Zhang and George Karniadakis. Numerical Methods for Stochastic Partial Differential Equations with White Noise, volume 196. Springer, 2017.
[63] Giuseppe Da Prato and Jerzy Zabczyk. Stochastic Equations in Infinite Dimensions. Cambridge University Press, 2014.