Backward Simulation of Stochastic Process Using a Time Reverse Monte Carlo Method

Journal of the Physical Society of Japan

Shinichi Takayanagi* and Yukito Iba†
SOKENDAI, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
The "backward simulation" of a stochastic process is defined as the stochastic dynamics that trace a time-reversed path from the target region to the initial configuration. If the probabilities calculated by the original simulation are easily restored from those obtained by backward dynamics, we can use it as a computational tool. It is shown that the naïve approach to backward simulation does not work as expected. As a remedy, the time reverse Monte Carlo method (TRMC) based on the ideas of sequential importance sampling (SIS) and sequential Monte Carlo (SMC) is proposed and successfully tested with a stochastic typhoon model and the Lorenz 96 model. TRMC with SMC, which contains resampling steps, is found to be more efficient for simulations with a larger number of time steps. A limitation of TRMC and its relation to the Bayes formula are also discussed.
1. Introduction
In this paper, we discuss "backward simulation," which traces a time-reversed path from a target region A to the initial configuration (Fig. 1). If the outputs of the original simulation ("forward simulation") are easily restored from those obtained by backward dynamics, we can use backward simulation as a computational tool. In particular, the time required to calculate the probability to reach A from the initial configurations can be significantly reduced when the target region A is small but the initial distribution is broad. An example is a computation of the probability that a typhoon will hit the Tokyo area exactly under a given stochastic model (Sect. 5.1).

Fig. 1. Forward and backward simulations. The forward simulation is inefficient when the target region A is much smaller than the support of the initial distribution p(x_0). The backward simulation simulates paths from the target region A to the support of the initial distribution p(x_0).

It is, however, difficult to design backward dynamics with the desired properties. Specifically, consider the forward dynamics of a D-dimensional stochastic process X defined by

X_{i+1} = g(X_i) + \eta_i ,   (1)

where \eta_i is an independent noise that obeys an arbitrary distribution and the function g : \mathbb{R}^D \to \mathbb{R}^D describes the noiseless forward dynamics. Then, a naïve way to derive a time-reversed equation is to rearrange Eq. (1) as

X_i = g^{-1}(X_{i+1} - \eta_i) .   (2)

Here, we assume that g is a one-to-one and onto function and denote the inverse function of g by g^{-1}. We can construct a time-reversed path iteratively, using Eq. (2) and independent realizations of \eta_i, starting from the target region. This defines an apparently natural candidate for backward dynamics.

Surprisingly, this naïve method does not work as expected; it does not reproduce the correct probabilities defined by the forward simulation, and the calculation of the factors required to correct the bias is often computationally expensive. This becomes clear in Sect. 2. Furthermore, the computation of g^{-1} in Eq. (2) is time-consuming and reduces the efficiency of the computation.

The aim of this research is to draw attention to these facts and propose an algorithm that partially resolves the problem. We named this algorithm the time reverse Monte Carlo method (TRMC). TRMC is based on the ideas of sequential importance sampling (SIS)^{1,2)} and sequential Monte Carlo (SMC).^{1,3-5)} We discuss TRMC based on SIS in Sects. 3 and 4 and its improved version based on SMC in Sect. 6. There have been several studies using "path reweighting" in computational chemistry.^{6,7)} These studies used reweighting for different purposes.
TRMC based on SIS is tested for a stochastic typhoon model and the Lorenz 96 model in Sect. 5. The improved version with SMC is also tested in Sect. 6 for simulations with a larger number of steps. In these examples, TRMC provides unbiased estimates of the probabilities without expensive computation. In Sect. 7, we discuss its relation to the Bayes formula, as well as the possible improvement and limitations of TRMC.

Time-reversed dynamics itself was discussed in several studies, mostly from a theoretical viewpoint. On the other hand, related computational problems are found in data science, especially in time-series analysis using state-space models. Our problem can formally be regarded as a limiting case of the "smoothing" part of these algorithms, where only one observation ("target") is available at the end of the time series. There are, however, important differences from our problem, which are discussed in Sect. 7. Studies related to statistical inference on discrete-state stochastic processes, such as gene propagation^{2,14)} and information source detection, were also reported. These studies, however, did not consider dynamical systems of continuous variables.
2. Failure of Naïve Method
Here, we provide a detailed discussion of the naïve method and its drawbacks, which form the motivation for our algorithm. Before providing details, we formulate the problem. Let S_T = \{0 = t_0 \le t_1 \le \cdots \le t_N = T\} be a partition of the interval [0, T], and let the step size \Delta t = t_{i+1} - t_i be a constant; x_i is used to represent the value of the stochastic process X at time point t_i. The transition probability density from x_i to x_{i+1} defined by Eq. (1) is denoted as p(x_{i+1}|x_i). We consider an estimation of the probability P(X_N \in A) that X_N hits a small target region A in the D-dimensional space. The probability is formally written as

P(X_N \in A) = \int dx_{0:N} \, \mathbb{1}_{x_N \in A} \prod_{i=0}^{N-1} p(x_{i+1}|x_i) \, p(x_0) ,   (3)

where \mathbb{1}_{x \in A} is the indicator function that takes the value 1 when x \in A, and 0 otherwise, and p(x_0) is the initial distribution of the forward simulation. Hereafter, dx_{k:l} indicates dx_k \, dx_{k+1} \cdots dx_l for k \le l.

A naïve method is defined as a repeated simulation with a uniformly distributed initial condition in the target region A using Eq. (2). Initially, it appears sufficient to evaluate P(X_N \in A) as \frac{V_A}{M} \sum_{j=1}^{M} p(x_0^{(j)}). However, there are two problems with this naïve method. First, the exact computation of g^{-1} in Eq. (2) is not easy. Computing g^{-1} using numerical root-finding techniques such as the Newton-Raphson method is computationally intensive, and its severity increases as the dimension increases.

Second, this computation does not reproduce the correct probability P(X_N \in A) even with the exact g^{-1}. To understand this problem, we show the difference between the forward simulation and the naïve method. Let us define

Y_i = X_i - \eta_{i-1} = g(X_{i-1}) , \quad i \in [1, \ldots, N] .   (4)

Using this definition, we can rewrite Eq. (2) as

Y_i + \eta_{i-1} = g^{-1}(Y_{i+1}) .   (5)

Equation (5) can be simplified into

Y_i = g^{-1}(Y_{i+1}) - \eta_{i-1} .   (6)

The probability calculated using Eq. (6) corresponds to

\int dy_{1:N} \, dx_N \, \mathbb{1}_{x_N \in A} \, \tilde{p}_f(y_N|x_N) \prod_{i=1}^{N-1} \tilde{p}(y_i|y_{i+1}) \, p(g^{-1}(y_1)) ,   (7)

where \tilde{p}_f(y_N|x_N) is the transition probability density from x_N to y_N defined by Eq. (4) with i = N, and \tilde{p}(y_i|y_{i+1}) is the transition probability density from y_{i+1} to y_i defined by Eq. (6). The initial condition x_N is uniformly distributed in the target region A.

We have to introduce the Jacobian of the function g so that Eq. (7) is consistent with Eq. (3). To show this, Eq. (3) is rewritten using the relations

p(x_i|x_{i-1}) \, dx_i = |\det(J_{g^{-1}}(y_{i+1}))| \, \tilde{p}(y_i|y_{i+1}) \, dy_i , \quad i \in [1, \ldots, N-1] ,   (8)

p(x_0) \, dx_0 = |\det(J_{g^{-1}}(y_1))| \, p(g^{-1}(y_1)) \, dy_1 ,   (9)

where |\det(J_{g^{-1}}(y_i))| is the absolute value of the Jacobian determinant of the function g^{-1}. As a result, the probability P(X_N \in A) is calculated as

P(X_N \in A) = \int dy_{1:N} \, dx_N \, \mathbb{1}_{x_N \in A} \, J(y_1, \ldots, y_N) \, \tilde{p}_f(y_N|x_N) \prod_{i=1}^{N-1} \tilde{p}(y_i|y_{i+1}) \, p(g^{-1}(y_1)) ,   (10)

J(y_1, \ldots, y_N) = \prod_{i=0}^{N-1} |\det(J_{g^{-1}}(y_{i+1}))| .   (11)

We can obtain the correct probability using Eq. (10) instead of Eq. (7). The calculation of the Jacobian J_{g^{-1}} is, however, computationally expensive.

We note that the factor J(y_1, \ldots, y_N) goes to

\exp\left( -\int_0^T \mathrm{div}\, f(x_t) \, dt \right)   (12)

in the limit as \Delta t \to 0 with g(x) = x + f(x)\Delta t. The proof of Eq. (12) is given in Appendix A. This shows that we must include the factor J(y_1, \ldots, y_N) for unbiased estimation even in the limit of infinitesimal \Delta t. We can regard the factor written in Eq. (12) as the change in the infinitesimal volume along each path (Fig. 2).

Fig. 2. Change in the infinitesimal volume in the state space along each path.
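The bias of the naïve method and its Jacobian correction can be checked numerically. The following is a minimal sketch in a hypothetical one-dimensional linear system of our own (not an example from the paper), where g(x) = ax makes the factor J of Eq. (11) a constant (1/a)^N and P(X_N \in A) is available in closed form:

```python
import math
import random

# Hypothetical 1-D example: forward dynamics X_{i+1} = a*X_i + eta_i with
# g(x) = a*x, so g^{-1}(y) = y/a and |det J_{g^{-1}}| = 1/a at every step.
a, sigma, N = 1.1, 0.3, 10
lo, hi = 1.5, 2.0                      # target region A = [lo, hi]
V_A = hi - lo

def p0(x):                             # initial density: X_0 ~ N(0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# exact P(X_N in A): X_N ~ N(0, a^{2N} + sigma^2 (a^{2N} - 1)/(a^2 - 1))
sd_N = math.sqrt(a ** (2 * N) + sigma ** 2 * (a ** (2 * N) - 1) / (a ** 2 - 1))
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
P_true = Phi(hi / sd_N) - Phi(lo / sd_N)

rng = random.Random(0)
M = 100_000
J = (1 / a) ** N                       # the correction factor of Eq. (11)
naive = corrected = 0.0
for _ in range(M):
    x = rng.uniform(lo, hi)            # start uniformly in the target region
    for _ in range(N):
        x = (x - rng.gauss(0.0, sigma)) / a   # naive backward step, Eq. (2)
    naive += V_A * p0(x) / M           # naive estimator: Jacobian factor omitted
    corrected += V_A * p0(x) * J / M   # with J as in Eq. (10): unbiased
```

Here the naïve estimate exceeds the true probability by exactly the factor a^N = 1/J, while the corrected estimate agrees with the closed-form value.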
3. Time Reverse Monte Carlo Method
To overcome these difficulties, we propose the TRMC method. TRMC essentially involves introducing simplified backward dynamics with a weight. This weight enables the bias of estimators to be corrected. First, we introduce a backward transition probability q(x_{i+1} \to x_i) from x_{i+1} to x_i. We can choose an arbitrary probability density q, while the computational efficiency strongly depends on it. Once we introduce q(x_{i+1} \to x_i), we can rewrite Eq. (3) as

P(X_N \in A) = \int dx_{0:N} \, \frac{\mathbb{1}_{x_N \in A}}{V_A} \prod_{i=0}^{N-1} q(x_{i+1} \to x_i) \, W_i \, V_A \, p(x_0) ,   (13)

where

W_i = \frac{p(x_{i+1}|x_i)}{q(x_{i+1} \to x_i)}   (14)

is the weight required to correct the bias of estimators and V_A is the volume of the target region A. Suppose p(x_0) is uniformly distributed on B \subset \mathbb{R}^D; then p(x_0) = 1/V_B for x_0 \in B, where V_B is the volume of B. The efficiency of our algorithm does not depend on the factor V_A when V_B is considerably large. This is the advantage of using our algorithm.

The algorithm consists of the following steps.

TRMC Algorithm
Step 1: Draw M samples \{x_N^{(1)}, \ldots, x_N^{(M)}\} from the uniform distribution in A.
Step 2: Apply the following steps for j = 1, \ldots, M, and for i = N-1, \ldots, 0:
  a. Move from x_{i+1}^{(j)} to x_i^{(j)} with the transition probability q(x_{i+1}^{(j)} \to x_i^{(j)}).
  b. Calculate the weight W_i^{(j)} using Eq. (14).
Step 3: Evaluate the unbiased estimate of the probability P(X_N \in A) as

P(X_N \in A) \simeq \frac{1}{M} \sum_{j=1}^{M} W^{(j)} ,   (15)

where the factor

W^{(j)} = \prod_{i=0}^{N-1} W_i^{(j)} \, V_A \, p(x_0^{(j)})   (16)

is attached to each simulation path.

The inputs of our algorithm are the number of Monte Carlo paths M, the number of time steps N, the initial distribution p(x_0), the target region A, and the transition probability density q. When we actually perform the simulation on our computers, we take the logarithm of these weights to prevent numerical overflow.

This algorithm provides unbiased estimates of the desired probabilities. The idea of this scheme is a kind of SIS. An advantage of our method is that we do not need to calculate g^{-1} or its Jacobian matrix at each step i.

The remaining problem involves determining the method for choosing the transition probability q(x_{i+1} \to x_i). The basic idea is to choose backward dynamics that generate trajectories similar to those of the forward dynamics defined by Eq. (1). The similarity of the trajectory is measured by W^{(j)} in Eq. (16).
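The three steps above can be sketched as follows. The interface (function arguments such as `sample_in_A` and `q_sample`) is our own illustration, not notation from the paper; as noted above, the weights are accumulated in logarithm. The toy check at the end uses a one-dimensional linear drift so that P(X_N \in A) has a closed form.

```python
import math
import random

def trmc(sample_in_A, V_A, q_sample, q_pdf, p_pdf, p0_pdf, N, M, seed=0):
    """Sketch of the TRMC algorithm (Steps 1-3) for a user-supplied backward
    kernel q; the argument names are hypothetical, for illustration only."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(M):
        x_next = sample_in_A(rng)            # Step 1: draw x_N uniformly in A
        log_w = 0.0
        for _ in range(N):                   # Step 2: i = N-1, ..., 0
            x = q_sample(x_next, rng)        # backward move x_{i+1} -> x_i
            # log W_i = log p(x_{i+1}|x_i) - log q(x_{i+1} -> x_i), Eq. (14)
            log_w += math.log(p_pdf(x_next, x)) - math.log(q_pdf(x_next, x))
            x_next = x
        # Step 3: path weight W^{(j)} = prod_i W_i * V_A * p(x_0), Eq. (16)
        total += math.exp(log_w) * V_A * p0_pdf(x_next)
    return total / M

# Toy check: 1-D chain X_{i+1} = X_i - X_i*dt + eps*sqrt(dt), X_0 ~ N(0, 1);
# backward proposal in the spirit of Eq. (18): x_i = x_{i+1} + x_{i+1}*dt + eps*sqrt(dt).
dt, N, sigma = 0.05, 20, 1.0
s = sigma * math.sqrt(dt)
lo, hi = 1.2, 1.4
norm = lambda u, sd: math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
est = trmc(lambda rng: rng.uniform(lo, hi), hi - lo,
           lambda x1, rng: (1 + dt) * x1 + rng.gauss(0.0, s),
           lambda x1, x0: norm(x0 - (1 + dt) * x1, s),   # q(x_{i+1} -> x_i)
           lambda x1, x0: norm(x1 - (1 - dt) * x0, s),   # p(x_{i+1} | x_i)
           lambda x0: norm(x0, 1.0),                     # initial density p(x_0)
           N, 50_000)

# exact value: X_N is Gaussian with variance (1-dt)^{2N} + s^2 (1-(1-dt)^{2N})/(1-(1-dt)^2)
var_N = (1 - dt) ** (2 * N) + s ** 2 * (1 - (1 - dt) ** (2 * N)) / (1 - (1 - dt) ** 2)
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
exact = Phi(hi / math.sqrt(var_N)) - Phi(lo / math.sqrt(var_N))
```

The backward kernel here is only a convenient choice for the toy chain; any q with the correct support gives an unbiased estimate, while the variance depends on how well q mimics the reversed forward dynamics.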
4. Implementation for Stochastic Difference Equation

To give concrete examples of the transition probability q(x_{i+1} \to x_i), we assume the forward dynamics to be given in the following form:

X_{i+1} = X_i + f(X_i) \Delta t + \epsilon_i \sqrt{\Delta t} .   (17)

This corresponds to the case wherein g(x) = x + f(x)\Delta t in Eq. (1). The noise \epsilon_i is assumed to be i.i.d. Gaussian noise with mean zero and variance-covariance matrix \Sigma = \sigma\sigma^T. This class of equations appears in a wide range of problems in many different fields such as physics, computational chemistry, and mathematical finance.^{18,19)}

In this case, as a simple choice, we can use the following backward dynamics:

X_i = X_{i+1} - f(X_{i+1}) \Delta t + \epsilon_i \sqrt{\Delta t} .   (18)

This approximation corresponds to substituting f(X_{i+1}) for f(X_i) in Eq. (17).
With this choice, the weight W_i in Eq. (14) takes the form

W_i = \frac{p(x_{i+1}|x_i)}{q(x_{i+1} \to x_i)}
    = \frac{\exp\left[ -(x_{i+1} - x_i - f(x_i)\Delta t)^T (2\Sigma\Delta t)^{-1} (x_{i+1} - x_i - f(x_i)\Delta t) \right]}{\exp\left[ -(x_{i+1} - x_i - f(x_{i+1})\Delta t)^T (2\Sigma\Delta t)^{-1} (x_{i+1} - x_i - f(x_{i+1})\Delta t) \right]}
    = \exp\left[ -(f(x_{i+1}) - f(x_i))^T \Sigma^{-1} \left( (x_{i+1} - x_i) - \frac{\Delta t}{2}(f(x_{i+1}) + f(x_i)) \right) \right] .   (19)

As we show in the next section, the resultant algorithm is simple yet effective compared with the forward simulation when the target region A is smaller than the support of the initial distribution p(x_0).

We note that the factor \prod_{i=0}^{N-1} W_i goes to

\exp\left( -\int_0^T \mathrm{div}\, f(x_t) \, dt \right)   (20)

in the limit as \Delta t \to 0. The proof of Eq. (20) is given in Appendix B. Note that Eq. (20) coincides with Eq. (12) derived from a different assumption.
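The closed form of Eq. (19) can be checked against the defining density ratio of Eq. (14). The sketch below does this in one dimension (\Sigma = \sigma^2) with an arbitrary illustrative drift f of our own choosing:

```python
import math
import random

# Numerical check of Eq. (19) in 1-D; the drift f and all numbers are
# illustrative test values, not a system from the paper.
def f(x):
    return math.sin(x) - 0.5 * x

def gauss_pdf(u, var):
    return math.exp(-0.5 * u * u / var) / math.sqrt(2 * math.pi * var)

rng = random.Random(42)
sigma2, dt = 0.3 ** 2, 0.01
for _ in range(100):
    x_i = rng.uniform(-2.0, 2.0)
    x_ip1 = x_i + f(x_i) * dt + rng.gauss(0.0, math.sqrt(sigma2 * dt))
    # W_i as the density ratio of Eq. (14): p(x_{i+1}|x_i) / q(x_{i+1} -> x_i)
    w_ratio = (gauss_pdf(x_ip1 - x_i - f(x_i) * dt, sigma2 * dt)
               / gauss_pdf(x_ip1 - x_i - f(x_ip1) * dt, sigma2 * dt))
    # W_i in the closed form of Eq. (19)
    w_closed = math.exp(-(f(x_ip1) - f(x_i)) / sigma2
                        * ((x_ip1 - x_i) - 0.5 * dt * (f(x_ip1) + f(x_i))))
    assert abs(w_ratio - w_closed) < 1e-9 * max(1.0, w_ratio)
```

The normalization constants of the two Gaussians cancel in the ratio, which is why Eq. (19) contains only the exponential factor.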
5. Experimental Results
We present the numerical results in this section. Forward simulations (FS) are used to check the consistency and computational efficiency of our results. Using forward and backward dynamics, we simulate sample trajectories x = \{x_0, \ldots, x_N\} generated by each model and compute the probability P(X_N \in A) from M independent simulations.

To evaluate the computational efficiency, we denote the standard error of TRMC by \sigma_s and the standard error of FS by \sigma_{FS}. Using these quantities, we define the relative value of the variance as

\rho = \left( \frac{\sigma_{FS}}{\sigma_s} \right)^2 .   (21)

The factor \rho indicates the computational efficiency including only the effect caused by the variance of the estimators for a fixed sample size. With this definition, more complex algorithms tend to appear more efficient even though they require more computational time. Hence, we also define another measure of the relative computational efficiency \rho' as

\rho' = \rho \, \frac{\tau_{FS}}{\tau} ,   (22)

where \tau is the computational time of the simulation in seconds and \tau_{FS} is the computational time of FS in seconds. This efficiency is defined in the sense of actual performance, considering both the computational time and the variance of the resulting estimates.
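As a trivial sketch, the two measures of Eqs. (21) and (22) can be computed as follows; the numbers in the usage comments are made-up placeholders, not results from the paper:

```python
# Efficiency measures of Eqs. (21) and (22); inputs are illustrative.
def rho(sigma_fs, sigma_s):
    """Eq. (21): relative variance, (sigma_FS / sigma_s)^2."""
    return (sigma_fs / sigma_s) ** 2

def rho_prime(sigma_fs, sigma_s, tau_fs, tau):
    """Eq. (22): rho adjusted by the ratio of computational times."""
    return rho(sigma_fs, sigma_s) * tau_fs / tau

r = rho(2e-4, 1e-4)                    # -> 4.0: half the standard error
rp = rho_prime(2e-4, 1e-4, 10.0, 5.0)  # -> 8.0: also twice as fast
```

A method can thus score well on \rho yet poorly on \rho' if its per-sample cost is high, which is the distinction the text draws.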
5.1 Stochastic typhoon model

The first example is a stochastic typhoon model, which gives an example of risk estimation by the proposed method. The stochastic typhoon model was designed to reproduce the statistics of typhoons in the northwestern part of the Pacific Ocean. This is a four-dimensional model given by

x_{i+1} = x_i + v_i ,
v_{i+1} = V(x_{i+1}) + w (v_i - V(x_i)) + \epsilon_i ,
V(x_i) = a_0 + a_1 x_{\phi,i} + a_2 \sin x_{\lambda,i} + a_3 \sin 2x_{\lambda,i} ,   (23)

where we use a global coordinate system defined by the geographic longitude (\phi) and latitude (\lambda). We also define the two-dimensional position x = (x_\phi, x_\lambda) and speed v = (v_\phi, v_\lambda) of a typhoon, and the function V(x) = (V_\phi(x), V_\lambda(x)). Here w, a_0, a_1, a_2, and a_3 are constants. The noise \epsilon obeys a Gaussian distribution with mean zero and variance \sigma^2. We fix w, the two-dimensional constants a_0, a_1, a_2, and a_3, and \sigma at constant values.

The target region A is a small rectangle in (x_\phi, x_\lambda) around the Tokyo area, with the longitude lower bound slightly above 138 degrees east. Since there is no range constraint on the distribution of the final speed v_f at the target, we adopt a uniform distribution with a suitably wide range U_f; here, U_f is a rectangle in (v_\phi, v_\lambda) centered at V(x_A), where x_A is the center of the target region A. We also assume that the initial condition is uniformly distributed in a region D_0 of positions and speeds, whose longitudes start at 111 degrees east and whose speeds lie in a narrow rectangle U_0 centered at V(\bar{x}) with \bar{x} near (120, \cdot). This corresponds to the case wherein typhoons that occurred in the Philippines travel to the Tokyo area exactly with a small probability. We set N to 16 and generate a large number M of Monte Carlo paths. Examples of Monte Carlo paths for both simulations are given in Figs. 3 and 4.

This simulation was carried out on a laptop computer with a 2.3 GHz Intel Core i5 and 8 GB of memory. The computational time of TRMC in this simulation for generating the Monte Carlo paths is around 4 x 10^... seconds.

Table I shows the results of the computational experiments for the stochastic typhoon model. It shows that the probabilities of FS and TRMC agree within the error bars. If we ignore the factor defined by Eq. (16), the unbiased probability is not reproduced, as in the case of the stochastic difference equation. Furthermore, it shows that TRMC is 4.2 times more efficient than FS in terms of \rho and 7.3 times in terms of \rho'. Fig. 5 shows the convergence of TRMC as the number of Monte Carlo paths M increases. It reveals that our algorithm converges correctly on increasing the number of Monte Carlo paths M.

Fig. 3. Examples of Monte Carlo paths generated by the stochastic typhoon model, originating from the northwestern part of the Pacific Ocean. Each line corresponds to a path generated by the forward simulation. The black rectangular region shows the possible initial positions of typhoons in the northwestern part of the Pacific Ocean. The initial positions of typhoons are uniformly distributed.

Fig. 4. Examples of Monte Carlo paths generated by TRMC, starting from Tokyo. Each line corresponds to a path generated by TRMC. The black rectangular region corresponds to the possible initial positions of typhoons in the northwestern part of the Pacific Ocean.

Fig. 5. Convergence of TRMC for the stochastic typhoon model. The estimated probabilities converge to those obtained by FS as the number of Monte Carlo paths increases. Error bars indicate approximate confidence intervals of FS with the same number of Monte Carlo paths M as TRMC.

To simulate events with smaller probabilities, we make the target region A smaller. The results (Case II in Table I) show that the smaller the probability, the more efficient our algorithm becomes as compared with FS.

In Fig. 4, a few Monte Carlo paths are seen to move northward. To prevent this and improve efficiency, we restrict the velocity distribution of the Monte Carlo paths to a tendency to move southward by changing the range U_f of the final speed v_f accordingly. We call this simulation TRMC (restricted) in Fig. 6. Table I shows that the probabilities of TRMC and TRMC (restricted) agree within error bars. Because the number of unnecessary Monte Carlo paths moving northward decreases, TRMC (restricted) is more efficient than TRMC. A more severe constraint v_\lambda \ge 0, however, causes a small bias in the estimated probabilities: see TRMC (v_\lambda \ge 0) in Table I. Fig. 7 also shows that our algorithm converges correctly on increasing the number of Monte Carlo paths M.

So far, we have considered the case where the number of time steps from the initial position to Tokyo (N = 16) is precisely known. We can relax this assumption, but we should be careful with a limitation of a discrete-time model. A typhoon can pass near Tokyo, for example, between N = 15 and N = 16, which causes an underestimation of the actual risk when we only consider hits at integral time steps. A way to reduce this effect is to develop models with smaller steps, while it is also possible to introduce some initialization or interpolation method into a backward simulation. However, we leave this as a future problem, because this study aims to check whether the concept of backward simulation is mathematically valid.

5.2 Lorenz 96 model

As a higher-dimensional example, we evaluate the efficiency of our algorithm for the Lorenz 96 model.^{21,22)}
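The forward dynamics of Eq. (23) can be sketched as follows. All numerical constants below are placeholders we invented for illustration (the fitted values used in the paper are not reproduced here), and the units are schematic:

```python
import math
import random

# One forward step of the stochastic typhoon model, Eq. (23).
# ALL constants here are hypothetical placeholders, not the paper's values.
w = 0.5
a0 = (0.5, 0.3)
a1 = (0.005, 0.004)
a2 = (0.2, 0.1)
a3 = (-0.1, 0.05)
sigma = 0.4

def V(x):
    """Mean velocity field of Eq. (23), componentwise in (phi, lambda):
    V = a0 + a1 * x_phi + a2 * sin(x_lambda) + a3 * sin(2 * x_lambda)."""
    phi, lam = x
    return tuple(a0[k] + a1[k] * phi + a2[k] * math.sin(lam)
                 + a3[k] * math.sin(2 * lam) for k in (0, 1))

def step(x, v, rng):
    """x_{i+1} = x_i + v_i;  v_{i+1} = V(x_{i+1}) + w*(v_i - V(x_i)) + eps_i."""
    x_new = (x[0] + v[0], x[1] + v[1])
    Vx, Vx_new = V(x), V(x_new)
    v_new = tuple(Vx_new[k] + w * (v[k] - Vx[k]) + rng.gauss(0.0, sigma)
                  for k in (0, 1))
    return x_new, v_new

# forward-simulate N = 16 steps from a position near the Philippines
rng = random.Random(0)
x, v = (125.0, 15.0), V((125.0, 15.0))
for _ in range(16):
    x, v = step(x, v, rng)
```

The structure, position updated by speed and speed relaxing toward a position-dependent mean field V(x) with persistence w, is what matters here; the noise term is what the backward proposal of Sect. 4 would mimic.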
The Lorenz 96 model is an atmospheric model introduced by Edward Lorenz in 1996. It is defined as the set of coupled ordinary differential equations

dx_k/dt = f_k(x) + \epsilon_k ,
f_k(x) = -x_{k-2} x_{k-1} + x_{k-1} x_{k+1} - x_k + F ,   (24)

for k = 1, \ldots, K, where x = \{x_k; k = 1, \ldots, K\} is the state of the system and F is a constant. We fix the dimension K and take \epsilon_k Gaussian with mean zero and variance \sigma^2. Here, we choose F = 8, a value known to cause weak chaotic behavior and often used as a benchmark in data assimilation.
Table I. Comparison among TRMC, TRMC (restricted), TRMC (v_\lambda \ge 0), TRMC (no weight), and FS for the stochastic typhoon model (Cases I and II). The columns list P(X_N \in A), \sigma_s, \rho, and \rho'.
Fig. 6. Examples of Monte Carlo paths generated by TRMC starting from Tokyo, with the velocity distribution restricted to a tendency to move southward. Each line corresponds to a path generated by TRMC. The black rectangular region corresponds to the possible initial positions of typhoons in the northwestern part of the Pacific Ocean.
To simulate Eq. (24), we have to discretize it. While many discretization schemes are available, we focus on the simplest and most common one, the Euler scheme. The time-discretized version of Eq. (24) by the Euler scheme is

x_{k,i+1} = x_{k,i} + f_k(x_i) \Delta t + \epsilon_{k,i} \sqrt{\Delta t} , \quad k = 1, \ldots, K ,   (25)

where we set \Delta t to 0.001 and \sigma proportional to 1/\sqrt{\Delta t}.

Fig. 7. Convergence of TRMC for the stochastic typhoon model in the smaller-probability case. The estimated probabilities converge to those obtained by FS as the number of Monte Carlo paths increases. Error bars indicate approximate confidence intervals of FS with the same number of Monte Carlo paths M as TRMC.

The target region A is a small hypercube around the origin, with a common bound |x_k| for all k = 1, \ldots, K; the initial state is uniformly distributed in another hypercube with positive bounds on each x_k. We set N to 100 and generate a large number M of Monte Carlo paths.

We conducted this simulation in the environment described in Sect. 5.1. The computational time of TRMC in this simulation for generating the Monte Carlo paths is around 3 x 10^... seconds.

Table II shows the results of the computational experiments for the Lorenz 96 model. It shows that the probabilities of TRMC and FS agree within the error bars. The case where we ignore the factor defined by Eq. (16) does not reproduce the unbiased probability, as in the other computational experiments. The results show that TRMC can perform better for estimating the probabilities in the high-dimensional case: TRMC is 5.18 times more efficient than FS in terms of \rho, and 8.23 times in terms of \rho', in Table II. Fig. 8 shows the convergence of TRMC as the number of Monte Carlo paths M increases. It reveals that our algorithm converges correctly on increasing the number of Monte Carlo paths M.
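The pieces needed for TRMC in this example, the drift of Eq. (24), the Euler step of Eq. (25), the backward proposal of Eq. (18), and the log-weight of Eq. (19), can be sketched as follows. The dimension K = 40 and the noise prefactor are our assumptions for illustration; only the 1/\sqrt{\Delta t} scaling of \sigma is from the text:

```python
import math
import random

K, F = 40, 8.0                 # K = 40 is an assumed dimension; F = 8 as in the text
dt = 0.001
sigma = 0.1 / math.sqrt(dt)    # placeholder prefactor; only the 1/sqrt(dt) scaling is from the text

def f(x):
    """Lorenz 96 drift, Eq. (24): f_k = -x_{k-2} x_{k-1} + x_{k-1} x_{k+1} - x_k + F.
    Negative indices wrap around in Python, giving the cyclic boundary condition."""
    n = len(x)
    return [-x[k - 2] * x[k - 1] + x[k - 1] * x[(k + 1) % n] - x[k] + F
            for k in range(n)]

def euler_forward(x, rng):
    """Eq. (25): x_{k,i+1} = x_{k,i} + f_k(x_i) dt + eps_{k,i} sqrt(dt)."""
    fx = f(x)
    return [x[k] + fx[k] * dt + rng.gauss(0.0, sigma) * math.sqrt(dt)
            for k in range(len(x))]

def backward_proposal(x_next, rng):
    """Eq. (18): x_i = x_{i+1} - f(x_{i+1}) dt + eps_i sqrt(dt)."""
    fx = f(x_next)
    return [x_next[k] - fx[k] * dt + rng.gauss(0.0, sigma) * math.sqrt(dt)
            for k in range(len(x_next))]

def log_weight(x, x_next):
    """log W_i from Eq. (19) with Sigma = sigma^2 * I."""
    fx, fx_next = f(x), f(x_next)
    return -sum((fx_next[k] - fx[k])
                * (x_next[k] - x[k] - 0.5 * dt * (fx_next[k] + fx[k]))
                for k in range(len(x))) / sigma ** 2

# one backward step from a state inside a small target hypercube
rng = random.Random(1)
x1 = [0.05] * K
x0 = backward_proposal(x1, rng)
lw = log_weight(x0, x1)
```

Iterating `backward_proposal` N times while summing `log_weight`, then attaching V_A p(x_0) as in Eq. (16), gives the TRMC estimator for this model.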
6. Improved Scheme with Resampling
Let us consider cases with a larger number of time steps. The proposed algorithm may not always work efficiently in this situation. For example, we consider the case where N is equal to 500 in the Lorenz 96 model (Fig. 9); there, the weights are normalized such that their sum is 1, i.e., \sum_{j=1}^{M} \tilde{W}^{(j)} = 1. The inset located at the top right of the figure shows the graph with a logarithmic scale on the x-axis; this style is also used in Fig. 12. The weight distribution corresponding to N = 500 in Fig. 9 is heavy-tailed. This phenomenon is referred to as degeneracy: the weights become unbalanced, and a few weights dominate all the others, which causes a decrease in computational efficiency. We introduce an improved scheme to solve this problem, which is realized by resampling. Hereafter, we denote it as TRMC (RS). This algorithm is effective when both the number of time steps and the amount of noise are large. Note that our algorithm is based on time-reversed dynamics and uses SMC differently from previous studies^{4,22,25-27)} on rare event sampling.

Table II. Comparison among TRMC, TRMC (no weight), and FS for the Lorenz 96 model. The columns list P(X_N \in A), \sigma_s, \rho, and \rho'.

Fig. 8. Convergence of TRMC for the Lorenz 96 model. The estimated probabilities converge to those obtained by FS as the number of Monte Carlo paths increases. Error bars indicate approximate confidence intervals of FS with the same number of Monte Carlo paths M as TRMC.

We assume that the resampling procedure at time step s modifies the weight

\prod_{i=s}^{N-1} W_i   (26)
of each Monte Carlo path to an unweighted one, by eliminating Monte Carlo paths having small weights and by multiplying Monte Carlo paths having large weights.

Fig. 9. Distribution of the weight \prod_{i=0}^{N-1} W_i in TRMC for the Lorenz 96 model with different numbers of time steps (N = 100 and N = 500). The vertical and horizontal axes indicate the weight density and the value of the weights, respectively. The weight distribution with a large number of time steps is heavy-tailed.

We denote the jth Monte Carlo path generated down to step s as x^{(j)} = \{x_s^{(j)}, \ldots, x_N^{(j)}\}. The procedure of resampling is as follows:
(1) Define the normalized weights

\tilde{W}^{(j)} = \frac{\prod_{i=s}^{N-1} W_i^{(j)}}{\sum_{j'=1}^{M} \prod_{i=s}^{N-1} W_i^{(j')}} .

(2) Resample M times with replacement from the set \{x^{(j)}\}_{j=1}^{M} of Monte Carlo paths, where the probability of sampling x^{(j)} is proportional to \tilde{W}^{(j)}.

After a resampling step, the Monte Carlo paths \{x^{(j)}\}_{j=1}^{M} and the associated weights \{W^{(j)}\}_{j=1}^{M} are replaced by the set of replicated Monte Carlo paths with the equal importance weight

W^{(j)} = \frac{1}{M} \sum_{j'=1}^{M} \prod_{i=s}^{N-1} W_i^{(j')} .

Degeneracy is estimated using the effective sample size

M_{\mathrm{eff}} = \frac{1}{\sum_{j=1}^{M} (\tilde{W}^{(j)})^2} .   (27)

A small value of M_{\mathrm{eff}} corresponds to high degeneracy. Hence, a resampling step is performed when this value is lower than a threshold \Theta = \alpha M, where \alpha is a relative threshold; that is, resampling is performed when M_{\mathrm{eff}}/M < \alpha. We can then use Eq. (13) to evaluate the probabilities as before. Figure 10 shows a graphical scheme of resampling.

Fig. 10. Graphical example of resampling. Particles with large weights are replaced with multiple copies of them, and particles with small weights are removed.

Using this resampling, we simulate the Lorenz 96 model with a noise amplitude \sigma (again proportional to 1/\sqrt{\Delta t}) larger than that in Sect. 5.2. We set the threshold \alpha to 0.05, 0.5, and 0.9; the corresponding simulations are denoted by \alpha = 5%, \alpha = 50%, and \alpha = 90%.

Table III. Comparison among TRMC and TRMC (RS) with \alpha = 5%, 50%, and 90% for the Lorenz 96 model. The columns list P(X_N \in A), \sigma_s, \rho, and \rho'.

Table III shows that the probabilities of TRMC and TRMC (RS) agree within the error bars. On the other hand, Fig. 11 shows that TRMC (RS) is more efficient than TRMC in a wide range of threshold values. We also show the weight distributions of TRMC and TRMC (RS) in Fig. 12. The variance of the distribution is much smaller for TRMC (RS) than for TRMC.
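The resampling step and the effective-sample-size trigger of Eq. (27) can be sketched as follows, using plain multinomial resampling (the paper does not specify a particular resampling variant; other schemes such as systematic resampling would also fit):

```python
import random

def effective_sample_size(w_tilde):
    """Eq. (27): M_eff = 1 / sum_j (w_tilde_j)^2, for normalized weights."""
    return 1.0 / sum(w * w for w in w_tilde)

def resample(paths, weights, rng):
    """Multinomial resampling: draw M paths with replacement with probability
    proportional to their weights; afterwards all weights are equal."""
    M = len(paths)
    total = sum(weights)
    # cumulative distribution of the normalized weights, for inverse-CDF draws
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    new_paths = []
    for _ in range(M):
        u = rng.random()
        j = next(i for i, c in enumerate(cdf) if c >= u)
        new_paths.append(paths[j])
    # each survivor carries the equal weight W = (1/M) * sum_j W^{(j)}
    return new_paths, [total / M] * M

def maybe_resample(paths, weights, alpha, rng):
    """Resample only when M_eff / M falls below the relative threshold alpha."""
    total = sum(weights)
    w_tilde = [w / total for w in weights]
    if effective_sample_size(w_tilde) / len(paths) < alpha:
        return resample(paths, weights, rng)
    return paths, weights

# usage: one heavily degenerate weight triggers resampling at alpha = 0.5
p, w = maybe_resample(list(range(4)), [1.0, 1e-6, 1e-6, 1e-6], 0.5,
                      random.Random(0))
```

Because the equal post-resampling weight keeps the total weight mass unchanged, the estimator of Eq. (13) remains unbiased after any number of resampling steps.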
7. Discussion
The examples provided in the preceding sections show that backward simulations usingTRMC provide correct averages and can be more e ffi cient than forward simulations. In theseexamples, the computational e ffi ciencies of TRMC are 3–16 times higher than those obtainedby forward simulation, when the calculated probability of hitting the target is 2 × − –10 − .Note that TRMC can be used to calculate the probability for an arbitrarily small target region; //
The examples provided in the preceding sections show that backward simulations usingTRMC provide correct averages and can be more e ffi cient than forward simulations. In theseexamples, the computational e ffi ciencies of TRMC are 3–16 times higher than those obtainedby forward simulation, when the calculated probability of hitting the target is 2 × − –10 − .Note that TRMC can be used to calculate the probability for an arbitrarily small target region; //
22. Phys. Soc. Jpn.
Forward TRMC a =5% a =50% a =90% r , r r r Fig. 11.
Comparison among FS (Forward), TRMC, and TRMC(RS) for the Lorenz 96 model. α = α % meansTRMC(RS) with α = α . · · · · · - · - · - · - Weight W e i gh t den s i t y TRMC TRMC(RS)
Log Weight W e i gh t den s i t y Fig. 12.
Distribution of the weight (cid:81) N − i = W i in TRMC and TRMC (RS) for the Lorenz 96 model. We set thethreshold α to 0 . this would be impossible by using forward simulation.There are, however, cases in which TRMC is ine ffi cient. First, TRMC is not advantageousif the time-reversed paths rarely encounter a region in which the initial density p ( x ) is high;this can occur when the initial density is not broad. Another case in which TRMC can be in-e ffi cient is when the weight in Eq. (19) (or, in the continuous time version, Eq. (20)) is highlytime dependent. If paths with smaller weights in the initial stage of backward simulation ac-quire larger weights in the latter stage, resampling of the path (particle splitting) in SMC maynot be e ff ective. In this case, if TRMC with SIS is ine ff ective, TRMC with SMC also showspoor performance. //
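The resampling step (particle splitting) referred to above can be sketched generically. The following is a minimal sketch of multinomial resampling triggered by an effective-sample-size criterion with threshold α, a standard SMC ingredient; the function names and the reset-to-mean convention are ours, not the authors' exact implementation.

```python
import numpy as np

def ess(weights):
    """Effective sample size of a set of importance weights."""
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)

def resample_if_needed(particles, weights, alpha, rng):
    """Multinomial resampling: resample when ESS < alpha * M.

    After resampling, all weights are reset to the mean weight so that
    the total weight (the normalizing-constant estimate) is preserved.
    """
    m = len(weights)
    if ess(weights) < alpha * m:
        probs = weights / weights.sum()
        idx = rng.choice(m, size=m, p=probs)
        particles = particles[idx]
        weights = np.full(m, weights.mean())
    return particles, weights

rng = np.random.default_rng(0)
particles = rng.normal(size=100)
weights = rng.exponential(size=100)
particles, weights = resample_if_needed(particles, weights, alpha=0.5, rng=rng)
```

When the weights degenerate late in the backward sweep, as described above, this criterion fires only after most particles are already wasted, which is why TRMC with SMC inherits the weakness of TRMC with SIS in that regime.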
To discuss the possible improvement of the algorithm, it is useful to introduce optimal backward dynamics. Although it is not easy to obtain these dynamics a priori, the formal definition is derived as follows. First, the marginal probability at step n obtained from the forward simulation is defined as

p(x_n) = ∫ dx_0 ⋯ dx_{n−1} ∏_{i=0}^{n−1} p(x_{i+1} | x_i) p(x_0),   (28)

which satisfies the relation

p(x_{i+1}) = ∫ p(x_{i+1} | x_i) p(x_i) dx_i.   (29)

Using Eqs. (28) and (29), the transition probability q* of the optimal backward dynamics is defined as

q*(x_{i+1} → x_i) = p(x_{i+1} | x_i) p(x_i) / ∫ p(x_{i+1} | x_i) p(x_i) dx_i = p(x_{i+1} | x_i) p(x_i) / p(x_{i+1}).   (30)

Note that Eq. (30) appears similar to the formulas used in Bayesian inference when the probability p(x_i) obtained by forward simulations is regarded as an analog of the prior distribution of x_i. When Eq. (30) is selected as the backward dynamics, the following relation holds:

∏_{i=0}^{N−1} p(x_{i+1} | x_i) p(x_0) = p(x_N) ∏_{i=0}^{N−1} q*(x_{i+1} → x_i).   (31)

Eq. (31) means that the combined probability of time-reversed paths defined by forward simulation is recovered by the backward dynamics Eq. (30). Specifically, the time-reversed paths initialized by p(x_N) automatically converge to the initial density p(x_0) under the backward dynamics Eq. (30). In this sense, q*(x_{i+1} → x_i) in Eq. (30) is considered as the optimal backward dynamics. Implementation of these dynamics, however, requires the probabilities p(x_i), i = 1, …, N, which are usually not available prior to the simulations. Note that the backward dynamics defined by the Langevin equation in previous studies can be derived from Eq. (30) as a continuous-time limit.

Equations (30) and (31) were previously discussed in the field of time-series data analysis, where approximations of the marginal probabilities p(x_i), i = 1, …, N are used to define the backward dynamics q(x_{i+1} → x_i). In these studies, the observed data were available at many of the time steps i = 1, …, N, whereas the target is given only at i = N in our problem. Approximations of the probabilities p(x_i), i = 1, …, N are then derived using forward simulations constrained with the observed data (the "filtering stage").

It is, however, difficult to apply these methods to our problem. If we were to apply a
similar method to our problem, we would have to run a number of forward simulations to estimate p(x_i), i = 1, …, N before executing backward simulations. This would be computationally expensive and seems unrealistic without a highly efficient method for the probability estimation. Methods such as those discussed previously²⁹,³⁰⁾ may be applied to optimize the backward dynamics in our problem, but this is left for future study.

On the other hand, when some observed data are available outside the equations that describe the stochastic process, we may use these data to approximate p(x_i), i = 1, …, N and hence to approximate the optimal backward dynamics. In this case, we avoid a large amount of forward computation for constructing the optimized backward dynamics. This seems possible for the stochastic typhoon model, where data from actual observations of real typhoons are available. Note that this idea is different from data assimilation (i.e., inference with simulations combined with observed data), because here we use the observed data only to improve the computational efficiency; they do not bias the calculated probabilities.

In this paper, we assumed that the number of time steps is fixed. This is not, however, always clear in advance in realistic problems. In the case of a realistic typhoon model, we must consider the case of passing through Tokyo between two discrete time steps. The interpolation method for this case will be developed in a future work.
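For a finite-state chain, where the marginals p(x_i) are exactly computable, the optimal backward kernel of Eq. (30) and the path-probability identity of Eq. (31) can be verified directly. The following is a minimal sketch with toy sizes of our own choosing; it is illustrative, not part of the TRMC algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 4, 5                              # number of states and time steps (toy sizes)
P = rng.random((K, K))
P /= P.sum(axis=1, keepdims=True)        # forward kernel p(x_{i+1} | x_i)
p0 = rng.random(K)
p0 /= p0.sum()                           # initial density p(x_0)

# forward marginals p(x_i) via Eq. (29)
marg = [p0]
for _ in range(N):
    marg.append(marg[-1] @ P)

def q_star(i):
    """Optimal backward kernel of Eq. (30); rows index x_{i+1}, columns x_i."""
    return (P.T * marg[i][None, :]) / marg[i + 1][:, None]

# Verify Eq. (31) on an arbitrary path: the forward path probability equals
# the backward path probability when the backward chain starts from p(x_N).
path = [0, 2, 1, 3, 3, 0]
fwd = p0[path[0]] * np.prod([P[path[i], path[i + 1]] for i in range(N)])
bwd = marg[N][path[N]] * np.prod([q_star(i)[path[i + 1], path[i]] for i in range(N)])
assert np.isclose(fwd, bwd)
```

The telescoping of the ratios p(x_i)/p(x_{i+1}) along the path is what makes the two probabilities coincide; the same cancellation is what fails when only approximate marginals are available, which is why estimating p(x_i) in advance dominates the cost.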
8. Concluding Remarks
We discussed methods for the backward simulation of a stochastic process. These methods trace a time-reversed path from the target region to the initial configuration. A naïve approach to this problem was shown not to function as expected. To resolve the difficulties, the time reverse Monte Carlo method (TRMC) was introduced. TRMC is based on SIS and SMC, and is designed to provide the probabilities of events correctly. TRMC with SIS was tested on the stochastic typhoon model and the Lorenz 96 model; it converges more efficiently than forward simulations in some of these examples. For simulations with a larger number of steps, TRMC with SMC was shown to be advantageous. We also discussed the limitations and possible improvements of TRMC and its relation to the Bayes formula.

Appendix A: Derivation of Eq. (12)

The aim of this appendix is to prove Eq. (12). Up to first order in Δt, the Jacobian det(J_g(x)) is given by

det(J_g(x)) = det(I + ∇f(x) Δt)
= 1 + Tr(∇f(x) Δt) + O((Δt)²)
= exp[div f(x) Δt] + O((Δt)²),   (A·1)

where I is the unit matrix of order D × D and D is the dimension of the stochastic process X_t. Using Eq. (A·1) and taking the limit Δt → 0,

J(y_1, …, y_N) = ∏_{i=1}^{N−1} |det(J_{g⁻¹}(y_{i+1}))| = exp[−∑_{i=1}^{N−1} div f(x_i) Δt + O((Δt)²)] → exp[−∫_0^T div f(x_t) dt].   (A·2)

Appendix B: Derivation of Eq. (20)

The aim of this appendix is to prove Eq. (20). Up to first order in Δt, the weight at time t_i is given by

W_i = exp[−(f(x_{i+1}) − f(x_i))ᵀ Σ⁻¹ ((x_{i+1} − x_i) − Δt (f(x_{i+1}) + f(x_i))/2)]   (B·1)
= exp[Tr(−(f(x_{i+1}) − f(x_i))ᵀ Σ⁻¹ ((x_{i+1} − x_i) − Δt (f(x_{i+1}) + f(x_i))/2))]
= exp[−Tr((∇f(x_i)(x_{i+1} − x_i))ᵀ Σ⁻¹ (x_{i+1} − x_i)) + o(Δt)]
= exp[−Tr(∇f(x_i)ᵀ Σ⁻¹ (x_{i+1} − x_i)(x_{i+1} − x_i)ᵀ) + o(Δt)].

In the limit Δt → 0, Eq. (17) becomes the following stochastic differential equation:

dX_t = f(X_t) dt + σ dW_t,   (B·2)

where W_t is a standard Brownian motion. Here, we used Ito's rule,¹⁸,¹⁹⁾ in which we substitute √dt for dW_t and retain terms up to the order of dt. Using Eq. (B·2) and taking Δt → 0,

(x_{i+1} − x_i)(x_{i+1} − x_i)ᵀ → dx_t dx_tᵀ = (f(x_t) dt + σ dW_t)(f(x_t) dt + σ dW_t)ᵀ   (B·3)
= σ dW_t dW_tᵀ σᵀ + o(dt) = Σ dt,   (B·4)

where we used dW_t dW_tᵀ = I dt and σσᵀ = Σ.
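The first-order identity det(I + ∇f(x) Δt) = exp[div f(x) Δt] + O((Δt)²) of Eq. (A·1) is easy to check numerically, with a random matrix standing in for the Jacobian ∇f(x); this is a quick sketch of ours, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 3                            # dimension of the process (toy value)
J = rng.normal(size=(D, D))      # stands in for the Jacobian grad f(x)
dt = 1e-4

lhs = np.linalg.det(np.eye(D) + J * dt)
rhs = np.exp(np.trace(J) * dt)   # exp[div f(x) dt]

# the two sides agree up to O(dt^2)
assert abs(lhs - rhs) < 100 * dt ** 2
```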
22. Phys. Soc. Jpn.
As a result, using Eqs. (B·1) and (B·4), we obtain

∏_{i=1}^{N−1} W_i = exp[−∑_{i=1}^{N−1} Tr(∇f(x_i)ᵀ Σ⁻¹ (x_{i+1} − x_i)(x_{i+1} − x_i)ᵀ) + o(Δt)]   (B·5)
→ exp[−∫_0^T Tr(∇f(x_t)ᵀ) dt] = exp[−∫_0^T div f(x_t) dt],   (B·6)

which is the desired result.
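The convergence ∏ W_i → exp[−∫₀ᵀ div f(x_t) dt] of Eq. (B·6) can be illustrated on a one-dimensional Ornstein–Uhlenbeck process, where div f = −1 and the limit equals exp(T) exactly. The sketch below uses the last line of (B·1) specialized to one dimension with Σ = 1; the parameters are ours.

```python
import numpy as np

# 1-D Ornstein-Uhlenbeck drift f(x) = -x with sigma = 1, so Sigma = 1 and
# div f = -1; Eq. (B.6) then predicts prod_i W_i -> exp(T).
rng = np.random.default_rng(3)
T, dt = 1.0, 1e-4
N = int(T / dt)

x = 0.0
log_w = 0.0
for _ in range(N):
    dx = -x * dt + np.sqrt(dt) * rng.normal()   # Euler-Maruyama step
    # discrete weight from (B.1): W_i = exp[-f'(x_i) dx^2], with f'(x) = -1
    log_w += -(-1.0) * dx * dx
    x += dx

# the accumulated log-weight is the quadratic variation, which tends to T
assert abs(log_w - T) < 0.1
```

The agreement reflects the pathwise convergence of the quadratic variation ∑(Δx)² to T, which is exactly the substitution (Δx)(Δx)ᵀ → Σ dt used in Eqs. (B·3) and (B·4).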
References
1) A. Doucet, N. De Freitas, and N. Gordon:
Sequential Monte Carlo Methods in Practice (Springer-Verlag New York, New York, 2001), pp. 3–14.
2) J. S. Liu: Monte Carlo Strategies in Scientific Computing (Springer-Verlag New York, New York, 2004), pp. 23–52.
3) A. Doucet and A. M. Johansen: A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later, The Oxford Handbook of Nonlinear Filtering (Oxford University Press, 2011), pp. 656–704.
4) J. Tailleur and J. Kurchan: Nature Physics (2007) 203.
5) D. M. Zuckerman: Statistical Physics of Biomolecules: An Introduction (CRC Press, Florida, 2010), pp. 297–323.
6) L. Donati, C. Hartmann, and B. G. Keller: The Journal of Chemical Physics (2017) 244112.
7) J. D. Chodera, W. C. Swope, F. Noé, J.-H. Prinz, M. R. Shirts, and V. S. Pande: The Journal of Chemical Physics (2011) 244107.
8) J. Anderson, S. Hoffman, and C. Peters: The Journal of Physical Chemistry (1972) 4006.
9) U. G. Haussmann and E. Pardoux: The Annals of Probability (1986) 1188.
10) A. Millet, D. Nualart, and M. Sanz: The Annals of Probability (1989) 208.
11) M. Briers, A. Doucet, and S. Maskell: Annals of the Institute of Statistical Mathematics (2010) 61.
12) F. Lindsten and T. B. Schön: Foundations and Trends in Machine Learning (2013) 1.
13) M. Isard and A. Blake: European Conference on Computer Vision, 1998, pp. 767–781.
14) R. Griffiths and S. Tavaré: Theoretical Population Biology (1994) 131.
15) K. Zhu and L. Ying: IEEE/ACM Transactions on Networking (TON) (2016) 408.
16) D. W. Heermann: Computer-Simulation Methods, Computer Simulation Methods in Theoretical Physics (Springer, 1990), pp. 8–12.
17) E. Vanden-Eijnden: Annual Review of Physical Chemistry (2010) 391.
18) M. S. Joshi: The Concepts and Practice of Mathematical Finance (Cambridge University Press, Cambridge, 2008), pp. 97–126.
19) B. Øksendal: Stochastic Differential Equations: An Introduction with Applications (Springer-Verlag Berlin Heidelberg, Berlin, 2013), pp. 43–60.
20) S. Nakano, K. Suzuki, and G. Ueno: presented at JSST 2013 International Conference on Simulation Technology (2013).
21) E. N. Lorenz: Proc. Seminar on Predictability, Vol. 1, 1996.
22) J. Wouters and F. Bouchet: arXiv:1511.02703 (2015).
23) E. Ott, B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. Patil, and J. A. Yorke: Tellus A: Dynamic Meteorology and Oceanography (2004) 415.
24) S. Tănase-Nicola and J. Kurchan: Physical Review Letters (2003) 188302.
25) T. Laffargue, K.-D. N. T. Lam, J. Kurchan, and J. Tailleur: Journal of Physics A: Mathematical and Theoretical (2013) 254002.
26) R. J. Allen, C. Valeriani, and P. R. ten Wolde: Journal of Physics: Condensed Matter (2009) 463102.
27) D. M. Zuckerman: Annual Review of Biophysics (2011) 41.
28) A. Kong, J. S. Liu, and W. H. Wong: Journal of the American Statistical Association (1994) 278.
29) H. J. Kappen and H. C. Ruiz: Journal of Statistical Physics (2016) 1244.
30) J. Heng, A. N. Bishop, G. Deligiannidis, and A. Doucet: arXiv:1708.08396 (2017).