Efficient Low-Order Approximation of First-Passage Time Distributions

David Schnoerr, Botond Cseke, Ramon Grima, and Guido Sanguinetti∗

School of Informatics, University of Edinburgh, Edinburgh, UK
School of Biological Sciences, University of Edinburgh, Edinburgh, UK
Microsoft Research, Cambridge, UK
We consider the problem of computing first-passage time distributions for reaction processes modelled by master equations. We show that this generally intractable class of problems is equivalent to a sequential Bayesian inference problem for an auxiliary observation process. The solution can be approximated efficiently by solving a closed set of coupled ordinary differential equations (for the low-order moments of the process) whose size scales with the number of species. We apply it to an epidemic model and a trimerisation process, and show good agreement with stochastic simulations.
Many systems in nature consist of stochastically interacting agents or particles. Such systems are frequently modelled as reaction processes whose dynamics are described by master equations [1]. There are several examples of stochastic modelling of reaction processes in the fields of systems biology [2, 3], ecology [4], epidemiology [5], social sciences [6] and neuroscience [7]. The mathematical analysis of such stochastic processes, however, is highly non-trivial.

A particularly important quantity of interest is the first-passage time (FPT), that is, the random time it takes the process to first cross a certain threshold [8, 9]. FPT distributions play a crucial role both in the theory of stochastic processes and in their applications across various disciplines, as they allow us to investigate quantitatively the uncertainty in the emergence of system properties within a finite time horizon. For example, the time it takes cells to respond to external signals by expressing certain genes may be modelled as a FPT problem. Different characteristics of this first-passage time distribution, for example the variance of the FPT, may represent evolutionarily different strategies that organisms adopt to filter fluctuations in the environment [10–12]. Examples from other disciplines include the extinction time of diseases in epidemic models, or the time it takes to form a certain number of polymers in polymerisation processes.

FPTs for stochastic processes have been of interest in statistical physics for many decades [13]. For certain random walk or spatial diffusion processes analytic solutions have been derived [14–16]. Recently, analytic results have been found for effective one-dimensional diffusion processes to a target [17–19].
For multi-dimensional diffusion processes to small targets, approximate solutions have been derived using singular perturbation methods and matched asymptotic expansions [20–23].

The problem of computing FPT distributions for reaction processes modelled by master equations, however, is much less explored. Generally, no tractable evolution equations exist except for one-variable, one-step processes [1, 13], or certain linear and/or catalytic processes [24–26]. For single-time properties of the underlying master equation efficient approximation methods exist relying on continuous state spaces [27], but it is not clear how to extend them to the computation of FPTs. Spectral methods constitute efficient approximations for small systems [28, 29]. Since these methods typically scale with the size of the state space, they are not applicable to large systems. Some existing FPT approaches for master equations consider rare events in single-species systems and/or mean FPTs only [30–33].

In this article, we approach the problem of computing FPTs from a novel perspective. We show that the FPT problem can be formulated exactly as a Bayesian inference problem. We achieve this by introducing an auxiliary observation process that determines whether the process has crossed the threshold up to a given time. This novel formulation allows us to derive an efficient approximation scheme that relies on the solution of a small set of ordinary differential equations. We will use this approximation to analyse the FPT distributions in several non-trivial examples.

∗ [email protected]
We focus on reaction networks with discrete state spaces modelled by master equations, but the developed method can also be applied to processes with continuous state spaces modelled by Fokker-Planck equations.

The standard approach to compute the FPT of a process x_t to leave a certain region C is to compute the survival probability Z_{[0,t]}, that is, the probability that the process remains in C on the time interval [0, t] [13]. The FPT distribution is then given by the negative time derivative of Z_{[0,t]}. The latter can be written as a path integral over the process with absorbing boundary of C [13]. Equivalently, one can reweigh the unconstrained process by an indicator function p(C_{[0,t]} | x_{[0,t]}) on the paths x_{[0,t]} such that p(C_{[0,t]} | x_{[0,t]}) = 1 if x_τ ∈ C for all τ ∈ [0, t] and zero otherwise. One can then write the survival probability Z_{[0,t]} up till time t as a path integral over the density p(x_{[0,t]}) of the unconstrained process as

  Z_{[0,t]} = ∫ Dx_{[0,t]} p(x_{[0,t]}) p(C_{[0,t]} | x_{[0,t]}).   (1)

At the heart of our method lies the interpretation of p(C_{[0,t]} | x_{[0,t]}) as a binary observation process: an observer external to the system assesses if the process has left the region of interest or not. In this interpretation, the survival probability Z_{[0,t]} constitutes the marginal likelihood of this auxiliary observation process. The problem of computing Z_{[0,t]} and hence the FPT distribution is thus formally equivalent to a Bayesian inference problem. Note, however, that there are no experimental data involved and no data are being simulated.

Moreover, note that so far no approximations have been made and (1) is exact. However, it is not obvious how to compute the path integral in (1). To make progress, we approximate the continuous-time process with paths x_{[0,t]} by a discrete-time version (x_{t_0}, ..., x_{t_N}) at points t_0 = 0, ..., t_N = t with spacing ∆t = t/N. The effects of such a discretisation of time on certain survival probabilities have recently been studied in [34]. We will later take the continuum limit ∆t → 0. The global observation process p(C_{[0,t]} | x_{[0,t]}) can be written as a product of local observation processes p(C_{t_i} | x_{t_i}) as

  p(C_{[0,t]} | x_{t_0}, ..., x_{t_N}) = ∏_{i=0}^{N} p(C_{t_i} | x_{t_i}),   (2)

where p(C_{t_i} | x_{t_i}) = 1 if x_{t_i} ∈ C and zero otherwise. This gives the model a Markovian structure and allows us to cast it into a sequential Bayesian inference problem, as follows. First, we approximate the binary observation factors in (2) by a smooth approximation of the form

  p(C_{t_i} | x_{t_i}) ≈ exp(−∆t U(x_{t_i}, t_i)),   (3)

where U(x_{t_i}, t_i) is a smooth function that is large for x_{t_i} ∉ C and close to zero for x_{t_i} ∈ C, with a sharp slope at the boundary. Moreover, we require U(x_{t_i}, t_i) to have a tractable expectation w.r.t. a normal distribution. The smooth approximation in (3) proves computationally expedient in the algorithm below and will allow us to take the continuum limit ∆t → 0. Note that this approximation is equivalent to approximating the global binary constraint with the global soft (that is, continuous) constraint

  p(C_{[0,t]} | x_{[0,t]}) = exp( −∫_0^t dτ U(x_τ, τ) ).   (4)

The survival probability Z_{[0,t]} in (1) for the discrete-time system factorises as

  Z_{[0,t]} ≈ p(C_{t_0}) ∏_{i=0}^{N−1} p(C_{t_{i+1}} | C_{≤t_i}),   (5)

where p(C_{t_0}) is the probability of being in C at time t_0, and p(C_{t_{i+1}} | C_{≤t_i}) ≡ p(C_{t_{i+1}} | C_{t_i}, C_{t_{i−1}}, ..., C_{t_0}) = ∫ dx_{t_{i+1}} p(C_{t_{i+1}} | x_{t_{i+1}}) p(x_{t_{i+1}} | C_{≤t_i}) is the probability that the process is found to be in C at time t_{i+1}, given that it was in C at all previous time points. The computation of these factors corresponds to a sequential Bayesian inference problem which can be solved by iteratively (i) solving the master equation forward between measurement points and (ii) updating the distribution using the observation model. More specifically, the two steps comprise:

(i) Suppose we know p(x_{t_i} | C_{≤t_i}) ≡ p(x_{t_i} | C_{t_i}, C_{t_{i−1}}, ..., C_{t_0}) at time t_i, that is, the marginal distribution of the process at time t_i conditioned on the current and all previous observations. Suppose further that, using this as the initial distribution, we can solve the system (the master equation) forward in time until time point t_{i+1} to obtain p(x_{t_{i+1}} | C_{≤t_i}), that is, the marginal distribution of the process at time t_{i+1} conditioned on previous observations (note that p(x_{t_{i+1}} | C_{≤t_i}) does not include the observation C_{t_{i+1}} at time t_{i+1}).

(ii) To obtain p(x_{t_{i+1}} | C_{≤t_{i+1}}) we need to take the observation p(C_{t_{i+1}} | x_{t_{i+1}}) at time point t_{i+1} into account. This is achieved by means of Bayes' theorem as

  p(x_{t_{i+1}} | C_{≤t_{i+1}}) = p(C_{t_{i+1}} | x_{t_{i+1}}) p(x_{t_{i+1}} | C_{≤t_i}) / Z_{t_{i+1}},   (6)

where we defined the normalisation Z_{t_{i+1}} = p(C_{t_{i+1}} | C_{≤t_i}).
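To make the recursion concrete, here is a minimal Python sketch of the exact scheme (before any Gaussian approximation) for a one-species birth process with an assumed rate k and region C = {0, ..., c−1}: step (i) is an Euler step of the master equation, and step (ii) multiplies by the binary indicator and renormalises, accumulating the factors Z_{t_{i+1}} of (5). The function names and parameter values are illustrative and not from the paper.

```python
import math

def survival_birth_process(k=1.0, c=3, t=1.0, dt=1e-4):
    """Survival probability Z_[0,t] that a birth process with rate k
    has not yet reached state c, computed with the sequential scheme:
    (i) an Euler step of the master equation, (ii) the observation update."""
    p = [1.0] + [0.0] * (c - 1)      # filtering distribution on states 0..c-1
    log_z = 0.0
    for _ in range(int(round(t / dt))):
        # (i) Euler step of dp_n/dt = k p_{n-1} - k p_n on the surviving states;
        # the mass leaking from state c-1 into state c is dropped, which is
        # exactly the binary observation update (ii) for C = {0, ..., c-1}
        new = [p[n] + dt * (k * (p[n - 1] if n > 0 else 0.0) - k * p[n])
               for n in range(c)]
        z_i = sum(new)               # normalisation Z_{t_{i+1}} of Eq. (6)
        log_z += math.log(z_i)
        p = [q / z_i for q in new]   # renormalised conditional distribution
    return math.exp(log_z)

# exact survival for comparison: P(Poisson(k t) <= c - 1)
def exact_survival(k, c, t):
    return sum(math.exp(-k * t) * (k * t) ** n / math.factorial(n)
               for n in range(c))

print(survival_birth_process(), exact_survival(1.0, 3, 1.0))  # both ≈ 0.92
```

For this pure birth process the update (ii) is exact and the only error is the Euler discretisation; replacing the renormalisation step by a projection onto a normal distribution turns the same loop into an approximate filter for general networks.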
Note that the normalisation Z_{t_{i+1}} is just a factor in (5). Performing steps (i) and (ii) iteratively from t_0 to t_N and keeping track of the normalisation factors in (6), one can thus, in principle, compute the survival probability according to (5).

However, steps (i) and (ii) are generally intractable, and we propose an approximation method in the following. For step (i), we need to solve the system forward in time. We do this approximately by means of the normal moment closure [35–37], which approximates the discrete process by a continuous one and assumes the single-time probability distribution to be a multivariate normal distribution N(x_t; µ_t, Σ_t) with mean µ_t and covariance Σ_t. Using this assumption on the master equation leads to a closed set of ordinary differential equations for µ_t and Σ_t which can be solved numerically [27].

Now suppose that we have solved the system forward from time t to t + ∆t using normal moment closure to obtain µ̂_{t+∆t} and Σ̂_{t+∆t}, and hence the distribution p(x_{t+∆t} | C_{≤t}) = N(x_{t+∆t}; µ̂_{t+∆t}, Σ̂_{t+∆t}) (step (i)). We next have to perform the observation update in (6) in step (ii) to obtain p(x_{t+∆t} | C_{≤t+∆t}). In order to be able to use normal moment closure again in the next step (i), we approximate p(x_{t+∆t} | C_{≤t+∆t}) by a multivariate normal distribution with the mean µ_{t+∆t} and covariance Σ_{t+∆t} of the r.h.s. of (6). This approach is known as assumed-density filtering in the statistics literature [38]. In summary, with the described approximations, steps (i) and (ii) comprise:

(i) Solve the normal moment closure equations for µ_t and Σ_t from time t to t + ∆t to obtain µ̂_{t+∆t} and Σ̂_{t+∆t}, where µ_t and Σ_t are respectively the mean and covariance of the approximating normal distribution N(x_t; µ_t, Σ_t).

(ii) Compute the mean µ_{t+∆t} and covariance Σ_{t+∆t} of the r.h.s. of (6) and approximate p(x_{t+∆t} | C_{≤t+∆t}) in (6) with the corresponding normal distribution N(x_{t+∆t}; µ_{t+∆t}, Σ_{t+∆t}).

Next, we derive a continuous-time description combining steps (i) and (ii). This is achieved by first expanding the update in step (i), which leads from µ_t and Σ_t to µ̂_{t+∆t} and Σ̂_{t+∆t}, in ∆t; this gives a single Euler step update of the moment closure equations. Similarly, we expand step (ii), which leads from µ̂_{t+∆t} and Σ̂_{t+∆t} to µ_{t+∆t} and Σ_{t+∆t}, as follows. Note first that, by definition, the normalisation Z_{t+∆t} in (6) can be written as

  Z_{t+∆t} ≈ ∫ dx N(x; µ̂_{t+∆t}, Σ̂_{t+∆t}) e^{−∆t U(x, t+∆t)}.   (7)

Taking the logarithm of both sides, expanding in ∆t and taking derivatives w.r.t. µ̂_t and Σ̂_t, one can derive the desired expansion of the update in step (ii). The resulting expansions of steps (i) and (ii) can then be combined to give unifying update equations for µ_t and Σ_t (see Supplemental Material [39] for a derivation). Finally, taking the continuum limit ∆t → 0 leads to

  ∂/∂t µ_t = (∂/∂t µ_t)_MC − Σ_t (∂/∂µ_t) ⟨U(x_t, t)⟩_{N(x_t; µ_t, Σ_t)},   (8)

  ∂/∂t Σ_t = (∂/∂t Σ_t)_MC − 2 Σ_t ( (∂/∂Σ_t) ⟨U(x_t, t)⟩_{N(x_t; µ_t, Σ_t)} ) Σ_t,   (9)

  ∂/∂t log Z_{[0,t]} = −⟨U(x_t, t)⟩_{N(x_t; µ_t, Σ_t)}.   (10)

Here, the first terms on the r.h.s. of (8) and (9) are respectively the equations for the mean and covariance as obtained from the normal moment closure (MC) for the unconstrained system, while the second terms incorporate the auxiliary observation. Equation (10) gives the desired survival probability for the process. We term this method for computing FPT distributions
Bayesian First-Passage Times (BFPT).

Equations (8)-(10) are the central result of this article. They constitute closed-form ordinary differential equations for the mean, covariance and log-survival probability of the process, for which efficient numerical integrators exist. Solving these equations forward in time on an interval [0, t] provides an approximation of the survival probability Z_{[0,τ]} for all τ ∈ [0, t] (on the time grid of the numerical integrator), from which the FPT distribution p(τ; C) can be derived for all τ ∈ [0, t] by taking the negative derivative of Z_{[0,τ]}, that is, p(τ; C) = −∂Z_{[0,τ]}/∂τ. The number of equations scales with the square of the number of species, and the method is hence applicable to large systems. Crucially, and in contrast to stochastic simulations and spectral methods, the complexity of the method is independent of the population size and the size of the state space. Similar equations were obtained in a different context in [40, 41] by means of a variational approximation.

FIG. 1. Results for the epidemic system (12). (a) A simulated path of the process. (b) Mean, variance and mode of the FPT distribution for extinction of species I as a function of the initial population x_0 of species S, from the stochastic simulation algorithm (SSA, dots) and from BFPT (lines); the initial value of species I is y_0 = 2x_0. (c),(d) FPT distributions obtained from the SSA (dots) and BFPT (lines) for four parameter sets (x_0, y_0, k_1, k_2). (e),(f) The same distributions on a logarithmic scale, showing their exponential tails ∼ e^{−t}.

In the derivation of (8)-(10) we utilised three approximations: after discretising time, we approximated the unconstrained process using normal moment closure and the observation updates by projections onto a normal distribution. We then approximated the binary observation model by a soft loss function, which allowed us to take the continuum limit in time. Depending on the problem, the relative contribution of the three sources to the overall error may vary.

The choice of loss function U(x, t) depends on the problem at hand. In general, for computational convenience one needs to be able to compute analytically the expectation of the loss function w.r.t. a multivariate normal distribution. In our examples, we use an exponential loss function to constrain the i-th component of the state vector x about a threshold c,

  U(x, t) = exp(a(x_i − c)),  a ∈ R, c ∈ R.   (11)

The absolute value of the parameter a determines the steepness of the loss function. In principle, we choose a as large as numerically feasible. For a detailed discussion on the choice of loss function see Supplemental Material [39].

We now examine the performance of BFPT on three examples. For the analytically tractable Poisson birth process we find that BFPT captures the low-order moments and the bulk mass of the distribution accurately while giving the correct scaling law for the tail of the distribution (see Supplemental Material for details). Next, we consider an epidemic system consisting of a susceptible population S, an infected population I and a recovered population R, with interactions

  S + I --k_1--> 2I,  I --k_2--> R.   (12)

This system is frequently modelled as a continuous-time Markov jump process to model a disease spreading through a population. The parameters k_1 and k_2 in (12) are the corresponding rate constants.
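To illustrate what integrating (8)-(10) involves in practice, the following sketch applies them to the Poisson birth process treated in Appendix C, for which the unconstrained closure terms are (∂µ_t/∂t)_MC = (∂Σ_t/∂t)_MC = k and the Gaussian expectation of the loss (11) is ⟨U⟩ = exp(a(µ_t + aΣ_t/2 − c)), giving Eqs. (C5). The Euler integrator and all parameter values are illustrative choices, not those used in the paper.

```python
import math

def bfpt_poisson_birth(k=1.0, a=1.0, c=5.0, t_end=20.0, dt=1e-4):
    """Euler integration of the BFPT equations (8)-(10) for the Poisson
    birth process with exponential loss U(x) = exp(a (x - c)).
    Returns (mu, sigma, log_Z) at time t_end."""
    mu, sigma, log_z = 0.0, 1e-6, 0.0
    for _ in range(int(round(t_end / dt))):
        u = math.exp(a * (mu + a * sigma / 2.0 - c))   # <U> under N(mu, sigma)
        dmu = k - sigma * a * u                        # Eq. (8)
        dsigma = k - sigma * sigma * a * a * u         # Eq. (9)
        log_z -= u * dt                                # Eq. (10)
        mu += dmu * dt
        sigma += dsigma * dt
    return mu, sigma, log_z

mu, sigma, log_z = bfpt_poisson_birth()
```

At long times the moments settle to the fixed point µ* = c − 1/2 + (1/a) log(k), Σ* = 1/a of (C6), and log Z_{[0,t]} decays at the rate −k, reproducing the exact tail (C4).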
Let x_t = (x_t, y_t, z_t), where x_t, y_t and z_t denote the populations of S, I and R, respectively. We are interested in the probability distribution of the time for the disease to be permanently eradicated, that is, the time it takes the process to reach a state with y_t = 0.

Fig. 1(b) shows the mean, variance and mode of the FPT to extinction as obtained from our method and from the stochastic simulation algorithm [42]. We find that BFPT accurately captures the mean, variance and mode of the FPT over a wide range of varying initial values for S and I.

Figs. 1(c) and (d) show the FPT distributions for four different parameter sets. The modality, mode and overall shape of the FPT are well captured, even for highly skewed and bimodal distributions (cf. the blue curves in Figs. 1(c) and (d), respectively). In some cases the method predicts less peaked distributions than the actual ones (not shown here). Figs. 1(e) and (f) show the same results on a logarithmic scale. We observe that our method correctly predicts an exponential scaling (straight lines on a logarithmic scale), although the scaling is not always accurate, indicating a worse approximation in the tails of the distribution.

The value of the approach is borne out by considering its computational efficiency: for the results shown in Fig. 1, BFPT is several orders of magnitude faster than stochastic simulations. For example, simulating the ensembles of paths underlying Figs. 1(c)-(f) with our implementation of the direct stochastic simulation algorithm [42] takes orders of magnitude longer than BFPT, which takes less than a second.

Finally, we apply BFPT to a polymerisation system of monomers X, dimers XX and trimers XXX with interactions

  X + X --k_1--> XX,  XX + X --k_2--> XXX.   (13)
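As a baseline against which BFPT can be compared, FPT samples for system (13) can be generated with Gillespie's direct method [42]. A minimal sketch, with assumed mass-action propensities (combinatorial-factor conventions for the X + X reaction vary) and illustrative parameter values that are not those of the paper:

```python
import random

def ssa_trimer_fpt(k1, k2, x0=1000, target=50, rng=random.Random(1)):
    """First-passage time to `target` trimers for system (13), sampled
    with Gillespie's direct method. Propensities assume mass-action
    kinetics: a1 = k1*x*(x-1) for X+X -> XX and a2 = k2*d*x for
    XX+X -> XXX, with x monomers and d dimers present."""
    x, d, tr, t = x0, 0, 0, 0.0
    while tr < target:
        a1 = k1 * x * (x - 1)
        a2 = k2 * d * x
        a0 = a1 + a2
        if a0 == 0.0:                 # no reaction possible: target unreachable
            return None
        t += rng.expovariate(a0)      # exponential waiting time to next event
        if rng.random() * a0 < a1:    # dimerisation
            x -= 2; d += 1
        else:                         # trimerisation
            x -= 1; d -= 1; tr += 1
    return t

times = [ssa_trimer_fpt(1e-4, 1e-4) for _ in range(200)]
```

Estimating the full FPT distribution this way requires many samples per parameter set, which is what makes the parameter sweeps below so costly for simulation-based approaches.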
Starting from a fixed initial number of monomers, zero dimers and zero trimers, we are interested in the FPT it takes to produce 200 trimers. We are interested in exploring the dependence of this FPT distribution on the parameters of the system (the dimerisation and trimerisation rates k_1 and k_2, respectively); such parameter exploration is computationally too demanding to be performed by brute-force simulation without access to dedicated hardware, since the FPT distribution needs to be estimated for a large number of parameter sets.

FIG. 2. Results for the polymerisation system in (13). (a) FPT distribution for k_1 = k_2 obtained from BFPT (line) and the stochastic simulation algorithm (SSA, dots). (b),(c) Heat plots of the mean and of the coefficient of variation (defined as standard deviation divided by mean) of the FPT to produce 200 trimers, as a function of k_1 and k_2 on a logarithmic scale. (d) Corresponding 3D plot of the normalisation of the FPT distribution, that is, the probability with which at least 200 trimers are produced. The white areas in (b) and (c) indicate that either the value is larger than the plotted range or that the target state is reached with such small probability that an estimation of moments is not sensible.

Fig. 2 shows the results for this process. We observe excellent agreement between BFPT and simulations for a particular value of the parameters (Fig. 2(a)). The heat plot for the mean as a function of k_1 and k_2 indicates that for a given trimerisation rate k_2 a minimal mean FPT is achieved for an intermediate value of the dimerisation rate k_1 (Fig. 2(b)). We find a linear relationship k_2 ≈ const. × k_1 between the two rates for the location of these minima. The variance of the FPT behaves quantitatively similarly (not shown in the figure). The coefficient of variation (Fig. 2(c)), however, becomes minimal for small values of k_1 for a given k_2. This reveals an unexpected trade-off between an optimal mean FPT and an optimal noise-to-mean ratio (coefficient of variation). Fig. 2(d) shows the probability that the target state is reached, that is, the probability that at least 200 trimers are produced. We find that there are two parameter regions, one with probability close to one and one with probability close to zero, and a small transition range between these two with a boundary that is again linear in the two rates.

In conclusion, we have shown that the problem of computing survival probabilities and FPT distributions for stochastic processes can be formulated as a sequential Bayesian inference problem. This novel formulation opens the way for a new class of efficient approximation methods from machine learning and computational statistics to address this classically intractable problem. Here, we derived an approximation for FPT distributions which relies on solving a small set of ordinary differential equations. This results in considerable efficiency gains; empirically, we found the approximation to be highly accurate in several examples. However, we do not have at present systematic error estimates for the method; we leave the investigation of such bounds and possible correction methods for future work. In particular, it will be interesting to study the tail behaviour of FPT distributions with our method, as these were not always captured well in our examples. We notice that, while we applied our method to processes with discrete state spaces modelled by master equations, in principle it can equally easily be applied to processes with continuous state spaces modelled by Fokker-Planck equations.

This work was supported by the Leverhulme Trust [RPG-2013-171] and the European Research Council [MLCS 306999].
We thank Manfred Opper for insightful discussions.

[1] C. W. Gardiner, Handbook of Stochastic Methods, Vol. 3 (Springer, Berlin, 1985).
[2] A. Eldar and M. B. Elowitz, Nature, 167 (2010).
[3] R. Grima, Phys. Rev. Lett., 218103 (2009).
[4] A. J. McKane and T. J. Newman, Phys. Rev. Lett., 218102 (2005).
[5] M. I. Dykman, I. B. Schwartz, and A. S. Landsman, Phys. Rev. Lett., 078101 (2008).
[6] J. Fernández-Gracia, K. Suchecki, J. J. Ramasco, M. San Miguel, and V. M. Eguíluz, Phys. Rev. Lett., 158701 (2014).
[7] T. Betz, D. Lim, and J. A. Käs, Phys. Rev. Lett., 098103 (2006).
[8] M. R. D'Orsogna and T. Chou, Phys. Rev. Lett., 170603 (2005).
[9] S. Condamin, O. Bénichou, V. Tejedor, R. Voituriez, and J. Klafter, Nature, 77 (2007).
[10] E. Kussell and S. Leibler, Science, 2075 (2005).
[11] S. Tănase-Nicola, P. B. Warren, and P. R. ten Wolde, Phys. Rev. Lett., 068102 (2006).
[12] T. J. Kobayashi, Phys. Rev. Lett., 228104 (2010).
[13] S. Redner, A Guide to First-Passage Processes (Cambridge University Press, 2001).
[14] O. Bénichou, C. Loverdo, M. Moreau, and R. Voituriez, Rev. Mod. Phys., 81 (2011).
[15] A. J. Bray, S. N. Majumdar, and G. Schehr, Adv. Phys., 225 (2013).
[16] F. Aurzada and T. Simon, in Lévy Matters V (Springer, 2015) pp. 183–224.
[17] A. Godec and R. Metzler, Phys. Rev. X, 041037 (2016).
[18] A. Godec and R. Metzler, Sci. Rep., 20349 (2016).
[19] A. Godec and R. Metzler, J. Phys. A: Math. Theor., 084001 (2017).
[20] A. Singer, Z. Schuss, D. Holcman, and R. Eisenberg, J. Stat. Phys., 437 (2006).
[21] D. Holcman and Z. Schuss, SIAM Rev., 213 (2014).
[22] S. A. Isaacson, A. J. Mauro, and J. Newby, Phys. Rev. E, 042414 (2016).
[23] P. C. Bressloff and S. D. Lawley, J. Phys. A: Math. Theor., 195001 (2017).
[24] G. Bel, B. Munsky, and I. Nemenman, Phys. Biol., 016003 (2009).
[25] B. Munsky, I. Nemenman, and G. Bel, J. Chem. Phys., 235103 (2009).
[26] R. Grima and A. Leier, J. Phys. Chem. B, 13 (2017).
[27] D. Schnoerr, G. Sanguinetti, and R. Grima, J. Phys. A, 093001 (2017).
[28] P. Deuflhard, W. Huisinga, T. Jahnke, and M. Wulkow, SIAM J. Sci. Comput., 2990 (2008).
[29] V. Kazeev, M. Khammash, M. Nip, and C. Schwab, PLoS Comput. Biol., e1003359 (2014).
[30] M. Assaf, A. Kamenev, and B. Meerson, Phys. Rev. E, 041123 (2008).
[31] M. Assaf and B. Meerson, Phys. Rev. E, 021116 (2010).
[32] S. Be'er and M. Assaf, J. Stat. Mech., 113501 (2016).
[33] M. F. Weber and E. Frey, Rep. Prog. Phys., 046601 (2017).
[34] S. N. Majumdar, A. J. Bray, and G. C. Ehrhardt, Phys. Rev. E, 015101 (2001).
[35] L. A. Goodman, Biometrics, 212 (1953).
[36] D. Schnoerr, G. Sanguinetti, and R. Grima, J. Chem. Phys., 08B616 1 (2014).
[37] D. Schnoerr, G. Sanguinetti, and R. Grima, J. Chem. Phys., 185101 (2015).
[38] P. S. Maybeck, Stochastic Models, Estimation, and Control, Vol. 3 (Academic Press, 1982).
[39] See Supplemental Material for a derivation of (8)-(10), a detailed discussion of the choice of the loss function in (11), as well as a derivation of the exact results for the Poisson birth process.
[40] B. Cseke, M. Opper, and G. Sanguinetti, in Advances in Neural Information Processing Systems (2013) pp. 971–979.
[41] B. Cseke, D. Schnoerr, M. Opper, and G. Sanguinetti, J. Phys. A, 494002 (2016).
[42] D. T. Gillespie, J. Comput. Phys., 403 (1976).

Appendix A: Derivation of main result
Here, we derive the main results of this work given in Equations (8)-(10). To this end, consider step (i) of the sequential scheme presented in the main text and suppose we have solved the corresponding moment closure equations from t to t + ∆t to obtain µ̂_{t+∆t} and Σ̂_{t+∆t}, which we can thus expand as

  µ̂_{t+∆t} = µ_t + ∆t (∂/∂t µ_t)_MC + O(∆t²),   (A1)
  Σ̂_{t+∆t} = Σ_t + ∆t (∂/∂t Σ_t)_MC + O(∆t²).   (A2)

µ̂_{t+∆t} and Σ̂_{t+∆t} are the moments of the distribution at time t + ∆t prior to the update in (6):

  p(x_{t+∆t} | C_{≤t}) = N(x; µ̂_{t+∆t}, Σ̂_{t+∆t}).   (A3)

Note that p(x_{t+∆t} | C_{≤t}) does not include the observation C_{t+∆t}. The latter can be taken into account using the update equation (6). Using the exponential form of the constraint given in (3), one finds that the normalisation in (6) can be written as stated in (7). Using this, one can derive the relations

  Z_{t+∆t} = ∫ dx_{t+∆t} e^{−∆t U(x_{t+∆t}, t+∆t)} N(x_{t+∆t}; µ̂_{t+∆t}, Σ̂_{t+∆t}),   (A4)

  (∂/∂µ̂) log Z_{t+∆t} = ⟨ (∂/∂µ̂) log N(x_{t+∆t}; µ̂_{t+∆t}, Σ̂_{t+∆t}) ⟩_{p(x_{t+∆t} | C_{≤t+∆t})},   (A5)

  (∂/∂Σ̂) log Z_{t+∆t} = ⟨ (∂/∂Σ̂) log N(x_{t+∆t}; µ̂_{t+∆t}, Σ̂_{t+∆t}) ⟩_{p(x_{t+∆t} | C_{≤t+∆t})},   (A6)

where we have used

  p(x_{t+∆t} | C_{≤t+∆t}) = e^{−∆t U(x_{t+∆t}, t+∆t)} N(x_{t+∆t}; µ̂_{t+∆t}, Σ̂_{t+∆t}) / Z_{t+∆t},   (A7)

which follows from (3) and (6). Taking the logarithm of (A4) and expanding in ∆t we further find

  log Z_{t+∆t} = −∆t ⟨U(x, t+∆t)⟩_{N(x; µ̂_{t+∆t}, Σ̂_{t+∆t})} + O(∆t²).   (A8)

Using (A8) in the l.h.s. of (A5) and (A6) and performing the logarithm and derivative on the r.h.s., we obtain the update equations

  µ_{t+∆t} = µ̂_{t+∆t} − ∆t Σ̂_{t+∆t} (∂/∂µ̂) ⟨U(x, t+∆t)⟩_{N(x; µ̂_{t+∆t}, Σ̂_{t+∆t})} + O(∆t²),   (A9)

  Σ_{t+∆t} = Σ̂_{t+∆t} − 2∆t Σ̂_{t+∆t} ( ∂⟨U(x, t+∆t)⟩_{N(x; µ̂_{t+∆t}, Σ̂_{t+∆t})} / ∂Σ̂_{t+∆t} ) Σ̂_{t+∆t} + O(∆t²).   (A10)

Plugging (A1) and (A2) into (A9) and (A10) and expanding in ∆t we find

  µ_{t+∆t} = µ_t + ∆t (∂/∂t µ_t)_MC − ∆t Σ_t (∂/∂µ) ⟨U(x, t)⟩_{N(x; µ_t, Σ_t)} + O(∆t²),   (A11)

  Σ_{t+∆t} = Σ_t + ∆t (∂/∂t Σ_t)_MC − 2∆t Σ_t ( (∂/∂Σ_t) ⟨U(x, t)⟩_{N(x; µ_t, Σ_t)} ) Σ_t + O(∆t²).   (A12)

Taking the continuum limit ∆t → 0 yields (8) and (9). Finally, using

  log Z_{t+∆t} = log Z_{[0,t+∆t]} − log Z_{[0,t]}   (A13)

in (A8) we obtain

  log Z_{[0,t+∆t]} − log Z_{[0,t]} = −∆t ⟨U(x, t+∆t)⟩_{N(x; µ̂_{t+∆t}, Σ̂_{t+∆t})} + O(∆t²),   (A14)

which is just the discrete-time version of (10).

Appendix B: Choice of loss function
We approximate the binary observation process by a soft constraint of exponential form (3). For the studied examples, we choose an exponential loss function (11). This (softly) confines the process to the interval x_t ∈ [−∞, c] ([c, ∞]) if a is positive (negative). This leaves us with the choice of the magnitude of a. A larger |a| corresponds to a steeper loss function and hence to a better approximation of the binary observation process. However, a larger |a| also leads to stronger non-linearity of the equations (8)-(10). This will typically limit |a| for numerical feasibility. Optimally, one would hope the results to converge in |a| for values for which the equations are still numerically feasible. For some of the examples studied we indeed found this to be the case. For others, we chose |a| as large as numerically feasible. A more systematic way of choosing |a| is left for future work.
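For the exponential loss (11) in one dimension, the required Gaussian expectation has the closed form ⟨exp(a(x − c))⟩_{N(µ,Σ)} = exp(a(µ − c) + a²Σ/2), which follows from the Gaussian moment generating function. A quick numerical sanity check of this identity, with arbitrary illustrative values of a, c, µ and Σ:

```python
import math

def loss_expectation_closed_form(a, c, mu, var):
    """Closed-form Gaussian expectation of U(x) = exp(a (x - c))."""
    return math.exp(a * (mu - c) + a * a * var / 2.0)

def loss_expectation_quadrature(a, c, mu, var, n=20001, width=12.0):
    """Trapezoidal quadrature of the same expectation over mu +/- width*sd."""
    sd = math.sqrt(var)
    lo, hi = mu - width * sd, mu + width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        f = math.exp(a * (x - c)) \
            * math.exp(-(x - mu) ** 2 / (2.0 * var)) \
            / math.sqrt(2.0 * math.pi * var)
        total += f if 0 < i < n - 1 else f / 2.0   # trapezoid end weights
    return total * h

a, c, mu, var = 2.0, 1.0, 0.5, 0.3
print(loss_expectation_closed_form(a, c, mu, var))
```

The steeper the loss (larger |a|), the further the effective integrand shifts into the Gaussian tail, which is the numerical sensitivity discussed above.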
The first example in the main text is the Poisson birth process comprising a single species X and a single reaction

  ∅ --k--> X.   (C1)

If there are zero X molecules in the system initially, it is easy to show that the solution p(x, t) of the corresponding master equation at time t is given by a Poisson distribution with mean kt:

  p(x, t) = e^{−kt} (kt)^x / x!,   (C2)

where x ∈ N is the number of X molecules in the system. Suppose we want to compute the FPT distribution p(τ; c) for reaching a state c ∈ N⁺. Since the number of X molecules never decreases, p(τ; c) is simply given by the probability of being in state c − 1 times the rate k of the reaction firing, which means jumping into state c:

  p(τ; c) = k p(c − 1, τ) = k e^{−kτ} (kτ)^{c−1} / (c − 1)!.   (C3)

The long-time behaviour is hence given by

  p(τ; c) ∼ e^{−kτ},  τ → ∞.   (C4)

Next, we derive the same result using our method. We use an exponential loss function of the form given in (11) with a > 0. Plugging the expectation of the loss function into (8) and (9) we obtain the evolution equations for the mean µ_t and variance Σ_t of the process

  ∂/∂t µ_t = k − a Σ_t e^{a(µ_t + aΣ_t/2 − c)},
  ∂/∂t Σ_t = k − a² Σ_t² e^{a(µ_t + aΣ_t/2 − c)}.   (C5)

In the steady state ∂/∂t µ_t = ∂/∂t Σ_t = 0, this is solved by

  µ* = c − 1/2 + (1/a) log(k),  Σ* = 1/a.   (C6)

Plugging these into the equation (10) for the survival probability we find

  ∂/∂t log Z_{[0,t]} |_{µ*, Σ*} = −e^{a(µ* + aΣ*/2 − c)} = −k.   (C7)

Solving this, we find the long-time FPT distribution

  p(τ; c) |_{µ*, Σ*} = −∂/∂t Z_{[0,t]} |_{µ*, Σ*, t=τ} ∼ e^{−kτ},

in agreement with the exact result (C4).
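The exact density (C3) is an Erlang (Gamma) distribution with shape c and rate k, so it integrates to one and has mean c/k. A short numerical cross-check with illustrative values k = 1, c = 5:

```python
import math

def fpt_density(tau, k=1.0, c=5):
    """Exact FPT density (C3) of the Poisson birth process:
    an Erlang density with shape c and rate k."""
    return k * math.exp(-k * tau) * (k * tau) ** (c - 1) / math.factorial(c - 1)

def trapezoid(f, lo, hi, n=100001):
    """Trapezoidal rule on a uniform grid."""
    h = (hi - lo) / (n - 1)
    s = sum(f(lo + i * h) for i in range(1, n - 1))
    return h * (s + 0.5 * (f(lo) + f(hi)))

norm = trapezoid(lambda t: fpt_density(t), 0.0, 60.0)      # normalisation, ~1
mean = trapezoid(lambda t: t * fpt_density(t), 0.0, 60.0)  # mean FPT, ~c/k = 5
```

The e^{−kτ} tail recovered by the steady state (C6)-(C7) is exactly the decay rate of this Erlang density.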