Stability-Optimized High Order Methods and Stiffness Detection for Pathwise Stiff Stochastic Differential Equations
Christopher Rackauckas† — Qing Nie‡

April 13, 2018
Abstract
Stochastic differential equations (SDE) often exhibit large random transitions. This property, which we denote as pathwise stiffness, causes transient bursts of stiffness which limit the allowed step size for common fixed time step explicit and drift-implicit integrators. We present four separate methods to efficiently handle this stiffness. First, we utilize a computational technique to derive stability-optimized adaptive methods of strong order 1.5 for SDEs. The resulting explicit methods are shown to exhibit substantially enlarged stability regions, which allows them to solve pathwise stiff biological models orders of magnitude more efficiently than previous methods like SRIW1 and Euler-Maruyama. Secondly, these integrators include a stiffness estimator which allows for automatically switching between implicit and explicit schemes based on the current stiffness. In addition, adaptive L-stable strong order 1.5 implicit integrators for SDEs and stochastic differential algebraic equations (SDAEs) in mass-matrix form with additive noise are derived and are demonstrated to be more efficient than the explicit methods on stiff chemical reaction networks by nearly 8x. Lastly, we develop an adaptive implicit-explicit (IMEX) integration method based off of a common method for convection-diffusion-reaction PDEs and show numerically that it can achieve strong order 1.5. These methods are benchmarked on a range of problems varying from non-stiff to extremely pathwise stiff and demonstrate speedups between 5x-6000x while showing the computational infeasibility of fixed time step integrators on many of these test equations.
Stochastic differential equations (SDEs) are dynamic equations of the form

$$dX_t = f(t, X_t)\,dt + g(t, X_t)\,dW_t, \qquad (1)$$

where $X_t$ is a $d$-dimensional vector, $f : \mathbb{R}^d \to \mathbb{R}^d$ is the drift coefficient, and $g : \mathbb{R}^d \to \mathbb{R}^{d \times m}$ is the diffusion coefficient which describes the amount and mixtures of the noise process $W_t$, an $m$-dimensional Brownian motion. SDEs are of interest in scientific disciplines because they can exhibit behaviors which are not found in deterministic models. For example, an ODE model of a chemical reaction network may stay at a constant steady state, but in the presence of randomness the trajectories may switch between various steady states [31, 41, 14]. In many cases, these unique features of stochastic models are pathwise-dependent and are thus not a property of the evolution of the mean trajectory. However, these same effects cause random events of high numerical stiffness, which we denote as pathwise stiffness, which can cause difficulties for numerical integration methods.

∗This work was partially supported by NIH grants R01GM107264 and P50GM76516 and NSF grants DMS1562176 and DMS1161621. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1321846, the National Academies of Science, Engineering, and Medicine via the Ford Foundation, and the National Institutes of Health Award T32 EB009418. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

†Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA and Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA ([email protected]).
‡Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA and Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA and Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA ([email protected]).
Figure 1: Example of a Pathwise Stiff Solution. Depicted is a sample trajectory of Equation 2 solved using the SOSRI methods developed in this manuscript with matching tolerances, reltol = abstol.

A minimal example of pathwise stiffness is demonstrated in the equation

$$dX_t = \left[-X_t(1 - X_t)(2 - X_t)\right]dt + g(t, X_t)\,dW_t, \qquad X_0 = 2, \qquad (2)$$

with constant additive noise $g(t, X_t) \equiv \sigma$, where a sample trajectory is shown in Figure 1. This equation has two stable steady states, one at $X = 0$ and another at $X = 2$, between which the solution switches when the noise is sufficiently large. While near a steady state the derivative is approximately zero, making the problem non-stiff, during these transitions the derivative of the drift term becomes very large. This display of large, transient, and random switching behavior in a given trajectory causes stochastic bursts of numerical stiffness, a phenomenon which we will denote pathwise stiffness. The fixed time step Euler-Maruyama method would require an extremely small $dt$ to be stable for most trajectories, thus requiring an enormous number of steps to solve even this 1-dimensional SDE. In many cases the switching behavior can be rare (due to smaller amounts of noise) or can happen finitely many times, as in the multiplicative noise version with $g(t, X_t) = \sigma X_t$. Yet even if these switches occupy only a small portion of the total time, the stability requirement imposed by their existence determines the possible step sizes and thus contributes heavily to the overall computational cost. While implicit methods can be used to increase the stability range, this can vastly increase the overall computational cost of each step, especially in the case of large systems of SDEs like discretizations of stochastic reaction-diffusion equations. In addition, implicit solvers have in practice a smaller stability region than in theory due to requiring convergence of the quasi-Newton solvers for the implicit steps.
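The step-size restriction that pathwise stiffness imposes on a fixed time step method can be sketched in a few lines of Euler-Maruyama. The drift scale `lam` and the noise amplitude below are illustrative stand-ins, not the manuscript's exact coefficients: near the steady state $X = 2$ the drift Jacobian is approximately $-2\lambda$, so Euler-Maruyama requires $dt < 1/\lambda$ there.

```python
import numpy as np

def euler_maruyama(f, g, x0, t1, dt, rng):
    """Fixed-step Euler-Maruyama for dX = f(t,X)dt + g(t,X)dW."""
    n = int(round(t1 / dt))
    x, t = x0, 0.0
    for _ in range(n):
        x = x + f(t, x) * dt + g(t, x) * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if not np.isfinite(x) or abs(x) > 1e6:
            return x  # blew up: dt lies outside the stability region
    return x

# Illustrative stiffness scale (assumed for demonstration only).
lam = 1000.0
f = lambda t, x: -lam * x * (1 - x) * (2 - x)   # bistable cubic drift
g = lambda t, x: 0.1                            # small additive noise

rng = np.random.default_rng(0)
# Near X = 2 the drift Jacobian is about -2*lam, so EM needs dt < 1/lam.
unstable = euler_maruyama(f, g, 2.0, 1.0, 2e-3, rng)   # violates dt < 1/lam
stable   = euler_maruyama(f, g, 2.0, 1.0, 5e-4, rng)   # satisfies it
```

Even though the trajectory spends essentially all of its time sitting at a flat steady state, the larger step diverges while the smaller one stays pinned near $X = 2$; the stability requirement, not the accuracy requirement, sets the cost.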
This problem is mitigated in ODE software by high-quality stage predictors given by extrapolation algorithms, which provide good initial conditions for the Newton steps [12]. However, there are no known algorithms for stage predictors in the presence of large noise bursts, and thus we will demonstrate that classic implicit solvers have a form of instability. Thus both fixed time step explicit and implicit solvers are inadequate for efficiently handling this common class of SDEs. Since these features exist in the single trajectories of the random processes, methods which attempt to account for the presence of such bursts must do so on each individual trajectory in order to be efficient. In previous work, the authors have shown that by using adaptive time-stepping, a stochastic reaction network of 19 reactants can be solved with an average time step 100,000 times larger than the value that was found necessary for stability during the random stiff events for a high order SRK method [29]. This demonstrated that the key to solving these equations efficiently is controlling the time steps in a pathwise manner. However, the methods were still largely stability-bound, meaning the tolerances chosen to solve the model were determined by what was necessary for stability and were far below the error actually required by the application. The purpose of this investigation is to develop numerical methods with the ability to better handle pathwise stiffness and allow for efficient solving of large Monte Carlo experiments. We approach this problem through four means. First, we develop adaptive stability-optimized SRK methods with enlarged stability regions. This builds off of similar work for ODE integrators which optimizes the coefficients of a Butcher tableau to give enhanced stability [24, 2, 40].
Similar to the Runge-Kutta-Chebyshev methods [12] (and the S-ROCK extension to the stochastic case [22, 1, 21]), these methods are designed to be efficient for equations which display stiffness without fully committing to implicit solvers. Given the complexity of the stochastic stability equations and order conditions, we develop a novel and scalable mechanism for the derivation of "optimal" Runge-Kutta methods. We use this mechanism to design stability-optimized methods for additive noise and diagonal noise SDEs. We show through computational experiments that these adaptive stability-optimized SRK methods can adequately solve transiently stiff equations without losing efficiency on non-stiff problems. On the other hand, to handle extreme stiffness we develop implicit RK methods for SDEs and stochastic differential algebraic equations (SDAEs) in mass-matrix form with additive noise. We extend the definition of L-stability to additive noise SDEs and develop two strong order 1.5 methods: a fully implicit 2-stage L-stable method and an extension of a well-known L-stable explicit first stage singly diagonally implicit RK (ESDIRK) method due to Kennedy and Carpenter which is commonly used for convection-diffusion-reaction equations [18]. To the authors' knowledge, these are the first high order adaptive L-stable methods for SDEs and the first proposed adaptive SDAE integrators. In addition, to extend the utility of these additive noise methods, we derive an extension of the methods for additive SDEs to affine SDEs (mixed multiplicative and additive noise terms) through a Lamperti transformation [27]. Lastly, in order to handle extreme transient stiffness, for each of these types of methods we derive computationally cheap methods for detecting stiffness and switching between implicit and explicit integrators in the presence of stiffness. We show that these methods can robustly detect pathwise stiff transients and thus can serve as the basis for automatic switching methods for SDEs.
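A cheap stiffness check of the kind alluded to above can be built from quantities an SRK step already computes: a Lipschitz quotient between two nearby stage values estimates the dominant Jacobian eigenvalue, which is then compared against the explicit method's stability boundary. The following is a sketch in the spirit of classical ODE stiffness detection; the function names and the `stability_radius` threshold argument are our assumptions, not the paper's exact detector.

```python
import numpy as np

def stiffness_estimate(f, t, x1, x2):
    """Cheap local stiffness estimate: a Lipschitz quotient between two
    nearby stage values approximates the dominant Jacobian eigenvalue."""
    num = np.linalg.norm(np.atleast_1d(f(t, x2) - f(t, x1)))
    den = np.linalg.norm(np.atleast_1d(x2 - x1))
    return num / den

def is_stiff(lam_est, h, stability_radius):
    # Flag stiffness when the step size is stability-bound rather than
    # accuracy-bound for the explicit integrator in use.
    return h * lam_est > stability_radius

lam = 1000.0
f = lambda t, x: -lam * x
est = stiffness_estimate(f, 0.0, 1.0, 1.001)   # close to lam = 1000
```

An integrator could run this check on stage values it has already evaluated (the "free" stiffness detection discussed later for SOSRA2), switching to an implicit scheme when the flag persists over several steps.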
Together we test on non-stiff, semi-stiff, and stiff equations, on systems ranging from 2 SDEs to large SDE discretizations from the biological literature, and show speedups between 6x-60x over the previous adaptive SRIW1 algorithm, and demonstrate the infeasibility of common explicit and implicit methods (Euler-Maruyama, Runge-Kutta Milstein, Drift-Implicit Stochastic θ-Method, and Drift-Implicit θ Runge-Kutta Milstein) found as the basis of many SDE solver packages [34, 10, 16].
The class of methods we wish to study are the adaptive strong order 1.5 SRK methods for diagonal noise [32, 29]. Diagonal noise is the case where the diffusion term $g$ is a diagonal matrix $\left(\sigma^i X_t^i\right)$ and includes phenomenological noise models like multiplicative and affine noise. The diagonal noise methods utilize the same general form and order conditions as the methods for scalar noise, so we use their notation for simplicity. The strong order 1.5 methods for scalar noise are of the form

$$X_{n+1} = X_n + \sum_{i=1}^{s} \alpha_i f\left(t_n + c_i^{(0)} h, H_i^{(0)}\right) h + \sum_{i=1}^{s} \left(\beta_i^{(1)} I_{(1)} + \beta_i^{(2)} \frac{I_{(1,1)}}{\sqrt{h}} + \beta_i^{(3)} \frac{I_{(1,0)}}{h} + \beta_i^{(4)} \frac{I_{(1,1,1)}}{h}\right) g\left(t_n + c_i^{(1)} h, H_i^{(1)}\right) \qquad (3)$$

with stages

$$H_i^{(0)} = X_n + \sum_{j=1}^{s} A_{ij}^{(0)} f\left(t_n + c_j^{(0)} h, H_j^{(0)}\right) h + \sum_{j=1}^{s} B_{ij}^{(0)} g\left(t_n + c_j^{(1)} h, H_j^{(1)}\right) \frac{I_{(1,0)}}{h}, \qquad (4)$$

$$H_i^{(1)} = X_n + \sum_{j=1}^{s} A_{ij}^{(1)} f\left(t_n + c_j^{(0)} h, H_j^{(0)}\right) h + \sum_{j=1}^{s} B_{ij}^{(1)} g\left(t_n + c_j^{(1)} h, H_j^{(1)}\right) \sqrt{h}, \qquad (5)$$

where the $I_{(j)}$ are the Wiktorsson approximations to the iterated stochastic integrals [43]. In the case of additive noise, defined as having a diffusion coefficient satisfying $g(t, X_t) \equiv g(t)$, the method reduces to the form

$$X_{n+1} = X_n + \sum_{i=1}^{s} \alpha_i f\left(t_n + c_i^{(0)} h, H_i^{(0)}\right) h + \sum_{i=1}^{s} \left(\beta_i^{(1)} I_{(1)} + \beta_i^{(2)} \frac{I_{(1,0)}}{h}\right) g\left(t_n + c_i^{(1)} h\right) \qquad (6)$$

with stages

$$H_i^{(0)} = X_n + \sum_{j=1}^{s} A_{ij}^{(0)} f\left(t_n + c_j^{(0)} h, H_j^{(0)}\right) h + \sum_{j=1}^{s} B_{ij}^{(0)} g\left(t_n + c_j^{(1)} h\right) \frac{I_{(1,0)}}{h}. \qquad (7)$$

The tuple of coefficients $\left(A^{(j)}, B^{(j)}, \beta^{(j)}, \alpha\right)$ thus fully determines the SRK method. These coefficients must satisfy the constraint equations described in Appendix C.1 in order to achieve strong order 1.5.
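One step of the additive-noise form (6)-(7) can be sketched for scalar additive noise using Rößler's SRA1 coefficients, which are discussed later in the text; the helper names below are ours. In the deterministic limit $g \equiv 0$ the scheme collapses to a second order Runge-Kutta method, which the snippet verifies on a linear drift.

```python
import numpy as np

# Rossler's SRA1 tableau for additive-noise SRK methods (strong order 1.5).
A21, B21 = 3/4, 3/2
alpha = np.array([1/3, 2/3])
beta1 = np.array([1.0, 0.0])
beta2 = np.array([-1.0, 1.0])
c0 = np.array([0.0, 3/4])
c1 = np.array([1.0, 0.0])

def sra1_step(f, g, t, x, h, I1, I10):
    """One SRA1 step for dX = f(t,X)dt + g(t)dW (scalar additive noise).
    I1 ~ N(0, h); I10 = h*(I1 + Z/sqrt(3))/2 with Z ~ N(0, h) independent."""
    H1 = x
    H2 = x + A21 * h * f(t + c0[0] * h, H1) + B21 * g(t + c1[0] * h) * I10 / h
    drift = h * (alpha[0] * f(t + c0[0] * h, H1) + alpha[1] * f(t + c0[1] * h, H2))
    diff = sum((beta1[i] * I1 + beta2[i] * I10 / h) * g(t + c1[i] * h)
               for i in range(2))
    return x + drift + diff

# Deterministic check: with g = 0 this is a 2nd-order RK step, so for
# f = mu*x one step reproduces 1 + z + z^2/2 exactly (z = mu*h).
mu, h, x0 = -1.5, 0.1, 2.0
out = sra1_step(lambda t, x: mu * x, lambda t: 0.0, 0.0, x0, h, 0.0, 0.0)
z = mu * h
```

In an actual solver the pair $(I_{(1)}, I_{(1,0)})$ would be sampled jointly per step as indicated in the docstring; here both are set to zero to isolate the drift tableau.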
These methods are appended with error estimates

$$E_D = \left|\Delta t \sum_{i \in I} (-1)^{\sigma(i)} f\left(t_n + c_i^{(0)} \Delta t, H_i^{(0)}\right)\right| \quad \text{or} \quad E_D = \Delta t \sum_{i \in I} \left| f\left(t_n + c_i^{(0)} \Delta t, H_i^{(0)}\right)\right|,$$

$$E_N = \left|\sum_{i \in I} \left(\beta_i^{(3)} \frac{I_{(1,0)}}{\Delta t} + \beta_i^{(4)} \frac{I_{(1,1,1)}}{\Delta t}\right) g\left(t_n + c_i^{(1)} \Delta t, H_i^{(1)}\right)\right|,$$

and the rejection sampling with memory (RSwM) algorithm to give fully adaptive time-stepping [29]. Thus, unlike in the theory of ordinary differential equations [23, 8, 6, 39, 37], the choice of coefficients for SRK methods does not require explicitly finding an embedded method when developing an adaptive SRK method, and we will therefore take for granted that each of the derived methods is adaptive.

We use a previous definition of a discrete approximation as numerically stable if for any finite time interval $[t_0, T]$ there exists a positive constant $\Delta_0$ such that for each $\epsilon > 0$ and each $\delta \in (0, \Delta_0)$,

$$\lim_{|X_0^\delta - \bar{X}_0^\delta| \to 0}\ \sup_{t_0 \leq t \leq T} P\left(\left|X_t^\delta - \bar{X}_t^\delta\right| \geq \epsilon\right) = 0, \qquad (8)$$

where $X_n^\delta$ and $\bar{X}_n^\delta$ are discrete time approximations with maximum step size $\delta > 0$ starting at $X_0^\delta$ and $\bar{X}_0^\delta$ respectively [20]. For additive noise, we consider the complex-valued linear test equation

$$dX_t = \mu X_t\,dt + dW_t, \qquad (9)$$

where $\mu$ is a complex number.
In this framework, if a scheme can be written in the form

$$X_{n+1}^h = X_n^h G(\mu h) + Z_n^\delta \qquad (10)$$

with constant step size $\delta \equiv h$, where the $Z_n^\delta$ are random variables which do not depend on the $X_n^\delta$, then the region of absolute stability is the set where, for $z = \mu h$, $|G(z)| < 1$. Applying the additive noise SRK method (6) to the test equation (9) gives

$$X_{n+1}^h = X_n^h + z\left(\alpha \cdot H^{(0)}\right) + \beta^{(1)} \sigma I_{(1)} + \sigma \beta^{(2)} \frac{I_{(1,0)}}{h}, \qquad (11)$$

where

$$H^{(0)} = \left(I - zA^{(0)}\right)^{-1}\left(\hat{X}_n^h + B^{(0)} e\, \sigma \frac{I_{(1,0)}}{h}\right), \qquad (12)$$

$\hat{X}_n^h$ is the size $s$ constant vector of elements $X_n^h$, and $e = (1, \ldots, 1)^T$. By substitution we receive

$$X_{n+1}^h = X_n^h\left(1 + z\,\alpha \cdot \left(I - zA^{(0)}\right)^{-1} e\right) + z\,\alpha \cdot \left(I - zA^{(0)}\right)^{-1} B^{(0)} e\, \sigma \frac{I_{(1,0)}}{h} + \beta^{(1)} \sigma I_{(1)} + \sigma \beta^{(2)} \frac{I_{(1,0)}}{h}. \qquad (13)$$

This set of equations decouples since the iterated stochastic integral approximations $I_{(j)}$ are random numbers and are independent of the $X_n^h$. Thus the stability condition is determined by the equation

$$G(z) = 1 + z\,\alpha \cdot \left(I - zA^{(0)}\right)^{-1} e, \qquad (15)$$

which one may notice is the stability equation of the drift tableau applied to a deterministic ODE [5]. Thus the stability properties of the deterministic Runge-Kutta methods carry over to the additive noise SRA methods on this test equation. However, most two-stage tableaus from ODE research were developed to satisfy higher order ODE order constraints which do not apply here. Thus we will instead look to maximize stability while satisfying the stochastic order constraints. For explicit methods, $A^{(0)}$ and $B^{(0)}$ are lower diagonal and we receive the simplified stability function

$$G(z) = 1 + z\left(\alpha_1 + \alpha_2\right) + z^2 A_{21}\alpha_2 \qquad (16)$$

for a two-stage additive noise SRK method. For this method we will find the coefficients which optimize the stability in the real part of $z$. Thus we wish to find $A^{(0)}$ and $\alpha$ such that the negative real roots of $|G(z)| = 1$ are minimized.
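Equation (15) is straightforward to evaluate numerically. The following sketch checks it against SRA1's drift tableau, for which $G(z) = 1 + z + z^2/2$ and the real stability interval is $(-2, 0)$; the function name is ours.

```python
import numpy as np

def stability_G(A0, alpha, z):
    """G(z) = 1 + z * alpha . (I - z*A0)^{-1} e for an additive-noise SRK
    method; only the drift tableau enters, as noted in the text."""
    s = len(alpha)
    e = np.ones(s)
    return 1 + z * alpha @ np.linalg.solve(np.eye(s) - z * A0, e)

# SRA1 drift tableau: G(z) = 1 + z + z^2/2, stable on the interval (-2, 0).
A0 = np.array([[0.0, 0.0], [0.75, 0.0]])
alpha = np.array([1/3, 2/3])
for zz in (-0.5, -1.0, -1.9):
    assert abs(stability_G(A0, alpha, zz)) < 1    # inside the region
boundary = abs(stability_G(A0, alpha, -2.0))       # |G(-2)| = 1
```

The same routine applied on a grid of complex $z$ reproduces the stability-region plots of Figure 2 for any candidate tableau.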
By the quadratic formula we see that there exists only a single negative root: $z = -\frac{\alpha_1 + \alpha_2}{A_{21}\alpha_2}$. Using Mathematica's Minimize function, we determine that the minimum value of this root subject to the order constraints is $z = -2$, meaning that the SRA1 method due to Rößler achieves the maximum stability criteria. However, given extra degrees of freedom, we attempted to impose $c_1^{(0)} = c_1^{(1)} = 0$ and $c_2^{(0)} = c_2^{(1)} = 1$ so that the error estimator spans the whole interval, which can lead to improved robustness of the adaptive error estimator. In fact, when trying to optimize the error estimator's span we find that there is no error estimator which satisfies $c_2^{(0)} > 3/4$, which is the span of the SRA1 method [32]. Thus SRA1 is the stability-optimized 2-stage explicit method which achieves the most robust error estimator:

$$A^{(0)} = \begin{pmatrix} 0 & 0 \\ \tfrac{3}{4} & 0 \end{pmatrix}, \quad B^{(0)} = \begin{pmatrix} 0 & 0 \\ \tfrac{3}{2} & 0 \end{pmatrix}, \quad \alpha = \begin{pmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{pmatrix},$$
$$\beta^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \beta^{(2)} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \quad c^{(0)} = \begin{pmatrix} 0 \\ \tfrac{3}{4} \end{pmatrix}, \quad c^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \qquad (17)$$

For the 3-stage SRA method, we receive the simplified stability function

$$G(z) = 1 + \left(\alpha_1 + \alpha_2 + \alpha_3\right)z + \left(A_{21}\alpha_2 + A_{31}\alpha_3 + A_{32}\alpha_3\right)z^2 + A_{21}A_{32}\alpha_3 z^3. \qquad (18)$$

To optimize this method, we attempted to use the same techniques as before and optimize the real values of the negative roots. However, in this case we have a cubic polynomial and the root equations are more difficult. Instead, we turn to a more general technique to handle the stability optimization which will be employed in later sections as well. To do so, we generate an optimization problem which we can numerically solve for the coefficients. To simplify the problem, we let $z \in \mathbb{R}$ and define the function

$$f(z) = \int_D \chi_{|G(z)| \leq 1}(z)\,dz. \qquad (19)$$

Notice that $f$ is the area of the stability region when $D$ is sufficiently large. Thus we define the stability-optimized SRK method for additive noise SDEs as the set of coefficients which achieves

$$\max_{A^{(i)}, B^{(i)}, \beta^{(i)}, \alpha} f(z) \quad \text{subject to: Order Constraints.} \qquad (20)$$

Figure 2: SOSRA Stability Regions. The stability regions ($|G(z)| < 1$) are plotted in the $(x, y)$-plane for $z = x + iy$. (A) SRA1. (B) SRA3. (C) SOSRA. (D) SOSRA2.

In all cases we impose $0 < c_i^{(0)}, c_i^{(1)} <$
$1$. We use the order constraints to simplify the problem to a nonlinear optimization problem on 14 variables with 3 equality constraints and 4 inequality constraints (with bound constraints on 10 of the variables). However, we found that simplifying the problem even further by requiring $c_1^{(0)} = c_1^{(1)} = 0$ and $c_3^{(0)} = c_3^{(1)} = 1$ did not significantly impact the stability regions but helps the error estimator, and thus we reduced the problem to 10 variables, 3 equality constraints, and 2 inequality constraints. This was optimized using the COBYLA local optimization algorithm [17, 28] with randomized initial conditions 100 times, and all runs gave similar results. In the Mathematica notebook we show the effect of changing the numerical integration region $D$ on the results, but conclude that a $D$ which does not bias the result toward better or worse real/complex handling does not improve the result. The resulting algorithm, SOSRA, is given by the coefficients in the table in Section B.2. Lastly, we used the condition $c_2^{(0)} = c_3^{(0)} = c_2^{(1)} = c_3^{(1)} = 1$ to allow for free stiffness detection (discussed in Section 5.4). The method generated with this extra constraint is SOSRA2, whose coefficients are in the table in Section B.3. These methods have their stability regions compared to SRA1 and SRA3 in Figure 2, where it is shown that the SOSRA methods more than double the allowed time steps when the eigenvalues of the Jacobian are dominated by the real part.

Drift Implicit Methods for Stiff SDEs with Additive Noise

It is clear that, as in the case of deterministic equations, the explicit methods cannot be made A-stable. However, the stability function of the implicit two-stage additive noise SRK method is

$$G(z) = \frac{\det\left(I - zA^{(0)}\right) + z\,\alpha \cdot \operatorname{adj}\left(I - zA^{(0)}\right)e}{\det\left(I - zA^{(0)}\right)}, \qquad (21)$$

which is A-stable if

$$\left|\det\left(I - zA^{(0)}\right)\right| > \left|\det\left(I - zA^{(0)}\right) + z\,\alpha \cdot \operatorname{adj}\left(I - zA^{(0)}\right)e\right| \qquad (22)$$

for all $z$ with $\operatorname{Re}(z) < 0$. Notice that the numerator equals the denominator if and only if $z = 0$ or

$$z = \frac{\alpha_1 + \alpha_2}{\left(A_{22} - A_{12}\right)\alpha_1 + \left(A_{11} - A_{21}\right)\alpha_2}. \qquad (24)$$

From the order conditions we know that $\alpha_1 + \alpha_2 = 1$, which means that no such root exists with $\operatorname{Re}(z) < 0$ when $\left(A_{22} - A_{12}\right)\alpha_1 + \left(A_{11} - A_{21}\right)\alpha_2 > 0$. Thus under these no-root conditions, we can determine A-stability by checking the inequality at a single test point, which gives $\left(A_{22} - A_{12}\right)\alpha_1 + \left(A_{11} - A_{21}\right)\alpha_2 < 1$. Together with the order conditions, we have a total of four constraints on $A^{(0)}$ and $\alpha$:

$$\left(A_{11} + A_{12}\right)\alpha_1 + \left(A_{21} + A_{22}\right)\alpha_2 = \frac{1}{2}, \qquad (25)$$
$$\alpha_1 + \alpha_2 = 1,$$
$$0 < \left(A_{22} - A_{12}\right)\alpha_1 + \left(A_{11} - A_{21}\right)\alpha_2 < 1,$$
$$\lim_{z \to \infty} G(z) = 0. \qquad (26)$$

The last (L-stability) condition is equivalent to

$$1 - \alpha \cdot \left(A^{(0)}\right)^{-1} e = 0, \qquad (27)$$

whose expanded denominator is proportional to $\det\left(A^{(0)}\right)$, which implies $A^{(0)}$ must be non-singular. Next, we attempt to impose B-stability on the drift portion of the method. We use the condition due to Burrage and Butcher that for $B = \operatorname{diag}\left(\alpha_1, \alpha_2\right)$ and $M = BA^{(0)} + \left(A^{(0)}\right)^T B - \alpha\alpha^T$ (for ODEs) [4], we require both $B$ and $M$ to be non-negative definite. However, in the supplemental Mathematica notebooks we show computationally that there is no 2-stage SRK method of this form which satisfies all three of these stability conditions. Thus we settle for A-stability and L-stability. Recalling that $c^{(0)}$ and $c^{(1)}$ are the locations in time where $f$ and $g$ are approximated respectively, we wish to impose

$$c_1^{(0)} = 0, \quad c_2^{(0)} = 1, \quad c_1^{(1)} = 0, \quad c_2^{(1)} = 1 \qquad (28)$$

so that the error estimator covers the entire interval of integration. Since $c^{(0)} = A^{(0)} e$, this leads to the condition $A_{21} + A_{22} = 1$. We then apply the constraint-satisfaction algorithm FindInstance in Mathematica.
Figure 3: Implicit SRA Stability Regions. The stability regions ($|G(z)| < 1$) are plotted in the $(x, y)$-plane for $z = x + iy$. (A) LSRA. (B) SKenCarp.
Specifically, we search for tableaus which satisfy the previous conditions with the added constraint of semi-implicitness, i.e. $B^{(0)}$ lower triangular. This assumption is added because the inverse of the normal distribution has unbounded moments, and thus in many cases it is mathematically simpler to treat the diffusion term explicitly (though there are recent methods which drop this requirement via truncation or extra assumptions on the solution [26]). However, we find that there is no coefficient set which meets all of these requirements. If we relax the interval estimate condition to allow $0 \leq c_2^{(0)} \leq 1$, we find an A-L stable method, which we denote LSRA. If we attempt to look for a 2-stage SDIRK-like method to reduce the complexity of the implicit equation, i.e. $A_{12}^{(0)} = 0$, using FindInstance we find the constraints unsatisfiable. Note that if we drop the semi-implicit assumption we find that the full constraints cannot be satisfied there either (we still cannot satisfy $c_1^{(0)} = 0$ and $c_2^{(0)} = 1$), and there does not exist a 2-stage A-L stable SDIRK method in that case.

Since the stability region of the SRA methods is completely determined by the deterministic portion $A^{(0)}$, in some cases there may exist a sensible extension of implicit Runge-Kutta methods for ordinary differential equations to high order adaptive methods for stochastic differential equations with additive noise which keeps the same stability properties. Since the order constraints which only involve the deterministic portions $A^{(0)}$, $c^{(0)}$, and $\alpha$ match the conditions required for ODE integrators, existence depends on finding $\beta^{(1)}$, $\beta^{(2)}$, $c^{(1)}$, and $B^{(0)}$ that satisfy the full order constraints. In this case, an adaptive error estimator can be added by using the same estimator as the ODE method (which we call $E_D$) and adding the absolute size of the stochastic portions,

$$E_N = \left|\sum_{i=1}^{s}\left(\beta_i^{(1)} I_{(1)} + \beta_i^{(2)} \frac{I_{(1,0)}}{h}\right) g\left(t_n + c_i^{(1)} h\right)\right|, \qquad (30)$$

leading to the error estimator

$$E = \delta E_D + E_N. \qquad (31)$$

This can be shown similarly to the construction in [29]. Given the large literature on implicit RK methods for ODEs, this presents a large pool of possibly good methods, and heuristically one may expect these to do very well in the case of small noise. However, we note that there does not always exist such an extension.
Using the constraint-satisfaction algorithm FindInstance in Mathematica, we looked for extensions of the explicit first stage singly diagonally implicit RK (ESDIRK) method TRBDF2 [15] and could not find values satisfying the constraints. In addition, we could not find values for an extension of the 5th order Radau IIA method [13, 12] which satisfy the constraints. In fact, our computational search could not find any extension of a 3-stage L-stable implicit RK method which satisfies the constraints. However, the 4-stage 3rd order ODE method due to Kennedy and Carpenter [18] can be extended with stochastic coefficients $\beta^{(1)}$, $\beta^{(2)}$, $c^{(1)}$, and $B^{(0)}$ satisfying the order conditions; the exact values for $B_{2,1}^{(0)}$ and $B_{4,1}^{(0)}$ are shown in Appendix B.1. (E)SDIRK methods are particularly interesting because these methods can be solved using a single factorization of the iteration matrix $I - \gamma h J$, where $J$ is the Jacobian. Additionally, explicit handling of the noise term is similar to the implicit-explicit (IMEX) form for additive Runge-Kutta methods in that it occurs by adding a single constant term to the Newton iterations in each stage, meaning it does not significantly increase the computational cost. The chosen ESDIRK method has a complementary explicit tableau to form an IMEX additive Runge-Kutta method, and the chosen values for the stochastic portions are simultaneously compatible with the order conditions for this tableau. In Section 6.1 we numerically investigate the order of the IMEX extension of the method and show that it matches the convergence of the other SRA methods on the test equation. One thing to note is that since the problem has additive noise, the method is never implicit in the dependent variables in the noise part, so in theory this can also be extended with an implicit $B^{(1)}$ as well (with convergence concerns due to the non-finite inverse moments of the normal distribution [20]).
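The claim that (E)SDIRK stages can be solved with a single factorization of $I - \gamma h J$ can be sketched on a scalar problem, where each stage solves a residual of the form $z_i - h f(t, \gamma z_i + \alpha_i) = 0$ by Newton iteration. The function and argument names are ours; a real implementation would reuse one LU factorization across all stages rather than forming the scalar quotient below.

```python
def solve_stage(f, dfdx, t, alpha_i, gamma, h, z0=0.0, tol=1e-12, maxiter=25):
    """Simplified Newton iteration for the stage residual
    G(z) = z - h*f(t, gamma*z + alpha_i) = 0 (scalar version).
    z0 = 0 is the 'stochastic minimal residual' predictor from the text;
    passing the previous stage's z gives the trivial predictor."""
    z = z0
    for _ in range(maxiter):
        x = gamma * z + alpha_i
        G = z - h * f(t, x)
        J = 1.0 - gamma * h * dfdx(t, x)   # iteration matrix I - gamma*h*J
        dz = -G / J
        z += dz
        if abs(dz) < tol:
            break
    return z

# Linear check: f = lam*x has the closed form z = h*lam*alpha_i/(1 - gamma*h*lam).
lam, gamma, h, alpha_i = -50.0, 0.25, 0.1, 1.0
z = solve_stage(lambda t, x: lam * x, lambda t, x: lam, 0.0, alpha_i, gamma, h)
```

For a linear drift the Newton iteration converges in a single step, which is why stiff linear test problems are a useful smoke test for the stage solver.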
One has to be careful with the implementation to avoid accumulation of floating point error for highly stiff equations. For our implementation, we used a method similar to that described in [15]. The implicit stages were defined in terms of

$$z_i = h f\left(t_n + c_i^{(0)} h, H_i^{(0)}\right), \qquad (33)$$

and thus the stages become

$$H_i^{(0)} = \gamma z_i + X_n + \sum_{j=1}^{i-1} A_{ij}^{(0)} f\left(t_n + c_j^{(0)} h, H_j^{(0)}\right) h + \sum_{j=1}^{i-1} B_{ij}^{(0)} g\left(t_n + c_j^{(1)} h\right) \frac{I_{(1,0)}}{h} = \gamma z_i + \alpha_i, \qquad (34)$$

where $X_n$ is the previous step. This gives the implicit system for the residual,

$$G\left(z_i\right) = z_i - h f\left(t_n + c_i^{(0)} h, \gamma z_i + \alpha_i\right),$$

whose Newton iterations use the iteration matrix $I - \gamma h J$, where $J$ is the Jacobian of $f$ and thus is the same for each stage. For choosing the values to start the Newton iterations, also known as stage prediction, we tested two methods. The first is the trivial stage predictor $z_i = z_j$ for the nearest $j$ such that $j < i$ and $c_j < c_i$, i.e. using the closest derivative estimate. The other method that we tested is what we denote the stochastic minimal residual estimate, given by $H_i^{(0)} = \alpha_i$, or equivalently $z_i = 0$. This method takes into account the stochastic bursts at a given step and thus demonstrated much better stability. We note that these methods also apply to solving SDEs with mass matrices of the form:
$$M\,dX_t = f(t, X_t)\,dt + M g(t, X_t)\,dW_t.$$

The derivation of the method is the same, except in this case we receive the implicit system

$$G\left(z_i\right) = M z_i - h f\left(t_n + c_i^{(0)} h, \gamma z_i + \alpha_i\right),$$

which has the iteration matrix $M - \gamma h J$. Like in the ODE case, these implicit methods can thus solve DAEs in mass-matrix form (the case where $M$ is singular), though we leave discussion of convergence for future research. One interesting property to note is that a zero row in the mass matrix corresponds to a constraint equation which is only dependent on the output of $f$, since the multiplication of $g$ by $M$ is zero in that same corresponding row. Thus when a singular mass matrix is applied to the noise equation, the corresponding constraints are purely deterministic relations. So while this is a constrained form, properties like conservation of energy in physical models can still be placed on the solution using this mass-matrix formulation.

Given the efficiency of the methods for additive noise, one method for developing efficient methods for more general noise processes is to transform diagonal noise processes into additive noise. This transform is due to Lamperti [27], which states that the SDE of the form

$$dX_t = f(t, X_t)\,dt + \sigma(t, X_t) R(t)\,dW_t, \qquad (35)$$

where $\sigma$ is diagonal with positive entries $\sigma_i\left(t, X_{i,t}\right)$, has the transformation

$$Z_{i,t} = \psi_i\left(t, X_{i,t}\right) = \left.\int \frac{1}{\sigma_i(t, x)}\,dx\right|_{x = X_{i,t}}, \qquad (36)$$

which results in an Ito process with the $i$th element given by

$$dZ_{i,t} = \left(\left.\frac{\partial}{\partial t}\psi_i(t, x)\right|_{x = \psi_i^{-1}\left(t, Z_{i,t}\right)} + \frac{f_i\left(t, \psi^{-1}\left(t, Z_t\right)\right)}{\sigma_i\left(t, \psi_i^{-1}\left(t, Z_{i,t}\right)\right)} - \frac{1}{2}\frac{\partial}{\partial x}\sigma_i\left(t, \psi_i^{-1}\left(t, Z_{i,t}\right)\right)\right)dt + \sum_{j=1}^{n} r_{ij}(t)\,dw_{j,t}, \qquad (37)$$

with

$$X_t = \psi^{-1}\left(t, Z_t\right). \qquad (39)$$

This is easily verified using Ito's lemma. In the case of mixed multiplicative and additive noise (affine noise), the vector equation

$$dX_t = f(t, X_t)\,dt + \left(\sigma_M X_t + \sigma_A\right)dW_t \qquad (40)$$

with $\sigma_M > 0$, $\sigma_A >$
0, the transform becomes element-wise in the system, so we can consider the one-dimensional case. Since

$$\psi(t, X_t) = \left.\int \frac{1}{\sigma_M x + \sigma_A}\,dx\right|_{x = X_t} = \frac{\log\left(\sigma_M X_t + \sigma_A\right)}{\sigma_M},$$

we have

$$X_t = \frac{\exp\left(\sigma_M Z_t\right) - \sigma_A}{\sigma_M}$$

and

$$dZ_t = \tilde{f}(t, X_t)\,dt + dW_t, \qquad \tilde{f}(t, X_t) = \frac{f(t, X_t)}{\sigma_M X_t + \sigma_A} - \frac{\sigma_M}{2}, \qquad (41)$$

provided $\sigma_M X_t$ is guaranteed to be sufficiently different from $-\sigma_A$ to not cause definitional issues. It is common in biological models like chemical reaction networks that $X_t \geq 0$, in which case this is well-defined for any $\sigma_A > 0$ and $\sigma_M > 0$. In practice one solves for $Z_t$, which is the transformed $X_t$ if $\sigma_M \neq 0$ and is the same as $X_t$ otherwise (assuming the parameters are positive). When doing so, references to $X_{i,t}$ must be changed into $\frac{\exp\left(\sigma_M Z_{i,t}\right) - \sigma_A}{\sigma_M}$. For example, the affine noise Lotka-Volterra SDE

$$dx = (ax - bxy)\,dt + \left(\sigma_M x + \sigma_A\right)dW_t,$$
$$dy = (-cy + dxy)\,dt + \sigma_{\tilde{A}}\,dW_t$$

only has affine noise on the first term, so this transforms to

$$x = \frac{\exp\left(\sigma_M z\right) - \sigma_A}{\sigma_M},$$
$$dz = \left(\frac{ax - bxy}{\sigma_M x + \sigma_A} - \frac{\sigma_M}{2}\right)dt + dW_t,$$
$$dy = (-cy + dxy)\,dt + \sigma_{\tilde{A}}\,dW_t,$$

along with the change to the initial condition, and can thus be solved with the SRA methods. We note a word of caution that the above transformation only holds when $\sigma_A > 0$. When $\sigma_A = 0$, the transformation is different, with $X_t = \exp\left(\sigma_M Z_t\right)$ (instead of $\frac{\exp\left(\sigma_M Z_t\right)}{\sigma_M}$, which one would get by taking $\sigma_A = 0$ in the affine formula). Instead of performing the transformations directly on the functions themselves, we can modify the SRA algorithm to handle this case as

$$X_{n+1} = \psi^{-1}\left(\psi\left(X_n\right) + \sum_{i=1}^{s}\alpha_i \tilde{f}\left(t_n + c_i^{(0)} h, H_i^{(0)}\right)h + \sum_{i=1}^{s}\left(\beta_i^{(1)} I_{(1)} + \beta_i^{(2)} \frac{I_{(1,0)}}{h}\right)\tilde{g}\left(t_n + c_i^{(1)} h\right)\right) \qquad (42)$$

with stages

$$H_i^{(0)} = \psi\left(X_n\right) + \sum_{j=1}^{s} A_{ij}^{(0)} \tilde{f}\left(t_n + c_j^{(0)} h, H_j^{(0)}\right)h + \sum_{j=1}^{s} B_{ij}^{(0)} \tilde{g}\left(t_n + c_j^{(1)} h\right)\frac{I_{(1,0)}}{h}, \qquad (43)$$

where $\psi$ is the element-wise function

$$\psi_i(x) = \begin{cases} \dfrac{\log\left(\sigma_{i,M} x + \sigma_{i,A}\right)}{\sigma_{i,M}} & \sigma_{i,M} > 0,\ \sigma_{i,A} > 0, \\[4pt] \dfrac{\log(x)}{\sigma_{i,M}} & \sigma_{i,M} > 0,\ \sigma_{i,A} = 0, \\[4pt] x & \text{otherwise}, \end{cases} \qquad \psi_i^{-1}(z) = \begin{cases} \dfrac{\exp\left(\sigma_{i,M} z\right) - \sigma_{i,A}}{\sigma_{i,M}} & \sigma_{i,M} > 0,\ \sigma_{i,A} > 0, \\[4pt] \exp\left(\sigma_{i,M} z\right) & \sigma_{i,M} > 0,\ \sigma_{i,A} = 0, \\[4pt] z & \text{otherwise}, \end{cases}$$

and

$$\tilde{g}_i(t) = \begin{cases} 1 & \sigma_{i,M} > 0, \\ \sigma_{i,A} & \text{otherwise}. \end{cases}$$

This can be summarized as performing all internal operations in $Z$-space (where the equation is additive) but saving each step in $X$-space.

Optimized-Stability Order 1.5 SRK Methods with Diagonal Noise
For diagonal noise, we use the mean-square definition of stability [20]. A method is mean-square stable iflim n →∞ E (cid:16) | X n | (cid:17) = 0 on the test equation dX t = µX t dt + σX t dW t . (44)In matrix form we can re-write our method as given by X n +1 = X n + µh (cid:16) α · H (0) (cid:17) + σI (1) (cid:16) β (1) · H (1) (cid:17) + σ I (1 , √ h (cid:16) β (2) · H (1) (cid:17) (45)+ σ I (1 , h (cid:16) β (3) · H (1) (cid:17) + σ I (1 , , h (cid:16) β (4) · H (1) (cid:17) (46)with stages H (0) = X n + µ ∆ tA (0) H (0) + σ I (1 , h B (0) H (1) , (47) H (1) = X n + µ ∆ tA (1) H (0) + σ √ ∆ tB (1) H (1) where ˆ X n is the size s constant vector of X n . H (0) = (cid:16) I − hA (0) (cid:17) − (cid:18) ˆ X n + σ I (1 , h B (0) H (1) (cid:19) , (48) H (1) = (cid:16) I − σ √ hB (1) (cid:17) − (cid:16) ˆ X n + µhA (1) H (0) (cid:17) By the derivation in the appendix, we receive the equation S = E U n +1 U n = { µht α · (cid:32) I − µ ∆ tA (0) − µσI (1 , A (1) B (0) (cid:18) I − σ √ hB (1) (cid:19) − (cid:33) − (cid:32) I + σ I (1 , h B (0) (cid:18) I − σ √ hB (1) (cid:19) − (cid:33) (49)+ σI (1) β (1) · (cid:32) I − σ √ hB (1) − µhA (1) (cid:18) I − µhA (0) (cid:19) − σ I (1 , h B (0) (cid:33) − (cid:32) I + µhA (1) (cid:18) I − µhA (0) (cid:19) − (cid:33) + σ I (1 , √ h β (2) · (cid:32) I − σ √ hB (1) − µhA (1) (cid:18) I − µhA (0) (cid:19) − σ I (1 , h B (0) (cid:33) − (cid:32) I + µhA (1) (cid:18) I − µhA (0) (cid:19) − (cid:33) + σ I (1 , h β (3) · (cid:32) I − σ √ hB (1) − µhA (1) (cid:18) I − µhA (0) (cid:19) − σ I (1 , h B (0) (cid:33) − (cid:32) I + µhA (1) (cid:18) I − µhA (0) (cid:19) − (cid:33) + σ I (1 , , h β (4) · (cid:32) I − σ √ hB (1) − µhA (1) (cid:18) I − µhA (0) (cid:19) − σ I (1 , h B (0) (cid:33) − (cid:32) I + µhA (1) (cid:18) I − µhA (0) (cid:19) − (cid:33) } We apply the substitutions from the Appendix and let z = µh, (50) w = σ √ h. In this space, z is the stability variable for the drift term and w is the stability in the diffusion term. 
Under this scaling (h, √h), the equation becomes independent of h and is thus a function S(z, w) of the coefficients of the SRK method, where mean-square stability is achieved when |S(z, w)| < 1. The equation S(z, w) in terms of its coefficients for explicit methods (A^(i) and B^(i) lower diagonal) has millions of terms and is shown in the supplemental Mathematica notebook. Determination of the stability equation for the implicit methods was found to be computationally intractable and is an avenue for further research.

An Optimization Problem for Determination of Coefficients

We wish to determine the coefficients for the diagonal SRK methods which optimize the stability. To do so, we generate an optimization problem which we can numerically solve for the coefficients. To simplify the problem, we let z, w ∈ R. Define the function

f(z, w; N, M) = ∫_{−M}^{M} ∫_{−N}^{0} χ_{|S(z,w)|≤1}(z, w) dz dw. (51)

Notice that for N, M → ∞, f is the area of the stability region. Thus we define the stability-optimized diagonal SRK method as the set of coefficients which achieves

max_{A^(i), B^(i), β^(i), α} f(z, w) (52)
subject to: Order Constraints.

However, as with the SRK methods for additive noise, we impose a few extra constraints to add robustness to the error estimator. In all cases we impose

0 < c^(0)_i, c^(1)_i ≤ 1, c^(0)_4 = c^(1)_4 = 1,

which we call the End-C Constraint. Lastly, we can prescribe the ordering constraint

c^(j)_1 < c^(j)_2 < c^(j)_3 < c^(j)_4,

which we denote as the Inequality-C Constraint. The resulting problem is a nonlinear programming problem with 44 variables and 42-48 constraint equations. The objective function is the two-dimensional integral of a discontinuous function which is determined by a polynomial in z and w with approximately 3 million coefficients. To numerically approximate this function, we calculated the characteristic function on a grid with even spacing dx using a CUDA kernel and found numerical solutions to the optimization problem using the JuMP framework [9] with the NLopt backend [17].
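The grid approximation of the objective (51) can be sketched in NumPy as follows. This is only a stand-in for the CUDA kernel described above; since the SRK S(z, w) is too large to reproduce here, the sketch uses Euler-Maruyama's mean-square stability function, whose region is a unit disk of area π:

```python
import numpy as np

def stability_area(S, N, M, dx=0.005):
    # f(z, w; N, M): area of {(z, w) in [-N, 0] x [-M, M] : |S(z, w)| <= 1},
    # approximated by counting grid cells where the characteristic
    # function of the stability region is one.
    z = np.arange(-N, 0.0, dx)
    w = np.arange(-M, M, dx)
    Z, W = np.meshgrid(z, w, indexing="ij")
    return np.count_nonzero(np.abs(S(Z, W)) <= 1.0) * dx * dx

# Euler-Maruyama's mean-square region {(1+z)^2 + w^2 <= 1} is the unit
# disk centered at z = -1, so for N >= 2, M >= 1 the area tends to pi.
em = lambda z, w: (1.0 + z) ** 2 + w ** 2
area = stability_area(em, N=4.0, M=2.0)
print(area)    # ≈ 3.14
```

Maximizing this area over method coefficients, subject to the order constraints, is precisely the nonlinear program described above.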
A mixed approach using many solutions of the semi-local optimizer LN_AUGLAG_EQ [7, 3] and fewer solutions from the global optimizer GN_ISRES [33] was used to approximate the optimality of solutions. The optimization was run many times in parallel until many results produced methods with similar optimality, indicating that we likely obtained values near the true minimum.

The parameters N and M are the bounds on the stability region and also represent a trade-off between the stability in the drift and the stability in the diffusion. A method which is optimized when M is small would be highly stable in the case of small noise, but would not be guaranteed to have good stability properties in the presence of large noise. Thus these parameters are knobs for tuning the algorithms for specific situations, and we solved the problem for different combinations of N and M to determine algorithms for the different cases.

The coefficients generated for approximately-optimal methods fall into three categories. In one category we have the drift-dominated stability methods, where large N and small M was optimized. On the other end we have the diffusion-dominated stability methods, where large M and small N was optimized. Then we have the mixed stability methods, which used mixed size choices for N and M. As a baseline, we optimized the objective without constraints on the c_i to see what the "best possible method" would be. When this was done with large N and M, the resulting method, which we name SOSRI, has almost every value of c satisfy the constraints, but with c^(0)_2 ≈ −0.04 and c^(0)_4 ≈ 0.75. To see if we could produce methods which were more diffusion-stable, we decreased N to optimize more in w, but failed to produce methods with substantially enlarged diffusion-stability over SOSRI.

Adding only the inequality constraints on the c_i and looking for methods for drift-dominated stability, we failed to produce methods whose c_i estimators adequately covered the interval. Some of the results did produce stability regions similar to SOSRI but with c^(0)_i < 0. With the full constraints on the c_i, one method, which we label SOSRI2, resulted in similar stability to SOSRI while satisfying the c_i constraints. In addition, this method satisfies c^(0)_3 = c^(0)_4 = 1 and c^(1)_3 = c^(1)_4 = 1, a property whose use will be explained in Section 5.4. The stability regions for these methods are shown in Figure 4.

Figure 4:
SOSRI Stability Regions. The stability regions (|S(z, w)| ≤ 1) in the (z, w)-plane. (A)
SRIW1. (C)
SRIW2. (D)
SOSRI. (E)
SOSRI2.

To look for more diffusion-stable methods, we dropped to N = 6 to encourage the methods to expand the stability in the w-plane. However, we could not find a method whose stability region went substantially beyond 2 in w. This was further decreased to N = 1, where methods still could not go substantially beyond |w| ≈ 2. Thus we were not able to obtain methods optimized for the diffusion-dominated case. This hard barrier was hit under many different constraint and objective setups and under thousands of optimization runs, indicating there might be a diffusion-stability barrier for explicit methods.

In many real-world cases, one may not be able to clearly identify a model as drift-stability bound or diffusion-stability bound, or determine whether the equation is stiff or non-stiff. In fact, many models may switch between such extremes. An example is a model with stochastic switching between different steady states. In such a model, near a steady state the drift term f(t, X_ss) ≈ 0, while during a switching event f can be very large and stiff, causing the integration to be heavily drift-stability dominated. Since these switches are random, the ability to adapt between these two behaviors could be key to achieving optimal performance. Given the trade-off, we investigated how our methods allow for switching between methods which optimize for the different situations.

The basis for our method is an extension of a method proposed for deterministic differential equations [35, 36, 12]. The idea is to create a cheap approximation to the dominant eigenvalues of the Jacobians for the drift and diffusion terms. If v is the eigenvector of the respective Jacobian, then for ‖v‖ sufficiently small,

|λ_D| ≈ ‖f(t, x + v) − f(t, x)‖ / ‖v‖, (53)
|λ_N| ≈ ‖g(t, x + v) − g(t, x)‖ / ‖v‖, (54)

where |λ_D| and |λ_N| are the estimates of the dominant eigenvalues for the deterministic and noise functions respectively.
We have that H^(k)_i is an approximation to X at t + c^(k)_i h, and thus for two stages at the same time-point, c^(k)_i = c^(k)_j, the following serves as a local Jacobian estimate:

|λ_D| ≈ ‖f(t + c^(0)_i h, H^(0)_i) − f(t + c^(0)_j h, H^(0)_j)‖ / ‖H^(0)_i − H^(0)_j‖, (55)
|λ_N| ≈ ‖g(t + c^(1)_i h, H^(1)_i) − g(t + c^(1)_j h, H^(1)_j)‖ / ‖H^(1)_i − H^(1)_j‖. (56)

If we had already computed a successful step, we would like to know if in the next calculation we should switch methods due to stability. Thus it makes sense to approximate the Jacobian at the end of the interval, meaning i = s and j = s − 1 where s is the number of stages. Then if z_min is the minimum z ∈ R such that z is in the stability region for the method, the step is drift-stability bound when h|λ_D| > |z_min|; similarly, letting w_min be the maximum of the stability region in the noise axis, the step is diffusion-stability bound when h|λ_N| > w_min.

Hairer noted that, for ODEs, if a RK method has c_i = c_j = 1, then

ρ = ‖k_i − k_j‖ / ‖g_i − g_j‖, (57)

where k_i = f(t + c_i h, g_i), is an estimate of the eigenvalues for the Jacobian of f. Given the construction of SOSRI2, a natural extension is

|λ_D| ≈ ‖f(t_n + c^(0)_4 h, H^(0)_4) − f(t_n + c^(0)_3 h, H^(0)_3)‖ / ‖H^(0)_4 − H^(0)_3‖, (58)
|λ_N| ≈ ‖g(t_n + c^(1)_4 h, H^(1)_4) − g(t_n + c^(1)_3 h, H^(1)_3)‖ / ‖H^(1)_4 − H^(1)_3‖. (59)

Given that these values are all part of the actual step calculations, this stiffness estimate is essentially free. By comparing these values to the stability plot in Figure 2, we use the following heuristic to decide if SOSRI2 is stability-bound in its steps:

1. If 10 > h|λ_D| > 0.5, then we check if h|λ_N| > ω.
2. If h|λ_D| < 0.5, then we check if h|λ_N| divided by the width of the stability region in w exceeds ω.

The denominator is chosen as a reasonable box approximation to the edge of the stability region. ω is a safety factor: in theory ω is 1 since we divided by the edge of the stability region, but in practice this is only an eigenvalue estimate and thus ω allows for a trade-off between the false positive and false negative rates. If either of those conditions is satisfied, then h is constrained by the stability region. The solver can thus alert the user that the problem is stiff or use this estimate to switch to a method more suitable for stiff equations. In addition, the error estimator gives separate error estimates in the drift and diffusion terms. A future scheme could combine these two facts to develop a more robust stiffness detection method and label the stiffness as either drift or diffusion dominated.

We end by noting that SOSRA2 has the same property, allowing stiffness detection via

|λ_D| ≈ ‖f(t_n + c^(0)_3 h, H^(0)_3) − f(t_n + c^(0)_2 h, H^(0)_2)‖ / ‖H^(0)_3 − H^(0)_2‖ (60)

and, employing a similar method as in the deterministic case, checking for stiffness when h|λ_D| divided by the stability bound exceeds ω. In addition, stiff solvers can measure the maximal eigenvalues directly from the Jacobian. Here we suggest the measure from Shampine [35, 36, 12] of using ‖J‖_∞ as a cheap upper bound. For semi-implicit methods like LSRA we only get a stability bound on the drift term, but this should be sufficient since, for additive noise, diffusive instability is not an issue.

In order to test the efficiency and correctness of the SRA algorithms, we chose to use the additive noise test Equation 65. Figure 5A demonstrates that the SOSRA and SKenCarp methods achieve strong order 2.0 on Equation 65. To test the convergence of the SRI algorithms, we used the linear test Equation 67. Figure 5B demonstrates that the SOSRI methods achieve strong order 1.5 on Equation 67.
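Such strong-order measurements can be reproduced in miniature. The sketch below estimates the strong order of Euler-Maruyama on geometric Brownian motion (chosen because its exact solution is available; this is not the paper's Equation 65 or 67), fitting the slope of the error against the step size:

```python
import numpy as np

def em_strong_errors(mu, sigma, T, dts, n_paths=2000, seed=1):
    # E|X_h(T) - X(T)| for Euler-Maruyama on dX = mu*X dt + sigma*X dW,
    # comparing against the exact solution driven by the same Brownian path.
    rng = np.random.default_rng(seed)
    errs = []
    for dt in dts:
        n = int(round(T / dt))
        dW = np.sqrt(dt) * rng.standard_normal((n_paths, n))
        X = np.ones(n_paths)
        for k in range(n):
            X = X + mu * X * dt + sigma * X * dW[:, k]
        exact = np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum(axis=1))
        errs.append(np.mean(np.abs(X - exact)))
    return np.array(errs)

dts = np.array([1/16, 1/32, 1/64, 1/128])
errs = em_strong_errors(mu=1.5, sigma=0.5, T=1.0, dts=dts)
order = np.polyfit(np.log(dts), np.log(errs), 1)[0]
print(order)    # near 0.5, the known strong order of Euler-Maruyama here
```

The same log-log fit, applied to a higher order method on a test equation with a known solution, is what underlies the reference-slope comparisons in Figure 5.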
Lastly, we tested the convergence of the IMEX version of the SKenCarp integrator. We defined the split SDE 69 as a modification of Equation 65 where one part of the drift is solved implicitly and the remaining part is solved explicitly. Figure 5C demonstrates that the IMEX SKenCarp method achieves strong order 2.0. Note that this does not demonstrate that the method always achieves strong order 1.5, since sufficient conditions for the IMEX pairing are unknown, but it gives numerical evidence that the method can be high order.

To test the efficiency, we first plotted work-precision [11, 38, 12] diagrams for the SOSRA, SOSRA2, and SKenCarp methods against the SRA1, SRA2, and SRA3 [32] methods and the fixed time step Euler-Maruyama method (Milstein is equivalent to Euler-Maruyama in this case [20]). We tested the error and timing on Equation 65. In addition, we tested using the Lotka-Volterra equation with additive noise, Equation 70. Since 70 does not have an analytical solution, a reference solution was computed using a low tolerance solution via SOSRA for each Brownian trajectory. The plots show that there is a minimal difference in efficiency between the SRA algorithms over the tested error range, while these algorithms are all significantly more efficient than the Euler-Maruyama method when the required error is small (Figure 6). The weak error work-precision diagrams show that when using between 100 to 10,000 trajectories, the weak error is less than the sample error in the regime where there is no discernible efficiency difference between the SRA methods. These results show that in the regime of mild accuracy on non-stiff equations, the SOSRA, SOSRA2, and SKenCarp methods are much more efficient than low order methods yet achieve the same efficiency as the non-stability-optimized SRA variants. Note that these results also show that the error estimator for adaptivity is highly conservative, generating solutions with around 2 orders of magnitude less error than the tolerance suggests.
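The IMEX pairing can be illustrated in minimal form with a semi-implicit Euler-Maruyama step for additive noise (an illustrative first-order analog, not the SKenCarp tableau; here a stiff linear drift A is assumed known and treated implicitly while the remainder f2 is explicit):

```python
import numpy as np

def imex_em_step(x, h, A, f2, sigma, dW):
    # One step for dX = (A X + f2(X)) dt + sigma dW: the stiff linear part
    # A X is implicit (backward Euler), the non-stiff f2 and noise explicit:
    #   X_{n+1} = (I - h A)^{-1} (X_n + h f2(X_n) + sigma dW).
    n = x.size
    return np.linalg.solve(np.eye(n) - h * A, x + h * f2(x) + sigma * dW)

A = np.array([[-1000.0]])            # stiff linear drift, |lambda| = 1000
f2 = np.sin                          # mild nonlinearity, handled explicitly
rng = np.random.default_rng(3)
h, sigma = 0.01, 0.1                 # h is 5x the explicit Euler bound 2/1000
x = np.array([1.0])
for _ in range(100):
    x = imex_em_step(x, h, A, f2, sigma, np.sqrt(h) * rng.standard_normal(1))
print(abs(x[0]))    # stays bounded despite the large step size
```

Only a linear solve is required per step, which is the design motivation for IMEX pairings on large discretizations where fully implicit Newton iterations would dominate the cost.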
To test how efficiently the algorithms could solve stiff equations, we chose to analyze the qualitative results of the driven Van der Pol equation, given by Equation 71.
Figure 5:
Additive Noise Convergence Tests.
The error is averaged over 1000 trajectories. Shown is the strong l2 error along the time series of the solution. (A) Convergence results on Equation 65. (B) Convergence results on Equation 67. (C) Convergence results on the IMEX Equation 69. Each test used a range of fixed time steps h.
Figure 6:
SOSRA Efficiency Tests.
The error was taken as the average of 10,000 trajectories for Equation 65 and 100 trajectories for the Lotka-Volterra Equation 70. The sample error was determined for the weak error as the normal 95% confidence interval for the mean, using the variance of the true solution Equation 66 or the variance of the estimated true solutions via low tolerance solutions. The time is the average time to compute a trajectory and is averaged over 1000 runs at the same tolerance or step size. (A) Shown are the work-precision plots for the methods on Equation 65. Each of the adaptive time-stepping methods solved the problem with tol = abstol = reltol swept in increments of 10. The fixed time-stepping methods used step sizes changing by factors of 5. The error is the strong l2 error computed over the time series. (B) Same setup as the previous plot but using the weak error at the final time-point. (C) Shown are the work-precision plots for the methods on Equation 70. Each of the adaptive time-stepping methods solved the problem with tol = abstol = reltol swept in increments of 4. The fixed time-stepping methods used step sizes changing by factors of 12. The error is the strong l2 error computed over the time series. (D) Same setup as the previous plot but using the weak error at the final time-point.

Algorithm         Run-time (seconds)   Relative Time (vs SKenCarp)
SKenCarp          37.23                1.0x
SOSRA             315.58               8.5x
SOSRA2            394.82               10.6x
SRA3              1385.66              37.2x
SRA1              3397.66              91.3x
Euler-Maruyama    5949.19              159.8x
DISTM (θ = )

Table 1: SRA Run times on Van der Pol with additive noise.
The additive noise Van der Pol equation was solved 100 times using the respective algorithms at the highest tolerance, changed by powers of two, which matched the low tolerance solution to plotting accuracy. The fixed time step methods had their ∆t determined as the largest ∆t in increments of powers of 2 that produced no unstable trajectories; this resulted in ∆t = 5e− for the θ-methods. Note that the total times of the drift-implicit stochastic θ-method and the Euler-Maruyama method were determined by extrapolating the time from a single stable trajectory on t ∈ [0, 1] due to time constraints. DISTM is the Drift-Implicit Stochastic θ-Method.

The parameter µ in Equation 71 is the driving factor: as µ increases, the equation becomes more stiff. µ = 10 is a common test for stiff ODE solvers [13], with lower values used to test the semi-stiff regime for ODEs. For our purposes, we chose µ = 10 as a semi-stiff test case. The ODE case, solved using the Tsit5 explicit Runge-Kutta algorithm [39, 30], demonstrates the mild stiffness which is still well-handled by explicit methods (Figure 7A). We extend this model to the driven Van der Pol model with additive noise, Equation 72, where ρ = 3 and dW^(1) and dW^(2) are independent Brownian motions. The solution to this model is interesting because it gives the same qualitative behavior, large bursts when x(t) crosses zero, but in this case the zero crossings are stochastic. Even at high tolerances, SOSRA is able to reproduce this qualitative behavior of the low tolerance solutions (Figure 7B), with SOSRA2 producing similar results at tolerances a factor of two lower. Given the conservativeness of the error estimators shown in previous (and other) tests, this case corresponds to roughly two decimal places of accuracy, which is more than sufficient for many phenomenological models. However, even at much lower tolerances, SRA3 was unable to reproduce the correct qualitative behavior (Figure 7C). Thus we decreased the tolerances by factors of 2 until it was able to reproduce the correct qualitative results (Figure 7D). This shows that the SOSRA methods are more reliable on models with transient stiffness. To test the impact on the run time of the algorithms, each of the algorithms was run 100 times with the tolerance setup that allows it to most efficiently generate correct qualitative results. The run times are shown in Table 1, which shows that SRA1 takes more than 10 times and SRA3 nearly 4 times as long as the SOSRA methods.
In this case the implicit method SKenCarp is the fastest, besting the SOSRA methods by more than 8x while achieving similar qualitative results. This shows that as stiffness comes into play, the SOSRA methods along with the implicit SKenCarp method are more robust and efficient. The fixed time step methods were far less efficient. Adaptive time stepping via rejection sampling was crucial to the success of the SKenCarp method because it required the ability to pull back to a smaller time step when Newton iterations diverged; otherwise the result was time estimates around 5x slower than SOSRA.
In addition to testing efficiency, we used this equation to test the stiffness detection in SOSRA2. Using a safety factor of ω = 5, we added only two lines of code to make the algorithm print out the times at which the algorithm predicts stiffness. The results on two trajectories were computed and are shown in Figure 8. The authors note that the stiffness detection algorithm is surprisingly robust without any tweaking and gives almost no false positives or false negatives on this test problem.
Figure 7:
Representative trajectories for solutions to the Van der Pol equations with additive noise.
Each of these trajectories is solved with the same underlying Brownian process. (A)
The solution to the ODE with the explicit Runge-Kutta method Tsit5. (B)
The solution to the SDE with tolerances abstol = 1, reltol = 1/ from SOSRA. (C) Solution to the SDE with tolerances abstol = 2−, reltol = 2− with SRA3. (D) Solution to the SDE with tolerances abstol = 2−, reltol = 2− with SRA3.

While this safety factor is set somewhat high in comparison to traditional ODE stiffness detection, we note that these algorithms were designed to efficiently handle mild stiffness, and thus we see it as a benefit that they only declare stiffness when it appears to be in the regime which is more suitable for implicit methods.

To test the efficiency, we plotted work-precision diagrams with SRIW1, SOSRI, SOSRI2, and the fixed time step Euler-Maruyama and Runge-Kutta Milstein schemes for Equation 67 and the multiplicative noise Lotka-Volterra Equation 73. As with Equation 70, Equation 73 does not have an analytical solution, so a reference solution was computed using a low tolerance solution via SOSRI for each Brownian trajectory. The results show that there is a minimal difference in efficiency between the SRI algorithms over the tested error range, while these algorithms are all significantly more efficient than the lower order algorithms when the required error is small (Figure 9A-D). The weak error work-precision diagrams show that when using between 100 to 10,000 trajectories, the weak error is less than the sample error in the regime where there is no discernible efficiency difference between the SRI methods. These results show that in the regime of mild accuracy on non-stiff equations, these methods are much more efficient than low order methods yet achieve the same efficiency as the non-stability-optimized SRI variants. Note that these results also show the conservativeness of the error estimators.
Figure 8:
Stiffness detection in the Van der Pol equations with additive noise Equation 72.
Two representative trajectories of Equation 72 are plotted. The green dots indicate time-points where the stiffness detection algorithm detected stiffness.
Figure 9:
SOSRI efficiency on non-stiff test equations.
The error was taken as the average of 10,000 trajectories for Equation 67 and 100 trajectories for the Lotka-Volterra Equation 73. The sample error was determined for the weak error as the normal 95% confidence interval for the mean, using the variance of the true solution Equation 68 or the variance of the estimated true solutions via low tolerance solutions. The time is the average time to compute a trajectory and is averaged over 1000 runs at the same tolerance or step size. (A)
Shown are the work-precision plots for the methods on Equation 67. Each of the adaptive time-stepping methods solved the problem with tol = abstol = reltol swept in increments of 10. The fixed time-stepping methods used step sizes changing by factors of 5. The error is the strong l2 error computed over the time series. (B) Same setup as the previous plot but using the weak error at the final time-point. (C)
Shown are the work-precision plots for the methods on the multiplicative noise Lotka-Volterra Equation 73. Each of the adaptive time-stepping methods solved the problem with tol = abstol = reltol swept in increments of 4. The fixed time-stepping methods used step sizes changing by factors of 12. The error is the strong l2 error computed over the time series. (D) Same setup as the previous plot but using the weak error at the final time-point.

Algorithm   Abstol   Reltol   Run-time (seconds)   Relative Time (vs SOSRI)

Table 2: SRI times for the EMT model on t ∈ [0, 1]. The equations were solved 10,000 times with the given tolerances to completion and the elapsed time was recorded. The fixed time step methods had their ∆t determined as the largest ∆t in increments of powers of 2 that produced no unstable trajectories, as shown in [29]. DISTM is the Drift-Implicit Stochastic θ-Method.

To test the real consequences of the enhanced stability, we use the Epithelial-Mesenchymal Transition (EMT) model of 20 pathwise stiff reaction equations introduced in [14], studied as a numerical test in [29], and written in Section A.7. In the previous work it was noted that t ∈ [0,
1] was a less stiff version of this model. Thus we first tested the speed at which the methods could solve for 10,000 trajectories with no failures due to numerical instabilities. The tolerances were tuned for each method by factors of 2, finding the largest values that were stable. Since SOSRI demonstrated that its stability is much higher than even SOSRI2, we show the effect of tolerance changes on SOSRI as well. The results show that at similar tolerances the SOSRI method takes nearly 5x less time than SRIW1 (Table 2). However, there is an upper bound on the tolerances before the adaptivity is no longer able to help keep the method stable. For SRIW1, this bound is much lower, causing it to run more than 15x slower than the fastest SOSRI setup. Interestingly, SOSRI2 required a higher tolerance than SRIW1 but was 3x faster than SRIW1's fastest setup. We note that SOSRI's highest relative tolerance is essentially requiring 4 digits of accuracy (in strong error) when considering the conservativeness of the error estimator, which is far beyond the accuracy necessary in many cases. Lastly, we note that the SOSRI method is able to solve for 10,000 stable trajectories more than 60x faster than any of the tested fixed time step methods.

We then timed the run time to solve 10 trajectories on t ∈ [0, 500]. Note that SOSRI is able to solve at the loosest tolerances, while the others require more stringent settings (especially in absolute tolerance, as there is a stiff reactant whose values travel close to zero). One interesting point to note is that at similar tolerances both SOSRI and SOSRI2 receive similar timings and both are over 6 times faster than the fastest SRIW1 tolerance setup. Both are nearly twice as fast as SRIW1 when matching tolerances as well. Given that the error estimators are generally around 2 orders of magnitude more conservative than the local error estimate, the low tolerance solutions are accurate enough for many phenomenological experiments and thus present a good speedup over previous methods.
The timings for the Euler-Maruyama and Runge-Kutta Milstein schemes are omitted since the tests were unable to finish. From the results of [29] we note that the smallest dt for SRIW1 on the edge of its stability was approximately 10−. The stability region for fixed step-size Euler-Maruyama is strictly smaller than that of SRIW1 (Figure 4), which suggests that it would require around 5 × as many time steps (with Runge-Kutta Milstein being similar) to solve to t = 500. Extrapolating from the 170 seconds observed on our setup for the completed portion, this projects to approximately 5 years.

Algorithm   Abstol   Reltol   Run-time (seconds)   Relative Time (vs SOSRI)

Table 3:
SRI times for the EMT model on t ∈ [0, 500]. The equations were solved 10 times with the given tolerances to completion and the elapsed time was recorded. The fixed time step methods had their ∆t chosen by incrementing by 10− until 10 consecutive trajectories were stable; Drift-Implicit Euler-Maruyama (DIEM) and Drift-Implicit Runge-Kutta Milstein (DIRKM) used the ∆t so determined.

Algorithm   Abstol   Reltol   Run-time (seconds)   Relative Time (vs SOSRI)

Table 4:
SRI times for the retinoic acid SPDE model. The equations were solved twice with the given tolerances to completion and the elapsed time was recorded. The tolerances were chosen as the highest pair of tolerances which did not diverge (going up by powers of 10). In none of the cases did the two timings vary by more than 1% of the total run time. Euler-Maruyama used fixed time steps.

As another test, we applied the methods to a method of lines discretization of a stochastic partial differential equation (SPDE) describing the spatial regulation of the zebrafish hindbrain via retinoic acid signaling (Section A.8) [31]. The discretization results in a system of 6 ×
100 SDEs. Starting from an initial zero state, a concentration gradient emerges over time. The higher order adaptive methods take more f calls per step but can take far larger steps, resulting in a more efficient solution when high accuracy is not necessary. We note that the drift-implicit stochastic θ-method and drift-implicit θ Runge-Kutta Milstein methods were too inefficient to estimate, since their time steps were constrained to be near that of the Euler-Maruyama method due to divergence of the Newton iterations. This SPDE could also be solved via SKenCarp by using the transformation of Section 4, but from experiments on the PDE we note that efficient solution of the implicit equations would require a preconditioned Krylov method due to the size of the system, and thus it is left for future investigation.
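A method-of-lines SPDE discretization of this flavor can be sketched generically. The example below uses a toy scalar reaction-diffusion equation with additive noise and arbitrary parameters, not the retinoic acid model itself:

```python
import numpy as np

def make_mol_sde(n=100, L=1.0, D=1e-3, k=1.0, sigma=0.01):
    # Method-of-lines discretization of the toy scalar SPDE
    #   du = (D u_xx - k u + 1) dt + sigma dW(t, x)
    # on n grid points with zero-flux boundaries, returning drift f and
    # additive diffusion g for the resulting system of n SDEs.
    dx = L / (n - 1)
    def f(t, u):
        lap = np.empty_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        lap[0] = 2.0 * (u[1] - u[0]) / dx**2      # Neumann boundaries
        lap[-1] = 2.0 * (u[-2] - u[-1]) / dx**2
        return D * lap - k * u + 1.0
    g = lambda t, u: sigma * np.ones_like(u)      # additive noise
    return f, g

f, g = make_mol_sde()
u = np.zeros(100)                                  # initial zero state
rng = np.random.default_rng(0)
h = 1e-3
for _ in range(1000):                              # Euler-Maruyama to t = 1
    u += f(0.0, u) * h + np.sqrt(h) * g(0.0, u) * rng.standard_normal(u.size)
print(u.mean())    # rising toward the steady state 1/k = 1
```

Because the noise is additive, the resulting SDE system is exactly the class handled by the SRA/SKenCarp integrators, and the stiff linear Laplacian term is what makes implicit or stability-optimized treatment attractive as the grid is refined.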
In this work we derived stability-optimized SRK methods for additive and diagonal noise equations, and used a transformation to allow the additive noise methods to solve affine noise problems. Many other equations can be reduced to the additive noise case as well using the same means. Importantly, our derivation methods utilized heavy computational tools in order to approximately optimize otherwise intractable equations. This same method of derivation can easily be scaled up to higher orders, and by incorporating the coefficients for the higher order conditions, efficiency can be optimized as well by adding the norm of the principal error coefficients to the optimization function. The majority of the search was performed using global optimizers running massively in parallel with a hand-optimized CUDA kernel for the numerical integral of the characteristic function, replacing man-hours with core-hours and effectively optimizing the method. The clear next steps are to find SRA and SRI methods with minimal error estimates and sensible stability regions for the cases in which lower strong error matters, along with similar optimizations on SRK methods developed for small noise problems. We note that high strong order methods were investigated because of their better trajectory-wise convergence, allowing for a more robust solution and error estimation, since our application to transiently pathwise stiff equations requires such properties.

In this work we also derived L-stable methods for additive (and thus multiplicative and affine) noise equations, and computationally could not find an A-B-L stable method. While our method does not prove that no 2-stage A-B-L method exists, we have at least narrowed down its possibility. Additionally, an extension of a well-known ESDIRK method to additive noise was developed. These ESDIRK methods have an extension which allows for mass matrices in the problem formulation.
Using singular mass matrices, these methods also present themselves as integrators for a form of SDAEs with deterministic constraints. This method has an implicit-explicit (IMEX) extension, and the stochastic extension was compatible with both tableaus. We showed that this IMEX version of the method could numerically converge at order 2.0 on a test problem (matching the other SRA methods), indicating that it may satisfy the sufficient conditions. As an adaptive high order IMEX method, the ODE version of the method is a common choice for large discretizations of PDEs. Thus this method could present itself as a potentially automatic and efficient option for discretizations of large affine noise SPDEs, being able to use a low number of time steps while minimizing the amount of work required to solve the implicit equation. We note that adaptivity along with efficient stage predictors was required to be more efficient than the common stochastic theta methods, since divergence of quasi-Newton steps can be common if care is not taken. After engineering the method with all of the components together, the benchmark results showed large efficiency gains over both the previous drift-implicit and stability-optimized explicit methods. While previous literature questioned the applicability of L-stable integrators to stochastic differential equations due to high error in the slow variables [25], our computations show that this analysis may be misled by analyzing strong order 0.5 methods. With our higher strong order methods we see sufficiently accurate results on real stiff problems, and this is greatly helped by time stepping adaptivity.

The main caveat for our methods is the restrictions on the form of noise. While we have shown that an enlarged class of problems (affine noise) can be handled by the integrators for additive noise problems, this is still a very special case in the scope of possible SDEs.
Diagonal noise is a much expanded scope but is still constrained, and our implicit methods were only derived for the additive noise case. Further research should focus on the expansion of these techniques to high order adaptive ESDIRK diagonal noise integrators. In addition, when g is non-zero, a "diagonal noise" problem over the complex plane does not have diagonal noise (due to the mixing of real and complex parts from complex multiplication, and reinterpretation as a 2n-dimensional real system). Thus these methods are not applicable to problems defined in the complex plane with complex Wiener processes. Development of similar integrators for commutative noise problems could allow for similar performance benefits on such problems and is a topic for future research.

Additionally, we were not able to sufficiently improve the stability along the noise axis with our explicit diagonal noise methods. However, this is likely due to explicitness in the noise term. Recent research has shown that step splittings which utilize a predicted step in the diffusion calculation can significantly improve the stability of a method [19, 42].
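At the Euler-Maruyama level, this splitting amounts to evaluating the diffusion at a drift-predicted state. The following is a minimal sketch of that idea only (the function and parameter names are illustrative; this is not the SRI scheme conjectured below):

```python
import numpy as np

def split_step_em(f, g, x0, T, N, rng):
    """Split-step Euler-Maruyama: the diffusion g is evaluated at a
    drift-predicted state instead of the current iterate X_n."""
    h = T / N
    x = x0
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(h))
        x_pred = x + f(x) * h              # deterministic predictor step
        x = x + f(x) * h + g(x_pred) * dW  # corrector: g at the predicted state
    return x

# Sanity check on the linear test problem dX = a*X dt + b*X dW,
# for which E[X_T] = x0 * exp(a*T):
a, b, x0, T = -2.0, 0.5, 1.0, 1.0
rng = np.random.default_rng(0)
sample = np.array([split_step_em(lambda x: a * x, lambda x: b * x, x0, T, 256, rng)
                   for _ in range(2000)])
print(sample.mean())  # close to exp(-2) ~ 0.135
```

The same predictor idea applied inside each SRK stage is the step splitting of [42].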
Given this, we conjecture that a form of predictor-correction, such as

X_{n+1} = X_n + Σ_{i=1}^s α_i f(t_n + c_i^(0) h, H_i^(0)) h + Σ_{i=1}^s (β_i^(1) I_(1) + β_i^(2) I_(1,1)/√h + β_i^(3) I_(1,0)/h + β_i^(4) I_(1,1,1)/h) g(t_n + c_i^(1) h, H_i^(1))  (61)

with stages

H̃_i^(1) = X_n + Σ_{j=1}^s A_ij^(1) f(t_n + c_j^(0) h, H_j^(0)) h + Σ_{j=1}^s B_ij^(1) g(t_n + c_j^(1) h, H̃_j^(1)) √h  (62)

H_i^(0) = X_n + Σ_{j=1}^s A_ij^(0) f(t_n + c_j^(0) h, H_j^(0)) h + Σ_{j=1}^s B_ij^(0) g(t_n + c_j^(1) h, H̃_j^(1)) I_(1,0)/h  (63)

H_i^(1) = X_n + Σ_{j=1}^s A_ij^(1) f(t_n + c_j^(0) h, H_j^(0)) h + Σ_{j=1}^s B_ij^(1) g(t_n + c_j^(1) h, H̃_j^(1)) √h  (64)

could improve the noise stability of the method while keeping explicitness and the same tableau. However, proper convergence and stability analysis would require significant effort.

Our timings show that the current high order SRK methods are stability-bound, and that when scientific studies are only looking for small amounts of accuracy in stochastic simulations, most of the computational effort is lost to generating more accurate than necessary solutions in order to satisfy stability constraints. For additive noise problems we were able to obtain solutions about 5x-30x faster, and for diagonal noise approximately 6x faster, than the current adaptive methods (SRA1, SRA3, SRIW1), while common methods like Euler-Maruyama and drift-implicit θ Runge-Kutta Milstein were in many cases hundreds of times slower or could not even finish. We have also shown that these methods are very robust even at high tolerances and have a tendency to produce the correct qualitative results on semi-stiff equations (via plots) even when the user-chosen accuracy is low.
Given that the required user input is minimal and the methods work over a large range of stiffness, we see these as very strong candidates for default general purpose solvers for problem-solving environments such as MATLAB and Julia, since they can easily and efficiently produce results which are sufficiently correct. Due to a choice in the optimization, the SOSRA and SOSRA2 methods are not as efficient at low tolerances as SRA3, so SRA3 should be used when high accuracy is necessary (on additive or affine noise problems). However, in many cases, like integrating to find steady distributions of bistable parameter regimes or generating trajectories of phenomenological models, this ability to quickly get a coarser estimate is valuable.

The stiffness detection in SDEs is a novel addition which we have demonstrated can act very robustly. It has a control parameter ω which can be used to control the false positive and false negative rate as needed. Note that stiff methods can achieve similar largest-eigenvalue estimates directly from the Jacobians of f (and g) given that the methods are implicit (or, in the case of Rosenbrock methods, the Jacobian must still be computed), and thus this can be paired with a stiff solver to allow for automatic switching between stiff and non-stiff solvers. Given that the cost for such stiffness checks is minimal and the demonstrated efficiency of the implicit methods on stiff equations, we are interested in future studies on the efficiency of such composite methods due to the stochastic nature of stiffness in SDEs.

A Appendix I: Test Equations
A.1 Additive Noise Test Equation

dX_t = (β/√(1+t) − X_t/(2(1+t))) dt + (αβ/√(1+t)) dW_t,  X_0 = 1/2,  (65)

where α and β are fixed constants, with true solution

X_t = X_0/√(1+t) + (β/√(1+t)) (t + αW_t).  (66)

A.2 Diagonal Noise Test Equation

dX_t = αX_t dt + βX_t dW_t,  X_0 = 1/2,  (67)

where α and β are fixed constants, with true solution

X_t = X_0 exp((α − β²/2) t + βW_t).  (68)

A.3 Split Additive Test Equation

dX_t = (f_1(t, X_t) + f_2(t, X_t)) dt + (αβ/√(1+t)) dW_t,  X_0 = 1/2,  (69)

where

f_1(t, X_t) = β/√(1+t),  f_2(t, X_t) = −X_t/(2(1+t)),

with true solution Equation 66.

A.4 Additive Noise Lotka-Volterra

dx = (ax − bxy) dt + σ_A dW_t^1
dy = (−cy + dxy) dt + σ_A dW_t^2  (70)

where a, b, c, d, and σ_A are fixed positive parameters (b = 1, c = 3).

A.5 Additive Noise Van Der Pol
The driven Van der Pol equation is

dy = μ((1 − x²)y − x) dt
dx = y dt  (71)

The additive noise variant is

dy = μ((1 − x²)y − x) dt + ρ dW_t^(1)
dx = y dt + ρ dW_t^(2)  (72)

A.6 Multiplicative Noise Lotka-Volterra

dx = (ax − bxy) dt + σ_A x dW_t^1
dy = (−cy + dxy) dt + σ_A y dW_t^2  (73)

with a, b, c, d as in Equation 70 and multiplicative noise amplitude σ_A.

A.7 Epithelial-Mesenchymal Transition Model
The Epithelial-Mesenchymal Transition (EMT) model is given by the following system of SDEs, which corresponds to a chemical reaction network modeled via mass-action kinetics with Hill functions for the feedbacks. This model was introduced in [14]. The system comprises equations for the transcripts [snail1] and [zeb], the proteins [SNAIL] and [ZEB], the microRNAs [miR34] and [miR200], the complexes [SR], [ZR_1] through [ZR_5], and [TR], the TGF-β components [tgf] and [TGF] together with an exogenous input [TGF0](t) switched on after a fixed onset time, and the downstream markers [Ecad], [Vim], [OVOL2], and [OVOL2p]; each equation combines Hill-type production terms and linear degradation terms as given in [14]. The ZEB/miR-200 complexes enter through the weighted sums

Σ_{i=1}^5 iC_i [ZR_i] = 5[ZR_1] + 20[ZR_2] + 30[ZR_3] + 20[ZR_4] + 5[ZR_5],

Σ_{i=1}^5 C_i [ZR_i] = 5[ZR_1] + 10[ZR_2] + 10[ZR_3] + 5[ZR_4] + [ZR_5],

where the C_i are binomial coefficients. The parameter values are given in Table 5.
A.8 Retinoic Acid SPDE Model

d[RA_out] = (β(x) + D Δ[RA_out] − b[RA_out] + c[RA_in]) dt + σ_RA_out dW_t^out

d[RA_in] = (b[RA_out] + δ[BP][RA−RAR] − (γ[BP] + η + α[RA−RAR]/(ω + [RA−RAR]) − c)[RA_in]) dt

d[RA−BP] = (γ[BP][RA_in] + λ[BP][RA−RAR] − (δ + ν[RAR])[RA−BP]) dt

d[RA−RAR] = (ν[RA−BP][RAR] − λ[BP][RA−RAR]) dt + σ_RA−RAR [RA−RAR] dW_t^RA−RAR

d[BP] = (a − λ[BP][RA−RAR] − γ[BP][RA_in] + (δ + ν[RAR])[RA−BP] − u[BP] + d[RA−RAR]/(e + [RA−RAR])) dt

d[RAR] = (ζ − ν[RA−BP][RAR] + λ[BP][RA−RAR] − r[RAR]) dt

where β(x) = β H(x − 40), with H the Heaviside step function and x = 40 the edge of retinoic acid production [31]. The space was chosen as a rectangular domain with Δx = Δy = 5. The boundary conditions were no-flux on every side except the right side, which had leaky boundary conditions with leakiness parameter kA.

B Appendix II: SKenCarp, SOSRA, and SOSRI Tableaus
All entries not listed are zero.

The parameters of the retinoic acid model (σ_RA_in, σ_RA−RAR, σ_RA_out, ω, u, γ, d, α, δ, e, β, η, a, c, r, ζ, ν, λ, and the diffusion constant D) take the values given in [31].

B.1 SKenCarp Exact Values

The stochastic extension of SKenCarp requires only two nonzero entries of B^(0), each given exactly as a difference of constants K_1, ..., K_4 that are defined by exact rational and radical expressions; all other entries satisfy B^(0)_{i,j} = 0.

B.2 SOSRA
The SOSRA tableau consists of the three-stage coefficients α_i, β^(1)_i, β^(2)_i, c^(0)_i, c^(1)_i, A^(0)_{i,j}, and B^(0)_{i,j}, specified numerically to full double precision.

B.3 SOSRA2

The SOSRA2 tableau consists of the same coefficient set, likewise specified to full double precision.

B.4 SOSRI
The SOSRI tableau consists of the four-stage coefficients α_i, β^(1)_i through β^(4)_i, c^(0)_i, c^(1)_i, A^(0)_{i,j}, A^(1)_{i,j}, B^(0)_{i,j}, and B^(1)_{i,j}, specified numerically to full double precision.

B.5 SOSRI2

The SOSRI2 tableau consists of the same coefficient set, likewise specified to full double precision.

C Appendix III: SRK Order Conditions
C.1 Order Conditions for Rößler-SRI Methods
The coefficients (A^(i), B^(i), β^(i), α) must satisfy the following order conditions to achieve order 0.5:

1. α^T e = 1
2. β^(1)T e = 1
3. β^(2)T e = 0
4. β^(3)T e = 0
5. β^(4)T e = 0

additionally, for order 1.0:

1. β^(1)T B^(1) e = 0
2. β^(2)T B^(1) e = 1
3. β^(3)T B^(1) e = 0
4. β^(4)T B^(1) e = 0

and lastly for order 1.5:

1. α^T A^(0) e = 1/2
2. α^T B^(0) e = 1
3. α^T (B^(0) e)² = 3/2
4. β^(1)T A^(1) e = 1
5. β^(2)T A^(1) e = 0
6. β^(3)T A^(1) e = −1
7. β^(4)T A^(1) e = 0
8. β^(1)T (B^(1) e)² = 1
9. β^(2)T (B^(1) e)² = 0
10. β^(3)T (B^(1) e)² = −1
11. β^(4)T (B^(1) e)² = 2
12. β^(1)T (B^(1) (B^(1) e)) = 0
13. β^(2)T (B^(1) (B^(1) e)) = 0
14. β^(3)T (B^(1) (B^(1) e)) = 0
15. β^(4)T (B^(1) (B^(1) e)) = 1
16. (1/2) β^(1)T (A^(1) (B^(0) e)) + (1/3) β^(3)T (A^(1) (B^(0) e)) = 0

where f and g are sufficiently smooth, c^(i) = A^(i) e, and e = (1, 1, 1, 1)^T [32].

C.2 Order Conditions for Rößler-SRA Methods
The coefficients (A^(0), B^(0), β^(i), α) must satisfy the conditions for order 1.0:

1. α^T e = 1
2. β^(1)T e = 1
3. β^(2)T e = 0

and the additional conditions for order 1.5:

1. α^T B^(0) e = 1
2. α^T A^(0) e = 1/2
3. α^T (B^(0) e)² = 3/2
4. β^(1)T c^(1) = 1
5. β^(2)T c^(1) = −1

where c^(0) = A^(0) e, with f sufficiently smooth and g ∈ C¹(I, ℝ^d) [32].

D Appendix IV: Derivation Details

For the linear test equation with drift μX and diffusion σX, the implicit stages satisfy

(I − μΔt A^(0)) H^(0) = U_n + σ (I_(1,0)/Δt) B^(0) H^(1),
(I − σ√Δt B^(1)) H^(1) = U_n + μΔt A^(1) H^(0).

Solving the second equation for H^(1) and substituting into the first,

(I − μΔt A^(0)) H^(0) = U_n + σ (I_(1,0)/Δt) B^(0) (I − σ√Δt B^(1))^(−1) (U_n + μΔt A^(1) H^(0)),

(I − μΔt A^(0) − μσ I_(1,0) B^(0) (I − σ√Δt B^(1))^(−1) A^(1)) H^(0) = (I + σ (I_(1,0)/Δt) B^(0) (I − σ√Δt B^(1))^(−1)) U_n,

H^(0) = (I − μΔt A^(0) − μσ I_(1,0) B^(0) (I − σ√Δt B^(1))^(−1) A^(1))^(−1) (I + σ (I_(1,0)/Δt) B^(0) (I − σ√Δt B^(1))^(−1)) U_n.

Likewise, solving the first equation for H^(0) and substituting into the second,

(I − σ√Δt B^(1)) H^(1) = U_n + μΔt A^(1) (I − μΔt A^(0))^(−1) (U_n + σ (I_(1,0)/Δt) B^(0) H^(1)),

(I − σ√Δt B^(1) − μσ I_(1,0) A^(1) (I − μΔt A^(0))^(−1) B^(0)) H^(1) = (I + μΔt A^(1) (I − μΔt A^(0))^(−1)) U_n,

H^(1) = (I − σ√Δt B^(1) − μσ I_(1,0) A^(1) (I − μΔt A^(0))^(−1) B^(0))^(−1) (I + μΔt A^(1) (I − μΔt A^(0))^(−1)) U_n.

The update is then

U_{n+1} = U_n + μΔt (α · H^(0)) + σ (I_(1) β^(1) + (I_(1,1)/√Δt) β^(2) + (I_(1,0)/Δt) β^(3) + (I_(1,1,1)/Δt) β^(4)) · H^(1),

with H^(0) and H^(1) as above. Thus we substitute in the Wiktorsson approximations

I_(i,i) = (1/2)((ΔW)² − h),
I_(i,i,i) = (1/6)((ΔW)³ − 3hΔW),
I_(i,0) = (1/2) h (ΔW + (1/√3) ΔZ),

where ΔZ ~ N(0, h) is independent of ΔW ~ N(0, h). By the properties of the normal distribution, we have that E[(ΔW)^n] = 0 for any odd n and

E[(ΔW)²] = h,  E[(ΔW)⁴] = 3h²,  E[(ΔW)⁶] = 15h³,  E[(ΔW)⁸] = 105h⁴,

and similarly for ΔZ.

Acknowledgments
We would like to thank the members of JuliaDiffEq, specifically David Widmann (@devmotion), Yingbo Ma (@YingboMa), and (@dextorious), for their contributions to the ecosystem. Their efforts have helped make the development of efficient implementations possible.
References [1]
A. Abdulle and S. Cirilli, S-ROCK: Chebyshev methods for stiff stochastic differential equations, SIAM Journal on Scientific Computing, 30 (2008), pp. 997–1014, https://doi.org/10.1137/070679375.
[2] D. I. Ketcheson and A. J. Ahmadia, Optimal stability polynomials for numerical integration of initial value problems, CAMCOS, 7 (2012), pp. 247–271, https://doi.org/10.2140/camcos.2012.7.247.
[3]
E. G. Birgin and J. M. Martínez, Improving ultimate convergence of an augmented Lagrangian method, Optimization Methods and Software, 23 (2008), pp. 177–195, https://doi.org/10.1080/10556780701577730.
[4] K. Burrage and J. C. Butcher, Non-linear stability of a general class of differential equation methods, BIT Numerical Mathematics, 20 (1980), pp. 185–203, https://doi.org/10.1007/BF01933191.
[5] J. C. Butcher, A history of Runge-Kutta methods, Applied Numerical Mathematics, 20 (1996), pp. 247–260, https://doi.org/10.1016/0168-9274(95)00108-5.
[6] J. C. Butcher, Numerical methods for ordinary differential equations in the 20th century, Journal of Computational and Applied Mathematics, 125 (2000), pp. 1–29, https://doi.org/10.1016/S0377-0427(00)00455-6.
[7] A. Conn, N. Gould, and P. Toint, A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds, SIAM Journal on Numerical Analysis, 28 (1991), pp. 545–572, https://doi.org/10.1137/0728030.
[8] J. R. Dormand and P. J. Prince, A family of embedded Runge-Kutta formulae, Journal of Computational and Applied Mathematics, 6 (1980), pp. 19–26, https://doi.org/10.1016/0771-050X(80)90013-3.
[9] I. Dunning, J. Huchette, and M. Lubin, JuMP: A modeling language for mathematical optimization, SIAM Review, 59 (2017), pp. 295–320, https://doi.org/10.1137/15M1020575.
[10] H. Gilsing and T. Shardlow, SDELab: A package for solving stochastic differential equations in MATLAB, Journal of Computational and Applied Mathematics, 205 (2007), pp. 1002–1018, https://doi.org/10.1016/j.cam.2006.05.037.
[11] E. Hairer, S. P. Nørsett, and G.
Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, Springer Series in Computational Mathematics, Springer, Heidelberg; London, 2nd rev. ed., 2009.
[12]
E. Hairer and G. Wanner , Solving Ordinary Differential Equations II - Stiff and Differential-Algebraic Problems , Springer, 1991.[13]
E. Hairer and G. Wanner, Stiff differential equations solved by Radau methods, Journal of Computational and Applied Mathematics, 111 (1999), pp. 93–111, https://doi.org/10.1016/S0377-0427(99)00134-X.
[14] T. Hong, K. Watanabe, C. H. Ta, A. Villarreal-Ponce, Q. Nie, and X. Dai, An Ovol2-Zeb1 mutual inhibitory circuit governs bidirectional and multi-step transition between epithelial and mesenchymal states, PLoS Comput Biol, 11 (2015), p. e1004569, https://doi.org/10.1371/journal.pcbi.1004569.
[15] M. E. Hosea and L. F. Shampine, Analysis and implementation of TR-BDF2, Applied Numerical Mathematics, 20 (1996), pp. 21–37, https://doi.org/10.1016/0168-9274(95)00115-8.
[16] A. Janicki, A. Izydorczyk, and P. Gradalski, Computer Simulation of Stochastic Models with SDE-Solver Software Package, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003, pp. 361–370, https://doi.org/10.1007/3-540-44860-8_37.
[17] S. G. Johnson, The NLopt nonlinear-optimization package, http://ab-initio.mit.edu/nlopt.
[18] C. A. Kennedy and M. H. Carpenter, Additive Runge-Kutta schemes for convection-diffusion-reaction equations, Applied Numerical Mathematics, 44 (2003), pp. 139–181, https://doi.org/10.1016/S0168-9274(02)00138-1.
[19] P. Kloeden and A. Neuenkirch, Convergence of numerical methods for stochastic differential equations in mathematical finance, 2012, https://doi.org/10.1142/9789814436434_0002.
[20]
P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, Springer Berlin Heidelberg, 2011, https://books.google.com/books?id=BCvtssom1CMC.
[21]
Y. Komori and K. Burrage, Weak second order S-ROCK methods for Stratonovich stochastic differential equations, Journal of Computational and Applied Mathematics, 236 (2012), pp. 2895–2908, https://doi.org/10.1016/j.cam.2012.01.033.
[22] Y. Komori and K. Burrage, Strong first order S-ROCK methods for stochastic differential equations, Journal of Computational and Applied Mathematics, 242 (2013), pp. 261–274, https://doi.org/10.1016/j.cam.2012.10.026.
[23] L. F. Shampine, Some practical Runge-Kutta formulas, Math. Comput., 46 (1986), pp. 135–150, https://doi.org/10.2307/2008219.
[24]
J. Lawson, An order five Runge-Kutta process with extended region of stability, SIAM Journal on Numerical Analysis, 3 (1966), pp. 593–597, https://doi.org/10.1137/0703051.
[25] T. Li, A. Abdulle, and W. E, Effectiveness of implicit methods for stiff stochastic differential equations, Commun. Comput. Phys, 3 (2008), pp. 295–307.
[26]
X. Mao, The truncated Euler-Maruyama method for stochastic differential equations, Journal of Computational and Applied Mathematics, 290 (2015), pp. 370–384, https://doi.org/10.1016/j.cam.2015.06.002.
[27] J. K. Møller and H. Madsen, From state dependent diffusion to constant diffusion in stochastic differential equations by the Lamperti transform, report, Technical University of Denmark, DTU Informatics, Building 321, 2010.
[28]
M. J. D. Powell, A direct search optimization method that models the objective and constraint functions by linear interpolation, in Advances in Optimization and Numerical Analysis, Proceedings of the 6th Workshop on Optimization and Numerical Analysis, Oaxaca, Mexico, S. Gomez and J.-P. Hennart, eds., vol. 275, Kluwer Academic Publishers, pp. 51–67.
[29] C. Rackauckas and Q. Nie, Adaptive methods for stochastic differential equations via natural embeddings and rejection sampling with memory, Discrete and Continuous Dynamical Systems - Series B, 22 (2016), pp. 2731–2761, https://doi.org/10.3934/dcdsb.2017133.
[30] C. Rackauckas and Q. Nie, DifferentialEquations.jl - a performant and feature-rich ecosystem for solving differential equations in Julia, Journal of Open Research Software, 5 (2017), p. 15, https://doi.org/10.5334/jors.151.
[31]
C. Rackauckas and Q. Nie, Mean-independent noise control of cell fates via intermediate states, iScience, Accepted (2018).
[32]
A. Rößler, Runge-Kutta methods for the strong approximation of solutions of stochastic differential equations, SIAM Journal on Numerical Analysis, 48 (2010), pp. 922–952, https://doi.org/10.1137/09076636x.
[33]
T. P. Runarsson and X. Yao, Search biases in constrained evolutionary optimization, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 35 (2005), pp. 233–243, https://doi.org/10.1109/TSMCC.2004.841906.
[34]
T. Schaffter , From genes to organisms: Bioinformatics System Models and Software , thesis, 2014.[35]
L. F. Shampine, Stiffness and nonstiff differential equation solvers, II: Detecting stiffness with Runge-Kutta methods, ACM Trans. Math. Softw., 3 (1977), pp. 44–53, https://doi.org/10.1145/355719.355722.
[36]
L. F. Shampine and K. L. Hiebert, Detecting stiffness with the Fehlberg (4, 5) formulas, Computers & Mathematics with Applications, 3 (1977), pp. 41–46, https://doi.org/10.1016/0898-1221(77)90112-2.
[37] W. H. Enright, D. J. Higham, B. Owren, and P. W. Sharp, A survey of the explicit Runge-Kutta method, (1995).
[38]
G. Söderlind and L. Wang, Evaluating numerical ODE/DAE methods, algorithms and software, Journal of Computational and Applied Mathematics, 185 (2006), pp. 244–260, https://doi.org/10.1016/j.cam.2005.03.009.
[39] C. Tsitouras, Runge-Kutta pairs of order 5(4) satisfying only the first column simplifying assumption, Computers & Mathematics with Applications, 62 (2011), pp. 770–775, https://doi.org/10.1016/j.camwa.2011.06.002.
[40] P. J. van der Houwen, Explicit Runge-Kutta formulas with increased stability boundaries, Numerische Mathematik, 20 (1972), pp. 149–164, https://doi.org/10.1007/BF01404404.
[41] B. Wang and Q. Qi, Modeling the lake eutrophication stochastic ecosystem and the research of its stability, Mathematical Biosciences, https://doi.org/10.1016/j.mbs.2018.03.019.
[42] P. Wang and Y. Li, Split-step forward methods for stochastic differential equations, Journal of Computational and Applied Mathematics, 233 (2010), pp. 2641–2651, https://doi.org/10.1016/j.cam.2009.11.010.
[43] M. Wiktorsson, Joint characteristic function and simultaneous simulation of iterated Itô integrals for multiple independent Brownian motions, The Annals of Applied Probability, (2001), pp. 470–487, https://doi.org/10.1214/aoap/1015345301.