Stability-Optimized High Order Methods and Stiffness Detection for Pathwise Stiff Stochastic Differential Equations
Christopher Rackauckas† — Qing Nie‡

April 13, 2018
Abstract
Stochastic differential equations (SDE) often exhibit large random transitions. This property, which we denote as pathwise stiffness, causes transient bursts of stiffness which limit the allowed step size for common fixed time step explicit and drift-implicit integrators. We present four separate methods to efficiently handle this stiffness. First, we utilize a computational technique to derive stability-optimized adaptive methods of strong order 1.5 for SDEs. The resulting explicit methods are shown to exhibit substantially enlarged stability regions, which allows them to solve pathwise stiff biological models orders of magnitude more efficiently than previous methods like SRIW1 and Euler-Maruyama. Secondly, these integrators include a stiffness estimator which allows for automatically switching between implicit and explicit schemes based on the current stiffness. In addition, adaptive L-stable strong order 1.5 implicit integrators for SDEs and stochastic differential algebraic equations (SDAEs) in mass-matrix form with additive noise are derived and are demonstrated to be more efficient than the explicit methods on stiff chemical reaction networks by nearly 8x. Lastly, we develop an adaptive implicit-explicit (IMEX) integration method based off of a common method for convection-diffusion-reaction PDEs and show numerically that it can achieve strong order 1.5. These methods are benchmarked on a range of problems varying from non-stiff to extremely pathwise stiff and demonstrate speedups between 5x-6000x while showing the computational infeasibility of fixed time step integrators on many of these test equations.
Stochastic differential equations (SDEs) are dynamic equations of the form

$$dX_t = f(t, X_t)\,dt + g(t, X_t)\,dW_t, \qquad (1)$$

where $X_t$ is a $d$-dimensional vector, $f : \mathbb{R}^d \to \mathbb{R}^d$ is the drift coefficient, and $g : \mathbb{R}^d \to \mathbb{R}^{d \times m}$ is the diffusion coefficient which describes the amount and mixtures of the noise process $W_t$, an $m$-dimensional Brownian motion. SDEs are of interest in scientific disciplines because they can exhibit behaviors which are not found in deterministic models. For example, an ODE model of a chemical reaction network may stay at a constant steady state, but in the presence of randomness the trajectories may switch between various steady states [31, 41, 14]. In many cases, these unique features of stochastic models are pathwise-dependent and are thus not a property of the evolution of the mean trajectory. However, these same effects cause random events of high numerical stiffness, which we denote as pathwise stiffness, which can cause difficulties for numerical integration methods.

∗This work was partially supported by NIH grants R01GM107264 and P50GM76516 and NSF grants DMS1562176 and DMS1161621. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1321846, the National Academies of Science, Engineering, and Medicine via the Ford Foundation, and the National Institutes of Health Award T32 EB009418. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

†Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA and Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA ([email protected]).
‡Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA and Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA and Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA ([email protected]).
Figure 1: Example of a Pathwise Stiff Solution. Depicted is a sample trajectory of Equation 2 solved using the SOSRI methods developed in this manuscript with matching tolerances, reltol = abstol.

A minimal example of pathwise stiffness is demonstrated in the equation

$$dX_t = \left[-X_t(1 - X_t)(2 - X_t)\right]dt + g(t, X_t)\,dW_t, \qquad X_0 = 2, \qquad (2)$$

with constant additive noise $g(t, X_t) \equiv \sigma$, where a sample trajectory is shown in Figure 1. This equation has two stable steady states, one at $X = 0$ and another at $X = 2$, between which the solution switches when the noise is sufficiently large. While near a steady state the derivative is approximately zero, making the problem non-stiff, during these transitions the derivative of the drift term becomes very large. This display of large, transient, and random switching behavior in a given trajectory causes stochastic bursts of numerical stiffness, a phenomenon which we will denote pathwise stiffness. The fixed time step Euler-Maruyama method would require an extremely small $dt$ to be stable for most trajectories, thus requiring an enormous number of steps to solve even this 1-dimensional SDE. In many cases the switching behavior can be rare (due to smaller amounts of noise) or can happen finitely many times, as in the multiplicative noise version with $g(t, X_t) = \sigma X_t$. Yet even if these switches occupy only a small portion of the total time, the stability requirement imposed by their existence determines the possible step sizes and thus contributes heavily to the overall computational cost. While implicit methods can be used to increase the stability range, this can vastly increase the overall computational cost of each step, especially in the case of large systems of SDEs like discretizations of stochastic reaction-diffusion equations. In addition, implicit solvers have in practice a smaller stability region than in theory due to requiring convergence of the quasi-Newton solvers for the implicit steps.
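The step-size restriction that pathwise stiffness imposes on a fixed time step method can be sketched in a few lines of Euler-Maruyama. The drift scale `lam` and the noise amplitude below are illustrative stand-ins, not the manuscript's exact coefficients: near the steady state $X = 2$ the drift Jacobian is approximately $-2\lambda$, so Euler-Maruyama requires $dt < 1/\lambda$ there.

```python
import numpy as np

def euler_maruyama(f, g, x0, t1, dt, rng):
    """Fixed-step Euler-Maruyama for dX = f(t,X)dt + g(t,X)dW."""
    n = int(round(t1 / dt))
    x, t = x0, 0.0
    for _ in range(n):
        x = x + f(t, x) * dt + g(t, x) * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if not np.isfinite(x) or abs(x) > 1e6:
            return x  # blew up: dt lies outside the stability region
    return x

# Illustrative stiffness scale (assumed for demonstration only).
lam = 1000.0
f = lambda t, x: -lam * x * (1 - x) * (2 - x)   # bistable cubic drift
g = lambda t, x: 0.1                            # small additive noise

rng = np.random.default_rng(0)
# Near X = 2 the drift Jacobian is about -2*lam, so EM needs dt < 1/lam.
unstable = euler_maruyama(f, g, 2.0, 1.0, 2e-3, rng)   # violates dt < 1/lam
stable   = euler_maruyama(f, g, 2.0, 1.0, 5e-4, rng)   # satisfies it
```

Even though the trajectory spends essentially all of its time sitting at a flat steady state, the larger step diverges while the smaller one stays pinned near $X = 2$; the stability requirement, not the accuracy requirement, sets the cost.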
This problem is mitigated in ODE software by high-quality stage predictors given by extrapolation algorithms, which provide good initial conditions for the Newton steps [12]. However, there are no known algorithms for stage predictors in the presence of large noise bursts, and thus we will demonstrate that classic implicit solvers have a form of instability. Thus both fixed time step explicit and implicit solvers are inadequate for efficiently handling this common class of SDEs. Since these features exist in the single trajectories of the random processes, methods which attempt to account for the presence of such bursts must do so on each individual trajectory in order to be efficient. In previous work, the authors have shown that by using adaptive time-stepping, a stochastic reaction network of 19 reactants can be solved with an average time step 100,000 times larger than the value that was found necessary for stability during the random stiff events for a high order SRK method [29]. This demonstrated that the key to solving these equations efficiently is controlling the time steps in a pathwise manner. However, the methods were still largely stability-bound, meaning the tolerances chosen to solve the model were determined by what was necessary for stability and were far below the error actually required by the application. The purpose of this investigation is to develop numerical methods with the ability to better handle pathwise stiffness and allow for efficient solving of large Monte Carlo experiments. We approach this problem through four means. First, we develop adaptive stability-optimized SRK methods with enlarged stability regions. This builds off of similar work for ODE integrators which optimizes the coefficients of a Butcher tableau to give enhanced stability [24, 2, 40].
Similar to the Runge-Kutta-Chebyshev methods [12] (and the S-ROCK extension to the stochastic case [22, 1, 21]), these methods are designed to be efficient for equations which display stiffness without fully committing to implicit solvers. Given the complexity of the stochastic stability equations and order conditions, we develop a novel and scalable mechanism for the derivation of "optimal" Runge-Kutta methods. We use this mechanism to design stability-optimized methods for additive noise and diagonal noise SDEs. We show through computational experiments that these adaptive stability-optimized SRK methods can adequately solve transiently stiff equations without losing efficiency on non-stiff problems. On the other hand, to handle extreme stiffness we develop implicit RK methods for SDEs and stochastic differential algebraic equations (SDAEs) in mass-matrix form with additive noise. We extend the definition of L-stability to additive noise SDEs and develop two strong order 1.5 methods: a fully implicit 2-stage L-stable method and an extension of a well-known L-stable explicit first stage singly diagonally implicit RK (ESDIRK) method due to Kennedy and Carpenter which is commonly used for convection-diffusion-reaction equations [18]. To the authors' knowledge, these are the first high order adaptive L-stable methods for SDEs and the first proposed adaptive SDAE integrators. In addition, to extend the utility of these additive noise methods, we derive an extension of the methods for additive SDEs to affine SDEs (mixed multiplicative and additive noise terms) through a Lamperti transformation [27]. Lastly, in order to handle extreme transient stiffness, for each of these types of methods we derive computationally cheap methods for detecting stiffness and switching between implicit and explicit integrators in the presence of stiffness. We show that these methods can robustly detect pathwise stiff transients and thus can serve as the basis for automatic switching methods for SDEs.
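A cheap stiffness check of the kind alluded to above can be built from quantities an SRK step already computes: a Lipschitz quotient between two nearby stage values estimates the dominant Jacobian eigenvalue, which is then compared against the explicit method's stability boundary. The following is a sketch in the spirit of classical ODE stiffness detection; the function names and the `stability_radius` threshold argument are our assumptions, not the paper's exact detector.

```python
import numpy as np

def stiffness_estimate(f, t, x1, x2):
    """Cheap local stiffness estimate: a Lipschitz quotient between two
    nearby stage values approximates the dominant Jacobian eigenvalue."""
    num = np.linalg.norm(np.atleast_1d(f(t, x2) - f(t, x1)))
    den = np.linalg.norm(np.atleast_1d(x2 - x1))
    return num / den

def is_stiff(lam_est, h, stability_radius):
    # Flag stiffness when the step size is stability-bound rather than
    # accuracy-bound for the explicit integrator in use.
    return h * lam_est > stability_radius

lam = 1000.0
f = lambda t, x: -lam * x
est = stiffness_estimate(f, 0.0, 1.0, 1.001)   # close to lam = 1000
```

An integrator could run this check on stage values it has already evaluated (the "free" stiffness detection discussed later for SOSRA2), switching to an implicit scheme when the flag persists over several steps.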
Together we test on non-stiff, semi-stiff, and stiff equations, on systems ranging from 2 SDEs to large SDE discretizations from the biological literature, and show speedups between 6x-60x over the previous adaptive SRIW1 algorithm, and demonstrate the infeasibility of common explicit and implicit methods (Euler-Maruyama, Runge-Kutta Milstein, Drift-Implicit Stochastic θ-Method, and Drift-Implicit θ Runge-Kutta Milstein) found as the basis of many SDE solver packages [34, 10, 16].
The class of methods we wish to study are the adaptive strong order 1.5 SRK methods for diagonal noise [32, 29]. Diagonal noise is the case where the diffusion term $g$ is a diagonal matrix $\left(\sigma^i X_t^i\right)$ and includes phenomenological noise models like multiplicative and affine noise. The diagonal noise methods utilize the same general form and order conditions as the methods for scalar noise, so we use their notation for simplicity. The strong order 1.5 methods for scalar noise are of the form

$$X_{n+1} = X_n + \sum_{i=1}^{s} \alpha_i f\left(t_n + c_i^{(0)} h, H_i^{(0)}\right) h + \sum_{i=1}^{s} \left(\beta_i^{(1)} I_{(1)} + \beta_i^{(2)} \frac{I_{(1,1)}}{\sqrt{h}} + \beta_i^{(3)} \frac{I_{(1,0)}}{h} + \beta_i^{(4)} \frac{I_{(1,1,1)}}{h}\right) g\left(t_n + c_i^{(1)} h, H_i^{(1)}\right) \qquad (3)$$

with stages

$$H_i^{(0)} = X_n + \sum_{j=1}^{s} A_{ij}^{(0)} f\left(t_n + c_j^{(0)} h, H_j^{(0)}\right) h + \sum_{j=1}^{s} B_{ij}^{(0)} g\left(t_n + c_j^{(1)} h, H_j^{(1)}\right) \frac{I_{(1,0)}}{h}, \qquad (4)$$

$$H_i^{(1)} = X_n + \sum_{j=1}^{s} A_{ij}^{(1)} f\left(t_n + c_j^{(0)} h, H_j^{(0)}\right) h + \sum_{j=1}^{s} B_{ij}^{(1)} g\left(t_n + c_j^{(1)} h, H_j^{(1)}\right) \sqrt{h}, \qquad (5)$$

where the $I_{(j)}$ are the Wiktorsson approximations to the iterated stochastic integrals [43]. In the case of additive noise, defined as having a diffusion coefficient satisfying $g(t, X_t) \equiv g(t)$, the method reduces to the form

$$X_{n+1} = X_n + \sum_{i=1}^{s} \alpha_i f\left(t_n + c_i^{(0)} h, H_i^{(0)}\right) h + \sum_{i=1}^{s} \left(\beta_i^{(1)} I_{(1)} + \beta_i^{(2)} \frac{I_{(1,0)}}{h}\right) g\left(t_n + c_i^{(1)} h\right) \qquad (6)$$

with stages

$$H_i^{(0)} = X_n + \sum_{j=1}^{s} A_{ij}^{(0)} f\left(t_n + c_j^{(0)} h, H_j^{(0)}\right) h + \sum_{j=1}^{s} B_{ij}^{(0)} g\left(t_n + c_j^{(1)} h\right) \frac{I_{(1,0)}}{h}. \qquad (7)$$

The tuple of coefficients $\left(A^{(j)}, B^{(j)}, \beta^{(j)}, \alpha\right)$ thus fully determines the SRK method. These coefficients must satisfy the constraint equations described in Appendix C.1 in order to achieve strong order 1.5.
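One step of the additive-noise form (6)-(7) can be sketched for scalar additive noise using Rößler's SRA1 coefficients, which are discussed later in the text; the helper names below are ours. In the deterministic limit $g \equiv 0$ the scheme collapses to a second order Runge-Kutta method, which the snippet verifies on a linear drift.

```python
import numpy as np

# Rossler's SRA1 tableau for additive-noise SRK methods (strong order 1.5).
A21, B21 = 3/4, 3/2
alpha = np.array([1/3, 2/3])
beta1 = np.array([1.0, 0.0])
beta2 = np.array([-1.0, 1.0])
c0 = np.array([0.0, 3/4])
c1 = np.array([1.0, 0.0])

def sra1_step(f, g, t, x, h, I1, I10):
    """One SRA1 step for dX = f(t,X)dt + g(t)dW (scalar additive noise).
    I1 ~ N(0, h); I10 = h*(I1 + Z/sqrt(3))/2 with Z ~ N(0, h) independent."""
    H1 = x
    H2 = x + A21 * h * f(t + c0[0] * h, H1) + B21 * g(t + c1[0] * h) * I10 / h
    drift = h * (alpha[0] * f(t + c0[0] * h, H1) + alpha[1] * f(t + c0[1] * h, H2))
    diff = sum((beta1[i] * I1 + beta2[i] * I10 / h) * g(t + c1[i] * h)
               for i in range(2))
    return x + drift + diff

# Deterministic check: with g = 0 this is a 2nd-order RK step, so for
# f = mu*x one step reproduces 1 + z + z^2/2 exactly (z = mu*h).
mu, h, x0 = -1.5, 0.1, 2.0
out = sra1_step(lambda t, x: mu * x, lambda t: 0.0, 0.0, x0, h, 0.0, 0.0)
z = mu * h
```

In an actual solver the pair $(I_{(1)}, I_{(1,0)})$ would be sampled jointly per step as indicated in the docstring; here both are set to zero to isolate the drift tableau.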
These methods are appended with error estimates

$$E_D = \left|\Delta t \sum_{i \in I} (-1)^{\sigma(i)} f\left(t_n + c_i^{(0)} \Delta t, H_i^{(0)}\right)\right| \quad \text{or} \quad E_D = \Delta t \sum_{i \in I} \left| f\left(t_n + c_i^{(0)} \Delta t, H_i^{(0)}\right)\right|,$$

$$E_N = \left|\sum_{i \in I} \left(\beta_i^{(3)} \frac{I_{(1,0)}}{\Delta t} + \beta_i^{(4)} \frac{I_{(1,1,1)}}{\Delta t}\right) g\left(t_n + c_i^{(1)} \Delta t, H_i^{(1)}\right)\right|,$$

and the rejection sampling with memory (RSwM) algorithm to give fully adaptive time-stepping [29]. Thus, unlike in the theory of ordinary differential equations [23, 8, 6, 39, 37], the choice of coefficients for SRK methods does not require explicitly finding an embedded method when developing an adaptive SRK method, and we will therefore take for granted that each of the derived methods is adaptive.

We use a previous definition of a discrete approximation as numerically stable if for any finite time interval $[t_0, T]$ there exists a positive constant $\Delta_0$ such that for each $\epsilon > 0$ and each $\delta \in (0, \Delta_0)$,

$$\lim_{|X_0^\delta - \bar{X}_0^\delta| \to 0}\ \sup_{t_0 \leq t \leq T} P\left(\left|X_t^\delta - \bar{X}_t^\delta\right| \geq \epsilon\right) = 0, \qquad (8)$$

where $X_n^\delta$ and $\bar{X}_n^\delta$ are discrete time approximations with maximum step size $\delta > 0$ starting at $X_0^\delta$ and $\bar{X}_0^\delta$ respectively [20]. For additive noise, we consider the complex-valued linear test equation

$$dX_t = \mu X_t\,dt + dW_t, \qquad (9)$$

where $\mu$ is a complex number.
In this framework, if a scheme can be written in the form

$$X_{n+1}^h = X_n^h G(\mu h) + Z_n^\delta \qquad (10)$$

with constant step size $\delta \equiv h$, where the $Z_n^\delta$ are random variables which do not depend on the $X_n^\delta$, then the region of absolute stability is the set where, for $z = \mu h$, $|G(z)| < 1$. Applying the additive noise SRK method (6) to the test equation (9) gives

$$X_{n+1}^h = X_n^h + z\left(\alpha \cdot H^{(0)}\right) + \beta^{(1)} \sigma I_{(1)} + \sigma \beta^{(2)} \frac{I_{(1,0)}}{h}, \qquad (11)$$

where

$$H^{(0)} = \left(I - zA^{(0)}\right)^{-1}\left(\hat{X}_n^h + B^{(0)} e\, \sigma \frac{I_{(1,0)}}{h}\right), \qquad (12)$$

$\hat{X}_n^h$ is the size $s$ constant vector of elements $X_n^h$, and $e = (1, \ldots, 1)^T$. By substitution we receive

$$X_{n+1}^h = X_n^h\left(1 + z\,\alpha \cdot \left(I - zA^{(0)}\right)^{-1} e\right) + z\,\alpha \cdot \left(I - zA^{(0)}\right)^{-1} B^{(0)} e\, \sigma \frac{I_{(1,0)}}{h} + \beta^{(1)} \sigma I_{(1)} + \sigma \beta^{(2)} \frac{I_{(1,0)}}{h}. \qquad (13)$$

This set of equations decouples since the iterated stochastic integral approximations $I_{(j)}$ are random numbers and are independent of the $X_n^h$. Thus the stability condition is determined by the equation

$$G(z) = 1 + z\,\alpha \cdot \left(I - zA^{(0)}\right)^{-1} e, \qquad (15)$$

which one may notice is the stability equation of the drift tableau applied to a deterministic ODE [5]. Thus the stability properties of the deterministic Runge-Kutta methods carry over to the additive noise SRA methods on this test equation. However, most two-stage tableaus from ODE research were developed to satisfy higher order ODE order constraints which do not apply here. Thus we will instead look to maximize stability while satisfying the stochastic order constraints. For explicit methods, $A^{(0)}$ and $B^{(0)}$ are lower diagonal and we receive the simplified stability function

$$G(z) = 1 + z\left(\alpha_1 + \alpha_2\right) + z^2 A_{21}\alpha_2 \qquad (16)$$

for a two-stage additive noise SRK method. For this method we will find the coefficients which optimize the stability in the real part of $z$. Thus we wish to find $A^{(0)}$ and $\alpha$ such that the negative real roots of $|G(z)| = 1$ are minimized.
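Equation (15) is straightforward to evaluate numerically. The following sketch checks it against SRA1's drift tableau, for which $G(z) = 1 + z + z^2/2$ and the real stability interval is $(-2, 0)$; the function name is ours.

```python
import numpy as np

def stability_G(A0, alpha, z):
    """G(z) = 1 + z * alpha . (I - z*A0)^{-1} e for an additive-noise SRK
    method; only the drift tableau enters, as noted in the text."""
    s = len(alpha)
    e = np.ones(s)
    return 1 + z * alpha @ np.linalg.solve(np.eye(s) - z * A0, e)

# SRA1 drift tableau: G(z) = 1 + z + z^2/2, stable on the interval (-2, 0).
A0 = np.array([[0.0, 0.0], [0.75, 0.0]])
alpha = np.array([1/3, 2/3])
for zz in (-0.5, -1.0, -1.9):
    assert abs(stability_G(A0, alpha, zz)) < 1    # inside the region
boundary = abs(stability_G(A0, alpha, -2.0))       # |G(-2)| = 1
```

The same routine applied on a grid of complex $z$ reproduces the stability-region plots of Figure 2 for any candidate tableau.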
By the quadratic formula we see that there exists only a single negative root: $z = -\frac{\alpha_1 + \alpha_2}{A_{21}\alpha_2}$. Using Mathematica's Minimize function, we determine that the minimum value of this root subject to the order constraints is $z = -2$, meaning that the SRA1 method due to Rößler achieves the maximum stability criteria. However, given extra degrees of freedom, we attempted to impose $c_1^{(0)} = c_1^{(1)} = 0$ and $c_2^{(0)} = c_2^{(1)} = 1$ so that the error estimator spans the whole interval, which can lead to improved robustness of the adaptive error estimator. In fact, when trying to optimize the error estimator's span we find that there is no error estimator which satisfies $c_2^{(0)} > 3/4$, which is the span of the SRA1 method [32]. Thus SRA1 is the stability-optimized 2-stage explicit method which achieves the most robust error estimator:

$$A^{(0)} = \begin{pmatrix} 0 & 0 \\ \tfrac{3}{4} & 0 \end{pmatrix}, \quad B^{(0)} = \begin{pmatrix} 0 & 0 \\ \tfrac{3}{2} & 0 \end{pmatrix}, \quad \alpha = \begin{pmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{pmatrix},$$
$$\beta^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \beta^{(2)} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \quad c^{(0)} = \begin{pmatrix} 0 \\ \tfrac{3}{4} \end{pmatrix}, \quad c^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \qquad (17)$$

For the 3-stage SRA method, we receive the simplified stability function

$$G(z) = 1 + \left(\alpha_1 + \alpha_2 + \alpha_3\right)z + \left(A_{21}\alpha_2 + A_{31}\alpha_3 + A_{32}\alpha_3\right)z^2 + A_{21}A_{32}\alpha_3 z^3. \qquad (18)$$

To optimize this method, we attempted to use the same techniques as before and optimize the real values of the negative roots. However, in this case we have a cubic polynomial and the root equations are more difficult. Instead, we turn to a more general technique to handle the stability optimization which will be employed in later sections as well. To do so, we generate an optimization problem which we can numerically solve for the coefficients. To simplify the problem, we let $z \in \mathbb{R}$ and define the function

$$f(z) = \int_D \chi_{|G(z)| \leq 1}(z)\,dz. \qquad (19)$$

Notice that $f$ is the area of the stability region when $D$ is sufficiently large. Thus we define the stability-optimized SRK method for additive noise SDEs as the set of coefficients which achieves

$$\max_{A^{(i)}, B^{(i)}, \beta^{(i)}, \alpha} f(z) \quad \text{subject to: Order Constraints.} \qquad (20)$$

Figure 2: SOSRA Stability Regions. The stability regions ($|G(z)| < 1$) are plotted in the $(x, y)$-plane for $z = x + iy$. (A) SRA1. (B) SRA3. (C) SOSRA. (D) SOSRA2.

In all cases we impose $0 < c_i^{(0)}, c_i^{(1)} <$
$1$. We use the order constraints to simplify the problem to a nonlinear optimization problem on 14 variables with 3 equality constraints and 4 inequality constraints (with bound constraints on 10 of the variables). However, we found that simplifying the problem even further by requiring $c_1^{(0)} = c_1^{(1)} = 0$ and $c_3^{(0)} = c_3^{(1)} = 1$ did not significantly impact the stability regions but helps the error estimator, and thus we reduced the problem to 10 variables, 3 equality constraints, and 2 inequality constraints. This was optimized using the COBYLA local optimization algorithm [17, 28] with randomized initial conditions 100 times, and all runs gave similar results. In the Mathematica notebook we show the effect of changing the numerical integration region $D$ on the results, but conclude that a $D$ which does not bias the result toward better or worse real/complex handling does not improve the result. The resulting algorithm, SOSRA, is given by the coefficients in the table in Section B.2. Lastly, we used the condition $c_2^{(0)} = c_3^{(0)} = c_2^{(1)} = c_3^{(1)} = 1$ to allow for free stiffness detection (discussed in Section 5.4). The method generated with this extra constraint is SOSRA2, whose coefficients are in the table in Section B.3. These methods have their stability regions compared to SRA1 and SRA3 in Figure 2, where it is shown that the SOSRA methods more than double the allowed time steps when the eigenvalues of the Jacobian are dominated by the real part.

Drift Implicit Methods for Stiff SDEs with Additive Noise

It is clear that, as in the case of deterministic equations, the explicit methods cannot be made A-stable. However, the stability function of the implicit two-stage additive noise SRK method is

$$G(z) = \frac{\det\left(I - zA^{(0)}\right) + z\,\alpha \cdot \operatorname{adj}\left(I - zA^{(0)}\right)e}{\det\left(I - zA^{(0)}\right)}, \qquad (21)$$

which is A-stable if

$$\left|\det\left(I - zA^{(0)}\right)\right| > \left|\det\left(I - zA^{(0)}\right) + z\,\alpha \cdot \operatorname{adj}\left(I - zA^{(0)}\right)e\right| \qquad (22)$$

for all $z$ with $\operatorname{Re}(z) < 0$. Notice that the numerator equals the denominator if and only if $z = 0$ or

$$z = \frac{\alpha_1 + \alpha_2}{\left(A_{22} - A_{12}\right)\alpha_1 + \left(A_{11} - A_{21}\right)\alpha_2}. \qquad (24)$$

From the order conditions we know that $\alpha_1 + \alpha_2 = 1$, which means that no such root exists with $\operatorname{Re}(z) < 0$ when $\left(A_{22} - A_{12}\right)\alpha_1 + \left(A_{11} - A_{21}\right)\alpha_2 > 0$. Thus under these no-root conditions, we can determine A-stability by checking the inequality at a single test point, which gives $\left(A_{22} - A_{12}\right)\alpha_1 + \left(A_{11} - A_{21}\right)\alpha_2 < 1$. Together with the order conditions, we have a total of four constraints on $A^{(0)}$ and $\alpha$:

$$\left(A_{11} + A_{12}\right)\alpha_1 + \left(A_{21} + A_{22}\right)\alpha_2 = \frac{1}{2}, \qquad (25)$$
$$\alpha_1 + \alpha_2 = 1,$$
$$0 < \left(A_{22} - A_{12}\right)\alpha_1 + \left(A_{11} - A_{21}\right)\alpha_2 < 1,$$
$$\lim_{z \to \infty} G(z) = 0. \qquad (26)$$

The last (L-stability) condition is equivalent to

$$1 - \alpha \cdot \left(A^{(0)}\right)^{-1} e = 0, \qquad (27)$$

whose expanded denominator is proportional to $\det\left(A^{(0)}\right)$, which implies $A^{(0)}$ must be non-singular. Next, we attempt to impose B-stability on the drift portion of the method. We use the condition due to Burrage and Butcher that for $B = \operatorname{diag}\left(\alpha_1, \alpha_2\right)$ and $M = BA^{(0)} + \left(A^{(0)}\right)^T B - \alpha\alpha^T$ (for ODEs) [4], we require both $B$ and $M$ to be non-negative definite. However, in the supplemental Mathematica notebooks we show computationally that there is no 2-stage SRK method of this form which satisfies all three of these stability conditions. Thus we settle for A-stability and L-stability. Recalling that $c^{(0)}$ and $c^{(1)}$ are the locations in time where $f$ and $g$ are approximated respectively, we wish to impose

$$c_1^{(0)} = 0, \quad c_2^{(0)} = 1, \quad c_1^{(1)} = 0, \quad c_2^{(1)} = 1 \qquad (28)$$

so that the error estimator covers the entire interval of integration. Since $c^{(0)} = A^{(0)} e$, this leads to the condition $A_{21} + A_{22} = 1$. We then apply the constraint-satisfaction algorithm FindInstance in Mathematica.
Figure 3: Implicit SRA Stability Regions. The stability regions ($|G(z)| < 1$) are plotted in the $(x, y)$-plane for $z = x + iy$. (A) LSRA. (B) SKenCarp.
Specifically, we search for tableaus which satisfy the previous conditions with the added constraint of semi-implicitness, i.e. $B^{(0)}$ lower triangular. This assumption is added because the inverse of the normal distribution has unbounded moments, and thus in many cases it is mathematically simpler to treat the diffusion term explicitly (though there are recent methods which drop this requirement via truncation or extra assumptions on the solution [26]). However, we find that there is no coefficient set which meets all of these requirements. If we relax the interval estimate condition to allow $0 \leq c_2^{(0)} \leq 1$, we find an A-L stable method, which we denote LSRA. If we attempt to look for a 2-stage SDIRK-like method to reduce the complexity of the implicit equation, i.e. $A_{12}^{(0)} = 0$, using FindInstance we find the constraints unsatisfiable. Note that if we drop the semi-implicit assumption we find that the full constraints cannot be satisfied there either (we still cannot satisfy $c_1^{(0)} = 0$ and $c_2^{(0)} = 1$), and there does not exist a 2-stage A-L stable SDIRK method in that case.

Since the stability region of the SRA methods is completely determined by the deterministic portion $A^{(0)}$, in some cases there may exist a sensible extension of implicit Runge-Kutta methods for ordinary differential equations to high order adaptive methods for stochastic differential equations with additive noise which keeps the same stability properties. Since the order constraints which only involve the deterministic portions $A^{(0)}$, $c^{(0)}$, and $\alpha$ match the conditions required for ODE integrators, existence depends on finding $\beta^{(1)}$, $\beta^{(2)}$, $c^{(1)}$, and $B^{(0)}$ that satisfy the full order constraints. In this case, an adaptive error estimator can be added by using the same estimator as the ODE method (which we call $E_D$) and adding the absolute size of the stochastic portions,

$$E_N = \left|\sum_{i=1}^{s}\left(\beta_i^{(1)} I_{(1)} + \beta_i^{(2)} \frac{I_{(1,0)}}{h}\right) g\left(t_n + c_i^{(1)} h\right)\right|, \qquad (30)$$

leading to the error estimator

$$E = \delta E_D + E_N. \qquad (31)$$

This can be shown similarly to the construction in [29]. Given the large literature on implicit RK methods for ODEs, this presents a large pool of possibly good methods, and heuristically one may expect these to do very well in the case of small noise. However, we note that there does not always exist such an extension.
Using the constraint-satisfaction algorithm FindInstance in Mathematica, we looked for extensions of the explicit first stage singly diagonally implicit RK (ESDIRK) method TRBDF2 [15] and could not find values satisfying the constraints. In addition, we could not find values for an extension of the 5th order Radau IIA method [13, 12] which satisfy the constraints. In fact, our computational search could not find any extension of a 3-stage L-stable implicit RK method which satisfies the constraints. However, the 4-stage 3rd order ODE method due to Kennedy and Carpenter [18] can be extended with stochastic coefficients $\beta^{(1)}$, $\beta^{(2)}$, $c^{(1)}$, and $B^{(0)}$ satisfying the order conditions; the exact values for $B_{2,1}^{(0)}$ and $B_{4,1}^{(0)}$ are shown in Appendix B.1. (E)SDIRK methods are particularly interesting because these methods can be solved using a single factorization of the iteration matrix $I - \gamma h J$, where $J$ is the Jacobian. Additionally, explicit handling of the noise term is similar to the implicit-explicit (IMEX) form for additive Runge-Kutta methods in that it occurs by adding a single constant term to the Newton iterations in each stage, meaning it does not significantly increase the computational cost. The chosen ESDIRK method has a complementary explicit tableau to form an IMEX additive Runge-Kutta method, and the chosen values for the stochastic portions are simultaneously compatible with the order conditions for this tableau. In Section 6.1 we numerically investigate the order of the IMEX extension of the method and show that it matches the convergence of the other SRA methods on the test equation. One thing to note is that since the problem has additive noise, the method is never implicit in the dependent variables in the noise part, so in theory this can also be extended with an implicit $B^{(1)}$ as well (with convergence concerns due to the non-finite inverse moments of the normal distribution [20]).
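The claim that (E)SDIRK stages can be solved with a single factorization of $I - \gamma h J$ can be sketched on a scalar problem, where each stage solves a residual of the form $z_i - h f(t, \gamma z_i + \alpha_i) = 0$ by Newton iteration. The function and argument names are ours; a real implementation would reuse one LU factorization across all stages rather than forming the scalar quotient below.

```python
def solve_stage(f, dfdx, t, alpha_i, gamma, h, z0=0.0, tol=1e-12, maxiter=25):
    """Simplified Newton iteration for the stage residual
    G(z) = z - h*f(t, gamma*z + alpha_i) = 0 (scalar version).
    z0 = 0 is the 'stochastic minimal residual' predictor from the text;
    passing the previous stage's z gives the trivial predictor."""
    z = z0
    for _ in range(maxiter):
        x = gamma * z + alpha_i
        G = z - h * f(t, x)
        J = 1.0 - gamma * h * dfdx(t, x)   # iteration matrix I - gamma*h*J
        dz = -G / J
        z += dz
        if abs(dz) < tol:
            break
    return z

# Linear check: f = lam*x has the closed form z = h*lam*alpha_i/(1 - gamma*h*lam).
lam, gamma, h, alpha_i = -50.0, 0.25, 0.1, 1.0
z = solve_stage(lambda t, x: lam * x, lambda t, x: lam, 0.0, alpha_i, gamma, h)
```

For a linear drift the Newton iteration converges in a single step, which is why stiff linear test problems are a useful smoke test for the stage solver.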
One has to be careful with the implementation to avoid accumulation of floating point error for highly stiff equations. For our implementation, we used a method similar to that described in [15]. The implicit stages were defined in terms of

$$z_i = h f\left(t_n + c_i^{(0)} h, H_i^{(0)}\right), \qquad (33)$$

and thus the stages become

$$H_i^{(0)} = \gamma z_i + X_n + \sum_{j=1}^{i-1} A_{ij}^{(0)} f\left(t_n + c_j^{(0)} h, H_j^{(0)}\right) h + \sum_{j=1}^{i-1} B_{ij}^{(0)} g\left(t_n + c_j^{(1)} h\right) \frac{I_{(1,0)}}{h} = \gamma z_i + \alpha_i, \qquad (34)$$

where $X_n$ is the previous step. This gives the implicit system for the residual,

$$G\left(z_i\right) = z_i - h f\left(t_n + c_i^{(0)} h, \gamma z_i + \alpha_i\right),$$

whose Newton iterations use the iteration matrix $I - \gamma h J$, where $J$ is the Jacobian of $f$ and thus is the same for each stage. For choosing the values to start the Newton iterations, also known as stage prediction, we tested two methods. The first is the trivial stage predictor $z_i = z_j$ for the nearest $j$ such that $j < i$ and $c_j < c_i$, i.e. using the closest derivative estimate. The other method that we tested is what we denote the stochastic minimal residual estimate, given by $H_i^{(0)} = \alpha_i$, or equivalently $z_i = 0$. This method takes into account the stochastic bursts at a given step and thus demonstrated much better stability. We note that these methods also apply to solving SDEs with mass matrices of the form:
$$M\,dX_t = f(t, X_t)\,dt + M g(t, X_t)\,dW_t.$$

The derivation of the method is the same, except in this case we receive the implicit system

$$G\left(z_i\right) = M z_i - h f\left(t_n + c_i^{(0)} h, \gamma z_i + \alpha_i\right),$$

which has the iteration matrix $M - \gamma h J$. Like in the ODE case, these implicit methods can thus solve DAEs in mass-matrix form (the case where $M$ is singular), though we leave discussion of convergence for future research. One interesting property to note is that a zero row in the mass matrix corresponds to a constraint equation which is only dependent on the output of $f$, since the multiplication of $g$ by $M$ is zero in that same corresponding row. Thus when a singular mass matrix is applied to the noise equation, the corresponding constraints are purely deterministic relations. So while this is a constrained form, properties like conservation of energy in physical models can still be placed on the solution using this mass-matrix formulation.

Given the efficiency of the methods for additive noise, one method for developing efficient methods for more general noise processes is to transform diagonal noise processes into additive noise. This transform is due to Lamperti [27], which states that the SDE of the form

$$dX_t = f(t, X_t)\,dt + \sigma(t, X_t) R(t)\,dW_t, \qquad (35)$$

where $\sigma$ is diagonal with positive entries $\sigma_i\left(t, X_{i,t}\right)$, has the transformation

$$Z_{i,t} = \psi_i\left(t, X_{i,t}\right) = \left.\int \frac{1}{\sigma_i(t, x)}\,dx\right|_{x = X_{i,t}}, \qquad (36)$$

which results in an Ito process with the $i$th element given by

$$dZ_{i,t} = \left(\left.\frac{\partial}{\partial t}\psi_i(t, x)\right|_{x = \psi_i^{-1}\left(t, Z_{i,t}\right)} + \frac{f_i\left(t, \psi^{-1}\left(t, Z_t\right)\right)}{\sigma_i\left(t, \psi_i^{-1}\left(t, Z_{i,t}\right)\right)} - \frac{1}{2}\frac{\partial}{\partial x}\sigma_i\left(t, \psi_i^{-1}\left(t, Z_{i,t}\right)\right)\right)dt + \sum_{j=1}^{n} r_{ij}(t)\,dw_{j,t}, \qquad (37)$$

with

$$X_t = \psi^{-1}\left(t, Z_t\right). \qquad (39)$$

This is easily verified using Ito's lemma. In the case of mixed multiplicative and additive noise (affine noise), the vector equation

$$dX_t = f(t, X_t)\,dt + \left(\sigma_M X_t + \sigma_A\right)dW_t \qquad (40)$$

with $\sigma_M > 0$, $\sigma_A >$
0, the transform becomes element-wise in the system, so we can consider the one-dimensional case. Since

$$\psi(t, X_t) = \left.\int \frac{1}{\sigma_M x + \sigma_A}\,dx\right|_{x = X_t} = \frac{\log\left(\sigma_M X_t + \sigma_A\right)}{\sigma_M},$$

we have

$$X_t = \frac{\exp\left(\sigma_M Z_t\right) - \sigma_A}{\sigma_M}$$

and

$$dZ_t = \tilde{f}(t, X_t)\,dt + dW_t, \qquad \tilde{f}(t, X_t) = \frac{f(t, X_t)}{\sigma_M X_t + \sigma_A} - \frac{\sigma_M}{2}, \qquad (41)$$

provided $\sigma_M X_t$ is guaranteed to be sufficiently different from $-\sigma_A$ to not cause definitional issues. It is common in biological models like chemical reaction networks that $X_t \geq 0$, in which case this is well-defined for any $\sigma_A > 0$ and $\sigma_M > 0$. In practice one solves for $Z_t$, which is the transformed $X_t$ if $\sigma_M \neq 0$ and is the same as $X_t$ otherwise (assuming the parameters are positive). When doing so, references to $X_{i,t}$ must be changed into $\frac{\exp\left(\sigma_M Z_{i,t}\right) - \sigma_A}{\sigma_M}$. For example, the affine noise Lotka-Volterra SDE

$$dx = (ax - bxy)\,dt + \left(\sigma_M x + \sigma_A\right)dW_t,$$
$$dy = (-cy + dxy)\,dt + \sigma_{\tilde{A}}\,dW_t$$

only has affine noise on the first term, so this transforms to

$$x = \frac{\exp\left(\sigma_M z\right) - \sigma_A}{\sigma_M},$$
$$dz = \left(\frac{ax - bxy}{\sigma_M x + \sigma_A} - \frac{\sigma_M}{2}\right)dt + dW_t,$$
$$dy = (-cy + dxy)\,dt + \sigma_{\tilde{A}}\,dW_t,$$

along with the change to the initial condition, and can thus be solved with the SRA methods. We note a word of caution that the above transformation only holds when $\sigma_A > 0$. When $\sigma_A = 0$, the transformation is different, with $X_t = \exp\left(\sigma_M Z_t\right)$ (instead of $\frac{\exp\left(\sigma_M Z_t\right)}{\sigma_M}$, which one would get by taking $\sigma_A = 0$ in the affine formula). Instead of performing the transformations directly on the functions themselves, we can modify the SRA algorithm to handle this case as

$$X_{n+1} = \psi^{-1}\left(\psi\left(X_n\right) + \sum_{i=1}^{s}\alpha_i \tilde{f}\left(t_n + c_i^{(0)} h, H_i^{(0)}\right)h + \sum_{i=1}^{s}\left(\beta_i^{(1)} I_{(1)} + \beta_i^{(2)} \frac{I_{(1,0)}}{h}\right)\tilde{g}\left(t_n + c_i^{(1)} h\right)\right) \qquad (42)$$

with stages

$$H_i^{(0)} = \psi\left(X_n\right) + \sum_{j=1}^{s} A_{ij}^{(0)} \tilde{f}\left(t_n + c_j^{(0)} h, H_j^{(0)}\right)h + \sum_{j=1}^{s} B_{ij}^{(0)} \tilde{g}\left(t_n + c_j^{(1)} h\right)\frac{I_{(1,0)}}{h}, \qquad (43)$$

where $\psi$ is the element-wise function

$$\psi_i(x) = \begin{cases} \dfrac{\log\left(\sigma_{i,M} x + \sigma_{i,A}\right)}{\sigma_{i,M}} & \sigma_{i,M} > 0,\ \sigma_{i,A} > 0, \\[4pt] \dfrac{\log(x)}{\sigma_{i,M}} & \sigma_{i,M} > 0,\ \sigma_{i,A} = 0, \\[4pt] x & \text{otherwise}, \end{cases} \qquad \psi_i^{-1}(z) = \begin{cases} \dfrac{\exp\left(\sigma_{i,M} z\right) - \sigma_{i,A}}{\sigma_{i,M}} & \sigma_{i,M} > 0,\ \sigma_{i,A} > 0, \\[4pt] \exp\left(\sigma_{i,M} z\right) & \sigma_{i,M} > 0,\ \sigma_{i,A} = 0, \\[4pt] z & \text{otherwise}, \end{cases}$$

and

$$\tilde{g}_i(t) = \begin{cases} 1 & \sigma_{i,M} > 0, \\ \sigma_{i,A} & \text{otherwise}. \end{cases}$$

This can be summarized as performing all internal operations in $Z$-space (where the equation is additive) but saving each step in $X$-space.

Optimized-Stability Order 1.5 SRK Methods with Diagonal Noise
For diagonal noise, we use the mean-square definition of stability [20]. A method is mean-square stable iflim n →∞ E (cid:16) | X n | (cid:17) = 0 on the test equation dX t = µX t dt + σX t dW t . (44)In matrix form we can re-write our method as given by X n +1 = X n + µh (cid:16) α · H (0) (cid:17) + σI (1) (cid:16) β (1) · H (1) (cid:17) + σ I (1 , √ h (cid:16) β (2) · H (1) (cid:17) (45)+ σ I (1 , h (cid:16) β (3) · H (1) (cid:17) + σ I (1 , , h (cid:16) β (4) · H (1) (cid:17) (46)with stages H (0) = X n + µ ∆ tA (0) H (0) + σ I (1 , h B (0) H (1) , (47) H (1) = X n + µ ∆ tA (1) H (0) + σ √ ∆ tB (1) H (1) where ˆ X n is the size s constant vector of X n . H (0) = (cid:16) I − hA (0) (cid:17) − (cid:18) ˆ X n + σ I (1 , h B (0) H (1) (cid:19) , (48) H (1) = (cid:16) I − σ √ hB (1) (cid:17) − (cid:16) ˆ X n + µhA (1) H (0) (cid:17) By the derivation in the appendix, we receive the equation S = E U n +1 U n = { µht α · (cid:32) I − µ ∆ tA (0) − µσI (1 , A (1) B (0) (cid:18) I − σ √ hB (1) (cid:19) − (cid:33) − (cid:32) I + σ I (1 , h B (0) (cid:18) I − σ √ hB (1) (cid:19) − (cid:33) (49)+ σI (1) β (1) · (cid:32) I − σ √ hB (1) − µhA (1) (cid:18) I − µhA (0) (cid:19) − σ I (1 , h B (0) (cid:33) − (cid:32) I + µhA (1) (cid:18) I − µhA (0) (cid:19) − (cid:33) + σ I (1 , √ h β (2) · (cid:32) I − σ √ hB (1) − µhA (1) (cid:18) I − µhA (0) (cid:19) − σ I (1 , h B (0) (cid:33) − (cid:32) I + µhA (1) (cid:18) I − µhA (0) (cid:19) − (cid:33) + σ I (1 , h β (3) · (cid:32) I − σ √ hB (1) − µhA (1) (cid:18) I − µhA (0) (cid:19) − σ I (1 , h B (0) (cid:33) − (cid:32) I + µhA (1) (cid:18) I − µhA (0) (cid:19) − (cid:33) + σ I (1 , , h β (4) · (cid:32) I − σ √ hB (1) − µhA (1) (cid:18) I − µhA (0) (cid:19) − σ I (1 , h B (0) (cid:33) − (cid:32) I + µhA (1) (cid:18) I − µhA (0) (cid:19) − (cid:33) } We apply the substitutions from the Appendix and let z = µh, (50) w = σ √ h. In this space, z is the stability variable for the drift term and w is the stability in the diffusion term. 
Under this scaling (h, √h), the equation becomes independent of h and is thus a function S(z, w) of the coefficients of the SRK method, where mean-square stability is achieved when |S(z, w)| < 1. The equation S(z, w) in terms of its coefficients for explicit methods (A^(i) and B^(i) lower diagonal) has millions of terms and is shown in the supplemental Mathematica notebook. Determination of the stability equation for the implicit methods was found to be computationally intractable and is an avenue for further research.

An Optimization Problem for Determination of Coefficients

We wish to determine the coefficients for the diagonal SRK methods which optimize the stability. To do so, we generate an optimization problem which we can numerically solve for the coefficients. To simplify the problem, we let z, w ∈ R. Define the function

f(z, w; N, M) = ∫_{−M}^{M} ∫_{−N}^{0} χ_{|S(z,w)|≤1}(z, w) dz dw. (51)

Notice that for N, M → ∞, f is the area of the stability region. Thus we define the stability-optimized diagonal SRK method as the set of coefficients which achieves

max_{A^(i), B^(i), β^(i), α} f(z, w) (52)
subject to: Order Constraints.

However, as with the SRK methods for additive noise, we impose a few extra constraints to add robustness to the error estimator. In all cases we impose

0 < c^(0)_i, c^(1)_i ≤ 1, c^(0)_4 = c^(1)_4 = 1,

which we call the End-C Constraint. Lastly, we can prescribe the ordering constraint

c^(j)_1 < c^(j)_2 < c^(j)_3 < c^(j)_4,

which we denote as the Inequality-C Constraint. The resulting problem is a nonlinear programming problem with 44 variables and 42-48 constraint equations. The objective function is the two-dimensional integral of a discontinuous function which is determined by a polynomial in z and w with approximately 3 million coefficients. To numerically approximate this function, we calculated the characteristic function on a grid with even spacing dx using a CUDA kernel and found numerical solutions to the optimization problem using the JuMP framework [9] with the NLopt backend [17].
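The grid approximation of the objective (51) can be sketched in NumPy as follows. This is only a stand-in for the CUDA kernel described above; since the SRK S(z, w) is too large to reproduce here, the sketch uses Euler-Maruyama's mean-square stability function, whose region is a unit disk of area π:

```python
import numpy as np

def stability_area(S, N, M, dx=0.005):
    # f(z, w; N, M): area of {(z, w) in [-N, 0] x [-M, M] : |S(z, w)| <= 1},
    # approximated by counting grid cells where the characteristic
    # function of the stability region is one.
    z = np.arange(-N, 0.0, dx)
    w = np.arange(-M, M, dx)
    Z, W = np.meshgrid(z, w, indexing="ij")
    return np.count_nonzero(np.abs(S(Z, W)) <= 1.0) * dx * dx

# Euler-Maruyama's mean-square region {(1+z)^2 + w^2 <= 1} is the unit
# disk centered at z = -1, so for N >= 2, M >= 1 the area tends to pi.
em = lambda z, w: (1.0 + z) ** 2 + w ** 2
area = stability_area(em, N=4.0, M=2.0)
print(area)    # ≈ 3.14
```

Maximizing this area over method coefficients, subject to the order constraints, is precisely the nonlinear program described above.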
A mixed approach using many solutions of the semi-local optimizer LN_AUGLAG_EQ [7, 3] and fewer solutions from the global optimizer GN_ISRES [33] was used to approximate the optimality of solutions. The optimization was run many times in parallel until many results produced methods with similar optimality, indicating that we likely obtained values near the true minimum.

The parameters N and M are the bounds on the stability region and also represent a trade-off between the stability in the drift and the stability in the diffusion. A method which is optimized when M is small would be highly stable in the case of small noise, but would not be guaranteed to have good stability properties in the presence of large noise. Thus these parameters are knobs for tuning the algorithms for specific situations, and we solved the problem for different combinations of N and M to determine algorithms for the different cases.

The coefficients generated for approximately-optimal methods fall into three categories. In one category we have the drift-dominated stability methods, where large N and small M was optimized. On the other end we have the diffusion-dominated stability methods, where large M and small N was optimized. Then we have the mixed stability methods, which used mixed size choices for N and M. As a baseline, we optimized the objective without constraints on the c_i to see what the "best possible method" would be. When this was done with large N and M, the resulting method, which we name SOSRI, has almost every value of c satisfy the constraints, but with c^(0)_2 ≈ −0.04 and c^(0)_4 ≈ 0.75. To see if we could produce methods which were more diffusion-stable, we decreased N to optimize more in w, but failed to produce methods with substantially enlarged diffusion-stability over SOSRI.

Adding only the inequality constraints on the c_i and looking for methods for drift-dominated stability, we failed to produce methods whose c_i estimators adequately covered the interval. Some of the results did produce stability regions similar to SOSRI but with c^(0)_i < 0. With the full constraints on the c_i, one method, which we label SOSRI2, resulted in similar stability to SOSRI while satisfying the c_i constraints. In addition, this method satisfies c^(0)_3 = c^(0)_4 = 1 and c^(1)_3 = c^(1)_4 = 1, a property whose use will be explained in Section 5.4. The stability regions for these methods are shown in Figure 4.

Figure 4:
SOSRI Stability Regions. The stability regions (|S(z, w)| ≤ 1) in the (z, w)-plane. (A)
SRIW1. (C)
SRIW2. (D)
SOSRI. (E)
SOSRI2.

To look for more diffusion-stable methods, we dropped to N = 6 to encourage the methods to expand the stability in the w-plane. However, we could not find a method whose stability region went substantially beyond 2 in w. This was further decreased to N = 1, where methods still could not go substantially beyond |w| ≈ 2. Thus we were not able to obtain methods optimized for the diffusion-dominated case. This hard barrier was hit under many different constraint and objective setups and under thousands of optimization runs, indicating there might be a diffusion-stability barrier for explicit methods.

In many real-world cases, one may not be able to clearly identify a model as drift-stability bound or diffusion-stability bound, or determine whether the equation is stiff or non-stiff. In fact, many models may switch between such extremes. An example is a model with stochastic switching between different steady states. In such a model, near a steady state the drift term f(t, X_ss) ≈ 0, while during a switching event f can be very large and stiff, causing the integration to be heavily drift-stability dominated. Since these switches are random, the ability to adapt between these two behaviors could be key to achieving optimal performance. Given the trade-off, we investigated how our methods allow for switching between methods which optimize for the different situations.

The basis for our method is an extension of a method proposed for deterministic differential equations [35, 36, 12]. The idea is to create a cheap approximation to the dominant eigenvalues of the Jacobians for the drift and diffusion terms. If v is the eigenvector of the respective Jacobian, then for ‖v‖ sufficiently small,

|λ_D| ≈ ‖f(t, x + v) − f(t, x)‖ / ‖v‖, (53)
|λ_N| ≈ ‖g(t, x + v) − g(t, x)‖ / ‖v‖, (54)

where |λ_D| and |λ_N| are the estimates of the dominant eigenvalues for the deterministic and noise functions respectively.
We have that H^(k)_i is an approximation to X at t + c^(k)_i h, and thus for two stages at the same time-point, c^(k)_i = c^(k)_j, the following serves as a local Jacobian estimate:

|λ_D| ≈ ‖f(t + c^(0)_i h, H^(0)_i) − f(t + c^(0)_j h, H^(0)_j)‖ / ‖H^(0)_i − H^(0)_j‖, (55)
|λ_N| ≈ ‖g(t + c^(1)_i h, H^(1)_i) − g(t + c^(1)_j h, H^(1)_j)‖ / ‖H^(1)_i − H^(1)_j‖. (56)

If we had already computed a successful step, we would like to know if in the next calculation we should switch methods due to stability. Thus it makes sense to approximate the Jacobian at the end of the interval, meaning i = s and j = s − 1 where s is the number of stages. Then if z_min is the minimum z ∈ R such that z is in the stability region for the method, the step is drift-stability bound when h|λ_D| > |z_min|; similarly, letting w_min be the maximum of the stability region in the noise axis, the step is diffusion-stability bound when h|λ_N| > w_min.

Hairer noted that, for ODEs, if a RK method has c_i = c_j = 1, then

ρ = ‖k_i − k_j‖ / ‖g_i − g_j‖, (57)

where k_i = f(t + c_i h, g_i), is an estimate of the eigenvalues for the Jacobian of f. Given the construction of SOSRI2, a natural extension is

|λ_D| ≈ ‖f(t_n + c^(0)_4 h, H^(0)_4) − f(t_n + c^(0)_3 h, H^(0)_3)‖ / ‖H^(0)_4 − H^(0)_3‖, (58)
|λ_N| ≈ ‖g(t_n + c^(1)_4 h, H^(1)_4) − g(t_n + c^(1)_3 h, H^(1)_3)‖ / ‖H^(1)_4 − H^(1)_3‖. (59)

Given that these values are all part of the actual step calculations, this stiffness estimate is essentially free. By comparing these values to the stability plot in Figure 2, we use the following heuristic to decide if SOSRI2 is stability-bound in its steps:

1. If 10 > h|λ_D| > 0.5, then we check if h|λ_N| > ω.
2. If h|λ_D| < 0.5, then we check if h|λ_N| divided by the width of the stability region in w exceeds ω.

The denominator is chosen as a reasonable box approximation to the edge of the stability region. ω is a safety factor: in theory ω is 1 since we divided by the edge of the stability region, but in practice this is only an eigenvalue estimate and thus ω allows for a trade-off between the false positive and false negative rates. If either of those conditions is satisfied, then h is constrained by the stability region. The solver can thus alert the user that the problem is stiff or use this estimate to switch to a method more suitable for stiff equations. In addition, the error estimator gives separate error estimates in the drift and diffusion terms. A future scheme could combine these two facts to develop a more robust stiffness detection method and label the stiffness as either drift or diffusion dominated.

We end by noting that SOSRA2 has the same property, allowing stiffness detection via

|λ_D| ≈ ‖f(t_n + c^(0)_3 h, H^(0)_3) − f(t_n + c^(0)_2 h, H^(0)_2)‖ / ‖H^(0)_3 − H^(0)_2‖ (60)

and, employing a similar method as in the deterministic case, checking for stiffness when h|λ_D| divided by the stability bound exceeds ω. In addition, stiff solvers can measure the maximal eigenvalues directly from the Jacobian. Here we suggest the measure from Shampine [35, 36, 12] of using ‖J‖_∞ as a cheap upper bound. For semi-implicit methods like LSRA we only get a stability bound on the drift term, but this should be sufficient since, for additive noise, diffusive instability is not an issue.

In order to test the efficiency and correctness of the SRA algorithms, we chose to use the additive noise test Equation 65. Figure 5A demonstrates that the SOSRA and SKenCarp methods achieve strong order 2.0 on Equation 65. To test the convergence of the SRI algorithms, we used the linear test Equation 67. Figure 5B demonstrates that the SOSRI methods achieve strong order 1.5 on Equation 67.
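Such strong-order measurements can be reproduced in miniature. The sketch below estimates the strong order of Euler-Maruyama on geometric Brownian motion (chosen because its exact solution is available; this is not the paper's Equation 65 or 67), fitting the slope of the error against the step size:

```python
import numpy as np

def em_strong_errors(mu, sigma, T, dts, n_paths=2000, seed=1):
    # E|X_h(T) - X(T)| for Euler-Maruyama on dX = mu*X dt + sigma*X dW,
    # comparing against the exact solution driven by the same Brownian path.
    rng = np.random.default_rng(seed)
    errs = []
    for dt in dts:
        n = int(round(T / dt))
        dW = np.sqrt(dt) * rng.standard_normal((n_paths, n))
        X = np.ones(n_paths)
        for k in range(n):
            X = X + mu * X * dt + sigma * X * dW[:, k]
        exact = np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum(axis=1))
        errs.append(np.mean(np.abs(X - exact)))
    return np.array(errs)

dts = np.array([1/16, 1/32, 1/64, 1/128])
errs = em_strong_errors(mu=1.5, sigma=0.5, T=1.0, dts=dts)
order = np.polyfit(np.log(dts), np.log(errs), 1)[0]
print(order)    # near 0.5, the known strong order of Euler-Maruyama here
```

The same log-log fit, applied to a higher order method on a test equation with a known solution, is what underlies the reference-slope comparisons in Figure 5.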
Lastly, we tested the convergence of the IMEX version of the SKenCarp integrator. We defined the split SDE 69 as a modification of Equation 65 where one part of the drift is solved implicitly and the remaining part is solved explicitly. Figure 5C demonstrates that the IMEX SKenCarp method achieves strong order 2.0. Note that this does not demonstrate that the method always achieves strong order 1.5, since sufficient conditions for the IMEX pairing are unknown, but it gives numerical evidence that the method can be high order.

To test the efficiency, we first plotted work-precision [11, 38, 12] diagrams for the SOSRA, SOSRA2, and SKenCarp methods against the SRA1, SRA2, and SRA3 [32] methods and the fixed time step Euler-Maruyama method (Milstein is equivalent to Euler-Maruyama in this case [20]). We tested the error and timing on Equation 65. In addition, we tested using the Lotka-Volterra equation with additive noise, Equation 70. Since 70 does not have an analytical solution, a reference solution was computed using a low tolerance solution via SOSRA for each Brownian trajectory. The plots show that there is a minimal difference in efficiency between the SRA algorithms over the tested error range, while these algorithms are all significantly more efficient than the Euler-Maruyama method when the required error is small (Figure 6). The weak error work-precision diagrams show that when using between 100 to 10,000 trajectories, the weak error is less than the sample error in the regime where there is no discernible efficiency difference between the SRA methods. These results show that in the regime of mild accuracy on non-stiff equations, the SOSRA, SOSRA2, and SKenCarp methods are much more efficient than low order methods yet achieve the same efficiency as the non-stability-optimized SRA variants. Note that these results also show that the error estimator for adaptivity is highly conservative, generating solutions with around 2 orders of magnitude less error than the tolerance suggests.
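The IMEX pairing can be illustrated in minimal form with a semi-implicit Euler-Maruyama step for additive noise (an illustrative first-order analog, not the SKenCarp tableau; here a stiff linear drift A is assumed known and treated implicitly while the remainder f2 is explicit):

```python
import numpy as np

def imex_em_step(x, h, A, f2, sigma, dW):
    # One step for dX = (A X + f2(X)) dt + sigma dW: the stiff linear part
    # A X is implicit (backward Euler), the non-stiff f2 and noise explicit:
    #   X_{n+1} = (I - h A)^{-1} (X_n + h f2(X_n) + sigma dW).
    n = x.size
    return np.linalg.solve(np.eye(n) - h * A, x + h * f2(x) + sigma * dW)

A = np.array([[-1000.0]])            # stiff linear drift, |lambda| = 1000
f2 = np.sin                          # mild nonlinearity, handled explicitly
rng = np.random.default_rng(3)
h, sigma = 0.01, 0.1                 # h is 5x the explicit Euler bound 2/1000
x = np.array([1.0])
for _ in range(100):
    x = imex_em_step(x, h, A, f2, sigma, np.sqrt(h) * rng.standard_normal(1))
print(abs(x[0]))    # stays bounded despite the large step size
```

Only a linear solve is required per step, which is the design motivation for IMEX pairings on large discretizations where fully implicit Newton iterations would dominate the cost.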
To test how efficiently the algorithms could solve stiff equations, we chose to analyze the qualitative results of the driven Van der Pol equation, given by Equation 71.
Figure 5:
Additive Noise Convergence Tests.
The error is averaged over 1000 trajectories. Shown is the strong l2 error along the time series of the solution. (A) Convergence results on Equation 65. (B) Convergence results on Equation 67. (C) Convergence results on the IMEX Equation 69. Each test used a range of fixed time steps h.
Figure 6:
SOSRA Efficiency Tests.
The error was taken as the average of 10,000 trajectories for Equation 65 and 100 trajectories for the Lotka-Volterra Equation 70. The sample error was determined for the weak error as the normal 95% confidence interval for the mean, using the variance of the true solution Equation 66 or the variance of the estimated true solutions via low tolerance solutions. The time is the average time to compute a trajectory and is averaged over 1000 runs at the same tolerance or step size. (A) Shown are the work-precision plots for the methods on Equation 65. Each of the adaptive time-stepping methods solved the problem with tol = abstol = reltol swept in increments of 10. The fixed time-stepping methods used step sizes changing by factors of 5. The error is the strong l2 error computed over the time series. (B) Same setup as the previous plot but using the weak error at the final time-point. (C) Shown are the work-precision plots for the methods on Equation 70. Each of the adaptive time-stepping methods solved the problem with tol = abstol = reltol swept in increments of 4. The fixed time-stepping methods used step sizes changing by factors of 12. The error is the strong l2 error computed over the time series. (D) Same setup as the previous plot but using the weak error at the final time-point.

Algorithm         Run-time (seconds)   Relative Time (vs SKenCarp)
SKenCarp          37.23                1.0x
SOSRA             315.58               8.5x
SOSRA2            394.82               10.6x
SRA3              1385.66              37.2x
SRA1              3397.66              91.3x
Euler-Maruyama    5949.19              159.8x
DISTM (θ = )

Table 1: SRA Run times on Van der Pol with additive noise.
The additive noise Van der Pol equation was solved 100 times using the respective algorithms at the highest tolerance, changed by powers of two, which matched the low tolerance solution to plotting accuracy. The fixed time step methods had their ∆t determined as the largest ∆t in increments of powers of 2 that produced no unstable trajectories; this resulted in ∆t = 5e− for the θ-methods. Note that the total times of the drift-implicit stochastic θ-method and the Euler-Maruyama method were determined by extrapolating the time from a single stable trajectory on t ∈ [0, 1] due to time constraints. DISTM is the Drift-Implicit Stochastic θ-Method.

The parameter µ in Equation 71 is the driving factor: as µ increases, the equation becomes more stiff. µ = 10 is a common test for stiff ODE solvers [13], with lower values used to test the semi-stiff regime for ODEs. For our purposes, we chose µ = 10 as a semi-stiff test case. The ODE case, solved using the Tsit5 explicit Runge-Kutta algorithm [39, 30], demonstrates the mild stiffness which is still well-handled by explicit methods (Figure 7A). We extend this model to the driven Van der Pol model with additive noise, Equation 72, where ρ = 3 and dW^(1) and dW^(2) are independent Brownian motions. The solution to this model is interesting because it gives the same qualitative behavior, large bursts when x(t) crosses zero, but in this case the zero crossings are stochastic. Even at high tolerances, SOSRA is able to reproduce this qualitative behavior of the low tolerance solutions (Figure 7B), with SOSRA2 producing similar results at tolerances a factor of two lower. Given the conservativeness of the error estimators shown in previous (and other) tests, this case corresponds to roughly two decimal places of accuracy, which is more than sufficient for many phenomenological models. However, even at much lower tolerances, SRA3 was unable to reproduce the correct qualitative behavior (Figure 7C). Thus we decreased the tolerances by factors of 2 until it was able to reproduce the correct qualitative results (Figure 7D). This shows that the SOSRA methods are more reliable on models with transient stiffness. To test the impact on the run time of the algorithms, each of the algorithms was run 100 times with the tolerance setup that allows it to most efficiently generate correct qualitative results. The run times are shown in Table 1, which shows that SRA1 takes more than 10 times and SRA3 nearly 4 times as long as the SOSRA methods.
In this case the implicit method SKenCarp is the fastest, besting the SOSRA methods by more than 8x while achieving similar qualitative results. This shows that as stiffness comes into play, the SOSRA methods along with the implicit SKenCarp method are more robust and efficient. The fixed time step methods were far less efficient. Adaptive time stepping via rejection sampling was crucial to the success of the SKenCarp method because it required the ability to pull back to a smaller time step when Newton iterations diverged; otherwise the result was time estimates around 5x slower than SOSRA.
In addition to testing efficiency, we used this equation to test the stiffness detection in SOSRA2. Using a safety factor of ω = 5, we added only two lines of code to make the algorithm print out the times at which the algorithm predicts stiffness. The results on two trajectories were computed and are shown in Figure 8. The authors note that the stiffness detection algorithm is surprisingly robust without any tweaking and gives almost no false positives or false negatives on this test problem.
Figure 7:
Representative trajectories for solutions to the Van der Pol equations with additive noise.
Each of these trajectories is solved with the same underlying Brownian process. (A)
The solution to the ODE with the explicit Runge-Kutta method Tsit5. (B)
The solution to the SDE with tolerances abstol = 1, reltol = 1/ from SOSRA. (C) Solution to the SDE with tolerances abstol = 2−, reltol = 2− with SRA3. (D) Solution to the SDE with tolerances abstol = 2−, reltol = 2− with SRA3.

While this safety factor is set somewhat high in comparison to traditional ODE stiffness detection, we note that these algorithms were designed to efficiently handle mild stiffness, and thus we see it as a benefit that they only declare stiffness when it appears to be in the regime which is more suitable for implicit methods.

To test the efficiency, we plotted work-precision diagrams with SRIW1, SOSRI, SOSRI2, and the fixed time step Euler-Maruyama and Runge-Kutta Milstein schemes for Equation 67 and the multiplicative noise Lotka-Volterra Equation 73. As with Equation 70, Equation 73 does not have an analytical solution, so a reference solution was computed using a low tolerance solution via SOSRI for each Brownian trajectory. The results show that there is a minimal difference in efficiency between the SRI algorithms over the tested error range, while these algorithms are all significantly more efficient than the lower order algorithms when the required error is small (Figure 9A-D). The weak error work-precision diagrams show that when using between 100 to 10,000 trajectories, the weak error is less than the sample error in the regime where there is no discernible efficiency difference between the SRI methods. These results show that in the regime of mild accuracy on non-stiff equations, these methods are much more efficient than low order methods yet achieve the same efficiency as the non-stability-optimized SRI variants. Note that these results also show the conservativeness of the error estimators.
Figure 8:
Stiffness detection in the Van der Pol equations with additive noise Equation 72.
Two representative trajectories of Equation 72 are plotted. The green dots indicate time-points where the stiffness detection algorithm detected stiffness.
Figure 9:
SOSRI efficiency on non-stiff test equations.
The error was taken as the average of 10,000 trajectories for Equation 67 and 100 trajectories for the Lotka-Volterra Equation 73. The sample error was determined for the weak error as the normal 95% confidence interval for the mean, using the variance of the true solution Equation 68 or the variance of the estimated true solutions via low tolerance solutions. The time is the average time to compute a trajectory and is averaged over 1000 runs at the same tolerance or step size. (A)
Shown are the work-precision plots for the methods on Equation 67. Each of the adaptive time-stepping methods solved the problem with tol = abstol = reltol swept in increments of 10. The fixed time-stepping methods used step sizes changing by factors of 5. The error is the strong l2 error computed over the time series. (B) Same setup as the previous plot but using the weak error at the final time-point. (C)
Shown are the work-precision plots for the methods on the multiplicative noise Lotka-Volterra Equation 73. Each of the adaptive time-stepping methods solved the problem with tol = abstol = reltol swept in increments of 4. The fixed time-stepping methods used step sizes changing by factors of 12. The error is the strong l2 error computed over the time series. (D) Same setup as the previous plot but using the weak error at the final time-point.

Algorithm   Abstol   Reltol   Run-time (seconds)   Relative Time (vs SOSRI)

Table 2: SRI times for the EMT model on t ∈ [0, 1]. The equations were solved 10,000 times with the given tolerances to completion and the elapsed time was recorded. The fixed time step methods had their ∆t determined as the largest ∆t in increments of powers of 2 that produced no unstable trajectories, as shown in [29]. DISTM is the Drift-Implicit Stochastic θ-Method.

To test the real consequences of the enhanced stability, we use the Epithelial-Mesenchymal Transition (EMT) model of 20 pathwise stiff reaction equations introduced in [14], studied as a numerical test in [29], and written in Section A.7. In the previous work it was noted that t ∈ [0,
1] was a less stiff version of this model. Thus we first tested the speed at which the methods could solve for 10,000 trajectories with no failures due to numerical instabilities. The tolerances were tuned for each method by factors of 2, finding the largest values that were stable. Since SOSRI demonstrated that its stability is much higher than even SOSRI2, we show the effect of tolerance changes on SOSRI as well. The results show that at similar tolerances the SOSRI method takes nearly 5x less time than SRIW1 (Table 2). However, there is an upper bound on the tolerances before the adaptivity is no longer able to help keep the method stable. For SRIW1, this bound is much lower, causing it to run more than 15x slower than the fastest SOSRI setup. Interestingly, SOSRI2 required a higher tolerance than SRIW1 but was 3x faster than SRIW1's fastest setup. We note that SOSRI's highest relative tolerance is essentially requiring 4 digits of accuracy (in strong error) when considering the conservativeness of the error estimator, which is far beyond the accuracy necessary in many cases. Lastly, we note that the SOSRI method is able to solve for 10,000 stable trajectories more than 60x faster than any of the tested fixed time step methods.

We then timed the run time to solve 10 trajectories on t ∈ [0, 500]. Note that SOSRI is able to solve at the loosest tolerances, while the others require more stringent settings (especially in absolute tolerance, as there is a stiff reactant whose values travel close to zero). One interesting point to note is that at similar tolerances both SOSRI and SOSRI2 receive similar timings and both are over 6 times faster than the fastest SRIW1 tolerance setup. Both are nearly twice as fast as SRIW1 when matching tolerances as well. Given that the error estimators are generally around 2 orders of magnitude more conservative than the local error estimate, the low tolerance solutions are accurate enough for many phenomenological experiments and thus present a good speedup over previous methods.
The timings for the Euler-Maruyama and Runge-Kutta Milstein schemes are omitted since the tests were unable to finish. From the results of [29] we note that the smallest dt for SRIW1 on the edge of its stability was approximately 10−. The stability region for fixed step-size Euler-Maruyama is strictly smaller than that of SRIW1 (Figure 4), which suggests that it would require around 5 × as many time steps (with Runge-Kutta Milstein being similar) to solve to t = 500. Extrapolating from the 170 seconds observed on our setup for the completed portion, this projects to approximately 5 years.

Algorithm   Abstol   Reltol   Run-time (seconds)   Relative Time (vs SOSRI)

Table 3:
SRI times for the EMT model on t ∈ [0, 500]. The equations were solved 10 times with the given tolerances to completion and the elapsed time was recorded. The fixed time step methods had their ∆t chosen by incrementing by 10− until 10 consecutive trajectories were stable; Drift-Implicit Euler-Maruyama (DIEM) and Drift-Implicit Runge-Kutta Milstein (DIRKM) used the ∆t so determined.

Algorithm   Abstol   Reltol   Run-time (seconds)   Relative Time (vs SOSRI)

Table 4:
SRI times for the retinoic acid SPDE model. The equations were solved twice with the given tolerances to completion and the elapsed time was recorded. The tolerances were chosen as the highest pair of tolerances which did not diverge (going up by powers of 10). In none of the cases did the two timings vary by more than 1% of the total run time. Euler-Maruyama used fixed time steps.

As another test, we applied the methods to a method of lines discretization of a stochastic partial differential equation (SPDE) describing the spatial regulation of the zebrafish hindbrain via retinoic acid signaling (Section A.8) [31]. The discretization results in a system of 6 ×
100 SDEs. Starting from an initial zero state, a concentration gradient emerges over time. The higher order adaptive methods take more f calls per step but can take far larger steps, resulting in a more efficient solution when high accuracy is not necessary. We note that the drift-implicit stochastic θ-method and drift-implicit θ Runge-Kutta Milstein methods were too inefficient to estimate, since their time steps were constrained to be near that of the Euler-Maruyama method due to divergence of the Newton iterations. This SPDE could also be solved via SKenCarp by using the transformation of Section 4, but from experiments on the PDE we note that efficient solution of the implicit equations would require a preconditioned Krylov method due to the size of the system, and thus it is left for future investigation.
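A method-of-lines SPDE discretization of this flavor can be sketched generically. The example below uses a toy scalar reaction-diffusion equation with additive noise and arbitrary parameters, not the retinoic acid model itself:

```python
import numpy as np

def make_mol_sde(n=100, L=1.0, D=1e-3, k=1.0, sigma=0.01):
    # Method-of-lines discretization of the toy scalar SPDE
    #   du = (D u_xx - k u + 1) dt + sigma dW(t, x)
    # on n grid points with zero-flux boundaries, returning drift f and
    # additive diffusion g for the resulting system of n SDEs.
    dx = L / (n - 1)
    def f(t, u):
        lap = np.empty_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        lap[0] = 2.0 * (u[1] - u[0]) / dx**2      # Neumann boundaries
        lap[-1] = 2.0 * (u[-2] - u[-1]) / dx**2
        return D * lap - k * u + 1.0
    g = lambda t, u: sigma * np.ones_like(u)      # additive noise
    return f, g

f, g = make_mol_sde()
u = np.zeros(100)                                  # initial zero state
rng = np.random.default_rng(0)
h = 1e-3
for _ in range(1000):                              # Euler-Maruyama to t = 1
    u += f(0.0, u) * h + np.sqrt(h) * g(0.0, u) * rng.standard_normal(u.size)
print(u.mean())    # rising toward the steady state 1/k = 1
```

Because the noise is additive, the resulting SDE system is exactly the class handled by the SRA/SKenCarp integrators, and the stiff linear Laplacian term is what makes implicit or stability-optimized treatment attractive as the grid is refined.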
In this work we derived stability-optimized SRK methods for additive and diagonal noise equations, and used a transformation to allow the additive noise methods to solve affine noise problems. Many other equations can be reduced to the additive noise case as well using the same means. Importantly, our derivation methods utilized heavy computational tools in order to approximately optimize otherwise intractable equations. This same method of derivation can easily be scaled up to higher orders, and by incorporating the coefficients for the higher order conditions, efficiency can be optimized as well by adding the norm of the principal error coefficients to the optimization function. The majority of the search was performed using global optimizers running massively in parallel with a hand-optimized CUDA kernel for the numerical integral of the characteristic function, replacing man-hours with core-hours and effectively optimizing the method. The clear next steps are to find SRA and SRI methods with minimal error estimates and sensible stability regions for the cases in which lower strong error matters, along with similar optimizations on SRK methods developed for small noise problems. We note that high strong order methods were investigated because of their better trajectory-wise convergence, allowing for a more robust solution and error estimation, since our application to transiently pathwise stiff equations requires such properties.

In this work we also derived L-stable methods for additive (and thus multiplicative and affine) noise equations, and computationally could not find an A-B-L stable method. While our method does not prove that no 2-stage A-B-L method exists, we have at least narrowed down its possibility. Additionally, an extension of a well-known ESDIRK method to additive noise was developed. These ESDIRK methods have an extension which allows for mass matrices in the problem formulation.
Using singular mass matrices, these methods also present themselves as integrators for a form of SDAEs with deterministic constraints. This method has an implicit-explicit (IMEX) extension, and the stochastic extension was compatible with both tableaus. We showed that this IMEX version of the method could numerically converge at order 2.0 on a test problem (matching the other SRA methods), indicating that it may satisfy the sufficient conditions. As an adaptive high order IMEX method, the ODE version of the method is a common choice for large discretizations of PDEs. Thus this method could present itself as a potentially automatic and efficient option for discretizations of large affine noise SPDEs, being able to use a low number of time steps while minimizing the amount of work required to solve the implicit equation. We note that adaptivity along with efficient stage predictors was required to be more efficient than the common stochastic theta methods, since divergence of quasi-Newton steps can be common if care is not taken. After engineering the method with all of the components together, the benchmark results showed large efficiency gains over both the previous drift-implicit and stability-optimized explicit methods. While previous literature questioned the applicability of L-stable integrators to stochastic differential equations due to high error in the slow variables [25], our computations show that this analysis may be misled by analyzing strong order 0.5 methods. With our higher strong order methods we see sufficiently accurate results on real stiff problems, and this is greatly helped by time stepping adaptivity.

The main caveat for our methods is the restrictions on the form of noise. While we have shown that an enlarged class of problems (affine noise) can be handled by the integrators for additive noise problems, this is still a very special case in the scope of possible SDEs.
Diagonal noise is a much expanded scope but is still constrained, and our implicit methods were only derived for the additive noise case. Further research should focus on the expansion of these techniques to high order adaptive ESDIRK diagonal noise integrators. In addition, when g is non-zero, a "diagonal noise" problem over the complex plane does not have diagonal noise (due to the mixing of real and complex parts from complex multiplication, and reinterpretation as a 2n-dimensional real system). Thus these methods are not applicable to problems defined in the complex plane with complex Wiener processes. Development of similar integrators for commutative noise problems could allow for similar performance benefits on such problems and is a topic for future research.

Additionally, we were not able to sufficiently improve the stability along the noise axis with our explicit diagonal noise methods. However, this is likely due to explicitness in the noise term. Recent research has shown that step splittings which utilize a predicted step in the diffusion calculation can significantly improve the stability of a method [19, 42].
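At the Euler-Maruyama level, this splitting amounts to evaluating the diffusion at a drift-predicted state. The following is a minimal sketch of that idea only (the function and parameter names are illustrative; this is not the SRI scheme conjectured below):

```python
import numpy as np

def split_step_em(f, g, x0, T, N, rng):
    """Split-step Euler-Maruyama: the diffusion g is evaluated at a
    drift-predicted state instead of the current iterate X_n."""
    h = T / N
    x = x0
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(h))
        x_pred = x + f(x) * h              # deterministic predictor step
        x = x + f(x) * h + g(x_pred) * dW  # corrector: g at the predicted state
    return x

# Sanity check on the linear test problem dX = a*X dt + b*X dW,
# for which E[X_T] = x0 * exp(a*T):
a, b, x0, T = -2.0, 0.5, 1.0, 1.0
rng = np.random.default_rng(0)
sample = np.array([split_step_em(lambda x: a * x, lambda x: b * x, x0, T, 256, rng)
                   for _ in range(2000)])
print(sample.mean())  # close to exp(-2) ~ 0.135
```

The same predictor idea applied inside each SRK stage is the step splitting of [42].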
Given this, we conjecture that a form of predictor-correction, such as

X_{n+1} = X_n + Σ_{i=1}^s α_i f(t_n + c_i^(0) h, H_i^(0)) h + Σ_{i=1}^s (β_i^(1) I_(1) + β_i^(2) I_(1,1)/√h + β_i^(3) I_(1,0)/h + β_i^(4) I_(1,1,1)/h) g(t_n + c_i^(1) h, H_i^(1))  (61)

with stages

H̃_i^(1) = X_n + Σ_{j=1}^s A_ij^(1) f(t_n + c_j^(0) h, H_j^(0)) h + Σ_{j=1}^s B_ij^(1) g(t_n + c_j^(1) h, H̃_j^(1)) √h  (62)

H_i^(0) = X_n + Σ_{j=1}^s A_ij^(0) f(t_n + c_j^(0) h, H_j^(0)) h + Σ_{j=1}^s B_ij^(0) g(t_n + c_j^(1) h, H̃_j^(1)) I_(1,0)/h  (63)

H_i^(1) = X_n + Σ_{j=1}^s A_ij^(1) f(t_n + c_j^(0) h, H_j^(0)) h + Σ_{j=1}^s B_ij^(1) g(t_n + c_j^(1) h, H̃_j^(1)) √h  (64)

could improve the noise stability of the method while keeping explicitness and the same tableau. However, proper convergence and stability analysis would require significant effort.

Our timings show that the current high order SRK methods are stability-bound, and that when scientific studies are only looking for small amounts of accuracy in stochastic simulations, most of the computational effort is lost to generating more accurate than necessary solutions in order to satisfy stability constraints. For additive noise problems we were able to obtain solutions about 5x-30x faster, and for diagonal noise approximately 6x faster, than the current adaptive methods (SRA1, SRA3, SRIW1), while common methods like Euler-Maruyama and drift-implicit θ Runge-Kutta Milstein were in many cases hundreds of times slower or could not even finish. We have also shown that these methods are very robust even at high tolerances and have a tendency to produce the correct qualitative results on semi-stiff equations (via plots) even when the user-chosen accuracy is low.
Given that the required user input is minimal and the methods work over a large range of stiffness, we see these as very strong candidates for default general purpose solvers for problem-solving environments such as MATLAB and Julia, since they can easily and efficiently produce results which are sufficiently correct. Due to a choice in the optimization, the SOSRA and SOSRA2 methods are not as efficient at low tolerances as SRA3, so SRA3 should be used when high accuracy is necessary (on additive or affine noise problems). However, in many cases, like integrating to find steady distributions of bistable parameter regimes or generating trajectories of phenomenological models, this ability to quickly get a coarser estimate is valuable.

The stiffness detection in SDEs is a novel addition which we have demonstrated can act very robustly. It has a control parameter ω which can be used to control the false positive and false negative rate as needed. Note that stiff methods can achieve similar largest-eigenvalue estimates directly from the Jacobians of f (and g) given that the methods are implicit (or, in the case of Rosenbrock methods, the Jacobian must still be computed), and thus this can be paired with a stiff solver to allow for automatic switching between stiff and non-stiff solvers. Given that the cost for such stiffness checks is minimal and the demonstrated efficiency of the implicit methods on stiff equations, we are interested in future studies on the efficiency of such composite methods due to the stochastic nature of stiffness in SDEs.

A Appendix I: Test Equations
A.1 Additive Noise Test Equation

dX_t = (β/√(1+t) − X_t/(2(1+t))) dt + (αβ/√(1+t)) dW_t,  X_0 = 1/2,  (65)

where α and β are fixed constants, with true solution

X_t = X_0/√(1+t) + (β/√(1+t)) (t + αW_t).  (66)

A.2 Diagonal Noise Test Equation

dX_t = αX_t dt + βX_t dW_t,  X_0 = 1/2,  (67)

where α and β are fixed constants, with true solution

X_t = X_0 exp((α − β²/2) t + βW_t).  (68)

A.3 Split Additive Test Equation

dX_t = (f_1(t, X_t) + f_2(t, X_t)) dt + (αβ/√(1+t)) dW_t,  X_0 = 1/2,  (69)

where

f_1(t, X_t) = β/√(1+t),  f_2(t, X_t) = −X_t/(2(1+t)),

with true solution Equation 66.

A.4 Additive Noise Lotka-Volterra

dx = (ax − bxy) dt + σ_A dW_t^1
dy = (−cy + dxy) dt + σ_A dW_t^2  (70)

where a, b, c, d, and σ_A are fixed positive parameters (b = 1, c = 3).

A.5 Additive Noise Van Der Pol
The driven Van der Pol equation is

dy = μ((1 − x²)y − x) dt
dx = y dt  (71)

The additive noise variant is

dy = μ((1 − x²)y − x) dt + ρ dW_t^(1)
dx = y dt + ρ dW_t^(2)  (72)

A.6 Multiplicative Noise Lotka-Volterra

dx = (ax − bxy) dt + σ_A x dW_t^1
dy = (−cy + dxy) dt + σ_A y dW_t^2  (73)

with a, b, c, d as in Equation 70 and multiplicative noise amplitude σ_A.

A.7 Epithelial-Mesenchymal Transition Model
The Epithelial-Mesenchymal Transition (EMT) model is given by the following system of SDEs, which corresponds to a chemical reaction network modeled via mass-action kinetics with Hill functions for the feedbacks. This model was introduced in [14]. The system comprises equations for the transcripts [snail1] and [zeb], the proteins [SNAIL] and [ZEB], the microRNAs [miR34] and [miR200], the complexes [SR], [ZR_1] through [ZR_5], and [TR], the TGF-β components [tgf] and [TGF] together with an exogenous input [TGF0](t) switched on after a fixed onset time, and the downstream markers [Ecad], [Vim], [OVOL2], and [OVOL2p]; each equation combines Hill-type production terms and linear degradation terms as given in [14]. The ZEB/miR-200 complexes enter through the weighted sums

Σ_{i=1}^5 iC_i [ZR_i] = 5[ZR_1] + 20[ZR_2] + 30[ZR_3] + 20[ZR_4] + 5[ZR_5],

Σ_{i=1}^5 C_i [ZR_i] = 5[ZR_1] + 10[ZR_2] + 10[ZR_3] + 5[ZR_4] + [ZR_5],

where the C_i are binomial coefficients. The parameter values are given in Table 5.
A.8 Retinoic Acid SPDE Model

d[RA_out] = (β(x) + D Δ[RA_out] − b[RA_out] + c[RA_in]) dt + σ_RA_out dW_t^out

d[RA_in] = (b[RA_out] + δ[BP][RA−RAR] − (γ[BP] + η + α[RA−RAR]/(ω + [RA−RAR]) − c)[RA_in]) dt

d[RA−BP] = (γ[BP][RA_in] + λ[BP][RA−RAR] − (δ + ν[RAR])[RA−BP]) dt

d[RA−RAR] = (ν[RA−BP][RAR] − λ[BP][RA−RAR]) dt + σ_RA−RAR [RA−RAR] dW_t^RA−RAR

d[BP] = (a − λ[BP][RA−RAR] − γ[BP][RA_in] + (δ + ν[RAR])[RA−BP] − u[BP] + d[RA−RAR]/(e + [RA−RAR])) dt

d[RAR] = (ζ − ν[RA−BP][RAR] + λ[BP][RA−RAR] − r[RAR]) dt

where β(x) = β H(x − 40), with H the Heaviside step function and x = 40 the edge of retinoic acid production [31]. The space was chosen as a rectangular domain with Δx = Δy = 5. The boundary conditions were no-flux on every side except the right side, which had leaky boundary conditions with leakiness parameter kA.

B Appendix II: SKenCarp, SOSRA, and SOSRI Tableaus
All entries not listed are zero.

The parameters of the retinoic acid model (σ_RA_in, σ_RA−RAR, σ_RA_out, ω, u, γ, d, α, δ, e, β, η, a, c, r, ζ, ν, λ, and the diffusion constant D) take the values given in [31].

B.1 SKenCarp Exact Values

The stochastic extension of SKenCarp requires only two nonzero entries of B^(0), each given exactly as a difference of constants K_1, ..., K_4 that are defined by exact rational and radical expressions; all other entries satisfy B^(0)_{i,j} = 0.

B.2 SOSRA
The SOSRA tableau consists of the three-stage coefficients α_i, β^(1)_i, β^(2)_i, c^(0)_i, c^(1)_i, A^(0)_{i,j}, and B^(0)_{i,j}, specified numerically to full double precision.

B.3 SOSRA2

The SOSRA2 tableau consists of the same coefficient set, likewise specified to full double precision.

B.4 SOSRI
The SOSRI tableau consists of the four-stage coefficients α_i, β^(1)_i through β^(4)_i, c^(0)_i, c^(1)_i, A^(0)_{i,j}, A^(1)_{i,j}, B^(0)_{i,j}, and B^(1)_{i,j}, specified numerically to full double precision.

B.5 SOSRI2

The SOSRI2 tableau consists of the same coefficient set, likewise specified to full double precision.

C Appendix III: SRK Order Conditions
C.1 Order Conditions for Rößler-SRI Methods
The coefficients (A^(i), B^(i), β^(i), α) must satisfy the following order conditions to achieve order 0.5:

1. α^T e = 1
2. β^(1)T e = 1
3. β^(2)T e = 0
4. β^(3)T e = 0
5. β^(4)T e = 0

additionally, for order 1.0:

1. β^(1)T B^(1) e = 0
2. β^(2)T B^(1) e = 1
3. β^(3)T B^(1) e = 0
4. β^(4)T B^(1) e = 0

and lastly for order 1.5:

1. α^T A^(0) e = 1/2
2. α^T B^(0) e = 1
3. α^T (B^(0) e)² = 3/2
4. β^(1)T A^(1) e = 1
5. β^(2)T A^(1) e = 0
6. β^(3)T A^(1) e = −1
7. β^(4)T A^(1) e = 0
8. β^(1)T (B^(1) e)² = 1
9. β^(2)T (B^(1) e)² = 0
10. β^(3)T (B^(1) e)² = −1
11. β^(4)T (B^(1) e)² = 2
12. β^(1)T (B^(1) (B^(1) e)) = 0
13. β^(2)T (B^(1) (B^(1) e)) = 0
14. β^(3)T (B^(1) (B^(1) e)) = 0
15. β^(4)T (B^(1) (B^(1) e)) = 1
16. (1/2) β^(1)T (A^(1) (B^(0) e)) + (1/3) β^(3)T (A^(1) (B^(0) e)) = 0

where f and g are sufficiently smooth, c^(i) = A^(i) e, and e = (1, 1, 1, 1)^T [32].

C.2 Order Conditions for Rößler-SRA Methods
The coefficients (A^(0), B^(0), β^(i), α) must satisfy the conditions for order 1.0:

1. α^T e = 1
2. β^(1)T e = 1
3. β^(2)T e = 0

and the additional conditions for order 1.5:

1. α^T B^(0) e = 1
2. α^T A^(0) e = 1/2
3. α^T (B^(0) e)² = 3/2
4. β^(1)T c^(1) = 1
5. β^(2)T c^(1) = −1

where c^(0) = A^(0) e, with f sufficiently smooth and g ∈ C¹(I, ℝ^d) [32].

D Appendix IV: Derivation Details

For the linear test equation with drift μX and diffusion σX, the implicit stages satisfy

(I − μΔt A^(0)) H^(0) = U_n + σ (I_(1,0)/Δt) B^(0) H^(1),
(I − σ√Δt B^(1)) H^(1) = U_n + μΔt A^(1) H^(0).

Solving the second equation for H^(1) and substituting into the first,

(I − μΔt A^(0)) H^(0) = U_n + σ (I_(1,0)/Δt) B^(0) (I − σ√Δt B^(1))^(−1) (U_n + μΔt A^(1) H^(0)),

(I − μΔt A^(0) − μσ I_(1,0) B^(0) (I − σ√Δt B^(1))^(−1) A^(1)) H^(0) = (I + σ (I_(1,0)/Δt) B^(0) (I − σ√Δt B^(1))^(−1)) U_n,

H^(0) = (I − μΔt A^(0) − μσ I_(1,0) B^(0) (I − σ√Δt B^(1))^(−1) A^(1))^(−1) (I + σ (I_(1,0)/Δt) B^(0) (I − σ√Δt B^(1))^(−1)) U_n.

Likewise, solving the first equation for H^(0) and substituting into the second,

(I − σ√Δt B^(1)) H^(1) = U_n + μΔt A^(1) (I − μΔt A^(0))^(−1) (U_n + σ (I_(1,0)/Δt) B^(0) H^(1)),

(I − σ√Δt B^(1) − μσ I_(1,0) A^(1) (I − μΔt A^(0))^(−1) B^(0)) H^(1) = (I + μΔt A^(1) (I − μΔt A^(0))^(−1)) U_n,

H^(1) = (I − σ√Δt B^(1) − μσ I_(1,0) A^(1) (I − μΔt A^(0))^(−1) B^(0))^(−1) (I + μΔt A^(1) (I − μΔt A^(0))^(−1)) U_n.

The update is then

U_{n+1} = U_n + μΔt (α · H^(0)) + σ (I_(1) β^(1) + (I_(1,1)/√Δt) β^(2) + (I_(1,0)/Δt) β^(3) + (I_(1,1,1)/Δt) β^(4)) · H^(1),

with H^(0) and H^(1) as above. Thus we substitute in the Wiktorsson approximations

I_(i,i) = (1/2)((ΔW)² − h),
I_(i,i,i) = (1/6)((ΔW)³ − 3hΔW),
I_(i,0) = (1/2) h (ΔW + (1/√3) ΔZ),

where ΔZ ~ N(0, h) is independent of ΔW ~ N(0, h). By the properties of the normal distribution, we have that E[(ΔW)^n] = 0 for any odd n and

E[(ΔW)²] = h,  E[(ΔW)⁴] = 3h²,  E[(ΔW)⁶] = 15h³,  E[(ΔW)⁸] = 105h⁴,

and similarly for ΔZ.

Acknowledgments
We would like to thank the members of JuliaDiffEq, specifically David Widmann (@devmotion), Yingbo Ma (@YingboMa), and (@dextorious), for their contributions to the ecosystem. Their efforts have helped make the development of efficient implementations possible.
References [1]
A. Abdulle and S. Cirilli, S-ROCK: Chebyshev methods for stiff stochastic differential equations, SIAM Journal on Scientific Computing, 30 (2008), pp. 997–1014, https://doi.org/10.1137/070679375.
[2] D. I. Ketcheson and A. J. Ahmadia, Optimal stability polynomials for numerical integration of initial value problems, CAMCOS, 7 (2012), pp. 247–271, https://doi.org/10.2140/camcos.2012.7.247.
[3]
E. G. Birgin and J. M. Martínez, Improving ultimate convergence of an augmented Lagrangian method, Optimization Methods and Software, 23 (2008), pp. 177–195, https://doi.org/10.1080/10556780701577730.
[4] K. Burrage and J. C. Butcher, Non-linear stability of a general class of differential equation methods, BIT Numerical Mathematics, 20 (1980), pp. 185–203, https://doi.org/10.1007/BF01933191.
[5] J. C. Butcher, A history of Runge-Kutta methods, Applied Numerical Mathematics, 20 (1996), pp. 247–260, https://doi.org/10.1016/0168-9274(95)00108-5.
[6] J. C. Butcher, Numerical methods for ordinary differential equations in the 20th century, Journal of Computational and Applied Mathematics, 125 (2000), pp. 1–29, https://doi.org/10.1016/S0377-0427(00)00455-6.
[7] A. Conn, N. Gould, and P. Toint, A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds, SIAM Journal on Numerical Analysis, 28 (1991), pp. 545–572, https://doi.org/10.1137/0728030.
[8] J. R. Dormand and P. J. Prince, A family of embedded Runge-Kutta formulae, Journal of Computational and Applied Mathematics, 6 (1980), pp. 19–26, https://doi.org/10.1016/0771-050X(80)90013-3.
[9] I. Dunning, J. Huchette, and M. Lubin, JuMP: A modeling language for mathematical optimization, SIAM Review, 59 (2017), pp. 295–320, https://doi.org/10.1137/15M1020575.
[10] H. Gilsing and T. Shardlow, SDELab: A package for solving stochastic differential equations in MATLAB, Journal of Computational and Applied Mathematics, 205 (2007), pp. 1002–1018, https://doi.org/10.1016/j.cam.2006.05.037.
[11] E. Hairer, S. P. Nørsett, and G.
Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, Springer Series in Computational Mathematics, Springer, Heidelberg; London, 2nd rev. ed., 2009.
[12]
E. Hairer and G. Wanner , Solving Ordinary Differential Equations II - Stiff and Differential-Algebraic Problems , Springer, 1991.[13]
E. Hairer and G. Wanner, Stiff differential equations solved by Radau methods, Journal of Computational and Applied Mathematics, 111 (1999), pp. 93–111, https://doi.org/10.1016/S0377-0427(99)00134-X.
[14] T. Hong, K. Watanabe, C. H. Ta, A. Villarreal-Ponce, Q. Nie, and X. Dai, An Ovol2-Zeb1 mutual inhibitory circuit governs bidirectional and multi-step transition between epithelial and mesenchymal states, PLoS Comput Biol, 11 (2015), p. e1004569, https://doi.org/10.1371/journal.pcbi.1004569.
[15] M. E. Hosea and L. F. Shampine, Analysis and implementation of TR-BDF2, Applied Numerical Mathematics, 20 (1996), pp. 21–37, https://doi.org/10.1016/0168-9274(95)00115-8.
[16] A. Janicki, A. Izydorczyk, and P. Gradalski, Computer Simulation of Stochastic Models with SDE-Solver Software Package, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003, pp. 361–370, https://doi.org/10.1007/3-540-44860-8_37.
[17] S. G. Johnson, The NLopt nonlinear-optimization package, http://ab-initio.mit.edu/nlopt.
[18] C. A. Kennedy and M. H. Carpenter, Additive Runge-Kutta schemes for convection-diffusion-reaction equations, Applied Numerical Mathematics, 44 (2003), pp. 139–181, https://doi.org/10.1016/S0168-9274(02)00138-1.
[19] P. Kloeden and A. Neuenkirch, Convergence of numerical methods for stochastic differential equations in mathematical finance, 2012, https://doi.org/10.1142/9789814436434_0002.
[20]
P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, Springer Berlin Heidelberg, 2011, https://books.google.com/books?id=BCvtssom1CMC.
[21]
Y. Komori and K. Burrage, Weak second order S-ROCK methods for Stratonovich stochastic differential equations, Journal of Computational and Applied Mathematics, 236 (2012), pp. 2895–2908, https://doi.org/10.1016/j.cam.2012.01.033.
[22] Y. Komori and K. Burrage, Strong first order S-ROCK methods for stochastic differential equations, Journal of Computational and Applied Mathematics, 242 (2013), pp. 261–274, https://doi.org/10.1016/j.cam.2012.10.026.
[23] L. F. Shampine, Some practical Runge-Kutta formulas, Math. Comput., 46 (1986), pp. 135–150, https://doi.org/10.2307/2008219.
[24]
J. Lawson, An order five Runge-Kutta process with extended region of stability, SIAM Journal on Numerical Analysis, 3 (1966), pp. 593–597, https://doi.org/10.1137/0703051.
[25] T. Li, A. Abdulle, and W. E, Effectiveness of implicit methods for stiff stochastic differential equations, Commun. Comput. Phys, 3 (2008), pp. 295–307.
[26]
X. Mao, The truncated Euler-Maruyama method for stochastic differential equations, Journal of Computational and Applied Mathematics, 290 (2015), pp. 370–384, https://doi.org/10.1016/j.cam.2015.06.002.
[27] J. K. Møller and H. Madsen, From state dependent diffusion to constant diffusion in stochastic differential equations by the Lamperti transform, report, Technical University of Denmark, DTU Informatics, Building 321, 2010.
[28]
M. J. D. Powell, A direct search optimization method that models the objective and constraint functions by linear interpolation, in Advances in Optimization and Numerical Analysis, Proceedings of the 6th Workshop on Optimization and Numerical Analysis, Oaxaca, Mexico, S. Gomez and J.-P. Hennart, eds., vol. 275, Kluwer Academic Publishers, pp. 51–67.
[29] C. Rackauckas and Q. Nie, Adaptive methods for stochastic differential equations via natural embeddings and rejection sampling with memory, Discrete and Continuous Dynamical Systems - Series B, 22 (2016), pp. 2731–2761, https://doi.org/10.3934/dcdsb.2017133.
[30] C. Rackauckas and Q. Nie, DifferentialEquations.jl - a performant and feature-rich ecosystem for solving differential equations in Julia, Journal of Open Research Software, 5 (2017), p. 15, https://doi.org/10.5334/jors.151.
[31]
C. Rackauckas and Q. Nie, Mean-independent noise control of cell fates via intermediate states, iScience, Accepted (2018).
[32]
A. Rößler, Runge-Kutta methods for the strong approximation of solutions of stochastic differential equations, SIAM Journal on Numerical Analysis, 48 (2010), pp. 922–952, https://doi.org/10.1137/09076636x.
[33]
T. P. Runarsson and X. Yao, Search biases in constrained evolutionary optimization, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 35 (2005), pp. 233–243, https://doi.org/10.1109/TSMCC.2004.841906.
[34]
T. Schaffter , From genes to organisms: Bioinformatics System Models and Software , thesis, 2014.[35]
L. F. Shampine, Stiffness and nonstiff differential equation solvers, II: Detecting stiffness with Runge-Kutta methods, ACM Trans. Math. Softw., 3 (1977), pp. 44–53, https://doi.org/10.1145/355719.355722.
[36]
L. F. Shampine and K. L. Hiebert, Detecting stiffness with the Fehlberg (4, 5) formulas, Computers & Mathematics with Applications, 3 (1977), pp. 41–46, https://doi.org/10.1016/0898-1221(77)90112-2.
[37] W. H. Enright, D. J. Higham, B. Owren, and P. W. Sharp, A survey of the explicit Runge-Kutta method, (1995).
[38]
G. Söderlind and L. Wang, Evaluating numerical ODE/DAE methods, algorithms and software, Journal of Computational and Applied Mathematics, 185 (2006), pp. 244–260, https://doi.org/10.1016/j.cam.2005.03.009.
[39] C. Tsitouras, Runge-Kutta pairs of order 5(4) satisfying only the first column simplifying assumption, Computers & Mathematics with Applications, 62 (2011), pp. 770–775, https://doi.org/10.1016/j.camwa.2011.06.002.
[40] P. J. van der Houwen, Explicit Runge-Kutta formulas with increased stability boundaries, Numerische Mathematik, 20 (1972), pp. 149–164, https://doi.org/10.1007/BF01404404.
[41] B. Wang and Q. Qi, Modeling the lake eutrophication stochastic ecosystem and the research of its stability, Mathematical Biosciences, https://doi.org/10.1016/j.mbs.2018.03.019.
[42] P. Wang and Y. Li, Split-step forward methods for stochastic differential equations, Journal of Computational and Applied Mathematics, 233 (2010), pp. 2641–2651, https://doi.org/10.1016/j.cam.2009.11.010.
[43] M. Wiktorsson, Joint characteristic function and simultaneous simulation of iterated Itô integrals for multiple independent Brownian motions, The Annals of Applied Probability, (2001), pp. 470–487, https://doi.org/10.1214/aoap/1015345301.