Autonomous learning of nonlocal stochastic neuron dynamics
Tyler E. Maltba a,1 , Hongli Zhao b,2 , Daniel M. Tartakovsky c,3, ∗ a Department of Statistics, UC Berkeley, Berkeley, CA 94720, USA. b Department of Mathematics, UC Berkeley, Berkeley, CA 94720, USA. c Department of Energy Resources Engineering, Stanford University, Stanford, CA 94305, USA.
Abstract
Neuronal dynamics is driven by externally imposed or internally generated random excitations/noise, and is often described by systems of stochastic ordinary differential equations. A solution to these equations is the joint probability density function (PDF) of neuron states. It can be used to calculate such information-theoretic quantities as the mutual information between the stochastic stimulus and various internal states of the neuron (e.g., membrane potential), as well as various spiking statistics. When random excitations are modeled as Gaussian white noise, the joint PDF of neuron states satisfies exactly a Fokker-Planck equation. However, most biologically plausible noise sources are correlated (colored). In this case, the resulting PDF equations require a closure approximation. We propose two methods for closing such equations: a modified nonlocal large-eddy-diffusivity closure and a data-driven closure relying on sparse regression to learn relevant features. The closures are tested for stochastic leaky integrate-and-fire (LIF) and FitzHugh-Nagumo (FHN) neurons driven by sine-Wiener noise. Mutual information and total correlation between the random stimulus and the internal states of the neuron are calculated for the FHN neuron.
Keywords:
Stochastic neuron model, Colored noise, Method of distributions, Nonlocal, Equation learning
1. Introduction
Computational neuroscience often employs the theory of nonlinear dynamical systems to quantitatively understand complex neuronal processes [1]. A key aspect of the dynamical-systems approach is the ability to discover relevant properties of brain function without accounting for every detail. It comprises coupled ordinary differential equations (ODEs), with the number of ODEs determining the dimensionality of a dynamical system. Examples of such neuron representations are the one-dimensional leaky integrate-and-fire (LIF) model [2], the two-dimensional FitzHugh-Nagumo (FHN) [3] and Morris-Lecar [4] models, and the four-dimensional Hodgkin-Huxley model [5]. To account for the apparent randomness of neuron behavior (e.g., due to synaptic, internal, and/or channel noise), such models include a stochastic noise/source term. To account for parametric uncertainty, the coefficients in such models might be treated as random variables. In both cases, solutions to the resulting stochastic ODEs take the form of joint probability density functions (PDFs) of the system states. These PDFs can be either used directly to characterize the ensemble behavior of large populations of neurons [6] or post-processed to compute interspike-interval (ISI) distributions [7, 8], first-exit-time (FET) distributions [9, 10], and the mutual information between different system states or between a system state and a stochastic input [11].

∗ Corresponding author
1 Supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE 1752814
2 Supported by the UC Berkeley SURF L&S Program
3 Supported in part by the Air Force Office of Scientific Research under award number FA9550-18-1-0474, and by a gift from TOTAL.
Preprint submitted to arXiv.org November 24, 2020

Synaptic noise reflects randomness in the release of neurotransmitters by synapses and random inputs from other neurons. Internal noise originates from the operations of ionic channels, and channel noise stems from the random switching of such channels. Regardless of its genesis, a random source in stochastic ODEs (e.g., the ionic current in the LIF and FHN models) is frequently treated as Gaussian white noise [7]. While mathematically and computationally convenient, Gaussian white noise is biologically implausible, since it is unbounded and uncorrelated. For that reason, sine-Wiener (SW) channel noise, a bounded and correlated (colored) non-Gaussian process, has been used to model signal transmission [14], resonance [15], and detection [16] in nervous systems; to induce spike death in excitable systems [17]; and to detect a weak periodic signal in the FHN neuron [18].

One of the attractive features of Gaussian white noise is that the joint PDF of a solution to a stochastic ODE-based neuron model, like a solution to any Langevin system, satisfies exactly the Fokker-Planck equation [19]. This deterministic advection-diffusion partial differential equation, whose dimensionality is determined by the number of state variables/ODEs in the underlying stochastic model, is no longer exact in the presence of colored noise.
A joint PDF of neuron states, or of other quantities such as ISI or FET, can be estimated via (multilevel) Monte Carlo simulations [20, 21, 9, 22], but this procedure requires significant computational resources and sheds little light on the probabilistic dynamics of the neuron. More efficient methods of stochastic computation, e.g., polynomial chaos expansions and stochastic finite elements, do not provide a computational speed-up in this setting because colored noise is characterized by a short correlation length, leading to the so-called curse of dimensionality (see, e.g., the references in [23]). The method of distributions [24] can be used to derive a PDF equation for stochastic ODEs with colored noise [23, 25]. In the limit of vanishing correlation length, i.e., when colored noise reduces to white noise, such a PDF equation becomes exact, reducing to the corresponding Fokker-Planck equation. Otherwise, it requires a closure approximation to be computable.

We propose two alternative approaches to the construction of such closures: nonlocal large-eddy-diffusivity (LED) closures [26] and a data-driven closure relying on statistical inference, i.e., dictionary learning, of relevant derivative features in PDF equations [27]. To the best of our knowledge, neither the method of distributions nor statistical learning of PDF equations from a few Monte Carlo runs has been used in computational neuroscience before. We investigate the relative performance of these strategies on two stochastic neuron models of varying degrees of complexity, the randomized LIF and FHN models. The LIF model is by far the simplest biologically plausible neuron model, which is advantageous for simulating large coupled networks. Yet it is limited in its ability to capture the basic geometry of a neuron's excitability. The FHN model overcomes this limitation and captures the key features of the Hodgkin-Huxley model.
Our stochastic analogues of the LIF and FHN models account for SW channel noise.

In Section 2, we present the method of distributions in the context of Langevin-type systems driven by colored noise and describe alternative approaches to the construction of closures for the resulting PDF equations. Our numerical strategies for solving these equations are discussed in Section 3. The method's performance and the errors associated with the proposed closures are investigated for the stochastic LIF and FHN models in Section 4. That section also contains our strategy for computing the mutual information between channel noise (or a stochastic input current) and a neuron's membrane potential. The main conclusions of this study are summarized in Section 5.
2. Method of Distributions
Consider a system of stochastic ordinary differential equations (SDEs),

dx(t,ω)/dt = v(x(t,ω), t; ξ(t,ω)),  x(0,ω) = x₀(ω),  (1)

which holds for all t ∈ (0, T_f) and almost every ω ∈ Ω in an appropriate probability space (Ω, F, P). The solution x(t,ω) = [x₁(t,ω), …, x_N(t,ω)]⊤ : (0, T_f) × Ω → R^N is an R^N-valued stochastic process, and the initial state x₀ is an N-dimensional random vector with the joint probability density function (PDF) f_{x₀}(X) : R^N → R⁺. The given deterministic function v = [v₁, …, v_N]⊤ : R^N → R^N, parameterized with a set of N_par random coefficients ξ(t,ω) = [ξ₁(t,ω), …, ξ_{N_par}(t,ω)]⊤, satisfies conditions guaranteeing the existence of a unique path-wise solution x(t,ω) (see, for instance, [28, 29]). The zero-mean random processes ξ_i(t,ω) are characterized by a prescribed single-point joint PDF f_ξ(Ξ; t) and a two-point covariance matrix C_ξ with components C_ik = E[ξ_i(t₁,ω) ξ_k(t₂,ω)], where i, k = 1, …, N_par and t₁, t₂ ∈ (0, T_f). Here and below we use E[·] and ⟨·⟩ interchangeably to denote the ensemble mean.

Let X = [X₁, …, X_N]⊤ ∈ R^N be a variable in the system's phase space. At any given time t, the state of the system is (partially) characterized by the joint probability P[x(t,ω) ≤ X] ≡ F_x(X; t). If this single-point joint cumulative distribution function F_x(X; t) is differentiable with respect to all X_i, then the system is described by the corresponding single-point joint PDF f_x(X; t).

Due to the large number of these channels, channel noise is often ignored by arguing that fluctuations average out [12]. However, channel noise in and of itself can change the behavior of neurons [13]; hence, it is important to study its effects on a neuron's dynamics.
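Each Monte Carlo realization of (1) is a deterministic ODE solve for one frozen noise path. The sketch below shows such a pathwise solve with a fixed-step classical RK4 integrator; the helper name and the illustrative right-hand side are ours, not the paper's (the paper uses an adaptive scheme), and the smooth "noise" surrogate merely stands in for a sampled realization of ξ(t).

```python
import numpy as np

def rk4_path(v, x0, xi, t_grid):
    """Classical fixed-step RK4 for dx/dt = v(x, t, xi(t)), with the
    random input frozen at one realization xi (here a plain function)."""
    x = np.empty_like(t_grid)
    x[0] = x0
    for n in range(t_grid.size - 1):
        t = t_grid[n]
        h = t_grid[n + 1] - t
        k1 = v(x[n], t, xi(t))
        k2 = v(x[n] + 0.5 * h * k1, t + 0.5 * h, xi(t + 0.5 * h))
        k3 = v(x[n] + 0.5 * h * k2, t + 0.5 * h, xi(t + 0.5 * h))
        k4 = v(x[n] + h * k3, t + h, xi(t + h))
        x[n + 1] = x[n] + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

# Illustrative linear system dx/dt = -x + xi(t) with a smooth surrogate input.
t_grid = np.linspace(0.0, 1.0, 201)
path = rk4_path(lambda x, t, z: -x + z, 1.0, np.sin, t_grid)
```

For this linear test case the exact solution is x(t) = (3/2)e⁻ᵗ + (sin t − cos t)/2, which the integrator reproduces to high accuracy.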
Our goal is to derive a deterministic equation for the PDF f_x(X; t) of x(t,ω), the random solution to (1).

If all the parameters ξ_i(t,ω) are uncorrelated in time (white noise), i.e., if C_ik ∼ δ(t₁ − t₂), where δ(·) is the Dirac delta function, then f_x(X; t) satisfies exactly a Fokker-Planck (advection-diffusion) equation [19]. Otherwise, a governing equation for f_x(X; t), henceforth referred to as a PDF equation, is not unique, and its derivation requires a closure approximation (e.g., [23] and the references therein). In general, correlated inputs render such equations nonlocal in both phase space and time [ibid]. We focus on two interconnected challenges: numerical treatment of nonlocal PDF equations, and statistical inference (aka learning) of the relevant closure terms in such equations from a few Monte Carlo realizations of (1).

We use the implementation of the method of distributions that treats the random noise ξ(t) exactly. This is in contrast to implementations based on truncated expansions of random inputs, e.g., in a Karhunen-Loève series [30, 31]. Such expansions are not appropriate for colored noise ξ(t) because of its short correlation lengths. Derivation of a PDF equation starts by defining an auxiliary functional [24]

Π(X, t; ω) = δ(x(t,ω) − X) = ∏_{i=1}^N δ(x_i(t,ω) − X_i).  (2)

At any time t, its ensemble mean over all realizations of x is the PDF f_x(X; t):

E[Π] ≡ ∫_{R^N} δ(Y − X) f_x(Y; t) dY = f_x(X; t).

We show in Appendix A that Π(X, t; ω) obeys, in the sense of distributions, the linear conservation law

∂Π/∂t + ∇_X · [v(X, t; ω) Π] = 0,  Π(X, 0) = δ(x₀(ω) − X).  (3)

This equation describes, e.g., advection, in the phase space X, of a passive scalar (with concentration Π) in the random velocity field v. In that context, its stochastic averaging/homogenization has been the subject of multiple investigations [26].

A typical homogenization procedure employs the Reynolds decomposition to represent the random inputs and outputs as the sums of their means and zero-mean fluctuations around these means. For example, v = ⟨v⟩ + v′, where

⟨v⟩(X, t) = ∫_{R^{N_par}} v(X, t; Ξ) f_ξ(Ξ) dΞ  and  ⟨v′⟩ = 0.

Likewise, the Reynolds decomposition of Π is Π = f_x + Π′. Substituting these decompositions into (3) and taking the ensemble mean yields an unclosed PDF equation

∂f_x/∂t + ∇_X · (⟨v⟩ f_x) + ∇_X · ⟨v′Π′⟩ = 0,  f_x(X; 0) = f_{x₀}(X),  (4)

with vanishing free-space boundary conditions. This equation is unclosed because it contains the unknown flux term ⟨v′Π′⟩. Following [32, 25], we consider three alternative versions of the large-eddy-diffusivity (LED) closure: local, semi-local, and nonlocal. In all three formulations, the terminal problem (Appendix A.2)

dχ/ds = ⟨v(χ, s)⟩,  χ(t) = X,  (5)

and its associated flow χ(s) = Φ(s; X, t) play an essential role. Whenever possible, we omit ω for notational convenience.

The local closure is constructed by a direct application of the classical LED approach in fluid dynamics [26]. Under that approach, the cross-covariance ⟨v′Π′⟩(X, t) is split into advection and diffusion components,

⟨v′Π′⟩ ≈ V(X, t) f_x(X; t) − D(X, t) ∇_X f_x(X; t),  (6)

where the LED drift velocity V and the LED diffusion tensor D are given in (A.16) and (A.17), respectively.
Underpinning this closure is the assumption that the density f_x and its spatial derivatives vary slowly over a correlation time interval (t − τ, t). Therefore, this closure is expected to be valid for systems with short correlation time scales.

The nonlocal closure does not rely on short correlations to the extent of the local closure and is second-order accurate in στ (see [33] and Appendix A.2),

⟨v′Π′⟩ ≈ − ∫_0^t J(s; X, t) ∇_Φ · ( ⟨v′(X, t) v′⊤(Φ(s; X, t), s)⟩ f_x(Φ(s; X, t); s) ) ds,  (7a)

where

J(s; X, t) = exp( − ∫_s^t ∇_χ · ⟨v(χ(r), r)⟩ dr ).  (7b)

For a given dimension N, the computational complexity of the nonlocal PDF method is dominated by the work of solving the terminal problem (5) and evaluating the Jacobian (7b). The latter is dramatically simplified if the flow divergence ∇_χ · ⟨v⟩ is linear in χ, which occurs in such classical test problems as the nonlinear pendulum and the Duffing oscillator; analyses of such problems, e.g., [34], are likely to give an overly optimistic assessment of the PDF methods. The computational experiments reported in Section 4 are chosen to have nonlinear flow divergence ∇_χ · ⟨v⟩.

The semi-local LED closures [32, 25] aim to combine the simplicity and speed of the local closure with (most of) the accuracy of the nonlocal one. One such closure is derived in Appendix A.3,

⟨v′Π′⟩ ≈ − ∫_0^t J(s; X, t) exp((t − s) J(X, t))⊤ ∇_X · ( ⟨v′(X, t) v′⊤(Φ(s; X, t), s)⟩ J⁻¹(s; X, t) f_x(X; t) ) ds,  (8)

where J is the Jacobian of the mean-field velocity, J(X, t) = ∂⟨v(X, t)⟩/∂X. Numerical evaluation of Φ(s; X, t) in J is computationally expensive.
To alleviate this cost, we approximate it by its terminal condition, Φ(s; X, t) ≈ X, resulting in the approximate Jacobian J̃.

Remark 2.1. As an alternative, one can approximate Φ(s; X, t) via a higher-order Taylor expansion around the terminal condition (X, t). Our numerical experiments in Section 4 reveal that this approximation gives no significant improvement in accuracy, while presenting more numerical challenges than the terminal approximation.

Evaluation of the inverse of the approximate Jacobian, J̃⁻¹, introduces additional challenges that would otherwise be absent if J⁻¹ were known exactly, including, but not limited to, singularities and/or negative values of ⟨v′Π′⟩(X, t). Many of these issues may be resolved by approximating J̃⁻¹ with a nonnegative spline. Another option, which entirely circumvents such issues, is to use a simpler but slightly less accurate semi-local closure,

⟨v′Π′⟩ ≈ − ∫_0^t J̃(s; X, t) exp((t − s) J(X, t))⊤ ∇_X · ( ⟨v′(X, t) v′⊤(Φ(s; X, t), s)⟩ f_x(X; t) ) ds.  (9)

It is obtained by approximating f_x(Φ(s; X, t); s) ≈ f_x(X; t) in (7), and amounts to assuming the PDF f_x to be approximately constant over the interval (t − τ, t).

Depending on the application, the integrals in (A.16), (A.17), (8), and (9) may or may not be analytically computable. If they cannot be evaluated analytically, their numerical evaluation can be computationally expensive, and prohibitively so in high dimensions. As an alternative to the LED closures presented above, we explore empirical (data-informed) closures of (4).
The latter are constructed by representing the unknown term ⟨v′Π′⟩(X, t) as a differential operator

⟨v′Π′⟩ ≈ β₀(X, t) f_x(X; t) + D(X, t) ∇_X f_x(X; t) + higher-order derivatives,  (10)

where D is a second-order positive semi-definite tensor with components D_ik (i, k = 1, …, N). The M real-valued coefficients β = {β₀, D₁₁, …, D_NN, …} are learned from post-processed Monte Carlo realizations of (1) by solving the following optimization problem.

For simplicity, we consider a square spatial domain D ⊂ R^N and discretize it with a uniform square mesh X_n^k = X_n^0 + kΔX for k ∈ {0, …, K} in each spatial dimension n. For the temporal domain (0, T_f), we also take the uniform grid t_l = lΔt for l ∈ {0, …, L}. Let f̂_x ∈ R^{K×…×K×L} with entries f̂_x^{k₁…k_N l} ≡ f̂_x(X₁^{k₁}, …, X_N^{k_N}; t_l) denote a joint PDF computed via post-processing of Monte Carlo realizations of (1) with a kernel density estimator. We emphasize that f̂_x is not the converged Monte Carlo solution, f_MC, of (1). Instead, f̂_x is computed using only a small fraction of the number of trials needed to compute the converged solution f_MC with a prescribed accuracy. Furthermore, we denote by L̂ a suitable numerical discretization (e.g., finite difference, finite volume, or total-variation-regularized differentiation) of the differential operator L ≡ ∇_X · (⟨v⟩ f_x + ⟨v′Π′⟩) in (4), with ⟨v′Π′⟩ given by (10). Finally, we introduce the discretized residual

R^{k₁…k_N l}(β) = ∂f̂_x^{k₁…k_N l}/∂t + L̂(f̂_x^{k₁…k_N l}; β).

The optimal variable-coefficient vector β is one that satisfies

β̂(X, t) = argmin_{β(X,t)} { (1/(K^N L)) Σ_{k₁=1}^K … Σ_{k_N=1}^K Σ_{l=1}^L ||R^{k₁…k_N l}(β)||₂² + λ||β||₁ },  (11)

where ||·||₂ is the L² norm, ||·||₁ is the L¹-norm penalty ensuring sparsification of features, and λ is a hyper-parameter iteratively chosen through a geometric sequence [35, 36, 27].

In order to generalize the optimization problem in (11), one must balance a bias-variance tradeoff when designing a hypothesis class, i.e., choosing the order of derivatives in (10). A large hypothesis class, represented above by M variable coefficients β, minimizes bias at the cost of high variance. Such a class is more likely to fit an appropriate operator to the data in f̂_x, albeit at the cost of possibly making the equations too difficult to manipulate analytically. Choosing too simple a hypothesis class may prevent generalization by automatically removing hypotheses with high variance, i.e., filtering out noise and outliers in the data. By including both variable coefficients (as opposed to constants) and the L¹ penalty in (11), we aim to balance the bias-variance tradeoff by respectively increasing the expressive power of the model and introducing sparsification, which leads to more interpretable equations.

The hypothesis class is further constrained by the fact that (4) is an equation for the PDF of (1), which means that it must take the form of a master equation. As a consequence of Pawula's theorem [19, 33], the (finite) Kramers-Moyal (KM) expansion of the PDF master equation does not contain any partial derivatives of order higher than two; otherwise, it could yield a PDF that either is negative or does not integrate to one. Hence, the order of derivatives in (10) is restricted to one.
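The learning step embodied in (10)-(11) can be illustrated on manufactured data: given gridded snapshots of a density that solves a known equation, regress its time derivative on a small dictionary of spatial derivatives and keep only the sparse set of active terms. The sketch below recovers the diffusion coefficient of a 1-D heat equation; for simplicity it uses sequentially thresholded least squares as a stand-in for the paper's lasso-with-cross-validation step, and all names are ours.

```python
import numpy as np

def stlsq(Theta, y, thresh, iters=10):
    """Sequentially thresholded least squares: a simple sparsifying
    regression (stand-in for the paper's L1-penalized fit)."""
    xi = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < thresh
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], y, rcond=None)[0]
    return xi

# Manufactured data: heat kernel f(x,t) solving f_t = D f_xx with D = 0.5.
D_true = 0.5
x = np.linspace(-4.0, 4.0, 401)
dx = x[1] - x[0]
ts = np.linspace(0.2, 1.0, 81)
F = np.array([np.exp(-x**2 / (4*D_true*t)) / np.sqrt(4*np.pi*D_true*t)
              for t in ts])
f_t = np.gradient(F, ts, axis=0).ravel()
f_x = np.gradient(F, dx, axis=1).ravel()
f_xx = np.gradient(np.gradient(F, dx, axis=1), dx, axis=1).ravel()
Theta = np.column_stack([F.ravel(), f_x, f_xx])  # candidate features
coef = stlsq(Theta, f_t, thresh=0.05)            # expect ~ [0, 0, D_true]
```

The spurious features (f and f_x) are eliminated by the threshold, while the coefficient of f_xx lands close to the true diffusivity despite the finite-difference noise in the features.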
3. Numerics
For any of the above closures, we solve (4) through operator splitting by successively considering the two equations

∂f_x/∂t + ∇_X · (⟨v⟩ f_x) = 0,  (12)

∂f_x/∂t + ∇_X · ⟨v′Π′⟩ = 0.  (13)

The conservative advection equation (12) is solved with the Clawpack package [37] using a Lax-Wendroff discretization and an MC limiter. In the N = 1 setting, (13) is solved via a Crank-Nicolson time discretization. For dimensions higher than one, the modified Craig-Sneyd scheme presented in [38] is used with θ = 1/2. This stable alternating-direction implicit scheme treats mixed derivatives explicitly while maintaining second-order temporal accuracy. In all cases, (13) is spatially discretized with second-order central differencing.
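A minimal sketch of the Crank-Nicolson update used for the diffusive step (13), here for a constant scalar diffusion coefficient and homogeneous Dirichlet boundaries; the helper name is ours, and a banded solver would replace the dense solve in practice.

```python
import numpy as np

def crank_nicolson_step(f, D, dx, dt):
    """One Crank-Nicolson step for f_t = D f_xx on a uniform grid
    (sketch of how (13) is advanced in the N = 1 setting)."""
    n = f.size
    r = D * dt / (2.0 * dx**2)
    off = np.eye(n, k=1) + np.eye(n, k=-1)
    L = np.eye(n) * (1.0 + 2.0*r) - r * off   # implicit half-step
    R = np.eye(n) * (1.0 - 2.0*r) + r * off   # explicit half-step
    return np.linalg.solve(L, R @ f)

# Diffuse a unit-mass Gaussian; mass stays conserved and the variance
# grows by 2*D per unit time, as it should for the heat equation.
x = np.linspace(-5.0, 5.0, 201)
dx = x[1] - x[0]
f = np.exp(-x**2) / np.sqrt(np.pi)
for _ in range(50):
    f = crank_nicolson_step(f, D=0.1, dx=dx, dt=0.01)
```

After 50 steps (total time 0.5), the variance of the initial N(0, 1/2) profile should be close to 0.5 + 2·0.1·0.5 = 0.6.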
Clawpack provides two options for operator splitting, namely Godunov or Strang splitting. However, Strang splitting is not currently available for adaptive mesh refinement. For the application in Section 4.1, the classic version (uniform spatial mesh) of Clawpack is used with Strang splitting, since adaptive mesh refinement is not necessary there. For the application in Section 4.2, we use the adaptive version of Clawpack (AMRClaw) with Godunov splitting, which significantly reduces computational costs. Although Godunov splitting is only formally first-order accurate, a convergence study revealed that this application maintains near second-order temporal accuracy, i.e., its temporal truncation errors range between O(Δt^1.5) and O(Δt²).

The computational domain is taken as (−L, L)^N, where L > 0.¹ For a fixed time t, the error is defined in terms of the Kullback-Leibler (KL) divergence [39],

D_KL(f_MC || f_x)(t) = ∫_{R^N} f_MC(X; t) ln( f_MC(X; t) / f_x(X; t) ) dX.

It quantifies the amount of information lost when f_x is used to approximate the high-resolution Monte Carlo solution f_MC, which is treated as a yardstick.

Each Monte Carlo realization is obtained by solving (1) with an adaptive fourth-order Runge-Kutta scheme, treating the discretized stochastic fluctuation v′ as a variable coefficient. For each time step, a large number of these realizations are post-processed with a Gaussian kernel density estimator to obtain the yardstick PDF f_MC on the discretized spatial domain. The bandwidth of the kernel density estimator is chosen by Silverman's rule. In addition to f_MC, we also compute the data matrix f̂_x first defined in Section 2.4. The number of realizations used to compute f̂_x ranges from 5% to 20% of those used for f_MC.

¹ In Section 4.1, the solution to (1) is nonnegative, so the domain is reduced to (0, L) with homogeneous boundary conditions.

Remark 3.1. Other natural choices for measuring the error of PDF solutions are the Wasserstein and total-variation distances. Relative L^p norms are typically poor choices for measuring the error of PDFs. They often underrepresent the accuracy of closures, sometimes increasing the error by an order of magnitude even when many statistical measures, such as mean, variance, skewness, kurtosis, and modality, match.

Remark 3.2.
Asymptotic "plug-in" estimators of bandwidth, such as Scott's or Silverman's rule, are computationally inexpensive. Yet, if the underlying distribution is heavy-tailed or multimodal, computing the yardstick PDF f_MC and the data matrix f̂_x may require an unusually large number of realizations. More robust approaches to bandwidth selection include one-sided cross-validation [40, 41] and total-variation regularization [42]. The latter allows for variable bandwidths, but requires the solution of a bandwidth optimization problem at each time step.

The local and semi-local formulations of the LED closures have the same computational complexity, while the nonlocal formulation is considerably more expensive [25]. Evaluation of the cross-covariance term (7) in the latter involves numerical approximation of the integral, which requires the history of the solution up to the current time step. In the results presented below, the proposed local and semi-local approaches are four orders of magnitude faster than the standard Monte Carlo approach.
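The KDE/KL machinery above is straightforward to sketch in one dimension: estimate a yardstick density from a large ensemble, a data density f̂_x from a small fraction of the trials, and compare them with a discrete KL divergence. All helper names are ours; the bandwidth uses Silverman's Gaussian reference rule.

```python
import numpy as np

def silverman_bw(samples):
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian KDE."""
    return 1.06 * samples.std(ddof=1) * samples.size ** (-1.0 / 5.0)

def gaussian_kde(samples, grid, bw):
    """Evaluate a Gaussian kernel density estimate on a grid."""
    z = (grid[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * bw * np.sqrt(2*np.pi))

def kl_divergence(f_ref, f_approx, dx, eps=1e-12):
    """Discrete D_KL(f_ref || f_approx) on a uniform grid (clipped for safety)."""
    f_ref = np.maximum(f_ref, eps)
    f_approx = np.maximum(f_approx, eps)
    return np.sum(f_ref * np.log(f_ref / f_approx)) * dx

rng = np.random.default_rng(0)
grid = np.linspace(-4.0, 4.0, 201)
big = rng.normal(size=20_000)    # stands in for the yardstick ensemble
small = rng.normal(size=2_000)   # ~10% of the trials, as in the paper's range
f_mc = gaussian_kde(big, grid, silverman_bw(big))
f_hat = gaussian_kde(small, grid, silverman_bw(small))
d = kl_divergence(f_mc, f_hat, grid[1] - grid[0])
```

Since both estimates target the same standard normal, the divergence is small; it grows as the small-sample estimate degrades.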
Numerical solution of (12) and (13) with the semi-local or nonlocal LED closures (7)-(9) may become highly nontrivial, e.g., when N > 1 and/or v′ is spatially dependent. Our numerical experiments reported in Section 4 reveal the total computational cost of the data-driven closures to be higher than that of the local and semi-local LED closures. Yet, the data-driven closures are still considerably cheaper than the standard Monte Carlo approach and, in contrast to the LED closures, are relatively easy to implement numerically. This makes the data-driven closures an attractive alternative. Moreover, if the optimization problem (11) is properly constrained and not naively implemented, the bulk of the computational cost of the data-driven approach lies in generating the (relatively few) Monte Carlo realizations needed to assemble f̂_x. The remainder of this section is devoted to solving (11).

Remark 3.3.
A naive strategy for solving the optimization problem (11), which involves the discretized variable coefficients β(X₁^{k₁}, …, X_N^{k_N}, t_l) at each grid point, is prohibitively expensive without proper computational resources, particularly memory. The neuroscience application in Section 4.2, which has a relatively low-dimensional (N = 2) phase space, illustrates the high-dimensional nature of this optimization problem. An accurate solution of (12) requires a grid size of at most ΔX₁ = ΔX₂ = 0.01 at the highest level of mesh refinement. Although this is handled with an adaptive mesh when solving (12) and (13), a uniform mesh is used for the optimization problem (11). For a square domain, a uniform mesh of size ΔX₁ = ΔX₂ = 0.01, consisting of 1000 elements in each direction, results in 10⁶ spatial grid cells. Given that the time step Δt must satisfy the hyperbolic CFL condition, the total number of spatiotemporal grid points is several orders of magnitude larger still for each of the M coefficients β. The resulting dimension of the optimization problem, O(M K^N L), is well beyond the limitations of a personal computer.

To reduce the dimensionality of the optimization problem (11), we use a polynomial basis expansion to represent the decision variables β(X, t). For a polynomial (e.g., Chebyshev or Legendre) basis P_q(·),

β_m(X, t) = Σ_{q₁=0}^{Q₁} … Σ_{q_N=0}^{Q_N} Σ_{r=0}^{R} α_{q₁…q_N r m} P_{q₁}(X₁) … P_{q_N}(X_N) P_r(t).  (14)

The optimization problem (11) is now solved over the expansion coefficients α_{q₁…q_N r m} ∈ R for m ∈ {1, …, M}. This step reduces the optimization dimension from O(M K^N L) to O(M Q₁ … Q_N R), where each Q₁, …, Q_N, R < 10. (For the application in Section 4, this reduces the optimization dimension by several orders of magnitude.)

Using N = 2 for illustration purposes, we build a feature matrix F ∈ R^{K^N L × M} for (10) containing the numerical derivatives of f̂_x,

F = [ 1  ∂_{X₁}f̂_x(X₁¹, X₂¹, t₁)   ∂_{X₂}f̂_x(X₁¹, X₂¹, t₁)   …  ∂_{X₁X₂}f̂_x(X₁¹, X₂¹, t₁)
      1  ∂_{X₁}f̂_x(X₁², X₂¹, t₁)   ∂_{X₂}f̂_x(X₁², X₂¹, t₁)   …  ∂_{X₁X₂}f̂_x(X₁², X₂¹, t₁)
      ⋮            ⋮                         ⋮               ⋱            ⋮
      1  ∂_{X₁}f̂_x(X₁^K, X₂^K, t_L)  ∂_{X₂}f̂_x(X₁^K, X₂^K, t_L)  …  ∂_{X₁X₂}f̂_x(X₁^K, X₂^K, t_L) ],

and its corresponding label vector V ∈ R^{K^N L} from (12),

V = [ (∂_t f̂_x + ∇_X · (⟨v⟩ f̂_x))(X₁¹, X₂¹, t₁)
      (∂_t f̂_x + ∇_X · (⟨v⟩ f̂_x))(X₁², X₂¹, t₁)
      ⋮
      (∂_t f̂_x + ∇_X · (⟨v⟩ f̂_x))(X₁^K, X₂^K, t_L) ].

The elements of V contain the derivatives discretized with a Lax-Wendroff scheme and an MC limiter. The approximated variable coefficients in (14) form vectors P_{q₁…q_N r} ∈ R^{K^N L} with elements P_{q₁…q_N r}^{k₁…k_N l} = P_{q₁}(X₁^{k₁}) … P_{q_N}(X_N^{k_N}) P_r(t_l) that correspond to the grid-point elements in F and V. The residual takes the form

R(α_{q₁…q_N r}) = V + (F ⊙ P_{q₁…q_N r} 1⊤) α_{q₁…q_N r}  (15)

for each polynomial coefficient vector, where ⊙ is the element-wise Hadamard product and 1 ∈ R^M is a vector of ones. Hence, the optimization problem over all polynomial coefficients is

Â ≡ [ α̂_{0…0}; … ; α̂_{Q₁…Q_N R} ] = argmin_A ||R(A)||₂² + λ||A||₁,  (16a)

where

R(A) = [ R(α_{0…0}); … ; R(α_{Q₁…Q_N R}) ] ∈ R^{K^N L Q₁…Q_N R}.  (16b)

We solve (16) by utilizing the lasso function of MATLAB with 10-fold cross-validation on each iteration.
We use a mean-squared-error loss function and a prescribed relative tolerance: the algorithm terminates when the L² norms of successive estimates of the coefficient vector differ by less than that tolerance. The extrapolation power of our model is tested by fitting the data on the first half of the time horizon, [0, T_f/2], and validating on the second half, [T_f/2, T_f].

Remark 3.4.
It may be of interest to consider variants of cross-validation algorithms not considered in this study, including least-angle regression and lasso implemented with the Bayesian information criterion. Such algorithms may give different results depending on the choice of the hypothesis class.
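The tensor-product polynomial representation (14) of a variable coefficient can be sketched with NumPy's Legendre utilities. Here a hypothetical coefficient β(X, t) = 0.3 X t − 0.1 is recovered by plain least squares (the L¹ penalty is omitted in this toy fit); all names are ours.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_features(X, T, Q, R):
    """Tensor-product Legendre features P_q(X) P_r(t) evaluated at
    flattened grid points, mirroring the expansion (14) for N = 1."""
    cols = []
    for q in range(Q):
        for r in range(R):
            cq = np.zeros(q + 1); cq[q] = 1.0   # select P_q
            cr = np.zeros(r + 1); cr[r] = 1.0   # select P_r
            cols.append(legendre.legval(X, cq) * legendre.legval(T, cr))
    return np.column_stack(cols)

# Hypothetical variable coefficient beta(X, t) = 0.3*X*t - 0.1 on [-1,1]^2.
Xg, Tg = np.meshgrid(np.linspace(-1, 1, 40), np.linspace(-1, 1, 40))
X, T = Xg.ravel(), Tg.ravel()
beta = 0.3 * X * T - 0.1
A = legendre_features(X, T, Q=3, R=3)
alpha = np.linalg.lstsq(A, beta, rcond=None)[0]
beta_fit = A @ alpha
```

Since P₀ = 1 and P₁(x) = x, the target lies exactly in the span of the basis: the (q, r) = (0, 0) coefficient recovers −0.1 and the (1, 1) coefficient recovers 0.3, with the remaining coefficients vanishing.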
4. Numerical Experiments
We consider two applications of the method of distributions and its associated closures to neuron models, namely a leaky integrate-and-fire (LIF) model and the FitzHugh-Nagumo (FHN) neuron, both driven by sine-Wiener (SW) channel noise. The first application has one state variable (N = 1), and the second has two (N = 2). SW noise is commonly used in neuroscience, e.g., to model signal transmission [14], resonance [15], and detection [16] in nervous systems; to induce spike death in excitable systems [17]; and for weak periodic signal detection in the FHN neuron [18].

SW noise ξ(t) is a bounded and correlated zero-mean non-Gaussian process defined by the SDE

dξ_t = −(1/τ) ξ_t dt + σ √(2/τ) cos(√(2/τ) B_t) dB_t,  ξ(0) = 0,  (17)

where σ > 0 is the noise amplitude, τ > 0 is the correlation time, and B_t is a standard Brownian motion. The solution to (17) is given by

ξ(t) = σ sin(√(2/τ) B_t).  (18)

It has mean ⟨ξ(t)⟩ = 0 for any fixed time t, and covariance function [43, 44, 33, 19]

C_ξ(t, s) = ⟨ξ(t)ξ(s)⟩ = (σ²/2) exp(−(t − s)/τ) [1 − exp(−4s/τ)],  s ≤ t.  (19)

In both applications, SDE (1) and all of its corresponding PDF equations are solved up to time T_f = 100.

LIF neuron models are often used in place of their higher-dimensional counterparts, i.e., models with large N, especially when considering large coupled networks. They describe the basic neuron behavior of slowly charging in the presence of a current, spiking after reaching a voltage threshold, and then slowly discharging [2]. In order to present an example in which all of the proposed closures are in agreement, the LIF model under consideration is not a spiking model, i.e., it lacks a voltage threshold and a resetting mechanism for the membrane potential.
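Because (18) gives SW noise in closed form, paths can be sampled exactly on any time grid from Brownian increments. A short sketch (helper name ours) whose ensemble statistics reproduce the zero mean, the bound |ξ| ≤ σ, and the long-time variance σ²/2 implied by (19):

```python
import numpy as np

def sw_noise(sigma, tau, t_grid, rng):
    """Sample one sine-Wiener path via the closed form (18),
    xi(t) = sigma * sin(sqrt(2/tau) * B_t), using exact Brownian increments."""
    dt = np.diff(t_grid, prepend=t_grid[0])   # first increment is zero
    B = np.cumsum(np.sqrt(dt) * rng.standard_normal(t_grid.size))
    return sigma * np.sin(np.sqrt(2.0 / tau) * B)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 1001)
paths = np.array([sw_noise(0.2, 0.5, t, rng) for _ in range(2000)])
```

Unlike Euler-Maruyama integration of (17), sampling through (18) incurs no time-discretization error, which makes it convenient for generating the Monte Carlo ensembles used later.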
We do study spiking dynamics in Section 4.2, for which there is a discrepancy between the different closures.

In particular, we consider a (one-dimensional) LIF model that is embedded in the (two-dimensional) stochastic FHN model in the excitable regime. The derivation of the embedding is presented in Section 4 of [12]. We use their Eq. 4.3, albeit driven by SW noise rather than Gaussian white noise, to describe the membrane voltage $x(t) \ge 0$,
\[ \frac{\mathrm{d}x}{\mathrm{d}t} = \frac{\sigma^2}{2x} - \mu x + \xi(t), \qquad (20) \]
with a given initial condition $x(0) = x_0$, where $x_0$ is a folded Gaussian random variable. In the simulations below, we fix the rate constant $\mu$ and consider noise amplitudes $\sigma = 0.044$, $0.066$, and $0.133$; the correlation times vary between $\tau = 0.05$, $0.10$, $0.15$, and $0.20$.

The drift velocity in (20) is $v(x(t), t; \xi(t)) \equiv \sigma^2/(2x) - \mu x + \xi$. In the phase space, $v(X, t; \xi(t)) \equiv \sigma^2/(2X) - \mu X + \xi$ is linear in the additive SW noise $\xi(t)$. Hence, $\langle v(X, t)\rangle = \sigma^2/(2X) - \mu X$ and $v'(X, t) = \xi$. The terminal problem (5) takes the form
\[ \frac{\mathrm{d}\chi}{\mathrm{d}s} = \frac{\sigma^2}{2\chi} - \mu\chi, \qquad \chi(t) = X, \qquad (21) \]
and admits an analytical solution, yielding a closed-form expression for the associated flow $\chi(s) = \Phi(s; X, t)$.

A FHN neuron is considered to be in the excitable regime when, starting in the basin of attraction of a unique stable fixed point, a large excursion, or spike, occurs in the phase space and the trajectory then permanently returns to the fixed point [1].

4.1.1. Local LED closure

For the problem under consideration, equations (A.16) and (A.17), which define the drift velocity $V$ and the diffusion coefficient $D$ in the local LED closure (6), reduce to
\[ V(X, t) \equiv 0, \qquad D(X, t) = \int_0^t J(s; X, t)\,C_\xi(t, s)\,\mathrm{d}s, \qquad (22) \]
where
\[ J(s; X, t) = \exp\Big(-\int_s^t \frac{\partial}{\partial\chi}\langle v(\chi(r), r)\rangle\,\mathrm{d}r\Big) = \exp\Big(\int_s^t \Big[\frac{\sigma^2}{2\Phi^2(r; X, t)} + \mu\Big]\,\mathrm{d}r\Big). \qquad (23) \]
The integral in $J$ can be evaluated analytically here, but this is generally not the case. To be consistent with the approximations that are usually needed, we approximate $\Phi(s; X, t)$ in $J$ by its terminal condition $X$, resulting in the approximate Jacobian
\[ \tilde J(s; X, t) = \exp\Big(\int_s^t \Big[\frac{\sigma^2}{2X^2} + \mu\Big]\,\mathrm{d}r\Big) = \exp\Big[(t - s)\Big(\frac{\sigma^2}{2X^2} + \mu\Big)\Big]. \qquad (24) \]
With this local LED closure, the single-point PDF $f_x(X; t)$ of the random membrane voltage $x(t)$ satisfies a Fokker-Planck equation
\[ \frac{\partial f_x}{\partial t} + \frac{\partial}{\partial X}\Big[\Big(\frac{\sigma^2}{2X} - \mu X\Big) f_x\Big] = \frac{\partial}{\partial X}\Big(\tilde D(X, t)\,\frac{\partial f_x}{\partial X}\Big), \qquad (25) \]
where the diffusion coefficient $\tilde D$ is computed upon replacing $J$ with $\tilde J$ in (22) and analytically evaluating the resulting integral.

Remark 4.1.
As mentioned in Remark 2.1, a higher-order Taylor expansion around the terminal condition $(X, t)$ may be used as an alternative approximation for $\Phi(s; X, t)$ in (23). Our numerical experiments with this approximation showed no improvement in accuracy, but it induced additional numerical challenges: the resulting linear approximation of the Jacobian $J$ behaves like $\exp[(t - s)\,\mathcal{O}(X^{-2})]$, as does $\tilde J$ in (24), but has singularities at points other than $X = 0$.

4.1.2. Semi-local LED closure

For the problem under consideration, the semi-local closure (9) reduces to
\[ \langle v'\Pi'\rangle(X, t) \approx -D(t)\,\frac{\partial f_x}{\partial X}, \qquad (26) \]
where the diffusion coefficient $D(t)$ is computed analytically,
\[ D(t) = \int_0^t \tilde J(s; X, t)\,\exp\Big((t - s)\,\frac{\mathrm{d}\langle v(X, t)\rangle}{\mathrm{d}X}\Big)\,C_\xi(t, s)\,\mathrm{d}s = \int_0^t C_\xi(t, s)\,\mathrm{d}s. \qquad (27) \]
The resulting PDF equation for $f_x(X; t)$ is identical to (25), except that now $D(t)$ is used in place of $\tilde D(X, t)$. A solution to this equation, $f_x(X; t)$, is shown in Fig. 1 for several combinations of the meta-parameters $\sigma$ and $\tau$ characterizing the strength and correlation time of the colored noise $\xi(t)$, respectively. For the stochastic LIF model (20), the PDFs of membrane voltage, $f_x(X; t)$, computed via the PDF equation with the nonlocal and local closures are visually indistinguishable from both the PDF with the semi-local closure shown in Fig. 1 and the yardstick PDF, $f_{\mathrm{MC}}(X; t)$, estimated via high-resolution Monte Carlo simulations; the latter required a significantly larger number of realizations, $N_{\mathrm{MC}}$. (We found the semi-local closure (8) to provide only minor improvements in accuracy relative to the closure (9), while being significantly more involved computationally. Consequently, only the numerical results for the semi-local closure (9) are presented here.)

Figure 1: Temporal snapshots of the membrane voltage PDF $f_x(X; t)$ computed with the semi-local closure for $(\sigma, \tau) = (0.044, 0.05)$, $(0.044, 0.20)$, $(0.133, 0.05)$, and $(0.133, 0.20)$.
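For the SW covariance (19), the integral in (27) can be carried out in closed form. The snippet below checks our own hand evaluation of that integral, $D(t) = (\sigma^2\tau/2)\,(1 - e^{-t/\tau})^2$ (an assumption worth verifying, not a formula quoted from the text), against direct quadrature:

```python
import numpy as np

def C_xi(t, s, sigma, tau):
    """SW-noise covariance C_xi(t, s) of Eq. (19), valid for s <= t."""
    return 0.5 * sigma**2 * np.exp(-(t - s) / tau) * (1.0 - np.exp(-2.0 * s / tau))

def D_semilocal(t, sigma, tau, n=20_000):
    """Semi-local diffusion coefficient D(t) = int_0^t C_xi(t, s) ds, trapezoid rule."""
    s = np.linspace(0.0, t, n)
    f = C_xi(t, s, sigma, tau)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s)))

def D_closed_form(t, sigma, tau):
    # our hand-evaluated antiderivative of (27); monotone with asymptote sigma^2*tau/2
    return 0.5 * sigma**2 * tau * (1.0 - np.exp(-t / tau))**2
```

Consistent with the discussion of Fig. 3 below, $D(t)$ increases monotonically and quickly saturates at the asymptote $\sigma^2\tau/2$.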
Figure 2 shows the temporal evolution of the Kullback-Leibler (KL) divergence $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$ for $\sigma = 0.044$, $0.066$, and $0.133$. The approximation errors decrease with $\sigma$ and $\tau$, as is to be expected from theoretical considerations, since the LED-based closures can be thought of as leading-order terms in perturbation expansions in the powers of $\sigma$ and $\tau$. Regardless of the choice of the hyper-parameter values, $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$ generally decreases with time $t$. That is because the membrane voltage PDF approaches its nearly Gaussian steady state, wherein the PDF and its derivatives with respect to $X$ are almost constant over the correlation time intervals $(t - \tau, t)$, an approximation that underpins both the local and semi-local closures. For all choices of $\sigma$ and $\tau$ and for all times $t$, the nonlocal, semi-local, and local closures yield PDFs $f_x(X; t)$ with a small KL divergence from $f_{\mathrm{MC}}$, with the largest error occurring for the largest values of $\sigma$ and $\tau$. The local and semi-local approaches are four orders of magnitude faster than the Monte Carlo approach.
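The KL-divergence diagnostic used throughout can be computed by discretizing both PDFs on a uniform grid; a minimal sketch (the floor value and grid are illustrative), verified against the closed-form divergence between two Gaussians:

```python
import numpy as np

def kl_divergence(f_ref, f_approx, dx, floor=1e-300):
    """D_KL(f_ref || f_approx) = int f_ref * log(f_ref / f_approx) dX on a uniform grid."""
    p = np.maximum(f_ref, floor)     # guard against log(0) in near-zero tails
    q = np.maximum(f_approx, floor)
    return float(dx * np.sum(p * np.log(p / q)))

X = np.linspace(-12.0, 12.0, 24_001)
dx = X[1] - X[0]
f1 = np.exp(-X**2 / 2.0) / np.sqrt(2.0 * np.pi)   # N(0, 1)
f2 = np.exp(-X**2 / 8.0) / np.sqrt(8.0 * np.pi)   # N(0, 4)
kl = kl_divergence(f1, f2, dx)
```

For Gaussians, $D_{\mathrm{KL}}(\mathcal{N}(0,1)\,\|\,\mathcal{N}(0,4)) = \ln 2 + 1/8 - 1/2$, which the grid computation reproduces.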
4.1.3. Data-driven closure

Since the diffusion coefficient for the semi-local closure, $D$, depends only on time (see (27)), we consider spatially independent variable coefficients $\beta_i \equiv \beta_i(t)$ in the data-driven closure (14), significantly reducing the dimensionality of the optimization problem (11). (Although the accuracy of the local and semi-local closures is nearly identical for the stochastic LIF model (20), this is generally not the case, as seen in Section 4.2.) Furthermore, since $D(t)$ is a monotonically increasing function that quickly reaches an asymptote (Fig. 3), we also investigate a data-driven closure with constant coefficients $\beta_i$, eliminating the need for a polynomial expansion and further simplifying the optimization problem. (To be concrete, we represented $\beta_i(t)$, $i = 1, 2$, with the first ten Legendre polynomials.) In both cases, the one-dimensional version of (10) gives rise to the hypothesis
\[ \frac{\partial\langle v'\Pi'\rangle}{\partial X} \approx \beta_1\,\frac{\partial f_x}{\partial X} + \beta_2\,\frac{\partial^2 f_x}{\partial X^2}. \qquad (28) \]

Figure 4 provides temporal snapshots ($\sigma = 0.133$ and $\tau = 0.20$) of the membrane voltage PDFs $f_x(X; t)$ alternatively computed with the semi-local closure (SL), the data-driven closure with constant coefficients (DD$_C$), and the data-driven closure with a temporal Legendre basis expansion (DD$_P$). The small discrepancies between these solutions manifest themselves in the data-driven PDFs $f_x(X; t)$ being slightly under-diffused. This comparison is provided for the largest values of $\sigma$ and $\tau$ considered in our study ($\sigma = 0.133$ and $\tau = 0.20$).

Figure 5 exhibits the temporal evolution of the KL divergence $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$ between the "exact" solution $f_{\mathrm{MC}}(X; t)$ and its approximations $f_x(X; t)$ obtained with the data-driven closure. We illustrate the performance of the least computationally expensive data-driven closure, i.e., the one that relies on the constant coefficients $\hat\beta \in \mathbb{R}^2$. As $\sigma$ and $\tau$ decrease, the discrepancy between the semi-local closure and the data-driven closure increases, with the largest difference in their KL divergences occurring for $\sigma = 0.044$ and $\tau = 0.05$ at later times. At first glance, this may seem to contradict Fig. 3, but one must consider that the diffusion coefficients decrease with $\sigma$ and $\tau$, and that $\sigma$ also influences advection, as it is included in the drift term in (20). When $\sigma$ and $\tau$ are small, the resulting PDF $f_x(X; t)$ advects and diffuses at a slow rate (see Fig. 1), resulting in a spikier PDF. Hence, in terms of relative entropy, i.e., the KL divergence, $f_x(X; t)$ is more sensitive to small changes in the diffusion coefficient. Unlike in the LED theory, the data-driven closure does not guarantee that $f_x(X; t)$ improves as $\sigma$ and $\tau$ decrease. Regardless of these observations, the data-driven closure remains accurate when compared to the semi-local closure.

Figure 5: Temporal evolution of the KL divergence $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$ for $\sigma = 0.133$ and $\sigma = 0.044$; $f_{\mathrm{MC}}$ is estimated from Monte Carlo realizations.
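The sparse-regression step behind hypothesis (28) can be sketched with a hand-rolled proximal-gradient (ISTA) Lasso solver on a synthetic analogue in which the true coefficients are known and a spurious dictionary feature should be zeroed out. All names, feature choices, and values here are illustrative, not the paper's actual setup:

```python
import numpy as np

def lasso_ista(Phi, y, lam, n_iter=5000):
    """Minimize (1/2n)||y - Phi@beta||^2 + lam*||beta||_1 by proximal gradient (ISTA)."""
    n = len(y)
    L = np.linalg.norm(Phi, 2) ** 2 / n                 # Lipschitz constant of smooth part
    beta = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = beta - Phi.T @ (Phi @ beta - y) / (n * L)   # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return beta

rng = np.random.default_rng(1)
X = np.linspace(-3.0, 3.0, 400)
f = np.exp(-X**2 / 2.0) / np.sqrt(2.0 * np.pi)          # stand-in for an estimated PDF
d1 = np.gradient(f, X)
d2 = np.gradient(d1, X)
Phi = np.column_stack([d1, d2, f])                      # dictionary: df/dX, d2f/dX2, f (spurious)
y = 0.7 * d1 + 0.2 * d2 + 0.002 * rng.normal(size=X.size)
beta_hat = lasso_ista(Phi, y, lam=1e-5)
```

The solver recovers the planted coefficients and drives the spurious feature's coefficient toward zero, mirroring how irrelevant derivative features are pruned from the PDF-equation dictionary.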
The residual $\hat R$ in the optimization problem (11) uses the data matrix $\hat f_x$ computed from $N^{\mathrm{tr}}_{\mathrm{MC}}$ Monte Carlo realizations. Hence, the optimal values of $\beta$ and, consequently, the PDF $f_x(X; t)$ computed with (12), (13), and (28) are affected by the choice of $N^{\mathrm{tr}}_{\mathrm{MC}}$. Figure 6 sheds light on this dependence by exhibiting the KL divergence $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$, averaged over all temporal grid nodes, as a function of $N^{\mathrm{tr}}_{\mathrm{MC}}$. For all choices of $\sigma$ and $\tau$ considered, the optimal solution $\hat\beta$ to (11) needs $N^{\mathrm{tr}}_{\mathrm{MC}} \approx 30{,}000$ Monte Carlo trials to stabilize, and the accuracy of the data-driven closure does not improve for $N^{\mathrm{tr}}_{\mathrm{MC}} > 30{,}000$. This number is significantly smaller than the number of realizations, $N_{\mathrm{MC}}$, needed to compute $f_{\mathrm{MC}}$. The data-driven results presented in Figs. 3-5 are obtained with $N^{\mathrm{tr}}_{\mathrm{MC}} = 30{,}000$ Monte Carlo trials.
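The data matrix $\hat f_x$ is built by kernel density estimation from the Monte Carlo training set. A minimal Gaussian-KDE sketch with Silverman's bandwidth rule follows; the rule and the sample sizes are our illustrative choices, not necessarily those used in this study:

```python
import numpy as np

def kde_gaussian(samples, X, h=None):
    """Gaussian kernel density estimate of f_x on the grid X."""
    n = samples.size
    if h is None:
        h = 1.06 * samples.std() * n ** (-0.2)   # Silverman's rule of thumb
    Z = (X[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * Z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
samples = rng.normal(size=20_000)                # stand-in for Monte Carlo states x(t)
X = np.linspace(-5.0, 5.0, 201)
f_hat = kde_gaussian(samples, X)
```

The estimate integrates to approximately one and reproduces the underlying density up to the usual smoothing bias, which is why only a modest training set suffices to stabilize $\hat\beta$.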
Figure 6: KL divergence $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$, averaged over time, as a function of the number of Monte Carlo training trials for $\sigma = 0.133$ and $\sigma = 0.044$.

4.2. FHN neuron

Next, we consider the stochastic FHN model in the excitable regime,
\[ \frac{\mathrm{d}x_1}{\mathrm{d}t} = x_1 - \frac{x_1^3}{3} - x_2 + I + \xi(t), \qquad \frac{\mathrm{d}x_2}{\mathrm{d}t} = \frac{1}{\epsilon}\,(x_1 + \alpha - \beta x_2), \qquad (29) \]
where $x_1(t)$ is the membrane potential and $x_2(t)$ is the recovery variable. The model parameters are chosen in accordance with [12] (in particular, $\beta = 0.75$). With these parameter values, the deterministic version of (29), i.e., in the absence of the noise $\xi$, is in the excitable regime and has a unique stable fixed point $(x_1^*, x_2^*)$. Sufficiently large fluctuations of $\xi(t)$ may perturb the trajectory away from the fixed point, causing repeated firing. The frequency of this repeated firing increases with the noise amplitude $\sigma$ [12]. In our simulations we consider noise amplitudes of $\sigma = 0.01$, $0.05$, and $0.2$, and correlation times $\tau = 0.01$, $0.1$, and $5.0$. The initial state $\mathbf{x}(0) = \mathbf{x}_0$ is an uncorrelated joint Gaussian whose mean is perturbed from the stable fixed point and whose covariance matrix is diagonal with entries $0.01$.

The approximate Jacobian $\tilde J$ associated with the FHN model (29), needed for both the local and semi-local LED closures, is given by
\[ \tilde J(s; \mathbf{X}, t) = \exp\big(-(t - s)\,(1 - \beta/\epsilon - X_1^2)\big). \]
For the problem under consideration, equations (A.16) and (A.17) with $J \approx \tilde J$ yield analytical expressions for the drift velocity $\mathbf{V}$ and diffusion coefficient $\mathbf{D}$,
\[ \mathbf{V}(\mathbf{X}, t) \equiv \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad (30a) \]
\[ \mathbf{D}(\mathbf{X}, t) \equiv \int_0^t \tilde J(s; \mathbf{X}, t)\,\big\langle \mathbf{v}'(\mathbf{X}, t)\,\mathbf{v}'^\top(\boldsymbol{\Phi}(s; \mathbf{X}, t), s)\big\rangle\,\mathrm{d}s = D(\mathbf{X}, t)\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad (30b) \]
\[ D(\mathbf{X}, t) = \int_0^t \tilde J(s; \mathbf{X}, t)\,C_\xi(t, s)\,\mathrm{d}s. \qquad (30c) \]
The integral in (30c) is computed analytically. The PDF equation (A.18) for the joint density $f_x(\mathbf{X}; t)$ simplifies to
\[ \frac{\partial f_x}{\partial t} + \nabla_{\mathbf{X}} \cdot \left( \begin{bmatrix} X_1 - X_1^3/3 - X_2 + I \\ \epsilon^{-1}(X_1 + \alpha - \beta X_2) \end{bmatrix} f_x \right) = D\,\frac{\partial^2 f_x}{\partial X_1^2}. \qquad (31) \]

Figure 7: Temporal evolution of the closure coefficients for $(\sigma, \tau) = (0.05, 0.10)$ and $(0.01, 5.00)$.
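A direct Euler discretization of (29) with SW forcing is what underlies the Monte Carlo yardstick. The sketch below uses illustrative parameter values ($\epsilon$, $\alpha$, $I$, and the initial state are placeholders; only $\beta = 0.75$ is taken from the text) and, with $\sigma = 0$, should relax to the stable fixed point:

```python
import numpy as np

# Placeholder parameters (eps, alpha, I_ext are assumptions; beta_p = 0.75 from the text)
eps, alpha, beta_p, I_ext = 0.1, 0.7, 0.75, 0.0

def fhn_drift(x1, x2):
    """Deterministic FHN vector field, matching the advective flux in Eq. (31)."""
    return x1 - x1**3 / 3.0 - x2 + I_ext, (x1 + alpha - beta_p * x2) / eps

def simulate_fhn(sigma, tau, T=50.0, dt=1e-3, x0=(-1.0, -0.5), seed=0):
    """Euler scheme for the FHN model driven by SW noise in the voltage equation."""
    rng = np.random.default_rng(seed)
    x1, x2, B = x0[0], x0[1], 0.0
    for _ in range(int(T / dt)):
        B += rng.normal(0.0, np.sqrt(dt))             # Brownian motion driving SW noise
        xi = sigma * np.sin(np.sqrt(2.0 / tau) * B)   # Eq. (18)
        f1, f2 = fhn_drift(x1, x2)
        x1, x2 = x1 + dt * (f1 + xi), x2 + dt * f2
    return x1, x2
```

With nonzero $\sigma$ the same routine produces noise-induced spiking trajectories; here we only verify the noiseless relaxation to rest.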
For $\sigma = 0.05$ and $\tau = 0.1$, the optimization algorithm sets $\hat\beta_1 = 0$, while the corresponding coefficient $D_1$ is itself near zero when compared with $\hat\beta_2$ and $D_2$.

The semi-local closure (9) takes the form
\[ \langle\mathbf{v}'\Pi'\rangle(\mathbf{X}, t) \approx -\int_0^t \tilde J(s; \mathbf{X}, t)\,C_\xi(t, s) \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \exp\big((t - s)\,\mathcal{J}(\mathbf{X}, t)\big)^\top \nabla_{\mathbf{X}} f_x(\mathbf{X}; t)\,\mathrm{d}s, \qquad (32) \]
where $\mathcal{J}(\mathbf{X}, t)$ is the Jacobian of the mean drift $\langle\mathbf{v}\rangle$. The integrals in (32) are computed exactly with the symbolic computation software Mathematica, giving rise to the PDF equation
\[ \frac{\partial f_x}{\partial t} + \nabla_{\mathbf{X}} \cdot \left( \begin{bmatrix} X_1 - X_1^3/3 - X_2 + I \\ \epsilon^{-1}(X_1 + \alpha - \beta X_2) \end{bmatrix} f_x \right) = \frac{\partial}{\partial X_1}\Big(D_1\,\frac{\partial f_x}{\partial X_1}\Big) + D_2\,\frac{\partial^2 f_x}{\partial X_1 \partial X_2}, \qquad (33) \]
where $D_1(\mathbf{X}, t)$ and $D_2(\mathbf{X}, t)$ are known functions whose explicit expressions are omitted for brevity, and whose graphical representation is provided in Fig. 7. (The $\mathbf{X}$-dependence of these coefficients is insignificant, and $D_1(\cdot, t)$ and $D_2(\cdot, t)$ are plotted for $\mathbf{X} = \mathbf{0}$.) Figure 8 shows temporal snapshots of the joint PDF $f_x(\mathbf{X}; \cdot)$ computed with (33). The shape of this PDF reveals that the ensemble mean $\langle\mathbf{x}(t)\rangle$ in and of itself does not accurately capture the dynamics of (29).

Figure 9 depicts the temporal evolution of the KL divergence $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$. It serves to ascertain the accuracy of the semi-local closure that underpins the PDF equation (33). As expected, the accuracy increases as the correlation time $\tau$ decreases. As the noise amplitude $\sigma$ decreases, the different choices of $\tau$ become less impactful, as the errors are more tightly clustered. Although not shown here, we found that, for all choices of $\sigma$, the local closure agrees with the semi-local closure within a small KL divergence for $\tau = 0.01$. For $\tau = 0.1$, the KL divergence of the local closure is somewhat larger than that of the semi-local closure; for $\tau = 5.0$, it is up to a full order of magnitude larger.

Figure 9: Temporal evolution of the KL divergence $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$ for $\sigma = 0.20$ and $\sigma = 0.01$.
The dependence of the data-driven closure on the size of the training set parallels that observed for the LIF model: the optimal solution $\hat\beta(N^{\mathrm{tr}}_{\mathrm{MC}})$ to the sparse regression problem (11) needs $N^{\mathrm{tr}}_{\mathrm{MC}} \approx 40{,}000$ Monte Carlo trials to stabilize for all choices of $\sigma$ and $\tau$. The accuracy of the data-driven closure does not improve for $N^{\mathrm{tr}}_{\mathrm{MC}} > 40{,}000$, and $N^{\mathrm{tr}}_{\mathrm{MC}} = 40{,}000$ was used in the simulation results reported in Figs. 7-10. This number is less than 20% of the number of trials needed to compute the Monte Carlo solution $f_{\mathrm{MC}}$.

Figure 10: Time-averaged KL divergence $D_{\mathrm{KL}}(f_{\mathrm{MC}} \| f_x)$ as a function of the number of Monte Carlo training trials for $\sigma = 0.20$, $0.05$, and $0.01$.

The temporal evolution of the mutual information (MI) between pairs of system states is computed for $\sigma = 0.05$ and $\tau = 0.1$. The MI initially increases in all cases: when the system begins to evolve, a portion of the joint PDF enters into a pseudo-period due to the probability of a spike, and more information is gained about the coupling of the states. However, there is a stark contrast in the evolution of $I(x_1; x_2)$ when compared to $I(x_1; \xi)$ and $I(x_2; \xi)$. The MI $I(x_1; x_2)$ oscillates according to the pseudo-periodic behavior of the joint PDF (Fig. 8), i.e., the neuron's spikes. The MI between the membrane potential $x_1(t)$ and the recovery variable (inhibitory response) $x_2(t)$ is gained as the neuron spikes, but decreases as the neuron returns to the vicinity of its resting state, as one would expect [11]. Unlike $I(x_1; x_2)$, both $I(x_1; \xi)$ and $I(x_2; \xi)$ are nearly constant in time after a slight initial increase. Hence, apart from the first spike, observing the membrane potential/recovery variable at a fixed time does not provide any more or less information about the noise source than any of the other temporal observations. The converse is also true, since MI is symmetric. This behavior is attributed to the fact that the SDE for $\xi(t)$ is independent of the states $x_1(t)$ and $x_2(t)$. The noise $\xi$ can change the dynamics of $x_1$ and $x_2$ by perturbing the neuron from rest and causing it to fire; however, $x_1$ and $x_2$ have no effect on how the noise is generated.
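The MIs above are functionals of the joint and marginal PDFs; from Monte Carlo samples they can be estimated by discretizing both on a grid. A minimal histogram-based sketch (bin counts and sample sizes are illustrative), checked against the closed-form MI of a correlated bivariate Gaussian, $-\tfrac12\ln(1-\rho^2)$:

```python
import numpy as np

def mutual_information(a, b, bins=40):
    """Histogram estimate of I(a; b) = sum p(a,b) * log[p(a,b) / (p(a) p(b))], in nats."""
    p_ab, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab /= p_ab.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)    # marginal of a
    p_b = p_ab.sum(axis=0, keepdims=True)    # marginal of b
    mask = p_ab > 0                          # skip empty bins (0 * log 0 = 0)
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a * p_b)[mask])))

rng = np.random.default_rng(2)
n, rho = 200_000, 0.9
a = rng.normal(size=n)
b = rho * a + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
mi_dep = mutual_information(a, b)            # close to -0.5*log(1 - rho^2)
mi_ind = mutual_information(a, rng.normal(size=n))
```

The independent pair yields an MI near zero (up to the usual positive binning bias), while the correlated pair approaches the analytical value.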
Remark 4.2. The PDF method in Section 2 can be generalized in order to derive an equation for a joint PDF of system states at multiple times. This enables the calculation of information transfer rates between states at different times, as opposed to the instantaneous information transmission. For example, instead of calculating the total correlation between the FHN states $x_1$ and $x_2$ and the noise source $\xi$ at any time $t$, one can compute the total correlation between these states at any two times $t_1$ and $t_2$. This would double the number of dimensions in the PDF equation from three to six, and require the deployment of numerical strategies for the solution of high-dimensional partial differential equations, such as tensor-train methods [47, 48].

Remark 4.3.
Apart from computing information-theoretic metrics, the PDF method can be used to calculate PDFs of first exit/spike times and interspike intervals. For example, the FHN neuron spikes (almost surely) when its path crosses the line $X_1 = 0$ [12]. Defining the first exit/spike time to be $\tau = \inf\{s > 0 : x_1(s) = 0\}$, its distribution can be calculated in terms of the cumulative distribution function (CDF) $F_x(X_1, X_2; t)$. When studying first spiking times, it is common to consider deterministic initial conditions in (1). This setting corresponds to the initial condition for the PDF equation (4) being the Dirac delta function, which introduces significant numerical challenges [49]. These can be circumvented by deploying the CDF method (see [24] and the references therein). It would result in a partial differential equation for the CDF of system states, for which the deterministic initial condition in (1) translates into a Heaviside function as the initial CDF.
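The first-exit-time distribution can always be estimated by brute-force Monte Carlo, which is the baseline such PDF/CDF methods aim to beat. A minimal sketch for a generic stochastically forced state crossing the threshold $X = 0$ (an Ornstein-Uhlenbeck surrogate and its parameter values are used purely for illustration):

```python
import numpy as np

def first_passage_times(theta=1.0, s=1.0, x0=-1.0, T=20.0, dt=1e-2,
                        n_paths=5_000, seed=0):
    """Monte Carlo first times at which OU paths dx = -theta*x dt + s dW reach X = 0."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0)
    tau = np.full(n_paths, np.inf)            # inf marks "not yet crossed"
    for k in range(int(T / dt)):
        x += -theta * x * dt + s * np.sqrt(dt) * rng.normal(size=n_paths)
        newly = (x >= 0.0) & np.isinf(tau)
        tau[newly] = (k + 1) * dt
    return tau

tau = first_passage_times()
cdf = lambda t: float(np.mean(tau <= t))      # empirical CDF of the first exit time
```

The empirical CDF is, by construction, nondecreasing in the time horizon; its smooth PDE-based counterpart is exactly what the CDF method delivers without sampling.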
5. Conclusions
Nonlinear dynamical systems provide a rich framework for understanding complex neuronal processes. Their stochastic analogues allow one to study the effects of externally imposed and/or internally generated noise on neuronal dynamics. They can also be used to quantify parametric uncertainty in the underlying models. Noise sources in stochastic ODE-based neuron models are often treated as Gaussian white noise due to its convenient mathematical properties, one being that the joint PDF of system states satisfies exactly the Fokker-Planck equation. However, Gaussian white noise is not a biologically plausible noise source for many neuronal systems, for which colored noise characterized by a short correlation length is a more realistic representation.

Monte Carlo methods are often employed to calculate PDF solutions of stochastic ODEs, as well as information-theoretic quantities and distributions of various spiking statistics. They are easily implementable, yet computationally demanding. Efficient methods for general Langevin-type systems with colored noise have been studied extensively; however, most are not appropriate for neuron models. Methods utilizing only the first few moments of system states typically underperform because low-order moments can spend considerable time in a low-probability state, illustrated in Fig. 8 by the ensemble mean. Additionally, many methods that do characterize a full PDF solution, such as polynomial chaos expansions and stochastic finite elements, suffer from the curse of dimensionality when applied to systems with short correlation timescales. The method of distributions, comprising PDF and CDF methods, is the exception; this study utilizes the former to discover Fokker-Planck-type equations for stochastic neuron models. Such equations describe the PDF dynamics of neuronal states and, more generally, of the states of Langevin-type systems.
While the PDF equations resulting from the method of distributions are exact for stochastic ODEs driven by white noise, they are generally unclosed and require approximations in the presence of colored noise. We proposed two approaches for constructing such closures, nonlocal large-eddy-diffusivity (LED) closures and a data-driven closure relying on statistical inference, and applied each to study the stochastic LIF and FHN neurons.

Fully nonlocal LED closures result in integro-differential PDF equations that are difficult to discretize numerically and computationally demanding to solve unless the underlying stochastic ODEs exhibit weak nonlinearities and are low-dimensional, e.g., the LIF model. For more complicated models, e.g., the FHN model, localization approximations are needed to render the PDF equation numerically computable. Full localization yields easy-to-solve linear partial-differential PDF equations, but introduces significant approximation error unless the noise intensity and correlation time are small. Partial (semi-) localization provides an attractive trade-off between accuracy and computational complexity; however, this trade-off is likely to be problem dependent. Regardless, LED closures reduce computational costs by several orders of magnitude when compared to the standard Monte Carlo approach, with the local and semi-local LED closures being more efficient than the nonlocal one.

Unlike LED closures, the proposed data-driven closure needs only a few Monte Carlo trials and kernel density estimation to learn terms in the PDF equation from a dictionary of possible differential operators. The relevant derivative features and their coefficients are found by formulating an optimization problem via sparse regression, i.e., Lasso. If the number of equations in a neuron model is greater than one, the dimensionality of the regression problem is inherently high when considering arbitrary variable coefficients.
It can be significantly reduced by approximating the coefficients with a polynomial basis expansion, which is sufficiently accurate for the models considered in this study. While more computationally expensive than the local and semi-local LED closures, the data-driven closure is easier to implement and still less expensive than the standard Monte Carlo approach. Its accuracy is comparable to that of the semi-local LED closure.

The method of distributions is a computationally efficient approach for finding PDF solutions to stochastic neuron models driven by colored noise. Post-processing of these solutions yields distributions of spiking statistics or information-theoretic quantities such as mutual information. For relatively simple models, e.g., the LIF neuron, all of the proposed closures yield consistent results and remain accurate over a wide range of noise intensities and correlation times. For models with more complicated dynamics, e.g., the FHN model and other two- or three-dimensional models such as Morris-Lecar, the semi-local LED and data-driven closures are likely to be in agreement and to maintain good accuracy as long as $\sigma\tau$, the product of the noise intensity and correlation time, is not very large relative to the system's scale. For high-dimensional models, such as Hodgkin-Huxley, the numerical discretization of PDF equations resulting from LED closures may be difficult, and the data-driven closure is likely to give better results.

References

[1] E. M. Izhikevich, Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting, MIT Press, 2007.
[2] W. Gerstner, W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity, Cambridge University Press, 2002. doi:10.1017/CBO9780511815706.
[3] R. FitzHugh, Impulses and physiological states in theoretical models of nerve membrane, Biophys. J. 1(6) (1961) 445–466. doi:10.1016/S0006-3495(61)86902-6.
[4] C. Morris, H. Lecar, Voltage oscillations in the barnacle giant muscle fiber, Biophys. J. 35(1) (1981) 193–213.
doi:10.1016/S0006-3495(81)84782-0.
[5] A. L. Hodgkin, A. F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, J. Physiol. 117(4) (1952) 500–544. doi:10.1113/jphysiol.1952.sp004764.
[6] W. Gerstner, W. M. Kistler, R. Naud, L. Paninski, Neuronal Dynamics, Cambridge University Press, 2014. doi:10.1017/cbo9781107447615.
[7] P. E. Greenwood, L. M. Ward, Stochastic Neuron Models, Springer, Cham, 2016. doi:10.1007/978-3-319-26911-5.
[8] A. Iolov, S. Ditlevsen, A. Longtin, Fokker-Planck and Fortet equation-based parameter estimation for a leaky integrate-and-fire model with sinusoidal and stochastic forcing, J. Math. Neurosc. 4 (2014) 4. doi:10.1186/2190-8567-4-4.
[9] H. Alzubaidi, T. Shardlow, Improved simulation techniques for first exit time of neural diffusion models, Commun. Stat. Simul. Comput. 43(10) (2014) 2508–2520. doi:10.1080/03610918.2012.755197.
[10] H. C. Tuckwell, F. Wan, Time to first spike in stochastic Hodgkin-Huxley systems, Physica A: Stat. Mech. Appl. 351(2) (2005) 427–438.
[11] F. Rieke, Spikes: Exploring the Neural Code, Computational Neuroscience, MIT Press, 1997.
[12] M. E. Yamakou, T. D. Tran, L. H. Duc, J. Jost, The stochastic FitzHugh-Nagumo neuron model in the excitable regime embeds a leaky integrate-and-fire model, J. Math. Biol. 79 (2019) 509–532. doi:10.1007/s00285-019-01366-z.
[13] J. A. White, J. T. Rubinstein, A. R. Kay, Channel noise in neurons, Trends Neurosci. 23(3) (2000) 131–137. doi:10.1016/S0166-2236(99)01521-0.
[14] X. S. Kang, X. M. Liang, H. P. Lü, Enhanced response to subthreshold signals by phase noise in a Hodgkin-Huxley neuron, Chin. Phys. Lett. 30(1) (2013) 018701–018704. doi:10.1088/0256-307X/30/1/018701.
[15] X. Liang, M. Dhamala, L. Zhao, Z. Liu, Phase-disorder-induced double resonance of neuronal activity, Phys. Rev. E 82(1) (2010) 010902–010905.
[16] X. Liang, L. Zhao, Z. Liu, Phase-noise-induced resonance in a single neuronal system, Phys. Rev. E 84(3) (2010) 031916–031920.
[17] W. Guo, L. C. Du, D. C. Mei, Coherence and spike death induced by bounded noise and delayed feedback in an excitable system, Eur. Phys. J. B 85(6) (2012) 182.
[18] Y. Yao, J. Ma, Weak periodic signal detection by sine-Wiener-noise-induced resonance in the FitzHugh-Nagumo neuron, Cogn. Neurodyn. 12 (2018) 343–349. doi:10.1007/s11571-018-9475-3.
[19] H. Risken, T. Frank, The Fokker-Planck Equation: Methods of Solution and Applications, Springer Series in Synergetics, Springer, New York, 1996.
[20] E. Haskell, D. Nykamp, D. Tranchina, A population density method for large-scale modeling of neuronal networks with realistic synaptic kinetics, Neurocomputing 38-40 (2001) 627–632. doi:10.1016/S0925-2312(01)00407-6.
[21] R. Rosenbaum, A diffusion approximation and numerical methods for adaptive neuron models with stochastic inputs, Front. Comput. Neurosci. (2016). doi:10.3389/fncom.2016.00039.
[22] M. Giles, Multilevel Monte Carlo path simulation, Oper. Res. 56(3) (2008) 607–617. doi:10.1287/opre.1070.0496.
[23] P. Wang, A. M. Tartakovsky, D. M. Tartakovsky, Probability density function method for Langevin equations with colored noise, Phys. Rev. Lett. 110(14) (2013) 140602. doi:10.1103/PhysRevLett.110.140602.
[24] D. M. Tartakovsky, P. A. Gremaud, Method of distributions for uncertainty quantification, in: R. Ghanem et al. (Eds.), Handbook of Uncertainty Quantification, Springer, 2015, pp. 1–22.
[25] T. E. Maltba, P. A. Gremaud, D. M. Tartakovsky, Nonlocal PDF methods for Langevin equations with colored noise, J. Comput. Phys. 367 (2018) 87–101. doi:10.1016/j.jcp.2018.04.023.
[26] R. H. Kraichnan, Eddy viscosity and diffusivity: Exact formulas and approximations, Complex Syst. 1 (1987) 805–820.
[27] J. Bakarji, D. M. Tartakovsky, Data-driven discovery of coarse-grained equations, arXiv:2002.00790.
[28] J. L. Strand, Random ordinary differential equations, J. Diff. Equat. 7 (1970) 538–553.
[29] T. Neckel, F. Rupp, Random Differential Equations in Scientific Computing, De Gruyter, 2013.
[30] D. Venturi, T. P. Sapsis, H. Cho, G. E. Karniadakis, A computable evolution equation for the joint response-excitation probability density function of stochastic dynamical systems, Proc. R. Soc. A 468(2139) (2012) 759–783.
[31] D. Venturi, D. M. Tartakovsky, A. M. Tartakovsky, G. E. Karniadakis, Exact PDF equations and closure approximations for advective-reactive transport, J. Comput. Phys. 243 (2013) 323–343. doi:10.1016/j.jcp.2013.03.001.
[32] D. A. Barajas-Solano, A. M. Tartakovsky, Probabilistic density function method for nonlinear dynamical systems driven by colored noise, Phys. Rev. E 93 (2016) 052121.
[33] N. G. van Kampen, Stochastic differential equations, Phys. Rep. 24 (1976) 171–228.
[34] H. Cho, D. Venturi, G. Karniadakis, Adaptive discontinuous Galerkin method for response-excitation PDF equations, SIAM J. Sci. Comput. 35 (2013) B890–B911.
[35] S. L. Brunton, J. L. Proctor, J. N. Kutz, W. Bialek, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. U.S.A. 113(15) (2016) 3932–3937.
[36] H. Schaeffer, Learning partial differential equations via data discovery and sparse optimization, Proc. R. Soc. A 473(2197) (2017). doi:10.1098/rspa.2016.0446.
[37] Clawpack Development Team, Clawpack software, version 5.6.1 (2019).
[38] K. J. in 't Hout, C. Mishra, Stability of ADI schemes for multidimensional diffusion equations with mixed derivative terms, Appl. Numer. Math. 74 (2013) 83–94. doi:10.1016/j.apnum.2013.07.003.
[39] D. MacKay, Information Theory, Inference and Learning Algorithms, 6th Edition, Cambridge University Press, 2003.
[40] O. Y. Savchuk, J. D. Hart, Fully robust one-sided cross-validation for regression functions, Computational Statistics 32(3) (2017) 1003–1025.
[41] O. Savchuk, One-sided cross-validation for nonsmooth density functions, arXiv:1703.05157.
[42] R. Koenker, I. Mizera, Density estimation by total variation regularization, in: V. Nair (Ed.), Advances in Statistical Modeling and Inference, Vol. 3 of Series in Biostatistics, World Scientific, 2007, pp. 613–633. doi:10.1142/9789812708298_0030.
[43] L. J. Ning, P. Liu, The effect of sine-Wiener noises on transition in a genotype selection model with time delays, Eur. Phys. J. B 89(9) (2016) 201.
[44] H. Yang, L. J. Ning, Phase transitions induced by time-delay and different noises, Nonlinear Dyn. 88(4) (2017) 2427–2433.
[45] S. Vejnarová, The multiinformation function as a tool for measuring stochastic dependence, in: M. Jordan (Ed.), Learning in Graphical Models, MIT Press, 1999, pp. 261–296.
[46] S. Watanabe, Information theoretical analysis of multivariate correlation, IBM Journal of Research and Development 4 (1960) 66–82.
[47] A. M. P. Boelens, D. Venturi, D. M. Tartakovsky, Parallel tensor methods for high-dimensional linear PDEs, J. Comput. Phys. 375 (2018) 519–539. doi:10.1016/j.jcp.2018.08.057.
[48] A. M. P. Boelens, D. Venturi, D. M. Tartakovsky, Tensor methods for the Boltzmann-BGK equation, J. Comput. Phys. 421 (2020) 109744. doi:10.1016/j.jcp.2020.109744.
[49] R. Rutjens, G. Jacobs, D. M. Tartakovsky, Method of distributions for systems with stochastic forcing, Int. J. Uncert. Quant. doi:10.1615/Int.J.UncertaintyQuantification.2020031940.
[50] L. Evans, Partial Differential Equations, 2nd Edition, AMS, Providence, 2010.
[51] M. Ye, S. P. Neuman, A. Guadagnini, D. M. Tartakovsky, Nonlocal and localized analyses of conditional mean transient flow in bounded, randomly heterogeneous porous media, Water Resour. Res. 40 (2004) W05104. doi:10.1029/2003WR002099.

Appendix A. Derivation of PDF Equations
Appendix A.1. Regularization Argument
Let us define $\Pi_\epsilon(\mathbf{X}, t)$, a regularized version of $\Pi(\mathbf{X}, t)$ in (2), as
\[ \Pi_\epsilon(\mathbf{X}, t) \equiv (\eta_\epsilon \star \Pi)(\mathbf{X}, t) = \int_{\mathbb{R}^N} \eta_\epsilon(\mathbf{X} - \mathbf{Y})\,\Pi(\mathbf{Y}, t)\,\mathrm{d}\mathbf{Y} = \eta_\epsilon(\mathbf{X} - \mathbf{x}). \qquad (A.1) \]
The standard positive mollifier $\eta_\epsilon \in C^\infty(\mathbb{R}^N)$ satisfies the conditions of symmetry, $\eta_\epsilon(\mathbf{X} - \mathbf{x}) = \eta_\epsilon(\mathbf{x} - \mathbf{X})$, and scaling,
\[ \eta_\epsilon(\mathbf{Y}) = \frac{\epsilon^{-N}}{\int \eta\,\mathrm{d}\mathbf{Y}}\,\eta\Big(\frac{\mathbf{Y}}{\epsilon}\Big), \qquad \eta(\mathbf{Y}) = \begin{cases} \exp\big(\tfrac{1}{|\mathbf{Y}|^2 - 1}\big) & \text{if } |\mathbf{Y}| < 1, \\ 0 & \text{if } |\mathbf{Y}| \ge 1. \end{cases} \qquad (A.2) \]
Following the arguments from [50], one can show that $\Pi_\epsilon$ is a smooth approximation of $\Pi$. Let $\phi(\mathbf{X}, t) \in C_c^\infty(\mathbb{R}^N \times [0, \infty))$. It follows from (A.1) that
\[ I \equiv \int_0^\infty \int_{\mathbb{R}^N} \Pi_\epsilon(\mathbf{X}, t)\,\frac{\partial\phi}{\partial t}(\mathbf{X}, t)\,\mathrm{d}\mathbf{X}\,\mathrm{d}t = \int_0^\infty \int_{\mathbb{R}^N} \eta_\epsilon(\mathbf{X} - \mathbf{x})\,\frac{\partial\phi}{\partial t}(\mathbf{X}, t)\,\mathrm{d}\mathbf{X}\,\mathrm{d}t. \qquad (A.3) \]
Integrating by parts in $t$ and applying the sifting property gives
\[ I = \int_0^\infty \int_{\mathbb{R}^N} \dot\eta_\epsilon(\mathbf{X} - \mathbf{x}) \cdot \mathbf{v}(\mathbf{x}, t, \omega)\,\phi(\mathbf{X}, t)\,\mathrm{d}\mathbf{X}\,\mathrm{d}t - \int_{\mathbb{R}^N} \eta_\epsilon(\mathbf{X} - \mathbf{x}_0)\,\phi(\mathbf{X}, 0)\,\mathrm{d}\mathbf{X} \]
\[ = \int_0^\infty \int_{\mathbb{R}^N} \int_{\mathbb{R}^N} \dot\eta_\epsilon(\mathbf{X} - \mathbf{Y}) \cdot \mathbf{v}(\mathbf{Y}, t, \omega)\,\Pi(\mathbf{Y}, t)\,\phi(\mathbf{X}, t)\,\mathrm{d}\mathbf{Y}\,\mathrm{d}\mathbf{X}\,\mathrm{d}t - \int_{\mathbb{R}^N} \Pi_\epsilon(\mathbf{X}, 0)\,\phi(\mathbf{X}, 0)\,\mathrm{d}\mathbf{X}, \]
where $\dot\eta_\epsilon(\cdot)$ is the derivative of $\eta_\epsilon(\cdot)$. According to the Gauss-Ostrogradsky theorem in $\mathbf{X}$,
\[ I = -\int_0^\infty \int_{\mathbb{R}^N} (\eta_\epsilon \star \mathbf{v}\Pi)(\mathbf{X}, t) \cdot \nabla_{\mathbf{X}}\phi(\mathbf{X}, t)\,\mathrm{d}\mathbf{X}\,\mathrm{d}t - \int_{\mathbb{R}^N} \Pi_\epsilon(\mathbf{X}, 0)\,\phi(\mathbf{X}, 0)\,\mathrm{d}\mathbf{X}. \qquad (A.4) \]
It follows from (A.4) and (A.3) that, for any $\phi \in C_c^\infty(\mathbb{R}^N \times [0, \infty))$,
\[ \int_0^\infty \int_{\mathbb{R}^N} \Pi_\epsilon\,\frac{\partial\phi}{\partial t}\,\mathrm{d}\mathbf{X}\,\mathrm{d}t + \int_0^\infty \int_{\mathbb{R}^N} (\eta_\epsilon \star \mathbf{v}\Pi) \cdot \nabla_{\mathbf{X}}\phi\,\mathrm{d}\mathbf{X}\,\mathrm{d}t + \int_{\mathbb{R}^N} \Pi_\epsilon(\mathbf{X}, 0)\,\phi(\mathbf{X}, 0)\,\mathrm{d}\mathbf{X} = 0. \]
By standard arguments, taking the limit $\epsilon \to 0$ yields
\[ \int_0^\infty \int_{\mathbb{R}^N} \Pi\,\frac{\partial\phi}{\partial t}\,\mathrm{d}\mathbf{X}\,\mathrm{d}t + \int_0^\infty \int_{\mathbb{R}^N} (\mathbf{v}\Pi) \cdot \nabla_{\mathbf{X}}\phi\,\mathrm{d}\mathbf{X}\,\mathrm{d}t + \int_{\mathbb{R}^N} \Pi(\mathbf{X}, 0)\,\phi(\mathbf{X}, 0)\,\mathrm{d}\mathbf{X} = 0. \qquad (A.5) \]
Hence, $\Pi$ is the distributional solution to (3).

Appendix A.2. LED closure derivation
Let $\mathcal{L}$ denote the linear operator
\[ \mathcal{L}\Pi = \frac{\partial\Pi}{\partial t} + \nabla_{\mathbf{X}} \cdot (\langle\mathbf{v}\rangle\Pi), \]
together with the initial and boundary conditions
\[ \Pi(\mathbf{X}, 0) = \delta(\mathbf{x}_0 - \mathbf{X}) \quad \text{and} \quad \lim_{|\mathbf{X}| \to \infty} \Pi(\mathbf{X}, t) = 0. \qquad (A.6) \]
Let $\hat{\mathcal{L}}$ denote the adjoint of $\mathcal{L}$, i.e.,
\[ \hat{\mathcal{L}}\Psi = -\frac{\partial\Psi}{\partial s} - \langle\mathbf{v}\rangle \cdot \nabla_{\mathbf{Y}}\Psi, \]
with vanishing boundary condition at infinity and vanishing terminal condition $\Psi(\mathbf{Y}, t) = 0$. Finally, let $G = G(\mathbf{X}, t; \mathbf{Y}, s)$ be the Green's function of $\hat{\mathcal{L}}$; it is defined as a solution of
\[ \hat{\mathcal{L}}G = \delta(\mathbf{X} - \mathbf{Y})\,\delta(t - s), \qquad (A.7) \]
with vanishing boundary condition at infinity and terminal condition $G(\mathbf{X}, t; \mathbf{Y}, t) = 0$. We use the method of characteristics to compute this Green's function. The characteristics of (A.7), $\boldsymbol{\varphi} = \boldsymbol{\varphi}(r)$ with $s \le r \le t$, satisfy
\[ \frac{\mathrm{d}\boldsymbol{\varphi}}{\mathrm{d}r} = \langle\mathbf{v}(\boldsymbol{\varphi}, r)\rangle, \qquad r \in (s, t), \qquad (A.8) \]
subject to the initial condition
\[ \boldsymbol{\varphi}(s) = \mathbf{Y}. \qquad (A.9) \]
It induces the associated flow $\boldsymbol{\Phi}$ via $\boldsymbol{\varphi}(r) = \boldsymbol{\Phi}(r; \mathbf{Y}, s)$. Along these characteristics, (A.7) reduces to
\[ \frac{\mathrm{d}}{\mathrm{d}r}\,G(\mathbf{X}, t; \boldsymbol{\varphi}, r) = -\delta(t - r)\,\delta(\mathbf{X} - \boldsymbol{\varphi}), \qquad r \in (s, t), \]
with $G(\mathbf{X}, t; \boldsymbol{\Phi}(t; \mathbf{Y}, s), t) = 0$. Its solution gives the Green's function,
\[ G(\mathbf{X}, t; \mathbf{Y}, s) = H(t - s)\,\delta(\mathbf{X} - \boldsymbol{\Phi}(t; \mathbf{Y}, s)), \qquad (A.10) \]
where $H$ is the Heaviside function.

Elementary manipulations lead to the equation for $\Pi'$,
\[ \mathcal{L}\Pi' = -\nabla_{\mathbf{X}} \cdot (\mathbf{v}'\Pi - \langle\mathbf{v}'\Pi'\rangle), \qquad (A.11) \]
with homogeneous initial conditions and vanishing boundary conditions. Rewriting (A.11) in terms of $\mathbf{Y}$ and $s$, multiplying by $G(\mathbf{X}, t; \mathbf{Y}, s)$, and integrating by parts, we obtain
\[ \int_0^t \int_{\mathbb{R}^N} G\,\mathcal{L}\Pi'\,\mathrm{d}\mathbf{Y}\,\mathrm{d}s = -\int_{\mathbb{R}^N} G(\mathbf{X}, t; \mathbf{Y}, 0)\,\Pi'(\mathbf{Y}, 0)\,\mathrm{d}\mathbf{Y} + \int_0^t \int_{\mathbb{R}^N} \Pi'\,\hat{\mathcal{L}}G\,\mathrm{d}\mathbf{Y}\,\mathrm{d}s, \]
where the additional boundary terms cancel because of the terminal condition on $G$ and the vanishing boundary condition (A.6) at infinity. Accounting for (A.7) and (A.11),
\[ \Pi'(\mathbf{X}, t) = \int_{\mathbb{R}^N} G(\mathbf{X}, t; \mathbf{Y}, 0)\,\Pi'(\mathbf{Y}, 0)\,\mathrm{d}\mathbf{Y} + \int_0^t \int_{\mathbb{R}^N} G(\mathbf{X}, t; \mathbf{Y}, s)\,\nabla_{\mathbf{Y}} \cdot \Big(\big\langle\mathbf{v}'(\mathbf{Y}, s)\,\Pi'(\mathbf{Y}, s)\big\rangle - \mathbf{v}'(\mathbf{Y}, s)\,\Pi(\mathbf{Y}, s)\Big)\,\mathrm{d}\mathbf{Y}\,\mathrm{d}s. \]
The first integral vanishes if the initial condition on $\mathbf{x}$ is deterministic. An exact, albeit unclosed, equation for the stochastic flux $\langle \mathbf{v}' \Pi' \rangle$ is obtained by multiplying the previous relation with $\mathbf{v}'(\mathbf{X}, t)$ and taking the ensemble mean,
\[
\langle \mathbf{v}' \Pi' \rangle(\mathbf{X}, t) = \int_{\mathbb{R}^N} G(\mathbf{X}, t; \mathbf{Y}, 0)\, \langle \mathbf{v}'(\mathbf{X}, t)\, \Pi'(\mathbf{Y}, 0) \rangle\, \mathrm{d}\mathbf{Y} - \int_0^t \int_{\mathbb{R}^N} G(\mathbf{X}, t; \mathbf{Y}, s)\, \nabla_{\mathbf{Y}} \cdot \langle \mathbf{v}'(\mathbf{X}, t)\, \mathbf{v}'^\top(\mathbf{Y}, s)\, \Pi(\mathbf{Y}, s) \rangle\, \mathrm{d}\mathbf{Y}\, \mathrm{d}s.
\]
The LED closure is constructed by setting the first integral to zero and factoring the remaining ensemble mean, $\langle \mathbf{v}' \mathbf{v}'^\top \Pi \rangle \approx \langle \mathbf{v}' \mathbf{v}'^\top \rangle f_{\mathbf{y}}$, resulting in a second-order (in $\sigma\tau$) computable relationship
\[
\langle \mathbf{v}' \Pi' \rangle(\mathbf{X}, t) \approx -\int_0^t \int_{\mathbb{R}^N} G(\mathbf{X}, t; \mathbf{Y}, s)\, \nabla_{\mathbf{Y}} \cdot \big(\langle \mathbf{v}'(\mathbf{X}, t)\, \mathbf{v}'^\top(\mathbf{Y}, s) \rangle\, f_{\mathbf{y}}(\mathbf{Y}; s)\big)\, \mathrm{d}\mathbf{Y}\, \mathrm{d}s. \tag{A.12}
\]
Substituting $G$ from (A.10) and setting $\mathbf{q}(\mathbf{X}, \mathbf{Y}, t, s) = \nabla_{\mathbf{Y}} \cdot \big(\langle \mathbf{v}'(\mathbf{X}, t)\, \mathbf{v}'^\top(\mathbf{Y}, s) \rangle\, f_{\mathbf{y}}(\mathbf{Y}; s)\big)$, we obtain
\[
\langle \mathbf{v}' \Pi' \rangle(\mathbf{X}, t) \approx -\int_0^t \int_{\mathbb{R}^N} \delta(\mathbf{X} - \boldsymbol{\Phi}(t; \mathbf{Y}, s))\, \mathbf{q}(\mathbf{X}, \mathbf{Y}, t, s)\, \mathrm{d}\mathbf{Y}\, \mathrm{d}s \approx -\int_0^t \int_{\mathbb{R}^N} \delta(\mathbf{X} - \boldsymbol{\varphi})\, \mathbf{q}(\mathbf{X}, \boldsymbol{\Phi}(s; \boldsymbol{\varphi}, t), t, s) \left| \frac{\partial \boldsymbol{\Phi}(s; \boldsymbol{\varphi}, t)}{\partial \boldsymbol{\varphi}} \right| \mathrm{d}\boldsymbol{\varphi}\, \mathrm{d}s.
\]
The Jacobian in this expression is made explicit by invoking the Liouville-Ostrogradsky lemma, which is reproduced and proved below because we were unable to find an appropriate reference.
Lemma. Consider an initial value problem
\[
\frac{\mathrm{d} \boldsymbol{\chi}}{\mathrm{d} s} = \mathbf{g}(\boldsymbol{\chi}, s), \qquad \boldsymbol{\chi}(t) = \boldsymbol{\xi}, \tag{A.13}
\]
where $\mathbf{g}: \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}^N$ is a given deterministic function, and $\boldsymbol{\xi}$ is a possibly random initial condition. Both $\mathbf{g}$ and $\boldsymbol{\xi}$ are such that, for any realization of $\boldsymbol{\xi}$, the above system admits a unique solution for the range of time under consideration. Then, if $\boldsymbol{\Phi}$ is the corresponding flow, i.e., $\boldsymbol{\chi}(s) = \boldsymbol{\Phi}(s; \boldsymbol{\xi}, t)$, we have
\[
\left| \frac{\partial \boldsymbol{\Phi}(s; \boldsymbol{\xi}, t)}{\partial \boldsymbol{\xi}} \right| = \exp\left(-\int_s^t \nabla_{\boldsymbol{\chi}} \cdot \mathbf{g}(\boldsymbol{\chi}(r), r)\, \mathrm{d}r\right).
\]
Proof. Following the arguments of Section 2, we show that $f$, the PDF of $\boldsymbol{\chi}$, satisfies
\[
\frac{\partial f}{\partial s} + \nabla_{\boldsymbol{\chi}} \cdot (\mathbf{g} f) = 0.
\]
By construction, the restriction of $f$ to characteristic curves, $\mathcal{P}(s) = f(\boldsymbol{\chi}(s); s)$, satisfies
\[
\frac{\mathrm{d} \mathcal{P}}{\mathrm{d} s} = -\mathcal{P}\, \nabla_{\boldsymbol{\chi}} \cdot \mathbf{g}.
\]
Hence,
\[
\mathcal{P}(s) = \mathcal{P}(t)\, \exp\left(-\int_t^s \nabla_{\boldsymbol{\chi}} \cdot \mathbf{g}(\boldsymbol{\chi}(r), r)\, \mathrm{d}r\right).
\]
On the other hand, treating the flow $\boldsymbol{\Phi}$ as a one-to-one change of variables,
\[
\mathcal{P}(s) = \left| \frac{\partial \boldsymbol{\Phi}^{-1}}{\partial \boldsymbol{\chi}} \right| \mathcal{P}(t).
\]
Combining these two expressions and taking the inverse completes the proof.

We apply the Liouville-Ostrogradsky lemma to $\boldsymbol{\chi}$ and $\boldsymbol{\Phi}$ defined, respectively, as the solution of, and the flow corresponding to, (A.13) with $\mathbf{g}(\boldsymbol{\chi}, s) = \langle \mathbf{v}(\boldsymbol{\chi}, s) \rangle$ and $\boldsymbol{\xi} = \mathbf{X}$. This yields
\[
\langle \mathbf{v}' \Pi' \rangle(\mathbf{X}, t) \approx -\int_0^t J(s; \mathbf{X}, t)\, \nabla_{\boldsymbol{\Phi}} \cdot \big(\langle \mathbf{v}'(\mathbf{X}, t)\, \mathbf{v}'^\top(\boldsymbol{\Phi}(s; \mathbf{X}, t), s) \rangle\, f_{\mathbf{x}}(\boldsymbol{\Phi}(s; \mathbf{X}, t); s)\big)\, \mathrm{d}s, \tag{A.14a}
\]
where
\[
J(s; \mathbf{X}, t) = \exp\left(-\int_s^t \nabla_{\boldsymbol{\chi}} \cdot \langle \mathbf{v}(\boldsymbol{\chi}(r), r) \rangle\, \mathrm{d}r\right). \tag{A.14b}
\]
Substituting (A.14) into (4) gives an integro-differential equation for $f_{\mathbf{x}}$.
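The lemma can be checked numerically on a toy system. The sketch below (an illustration only; the vector field $\mathbf{g}$ is an arbitrary choice whose divergence is the constant $-3$, so the prediction is explicit) computes the flow by RK4 integration, estimates the Jacobian $\partial \boldsymbol{\Phi} / \partial \boldsymbol{\xi}$ by central finite differences, and compares its determinant with $\exp\big(-\int_s^t \nabla \cdot \mathbf{g}\, \mathrm{d}r\big)$:

```python
import numpy as np

# Numerical check of the Liouville-Ostrogradsky lemma on a toy 2-D system.
# Assumption (illustration only): g chosen so that div g = -1 - 2 = -3 is
# constant; the lemma then predicts |det d(Phi)/d(xi)| = exp(-3*(s - t)).
def g(chi, r):
    x, y = chi
    return np.array([-x + 0.1 * y**2, -2.0 * y])  # div g = -3 everywhere

def flow(xi, t, s, n_steps=2000):
    """RK4 solution of d(chi)/dr = g(chi, r), chi(t) = xi, evaluated at r = s."""
    h = (s - t) / n_steps
    chi, r = np.asarray(xi, float), t
    for _ in range(n_steps):
        k1 = g(chi, r)
        k2 = g(chi + 0.5 * h * k1, r + 0.5 * h)
        k3 = g(chi + 0.5 * h * k2, r + 0.5 * h)
        k4 = g(chi + h * k3, r + h)
        chi = chi + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        r += h
    return chi

t, s, xi, eps = 0.0, 1.0, np.array([0.5, -0.3]), 1e-5
# Jacobian d(Phi)/d(xi) by central finite differences
Jac = np.empty((2, 2))
for j in range(2):
    e = np.zeros(2)
    e[j] = eps
    Jac[:, j] = (flow(xi + e, t, s) - flow(xi - e, t, s)) / (2 * eps)

lemma_rhs = np.exp(-3.0 * (s - t))  # exp(-int_s^t div g dr) with div g = -3
```

The forward flow contracts phase-space volume at the constant rate $\nabla \cdot \mathbf{g} = -3$, so both sides equal $\mathrm{e}^{-3} \approx 0.0498$ for $s - t = 1$.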
Its numerical solution poses significant challenges because the resulting problem is local neither in time nor in space.

The localization inherent in the classical LED theory [26] assumes that $f_{\mathbf{x}}$ and its spatial derivatives are approximately constant over the correlation-time interval $(t - \tau, t)$. This gives rise to the approximations $f_{\mathbf{x}}(\boldsymbol{\Phi}(s; \mathbf{X}, t); s) \approx f_{\mathbf{x}}(\mathbf{X}; t)$ and $\nabla_{\boldsymbol{\Phi}} f_{\mathbf{x}}(\boldsymbol{\Phi}(s; \mathbf{X}, t); s) \approx \nabla_{\mathbf{X}} f_{\mathbf{x}}(\mathbf{X}; t)$, so that
\[
\langle \mathbf{v}' \Pi' \rangle(\mathbf{X}, t) \approx \mathbf{V}(\mathbf{X}, t)\, f_{\mathbf{x}}(\mathbf{X}; t) - \mathbf{D}(\mathbf{X}, t)\, \nabla_{\mathbf{X}} f_{\mathbf{x}}(\mathbf{X}; t), \tag{A.15}
\]
where $\mathbf{V}$ and $\mathbf{D}$ are the LED drift velocity vector and diffusion tensor,
\[
\mathbf{V}(\mathbf{X}, t) \equiv -\int_0^t J(s; \mathbf{X}, t)\, \langle \mathbf{v}'(\mathbf{X}, t)\, \nabla_{\boldsymbol{\Phi}} \cdot \mathbf{v}'^\top(\boldsymbol{\Phi}(s; \mathbf{X}, t), s) \rangle\, \mathrm{d}s, \tag{A.16}
\]
\[
\mathbf{D}(\mathbf{X}, t) \equiv \int_0^t J(s; \mathbf{X}, t)\, \langle \mathbf{v}'(\mathbf{X}, t)\, \mathbf{v}'^\top(\boldsymbol{\Phi}(s; \mathbf{X}, t), s) \rangle\, \mathrm{d}s. \tag{A.17}
\]
Substituting (A.15)–(A.17) into (4) results in the classical LED equation for the joint PDF $f_{\mathbf{x}}$:
\[
\frac{\partial f_{\mathbf{x}}}{\partial t} + \nabla_{\mathbf{X}} \cdot \big((\langle \mathbf{v} \rangle + \mathbf{V})\, f_{\mathbf{x}}\big) = \nabla_{\mathbf{X}} \cdot (\mathbf{D}\, \nabla_{\mathbf{X}} f_{\mathbf{x}}). \tag{A.18}
\]
The shortcomings of the above LED closure, most importantly its validity only for short correlation timescales, are investigated in [32, 25, 51], among many other studies.

Appendix A.3. Semi-local LED closure

This closure is based on the mean-field advection problem
\[
\frac{\partial f_{\mathbf{x}}}{\partial s} + \nabla_{\mathbf{X}} \cdot (\langle \mathbf{v} \rangle f_{\mathbf{x}}) = 0, \qquad s < t, \tag{A.19}
\]
which is equivalent to
\[
f_{\mathbf{x}}(\boldsymbol{\Phi}(s; \mathbf{X}, t); s) = J^{-1}(s; \mathbf{X}, t)\, f_{\mathbf{x}}(\mathbf{X}; t). \tag{A.20}
\]
Equation (A.20) gives a local approximation for $f_{\mathbf{x}}$; substituting it into (A.14) yields the semi-local closure
\[
\langle \mathbf{v}' \Pi' \rangle(\mathbf{X}, t) \approx -\int_0^t J(s; \mathbf{X}, t)\, \nabla_{\boldsymbol{\Phi}} \cdot \big(\langle \mathbf{v}'(\mathbf{X}, t)\, \mathbf{v}'^\top(\boldsymbol{\Phi}(s; \mathbf{X}, t), s) \rangle\, J^{-1}(s; \mathbf{X}, t)\, f_{\mathbf{x}}(\mathbf{X}; t)\big)\, \mathrm{d}s. \tag{A.21}
\]
The PDF $f_{\mathbf{x}}$ in (A.21) has been localized from $(\boldsymbol{\Phi}(s; \mathbf{X}, t); s)$ to $(\mathbf{X}; t)$, but the operator $\nabla_{\boldsymbol{\Phi}} \cdot$ has not. We now approximate $\nabla_{\boldsymbol{\Phi}}$ in terms of $\nabla_{\mathbf{X}}$,
\[
\nabla_{\mathbf{X}} = \boldsymbol{\Psi}^\top \nabla_{\boldsymbol{\Phi}} \qquad \text{with} \qquad \Psi_{ij} = \frac{\partial \Phi_i}{\partial X_j}(s; \mathbf{X}, t). \tag{A.22}
\]
Elementary calculus shows that the transpose of the sensitivity matrix of the flow, $\boldsymbol{\Psi}(s; \mathbf{X}, t)$, satisfies
\[
\frac{\mathrm{d} \boldsymbol{\Psi}}{\mathrm{d} s} = \mathbf{J} \boldsymbol{\Psi}, \qquad \boldsymbol{\Psi}(t; \mathbf{X}, t) = \mathbf{I},
\]
where $\mathbf{J}(\mathbf{Y}, s) = \partial \langle \mathbf{v}(\mathbf{Y}, s) \rangle / \partial \mathbf{Y}$ is the Jacobian of the mean-field velocity, and $\mathbf{I}$ is the $N \times N$ identity matrix. Consequently, $\boldsymbol{\Psi}$ can be expressed as
\[
\boldsymbol{\Psi}(s; \mathbf{X}, t) = \mathrm{OE}[\mathbf{J}](s) = \mathbf{I} + \int_t^s \mathbf{J}(\boldsymbol{\Phi}(s_1; \mathbf{X}, t), s_1)\, \mathrm{d}s_1 + \int_t^s \int_t^{s_1} \mathbf{J}(\boldsymbol{\Phi}(s_1; \mathbf{X}, t), s_1)\, \mathbf{J}(\boldsymbol{\Phi}(s_2; \mathbf{X}, t), s_2)\, \mathrm{d}s_2\, \mathrm{d}s_1 + \ldots,
\]
where $\mathrm{OE}$ is the ordered exponential. Using the approximation $\mathbf{J}(\boldsymbol{\Phi}(s; \mathbf{X}, t), s) \approx \mathbf{J}(\mathbf{X}, t)$, the ordered exponential simplifies to the matrix exponential. Hence, we obtain
\[
\boldsymbol{\Psi} \approx \exp\big((s - t)\, \mathbf{J}(\mathbf{X}, t)\big)
\]
and, accounting for (A.22),
\[
\nabla_{\boldsymbol{\Phi}} \approx \exp\big((t - s)\, \mathbf{J}(\mathbf{X}, t)\big)^\top \nabla_{\mathbf{X}}.
\]
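When $\mathbf{J}$ is frozen at $(\mathbf{X}, t)$, all terms of the ordered-exponential series commute and the series collapses to the matrix exponential. A small sketch (with an arbitrary constant $2 \times 2$ Jacobian, chosen purely for illustration) verifies this by integrating $\mathrm{d}\boldsymbol{\Psi}/\mathrm{d}s = \mathbf{J}\boldsymbol{\Psi}$, $\boldsymbol{\Psi}(t) = \mathbf{I}$, backwards to $s < t$ and comparing with $\exp\big((s - t)\mathbf{J}\big)$:

```python
import numpy as np

# Sketch: the sensitivity matrix Psi solves d(Psi)/ds = J Psi, Psi(t) = I.
# For a constant (frozen) Jacobian J, the ordered exponential equals the
# matrix exponential exp((s - t) J). The 2x2 matrix below is an arbitrary
# illustrative choice, not taken from the paper.
J = np.array([[-1.0, 0.5],
              [0.2, -2.0]])

def expm_taylor(A, terms=30):
    """Matrix exponential via its Taylor series (adequate for small ||A||)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def psi_ode(J, t, s, n_steps=2000):
    """RK4 integration of d(Psi)/dr = J Psi from Psi(t) = I to r = s."""
    h = (s - t) / n_steps
    Psi = np.eye(len(J))
    for _ in range(n_steps):
        k1 = J @ Psi
        k2 = J @ (Psi + 0.5 * h * k1)
        k3 = J @ (Psi + 0.5 * h * k2)
        k4 = J @ (Psi + h * k3)
        Psi = Psi + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return Psi

t, s = 1.0, 0.2                     # s < t: Psi propagates backwards in time
Psi_num = psi_ode(J, t, s)
Psi_exp = expm_taylor((s - t) * J)  # frozen-Jacobian matrix exponential
```

For a time-dependent $\mathbf{J}$ the two would differ at order $(t - s)^2$, which quantifies the error incurred by freezing the Jacobian.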