[PDF] Fast and Accurate Langevin Simulations of Stochastic Hodgkin-Huxley Dynamics

Abstract

Fox and Lu introduced a Langevin framework for discrete-time stochastic models of randomly gated ion channels such as the Hodgkin-Huxley (HH) system. They derived a Fokker-Planck equation with state-dependent diffusion tensor D and suggested a Langevin formulation with noise coefficient matrix S such that S S ⊺ =D . Subsequently, several authors introduced a variety of Langevin equations for the HH system. In this paper, we present a natural 14-dimensional dynamics for the HH system in which each \emph{directed} edge in the ion channel state transition graph acts as an independent noise source, leading to a 14×28 noise coefficient matrix S . We show that (i) the corresponding 14D system of ordinary differential \rev{equations} is consistent with the classical 4D representation of the HH system; (ii) the 14D representation leads to a noise coefficient matrix S that can be obtained cheaply on each timestep, without requiring a matrix decomposition; (iii) sample trajectories of the 14D representation are pathwise equivalent to trajectories of Fox and Lu's system, as well as trajectories of several existing Langevin models; (iv) our 14D representation (and those equivalent to it) give the most accurate interspike-interval distribution, not only with respect to moments but under both the L 1 and L ∞ metric-space norms; and (v) the 14D representation gives an approximation to exact Markov chain simulations that are as fast and as efficient as all equivalent models. Our approach goes beyond existing models, in that it supports a stochastic shielding decomposition that dramatically simplifies S with minimal loss of accuracy under both voltage- and current-clamp conditions.

Full PDF

11 Fast and Accurate Langevin Simulations of Stochas-tic Hodgkin-Huxley Dynamics

Shusen Pu and Peter J. Thomas − Case Western Reserve University Departments of Mathematics, Applied Mathematics,and Statistics, Biology, Cognitive Science, and Electrical, Computer, and SystemsEngineering.

Keywords:

Stochastic conductance-based model, Langevin equations, randomlygated ion channel, stochastic shielding, dimension reduction, voltage clamp, currentclamp, efﬁcient simulation.

Abstract

Fox and Lu introduced a Langevin framework for discrete-time stochastic models ofrandomly gated ion channels such as the Hodgkin-Huxley (HH) system. They de-rived a Fokker-Planck equation with state-dependent diffusion tensor D and suggesteda Langevin formulation with noise coefﬁcient matrix S such that SS (cid:124) = D . Subse-quently, several authors introduced a variety of Langevin equations for the HH system.In this paper, we present a natural 14-dimensional dynamics for the HH system in whicheach directed edge in the ion channel state transition graph acts as an independent noisesource, leading to a × noise coefﬁcient matrix S . We show that (i) the corre-sponding 14D system of ordinary differential equations is consistent with the classical4D representation of the HH system; (ii) the 14D representation leads to a noise co-efﬁcient matrix S that can be obtained cheaply on each timestep, without requiring amatrix decomposition; (iii) sample trajectories of the 14D representation are pathwiseequivalent to trajectories of Fox and Lu’s system, as well as trajectories of several ex-isting Langevin models; (iv) our 14D representation (and those equivalent to it) givethe most accurate interspike-interval distribution, not only with respect to moments butunder both the L and L ∞ metric-space norms; and (v) the 14D representation givesan approximation to exact Markov chain simulations that are as fast and as efﬁcient asall equivalent models. Our approach goes beyond existing models, in that it supports astochastic shielding decomposition that dramatically simpliﬁes S with minimal loss ofaccuracy under both voltage- and current-clamp conditions. a r X i v : . [ q - b i o . N C ] M a y Introduction

Many natural phenomena exhibit stochastic ﬂuctuations arising at the molecular scale,the effects of which impact macroscopic quantities. Understanding when and how mi-croscale ﬂuctuations will signiﬁcantly contribute to macroscale behavior is a fundamen-tal problem spanning the sciences. The impact of random ion channel ﬂuctuations onthe timing of action potentials in nerve cells provides an important example. Channelnoise can have a signiﬁcant effect on spike generation (Mainen and Sejnowski, 1995;Schneidman et al., 1998), propagation along axons (Faisal and Laughlin, 2007), andspontaneous (ectopic) action potential generation in the absence of stimulation (ODon-nell and van Rossum, 2015). At the network level, channel noise can drive endogenousvariability of vital rhythms such as respiratory activity (Yu et al., 2017).Hodgkin and Huxley’s quantitative model for active sodium and potassium currentsproducing action potential generation in the giant axon of

Loligo (Hodgkin and Hux-ley, 1952) suggested an underlying system of gating variables consistent with a multi-state Markov process description (Hill and Chen, 1972). The discrete nature of indi-vidual ion channel conductances was conﬁrmed experimentally (Neher and Sakmann,1976). Subsequently, numerical studies of high-dimensional discrete-state, continuous-time Markov chain models produced insights into the effects of ﬂuctuations in discreteion channel populations on action potentials (Skaugen and Walløe, 1979; Strassbergand DeFelice, 1993), aka channel noise (White et al., 1998, 2000).In the standard molecular-level HH model, which we adopt here, the K + channelcomprises four identical “ n ” gates that open and close independently, giving a ﬁve-vertex channel-state diagram with eight directed edges; the channel conducts a currentonly when in the rightmost state (Fig. 1, top). The Na + channel comprises three iden-tical “ m ” gates and a single “ h ” gate, all independent, giving an eight-vertex diagramwith twenty directed edges, of which one is conducting (Fig. 1, bottom).Discrete-state channel noise models are numerically intensive, whether implementedusing discrete-time binomial approximations to the underlying continuous-time Markovprocess (Skaugen and Walløe, 1979; Schmandt and Gal´an, 2012) or continuous-timehybrid Markov models with exponentially distributed state transitions and continuouslyvarying membrane potential. The latter were introduced by (Clay and DeFelice, 1983)and are in principle exact (Anderson et al., 2015). Under voltage-clamp conditionsthe hybrid conductance-based model reduces to a time-homogeneous Markov chain(Colquhoun and Hawkes, 1981) that can be simulated using standard methods such asGillespie’s exact algorithm (Gillespie, 1977, 2007). Even with this simpliﬁcation, suchMarkov Chain (MC) algorithms are numerically expensive to simulate with realisticpopulation sizes of 1000s of channels or greater. Therefore, there is an ongoing needfor efﬁcient and accurate approximation methods.Following Clay and DeFelice’s exposition of continuous time Markov chain imple-mentations, (Fox and Lu, 1994) introduced a Fokker-Planck equation (FPE) frameworkthat captured the ﬁrst and second order statistics of HH ion channel populations in a14-dimensional representation. Taking into account conservation of probability, oneneeds four variables to represent the population of K + channels, seven for Na + , andone for voltage, leading to a 12-dimensional state space description. The resultinghigh-dimensional partial differential equation is impractical to solve numerically. How-2 α n β n N n α n β n N n α n β n N n α n β n N n N n K + Channel m α m β m M Na + Channel m α m β m M Na + Channel m α m β m M Na + Channel M m m α m β m M m α m β m M m α m β m M M m α h β h α h β h α h β h α h β h Figure 1: Molecular potassium (K + ) and sodium (Na + ) channel states for the Hodgkin-Huxley model. Filled circles mark conducting states n and m . Per capita transi-tion rates for each directed edge ( α n , β n , α m , β m , α h and β h ) are voltage dependent(cf. eqns. (80)-(85)). Directed edges are numbered 1-8 (K + channel) and 1-20 (Na + -channel), marked in small red numerals.ever, as Fox and Lu observed, “to every Fokker-Planck description, there is associateda Langevin description” (Fox and Lu, 1994). They therefore introduced a Langevinstochastic differential equation of the form: C dVdt = I app ( t ) − ¯ g Na M ( V − V Na ) − ¯ g K N ( V − V K ) − g leak ( V − V leak ) , (1) d M dt = A Na M + S ξ , (2) d N dt = A K N + S ξ , (3)where C is the capacitance, I app is the applied current, maximal conductances are de-noted ¯ g ion , with V ion being the associated reversal potential, and ohmic leak current g leak ( V − V leak ) . M ∈ R and N ∈ R are vectors for the fractions of Na + andK + channels in each state, with M representing the open channel fraction for Na + ,and N the open channel fraction for K + (Fig. 1). Vectors ξ ( t ) ∈ R and ξ ( t ) ∈ R are independent Gaussian white noise processes with zero mean and unit variances (cid:104) ξ ( t ) ξ (cid:124) ( t (cid:48) ) (cid:105) = I δ ( t − t (cid:48) ) and (cid:104) ξ ( t ) ξ (cid:124) ( t (cid:48) ) (cid:105) = I δ ( t − t (cid:48) ) . The state-dependent ratematrices A Na and A K are given in eqns. (16) and (17). In Fox and Lu’s formulation, S must satisfy S = √ D , where D is a symmetric, positive semi-deﬁnite k × k “diffusionmatrix” (see Appendix D for the D matrices for the standard HH K + and Na + chan-nels). We will refer to the 14-dimensional Langevin equations (1)-(3), with S = √ D ,as the “Fox-Lu” model.The original Fox-Lu model, later called the “conductance noise model” by (Gold-wyn and Shea-Brown, 2011), did not see widespread use until gains in computing speed3ade the square root calculations more feasible. Seeking a more efﬁcient approxima-tion, (Fox and Lu, 1994) also introduced a four-dimensional Langevin version of theHH model. This model was systematically studied in (Fox, 1997) which can be writtenas follows: C dVdt = I app ( t ) − ¯ g Na m h ( V − V Na ) − ¯ g K n ( V − V K ) − g leak ( V − V leak ) (4) dxdt = α x (1 − x ) − β x x + ξ x ( t ) , where x = m, h, or , n. (5)where ξ x ( t ) are Gaussian processes with covariance function E [ ξ x ( t ) , ξ x ( t (cid:48) )] = α x (1 − x ) + β x xN δ ( t − t (cid:48) ) . (6)Here N represents the total number of Na + channels (respectively, the total number ofK + channels) and δ ( · ) is the Dirac delta function. This model, referred as the “subunitnoise model” by (Goldwyn and Shea-Brown, 2011), has been widely used as an ap-proximation to MC ion channel models (see references in Bruce (2009); Goldwyn andShea-Brown (2011)). For example, Schmid et al. (2001) used this approximation to in-vestigate stochastic resonance and coherence resonance in forced and unforced versionsof the HH model (e.g. in the excitable regime). However, the numerical accuracy of thismethod was criticized by several studies (Mino et al., 2002; Bruce, 2009), which foundthat its accuracy does not improve even with increasing numbers of channels.Although more accurate approximations based on Gillespie’s algorithm (using apiecewise constant propensity approximation, Bruce (2009); Mino et al. (2002)) andeven based on exact simulations (Clay and DeFelice, 1983; Newby et al., 2013; An-derson et al., 2015) became available, they remained prohibitively expensive for largenetwork simulations. Meanwhile, Goldwyn and Shea-Brown’s rediscovery of Fox andLu’s earlier conductance based model (Goldwyn and Shea-Brown, 2011; Goldwynet al., 2011) launched a ﬂurry of activity seeking the best Langevin-type approxima-tion. Goldwyn and Shea-Brown (2011) introduced a faster decomposition algorithm tosimulate equations (1)-(3), and showed that Fox and Lu’s method accurately capturedthe fractions of open channels and the inter-spike intervla (ISI) statistics, in comparisonwith Gillespie-type Monte Carlo (MC) simulations. However, despite the developmentof efﬁcient singular value decomposition based algorithms for solving S = √ D , thisstep still causes a bottleneck in the algorithms based on (Fox and Lu, 1994; Goldwynand Shea-Brown, 2011; Goldwyn et al., 2011).Many variations on Fox and Lu’s 1994 Langevin model have been proposed in re-cent years (Dangerﬁeld et al., 2010; Linaro et al., 2011; Dangerﬁeld et al., 2012; Orioand Soudry, 2012; G ¨uler, 2013b; Huang et al., 2013; Pezo et al., 2014; Huang et al.,2015; Fox, 2018) including Goldwyn et al’s work (Goldwyn and Shea-Brown, 2011;Goldwyn et al., 2011), each with its own strengths and weaknesses. One class of meth-ods imposes projected boundary conditions (Dangerﬁeld et al., 2010, 2012); as we willshow in §

5, this approach leads to inaccurate interspike interval distribution, and isinconsistent with a natural multinomial invariant manifold structure for the ion chan-nels. Several methods implement correlated noise at the subunit level, as in (5)-(6)(Fox, 1997; Linaro et al., 2011; G¨uler, 2013a,b). However, if one recognizes that, at the4olecular level, the individual directed edges represent the independent noise sourcesin ion channel dynamics, then the approach incorporating noise at the subunit level ob-scures the biophysical origin of ion channel ﬂuctuations. Some methods introduce thenoisy dynamics at the level of edges rather than nodes, but lump reciprocal edges to-gether into pairs (Orio and Soudry, 2012; Dangerﬁeld et al., 2012; Huang et al., 2013;Pezo et al., 2014). This approach implicitly assumes, in effect, that the ion channelprobability distribution satisﬁes a detailed balance (or microscropic reversibility) condi-tion. However, while detailed balance holds for the HH model under stationary voltageclamp, this condition is violated during active spiking. Finally, the stochastic shield-ing approximation (Schmandt and Gal´an, 2012; Schmidt and Thomas, 2014; Schmidtet al., 2018) does not have a natural formulation in the representation associated withan n × n noise coefﬁcient matrix S ; in the cases of rectangular S matrices used in (Orioand Soudry, 2012; Dangerﬁeld et al., 2012) stochastic shielding can only be applied toreciprocal pairs of edges. We will elaborate on these points in § D = SS (cid:124) is not unique. As Fox recentlypointed out, sub-block determinants of the D matrices play a major role in the structureof the S matrix elements. (Fox, 2018) conjectured that “a universal form for S mayexist”. In this paper we obtain the universal form for the noise coefﬁcient matrix S .Moreover, we prove that our model is equivalent to Fox and Lu’s 1994 model in thestrong sense of pathwise equivalence .The remainder of the paper is organized as follows. In §

2, we review the canonical deterministic

14D version of the HH model. We prove a series of lemmas which show(1) the multinomial submanifold M is an invariant manifold within the 14D space and(2) the velocity on the 14D space and the pushforward of the velocity on the 4D spaceare identical. Moreover, we show (numerically) that (3) the submanifold M is glob-ally attracting, even under current clamp conditions. Fig. 2 illustrates the relationshipbetween the 4D and 14D deterministic HH models. § × LangevinHH model. Like (Orio and Soudry, 2012; Dangerﬁeld et al., 2012; Pezo et al., 2014),we avoid matrix decomposition by computing S directly. The key difference betweenour approach and its closest relative (Pezo et al., 2014) is to use a rectangular n × k matrix S for which each directed edge is treated as an independent noise source, ratherthan lumping reciprocal edges together in pairs. In the new Langevin model, the formof our S matrix reﬂects the biophysical origins of the underlying channel noise, andallows us to apply the stochastic shielding approximation by neglecting the noise onselected individual directed edges. As we prove in §

4, our model (without the stochas-tic shielding approximation) is pathwise equivalent to all those in a particular class ofbiophysically derived Langevin models, including those used in (Fox and Lu, 1994;Goldwyn et al., 2011; Goldwyn and Shea-Brown, 2011; Orio and Soudry, 2012; Pezo5t al., 2014; Fox, 2018). In addition to 4D and 14D deterministic trajectories, Fig. 2 alsoshows a stochastic trajectory generated by our Langevin model. Finally, we compareour Langevin model to several alternative stochastic neural models in terms of accuracy(of the full ISI distribution) and numerical efﬁciency in § In this section, we review the classical four-dimensional model of Hodgkin and Huxley(1952) (HH), as well as its natural fourteen-dimensional version (Dayan and Abbott(2001), § § § §

3. In contrast to the ODE case, the stochastic versions of the 4D and 14Dmodels are not equivalent (Goldwyn and Shea-Brown, 2011).6igure 2: 4D and 14D HH models. The meshed surface is a three dimensional projec-tion of the 14D state space onto three axes representing the voltage, v , the probabilityof all four potassium gates being in the closed state, n , and the probability of exactlyone potassium gate being in the open state, n . Blue curves:

Trajectories of the deter-ministic 14D HH model with initial conditions located on the 4D multinomial invariantsubmanifold, M . We prove that M is an invariant submanifold in § Black curve:

The deterministic limit cycle solution for the 14D HH model, which forms a closedloop within M . Green curve:

A trajectory of the deterministic 14D HH model withinitial conditions (vertical green arrow) off the multinomial submanifold.

Red curve:

A trajectory of the stochastic 14D HH model (cf. §

3) with the same initial conditionsas the green trajectory. The blue and black arrows mark the directions of the trajecto-ries. Note that trajectories starting away from M converge to M ; and all deterministictrajectories converge to the deterministic limit cycle. Parameters of the simulation aregiven in Tab. 5. 7 .1 The 4D Hodgkin-Huxley Model The 4D voltage-gated ion channel HH model is a set of four ordinary differential equa-tions

C dvdt = − ¯ g Na m h ( v − V Na ) − ¯ g K n ( v − V K ) − g L ( v − V L ) + I app , (7) dmdt = α m ( v )(1 − m ) − β m ( v ) m, (8) dhdt = α h ( v )(1 − h ) − β h ( v ) h, (9) dndt = α n ( v )(1 − n ) − β n ( v ) n, (10)where v is the membrane potential, I app is the applied current, and ≤ m, n, h ≤ aredimensionless gating variables associated with Na + and K + channels. The constant ¯ g ion is the maximal value of the conductance for the sodium and potassium channel,respectively. Parameters V ion and C are the ionic reversal potentials and capacitance,respectively. The quantities α x and β x , x ∈ { m, n, h } are the voltage-dependent percapita transition rates, deﬁned in Appendix B.This system is a C ∞ vector ﬁeld on a four-dimensional manifold (with boundary)contained in R : X = {−∞ < v < ∞ , ≤ m, h, n ≤ } = R × [0 , . The manifoldis forward and backward invariant in time. If I app is constant then X has an invariantsubset given by X ∩ { v min ≤ v ≤ v max } , where v min and v max are calculated in Lemma 1.As pointed out by (Keener and Sneyd (1998), §

3, p. 106) and (Keener, 2009), forvoltage either ﬁxed or given as a prescribed function of time, the equations for m, h and n can be interpreted as the parametrization of an invariant manifold embedded ina higher-dimensional time-varying Markov system. Several papers developed this ideafor a variety of ion channel models and related systems (Keener, 2009; Earnshaw andKeener, 2010b) but the theory developed is restricted to the voltage-clamped case.Under ﬁxed voltage clamp, the ion channels form a time-homogeneous Markovprocess with a unique (voltage-dependent) stationary probability distribution. Under time-varying current clamp the ion channels nevertheless form a Markov process, albeitno longer time-homogeneous. Under these conditions the ion channel state convergesrapidly to a multinomial distribution indexed by a low-dimensional set of time-varyingparameters ( m ( t ) , h ( t ) , n ( t ) ) (Keener, 2010). In the current-clamped case, the ion chan-nel process, considered alone, is neither stationary nor Markovian, making the analysisof this case signiﬁcantly more challenging, from a mathematical point of view. For the HH kinetics given in Fig. 1 (on page 3), we deﬁne the eight-component statevector M for the Na + gates, and the ﬁve-component state vector N for the K + gates,respectively, as M = [ m , m , m , m , m , m , m , m ] (cid:124) ∈ [0 , (11) N = [ n , n , n , n , n ] (cid:124) ∈ [0 , , (12)8here (cid:80) i =0 (cid:80) j =0 m ij = 1 and (cid:80) i =0 n i = 1 . The open probability for the Na + channelis M = m , and is N = n for the K + channel. The deterministic 14D HH equationsmay be written (compare (7)-(10)) C dVdt = − ¯ g Na M ( V − V Na ) − ¯ g K N ( V − V K ) − g L ( V − V L ) + I app , (13) d M dt = A Na ( V ) M , (14) d N dt = A K ( V ) N , (15)where the voltage-dependent drift matrices A Na and A K are given by A Na ( V ) =  A Na (1) β m β h α m A Na (2) 2 β m β h α m A Na (3) 3 β m β h

00 0 α m A Na (4) 0 0 0 β h α h A Na (5) β m α h α m A Na (6) 2 β m

00 0 α h α m A Na (7) 3 β m α h α m A Na (8)  , (16) A K ( V ) =  A K (1) β n ( V ) 0 0 04 α n ( V ) A K (2) 2 β n ( V ) 0 00 3 α n ( V ) A K (3) 3 β n ( V ) 00 0 2 α n ( V ) A K (4) 4 β n ( V )0 0 0 α n ( V ) A K (5)  , (17) and the diagonal elements A ion ( i ) = − (cid:88) j : j (cid:54) = i A ion ( j, i ) , for ion ∈ { Na , K } . Earnshaw and Keener (2010b) suggests that it is reasonable to expect that the globalﬂow of the 14D system should converge to the 4D submanifold but also that it is farfrom obvious that it must. Existing theory applies to the voltage-clamped case Keener(2009); Earnshaw and Keener (2010b). Here, we consider instead the current-clampedcase, in which the ﬂuctuations of the ion channel state inﬂuences the voltage evolution,and vice-versa .In the remainder of this section we will (1) deﬁne a multinomial submanifold M and show that it is an invariant manifold within the 14D space, and (2) show that thevelocity on the 14D space and the pushforward of the velocity on the 4D space areidentical. In § M is globally attractingwithin the higher-dimensional space.In order to compare the trajectories of the 14D HH equations with trajectories of thestandard 4D equations, we deﬁne lower-dimensional and higher-dimensional domains9 and Y , respectively, as X = {−∞ < v < ∞ , ≤ m ≤ , ≤ h ≤ , ≤ n ≤ } = R × [0 , ⊂ R Y = {−∞ < v < ∞} ∩ (cid:40) ≤ m ij , (cid:88) i =0 1 (cid:88) j =0 m ij = 1 (cid:41) ∩ (cid:40) ≤ n i , (cid:88) i =0 n i = 1 (cid:41) = R × ∆ × ∆ ⊂ R , (18)where ∆ k is the k -dimensional simplex in R k +1 given by y + . . . + y k +1 = 1 , y i ≥ . The 4D HH model dxdt = F ( x ) , equations (7)-(10), is deﬁned for x ∈ X , and the 14DHH model dydt = G ( y ) , equations (13)-(15), is deﬁned for y ∈ Y . We introduce a14D model 4D model ( v, m , . . . , m , n , . . . , n ) ( v, m, h, n ) v v ( m + m ) + ( m + m ) + m + m mm + m + m + m hn / n / n / n n ‘Table 1: R : Map from the 14D HH model ( m , . . . , m , n , . . . , n ) to the 4D HHmodel ( m, h, n ) . Note that { m , . . . , m } and { n , . . . , n } both follow multinomialdistributions.dimension-reducing mapping R : Y → X as in Table 1, and a mapping from lower tohigher dimension, H : X → Y as in Table 2. We construct R and H in such a way that R ◦ H acts as the identity on X , that is, for all x ∈ X , x = R ( H ( x )) . The maps H and R are consistent with a multinomial structure for the ion channel state distribution,in the following sense. The space Y covers all possible probability distributions on theeight sodium channel states and the ﬁve potassium channel states. Those distributionswhich are products of one multinomial distribution on the K + -channel 1 and a secondmultinomial distribution on the Na + -channel2 form a submanifold of ∆ × ∆ . In thisway we deﬁne a submanifold, denoted M = H ( X ) , the image of X under H .Before showing that the multinomial submanifold M is an invariant manifold withinthe 14D space, we ﬁrst show that the deterministic 14D HH model is deﬁned on abounded domain. Having a bounded forward-invariant manifold is a general propertyof conductance-based models, which may be written in the form dVdt = f ( V, N open ) = 1 C (cid:40) I app − g leak ( V − V leak ) − (cid:88) i ∈I (cid:2) g i N i open ( V − V i ) (cid:3)(cid:41) (19) d N dt = A ( V ) N and (20) N open = O [ N ] . (21)1 That is, distributions indexed by a single open probability n ; with the ﬁve states having probabilities (cid:0) i (cid:1) n i (1 − n ) − i for ≤ i ≤ . That is, distributions indexed by two open probabilities m and h , with the eight states having proba-bilities (cid:0) i (cid:1) m i (1 − m ) − i h j (1 − h ) − j , for ≤ i ≤ , and ≤ j ≤ .

10D model 14D model ( v, m, h, n ) ( v, m , . . . , m , n , . . . , n ) v v (1 − n ) n − n ) n n − n ) n n − n ) n n n n (1 − m ) (1 − h ) m − m ) m (1 − h ) m − m ) m (1 − h ) m m (1 − h ) m (1 − m ) h m − m ) mh m − m ) m h m m h m Table 2: H : Map from the 4D HH model ( m, h, n ) and the 14D HH model ( m , . . . , m , n , . . . , n ) .Here, C is the membrane capacitance, I app is an applied current with upper and lowerbounds I ± respectively, and g i is the conductance for the i th ion channel. The index i runs over the set of distinct ion channel types in the model, I . The gating vector N represents the fractions of each ion channel population in various ion channel states,and the operator O gives the fraction of each ion channel population in the open (orconducting) channel states. The following lemma establishes that any conductance-based model (including the 4D or 14D HH model) is deﬁned on a bounded domain. Lemma 1.

For a conductance-based model of the form (19) - (21) , and for any boundedapplied current I − ≤ I app ≤ I + , there exist upper and lower bounds V max and V min such that trajectories with initial voltage condition V ∈ [ V min , V max ] remain within thisinterval for all times t > , regardless of the initial channel state.Proof. Let V = min i ∈I { V i } ∧ V leak , and V = max i ∈I { V i } ∨ V leak , where the index i runsover I , the set of distinct ion channel types. Note that for all i , ≤ N i open ≤ , and11 i > , g leak > . Therefore when V ≤ V dVdt = 1 C (cid:40) I app − g leak ( V − V leak ) − (cid:88) i ∈I (cid:2) g i N i open ( V − V i ) (cid:3)(cid:41) (22) ≥ C (cid:40) I app − g leak ( V − V ) − (cid:88) i ∈I (cid:2) g i N i open ( V − V ) (cid:3)(cid:41) (23) ≥ C (cid:40) I app − g leak ( V − V ) − (cid:88) i ∈I [ g i × × ( V − V )] (cid:41) (24) = 1 C { I app − g leak ( V − V ) } . (25)Inequality (23) follows because V = min i ∈I { V i } ∧ V leak , and inequality (24) followsbecause V − V ≤ , g i > and N i open ≥ . Let V min := min (cid:110) I − g leak + V , V (cid:111) . When V < V min , dVdt > . Therefore, V will not decrease beyond V min .Similarly, when V ≥ V dVdt = 1 C (cid:40) I app − g leak ( V − V leak ) − (cid:88) i ∈I (cid:2) g i N i open ( V − V i ) (cid:3)(cid:41) (26) ≤ C (cid:40) I app − g leak ( V − V ) − (cid:88) i ∈I (cid:2) g i N i open ( V − V ) (cid:3)(cid:41) (27) ≤ C (cid:40) I app − g leak ( V − V ) − (cid:88) i ∈I [ g i × × ( V − V )] (cid:41) (28) = 1 C { I app − g leak ( V − V ) } . (29)Inequality (27) holds because V = max i ∈I { V i } ∨ V leak , and inequality (28) holds because V − V ≥ , g i > and N i open ≥ . Let V max = max (cid:110) I app g leak + V , V (cid:111) . When V > V max , dVdt < . Therefore, V will not go beyond V max .We conclude that if V takes an initial condition in the interval [ V min , V max ] , then V ( t ) remains within this interval for all t ≥ .Given that y ∈ Y has a bounded domain, Lemma 2 follows directly, and establishesthat the multinomial submanifold M is a forward-time–invariant manifold within the14D space. The proof of Lemma 2 is in Appendix C. Lemma 2.

Let X and Y be the lower-dimensional and higher-dimensional Hodgkin-Huxley manifolds given by (18) , and let F and G be the vector ﬁelds on X and Y deﬁned by (7) - (10) and (13) - (15) , respectively. Let H : X → M ⊂ Y and R : Y →X be the mappings given in Tables 2 and 1, respectively, and deﬁne the multinomialsubmanifold M = H ( X ) . Then M is forward-time–invariant under the ﬂow generatedby G . Moreover, the vector ﬁeld G , when restricted to M , coincides with the vector ﬁeldinduced by F and the map H . That is, for all y ∈ M , G ( y ) = D x H ( R ( y )) · F ( R ( y )) . § G on Y convergesto M exponentially fast. Thus, an initial probability distribution over the ion chan-nel states that is not multinomial quickly approaches a multinomial distribution withdynamics induced by the 4D HH equations. Similar results, restricted to the voltage-clamp setting, were established by Keener and Earnshaw (Keener and Sneyd, 1998;Keener, 2009; Earnshaw and Keener, 2010b). Keener and Earnshaw (Keener and Sneyd, 1998; Keener, 2009; Earnshaw and Keener,2010b) showed that for Markov chains with constant (even time varying) transitionrates: (i) the multinomial probability distributions corresponding to mean-ﬁeld mod-els (such as the HH sodium or potassium models) form invariant submanifolds withinthe space of probability distributions over the channel states, and (ii) arbitrary initialprobability distributions converged exponentially quickly to the invariant manifold. Forsystems with prescribed time-varying transition rates, such as for an ion channel systemunder voltage clamp with a prescribed voltage V ( t ) as a function of time, the distribu-tion of channel states had an invariant submanifold again corresponding to the multi-nomial distributions, and the ﬂow on that manifold induced by the evolution equationswas consistent with the ﬂow of the full system.In the preceding section we established the dynamical consistency of the 14D and4D models with enough generality to cover both the voltage-clamp and current-clampsystems; the latter is distinguished by NOT having a prescribed voltage trace, but ratherhaving the voltage coevolve along with the (randomly ﬂuctuating) ion channel states.Here, we give numerical evidence for exponential convergence under current clampsimilar to that established under voltage clamp by Keener and Earnshaw.Rather than providing a rigorous proof, we give numerical evidence for the standarddeterministic HH model that y → M under current clamp (spontaneous ﬁring condi-tions) in the following sense: if y ( t ) is a solution of ˙ y = G ( y ) with arbitrary initialdata y ∈ Y , then || y ( t ) − H ( R ( y ( t ))) || → as t → ∞ , exponentially quickly. More-over, the convergence rate is bounded by λ = max ( λ v , λ Na , λ K ) , where λ ion is the leastnegative nontrivial eigenvalue of the channel state transition matrix (over the voltagerange V min ≤ v ≤ V max ) for a given ion, and − /λ v is the largest value taken by themembrane time constant (for V min ≤ v ≤ V max ). In practice, we ﬁnd that the membranetime constant does not determine the slowest time scale for convergence to M . In fact itappears that the second-least-negative eigenvalues (not the least-negative eigenvalues)of the ion channel matrices set the convergence rate.Note that y ∈ Y can be written as y = [ V ; M ; N ] . As shown in Appendix C,the Jacobian matrix ∂H∂x consists of three block matrices: one for the voltage terms, ∂V∂v , one associated to the Na + gates, given by ∂ M ∂m and ∂ M ∂h , and one corresponding tothe K + gates, ∂ N ∂n . Fixing a particular voltage v , let λ i , i ∈ { , , , . . . , } be theeight eigenvalues of A Na and v i be the associated eigenvectors, i.e., A Na v i = λ i v i forthe rate matrix in equations (14). Similarly, let η i , w i , i ∈ { , , , . . . , } be the ﬁveeigenvalues and the associated eigenvectors of A K , i.e., A K w i = η i w i , for the rate matrix13n equations (15). If we rank the eigenvalues of either matrix in descending order, theleading eigenvalue is always zero (because the sum of each column for A Na and A K is zero for every V ) and the remainder are real and negative. Let λ and η denotethe largest (least negative) nontrivial eigenvalues of A Na and A K , respectively, and let v ∈ R and w ∈ R be the corresponding eigenvectors.The eigenvectors of the full 14D Jacobian are not simply related to the eigenvec-tors of the component submatrices, because the ﬁrst (voltage) row and column con-tain nonzero off-diagonal elements. However, the eigenvectors associated to the largestnonzero eigenvectors of A Na and A K (respectively v and w ) are parallel to ∂M/∂h and ∂N/∂n , regardless of voltage. In other words, the slowest decaying directions for eachion channel, v and w , transport the ﬂow along the multinominal sub-manifold of Y .Therefore, it is reasonable to make the hypothesis that if Y ( t ) is a solution of ˙ y = G ( y ) with arbitrary initial data y ∈ Y , then || y ( t ) − H ( R ( y ( t ))) |||| y (0) − H ( R ( y (0))) || (cid:46) e − λ t (30)for λ being the second largest nonzero eigenvalue of A K and A Na over all v in therange v min < v < v max . The convergence behavior is plotted numerically in Fig. 3, andis consistent with the Ansatz (30). We calculate the distance from a point y to M as y max = argmax y ∈Y (cid:107) y − H ( R ( y )) (cid:107) . (31)In order to obtain an upper bound on the distance as a function of time, we begin withthe furthest point in the simplex from M , by numerically ﬁnding the solution to theargument (31), which is y max = [ v, . , , , . , , , , , . , , , , . . This vector represents the furthest possible departure from the multinomial distribution:all probability equally divided between the extreme states m and m for the sodiumchannel, and the extremal states n and n for potassium. The maximum distance fromthe multinomial submanifold M , d max , is calculated using this point. As shown inFig. 3, the function d max e − λ t provides a tight upper bound for the convergence ratefrom arbitrary initial data y ∈ Y to the invariant submanifold M . Finite populations of ion channels generate stochastic ﬂuctuations (“channel noise”) inionic currents that inﬂuence action potential initiation and timing (White et al., 1998;Schneidman et al., 1998). At the molecular level, ﬂuctuations arise because transitionsbetween individual ion channel states are stochastic (Hill and Chen, 1972; Neher andSakmann, 1976; Skaugen and Walløe, 1979). Each directed edge in the ion channelstate transition diagrams (cf. Fig. 1) introduces an independent noise source. It is ofinterest to be able to attribute variability of the interspike interval timing distribution tospeciﬁc molecular noise sources, speciﬁcally individual directed edges in each channel14igure 3:

Convergence of trajectories y ( t ) , for arbitrary initial conditions y ∈ Y , to the multi-nomial submanifold M , for an ensemble of random initial conditions. A: distance (31) between y ( t ) and M . B: Logarithm of the distance in panel A. The red solid line shows d max e − λ t inpanel A and log( d max ) − λ t in panel B. state graph. In order to explore these contributions, we develop a system of Langevinequations for the Hodgkin-Huxley equations, set in a 14-dimensional phase space.Working with a higher-dimensional stochastic model may appear inconvenient, butin fact has several advantages. First, any projection of an underlying 14D model onto alower (e.g. 4D) stochastic model generally entails loss of the Markov property. Second,the higher-dimensional representation allows us to assess the contribution of individ-ual molecular transitions to the macroscopically observable variability of timing in theinterspike interval distribution. Third, by using a rectangular noise coefﬁcient matrixconstructed directly from the transitions in the ion channel graphs, we avoid a matrixdecomposition step. This approach leads to a fast algorithm that is equivalent to theslower algorithm due to Fox and Lu (Fox and Lu, 1994; Goldwyn and Shea-Brown,2011) in a strong sense (pathwise equivalence) that we detail in § An “exact” representation of the Hodgkin-Huxley system with a population of M tot sodium channels and N tot potassium channels treats each of the 20 directed edges inthe sodium channel diagram, and each of the 8 directed edges in the potassium channeldiagram, as independent Poisson processes, with voltage-dependent per capita intensi-ties. As in the deterministic case, the sodium and potassium channel population vectors M and N satisfy (cid:80) i =0 (cid:80) j =0 M ij ≡ ≡ (cid:80) i =0 N i .3 Thus they are constrained, re-3 We annotate the stochastic population vector M either as [ M , M , . . . , M ] or as [ M , . . . , M ] ,whichever is more convenient. In either notation M ≡ M is the conducting state of the Na + channel.For the K + channel, N denotes the conducting state. R and a 4D simplex embedded in R . In therandom–time-change representation (Anderson and Kurtz, 2015) the exact evolutionequations are written in terms of sums over the directed edges E for each ion channel, E Na = { , . . . , } and E K = { , . . . , } , cf. Fig. 1. M ( t ) = M (0) + 1 M tot (cid:88) k ∈E Na ζ Na k Y Na k (cid:18) M tot (cid:90) t α Na k ( V ( s )) M i ( k ) ( s ) ds (cid:19) (32) N ( t ) = N (0) + 1 N tot (cid:88) k ∈E K ζ K k Y K k (cid:18) N tot (cid:90) t α K k ( V ( s )) N i ( k ) ( s ) ds (cid:19) . (33)Here ζ ion k is the stoichiometry vector for the k th directed edge. If we write i ( k ) forthe source node and j ( k ) for the destination node of edge k , then ζ ion k = e ion j ( k ) − e ion i ( k ) .4 Each Y ion k ( τ ) is an independent unit-rate Poisson process, evaluated at “inter-nal time” (or integrated intensity) τ , representing the independent channel noise aris-ing from transitions along the k th edge. The voltage-dependent per capita transitionrate along the k th edge is α ion k ( v ) , and M i ( k ) ( s ) (resp. N i ( k ) ( s ) ) is the fractional oc-cupancy of the source node for the k th transition at time s . Thus, for example, thequantity M tot α Na k ( V ( s )) M i ( k ) ( s ) gives the net intensity along the k th directed edge inthe Na + channel graph at time s . Remark 1.

Under “voltage-clamp” conditions, with the voltage V held ﬁxed, (32) - (33) reduce to a time-invariant ﬁrst-order transition process on an directed graph (Schmidtand Thomas, 2014; Gadgil et al., 2005). Under “current-clamp” conditions, the voltage evolves according to a conditionallydeterministic current balance equation of the form dVdt = 1 C { I app ( t ) − ¯ g Na M ( V − V Na ) − ¯ g K N ( V − V K ) − g leak ( V − V leak ) } . (34)Here, C ( µF/cm ) is the capacitance, I app ( nA/cm ) is the applied current, the maximalconductance is ¯ g chan ( mS/cm ), V chan ( mV ) is the associated reversal potential, and theohmic leak current is g leak ( V − V leak ) .The random–time-change representation (32)-(34) leads to an exact stochastic sim-ulation algorithm, given in (Anderson and Kurtz, 2015); equivalent simulation algo-rithms have been used previously (Clay and DeFelice, 1983; Newby et al., 2013). Manyauthors substitute a simpliﬁed Gillespie algorithm that imposes a piecewise-constantpropensity approximation, ignoring the voltage dependence of the transition rates α ion k between channel transition events (Goldwyn et al., 2011; Goldwyn and Shea-Brown,2011; Orio and Soudry, 2012; Pezo et al., 2014). The two methods give similar momentstatistics, provided N tot , M tot (cid:38) (Anderson and Kurtz, 2015); their similarity regard-ing path-dependent properties (including interspike interval distributions) has not beenstudied in detail. Moreover, both Markov chain algorithms are prohibitively slow formodest numbers (e.g. thousands) of channels; the exact algorithm may be even slowerthan the approximate Gillespie algorithm. For consistency with previous studies, in this4 We write e Na i and e K i for the i th standard unit vector in R or R , respectively. M tot = 6000 Na + and N tot = 1800 K + channels as our “gold standard” Markov chain (MC) model,as in (Goldwyn and Shea-Brown, 2011).In § For sufﬁciently large number of channels, (Schmidt and Thomas, 2014; Schmidt et al.,2018) showed that under voltage clamp, equations (32)-(33) can be approximated bya multidimensional Ornstein-Uhlenbeck (OU) process (or Langevin equation) in theform5 d M = (cid:88) k =1 ζ Na k (cid:26) α Na k ( V ) M i ( k ) dt + (cid:113) (cid:15) Na α Na k ( V ) M i ( k ) dW Na k (cid:27) (35) d N = (cid:88) k =1 ζ K k (cid:26) α K k ( V ) N i ( k ) dt + (cid:113) (cid:15) K α K k ( V ) N i ( k ) dW K k (cid:27) . (36)Here, M , N , ζ ion k , and α ion k have the same meaning as in (32)-(33). The channel state increments in a short time interval dt are d M and d N , respectively. The ﬁnite-timeincrement in the Poisson process Y ion k is now approximated by a Gaussian process,namely the increment dW ion k in a Wiener (Brownian motion) process associated witheach directed edge. These independent noise terms are scaled by (cid:15) Na = 1 /M tot and (cid:15) K = 1 /N tot , respectively.Equations (34)-(36) comprise a system of Langevin equations for the HH system(under current clamp) on a 14-dimensional phase space driven by 28 independent whitenoise sources, one for each directed edge. These equations may be written succinctlyin the form d X = f ( X ) dt + √ (cid:15) G ( X ) d W ( t ) (37)where we deﬁne the 14-component vector X = ( V ; M ; N ) , and W ( t ) is a Wiener pro-cess with 28 independent components. The deterministic part of the evolution equation5 The convergence of the discrete channel system to a Langevin system under voltage clamp is aspecial case of Kurtz’ theorem (Kurtz, 1981). ( X ) = (cid:2) dVdt ; d M dt ; d N dt (cid:3) is the same as the mean-ﬁeld, equations (13)-(15). The state-dependent noise coefﬁcient matrix G is × and can be written as √ (cid:15) G =  × × S Na × × S K  . The coefﬁcient matrix S K is S K = 1 √ N tot  −√ α n n √ β n n √ α n n −√ β n n −√ α n n √ β n n √ α n n −√ β n n · · ·· · · −√ α n n √ β n n √ α n n −√ β n n −√ α n n √ β n n √ α n n −√ β n n  , and S Na is given in Appendix D. Note that each of the 8 columns of S K corresponds tothe ﬂux vector along a single directed edge in the K + channel transition graph. Simi-larly, each of the 20 columns of S Na corresponds to the ﬂux vector along a directed edgein the Na + graph (cf. App. § D).

Remark 2.

Although the ion channel state trajectories generated by equation (37) arenot strictly bounded to remain within the nonnegative simplex, empirically, the voltagenevertheless remains within speciﬁed limits with overwhelming probability.

To facilitate comparison of the model (34)-(36) with prior work (Fox and Lu, 1994;Fox, 1997; Goldwyn and Shea-Brown, 2011), we may rewrite the × D Langevindescription in the equivalent form

C dVdt = I app ( t ) − ¯ g Na M ( V − V Na ) − ¯ g K N ( V − V K ) − g leak ( V − V leak ) , (38) d M dt = A Na M + S Na ξ Na , (39) d N dt = A K M + S K ξ K , (40)The drift matrices A Na and A K remain the same as in (Fox and Lu, 1994), and are thesame as in the 14D deterministic model (16)-(17). S Na and S K are constructed fromdirect transitions of the underlying kinetics in Fig. 1, ξ Na ∈ R and ξ K ∈ R are vectorsof independent Gaussian white noise processes with zero mean and unit variance.Fox and Lu’s original approach (Fox and Lu, 1994) requires solving a matrix squareroot equation SS (cid:124) = D to obtain a square ( × for Na + or × for K + ) noise coef-ﬁcient matrix consistent with the state-dependent diffusion matrix D . As an advantage,18he ion channel representation (38)-(40) uses sparse, nonsquare noise coefﬁcient ma-trices ( × for the Na + channel and × for the K + channel), which exposes theindependent sources of noise for the system.The new Langevin model in (38)-(40) does not require detailed balance, which givesmore insights to the underlying kinetics. Review papers such as (Goldwyn and Shea-Brown, 2011; Pezo et al., 2014; Huang et al., 2015), did systematic comparison ofvarious stochastic versions of the HH model. In § §

5, we quantitaviely analyzethe connection between the new model and other existing models (Fox and Lu, 1994;Goldwyn et al., 2011; Goldwyn and Shea-Brown, 2011; Dangerﬁeld et al., 2010; Orioand Soudry, 2012; Dangerﬁeld et al., 2012; Huang et al., 2013; Pezo et al., 2014; Huanget al., 2015; Fox, 2018). Problems such as the boundary constrains are beyond thescope of this paper, however, we would like to connect the new model to another typeof approximation to the MC model, namely the stochastic shielding approximation.

The stochastic shielding (SS) approximation was introduced by Schmandt and Gal ´ an(Schmandt and Gal´an, 2012), in order to approximate the Markov process using ﬂuc-tuations from only a subset of the transitions, namely those corresponding to changesin the observable states. In (Schmidt and Thomas, 2014), we showed that, under volt-age clamp, each directed edge makes a distinct contribution to the steady-state varianceof the ion channel conductance, with the total variance being a sum of these contribu-tions. We call the variance due to the k th directed edge the edge importance ; assumingdetailed balance, it is given by R k = J k n (cid:88) i =2 n (cid:88) j =2 (cid:18) − λ i + λ j (cid:19) ( c (cid:124) v i ) ( w (cid:124) i ζ k ) ( ζ (cid:124) k w j ) (cid:0) v (cid:124) j c (cid:1) . (41)Here, J k is the steady-state probability ﬂux along the k th directed edge; λ n < λ n − ≤ . . . ≤ λ < are the eigenvalues of the drift matrix ( A Na or A K , respectively), and v i (resp. w i ) are the corresponding right (resp. left) eigenvectors of the drift matrix.Each edge’s stoichiometry vector ζ k has components summing to zero; consequentlythe columns of A Na and A K all sum to zero. Thus each drift matrix has a leading trivialeigenvalue λ ≡ . The vector c speciﬁes the unitary conductance of each ion channelstate; for the HH model it is proportional to e Na or e K , respectively.Fig. 4 shows the edge importance for each pair of edges in the HH Na + and K + ionchannel graph, as a function of voltage in the range [ − , mV. Note that recipro-cal edges have identical R k due to detailed balance. Under voltage clamp, the largestvalue of R k for the HH channels always corresponds to directly observable transitions,i.e. edges k such that | c (cid:124) ζ k | > , although this condition need not hold in general(Schmidt et al., 2018).To apply the stochastic shielding method under current clamp, we simulate themodel with noise from only a selected subset E (cid:48) ⊂ E of directed edges, replacing19igure 4: Stochastic shielding under voltage clamp. Redrawn (with permission) fromFigs. 10 & 13 of (Schmidt and Thomas, 2014). Each curve shows the edge importanceR k (equation (41)) as a function of voltage in the range [ − , mV for a differentedge pair. For the K + kinetics, R = R are the largest R k value in the voltage rangeabove. For the Na + kinetics, R = R have the largest R k values in the subthresholdvoltage range (c. [ − , − mV), and R = R have the largest R k values in thesuprathreshold voltage range (c. [ − , mV).(39)-(40) with d M dt = A Na M + S (cid:48) Na ξ Na , (42) d N dt = A K M + S (cid:48) K ξ K , , (43)where S (cid:48) Na (resp. S (cid:48) K ) is a reduced matrix containing only the noise coefﬁcients fromthe most important edges E (cid:48) . That is, E (cid:48) contains a subset of edges with the largestedge-importance values R k .Schmandt and Gal´an (2012) assumed that the edges with the largest contributioncontribution to current ﬂuctuations under voltage clamp would also make the largestcontributions to variability in voltage and timing under current clamp, and includededges − of the K + channel ( E (cid:48) K = { , } ) and edges − and − of theNa + channel ( E (cid:48) Na = { , , , } ), yielding an × matrix S (cid:48) Na and an × matrix S (cid:48) K . They demonstrated numerically that restricting stochastic forcing to theseedges gave a signiﬁcantly faster simulation with little appreciable change in statisticalbehavior: under voltage clamp, the mean current remained the same, with a small (butnoticeable) decrease in the current variance; meanwhile similar inter-spike interval (ISI)statistics were observed.Under current clamp, detailed balance is violated, and it is not clear from mathemat-ical principles whether the edges with the largest R k under voltage clamp necessarilymake the largest contribution under other circumstances. In order to evaluate the con-tribution of the ﬂuctuations driven by each directed edge on ISI variability, we test the20tochastic shielding method by removing all but one column of S ion at a time. That is,we restrict to a single noise source and observe the resulting ISI variance empirically.For example, to calculate the importance of the k th direct edge in the Na + channel, wesuppress the noise from all other edges by setting S (cid:48) K ξ K = × and S (cid:48) Na = (cid:104) × , · · · , S Na (: , k ) , · · · , × (cid:105) i.e., only include the k th column of S Na and set other columns to be zeros. The ISIvariance was calculated from an ensemble of voltage traces, each spanning c. 500ISIs.Fig. 5A plots the logarithm of the ISI variance for each edge in E K . Vertical bars(cyan) show the ensemble mean of the ISI variance, with a 95% conﬁdence interval su-perimposed (magenta). Several observations are in order. First, the ISI variance drivenby the noise in each edge decreases rapidly, the further the edge is from the observabletransitions (edges 7,8), reﬂecting the underlying “stochastic shielding” phenomenon.Second, the symmetry of the edge importance for reciprocal edge pairs ((1,2), (3,4),(5,6) and (7,8)) that is observed under voltage clamp is broken under current clamp.The contribution of individual directed edges to timing variability under current clamphas another important difference compared with the edge importance (current ﬂuctua-tions) under voltage clamp. A similar breaking of symmetry for reciprocal edges is seenfor the Na + channel, again reﬂecting the lack of detailed balance during active spiking.Fig. 5B shows the ISI variance when channel noise is included on individual edgesof E Na . Here the difference between voltage and current clamp is striking. Under volt-age clamp, the four most important edges are always those representing “observabletransitions”, in the sense that the transition’s stoichiometry vector ζ is not orthogonal tothe conductance vector c . That is, the four most important pairs are always 11-12 and19-20, regardless of voltage (Fig. 4). Under current clamp, the most important edgesare 17, 18, 19 and 20. Although edges 11 and 12 are among the four most impor-tant sources of channel population ﬂuctuations under voltage clamp, they are not evenamong the top ten contributors to ISI variance, when taken singly. Even though edges17 and 18 are “hidden”, meaning they do not directly change the instantaneous channelconductance, these edges are nevertheless the second most important pair under currentclamp. Therefore, when we implement the stochastic-shielding based approximation,we include the pairs 17-18 and 19-20 in equation (42). We refer to the approximate SSmodel driven by these six most important edges as the × D HH model.Given the other parameters we use for the HH model (cf. Tab. 5 in Appendix B), theinput current of I app = 10 nA is slightly beyond the region of multistability associatedwith a subcritical Andronov-Hopf bifurcation. In order to make sure the results arerobust against increases in the applied current, we tried current injections ranging from20 to 100 nA. While injecting larger currents decreased the ISI variance, it did notchange the rank order of the contributions from the most important edges. Fox and Lu’s method was widely used since its appearance (see references in Bruce(2009); Goldwyn and Shea-Brown (2011); Huang et al. (2015)), and the “best” ap-21igure 5: Logarithm of variance of ISI for stochastic shielding under current clamp.Cyan bar is the mean of ISI, and magenta plots 95% conﬁdence interval of the mean ISI(see text for details). The applied current is 10 nA with other parameters speciﬁed inthe Appendices. For the K + kinetics, the largest contribution edge is 7, and 8 is slightlysmaller ranking the second largest. For the Na + kinetics, the largest contribution pairis 19 and 20, with 20 slightly smaller than 19. Moreover, edge 17 and 18 is the secondlargest pair. 22roximation for the underlying Markov Chain (MC) model has been a subject of ongo-ing discussion for decades. Several studies (Mino et al., 2002; Bruce, 2009; Senguptaet al., 2010) attested to discrepancies between Fox’s later approach in (Fox, 1997) andthe discrete-state MC model, raising the question of whether Langevin approximationscould ever accurately represent the underlying ﬂuctuations seen in the “gold standard”MC models. An inﬂuential review paper (Goldwyn and Shea-Brown, 2011) found thatthese discrepancies were due to the way in which noise is added to the stochastic dif-ferential equations (1)-(3). Recent studies including Dangerﬁeld et al. (2010); Linaroet al. (2011); Goldwyn and Shea-Brown (2011); Goldwyn et al. (2011); Dangerﬁeldet al. (2012); Orio and Soudry (2012); G ¨uler (2013b); Huang et al. (2013); Pezo et al.(2014); Huang et al. (2015); Fox (2018) discussed various ways of incorporating chan-nel noise into HH kinetics based on the original work by Fox and Lu (Fox and Lu,1994; Fox, 1997), some of which have the same SDEs but with different boundary con-ditions. Different boundary conditions (BCs) are not expected to have much impact oncomputational efﬁciency. Indeed, if BCs are neglected, the main difference betweenchannel-based (or conductance-based) models is the diffusion matrix S in the Langevineuqations (2) and (3). As the discussion about where and how to incorporate noise intothe HH model framework goes on, Fox (2018) recently asked whether there is a way ofrelating different models with different S matrices. We give a positive answer to thisquestion below.In § × D” Langevin model.

Two Langevin models are pathwise equivalent if the sample paths (trajectories) of onemodel can be made to be identical to the sample paths of the other, under an appropriatechoice of Gaussian white noise samples for each. To make this notion precise, considertwo channel-based Langevin models of the form dX = f ( X ) dt + G ( X ) dW with thesame mean dynamics f ∈ R d and two different d × n matrices (possibly with differentvalues of n and n ), G and G . Denote f : R d → R d , (44) G : R d → R d × n , (45) G : R d → R d × n . (46)Let X ( t ) = [ X ( t ) , X ( t ) , . . . , X d ( t )] (cid:124) and X ∗ ( t ) = [ X ∗ ( t ) , X ∗ ( t ) , . . . , X ∗ d ( t )] (cid:124) betrajectories produced by the two models and let W ( t ) = [ W ( t ) , W ( t ) , . . . , W n ( t )] (cid:124) and W ∗ ( t ) = [ W ∗ ( t ) , W ∗ ( t ) , . . . , W ∗ n ( t )] (cid:124) be vectors of Weiner processes. That is, W i ( t ) , i = 1 , , . . . , n and W ∗ j ( t ) , j = 1 , , . . . , n are independent Wiener processeswith (cid:104) W i ( s ) W j ( t ) (cid:105) = δ ij δ ( t − s ) and (cid:104) W ∗ i ( s ) W ∗ j ( t ) (cid:105) = δ ij δ ( t − s ) . Note that n and n need not be equal. As deﬁned in (Allen et al., 2008), the stochastic differential equation23SDE) models dX = f ( t, X ( t )) dt + G ( t, X ( t )) dW ( t ) (47)and dX ∗ = f ( t, X ∗ ( t )) dt + G ( t, X ∗ ( t )) dW ∗ ( t ) (48)are pathwise equivalent if systems (47) and (48) posses the same probability distribu-tion, and moreover, a sample path solution of one equation is also a sample solutionto the other one. Allen et al. (2008) proved a theorem giving general conditions underwhich the trajectories of two SDEs are equivalent. We follow their construction closelybelow, adapting it to the case of two different Langevin equations for the Hodgkin-Huxley system represented in a 14-dimensional state space.As in §

3, channel-based Langevin models for the stochastic dynamics of HH can bewritten as dX = f ( X ) dt + S ( X ) dW ( t ) (49)where the 14-component random vector X = ( V ; M ; N ) and f ( x ) = (cid:2) dVdt ; d M dt ; d N dt (cid:3) isthe same as the mean-ﬁeld, eqns. (13)-(15). Recall that x = [ v, m , n ] (cid:124) . Here we write S ( x ) =  × m × n S Na ( m ) × n × m S K ( n )  , with S Na : R → R × m , (50)for the Na + channel, and S K : R → R × n , (51)for the K + channel. Here, m is the number of independent white noise forcing termsaffecting the sodium channel variables, while n is the number of independent noisesources affecting the potassium gating variables. We write W ( t ) = [ W ( t ) , W ( t ) , . . . , W m + n ( t )] (cid:124) for a Wiener process incorporating both the sodium and potassium noise forcing. Giventwo channel-based models with diffusion matrices S Na,1 : R → R × m , (52) S Na,2 : R → R × m , (53)for the Na + channel, and S K,1 : R → R × n , (54) S K,2 : R → R × n , (55)for the K + channel, we construct the diffusion matrix D = SS (cid:124) . In order for the twomodels to generate equivalent sample paths, it sufﬁces that they have the same diffusionmatrix, i.e. D = S S (cid:124) =  × × × × D Na × × × D K  = S S (cid:124) . dX = f ( t, X ( t )) dt + S ( t, X ( t )) dW ( t ) , (56) dX ∗ = f ( t, X ∗ ( t )) dt + S ( t, X ∗ ( t )) dW ∗ ( t ) . (57)The probability density function p ( t, x ) for random variable X in eqn. (56) satisﬁesthe Fokker-Planck equation ∂p ( t, x ) ∂t = 12 (cid:88) i =1 8 (cid:88) j =1 ∂ ∂x i x j (cid:104) p ( t, x ) m + n (cid:88) l =1 S ( i,l )1 ( t, x ) S ( j,l )1 ( t, x ) (cid:105) − (cid:88) i =1 ∂∂x i (cid:104) f i ( t, x ) p ( t, x ) (cid:105) = 12 (cid:88) i =1 8 (cid:88) j =1 ∂ ∂x i x j (cid:104) D ( i,j ) ( t, x ) p ( t, x ) (cid:105) − (cid:88) i =1 ∂∂x i (cid:104) f i ( t, x ) p ( t, x ) (cid:105) (58)where S ( i,j )1 ( t, x ) is the ( i, j ) th entry of the diffusion matrix S ( t, x ) . Eqn. (58) holdsbecause D ( i,j ) ( t, x ) = m + n (cid:88) l =1 S ( i,l )1 ( t, x ) S ( j,l )1 ( t, x ) . If z , z ∈ R and z (cid:54) z , then P ( z (cid:54) X ( t ) (cid:54) z ) = (cid:90) z , z , (cid:90) z , z , · · · (cid:90) z , z , p ( t, x ) dx dx · · · dx . Note that (56) and (57) have the same expression (58) for the Fokker-Planck equation,therefore, X and X ∗ possess the same probability density function. In other words, theprobability density function of X in eqn. (49) is invariant for different choices of thediffusion matrix S . We now explicitly construct a mapping between Fox and Lu’s 14D model (Fox and Lu,1994) and any channel-based model (given the same boundary conditions). We beginwith a channel-based Langevin description dX = f ( t, X ( t )) dt + S ( t, X ( t )) dW ( t ) , (59)and Fox and Lu’s model (Fox and Lu, 1994) dX ∗ = f ( t, X ∗ ( t )) dt + S ( t, X ∗ ( t )) dW ∗ ( t ) , (60)where S is a d by m matrix satisfying SS (cid:124) = D (note that S is not necessarily a squarematrix), and S = √D .Let T be the total simulation time of the random process in equations (59) and (60).For (cid:54) t (cid:54) T , denote the singular value decomposition (SVD) of S as S ( t ) = P ( t ) Λ ( t ) Q ( t ) P ( t ) is an d × d orthogonal matrix (i.e., P (cid:124) P = P P (cid:124) = I d ) and Q ( t ) is an m × m orthogonal matrix, and Λ ( t ) is a d × m matrix with rank ( Λ ) = r (cid:54) d positive diagonalentries and d − r zero diagonal entries.First, we prove that given a Wiener trajectory, W ( t ) , t ∈ [0 , T ] and the solutionto eqn. (59), X ( t ) , there exists a Wiener trajectory W ∗ ( t ) such that the solution toeqn. (60), X ∗ , is also a solution to eqn. (59). In other words, for a Wiener process W ( t ) we can construct a W ∗ ( t ) , such that X ∗ ( t ) = X ( t ) , for ≤ t ≤ T .Following (Allen et al., 2008), we construct the vector W ∗ ( t ) of d independentWiener processes as follows: W ∗ ( t ) = (cid:90) t P ( s ) (cid:104)(cid:0) Λ ( s ) Λ (cid:124) ( s ) (cid:1) (cid:105) + Λ ( s ) Q ( s ) dW ( s ) + (cid:90) t P ( s ) dW ∗∗ ( s ) (61)for (cid:54) t (cid:54) T , where W ∗∗ ( t ) is a vector of length d with the ﬁrst r entries equal to 0and the next d − r entries independent Wiener processes, and (cid:104)(cid:0) Λ ( s ) Λ (cid:124) ( s ) (cid:1) (cid:105) + is thepseudoinverse of (cid:0) Λ ( s ) Λ (cid:124) ( s ) (cid:1) . Consider that D ( t ) = S ( t ) S (cid:124) ( t ) = P ( t ) Λ ( t ) Q ( t ) (cid:104) P ( t ) Λ ( t ) Q ( t ) (cid:105) (cid:124) (62) = P ( t ) Λ ( t ) Λ (cid:124) ( t ) P (cid:124) ( t ) (63) = [ S ( t )] , (64)where S ( t ) = P ( t ) (cid:16) Λ ( t ) Λ (cid:124) ( t ) (cid:17) P (cid:124) ( t ) is a square root of D , by construction.The diffusion term on the right side of (60) with X ∗ ( t ) replaced by X ( t ) satisﬁes S ( t, X ( t )) dW ∗ ( t )= S ( t ) (cid:16) P ( t ) (cid:104)(cid:0) Λ ( t ) Λ (cid:124) ( t ) (cid:1) (cid:105) + Λ ( t ) Q ( t ) dW ( t ) + P ( t ) dW ∗∗ ( t ) (cid:17) = P ( t ) (cid:16) Λ ( t ) Λ (cid:124) ( t ) (cid:17) P (cid:124) ( t ) P ( t ) (cid:104)(cid:0) Λ ( t ) Λ (cid:124) ( t ) (cid:1) (cid:105) + Λ ( t ) Q ( t ) dW ( t )+ P ( t ) (cid:16) Λ ( t ) Λ (cid:124) ( t ) (cid:17) P (cid:124) ( t ) P ( t ) dW ∗∗ ( t )= { P ( t ) Λ ( t ) Q ( t ) } dW ( t ) . (65)From the SVD of S =P Λ Q, we conclude that S ( t, X ( t )) dW ∗ ( t ) = S ( t, X ( t )) dW ( t ) . (66)Hence, dX = f ( t, X ( t )) dt + S ( t, X ( t )) dW ∗ t , i.e., X ( t ) is a sample path solution ofequation (60).Similarly, given a Wiener trajectory W ∗ ( t ) and the solution to eqn. (60) X ∗ ( t ) , wecan construct a vector W ( t ) of m independent Winner processes as W ( t ) = (cid:90) t Q (cid:124) ( s ) Λ + ( s ) (cid:2) Λ ( s ) Λ (cid:124) ( s ) (cid:3) / P (cid:124) ( s ) dW ∗ ( s ) + (cid:90) t Q (cid:124) ( s ) dW ∗∗∗ ( s ) (67)26or (cid:54) t (cid:54) T , where W ∗∗∗ ( t ) is a vector of length m with the ﬁrst r entries equal to 0and the next m − r entries independent Wiener processes, and Λ + ( s ) is the pseudoin-verse of Λ ( s ) . Then, by an argument parallel to (65), we conclude that S ( t, X ∗ ( t )) dW ( t ) = S ( t, X ∗ ( t )) dW ∗ ( t ) . (68)Hence, dX ∗ = f ( t, X ∗ ( t )) dt + S ( t, X ∗ ( t )) dW ( t ) , that is, X ∗ ( t ) is also a solution to(59). Therefore we can conclude that the channel-based Langevin model in eqn. (59) ispathwise equivalent to the Fox and Lu’s model.To illustrate pathwise equivalence, Fig. 6 plots trajectories of the × D stochasticHH model and Fox and Lu’s model, using noise traces dictated by the preceding con-struction. In panel A, we generated a sample path for eqn. (59) and plot three variablesin X : the voltage V , Na + channel open probability M and K + channel open probabil-ity N . The corresponding trajectory, X ∗ , for Fox and Lu’s model was generated fromeqn. (60) and the corresponding Wiener trajectory was calculated using eqn. (61). Thetop three subplots in panel A superposed the voltage V ∗ , Na + channel open probability M ∗ and K + channel open probability N ∗ in X ∗ against those in X . The bottom threesubplots in panel A plot the point-wise differences of each variable. Eqns. (59) and (60)are numerically solved in Matlab using the Euler-Maruyama method with a time step dt = 0 . ms. The slight differences observed arise in part due to numerical errors incalculating the singular value decomposition of S (in eqn. (59)); another source of erroris the ﬁnite accuracy of the Euler-Maruyama method.6 As shown in Fig. 6, most differ-ences occur near the spiking region, where the system is numerically very stiff and thenumerical accuracy of the SDE solver accounts for most of the discrepancies (analysisof which is beyond the scope of this paper). We can conclude from the comparison inFig. 6 that the × D Langevin model is pathwise equivalent with the Fox and Lu’smodel. Similarly, the same analogy applies for other channel-based Langevin modelssuch that with the same diffusion matrix D ( X ) .We have shown that our “ × D” model, with a 14-dimensional state space and28 independent noise sources (one for each directed edge) is pathwise equivalent to Foxand Lu’s original 1994 model as well as other channel-based models (under correspond-ing boundary conditions) including (Goldwyn et al., 2011; Goldwyn and Shea-Brown,2011; Orio and Soudry, 2012; Pezo et al., 2014; Fox, 2018). As we shall see in §

5, thepathwise equivalent models give statistically indistinguishable interspike interval dis-tributions under the same BCs. We emphasize the importance of boundary conditionsfor pathwise equivalence. Two simulation algorithms with the same A i and S i matriceswill generally have nonequivalent trajectories if different boundary conditions are im-posed. For example, (Dangerﬁeld et al., 2012) employs the same dynamics as (Orio andSoudry, 2012) away from the boundary, where ion channel state occupancy approacheszero or one. But where the latter allow trajectories to move freely across this boundary(which leads only to small, short-lived excursions into “nonphysical” values), Danger-ﬁeld imposes reﬂecting boundary conditions through a projection step at the boundary.As we will see below ( § The forward Euler method is ﬁrst order accurate for ordinary differential equations, but the forwardEuler-Maruyama method is only O ( √ dt ) accurate for stochastic differential equations (Kloeden andPlaten, 1999). A: Givena sample path of the × D Langevin model in eqn. (59), we construct the noise byeqn. (61) and generate the sample trajectory of Fox and Lu’s model using eqn. (60). B: Given a sample path of Fox and Lu’s model in eqn. 60, we construct the noiseby eqn. (67) and generate the sample trajectory of the × D Langevin model usingeqn. (59). For both cases, we plot the voltage V , the open probability of the Na + channel( M ), and the open probability of the K + channel ( N ). To show that these trajectoriesare pathwise equivalent, we superpose the trajectories for each variable and also plotthe point-wise differences of each. We obtain excellent agreement in both directions.28ally signiﬁcant difference in the ISI distribution, as well as a loss of accuracy whencompared with the “gold standard” Markov chain simulation. In §

3, we studied the contribution of every directed edge to the ISI variability and pro-posed how stochastic shielding could be applied under current clamp. Moreover, in § S matrices or boudary conditions (Fox and Lu, 1994; Goldwynand Shea-Brown, 2011; Dangerﬁeld et al., 2012; Orio and Soudry, 2012; Pezo et al.,2014), the 14D HH model (proposed in § § § L norm of the difference between two distributions (the Wasserstein distance, (Wasser-stein, 1969)), in § L ∞ norm (the Kolmogorov-Smirnovtest, (Kolmogorov, 1933; Smirnov, 1948)), in § × D model) have indistinguishable ISI statistics,while the non-equivalent models (Fox ’97, Dangerﬁeld, Goldwyn & Shea-Brown, our × D stochastic-shielding model) have signiﬁcantly different ISI distributions. Ofthese, the × D SS model is the closest to the models in the × D class, and asfast as any other model considered. L Norm Difference of ISIs

We ﬁrst evaluate the accuracy of different stochastic simulation algorithms by compar-ing their ISI distributions under current clamp to that produced by a reference algorithm,namely the discrete-state Markov Chain (MC) algorithm.Let X , X , . . . , X n be n independent samples of ISIs with a true cumulative distri-bution function F . Let F n ( · ) denote the corresponding empirical cumulative distribu-tion function (ECDF) deﬁned by F n ( x ) = 1 n n (cid:88) i =1 { X i ≤ x } , x ∈ R , (69)where we write A to denote the indicator function for the set A . Let Q and Q M be thequantile functions of F and F M , respectively. The L -Wasserstein distance betweentwo CDF’s F M and F can be written as (Shorack and Wellner, 2009) (page 64) ρ ( F, F M ) = (cid:90) ∞ (cid:12)(cid:12) F ( x ) − F M ( x ) (cid:12)(cid:12) dx = (cid:90) (cid:12)(cid:12) Q ( x ) − Q M ( x ) (cid:12)(cid:12) dx. (70)29ote that ρ has the same units as “ dx ”. Thus the L distances reported in Tab. 3 haveunits of milliseconds.When two models have the same number of samples, n , (70) can be estimated by (cid:90) (cid:12)(cid:12) Q ( x ) − Q M ( x ) (cid:12)(cid:12) dx ≈ n n (cid:88) i =1 | X i − Y i | := ρ ( F n , F Mn ) , (71)where X , · · · , X n and Y , · · · , Y n are n independent samples sorted in ascending orderwith CDF F and F M , respectively.We numerically calculate ρ ( F n , F Mn ) to compare several Langevin models againstthe MC model. We consider the following models: “Fox94” denotes the original modelproposed by (Fox and Lu, 1994), which requires a square root decomposition ( S = √ D ) for each step in the simulation, see equations (1)-(3). “Fox97” is the widely used“subunit model” of Fox (1997), see equations (4)-(5). “Goldwyn” denotes the methodtaken from (Goldwyn and Shea-Brown, 2011), where they restrict the 14D system ( V ,5 K + gates and 8 Na + gates) to the 4D multinomial submanifold ( V, m, n, and h , seep. 8 above), with gating variables truncated to [0 , . We write “Orio” for the modelproposed by (Orio and Soudry, 2012), where they constructed a rectangular matrix S such that SS (cid:124) = D (referred to as S paired in Tab. 3) combining ﬂuctuations drivenby pairs of reciprocal edges, thereby avoiding taking matrix square roots at each timestep. The model “Dangerﬁeld” represents (Dangerﬁeld et al., 2012), which used thesame S matrix as in (Orio and Soudry, 2012) but added a reﬂecting (no-ﬂux) boundarycondition via orthogonal projection (referred to as S EF in Tab. 3). Finally, we includethe × D model we proposed in § S single in Tab. 3);“SS” is the stochastic shielding model speciﬁed in § I app = 10 nA), and initial conditions ﬁxed. Through-out the paper, we presume a ﬁxed channel density of 60 channels /µ m for sodium and18 channels /µ m for potassium in a membrane patch of area µ m , consistent withprior work such as Goldwyn and Shea-Brown (2011); Orio and Soudry (2012). Theinitial condition is taken to be the point on the deterministic limit cycle at which thevoltage crosses upwards through − mV. An initial transient corresponding to 10-15ISIs is discarded, to remove the effects of the initial condition. See Tab. 5 in AppendixB for a complete speciﬁcation of simulation parameters. We compared the efﬁciencyand accuracy of each model through the following steps:1. For each model, a single run simulates a total time of 84000 milliseconds (ms)with time step 0.008 ms, recording at least 5000 ISIs.2. For each model, repeat 10,000 runs in step one.3. Create a reference ISI distribution by aggregating all 10,000 runs of the MCmodel, i.e. based on roughtly × ISIs.4. For each of individual runs, align all ISI data into a single vector and calculatethe ECDF using equation (69).5. Compare the ISI distribution of each model with the reference MC distributionby calculating the L -difference of the ECDFs using equation (71).30. To compare the computational efﬁciency, we take the average execution time ofthe MC model as the reference. The relative computational efﬁciency is the ratioof the average execution time of a model with that of the MC model (c. 3790sec.).Model Variables S Matrix Noise Dim. L Norm (msec.) RuntimeV+M+N Na+K (Wasserstein Dist.) (sec.)MC 1+8+5 n/a 20+8 . e- ± . e- S = √ D . e- ± . e- . e- ± . e- S EF . e- ± . e- S = √ D . e- ± . e- S paired . e- ± . e- × D 1+8+5 S single . e- ± . e- S ss . e- ± . e- L -Wasserstein distances of ISI distributions for Langevintype Hodgkin-Huxley models compared to the MC model. Model (see text for details):MC: Markov-chain. Fox1994: model from Fox and Lu (1994). Fox97: Fox (1997).Goldwyn: Goldwyn and Shea-Brown (2011). Dangerﬁeld: Dangerﬁeld et al. (2010,2012). × D: model proposed in § § S Matrix: Form of the noise coefﬁcient matrixin equations (1)-(3). Noise Dimensions: number of independent Gaussian white-noisesources represented for sodium and potassium, respectively. L Norm: Empiricallyestimated L -Wasserstein distance between the model’s ISI distribution and the MCmodel’s ISI distribution. For MC-vs-MC, independent trials were compared. a ± b :mean ± standard deviation. Runtime (in sec.): see text for details. n/a: not applicable.Table 3 gives the empiricially measured L difference in ISI distribution betweenseveral pairs of models.7 The ﬁrst row (“MC”) gives the average L distance betweenindividual MC simulations and the reference distribution generated by aggregating allMC simulations, in order to give an estimate of the intrinsic variability of the measure.Figure 8 plots the L -Wasserstein differences versus the relative computational efﬁ-ciency of several models against the MC model. These results suggest that the Fox94,Orio, and × D models are statistically indistinguishable, when compared with theMC model using the L -Wasserstein distance. This result is expected in light of ourresults ( §

4) showing that these three models are pathwise-equivalent. (We will makepariwise statistical comparisons between the ISI distributions of each model in § × D and Orio models are signif-icantly faster than the original Fox94 model (and the Goldwyn model) because theyavoid the matrix square root computation. The Dangerﬁeld model has speed similar to7

Runtimes in Tab. 3, rounded to the nearest integer number of seconds, were obtained by averagingthe runtimes on a distribution of heterogeneous compute nodes from Case Western Reserve University’shigh-performance computing cluster. × D model, but the use of reﬂecting boundary conditions introduces signiﬁcantinaccuracy in the ISI distribution. The imposition of truncating boundary conditions inthe Goldwyn model also appears to affect the ISI distribution.Figure 7: The probability density of interspikeintervals (ISIs) for Fox97 (blue) and the MCmodel (red). The probability densities were cal-culated over more than . × ISIs.Of the models considered, theFox97 subunit model is the fastest,however it makes a particularly poorapproximation to the ISI distributionof the MC model. Note that the max-imum L -Wasserstein distance be-tween two distributions is 2. The ISIdistribution of Fox97 subunit modelto that of the MC model is morethan 0.8, which is ten times largerthan the L -Wasserstein distance ofthe SS model, and almost half ofthe maximum distance. As shown inFig. 7, the Fox97 subunit model failsto achieve the spike ﬁring thresholdand produces longer ISIs. Because ofits inaccuracy, we do not include thesubunit model in our remaining com-parisons. The stochastic shielding model, on the other hand, has nearly the same speedas the Fox97 model, but is over 100 times more accurate (in the L sense). The SSmodel is an order of magnitude faster than the × D model, and has less than twicethe L discrepancy versus the MC model ( L norm 76.2 versus 49.3 microseconds).While this difference in accuracy is statistically signiﬁcant, it may not be practicallysigniﬁcant, depending on the application (see § In addition to using the L -Wasserstein distances to test the differences between twoCDFs, we can also make a pairwise comparison between each model by applying theDvoretzky-Kiefer-Wolfowitz inequality (Dvoretzky et al., 1956) and the two-sampleKolmogorov-Smirnov (KS) test (Kolmogorov, 1933; Smirnov, 1948). While the Wasser-stein distance is based on the L norm, the KS statistic is based on the L ∞ (or supre-mum) norm.The Dvoretzky-Kiefer-Wolfowitz inequality (Dvoretzky et al., 1956) establishesconﬁdence bounds for the CDF. Speciﬁcally, the interval that contains the true CDF, F ( · ) , with probability − α , is given by | F n ( x ) − F ( x ) | ≤ ε where ε = (cid:115) ln α n . (72)When comparing samples X M , X M , . . . , X Mn obtained from an approximate model M against the gold standard, in § L difference of the empirical den-32igure 8: The L -Wasserstein distances and relative computational efﬁciency vs. theMC model. “Fox94” (green circle), “Goldwyn” (black cross), “Orio” (cyan square),“14D” (blue star), “SS” (magenta downward pointing triangle), “Dangerﬁeld” (red up-ward pointing triangle), and the “MC” (brown diamond) model. The L error for ISIdistribution was computed using the L -Wasserstein distance (71), with discrete timeGillespie/Monte-Carlo simulations as a reference. The relative computational efﬁciencyis the ratio of the recorded run time to the mean recorded time of the MC mode (3790seconds). The mean and 95% conﬁdence intervals were calculated using 100 repetitionsof , runs each ( × ISIs total). 33ity functions, as an approximation for the L difference of the true distributions. In-stead, we work here with the L ∞ norm, ρ ∞ ( F n , F Mn ) = lim p →∞ (cid:18)(cid:90) ∞ (cid:12)(cid:12) F Mn ( x ) − F n ( x ) (cid:12)(cid:12) p dx (cid:19) /p = sup ≤ x< ∞ (cid:0)(cid:12)(cid:12) F Mn ( x ) − F n ( x ) (cid:12)(cid:12)(cid:1) . (73)For each x ≥ , equation (72) bounds the discrepancy between the true and empiri-cal distribution differences as follows. By the triangle inequality, and independence ofthe X i from the X Mi , the inequality | F M − F | = | F M − F Mn + F n − F + F Mn − F n |≤ | F M − F Mn | + | F n − F | + | F Mn − F n |≤ ε + | F Mn − F n | , (74)holds with probability (1 − α ) . Similarly, | F Mn − F n | = | F Mn − F Mn + F − F n + F M − F |≤ | F M − F Mn | + | F n − F | + | F M − F |≤ ε + | F M − F | (75)also holds with probability (1 − α ) . Together, (74)-(75) indicate that the discrepancybetween the difference of empirical distributions and the difference of true distributionsis bounded as (cid:12)(cid:12)(cid:12) | F M − F | − | F Mn − F n | (cid:12)(cid:12)(cid:12) ≤ ε (76)with probability (1 − α ) , for ε = (cid:113) ln α n .We will use the pointwise difference of the ECDF’s for a large sample as an estimatefor the pointwise difference between two true CDFs. The two-sample Kolmogorov-Smirnov (KS) test (Kolmogorov, 1933; Smirnov, 1948) offers a statistics to test whethertwo samples are from the same distribution. The two-sample KS statistic is D n,m = sup x | F ,n ( x ) − F ,m ( x ) | , (77)where F ,n and F ,m are two ECDFs for two samples deﬁned in (69), and the sup is thesupremum function. The reference statistic, R n,m ( α ) , depending on the signiﬁcancelevel α , is deﬁned as R n,m ( α ) = (cid:114) − log( α/ (cid:114) n + mnm , (78)where n and m are the sample sizes. The null hypothesis that “the two samples comefrom the same distribution” is rejected at the signiﬁcance level α if D n,m > R n,m ( α ) . (79)Figure 9 plots the logarithm of ratio of the two-sample KS statistics, D n,m R n,m (0 . , for“Fox94” (Fox and Lu, 1994), “Goldwyn” (Goldwyn and Shea-Brown, 2011),“Danger-ﬁled” (Dangerﬁeld et al., 2012), “Orio” (Orio and Soudry, 2012), “14D” (the × D34igure 9: Logarithm of the ratio of Kolmogorov-Smirnov test statistic D n,m /R n,m ( α ) ,eqns. (77)-(78), for samples from the ISI distribution for each pair of models. Left:

Box and whisker plots showing mean and 95% conﬁdence intervals based on 10,000pairwise comparisons. The ﬁrst ﬁve plots show self-comparisons (green bars); the re-mainder compare distinct pairs (grey bars). A:“Fox94” (Fox and Lu, 1994), B:“Orio”(Orio and Soudry, 2012), C: “14D” (the × D model we proposed in § Right:

Mean logarithms (as in left panel) for all pairwise comparisons. Fox94, Orioand × D form a block of statistically indistinguishable samples.35odel we proposed in § §

4, namely the“Fox94”, “Orio” and the “14D” model, are not distinguishable at any conﬁdence leveljustiﬁed by our data. Note that those three models use the same boundary conditions(free boundary condition as in Orio and Soudry (2012)) and the ratio D n,m /R n,m ( α ) ofpairwise comparison is on the same magnitude of that for the self-comparisons. How-ever, as pointed out above, these statistically equivalent simulation algorithms have dif-ferent computational efﬁciencies (Fig. 8). Among these methods, Orio and Soundry’salgorithm (14 dimensional state space with 14 undirected edges as noise sources) andour method (14 dimensional state space with 28 directed edges as noise sources) havesimilar efﬁciencies, with Orio’s method being about 5% faster than ours method. Our × D method provides the additional advantage that it facilitates further accelerationunder the stochastic shielding approximation (see § × D and Fox ’94 models, algo-rithms using different boundary conditions are not pathwise equivalent, which is againveriﬁed in Fig. 9. Algorithms with subunit approximation and truncated boundary con-dition (i.e., “Goldwyn”) and the reﬂecting boundary condition (i.e. “Dangerﬁeld”) aresigniﬁcantly different in accuracy (and in particular, they are less accurate) than modelsin the × D class.

The exact method for Markov Chain (MC) simulation for an electrotonically compact(single compartment) conductance-based stochastic model under current clamp is a hy-brid discrete (channel state) / continuous (voltage) model of the sort used by (Clayand DeFelice, 1983; Newby et al., 2013; Anderson et al., 2015). While MC methodsare computationally expensive, simulations based on Gaussian/Langevin approximationcan capture the effects of stochastic ion channel ﬂuctuations with reasonable accuracyand excellent computational efﬁciency. Since Goldwyn and Shea Brown’s work fo-cusing the attention of the computational neuroscience community on Fox and Lu’sLangevin algorithm for the Hodgkin-Huxley system (Fox and Lu, 1994; Goldwyn andShea-Brown, 2011), several variants of this approach have appeared in the literature.In the present paper we advocate for a class of models combining the best features ofconductance-based Langevin models with the recently developed stochastic shieldingapproximation (Schmandt and Gal´an, 2012; Schmidt and Thomas, 2014; Schmidt et al.,2018). We propose a Langevin model with a 14-dimensional state space, representingthe voltage, ﬁve states of the K + channel, and eight states of the Na + -channel; and a28-dimensional representation of the driving noise: one independent Gaussian noiseterm for each directed edge in the channel-state transition graph. We showed in § × Dmodel, with independent noise sources corresponding to each ion channel transition( § § × D modelis pathwise equivalent both to Fox and Lu’s original Langevin model, and to A 14-statemodel with 14 independent noise sources due to (Orio and Soudry, 2012).The original 4D HH model, the 14D deterministic HH model, and the family ofequivalent 14D Langevin models we consider here, form a nested family, each con-tained within the next. Speciﬁcally, (i) the 14D ODE model is the “mean-ﬁeld” versionof the 14D Langevin model, and (ii) the 4D ODE model forms an attracting invariantsubmanifold within the 14D ODE model, as we establish in our Lemma 2. So in a veryspeciﬁc sense, the original HH equations “live inside” the 14D Langevin equations.Thus these three models enjoy a special relationship. In contrast, the 4D Langevinequations studied in Fox (1997) do not bear an especially close relationship to the otherthree.In addition to rigorous mathematical analysis we also performed numerical com-parisons ( §

5) showing that, as expected, the pathwise equivalent models produced sta-tistically indistinguishable interspike interval (ISI) distributions. Moreover, the ISI dis-tributions for our model (and its equivalents) were closer to the ISI distribution of the“gold standard” MC model under two different metric space measures. Our method(along with Orio and Soudry’s) proved computationally more efﬁcient than Fox andLu’s original method and Dangerﬁeld’s model (Dangerﬁeld et al., 2012). In addition,our method lends itself naturally to model reduction (via the stochastic shielding ap-proximation) to a signiﬁcantly faster × D simulation that preserves a surprisinglyhigh level of accuracy.

The discrete-state Markov Chain algorithm due to Gillespie is often taken to be the goldstandard simulation for a single-compartment stochastic conductance-based model. Mostformer literature on Langevin HH models, such as (Goldwyn and Shea-Brown, 2011;Linaro et al., 2011; Orio and Soudry, 2012; Huang et al., 2013), when establishing a ref-erence MC model, consider a version of the discrete Gillespie algorithm that assumesa piecewise-constant propensity approximation, i.e. that does not take into account thatthe voltage changes between transitions, which changes the transition rates. This ap-proximation can lead to biophysically unrealistic voltage traces for very small systemsizes (cf. Fig. 2 of (Kispersky and White, 2008), top trace with N = 1 ion channel)although the differences appear to be mitigated for N (cid:38) channels (Anderson et al.,2015). In the present paper, our MC simulations are based on 6000 Na + and 1800K + channels (as in Goldwyn and Shea-Brown (2011)), and we too use the ISI dis-tribution generated by a piecewise-constant propensity MC algorithm as our referencedistribution. As shown in Tab. 3 and Fig. 8, the computation time for MC is one order ofmagnitude larger than efﬁcient methods such as (Orio and Soudry, 2012), (Dangerﬁeldet al., 2012) and the × D model. The computational cost of the MC model increasesdramatically as the number of ion channels grows, therefore, even the approximate MCalgorithm is inapplicable for a large number of channels.37 .3 Langevin Models

It is worth pointing out that the accuracy of Fox and Lu’s original Langevin equationshas not been fully appreciated. In fact, Fox and Lu’s model (Fox and Lu, 1994) gives anapproximation to the MC model that is just as accurate as (Orio and Soudry, 2012) bothin the gating variable statistics (Goldwyn and Shea-Brown, 2011) and also in the ISIdistribution sense (see §

5) – because as we have established, these models are pathwiseequivalent! However, the original implementation requires taking a matrix square rootin every timestep in the numerical simulation, which signiﬁcantly reduces its computa-tional efﬁciency.Models based on modiﬁcations of (Fox and Lu, 1994)’s work can be divided intothree classes: the subunit model (Fox, 1997); effective noise models (Linaro et al.,2011; G ¨uler, 2013b), and channel-based Langevin models such as (Goldwyn and Shea-Brown, 2011; Dangerﬁeld et al., 2012; Orio and Soudry, 2012; Pezo et al., 2014; Huanget al., 2013).

Subunit model

The ﬁrst modiﬁcation of the Fox and Lu’s model is the subunit model(Fox, 1997), which keeps the original form of the HH model, and adds noise to thegating variables ( m, h, and n ) (Fox, 1997; Goldwyn and Shea-Brown, 2011). Thesubunit approximation model was widely used because of its fast computational speed.However, as (Bruce, 2009) and others pointed out, the inaccuracy of this model remainssigniﬁcant even for large number of channels. Moreover, (Goldwyn and Shea-Brown,2011) and (Huang et al., 2015) found that the subunit model fails to capture the statisticsof the HH Na + and K + gates. In this paper, we also observed that the subunit model ismore efﬁcient than channel-based Langevin models, but tends to delay spike generation.As shown in Fig. 7, the subunit model generates signiﬁcantly longer ISIs than the MCmodel. Effective noise models

Another modiﬁcation to Fox and Lu’s algorithm is to addcolored noise to the channel open fractions. Though colored noise models such as(Linaro et al., 2011; G¨uler, 2013b) are not included in our model comparison, (Huanget al., 2015) found that both these effective noise models generate shorter ISIs than theMC model with the same parameters. Though the comparison we provided in § × Dmodel, combining the results from (Goldwyn and Shea-Brown, 2011) and (Huang et al.,2015), the × D model could be compared to a variety of models including (Foxand Lu, 1994; Fox, 1997; Goldwyn and Shea-Brown, 2011; Linaro et al., 2011; Orioand Soudry, 2012; Dangerﬁeld et al., 2012; Huang et al., 2013; G¨uler, 2013b).

Channel-based Langevin models

The main focus of this paper is the modiﬁcationbased on the original Fox and Lu’s matrix decomposition method, namely, the channel-based (or conductance-based) Langevin models. We proved in § × Dmodel are pathwise equivalent, which was also veriﬁed from our numerical simulationsin § §

5. In §

4, we discussed channel-based Langevin models including (Fox andLu, 1994; Goldwyn and Shea-Brown, 2011; Dangerﬁeld et al., 2012; Orio and Soudry,38012; Fox, 2018). We excluded Fox’s more recent implementation (Fox, 2018) in § S matrix from theCholesky decomposition in (Fox, 2018) involve square roots of differences of severalquantities, with no guarantee that the differences will result in nonnegative terms – evenwith strictly positive value of the gating variables. Nevertheless, this model would be inthe equivalence class and in any case would not be more efﬁcient than the Orio’s model,because of the noise dimension and complicated operations (involving taking multiplesquare roots) in each step. If two random variables have similar distributions, then they will have similar moments,but not vice-versa. Therefore, comparison of the full interspike-interval distributionsproduced by different simulation algorithms gives a more rigorous test than compar-ison of ﬁrst and second moments of the ISI distribution. Most previous evaluationsof competing Langevin approximations were based on the accuracy of low-order mo-ments, for example the mean and variance of channel state occupancy under voltageclamp, or the mean and variance of the interspike interval distribution under currentclamp (Goldwyn et al., 2011; Goldwyn and Shea-Brown, 2011; Schmandt and Gal´an,2012; Linaro et al., 2011; Dangerﬁeld et al., 2012; Orio and Soudry, 2012; Huang et al.,2013, 2015). Here, we compare the accuracy of the different algorithms using the fullISI distribution, but using the L norm of the difference (Wasserstein metric) and the L ∞ norm (Kolmogorov-Smirnov test). Greenwood et al. (2015) previously comparedthe ISI distributions generated by the Markov chain (Gillespie algorithm) to the dis-tribution generated by different types of Langevin approximations (LA), including theoriginal Langevin models (Fox and Lu, 1994; Goldwyn and Shea-Brown, 2011), thechannel-based LA with colored noise (Linaro et al., 2011; G¨uler, 2013b), and LA witha × variant of the diffusion coefﬁcient matrix S (Orio and Soudry, 2012). Theyconcluded that Orio and Soudry’s method provided the best match to the Markov chainmodel, speciﬁcally “Fox-Goldwyn, and Orio-Kurtz8 methods both generate ISI his-tograms very close to those of Micro9” (Greenwood et al., 2015). We note that thecomparison reported in this paper simply superimposed plots of the ISI distributions,allowing a qualitative comparison, while our metric-space analysis is fully quantitative.In any case, their conclusions are consistent with our ﬁndings; we showed in § We refer to this model as to as “Orio” This is the model we refer to as the Markov-chain model. .5 Stochastic Shielding Method The stochastic shielding (SS) approximation (Schmandt and Gal´an, 2012) provides anefﬁcient and accurate method for approximating the Markov process with only a subsetof observable states. For conductance-based models, rather than aggregating ion chan-nel states, SS effects dimension reduction by selectively eliminating those independentnoise sources that have the least impact on current ﬂuctuations. Recent work in (Pezoet al., 2014) compared previous methods such as (Gillespie, 1977; Orio and Soudry,2012; Dangerﬁeld et al., 2012; Huang et al., 2013; Schmandt and Gal´an, 2012) in accu-racy, applicability and simplicity as well as computational efﬁciency. They concludedthat for mesoscopic numbers of channels, stochastic shielding methods combined withdiffusion approximation methods can be an optimal choice. Like (Orio and Soudry,2012), the stochastic shielding method proposed by (Pezo et al., 2014) also assumeddetailed balance of transitions between adjacent states and used edges that are directlyconnected to the open gates of HH Na + and K + . We calculated the edge importancein § + gates are not the four edges directly connected to the conducting state, as assumedin previous application of the SS method (Schmandt and Gal´an, 2012). Among all modiﬁcations of Fox and Lu’s method considered here, Orio and Soudry’sapproach, and our × D model provide the best approximation to the “gold stan-dard” MC model, with the greatest computational efﬁciency. Several earlier modelswere studied in the review paper by (Goldwyn and Shea-Brown, 2011), where theyrediscovered that the original Langevin model proposed by Fox and Lu is the best ap-proximation to the MC model among those considered. Later work (Huang et al., 2015)further surveyed a wide range of Langevin approximations for the HH system including(Fox and Lu, 1994; Fox, 1997; Goldwyn and Shea-Brown, 2011; Linaro et al., 2011;G¨uler, 2013b; Orio and Soudry, 2012; Huang et al., 2013) and explored models withdifferent boundary conditions. Huang et al. (2015) concluded that the bounded andtruncated-restored Langevin model (Huang et al., 2013) and the unbounded (Orio andSoudry, 2012)’s model provide the best approximation to the MC model.As shown in § §

5, the × D Langevin model naturally derived from thechannel structure is pathwise equivalent to the Fox and Lu ‘94, Fox ‘18, and Orio-Soudry models under the same boundary conditions. The × D model is moreaccurate than the reﬂecting boundary condition method of (Dangerﬁeld et al., 2012),and also better than the approximation method proposed by (Goldwyn and Shea-Brown,2011), when the entire ISI distribution is taken into account. We note that (Huang et al.,2015) treated Goldwyn’s method (Goldwyn and Shea-Brown, 2011) as the original Foxand Lu model in their comparison, however, the simulation in (Goldwyn and Shea-Brown, 2011) uses the D multinomial submanifold to update gating variables. Ouranalysis and numerical simulations suggest that the original Fox and Lu model is indeedas accurate as the Orio-Soudry model, while the computational cost still remains a majorconcern.Though the × D model has similar efﬁciency and accuracy with Orio and40oudry (2012), it has several advantages. First, the rectangular S matrix (in eqn. (2)-(3))in Orio’s model merges the noise contributions of reciprocal pairs of edges. However,this dimension reduction assumes, in effect, that detailed balance holds along reciprocaledges, which our results show is not the case, under current clamp (Fig. 5). Moreover,the × D model arises naturally from the individual transitions of the exact evolu-tion equations (eqn.(32)-(33)) for the underlying Markov Chain model, which makes itconceptually easier to understand. In addition, our method for deﬁning the × DLangevin model and ﬁnding the best SS model extends to channel-based models witharbitrary channel gating schemes beyond the standard HH model. Given any channelstate transition graph, the Langevin equations may be read off from the transitions, andthe edge importance under current clamp can be evaluated by applying the stochasticshielding method to investigate the contributions of noise from each individual directededge. Finally, in exchange for a small reduction in accuracy, the stochastic shieldingmethod affords a signiﬁcant gain in efﬁciency. The × D model thus offers a natu-ral way to quantify the contributions of the microscopic transitions to the macroscopicvoltage ﬂuctuations in the membrane through the use of stochastic shielding. For gen-eral ion channel models, extending a biophysically-based Langevin model analogous toour × D HH model, together with the stochastic shielding method, may provide thebest available tool for investigating how unobservable microscopic behaviors (such asion channel ﬂuctuations) affect the macroscopic variability in many biological systems.

All Langevin models, including our proposed × D model, proceed from the as-sumption that the ion channel population is large enough (and the ion channel statetransitions frequent enough) that the Gaussian approximations by which the white noiseforcing terms are derived, are justiﬁed. Thus when the system size is too small, noLangevin system will be an appropriate. Fortunately the Langevin approximation ap-pears quite accurate for realistic population sizes.The × D model uses more noise sources than other approaches. However,stochastic shielding allows us to jettison noise sources that do not signiﬁcantly impactthe system dynamics (the voltage ﬂuctuations and ISI distribution). Moreover, in or-der to compare the ISI distribution in detail among several variants of the Fox and Lu’94 model versus the Markov Chain standard, we have considered a single value of thedriving current, while other studies have compared parametrized responses such as theﬁring rate, ISI variance, or other moments, as a function of applied current. Accuratecomparisons require large ensembles of independent trajectories, forcing a tradeoff be-tween precision and breadth; we opted here for precise comparisons at a representativelevel of the driving current.From a conceptual point of view, a shortcoming of most Langevin models is thetendency for some channel state variables x to collide with the domain boundaries x ∈ [0 , and to cross them during numerical simulations with ﬁnite time steps. We adoptedthe approach advocated by Orio and Soudry (2012) of using “free boundaries” in whichgating variables can make excursions into the (unphysical) range x < or x > .Practically speaking, these excursions are always short, if the time step is reasonably41mall, as they tend to be self-correcting.10 Another approach is to construct reﬂectingboundary conditions; different implementations of this idea were used in Dangerﬁeldet al. (2010), Fox and Lu (1994), and Schmid et al. (2001). Dangerﬁeld’s methodproved both slower and less accurate than the free boundary method. As an alternativemethod, one uses a biased rejection sampling approach, testing each gating variableof the 14D model on each time step, and repeating the noise sample for any time stepviolating the domain conditions Fox and Lu (1994); Schmid et al. (2001). We foundthat this method had accuracy similar to that of Dangerﬁeld’s method ( L -Wassersteindifference ≈ . e- msec, cf. Tab. 3) and runtime similar to that of the Fox and Lu 94implementation, about 4 times slower than our 14D Langevin model.Yet another approach that in principle can guarantee the stochastic process remainswithin proscribed bounds, rederives a “best diffusional approximation” Fokker-Planckequation based on matching a master-equation birth-and-death description of a (bino-mial) population of two-state ion channels, leading to modiﬁed drift and diffusion terms(Goychuk, 2014). This method does not appear to extend readily to the 14D setting,with underlying multinomial structure of the ion channel gates, so we do not dwell onit further.Table 3 gives the accuracies with which each model reproduces the ISI distribu-tion, compared to a standard reference distribution generated through a large numberof samples of the MC method. The mean L difference between a single sample andthe reference sample is about 0.227 microseconds. For a nonnegative random variable T ≥ , the difference in the mean under two probability distributions is bounded aboveby the L difference in their cumulative distribution functions.11 Thus the L normgives an idea of the temporal accuracy with which one can approximate a given dis-tribution by another. The mean difference between the ISI distribution generated by asingle run of the full × D model is about 49 µ sec, and the discrepancy producedby the (signiﬁcantly faster) SS model is about 76 µ sec. When would this level of ac-curacy matter for the function of a neuron within a network? The barn owl Tyto alba uses interaural time difference to localize its prey to within 1-2 degrees, a feat that re-quires encoding information in the precise timing of auditory system action potentialsat the scale of 5-20 microseconds (Moiseff and Konishi, 1981; Gerstner et al., 1996).For detailed studies of the effects of channel noise in this system, the superior accuracyof the MC model might be preferred. On the other hand, the timescale of informationencoding in the human auditory nerve is thought to be in the millisecond range (Gold-wyn et al., 2010); with precision in the feline auditory system as reported as low as 100 µ s (Imennov and Rubinstein, 2009), see also (Woo et al., 2009). For these and othermammalian systems, the stochastic shielding approximation should provide sufﬁcientaccuracy.10 To avoid complex entries, we use | x | when calculating entries in the noise coefﬁcient matrix. For a nonnegative random variable T with cumulative distribution function F ( t ) = P [ T ≤ t ] ,the mean satisﬁes E [ T ] = (cid:82) ∞ (1 − F ( t )) dt (Grimmett and Stirzaker, 2005). Therefore the differ-ence in mean under two distributions F and F satisﬁes | E [ T ] − E [ T ] | = (cid:12)(cid:12)(cid:82) ∞ F ( t ) − F ( t ) dt (cid:12)(cid:12) ≤ ρ ( F , F ) . .8 Future Work In § Acknowledgments

The authors thank Dr. David Friel (Case Western Reserve University) for illuminatingdiscussions of the impact of channel noise on neural dynamics, and the anonymousreferees for many detailed and constructive comments.This work was made possible in part by grants from the National Science Founda-tion (DMS-1413770 and DEB-1654989) and the Simons Foundation. PT thanks theOberlin College Department of Mathematics for research support. This research hasbeen supported in part by the Mathematical Biosciences Institute and the National Sci-ence Foundation under grant DMS-1440386. Large-scale simulations made use of theHigh Performance Computing Resource in the Core Facility for Advanced ResearchComputing at Case Western Reserve University.43 ppendices

A Table of Common Symbols and Notations

Symbol Meaning C Membrane capacitance ( µF/cm ) v Membrane potential ( mV ) V Na , V K , V L Ionic reversal potential for Na + , K + and leak ( mV ) I app Applied current to the membrane ( nA/cm ) m, h, n Dimensionless gating variables for Na + and K + channels α x , β x , x ∈ { m, n, h } Voltage dependent rate constant ( /msec ) x vector of state variablesM = [ M , M , · · · , M ] Eight-component state vector for the Na + gates [ m , m , m , m , m , m , m , m ] (cid:124) Components for the Na + gatesN = [ N , N , · · · , N ] Five-component state vector for the K + gates [ n , n , n , n , n ] (cid:124) Components for the K + gates M tot , N tot Total number of Na + and K + channels X Y ∆ k k -dimensional simplex in R k +1 given by y + . . . + y k +1 = 1 , y i ≥ M Multinomial submanifold within the 14D space A Na , A K State-dependent rate matrix D State diffusion matrix S , S , S , S Na , S K Noise coefﬁcient matrices ξ Vector of independent δ -correlated Gaussian whitenoise with zero mean and unit variance X = [ X , X , . . . , X d ] A d-dimensional random variable for sample path W = [ W , W , . . . , W n ] A Wiener trajectory with n components δ ( · ) The Dirac delta function δ ij The Kronecker delta F n Empirical cumulative distribution function with n observations (in §

5, we use m, n as sample sizes)Table 4: Common symbols and notations in this paper.44ymbol Meaning Value C Membrane capacitance 1 µF/cm ¯ g Na Maximal sodium conductance 120 µS/cm ¯ g K Maximal potassium conductance 36 µS/cm g leak Leak conductance 0.3 µS/cm V Na Sodium reversal potential for Na + mVV K Potassium reversal potential for K + -77 mVV leak Leak reversal potential -54.4 mVI app

Applied current to the membrane 10 nA/cm A Membrane Area µ m M tot Total number of Na + channels 6,000 N tot Total number ofK + channels 18,00Table 5: Parameters used for simulations in this paper. B Parameters and Transition Matrices

Subunit kinetics for the Hodgkin and Huxley equations are given by α m ( v ) = 0 . − v )exp(2 . − . v ) − (80) β m ( v ) = 4 exp( − v/ (81) α h ( v ) = 0 .

07 exp( − v/ (82) β h ( v ) = 1exp(3 − . v ) + 1 (83) α n ( v ) = 0 . − v )exp(1 − . v ) − (84) β n ( v ) = 0 .

125 exp( − v/ (85) A K ( v ) =  D K (1) β n ( v ) 0 0 04 α n ( v ) D K (2) 2 β n ( v ) 0 00 3 α n ( v ) D K (3) 3 β n ( v ) 00 0 2 α n ( v ) D K (4) 4 β n ( v )0 0 0 α n ( v ) D K (5)  ,A Na =  D Na (1) β m β h α m D Na (2) 2 β m β h α m D Na (3) 3 β m β h

00 0 α m D Na (4) 0 0 0 β h α h D Na (5) β m α h α m D Na (6) 2 β m

00 0 α h α m D Na (7) 3 β m α h α m D Na (8)  , D ion ( i ) = − (cid:88) j : j (cid:54) = i A ion ( j, i ) , ion ∈ { Na , K } . C Proof of Lemma 2

For the reader’s convenience we restate

Lemma 2.

The map from the 4D HH model to the 14D HH model is given in Tab. 2 as { H : x → y | x ∈ X , y ∈ Y} , and the map from the 14D HH model to the 4D HHmodel is given in Tab. 1 as { R : y → x | x ∈ X , y ∈ Y} . The partial derivatives ∂H∂x ofthe map H are given by dm dm = − − m ) (1 − h ) dm dh = − (1 − m ) dm dm = 3(1 − h )(3 m − m + 1) dm dh = − − m ) mdm dm = 3(1 − h )(2 m − m ) dm dh = − − m ) m dm dm = 3(1 − h ) m dm dh = − m dm dm = − h (1 − m ) dm dh = (1 − m ) dm dm = 3 h (3 m − m + 1) dm dh = 3(1 − m ) mdm dm = 3 h (2 m − m ) dm dh = 3(1 − m ) m dm dm = 3 hm dm dh = m .

46e can write ∂H/∂x in matrix form as: ∂H∂x =  − − m ) (1 − h ) − (1 − m )

00 3(1 − h )(3 m − m + 1) − − m ) m

00 3(1 − h )(2 m − m ) − − m ) m

00 3(1 − h ) m − m − h (1 − m ) (1 − m )

00 3 h (3 m − m + 1) 3(1 − m ) m

00 3 h (2 m − m ) 3(1 − m ) m

00 3 hm m

00 0 0 − − n ) − n ) (1 − n )0 0 0 12 n (1 − n )(1 − n )0 0 0 4 n (3 − n )0 0 0 4 n  . We write out the vector ﬁelds (14) and (15) component by component: d M dt = β m M + β h M − (3 α m + α h ) M = − − m ) (1 − h ) [(1 − m ) α m − mβ m ] + (1 − m ) [ hβ h − (1 − h ) α h ] d M dt = 3 α m M + 2 β m M + β h M − (2 α m + β m + α h ) M = 3(1 − h )(3 m − m + 1) [(1 − m ) α m − mβ m ] + 3(1 − m ) m [ hβ h − (1 − h ) α h ] d M dt = 2 α m M + 3 β m M + β h M − ( α m + 2 β m + α h ) M , = 3(1 − h )(2 m − m ) [(1 − m ) α m − mβ m ] + 3(1 − m ) m [ hβ h − (1 − h ) α h ] d M dt = α m M + β h M − (3 β m + α h ) M , = 3(1 − h ) m [(1 − m ) α m − mβ m ] + m [ hβ h − (1 − h ) α h ] d M dt = β m M + α h M − (3 α m + β h ) M , = − h (1 − m ) [(1 − m ) α m − mβ m ] + (1 − m ) [ hβ h − (1 − h ) α h ] d M dt = 3 α m M + 2 β m M + α h M − (2 α m + β m + β h ) M , = 3 h (3 m − m + 1) [(1 − m ) α m − mβ m ] − − m ) m [ hβ h − (1 − h ) α h ] d M dt = 2 α m M + 3 β m M + α h M − ( α m + 2 β m + β h ) M , = 3 h (2 m − m ) [(1 − m ) α m − mβ m ] − − m ) m [ hβ h − (1 − h ) α h ] d M dt = α m M + α h M − (3 β m + β h ) M , = 3 hm [(1 − m ) α m − mβ m ] − m [ hβ h − (1 − h ) α h ] d N dt = β n N − α n N = − − n ) [ α n (1 − n ) − nβ n ] ,d N dt = 4 α n N + 2 β n N − (3 α n + β n ) N = 4(1 − n ) (1 − n )[ α n (1 − n ) − nβ n ] , N dt = 3 α n N + 3 β n N − (2 α n + 2 β n ) N = 12 n (1 − n )(1 − n )[ α n (1 − n ) − nβ n ] ,d N dt = 2 α n N + 4 β n N − (3 α n + 3 β n ) N = 4 n (3 − n )[ α n (1 − n ) − nβ n ] ,d N dt = α n N − β n N = 4 n [ α n (1 − n ) − nβ n ] . By extracting common factors from the previous expressions it is clear that G ( y ) maybe written, for all y ∈ Y , as G ( y ) = − ¯ g Na M ( V − V Na ) − ¯ g K N ( V − V K ) − g L ( V − V L ) + I app C (cid:26) ∂H∂v ( R ( y )) (cid:27) + [(1 − m (cid:48) ) α m − m (cid:48) β m ] (cid:26) ∂H∂m ( R ( y )) (cid:27) − [ h (cid:48) β h − (1 − h (cid:48) ) α h ] (cid:26) ∂H∂h ( R ( y )) (cid:27) + [ α n (1 − n (cid:48) ) − n (cid:48) β n ] (cid:26) ∂H∂n ( R ( y )) (cid:27) (86)where m (cid:48) = ( M + M ) / M + M ) / M + M ) , h (cid:48) = M + M + M + M and n (cid:48) = N / N / N / N . Thus G ( y ) is in the span of the column vectors ∂H/∂v , ∂H/∂m , ∂H/∂n , and ∂H/∂h , as was to be shown.On the other hand, the vector ﬁeld for the 4D HH ODE (7-10) is given by F =  ( − ¯ g Na m h ( V − V Na ) − ¯ g K n ( V − V K ) − g L ( V − V L ) + I app ) /Cα m ( V )(1 − m ) − β m ( V ) mα h ( V )(1 − h ) − β h ( V ) hα n ( V )(1 − n ) − β n ( V ) n  . Referring to (86), we see that G ( y ) = D x H ( R ( y )) F ( R ( y )) . Thus we complete theproof of Lemma 2. D Diffusion Matrix of the 14D Model

As deﬁned in equations (11) and (12) M = [ m , m , m , m , m , m , m , m ] (cid:124) , (87) N = [ n , n , n , n , n ] (cid:124) , (88)the diffusion matrices D Na and D K are given by48 K =  D K (1 , − α n n − β n n − α n n − β n n D K (2 ,

2) 3 α n n − β n n − α n n − β n n D K (3 , − α n n − β n n · · ·· · · − α n n − β n n D K (4 , − α n n − β n n − α n n − β n n D K (5 ,  , D (1:4) Na =  D Na (1 , − α m m − β m m − α m m − β m m D Na (2 , − α m m − β m m − α m m − β m m D Na (3 , − α m m − β m m − α m m − β m m D Na (4 , − α h m − β h m − α h m − β h m − α h m − β h m  ,D (5:8) Na =  − α h m − β h m − α h m − β h m − α h m − β h m

00 0 0 − α h m − β h m D Na (5 , − α m m − β m m − α m m − β m m D Na (6 , − α m m − β m m − α m m − β m m D Na (7 , − α m m − β m m  , where D ion ( i, i ) = − (cid:88) j : j (cid:54) = i D ion ( j, i ) , for ion ∈ { Na , K } . S K and S Na for the × D HH model are given by S (1:5) Na =  −√ α h m √ β h m −√ α m m √ β m m

00 0 √ α m m −√ β m m −√ α h m √ α h m −√ β h m −√ β h m √ α h m −√ β h m  S (6:10) Na =  √ β h m −√ α m m √ β m m √ α m m −√ β m m −√ α h m √ β h m −√ β h m √ α h m −√ β h m  S (11:15) Na =  −√ α m m √ β m m √ α m m −√ β m m −√ α h m √ β h m

00 0 0 000 0 0 0 √ α m m √ α h m −√ β h m  S (16:20) Na =  −√ α m m √ β m m −√ β m m −√ α m m √ β m m √ α m m −√ β m m −√ α m m √ β m m √ α m m −√ β m m  , where S ( i : j ) Na is the i th -j th column of S Na , and S K =  −√ α n n √ β n n √ α n n −√ β n n −√ α n n √ β n n √ α n n −√ β n n · · ·· · · √ α n n −√ β n n −√ α n n √ β n n √ α n n −√ β n n  . eferences Allen, E. J., Allen, L. J., Arciniega, A., and Greenwood, P. E. (2008). Construction ofequivalent stochastic differential equation models.

Stochastic Analysis and Applica-tions , 26(2):274–297.Anderson, D. F., Ermentrout, B., and Thomas, P. J. (2015). Stochastic representations ofion channel kinetics and exact stochastic simulation of neuronal dynamics.

Journalof Computational Neuroscience , 38(1):67–82.Anderson, D. F. and Kurtz, T. G. (2015).

Stochastic Analysis of Biochemical Systems ,volume 1. Springer.Bruce, I. C. (2009). Evaluation of stochastic differential equation approximation of ionchannel gating models.

Annals of Biomedical Engineering , 37(4):824–838.Clay, J. R. and DeFelice, L. J. (1983). Relationship between membrane excitability andsingle channel open-close kinetics.

Biophys J , 42(2):151–7.Colquhoun, D. and Hawkes, A. (1981). On the stochastic properties of single ion chan-nels.

Proc. R. Soc. Lond. B , 211(1183):205–235.Dangerﬁeld, C., Kay, D., and Burrage, K. (2010). Stochastic models and simulation ofion channel dynamics.

Procedia Computer Science , 1(1):1587–1596.Dangerﬁeld, C. E., Kay, D., and Burrage, K. (2012). Modeling ion channel dynamicsthrough reﬂected stochastic differential equations.

Physical Review E , 85(5):051907.Dayan, P. and Abbott, L. F. (2001).

Theoretical neuroscience: computational and math-ematical modeling of neural systems . Computational Neuroscience. The MIT Press.Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956). Asymptotic minimax characterof the sample distribution function and of the classical multinomial estimator.

TheAnnals of Mathematical Statistics , 27(3):642–669.Earnshaw, B. and Keener, J. (2010a). Global asymptotic stability of solutions of nonau-tonomous master equations.

SIAM Journal on Applied Dynamical Systems , 9(1):220–237.Earnshaw, B. and Keener, J. (2010b). Invariant manifolds of binomial-like nonau-tonomous master equations.

SIAM Journal on Applied Dynamical Systems , 9(2):568–588.Faisal, A. A. and Laughlin, S. B. (2007). Stochastic simulations on the reliability ofaction potential propagation in thin axons.

PLoS Computational Biology , 3(5).Fox and Lu (1994). Emergent collective behavior in large numbers of globally coupledindependently stochastic ion channels.

Phys Rev E Stat Phys Plasmas Fluids RelatInterdiscip Topics , 49(4):3421–3431. 51ox, R. F. (1997). Stochastic versions of the Hodgkin-Huxley equations.

BiophysicalJournal , 72(5):2068–2074.Fox, R. F. (2018). Critique of the Fox-Lu model for Hodgkin-Huxley ﬂuctuations inneuron ion channels. arXiv preprint arXiv:1807.07632 .Gadgil, C., Lee, C. H., and Othmer, H. G. (2005). A stochastic analysis of ﬁrst-orderreaction networks.

Bulletin of Mathematical Biology , 67.Gerstner, W., Kempter, R., Van Hemmen, J. L., and Wagner, H. (1996). A neuronallearning rule for sub-millisecond temporal coding.

Nature , 383(6595):76–78.Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions.

J.Phys. Chem. , 81:2340–2361.Gillespie, D. T. (2007). Stochastic simulation of chemical kinetics.

Annu. Rev. Phys.Chem. , 58:35–55.Goldwyn, J. H., Imennov, N. S., Famulare, M., and Shea-Brown, E. (2011). Stochas-tic differential equation models for ion channel noise in Hodgkin-Huxley neurons.

Physical Review E , 83(4):041908.Goldwyn, J. H. and Shea-Brown, E. (2011). The what and where of adding channelnoise to the Hodgkin-Huxley equations.

PLoS Comput Biol , 7(11):e1002247.Goldwyn, J. H., Shea-Brown, E., and Rubinstein, J. T. (2010). Encoding and decodingamplitude-modulated cochlear implant stimulia point process analysis.

Journal ofComputational Neuroscience , 28(3):405–424.Goychuk, I. (2014). Stochastic modeling of excitable dynamics: improved Langevinmodel for mesoscopic channel noise. In

International Conference on Nonlinear Dy-namics of Electronic Systems , pages 325–332. Springer.Greenwood, P. E., McDonnell, M. D., and Ward, L. M. (2015). Dynamics of gammabursts in local ﬁeld potentials.

Neural Computation , 27(1):74–103.Grimmett, G. and Stirzaker, D. (2005).

Probability and Random Processes . OxfordUniversity Press, third edition.G¨uler, M. (2013a). An investigation of the stochastic Hodgkin-Huxley models undernoisy rate functions.

Neural Computation , 25(9):2355–2372.G¨uler, M. (2013b). Stochastic Hodgkin-Huxley equations with colored noise terms inthe conductances.

Neural Computation , 25(1):46–74.Hill, T. L. and Chen, Y.-D. (1972). On the theory of ion transport across the nervemembrane: IV. noise from the open-close kinetics of K+ channels.

BiophysicalJournal , 12(8):948–959. 52odgkin, A. L. and Huxley, A. F. (1952). A quantitative description of membranecurrent and its application to conduction and excitation in nerve.

J Physiol , 117:500–544.Huang, Y., R¨udiger, S., and Shuai, J. (2013). Channel-based Langevin approach for thestochastic Hodgkin-Huxley neuron.

Physical Review E , 87(1):012716.Huang, Y., R ¨udiger, S., and Shuai, J. (2015). Accurate Langevin approaches to simulateMarkovian channel dynamics.

Physical Biology , 12(6):061001.Imennov, N. S. and Rubinstein, J. T. (2009). Stochastic population model for electricalstimulation of the auditory nerve.

IEEE Transactions on Biomedical Engineering ,56(10):2493–2501.Keener, J. P. (2009). Invariant manifold reductions for Markovian ion channel dynam-ics.

J Math Biol , 58(3):447–57.Keener, J. P. (2010). Exact reductions of Markovian dynamics for ion channels with asingle permissive state.

J Math Biol , 60(4):473–9.Keener, J. P. and Sneyd, J. (1998).

Mathematical Physiology , volume 1. Springer.Kispersky, T. and White, J. A. (2008). Stochastic models of ion channel gating.

Schol-arpedia , 3(1):1327.Kloeden, P. E. and Platen, E. (1999).

Numerical Solution of Stochastic DifferentialEquations . Number 23 in Applications of Mathematics. Springer.Kolmogorov, A. (1933). Sulla determinazione empirica di una lgge di distribuzione.

Inst. Ital. Attuari, Giorn. , 4:83–91.Kurtz, T. G. (1981).

Approximation of population processes . Society for Industrial andApplied Mathematics.Linaro, D., Storace, M., and Giugliano, M. (2011). Accurate and fast simulationof channel noise in conductance-based model neurons by diffusion approximation.

PLoS Computational Biology , 7(3):e1001102.Mainen, Z. F. and Sejnowski, T. J. (1995). Reliability of spike timing in neocorticalneurons.

Science , 268(5216):1503–1506.Mino, H., Rubinstein, J. T., and White, J. A. (2002). Comparison of algorithms for thesimulation of action potentials with stochastic sodium channels.

Ann Biomed Eng ,30(4):578–587.Moiseff, A. and Konishi, M. (1981). The owl’s interaural pathway is not involved insound localization.

Journal of Comparative Physiology , 144(3):299–304.Neher, E. and Sakmann, B. (1976). Single-channel currents recorded from membraneof denervated frog muscle ﬁbres.

Nature , 260(5554):799.53ewby, J. M., Bressloff, P. C., and Keener, J. P. (2013). Breakdown of fast-slow analysisin an excitable system with channel noise.

Phys Rev Lett , 111(12):128101.Orio, P. and Soudry, D. (2012). Simple, fast and accurate implementation of the diffu-sion approximation algorithm for stochastic ion channels with multiple states.

PLoSOne , 7(5):e36670.ODonnell, C. and van Rossum, M. C. (2015). Spontaneous action potentials and neuralcoding in unmyelinated axons.

Neural Computation , 27(4):801–818.Pezo, D., Soudry, D., and Orio, P. (2014). Diffusion approximation-based simula-tion of stochastic ion channels: which method to use?

Frontiers in ComputationalNeuroscience , 8:139.Schmandt, N. T. and Gal´an, R. F. (2012). Stochastic-shielding approximation of Markovchains and its application to efﬁciently simulate random ion-channel gating.

PhysicalReview Letters , 109(11):118101.Schmid, G., Goychuk, I., and H¨anggi, P. (2001). Stochastic resonance as a collectiveproperty of ion channel assemblies.

EPL (Europhysics Letters) , 56(1):22.Schmidt, D. R., Gal´an, R. F., and Thomas, P. J. (2018). Stochastic shielding andedge importance for Markov chains with timescale separation.

PLoS ComputationalBiology , 14(6):e1006206.Schmidt, D. R. and Thomas, P. J. (2014). Measuring edge importance: a quantitativeanalysis of the stochastic shielding approximation for random processes on graphs.

The Journal of Mathematical Neuroscience , 4(1):6.Schneidman, E., Freedman, B., and Segev, I. (1998). Ion channel stochasticity maybe critical in determining the reliability and precision of spike timing.

NeuralComputation , 10(7):1679–1703.Sengupta, B., Laughlin, S., and Niven, J. (2010). Comparison of Langevin andMarkov channel noise models for neuronal signal generation.

Physical Review E ,81(1):011918.Shorack, G. R. and Wellner, J. A. (2009).

Empirical processes with applications tostatistics . SIAM.Skaugen, E. and Walløe, L. (1979). Firing behaviour in a stochastic nerve membranemodel based upon the Hodgkin-Huxley equations.

Acta Physiol Scand , 107(4):343–63.Smirnov, N. (1948). Table for estimating the goodness of ﬁt of empirical distributions.

The Annals of Mathematical Statistics , 19(2):279–281.Strassberg, A. F. and DeFelice, L. J. (1993). Limitations of the Hodgkin-Huxley formal-ism: effects of single channel kinetics on transmembrane voltage dynamics.

NeuralComputation , 5(6):843–855. 54asserstein, L. N. (1969). Markov processes over denumerable products of spaces de-scribing large systems of automata.

Problems of Information Transmission , 5(3):47–52.White, J. A., Klink, R., Alonso, A., and Kay, A. R. (1998). Noise from voltage-gatedion channels may inﬂuence neuronal dynamics in the entorhinal cortex.

Journal ofNeurophysiology , 80(1):262–269.White, J. A., Rubinstein, J. T., and Kay, A. R. (2000). Channel noise in neurons.

Trendsin Neurosciences , 23(3):131–137.Woo, J., Miller, C. A., and Abbas, P. J. (2009). Simulation of the Electrically Stim-ulated Cochlear Neuron: Modeling Adaptation to Trains of Electric Pulses.

IEEETransactions on Biomedical Engineering , 56(5):1348–1359.Yu, H., Dhingra, R. R., Dick, T. E., and Gal´an, R. F. (2017). Effects of ion channel noiseon neural circuits: an application to the respiratory pattern generator to investigatebreathing variability.