Efficiently simulating discrete-state models with binary decision trees
Christopher Lester∗, Ruth E. Baker†, Christian A. Yates‡
16 January 2020
Abstract
Stochastic simulation algorithms (SSAs) are widely used to numerically investigate the properties of stochastic, discrete-state models. The Gillespie Direct Method is the pre-eminent SSA, and is widely used to generate sample paths of so-called agent-based or individual-based models. However, the simplicity of the Gillespie Direct Method often renders it impractical where large-scale models are to be analysed in detail. In this work, we carefully modify the Gillespie Direct Method so that it uses a customised binary decision tree to trace out sample paths of the model of interest. We show that a decision tree can be constructed to exploit the specific features of the chosen model. Specifically, the events that underpin the model are placed in carefully-chosen leaves of the decision tree in order to minimise the work required to keep the tree up-to-date. The computational efficiencies that we realise can provide the apparatus necessary for the investigation of large-scale, discrete-state models that would otherwise be intractable. Two case studies are presented to demonstrate the efficiency of the method.
∗ Mathematical Institute, Woodstock Road, Oxford, OX2 6GG. Email: [email protected].
† Mathematical Institute, Woodstock Road, Oxford, OX2 6GG.
‡ Department of Mathematical Sciences, North Rd, Bath BA2 7AY.

In the life sciences, models are extensively used to understand and interrogate data, thereby playing an important part in furthering our understanding of a range of biological mechanisms [1, 2]. A multitude of stochastic modelling approaches have enjoyed widespread success as interpretable models of real-world physical phenomena [3]. In this manuscript, we focus on discrete-state models that explicitly track each event that changes the model's state. Discrete-state models take on many forms: by way of example, these include the biochemical reaction networks described by a Chemical Master Equation [1, 2, 3], and the agent- or individual-based models designed to undertake detailed epidemiological investigations [4, 5].

The dynamics of realistic, discrete-state models are often analytically intractable: specifically, it is typically impossible to obtain closed-form equations that probabilistically report even summary statistics of the system of interest [2, 3]. To make sense of the dynamics of a chosen model, it is often necessary to use Monte Carlo simulation to generate an ensemble of sample paths: the sample paths are realisations, or example time-series, of the model. Once the sample paths have been generated, summary statistics can be estimated to characterise the dynamics [6].

The Gillespie Direct Method ('GDM') is an event-driven algorithm that is widely used to generate sample paths of a chosen stochastic model. The GDM generates sample paths by sequentially sampling each event that takes place over a given time interval [6, 7, 8]. Many variations of the GDM have been implemented, and a range of software packages can be employed to produce sample paths [9, 10].
To ensure that the largest, most complex and physically-relevant models can be thoroughly investigated – that is, ensuring that parameter sweeps, model validation and verification studies can be carried out in sufficient detail – the GDM must be studied, refined and improved upon.

In this manuscript, we focus on one particular computational challenge encountered when using the GDM. Specifically, if there are M classes of event that characterise the interactions (often described as reactions) within the model, then the standard GDM uses O(M) computational steps to select each event that takes place [6]. Each event modifies the populations of the model: therefore, the reaction or interaction rates need to be updated, and this requires a further O(M) computational steps. As explained in the previous paragraph, this computational complexity can preclude the in-depth analysis of large-scale models.

The GDM can be modified so that a binary decision tree is used to simulate each event [11]. With a decision tree, an average of O(log M) steps are required to generate each reaction or interaction, but if the decision tree is not carefully chosen, a significant number of computational steps will still be required as a consequence of the reaction or interaction rates needing to be updated.

As there is a remarkable degree of flexibility in the choice of the binary decision tree that is used, we take the opportunity to design a bespoke binary decision tree that can be quickly updated after each interaction is simulated. We show that, with a judicious choice of decision tree, sample paths can be generated with a substantially lower level of computational resources.

This manuscript is arranged as follows: in Section 2, we provide background material so that a discrete-state model is carefully defined. In Section 3, we develop a bespoke decision tree-based simulation algorithm. Our new algorithm is carefully tested, and the results are presented in Section 4.
A discussion of the findings is presented in Section 5.
This section provides the background material that underpins the rest of the manuscript. We start by introducing notation, and then proceed to outline the Gillespie Direct Method and its derivatives.
A discrete-state model comprises individuals, each of a given 'species'. The population of each species can change due to interactions that are often referred to as 'reactions'. We provide a description of the terms 'species' and 'reactions' as follows:
Species.
We use a vector X(t) to denote the 'state vector' that records the populations comprising the model. There is considerable flexibility in how the vector X(t) might be defined:

• if a model is spatially homogeneous, then the total population of each species can be tracked through time;

• if a model is spatially inhomogeneous, then the location of different individuals must be recorded. One approach involves discretising the region of interest into K 'patches' or 'compartments' [12], which we enumerate as 1, 2, . . . , K. In this case, the population of each species in each patch is recorded.

Reactions. Any interaction, or change to the state vector, is as a result of a 'reaction'. We index reactions as 1, 2, . . . , M. The propensity of a reaction specifies its average rate; it is common for the propensity to be calculated according to mass action kinetics [2]. In addition, each reaction is associated with a stoichiometric vector that indicates the net change to the state vector, X, after the reaction has 'fired'.

We denote the propensity function of the j-th reaction as a_j(X), and the stoichiometric vector as ν_j. Thus, if the j-th reaction takes place, we update X by setting X ← X + ν_j.

Example.
For an S-I-S compartment model [5], we use 'S_k' to denote a susceptible individual in patch k; for an infected individual in patch k, we write 'I_k'. For a K-patch model the state vector, X, can be expressed as

X = [S_1, I_1, S_2, I_2, . . . , S_K, I_K]^T.

Infection takes place within a patch: infection of a susceptible individual in patch k by contact with an infected individual (also within patch k) can be represented by the reaction

S_k + I_k → 2 I_k,

whilst recovery of an infected individual can be represented as

I_k → S_k.

The former reaction reduces the S_k population by one, and increases the I_k population by one. The latter reaction has the reverse effect. For the former reaction, the propensity is proportional to S_k · I_k, whilst for the latter reaction, the propensity is proportional to I_k. The constants of proportionality are known as rate constants and may be estimated from experimental data [2].

If individuals are able to move from one patch to another, these movements are treated as reactions and can be represented as

I_k → I_j,    S_k → S_j,

for distinct patches k and j. For the former reaction, the I_k population reduces by one, and the I_j population increases by one. A similar result applies for the latter reaction. The propensity of the former reaction is proportional to I_k, whilst for the latter reaction, the propensity is proportional to S_k. The constants of proportionality can be calculated by referring to the diffusion coefficients. □

The GDM is a serial method in the sense that it generates sample paths of a model of interest by simulating individual reactions in the order in which they take place. A key benefit of this approach is that it can be implemented in a straightforward manner, as shown in Algorithm 1.
Algorithm 1
Gillespie Direct Method.
Require: initial conditions, X(0), and final time, T.
1: set t ← 0
2: loop
3:   for each reaction j = 1, . . . , M, calculate a_j(X) and set a(X) ← Σ_{j=1}^{M} a_j(X)
4:   sample ∆ ← Exp(a)
5:   if t + ∆ > T then
6:     break
7:   end if
8:   choose the m-th reaction to fire: each reaction j is chosen with probability a_j / a
9:   set X ← X + ν_m, and set t ← t + ∆
10: end loop

At each iteration of the loop of Algorithm 1, the propensity (or average rate) of each reaction is calculated, and the propensities are summed together to give the total propensity (i.e. the average rate at which any reaction fires). Then:

• in line 4, an exponential variate is used to simulate the time between successive reactions;

• in line 8, a specific reaction is chosen. The inverse transform method is a popular technique for choosing a specific reaction, and this method proceeds as follows. First, sample u from a uniform random variate on (0, 1]. Then, fire the k-th reaction, where k satisfies

Σ_{j=1}^{k−1} a_j(X) ≤ u · a(X) < Σ_{j=1}^{k} a_j(X).

In practice, one can follow a 'bottom-up' search procedure: set k ← 1, and then increase k until Σ_{j=1}^{k} a_j(X) > u · a(X). If the reactions are randomly ordered, then this search procedure requires an average of M/2 steps.

2.3 Optimising the Gillespie Direct Method

We now present simulation methods that are variants of the GDM. Each variant that we highlight addresses specific deficiencies of Algorithm 1. We will use these improved simulation methods as inspiration for the new framework outlined in Section 3.
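The 'bottom-up' inverse transform search described in Section 2.2 can be sketched in a few lines of Python; the three propensity values used in the example are illustrative assumptions, not values from the manuscript:

```python
import random

def select_reaction(propensities):
    """Choose a reaction index by the 'bottom-up' inverse transform method:
    compare the running cumulative sum against the target u * a(X).  For
    randomly ordered reactions this takes M/2 steps on average."""
    total = sum(propensities)
    target = random.random() * total
    cumulative = 0.0
    for k, a_k in enumerate(propensities):
        cumulative += a_k
        if cumulative > target:
            return k
    return len(propensities) - 1  # guard against floating-point round-off

# Example: three reactions with (assumed) propensities 1.0, 2.0 and 7.0.
random.seed(0)
counts = [0, 0, 0]
for _ in range(10000):
    counts[select_reaction([1.0, 2.0, 7.0])] += 1
```

Over many draws, each reaction is selected with frequency proportional to its propensity, so the third (largest-propensity) reaction dominates the counts.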
Optimised and Sorting Direct Methods.
Cao et al. [13] describe the Optimised Direct Method. When compared with the GDM, the method is more computationally efficient for two principal reasons. Firstly, at each iteration of the GDM, all propensity values are recalculated, even though some of these values will remain the same. Instead, Cao et al. [13] draw up a dependency graph to indicate which propensity values might need to be updated at each iteration of the algorithm. Those propensities that are unaffected by the chosen reaction are not recalculated. Secondly, the Optimised Direct Method uses a carefully-chosen 'bottom-up' inverse transform method to select reactions. An average of Σ_{j=1}^{M} j · a_j(X)/a(X) steps are required to generate each value of k. If the propensities are sorted so that a_1 ≥ a_2 ≥ · · · ≥ a_M, then the average number of steps required to generate k is minimised.

McCollum et al. [14] present an algorithm known as the Sorting Direct Method. This method takes into account the fact that reaction frequencies can change over the time-span of interest. McCollum et al. [14] dynamically reorder the reactions in a specified list that is used as part of the inverse transform method to select reactions. Every time a reaction is fired, it swaps places with the reaction above it in the list. The effect is that frequently-encountered reactions appear towards the top of the list, whilst rarely-seen reactions linger at the bottom.

Logarithmic Direct Method.
If the frequencies of different reactions vary by orders of magnitude, then the above-mentioned methods can significantly accelerate stochastic simulation. However, if all reactions occur with similar frequencies, then the methods provide only limited benefits. Moreover, the computational complexity of the method for choosing the reaction to fire remains O(M). In a technical report, Li and Petzold [11] point out that propensity values are repeatedly summed – first, to calculate the total propensity, a(X), and second, as part of the inverse transform method. Li and Petzold [11] therefore propose the following strategy: instead of storing the propensities, store the cumulative sums of the first j (for j = 1, . . . , M) propensities, b_1, . . . , b_M, where b_j ≡ Σ_{j′=1}^{j} a_{j′}(X) (with b_0 ≡ 0; equivalently, set b_j ≡ b_{j−1} + a_j(X)). Then, a reaction is fired by performing a binary search to determine the value of k such that

b_{k−1} ≤ u · b_M < b_k.

The binary search requires O(log(M)) steps; however, O(M) steps are still required to generate b_1, . . . , b_M.

Recycling random numbers.
The unmodified GDM uses two random numbers during each iteration: one random number to determine the time spanned by that step, and a second random number as part of the inverse transform method to choose a reaction. Yates and Klingbeil [15] noted that sample paths can be generated by using only one random number at each step. They show that if a (0, 1)-uniform random variate, u, is sampled and the inverse transform method is used to sample a reaction k, then v, defined as

v ≡ u · a(X) / a_1(X),                              if k = 1,
v ≡ (u · a(X) − Σ_{i=1}^{k−1} a_i(X)) / a_k(X),     if k > 1,

is a (0, 1)-uniform random variate that is independent of k. The random variate v can be used to generate the waiting time until the next reaction, ∆, as

∆ ≡ log(1/v) / a(X).

We are now in a position to set out a general approach to efficiently simulating a large-scale stochastic model in more detail.
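The recycling identity above can be sketched as follows; the function and its propensity values are illustrative (k is 0-indexed here, so the sum over the first k entries plays the role of Σ_{i=1}^{k−1} a_i(X)):

```python
import math
import random

def recycle(u, k, propensities):
    """Given the uniform variate u that selected reaction k (0-indexed)
    by inverse transform, recover a fresh (0,1)-uniform variate v, as in
    Yates & Klingbeil, then use v to sample the exponential waiting time."""
    a_total = sum(propensities)
    partial = sum(propensities[:k])              # sum over the first k-1 reactions (1-indexed)
    v = (u * a_total - partial) / propensities[k]
    delta = math.log(1.0 / v) / a_total          # Exp(a) waiting time from the recycled v
    return v, delta
```

For example, with (assumed) propensities [1, 2, 7] and u = 0.5, the inverse transform selects the third reaction, and the recycled variate is v = (5 − 3)/7 = 2/7.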
In this section, we present a new, integrative method for generating sample paths of the models specified in Section 2.1. Building on the GDM, we efficiently simulate reactions with a binary decision tree. We demonstrate that a well-chosen decision tree can lead to substantial computational savings, and we provide a heuristic method for choosing such decision trees.
The implementation of the GDM described in Section 2.2 employed the inverse transform method to fire (i.e. sample) reactions. Taking into account the Logarithmic Direct Method outlined in Section 2.3, we now detail how a binary decision tree can be used to fire reactions efficiently. We start with a definition:

Binary tree.
Whilst more formal definitions are possible, a binary tree, D, comprises a set of nodes that are joined together by edges, subject to the following:

• a node is classified as a 'leaf' or 'non-leaf';

• a 'non-leaf' is the 'parent-node' to precisely two 'child-nodes': a left-child, and a right-child;

• a 'leaf' node has no children;

• there is exactly one node which has no 'parent-node': this is the 'root' of the tree.

An edge connects two nodes if there is a parent-child relationship between them; an example of a binary tree is shown in Figure 1. We will assign a weight ω_α to each node α. The descendants of a node include the child-nodes, the child-nodes of the child-nodes, and so on.

Figure 1: An example of a binary tree. On the left-hand side, the 'leaf' nodes are shown in an olive colour, whilst the 'non-leaf' nodes are shown in blue. The weights of the nodes are shown on the right-hand side.
Binary trees as a decision tool.
We use a binary tree to simulate reactions as follows. We represent each reaction, j = 1, . . . , M, as a leaf node in the binary tree D, with weight ω_j = a_j(X). We then generate non-leaf nodes to join the tree up. The weight of each non-leaf node is given by the sum of the weights of its child-nodes. In particular, this means the weight of a node is equal to the total propensity of all of the leaves that are its descendants. The weight of the root node is therefore equal to the sum of all the propensities (i.e. the total propensity, a(X)).

This means that a reaction can be chosen by following a recursive procedure: start from the root, and select the left-child with probability ω_left / (ω_left + ω_right) (where ω_left and ω_right refer to the respective weights of the child-nodes). Otherwise, select the right node. We are thereby comparing the total propensity associated with the descendants of the left-child with that of the right-child so as to decide which branch of the tree to explore further. Now consider the children of the node just selected (i.e. the 'grand-children' of the root node): select the left (grand-)child with probability ω′_left / (ω′_left + ω′_right), otherwise select the right (grand-)child. Recursively repeat until a leaf is selected. The reaction associated with this leaf is fired.

We illustrate further with an example.

Example.
Consider an S-E-I-R compartment model [5]. In this model, a susceptible individual (state 'S') is exposed to a disease (state 'E') after coming into contact with an infectious (state 'I') individual. After a random period of time, exposed individuals become infectious, and infectious individuals are removed (state 'R') from the system (either through recovery or death).

The reactions that govern this model are detailed on the left of Figure 2. On the right, a binary decision tree is shown. The reactions are represented by leaves α = 1, 2, 3. The other nodes (created to connect the tree) are represented by α = 4, 5. The weights of α = 1, 2, 3, namely ω_1, ω_2 and ω_3, are given by the propensities of their respective reactions: θ_1 · S · I, θ_2 · E, and θ_3 · I, respectively. The weight of α = 4 is ω_4 = ω_1 + ω_2, and the weight of α = 5 is ω_5 = ω_3 + ω_4. α = 5 is the root. □

Figure 2: The reactions of an S-E-I-R model (α = 1: S + I → E + I, with rate constant θ_1; α = 2: E → I, with rate constant θ_2; α = 3: I → R, with rate constant θ_3) are shown on the left, and a corresponding binary decision tree is shown on the right. The index of each node is shown, together with its weight in brackets. Further details are contained within the main text.

Having described how one might modify the GDM to make use of a binary decision tree, we now use Section 3.2 to provide pseudo-code of our simulation algorithm. In Section 3.3, we explain how the tree D can be chosen to ensure computational efficiency.

3.2 A reformulated Gillespie Direct Method

A redesigned GDM is presented as Algorithm 2. Algorithm 2 will use a binary decision tree to fire reactions. In turn, Algorithm 2 relies on three sub-routines: Algorithm 2a is used for the initial parametrisation of the decision tree, Algorithm 2b is directly responsible for firing reactions, and Algorithm 2c ensures the tree remains up-to-date. Note that Algorithm 2a runs once: thereafter, we alternate between Algorithms 2b and 2c. We first discuss Algorithms 2b and 2c:
Firing reactions.
In Section 3.1 we explained that, by starting at the root of the tree D, and repeatedly choosing either the left or right child-node, eventually a leaf is reached. The reaction associated with this leaf is fired. At each step of Algorithm 2b, the total propensities associated with all reactions descending from the left- and the right-child-nodes are compared. The left-child is selected with probability ω_left / (ω_left + ω_right). A (0, 1)-uniform random variate, u, is generated. Then,

u ≤ ω_left / (ω_left + ω_right) ⟹ go left,
u > ω_left / (ω_left + ω_right) ⟹ go right.

This procedure can be repeated, using a new random number at each step, until a leaf (i.e. a reaction) is reached.

The algorithm is more efficient if the random numbers are recycled. To do this, let the random number used at step i be u_i. Then we let u_{i+1}, the random number to be used at the next step, be chosen as

picked left ⟹ u_{i+1} ≡ u_i · (ω_left + ω_right) / ω_left,
picked right ⟹ u_{i+1} ≡ (u_i · (ω_left + ω_right) − ω_left) / ω_right.

Clearly, u_i and u_{i+1} depend on each other. However, if we let d_i ∈ {ℓ, r} indicate the choice of picking the left- or right-child at step i, then the decision d_{i+1} is independent of decision d_i:

P[d_{i+1}, d_i] = ∫ P[d_{i+1}, d_i | u_{i+1}] dP[u_{i+1}]
              = ∫ P[d_{i+1} | d_i, u_{i+1}] · P[d_i | u_{i+1}] dP[u_{i+1}]
              = ∫ P[d_{i+1} | u_{i+1}] · P[d_i] dP[u_{i+1}]
              = P[d_{i+1}] · P[d_i].

If, instead of working with (0, 1)-uniform random variates, we work with v ∼ (0, ω_left + ω_right)-uniform random variates, then we explore D as follows:

v ≤ ω_left ⟹ go left,
v > ω_left ⟹ go right.

Random numbers can be recycled, and we can take v as

picked left ⟹ v_{i+1} ≡ v_i,
picked right ⟹ v_{i+1} ≡ v_i − ω_left.

Algorithm 2b implements this method.
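The descent with a single recycled random number can be sketched as follows; the Node class and the leaf weights (6, 2 and 2, standing in for the three S-E-I-R propensities) are illustrative assumptions:

```python
import random

class Node:
    """A decision-tree node; a leaf stores the index of its reaction."""
    def __init__(self, weight, left=None, right=None, reaction=None):
        self.weight, self.left, self.right, self.reaction = weight, left, right, reaction

def fire_reaction(root):
    """Fire a reaction using one recycled random number: draw v ~ U(0, w_root)
    once, then go left whenever v <= w_left (keeping v) and right otherwise
    (taking v <- v - w_left), until a leaf is reached."""
    v = random.random() * root.weight
    node = root
    while node.reaction is None:
        if v <= node.left.weight:
            node = node.left
        else:
            v -= node.left.weight
            node = node.right
    return node.reaction

# Toy tree mirroring the S-E-I-R example of Section 3.1.
leaf1, leaf2, leaf3 = Node(6.0, reaction=1), Node(2.0, reaction=2), Node(2.0, reaction=3)
node4 = Node(leaf1.weight + leaf2.weight, left=leaf1, right=leaf2)
root = Node(node4.weight + leaf3.weight, left=node4, right=leaf3)
```

Sampling repeatedly from this tree selects reaction 1 with probability 6/10, and reactions 2 and 3 with probability 2/10 each.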
Updating the tree.
After a reaction j has fired, the population numbers change, and it could be necessary for some reaction propensities to be updated. For each reaction, j, a list of propensities that need to be recalculated can be drawn up a priori as

K_j = {k : at least one of the reactants of reaction k is changed by reaction j}.

After a reaction j has fired, all the leaf nodes of D that are included in the list K_j need their weights updated. In addition, as the weight of each non-leaf node is given by the sum of the weights of its child-nodes, a number of other nodes will need their weights recalculated. We write Λ_j for the list of all nodes that need their weights recalculated. Algorithm 2c updates the tree. Figure 3 (which is contained within the next section) provides a diagrammatical representation of this update procedure.

We have thus established how a binary decision tree can be used to sample the reactions of a discrete-state stochastic model. We now consider what a highly efficient binary decision tree might look like. Once such a decision tree is decided upon, Algorithm 2 (which, in turn, calls Algorithms 2b and 2c) will be utilised to generate sample paths of the model.

In Section 3.2 we outlined the two major components of the computational cost of using a modified GDM: firstly, Algorithm 2b is used to sample each reaction by exploring the decision tree, and, secondly, Algorithm 2c updates the decision tree as required. It is possible to directly optimise the performance of Algorithm 2b by using the Huffman coding algorithm. The Huffman algorithm is widely used in the field of communication theory, and places nodes with high firing probabilities close to the root, whilst placing nodes that fire rarely far from the root [16]. The effect of the Huffman algorithm is to minimise the number of steps required to sample a single path from the root of the tree to a node.

Algorithm 2
Efficient Gillespie Direct Method.
Require: initial conditions, X(0), decision tree, D, and final time, T.
set t ← 0
for each reaction j = 1, . . . , M, calculate propensity a_j(X)
populate tree D: use Algorithm 2a
loop
  sample ∆ ← Exp(ω_root)
  if t + ∆ > T then
    break
  end if
  choose the k-th reaction to fire: use Algorithm 2b
  set X ← X + ν_k, and set t ← t + ∆
  update tree D as per Algorithm 2c
end loop

Algorithm 2a
Sub-method to populate decision tree with weights.
Require:
Tree D.
for each node α ∈ D, ordered by decreasing distance from root, do
  if α is a leaf, associated with reaction j, then
    set weight ω_α ← a_j(X)
  else
    set weight ω_α ← ω_left + ω_right
  end if
end for

Algorithm 2b Sub-method to search for a reaction to fire. This algorithm is recursive, so a procedure is used to start.
Require:
Binary tree, D, and r ∼ U(0, ω_root).
procedure Select Reaction
  return Decide(α_root, r)
end procedure

Require: node, α, with weight ω_α; left-child-node α_left with weight ω_left; right-child-node α_right.
function Decide(α, r):
  if α is a leaf then
    return α
  else
    if ω_left > r then
      return Decide(α_left, r)
    else
      return Decide(α_right, r − ω_left)
    end if
  end if
end function

Algorithm 2c Sub-method that updates decision tree with new weights, as required.
Require:
Tree D, and reaction j.
for each node α ∈ Λ_j, ordered by decreasing distance from root, do
  if α is a leaf, associated with reaction k, then
    set ω_α ← a_k(X)
  else
    set ω_α ← ω_left + ω_right
  end if
end for

When the tree is updated, we traverse multiple paths from the leaves associated with reactions with new propensity values, up to the root. Following the notation of Section 3.2, if reaction j fires, then |K_j| propensities need to be recalculated, which means, potentially, |K_j| paths to follow. Whilst each model has its own structure, for a model comprising many connected, interacting individuals, |K_j| might typically be larger than one, so that the cost of updating the decision tree could dominate the cost of searching the tree.

We seek to design a decision tree so that steps in the tree update process specified by Algorithm 2c can be shared. For example, if the j-th reaction fires, and this results in an update to the propensities of reactions k and ℓ, then, if the leaves representing reactions k and ℓ are siblings, the updates to the non-leaf nodes between k and ℓ, and the root, can be shared. This will reduce the computational cost of updating the tree. Further, if the leaves representing k and ℓ are 'cousins', then all but one of the updates to the non-leaf nodes between k and ℓ, and the root, can be shared. In order to focus on such computational improvements, at this stage, we will restrict attention to balanced trees, which means that the distance from the root to different leaves differs by at most one. We illustrate the benefits of our proposed improvements with an example.

Example.
We consider two decision trees that are shown in Figure 3. In this example, leaves with a path distance of two between them share parent-, grand-parent- and root-nodes. Leaves with a path distance of four between them share a grand-parent and the root node, whilst leaves with a path distance of six share only the root.

Suppose the j-th reaction fires, resulting in an update to the propensities of reactions 1 and 2. Thus, the weights of nodes α = 1 and α = 2 are recalculated. Recursively, all nodes on the path from α = 1 or α = 2 to the root need to be recalculated. If α = 1 and α = 2 are siblings, then there are three non-leaf nodes that need their weights recalculated: the diagram on the left shows that α = 3, α = 5 and α = 7 all need to be updated. By contrast, if α = 1 and α = 2 are not siblings, then more internal nodes require weight recalculation: the decision tree on the right shows that α = 3, α = 4, α = 5, α = 6 and α = 7 need their weights recalculated. □

Figure 3: Two decision trees of a stochastic model comprising eight reactions are shown. If the weights of nodes α = 1 and α = 2 need to be recalculated, additional updates are required, and the respective values of α are indicated on the respective decision trees. Dots indicate the nodes that do not need to be updated. The tree on the left-hand side can be updated more efficiently than the tree on the right-hand side. Further details are contained within the main text.

The method for building a decision tree comprises the following two key steps:

1. generate an interactivity graph, with nodes representing reactions and edge weights indicating a score that captures the frequency with which pairs of reaction propensities are simultaneously updated;

2. construct a decision tree with reference to the interactivity graph, so that Algorithm 2c can carry its task out with only limited computational resources.
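The leaf-to-root weight propagation performed by Algorithm 2c can be sketched with a complete binary tree stored in a flat array; the layout (children of node n at positions 2n + 1 and 2n + 2) is an assumption of this sketch:

```python
def update_leaf(tree, i, new_weight):
    """Set the weight of leaf i in an array-based complete binary tree and
    propagate the change to the root, so that every ancestor's weight is
    the sum of its children's weights."""
    tree[i] = new_weight
    while i > 0:
        i = (i - 1) // 2                          # parent index
        tree[i] = tree[2 * i + 1] + tree[2 * i + 2]

# A tree with 4 leaves (array positions 3..6) holding propensities [1, 2, 3, 4].
tree = [0.0] * 7
for j, a in enumerate([1.0, 2.0, 3.0, 4.0]):
    update_leaf(tree, 3 + j, a)
```

Each update costs O(log M); when two sibling leaves change together, they share every ancestor above their parent, which is precisely the saving that motivates placing simultaneously-updated reactions close together.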
Generating an interactivity graph.
The interactivity graph G(V, E) is constructed as follows. Each reaction is represented as a node. Each pair of nodes (i.e. reactions) (i, j) is connected by a weighted edge; the weight of this edge indicates the frequency of simultaneous updates to the propensities of this pair of reactions.

The frequency of simultaneous updates to reaction propensities is not generally known a priori, and must be estimated. A very small number of survey sample paths are generated to provide this estimate. Let ω_k indicate the total number of times reaction k has fired (taken over the survey sample paths). Then, the edge weight connecting nodes i and j is given by

E(i, j) = Σ_k ω_k · I{R_k updates a_i} · I{R_k updates a_j},

where the sum is taken over all reactions of the model. Alternatively, edge weights can be provided by an unweighted sum:

E(i, j) = Σ_k I{R_k updates a_i} · I{R_k updates a_j}.

The unweighted sum proceeds as if all reactions have roughly equal propensities, and has the benefit of not requiring the generation of survey simulations to calibrate the algorithm.
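The edge-weight computation can be sketched as follows; the `updates` sets and firing counts in the example are illustrative stand-ins for the dependency lists K_j and the survey-path tallies ω_k:

```python
from itertools import combinations

def interactivity_edges(updates, fire_counts=None):
    """Build the weighted edges E(i, j) of the interactivity graph.

    updates[k] is the set of reaction indices whose propensities change
    when reaction k fires; fire_counts[k] is omega_k from the survey
    sample paths.  If fire_counts is omitted, the unweighted sum is used.
    """
    edges = {}
    for k, touched in enumerate(updates):
        w = 1 if fire_counts is None else fire_counts[k]
        for i, j in combinations(sorted(touched), 2):
            edges[(i, j)] = edges.get((i, j), 0) + w
    return edges

# Toy model: reaction 0 updates {0, 1}; reaction 1 updates {0, 1, 2}; reaction 2 updates {2}.
edges = interactivity_edges([{0, 1}, {0, 1, 2}, {2}], fire_counts=[5, 3, 2])
```

Here E(0, 1) = 5 + 3 = 8, since reactions 0 and 1 have their propensities updated together whenever reaction 0 (fired 5 times) or reaction 1 (fired 3 times) fires.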
Example.
Returning to the example of an S-E-I-R model [5] (see Section 3.1), on the left of Figure 4 the three reactions are indicated. On the right of this figure, the interactivity graph is shown: each node is associated with the indicated reaction, and the edge weights are indicated.

Figure 4: The reactions of an S-E-I-R model are shown on the left, and a corresponding interactivity graph is shown on the right. Further details are contained within the main text.
Constructing a decision tree.
Having restricted ourselves to a balanced decision tree, for a total of M reactions, the number of functionally distinct decision trees is of the order of O(M!/2^{M−1}), which means that exhaustively searching for a decision tree is generally not feasible. Therefore, our method of constructing a binary decision tree is, by necessity, heuristic. The procedure starts by algorithmically bisecting the interactivity graph, placing each node in either the 'left-' or the 'right-' set, with this bisection chosen to minimise the edge cut, that is, to achieve

min Σ_{i ∈ left} Σ_{j ∈ right} E(i, j),

where E(i, j) is the weight of the edge between i and j, subject to the left- and right-sets having the same size (or, if necessary, differing by one node). This means that, on the whole, where reactions i and j frequently have their propensities updated together, they are both in the same set.

This partition generates the first decision of the decision tree: from the root, the left-child represents all reactions assigned to the left-set, whilst the right-child represents all reactions assigned to the right-set. The procedure is then recursively repeated inside the left-set and the right-set, until the sets can no longer be bisected. To illustrate, we consider an example.
We consider a model comprising eight reactions. An interactivity graph is generated. Then, the eight reactions are bisected into two sets of size four. Each set of size four is bisected again. On the left of Figure 5, the red boxes indicate the two sets after the first bisection; the blue boxes indicate the second-level bisections (and the third-level bisections are trivial). The resultant binary decision tree is shown on the right of Figure 5. Note that the designation of left- and right-set is arbitrary and does not change the graph. □

Figure 5: The left side indicates how an interactivity graph is repeatedly bisected. The right side shows the resultant decision tree generated by the algorithm. Further details are contained within the main text.

Finally, we need to specify the heuristic graph partitioning algorithm that we wish to use. The algorithm will need to, at each level, bisect the nodes whilst minimising the resultant edge cut. A wide range of algorithms would be suitable, but we will use the ubiquitous Kernighan-Lin algorithm [17]; the method is detailed in Algorithm 3.

We are now in a position to commence numerical investigations into the new simulation method.
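The recursive bisection that turns an interactivity graph into a decision tree can be sketched as below; the trivial `halve` bisector is a placeholder for a Kernighan-Lin pass over the interactivity graph:

```python
def build_tree(reactions, bisect):
    """Recursively bisect a list of reactions into a nested-tuple decision
    tree: each call splits the set with the supplied `bisect` routine
    (e.g. a Kernighan-Lin pass minimising the edge cut) until single
    reactions remain as leaves."""
    if len(reactions) == 1:
        return reactions[0]
    left, right = bisect(reactions)
    return (build_tree(left, bisect), build_tree(right, bisect))

def halve(reactions):
    """Placeholder bisector: split in half, ignoring the interactivity graph."""
    mid = len(reactions) // 2
    return reactions[:mid], reactions[mid:]

tree = build_tree([1, 2, 3, 4, 5, 6, 7, 8], halve)
```

With eight reactions this yields the balanced three-level tree of the example above; substituting a cut-minimising bisector changes which reactions land in which subtree, not the shape of the tree.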
In this section, we present two case studies that demonstrate the efficiency of the refined simulation algorithm. The first case study is concerned with the diffusion of a single species on a patch network. The second case study models the spread of a disease between communities. The numerical performance of different simulation algorithms is compared by referring to the number of reactions fired per second of CPU time. We will compare the performance of a bespoke decision tree (BDT), chosen by the method described in Section 3, with the performance of a random decision tree (RDT). The RDT will be binary and balanced. A discussion of the results follows in Section 5. All the sample paths were generated on an Intel Core i5 CPU rated at 2.5 GHz.
In this first case study, we consider the diffusion of a single species, X. The species inhabits a domain comprising K patches, and individual members of the species can move from one patch to another only if the patches are connected.

Clearly, the relative computational performance of competing simulation algorithms will depend on the 'contact network' that defines which patches are connected to each other. Suppose each patch of the model is associated with a node in the contact network. If it is possible to move from one patch to another in the model, then an edge is placed between the associated nodes of the contact network. We will generate the contact networks by following two widely-used frameworks: the Erdős-Rényi [18] and Barabási-Albert [19] models. Each framework is now introduced.
The Erdős-Rényi model. An Erdős-Rényi (ER) model on K nodes [18] is constructed by connecting nodes randomly: each of the K(K − 1)/2 possible edges is included with constant probability p, and different edges are included independently of one another.

Algorithm 3 Kernighan-Lin graph bisection algorithm [17].
Require: interactivity graph, G(V, E)
  randomly partition V into disjoint sets A and B such that ||A| − |B|| ≤ 1
  repeat
    compute, for all v ∈ V, the difference D_v between external and internal costs:
      for v ∈ A, D_v = Σ_{v′∈B} E(v, v′) − Σ_{v′∈A} E(v, v′), and vice versa for v ∈ B
    let g_v, a_v, and b_v be empty lists
    for n = 1, . . . , ⌊|V|/2⌋ do
      find a ∈ A and b ∈ B such that g = D_a + D_b − 2 · E(a, b) is maximal
      remove a and b from further consideration in this pass
      add g to g_v, a to a_v, and b to b_v
      update D_v for A ← A \ a, and B ← B \ b
    end for
    find k which maximises g_max, the sum of g_v[1], . . . , g_v[k]
    if g_max > 0 then
      exchange a_v[1], a_v[2], . . . , a_v[k] with b_v[1], b_v[2], . . . , b_v[k]
    end if
  until g_max ≤ 0
  return G(V, E)

The Barabási-Albert model. The Barabási-Albert (BA) model is a widely-used example of a scale-free interaction network that generates a random graph on K nodes, and is in widespread use as a modelling tool. For example, the BA model has been used to describe hyperlinks between web-pages of the internet [19]. The BA model uses preferential attachment, which means that, as new nodes are added to a graph, they are more likely to be connected to existing nodes with more edges radiating from them. Barabási and Albert [19] describe their model as follows:

"Starting with a small number (m₀) of vertices [nodes], at every time step we add a new vertex with m (≤ m₀) edges that link the new vertex to m different vertices already present in the system. To incorporate preferential attachment, we assume that the probability Π that a new vertex will be connected to a vertex i depends on the connectivity k_i of that vertex, so that Π(k_i) = k_i / Σ_j k_j."

In order to model the movement of a species, X, over K patches, we fix K = 50.
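Both network models are straightforward to generate. The sketch below uses plain Python (the function names are ours) and, for the BA model, the common trick of sampling attachment targets from a list that repeats each node once per incident edge; the choice of a clique on m₀ = m seed nodes is an assumption made for illustration.

```python
import random

def erdos_renyi(K, p, rng=random):
    """ER graph on K nodes: each of the K(K-1)/2 possible edges is
    included independently with probability p."""
    return [(i, j) for i in range(K) for j in range(i + 1, K)
            if rng.random() < p]

def barabasi_albert(K, m, rng=random):
    """BA graph grown by preferential attachment: each new node attaches
    to m distinct existing nodes with probability proportional to degree."""
    # Seed graph: a clique on m nodes (an illustrative choice of m0 = m).
    edges = [(i, j) for i in range(m) for j in range(i + 1, m)]
    # 'targets' lists each node once per incident edge, so uniform sampling
    # from it realises degree-proportional attachment.
    targets = [v for e in edges for v in e]
    if not targets:              # m = 1: the seed graph has no edges yet
        targets = [0]
    for new in range(m, K):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:
            edges.append((old, new))
            targets.extend([old, new])
    return edges
```

With K = 50 these edge lists play the role of the contact networks used in the experiments that follow.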
To robustly compare competing simulation algorithms, we must first generate different contact networks, and then compare the performance of the competing algorithms on each of the contact networks. We start with the ER model. First, we set p ∈ {1/10, 3/10, 5/10, 7/10, 9/10}: larger values of p result in a higher proportion of patches being in contact with other patches. Then, for each value of p, five different stochastic models are generated, each having its own contact network. Thus, there are 25 models in total. For each of the 25 models, we proceed to:

• construct a BDT according to Section 3;
• construct 100 RDTs by randomly assigning reactions to leaves of the tree.

We then compare the performance of the simulation algorithm with a carefully-chosen BDT against a reference distribution of RDTs.

In Table 1 and Figure 6, we compare the computational performance of a BDT generated as described in Section 3 against the performance of RDTs. For each of the aforementioned choices of p, and for each of the five randomly-generated ER contact networks, we generate a BDT. We then generate 100 RDTs. The decision trees are then used to simulate sample paths; we determine the number of reactions simulated by each algorithm per CPU-second.

Configuration        p = 1/10   p = 3/10   p = 5/10   p = 7/10   p = 9/10
Bespoke tree (s⁻¹)       ·          ·          ·          ·          ·
Random tree (s⁻¹)        ·          ·          ·          ·          ·
Speed-up (%)            17         42        132        266        330

Table 1: The computational performance of a BDT is compared with the performance of RDTs. This table shows the average number of reactions fired per CPU-second when movement on a contact network based on an ER model is simulated. The averages are calculated for each choice of the parameter p of the ER model. The BDT is significantly more efficient, especially for densely connected patch networks.

The summary provided by Table 1 shows that a BDT typically outperforms an RDT, with the BDT being able to simulate between 17% and 330% more reactions per CPU-second (with the speed-up factor depending on the value of p). Figure 6 provides more detail: for each contact network, the CPU performance of each of the 100 RDTs is represented as a histogram, and the performance of the BDT is superimposed. In particular, Figure 6 demonstrates that the BDT consistently outperforms RDTs. The improvements introduced with a BDT are more apparent for larger values of p: we return to this point in the discussion (Section 5).

We now proceed to consider the BA model. As before, a range of contact networks is generated by sweeping through different choices of m: we take m ∈ {1, 2, 4, 8, 16}, and m₀ = m. For each choice of m, we generate five random contact networks as a basis for comparing the BDT with a range of RDTs. In Table 2, the computational performance of the different decision trees is compared. Table 2 shows that, based on the test data, a BDT is, on average, between 7% and 114% more efficient than an RDT. The efficiencies are more apparent when m is large: indeed, this corresponds with a greater number of patches being in contact with each other.

Further details are provided in Figure 7. We point out that Figure 7 illustrates that when very few edges are present in the contact network, the performance of the BDT is not dissimilar to the typical performance of an RDT.
Where there are few edges in the contact network, the interactivity graph is sparse and there are only limited opportunities to simultaneously update the weights of the nodes of the decision tree. As such, only limited computational savings over an RDT are possible.

Configuration        m = 1   m = 2   m = 4   m = 8   m = 16
Edges present (%)       ·       ·       ·       ·       ·
Bespoke tree (s⁻¹)      ·       ·       ·       ·       ·
Random tree (s⁻¹)       ·       ·       ·       ·       ·
Speed-up (%)            ·       ·       ·       ·       ·

Table 2: The computational performance of a BDT is compared with the performance of RDTs. The averages are calculated for each choice of the parameter m of the BA model, and the proportion of edges present in the contact network is also shown. The BDT is generally more efficient, especially for densely connected patch networks.

In this second case study, a spatially inhomogeneous S-I-S compartment model is investigated [5]. Individuals within this model are either susceptible (state 'S') or infected (state 'I') with a disease, and the patch that contains the individual is recorded. An infection of a susceptible individual in patch k can occur due to either a local or a long-range infection. A local infection is represented as the reaction

    S_k + I_k → 2 I_k,   with rate constant θ₁.

A contact network specifies how long-range infections take place: where patches k and ℓ are in contact, a long-range infection is represented as

    S_k + I_ℓ → I_k + I_ℓ   (for k ≠ ℓ),   with rate constant θ₂.

The recovery of an infected individual can be represented as

    I_k → S_k,   with rate constant θ₃.

We take (θ₁, θ₂, θ₃) as (1, ·, ·), and, inside each of the K = 50 patches, initially there are 10 infected and 90 susceptible individuals. Contact networks, which indicate which long-range infections are possible, are generated with the ER method with the parameter p set as p ∈ {1/10, 3/10, 5/10, 7/10, 9/10}. Then, for each value of p, five different stochastic models are generated, each having its own contact network.

As before, we compare the performance of a BDT generated as described in Section 3 against a reference distribution of RDTs. In Table 3, the computational performance of the different decision trees is compared. Table 3 shows that, based on the test data, a BDT is, on average, between 15% and 78% more efficient than an RDT. Further information is contained in Figure 8.
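The three reaction channels of the S-I-S model translate directly into propensity functions. The helper below is our own illustrative sketch (the rate values passed in are placeholders, since the model's θ₂ and θ₃ are fixed elsewhere): it enumerates one local-infection and one recovery channel per patch, plus two directed long-range channels per contact edge.

```python
def sis_propensities(S, I, contacts, theta1, theta2, theta3):
    """Propensities for the patch-based S-I-S model.

    S, I     -- susceptible/infected counts per patch
    contacts -- list of (k, l) pairs of patches in long-range contact
    Channels:
      local infection      S_k + I_k -> 2 I_k      (rate theta1)
      long-range infection S_k + I_l -> I_k + I_l  (rate theta2, k != l)
      recovery             I_k -> S_k              (rate theta3)
    """
    K = len(S)
    a = []
    for k in range(K):
        a.append(theta1 * S[k] * I[k])   # local infection in patch k
    for (k, l) in contacts:
        a.append(theta2 * S[k] * I[l])   # long-range infection of patch k
        a.append(theta2 * S[l] * I[k])   # and the symmetric direction
    for k in range(K):
        a.append(theta3 * I[k])          # recovery in patch k
    return a
```

For K patches and E contact edges this gives 2K + 2E channels, which is why densely connected networks reward a decision tree over a linear search.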
As with the case study presented in Section 4.1, when p is small, very few edges are present in the model, and in these cases, Figure 8 confirms that the performance of the BDT is not dissimilar to the typical performance of an RDT.

Configuration        p = 1/10   p = 3/10   p = 5/10   p = 7/10   p = 9/10
Bespoke tree (s⁻¹)       ·          ·          ·          ·          ·
Random tree (s⁻¹)        ·          ·          ·          ·          ·
Speed-up (%)            15         37         62         69         78

Table 3: The computational performance of a BDT is compared with the performance of RDTs. This table shows the average number of reactions fired per CPU-second where the long-range interactions of an S-I-S model are determined with an ER-modelled contact network. The averages are calculated for each choice of the parameter p of the ER model. The BDT is generally more efficient.

5 Discussion

Agent-based models of complicated real-world phenomena would serve no useful purpose if it were impossible to characterise the resultant models' dynamics. As such, the design and development of efficient stochastic simulation algorithms plays a pivotal role in underpinning the work of the modelling community. In this work, we have presented a technique for designing simulation algorithms that exploit the specific dynamics of the stochastic, discrete-state model of interest, and we have evaluated its computational performance with a number of case studies.

In this final section, we discuss our conclusions under three headings: firstly, we explain how a highly efficient simulation algorithm facilitates the use of more complicated and realistic discrete-state models as a modelling tool. Secondly, we explain how these conclusions will inevitably lead to the development of parallelised simulation algorithms for event-based models. Thirdly, we present our view of the direction future research is likely to take. We then conclude with an outlook.
Modelling implications.
By using a carefully-chosen binary decision tree to fire reactions in a simulation algorithm, we allow more sample paths to be generated for a given level of computational resources. An immediate benefit is the ability for a researcher to conduct parameter sweeps of a chosen model in more detail, thereby being in a stronger position to characterise the model dynamics. A more subtle benefit lies in the ability to describe more complicated models, since our BDTs make it feasible to include many long-range interactions as part of the model, even if these interactions only occur infrequently.

Section 4 demonstrates that, for a fixed number of species and patches, this algorithm is particularly suited for use with models with a relatively high number of reactions. In particular, in Section 4.1 we studied the movement of individuals on Erdős-Rényi contact graphs. Figure 9 shows that, as the number of patch-to-patch contacts increases, the number of reactions per CPU-second of both a BDT- and an RDT-based simulation algorithm decreases. However, the performance of the bespoke algorithm is significantly less affected.
Parallel simulation algorithms.
Whilst a parallelised implementation of a simulation algorithm will not reduce the CPU-seconds required for each sample path, by using more resources simultaneously, model statistics can be made available sooner to an end-user. For simple discrete-state models, sample paths can be generated simultaneously through a trivial parallelisation: for each CPU core, a new thread, endowed with a different random number seed, is spawned and the simulation completed. Where a model comprises a very large number of interacting species, it might be necessary for the system to be partitioned, with different components of the system assigned to different CPU cores for simulation. Where a cross-partition interaction is to take place, the different CPU threads need to be synchronised. In our view, the development of an interactivity graph represents an important advance in transforming stochastic simulation into a parallelised endeavour.
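The trivial parallelisation described above can be sketched as follows. This is an illustration only: the per-path worker is a toy random walk standing in for a full SSA run, and for CPU-bound simulation code a process pool (rather than the thread pool shown) would be needed to obtain genuine parallelism in CPython.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def one_sample_path(seed, n_events=1000):
    """Stand-in for one SSA run: each path owns an RNG with its own seed,
    so different paths are statistically independent and reproducible."""
    rng = random.Random(seed)
    x = 100
    for _ in range(n_events):
        x += 1 if rng.random() < 0.5 else -1
    return x

def ensemble(n_paths, base_seed=0):
    # One task per sample path; each worker receives a distinct seed.
    # Swap in ProcessPoolExecutor for CPU-bound simulation code.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(one_sample_path,
                             range(base_seed, base_seed + n_paths)))
```

Because each path is seeded explicitly, the ensemble is reproducible regardless of how the scheduler interleaves the workers.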
Future work.
Additional work that builds on our bespoke simulation strategy is likely to take one of three forms. Firstly, by relaxing the constraints we have imposed on the decision tree, namely that it is both balanced and binary, further computational improvements might be possible. Secondly, inline updating of the decision tree, that is, a continuous refinement of the decision-tree structure as sample paths are generated, might also be possible. Thirdly, alternative approaches for generating decision trees, such as the use of a genetic algorithm, might also yield promising results.

5.1 Outlook
In this work, we demonstrated that a binary decision tree provides an effective means of sampling reactions of a discrete-state stochastic model. Careful choice of the decision tree provides significant computational improvements over a random selection of a decision tree. Whilst the benefits of the new method are most noticeable in a stochastic model comprising a relatively high number of reactions, the method is suitable for use with a general, discrete-state model.
References

[1] Allen, L. J. An Introduction to Stochastic Processes with Applications to Biology. CRC Press, 2010.
[2] Wilkinson, D. J. Stochastic Modelling for Systems Biology. CRC Press, 2006.
[3] Van Kampen, N. G. Stochastic Processes in Physics and Chemistry, volume 1. Elsevier, 1992.
[4] Andersson, H. and Britton, T. Stochastic Epidemic Models and their Statistical Analysis. Springer, 2012.
[5] Diekmann, O. and Heesterbeek, J. A. P. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation, volume 5. John Wiley & Sons, 2000.
[6] Gillespie, D. T., Hellander, A., and Petzold, L. R. Perspective: Stochastic algorithms for chemical kinetics. Journal of Chemical Physics, 138(17), 2013.
[7] Gillespie, D. T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics, 22(4):403–434, 1976.
[8] Gillespie, D. T. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 81(25):2340–2361, 1977.
[9] Hattne, J., Fange, D., and Elf, J. Stochastic reaction-diffusion simulation with MesoRD. Bioinformatics, 21(12):2923–2924, 2005.
[10] Drawert, B., Engblom, S., and Hellander, A. URDME: a modular framework for stochastic simulation of reaction-transport processes in complex geometries. BMC Systems Biology, 6(1):76, 2012.
[11] Li, H. and Petzold, L. Logarithmic direct method for discrete stochastic simulation of chemically reacting systems. UCSB, Technical Report, 2006.
[12] Meinecke, L. Stochastic Simulation of Multiscale Reaction-Diffusion Models via First Exit Times. Ph.D. thesis, Uppsala University, 2016.
[13] Cao, Y., Li, H., and Petzold, L. Efficient formulation of the stochastic simulation algorithm for chemically reacting systems. Journal of Chemical Physics, 121(9):4059–4067, 2004.
[14] McCollum, J. M., Peterson, G. D., Cox, C. D., Simpson, M. L., and Samatova, N. F. The sorting direct method for stochastic simulation of biochemical systems with varying reaction execution behavior. Computational Biology and Chemistry, 30(1):39–49, 2006.
[15] Yates, C. A. and Klingbeil, G. Recycling random numbers in the stochastic simulation algorithm. Journal of Chemical Physics, 138(9):094103, 2013.
[16] Welsh, D. Codes and Cryptography. Oxford University Press, 1988.
[17] Kernighan, B. W. and Lin, S. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(2):291–307, 1970.
[18] Erdős, P. and Rényi, A. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5(1):17–60, 1960.
[19] Barabási, A.-L. and Albert, R. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
Figure 6: Twenty-five contact networks are generated, and the movement of particles on these networks, as described in Section 4.1, is simulated. The computational performance of a BDT is compared with the performance of RDTs. The histograms represent the average number of reactions fired per CPU-second for each of the 100 different RDTs, whilst the red lines indicate the reaction rate of the BDT. The contact networks are ER graphs generated with the indicated parameter value of p.
Figure 7: Twenty-five contact networks are generated, and the movement of particles on these networks, as described in Section 4.1, is simulated. The computational performance of a BDT is compared with the performance of RDTs. The histograms represent the average number of reactions fired per CPU-second for each of the 100 different RDTs, whilst the red lines indicate the reaction rate of the BDT. The contact networks are BA graphs generated with the indicated parameter value of m.
Figure 8: Twenty-five contact networks are generated, and the spread of a disease on these networks, as described in Section 4.2, is simulated. The computational performance of a BDT is compared with the performance of RDTs. The histograms represent the average number of reactions fired per CPU-second for each of the 100 different RDTs, whilst the red lines indicate the reaction rate of the BDT. The contact networks are ER graphs generated with the indicated parameter value of p.
Figure 9: The average number of reactions per CPU-second is shown for BDT- and RDT-based simulations of movement on a contact network. As per Section 4.1, 25 different contact networks are generated (five contact networks for each choice of p). On the x-axis we show the proportion of edges present in each contact network, and the y-axis shows the number of reactions fired per CPU-second.