Exact Parallelization of the Stochastic Simulation Algorithm for Scalable Simulation of Large Biochemical Networks
Arthur P. Goldberg, David R. Jefferson, John A. P. Sekar, Jonathan R. Karr
Icahn Institute for Data Science and Genomic Technology, and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai
Lawrence Livermore National Laboratory

(Dated: 22 May 2020)
Comprehensive simulations of the entire biochemistry of cells have great potential to help physicians treat disease and help engineers design biological machines. But such simulations must model networks of millions of molecular species and reactions. The Stochastic Simulation Algorithm (SSA) is widely used for simulating biochemistry, especially systems with species populations small enough that discreteness and stochasticity play important roles. However, existing serial SSA methods are prohibitively slow for comprehensive networks, and existing parallel SSA methods, which use periodic synchronization, sacrifice accuracy.

To enable fast, accurate, and scalable simulations of biochemistry, we present an exact parallel algorithm for SSA that partitions a biochemical network into many SSA processes that simulate in parallel. Our parallel SSA algorithm exactly coordinates the interactions among these SSA processes and the species state they share by structuring the algorithm as a parallel discrete event simulation (DES) application and using an optimistic parallel DES simulator to synchronize the interactions. We anticipate that our method will enable unprecedented biochemical simulations.

a) Author to whom correspondence should be addressed: [email protected]
b) Electronic mail: drjeff[email protected]
c) Electronic mail: [email protected]
d) Electronic mail: [email protected]

I. INTRODUCTION

Models of biochemical systems play a critical role in advancing medicine and bioengineering. Technological advances in single-cell and genomic measurement are rapidly generating data that enable larger and more complex models of biochemical pathways and whole cells. But advances in the performance and accuracy of simulation techniques are needed to integrate these models.

The Stochastic Simulation Algorithm (SSA) is a widely used method for predicting the time evolution of chemical systems transformed by chemical reactions.
In particular, SSA can model the variability exhibited by chemical systems with small species populations, conditions which often arise in small biological systems such as populations of individual cells.
Our interest in creating tools to enable dynamical models of large biochemical systems has motivated us to parallelize SSA. We are especially interested in whole-cell modeling, which creates genome-scale models of the known biochemical pathways in individual cells.
A whole-cell model of a cell with a large genome, like a human cell, contains tens of thousands of species and tens of thousands of reactions. In addition, it contains numerous compartments that represent intracellular organelles such as the nucleus, mitochondria, and lysosomes. Whole-cell models are typically integrated for an entire cell cycle. The complexity of whole-cell models and the duration of their integrations cause their simulations to run slowly. A highly parallel SSA algorithm could speed up simulations of models of large biochemical systems like whole-cell models.

There are multiple opportunities for parallelism in the simulation of biochemical networks due to the physical structure of cells, as well as due to artifacts of the limited ability of researchers to estimate their structure. First, the compartmentalization of cells into organelles means that the biochemical network of a cell decomposes into a set of biochemical sub-networks of the organelles, weakly connected via edges that represent a comparatively small number of exchange reactions. Second, the subcompartmentalization of organelles into spatial domains such as chromosomal regions and individual mRNAs means that the sub-networks of the organelles decompose into weakly connected cliques. Third, the chemical specificity of most enzymes means that the sub-networks of the organelles further decompose into cliques. Fourth, our limited knowledge of the interactions among cellular processes means that our in silico biochemical networks have higher clustering coefficients than real biochemical networks.

Exact implementations of SSA are computationally expensive when modeling systems that involve many reaction executions because they execute only one reaction per iteration.
In addition, since SSA models stochastic behavior, multiple runs are needed to obtain distributions and moments of predicted properties. While extensive effort has been devoted to improving the performance of SSA, an exact, parallel SSA algorithm that can concurrently leverage numerous processors to speed up an SSA simulation is not available.

We introduce a novel parallel algorithm for speeding up SSA. Our approach conceives of SSA as a discrete event simulation (DES) and leverages optimistic approaches for parallel DES (PDES).
In contrast to prior work, our parallel SSA algorithm takes advantage of more of the parallelism inherent in large reaction networks, exactly synchronizes changes to the species population state that is shared among components of the algorithm, and, assuming that random number generators are properly managed, can exactly reproduce the predictions of a sequential SSA algorithm.
II. THE STOCHASTIC SIMULATION ALGORITHM

A. A summary of SSA
SSA solves the following problem. Consider a well-mixed container of chemical species, the chemical reactions that transform them, and a rate law for each reaction that provides its execution rate as a function of the species populations. Given an initial state of this system, one can model the dynamic probability that it occupies any population state by representing state transition probabilities as a system of coupled differential equations. Although this formulation, known as the Chemical Master Equation, is exact, it is not tractable for biologically interesting models because the state space is prohibitively large.

The Stochastic Simulation Algorithm models the dynamical behavior of this system by computing a sequence of reaction executions, choosing reactions and execution times so that the probability of generating a given trajectory equals the probability that would be provided by a solution of the Chemical Master Equation.

An algorithm for SSA, known as the Direct Method, was provided by Gillespie.
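To make the Direct Method concrete before the pseudocode, here is a minimal sequential sketch in Python. The representation of reactions (a propensity function plus a stoichiometry map) and all names are illustrative assumptions, not the paper's:

```python
import math
import random

def ssa_direct_method(populations, reactions, t_max, seed=0):
    """Sketch of Gillespie's Direct Method.

    `reactions` maps a reaction name to a (propensity_fn, stoichiometry)
    pair, where stoichiometry maps species -> net population change.
    Returns the trajectory as a list of (time, populations) snapshots.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, dict(populations))]
    while True:
        # Compute each reaction's propensity a_r from current populations
        props = {r: fn(populations) for r, (fn, _) in reactions.items()}
        a_total = sum(props.values())
        if a_total == 0:
            break  # no reaction can fire
        # Time to the next reaction: exponential with rate a_total;
        # 1 - random() lies in (0, 1], so the log is always defined
        tau = (1.0 / a_total) * math.log(1.0 / (1.0 - rng.random()))
        if t + tau > t_max:
            break
        t += tau
        # Choose the next reaction mu with probability a_mu / a_total
        x = rng.random() * a_total
        mu = None
        for r, a_r in props.items():
            x -= a_r
            if x <= 0.0:
                mu = r
                break
        if mu is None:  # numerical safety net against rounding
            mu = r
        # Apply the chosen reaction's stoichiometry
        for species, delta in reactions[mu][1].items():
            populations[species] += delta
        trajectory.append((t, dict(populations)))
    return trajectory
```

For example, a dimerization reaction 2A → B with a mass-action propensity proportional to A(A−1) conserves the invariant A + 2B throughout the trajectory.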
procedure SSA Direct Method
    Set initial species populations
    t ← 0                                  ▷ simulation time
    τ, µ ← Plan next reaction execution
    while t + τ ≤ max simulation time do
        t ← t + τ
        Update species populations according to reaction µ
        τ, µ ← Plan next reaction execution
    end while
end procedure

function Plan next reaction execution
    for r in reactions do
        Compute the rate (propensity) for r, a_r
    end for
    a ← Σ_r a_r                            ▷ the total propensity
    ▷ let U() randomly sample the standard uniform distribution
    τ ← (1/a) ln(1/U())                    ▷ τ is the time the next reaction executes
    Choose µ by sampling P[µ] = a_µ/a      ▷ µ is the next reaction
    return τ, µ
end function

Detailed derivations of the Direct Method rely on a key assumption: that propensities a_r remain constant between reaction executions (equation (1) in 21).

We use the Direct Method for pedagogical purposes in this paper, but base our parallel SSA algorithm on an optimized SSA algorithm, the Next Reaction Method.

III. INTRODUCTION TO PARALLEL SSA
This section provides an overview of the parallel SSA algorithm.
A. Objectives
The parallel SSA algorithm we present has two primary objectives.

1. Speed up the performance of SSA on a large reaction network by partitioning the network into multiple sub-networks and integrating the sub-networks in parallel.

2. Exactly simulate reaction networks, so that the results of a parallel simulation are the same as the results of a sequential simulation. This objective is satisfied by the parallel SSA algorithm presented below.
B. Opportunities for parallelism
A reaction network in a large biochemical model like a whole-cell model provides two types of opportunities for parallelism. First, the sub-networks of reactions in separate compartments may be sufficiently decoupled from each other that they can be simulated in parallel. The amount of a compartment's coupling is measured by the mean sum of the rates of its reactions that exchange species with other compartments, relative to the mean sum of the rates of all reactions in the compartment. Multiple prior studies have taken this approach to parallelize SSA.
Second, reactions within a compartment may also be partitioned into sub-networks that are sufficiently decoupled that they can be simulated in parallel.

These two types of potential parallelism are complementary and can be combined in a single parallelization. Our parallel SSA approach below abstracts away the distinctions between them and simply partitions a reaction network into sub-networks. Nevertheless, we note these opportunities for parallelism because prior work has leveraged the first one, and this discussion may help readers develop intuition for parallel SSA.
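The coupling measure described above can be sketched directly. The data layout here (a list of reactions, each annotated with the set of compartments it touches and its mean rate) is an assumption made for illustration, not the paper's representation:

```python
def compartment_coupling(reactions, compartment):
    """Coupling of `compartment`: the summed mean rate of its exchange
    reactions (those that cross the compartment boundary) relative to
    the summed mean rate of all reactions in the compartment.

    `reactions` is a list of dicts with illustrative keys
    'compartments' (set of compartments the reaction touches) and
    'mean_rate' (its mean execution rate).
    """
    total = 0.0
    exchange = 0.0
    for rxn in reactions:
        if compartment in rxn["compartments"]:
            total += rxn["mean_rate"]
            if len(rxn["compartments"]) > 1:  # crosses a boundary
                exchange += rxn["mean_rate"]
    return exchange / total if total else 0.0
```

A compartment with low coupling (most of its rate mass internal) is a good candidate for simulation on its own processor.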
C. Overview of the parallel SSA approach
To provide an overview of the parallel SSA algorithm, we briefly summarize its high-level steps, broken down into its parallelization and simulation phases.
Parallelize
1. Read a model to simulate, including its reactions and rate laws.

2. Generate a graph G that characterizes the dependencies among reactions and species in the model.

3. Partition G into sub-networks of reactions that will be simulated in parallel.

4. Identify the species that are shared between multiple sub-networks of reactions, and partition them.

Simulate
1. Read a simulation configuration for the model, including the initial populations of species and other initial conditions, and a maximum simulation time.

2. Instantiate an optimistic object-oriented (OO) parallel discrete event simulation (PDES) environment on a parallel computer.
3. Map each reaction sub-network of G onto an SSA simulation object in the optimistic PDES environment, instantiate the object, and send it an initialization event message.

4. Map each set of shared species onto a Shared Species Population (SSP) simulation object in the PDES environment, instantiate the object, and send it an initialization event message.

5. Initialize all other objects in the simulation.

6. Run the parallel SSA simulation.

These steps and components are thoroughly described below.

IV. THE PARALLEL SSA ALGORITHM
This section presents the concepts underlying the parallel SSA algorithm.
A. Transforming a reaction network into a parallel simulation
1. Encode the reaction network’s dependencies into a directed graph
Let R and S represent all reactions and species, and let r_i, s_j, and l_i represent individual reactions, species, and rate laws, respectively. We map the dependencies between these components onto a bipartite directed graph G(v, e), where v = R ∪ S. G is constructed as follows. A directed edge (s_j, r_i) is added to G if species s_j participates in rate law l_i, thereby encoding the dependency of reaction r_i on species s_j. Similarly, a directed edge (r_i, s_j) is added to G if s_j has a non-zero stoichiometry in reaction r_i, thereby encoding the dependency of s_j on reaction r_i that arises because executing reaction r_i changes the population of species s_j. Figure 1 illustrates G for a small example network.
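The construction of G can be sketched as follows, under an assumed representation in which each reaction records the species read by its rate law and its net stoichiometry (the keys and names are illustrative):

```python
def build_dependency_graph(reactions):
    """Build the bipartite directed dependency graph G described above.

    `reactions` maps a reaction id to a dict with keys
    'rate_law_species' (species whose populations the rate law reads) and
    'stoichiometry' (species -> net population change).
    Returns the edges of G as a set of (source, target) pairs over
    reaction and species ids.
    """
    edges = set()
    for r, info in reactions.items():
        for s in info["rate_law_species"]:
            edges.add((s, r))        # reaction r depends on species s
        for s, delta in info["stoichiometry"].items():
            if delta != 0:
                edges.add((r, s))    # executing r changes species s
    return edges
```

Note that a pure catalyst (net stoichiometry zero, as enzyme D in reaction r2 of Figure 1) contributes only a species-to-reaction edge, while a species both read and changed yields the bi-directional pair of edges described above.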
2. Partition the dependency graph
To identify reaction network components that can be simulated in parallel, we partition G into two types of sub-networks. The first type of sub-network contains reactions, and all of the species that participate only in reactions in the sub-network (e.g., see partitions α and β in Figure 2). The species contained in a reaction sub-network are called local species (more precisely, these are the species connected in G to reactions in the sub-network and not connected to any other reactions), because they are used only by the reactions in their local sub-network. For example, in Figure 2 species A and B are local to sub-network α.

The second type of sub-network created by a partition contains only shared species, which are used by reactions in two or more reaction sub-networks.

[Figure 1 shows a bipartite graph over species A–G and the reactions r1: A + B → C @ k1 A B; r2: C + D → E + F + D @ k2 C D; r3: F + F → G @ k3 F(F−1), with directed edges labeled "reaction depends on species" and "species changed by reaction".]

FIG. 1. Relationships between reactions and species encoded in dependency graph G. The relationships between reactions and species are encoded in a directed graph whose nodes are reactions and species. Directed edges indicate data dependencies. An edge from a reaction and its rate law to a species indicates that executing the reaction changes the species' population. An edge from a species to a reaction indicates that the reaction's rate law uses the species' population. For example, G contains an edge from enzyme D to reaction r2 because the rate law for r2 is a function of the population of enzyme D. Bi-directional edges indicate both of these dependencies. The @ signs in the list of reactions separate each reaction from its rate law, a mathematical expression that computes the rate at which the reaction executes. In a rate law a species symbol represents the number of molecules of the species in the container being simulated.
3. Designing the partitioning algorithm
The goal of the partitioning algorithm is to identify partitions into sub-networks that can be rapidly simulated in parallel. In the near term, we plan to use linkage clustering algorithms to partition the dependency graph G. Intuitively, this will minimize the number of state changes that would need to be synchronized between the DES objects which are encoded into the sub-networks.

Long-term, we aim to develop an algorithm for identifying the optimal partitioning that minimizes the expected run-time on a parallel computer. Due to the challenges of analyzing the expected run-time of a parallel SSA simulation, the optimal partitioning will likely need to be identified empirically by testing the performance of putative partitionings.
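However the reaction partition is chosen, the classification of species as local or shared follows mechanically from it. A sketch, assuming each reaction is simply annotated with the set of species it touches (in its rate law or stoichiometry):

```python
def split_species(reaction_species, partition):
    """Classify species as local or shared, given a reaction partition.

    `reaction_species` maps reaction id -> set of species it touches;
    `partition` maps reaction id -> sub-network id.
    Returns (local, shared): species used by reactions in exactly one
    sub-network, and species used by reactions in two or more.
    """
    users = {}  # species -> set of sub-networks whose reactions use it
    for r, species in reaction_species.items():
        for s in species:
            users.setdefault(s, set()).add(partition[r])
    local = {s for s, nets in users.items() if len(nets) == 1}
    shared = {s for s, nets in users.items() if len(nets) > 1}
    return local, shared
```

On the running example of Figure 2, with r1 assigned to α and r2, r3 to β, species C comes out shared and all other species local, matching the paper's partition.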
4. Encoding the reaction network into simulation objects
The two types of sub-networks produced by a partition are mapped into two types of DES objects.

Each sub-network of reactions defines the reactions that are integrated by one OO DES object that simulates SSA, called an SSA object. For example, in Figure 2 SSA object α integrates reaction r1 and stores local species A and B, while SSA object β integrates reactions r2 and r3 and stores local species D, E, F, and G.

Each sub-network of shared species is mapped into one OO DES Shared Species Population (SSP) object, which will coordinate the populations of the shared species during a simulation. For example, in Figure 2 shared species C is stored in an SSP object and is used by reactions r1 and r2, which are executed by two different SSA objects, α and β, respectively.

[Figure 2 shows the dependency graph of Figure 1 partitioned into SSA object α (reaction r1; local species A, B), SSA object β (reactions r2, r3; local species D, E, F, G), and an SSP object storing shared species C, with arrows indicating interactions between the simulation objects.]

FIG. 2. A partition of the dependencies illustrated in Figure 1. The dependency graph G in Figure 1 is partitioned into two reaction sub-networks, α and β, and one shared species sub-network. These are then mapped onto the SSA objects α and β, and an SSP object. In this small network each SSA object interacts with the SSP object, but in reaction networks suitable for parallel execution by parallel SSA, each SSA object would interact with a small fraction of the SSP objects, and vice versa.

B. Architecture of parallel SSA
This parallel SSA algorithm has been designed to run as a parallel OO DES application on top of an optimistic PDES simulator.
The major advantage of this approach is that the optimistic PDES simulator will be responsible for running the objects in the parallel SSA algorithm in simulation time order. The major types of simulation objects in parallel SSA will be the SSA and SSP classes defined above. In addition, other utility classes will be added, such as classes for initialization and for checkpointing model predictions.

V. THE PARALLEL SSA ALGORITHM
This section presents our approach for an exact parallel SSA algorithm. We begin by restructuring SSA as an OO DES application and defining the structure of DES application classes. In sub-section V B we illustrate the features and components of the algorithm, which we follow with a sub-section containing pseudocode for its classes. The final sub-section describes how the algorithm handles compartments.
A. Structure SSA as an object-oriented discrete event simulation application
Many variations of the SSA algorithm have been published, beginning with two in Gillespie's original paper. SSA variations have the characteristics of a DES application. First, their events, which simulate reaction executions, occur at discrete time instants. And, second, these events are dynamically scheduled by computations at earlier simulation times. But published SSA variations do not use DES because they are written as free-standing algorithms.

Since an optimistic parallel OO DES simulator will be used to synchronize the parallel SSA algorithm, we begin the discussion of synchronization by recasting the Direct Method (presented in section II A) as an OO DES application. This implementation of the Direct Method OO DES class executes two event methods, Initialize Direct Method and Execute and schedule reaction:

Event method Initialize Direct Method(0, self, self, initial populations, max simulation time)
    Set initial species populations
    t_m ← max simulation time
    ▷ Call Direct Method function defined in section II A
    τ, µ ← Plan next reaction execution()
    send Execute and schedule reaction(τ, self, self, µ)
end Event method

Event method Execute and schedule reaction(t, self, self, µ)
    if t ≤ t_m then
        Update species populations according to reaction µ
        τ, µ ← Plan next reaction execution()
        send Execute and schedule reaction(t + τ, self, self, µ)
    end if
end Event method

All OO DES application objects interact exclusively through event messages, a strict interface that makes it possible for parallel OO DES simulators to execute them.

As illustrated by this pseudo-code for the Direct Method class, all DES event messages in this paper have the form
Message type(time, sender, receiver, [arguments]) ,where
where Message type is the type of the message, time is the simulation time at which the message will be received and executed, sender identifies the simulation object that sends the message, receiver identifies the simulation object that will receive the message, and the optional arguments are data carried by the message from its sender to its receiver.

The algorithm for a DES application class is defined by its message handler event methods, whose names and signatures must exactly match the types and fields of messages received by the class, although the handler method names are written in small caps to distinguish them from the messages. Throughout the execution of a handler, the simulation time of the receiver object handling an event message is automatically time.

B. An exact parallel SSA algorithm
This section presents an algorithm that focuses on the key logic needed to exactly parallelize SSA. Performance optimizations for the algorithm follow in section VI A.

The exact parallel SSA algorithm is a parallel OO DES application that executes two class types, SSA objects and Shared Species Population (SSP) objects. Updates and reads of local species stored in SSA objects always occur at the correct simulation times, so we focus on shared species. To exactly synchronize shared species, updates and reads of their populations must also occur at the correct times. More specifically, all updates to shared species populations by a reaction must change the populations in SSP objects at the time the reaction executes. And when an SSA object uses a shared species at time t, it must obtain the SSP's value for the species' population at t.

Correct timing of updates and reads of shared species populations is achieved by explicitly accessing shared species at SSP objects. When a reaction executed by an SSA object updates the population of a shared species, the object updates the population at the SSP by sending an Adjust populations message to the SSP. And the SSP handles an Adjust populations message by sending a zero-delay Populations message to each SSA object that uses the shared species which were updated in the Adjust populations message.

Table I lists these messages and Figure 3 illustrates their dynamics. The interactions between SSA object α and the SSP over the time interval t1 to t2 (see section A of Figure 3) execute one reaction, r1. At time t1, three event messages are sent to schedule the reaction and the parallel coordination it requires:

TABLE I. Event message types used by Parallel SSA
Message type         Sender   Receiver  Arguments (not including    Parallel SSA semantics and
                     object   object    the required arguments      receiver object action
                     type     type      time, sender, and receiver)

Execute reaction     SSA      SSA       µ                           Execute reaction µ

Schedule reaction    SSA      SSA       (none)                      Schedule the next reaction

Adjust populations   SSA      SSP       pop changes                 Convey the stoichiometry of a
                                                                    reaction's shared species to the
                                                                    SSP, which updates their
                                                                    populations

Populations          SSP      SSA       shared species pops         Provide populations of the
                                                                    requested species to the SSA

[Figure 3 shows timelines of SSA object α, the SSP object storing C, and SSA object β over times t1–t4, with the Execute reaction, Schedule reaction, Adjust populations, and Populations messages, and, in panel B, the corresponding retraction messages. A: Execution of one reaction with SSP coordination. B: Update to shared species leads to retraction of planned reaction.]

FIG. 3. Example event message interactions between SSA and SSP objects. To conserve space, the event message fields time, sender, receiver are abbreviated t_i, s, r. A: The interactions between SSA object α and the SSP show the messages involved in a single reaction execution coordinated with the SSP, as described in section V B. B: The Populations message that updates the population of shared species C at SSA object β at time t2 causes SSA β to cancel its planned execution of reaction µ at time t3, as discussed in section V B 1. Each retraction message is indicated by a dashed arrow that has the same color as the message being retracted, and points to the same object at the same time as it does.

• Execute reaction schedules the execution of r1 at SSA α at t2. Every reaction execution requires this message.

• Schedule reaction causes SSA α to schedule the next reaction. Every reaction execution requires this message.

• Adjust populations schedules the SSP to increment the population of C at t2, as per the stoichiometry of reaction r1. Every reaction execution that changes the population of shared species requires this message.

At time t2, in response to the Adjust populations message, the SSP sends a
Populations message that contains the shared populations to SSA α.

The algorithms that send and handle these messages are detailed in sections V C 1 and V C 2.
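The event messages in Table I all share the form Message type(time, sender, receiver, [arguments]). One illustrative way to represent them is as immutable Python dataclasses; the class and field names below are our own, not the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventMessage:
    """Fields required by every event message (section V A)."""
    time: float        # simulation time at which the message is handled
    sender: str        # id of the sending simulation object
    receiver: str      # id of the receiving simulation object

@dataclass(frozen=True)
class ExecuteReaction(EventMessage):
    mu: str            # the reaction to execute

@dataclass(frozen=True)
class ScheduleReaction(EventMessage):
    pass               # no extra arguments

@dataclass(frozen=True)
class AdjustPopulations(EventMessage):
    pop_changes: tuple         # ((species, delta), ...) pairs

@dataclass(frozen=True)
class Populations(EventMessage):
    shared_species_pops: tuple # ((species, population), ...) pairs
```

A simulator would deliver such messages to handler methods in increasing `time` order; freezing the dataclasses reflects the rule that messages carry data but are never mutated in flight.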
1. Handling updates to shared species used by rate laws
A fundamental assumption of SSA is that the propensities a_k(X(t)) are constant except when reaction executions change species counts (equation (1) in 21). Parallel SSA revises this assumption by adding that propensities can also change when the populations of shared species used by propensities are updated. Changes external to an SSA object can thus invalidate the computation that scheduled a reaction execution.

(The standard notation a_k(X(t)) indicates the propensity of reaction k, computed using k's rate law, as a function of the species populations at time t, X(t). Some SSA algorithms model time-varying propensities, for example to simulate containers with varying volume, as in section V of .)

This is illustrated by the interactions between the SSP and SSA β over the time interval t2 to t4 (section B of Figure 3):

1. At time t2, in response to the Adjust populations it received from SSA α, the SSP sends a Populations message that updates the population of shared species C at SSA β.

2. At t2 SSA β receives the message, updates the population of species C, determines that the population of C was used by the propensity calculations made when scheduling its pending reaction µ, and concludes that the assumption made when scheduling µ (that the propensities calculated at time t1 would remain constant until time t3) no longer holds. Thus, SSA β must now cancel the scheduled execution of reaction µ, and reschedule its next reaction.

3. SSA β cancels each of the messages it sent at t1 to schedule the execution of reaction µ by sending a corresponding retraction message at t2 (see the three messages labeled retraction and indicated by dashed arrows in Figure 3). Retraction messages are application messages that completely cancel the effects of a corresponding retracted message, including the possible execution of the retracted message and that execution's side effects. They are a fully developed feature of the Virtual Time optimistic parallel synchronization theory, which is implemented by Time Warp optimistic PDES simulators.
4. SSA β now reschedules its next reaction, which Figure 3 illustrates as executing at time t4.

The algorithms for the parallel SSA classes that implement these actions are detailed in the next section.

C. Class algorithms for parallel SSA
This section presents event methods for the SSA and SSP classes that implement the parallel SSA algorithm. To simplify this section's presentation, all shared species are stored in a single SSP object, and we consider a system with only one biological compartment, corresponding to a well-mixed container in the stochastic simulation's analytic framework. These limitations are relaxed in section VI A below.
1. The SSP class
An SSP object executes two types of events:
Initialize initializes the SSP, and Adjust populations updates the populations of specified shared species.

Event method Initialize(0, self, self, initial populations, species locs)
    ▷ initialize populations of all shared species
    self.populations = initial populations
    self.SSA map = species locs        ▷ map shared species to their SSAs
end Event method

Event method Adjust populations(time, SSA obj, self, pop changes)
    add the changes in pop changes to self.populations
    ▷ send updated populations to SSAs that depend on them
    for SSA in self.SSA map do
        shared species pops = empty dictionary
        for species in pop changes do
            if species in self.SSA map[SSA] then
                shared species pops[species] = self.populations[species]
            end if
        end for
        if shared species pops then
            send Populations(time, self, SSA, shared species pops)
        end if
    end for
end Event method
Adjust populations adjusts the populations of shared species in self.populations by the changes in pop changes. In its loop over SSA objects, the SSP acts like a write-through cache, forwarding updated shared species population values to the SSAs executing reactions that use the species in their rate laws.
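A sequential Python sketch of this write-through behavior, with the simulator's event delivery replaced by an injected `send` callable; all names are illustrative and none of the optimistic (Time Warp) machinery is modeled:

```python
class SharedSpeciesPopulation:
    """Sketch of the SSP class: stores shared species counts and
    forwards updates to the SSA objects that read them."""

    def __init__(self, initial_populations, species_locs, send):
        self.populations = dict(initial_populations)
        # species_locs: SSA id -> set of shared species that SSA reads
        self.ssa_map = species_locs
        # send(time, sender, receiver_id, payload): stand-in for the
        # simulator's event-message delivery
        self.send = send

    def adjust_populations(self, time, ssa_obj, pop_changes):
        # Apply the reaction's stoichiometric changes to shared species
        for species, delta in pop_changes.items():
            self.populations[species] += delta
        # Write-through: push fresh values to every SSA that uses them
        for ssa, used in self.ssa_map.items():
            update = {s: self.populations[s]
                      for s in pop_changes if s in used}
            if update:
                self.send(time, self, ssa, update)
```

In the running example, an update to shared species C triggers zero-delay Populations-style notifications to both SSA α and SSA β.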
2. The SSA class
This section defines the algorithms used by the SSA class. We adapt Gibson and Bruck's Next Reaction Method (NRM) to determine the reactions that execute and their execution times. However, to simplify the presentation, we do not incorporate NRM's optimization that avoids recomputing propensities which do not depend on changed species populations. Instead, in section VI A we identify it as an optimization that should be added to the algorithm.

An SSA object executes four types of event methods. They correspond 1-to-1 to the messages illustrated in Figure 3 and described in section V B:
Initialize initializes the object, Schedule reaction schedules the next reaction,
Execute reaction executes a reaction, and
Populations receives updated values for the populations of the shared species used by the SSA object.

Event method Initialize(0, self, self, reactions, initial populations)
    self.reactions = reactions                 ▷ reactions and their rate laws
    self.local species = initial populations['local species']
    self.cached shared species = initial populations['shared species']
    self.SSP = the single SSP
    self.next rxn.time = 0                     ▷ self.next rxn tracks a pending reaction
    self.next rxn.reaction = ∅
    µ, τ_µ = Select initial reaction(self)
    Send reaction events(µ, τ_µ, self)
end Event method

(Note that the Execute and schedule reaction event method used by the SSA class presented in section V A has been decomposed into two event methods, which is required by the reasoning at the end of section V D 1. Initialize only runs once, at time 0.)

function Select initial reaction(self)
    ▷ Select the first reaction, saving propensities and times
    for k in self.reactions do
        self.a_k = a_k = propensity for k
        r_k = uniform(0, 1) random number
        self.τ_k = τ_k = (1/a_k) ln(1/r_k)     ▷ execution time for k
    end for
    µ = k that satisfies min_k {τ_k}           ▷ Select next reaction
    return µ, τ_µ
end function

To support the event-driven model of OO DES, the reaction scheduling code must be separated into a different function for each event. Each function determines and returns the next reaction to execute, µ, and the time when it will execute, τ_µ. The first such function, Select initial reaction, which is called by Initialize, is mathematically equivalent to the initialization of NRM (lines 2–5 of Alg. 2 in 21).

function Send reaction events(µ, τ_µ, self)
    send Execute reaction(τ_µ, self, self, µ)
    send Schedule reaction(τ_µ, self, self)
    species changes = shared species with stoichiometry ≠ 0 in µ
    if species changes then
        send Adjust populations(τ_µ, self, self.SSP, species changes)
    end if
    self.next rxn.reaction = µ
    self.next rxn.time = τ_µ
end function

The Send reaction events function sends all of the event messages transmitted by an SSA object. It is called by each of the three event methods that send messages. Each call sends all the messages associated with the execution of one reaction. Its last two lines record the reaction and its execution time for later use.
Event method Schedule reaction(t, SSA obj, self)
    ▷ Schedule a reaction after executing one
    µ = self.next rxn.reaction
    µ, τ_µ = Select after reaction execution(self, t, µ)
    Send reaction events(µ, τ_µ, self)
end Event method

function Select after reaction execution(self, t, µ)
    ▷ Select next reaction after executing µ; update self.a_k and self.τ_k
    s = self
    for k in self.reactions do
        a_k = new propensity for k
        if k ≠ µ then
            self.τ_k = τ_k = (s.a_k / a_k)(s.τ_k − t) + t
        end if
    end for
    r_µ = uniform(0, 1) random number
    self.τ_µ = τ_µ = (1/a_µ) ln(1/r_µ)
    µ = k that satisfies min_k {τ_k}           ▷ Select next reaction
    for k in self.reactions do
        self.a_k = a_k
    end for
    return µ, τ_µ
end function

Schedule reaction implements a DES version of the body of the NRM's iterative loop. Select after reaction execution, which selects the next reaction after a reaction executes, is mathematically equivalent to the code in NRM's iterative loop (lines 5 and 7–10 of Alg. 2 in 21).
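The update self.τ_k = (s.a_k / a_k)(s.τ_k − t) + t is the NRM's time-rescaling step: when reaction k's propensity changes, the remaining waiting time is scaled by the ratio of the old propensity to the new one, so the earlier random draw is reused instead of sampling again. A one-function sketch (names are ours):

```python
def rescale_firing_time(tau_old, a_old, a_new, t_now):
    """Next Reaction Method rescaling of a pending firing time.

    When reaction k's propensity changes from a_old to a_new at time
    t_now, its previously sampled firing time tau_old becomes
        tau_new = (a_old / a_new) * (tau_old - t_now) + t_now,
    which preserves the exponentially distributed remaining wait
    without drawing a new random number.
    """
    return (a_old / a_new) * (tau_old - t_now) + t_now
```

For instance, doubling a propensity halves the remaining wait, and an unchanged propensity leaves the firing time untouched.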
Populations (t, SSP obj, self, shared species pops) self.cached shared species = shared species pops if time ¡ self.next rxn.time then (cid:46) Cancel next reaction and schedule again Cancel scheduled reaction (self) µ, τ µ = Select after reaction cancellation (self, t) Send reaction events ( µ , τ µ , self) end if end Event method function Cancel scheduled reaction events (self) (cid:46)
Cancel events related to the execution of reaction self.next rxn µ = self.next rxn.reaction τ µ = self.next rxn.time retract Execute reaction ( τ µ , self, self, µ ) retract Schedule reaction ( τ µ , self, self) species changes = shared species with stoichiometry (cid:54) = 0 in µ if species changes then retract Adjust populations ( µ , self, self.SSP, species changes) end if end function function Select after reaction cancellation (self, t ) (cid:46) Select next reaction after cancellation, updating self’s a k and τ k s = self for k in self.reactions do a k = new propensity for k self. τ k = τ k = s.a k a k ( s.τ k − t ) + t end for µ = k that satisfies min k { τ k } (cid:46) Select next reaction for k in self.reactions do self. a k = a k end for return µ, τ µ end function opulations records updated populations for the shared species used by an SSA object’srate laws. If a future reaction is pending then it must be cancelled and reaction schedulingmust be redone, as previewed in section V B 1 above. This novel aspect of the parallel SSAalgorithm is handled by a pair of functions: Cancel scheduled reaction events cancelsall of the events related to the previously scheduled reaction, and
Select after reaction cancellation then selects the next reaction to execute. In the former function, each "retract Event message" operation sends a retraction message that cancels the previously sent Event message.

Select after reaction cancellation schedules the next reaction. We adapt the NRM to determine the reaction times. Because no reactions have executed since propensities were last calculated, all reactions, including the one that was cancelled, can be treated like the reactions that did not execute in the iterative NRM loop (see lines 4–9 of Alg. 2 in 21). Thus, the loop in Select after reaction cancellation calculates a new propensity and execution time for each reaction, and selects the reaction with the minimum time. No random numbers are needed.
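The deterministic reselection after a cancellation can be sketched in Python under the same assumptions as the pseudocode: dictionaries of propensities and absolute tentative firing times. Every reaction, including the cancelled one, is rescaled with the reuse rule, so no fresh random number is drawn.

```python
def select_after_reaction_cancellation(a, tau, t, new_propensities):
    """Reselect at time t after the pending reaction was cancelled.
    a: current propensities {k: a_k}; tau: tentative absolute firing times."""
    a_new = new_propensities()
    for k in a:
        # no reaction executed, so every exponential sample, including the
        # cancelled reaction's, is reused by rescaling its remaining wait
        tau[k] = (a[k] / a_new[k]) * (tau[k] - t) + t
    a.update(a_new)
    next_mu = min(tau, key=tau.get)
    return next_mu, tau[next_mu]
```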
Event method Execute reaction(t, SSA obj, self, µ)
    update self.local species according to the stoichiometry of µ
end Event method

Execute reaction updates the local species populations according to the stoichiometry of the reaction being executed.
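The state update in Execute reaction amounts to adding each stoichiometric coefficient to the corresponding local count. A minimal sketch (the species and stoichiometry representations are assumptions; shared species are skipped because their counts live at an SSP):

```python
def execute_reaction(local_species, stoich):
    """Apply a reaction's stoichiometry to locally stored species counts."""
    for species, change in stoich.items():
        if species in local_species:   # shared species are updated at the SSP
            local_species[species] += change
    return local_species
```

For example, applying stoichiometry {'A': -1, 'B': 1, 'C': 2} to local counts of A and B leaves the shared species C untouched.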
D. Correctness of the parallel SSA algorithm
The algorithms for the SSP and SSA class methods define an exact parallel SSA algorithm because they maintain these invariants:

Read timing: Propensity calculations read species populations at the correct times.

Write timing: Updates to species populations are performed at the correct times.

Static propensities: The propensities used to schedule every reaction that executes remain constant between the time they are computed and the reaction's execution.

We consider only shared species, since locally stored species are trivially read and updated at the correct times.
Read timing holds because the most recent populations of all shared species are used to calculate each propensity: whenever the SSP storing a shared species receives an Adjust populations message, it responds by sending a zero-delay Populations message to each SSA object that uses the shared species which were updated in the Adjust populations message. Write timing holds because the populations of all shared species modified by a reaction are updated at the SSP via an Adjust populations message that is executed at the time the reaction executes. And Static propensities holds because whenever a species population used by a propensity calculation changes, the reaction that depends on the change is cancelled and scheduling is redone, as performed by the cancellation branch of the Populations event method.

However, one final issue must be resolved to complete this informal correctness proof: the dependencies among simultaneous SSA event messages need to be addressed. The next sub-section defines these dependencies and describes how the parallel SSA algorithm ensures that simultaneous event messages execute in the correct order.

1. Dependencies among simultaneous parallel SSA event messages
When a reaction executes, all four event messages associated with the reaction execute at the simulation time of the reaction (see section A of Figure 3). To achieve correctness, parallel SSA must control the execution order of simultaneous event messages. Specifically, they should be executed in an order consistent with the logic of sequential SSA. This section presents the rationale for that order, and the simulation mechanisms that achieve the order.

Figure 4 illustrates logical dependency relationships between simultaneous parallel SSA event messages. These reasons explain the dependencies in Figure 4:

1. Populations is a response to Adjust populations, so Adjust populations must execute before Populations.

2. Populations must follow Execute reaction so that species populations are not altered before Execute reaction is executed.

3. Populations must precede Schedule reaction so that Schedule reaction sees fully updated populations, as occurs in sequential SSA algorithms.
FIG. 4. Dependencies among simultaneous parallel SSA events associated with a single reaction execution. A directed edge from event message x to event message y means that the event that executes message x must occur before the event that executes message y. The rationale for each directed edge is provided in the text.

Parallel SSA achieves the ordering presented in Figure 4 at two levels. First, globally, at any instant of time SSP objects must execute before SSA objects, as the alternative would be inconsistent with the dependencies in Figure 4. And second, locally within each object type, messages must execute in the fully determined order shown in Figure 4.

Two standard PDES mechanisms can implement these two levels of ordering. Globally, simultaneous events at different PDES objects can be ordered by controlling a sub-time of the simulation time. SSP objects can be forced to execute before SSA objects when they execute simultaneously by always giving SSP objects a smaller sub-time than SSA objects. All simultaneous messages received by an object at a given simulation time must be passed to the object as an event message set. Therefore, SSPs must precede SSAs at a given time to ensure that Adjust populations executes before Populations. Locally, within each SSA object, simultaneous messages can be executed in the order Execute reaction, Populations, Schedule reaction. This ordering is straightforward to achieve. For example, a sequential OO DES engine called DE-Sim that we wrote provides declarative support for controlling the execution order of simultaneous events.

The requirement that Populations executes between Execute reaction and Schedule reaction forces parallel SSA to logically distinguish the event that executes a reaction from the event that schedules the next reaction.
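Both ordering levels can be realized with a single lexicographic event key, sketched here in Python. The numeric priorities are assumptions for illustration (a production simulator such as DE-Sim exposes an equivalent declarative mechanism): events are ordered by (simulation time, priority, insertion sequence), with Adjust populations, executed at SSPs, given the smallest priority and the SSA messages following in the order required by Figure 4.

```python
import heapq

# priority encodes the required order of simultaneous messages:
# Adjust populations (at SSPs) first, then Execute reaction,
# Populations, and Schedule reaction (at SSA objects)
PRIORITY = {'Adjust populations': 0, 'Execute reaction': 1,
            'Populations': 2, 'Schedule reaction': 3}

class EventQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0          # tie-breaker for otherwise-equal keys

    def schedule(self, time, msg_type):
        heapq.heappush(self._heap, (time, PRIORITY[msg_type], self._seq, msg_type))
        self._seq += 1

    def drain(self):
        # pop events in (time, priority, sequence) order
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap)[3])
        return order
```

Because Python compares the key tuples lexicographically, simultaneous events are automatically serialized by priority, and distinct times always dominate.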
E. Compartments
Multiple compartments are fully supported by the parallel SSA algorithm. Consider a system with two adjacent compartments, c1 and c2, and a single exchange reaction x that transfers a species from c1 to c2. Letting the species being transferred be named s[c1] and s[c2] in c1 and c2, respectively, reaction x can simply be s[c1] → s[c2]. Let the reactions contained in c1 and c2 be mapped to SSA objects S1 and S2 respectively, with S1 executing reaction x and s[c2] a shared species used by both S1 and S2. When x executes, the population of s is decremented in c1 and incremented in c2.

This exchange reaction is naturally handled by the parallel SSA algorithm. Since S1 executes x, the population change in S1 occurs at the time x executes, and the Static propensities invariant holds in S1. SSA object S2 will receive a Populations message at the time x executes, which will cause S2 to cancel its pending reaction and redo reaction scheduling, as performed by the cancellation branch of the Populations event method. Therefore, the Static propensities invariant holds in S2 as well, and the parallel SSA algorithm handles reactions that transfer species between compartments.

Since each reaction execution is independent, this analysis extends to multiple exchange reactions between a pair of adjacent compartments, and to many compartments.

Lastly, we note that an exact sequential SSA simulation of a system containing multiple compartments must use an approach similar to ours because it will simulate each compartment independently and cannot be exact unless it cancels reactions which are interrupted by exchange reactions. However, unlike this parallel algorithm, it will not need to handle cascading cancellations.

F. Practical considerations for the PDES simulator
The parallel SSA application employs several features that must be supported by the OO PDES simulator on which it runs. First, because Populations is a zero-delay message, zero-delay messages must be allowed by the simulator. Second, the Cancel scheduled reaction events function uses a "retract Event message" operation to send a retraction message that cancels a previously sent Event message. Retraction messages are supported by the ROSS optimistic PDES simulator.

VI. OPTIMIZATIONS AND EVALUATION

A. Optimizing parallel SSA
Multiple optimizations should be incorporated into the parallel SSA algorithm to improve its performance. These are all exact.

• Multiple SSP objects should be employed so that a) they can share computational load, and b) shared species can be mapped to SSP objects that are located on a parallel computer near the SSA objects that use them.

• The optimization introduced by NRM that recomputes only propensities which depend on species populations that have changed should be incorporated. In parallel SSA, when a reaction executes these species are given by the union of the species with non-zero stoichiometry in the reaction with the shared species whose populations have been updated. When a reaction is cancelled, they are given by the updated species provided by the Populations message that triggered the cancellation. This optimization should be implemented using the indexed priority queue P defined by the NRM. This requires that SSPs track the shared species population updates which have not been received by each SSA.

• Unnecessary data in Populations messages can be avoided by having SSA objects record updates to shared species locally and having SSPs not send an update to a shared species back to the SSA that reported the update. If all data in a Populations message is unnecessary, the message need not be sent.

These optimizations are all straightforward to implement. In addition to these exact optimizations, approximate optimizations should also be considered.
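The last optimization, suppressing echoes and empty messages, can be sketched as a filter applied by the SSP when it fans out an update. The names and data shapes here are assumptions for illustration, not the paper's implementation.

```python
def populations_payloads(updates, reporter, subscribers):
    """updates: {species: new_count} reported by the SSA named `reporter`;
    subscribers: maps each SSA name to the set of shared species it uses.
    Returns {ssa: payload} for the Populations messages worth sending."""
    messages = {}
    for ssa, used in subscribers.items():
        if ssa == reporter:
            continue                 # never echo updates back to the reporter
        payload = {s: n for s, n in updates.items() if s in used}
        if payload:                  # an empty Populations message is not sent
            messages[ssa] = payload
    return messages
```

For instance, if S1 reports updates to species A and B, an SSA that uses only A receives only the A update, and an SSA that uses neither receives no message at all.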
B. Evaluation
While quantitative performance results are not available because parallel SSA has not been implemented yet, the parallel SSA algorithm achieves these conceptual objectives. It offers a method to accelerate SSA without sacrificing accuracy by partitioning reaction networks and simulating sub-networks in parallel on a supercomputer. In addition, it aims to reduce simulation run-times by maximizing the number of sub-networks in a simulation while minimizing the cost of synchronization.

The parallel SSA algorithm also faces several challenges. Partitioning will be performed on a static reaction network, whereas the characteristics of the network and its sub-networks will likely vary over the duration of a simulation. The performance of a parallel SSA simulation will depend on the rate of updates to shared species, and the fraction of those updates that cause reactions to be cancelled. A partitioning algorithm that accurately estimates these rates needs to be developed.
VII. NEXT STEPS
Much additional work must be completed before parallel SSA can become a standard tool for accelerating the simulation of large biochemical models. We plan these next steps.

• Implement the algorithm: Select a PDES simulator to use as a foundation, and implement the SSA algorithm and the optimizations from section VI A as a DES application that runs on the simulator.

• Implement partitioning: Develop a reaction-network partitioning algorithm. In addition, a related algorithm is needed to map SSA and SSP objects to processors and cores in a supercomputer when a parallel SSA simulation is initialized.

• Evaluate the implementation's performance: We will develop a benchmark reaction network, obtain a fast sequential SSA implementation, and evaluate parallel SSA's relative speedup. Initial construction has begun on a configurable generator of synthetic reaction networks.

• Integrate parallel SSA into a user-friendly modeling environment: A comprehensive environment for modeling whole cells and other large networks needs a modeling language, a simulation experiment language, and a format for simulation results.

• Combine parallel SSA with other integration algorithms in a multi-algorithmic simulator: Because different pathways in cells are characterized at different levels, whole-cell models must be simulated with multiple integration algorithms, including dynamic Flux Balance Analysis and ODEs along with SSA. To achieve this, we will merge parallel SSA with an existing multi-algorithmic simulator.
VIII. CONCLUSIONS
We make important progress toward using parallelism to accelerate the Stochastic Simulation Algorithm (SSA) by presenting the first exact parallel algorithm for SSA. The algorithm parallelizes SSA with no loss of accuracy by partitioning a reaction network into multiple sub-networks that are simulated by independent but coordinated SSA instances. It exactly synchronizes accesses to species populations shared by the instances, and cancels pending reactions that are interrupted by population updates that invalidate prior propensity calculations. To recover from reaction cancellations the algorithm employs a modified Next Reaction Method approach. All concurrent synchronization, including event timing, reaction cancellation and rollback, is achieved by leveraging the existing synchronization in optimistic parallel DES simulators. This will make the parallel SSA algorithm easier to implement and deploy. In addition, the algorithm exactly simulates systems that contain multiple compartments and transfer species between them. We present a plan for implementing, optimizing and evaluating it.

A high-performance, production parallel SSA algorithm would help enable simulations of comprehensive models of the entire biochemistry of cells, which would advance the treatment of disease and the engineering of useful microbes.

ACKNOWLEDGMENTS
This work was supported by National Science Foundation award 1649014 and National Institutes of Health award R35GM119771 to J.R.K. We appreciate insightful comments from Robert Clayton Blake at Lawrence Livermore National Laboratory.
REFERENCES

D. T. Gillespie, The Journal of Physical Chemistry, 2340 (1977).
D. T. Gillespie, Annu. Rev. Phys. Chem., 35 (2007).
T. Maier, A. Schmidt, M. Güell, S. Kühner, A.-C. Gavin, R. Aebersold, and L. Serrano, Molecular Systems Biology (2011).
J. Yu, J. Xiao, X. Ren, K. Lao, and X. S. Xie, Science, 1600 (2006).
A. P. Goldberg, B. Szigeti, Y. H. Chew, J. A. Sekar, Y. D. Roth, and J. R. Karr, Current Opinion in Biotechnology, 97 (2018).
B. Szigeti, Y. D. Roth, J. A. Sekar, A. P. Goldberg, S. C. Pochiraju, and J. R. Karr, Current Opinion in Systems Biology, 8 (2018).
J. R. Karr, J. C. Sanghvi, D. N. Macklin, M. V. Gutschow, J. M. Jacobs, B. Bolival Jr, N. Assad-Garcia, J. I. Glass, and M. W. Covert, Cell, 389 (2012).
A. P. Goldberg, Y. H. Chew, and J. R. Karr, in Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (2016) pp. 259–262.
V. H. Thanh, C. Priami, and R. Zunino, The Journal of Chemical Physics, 10B602 1 (2014).
D. T. Gillespie and L. R. Petzold, The Journal of Chemical Physics, 8229 (2003).
A. Auger, P. Chatelain, and P. Koumoutsakos, The Journal of Chemical Physics, 084103 (2006).
D. T. Gillespie, The Journal of Chemical Physics, 1716 (2001).
R. Fujimoto, Proceedings of the 2001 Winter Simulation Conference (Cat. No. 01CH37304), 147 (2000).
C. Carothers, D. Bauer, and S. Pearce, Proceedings of the Fourteenth Workshop on Parallel and Distributed Simulation, 53 (2000).
C. D. Carothers and K. S. Perumalla, in Proceedings of the 2010 Winter Simulation Conference (IEEE, 2010) pp. 678–687.
D. Jefferson, B. Beckman, F. Wieland, L. Blume, and M. DiLoreto, in Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (1987) pp. 77–93.
D. R. Jefferson and P. D. Barnes, Jr, in (IEEE, 2017) pp. 786–797.
E. Mikida, N. Jain, L. Kale, E. Gonsiorowski, C. D. Carothers, P. D. Barnes, Jr, and D. Jefferson, in Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (2016) pp. 99–110.
M. A. Gibson and J. Bruck, The Journal of Physical Chemistry A, 1876 (2000).
Y. Cao, D. Gillespie, and L. Petzold, Journal of Computational Physics, 395 (2005).
D. F. Anderson, The Journal of Chemical Physics, 214107 (2007).
M. Jeschke, R. Ewald, A. Park, R. Fujimoto, and A. M. Uhrmacher, ACM SIGMETRICS Performance Evaluation Review, 22 (2008).
T. Mazza, P. Ballarini, R. Guido, and D. Prandi, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 911 (2012).
L. Dematté and T. Mazza, in International Conference on Computational Methods in Systems Biology (Springer, 2008) pp. 191–210.
B. Wang, Y. Yao, Y. Zhao, B. Hou, and S. Peng, in (IEEE, 2009) pp. 91–100.
M. J. Hallock, J. E. Stone, E. Roberts, C. Fry, and Z. Luthey-Schulten, Parallel Computing, 86 (2014).
D. R. Jefferson, ACM Transactions on Programming Languages and Systems, 404 (1985).
P. D. Barnes, Jr, C. D. Carothers, D. R. Jefferson, and J. M. Lapre, in SIGSIM-PADS '13 (Association for Computing Machinery, Montréal, 2013) pp. 327–336.
C. D. Carothers, K. S. Perumalla, and R. M. Fujimoto, ACM Transactions on Modeling and Computer Simulation (TOMACS), 224 (1999).
M. Schordan, D. Jefferson, P. D. Barnes, Jr, T. Oppelstrup, and D. Quinlan, in International Conference on Reversible Computation (Springer, 2015) pp. 95–110.
E. Mikida and L. Kale, in Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (2018) pp. 189–200.
R. M. Fujimoto, Communications of the ACM, 30 (1990).
G. Lomow, S. R. Das, and R. M. Fujimoto, ACM Transactions on Modeling and Computer Simulation (TOMACS), 219 (1991).
D. R. Jefferson and P. D. Barnes, Jr, "Virtual Time III, part 1: Unified virtual time synchronization for parallel discrete event simulation," (2020), submitted.
A. P. Goldberg, "Python framework for discrete event simulation," (2020).
M. Schordan, T. Oppelstrup, D. Jefferson, P. D. Barnes Jr, and D. Quinlan, in Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (2016) pp. 111–122.
E. T. Somogyi, J.-M. Bouteiller, J. A. Glazier, M. König, J. K. Medley, M. H. Swat, and H. M. Sauro, Bioinformatics 31.