Learning heterogeneous reaction rates from stochastic simulations
How to upscale the kinetics of complex microsystems
Ivan Kryven∗ and Ariana Torres-Knoop, Van 't Hoff Institute for Molecular Sciences, University of Amsterdam
The rate constants of chemical reactions are typically inferred from slopes and intersection points of observed concentration curves. In small systems that operate far below the thermodynamic limit, these concentration profiles become stochastic and such an inference is less straightforward. By using elements of queuing theory, we introduce a procedure for inferring (time dependent) kinetic parameters from microscopic observations that are given by molecular simulations of many simultaneously reacting species. We demonstrate that with this procedure it is possible to assimilate the results of molecular simulations in such a way that the latter become descriptive on the macroscopic scale. As an example, we upscale the kinetics of a molecular dynamics system that forms a complex molecular network. Incidentally, we report that the kinetic parameters of this system feature peculiar time and temperature dependencies, whereas the probability of a network strand to close a cycle follows a universal distribution.
PACS numbers: 02.50.Tt, 82.20.Uv, 36.20.Ey
Keywords: chemical kinetics, networks, statistical inference, stochastic processes
How does one deduce chemical rate constants from observations? On the macroscopic scale, where the concentrations of chemical compounds are deterministic quantities, this question was answered by Arrhenius, who famously connected the reaction rate constants to slopes and intersection points of the concentration-related curves[1]. Microscopic systems, such as living cells[2, 3], micropores[4], or those used for in silico computer experiments[5–9], typically have a small reaction volume, and therefore the reaction rates feature stochastic fluctuations that are not accounted for in the Arrhenius theory. The stochasticity means that the reaction rates lack a predictable order and, due to the thermal noise and chaotic collision dynamics, can only be described with probabilities. A few more assumptions that provide the foundations for the Arrhenius theory of reaction rates may lead to artefacts even in macroscopic systems: a well-mixed environment, Boltzmann's Stosszahlansatz, absence of memory, and non-cooperation of particles. If such artefacts occur[10–12], the reaction rate constants are said to be time-dependent. For example, irreversible polymerisation leads to progressively growing molecules, and therefore each reaction firing changes the conditions of the system and, consequently, the rates of the reactions occurring therein[8, 13, 14]. Molecular networks pose an especially severe case: their physical properties evolve considerably in the course of the assembly process, and the latter can undergo various types of phase transitions[7–9]. As an illustration of how drastic such changes can be, Fig. 1 depicts the formation of a densifying molecular network that transits from a liquid-like to a solid-like state. It is not surprising that such an evolution of physical properties has a strong and complex impact on the reaction rates. Molecular dynamics (MD) simulations[6] are parameter free and do not require reaction rate constants as input.
On the contrary, they produce streams of data that implicitly contain information about these rates. Provided the reaction rates are extracted from an MD system, the rates can then be used as input for macroscopic kinetic models[1], which constitutes the multi-scale paradigm. While the foundation of reaction rates is frequently discussed in the literature[15–18], this letter takes a phenomenological lens and devises a methodology for inferring reaction rate parameters from noisy microscopic observations as given by, for example, molecular dynamics simulations.

Consider a system that consists of $N$ chemical species reacting via $M$ reactions. Each species may be represented by multiple particles, which is indicated by the particle counts $x = (x_1, x_2, \dots, x_N)^\top$. We thus have $\sum_{i=1}^{N} x_i$ particles in total. The reactive interactions that occur between these species can be modelled at three levels of mathematical description[1]: the equation of motion, the stochastic process, and the rate equation. The rate equations are ordinary differential equations (ODEs) that, instead of species counts $x_i$, govern the evolution of their molar concentrations:
$$c_i = \frac{x_i}{V N_A}, \qquad (1)$$
whereby we assume that the volume $V \to \infty$ and the $x_i$ scale accordingly to keep the pressure constant. Here, $N_A$ is Avogadro's constant. In the general case of $N$ species and $M$ reactions, such an ODE is given by:
$$c_i'(t) = \sum_{j=1}^{M} k_j S_{i,j}\, c^{\nu_j}(t), \quad i = 1, 2, \dots, N, \qquad (2)$$
where $c = (c_1, c_2, \dots, c_N)^\top$ is the column vector of concentrations, the vector power $c^{\nu} = c_1^{\nu_1} c_2^{\nu_2} \cdots c_N^{\nu_N}$ is evaluated in the point-wise manner, $k_j$ are the reaction rate constants, $\nu_j$ are vectors with binary elements defining the participation of species, and $S$ is an $N \times M$ matrix having the stoichiometric vectors as its rows. The intuition behind Eq. (2) is discussed on a simple example in the Supplementary Materials[19].

The $\tau$-leaping method[20] gives the stochastic equivalent of Eq. (2) that operates with discrete particle counts $x_i$ instead of concentrations. Suppose that all elements of the species count vector are large, $x \gg 0$, and that in a small time increment $\tau$ these values undergo a small relative change. Let $z = (z_1, z_2, \dots, z_M)^\top$ be the column vector of reaction firings observed during the time interval $\tau$. The elements of this vector are Poisson random variables, $z_j \sim \mathrm{Poiss}[\lambda_j x^{\nu_j} \tau]$, $j = 1, \dots, M$, which, when combined with the reaction stoichiometry $S$, provide the update vectors for the species counts $x$ during the interval $\tau$. By letting $\tau_l = t_l - t_{l-1}$ iterate over all discrete time intervals, one recovers the whole evolution trajectory of the species count vector $x_l$ for $l = 1, \dots, L$:
$$x_l = x_{l-1} + S z_{l-1}, \quad z_l \sim \left(\mathrm{Poiss}[\lambda_1 x_l^{\nu_1} \tau_l], \dots, \mathrm{Poiss}[\lambda_M x_l^{\nu_M} \tau_l]\right)^\top, \qquad (3)$$
where the stochastic rates $\lambda_1, \dots, \lambda_M$ are the only parameters. Equation (3) can be regarded as an $N$-dimensional random walk on the species count numbers, or alternatively, a stochastic process. We give a complete derivation of this equation in the Supplementary Materials[19]. Stochastic process (3), although a very practical model as it is relatively fast to simulate, relies on the system being well-mixed, memoryless, and non-cooperative, among other assumptions. An alternative that does not suffer from these issues is solving the $N$-body problem for a system composed of all copies of the chemical species.

Molecular dynamics describes the evolution of complex systems by solving the equation of motion for each particle. When provided with a reactive scheme[6], MD also yields the evolution of the different species in the system, their particle counts, and the counts of all reaction events.

FIG. 1: (Colour online) Time snapshots of the carbon skeleton of the largest cluster in the di-acrylate network as given by molecular simulations (this work) suggest that the reaction rates may considerably slow down during the course of polymerisation. Left-to-right: 20%, 30%, and 80% of reaction progress as measured by the double bond conversion, p.
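The update rule of Eq. (3) is straightforward to simulate. The following is a minimal Python sketch, not the authors' released implementation [33]; the mechanism, initial counts, rates, and step size are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def tau_leap(x0, S, nu, lam, tau, L):
    """Iterate the tau-leaping process of Eq. (3).

    x0  -- initial species counts, shape (N,)
    S   -- stoichiometric matrix, shape (N, M); column j = net change of reaction j
    nu  -- reaction orders, shape (M, N)
    lam -- stochastic rates lambda_j, shape (M,)
    tau -- time step (kept constant here for simplicity)
    L   -- number of steps
    """
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(L):
        prop = lam * np.prod(x ** nu, axis=1)  # propensity lambda_j * x^nu_j
        z = rng.poisson(prop * tau)            # Poisson number of firings, Eq. (3)
        x = np.maximum(x + S @ z, 0.0)         # update counts; clamp to avoid negatives
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical example: propagation V + R -> R and termination R + R -> 0,
# i.e. a mechanism of the form (9) with made-up rates.
S = np.array([[-1, 0],    # row: vinyl groups V
              [0, -2]])   # row: radicals R
nu = np.array([[1, 1],    # propagation is first order in both V and R
               [0, 2]])   # termination is second order in R
traj = tau_leap([2000, 100], S, nu, np.array([1e-4, 1e-6]), tau=0.1, L=200)
print(traj[-1])
```

Because the firings are sampled in bulk per interval rather than one event at a time, this scheme trades the exactness of event-driven simulation for speed, which is precisely why Eq. (3) is practical for long trajectories.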
We denote the realisations of the latter by adding a tilde to the symbols: $\tilde x_l$ and $\tilde z_l$. We therefore have both the empirical realisations $\tilde x_l$ and $\tilde z_l$ as given by MD, and the theoretical model for the underlying stochastic process $x_l$ and $z_l$ as parametrised by the stochastic rates $\lambda_j$. In what follows, we devise several maximum likelihood estimators (MLEs) that infer the stochastic rates $\lambda_j$ by treating the molecular dynamics trajectories as input data. See the Supplemental Material[19] for the mathematical derivations of these MLEs. First, we assume that the stochastic rates $\lambda_j$ do not depend on time. In this case, maximisation of the log-likelihood gives the following estimates:
$$\lambda_j = \frac{\langle \tilde z_{j,l} \rangle}{\langle \tilde x_l^{\nu_j} \tau_l \rangle}, \quad \mathrm{var}(\lambda_j) = \frac{\lambda_j^2}{L \langle \tilde z_{j,l} \rangle}, \qquad (4)$$
where $\langle x_l \rangle := \frac{1}{L}\sum_{l=1}^{L} x_l$ denotes the time average and $\mathrm{var}(\lambda_j)$ refers to the asymptotic variance of this estimator. Since the process of Eq. (3) does not require the stochastic rates to be the same on all time steps, we may devise an estimator that yields stochastic rates in the form of a time series:
$$\lambda_{j,l} = \frac{\langle \tilde z_{j,l} \rangle_s}{\langle \tilde x_l^{\nu_j} \tau_l \rangle_s}, \quad \mathrm{var}(\lambda_{j,l}) = \frac{\lambda_{j,l}^2}{(2s+1) \langle \tilde z_{j,l} \rangle_s}, \qquad (5)$$
where $\langle x_l \rangle_s := \frac{1}{2s+1}\sum_{l'=l-s}^{l+s} x_{l'}$ represents the moving average. Finally, assume that the reaction rate parameters that appear in the random walk model (3) have an exponential dependence on time:
$$\lambda_j(t) = e^{-p_j(t)}, \qquad (6)$$
where $p_j(t) = \alpha_{j,0} + \alpha_{j,1} t + \alpha_{j,2} t^2 + \dots + \alpha_{j,S} t^S$ is a polynomial of order $S$. According to the derivations given in the Supplementary Information[19], the estimators for the polynomial coefficients are given by the solution of the following system of non-linear equations:
$$\left\langle \left(e^{-p_j(t_l)} \tilde x_l^{\nu_j} \tau_l - \tilde z_{j,l}\right) t_l^s \right\rangle = 0, \quad s = 0, \dots, S, \qquad (7)$$
whereas the variance of the rate logarithm is given by:
$$\mathrm{var}(\ln \lambda_j(t)) = \frac{1}{L}\, b^\top H^{-1} b, \qquad (8)$$
where $H$ is an $(S+1) \times (S+1)$ matrix with elements $H_{k,s} = \langle e^{-p_j(t_l)} t_l^k t_l^s \tilde x_l^{\nu_j} \tau_l \rangle$ and $b = (1, t, t^2, \dots, t^S)^\top$. One can replace time $t$ in MLE (6) with any monotonous function of time that tracks the progress of the chemical system; for instance, the conversion of an important species is often a practical choice. In order to rationally determine the best order of the polynomial for approximation, we propose to minimise two quantities simultaneously: the variance and the residual of the estimator (see SI[19]). The source code implementing the estimator equations (4)-(8) is provided in [33].

FIG. 2: a,b,c,d, Inferences of the reaction rate pre-factors A(t) from a single MD trajectory are realised with the constant estimator Eq. (4), the 4th order exp-polynomial estimator Eq. (6), and the time series estimator Eq. (5). The margins indicate 2σ confidence. e,f,g,h, Inferences of the reaction rate pre-factors A(p) as a function of p are realised with the time-series and exp-polynomial estimators. All panels share the same legend.

In what follows, we demonstrate the above-introduced method for statistical inference of reaction rates on the example of a photo-polymerising hexanediol diacrylate (HDDA) network[13]. The detailed chemistry of this system, as well as the description of the MD methodology, are given in our previous work[9]; we now show how to replace these computationally expensive MD simulations with a simple system of ODEs that is valid on arbitrarily large time scales. Here, we refer to the following MD system as the microsystem: 2000 HDDA molecules confined in a 7.·10− m simulation box with periodic boundary conditions and integrated in time up to 10− s in the NPT ensemble. Only at time t = 0 is the system initiated with 5% radicals (i.e. dark polymerisation[13]). Moreover, the activation energy of the reaction has been reduced to speed up the simulations. The true kinetic parameters can be recovered by appropriate unbiasing (see Ref. [9] for the discussion). Despite being so small, this system has the maximal size given the constraints of realistic computational resources: the integration of one trajectory requires approximately one week of computation time on a 32-core supercomputer. This microsystem is confronted with the macrosystem that reflects the desired real-world target: 4. particles, polymerised under continuous initiation that maintains a steady concentration of radicals at 10− (e.g. photo-polymerisation).
We investigate the rates of the two most important species: vinyl groups (V) and radicals (R), which react via two reaction channels, respectively propagation and termination:
$$\mathrm{V} + \mathrm{R} \longrightarrow \mathrm{R}, \quad \mathrm{R} + \mathrm{R} \longrightarrow \emptyset. \qquad (9)$$
The molecular network is formed as a byproduct of these reactions, and, as it is being formed, the network interferes with the collision rates of the species. This complex feedback of the network on the kinetics may result in a non-trivial dependence of the reaction rates on time. The mechanism (9) is characterised by
$$S = \begin{pmatrix} -1 & 0 \\ 0 & -2 \end{pmatrix}, \quad \nu_1 = (1, 1)^\top, \quad \nu_2 = (0, 2)^\top,$$
which, in combination with the molecular dynamics data $\tilde x_l$ and $\tilde z_l$, provides enough information to apply the rate estimators. Since the activation energy $E_a$ has been reduced in the microsystem, we use the following decomposition of the rate:
$$k(t) = A(t)\, e^{-E_a/(RT)}, \qquad (10)$$
and perform the inference solely for the pre-exponential factor A(t). Here, T denotes the temperature and R the gas constant. To recover the rate coefficient k(t), Eq. (10) should be supplied with $E_a$ = 31. kJ/mol for the propagation and $E_a$ = 8. kJ/mol for the termination reaction (activation energies taken from the RMGpy database[34]). From the variance analysis (see SI[19]) we conclude that the optimal estimation is provided by the polynomial of 4th order for the propagation and of 3rd order for the termination reaction. Apart from time t, another frequently used quantity[8, 13, 14, 21] that characterises the progress of a network formation is
$$p(t) = \frac{V(0) - V(t)}{V(0)}, \qquad (11)$$
the ratio between the current number of bonds in the network and the maximum possible number of bonds. This quantity is also known as the bond conversion in chemistry, or the occupancy probability in the theory of percolation and network science. Figure 2 presents the values of A(t) and A(p) inferred from single MD trajectories for different polymerisation temperatures T.
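To make the estimator concrete, the constant-rate MLE of Eq. (4) can be exercised on synthetic data in a few lines. The sketch below generates stand-in "observations" for a single propagation-like channel with a known rate and then recovers that rate; all numbers are illustrative, not taken from the HDDA system, and the variance expression follows Eq. (4) as written above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for MD observations: firings of a propagation
# channel V + R -> R with a KNOWN rate, so the estimate can be checked.
lam_true, tau, L = 2.0e-4, 0.05, 400
xV, xR = 2000.0, 100.0
x_nu, z = [], []
for _ in range(L):
    x_nu.append(xV * xR)                        # x^nu_j = x_V * x_R for this channel
    zl = rng.poisson(lam_true * xV * xR * tau)  # firings during this interval
    z.append(zl)
    xV = max(xV - zl, 0.0)                      # each firing consumes one V
x_nu, z = np.array(x_nu), np.array(z, dtype=float)

# Constant-rate MLE, Eq. (4): lambda_j = <z_j,l> / <x_l^nu_j tau_l>
lam_hat = z.mean() / (x_nu * tau).mean()
var_hat = lam_hat**2 / (L * z.mean())           # asymptotic variance, Eq. (4)
print(lam_hat, var_hat)
```

With a few hundred recorded firings the relative error of the estimate is already at the level of a few per cent, which illustrates why a single MD trajectory suffices as input data.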
Independently of T, both A(t) and A(p) are clearly non-constant and drastically decrease throughout the reaction progress, which violates the Arrhenius assumptions. Moreover, the profiles of A(p) depend on T and p in a non-linear fashion. In order to explain this complex behaviour, we observe that the system undergoes two phase transitions that need not coincide: the transition from disconnected clusters to a spanning network (the topological transition), and the transition from a liquid/resin-like to a solid/glassy state (the glass transition). Thus, in total, we have four distinct domains in the T–p phase space: $\Omega_1$ – viscous, no network; $\Omega_2$ – glassy, no network; $\Omega_3$ – rubbery, network; $\Omega_4$ – glassy, network. As shown in Figure 3a, the partition of the phase space into these domains indicates that the topological transition occurs around a fixed conversion $p_c$, whereas the critical p for the glass transition is a function of T. By colour-coding the points in the profiles of A(p) according to the domain they belong to, Figure 3b reveals that increasing T has opposite effects on A below and above the topological phase transition: increased temperature inhibits the value of the pre-factor A for $p < p_c$ and promotes this value for $p > p_c$. The nature of this difference derives from the effect that the underlying network has on the collision rates. Below the topological transition, when most of the system is disconnected, the positions of the species are weakly correlated, and increasing the temperature leads to an increase of the volume V in the NPT ensemble, which in turn reduces the collision rate. Above the topological transition, when a sizeable portion of the system is connected by a path, the geometry of the network becomes a function of the topology and thus no longer depends on the temperature or pressure. In this case, however, the thermal fluctuations within the network, which are stronger for larger T, promote local diffusion of the species and therefore give a slight boost to the collision rates.
Therefore, collisions in a network are governed by different mechanisms than collisions in the ideal gas: the shortest path between species embedded in a network becomes the most important factor that explains the collision rates, which, in turn, become independent of temperature and pressure. To emphasise the universal dependence of the system's geometry on the topology, we calculate the return probability of the shortest path in the network. The network consists of multiple connected components that grow in size and eventually amount to one giant molecule that occupies the whole system. Both species R and V reside in the network, and therefore every propagation firing either joins two of such connected components or closes a cycle. The probability that a polymer chain closes a cycle of length n is typically derived from the return probability of a random walk that models the chain's geometry. The exact definition of this random walk is a topic of debate, see for example Refs. [21–24]. As shown in Fig. 4a, the probability that a network strand closes a cycle is universal and can be asymptotically related to Flory's expression for the self-avoiding random walk, $p \sim n^{-3/2} e^{-n^{-1} - \alpha n^{1/2}}$, where the chain stiffness parameter was found to be α = 1.
FIG. 3: a, As seen from MD simulations, the glass and topological phase transitions separate the phase space into four domains: $\Omega_1$ – viscous, no network; $\Omega_2$ – glassy, no network; $\Omega_3$ – rubbery, network; $\Omega_4$ – glassy, network. The solid lines mark the σ-confidence interval around the domain boundaries. b, Values of A(p) inferred from multiple MD trajectories show that the polymerisation temperature has opposite effects on the reaction rate pre-factor A in domains $\Omega_1$, $\Omega_2$ and $\Omega_3$, $\Omega_4$. Symbols denote temperature, and the colour code the corresponding domain of the phase space.

FIG. 4: (Colour online) a,b,c, Demonstration of the upscaling procedure: concentration of V as a function of time in the upscaled macrosystem, where the ODEs are supplied with inferred conversion-dependent reaction rates (a – exp-polynomial of power 4, b – constant). The bands indicate σ confidence. c, Comparison of species counts in the microsystem at T = 300 K modelled with MD up to 10− s, and with the stochastic/ODE models supplied with MD-inferred exp-polynomial rates. d, The probability of a network strand to close a cycle of a given length is fitted with Flory's empirical expression for the return probability of the self-avoiding random walk. The return probability of the Gaussian coil is also given for reference. Error bars correspond to a σ-confidence interval.

as well. In the latter case, the shortest path replaces the notion of a linear strand. The fact that the return probability does not depend on temperature is exclusive to networks, since the latter feature more geometrically constrained configurations as compared to single chains.

Finally, the most important applied implication of the rate inference is that one can use this procedure to perform predictions with an accuracy close to that of MD simulations, but on the macroscopic scale. Since all kinetic parameters are derived from the particle potentials, as encoded by the force field, such predictions can be parameter-free. In order to perform the predictions, one models the reaction mechanism (9) with the ODEs (1) that are supplied with the expressions for A(p) inferred from MD simulations, where p is given by Eq. (11). Figures 4b,c,d illustrate this principle: Fig. 4b compares the MD data with the stochastic and ODE models, still in the microsystem, whereas Figs. 4c,d present the upscaled results as given by the ODEs with inferred rates for the macrosystem up to t = 100 s. See the Supplemental Material[19] for the corresponding rate parameters. Note that representing the rates as exp-polynomial functions of p (Fig. 4c), as opposed to constants (Fig. 4d), is essential to capture the kinetic slowdown that is induced by the jamming and is especially pronounced at low temperatures.

To summarise, in this letter we solve the inverse problem to Gillespie's famous stochastic simulation algorithm[26]: we take counts of molecular species and recover the reaction rate parameters that drive the kinetics. As such, our results introduce a stochastic alternative to one of the most common tools in chemistry: the Arrhenius plot. From the point of view of molecular dynamics, reaction rates are an emergent phenomenon of many reactive particles, and this inference method allows one to extract the kinetic parameters from simulations. By using the fact that the inferred parameters may be scale-invariant, we demonstrate that it is possible to assimilate the results of reactive molecular simulations so that they become descriptive on the macroscopic scale. Such an upscaling opens up the opportunity for modelling the macroscopic world with an accuracy characteristic of molecular simulations by using simple differential equations. Molecular simulations of many reaction-driven macroscopic phenomena are already under way, see for example the studies on crystallisation[5, 27], self-assembly[28], aggregation[29], separation[30], and polymerisation[9, 11, 31], and the concept of ordinary differential equations that learn from molecular simulations may facilitate the discovery of new macroscopic trends and improve existing kinetic models for these phenomena. As a proof of concept, we applied the method to diacrylate polymerisation. The kinetics of this process is very complex and depends on the underlying dynamic network. We reveal an intricate phenomenological dependence of the kinetic parameters on temperature and time in this system, and show that these dependencies are induced by the complex evolution of the underlying network topology.
Indeed, the collision rates of free species are a function of the volume, whereas the collision rates of species embedded in a network are dictated by the network geometry, which appears to feature universal characteristics. In support of the latter statement, we observe that the probability of a topological shortest path in the network to close a cycle of a given length is invariant in time and temperature and follows a universal distribution. By this example we demonstrate that, despite the complexity, it is possible to model the transition between freely interacting species and a dense network by ordinary differential equations with appropriately chosen non-linear coefficients.

The authors are grateful to prof. S. Woutersen for his valuable suggestions. I.K. acknowledges the support from the research programme VENI with Project No. 639.071.511, and A.T-K. acknowledges support from the PREDAGIO Project. Both projects are financed by the Netherlands Organisation for Scientific Research (NWO).

∗ Electronic address: [email protected]

[1] Desmond J Higham. Modeling and simulating chemical reactions.
SIAM Review, 50(2):347–368, 2008.
[2] Timothy S Gardner, Charles R Cantor, and James J Collins. Construction of a genetic toggle switch in Escherichia coli. Nature, 403(6767):339, 2000.
[3] Ivan Kryven, Susanna Röblitz, and Christof Schütte. Solution of the chemical master equation by radial basis functions approximation with interface tracking. BMC Systems Biology, 9(1):67, 2015.
[4] Sergio Branciamore, Enzo Gallori, Eörs Szathmáry, and Tamás Czárán. The origin of life: chemical evolution of a metabolic system in a mineral honeycomb? Journal of Molecular Evolution, 69(5):458, 2009.
[5] Masakazu Matsumoto, Shinji Saito, and Iwao Ohmine. Molecular dynamics simulation of the ice nucleation and growth process leading to water freezing. Nature, 416(6879):409, 2002.
[6] Karim Farah, Florian Müller-Plathe, and Michael C Böhm. Classical reactive molecular dynamics implementations: State of the art. ChemPhysChem, 13(5):1127–1151, 2012.
[7] Ahmad K Omar and Zhen-Gang Wang. Shear-induced heterogeneity in associating polymer gels: Role of network structure and dilatancy. Phys. Rev. Lett., 119(11):117801, 2017.
[8] Ivan Kryven, Jorien Duivenvoorden, Joen Hermans, and Piet D Iedema. Random graph approach to multifunctional molecular networks. Macromolecular Theory and Simulations, 25(5):449–465, 2016.
[9] Ariana Torres-Knoop, Ivan Kryven, Verena Schamboeck, and Piet Iedema. Modeling the free-radical polymerization of hexanediol diacrylate (HDDA): a molecular dynamics and graph theory approach. Soft Matter, 2018.
[10] Ahmad K. Omar and Zhen-Gang Wang. Shear-induced heterogeneity in associating polymer gels: Role of network structure and dilatancy. Phys. Rev. Lett., 119:117801, 2017.
[11] Vittore F. Scolari, Guillaume Mercy, Romain Koszul, Annick Lesne, and Julien Mozziconacci. Kinetic signature of cooperativity in the irreversible collapse of a polymer. Phys. Rev. Lett., 121:057801, 2018.
[12] Simone Ciarella, Francesco Sciortino, and Wouter G. Ellenbroek. Dynamics of vitrimers: Defects as a highway to stress relaxation. Phys. Rev. Lett., 121:058003, 2018.
[13] C Decker, B Elzaouk, and D Decker. Kinetic study of ultrafast photopolymerization reactions. Journal of Macromolecular Science, Part A: Pure and Applied Chemistry, 33(2):173–190, 1996.
[14] Thomas R Rooney and Robin A Hutchinson. Monomer structure and solvent effects on copolymer composition in (meth)acrylate radical copolymerization. Industrial & Engineering Chemistry Research, 57(15):5215–5227, 2018.
[15] Frank P Buff and David J Wilson. Some considerations of unimolecular rate theory. The Journal of Chemical Physics, 32(3):677–685, 1960.
[16] George H Weiss. Overview of theoretical models for reaction rates. Journal of Statistical Physics, 42(1-2):3–36, 1986.
[17] Peter Hänggi, Peter Talkner, and Michal Borkovec. Reaction-rate theory: fifty years after Kramers. Reviews of Modern Physics, 62(2):251, 1990.
[18] Peter G Bolhuis and Gábor Csányi. Nested transition path sampling. Phys. Rev. Lett., 120(25):250601, 2018.
[19] Supplemental Material URL.
[20] Daniel T Gillespie. Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics, 115(4):1716–1733, 2001.
[21] Michael Rubinstein and Ralph H Colby. Polymer Physics, volume 23. Oxford University Press, New York, 2003.
[22] Michael Lang. Elasticity of phantom model networks with cyclic defects. ACS Macro Letters, 7(5):536–539, 2018.
[23] Rui Wang, Alfredo Alexander-Katz, Jeremiah A Johnson, and Bradley D Olsen. Universal cyclic topology in polymer networks. Phys. Rev. Lett., 116(18):188302, 2016.
[24] Hernán D Rozenfeld, Joseph E Kirk, Erik M Bollt, and Daniel Ben-Avraham. Statistics of cycles: how loopy is your network? Journal of Physics A: Mathematical and General, 38(21):4589, 2005.
[25] Paul J Flory. Principles of Polymer Chemistry. Cornell University Press, 1953.
[26] Daniel T Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics, 22(4):403–434, 1976.
[27] Haiyang Niu, Pablo M Piaggi, Michele Invernizzi, and Michele Parrinello. Molecular dynamics simulations of liquid silica crystallization. Proceedings of the National Academy of Sciences, 115(21):5348–5352, 2018.
[28] Juan R Perilla, Boon Chong Goh, C Keith Cassidy, Bo Liu, Rafael C Bernardi, Till Rudack, Hang Yu, Zhe Wu, and Klaus Schulten. Molecular dynamics simulations of large macromolecular complexes. Current Opinion in Structural Biology, 31:64–74, 2015.
[29] Alexander K Buell, Jamie R Blundell, Christopher M Dobson, Mark E Welland, Eugene M Terentjev, and Tuomas PJ Knowles. Frequency factors in a landscape model of filamentous protein aggregation. Physical Review Letters, 104(22):228101, 2010.
[30] Jan Smrek and Kurt Kremer. Small activity differences drive phase separation in active-passive polymer mixtures. Phys. Rev. Lett., 118:098002, 2017.
[31] Swarnavo Sarkar and Sheng Lin-Gibson. Computational design of photocured polymers using stochastic reaction–diffusion simulation. Advanced Theory and Simulations, 1(7):1800028, 2018.
[32] Pierre Del Moral and Spiridon Penev. Stochastic Processes: From Applications to Theory. Chapman and Hall/CRC, 2017.
[33] https://github.com/ikryven/RateInference
[34] RMGpy kinetic database: https://rmg.mit.edu/
SUPPLEMENTARY NOTE 1: SUMMARY OF THE STOCHASTIC SIMULATION ALGORITHM
As an example, consider a system that consists of three chemical species A, B and C, having the particle counts $x_1$, $x_2$, and $x_3$, and reacting via the following mechanism:
$$\mathrm{A} + \mathrm{B} \underset{k_2}{\overset{k_1}{\rightleftharpoons}} \mathrm{C}. \qquad (12)$$
We can describe this system by the following set of rate equations for the time-dependent concentrations:
$$c_1' = -k_1 c_1 c_2 + k_2 c_3, \quad c_2' = -k_1 c_1 c_2 + k_2 c_3, \quad c_3' = k_1 c_1 c_2 - k_2 c_3, \qquad (13)$$
where $k_i$ are the rate constants. In the general case of $M$ reactions and $N$ species having concentrations $c = (c_1, c_2, \dots, c_N)^\top$, the reaction rate equation generalises to:
$$c_i'(t) = \sum_{j=1}^{M} k_j S_{i,j}\, c^{\nu_j}(t), \quad i = 1, 2, \dots, N. \qquad (14)$$
Here, the vector power $c^{\nu} = c_1^{\nu_1} c_2^{\nu_2} \cdots c_N^{\nu_N}$ is evaluated in the point-wise manner; $k_j$ are the reaction rate constants, $\nu_j$ define the orders of the reactions, and $S$ is an $N \times M$ matrix having the stoichiometric vectors as its rows. In order to see that Eq. (13) is a special case of Eq. (14), it is sufficient to substitute:
$$S = \begin{pmatrix} -1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}^\top, \quad \nu_1 = (1, 1, 0)^\top, \quad \nu_2 = (0, 0, 1)^\top.$$
One can see that the elements of $\nu_1$ sum up to 2, which indicates that $j = 1$ is a second order reaction, whereas the elements of $\nu_2$ sum up to 1, indicating that the reaction order of $j = 2$ is one. The reactions of first and second order derive their rates in different ways; yet, as was suggested by Gillespie, who devised the stochastic simulation algorithm (SSA)[26], these rates can be related to the stochastic processes that govern the reaction firings. Consider a system that consists of a single molecule undergoing a first order reaction. The probability that time $t$ passes until this molecule reacts is given by an exponential random variable with parameter $\lambda$:
$$P[t \in [\tau, \tau + \mathrm{d}\tau]] = \lambda e^{-\lambda \tau}\, \mathrm{d}\tau.$$
In short, we will refer to this fact as $t \sim \mathrm{Exp}[\lambda]$, also known as the "exponential clock"[32].
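The exponential clock, together with the standard fact that the minimum of $x$ independent exponential clocks with rate $\lambda$ is again exponential with rate $x\lambda$, is easy to verify numerically. The following sketch uses illustrative values of $\lambda$ and $x$:

```python
import random
import statistics

rng = random.Random(42)
lam = 2.0   # stochastic rate of one exponential clock (illustrative value)
x = 50      # number of independent molecules, each with its own clock
n = 20000   # number of repeated experiments

# Waiting time until the FIRST of the x clocks fires. Theory: the minimum
# of x independent Exp(lam) variables is Exp(x * lam), with mean 1/(x*lam).
first_firings = [
    min(rng.expovariate(lam) for _ in range(x)) for _ in range(n)
]
mean_wait = statistics.mean(first_firings)
print(mean_wait)  # close to 1 / (x * lam) = 0.01
```

This minimum property is precisely what lets the SSA replace x individual clocks with a single clock of rate $x\lambda$, as used in the derivations that follow.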
If, instead, we have $x_1$ independent molecules of the same species, the time until the first reaction firing within this species is given by:
$$t \sim \inf\{\underbrace{\mathrm{Exp}[\lambda], \dots, \mathrm{Exp}[\lambda]}_{x_1 \text{ times}}\} \sim \mathrm{Exp}[x_1 \lambda]. \qquad (15)$$
Here, we made use of the standard result about the minimum of multiple exponential random variables[32]. Since $t$ is an exponential random variable, its expected value is given by $E[t] = (\lambda x_1)^{-1}$, which gives the characteristic time between reaction firings. Thus, the reaction rate $r$ (the amount of substance per volume per time) is given by
$$r = \frac{1}{E[t]} \frac{1}{V N_A} = \frac{x_1 \lambda}{V N_A} = \lambda c_1(t) = k c_1(t), \qquad (16)$$
where the last equality derives from the fact that $c_i = \frac{x_i}{V N_A}$, where $N_A$ is Avogadro's number. Equation (16) settles the relationship between the stochastic rate $\lambda$ and the rate constant $k$ for first order reactions:
$$k = \lambda. \qquad (17)$$
The rates of second order reactions depend on a coincidence of two events: 1. the two reactants collide in the correct configuration; 2. together, they undergo a first order reaction. We thus have a two-stage process:
$$\mathrm{A} + \mathrm{B} \rightleftharpoons \mathrm{AB} \longrightarrow \mathrm{C}, \qquad (18)$$
where AB is an intermediate that represents the species that collided but have not reacted. According to the Arrhenius theory, the first stage settles on an equilibrium: the number of AB is a constant fraction $A_2$ of the total number of couple combinations, $A_2 x_1 x_2$. Since $\mathrm{AB} \longrightarrow \mathrm{C}$ is a first-order mechanism, it features a stochastic rate $\lambda'$ as given by Eq. (16). Consequently, one writes the time until the first reaction firing as
$$t \sim \mathrm{Exp}[\lambda' A_2 x_1 x_2] = \mathrm{Exp}[\lambda x_1 x_2], \quad \lambda = \lambda' A_2, \qquad (19)$$
which, after applying transformations similar to Eq. (16), gives the known approximation for the second-order reaction rate:
$$r = \frac{1}{E[t]} \frac{1}{V N_A} = \frac{\lambda x_1 x_2}{V N_A} = \lambda V N_A\, c_1(t) c_2(t) = k c_1(t) c_2(t).$$
For second order reactions we therefore have:
$$k = \lambda V N_A. \qquad (20)$$
Note that if a second order reaction takes place between members of the same species, then the number of couples is $\frac{1}{2} x_1 (x_1 - 1) \approx \frac{1}{2} x_1^2$, and therefore $k \approx \frac{1}{2} \lambda V N_A$. More generally, if the $j$-th reaction (of arbitrary order now) is isolated, the waiting time that passes before the reaction firing is $t \sim \mathrm{Exp}[\lambda_j x^{\nu_j}]$ and, by analogy to Eq. (15), the time until the earliest event in the case of multiple competing reactions is given by $t \sim \inf_j \mathrm{Exp}[\lambda_j x^{\nu_j}] \sim \mathrm{Exp}[\sum_j \lambda_j x^{\nu_j}]$. Moreover, the probability that this is the $j$-th reaction is given by $P[j] = \frac{\lambda_j x^{\nu_j}}{\sum_i \lambda_i x^{\nu_i}}$. When iterated over multiple time steps, the latter two sampling rules yield the following stochastic process:
$$x_l = x_{l-1} + S z_{l-1}, \quad z_l \sim \left(\mathrm{Poiss}[\lambda_1(t_l) x_l^{\nu_1} \tau_l], \dots, \mathrm{Poiss}[\lambda_M(t_l) x_l^{\nu_M} \tau_l]\right)^\top, \quad l = 1, \dots, L. \qquad (21)$$

SUPPLEMENTARY NOTE 2: MAXIMUM LIKELIHOOD ESTIMATORS
In this supplement, we give the derivations related to the problem of inferring the intensity parameters $\lambda_j(t)$, $j = 1,\dots,M$ of stochastic process (21) from a known realisation of empirical data $\tilde{x}_l, \tilde{z}_l$ on a series of time points $t_l$.

Maximum likelihood estimator for constant reaction rates
We consider the general setting in which the time intervals $\tau_l = t_l - t_{l-1}$, $l = 1,\dots,L$ need not be equispaced. Let $\lambda_j(t) = \lambda_j = \text{const}$; then the rates of the Poisson random variables from Eq. (21) are given by $\lambda_j \tilde{x}_l^{\nu_j}\tau_l$, $l = 1,\dots,L$. Therefore, the probability of observing the configuration $\tilde{x}_l, \tilde{z}_l$ on time intervals $\tau_l$ is given by
$$ \prod_{l=1}^{L}\prod_{j=1}^{M} \left.\frac{e^{-\lambda}\lambda^{y}}{y!}\right|_{y=\tilde{z}_{j,l},\;\lambda=\lambda_j \tilde{x}_l^{\nu_j}\tau_l}, $$
and taking the logarithm of this product gives the log-likelihood of the entire ensemble of data,
$$ f(\lambda_1,\dots,\lambda_M) = \sum_{l=1}^{L}\sum_{j=1}^{M}\left.\left(-\lambda + y\ln\lambda - \ln y!\right)\right|_{y=\tilde{z}_{j,l},\;\lambda=\lambda_j\tilde{x}_l^{\nu_j}\tau_l}, \tag{22} $$
which has the derivatives
$$ \frac{\partial f}{\partial\lambda_j} = -\sum_{l=1}^{L}\tilde{x}_l^{\nu_j}\tau_l + \frac{1}{\lambda_j}\sum_{l=1}^{L}\tilde{z}_{j,l} = -L\langle\tilde{x}_l^{\nu_j}\tau_l\rangle + \frac{L}{\lambda_j}\langle\tilde{z}_{j,l}\rangle, $$
where $\langle x_l\rangle := \frac{1}{L}\sum_{l=1}^{L}x_l$. By equating these derivatives to zero, one obtains expressions for $\lambda_j$:
$$ \lambda_j = \frac{\langle\tilde{z}_{j,l}\rangle}{\langle\tilde{x}_l^{\nu_j}\tau_l\rangle}, \qquad j = 1,\dots,M. \tag{23} $$
In order to estimate the variance of these parameters, $\mathrm{var}(\lambda_1,\dots,\lambda_M)$, we make use of the asymptotic normality of this MLE and write
$$ \mathrm{var}(\lambda_1,\dots,\lambda_M) = -\left(\mathrm{Hess}\,f(\lambda_1,\dots,\lambda_M)\right)^{-1}, \tag{24} $$
where $\mathrm{Hess}\,f(\lambda_1,\dots,\lambda_M) := \frac{\partial^2 f}{\partial\lambda_i\partial\lambda_j}$ is the Hessian matrix. Evaluating this variance estimate for Eq. (22) results in a diagonal covariance matrix, so that
$$ \mathrm{var}(\lambda_j) = \frac{\lambda_j^2}{L\langle\tilde{z}_{j,l}\rangle}. \tag{25} $$

Maximum likelihood estimator for moving-average reaction rates (time-series estimator)
For this estimator we require the time intervals $\tau_l$ to be equispaced. Consider a modification of the previous case in which, for every $l = 1,\dots,L$, the parameter $\lambda(t_l)$ is calculated from a local snippet of the data $\tilde{x}_{l'}, \tilde{z}_{l'}$, where $l' = l-s,\dots,l+s$. Here, $s = 1, 2, \dots$ plays the role of a regularity parameter. Consequently, we obtain the following log-likelihood function for $\lambda_{j,l}$:
$$ f(\lambda_{1,1},\dots,\lambda_{M,L}) = \sum_{l'=l-s}^{l+s}\sum_{j=1}^{M}\left.\left(-\lambda + y\ln\lambda - \ln y!\right)\right|_{y=\tilde{z}_{j,l'},\;\lambda=\lambda_{j,l}\tilde{x}_{l'}^{\nu_j}\tau_{l'}} = \sum_{l'=l-s}^{l+s}\sum_{j=1}^{M}\left(-\lambda_{j,l}\tilde{x}_{l'}^{\nu_j}\tau_{l'} + \tilde{z}_{j,l'}\ln\lambda_{j,l} + \tilde{z}_{j,l'}\ln(\tilde{x}_{l'}^{\nu_j}\tau_{l'}) - \ln(\tilde{z}_{j,l'}!)\right), $$
with derivatives
$$ \frac{\partial f}{\partial\lambda_{j,l}} = -\sum_{l'=l-s}^{l+s}\tilde{x}_{l'}^{\nu_j}\tau_{l'} + \frac{1}{\lambda_{j,l}}\sum_{l'=l-s}^{l+s}\tilde{z}_{j,l'} = (2s+1)\left(\frac{1}{\lambda_{j,l}}\langle\tilde{z}_{j,l}\rangle_s - \langle\tilde{x}_l^{\nu_j}\tau_l\rangle_s\right), $$
where $\langle x_l\rangle_s := \frac{1}{2s+1}\sum_{l'=l-s}^{l+s}x_{l'}$ is the moving average. By equating these derivatives to zero, one obtains expressions for $\lambda_{j,l}$:
$$ \lambda_{j,l} = \frac{\langle\tilde{z}_{j,l}\rangle_s}{\langle\tilde{x}_l^{\nu_j}\tau_l\rangle_s}. \tag{26} $$
By following a derivation analogous to that of Eq. (25), one obtains the variance estimate
$$ \mathrm{var}(\lambda_{j,l}) = \frac{\lambda_{j,l}^2}{(2s+1)\langle\tilde{z}_{j,l}\rangle_s}. \tag{27} $$

Maximum likelihood estimator for exponentially decaying reaction rates
Consider the following ansatz for the parameters of process (21):
$$ \lambda_j(t) = \lambda_{j,0}\,e^{-\alpha_j t}. \tag{28} $$
By plugging $y = \tilde{z}_{j,l}$ and $\lambda = \lambda_{j,0}e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l$ into the log-likelihood function, we obtain
$$ f(\lambda_{1,0},\dots,\lambda_{M,0},\alpha_1,\dots,\alpha_M) = \sum_{l=1}^{L}\sum_{j=1}^{M}\left(-\lambda + y\ln\lambda - \ln y!\right) = \sum_{l=1}^{L}\sum_{j=1}^{M}\left(-\lambda_{j,0}e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l + \tilde{z}_{j,l}\ln\lambda_{j,0} - \tilde{z}_{j,l}\alpha_j t_l + \tilde{z}_{j,l}\ln(\tilde{x}_l^{\nu_j}\tau_l) - \ln\tilde{z}_{j,l}!\right). \tag{29} $$
By equating to zero the partial derivatives with respect to $\lambda_{j,0}$, we obtain
$$ \frac{\partial f}{\partial\lambda_{j,0}} = -\sum_{l=1}^{L}e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l + \frac{1}{\lambda_{j,0}}\sum_{l=1}^{L}\tilde{z}_{j,l} = 0, $$
and consequently
$$ \lambda_{j,0} = \frac{\langle\tilde{z}_{j,l}\rangle}{\langle e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l\rangle}. \tag{30} $$
In a similar fashion, we compute the derivatives with respect to $\alpha_j$ and equate them to zero:
$$ \frac{\partial f}{\partial\alpha_j} = \lambda_{j,0}\sum_{l=1}^{L}t_l e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l - \sum_{l=1}^{L}\tilde{z}_{j,l}t_l = \lambda_{j,0}L\langle t_l e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l\rangle - L\langle\tilde{z}_{j,l}t_l\rangle = 0. $$
Plugging Eq. (30) into the latter equality gives
$$ \frac{\langle\tilde{z}_{j,l}\rangle}{\langle e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l\rangle}\,\langle t_l e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l\rangle - \langle\tilde{z}_{j,l}t_l\rangle = 0, $$
and since $\langle e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l\rangle > 0$,
$$ \left\langle\left(t_l\langle\tilde{z}_{j,l}\rangle - \langle\tilde{z}_{j,l}t_l\rangle\right)\tilde{x}_l^{\nu_j}\tau_l\,\omega_j^{t_l}\right\rangle = 0, \qquad \omega_j\in[0,1], \qquad \alpha_j = -\ln\omega_j. \tag{31} $$
If each of these transcendental equations has a unique real root $\omega_j\in[0,1]$, the latter uniquely determines $\alpha_j$. Equation (31) can be solved numerically by, for example, the bisection method. As a special case, when the time points $t_l = hl$, $l = 1,2,\dots,L$ are equispaced, Eqs. (31) become polynomial equations. For each $j$:
$$ \alpha_j = -\frac{1}{h}\ln y, \qquad \text{where} \qquad \sum_{l=1}^{L}a_l y^l = 0 \tag{32} $$
and $a_l = \left(l\langle\tilde{z}_{j,l}\rangle - \langle\tilde{z}_{j,l}l\rangle\right)\tilde{x}_l^{\nu_j}$.
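For the equispaced special case, the chain Eqs. (30)–(32) can be condensed into a few lines of numpy. The sketch below is ours, not part of the paper: it handles a single reaction channel, assumes a grid $t_l = hl$, and obtains the root of Eq. (32) through `numpy.roots`.

```python
import numpy as np

def decay_mle(t, x, z, nu=1):
    """Exponential-decay MLE for one reaction channel, Eqs. (30)-(32):
    lambda(t) = lam0 * exp(-alpha * t), on an equispaced grid t_l = h*l."""
    t, x, z = map(np.asarray, (t, x, z))
    h = t[1] - t[0]                       # grid spacing; tau_l = h for all l
    L = len(t)
    l = np.arange(1, L + 1)
    # coefficients a_l = (l<z> - <z l>) x_l^nu of the polynomial in Eq. (32)
    a = (l * z.mean() - (z * l).mean()) * x ** nu
    # solve sum_l a_l y^l = 0; np.roots expects the highest power first,
    # so reverse a and append the (zero) constant term
    roots = np.roots(np.concatenate([a[::-1], [0.0]]))
    y = [r.real for r in roots
         if abs(r.imag) < 1e-8 and 1e-10 < r.real < 1.0 - 1e-10]
    alpha = -np.log(y[0]) / h             # alpha = -(1/h) ln y
    # lam0 from Eq. (30), with tau_l = h
    lam0 = z.mean() / (np.exp(-alpha * t) * x ** nu * h).mean()
    return lam0, alpha
```

On noise-free data $z_l = \lambda_0 e^{-\alpha t_l}x_l^{\nu}h$ the score equations are satisfied exactly at the true parameters, so the function returns $(\lambda_0, \alpha)$ up to round-off; on Poisson-distributed $\tilde{z}_{j,l}$ it returns the MLE.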
The roots of this polynomial equation can be found numerically, for example by reformulating the problem as the eigenvalue problem of the corresponding companion matrix. Analogously to Eq. (24), the variances of $\lambda_{j,0}$ and $\alpha_j$ can be computed from the Hessian matrices of the corresponding log-likelihood functions. These matrices are not diagonal; however, at $t = 0$ we have $\lambda_{j,0}e^{-\alpha_j t} = \lambda_{j,0}$ and therefore
$$ \mathrm{var}(\lambda_j) = \mathrm{var}(\lambda_{j,0}) = \frac{\lambda_{j,0}^2}{L\langle\tilde{z}_{j,l}\rangle}. $$
In a similar fashion, when $t \gg 1$, $\lambda_{j,0}e^{-\alpha_j t} = e^{\left(\frac{1}{t}\ln\lambda_{j,0} - \alpha_j\right)t} \approx e^{-\alpha_j t}$ and
$$ \mathrm{var}(\alpha_j) = \frac{1}{L\lambda_{j,0}\langle t_l^2 e^{-\alpha_j t_l}\tilde{x}_l^{\nu_j}\tau_l\rangle}. $$

Maximum likelihood estimator for exp-polynomial reaction rates
In this estimator we assume the ansatz
$$ \lambda_j(t) = e^{-p_j(t)}, \tag{33} $$
where $p_j(t) = \alpha_{j,0} + \alpha_{j,1}t + \alpha_{j,2}t^2 + \dots + \alpha_{j,S}t^S$. By plugging $y = \tilde{z}_{j,l}$ and $\lambda = \lambda_j(t_l)\tilde{x}_l^{\nu_j}\tau_l = e^{-p_j(t_l)}\tilde{x}_l^{\nu_j}\tau_l$ into the log-likelihood function, we obtain
$$ f(\alpha_{1,0},\dots,\alpha_{M,S}) = \sum_{l=1}^{L}\sum_{j=1}^{M}\left(-\lambda + y\ln\lambda - \ln y!\right) = L\sum_{j=1}^{M}\left\langle -e^{-p_j(t_l)}\tilde{x}_l^{\nu_j}\tau_l - \tilde{z}_{j,l}p_j(t_l) + \tilde{z}_{j,l}\ln(\tilde{x}_l^{\nu_j}) + \tilde{z}_{j,l}\ln\tau_l - \ln(\tilde{z}_{j,l}!)\right\rangle, $$
which has the derivatives
$$ \frac{\partial f}{\partial\alpha_{j,s}} = L\left\langle e^{-p_j(t_l)}t_l^s\tilde{x}_l^{\nu_j}\tau_l - \tilde{z}_{j,l}t_l^s\right\rangle. $$
We obtain $M\cdot(S+1)$ equations that define $\alpha_{j,s}$ by equating these derivatives to zero:
$$ \left\langle\left(e^{-p_j(t_l)}\tilde{x}_l^{\nu_j}\tau_l - \tilde{z}_{j,l}\right)t_l^s\right\rangle = 0. $$
As in the preceding case, the variance analysis is performed by computing the Hessian matrix of the log-likelihood function:
$$ \frac{\partial^2 f}{\partial\alpha_{j_1,s_1}\partial\alpha_{j_2,s_2}} = \begin{cases} -L\langle e^{-p_j(t_l)}\tilde{x}_l^{\nu_j}\tau_l\, t_l^{s_1}t_l^{s_2}\rangle, & j_1 = j_2,\\ 0, & j_1 \neq j_2, \end{cases} $$
so that
$$ \mathrm{var}(\alpha_{j,0},\alpha_{j,1},\dots,\alpha_{j,S}) = \frac{1}{L}H^{-1}, \qquad \text{where} \quad H_{k,s} = \langle e^{-p_j(t_l)}\tilde{x}_l^{\nu_j}\tau_l\, t_l^k t_l^s\rangle. $$
Moreover, this covariance matrix translates into the total variance of the logarithm of the rate parameter in the following way:
$$ \mathrm{var}(\ln\lambda_j(t)) = \mathrm{var}\left(\sum_{s=0}^{S}\alpha_{j,s}t^s\right) = \frac{1}{L}\,b^\top H^{-1}b, \tag{34} $$
where $b = (1, t, t^2, \dots, t^S)^\top$.

SUPPLEMENTARY NOTE 3: INFERENCE OF THE KINETIC RATES FOR HDDA POLYMERISATION
The simplest way to infer the rate parameters from the MD data is to apply the constant MLE, Eq. (23). The rate parameters inferred in this way from the HDDA microsystem, as shown in Supplementary Table I, feature tight confidence intervals. Can we, however, claim that the constant MLE is a good choice? Let us consider the most general MLE $\lambda_j(\chi)$ as given by Eq. (33), using the conversion $\chi(t) = 1 - \frac{c(t)}{c(0)}$ as the time variable.

TABLE I: Inferred reaction rate parameters for HDDA polymerisation as given by the constant MLE: propagation and termination rate constants $k$ [L mol$^{-1}$ s$^{-1}$] at $T$ = 200, 250, 300, 350, 400, 450, 500, 550 and 600 K. [Numerical entries are not reproduced here.]

FIG. 5: Conversion-dependent rate pre-factors as estimated with MLEs of various order (the red line and the $2\sigma$ confidence margins). The apparent time-series pre-factor is given for reference (the black line). The optimal balance between small residual and high certainty corresponds to order 4.

Note that setting the polynomial order $S = 0$ reduces this MLE to the previous case. In Supplementary Figure 5 we explore how different orders $S = 0, 1, \dots$ affect the estimated pre-factor $A(\chi)$ and the corresponding confidence intervals. To quantify the quality of the exp-polynomial estimator, we calculate the residual
$$ r = \int\left|\ln\lambda_j(\chi) - \ln\lambda_j^*(t)\right|\mathrm{d}\chi, $$
where $\lambda_j^*(t)$ is given by the time-series estimator (26). Generally speaking, the higher the order of the polynomial, the smaller the values of $r$. Yet this is not the case for the variance of $r$, which has a tendency to increase with the polynomial order (a trend that can also be seen in Supplementary Figure 5). Employing the fact that
$$ \mathrm{var}(r) = \int\mathrm{var}(\ln\lambda_j(\chi))\,\mathrm{d}\chi, $$
we find the upper bound of the confidence interval to be $c = r + 2\sqrt{\mathrm{var}(r)}$. The optimal polynomial order is then defined as the order that yields the smallest value of $c$. The left panel of Supplementary Figure 6 shows that the residual indeed tends to decrease with increasing polynomial order, whereas the right panel shows that there is an optimum, $S = 4$, at which the confidence interval is the smallest in most of the MD trajectories. One can also see from Supplementary Figure 6 (left) that the accuracy increases around 5–10 fold when we use the 4th-order estimator as opposed to the constant one (i.e. the 0th order). A similar analysis for the termination reaction reveals the optimal order $S = 3$, see Supplementary Figure 7.
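The selection rule above, compute $r$ and $c = r + 2\sqrt{\mathrm{var}(r)}$ for each candidate order and keep the smallest $c$, can be sketched in a few lines. This is our own minimal illustration: the function names are ours, and the fitted curves and their pointwise variances (via Eq. (34)) are assumed to be supplied on a common conversion grid.

```python
import numpy as np

def _trapz(f, chi):
    # simple trapezoidal rule: integral of f over the conversion grid chi
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(chi)))

def optimal_order(chi, log_lam_ref, fits):
    """Select the polynomial order S minimising c = r + 2*sqrt(var(r)).

    chi         : conversion grid on [0, 1]
    log_lam_ref : ln(lambda*) from the time-series estimator, Eq. (26)
    fits        : dict {S: (log_lam_S, var_log_lam_S)}, where var_log_lam_S
                  is var(ln lambda_S(chi)) evaluated via Eq. (34)
    """
    best = None
    for S, (log_lam, var_log_lam) in sorted(fits.items()):
        r = _trapz(np.abs(log_lam - log_lam_ref), chi)    # residual
        c = r + 2.0 * np.sqrt(_trapz(var_log_lam, chi))   # confidence bound
        if best is None or c < best[1]:
            best = (S, c)
    return best   # (optimal order, its bound c)
```

The trade-off is visible directly in the rule: raising $S$ shrinks $r$ but inflates $\mathrm{var}(r)$, so $c$ attains a minimum at a finite order.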
We therefore report the inferred rate coefficients using the 4th-order polynomial for the propagation reaction:
$$ k_1(\chi,T) = A_1(\chi)\,e^{-E_{a,1}/RT} = C\,e^{-\left(\alpha_{1,4}\chi^4 + \alpha_{1,3}\chi^3 + \alpha_{1,2}\chi^2 + \alpha_{1,1}\chi + \alpha_{1,0}\right)}\,e^{-E_{a,1}/RT}, \tag{35} $$
and the 3rd-order polynomial for the termination reaction:
$$ k_2(\chi,T) = A_2(\chi)\,e^{-E_{a,2}/RT} = C\,e^{-\left(\alpha_{2,3}\chi^3 + \alpha_{2,2}\chi^2 + \alpha_{2,1}\chi + \alpha_{2,0}\right)}\,e^{-E_{a,2}/RT}, \tag{36} $$
where the activation energy is $E_{a,1} = 3.{\cdots}\cdot10^{\cdots}$ for the propagation and $E_{a,2} = 8.{\cdots}\cdot10^{\cdots}$ for the termination reaction, the scaling constant is $C = V N_A = 452$ L mol$^{-1}$, and the polynomial coefficients are given in Supplementary Table II. Note that the units of the exp-polynomial in Eq. (35) are s$^{-1}$, so that the units of $k_1(\chi,T)$ and $k_2(\chi,T)$ are L mol$^{-1}$ s$^{-1}$. Unlike the case of the constant estimator, where we can report the confidence intervals in a concise manner, for higher polynomial orders the total uncertainty of the MLE is given by covariance matrices that are not diagonal and depend on the temperature, see Supplementary Tables III and IV. Since $\Sigma_T = H^{-1}$, the covariance matrices can readily be plugged into Eq. (34) for uncertainty quantification. Note that, in consistence with Eqs. (35) and (36), the covariance matrices are computed using $\chi(t)$ as the time variable.

FIG. 6: The effect of the polynomial order $S$ of the MLE for the propagation reaction. (Left:) The estimator residual $r$ as a function of $S$. (Right:) The upper bound $c$ of the residual confidence interval as a function of $S$. The upper bound is averaged over 4 simulation runs. The colour scheme is identical on both panels and indicates the simulation temperature.

FIG. 7: The effect of the polynomial order $S$ of the MLE for the termination reaction. (Left:) The estimator residual $r$ as a function of $S$. (Right:) The upper bound $c$ of the residual confidence interval as a function of $S$. The bound is averaged over 4 simulation runs. The colour scheme is identical on both panels and indicates the simulation temperature.
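Given the coefficients of Supplementary Table II and the activation energies, the reported rate laws of Eqs. (35) and (36) reduce to a one-line evaluation. The sketch below is ours; the coefficient values passed in are placeholders for illustration, not the inferred ones.

```python
import numpy as np

R = 8.314   # gas constant, J/(mol K)
C = 452.0   # scaling constant V*N_A in L/mol, as quoted in the text

def rate_constant(chi, T, alphas, Ea):
    """Evaluate k(chi, T) = C * exp(-p(chi)) * exp(-Ea/(R*T)), Eqs. (35)-(36).

    alphas : polynomial coefficients (alpha_{j,S}, ..., alpha_{j,0}),
             highest power first, as would be read from Supplementary Table II
    Ea     : activation energy in J/mol (placeholder value in the example)
    """
    p = np.polyval(alphas, chi)          # p(chi) = alpha_S chi^S + ... + alpha_0
    return C * np.exp(-p) * np.exp(-Ea / (R * T))
```

For all-zero coefficients and zero activation energy the expression reduces to the scaling constant $C$, which provides a quick sanity check of the unit bookkeeping.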
TABLE II: Polynomial coefficients $\alpha_{1,0},\dots,\alpha_{1,4}$ (propagation) and $\alpha_{2,0},\dots,\alpha_{2,3}$ (termination) of Eqs. (35) and (36) at temperatures $T$ = 200–600 K. [Numerical entries are not reproduced here.]

TABLE III: Covariance matrices as given by Eq. (34) for the 4th-order estimator (propagation reaction at various temperatures). [Matrix entries are not reproduced here.]

TABLE IV: Covariance matrices as given by Eq. (34) for the 3rd-order estimator (termination reaction at various temperatures). [Matrix entries are not reproduced here.]