[PDF] The expected number of viable autocatalytic sets in chemical reaction systems

Abstract

The emergence of self-sustaining autocatalytic networks in chemical reaction systems has been studied as a possible mechanism for modelling how living systems first arose. It has been known for several decades that such networks will form within systems of polymers (under cleavage and ligation reactions) under a simple process of random catalysis, and this process has since been mathematically analysed. In this paper, we provide an exact expression for the expected number of self-sustaining autocatalytic networks that will form in a general chemical reaction system, and the expected number of these networks that will also be uninhibited (by some molecule produced by the system). Using these equations, we are able to describe the patterns of catalysis and inhibition that maximise or minimise the expected number of such networks. We apply our results to derive a general theorem concerning the trade-off between catalysis and inhibition, and to provide some insight into the extent to which the expected number of self-sustaining autocatalytic networks coincides with the probability that at least one such system is present.

Full PDF

PPREDICTING THE NUMBER OF VIABLE AUTOCATALYTIC SETS INSYSTEMS THAT COMBINE CATALYSIS AND INHIBITION

STUART KAUFFMAN AND MIKE STEEL

Abstract.

The emergence of self-sustaining autocatalytic networks in chemical reaction sys-tems has been studied as a possible mechanism for modelling how living systems ﬁrst arose. Ithas been known for several decades that such networks will form within systems of polymers(under cleavage and ligation reactions) under a simple process of random catalysis, and thisprocess has since been mathematically analysed. In this paper, we provide an exact expressionfor the expected number of self-sustaining autocatalytic networks that will form in a generalchemical reaction system, and the expected number of these networks that will also be uninhib-ited (by some molecule produced by the system). Using these equations, we are able to describethe patterns of catalysis and inhibition that maximise or minimise the expected number of suchnetworks. We apply our results to derive a general theorem concerning the trade-oﬀ betweencatalysis and inhibition, and to provide some insight into the extent to which the expectednumber of self-sustaining autocatalytic networks coincides with the probability that at leastone such system is present.

Address:

M. Steel (corresponding author):Biomathematics Research Centre,University of Canterbury, Christchurch, New ZealandTel.: +64-3-364-2987 ext 7688Fax: [email protected] KauﬀmanAﬃliate Professor Institute for Systems BiologyEmeritus Professor, Biochemistry and Biophysics, University of Pennsylavania, PA, USAstukauﬀ[email protected]

Keywords: autocatalytic network, catalysis, inhibition, random process a r X i v : . [ q - b i o . M N ] J u l STUART KAUFFMAN AND MIKE STEEL Introduction

A characteristic of living systems is the ability for metabolism to be simultaneously self-sustaining and collectively autocatalytic. Systems that combine these two general propertieshave been studied within a formal framework that is referred to as RAF theory [9]. We giveprecise deﬁnitions shortly but, roughly speaking, a ‘RAF’ is a subset of reactions where thereactants and at least one catalyst of each reaction in the subset can be produced from anavailable food set by using reactions from within the subset only. The study of RAFs tracesback to pioneering work on ‘collectively autocatalytic sets’ in polymer models of early life[15, 16], which was subsequently developed mathematically (see [7, 9] and the references there-in). RAF algorithms have been applied recently to investigate the traces of earliest metabolismthat can be detected in large metabolic databases across bacteria and archaea [19], leadingto the development of an open-source program to analyse and visualise RAFs in complexbiochemical systems [11]. RAF theory overlaps with other graph-theoretic approaches in whichthe emergence of directed cycles in reaction graphs plays a key role [1, 12, 13], and is alsorelated to (M, R) systems [3, 14] and chemical organisation theory [4].RAF theory has also been applied in other ﬁelds, including ecology [2] and cognition [6], andthe ideas may have application in other contexts. In economics, for instance, the production ofconsumer items can be viewed as a catalysed reaction; for example, the production of a woodentable involves nails and wood (reactants) and a hammer (a catalyst, as it is not used up inthe reaction but makes the reaction happen much more eﬃciently) and the output (reactionproduct) is the table. On a larger scale, a factory is a catalyst for the production of the itemsproduced in it from reactants brought into the factory. In both these examples, notice thateach reactant may either be a raw material (i.e. the elements of a ‘food set’) or a productsof other (catalysed) reactions, whereas the products may, in turn, be reactants, or catalysts,for other catalysed reactions. Products can sometimes also inhibit reactions; for example, theproduction of internal combustion engines resulted in processes for building steam engines beingabandoned.In this paper, we extend RAF theory further by investigating the impact of diﬀerent modesof catalysis and inhibition on the appearance of (uninhibited) RAF subsets. We focus on theexpected number of such sets (rather than on the probability that at least one such set exists[5, 17]), as this leads to explicit mathematical expressions, as well as providing some insightinto the expected population sizes of RAFs for the catalysis rate at which they ﬁrst appear (aswe discuss in Section 4.2). We begin with some formal deﬁnitions.1.1.

Deﬁnitions.

Let X be a set of molecule types; R a set of reactions, where each reactionconsists of a subset of molecule types as input (‘reactants’) and a set of molecule types asoutputs (‘products’); and let F be a subset of X (called a ‘food set’). We refer to the triple Q = ( X, R, F ) as a chemical reaction system with food set and, unless stated otherwise, weimpose no further restrictions on Q (e.g. it need not correspond to a system of polymers anda reaction can have any positive number of reactants and any positive number of products).Given a reaction r ∈ R , we let ρ ( r ) ⊆ X denote the set of reactants of r and π ( r ) denote theset of products of r . Moreover, given a subset R (cid:48) of R , we let π ( R (cid:48) ) = (cid:83) r ∈ R (cid:48) π ( r ) . XPECTED NUMBER OF UNINHIBITED RAF SETS 3

A subset R (cid:48) of R is said to be F -generated if R (cid:48) can be ordered r , r , . . . , r | R (cid:48) | so that ρ ( r ) ⊆ F and for each i ∈ { , . . . , | R |} , we have ρ ( r i ) ⊆ F ∪ π ( { r , . . . , r i − } ). In other words, R (cid:48) is F -generated if the R (cid:48) can be built up by starting from one reaction that has all its reactantsin the food set, then adding reactions in such a way that each added reaction has each of itsreactants present either in the food set or as a product of a reaction in the set generated so far.Now suppose that certain molecule types in X can catalyse certain reactions in R . A subset R (cid:48) of R is said to be Reﬂexively Autocatalytic and F-generated (more brieﬂy, a

RAF ) if R (cid:48) isnonempty and each reaction r ∈ R (cid:48) is catalysed by at least one molecule type in F ∪ π ( R (cid:48) ) and R (cid:48) is F -generated. We may also allow certain molecule types to also inhibit reactions in R , inwhich case a subset R (cid:48) of R is said to be an uninhibited RAF (uRAF) if R (cid:48) is a RAF and noreaction in R (cid:48) is inhibited by any molecule type in F ∪ π ( R (cid:48) ). Since a union of RAFs is alsoa RAF, when a RAF exists in a system, there is a unique maximal RAF. However, the samedoes not apply to uRAFs – in particular, the union of two uRAFs can fail to be a uRAF. Theseconcepts are illustrated in Fig. 1. r ab xz r x ′ z ′ r ′ r ′ a ′ b ′ c c ′ w ′ w r r Figure 1.

A chemical reaction system consisting of the set of molecule types X = { a, b, c, a (cid:48) , b (cid:48) , c (cid:48) , w, w (cid:48) , z, z (cid:48) } , a food set F = { a, b, c, a (cid:48) , b (cid:48) , c (cid:48) } and the reactionset R = { r , r , r (cid:48) , r (cid:48) , r , r } (square vertices). Solid arcs indicate two reactantsentering a reaction and a product coming out. Catalysis is indicated by dashedarcs (blue) and inhibition is indicated by dotted arcs (red). The full set of re-actions is not a RAF, but it contains several RAFs that are contained in theunique maximal RAF R (cid:48) = { r , r (cid:48) , r , r (cid:48) } (note that r is not part of this RAFeven though it is catalysed and the reactants of r are present in the food set).This RAF is not a uRAF; however, { r , r } and { r (cid:48) , r (cid:48) } are uRAFs, and so are { r } , { r (cid:48) } and { r , r (cid:48) } . STUART KAUFFMAN AND MIKE STEEL Modelling catalysis and inhibition

We will model catalysis and also blocking (inhibition) by random processes. To provide forgreater generality, we allow the possibility that elements in a subset C − (respectively, B − ) ofthe the food set cannot catalyse (respectively block) any reaction in R . Let c = | F \ C − | and b = | F \ B − | . Thus c (respectively b ) is the number of food elements that are possible catalysts(respectively blockers).Suppose that each molecule type x ∈ X \ C − has an associated probability C x of catalysingany given reaction in R . We will treat C x as a random variable taking values in [0 ,

1] with acommon distribution D . This results in a random assignment of catalysis (i.e. a random subset χ of X × R ), where ( x, r ) ∈ χ if x catalyses r . Let C x,r be the event that x catalyses r .We assume that:( I ) C = ( C x , x ∈ X \ C − ) is a collection of independent random variables.( I ) Conditional on C , ( C x,r : x ∈ X \ C − , r ∈ R ) is a collection of independent events.Since the distribution of C x is the same for all x ∈ X \ C − , we will use C to denote this randomvariable. Let µ C = E [ C ] and, for i ≥

0, let λ i be the i –th moment of 1 − C ; that is: λ i = E [(1 − C ) i ] . Although our results concern general catalysis distributions, we will pay particular attentionto three forms of catalysis: • The uniform model:

Each x ∈ X \ C − catalyses each reaction in R with a ﬁxed proba-bility p . Thus, C = p with probability 1, and so µ C = p . • The sparse model: C = u with probability π and C = 0 with probability 1 − π , and so µ C = uπ . • The all-or-nothing model: C = 1 with probability π and C = 0 with probability 1 − π ,and so µ C = π .The uniform model is from Kauﬀman’s binary polymer network and has been the defaultfor most recent studies involving polymer models [9]. More realistic catalysis scenarios can bemodelled by allowing C to take a range of values values around µ C with diﬀerent probabilities.The sparse model generalises the uniform model slightly by allowing a (random) subset ofmolecule types to be catalysts. In this model, π would typically be very small in applications(i.e. most molecules are not catalysts but those few that are will catalyse a lot or reactions, asin the recent study of metabolic origin of [19]). The all-or-nothing model is a special case ofthe sparse model. The emergence of RAFs in these models (and others, including a power-lawdistribution) was investigated in [8].For these three models, the associated λ i values are given as follows: λ = 1, and for all i ≥ λ i =  (1 − µ C ) i , (uniform model);1 − π + π (1 − u ) i , (sparse model);1 − π, (all-or-nothing model) . XPECTED NUMBER OF UNINHIBITED RAF SETS 5

In addition to catalysis, we may also allow random blocking (inhibition) of reactions bymolecules, formalised as follows. Suppose that each molecule type x ∈ X \ B − has an associatedprobability B x of blocking any given reaction in R . We will treat B x as a random variable takingvalues in [0 ,

1] with a common distribution ˆ D . This results in a random assignment of blocking( i.e. a random subset B of X × R ), where ( b, r ) ∈ B if b blocks reaction r . Let B x,r be theevent that x blocks r . We assume that:( I (cid:48) ) B = ( B x , x ∈ X \ B − ) is a collection of independent random variables.( I (cid:48) ) Conditional on B , ( B x,r : x ∈ X \ C − , r ∈ R ) is a collection of independent events.Since the distribution of B x is the same for all x , we will use B to denote this random variable,let µ B = E [ B ] and, for i ≥

0, let: ˆ λ i = E [(1 − B ) i ] . We also assume that catalysis and inhibition are independent of each other. Formally, this isthe following condition:( I ) C –random variables in ( I , I ) are independent of the B –random variables in ( I (cid:48) , I (cid:48) ).Note that ( I ) allows the possibility that a molecule type x both catalyses and blocks thesame reaction r (the eﬀect of this on uRAFs is the same as if x just blocks r ; (i.e. blocking isassumed to trump catalysis)). Notice also that λ = ˆ λ = 1.3. Generic results

To state our ﬁrst result, we require two further deﬁnitions. Let µ RAF and µ uRAF denote theexpected number of RAFs and uRAFs (respectively) arising in Q under the the random processof catalysis and inhibition described. For integers n, s ≥ n r,s be the number of F-generatedsubsets R (cid:48) of R for which the total number of non-food products in X produced by reactionsin R (cid:48) is s . Note that n r,s = 0 for s > min {| X | − F, rM } where M is the maximum number ofproducts of any single reaction.Part (i) of the following theorem gives an exact expression for µ RAF and µ uRAF , which wethen use in Parts (ii) and (iii) to describe the catalysis and inhibition distributions (having agiven mean) that minimise or maximise the expected number of RAFs and uRAFs. We applythis theorem to particular systems in the next section. Theorem 1.

Let Q be any chemical reaction system with food set, accompanied by catalysisand inhibition distributions D and ˆ D , respectively.(i) The expected number of RAFs and uRAFs for Q is given as follows: (2) µ RAF = (cid:88) r,s n r,s (cid:32) r (cid:88) i =0 ( − i (cid:18) ri (cid:19) λ s + ci (cid:33) and (3) µ uRAF = (cid:88) r,s n r,s (cid:32) r (cid:88) i =0 ( − i (cid:18) ri (cid:19) λ s + ci (cid:33) ˆ λ s + br . STUART KAUFFMAN AND MIKE STEEL (ii) Among all distributions D on catalysis having a given mean µ C , the distribution thatminimises the expected number of RAFs and uRAFs (for any inhibition distribution) isthe uniform model (i.e. C = µ C with probability 1).(iii) Among all distributions ˆ D on inhibition having a given mean µ B , the following hold:(a) the distribution that minimises the expected number of uRAFs (for any catalysisdistribution) is the uniform model ( B = µ B with probability 1).(b) the distribution that maximises the expected number of uRAFs (for any catalysisdistribution) is the all-or-nothing inhibition model (i.e. B = 1 with probability µ B ,and B = 0 with probability − µ B ) . Remarks. • If P RAF and P uRAF are the probability that Q contains a RAF and a uRAF, respectively,then these quantities are bounded above as follows: P RAF ≤ µ RAF and P uRAF ≤ µ uRAF . This follows from the well-known inequality P ( V > ≤ E [ V ] for any non-negativeinteger-valued random variable V , upon taking V to be the number of RAFs (or thenumber of uRAFs). We will explore the extent to which P RAF underestimates µ RAF inSection 4.2. • Theorem 1 makes clear that the only relevant aspects of the network (

X, R ) for µ RAF and µ uRAF are encoded entirely within the coeﬃcients n r,s (the two stochastic termsdepend only on r and s but not on further aspects of the network structure). Bycontrast, an expression for the probabilities P RAF and P uRAF that a RAF or uRAFexists requires more detailed information concerning the structure of the network. Thisis due to dependencies that arise in the analysis. Notice also that Theorem 1 allows thecomputation of µ uRAF in O ( | R | × | X | ) steps (assuming that the λ i , ˆ λ i and n r,s valuesare available). • Although the computation or estimation of n r,s may be tricky in general systems,Eqn. (2) can still be useful (even with little or no information about n r,s ) for askingcomparative types of questions. In particular, Parts (ii) and (iii) provide results thatare independent of the details of the network ( X, R, F ). In the context of the binarypolymer model, they provide some insight into the ﬁndings that more variable catalysisrates allow RAFs to arise at lower average catalysis rates than under uniform catalysis[8], as illustrated in Fig. 2. 4.

Applications

Inhibition-catalysis trade-oﬀs under the uniform model.

For any model in whichcatalysis and inhibition are uniform, Theorem 1 provides a simple prediction concerning how theexpected number of uRAFs compares with a model with zero inhibition (and a lower catalysisrate). To simplify the statement, we will assume b = c and we will write µ uRAF ( p, tp ) to denotethe dependence of µ uRAF on µ C = p and µ B = tp for some value of t . We will also write XPECTED NUMBER OF UNINHIBITED RAF SETS 7

J Math Chem

Fig. 3

Comparison across the four models of P n (probability of a RAF) as a function of f (average catalysisrate per molecule) on the binary polymer model for n =

10 and n = Consider now the uniform model in which, in addition to catalysis, each molecule typeindependently inhibits each reaction with a constant probability. Let h be the averagenumber of reactions each molecule type inhibits, and, as before, let f be the averagenumber of reactions each molecule type catalyzes ( f = p | R n | ) . Recall that a u -RAFis a RAF R ! for which no reaction in R ! is inhibited by any molecule type in the set S ( R ! ) consisting of the products of reactions in R ! together with the molecule typesin the food set F .Clearly if h > u -RAF is bounded above by a constant δ ( h ) < f → ∞ ) since there is a positive probability that all of the reactionsthat just involve reactants from the food set F are inhibited. Moreover δ ( h ) → + as h grows, even if f were to grow exponentially (or faster) with h . Despite this ‘trumps–all’ feature of inhibition, one can still provide an inequality that relates the expectednumber of u -RAFs in a uniform polymer model with rates ( f , h ) to the expectednumber of RAFs in the uniform model with catalysis rate f .Let us assume that the catalysis rate f grows linearly with n (this is a minorrestriction since this rate sufﬁces for RAFs to arise) and allow h to also grow linearlywith n but with a constant factor that decays exponentially with the correspondingconstant factor for f . Under this restriction, Theorem 2 (below) shows that we expect(at least) as many u -RAFs as there would be RAFs if the catalysis rate had simplybeen halved.Given a uniform polymer model Q n (over any ﬁnite alphabet of κ states) let µ( f , h ) be the expected number of u -RAFs as a function of f and h (the catalysis and inhibitionrates, respectively) and let µ( f ) be the expected number of RAFs as a function of f . Figure 2.

The probability of RAFs in the binary polymer network under diﬀer-ent catalysis models with the same mean value µ C (from the simulation resultsreported in [8] polymers up to length n = 10 and n = 16). The ﬁnding that theuniform model requires a higher rate of catalysis for RAFs to emerge than modelsthat allow more variation in catalysis rates (sparse, or all-or-none) is consistentwith Theorem 1(ii). A power-law distribution (not studied in this paper) hassimilar behaviour to uniform model. This ﬁgure is from [8]. p = ν/N , where N is the total number of molecule types that are in the food set or can begenerated by a sequence of reactions in R . We assume in the following result that p is small(in particular, < /

2) and N is large (in particular, (1 − ν/N ) N can be approximated by e − ν ).The following result applies to any chemical reaction system and provides a lower bound onthe expected number of uRAFs in terms of the expected number of RAFs in the system withno inhibition (and half the catalysis rate); its proof relies on Theorem 1 and is provided in theAppendix. Corollary 1.

For all non-negative values of t with t ≤ ν ln(1 + e − ν ) , the following inequalityholds: µ uRAF (2 p, tp ) ≥ µ RAF ( p, . Explicit calculations for two models on a subclass of networks.

We now consider elementary chemical reaction systems (i.e. systems for which each reaction has all its reactantsin the food set, as studied in [18]), with the further assumptions that (i) each reaction hasexactly one product, (ii) diﬀerent reactions produce diﬀerent products, (iii) no reactions areinhibited, and (iv) no food element catalyses any reaction. We can associate with each suchsystem a directed graph G on the set X − F of products of the reactions, with an arc from x STUART KAUFFMAN AND MIKE STEEL to y if x catalyses the reaction that produces y (this models a setting investigated in [12, 13]).RAF subsets are then in one-to-one correspondence with the subgraphs of G for which eachvertex has indegree at least one. In particular, a RAF exists if and only if there is a directedcycle in G (which could be an arc from a vertex to itself). In this simple set-up, if N denotesthe number of reactions (= number of non-food molecule types) then: n r,s = (cid:40)(cid:0) Nr (cid:1) , if r = s ;0 , otherwise.Applying Theorem 1(i) gives:(4) µ RAF = N (cid:88) j =0 (cid:18) Nj (cid:19) (cid:32) j (cid:88) i =0 ( − i (cid:18) ji (cid:19) λ ji (cid:33) . Regarding catalysis, consider ﬁrst the all-or-nothing model , for which λ i = 1 − π = 1 − µ C .Applying the identity (cid:80) nk =1 (cid:0) nk (cid:1) x k = (1 + x ) k (twice), Eqn. (4) simpliﬁes to:(5) µ RAF = 2 N − (2 − µ C ) N . This expression can also be derived by the following direct argument. First, note that asubset S of the N products of reactions does not correspond to a RAF if and only if each of the | S | elements x in S has C x = 0. The random variable W = |{ x : C x = 1 }| follows the binomialdistribution Bin ( N, µ C ), and the proportion of sets of size N that avoid a given set S of size k is 2 − k . Thus the expected proportion of subsets that are not RAFs is the expected value of2 − W where W is the binomial distribution above. Applying standard combinatorial identitiesthen leads to Eqn. (5).The probability of a RAF in this special case is also easily computed:(6) P RAF = 1 − (1 − µ C ) N . Notice that one can select µ C to tend to 0 in such a way P RAF converges to 0 exponentiallyquickly with N while µ RAF tends to inﬁnity at an exponential rate with N (this requires µ C todecay suﬃciently fast with N but not too fast, e.g. µ C = Θ( N − − β ) for β > µ RAF ( µ C ) = 2 N P RAF ( µ C / . By contrast, for the uniform model , applying straightforward algebra to Eqn. (4) leads to(7) µ RAF = N (cid:88) j =1 (cid:18) Nj (cid:19) (cid:0) − (1 − µ C ) j (cid:1) j . Asymptotics of the two models at the catalysis level where RAFs arise:

For theall-or-nothing and uniform models, RAFs arise with a given (positive) probability, providedthat µ C converges to 0 no faster than N − as N grows. Thus it is helpful to write µ C = γ/N to compare their behaviour at this transition. An asymptotic study of the emergence of ﬁrst cycles in large random directed graphs was explored in [1].

XPECTED NUMBER OF UNINHIBITED RAF SETS 9

For the all-or-nothing model, we have: µ RAF ( µ C ) P RAF ( µ C ) = 2 N (cid:16) − (cid:0) − γ N (cid:1) N (cid:17)(cid:0) − (cid:0) − γN (cid:1)(cid:1) ∼ N (cid:18) − exp( − γ/ − exp( − γ ) (cid:19) , where ∼ is asymptotic equivalence as N becomes large (with γ being ﬁxed), and so:(8) µ RAF ( µ C ) P RAF ( µ C ) ∼ N − (1 + O ( γ )) , Let us compare this with the uniform model with the same µ C (and hence γ ) value. It canbe shown that when γ < e − , we have:(9) lim N →∞ N (cid:88) j =1 (cid:18) Nj (cid:19) (cid:0) − (1 − γ/N ) j (cid:1) j = γ + o ( γ ) . where o ( γ ) has order γ as γ → N and assuming γ < − exp( − γ ) ≤ P RAF ( µ C ) ≤ − ln(1 − γ ) . In particular, for small γ we have P RAF = γ + o ( γ ). Combining these results for the uniformmodel gives:(11) µ RAF ( µ C ) P RAF ( µ C ) = 1 + O ( γ ) . We now notice two key diﬀerences between Eqns. (8) and (11) when N is large and γ issmall: their ratio involves the factor 2 N and they have diﬀerent dependencies on γ . This canbe explained as follows. In the all-or-nothing model, the existence of a RAF comes down towhether or not there is a universal catalyst; when there is, then any subset of the N elementscontaining that catalyst is a RAF. By contrast, with the uniform model, at the catalysis levelwhere RAFs ﬁrst arise there is likely to be only one or a few such subsets.5. Concluding comments

In this paper, we have focused on the expected number of RAFs and uRAFs (rather thanthe probability of at least one such set existing), as this quantity can be described explicitly,and generic results described via this expression can be derived (e.g. in Parts (ii) and (iii) ofTheorem 1 and Corollary 1). Even so, the expressions in Theorem 1 involve quantities n r,s thatmay be diﬃcult to quantify exactly; thus in the second part of the paper, we consider morerestrictive types of systems.In our analysis, we have treated inhibition and catalysis as simple and separate processes.However, a more general approach would allow reactions to proceed under rules that are encodedby Boolean expressions. For example, the expression ( a ∧ b ) ∨ c ∨ ( d ∧ ¬ e ) assigned to a reaction r would allow r to proceed if at least one of the following holds: (i) both a and b are present ascatalysts, or (ii) c is present as a catalyst or (iii) d is present as a catalyst and e is not present as an inhibitor. Extending the results in this paper to this more general setting could be aninteresting exercise for future work. References [1] Bollob´as, B., Rasmussen, S. (1989). First cycles in random directed graph processes. Discrete Math. 75,55–68.[2] Cazzolla Gatti, R., Fath., B, Hordijk, W., Kauﬀman, S. and Ulanowicz, R. (2018). Niche emer-gence as an autocatalytic process in the evolution of ecosystems. J. Theor. Biol. 454, 110–117.(doi:10.1016/j.jtbi.2018.05.038)[3] Cornish-Bowden, A. and C´ardenas, M. L. (2007). Organizational invariance in (M, R)-systems. Chem.Biodiv. 4, 2396–2406.[4] Dittrich, P. and Speroni di Fenizio, P. (2007). Chemical organisation theory. Bull. Math. Biol. 69, 1199–1231.[5] Filisetti, A., Villani, M., Damiani, C., Graudenzi, A., Roli, A., Hordijk, W. and Serra, R. (2014). On RAFsets and autocatalytic cycles in random reaction networks. In: Pizzuti C., Spezzano G. (eds) Advances in Ar-tiﬁcial Life and Evolutionary Computation. WIVACE 2014. Communications in Computer and InformationScience, vol 445. Springer, Cham.[6] Gabora, L. and Steel, M. (2017). Autocatalytic networks in cognition and the origin of culture. J. Theor.Biol. 63, 617–638. doi:10.1016/j.jtbi.2017.07.022[7] Hordijk, W. (2019). A history of autocatalytic sets. Biol. Theory 14, 224–246. doi: 10.1007/s13752-019-00330-w[8] Hordijk, W. and Steel, M. (2016). Autocatalytic sets in polymer networks with variable catalysis distribu-tions, J. Math. Chem., 54(10): 1997–2021.[9] Hordijk, W. and Steel, M. (2017). Chasing the tail: The emergence of autocatalytic networks. Biosystems152:1–10.[10] Hordijk, W., Steel, M. and Kauﬀman, S. A. (2019). Molecular diversity required for the formation ofautocatalytic sets. Life 9: 23.[11] Huson, D. and Steel, M. (2020).

CatlyNet . https://github.com/husonlab/catlynet[12] Jain, S. and Krishna, S. (1998). Autocatalytic sets and the growth of complexity in an evolutionary model.Phys. Rev. Lett. 81, 5684–5687.[13] Jain, S. and Krishna, S. (2001). A model for the emergence of cooperation, interdependence, and structurein evolving networks. PNAS 98, 543–547.[14] Jaramillo, S., Honorato-Zimmer, R., Pereira, U., Contreras, D., Reynaert, B., Hernn-dez, V., Soto-Andrade,J., Crdenas, M., Cornish-Bowden, A. and Letelier, J. (2010). (M,R) systems and RAF sets: common ideas,tools and projections. In: Proceedings of the Alife XII Conference. Odense, Denmark, pp. 94–100.[15] Kauﬀman, S. (1971). Cellular homeostasis, epigenesis and replication in randomly aggregated macromolec-ular systems. J. Cybern. 1, 71–96.[16] Kauﬀman, S. (1986). Autocatalytic sets of proteins. J. Theor. Biol. 119, 1–24.[17] Mossel, E. and Steel, M. (2005). Random biochemical networks: the probability of self-sustaining auto-catalysis. J. Theor. Biol. 233(3), 327–336.[18] Steel, M., Hordijk, W. and Xavier, J. C. (2018). Autocatalytic networks in biology: structural theory andalgorithms. J. Roy. Soc. Interface 16: 20180808[19] Xavier, J. C., Hordijk, W., Kauﬀman, S., Steel M. and Martin, W. F. (2020). Autocatalytic chemicalnetworks at the origin of metabolism. Proc. Roy. Soc. B. 287:20192377.

XPECTED NUMBER OF UNINHIBITED RAF SETS 11 Appendix: Proof of Theorem 1, Corollary 1 and Eqn. (9).

For Part (i), recall that π ( R (cid:48) ) denotes the set of products of reactions in R (cid:48) , and let FG( r, s )denote the subset R (cid:48) of R of size r that are F-generated and for which the number of non-foodmolecule types produced by reactions in R (cid:48) has size s . Thus n r,s = | FG( r, s ) | .For R (cid:48) ⊆ R , let I R (cid:48) be the Bernoulli random variable that takes the value 1 if each reactionin R (cid:48) is catalysed by at least one product of a reaction in R (cid:48) or by an element of F \ C − , and0 otherwise. Similarly, let ˆ I R (cid:48) be the Bernoulli random variable that takes the value 1 if noreaction in R (cid:48) is blocked by the product of any reaction in R (cid:48) or by an element of F \ B − . Thenthe random variable (cid:88) r,s (cid:88) R (cid:48) ∈ FG( r,s ) I R (cid:48) · ˆ I R (cid:48) counts the number of uRAFs present, so we have: µ uRAF = E (cid:88) r,s (cid:88) R (cid:48) ∈ FG( r,s ) I R (cid:48) · ˆ I R (cid:48)  = (cid:88) r,s (cid:88) R (cid:48) ∈ FG( r,s ) E (cid:104) I R (cid:48) · ˆ I R (cid:48) (cid:105) (12) = (cid:88) r,s (cid:88) R (cid:48) ∈ FG( r,s ) E [ I R (cid:48) ] · E [ˆ I R (cid:48) ] , where the second equality is by linearity of expectation, and the third equality is by the inde-pendence assumption ( I ). Given R (cid:48) ∈ FG( r, s ), let C , C , . . . , C s + c be the random variables(ordered in any way) that correspond to the catalysis probabilities of the s products of R (cid:48) andthe c elements of F \ C − . We can then write:(13) E [ I R (cid:48) ] = P ( I R (cid:48) = 1) = E [ P ( I R (cid:48) = 1 | C , C , . . . , C s + c )] , where the second expectation is with respect to the random variables C i . The event I R (cid:48) = 1occurs precisely when each of the r reactions in R (cid:48) is catalysed by at least one of the s + c elements in ( π ( R (cid:48) ) \ F ) ∪ ( F \ C − ). By the independence assumption ( I ), P ( I R (cid:48) = 1 | C , C , . . . , C s + c ) = (cid:89) r (cid:48) ∈ R (cid:48) (cid:32) − s + c (cid:89) j =1 (1 − C j ) (cid:33) = (cid:32) − s + c (cid:89) j =1 (1 − C j ) (cid:33) r . Set V := (cid:81) s + cj =1 (1 − C j ). For each i ≥

0, we have: E [ V i ] = E (cid:34) s + c (cid:89) j =1 (1 − C j ) (cid:35) i  = E (cid:34) s + c (cid:89) j =1 (1 − C j ) i (cid:35) = s + c (cid:89) j =1 E [(1 − C j ) i ]= s + c (cid:89) j =1 λ i = λ s + ci , where the ﬁrst two equalities are trivial algebraic identities, the third is by the independenceassumption ( I ), the fourth is by deﬁnition and the last is trivial. Applying the binomial expansion (1 − V ) r = (cid:80) ri =0 ( − i (cid:0) ri (cid:1) V i to this last equation, togetherwith Eqn. (13), gives: E [ I R (cid:48) ] = r (cid:88) i =0 ( − i (cid:18) ri (cid:19) λ s + ci . Turning to inhibition, a RAF subset R (cid:48) of R in FG( r, s ) is a uRAF precisely if no reactionin R (cid:48) is blocked by any of the s + b elements of ( π ( R (cid:48) ) \ F ) ∪ ( F \ B − ). By the independenceassumption ( I (cid:48) ), P (ˆ I R (cid:48) = 1 | B , B , . . . , B s + b ) = (cid:89) r (cid:48) ∈ R (cid:48) (cid:32) s + b (cid:89) j =1 (1 − B j ) (cid:33) = (cid:32) s + b (cid:89) j =1 (1 − B j ) (cid:33) r = s + b (cid:89) j =1 (1 − B j ) r . Applying expectation (using the independence assumption ( I (cid:48) )), together with the identity E [(1 − B j ) r ] = ˆ λ r gives: E [ˆ I R (cid:48) ] = ˆ λ s + br . Combining these expressions into Eqn. (6) gives the ﬁrst equation in Part (i). The second isthen obtained by putting ˆ λ i = 1 for all i . Parts (ii) and (iii):

Observe that the function u = (1 − y ) r for r ≥ r >

1. Thus, by Jensen’s Inequality, for any random variable Y , we have:(14) E [(1 − Y ) r ] ≥ (1 − E [ Y ]) r , and the inequality is strict when Y is nondegenerate and r > Y = (cid:81) s + cj =1 (1 − C j ). Then the proof of Part (i) shows that: E [ I R (cid:48) ] = E [(1 − Y ) r ] . Thus, by Inequality (14) and the identity E [ (cid:81) s + cj =1 (1 − C j )] = (cid:81) s + cj =1 E [(1 − C j )] (by the indepen-dence assumption ( I )), we obtain: E [ I R (cid:48) ] ≥ (1 − E [ Y ]) r = (1 − (1 − µ C ) s + c ) r , with equality only for the uniform model. This gives Part (ii).For Part (iii)(a), it suﬃces to note that ˆ λ r = E [(1 − B ) r )] ≥ (1 − µ B ) r , by Inequality (14).For Part (iii)(b), suppose that Y is a random variable taking values in [0 ,

1] with mean η andlet Y be the random variable that takes the value 1 with probability η and 0 otherwise. Then E [ Y k ] = η for all k ≥

1, and E [ Y k ] ≤ η for all k ≥ Y k ≤ Y ); moreover, E [ Y ] = η ifand only if E [ Y (1 − Y )] = 0, which implies that Y = Y . Now apply this to Y = (1 − B ) and k = r to deduce for the distributions on B that have a given mean µ B , ˆ λ r is maximised whenthe distribution takes the value 1 with probability µ B and zero otherwise. (cid:3) XPECTED NUMBER OF UNINHIBITED RAF SETS 13

Proof of Corollary 1:

By Theorem 1, we have:(15) µ uRAF (2 p, tp ) = (cid:88) r,s n r,s − c [(1 − (1 − p ) s ) · (1 − tp ) s ] r , and(16) µ RAF ( p,

0) = (cid:88) r,s n r,s − c [1 − (1 − p ) s ] r . For each x ∈ (0 , . − (1 − x ) s ≥ − (1 − x ) s = (1 − (1 − x ) s )(1 + (1 − x ) s ) . Thus (with x = p ), we see that the term inside the square brackets in Eqn. (15) exceeds theterm in square brackets in Eqn. (16) by a factor of (1 + (1 − p ) s )(1 − tp ) s , and this is minimisedwhen s = N (the largest possible value s can take). In that case, we have:(1 + (1 − p ) s )(1 − tp ) s ∼ (1 + e − ν ) e − tν and this is at least 1 when t satisﬁes the stated inequality (namely, t ≤ ν ln(1 + e − ν )). In thisway, each term in Eqn. (15) is greater or equal to the corresponding term in square brackets inEqn. (16), which justiﬁes the inequality in Corollary 1. (cid:3) Justiﬁcation of Eqn. (9):

The j -th term in the summation term in the LHS of Eqn. (9) is (cid:0) Nj (cid:1) (1 − (1 − γ/N ) j ) j . For j = 1 this simpliﬁes to γ , and for j = 2, the term is 2 γ + O (1 /N ).More generally, for all j ≥

1, the j -th term in this sum is: (cid:18) Nj (cid:19) (cid:18) jγN j (cid:19) j + O (1 /N ) = γ j · j j j ! + O (1 /N ) . Finally, observe that, by Stirling’s formula for j ! we have γ j · j j j ! ∼ ( γe ) j √ πj , and this term converges to zero at exponential rate as j increases provided that γ < e − ..