On the generalization of the GMS evolutionary model
Skevi Michael and Stanislav Volkov

September 4, 2018
Abstract
We study a generalization of the evolution model proposed by Guiol, Machado and Schinazi (2010). In our model, at each moment of time a random number of species is either born or removed from the system; the species to be removed are those with the lowest fitnesses, the fitnesses being numbers in [0, 1] assigned to the species at birth. We show that, under suitable conditions, the fitnesses of the surviving species approach a uniform distribution on [f, 1] for some f ∈ [0, 1).

Over the last years the modelling of biological evolution has received a lot of attention in the literature. Many models have been proposed to explain and understand how nature works. A question common to most research in this field is why some species survive while others in the same ecosystem go extinct.

One of the models proposed for this purpose is the Bak-Sneppen model (BS), introduced by Per Bak and Kim Sneppen in 1993. The basic idea of their work was to build a model in which there exists a criterion representing the strength or resistance of each vertex in the ecosystem. This criterion is called fitness. The fitness of a vertex is usually related to its genetic code. The initial idea of their model was that the vertex with the weakest fitness is replaced by a new one. However, this leads to no interactions between the vertices, and hence the model did not receive much interest from either the biological or the mathematical point of view. To include the interaction factor in their model, they suggested that a "weak" vertex, when leaving the system, also affects the vertices connected with it, and these are removed from the system as well.

In particular, the BS model consists of an "ecosystem" that contains a (fixed) number N of vertices located on the circumference of a circle. A quantity between 0 and 1 is assigned to each vertex, representing its fitness. At each time step the vertex with the lowest fitness is replaced by another one with a random fitness in the interval [0, 1], and so are its two nearest neighbours. Computer simulations suggest that after many iterations the fitnesses of the vertices become uniformly distributed on [f, 1] for some f that is believed to be close to 2/3. However, so far no theoretical proof has been found to confirm this behaviour.

In 2010 Guiol, Machado and Schinazi considered another stochastic model of evolution (we will refer to it as the GMS model) as an alternative to the BS model, since they believed that the setup of the BS model was somewhat artificial and did not represent nature well. In the GMS model, the process starts with an empty subset of vertices of [0, 1]. At each step, with probability p a new vertex is born (birth case) and with probability q = 1 - p the vertex with the smallest fitness is removed (death case). Each vertex that enters the system is assigned a fitness value which is an independent random variable uniformly distributed on [0, 1]. They showed that the fitnesses of the vertices alive in the system which lie above the critical value f_c = q/p will eventually approach a uniform distribution in the corresponding interval, with the error being of order less than n^{1/2+ǫ} for any ǫ > 0. Note that this mimics the behaviour which is expected to hold for the BS model.

There are two basic differences between the two models. In the GMS model the number of vertices in the system is random, not fixed as in the BS model, which seems to be a more realistic approach to an evolutionary model. The second difference is that in the GMS model only the weakest vertex is removed at each time, hence there is no interaction among the vertices of the ecosystem. This means that a "strong" vertex is more likely to survive in the GMS model than in the BS model. Recently, some finer results for the GMS model were obtained by Ben-Ari et al (2011), including a log log n correction term. Guiol et al (2011) also discovered a link between the survival time in an evolution model and the Bessel distributions.

The Guiol et al (2010) paper motivated us to consider an extension of the GMS model, in which both the number of newborn and the number of removed vertices are random. This makes the model even more realistic in expressing nature, as well as providing us with some non-trivial mathematical challenges. In Section 2 we assume that the number of deaths is a bounded random variable and obtain results similar to those in Guiol et al (2010); this assumption is removed in Section 3, where we study the most general case.

In our paper we will assume that at each step the numbers of vertices being born or taken away are random. Namely, suppose that X and Z are two positive integer-valued random variables, X_n (resp. Z_n) are i.i.d. random variables with the distribution of X (resp. Z), and the X_n's and Z_n's are all independent. Fix p ∈ (0, 1) and set q = 1 - p. At time n, the state of the system is a finite subset T_n of vertices in [0, 1]; in the case X ≡ Z ≡ 1 the model reduces to the GMS model. Initially T_0 = ∅. At time n, with probability p we generate Z_n new vertices, each having a fitness uniformly distributed over [0, 1] independently of each other and of anything else, so that |T_{n+1}| = |T_n| + Z_n; otherwise, with probability q = 1 - p, we remove the X_n vertices with the smallest fitnesses, with the agreement that if there are fewer than X_n vertices in the system, the system becomes empty again; as a result, |T_{n+1}| = max{|T_n| - X_n, 0} here. Under some assumptions on the distributions of X and Z we will derive results for the long-term behaviour of the system.

First, for some constant f ∈ (0, 1) define L_n, R_n and R'_n as follows:

L_n: the set of vertices alive in the system at time n whose fitnesses lie in [0, f);

R_n: the set of vertices alive in the system at time n whose fitnesses lie in [f, 1];

R'_n: the set of vertices that were born in the system from time 0 to n and were assigned a fitness in [f, 1].

Clearly R_n ⊆ R'_n.

Definition 1.
Suppose that A_1 ⊆ A_2 ⊆ A_3 ⊆ . . . is an infinite sequence of sets, each consisting of a finite number of points in R. We say that A_n approaches a random sample from distribution F if, with probability 1, there exists another sequence of sets B_1 ⊆ B_2 ⊆ B_3 ⊆ . . . such that (i) each of these sets is a finite collection of i.i.d. random variables with the common distribution F; (ii) |B_n| → ∞ as n → ∞; and (iii) |A_n ∆ B_n| = o(|B_n|) as n → ∞. Here A ∆ B = (A \ B) ∪ (B \ A).

Let

    p_c = µ_X / (µ_X + µ_Z).    (1)

Theorem 1.
Assume that there is an (integer) constant M > 0 such that X ≤ M a.s., and that E(Z^2) < ∞. Let µ_X = E(X) and µ_Z = E(Z). Also suppose that p ∈ (p_c, 1), and let

    f = (q/p) (µ_X / µ_Z) ∈ (0, 1).    (2)

Then, for every ǫ > 0, there are n_0 ∈ N and C > 0 such that

    0 ≤ |R'_n| - |R_n| ≤ C n^{1/2+ǫ}  for n ≥ n_0.

Moreover, T_n approaches a random sample from U[f, 1].

Proof. The general skeleton of the proof is similar to that in [3], although our model requires a deeper analysis.

First, look at those times when |L_n| ≥ M. At such times, in the "death" event R_n is unaffected and all removed vertices come from the complementary set L_n. Hence, for those times,

    E(|L_{n+1}| - |L_n| | F_n) = E[W_n] = p f µ_Z - q µ_X = 0,    (3)

where F_n is the sigma-algebra generated by the process by time n, and the distribution of the random variable W_n is given by

    W_n = Binomial(Z_n, f)  with probability p,
    W_n = -X_n              with probability q.    (4)

On the other hand, it is at the times when |L_n| < M that some vertices may be taken away from the set R_n, resulting in -M ≤ |R_{n+1}| - |R_n| at such times. Define

    t_n = |{0 ≤ k ≤ n : |L_k| < M}|,

the number of those "bad" times. We will show that t_n is of order smaller than n. Let

    k_n = |{1 ≤ k ≤ n : |L_{k-1}| ≥ M and |L_k| < M}|.

For any µ > 0,

    P(t_n > µ n^{1/2+ǫ}) ≤ P(t_n > µ n^{1/2+ǫ}; k_n < n^{1/2+ǫ}) + P(k_n ≥ n^{1/2+ǫ}) = (I) + (II).    (5)

First, we want to choose an appropriate µ and hence to get an upper bound on (I). Set E_1 = 0 and for i = 1, 2, . . . recursively define

    G_i = min{k > E_i : |L_k| ≥ M},
    E_{i+1} = min{k > G_i : |L_k| < M}.

Then

    {0 ≤ k ≤ n : |L_k| < M} = {0, 1, 2, . . . , n} ∩ ∪_{i=1}^∞ [E_i, G_i)

and max{i : E_i ≤ n} = k_n + 1, whence t_n ≤ Σ_{i=1}^{k_n+1} (G_i - E_i).

Let ⌊·⌋ denote the integer part of a number. Observe that the differences G_i - E_i are stochastically smaller than i.i.d. non-negative random variables ξ_i with the distribution given by

    P(ξ_i ≥ m) = (1 - (pf)^M)^{⌊m/M⌋},  m = 0, 1, 2, . . . ,

since for |L_k| to reach M, even starting from 0, it suffices to have M consecutive birth events in each of which at least one of the new particles is located in [0, f]. Note that E ξ_1 < ∞; fix any µ > E ξ_1 and let m_n = ⌊n^{1/2+ǫ}⌋. Then

    (I) = P(t_n > µ n^{1/2+ǫ}; k_n < n^{1/2+ǫ})
        ≤ P(Σ_{i=1}^{k_n+1} [G_i - E_i] > µ n^{1/2+ǫ}, k_n < n^{1/2+ǫ})
        ≤ P(Σ_{i=1}^{m_n+1} [G_i - E_i] > µ n^{1/2+ǫ})
        ≤ P(Σ_{i=1}^{m_n+1} ξ_i > µ n^{1/2+ǫ})
        ≤ P(Σ_{i=1}^{m_n+1} ξ_i > µ m_n).

At this point we will use a large deviation estimate which follows immediately from Lemma 9.4 in Chapter 1.9 of Durrett (1996):

Lemma 1. Let X_1, X_2, . . . , X_n be an i.i.d. sequence of random variables with µ := E X_i and φ(ϑ) := E(e^{ϑ X_i}) < ∞ for some positive ϑ. Let κ(ϑ) = log φ(ϑ) and S_n = X_1 + X_2 + . . . + X_n. Then for α > µ,

    P(S_n ≥ nα) ≤ exp{-n (αϑ - κ(ϑ))}.

Moreover, for ϑ small we have αϑ - κ(ϑ) > 0.
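As a quick numerical illustration of Lemma 1 (our own sketch, not part of the original argument), the following compares a simulated tail probability with the Chernoff-type bound for a variable with a geometric tail resembling the ξ_i above; the parameters r, n, α and ϑ are illustrative choices of ours.

```python
import math
import random

# Sketch illustrating Lemma 1: for i.i.d. xi with the geometric tail
# P(xi >= m) = r**m (so mu = r/(1-r)), the bound
#   P(S_n >= n*alpha) <= exp(-n*(alpha*theta - kappa(theta)))
# holds whenever phi(theta) = E e^{theta*xi} is finite, i.e. r*e^theta < 1.

random.seed(0)
r = 0.5                 # illustrative tail parameter
mu = r / (1 - r)        # mean of xi (= 1.0 here)

def sample_xi():
    """Sample xi with P(xi >= m) = r**m, m = 0, 1, 2, ..."""
    x = 0
    while random.random() < r:
        x += 1
    return x

def chernoff_bound(n, alpha, theta):
    """exp(-n*(alpha*theta - kappa(theta))), kappa = log E e^{theta*xi}."""
    phi = (1 - r) / (1 - r * math.exp(theta))  # mgf of the geometric law
    return math.exp(-n * (alpha * theta - math.log(phi)))

n, alpha, theta, trials = 200, 2.0, 0.3, 5000
hits = sum(
    sum(sample_xi() for _ in range(n)) >= n * alpha
    for _ in range(trials)
)
empirical = hits / trials
bound = chernoff_bound(n, alpha, theta)
print(f"empirical tail: {empirical}, Chernoff bound: {bound:.3e}")
```

Here α = 2 exceeds the mean µ = 1, so the bound decays exponentially in n, in line with the last claim of the lemma.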
By applying this lemma to the sequence of ξ_i (note that E(e^{ϑ ξ_i}) < ∞ for sufficiently small ϑ > 0, since ξ_i has a geometric tail), we obtain that there exists θ > 0 such that

    (I) ≤ P(Σ_{i=1}^{m_n+1} ξ_i > µ m_n) ≤ exp(-θ n^{1/2+ǫ})  for all n.    (6)

Next, we want to get an upper bound on (II). We have

    (II) = P(k_n ≥ n^{1/2+ǫ}) ≤ P(∩_{i=1}^{m_n} {E_{i+1} - G_i < n}).

At the same time, for each i ≥ 1, E_{i+1} - G_i is stochastically larger than a random variable τ_i ≥ 1, where the τ_i's are independent and each has the distribution of

    τ = min{j ≥ 1 : W_1 + W_2 + . . . + W_j < 0}.

Here the W_k are i.i.d. random variables with the distribution given by (4). To estimate P(τ < n) we use two results from the general theory of random walks given in Feller (1971), volume 2 (Theorem 1a in Chapter XII.8 and Theorem 1 in Chapter XVIII.5, respectively).

Theorem 2.
Let X_1, X_2, . . . , X_n be an i.i.d. sequence of random variables with distribution F such that 0 < F(0) < 1. Define the sums S_i, i ∈ {0, 1, . . . , n}, so that S_0 = 0 and S_n = X_1 + X_2 + . . . + X_n, and let K_n be the unique index with

    S_{K_n} > S_0, S_{K_n} > S_1, . . . , S_{K_n} > S_{K_n - 1},  S_{K_n} ≥ S_{K_n + 1}, . . . , S_{K_n} ≥ S_n,

that is, K_n = min{j : S_j = max_{i ∈ [0, n]} S_i}, the epoch of the first maximum. If the series

    Σ_{n=1}^∞ (1/n) (P(S_n > 0) - 1/2)    (7)

converges, then for 0 ≤ k ≤ n,

    P(K_n = k) ∼ binom(2k, k) binom(2(n-k), n-k) 2^{-2n},

where a_n ∼ b_n means that lim_{n→∞} a_n / b_n = 1.
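The conclusion of Theorem 2 can be checked numerically in the simplest symmetric case. The sketch below (our illustration, with Gaussian steps and parameters of our own choosing) estimates P(K_n = 0), the probability that the walk attains its maximum at time 0, and compares it with binom(2n, n) 2^{-2n} ≈ 1/√(πn):

```python
import math
import random

# Illustration of Theorem 2: for a symmetric continuous step distribution
# (here standard Gaussian), the first maximum of the walk sits at time 0
# with probability ~ binom(2n, n) * 2**(-2n) ~ 1/sqrt(pi*n).

random.seed(0)
n, trials = 100, 100_000

def max_at_zero(n):
    """True iff S_k < 0 for all 1 <= k <= n, i.e. the first maximum is at 0."""
    s = 0.0
    for _ in range(n):
        s += random.gauss(0.0, 1.0)
        if s >= 0.0:
            return False
    return True

estimate = sum(max_at_zero(n) for _ in range(trials)) / trials
asymptotic = 1.0 / math.sqrt(math.pi * n)
print(f"P(K_n = 0) ~ {estimate:.4f}, 1/sqrt(pi*n) = {asymptotic:.4f}")
```

This 1/√(πn) decay of P(K_n = 0) is exactly the estimate used for P(τ ≥ n) below.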
Theorem 3. Consider the notation of Theorem 2 and suppose that its conditions hold. If E X_1 = 0 and E(X_1^2) = σ^2 < ∞, then the series (7) is at least conditionally convergent.

Now we can apply Theorems 2 and 3 by setting X_i = -W_i, since E W_i = 0 due to (3), and E(W_i^2) < ∞ due to the fact that X is bounded and E(Z^2) < ∞. Consequently,

    P(τ > n) = P(X_1 ≤ 0, X_1 + X_2 ≤ 0, . . . , X_1 + . . . + X_n ≤ 0) = P(K_n = 0) ∼ binom(2n, n) 2^{-2n} = (2n)! / ((n!)^2 2^{2n}) ∼ 1/√(πn),

where we used Stirling's formula in the last step. Combining the above calculations we have

    (II) ≤ (P(τ < n))^{m_n} = (1 - P(τ ≥ n))^{m_n} = (1 - (1 - o(1))/√(πn))^{m_n} ≤ [exp(-(1 - o(1))/√(πn))]^{m_n}.

Therefore, there is a constant α > 0 such that for all large n,

    (II) ≤ e^{-α n^ǫ}.    (8)

Hence, plugging (6) and (8) into (5), we obtain

    Σ_{n=1}^∞ P(t_n > µ n^{1/2+ǫ}) < ∞,

and therefore by the Borel-Cantelli lemma a.s. there is an n_0 such that t_n ≤ µ n^{1/2+ǫ} for all n ≥ n_0.

Since |R'_{n+1}| - |R'_n| = |R_{n+1}| - |R_n| + ∆, where ∆ = 0 if |L_n| ≥ M and ∆ ∈ {0, 1, 2, . . . , M} if |L_n| < M, we have

    |R'_n| - |R_n| = Σ_{k=0}^{n-1} [(|R'_{k+1}| - |R'_k|) - (|R_{k+1}| - |R_k|)] 1_{{|L_k| < M}} ≤ M t_n ≤ M µ n^{1/2+ǫ}

for all n ≥ n_0, which proves the first statement of the theorem with C = Mµ. Moreover, since by (3) the increments of |L_n| have zero mean whenever |L_n| ≥ M, we also get |L_n|/n → 0 a.s., while by the strong law of large numbers |R'_n|/n → p µ_Z (1 - f) a.s. Since R'_n is a collection of i.i.d. U[f, 1] random variables and |T_n ∆ R'_n| = |L_n| + |R'_n \ R_n| = o(|R'_n|), it follows that T_n approaches a random sample from U[f, 1].

Lemma 2. Let X be a positive integer-valued random variable and let c > 0 be a constant. Then Σ_{n=1}^∞ P(X ≥ cn) < ∞ if and only if E X < ∞.

Proof.
Let µ = E X = Σ_{n=1}^∞ P(X ≥ n) ∈ [0, ∞]. First, suppose that c ≤ 1. There exists an integer m ≥ 1 such that 1/m < c. We have

    µ = Σ_{n=1}^∞ P(X ≥ n) ≤ Σ_{n=1}^∞ P(X ≥ cn) ≤ Σ_{n=1}^∞ P(X ≥ n/m) = m Σ_{k=1}^∞ P(X ≥ k) = mµ.    (9)

Now if c > 1, there exists an integer m > c. Then

    Σ_{n=1}^∞ P(X ≥ cn) = -1 + Σ_{n=0}^∞ P(X ≥ cn) ≥ -1 + Σ_{n=0}^∞ P(X ≥ nm)
    ≥ -1 + Σ_{n=0}^∞ (1/m) [Σ_{k=0}^{m-1} P(X ≥ nm + k)] = -1 + (1/m) Σ_{k=0}^∞ P(X ≥ k) = (1 + µ)/m - 1.    (10)

Together, (9) and (10) yield the statement of the Lemma.

Recall that T_n is the set of species alive in the system at time n, so in case we have a death event, |T_{n+1}| = max{0, |T_n| - X_n}. Moreover, assuming p > p_c, for every ǫ ∈ [0, 1 - f), where p_c is given by (1) and f is the same as in (2), define

    L^ǫ_n := T_n ∩ [0, f + ǫ)  and  R^ǫ_n := T_n ∩ [f + ǫ, 1].

Note that L^0_n = L_n and R^0_n = R_n. Also, define the event A^ǫ_n as follows:

    A^ǫ_n = {at time n we kill all vertices in L^ǫ_n}.

Lemma 3. Suppose µ_Z = E Z < ∞, µ_X = E X < ∞ and p > p_c. Then, with probability 1, A^ǫ_n occurs finitely often.

Proof. First note that

    |L^ǫ_{n+1}| = |L^ǫ_n| + Y_n, where Y_n ∼ Binomial(Z_n, f + ǫ),  with probability p,
    |L^ǫ_{n+1}| = max{|L^ǫ_n| - X_n, 0},                             with probability q.

Now, let Q_0 = 0 and define Q_n recursively as

    Q_{n+1} = Q_n + Binomial(Z_n, f + ǫ)  with probability p,
    Q_{n+1} = Q_n - X_n                   with probability q.

Thus Q_n can be coupled with |L^ǫ_n| in such a way that |L^ǫ_n| ≥ Q_n for all n. On the other hand, Q_n can be written as a sum of n i.i.d. random variables, each with expectation

    2δ := p µ_Z (f + ǫ) - q µ_X = p µ_Z ǫ > 0,

by (2). By the strong law of large numbers we have lim_{n→∞} Q_n/n = 2δ a.s. Hence

    lim inf_{n→∞} |L^ǫ_n|/n ≥ 2δ  a.s.,

which yields that with probability 1 there exists a time N ∈ N such that |L^ǫ_n| > nδ for all n ≥ N. Next we calculate the probability that A^ǫ_n occurs:

    P(A^ǫ_n) = q P(X_n ≥ |L^ǫ_n|) ≤ q P(X ≥ nδ)  for n ≥ N.

Consequently, by Lemma 2, since E X < ∞, we have

    Σ_{n=1}^∞ P(A^ǫ_n) ≤ N + q Σ_{n=1}^∞ P(X ≥ nδ) < ∞,

and so by the Borel-Cantelli lemma we have that A^ǫ_n occurs finitely often with probability 1.

Recall that µ_Z = E Z and µ_X = E X.

Theorem 4.
The following is true.

(a) Suppose µ_Z = ∞ and µ_X < ∞. Then T_n approaches a random sample from U[0, 1].

(b) Suppose µ_Z < ∞. If µ_X < ∞ and p > p_c, then T_n approaches a random sample from U[f, 1], where f is given by (2).

(c) Suppose µ_Z < ∞. If µ_X < ∞ and p < p_c, or if µ_X = ∞, then T_n = ∅ for infinitely many n.
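The statement of part (b) can be seen in action in a small simulation. The following sketch is our own illustration; the concrete choices X ≡ 1, Z ≡ 2 and p = 0.5 are not from the paper. Here p_c = µ_X/(µ_X + µ_Z) = 1/3 < p, so the threshold of (2) is f = (q/p)(µ_X/µ_Z) = 0.5, and almost no surviving fitnesses should lie below it:

```python
import heapq
import random

# Simulation sketch of the generalized dynamics in regime (b) of Theorem 4,
# with illustrative choices X == 1 (deaths) and Z == 2 (births):
# p_c = 1/3, p = 0.5 > p_c, and f = (q/p)*(mu_X/mu_Z) = 0.5 by (2).

random.seed(1)
p = 0.5
mu_X, mu_Z = 1, 2
f = ((1 - p) / p) * (mu_X / mu_Z)

T = []  # min-heap holding the fitnesses of the living vertices
for _ in range(200_000):
    if random.random() < p:        # birth event: Z_n = 2 uniform fitnesses
        heapq.heappush(T, random.random())
        heapq.heappush(T, random.random())
    elif T:                        # death event: remove the X_n = 1 weakest
        heapq.heappop(T)           # (an empty system simply stays empty)

below = sum(1 for x in T if x < f) / len(T)
print(f"f = {f}, |T_n| = {len(T)}, fraction of fitnesses below f = {below:.4f}")
```

In agreement with the theorem, the fraction of survivors below f is negligible, so the living fitnesses approach a random sample from U[f, 1].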
Remark 1. Theorem 4 leaves some gaps: it covers neither the critical case when µ_X, µ_Z < ∞ and p = p_c, nor the general case where both µ_X = ∞ and µ_Z = ∞.

Proof. (a) Let B_n be the set of all the particles born in the system by time n and let D_n be the set of particles removed from the system by time n; therefore B_n is a collection of i.i.d. U[0, 1] random variables and T_n = B_n \ D_n. Since at each time n we remove from the system at most X_n particles, by the strong law of large numbers we have

    lim sup_{n→∞} |D_n|/n ≤ q µ_X < ∞  a.s., and  lim_{n→∞} |B_n|/n = p µ_Z = ∞  a.s.

Therefore,

    lim sup_{n→∞} |T_n ∆ B_n| / |B_n| = lim sup_{n→∞} |D_n| / |B_n| = lim sup_{n→∞} (|D_n|/n) / (|B_n|/n) = 0  a.s.,

which yields the desired conclusion.

(b) Assume µ_X < ∞ and p ∈ (p_c, 1). Recall that R'_n denotes the set of vertices that were born in the system up to time n and were assigned a fitness in [f, 1], so that R'_n is a collection of i.i.d. U[f, 1] random variables. Moreover, as in the proof of Theorem 1, |R'_n| → ∞. Fix an ǫ > 0 and note that R^ǫ_n ⊆ R_n ⊆ R'_n. According to Lemma 3, there will be a time N such that the events A^ǫ_n do not occur for n ≥ N. This implies that no vertices are taken away from [f + ǫ, 1] for those n, and as a result

    sup_n |(R'_n \ R^ǫ_n) ∩ [f + ǫ, 1]| < ∞  a.s.

On the other hand, by the strong law we have |R'_n ∩ [f, f + ǫ)|/n → ǫ p µ_Z a.s., therefore

    0 ≤ lim sup_{n→∞} |R'_n \ R_n|/n ≤ lim sup_{n→∞} |R'_n \ R^ǫ_n|/n ≤ ǫ p µ_Z  a.s.

Since ǫ > 0 is arbitrary, lim_{n→∞} |R'_n \ R_n|/n = 0 a.s. From the end of the proof of Theorem 1 we have |R'_n|/n → p µ_Z (1 - f) a.s. and |L_n|/n → 0 a.s. Hence

    lim sup_{n→∞} |R'_n ∆ T_n| / |R'_n| = lim sup_{n→∞} (|R'_n \ R_n| + |L_n|) / |R'_n|
    ≤ lim sup_{n→∞} (|R'_n \ R_n|/n) / (|R'_n|/n) + lim sup_{n→∞} (|L_n|/n) / (|R'_n|/n) = 0  a.s.,

which proves that T_n approaches a random sample from U[f, 1].

(c) First assume that µ_X < ∞ but p < p_c. Due to the renewal nature of the process, it is sufficient to demonstrate that a.s. there exists at least one n ≥ 1 such that T_n = ∅. Let W_n be i.i.d. random variables with the distribution given by

    W_n = Z_n   with probability p,
    W_n = -X_n  with probability q.

Let τ = inf{n ≥ 1 : W_1 + W_2 + . . . + W_n ≤ 0}. Then τ + 1 has the same distribution as inf{n ≥ 1 : |T_n| = 0}. Observe that by the strong law

    lim_{n→∞} (W_1 + . . . + W_n)/n = E W_1 = p µ_Z - q µ_X = µ_X (p/p_c - 1) < 0,

hence τ < ∞ a.s.

Finally, assume that µ_X = ∞. By the strong law we have

    lim sup_{n→∞} |T_n|/n ≤ lim_{n→∞} |B_n|/n = p µ_Z  a.s.,

therefore there exist c > 0 and N such that |T_n| ≤ cn for all n ≥ N. On the other hand, by Lemma 2,

    Σ_n P(X_n ≥ cn) = Σ_n P(X ≥ cn) = ∞,

and since the events {X_n ≥ cn and time n is a death event} are independent with probabilities summing to q Σ_n P(X ≥ cn) = ∞, by the second Borel-Cantelli lemma there will be infinitely many n at which a death event occurs with X_n ≥ cn ≥ |T_n|, and hence T_{n+1} = ∅.

References

[1] Bak, P., and Sneppen, K. (1993). Punctuated equilibrium and criticality in a simple model of evolution,
Physical Review Letters, 71, 4083-4086.

[2] Ben-Ari, I., Matzavinos, A., and Roitershtein, A. (2011). On a species survival model. http://arxiv.org/abs/1006.2585

[3] Guiol, H., Machado, F.P., and Schinazi, R.B. (2010). A stochastic model of evolution. To appear in Markov Processes and Related Fields. http://arxiv.org/abs/0909.2108

[4] Guiol, H., Machado, F.P., and Schinazi, R.B. (2011). On a link between a species survival time in an evolution model and the Bessel distributions. http://arxiv.org/abs/1102.2817

[5] Durrett, R. (1996). Probability: Theory and Examples (2nd ed.). Duxbury Press, Belmont, California.

[6] Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2 (2nd ed.). John Wiley.

[7] Meester, R., and Znamenski, D. (2003). Limit behavior of the Bak-Sneppen evolution model. Ann. Probab., 31, no. 4, 1986-2002.

[8] Meester, R., and Znamenski, D. (2004). Critical thresholds and the limit distribution in the Bak-Sneppen model. Comm. Math. Phys., 246.