A unifying approach to branching processes in varying environments
arXiv preprint [math.PR]
Götz Kersting ∗

September 14, 2017
Abstract
Branching processes (Z_n)_{n≥0} in varying environment generalize the Galton-Watson process, in that they allow time-dependence of the offspring distribution. Our main results concern general criteria for a.s. extinction, square-integrability of the martingale (Z_n/E[Z_n])_{n≥0}, properties of the martingale limit W and a Yaglom type result stating convergence to an exponential limit distribution of the suitably normalized population size Z_n, conditioned on the event Z_n > 0. The theorems generalize/unify diverse results from the literature and lead to a classification of the processes.
Keywords and phrases. branching process, varying environment, Galton-Watson process, exponential distribution
MSC 2010 subject classification.
Primary 60J80.
Branching processes (Z_n)_{n≥0} in varying environment generalize the classical Galton-Watson processes, in that they allow time-dependence of the offspring distribution. This natural setting promises relevant applications (e.g. to random walks on trees as in [13]). Yet these processes are seldom considered or applied nowadays. This lack of interest is largely due to the fact that former research on branching processes in varying environment was widely stimulated by the appearance of certain exotic properties, suggesting that some typical behaviour can hardly be spotted. In particular, a classification along the lines of Galton-Watson processes has not been obtained up to now. In this paper we would like to put such a misleading impression right and intend to furnish a classification. To this end we prove several theorems ranging from criteria for a.s. extinction up to Yaglom type results. We require only mild regularity assumptions; in particular we do not set any restrictions on the sequence of expectations E[Z_n], n ≥
0, thereby generalizing and unifying a number of individual results from the literature.

∗ Institut für Mathematik, Goethe Universität, Frankfurt am Main, Germany, [email protected]; work partially supported by the DFG Priority Programme SPP 1590 “Probabilistic Structures in Evolution”.

In order to define a branching process in varying environment (BPVE) denote by Y_1, Y_2, ... a sequence of random variables with values in N and by f_1, f_2, ... their distributions. Let Y_{i,n}, i, n ∈ N, be independent random variables such that Y_{i,n} and Y_n coincide in distribution for all i and all n ≥ 1. Define the random variables Z_n, n ≥ 0, with values in N recursively as

   Z_0 := 1,  Z_n := Σ_{i=1}^{Z_{n−1}} Y_{i,n},  n ≥ 1.

Then the process (Z_n)_{n≥0} is called a branching process in varying environment v = (f_1, f_2, ...) with initial value Z_0 = 1. It may be considered as a model for the development of the size of a population where individuals reproduce independently with offspring distributions f_n potentially changing among generations. Without further mention we always require that 0 < E[Y_n] < ∞ for all n ≥ 1.

A classical theorem of Lindvall states that Z_n is a.s. convergent to a random variable Z_∞ with values in N ∪ {∞}. It also clarifies under which conditions (Z_n)_{n≥0} may ‘fall asleep’ at a positive state, meaning that the event 0 < Z_∞ < ∞ occurs with positive probability. Let us call such a branching process asymptotically degenerate. Thus for a BPVE it is no longer true that the process a.s. either gets extinct or else converges to infinity. For the readers’ convenience we add as an appendix a (comparatively) short proof of Lindvall’s theorem.

As mentioned above a BPVE may exhibit extraordinary properties which do not occur for Galton-Watson processes. Thus it may possess different growth rates, as detected by MacPhee and Schuh [14]. Here we establish a framework which excludes such exceptional phenomena and elucidates the generic behaviour. As our results will show, this is naturally done in an L²-setting. Our main assumption is a requirement of uniformity which reads as follows: There is a constant c < ∞ such that for all natural numbers n ≥ 1

   E[Y_n²; Y_n ≥ 2] ≤ c E[Y_n; Y_n ≥ 2] · E[Y_n | Y_n ≥ 1].   (A)

This regularity assumption is considerably mild. As we shall explicate in the next section, it is fulfilled for distributions f_n, n ≥
1, belonging to any common class of probability measures, like Poisson, binomial, hypergeometric, geometric, linear fractional, negative binomial distributions, without any restriction on the parameters. It is also satisfied in the case that the random variables Y_n, n ≥ 1, are a.s. uniformly bounded by a constant c < ∞. For the proof note that then we have E[Y_n²; Y_n ≥ 2] ≤ c E[Y_n; Y_n ≥ 2] and E[Y_n | Y_n ≥ 1] ≥ 1. Since in examples a direct verification of (A) may be tedious, we shall present in the next section a third moment condition which often can be easily checked.

Before presenting our results let us agree on the following notational conventions: Let P be the set of all probability measures on N. We write the weights of f ∈ P as f[k], k ∈ N. Also we define

   f(s) := Σ_{k=0}^∞ s^k f[k],  0 ≤ s ≤ 1.

Thus we denote the probability measure f and its generating function by one and the same symbol. This facilitates presentation and will cause no confusion whatsoever. Keep in mind that each operation applied to these measures has to be understood as an operation applied to their generating functions. Thus f_1 f_2 stands not only for the multiplication of the generating functions f_1, f_2 but also for the convolution of the respective measures. Also f_1 ◦ f_2 expresses the composition of generating functions as well as the resulting probability measure. We shall consider the mean and factorial moments of a random variable Y with distribution f,

   E[Y] = f′(1),  E[Y(Y−1)] = f″(1),  E[Y(Y−1)(Y−2)] = f‴(1),

as well as the standardized quantities

   ν := E[Y(Y−1)] / E[Y]²,  ρ := Var[Y] / E[Y]² = ν + 1/E[Y] − 1.

We shall discuss branching processes in varying environment along the lines of Galton-Watson processes. Let for n ≥ 1

   q := P(Z_∞ = 0),  μ_n := f_1′(1) ··· f_n′(1),  ν_n := f_n″(1)/f_n′(1)²,  ρ_n := ν_n + 1/f_n′(1) − 1,

and μ_0 := 1. Thus q is the probability of extinction and, as is well-known, μ_n = E[Z_n], n ≥ 0. Note that for the standardized factorial moments ν_n we have ν_n < ∞ under assumption (A). This implies E[Z_n²] < ∞ for all n ≥ 0.

Theorem 1.
Assume (A). Then the conditions

(i) q = 1,
(ii) E[Z_n]² = o(E[Z_n²]) as n → ∞,
(iii) Σ_{k=1}^∞ ρ_k/μ_{k−1} = ∞,
(iv) μ_n → 0 and/or Σ_{k=1}^∞ ν_k/μ_{k−1} = ∞

are equivalent. Moreover, the conditions

(v) q < 1,
(vi) E[Z_n²] = O(E[Z_n]²) as n → ∞,
(vii) Σ_{k=1}^∞ ρ_k/μ_{k−1} < ∞,
(viii) ∃ 0 < r ≤ ∞: μ_n → r and Σ_{k=1}^∞ ν_k/μ_{k−1} < ∞

are equivalent.

These conditions are effective in different ways. Condition (iii)/(vii) appears to be particularly suitable as a criterion for a.s. extinction, whereas the conditions (iv) and (viii) will prove useful for the classification of BPVEs. Condition (vi) will allow to determine the growth rate of Z_n, see Theorem 2. Observe that (ii) can be rewritten as E[Z_n]² = o(Var[Z_n]). In simple phrase this says that under (A) we have a.s. extinction iff the noise dominates the mean in the long run. We point out that conditions (iii) and (vi) employ not only the expectations μ_n but also second moments. This is a novel aspect in comparison with Galton-Watson processes and also with Agresti's [1] classical criterion on branching processes in varying environment. Agresti proves a.s. extinction iff Σ_{k≥1} 1/μ_{k−1} = ∞. He could do so by virtue of his stronger assumptions, which e.g. do not cover asymptotically degenerate processes. In our setting it may happen that Σ_{k≥1} ρ_k/μ_{k−1} = ∞ and Σ_{k≥1} 1/μ_{k−1} < ∞, and also the other way round. This is shown by the following examples.

Examples. (i) Let Y_n take only the values n+2 or 0 with P(Y_n = n+2) = 1/n. Then E[Y_n(Y_n−1)] ∼ n, E[Y_n] = 1 + 2/n and E[Y_n − 1 | Y_n ≥ 1] ∼ n, such that (A) is fulfilled. Also μ_n ∼ n²/2 and ρ_n ∼ n, hence Σ_{k≥1} 1/μ_{k−1} < ∞ and Σ_{k≥1} ρ_k/μ_{k−1} = ∞.

(ii) Let Y_n take only the values 0, 1 or 2 with P(Y_n = 0) = P(Y_n = 2) = 1/(2n²). Then E[Y_n(Y_n−1)] ∼ n^{−2}, E[Y_n] = 1 and E[Y_n − 1 | Y_n ≥ 1] ∼ 1/(2n²), such that (A) is fulfilled. Also μ_n = 1 and ρ_n ∼ n^{−2}, hence Σ_{k≥1} 1/μ_{k−1} = ∞ and Σ_{k≥1} ρ_k/μ_{k−1} < ∞.

The last example exhibits an asymptotically degenerate branching process.

Next we turn to the normalized population sizes

   W_n := Z_n/μ_n,  n ≥ 0.

Clearly (W_n)_{n≥0} constitutes a non-negative martingale, thus there exists an integrable random variable W ≥ 0 with

   W_n → W a.s., as n → ∞.

Under (A) the random variable W exhibits the dichotomy known for Galton-Watson processes.

Theorem 2.
Assume (A). Then we have:

(i) If q = 1, then W = 0 a.s.
(ii) If q < 1, then E[W] = 1 and P(W = 0) = q.

A formula for the variance of W may be found in [7]. We point out that Assumption (A) excludes the possibility of P(W = 0) > q, in particular the possibility of different rates of growth as determined by MacPhee and Schuh [14] (see also [4], [5]). By means of Theorem 2 (ii) we also gain further insight on asymptotically degenerate processes. Under assumption (A) they are just those processes which fulfil the properties q < 1 and 0 < lim_{n→∞} μ_n < ∞. Together with Theorem 1, (v) and (viii) we obtain the following corollary.

Corollary. Under (A) a BPVE is asymptotically degenerate iff Σ_{k=1}^∞ ν_k < ∞ and the sequence μ_n, n ≥ 1, has a positive, finite limit. Then Z_∞ < ∞ a.s.

Now we consider the random variables Z_n conditioned on the events Z_n >
0. The next result specifies the circumstances under which these random variables stay stochastically bounded.

Theorem 3. Let (A) be satisfied. Then these conditions are equivalent:

(i) for all ε > 0 there is a c < ∞ such that P(Z_n > c | Z_n > 0) ≤ ε for all n ≥ 0,
(ii) there is a c > 0 such that cμ_n ≤ P(Z_n > 0) ≤ μ_n for all n ≥ 0, or, what amounts to the same thing, sup_{n≥0} E[Z_n | Z_n > 0] < ∞,
(iii) Σ_{k=1}^n ν_k/μ_{k−1} = O(1/μ_n) as n → ∞.

This theorem applies to two different regimes. In case of q < 1 its conditions are satisfied iff 0 < lim_{n→∞} μ_n < ∞, that is, iff we deal with an asymptotically degenerate process. The case q = 1 is more substantial. For a Galton-Watson process the theorem's conditions are valid just in the subcritical setting. Recall that in this special situation the conditioned random variables Z_n have a limiting distribution, too. It is easy to see that such a result cannot hold in our general context of a BPVE. Indeed: there are two offspring distributions f̂ and f̃ such that the limiting distributions ĝ and g̃ for the corresponding conditioned Galton-Watson processes differ from each other. Choose an increasing sequence 0 = n_0 < n_1 < n_2 < ··· of natural numbers and consider the BPVE (Z_n)_{n≥0} in varying environment v = (f_1, f_2, ...), where f_n = f̂ for n_{2k} < n ≤ n_{2k+1}, k ∈ N, and f_n = f̃ else. Then it is obvious that Z_{n_{2k+1}} given the event Z_{n_{2k+1}} > 0 converges in distribution to ĝ and Z_{n_{2k}} given the event Z_{n_{2k}} > 0 converges in distribution to g̃, provided that the sequence (n_k)_{k≥0} is increasing sufficiently fast.

Finally we arrive at results in the spirit of Kolmogorov's and Yaglom's classical asymptotics, which for Galton-Watson processes signify the critical region. Here we need another condition. We require: For every ε > 0 there is a c_ε < ∞ such that for all natural numbers n ≥ 1

   E[Y_n²; Y_n > c_ε(1 + E[Y_n])] ≤ ε E[Y_n²; Y_n ≥ 2].   (B)

This kind of uniform integrability condition is again widely satisfied, as we explain in the next section. It implies assumption (A). Indeed, for ε = 1/2,

   E[Y_n²; Y_n ≥ 2] ≤ 2 E[Y_n²; 2 ≤ Y_n ≤ c_{1/2}(1 + E[Y_n])] ≤ 2 c_{1/2}(1 + E[Y_n]) E[Y_n; Y_n ≥ 2].   (1)

Since 1 + E[Y_n] ≤ 2 E[Y_n | Y_n ≥ 1], assumption (A) holds with c = 4 c_{1/2}.

Theorem 4.
Let (B) be satisfied and let q = 1. Assume that

   1/μ_n = o( Σ_{k=1}^n ν_k/μ_{k−1} ) as n → ∞.

Then

   P(Z_n > 0) ∼ 2 ( Σ_{k=1}^n ν_k/μ_{k−1} )^{−1} as n → ∞.

Moreover, setting

   a_n := (μ_n/2) Σ_{k=1}^n ν_k/μ_{k−1},  n ≥ 1,

then a_n → ∞ and the distribution of Z_n/a_n conditioned on the event Z_n > 0 converges to a standard exponential distribution. Note that a_n ∼ E[Z_n | Z_n > 0].

In the case q < 1 there are two subcases, according to whether lim_{n→∞} μ_n is finite or infinite. The first one covers asymptotically degenerate processes and the second one the truly supercritical processes. In the case q = 1 we call the processes critical under the assumptions of Theorem 4 (then we necessarily have Σ_{k=1}^∞ ν_k/μ_{k−1} = ∞) and subcritical under the conditions of Theorem 3 (then necessarily lim_{n→∞} μ_n = 0). This results in the following classification of a branching process in environment v = (f_1, f_2, ...) under assumption (A). Term it

   supercritical, if lim_{n→∞} μ_n = ∞ and Σ_{k=1}^∞ ν_k/μ_{k−1} < ∞,
   asymptotically degenerate, if 0 < lim_{n→∞} μ_n < ∞ and Σ_{k=1}^∞ ν_k/μ_{k−1} < ∞,
   critical, if Σ_{k=1}^∞ ν_k/μ_{k−1} = ∞ and 1/μ_n = o( Σ_{k=1}^n ν_k/μ_{k−1} ),
   subcritical, if lim_{n→∞} μ_n = 0 and Σ_{k=1}^n ν_k/μ_{k−1} = O(1/μ_n).

In the critical case convergence of the means μ_n is not enforced; they may diverge, converge to zero or even oscillate. There are also mixed cases oscillating between the critical and the subcritical regimes.

Examples. (i) In the case 0 < inf_n ν_n ≤ sup_n ν_n < ∞ (as e.g. for Poisson variables) the classification simplifies. Here we are in the supercritical regime iff Σ_{k≥1} 1/μ_k < ∞ (which enforces μ_n → ∞). Roughly speaking this means that μ_n has to grow faster than linearly. On the other hand we are in the subcritical regime iff 1/μ_n ≥ c Σ_{k=0}^{n−1} 1/μ_k for some constant c > 0. This implies μ_n → 0, more precisely μ_n ≤ c^{−1}(1 + c)^{−(n−1)} for n ≥ 1, i.e. μ_n decreases at least at a geometric rate. Asymptotically degenerate behaviour is excluded, and there remains plenty of room for critical processes, that is, for the processes which conform to the requirements Σ_{k≥1} 1/μ_k = ∞ and 1/μ_n = o( Σ_{k=0}^{n−1} 1/μ_k ).

(ii) In the binary case P(Y_n = 2) = p_n, P(Y_n = 0) = 1 − p_n we get f_n′(1) = 2p_n, ν_n = 1/(2p_n) and μ_n = 2^n p_1 ··· p_n, so that ν_k/μ_{k−1} = 1/μ_k. This boils down to the same classification as in the previous example.

(iii) In the symmetric case P(Y_n = 0) = P(Y_n = 2) = p_n/2, P(Y_n = 1) = 1 − p_n we have μ_n = 1 and ν_n = p_n. Here we find critical or asymptotically degenerate behaviour, according to whether Σ_{k=1}^∞ p_k is divergent or convergent.

Our proofs use mainly tools from analysis. We are faced with the task to treat the probability measures f_1 ◦ ··· ◦ f_n, which, as is well-known, are the distributions of the random variables Z_n. In order to handle such iterated compositions of generating functions we resort to a device which has been applied from the beginning in the theory of branching processes. For a probability distribution f on N with positive, finite mean m we define a function φ: [0, 1) → R through the equation

   1/(1 − f(s)) = 1/(m(1 − s)) + φ(s),  0 ≤ s < 1.

To a certain extent the mean and the ‘shape’ of f are decoupled in this way. Indeed, Lemma 1 below shows that φ takes values which are of the magnitude of the standardized second factorial moment ν. Therefore we briefly call φ the shape function of f. As we shall see these functions are useful to dissolve the generating function f_1 ◦ ··· ◦ f_n into a sum. Here our contribution consists in obtaining sharp upper and lower bounds for the function φ and its derivative φ′, which then serve to precisely estimate the survival probabilities P(Z_n > 0), in particular by means of the upper bound for φ which we give below.
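Example (iii) can be probed numerically: P(Z_n > 0) = 1 − f_1(f_2(⋯f_n(0))) is computable exactly by evaluating the composition from the inside out. A minimal sketch (plain Python; the concrete choice p_n = 1/n², giving Σ p_n < ∞, and the function names are ours):

```python
def f(n, s):
    """Generating function of the symmetric offspring law of example (iii):
    P(Y_n=0) = P(Y_n=2) = p_n/2 with p_n = 1/n^2, so mu_n = 1 and nu_n = p_n."""
    p = 1.0 / (n * n)
    return p / 2 + (1 - p) * s + (p / 2) * s * s

def survival(n):
    """P(Z_n > 0) = 1 - f_1(f_2(...f_n(0)...)), evaluated from the inside out."""
    x = 0.0
    for k in range(n, 0, -1):
        x = f(k, x)
    return 1.0 - x

print([round(survival(n), 6) for n in (10, 100, 1000, 4000)])
```

Since Σ ν_k = Σ p_k converges while μ_n ≡ 1, the computed survival probabilities decrease to a positive limit rather than to 0, in accordance with the asymptotically degenerate regime.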
We shall see that this bound can be considered as a special case of the well-known Paley-Zygmund inequality. Agresti also obtained a lower bound for the survival probabilities, which, however, in general is away from our sharp bound. Lyons [13] obtained the equivalence of (v), (vi), (vii) and (somewhat disguised) (viii) in Theorem 1 under the assumption that the Y_n are a.s. bounded by a constant c < ∞, with methods completely different from ours. He also proved Theorem 2, again under the assumption that the offspring numbers are a.s. uniformly bounded by a constant. D'Souza and Biggins [5] obtained Theorem 2 under a different set of assumptions. They require that there are numbers a > 0, b > 1 such that μ_{m+n}/μ_m ≥ ab^n for all m, n ≥ 0, and that the Y_n are uniformly dominated by a random variable Y with E[Y log⁺ Y] < ∞. Goettge [9] obtains E[W] = 1 under the alleviated condition μ_n ≥ an^b with a > 0, b > 1, but not P(W = 0) = q. In order to prove the conditional limit law in Theorem 4 Jagers [10] draws attention to uniform estimates due to Sevast'yanov [15] (see also Lemma 3 in [6]). However, this approach demands amongst others the strong assumption that the sequence E[Z_n], n ≥ 0, is bounded from above and away from zero. Independently and in parallel to our work N. Bhattacharya and M. Perlman [2] have presented a considerable generalization of Jagers' result, on a different route and under assumptions which are stronger than ours.

The paper is organized as follows. In Section 2 we discuss the assumptions and several examples. In Section 3 we analyze the shape function φ. Then Section 4 contains the proofs of our theorems. In the Appendix we return to Lindvall's theorem.

Let us now compare the assumptions (A) and (B). The following example illustrates their difference in range.

Example.
Let Y have a linear fractional distribution, meaning that

   P(Y = y | Y ≥ 1) = (1 − p)^{y−1} p,  y ≥ 1,

with 0 < p < 1 and arbitrary 0 < P(Y ≥ 1) ≤ 1. Then

   E[Y | Y ≥ 1] = 1/p,  E[Y − 1 | Y ≥ 1] = (1 − p)/p,  E[Y(Y − 1) | Y ≥ 1] = 2(1 − p)/p²,

and it follows

   E[Y²; Y ≥ 2] ≤ 2 E[Y(Y − 1)] = 4 ((1 − p)/p²) P(Y ≥ 1) = 4 E[Y − 1; Y ≥ 1] · E[Y | Y ≥ 1] ≤ 4 E[Y; Y ≥ 2] · E[Y | Y ≥ 1].

Thus for any sequence Y_n of linear fractional random variables assumption (A) is fulfilled with c = 4, whatever their parameters p_n and P(Y_n ≥ 1) are.

On the other hand formula (1) implies in the linear fractional case the inequality

   2 ((1 − p_n)/p_n²) P(Y_n ≥ 1) = E[Y_n(Y_n − 1)] ≤ E[Y_n²; Y_n ≥ 2] ≤ 2 c_{1/2} E[Y_n; Y_n ≥ 2] · (1 + E[Y_n]) ≤ 4 c_{1/2} E[Y_n − 1; Y_n ≥ 1] · (1 + E[Y_n]) = 4 c_{1/2} ((1 − p_n)/p_n) P(Y_n ≥ 1) · (p_n + P(Y_n ≥ 1))/p_n,

which simplifies to

   1/(2 c_{1/2}) ≤ p_n + P(Y_n ≥ 1).

Therefore assumption (B) prevents a degenerating distribution of Y_n in the sense that it takes positive values only with asymptotically vanishing probability but, given this event, its values are getting larger and larger.

As it happens, Theorem 4 still holds true for linear fractional Y_n, n ≥ 1, regardless of the validity of (B). Then, as is well known, also Z_n is linear fractional for any n ≥ 1, and consequently the sequence Z_n/E[Z_n | Z_n ≥ 1], given the event that Z_n ≥ 1, converges in distribution to a standard exponential distribution whenever E[Z_n | Z_n ≥ 1] → ∞. We leave it to the reader to work out the details.

In other examples it might be cumbersome to verify assumptions (A) or (B) directly. Therefore we introduce another assumption, which is more amenable in this respect. It reads: There is a constant c̄ < ∞ such that for all natural numbers n ≥ 1

   E[Y_n(Y_n − 1)(Y_n − 2)] ≤ c̄ E[Y_n(Y_n − 1)] · (1 + E[Y_n]).   (C)

Condition (C) implies (A) and (B), as seen from the following proposition.

Proposition.
If condition (C) is fulfilled, then (B) holds with c_ε = max(3, 3c̄/ε) and (A) holds with c = max(12, 24c̄).

Proof. From c_ε ≥ 3 it follows that on the event Y_n > c_ε(1 + E[Y_n]) we have Y_n ≥ 4 and thus Y_n² ≤ 3(Y_n − 1)(Y_n − 2). Hence

   E[Y_n²; Y_n > c_ε(1 + E[Y_n])] ≤ 3 E[(Y_n − 1)(Y_n − 2); Y_n > c_ε(1 + E[Y_n])] ≤ 3 E[Y_n(Y_n − 1)(Y_n − 2)] / (c_ε(1 + E[Y_n])) ≤ (3c̄/c_ε) E[Y_n(Y_n − 1)].

It follows E[Y_n²; Y_n > c_ε(1 + E[Y_n])] ≤ ε E[Y_n²; Y_n ≥ 2], which is our first claim. The second one follows by means of (1).

Condition (C) is formulated in such a way that it can be easily handled by means of generating functions and their derivatives. Here are some examples.

Examples. (i) Let Y be Poisson with parameter λ > 0. Then

   E[Y(Y − 1)(Y − 2)] = λ³ ≤ λ²(λ + 1) = E[Y(Y − 1)](1 + E[Y]).

For this type of distribution (C) is fulfilled with c̄ = 1.

(ii) For binomial Y with parameters m ≥ 1 and 0 < p < 1,

   E[Y(Y − 1)(Y − 2)] = m(m − 1)(m − 2)p³ ≤ m(m − 1)p² · mp ≤ E[Y(Y − 1)](1 + E[Y]).

(iii) For a hypergeometric distribution with parameters (N, K, m) we have for N ≥ 3

   E[Y(Y − 1)(Y − 2)] = m(m − 1)(m − 2) · K(K − 1)(K − 2)/(N(N − 1)(N − 2)) ≤ 3 · m(m − 1) K(K − 1)/(N(N − 1)) · mK/N ≤ 3 E[Y(Y − 1)](1 + E[Y]),

and (C) is satisfied with c̄ = 3. The case N ≤ 2 is obvious.

(iv) For negative binomial distributions the generating function is given by

   f(s) = ( p/(1 − s(1 − p)) )^α

with 0 < p < 1 and α ≥ 1. Now

   E[Y] = α(1 − p)/p,  E[Y(Y − 1)] = α(α + 1)(1 − p)²/p²,  E[Y(Y − 1)(Y − 2)] = α(α + 1)(α + 2)(1 − p)³/p³.

Thus

   E[Y(Y − 1)(Y − 2)] ≤ 3 E[Y(Y − 1)](1 + E[Y]).

Again (C) is fulfilled with c̄ = 3.
Bounds for the shape function

For f ∈ P with mean 0 < m = f′(1) < ∞ define the shape function φ = φ_f: [0, 1) → R through the equation

   1/(1 − f(s)) = 1/(m(1 − s)) + φ(s),  0 ≤ s < 1.

Due to convexity of f(s) we have φ(s) ≥ 0 for 0 ≤ s < 1. By means of a Taylor expansion of f around 1 one obtains lim_{s↑1} φ(s) = f″(1)/(2f′(1)²), thus we extend φ continuously by setting

   φ(1) := ν/2 with ν := f″(1)/f′(1)².

In this section we prove the following sharp bounds.

Lemma 1. Assume f″(1) < ∞. Then for 0 ≤ s ≤ 1

   φ(0)/2 ≤ φ(s) ≤ 2 φ(1).   (2)

Note that φ is identically zero if f[z] = 0 for all z ≥ 2. Else φ(0) > 0, and the lower bound of φ becomes strictly positive. Choosing s = 1 and s = 0 in (2) we obtain φ(0)/2 ≤ φ(1) and φ(0) ≤ 2 φ(1). Note that for f = δ_k (Dirac measure at point k) and k ≥ 2 we have φ(1) = φ(0)/2 ≤ φ(s), thus the constants in (2) cannot be improved.

Lemma 2.
Let the random variable Y have distribution f and assume f″(1) < ∞. Then for 0 ≤ s ≤ 1 and natural numbers a ≥ 1 we have, for s ≤ t ≤ 1,

   |φ(1) − φ(t)| ≤ νa(1 − s) + (2/m²) E[Y²; Y > a] + 2mν²(1 − s).

Uniform estimates of φ(1) − φ(s) based on third moments have already been obtained by Sevast'yanov [15] and others (see Lemma 3 in [6]). Our lemma implies and generalizes these estimates.

For the proof of these lemmas we use the following known result. For convenience we give its proof.

Lemma 3.
Let g₁, g₂ ∈ P have the same support and satisfy the following property: For any y ∈ N with g₁[y] > 0 it follows

   g₁[z]/g₁[y] ≤ g₂[z]/g₂[y] for all z > y.

Also let α: N → R be a non-decreasing function. Then

   Σ_{y=0}^∞ α(y) g₁[y] ≤ Σ_{y=0}^∞ α(y) g₂[y].

Proof. By assumption there is a non-decreasing function h(y), y ∈ N, such that h(y) = g₂(y)/g₁(y) for all elements y of the support of g₁. Then for any real number c

   Σ_{y=0}^∞ α(y) g₂[y] − Σ_{y=0}^∞ α(y) g₁[y] = Σ_{y=0}^∞ (α(y) − c)(g₂[y] − g₁[y]) = Σ_{y=0}^∞ (α(y) − c)(h(y) − 1) g₁[y].

For c := min{α(y): h(y) ≥ 1} we have α(0) ≤ c < ∞. For this choice of c, since h and α are non-decreasing, every summand of the right-hand sum is non-negative. Thus the whole sum is non-negative, too, and our assertion follows.

Proof of Lemma 1. (i) First we examine a special case of Lemma 3. Consider for 0 < s ≤ 1 and r ∈ N the probability measures

   g_s[y] = s^{r−y}/(1 + s + ··· + s^r),  0 ≤ y ≤ r.

Then for 0 < s ≤ t ≤ 1 and 0 ≤ y < z ≤ r we have g_s[z]/g_s[y] = s^{y−z} ≥ t^{y−z} = g_t[z]/g_t[y]. We therefore obtain that

   Σ_{y=0}^r y g_s[y] = (s^{r−1} + 2s^{r−2} + ··· + r)/(1 + s + ··· + s^r)

is a decreasing function in s. Also Σ_{y=0}^r y g₀[y] = r and Σ_{y=0}^r y g₁[y] = r/2, and it follows for 0 ≤ s ≤ 1

   r/2 ≤ (r + (r−1)s + ··· + s^{r−1})/(1 + s + ··· + s^r) ≤ r.   (3)

(ii) Next we derive a second representation for φ. We have

   1 − f(s) = Σ_{z=1}^∞ f[z](1 − s^z) = (1 − s) Σ_{z=1}^∞ f[z] Σ_{k=0}^{z−1} s^k,

and

   f′(1)(1 − s) − (1 − f(s)) = (1 − s) Σ_{z=1}^∞ f[z] Σ_{k=0}^{z−1} (1 − s^k) = (1 − s)² Σ_{z=1}^∞ f[z] Σ_{k=1}^{z−1} Σ_{j=0}^{k−1} s^j = (1 − s)² Σ_{z=1}^∞ f[z]((z−1) + (z−2)s + ··· + s^{z−2}).

Therefore

   φ(s) = (m(1 − s) − (1 − f(s)))/(m(1 − s)(1 − f(s))) = Σ_{y=1}^∞ f[y]((y−1) + (y−2)s + ··· + s^{y−2}) / ( m · Σ_{z=1}^∞ f[z](1 + s + ··· + s^{z−1}) ).

By means of (3) it follows

   φ(s) ≤ ψ(s)/m ≤ 2φ(s)   (4)

with

   ψ(s) := Σ_{y=1}^∞ f[y](y − 1)(1 + s + ··· + s^{y−1}) / Σ_{z=1}^∞ f[z](1 + s + ··· + s^{z−1}).

Now consider the probability measures g_s ∈ P, 0 ≤ s ≤ 1, given by

   g_s[y] := f[y](1 + s + ··· + s^{y−1}) / Σ_{z=1}^∞ f[z](1 + s + ··· + s^{z−1}),  y ≥ 1.   (5)

Then for f[y] > 0 and z > y, after some algebra, g_s[z]/g_s[y] = (f[z]/f[y]) · (1 + s + ··· + s^{z−1})/(1 + s + ··· + s^{y−1}), which is an increasing function in s. Therefore by Lemma 3 the function ψ(s) is increasing in s. In combination with (4) we get

   φ(s) ≤ ψ(s)/m ≤ ψ(1)/m ≤ 2φ(1),  φ(s) ≥ ψ(s)/(2m) ≥ ψ(0)/(2m) ≥ φ(0)/2.

This gives the claim of the lemma.
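The bounds of Lemma 1 are easy to test numerically. The following sketch (plain Python; Poisson offspring is an illustrative choice of ours, for which m = λ and ν = 1, so φ(1) = 1/2) evaluates φ directly from its defining equation and checks φ(0)/2 ≤ φ(s) ≤ 2φ(1) on a grid:

```python
from math import exp

def phi(lam, s):
    """Shape function of the Poisson(lam) law, from its defining equation
    1/(1-f(s)) = 1/(m(1-s)) + phi(s) with f(s) = exp(lam*(s-1)), m = lam.
    At s = 1 we use the continuous extension phi(1) = nu/2 = 1/2."""
    if s == 1.0:
        return 0.5
    return 1.0 / (1.0 - exp(lam * (s - 1.0))) - 1.0 / (lam * (1.0 - s))

for lam in (0.2, 1.0, 3.0, 10.0):
    lower, upper = phi(lam, 0.0) / 2, 2 * phi(lam, 1.0)
    assert all(lower - 1e-12 <= phi(lam, i / 100) <= upper + 1e-12
               for i in range(101))
```

For λ = 10, for instance, φ decreases from φ(0) ≈ 0.9 to φ(1) = 0.5, comfortably inside [φ(0)/2, 2φ(1)] = [0.45, 1].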
Proof of Lemma 2.
We prepare the proof by estimating the derivative of φ, given by

   φ′(s) = f′(s)/(1 − f(s))² − 1/(m(1 − s)²) = (1/m) ( (√(m f′(s)))²/(1 − f(s))² − 1/(1 − s)² ).

In order to handle this expression we substitute the square of the geometric mean √(m f′(s)) by the square of the arithmetic mean (m + f′(s))/2. This yields

   φ′(s) = ψ₁(s) − ψ₂(s)   (6)

with

   ψ₁(s) = (1/(4m)) (m + f′(s))²/(1 − f(s))² − 1/(m(1 − s)²),
   ψ₂(s) = (1/(4m)) (m + f′(s))²/(1 − f(s))² − f′(s)/(1 − f(s))² = (1/(4m)) (m − f′(s))²/(1 − f(s))².

We show that both ψ₁ and ψ₂ are non-negative functions and estimate them from above. To accomplish this for ψ₁ we introduce

   ζ(s) := (m + f′(s)) − 2(1 − f(s))/(1 − s) = Σ_{y=1}^∞ y(1 + s^{y−1}) f[y] − 2 Σ_{y=1}^∞ (1 + s + ··· + s^{y−1}) f[y] = Σ_{y=3}^∞ ( y(1 + s^{y−1}) − 2(1 + s + ··· + s^{y−1}) ) f[y].

From

   d/ds ( y(1 + s^{y−1}) − 2(1 + s + ··· + s^{y−1}) ) = y(y−1)s^{y−2} − 2(1 + 2s + ··· + (y−1)s^{y−2}) ≤ y(y−1)s^{y−2} − 2s^{y−2}(1 + 2 + ··· + (y−1)) = 0

for 0 ≤ s ≤ 1, and since ζ(1) = 0, we see that ζ is a non-negative, decreasing function. Thus ψ₁ is a non-negative function, too. Also ζ(0) = Σ_{y≥3}(y − 2) f[y] ≤ m.

Moreover we have for y ≥ 3

   y(1 + s^{y−1}) − 2(1 + s + ··· + s^{y−1}) = (1 − s)² Σ_{z=1}^{y−2} z(y − z − 1) s^{z−1},

and consequently ζ(s) = (1 − s)² ξ(s) with

   ξ(s) := Σ_{y=3}^∞ Σ_{z=1}^{y−2} z(y − z − 1) s^{z−1} f[y].

The function ξ is non-negative and increasing.

Coming back to ψ₁ we rewrite it as

   ψ₁(s) = ( (m + f′(s))(1 − s) − 2(1 − f(s)) )/( 2(1 − f(s))(1 − s) ) · ( (m + f′(s))(1 − s) + 2(1 − f(s)) )/( 2m(1 − f(s))(1 − s) ).

Using f′(s) ≤ m it follows

   ψ₁(s) ≤ ( ζ(s)/(2(1 − f(s))) ) ( 1/(1 − f(s)) + 1/(m(1 − s)) ) = (ζ(s)/2) ( 1/(m(1 − s)) + φ(s) ) ( 2/(m(1 − s)) + φ(s) ) ≤ ζ(s) ( 1/(m(1 − s)) + φ(s) )² ≤ 2ξ(s)/m² + 2ζ(s)φ(s)².

By means of Lemma 1, the monotonicity properties of ξ and ζ, φ(1) = ν/2 and ζ(0) ≤ m we obtain

   0 ≤ ψ₁(s) ≤ 2ξ(s)/m² + 2mν².   (7)

Now we investigate the function ψ₂, which we rewrite as

   ψ₂(s) = (1/(4m)) ( (m − f′(s))/(1 − f(s)) )².

We have

   1 − f(s) = Σ_{z=1}^∞ (1 − s^z) f[z] = (1 − s) Σ_{z=1}^∞ (1 + s + ··· + s^{z−1}) f[z]

and

   m − f′(s) = Σ_{y=1}^∞ (1 − s^{y−1}) y f[y] = (1 − s) Σ_{y=2}^∞ y(1 + s + ··· + s^{y−2}) f[y].

Using the notation (5) it follows

   (m − f′(s))/(1 − f(s)) = Σ_{y=2}^∞ ( (1 + ··· + s^{y−2})/(1 + ··· + s^{y−1}) ) y g_s[y] ≤ Σ_{y=2}^∞ y g_s[y].

As above we may apply Lemma 3 to the probability measures g_s and conclude that the right-hand term is increasing with s. Therefore

   0 ≤ (m − f′(s))/(1 − f(s)) ≤ Σ_{y=2}^∞ y g₁[y] = Σ_{y=2}^∞ y² f[y] / Σ_{z=1}^∞ z f[z] ≤ 2 Σ_{y=1}^∞ y(y−1) f[y] / Σ_{z=1}^∞ z f[z] = 2mν

and hence

   0 ≤ ψ₂(s) ≤ mν².   (8)

Coming to our claim, note first that owing to the non-negativity of ψ₁ and ψ₂ we obtain from formula (6), for any s ≤ u ≤ 1,

   − ∫_s^1 ψ₂(t) dt ≤ φ(1) − φ(u) ≤ ∫_s^1 ψ₁(t) dt.

The estimates (7) and (8) entail

   − mν²(1 − s) ≤ φ(1) − φ(u) ≤ (2/m²) ∫_s^1 ξ(t) dt + 2mν²(1 − s).   (9)

It remains to estimate the right-hand integral. We have for 0 ≤ s < 1

   ∫_s^1 ξ(t) dt = Σ_{y=3}^∞ Σ_{z=1}^{y−2} (y − z − 1)(1 − s^z) f[y] = (1 − s) Σ_{u=0}^∞ s^u Σ_{y=u+3}^∞ ( (y − u − 2)(y − u − 1)/2 ) f[y].

The right-hand sum is monotonically decreasing in u, therefore for natural numbers a we end up with the estimate

   ∫_s^1 ξ(t) dt ≤ Σ_{y=3}^∞ ( (y−2)(y−1)/2 ) f[y] (1 − s) Σ_{u=0}^{a−1} s^u + Σ_{y=a+3}^∞ ( (y−a−2)(y−a−1)/2 ) f[y] (1 − s) Σ_{u=a}^∞ s^u ≤ (f″(1)/2) a(1 − s) + E[Y²; Y > a].

Combining this estimate with (9) our claim follows.

Remark.
We have

   ξ(1) = Σ_{y=3}^∞ Σ_{z=1}^{y−2} z(y − z − 1) f[y] = (1/6) Σ_{y=3}^∞ y(y−1)(y−2) f[y] = f‴(1)/6,

and hence from (6), (7), (8) and the monotonicity of ξ, for 0 ≤ s ≤ 1,

   − f″(1)²/f′(1)³ ≤ φ′(s) ≤ f‴(1)/(3f′(1)²) + 2 f″(1)²/f′(1)³.

The quality of these bounds is evident from the observation that

   φ′(1) = f‴(1)/(6f′(1)²) − f″(1)²/(4f′(1)³),

as follows by means of Taylor expansions of f and f′ about 1.

Let v = (f₁, f₂, ...) denote a varying environment. Let us define for non-negative integers k ≤ n the probability measures

   f_{k,n} := f_{k+1} ◦ ··· ◦ f_n,

with the convention f_{n,n} = δ₁ (the Dirac measure at point 1). As is well-known, the distribution of Z_n is given by f_{0,n}. Thus for a BPVE one is faced with the task to analyze such probability measures.

First let us review some formulas for moments. There exists a clear-cut expression for the variance of Z_n due to Fearn [7]. It seems to be less noticed that there is a similar appealing formula for the second factorial moment of Z_n, which turns out to be more useful for our purpose.

Lemma 4.
For a BPVE (Z_n)_{n≥0} we have

   E[Z_n] = μ_n,  E[Z_n(Z_n − 1)]/E[Z_n]² = Σ_{k=1}^n ν_k/μ_{k−1}.

The proof is standard. We have

   f′_{k,n}(s) = Π_{l=k+1}^n f′_l(f_{l,n}(s)),

in particular f′_{n,n}(s) = 1, and after some rearrangements

   f″_{k,n}(s) = f′_{k,n}(s)² Σ_{l=k+1}^n f″_l(f_{l,n}(s)) / ( f′_l(f_{l,n}(s))² Π_{j=k+1}^{l−1} f′_j(f_{j,n}(s)) ),

in particular f″_{n,n}(s) = 0. Choosing k = 0 and s = 1, Lemma 4 is proved.

Next we recall an expansion of the generating function of Z_n taken from [11] and [8]. It is a kind of formula which has been used in many studies of branching processes. Let φ_n, n ≥ 1, be the shape functions of f_n, n ≥ 1. Then, since f_{k,n} = f_{k+1} ◦ f_{k+1,n} for k < n,

   1/(1 − f_{k,n}(s)) = 1/( f′_{k+1}(1)(1 − f_{k+1,n}(s)) ) + φ_{k+1}(f_{k+1,n}(s)).

Iterating the formula we end up with the following identity.

Lemma 5. For 0 ≤ s < 1 and 0 ≤ k < n

   1/(1 − f_{k,n}(s)) = μ_k/(μ_n(1 − s)) + μ_k Σ_{l=k+1}^n φ_l(f_{l,n}(s))/μ_{l−1}.
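Lemma 5 can be sanity-checked numerically by explicit composition of generating functions. A sketch (plain Python; the binary offspring laws P(Y_n = 2) = p_n, P(Y_n = 0) = 1 − p_n are an illustrative choice of ours) compares both sides of the identity for k = 0:

```python
def make_f(p):
    return lambda s: (1.0 - p) + p * s * s   # binary offspring: P(Y=2)=p

def lemma5_sides(ps, s):
    """Both sides of Lemma 5 for k = 0 in the environment v = (f_1, ..., f_n)
    with binary offspring parameters ps = [p_1, ..., p_n] and 0 <= s < 1."""
    n = len(ps)
    fs = [make_f(p) for p in ps]             # fs[l-1] is f_l
    comp = [s]                               # f_{n,n}(s), f_{n-1,n}(s), ...
    for f in reversed(fs):
        comp.append(f(comp[-1]))
    comp.reverse()                           # comp[l] = f_{l,n}(s)
    mu = [1.0]
    for p in ps:
        mu.append(mu[-1] * 2.0 * p)          # mu_l = f_1'(1) ... f_l'(1)
    def phi(l, t):                           # shape function of f_l at t < 1
        m = 2.0 * ps[l - 1]
        return 1.0 / (1.0 - fs[l - 1](t)) - 1.0 / (m * (1.0 - t))
    lhs = 1.0 / (1.0 - comp[0])
    rhs = 1.0 / (mu[n] * (1.0 - s)) + sum(phi(l, comp[l]) / mu[l - 1]
                                          for l in range(1, n + 1))
    return lhs, rhs
```

For instance `lemma5_sides([0.3, 0.6, 0.5, 0.45], 0.2)` returns two values that agree up to floating-point error, as the identity is exact.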
Lemma 6.
Under Assumption (A) there is a γ < ∞ such that for all n ≥ 0

   E[Z_n]²/E[Z_n²] ≤ P(Z_n > 0) ≤ γ E[Z_n]²/E[Z_n²].

Proof. The left-hand estimate is just the Paley-Zygmund inequality. For the right-hand estimate observe that P(Z_n > 0) = 1 − f_{0,n}[0] = 1 − f_{0,n}(0). Using Lemma 5 with s = 0 we get the representation

   1/P(Z_n > 0) = 1/μ_n + Σ_{k=1}^n φ_k(f_{k,n}(0))/μ_{k−1},   (10)

hence by means of Lemma 1

   1/P(Z_n > 0) ≥ 1/μ_n + (1/2) Σ_{k=1}^n φ_k(0)/μ_{k−1}.   (11)

Now

   φ_k(0) = 1/(1 − f_k[0]) − 1/f′_k(1) = E[Y_k − 1; Y_k ≥ 1]/( P(Y_k ≥ 1) E[Y_k] ) ≥ E[Y_k; Y_k ≥ 2]/( 2 P(Y_k ≥ 1) E[Y_k] ),

hence (A) implies

   φ_k(0) ≥ (1/(2c)) E[Y_k²; Y_k ≥ 2]/E[Y_k]² ≥ ν_k/(2c) = φ_k(1)/c.   (12)

It follows with γ := max(1, 4c) that

   1/P(Z_n > 0) ≥ 1/μ_n + (1/(4c)) Σ_{k=1}^n ν_k/μ_{k−1} ≥ (1/γ) ( 1/μ_n + Σ_{k=1}^n ν_k/μ_{k−1} ).

On the other hand Lemma 4 implies

   E[Z_n²]/E[Z_n]² = E[Z_n(Z_n − 1)]/E[Z_n]² + 1/E[Z_n] = Σ_{k=1}^n ν_k/μ_{k−1} + 1/μ_n.   (13)

Combining the last two formulas our claim follows.

Proof of Theorem 1. (i) ⇔ (ii): Since lim_{n→∞} P(Z_n >
0) = 1 − q, the equivalence follows from Lemma 6.

(ii) ⇔ (iii): We have

   Σ_{k=1}^n ρ_k/μ_{k−1} = Σ_{k=1}^n (ν_k + 1/f′_k(1) − 1)/μ_{k−1} = Σ_{k=1}^n ν_k/μ_{k−1} + Σ_{k=1}^n ( 1/μ_k − 1/μ_{k−1} ) = Σ_{k=1}^n ν_k/μ_{k−1} + 1/μ_n − 1,   (14)

thus because of (13)

   E[Z_n²]/E[Z_n]² = Σ_{k=1}^n ρ_k/μ_{k−1} + 1.   (15)

This gives the claim.

(iii) ⇔ (iv): This equivalence is an immediate consequence of (14).

(v) ⇔ (vi): This implication follows again from Lemma 6.

(vi) ⇔ (vii): Again this is a consequence of equation (15).

(vii) ⇔ (viii): This claim follows from (14).

Remark.
From (11) it follows that a sufficient condition for a.s. extinction is given by the single requirement Σ_{k≥1} φ_k(0)/μ_{k−1} = ∞. This confirms a conjecture of Jirina [11].

Proof of Theorem 2.
Statement (i) is obvious. For the first part of statement (ii) note that from Theorem 1, (vi) it follows that sup_{n≥0} E[W_n^2] < ∞. Therefore the martingale (W_n)_{n≥0} is square-integrable, implying E[W] = E[W_0] = 1.

For the other part we distinguish two cases. Either μ_n → r with 0 < r < ∞. Then W_n = Z_n/μ_n → Z_∞/r a.s., consequently W = Z_∞/r a.s. and P(W = 0) = P(Z_∞ = 0) = q. Else we may assume μ_n → ∞ in view of Theorem 1, (viii). Also {Z_∞ = 0} ⊂ {W = 0} a.s., thus it is sufficient to show that P(Z_∞ > 0, W = 0) = 0. First we estimate P(Z_∞ = 0 | Z_k = 1) from below. From Lemma 5 and Lemma 1 we have for k < n

  1/P(Z_n > 0 | Z_k = 1) = 1/(1 − f_{k,n}(0)) ≥ (μ_k/2) ∑_{l=k+1}^n φ_l(0)/μ_{l−1},

as well as

  1/(1 − E[e^{−λW_n} | Z_k = 1]) = 1/(1 − f_{k,n}(e^{−λ/μ_n})) ≤ μ_k/(μ_n(1 − e^{−λ/μ_n})) + 2μ_k ∑_{l=k+1}^n φ_l(1)/μ_{l−1}

with λ > 0. By means of φ_l(1) = ν_l/2 and (12) it follows

  1/(1 − E[e^{−λW_n} | Z_k = 1]) ≤ μ_k/(μ_n(1 − e^{−λ/μ_n})) + 4c/P(Z_n > 0 | Z_k = 1).

Letting n → ∞ we get

  1/(1 − E[e^{−λW} | Z_k = 1]) ≤ μ_k/λ + 4c/P(Z_∞ > 0 | Z_k = 1),

and with λ → ∞

  P(Z_∞ > 0 | Z_k = 1) ≤ 4c P(W > 0 | Z_k = 1).

Using e^{−x} ≤ 1 − x/2 for 0 ≤ x ≤ 1, we obtain in the case P(W > 0 | Z_k = 1) ≤ (8c)^{−1}

  P(Z_∞ = 0 | Z_k = 1) ≥ 1 − 4c P(W > 0 | Z_k = 1) ≥ e^{−8c P(W > 0 | Z_k = 1)} ≥ (1 − P(W > 0 | Z_k = 1))^{8c} = P(W = 0 | Z_k = 1)^{8c}.   (16)

Now we draw on a martingale which already appears in the work of D'Souza and Biggins [5]. Let for n ≥ 0

  M_n := P(W = 0 | Z_0, ..., Z_n) = P(W = 0 | Z_n = 1)^{Z_n} a.s.

From standard martingale theory M_n → 1_{W=0} a.s. In particular we have

  P(W = 0 | Z_n = 1)^{Z_n} → 1_{W=0} a.s.,   (17)

a result which has already been noticed and exploited by D'Souza [4].

We distinguish two cases. Either there is an infinite sequence of natural numbers such that P(W > 0 | Z_n = 1) > (8c)^{−1} along this sequence. Then (17) implies that Z_n → 0 a.s. on the event {W = 0} along this sequence, hence Z_∞ = 0 a.s. on {W = 0}, and we are done. Or else we may apply our estimate (16) to obtain from (17) that

  P(Z_∞ = 0 | Z_n = 1)^{Z_n} → 1 a.s. on the event {W = 0}.

Therefore, given ε >
0, we have for n sufficiently large

  P(Z_∞ > 0, W = 0) ≤ ε + P(Z_n > 0, P(Z_∞ = 0 | Z_n = 1)^{Z_n} ≥ 1 − ε)
   ≤ ε + (1/(1 − ε)) E[P(Z_∞ = 0 | Z_n); Z_n > 0] = ε + (1/(1 − ε)) P(Z_∞ = 0, Z_n > 0).

Letting n → ∞ we obtain P(Z_∞ > 0, W = 0) ≤ ε, and the claim follows with ε → 0.

Proof of Theorem 3. (i) ⇒ (iii): From Lemma 5, Lemma 1 and (12) we have for 0 ≤ s < 1

  E[1 − s^{Z_n} | Z_n >
0] = (1 − f_{0,n}(s))/(1 − f_{0,n}(0)) ≥ (1/(4c)) · (∑_{k=1}^n ν_k/μ_{k−1}) / (1/(μ_n(1 − s)) + ∑_{k=1}^n ν_k/μ_{k−1}).

By assumption we may choose 0 ≤ s < 1 such that

  μ_n(1 − s) ≥ (∑_{k=1}^n ν_k/μ_{k−1})^{−1}

for all n ≥ 1; then E[1 − s^{Z_n} | Z_n > 0] ≥ 1/(8c) for all n. Thus the implication is verified.

(iii) ⇒ (ii): From the left-hand inequality in Lemma 6 and from (13) we obtain

  1 ≤ E[Z_n | Z_n >
0] = μ_n/P(Z_n > 0) ≤ 1 + μ_n ∑_{k=1}^n ν_k/μ_{k−1},

from which the claim follows.

(ii) ⇒ (i): This implication is obvious.

The next lemma prepares the proof of Theorem 4. It clarifies the role of (B).

Lemma 7.
Under the assumptions of Theorem 4 we have, as n → ∞,

  sup_{0≤s≤1} |∑_{k=1}^n φ_k(f_{k,n}(s))/μ_{k−1} − ∑_{k=1}^n φ_k(1)/μ_{k−1}| = o(∑_{k=1}^n φ_k(1)/μ_{k−1}).

Proof. Fix ε > 0 and choose c_{ε/3} according to assumption (B). Let

  s_k := 1 − η/f_k'(1)

with some 0 < η < 1. Then from Lemma 2 with a = ⌊c_{ε/3}⌋

  sup_{s_k ≤ t ≤ 1} |φ_k(1) − φ_k(t)| ≤ η c_{ε/3} φ_k(1) + (ε/3) φ_k(1) + 2ν_k (f_k''(1)/f_k'(1)) · η/f_k'(1).

From the estimate (1) it follows that f_k''(1) ≤ c_{1/2} f_k'(1)(1 + f_k'(1)). Therefore there is an η = η_ε > 0 such that for all k

  sup_{s_k ≤ t ≤ 1} |φ_k(1) − φ_k(t)| ≤ ε φ_k(1).

Now define r = r_{ε,n} := min{k ≤ n : f_{k,n}(0) < s_k}, where we put r = n if no k ≤ n fulfils the right-hand inequality. It follows in view of Lemma 1

  |∑_{k=1}^n φ_k(1)/μ_{k−1} − ∑_{k=1}^n φ_k(f_{k,n}(s))/μ_{k−1}| ≤ ε ∑_{k=1}^{r−1} φ_k(1)/μ_{k−1} + 3 ∑_{k=r}^n φ_k(1)/μ_{k−1}.

Also, from (12), Lemma 1 and (10),

  ∑_{k=r+1}^n φ_k(1)/μ_{k−1} ≤ c ∑_{k=r+1}^n φ_k(0)/μ_{k−1} ≤ 2c ∑_{k=r+1}^n φ_k(f_{k,n}(0))/μ_{k−1} ≤ 2c/(μ_r(1 − f_{r,n}(0))) ≤ 2c/(μ_r(1 − s_r)) = 2c f_r'(1)/(η μ_r) ≤ (2c/η)(1/μ_{r−1} + 1/μ_r),

and

  φ_r(1)/μ_{r−1} = f_r''(1)/(2 f_r'(1)^2 μ_{r−1}) ≤ c_{1/2}(1 + f_r'(1))/(2 f_r'(1) μ_{r−1}) = (c_{1/2}/2)(1/μ_{r−1} + 1/μ_r).

Thus

  |∑_{k=1}^n φ_k(1)/μ_{k−1} − ∑_{k=1}^n φ_k(f_{k,n}(s))/μ_{k−1}| ≤ ε ∑_{k=1}^n φ_k(1)/μ_{k−1} + 3(2c/η + c_{1/2}/2)(1/μ_{r−1} + 1/μ_r).

Observe that the other assumptions of Theorem 4 together with Theorem 1, (iii) imply

  ∑_{k=1}^∞ φ_k(1)/μ_{k−1} = (1/2) ∑_{k=1}^∞ ν_k/μ_{k−1} = ∞.

In view of the lemma's assumption there is a positive integer r_ε such that for all n ≥ r > r_ε

  3(2c/η + c_{1/2}/2)(1/μ_{r−1} + 1/μ_r) ≤ ε ∑_{k=1}^{r−1} ν_k/μ_{k−1} + ε ∑_{k=1}^{r} ν_k/μ_{k−1} ≤ 4ε ∑_{k=1}^n φ_k(1)/μ_{k−1}.

Since the right-hand term diverges as n → ∞, and since 1/μ_{r−1} + 1/μ_r is bounded for the finitely many r ≤ r_ε, it follows that

  3(2c/η + c_{1/2}/2)(1/μ_{r−1} + 1/μ_r) ≤ 4ε ∑_{k=1}^n φ_k(1)/μ_{k−1}

for all 0 < r ≤ n, if only n is large enough, and we obtain

  |∑_{k=1}^n φ_k(f_{k,n}(s))/μ_{k−1} − ∑_{k=1}^n φ_k(1)/μ_{k−1}| ≤ 5ε ∑_{k=1}^n φ_k(1)/μ_{k−1}.

Since ε > 0 was arbitrary, this proves our claim.
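The two-sided comparison used above — Lemma 6 bounds 1/P(Z_n > 0) by the second-moment quantity in (13) from one side, while (11) and (12) bound it from the other — can be checked numerically by composing generating functions. The sketch below uses an illustrative offspring law, P(Y_k = 2) = p_k and P(Y_k = 0) = 1 − p_k (so f_k(s) = 1 − p_k + p_k s^2, f_k'(1) = 2p_k, ν_k = 1/(2p_k)); the law and parameters are assumptions for illustration, not taken from the paper.

```python
def survival_vs_series(p):
    """For offspring P(Y_k=2)=p_k, P(Y_k=0)=1-p_k, i.e. f_k(s)=1-p_k+p_k*s^2,
    return (1/P(Z_n > 0), sum_k nu_k/mu_{k-1} + 1/mu_n) as in (13)."""
    # extinction probability f_{0,n}(0) via backward composition t <- f_k(t)
    t = 0.0
    for pk in reversed(p):
        t = 1.0 - pk + pk * t * t
    inv_survival = 1.0 / (1.0 - t)         # 1 / (1 - f_{0,n}(0))

    mu, series = 1.0, 0.0                  # mu_0 = 1
    for pk in p:
        series += 1.0 / (2.0 * pk) / mu    # nu_k / mu_{k-1}
        mu *= 2.0 * pk                     # mu_k = mu_{k-1} * f_k'(1)
    return inv_survival, series + 1.0 / mu

# critical environment f_k'(1) = 1: Lemma 6 gives a <= b, while (11)-(12)
# keep the ratio a/b bounded away from 0
a, b = survival_vs_series([0.5] * 100)
print(a, b)
```

With p_k ≡ 1/2 one finds 1/P(Z_n > 0) growing like n/2 while the series equals n + 1, in line with the bounded ratio asserted by the two estimates.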
Proof of Theorem 4.
From (10), Lemma 7 and the theorem's assumption it follows

  1/P(Z_n > 0) = 1/μ_n + ∑_{k=1}^n φ_k(f_{k,n}(0))/μ_{k−1} ∼ (1/2) ∑_{k=1}^n ν_k/μ_{k−1},

implying the first claim. Also, using Lemma 5 we have

  1 − E[e^{−λZ_n/a_n} | Z_n > 0] = (1 − f_{0,n}(e^{−λ/a_n}))/(1 − f_{0,n}(0))
   = (1/μ_n + ∑_{k=1}^n φ_k(f_{k,n}(0))/μ_{k−1}) · (1/(μ_n(1 − e^{−λ/a_n})) + ∑_{k=1}^n φ_k(f_{k,n}(e^{−λ/a_n}))/μ_{k−1})^{−1}.

Since a_n → ∞, from Lemma 7 and the theorem's assumption, as n → ∞,

  1 − E[e^{−λZ_n/a_n} | Z_n > 0] = ((1 + o(1)) (1/2) ∑_{k=1}^n ν_k/μ_{k−1}) · ((1 + o(1)) a_n/(λμ_n) + (1 + o(1)) (1/2) ∑_{k=1}^n ν_k/μ_{k−1})^{−1}.

From the definition of a_n we get

  1 − E[e^{−λZ_n/a_n} | Z_n > 0] = λ/(1 + λ) + o(1).

This implies the claim.
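For a constant critical environment Theorem 4 reduces to Yaglom's classical theorem, which makes the exponential limit easy to illustrate by Monte Carlo. The sketch below uses the illustrative offspring law P(Y = 2) = P(Y = 0) = 1/2 (so f'(1) = 1, ν_k = 1) and the normalization a_n = (μ_n/2) ∑_{k=1}^n ν_k/μ_{k−1} = n/2; the law, sample size and seed are assumptions for illustration, not from the paper.

```python
import random

def simulate_z(n, rng):
    """One BPVE path with offspring P(Y=2)=P(Y=0)=1/2 (critical, nu_k=1)."""
    z = 1
    for _ in range(n):
        # each individual independently leaves 2 children with probability 1/2
        z = 2 * sum(1 for _ in range(z) if rng.random() < 0.5)
        if z == 0:
            break
    return z

rng = random.Random(2017)
n, paths = 100, 20000
a_n = n / 2                     # a_n = (mu_n/2) * sum nu_k/mu_{k-1} here
scaled = [z / a_n for z in (simulate_z(n, rng) for _ in range(paths)) if z > 0]

mean = sum(scaled) / len(scaled)
tail = sum(1 for x in scaled if x > 1.0) / len(scaled)
# compare with Exp(1): mean 1 and P(X > 1) = e^{-1} ~ 0.37
print(len(scaled), round(mean, 2), round(tail, 2))
```

Both the conditional mean and the tail frequency come out close to their Exp(1) values already at n = 100, despite only a few hundred surviving paths.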
Appendix
Here we consider Lindvall's theorem [12]. His approach rests on the extensive calculations of Church [3]. We give a self-contained proof streamlining their ideas.
Theorem.
For a BPVE (Z_n)_{n≥0} in varying environment v = (f_1, f_2, ...) there exists a random variable Z_∞ with values in N_0 ∪ {∞} such that, as n → ∞,

  Z_n → Z_∞ a.s.

Moreover,

  P(Z_∞ = 0 or ∞) = 1 ⇔ ∑_{n=1}^∞ (1 − f_n[1]) = ∞.

Proof. (i) We prepare the proof by showing that the sequence of probability measures f_{0,n} converges vaguely to a (possibly defective) measure g on N_0. Note that f_{0,n}[0] → q. Thus either f_{0,n} → qδ_0 vaguely (with the Dirac measure δ_0 at point 0), or else (by the Helly-Bray theorem) there exists a sequence of integers 0 = n_0 < n_1 < n_2 < ··· such that, as k → ∞, we have f_{0,n_k} → g vaguely with g ≠ qδ_0.

In the latter case the limiting generating function g(s) is strictly increasing in s, and f_{0,n_k}(s) → g(s) for all 0 ≤ s < 1. Then, for given n ∈ N_0, we define l_n := n_k, m_n := n_{k+1} with n_k ≤ n < n_{k+1}, thus l_n ≤ n < m_n. We would like to show that f_{l_n,n} converges vaguely to δ_1. For this purpose we consider a subsequence n′ such that both f_{l_{n′},n′} and f_{n′,m_{n′}} converge vaguely to measures h_1 and h_2. Going in

  f_{0,m_{n′}} = f_{0,l_{n′}} ∘ f_{l_{n′},n′} ∘ f_{n′,m_{n′}}

to the limit we obtain

  g(s) = g(h_1(h_2(s))), 0 ≤ s < 1.

Since g is strictly increasing, h_1(h_2(s)) = s, which for generating functions implies h_1(s) = h_2(s) = s. Thus indeed, using the common sub-sub-sequence argument, f_{l_n,n} → δ_1 as n → ∞. It follows that, as n → ∞,

  f_{0,n}(s) = f_{0,l_n}(f_{l_n,n}(s)) → g(s), 0 ≤ s < 1,

which means f_{0,n} → g vaguely, as has been claimed.

(ii) We now turn to the proof of the first statement. The case g(s) = 1 for all 0 ≤ s < 1 corresponds to g = δ_0 and q = 1, and then Z_n is a.s. convergent to 0. Thus we are left with the case g(s) < 1 for all 0 ≤ s < 1. Then there is a decreasing sequence (b_n)_{n≥0} of real numbers such that f_{0,n}(1/2) ≤ b_n ≤ 1 and b_n ↓ g(1/2). Define (a_n)_{n≥0} through the equation f_{0,n}(a_n) = b_n. Therefore 1/2 ≤ a_n ≤ 1; also we have f_{0,n+1}(a_{n+1}) ≤ f_{0,n}(a_n) or, equivalently, f_{n+1}(a_{n+1}) ≤ a_n.

Then the process U = (U_n)_{n≥0}, given by

  U_n := a_n^{Z_n} · 1_{Z_n > 0},

is a non-negative supermartingale. Indeed, because of f_{n+1}(0)^{Z_n} ≥ 1_{Z_n = 0} and f_{n+1}(a_{n+1}) ≤ a_n we have

  E[U_{n+1} | Z_0, ..., Z_n] = f_{n+1}(a_{n+1})^{Z_n} − f_{n+1}(0)^{Z_n} ≤ a_n^{Z_n} − 1_{Z_n = 0} = U_n a.s.

Thus U_n is a.s. convergent to a random variable U ≥ 0.

Now either g ≠ qδ_0. Then g(s) is strictly increasing, which implies a_n → 1/2 as n → ∞. Hence the a.s. convergence of U_n enforces the a.s. convergence of Z_n, with possible limit ∞.

Or else g = qδ_0. Then g(1/2) = q, implying that for n → ∞

  E[U_n] = f_{0,n}(a_n) − f_{0,n}(0) = b_n − P(Z_n = 0) → g(1/2) − q = 0,

and consequently U = 0 a.s., implying U_n → 0 a.s. Since a_n ≥ 1/2 for all n, this enforces that Z_n converges a.s. to 0 or ∞. In both cases Z_n → Z_∞ a.s. for some random variable Z_∞.

(iii) For the second statement we use the representation Z_n = ∑_{i=1}^{Z_{n−1}} Y_{i,n}. Define the events A_{z,n} := {∑_{i=1}^z Y_{i,n} ≠ z}. Then for z ≥ 1

  P(A_{z,n}) ≥ 3^{−z}(1 − f_n[1]).

Indeed, if f_n[1] ≥ 1/3, then

  P(A_{z,n}) ≥ P(Y_{1,n} ≠ 1, Y_{2,n} = ··· = Y_{z,n} = 1) ≥ (1 − f_n[1]) f_n[1]^{z−1} ≥ 3^{−z}(1 − f_n[1]),

and if f_n[1] ≤ 1/3, then either P(Y_{i,n} ≥ 2) ≥ 1/3 or P(Y_{i,n} = 0) ≥ 1/3, so that

  P(A_{z,n}) ≥ P(min(Y_{1,n}, ..., Y_{z,n}) ≥ 2) + P(Y_{1,n} = ··· = Y_{z,n} = 0) ≥ 3^{−z} ≥ 3^{−z}(1 − f_n[1]).

Now assume ∑_{n=1}^∞ (1 − f_n[1]) = ∞. Since for fixed z the events A_{z,n} are independent, it follows by the Borel-Cantelli Lemma that these events occur a.s. infinitely often. From the a.s. convergence of Z_n we get for z ≥ 1

  P(Z_∞ = z) = P(Z_n = z for all n large enough) ≤ P(A_{z,n} occurs finitely often) = 0.

This implies that P(1 ≤ Z_∞ < ∞) = 0.

Conversely let ∑_{n=1}^∞ (1 − f_n[1]) < ∞. Then for z ≥ 1 with P(Z_0 = z) > 0

  P(Z_∞ = z) ≥ P(Z_n = z for all n) ≥ P(Z_0 = z) (∏_{n=1}^∞ f_n[1])^z > 0,

and it follows P(1 ≤ Z_∞ < ∞) >
0. Thereby the proof is finished.
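The dichotomy in the theorem can be watched numerically: tracking the exact (truncated) distribution of Z_n shows the mass on {1 ≤ Z_n ≤ K} stabilizing when ∑(1 − f_n[1]) < ∞ and draining towards {0, ∞} when the series diverges. The mean-one offspring law below, P(Y_n = 1) = r_n and P(Y_n = 0) = P(Y_n = 2) = (1 − r_n)/2, as well as the truncation level K, are assumptions for illustration, not from the paper.

```python
def convolve(a, b, K):
    """Truncated convolution of sub-probability vectors on {0, ..., K}."""
    out = [0.0] * (K + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j <= K:
                out[i + j] += ai * bj
    return out

def middle_mass(r, K=40):
    """Return P(1 <= Z_n <= K) for offspring P(Y_n=1)=r_n,
    P(Y_n=0)=P(Y_n=2)=(1-r_n)/2, starting from Z_0 = 1.
    Mass escaping above K is discarded, so this is a lower bound."""
    d = [0.0] * (K + 1)
    d[1] = 1.0                                # Z_0 = 1
    for rn in r:
        y = [(1.0 - rn) / 2.0, rn, (1.0 - rn) / 2.0]  # law of Y_n
        new_d = [0.0] * (K + 1)
        new_d[0] = d[0]                       # extinct stays extinct
        pw = [0.0] * (K + 1)
        pw[0] = 1.0                           # law of an empty sum
        for z in range(1, K + 1):
            pw = convolve(pw, y, K)           # law of Y_1 + ... + Y_z
            for w, m in enumerate(pw):
                new_d[w] += d[z] * m
        d = new_d
    return sum(d[1:])

n = 150
summable = [1 - 1.0 / (k + 1) ** 2 for k in range(1, n + 1)]  # sum(1-f_k[1]) < inf
divergent = [0.8] * n                                         # sum(1-f_k[1]) = inf
hi = middle_mass(summable)
lo = middle_mass(divergent)
print(hi, lo)
```

In the summable case the middle mass stays above the lower bound P(Z_0 = 1)(∏ f_n[1])^1 ≈ 0.5 from the proof, while in the divergent case it is already small at n = 150 and tends to 0.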
References

[1] A. Agresti, On the extinction times of random and varying environment branching processes. J. Appl. Probab. (1975), 39–46.
[2] N. Bhattacharya, M. Perlman, Time-inhomogeneous branching processes conditioned on non-extinction. Preprint (2017). arXiv:1703.00337 [math.PR]
[3] J. D. Church, On infinite composition products of probability generating functions. Z. Wahrscheinlichkeitstheorie verw. Geb. (1971), 243–256.
[4] J. C. D'Souza, The rates of growth of the Galton-Watson process in varying environments. Adv. Appl. Probab. (1994), 698–714.
[5] J. C. D'Souza, J. D. Biggins, The supercritical Galton-Watson process in varying environments. Stoch. Proc. Appl. (1992), 39–47.
[6] K. S. Fahady, M. P. Quine, D. Vere-Jones, Heavy traffic approximations for the Galton-Watson process. Adv. Appl. Probab. (1971), 282–300.
[7] D. H. Fearn, Galton-Watson processes with generation dependence. Proc. 6th Berkeley Symp. Math. Statist. Probab. (1971), 159–172.
[8] J. Geiger, G. Kersting, The survival probability of a critical branching process in random environment. Theor. Probab. Appl. (2001), 517–525.
[9] R. T. Goettge, Limit theorems for the supercritical Galton-Watson process in varying environments. Math. Biosci. (1976), 171–190.
[10] P. Jagers, Galton-Watson processes in varying environments. J. Appl. Probab. (1974), 174–178.
[11] M. Jirina, Extinction of non-homogeneous Galton-Watson processes. J. Appl. Probab. (1976), 132–137.
[12] T. Lindvall, Almost sure convergence of branching processes in varying and random environments. Ann. Probab. (1974), 344–346.
[13] R. Lyons, Random walks, capacity and percolation on trees. Ann. Probab. (1992), 2043–2088.
[14] I. M. MacPhee, H. J. Schuh, A Galton-Watson branching process in varying environments with essentially constant means and two rates of growth. Austral. J. Statist. (1983), 329–338.
[15] B. A. Sevast'yanov, Transient phenomena in branching stochastic processes. Theor. Probab. Appl. 4