Genealogy and spatial distribution of the N-particle branching random walk with polynomial tails

Sarah Penington∗, Matthew I. Roberts∗, Zsófia Talyigás∗

February 25, 2021

Abstract

The $N$-particle branching random walk is a discrete time branching particle system with selection. We have $N$ particles located on the real line at all times. At every time step each particle is replaced by two offspring, and each offspring particle makes a jump of non-negative size from its parent's location, independently from the other jumps, according to a given jump distribution. Then only the $N$ rightmost particles survive; the other particles are removed from the system to keep the population size constant. Inspired by work of J. Bérard and P. Maillard, we examine the long term behaviour of this particle system in the case where the jump distribution has regularly varying tails and the number of particles is large. We prove that at a typical large time the genealogy of the population is given by a star-shaped coalescent, and that almost the whole population is near the leftmost particle on the relevant space scale.

N-BRW model

We investigate a particle system called $N$-particle branching random walk ($N$-BRW). In this discrete time stochastic process, at each time step, we have $N$ particles located on the real line. We say that the particles at the $n$th time step or at time $n$ belong to the $n$th generation. The locations of the particles change at every time step according to the following rules. Every particle has two offspring. The offspring particles have random independent displacements from their parents' locations, according to some prescribed displacement distribution supported on the non-negative real numbers. Then from the $2N$ offspring particles, only the $N$ particles with the rightmost positions survive to form the next generation. That is, at each time step we have a branching step, in which the $2N$ offspring particles move, and a selection step, in which $N$ out of the $2N$ offspring are killed. Ties are decided arbitrarily. We describe the process more formally in Section 2.1.

We will use the notation $[N] := \{1, \dots, N\}$ and $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$ throughout.
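The branching and selection steps described above can be sketched in a few lines of Python. This is only an illustration, not part of the paper: the function name `nbrw_step` and the convention of passing the jumps in as an explicit argument are assumptions made here so that the step is a deterministic function of its inputs.

```python
from typing import List, Sequence, Tuple


def nbrw_step(positions: Sequence[float],
              jumps: Sequence[Tuple[float, float]]) -> List[float]:
    """One generation of the N-BRW: a branching step, then a selection step.

    positions: the N current particle locations (in any order).
    jumps: for each particle, the non-negative displacements of its two offspring.
    Returns the N rightmost offspring locations, in increasing order.
    """
    n = len(positions)
    if len(jumps) != n:
        raise ValueError("need exactly one pair of jumps per particle")
    if any(j < 0 for pair in jumps for j in pair):
        raise ValueError("jumps must be non-negative")
    # Branching step: every particle is replaced by two displaced offspring.
    offspring = [x + j for x, pair in zip(positions, jumps) for j in pair]
    # Selection step: keep the N rightmost of the 2N offspring.
    # Ties are resolved by the sort order, which is one arbitrary choice.
    return sorted(offspring)[n:]


cloud = [0.0, 1.0, 2.0]
jumps = [(5.0, 0.0), (0.5, 0.25), (0.0, 1.0)]
next_cloud = nbrw_step(cloud, jumps)
print(next_cloud)  # [2.0, 3.0, 5.0]
```

In this small example the leftmost particle makes one large jump and becomes the new rightmost particle, while the three leftmost offspring are removed.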
A pair $(i, n)$ with $i \in [N]$ and $n \in \mathbb{N}_0$ will represent the $i$th particle from the left in generation $n$. We also refer to the rightmost particle $(N, n)$ as the leader at time $n$. Furthermore, we will denote the locations of the $N$ particles in the $n$th generation by the ordered set of $N$ real numbers

$$\mathcal{X}(n) = \{\mathcal{X}_1(n) \le \dots \le \mathcal{X}_N(n)\}, \tag{1.1}$$

where $\mathcal{X}_i(n)$ is the location of particle $(i, n)$. We sometimes call $\mathcal{X}(n)$ the particle cloud.

∗ Department of Mathematical Sciences, University of Bath, UK

The long term behaviour of the $N$-BRW heavily depends on the tail of the displacement distribution. Motivated by the work of Bérard and Maillard [2], we investigate the $N$-BRW in the case where the displacement distribution is regularly varying, and $N$ is large.

We say that a function $f$ is regularly varying with index $\alpha \in \mathbb{R}$ if for all $y > 0$,

$$\frac{f(xy)}{f(x)} \to y^{\alpha} \quad \text{as } x \to \infty. \tag{1.2}$$

Let $X$ be a random variable and let the function $h$ be defined by

$$\mathbb{P}(X > x) = \frac{1}{h(x)} \quad \text{for } x \ge 0. \tag{1.3}$$

We assume throughout that $\mathbb{P}(X \ge 0) = 1$, that $h$ is regularly varying with index $\alpha > 0$, and that the displacement distribution of the $N$-BRW is given by (1.3). These are the same assumptions under which the results of [2] were proved. The reader may wish to think of the particular regularly varying function given by $h(x) = x^{\alpha}$ for $x \ge 1$ and $h(x) = 1$ for $x \in [0, 1)$. We do not expect significant change in the behaviour of the $N$-BRW if jumps of negative size are allowed, but we do not prove this; we use the assumption that the jumps are non-negative several times in our argument.

Before explaining our main result, we describe the time and space scales we will be working with. We define

$$\ell_N := \lceil \log_2 N \rceil, \tag{1.4}$$

for $N \ge 2$; this is the time scale we will be using throughout. To avoid trivial cases we always assume that $N \ge 2$. The time scale $\ell_N$ is the time it takes for the descendants of one particle to take over the whole population, if none are killed in selection steps.

For the space scale we choose

$$a_N := h^{-1}(2N\ell_N), \tag{1.5}$$

where $h$ is as in (1.3), and $h^{-1}$ denotes the generalised inverse of $h$ defined by

$$h^{-1}(x) := \inf\{y \ge 0 : h(y) > x\}. \tag{1.6}$$

It is worth thinking of the particular case $h(x) = x^{\alpha}$ for $x \ge 1$, for which we have $a_N = (2N\ell_N)^{1/\alpha}$ and $h(a_N) = 2N\ell_N$.

With the choice of $a_N$ in (1.5), for any positive constant $c$, the expected number of jumps which are larger than $ca_N$ in a time interval of length $\ell_N$ is of constant order, as $N$ goes to infinity. The heuristic picture in [2] says that jumps of order $a_N$ govern the speed, the spatial distribution, and the genealogy of the population for $N$ large. Besides the main result of [2] on the asymptotic speed of the particle cloud, it is conjectured that at a typical time the majority of the population is close to the leftmost particle, and that the genealogy of the population is given by a star-shaped coalescent. In this paper we prove these conjectures.
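To make the two scales concrete, the following Python sketch evaluates them in the special case $h(x) = x^{\alpha}$ for $x \ge 1$, taking $\ell_N = \lceil \log_2 N \rceil$ (the doubling time described above). The function names are hypothetical and the pure power-law form of $h$ is an assumption; for a general regularly varying $h$ the last quantity is only asymptotically of constant order. In this special case the expected number of jumps larger than $ca_N$ among the $2N\ell_N$ jumps made in $\ell_N$ time steps is exactly $c^{-\alpha}$, which does not depend on $N$.

```python
import math


def time_scale(n: int) -> int:
    """ell_N = ceil(log2(N)), computed exactly with integer arithmetic."""
    if n < 2:
        raise ValueError("N must be at least 2")
    return (n - 1).bit_length()


def space_scale(n: int, alpha: float) -> float:
    """a_N = (2 N ell_N)^(1/alpha), valid for h(x) = x^alpha on x >= 1."""
    return (2 * n * time_scale(n)) ** (1.0 / alpha)


def expected_big_jumps(n: int, alpha: float, c: float) -> float:
    """Expected number of jumps exceeding c * a_N among the 2 N ell_N jumps
    made during ell_N time steps, assuming P(X > x) = x^(-alpha) for x >= 1."""
    threshold = c * space_scale(n, alpha)
    if threshold < 1.0:
        raise ValueError("c * a_N must be at least 1 for the power-law formula")
    return 2 * n * time_scale(n) / threshold ** alpha


N, alpha, c = 1024, 1.5, 0.5
print(time_scale(N))                    # 10
print(space_scale(N, alpha))            # a_N for this N and alpha
print(expected_big_jumps(N, alpha, c))  # about 2.83, i.e. c ** -alpha
```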
Stating our main result precisely involves introducing some more notation and defining some rather intricate events. We will do this in Section 2. In this section we instead aim to explain the main message of the theorem. When we say 'with high probability', we mean with probability converging to 1 as $N \to \infty$. For all $\eta > 0$, $M \in \mathbb{N}$ and $t > 4\ell_N$, the $N$-BRW has the following properties with high probability:

• Spatial distribution: At time $t$ there are $N - o(N)$ particles within distance $\eta a_N$ of the leftmost particle, i.e. in the interval $[\mathcal{X}_1(t), \mathcal{X}_1(t) + \eta a_N]$.

• Genealogy: The genealogy of the population on an $\ell_N$ time scale is asymptotically given by a star-shaped coalescent, and the time to coalescence is between $\ell_N$ and $2\ell_N$. That is, there exists a time $T \in [t - 2\ell_N, t - \ell_N]$ such that with high probability, if we choose $M$ particles uniformly at random at time $t$, then every one of these particles descends from the rightmost particle at time $T$. Furthermore, with high probability no two particles in the sample of size $M$ have a common ancestor after time $T + \varepsilon_N \ell_N$, where $\varepsilon_N$ is any sequence satisfying $\varepsilon_N \to 0$ and $\varepsilon_N \ell_N \to \infty$, as $N \to \infty$.

The star-shaped genealogy might seem counter-intuitive because every particle has only two descendants. Indeed, if we take a sample of $M > 2$ particles at time $t$, and look at the lineages of these particles, they certainly cannot coalesce in one time step. Our result says that all coalescences of the lineages of the sample occur within $o(\ell_N)$ time. Therefore, looking on an $\ell_N$ time scale the coalescence appears instantaneous.

We construct our heuristic picture based on the tribe heuristics for the $N$-BRW with regularly varying tails described in [2]. The tribe heuristics say that at a typical large time there are $N - o(N)$ particles close to the leftmost particle if we look on the $a_N$ space scale. We call this set of particles the big tribe. Furthermore, there are small tribes of size $o(N)$ to the right of the big tribe. The number of such small tribes is $O(1)$. While the position of the big tribe moves very little on the $a_N$ space scale, the number of particles in the small tribes doubles at each time step. As a result, the big tribe eventually dies out, and one of the small tribes grows to become the new big tribe and takes over the population.

To escape the big tribe and create a new tribe that takes over the population, a particle must make a big jump of order $a_N$. As we explained in Section 1.2, jumps of this size occur on an $\ell_N$ time scale, and $\ell_N$ is the time needed for a new tribe to grow to a big tribe of size $N$.

Take $t > 4\ell_N$. Building on the tribe heuristics, we describe the following picture. Assume that a particle becomes the leader with a big jump of order $a_N$. We claim that this particle will have of order $N$ surviving descendants $\ell_N$ time after the big jump. Moreover, the particle that makes the last such jump before time $t_1 := t - \ell_N$ will be the common ancestor of the majority of the population at time $t$. We denote the generation of this ancestor particle by $T$, and assume that $T \in [t_2, t_1]$. In Figure 1 we illustrate how a new tribe is formed at time $T$, and how it grows to a big tribe by time $t$.
We will prove the main result described in Section 1.3 by showing that the picture in Figure 1 develops with high probability.

We introduce the notation

$$t_i := t - i\ell_N, \tag{1.7}$$

for $t, i \in \mathbb{N}_0$. The message of Figure 1, which we will prove later, is that the following occurs with high probability.

Figure 1: A particle that makes a big jump of order $a_N$ at time $T$ is the common ancestor of almost the whole population at time $t$. The vertical axis represents time, and the particles' locations are depicted horizontally, increasing from left to right. The black dots represent particles. Horizontal dotted lines in an ellipse or circle show where the majority of the population (the big tribe) is. The arrows represent jumps from the big tribe. We use circles to zoom in on the population. The particles circled in red are killed in the selection step. The events labelled A to D are described in the main text.

A: At time $T \in [t_2, t_1]$, particle $(N, T)$ has taken a big jump of order $a_N$ and escaped the big tribe. It now leads by a large distance, and its descendants will be the leaders at least until time $t_1$. There are two main reasons for this. First, we define $T$ as the last time before time $t_1$ when a big jump of order $a_N$ creates a new leader, so particles with big jumps in the time interval $[T, t_1]$ cannot become leaders. Second, particles with smaller jumps not descending from particle $(N, T)$ are unlikely to catch up with the leading tribe, because paths with small jumps move very little on the $a_N$ space scale. This is an important property of random walks with regularly varying tails, which we will state and prove in Lemma 4.3 and apply in Corollary 4.5.

B: After time $t_1$, there might be particles which do not descend from particle $(N, T)$, but which, by making a big jump of order $a_N$, move beyond the tribe of particle $(N, T)$. However, these particles have substantially less than $\ell_N$ time to produce descendants by time $t$, and so each of them can only have $o(N)$ descendants at time $t$. Particles which do not descend from $(N, T)$ are unlikely to move beyond the tribe of particle $(N, T)$ without making a big jump.

There will only be $O(1)$ big jumps of order $a_N$ between times $t_2$ and $t$, because jumps of order $a_N$ happen with frequency of order $1/\ell_N$. Therefore, until time $t$, the total number of particles to the right of the tribe of particle $(N, T)$ is at most $o(N)$.

C: The tribe of particle $(N, T)$ doubles in size at each step up to (almost) time $T + \ell_N$. Selection does not affect these particles significantly, because the number of particles to the right of this tribe is at most $o(N)$ before time $T + \ell_N$, as we explained in part B.

D: At time $T + \ell_N$ there are $N$ particles to the right of the position of particle $(N, T)$. This is an elementary property of the $N$-BRW, following from the non-negativity of the jump sizes. The $N$ particles are mainly in the tribe of particle $(N, T)$, and there may be $o(N)$ particles ahead of the tribe. From this point on, the $N$ leftmost offspring particles in the tribe of particle $(N, T)$ do not survive.

Then, between times $T + \ell_N$ and $t$, the number of particles in the tribe of particle $(N, T)$ will remain $N - o(N)$, where the $o(N)$ part doubles at each time step but does not reach order $N$ by time $t$. Therefore, almost every particle at time $t$ descends from particle $(N, T)$.

Furthermore, as the number of descendants of particle $(N, T)$ only reaches order $N$ at (roughly) time $T + \ell_N$, the descendants of particle $(N, T)$ are unlikely to make big jumps of order $a_N$ before time $T + \ell_N$. We will prove this property (and many others) in Lemma 4.6. Only $O(1)$ descendants of particle $(N, T)$ make big jumps of order $a_N$ between times $T$ and $t$, and these big jumps are likely to happen after time $T + \ell_N$, and so significantly after time $t_1$. Therefore, most time-$t$ descendants of particle $(N, T)$ will not have an ancestor which made a big jump between times $T$ and $t$, thus they will not move far from their ancestor's position $\mathcal{X}_N(T)$ on the $a_N$ space scale.

In order to prove our statements in Section 1.3 we also need to show that there is at least one particle which becomes the new leader with a jump of order $a_N$ during the time interval $[t_2, t_1]$. The existence of such a particle will imply that indeed there exists $T \in [t_2, t_1]$ as in Figure 1. We give a heuristic argument for this in Section 2.3, where we also explain the idea for proving that if we take a sample of $M$ particles at time $t$ then the coalescence of the ancestral lineages of these particles happens within a time window of width $o(\ell_N)$.

In order to show that our main result is more or less optimal, we will prove two additional results.

Spatial distribution: Our main theorem says that most particles in the population are likely to be within distance $\eta a_N$ of the leftmost at time $t$, for arbitrarily small $\eta > 0$ when $N$ and $t$ are large. We will show that this is not true of all particles: the distance between the leftmost and rightmost particles is typically of order $a_N$, and is arbitrarily large on the $a_N$ space scale with positive probability. Therefore our result that most particles are close to the leftmost particle on the $a_N$ space scale gives meaningful information on the shape of the particle cloud at a typical time. We state this formally in Proposition 2.3 and then prove it in Section 6.

Genealogy: Our main theorem says that the generation $T$ of the most recent common ancestor of a sample from the population at time $t$ is between times $t_2$ and $t_1$ with high probability. We will prove that this is the strongest possible result in the sense that for any subinterval of $[t_2, t_1]$ with length of order $\ell_N$ there is a positive probability that $T$ is in that subinterval. This will be the main message of Proposition 2.2, which we prove in Section 6.

We also mention here that the precise statement of our main result, Theorem 2.1, implies that the distribution of the rescaled time to coalescence, $(t - T)/\ell_N$, has no atom at 1 or 2 in the limit $N \to \infty$.

The $N$-BRW shows dramatically different behaviours with different jump distributions; this includes the speed at which the particle cloud moves to the right, the spatial distribution within the population, and the genealogy. Below we discuss existing results and conjectures on these properties of the $N$-BRW. We start by summarising the results of Bérard and Maillard, who studied the speed of the particle cloud when the displacement distribution is heavy-tailed.

Heavy-tailed displacement distribution

Bérard and Maillard [2] introduced the stairs process, the record process of a shifted space-time Poisson point process. They proved that it describes the scaling limit of the pair of trajectories of the leftmost and rightmost particles' positions $(\mathcal{X}_1(n), \mathcal{X}_N(n))_{n \in \mathbb{N}_0}$ when the jump distribution has polynomial tails. The correct scaling is to speed up time by $\log N$ and to shrink the space scale by $a_N$. Using the relation between the $N$-BRW and the stairs process they prove their main result: the speed of the particle cloud grows as $a_N / \log N$ in $N$, and the propagation is linear or superlinear (but at most polynomial) in time. The propagation is linear if the jump distribution has finite expectation, and superlinear otherwise; the asymptotics follow from the behaviour of the stairs process. This behaviour is different from that of the classical branching random walk without selection, where the propagation is exponentially fast in time in a heavy-tailed setting [13].

The tribe heuristics in [2] predict, but do not prove, that the majority of the population is located close to the leftmost particle, that the genealogy should be star-shaped, and that the relevant time scale for coalescence of ancestral lineages is $\ell_N$. We will prove the above properties in Theorem 2.1, and therefore the present paper and [2] together provide a comprehensive picture of the $N$-BRW with regularly varying tails, including the behaviour of the speed, spatial distribution and genealogy.

Light-tailed displacement distribution

Particle systems with selection have been studied with light-tailed displacement distribution in thephysics literature as a microscopic stochastic model for front propagation. First Brunet and Derrida[9, 10], and later Brunet, Derrida, Mueller and Munier [8, 7] made predictions on the behaviour ofparticle systems with branching and selection.

Speed: For the $N$-BRW, Bérard and Gouéré [1] proved the existence of the asymptotic speed of the particle cloud as time goes to infinity, which in fact applies for any jump distribution with finite expectation. They also proved that the asymptotic speed converges to a finite limiting speed as the number of particles $N$ goes to infinity, with a surprisingly slow rate $(\log N)^{-2}$, which was predicted by Brunet and Derrida [9, 10]. The limiting speed is the same as the speed of the rightmost particle in a classical branching random walk without selection with exponentially decaying tails [16, 17, 5].

Spatial distribution: The spatial distribution in the light-tailed case is also predicted in [9, 10]. The authors argue that the fraction of particles to the right of a given position at a given time should evolve according to an analogue of the FKPP equation. The FKPP equation is a reaction-diffusion equation admitting travelling wave solutions. Rigorous results on the relation between particle systems with selection and free boundary problems with travelling wave solutions have been proved in [14] and [4, 11].

Genealogy: On the genealogy of the $N$-BRW with light-tailed displacement distribution, the papers [8, 7] arrived at the following conjecture (see also [18]). If we pick two particles at random in a generation, then the number of generations we need to go back to find a common ancestor of the two particles is of order $(\log N)^3$. Furthermore, if we take a uniform sample of $k$ particles in a generation and trace back their ancestral lines, the coalescence of their lineages is described by the Bolthausen-Sznitman coalescent, if time is scaled by $(\log N)^3$. This property has been shown for a continuous time model, a branching Brownian motion (BBM) with absorption [3], where particles are killed when hitting a deterministic moving boundary. For the $N$-BRW and its continuous time analogue, the $N$-BBM, no rigorous proof has yet been given.

Displacement distribution with stretched exponential tail

As we have seen, the behaviour of the $N$-BRW is significantly different in the light-tailed and heavy-tailed cases. It is then a natural question to ask what happens in an intermediate regime, where the jump distribution has stretched exponential tails. Random walks and branching random walks with stretched exponential tails have been investigated in the literature [12, 15], but questions about the $N$-BRW with such a jump distribution, such as asymptotic speed, spatial distribution, and genealogy, remain open. In the future we intend to investigate the $N$-BRW in the stretched exponential case.

In Section 2 we state Theorem 2.1 and Propositions 2.2 and 2.3, our main results, which we have explained in Sections 1.3 and 1.5. Furthermore, we give a heuristic argument for the proof of Theorem 2.1, introduce the notation we will be using throughout, and carry out the first step towards proving Theorem 2.1 in Lemma 2.5. As a result, the proof of Theorem 2.1 will be reduced to proving Propositions 2.6 and 2.7.
We prove the former in Sections 3 and 4 and the latter in Section 5.

In Section 3 we give a deterministic argument for the existence of a common ancestor between times $t_2$ and $t_1$ of almost the whole population at time $t$. The argument will also imply that almost every particle in the population at time $t$ is near the leftmost particle. Then in Section 4 we check that the events of the deterministic argument occur with high probability. A key step in the proof is to see that paths cannot move a distance of order $a_N$ in $\ell_N$ time without making at least one jump of order $a_N$. We prove a large deviation result to show this, taking ideas from [13] and [15]. The other important tool, which we will use to estimate probabilities, is Potter's bound for regularly varying functions.

In Section 5 we prove that the genealogy is star-shaped. We will use concentration results from [19] to see that a single particle at time $T + \varepsilon_N \ell_N$ cannot have more than of order $N^{1 - \varepsilon_N}$ surviving descendants at time $t$, which will be enough to conclude the result.

In Section 6 we prove Propositions 2.2 and 2.3 using some of our ideas from the deterministic argument in Section 3.

Section 7 is a glossary of notation, where we collect the notation most frequently used in this paper with a brief explanation, and with a reference to the section or equation where the notation is defined. In Section 7 we also list the most important intermediate steps of the proof of our main result.

N-BRW

Let $X_{i,b,n}$, $i \in [N]$, $b \in \{1, 2\}$, $n \in \mathbb{N}_0$ be i.i.d. random variables with common law given by (1.3). Each $X_{i,b,n}$ stands for the jump size of the $b$th offspring of particle $(i, n)$. Let $\mathcal{X}(0) = \{\mathcal{X}_1(0) \le \dots \le \mathcal{X}_N(0)\}$ be any ordered set of $N$ real numbers, which represents the initial locations of the $N$ particles.
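For readers who want to experiment, jump sizes with law (1.3) can be drawn by inverse transform sampling in the particular case $h(x) = x^{\alpha}$ for $x \ge 1$ and $h(x) = 1$ on $[0, 1)$: if $U$ is uniform on $(0, 1]$, then $U^{-1/\alpha}$ satisfies $\mathbb{P}(U^{-1/\alpha} > x) = x^{-\alpha}$ for $x \ge 1$. The sketch below is a hedged illustration under that assumption; the function names are hypothetical and a general regularly varying $h$ would need its own sampler.

```python
import random
from typing import List, Tuple


def sample_jump(alpha: float, rng: random.Random) -> float:
    """Draw X with P(X > x) = x^(-alpha) for x >= 1, so that X >= 1 always."""
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids dividing by zero
    return u ** (-1.0 / alpha)


def sample_generation_jumps(n: int, alpha: float,
                            rng: random.Random) -> List[Tuple[float, float]]:
    """Jumps (X_{i,1,n}, X_{i,2,n}) for i = 1, ..., N in a single generation."""
    return [(sample_jump(alpha, rng), sample_jump(alpha, rng)) for _ in range(n)]


rng = random.Random(0)
generation = sample_generation_jumps(1000, 2.0, rng)
flat = [x for pair in generation for x in pair]
# The empirical tail at x = 2 should be close to 2 ** -2.0 = 0.25.
print(sum(x > 2.0 for x in flat) / len(flat))
```

Seeding the generator makes a run reproducible, which is convenient when comparing simulated particle clouds across parameter choices.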
Now we describe inductively how $\mathcal{X}(0)$ and the random variables $X_{i,b,n}$, $i \in [N]$, $b \in \{1, 2\}$, $n \in \mathbb{N}_0$ determine the $N$-BRW, that is, the sequence of locations of the $N$ particles, $(\mathcal{X}(n))_{n \in \mathbb{N}_0}$.

We start with the initial configuration of particles $\mathcal{X}(0)$. Once $\mathcal{X}(n)$ has been determined for some $n \in \mathbb{N}_0$, then $\mathcal{X}(n + 1)$ is defined as follows. Each particle has two offspring, each of which performs a jump from the location of its parent. The $2N$ independent jumps at time $n$ are then given by the i.i.d. random variables $X_{i,b,n}$, $i \in [N]$, $b \in \{1, 2\}$ as above. After the jumps, only the $N$ rightmost offspring particles survive; that is, $\mathcal{X}(n + 1) = \{\mathcal{X}_1(n + 1) \le \dots \le \mathcal{X}_N(n + 1)\}$ is given by the $N$ largest numbers from the collection $(\mathcal{X}_i(n) + X_{i,b,n})_{i \in [N], b \in \{1, 2\}}$. Ties are decided arbitrarily.

Note that since the jumps are non-negative, the sequences $\mathcal{X}_i(n)$ are non-decreasing in $n$ for all $i \in [N]$. Indeed, at time $n$ there are at least $N - i + 1$ particles to the right of or at position $\mathcal{X}_i(n)$, and so there are at least $\min(N, 2(N - i + 1))$ particles to the right of or at $\mathcal{X}_i(n)$ at time $n + 1$, so we must have $\mathcal{X}_i(n + 1) \ge \mathcal{X}_i(n)$. We refer to this property as monotonicity throughout.

We explained the message of our main result in Section 1.3. In this section we provide the precise statement in Theorem 2.1. First we introduce the setup for the theorem.

For $n, k \in \mathbb{N}_0$ and $i \in [N]$ we will denote the index of the time-$n$ ancestor of the particle $(i, n + k)$ by $\zeta_{i,n+k}(n)$, i.e. particle $(\zeta_{i,n+k}(n), n)$ is the ancestor of $(i, n + k)$. Recall that the relevant space scale for our process is $a_N$, defined in (1.5). For $r \ge 0$ and $n \in \mathbb{N}_0$, let $L_{r,N}(n)$ denote the number of particles which are within distance $ra_N$ of the leftmost particle at time $n$:

$$L_{r,N}(n) := \max\{i \in [N] : \mathcal{X}_i(n) \le \mathcal{X}_1(n) + ra_N\}. \tag{2.1}$$

Define a sequence $(\varepsilon_N)_{N \in \mathbb{N}}$ such that $\varepsilon_N \ell_N$ is an integer for all $N$, and which satisfies

$$\varepsilon_N \ell_N \to \infty \quad \text{and} \quad \varepsilon_N \to 0 \quad \text{as } N \to \infty. \tag{2.2}$$

We introduce two events which describe the spatial distribution and the genealogy of the population at a given time $t$. Our main result, Theorem 2.1, says that these two events occur with high probability. We define the events for all $N \ge 2$ and $t > 4\ell_N$. For $\eta > 0$ and $\gamma \in (0, 1)$, the first event says that at least $N - N^{1 - \gamma}$ particles (i.e. almost the whole population if $N$ is large) are within distance $\eta a_N$ of the leftmost particle at time $t$. We let

$$A_1 = A_1(t, N, \eta, \gamma) := \left\{ L_{\eta,N}(t) \ge N - N^{1 - \gamma} \right\}. \tag{2.3}$$

Recall the notation $t_i$ from (1.7). We illustrate the second event in Figure 2. We sample $M \in \mathbb{N}$ particles uniformly at random from the population at time $t$. Let $P = (P_1, \dots, P_M)$ be the index set of the sampled particles. The event says that there exists a time $T$ between $t_2$ and $t_1$ such that all of the particles in the sample have a common ancestor at time $T$, but no pair of particles in the sample have a common ancestor at time $T + \varepsilon_N \ell_N$. Moreover, the common ancestor at time $T$ is the leader particle $(N, T)$. Additionally, the event says that the time $T$ is not particularly close to $t_2$ or $t_1$, in that $T \in [t_2 + \lceil \delta \ell_N \rceil, t_1 - \lceil \delta \ell_N \rceil]$ for some $\delta > 0$. We let

$$A_2 = A_2(t, N, M, \delta) := \left\{ \begin{aligned} &\exists\, T \in [t_2 + \lceil \delta \ell_N \rceil, t_1 - \lceil \delta \ell_N \rceil] : \zeta_{P_i,t}(T) = N \ \ \forall i \in [M] \ \text{ and} \\ &\zeta_{P_i,t}(T + \varepsilon_N \ell_N) \ne \zeta_{P_j,t}(T + \varepsilon_N \ell_N) \ \ \forall i, j \in [M],\ i \ne j \end{aligned} \right\}. \tag{2.4}$$

For convenience, we will often write $A_1$ and $A_2$ for the two events above, omitting the arguments. We will prove the following result.

Theorem 2.1.

For all $\eta > 0$ and $M \in \mathbb{N}$ there exist $\gamma, \delta \in (0, 1)$ such that for all $N \in \mathbb{N}$ sufficiently large and $t \in \mathbb{N}_0$ with $t > 4\ell_N$,

$$\mathbb{P}(A_1 \cap A_2) > 1 - \eta,$$

where $\ell_N$ is given by (1.4), and $A_1 = A_1(t, N, \eta, \gamma)$ and $A_2 = A_2(t, N, M, \delta)$ are defined in (2.3) and (2.4) respectively.

Figure 2: Coalescence of the ancestral lineages of $M = 6$ particles. We go backwards in time from top to bottom in the figure. To each particle in the sample we associate a vertical line, representing its ancestral line. Two lines coalesce into one when the particles they are associated with have a common ancestor for the first time going backwards from time $t$. All coalescences of the lineages of the sample happen within a time window of size $o(\ell_N)$. Time $T$ is the generation of the most recent common ancestor of the majority of the whole population at time $t$. The three dots in each line indicate that the picture is not proportional: the time between $t$ and $T$ is of order $\ell_N$, with $t - T \in [\ell_N, 2\ell_N]$, whereas the time between all coalescences and $T$ is $o(\ell_N)$.

We explained two additional results in Section 1.5 which show the optimality of Theorem 2.1. We state these results precisely below.

We define the event $A_2'$ as a modification of the event $A_2$. Whereas $A_2$ said that the coalescence time $T$ is roughly in $[t_2, t_1]$, the event $A_2'$ says that $T$ is in the smaller interval $[t_2 + \lceil s_1 \ell_N \rceil, t_2 + \lceil s_2 \ell_N \rceil]$ for $0 < s_1 < s_2 < 1$; and whereas $A_2$ occurs with high probability, we will show that $A_2'$ occurs with probability bounded away from 0. For $M \in \mathbb{N}$ and $0 < s_1 < s_2 < 1$, we define

$$A_2' = A_2'(t, N, M, s_1, s_2) := \left\{ \begin{aligned} &\exists\, T \in [t_2 + \lceil s_1 \ell_N \rceil, t_2 + \lceil s_2 \ell_N \rceil] : \zeta_{P_i,t}(T) = N \ \ \forall i \in [M] \ \text{ and} \\ &\zeta_{P_i,t}(T + \varepsilon_N \ell_N) \ne \zeta_{P_j,t}(T + \varepsilon_N \ell_N) \ \ \forall i, j \in [M],\ i \ne j \end{aligned} \right\}. \tag{2.5}$$

Proposition 2.2 below says that for all $0 < s_1 < s_2 < 1$ and $r > 0$, with probability bounded below by a constant depending on $r$ and $s_2 - s_1$, the event $A_2'$ occurs and the diameter at time $t$ is at least $ra_N$. The diameter of the particle cloud at time $n$ will be denoted by $d(\mathcal{X}(n))$; that is,

$$d(\mathcal{X}(n)) := \mathcal{X}_N(n) - \mathcal{X}_1(n). \tag{2.6}$$

Proposition 2.2. For all $0 < s_1 < s_2 < 1$, $M \in \mathbb{N}$ and $r > 0$, there exists $\pi_{r, s_2 - s_1} > 0$ such that for $N$ sufficiently large and $t > 4\ell_N$,

$$\mathbb{P}\big(A_2' \cap \{d(\mathcal{X}(t)) \ge ra_N\}\big) > \pi_{r, s_2 - s_1},$$

where $A_2'(t, N, M, s_1, s_2)$ is defined in (2.5).

Our second result about the diameter says that for all $r$, the probability that $d(\mathcal{X}(n)) \ge ra_N$ is bounded away from zero, and it tends to 1 as $r \to 0$, and tends to 0 as $r \to \infty$, if $N$ is sufficiently large and $n > \ell_N$. This shows that the probability that after a long time the diameter is not of order $a_N$ is small, and therefore the part of Theorem 2.1 that says most of the population is within distance $\eta a_N$ of the leftmost particle with high probability, for arbitrarily small $\eta > 0$, is meaningful.

Proposition 2.3.

There exist $0 < p_r \le q_r \le 1$ such that $q_r \to 0$ as $r \to \infty$ and $p_r \to 1$ as $r \to 0$, and for all $r > 0$,

$$0 < p_r \le \mathbb{P}(d(\mathcal{X}(n)) \ge ra_N) \le q_r,$$

for $N$ sufficiently large and $n > \ell_N$.

We first prove a simple lemma which will be helpful in the course of the proof of Theorem 2.1 and also helpful for understanding the heuristics. The lemma says that the number of particles that are to the right of a given position at least doubles at every time step until it reaches $N$. The statement follows deterministically from the definition of the $N$-BRW. The proof serves as a warm-up for several more deterministic arguments to come. For $x \in \mathbb{R}$ and $n \in \mathbb{N}_0$, we write the set of particles to the right of position $x$ at time $n$ as

$$G_x(n) := \{i \in [N] : \mathcal{X}_i(n) \ge x\}. \tag{2.7}$$

Lemma 2.4. Let $x \in \mathbb{R}$ and $n, k \in \mathbb{N}_0$. Then

$$|G_x(n + k)| \ge \min\left(N, 2^k |G_x(n)|\right).$$

Proof. The statement is clearly true when $G_x(n) = \emptyset$. Now assume that $G_x(n) \ne \emptyset$. Let us first consider the case in which every descendant of the particles in $G_x(n)$ survives until time $n + k$. Since there are $2^k |G_x(n)|$ such descendants, each of which is to the right of $x$ since all jumps are non-negative, in this case we have $|G_x(n + k)| \ge 2^k |G_x(n)|$.

Now let us consider the case in which not every descendant of the particles in $G_x(n)$ survives until time $n + k$. This means that there exist $m \in [n, n + k - 1]$, $j \in [N]$ and $b \in \{1, 2\}$ such that $(j, m)$ is a descendant of a particle in $G_x(n)$ and $\mathcal{X}_j(m) + X_{j,b,m} \le \mathcal{X}_1(m + 1)$. Since particle $(j, m)$ descends from $G_x(n)$, and all jumps are non-negative, we also have $x \le \mathcal{X}_j(m) + X_{j,b,m}$, and therefore $x \le \mathcal{X}_1(m + 1) \le \mathcal{X}_1(n + k)$, and the result follows.

Now we turn to the heuristics for the proof of Theorem 2.1. The heuristic picture to keep in mind when thinking about both the statement and the proof is Figure 1. As in Section 1.4, we let $T$ denote the last time at which a particle takes the lead with a big jump of order $a_N$ before time $t_1$. In Section 1.4, we argued that if $T \in [t_2, t_1]$ then with high probability, particle $(N, T)$ will be the common ancestor of almost every particle in the population at time $t$, and almost the whole population at time $t$ is close to $\mathcal{X}_N(T)$ on the $a_N$ space scale. We will use a rigorous version of this heuristic argument to show that the event $A_1$ occurs with high probability, and that the time $T$ satisfies the first line in the event $A_2$ with high probability. That is, every particle from a uniform sample of fixed size $M$ at time $t$ descends from particle $(N, T)$ with high probability.

If $T$ is as described above, then we can only have $T \in [t_2, t_1]$ if there is a particle which takes the lead with a jump of order $a_N$ in the time interval $[t_2, t_1]$. It is not straightforward to show that this happens with high probability.
It could be the case that the diameter is large on the $a_N$ space scale during the time interval $[t_2, t_1]$, say greater than $Ca_N$, where $C > 0$ is large. In this situation, if the jumps of order $a_N$ in the time interval $[t_2, t_1]$ come from close to the leftmost particle, and they are all smaller than $Ca_N$, then these jumps will not make a new leader, and time $T$ will not be in the time interval $[t_2, t_1]$. We will prove that this is unlikely. A key property which is helpful in seeing this is the following. If no particle takes the lead with a big jump of order $a_N$ for $\ell_N$ time, e.g. between times $s \in \mathbb{N}_0$ and $s + \ell_N$, then the diameter of the particle cloud will be very small on the $a_N$ space scale at time $s + \ell_N$. Indeed, all the $N$ particles, including the leftmost, are to the right of position $\mathcal{X}_N(s)$ at time $s + \ell_N$ by Lemma 2.4. But with high probability, particles cannot move far to the right from this position without making big jumps of order $a_N$. We will prove this in Corollary 4.5. Therefore, provided that no unlikely event happens, if no particle takes the lead with a big jump between times $s$ and $s + \ell_N$, then every particle will be near the position $\mathcal{X}_N(s)$ at time $s + \ell_N$. We formally prove this in Lemma 3.9.

We will be able to use this property for $s = t_2 - c'\ell_N$ with small $c' > 0$. We will conclude that if no particle takes the lead with a jump of order $a_N$ in the time interval $[t_2 - c'\ell_N, t_1 - c'\ell_N]$ then the diameter at time $t_1 - c'\ell_N$ is likely to be small on the $a_N$ space scale, i.e. $d(\mathcal{X}(t_1 - c'\ell_N)) < ca_N$, for some $c > 0$ which we can choose to be much smaller than $c'$. If the diameter is less than $ca_N$, then any particle performing a jump larger than $ca_N$ becomes the new leader.

The expected number of jumps larger than $ca_N$ in $c'\ell_N$ time is $c'\ell_N \cdot 2N \cdot h(ca_N)^{-1}$, because there are $2N$ jumps at each time step and the jump distribution is given by (1.3); this is roughly $c'/c^{\alpha}$ for $N$ sufficiently large. If $c^{\alpha}$ is much smaller than $c'$, then with high probability there will be a jump of size greater than $ca_N$ in the time interval $[t_1 - c'\ell_N, t_1]$, and the particle performing it will become the new leader. Therefore the last time before time $t_1$ when a particle becomes the leader with a jump of order $a_N$ will be after time $t_2$, which gives us $T \in [t_2, t_1]$.

The above idea works for the case where no particle takes the lead with a big jump of order $a_N$ in the time interval $[t_2 - c'\ell_N, t_2]$ for some small $c' > 0$. If instead there is such a particle then we will argue that in a short interval of length $c'\ell_N$ it is likely that the jump made by this particle will not be too large on the $a_N$ scale and therefore the particle's descendants will be surpassed by larger jumps of order $a_N$ at some point in the much longer time interval $[t_2, t_1]$.

In order to show that the coalescence is star-shaped, we also need the second line of the event $A_2$, which says that all coalescences of the lineages of a sample of $M$ particles at time $t$ happen within a time window of size $\varepsilon_N \ell_N$; that is, instantaneously on the $\ell_N$ time scale (see Figure 2). To prove that no pair of particles in the sample of $M$ have a common ancestor at time $T + \varepsilon_N \ell_N$, it will be enough to prove that every particle at time $T + \varepsilon_N \ell_N$ has a number of time-$t$ descendants which is at most a very small proportion of the total population size $N$ (we will check this in Lemma 2.5).
With high probability, most of the population at time $t$ descends from the leading $2^{\varepsilon_N \ell_N} \approx N^{\varepsilon_N}$ particles at time $T + \varepsilon_N \ell_N$ (the descendants of particle $(N, T)$). If these particles share their time-$t$ descendants fairly evenly, then a particle in this leading tribe will have roughly $N^{1-\varepsilon_N} = o(N)$ descendants. Indeed, we will prove using concentration results from [19] that with high probability the number of time-$t$ descendants of a particle from the leading tribe at time $T + \varepsilon_N \ell_N$ will not exceed the order of $N^{1-\varepsilon_N}$.

We now introduce the notation we will be using throughout the proof of Theorem 2.1. We define the filtration $(\mathcal{F}_n)_{n \in \mathbb{N}_0}$ by letting $\mathcal{F}_n$ be the $\sigma$-algebra generated by the random variables $(X_{i,b,m},\ i \in [N],\ b \in \{1,2\},\ m < n)$ from Section 2.1. Since $X(n)$ is defined in such a way that it only depends on jumps performed before time $n$, the process $(X(n))_{n \in \mathbb{N}_0}$ is adapted to the filtration $(\mathcal{F}_n)_{n \in \mathbb{N}_0}$. Since $(X_{i,b,m},\ i \in [N],\ b \in \{1,2\},\ m \in \mathbb{N}_0)$ are i.i.d., the jumps $(X_{i,b,n},\ i \in [N],\ b \in \{1,2\})$ are independent of the $\sigma$-algebra $\mathcal{F}_n$. In Theorem 2.1 we assume that $t > 4\ell_N$, as in the proof we will examine the process in the time interval $[t_4, t]$, where $t_4$ is given by (1.7). Since jumps at time $t$ are not $\mathcal{F}_t$-measurable, we will be interested in jumps performed in the time interval $[t_4, t-1]$.

The jump of the $i$th particle's $b$th offspring at time $n$ will be referred to using the random variable $X_{i,b,n}$, or the triple $(i, b, n)$. In order to study the genealogy of the $N$-BRW particle system, we will need a notation which says that two particles are related. Let us introduce the partial order $\lesssim$ on the set of pairs $\{(i, n),\ i \in [N],\ n \in \mathbb{N}_0\}$. First, for $i \in [N]$ and $n \in \mathbb{N}_0$ we say that $(i, n) \lesssim (i, n)$ and, for $j \in [N]$, we write $(i, n) \lesssim (j, n+1)$ if and only if the $j$th particle at time $n+1$ is an offspring of the $i$th particle at time $n$.
Then in general, for $n, k \in \mathbb{N}_0$ and $i_0, i_k \in [N]$ we write $(i_0, n) \lesssim (i_k, n+k)$ if and only if particle $(i_k, n+k)$ is a descendant of particle $(i_0, n)$:
$$(i_0, n) \lesssim (i_k, n+k) \iff \exists\, i_1, \dots, i_{k-1} : (i_{j-1}, n+j-1) \lesssim (i_j, n+j) \ \ \forall j \in [k]. \tag{2.8}$$
Then the particles $((i_j, n+j),\ j \in [k])$ represent the ancestral line between $(i_0, n)$ and $(i_k, n+k)$. Recall that for $n, k \in \mathbb{N}_0$ and $i \in [N]$ we denote the index of the time-$n$ ancestor of the particle $(i, n+k)$ by $\zeta_{i,n+k}(n)$. Thus, using our partial order above, we can write for $j \in [N]$,
$$\zeta_{i,n+k}(n) = j \iff (j, n) \lesssim (i, n+k). \tag{2.9}$$
We also introduce a slightly different (strict) partial order $\lesssim_b$, which will be convenient later on. For $i_0, i_k \in [N]$, $n \in \mathbb{N}_0$ and $k \in \mathbb{N}$ we write $(i_0, n) \lesssim_b (i_k, n+k)$ if and only if the $b$th offspring of particle $(i_0, n)$ is the time-$(n+1)$ ancestor of particle $(i_k, n+k)$. Note that if $(i_0, n) \lesssim_b (i_k, n+k)$ then there exists $i_1 \in [N]$ such that $X_{i_1}(n+1) = X_{i_0}(n) + X_{i_0,b,n}$ and $(i_1, n+1) \lesssim (i_k, n+k)$.

Using the above partial order, we define the path between particles $(i_0, n)$ and $(i_k, n+k)$ (and between positions $X_{i_0}(n)$ and $X_{i_k}(n+k)$) as the sequence of jumps connecting the two particles. For $i_0, i_k \in [N]$ and $n \in \mathbb{N}_0$, if $k \in \mathbb{N}$ and $(i_0, n) \lesssim (i_k, n+k)$, we let
$$P^{i_k,n+k}_{i_0,n} := \big\{ (i_j, b_j, n+j) : j \in \{0, \dots, k-1\} \text{ and } (i_j, n+j) \lesssim_{b_j} (i_k, n+k) \big\}, \tag{2.10}$$
and we let $P^{i_k,n+k}_{i_0,n} := \emptyset$ otherwise. Then if $k \in \mathbb{N}$ and $(i_0, n) \lesssim (i_k, n+k)$,
$$X_{i_k}(n+k) = X_{i_0}(n) + \sum_{(j,b,m) \in P^{i_k,n+k}_{i_0,n}} X_{j,b,m}.$$
(2.11)For i ∈ [ N ] and n, k ∈ N with n ≤ k , let N i,n ( k ) denote the set of descendants of particle ( i, n ) at time k : N i,n ( k ) := { j ∈ [ N ] : ( i, n ) (cid:46) ( j, k ) } , (2.12)and if n < k , for b ∈ { , } , let N bi,n ( k ) be the set of time- k descendants of the b th oﬀspring ofparticle ( i, n ) : N bi,n ( k ) := { j ∈ [ N ] : ( i, n ) (cid:46) b ( j, k ) } . (2.13)(Note that the sets N i,n ( k ) and N bi,n ( k ) may be empty.) We write |N i,n ( k ) | and |N bi,n ( k ) | for thenumber of descendants in each case.Finally, as time is discrete, it will be useful to introduce a notation for the set of integers in aninterval; for ≤ s ≤ s , we let (cid:74) s , s (cid:75) := [ s , s ] ∩ N . .5 Big jumps and breaking the record As discussed in Section 1.4, the common ancestor of the majority of the population at time t isa particle which made an unusually big jump, of order a N , between times t and t . The set ofunusually big jumps will play an essential role in the proof of Theorem 2.1. We will be particularlyinterested in particles which become ‘leaders’ after performing such jumps. These particles are thecandidates to become the common ancestor of almost the whole population at time t .We now introduce the necessary notation for the above concepts. In the deﬁnitions we willindicate the dependence on a new parameter ρ ∈ (0 , , as the choice of ρ will be important lateron. Furthermore, everything we deﬁne will depend on N and t , which we do not always indicate.For ρ ∈ (0 , we introduce the term big jump for jumps of size greater than ρa N , and we denotethe set of big jumps on an interval [ s , s ] ⊆ [ t , t − by B [ s ,s ] N : B [ s ,s ] N = B [ s ,s ] N ( ρ ) := { ( k, b, s ) ∈ [ N ] × { , } × (cid:74) s , s (cid:75) : X k,b,s > ρa N } , (2.14)where a N is given by (1.5). We also let B N := B [ t ,t − N . (2.15)We say a particle breaks the record if it takes the lead with a big jump. 
If one of the currentleader’s descendants makes a small jump (that is, a non-big jump) to become the leader, then thatdoes not count as breaking the record in our terminology. Let S N denote the set of times when therecord is broken by a big jump between times t and t : S N = S N ( ρ ) := (cid:26) s ∈ (cid:74) t , t − (cid:75) : ∃ ( k, b ) ∈ [ N ] × { , } such that ( k, s ) (cid:46) b ( N, s + 1) and X k,b,s > ρa N (cid:27) . (2.16)Next, we deﬁne T as the last time when the leader broke the record with a big jump beforetime t , if there is any such time. We let T = T ( ρ ) := 1 + max { S N ( ρ ) ∩ [ t , t − } , (2.17)and let T = 0 if S N ( ρ ) ∩ [ t , t −

1] = ∅ . Note that the big jump which takes the lead happensat time T − , and T is the time right after the jump. In the proof it turns out that with highprobability, T ∈ [ t + (cid:100) δ(cid:96) N (cid:101) , t − (cid:100) δ(cid:96) N (cid:101) ] for some δ > , and particle ( N, T ) is the common ancestorof almost the whole population at time t .We will have a separate notation, ˆ S N , for the times when the leader is surpassed by a particlewhich performs a big jump. Note that this is not exactly the same set of times as S N : it mighthappen that a particle ( i, s ) has an oﬀspring ( j, s + 1) , which beats the current leader ( N, s ) witha big jump, but it does not become the next leader at time s + 1 because it is beaten by anotheroﬀspring particle which did not make a big jump. We deﬁne ˆ S N = ˆ S N ( ρ ) := (cid:26) s ∈ (cid:74) t , t − (cid:75) : ∃ ( k, b ) ∈ [ N ] × { , } such that X k,b,s > ρa N and X k ( s ) + X k,b,s > X N ( s ) (cid:27) . (2.18)We will see in Corollary 3.8 below that with high probability, S N and ˆ S N coincide on certain timeintervals. Sometimes we will also need to refer to the set of times when big jumps do not take thelead or beat the current leader. Therefore, with a slight abuse of notation, we will write S cN and ˆ S cN to denote the sets of times (cid:74) t , t − (cid:75) \ S N and (cid:74) t , t − (cid:75) \ ˆ S N respectively.13 .6 Reformulation In this section, we break down the event A of Theorem 2.1. Our ultimate goal is to show, for asuitable choice of ρ , that T = T ( ρ ) , as deﬁned in (2.17), has the properties required in A . To thisend we introduce new events which imply A with high probability, and only involve T and thenumber of time- t descendants of particle ( N, T ) and of the particles at time T + ε N (cid:96) N . We will usethe following notation: T ε N = T ε N ( ρ ) := T ( ρ ) + ε N (cid:96) N , (2.19)where ε N is deﬁned in (2.2). 
Recalling (2.12), for $i \in [N]$, we write
$$\mathcal{N}_i := \mathcal{N}_{i,T_{\varepsilon_N}}(t) \tag{2.20}$$
for the set of time-$t$ descendants of the $i$th particle at time $T_{\varepsilon_N}$, and
$$D_i = D_{i,T_{\varepsilon_N}}(t) := |\mathcal{N}_{i,T_{\varepsilon_N}}(t)| \tag{2.21}$$
for the size of this set.

For $\gamma, \delta, \rho \in (0,1)$, we introduce the event
$$A_3 = A_3(t, N, \delta, \rho, \gamma) := \big\{ T(\rho) \in [t_2 + \lceil \delta \ell_N \rceil,\ t_1 - \lceil \delta \ell_N \rceil] \big\} \cap \big\{ |\mathcal{N}_{N,T(\rho)}(t)| \geq N - N^{1-\gamma} \big\}. \tag{2.22}$$
This event says that almost the whole population at time $t$ descends from particle $(N, T)$, which will imply with high probability that each particle in the uniform sample of $M$ particles in the event $A_2$ is a descendant of $(N, T)$. The final part of the definition of the event $A_2$ says that no two particles at time $t$ in the uniform sample of $M$ particles share an ancestor at time $T_{\varepsilon_N}$. We now define an event which says that every time-$T_{\varepsilon_N}$ particle has at most a very small proportion of the $N$ surviving descendants at time $t$, so that with high probability none of them have two descendants in the sample of $M$ particles. For $\nu > 0$ and $\rho \in (0,1)$, we let
$$A_4(\nu) = A_4(t, N, \rho, \nu) := \Big\{ \max_{i \in \mathcal{N}_{N,T}(T_{\varepsilon_N})} D_{i,T_{\varepsilon_N}}(t) \leq \nu N \Big\}. \tag{2.23}$$
Note that in the definition of $A_4(\nu)$ we take the maximum only over the time-$T_{\varepsilon_N}$ descendants of particle $(N, T)$. It will be easy to deal with the remaining particles at time $T_{\varepsilon_N}$, because the event $A_3$ implies that for $\nu > 0$, if $N$ is large, particles not descended from $(N, T)$ cannot have more than $\nu N$ descendants at time $t$. In the following result, we reduce the proof of Theorem 2.1 to showing that $A_1$, $A_3$ and $A_4(\nu)$ occur with high probability.

As part of the proof we show that the probability that two particles in the sample of $M$ at time $t$ have a common ancestor at time $T_{\varepsilon_N}$ can be upper bounded by little more than the sum of the probabilities of the events $A_3^c$ and $A_4(\nu)^c$ when $\nu$ is small.
We will use this intermediate result in another argument later on in Section 6, so we state it as part of Lemma 2.5 below.

Lemma 2.5. Take $M \in \mathbb{N}$ and $\gamma, \delta, \rho, \eta \in (0,1)$, and let $0 < \nu < \eta/M^2$. Then for all $N$ sufficiently large and $t > 4\ell_N$,
$$\mathbb{P}\big( \exists j, l \in [M],\ j \neq l : \zeta_{\mathcal{P}_j,t}(T_{\varepsilon_N}) = \zeta_{\mathcal{P}_l,t}(T_{\varepsilon_N}) \big) \leq \mathbb{P}(A_3^c) + \mathbb{P}(A_4(\nu)^c) + \eta/2,$$
and
$$\mathbb{P}(A_2^c) \leq 2\,\mathbb{P}(A_3^c) + \mathbb{P}(A_4(\nu)^c) + \eta,$$
where $A_2(t, N, M, \delta)$, $A_3(t, N, \delta, \rho, \gamma)$ and $A_4(t, N, \rho, \nu)$ are defined in (2.4), (2.22) and (2.23) respectively, $\mathcal{P}_j$ is the index of a particle in the uniform sample of $M$ particles at time $t$, and $\zeta_{\mathcal{P}_j,t}(T_{\varepsilon_N})$ is the index of the time-$T_{\varepsilon_N}$ ancestor of particle $(\mathcal{P}_j, t)$, defined in (2.9).

Proof. Fix $M \in \mathbb{N}$ and $\gamma, \delta, \rho, \eta \in (0,1)$. Note that by the definition of $A_2$ in (2.4),
$$\big\{ T \in [t_2 + \lceil \delta \ell_N \rceil,\ t_1 - \lceil \delta \ell_N \rceil] \big\} \cap \big\{ \zeta_{\mathcal{P}_j,t}(T) = N\ \ \forall j \in [M] \big\} \cap \big\{ \zeta_{\mathcal{P}_j,t}(T_{\varepsilon_N}) \neq \zeta_{\mathcal{P}_l,t}(T_{\varepsilon_N})\ \ \forall j, l \in [M],\ j \neq l \big\} \subseteq A_2. \tag{2.24}$$
First we aim to show that for $N$ sufficiently large,
$$\mathbb{P}\big( \{ T \notin [t_2 + \lceil \delta \ell_N \rceil,\ t_1 - \lceil \delta \ell_N \rceil] \} \cup \{ \exists j \in [M] : \zeta_{\mathcal{P}_j,t}(T) \neq N \} \big) \leq \mathbb{P}(A_3^c) + \eta/2. \tag{2.25}$$
Note that if $A_3$ occurs then $T \in [t_2 + \lceil \delta \ell_N \rceil,\ t_1 - \lceil \delta \ell_N \rceil]$, and $A_3$ is $\mathcal{F}_t$-measurable, so
$$\mathbb{P}\big( \{ T \notin [t_2 + \lceil \delta \ell_N \rceil,\ t_1 - \lceil \delta \ell_N \rceil] \} \cup \{ \exists j \in [M] : \zeta_{\mathcal{P}_j,t}(T) \neq N \} \big) \leq \mathbb{E}\big[ \mathbb{1}_{A_3}\, \mathbb{P}( \exists j \in [M] : \zeta_{\mathcal{P}_j,t}(T) \neq N \mid \mathcal{F}_t ) \big] + \mathbb{P}(A_3^c).$$
Now, on the event $A_3$, at most $N^{1-\gamma}$ time-$t$ particles are not descended from $(N, T)$, and therefore a union bound on the uniformly chosen sample (which is not $\mathcal{F}_t$-measurable) gives that the above is at most $M N^{1-\gamma}/N + \mathbb{P}(A_3^c)$. This implies (2.25) for $N$ sufficiently large.

Now fix $\nu \in (0, \eta/M^2)$. Our second step is to prove that for $N$ sufficiently large,
$$\mathbb{P}\big( \exists j, l \in [M],\ j \neq l : \zeta_{\mathcal{P}_j,t}(T_{\varepsilon_N}) = \zeta_{\mathcal{P}_l,t}(T_{\varepsilon_N}) \big) \leq \mathbb{P}(A_3^c) + \mathbb{P}(A_4(\nu)^c) + \eta/2, \tag{2.26}$$
which is the first part of the statement of the lemma. The event on the left-hand side means that there is a particle at time $T_{\varepsilon_N}$ which has at least two descendants in the sample of $M$ particles at time $t$. That is,
$$\mathbb{P}\big( \exists j, l \in [M],\ j \neq l : \zeta_{\mathcal{P}_j,t}(T_{\varepsilon_N}) = \zeta_{\mathcal{P}_l,t}(T_{\varepsilon_N}) \big) = \mathbb{P}\big( \exists i \in [N],\ j, l \in [M],\ j \neq l : \{\mathcal{P}_j, \mathcal{P}_l\} \subseteq \mathcal{N}_i \big). \tag{2.27}$$
We will use that if all the sets $\mathcal{N}_i$ have size at most $\nu N$ then it is unlikely that two particles of the uniformly chosen sample will fall in the same set $\mathcal{N}_i$. Since $D_i$ is $\mathcal{F}_t$-measurable for all $i$, a union bound gives
$$\mathbb{P}\big( \exists i \in [N],\ j, l \in [M],\ j \neq l : \{\mathcal{P}_j, \mathcal{P}_l\} \subseteq \mathcal{N}_i \big) \leq \mathbb{E}\Big[ \mathbb{1}_{\{\max_{i \in [N]} D_i \leq \nu N\}} \sum_{i=1}^{N} \sum_{1 \leq j < l \leq M} \mathbb{P}\big( \{\mathcal{P}_j, \mathcal{P}_l\} \subseteq \mathcal{N}_i \mid \mathcal{F}_t \big) \Big] + \mathbb{P}\Big( \max_{i \in [N]} D_i > \nu N \Big). \tag{2.28}$$
Since the sample is chosen uniformly at random, the first term on the right-hand side is equal to
$$\mathbb{E}\Big[ \mathbb{1}_{\{\max_{i \in [N]} D_i \leq \nu N\}} \sum_{i=1}^{N} \binom{M}{2} \frac{\binom{D_i}{2}}{\binom{N}{2}} \Big] \leq \mathbb{E}\Big[ \mathbb{1}_{\{\max_{i \in [N]} D_i \leq \nu N\}} \max_{j \in [N]} D_j \binom{M}{2} \frac{\sum_{i=1}^{N} D_i}{N(N-1)} \Big] \leq \binom{M}{2} \frac{\nu N}{N-1}, \tag{2.29}$$
where in the second inequality we exploit the indicator and use that $\sum_{i=1}^{N} D_i = N$. In order to deal with the second term on the right-hand side of (2.28), note that the maximum is taken over all particles at time $T_{\varepsilon_N}$ (because of the definition of $D_i$ in (2.21)). Suppose $N$ is sufficiently large that $N^{1-\gamma} \leq \nu N$. Then if the event $A_3$ occurs, particles not descended from particle $(N, T)$ (i.e. particles not in $\mathcal{N}_{N,T}(T_{\varepsilon_N})$) have at most $\nu N$ descendants at time $t$. Therefore, by the definition of $A_4(\nu)$,
$$\mathbb{P}\Big( \max_{i \in [N]} D_i > \nu N \Big) \leq \mathbb{P}(A_4(\nu)^c) + \mathbb{P}\Big( \max_{i \in [N] \setminus \mathcal{N}_{N,T}(T_{\varepsilon_N})} D_i > \nu N \Big) \leq \mathbb{P}(A_4(\nu)^c) + \mathbb{P}(A_3^c), \tag{2.30}$$
for $N$ sufficiently large.

Putting (2.27)-(2.30) together, since we chose $\nu < \eta/M^2$ we have that (2.26) holds for $N$ sufficiently large. By (2.24), (2.25) and (2.26), the result follows.

We now state the two main intermediate results in the proof of Theorem 2.1, which say that, for well-chosen $\gamma$, $\delta$, and $\rho$, the events $A_1$, $A_3$ and $A_4(\nu)$ occur with high probability. In Sections 3 and 4 we give the proof of Proposition 2.6, and in Section 5 we prove Proposition 2.7.

Proposition 2.6. For $\eta \in (0,1)$ there exist $0 < \gamma < \delta < \rho < \eta$ such that for $N$ sufficiently large and $t > 4\ell_N$,
$$\mathbb{P}(A_1 \cap A_3) > 1 - \eta,$$
where $A_1(t, N, \eta, \gamma)$ and $A_3(t, N, \delta, \rho, \gamma)$ are defined in (2.3) and (2.22) respectively.

Proposition 2.7. Let $\eta \in (0,1)$ and $\nu > 0$. Then for $\rho \in (0, \eta)$ as in Proposition 2.6, for $N$ sufficiently large and $t > 4\ell_N$,
$$\mathbb{P}(A_4(\nu)) > 1 - \eta,$$
where $A_4(t, N, \rho, \nu)$ is defined in (2.23).

Proof of Theorem 2.1. Lemma 2.5, Proposition 2.6 and Proposition 2.7 immediately imply Theorem 2.1.
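The counting step (2.29) in the proof of Lemma 2.5 can be sanity-checked numerically. The sketch below is illustrative only: the function names and the example family sizes are hypothetical, and it assumes the sample of M particles is drawn uniformly without replacement, so that a fixed pair of sampled particles falls in a family of size D_i with probability C(D_i, 2) / C(N, 2).

```python
from math import comb

def pair_collision_bound(family_sizes, M):
    """Union bound on the probability that some pair among M uniformly
    sampled particles (without replacement) lies in the same family.
    family_sizes plays the role of (D_1, ..., D_N), zeros omitted."""
    N = sum(family_sizes)
    return comb(M, 2) * sum(comb(d, 2) for d in family_sizes) / comb(N, 2)

def crude_bound(N, M, nu):
    """The simpler bound C(M, 2) * nu * N / (N - 1), valid when every
    family has at most nu * N members."""
    return comb(M, 2) * nu * N / (N - 1)

# Hypothetical example: N = 1000 particles split into 50 families of size 20,
# so the largest family has exactly nu * N members with nu = 0.02.
sizes = [20] * 50
N, M, nu = sum(sizes), 5, 0.02
exact = pair_collision_bound(sizes, M)
crude = crude_bound(N, M, nu)
print(exact, crude)
```

In this toy configuration the first quantity is the sum appearing inside the expectation in (2.29), and the second is the final bound; the first never exceeds the second as long as every family size is at most νN.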

Our strategy for the proof of Proposition 2.6 is based on the picture in Figure 1. For t > (cid:96) N , wewill show that the following happens between times t and t with probability close to 1.1. There will be particles which lead by a large distance at times in [ t , t ] . The last such particlewill be at time T ∈ [ t + (cid:100) δ(cid:96) N (cid:101) , t − (cid:100) δ(cid:96) N (cid:101) ] with position X N ( T ) .2. The descendants of this particle are close together and far away from the the rest of thepopulation at time t , forming a small (size o ( N ) ) leader tribe.3. At time t , the descendants of the small leader tribe from time t form a big tribe of N − o ( N ) particles, which descend from particle ( N, T ) and are close to the leftmost particle.The ﬁrst part of the proof is a deterministic argument given in Section 3, which shows that if ‘allgoes well’ between times t and t , then steps 1-2-3 above roughly describe what happens, which willimply that the events A and A in Proposition 2.6 occur. For the deterministic argument we willintroduce a number of events, which will describe suﬃcient criteria for A and A to happen. Oncewe have shown that the intersection of these events is contained in A ∩ A , it is enough to provethat the probability of this intersection is close to 1. This part will be carried out in Section 4, andconsists of checking that ‘all goes well’ with high probability.We describe our strategy for showing Proposition 2.7 in detail in Section 5.1. The main idea is togive a lower bound on the position of the leftmost particle at time t with high probability, and thenuse concentration inequalities from [19] to bound the number of time- t descendants of each particlein N N,T ( T ε N ) which can reach that lower bound by time t . A key intermediate step will be to seethat with high probability, particles can reach the lower bound only if they have an ancestor whichmade a jump larger than a certain size. 
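To experiment with the objects used in this section (sorted positions, parent indices, and time-0 ancestors in the spirit of (2.9)), a minimal simulation sketch follows. It is not the authors' code. It assumes a pure Pareto jump law, P(jump > y) = y^(-alpha) for y >= 1, as a stand-in for the regularly varying tail; the names `nbrw_step` and `ancestors_at_start` and all parameter values are hypothetical.

```python
import random

def nbrw_step(positions, alpha, rng):
    """One branching-selection step of an N-BRW.

    positions: list of N particle positions, sorted in increasing order.
    Returns (new_positions, parents), both of length N and sorted by
    position; parents[j] is the index (in `positions`) of the parent of
    the new particle j.
    """
    n = len(positions)
    offspring = []
    for i, x in enumerate(positions):
        for _ in range(2):  # every particle has two offspring
            # Assumed jump law: Pareto with shape alpha, values >= 1.
            jump = rng.paretovariate(alpha)
            offspring.append((x + jump, i))
    offspring.sort()            # ties are broken arbitrarily
    survivors = offspring[n:]   # keep the N rightmost of the 2N offspring
    return [x for x, _ in survivors], [p for _, p in survivors]

def ancestors_at_start(parent_history):
    """Index of the time-0 ancestor of each particle at the final time.

    parent_history[m][j] is the time-m parent index of particle j at
    time m + 1, as returned by nbrw_step.
    """
    n = len(parent_history[-1])
    anc = list(range(n))
    for parents in reversed(parent_history):
        anc = [parents[a] for a in anc]
    return anc

# Example run with hypothetical parameters.
rng = random.Random(0)
N, alpha, steps = 200, 1.5, 40
positions = [0.0] * N
history = []
for _ in range(steps):
    positions, parents = nbrw_step(positions, alpha, rng)
    history.append(parents)

anc = ancestors_at_start(history)
# Size of the largest family of time-`steps` particles sharing a time-0 ancestor.
largest_family = max(anc.count(a) for a in set(anc))
print(largest_family, positions[-1] - positions[0])
```

The two printed numbers are the size of the largest family at the final time and the diameter of the particle cloud, which are the quantities the heuristic picture is about. A single small run like this is only a way to get a feel for the definitions and says nothing about the large-N statements proved in the paper.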
3 Deterministic argument for the proof of Proposition 2.6

In this section we provide the main component of the proof of Proposition 2.6. We follow the planexplained in the previous section; we deﬁne new events and show that they imply A and A . InSection 4, we will prove that the new events occur with high probability. The events describe astrategy designed to make sure that the majority of the population at time t has a common ancestorat some time between t and t ; that is, to ensure that A occurs. The strategy will also show thatmost of the particles descended from particle ( N, T ) cannot move too far from position X N ( T ) bytime t . Thus it will be easy to see that these descendants are near the leftmost particle at time t ,and so A must occur. So although the strategy is designed for the event A , it will imply A too.In the course of the proof we will use several constants. We ﬁrst give a guideline, which showshow the constants should be thought of throughout the rest of the paper, then we describe thespeciﬁc assumptions we need for the rest of this section. Recall that we ﬁxed α > as in (1.3) andthat we have η ∈ (0 , from the statement of Proposition 2.6. The other constants can be thoughtof as < γ < δ (cid:28) ρ (cid:28) c (cid:28) c (cid:28) c (cid:28) c (cid:28) c (cid:28) c (cid:28) η < and K (cid:29) ρ − α . (3.1)As everything is constant in (3.1), we only use (cid:28) as an informal notation to say that the left-handside is much smaller than the right-hand side.More speciﬁcally, for the rest of this section we ﬁx the constants γ, δ, ρ, c , c , . . . , c , η and K ,and assume that they satisfy < γ < δ < ρ, (3.2) ρ < c , (3.3) c j < c j +1 < η < , j = 1 , . . . , , (3.4) K > ρ − α . (3.5)We will have additional conditions on these constants in Section 4, which will be consistent with theassumptions (3.2)-(3.5).Every event we introduce below will depend on N , t (with t > (cid:96) N ) and on some of the constantsabove. In the deﬁnitions we will not indicate this dependence explicitly. 
Note furthermore that inthe statement of Proposition 2.6, taking N suﬃciently large may depend on γ , δ , or ρ . A We begin by breaking down the event A from Proposition 2.6 into two other events. Then we willdeﬁne a strategy for showing that these two events occur. The ﬁrst event describes the particlesystem at time t ; it says that there is a small leader tribe of size less than N − δ , and everyother particle is at least c a N to the left of this tribe. Moreover, each particle in the leading tribedescends from the same particle, ( N, T ) . The common ancestor ( N, T ) is the last particle whichbreaks the record with a big jump before time t (see (2.17) and also Figure 1). We also require T ∈ [ t + (cid:100) δ(cid:96) N (cid:101) , t − (cid:100) δ(cid:96) N (cid:101) ] , which is part of the event A .To keep track of the size of the leader tribe we introduce notation for the number of particleswhich are within distance εa N of the leader at time n : R ε,N ( n ) := max { i ∈ [ N ] : X N − i +1 ( n ) ≥ X N ( n ) − εa N } , for n ∈ N and ε > . (3.6)Note that if R ε,N ( t ) < N then particle ( N − R ε,N ( t ) + 1 , t ) is within distance εa N of the leader,but particle ( N − R ε,N ( t ) , t ) is not. In the event we introduce below, we set ε = c and require17he distance between these two particles to be at least c a N , showing that there is a gap betweenthe leader tribe and the other particles. The event is deﬁned as follows: B :=  R c ,N ( t ) ≤ min (cid:8) N − , N − δ (cid:9) , X N − R c ,N ( t ) ( t ) ≤ X N − R c ,N ( t )+1 ( t ) − c a N , T ∈ [ t + (cid:100) δ(cid:96) N (cid:101) , t − (cid:100) δ(cid:96) N (cid:101) ] and N N,T ( t ) = { N − R c ,N ( t ) + 1 , . . . , N }  , (3.7)where T = T ( ρ ) and N N,T ( t ) are given by (2.17) and (2.12) respectively.In the description of Figure 1 in Section 1.4, we explained that the descendants of particle ( N, T ) are likely to lead at time t . 
The event B requires more; it also says that the leading tribe leads bya large distance, which is important to ensure that no other tribes can interfere with our heuristicpicture and will be useful in Section 3.2. The most involved part of the deterministic argument in theremainder of Section 3 is to break up the event B into other events which happen with probabilityclose to 1.We now deﬁne another event which says that particles which are not in the leading tribe at time t have at most N − γ (i.e. much less than N for N large) descendants in total at time t . This willimply that the leading tribe at time t will dominate the population at time t . We let B :=  N − R c ,N ( t ) (cid:88) j =1 |N j,t ( t ) | ≤ N − γ  , (3.8)where N j,t ( t ) is given by (2.12). The events which we will introduce to break down the event B will easily imply B as well. Before deﬁning the new events we check that B and B indeed imply A . Lemma 3.1.

Let A , B and B be the events given by (2.22) , (3.7) and (3.8) respectively. Thenfor all N ≥ and t > (cid:96) N , B ∩ B ⊆ A . Proof.

On the event B , the descendants of particle ( N, T ) are the R c ,N ( t ) rightmost particles attime t . Thus N N,T ( t ) is a disjoint union of the sets N j,t ( t ) for j ∈ (cid:74) N − R c ,N ( t ) + 1 , N (cid:75) . Wededuce that on the event B ∩ B , |N N,T ( t ) | = N (cid:88) j = N − R c ,N ( t )+1 |N j,t ( t ) | ≥ N − N − γ . Since T ∈ [ t + (cid:100) δ(cid:96) N (cid:101) , t − (cid:100) δ(cid:96) N (cid:101) ] on the event B , the result follows. B and B We now break down the events B and B into new events C to C whose probabilities will be easierto estimate. The majority of the work in this section consists of showing that the intersection of thenew events implies B . We can then quickly conclude that the intersection implies both B and A .One of the new events will need to be further broken down in Section 3.3. C to C Recall that (cid:74) s , s (cid:75) denotes the set of integers in the interval [ s , s ] and that the constants γ , δ , ρ , c , c , . . . , c , η and K satisfy (3.2)-(3.5). We ﬁrst introduce τ to denote the ﬁrst time after t whena gap of size c a N appears between the leader and the second rightmost particle: τ := inf { s ≥ t + 1 : X N ( s ) > X N − ( s ) + 2 c a N } . (3.9)18he ﬁrst new event we deﬁne says that such a gap appears by time t , that is C := { τ ∈ (cid:74) t + 1 , t (cid:75) } . (3.10)The next event C ensures that the current leading tribe keeps distance from the other tribesduring the time interval [ τ , t ] . This is important, since B requires a gap behind the leading tribeat time t . The event C says that if a particle is far away (at least c a N ) from the leader, thenit cannot jump to within distance c a N of the leader’s position with a single big jump (recallfrom (3.1) that c (cid:28) c ). That is, a particle far from the leader either stays at least c a N behindthe leader, or it beats the leader by more than c a N . 
Jumping close to the leader would require alarge jump, of size greater than c a N , restricted to an interval of size c a N , which is much smallerthan the size of the jump. We will see in Section 4 that the probability that such a jump occursbetween times t and t is small. Let Z i ( s ) denote the gap between the rightmost and the i th particleat time s : Z i ( s ) := X N ( s ) − X i ( s ) , for s ∈ N and i ∈ [ N ] . (3.11)Now we can deﬁne our next event C := (cid:26) (cid:64) ( i, b, s ) ∈ [ N ] × { , } × (cid:74) t , t − (cid:75) such that Z i ( s ) ≥ c a N and X i,b,s ∈ ( Z i ( s ) − c a N , Z i ( s ) + 2 c a N ] (cid:27) . (3.12)We need to introduce several more events to make sure that ‘all goes well’; that is, particleswhich we do not expect to make big jumps indeed do not make big jumps, and smaller jumps donot make too much diﬀerence on the a N space scale. The next event says that if a particle makes abig jump, then it will not have a descendant which makes another big jump within (cid:96) N time: C := (cid:40) B N ∩ P k ,s k ,s = { ( k , b , s ) }∀ ( k , b , s ) ∈ B N , ∀ s ∈ (cid:74) s + 1 , min { s + (cid:96) N + 1 , t } (cid:75) , ∀ k ∈ N b k ,s ( s ) (cid:41) , (3.13)where B N , P k ,s k ,s and N b k ,s ( s ) are deﬁned in (2.15), (2.10) and (2.13) respectively.The next event says the following. Take any path between two particles in the time interval [ t , t ] . If we omit the big jumps from the path then it does not move more than distance c a N . Inparticular, if there are no big jumps at all then the path moves at most c a N . The event is given by C := (cid:40) (cid:80) ( i,b,s ) ∈ P k ,s k ,s X i,b,s { X i,b,s ≤ ρa N } ≤ c a N ∀ ( k , s ) ∈ [ N ] × (cid:74) t , t − (cid:75) , ∀ s ∈ (cid:74) s + 1 , t (cid:75) , ∀ k ∈ N k ,s ( s ) (cid:41) , (3.14)where P k ,s k ,s and N k ,s ( s ) are deﬁned in (2.10) and (2.12) respectively.The last three events are simple. 
On C , two big jumps cannot happen at the same time: C := {| B N ∩ { ( k, b, s ) : ( k, b ) ∈ [ N ] × { , }} | ≤ ∀ s ∈ (cid:74) t , t − (cid:75) } . (3.15)Then C excludes big jumps which happen either right after time t or very close to time t : C := (cid:110) B [ t ,t + (cid:100) δ(cid:96) N (cid:101) ] N ∪ B [ t −(cid:100) δ(cid:96) N (cid:101) ,t + (cid:100) δ(cid:96) N (cid:101) ] N = ∅ (cid:111) (3.16)where B [ s ,s ] N is deﬁned in (2.14). Finally, C gives a bound on the number of big jumps: C := {| B N | ≤ K } , (3.17)where we recall that we chose K to be a positive constant at the start of Section 3.19ow we can state the main result of this subsection. It says that on the events C to C the events B , B and A occur, and therefore A occurs as well. We have an additional event in Proposition 3.2below, which says that the diameter of the particle cloud at time t is larger than c a N . As partof the proposition we also show that C to C imply this event, because it will be useful in anotherargument later on in Section 6. Proposition 3.2.

Let η ∈ (0 , , and assume that the constants γ, δ, ρ, c , c , . . . , c , K satisfy (3.2) - (3.5) . Then for N suﬃciently large that KN − δ < N − γ < and t > (cid:96) N , (cid:92) j =1 C j ⊆ B ∩ B ∩ A ∩ (cid:8) d ( X ( t )) ≥ c a N (cid:9) ⊆ A ∩ A ∩ (cid:8) d ( X ( t )) ≥ c a N (cid:9) , where B , B , A and A are deﬁned in (3.7) , (3.8) , (2.3) and (2.22) respectively, and C , C , . . . , C are given by (3.10) and (3.12) – (3.17) . Note that the second inclusion in Proposition 3.2 follows directly from Lemma 3.1. C to C imply B , B and A : proof of Proposition 3.2 We start by proving some easy lemmas which hold on the event (cid:84) j =1 C j , and which will be appliedin the course of the proof of Proposition 3.2.The ﬁrst lemma gives another way of writing the event C , which will be more convenient to usein this section. (The deﬁnition of C will be easier to work with when we show, in Section 4, that C occurs with high probability.) The lemma says that on the event C , if a path moves more than c a N then it must contain a big jump. Lemma 3.3.

On the event C , for all ( k , s ) ∈ [ N ] × (cid:74) t , t − (cid:75) , s ∈ (cid:74) s + 1 , t (cid:75) and k ∈ N k ,s ( s ) , X k ( s ) > X k ( s ) + c a N = ⇒ B N ∩ P k ,s k ,s (cid:54) = ∅ , where B N , N k ,s ( s ) and P k ,s k ,s are deﬁned in (2.15) , (2.12) and (2.10) respectively.Proof. Let ( k , s ) ∈ [ N ] × (cid:74) t , t − (cid:75) , s ∈ (cid:74) s +1 , t (cid:75) , and k ∈ N k ,s ( s ) . Assume that B N ∩ P k ,s k ,s = ∅ , and the event C occurs. Then by (2.11), X k ( s ) = X k ( s ) + (cid:88) ( i,b,s ) ∈ P k ,s k ,s X i,b,s = X k ( s ) + (cid:88) ( i,b,s ) ∈ P k ,s k ,s X i,b,s { X i,b,s ≤ ρa N } ≤ X k ( s ) + c a N by the deﬁnition of the event C , which completes the proof.The next lemma says that if a path of length at most (cid:96) N starts with a big jump then it movesdistance at most c a N after the big jump. Lemma 3.4.

On the event C ∩ C , for all ( k , b , s ) ∈ B N , s ∈ (cid:74) s + 1 , min { s + (cid:96) N , t } (cid:75) and k ∈ N b k ,s ( s ) , X k ( s ) ≤ X k ( s ) + X k ,b ,s + c a N , where B N and N b k ,s ( s ) are deﬁned in (2.15) and (2.13) respectively. roof. Let l ∈ [ N ] be such that ( k , s ) (cid:46) b ( l, s + 1) , so that X l ( s + 1) = X k ( s ) + X k ,b ,s . (3.18)If s = s + 1 then we are done; from now on assume s ≥ s + 2 . Since X k ,b ,s is a big jump, onthe event C there are no further big jumps on the path between particles ( l, s + 1) and ( k , s ) ,that is B N ∩ P k ,s l,s +1 = ∅ . Therefore, by Lemma 3.3 we have X k ( s ) ≤ X l ( s + 1) + c a N , which,together with (3.18), completes the proof.In the next lemma, we describe how we can exploit the fact that on the event C there are nevertwo big jumps at the same time. First, the event C tells us that if a particle makes a big jump,then the other particles move very little at the time of the jump. Second, it also implies that if aparticle signiﬁcantly beats the current leader with a big jump, then it becomes the new leader, andthe gap behind this new leader will be roughly the distance by which it beat the previous leader.Both statements follow immediately from the setup, but will be useful for example in the proofs ofCorollaries 3.7 and 3.8 below, and later on in the proofs of Propositions 3.11 and 2.2 as well. Lemma 3.5.

On the event C, for all (k, b, s) ∈ B_N,
(a) X_j(s + 1) ≤ X_N(s) + ρ a_N for all j ∈ [N] \ N^b_{k,s}(s + 1), and
(b) if X_k(s) + X_{k,b,s} > X_N(s) + c a_N for some c > ρ, then (k, s) ≲_b (N, s + 1) and X_N(s + 1) − X_{N−1}(s + 1) > (c − ρ) a_N.

Proof. Assume that C occurs and fix k, b, s as in the statement. Let j ∈ [N] \ N^b_{k,s}(s + 1) be arbitrary. Assume that i ∈ [N] and b_i ∈ {1, 2} are such that (i, s) ≲_{b_i} (j, s + 1), and so X_j(s + 1) = X_i(s) + X_{i,b_i,s}, with (i, b_i) ∈ ([N] × {1, 2}) \ {(k, b)}. By the definition of the event C, X_{k,b,s} is the only big jump at time s. Thus we have X_{i,b_i,s} ≤ ρ a_N, and by bounding the i-th particle's position at time s by the rightmost position at time s we get

    X_j(s + 1) = X_i(s) + X_{i,b_i,s} ≤ X_N(s) + ρ a_N,

which completes the proof of part (a). Furthermore, if the condition in (b) holds, then we also have

    X_j(s + 1) ≤ X_N(s) + ρ a_N < X_k(s) + X_{k,b,s} − (c − ρ) a_N.    (3.19)

Since (3.19) holds for any j ∈ [N] \ N^b_{k,s}(s + 1) and we are assuming c > ρ, we conclude that (k, s) ≲_b (N, s + 1), and the result follows by taking j = N − 1 in (3.19).

The next lemma says that if C ∩ C occurs then all big jumps in the time interval [t, t − 1] come from close to the leftmost particle. Our heuristics suggest this should be true, because we expect most particles to be close to the leftmost particle at a typical time. However, the proof only relies on the assumption that the events C and C occur.

Lemma 3.6.

On the event C ∩ C,

    X_k(s) ≤ X_1(s) + c a_N   ∀ (k, b, s) ∈ B_N^{[t, t − 1]}.

Proof. Take s ∈ ⟦t, t − 1⟧, k ∈ [N] and b ∈ {1, 2}, and assume that we have X_{k,b,s} > ρ a_N. Let i_k = ζ_{k,s}(s − ℓ_N) be the time-(s − ℓ_N) ancestor of particle (k, s) (recall (2.9)). Since (k, b, s) ∈ B_N, by the definition of the event C, we must have B_N ∩ P^{k,s}_{i_k, s−ℓ_N} = ∅. Then by Lemma 3.3 we have

    X_k(s) ≤ X_{i_k}(s − ℓ_N) + c a_N.    (3.20)

Furthermore, at time s every particle is to the right of X_{i_k}(s − ℓ_N), by Lemma 2.4. This means X_{i_k}(s − ℓ_N) ≤ X_1(s), and so X_k(s) ≤ X_1(s) + c a_N by (3.20).

We will use Lemma 3.6 to prove the next result, which says that if the diameter of the cloud of particles is large and a particle makes a big jump, then either it takes the lead and will be significantly ahead of the second rightmost particle, or it stays significantly behind the leader.

Corollary 3.7.

On the event ⋂_{j=2} C_j, if (k, b, s) ∈ B_N^{[t, t − 1]} and d(X(s)) ≥ (c + c) a_N then
(a) if X_{k,b,s} > Z_k(s) then X_N(s + 1) = X_k(s) + X_{k,b,s} > X_{N−1}(s + 1) + (2c − ρ) a_N, and
(b) if X_{k,b,s} ≤ Z_k(s) then X_k(s) + X_{k,b,s} ≤ X_N(s) − c a_N,
where Z_k(s) and C, ..., C are given by (3.11)–(3.15).

Proof. Since X_{k,b,s} is a big jump, by Lemma 3.6 and the fact that d(X(s)) = X_N(s) − X_1(s) ≥ (c + c) a_N,

    X_k(s) ≤ X_1(s) + c a_N ≤ X_N(s) − c a_N.

Hence the gap between the k-th particle and the rightmost particle is bounded below by c a_N:

    Z_k(s) ≥ c a_N.    (3.21)

It follows that if X_{k,b,s} > Z_k(s), by the definition of the event C we have X_{k,b,s} > Z_k(s) + 2c a_N, which implies that

    X_{k,b,s} + X_k(s) > X_N(s) + 2c a_N.

Since c > ρ by (3.3) and (3.4), Lemma 3.5(b) implies the statement of part (a). If instead X_{k,b,s} ≤ Z_k(s), then by (3.21) and the definition of C, we have X_{k,b,s} ≤ Z_k(s) − c a_N, which completes the proof.

The next result says that if the diameter of the cloud of particles is big at some time s, then if at time s or s − 1 a particle makes a big jump which beats the current leader, this particle becomes the new leader.

Corollary 3.8.

On the event ⋂_{j=2} C_j, for all s ∈ ⟦t + 1, t − 1⟧, if d(X(s)) ≥ c a_N then

    s ∈ S_N ⟺ s ∈ Ŝ_N   and   s − 1 ∈ S_N ⟺ s − 1 ∈ Ŝ_N,

where S_N and Ŝ_N are defined in (2.16) and (2.18).

Proof. Take s ∈ ⟦t + 1, t − 1⟧ and suppose d(X(s)) ≥ c a_N.

If s ∈ S_N, then there exists (k, b, s) ∈ B_N such that X_k(s) + X_{k,b,s} = X_N(s + 1) ≥ X_N(s), where we used monotonicity for the inequality. To show that s ∈ Ŝ_N, we need to show that in fact X_k(s) + X_{k,b,s} > X_N(s), i.e. the inequality is strict, but this follows from Corollary 3.7(b), which applies since d(X(s)) ≥ c a_N ≥ (c + c) a_N by (3.4).

Now suppose s ∈ Ŝ_N. Since d(X(s)) ≥ c a_N ≥ (c + c) a_N, and by the definition of Ŝ_N, the conditions of Corollary 3.7(a) hold for (k, b, s), for some (k, b) ∈ [N] × {1, 2}. Then Corollary 3.7(a) implies that s ∈ S_N, and therefore the first equivalence in the statement holds.

If d(X(s − 1)) ≥ (c + c) a_N, then we can repeat the proof of the first equivalence to show that s − 1 ∈ S_N ⟺ s − 1 ∈ Ŝ_N.

If instead d(X(s − 1)) < (c + c) a_N we argue as follows. Suppose s − 1 ∈ S_N. Then there exists (k, b, s − 1) ∈ B_N such that

    X_k(s − 1) + X_{k,b,s−1} = X_N(s) ≥ X_N(s − 1),

so X_{k,b,s−1} ≥ Z_k(s − 1). Now X_{k,b,s−1} = Z_k(s − 1) is impossible because, with the assumption that d(X(s − 1)) < (c + c) a_N, it would imply

    X_N(s) = X_k(s − 1) + Z_k(s − 1) = X_N(s − 1) < X_1(s − 1) + (c + c) a_N < X_1(s) + c a_N

by monotonicity and (3.4). This contradicts the assumption d(X(s)) ≥ c a_N from the statement of this corollary. Hence, we must have X_{k,b,s−1} > Z_k(s − 1), and so s − 1 ∈ Ŝ_N.

Now suppose s − 1 ∈ Ŝ_N, and take (k, b, s − 1) ∈ B_N such that X_k(s − 1) + X_{k,b,s−1} > X_N(s − 1). Then by Lemma 3.5(a) and the assumption on d(X(s − 1)), for all j ∈ [N] \ N^b_{k,s−1}(s) we have

    X_j(s) ≤ X_N(s − 1) + ρ a_N < X_1(s − 1) + (c + c + ρ) a_N.

By monotonicity, (3.3) and (3.4) this is strictly smaller than X_1(s) + c a_N. Thus, at time s, all particles not in N^b_{k,s−1}(s) are closer than distance c a_N to the leftmost particle. Hence, since we assumed that d(X(s)) ≥ c a_N, we must have (k, s − 1) ≲_b (N, s), which means that s − 1 ∈ S_N.

The last property we state before the proof of Proposition 3.2 says the following. First, if no particle beats the leader with a big jump for a time interval of length at most ℓ_N, then the leader's position does not change much during this time. We will use the extra condition that the diameter is not too small to prove this easily; if the diameter is too small then jumps that are "almost big" could complicate matters. Second, the lemma says that if the diameter becomes small at some point, then it cannot become too large within ℓ_N time, if there is no particle which beats the leader with a big jump. Recall the definition of Ŝ_N from (2.18).

Lemma 3.9.

On the event ⋂_{j=2} C_j, for all s ∈ ⟦t, t − 1⟧ and ∆s ∈ [ℓ_N], if s + ∆s ≤ t and ⟦s, s + ∆s − 1⟧ ⊆ Ŝ_N^c then the following statements hold:
(a) If d(X(r)) ≥ c a_N for all r ∈ ⟦s, s + ∆s − 1⟧, then X_N(s + ∆s) ≤ X_N(s) + c a_N. In particular, if ∆s = ℓ_N then d(X(s + ℓ_N)) ≤ c a_N.
(b) If there exists r ∈ ⟦s, s + ∆s − 1⟧ such that d(X(r)) ≤ c a_N, then d(X(s + ∆s)) ≤ c a_N + 2c a_N.

Proof. First we prove part (a). Let i, j ∈ [N] with (i, s) ≲ (j, s + ∆s). Assume that there is a big jump on the path between X_i(s) and X_j(s + ∆s) at time s′ ∈ ⟦s, s + ∆s − 1⟧, i.e. there exists (k′, b′, s′) ∈ B_N ∩ P^{j,s+∆s}_{i,s}. Since we assume s′ ∈ Ŝ_N^c, we have X_{k′}(s′) + X_{k′,b′,s′} ≤ X_N(s′). Then since we assume d(X(s′)) ≥ c a_N > (c + c) a_N by (3.4), we can apply Corollary 3.7(b) to obtain

    X_{k′}(s′) + X_{k′,b′,s′} ≤ X_N(s′) − c a_N.    (3.22)

Therefore, first by Lemma 3.4, second by (3.22), and third by monotonicity and (3.4) we get

    X_j(s + ∆s) ≤ X_{k′}(s′) + X_{k′,b′,s′} + c a_N ≤ X_N(s′) − c a_N + c a_N < X_N(s + ∆s).

Hence j ≠ N, which means that the leader at time s + ∆s must be a particle which does not have an ancestor which made a big jump in the time interval [s, s + ∆s − 1]. That is, B_N ∩ P^{N,s+∆s}_{i,s} = ∅ for all i ∈ [N]. But then by Lemma 3.3 we must have

    X_N(s + ∆s) ≤ X_N(s) + c a_N,

which shows the first statement of part (a). By Lemma 2.4 we also have X_1(s + ℓ_N) ≥ X_N(s), and the second statement of part (a) follows.

Now we prove part (b). Let τ_d denote the last time before s + ∆s when the diameter is at most c a_N, that is

    τ_d = sup{ r ≤ s + ∆s : d(X(r)) ≤ c a_N }.

By our assumption in part (b) we have τ_d ≥ s.

If τ_d = s + ∆s then we are done. Assume instead that τ_d < s + ∆s. Then we can estimate the leftmost particle position at time s + ∆s using monotonicity and the definition of τ_d:

    X_1(s + ∆s) ≥ X_1(τ_d) ≥ X_N(τ_d) − c a_N.    (3.23)

To estimate the rightmost position, we first use the fact that τ_d ∈ ⟦s, s + ∆s − 1⟧ ⊆ Ŝ_N^c and d(X(τ_d + 1)) > c a_N by the definition of τ_d. Hence, the second equivalence of Corollary 3.8 implies that τ_d ∈ S_N^c; that is, no big jump takes the lead at time τ_d + 1. Thus, for some (k, b) ∈ [N] × {1, 2} we have

    X_N(τ_d + 1) = X_k(τ_d) + X_{k,b,τ_d} ≤ X_N(τ_d) + ρ a_N.    (3.24)

Now (3.23), (3.24) and (3.3) show that if τ_d = s + ∆s − 1 then we are done. Assume instead that τ_d < s + ∆s − 1. Then we can apply part (a) for the time interval [τ_d + 1, s + ∆s], because d(X(r)) > c a_N ∀ r ∈ ⟦τ_d + 1, s + ∆s⟧ by the definition of τ_d. So by part (a) and then by (3.24) we have

    X_N(s + ∆s) ≤ X_N(τ_d + 1) + c a_N ≤ X_N(τ_d) + (ρ + c) a_N.    (3.25)

Now (3.25), (3.23) and (3.3) yield part (b).

Proof of Proposition 3.2.

The main effort of this proof is in showing that the C_i events imply B. So we want to see a leader tribe at time t in which all the particles are descended from particle (N, T), and are significantly to the right of all the particles not descended from particle (N, T). We begin by giving an outline of how this will be proved.

Outline of proof that C to C imply B

Assume the event ⋂_{j=1} C_j occurs. On the event C there will be a time τ ∈ [t + 1, t] when the leader, particle (N, τ), is a distance more than c a_N ahead of the second rightmost (and every other) particle. Having this gap at time τ will ensure that the back of the population is further than c a_N away from the leader at all times up to t. That is, the diameter cannot be too small after time τ, and so we will be able to apply Corollary 3.7.

It is a possibility that on the time interval [τ, t], every particle not descended from (N, τ) stays further than roughly c a_N to the left of the tribe descending from (N, τ). Then we will have the desired leader tribe with a gap behind it at time t. Alternatively, the tribe of particle (N, τ) may be surpassed by other particles. But then, by Corollary 3.7(a), the leader must be beaten by at least roughly c a_N. The new leader's descendants might be surpassed too, but again by at least c a_N. Then, after the last time T when a tribe is surpassed before t (i.e. the last time when a big jump takes the lead, see (2.17)), no particle will make a big jump that gets closer to the leader tribe than c a_N, by Corollary 3.7(b). We will see that this implies that at time t, the leader tribe will be further away than c a_N from all the other particles. This argument works if the particles of the tribes do not move far from the position of their ancestor which made a big jump. We have this property due to Lemma 3.4.

Therefore, the proof will expand on the following steps:

(i) The record is broken by a big jump at time τ. Therefore time T, the last time when the record is broken by a big jump before time t, is either at time τ or later.

(ii) The diameter is at least c a_N between times τ and t.
We will show that the back of the population stays far behind X_N(τ), because of the small number of big jumps compared to the number of particles. This is useful, because most of the lemmas and corollaries above will apply if the diameter is not too small.

(iii) At time T, the last time before t when a particle takes the lead with a big jump, there will be a gap of size at least c a_N between the leader (N, T) and the second rightmost particle (N − 1, T).
This step follows by Corollary 3.7(a), which we can apply because of step (ii). If the diameter is big and the leader is beaten, then the new leader will lead by a large distance.

(iv) Every other particle stays at least distance c a_N behind the descendants of particle (N, T) until time t.
This is mainly due to steps (ii) and (iii) and Corollary 3.7(b): if the diameter is big and the leader is not beaten by a big jump, then big jumps will arrive far behind the leader. Therefore, the gap behind the leader tribe created in step (iii) will remain until time t.

(v) The leading tribe has the size required by the event B, and thus the event B occurs.

Proof that C to C imply B

We now give a detailed proof, following steps (i)-(v) above, that

    ⋂_{j=1} C_j ⊆ B.    (3.26)

Assume that ⋂_{j=1} C_j occurs. We first check that we have T ∈ [t + ⌈δℓ_N⌉, t − ⌈δℓ_N⌉] by proving the following statement.

Step (i). We have t + ⌈δℓ_N⌉ < τ ≤ T ≤ t − ⌈δℓ_N⌉, where τ and T are defined in (3.9) and (2.17).

In order to see this, we will use the following simple property:

    X_j(s − 1) ≤ X_{N−1}(s)   ∀ s ∈ ℕ and j ∈ [N].
(3.27)

Indeed, since all jumps are non-negative, and particle (N, s − 1) has two offspring, there are at least two particles to the right of (or at) position X_N(s − 1) at time s. Thus X_N(s − 1) ≤ X_{N−1}(s), which shows (3.27).

By the definition of the event C, we have τ ∈ ⟦t + 1, t⟧. Let (Ĵ, b̂) ∈ [N] × {1, 2} be such that (Ĵ, τ − 1) ≲_{b̂} (N, τ), and so X_N(τ) = X_Ĵ(τ − 1) + X_{Ĵ,b̂,τ−1}. It also follows from (3.27) that X_Ĵ(τ − 1) ≤ X_{N−1}(τ). Hence the definition of τ in (3.9) implies that X_{Ĵ,b̂,τ−1} > c a_N, which means that X_{Ĵ,b̂,τ−1} is a big jump, and so cannot happen on the time interval [t, t + ⌈δℓ_N⌉] by the definition of C. This implies the first inequality in Step (i). We also notice that X_{Ĵ,b̂,τ−1} is a big jump which takes the lead at time τ, that is τ − 1 ∈ S_N (see (2.16)). Then we have T ≥ τ by the definition of T in (2.17), which shows the second inequality of Step (i). Furthermore, the definition of T also shows that T > t − ⌈δℓ_N⌉ is not possible on C, which concludes the third inequality and the proof of Step (i).

Since we now know that T ≠ 0, particle (N, T) is the last particle which broke the record with a big jump before time t. Take (J, b∗) ∈ [N] × {1, 2} such that (J, T − 1) ≲_{b∗} (N, T), so

    X_N(T) = X_J(T − 1) + X_{J,b∗,T−1},    (3.28)

with X_{J,b∗,T−1} > ρ a_N. That is, at time T − 1 the J-th particle's b∗-th offspring performed a big jump X_{J,b∗,T−1}, with which it became the leader at time T at position X_N(T). We will show that at time t there is a leader tribe in which every particle descends from particle (N, T). Our next step towards this statement is to show that the diameter is large between times τ and t.

Step (ii). We have d(X(s)) ≥ c a_N for all s ∈ ⟦τ, t⟧.

We prove Step (ii) by showing that the number of particles within distance c a_N of the leader is strictly smaller than N at all times in ⟦τ, t⟧.

Let s ∈ ⟦τ, t⟧. Consider an arbitrary particle (i, s) in the population at time s. We first claim that if

    X_i(s) > X_N(τ) − c a_N,    (3.29)

then particle (i, s) has an ancestor which made a big jump at some time s̃ ∈ ⟦τ − 1, s − 1⟧. That is, if (3.29) holds then

    B_N ∩ P^{i,s}_{j,τ−1} ≠ ∅, for some j ∈ [N].    (3.30)

To see this, we notice that

    X_j(τ − 1) ≤ X_{N−1}(τ) < X_N(τ) − c a_N   ∀ j ∈ [N],    (3.31)

where the first inequality follows by (3.27), and the second from the definition of τ. Therefore, by (3.29), (3.31) and (3.4), we have

    X_i(s) > X_j(τ − 1) + c a_N   ∀ j ∈ [N].    (3.32)

In particular, this holds for j ∈ [N] such that (j, τ − 1) ≲ (i, s). Therefore (3.30) must hold by Lemma 3.3, showing that our claim is true.

Thus, every particle which is to the right of X_N(τ) − c a_N at time s has an ancestor which made a big jump between times τ − 1 and s − 1. This gives us the following bound:

    |{ i ∈ [N] : X_i(s) > X_N(τ) − c a_N }| ≤ ∑_{(l,b,r) ∈ B_N^{[τ−1, s−1]}} |N^b_{l,r}(s)|,    (3.33)

where N^b_{l,r}(s) and B_N^{[τ−1, s−1]} are defined in (2.13) and (2.14) respectively. On the right-hand side we sum the number of descendants of all particles which made a big jump between times τ − 1 and s − 1. We want to show that this is smaller than N, because that means that there must be at least one particle to the left of (or at) X_N(τ) − c a_N at time s.

Since [τ − 1, s] ⊆ [t + ⌈δℓ_N⌉, t] by Step (i), any particle at a time in [τ − 1, s − 1] has at most 2^{t − (t + ⌈δℓ_N⌉)} descendants at time s. Furthermore, the number of big jumps in the time interval [τ − 1, s − 1] is at most K, by the definition of C. Hence, by (3.33) and then since t − t = ℓ_N,

    |{ i ∈ [N] : X_i(s) > X_N(τ) − c a_N }| ≤ K 2^{t − (t + ⌈δℓ_N⌉)} ≤ K N^{1−δ} < N,    (3.34)

by our assumption on N in the statement of Proposition 3.2. Therefore, by (3.34) and monotonicity we must have

    X_1(s) ≤ X_N(τ) − c a_N ≤ X_N(s) − c a_N,

which concludes the proof of Step (ii).

Next we show that there is a gap between the two rightmost particles at time T.

Step (iii). We have X_{N−1}(T) + c a_N < X_N(T).

We have τ ≤ T by Step (i). If T = τ then the statement of Step (iii) holds by the definition of τ and (3.4).

Suppose instead that T > τ. We now check the conditions of Corollary 3.7(a). Recall from (3.28) that X_{J,b∗,T−1} is a big jump.
Since the particle performing the jump X_{J,b∗,T−1} becomes the leader at time T, we have X_{J,b∗,T−1} ≥ Z_J(T − 1), where Z_J(T − 1) is the gap between the J-th particle and the leader at time T − 1. Also note that (J, b∗, T − 1) ∈ B_N^{[t, t]}, and that by Step (ii) and (3.4) we have d(X(T − 1)) > (c + c) a_N. Therefore Corollary 3.7(a) and (b) imply

    X_N(T) = X_J(T − 1) + X_{J,b∗,T−1} > X_{N−1}(T) + (2c − ρ) a_N,

which together with (3.3) and (3.4) shows the statement of Step (iii).

In Step (iv) we show that every particle which does not descend from particle (N, T) is to the left of X_N(T) − c a_N at time t.

Step (iv). Let i ∈ [N − 1] and j ∈ [N]. If (i, T) ≲ (j, t) then X_j(t) ≤ X_N(T) − c a_N.

First we will use Lemma 3.9(a) to bound X_N(t). Since T is the last time when a particle took the lead with a big jump before time t, we have ⟦T, t − 1⟧ ⊆ S_N^c, where S_N is defined in (2.16). By Corollary 3.8 and Steps (i) and (ii), it follows that ⟦T, t − 1⟧ ⊆ Ŝ_N^c. Therefore the conditions of Lemma 3.9(a) hold with s = T and ∆s = t − T. Then Lemma 3.9(a) yields

    X_N(t) ≤ X_N(T) + c a_N.    (3.35)

Now we prove the upper bound on X_j(t) in the statement of Step (iv). Let us first consider the case in which there is no big jump in the path between particles (i, T) and (j, t), i.e. B_N ∩ P^{j,t}_{i,T} = ∅. Then, by Lemma 3.3, Step (iii) and (3.4) we have

    X_j(t) ≤ X_i(T) + c a_N < X_N(T) − c a_N + c a_N < X_N(T) − c a_N,

which shows that the statement of Step (iv) holds in this case.

Now suppose instead that there exists a big jump on the path between particles (i, T) and (j, t), so assume we have some (l, b, r) ∈ B_N ∩ P^{j,t}_{i,T}. We will show that, even with the big jump X_{l,b,r}, particle (j, t) cannot arrive close to the leader particle (N, t) at time t. This fact together with (3.35) will imply Step (iv).

We know that ⟦T, t − 1⟧ ⊆ Ŝ_N^c, and so, in particular, the leader at time r is not beaten by the big jump X_{l,b,r}. Hence by the definition of Z_l(r) in (3.11) we have X_{l,b,r} ≤ Z_l(r). Therefore, because of Steps (i) and (ii) and by (3.4), Corollary 3.7(b) applies, which implies

    X_l(r) + X_{l,b,r} ≤ X_N(r) − c a_N.    (3.36)

Now by Lemma 3.4 and since t − T < ℓ_N by Step (i), then by (3.36), and finally by monotonicity,

    X_j(t) ≤ X_l(r) + X_{l,b,r} + c a_N ≤ X_N(r) − c a_N + c a_N ≤ X_N(t) − c a_N + c a_N.    (3.37)

Putting (3.37) and (3.35) together and then using (3.4), we obtain

    X_j(t) ≤ X_N(T) − c a_N + 2c a_N ≤ X_N(T) − c a_N,

which finishes the proof of Step (iv).

Step (v). The event B, as defined in (3.7), occurs.

Write R = R_{c,N}(t), where R_{c,N}(t) is given by (3.6). To prove that B occurs, we first show that

    N_{N,T}(t) = { j ∈ [N] : X_j(t) ≥ X_N(t) − c a_N } = { N − R + 1, ..., N }.    (3.38)

The second equality follows directly from the definition of R; we will prove the first equality.

Note that Step (iv) implies that every descendant of particle (N, T) survives until time t, that is |N_{N,T}(t)| = 2^{t − T} > . Indeed, by Step (i) and our assumption on N we have 2^{t − T} ≤ N^{1−δ}

N,T ( t ) . Therefore, Step (iv) and (3.39) imply (3.40).

Finally, we need to show that

    R ≤ N^{1−δ}.    (3.41)

We have that

    R = |{ N − R + 1, ..., N }| = |N_{N,T}(t)| ≤ 2^{t − (t + ⌈δℓ_N⌉)} ≤ N^{1−δ},

where in the second equality we used (3.38), and the inequality follows since T > t + ⌈δℓ_N⌉ by Step (i). Therefore by Step (i), (3.38), (3.40) and (3.41), B occurs, which concludes Step (v).

This completes the proof of (3.26).

Proof that C to C imply B

Recall the definition of the event B in (3.8). We now prove that

    B ∩ C ∩ C ∩ C ⊆ B,    (3.42)

which implies ⋂_{j=1} C_j ⊆ B because of (3.26).

Assume that B ∩ C ∩ C ∩ C occurs. Again write R = R_{c,N}(t), where R_{c,N}(t) is defined using (3.6). Take j ∈ [N − R] and consider particle (j, t). Then, by the definition of the event B in (3.7), and since the leader at time t is to the right of every particle at time t, we have

    X_j(t) ≤ X_{N−R+1}(t) − c a_N ≤ X_N(t) − c a_N.    (3.43)

Now suppose that the i-th particle at time t is a descendant of particle (j, t), i.e. i ∈ N_{j,t}(t). Lemma 2.4 implies that every particle at time t is to the right of (or at) X_N(t). Thus we have X_i(t) ≥ X_N(t), which together with (3.43) and (3.4) implies

    X_i(t) > X_j(t) + c a_N.

Thus, by Lemma 3.3, there must be a big jump in the path between particles (j, t) and (i, t); that is, we must have B_N ∩ P^{i,t}_{j,t} ≠ ∅.

Therefore we can bound the number of time-t descendants of particles (1, t), (2, t), ..., (N − R, t) by the number of descendants of particles which made a big jump between times t and t − 1:

    ∑_{j=1}^{N−R} |N_{j,t}(t)| ≤ ∑_{(k,b,s) ∈ B_N^{[t, t − 1]}} |N^b_{k,s}(t)|.    (3.44)

By the definition of the event C, no particle makes a big jump in the time interval [t − ⌈δℓ_N⌉, t + ⌈δℓ_N⌉].
Hence, any particle which made a big jump between times t and t − 1 can have at most 2^{t − (t + ⌈δℓ_N⌉)} descendants at time t. Furthermore, by the definition of C, |B_N^{[t, t − 1]}| ≤ K. Putting these observations together with (3.44) we obtain

    ∑_{j=1}^{N−R} |N_{j,t}(t)| ≤ K N^{1−δ} < N^{1−γ},    (3.45)

by our assumption on N in the statement of the proposition. This completes the proof of (3.42).

Proof that C to C imply A

Recall the definition of A in (2.3). We now complete the proof of Proposition 3.2 by showing that

    ⋂_{j=1} C_j ⊆ A.    (3.46)

Assume ⋂_{j=1} C_j occurs. Let i, j ∈ [N] be such that (j, t) ≲ (i, t). Assume first that B_N ∩ P^{i,t}_{j,t} = ∅. Then, by Lemma 3.3 and using the leader's position as an upper bound, we obtain

    X_i(t) ≤ X_j(t) + c a_N ≤ X_N(t) + c a_N ≤ X_1(t) + c a_N,

where the last inequality follows by Lemma 2.4. Thus, recalling the definition of L_{c,N}(t) in (2.1), we have i ∈ [L_{c,N}(t)]. Therefore, if i > L_{c,N}(t) then we must have B_N ∩ P^{i,t}_{j,t} ≠ ∅. It follows that

    N − L_{c,N}(t) ≤ ∑_{(k,b,s) ∈ B_N^{[t, t − 1]}} |N^b_{k,s}(t)| < N^{1−γ}

by the same argument as for (3.45). Since we took c < η in (3.4), we now have L_{η,N}(t) ≥ N − N^{1−γ}, which finishes the proof of (3.46). The proof of Proposition 3.2 then follows from (3.26), (3.42), (3.46) and Step (ii).

3.3 Breaking down event C

We have now broken down the events B, B and A into simpler events C to C. In Section 4 we will be able to show directly that the events C to C occur with high probability. However, we will need to break C down further, into simpler events that we will show occur with high probability in Section 4. In this section we carry out the task of breaking down C, which says that a gap of size c a_N appears behind the rightmost particle at some point during the time interval [t + 1, t] (see (3.10)), into simpler events.
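As an informal illustration of the gap event just described, the following Python sketch scans a recorded trajectory for the first time in a given window at which the leader is ahead of the second rightmost particle by more than a threshold. The function names, the `history` layout (one ascending-sorted list of positions per time step) and the numerical values are assumptions made for this illustration only; they are not notation or code from the paper.

```python
from typing import List, Optional


def leader_gap(positions: List[float]) -> float:
    """Gap between the two rightmost particles.

    `positions` is assumed to be sorted in ascending order,
    so the leader is the last entry.
    """
    return positions[-1] - positions[-2]


def first_gap_time(history: List[List[float]], start: int, end: int,
                   threshold: float) -> Optional[int]:
    """First time s in [start, end] with leader gap > threshold, else None."""
    for s in range(start, end + 1):
        if leader_gap(history[s]) > threshold:
            return s
    return None


# Hypothetical toy trajectory with N = 3 particles and four time steps.
toy_history = [
    [0.0, 1.0, 2.0],
    [1.0, 2.0, 2.5],
    [2.0, 3.0, 7.0],
    [3.0, 7.0, 12.0],
]
gap_time = first_gap_time(toy_history, start=1, end=3, threshold=3.0)
```

Here `threshold` plays the role of the gap size in the event, and `start`, `end` the role of the endpoints of the time window; a return value of `None` corresponds to the event failing on that trajectory.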
Recall that we assumed t > ℓ_N, and that the constants η ∈ (0 , , γ, δ, ρ, c, c, ..., c, K satisfy (3.2)-(3.5).

The first event we introduce is the same as the event C in (3.12), except with larger gaps and jumps. That is, if a particle is more than c a_N away from the leader, then it does not jump to within distance c a_N of the leader's position with a single big jump (recall that c ≪ c). We let

    D := { ∄ (i, b, s) ∈ [N] × {1, 2} × ⟦t, t − 1⟧ such that X_{i,b,s} ∈ (Z_i(s) − c a_N, Z_i(s) + 3c a_N] and Z_i(s) ≥ c a_N },    (3.47)

where Z_i(s) is the gap between the i-th and the rightmost particle. The reason behind the definition of D is the following. Assume that a big jump beats the leader at a time when the diameter is fairly big (> c a_N). Then the event D, together with the events C, C and C, implies that this particle must become the new leader and it will lead by at least (3c − ρ) a_N, which will be enough to show that C occurs. We state this as a corollary below, which we will use later on in this section.

Corollary 3.10.

On the event D ∩ C ∩ C ∩ C, if (k, b, s) ∈ B_N^{[t, t − 1]}, d(X(s)) ≥ (c + c) a_N and X_{k,b,s} > Z_k(s), then

    X_N(s + 1) = X_k(s) + X_{k,b,s} > X_{N−1}(s + 1) + (3c − ρ) a_N,

where Z_k(s), D and C, C, C are given by (3.11), (3.47) and (3.13)-(3.15) respectively, and B_N^{[t, t − 1]} is defined in (2.14).

Proof. The statement follows by exactly the same argument as for Corollary 3.7(a), if we replace C by D, c by c and c by c.

The next two events will ensure that the record is broken in the time interval [t + 1, t]. The first event says that there is a jump of size greater than c a_N in every interval of length c ℓ_N in [t, t] (recall c ≪ c). We define

    D := { ∀ s ∈ ⟦t, t − c ℓ_N⟧, ∃ (k, b, ŝ) ∈ [N] × {1, 2} × ⟦s, s + c ℓ_N⟧ : X_{k,b,ŝ} > c a_N }.    (3.48)

The event D will be useful if at some point in the time interval [t, t] the diameter is not too large (≤ c a_N). If D occurs then shortly after this point a jump of size larger than c a_N happens. We will show that this jump breaks the record, and the particle performing this jump will lead by at least c a_N. The reason for this is that the jump size (> c a_N) is much greater than the preceding diameter (≤ c a_N), and that c ≪ c.

The next event says that there will be a jump of size greater than c a_N between times t and t + ⌈ℓ_N/ ⌉ (recall c ≫ c). Let

    D := { ∃ (i, b, s) ∈ [N] × {1, 2} × ⟦t, t + ⌈ℓ_N/ ⌉⟧ : X_{i,b,s} > c a_N }.    (3.49)

The next event says that there is no jump of size greater than c a_N shortly before time t. We let

    D := { ∄ (i, b, s) ∈ [N] × {1, 2} × ⟦t − ⌈c ℓ_N⌉, t⟧ : X_{i,b,s} > c a_N }    (3.50)

(recall c ≪ c). Our last event excludes jumps of size in a certain small range in a certain short time interval. The starting point of this time interval will be the first time after t when the diameter is at most c a_N:

    τ := inf{ s ≥ t : d(X(s)) ≤ c a_N },    (3.51)

and we define the event

    D := { ∄ (k, b, s) ∈ [N] × {1, 2} × ⟦τ, τ + c ℓ_N⟧ : X_{k,b,s} ∈ (2c a_N, c a_N + 3c a_N] }.    (3.52)

We can now state the main result of this subsection.

Proposition 3.11.

Let η ∈ (0 , , and assume that the constants γ, δ, ρ, c, c, ..., c, K satisfy (3.2)-(3.5). For all N ≥ sufficiently large that ℓ_N − ⌈c ℓ_N⌉ ≥ ⌈ℓ_N/ ⌉ and t > ℓ_N,

    ⋂_{j=2} C_j ∩ ⋂_{i=1} D_i ⊆ C,

where D, ..., D are defined in (3.47)-(3.50) and (3.52) respectively, and C, ..., C are defined in (3.10) and (3.12)-(3.17) respectively.

Before giving a precise proof of Proposition 3.11, we give an outline of the argument, which is divided into four separate cases. Suppose ⋂_{j=2} C_j ∩ ⋂_{i=1} D_i occurs.

Case 1: Suppose there is a time τ ∈ [t, t − c ℓ_N] when the diameter is not too large (at most c a_N). Then shortly after time τ, there will be a jump of size larger than c a_N, by the definition of the event D. We will show that the particle making this jump breaks the record and will lead by a distance larger than c a_N. The proof will also use the definition of the event D.

Case 2(a): Suppose the diameter is larger than c a_N at all times in [t, t − c ℓ_N], but the record is broken by a big jump at some point in this time interval. Then Corollary 3.10 tells us that there will be a gap of size greater than c a_N behind the new record.

Case 2(b): Suppose the diameter is larger than c a_N at all times in [t, t − c ℓ_N]. If the record is not broken on the time interval [t − ⌈c ℓ_N⌉, t − c ℓ_N], then using Lemma 3.9, we can show that the diameter is less than c a_N at time t − ⌈c ℓ_N⌉, giving us a contradiction. Thus this case is impossible.

Case 2(c): Suppose the diameter is larger than c a_N at all times in [t, t − c ℓ_N]. Now consider the case that the record is not broken on the time interval [t, t − c ℓ_N], but is broken shortly before t, during the time interval [t − ⌈c ℓ_N⌉, t − 1]. By the definition of the event D, this jump cannot be very big. Therefore, we will see that the new leader will be beaten by the first jump of size greater than c a_N, if the record has not already been broken before that. There will be a jump of size greater than c a_N before time t + ⌈ℓ_N/ ⌉ because of the event D, so the record must be broken by a big jump before time t − c ℓ_N. This again gives us a contradiction, meaning that Case 2(c) is also impossible.

We now prove Proposition 3.11, using cases 1, 2(a), 2(b) and 2(c) as described above.

Proof of Proposition 3.11. Fix η ∈ (0 , and take constants γ, δ, ρ, c, c, ..., c, K as in (3.2)-(3.5). Let us assume that ⋂_{j=2} C_j ∩ ⋂_{i=1} D_i occurs.

Case 1: t ≤ τ ≤ t − c ℓ_N.

In this case, by the definition of τ we have

    d(X(τ)) ≤ c a_N.    (3.53)

Let us now consider the first jump of size greater than c a_N after time τ; that is, let

    s∗ = inf{ s ≥ τ : ∃ (k, b) ∈ [N] × {1, 2} such that X_{k,b,s} > c a_N } ∈ ⟦τ, τ + c ℓ_N⟧    (3.54)

by the definition of the event D in (3.48). Take (k∗, b∗) ∈ [N] × {1, 2} such that X_{k∗,b∗,s∗} > c a_N (there is a unique choice of the pair (k∗, b∗) by the definition of the event C). We will show that the jump X_{k∗,b∗,s∗} creates a gap of size larger than c a_N behind the leader. We do this in two steps. First we show that the diameter is not too large right before the jump X_{k∗,b∗,s∗} occurs; then we show that a gap is created.

(i) We claim that

    d(X(s∗)) ≤ c a_N + c a_N.    (3.55)

Now we prove the claim. By (3.53) we can assume s∗ > τ.
Let j ∈ [ N ] be arbitrary, and thentake i ∈ [ N ] such that ( i, τ ) (cid:46) ( j, s ∗ ) . We will show that particle ( j, s ∗ ) is within distance (2 c + c ) a N of the leftmost particle at time s ∗ . We consider two cases, depending on whetherthere is a big jump on the path between X i ( τ ) and X j ( s ∗ ) .• If B N ∩ P j,s ∗ i,τ = ∅ , then by Lemma 3.3, (3.53) and monotonicity, X j ( s ∗ ) ≤ X i ( τ ) + c a N ≤ X ( τ ) + c a N + c a N ≤ X ( s ∗ ) + c a N + c a N . (3.56)• If B N ∩ P j,s ∗ i,τ (cid:54) = ∅ , then take ( k (cid:48) , b (cid:48) , s (cid:48) ) ∈ B N ∩ P j,s ∗ i,τ . Then X k (cid:48) ( s (cid:48) ) is the position of theparent of the particle that makes the jump X k (cid:48) ,b (cid:48) ,s (cid:48) . Since (by (3.54)) X k ∗ ,b ∗ ,s ∗ is the ﬁrstjump of size greater than c a N after time τ , and since s (cid:48) < s ∗ , we have X k (cid:48) ,b (cid:48) ,s (cid:48) ≤ c a N .Then since s ∗ − s (cid:48) ≤ s ∗ − τ ≤ c (cid:96) N , by Lemma 3.4 we have X j ( s ∗ ) ≤ X k (cid:48) ( s (cid:48) ) + X k (cid:48) ,b (cid:48) ,s (cid:48) + c a N ≤ X k (cid:48) ( s (cid:48) ) + 2 c a N + c a N . Now Lemma 3.6 and monotonicity imply that this is at most X ( s (cid:48) ) + 2 c a N + 2 c a N ≤ X ( s ∗ ) + 2 c a N + 2 c a N . (3.57)By (3.56), (3.57) and our choice of constants in (3.4), we conclude that for any particle position X j ( s ∗ ) in the population at time s ∗ , X j ( s ∗ ) ≤ X ( s ∗ ) + 2 c a N + c a N , which implies (3.55).(ii) We claim that X N − ( s ∗ + 1) + 2 c a N < X N ( s ∗ + 1) . (3.58)By the deﬁnition of ( k ∗ , b ∗ , s ∗ ) , we have X k ∗ ,b ∗ ,s ∗ > c a N , and we also know X k ∗ ,b ∗ ,s ∗ / ∈ (2 c a N , c a N +3 c a N ] by the deﬁnition of the event D , because s ∗ ∈ (cid:74) τ , τ + c (cid:96) N (cid:75) . Thereforewe have X k ∗ ,b ∗ ,s ∗ > c a N + 3 c a N . (3.59)32hen by (3.59) and (3.55), X k ∗ ( s ∗ ) + X k ∗ ,b ∗ ,s ∗ > X ( s ∗ ) + (2 c + 3 c ) a N ≥ X N ( s ∗ ) + (3 c − c ) a N . (3.60)Note that c − c > ρ by (3.3)-(3.4). 
Hence by (3.59), (3.60) and Lemma 3.5(b), we have ( k ∗ , s ∗ ) (cid:46) b ∗ ( N, s ∗ + 1) and X N ( s ∗ + 1) > X N − ( s ∗ + 1) + (3 c − c − ρ ) a N , which is larger than X N − ( s ∗ + 1) + 2 c a N by (3.3)-(3.4). This ﬁnishes the proof of (3.58).Recall from (3.54) that s ∗ ∈ (cid:74) τ , τ + c (cid:96) N (cid:75) . Furthermore, event C tells us that s ∗ / ∈ [ t − (cid:100) δ(cid:96) N (cid:101) , t ] .Therefore, by the assumption of Case 1 that τ ∈ [ t , t − c (cid:96) N ] , we conclude t + 1 ≤ s ∗ + 1 ≤ t ,which together with (3.58) shows that C occurs. We conclude that Proposition 3.11 holds in Case 1. Case 2(a): τ > t − c (cid:96) N and [ t , t − c (cid:96) N ] ∩ ˆ S N (cid:54) = ∅ , where ˆ S N is deﬁned in (2.18).This means that there exists (ˆ k, ˆ b, ˆ s ) ∈ B [ t ,t − c (cid:96) N ] N with X ˆ k, ˆ b, ˆ s > Z ˆ k (ˆ s ) (recall (3.11)). Since τ > t − c (cid:96) N , we have d ( X (ˆ s )) > c a N . Then by (3.4), we can apply Corollary 3.10 to obtain X N (ˆ s + 1) = X ˆ k (ˆ s ) + X ˆ k, ˆ b, ˆ s > X N − (ˆ s + 1) + (3 c − ρ ) a N . By our choice of constants in (3.3)-(3.4), and because ˆ s + 1 ∈ (cid:74) t + 1 , t (cid:75) , this shows that C occurs.Therefore we are done with the proof of Proposition 3.11 in Case 2(a). Case 2(b): τ > t − c (cid:96) N and [ t − (cid:100) c (cid:96) N (cid:101) , t − c (cid:96) N ] ∩ ˆ S N = ∅ .We will apply Lemma 3.9 with s = t −(cid:100) c (cid:96) N (cid:101) and ∆ s = (cid:96) N . By assumption we have [ s, s +∆ s − ⊆ ˆ S cN , and therefore applying either part (a) or part (b) of Lemma 3.9 as appropriate, we have d ( X ( s + ∆ s )) = d ( X ( t − (cid:100) c (cid:96) N (cid:101) )) ≤ max (cid:8) c a N , c a N + 2 c a N (cid:9) which is smaller than c a N by (3.4), contradicting the assumption that τ > t − c (cid:96) N . This showsthat Case 2(b) cannot occur. 
Case 2(c): τ > t − c (cid:96) N and [ t , t − c (cid:96) N ] ∩ ˆ S N = ∅ , but [ t − (cid:100) c (cid:96) N (cid:101) , t − ∩ ˆ S N (cid:54) = ∅ .Deﬁne τ := inf (cid:8) s ≤ t : (cid:74) s, t (cid:75) ⊆ ˆ S cN (cid:9) ∈ ( t − (cid:100) c (cid:96) N (cid:101) , t ] . (3.61)Suppose, aiming for a contradiction, that there exists r ∈ (cid:74) τ , t (cid:75) such that d ( X ( r )) ≤ c a N . Thensince (cid:74) τ , t (cid:75) ⊆ ˆ S cN , Lemma 3.9(b) applies and says that d ( X ( t )) ≤ c a N + 2 c a N . By (3.4), thiscontradicts the assumption that τ > t − c (cid:96) N . Thus (by (3.4) again for r > t ) we must have d ( X ( r )) ≥ c a N ∀ r ∈ (cid:74) τ , t − c (cid:96) N (cid:75) . (3.62)Now note that τ − ∈ ˆ S N . Then by (3.62), the second equivalence in Corollary 3.8 implies thatin fact τ − ∈ S N . Hence, by the deﬁnition of S N in (2.16), there exists ( k, b ) ∈ [ N ] × { , } suchthat X N ( τ ) = X k ( τ −

1) + X k,b,τ − , (3.63)where X k,b,τ − > ρa N . Now Lemma 3.6 provides a bound on X k ( τ − , and the deﬁnition of D together with the fact that τ − ∈ [ t − (cid:100) c (cid:96) N (cid:101) , t ] gives us a bound on X k,b,τ − , so that we obtain X N ( τ ) ≤ X ( τ −

1) + ( c + c ) a N . (3.64)33ow, on the event D , there exists (˜ i, ˜ b, ˜ s ) ∈ [ N ] × { , } × (cid:74) t , t + (cid:100) (cid:96) N / (cid:101) (cid:75) such that X ˜ i, ˜ b, ˜ s > c a N > ρa N (3.65)by (3.3)-(3.4). We show that the particle performing this big jump beats the leader at time ˜ s . Byour assumption that (cid:96) N − (cid:100) c (cid:96) N (cid:101) ≥ (cid:100) (cid:96) N / (cid:101) and by (3.61), we have (cid:74) τ , ˜ s (cid:75) ⊆ ˆ S cN and ˜ s − τ ≤ (cid:96) N .Therefore, by (3.62) we can apply Lemma 3.9(a) with s = τ and ∆ s = ˜ s − τ , and then by (3.64)we have X N (˜ s ) ≤ X N ( τ ) + c a N ≤ X ( τ −

1) + (2 c + c ) a N . (3.66)By (3.4), it follows that X N (˜ s ) < X ( τ −

1) + 2 c a N < X (˜ s ) + X ˜ i, ˜ b, ˜ s ≤ X ˜ i (˜ s ) + X ˜ i, ˜ b, ˜ s , where in the second inequality we use monotonicity and (3.65). Therefore, by the assumptions that ˜ s ∈ (cid:74) t , t + (cid:100) (cid:96) N / (cid:101) (cid:75) and (cid:96) N − (cid:100) c (cid:96) N (cid:101) ≥ (cid:100) (cid:96) N / (cid:101) , and by the deﬁnition of ˆ S N in (2.18), we have ˜ s ∈ ˆ S N ∩ [ t , t − c (cid:96) N ] , which contradicts the assumption of Case 2(c).We have now shown that if (cid:84) j =2 C j ∩ (cid:84) i =1 D i occurs then Cases 2(b) and 2(c) are impossible,whereas Cases 1 and 2(a) imply that C must occur. This concludes the proof of Proposition 3.11. In the deterministic argument in Section 3 we have provided a strategy which ensures that the events A and A occur. In this section we check that the events C to C and D to D which make upthis strategy all occur with high probability, and use this to ﬁnish the proof of Proposition 2.6.When bounding the probabilities of these events, it will be useful to consider branching randomwalks (BRWs) without selection, where at each time step all particles have two oﬀspring, the oﬀspringparticles make i.i.d. jumps from their parents’ locations, and every oﬀspring particle survives. Belowwe describe a construction of the N -BRW from N independent BRWs, which will allow us to considerour events on the probability space on which the BRWs are deﬁned. (A similar construction wasused in [1].) N -BRW from N independent BRWs Consider a binary tree with the following labelling. Let U := ∞ (cid:91) n =0 { , } n , and for convenience we write e.g. instead of (1 , , . Then the root of the binary tree has label ∅ , and for all u ∈ U the two children of vertex u have labels u and u . We will use the partialorder (cid:22) on the set U ; we write u (cid:22) v if either u = v or the vertex with label u is an ancestor of thevertex with label v in the binary tree. 
We also write $u \prec v$ if $u \preceq v$ and $u \neq v$.

The particles of the $N$ independent BRWs will have labels from the set $[N] \times \mathcal{U}$, and we have a lexicographical order on the set of labels. We also let $\mathcal{U}_0 := \mathcal{U} \setminus \{\emptyset\}$. The jumps of the BRWs will be given by random variables $(Y_{j,u})_{j \in [N],\, u \in \mathcal{U}_0}$, which are i.i.d. with common law given by (1.3).

The $N$ initial particles of the $N$ independent BRWs are labelled with the pairs $(j, \emptyset)$ with $j \in [N]$. For each $j \in [N]$, we let $\mathcal{Y}_j(\emptyset) \in \mathbb{R}$ be the initial location of particle $(j, \emptyset)$. Then, at each time step $n \in \mathbb{N}_0$, each particle $(j, u)$ with $j \in [N]$ and $u \in \{1,2\}^n$ has two offspring labelled $(j, u1)$ and $(j, u2)$, which make jumps $Y_{j,u1}$, $Y_{j,u2}$ from the location $\mathcal{Y}_j(u)$. The locations of the offspring particles $(j, u1)$ and $(j, u2)$ will be
$$\mathcal{Y}_j(u1) = \mathcal{Y}_j(u) + Y_{j,u1} \quad\text{and}\quad \mathcal{Y}_j(u2) = \mathcal{Y}_j(u) + Y_{j,u2}.$$
Note that for $u \prec v$, the path between particles $(j, u)$ and $(j, v)$ is given by the jumps $Y_{j,w}$ with $u \prec w \preceq v$, i.e.
$$\mathcal{Y}_j(v) - \mathcal{Y}_j(u) = \sum_{u \prec w \preceq v} Y_{j,w}.$$

Now we construct the $N$-BRW by defining the surviving set of particles for each time $n \in \mathbb{N}_0$ as the $N$-element set $H_n \subseteq [N] \times \{1,2\}^n$, constructed iteratively as follows. Let $H_0 := \{(1, \emptyset), \dots, (N, \emptyset)\}$. Given $H_n$ for some $n \in \mathbb{N}_0$, we let $H'_n$ denote the set of offspring of the particles in the set $H_n$:
$$H'_n := \bigcup_{(j,u) \in H_n} \{(j, u1), (j, u2)\}.$$
Then $H_{n+1} \subseteq H'_n$ consists of the particles with the $N$ largest values in the collection $(\mathcal{Y}_j(u))_{(j,u) \in H'_n}$, where ties are broken based on the lexicographical order of the labels. In this way an $N$-BRW is constructed from the initial configuration $(\mathcal{Y}_j(\emptyset))_{j \in [N]}$ and the jumps $(Y_{j,u})_{j \in [N],\, u \in \mathcal{U}_0}$.

For $n \in \mathbb{N}_0$, we let $\mathcal{F}'_n$ denote the $\sigma$-algebra generated by $(Y_{j,u})_{j \in [N],\, u \in \cup_{m=1}^{n} \{1,2\}^m}$. Note that $H_n$ is $\mathcal{F}'_n$-measurable for each $n$.

Returning to our original notation in Section 2.1, we can say the following. For all $n \in \mathbb{N}_0$, let $\mathcal{X}(n)$ denote the ordered set which contains the values $(\mathcal{Y}_j(u))_{(j,u) \in H_n}$ in ascending order:
$$\mathcal{X}(n) = \{\mathcal{X}_1(n) \le \dots \le \mathcal{X}_N(n)\} := \{\mathcal{Y}_{j_1}(u_1) \le \dots \le \mathcal{Y}_{j_N}(u_N)\}, \tag{4.1}$$
where $H_n = \{(j_i, u_i) : i \in [N]\}$, and again ties are broken based on the lexicographical order of the labels. Then we define the map $\sigma$ which associates the pair $(i, n) \in [N] \times \mathbb{N}_0$ with particle $(j_i, u_i) \in H_n$, where $\mathcal{Y}_{j_i}(u_i)$ has the $i$th position in the ordered set $\mathcal{X}(n)$. That is, for $(i, n) \in [N] \times \mathbb{N}_0$ we let
$$\sigma(i, n) = (j_i, u_i) \in H_n \subset [N] \times \mathcal{U}, \tag{4.2}$$
where $(j_i, u_i)$ is as in (4.1).
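The iterative construction of the surviving sets $H_n$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the jump law $P(X > x) = (1+x)^{-\alpha}$ (that is, $h(x) = (1+x)^{\alpha}$) is a hypothetical example of a law of the form (1.3), the function names are made up for this sketch, jumps are sampled only for offspring of surviving particles (the other $Y_{j,u}$ never influence the $N$-BRW), and the direction of the lexicographic tie-break is an arbitrary choice.

```python
import random


def pareto_jump(alpha, rng):
    # Assumed example law: P(X > x) = (1 + x)^(-alpha) for x >= 0.
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return (1.0 - rng.random()) ** (-1.0 / alpha) - 1.0


def n_brw_from_brws(initial, steps, alpha=1.5, seed=0):
    """Return [H_0, ..., H_steps], each a dict mapping a label (j, u) to its position.

    A label is a pair (j, u): j in {1, ..., N} names the BRW and u is a tuple
    over {1, 2} naming the vertex of the binary tree.
    """
    rng = random.Random(seed)
    n_particles = len(initial)
    survivors = {(j, ()): initial[j - 1] for j in range(1, n_particles + 1)}
    history = [survivors]
    for _ in range(steps):
        # Branching step: every surviving particle (j, u) has offspring (j, u1), (j, u2).
        offspring = {}
        for (j, u), position in survivors.items():
            for b in (1, 2):
                offspring[(j, u + (b,))] = position + pareto_jump(alpha, rng)
        # Selection step: keep the N largest positions; ties are decided by the
        # lexicographic order of the labels (larger label first, an arbitrary choice).
        ranked = sorted(offspring.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        survivors = dict(ranked[:n_particles])
        history.append(survivors)
    return history


if __name__ == "__main__":
    history = n_brw_from_brws([0.0] * 4, steps=5)
    # Ordered configuration at the final time, as in (4.1).
    print(sorted(history[-1].values()))
```

Sorting the values of `history[n]` in ascending order gives the ordered configuration in (4.1), and the label stored alongside the $i$th smallest value plays the role of $\sigma(i, n)$ in (4.2).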
The jumps in our original notation are then given by
$$X_{i,1,n} := Y_{j_i, u_i 1} \quad\text{and}\quad X_{i,2,n} := Y_{j_i, u_i 2}, \tag{4.3}$$
if $\sigma(i, n) = (j_i, u_i)$.

Finally, recall that we introduced the partial order $\lesssim$ in (2.8) in Section 2.4 to denote that two particles are related in the $N$-BRW. This partial order corresponds to the partial order $\preceq$ in the $N$ independent BRWs as follows. For all $n, k \in \mathbb{N}_0$ and $i_0, i_k \in [N]$, we have $(i_0, n) \lesssim (i_k, n+k)$ if and only if for some $j \in [N]$ and $u, v \in \mathcal{U}$, we have $\sigma(i_0, n) = (j, u)$, $\sigma(i_k, n+k) = (j, v)$, and $u \preceq v$. Furthermore, for $b \in \{1,2\}$ we have $(i_0, n) \lesssim_b (i_k, n+k)$ if and only if the above holds and additionally $k \ge 1$ and $ub \preceq v$.

Now we can consider the $N$-BRW constructed from $N$ independent BRWs with the notation introduced in Sections 2.1 and 2.4. It follows from our construction that for any path in the $N$-BRW, there is a path in one of the $N$ independent BRWs that consists of the same sequence of jumps as the path in the $N$-BRW. We state and prove this simple property below. Recall the notation $P^{i_k, n+k}_{i_0, n}$ from (2.10).

Lemma 4.1. For all $k \in \mathbb{N}$, $i_0, i_k \in [N]$ and $n \in \mathbb{N}_0$, if $(i_0, n) \lesssim (i_k, n+k)$ with $P^{i_k, n+k}_{i_0, n} = \{(i_l, b_l, n+l) : l \in \{0, \dots, k-1\}\}$, then there exist $j \in [N]$ and $(u_l)_{l=0}^{k} \subseteq \mathcal{U}$ such that
(1) $(j, u_l) \in H_{n+l}$, for all $l \in \{0, \dots, k\}$,
(2) $u_l b_l \preceq u_k$, for all $l \in \{0, \dots, k-1\}$, and
(3) $X_{i_l, b_l, n+l} = Y_{j, u_l b_l}$, for all $l \in \{0, \dots, k-1\}$.

Proof. Take $(i_l, b_l, n+l) \in P^{i_k, n+k}_{i_0, n}$ (with $l \in \{0, \dots, k-1\}$). Then $(i_l, n+l) \lesssim_{b_l} (i_k, n+k)$. Thus, there exist $j \in [N]$ and $u_l, u_k \in \mathcal{U}$ such that $\sigma(i_l, n+l) = (j, u_l)$, $\sigma(i_k, n+k) = (j, u_k)$, and $u_l b_l \preceq u_k$. This implies $X_{i_l, b_l, n+l} = Y_{j, u_l b_l}$ (see (4.3)) and also $(j, u_l) \in H_{n+l}$ and $(j, u_k) \in H_{n+k}$ by the definition (4.2) of $\sigma$. Since $(i_l, b_l, n+l) \in P^{i_k, n+k}_{i_0, n}$ was arbitrary, the result follows.

One of the most important components of the deterministic argument in Section 3 is that paths cannot move very far without big jumps; this is the meaning of the event $C_4$ defined in (3.14). Corollary 4.5 is the main result of this section and will be used to bound from below the probability that the event $C_4$ occurs.

As in [2], we use Potter's bounds to give useful estimates on the regularly varying function $h$ (with index $\alpha$) defined in (1.3). We will use the following elementary consequence of Potter's bounds.

Lemma 4.2.

For $\epsilon > 0$, there exist $B(\epsilon) > 1$ and $C_1(\epsilon), C_2(\epsilon) > 0$ such that
$$\frac{1}{h(x)} \le C_1 x^{\epsilon - \alpha} \quad\text{and}\quad h(x) \le C_2 x^{\alpha + \epsilon} \quad \forall x \ge B.$$
Proof.

Let $\epsilon > 0$ be arbitrary. By Potter's bounds [6, Theorem 1.5.6(iii)], there exists $x_0 > 0$ depending only on $\epsilon$ such that
$$\frac{h(y)}{h(x)} \le 2 \max\left( (y/x)^{\alpha + \epsilon},\, (y/x)^{\alpha - \epsilon} \right) \quad \forall x, y \ge x_0. \tag{4.4}$$
Let $x \ge x_0$ be arbitrary and let $y = x_0$ in (4.4). Then we have $y/x \le 1$ and so $(y/x)^{\alpha + \epsilon} \le (y/x)^{\alpha - \epsilon}$, and the first inequality in the statement of the lemma holds with $C_1 = 2 x_0^{\alpha - \epsilon} h(x_0)^{-1}$ and $B = x_0 + 1$. Similarly, since we have $x/y \ge 1$, we have $(x/y)^{\alpha - \epsilon} \le (x/y)^{\alpha + \epsilon}$, and hence by (4.4) (with $x$ and $y$ exchanged) the second inequality holds with $C_2 = 2 h(x_0) x_0^{-(\alpha + \epsilon)}$ and $B = x_0 + 1$.

In order to show that $C_4$ occurs with high probability, we prove a lemma about a random walk with the same jump distribution as our $N$-BRW, but in which jumps larger than a certain size are discarded and count as a jump of size zero. The lemma gives an upper bound on the probability that this random walk moves a large distance $x_N$ in a number of steps of order $\ell_N$, if the jumps larger than $r x_N$ are discarded (for some $r \in (0,1)$). For an arbitrarily large $q > 0$, the parameter $r$ can be taken sufficiently small that the above probability is smaller than $N^{-q}$ (for large $N$). Our lemma is similar to the lemma on page 168 of [13], where the jump distribution is truncated; that is, jumps greater than a threshold value are not allowed at all, instead of being counted as zero. We use ideas from the proof of Theorem 3 in [15], which is a large deviation result for sums of random variables with stretched exponential tails.

Recall that $P(X > x) = h(x)^{-1}$ for $x \ge 0$, where $h$ is regularly varying with index $\alpha > 0$.

Lemma 4.3.

Let $X_1, X_2, \dots$ be i.i.d. random variables with $X_1 \overset{d}{=} X$. For any $m \in \mathbb{N}$, $q > 0$, $\lambda > 0$ and $0 < r < 1 \wedge \frac{\lambda (1 \wedge \alpha)}{8q}$, for $N$ sufficiently large, if $x_N > N^{\lambda}$ then
$$P\left( \sum_{j=1}^{m \ell_N} X_j \mathbb{1}_{\{X_j \le r x_N\}} \ge x_N \right) \le N^{-q}.$$
Lemma 4.4.

Suppose $Y$ is a non-negative random variable. For $v > 0$ and $0 < K_1 < K_2 < \infty$,
$$E\left[ \exp\left( v Y \mathbb{1}_{\{Y \le K_2\}} \right) \mathbb{1}_{\{Y \ge K_1\}} \right] = \int_{K_1}^{K_2} v e^{vu} P(Y > u)\, du + e^{v K_1} P(Y \ge K_1) - (e^{v K_2} - 1) P(Y > K_2). \tag{4.5}$$

Proof. First note that the random variable in the expectation on the left-hand side of (4.5) takes the value $1$ if $Y > K_2$. The expectation can be written as
$$E\left[ \exp\left( v Y \mathbb{1}_{\{Y \le K_2\}} \right) \mathbb{1}_{\{Y \ge K_1\}} \right] = E\left[ e^{vY} \mathbb{1}_{\{K_1 \le Y \le K_2\}} \right] + P(Y > K_2). \tag{4.6}$$
Now we will work on the integral on the right-hand side of (4.5). First, by Fubini's theorem we have
$$\int_{K_1}^{K_2} v e^{vu} P(Y > u)\, du = E\left[ \int_{K_1}^{K_2} v e^{vu} \mathbb{1}_{\{Y > u\}}\, du \right] = E\left[ \int_{K_1}^{K_2 \wedge Y} v e^{vu}\, du\; \mathbb{1}_{\{Y \ge K_1\}} \right].$$
By calculating the integral, it follows that
$$\int_{K_1}^{K_2} v e^{vu} P(Y > u)\, du = E\left[ \left( e^{v (K_2 \wedge Y)} - e^{v K_1} \right) \mathbb{1}_{\{Y \ge K_1\}} \right] = E\left[ e^{vY} \mathbb{1}_{\{K_1 \le Y \le K_2\}} \right] + E\left[ e^{v K_2} \mathbb{1}_{\{Y > K_2\}} \right] - E\left[ e^{v K_1} \mathbb{1}_{\{Y \ge K_1\}} \right].$$
The result follows by (4.6).
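As a sanity check on the identity (4.5), both sides can be evaluated numerically for a concrete distribution. The sketch below assumes, purely for illustration, that $Y$ has tail $P(Y > u) = (1+u)^{-\alpha}$ with density $\alpha (1+u)^{-\alpha-1}$, so that $P(Y \ge K_1) = P(Y > K_1)$; the helper names and parameter values are hypothetical, and the integrals are approximated with a composite Simpson rule.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def lemma_4_4_sides(v, k1, k2, alpha):
    """Return (left-hand side, right-hand side) of (4.5) for the assumed tail."""

    def tail(u):
        # Assumed example: P(Y > u) = (1 + u)^(-alpha).
        return (1.0 + u) ** (-alpha)

    def density(u):
        return alpha * (1.0 + u) ** (-alpha - 1.0)

    # Left-hand side: E[e^{vY}; K1 <= Y <= K2] + P(Y > K2), as in (4.6).
    lhs = simpson(lambda u: math.exp(v * u) * density(u), k1, k2) + tail(k2)
    # Right-hand side of (4.5).
    rhs = (
        simpson(lambda u: v * math.exp(v * u) * tail(u), k1, k2)
        + math.exp(v * k1) * tail(k1)
        - (math.exp(v * k2) - 1.0) * tail(k2)
    )
    return lhs, rhs


if __name__ == "__main__":
    left, right = lemma_4_4_sides(v=0.5, k1=1.0, k2=5.0, alpha=1.5)
    print(left, right)
```

For a continuous distribution the agreement of the two sides is just integration by parts; the lemma itself also covers distributions with atoms, which this check does not exercise.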

Proof of Lemma 4.3.

Let ˜ X := X { X ≤ rx N } and ˜ X j := X j { X j ≤ rx N } for all j ∈ N . Take N suﬃ-ciently large that (cid:96) N ≤ N . Then by Markov’s inequality and since ˜ X , ˜ X , . . . are i.i.d. with ˜ X d = ˜ X , for c > , P  m(cid:96) N (cid:88) j =1 ˜ X j ≥ x N  = P  exp  c(cid:96) N x − N m(cid:96) N (cid:88) j =1 ˜ X j  ≥ e c(cid:96) N  ≤ e − c(cid:96) N E (cid:104) e c(cid:96) N x − N ˜ X (cid:105) m(cid:96) N ≤ N − c log 2 + m log 2 log E (cid:20) e c(cid:96)N x − N ˜ X (cid:21) , (4.7)since log N ≤ (cid:96) N ≤ N . We will show that with an appropriate choice of c > , for N suﬃciently large, the right-hand side of (4.7) is smaller than N − q . First we require c > q log 2 . (4.8)Second, we will have another condition on c which ensures that E [ e c(cid:96) N x − N ˜ X ] ≤ O ( N − (cid:15) ) as N → ∞ for some (cid:15) > . We now estimate this expectation and determine the choice of c .Take < (cid:15) < λ (1 ∧ α )2( λ +1) , and take B = B ( (cid:15) ) > and C = C ( (cid:15) ) > as in Lemma 4.2. Suppose N is suﬃciently large that rx N > B . We apply Lemma 4.4 with Y = X , v = c(cid:96) N x − N , K = B and K = rx N , and then use (1.3), to obtain E (cid:104) e c(cid:96) N x − N ˜ X (cid:105) ≤ E (cid:20) e c(cid:96) N x − N X { X ≤ rxN } { X ≥ B } (cid:21) + e Bc(cid:96) N x − N P ( X < B ) ≤ (cid:90) rx N B c(cid:96) N x − N e c(cid:96) N x − N u h ( u ) − du + e Bc(cid:96) N x − N . (4.9)37e will choose c such that the ﬁrst term on the right-hand side of (4.9) is close to zero. ByLemma 4.2, and then since r < , we have (cid:90) rx N B c(cid:96) N x − N e c(cid:96) N x − N u h ( u ) − du ≤ (cid:90) rx N B C c(cid:96) N x − N e c(cid:96) N x − N u u − α + (cid:15) du ≤ C c(cid:96) N x − N (cid:90) rx N B e c(cid:96) N x − N ( rx N ) x (cid:15)N u − α du. 
Integrating the right-hand side, since we took N suﬃciently large that (cid:96) N ≤ N , we conclude (cid:90) rx N B c(cid:96) N x − N e c(cid:96) N x − N u h ( u ) − du ≤  C c − α (cid:96) N N cr log 2 (cid:0) r − α x (cid:15) − αN − B − α x (cid:15) − N (cid:1) , if α (cid:54) = 1 ,C c(cid:96) N x (cid:15) − N N cr log 2 log x N , if α = 1 , (4.10)where in the α = 1 case we use that B > and that r < .Now, since x N > N λ and (cid:15) < ∧ α , the right-hand side of (4.10) is at most of order N − (cid:15) if cr log 2 + λ ( (cid:15) − (1 ∧ α )) < − (cid:15). (4.11)Since r < λ (1 ∧ α )8 q by the assumptions of the lemma, we can ﬁnd c such that q log 2 < c < λ (1 ∧ α ) log 24 r . Then since we chose (cid:15) < λ (1 ∧ α )2( λ +1) , c satisﬁes (4.8) and (4.11). Note furthermore that since x N > N λ ,the second term on the right-hand side of (4.9) is close to for N large; for N suﬃciently large wehave e Bc(cid:96) N x − N ≤ e Bc(cid:96) N N − λ ≤ Bc(cid:96) N N − λ . (4.12)Hence, (4.9), (4.10) and the choice of c , and (4.12) with the fact that (cid:15) < λ show that thereexists a constant A > such that E (cid:104) e c(cid:96) N x − N ˜ X (cid:105) ≤ AN − (cid:15) for N suﬃciently large and x N > N λ . Therefore, by (4.7) and (4.8) we have P  m(cid:96) N (cid:88) j =1 ˜ X j ≥ x N  ≤ N − q + m log 2 log(1+ AN − (cid:15) ) ≤ N − q + m log 2 AN − (cid:15) < N − q , for N suﬃciently large, which concludes the proof.We now apply Lemma 4.3 to the N -BRW, to give us a convenient form of the result which wewill use later in this section and also in Section 5. Corollary 4.5.

Let λ > and < r < ∧ λ (1 ∧ α )48 . Then there exists C > such that for N suﬃciently large, if x N > N λ , P (cid:32) ∃ ( k , s ) ∈ [ N ] × (cid:74) t , t − (cid:75) , s ∈ (cid:74) s + 1 , t (cid:75) and k ∈ N k ,s ( s ) : (cid:80) ( i,b,s ) ∈ P k ,s k ,s X i,b,s { X i,b,s ≤ rx N } ≥ x N (cid:33) ≤ CN − , where P k ,s k ,s and N k ,s ( s ) are deﬁned in (2.10) and (2.12) respectively. roof. Take ( k , s ) , ( k , s ) ∈ [ N ] × (cid:74) t , t − (cid:75) with ( k , s ) (cid:46) ( k , s ) , and let k (cid:48) = ζ k ,s ( t ) be theindex of the time- t ancestor of ( k , s ) (see (2.9) for the notation). If the path between particles ( k , s ) and ( k , s ) moves at least x N even with discarding jumps greater than rx N , then the pathbetween ( k (cid:48) , t ) and ( k , s ) does the same, because all jumps are non-negative. Therefore we onlyneed to consider paths starting with the N particles of the population at time t : P (cid:32) ∃ ( k , s ) ∈ [ N ] × (cid:74) t , t − (cid:75) , s ∈ (cid:74) s + 1 , t (cid:75) and k ∈ N k ,s ( s ) : (cid:80) ( i,b,s ) ∈ P k ,s k ,s X i,b,s { X i,b,s ≤ rx N } ≥ x N (cid:33) ≤ P (cid:32) ∃ k (cid:48) ∈ [ N ] , s ∈ (cid:74) t + 1 , t (cid:75) and k ∈ N k (cid:48) ,t ( s ) : (cid:80) ( i,b,s ) ∈ P k ,s k (cid:48) ,t X i,b,s { X i,b,s ≤ rx N } ≥ x N (cid:33) . (4.13)Now consider the N -BRW constructed from N independent BRWs (see Section 4.1). Assume that k (cid:48) ∈ [ N ] , s ∈ (cid:74) t + 1 , t (cid:75) and k ∈ N k (cid:48) ,t ( s ) are such that (cid:88) ( i,b,s ) ∈ P k ,s k (cid:48) ,t X i,b,s { X i,b,s ≤ rx N } ≥ x N . Then by Lemma 4.1 there exists a path in one of the N independent BRWs that contains the samejumps as the path P k ,s k (cid:48) ,t . Thus Lemma 4.1 implies that there exist ( j, u ) ∈ H t and ( j, v ) ∈ H s such that u ≺ v and (cid:88) u ≺ w (cid:22) v Y j,w { Y j,w ≤ rx N } ≥ x N . 
That is, there is a path in the N independent BRWs between times t and s which moves at least x N even with discarding jumps of size greater than rx N . This means that there must be a path withthe same property between times t and t as well, because all jumps are non-negative. Therefore P (cid:32) ∃ k (cid:48) ∈ [ N ] , s ∈ (cid:74) t + 1 , t (cid:75) and k ∈ N k (cid:48) ,t ( s ) : (cid:80) ( i,b,s ) ∈ P k ,s k (cid:48) ,t X i,b,s { X i,b,s ≤ rx N } ≥ x N (cid:33) ≤ P (cid:18) ∃ ( j, u ) ∈ H t and v ∈ { , } t with v (cid:31) u : (cid:88) u ≺ w (cid:22) v Y j,w { Y j,w ≤ rx N } ≥ x N (cid:19) . (4.14)Let X i , i = 1 , , . . . be i.i.d. with distribution given by (1.3), and take λ > , x N > N λ , and < r < ∧ λ (1 ∧ α )48 . Note that the random variables Y j,w are all distributed as the X i randomvariables, and that there are (cid:96) N terms in the sum on the right-hand side of (4.14). We will give aunion bound for the probability of the event on the right-hand side of (4.14), using that H t is a setof N elements and that a particle in the set H t has (cid:96) N descendants in a BRW (without selection)at time t , which means (cid:96) N possible labels for v for each ( j, u ) ∈ H t . Then by (4.13), (4.14) andby conditioning on F (cid:48) t and using a union bound, P (cid:32) ∃ ( k , s ) ∈ [ N ] × (cid:74) t , t − (cid:75) , s ∈ (cid:74) s + 1 , t (cid:75) and k ∈ N k ,s ( s ) : (cid:80) ( i,b,s ) ∈ P k ,s k ,s X i,b,s { X i,b,s ≤ rx N } ≥ x N (cid:33) ≤ N (cid:96) N P (cid:32) (cid:96) N (cid:88) j =1 X j { X j ≤ rx N } ≥ x N (cid:33) . (4.15)39hen by Lemma 4.3 with m = 4 and q = 6 , we have that for N suﬃciently large, P (cid:32) (cid:96) N (cid:88) j =1 X j { X j ≤ rx N } ≥ x N (cid:33) ≤ N − . The result follows by (4.15). h In order to bound the probabilities of the events C to C and D to D , we will need to use severalproperties of the function h from (1.3). 
Recall that h is regularly varying with index α > , andthat it determines the jump distribution of the N -BRW in the sense that for each jump ( i, b, s ) , P ( X i,b,s > x ) = h ( x ) − ∀ x ≥ . (4.16)Recall that a N = h − (2 N (cid:96) N ) , and note that a N → ∞ as N → ∞ . Indeed, by the deﬁnition of h − in (1.6), a N is non-decreasing, and since h is non-decreasing by (1.3), a N cannot converge to a ﬁnitelimit a ∈ R , because this would imply h ( a + 1) ≥ N (cid:96) N ∀ N . Moreover, letting C = C ( α ) as inLemma 4.2, for N suﬃciently large that a N + 1 ≥ B = B ( α ) , N (cid:96) N < h ( a N + 1) ≤ C ( a N + 1) α , (4.17)where in the ﬁrst inequality we use the deﬁnition (1.6) of h − and that h is non-decreasing, and thesecond inequality follows by the second inequality of Lemma 4.2.Since h is regularly varying with index α , we have N (cid:96) N h ( a N ) → as N → ∞ . (4.18)Indeed, since h is non-decreasing, for any ε ∈ (0 , , by (1.2) and by the deﬁnition of a N we have (1 − ε ) α − ε ≤ h ( a N (1 − ε )) h ( a N ) ≤ N (cid:96) N h ( a N ) ≤ h ( a N (1 + ε )) h ( a N ) ≤ (1 + ε ) α + ε, for N suﬃciently large. Often in our proofs it will be enough to use that (4.18) implies < N (cid:96) N h ( a N ) < , (4.19)for N suﬃciently large.For convenience we state a few other simple properties of h , which we will apply several times.Let r ∈ (0 , and η < / . First, we have h ( ra N ) < h ( a N ) ( r − α + η ) < h ( a N ) 2 r − α , (4.20)for N suﬃciently large, by (1.2). Second, for N suﬃciently large, we also have N (cid:96) N h ( ra N ) < N (cid:96) N h ( a N ) ( r − α + η ) < (1 + η )( r − α + η ) < r − α , (4.21)by (4.20) and (4.18). Furthermore, by the same argument as for (4.21), for N suﬃciently large, N (cid:96) N h ( ra N ) > r − α . (4.22)40 .4 Probabilities and proof of Proposition 2.6 Next we will go through the events ( C j ) j =2 and ( D i ) i =1 , which we deﬁned in Section 3, one byone. 
We will prove upper bounds on the probabilities of their complement events, which will thenallow us to prove Proposition 2.6. Recall that the events ( C j ) j =2 and ( D i ) i =1 all depend on theconstants η, K, γ, δ, ρ, c . . . , c introduced in (3.2)-(3.5), and Propositions 3.2 and 3.11 hold whenthe constants satisfy the conditions (3.2)-(3.5). In order to show that the events in question occurwith high probability, the constants need to satisfy some extra conditions which are consistentwith (3.2)-(3.5). We now specify these choices.Recall that α > . First we assume that η ∈ (0 , is very small; in particular, that it is smallenough to satisfy η < min (cid:32)(cid:18) α +2 log (cid:18) η (cid:19)(cid:19) − /α , η · α (cid:33) . (4.23)Then we choose the remaining constants as follows:(a) c := η ,(b) c := η ∨ α ) ,(c) c := c / (1 ∧ α )5 ,(d) take c > small enough to satisfy c < c ∨ α )4 and (1 − c /c ) α ≥ − αc /c ,(e) take c > small enough to satisfy c < c ∨ α )3 and (1 − c /c ) α ≥ − αc /c ,(f) c := c ,(g) ρ := c (1 ∧ α ) / (100 α ) ,(h) δ := ρ α +1 ,(i) γ := δ/ ,(j) K := ρ − α − .Note that the constants with the choices above can be thought of as in (3.1). We state a few simpleconsequences of these choices, which will be useful in proving upper bounds on the probabilities ofthe complement events of C to C and D to D . First, by (4.23), we have η < · α < , (4.24)and note that all constants γ, δ, ρ, c . . . , c and /K are at most η . Thus, from (a)-(f) and (4.24),for j = 1 , . . . , , we have c j ≤ c j +1 ≤ c j +1 η < c j +1 · α , (4.25)which also means c j < η (4.26)for j = 1 , . . . , . In particular we will need that c c < (1 ∨ α ) (4.27)41nd c c < (1 ∨ α ) , (4.28)which both follow by (4.25) and by the fact that α ≥ e α ≥ ∨ α for α > . We will also use thatfrom (e) we have c − α − c < c − ∨ α )+4(1 ∨ α )3 ≤ c < c · α c α < η α α , (4.29)where we applied (4.25) and that α ≥ α , and then that c < η . Then similarly, from (d) we have c − α − c < η α α . 
(4.30)Finally, from (g) and (4.26) we have ρ < c < η . (4.31)Considering the choices (a)-(j) together with the consequences (4.24) and (4.25), and noticingthat (g) implies ρ ≤ c / , we conclude that the constants η, K, γ, δ, ρ, c . . . , c satisfy (3.2)-(3.5),so we will be able to apply Propositions 3.2 and 3.11 with this choice of constants.We can now show that the events C to C and D to D occur with high probability. Lemma 4.6.

Suppose the constants η , K , γ , δ , ρ , c , . . . , c > satisfy (4.23) and (a) - (j) . Thenfor N suﬃciently large and t > (cid:96) N , P ( C cj ) < η and P ( D ci ) < η for all j ∈ { , . . . , } and i ∈ { , . . . , } , where the events ( C j ) j =2 and ( D i ) i =1 are deﬁned in (3.12) - (3.17) and (3.47) - (3.52) respectively.Proof. Assume that η > satisﬁes (4.23). We consider the events ( C j ) j =2 and ( D i ) i =1 with the con-stants K , γ , δ , ρ , c , . . . , c , and we assume that these constants satisfy (a)-(j). We will upper boundthe probabilities of the events ( C cj ) j =2 and ( D ci ) i =1 using (4.25)-(4.31) above, and the properties ofthe regularly varying function h described in Section 4.3. The event C c (see (3.12)) says that there is a time s ∈ [ t , t − when a particle at distanceat least c a N behind the leader jumps to within distance c a N of the leader’s position. We useMarkov’s inequality, and sum over all the jumps happening between times t and t − to bound theprobability of this event. We have P ( C c ) ≤ E (cid:20) (cid:26) ( i, b, s ) ∈ [ N ] × { , } × (cid:74) t , t − (cid:75) such that Z i ( s ) ≥ c a N and X i,b,s ∈ ( Z i ( s ) − c a N , Z i ( s ) + 2 c a N ] (cid:27)(cid:21) = (cid:88) ( i,b,s ) ∈ [ N ] ×{ , }× (cid:74) t ,t − (cid:75) E (cid:104) { Z i ( s ) ≥ c a N } { X i,b,s ∈ ( Z i ( s ) − c a N ,Z i ( s )+2 c a N ] } (cid:105) . Recall from Section 2.4 that for s ∈ N and i ∈ [ N ] , the distance Z i ( s ) of the i th particle from theleader is F s -measurable, but the jumps performed at time s , X i, ,s and X i, ,s , are independent of F s . 
Hence by (4.16), P ( C c ) ≤ (cid:88) ( i,b,s ) ∈ [ N ] ×{ , }× (cid:74) t ,t − (cid:75) E (cid:104) E (cid:104) { Z i ( s ) ≥ c a N } { X i,b,s ∈ ( Z i ( s ) − c a N ,Z i ( s )+2 c a N ] } (cid:12)(cid:12)(cid:12) F s (cid:105)(cid:105) = (cid:88) ( i,b,s ) ∈ [ N ] ×{ , }× (cid:74) t ,t − (cid:75) E (cid:2) { Z i ( s ) ≥ c a N } (cid:0) h ( Z i ( s ) − c a N ) − − h ( Z i ( s ) + 2 c a N ) − (cid:1)(cid:3) . (4.32)42ince h is monotone non-decreasing, for any z ≥ c a N we have h ( z − c a N ) − − h ( z + 2 c a N ) − ≤ h (( c − c ) a N ) − (cid:18) − h ( z − c a N ) h ( z + 2 c a N ) (cid:19) . (4.33)Take (cid:15) > . For the fraction on the right-hand side of (4.33) we have that for N suﬃciently large,for z ≥ c a N , ≥ h ( z − c a N ) h ( z + 2 c a N ) ≥ h (cid:16) ( z + 2 c a N ) · ( c − c ) a N ( c +2 c ) a N (cid:17) h ( z + 2 c a N ) ≥ (cid:18) − c c + 2 c (cid:19) α − (cid:15) ≥ − α c c − (cid:15), (4.34)where we ﬁrst use the monotonicity of h , and in the second inequality we use that z ≥ c a N , thatthe function y (cid:55)→ ( y − c a N ) / ( y + 2 c a N ) is increasing in y , and we again use the monotonicityof h . The third inequality follows by (1.2), and the fourth holds by the deﬁnition of c in (e). Then,by (4.20) and the lower bound in (4.34) with (cid:15) = η ( c − c ) α , we see from (4.33) that for N suﬃciently large, for z ≥ c a N , h ( z − c a N ) − − h ( z + 2 c a N ) − ≤ c − c ) − α h ( a N ) − (cid:18) α c c + η ( c − c ) α (cid:19) ≤ h ( a N ) − (16 α α c − α − c + 2 η ) , (4.35)where for the ﬁrst term of the second inequality we used the fact that ( c − c ) − α < ( c / − α ,because c < c / by (4.25).Now let us return to (4.32) and notice that we sum over N (cid:96) N jumps. Therefore, by (4.35) weconclude that for N suﬃciently large, P ( C c ) ≤ N (cid:96) N h ( a N ) (16 α α c − α − c + 2 η ) ≤ α α c − α − c + 2 η ) < η < η , where we used (4.19) in the second inequality, (4.29) in the third, and (4.24) in the fourth. 
The event C c (see (3.13)) says that there exists a big jump in the time interval [ t , t − such thata descendant also performs a big jump during the time interval [ t , t − , within time (cid:96) N of the ﬁrstbig jump.Consider the N -BRW constructed from N independent BRWs (see Section 4.1). If C c occursthen there must be two big jumps in the N -BRW as above; that is, we must have ( i , s ) (cid:46) b ( i , s ) with s ∈ (cid:74) t , t − (cid:75) and s ∈ (cid:74) s + 1 , min { s + (cid:96) N , t − } (cid:75) , and ( i , b , s ) , ( i , b , s ) ∈ B N , where B N is the set of big jumps deﬁned in (2.15). Then by Lemma 4.1 there are two big jumps with thesame properties in the N independent BRWs; that is, there exist j ∈ [ N ] , u , u ∈ U such that ( j, u ) ∈ H s , ( j, u ) ∈ H s , u b (cid:22) u , X i ,b ,s = Y j,u b and X i ,b ,s = Y j,u b . Therefore, since s ∈ (cid:74) s + 1 , min { s + (cid:96) N , t − } (cid:75) ⊆ (cid:74) s + 1 , s + (cid:96) N (cid:75) and H s ⊆ [ N ] × { , } s , we have P ( C c ) ≤ P  ∃ s ∈ (cid:74) t , t − (cid:75) , ( j, u ) ∈ H s , b ∈ { , } and s ∈ (cid:74) s + 1 , s + (cid:96) N (cid:75) , u ∈ { , } s , u (cid:31) u , b ∈ { , } : Y j,u b > ρa N and Y j,u b > ρa N  . (4.36)Recall the deﬁnition of F (cid:48) n in Section 4.1. By a union bound over the possible values of s , s , b and b , and then conditioning on F (cid:48) s and applying another union bound over the possible values of ( j, u ) and u , P ( C c ) ≤ (cid:88) s ∈ (cid:74) t ,t − (cid:75) ,s ∈ (cid:74) s +1 ,s + (cid:96) N (cid:75) ,b ,b ∈{ , } E  (cid:88) ( j,u ) ∈ H s ,u ∈{ , } s ,u (cid:31) u P (cid:0) Y j,u b > ρa N , Y j,u b > ρa N (cid:12)(cid:12) F (cid:48) s (cid:1) . ( Y j,u ) j ∈ [ N ] ,u ∈∪ m>s { , } m are independent of F (cid:48) s , for ( j, u ) ∈ H s and u ∈ { , } s wehave P (cid:0) Y j,u b > ρa N , Y j,u b > ρa N (cid:12)(cid:12) F (cid:48) s (cid:1) = h ( ρa N ) − . 
Hence by summing over the (cid:96) N − possible values for s , and the two possible values for b and b , and since | H s | = N , and for u ∈ { , } s there are s − s possible values of u ∈ { , } s with u (cid:31) u , for N suﬃciently large we have P ( C c ) ≤ (cid:96) N · (cid:88) s ∈ (cid:74) s +1 ,s + (cid:96) N (cid:75) N s − s h ( ρa N ) − ≤ N (cid:96) N · · log N +1 h ( ρa N ) − = (cid:18) N (cid:96) N h ( ρa N ) (cid:19) (cid:96) − N ≤ ρ − α · (cid:96) − N < η , (4.37)where in the third inequality we used (4.21). The event C c (see (3.14)) can be bounded using Corollary 4.5. We apply the corollary with x N = c a N , r = ρ/c and λ = 1 / (2 α ) . We can make this choice for λ , because we have c a N > N / (2 α ) (4.38)for all N suﬃciently large by (4.17). By our choice of ρ in (g), we have r < ∧ λ (1 ∧ α )48 , and soCorollary 4.5 tells us that for some constant C > , for N suﬃciently large, P ( C c ) ≤ CN − < η . (4.39) The event C c (see (3.15)) says that two big jumps occur at the same time, that is C c = {∃ s ∈ (cid:74) t , t − (cid:75) , ( k , b ) (cid:54) = ( k , b ) ∈ [ N ] × { , } : X k ,b ,s > ρa N and X k ,b ,s > ρa N } . By a union bound over the (cid:96) N time steps and the possible pairs of jumps at each time step, P ( C c ) ≤ (cid:96) N (cid:18) N (cid:19) h ( ρa N ) − ≤ (cid:18) N (cid:96) N h ( ρa N ) (cid:19) (cid:96) − N ≤ ρ − α · (cid:96) − N < η (4.40)for N suﬃciently large, where the third inequality follows by (4.21). The event C c (see (3.16)) says that a big jump happens in (at least) one of two very short timeintervals, [ t , t + (cid:100) δ(cid:96) N (cid:101) ] and [ t − (cid:100) δ(cid:96) N (cid:101) , t + (cid:100) δ(cid:96) N (cid:101) ] . In total there are N · (3 (cid:100) δ(cid:96) N (cid:101) + 2) jumpsperformed during these two time intervals. 
By a union bound over these jumps, we get P ( C c ) = P ( ∃ ( i, b, s ) ∈ [ N ] × { , } × ( (cid:74) t , t + (cid:100) δ(cid:96) N (cid:101) (cid:75) ∪ (cid:74) t − (cid:100) δ(cid:96) N (cid:101) , t + (cid:100) δ(cid:96) N (cid:101) (cid:75) ) : X i,b,s > ρa N ) ≤ N (3 δ(cid:96) N + 5) h ( ρa N ) − ≤ δρ − α (1 + 2 δ − (cid:96) − N ) < η , (4.41)for N suﬃciently large, where in the second inequality we used (4.21), and the last inequality followsby the choice of δ in (h) and by (4.31). The event C gives an upper bound on the number of big jumps (see (3.17)). There are N (cid:96) N jumps performed in the time interval [ t , t − ; by Markov’s inequality and then by (4.21), we have P ( C c ) = P ( { ( i, b, s ) ∈ [ N ] × { , } × (cid:74) t , t − (cid:75) : X i,b,s > ρa N } > K ) ≤ N (cid:96) N h ( ρa N ) − K ≤ K ρ − α < η (4.42)44or N suﬃciently large, where the last inequality follows by the choice of K in (j) and by (4.31). The event D (see (3.47)) has the same deﬁnition as that of C (see (3.12)), except with diﬀerentconstants. By the same argument as for (4.35), using the deﬁnition of c in (d), for N suﬃcientlylarge we have h ( z − c a N ) − − h ( z + 3 c a N ) − ≤ h ( a N ) − (24 α · α c − α − c + 2 η ) ∀ z ≥ c a N . (4.43)Then continuing in the same way as after (4.35) we obtain P ( D c ) ≤ α α c − α − c + 2 η ) < η < η , (4.44)for N suﬃciently large, by (4.30) and (4.24). The event D in (3.48) says that in every interval of length c (cid:96) N in [ t , t ] there is a particle whichperforms a jump of size greater than c a N . We introduce a slightly diﬀerent event to show that D happens with high probability. Let us divide the interval [ t , t ] into subintervals of length c (cid:96) N ,to get (cid:6) c − (cid:7) subintervals (the last subinterval may end after time t ). If a jump of size greaterthan c a N happens in each of these subintervals then D occurs. We describe this formally by thefollowing event: ˜ D :=  ∀ m ∈ (cid:8) , , . . . 
, (cid:6) c − (cid:7)(cid:9) , ∃ ( k, b, s ) ∈ [ N ] × { , } × (cid:74) t + ( m − c (cid:96) N , t + m c (cid:96) N (cid:75) : X k,b,s > c a N  ; as mentioned above, if ˜ D occurs then D occurs. The complement event of ˜ D is that there isa subinterval in which every jump made by a particle has size at most c a N . Note that in eachsubinterval (cid:74) t + ( m − c (cid:96) N , t + m c (cid:96) N (cid:75) , there are at least N · c (cid:96) N jumps. Therefore, by aunion bound, we have P ( D c ) ≤ P ( ˜ D c ) ≤ (cid:6) c − (cid:7) (cid:18) − h (2 c a N ) (cid:19) c (cid:96) N N ≤ (4 c − + 1) exp (cid:18) − c N (cid:96) N h (2 c a N ) (cid:19) ≤ c − exp (cid:18) − c (2 c ) − α (cid:19) , (4.45)where in the third inequality we use that − x ≤ e − x for x ≥ , and the fourth inequality followsby (4.22) for N suﬃciently large and since c < . Now note that by (c), c c − α = c − α/ (1 ∧ α )5 ≥ c − > α log (cid:16) c η (cid:17) , where the last inequality holds because c − > α by (4.25), < log x < x for x > , and c − > η by (4.26). Substituting this into (4.45) shows that P ( D c ) < η/ . The event D c deﬁned in (3.49) says that every jump in the time interval [ t , t + (cid:100) (cid:96) N / (cid:101) ] has sizeat most c a N . There are at least N (cid:96) N jumps in this time interval, and so for N suﬃciently large,since e − x ≥ − x for x ≥ , and then by (4.22), P ( D c ) ≤ (cid:18) − h (2 c a N ) (cid:19) N(cid:96) N ≤ exp (cid:18) − N (cid:96) N h (2 c a N ) (cid:19) ≤ exp (cid:18) − (2 c ) − α (cid:19) . (4.46)45ow (a) and (4.23) tell us that c − α = η − α > α +2 log( η ) , and substituting this into (4.46) showsthat P ( D c ) < η/ . The event D c (see (3.50)) says that in the time interval [ t − (cid:100) c (cid:96) N (cid:101) , t ] , a particle performs ajump of size greater than c a N (recall from (a) and (b) that c (cid:28) c ). 
Since there are at most N ( (cid:100) c (cid:96) N (cid:101) + 1) ≤ N ( c (cid:96) N + 2) jumps in the time interval [ t − (cid:100) c (cid:96) N (cid:101) , t ] , by a union bound, P ( D c ) = P ( ∃ ( i, b, s ) ∈ [ N ] × { , } × (cid:74) t − (cid:100) c (cid:96) N (cid:101) , t (cid:75) : X i,b,s > c a N ) ≤ N ( c (cid:96) N + 2) h ( c a N ) ≤ c c − α (1 + 2 c − (cid:96) − N ) ≤ η ∨ α ) η − α < η , (4.47)for N suﬃciently large, where in the second inequality we use (4.21), the third inequality holds bythe choices in (b) and (a) for N suﬃciently large, and the fourth follows by (4.24). The event D c (see (3.52)) says that in a short time interval after time τ (deﬁned in (3.51)) ajump is performed whose size falls into a small interval, (2 c a N , (2 c + 3 c ) a N ] . We can see from thedeﬁnition of τ as the ﬁrst time after t when the diameter is at most c a N , that τ is a stoppingtime. Therefore we can condition on F τ , and apply the strong Markov property. By Markov’sinequality we have P ( D c ) = P ( ∃ ( k, b, s ) ∈ [ N ] × { , } × (cid:74) τ , τ + c (cid:96) N (cid:75) : X k,b,s ∈ (2 c a N , (2 c + 3 c ) a N ]) ≤ E [ E [ { ( k, b, s ) ∈ [ N ] × { , } × (cid:74) τ , τ + c (cid:96) N (cid:75) : X k,b,s ∈ (2 c a N , (2 c + 3 c ) a N ] } |F τ ]] . Note that if τ < ∞ then during the time interval [ τ , τ + c (cid:96) N ] there are at most N ( c (cid:96) N + 1) jumps; it follows that P ( D c ) ≤ E (cid:34) (cid:88) ( k,b,s ) ∈ [ N ] ×{ , }× (cid:74) τ ,τ + c (cid:96) N (cid:75) P ( X k,b,s ∈ (2 c a N , (2 c + 3 c ) a N ] | F τ ) { τ < ∞} (cid:35) ≤ N ( c (cid:96) N + 1) (cid:0) h (2 c a N ) − − h ((2 c + 3 c ) a N ) − (cid:1) (4.48)by the strong Markov property. Now we can use the monotonicity of h and then the upperbound (4.43) to get h (2 c a N ) − − h ((2 c + 3 c ) a N ) − ≤ h ((2 c − c ) a N ) − − h ((2 c + 3 c ) a N ) − ≤ h ( a N ) − (24 α · α c − α − c + 2 η ) (4.49)for N suﬃciently large. 
Therefore, by (4.48), (4.49), and (4.18), we have that for N suﬃcientlylarge, P ( D c ) ≤ (1 + c − (cid:96) − N ) c (1 + η )(24 α α c − α − c + 2 η ) < c · η < η , (4.50)where in the second inequality we use (4.30) and that (1+ c − (cid:96) − N )(1+ η ) < for N suﬃciently large,and the last inequality follows by (4.26) and (4.24). This concludes the proof of Lemma 4.6.We have seen in Lemma 4.6 above that with an appropriate choice of constants, the probabilitiesof the events C to C and D to D which imply A and A are close to 1. We can now use this toprove Proposition 2.6. Proof of Proposition 2.6.

Take $\eta \in (0,1]$. Without loss of generality, we can assume that $\eta$ is sufficiently small that it satisfies (4.23). Then choose $K$, $\gamma$, $\delta$, $\rho$, $c_1, \dots, c_6$ as in (a)-(j) (at the beginning of Section 4.4). Note that before stating Lemma 4.6 we checked that these constants also satisfy (3.2)-(3.5). Therefore by Proposition 3.2 and Proposition 3.11, for $N$ sufficiently large and $t > 4\ell_N$,

$$\bigcap_{j=2}^{7} C_j \cap \bigcap_{i=1}^{5} D_i \subseteq A_1 \cap A_2.$$

Therefore, for $N$ sufficiently large and $t > 4\ell_N$, by a union bound,

$$\mathbb{P}\big((A_1 \cap A_2)^c\big) \le \mathbb{P}\Big(\Big(\bigcap_{j=2}^{7} C_j \cap \bigcap_{i=1}^{5} D_i\Big)^c\Big) \le \sum_{j=2}^{7} \mathbb{P}(C_j^c) + \sum_{i=1}^{5} \mathbb{P}(D_i^c) < \eta$$

by Lemma 4.6, which completes the proof.

We will prove Proposition 2.7 in this section. So far we have proved Proposition 2.6, which says that with high probability the common ancestor of the majority of the population at time $t$ is particle $(N,T)$, where $T$ is given by (2.17); in particular, $T$ is between times $t_2$ and $t_1$. Now recall the notation introduced in (2.19)-(2.23). Proposition 2.7 says that for $\nu > 0$, with high probability, every particle in the set $\mathcal{N}_{N,T}(T + \varepsilon_N \ell_N)$ has at most $\nu N$ surviving descendants at time $t$, where we may assume that $(\varepsilon_N)_{N \in \mathbb{N}}$ satisfies

$$\varepsilon_N \ell_N \in \mathbb{N} \ \ \forall N \ge 1, \qquad \varepsilon_N \ell_N \to \infty \ \text{ as } N \to \infty \qquad \text{and} \qquad \varepsilon_N \le \frac{1}{4} \frac{\log \ell_N}{\ell_N} \ \ \forall N \ge 1. \tag{5.1}$$

The first two of these assumptions on $\varepsilon_N$ are from (2.2). The third can be made without loss of generality, because if $\varepsilon'_N > \varepsilon_N$, and every particle in $\mathcal{N}_{N,T}(T + \varepsilon_N \ell_N)$ has at most $\nu N$ surviving descendants at time $t$, then certainly every particle in $\mathcal{N}_{N,T}(T + \varepsilon'_N \ell_N)$ has at most $\nu N$ surviving descendants at time $t$.

Fix $\eta \in (0,1]$ sufficiently small that it satisfies (4.23). Then choose $K$, $\gamma$, $\delta$, $\rho$, $c_1, \dots, c_6$ as in (a)-(j). Then take $N$ sufficiently large that Proposition 2.6 and Lemma 4.6 hold for our chosen constants, and take $t > 4\ell_N$. Let $\nu > 0$ be fixed and let us write $A_3 := A_3(\nu)$ from now on.

Our strategy for showing Proposition 2.7 is to give a lower bound on the position of the leftmost particle at time $t$ that holds with high probability, and then to bound the number of time-$t$ descendants of each particle in $\mathcal{N}_{N,T}(T^{\varepsilon_N})$ which can reach that lower bound by time $t$. We will be able to control the number of such descendants because of Corollary 4.5. Assume that we know $\mathcal{X}_1(t) \ge \mathcal{X}_N(T) + \hat{a}_{T,N}$, where $\hat{a}_{T,N} > N^{\lambda}$ for some $\lambda > 0$, but $\hat{a}_{T,N} \ll a_N$. Then Corollary 4.5 implies that with high probability all surviving particles at time $t$ must have an ancestor which made a jump of size greater than $r \hat{a}_{T,N}$ for an appropriate choice of $r \in (0,1)$. So given a particle $i \in \mathcal{N}_{N,T}(T^{\varepsilon_N})$, we can find an upper bound for the number of its time-$t$ descendants with high probability, by considering the number of its descendants which made a jump of size greater than $r \hat{a}_{T,N}$ before time $t$. Thus, we should choose $\hat{a}_{T,N}$ such that we have $\mathcal{X}_1(t) \ge \mathcal{X}_N(T) + \hat{a}_{T,N}$ with high probability, and also such that we can get a good enough upper bound for each $D_i$ (see (2.21)) from Corollary 4.5 to conclude Proposition 2.7.

We now give a sketch argument to motivate our choice of lower bound on $\mathcal{X}_1(t)$. Assume that $T \in [t_2 + \lceil \delta \ell_N \rceil, t_1 - \lceil \delta \ell_N \rceil]$.
We also assume that the record set at time $T$ is not broken by a big jump before time $t_1 + \delta \ell_N$, and so almost all the descendants of particle $(N,T)$ survive between times $T$ and $T + \ell_N$. This all happens with high probability, as we saw in Section 4; in particular recall the event $C_6$ from (3.16). Set $\theta_{T,N} := (t_1 - T)/\ell_N$.

Note that if a descendant of particle $(N,T)$ makes a jump of size greater than $\hat{a}_{T,N}$ at time $T + k$ for some $k \in [(1-\delta)\ell_N, \ell_N]$, then it can have $2^{(1+\theta_{T,N})\ell_N - k}$ descendants at time $t$, and all of these descendants are to the right of $\mathcal{X}_N(T) + \hat{a}_{T,N}$. Also, there are approximately $2^k$ particles in the leading tribe descending from $(N,T)$ at time $T + k$. Therefore, we expect that jumps of size greater than $\hat{a}_{T,N}$, performed by the descendants of $(N,T)$ in the time interval $[T + (1-\delta)\ell_N, T + \ell_N]$, contribute to the number of particles to the right of $\mathcal{X}_N(T) + \hat{a}_{T,N}$ at time $t$ by roughly

$$\sum_{k \in \llbracket (1-\delta)\ell_N, \ell_N \rrbracket} 2^k \cdot \frac{2^{(1+\theta_{T,N})\ell_N - k}}{h(\hat{a}_{T,N})} \approx \frac{\delta \ell_N 2^{(1+\theta_{T,N})\ell_N}}{h(\hat{a}_{T,N})}.$$

If we want to make sure that all the $N$ particles are to the right of $\mathcal{X}_N(T) + \hat{a}_{T,N}$ at time $t$, then the above should be approximately $N$, and so, since $2^{\ell_N} \approx N$, $\hat{a}_{T,N}$ should be roughly $h^{-1}(\delta \ell_N N^{\theta_{T,N}})$.

There are several potential inaccuracies in this argument. For example, the descendants of a particle making a jump of size greater than $\hat{a}_{T,N}$ do not necessarily all survive until time $t$. We will use a reasoning similar to Lemma 2.4 to clarify this issue. Another problem might occur if a particle $(i, T + k)$ makes a jump of size greater than $\hat{a}_{T,N}$, and then at time $T + k + 1$, its offspring does the same.
In this case our sketch argument double counts the time-$t$ descendants of particle $(i, T + k)$. We will therefore make some adjustments in the rigorous proof to avoid double counting.

In Sections 5.2 to 5.5 below, we will make the sketch argument precise, then use Corollary 4.5 to see that with high probability, particles must have at least one jump greater than a certain size (roughly but not exactly $h^{-1}(\delta \ell_N N^{\theta_{T,N}})$) in their ancestry to survive until time $t$. Finally, for each particle $(i, T^{\varepsilon_N})$, we upper bound the number of particles at time $t$ which descend from particle $(i, T^{\varepsilon_N})$ and have a jump greater than this certain size in their ancestry between times $T^{\varepsilon_N}$ and $t$.

In the strategy above we suggested that $h^{-1}(\delta \ell_N N^{\theta_{T,N}})$ should be a good lower bound for $\mathcal{X}_1(t) - \mathcal{X}_N(T)$. A problem with this lower bound is that it depends on $T$, and conditioning on $T$ would change the distribution of the process, as $T$ is not a stopping time; see the definition in (2.17). Note however, that the first, second, ..., $n$th times after time $t_2$ at which a jump of size greater than $\rho a_N$ breaks the record between times $t_2$ and $t_1$, are stopping times, and $T$ is equal to one of these times with high probability. Furthermore, the number of such times is at most $K$ with high probability, by Lemma 4.6 and the definition of the event $C_7$. Therefore, we can define a finite set of stopping times in such a way that $T$ is in the set with high probability. Then we can prove a similar statement to Proposition 2.7 for each stopping time in the finite set with the strategy described in the previous section. This will be enough to prove Proposition 2.7.

Recall the definition of $S_N$ in (2.16).
Deﬁne a sequence of stopping times by setting T := t + (cid:100) δ(cid:96) N (cid:101) − , and T n := 1 + inf { S N ( ρ ) ∩ [ T n − , t − (cid:100) δ(cid:96) N (cid:101) − } , (5.2)for n ∈ N ; let T n := t if the intersection above is empty.For all n ∈ N , we introduce some new notation which will be frequently used in the course of theproof. First we let T ε N n := T n + ε N (cid:96) N . (5.3)The set and number of time- t descendants of the i th particle at time T ε N n will be denoted by N i,n := N i,T εNn ( t ) and D i,n := |N i,n | . (5.4)48e also introduce θ n,N := ( t − T n ) (cid:96) N ≥ . (5.5)Take < δ < δ/ and set ˆ a n,N := h − ( δ N θ n,N (cid:96) N ) , (5.6)where h − , deﬁned in (1.6), is the generalised inverse of h from (1.3). We explained the motivationfor this deﬁnition of ˆ a n,N in Section 5.1. By the same argument as for (4.18) (and since δ N θ n,N (cid:96) N ≥ δ (cid:96) N ) we have that for (cid:15) > , for N suﬃciently (deterministically) large, for each n ∈ N , δ N θ n,N (cid:96) N h (ˆ a n,N ) ∈ [1 − (cid:15), (cid:15) ] . (5.7)We note that ˆ a n,N is roughly N θ n,N /α ; in particular, if h ( x ) = x α for x ≥ then ˆ a n,N = ( δ N θ n,N (cid:96) N ) /α .Take < δ < δ . Throughout Section 5 we will use the term ‘medium jump’ for jumps of sizegreater than δ ˆ a n,N , as the relevant space scale in this section is ˆ a n,N . We denote the set of mediumjumps on a time interval [ s , s ] ⊆ [ t , t − by M [ s ,s ] n,N := { ( k, b, s ) ∈ [ N ] × { , } × (cid:74) s , s (cid:75) : X k,b,s > δ ˆ a n,N } , (5.8)and we let M n,N := M [ t ,t − n,N . (5.9)The stopping times ( T n ) n ∈ N allow us to give an upper bound on the probability of A c . Suppose | B [ t ,t ] N | ≤ K and T ∈ [ t + (cid:100) δ(cid:96) N (cid:101) , t − (cid:100) δ(cid:96) N (cid:101) ] . Then | S N ( ρ ) ∩ [ t , t ] | ≤ K by the deﬁnition of S N in (2.16), and so by the deﬁnition of T in (2.17) and the deﬁnition of T n in (5.2), it follows that T = T n for some n ∈ [ K ] . 
Hence, by the deﬁnition of A in (2.23) and then by a union bound, P ( A c ) = P (cid:18) max i ∈N N,T ( T εN ) D i > νN (cid:19) ≤ P (cid:18) ∃ n ∈ [ K ] : T n ≤ t − (cid:100) δ(cid:96) N (cid:101) and max i ∈N N,Tn ( T εNn ) D i,n > νN (cid:19) + P (cid:0) | B [ t ,t ] N | > K (cid:1) + P (cid:0) T / ∈ [ t + (cid:100) δ(cid:96) N (cid:101) , t − (cid:100) δ(cid:96) N (cid:101) ] (cid:1) . (5.10)By the deﬁnition of the event C in (3.17) and by Lemma 4.6, P ( | B [ t ,t ] N | > K ) ≤ P ( C c ) < η . Then by the deﬁnition of the event A in (2.22) and by Proposition 2.6, P ( T / ∈ [ t + (cid:100) δ(cid:96) N (cid:101) , t − (cid:100) δ(cid:96) N (cid:101) ]) ≤ P ( A c ) < η. Therefore, applying a union bound for the ﬁrst term on the right-hand side of (5.10), we obtain P ( A c ) ≤ E (cid:34) K (cid:88) n =1 { T n ≤ t −(cid:100) δ(cid:96) N (cid:101)} P (cid:18) max i ∈N N,Tn ( T εNn ) D i,n > νN (cid:12)(cid:12)(cid:12)(cid:12) F T n (cid:19)(cid:35) + 10011000 η. (5.11)From now on we aim to show that each term of the sum inside the expectation is small. For all n ∈ N , we let P T n denote the law of the N -BRW conditioned on F T n : P T n ( · ) := P ( · | F T n ) and E T n [ · ] := E [ · | F T n ] . (5.12)49 .3 Proof of Proposition 2.7 We now state the most important intermediate results in the proof of Proposition 2.7, and showthat they imply the result. We then prove these intermediate results in Sections 5.4 and 5.5.Our ﬁrst main intermediate result says that the probability that a particle in N N,T n ( T ε N n ) has adescendant at time t such that there is no medium jump on the path between the particle and thedescendant is small. We prove this result in Section 5.4. Lemma 5.1.

For all N suﬃciently large, t > (cid:96) N , and n ∈ N with T n < t , P T n (cid:16) ∃ i ∈ N N,T n ( T ε N n ) , k ∈ N i,n : P k,ti,T εNn ∩ M n,N = ∅ (cid:17) < η K , where T n , T ε N n and P T n are given by (5.2) , (5.3) and (5.12) , N N,T n ( T ε N n ) and N i,n are deﬁnedin (2.12) and (5.4) , P k,ti,T εNn in (2.10) , and M n,N in (5.9) . Our second intermediate result says that with high probability, for each i ∈ N N,T n ( T ε N n ) , therecannot be more than νN time- t descendants of particle ( i, T ε N n ) if each descendant has a mediumjump on their path. We prove this result in Section 5.5. Lemma 5.2.

For all N suﬃciently large, t > (cid:96) N , and n ∈ N with T n < t , P T n (cid:16) ∃ i ∈ N N,T n ( T ε N n ) : D i,n > νN and P k,ti,T εNn ∩ M n,N (cid:54) = ∅ ∀ k ∈ N i,n (cid:17) < η K , where T n , T ε N n and P T n are given by (5.2) , (5.3) and (5.12) , N N,T n ( T ε N n ) , N i,n and D i,n are deﬁnedin (2.12) and (5.4) , P k,ti,T εNn in (2.10) , and M n,N in (5.9) .Proof of Proposition 2.7. Suppose N is suﬃciently large that Lemmas 5.1 and 5.2 hold. Take n ∈ N and suppose T n < t (which also implies T n ≤ t − (cid:100) δ(cid:96) N (cid:101) by the deﬁnition (5.2) of T n ). Supposea particle in N N,T n ( T ε N n ) has more than νN surviving descendants at time t . Then either all thedescendants have an ancestor which performed a medium jump between times T ε N n and t , or thereis at least one particle which survives without a medium jump in its ancestry. Therefore we have P T n (cid:18) max i ∈N N,Tn ( T εNn ) D i,n > νN (cid:19) ≤ P T n (cid:16) ∃ i ∈ N N,T n ( T ε N n ) , k ∈ N i,n : P k,ti,T εNn ∩ M n,N = ∅ (cid:17) + P T n (cid:16) ∃ i ∈ N N,T n ( T ε N n ) : D i,n > νN and P k,ti,T εNn ∩ M n,N (cid:54) = ∅ ∀ k ∈ N i,n (cid:17) < η K (5.13)by Lemmas 5.1 and 5.2. Then by (5.11), it follows that P ( A c ) < K · η K + 10011000 η < η, which completes the proof. There are two key ideas in the proof. First we show that for a ﬁxed n ∈ N with T n < t , thewhole population is to the right of position X N ( T n ) + ˆ a n,N at time t , with high probability. Second,we prove that with high probability paths cannot reach position X N ( T n ) + ˆ a n,N without having amedium jump on the path. 50 emma 5.3. For all N suﬃciently large, t > (cid:96) N , and n ∈ N with T n < t , P T n ( X ( t ) < X N ( T n ) + ˆ a n,N ) < η K , where T n and ˆ a n,N are given by (5.2) and (5.6) respectively.Proof. Recall the deﬁnition of G x ( n ) in (2.7). 
Let G := G X N ( T n )+ˆ a n,N ( t ) ; then, to prove the state-ment of the lemma, we aim to show that for N suﬃciently large and t > (cid:96) N , P T n ( | G | < N ) < η K . (5.14)Recall the deﬁnition of δ > in (5.6); ﬁx δ (cid:48) ∈ (8 δ , δ ) and then take δ ∈ (8 δ , δ (cid:48) ) such that δ (cid:96) N is an integer (this is possible for N suﬃciently large). Let S k := T n + (cid:96) N − k for k ∈ (cid:74) , δ (cid:96) N (cid:75) .Then for each k ∈ (cid:74) , δ (cid:96) N (cid:75) , at time S k there are at least (cid:96) N − k particles to the right of (or at)position X N ( T n ) , by Lemma 2.4. These particles are either in the interval [ X N ( T n ) , X N ( T n ) + ˆ a n,N ) or to the right of this interval. Let us denote the set of particles in [ X N ( T n ) , X N ( T n ) + ˆ a n,N ) at time S k by A k , i.e. for k ∈ (cid:74) , δ (cid:96) N (cid:75) let A k := { i ∈ [ N ] : X i ( S k ) ∈ [ X N ( T n ) , X N ( T n ) + ˆ a n,N ) } . We will handle the following two cases separately:(a) the event E := (cid:8) | A k | ≥ (cid:96) N − k ∀ k ∈ (cid:74) , δ (cid:96) N (cid:75) (cid:9) occurs,(b) the event E c = (cid:110) ∃ k ∈ (cid:74) , δ (cid:96) N (cid:75) : | G X N ( T n )+ˆ a n,N ( S k ) | > (cid:96) N − k (cid:111) occurs.First we deal with case (a). We give a lower bound on | G | using a similar argument to the proofof Lemma 2.4. First note that jumps of size greater than ˆ a n,N from particles in A k arrive to theright of position X N ( T n ) + ˆ a n,N for all k ∈ (cid:74) , δ (cid:96) N (cid:75) . Thus all time- t descendants of a particle thatmakes such a jump will be in the set G . For k ∈ (cid:74) , δ (cid:96) N (cid:75) , let M (cid:48) k denote the set of such jumps: M (cid:48) k := { ( i, b, S k ) : X i,b,S k > ˆ a n,N and i ∈ A k } . Suppose for all k ∈ (cid:74) , δ (cid:96) N (cid:75) , all particles descending from the set M (cid:48) k survive until time t . 
Thenthe total number of such descendants will be (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:91) k ∈ (cid:74) ,δ (cid:96) N (cid:75) (cid:91) ( i,b,S k ) ∈M (cid:48) k N bi,S k ( t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = δ (cid:96) N (cid:88) k =1 k + θ n,N (cid:96) N − (cid:88) i ∈ A k ,b ∈{ , } { X i,b,Sk > ˆ a n,N } . (5.15)The ﬁrst term in the sum is the number of time- t descendants of a particle at time S k + 1 = T n + (cid:96) N − k + 1 , and the second sum gives the number of jumps of size greater than ˆ a n,N fromparticles in A k .If instead there exists k ∈ (cid:74) , δ (cid:96) N (cid:75) such that not every particle descending from a jump in M (cid:48) k survives until time t , then there must be N particles to the right of (or at) X N ( T n ) + ˆ a n,N at sometime s ≤ t (and therefore at time t , by monotonicity). We conclude the following lower bound: | G | ≥ min (cid:32) N, δ (cid:96) N (cid:88) k =1 k + θ n,N (cid:96) N − (cid:88) i ∈ A k ,b ∈{ , } { X i,b,Sk > ˆ a n,N } (cid:33) . (5.16)51et ξ j,k ∼ Ber ( h (ˆ a n,N ) − ) be i.i.d. random variables, by which we mean that P T n ( ξ j,k = 1) = 1 h (ˆ a n,N ) = 1 − P T n ( ξ j,k = 0) for all k, j ∈ N . The indicator random variables in (5.16) all have this distribution. Thus by (5.16), P T n ( {| G | < N } ∩ E ) ≤ P T n (cid:32)(cid:40) δ (cid:96) N (cid:88) k =1 k + θ n,N (cid:96) N − (cid:88) i ∈ A k ,b ∈{ , } { X i,b,Sk > ˆ a n,N } < N (cid:41) ∩ E (cid:33) ≤ P T n (cid:32) δ (cid:96) N (cid:88) k =1 k + θ n,N (cid:96) N − (cid:96)N − k (cid:88) j =1 ξ j,k < N (cid:33) , (5.17)since on the event E there are at least (cid:96) N − k jumps from the set A k for each k ∈ (cid:74) , δ (cid:96) N (cid:75) .We will use the concentration inequality from [19, Theorem 2.3(c)] to estimate the right-handside of (5.17). 
As the inequality applies for independent random variables taking values in [0 , , weconsider the random variables − δ (cid:96) N + k ξ j,k ∈ [0 , for k ∈ (cid:74) , δ (cid:96) N (cid:75) and j ∈ [2 (cid:96) N − k ] . Let µ denotethe expectation of the sum of these random variables over k and j : µ := E T n (cid:34) δ (cid:96) N (cid:88) k =1 − δ (cid:96) N + k (cid:96)N − k (cid:88) j =1 ξ j,k (cid:35) = δ (cid:96) N (cid:88) k =1 − δ (cid:96) N + k (cid:96) N − k h (ˆ a n,N ) ≥ δ (cid:96) N N − δ h (ˆ a n,N ) ≥ N − δ − θ n,N (5.18)for N suﬃciently large, where the last inequality holds because h (ˆ a n,N ) ≤ δ N θ n,N (cid:96) N by (5.7) for N suﬃciently large, and because we chose δ /δ ≥ . Thus P T n (cid:32) δ (cid:96) N (cid:88) k =1 k + θ n,N (cid:96) N − (cid:96)N − k (cid:88) j =1 ξ j,k < N (cid:33) ≤ P T n (cid:32) δ (cid:96) N (cid:88) k =1 − δ (cid:96) N + k (cid:96)N − k (cid:88) j =1 ξ j,k < N − δ − θ n,N (cid:33) ≤ P T n (cid:32) δ (cid:96) N (cid:88) k =1 − δ (cid:96) N + k (cid:96)N − k (cid:88) j =1 ξ j,k < µ (cid:33) for N suﬃciently large, where in the ﬁrst inequality we multiply by − ( δ + θ n,N ) (cid:96) N to get terms in [0 , in the sum and notice that − (cid:96) N ≤ N − , and the second inequality holds by (5.18). We nowapply the concentration inequality from [19, Theorem 2.3(c)] to the independent random variables − δ (cid:96) N + k ξ j,k ∈ [0 , on the right-hand side above, giving that P T n (cid:32) δ (cid:96) N (cid:88) k =1 k + θ n,N (cid:96) N − (cid:96)N − k (cid:88) j =1 ξ j,k < N (cid:33) ≤ e − µ/ ≤ e − N δ − δ , (5.19)where in the second inequality we use (5.18) again and that θ n,N ≤ − δ by (5.5) and since T n ≥ t + δ(cid:96) N by (5.2). Now putting (5.17) and (5.19) together, since δ − δ > δ − δ (cid:48) > weconclude that P T n ( {| G | < N } ∩ E ) < η K (5.20)for N suﬃciently large. 52n case (b), E c deterministically implies that | G | = N . 
Indeed, if E c occurs then it follows thatthere exists k ∈ (cid:74) , δ (cid:96) N (cid:75) such that | G X N ( T n )+ˆ a n,N ( S k ) | > (cid:96) N − k . Recall that S k = T n + (cid:96) N − k .Then by Lemma 2.4 we have | G | ≥ min (cid:16) N, (cid:96) N − k k + θ n,N (cid:96) N (cid:17) = N (5.21)for N suﬃciently large, because θ n,N ≥ δ by (5.5) and (5.2), and since we are assuming T n < t .Thus for N suﬃciently large, P T n ( {| G | < N } ∩ E c ) = 0 , which together with (5.20) and (5.14) concludes the proof.Now we are ready to prove Lemma 5.1. Corollary 4.5 tells us that paths cannot move a largedistance without having jumps which have size at least the order of magnitude of that large distance.So Lemma 5.3 and Corollary 4.5 together will show that paths without medium jumps cannot surviveuntil time t with high probability. Proof of Lemma 5.1.

We partition the event in Lemma 5.1 based on the position of the leftmost particle:

$$\begin{aligned}
&\mathbb{P}_{T_n}\Big(\exists i \in \mathcal{N}_{N,T_n}(T_n^{\varepsilon_N}),\ k \in \mathcal{N}_{i,n} : P^{k,t}_{i,T_n^{\varepsilon_N}} \cap M_{n,N} = \emptyset\Big) \\
&\quad = \mathbb{P}_{T_n}\Big(\Big\{\exists i \in \mathcal{N}_{N,T_n}(T_n^{\varepsilon_N}),\ k \in \mathcal{N}_{i,n} : P^{k,t}_{i,T_n^{\varepsilon_N}} \cap M_{n,N} = \emptyset\Big\} \cap \big\{\mathcal{X}_1(t) < \mathcal{X}_N(T_n) + \hat{a}_{n,N}\big\}\Big) \\
&\qquad + \mathbb{P}_{T_n}\Big(\Big\{\exists i \in \mathcal{N}_{N,T_n}(T_n^{\varepsilon_N}),\ k \in \mathcal{N}_{i,n} : P^{k,t}_{i,T_n^{\varepsilon_N}} \cap M_{n,N} = \emptyset\Big\} \cap \big\{\mathcal{X}_1(t) \ge \mathcal{X}_N(T_n) + \hat{a}_{n,N}\big\}\Big).
\end{aligned} \tag{5.22}$$

This will be useful, because from Lemma 5.3 we know that the leftmost particle at time $t$ is to the right of (or at) $\mathcal{X}_N(T_n) + \hat{a}_{n,N}$ with high probability. Hence it is enough to focus on the second term on the right-hand side of (5.22), and show that with high probability, paths cannot move beyond $\mathcal{X}_N(T_n) + \hat{a}_{n,N}$ without medium jumps.

Assume that the event in the second term on the right-hand side of (5.22) occurs with $i \in \mathcal{N}_{N,T_n}(T_n^{\varepsilon_N})$ and $k \in \mathcal{N}_{i,n}$, and so we have $P^{k,t}_{i,T_n^{\varepsilon_N}} \cap M_{n,N} = \emptyset$ and $\mathcal{X}_k(t) \ge \mathcal{X}_1(t) \ge \mathcal{X}_N(T_n) + \hat{a}_{n,N}$. Note that particle $(k,t)$ is a descendant of particle $(N,T_n)$ as well. The path between these two particles has to move a distance of at least $\hat{a}_{n,N}$. Thus one of the following must happen. Either the path between particles $(N,T_n)$ and $(k,t)$ moves $\hat{a}_{n,N}$ even without medium jumps, or there must be a medium jump on this path. In the latter case the medium jump must be in the time interval $[T_n, T_n^{\varepsilon_N} - 1]$, because we assumed $P^{k,t}_{i,T_n^{\varepsilon_N}} \cap M_{n,N} = \emptyset$.
This leads to the following upper bound: P T n (cid:16)(cid:110) ∃ i ∈ N N,T n ( T ε N n ) , k ∈ N i,n : P k,ti,T εNn ∩ M n,N = ∅ (cid:111) ∩ {X ( t ) ≥ X N ( T n ) + ˆ a n,N } (cid:17) ≤ P T n (cid:32) ∃ k ∈ N N,T n ( t ) : (cid:88) ( i,b,s ) ∈ P k,tN,Tn X i,b,s { X i,b,s ≤ δ ˆ a n,N } ≥ ˆ a n,N (cid:33) + P T n ( ∃ s ∈ (cid:74) T n , T ε N n − (cid:75) , i ∈ N N,T n ( s ) and b ∈ { , } : X i,b,s > δ ˆ a n,N ) ≤ CN − + P T n ( ∃ s ∈ (cid:74) T n , T ε N n − (cid:75) , i ∈ N N,T n ( s ) and b ∈ { , } : X i,b,s > δ ˆ a n,N ) (5.23)for N suﬃciently large, where the second inequality holds for some constant C > because of Corol-lary 4.5 applied with x N = ˆ a n,N , r = δ and λ = δ/ (2 α ) . To check the conditions of Corollary 4.5we ﬁrst notice that we chose δ < δ , and claim that δ < ∧ δ (1 ∧ α )96 α . Indeed, at the beginning of53ection 5 we chose δ together with the other constants η , K , γ , ρ , c , . . . , c satisfying (a)-(j). Then(h), (g), (4.26) and (4.24) (using the fact that α > α for α > ) easily imply the claim. Regardingthe condition that x N > N λ , we have ˆ a n,N > N θ n,N / α ≥ N δ/ α for N suﬃciently large, where theﬁrst inequality follows by (5.7) and Lemma 4.2 by the same argument as for (4.17) and (4.38), andthe second inequality holds because θ n,N ≥ δ by (5.5), (5.2) and since we are assuming T n < t .Next we use a union bound to control the second term on the right-hand side of (5.23), usingthat there are at most · k jumps descending from particle ( N, T n ) at time T n + k . 
We have P T n ( ∃ s ∈ (cid:74) T n , T ε N n − (cid:75) , i ∈ N N,T n ( s ) and b ∈ { , } : X i,b,s > δ ˆ a n,N ) ≤ ε N (cid:96) N − (cid:88) k =0 · k h ( δ ˆ a n,N ) < ε N (cid:96) N h ( δ ˆ a n,N ) ≤ N ε N δ α δ N θ n,N (cid:96) N ≤ δ α δ N ε N − δ (5.24)for N suﬃciently large, where in the third inequality we use that ε N (cid:96) N ≤ N ε N for N suﬃcientlylarge, and that h ( δ ˆ a n,N ) ≥ δ α δ N θ n,N (cid:96) N / for N suﬃciently large because of (1.2) and (5.7), andin the fourth inequality we use that θ n,N ≥ δ by (5.5), (5.2) and since we are assuming T n < t .Note that we have ε N < δ/ for N suﬃciently large by our assumptions in (5.1). Therefore,by (5.22), Lemma 5.3, (5.23), and (5.24) we conclude that P T n (cid:16) ∃ i ∈ N N,T n ( T ε N n ) , k ∈ N i,n : P k,ti,T εNn ∩ M n,N = ∅ (cid:17) < η K for N suﬃciently large. Proof of Lemma 5.2.

We partition the time interval $[T_n^{\varepsilon_N}, t - 1]$ into two subintervals, and look at the number of medium jumps and the number of time-$t$ descendants of the medium jumps. Let

$$I_1 := [T_n^{\varepsilon_N}, t_1 + 2\varepsilon_N \ell_N - 1] \quad \text{and} \quad I_2 := [t_1 + 2\varepsilon_N \ell_N, t - 1]$$

be the two intervals, and let $A^i_j$ denote the set of particles in $\mathcal{N}_{i,n}$ which have a medium jump in their ancestral lines which happened in the time interval $I_j$:

$$A^i_j := \Big\{k \in \mathcal{N}_{i,n} : P^{k,t}_{i,T_n^{\varepsilon_N}} \cap M^{I_j}_{n,N} \ne \emptyset\Big\}, \quad i \in \mathcal{N}_{N,T_n}(T_n^{\varepsilon_N}),\ j \in \{1,2\}. \tag{5.25}$$

If there is a medium jump in $I_1$, then there may be many, possibly of order $N$, particles at time $t$ descending from this medium jump. However, we will see that with high probability there are no medium jumps at all in $I_1$: particle $(N,T_n)$ does not have enough descendants by the end of $I_1$ for any to have made a medium jump. In contrast, in the second interval there are many particles to make medium jumps (although not more than $N$ at any one time), but there is less time to produce many descendants by time $t$. Indeed, for each $i \in \mathcal{N}_{N,T_n}(T_n^{\varepsilon_N})$ the expected number of time-$t$ descendants of $(i, T_n^{\varepsilon_N})$ whose path has a medium jump in $I_2$ is of order $N^{1-\varepsilon_N}$. Using a concentration result from [19], we will see that the number of descendants itself (rather than the expected number) is of order $N^{1-\varepsilon_N}$ with high probability, and therefore for each $i$, the total contribution of $A^i_1$ and $A^i_2$ is $o(N)$ with high probability.
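The counting behind the first half of this strategy can be made concrete. The sketch below is illustrative only: it assumes the pure power law $h(x) = x^\alpha$, so that a single jump exceeds a threshold $x \ge 1$ with probability $x^{-\alpha}$, and the function names and numerical values are hypothetical. A particle has at most $2^k$ descendants $k$ steps later, each producing two offspring jumps, so the number of jumps made by its descendants during the first $m$ steps is at most $2(2^m - 1)$; a union bound then multiplies this count by the probability of one medium jump.

```python
def max_jumps_by_descendants(m):
    """Upper bound on the number of jumps made at steps 0, ..., m-1 by the
    descendants of one particle: at most 2**k descendants at step k, and each
    of them has two offspring that each make one jump."""
    return sum(2 * 2 ** k for k in range(m))


def medium_jump_union_bound(m, threshold, alpha):
    """Union bound on the probability that any of those jumps exceeds
    `threshold`, assuming P(jump > x) = x**(-alpha) for x >= 1
    (a hypothetical pure power-law tail)."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1 for a valid tail probability")
    return max_jumps_by_descendants(m) * threshold ** (-alpha)


# Hypothetical numbers: 10 steps, medium-jump threshold 1e4, tail index 1.5.
steps, threshold, alpha = 10, 1.0e4, 1.5
jump_count = max_jumps_by_descendants(steps)
bound = medium_jump_union_bound(steps, threshold, alpha)
```

Because the sum is geometric, it is less than twice its last term, so only the length of the interval enters the bound, not the detailed history of the descendants.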
With the above strategy in mind, we give the following upper boundon the probability in the statement of Lemma 5.2, using (5.4): P T n (cid:16) ∃ i ∈ N N,T n ( T ε N n ) : D i,n > νN and P k,ti,T εNn ∩ M n,N (cid:54) = ∅ ∀ k ∈ N i,n (cid:17) ≤ P T n (cid:16) ∃ i ∈ N N,T n ( T ε N n ) : (cid:110) k ∈ N i,n : P k,ti,T εNn ∩ M n,N (cid:54) = ∅ (cid:111) > νN (cid:17) = P T n ( ∃ i ∈ N N,T n ( T ε N n ) : | A i ∪ A i | > νN ) ≤ P T n ( ∃ i ∈ N N,T n ( T ε N n ) : A i (cid:54) = ∅ ) + P T n (cid:0) ∃ i ∈ N N,T n ( T ε N n ) : | A i | > CN − ε N (cid:1) (5.26)54or N suﬃciently large and any constant C , since ε N (cid:96) N → ∞ as N → ∞ by our choice of ε N in (5.1).We let ˜ I := [ T n , t + 2 ε N (cid:96) N − ⊃ I . It is enough to bound the ﬁrst term on the right-handside of (5.26) by the probability that any of the descendants of particle ( N, T n ) makes a mediumjump by time t + 2 ε N (cid:96) N − : P T n ( ∃ i ∈ N N,T n ( T ε N n ) : A i (cid:54) = ∅ ) ≤ P T n (cid:16) ∃ ( j, b, s ) ∈ M ˜ I n,N : ( N, T n ) (cid:46) ( j, s ) (cid:17) . (5.27)This probability will be very small, as the total number of descendants of ( N, T n ) in the time interval ˜ I is not large enough to see jumps of order ˆ a n,N . Indeed, applying a union bound over the jumpsmade by descendants of ( N, T n ) at times T n + k shows that the right-hand side of (5.27) is at most ( θ n,N +2 ε N ) (cid:96) N − (cid:88) k =0 · k h ( δ ˆ a n,N ) ≤ · ( θ n,N +2 ε N ) (cid:96) N δ α δ N θ n,N (cid:96) N ≤ δ α δ (cid:96) − / N (5.28)for N suﬃciently large, where in the ﬁrst inequality we use that h ( δ ˆ a n,N ) ≥ δ α δ N θ n,N (cid:96) N / for N suﬃciently large by (1.2) and (5.7), and in the second inequality we use the assumption on ε N in (5.1), and that θ n,N (cid:96) N ≤ N θ n,N .For the second term on the right-hand side of (5.26) we will give an upper bound using theconcentration inequality from [19, Theorem 2.3(b)]. 
First we bound | A i | for any i ∈ N N,T n ( T ε N n ) : | A i | ≤ (1+ θ n,N ) (cid:96) N − (cid:88) k =( θ n,N +2 ε N ) (cid:96) N (cid:88) j ∈N i,TεNn ( T n + k ) ,b ∈{ , } { X j,b,Tn + k >δ ˆ a n,N }|N bj,T n + k ( t ) | , (5.29)where we sum up the number of time- t descendants of every particle descended from ( i, T ε N n ) whichmade a jump of size greater than δ ˆ a n,N at a time T n + k in the time interval I . Now let ξ ij,k ∼ Ber ( h ( δ ˆ a n,N ) − ) be i.i.d. random variables, by which we mean that P T n ( ξ ij,k = 1) = h ( δ ˆ a n,N ) − = 1 − P T n ( ξ ij,k = 0) , for all i, j, k ∈ N . The indicator random variables in (5.29) all have this distribution. Consideringthat we have |N i,T εNn ( T n + k ) | ≤ min( N, k − ε N (cid:96) N ) and |N bj,T n + k ( t ) | ≤ (1+ θ n,N ) (cid:96) N − k − for all k ∈ (cid:74) ( θ n,N + 2 ε N ) (cid:96) N , (1 + θ n,N ) (cid:96) N − (cid:75) , and since N N,T n ( T ε N n ) ≤ ε N (cid:96) N , we obtain the following upperbound from (5.29): P T n (cid:0) ∃ i ∈ N N,T n ( T ε N n ) : | A i | > CN − ε N (cid:1) ≤ P T n (cid:32) ∃ i ∈ [2 ε N (cid:96) N ] : (1+ θ n,N ) (cid:96) N (cid:88) k =( θ n,N +2 ε N ) (cid:96) N (1+ θ n,N ) (cid:96) N − k N, k − εN (cid:96)N ) (cid:88) j =1 ξ ij,k > CN − ε N (cid:33) ≤ ε N (cid:96) N P T n (cid:32) (1+ θ n,N ) (cid:96) N (cid:88) k =( θ n,N +2 ε N ) (cid:96) N (1+ θ n,N ) (cid:96) N − k N, k − εN (cid:96)N ) (cid:88) j =1 ξ j,k > CN − ε N (cid:33) (5.30)by a union bound.Now [19, Theorem 2.3(b)] applies for independent random variables taking values in [0 , , so weconsider the random variables (2 ε N + θ n,N ) (cid:96) N − k ξ j,k ∈ [0 , for each k and j in the sum. 
Let µ denote55he expectation of the sum of these random variables over k and j : µ := E T n (cid:34) (1+ θ n,N ) (cid:96) N (cid:88) k =( θ n,N +2 ε N ) (cid:96) N (2 ε N + θ n,N ) (cid:96) N − k N, k − εN (cid:96)N ) (cid:88) j =1 ξ j,k (cid:35) = E T n (cid:34) (1+ ε N ) (cid:96) N − (cid:88) k =( θ n,N +2 ε N ) (cid:96) N (2 ε N + θ n,N ) (cid:96) N − k k − ε N (cid:96) N +1 h ( δ ˆ a n,N ) (cid:35) + E T n (cid:34) (1+ θ n,N ) (cid:96) N (cid:88) k =(1+ ε N ) (cid:96) N (2 ε N + θ n,N ) (cid:96) N − k Nh ( δ ˆ a n,N ) (cid:35) . (5.31)Now considering that for N suﬃciently large, δ α δ N θ n,N (cid:96) N / ≤ h ( δ ˆ a n,N ) ≤ δ α δ N θ n,N (cid:96) N by (1.2)and (5.7), that N ε N + θ n,N ≤ (2 ε N + θ n,N ) (cid:96) N ≤ N ε N + θ n,N , that δ ≤ θ n,N ≤ − δ and that ε N < δ/ for N suﬃciently large, it can be seen that we have K N ε N ≤ µ ≤ K N ε N , (5.32)for some constants K , K > . Then, if we multiply both sides of the sum in (5.30) by (2 ε N − (cid:96) N and use that (2 ε N − (cid:96) N ≥ N ε N − / , we get P T n (cid:0) ∃ i ∈ N N,T n ( T ε N n ) : | A i | > CN − ε N (cid:1) ≤ ε N (cid:96) N P T n (cid:32) (1+ θ n,N ) (cid:96) N (cid:88) k =( θ n,N +2 ε N ) (cid:96) N (2 ε N + θ n,N ) (cid:96) N − k N, k − εN (cid:96)N ) (cid:88) j =1 ξ j,k > CN ε N (cid:33) . By (5.32) we have µ ≥ K N ε N , and we can choose C > K so that CN ε N ≥ µ for N suﬃcientlylarge. Then by [19, Theorem 2.3(b)] we have for N suﬃciently large, P T n (cid:0) ∃ i ∈ N N,T n ( T ε N n ) : | A i | > CN − ε N (cid:1) ≤ N ε N exp (cid:32) − K N ε N ) (cid:33) , (5.33)which is small if N is large, by our choice of ε N in (5.1). Then by (5.26), (5.27), (5.28) and (5.33)we conclude Lemma 5.2. In Proposition 2.2 we need to prove that for any interval of the form [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ] with < s < s < , the probability that the time of the common ancestor T is in this interval isbounded away from 0 for large N . 
The main idea of the proof is that if there is a big jump in the time interval $[t_2 + \lceil s_1 \ell_N \rceil, t_2 + \lceil s_2 \ell_N \rceil]$ which is much larger than any other jump in the time interval $[t_3, t_1]$, then that big jump will break the record, and we will have $T \in [t_2 + \lceil s_1 \ell_N \rceil, t_2 + \lceil s_2 \ell_N \rceil]$.

More precisely, let $r > 0$ be as in Proposition 2.2. We will ask that a particle performs a jump larger than $(r+3)a_N$ at some time $s^* \in [t_2 + \lceil s_1 \ell_N \rceil, t_2 + \lceil s_2 \ell_N \rceil)$, and all the other jumps in the time interval $[t_3, t_1]$ are smaller than $a_N$. We will show that this happens with a probability bounded below by a positive constant (independent of $N$).

Suppose the above event occurs, and also the events $C_3$ and $C_4$ occur. Then we will also see that $d(X(s^*)) \le (1 + c_4)a_N$. This will imply that the particle which makes the jump larger than $(r+3)a_N$ at time $s^*$ breaks the record, and it will lead by more than roughly $(r+2)a_N$ at time $s^*+1$. As a result, the tribe of this particle will lead between times $s^*+1$ and $t_1$, because we assumed that all jumps in $[s^*+1, t_1)$ are smaller than $a_N$. Moreover, particles not in the leading tribe cannot get closer than $ra_N$ to the leading tribe by time $t_1$; therefore, we will conclude $d(X(t_1)) \ge ra_N$ as well.

The following lemma will be useful for proving the above statements.

Lemma 6.1.

Take ρ, c > . Then for N ≥ and t > (cid:96) N , for all s ∈ [ t , t ] and r > , on theevent C ∩ C , { X i,b,s ≤ r a N ∀ ( i, b, s ) ∈ [ N ] × { , } × (cid:74) s , s + (cid:96) N − (cid:75) } ⊆ { d ( X ( s + (cid:96) N )) ≤ ( r + c ) a N } , where the events C and C are deﬁned in (3.13) and (3.14) respectively.Proof. Let G denote the event on the left-hand side in the statement of the lemma: G := { X i,b,s ≤ r a N ∀ ( i, b, s ) ∈ [ N ] × { , } × (cid:74) s , s + (cid:96) N − (cid:75) } . Let j ∈ [ N ] be arbitrary, and let i = ζ j,s + (cid:96) N ( s ) . Then, on the event C , we have | B N ∩ P j,s + (cid:96) N i,s | ≤ ,and on the event C , no particle moves further than c a N once big jumps have been removed fromits path. Thus, on the event C ∩ C ∩ G , X j ( s + (cid:96) N ) ≤ X i ( s ) + c a N + (cid:88) ( i (cid:48) ,b (cid:48) ,s (cid:48) ) ∈ B N ∩ P j,s (cid:96)Ni,s X i (cid:48) ,b (cid:48) ,s (cid:48) ≤ X N ( s ) + ( r + c ) a N . But by Lemma 2.4, we have X N ( s ) ≤ X ( s + (cid:96) N ) , and the result follows. Proof of Proposition 2.2.

Recall the deﬁnition of A (cid:48) from (2.5), and consider a uniform sample of M particles at time t with indices P , . . . , P M . Also recall the deﬁnitions of T ( ρ ) in (2.17) and T ε N ( ρ ) in (2.19). For any ρ > we have { T ( ρ ) ∈ [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ] } ∩ (cid:8) ζ P j ,t ( T ( ρ )) = N ∀ j ∈ [ M ] (cid:9) ∩ (cid:8) ζ P j ,t ( T ε N ( ρ )) (cid:54) = ζ P l ,t ( T ε N ( ρ )) ∀ j, l ∈ [ M ] , j (cid:54) = l (cid:9) ⊆ A (cid:48) . (6.1)For r > , we deﬁne A (cid:48) as a modiﬁcation of the event A from (2.22): A (cid:48) = A (cid:48) ( t, N, ρ, γ, r, s , s ) := { T ( ρ ) ∈ [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ] }∩ (cid:8) |N N,T ( ρ ) ( t ) | ≥ N − N − γ (cid:9) ∩ { d ( X ( t )) ≥ ra N } . (6.2)We also deﬁne the set of jumps in the time interval [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ) which are larger than ( r + 3) a N : B (cid:48) N ( t, r, s , s ) := (cid:26) ( i, b, s ) ∈ [ N ] × { , } × (cid:74) t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) − (cid:75) : X i,b,s > ( r + 3) a N (cid:27) , (6.3)and the event G , which says that there is only one jump in the set B (cid:48) N , and every other jump issmaller than a N during the time interval [ t , t − : G = G ( t, N, r, s , s ) := (cid:26) | B (cid:48) N | = 1 and X i,b,s ≤ a N , ∀ ( i, b, s ) ∈ ([ N ] × { , } × [ t , t − \ B (cid:48) N (cid:27) . (6.4)Fix < s < s < , M ∈ N and r > . Choose π r,s − s > such that π r,s − s < s − s r + 3) α · e − , (6.5)57nd then η > suﬃciently small that it satisﬁes (4.23) and η < s − s r + 3) α · e − − π r,s − s . (6.6)Then choose the constants γ, δ, ρ, c , c , . . . , c , K such that they satisfy (a)-(j). Recall from Sec-tion 4.4 that this implies the properties in (3.2)-(3.5) and (4.24)-(4.31) also hold for η and γ, δ, ρ, c ,c , . . . , c , K . 
Let < ν < η/M .In the course of the proof we will use the events A and A from (2.22) and (2.23), and we willshow the following for N suﬃciently large and t > (cid:96) N :1. P (( A (cid:48) ) c ∪ { d ( X ( t )) < ra N } ) ≤ P (( A (cid:48) ) c ) + P ( A c ) + P ( A ( ν ) c ) + η (cid:84) j =2 C j ∩ (cid:84) i =1 D i ∩ G ⊆ A (cid:48) P ( G ) ≥ s − s r +3) α · e − P (( A (cid:48) ) c ∪ { d ( X ( t )) < ra N } ) ≤ − π r,s − s . We start by proving step 1. Notice that with our choices of constants, the conditions of Lemma 2.5hold. Therefore, we know P ( ∃ j, l ∈ [ M ] , j (cid:54) = l : ζ P j ,t ( T ε N ) = ζ P l ,t ( T ε N )) ≤ P ( A c ) + P ( A ( ν ) c ) + η/ , (6.7)for N suﬃciently large. Hence, because of (6.1), in order to prove step 1 it remains to show that P (cid:0) { T ( ρ ) / ∈ [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ] } ∪ (cid:8) ∃ j ∈ [ M ] : ζ P j ,t ( T ) (cid:54) = N (cid:9) ∪ { d ( X ( t )) < ra N } (cid:1) ≤ P (( A (cid:48) ) c ) + η/ , (6.8)for N suﬃciently large. This follows similarly to the proof of (2.25). Partitioning the event on theleft-hand side of (6.8) using the event A (cid:48) , and then conditioning on F t , we obtain P (cid:0) { T ( ρ ) / ∈ [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ] } ∪ (cid:8) ∃ j ∈ [ M ] : ζ P j ,t ( T ) (cid:54) = N (cid:9) ∪ { d ( X ( t )) < ra N } (cid:1) ≤ E (cid:104) A (cid:48) P ( ∃ j ∈ [ M ] : ζ P j ,t ( T ) (cid:54) = N | F t ) (cid:105) + P (cid:0) ( A (cid:48) ) c (cid:1) (6.9)where we use that if A (cid:48) occurs then T ( ρ ) ∈ [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ] and d ( X ( t )) ≥ ra N , and that A (cid:48) is F t -measurable. 
Now, on the event $A_3'$, at most $N^{1-\gamma}$ time-$t$ particles are not descended from $(N, T)$, and therefore a union bound on the uniformly chosen sample (which is not $\mathcal{F}_t$-measurable) shows that the right-hand side of (6.9) is at most $M N^{1-\gamma}/N + \mathbb{P}((A_3')^c)$. This implies (6.8) for $N$ sufficiently large, and by (6.7) and (6.8) we are done with step 1.

We next prove step 2. Assume the event $\bigcap_{j=2}^{7} C_j \cap G$ occurs. Then there exists $(i^*, b^*, s^*) \in B'_N$ with $s^* \in \llbracket t_2 + \lceil s_1 \ell_N \rceil, t_2 + \lceil s_2 \ell_N \rceil - 1 \rrbracket$. We notice that every jump in the time interval $[t_3, s^* - 1]$ has size at most $a_N$ on the event $G$. Thus, we can apply Lemma 6.1 with $s = s^* - \ell_N > t_3$, $\rho$ and $c_4$ as chosen at the beginning of the proof, and with $r = 1$. We then obtain
\[ d(X(s^*)) \le (1 + c_4) a_N. \tag{6.10} \]
This means that a particle that makes a jump larger than $(r+3)a_N$ at time $s^*$ must take the lead at time $s^*+1$. Indeed,
\[ X_{i^*}(s^*) + X_{i^*,b^*,s^*} > X_1(s^*) + (r+3)a_N \ge X_N(s^*) + (r + 2 - c_4) a_N, \tag{6.11} \]
where in the first inequality we use that $X_{i^*}(s^*) \ge X_1(s^*)$ and that $(i^*, b^*, s^*) \in B'_N$, and the second inequality follows by (6.10). Note that our choice of constants means that $\rho < r + 2 - c_4 < r + 3$ holds (see e.g. (4.24) and (4.31)); thus we have $B'_N \subseteq B_N$, and Lemma 3.5(b) applies. Therefore, by Lemma 3.5(b), we have $(i^*, s^*) \lesssim_{b^*} (N, s^*+1)$ and
\[ X_{i^*}(s^*) + X_{i^*,b^*,s^*} = X_N(s^*+1) > X_{N-1}(s^*+1) + (r + 2 - c_4 - \rho) a_N, \tag{6.12} \]
which also shows that $s^* \in S_N(\rho)$, where $S_N(\rho)$ is the set of times when the record is broken by a big jump (see (2.16)).

Now we prove that $s^*+1 = T(\rho)$ and $d(X(t_1)) \ge ra_N$. Let $\hat{s} \in \llbracket s^*+1, t_1 - 1 \rrbracket$ be arbitrary (and note that $\llbracket s^*+1, t_1 - 1 \rrbracket$ is not empty for $N$ sufficiently large).
We will see that ˆ s / ∈ S N ( ρ ) , andtherefore T ( ρ ) / ∈ (cid:74) s ∗ + 2 , t (cid:75) , i.e. T ( ρ ) = s ∗ + 1 .Take k ∈ [ N − , and assume that j ∈ N k,s ∗ +1 (ˆ s + 1) . Note that | B N ∩ P j, ˆ s +1 k,s ∗ +1 | ≤ by thedeﬁnition of the event C , and that every jump in the time interval [ s ∗ + 1 , t − is at most of size a N by the deﬁnition of the event G . Hence, by the deﬁnition of the event C we have X j (ˆ s + 1) ≤ X k ( s ∗ + 1) + c a N + (cid:88) ( i,b,s ) ∈ B N ∩ P j, ˆ s +1 k,s ∗ +1 X i,b,s ≤ X N − ( s ∗ + 1) + ( c + 1) a N < X N ( s ∗ + 1) − ( r + 1 − c − ρ ) a N ≤ X N (ˆ s + 1) − ( r + 1 − c − ρ ) a N , (6.13)where in the second inequality we also use that k ≤ N − , the third inequality follows by (6.12),and the fourth by monotonicity.Then (6.13) has two consequences. First, it shows that X j (ˆ s + 1) < X N (ˆ s + 1) (see e.g. (4.24)and (4.31)); thus the leader at time ˆ s + 1 must descend from particle ( N, s ∗ + 1) ; that is, ζ N, ˆ s +1 ( s ∗ +1) = N . Note that we also have X i,b, ˆ s ≤ ρa N for all i ∈ N N,s ∗ +1 (ˆ s ) and b ∈ { , } by the deﬁnitionof the event C . We conclude that the record is not broken by a big jump at time ˆ s + 1 , which meansthat ˆ s / ∈ S N ( ρ ) . Since ˆ s ∈ (cid:74) s ∗ + 1 , t − (cid:75) was arbitrary, and s ∗ ∈ S N ( ρ ) , we must have T ( ρ ) = s ∗ + 1 ,by the deﬁnition (2.17) of T ( ρ ) . Hence, (cid:92) i =2 C i ∩ G ⊆ { T ( ρ ) ∈ [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ] } . (6.14)The second consequence of (6.13) is that d ( X (ˆ s + 1)) > ra N , since c + ρ < . Indeed, we noticethat since s ∗ + 1 > t and ˆ s + 1 ≤ t , the number of descendants of particle ( N, s ∗ + 1) is strictlyless than N at time ˆ s + 1 . 
Thus, there exists k ∈ [ N − such that N k,s ∗ +1 (ˆ s + 1) (cid:54) = ∅ , and for sucha k and for some j ∈ N k,s ∗ +1 (ˆ s + 1) the bound in (6.13) holds, and shows that d ( X (ˆ s + 1)) > ra N .Since ˆ s ∈ (cid:74) s ∗ + 1 , t − (cid:75) was arbitrary we conclude (cid:92) i =2 C i ∩ G ⊆ { d ( X ( t )) ≥ ra N } . (6.15)As Propositions 3.11 and 3.2 (and the deﬁnition of A in (2.22)) imply for N suﬃciently largethat (cid:92) j =2 C j ∩ (cid:92) i =1 D i ∩ G ⊆ (cid:92) i =1 C i ∩ G ⊆ A ⊆ (cid:8) |N N,T ( ρ ) ( t ) | ≥ N − N − γ (cid:9) , G says that out of the N (cid:96) N jumps occurring in the time interval [ t , t − ,there are N (cid:96) N − jumps of size at most a N , and there is one larger than ( r + 3) a N , which canhappen any time during the time interval [ t + (cid:100) s (cid:96) N (cid:101) , t + (cid:100) s (cid:96) N (cid:101) ) . Using that (cid:100) s (cid:96) N (cid:101)− −(cid:100) s (cid:96) N (cid:101) ≥ ( s − s ) (cid:96) N / for large N , we have for N suﬃciently large, P ( G ) ≥ N ( s − s )2 (cid:96) N · h (( r + 3) a N ) − (cid:0) − h ( a N ) − (cid:1) N(cid:96) N − ≥ ( s − s )2 h ( a N ) h (( r + 3) a N ) · N (cid:96) N h ( a N ) · e − N(cid:96)Nh ( aN ) ≥ s − s r + 3) α · e − , where the second inequality holds if N is suﬃciently large that − h ( a N ) − > e − h ( a N ) − , which ispossible because h ( a N ) → ∞ as N → ∞ by (4.18). In the third inequality we use that h ( a N ) /h (( r +3) a N ) ≥ ( r + 3) − α / for N large enough by (1.2) and (4.17), and that / ≤ N (cid:96) N /h ( a N ) ≤ for N large enough by (4.19). This completes step 3.For step 4, we note that we chose the constants η , γ , δ , ρ , c , c , . . . , c , K and ν in such a waythat the probability bounds in Propositions 2.6 and 2.7 and Lemma 4.6 hold for N suﬃciently largeand t > (cid:96) N . 
Hence, putting steps 1 to 3 together we conclude P (cid:0) ( A (cid:48) ) c ∪ { d ( X ( t )) < ra N } (cid:1) ≤ (cid:88) j =2 P ( C cj ) + (cid:88) i =1 P ( D ci ) + P ( G c ) + P ( A c ) + P ( A ( ν ) c ) + η ≤ − s − s r + 3) α · e − + 5 η< − π r,s − s , where in the last inequality we used (6.6). This ﬁnishes the proof of Proposition 2.2.The proof of Proposition 2.3 involves some of our previous results. We will use the statement ofProposition 2.2 about the diameter to prove that for any ﬁxed r > , P ( d ( X ( n )) ≥ ra N ) can be lowerbounded by a positive constant. Then the statement of Proposition 3.2 about the diameter showsthat on the events C to C the diameter at time t is greater than c a N , so, considering Lemma 4.6,we will see that the diameter is at least of order a N at a typical time with high probability. Finally,we will conclude that the diameter is at most of order a N with high probability using Lemma 6.1,and also using that jumps of size ra N are unlikely to happen in (cid:96) N time if r is very large. Proof of Proposition 2.3.

Take η, γ, δ, ρ, c , c , . . . , c , K such that they satisfy (4.23), (a)-(j), andtherefore also (3.2)-(3.5) and (4.24)-(4.31) (and η may be arbitrarily small). Let r > be arbitrary.Let s = 1 / , s = 1 / , M = 3 . Then we take π r,s − s > and N ∈ N suﬃciently large that thebounds in Proposition 2.2 and Lemma 4.6 and the inclusions in Propositions 3.2 and 3.11 and inLemma 6.1 hold with the above constants and for all t > (cid:96) N . Furthermore, we assume that N issuﬃciently large that e − h ( ra N / − < − h ( ra N / − , (6.16) h ( a N ) h ( ra N / ≤ r/ − α , (6.17)and N (cid:96) N h ( a N ) ≤ . (6.18)60e can take N suﬃciently large that (6.16), (6.17) and (6.18) hold because of (4.18), (4.17) (i.e. a N → ∞ as N → ∞ ), (1.2) and (4.19). Having ﬁxed N with these properties, take n > (cid:96) N .First we apply Proposition 2.2 in the above setting with t = n + (cid:96) N (and t = n ). The propositionimplies that < π r,s − s < P ( d ( X ( n )) ≥ ra N ) . (6.19)Now we prove that if r is suﬃciently small then we have P ( d ( X ( n )) < ra N ) < η. (6.20)Assume that r < c , where c was speciﬁed at the beginning of this proof.Consider the events ( C j ) j =2 and ( D i ) i =1 with the constants γ, δ, ρ, c , c , . . . , c , K and with t = n + (cid:96) N . By Propositions 3.11 and 3.2 we have (cid:92) j =2 C j ∩ (cid:92) i =1 D i ⊆ (cid:92) j =1 C j ⊆ (cid:8) d ( X ( n )) ≥ c a N (cid:9) . Therefore, since r < c , and then by Lemma 4.6, we have P ( d ( X ( n )) < ra N ) ≤ P ( d ( X ( n )) < c a N ) ≤ (cid:88) j =2 P ( C cj ) + (cid:88) i =1 P ( D ci ) < η, which establishes (6.20).Next we prove that if r is suﬃciently large then P ( d ( X ( n )) ≥ ra N ) < η. (6.21)Assume r > . We apply Lemma 6.1 with t = n + (cid:96) N , s = n − (cid:96) N and r = r/ . Note that by (4.24)and (4.26) we have r + c < r . 
Then Lemma 6.1 implies P ( d ( X ( n )) ≥ ra N ) ≤ P ( ∃ ( i, b, s ) ∈ [ N ] × { , } × (cid:74) n − (cid:96) N , n − (cid:75) : X i,b,s > r a N )= 1 − (1 − h ( ra N / − ) N(cid:96) N ≤ − exp (cid:18) − N (cid:96) N h ( ra N / (cid:19) = 1 − exp (cid:18) − N (cid:96) N h ( a N ) h ( a N ) h ( ra N / (cid:19) ≤ − exp (cid:0) − r/ − α (cid:1) , (6.22)where in the equality we use the tail distribution (1.3) for the N (cid:96) N jumps in the time interval (cid:74) n − (cid:96) N , n − (cid:75) , the second inequality holds by (6.16), and in the third we use (6.17) and (6.18).Then (6.22) shows that (6.21) holds for r suﬃciently large.Since η > was arbitrarily small, (6.19) and (6.20) show the existence of p r and (6.21) provesthe existence of q r as in the statement of Proposition 2.3, and therefore we have ﬁnished the proofof this result. Below we list the most frequently used notation of this paper. In the second column of the tablewe give a brief description, and in the third column we refer to the section or equation where thenotation is deﬁned or ﬁrst appears. 61 otation Meaning Def./Sect. N number of particles Sect. 1.1 ( i, n ) refers to the i th particle from the left at time n Sect. 1.1 X i ( n ) location of the i th particle from the left at time n Sect. 1.1 h the function /h deﬁnes the tail of the jump distribution (1.3) α h is regularly varying with index α > (1.2), (1.3) (cid:96) N time scale: (cid:96) N = (cid:100) log N (cid:101) (1.4) a N space scale: a N = h − (2 N (cid:96) N ) , h ( a N ) ∼ N (cid:96) N (1.5) t t ∈ N is an arbitrary time, we assume t > (cid:96) N Sect. 1.3 t i t i = t − i(cid:96) N , we use t , t , t , t (1.7) X i,b,n jump size of the b th oﬀspring of particle ( i, n ) Sect. 2.1 ( i, b, n ) refers to the jump X i,b,n of the b th oﬀspring of particle ( i, n ) Sect. 
2.4 d ( X ( n )) diameter of the particle cloud at time n (2.6) ( i, n ) (cid:46) ( j, n + k ) particle ( i, n ) is the time- n ancestor of particle ( j, n + k ) (2.8) ( i, n ) (cid:46) b ( j, n + k ) the b th oﬀspring of particle ( i, n ) is the time- ( n + 1) ancestorof particle ( j, n + k ) Sect. 2.4 ζ i,n + k ( n ) ζ i,n + k ( n ) ∈ [ N ] is the index of the time- n ancestor of theparticle ( i, n + k ) (2.9) P i k ,n + ki ,n path (sequence of jumps) between particles ( i , n ) and ( i k , n + k ) , if ( i , n ) (cid:46) ( i k , n + k ) (2.10) N i,n ( n + k ) N i,n ( n + k ) ⊆ [ N ] is the set of time- ( n + k ) descendants ofparticle ( i, n ) (2.12) N bi,n ( n + k ) N bi,n ( n + k ) ⊆ [ N ] is the set of time- ( n + k ) descendants ofthe b th oﬀspring of particle ( i, n ) (2.13) ρa N jumps of size greater than ρa N are called big jumps Sect. 2.5 B N set of big jumps (2.14), (2.15) S N set of times when the record is broken by a big jump (2.16) ˆ S N times when the leader is surpassed by a big jump (2.18) T time of the common ancestor of almost every particle at time t Sect. 1.3 T = T ( ρ ) the last time before t when a particle breaks the record witha big jump (2.17) ( N, T ) the leader (rightmost) particle at time T Sect. 1.4 Z i ( s ) distance between the i th and the rightmost particle (3.11)Next, we list the events which appear throughout our main argument. We give a brief explanationof each event and refer to the equation where the event is deﬁned. We also include short descriptionsof the main results involving these events to give a summary of the major steps of the proof ofTheorem 2.1. We write “whp” as shorthand for “with high probability”. Event Meaning Def./Sect. A Almost the whole population is close to the leftmost particle at time t . (2.3) A The genealogy of the population at time t is given by a star-shaped coales-cent; there is a common ancestor at time T ∈ [ t , t ] . 
(2.4) A and A occur whp (Theorem 2.1)62 Almost every particle at time t descends from the leader at time T ∈ [ t , t ] . (2.22) A Shortly after time T no particle has a positive proportion of the populationas descendants at time t . (2.23)If A and A occur whp then A occurs whp (Lemma 2.5)The event A occurs whp (Proposition 2.7)The event A ∩ A occurs whp (Proposition 2.6). This is shown using theevents below. B There is a leading tribe, descended from the leader at time T ∈ [ t , t ] ,which is a signiﬁcant distance from the other particles at time t . (3.7) B Particles which are not in the leading tribe at time t have o ( N ) descendantsin total at time t . (3.8) B ∩ B ⊆ A (Lemma 3.1) C A particle leads by a large distance compared to the second rightmostparticle at some point in [ t + 1 , t ] . (3.10) C Particles far from the leader stay far behind or beat the leader by a lot. (3.12) C There is at most one big jump on a path of length (cid:96) N . (3.13) C Paths without big jumps move very little on the a N space scale. (3.14) C Two big jumps cannot happen at the same time. (3.15) C No big jumps happen at times very close to t or t . (3.16) C The number of big jumps performed in [ t , t ] is bounded above by a constantindependent of N . (3.17) (cid:84) j =1 C j ⊆ B ∩ B ∩ A ⊆ A ∩ A (Proposition 3.2) D Same as C with diﬀerent constants. (3.47) D In every short interval on the (cid:96) N time scale, at least one big jump largerthan a certain size occurs. (3.48) D In the ﬁrst half of [ t , t ] a big jump larger than a certain size occurs. (3.49) D Shortly before time t , only jumps smaller than a certain size occur. (3.50) D During a short time interval, jumps of size in a certain small range do nothappen. (3.52) (cid:84) j =2 C j ∩ (cid:84) i =1 D i ⊆ C (Proposition 3.11)The events C − C , D − D all occur whp (Lemma 4.6) Acknowledgements

MR would like to thank the Royal Society for funding his University Research Fellowship. ZT is supported by a scholarship from the EPSRC Centre for Doctoral Training in Statistical Applied Mathematics at Bath (SAMBa), under the project EP/L015684/1.

References

[1] Jean Bérard and Jean-Baptiste Gouéré. Brunet-Derrida behavior of branching-selection particle systems on the line. Communications in Mathematical Physics, 298(2):323–342, 2010.

[2] Jean Bérard and Pascal Maillard. The limiting process of N-particle branching random walk with polynomial tails. Electron. J. Probab., 19, 2014.

[3] Julien Berestycki, Nathanaël Berestycki, and Jason Schweinsberg. The genealogy of branching Brownian motion with absorption. Ann. Probab., 41(2):527–618, 2013.

[4] Julien Berestycki, Éric Brunet, and Sarah Penington. Global existence for a free boundary problem of Fisher–KPP type. Nonlinearity, 32(10):3912, 2019.

[5] J. D. Biggins. The first- and last-birth problems for a multitype age-dependent branching process. Advances in Applied Probability, 8(3):446–459, 1976.

[6] N. H. Bingham, C. M. Goldie, and J. L. Teugels. Regular Variation. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1987.

[7] É. Brunet, B. Derrida, A. H. Mueller, and S. Munier. Noisy traveling waves: Effect of selection on genealogies. Europhys. Lett., 76(1):1–7, 2006.

[8] É. Brunet, B. Derrida, A. H. Mueller, and S. Munier. Effect of selection on ancestry: an exactly soluble case and its phenomenological generalization. Phys. Rev. E, 76:041104, 2007.

[9] Éric Brunet and Bernard Derrida. Shift in the velocity of a front due to a cutoff. Phys. Rev. E, 56:2597–2604, 1997.

[10] Éric Brunet and Bernard Derrida. Microscopic models of traveling wave equations. Computer Physics Communications, 121:376–381, 2000.

[11] Anna De Masi, Pablo A Ferrari, Errico Presutti, and Nahuel Soprano-Loto. Hydrodynamics of the N-BBM process. In International workshop on Stochastic Dynamics out of Equilibrium, pages 523–549. Springer, 2017.

[12] D. Denisov, A. B. Dieker, and V. Shneer. Large deviations for random walks under subexponentiality: The big-jump domain. Ann. Probab., 36(5):1946–1991, 2008.

[13] R. Durrett. Maxima of branching random walks. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 62:165–170, 1983.

[14] Rick Durrett and Daniel Remenik. Brunet–Derrida particle systems, free boundary problems and Wiener–Hopf equations. Ann. Probab., 39(6):2043–2078, 2011.

[15] Nina Gantert. The maximum of a branching random walk with semiexponential increments. Ann. Probab., 28(3):1219–1229, 2000.

[16] J. M. Hammersley. Postulates for subadditive processes. Ann. Probab., 2(4):652–680, 1974.

[17] J. F. C. Kingman. The first birth problem for an age-dependent branching process. Ann. Probab., 3(5):790–801, 1975.

[18] Pascal Maillard. Speed and fluctuations of N-particle branching Brownian motion with spatial selection. Probability Theory and Related Fields, 166(3-4):1061–1173, 2016.

[19] Colin McDiarmid. Concentration. In