Branching Brownian motion and Selection in the Spatial Lambda-Fleming-Viot Process

Abstract

We ask the question "when will natural selection on a gene in a spatially structured population cause a detectable trace in the patterns of genetic variation observed in the contemporary population?". We focus on the situation in which 'neighbourhood size', that is the effective local population density, is small. The genealogy relating individuals in a sample from the population is embedded in a spatial version of the ancestral selection graph and through applying a diffusive scaling to this object we show that whereas in dimensions at least three, selection is barely impeded by the spatial structure, in the most relevant dimension, d = 2, selection must be stronger (by a factor of log(1/μ) where μ is the neutral mutation rate) if we are to have a chance of detecting it. The case d = 1 was handled in Etheridge et al. (2015). The mathematical interest is that although the system of branching and coalescing lineages that forms the ancestral selection graph converges to a branching Brownian motion, this reflects a delicate balance of a branching rate that grows to infinity and the instant annullation of almost all branches through coalescence caused by the strong local competition in the population.


arXiv [math.PR], Nov. Submitted to the Annals of Applied Probability

BRANCHING BROWNIAN MOTION AND SELECTION IN THE SPATIAL Λ-FLEMING-VIOT PROCESS

By Alison Etheridge∗, Nic Freeman, Sarah Penington† and Daniel Straulino‡
University of Oxford, University of Sheffield and Heilbronn Institute for Mathematical Research


1. Introduction.

Our aims in this work are two-fold. On the one hand, we address a question of interest in population genetics: when will the action of natural selection on a gene in a spatially structured population cause a detectable trace in the patterns of genetic variation observed in the contemporary population? On the other hand, we investigate some of the rich structure underlying mathematical models for spatially evolving populations and, in particular, the systems of interacting random walks that, as dual processes (corresponding to ancestral lineages of the model), describe the genetic relationships between individuals sampled from those populations.

∗ Supported in part by EPSRC Grant EP/I01361X/1.
† Supported by EPSRC DTG EP/K503113/1.
‡ Supported by CONACYT.
Primary 60G99, Secondary 92B05.
Keywords and phrases: Spatial Lambda-Fleming-Viot Process, branching, coalescing, natural selection, branching Brownian motion, population genetics.

Since the seminal work of Fisher (1937), a large literature has developed that investigates the interaction of natural selection with the spatial structure of a population. Traditionally, the deterministic action of migration and selection is approximated by what we now call the Fisher-KPP equation and predictions from that equation are compared to data. However, many important questions depend on how selection and migration interact with a third force, the stochastic fluctuations known as random genetic drift, and this poses significant new mathematical challenges.

For the most part, random drift is modelled through Wright-Fisher noise resulting in a stochastic PDE as a model for the evolution of gene frequencies $w$:
$$\frac{\partial w}{\partial t} = m\,\Delta w - s\,w(1-w) + \sqrt{\gamma\,w(1-w)}\,\dot{W}$$
(for suitable constants $m$, $s$ and $\gamma$), where $W$ is space-time white noise. This stochastic Fisher-KPP equation has been extensively studied, see, for example, Mueller et al. (2008) and references therein. However, from a modelling perspective it has two immediate shortcomings. First, it only makes sense in one spatial dimension. This is generally overcome by artificially subdividing the population, and thus replacing the stochastic PDE by a system of stochastic ordinary differential equations, coupled through migration. The second problem is that, in deriving the equation, one allows the 'neighbourhood size' to tend to infinity. We shall give a precise definition of neighbourhood size in Section 2. Loosely, it is inversely proportional to the probability that two individuals sampled from sufficiently close to one another had a common parent in the previous generation and small neighbourhood size corresponds to strong genetic drift.
It is understanding the implications of dropping this (usually implicit) assumption of unbounded neighbourhood size that motivated the work presented here.

Our starting point will be the Spatial Λ-Fleming-Viot process with selection (SΛFVS), which (along with its dual) was introduced and constructed in Etheridge et al. (2014). The dynamics of both the SΛFVS and its dual are driven by a Poisson point process of 'events' (which model reproduction or extinction and recolonisation in the population) and will be described in detail in Section 2. The advantage of this model is that it circumvents the need to subdivide the population in higher dimensions. However, since our proof is based on an analysis of the branching and coalescing system of random walkers that describes the ancestry of a sample from the population, it would be straightforward to modify it to apply to, for example, an individual based model in which a fixed number of individuals reside at each point of a $d$-dimensional lattice.

In classical models of population genetics, in which there is no spatial structure, we generally think of population size as setting the timescale of evolution of frequencies of different genetic types. Evidently that makes no sense in our setting. However (even in the classical setting), as we explain in more detail in Section 3, if natural selection is to leave a distinguishable trace in contemporary patterns of genetic variation, then a sufficiency of neutral mutations must fall on the genealogical trees relating individuals in a sample. Thus, in fact, it is the neutral mutation rate which sets the timescale and, since mutation rates are very low, this leads us to consider scaling limits.

In Etheridge et al. (2014), scaling limits of the (forwards in time) SΛFVS were considered in which the neighbourhood size tends to infinity. In that case, the classical Fisher-KPP equation and, in one spatial dimension, its stochastic analogue are recovered.
The dual process of branching and coalescing lineages converges to branching Brownian motion, with coalescence of lineages (in one dimension) at a rate determined by the local time that they spend together. In this article we consider scaling limits in the (very different) regime in which neighbourhood size remains finite. In this context the interaction between genetic drift and spatial structure becomes much more important and, in contrast to Etheridge et al. (2014), it is the dual process which proves to be the more analytically tractable object.

We shall focus on the most biologically relevant case of two spatial dimensions. The case of one dimension was discussed in Etheridge et al. (2015). The main interest there is mathematical: the dual process of branching and coalescing ancestral lineages, suitably scaled, converges to the Brownian net. However, the scaling required to obtain a non-trivial limit reveals a strong effect of the spatial structure. Here we shall identify the corresponding scalings in dimensions $d \geq 2$. Whereas in Etheridge et al. (2014), the scaling of the selection coefficient is independent of spatial dimension and, indeed, mirrors that for unstructured populations, for bounded neighbourhood size this is no longer the case. In $d = 1$ and $d = 2$ the scaling of the selection coefficient required to obtain a non-trivial limit reflects strong local competition.

Our main result, Theorem 2.7, is that under these (dimension-dependent) scalings, the scaled dual process converges to a branching Brownian motion. For $d \geq 3$ […] $d = 2$, under our scaling, the rate of branching of ancestral lineages explodes to infinity but, crucially, all except finitely many branches are instantaneously annulled through coalescence. That this finely balanced picture produces a non-degenerate limit results from a combination of the failure of two dimensional Brownian motion to hit points and the strong (local) interactions of the approximating random walks, which cause coalescence.

From a biological perspective, the main interest is that, in contrast to the infinite neighbourhood size limit, here we see a strong effect of spatial dimension in our results. When neighbourhood size is very big, the probability of fixation for an advantageous genetic type, i.e. the probability that the genetic type establishes and sweeps through the entire population, is not affected by spatial structure. When neighbourhood size is small, in (one and) two spatial dimensions, selection has to be much stronger to leave a detectable trace than in a population with no spatial structure. Indeed, local establishment is no longer a guarantee of eventual fixation.

The rest of the paper is laid out as follows. In Section 2 we describe the SΛFVS and the dual process of branching and coalescing random walks, state our main result and provide a heuristic argument that explains our choice of scalings.
In Section 3 we place our findings in the context of previous work on selective sweeps in spatially structured populations and in Section 4 we prove our result.

Acknowledgements

Our results (with different proofs) form part of the DPhil thesis of the last author. We would like to thank the examiners, Christina Goldschmidt and Anton Wakolbinger, for their careful reading of the thesis and detailed feedback. We would also like to thank the two anonymous referees for their careful reading of the paper and valuable comments.

2. The model and main result.

2.1. The model.

To motivate the definition of the SΛFVS, it is convenient to recall (a very special case of) the model without selection, introduced in Etheridge (2008); Barton et al. (2010). We shall call it the SΛFV to emphasize that selection is not acting. We proceed informally, only carefully specifying the state space and conditions that are sufficient to guarantee existence of the process when we define the SΛFVS itself in Definition 2.3. The interested reader can find much more general conditions under which the SΛFV exists in Etheridge and Kurtz (2014).

We restrict ourselves to the case of just two genetic types, which we denote $a$ and $A$, and we suppose that the population is evolving in $\mathbb{R}^d$. It is convenient to index time by the whole real line. At each time $t$, the random function $\{w_t(x),\ x \in \mathbb{R}^d\}$ is defined, up to a Lebesgue null set of $\mathbb{R}^d$, by
$$w_t(x) := \text{proportion of type } a \text{ at spatial position } x \text{ at time } t. \tag{2.1}$$
The dynamics are driven by a Poisson point process $\Pi$ on $\mathbb{R} \times \mathbb{R}^d \times \mathbb{R}_+ \times (0,1]$. Each point $(t, x, r, u) \in \Pi$ specifies a reproduction event which will affect that part of the population at time $t$ which lies within the closed ball $B_r(x)$ of radius $r$ centred on the point $x$. First the location $z$ of the parent of the event is chosen uniformly at random from $B_r(x)$. All offspring inherit the type $\alpha$ of the parent which is determined by $w_{t-}(z)$; that is, with probability $w_{t-}(z)$ all offspring will be type $a$, otherwise they will be $A$. A portion $u$ of the population within the ball is then replaced by offspring so that
$$w_t(y) = (1-u)\,w_{t-}(y) + u\,\mathbf{1}_{\{\alpha = a\}}, \qquad \forall y \in B_r(x).$$
The population outside the ball is unaffected by the event. We sometimes call $u$ the impact of the event.

Under this model, the time reversal of the same Poisson point process of events governs the ancestry of a sample from the population.
Each ancestral lineage that lies in the region affected by an event has a probability $u$ of being among the offspring of the event, in which case, as we trace backwards in time, it jumps to the location of the parent, which is sampled uniformly from the region. In this way, ancestral lineages evolve according to (dependent) compound Poisson processes and lineages can coalesce when affected by the same event. All lineages affected by an event inherit the type of the parent of that event.

Remark 2.1. In Etheridge and Kurtz (2014), the SΛFV and its dual are constructed simultaneously on the same probability space, through a lookdown construction, as the limit of an individual based model, and so the dual process just described really can be interpreted as tracing the ancestry of individuals in a sample from the population.
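As a rough illustration of the neutral reproduction event described above, the following Python sketch applies one event to a discretised frequency field. It is not the construction used in the paper: the function name `apply_neutral_event`, the representation of $w$ as a list of values at finitely many sample points, and the choice of the parent among those sample points (rather than uniformly from the ball itself) are all assumptions made for illustration.

```python
import math
import random

def apply_neutral_event(w, points, centre, r, u, rng):
    """Apply one neutral event (centre, r, u) to a discretised type-a field.

    w      : list of type-a proportions, one per sample point
    points : list of coordinate tuples in R^d (hypothetical discretisation)
    rng    : random.Random instance
    Returns a new list; the input list is left untouched.
    """
    # Indices of the sample points lying in the closed ball B_r(centre).
    inside = [i for i, p in enumerate(points) if math.dist(p, centre) <= r]
    if not inside:
        return list(w)
    # Discretisation assumption: the parental location z is drawn uniformly
    # from the sample points inside the ball.
    z = rng.choice(inside)
    # The parent is of type a with probability w_{t-}(z).
    parent_is_a = rng.random() < w[z]
    new_w = list(w)
    for i in inside:
        # w_t(y) = (1 - u) w_{t-}(y) + u 1{alpha = a} for y in the ball.
        new_w[i] = (1 - u) * w[i] + (u if parent_is_a else 0.0)
    return new_w
```

Points outside the ball keep their old values, and every point inside moves by the same rule, which is the sense in which a single event replaces a fraction $u$ of the local population.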

We are now in a position to define the neighbourhood size.

Definition 2.2. Write $\sigma^2$ for the variance of the first coordinate of the location of a single ancestral lineage after one unit of time and $\eta(x)$ for the instantaneous rate of coalescence of two lineages that are currently at a separation $x \in \mathbb{R}^d$. Then the neighbourhood size, $\mathcal{N}$, is given by
$$\mathcal{N} = \frac{2 d\, C_d\, \sigma^2}{\int_{\mathbb{R}^d} \eta(x)\, dx},$$
where $C_d$ is the volume of the unit ball in $\mathbb{R}^d$.
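The formula in the definition above is easy to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, $C_d$ is computed from the standard closed form $\pi^{d/2}/\Gamma(d/2+1)$, and the integral of $\eta$ is assumed to have been computed elsewhere and is passed in as a number.

```python
import math

def unit_ball_volume(d: int) -> float:
    """Volume C_d of the unit ball in R^d, via pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def neighbourhood_size(d: int, sigma2: float, eta_integral: float) -> float:
    """Neighbourhood size 2 d C_d sigma^2 / (integral of eta over R^d).

    sigma2       : variance of the first coordinate of a lineage per unit time
    eta_integral : integral of the pairwise coalescence rate eta over R^d
                   (assumed precomputed; must be positive)
    """
    if eta_integral <= 0:
        raise ValueError("the integral of eta must be positive")
    return 2 * d * unit_ball_volume(d) * sigma2 / eta_integral
```

In $d = 2$, where $C_2 = \pi$, the expression reduces to $4\pi\sigma^2 / \int \eta$.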

Neighbourhood size is used in biology to quantify the local number of breeding individuals in a continuous population; see Barton et al. (2013b) for a derivation of this formula. If we assume that the impact is the same for all events, then the impact is inversely proportional to the neighbourhood size, see Barton et al. (2013b).

There are very many different ways in which to introduce selection into the SΛFV. Our approach here is a simple adaptation of that adopted in classical models of population genetics. The parental type in the SΛFV is a uniform pick from the types in the region affected by the event. We can introduce a small advantage to individuals of type $A$ by choosing the parent in a weighted way. Thus if, immediately before reproduction, the proportion of type $a$ individuals in the region affected by the event is $w$, then the offspring will be type $a$ with probability $w/(1 + s(1-w))$. We say that the relative fitnesses of types $a$ and $A$ are $1$ and $1+s$ respectively and refer to $s$ as the selection coefficient. We are interested only in small values of $s$ and so we expand
$$\frac{w}{1 + s(1-w)} = w\{1 - s(1-w)\} + O(s^2) = (1-s)\,w + s\,w^2 + O(s^2).$$
We shall regard $s^2$ as being negligible. We can then think of each event, independently, as being a 'neutral' event with probability $(1-s)$ and a 'selective' event with probability $s$. Reproduction during neutral events is exactly as before, but during selective events, we sample two potential parents; only if both are type $a$ will the offspring be of type $a$.

Let us now give a more precise definition of the SΛFVS. We retain the notation of (2.1). A construction of an appropriate state space for $x \mapsto w_t(x)$ can be found in Véber and Wakolbinger (2015).
Using the identification
$$\int_{\mathbb{R}^d \times \{a,A\}} f(x,\kappa)\, M(dx, d\kappa) = \int_{\mathbb{R}^d} \big\{ w(x) f(x,a) + (1 - w(x)) f(x,A) \big\}\, dx,$$
this state space is in one-to-one correspondence with the space $\mathcal{M}_\lambda$ of measures on $\mathbb{R}^d \times \{a, A\}$ with 'spatial marginal' Lebesgue measure, which we endow with the topology of vague convergence. By a slight abuse of notation, we also denote the state space of the process $(w_t)_{t \in \mathbb{R}}$ by $\mathcal{M}_\lambda$.

Definition 2.3. Fix $\mathcal{R} \in (0, \infty)$. Let $\mu$ be a finite measure on $(0, \mathcal{R}]$ and, for each $r \in (0, \mathcal{R}]$, let $\nu_r$ be a probability measure on $(0,1]$. Further, let $\Pi$ be a Poisson point process on $\mathbb{R} \times \mathbb{R}^d \times (0, \mathcal{R}] \times (0,1]$ with intensity measure
$$dt \otimes dx \otimes \mu(dr)\, \nu_r(du). \tag{2.2}$$
The spatial Λ-Fleming-Viot process with selection (SΛFVS) driven by (2.2) is the $\mathcal{M}_\lambda$-valued process $(w_t)_{t \in \mathbb{R}}$ with dynamics given as follows.

If $(t, x, r, u) \in \Pi$, a reproduction event occurs at time $t$ within the closed ball $B_r(x)$ of radius $r$ centred on $x$. With probability $1 - s$ the event is neutral, in which case:

1. Choose a parental location $z$ uniformly at random within $B_r(x)$, and a parental type, $\alpha$, according to $w_{t-}(z)$, that is $\alpha = a$ with probability $w_{t-}(z)$ and $\alpha = A$ with probability $1 - w_{t-}(z)$.
2. For every $y \in B_r(x)$, set $w_t(y) = (1-u)\,w_{t-}(y) + u\,\mathbf{1}_{\{\alpha = a\}}$.

With the complementary probability $s$ the event is selective, in which case:

1. Choose two 'potential' parental locations $z, z'$ independently and uniformly at random within $B_r(x)$, and at each of these sites 'potential' parental types $\alpha$, $\alpha'$, according to $w_{t-}(z)$, $w_{t-}(z')$ respectively.
2. For every $y \in B_r(x)$ set $w_t(y) = (1-u)\,w_{t-}(y) + u\,\mathbf{1}_{\{\alpha = \alpha' = a\}}$. Declare the parental location to be $z$ if $\alpha = \alpha' = a$ or $\alpha = \alpha' = A$ and to be $z$ (resp. $z'$) if $\alpha = A$, $\alpha' = a$ (resp. $\alpha = a$, $\alpha' = A$).

This is a very special case of the SΛFVS introduced in Etheridge et al. (2014).

We are especially concerned with the dual process of the SΛFVS. Whereas in the neutral case we can always identify the distribution of the location of the parent of each event, without any additional information on the distribution of types in the region, now, at a selective event, we are unable to identify which of the 'potential parents' is the true parent of the event without knowing their types. These can only be established by tracing further into the past.
The resolution is to follow all potential ancestral lineages backwards in time. This results in a system of branching and coalescing walks.

As in the neutral case, the dynamics of the dual are driven by the same Poisson point process of events, $\Pi$, that drove the forwards in time process. The distribution of this Poisson point process is invariant under time reversal and so we shall abuse notation by reversing the direction of time when discussing the dual.

We suppose that at time 0 (which we think of as 'the present'), we sample $k$ individuals from locations $x_1, \ldots, x_k$ and we write $\xi^1_s, \ldots, \xi^{N_s}_s$ for the locations of the $N_s$ potential ancestors that make up our dual at time $s$ before the present.

Definition 2.4. The branching and coalescing dual process $(\Xi_t)_{t \geq 0}$ driven by $\Pi$ is the $\bigcup_{m \geq 1} (\mathbb{R}^d)^m$-valued Markov process with dynamics defined as follows: at each event $(t, x, r, u) \in \Pi$, with probability $1 - s$, the event is neutral:

1. For each $\xi^i_{t-} \in B_r(x)$, independently mark the corresponding lineage with probability $u$;
2. if at least one lineage is marked, all marked lineages disappear and are replaced by a single lineage, whose location at time $t$ is drawn uniformly at random from within $B_r(x)$.

With the complementary probability $s$, the event is selective:

1. For each $\xi^i_{t-} \in B_r(x)$, independently mark the corresponding lineage with probability $u$;
2. if at least one lineage is marked, all marked lineages disappear and are replaced by two lineages, whose locations at time $t$ are drawn independently and uniformly from within $B_r(x)$.

In both cases, if no lineages are marked, then nothing happens.
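The effect of a single event on the dual can be sketched in a few lines of Python. This is an informal illustration of the two cases in the definition above, under stated assumptions: the function names are hypothetical, lineages are stored as a plain list of coordinate tuples, the uniform draw from the ball uses rejection sampling, and the caller is assumed to have already decided (with probability $s$) whether the event is selective.

```python
import math
import random

def uniform_in_ball(centre, r, rng):
    """Draw a point uniformly from the closed ball B_r(centre) by rejection."""
    d = len(centre)
    while True:
        offset = [rng.uniform(-r, r) for _ in range(d)]
        if math.hypot(*offset) <= r:
            return tuple(c + o for c, o in zip(centre, offset))

def apply_dual_event(lineages, centre, r, u, selective, rng):
    """Apply one event to the lineage locations and return the new list.

    lineages  : list of coordinate tuples (current potential ancestors)
    selective : True for a selective event, False for a neutral one
    """
    marked, unmarked = [], []
    for xi in lineages:
        # Step 1: each lineage inside the ball is marked with probability u.
        if math.dist(xi, centre) <= r and rng.random() < u:
            marked.append(xi)
        else:
            unmarked.append(xi)
    if not marked:
        return list(lineages)  # no lineage marked: nothing happens
    # Step 2: marked lineages are replaced by one lineage (neutral event)
    # or by two independent lineages (selective event), uniform in the ball.
    n_new = 2 if selective else 1
    return unmarked + [uniform_in_ball(centre, r, rng) for _ in range(n_new)]
```

Coalescence corresponds to several marked lineages being replaced by a single one, and branching to a marked lineage being replaced by two.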
Since we only consider finitely many initial individuals in the sample, and the jump rate of the dual is bounded by a linear function of the number of potential ancestors, this description gives rise to a well-defined process.

This dual process is the analogue for the SΛFVS of the Ancestral Selection Graph (ASG), introduced in the companion papers Krone and Neuhauser (1997); Neuhauser and Krone (1997), which describes all the potential ancestors of a sample from a population evolving according to the Wright-Fisher diffusion with selection. Perhaps the simplest way of expressing the duality between the SΛFVS and the branching and coalescing dual process is to observe that all the individuals in our sample are of type $a$ if and only if all potential ancestral lineages are of type $a$ at any time $t$ in the past. This is analogous to the moment duality between the ASG and the Wright-Fisher diffusion with selection. However, to state this formally for the SΛFVS, we would need to be able to identify $\mathbb{E}\big[\prod_{i=1}^n w_t(x_i)\big]$ for any choice of points $x_1, \ldots, x_n \in \mathbb{R}^d$. The difficulty is that, just as in the neutral case, the SΛFVS $w_t(x)$ is only defined at Lebesgue almost every point $x$ and so we have to be satisfied with a 'weak' moment duality.

Proposition 2.5 [Etheridge et al. (2014)]. The spatial Λ-Fleming-Viot process with selection is dual to the process $(\Xi_t)_{t \geq 0}$ in the sense that for every $k \in \mathbb{N}$ and $\psi \in C_c((\mathbb{R}^d)^k)$, we have
$$\mathbb{E}_{w_0}\Big[ \int_{(\mathbb{R}^d)^k} \psi(x_1, \ldots, x_k) \Big\{ \prod_{j=1}^k w_t(x_j) \Big\}\, dx_1 \ldots dx_k \Big] = \int_{(\mathbb{R}^d)^k} \psi(x_1, \ldots, x_k)\, \mathbb{E}_{\{x_1, \ldots, x_k\}}\Big[ \prod_{j=1}^{N_t} w_0\big(\xi^j_t\big) \Big]\, dx_1 \ldots dx_k. \tag{2.3}$$

2.2. The main result.

Our main result concerns a diffusive rescaling of the dual process of Definition 2.4 and so from now on it will be convenient if forwards in time refers to forwards for the dual process.

We shall take the impact parameter, $u$, to be a fixed number in $(0,1]$ (i.e. $\nu_r = \delta_u$ for all $r$). In fact, the same arguments work when $u$ is allowed to be random, as long as $\int_{\mathcal{R}'}^{\mathcal{R}} \int_0^1 u\, \nu_r(du)\, \mu(dr) > 0$ for all $0 < \mathcal{R}' < \mathcal{R}$, but this would make our proofs notationally cumbersome.

Let us describe the scaling more precisely. Suppose that $\mu$ is a finite measure on $(0, \mathcal{R}]$. We shall assume for convenience that $\mathcal{R}$ is defined in such a way that for any $\delta > 0$, $\mu((\mathcal{R} - \delta, \mathcal{R}]) > 0$. For each $n \in \mathbb{N}$, define the measure $\mu^n$ by $\mu^n(B) = \mu(n^{1/2} B)$, for all Borel subsets $B$ of $\mathbb{R}_+$. It will be convenient to write $\mathcal{R}_n = \mathcal{R}/\sqrt{n}$. At the $n$th stage of the rescaling, our rescaled dual is driven by the Poisson point process $\Pi^n$ on $\mathbb{R} \times \mathbb{R}^d \times (0, \mathcal{R}_n]$ with intensity
$$n\, dt \otimes n^{d/2}\, dx \otimes \mu^n(dr). \tag{2.4}$$
This corresponds to rescaling space and time from $(t, x)$ to $(n^{-1} t, n^{-1/2} x)$. Importantly, we do not scale the impact $u$. Each event of $\Pi^n$, independently, is neutral with probability $1 - s_n$ and selective with probability $s_n$, where
$$s_n = \begin{cases} \dfrac{\log n}{n} & d = 2, \\[1ex] \dfrac{1}{n} & d \geq 3. \end{cases} \tag{2.5}$$
In Etheridge et al. (2015) it was shown that in $d = 1$, one should take $s_n = 1/\sqrt{n}$.

Although not obvious for the SΛFVS itself, when considering the dual process it is not hard to understand why the scalings (2.4) and (2.5) should lead to a non-trivial limit.

If we ignore the selective events, then a single ancestral lineage evolves as a pure jump process which is homogeneous in both space and time. Write $V_r$ for the volume of $B_r(0)$. The rate at which the lineage jumps from $y$ to $y + z$ can be written
$$m^n(dz) = n u \int_0^{\mathcal{R}_n} n^{d/2}\, \frac{V_r(0,z)}{V_r}\, \mu^n(dr)\, dz, \tag{2.6}$$
where $V_r(0,z)$ is the volume of $B_r(0) \cap B_r(z)$. To see this, by spatial homogeneity, we may take the lineage to be at the origin in $\mathbb{R}^d$ before the jump, and then, in order for it to jump to $z$, it must be affected by an event that covers both $0$ and $z$. If the event has radius $r$, then the volume of possible centres, $x$, of such events is $V_r(0,z)$ and so the intensity with which such a centre is selected is $n\, n^{d/2}\, V_r(0,z)\, \mu^n(dr)$. The parental location is chosen uniformly from the ball $B_r(x)$, so the probability that $z$ is chosen as the parental location is $dz/V_r$ and the probability that our lineage is actually affected by the event is $u$. Combining these yields (2.6).

The total rate of jumps is
$$\int_{\mathbb{R}^d} m^n(dz) = \int_0^{\mathcal{R}_n} n u\, \frac{n^{d/2}}{V_r} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \cdots$$
[…] $d \geq 3$, order $1/\log n$ in $d = 2$ and order $1/\sqrt{n}$ in $d = 1$. Therefore, in order to have a positive probability of seeing branching in the scaling limit, in $d \geq 3$ […] $s_n$ is order $1/n$. However, for $d = 2$, we need order $\log n$ branches before we expect to find one that is visible to us, hence the choice $s_n = \log n / n$.

Remark 2.6. Our scaling mirrors that described in Durrett and Zähle (2007) for a model of a hybrid zone (by which we mean a region in which we see both genetic types) which develops around a boundary between two regions, in one of which type $a$ individuals are selectively favoured and in the other of which type $A$ individuals are selectively favoured. In contrast to our continuum setting, their model is a spin system in which exactly one individual lives at each point of $\mathbb{Z}^d$.

Before formally stating our main result, we need some notation. We shall denote by $\mathrm{BBM}(p, V)$ binary branching Brownian motion started from the point $p \in \mathbb{R}^d$, with branching rate $V$ and diffusion constant given by
$$\sigma^2 = \frac{1}{d} \int_{\mathbb{R}^d} |z|^2\, m^n(dz) = \frac{1}{d} \int_{\mathbb{R}^d} \int_0^\infty |z|^2\, u\, \frac{V_r(0,z)}{V_r}\, \mu(dr)\, dz, \tag{2.8}$$
where $m^n(dz)$ is defined in (2.6). In other words, during their lifetime, which is exponentially distributed with parameter $V$, individuals follow $d$-dimensional Brownian motion with diffusion constant $\sigma^2$, at the end of which they die, leaving behind at the location where they died exactly two offspring. We view $\mathrm{BBM}(p, V)$ as a set of (continuous) paths, each starting at $p$, with precisely one path following each possible distinct sequence of branches.

Similarly, we write $\mathcal{P}^{(n)}(p)$ for the dual process of Definition 2.4, rescaled as in (2.4) and (2.5), started from a single individual at the point $p \in \mathbb{R}^d$ and viewed as a collection of paths. Each path traces out a 'potential ancestral lineage', defined exactly as the ancestral lineages in the neutral case except that at each selective event, if a lineage is affected then it jumps to the location of (either) one of the 'potential parents'. Precisely one potential ancestral lineage follows each possible route through the branching and coalescing dual process.

We define the events
$$\begin{aligned} D_n(\epsilon, T) &= \Big\{ \forall\, l \in \mathcal{P}^{(n)}(p),\ \exists\, l' \in \mathrm{BBM}(p, V) : \sup_{t \in [0,T]} |l(t) - l'(t)| \leq \epsilon \Big\}, \\ D'_n(\epsilon, T) &= \Big\{ \forall\, l \in \mathrm{BBM}(p, V),\ \exists\, l' \in \mathcal{P}^{(n)}(p) : \sup_{t \in [0,T]} |l(t) - l'(t)| \leq \epsilon \Big\}. \end{aligned} \tag{2.9}$$

Theorem 2.7. Let $d \geq 2$. There exists $V \in (0, \infty)$ such that the following holds. Let $T < \infty$, $p \in \mathbb{R}^d$; then given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for all $n \geq N$ there is a coupling between $\mathrm{BBM}(p, V)$ and $\mathcal{P}^{(n)}(p)$ with
$$\mathbb{P}\big[ D_n(\epsilon, T) \cap D'_n(\epsilon, T) \big] \geq 1 - \epsilon.$$

We will give a proof of Theorem 2.7 only for $d = 2$. The case $d \geq 3$ […]

Sketch of proof.

Consider a pair of potential ancestral lineages, $\xi^{n,1}$ and $\xi^{n,2}$, created in some selective event which, without loss of generality, we suppose happens at time zero. Suppose that we forget about further branches and when $\xi^{n,i}$ is affected by a neutral event it jumps to the location of the parent; when it is affected by a selective event it jumps to the location of one of the potential parents (picked at random). Thus $\xi^{n,1}$ and $\xi^{n,2}$ are compound Poisson processes which interact when (and only when) $|\xi^{n,1} - \xi^{n,2}| \leq 2\mathcal{R}_n$.

We choose a large constant $c > 0$. We begin by showing that $\xi^{n,1}$ and $\xi^{n,2}$ have probability $\Theta(1/\log n)$ of reaching a distance $1/(\log n)^c$ from each other without coalescing (we then say they have diverged). We also show that the probability that $\xi^{n,1}$ and $\xi^{n,2}$ have not diverged or coalesced by time $1/(\log n)^c$ is $o(1/\log n)$, so coalescence will be instantaneous in the limit. Moreover, once they are $1/(\log n)^c$ apart, they won't get within distance $2\mathcal{R}_n$ of each other again on a timescale of $O(1)$. Hence from the point of view of our scaling they stay apart and evolve essentially independently of each other.

We exploit this observation by coupling the whole rescaled dual process with a process in which diverged lineages move independently. We use an object that we call a caterpillar which is defined in the same way as the rescaled dual process, except that selective events only result in branching if at least time $1/(\log n)^c$ has elapsed since the previous branching. We stop the caterpillar at the first time a pair of lineages has either diverged or failed to coalesce in time $1/(\log n)^c$ after branching. We then start two new independent caterpillars at the positions of the pair of lineages, and continue in the same way, giving a 'branching caterpillar'.

The branching caterpillar can be coupled with the rescaled dual process by piecing together the independent Poisson point processes of events which drive each caterpillar into a single driving Poisson point process. We show that under the coupling, the branching caterpillar and the rescaled dual process coincide with high probability, using the result that lineages at a separation of at least $1/(\log n)^c$ are unlikely to interact again. Each individual caterpillar converges in an appropriate sense to a segment of a Brownian path run for an exponentially distributed lifetime, so we can couple the branching caterpillar with the limiting branching Brownian motion.

This programme is carried out in Section 4.
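The balance behind the choice of $s_n$ can be checked with back-of-envelope arithmetic. The sketch below is heuristic only and sets every constant to 1: it assumes that a lineage is hit by events at rate of order $n$ (as integrating (2.6) over $z$ suggests), that a fraction $s_n$ of those events are selective, and that a newly created pair separates with probability of order $1/\sqrt{n}$ in $d = 1$ and $1/\log n$ in $d = 2$. The order-1 value used for $d \geq 3$ is an assumption made here, as are all function names.

```python
import math

def selection_coefficient(n: float, d: int) -> float:
    """s_n from (2.5); the d = 1 value is the one quoted from Etheridge et al. (2015)."""
    if n <= 1:
        raise ValueError("n must exceed 1")
    if d == 1:
        return 1 / math.sqrt(n)
    if d == 2:
        return math.log(n) / n
    return 1 / n

def separation_probability_order(n: float, d: int) -> float:
    """Order of the probability that a new pair of lineages separates
    rather than coalescing (constants set to 1; d >= 3 value assumed)."""
    if d == 1:
        return 1 / math.sqrt(n)
    if d == 2:
        return 1 / math.log(n)
    return 1.0

def visible_branching_rate_order(n: float, d: int) -> float:
    """(event rate ~ n) x (fraction selective) x (chance the branch survives)."""
    return n * selection_coefficient(n, d) * separation_probability_order(n, d)
```

With these conventions the product is of order 1 in every dimension, which is the sense in which a branching rate that grows like $\log n$ in $d = 2$ can still produce a finite limiting branching rate $V$. Taking $n = 1/\mu$ as in Section 3, `selection_coefficient(1 / mu, 2)` evaluates to $\mu \log(1/\mu)$.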

3. Biological background.

In this section, we shall set our work in the context of the substantial biological literature. The reader concerned only with the mathematics can safely skip to Section 4.

The interplay between natural selection and the spatial structure of a population is a question of longstanding interest in population genetics. Fisher (1937) studied the advance of selectively advantageous genetic types through a one-dimensional population using the deterministic differential equation now known as the Fisher-KPP equation. This equation also makes sense in higher dimensions, but ignores genetic drift (the randomness due to reproduction in a finite population). Work incorporating genetic drift has been restricted to either one spatial dimension (see Barton et al. (2013a) and references therein) or, more commonly, to subdivided populations. Maruyama (1970) studied the probability of fixation of an advantageous genetic type (the probability that eventually the whole population carries this genetic type) in a subdivided population. The assumptions made in that article are rather strong: if we think of the population as living on islands (or in colonies), then each island has constant total population size and its contribution to the next generation is in proportion to that size. Under these assumptions, the probability of fixation is not affected by the population structure: it is the same as for a gene of the same selective advantage in an unstructured population of the same total size. Much subsequent work retained Maruyama's assumptions, and so it is often assumed that spatial structure has no influence on the accumulation of favourable genes. However, Barton (1993) showed that the extra stochasticity produced by the introduction of local extinctions and colonisations could significantly change the fixation probability. This work was extended in, for example, Cherry (2003) and Whitlock (2003).

A fundamental problem in genetics is to identify which parts of the genome have been the target of natural selection. The random nature of reproduction in finite populations means that some genetic types (alleles) will be carried by everyone in the population, even though they convey no particular selective advantage. However, if a favourable mutation arises in a population and 'sweeps' to fixation (i.e. increases in frequency until everybody carries it), we expect the genealogical trees (that is the trees of ancestral lineages) relating individuals in a sample from the population to differ from those that we observe in the absence of selection. In particular, they will be more 'star-shaped'. Of course we cannot observe the genealogical trees directly, and so, instead, geneticists exploit the fact that genes are arranged on chromosomes: the ancestry at another position on the same chromosome will be correlated with that at the part of the genome that is the target of selection. In order to detect selection one therefore examines the patterns of variation at other points on the same chromosome, so-called linked loci.

In order for this approach to work, we require sufficient variability at the linked loci that we see a signal of the distortion in the genealogical tree. This means that we must consider the genealogy of a sample from the population on the timescale set by the neutral mutation rate. If selection is too strong, the genealogy will be very short and we see no mutations and so we can recover no information about the genealogical trees; if selection is too weak, we won't be able to distinguish the patterns from those seen under neutral evolution.

Since neutral mutation rates are rather small, this means that we are interested in long timescales. Without selection, ancestral lineages in our model follow symmetric random walks with bounded variance jumps and so we expect a diffusive scaling to capture patterns of neutral variation. Since we are looking for deviations from those patterns due to the action of selection, it makes sense to consider a diffusive rescaling in the selective case too. Thus, if the neutral mutation rate is $\mu$, then we look at the rescaled dual process with $n = 1/\mu$. If the branches produced by selection persist long enough to be visible at this scale, then there is positive probability that the pattern of (neutral) variation we see in a sample from the population will look different from the pattern we'd expect without selection.

Our results in this paper are relevant to populations evolving in spatial continua. The question they address is 'When can we hope to detect a signal of natural selection in data?'. Whereas in the classical models of subdivided populations it is typically assumed that the population in each 'island' is large, so that neighbourhood size is big, by fixing the 'impact' parameter $u$ in our model, we are assuming that neighbourhood size is small. As a result, reproduction events are somewhat akin to local extinction and recolonisation events, in which a significant proportion of the local population is replaced in a single event. Our main result shows that our ability to detect selection is then critically dependent on spatial dimension. For populations living in at least three spatial dimensions (of which there are very few), spatial structure has a rather weak effect. However, in two spatial dimensions, selection must be stronger and in one spatial dimension (as appropriate for example for populations living in intertidal zones) much stronger, before we can expect to be able to detect it. The explanation is that in low dimensions, it is harder for individuals carrying the favoured gene to escape the competition posed by close relatives who carry the same gene. In our mathematical work, this is reflected in the vast majority of branches in our dual process being cancelled by a coalescence event on a timescale which is negligible compared to the timescale set by the neutral mutation rate so that no evidence of these branches having occurred will be seen in the pattern of neutral mutation.

4. Proof of Theorem 2.7.

Our proof is broken into two steps. First in Subsection 4.1 we consider how the pair of potential ancestral lineages created during a selective event interact with each other. In particular we find asymptotics for the probability that they diverge in a short time. This will allow us to identify the branching rate in the limiting Brownian motion. Then in Subsection 4.2 we define the caterpillar and show how to couple the dual of the SΛFVS to a system of branching caterpillars. With this construction in hand, Theorem 2.7 follows easily.

4.1. Pairs of paths.

In this subsection we are interested in the behaviour of a pair of potential ancestral lineages in the rescaled dual. In order that they be uniquely defined, if either is hit by a selective event then we (arbitrarily) declare that it jumps to the location of the first potential parent sampled in that event. In particular, if they are both affected by the same event, then they will necessarily coalesce. We write $\xi^{n,1}$ and $\xi^{n,2}$ for the resulting potential ancestral lineages and $\eta^n = \xi^{n,1} - \xi^{n,2}$ for their separation.

Throughout this subsection, we use the notation $\mathbb{P}_{[r,r']}$ to mean that $|\eta^n_0| \in [r,r']$ and we adopt the convention that estimates of $\mathbb{P}_{[r,r']}[B]$ hold uniformly for all initial laws with mass concentrated on $[r,r']$. We extend this notation to open intervals in the obvious manner. We will also write $\mathbb{P}_r = \mathbb{P}_{[r,r]}$.

We are concerned with the behaviour of two potential ancestral lineages created during a selective event which, without loss of generality, we suppose to happen at time 0. We shall then refer to $\eta^n$ as an excursion. In this case $|\eta^n_0| \le 2R_n$ and we wish to establish whether or not $|\eta^n_t|$ ever exceeds

\[ \gamma_n = \frac{1}{(\log n)^c}, \tag{4.1} \]

where, in this section, we suppose that $c \ge 3$.

Remark 4.1. We will, eventually, set $c = 4$, although any larger constant $c$ would give the same result; for now we keep the dependence on $c$ visible in our estimates.

For reasons that will soon become apparent, it is convenient to assume that $n$ is large enough that $7R_n < \gamma_n$.

The picture of an excursion $\eta^n$ that we would like to build up is, loosely speaking, as follows.

1. With probability $\kappa_n = \Theta\big(\frac{1}{\log n}\big)$, $|\eta^n|$ reaches displacement $\gamma_n$ within time $1/(\log n)^c$ and then $\xi^{n,1}$ and $\xi^{n,2}$ will not interact again before a fixed time $T > 0$. Consequently the displacement between them becomes macroscopic and we see two distinct paths in the limit. Moreover, $\kappa_n \log n \to \kappa \in (0,\infty)$ as $n \to \infty$.

2. With probability $1 - \Theta\big(\frac{1}{\log n}\big)$, $|\eta^n|$ does not reach displacement $\gamma_n$, and $\xi^{n,1}$ and $\xi^{n,2}$ coalesce within time $1/(\log n)^c$. In this case the difference between them is microscopic and we see only one path in the limit.

3. All other outcomes have probability $O\big(\frac{1}{(\log n)^{c-3/2}}\big)$, which means that we won't see them in the limit.

Much of the work in making this rigorous results from the fact that $\xi^{n,1}, \xi^{n,2}$ only evolve independently when their separation is greater than $2R_n$. Our strategy is similar to that in the proof of Lemma 4.2 in Etheridge and Véber (2012), but here we require a stronger result: rather than an estimate of the form $\kappa_n \ge C/\log n$ we need convergence of $\kappa_n \log n$.

4.1.1. Inner and outer excursions.

We shall characterise the behaviour of $\eta^n$ using several stopping times. Set $\tau^{out}_0 = 0$ and define inductively, for $i \ge 0$,

\[ \tau^{in}_i = \inf\{s > \tau^{out}_i : |\eta^n_s| \ge 5R_n\}, \qquad \tau^{out}_{i+1} = \inf\{s > \tau^{in}_i : |\eta^n_s| \le 4R_n\}. \tag{4.2} \]

We refer to the interval $[\tau^{out}_i, \tau^{in}_i)$ (and also to the path of $\eta^n$ during it) as the $i$th inner excursion and similarly to $[\tau^{in}_{i-1}, \tau^{out}_i)$ (and corresponding path) as the $i$th outer excursion.

Since a jump of $\eta^n$ has displacement at most $2R_n$, although the initial ($0$th) inner excursion starts in $(0, 2R_n]$, for $i \ge 1$ we have $|\eta^n_{\tau^{in}_i}| \in [5R_n, 7R_n]$ and $|\eta^n_{\tau^{out}_i}| \in [2R_n, 4R_n]$.

Definition 4.2. We define the stopping times

\[ \tau^{coal} = \inf\{s > 0 : |\eta^n_s| = 0\}, \qquad \tau^{div} = \inf\{s > 0 : |\eta^n_s| \ge \gamma_n\}, \qquad \tau^{over} = \frac{1}{(\log n)^c}. \]

We shall say that the $i$th inner excursion coalesces if $\tau^{coal} \in [\tau^{out}_i, \tau^{in}_i)$. Similarly, the $i$th outer excursion diverges if $\tau^{div} \in [\tau^{in}_{i-1}, \tau^{out}_i)$.

We define $\tau^{type} = \min(\tau^{coal}, \tau^{div}, \tau^{over})$ and say that $\eta^n$

1. coalesces if $\tau^{type} = \tau^{coal}$,
2. diverges if $\tau^{type} = \tau^{div}$,
3. overshoots if $\tau^{type} = \tau^{over}$.

Since almost surely $\eta^n$ only jumps a finite number of times before time $(\log n)^{-c}$, almost surely $\tau^{type}$ occurs during either an inner or an outer excursion, whose index we denote by $i^*$.

We use $\zeta_n$ to denote the distribution of the distance between the two potential parents sampled during a selective event.

Lemma 4.3. There exists $\alpha \in (0,1)$ such that, uniformly in $n$, $\mathbb{P}_{\zeta_n}[i^* > m] \le \alpha^m$.

Lemma 4.4. As $n \to \infty$, $\mathbb{P}_{\zeta_n}[\eta^n \text{ overshoots}] = O\big(\frac{1}{(\log n)^{c-3/2}}\big)$.

Lemma 4.5. As $n \to \infty$, $\mathbb{P}_{\zeta_n}[\eta^n \text{ diverges}] = \Theta\big(\frac{1}{\log n}\big)$.

Lemma 4.6. As $n \to \infty$, $\mathbb{P}_{\zeta_n}[\eta^n \text{ coalesces}] = 1 - \Theta\big(\frac{1}{\log n}\big)$.

Thus, overshoots are relatively unlikely, and typically $\eta^n$ consists of a finite number of inner/outer excursions until either (1) it coalesces, with probability $1 - \Theta\big(\frac{1}{\log n}\big)$, or (2) the two lineages separate to distance $\gamma_n$, with probability $\Theta\big(\frac{1}{\log n}\big)$.

The remainder of this Section 4.1.1 is devoted to the proof of Lemmas 4.3-4.5. Lemma 4.6 then follows immediately, since $c \ge 3$. We define

\[ \tau_r = \inf\{s > 0 : |\eta^n_s| \le r\}, \qquad \tau^r = \inf\{s > 0 : |\eta^n_s| \ge r\}. \tag{4.3} \]

Note that $\tau_0 = \tau^{coal}$.

Note that the random variables $\tau^{type}$, $\tau_r$ and so on depend implicitly on $n$; throughout this section these random variables refer to the stopping times for the process $\eta^n$.

Proof. (Of Lemma 4.3.) First consider a single inner excursion of $\eta^n$. It is easily seen that there exists some $\alpha' > 0$ such that, for all $n$:

($\dagger$) For any $x \in (0, 5R_n)$, if $|\eta^n_t| = x$ then the probability that $\eta^n$ will hit $0$ but not exit $B_{5R_n}(0)$ within its next three jumps is at least $\alpha'$.

In particular, the probability that the first three jumps of an inner excursion result in a coalescence is bounded away from $0$ uniformly for any $|\eta^n_{\tau^{out}_i}| \in [2R_n, 4R_n]$. If $i^* > m$ then at least $m$ inner excursions must occur without a coalescence.
The strong Markov property applied at the time $\tau^{out}_i$ means that, conditionally given $\eta^n_{\tau^{out}_i}$, the $i$th inner excursion is independent of $(\eta^n_t)_{t < \tau^{out}_i}$. Repeated application of this fact, coupled with ($\dagger$), shows that the probability of seeing at least $m$ inner excursions without a single coalescence is at most $(1-\alpha')^m$. This completes the proof. ∎

We will shortly require a tail estimate on the supremum of the modulus of two dimensional Brownian motion $W$, which we record first for clarity. We write $W_t = (W^1_t, W^2_t)$ and note

\begin{align}
\mathbb{P}\Big[\sup_{s \in [0,t]} |W_s - W_0| \ge x\Big]
&\le 2\,\mathbb{P}\Big[\sup_{s \in [0,t]} |W^1_s - W^1_0| \ge x/2\Big] \notag\\
&\le 4\,\mathbb{P}\Big[\sup_{s \in [0,t]} (W^1_s - W^1_0) \ge x/2\Big] \notag\\
&\le 4 e^{-x^2/(8t)}. \tag{4.4}
\end{align}

In the first line of the above we use the triangle inequality and the fact that $W^1$ and $W^2$ have the same distribution. To deduce the second line, we note that $W^1$ and $-W^1$ have the same distribution. For the final line, we use the (standard) tail estimate $\mathbb{P}[\sup_{s \in [0,t]} (B_s - B_0) \ge x] \le e^{-x^2/(2t)}$ for a one dimensional Brownian motion $B$, which can be deduced via Doob's martingale inequality applied to the submartingale $(\exp(xB_s/t))_{s \ge 0}$.

During an outer excursion, $\eta^n$ is the difference between two independent walkers and so we can use Skorohod embedding to approximate its behaviour using elementary calculations for two-dimensional Brownian motion. The next lemma exploits this to bound the duration of the outer excursion and the probability that it diverges.

Lemma 4.7. As $n \to \infty$,

\[ \mathbb{P}_{[5R_n, 7R_n]}\big[\tau^{\gamma_n} \wedge \tau_{4R_n} > (\log n)^{-c-1}\big] = O\Big(\frac{1}{(\log n)^{c-1}}\Big), \tag{4.5} \]

and

\[ \mathbb{P}_{[5R_n, 7R_n]}\big[\tau^{\gamma_n} < \tau_{4R_n}\big] = \Theta\Big(\frac{1}{\log n}\Big). \tag{4.6} \]

Proof.

For $i = 1, 2$, let $\hat\xi^{n,i}$ be a pair of independent processes such that $\hat\xi^{n,1}$ has the same distribution as $\xi^{n,1}$ and $\hat\xi^{n,2}$ has the same distribution as $\xi^{n,2}$. The process $\hat\xi^{n,1} - \hat\xi^{n,2}$ is a compound Poisson process with a rotationally symmetric jump distribution and a maximum displacement of $2R_n$ on each jump. Moreover (essentially by Skorohod's Embedding Theorem, see e.g. Billingsley (1995)), we can construct a process $\hat\eta^n$ with the same distribution as $\hat\xi^{n,1} - \hat\xi^{n,2}$ as follows.

Let $(r_m, J_m)_{m \ge 1}$ denote a sequence distributed as the jump magnitudes and jump times of $\hat\xi^{n,1} - \hat\xi^{n,2}$. Let $W$ be a two-dimensional Brownian motion with $W_0 = \hat\xi^{n,1}_0 - \hat\xi^{n,2}_0$, independent of $(r_m, J_m)_{m \ge 1}$. Now set $\hat\eta^n_t = W_{T^{(S(t))}}$ where $T^{(0)} = 0$, $J_0 = 0$,

\[ T^{(m+1)} = \inf\{s > T^{(m)} : |W_s - W_{T^{(m)}}| \ge r_m\}, \qquad S(t) = \sup\{i \ge 0 : J_i \le t\}. \tag{4.7} \]

We may then couple $\hat\eta^n = \hat\xi^{n,1} - \hat\xi^{n,2}$. We define $\hat\tau_r$ and $\hat\tau^r$ analogously to $\tau_r$ and $\tau^r$, as stopping times of the process $\hat\eta^n$.

Note that since $(\xi^{n,1}_t, \xi^{n,2}_t)_{t \le \tau_{4R_n}}$ has the same distribution as $(\hat\xi^{n,1}_t, \hat\xi^{n,2}_t)_{t \le \tau_{4R_n}}$, we may couple them so that they are almost surely equal during this time. Thus

\[ \{\hat\tau^{\gamma_n} < \hat\tau_{4R_n}\} = \{\tau^{\gamma_n} < \tau_{4R_n}\}. \]

Let $T_r$ and $T^r$ be the analogues of $\tau_r$ and $\tau^r$ for $W$ (not to be confused with $T^{(m)}$ in (4.7)). By the definition of the Skorohod embedding in (4.7) we have

\[ \mathbb{P}_{[5R_n, 7R_n]}\big[\hat\tau^{\gamma_n} < \hat\tau_{4R_n}\big] \ge \mathbb{P}_{[5R_n, 7R_n]}\big[T^{\gamma_n + 2R_n} < T_{4R_n}\big] \ge \mathbb{P}_{5R_n}\big[T^{\gamma_n + 2R_n} < T_{4R_n}\big]. \tag{4.8} \]

The right hand side concerns only the modulus of two-dimensional Brownian motion and so can be expressed in terms of the scale function for a two-dimensional Bessel process:

\[ \mathbb{P}_{5R_n}\big[T^{\gamma_n + 2R_n} < T_{4R_n}\big] = \frac{\log(5R_n) - \log(4R_n)}{\log(\gamma_n + 2R_n) - \log(4R_n)} = \Theta\Big(\frac{1}{\log n}\Big), \tag{4.9} \]

which proves the lower bound in (4.6). Similarly, to see the upper bound we note that

\[ \mathbb{P}_{[5R_n, 7R_n]}\big[\hat\tau^{\gamma_n} < \hat\tau_{4R_n}\big] \le \mathbb{P}_{[5R_n, 7R_n]}\big[T^{\gamma_n} < T_{2R_n}\big] \le \mathbb{P}_{7R_n}\big[T^{\gamma_n} < T_{2R_n}\big] = \frac{\log(7R_n) - \log(2R_n)}{\log(\gamma_n) - \log(2R_n)} = \Theta\Big(\frac{1}{\log n}\Big). \]

It remains to prove (4.5). We have

\[ \tau^{\gamma_n} \wedge \tau_{4R_n} = \hat\tau^{\gamma_n} \wedge \hat\tau_{4R_n} \le \hat\tau^{\gamma_n}. \]

Remark 4.8. The above inequality is a very crude estimate, but will be enough to prove (4.5), which in turn will be enough to give useful bounds on the duration of excursions due to the freedom in the choice of $c$.

Hence

\[ \mathbb{P}_{[5R_n, 7R_n]}\big[\tau^{\gamma_n} \wedge \tau_{4R_n} > (\log n)^{-c-1}\big] \le \mathbb{P}_{[5R_n, 7R_n]}\big[|\hat\eta^n_{(\log n)^{-c-1}}| \le \gamma_n\big]. \tag{4.10} \]

The remainder of the proof focuses on bounding the right side of (4.10). To do so, we must relate our compound Poisson process to another Brownian motion.

For $j \ge$

1, let X j = ˆ η nj/n − ˆ η n ( j − /n . Then ( X j ) j ≥ are i.i.d. and since ˆ ξ n, and ˆ ξ n, are independent, E (cid:2) | X | (cid:3) = 2 E h | ˆ ξ n, /n − ˆ ξ n, | i .Recall from (2.6) that the rate at which ˆ ξ n, jumps from y to y + z isdetermined by the intensity measure m n ( dz ) so that(4.11) E (cid:2) | X | (cid:3) = 2 n Z R | z | m n ( dz ) = 4 σ n , where σ was deﬁned in (2.8). Now recall the deﬁnition of S ( t ) in (4.7);the rate at which ˆ ξ n, jumps is R R m n ( z ) dz = Θ( n ) by (2.7), so S ( n − ) isbounded by the sum of two Poisson(Θ(1)) random variables. Hence sinceeach jump of ˆ η n is bounded by 2 R n , E (cid:2) | X | (cid:3) ≤ (2 R n ) E (cid:2) S ( n − ) (cid:3) = O ( n − ) . (4.12)Once again (since the distribution of X is rotationally symmetric) we mayuse Skorohod’s Embedding Theorem to couple ( X i ) i ≥ to a two-dimensionalBrownian motion B started at η n and a sequence υ , υ , . . . of stopping timesfor B such that setting υ = 0, ( υ i − υ i − ) i ≥ are i.i.d. and B υ i − B υ i − = X i , (4.13) BM AND SELECTION IN THE SΛFV PROCESS E [ υ i − υ i − ] = E [ | X | ] = σ n and E [( υ i − υ i − ) ] = O ( n − ) . It follows that E [ υ ⌊ tn ⌋ ] = σ ⌊ tn ⌋ n and Var( υ ⌊ tn ⌋ ) = O ( tn − ). Hence by Cheby-chev’s inequality, P [ | υ ⌊ tn ⌋ − σ t | ≥ n − / ] ≤ O ( tn − / ) . Applying this result with t = t n := (log n ) − c − , since ˆ η n ⌊ t n n ⌋ /n = B υ ⌊ tnn ⌋ wehave P [5 R n , R n ] (cid:2) | ˆ η nt n | ≤ γ n (cid:3) ≤ P " inf n | B t − B | : t ∈ [2 σ t n − n − / , σ t n + n − / ] o (4.14) ≤ γ n + n − / + 7 R n + P h(cid:12)(cid:12) ˆ η nt n − ˆ η n ⌊ t n n ⌋ /n (cid:12)(cid:12) ≥ n − / i + O ( t n n − / ) . 
(4.15)For the ﬁrst term on the right hand side we have for n suﬃciently large P h inf n | B t − B | : t ∈ [2 σ t n − n − / , σ t n + n − / ] o ≤ γ n + n − / + 7 R n i ≤ P h | B σ t n − B | ≤ γ n + 3 n − / i + P " sup t ∈ [0 , n − / ] | B t − B | ≥ n − / = O ( γ n t − n ) + O ( e − n / )= O ((log n ) − c ) , (4.16)For the second inequality, we use that the density of B t is bounded by(2 πt ) − for the ﬁrst term and we apply (4.4) for the second term.Moving on to the second term on the right hand side of (4.15), sincefrom (4.11) we have E h | ˆ η nt n − ˆ η n ⌊ t n n ⌋ /n | i = O ( n − ), by Markov’s inequality(4.17) P h | ˆ η nt n − ˆ η n ⌊ t n n ⌋ /n | ≥ n − / i = O ( n − / ) . Putting (4.16) and (4.17) into (4.15) we have P [5 R n , R n ] (cid:2) | ˆ η nt n | ≤ γ n (cid:3) = O ((log n ) − c ) . In view of (4.10), this completes the proof. (cid:4) A. ETHERIDGE, N. FREEMAN, S. PENINGTON, D. STRAULINO
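As a numerical illustration of the two scale-function ratios used for (4.6), the following Python sketch evaluates the lower bound from (4.9) and the matching upper bound as functions of $\log n$. It is only a sketch under stated assumptions: we take $R_n = R\,n^{-1/2}$ with $R = 1$ and $c = 3$ as defaults, and the function names are ours for illustration. Under these assumptions both bounds, multiplied by $\log n$, settle near the constants $2\log(5/4)$ and $2\log(7/2)$ respectively, which is consistent with the $\Theta(1/\log n)$ order.

```python
import math


def bessel2_exit_prob(log_x, log_a, log_b):
    """P_x[T^b < T_a] for the modulus of planar Brownian motion, a < x < b.

    Uses the scale function s(r) = log r of a two-dimensional Bessel process.
    Arguments are natural logarithms of the radii, which avoids underflow
    when the radii are of order n**(-1/2) for very large n.
    """
    return (log_x - log_a) / (log_b - log_a)


def divergence_bounds(log_n, c=3.0, R=1.0):
    """Lower and upper scale-function bounds, as functions of log n.

    Assumptions (ours, for illustration): R_n = R * n**(-1/2) and
    gamma_n = (log n)**(-c), with n large enough that 7 R_n < gamma_n.
    """
    log_Rn = math.log(R) - 0.5 * log_n
    log_gamma = -c * math.log(log_n)
    # log(gamma_n + 2 R_n) = log(gamma_n) + log(1 + 2 R_n / gamma_n)
    log_gamma_plus = log_gamma + math.log1p(2.0 * math.exp(log_Rn - log_gamma))
    lower = bessel2_exit_prob(math.log(5) + log_Rn, math.log(4) + log_Rn, log_gamma_plus)
    upper = bessel2_exit_prob(math.log(7) + log_Rn, math.log(2) + log_Rn, log_gamma)
    return lower, upper


if __name__ == "__main__":
    for log_n in (50.0, 1e3, 1e6):
        lo, hi = divergence_bounds(log_n)
        print(log_n, log_n * lo, log_n * hi)
```

The slow approach to the limiting constants reflects the $\log\log n$ correction coming from $\gamma_n = (\log n)^{-c}$.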

Proof. (Of Lemma 4.4.) First consider a single inner excursion. Evi-dently there exists β > n :( ‡ ) For any x ∈ (0 , R n ), if | η nt | = x then the probability that η n willeither exit B R n (0) or hit 0 within its next three jumps is at least β .Let ( J l ) l ≥ be the (a.s. ﬁnite) sequence of jump times of our inner ex-cursion, and let B k be the event that the excursion either coalesces or exits B R n (0) at one of { J k +1 , J k +2 , J k +3 } . By the strong Markov property (ap-plied at J k ) and ( ‡ ), inf { k ≥ B k = 1 } is stochastically bounded aboveby a geometric random variable G with success probability β .Moreover, for as long as η n is not at 0, the rate at which it jumps isbounded below by the rate at which ξ n, jumps, which is R R m n ( dz ) = Θ( n )where m n is given by (2.6). Hence for each l ≥ J l +1 − J l is stochasticallybounded above by E l where the ( E i ) i ≥ are i.i.d. exponential random vari-ables of this rate.Combining these observations, P (0 , R n ) h τ R n ∧ τ > n − / i ≤ P h J ⌈ n / +3 ⌉ ≥ n − / i + P h G > n / i (4.18) = O ( n − / ) + (1 − β ) n / = O ( n − / )where the last line follows by Markov’s inequality.We are now in a position to complete the proof. Recall that η n overshootsif it has neither coalesced nor diverged by time (log n ) − c . Let n be suﬃcientlylarge that (log n ) / ( n − / + (log n ) − c − ) ≤ (log n ) − c . Thus, if η n overshoots and i ∗ < (log n ) / , then at least one inner excursionmust have lasted longer than n − / or at least one outer excursion musthave lasted longer than (log n ) − c . Hence, P ζ n [ η n overshoots] ≤ (log n ) / (cid:18) P (0 , R n ) h τ R n ∧ τ > n − / i + P [5 R n , R n ] (cid:2) τ γ n ∧ τ R n > (log n ) − c − (cid:3) (cid:19) + P ζ n h i ∗ > (log n ) / i . 
Using (4.18), (4.5) and Lemma 4.3 to bound the right hand side of the aboveequation, we obtain P ζ n [ η n overshoots] ≤ (log n ) / ( O ( n − / ) + O ((log n ) − c )) + α (log n ) / BM AND SELECTION IN THE SΛFV PROCESS = O ((log n ) / − c ) , which completes the proof. (cid:4) Proof. (Of Lemma 4.5.) We note that the probability that η n divergesis bounded above by the probability that a divergent outer excursion occursbefore a coalescing inner excursion occurs. Let us write η n,i,in for the i th innerexcursion and η n,i,out for the i th outer excursion and let us write τ r,i,in , τ r,i,in and τ r,i,out , τ r,i,out for the associated equivalents of τ r and τ r . Thus, P ζ n [ η n diverges] ≤ P ζ n (cid:2) inf (cid:8) i ≥ τ γ n ,i,out < τ R n ,i,out } ≤ inf { i ≥ τ ,i,in < τ R n ,i,in (cid:9)(cid:3) . By the strong Markov property (applied successively at times τ outi and τ ini ),along with (4.6) and ( † ), the right hand side of the above equation is boundedabove by the probability that a geometric random variable with success prob-ability Θ( n ) is smaller than an (independent) geometric random variablewith success probability α ′ >

0. With this in hand, an elementary calculationshows that P ζ n [ η n diverges] = O (cid:18) n (cid:19) . It remains to prove a lower bound of the same order.In similar style to ( † ) and ( ‡ ), it is easily seen that there exists δ > n :( ⋆ ) For any x ∈ [ R n , R n ], if | η n | = x , the probability that η n will exit B R n (0) without coalescing is at least δ .We note also that ζ n is equal to n − / ζ in distribution, so since we assumedthat µ (( R , R ]) >

0, there exists ǫ > P [ ζ n ≥ R n ] ≥ ǫ for all n .Thus, applying the strong Markov Property at time τ in and using ( ⋆ ), weobtain P ζ n [ η n diverges] ≥ ǫδ P [5 R n , R n ] [ τ γ n < τ R n ] − P ζ n [ η n overshoots]= Θ (cid:18) n (cid:19) as required, where the ﬁnal statement follows from Lemma 4.7 and Lemma 4.4(since c ≥ (cid:4) A. ETHERIDGE, N. FREEMAN, S. PENINGTON, D. STRAULINO
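The "elementary calculation" behind the upper bound in the proof of Lemma 4.5 can be sketched numerically. If $G_1$ and $G_2$ are independent geometric random variables on $\{1, 2, \dots\}$ with success probabilities $p$ and $a$, then $\mathbb{P}[G_1 \le G_2] = p / (1 - (1-p)(1-a)) \le p/a$, so a success probability $p = \Theta(1/\log n)$ against a fixed $a = \alpha' > 0$ gives a bound of order $1/\log n$. The Python sketch below checks the closed form against the defining series; the function names and the reading of "smaller than" as $\le$ are our assumptions.

```python
def prob_first_before_second(p, a):
    """Closed form for P[G1 <= G2], G1 ~ Geom(p), G2 ~ Geom(a) independent,
    both supported on {1, 2, ...}."""
    return p / (1.0 - (1.0 - p) * (1.0 - a))


def prob_series(p, a, terms=10_000):
    """Same probability as a truncated series: sum_k P[G1 = k] * P[G2 >= k]."""
    return sum(
        p * (1.0 - p) ** (k - 1) * (1.0 - a) ** (k - 1)
        for k in range(1, terms + 1)
    )


if __name__ == "__main__":
    a = 0.3  # stands in for the fixed constant alpha' (hypothetical value)
    for p in (0.2, 0.05, 0.01):
        print(p, prob_first_before_second(p, a), p / a)
```

In each printed row the closed form stays below $p/a$, which is the only feature of the calculation used for the $O(1/\log n)$ bound.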

4.1.2. Production of branches.

The next step of the proof of Theorem 2.7 involves further analysis of pairs of potential ancestral lineages: first we need to check that once a pair has separated to a distance $\gamma_n$ they won't come back together again before a fixed time $K$; second we need to see that $\log n$ times the divergence probability actually converges (c.f. Lemma 4.5) as $n \to \infty$, since this will determine the branching rate in our branching Brownian motion limit. These two statements are the object of the next two lemmas.

Lemma 4.9. Fix $K \in (0,\infty)$. Then

\[ \mathbb{P}_{[(\log n)^{-c}, \infty)}\big[\tau_{4R_n} \le K\big] = O\Big(\frac{\log\log n}{\log n}\Big). \]

Lemma 4.10. There exists $\kappa \in (0,\infty)$ such that $(\log n)\,\mathbb{P}_{\zeta_n}[\eta^n \text{ diverges}] \to \kappa$ as $n \to \infty$.

The remainder of this subsection is occupied with proving Lemmas 4.9 and 4.10.
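The order $\log\log n / \log n$ in Lemma 4.9 can be anticipated from the scale function $r \mapsto \log r$ of a two-dimensional Bessel process: started at distance $x$, the modulus of planar Brownian motion reaches $4R_n$ before $x + \log n$ with probability $(\log(x + \log n) - \log x)/(\log(x + \log n) - \log(4R_n))$. The Python sketch below evaluates this ratio at the smallest admissible starting point $x = (\log n)^{-c}$, where it is largest. It is only an illustration under our own assumptions ($R_n = R\,n^{-1/2}$, $R = 1$, $c = 3$, and a hypothetical function name); it does not reproduce the remaining error terms of the proof.

```python
import math


def return_prob_term(log_n, c=3.0, R=1.0):
    """Scale-function ratio at x = (log n)**(-c), as a function of log n.

    Assumes R_n = R * n**(-1/2), so log(4 R_n) = log(4 R) - (log n) / 2.
    """
    x = log_n ** (-c)
    log_4Rn = math.log(4.0 * R) - 0.5 * log_n
    top = math.log(x + log_n) - math.log(x)
    bottom = math.log(x + log_n) - log_4Rn
    return top / bottom


if __name__ == "__main__":
    for log_n in (1e2, 1e4, 1e6):
        value = return_prob_term(log_n)
        # Normalised by log log n / log n, the ratio approaches 2 * (c + 1).
        print(log_n, value, value * log_n / math.log(log_n))
```

With these defaults the normalised ratio tends to $2(c+1) = 8$, so the term is indeed of order $\log\log n/\log n$.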

Proof. (Of Lemma 4.9.) We use the Skorohod embedding of ˆ η into theBrownian motion W , as deﬁned in (4.7), to reduce the claim to an equivalentstatement about a two-dimensional Bessel process.Recall that η n = ˆ η n = W and recall τ r from (4.3), and that ˆ τ r and T r are the analogues of τ r for ˆ η and W respectively. We have that η ns = ˆ η ns forall s ≤ τ R n so P [(log n ) − c , ∞ ) [ τ R n ≤ K ] = P [(log n ) − c , ∞ ) [ˆ τ R n ≤ K ] ≤ P [(log n ) − c , ∞ ) h T R n ≤ T ( S ( K )) i , (4.19)where we used the Skorohod embedding given in (4.7) in the last line. Forall ˜ K, C >

0, since T ( k ) is increasing in k we have(4.20) P h T ( S ( K )) ≥ ˜ K i ≤ P [ S ( K ) ≥ Cn ] + P h T ( Cn ) ≥ ˜ K i . By its deﬁnition in (4.7), S ( K ) is bounded by the sum of two Poisson randomvariables with parameter χ = K R R m n ( dz ), where m n is given by (2.6). Inparticular, χ = Θ( n ). Recall that if Z ′ is Poisson with parameter χ , then(using a Chernoﬀ bound argument) for k > χ ,(4.21) P [ Z ′ > k ] ≤ e − χ ( eχ ) k k k . BM AND SELECTION IN THE SΛFV PROCESS Hence, for C suﬃciently large, there exists δ > P [ S ( K ) ≥ Cn ] ≤ O ( e − δ n ) . Now by the deﬁnition of ( T ( m ) ) m ≥ in (4.7), and since r m ≤ R n for each m , P h T ( Cn ) ≥ ˜ K i ≤ P " Cn X i =1 R i ≥ ˜ Kn , where ( R i ) i ≥ is an i.i.d. sequence with R d = inf { t ≥ | W t | ≥ R} . Since P [ R ≥ k ] ≤ P [ R ≥ k − P [ | W k − W k − | ≤ R ] ≤ P [ | W − W | ≤ R ] k , there exists λ > E (cid:2) e λR (cid:3) < ∞ . Hence by Cram´er’s theorem, for˜ K a suﬃciently large constant, there exists δ > P h T ( Cn ) ≥ ˜ K i = O ( e − δ n ) . By (4.19) and (4.20) together with (4.22) and (4.23), we now have for ˜ K suﬃciently large(4.24) P [(log n ) − c , ∞ ) [ τ R n ≤ K ] ≤ P [(log n ) − c , ∞ ) h T R n ≤ ˜ K i + O ( e − δ n ) + O ( e − δ n ) . To ﬁnish, we note that P [(log n ) − c , ∞ ) h T R n ≤ ˜ K i ≤ sup x ≥ (log n ) − c (cid:16) P x h T R n ≤ T x +log n i + P x h T x +log n ≤ ˜ K i(cid:17) ≤ sup x ≥ (log n ) − c (cid:18) log( x + log n ) − log x log( x + log n ) − log(4 R n ) (cid:19) + P " sup t ≤ ˜ K | W t − W | ≥ log n = O (cid:18) log log n log n (cid:19) + O ( e − (8 ˜ K ) − (log n ) ) , where the second line uses the scale function for a two-dimensional Besselprocess, and the third line uses (4.4). Substituting this into (4.24), we havethe required result. (cid:4) Proof. (Of Lemma 4.10.) Let p n := P ζ n [ τ γ n < τ ]. Note that by Lemma 4.4,(4.25) | p n − P ζ n [ η n diverges] | = O (cid:18) n ) c − / (cid:19) . A. ETHERIDGE, N. FREEMAN, S. PENINGTON, D. STRAULINO

Hence by Lemma 4.5, there exist 0 < d ≤ D < ∞ such that for all n ≥ d ≤ (log n ) p n ≤ D. It follows that ( p n ) n ≥ has a subsequence ( p n k ) k ≥ such that (log n k ) p n k → κ ∈ (0 , ∞ ) . Let ǫ > N ∈ N be such that N ≥ /ǫ and | (log N ) p N − κ | ≤ ǫ. By rescaling, noting that ζ n d = ζ N ( Nn ) / , and similarly for η n , wehave(4.26) p N = P ζ n h τ γ N ( Nn − ) / < τ i . Recall, for clarity, that here (as throughout this section) τ r and τ refer tothe stopping times for the process η n .Deﬁne X n,N := | η nτ γN ( Nn − / | . Increasing N , we may assume that 7 R n <γ N ( N n − ) / ≤ γ n for n ≥ N . Thus, p n = P ζ n h τ γ N ( Nn − ) / ≤ τ γ n < τ i = E ζ n h τ γN ( Nn − / <τ P X n,N [ τ γ n < τ ] i . (4.27)Here, the ﬁrst line holds since ζ n < γ N ( N n − ) / ≤ γ n , and the secondline follows from the ﬁrst by applying the Strong Markov Property at time τ γ N ( Nn − ) / .To estimate (4.27), note that X n,N ∈ [ l n,N , r n,N ] := [ γ N ( N n − ) / , γ N ( N n − ) / + 2 R n ] . Using the Skorohod embedding deﬁned in (4.7), P [ l n,N ,r n,N ] [ τ γ n < τ ] ≥ inf x ≥ γ N ( Nn − ) / P x [ τ γ n < τ R n ] ≥ inf x ≥ γ N ( Nn − ) / P x (cid:2) T γ n +2 R n < T R n (cid:3) = log( γ N ( N n − ) / ) − log(7 R n )log( γ n + 2 R n ) − log(7 R n )= log N + O (log log N ) log n + O (log log n ) . (4.28)Note that, in the above, we (again) use the scale function for a two-dimensionalBessel process to deduce the third line.We require slightly more work to establish an upper bound. We have(4.29) P [ l n,N ,r n,N ] [ τ γ n < τ ] ≤ P [ l n,N ,r n,N ] [ τ γ n < τ R n ]+ P [ l n,N ,r n,N ] [ τ R n < τ γ n < τ ] . BM AND SELECTION IN THE SΛFV PROCESS We begin by controlling the second term on the right hand side of (4.29).By the Strong Markov Property at time τ R n , P [ l n,N ,r n,N ] [ τ R n < τ γ n < τ ] = E [ l n,N ,r n,N ] h τ R n <τ γn P | η nτ R n | [ τ γ n < τ ] i ≤ E [ l n,N ,r n,N ] h P | η nτ R n | [ τ γ n < τ ] i . 
Since (cid:12)(cid:12) η nτ R n (cid:12)(cid:12) ∈ [5 R n , R n ], using (4.6) in the same way as in the proof ofLemma 4.5,(4.30) P [ l n,N ,r n,N ] [ τ R n < τ γ n < τ ] = O (cid:18) n (cid:19) . Next, we control the ﬁrst term on the right hand side of (4.29), again usingthe Skorohod embedding (4.7): P [ l n,N ,r n,N ] [ τ γ n < τ R n ] ≤ P [ l n,N ,r n,N ] [ T γ n < T R n ] ≤ log( γ N ( N n − ) / + 2 R n ) − log(5 R n )log( γ n ) − log(5 R n )= log N + O (log log N ) log n + O (log log n ) . (4.31)Combining (4.28), (4.29), (4.30) and (4.31), P [ l n,N ,r n,N ] [ τ γ n < τ ] = log N + O (log log N )log n + O (log log n ) + O (cid:18) n (cid:19) . Hence by (4.27), p n = P ζ n h τ γ N ( Nn − ) / < τ i (cid:18) log N + O (log log N )log n + O (log log n ) + O (cid:18) n (cid:19)(cid:19) = (log N ) p N log n O (cid:0) log log N log N (cid:1) O (cid:0) log log n log n (cid:1) + O (cid:18) N (cid:19)! , where we used (4.26) in the last line. Since | (log N ) p N − κ | ≤ ǫ we obtainfor n ≥ N (log n ) p n ≥ ( κ − ǫ ) O ( log log N log N )1 + O ( log log n log n ) + O (cid:18) N (cid:19)! and (log n ) p n ≤ ( κ + ǫ ) O (cid:0) log log N log N (cid:1) O (cid:0) log log n log n (cid:1) + O (cid:18) N (cid:19)! . Letting ǫ → N → ∞ , lim n →∞ (log n ) p n = κ . The result followsby (4.25). (cid:4) A. ETHERIDGE, N. FREEMAN, S. PENINGTON, D. STRAULINO

4.2. Convergence to branching Brownian motion.

In this subsection we identify particular subsets of the dual process that we couple with objects that we call 'caterpillars'. The caterpillars play the rôle of individual branches in the limiting branching Brownian motion. Our (eventual) goal is to write down a system of 'branching caterpillars' and couple it to the SΛFVS dual. Establishing these couplings is greatly simplified by viewing the branching and coalescing dual as a deterministic function of an augmented driving Poisson point process and so our first task is to recast the SΛFVS dual in this way.

Recall that we have a fixed impact parameter u ∈ (0 , ,

1] as follows:

\[ A^1_u = [0, u], \quad \text{and for } k \ge 1, \quad A^{k+1}_u = uA^k_u \cup \big(u + (1-u)A^k_u\big). \]

Then if $U \sim \mathrm{Unif}[0,1]$, $(\mathbb{1}_{A^k_u}(U))_{k \ge 1}$ is an i.i.d. sequence of Bernoulli($u$) random variables (see Lemma 3.20 in Kallenberg (2006) for a proof in the case $u = \frac{1}{2}$, where $(\mathbb{1}_{A^k_u}(U))_{k \ge 1}$ is the binary expansion of $U$; the general case is an easy extension of this).

Let $\mathcal{X} = \mathbb{R} \times \mathbb{R}^2 \times \mathbb{R}_+ \times B_1(0)^2 \times [0,1]^2$.

Definition 4.11. Given a simple point process $\Pi$ on $\mathcal{X}$, and some $p \in \mathbb{R}^2$, we define $(\mathcal{P}_t(p, \Pi))_{t \ge 0}$ as a process on $\cup_{k=1}^{\infty} (\mathbb{R}^2)^k$ as follows.

For each $t \ge 0$, $\mathcal{P}_t(p, \Pi) = (\xi^1_t, \ldots, \xi^{N_t}_t)$ for some $N_t \ge 1$. We refer to $i$ as the index of the ancestor $\xi^i_t$. We begin at time $t = 0$ from a single ancestor $\mathcal{P}_0(p, \Pi) = \xi^1_0 = p$ and proceed as follows.

At each $(t, x, r, z_1, z_2, q, v) \in \Pi$ with $v \ge s_n$, a neutral event occurs:

1. Let $\xi^{n_1}_{t-}, \ldots, \xi^{n_m}_{t-}$ denote the ancestors in $B_r(x)$ which have not yet coalesced with an ancestor of lower index, with $n_1 < \ldots < n_m$. For $1 \le i \le m$, mark the ancestor $\xi^{n_i}_{t-}$ iff $q \in A^i_u$. Let $\xi^{r_1}_{t-}, \ldots, \xi^{r_l}_{t-}$ denote the marked ancestors.
2. If at least one ancestor is marked, we set $\xi^{r_i}_t = x + rz_1$ for each $i$ and call this the parental location for the event. We say that the ancestor $\xi^{r_i}_t$ has coalesced with the ancestor $\xi^{r_1}_t$, for each $i \ge 2$.

At each $(t, x, r, z_1, z_2, q, v) \in \Pi$ with $v < s_n$, a selective event occurs:

1. Let $\xi^{n_1}_{t-}, \ldots, \xi^{n_m}_{t-}$ denote the ancestors in $B_r(x)$ which have not yet coalesced with an ancestor of lower index, with $n_1 < \ldots < n_m$. For $1 \le i \le m$, mark the ancestor $\xi^{n_i}_{t-}$ iff $q \in A^i_u$. Let $\xi^{r_1}_{t-}, \ldots, \xi^{r_l}_{t-}$ denote the marked ancestors.
2. If at least one ancestor is marked, we set $\xi^{r_i}_t = x + rz_1$ for each $i$ and add an ancestor $\xi^{N_{t-}+1}_t = x + rz_2$. We call $x + rz_1$ and $x + rz_2$ the parental locations of the event. We say that the ancestor $\xi^{r_i}_t$ has coalesced with the ancestor $\xi^{r_1}_t$, for each $i \ge 2$.

For each $l \in \mathbb{N}$, if $\xi^l_\tau$ has coalesced with an ancestor $\xi^k_\tau$ of lower index at time $\tau$, we set $\xi^l_t = \xi^k_t$ for all $t \ge \tau$.

In the same way as for the definition of $\mathcal{P}^{(n)}(p)$ before the statement of Theorem 2.7, we shall view $(\mathcal{P}_t(p, \Pi))_{t \ge 0}$ as a collection of potential ancestral lineages. Given a realization of $\Pi$, we say that a path that begins at $p$ is a potential ancestral lineage if (1) at each neutral event that it encounters, it moves to the (single) parent and (2) at each selective event it encounters, it moves to one of the parents of that event.

Note that if $\Pi$ is a Poisson point process on $\mathcal{X}$ with intensity measure

\[ n\,dt \otimes n\,dx \otimes \mu^n(dr) \otimes \pi^{-1}dz_1 \otimes \pi^{-1}dz_2 \otimes dq \otimes dv \tag{4.32} \]

then as a collection of potential ancestral lineages, $(\mathcal{P}_t(p, \Pi))_{t \ge 0}$ has the same distribution as $\mathcal{P}^{(n)}(p)$.

When $\Pi$ takes this form, the result is that the driving Poisson Point Process in (2.4) has been augmented by components that determine the nature of each event (neutral or selective), the parental locations of each event and which lineages in the region of the event are affected by it. We have abused notation by retaining the notation $\Pi$ for this augmented process.

4.2.1. The caterpillar.

We now introduce the notion of a caterpillar, which involves following a pair of potential ancestral lineages in the dual. We stop the caterpillar if the pair of lineages reaches displacement of $(\log n)^{-c}$, or if the pair does not coalesce within time $(\log n)^{-c}$ after last branching. While doing so, we suppress the creation of the second potential parent at any selective events that occur within time $(\log n)^{-c}$ of the previous (unsuppressed) selective event.

Let $\Pi$ be a Poisson point process on $\mathcal{X}$ with intensity measure (4.32). We write $(\mathcal{P}_t(p,\Pi))_{t\ge 0} = (\xi^1_t,\ldots,\xi^{N_t}_t)_{t\ge 0}$ as defined in Definition 4.11.

Definition 4.12. For $p \in \mathbb{R}^2$, we define a lifetime $h(p,\Pi) > 0$, and a process $(c_t(p,\Pi))_{0\le t\le h(p,\Pi)}$ on $(\mathbb{R}^2)^2$, which we shall refer to as a caterpillar. For each $t \ge 0$, we write $c_t(p,\Pi) = \bigl(c^1_t(p,\Pi), c^2_t(p,\Pi)\bigr)$, dropping the dependence on $(p,\Pi)$ from our notation, when convenient. As part of the definition, we will also define $k^*(p,\Pi) \in \mathbb{N}$ and a sequence $(\tau^{\mathrm{br}}_k)_{k\le k^*}$ of stopping times.

Set $\tau^{\mathrm{br}}_0 = 0$ and let $\tau^{\mathrm{br}}_1$ be the time of the first selective event after $(\log n)^{-c}$ to affect $\xi^1$. For $t \le \tau^{\mathrm{br}}_1$, let $c^1_t = c^2_t = \xi^1_t$.

Then, for $k \ge 1$, suppose we have defined $(\tau^{\mathrm{br}}_l)_{l\le k}$; let $m(k) = N_{\tau^{\mathrm{br}}_k}$. For $t \in [\tau^{\mathrm{br}}_k, \tau^{\mathrm{br}}_k + (\log n)^{-c}]$, define $c^1_t(p,\Pi) = \xi^1_t$ and $c^2_t(p,\Pi) = \xi^{m(k)}_t$. In analogy with Definition 4.2, define

$$\begin{aligned}
\tau^{\mathrm{div}}_k &= \inf\{t \ge \tau^{\mathrm{br}}_k : |c^1_t - c^2_t| \ge (\log n)^{-c}\},\\
\tau^{\mathrm{coal}}_k &= \inf\{t \ge \tau^{\mathrm{br}}_k : c^1_t = c^2_t\},\\
\tau^{\mathrm{over}}_k &= \tau^{\mathrm{br}}_k + (\log n)^{-c},
\end{aligned} \tag{4.33}$$

and let $\tau^{\mathrm{type}}_k = \min(\tau^{\mathrm{div}}_k, \tau^{\mathrm{coal}}_k, \tau^{\mathrm{over}}_k)$. If $\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k$ then set $k^*(p,\Pi) = k$ and $h(p,\Pi) = \tau^{\mathrm{type}}_{k^*}$. The definition is then complete. If not, we proceed as follows.

Let $\tau^{\mathrm{br}}_{k+1}$ be the time of the first selective event occurring strictly after $\tau^{\mathrm{br}}_k + (\log n)^{-c}$ to affect $\xi^1$. For $t \in [\tau^{\mathrm{br}}_k + (\log n)^{-c}, \tau^{\mathrm{br}}_{k+1})$, let $c^1_t(p,\Pi) = c^2_t(p,\Pi) = \xi^1_t$. We then continue iteratively for each $k \le k^*(p,\Pi)$.

We refer to $(\tau^{\mathrm{br}}_k)_{k\le k^*}$, the times at which a selective event results in branching, as branching events. We shall abuse our previous terminology and say that a branching event diverges, coalesces or overshoots when the same is true of the excursion corresponding to the pair $(c^1, c^2)$.

Remark 4.13. Note that $(c_t)_{t\ge 0}$ is not a Markov process with respect to its natural filtration, since $c^1$ and $c^2$ are not allowed to branch off from each other within $(\log n)^{-c}$ of the previous branching event. However, for $i = 1, 2$, $(c^i_t(p,\Pi))_{0\le t\le h(p,\Pi)}$ is a Markov process with the same jump rate and jump distribution as a single potential ancestral lineage in the rescaled SΛFVS dual. Moreover for each $1 \le k \le k^*$, $(c^1_t, c^2_t)_{\tau^{\mathrm{br}}_k \le t \le \tau^{\mathrm{type}}_k}$ is an excursion as defined in Section 4.1.

Recall the definition of $m_n(dz)$ from (2.6) and let

$$\kappa_n = (\log n)\,\mathbb{P}\bigl[\tau^{\mathrm{type}}_1 \neq \tau^{\mathrm{coal}}_1\bigr] \quad\text{and}\quad \lambda = n^{-1}\int_{\mathbb{R}^2} m_n(dz) = \Theta(1). \tag{4.34}$$

By combining Lemma 4.10 and Lemma 4.4,

$$\kappa_n \to \kappa \tag{4.35}$$

as $n \to \infty$.

By the strong Markov property of $\Pi$, and since $\tau^{\mathrm{type}}_k \le \tau^{\mathrm{br}}_k + (\log n)^{-c} \le \tau^{\mathrm{br}}_{k+1}$ for each $k$, the types of the selective events, $(\mathbf{1}\{\tau^{\mathrm{type}}_k = \tau^{\mathrm{div}}_k\})_{k\ge 1}$, $(\mathbf{1}\{\tau^{\mathrm{type}}_k = \tau^{\mathrm{coal}}_k\})_{k\ge 1}$ and $(\mathbf{1}\{\tau^{\mathrm{type}}_k = \tau^{\mathrm{over}}_k\})_{k\ge 1}$ are each i.i.d. sequences. Thus,

$$k^*(p,\Pi) \sim \mathrm{Geom}\bigl(\kappa_n(\log n)^{-1}\bigr). \tag{4.36}$$

By (4.35), there exist constants $0 < a \le A < \infty$ such that $\kappa_n \in [a, A]$ for all $n$ sufficiently large, so

$$\mathbb{P}\bigl[k^* \ge (\log n)^{\,/\,}\bigr] = \Bigl(1 - \frac{\kappa_n}{\log n}\Bigr)^{(\log n)^{\,/\,}} = O\bigl(e^{-\delta(\log n)^{\,/\,}}\bigr) \tag{4.37}$$

for some $\delta > 0$.

Lemma 4.14. We can couple $h(p,\Pi)$ with $H \sim \mathrm{Exp}(\kappa_n\lambda)$ in such a way that for some $\delta > 0$, with probability at least $1 - O(e^{-\delta(\log n)^{\,/\,}})$,
$$|h(p,\Pi) - H| \le 3(\log n)^{-\,/\,}.$$

Proof.

Recall the definition of $\lambda$ in (4.34). Since the total rate at which $c^1$ jumps is given by $\lambda n$, and each jump is from a selective event independently with probability $s_n = \frac{\log n}{n}$, by the strong Markov property of $\Pi$ we have that

$$E_k := \tau^{\mathrm{br}}_k - \bigl(\tau^{\mathrm{br}}_{k-1} + (\log n)^{-c}\bigr) \sim \mathrm{Exp}(\lambda\log n) \tag{4.38}$$

and $(E_k, \mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\})_{k\ge 1}$ is an i.i.d. sequence.

Since (for example) $\mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\}$ is not independent of the radius of the event at $\tau^{\mathrm{br}}_k$, we note that $E_k$ and $\mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\}$ are not independent; therefore $(E_k)_{k\ge 1}$ is not independent of $k^*$. However, we can couple $(E_k, \mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\})_{k\ge 1}$ with a sequence $(E'_k)_{k\ge 1}$ which is independent of $k^*$ as follows.

First sample the sequence $(\mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\})_{k\ge 1}$, and then independently sample a sequence $(E'_k, A_k)_{k\ge 1}$ with the same distribution as $(E_k, \mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\})_{k\ge 1}$. Then, for each $k \ge 1$, if $A_k = \mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\}$ set $E_k = E'_k$, and if not sample $E_k$ according to its conditional distribution given $\mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\}$.

We now have a coupling of $(E_k, \mathbf{1}\{\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k\})_{k\ge 1}$ and $(E'_k)_{k\ge 1}$ such that $(E'_k)_{k\ge 1}$ is an i.i.d. sequence, independent of $k^*$, with $E'_1 \sim \mathrm{Exp}(\lambda\log n)$. Also, since $\mathbb{P}[\tau^{\mathrm{type}}_k \neq \tau^{\mathrm{coal}}_k] = \Theta((\log n)^{-1})$, we have that independently for each $k$, $E_k = E'_k$ with probability at least $1 - \Theta((\log n)^{-1})$.

We write
$$\sum_{k=1}^{k^*} E_k = \sum_{k=1}^{k^*} E'_k + \sum_{k=1}^{k^*} D_k,$$
where $D_k = E_k - E'_k$ and, by (4.36), $\sum_{k=1}^{k^*} E'_k \sim \mathrm{Exp}(\lambda\kappa_n)$.

Our next step is to bound $\sum_{k=1}^{k^*} D_k$. Firstly, applying a Chernoff bound to the binomial distribution yields

$$\begin{aligned}
\mathbb{P}\Bigl[\bigl|\bigl\{k < (\log n)^{\,/\,} : D_k \neq 0\bigr\}\bigr| \ge (\log n)^{\,/\,}\Bigr]
&= \mathbb{P}\Bigl[\mathrm{Bin}\bigl((\log n)^{\,/\,}, \Theta((\log n)^{-1})\bigr) \ge (\log n)^{\,/\,}\Bigr]\\
&= O\bigl(\exp(-\delta'(\log n)^{\,/\,})\bigr)
\end{aligned} \tag{4.39}$$

for some $\delta' > 0$. Secondly,

$$\begin{aligned}
\mathbb{P}\bigl[|D_1| \ge (\log n)^{-\,/\,}\bigr]
&\le \mathbb{P}\bigl[E_1 \ge (\log n)^{-\,/\,}\bigr] + \mathbb{P}\bigl[E'_1 \ge (\log n)^{-\,/\,}\bigr]\\
&= 2\exp\bigl(-\lambda(\log n)^{\,/\,}/\,\bigr).
\end{aligned} \tag{4.40}$$

Combining (4.37), (4.39) and (4.40), we have that

$$\mathbb{P}\Bigl[\sum_{k=1}^{k^*} D_k \ge (\log n)^{-\,/\,}\Bigr] = O\bigl(e^{-\delta''(\log n)^{\,/\,}}\bigr), \tag{4.41}$$

for some $\delta'' \in (0, \delta)$.

Note that
$$\sum_{k=1}^{k^*} E_k = \tau^{\mathrm{br}}_{k^*} - k^*(\log n)^{-c} = h - k^*(\log n)^{-c} - (\tau^{\mathrm{type}}_{k^*} - \tau^{\mathrm{br}}_{k^*}),$$
with $0 \le \tau^{\mathrm{type}}_{k^*} - \tau^{\mathrm{br}}_{k^*} \le (\log n)^{-c}$. Let $H = \sum_{k=1}^{k^*} E'_k$. Then by (4.37) and (4.41), we have
$$\mathbb{P}\Bigl[|h(p,\Pi) - H| \ge (\log n)^{\,/\,-c} + (\log n)^{-c} + (\log n)^{-\,/\,}\Bigr] = O\bigl(e^{-\delta''(\log n)^{\,/\,}}\bigr).$$
The result follows since $c \ge 3$. $\blacksquare$

Our next step is to show that a caterpillar is unlikely to end with an overshooting event.
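The key distributional fact used in the proof of Lemma 4.14 above is that a geometric number of i.i.d. exponential waiting times, with the count independent of the waiting times, is again exponential: a $\mathrm{Geom}(\kappa_n/\log n)$ sum of $\mathrm{Exp}(\lambda\log n)$ variables is $\mathrm{Exp}(\lambda\kappa_n)$. The following Python sketch checks this numerically. The function names and the numerical values of `lam`, `kappa` and `log_n` are illustrative assumptions and do not come from the paper.

```python
import math
import random


def geometric_sum_of_exponentials(rate, stop_prob, rng):
    """Add independent Exp(rate) waiting times until an independent
    Bernoulli(stop_prob) trial succeeds; return the accumulated time."""
    total = 0.0
    while True:
        total += rng.expovariate(rate)
        if rng.random() < stop_prob:
            return total


def summarize(lam, kappa, log_n, samples=20000, seed=1):
    """Return (empirical mean, empirical P[sum > 1/(lam*kappa)])."""
    rng = random.Random(seed)
    rate = lam * log_n          # each waiting time is Exp(lambda * log n)
    stop_prob = kappa / log_n   # success probability of the geometric count
    threshold = 1.0 / (lam * kappa)
    draws = [geometric_sum_of_exponentials(rate, stop_prob, rng)
             for _ in range(samples)]
    mean = sum(draws) / samples
    tail = sum(d > threshold for d in draws) / samples
    return mean, tail


if __name__ == "__main__":
    lam, kappa, log_n = 1.5, 0.8, 20.0  # hypothetical values, for illustration only
    mean, tail = summarize(lam, kappa, log_n)
    # For an Exp(lam * kappa) variable the mean is 1/(lam*kappa)
    # and the probability of exceeding the mean is exp(-1).
    print(mean, 1.0 / (lam * kappa))
    print(tail, math.exp(-1.0))
```

The empirical mean and tail probability should be close to those of an $\mathrm{Exp}(\lambda\kappa)$ variable whatever value of `log_n` is used, which is the sense in which the growing branching rate and the vanishing survival probability of a branch balance each other.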

Lemma . As n → ∞ , P h τ type k ∗ = τ over k ∗ i = O (cid:16) (log n ) − c (cid:17) . Proof.

By Lemma 4.4, for k ≥ P [ τ type k = τ over k ] = O ((log n ) − c ) . Moreover, { τ type k ∗ = τ over k ∗ } ⊂ { k ∗ ≥ (log n ) / } ∪ (log n ) / [ k =1 { τ type k = τ over k } . It follows, using (4.37), that P [ τ type k ∗ = τ over k ∗ ] = O ( e − δ (log n ) / ) + O ((log n ) + − c ) = O ((log n ) − c ) . This completes the proof. (cid:4)

We now show that a single caterpillar can be coupled to a Brownian motion in such a way that the caterpillar closely follows the Brownian motion, during time $[0, h(p,\Pi)]$.

Recall that the rate at which $\xi^1$ jumps from $y$ to $y + z$ is given by intensity measure $m_n(dz)$, defined in (2.6). Thus for $(c^1_t)_{t\ge 0}$ started at $p$, $\mathbb{E}[c^1_t] = p$ and the covariance matrix of $c^1_t$ is $\sigma^2 t\,\mathrm{Id}$ since by (2.8),
$$\sigma^2 = \frac{1}{2}\int_{\mathbb{R}^2} |z|^2\, m_n(dz).$$
Armed with this, the following lemma is no surprise.

Lemma . Let ( W t ) t ≥ be a two-dimensional Brownian motion with W = p . We can couple ( c t ( p, Π)) t ≤ h ( p, Π) with ( W t ) t ≥ , in such a way that ( W t ) t ≥ is independent of ( τ brk ) k ≥ and k ∗ ( p, Π) , and for any r > , withprobability at least − O ((log n ) − r ) , for t ≤ h ( p, Π) , | c t ( p, Π) − W σ t | ≤ (log n ) − c . Remark . By the deﬁnition of the caterpillar in Deﬁnition 4.12,for all t ≤ h ( p, Π) , | c t − c t | ≤ (log n ) − c . Hence under the coupling ofLemma 4.16, with probability at least − O ((log n ) − r ) , | c t ( p, Π) − W σ t | ≤ n ) − c . Proof.

The proof is closely related to the second half of the proof ofLemma 4.7. Note for k ≥

0, on the time interval [ τ brk + (log n ) − c , τ brk +1 ), c t is a pure jump process with rate of jumps from y to y + z given by A. ETHERIDGE, N. FREEMAN, S. PENINGTON, D. STRAULINO (1 − s n ) m n ( dz ). Let (˜ c t ) t ≥ be a pure jump process with ˜ c = 0 and rate ofjumps from y to y + z given by (1 − s n ) m n ( dz ). For i ≥

1, let X i = ˜ c i/n − ˜ c ( i − /n . Then ( X i ) i ≥ are i.i.d., and as in (4.11) and (4.12), we have E [ | X | ] = σ (1 − s n ) n and E [ | X | ] = O ( n − ).By the same Skorohod embedding argument as for (4.13), there is a two-dimensional Brownian motion W started at 0 and a sequence υ , υ , . . . ofstopping times for W such that for i ≥ W υ i = ˜ c i/n and P [ | υ ⌊ tn ⌋ − σ (1 − s n ) t | ≥ n − / ] ≤ O ( tn − / ) . Fix t >

0. Since s n = log nn , for n suﬃciently large, P [ | υ ⌊ tn ⌋ − σ t | ≥ n − / ] ≤ O ( n − / ) . Then by a union bound over j = 1 , . . . , ⌊ n / t ⌋ , P h ∃ j ≤ ⌊ n / t ⌋ : | υ ⌊ jn / ⌋ − σ jn − / | ≥ n − / i ≤ ( n / t ) O ( n − / )(4.43) = O ( n − / ) . Again by a union bound over j , P (cid:20) ∃ j ≤ ⌊ n / t ⌋ : sup (cid:26) | W σ jn − / − W u | : u ∈ [ σ jn − / − n − / , σ ( j + 1) n − / + 2 n − / ] (cid:27) ≥ n − / (cid:21) ≤ ( n / t )2 P h sup {| W s − W | : s ∈ [0 , n − / ] } ≥ n − / i ≤ n / t exp( − n / / o ( n − / ) . (4.44)Here, the last line follows by (4.4).Under the complement of the event of (4.43), for all j < ⌊ n / t ⌋ , | υ ⌊ jn / ⌋ − σ jn − / | ≤ n − / and | υ ⌊ ( j +1) n / ⌋ − σ ( j + 1) n − / | ≤ n − / , which implies that for i such that jn − / ≤ in − ≤ ( j + 1) n − / , υ i ∈ h σ jn − / − n − / , σ ( j + 1) n − / + 2 n − / i . BM AND SELECTION IN THE SΛFV PROCESS Hence combining (4.43) and (4.44), P h ∃ i ≤ ⌊ tn ⌋ : | ˜ c i/n − W σ i/n | ≥ n − / i = O ( n − / ) . Our next step is to control | ˜ c s − ˜ c i/n | during the interval s ∈ [ i/n, ( i +1) /n ]. The distribution of the number of jumps made by ˜ c on an interval[ i/n, ( i + 1) /n ] is Poisson with parameter (1 − s n ) λ , where λ is given by(4.34), and the maximum jump size is 2 R n ; using (4.21) with χ = (1 − s n ) λ and k = log n gives that P " ∃ i ≤ ⌊ tn ⌋ : sup s ∈ [ i/n, ( i +1) /n ] | ˜ c s − ˜ c i/n | ≥ (log n )2 R n = o ( n − ) . Hence for n large enough that (log n )2 R n ≤ n − / , using (4.44) again tobound | W s − W σ i/n | during the interval [ σ i/n, σ ( i + 1) /n ] we have(4.45) P (cid:20) sup s ≤ t | ˜ c s − W σ s | ≥ n − / (cid:21) = O ( n − / ) . We now apply this coupling to ( c t ) τ brk +(log n ) − c ≤ t ≤ τ brk +1 for each k ≥

0, andlet the caterpillar evolve independently of the Brownian motion on eachinterval [ τ br k , τ br k + (log n ) − c ].More precisely, let (˜ c k ) k ≥ be an i.i.d. sequence of pure jump processeswith ˜ c k = 0 and rate of jumps from y to y + z given by (1 − s n ) m n ( dz ). Let( W k ) k ≥ be an i.i.d. sequence of 2-dimensional Brownian motions started at0 and for each k ≥

0, couple W k and ˜ c k in the same way as above, so thatfor ﬁxed t >

0, for each k ≥ P (cid:20) sup s ≤ t | ˜ c ks − W kσ s | ≥ n − / (cid:21) = O ( n − / ) . Then by the Strong Markov property for the process c , we can couple(˜ c k , W k ) k ≥ and c in such a way that for k ≥ s ∈ [0 , τ brk +1 − ( τ brk +(log n ) − c )), c s + τ brk +(log n ) − c − c τ brk +(log n ) − c = ˜ c ks . and (˜ c k , W k ) k ≥ is independent of (cid:0) τ br k , ( c t − c t ) | [ τ br k ,τ br k +(log n ) − c ) (cid:1) k ≥ .Let B be another independent 2-dimensional Brownian motion started at0. We now deﬁne a single Brownian motion W by piecing together incrementsof B and ( W k ) k ≥ . For s < σ (log n ) − c , let W s = B s + p . Then for k ≥ A. ETHERIDGE, N. FREEMAN, S. PENINGTON, D. STRAULINO deﬁne the increments of W on the time interval [ σ ( τ brk +(log n ) − c ) , σ ( τ brk +1 +(log n ) − c )) as follows. For s ∈ [0 , σ ( τ brk +1 − τ brk )), let W s + σ ( τ brk +(log n ) − c ) − W σ ( τ brk +(log n ) − c ) = W ks . Then W is a Brownian motion independent of (cid:0) τ br k , ( c t − c t ) | [ τ br k ,τ br k +(log n ) − c ) (cid:1) k ≥ ,which implies that W is independent of both k ∗ and ( τ br k ) k ≥ .We now check that W t is close to c t for t < h . By (4.38), P h τ br k +1 − τ br k ≥ n ) − c i ≤ n − λ . Hence applying (4.46) with t = 1 + (log n ) − c for each k ≤ (log n ) / andusing (4.37), we have that with probability at least 1 − O ( e − δ (log n ) / ), for0 ≤ k ≤ k ∗ and t ∈ [ τ br k + (log n ) − c , τ br k +1 ),(4.47) (cid:12)(cid:12)(cid:12)(cid:16) c t − c τ br k +(log n ) − c (cid:17) − (cid:16) W σ t − W σ ( τ br k +(log n ) − c ) (cid:17)(cid:12)(cid:12)(cid:12) ≤ n − / . For each k , by (4.4), P h sup n | W σ t − W σ τ br k | : t ∈ [ τ br k , τ br k + (log n ) − c ] o ≥ (log n ) − c/ i ≤ − (log n ) c/ / σ )= o (cid:16) (log n ) − r − (cid:17) , (4.48)for any r >

0. Hence, using (4.37) again, P " k ∗ X k =1 sup n | W σ t − W σ τ br k | : t ∈ [ τ br k , τ br k + (log n ) − c ] o ≥ (log n ) − c ≤ P h k ∗ ≥ (log n ) / i + (log n ) / o ((log n ) − r − )= o ((log n ) − r ) . (4.49)For k ≥

0, on the time interval [ τ br k , τ br k + (log n ) − c ] the process c t is a purejump process with rate of jumps from y to y + z given by m n ( dz ). Henceusing the same Skorohod embedding argument as for (4.45), we can couple( c s + τ br k − c τ br k ) s ≤ (log n ) − c with a Brownian motion W ′ started at 0 in such away that P " sup s ≤ (log n ) − c | ( c s + τ br k − c τ br k ) − W σ s | ≥ n − / = O ( n − / ) . BM AND SELECTION IN THE SΛFV PROCESS Applying (4.49) and (4.37), it follows that P (cid:20) k ∗ X k =1 sup n | c t − c τ br k | : t ∈ [ τ br k , τ br k + (log n ) − c ] o ≥ (log n ) − c + 4 n − / (log n ) / (cid:21) = O ((log n ) − r ) . The stated result follows by combining the above equation with (4.47),(4.37) and (4.49). (cid:4)
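The moment computation behind the coupling above can be checked numerically: a planar pure jump process with a rotationally symmetric jump law, started at the origin, has mean zero and covariance matrix $\sigma^2 t\,\mathrm{Id}$, where $\sigma^2$ is half of the jump rate times $\mathbb{E}|z|^2$. The Python sketch below is a toy illustration only. It assumes jumps that are uniform on a disc, which is a stand-in for, and not the same as, the measure $m_n(dz)$ of the paper; all names and parameter values are hypothetical.

```python
import math
import random


def jump_process_position(rate, radius, t, rng):
    """Position at time t of a planar pure jump process started at the origin.
    Jumps arrive at the given rate; each displacement is uniform on a disc."""
    x = y = 0.0
    clock = rng.expovariate(rate)
    while clock <= t:
        r = radius * math.sqrt(rng.random())   # uniform point in the disc
        theta = 2.0 * math.pi * rng.random()
        x += r * math.cos(theta)
        y += r * math.sin(theta)
        clock += rng.expovariate(rate)
    return x, y


def sigma_squared(rate, radius):
    """Half of rate * E|z|^2; for a uniform disc, E|z|^2 = radius^2 / 2."""
    return 0.5 * rate * (radius ** 2 / 2.0)


def empirical_moments(rate, radius, t, samples=4000, seed=0):
    """Return ((mean_x, mean_y), (var_x, var_y, cov_xy)) from simulation."""
    rng = random.Random(seed)
    pts = [jump_process_position(rate, radius, t, rng) for _ in range(samples)]
    mx = sum(p[0] for p in pts) / samples
    my = sum(p[1] for p in pts) / samples
    vxx = sum((p[0] - mx) ** 2 for p in pts) / samples
    vyy = sum((p[1] - my) ** 2 for p in pts) / samples
    vxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / samples
    return (mx, my), (vxx, vyy, vxy)


if __name__ == "__main__":
    rate, radius, t = 50.0, 0.2, 1.0   # hypothetical values
    print(empirical_moments(rate, radius, t))
    print(sigma_squared(rate, radius) * t)
```

With many small jumps per unit time the simulated variances are close to `sigma_squared(rate, radius) * t` and the cross-covariance is close to zero, which is the input the Skorohod embedding argument needs.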

4.2.2. The branching caterpillar.

We now construct a branching process of caterpillars. We start from a single caterpillar and allow it to evolve until the time $h$. We start two independent caterpillars from the locations of $c^1_h$ and $c^2_h$. Now iterate. The independent caterpillars defined in this way will be indexed by points of $\mathcal{U} = \{\emptyset\} \cup \bigcup_{k=1}^{\infty} \{1,2\}^k$. More formally:

Definition 4.18. Let $(\Pi_j)_{j\in\mathcal{U}}$ be a sequence of independent Poisson point processes on $\mathcal{X}$ with intensity measure (4.32). For $p \in \mathbb{R}^2$, we define $(\mathcal{C}_t(p, (\Pi_j)_{j\in\mathcal{U}}))_{t\ge 0}$ as a process on $\bigcup_{k=1}^{\infty} (\mathbb{R}^2)^k$ as follows. For $s > 0$, let

$$\Pi^s_j = \{(t - s, x, r, z_1, z_2, q, v) : (t, x, r, z_1, z_2, q, v) \in \Pi_j\}. \tag{4.50}$$

Define $(p_j, t_j, h_j)$ inductively for $j \in \mathcal{U}$ by $p_\emptyset = p$, $t_\emptyset = 0$ and

$$\begin{aligned}
h_j &= t_j + h(p_j, \Pi^{t_j}_j),\\
t_{(j,1)} &= t_{(j,2)} = h_j,\\
p_{(j,1)} &= c^1_{h_j - t_j}(p_j, \Pi^{t_j}_j),\\
p_{(j,2)} &= c^2_{h_j - t_j}(p_j, \Pi^{t_j}_j).
\end{aligned}$$

Finally, define $\mathcal{U}(t) = \{j \in \mathcal{U} : t_j \le t \le h_j\}$ and
$$\mathcal{C}_t(p, (\Pi_j)_{j\in\mathcal{U}}) = \bigl(c_{t - t_j}(p_j, \Pi^{t_j}_j)\bigr)_{j\in\mathcal{U}(t)}.$$

In words, $\mathcal{U}(t)$ is the set of indices of the caterpillars that are active at time $t$, and $\mathcal{C}_t$ is the set of (positions of) those caterpillars. Note that we translate the time coordinates in (4.50) to match our definition of a caterpillar, which began at time 0. The jumps in $\mathcal{C}_t$ occur at the time coordinates of events in $\bigcup_{j\in\mathcal{U}} \Pi_j$.
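The inductive bookkeeping for $(t_j, h_j)$ and the active set $\mathcal{U}(t)$ can be sketched in a few lines of Python. This is a simplified illustration and not the construction of the paper: the lifetime $h(p_j, \Pi^{t_j}_j)$ of each caterpillar is replaced by an independent exponential variable (an assumption motivated by Lemma 4.14), spatial positions are ignored, and the root label $\emptyset$ is represented by the empty tuple. The function names are hypothetical.

```python
import random


def build_lifetimes(rate, horizon, rng):
    """Return {label: (t_j, h_j)} for every label born no later than the horizon.
    Labels are tuples over {1, 2}; the empty tuple is the root."""
    intervals = {}
    stack = [((), 0.0)]
    while stack:
        label, birth = stack.pop()
        death = birth + rng.expovariate(rate)   # stand-in for the lifetime h
        intervals[label] = (birth, death)
        if death <= horizon:
            # both children start at the death time of their parent
            stack.append((label + (1,), death))
            stack.append((label + (2,), death))
    return intervals


def active_labels(intervals, t):
    """The analogue of U(t): labels j with t_j <= t <= h_j."""
    return {j for j, (tj, hj) in intervals.items() if tj <= t <= hj}


if __name__ == "__main__":
    rng = random.Random(7)
    intervals = build_lifetimes(rate=1.0, horizon=3.0, rng=rng)
    for t in (0.0, 1.5, 3.0):
        print(t, sorted(active_labels(intervals, t), key=lambda j: (len(j), j)))
```

Because each index that ends is replaced by exactly two children, the number of active indices at a time $t$ below the horizon is one more than the number of lifetimes that ended before $t$; the length of the longest label in `active_labels` is the length of the longest chain of the kind bounded in the text.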

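We now show that for any constant $a > 0$, with high probability, the longest 'chain' of caterpillars has length at most $a\log\log n + 1$. For $k \in \mathbb{N}$, let $\mathcal{U}_k = \{\emptyset\} \cup \bigcup_{j=1}^{k} \{1,2\}^j$.

Lemma 4.19. Fix $T > 0$; then for any $r > 0$, $a > 0$,
$$\mathbb{P}\bigl[\mathcal{U}(T) \not\subseteq \mathcal{U}_{\lfloor a\log\log n\rfloor}\bigr] = o\bigl((\log n)^{-r}\bigr).$$

Proof. Fix $v \in \{1,2\}^{\lfloor a\log\log n\rfloor + 1}$. Then by a union bound,

$$\mathbb{P}\bigl[\exists\, w \in \{1,2\}^{\lfloor a\log\log n\rfloor + 1} \text{ s.t. } t_w \le T\bigr] \le 2^{\lfloor a\log\log n\rfloor + 1}\,\mathbb{P}[t_v \le T]. \tag{4.51}$$

Note that by Lemma 4.14, $t_v = \sum_{i=1}^{\lfloor a\log\log n\rfloor + 1} H_i + R$ where $(H_i)_{i\ge 1}$ are i.i.d. with $H_1 \sim \mathrm{Exp}(\lambda\kappa_n)$ and
$$\mathbb{P}\bigl[R \ge 3(a\log\log n + 1)(\log n)^{-\,/\,}\bigr] = O\bigl((\log\log n)\,e^{-\delta(\log n)^{\,/\,}}\bigr).$$
Hence (if $n$ is sufficiently large that $3(a\log\log n + 1)(\log n)^{-\,/\,} \le T/\,$), if $Z'$ is Poisson with parameter $\lambda\kappa_n T/\,$,
$$\mathbb{P}[t_v \le T] \le \mathbb{P}[Z' \ge a\log\log n + 1] + O\bigl((\log\log n)\,e^{-\delta(\log n)^{\,/\,}}\bigr).$$
We use (4.21) and combine with (4.51) to deduce that, for any $r > 0$,
$$\mathbb{P}\bigl[\mathcal{U}(T) \not\subseteq \mathcal{U}_{\lfloor a\log\log n\rfloor}\bigr] = \mathbb{P}\bigl[\exists\, w \in \{1,2\}^{\lfloor a\log\log n\rfloor + 1} \text{ s.t. } t_w \le T\bigr] = o\bigl((\log n)^{-r}\bigr).$$
This completes the proof. $\blacksquare$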
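The proof above rests on the fact that a Poisson tail $\mathbb{P}[Z' \ge m]$ with fixed parameter decays faster than $2^{-m}$, so the union bound over the $2^{m}$ labels of length $m$ still tends to zero. The Python sketch below evaluates the product $2^{m}\,\mathbb{P}[Z' \ge m]$ for a few depths. The parameter `mu` is a hypothetical stand-in for the Poisson parameter in the proof, and the function names are illustrative.

```python
import math


def poisson_tail(mu, m, terms=400):
    """P[Z >= m] for Z ~ Poisson(mu), summing the mass function from m upwards.
    Each term is computed in log space to avoid overflow in the factorial."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(m, m + terms))


def union_bound(mu, depth):
    """2^(depth+1) * P[Poisson(mu) >= depth+1], the shape of the bound (4.51)."""
    return 2.0 ** (depth + 1) * poisson_tail(mu, depth + 1)


if __name__ == "__main__":
    mu = 2.0  # hypothetical Poisson parameter
    for depth in (4, 9, 14, 19):
        print(depth, union_bound(mu, depth))
```

The printed values decrease rapidly with the depth, which is why a depth of order $\log\log n$ already gives a bound that is smaller than any negative power of $\log n$.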

The next task is to couple the branching caterpillar to the rescaled dual of the SΛFVS. Since we have expressed the dual as a deterministic function of the driving point process of events in Definition 4.11, it is enough to find an appropriate coupling of the driving events for the branching caterpillar and those of a SΛFVS dual.

The idea, roughly, is as follows. Each 'branch' of the branching caterpillar is constructed from an independent driving process. For each of these we should like to retain those events that affected the caterpillar, but we can discard the rest. If two or more caterpillars are close enough that the events affecting them could overlap, to avoid having too many events in these regions we have to arbitrarily choose one caterpillar and discard the events affecting the others. We then supplement these with additional events, appropriately distributed to fill in the gaps and arrive at the driving Poisson point process for a SΛFVS dual, with intensity as in (4.32). We will then check that the SΛFVS dual corresponding to this point process coincides with our branching caterpillar, with probability tending to one as $n \to \infty$.

To put this strategy into practice we require some notation. Let $\mathcal{U}_0 = \mathcal{U} \cup \{0\}$. For $V \subset \mathcal{U}_0$ let $\max(V)$ refer to the maximum element of $V$ with respect to a fixed ordering in which 0 is the minimum value (it does not matter precisely which ordering we use, but we must fix one). Given a sequence $(\Pi_j)_{j\in\mathcal{U}_0}$ of independent Poisson point processes on $\mathcal{X}$ with intensity measure (4.32), define a simple point process $\Pi$ as follows. Let

$$j(t,x) = \max\Bigl(\bigl\{k \in \mathcal{U}(t) : \exists\, i \in \{1,2\} \text{ with } |c^i_{t-t_k}(p_k, \Pi^{t_k}_k) - x| \le R_n\bigr\} \cup \{0\}\Bigr). \tag{4.52}$$

Note that $j(t,x) = 0$ corresponds to regions of space-time that are not near a caterpillar, so that for $(t, x, r, z_1, z_2, q, v) \in \Pi$, $B_r(x)$ does not contain a caterpillar. Then we define

$$\Pi = \bigcup_{k\in\mathcal{U}_0} \bigl\{(t, x, r, z_1, z_2, q, v) \in \Pi_k : j(t,x) = k\bigr\}. \tag{4.53}$$

Lemma 4.20. $\Pi$ is a Poisson point process with intensity measure given by (4.32).

Remark 4.21. We defined the coupling (4.53) for each $n \in \mathbb{N}$. As such, in the proof of Lemma 4.20 we regard $n$ as a constant and we will not include it inside $O(\cdot)$, etc.

Proof.

Let ν ( dt, dx, dr, dz , dz , dq, dv ) be the intensity measure givenin (4.32).Let B be the set of bounded Borel subsets of R + × R × R + ×B (0) × [0 , ;for B ∈ B , let N ( B ) = | Π ∩ B | and for j ∈ U , let N j ( B ) = | Π j ∩ B | .Suppose B = ∪ ki =1 B i ∈ B where for each i , B i = [ a i , b i ] × D i for some a = a < b ≤ a < . . . < b k = b . Let B R ⊂ B denote the collection ofsuch sets B . Note that Π is a simple point process, and that therefore Π isa Poisson point process with intensity ν if and only if(4.54) P [ N ( B ) = 0] = e − ν ( B ) for all B ∈ B R . (See e.g. Section 3.4 of Kingman (1992).)For some δ >

0, assume that b i − a i ≤ δ , ∀ i (by partitioning the B i further if necessary). Since B is bounded, ∃ d < ∞ s.t. | x | ≤ d for all( t, x, r, z , z , q, v ) ∈ B . We can write P [ N ( B ) = 0] = P [ ∩ ki =1 { N ( B i ) = 0 } ] A. ETHERIDGE, N. FREEMAN, S. PENINGTON, D. STRAULINO = E " k − Y i =1 { N ( B i )=0 } P (cid:18) N ( B k ) = 0 (cid:12)(cid:12)(cid:12)(cid:12) (Π j ( a k )) j ∈U (cid:19) (4.55)where Π j ( t ) := Π j | [0 ,t ] × R × R + ×B (0) × [0 , .For j ∈ U , let D jk = { ( x, r, z , z , q, v ) ∈ D k : j ( a k , x ) = j } and B jk =[ a k , b k ] × D jk . Also let˜ B k = [ a k , b k ] × B d +3 R n (0) × R + × B (0) × [0 , , and let V ( t ) = ∪ s ≤ t U ( s ).For t ∈ [ a k , b k ], if none of the caterpillars in B d +3 R n (0) move duringthe time interval [ a k , t ] then j ( a k , x ) = j ( t, x ) ∀ x ∈ B d (0); thus a point( t, x, r, z , z , q, v ) in Π ∩ B k must be a point in Π j ∩ B jk for some j , and viceversa. We can use this observation to relate { N ( B k ) = 0 } and ∩ j ∈U { N j ( B jk ) =0 } , as follows.If N ( B k ) = 0 and N j ( B jk ) = 0 for some j ∈ U , then D jk = ∅ so j ∈V ( a k ) ∪ { } (either j = 0 or the caterpillar indexed by j is alive at time a k ). Also after a k and before the point in Π j ∩ B jk , one of the caterpillars in B d +3 R n (0) must have moved, so there must be a point in Π l ∩ ˜ B k for some l ∈ V ( b k ). Conversely, if N j ( B jk ) = 0 ∀ j ∈ U and N ( B k ) = 0, then theremust be a point in Π l ∩ ˜ B k followed by either a point in Π ∩ B k or a pointin Π l ′ ∩ B k for some l, l ′ ∈ V ( b k ). Hence { N ( B k ) = 0 }△ ( ∩ j ∈U { N j ( B jk ) = 0 } ) ⊂  N ( B k ) + X l ∈V ( b k ) N l ( ˜ B k ) ≥  . (4.56)Note that by the deﬁnition of a caterpillar in Deﬁnition 4.12, for each j ∈ U , h ( p j , Π t j j ) ≥ (log n ) − c . 
It follows that V ( b k ) ⊆ S ⌈ b k (log n ) c ⌉ m =0 { , } m .Also if J ⊂ U with | J | = K then P j ∈ J N j ( ˜ B k ) has a Poisson distribu-tion with parameter Kν ( ˜ B k ), and since b k − a k ≤ δ , ν ( ˜ B k ) ≤ n π ( d +3 R n ) µ ((0 , R ]) δ . Hence for Z ′ a Poisson random variable with parameter(2 b k (log n ) c + 1) ν ( ˜ B k ) = O ( δ ), P  N ( B k ) + X j ∈V ( b k ) N j ( ˜ B k ) ≥ (cid:12)(cid:12)(cid:12)(cid:12) (Π j ( a k )) j ∈U  ≤ P (cid:2) Z ′ ≥ (cid:3) = O ( δ ) . By (4.56), we now have that P [ N ( B k ) = 0 | (Π j ( a k )) j ∈U ] = P [ ∩ j ∈U { N j ( B jk ) = 0 } ] + O ( δ ) BM AND SELECTION IN THE SΛFV PROCESS = Y j ∈U exp( − ν ( B jk )) + O ( δ )= exp( − ν ( B k )) + O ( δ ) . Substituting this into (4.55) and then repeating the same argument for k − , k − , . . . , P [ N ( B ) = 0] = k Y i =1 exp( − ν ( B k )) + k X i =1 O ( δ )= exp( − ν ( B )) + k O ( δ ) . By partitioning B further, we can let δt → k = Θ(1 /δ ). It followsthat P [ N ( B ) = 0] = exp( − ν ( B )). By (4.54), this completes the proof. (cid:4) It follows immediately from Lemma 4.20 that the collection of potentialancestral lineages in ( P t ( p, Π)) t ≥ has the same distribution as P ( n ) ( p ), therescaled SΛFVS dual. We now show that under this coupling the rescaledSΛFVS dual and branching caterpillar coincide with high probability.We consider ( C t ( p, (Π j ) j ∈U )) ≤ t ≤ T as a collection of paths as follows. Theset of paths through a single caterpillar ( c t ( p, Π)) t ≤ h ( p, Π) with k ∗ ( p, Π) = k ∗ is given by { l i } i ∈{ , } k ∗ , where l i ( t ) = c t ( p, Π) for t ∈ [0 , (log n ) − c ] and foreach 1 ≤ k ≤ k ∗ , l i ( t ) = c i k t ( p, Π) for t ∈ [ τ br k − + (log n ) − c , ( τ br k + (log n ) − c ) ∧ h ( p, Π)]. Then the collection of paths through ( C t ( p, (Π j ) j ∈U )) ≤ t ≤ T is givenby concatenating paths through the individual caterpillars, i.e. 
paths $l : [0,T] \to \mathbb{R}^2$ such that for some sequence $(u_m)_{m\ge 0} \subset \mathcal{U}$ with $u_{m+1} = (u_m, i_m)$ for some $i_m \in \{1,2\}$ for each $m$, for $t \in [t_{u_m}, h_{u_m}]$, $l(t)$ follows a path through $(c_{t - t_{u_m}}(p_{u_m}, \Pi^{t_{u_m}}_{u_m}))_t$ with $l(h_{u_m}) = p_{u_{m+1}}$.

Lemma 4.22. Fix $T > 0$. Let $(\Pi_j)_{j\in\mathcal{U}_0}$ be independent Poisson point processes with intensity measure (4.32) and let $\Pi$ be defined from $(\Pi_j)_{j\in\mathcal{U}_0}$ as in (4.53). Then $(\mathcal{C}_t(p, (\Pi_j)_{j\in\mathcal{U}}))_{0\le t\le T}$ and $(\mathcal{P}_t(p,\Pi))_{0\le t\le T}$, viewed as collections of paths, are equal with probability at least $1 - O((\log n)^{-\,/\,})$.

Proof. We shall use Lemma 4.19 with $a = (16\log 2)^{-1}$. Writing, for $j \in \mathcal{U}$, $k^*(j) = k^*(p_j, \Pi^{t_j}_j)$, the number of branching events in $c_{t - t_j}(p_j, \Pi^{t_j}_j)$ before $h_j$, by a union bound over $\mathcal{U}_{\lfloor a\log\log n\rfloor}$ and (4.37),

$$\mathbb{P}\bigl[\exists\, j \in \mathcal{U}_{\lfloor a\log\log n\rfloor} : k^*(j) \ge (\log n)^{\,/\,}\bigr] \le 2^{a\log\log n + 1}\, O\bigl(e^{-\delta(\log n)^{\,/\,}}\bigr) = O\bigl(e^{-\delta(\log n)^{\,/\,}/\,}\bigr). \tag{4.57}$$

Let ( τ br k ( j )) k ≥ denote the sequence of branching events in c t − t j ( p j , Π t j j ), andsimilarly deﬁne ( τ type k ( j )) k ≥ and ( τ over k ( j )) k ≥ as in (4.33). Note that ( C t ) t ≤ T and ( P t ) t ≤ T only diﬀer as collections of paths if either a selective event aﬀectsa caterpillar during a time interval in which it ignores branching, or if twodiﬀerent caterpillars are simultaneously within R n of some x ∈ R and soone of them is not driven by the pieced together Poisson point process Π.More formally, if ( C t ) t ≤ T and ( P t ) t ≤ T diﬀer as collections of paths then oneor more of the following events occurs.1. U ( T )

6⊆ U ⌊ a log log n ⌋ or k ∗ ( j ) ≥ (log n ) / for some j ∈ U ⌊ a log log n ⌋ .2. For some j ∈ U ⌊ a log log n ⌋ and k ≤ (log n ) / , the event E ( j, k ) occurs:one of the lineages c t − t j ( p j , Π t j j ) and c t − t j ( p j , Π t j j ) is aﬀected by aselective event in the time interval [ τ br k ( j ) , τ br k ( j ) + (log n ) − c ].3. For some w = v ∈ U ⌊ a log log n ⌋ , the event E ( v, w ) occurs: there are i , i ∈ { , } with | c i t − t w ( p w , Π t w w ) − c i t − t v ( p v , Π t v v ) | ≤ R n for some t ≤ T .Recall from (4.34) and (2.6) that selective events aﬀect a single lineage withrate λ log n . Hence for k ∈ N and j ∈ U , P [ E ( j, k )] = O ((log n ) − c ).We now consider the event E ( v, w ). For w = v ∈ U , let i = min { j ≥ w j = v j } . Then let w ∧ v = ( ( w , . . . , w i − ) if i ≥ ∅ if i = 1 . At time h w ∧ v , either τ type k ∗ ( w ∧ v ) ( w ∧ v ) = τ over k ∗ ( w ∧ v ) ( w ∧ v ) or τ type k ∗ ( w ∧ v ) ( w ∧ v ) = τ div k ∗ ( w ∧ v ) ( w ∧ v ), in which case | p ( w ∧ v, − p ( w ∧ v, | ≥ (log n ) − c . Conditionalon | p ( w ∧ v, − p ( w ∧ v, | ≥ (log n ) − c , for i , i ∈ { , } , (cid:16) c i t − t w ( p w , Π t w w ) , c i t − t v ( p v , Π t v v ) (cid:17) t ∈ [ t w ,h w ] ∩ [ t v ,h v ] ∩ [0 ,T ] is part of the pair of potential ancestral lineages of an excursion started attime h w ∧ v with initial displacement at least (log n ) − c . Hence by Lemmas 4.9and 4.15, P [ E ( w, v )] = O (cid:18) log log n log n (cid:19) + O (cid:16) (log n ) − c (cid:17) = O (cid:16) (log n ) − / (cid:17) since c ≥

3. By a union bound, and using Lemma 4.19 and (4.57) it followsthat P [( C t ) t ≤ T = ( P t ) t ≤ T ] BM AND SELECTION IN THE SΛFV PROCESS ≤ o ((log n ) − ) + 4(log n ) a log 2+9 / P [ E ( j, k )] + 16(log n ) a log 2 P [ E ( w, v )]= O (cid:16) (log n ) a log 2+ +1 − c (cid:17) + O (cid:16) (log n ) a log 2 − / (cid:17) = O (cid:16) (log n ) − / (cid:17) , by our choice of a = (16 log 2) − and since c ≥ (cid:4) We are now ready to complete the proof of Theorem 2.7.

Proof. (Of Theorem 2.7) Set c = 4. By Lemmas 4.20 and 4.22, we havea coupling of the rescaled SΛFV dual and the branching caterpillar underwhich the two processes are equal (as collections of paths) with probabilityat least 1 − O ((log n ) − / ).We now couple ( C t ( p, (Π j ) j ∈U )) ≤ t ≤ T to a branching Brownian motionwith branching rate λκ n . Let (( W jt ) t ≥ , H j ) j ∈U be an i.i.d. sequence, where( W jt ) t ≥ is a Brownian motion starting at 0 and H j ∼ Exp( λκ n ) independentof ( W jt ) t ≥ . For each j ∈ U , we couple ( c t − t j ( p j , Π t j j )) t ∈ [ t j ,h j ] to (( W jt ) t ≥ , H j )as in Lemmas 4.14 and 4.16.For j ∈ U , let A ( j ) be the event that both | ( h j − t j ) − H j | ≤ n ) − / and for i = 1 , t ∈ [ t j , h j ], (cid:12)(cid:12)(cid:12) ( c it − t j ( p j , Π t j j ) − p j ) − W jσ ( t − t j ) (cid:12)(cid:12)(cid:12) ≤ n ) − c = 2(log n ) − / . By Lemmas 4.14 and 4.16, for any r >

0, for each j ∈ U , P [ A ( j )] ≥ − O ((log n ) − r ). Hence, taking a union bound over j ∈ U ⌊ log log n ⌋ , P [ ∩ j ∈U ⌊ log log n ⌋ A ( j )] ≥ − O ((log n ) log 2 − r ) . Also, for j ∈ U , deﬁne the event A ( j ) = (cid:26) sup t ∈ [0 , n ) − / ] | W jσ t | + sup t ∈ [ H j − n ) − / ,H j ] | W jσ t − W jσ H j | ≤ (log n ) − / (cid:27) . Then by another union bound over U ⌊ log log n ⌋ , since for a Brownian motion( W t ) t ≥ started at 0, P h sup t ∈ [0 , n ) − / ] | W t | ≥ (log n ) − / i = o ((log n ) − r ),we have that P [ ∩ j ∈U ⌊ log log n ⌋ A ( j )] ≥ − O ((log n ) log 2 − r ) . A. ETHERIDGE, N. FREEMAN, S. PENINGTON, D. STRAULINO

By Lemma 4.19, P [ U ( T )

6⊆ U ⌊ log log n ⌋ ] = o ((log n ) − r ).Deﬁne a branching Brownian motion starting at p with diﬀusion constant σ from (( W jt ) t ≥ , H j ) j ∈U by letting the increments of the initial particle begiven by ( W ∅ σ t ) t ≥ until time H ∅ , when it is replaced by two particles whichhave lifetimes H and H and increments given by ( W σ t ) t ≥ , ( W σ t ) t ≥ andso on.If U ( T ) ⊆ U ⌊ log log n ⌋ and A ( j ) ∩ A ( j ) occurs for each j ∈ U ⌊ log log n ⌋ ,each path in the branching caterpillar stays within distance 2(log log n +1)(log n ) − / + 2(log log n + 1)(log n ) − / of some path through the branch-ing Brownian motion and vice versa.Setting r = log 2+ 1 / σ and branch-ing rate κ n λ ) such that with probability at least 1 − O ((log n ) − / ), upto time T each path in the rescaled SΛFVS dual stays within distance2(log log n )(log n ) − / + 2(log log n )(log n ) − / of some path through thebranching Brownian motion and vice versa. Finally, we need to couple thisbranching Brownian motion up to time T with a branching Brownian mo-tion with branching rate κλ . By (4.35), κ n → κ as n → ∞ , so this follows bystraightforward bounds on the diﬀerence between the branching times andthe increments of a Brownian motion during such a time. (cid:4) References.

N H Barton. The probability of fixation of a favoured allele in a subdivided population. Genetical Research, 62(02):149–157, 1993.

N H Barton, A M Etheridge, and A Véber. A new model for evolution in a spatial continuum. Electron. J. Probab., 15:162–216, 2010.

N H Barton, A M Etheridge, J Kelleher, and A Véber. Genetic hitchhiking in spatially extended populations. Theoretical Population Biology, 87:75–89, 2013a.

N H Barton, A M Etheridge, J Kelleher, and A Véber. Inference in two dimensions: allele frequencies versus lengths of shared sequence blocks. Theoretical Population Biology, 87:105–119, 2013b.

P Billingsley. Probability and Measure. Wiley, 1995.

J L Cherry. Selection in a subdivided population with local extinction and recolonization. Genetics, 164(2):789–795, 2003.

R Durrett and I Zähle. On the width of hybrid zones. Stochastic Processes and their Applications, 117(12):1751–1763, 2007.

A Etheridge, N Freeman, and D Straulino. The Brownian net and selection in the spatial Lambda-Fleming-Viot process. arXiv preprint arXiv:1506.01158, 2015.

A M Etheridge. Drift, draft and structure: some mathematical models of evolution. Banach Center Publ., 80:121–144, 2008.

A M Etheridge and T G Kurtz. Genealogical constructions of population models. arXiv preprint arXiv:1402.6724, 2014.

A M Etheridge and A Véber. The spatial Lambda-Fleming-Viot process on a large torus: genealogies in the presence of recombination. Ann. Appl. Probab., 22(6):2165–2209, 2012.

A M Etheridge, A Véber, and F Yu. Rescaling limits of the spatial Lambda-Fleming-Viot process with selection. arXiv preprint arXiv:1406.5884, 2014.

R A Fisher. The wave of advance of advantageous genes. Ann. Eugenics, 7:355–369, 1937.

Olav Kallenberg. Foundations of Modern Probability. Springer Science & Business Media, 2006.

J F C Kingman. Poisson Processes. Oxford University Press, 1992.

S M Krone and C Neuhauser. Ancestral processes with selection. Theor. Pop. Biol., 51:210–237, 1997.

T Maruyama. On the fixation probability of mutant genes in a subdivided population. Genetical Research, 15(02):221–225, 1970.

C Mueller, L Mytnik, and J Quastel. Small noise asymptotics of traveling waves. Markov Processes and Related Fields, 14:333–342, 2008.

C Neuhauser and S M Krone. Genealogies of samples in models with selection. Genetics, 145:519–534, 1997.

A Véber and Anton Wakolbinger. The spatial Lambda-Fleming-Viot process: an event-based construction and a lookdown representation. Ann. Inst. H. Poincaré Probab. Statist., 51(2):570–598, 2015.

M C Whitlock. Fixation probability and time in subdivided populations. Genetics, 164(2):767–779, 2003.

Alison Etheridge
Department of Statistics
University of Oxford
24-29 St Giles
Oxford
England

Nic Freeman
School of Mathematics and Statistics
University of Sheffield
Hounsfield Road
Sheffield
England
E-mail: n.p.freeman@sheffield.ac.uk

Sarah Penington
Department of Statistics
University of Oxford
24-29 St Giles
Oxford
England