Convergence of Likelihood Ratios and Estimators for Selection in non-neutral Wright-Fisher Diffusions
Jaromir Sant, Paul A. Jenkins, Jere Koskela, Dario Spanò
MASDOC, Department of Statistics & Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
The Alan Turing Institute, British Library, London NW1 2DB, United Kingdom
August 20, 2020
Abstract
A number of discrete time, finite population size models in genetics describing the dynamics of allele frequencies are known to converge (subject to suitable scaling) to a diffusion process in the infinite population limit, termed the Wright-Fisher diffusion. In this article we show that the diffusion is ergodic uniformly in the selection and mutation parameters, and that the measures induced by the solution to the stochastic differential equation are uniformly locally asymptotically normal. Subsequently these two results are used to analyse the statistical properties of the Maximum Likelihood and Bayesian estimators for the selection parameter, when both selection and mutation are acting on the population. In particular, it is shown that these estimators are uniformly over compact sets consistent, display uniform in the selection parameter asymptotic normality and convergence of moments over compact sets, and are asymptotically efficient for a suitable class of loss functions.
Mathematical population genetics is concerned with the study of how populations evolve over time, offering viable models to study how various biological phenomena such as selection and mutation affect the genetic profile of the population they act upon. Many models have been proposed over the years, but perhaps the most popular is the Wright-Fisher model (see for instance [11, Chapter 15, Section 2]). Under a suitable scaling of both space and time, a diffusion limit exists for the Wright-Fisher model, which is referred to as the Wright-Fisher diffusion (1) and is the main focus of this article. The Wright-Fisher diffusion is robust in the sense that the broad class of Cannings models [2] converges to it when suitably scaled. Furthermore, it has the neat property that the only contribution to the diffusion coefficient comes from random mating, whilst other features such as selection and mutation appear solely in the drift coefficient. This facilitates inference, as one can concentrate on estimating the drift, treating the diffusion coefficient as a known expression.

In this article we focus on a continuously observed Wright-Fisher diffusion describing the allele frequency dynamics in a two-allele, haploid population undergoing both selection and mutation. In Section 2 we show that the diffusion is ergodic uniformly over both the selection and mutation parameters, and subsequently that the associated family of measures induced by the solution to the stochastic differential equation (SDE) is uniformly locally asymptotically normal (provided the mutation parameters are greater than 1). In Section 3 we then shift our focus to the properties of the maximum likelihood (ML) and Bayesian estimators for the selection parameter s ∈ S ⊂ R (which measures how much more favourable one allele is over the other), under the assumption that the mutation parameters are a priori known.
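Since all of the selection and mutation signal sits in the drift while the diffusion coefficient is a known function of the state, the model in (1) is straightforward to simulate. As a rough numerical illustration (ours, not the paper's: the parameter values, the Euler-Maruyama scheme, and the clipping of each step to [0, 1] are all illustrative choices), the following Python sketch simulates one path of the diffusion:

```python
import numpy as np

# Hypothetical parameter values: s favours allele A1; theta1, theta2 > 1
# keep the path away from the boundaries.
S, THETA1, THETA2 = 2.0, 1.5, 1.5

def drift(x):
    # mu(theta, x) = (1/2)(s x(1-x) - theta2 x + theta1 (1-x)), as in (1)
    return 0.5 * (S * x * (1.0 - x) - THETA2 * x + THETA1 * (1.0 - x))

def simulate_path(x0=0.5, T=50.0, dt=1e-3, seed=0):
    """Euler-Maruyama discretisation of the Wright-Fisher SDE (1); the
    clipping to [0, 1] is a numerical safeguard, not part of the model."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    dw = rng.normal(0.0, np.sqrt(dt), size=n)   # Wiener increments
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        xi = x[i]
        xi += drift(xi) * dt + np.sqrt(max(xi * (1.0 - xi), 0.0)) * dw[i]
        x[i + 1] = min(max(xi, 0.0), 1.0)
    return x

path = simulate_path()
time_avg = path.mean()   # crude estimate of the stationary mean of X
```

With s > 0 and θ₁ = θ₂ the time average should settle above 1/2, reflecting the advantage of allele A₁; the diffusion coefficient √(X_t(1−X_t)) vanishing at the boundaries is precisely what makes the diffusion classes studied in [12] inapplicable here.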
We briefly discuss some technical issues associated with conducting joint inference for the selection and mutation parameters in Section 4.

We point out here that by observing the path continuously through time without error, one can establish and analyse explicitly the statistical error produced by an estimator based on the whole sample path, which then clearly illustrates the statistical limitations of alternative estimators based on less informative (e.g. discrete) observations. In a discrete observation setting, in addition to the above mentioned statistical error, one also has to deal with observational error. One certainly cannot hope for an estimator that performs better in a discrete setting than in a continuous one, so our analysis may be viewed as the ‘best possible’ performance for inference from a discretely observed model.

Inference for scalar diffusions, particularly proving consistency of estimators under specific observational schemes, has generated considerable interest over the past few years [6, 12, 15, 16, 17, 20, 21, 23]. However, most of the work so far has considered classes of diffusions which directly preclude the Wright-Fisher diffusion, for instance by imposing periodic boundary conditions on the drift coefficients or by requiring that the diffusion coefficient be strictly positive everywhere. The asymptotic study of a variety of estimators for continuously observed ergodic scalar diffusions has been entertained in great depth in [12]; see in particular Theorems 2.8 and 2.13 in [12], which are respectively adaptations of Theorems I.5.1, I.10.1 and I.5.2, I.10.2 in [9]. However, Theorems 2.8 and 2.13 in [12] cannot be applied directly to the Wright-Fisher diffusion as certain conditions do not hold, namely the reciprocal of the diffusion coefficient does not have a polynomial majorant. This discrepancy makes replicating the results for the Wright-Fisher diffusion with selection and mutation highly non-trivial.
Instead we exploit the explicit nature of (1), below, to prove, in our main result Theorem 3.1, uniform in the selection parameter over compact sets consistency, asymptotic normality and convergence of moments, as well as asymptotic efficiency for both the Maximum Likelihood (ML) and Bayesian estimators. We achieve this by showing that the conditions of Theorems I.5.1, I.10.1 and I.5.2, I.10.2 in [9] still hold for the Wright-Fisher diffusion and that this diffusion is ergodic uniformly in the selection and mutation parameters (a term we define in Section 2). We point out that the uniformity in our results is particularly useful as it controls the lowest rate (over the true parameters) at which the parameters of interest are being learned by the inferential scheme.

The Wright-Fisher diffusion with selection but without mutation was tackled specifically by Watterson in [23], where the author makes use of a frequentist framework. Having no mutation ensures that the diffusion is absorbed at either boundary point 0 or 1 in finite time almost surely, and by conditioning on absorption Watterson computes the moment generating function, proves asymptotic normality, and derives hypothesis tests for the Maximum Likelihood Estimator (MLE). Watterson’s work however does not address the Bayesian estimator, nor does it readily extend to the case when mutation is present, because the diffusion is no longer absorbed at the boundaries. In this sense the results obtained in Theorem 3.1 are complementary to those obtained by Watterson, under the assumption that the mutation parameters are known. Although this is a restriction, we are observing the path continuously over the interval [0, T] and subsequently sending T → ∞, so these parameters could be inferred by considering the boundary behaviour of the diffusion. In particular, when either mutation parameter is less than 1, the diffusion hits the corresponding boundary in finite time almost surely.
Further, as the diffusion approaches the boundary the diffusion coefficient (i.e. noise) vanishes, and in fact it vanishes sufficiently quickly on the approach to the boundary that the mutation parameters can be inferred without error as soon as the boundary is first hit. For mutation parameters greater than or equal to 1, the corresponding boundary point is no longer attainable, but the diffusion can get arbitrarily close to it as T → ∞, and a similar argument enables the mutation parameters again to be inferred (see [18, Remark 2.2] for a related argument applying to the squared Bessel process).

The rest of this article is organised as follows: in Section 2 we introduce the Wright-Fisher diffusion, proceed to describe some of its properties, and prove that the diffusion is both uniformly in the selection and mutation parameters ergodic, as well as uniformly locally asymptotically normal. Section 3 then focuses on the ML and Bayesian estimators for the selection parameter, proving that these estimators have a set of desirable properties in Theorem 3.1. Section 4 then concludes with a discussion. The proof of Theorem 2.2 can be found in Appendix A, whilst in Appendix B we extend the conclusions of Theorem 2.2 to two specific unbounded functions.

We start by giving a brief overview of the Wright-Fisher diffusion before proving that the diffusion is ergodic uniformly in the selection and mutation parameters (a term we define rigorously shortly), and subsequently use this to prove the uniform local asymptotic normality (LAN) of the family of measures associated to the solution of the SDE. Consider an infinite haploid population undergoing selection and mutation, where we are interested in two alleles A₁ and A₂.
Suppose that ϑ = (s, θ₁, θ₂) ∈ R × (0, ∞)² are the selection and mutation parameters respectively, where s describes the extent to which allele A₁ is favoured over A₂, alleles of type A₁ mutate to A₂ at rate proportional to θ₂, and those of type A₂ mutate to A₁ at rate proportional to θ₁. Let X_t denote the frequency of A₁ in the population at time t. Then the dynamics of X_t can be described by a diffusion process on [0, 1] solving

dX_t = μ(ϑ, X_t) dt + σ(X_t) dW_t := (1/2)(sX_t(1 − X_t) − θ₂X_t + θ₁(1 − X_t)) dt + √(X_t(1 − X_t)) dW_t,   (1)

with X₀ ∼ ν for some initial distribution ν, (W_t)_{t≥0} a standard Wiener process defined on a filtered probability space (Ω, F, (F_t)_{t≥0}, P), and [0, T] the observation interval. A strong solution to (1) exists by the Yamada-Watanabe condition (see Theorem 3.2, Chapter IV in [10]), but weak uniqueness suffices for our purposes. We denote by P^{(ϑ)}_ν the law induced on the space of continuous functions mapping [0, T] into [0, 1] (henceforth denoted C_T([0, 1])) by the solution to (1) with ϑ = (s, θ₁, θ₂) and X₀ ∼ ν (with dependence on T being implicit). Furthermore we denote taking expectation with respect to P^{(ϑ)}_ν by E^{(ϑ)}_ν.

We assume that θ₁, θ₂ > 0, for if at least one is 0 then the diffusion is absorbed in finite time and we are back in the regime studied by Watterson [23]. The boundary behaviour depends on whether the mutation parameters are less than, or greater than or equal to, 1, but in either case the diffusion is ergodic as long as θ₁, θ₂ > 0, with stationary density

f_ϑ(x) = (1/G_ϑ) e^{sx} x^{θ₁−1}(1 − x)^{θ₂−1},   x ∈ (0, 1),

where G_ϑ is the normalising constant

G_ϑ = ∫₀¹ e^{sx} x^{θ₁−1}(1 − x)^{θ₂−1} dx ≤ max{e^s, 1} B(θ₁, θ₂) < ∞,   with B(θ₁, θ₂) := ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} dx   (2)

the beta function. In what follows, we will always assume that ξ ∼ f_ϑ, and we denote taking expectation with respect to f_ϑ by E^{(ϑ)}, where the omission of the subscript ν will indicate that we start from stationarity.

It turns out that we need a slightly stronger notion of ergodicity, which we now define. The idea here is that we can extend pointwise ergodicity in the parameter ϑ = (s, θ₁, θ₂) to any compact set K in the parameter space R × (0, ∞)² by finding the slowest rate of convergence which works within that compact set. More rigorously, we introduce the following definition.

Definition 2.1.
The process X is said to be ergodic uniformly in the parameter ϑ if for all ε > 0,

lim_{T→∞} sup_{ϑ∈K} P^{(ϑ)}_ν[ |(1/T) ∫₀^T h(X_t) dt − E^{(ϑ)}[h(ξ)]| > ε ] = 0   (3)

holds for any compact K ⊂ R × (0, ∞)² and any bounded, measurable function h : [0, 1] → R, where ξ ∼ f_ϑ.

To the best of our knowledge, it has not been proven that the Wright-Fisher diffusion is ergodic uniformly in its parameters, which motivates the following theorem.

Theorem 2.2.
The Wright-Fisher diffusion with mutation and selection is uniformly ergodic in the selection and mutation parameters ϑ = (s, θ₁, θ₂) for any initial distribution ν.

We postpone the proof to Appendix A. For the remainder of this section we restrict our attention to the parameter space Θ ⊂ R × [1, ∞)², where Θ is open and bounded, for if either of the mutation parameters were less than 1 then the measures P^{(ϑ)}_ν within this region would be mutually singular with respect to one another and thus their Radon-Nikodym derivatives undefined. Restricting our attention to mutation parameters within the range [1, ∞) thus ensures that the family of measures {P^{(ϑ)}_ν, ϑ ∈ Θ} are equivalent, and we have that

dP^{(ϑ′)}_ν/dP^{(ϑ)}_ν(X^T) = (ν(ϑ′, X₀)/ν(ϑ, X₀)) exp{ ∫₀^T ((μ(ϑ′, X_t) − μ(ϑ, X_t))/σ(X_t)) dW_t − (1/2) ∫₀^T ((μ(ϑ′, X_t) − μ(ϑ, X_t))/σ(X_t))² dt }   (4)

with P^{(ϑ)}_ν-probability 1. Proofs of the above claims regarding the equivalence of the Wright-Fisher measures and the form of the Radon-Nikodym derivative can be found in [3], Lemma 7.2.2 and Section 10.1.1. We emphasise here that we have allowed the starting distribution ν to depend on the parameters, as is evident from the first ratio in (4). However, if there is no such dependence then the only difference to the above would be to replace this ratio by 1.

We end this section by introducing the concept of local asymptotic normality (LAN) and showing that the Wright-Fisher diffusion is uniformly LAN, which will be essential in the next section.

Definition 2.3 (Special case of Definition 2.1 in [12]).
The family of measures {P^{(ϑ)}_ν, ϑ ∈ Θ} is said to be locally asymptotically normal (LAN) at a point ϑ ∈ Θ at rate T^{−1/2} if for any u ∈ R³ the likelihood ratio function admits the representation

Z_{T,ϑ}(u) := dP^{(ϑ+u/√T)}_ν/dP^{(ϑ)}_ν(X^T) = exp{ ⟨u, ∆_T(ϑ, X^T)⟩ − (1/2)⟨I(ϑ)u, u⟩ + r_T(ϑ, u, X^T) },

where ⟨·, ·⟩ denotes the Euclidean inner product on R³, and ∆_T(ϑ, X^T) is a random variable such that

L_ϑ{∆_T(ϑ, X^T)} → N(0, I(ϑ)) in distribution,   (5)

with I(ϑ) the Fisher information matrix evaluated at ϑ, i.e.

I(ϑ) := E^{(ϑ)}[ μ̇(ϑ, ξ)μ̇(ϑ, ξ)^T / σ²(ξ) ],

where μ̇(ϑ, ξ)^T is the transpose of the vector of derivatives of μ(ϑ, x) with respect to ϑ. Moreover, the function r_T(ϑ, u, X^T) satisfies

lim_{T→∞} r_T(ϑ, u, X^T) = 0 in P^{(ϑ)}-probability.   (6)

The family of measures is said to be LAN on Θ if it is LAN at every point ϑ ∈ Θ, and further it is said to be uniformly LAN on Θ if both convergences (5) and (6) are uniform in ϑ ∈ K for every compact K ⊂ Θ.

Theorem 2.4.
The family of measures {P^{(ϑ)}_ν, ϑ ∈ Θ} induced by the weak solution to (1), with initial distribution ν being either a point mass at x₀ ∈ (0, 1) or the stationary density f_ϑ, is uniformly LAN on Θ, with the likelihood ratio function Z_{T,ϑ}(u) admitting the representation

Z_{T,ϑ}(u) = exp{ ⟨u, ∆_T(ϑ, X^T)⟩ − (1/2)⟨I(ϑ)u, u⟩ + r_T(ϑ, u, X^T) }

for u ∈ U_{T,ϑ} = {u : ϑ + u/√T ∈ Θ}, where

∆_T(ϑ, X^T) = (1/√T) ∫₀^T (μ̇(ϑ, X_t)/σ(X_t)) dW_t.

Proof.
From (4), we have that the log-likelihood ratio is given by

log Z_{T,ϑ}(u) = log( ν(ϑ + u/√T, X₀)/ν(ϑ, X₀) ) + ∫₀^T (⟨u/√T, μ̇(ϑ, X_t)⟩/σ(X_t)) dW_t − (1/2T) ∫₀^T (⟨u, μ̇(ϑ, X_t)⟩/σ(X_t))² dt
= log( ν(ϑ + u/√T, X₀)/ν(ϑ, X₀) ) + ⟨u, ∆_T(ϑ, X^T)⟩ − (1/2)⟨I(ϑ)u, u⟩ + (1/2)⟨I(ϑ)u, u⟩ − (1/2T) ∫₀^T (⟨u, μ̇(ϑ, X_t)⟩²/σ²(X_t)) dt,   (7)

where

I(ϑ) = (1/4) E^{(ϑ)} [ ξ(1−ξ)    1−ξ         −ξ
                       1−ξ       (1−ξ)/ξ     −1
                       −ξ        −1          ξ/(1−ξ) ].

Setting

r_T(ϑ, u, X^T) := log( ν(ϑ + u/√T, X₀)/ν(ϑ, X₀) ) + (1/2)⟨I(ϑ)u, u⟩ − (1/2T) ∫₀^T (⟨u, μ̇(ϑ, X_t)⟩²/σ²(X_t)) dt,

we show that (6) holds. The first term appears only if, of the two choices for ν, we have ν = f_ϑ, and in that case

log( ν(ϑ + u/√T, X₀)/ν(ϑ, X₀) ) = log( G_ϑ/G_{ϑ+u/√T} ) + (u₁/√T) X₀ + (u₂/√T) log X₀ + (u₃/√T) log(1 − X₀) → 0 as T → ∞.

Thus we deduce that (6) follows if we can prove that for any ε > 0,

lim_{T→∞} sup_{ϑ∈K} P^{(ϑ)}_ν[ |(1/T) ∫₀^T (⟨u, μ̇(ϑ, X_t)⟩²/σ²(X_t)) dt − ⟨I(ϑ)u, u⟩| > ε ] = 0.   (8)

Observe that the expression inside the probability in (8) is made up of six distinct differences between the time averages of the six distinct entries of the Fisher information matrix and their expectations with respect to the stationary density. Thus if we are able to show that each individual difference displays the same convergence as in (3), (8) follows. Now, as

(⟨u, μ̇(ϑ, x)⟩/σ(x))² = (1/4)( u₁√(x(1−x)) + u₂√((1−x)/x) − u₃√(x/(1−x)) )²
= (1/4)( u₁² x(1−x) − 2u₁u₃ x + 2u₁u₂ (1−x) − 2u₂u₃ + u₃² x/(1−x) + u₂² (1−x)/x ),

using (1), we can apply Theorem 2.2 to the first four terms directly. The remaining two differences involve the unbounded functions x(1−x)⁻¹ and (1−x)x⁻¹ and thus Theorem 2.2 cannot be applied; however, arguments similar to those used in the proof of this theorem (see Appendix B for the relevant details and proof) allow us to deduce that (3) is also true for these two functions and thus (6) holds.
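The entries of the Fisher information matrix I(ϑ) are one-dimensional integrals against the stationary density f_ϑ, so they can be evaluated by quadrature. The Python sketch below (ours; the parameter values are arbitrary illustrations and the trapezoidal rule on a truncated grid is a numerical convenience) builds the matrix and checks that it is a symmetric positive definite Gram matrix, as it must be:

```python
import numpy as np

# Hypothetical parameters with theta1, theta2 > 1, so that the unbounded
# entries (1-x)/x and x/(1-x) are integrable under f_theta.
s, th1, th2 = 1.0, 2.0, 3.0

def trapz(y, x):
    # plain trapezoidal rule (avoids depending on a specific NumPy version)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# stationary density f_theta(x) = e^{s x} x^{th1-1} (1-x)^{th2-1} / G_theta
x = np.linspace(1e-6, 1.0 - 1e-6, 200001)
w = np.exp(s * x) * x ** (th1 - 1.0) * (1.0 - x) ** (th2 - 1.0)
G = trapz(w, x)            # normalising constant G_theta
f = w / G

def E(h):
    # expectation of h(xi) under the stationary density
    return trapz(h * f, x)

# I(theta) = (1/4) E[v v^T / (xi(1-xi))] with v = (xi(1-xi), 1-xi, -xi)
I = 0.25 * np.array([
    [E(x * (1.0 - x)), E(1.0 - x),       E(-x)],
    [E(1.0 - x),       E((1.0 - x) / x), -1.0],
    [E(-x),            -1.0,             E(x / (1.0 - x))],
])
```

The (1, 1) entry (1/4)E[ξ(1−ξ)] is the scalar Fisher information for s used in Section 3; since ξ(1−ξ) ≤ 1/4, it never exceeds 1/16, which quantifies how slowly s is learned when the path spends time near a boundary.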
Finally, (5) follows from Proposition 1.20 in [12], which we can invoke in view of the above proved (8) and the fact that

sup_{ϑ∈K} √⟨I(ϑ)u, u⟩ < ∞.

We henceforth assume that the mutation parameters θ₁, θ₂ ≥ 1 are known, and focus on inference for the selection parameter s ∈ S ⊂ R with S open and bounded. As remarked earlier, the observational regime entertained here would enable one to infer the mutation parameters: on ϑ ∈ R × (0, 1)² this is immediate; the family of measures {P^{(ϑ)}_ν : ϑ ∈ R × (0, 1)²} are mutually singular. On ϑ ∈ R × [1, ∞)² the family of measures {P^{(ϑ)}_ν : ϑ ∈ R × [1, ∞)²} are now mutually absolutely continuous, with both boundary points unattainable. However, the process can get arbitrarily close to either boundary as T → ∞, and in this region the noise vanishes sufficiently quickly that again the corresponding mutation parameters can be inferred to any required precision. In the case when one mutation parameter is less than 1 and the other is greater than or equal to 1, similar arguments apply. Actually incorporating inference of the mutation parameters into the inferential setup below leads to some technical difficulties which we discuss in Section 4, so for simplicity we assume them to be known. Nonetheless all the notation and definitions introduced above carry through by replacing ϑ by s.

We start by defining the MLE ŝ_T of s in (1) as

ŝ_T = arg sup_{s∈S} dP^{(s)}_ν/dP^{(s₀)}_ν(X^T),

where s₀ ∈ S is arbitrary and its only role is to specify a reference measure whose exact value does not matter. We point out that now (4) simplifies to

dP^{(s′)}_ν/dP^{(s)}_ν(X^T) = (ν(s′, X₀)/ν(s, X₀)) exp{ ((s′ − s)/2) ∫₀^T √(X_t(1 − X_t)) dW_t − ((s′ − s)²/8) ∫₀^T X_t(1 − X_t) dt }.   (9)

In order to be able to define the Bayesian estimator, we introduce the class W_p of loss functions ℓ : S → R₊ for which the following stipulations are satisfied:

A1. ℓ(·) is even, non-negative, and continuous at 0 with ℓ(0) = 0 but not identically zero.

A2.
The sets {u ∈ S : ℓ(u) < c} are convex for all c > 0.

A3. ℓ(·) has a polynomial majorant, i.e. there exist strictly positive constants A and b such that for any u ∈ S, |ℓ(u)| ≤ A(1 + |u|^b).

A4. For some γ > 0 and all sufficiently large H, it holds that inf_{|u|>H} ℓ(u) − sup_{|u|≤H^γ} ℓ(u) ≥ 0.

As remarked above, we assume that S is an open and bounded subset of R, and we denote by p(·) the prior density on S, which we assume belongs to

P_c := { p(·) ∈ C(S̄, R₊) : p(u) ≤ A(1 + |u|^b) for all u ∈ S̄, ∫_{S̄} p(u) du = 1 },

where A and b are some strictly positive constants, and S̄ denotes the closure of S. With p(·) ∈ P_c and ℓ(·) ∈ W_p, we define the Bayesian estimator s̃_T of s in (1) as

s̃_T = arg min_{s̄_T} ∫_S E^{(s)}_ν[ ℓ(√T(s̄_T − s)) ] p(s) ds.

We introduce the last class of functions we will need, namely denote by G the class of functions g_T(·) satisfying the following two conditions:

1. For fixed T > 0, g_T(·) is a monotonically increasing function on [0, ∞), with g_T(y) → ∞ as y → ∞.

2. For any N > 0, lim_{T→∞, y→∞} y^N e^{−g_T(y)} = 0.

Observe that the likelihood ratio function is now given by

Z_{T,s}(u) := dP^{(s+u/√T)}_ν/dP^{(s)}_ν(X^T) = (ν(s + u/√T, X₀)/ν(s, X₀)) exp{ (u/(2√T)) ∫₀^T √(X_t(1 − X_t)) dW_t − (u²/(8T)) ∫₀^T X_t(1 − X_t) dt }   (10)

for u ∈ U_{T,s} := {u ∈ R : s + u/√T ∈ S}.

We now present the main result of this article, which states that the ML and Bayesian estimators for s have a set of desirable properties. We prove this by showing that the conditions of Theorems I.5.1, I.5.2, I.10.1, and I.10.2 in [9] are satisfied for the Wright-Fisher diffusion. A similar formulation of the result below for the general case of a continuously observed diffusion on R can be found in Theorems 2.8 and 2.13 in [12], where the author proves that the conditions necessary to invoke Theorems I.5.1, I.5.2, I.10.1, and I.10.2 in [9] hold for a certain class of diffusions. However, this class includes only those scalar diffusions for which the inverse of the diffusion coefficient has a polynomial majorant. This fails to hold in our case, forcing us to seek alternative ways to prove that the conditions of the above mentioned theorems hold.

Theorem 3.1.
Let ŝ_T and s̃_T respectively be the ML and Bayesian estimators for the selection parameter s ∈ S (for open bounded S ⊂ R) in the non-neutral Wright-Fisher diffusion (1), with initial distribution being either a point mass at a fixed x₀ ∈ (0, 1) or the stationary distribution. In what follows, let s̄_T refer to either of the two estimators. Then s̄_T is uniformly over compact sets K ⊂ S consistent, i.e. for any ε > 0,

lim_{T→∞} sup_{s∈K} P^{(s)}_ν[ |s̄_T − s| > ε ] = 0;

it converges in distribution to a normal random variable,

L_s{ √T(s̄_T − s) } → N(0, I(s)⁻¹)

uniformly in s ∈ K; and it displays moment convergence: for any p > 0,

lim_{T→∞} E^{(s)}_ν[ |√T(s̄_T − s)|^p ] = E[ |I(s)^{−1/2} ζ|^p ]

uniformly in s ∈ K, where ζ ∼ N(0, 1), for any compact set K ⊂ S. Furthermore, if the loss function ℓ(·) ∈ W_p, then s̄_T is also asymptotically efficient, i.e.

lim_{δ→0} lim_{T→∞} sup_{s:|s−s₀|<δ} E^{(s)}_ν[ ℓ(√T(s̄_T − s)) ] = E[ ℓ(I(s₀)^{−1/2} ζ) ]

holds for all s₀ ∈ S, where ζ ∼ N(0, 1).

As mentioned above, the proof relies on Theorems I.5.1, I.5.2, I.10.1, and I.10.2 in [9], which for reference we combine together into Theorem 3.2 below. Establishing that the conditions of Theorem 3.2 hold for the Wright-Fisher diffusion is non-trivial as the standard arguments found in [12] no longer hold, and this will thus be the main focus of this section. The conclusions of Theorems I.5.1 and I.5.2 guarantee the uniform over compact sets consistency of the MLE and Bayesian estimator respectively, whilst those of Theorems I.10.1 and I.10.2 provide the necessary conditions to deduce the uniform in s ∈ K asymptotic normality and convergence of moments for compact K ⊂ S, as well as asymptotic efficiency.
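Because the drift in (1) is linear in s, the ML estimator solves the score equation in closed form whenever the maximiser lies in the interior of S. A short calculation from the likelihood (9) (our sketch; this closed form is not displayed in the text) gives

ŝ_T = ( 2(X_T − X₀) − ∫₀^T (θ₁(1 − X_t) − θ₂X_t) dt ) / ∫₀^T X_t(1 − X_t) dt,

and the Python snippet below evaluates it on one simulated path (parameter values, the Euler-Maruyama scheme, and the Riemann-sum approximations of the integrals are all illustrative choices of ours):

```python
import numpy as np

S_TRUE, TH1, TH2 = 1.0, 2.0, 2.0   # hypothetical true parameters
T, DT = 400.0, 1e-3

rng = np.random.default_rng(7)
n = int(round(T / DT))
dw = rng.normal(0.0, np.sqrt(DT), size=n)   # Wiener increments

x = x0 = 0.5
int_mut = 0.0   # Riemann sum for int_0^T (th1(1-X) - th2 X) dt
int_den = 0.0   # Riemann sum for int_0^T X(1-X) dt
for i in range(n):
    int_mut += (TH1 * (1.0 - x) - TH2 * x) * DT
    int_den += x * (1.0 - x) * DT
    drift = 0.5 * (S_TRUE * x * (1.0 - x) - TH2 * x + TH1 * (1.0 - x))
    x += drift * DT + np.sqrt(max(x * (1.0 - x), 0.0)) * dw[i]
    x = min(max(x, 0.0), 1.0)   # numerical safeguard only

s_hat = (2.0 * (x - x0) - int_mut) / int_den   # closed-form MLE above
```

By Theorem 3.1, √T(ŝ_T − s) is asymptotically N(0, I(s)⁻¹) with I(s) = (1/4)E^{(s)}[ξ(1−ξ)], so for these values the standard deviation of ŝ_T is roughly √(I(s)⁻¹/T) ≈ 0.2; repeating the experiment over many seeds would approximate the Gaussian limit.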
Theorem 3.2 (Ibragimov-Has’minskii). Let ŝ_T, s̃_T respectively be the ML and Bayesian estimators with prior density p(·) ∈ P_c, defined in terms of a loss function ℓ(·) ∈ W_p, for the parameter s ∈ S, for open bounded S ⊂ R, in (1). Suppose further that the following conditions are satisfied by the likelihood ratio function Z_{T,s}(u) as defined in (10):

1. For all compact K ⊂ S, we can find constants a and B, and functions g_T(·) ∈ G (all of which depend on K), such that the following two conditions hold:

• For all R > 0, all u, v ∈ U_{T,s} satisfying |u| < R, |v| < R, and some m ≥ q > dim(S),

sup_{s∈K} E^{(s)}_ν[ |Z_{T,s}(u)^{1/m} − Z_{T,s}(v)^{1/m}|^m ] ≤ B(1 + R^a)|u − v|^q.   (11)

• For all u ∈ U_{T,s},

sup_{s∈K} E^{(s)}_ν[ Z_{T,s}(u)^{1/2} ] ≤ e^{−g_T(|u|)}.
2. The random functions Z_{T,s}(u) have marginal distributions which converge uniformly in s ∈ K as T → ∞ to those of the random function Z_s(u) ∈ C₀(R), where C₀(R) denotes the space of continuous functions on R vanishing at infinity, equipped with the supremum norm and the Borel σ-algebra.

3. The limit function Z_s(u) attains its maximum at a unique point û(s) with probability 1, and the random function

ψ(v) = ∫_R ℓ(v − u) ( Z_s(u) / ∫_R Z_s(y) dy ) du

attains its minimum value at a unique point ũ(s) with probability 1.

Then we have that the ML and Bayesian estimators are uniformly in s ∈ K consistent, i.e. for any ε > 0,

lim_{T→∞} sup_{s∈K} P^{(s)}_ν[ |s̄_T − s| > ε ] = 0;

the distributions of the random variables ū_T = √T(s̄_T − s) converge uniformly in s ∈ K to the distribution of ū (where ū = û(s) for the MLE and ū = ũ(s) for the Bayesian estimator); and for any loss function ℓ ∈ W_p, uniformly in s ∈ K,

lim_{T→∞} E^{(s)}_ν[ ℓ(√T(s̄_T − s)) ] = E^{(s)}_ν[ ℓ(ū) ].   (12)

In fact, for the Bayesian estimator the requirements for inequality (11) can be weakened, as it suffices to show that (11) holds for m = 2 and any q > 0.

Proof of Theorem 3.1.
Our aim will be to prove that Conditions 1, 2, and 3 in Theorem 3.2 hold for the Wright-Fisher diffusion, for then the ML and Bayesian estimators are uniformly on compact sets consistent. Below, Condition 1 is shown to hold in Propositions 3.4 and 3.5; Condition 2 is shown in Corollary 3.3; and Condition 3 is shown in Proposition 3.6.

It remains to show how uniform in s ∈ K asymptotic normality and convergence of moments, as well as asymptotic efficiency (under the right choice of loss function), follow. Given Conditions 1, 2, and 3 of Theorem 3.2, uniform in s ∈ K asymptotic normality follows immediately from Proposition 3.6; ū = I(s)⁻¹∆(s) with ∆(s) ∼ N(0, I(s)), and ū_T converges uniformly in distribution to ū. Moreover, as stated in Remark I.5.1 in [9], the Ibragimov-Has’minskii conditions also give us a bound on the tails of the likelihood ratio, which can be translated into bounds on the tails of |û_T|^p for any p > 0. Similar bounds on |ũ_T|^p hold for the Bayesian estimator by Theorem I.5.7 in [9], and thus we have that the random variables |ū_T|^p are uniformly integrable for any p > 0, uniformly in s ∈ K for any compact K ⊂ S. Uniform convergence of the moments of the estimators follows from this and the uniform convergence in distribution (by applying a truncation argument).

For loss functions satisfying ℓ(·) ∈ W_p, observe that the uniform convergence in (12) allows us to deduce that

lim_{T→∞} sup_{s:|s−s₀|<δ} E^{(s)}_ν[ ℓ(√T(s̄_T − s)) ] = sup_{s:|s−s₀|<δ} E[ ℓ(I(s)^{−1/2}ζ) ]

for ζ ∼ N(0, 1). Since I(s) is continuous in s, we have that

lim_{δ→0} sup_{s:|s−s₀|<δ} E[ ℓ(I(s)^{−1/2}ζ) ] = E[ ℓ(I(s₀)^{−1/2}ζ) ],

giving asymptotic efficiency.

We proceed to show that Conditions 1, 2, and 3 in Theorem 3.2 hold for the Wright-Fisher diffusion. Theorem 2.4 gives us that the Wright-Fisher diffusion is uniformly LAN, which immediately gives the required marginal convergence of the Z_{T,s}(u) in Condition 2.

Corollary 3.3.
The random functions Z_{T,s}(u) given by

Z_{T,s}(u) = exp{ (u/(2√T)) ∫₀^T √(X_t(1 − X_t)) dW_t − (u²/8) E^{(s)}[ξ(1 − ξ)] + r_T(s, u, X^T) } =: exp{ u∆_T(s) − (u²/2) I(s) + r_T(s, u, X^T) }

have marginal distributions which converge uniformly in s ∈ K as T → ∞ to those of the random function Z_s(u) ∈ C₀(R) given by

Z_s(u) := exp{ u∆(s) − (u²/2) I(s) },

where

∆(s) := lim_{T→∞} (1/(2√T)) ∫₀^T √(X_t(1 − X_t)) dW_t ∼ N(0, I(s)).

Proof.
The result follows immediately from the uniform LAN of the family of measures as shown in Theorem 2.4; see for illustration the display just before Lemma 2.10 in [12]. It is clear that Z_s(u) vanishes at infinity and thus is an element of C₀(R).

The next two results allow us to control the Hellinger distance of the likelihood ratio function as required by Condition 1 in Theorem 3.2.

Proposition 3.4.
For any K ⊂ S compact, we can find a constant C such that for any R > 0 and any u, v ∈ U_{T,s} satisfying |u| < R, |v| < R, the following holds:

sup_{s∈K} E^{(s)}_ν[ |Z_{T,s}(u)^{1/2} − Z_{T,s}(v)^{1/2}|² ] ≤ C(1 + R²)|u − v|².

Proof. In what follows we denote by C_i, for i ∈ N, constants which do not depend on u, v, s, or T. Observe that for any s′, s* ∈ S it holds that

E^{(s′)}_ν[ ∫₀^T |(μ(s′, X_t) − μ(s*, X_t))/σ(X_t)|^m dt ] = E^{(s′)}_ν[ ∫₀^T |((s′ − s*)/2) √(X_t(1 − X_t))|^m dt ] ≤ (|s′ − s*|/4)^m T < ∞,

and so we can use Lemma 1.13 and Remark 1.14 from [12] to split the expectation in (11) into three terms:

E^{(s)}_ν[ |Z_{T,s}(u)^{1/2} − Z_{T,s}(v)^{1/2}|² ] ≤ C₁ ∫₀¹ |√f_{s_u}(x) − √f_{s_v}(x)|² dx + C₂ ∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))² ] dt + C₃ T ∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))⁴ ] dt,   (13)

where we denote s_u = s + u/√T and s_v = s + v/√T, and remark that the above holds for ν = f_s, whilst if ν = δ_{x₀} then the first term on the RHS of (13) vanishes. Observe that

∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))² ] dt = ((u − v)²/(4T)) ∫₀^T E^{(s_v)}_ν[ X_t(1 − X_t) ] dt ≤ (u − v)²/16.

Therefore

C₂ ∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))² ] dt ≤ C₄|u − v|².   (14)

A similar calculation can be performed for the third term in (13) to get

C₃ T ∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))⁴ ] dt ≤ C₅|u − v|⁴.   (15)

Dealing with the first term in (13) is slightly more involved.
To this end, observe that

∫₀¹ |√f_{s_u}(x) − √f_{s_v}(x)|² dx = ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{sx} | G_{s_u}^{−1/2} e^{ux/(2√T)} − G_{s_v}^{−1/2} e^{vx/(2√T)} |² dx.   (16)

Now we have that

C₆ min{e^s, 1} ≤ G_{s_u} := ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{(s + u/√T)x} dx ≤ C₇ max{e^s, 1},   (17)

where C₆ = B(θ₁, θ₂)e^{−diam(S)} and C₇ = B(θ₁, θ₂)e^{diam(S)} are non-zero, positive, and independent of s and T, since we constrain u, v ∈ U_{T,s}. This allows us to deduce that G ↦ 1/√G is Lipschitz on [C₆ inf_{s∈K} min{e^s, 1}, C₇ sup_{s∈K} max{e^s, 1}] with some constant C₈ > 0, i.e.

| G_{s_u}^{−1/2} − G_{s_v}^{−1/2} | ≤ C₈ | G_{s_u} − G_{s_v} | = C₈ ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{sx} | e^{ux/√T} − e^{vx/√T} | dx
≤ C₈C₉ ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{sx} | ux/√T − vx/√T | dx = (C₈C₉/√T) |u − v| ∫₀¹ x^{θ₁}(1 − x)^{θ₂−1} e^{sx} dx ≤ (C₁₀/√T) max{e^s, 1} |u − v|,

where in the second inequality we have made use of the fact that e^z is Lipschitz in z on [−diam(S), diam(S)] with some constant C₉ > 0. Thus we deduce that

| G_{s_u}^{−1/2} e^{ux/(2√T)} − G_{s_v}^{−1/2} e^{vx/(2√T)} |² ≤ | G_{s_u}^{−1/2} ( e^{ux/(2√T)} − e^{vx/(2√T)} ) |² + | e^{vx/(2√T)} ( G_{s_u}^{−1/2} − G_{s_v}^{−1/2} ) |² + 2 e^{vx/(2√T)} G_{s_u}^{−1/2} | e^{ux/(2√T)} − e^{vx/(2√T)} | | G_{s_u}^{−1/2} − G_{s_v}^{−1/2} |
≤ (C₉² x²)/(4T C₆ min{e^s, 1}) |u − v|² + e^{diam(S)x} (C₁₀²/T) max{e^s, 1}² |u − v|² + e^{diam(S)x} (C₉C₁₀ x)/(T √C₆) (max{e^s, 1}/min{e^{s/2}, 1}) |u − v|².   (18)

Putting (18) into (16) gives us that

∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{sx} | G_{s_u}^{−1/2} e^{ux/(2√T)} − G_{s_v}^{−1/2} e^{vx/(2√T)} |² dx ≤ (C_s/T) |u − v|²,   (19)

where

C_s := C₁₁ e^{|s|} + C₁₂ max{e^s, 1}² + C₁₃ max{e^s, 1}/min{e^{s/2}, 1}.

Inserting equations (14), (15), and (19) into (13) allows us to deduce that

sup_{s∈K} E^{(s)}_ν[ |Z_{T,s}(u)^{1/2} − Z_{T,s}(v)^{1/2}|² ] ≤ sup_{s∈K} { (C_s/T + C₄)|u − v|² + C₅|u − v|⁴ } ≤ C|u − v|²(1 + R²),

where we make use of the fact that |u|, |v| < R, as well as the fact that C_s is continuous in s over any compact set K ⊂ S, and C₄, C₅ are independent of s.

Proposition 3.5.
For $K\subset S$ compact, there exists a function $g_T(\cdot)\in\mathcal{G}$ such that for any $u\in\mathbb{U}_{T,s}$ we have that
\[
\sup_{s\in K}\mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\right] \le e^{-g_T(|u|)}. \tag{20}
\]

Proof. Assume for now that for any $M\ge 2$,
\[
\mathbb{P}^{(s)}_{\nu}\left[Z_{T,s}(u) > \exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right] \le \frac{C_{s,M}}{|u|^M} \tag{21}
\]
for some constant $C_{s,M}>0$ depending on $s$ and $M$. We show that if (21) holds, then (20) follows. Indeed,
\[
\mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\right]
= \mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\mathbf{1}\left\{Z_{T,s}(u)\le\exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right\}\right]
+ \mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\mathbf{1}\left\{Z_{T,s}(u)>\exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right\}\right]
\]
\[
\le \exp\left\{-\tfrac{1}{32}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}
+ \mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}(u)\right]^{1/2}\mathbb{P}^{(s)}_{\nu}\left[Z_{T,s}(u)>\exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right]^{1/2}
\le \exp\left\{-\tfrac{1}{32}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\} + \frac{C_{s,M}^{1/2}}{|u|^{M/2}},
\]
where in the first inequality we have made use of Cauchy-Schwarz, and for the second inequality we have used (21). Therefore,
\[
\sup_{s\in K}\mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\right]
\le \sup_{s\in K}\left\{\exp\left\{-\tfrac{1}{32}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\} + \frac{C_{s,M}^{1/2}}{|u|^{M/2}}\right\}
= \exp\left\{-\tfrac{1}{32}\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\} + \sup_{s\in K}\frac{C_{s,M}^{1/2}}{|u|^{M/2}}
=: \exp\{-g_T(|u|)\}.
\]
It remains to ensure that $g_T(\cdot)\in\mathcal{G}$, that $\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]\ge\kappa_K$ for some $\kappa_K>0$, and that for any $M\ge 2$, $\sup_{s\in K}C_{s,M}<\infty$. Observe that
\[
\min\left\{\inf_{s\in K}e^{s},1\right\}B(\theta_1,\theta_2) \le G_s \le \max\left\{\sup_{s\in K}e^{s},1\right\}B(\theta_1,\theta_2).
\]
Thus
\[
\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]
= \inf_{s\in K}\left\{\frac{1}{G_s}\int_0^1 e^{s\xi}\xi^{\theta_1}(1-\xi)^{\theta_2}d\xi\right\}
\ge \frac{\inf_{s\in K}\left\{\int_0^1 e^{s\xi}\xi^{\theta_1}(1-\xi)^{\theta_2}d\xi\right\}}{\max\{\sup_{s\in K}e^{s},1\}B(\theta_1,\theta_2)}
\ge \frac{\min\{\inf_{s\in K}e^{s},1\}B(\theta_1+1,\theta_2+1)}{\max\{\sup_{s\in K}e^{s},1\}B(\theta_1,\theta_2)}
=: \kappa_K,
\]
and $\kappa_K>0$ because $K$ is bounded, and thus both $\sup_{s\in K}e^{s}$ and $\inf_{s\in K}e^{s}$ are finite and non-zero. We show below that $\sup_{s\in K}C_{s,M}$ is finite for all $M\ge 2$; first we verify that $g_T(|u|)$ as defined above is in the class of functions $\mathcal{G}$. To this end, observe that
\[
g_T(|u|) = -\log\left(\exp\left\{-\tfrac{1}{32}\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\} + \sup_{s\in K}\frac{C_{s,M}^{1/2}}{|u|^{M/2}}\right).
\]
Indeed, for a fixed $T>0$, $g_T(|u|)\to\infty$ as $|u|\to\infty$, because $\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]>0$, and furthermore given any fixed $N$, we can choose $M$ large enough (note the way we phrased (21) allows us to choose our $M$ arbitrarily large, say $M>2N$) such that
\[
\lim_{\substack{T\to\infty\\ y\to\infty}} y^N e^{-g_T(y)}
= \lim_{\substack{T\to\infty\\ y\to\infty}} y^N\left(\exp\left\{-\tfrac{1}{32}\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]|y|^2\right\} + \sup_{s\in K}\frac{C_{s,M}^{1/2}}{|y|^{M/2}}\right) = 0,
\]
where in fact $g_T(|u|)$ is independent of $T$. Thus we have proved that if (21) holds, then
\[
\sup_{s\in K}\mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\right] \le e^{-g_T(|u|)}, \qquad g_T(\cdot)\in\mathcal{G}.
\]
To show that (21) holds, we make use of Chebyshev's inequality as well as Theorem 3.2 in [13]. Indeed, observe that if $\nu = f_s$, then
\[
\mathbb{P}^{(s)}_{\nu}\left[Z_{T,s}(u) \ge \exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right]
= \mathbb{P}^{(s)}_{\nu}\Bigg[\frac{G_s}{G_{s+u/\sqrt{T}}}\exp\bigg\{\frac{uX_0}{\sqrt{T}} + \frac{u}{2\sqrt{T}}\int_0^T\sqrt{X_t(1-X_t)}\,dW_t
\]
\[
\qquad - \frac{|u|^2}{8}\left(\frac{1}{T}\int_0^T X_t(1-X_t)dt - \mathbb{E}^{(s)}[\xi(1-\xi)]\right)\bigg\} > \exp\left\{\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\Bigg]
\]
\[
\le \mathbb{P}^{(s)}_{\nu}\left[\left|\log\left(\frac{G_{s+u/\sqrt{T}}}{G_s}\right) + \frac{uX_0}{\sqrt{T}}\right| > \frac{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}{48}\right]
+ \mathbb{P}^{(s)}_{\nu}\left[\left|\frac{u}{2\sqrt{T}}\int_0^T\sqrt{X_t(1-X_t)}\,dW_t\right| > \frac{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}{48}\right]
\]
\[
+ \mathbb{P}^{(s)}_{\nu}\left[\frac{|u|^2}{8}\left|\frac{1}{T}\int_0^T X_t(1-X_t)dt - \mathbb{E}^{(s)}[\xi(1-\xi)]\right| > \frac{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}{48}\right]
=: A_1 + A_2 + A_3.
\]
If $\nu = \delta_x$, the only difference to the above would be the fact that $A_1$ vanishes and the 48 on the RHS of the bounds inside $A_2$ and $A_3$ would change to 32. For $A_1$, we use Chebyshev's inequality:
\[
A_1 \le \left(\frac{48}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}\right)^{M}\mathbb{E}^{(s)}_{\nu}\left[\left|\log\left(\frac{G_{s+u/\sqrt{T}}}{G_s}\right) + \frac{uX_0}{\sqrt{T}}\right|^{M}\right].
\]
But
\[
\left|\log\left(\frac{G_{s+u/\sqrt{T}}}{G_s}\right)\right|
= \left|\log\left(\frac{\int_0^1 x^{\theta_1-1}(1-x)^{\theta_2-1}e^{(s+u/\sqrt{T})x}dx}{\int_0^1 x^{\theta_1-1}(1-x)^{\theta_2-1}e^{sx}dx}\right)\right| \le \frac{|u|}{\sqrt{T}},
\]
so, since $0\le X_0\le 1$, we have
\[
A_1 \le \left(\frac{48}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}\right)^{M}\left(\frac{2|u|}{\sqrt{T}}\right)^{M}
= \left(\frac{96}{\mathbb{E}^{(s)}[\xi(1-\xi)]\sqrt{T}|u|}\right)^{M}
\le \left(\frac{96\,d_s}{\mathbb{E}^{(s)}[\xi(1-\xi)]}\right)^{M}\frac{1}{|u|^{2M}}
=: \frac{C^{(1)}_{s,M}}{|u|^{2M}},
\]
which is of the form required in (21) since $M$ may be taken arbitrarily large. In the last inequality we made use of the fact that $u\in\mathbb{U}_{T,s}$, and thus $|u|\le d_s\sqrt{T}$, where we define $d_s := \sup_{w\in\partial S}|s-w|$ (which is strictly positive and bounded as $S$ is open and bounded). To see that $\sup_{s\in K}C^{(1)}_{s,M}$ is bounded, observe that
\[
\sup_{s\in K}C^{(1)}_{s,M} = \sup_{s\in K}\left(\frac{96\,d_s}{\mathbb{E}^{(s)}[\xi(1-\xi)]}\right)^{M}
\le \left(96\,\frac{B(\theta_1,\theta_2)}{B(\theta_1+1,\theta_2+1)}\,\sup_{s\in K}d_s\,\frac{\max\{\sup_{s\in K}e^{s},1\}}{\min\{\inf_{s\in K}e^{s},1\}}\right)^{M},
\]
which is clearly finite because $K$ is bounded.

For $A_2$ we use a similar argument, but now use the fact that we have a stochastic integral:
\[
A_2 \le \left(\frac{48}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}\right)^{M}\mathbb{E}^{(s)}_{\nu}\left[\left|\frac{u}{2\sqrt{T}}\int_0^T\sqrt{X_t(1-X_t)}\,dW_t\right|^{M}\right]
\]
\[
\le \left(\frac{48}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}\right)^{M}\left(\frac{|u|}{2\sqrt{T}}\right)^{M}\left(\frac{M(M-1)}{2}\right)^{M/2}T^{M/2-1}\mathbb{E}^{(s)}_{\nu}\left[\int_0^T|X_t(1-X_t)|^{M/2}dt\right]
\le \left(\frac{12}{\mathbb{E}^{(s)}[\xi(1-\xi)]}\right)^{M}\left(\frac{M(M-1)}{2}\right)^{M/2}\frac{1}{|u|^{M}}
=: \frac{C^{(2)}_{s,M}}{|u|^{M}},
\]
where the first line uses Chebyshev's inequality, the second inequality uses Lemma 1.1 (equation (1.3)) in [12], and the last uses $\|x(1-x)\|_\infty = 1/4$. That $\sup_{s\in K}C^{(2)}_{s,M}$ is finite follows from arguments similar to those used for the respective term in $A_1$.

For $A_3$ we make use of Theorem 3.2 in [13], which gives us that for $M\ge 2$,
\[
\mathbb{P}^{(s)}_{\nu}\left[\left|\frac{1}{T}\int_0^T X_t(1-X_t)dt - \mathbb{E}^{(s)}[\xi(1-\xi)]\right| \ge \frac{\mathbb{E}^{(s)}[\xi(1-\xi)]}{6}\right]
\le K(s,X,M)\,\frac{\|x(1-x)\|_\infty^{M}}{\left(\frac{\mathbb{E}^{(s)}[\xi(1-\xi)]}{6}\sqrt{T}\right)^{M}}. \tag{22}
\]
For the RHS of (22), we have that
\[
K(s,X,M)\,\frac{\|x(1-x)\|_\infty^{M}}{\left(\frac{\mathbb{E}^{(s)}[\xi(1-\xi)]}{6}\sqrt{T}\right)^{M}}
\le K(s,X,M)\left(\frac{6\|x(1-x)\|_\infty\,d_s}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|}\right)^{M}
=: \frac{C^{(3)}_{s,M}}{|u|^{M}},
\]
where $K(s,X,M)$ is a function that depends on $M$ and on the moments of the hitting times of $X$. Finally we deduce that $\sup_{s\in K}C^{(3)}_{s,M}$ is finite by observing that
\[
\sup_{s\in K}C^{(3)}_{s,M}
= \sup_{s\in K}\left\{K(s,X,M)\left(\frac{6\|x(1-x)\|_\infty\,d_s}{\mathbb{E}^{(s)}[\xi(1-\xi)]}\right)^{M}\right\}
\le \sup_{s\in K}K(s,X,M)\left(\frac{3}{2}\,\frac{B(\theta_1,\theta_2)}{B(\theta_1+1,\theta_2+1)}\,\sup_{s\in K}d_s\,\frac{\max\{\sup_{s\in K}e^{s},1\}}{\min\{\inf_{s\in K}e^{s},1\}}\right)^{M},
\]
which is finite since $\|x(1-x)\|_\infty = 1/4$, $K$ is compact, and $K(s,X,M)$ is bounded by a function which is continuous in $s$ (see Appendix A for the corresponding details).

Finally, we present the result which guarantees that Condition 3 in Theorem 3.2 holds, and thus that the Ibragimov-Has'minskii conditions hold for the Wright-Fisher diffusion.

Proposition 3.6. The random functions $Z_s(u)$ and
\[
\psi(v) := \int_{\mathbb{R}}\ell(v-u)\,\frac{Z_s(u)}{\int_{\mathbb{R}}Z_s(y)dy}\,du
\]
attain their maximum and minimum respectively at the unique point $\bar{u} = \bar{u}_s = I(s)^{-1}\Delta(s)$ with probability 1.

Proof. The first assertion follows immediately from Corollary 3.3, whilst for the second we direct the interested reader to Theorem III.2.1 in [9], which relies on two results: Anderson's Lemma (Lemma II.10.1 in [9]), and Lemma II.10.2 in [9].
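To illustrate the estimator analysed in this section, the following sketch simulates a Wright-Fisher path by Euler-Maruyama and evaluates a continuous-path ML estimator for $s$ in closed form. It assumes the common parametrisation $dX_t = \frac{1}{2}(\theta_1(1-X_t) - \theta_2X_t + sX_tig(1-X_t))dt + \sqrt{X_t(1-X_t)}\,dW_t$ (the precise form of (1) may differ), under which setting the score of the Girsanov log-likelihood to zero gives the displayed closed form for $\hat s_T$; this is an illustrative sketch, not the authors' code.

```python
import numpy as np

def simulate_wf(s, theta1, theta2, x0, T, dt, rng):
    """Euler-Maruyama for the assumed parametrisation
    dX = 0.5*(theta1*(1-X) - theta2*X + s*X*(1-X)) dt + sqrt(X*(1-X)) dW,
    clipped to stay inside (0, 1)."""
    n = int(T / dt)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        xk = x[k]
        drift = 0.5 * (theta1 * (1 - xk) - theta2 * xk + s * xk * (1 - xk))
        diff = np.sqrt(max(xk * (1 - xk), 0.0))
        x[k + 1] = np.clip(xk + drift * dt + diff * np.sqrt(dt) * rng.standard_normal(),
                           1e-6, 1 - 1e-6)
    return x

def mle_selection(x, theta1, theta2, dt):
    """Continuous-path MLE for s under the assumed parametrisation: the score
    equation gives
    s_hat = (2*(X_T - X_0) - int_0^T (theta1*(1-X_t) - theta2*X_t) dt)
            / int_0^T X_t*(1-X_t) dt."""
    num = 2.0 * (x[-1] - x[0]) - np.sum(theta1 * (1 - x[:-1]) - theta2 * x[:-1]) * dt
    den = np.sum(x[:-1] * (1 - x[:-1])) * dt
    return num / den

rng = np.random.default_rng(1)
s_true, theta1, theta2 = 2.0, 2.0, 2.0
path = simulate_wf(s_true, theta1, theta2, 0.5, T=500.0, dt=0.01, rng=rng)
s_hat = mle_selection(path, theta1, theta2, dt=0.01)
print(s_hat)  # close to s_true for large T, by the consistency results above
```

With $\theta_1 = \theta_2 = 2$ the path stays well away from the boundaries, so the Euler scheme with clipping is a reasonable stand-in for the exact diffusion.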
In this article we have shown in Theorem 2.2 that the Wright-Fisher diffusion is ergodic uniformly in the parameter $\vartheta = (s,\theta_1,\theta_2)\in\Theta\subset\mathbb{R}\times(0,\infty)^2$, extending the well-known pointwise in $\vartheta$ ergodicity of the Wright-Fisher diffusion, over any compact set $K\subset\mathbb{R}\times(0,\infty)^2$ and for bounded functions. We have also proved in Theorem 2.4 that the family of measures $\{\mathbb{P}^{(\vartheta)}_{\nu} : \vartheta\in\Theta\}$ induced by the solution to the SDE (1) is uniformly LAN when $\Theta\subset\mathbb{R}\times[1,\infty)^2$, where the extra restriction on the mutation rates ensures that the likelihood ratio function is well-defined. In Section 3 we then considered inference for the selection parameter $s$ when the diffusion is observed continuously through time and the mutation rates are known. Under these assumptions we proved that the ML and Bayesian estimators for $s\in S$ ($S$ an open bounded subset of $\mathbb{R}$) in the non-neutral Wright-Fisher diffusion, started from either a fixed point $x\in(0,1)$ or from stationarity, are uniformly over compact sets consistent and display uniform in $s\in K$ asymptotic normality and convergence of moments, for any compact $K\subset S$. Furthermore, for the right choice of loss function we also have asymptotic efficiency of the two estimators. The uniformity in these results is particularly useful as it guarantees a lower bound on the rate at which the inferential parameters are learned. Such properties have been shown to hold for a wide class of SDEs in [12] by making use of the general theorems of Ibragimov and Has'minskii (Theorems I.5.1, I.5.2, I.10.1 and I.10.2 in [9]); however, they do not hold automatically for the Wright-Fisher diffusion, as those results require the diffusion coefficient to be non-zero everywhere and to have an inverse with a polynomial majorant. Both conditions fail for (1), forcing us to find an alternative way of proving that the Ibragimov-Has'minskii conditions still hold. We emphasise here that the aim of this study is to investigate the properties of the estimators in the "ideal" continuous observation scenario when the whole path is known to the observer.

Assuming that the mutation rates are known is a limitation of this study; however, we emphasise that in the regime considered here these can be inferred directly from the path once the diffusion comes arbitrarily close to either boundary (and for mutation parameters less than 1 this happens in finite time almost surely). Nonetheless, extending this work to include mutation parameters greater than 1 as part of the inferential setup would be of great interest. This proves to be rather challenging, as the likelihood ratio function then involves expressions of the form $(1-x)x^{-1}$ and $x(1-x)^{-1}$ (as witnessed in Theorem 2.4), which require much more delicate arguments in order to establish the same conclusions as in Theorem 3.1. The main issue here is in showing that Condition 1 in the Ibragimov-Has'minskii conditions holds, for the other two conditions follow from Theorem 2.4 and Proposition 3.6. In particular, the fact that the functions $(1-x)x^{-1}$ and $x(1-x)^{-1}$ are unbounded in $x$ and have only finitely many moments with respect to the stationary distribution means that the strategies used in the proofs of Propositions 3.4 and 3.5 cannot be used.

Recent advances in genome sequencing technology have led to an increase in the availability and analysis of genetic time series data. Inference for selection has traditionally been conducted using techniques for, and data coming from, a single point in time. However, having a time series of data points allows one to track changes in allele frequencies over time, and thereby to better understand and infer the presence and effect of selection. Several inferential techniques have already been developed for such a setting (see for instance [1, 14, 19, 8, 7], as well as [4] for a review of the subject), and although these techniques provide ostensibly reasonable estimation, there are not always theoretical guarantees on the statistical properties of the estimators being used. The results presented in this paper offer a baseline in this regard, and prove that in the absence of observational error one is guaranteed that the ML and Bayesian estimators are uniformly over compact sets consistent, asymptotically normal, and display moment convergence, besides being asymptotically efficient for the right choice of loss function.

This work was supported by the EPSRC as well as the MASDOC DTC (under grant EP/HO23364/1), by The Alan Turing Institute under the EPSRC grant EP/N510129/1, and by the EPSRC under grant EP/R044732/1.
A Proof of Theorem 2.2
Proof.
We show uniform in $\vartheta = (s,\theta_1,\theta_2)$ ergodicity for the Wright-Fisher diffusion by making use of Theorem 3.2 in [13], which allows us to bound the LHS of (3) in terms of the moments of the hitting times of the process. We point out that this result requires the diffusion coefficient to be positive everywhere, and the drift and diffusion coefficients to be locally Lipschitz and to satisfy a linear growth condition. These conditions fail for the Wright-Fisher diffusion because of its diffusion coefficient; however, they are used only to guarantee the existence of a unique strong solution to the SDE in Theorem 3.2, which we already have by other means. None of these requirements on the drift and diffusion coefficients are used in the proof of Theorem 3.2 in [13] when $p\in\{1,2,\dots\}$, which allows us to employ this theorem for the Wright-Fisher diffusion for such $p$. All that remains to prove then is that these moments can be bounded by a function continuous in $\vartheta$, for then the supremum over any compact set $K\subset\mathbb{R}\times(0,\infty)^2$ is finite and (3) holds. To this end, we introduce some notation from [13], namely let $a,b\in(0,1)$ be arbitrary fixed points such that $a<b$. Define $S_0 = 0$, $R_0 = 0$, and
\[
S_1 := \inf\{t\ge 0 : X_t = b\}, \qquad R_1 := \inf\{t\ge S_1 : X_t = a\},
\]
\[
S_{n+1} := \inf\{t\ge R_n : X_t = b\}, \qquad R_{n+1} := \inf\{t\ge S_{n+1} : X_t = a\}
\]
for $n\in\mathbb{N}$. By the strong Markov property, $(R_k - R_{k-1})_{k\in\mathbb{N}}$ is an i.i.d. sequence whose law under $\mathbb{P}^{(\vartheta)}_{\nu}$ is equal to the law of $R_1$ under $\mathbb{P}^{(\vartheta)}_{a}$, where $\mathbb{P}^{(\vartheta)}_{\nu}$ and $\mathbb{E}^{(\vartheta)}_{\nu}$ are as defined in Section 2, and $\mathbb{P}^{(\vartheta)}_{a}$ denotes the law of the process started from $a$. Related to the process $(R_n)_{n\in\mathbb{N}}$ we have the process $(N_t)_{t\ge 0}$, which we define as $N_t := \sup\{n : R_n\le t\}$ and for which we observe that $\{N_t\ge n\} = \{R_n\le t\}$. We also denote by $T_b := \inf\{t\ge 0 : X_t = b\}$ the first hitting time of $b$. Furthermore, let $\ell_\vartheta := \mathbb{E}^{(\vartheta)}[N_1] = \mathbb{E}^{(\vartheta)}_{a}[R_1]^{-1}$ (see Lemma 2.7 in [13]), and $\bar{\eta}_1 := -(R_2 - R_1 - \ell_\vartheta^{-1})$. Then Theorem 3.2 in [13] gives us that for $p\in\{1,2,\dots\}$,
\[
\mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\frac{1}{T}\int_0^T h(X_t)dt - \mathbb{E}^{(\vartheta)}[h(\xi)]\right| > \varepsilon\right]
\le K(\vartheta,X,p)\,\varepsilon^{-p}\,\|h\|_\infty^{p}\,T^{-p/2},
\]
where
\[
K(\vartheta,X,p) := 6^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right] + 12^p\,C_p\,\ell_\vartheta^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|R_2-R_1|^p\right] + 2(6^p)\,\ell_\vartheta\,\mathbb{E}^{(\vartheta)}_{a}\left[R_1^p\right]
+ 2^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[\left|R_1 - \ell_\vartheta^{-1}\right|^p\right] + 2^p\,C_p\,\ell_\vartheta^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|\bar{\eta}_1|^p\right],
\]
and $C_p$ is a constant depending only on $p$. We point out here that Theorem 3.2 in [13] holds for all $p\in(1,\infty)$ under additional assumptions, but for our case we need only $p\in\{1,2,\dots\}$. Thus we are left with showing that these moments can be bounded from above by a function continuous in $\vartheta$, for then (3) follows by taking the supremum of this function over a compact set.
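The regeneration structure just introduced can be checked empirically. The sketch below simulates a path (assuming, as an illustration, the parametrisation $dX = \frac{1}{2}(\theta_1(1-X) - \theta_2X + sX(1-X))dt + \sqrt{X(1-X)}\,dW$ for the SDE (1)), extracts the return times $R_n$ for $a = 0.35$, $b = 0.65$, and compares the observed crossing rate $N_T/T$ with the renewal-theory prediction $\ell_\vartheta = \mathbb{E}^{(\vartheta)}_a[R_1]^{-1}$, here estimated by the reciprocal of the mean observed cycle length.

```python
import numpy as np

rng = np.random.default_rng(7)
theta1 = theta2 = 2.0
s, dt, T = 0.0, 0.01, 500.0
a, b = 0.35, 0.65

# Euler-Maruyama path of the Wright-Fisher SDE (parametrisation assumed).
n = int(T / dt)
x = np.empty(n + 1)
x[0] = a
for k in range(n):
    xk = x[k]
    drift = 0.5 * (theta1 * (1 - xk) - theta2 * xk + s * xk * (1 - xk))
    x[k + 1] = np.clip(xk + drift * dt
                       + np.sqrt(max(xk * (1 - xk), 0.0) * dt) * rng.standard_normal(),
                       1e-6, 1 - 1e-6)

# Record the regeneration times R_1, R_2, ...: each R_n is the first return to a
# after the path has reached b following R_{n-1}.
R, waiting_for_b = [], True
for k in range(n + 1):
    if waiting_for_b and x[k] >= b:
        waiting_for_b = False
    elif not waiting_for_b and x[k] <= a:
        R.append(k * dt)
        waiting_for_b = True

increments = np.diff(R)   # observed copies of R_1 under P_a (i.i.d. by the strong Markov property)
N_T = len(R)              # number of completed regeneration cycles by time T
print(N_T, N_T / T, 1.0 / increments.mean())
```

The last two printed numbers agree up to boundary effects, illustrating the identity $\ell_\vartheta = \mathbb{E}^{(\vartheta)}_a[R_1]^{-1}$ used throughout the proof.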
Now the only terms above that depend on $\vartheta$ are
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right], \quad \ell_\vartheta^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|R_2-R_1|^p\right], \quad \ell_\vartheta\,\mathbb{E}^{(\vartheta)}_{a}\left[R_1^p\right], \quad \mathbb{E}^{(\vartheta)}_{\nu}\left[\left|R_1-\ell_\vartheta^{-1}\right|^p\right], \quad \ell_\vartheta^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|\bar{\eta}_1|^p\right], \tag{23}
\]
and in light of the following inequalities
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[|\bar{\eta}_1|^p\right] \le 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[|R_2-R_1|^p\right] + \mathbb{E}^{(\vartheta)}_{\nu}\left[\ell_\vartheta^{-p}\right]\right) = 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{a}\left[R_1^p\right] + \mathbb{E}^{(\vartheta)}_{a}\left[R_1\right]^p\right),
\]
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[\left|R_1-\ell_\vartheta^{-1}\right|^p\right] \le 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right] + \mathbb{E}^{(\vartheta)}_{\nu}\left[\ell_\vartheta^{-p}\right]\right) = 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right] + \mathbb{E}^{(\vartheta)}_{a}\left[R_1\right]^p\right),
\]
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[|R_2-R_1|^p\right] = \mathbb{E}^{(\vartheta)}_{a}\left[R_1^p\right] \le 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{a}\left[T_b^p\right] + \mathbb{E}^{(\vartheta)}_{b}\left[T_a^p\right]\right),
\]
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right] \le 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^p\right] + \mathbb{E}^{(\vartheta)}_{b}\left[T_a^p\right]\right),
\]
\[
\mathbb{E}^{(\vartheta)}_{a}\left[R_1\right] = \mathbb{E}^{(\vartheta)}_{a}\left[T_b\right] + \mathbb{E}^{(\vartheta)}_{b}\left[T_a\right],
\]
it suffices to consider only the terms $\ell_\vartheta$ and $\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^p\right]$. Thus we are left with showing that these two terms can be bounded from above by a function continuous in $\vartheta$. We further point out that we can reduce our considerations in the expressions above to integer moments, for if this is not the case then
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^p\right] \le \mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^{\lceil p\rceil}\right] + \mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^{\lfloor p\rfloor}\right],
\]
where $\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$ denote the ceiling and floor functions respectively.

We make use of the backward equation for the quantity $U_{q,b}(x) := \mathbb{E}^{(\vartheta)}_{x}[T_b^q]$ for $q\in\{1,2,\dots\}$, to derive the ODE (as can be found in [11] p. 203 and 210, and [22])
\[
\frac{x(1-x)}{2}U''_{q,b}(x) + \frac{1}{2}\big(sx(1-x) - \theta_2 x + \theta_1(1-x)\big)U'_{q,b}(x) + qU_{q-1,b}(x) = 0 \tag{24}
\]
with boundary conditions $U_{q,b}(b) = 0$ and $\lim_{y\to 0}S'(y)^{-1}\frac{\partial}{\partial y}U_{q,b}(y) = 0$ when $x<b$, or $\lim_{y\to 1}S'(y)^{-1}\frac{\partial}{\partial y}U_{q,b}(y) = 0$ when $x>b$, where
\[
S(x) := \int^x e^{-sy}y^{-\theta_1}(1-y)^{-\theta_2}dy.
\]
We point out here that in [22] the diffusion coefficient is assumed to be strictly positive everywhere to ensure that the speed and scale of the diffusion are well-defined. The results however still hold for the Wright-Fisher diffusion, as both of these quantities exist and are well-defined despite the fact that the diffusion coefficient is zero at either boundary. Solving (24) for $x<b$ leads to
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b^q] = 2q\int_x^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}U_{q-1,b}(\eta)\,d\eta\,d\xi, \tag{25}
\]
whilst for $x>b$ we have that
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b^q] = 2q\int_b^x e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_\xi^1 e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}U_{q-1,b}(\eta)\,d\eta\,d\xi. \tag{26}
\]
We claim that for any $x<b$ and any $q\in\{1,2,\dots\}$,
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b^q] \le q!\left(\frac{2\max\{e^{-s},1\}}{\theta_1}\int_0^b(1-\xi)^{-\max\{\theta_2,1\}}d\xi\right)^q. \tag{27}
\]
To see this, observe that
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b] = 2\int_x^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta\,d\xi
\le 2\max\{e^{-s},1\}\int_0^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta\,d\xi
\]
\[
\le 2\max\{e^{-s},1\}\int_0^b\xi^{-\theta_1}(1-\xi)^{-\max\{\theta_2,1\}}\int_0^\xi\eta^{\theta_1-1}d\eta\,d\xi
= \frac{2\max\{e^{-s},1\}}{\theta_1}\int_0^b(1-\xi)^{-\max\{\theta_2,1\}}d\xi, \tag{28}
\]
where the second inequality follows from the observation that for $\eta\in(0,\xi)$,
\[
(1-\eta)^{\theta_2-1} \le \begin{cases} 1 & \text{if } \theta_2\ge 1,\\ (1-\xi)^{\theta_2-1} & \text{if } \theta_2<1,\end{cases}
\]
and shows that (27) holds for $q=1$. Now the RHS of (28) is independent of $x$, so we can use the recursion in (25) to conclude by induction that (27) holds for $q\in\{1,2,\dots\}$ as required. Similar arguments to those presented above allow us to conclude that for $x>b$ and $q\in\{1,2,\dots\}$,
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b^q] \le q!\left(\frac{2\max\{e^{s},1\}}{\theta_2}\int_b^1\xi^{-\max\{\theta_1,1\}}d\xi\right)^q. \tag{29}
\]
Both RHS of (27) and (29) are independent of $x$, so trivially
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^q\right] \le q!\left(\left(\frac{2\max\{e^{-s},1\}}{\theta_1}\int_0^b(1-y)^{-\max\{\theta_2,1\}}dy\right)^q + \left(\frac{2\max\{e^{s},1\}}{\theta_2}\int_b^1 y^{-\max\{\theta_1,1\}}dy\right)^q\right). \tag{30}
\]
All the terms on the RHS of (27), (29) and (30) are continuous in $\vartheta$, so we have our required bound for $\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^q\right]$.
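The $q = 1$ case of (25) can be evaluated numerically. The sketch below (taking the normalisation of (25) as reconstructed here as an assumption) approximates $\mathbb{E}^{(\vartheta)}_x[T_b]$ with the trapezoidal rule and checks the intuitive monotonicity that starting closer to $b$ shortens the expected hitting time.

```python
import numpy as np

def mean_hitting_time(x, b, s, theta1, theta2, m=4000):
    """Numerically evaluate the q = 1 case of (25) for x < b:
    E_x[T_b] = 2 * int_x^b e^{-s xi} xi^{-theta1} (1-xi)^{-theta2}
                   * int_0^xi e^{s eta} eta^{theta1-1} (1-eta)^{theta2-1} d eta d xi,
    via the trapezoidal rule."""
    eta = np.linspace(1e-8, b, m)
    inner_vals = np.exp(s * eta) * eta ** (theta1 - 1) * (1 - eta) ** (theta2 - 1)
    # cumulative inner integral int_0^xi ... d eta along the eta grid
    cum = np.concatenate([[0.0],
                          np.cumsum(0.5 * (inner_vals[1:] + inner_vals[:-1]) * np.diff(eta))])
    xi = np.linspace(x, b, m)
    inner = np.interp(xi, eta, cum)
    outer = np.exp(-s * xi) * xi ** (-theta1) * (1 - xi) ** (-theta2) * inner
    return 2.0 * float(np.sum(0.5 * (outer[1:] + outer[:-1]) * np.diff(xi)))

t_from_03 = mean_hitting_time(0.3, 0.7, s=1.0, theta1=2.0, theta2=2.0)
t_from_05 = mean_hitting_time(0.5, 0.7, s=1.0, theta1=2.0, theta2=2.0)
print(t_from_03, t_from_05)  # both positive; the second is smaller
```

Since the outer integrand is positive, the integral over $[0.5, 0.7]$ is strictly smaller than the one over $[0.3, 0.7]$, matching the probabilistic interpretation.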
It remains to show that we can bound $\ell_\vartheta$ from above by an expression continuous in $\vartheta$. Observe that by definition
\[
\ell_\vartheta = \mathbb{E}^{(\vartheta)}_{a}[R_1]^{-1} = \left(\mathbb{E}^{(\vartheta)}_{a}[T_b] + \mathbb{E}^{(\vartheta)}_{b}[T_a]\right)^{-1}, \tag{31}
\]
and recall that we will take the supremum in $\vartheta$ over a given compact set $K$, so using (25) and (26) respectively, and setting $\bar{\theta}_1 := \sup_{\vartheta\in K}\theta_1$, $\bar{\theta}_2 := \sup_{\vartheta\in K}\theta_2$, we can conclude that
\[
\mathbb{E}^{(\vartheta)}_{a}[T_b] = 2\int_a^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta\,d\xi
\ge 2\min\{e^{-s},1\}\int_a^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}d\xi\int_0^a\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta
\ge 2\min\{e^{-s},1\}(b-a)\,\frac{a^{\bar{\theta}_1}}{\bar{\theta}_1}\,(1-a)^{\bar{\theta}_2-1}, \tag{32}
\]
\[
\mathbb{E}^{(\vartheta)}_{b}[T_a] = 2\int_a^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_\xi^1 e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta\,d\xi
\ge 2\min\{e^{s},1\}\int_a^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}d\xi\int_b^1\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta
\ge 2\min\{e^{s},1\}(b-a)\,\frac{(1-b)^{\bar{\theta}_2}}{\bar{\theta}_2}\,b^{\bar{\theta}_1-1}, \tag{33}
\]
which follow by observing that
\[
\xi^{-\theta_1}(1-\xi)^{-\theta_2} > 1 \quad \forall\,\xi\in(a,b),\ \forall\,\theta_1,\theta_2>0,
\]
\[
(1-\eta)^{\theta_2-1} \ge (1-a)^{\bar{\theta}_2-1} \quad \forall\,\eta\in(0,a), \qquad
\eta^{\theta_1-1} \ge b^{\bar{\theta}_1-1} \quad \forall\,\eta\in(b,1).
\]
Note that the RHS of (32) and (33) are continuous in $\vartheta$, and thus in view of (31) we have found the required upper bound on $\ell_\vartheta$, which is continuous in $\vartheta$.

B Extending Theorem 2.2 for two specific unbounded functions
Recall the notation introduced in Appendix A, namely the regeneration times $\{S_n,R_n\}_{n=0}^{\infty}$ and the number of upcrossings up to time $t$, $\{N_t\}_{t\ge 0}$. In what follows we consider the function $(1-x)x^{-1}$; however, similar arguments hold for the function $x(1-x)^{-1}$. We want to prove that
\[
\lim_{T\to\infty}\sup_{\vartheta\in K}\mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\frac{1}{T}\int_0^T\frac{1-X_t}{X_t}dt - \mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\right| > \varepsilon\right] = 0 \tag{34}
\]
holds for any compact set $K\subset\Theta\subset\mathbb{R}\times[1,\infty)^2$. We point out that (34) is an extension of the result in Theorem 2.2 for the specific unbounded function $(1-x)x^{-1}$, which is needed in the proof of Theorem 2.4. Note that the expectation inside the probability is well-defined because we are assuming $(\theta_1,\theta_2)\in[1,\infty)^2$; however, the function $(1-\xi)\xi^{-1}$ has only finitely many moments for any given pair of mutation rates, which makes the analysis here more intricate than the one in Appendix A. The strategy here will be to decompose the sample path of the diffusion into i.i.d. blocks of excursions as done in Theorem 3.5 in [13]. However, we will deal with the resulting expectations in a different way, namely by applying the ODE approach used in Appendix A to bound these quantities by functions continuous in $\vartheta$. Recall that as we are taking a supremum over $\vartheta$ in a compact set $K$, bounding an expectation by a function continuous in $\vartheta$ suffices to yield a bound uniform over $K$.
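The expectation appearing in (34) can be computed numerically from the stationary density $f_\vartheta(x)\propto e^{sx}x^{\theta_1-1}(1-x)^{\theta_2-1}$ (this explicit form, stated in Section 2, is assumed here). For $s=0$, $\theta_1=\theta_2=2$ the value has the closed form $B(\theta_1-1,\theta_2+1)/B(\theta_1,\theta_2)=2$, which the quadrature below reproduces; the sketch also illustrates that positive selection decreases $\mathbb{E}^{(\vartheta)}[(1-\xi)/\xi]$, since the density shifts towards 1.

```python
import numpy as np

def stationary_mean_of_ratio(s, theta1, theta2, m=200001):
    """E^{(vartheta)}[(1 - xi)/xi] under f(x) ∝ e^{s x} x^{theta1-1} (1-x)^{theta2-1},
    by trapezoidal quadrature; the expectation is finite only for theta1 > 1."""
    x = np.linspace(1e-9, 1.0 - 1e-9, m)
    w = np.exp(s * x) * x ** (theta1 - 1) * (1.0 - x) ** (theta2 - 1)  # unnormalised density
    g = (1.0 - x) / x
    dx = x[1] - x[0]
    num = np.sum(0.5 * ((w * g)[1:] + (w * g)[:-1])) * dx
    den = np.sum(0.5 * (w[1:] + w[:-1])) * dx
    return num / den

val_neutral = stationary_mean_of_ratio(0.0, 2.0, 2.0)   # exact value is 2
val_selected = stationary_mean_of_ratio(2.0, 2.0, 2.0)  # positive s shifts mass right
print(val_neutral, val_selected)
```

The decrease under positive $s$ follows since $(1-x)/x$ is decreasing in $x$ while $e^{sx}$ tilts mass towards larger $x$.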
To this end, fix $\varepsilon\in(0,\mathbb{E}^{(\vartheta)}[(1-\xi)\xi^{-1}])$ and choose $\delta\in(0,1)$ such that $\varepsilon = \delta\,\mathbb{E}^{(\vartheta)}[(1-\xi)\xi^{-1}]$, and set $\Omega_T := \{|N_T T^{-1} - \ell_\vartheta| \le \ell_\vartheta\delta/2\}$ for $\ell_\vartheta = \mathbb{E}^{(\vartheta)}_{a}[R_1]^{-1}$. Then as in the proof of Theorem 3.5 in [13], we get the following decomposition:
\[
\mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\frac{1}{T}\int_0^T\frac{1-X_t}{X_t}dt - \mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\right| > \varepsilon\right]
\le \mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\int_0^{R_1}\frac{1-X_t}{X_t}dt\right| > \frac{T\varepsilon}{6}\right]
+ \mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\int_{R_1}^{R_{N_T+1}}\frac{1-X_t}{X_t}dt - N_T\,\mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\mathbb{E}^{(\vartheta)}_{a}[R_1]\right| > \frac{T\varepsilon}{6},\ \Omega_T\right]
\]
\[
+ \mathbb{P}^{(\vartheta)}_{\nu}\left[\left|N_T\,\mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\mathbb{E}^{(\vartheta)}_{a}[R_1] - T\,\mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\right| > \frac{T\varepsilon}{2},\ \Omega_T\right]
+ \mathbb{P}^{(\vartheta)}_{\nu}\left[\int_T^{R_{N_T+1}}\frac{1-X_t}{X_t}dt > \frac{T\varepsilon}{6},\ \Omega_T\right]
+ \mathbb{P}^{(\vartheta)}_{\nu}\left[\Omega_T^c\right]
=: A + B + E + C + D.
\]
Dealing with $E$ and $D$ can be achieved as in equations (3.10) and (3.14) in [13], to deduce that $E = 0$ and
\[
D \le \frac{c}{T\varepsilon^2}\,\mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]^2\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[\left|R_1-\ell_\vartheta^{-1}\right|^2\right] + 2C_2\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|\bar{\eta}_1|^2\right]\ell_\vartheta\right),
\]
for $c$ a numerical constant and $C_2$ the constant from the Burkholder-Davis-Gundy inequality. All the above expressions are either constant or have been shown to be bounded by functions continuous in $\vartheta$ in Appendix A, so it remains to deal with the terms $A$, $B$ and $C$ above.

Applying Markov's inequality to $A$ gives
\[
A \le \frac{6}{T\varepsilon}\,\mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{R_1}\frac{1-X_t}{X_t}dt\right],
\]
and we can decompose the above integral:
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{R_1}\frac{1-X_t}{X_t}dt\right]
= \mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{S_1}\frac{1-X_t}{X_t}dt\right] + \mathbb{E}^{(\vartheta)}_{\nu}\left[\int_{S_1}^{R_1}\frac{1-X_t}{X_t}dt\right]
\le \mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right] + \frac{1-a}{a}\,\mathbb{E}^{(\vartheta)}_{\nu}[R_1]. \tag{35}
\]
So it remains to prove that the first term on the RHS can be bounded by a function continuous in $\vartheta$. It turns out that $B$ and $C$ can be bounded by similar quantities, so we do this first and subsequently show that the resulting quantities can be bounded by functions continuous in $\vartheta$. Indeed, set
\[
\xi_k := \int_{R_k}^{R_{k+1}}\frac{1-X_t}{X_t}dt, \qquad M_0 = 0, \qquad M_n := \sum_{k=1}^{n}\left(\xi_k - \mathbb{E}^{(\vartheta)}_{\nu}[\xi_k]\right).
\]
Then
\[
B = \mathbb{P}^{(\vartheta)}_{\nu}\left[|M_{N_T}| > \frac{T\varepsilon}{6},\ \Omega_T\right]
\le \mathbb{P}^{(\vartheta)}_{\nu}\left[\sup_{n\le\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}|M_n| > \frac{T\varepsilon}{6}\right]
\le \left(\frac{6}{T\varepsilon}\right)^2\mathbb{E}^{(\vartheta)}_{\nu}\left[\sup_{n\le\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}|M_n|^2\right]
\le C_2\left(\frac{6}{T\varepsilon}\right)^2\mathbb{E}^{(\vartheta)}_{\nu}\left[[M]_{\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}\right]
\]
by the Chebyshev and Burkholder-Davis-Gundy inequalities, where $[M]_n$ denotes the quadratic variation of $M$ up to time $n$, and $C_2$ is the Burkholder-Davis-Gundy constant. Now observe that
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[[M]_{\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}\right]
= \lfloor T\ell_\vartheta(1+\delta/2)\rfloor\,\mathbb{E}^{(\vartheta)}_{\nu}\left[\left(\xi_1 - \mathbb{E}^{(\vartheta)}_{\nu}[\xi_1]\right)^2\right]
\le \lfloor T\ell_\vartheta(1+\delta/2)\rfloor\left(\mathbb{E}^{(\vartheta)}_{a}\left[\xi_1^2\right] + \mathbb{E}^{(\vartheta)}_{a}[\xi_1]^2\right),
\]
because the $\{\xi_k\}_{k=1}^{\infty}$ are i.i.d., and moreover under $\mathbb{P}^{(\vartheta)}_{\nu}$ they are equal in distribution to $\xi_1$ under $\mathbb{P}^{(\vartheta)}_{a}$. So
\[
B \le \frac{36\,C_2\,\ell_\vartheta(1+\delta/2)}{T\varepsilon^2}\left(\mathbb{E}^{(\vartheta)}_{a}\left[\xi_1^2\right] + \mathbb{E}^{(\vartheta)}_{a}[\xi_1]^2\right). \tag{36}
\]
The second term of (36) can be bounded in the same way as in (35), whilst for the first term we can use a similar decomposition to get
\[
\mathbb{E}^{(\vartheta)}_{a}\left[\xi_1^2\right] \le 2\left(\mathbb{E}^{(\vartheta)}_{a}\left[\left(\int_0^{T_b}\frac{1-X_t}{X_t}dt\right)^2\right] + \left(\frac{1-a}{a}\right)^2\mathbb{E}^{(\vartheta)}_{a}\left[R_1^2\right]\right). \tag{37}
\]
Finally, for $C$ we use the same arguments as in [13] (just before equation (3.13)) to get that
\[
C \le \sum_{k=1}^{\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}\mathbb{P}^{(\vartheta)}_{\nu}\left[\int_{R_k}^{R_{k+1}}\frac{1-X_t}{X_t}dt > \frac{T\varepsilon}{6}\right]
\le \lfloor T\ell_\vartheta(1+\delta/2)\rfloor\left(\frac{6}{T\varepsilon}\right)^2\mathbb{E}^{(\vartheta)}_{\nu}\left[\left(\int_{R_1}^{R_2}\frac{1-X_t}{X_t}dt\right)^2\right]
\le \frac{36\,\ell_\vartheta(1+\delta/2)}{T\varepsilon^2}\,\mathbb{E}^{(\vartheta)}_{a}\left[\left(\int_0^{R_1}\frac{1-X_t}{X_t}dt\right)^2\right],
\]
and we can apply the same reasoning as in (37).
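As a quick empirical sanity check of (34), one can compare the time average $\frac{1}{T}\int_0^T(1-X_t)X_t^{-1}dt$ along a simulated path with the stationary expectation. The sketch below assumes the parametrisation $dX = \frac{1}{2}(\theta_1(1-X) - \theta_2X + sX(1-X))dt + \sqrt{X(1-X)}\,dW$; for $s=0$, $\theta_1=3$, $\theta_2=2$ the stationary law is Beta$(3,2)$ and $\mathbb{E}[(1-\xi)/\xi] = B(2,3)/B(3,2) = 1$.

```python
import numpy as np

rng = np.random.default_rng(3)
s, theta1, theta2 = 0.0, 3.0, 2.0
T, dt = 1000.0, 0.01
n = int(T / dt)

x = 0.6  # stationary mean of Beta(3, 2)
running = 0.0
for _ in range(n):
    running += (1.0 - x) / x * dt
    drift = 0.5 * (theta1 * (1.0 - x) - theta2 * x + s * x * (1.0 - x))
    x = min(max(x + drift * dt + np.sqrt(x * (1.0 - x) * dt) * rng.standard_normal(),
                1e-3), 1.0 - 1e-3)

time_average = running / T
print(time_average)  # should be close to 1 for large T
```

The choice $\theta_1 = 3 > 1$ keeps the time average well-behaved; for $\theta_1$ close to 1 the integrand $(1-x)x^{-1}$ has heavy excursions near the boundary, which is precisely the difficulty this appendix addresses.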
It remains to show that the terms
\[
\mathbb{E}^{(\vartheta)}_{a}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right], \qquad
\mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right], \qquad
\mathbb{E}^{(\vartheta)}_{a}\left[\left(\int_0^{T_b}\frac{1-X_t}{X_t}dt\right)^2\right] \tag{38}
\]
can be bounded by functions continuous in $\vartheta$. The same arguments used to derive the ODEs in Appendix A can be used here to derive an ODE for $U_n(x) := \mathbb{E}^{(\vartheta)}_{x}[(\int_0^{T_b}(1-X_t)X_t^{-1}dt)^n]$ for the cases when $x<b$ and $x>b$, with the same boundary conditions as in Appendix A. Thus the following recursion holds for $U_n(x)$ when $x<b$:
\[
U_n(x) = 2n\int_x^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-2}(1-\eta)^{\theta_2}U_{n-1}(\eta)\,d\eta\,d\xi, \qquad n=1,2,\dots, \tag{39}
\]
and for $x>b$ we have
\[
U_n(x) = 2n\int_b^x e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_\xi^1 e^{s\eta}\eta^{\theta_1-2}(1-\eta)^{\theta_2}U_{n-1}(\eta)\,d\eta\,d\xi, \qquad n=1,2,\dots. \tag{40}
\]
Now for $n=1$, we get that for $x<b$,
\[
\mathbb{E}^{(\vartheta)}_{x}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right]
= 2\int_x^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-2}(1-\eta)^{\theta_2}d\eta\,d\xi
\le 2\max\{e^{-s},1\}\int_x^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi\eta^{\theta_1-2}d\eta\,d\xi
= \frac{2\max\{e^{-s},1\}}{\theta_1-1}\int_x^b\xi^{-1}(1-\xi)^{-\theta_2}d\xi, \tag{41}
\]
where we point out that the RHS of (41) is continuous in $\vartheta$ over any compact set $K\subset\Theta\subset\mathbb{R}\times[1,\infty)^2$ because $\Theta$ is open (so that $\theta_1>1$ on $K$). For $x>b$,
\[
\mathbb{E}^{(\vartheta)}_{x}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right]
= 2\int_b^x e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_\xi^1 e^{s\eta}\eta^{\theta_1-2}(1-\eta)^{\theta_2}d\eta\,d\xi
\le 2\max\{e^{s},1\}\int_b^x\xi^{-\max\{\theta_1,2\}}(1-\xi)^{-\theta_2}\int_\xi^1(1-\eta)^{\theta_2}d\eta\,d\xi
\]
\[
= \frac{2\max\{e^{s},1\}}{\theta_2+1}\int_b^x\xi^{-\max\{\theta_1,2\}}(1-\xi)\,d\xi
\le \frac{2\max\{e^{s},1\}}{\theta_2+1}\int_b^x\xi^{-\max\{\theta_1,2\}}d\xi. \tag{42}
\]
Thus when $\nu = f_\vartheta$,
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right]
\le \frac{2\max\{e^{-s},1\}}{\theta_1-1}\int_0^b\int_x^b\xi^{-1}(1-\xi)^{-\theta_2}d\xi\,f_\vartheta(x)\,dx
+ \frac{2\max\{e^{s},1\}}{\theta_2+1}\int_b^1\int_b^x\xi^{-\max\{\theta_1,2\}}d\xi\,f_\vartheta(x)\,dx
\]
\[
\le \frac{2e^{|s|}}{\theta_1(\theta_1-1)}\,\frac{1}{G_\vartheta}\int_0^b(1-\xi)^{-\theta_2}d\xi
+ \frac{2\max\{e^{s},1\}}{\theta_2+1}\int_b^1\xi^{-\max\{\theta_1,2\}}d\xi, \tag{43}
\]
which follows from
\[
\int_0^b\int_x^b\xi^{-1}(1-\xi)^{-\theta_2}x^{\theta_1-1}(1-x)^{\theta_2-1}\,d\xi\,dx
= \int_0^b\int_0^\xi\xi^{-1}(1-\xi)^{-\theta_2}x^{\theta_1-1}(1-x)^{\theta_2-1}\,dx\,d\xi
\le \frac{1}{\theta_1}\int_0^b\xi^{\theta_1-1}(1-\xi)^{-\theta_2}d\xi
\le \frac{1}{\theta_1}\int_0^b(1-\xi)^{-\theta_2}d\xi,
\]
because $\theta_1,\theta_2>1$, and
\[
\int_b^1\int_b^x\xi^{-\max\{\theta_1,2\}}f_\vartheta(x)\,d\xi\,dx
= \int_b^1\int_\xi^1\xi^{-\max\{\theta_1,2\}}f_\vartheta(x)\,dx\,d\xi
\le \int_b^1\xi^{-\max\{\theta_1,2\}}d\xi.
\]
Similarly, using the recursions in (39) and (40), we get that for $x<b$,
\[
\mathbb{E}^{(\vartheta)}_{x}\left[\left(\int_0^{T_b}\frac{1-X_t}{X_t}dt\right)^2\right]
\le \frac{8\max\{e^{-s},1\}^2}{(\theta_1-1)^2}\int_0^b\gamma^{\theta_1-2}(1-\gamma)^{-\theta_2}d\gamma\times\int_x^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}d\xi
\]
\[
\le \frac{8\max\{e^{-s},1\}^2}{(\theta_1-1)^2}(1-b)^{-\theta_2}\int_0^b\gamma^{\theta_1-2}d\gamma\times\int_x^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}d\xi, \tag{44}
\]
which follows from
\[
\int_0^\xi\eta^{\theta_1-2}(1-\eta)^{\theta_2}\int_\eta^b\gamma^{-1}(1-\gamma)^{-\theta_2}d\gamma\,d\eta
\le \int_0^b\eta^{\theta_1-2}(1-\eta)^{\theta_2}\int_\eta^b\gamma^{-1}(1-\gamma)^{-\theta_2}d\gamma\,d\eta
\le \frac{1}{\theta_1-1}\int_0^b\gamma^{\theta_1-2}(1-\gamma)^{-\theta_2}d\gamma.
\]
As the RHS of (41), (42), (43), and (44) are all continuous in $\vartheta$, we are able to exhibit a bound for the quantities in (38) uniformly over compact $K\subset\Theta$, and hence similarly bound the quantities $A$, $B$, and $C$. Combined with the bounds for $D$ and $E$ we conclude that (34) holds.

References

[1] J. P. Bollback, T. L. York, and R. Nielsen. Estimation of 2Nes from temporal allele frequency data. Genetics, 179(1):497–502, 2008.

[2] C. Cannings. The latent roots of certain Markov chains arising in genetics: a new approach. I. Haploid models.
Advances in Appl. Probability, 6:260–290, 1974.

[3] D. A. Dawson, B. Maisonneuve, and J. Spencer. École d'Été de Probabilités de Saint-Flour XXI—1991, volume 1541 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1993. Papers from the school held in Saint-Flour, August 18–September 4, 1991, edited by P. L. Hennequin.

[4] M. Dehasque, M. C. Ávila Arcos, D. Díez-del-Molino, M. Fumagalli, K. Guschanski, E. D. Lorenzen, A.-S. Malaspinas, T. Marques-Bonet, M. D. Martin, G. G. R. Murray, A. S. T. Papadopulos, N. O. Therkildsen, D. Wegmann, L. Dalén, and A. D. Foote. Inference of natural selection from ancient DNA. Evolution Letters, 4(2):94–108, 2020.

[5] S. N. Ethier and T. G. Kurtz. Markov processes. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, 1986. Characterization and convergence.

[6] S. Gugushvili and P. Spreij. Parametric inference for stochastic differential equations: a smooth and match approach. ALEA Lat. Am. J. Probab. Math. Stat., 9(2):609–635, 2012.

[7] Z. He, X. Dai, M. Beaumont, and F. Yu. Detecting and quantifying natural selection at two linked loci from time series data of allele frequencies. bioRxiv, 2019.

[8] Z. He, X. Dai, M. Beaumont, and F. Yu. Maximum likelihood estimation of natural selection and allele age from time series data of allele frequencies. bioRxiv, 2020.

[9] I. A. Ibragimov and R. Z. Has'minskii. Statistical estimation: Asymptotic theory. Springer-Verlag, 1981.

[10] N. Ikeda and S. Watanabe. Stochastic differential equations and diffusion processes, volume 24 of North-Holland Mathematical Library. North-Holland Publishing Co., Amsterdam; Kodansha, Ltd., Tokyo, second edition, 1989.

[11] S. Karlin and H. M. Taylor. A second course in stochastic processes. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London, 1981.

[12] Y. A. Kutoyants. Statistical inference for ergodic diffusion processes. Springer Series in Statistics. Springer-Verlag London, Ltd., London, 2004.

[13] E. Löcherbach, D. Loukianova, and O. Loukianov. Polynomial bounds in the ergodic theorem for one-dimensional diffusions and integrability of hitting times. Ann. Inst. Henri Poincaré Probab. Stat., 47(2):425–449, 2011.

[14] A.-S. Malaspinas, O. Malaspinas, S. N. Evans, and M. Slatkin. Estimating allele age and selection coefficient from time-serial data. Genetics, 192(2):599–607, 2012.

[15] R. Nickl and K. Ray. Nonparametric statistical inference for drift vector fields of multi-dimensional diffusions. Ann. Statist., 48(3):1383–1408, 2020.

[16] R. Nickl and J. Söhl. Nonparametric Bayesian posterior contraction rates for discretely observed scalar diffusions. Ann. Statist., 45(4):1664–1693, 2017.

[17] L. Panzar and H. van Zanten. Nonparametric Bayesian inference for ergodic diffusions. J. Statist. Plann. Inference, 139(12):4193–4199, 2009.

[18] J. Pitman and M. Yor. Bessel processes and infinitely divisible laws. In D. Williams, editor, Stochastic Integrals, volume 851 of Lecture Notes in Mathematics, pages 285–370. Springer-Verlag, 1981.

[19] J. G. Schraiber, S. N. Evans, and M. Slatkin. Bayesian inference of natural selection from allele frequency time series. Genetics, 203(1):493–511, 2016.

[20] F. van der Meulen and H. van Zanten. Consistent nonparametric Bayesian inference for discretely observed scalar diffusions. Bernoulli, 19(1):44–63, 2013.

[21] J. H. van Zanten. A note on consistent estimation of multivariate parameters in ergodic diffusion models. Scand. J. Statist., 28(4):617–623, 2001.

[22] H. Wang and C. Yin. Moments of the first passage time of one-dimensional diffusion with two-sided barriers. Statist. Probab. Lett., 78(18):3373–3380, 2008.

[23] G. A. Watterson. Estimating and testing selection: the two-alleles, genetic selection diffusion model.