Convergence of Likelihood Ratios and Estimators for Selection in non-neutral Wright-Fisher Diffusions
Jaromir Sant, Paul A. Jenkins, Jere Koskela, Dario Spanò
MASDOC, Department of Statistics & Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
The Alan Turing Institute, British Library, London NW1 2DB, United Kingdom
August 20, 2020
Abstract
A number of discrete time, finite population size models in genetics describing the dynamics of allele frequencies are known to converge (subject to suitable scaling) to a diffusion process in the infinite population limit, termed the Wright-Fisher diffusion. In this article we show that the diffusion is ergodic uniformly in the selection and mutation parameters, and that the measures induced by the solution to the stochastic differential equation are uniformly locally asymptotically normal. Subsequently these two results are used to analyse the statistical properties of the Maximum Likelihood and Bayesian estimators for the selection parameter, when both selection and mutation are acting on the population. In particular, it is shown that these estimators are uniformly over compact sets consistent, display uniform in the selection parameter asymptotic normality and convergence of moments over compact sets, and are asymptotically efficient for a suitable class of loss functions.
Mathematical population genetics is concerned with the study of how populations evolve over time, offering viable models to study how various biological phenomena such as selection and mutation affect the genetic profile of the population they act upon. Many models have been proposed over the years, but perhaps the most popular is the Wright-Fisher model (see for instance [11, Chapter 15, Section 2]). Under a suitable scaling of both space and time, a diffusion limit exists for the Wright-Fisher model, which is referred to as the Wright-Fisher diffusion (1) and is the main focus of this article. The Wright-Fisher diffusion is robust in the sense that the broad class of Cannings models [2] converges to it when suitably scaled. Furthermore, it has the neat property that the only contribution to the diffusion coefficient comes from random mating, whilst other features such as selection and mutation appear solely in the drift coefficient. This facilitates inference, as one can concentrate on estimating the drift, treating the diffusion coefficient as a known expression.

In this article we focus on a continuously observed Wright-Fisher diffusion describing the allele frequency dynamics in a two-allele, haploid population undergoing both selection and mutation. In Section 2 we show that the diffusion is ergodic uniformly over both the selection and mutation parameters, and subsequently that the associated family of measures induced by the solution to the stochastic differential equation (SDE) is uniformly locally asymptotically normal (provided the mutation parameters are greater than 1). In Section 3 we then shift our focus to the properties of the maximum likelihood (ML) and Bayesian estimators for the selection parameter s ∈ S ⊂ R (which measures how much more favourable one allele is over the other), under the assumption that the mutation parameters are a priori known.
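Since all of the selection and mutation signal sits in the drift while the diffusion coefficient is a known function of the state, the model in (1) is straightforward to simulate. As a rough numerical illustration (ours, not the paper's: the parameter values, the Euler-Maruyama scheme, and the clipping of each step to [0, 1] are all illustrative choices), the following Python sketch simulates one path of the diffusion:

```python
import numpy as np

# Hypothetical parameter values: s favours allele A1; theta1, theta2 > 1
# keep the path away from the boundaries.
S, THETA1, THETA2 = 2.0, 1.5, 1.5

def drift(x):
    # mu(theta, x) = (1/2)(s x(1-x) - theta2 x + theta1 (1-x)), as in (1)
    return 0.5 * (S * x * (1.0 - x) - THETA2 * x + THETA1 * (1.0 - x))

def simulate_path(x0=0.5, T=50.0, dt=1e-3, seed=0):
    """Euler-Maruyama discretisation of the Wright-Fisher SDE (1); the
    clipping to [0, 1] is a numerical safeguard, not part of the model."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    dw = rng.normal(0.0, np.sqrt(dt), size=n)   # Wiener increments
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        xi = x[i]
        xi += drift(xi) * dt + np.sqrt(max(xi * (1.0 - xi), 0.0)) * dw[i]
        x[i + 1] = min(max(xi, 0.0), 1.0)
    return x

path = simulate_path()
time_avg = path.mean()   # crude estimate of the stationary mean of X
```

With s > 0 and θ₁ = θ₂ the time average should settle above 1/2, reflecting the advantage of allele A₁; the diffusion coefficient √(X_t(1−X_t)) vanishing at the boundaries is precisely what makes the diffusion classes studied in [12] inapplicable here.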
We briefly discuss some technical issues associated with conducting joint inference for the selection and mutation parameters in Section 4.

We point out here that by observing the path continuously through time without error, one can establish and analyse explicitly the statistical error produced by an estimator based on the whole sample path, which then clearly illustrates the statistical limitations of alternative estimators based on less informative (e.g. discrete) observations. In a discrete observation setting, in addition to the above mentioned statistical error, one also has to deal with observational error. One certainly cannot hope for an estimator that performs better in a discrete setting than in a continuous one, so our analysis may be viewed as the ‘best possible’ performance for inference from a discretely observed model.

Inference for scalar diffusions, particularly proving consistency of estimators under specific observational schemes, has generated considerable interest over the past few years [6, 12, 15, 16, 17, 20, 21, 23]. However, most of the work so far has considered classes of diffusions which directly preclude the Wright-Fisher diffusion, for instance by imposing periodic boundary conditions on the drift coefficients or by requiring that the diffusion coefficient be strictly positive everywhere. The asymptotic study of a variety of estimators for continuously observed ergodic scalar diffusions has been entertained in great depth in [12]; see in particular Theorems 2.8 and 2.13 in [12], which are respectively adaptations of Theorems I.5.1, I.10.1 and I.5.2, I.10.2 in [9]. However, Theorems 2.8 and 2.13 in [12] cannot be applied directly to the Wright-Fisher diffusion as certain conditions do not hold, namely the reciprocal of the diffusion coefficient does not have a polynomial majorant. This discrepancy makes replicating the results for the Wright-Fisher diffusion with selection and mutation highly non-trivial.
Instead we exploit the explicit nature of (1), below, to prove, in our main result Theorem 3.1, uniform in the selection parameter over compact sets consistency, asymptotic normality and convergence of moments, as well as asymptotic efficiency for both the Maximum Likelihood (ML) and Bayesian estimators. We achieve this by showing that the conditions of Theorems I.5.1, I.10.1 and I.5.2, I.10.2 in [9] still hold for the Wright-Fisher diffusion and that this diffusion is ergodic uniformly in the selection and mutation parameters (a term we define in Section 2). We point out that the uniformity in our results is particularly useful as it controls the lowest rate (over the true parameters) at which the parameters of interest are being learned by the inferential scheme.

The Wright-Fisher diffusion with selection but without mutation was tackled specifically by Watterson in [23], where the author makes use of a frequentist framework. Having no mutation ensures that the diffusion is absorbed at either boundary point 0 or 1 in finite time almost surely, and by conditioning on absorption Watterson computes the moment generating function, proves asymptotic normality, and derives hypothesis tests for the Maximum Likelihood Estimator (MLE). Watterson’s work however does not address the Bayesian estimator, nor does it readily extend to the case when mutation is present, because the diffusion is no longer absorbed at the boundaries. In this sense the results obtained in Theorem 3.1 are complementary to those obtained by Watterson, under the assumption that the mutation parameters are known. Although this is a restriction, we are observing the path continuously over the interval [0, T] and subsequently sending T → ∞, so these parameters could be inferred by considering the boundary behaviour of the diffusion. In particular, when either mutation parameter is less than 1, the diffusion hits the corresponding boundary in finite time almost surely.
Further, as the diffusion approaches the boundary the diffusion coefficient (i.e. noise) vanishes, and in fact it vanishes sufficiently quickly on the approach to the boundary that the mutation parameters can be inferred without error as soon as the boundary is first hit. For mutation parameters greater than or equal to 1, the corresponding boundary point is no longer attainable, but the diffusion can get arbitrarily close to it as T → ∞, and a similar argument enables the mutation parameters again to be inferred (see [18, Remark 2.2] for a related argument applying to the squared Bessel process).

The rest of this article is organised as follows: in Section 2 we introduce the Wright-Fisher diffusion, proceed to describe some of its properties, and prove that the diffusion is both uniformly in the selection and mutation parameters ergodic, as well as uniformly locally asymptotically normal. Section 3 then focuses on the ML and Bayesian estimators for the selection parameter, proving that these estimators have a set of desirable properties in Theorem 3.1. Section 4 then concludes with a discussion. The proof of Theorem 2.2 can be found in Appendix A, whilst in Appendix B we extend the conclusions of Theorem 2.2 to two specific unbounded functions.

We start by giving a brief overview of the Wright-Fisher diffusion before proving that the diffusion is ergodic uniformly in the selection and mutation parameters (a term we define rigorously shortly), and subsequently use this to prove the uniform local asymptotic normality (LAN) of the family of measures associated to the solution of the SDE. Consider an infinite haploid population undergoing selection and mutation, where we are interested in two alleles A₁ and A₂.
Suppose that ϑ = (s, θ₁, θ₂) ∈ R × (0, ∞)² are the selection and mutation parameters respectively, where s describes the extent to which allele A₁ is favoured over A₂, alleles of type A₁ mutate to A₂ at rate proportional to θ₂, and those of type A₂ mutate to A₁ at rate proportional to θ₁. Let X_t denote the frequency of A₁ in the population at time t. Then the dynamics of X_t can be described by a diffusion process on [0, 1] solving

dX_t = μ(ϑ, X_t) dt + σ(X_t) dW_t := (1/2)(sX_t(1 − X_t) − θ₂X_t + θ₁(1 − X_t)) dt + √(X_t(1 − X_t)) dW_t,   (1)

with X₀ ∼ ν for some initial distribution ν, (W_t)_{t≥0} a standard Wiener process defined on a filtered probability space (Ω, F, (F_t)_{t≥0}, P), and [0, T] the observation interval. A strong solution to (1) exists by the Yamada-Watanabe condition (see Theorem 3.2, Chapter IV in [10]), but weak uniqueness suffices for our purposes. We denote by P^{(ϑ)}_ν the law induced on the space of continuous functions mapping [0, T] into [0, 1] (henceforth denoted C_T([0, 1])) by the solution to (1) with ϑ = (s, θ₁, θ₂) and X₀ ∼ ν (with dependence on T being implicit). Furthermore we denote taking expectation with respect to P^{(ϑ)}_ν by E^{(ϑ)}_ν.

We assume that θ₁, θ₂ > 0, for if at least one is 0 then the diffusion is absorbed in finite time and we are back in the regime studied by Watterson [23]. The boundary behaviour depends on whether the mutation parameters are less than, or greater than or equal to, 1, but in either case the diffusion is ergodic as long as θ₁, θ₂ > 0, with stationary density

f_ϑ(x) = (1/G_ϑ) e^{sx} x^{θ₁−1}(1 − x)^{θ₂−1},   x ∈ (0, 1),

where G_ϑ is the normalising constant

G_ϑ = ∫₀¹ e^{sx} x^{θ₁−1}(1 − x)^{θ₂−1} dx ≤ max{e^s, 1} B(θ₁, θ₂) < ∞,   with B(θ₁, θ₂) := ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} dx   (2)

the beta function. In what follows, we will always assume that ξ ∼ f_ϑ, and we denote taking expectation with respect to f_ϑ by E^{(ϑ)}, where the omission of the subscript ν will indicate that we start from stationarity.

It turns out that we need a slightly stronger notion of ergodicity, which we now define. The idea here is that we can extend pointwise ergodicity in the parameter ϑ = (s, θ₁, θ₂) to any compact set K in the parameter space R × (0, ∞)² by finding the slowest rate of convergence which works within that compact set. More rigorously, we introduce the following definition.

Definition 2.1.
The process X is said to be ergodic uniformly in the parameter ϑ if for all ε > 0,

lim_{T→∞} sup_{ϑ∈K} P^{(ϑ)}_ν[ |(1/T) ∫₀^T h(X_t) dt − E^{(ϑ)}[h(ξ)]| > ε ] = 0   (3)

holds for any compact K ⊂ R × (0, ∞)² and any bounded, measurable function h : [0, 1] → R, where ξ ∼ f_ϑ.

To the best of our knowledge, it has not been proven that the Wright-Fisher diffusion is ergodic uniformly in its parameters, which motivates the following theorem.

Theorem 2.2.
The Wright-Fisher diffusion with mutation and selection is uniformly ergodic in the selection and mutation parameters ϑ = (s, θ₁, θ₂) for any initial distribution ν.

We postpone the proof to Appendix A. For the remainder of this section we restrict our attention to the parameter space Θ ⊂ R × [1, ∞)², where Θ is open and bounded, for if either of the mutation parameters were less than 1 then the measures P^{(ϑ)}_ν within this region would be mutually singular with respect to one another and thus their Radon-Nikodym derivatives undefined. Restricting our attention to mutation parameters within the range [1, ∞) thus ensures that the family of measures {P^{(ϑ)}_ν, ϑ ∈ Θ} are equivalent, and we have that

dP^{(ϑ′)}_ν/dP^{(ϑ)}_ν(X^T) = (ν(ϑ′, X₀)/ν(ϑ, X₀)) exp{ ∫₀^T ((μ(ϑ′, X_t) − μ(ϑ, X_t))/σ(X_t)) dW_t − (1/2) ∫₀^T ((μ(ϑ′, X_t) − μ(ϑ, X_t))/σ(X_t))² dt }   (4)

with P^{(ϑ)}_ν-probability 1. Proofs of the above claims regarding the equivalence of the Wright-Fisher measures and the form of the Radon-Nikodym derivative can be found in [3], Lemma 7.2.2 and Section 10.1.1. We emphasise here that we have allowed the starting distribution ν to depend on the parameters, as is evident from the first ratio in (4). However, if there is no such dependence then the only difference to the above would be to replace this ratio by 1.

We end this section by introducing the concept of local asymptotic normality (LAN) and showing that the Wright-Fisher diffusion is uniformly LAN, which will be essential in the next section.

Definition 2.3 (Special case of Definition 2.1 in [12]).
The family of measures {P^{(ϑ)}_ν, ϑ ∈ Θ} is said to be locally asymptotically normal (LAN) at a point ϑ ∈ Θ at rate T^{−1/2} if for any u ∈ R³ the likelihood ratio function admits the representation

Z_{T,ϑ}(u) := dP^{(ϑ+u/√T)}_ν/dP^{(ϑ)}_ν(X^T) = exp{ ⟨u, ∆_T(ϑ, X^T)⟩ − (1/2)⟨I(ϑ)u, u⟩ + r_T(ϑ, u, X^T) },

where ⟨·, ·⟩ denotes the Euclidean inner product on R³, and ∆_T(ϑ, X^T) is a random variable such that

L_ϑ{∆_T(ϑ, X^T)} → N(0, I(ϑ)) in distribution,   (5)

with I(ϑ) the Fisher information matrix evaluated at ϑ, i.e.

I(ϑ) := E^{(ϑ)}[ μ̇(ϑ, ξ)μ̇(ϑ, ξ)^T / σ²(ξ) ],

where μ̇(ϑ, ξ)^T is the transpose of the vector of derivatives of μ(ϑ, x) with respect to ϑ. Moreover, the function r_T(ϑ, u, X^T) satisfies

lim_{T→∞} r_T(ϑ, u, X^T) = 0 in P^{(ϑ)}-probability.   (6)

The family of measures is said to be LAN on Θ if it is LAN at every point ϑ ∈ Θ, and further it is said to be uniformly LAN on Θ if both convergences (5) and (6) are uniform in ϑ ∈ K for every compact K ⊂ Θ.

Theorem 2.4.
The family of measures {P^{(ϑ)}_ν, ϑ ∈ Θ} induced by the weak solution to (1), with initial distribution ν being either a point mass at x₀ ∈ (0, 1) or the stationary density f_ϑ, is uniformly LAN on Θ, with the likelihood ratio function Z_{T,ϑ}(u) admitting the representation

Z_{T,ϑ}(u) = exp{ ⟨u, ∆_T(ϑ, X^T)⟩ − (1/2)⟨I(ϑ)u, u⟩ + r_T(ϑ, u, X^T) }

for u ∈ U_{T,ϑ} = {u : ϑ + u/√T ∈ Θ}, where

∆_T(ϑ, X^T) = (1/√T) ∫₀^T (μ̇(ϑ, X_t)/σ(X_t)) dW_t.

Proof.
From (4), we have that the log-likelihood ratio is given by

log Z_{T,ϑ}(u) = log( ν(ϑ + u/√T, X₀)/ν(ϑ, X₀) ) + ∫₀^T (⟨u/√T, μ̇(ϑ, X_t)⟩/σ(X_t)) dW_t − (1/2T) ∫₀^T (⟨u, μ̇(ϑ, X_t)⟩/σ(X_t))² dt
= log( ν(ϑ + u/√T, X₀)/ν(ϑ, X₀) ) + ⟨u, ∆_T(ϑ, X^T)⟩ − (1/2)⟨I(ϑ)u, u⟩ + (1/2)⟨I(ϑ)u, u⟩ − (1/2T) ∫₀^T (⟨u, μ̇(ϑ, X_t)⟩²/σ²(X_t)) dt,   (7)

where

I(ϑ) = (1/4) E^{(ϑ)} [ ξ(1−ξ)    1−ξ         −ξ
                       1−ξ       (1−ξ)/ξ     −1
                       −ξ        −1          ξ/(1−ξ) ].

Setting

r_T(ϑ, u, X^T) := log( ν(ϑ + u/√T, X₀)/ν(ϑ, X₀) ) + (1/2)⟨I(ϑ)u, u⟩ − (1/2T) ∫₀^T (⟨u, μ̇(ϑ, X_t)⟩²/σ²(X_t)) dt,

we show that (6) holds. The first term appears only if, of the two choices for ν, we have ν = f_ϑ, and in that case

log( ν(ϑ + u/√T, X₀)/ν(ϑ, X₀) ) = log( G_ϑ/G_{ϑ+u/√T} ) + (u₁/√T) X₀ + (u₂/√T) log X₀ + (u₃/√T) log(1 − X₀) → 0 as T → ∞.

Thus we deduce that (6) follows if we can prove that for any ε > 0,

lim_{T→∞} sup_{ϑ∈K} P^{(ϑ)}_ν[ |(1/T) ∫₀^T (⟨u, μ̇(ϑ, X_t)⟩²/σ²(X_t)) dt − ⟨I(ϑ)u, u⟩| > ε ] = 0.   (8)

Observe that the expression inside the probability in (8) is made up of six distinct differences between the time averages of the six distinct entries of the Fisher information matrix and their expectations with respect to the stationary density. Thus if we are able to show that each individual difference displays the same convergence as in (3), (8) follows. Now, as

(⟨u, μ̇(ϑ, x)⟩/σ(x))² = (1/4)( u₁√(x(1−x)) + u₂√((1−x)/x) − u₃√(x/(1−x)) )²
= (1/4)( u₁² x(1−x) − 2u₁u₃ x + 2u₁u₂ (1−x) − 2u₂u₃ + u₃² x/(1−x) + u₂² (1−x)/x ),

using (1), we can apply Theorem 2.2 to the first four terms directly. The remaining two differences involve the unbounded functions x(1−x)⁻¹ and (1−x)x⁻¹ and thus Theorem 2.2 cannot be applied; however, arguments similar to those used in the proof of this theorem (see Appendix B for the relevant details and proof) allow us to deduce that (3) is also true for these two functions and thus (6) holds.
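The entries of the Fisher information matrix I(ϑ) are one-dimensional integrals against the stationary density f_ϑ, so they can be evaluated by quadrature. The Python sketch below (ours; the parameter values are arbitrary illustrations and the trapezoidal rule on a truncated grid is a numerical convenience) builds the matrix and checks that it is a symmetric positive definite Gram matrix, as it must be:

```python
import numpy as np

# Hypothetical parameters with theta1, theta2 > 1, so that the unbounded
# entries (1-x)/x and x/(1-x) are integrable under f_theta.
s, th1, th2 = 1.0, 2.0, 3.0

def trapz(y, x):
    # plain trapezoidal rule (avoids depending on a specific NumPy version)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# stationary density f_theta(x) = e^{s x} x^{th1-1} (1-x)^{th2-1} / G_theta
x = np.linspace(1e-6, 1.0 - 1e-6, 200001)
w = np.exp(s * x) * x ** (th1 - 1.0) * (1.0 - x) ** (th2 - 1.0)
G = trapz(w, x)            # normalising constant G_theta
f = w / G

def E(h):
    # expectation of h(xi) under the stationary density
    return trapz(h * f, x)

# I(theta) = (1/4) E[v v^T / (xi(1-xi))] with v = (xi(1-xi), 1-xi, -xi)
I = 0.25 * np.array([
    [E(x * (1.0 - x)), E(1.0 - x),       E(-x)],
    [E(1.0 - x),       E((1.0 - x) / x), -1.0],
    [E(-x),            -1.0,             E(x / (1.0 - x))],
])
```

The (1, 1) entry (1/4)E[ξ(1−ξ)] is the scalar Fisher information for s used in Section 3; since ξ(1−ξ) ≤ 1/4, it never exceeds 1/16, which quantifies how slowly s is learned when the path spends time near a boundary.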
Finally, (5) follows from Proposition 1.20 in [12], which we can invoke in view of the above proved (8) and the fact that

sup_{ϑ∈K} √⟨I(ϑ)u, u⟩ < ∞.

We henceforth assume that the mutation parameters θ₁, θ₂ ≥ 1 are known, and focus on inference for the selection parameter s ∈ S ⊂ R with S open and bounded. As remarked earlier, the observational regime entertained here would enable one to infer the mutation parameters: on ϑ ∈ R × (0, 1)² this is immediate; the family of measures {P^{(ϑ)}_ν : ϑ ∈ R × (0, 1)²} are mutually singular. On ϑ ∈ R × [1, ∞)² the family of measures {P^{(ϑ)}_ν : ϑ ∈ R × [1, ∞)²} are now mutually absolutely continuous, with both boundary points unattainable. However, the process can get arbitrarily close to either boundary as T → ∞, and in this region the noise vanishes sufficiently quickly that again the corresponding mutation parameters can be inferred to any required precision. In the case when one mutation parameter is less than 1 and the other is greater than or equal to 1, similar arguments apply. Actually incorporating inference of the mutation parameters into the inferential setup below leads to some technical difficulties which we discuss in Section 4, so for simplicity we assume them to be known. Nonetheless all the notation and definitions introduced above carry through by replacing ϑ by s.

We start by defining the MLE ŝ_T of s in (1) as

ŝ_T = arg sup_{s∈S} dP^{(s)}_ν/dP^{(s₀)}_ν(X^T),

where s₀ ∈ S is arbitrary and its only role is to specify a reference measure whose exact value does not matter. We point out that now (4) simplifies to

dP^{(s′)}_ν/dP^{(s)}_ν(X^T) = (ν(s′, X₀)/ν(s, X₀)) exp{ ((s′ − s)/2) ∫₀^T √(X_t(1 − X_t)) dW_t − ((s′ − s)²/8) ∫₀^T X_t(1 − X_t) dt }.   (9)

In order to be able to define the Bayesian estimator, we introduce the class W_p of loss functions ℓ : S → R₊ for which the following stipulations are satisfied:

A1. ℓ(·) is even, non-negative, and continuous at 0 with ℓ(0) = 0 but not identically zero.

A2.
The sets {u ∈ S : ℓ(u) < c} are convex for all c > 0.

A3. ℓ(·) has a polynomial majorant, i.e. there exist strictly positive constants A and b such that for any u ∈ S, |ℓ(u)| ≤ A(1 + |u|^b).

A4. For some γ > 0 and all sufficiently large H, it holds that inf_{|u|>H} ℓ(u) − sup_{|u|≤H^γ} ℓ(u) ≥ 0.

As remarked above, we assume that S is an open and bounded subset of R, and we denote by p(·) the prior density on S, which we assume belongs to

P_c := { p(·) ∈ C(S̄, R₊) : p(u) ≤ A(1 + |u|^b) for all u ∈ S̄, ∫_{S̄} p(u) du = 1 },

where A and b are some strictly positive constants, and S̄ denotes the closure of S. With p(·) ∈ P_c and ℓ(·) ∈ W_p, we define the Bayesian estimator s̃_T of s in (1) as

s̃_T = arg min_{s̄_T} ∫_S E^{(s)}_ν[ ℓ(√T(s̄_T − s)) ] p(s) ds.

We introduce the last class of functions we will need, namely denote by G the class of functions g_T(·) satisfying the following two conditions:

1. For fixed T > 0, g_T(·) is a monotonically increasing function on [0, ∞), with g_T(y) → ∞ as y → ∞.

2. For any N > 0, lim_{T→∞, y→∞} y^N e^{−g_T(y)} = 0.

Observe that the likelihood ratio function is now given by

Z_{T,s}(u) := dP^{(s+u/√T)}_ν/dP^{(s)}_ν(X^T) = (ν(s + u/√T, X₀)/ν(s, X₀)) exp{ (u/(2√T)) ∫₀^T √(X_t(1 − X_t)) dW_t − (u²/(8T)) ∫₀^T X_t(1 − X_t) dt }   (10)

for u ∈ U_{T,s} := {u ∈ R : s + u/√T ∈ S}.

We now present the main result of this article, which states that the ML and Bayesian estimators for s have a set of desirable properties. We prove this by showing that the conditions of Theorems I.5.1, I.5.2, I.10.1, and I.10.2 in [9] are satisfied for the Wright-Fisher diffusion. A similar formulation of the result below for the general case of a continuously observed diffusion on R can be found in Theorems 2.8 and 2.13 in [12], where the author proves that the conditions necessary to invoke Theorems I.5.1, I.5.2, I.10.1, and I.10.2 in [9] hold for a certain class of diffusions. However, this class includes only those scalar diffusions for which the inverse of the diffusion coefficient has a polynomial majorant. This fails to hold in our case, forcing us to seek alternative ways to prove that the conditions of the above mentioned theorems hold.

Theorem 3.1.
Let ŝ_T and s̃_T respectively be the ML and Bayesian estimators for the selection parameter s ∈ S (for open bounded S ⊂ R) in the non-neutral Wright-Fisher diffusion (1), with initial distribution being either a point mass at a fixed x₀ ∈ (0, 1) or the stationary distribution. In what follows, let s̄_T refer to either of the two estimators. Then s̄_T is uniformly over compact sets K ⊂ S consistent, i.e. for any ε > 0,

lim_{T→∞} sup_{s∈K} P^{(s)}_ν[ |s̄_T − s| > ε ] = 0;

it converges in distribution to a normal random variable,

L_s{ √T(s̄_T − s) } → N(0, I(s)⁻¹)

uniformly in s ∈ K; and it displays moment convergence: for any p > 0,

lim_{T→∞} E^{(s)}_ν[ |√T(s̄_T − s)|^p ] = E[ |I(s)^{−1/2} ζ|^p ]

uniformly in s ∈ K, where ζ ∼ N(0, 1), for any compact set K ⊂ S. Furthermore, if the loss function ℓ(·) ∈ W_p, then s̄_T is also asymptotically efficient, i.e.

lim_{δ→0} lim_{T→∞} sup_{s:|s−s₀|<δ} E^{(s)}_ν[ ℓ(√T(s̄_T − s)) ] = E[ ℓ(I(s₀)^{−1/2} ζ) ]

holds for all s₀ ∈ S, where ζ ∼ N(0, 1).

As mentioned above, the proof relies on Theorems I.5.1, I.5.2, I.10.1, and I.10.2 in [9], which for reference we combine together into Theorem 3.2 below. Establishing that the conditions of Theorem 3.2 hold for the Wright-Fisher diffusion is non-trivial as the standard arguments found in [12] no longer hold, and this will thus be the main focus of this section. The conclusions of Theorems I.5.1 and I.5.2 guarantee the uniform over compact sets consistency of the MLE and Bayesian estimator respectively, whilst those of Theorems I.10.1 and I.10.2 provide the necessary conditions to deduce the uniform in s ∈ K asymptotic normality and convergence of moments for compact K ⊂ S, as well as asymptotic efficiency.
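Because the drift in (1) is linear in s, the ML estimator solves the score equation in closed form whenever the maximiser lies in the interior of S. A short calculation from the likelihood (9) (our sketch; this closed form is not displayed in the text) gives

ŝ_T = ( 2(X_T − X₀) − ∫₀^T (θ₁(1 − X_t) − θ₂X_t) dt ) / ∫₀^T X_t(1 − X_t) dt,

and the Python snippet below evaluates it on one simulated path (parameter values, the Euler-Maruyama scheme, and the Riemann-sum approximations of the integrals are all illustrative choices of ours):

```python
import numpy as np

S_TRUE, TH1, TH2 = 1.0, 2.0, 2.0   # hypothetical true parameters
T, DT = 400.0, 1e-3

rng = np.random.default_rng(7)
n = int(round(T / DT))
dw = rng.normal(0.0, np.sqrt(DT), size=n)   # Wiener increments

x = x0 = 0.5
int_mut = 0.0   # Riemann sum for int_0^T (th1(1-X) - th2 X) dt
int_den = 0.0   # Riemann sum for int_0^T X(1-X) dt
for i in range(n):
    int_mut += (TH1 * (1.0 - x) - TH2 * x) * DT
    int_den += x * (1.0 - x) * DT
    drift = 0.5 * (S_TRUE * x * (1.0 - x) - TH2 * x + TH1 * (1.0 - x))
    x += drift * DT + np.sqrt(max(x * (1.0 - x), 0.0)) * dw[i]
    x = min(max(x, 0.0), 1.0)   # numerical safeguard only

s_hat = (2.0 * (x - x0) - int_mut) / int_den   # closed-form MLE above
```

By Theorem 3.1, √T(ŝ_T − s) is asymptotically N(0, I(s)⁻¹) with I(s) = (1/4)E^{(s)}[ξ(1−ξ)], so for these values the standard deviation of ŝ_T is roughly √(I(s)⁻¹/T) ≈ 0.2; repeating the experiment over many seeds would approximate the Gaussian limit.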
Theorem 3.2 (Ibragimov-Has’minskii). Let ŝ_T, s̃_T respectively be the ML and Bayesian estimators with prior density p(·) ∈ P_c, defined in terms of a loss function ℓ(·) ∈ W_p, for the parameter s ∈ S, for open bounded S ⊂ R, in (1). Suppose further that the following conditions are satisfied by the likelihood ratio function Z_{T,s}(u) as defined in (10):

1. For all compact K ⊂ S, we can find constants a and B, and functions g_T(·) ∈ G (all of which depend on K), such that the following two conditions hold:

• For all R > 0, all u, v ∈ U_{T,s} satisfying |u| < R, |v| < R, and some m ≥ q > dim(S),

sup_{s∈K} E^{(s)}_ν[ |Z_{T,s}(u)^{1/m} − Z_{T,s}(v)^{1/m}|^m ] ≤ B(1 + R^a)|u − v|^q.   (11)

• For all u ∈ U_{T,s},

sup_{s∈K} E^{(s)}_ν[ Z_{T,s}(u)^{1/2} ] ≤ e^{−g_T(|u|)}.
2. The random functions Z_{T,s}(u) have marginal distributions which converge uniformly in s ∈ K as T → ∞ to those of the random function Z_s(u) ∈ C₀(R), where C₀(R) denotes the space of continuous functions on R vanishing at infinity, equipped with the supremum norm and the Borel σ-algebra.

3. The limit function Z_s(u) attains its maximum at a unique point û(s) with probability 1, and the random function

ψ(v) = ∫_R ℓ(v − u) ( Z_s(u) / ∫_R Z_s(y) dy ) du

attains its minimum value at a unique point ũ(s) with probability 1.

Then we have that the ML and Bayesian estimators are uniformly in s ∈ K consistent, i.e. for any ε > 0,

lim_{T→∞} sup_{s∈K} P^{(s)}_ν[ |s̄_T − s| > ε ] = 0;

the distributions of the random variables ū_T = √T(s̄_T − s) converge uniformly in s ∈ K to the distribution of ū (where ū = û(s) for the MLE and ū = ũ(s) for the Bayesian estimator); and for any loss function ℓ ∈ W_p, uniformly in s ∈ K,

lim_{T→∞} E^{(s)}_ν[ ℓ(√T(s̄_T − s)) ] = E^{(s)}_ν[ ℓ(ū) ].   (12)

In fact, for the Bayesian estimator the requirements for inequality (11) can be weakened, as it suffices to show that (11) holds for m = 2 and any q > 0.

Proof of Theorem 3.1.
Our aim will be to prove that Conditions 1, 2, and 3 in Theorem 3.2 hold for the Wright-Fisher diffusion, for then the ML and Bayesian estimators are uniformly on compact sets consistent. Below, Condition 1 is shown to hold in Propositions 3.4 and 3.5; Condition 2 is shown in Corollary 3.3; and Condition 3 is shown in Proposition 3.6.

It remains to show how uniform in s ∈ K asymptotic normality and convergence of moments, as well as asymptotic efficiency (under the right choice of loss function), follow. Given Conditions 1, 2, and 3 of Theorem 3.2, uniform in s ∈ K asymptotic normality follows immediately from Proposition 3.6; ū = I(s)⁻¹∆(s) with ∆(s) ∼ N(0, I(s)), and ū_T converges uniformly in distribution to ū. Moreover, as stated in Remark I.5.1 in [9], the Ibragimov-Has’minskii conditions also give us a bound on the tails of the likelihood ratio, which can be translated into bounds on the tails of |û_T|^p for any p > 0. Similar bounds on |ũ_T|^p hold for the Bayesian estimator by Theorem I.5.7 in [9], and thus we have that the random variables |ū_T|^p are uniformly integrable for any p > 0, uniformly in s ∈ K for any compact K ⊂ S. Uniform convergence of the moments of the estimators follows from this and the uniform convergence in distribution (by applying a truncation argument).

For loss functions satisfying ℓ(·) ∈ W_p, observe that the uniform convergence in (12) allows us to deduce that

lim_{T→∞} sup_{s:|s−s₀|<δ} E^{(s)}_ν[ ℓ(√T(s̄_T − s)) ] = sup_{s:|s−s₀|<δ} E[ ℓ(I(s)^{−1/2}ζ) ]

for ζ ∼ N(0, 1). Since I(s) is continuous in s, we have that

lim_{δ→0} sup_{s:|s−s₀|<δ} E[ ℓ(I(s)^{−1/2}ζ) ] = E[ ℓ(I(s₀)^{−1/2}ζ) ],

giving asymptotic efficiency.

We proceed to show that Conditions 1, 2, and 3 in Theorem 3.2 hold for the Wright-Fisher diffusion. Theorem 2.4 gives us that the Wright-Fisher diffusion is uniformly LAN, which immediately gives the required marginal convergence of the Z_{T,s}(u) in Condition 2.

Corollary 3.3.
The random functions Z_{T,s}(u) given by

Z_{T,s}(u) = exp{ (u/(2√T)) ∫₀^T √(X_t(1 − X_t)) dW_t − (u²/8) E^{(s)}[ξ(1 − ξ)] + r_T(s, u, X^T) } =: exp{ u∆_T(s) − (u²/2) I(s) + r_T(s, u, X^T) }

have marginal distributions which converge uniformly in s ∈ K as T → ∞ to those of the random function Z_s(u) ∈ C₀(R) given by

Z_s(u) := exp{ u∆(s) − (u²/2) I(s) },

where

∆(s) := lim_{T→∞} (1/(2√T)) ∫₀^T √(X_t(1 − X_t)) dW_t ∼ N(0, I(s)).

Proof.
The result follows immediately from the uniform LAN of the family of measures as shown in Theorem 2.4; see for illustration the display just before Lemma 2.10 in [12]. It is clear that Z_s(u) vanishes at infinity and thus is an element of C₀(R).

The next two results allow us to control the Hellinger distance of the likelihood ratio function as required by Condition 1 in Theorem 3.2.

Proposition 3.4.
For any K ⊂ S compact, we can find a constant C such that for any R > 0 and any u, v ∈ U_{T,s} satisfying |u| < R, |v| < R, the following holds:

sup_{s∈K} E^{(s)}_ν[ |Z_{T,s}(u)^{1/2} − Z_{T,s}(v)^{1/2}|² ] ≤ C(1 + R²)|u − v|².

Proof. In what follows we denote by C_i, for i ∈ N, constants which do not depend on u, v, s, or T. Observe that for any s′, s* ∈ S it holds that

E^{(s′)}_ν[ ∫₀^T |(μ(s′, X_t) − μ(s*, X_t))/σ(X_t)|^m dt ] = E^{(s′)}_ν[ ∫₀^T |((s′ − s*)/2) √(X_t(1 − X_t))|^m dt ] ≤ (|s′ − s*|/4)^m T < ∞,

and so we can use Lemma 1.13 and Remark 1.14 from [12] to split the expectation in (11) into three terms:

E^{(s)}_ν[ |Z_{T,s}(u)^{1/2} − Z_{T,s}(v)^{1/2}|² ] ≤ C₁ ∫₀¹ |√f_{s_u}(x) − √f_{s_v}(x)|² dx + C₂ ∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))² ] dt + C₃ T ∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))⁴ ] dt,   (13)

where we denote s_u = s + u/√T and s_v = s + v/√T, and remark that the above holds for ν = f_s, whilst if ν = δ_{x₀} then the first term on the RHS of (13) vanishes. Observe that

∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))² ] dt = ((u − v)²/(4T)) ∫₀^T E^{(s_v)}_ν[ X_t(1 − X_t) ] dt ≤ (u − v)²/16.

Therefore

C₂ ∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))² ] dt ≤ C₄|u − v|².   (14)

A similar calculation can be performed for the third term in (13) to get

C₃ T ∫₀^T E^{(s_v)}_ν[ ((μ(s_u, X_t) − μ(s_v, X_t))/σ(X_t))⁴ ] dt ≤ C₅|u − v|⁴.   (15)

Dealing with the first term in (13) is slightly more involved.
To this end, observe that

∫₀¹ |√f_{s_u}(x) − √f_{s_v}(x)|² dx = ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{sx} | G_{s_u}^{−1/2} e^{ux/(2√T)} − G_{s_v}^{−1/2} e^{vx/(2√T)} |² dx.   (16)

Now we have that

C₆ min{e^s, 1} ≤ G_{s_u} := ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{(s + u/√T)x} dx ≤ C₇ max{e^s, 1},   (17)

where C₆ = B(θ₁, θ₂)e^{−diam(S)} and C₇ = B(θ₁, θ₂)e^{diam(S)} are non-zero, positive, and independent of s and T, since we constrain u, v ∈ U_{T,s}. This allows us to deduce that G ↦ 1/√G is Lipschitz on [C₆ inf_{s∈K} min{e^s, 1}, C₇ sup_{s∈K} max{e^s, 1}] with some constant C₈ > 0, i.e.

| G_{s_u}^{−1/2} − G_{s_v}^{−1/2} | ≤ C₈ | G_{s_u} − G_{s_v} | = C₈ ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{sx} | e^{ux/√T} − e^{vx/√T} | dx
≤ C₈C₉ ∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{sx} | ux/√T − vx/√T | dx = (C₈C₉/√T) |u − v| ∫₀¹ x^{θ₁}(1 − x)^{θ₂−1} e^{sx} dx ≤ (C₁₀/√T) max{e^s, 1} |u − v|,

where in the second inequality we have made use of the fact that e^z is Lipschitz in z on [−diam(S), diam(S)] with some constant C₉ > 0. Thus we deduce that

| G_{s_u}^{−1/2} e^{ux/(2√T)} − G_{s_v}^{−1/2} e^{vx/(2√T)} |² ≤ | G_{s_u}^{−1/2} ( e^{ux/(2√T)} − e^{vx/(2√T)} ) |² + | e^{vx/(2√T)} ( G_{s_u}^{−1/2} − G_{s_v}^{−1/2} ) |² + 2 e^{vx/(2√T)} G_{s_u}^{−1/2} | e^{ux/(2√T)} − e^{vx/(2√T)} | | G_{s_u}^{−1/2} − G_{s_v}^{−1/2} |
≤ (C₉² x²)/(4T C₆ min{e^s, 1}) |u − v|² + e^{diam(S)x} (C₁₀²/T) max{e^s, 1}² |u − v|² + e^{diam(S)x} (C₉C₁₀ x)/(T √C₆) (max{e^s, 1}/min{e^{s/2}, 1}) |u − v|².   (18)

Putting (18) into (16) gives us that

∫₀¹ x^{θ₁−1}(1 − x)^{θ₂−1} e^{sx} | G_{s_u}^{−1/2} e^{ux/(2√T)} − G_{s_v}^{−1/2} e^{vx/(2√T)} |² dx ≤ (C_s/T) |u − v|²,   (19)

where

C_s := C₁₁ e^{|s|} + C₁₂ max{e^s, 1}² + C₁₃ max{e^s, 1}/min{e^{s/2}, 1}.

Inserting equations (14), (15), and (19) into (13) allows us to deduce that

sup_{s∈K} E^{(s)}_ν[ |Z_{T,s}(u)^{1/2} − Z_{T,s}(v)^{1/2}|² ] ≤ sup_{s∈K} { (C_s/T + C₄)|u − v|² + C₅|u − v|⁴ } ≤ C|u − v|²(1 + R²),

where we make use of the fact that |u|, |v| < R, as well as the fact that C_s is continuous in s over any compact set K ⊂ S, and C₄, C₅ are independent of s.

Proposition 3.5.
For $K\subset S$ compact, there exists a function $g_T(\cdot)\in\mathcal{G}$ such that for any $u\in\mathbb{U}_{T,s}$ we have that
\[
\sup_{s\in K}\mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\right] \le e^{-g_T(|u|)}. \tag{20}
\]

Proof. Assume for now that for any $M\ge 2$,
\[
\mathbb{P}^{(s)}_{\nu}\left[Z_{T,s}(u) > \exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right] \le \frac{C_{s,M}}{|u|^M} \tag{21}
\]
for some constant $C_{s,M}>0$ depending on $s$ and $M$. We show that if (21) holds, then (20) follows. Indeed,
\[
\mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\right]
= \mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\mathbf{1}\left\{Z_{T,s}(u)\le\exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right\}\right]
+ \mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\mathbf{1}\left\{Z_{T,s}(u)>\exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right\}\right]
\]
\[
\le \exp\left\{-\tfrac{1}{32}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}
+ \mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}(u)\right]^{1/2}\mathbb{P}^{(s)}_{\nu}\left[Z_{T,s}(u)>\exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right]^{1/2}
\le \exp\left\{-\tfrac{1}{32}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\} + \frac{C_{s,M}^{1/2}}{|u|^{M/2}},
\]
where in the first inequality we have made use of Cauchy-Schwarz, and for the second inequality we have used (21). Therefore,
\[
\sup_{s\in K}\mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\right]
\le \sup_{s\in K}\left\{\exp\left\{-\tfrac{1}{32}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\} + \frac{C_{s,M}^{1/2}}{|u|^{M/2}}\right\}
= \exp\left\{-\tfrac{1}{32}\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\} + \sup_{s\in K}\frac{C_{s,M}^{1/2}}{|u|^{M/2}}
=: \exp\{-g_T(|u|)\}.
\]
It remains to ensure that $g_T(\cdot)\in\mathcal{G}$, that $\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]\ge\kappa_K$ for some $\kappa_K>0$, and that for any $M\ge 2$, $\sup_{s\in K}C_{s,M}<\infty$. Observe that
\[
\min\left\{\inf_{s\in K}e^{s},1\right\}B(\theta_1,\theta_2) \le G_s \le \max\left\{\sup_{s\in K}e^{s},1\right\}B(\theta_1,\theta_2).
\]
Thus
\[
\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]
= \inf_{s\in K}\left\{\frac{1}{G_s}\int_0^1 e^{s\xi}\xi^{\theta_1}(1-\xi)^{\theta_2}d\xi\right\}
\ge \frac{\inf_{s\in K}\left\{\int_0^1 e^{s\xi}\xi^{\theta_1}(1-\xi)^{\theta_2}d\xi\right\}}{\max\{\sup_{s\in K}e^{s},1\}B(\theta_1,\theta_2)}
\ge \frac{\min\{\inf_{s\in K}e^{s},1\}B(\theta_1+1,\theta_2+1)}{\max\{\sup_{s\in K}e^{s},1\}B(\theta_1,\theta_2)}
=: \kappa_K,
\]
and $\kappa_K>0$ because $K$ is bounded, and thus both $\sup_{s\in K}e^{s}$ and $\inf_{s\in K}e^{s}$ are finite and non-zero. We show below that $\sup_{s\in K}C_{s,M}$ is finite for all $M\ge 2$; first we verify that $g_T(|u|)$ as defined above is in the class of functions $\mathcal{G}$. To this end, observe that
\[
g_T(|u|) = -\log\left(\exp\left\{-\tfrac{1}{32}\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\} + \sup_{s\in K}\frac{C_{s,M}^{1/2}}{|u|^{M/2}}\right).
\]
Indeed, for a fixed $T>0$, $g_T(|u|)\to\infty$ as $|u|\to\infty$, because $\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]>0$, and furthermore given any fixed $N$, we can choose $M$ large enough (note the way we phrased (21) allows us to choose our $M$ arbitrarily large, say $M>2N$) such that
\[
\lim_{\substack{T\to\infty\\ y\to\infty}} y^N e^{-g_T(y)}
= \lim_{\substack{T\to\infty\\ y\to\infty}} y^N\left(\exp\left\{-\tfrac{1}{32}\inf_{s\in K}\mathbb{E}^{(s)}[\xi(1-\xi)]|y|^2\right\} + \sup_{s\in K}\frac{C_{s,M}^{1/2}}{|y|^{M/2}}\right) = 0,
\]
where in fact $g_T(|u|)$ is independent of $T$. Thus we have proved that if (21) holds, then
\[
\sup_{s\in K}\mathbb{E}^{(s)}_{\nu}\left[Z_{T,s}^{1/2}(u)\right] \le e^{-g_T(|u|)}, \qquad g_T(\cdot)\in\mathcal{G}.
\]
To show that (21) holds, we make use of Chebyshev's inequality as well as Theorem 3.2 in [13]. Indeed, observe that if $\nu = f_s$, then
\[
\mathbb{P}^{(s)}_{\nu}\left[Z_{T,s}(u) \ge \exp\left\{-\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\right]
= \mathbb{P}^{(s)}_{\nu}\Bigg[\frac{G_s}{G_{s+u/\sqrt{T}}}\exp\bigg\{\frac{uX_0}{\sqrt{T}} + \frac{u}{2\sqrt{T}}\int_0^T\sqrt{X_t(1-X_t)}\,dW_t
\]
\[
\qquad - \frac{|u|^2}{8}\left(\frac{1}{T}\int_0^T X_t(1-X_t)dt - \mathbb{E}^{(s)}[\xi(1-\xi)]\right)\bigg\} > \exp\left\{\tfrac{1}{16}\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2\right\}\Bigg]
\]
\[
\le \mathbb{P}^{(s)}_{\nu}\left[\left|\log\left(\frac{G_{s+u/\sqrt{T}}}{G_s}\right) + \frac{uX_0}{\sqrt{T}}\right| > \frac{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}{48}\right]
+ \mathbb{P}^{(s)}_{\nu}\left[\left|\frac{u}{2\sqrt{T}}\int_0^T\sqrt{X_t(1-X_t)}\,dW_t\right| > \frac{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}{48}\right]
\]
\[
+ \mathbb{P}^{(s)}_{\nu}\left[\frac{|u|^2}{8}\left|\frac{1}{T}\int_0^T X_t(1-X_t)dt - \mathbb{E}^{(s)}[\xi(1-\xi)]\right| > \frac{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}{48}\right]
=: A_1 + A_2 + A_3.
\]
If $\nu = \delta_x$, the only difference to the above would be the fact that $A_1$ vanishes and the 48 on the RHS of the bounds inside $A_2$ and $A_3$ would change to 32. For $A_1$, we use Chebyshev's inequality:
\[
A_1 \le \left(\frac{48}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}\right)^{M}\mathbb{E}^{(s)}_{\nu}\left[\left|\log\left(\frac{G_{s+u/\sqrt{T}}}{G_s}\right) + \frac{uX_0}{\sqrt{T}}\right|^{M}\right].
\]
But
\[
\left|\log\left(\frac{G_{s+u/\sqrt{T}}}{G_s}\right)\right|
= \left|\log\left(\frac{\int_0^1 x^{\theta_1-1}(1-x)^{\theta_2-1}e^{(s+u/\sqrt{T})x}dx}{\int_0^1 x^{\theta_1-1}(1-x)^{\theta_2-1}e^{sx}dx}\right)\right| \le \frac{|u|}{\sqrt{T}},
\]
so, since $0\le X_0\le 1$, we have
\[
A_1 \le \left(\frac{48}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}\right)^{M}\left(\frac{2|u|}{\sqrt{T}}\right)^{M}
= \left(\frac{96}{\mathbb{E}^{(s)}[\xi(1-\xi)]\sqrt{T}|u|}\right)^{M}
\le \left(\frac{96\,d_s}{\mathbb{E}^{(s)}[\xi(1-\xi)]}\right)^{M}\frac{1}{|u|^{2M}}
=: \frac{C^{(1)}_{s,M}}{|u|^{2M}},
\]
which is of the form required in (21) since $M$ may be taken arbitrarily large. In the last inequality we made use of the fact that $u\in\mathbb{U}_{T,s}$, and thus $|u|\le d_s\sqrt{T}$, where we define $d_s := \sup_{w\in\partial S}|s-w|$ (which is strictly positive and bounded as $S$ is open and bounded). To see that $\sup_{s\in K}C^{(1)}_{s,M}$ is bounded, observe that
\[
\sup_{s\in K}C^{(1)}_{s,M} = \sup_{s\in K}\left(\frac{96\,d_s}{\mathbb{E}^{(s)}[\xi(1-\xi)]}\right)^{M}
\le \left(96\,\frac{B(\theta_1,\theta_2)}{B(\theta_1+1,\theta_2+1)}\,\sup_{s\in K}d_s\,\frac{\max\{\sup_{s\in K}e^{s},1\}}{\min\{\inf_{s\in K}e^{s},1\}}\right)^{M},
\]
which is clearly finite because $K$ is bounded.

For $A_2$ we use a similar argument, but now use the fact that we have a stochastic integral:
\[
A_2 \le \left(\frac{48}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}\right)^{M}\mathbb{E}^{(s)}_{\nu}\left[\left|\frac{u}{2\sqrt{T}}\int_0^T\sqrt{X_t(1-X_t)}\,dW_t\right|^{M}\right]
\]
\[
\le \left(\frac{48}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|^2}\right)^{M}\left(\frac{|u|}{2\sqrt{T}}\right)^{M}\left(\frac{M(M-1)}{2}\right)^{M/2}T^{M/2-1}\mathbb{E}^{(s)}_{\nu}\left[\int_0^T|X_t(1-X_t)|^{M/2}dt\right]
\le \left(\frac{12}{\mathbb{E}^{(s)}[\xi(1-\xi)]}\right)^{M}\left(\frac{M(M-1)}{2}\right)^{M/2}\frac{1}{|u|^{M}}
=: \frac{C^{(2)}_{s,M}}{|u|^{M}},
\]
where the first line uses Chebyshev's inequality, the second inequality uses Lemma 1.1 (equation (1.3)) in [12], and the last uses $\|x(1-x)\|_\infty = 1/4$. That $\sup_{s\in K}C^{(2)}_{s,M}$ is finite follows from arguments similar to those used for the respective term in $A_1$.

For $A_3$ we make use of Theorem 3.2 in [13], which gives us that for $M\ge 2$,
\[
\mathbb{P}^{(s)}_{\nu}\left[\left|\frac{1}{T}\int_0^T X_t(1-X_t)dt - \mathbb{E}^{(s)}[\xi(1-\xi)]\right| \ge \frac{\mathbb{E}^{(s)}[\xi(1-\xi)]}{6}\right]
\le K(s,X,M)\,\frac{\|x(1-x)\|_\infty^{M}}{\left(\frac{\mathbb{E}^{(s)}[\xi(1-\xi)]}{6}\sqrt{T}\right)^{M}}. \tag{22}
\]
For the RHS of (22), we have that
\[
K(s,X,M)\,\frac{\|x(1-x)\|_\infty^{M}}{\left(\frac{\mathbb{E}^{(s)}[\xi(1-\xi)]}{6}\sqrt{T}\right)^{M}}
\le K(s,X,M)\left(\frac{6\|x(1-x)\|_\infty\,d_s}{\mathbb{E}^{(s)}[\xi(1-\xi)]|u|}\right)^{M}
=: \frac{C^{(3)}_{s,M}}{|u|^{M}},
\]
where $K(s,X,M)$ is a function that depends on $M$ and on the moments of the hitting times of $X$. Finally we deduce that $\sup_{s\in K}C^{(3)}_{s,M}$ is finite by observing that
\[
\sup_{s\in K}C^{(3)}_{s,M}
= \sup_{s\in K}\left\{K(s,X,M)\left(\frac{6\|x(1-x)\|_\infty\,d_s}{\mathbb{E}^{(s)}[\xi(1-\xi)]}\right)^{M}\right\}
\le \sup_{s\in K}K(s,X,M)\left(\frac{3}{2}\,\frac{B(\theta_1,\theta_2)}{B(\theta_1+1,\theta_2+1)}\,\sup_{s\in K}d_s\,\frac{\max\{\sup_{s\in K}e^{s},1\}}{\min\{\inf_{s\in K}e^{s},1\}}\right)^{M},
\]
which is finite since $\|x(1-x)\|_\infty = 1/4$, $K$ is compact, and $K(s,X,M)$ is bounded by a function which is continuous in $s$ (see Appendix A for the corresponding details).

Finally, we present the result which guarantees that Condition 3 in Theorem 3.2 holds, and thus that the Ibragimov-Has'minskii conditions hold for the Wright-Fisher diffusion.

Proposition 3.6. The random functions $Z_s(u)$ and
\[
\psi(v) := \int_{\mathbb{R}}\ell(v-u)\,\frac{Z_s(u)}{\int_{\mathbb{R}}Z_s(y)dy}\,du
\]
attain their maximum and minimum respectively at the unique point $\bar{u} = \bar{u}_s = I(s)^{-1}\Delta(s)$ with probability 1.

Proof. The first assertion follows immediately from Corollary 3.3, whilst for the second we direct the interested reader to Theorem III.2.1 in [9], which relies on two results: Anderson's Lemma (Lemma II.10.1 in [9]), and Lemma II.10.2 in [9].
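To illustrate the estimator analysed in this section, the following sketch simulates a Wright-Fisher path by Euler-Maruyama and evaluates a continuous-path ML estimator for $s$ in closed form. It assumes the common parametrisation $dX_t = \frac{1}{2}(\theta_1(1-X_t) - \theta_2X_t + sX_tig(1-X_t))dt + \sqrt{X_t(1-X_t)}\,dW_t$ (the precise form of (1) may differ), under which setting the score of the Girsanov log-likelihood to zero gives the displayed closed form for $\hat s_T$; this is an illustrative sketch, not the authors' code.

```python
import numpy as np

def simulate_wf(s, theta1, theta2, x0, T, dt, rng):
    """Euler-Maruyama for the assumed parametrisation
    dX = 0.5*(theta1*(1-X) - theta2*X + s*X*(1-X)) dt + sqrt(X*(1-X)) dW,
    clipped to stay inside (0, 1)."""
    n = int(T / dt)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        xk = x[k]
        drift = 0.5 * (theta1 * (1 - xk) - theta2 * xk + s * xk * (1 - xk))
        diff = np.sqrt(max(xk * (1 - xk), 0.0))
        x[k + 1] = np.clip(xk + drift * dt + diff * np.sqrt(dt) * rng.standard_normal(),
                           1e-6, 1 - 1e-6)
    return x

def mle_selection(x, theta1, theta2, dt):
    """Continuous-path MLE for s under the assumed parametrisation: the score
    equation gives
    s_hat = (2*(X_T - X_0) - int_0^T (theta1*(1-X_t) - theta2*X_t) dt)
            / int_0^T X_t*(1-X_t) dt."""
    num = 2.0 * (x[-1] - x[0]) - np.sum(theta1 * (1 - x[:-1]) - theta2 * x[:-1]) * dt
    den = np.sum(x[:-1] * (1 - x[:-1])) * dt
    return num / den

rng = np.random.default_rng(1)
s_true, theta1, theta2 = 2.0, 2.0, 2.0
path = simulate_wf(s_true, theta1, theta2, 0.5, T=500.0, dt=0.01, rng=rng)
s_hat = mle_selection(path, theta1, theta2, dt=0.01)
print(s_hat)  # close to s_true for large T, by the consistency results above
```

With $\theta_1 = \theta_2 = 2$ the path stays well away from the boundaries, so the Euler scheme with clipping is a reasonable stand-in for the exact diffusion.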
In this article we have shown in Theorem 2.2 that the Wright-Fisher diffusion is ergodic uniformly in the parameter $\vartheta = (s,\theta_1,\theta_2)\in\Theta\subset\mathbb{R}\times(0,\infty)^2$, extending the well-known pointwise in $\vartheta$ ergodicity of the Wright-Fisher diffusion, over any compact set $K\subset\mathbb{R}\times(0,\infty)^2$ and for bounded functions. We have also proved in Theorem 2.4 that the family of measures $\{\mathbb{P}^{(\vartheta)}_{\nu} : \vartheta\in\Theta\}$ induced by the solution to the SDE (1) is uniformly LAN when $\Theta\subset\mathbb{R}\times[1,\infty)^2$, where the extra restriction on the mutation rates ensures that the likelihood ratio function is well-defined. In Section 3 we then considered inference for the selection parameter $s$ when the diffusion is observed continuously through time and the mutation rates are known. Under these assumptions we proved that the ML and Bayesian estimators for $s\in S$ ($S$ an open bounded subset of $\mathbb{R}$) in the non-neutral Wright-Fisher diffusion, started from either a fixed point $x\in(0,1)$ or from stationarity, are uniformly over compact sets consistent and display uniform in $s\in K$ asymptotic normality and convergence of moments, for any compact $K\subset S$. Furthermore, for the right choice of loss function we also have asymptotic efficiency of the two estimators. The uniformity in these results is particularly useful as it guarantees a lower bound on the rate at which the inferential parameters are learned. Such properties have been shown to hold for a wide class of SDEs in [12] by making use of the general theorems of Ibragimov and Has'minskii (Theorems I.5.1, I.5.2, I.10.1 and I.10.2 in [9]); however, they do not hold automatically for the Wright-Fisher diffusion, as those results require the diffusion coefficient to be non-zero everywhere and to have an inverse with a polynomial majorant. Both conditions fail for (1), forcing us to find an alternative way of proving that the Ibragimov-Has'minskii conditions still hold. We emphasise here that the aim of this study is to investigate the properties of the estimators in the "ideal" continuous observation scenario when the whole path is known to the observer.

Assuming that the mutation rates are known is a limitation of this study; however, we emphasise that in the regime considered here these can be inferred directly from the path once the diffusion comes arbitrarily close to either boundary (and for mutation parameters less than 1 this happens in finite time almost surely). Nonetheless, extending this work to include mutation parameters greater than 1 as part of the inferential setup would be of great interest. This proves to be rather challenging, as the likelihood ratio function then involves expressions of the form $(1-x)x^{-1}$ and $x(1-x)^{-1}$ (as witnessed in Theorem 2.4), which require much more delicate arguments in order to establish the same conclusions as in Theorem 3.1. The main issue here is in showing that Condition 1 in the Ibragimov-Has'minskii conditions holds, for the other two conditions follow from Theorem 2.4 and Proposition 3.6. In particular, the fact that the functions $(1-x)x^{-1}$ and $x(1-x)^{-1}$ are unbounded in $x$ and have only finitely many moments with respect to the stationary distribution means that the strategies used in the proofs of Propositions 3.4 and 3.5 cannot be used.

Recent advances in genome sequencing technology have led to an increase in the availability and analysis of genetic time series data. Inference for selection has traditionally been conducted using techniques for, and data coming from, a single point in time. However, having a time series of data points allows one to track changes in allele frequencies over time, and thereby to better understand and infer the presence and effect of selection. Several inferential techniques have already been developed for such a setting (see for instance [1, 14, 19, 8, 7], as well as [4] for a review of the subject), and although these techniques provide ostensibly reasonable estimation, there are not always theoretical guarantees on the statistical properties of the estimators being used. The results presented in this paper offer a baseline in this regard, and prove that in the absence of observational error one is guaranteed that the ML and Bayesian estimators are uniformly over compact sets consistent, asymptotically normal, and display moment convergence, besides being asymptotically efficient for the right choice of loss function.

This work was supported by the EPSRC as well as the MASDOC DTC (under grant EP/HO23364/1), by The Alan Turing Institute under the EPSRC grant EP/N510129/1, and by the EPSRC under grant EP/R044732/1.
A Proof of Theorem 2.2
Proof.
We show uniform in $\vartheta = (s,\theta_1,\theta_2)$ ergodicity for the Wright-Fisher diffusion by making use of Theorem 3.2 in [13], which allows us to bound the LHS of (3) in terms of the moments of the hitting times of the process. We point out that this result requires the diffusion coefficient to be positive everywhere, and the drift and diffusion coefficients to be locally Lipschitz and to satisfy a linear growth condition. These conditions fail for the Wright-Fisher diffusion because of its diffusion coefficient; however, they are used only to guarantee the existence of a unique strong solution to the SDE in Theorem 3.2, which we already have by other means. None of these requirements on the drift and diffusion coefficients are used in the proof of Theorem 3.2 in [13] when $p\in\{1,2,\dots\}$, which allows us to employ this theorem for the Wright-Fisher diffusion for such $p$. All that remains to prove then is that these moments can be bounded by a function continuous in $\vartheta$, for then the supremum over any compact set $K\subset\mathbb{R}\times(0,\infty)^2$ is finite and (3) holds. To this end, we introduce some notation from [13], namely let $a,b\in(0,1)$ be arbitrary fixed points such that $a<b$. Define $S_0 = 0$, $R_0 = 0$, and
\[
S_1 := \inf\{t\ge 0 : X_t = b\}, \qquad R_1 := \inf\{t\ge S_1 : X_t = a\},
\]
\[
S_{n+1} := \inf\{t\ge R_n : X_t = b\}, \qquad R_{n+1} := \inf\{t\ge S_{n+1} : X_t = a\}
\]
for $n\in\mathbb{N}$. By the strong Markov property, $(R_k - R_{k-1})_{k\in\mathbb{N}}$ is an i.i.d. sequence whose law under $\mathbb{P}^{(\vartheta)}_{\nu}$ is equal to the law of $R_1$ under $\mathbb{P}^{(\vartheta)}_{a}$, where $\mathbb{P}^{(\vartheta)}_{\nu}$ and $\mathbb{E}^{(\vartheta)}_{\nu}$ are as defined in Section 2, and $\mathbb{P}^{(\vartheta)}_{a}$ denotes the law of the process started from $a$. Related to the process $(R_n)_{n\in\mathbb{N}}$ we have the process $(N_t)_{t\ge 0}$, which we define as $N_t := \sup\{n : R_n\le t\}$ and for which we observe that $\{N_t\ge n\} = \{R_n\le t\}$. We also denote by $T_b := \inf\{t\ge 0 : X_t = b\}$ the first hitting time of $b$. Furthermore, let $\ell_\vartheta := \mathbb{E}^{(\vartheta)}[N_1] = \mathbb{E}^{(\vartheta)}_{a}[R_1]^{-1}$ (see Lemma 2.7 in [13]), and $\bar{\eta}_1 := -(R_2 - R_1 - \ell_\vartheta^{-1})$. Then Theorem 3.2 in [13] gives us that for $p\in\{1,2,\dots\}$,
\[
\mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\frac{1}{T}\int_0^T h(X_t)dt - \mathbb{E}^{(\vartheta)}[h(\xi)]\right| > \varepsilon\right]
\le K(\vartheta,X,p)\,\varepsilon^{-p}\,\|h\|_\infty^{p}\,T^{-p/2},
\]
where
\[
K(\vartheta,X,p) := 6^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right] + 12^p\,C_p\,\ell_\vartheta^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|R_2-R_1|^p\right] + 2(6^p)\,\ell_\vartheta\,\mathbb{E}^{(\vartheta)}_{a}\left[R_1^p\right]
+ 2^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[\left|R_1 - \ell_\vartheta^{-1}\right|^p\right] + 2^p\,C_p\,\ell_\vartheta^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|\bar{\eta}_1|^p\right],
\]
and $C_p$ is a constant depending only on $p$. We point out here that Theorem 3.2 in [13] holds for all $p\in(1,\infty)$ under additional assumptions, but for our case we need only $p\in\{1,2,\dots\}$. Thus we are left with showing that these moments can be bounded from above by a function continuous in $\vartheta$, for then (3) follows by taking the supremum of this function over a compact set.
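The regeneration structure just introduced can be checked empirically. The sketch below simulates a path (assuming, as an illustration, the parametrisation $dX = \frac{1}{2}(\theta_1(1-X) - \theta_2X + sX(1-X))dt + \sqrt{X(1-X)}\,dW$ for the SDE (1)), extracts the return times $R_n$ for $a = 0.35$, $b = 0.65$, and compares the observed crossing rate $N_T/T$ with the renewal-theory prediction $\ell_\vartheta = \mathbb{E}^{(\vartheta)}_a[R_1]^{-1}$, here estimated by the reciprocal of the mean observed cycle length.

```python
import numpy as np

rng = np.random.default_rng(7)
theta1 = theta2 = 2.0
s, dt, T = 0.0, 0.01, 500.0
a, b = 0.35, 0.65

# Euler-Maruyama path of the Wright-Fisher SDE (parametrisation assumed).
n = int(T / dt)
x = np.empty(n + 1)
x[0] = a
for k in range(n):
    xk = x[k]
    drift = 0.5 * (theta1 * (1 - xk) - theta2 * xk + s * xk * (1 - xk))
    x[k + 1] = np.clip(xk + drift * dt
                       + np.sqrt(max(xk * (1 - xk), 0.0) * dt) * rng.standard_normal(),
                       1e-6, 1 - 1e-6)

# Record the regeneration times R_1, R_2, ...: each R_n is the first return to a
# after the path has reached b following R_{n-1}.
R, waiting_for_b = [], True
for k in range(n + 1):
    if waiting_for_b and x[k] >= b:
        waiting_for_b = False
    elif not waiting_for_b and x[k] <= a:
        R.append(k * dt)
        waiting_for_b = True

increments = np.diff(R)   # observed copies of R_1 under P_a (i.i.d. by the strong Markov property)
N_T = len(R)              # number of completed regeneration cycles by time T
print(N_T, N_T / T, 1.0 / increments.mean())
```

The last two printed numbers agree up to boundary effects, illustrating the identity $\ell_\vartheta = \mathbb{E}^{(\vartheta)}_a[R_1]^{-1}$ used throughout the proof.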
Now the only terms above that depend on $\vartheta$ are
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right], \quad \ell_\vartheta^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|R_2-R_1|^p\right], \quad \ell_\vartheta\,\mathbb{E}^{(\vartheta)}_{a}\left[R_1^p\right], \quad \mathbb{E}^{(\vartheta)}_{\nu}\left[\left|R_1-\ell_\vartheta^{-1}\right|^p\right], \quad \ell_\vartheta^p\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|\bar{\eta}_1|^p\right], \tag{23}
\]
and in light of the following inequalities
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[|\bar{\eta}_1|^p\right] \le 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[|R_2-R_1|^p\right] + \mathbb{E}^{(\vartheta)}_{\nu}\left[\ell_\vartheta^{-p}\right]\right) = 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{a}\left[R_1^p\right] + \mathbb{E}^{(\vartheta)}_{a}\left[R_1\right]^p\right),
\]
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[\left|R_1-\ell_\vartheta^{-1}\right|^p\right] \le 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right] + \mathbb{E}^{(\vartheta)}_{\nu}\left[\ell_\vartheta^{-p}\right]\right) = 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right] + \mathbb{E}^{(\vartheta)}_{a}\left[R_1\right]^p\right),
\]
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[|R_2-R_1|^p\right] = \mathbb{E}^{(\vartheta)}_{a}\left[R_1^p\right] \le 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{a}\left[T_b^p\right] + \mathbb{E}^{(\vartheta)}_{b}\left[T_a^p\right]\right),
\]
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[R_1^p\right] \le 2^{p-1}\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^p\right] + \mathbb{E}^{(\vartheta)}_{b}\left[T_a^p\right]\right),
\]
\[
\mathbb{E}^{(\vartheta)}_{a}\left[R_1\right] = \mathbb{E}^{(\vartheta)}_{a}\left[T_b\right] + \mathbb{E}^{(\vartheta)}_{b}\left[T_a\right],
\]
it suffices to consider only the terms $\ell_\vartheta$ and $\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^p\right]$. Thus we are left with showing that these two terms can be bounded from above by a function continuous in $\vartheta$. We further point out that we can reduce our considerations in the expressions above to integer moments, for if this is not the case then
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^p\right] \le \mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^{\lceil p\rceil}\right] + \mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^{\lfloor p\rfloor}\right],
\]
where $\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$ denote the ceiling and floor functions respectively.

We make use of the backward equation for the quantity $U_{q,b}(x) := \mathbb{E}^{(\vartheta)}_{x}[T_b^q]$ for $q\in\{1,2,\dots\}$, to derive the ODE (as can be found in [11] p. 203 and 210, and [22])
\[
\frac{x(1-x)}{2}U''_{q,b}(x) + \frac{1}{2}\big(sx(1-x) - \theta_2 x + \theta_1(1-x)\big)U'_{q,b}(x) + qU_{q-1,b}(x) = 0 \tag{24}
\]
with boundary conditions $U_{q,b}(b) = 0$ and $\lim_{y\to 0}S'(y)^{-1}\frac{\partial}{\partial y}U_{q,b}(y) = 0$ when $x<b$, or $\lim_{y\to 1}S'(y)^{-1}\frac{\partial}{\partial y}U_{q,b}(y) = 0$ when $x>b$, where
\[
S(x) := \int^x e^{-sy}y^{-\theta_1}(1-y)^{-\theta_2}dy.
\]
We point out here that in [22] the diffusion coefficient is assumed to be strictly positive everywhere to ensure that the speed and scale of the diffusion are well-defined. The results however still hold for the Wright-Fisher diffusion, as both of these quantities exist and are well-defined despite the fact that the diffusion coefficient is zero at either boundary. Solving (24) for $x<b$ leads to
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b^q] = 2q\int_x^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}U_{q-1,b}(\eta)\,d\eta\,d\xi, \tag{25}
\]
whilst for $x>b$ we have that
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b^q] = 2q\int_b^x e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_\xi^1 e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}U_{q-1,b}(\eta)\,d\eta\,d\xi. \tag{26}
\]
We claim that for any $x<b$ and any $q\in\{1,2,\dots\}$,
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b^q] \le q!\left(\frac{2\max\{e^{-s},1\}}{\theta_1}\int_0^b(1-\xi)^{-\max\{\theta_2,1\}}d\xi\right)^q. \tag{27}
\]
To see this, observe that
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b] = 2\int_x^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta\,d\xi
\le 2\max\{e^{-s},1\}\int_0^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta\,d\xi
\]
\[
\le 2\max\{e^{-s},1\}\int_0^b\xi^{-\theta_1}(1-\xi)^{-\max\{\theta_2,1\}}\int_0^\xi\eta^{\theta_1-1}d\eta\,d\xi
= \frac{2\max\{e^{-s},1\}}{\theta_1}\int_0^b(1-\xi)^{-\max\{\theta_2,1\}}d\xi, \tag{28}
\]
where the second inequality follows from the observation that for $\eta\in(0,\xi)$,
\[
(1-\eta)^{\theta_2-1} \le \begin{cases} 1 & \text{if } \theta_2\ge 1,\\ (1-\xi)^{\theta_2-1} & \text{if } \theta_2<1,\end{cases}
\]
and shows that (27) holds for $q=1$. Now the RHS of (28) is independent of $x$, so we can use the recursion in (25) to conclude by induction that (27) holds for $q\in\{1,2,\dots\}$ as required. Similar arguments to those presented above allow us to conclude that for $x>b$ and $q\in\{1,2,\dots\}$,
\[
\mathbb{E}^{(\vartheta)}_{x}[T_b^q] \le q!\left(\frac{2\max\{e^{s},1\}}{\theta_2}\int_b^1\xi^{-\max\{\theta_1,1\}}d\xi\right)^q. \tag{29}
\]
Both RHS of (27) and (29) are independent of $x$, so trivially
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^q\right] \le q!\left(\left(\frac{2\max\{e^{-s},1\}}{\theta_1}\int_0^b(1-y)^{-\max\{\theta_2,1\}}dy\right)^q + \left(\frac{2\max\{e^{s},1\}}{\theta_2}\int_b^1 y^{-\max\{\theta_1,1\}}dy\right)^q\right). \tag{30}
\]
All the terms on the RHS of (27), (29) and (30) are continuous in $\vartheta$, so we have our required bound for $\mathbb{E}^{(\vartheta)}_{\nu}\left[T_b^q\right]$.
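The $q = 1$ case of (25) can be evaluated numerically. The sketch below (taking the normalisation of (25) as reconstructed here as an assumption) approximates $\mathbb{E}^{(\vartheta)}_x[T_b]$ with the trapezoidal rule and checks the intuitive monotonicity that starting closer to $b$ shortens the expected hitting time.

```python
import numpy as np

def mean_hitting_time(x, b, s, theta1, theta2, m=4000):
    """Numerically evaluate the q = 1 case of (25) for x < b:
    E_x[T_b] = 2 * int_x^b e^{-s xi} xi^{-theta1} (1-xi)^{-theta2}
                   * int_0^xi e^{s eta} eta^{theta1-1} (1-eta)^{theta2-1} d eta d xi,
    via the trapezoidal rule."""
    eta = np.linspace(1e-8, b, m)
    inner_vals = np.exp(s * eta) * eta ** (theta1 - 1) * (1 - eta) ** (theta2 - 1)
    # cumulative inner integral int_0^xi ... d eta along the eta grid
    cum = np.concatenate([[0.0],
                          np.cumsum(0.5 * (inner_vals[1:] + inner_vals[:-1]) * np.diff(eta))])
    xi = np.linspace(x, b, m)
    inner = np.interp(xi, eta, cum)
    outer = np.exp(-s * xi) * xi ** (-theta1) * (1 - xi) ** (-theta2) * inner
    return 2.0 * float(np.sum(0.5 * (outer[1:] + outer[:-1]) * np.diff(xi)))

t_from_03 = mean_hitting_time(0.3, 0.7, s=1.0, theta1=2.0, theta2=2.0)
t_from_05 = mean_hitting_time(0.5, 0.7, s=1.0, theta1=2.0, theta2=2.0)
print(t_from_03, t_from_05)  # both positive; the second is smaller
```

Since the outer integrand is positive, the integral over $[0.5, 0.7]$ is strictly smaller than the one over $[0.3, 0.7]$, matching the probabilistic interpretation.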
It remains to show that we can bound $\ell_\vartheta$ from above by an expression continuous in $\vartheta$. Observe that by definition
\[
\ell_\vartheta = \mathbb{E}^{(\vartheta)}_{a}[R_1]^{-1} = \left(\mathbb{E}^{(\vartheta)}_{a}[T_b] + \mathbb{E}^{(\vartheta)}_{b}[T_a]\right)^{-1}, \tag{31}
\]
and recall that we will take the supremum in $\vartheta$ over a given compact set $K$, so using (25) and (26) respectively, and setting $\bar{\theta}_1 := \sup_{\vartheta\in K}\theta_1$, $\bar{\theta}_2 := \sup_{\vartheta\in K}\theta_2$, we can conclude that
\[
\mathbb{E}^{(\vartheta)}_{a}[T_b] = 2\int_a^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta\,d\xi
\ge 2\min\{e^{-s},1\}\int_a^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}d\xi\int_0^a\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta
\ge 2\min\{e^{-s},1\}(b-a)\,\frac{a^{\bar{\theta}_1}}{\bar{\theta}_1}\,(1-a)^{\bar{\theta}_2-1}, \tag{32}
\]
\[
\mathbb{E}^{(\vartheta)}_{b}[T_a] = 2\int_a^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_\xi^1 e^{s\eta}\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta\,d\xi
\ge 2\min\{e^{s},1\}\int_a^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}d\xi\int_b^1\eta^{\theta_1-1}(1-\eta)^{\theta_2-1}d\eta
\ge 2\min\{e^{s},1\}(b-a)\,\frac{(1-b)^{\bar{\theta}_2}}{\bar{\theta}_2}\,b^{\bar{\theta}_1-1}, \tag{33}
\]
which follow by observing that
\[
\xi^{-\theta_1}(1-\xi)^{-\theta_2} > 1 \quad \forall\,\xi\in(a,b),\ \forall\,\theta_1,\theta_2>0,
\]
\[
(1-\eta)^{\theta_2-1} \ge (1-a)^{\bar{\theta}_2-1} \quad \forall\,\eta\in(0,a), \qquad
\eta^{\theta_1-1} \ge b^{\bar{\theta}_1-1} \quad \forall\,\eta\in(b,1).
\]
Note that the RHS of (32) and (33) are continuous in $\vartheta$, and thus in view of (31) we have found the required upper bound on $\ell_\vartheta$, which is continuous in $\vartheta$.

B Extending Theorem 2.2 for two specific unbounded functions
Recall the notation introduced in Appendix A, namely the regeneration times $\{S_n,R_n\}_{n=0}^{\infty}$ and the number of upcrossings up to time $t$, $\{N_t\}_{t\ge 0}$. In what follows we consider the function $(1-x)x^{-1}$; however, similar arguments hold for the function $x(1-x)^{-1}$. We want to prove that
\[
\lim_{T\to\infty}\sup_{\vartheta\in K}\mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\frac{1}{T}\int_0^T\frac{1-X_t}{X_t}dt - \mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\right| > \varepsilon\right] = 0 \tag{34}
\]
holds for any compact set $K\subset\Theta\subset\mathbb{R}\times[1,\infty)^2$. We point out that (34) is an extension of the result in Theorem 2.2 for the specific unbounded function $(1-x)x^{-1}$, which is needed in the proof of Theorem 2.4. Note that the expectation inside the probability is well-defined because we are assuming $(\theta_1,\theta_2)\in[1,\infty)^2$; however, the function $(1-\xi)\xi^{-1}$ has only finitely many moments for any given pair of mutation rates, which makes the analysis here more intricate than the one in Appendix A. The strategy here will be to decompose the sample path of the diffusion into i.i.d. blocks of excursions as done in Theorem 3.5 in [13]. However, we will deal with the resulting expectations in a different way, namely by applying the ODE approach used in Appendix A to bound these quantities by functions continuous in $\vartheta$. Recall that as we are taking a supremum over $\vartheta$ in a compact set $K$, bounding an expectation by a function continuous in $\vartheta$ suffices to yield a bound uniform over $K$.
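The expectation appearing in (34) can be computed numerically from the stationary density $f_\vartheta(x)\propto e^{sx}x^{\theta_1-1}(1-x)^{\theta_2-1}$ (this explicit form, stated in Section 2, is assumed here). For $s=0$, $\theta_1=\theta_2=2$ the value has the closed form $B(\theta_1-1,\theta_2+1)/B(\theta_1,\theta_2)=2$, which the quadrature below reproduces; the sketch also illustrates that positive selection decreases $\mathbb{E}^{(\vartheta)}[(1-\xi)/\xi]$, since the density shifts towards 1.

```python
import numpy as np

def stationary_mean_of_ratio(s, theta1, theta2, m=200001):
    """E^{(vartheta)}[(1 - xi)/xi] under f(x) ∝ e^{s x} x^{theta1-1} (1-x)^{theta2-1},
    by trapezoidal quadrature; the expectation is finite only for theta1 > 1."""
    x = np.linspace(1e-9, 1.0 - 1e-9, m)
    w = np.exp(s * x) * x ** (theta1 - 1) * (1.0 - x) ** (theta2 - 1)  # unnormalised density
    g = (1.0 - x) / x
    dx = x[1] - x[0]
    num = np.sum(0.5 * ((w * g)[1:] + (w * g)[:-1])) * dx
    den = np.sum(0.5 * (w[1:] + w[:-1])) * dx
    return num / den

val_neutral = stationary_mean_of_ratio(0.0, 2.0, 2.0)   # exact value is 2
val_selected = stationary_mean_of_ratio(2.0, 2.0, 2.0)  # positive s shifts mass right
print(val_neutral, val_selected)
```

The decrease under positive $s$ follows since $(1-x)/x$ is decreasing in $x$ while $e^{sx}$ tilts mass towards larger $x$.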
To this end, fix $\varepsilon\in(0,\mathbb{E}^{(\vartheta)}[(1-\xi)\xi^{-1}])$ and choose $\delta\in(0,1)$ such that $\varepsilon = \delta\,\mathbb{E}^{(\vartheta)}[(1-\xi)\xi^{-1}]$, and set $\Omega_T := \{|N_T T^{-1} - \ell_\vartheta| \le \ell_\vartheta\delta/2\}$ for $\ell_\vartheta = \mathbb{E}^{(\vartheta)}_{a}[R_1]^{-1}$. Then as in the proof of Theorem 3.5 in [13], we get the following decomposition:
\[
\mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\frac{1}{T}\int_0^T\frac{1-X_t}{X_t}dt - \mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\right| > \varepsilon\right]
\le \mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\int_0^{R_1}\frac{1-X_t}{X_t}dt\right| > \frac{T\varepsilon}{6}\right]
+ \mathbb{P}^{(\vartheta)}_{\nu}\left[\left|\int_{R_1}^{R_{N_T+1}}\frac{1-X_t}{X_t}dt - N_T\,\mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\mathbb{E}^{(\vartheta)}_{a}[R_1]\right| > \frac{T\varepsilon}{6},\ \Omega_T\right]
\]
\[
+ \mathbb{P}^{(\vartheta)}_{\nu}\left[\left|N_T\,\mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\mathbb{E}^{(\vartheta)}_{a}[R_1] - T\,\mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]\right| > \frac{T\varepsilon}{2},\ \Omega_T\right]
+ \mathbb{P}^{(\vartheta)}_{\nu}\left[\int_T^{R_{N_T+1}}\frac{1-X_t}{X_t}dt > \frac{T\varepsilon}{6},\ \Omega_T\right]
+ \mathbb{P}^{(\vartheta)}_{\nu}\left[\Omega_T^c\right]
=: A + B + E + C + D.
\]
Dealing with $E$ and $D$ can be achieved as in equations (3.10) and (3.14) in [13], to deduce that $E = 0$ and
\[
D \le \frac{c}{T\varepsilon^2}\,\mathbb{E}^{(\vartheta)}\left[\frac{1-\xi}{\xi}\right]^2\left(\mathbb{E}^{(\vartheta)}_{\nu}\left[\left|R_1-\ell_\vartheta^{-1}\right|^2\right] + 2C_2\,\mathbb{E}^{(\vartheta)}_{\nu}\left[|\bar{\eta}_1|^2\right]\ell_\vartheta\right),
\]
for $c$ a numerical constant and $C_2$ the constant from the Burkholder-Davis-Gundy inequality. All the above expressions are either constant or have been shown to be bounded by functions continuous in $\vartheta$ in Appendix A, so it remains to deal with the terms $A$, $B$ and $C$ above.

Applying Markov's inequality to $A$ gives
\[
A \le \frac{6}{T\varepsilon}\,\mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{R_1}\frac{1-X_t}{X_t}dt\right],
\]
and we can decompose the above integral:
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{R_1}\frac{1-X_t}{X_t}dt\right]
= \mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{S_1}\frac{1-X_t}{X_t}dt\right] + \mathbb{E}^{(\vartheta)}_{\nu}\left[\int_{S_1}^{R_1}\frac{1-X_t}{X_t}dt\right]
\le \mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right] + \frac{1-a}{a}\,\mathbb{E}^{(\vartheta)}_{\nu}[R_1]. \tag{35}
\]
So it remains to prove that the first term on the RHS can be bounded by a function continuous in $\vartheta$. It turns out that $B$ and $C$ can be bounded by similar quantities, so we do this first and subsequently show that the resulting quantities can be bounded by functions continuous in $\vartheta$. Indeed, set
\[
\xi_k := \int_{R_k}^{R_{k+1}}\frac{1-X_t}{X_t}dt, \qquad M_0 = 0, \qquad M_n := \sum_{k=1}^{n}\left(\xi_k - \mathbb{E}^{(\vartheta)}_{\nu}[\xi_k]\right).
\]
Then
\[
B = \mathbb{P}^{(\vartheta)}_{\nu}\left[|M_{N_T}| > \frac{T\varepsilon}{6},\ \Omega_T\right]
\le \mathbb{P}^{(\vartheta)}_{\nu}\left[\sup_{n\le\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}|M_n| > \frac{T\varepsilon}{6}\right]
\le \left(\frac{6}{T\varepsilon}\right)^2\mathbb{E}^{(\vartheta)}_{\nu}\left[\sup_{n\le\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}|M_n|^2\right]
\le C_2\left(\frac{6}{T\varepsilon}\right)^2\mathbb{E}^{(\vartheta)}_{\nu}\left[[M]_{\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}\right]
\]
by the Chebyshev and Burkholder-Davis-Gundy inequalities, where $[M]_n$ denotes the quadratic variation of $M$ up to time $n$, and $C_2$ is the Burkholder-Davis-Gundy constant. Now observe that
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[[M]_{\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}\right]
= \lfloor T\ell_\vartheta(1+\delta/2)\rfloor\,\mathbb{E}^{(\vartheta)}_{\nu}\left[\left(\xi_1 - \mathbb{E}^{(\vartheta)}_{\nu}[\xi_1]\right)^2\right]
\le \lfloor T\ell_\vartheta(1+\delta/2)\rfloor\left(\mathbb{E}^{(\vartheta)}_{a}\left[\xi_1^2\right] + \mathbb{E}^{(\vartheta)}_{a}[\xi_1]^2\right),
\]
because the $\{\xi_k\}_{k=1}^{\infty}$ are i.i.d., and moreover under $\mathbb{P}^{(\vartheta)}_{\nu}$ they are equal in distribution to $\xi_1$ under $\mathbb{P}^{(\vartheta)}_{a}$. So
\[
B \le \frac{36\,C_2\,\ell_\vartheta(1+\delta/2)}{T\varepsilon^2}\left(\mathbb{E}^{(\vartheta)}_{a}\left[\xi_1^2\right] + \mathbb{E}^{(\vartheta)}_{a}[\xi_1]^2\right). \tag{36}
\]
The second term of (36) can be bounded in the same way as in (35), whilst for the first term we can use a similar decomposition to get
\[
\mathbb{E}^{(\vartheta)}_{a}\left[\xi_1^2\right] \le 2\left(\mathbb{E}^{(\vartheta)}_{a}\left[\left(\int_0^{T_b}\frac{1-X_t}{X_t}dt\right)^2\right] + \left(\frac{1-a}{a}\right)^2\mathbb{E}^{(\vartheta)}_{a}\left[R_1^2\right]\right). \tag{37}
\]
Finally, for $C$ we use the same arguments as in [13] (just before equation (3.13)) to get that
\[
C \le \sum_{k=1}^{\lfloor T\ell_\vartheta(1+\delta/2)\rfloor}\mathbb{P}^{(\vartheta)}_{\nu}\left[\int_{R_k}^{R_{k+1}}\frac{1-X_t}{X_t}dt > \frac{T\varepsilon}{6}\right]
\le \lfloor T\ell_\vartheta(1+\delta/2)\rfloor\left(\frac{6}{T\varepsilon}\right)^2\mathbb{E}^{(\vartheta)}_{\nu}\left[\left(\int_{R_1}^{R_2}\frac{1-X_t}{X_t}dt\right)^2\right]
\le \frac{36\,\ell_\vartheta(1+\delta/2)}{T\varepsilon^2}\,\mathbb{E}^{(\vartheta)}_{a}\left[\left(\int_0^{R_1}\frac{1-X_t}{X_t}dt\right)^2\right],
\]
and we can apply the same reasoning as in (37).
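As a quick empirical sanity check of (34), one can compare the time average $\frac{1}{T}\int_0^T(1-X_t)X_t^{-1}dt$ along a simulated path with the stationary expectation. The sketch below assumes the parametrisation $dX = \frac{1}{2}(\theta_1(1-X) - \theta_2X + sX(1-X))dt + \sqrt{X(1-X)}\,dW$; for $s=0$, $\theta_1=3$, $\theta_2=2$ the stationary law is Beta$(3,2)$ and $\mathbb{E}[(1-\xi)/\xi] = B(2,3)/B(3,2) = 1$.

```python
import numpy as np

rng = np.random.default_rng(3)
s, theta1, theta2 = 0.0, 3.0, 2.0
T, dt = 1000.0, 0.01
n = int(T / dt)

x = 0.6  # stationary mean of Beta(3, 2)
running = 0.0
for _ in range(n):
    running += (1.0 - x) / x * dt
    drift = 0.5 * (theta1 * (1.0 - x) - theta2 * x + s * x * (1.0 - x))
    x = min(max(x + drift * dt + np.sqrt(x * (1.0 - x) * dt) * rng.standard_normal(),
                1e-3), 1.0 - 1e-3)

time_average = running / T
print(time_average)  # should be close to 1 for large T
```

The choice $\theta_1 = 3 > 1$ keeps the time average well-behaved; for $\theta_1$ close to 1 the integrand $(1-x)x^{-1}$ has heavy excursions near the boundary, which is precisely the difficulty this appendix addresses.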
It remains to show that the terms
\[
\mathbb{E}^{(\vartheta)}_{a}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right], \qquad
\mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right], \qquad
\mathbb{E}^{(\vartheta)}_{a}\left[\left(\int_0^{T_b}\frac{1-X_t}{X_t}dt\right)^2\right] \tag{38}
\]
can be bounded by functions continuous in $\vartheta$. The same arguments used to derive the ODEs in Appendix A can be used here to derive an ODE for $U_n(x) := \mathbb{E}^{(\vartheta)}_{x}[(\int_0^{T_b}(1-X_t)X_t^{-1}dt)^n]$ for the cases when $x<b$ and $x>b$, with the same boundary conditions as in Appendix A. Thus the following recursion holds for $U_n(x)$ when $x<b$:
\[
U_n(x) = 2n\int_x^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-2}(1-\eta)^{\theta_2}U_{n-1}(\eta)\,d\eta\,d\xi, \qquad n=1,2,\dots, \tag{39}
\]
and for $x>b$ we have
\[
U_n(x) = 2n\int_b^x e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_\xi^1 e^{s\eta}\eta^{\theta_1-2}(1-\eta)^{\theta_2}U_{n-1}(\eta)\,d\eta\,d\xi, \qquad n=1,2,\dots. \tag{40}
\]
Now for $n=1$, we get that for $x<b$,
\[
\mathbb{E}^{(\vartheta)}_{x}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right]
= 2\int_x^b e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi e^{s\eta}\eta^{\theta_1-2}(1-\eta)^{\theta_2}d\eta\,d\xi
\le 2\max\{e^{-s},1\}\int_x^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_0^\xi\eta^{\theta_1-2}d\eta\,d\xi
= \frac{2\max\{e^{-s},1\}}{\theta_1-1}\int_x^b\xi^{-1}(1-\xi)^{-\theta_2}d\xi, \tag{41}
\]
where we point out that the RHS of (41) is continuous in $\vartheta$ over any compact set $K\subset\Theta\subset\mathbb{R}\times[1,\infty)^2$ because $\Theta$ is open (so that $\theta_1>1$ on $K$). For $x>b$,
\[
\mathbb{E}^{(\vartheta)}_{x}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right]
= 2\int_b^x e^{-s\xi}\xi^{-\theta_1}(1-\xi)^{-\theta_2}\int_\xi^1 e^{s\eta}\eta^{\theta_1-2}(1-\eta)^{\theta_2}d\eta\,d\xi
\le 2\max\{e^{s},1\}\int_b^x\xi^{-\max\{\theta_1,2\}}(1-\xi)^{-\theta_2}\int_\xi^1(1-\eta)^{\theta_2}d\eta\,d\xi
\]
\[
= \frac{2\max\{e^{s},1\}}{\theta_2+1}\int_b^x\xi^{-\max\{\theta_1,2\}}(1-\xi)\,d\xi
\le \frac{2\max\{e^{s},1\}}{\theta_2+1}\int_b^x\xi^{-\max\{\theta_1,2\}}d\xi. \tag{42}
\]
Thus when $\nu = f_\vartheta$,
\[
\mathbb{E}^{(\vartheta)}_{\nu}\left[\int_0^{T_b}\frac{1-X_t}{X_t}dt\right]
\le \frac{2\max\{e^{-s},1\}}{\theta_1-1}\int_0^b\int_x^b\xi^{-1}(1-\xi)^{-\theta_2}d\xi\,f_\vartheta(x)\,dx
+ \frac{2\max\{e^{s},1\}}{\theta_2+1}\int_b^1\int_b^x\xi^{-\max\{\theta_1,2\}}d\xi\,f_\vartheta(x)\,dx
\]
\[
\le \frac{2e^{|s|}}{\theta_1(\theta_1-1)}\,\frac{1}{G_\vartheta}\int_0^b(1-\xi)^{-\theta_2}d\xi
+ \frac{2\max\{e^{s},1\}}{\theta_2+1}\int_b^1\xi^{-\max\{\theta_1,2\}}d\xi, \tag{43}
\]
which follows from
\[
\int_0^b\int_x^b\xi^{-1}(1-\xi)^{-\theta_2}x^{\theta_1-1}(1-x)^{\theta_2-1}\,d\xi\,dx
= \int_0^b\int_0^\xi\xi^{-1}(1-\xi)^{-\theta_2}x^{\theta_1-1}(1-x)^{\theta_2-1}\,dx\,d\xi
\le \frac{1}{\theta_1}\int_0^b\xi^{\theta_1-1}(1-\xi)^{-\theta_2}d\xi
\le \frac{1}{\theta_1}\int_0^b(1-\xi)^{-\theta_2}d\xi,
\]
because $\theta_1,\theta_2>1$, and
\[
\int_b^1\int_b^x\xi^{-\max\{\theta_1,2\}}f_\vartheta(x)\,d\xi\,dx
= \int_b^1\int_\xi^1\xi^{-\max\{\theta_1,2\}}f_\vartheta(x)\,dx\,d\xi
\le \int_b^1\xi^{-\max\{\theta_1,2\}}d\xi.
\]
Similarly, using the recursions in (39) and (40), we get that for $x<b$,
\[
\mathbb{E}^{(\vartheta)}_{x}\left[\left(\int_0^{T_b}\frac{1-X_t}{X_t}dt\right)^2\right]
\le \frac{8\max\{e^{-s},1\}^2}{(\theta_1-1)^2}\int_0^b\gamma^{\theta_1-2}(1-\gamma)^{-\theta_2}d\gamma\times\int_x^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}d\xi
\]
\[
\le \frac{8\max\{e^{-s},1\}^2}{(\theta_1-1)^2}(1-b)^{-\theta_2}\int_0^b\gamma^{\theta_1-2}d\gamma\times\int_x^b\xi^{-\theta_1}(1-\xi)^{-\theta_2}d\xi, \tag{44}
\]
which follows from
\[
\int_0^\xi\eta^{\theta_1-2}(1-\eta)^{\theta_2}\int_\eta^b\gamma^{-1}(1-\gamma)^{-\theta_2}d\gamma\,d\eta
\le \int_0^b\eta^{\theta_1-2}(1-\eta)^{\theta_2}\int_\eta^b\gamma^{-1}(1-\gamma)^{-\theta_2}d\gamma\,d\eta
\le \frac{1}{\theta_1-1}\int_0^b\gamma^{\theta_1-2}(1-\gamma)^{-\theta_2}d\gamma.
\]
As the RHS of (41), (42), (43), and (44) are all continuous in $\vartheta$, we are able to exhibit a bound for the quantities in (38) uniformly over compact $K\subset\Theta$, and hence similarly bound the quantities $A$, $B$, and $C$. Combined with the bounds for $D$ and $E$ we conclude that (34) holds.

References

[1] J. P. Bollback, T. L. York, and R. Nielsen. Estimation of 2Nes from temporal allele frequency data. Genetics, 179(1):497–502, 2008.

[2] C. Cannings. The latent roots of certain Markov chains arising in genetics: a new approach. I. Haploid models.
Advances in Appl. Probability, 6:260–290, 1974.

[3] D. A. Dawson, B. Maisonneuve, and J. Spencer. École d'Été de Probabilités de Saint-Flour XXI—1991, volume 1541 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1993. Papers from the school held in Saint-Flour, August 18–September 4, 1991, edited by P. L. Hennequin.

[4] M. Dehasque, M. C. Ávila Arcos, D. Díez-del-Molino, M. Fumagalli, K. Guschanski, E. D. Lorenzen, A.-S. Malaspinas, T. Marques-Bonet, M. D. Martin, G. G. R. Murray, A. S. T. Papadopulos, N. O. Therkildsen, D. Wegmann, L. Dalén, and A. D. Foote. Inference of natural selection from ancient DNA. Evolution Letters, 4(2):94–108, 2020.

[5] S. N. Ethier and T. G. Kurtz. Markov processes. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, 1986. Characterization and convergence.

[6] S. Gugushvili and P. Spreij. Parametric inference for stochastic differential equations: a smooth and match approach. ALEA Lat. Am. J. Probab. Math. Stat., 9(2):609–635, 2012.

[7] Z. He, X. Dai, M. Beaumont, and F. Yu. Detecting and quantifying natural selection at two linked loci from time series data of allele frequencies. bioRxiv, 2019.

[8] Z. He, X. Dai, M. Beaumont, and F. Yu. Maximum likelihood estimation of natural selection and allele age from time series data of allele frequencies. bioRxiv, 2020.

[9] I. A. Ibragimov and R. Z. Has'minskii. Statistical estimation: Asymptotic theory. Springer-Verlag, 1981.

[10] N. Ikeda and S. Watanabe. Stochastic differential equations and diffusion processes, volume 24 of North-Holland Mathematical Library. North-Holland Publishing Co., Amsterdam; Kodansha, Ltd., Tokyo, second edition, 1989.

[11] S. Karlin and H. M. Taylor. A second course in stochastic processes. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London, 1981.

[12] Y. A. Kutoyants. Statistical inference for ergodic diffusion processes. Springer Series in Statistics. Springer-Verlag London, Ltd., London, 2004.

[13] E. Löcherbach, D. Loukianova, and O. Loukianov. Polynomial bounds in the ergodic theorem for one-dimensional diffusions and integrability of hitting times. Ann. Inst. Henri Poincaré Probab. Stat., 47(2):425–449, 2011.

[14] A.-S. Malaspinas, O. Malaspinas, S. N. Evans, and M. Slatkin. Estimating allele age and selection coefficient from time-serial data. Genetics, 192(2):599–607, 2012.

[15] R. Nickl and K. Ray. Nonparametric statistical inference for drift vector fields of multi-dimensional diffusions. Ann. Statist., 48(3):1383–1408, 2020.

[16] R. Nickl and J. Söhl. Nonparametric Bayesian posterior contraction rates for discretely observed scalar diffusions. Ann. Statist., 45(4):1664–1693, 2017.

[17] L. Panzar and H. van Zanten. Nonparametric Bayesian inference for ergodic diffusions. J. Statist. Plann. Inference, 139(12):4193–4199, 2009.

[18] J. Pitman and M. Yor. Bessel processes and infinitely divisible laws. In D. Williams, editor, Stochastic Integrals, volume 851 of Lecture Notes in Mathematics, pages 285–370. Springer-Verlag, 1981.

[19] J. G. Schraiber, S. N. Evans, and M. Slatkin. Bayesian inference of natural selection from allele frequency time series. Genetics, 203(1):493–511, 2016.

[20] F. van der Meulen and H. van Zanten. Consistent nonparametric Bayesian inference for discretely observed scalar diffusions. Bernoulli, 19(1):44–63, 2013.

[21] J. H. van Zanten. A note on consistent estimation of multivariate parameters in ergodic diffusion models. Scand. J. Statist., 28(4):617–623, 2001.

[22] H. Wang and C. Yin. Moments of the first passage time of one-dimensional diffusion with two-sided barriers. Statist. Probab. Lett., 78(18):3373–3380, 2008.

[23] G. A. Watterson. Estimating and testing selection: the two-alleles, genetic selection diffusion model.