Asymptotic behaviour of sampling and transition probabilities in coalescent models under selection and parent dependent mutations
Martina Favero∗ and Henrik Hult, KTH Royal Institute of Technology, Stockholm
Abstract:
Several relevant quantities related to the Kingman coalescent are not explicitly known when mutations are parent dependent. Examples include the probability, referred to as the sampling probability, that the typed coalescent, evolving forwards in time, hits a certain configuration, and the backward transition probabilities of its block counting process. In this paper, an asymptotic result is derived for these quantities, as the sample size goes to infinity. It is shown that the sampling probabilities decay polynomially in the sample size, with a multiplying constant that depends on the stationary density of the Wright-Fisher diffusion, and that the backward transition probabilities of the normalised jump chain converge to the limits of the frequencies of the types in the sample.
MSC2020 subject classifications:
Primary 60J90; secondary 60J70, 92D15.
Keywords and phrases: coalescent, population genetics, parent dependent mutations, Wright-Fisher diffusion.
1. Introduction
Given a sample of genetic material from some individuals in a population, the Kingman coalescent [7] models the genealogy of the individuals. When mutations are parent independent (PIM), i.e. the genetic type of the mutated offspring does not depend on the type of its parent, the coalescent can be simulated forward and backward in time, as both its forward and backward transition probabilities are explicitly known. Furthermore, its likelihood function is known, as well as several other quantities. However, when mutations are parent dependent, the model becomes more complex and several of the expressions that are explicit in the PIM case become implicit.

The coalescent and its numerous generalisations have been extensively used for inference on genetic data sets, in combination with Monte Carlo methods used to approximate its implicit likelihood function in the non-PIM case; see e.g. [9] for an overview. The asymptotic behaviour of coalescent models in relation to samples of large size has recently gained attention because of the large size of modern study samples. While the size of samples continues to increase, thanks to advancements in DNA sequencing technology, inference methods based on coalescent models struggle to provide reliable results, since the coalescent does not scale well, see e.g. [6]. Moreover, some properties of the coalescent become distorted: the coalescent is a suitable approximation of certain models, e.g. the Wright-Fisher model, only when the sample size is sufficiently smaller than the effective population size, see e.g. [1]. Studying the asymptotic behaviour of the coalescent for large sample sizes could serve as a support in the analysis of these problems.

This paper contributes to the analysis of the asymptotic behaviour of the coalescent by proving an asymptotic result for its sampling and transition probabilities, as specified in the following.

∗ corresponding author: [email protected]
A large sample is considered to be of the form $n y^{(n)}$, with $n \in \mathbb{N} \setminus \{0\}$ and $y^{(n)} \in \frac{1}{n}\mathbb{N}^d \setminus \{0\}$. As $n \to \infty$, it is assumed that $y^{(n)} \to y \in \mathbb{R}^d_{>0}$. An example is $y^{(n)} = \frac{1}{n}\lfloor n y \rfloor$ for some $y \in \mathbb{R}^d_{>0}$. Let $p(n y^{(n)})$ be the probability that the block counting process of the Kingman coalescent, evolving forward in time, hits the state $n y^{(n)}$. This corresponds to the likelihood function when a sample $n y^{(n)}$ from the population at the present time is observed. A recursion formula is known for $p$, see e.g. [4], but an explicit formula is unknown in the general case of parent dependent mutations. When the sample size is large, it is computationally too expensive to compute $p$ using the recursion formula. In this paper the asymptotic behaviour of the sampling probabilities $p(n y^{(n)})$ is studied, as $n$ tends to infinity and $y^{(n)}$ tends to some $y \in \mathbb{R}^d_{>0}$.

When mutations are parent independent, the sampling probabilities are explicitly known and it is easy to show that their asymptotic decay is polynomial of degree $d-1$, where $d$ is the number of possible allele types; see Section 2 for details. In particular, the degree of the polynomial does not depend on the mutation parameters, which hints that the same behaviour may apply in the general case when mutations are parent dependent. To verify this intuition the following strategy is adopted. First, in Section 2, the sampling probabilities are represented as expectations of multinomial draws from the stationary distribution of the Wright-Fisher diffusion. The resulting expectations are then interpreted as expectations with respect to a sequence of Dirichlet distributions.
Finally, a local limit theorem for the sequence of Dirichlet distributions is derived, in Section 3, and used, in Section 4, to prove that the asymptotic behaviour of the sampling probabilities is indeed polynomially decaying with degree $d-1$,
\[
p(n y^{(n)}) \sim \tilde p\left(\frac{y}{\|y\|}\right) \|y\|^{1-d}\, n^{1-d}, \qquad \text{as } n \to \infty,\ y^{(n)} \to y,
\]
where $\tilde p$ is the stationary density of the Wright-Fisher diffusion, $\|y\| = |y_1| + \cdots + |y_d|$, and $\sim$ denotes asymptotic equality in the sense that $a_n \sim b_n$ if $\lim_{n\to\infty} a_n/b_n = 1$.

Establishing the asymptotic decay of $p(n y^{(n)})$ enables the study of the asymptotic behaviour of the backward transition probabilities of the normalised jump chain of the block counting process of the Kingman coalescent. Let $\rho^{(n)}(v \mid y^{(n)})$ be the probability of a transition from state $y^{(n)}$ to state $y^{(n)} - \frac{1}{n}v$, where $v$ can be equal to $e_j$, $j = 1, \ldots, d$, a unit vector, which corresponds to a coalescence of type $j$, or to $e_j - e_i$, $i, j = 1, \ldots, d$, which corresponds to a mutation from type $i$ to type $j$ forward in time (type $j$ to $i$ backward in time). In Section 5 it is proved that
\[
\lim_{n\to\infty} \rho^{(n)}(e_j \mid y^{(n)}) = \frac{y_j}{\|y\|} \qquad \text{and} \qquad \lim_{n\to\infty} n\,\rho^{(n)}(e_j - e_i \mid y^{(n)}) = \frac{\theta y_i P_{ij}}{\|y\|^{2}}, \qquad i, j = 1, \ldots, d,
\]
where $\theta$ and $P_{ij}$ are mutation parameters.
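The two limits above can be illustrated numerically in the PIM case, where every quantity involved is explicit (see Sections 2 and 5). The following sketch is purely illustrative and not part of the paper; the parameter values are arbitrary, $\tilde p$ is the Dirichlet$(\theta Q)$ stationary density of the PIM case, and the backward transition probabilities are evaluated from their explicit PIM expressions (cf. Section 5).

```python
import numpy as np
from scipy.special import gammaln

theta = 1.5
Q = np.array([0.2, 0.3, 0.5])   # PIM case: P_ij = Q_j; arbitrary example values
y = np.array([0.4, 0.6, 1.0])   # limit point y, chosen so that n*y is integer below
d, n = len(y), 10_000

def log_beta(a):
    # multivariate Beta function: B(a) = prod_i Gamma(a_i) / Gamma(|a|)
    return gammaln(a).sum() - gammaln(a.sum())

def log_p(counts):
    # explicit PIM sampling probability:
    # p(n) = multinom(|n|; n) * B(n + theta*Q) / B(theta*Q)
    lm = gammaln(counts.sum() + 1) - gammaln(counts + 1).sum()
    return lm + log_beta(counts + theta * Q) - log_beta(theta * Q)

# --- decay of the sampling probabilities: n^(d-1) p(n y) -> p~(y/|y|) |y|^(1-d)
z, s = y / y.sum(), y.sum()
log_p_tilde = ((theta * Q - 1) * np.log(z)).sum() - log_beta(theta * Q)
lhs = (d - 1) * np.log(n) + log_p(n * y)   # log of n^(d-1) p(n y^(n))
rhs = log_p_tilde + (1 - d) * np.log(s)    # log of the predicted limit
print(lhs, rhs)                            # close for large n

# --- backward transition probabilities, PIM version of (9), with
#     pi[i|m] = (m_i + theta*Q_i) / (|m| + theta)
ny, S = n * y, n * s
def pi(i, m):
    return (m[i] + theta * Q[i]) / (m.sum() + theta)

j, i = 0, 1                     # coalescence of type j; mutation from i to j
e_j = np.eye(d)[j]
rho_coal = ny[j] * (ny[j] - 1) / (S * (S - 1 + theta)) / pi(j, ny - e_j)
rho_mut = theta * Q[j] * ny[j] / (S * (S - 1 + theta)) \
    * pi(i, ny - e_j) / pi(j, ny - e_j)
print(rho_coal, y[j] / s)                        # -> y_j / |y|
print(n * rho_mut, theta * Q[j] * y[i] / s**2)   # -> theta P_ij y_i / |y|^2
```

For parent dependent mutations neither $p$ nor $\pi$ has a closed form, so no analogous direct evaluation is possible; this is precisely what motivates the asymptotic analysis of the paper.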
2. Interpreting the sampling probabilities
Consider the jump chain of the block counting process of the Kingman coalescent with mutations, possibly parent dependent, $H = \{H(r)\}_{r \in \mathbb{N}} \subset \mathbb{N}^d \setminus \{0\}$, evolving forward in time. Then the probability of observing the sample $n \in \mathbb{N}^d \setminus \{0\}$ is given by
\[
p(n) = \mathbb{P}\left(H(r) = n, \text{ for some } r \in \mathbb{N}\right). \tag{1}
\]
Let $d \in \mathbb{N}_{>0}$ be the number of possible types, $\theta$ the mutation rate, and $P = (P_{ij})_{i,j=1}^d$ the mutation probability matrix, which is assumed to be irreducible. Assume, as customary, that the probability of the coalescent starting with one individual of type $i$ is $\tilde\pi_i$, where $\tilde\pi = (\tilde\pi_i)_{i=1}^d$ is the invariant measure of $P$, i.e. $p(e_i) = \tilde\pi_i$.

The aim of this section is to provide a representation of the sampling probabilities that is convenient for the study of their asymptotic behaviour. This is a key step: once the representation is identified, the outline of the proof of the asymptotic behaviour becomes intuitively clear.

To obtain the desired representation, a well-known relationship between the coalescent process and the Wright-Fisher diffusion is used. Let $X = \{X(t)\}_{t \geq 0} \subset S = \{x \in [0,1]^d : \sum_{i=1}^d x_i = 1\}$ be the Wright-Fisher diffusion with mutations, that is, the strong solution of the stochastic differential equation
\[
dX(t) = \mu(X(t))\,dt + \sigma(X(t))^{1/2}\,dW(t), \qquad t \geq 0, \tag{2}
\]
where $W = \{W(t)\}_{t \geq 0}$ is a $d$-dimensional Wiener process, the diffusion coefficient is $\sigma_{ij}(x) = x_i(\delta_{ij} - x_j)$, $i,j = 1,\ldots,d$, and the drift is $\mu_i(x) = \theta \sum_{j=1}^d x_j P_{ji} - \theta x_i$, $i = 1,\ldots,d$, where $\theta$ is the mutation rate and $P$ the mutation probability matrix of the Kingman coalescent.

The two processes, $H$ and $X$, are closely related. The sampling probability, $p(n)$, can be expressed as the expectation of a multinomial draw from the stationary distribution of the Wright-Fisher diffusion. Let $\tilde X$ be distributed according to the stationary distribution of the Wright-Fisher diffusion; then, see e.g.
[4],
\[
p(n) = \binom{\|n\|}{n} \mathbb{E}\left[\prod_{i=1}^d \tilde X_i^{n_i}\right]. \tag{3}
\]
Obviously, the relation above only holds if the Wright-Fisher diffusion admits a stationary distribution. The existence of a stationary distribution is related to the structure of the mutation mechanism, more precisely, to the existence of an invariant measure for the mutation probability matrix $P$. When mutations are parent independent, i.e. $P_{ij} = Q_j > 0$, $i,j = 1,\ldots,d$, the stationary distribution not only exists but also has an explicitly known density, a Dirichlet density with parameters $\theta Q = (\theta Q_1, \ldots, \theta Q_d)$, due to [11].

Unfortunately, the PIM case is the only case in which the stationary distribution is explicitly known. Furthermore, the Wright-Fisher diffusion has a degenerate elliptic generator, since the diffusion matrix is zero on the boundary of the domain. Thus the classical theory for the study of stationarity and for the study of solutions to the Fokker-Planck equation does not apply, and a specific analysis is needed; see [8] for the analysis of the stationary distribution and density of the Wright-Fisher diffusion. In this paper, the unknown stationary density is used to analyse the asymptotic behaviour of the sampling probabilities in the non-PIM case. In order for the stationary distribution to exist, it is assumed that the mutation transition matrix $P$ is irreducible. In this case, as shown in [8], the invariant measure of $P$ uniquely exists, the stationary distribution of the Wright-Fisher diffusion is absolutely continuous with respect to the Lebesgue measure, and its probability density function, $\tilde p$, defined on $\Delta = \{x \in [0,1]^{d-1} : \sum_{i=1}^{d-1} x_i \leq 1\}$, is smooth on $\mathring\Delta = \{x \in \Delta : x_i > 0,\ i = 1,\ldots,d\}$, with the convention that $x_d$ stands for $1 - \sum_{i=1}^{d-1} x_i$.

Before focusing on the general case, the asymptotic behaviour of the sampling probabilities in the PIM case is analysed, in order to get an insight into what kind of asymptotic decay is to be expected in the general case.
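In the PIM case, where the stationary law in (3) is the Dirichlet$(\theta Q)$ distribution, the representation can be checked directly by simulation. The following sketch is an illustration only (the parameter values are arbitrary); it compares a Monte Carlo estimate of the right-hand side of (3) with the explicit Beta-function expression derived in the next display.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)

theta = 1.5
Q = np.array([0.2, 0.3, 0.5])    # PIM mutation probabilities, arbitrary example
n_vec = np.array([3, 2, 4])      # sample configuration n, arbitrary example

def log_beta(a):
    # multivariate Beta function: B(a) = prod_i Gamma(a_i) / Gamma(|a|)
    return gammaln(a).sum() - gammaln(a.sum())

# explicit PIM sampling probability:
# p(n) = multinom(|n|; n) * B(n + theta*Q) / B(theta*Q)
log_multinom = gammaln(n_vec.sum() + 1) - gammaln(n_vec + 1).sum()
p_exact = np.exp(log_multinom + log_beta(n_vec + theta * Q)
                 - log_beta(theta * Q))

# Monte Carlo version of (3): p(n) = multinom(|n|; n) E[prod_i X_i^{n_i}],
# with X ~ Dirichlet(theta*Q), the PIM stationary distribution
X = rng.dirichlet(theta * Q, size=200_000)
p_mc = np.exp(log_multinom) * np.mean(np.prod(X ** n_vec, axis=1))

print(p_exact, p_mc)   # the two values agree up to Monte Carlo error
```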
Using (3) and the Dirichlet stationary density for the PIM case, an explicit expression for the sampling probabilities is given by
\[
p(n) = \binom{\|n\|}{n} \frac{B(n + \theta Q)}{B(\theta Q)} = \frac{1}{B(\theta Q)} \frac{\Gamma(\|n\| + 1)}{\Gamma(\|n\| + \theta)} \prod_{i=1}^d \frac{\Gamma(n_i + \theta Q_i)}{\Gamma(n_i + 1)},
\]
where $B$ is the Beta function and $\Gamma$ is the Gamma function. Applying Stirling's formula to the Gamma functions yields
\[
p(n y^{(n)}) \sim \frac{1}{B(\theta Q)} \frac{\left(n\|y^{(n)}\| + 1\right)^{n\|y^{(n)}\| + 1/2} e^{-(n\|y^{(n)}\| + 1)}}{\left(n\|y^{(n)}\| + \theta\right)^{n\|y^{(n)}\| + \theta - 1/2} e^{-(n\|y^{(n)}\| + \theta)}} \cdot \prod_{i=1}^d \frac{\left(n y_i^{(n)} + \theta Q_i\right)^{n y_i^{(n)} + \theta Q_i - 1/2} e^{-(n y_i^{(n)} + \theta Q_i)}}{\left(n y_i^{(n)} + 1\right)^{n y_i^{(n)} + 1/2} e^{-(n y_i^{(n)} + 1)}}
\]
\[
\sim \frac{1}{B(\theta Q)} \left(1 - \frac{\theta - 1}{n\|y^{(n)}\| + \theta}\right)^{n\|y^{(n)}\| + 1/2} \left(n\|y^{(n)}\| + \theta\right)^{1 - \theta} e^{\theta - 1} \cdot \prod_{i=1}^d \left(1 + \frac{\theta Q_i - 1}{n y_i^{(n)} + 1}\right)^{n y_i^{(n)} + 1/2} \left(n y_i^{(n)} + \theta Q_i\right)^{\theta Q_i - 1} e^{1 - \theta Q_i}
\]
\[
\sim n^{1-d}\, \frac{\|y\|^{1-d}}{B(\theta Q)} \prod_{i=1}^d \left(\frac{y_i}{\|y\|}\right)^{\theta Q_i - 1}.
\]
Note that the sampling probabilities decay polynomially with degree $d-1$. The analysis of $p(n y^{(n)})$ in the general case is more involved, because of the lack of an explicit form for the stationary density of the Wright-Fisher diffusion. It is based on the following key observation. Denote the expectation in (3) by $k(n)$, i.e., for $n \in \mathbb{N}^d \setminus \{0\}$,
\[
k(n) = \mathbb{E}\left[\prod_{i=1}^d \tilde X_i^{n_i}\right] = \int_\Delta \prod_{i=1}^d x_i^{n_i}\, \tilde p(x)\, dx, \tag{4}
\]
so that, for $y^{(n)} \in \frac{1}{n}\mathbb{N}^d \setminus \{0\}$,
\[
p(n y^{(n)}) = \binom{n\|y^{(n)}\|}{n y^{(n)}}\, k(n y^{(n)}).
\]
The function $k$ admits an interpretation as an expectation with respect to a Dirichlet distribution, provided it is divided by the appropriate normalising constant.
That is,
\[
\frac{k(n y^{(n)})}{B(n y^{(n)} + 1)} = \int_\Delta f_{D^{(n)}}(x)\, \tilde p(x)\, dx = \mathbb{E}\left[\tilde p\left(D^{(n)}\right)\right], \tag{5}
\]
where $D^{(n)}$ is Dirichlet distributed with concentration parameters equal to $n y^{(n)} + 1$, $f_{D^{(n)}}$ is its probability density function, and $1$ is the vector of ones in $\mathbb{R}^d$.

A clarification about notation and state spaces is needed. The space $S$ is used as a state space for the Wright-Fisher diffusion, $X$, and for the Dirichlet distributed random vectors, $D^{(n)}$, while their densities are integrated over $\Delta$. When the density functions, which are defined on $\Delta$, are evaluated at a point in $S$, it is implicitly understood that only the first $d-1$ components are considered; i.e. for $x \in \Delta$, $x_d$ stands for $1 - \sum_{i=1}^{d-1} x_i$.

Interpreting the function $k$ as in (5) turns out to be an effective tool for analysing the asymptotic behaviour of the sampling probabilities. In fact, it can be proved that the expectation in (5) converges to the constant $\tilde p\left(\frac{y}{\|y\|}\right)$, while the remaining factors, $\binom{n\|y^{(n)}\|}{n y^{(n)}} B(n y^{(n)} + 1)$, give rise to the polynomial decay of $p(n y^{(n)})$. In particular, the proof can be divided into two parts. First, the asymptotic behaviour of the Dirichlet random vectors $D^{(n)}$ is studied, using classical tools, see Section 3. Secondly, the function $\tilde p$ is applied to $D^{(n)}$ and the asymptotic behaviour of the expectations (5) is studied in Section 4, the main difficulty being that the stationary density is unknown and possibly unbounded near the axes.
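In the PIM case both sides of (5) are computable, which makes the interpretation concrete. The sketch below is illustrative only (arbitrary parameters): it evaluates $k(n y^{(n)})/B(n y^{(n)} + 1)$ exactly through Beta functions, estimates $\mathbb{E}[\tilde p(D^{(n)})]$ by Monte Carlo with $D^{(n)} \sim \mathrm{Dirichlet}(n y^{(n)} + 1)$, and compares both with the constant $\tilde p(y/\|y\|)$ towards which the expectation converges.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)

theta, Q = 1.5, np.array([0.2, 0.3, 0.5])  # PIM parameters, arbitrary example
y = np.array([0.4, 0.6, 1.0])
n = 200
ny = n * y                                  # sample configuration n*y^(n)

def log_beta(a):
    return gammaln(a).sum() - gammaln(a.sum())

def p_tilde(x):
    # stationary density in the PIM case: Dirichlet(theta*Q)
    return np.exp(((theta * Q - 1) * np.log(x)).sum(axis=-1)
                  - log_beta(theta * Q))

# exact value of k(n y)/B(n y + 1) = B(n y + theta*Q) / (B(theta*Q) B(n y + 1)),
# since k(n) = E[prod X^n_i] under Dirichlet(theta*Q)
exact = np.exp(log_beta(ny + theta * Q) - log_beta(theta * Q)
               - log_beta(ny + 1))

# Monte Carlo version of (5): E[p_tilde(D^(n))], D^(n) ~ Dirichlet(n y^(n) + 1)
D = rng.dirichlet(ny + 1, size=100_000)
mc = p_tilde(D).mean()

limit = p_tilde(y / y.sum())                # the limit p_tilde(y/|y|)
print(exact, mc, limit)
```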
3. Central and local limit theorems for a sequence of Dirichlet random vectors
This section is devoted to the study of the asymptotic behaviour of a sequence of Dirichlet distributed random vectors. Although this is a classical topic, central and local limit theorems are derived to make the paper self contained.

Let $D^{(n)} = (D_1^{(n)}, \ldots, D_d^{(n)}) \in S$ be a Dirichlet distributed random vector with concentration parameters $\alpha^{(n)} \in \mathbb{R}^d_{>0}$ such that
\[
\lim_{n\to\infty} \frac{\alpha^{(n)}}{n} = \alpha \in \mathbb{R}^d_{>0}. \tag{6}
\]
The sequence of Dirichlet vectors in the previous section corresponds to $\alpha^{(n)} = n y^{(n)} + 1$. A central limit theorem for the sequence $D^{(n)}$ is obtained using the well-known, see e.g. [3], relationship between Dirichlet and Gamma distributions,
\[
D^{(n)} \stackrel{d}{=} \frac{G^{(n)}}{\|G^{(n)}\|}, \tag{7}
\]
where $G^{(n)} = (G_1^{(n)}, \ldots, G_d^{(n)})$ is a vector of independent Gamma random variables with shape parameters $\alpha_i^{(n)}$, $i = 1,\ldots,d$, and rate parameter $\beta \in \mathbb{R}_{>0}$. Note that $\beta$
is irrelevant in the transformation from $G^{(n)}$ to $D^{(n)}$. Furthermore, since the Gamma distribution is infinitely divisible, it is possible to write each of the Gamma random variables as a sum of independent Gamma random variables, and thus for each variable $G_i^{(n)}$ the following central limit theorem holds,
\[
\sqrt{n}\, \frac{\beta}{\sqrt{\alpha_i^{(n)}/n}} \left(\frac{G_i^{(n)}}{n} - \frac{\alpha_i^{(n)}}{n\beta}\right) \xrightarrow[n\to\infty]{d} N(0,1), \qquad i = 1,\ldots,d.
\]
Consequently, by (6) and independence of the components of the random vectors $G^{(n)}$, the following central limit theorem for $G^{(n)}$ holds,
\[
\sqrt{n}\left(\frac{G^{(n)}}{n} - \frac{\alpha}{\beta}\right) \xrightarrow[n\to\infty]{d} N_d\left(0, \frac{1}{\beta^2}\,\operatorname{diag}(\alpha)\right),
\]
where $\operatorname{diag}(\alpha)$ is the diagonal matrix with $\alpha$ on the diagonal. Since $D^{(n)}$ can be written as a function of $G^{(n)}$, as in (7), applying the multivariate delta method yields
\[
\sqrt{n}\left(D^{(n)} - \frac{\alpha}{\|\alpha\|}\right) \xrightarrow[n\to\infty]{d} N_d(0, \Sigma(\alpha)), \qquad \text{with } \Sigma(\alpha) = J\left(\frac{\alpha}{\beta}\right) \frac{1}{\beta^2}\,\operatorname{diag}(\alpha)\, J\left(\frac{\alpha}{\beta}\right)^{T},
\]
where $J$ is the Jacobian matrix associated with the transformation in (7). Calculating the Jacobian and multiplying the matrices, which is omitted, yields
\[
\Sigma_{ij}(\alpha) = \frac{\alpha_i}{\|\alpha\|^3}\left(\delta_{ij}\|\alpha\| - \alpha_j\right), \qquad i,j = 1,\ldots,d.
\]
As expected, the covariance matrix above does not depend on the auxiliary rate parameter $\beta$.

Note that, if $\|\alpha\| = 1$, then $\Sigma(\alpha) = \sigma(\alpha)$, where $\sigma$ is the diffusion matrix in (2). Consequently, the Wright-Fisher diffusion matrix can be interpreted as the covariance matrix of the Gaussian limit of a sequence of Dirichlet random vectors.

The Gaussian limiting vector, being the limit of a sequence of Dirichlet random vectors, has a degenerate distribution, as its last component can be expressed in terms of the first $d-1$ components. Therefore, vectors are restricted to $\mathbb{R}^{d-1}$, by excluding the last component of each vector, in the remaining part of this section.

Let $\phi$ be the pdf of the $(d-1)$-dimensional centred Gaussian distribution with covariance matrix $\Sigma_{d-1}(\alpha)$, the restriction of $\Sigma(\alpha)$ to the first $d-1$ components, and let
\[
f_{D^{(n)}}(x) = \frac{1}{B(\alpha^{(n)})} \prod_{i=1}^d x_i^{\alpha_i^{(n)} - 1}, \qquad x \in \Delta,
\]
be the pdf of (the first $d-1$ components of) $D^{(n)}$, where $x_d$ stands for $1 - \sum_{i=1}^{d-1} x_i$. Then the pdf of $\sqrt{n}\left(D^{(n)} - \frac{\alpha^{(n)}}{\|\alpha^{(n)}\|}\right)$ is given by
\[
\phi_n(u) = n^{-(d-1)/2}\, f_{D^{(n)}}\left(\frac{u}{\sqrt{n}} + \frac{\alpha^{(n)}}{\|\alpha^{(n)}\|}\right), \qquad \text{for } u \in \sqrt{n}\left(\Delta - \frac{\alpha^{(n)}}{\|\alpha^{(n)}\|}\right),
\]
and equal to $0$ otherwise. In general, convergence in distribution does not imply convergence of probability density functions; however, under the conditions of the converse to Scheffé's theorem, see [2], it does. More precisely, if the sequence $\phi_n$ is bounded and uniformly equicontinuous, then $\phi_n \to \phi$, as $n \to \infty$, uniformly on $\mathbb{R}^{d-1}$. In the remaining part of this section, the conditions for the convergence of the densities are verified; again, it is fundamental that the parameters of the Dirichlet vectors grow to infinity linearly, as assumed in (6).

In order to show boundedness of $\phi_n$, first notice that, for fixed $n$, the maximum of $\phi_n$ is reached at
\[
\sqrt{n}\left(\frac{\alpha^{(n)} - 1}{\|\alpha^{(n)}\| - d} - \frac{\alpha^{(n)}}{\|\alpha^{(n)}\|}\right),
\]
and thus
\[
s_n = \sup_{u \in \mathbb{R}^{d-1}} \phi_n(u) = \frac{n^{-(d-1)/2}}{B(\alpha^{(n)})} \prod_{i=1}^d \left(\frac{\alpha_i^{(n)} - 1}{\|\alpha^{(n)}\| - d}\right)^{\alpha_i^{(n)} - 1}.
\]
Using Stirling's formula for the Beta function,
\[
s_n \sim (2\pi)^{-(d-1)/2} \left(\prod_{i=1}^d \frac{\alpha_i}{\|\alpha\|}\right)^{-1/2} \left(\frac{\|\alpha^{(n)}\|}{n}\right)^{(d-1)/2}.
\]
From the expression above, it is clear that if $\|\alpha^{(n)}\|$ grows faster than linearly, the sequence is not bounded, and if it grows more slowly, it converges to zero. By assumption (6), $\|\alpha^{(n)}\|$ grows linearly and therefore the supremum $s_n$ converges to
\[
(2\pi)^{-(d-1)/2} \left(\prod_{i=1}^d \frac{\alpha_i}{\|\alpha\|}\, \|\alpha\|^{1-d}\right)^{-1/2}, \qquad n \to \infty,
\]
and thus the sequence of density functions $\phi_n$ is bounded. Note that, from the expression in the last display, the determinant of $\Sigma_{d-1}(\alpha)$ can be identified,
\[
\det(\Sigma_{d-1}(\alpha)) = \prod_{i=1}^d \frac{\alpha_i}{\|\alpha\|}\, \|\alpha\|^{1-d}.
\]
The reasoning leading up to the expression for $\det(\Sigma_{d-1}(\alpha))$ provides an alternative way to compute the determinant of the diffusion matrix of the Wright-Fisher diffusion, $\det(\sigma_{d-1}(x)) = \prod_{i=1}^d x_i$.

Following the same type of argument as for the sequence of density functions, it is straightforward to show that each sequence of their first order partial derivatives is also bounded. Furthermore, the density functions $\phi_n$ are smooth on a compact set. Therefore, the sequence of densities is uniformly Lipschitz continuous, which verifies the uniform equicontinuity condition, proving the local limit result as stated in the following proposition.

Proposition 3.1.
Let $D^{(n)} \sim \mathrm{Dirichlet}(\alpha^{(n)})$, $\alpha^{(n)} \in \mathbb{R}^d_{>0}$, be such that $\lim_{n\to\infty} \alpha^{(n)}/n = \alpha \in \mathbb{R}^d_{>0}$. Then the following central limit theorem holds,
\[
\sqrt{n}\left(D^{(n)} - \frac{\alpha}{\|\alpha\|}\right) \xrightarrow[n\to\infty]{d} N_d(0, \Sigma(\alpha)), \qquad \text{with } \Sigma_{ij}(\alpha) = \frac{\alpha_i}{\|\alpha\|^3}\left(\delta_{ij}\|\alpha\| - \alpha_j\right), \quad i,j = 1,\ldots,d.
\]
Furthermore, a local limit theorem for the corresponding probability density functions, $\phi_n$ and $\phi$, holds,
\[
\lim_{n\to\infty} \sup_{u \in \mathbb{R}^{d-1}} |\phi_n(u) - \phi(u)| = 0.
\]
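Proposition 3.1 can be illustrated by simulation: sampling $D^{(n)}$ through the Gamma representation (7) and comparing the empirical covariance of $\sqrt{n}(D^{(n)} - \alpha/\|\alpha\|)$ with $\Sigma(\alpha)$, together with the determinant identity above. The sketch below is illustrative only; the values of $\alpha$, $\beta$ and $n$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

alpha = np.array([0.5, 1.0, 1.5])   # limit parameters alpha, arbitrary example
n, beta = 400, 2.0                  # beta is the auxiliary Gamma rate; it cancels
N = 200_000                         # number of simulated vectors
d = len(alpha)

# Gamma representation (7): D^(n) = G^(n)/|G^(n)|, with shapes alpha^(n) = n*alpha
G = rng.gamma(shape=n * alpha, scale=1.0 / beta, size=(N, d))
D = G / G.sum(axis=1, keepdims=True)

U = np.sqrt(n) * (D - alpha / alpha.sum())
emp_cov = np.cov(U, rowvar=False)

# Sigma_ij(alpha) = alpha_i (delta_ij |alpha| - alpha_j) / |alpha|^3
a, s = alpha, alpha.sum()
Sigma = (a[:, None] * (np.eye(d) * s - a[None, :])) / s**3

print(np.abs(emp_cov - Sigma).max())   # small: empirical covariance matches Sigma

# determinant identity: det(Sigma_{d-1}) = prod_i (alpha_i/|alpha|) * |alpha|^(1-d)
det_formula = np.prod(a / s) * s ** (1 - d)
det_direct = np.linalg.det(Sigma[:d - 1, :d - 1])
print(det_direct, det_formula)
```

The block also confirms, numerically, that the auxiliary rate $\beta$ plays no role after the normalisation in (7).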
4. Asymptotic behaviour of the sampling probabilities
The goal of this section is to prove an asymptotic result for the sampling probabilities (1). In Section 2, the sampling probabilities were written in terms of the function $k$, with $\frac{k(n y^{(n)})}{B(n y^{(n)} + 1)} = \mathbb{E}\left[\tilde p(D^{(n)})\right]$. The asymptotic behaviour of this expectation is analysed in the following theorem, using the results for the sequence of Dirichlet random vectors of the previous section.
Theorem 4.1.
Let $k$ be defined as in (4). Let $\tilde p$ be the stationary density of the Wright-Fisher diffusion (2), assuming the mutation probability matrix, $P$, is irreducible. Let $y^{(n)} \in \frac{1}{n}\mathbb{N}^d \setminus \{0\}$ be such that $y^{(n)} \to y \in \mathbb{R}^d_{>0}$, as $n \to \infty$. Then
\[
\lim_{n\to\infty} \frac{k\left(n y^{(n)}\right)}{B\left(n y^{(n)} + 1\right)} = \tilde p\left(\frac{y}{\|y\|}\right).
\]
Proof.
The stationary density $\tilde p$ is smooth on $\mathring\Delta = \{x \in \Delta : x_i > 0,\ i = 1,\ldots,d\}$, so it is bounded on any compact set contained in $\mathring\Delta$. It could, however, explode on the axes. In order to deal with this problem, the domain is divided in two parts. For $\varepsilon > 0$, define $\Delta_\varepsilon = \{x \in \mathring\Delta : x_i \geq \varepsilon,\ i = 1,\ldots,d\}$. Since $\frac{y}{\|y\|} \in \mathring\Delta$, it follows that $\frac{y}{\|y\|} \in \Delta_\varepsilon$, for all $0 < \varepsilon \leq \varepsilon_y$, with $\varepsilon_y = \frac{1}{\|y\|} \min_{i=1,\ldots,d} y_i$. Fixing $0 < \varepsilon \leq \varepsilon_y$, consider
\[
\frac{k\left(n y^{(n)}\right)}{B\left(n y^{(n)} + 1\right)} = \int_\Delta f_{D^{(n)}}(x)\,\tilde p(x)\,dx = \int_{\Delta_\varepsilon} f_{D^{(n)}}(x)\,\tilde p(x)\,dx + \int_{\Delta \setminus \Delta_\varepsilon} f_{D^{(n)}}(x)\,\tilde p(x)\,dx. \tag{8}
\]
To show convergence of the first term in (8), a change of variables yields
\[
\int_{\Delta_\varepsilon} f_{D^{(n)}}(x)\,\tilde p(x)\,dx = \int \mathbb{1}_{\sqrt{n}\left(\Delta_\varepsilon - \frac{y^{(n)}}{\|y^{(n)}\|}\right)}(u)\, \phi_n(u)\, \tilde p\left(\frac{u}{\sqrt{n}} + \frac{y^{(n)}}{\|y^{(n)}\|}\right) du.
\]
By Proposition 3.1 and continuity of $\tilde p$ on $\mathring\Delta$, the integrand above converges pointwise to $\phi(u)\,\tilde p\left(\frac{y}{\|y\|}\right)$. Furthermore, since $\tilde p$ is bounded on $\Delta_\varepsilon$ by some $c_\varepsilon$, the sequence is dominated by the sequence $\phi_n(u)\, c_\varepsilon$, the integral of which is equal to $c_\varepsilon$, for all $n$. Therefore, by the generalized dominated convergence theorem,
\[
\int_{\Delta_\varepsilon} f_{D^{(n)}}(x)\,\tilde p(x)\,dx \xrightarrow[n\to\infty]{} \int_{\mathbb{R}^{d-1}} \phi(u)\,\tilde p\left(\frac{y}{\|y\|}\right) du = \tilde p\left(\frac{y}{\|y\|}\right).
\]
It remains to show that the second term in (8) converges to zero. First note that, if $x \in \Delta \setminus \Delta_\varepsilon$, then $x_j < \varepsilon$ for some $j$, and, since $x_i \leq 1$, $i = 1,\ldots,d$,
\[
\prod_{i=1}^d x_i^{n y_i^{(n)}} < \varepsilon^{n y_j^{(n)}} \leq \varepsilon^{n y_{\min}^{(n)}},
\]
where $y_j^{(n)} \geq y_{\min}^{(n)} = \min_{i=1,\ldots,d} y_i^{(n)}$.
Using the inequality above, the fact that the integral of $\tilde p$ is bounded by $1$, and Stirling's formula yields
\[
\int_{\Delta \setminus \Delta_\varepsilon} f_{D^{(n)}}(x)\,\tilde p(x)\,dx \leq \frac{\varepsilon^{n y_{\min}^{(n)}}}{B\left(n y^{(n)} + 1\right)} \int_{\Delta \setminus \Delta_\varepsilon} \tilde p(x)\,dx \leq \varepsilon^{n y_{\min}^{(n)}} \frac{\Gamma\left(n\|y^{(n)}\| + d\right)}{\prod_{i=1}^d \Gamma\left(n y_i^{(n)} + 1\right)}
\]
\[
\sim (2\pi)^{-(d-1)/2}\, \varepsilon^{n y_{\min}^{(n)}} \frac{\left(n\|y^{(n)}\| + d\right)^{n\|y^{(n)}\| + d - 1/2}}{\prod_{i=1}^d \left(n y_i^{(n)} + 1\right)^{n y_i^{(n)} + 1/2}} \sim \left[\varepsilon^{y_{\min}} \prod_{i=1}^d \left(\frac{\|y\|}{y_i}\right)^{y_i}\right]^n n^{(d-1)/2} \left(\frac{\|y\|}{2\pi}\right)^{(d-1)/2} \left(\prod_{i=1}^d \frac{\|y\|}{y_i}\right)^{1/2},
\]
where $y_{\min} = \min_{i=1,\ldots,d} y_i > 0$. The expression in the last display converges to zero by choosing $\varepsilon < \prod_{i=1}^d \left(\frac{y_i}{\|y\|}\right)^{y_i/y_{\min}}$, since then the factor raised to the power $n$ is strictly smaller than one and dominates the polynomial factor; thus the second integral in (8) converges to zero, as $n \to \infty$, completing the proof.

By Theorem 4.1, it is straightforward to show that the asymptotic decay of the sampling probabilities is indeed polynomial, as expected.

Theorem 4.2.
Let $p$ be the sampling probability (1) of the block counting process of the Kingman coalescent. Let $\tilde p$ be the stationary density of the Wright-Fisher diffusion (2), assuming the mutation probability matrix $P$ is irreducible. Let $y^{(n)} \in \frac{1}{n}\mathbb{N}^d \setminus \{0\}$, such that $y^{(n)} \to y \in \mathbb{R}^d_{>0}$, as $n \to \infty$. Then
\[
\lim_{n\to\infty} n^{d-1}\, p\left(n y^{(n)}\right) = \|y\|^{1-d}\, \tilde p\left(\frac{y}{\|y\|}\right).
\]
Proof.
Since $p(n y^{(n)}) = \binom{n\|y^{(n)}\|}{n y^{(n)}}\, k(n y^{(n)})$, rewrite
\[
n^{d-1}\, p\left(n y^{(n)}\right) = n^{d-1} \binom{n\|y^{(n)}\|}{n y^{(n)}} B\left(n y^{(n)} + 1\right) \frac{k\left(n y^{(n)}\right)}{B\left(n y^{(n)} + 1\right)}.
\]
Then, note that
\[
n^{d-1} \binom{n\|y^{(n)}\|}{n y^{(n)}} B\left(n y^{(n)} + 1\right) = n^{d-1} \frac{\Gamma\left(n\|y^{(n)}\| + 1\right)}{\Gamma\left(n\|y^{(n)}\| + d\right)} \to \|y\|^{1-d},
\]
as $n \to \infty$, whereas, by Theorem 4.1, $\frac{k(n y^{(n)})}{B(n y^{(n)} + 1)}$ converges to $\tilde p\left(\frac{y}{\|y\|}\right)$. This completes the proof.

5. Asymptotic behaviour of the transition probabilities

Theorem 4.1 does not only imply Theorem 4.2, but also enables the direct analysis of the asymptotic behaviour of the backward transition probabilities of the normalised jump chain of the block counting process of the Kingman coalescent. Let $H^{(n)}$, $n \in \mathbb{N}$, be independent copies of $H$, and $Y^{(n)} = \frac{1}{n} H^{(n)} \subset \frac{1}{n}\mathbb{N}^d \setminus \{0\}$. The focus in this section is on the asymptotic behaviour of the transition probabilities
\[
\rho^{(n)}\left(v \mid y^{(n)}\right) = \mathbb{P}\left(H^{(n)}(1) = n y^{(n)} - v \,\middle|\, H^{(n)}(0) = n y^{(n)}\right) = \mathbb{P}\left(Y^{(n)}(1) = y^{(n)} - \tfrac{1}{n} v \,\middle|\, Y^{(n)}(0) = y^{(n)}\right),
\]
for $v = e_j,\ e_j - e_i$, $i,j = 1,\ldots,d$, as $n \to \infty$ and $y^{(n)} \to y \in \mathbb{R}^d_{>0}$. As for the sampling probabilities, the difficulty here is that, when mutations are parent dependent, an explicit expression for the backward transition probabilities is not available. Nevertheless, they can be written in terms of the probabilities $\pi$, defined below, by applying Bayes' formula to the known forward transition probabilities and using a symmetry condition, see [10, 5] for more details,
\[
\rho^{(n)}\left(v \mid y^{(n)}\right) =
\begin{cases}
\dfrac{y_j^{(n)}\left(n y_j^{(n)} - 1\right)}{\|y^{(n)}\|\left(n\|y^{(n)}\| - 1 + \theta\right)}\, \dfrac{1}{\pi\left[j \mid n y^{(n)} - e_j\right]}, & \text{if } v = e_j,\ j = 1,\ldots,d, \\[2ex]
\dfrac{\theta P_{ij}\, y_j^{(n)}}{\|y^{(n)}\|\left(n\|y^{(n)}\| - 1 + \theta\right)}\, \dfrac{\pi\left[i \mid n y^{(n)} - e_j\right]}{\pi\left[j \mid n y^{(n)} - e_j\right]}, & \text{if } v = e_j - e_i,\ i,j = 1,\ldots,d, \\[2ex]
0, & \text{otherwise.}
\end{cases} \tag{9}
\]
The probability $\pi[j \mid n]$, $n \in \mathbb{N}^d \setminus \{0\}$, is the probability of sampling an individual of type $j$ after sampling $\|n\|$ individuals with types given by $n$. Equivalently, it can be written, in terms of the sampling probabilities, $p$, as
\[
\pi[i \mid n] = \frac{n_i + 1}{\|n\| + 1}\, \frac{p(n + e_i)}{p(n)}. \tag{10}
\]
In the PIM case, $P_{ij} = Q_j$, $i,j = 1,\ldots,d$, the sampling probabilities are known, and $\pi[i \mid n] = \frac{n_i + \theta Q_i}{\|n\| + \theta}$, see e.g. [10]. In this case, thus, $\pi[i \mid n y^{(n)}]$ converges to $\frac{y_i}{\|y\|}$. It turns out that in the general case the limit of the probabilities $\pi$ is the same. To prove this, $\pi$ is written in terms of the function $k$,
\[
\pi[i \mid n] = \frac{k(n + e_i)}{k(n)}.
\]
The results of the previous section, combined with the expression above, make it straightforward to study the asymptotic behaviour of $\pi[i \mid n y^{(n)}]$ and, consequently, of the backward transition probabilities.

Proposition 5.1. Let $\pi$ be defined as in (10). Assume the mutation probability matrix $P$ is irreducible. Let $y^{(n)} \in \frac{1}{n}\mathbb{N}^d \setminus \{0\}$, such that $y^{(n)} \to y \in \mathbb{R}^d_{>0}$, as $n \to \infty$. Then, for $i = 1,\ldots,d$,
\[
\lim_{n\to\infty} \pi\left[i \mid n y^{(n)}\right] = \frac{y_i}{\|y\|}.
\]
Proof.
Rewrite
\[
\pi\left[i \mid n y^{(n)}\right] = \frac{k\left(n y^{(n)} + e_i\right)}{B\left(n y^{(n)} + e_i + 1\right)}\, \frac{B\left(n y^{(n)} + 1\right)}{k\left(n y^{(n)}\right)}\, \frac{B\left(n y^{(n)} + e_i + 1\right)}{B\left(n y^{(n)} + 1\right)}.
\]
By Theorem 4.1, $\frac{k(n y^{(n)} + e_i)}{B(n y^{(n)} + e_i + 1)}$ and $\frac{k(n y^{(n)})}{B(n y^{(n)} + 1)}$ both converge to $\tilde p\left(\frac{y}{\|y\|}\right)$, as $n \to \infty$. Calculating,
\[
\frac{B\left(n y^{(n)} + e_i + 1\right)}{B\left(n y^{(n)} + 1\right)} = \frac{\Gamma\left(n y_i^{(n)} + 2\right)}{\Gamma\left(n y_i^{(n)} + 1\right)}\, \frac{\Gamma\left(n\|y^{(n)}\| + d\right)}{\Gamma\left(n\|y^{(n)}\| + 1 + d\right)} = \frac{n y_i^{(n)} + 1}{n\|y^{(n)}\| + d},
\]
and letting $n \to \infty$ concludes the proof.

Knowing the asymptotic behaviour of $\pi$ directly solves the problem of analysing the asymptotic behaviour of the backward transition probabilities.

Corollary 5.2.
Let $\rho^{(n)}$ be the transition probabilities in (9). Under the assumptions of Proposition 5.1, for $i,j = 1,\ldots,d$,
\[
\lim_{n\to\infty} \rho^{(n)}\left(e_j \mid y^{(n)}\right) = \frac{y_j}{\|y\|} \qquad \text{and} \qquad \lim_{n\to\infty} n\,\rho^{(n)}\left(e_j - e_i \mid y^{(n)}\right) = \frac{\theta P_{ij}\, y_i}{\|y\|^{2}}.
\]

References

[1] Bhaskar, A., Clark, A. G. and Song, Y. S. (2014). Distortion of genealogical properties when the sample is very large. Proceedings of the National Academy of Sciences of the United States of America.
[2] Boos, D. D. (1985). A Converse to Scheffé's Theorem. The Annals of Statistics.
[3] Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag, New York.
[4] Griffiths, R. C. and Tavaré, S. (1994). Simulating Probability Distributions in the Coalescent. Theoretical Population Biology.
[5] De Iorio, M. and Griffiths, R. C. (2004). Importance sampling on coalescent histories. I. Advances in Applied Probability.
[6] Kelleher, J., Etheridge, A. M. and McVean, G. (2016). Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLOS Computational Biology.
[7] Kingman, J. F. C. (1982). The coalescent. Stochastic Processes and their Applications 13, 235-248.
[8] Shiga, T. (1981). Diffusion Processes in Population Genetics. J. Math. Kyoto Univ. (JMKYAZ).
[9] Stephens, M. (2007). Inference under the coalescent. In Handbook of Statistical Genetics (D. Balding, M. Bishop and C. Cannings, eds.) 26, 878-908. Wiley, Chichester, UK.
[10] Stephens, M. and Donnelly, P. (2000). Inference in molecular population genetics. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[11] Wright, S. (1949). Adaption and selection. In