Maximum likelihood estimates under k -allele models with selection can be numerically unstable
aa r X i v : . [ s t a t . A P ] O c t The Annals of Applied Statistics (cid:13)
Institute of Mathematical Statistics, 2009
MAXIMUM LIKELIHOOD ESTIMATES UNDER K-ALLELEMODELS WITH SELECTION CAN BE NUMERICALLY UNSTABLE
By Erkan Ozge Buzbas and Paul Joyce University of Idaho
The stationary distribution of allele frequencies under a varietyof Wright–Fisher k -allele models with selection and parent indepen-dent mutation is well studied. However, the statistical properties ofmaximum likelihood estimates of parameters under these models arenot well understood. Under each of these models there is a point indata space which carries the strongest possible signal for selection,yet, at this point, the likelihood is unbounded. This result remainsvalid even if all of the mutation parameters are assumed to be known.Therefore, standard simulation approaches used to approximate thesampling distribution of the maximum likelihood estimate producenumerically unstable results in the presence of substantial selection.We describe the Bayesian alternative where the posterior distributiontends to produce more accurate and reliable interval estimates for theselection intensity at a locus.
1. Introduction.
We begin by introducing a certain amount of termi-nology from population genetics. Within each living cell are a certain fixednumber of chromosomes , threadlike objects that govern the inheritable char-acteristics of an organism. At certain positions, or loci , on the chromosomes,are genes , the fundamental units of heredity. At each locus there are severalalternative types of genes or alleles . A diploid organism has chromosomesthat occur in homologous pairs. The unordered pair of genes situated at thesame locus of the homologous pair is called a genotype . Thus, if there are k alleles, A , A , . . . , A k at a given locus, then there are k ( k + 1) / A i A j ), 1 ≤ i ≤ j ≤ k . The fitness of a genotype ( A i A j ) is deter-mined by the reproductive success of individuals carrying that genotype. Received July 2008; revised January 2009. Supported both by NSF Grant DEB-0515738 and NIH Grant P20 RR016454 from theINBRE Program of the National Center for Research Resources. Supported in part by NSF Grant DEB-0515738.
Key words and phrases.
Selective overdominance, heterozygote advantage, multiple al-lele models, maximum likelihood, posterior analysis, instability.
This is an electronic reprint of the original article published by theInstitute of Mathematical Statistics in
The Annals of Applied Statistics ,2009, Vol. 3, No. 3, 1147–1162. This reprint differs from the original in paginationand typographic detail. 1
E. O. BUZBAS AND P. JOYCE
Understanding the evolutionary forces that shape the patterns of observedgenetic diversity is central to population genetics. Over evolutionary time,genotypes with higher fitness tend to drive out those with lower fitness,thus reducing the genetic diversity of a population with respect to the par-ticular locus under selection. However, there are selective mechanisms thatactually promote genetic diversity. A simple such mechanism called het-erozygote advantage assumes that carrying two variant copies of a gene ata locus (heterozygote) leads to higher fitness in comparison to carrying thesame copy of the gene (homozygote). For example, individuals suffering fromsickle cell anemia carry two copies of a disease gene. On the other hand, indi-viduals that carry two copies of the healthy gene are susceptible to malaria.In regions with high incidence of malaria, individuals carrying one copy ofthe healthy gene and one copy of the diseased gene have higher fitness be-cause they suffer only mild symptoms of anemia while also being resistantto malaria [Allison (1956), Cavalli–Sforza and Bodmer (1971), Harding andGriffiths (1997)]. As an infectious disease, malaria is a major health threatto human populations, affecting approximately 515 million people globally[Snow et al. (2005)].For a population with allele frequencies x , x , . . . , x k , where P ki =1 x i = 1,we define the homozygosity to be h = P ki =1 x i , which is the probability that asample of size two will produce two genes with the same allele. A populationunder the influence of heterozygote advantage would likely have low homozy-gosity. The lowest possible value occurs when all of the allele frequencies areequal ( x i = 1 /k for all i ). Thus, low homozygosity might suggest that a highallelic diversity in a population is explained by heterozygote advantage. Ifall genotypes have equal fitness, we call the locus neutral. In this case, allof the genetic diversity is produced by mutation from one allele to another.While the heterozygote advantage model assumes a diploid population,it is mathematically equivalent to a frequency dependent selection modelwhich can apply to haploid organisms [Neuhauser (1999)]. One form of fre-quency dependence implies that high frequency alleles are at a selectivedisadvantage relative to alleles at low frequency. An example of this regimecomes from allele frequencies from a bacteria that causes Lyme disease ( Bor-relia burgdorferi ) [Donnelly, Nordborg and Joyce (2001)]. The data [previ-ously published in Qiu et al. (1997)] consist of four alleles with frequencies x ′ = (0 . , . , . , . h = 0 . .
25 under k = 4. Under the neutral modelassumption, mutations that occurred in the distant past would correspondto high frequency alleles and more recent mutations would give rise to lowfrequency alleles. So under neutrality one might expect a few alleles in highfrequency and most alleles in low frequency, corresponding to relatively highhomozygosity. At first glance, the homozygosity in the above data appears tobe too low to be explained by neutrality. Watterson derived the distribution LES UNDER K -ALLELE MODEL of the homozygosity statistic under the assumption of neutrality which leadto one of the most common tests to distinguish departures from neutrality[Watterson (1977)]. However, with modern computational methods, we arenow in a position to go beyond simply determining whether neutrality isfeasible. We can now develop likelihood based inference methods for pre-cise alternative models that incorporate selection and estimate the strengthof selection. The alternative models presented here are called the Wright–Fisher k -allele models with selection [Wright (1949)]. These classical modelshave a rich mathematical theory and long history in population genetics [seeEwens (2004) for background], mainly because they provided theoretical in-sight into the dynamics of genes evolving under selection. In the data richworld that population genetics finds itself today, there is a renewed interestin these models as useful tools to draw inferences on selection at genetic lociof interest.A brief description of the Wright–Fisher process is as follows. Considertracking a population of 2 N genes over many generations. A gene can be oneof k possible alleles. Generations are non overlapping and the probability ofsampling genotype ( A i A j ) is proportional to its fitness, w ij = 1 − s ij . To ob-tain the next generation, 2 N pairs of genes are sampled with replacement.A randomly chosen allele within each sampled pair mutates to A i with prob-ability u i , independent of the parent’s type. A standard diffusion argument[Ewens (2004)] based on the Markov chain of allele frequencies generates thestationary distribution f Sel ( x | θ , Σ ) = e − x ′ Σx E Neut ( e − X ′ ΣX ) f Neut ( x | θ ) , (1)where x is a ( k ×
1) column vector of allele frequencies in the population,subject to the condition P ki =1 x i = 1, whose transpose is the row vector x ′ = ( x , . . . , x k ). We have θ ′ = ( θ , . . . , θ k ), where θ i = 4 N u i , is the scaledmutation parameter for type i and Σ = ( σ ij ) with σ ij = 2 N s ij , the ( k × k )symmetric matrix of scaled selection parameters. The probability densityfunction for allele frequencies under neutrality, f Neut ( x | θ ), is given by thefamiliar Dirichlet distribution f Neut ( x | θ ) = Γ( θ + θ + · · · + θ k )Γ( θ )Γ( θ ) · · · Γ( θ k ) x θ − x θ − · · · x θ k − k . (2)We will use the notation E Neut ( · ) to represent expectation with respect to theneutral density given by equation (2), and E Sel ( · | Σ ) to denote expectationwith respect to the density that incorporates selection given by equation (1).The symmetric selective overdominance model is a special case of equa-tion (1), obtained by setting w ij = 1, that is ( s ij = 0) for heterozygotes( i = j ) and w ii = 1 − s , ( s ii = s >
0) for homozygotes regardless of the
E. O. BUZBAS AND P. JOYCE specific type and assuming symmetric mutation θ i = θ/k , for all i . Underthese assumptions, the matrix of selection parameters reduces to a diagonalmatrix with equal elements, Σ = σ I k , where I k is the k dimensional iden-tity matrix, σ = 2 N s , and the mutation parameter is θ = 4 N u . Therefore, X ′ ΣX = σ P ki =1 X i . Substituting appropriately into equation (1) gives f Sel ( x | θ, σ ) = e − σ P ki =1 x i E Neut ( e − σ P ki =1 X i ) Γ( θ )(Γ( θ/k )) k ( x x · · · x k ) θ/k − . (3)Despite the wide ranging applications of k -allele models, the statistical prop-erties of estimators are not well understood. This article aims to clarify in-ference problems associated with estimators of the selection intensity andpresents correct frequentist and Bayesian methods for inference under k -allele models. Theoretical results describing the large variability in the sam-pling distribution of maximum likelihood estimates (MLEs), arising in theanalysis of k -allele models with selection in general and of symmetric over-dominance in particular, are given. The likelihood of allele frequencies underthe stationary distribution of the diffusion limit of a Wright–Fisher popu-lation is examined and the existence of a numerical instability associatedwith MLEs for certain population compositions is established. Numericalinstability of MLEs occurs under what would normally be considered idealconditions. These conditions include the assumption that all the mutationparameters are either known or can be estimated without error. Also, the al-lele frequencies of the entire population are assumed to be observed, ratherthan the usual assumption that the data are viewed as a random samplefrom the population. Even under these idealized conditions, it is shown thatwhen the data carry a strong signal for selection, parametric bootstrap isinaccurate and unreliable for assessing the strength of selection.In Section 2 the theoretical basis for the instability behavior is formalizedby a theorem: under k -allele models with selection, there is a singularitypoint on the allele frequency space where the likelihood is unbounded. Acorollary associated with the theorem is also presented: under the symmet-ric overdominance model, the above mentioned singularity arises when allthe alleles have equal frequencies. This is identified as a perfectly heterozy-gous population. This result is surprising since it suggests that appreciableinformation in the data about selection yield poor estimates and MLEs witha parametric bootstrap approach cannot be used effectively to estimate thestrength of selection. Therefore, our findings highlight the limitations of us-ing the sampling distribution of the MLEs for inference under k -allele modelswith selection.Section 3 exploits a monotonicity argument to show a correct frequentistapproach to the problem as well as a Bayesian method as an alternative toparametric bootstrap. Data from a Killer-cell immunoglobulin-like receptor LES UNDER K -ALLELE MODEL Fig. 1.
Left: The distribution of the maximum likelihood estimates as a function of pop-ulation homozygosity based on equation (3) considering data space that can arise under k = 20 on a grid. ˆ σ asymptotes as h approaches to /k = 0 . . Right: Heavy tailed samplingdistributions of ˆ σ , each based on 1000 simulated data sets with θ = 5 . (KIR) locus and the Lyme disease bacteria locus discussed above are usedto demonstrate the methods in Section 4. A comparison of the heterozygoteadvantage with another selective mechanism of importance, the homozygoteadvantage, is given in Section 5, to emphasize that the effect of instabilityin MLEs may be different under different selective scenarios.
2. Instability of the MLE.
A useful summary of the population com-position under the overdominance model is the homozygosity statistic H = P i X i , which is sufficient for σ given θ and the MLE ˆ σ ( h ), is a decreasingfunction of h = P i x i .The signal for selection is strong when the population homozygosity issmall (Figure 1, left). However, the high rate of divergence and unbounded-ness of ˆ σ ( h ) as h approaches to its minimum value 1 /k is unexpected. Smallperturbations of allele frequencies have a drastic effect on point estimates,particularly for small h . For instance, for a highly polymorphic locus with k = 20 possible alleles, where 1 /k = 0 .
05, a 38% decrease in homozygosity( h = 0 .
13 to h = 0 .
08) corresponds to an approximate 300% increase in ˆ σ ( h );[ˆ σ (0 . ≈ σ (0 . > σ . Assume we condition on θ , so that the sole focus of inference E. O. BUZBAS AND P. JOYCE is on σ . When a simulated sample with h ≈ (1 /k ) is drawn, the resulting ˆ σ islarge. Thus, such samples contribute to a heavy right tail for the distributionof ˆ σ . Sampling distributions generated under strong selection clearly revealthe substantial effect of the heavy tail on the estimation of σ (Figure 1,right). Estimates from data sets generated under a variety of ( σ, θ, k ) valuesshow that the heavy tail in the distribution of ˆ σ is a persistent feature. Infact, all k -allele models have a singularity in data space. Theorem 1 belowgives a precise statement of this general phenomenon. Theorem 1.
Consider the probability density function f Sel ( x | θ , Σ ) de-fined by equation (1) that describes the distribution of allele frequencies atstationarity under the Wright–Fisher model with selection and parent inde-pendent mutation. There exists a vector of allele frequencies x ∗ = ( x ∗ , . . . , x ∗ k ) ′ ,where f Sel ( x ∗ | θ , Σ ) is unbounded as a function of Σ regardless of θ . The proof of Theorem 1 is given in Appendix A, where we show that thepoint x ∗ where f Sel ( x ∗ | θ , Σ ) is unbounded as a function of Σ is found byminimizing the quantity P i,j σ ij x i x j subject to the constraint P i x i = 1. Ifthe mean fitness of the population is defined by ¯ w = P i,j w ij x i x j , we have P i,j σ ij x i x j = 2 N (1 − ¯ w ). Therefore, x ∗ is a point in the data space where¯ w is optimal.The symmetric overdominance model, which has a considerably simplermatrix of selection parameters, allows for further results. In this case, boththe limiting value for ˆ σ and the point x ∗ at which the likelihood is unboundedcan be established. Corollary 1.
Consider the probability density function f Sel ( x | θ, σ ) ,defined by equation (3) that describes the distribution of allele frequencies atstationarity for the Wright–Fisher symmetric selective overdominance modelwith parent independent mutation. Let x ∗ = (1 /k, . . . , /k ) ′ . a. If θ is assumed to be known, then, for all allele frequencies x = x ∗ , themaximum likelihood estimate for σ is finite. Denote the MLE as a func-tion of the homozygosity h = P ki =1 x i by ˆ σ ( h ) . Then, lim h → (1 /k ) + ˆ σ ( h ) = ∞ . (4)b. For all θ > , if ( X , X , . . . , X k ) has joint probability density given byequation (3), then ( X , X , . . . , X k ) converges in probability to (1 /k, /k, . . . , /k ) as σ → ∞ .(See Appendix B for proof ). LES UNDER K -ALLELE MODEL Part (b) of Corollary 1 provides some context for the ill behavior of theMLE for σ . If σ is large, it is highly likely that the population frequenciesare nearly equal. The asymptotic behavior of populations in the limit as σ goes to infinity has been studied in other contexts by both Gillespie (1999)and Joyce, Krone and Kurtz (2003).
3. Methods to assess the error in estimates.
Estimates using the monotonicity of homozygosity.
By exploitingthe monotonicity of the distribution of the homozygosity statistic, H , wedevelop a frequentist approach to obtain interval estimates of selection in-tensity under the selective overdominance model [equation (3)] that does notrely on approximating the sampling distribution for the MLE ˆ σ . Through-out we assume that θ is known and we drop the parameter θ for notationalconvenience when denoting the cumulative distribution function (cdf) forthe homozygosity H = P ki =1 X i such that we have F H ( h | σ ) = P Sel ( H ≤ h | θ, σ ) . While the cdf is always a monotone function of h for fixed σ , this particularcdf is also monotone with respect to the parameter σ for fixed h . Moreprecisely, we can state that as σ increases, the probability of H being smallerthan a particular value h increases. An exact confidence interval for σ isproduced using this monotonicity property in σ .For a given confidence level (1 − α ) and an observed homozygosity H = h ,we choose ˆ σ L and ˆ σ U so that F H ( h | ˆ σ L ) = α , F H ( h | ˆ σ U ) = 1 − α , (5)where α = α + α . We interpret ˆ σ L and ˆ σ U as the smallest and largest valuesof σ that supports the data. Since F H ( h | σ ) is a monotone increasing functionof σ , then ˆ σ L ≤ σ ≤ ˆ σ U if and only if F H ( h | ˆ σ L ) ≤ F H ( h | σ ) ≤ F H ( h | ˆ σ U ),which implies α ≤ F H ( h | σ ) ≤ − α . Therefore, P Sel (ˆ σ L ≤ σ ≤ ˆ σ U ) = P Sel ( α ≤ F H ( H | σ ) ≤ − α ) . The standard theory shows that F H ( H | σ ) is a uniformly distributed randomvariable on the interval [0 ,
1] to give the result P Sel (ˆ σ L ≤ σ ≤ ˆ σ U ) = 1 − α − α = 1 − α. Therefore, [ˆ σ L , ˆ σ U ] is an exact (1 − α ) level confidence interval.However, when θ is unknown, the monotonicity of H in σ holds no moreand only confidence regions are possible to obtain. Therefore, in applica-tions where the variability in ˆ θ is expected to be considerable and the jointestimation of ( θ, σ ) is required, the method is not very useful. Next, we turnto the Bayesian approach as an alternative that allows for marginalizationin σ . E. O. BUZBAS AND P. JOYCE
Fig. 2.
A parametric bootstrap sample of size for ˆ σ using the Lyme disease data.All values of ˆ σ > are plotted at . Estimates based on the Bayesian methods.
Assuming independentuniform priors on ( θ, σ ), the joint posterior distribution of ( θ, σ ) is propor-tional to the likelihood, P Sel ( θ, σ | x ) ∝ e − σ P ki =1 x i E Neut ( e − σ P ki =1 X i ) ( x x · · · x k ) θ/k − , (6)which can be sampled using a standard Markov Chain Monte Carlo ap-proach. We sampled the joint posterior distribution of the parameters viaan independent Metropolis–Hastings update [Metropolis et al. (1953), Hast-ings (1970)]. The posterior mode is found by numerically maximizing thejoint distribution and 95% credible intervals are used as a measure of vari-ability for σ .Since both the posterior and the bootstrap are based on the likelihood,intuition suggests a similar problem of instability might arise in the Bayesiananalysis. Examining the posterior sample of σ (see examples in Section 4),we find that the Bayesian approach does not have the instability observed inthe bootstrap. The reason is that, in contrast to the parametric bootstrapwhich generates data, posterior analysis works on fixed data. While eachsimulated data set in the bootstrap has a certain probability of falling intothe instability region, this problem is avoided in posterior simulation, bysampling the parameter space rather than the data space. LES UNDER K -ALLELE MODEL
4. Examples.
In this section we present two data applications to com-pare the three methods discussed above for making inference on selectionintensity: MLE-bootstrap, monotonicity and the Bayesian approach.4.1.
Lyme disease data.
We revisit (see Section 1) the Lyme disease bac-teria data. Recall the data consist of four alleles with frequencies x ′ =(0 . , . , . , . h = 0 .
288 rela-tively close to the minimum 0 .
25 under k = 4. The MLEs for the mutationand selection parameters are (ˆ θ = 4 . , ˆ σ = 35 .
1) respectively. A parametricbootstrap with (ˆ θ, ˆ σ, k ) = (4 . , . ,
4) admits poor precision for the esti-mated selection intensity as discussed in Section 2. Based on the simulatedsampling distribution for ˆ σ (Figure 2), we get an estimated standard er-ror of 176 .
4. The 2.5th percentile of the simulated sampling distributionof ˆ σ corresponds to 17.2 and the 97.5th percentile is 681.3. Therefore, anapproximate 95% interval estimate based on the parametric bootstrap as-sociated with ˆ σ is (17 . , . . P ( H ≥ . | σ = 681 . < . P ( H ≤ . | σ = 17 .
25) = 0 . α = α = 0 .
025 produces an exact 95% confidence intervalof ( − , KIR data.
Our second example is from a data set published in Nor-man (2004) on KIR genes. The data are from the United Kingdom popula-tion [see locus DL1/S1 from Table 2 in Norman (2004)]. The KIR are highlypolymorphic genes coding for proteins on natural killer cells and they de-tect a specific Major histocompatibility complex Class I protein found ondiseased natural killer cells. Observed levels of variability at these loci sug-gest that the heterozygote advantage mechanism is a good candidate toexplain the variation in KIR genes. The population frequencies are given by x ′ = (0 . , . , . , . , . , . , . , . h = 0 . h min = 0 .
125 for k = 8. An ap-proximate 95% bootstrap interval estimate for this data set is (21 . , . θ , the interval estimate using the monotonicity argument E. O. BUZBAS AND P. JOYCE
Fig. 3.
The empirical distributions of homozygosity under ˆ σ L = − (left) and ˆ σ U = 159 (right) for KIR data (h = Fig. 4.
A sample from posterior distribution of σ for the KIR data set using fixed θ (left) and the joint estimation (right). 95% credible interval limits are (6.3, 182.9) and(4.3, 205.5) (shades) respectively. with α = α = 0 .
025 is given as ( − , θ at the posterior mode, the KIR data giving a 95% credibilityinterval for σ is (6 . , .
9) (Figure 4). For comparison purposes a 95%credibility interval for σ , (4 . , . θ and σ is also included in Figure 4. Not surprisingly, the variability in σ increases when the uncertainty in θ is taken into account.
5. Discussion.
Wright–Fisher k -allele models with selection provide aflexible framework for considering a wide array of biologically meaningful LES UNDER K -ALLELE MODEL selective schemes. The impact of the instability on the MLEs can vary sub-stantially depending on the particular selective scheme. In this section wedescribe an example using the symmetric homozygote advantage model. Ourgoal is to compare the homozygote and heterozygote advantage models interms of the instability explained. The comparison is particularly insightful,as the two schemes have very different biological implications and yet thereis a close connection between their parameterization.In contrast to heterozygote advantage, homozygote advantage selects forgenotypes with the same allelic types. The symmetric version can be ob-tained by letting the selection matrix Σ = − σ I k , σ >
0, now denoting therelative selective advantage of homozygotes to heterozygotes. Note that thedifference between this model and the heterozygote advantage consists onlyof switching the sign of the selection parameter. Therefore, both heterozy-gote and homozygote advantages can be accommodated in the same modelby allowing σ ∈ R , a useful property to compare the instability regions aris-ing under two regimes. Simulated sampling distributions obtained under thehomozygote advantage display a heavy left tail, reflecting the sign changein σ . In light of Theorem 1, which holds for all selective schemes, this resultis not surprising. The existence of a singularity point for the homozygoteadvantage model is guaranteed. The strongest signal for selection under thismodel is at H = 1 (i.e., at maximum homozygosity) and, in fact, the samepoint turns out to be where ˆ σ is infinite since this point maximizes the meanfitness.Interestingly, the effect of the instability on the variability of MLEs un-der the homozygote advantage is milder than the heterozygote advantage.The difference is explained by examining the populations contributing tothe tail of interest in the corresponding bootstrap distributions (i.e., leftand right tails for homozygote and heterozygote advantage models respec-tively). They have different probabilities of arising under the two cases. Let0 < ε < − /k and consider populations that are close to perfect homozy-gosity (1 − ε < H < /k < H < /k + ε ). In the bootstrap procedure, whenevera generated data set falls in these regions, a large MLE for the selectionintensity is obtained. However, the probability of drawing a population inthe first set, P Sel (1 − ε < H < | θ, − σ ), is lower than the probability of draw-ing a population in the second set, P Sel (1 /k < H < /k + ε | θ, σ ), for a given(absolute) selection intensity (Figure 5). In other words, for a given numberof simulated samples, the expected number of samples to hit the singular-ity region is larger under symmetric heterozygote advantage than under thehomozygote advantage. Hence, the type of selective regime is an importantfactor when evaluating the effect of the instability on the confidence inter-vals. Unfortunately, in the important case of the heterozygote advantagemodel, the effect is pronounced. E. O. BUZBAS AND P. JOYCE
Fig. 5.
The probability that a sample falls into the instability region for different se-lection intensities (as percent of generated samples with 1000 samples at each σ ) underhomozygote advantage (dots) and heterozygote advantage (open circles). The samples aregenerated under k = 10 and the instability region is chosen using ε = 0 . units from perfecthomozygosity (1, 0.91) and heterozygosity (0.1, 0.19) (see text). The ultimate goal for the use of methods discussed in this paper is to de-velop statistical methods that can be used to detect selection at multiple locisimultaneously under the k allele models. Multiple loci data provide moreinformation than single locus data, therefore, inference is expected to bemore precise. The Bayesian methods gain a definite advantage of flexibilityas the number of genetic loci, and thus the dimensionality of the problem,increases.Modern population genetics using coalescent based methods to explainpolymorphism data have been effectively used to understand genealogy andrecombination [Fearnhead (2001), Nordborg (2000), Griffiths and Marjoram(1997), Padhukasahasram et al. (2008)] but have been less successful with se-lection. The computational burden for simulating and analyzing data undercoalescent based methods with selection remains heavy. There is a renewedinterest in diffusion approaches which provide an alternative framework tohandle models with selection [Wakeley (2005)]. Furthermore, diffusion mod-els are cornerstones of population genetics theory. Their relatively long his-tory resulted in a variety of useful models to investigate selection other thanthe k -allele setup. An important task is to establish statistical properties ofestimators and investigate the usefulness of different statistical paradigmsunder these models.Finally, counterintuitive results presented in this paper point out thatcare should be exercised in method choice when making inference on selec-tion under the class of models we presented. As emphasized above, realisticapplications of the methods involve inference from multiple loci possibly with LES UNDER K -ALLELE MODEL complex selective schemes. However, such setups are not ideal to investigatethe statistical properties of the estimators. Because computationally inten-sive methods employed in analyzing them become less tractable, complexmodels tend to hide problems of the type discussed in this paper. Therefore,rigorous tests of the methods under the single locus case are essential toguarantee the legitimacy of inference on selection made by employing thesemethods. APPENDIX A: PROOF OF THEOREM 1We begin by fixing θ and Σ , then finding a point in data space thatproduces the largest possible signal for selection. Data space is representedby the k dimensional simplex defined by ∆ k = { ( x , x , . . . , x k ) | P kj =1 x j =1 } . Since ∆ k is a compact set, there exists at least one point x ∗ ≡ x ∗ ( Σ ) ∈ ∆ k where e − xΣx ′ is optimized. It follows from equation (1) that x ∗ is the pointin data space that produces the strongest possible signal for selection. Notethat x ∗ is obtained by minimizing the quadratic function xΣx ′ = P ij σ ij x i x j subject to the constraint P i x i = 1. This implies X i,j σ ij x ∗ i x ∗ j ≤ X i,j σ ij x i x j ∀ x ∈ ∆ k . (7)We now turn to the alternative problem where we fix the data x and calculatethe maximum likelihood estimate for Σ denoted by ˆ Σ ( x ). Throughout wewill assume that θ is known. Since the likelihood [equation (1)] is a smoothfunction of the parameters, the standard calculus approach to optimizationbased on the derivative of the log likelihood is a valid method for findingthe MLE. It follows from equation (1) that ∂∂σ ij ln f Sel ( x | θ , Σ ) = E Neut ( X i X j e − XΣX ′ ) E Neut ( e − XΣX ′ ) − x i x j (8) = E Sel ( X i X j | Σ ) − x i x j . Therefore, to obtain the MLE for Σ , denoted by ˆ Σ ( x ), we set equation(8) equal to zero for each pair of indices i, j and solve for ˆ Σ ( x ). Thus, for agiven data set x , we have E Sel ( X i X j | ˆ Σ ( x )) − x i x j = 0 . (9)Now multiply equation (9) by σ ij and sum to obtain E Sel (cid:18)X i,j σ ij X i X j (cid:12)(cid:12)(cid:12) ˆ Σ ( x ) (cid:19) − X i,j σ ij x i x j = 0 . (10) E. O. BUZBAS AND P. JOYCE
Now, assume x ∗ satisfying the inequality (7) are the observed data. Thenit follows from equation (7) that the newly defined random variable M ∗ ≡ X i,j σ ij X i X j − X i,j σ ij x ∗ i x ∗ j ≥ , (11)with probability 1. Assume the likelihood given by equation (1) is boundedat the point x ∗ . By continuity, the maximum likelihood estimate ˆ Σ ( x ∗ )exists. Then it follows from equation (10) that E Sel ( M ∗ | ˆ Σ ( x ∗ )) = 0 . (12)Therefore, M ∗ is a non-negative random variable with mean zero, implying M ∗ = 0 with probability 1. However, M ∗ is a continuous function of thecontinuous random vector X and is therefore equal to zero with probabilityzero. So, assuming the likelihood is bounded at x ∗ leads to a contradiction.APPENDIX B: PROOF OF COROLLARY 1 Proof of part a.
Because the likelihood given by equation (3) is asmooth function of σ , standard calculus methods can be used to derive theMLE for σ . Again assuming that θ is known and differentiating the naturallog of the likelihood given in (3), we get ∂∂σ ln f Sel ( x | σ, θ ) = − h + E Sel ( H | σ ) . (13)Define g ( σ ) ≡ E Sel ( H | σ ), then g is a decreasing function. To show that g is decreasing, we show that g ′ ( σ ) < σ . It follows from equation (3)that g ( σ ) = E Sel ( H | σ ) = E Neut ( He − σH ) E Neut ( e − σH )and g ′ ( σ ) = − E Neut ( e − σH ) E Neut ( H e − σH ) + ( E Neut ( He − σH )) ( E Neut ( e − σH )) = − Var
Sel ( H ) < . That is, as the selection intensity σ grows, then homozygotes are increas-ingly disadvantaged. Thus, the mean homozygosity, E Sel ( H | σ ), will becomesmaller as the selection intensity, σ , grows. By equation (13), the maximumlikelihood estimate for σ will satisfy the equation H = g (ˆ σ ) . LES UNDER K -ALLELE MODEL To establish equation (4), we need to show that for every sequence of ho-mozygosities converging to 1 /k from above, the corresponding MLE for theselection intensity converges to infinity. Consider a monotone decreasing se-quence of real numbers { a n } where a n → n → ∞ . Define σ n to be thesolution to the equation g ( σ n ) = 1 /k + a n . Since g is a decreasing function, it follows by monotonicity that σ n must bean increasing sequence. Increasing sequences must either converge or divergeto infinity. Suppose for a moment that σ n → σ ∗ < ∞ as n → ∞ . Then bycontinuity of g , we have that g ( σ ∗ ) = 1 /k . This implies that E Sel ( H − /k | σ ∗ ) = 0 . (14)However, we know that H ≥ /k with probability one. Therefore, it followsfrom equation (14) that H − /k is a nonnegative random variable withmean zero. This implies that H − /k is identically zero with probabilityone. This is a contradiction, since we know that, for any value of σ , H is acontinuous random variable whose distribution can be derived from equation(3) and so cannot be equal to 1 /k with probability one. Therefore, σ n mustgo to infinity. (cid:3) Proof of part b.
Note that g ( σ ) = E ( H | σ ) is a decreasing functionin σ that is bounded below by 1 /k . Therefore, lim σ →∞ g ( σ ) must con-verge. Denote the limit by b . Note that g − ( b ) = ∞ . By part (a), g − ( b )is the maximum likelihood estimate for σ when H = b . Since the maxi-mum likelihood estimate is finite for all h > / k , then b = 1 /k . Therefore, E ( H − /k | σ ) = E ( | H − /k | | σ ) goes to zero as σ → ∞ . This implies thatthe L norm of H converges to 1 /k , which implies that H converges to1 /k in probability. The conclusion of part (b) is established by noting that H = 1 /k if and only if ( X , X , . . . , X k ) = (1 /k, /k, . . . , /k ). (cid:3) Acknowledgments.
The authors would like to thank Craig Miller andDarin Rokyta for their helpful comments on the manuscript.REFERENCES
Allison, A. C. (1956). The sickle-cell and haemoglobin c genes in some African popula-tions.
Am. Hum. Genet. Cavalli-Sforza, L. L. and
Bodmer, W. F. (1971).
The Genetics of Human Populations .Dover, Mineola, NY.
Donnelly, P., Nordborg, M. and
Joyce, P. (2001). Likelihood and simulation methodsfor a class of nonneutral population genetics models.
Genetics
Ewens, W. J. (2004).
Mathematical Population Genetics: I. Theoretical Introduction , 2nded. Interdisciplinary Applied Mathematics . Springer, New York. MR2026891 E. O. BUZBAS AND P. JOYCE
Fearnhead, P. and
Donnelly, P. (2001). Estimating recombination rates from popu-lation genetic data.
Genetics
Genz, A. and
Joyce, P. (2003). Computation of the normalization constant for expo-nentially weighted Dirichlet Distribution.
Comput. Sci. Statist. Gillespie, J. (1999). The role of population size in molecular evolution.
Theor. Popul.Biol. Griffiths, R. C. and
Marjoram, P. (1997). An ancestral recombination graph. In
Progress in Population Genetics and Human Evolution, IMA Volumes in Mathemat-ics and Its Applications. (P. Donnelly and S. Tavar´e, eds.) 257–270. Springer, Berlin.MR1493031
Harding, R. M., Fullerton, S. M. and
Griffiths, R. C. (1997). Archaic African andAsian lineages in the genetic ancestry of modern humans.
Am. Jour. Hum. Genet. Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and theirapplications.
Biometrika Joyce, P., Genz, A. and
Buzbas, E. O. (2009). Efficient simulation methods for a classof nonneutral population genetics models.
Theor. Popul. Biol.
To appear.
Joyce, P., Krone, S. M. and
Kurtz, T. G. (2003). When can one detect overdominantselection in the infinite-alleles model?
Ann. Appl. Probab. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and
Teller, E. (1953). Equation of state calculations by fast computing machines.
J.Chem. Phys. Neuhauser, C. (1999). The Ancestral Graph and Gene Genealogy under Frequency-Dependent Selection.
Theor. Popul. Biol. Nordborg, M. (2000). Linkage disequilibrium, gene trees and selfing: An ancestral re-combination graph with partial self-fertilization.
Genetics
Norman, P. J., Cook, M. A., Carey, B. S., Carrington, C. V. F., Verity,D. H., Hameed, K., Ramdath, D. D., Chandanayingyong, D., Leppert, M.,Stephens, H. A. F. and
Vaughan, R. W. (2004). SNP haplotypes and allele fre-quencies show evidence for disruptive and balancing selection in the human leukocytereceptor complex.
Immuno-Genetics Padhukasahasram, B., Marjoram, P., Wall, J. D., Bustamante, C. and
Nord-borg, M. (2008). Exploring population genetic models with recombination using effi-cient forward-time simulations.
Genetics
Qiu, W. G., Bosler, E., Campbell, J., Ugine, G., Wang, I.,Benjamin, J. L. and
Dykhuzen, D. E. (1997). A population genetic study of Borrelia burgdorferi sensustricto from eastern long island, New York, suggested frequency-dependent selection,gene flow and host adaptation.
Hereditas
Snow, R. W., Guerra, C. A., Noor, A. M., Myint, H. Y. and
Hay, S. I. (2005). Theglobal distribution of clinical episodes of plasmodium falciparum malaria.
Nature
Wakeley, J. (2005). The limits of population genetics.
Genetics
Watterson, G. A. (1977). Heterosis or neutrality?
Genetics Wright, S. (1949). Adaptation and selection. In