Robustness and epistasis in mutation-selection models
aa r X i v : . [ q - b i o . P E ] J a n Robustness and epistasis in mutation-selectionmodels
Andrea Wolff and Joachim Krug
Institut f¨ur Theoretische Physik, Universit¨at zu K¨oln, Z¨ulpicher Str. 77, 50937 K¨oln,GermanyE-mail: [email protected] and [email protected]
Abstract.
We investigate the fitness advantage associated with the robustness of aphenotype against deleterious mutations using deterministic mutation-selection modelsof quasispecies type equipped with a mesa shaped fitness landscape. We obtain analyticresults for the robustness effect which become exact in the limit of infinite sequencelength. Thereby, we are able to clarify a seeming contradiction between recent rigorouswork and an earlier heuristic treatment based on a mapping to a Schr¨odinger equation.We exploit the quantum mechanical analogy to calculate a correction term for finitesequence lengths and verify our analytic results by numerical studies. In addition, weinvestigate the occurrence of an error threshold for a general class of epistatic landscapeand show that diminishing epistasis is a necessary but not sufficient condition for errorthreshold behavior.
1. Introduction
In current evolutionary theory, the concept of robustness , referring to the invarianceof the phenotype under pertubations, is of central importance [1, 2]. Here we addressspecifically mutational robustness , which we take to imply the stability of some biologicalfunction with respect to mutations away from the optimal genotype. To be precise,suppose the genotype is encoded by a sequence of length L , and the number ofmismatches with respect to the optimal genotype is denoted by k . Robustness is thenquantified by the maximum number of mismatches k , that can be tolerated before thefitness of the individual falls significantly below that of the optimal genotype at k = 0.This situation arises e.g. in the evolution of regulatory motifs, where the fitness is afunction of the binding affinity to the regulatory protein [3, 4]. Assuming that the fitnessis independent of k both for k ≤ k and for k > k , the fitness landscape has the shapeof a mesa parametrized by its width k and height w [5].We consider deterministic mutation-selection models of quasispecies type, whichdescribe the dynamics of large (effectively infinite) populations [6]. We analyse thestationary state of mutation-selection balance, focusing on the dependence of thepopulation fitness on the parameters k and w . This allows us to identify the conditionsunder which a broad fitness peak of relatively low selective advantage outcompetes a obustness and epistasis in mutation-selection models survivalof the flattest [8, 9, 10, 11, 12]. Our central aim is to obtain analytic results for therobustness effect that become exact in the limit of long sequences. In particular, wewant to clarify whether the selective advantage is a function primarily of the relative number of tolerable mismatches x = k /L , or of the total number of mismatches k .Robustness in the sense described above is a special case of epistasis , which refersmore generally to any nonlinear relationship between the number of mutations awayfrom the optimal genotype and the corresponding fitness effect [13]. A simple way toparametrize epistasis is to let the loss of fitness vary with the number of mismatchesas k α , such that the non-epistatic case α = 1 separates regimes of synergistic ( α > α <
1) epistasis [14, 15]. An important problem in previous workon mutation-selection models has been to identify the conditions under which epistaticfitness landscapes display an error threshold , a term that refers to the discontinuousdelocalization of the population from the vicinity of the fitness peak as the mutationrate is increased beyond a critical value [6, 16]. Improving on earlier work that foundthat only landscapes with diminishing epistasis ( α <
1) have an error threshold, wederive here the more stringent condition α ≤ / We base our work on two complementary analytic approaches. First, recent progressin the theory of mutation-selection models [5, 16, 17, 18, 19, 20, 21, 22] providesan expression for the population fitness in terms of a maximum principle (MP) thatbecomes exact when the limit L → ∞ is performed keeping the ratio x = k /L fixed.Second, Gerland and Hwa (GH) [3] have used a drift-diffusion approximation to mapthe mutation-selection problem onto a one-dimensional Schr¨odinger equation which isthen analyzed with standard techniques.Our work was initially motivated by the observation of a discrepancy between thetwo approaches: Whereas the MP predicts that the selective advantage of a broad mesashould vanish when the limit L → ∞ is taken at fixed k , in the GH approach a finiteselective advantage is retained in this limit, which depends on the absolute value of k rather than on x . After introducing the model and briefly reviewing the results of theMP approach in section 2, we therefore provide a detailed discussion of the drift-diffusionapproximation used by GH in section 3. We emphasize that it amounts to a harmonicapproximation, and show how it can be improved in such a way that the results basedon the MP are recovered.The mapping to one-dimensional quantum mechanics is nevertheless useful, as itallows us to derive the leading finite size correction to the population fitness. As aconsequence we find excellent agreement between the analytic predictions and numericalsolutions of the discrete mutation-selection equations. In section 4 we consider theselection transition in a two-peak landscape first studied by Schuster and Swetina[7, 23], in which the population shifts from a high, narrow fitness maximum to a lower obustness and epistasis in mutation-selection models α and verify our prediction by numerical calculations. Finally,some conclusions are presented in section 6. Details of the derivation of the improvedcontinuum approximation and the generalization to arbitrary alphabet size can be foundin two appendices.
2. Mutation-selection models and the maximum principle
We consider the simplest case of binary sequences and adopt continuous time dynamicsof the Crow-Kimura type, in which the mutation and selection terms act in parallel[6]. Point mutations occur at rate µ , and the (Malthusian) fitness is assumed from theoutset to be a function w k only of the Hamming distance k to the optimal sequence at k = 0. The population structure is described by the fraction P k ( t ) of individuals with k mismatches, which satisfies the evolution equation dP k dt = ( w k − ¯ w ) P k + µ ( k + 1) P k +1 + µ ( L − k + 1) P k − − µLP k . (1)with 1 ≤ k ≤ L − k = 0 and k = L . The nonlinearityintroduced by the mean fitness ¯ w ( t ) = P k w k P k can be eliminated by passing tounnormalized population variables [6, 24]. At long times the population distributiontherefore approaches the principal eigenvector P ∗ k of the linear dynamics, which is thesolution of the eigenvalue problemΛ P ∗ k = ( w k − µL ) P ∗ k + µ ( k + 1) P ∗ k +1 + µ ( L − k + 1) P ∗ k − (2)with the maximal eigenvalue Λ. This eigenvalue is equal to the long-time limit ofthe mean population fitness ¯ w , and it is the main quantity of interest in this paper.Depending on the context we will refer to Λ as the mean population fitness, thepopulation growth rate, the principal eigenvalue of the mutation-selection matrix definedby (2) or the ground state energy of the corresponding quantum mechanical problem,to be defined in subsection 3.2.A considerable body of work has been devoted to the solution of (2) for large L . Inorder to obtain nontrivial behavior in the limit L → ∞ , it is necessary to either scalethe mutation rate ∼ /L or the fitness ∼ L . We adopt here the first choice and take L → ∞ , µ → γ = µL fixed. If, in addition, the fitness landscape w k is assumedto depend only on the relative number of mismatches, such that w k = f ( x ) , x = k/L (3)the principal eigenvalue in (2) is given, for L → ∞ , by the solution of a one-dimensionalvariational problem as [16, 17, 18, 19, 20, 21]Λ = max x ∈ [0 , { f ( x ) − γ [1 − p x (1 − x )] } ; (4) obustness and epistasis in mutation-selection models f ( x ) is differentiable the leadingorder correction to (4) takes the form [19, 20]∆Λ = γ L p x c − x c [1 − q − f ′′ ( x ∗ )( x c − x c ) / /γ ] , (5)where x c is the value at which the maximum in (4) is attained.In the first part of this paper we focus on mesa landscapes of the form w k = ( w > ≤ k ≤ k k > k , (6)where w is the selective advantage of the functional phenotype and k denotes thenumber of tolerable mismatches. Within the class of scaling landscapes (3), this isrealized by setting f ( x ) = w θ ( x − x ) , (7)where θ is the Heaviside step function and x = k /L . Provided x < /
2, applicationof the maximum principle (4) yieldsΛ = ( w − γ (1 − p x (1 − x )) : w > w c = γ (1 − p x (1 − x ))0 : w < w c . (8)The value w c of the selective advantage marks the location of the error threshold at whichthe population delocalizes from the fitness peak and the location x c of the maximum in(4) jumps from x c = x to x c = 1 / L − / or L − / rather than L − in this case.
3. Continuum limit and the drift-diffusion equation
A natural approach to analyzing (1) and (2) for large L is to perform a continuum limitin the index k . To this end we introduce ǫ = 1 /L as small parameter and replace thepopulation variable P k by a function φ ( x ) = lim L →∞ P xL . (9)The fitness is taken to be of the general form (3). Expanding the finite differences in(2) to second order in ǫ then yields the stationary drift-diffusion equation f φ − ǫγ ddx (1 − x ) φ + ǫ γ d dx φ = Λ φ. (10)This is identical to the equation obtained by GH [3], who however write it in terms ofthe unscaled variable k = Lx . obustness and epistasis in mutation-selection models f = 0) theprincipal eigenvalue in (10) is readily seen to be Λ = 0, and the corresponding (right)eigenfunction is a Gaussian centered at x = 1 / φ ( x ) ∼ exp[ − (1 − x ) / ǫ ] . (11)This is just the central limit approximation to the binomial distribution P k = 2 − L (cid:18) Lk (cid:19) (12)which solves (2) for w k = 0 and Λ = 0. It is well known that the central limitapproximation of (12) is accurate in a region of size √ L around k = L/
2, but becomesimprecise for deviations of order L . An improved approximation is provided by thetheory of large deviations [25], in which the ansatz P k ∼ exp[ − Lu ( x )] (13)is made to obtain an expression for the large deviation function u ( x ). In the context ofmutation-selection models, this approach has recently been introduced by Saakian [20],who showed that it allows to derive the exact relation (4) in a relatively straightforwardmanner (see Appendix A). Equivalent results can be obtained by continuing theexpansion in (10) to all orders in ǫ and treating the resulting equation in a WKB-typeapproximation, which essentially corresponds to the ansatz (13), see [22].We conclude that the drift-diffusion approximation (10) can be expected to bequantitatively accurate only near the center x = 1 / The key step in reducing (10) to standard form is to symmetrize the linear operator onthe left hand side, thus eliminating the first-order drift term. This can be achieved bythe transformation φ ( x ) = p φ ( x ) ψ ( x ) , (14)with φ ( x ) from (11), which leads to the stationary Schr¨odinger equation − ǫ γ d dx ψ + V ( x ) ψ = − (Λ − ǫγ ) ψ (15)with the effective potential V ( x ) = γ − x ) − f ( x ) . (16)The latter consists of the superposition of a harmonic oscillator centered around x = 1 / obustness and epistasis in mutation-selection models ǫ plays the role of Planck’s constant ~ , which implies that the case of interest is the semiclassical limit of the quantum mechanical problem. In particular, for ǫ → − Λ becomes equal to the minimum of the effective potential. Wethus arrive at the variational principleΛ = max x ∈ [0 , [ f ( x ) − γ − x ) ] , (17)which is precisely the harmonic approximation (in the sense of a quadratic expansionaround x = 1 /
2) of the exact relation (4). In this perspective the error thresholdcorresponds to a shift between different local minima of V ( x ), which become degenerateat the transition point. The transition is generally of first order, in the sense thatthe location x c of the global minimum jumps discontinuously. Within the harmonicapproximation the transition for the mesa landscape occurs at w c = γ − x ) ≈ γ (cid:18) − k L (cid:19) (18)when x = k /L ≪ For small but finite ǫ , quantum corrections to the classical limit (17) have to be takeninto account. If f ( x ) is smooth, the ground state wave function is localized near theminimum x c of the effective potential, and the shift in the ground state energy can becomputed by replacing V ( x ) by a harmonic well, V ( x ) ≈ V ( x c ) + 12 V ′′ ( x c )( x − x c ) = V ( x c ) + 12 [4 γ − f ′′ ( x c )]( x − x c ) . (19)Identifying 1 /γ with the mass m of the quantum particle [compare to (15)], we see thatthis corresponds to a harmonic oscillator of frequency ω = 2 γ p − f ′′ ( x c ) / γ . Theground state energy ǫω/
2, together with the shift ǫγ on the right hand side of (15), thusgives rise to the leading order correction∆Λ = γL [1 − p − f ′′ ( x c ) / γ ] , (20)which coincides with (5) evaluated for x c ≈ /
2. Similarly, the width of the wavefunction is given by ‡ ξ = p γǫ/ ω = √ ǫγ [8 p − f ′′ ( x c ) / γ ] / . (21)In the case of the mesa landscape (7), the potential near x c = x consists of a linearramp of slope − a = V ′ ( x ) = 2 γ (2 x − < w . For small ǫ , the jump can be considered as effectivelyinfinite (as the kinetic energy of the particle is then very small), and the corresponding ‡ Note that, because of the factor √ φ in (14), this is not equal to the width of the stationary populationdistribution. obustness and epistasis in mutation-selection models z ( ~ / m ) / a / = 2 / z γ (1 − x ) / L − / + O ( L − ) , (23)where z ≈ − . ... is the first zero of the Airy function. The scaling ∆Λ ∼ L − / was already noted in [5]. The width of the wave function can be estimated to be of theorder ξ ∼ ( ~ /ma ) / ∼ ǫ / (24)in this case. We are now prepared to make contact with the approach of GH [3]. Assuming fromthe outset that the maximal number of mismatches is small compared to the sequencelength, 1 ≪ k ≪ L , they neglect the contribution 2 x in the drift term on the left handside of (10). The linear operator can then be symmetrized by the transformation φ ( x ) = e x/ǫ ψ ( x ) , (25)which is obtained from (14) by neglecting the terms quadratic in x in φ . This leads toa Schr¨odinger problem similar to (15), but with a potential that differs from − f ( x ) onlyby the constant term γ/
2. The error threshold is determined by the point at which thedecay of the wave function ψ ( x ) matches the exponential factor e x/ǫ in (25), such that φ ( x ) ceases to be normalizable § . For k ≫ w c = γ (cid:18) π k (cid:19) , (26)which depends on the absolute number of mismatches k , but is independent of L .To reconcile this with the result (18), we note that the semiclassical approximationmust break down when the width of the semiclassical wave function, as estimated insubsection 3.3, becomes comparable to the width of the potential well provided by thefitness function. For the discontinuous mesa landscape this occurs when ξ ∼ ǫ / ∼ x = ǫk ⇒ k ∼ ǫ − / = L / . (27)For a mesa that is shorter than L / , the energy of the wave function is determinedby its confinement on the scale x , and it can be estimated from standard quantummechanical considerations to be of the order of ~ / ( mx ) ∼ γǫ /x ∼ γ/k . For k ≪ L / this supersedes the contribution ∼ k /L on the right hand side of (18). Weconclude, therefore, that the leading “quantum” correction to the ”classical” eigenvalueΛ = w − γ/ negative contribution proportional to γ/k , which leads to a § This requires ψ to decay on a scale of order unity in unscaled coordinates at the transition, which isactually inconsistent with the assumption of slow variation on the scale of the sequence index k thatunderlies the continuum approximation. obustness and epistasis in mutation-selection models w c , in qualitative agreement with (26). For smooth fitnesslandscapes the breakdown of the semiclassical regime occurs already at k ∼ L / , butthe condition for the confinement energy contribution γ/k to dominate the k /L -termin (18) still reads k ≪ L / . So far, we have worked in the harmonic approximation around x = 1 /
2, which breaksdown near the boundaries x = 0 and x = 1. However, to access the regime 1 ≪ k ≪ L considered by GH, an accurate treatment of the region of small x ≪ ≤ x ≤ − ǫ γ p x (1 − x ) d dx ψ + h γ (1 − p x (1 − x )) − f ( x ) i ψ = − Λ ψ, (28)which differs from (15) in two respects. First, the potential (16) is replaced by V full ( x ) = γ (1 − p x (1 − x )) − f ( x ) . (29)In the asymptotic limit ǫ → V full ,which exactly recovers the maximum principle (4). Second, the mass of the quantumparticle described by (28) becomes position dependent, m ( x ) b = (cid:16) γ p x (1 − x ) (cid:17) − , (30)which replaces the simple identification m b = 1 /γ in the harmonic case. Inserting (29)and (30) into the expression (23) for the finite size correction yields∆Λ = 2 − / z ǫ − / γ (1 − x ) / [ x (1 − x )] − / . (31)For fixed x this still scales as ǫ / = L − / , but when taking L → ∞ at fixed k , suchthat x →
0, we find instead that∆Λ → − / z γx − / ǫ / = 2 − / z γk − / L − / . (32)We next revisit the considerations of subsection 3.4. The width of the ground statewave function is of order ξ ∼ ( ~ /ma ) / , where both m and a now diverge as x − / for x →
0. Consequently (24) is replaced by ξ ∼ ( ǫ x ) / = ǫk / , (33)and we see that the condition ξ ≫ ǫk for the breakdown of the semiclassicalapproximation is never be satisfied. We conclude that the quantum confinement regimediscussed in subsection 3.4 in fact does not exist, and hence the improved semiclassicalexpression (31) for the finite size correction is expected to remain valid for all k andall L , provided that k , L ≫ obustness and epistasis in mutation-selection models Figure 1.
Growth rate Λ as a function of the plateau width k for two values of theplateau height w = 0 . , .
95. The sequence length is L = 100 and the mutation rateper sequence is γ = 1. The solution of the maximum principle together with the L − / -correction term (including the position-dependent mass) provides the best agreementwith the numerics. The numerical values of the growth rate have been obtained by(numerical) calculation of the largest eigenvalue of the matrix defined by equation (2). Figure 2.
Critical plateau height as a function of the plateau width k . The sequencelength is L = 500 the mutation rate per sequence γ = 1. The solution of the maximumprinciple together with the L − / -correction term provides the best agreement with thenumerics. With increasing x , the L − / and L − / -corrections approach each other.The numerical values have been obtained by monitoring the average magnetization M defined in (34) and determining the plateau height, where M jumps from a finitevalue to zero. The slight modulation of the red line arises from the finite numericalresolution of this procedure. To test the analytical predictions derived in the preceding subsections, we have carriedout a detailed numerical study of the dependence of Λ and w c on k , L and γ . Infigure 1 we show two examples for the dependence of Λ on the plateau width k . Theprediction of the asymptotic maximum principle (4) reproduces the qualitative behaviorof the numerical data but significantly overestimates the value of Λ. The L − / finitesize correction (23) derived in the harmonic approximation improves the comparison, obustness and epistasis in mutation-selection models Figure 3.
Illustration of the power of the sequence length in the correction term forfixed relative and absolute plateau width. ∆Λ is the numerical value for the growthrate Λ num less the value obtained from the maximum principle, eq. (4). For fixedrelative plateau width, as well as for fixed absolute width, the numerics show the sameexponent of the sequence length as the corresponding analytical result.
Figure 4.
Illustration of a fitness landscape with two plateaus. This type of landscapeis used to investigate the influence of height and (relative) broadness of the plateauson the population fitness Λ. but quantitative agreement is achieved only using the refined expression (31), which isproportional to L − / .Figure 2 shows a similar comparison for the critical plateau height w c . Here theprediction (26) of GH is also included and seen to match the numerical outcome onlypoorly, whereas the MP result with the finite size correction (31) produces excellentagreement. Finally, in left panel of figure 3 we verify that the finite size correction ∆Λindeed varies as L − / when L is increased at fixed relative plateau width x . The rightpanel shows the corresponding L − / dependence for fixed absolute plateau width k . obustness and epistasis in mutation-selection models
4. Fitness landscapes with competing peaks
Since we have validated the analytical results of section 3 via numerical studies, wecan now apply the analytical theory to the phenomenon of the survival of the flattestas explained in the introduction. To be specific, we want to find out whether a broadplateau outcompetes a smaller but higher one even in the limit of long sequences.In the literature this question has already been discussed to some extent by Schusterand Swetina [7]. The question can be answered by investigating a fitness landscapeconsisting of two fitness plateaus at the opposing ends of the Hamming space (see figure4). As was shown in [7], for small µ the interference between the two plateaus is negligiblewhen they are separated by a few mutational distances. The stationary state of thesystem is therefore to a very good approximation determined by the comparison betweenthe population growth rates associated with each of the two plateaus in isolation.Observing the center of mass of the population as function of the mutation rateand for fixed sequence length, we find two types of transitions. The first one is a jumpof the population from the higher to the broader plateau, which we will refer to as the selection transition [23] taking place at a mutation rate µ s . The second transition is thewell-known error threshold taking place at µ tr , where the population becomes uniformlyspread in sequence space. To analyze these transitions, an order parameter is needed.A convenient quantity is the population averaged “magnetization” defined by M = 1 − h x i ∈ [ − ,
1] with h x i = 1 L L X k =0 kP ∗ k . (34)If the whole population consists only of master sequences, the magnetization is M = 1.If only the inverse master sequence is present, the magnetization becomes M = −
1. Fora uniform distribution in sequence space (delocalized population) the magnetization is M = 0. Thus we can distinguish the qualitatively different states of the population inthe two plateau landscape by considering the population averaged magnetization M asa function of µ .As can be seen from figure 5, the selection transition (the jump between the twoplateaus) is sharp even for finite sequence lengths, whereas the error threshold is acontinuous transition for finite sequence length and only becomes sharp in the limit ofinfinite sequence length [23]. With growing sequence length the two critical mutationrates µ s and µ tr become smaller and also approach each other until, at a critical sequencelength L ∗ , the selection transition completely disappears. For sequences longer than L ∗ , the population delocalizes directly from the high, narrow peak and the low, broadplateau is never substantially populated.With the help of the maximum principle, this surprising behavior can be easilyunderstood. Using (4), the selection threshold is obtained by equating the population obustness and epistasis in mutation-selection models µ s = w − w (cid:16)p k L − k − p k L − k (cid:17) ≈ w − w √ k − √ k ) L − / , (35)where k , k ≪ L has been assumed in the last step. On the other hand, the errorthreshold µ ( i ) tr associated with plateau i = 0 , i , which gives µ ( i ) tr = w i L (cid:16) − p k i L − k i (cid:17) ≈ w i − p k i /L L − . (36)The different scaling of the two types of thresholds with sequence length implies thatfor large L the error threshold of the higher peak is encountered before the selectionthreshold, which therefore is no longer observable. The critical sequence length L ∗ where the selection transition vanishes can be estimated by equating the approximateexpressions (35) and (36), which yields L ∗ ≈ w √ k − w √ k ) ( w − w ) . (37)Following [7, 23], in our numerical work we have considered short plateaus, k = 1and k = 2, with relative fitness values w /w = 0 .
9, for which (37) give L ∗ ≈ L ∗ ; moreover, the agreementbetween theory and numerics is not substantially improved by using the full expressionsfor the principal eigenvalues Λ and Λ , including the L − / -correction derived insubsection 3.5. This is not surprising, as the continuum approach developed in section3 cannot be expected to be quantitatively accurate for plateaus sizes of order unity.For completeness we mention that for plateau widths scaling with the sequencelength (such that x = k /L and x = k /L are kept fixed as L → ∞ ) the selectiontransition is maintained at a fixed value of γ [18]. In addition to the equilibrium population distribution P ∗ k attained at long times, we canalso consider the ancestral distribution , the equilibrium distribution of the backward timeprocess, as introduced by Baake and collaborators [16, 21]. The ancestral distribution a k gives information on the origin of the equilibrium population and is obtained as theproduct of the right eigenvector P ∗ k and the left eigenvector P ∗∗ k of the mutation-selectionmatrix defined through (2), a k ∼ P ∗ k · P ∗∗ k .For the fitness landscape with two competing peaks, we find an ancestral populationthat is either located on one of the plateaus or uniformly distributed in sequence space(figure 7). The transitions beween these states are all of first order. The continuouscharacter of the error threshold transition of the equilibrium population distribution, asopposed to the discontinuous transition of the ancestral distribution, can be explainedby the growing mutational pressure affecting the population on the plateau and driving obustness and epistasis in mutation-selection models Figure 5.
Order parameter M as function of the mutation rate µ per site for afitness landscape with a high plateau at k = 0 and a broad plateau at k = L . Forshort sequence lengths one can observe a hopping of the population from the higherto the broader plateau and then a delocalization (left picture). For long sequences,one only observes the delocalization transition from the higher fitness plateau (rightpicture). The hopping between the plateaus we call the selection transition . Ittakes place at mutation rate µ s . The delocalization transition, also called errorthreshold, takes place at a mutation rate µ tr . The underlying fitness landscape is w k = 10 · Θ(1 − k ) + 9 · Θ( k − ( L − Figure 6.
Critical mutation rate µ s of the selection transition and µ tr of the errorthreshold for the fitness landscape w k = 10 · Θ(1 − k ) + 9 · Θ( k − ( L − L ∗ . The selectiontransition is observed only for sequence lengths smaller than L ∗ . The numericaldata are compared to analytic predictions based on the MP including the L − / -correction term. As before, the numerical values have been obtained by calculatingthe magnetization of the population and determining for each sequence length themutation rate where the magnetization jumps. obustness and epistasis in mutation-selection models Figure 7.
The dominant entries of the equilibrium and ancestral populationdistribution as a function of the mutation rate per site, calculated numerically. Theunderlying fitness landscape is the same two-plateau landscape used in figures 5 and 6.The sequence length is chosen as L = 50. Occupation fractions are plotted only for themost populated Hamming classes. The two distributions undergo phase transitions atthe same mutation rates, but at the error threshold the ancestral distribution undergoesa discontinuous transition, while for the equilibrium distribution the transition iscontinuous. it towards the middle of the Hamming space. Mutations cause the population to ”leakout” from the plateau. Nevertheless, the individuals maintaining the population andcompensating for the mutational loss are the ones with highest fitness, which are locatedon the plateau and make up the ancestral distribution.Before closing the analysis of plateau-shaped fitness landscapes, we want to mentionthe connection between our description and the popular language of Ising chains or semi-infinite Ising models [27, 28]. In the Ising picture, the ancestral distribution becomes thebulk distribution on a semi-infinite two-dimensional (spatial or spatio-temporal) lattice,and the equilibrium distribution becomes the distribution in the surface layer. Thisanalogy can be seen very clearly in the paper by Tarazona [23], where the different ordersof the transitions in the two distributions are explained in terms of surface wetting.
5. Epistasis and the error threshold
So far, we have discussed robustness of phenotypes using plateau-shaped fitnesslandscapes, which are a special case of the class of epistatic fitness functions. We nowwant to discuss the latter in a more general framework. Epistasis describes the non-linear dependence of the fitness function on the number of mismatches k [13]. Everyadditional mismatch is penalized harder (synergistic epistasis of deleterious mutations)or less hard (diminishing epistasis) than the previous one. Here we address the effect ofepistasis on the existence of an error threshold, which is defined for our purposes as asingularity in the dependence of the population mean fitness on the mutation rate. In obustness and epistasis in mutation-selection models w k = w − bk α , (38)where k is again the Hamming distance to the master sequence and b >
0. The epistasisexponent α takes the value α = 1 in the non-epistatic case, while α > α < α → w k = w − b (1 − δ k, ). It is well known that anerror threshold exists for α →
0, but not for α = 1 [6]. Neglecting backward mutations,Wiehe argued in [14] that an error threshold emerges whenever α <
1. In the followingwe show that, based on the maximum principle (4), the critical value of the epistasisexponent below which an error threshold develops is in fact α = 1 / L → ∞ and µ → γ = µL = const . In order to cast (38) into the form (3) required for theapplication of the maximum principle, we write w k = f ( x ) = w − ˜ bx α , with x = k/L, ˜ b = bL α (39)and the limit L → ∞ should be combined with b →
0, such that ˜ b = const . Since b can be interpreted as a kind of selection coefficient, we are thus considering a situationwhere both the mutation rate (per site) and the selection forces are small. Applying themaximum principle (4) to this landscape, the mean fitness Λ of the population in theequilibrium state is given byΛ = max x ∈ [0 , { w − ˜ bx α − γ [1 − p x (1 − x )] } ≡ max x ∈ [0 , λ ( x ) , (40)where λ ( x ) is the function inside the curly brackets.To find the condition under which the maximum is attained inside the interval x ∈ (0 , dλ/dx = 0, yielding the condition γ ˜ b (1 − x ) = αx α − / √ − x. (41)For α > / x = 0 ,
1, withan infinite slope at x = 1. As a consequence, there exists always a unique solution x c ∈ (0 ,
1) for any value of γ/ ˜ b , which describes the location of the population for L → ∞ . The location varies smoothly from x c = 0 for γ/ ˜ b → x c → / γ/ ˜ b → ∞ , and there is no error threshold. However, for α < / x = 0, and there is no solution for small γ/ ˜ b . The function λ ( x ) is thenmonotonically decreasing, which implies that the maximum in (40) is located at theboundary point x = 0 over a finite interval of γ/ ˜ b . Increasing γ/ ˜ b the function λ ( x )develops a local maximum, which eventually exceeds the boundary value λ (0) = w − γ .At this point the population discontinuously delocalizes to an interior point x c ∈ (0 , obustness and epistasis in mutation-selection models Figure 8.
Magnetization as a function of mutation rate for the fitness landscape(38) with epistasis exponent α = 0 . α = 0 .
52, respectively. For α = 0 . α = 0 .
52, it changessmoothly. Calculations have been done for a sequence length of L = 500. The error threshold condition is of the form γ/ ˜ b = g ( α ) where the function g ( α ) is notexplicitly available. This translates into the expression µ tr = γ tr L = ˜ bg ( α ) L = g ( α ) bL α − , (42)for the critical mutation rate µ tr . This scaling of µ tr with L was also obtained in [14].In the sharp peak limit α → γ/ ˜ b = γ/b = 1, which impliesthat g (0) = 1. On the other hand, for α = 1 / λ ( x ) near x = 0 reads λ ( x ) ≈ w − (˜ b − γ ) x / − γx / , (43)which shows that g (1 /
2) = 1 /
2. For γ/ ˜ b > / x c = (2 − ˜ b/γ ) /
3, which moves continuously away from x = 0. In the language of phasetransitions, α = 1 / α < / α < / α > / M as a function of γ for two different cases. The magnetization displays a non-analytic jump for α < / α > /
2. In figure 9 we show the error threshold as a function γ/ ˜ b = g ( α ), which interpolates between the limits g (0) = 1 and g (1 /
2) = 1 / α = 1 / x matchesthat of the “entropic” term ∼ p x (1 − x ) in the maximum principle (4). Since a similarterm appears also for general alphabet sizes [see (50)], the considerations of this sectionhold in that case as well. obustness and epistasis in mutation-selection models Figure 9.
Numerically determined phase diagram for the epistatic fitness landscape(38). At the thick line the population undergoes a first order phase transition from astate localized at x c = 0 (below the line) to a delocalized state x c > α = 1 /
2. The deviationfrom the prediction γ/ ˜ b = g (1 /
2) = 1 / α = 1 / α > /
2, the populationchanges smoothly. Calculations have been performed for a sequence lengths of L = 750.The slight modulation of the line is due to the finite numerical step size.
6. Conclusions
In this paper we discussed the properties of epistatic fitness landscapes, with particularemphasis on mesa landscapes describing mutational robustness of phenotypes, whichhave been studied previously in the context of regulatory motifs [3, 4]. As populationevolution model we used the continuous time Crow-Kimura model, which is aquasispecies model for asexual and haploid organisms, and analysed its stationary statesfor sequences consisting of two letters. As explained in Appendix B, it is straightforwardto generalize our results to sequence alphabets of general size. Similarly, the extension todiscrete time dynamics can be carried out by replacing the Malthusian fitness landscape w k by its Wrightian counterpart W k ∼ exp( w k ) [6, 18, 19].We reviewed two existing approaches [3, 16, 18] to this problem and explainedthe discrepancy between their predictions by extending the approach of Gerland andHwa [3] beyond the harmonic approximation. Based on a quantum mechanical analogywe derived a novel finite size correction term to the maximum principle of [16], whichsignificantly improves the agreement with numerical calculations. Our central result isthat the relative number of tolerable mismatches x = k /L is the relevant parameterfor the fitness effect of mutational robustness, and we provide accurate formulae for itsquantitative evaluation. As a consequence, we showed that the selection transition firstdescribed by Schuster and Swetina [7] disappears for long sequences.Finally, in section 5, we discussed more general forms of epistatic fitness landscapes obustness and epistasis in mutation-selection models α < Acknowledgements
We are grateful to Alexander Altland, Ellen Baake, Ulrich Gerland, Kavita Jain, LucaPeliti and David Saakian for useful discussions. This work was supported by DFG withinthe Bonn Cologne Graduate School of Physics and Astronomy, SFB 680
Molecular basisof evolutionary innovations and SFB/TR12
Symmetries and Universality in MesoscopicSystems . Appendix A: The large deviations approach
We start by symmetrizing the eigenvalue problem (2). The discrete analogue of thetransformation (14) is Q k = (cid:18) Lk (cid:19) / P ∗ k , (44)which leads toΛ Q k = ( w k − γ ) Q k + µ p ( L − k )( k + 1) Q k +1 + µ p ( L − k + 1) k Q k − . (45)Following [20], we now perform the continuum limit by making a large deviations ansatzfor Q k , Q k = Q xL = ψ ( x ) = exp[ − ǫ − u ( x )] (46)with ǫ = 1 /L . Inserting this into (45) one finds(Λ − f ( x ) + γ ) ψ = 2 γ p x (1 − x ) cosh[ u ′ ] ψ. (47)Cancelling ψ on both sides yields a Hamilton-Jacobi equation for the ‘action’ u ( x ),with u ′ = du/dx playing the role of a canonical momentum [20]. In order to cast (47)into the form of a Schr¨odinger equation, we expand the momentum-dependent factor toquadratic order, cosh( u ′ ) ≈ u ′ ) /
2, and make use of the relation( u ′ ) = ǫ ψ − d ψdx , (48)which follows from (46) to leading order in ǫ . Inserting this into (47) results in (28). Appendix B: General alphabet size
Here we show how our results generalize to the case where the symbols in the geneticsequence are taken from an alphabet of
A > A = 4).We assume a uniform point mutation rate µ connecting any two of the A possible statesof a site in the sequence, and a fitness function w k that depends on the relative number of obustness and epistasis in mutation-selection models − w k ) P ∗ k = (49)= ( A − γ (cid:20) k + 1 A − P ∗ k +1 + ( L − k + 1) P ∗ k − − (cid:18) L − k + kA − (cid:19) P ∗ k (cid:21) . Applying the results of [17, 29, 30] to this problem we find that, asymptotically for large L , the principal eigenvalue is given by the maximum principleΛ = max x ∈ [0 , ( f ( x ) − ( A − γ "(cid:18) − ( A − xA − (cid:19) − p x (1 − x ) √ A − . (50)For the case of the mesa landscape (6) this implies that the population is localized nearthe optimal genotype for w > w c with w c = γ ( A − " − ( A − x A − − p x (1 − x ) √ A − . (51)The right hand side is a monotonically decreasing function of x which vanishes at x = 1 − /A . For fixed x it is an increasing function of A . References [1] J. A. G. M. de Visser et al. Perspective: Evolution and detection of genetic robustness.
Evolution ,57:1959–1972, 2003.[2] D. C. Krakauer and J. B. Plotkin. Redundancy, antiredundancy, and the robustness of genomes.
Proc. Nat. Acad. Sci. USA , 99:1405–1409, 2002.[3] U. Gerland and T. Hwa. On the selection and evolution of regulatory DNA motifs.
J. Mol. Evol. ,55:386–400, 2002.[4] J. Berg, S. Willmann, and M. L¨assig. Adaptive evolution of transcription factor binding sites.
BMC Evolutionary Biology , 4:42, 2004.[5] L. Peliti. Quasispecies evolution in general mean-field landscapes.
Europhysics Letters , 57:745,2002.[6] K. Jain and J. Krug. Adaptation in simple and complex fitness landscapes. In U. Bastolla,M. Porto, H. E. Roman, and M. Vendruscolo, editors,
Structural approaches to sequenceevolution: Molecules,networks and populations . Springer, Berlin, 2007.[7] P. Schuster and J. Swetina. Stationary mutant distributions and evolutionary optimization.
Bulletin of Mathematical Biology , 50:635–660, 1988.[8] C. O. Wilke, J. L. Wang, C. Ofria, R. E. Lenski, and C. Adami. Evolution of digital organismsat high mutation rates leads to survival of the flattest.
Nature , 412:331–333, 2001.[9] F. M. Codoner, J.-A. Daros, R. V. Sol´e, and S. F. Elena. The fittest versus the flattest:Experimental confirmation of the quasispecies effect with subviral pathogens.
PLoS Pathogens ,2:e136, 2006.[10] R. Sanjuan, J. M. Cuevas, V. Furio, E. C. Holmes, and A. Moya. Selection for robustness inmutagenized RNA viruses.
PLoS Genetics , 3:e93, 2007.[11] M. Rolland, C. Brander, D. C. Nickle, J. T. Herbeck, G. S. Gottlieb, M. S. Campbell, B. S.Maust, and J. I. Mullins. HIV-1 over time: fitness loss or robustness gain?
Nature ReviewsMicrobiology , 5, 2007.[12] J. Sardany´es, S.F. Elena, and R.V. Sol´e. Simple quasispecies models for the survival-of-the-flattesteffect: The role of space.
J. Theor. Biol. , 250:560–568, 2008. obustness and epistasis in mutation-selection models [13] P. C. Phillips, S. P. Otto, and M. C. Whitlock. Beyond the average: The evolutionary importanceof gene interactions and variability of epistatic effects. In J. B. Wolf, E. D. Brodie III, and M. J.Wade, editors, Epistasis and the evolutionary process . Oxford University Press, Oxford, 2000.[14] T. Wiehe. Model dependency of error thresholds: the role of fitness functions and contrastsbetween the finite and infinte sites models.
Genet. Res. Camb. , 69:127–136, 1997.[15] K. Jain. Loss of least-loaded class in asexual populations due to drift and epistasis.
Genetics ,179:2125, 2008.[16] J. Hermisson, O. Redner, H. Wagner, and E. Baake. Mutation-selection balance: Ancestry, load,and maximum principle.
Theor. Pop. Biol. , 62:9–46, 2002.[17] E. Baake, M. Baake, A. Bovier, and M. Klein. An asymptotic maximum principle for essentiallylinear evolution models.
J. Math. Biol. , 50:83–114, 2005.[18] D. B. Saakian and C.-K. Hu. Exact solution of the Eigen model with general fitness functions anddegradation rates.
Proc. Nat. Acad. Sci. USA , 103:4935–4939, 2006.[19] J. M. Park and M. W. Deem. Schwinger boson formulation and solution of the Crow-Kimura andEigen models of quasispecies theory.
J. Stat. Phys. , 125:975, 2006.[20] D. B. Saakian. A new method for the solution of model of biological evolution: Derivation ofexact steady-state distributions.
J. Stat. Phys. , 128:781–798, 2007.[21] E. Baake and H.-O. Georgii. Mutation, selection, and ancestry in branching models: a variationalapproach.
J. Math. Biol. , 54:257, 2007.[22] K. Sato and K. Kaneko. Evolution equation of phenotype distribution: General formulation andapplication to error catastrophe.
Phys. Rev. E , 75:061909, 2007.[23] P. Tarazona. Error thresholds for molecular quasispecies as phase transitions: From simplelandscapes to spin-glass models.
Phys. Rev. A , 45:6038, 1992.[24] C. J. Thompson and J. L. McBride. On Eigen’s theory of the self-organization of matter and theevolution of biological macromolecules.
Mathematical Biosciences , 21:127, 1974.[25] D. Sornette.
Critical Phenomena in Natural Sciences . Springer, Berlin, 2000.[26] H. Rollnik.
Quantentheorie I . Vieweg, Wiesbaden, 1995.[27] I. Leuth¨ausser. Statistical mechanics of Eigen’s evolution model.
J. Stat. Phys. , 48:343, 1987.[28] E. Baake and H. Wagner. Mutation-selection models solved exactly with methods from statisticalmechanics.
Genet. Res. , 78:93, 2001.[29] T. Garske and U. Grimm. A maximum principle for the mutation-selection equilibrium ofnucleotide sequences.
Bull. Math. Biol. , 66:397, 2004.[30] T. Garske and U. Grimm. Maximum principle and mutation thresholds for four-letter sequenceevolution.