[PDF] Robustness and epistasis in mutation-selection models

Abstract

We investigate the fitness advantage associated with the robustness of a phenotype against deleterious mutations using deterministic mutation-selection models of quasispecies type equipped with a mesa shaped fitness landscape. We obtain analytic results for the robustness effect which become exact in the limit of infinite sequence length. Thereby, we are able to clarify a seeming contradiction between recent rigorous work and an earlier heuristic treatment based on a mapping to a Schrödinger equation. We exploit the quantum mechanical analogy to calculate a correction term for finite sequence lengths and verify our analytic results by numerical studies. In addition, we investigate the occurrence of an error threshold for a general class of epistatic landscape and show that diminishing epistasis is a necessary but not sufficient condition for error threshold behavior.

Full PDF

aa r X i v : . [ q - b i o . P E ] J a n Robustness and epistasis in mutation-selectionmodels

Andrea Wolﬀ and Joachim Krug

Institut f¨ur Theoretische Physik, Universit¨at zu K¨oln, Z¨ulpicher Str. 77, 50937 K¨oln,GermanyE-mail: [email protected] and [email protected]

Abstract.

We investigate the ﬁtness advantage associated with the robustness of aphenotype against deleterious mutations using deterministic mutation-selection modelsof quasispecies type equipped with a mesa shaped ﬁtness landscape. We obtain analyticresults for the robustness eﬀect which become exact in the limit of inﬁnite sequencelength. Thereby, we are able to clarify a seeming contradiction between recent rigorouswork and an earlier heuristic treatment based on a mapping to a Schr¨odinger equation.We exploit the quantum mechanical analogy to calculate a correction term for ﬁnitesequence lengths and verify our analytic results by numerical studies. In addition, weinvestigate the occurrence of an error threshold for a general class of epistatic landscapeand show that diminishing epistasis is a necessary but not suﬃcient condition for errorthreshold behavior.

1. Introduction

In current evolutionary theory, the concept of robustness , referring to the invarianceof the phenotype under pertubations, is of central importance [1, 2]. Here we addressspeciﬁcally mutational robustness , which we take to imply the stability of some biologicalfunction with respect to mutations away from the optimal genotype. To be precise,suppose the genotype is encoded by a sequence of length L , and the number ofmismatches with respect to the optimal genotype is denoted by k . Robustness is thenquantiﬁed by the maximum number of mismatches k , that can be tolerated before theﬁtness of the individual falls signiﬁcantly below that of the optimal genotype at k = 0.This situation arises e.g. in the evolution of regulatory motifs, where the ﬁtness is afunction of the binding aﬃnity to the regulatory protein [3, 4]. Assuming that the ﬁtnessis independent of k both for k ≤ k and for k > k , the ﬁtness landscape has the shapeof a mesa parametrized by its width k and height w [5].We consider deterministic mutation-selection models of quasispecies type, whichdescribe the dynamics of large (eﬀectively inﬁnite) populations [6]. We analyse thestationary state of mutation-selection balance, focusing on the dependence of thepopulation ﬁtness on the parameters k and w . This allows us to identify the conditionsunder which a broad ﬁtness peak of relatively low selective advantage outcompetes a obustness and epistasis in mutation-selection models survivalof the ﬂattest [8, 9, 10, 11, 12]. Our central aim is to obtain analytic results for therobustness eﬀect that become exact in the limit of long sequences. In particular, wewant to clarify whether the selective advantage is a function primarily of the relative number of tolerable mismatches x = k /L , or of the total number of mismatches k .Robustness in the sense described above is a special case of epistasis , which refersmore generally to any nonlinear relationship between the number of mutations awayfrom the optimal genotype and the corresponding ﬁtness eﬀect [13]. A simple way toparametrize epistasis is to let the loss of ﬁtness vary with the number of mismatchesas k α , such that the non-epistatic case α = 1 separates regimes of synergistic ( α > α <

1) epistasis [14, 15]. An important problem in previous workon mutation-selection models has been to identify the conditions under which epistaticﬁtness landscapes display an error threshold , a term that refers to the discontinuousdelocalization of the population from the vicinity of the ﬁtness peak as the mutationrate is increased beyond a critical value [6, 16]. Improving on earlier work that foundthat only landscapes with diminishing epistasis ( α <

1) have an error threshold, wederive here the more stringent condition α ≤ / We base our work on two complementary analytic approaches. First, recent progressin the theory of mutation-selection models [5, 16, 17, 18, 19, 20, 21, 22] providesan expression for the population ﬁtness in terms of a maximum principle (MP) thatbecomes exact when the limit L → ∞ is performed keeping the ratio x = k /L ﬁxed.Second, Gerland and Hwa (GH) [3] have used a drift-diﬀusion approximation to mapthe mutation-selection problem onto a one-dimensional Schr¨odinger equation which isthen analyzed with standard techniques.Our work was initially motivated by the observation of a discrepancy between thetwo approaches: Whereas the MP predicts that the selective advantage of a broad mesashould vanish when the limit L → ∞ is taken at ﬁxed k , in the GH approach a ﬁniteselective advantage is retained in this limit, which depends on the absolute value of k rather than on x . After introducing the model and brieﬂy reviewing the results of theMP approach in section 2, we therefore provide a detailed discussion of the drift-diﬀusionapproximation used by GH in section 3. We emphasize that it amounts to a harmonicapproximation, and show how it can be improved in such a way that the results basedon the MP are recovered.The mapping to one-dimensional quantum mechanics is nevertheless useful, as itallows us to derive the leading ﬁnite size correction to the population ﬁtness. As aconsequence we ﬁnd excellent agreement between the analytic predictions and numericalsolutions of the discrete mutation-selection equations. In section 4 we consider theselection transition in a two-peak landscape ﬁrst studied by Schuster and Swetina[7, 23], in which the population shifts from a high, narrow ﬁtness maximum to a lower obustness and epistasis in mutation-selection models α and verify our prediction by numerical calculations. Finally,some conclusions are presented in section 6. Details of the derivation of the improvedcontinuum approximation and the generalization to arbitrary alphabet size can be foundin two appendices.

2. Mutation-selection models and the maximum principle

We consider the simplest case of binary sequences and adopt continuous time dynamicsof the Crow-Kimura type, in which the mutation and selection terms act in parallel[6]. Point mutations occur at rate µ , and the (Malthusian) ﬁtness is assumed from theoutset to be a function w k only of the Hamming distance k to the optimal sequence at k = 0. The population structure is described by the fraction P k ( t ) of individuals with k mismatches, which satisﬁes the evolution equation dP k dt = ( w k − ¯ w ) P k + µ ( k + 1) P k +1 + µ ( L − k + 1) P k − − µLP k . (1)with 1 ≤ k ≤ L − k = 0 and k = L . The nonlinearityintroduced by the mean ﬁtness ¯ w ( t ) = P k w k P k can be eliminated by passing tounnormalized population variables [6, 24]. At long times the population distributiontherefore approaches the principal eigenvector P ∗ k of the linear dynamics, which is thesolution of the eigenvalue problemΛ P ∗ k = ( w k − µL ) P ∗ k + µ ( k + 1) P ∗ k +1 + µ ( L − k + 1) P ∗ k − (2)with the maximal eigenvalue Λ. This eigenvalue is equal to the long-time limit ofthe mean population ﬁtness ¯ w , and it is the main quantity of interest in this paper.Depending on the context we will refer to Λ as the mean population ﬁtness, thepopulation growth rate, the principal eigenvalue of the mutation-selection matrix deﬁnedby (2) or the ground state energy of the corresponding quantum mechanical problem,to be deﬁned in subsection 3.2.A considerable body of work has been devoted to the solution of (2) for large L . Inorder to obtain nontrivial behavior in the limit L → ∞ , it is necessary to either scalethe mutation rate ∼ /L or the ﬁtness ∼ L . We adopt here the ﬁrst choice and take L → ∞ , µ → γ = µL ﬁxed. If, in addition, the ﬁtness landscape w k is assumedto depend only on the relative number of mismatches, such that w k = f ( x ) , x = k/L (3)the principal eigenvalue in (2) is given, for L → ∞ , by the solution of a one-dimensionalvariational problem as [16, 17, 18, 19, 20, 21]Λ = max x ∈ [0 , { f ( x ) − γ [1 − p x (1 − x )] } ; (4) obustness and epistasis in mutation-selection models f ( x ) is diﬀerentiable the leadingorder correction to (4) takes the form [19, 20]∆Λ = γ L p x c − x c [1 − q − f ′′ ( x ∗ )( x c − x c ) / /γ ] , (5)where x c is the value at which the maximum in (4) is attained.In the ﬁrst part of this paper we focus on mesa landscapes of the form w k = ( w > ≤ k ≤ k k > k , (6)where w is the selective advantage of the functional phenotype and k denotes thenumber of tolerable mismatches. Within the class of scaling landscapes (3), this isrealized by setting f ( x ) = w θ ( x − x ) , (7)where θ is the Heaviside step function and x = k /L . Provided x < /

2, applicationof the maximum principle (4) yieldsΛ = ( w − γ (1 − p x (1 − x )) : w > w c = γ (1 − p x (1 − x ))0 : w < w c . (8)The value w c of the selective advantage marks the location of the error threshold at whichthe population delocalizes from the ﬁtness peak and the location x c of the maximum in(4) jumps from x c = x to x c = 1 / L − / or L − / rather than L − in this case.

3. Continuum limit and the drift-diﬀusion equation

A natural approach to analyzing (1) and (2) for large L is to perform a continuum limitin the index k . To this end we introduce ǫ = 1 /L as small parameter and replace thepopulation variable P k by a function φ ( x ) = lim L →∞ P xL . (9)The ﬁtness is taken to be of the general form (3). Expanding the ﬁnite diﬀerences in(2) to second order in ǫ then yields the stationary drift-diﬀusion equation f φ − ǫγ ddx (1 − x ) φ + ǫ γ d dx φ = Λ φ. (10)This is identical to the equation obtained by GH [3], who however write it in terms ofthe unscaled variable k = Lx . obustness and epistasis in mutation-selection models f = 0) theprincipal eigenvalue in (10) is readily seen to be Λ = 0, and the corresponding (right)eigenfunction is a Gaussian centered at x = 1 / φ ( x ) ∼ exp[ − (1 − x ) / ǫ ] . (11)This is just the central limit approximation to the binomial distribution P k = 2 − L (cid:18) Lk (cid:19) (12)which solves (2) for w k = 0 and Λ = 0. It is well known that the central limitapproximation of (12) is accurate in a region of size √ L around k = L/

2, but becomesimprecise for deviations of order L . An improved approximation is provided by thetheory of large deviations [25], in which the ansatz P k ∼ exp[ − Lu ( x )] (13)is made to obtain an expression for the large deviation function u ( x ). In the context ofmutation-selection models, this approach has recently been introduced by Saakian [20],who showed that it allows to derive the exact relation (4) in a relatively straightforwardmanner (see Appendix A). Equivalent results can be obtained by continuing theexpansion in (10) to all orders in ǫ and treating the resulting equation in a WKB-typeapproximation, which essentially corresponds to the ansatz (13), see [22].We conclude that the drift-diﬀusion approximation (10) can be expected to bequantitatively accurate only near the center x = 1 / The key step in reducing (10) to standard form is to symmetrize the linear operator onthe left hand side, thus eliminating the ﬁrst-order drift term. This can be achieved bythe transformation φ ( x ) = p φ ( x ) ψ ( x ) , (14)with φ ( x ) from (11), which leads to the stationary Schr¨odinger equation − ǫ γ d dx ψ + V ( x ) ψ = − (Λ − ǫγ ) ψ (15)with the eﬀective potential V ( x ) = γ − x ) − f ( x ) . (16)The latter consists of the superposition of a harmonic oscillator centered around x = 1 / obustness and epistasis in mutation-selection models ǫ plays the role of Planck’s constant ~ , which implies that the case of interest is the semiclassical limit of the quantum mechanical problem. In particular, for ǫ → − Λ becomes equal to the minimum of the eﬀective potential. Wethus arrive at the variational principleΛ = max x ∈ [0 , [ f ( x ) − γ − x ) ] , (17)which is precisely the harmonic approximation (in the sense of a quadratic expansionaround x = 1 /

2) of the exact relation (4). In this perspective the error thresholdcorresponds to a shift between diﬀerent local minima of V ( x ), which become degenerateat the transition point. The transition is generally of ﬁrst order, in the sense thatthe location x c of the global minimum jumps discontinuously. Within the harmonicapproximation the transition for the mesa landscape occurs at w c = γ − x ) ≈ γ (cid:18) − k L (cid:19) (18)when x = k /L ≪ For small but ﬁnite ǫ , quantum corrections to the classical limit (17) have to be takeninto account. If f ( x ) is smooth, the ground state wave function is localized near theminimum x c of the eﬀective potential, and the shift in the ground state energy can becomputed by replacing V ( x ) by a harmonic well, V ( x ) ≈ V ( x c ) + 12 V ′′ ( x c )( x − x c ) = V ( x c ) + 12 [4 γ − f ′′ ( x c )]( x − x c ) . (19)Identifying 1 /γ with the mass m of the quantum particle [compare to (15)], we see thatthis corresponds to a harmonic oscillator of frequency ω = 2 γ p − f ′′ ( x c ) / γ . Theground state energy ǫω/

2, together with the shift ǫγ on the right hand side of (15), thusgives rise to the leading order correction∆Λ = γL [1 − p − f ′′ ( x c ) / γ ] , (20)which coincides with (5) evaluated for x c ≈ /

2. Similarly, the width of the wavefunction is given by ‡ ξ = p γǫ/ ω = √ ǫγ [8 p − f ′′ ( x c ) / γ ] / . (21)In the case of the mesa landscape (7), the potential near x c = x consists of a linearramp of slope − a = V ′ ( x ) = 2 γ (2 x − < w . For small ǫ , the jump can be considered as eﬀectivelyinﬁnite (as the kinetic energy of the particle is then very small), and the corresponding ‡ Note that, because of the factor √ φ in (14), this is not equal to the width of the stationary populationdistribution. obustness and epistasis in mutation-selection models z ( ~ / m ) / a / = 2 / z γ (1 − x ) / L − / + O ( L − ) , (23)where z ≈ − . ... is the ﬁrst zero of the Airy function. The scaling ∆Λ ∼ L − / was already noted in [5]. The width of the wave function can be estimated to be of theorder ξ ∼ ( ~ /ma ) / ∼ ǫ / (24)in this case. We are now prepared to make contact with the approach of GH [3]. Assuming fromthe outset that the maximal number of mismatches is small compared to the sequencelength, 1 ≪ k ≪ L , they neglect the contribution 2 x in the drift term on the left handside of (10). The linear operator can then be symmetrized by the transformation φ ( x ) = e x/ǫ ψ ( x ) , (25)which is obtained from (14) by neglecting the terms quadratic in x in φ . This leads toa Schr¨odinger problem similar to (15), but with a potential that diﬀers from − f ( x ) onlyby the constant term γ/

2. The error threshold is determined by the point at which thedecay of the wave function ψ ( x ) matches the exponential factor e x/ǫ in (25), such that φ ( x ) ceases to be normalizable § . For k ≫ w c = γ (cid:18) π k (cid:19) , (26)which depends on the absolute number of mismatches k , but is independent of L .To reconcile this with the result (18), we note that the semiclassical approximationmust break down when the width of the semiclassical wave function, as estimated insubsection 3.3, becomes comparable to the width of the potential well provided by theﬁtness function. For the discontinuous mesa landscape this occurs when ξ ∼ ǫ / ∼ x = ǫk ⇒ k ∼ ǫ − / = L / . (27)For a mesa that is shorter than L / , the energy of the wave function is determinedby its conﬁnement on the scale x , and it can be estimated from standard quantummechanical considerations to be of the order of ~ / ( mx ) ∼ γǫ /x ∼ γ/k . For k ≪ L / this supersedes the contribution ∼ k /L on the right hand side of (18). Weconclude, therefore, that the leading “quantum” correction to the ”classical” eigenvalueΛ = w − γ/ negative contribution proportional to γ/k , which leads to a § This requires ψ to decay on a scale of order unity in unscaled coordinates at the transition, which isactually inconsistent with the assumption of slow variation on the scale of the sequence index k thatunderlies the continuum approximation. obustness and epistasis in mutation-selection models w c , in qualitative agreement with (26). For smooth ﬁtnesslandscapes the breakdown of the semiclassical regime occurs already at k ∼ L / , butthe condition for the conﬁnement energy contribution γ/k to dominate the k /L -termin (18) still reads k ≪ L / . So far, we have worked in the harmonic approximation around x = 1 /

2, which breaksdown near the boundaries x = 0 and x = 1. However, to access the regime 1 ≪ k ≪ L considered by GH, an accurate treatment of the region of small x ≪ ≤ x ≤ − ǫ γ p x (1 − x ) d dx ψ + h γ (1 − p x (1 − x )) − f ( x ) i ψ = − Λ ψ, (28)which diﬀers from (15) in two respects. First, the potential (16) is replaced by V full ( x ) = γ (1 − p x (1 − x )) − f ( x ) . (29)In the asymptotic limit ǫ → V full ,which exactly recovers the maximum principle (4). Second, the mass of the quantumparticle described by (28) becomes position dependent, m ( x ) b = (cid:16) γ p x (1 − x ) (cid:17) − , (30)which replaces the simple identiﬁcation m b = 1 /γ in the harmonic case. Inserting (29)and (30) into the expression (23) for the ﬁnite size correction yields∆Λ = 2 − / z ǫ − / γ (1 − x ) / [ x (1 − x )] − / . (31)For ﬁxed x this still scales as ǫ / = L − / , but when taking L → ∞ at ﬁxed k , suchthat x →

0, we ﬁnd instead that∆Λ → − / z γx − / ǫ / = 2 − / z γk − / L − / . (32)We next revisit the considerations of subsection 3.4. The width of the ground statewave function is of order ξ ∼ ( ~ /ma ) / , where both m and a now diverge as x − / for x →

0. Consequently (24) is replaced by ξ ∼ ( ǫ x ) / = ǫk / , (33)and we see that the condition ξ ≫ ǫk for the breakdown of the semiclassicalapproximation is never be satisﬁed. We conclude that the quantum conﬁnement regimediscussed in subsection 3.4 in fact does not exist, and hence the improved semiclassicalexpression (31) for the ﬁnite size correction is expected to remain valid for all k andall L , provided that k , L ≫ obustness and epistasis in mutation-selection models Figure 1.

Growth rate Λ as a function of the plateau width k for two values of theplateau height w = 0 . , .

95. The sequence length is L = 100 and the mutation rateper sequence is γ = 1. The solution of the maximum principle together with the L − / -correction term (including the position-dependent mass) provides the best agreementwith the numerics. The numerical values of the growth rate have been obtained by(numerical) calculation of the largest eigenvalue of the matrix deﬁned by equation (2). Figure 2.

Critical plateau height as a function of the plateau width k . The sequencelength is L = 500 the mutation rate per sequence γ = 1. The solution of the maximumprinciple together with the L − / -correction term provides the best agreement with thenumerics. With increasing x , the L − / and L − / -corrections approach each other.The numerical values have been obtained by monitoring the average magnetization M deﬁned in (34) and determining the plateau height, where M jumps from a ﬁnitevalue to zero. The slight modulation of the red line arises from the ﬁnite numericalresolution of this procedure. To test the analytical predictions derived in the preceding subsections, we have carriedout a detailed numerical study of the dependence of Λ and w c on k , L and γ . Inﬁgure 1 we show two examples for the dependence of Λ on the plateau width k . Theprediction of the asymptotic maximum principle (4) reproduces the qualitative behaviorof the numerical data but signiﬁcantly overestimates the value of Λ. The L − / ﬁnitesize correction (23) derived in the harmonic approximation improves the comparison, obustness and epistasis in mutation-selection models Figure 3.

Illustration of the power of the sequence length in the correction term forﬁxed relative and absolute plateau width. ∆Λ is the numerical value for the growthrate Λ num less the value obtained from the maximum principle, eq. (4). For ﬁxedrelative plateau width, as well as for ﬁxed absolute width, the numerics show the sameexponent of the sequence length as the corresponding analytical result.

Figure 4.

Illustration of a ﬁtness landscape with two plateaus. This type of landscapeis used to investigate the inﬂuence of height and (relative) broadness of the plateauson the population ﬁtness Λ. but quantitative agreement is achieved only using the reﬁned expression (31), which isproportional to L − / .Figure 2 shows a similar comparison for the critical plateau height w c . Here theprediction (26) of GH is also included and seen to match the numerical outcome onlypoorly, whereas the MP result with the ﬁnite size correction (31) produces excellentagreement. Finally, in left panel of ﬁgure 3 we verify that the ﬁnite size correction ∆Λindeed varies as L − / when L is increased at ﬁxed relative plateau width x . The rightpanel shows the corresponding L − / dependence for ﬁxed absolute plateau width k . obustness and epistasis in mutation-selection models

4. Fitness landscapes with competing peaks

Since we have validated the analytical results of section 3 via numerical studies, wecan now apply the analytical theory to the phenomenon of the survival of the ﬂattestas explained in the introduction. To be speciﬁc, we want to ﬁnd out whether a broadplateau outcompetes a smaller but higher one even in the limit of long sequences.In the literature this question has already been discussed to some extent by Schusterand Swetina [7]. The question can be answered by investigating a ﬁtness landscapeconsisting of two ﬁtness plateaus at the opposing ends of the Hamming space (see ﬁgure4). As was shown in [7], for small µ the interference between the two plateaus is negligiblewhen they are separated by a few mutational distances. The stationary state of thesystem is therefore to a very good approximation determined by the comparison betweenthe population growth rates associated with each of the two plateaus in isolation.Observing the center of mass of the population as function of the mutation rateand for ﬁxed sequence length, we ﬁnd two types of transitions. The ﬁrst one is a jumpof the population from the higher to the broader plateau, which we will refer to as the selection transition [23] taking place at a mutation rate µ s . The second transition is thewell-known error threshold taking place at µ tr , where the population becomes uniformlyspread in sequence space. To analyze these transitions, an order parameter is needed.A convenient quantity is the population averaged “magnetization” deﬁned by M = 1 − h x i ∈ [ − ,

1] with h x i = 1 L L X k =0 kP ∗ k . (34)If the whole population consists only of master sequences, the magnetization is M = 1.If only the inverse master sequence is present, the magnetization becomes M = −

1. Fora uniform distribution in sequence space (delocalized population) the magnetization is M = 0. Thus we can distinguish the qualitatively diﬀerent states of the population inthe two plateau landscape by considering the population averaged magnetization M asa function of µ .As can be seen from ﬁgure 5, the selection transition (the jump between the twoplateaus) is sharp even for ﬁnite sequence lengths, whereas the error threshold is acontinuous transition for ﬁnite sequence length and only becomes sharp in the limit ofinﬁnite sequence length [23]. With growing sequence length the two critical mutationrates µ s and µ tr become smaller and also approach each other until, at a critical sequencelength L ∗ , the selection transition completely disappears. For sequences longer than L ∗ , the population delocalizes directly from the high, narrow peak and the low, broadplateau is never substantially populated.With the help of the maximum principle, this surprising behavior can be easilyunderstood. Using (4), the selection threshold is obtained by equating the population obustness and epistasis in mutation-selection models µ s = w − w (cid:16)p k L − k − p k L − k (cid:17) ≈ w − w √ k − √ k ) L − / , (35)where k , k ≪ L has been assumed in the last step. On the other hand, the errorthreshold µ ( i ) tr associated with plateau i = 0 , i , which gives µ ( i ) tr = w i L (cid:16) − p k i L − k i (cid:17) ≈ w i − p k i /L L − . (36)The diﬀerent scaling of the two types of thresholds with sequence length implies thatfor large L the error threshold of the higher peak is encountered before the selectionthreshold, which therefore is no longer observable. The critical sequence length L ∗ where the selection transition vanishes can be estimated by equating the approximateexpressions (35) and (36), which yields L ∗ ≈ w √ k − w √ k ) ( w − w ) . (37)Following [7, 23], in our numerical work we have considered short plateaus, k = 1and k = 2, with relative ﬁtness values w /w = 0 .

9, for which (37) give L ∗ ≈ L ∗ ; moreover, the agreementbetween theory and numerics is not substantially improved by using the full expressionsfor the principal eigenvalues Λ and Λ , including the L − / -correction derived insubsection 3.5. This is not surprising, as the continuum approach developed in section3 cannot be expected to be quantitatively accurate for plateaus sizes of order unity.For completeness we mention that for plateau widths scaling with the sequencelength (such that x = k /L and x = k /L are kept ﬁxed as L → ∞ ) the selectiontransition is maintained at a ﬁxed value of γ [18]. In addition to the equilibrium population distribution P ∗ k attained at long times, we canalso consider the ancestral distribution , the equilibrium distribution of the backward timeprocess, as introduced by Baake and collaborators [16, 21]. The ancestral distribution a k gives information on the origin of the equilibrium population and is obtained as theproduct of the right eigenvector P ∗ k and the left eigenvector P ∗∗ k of the mutation-selectionmatrix deﬁned through (2), a k ∼ P ∗ k · P ∗∗ k .For the ﬁtness landscape with two competing peaks, we ﬁnd an ancestral populationthat is either located on one of the plateaus or uniformly distributed in sequence space(ﬁgure 7). The transitions beween these states are all of ﬁrst order. The continuouscharacter of the error threshold transition of the equilibrium population distribution, asopposed to the discontinuous transition of the ancestral distribution, can be explainedby the growing mutational pressure aﬀecting the population on the plateau and driving obustness and epistasis in mutation-selection models Figure 5.

Order parameter M as function of the mutation rate µ per site for aﬁtness landscape with a high plateau at k = 0 and a broad plateau at k = L . Forshort sequence lengths one can observe a hopping of the population from the higherto the broader plateau and then a delocalization (left picture). For long sequences,one only observes the delocalization transition from the higher ﬁtness plateau (rightpicture). The hopping between the plateaus we call the selection transition . Ittakes place at mutation rate µ s . The delocalization transition, also called errorthreshold, takes place at a mutation rate µ tr . The underlying ﬁtness landscape is w k = 10 · Θ(1 − k ) + 9 · Θ( k − ( L − Figure 6.

Critical mutation rate µ s of the selection transition and µ tr of the errorthreshold for the ﬁtness landscape w k = 10 · Θ(1 − k ) + 9 · Θ( k − ( L − L ∗ . The selectiontransition is observed only for sequence lengths smaller than L ∗ . The numericaldata are compared to analytic predictions based on the MP including the L − / -correction term. As before, the numerical values have been obtained by calculatingthe magnetization of the population and determining for each sequence length themutation rate where the magnetization jumps. obustness and epistasis in mutation-selection models Figure 7.

The dominant entries of the equilibrium and ancestral populationdistribution as a function of the mutation rate per site, calculated numerically. Theunderlying ﬁtness landscape is the same two-plateau landscape used in ﬁgures 5 and 6.The sequence length is chosen as L = 50. Occupation fractions are plotted only for themost populated Hamming classes. The two distributions undergo phase transitions atthe same mutation rates, but at the error threshold the ancestral distribution undergoesa discontinuous transition, while for the equilibrium distribution the transition iscontinuous. it towards the middle of the Hamming space. Mutations cause the population to ”leakout” from the plateau. Nevertheless, the individuals maintaining the population andcompensating for the mutational loss are the ones with highest ﬁtness, which are locatedon the plateau and make up the ancestral distribution.Before closing the analysis of plateau-shaped ﬁtness landscapes, we want to mentionthe connection between our description and the popular language of Ising chains or semi-inﬁnite Ising models [27, 28]. In the Ising picture, the ancestral distribution becomes thebulk distribution on a semi-inﬁnite two-dimensional (spatial or spatio-temporal) lattice,and the equilibrium distribution becomes the distribution in the surface layer. Thisanalogy can be seen very clearly in the paper by Tarazona [23], where the diﬀerent ordersof the transitions in the two distributions are explained in terms of surface wetting.

5. Epistasis and the error threshold

So far, we have discussed robustness of phenotypes using plateau-shaped ﬁtnesslandscapes, which are a special case of the class of epistatic ﬁtness functions. We nowwant to discuss the latter in a more general framework. Epistasis describes the non-linear dependence of the ﬁtness function on the number of mismatches k [13]. Everyadditional mismatch is penalized harder (synergistic epistasis of deleterious mutations)or less hard (diminishing epistasis) than the previous one. Here we address the eﬀect ofepistasis on the existence of an error threshold, which is deﬁned for our purposes as asingularity in the dependence of the population mean ﬁtness on the mutation rate. In obustness and epistasis in mutation-selection models w k = w − bk α , (38)where k is again the Hamming distance to the master sequence and b >

0. The epistasisexponent α takes the value α = 1 in the non-epistatic case, while α > α < α → w k = w − b (1 − δ k, ). It is well known that anerror threshold exists for α →

0, but not for α = 1 [6]. Neglecting backward mutations,Wiehe argued in [14] that an error threshold emerges whenever α <

1. In the followingwe show that, based on the maximum principle (4), the critical value of the epistasisexponent below which an error threshold develops is in fact α = 1 / L → ∞ and µ → γ = µL = const . In order to cast (38) into the form (3) required for theapplication of the maximum principle, we write w k = f ( x ) = w − ˜ bx α , with x = k/L, ˜ b = bL α (39)and the limit L → ∞ should be combined with b →

0, such that ˜ b = const . Since b can be interpreted as a kind of selection coeﬃcient, we are thus considering a situationwhere both the mutation rate (per site) and the selection forces are small. Applying themaximum principle (4) to this landscape, the mean ﬁtness Λ of the population in theequilibrium state is given byΛ = max x ∈ [0 , { w − ˜ bx α − γ [1 − p x (1 − x )] } ≡ max x ∈ [0 , λ ( x ) , (40)where λ ( x ) is the function inside the curly brackets.To ﬁnd the condition under which the maximum is attained inside the interval x ∈ (0 , dλ/dx = 0, yielding the condition γ ˜ b (1 − x ) = αx α − / √ − x. (41)For α > / x = 0 ,

1, withan inﬁnite slope at x = 1. As a consequence, there exists always a unique solution x c ∈ (0 ,

1) for any value of γ/ ˜ b , which describes the location of the population for L → ∞ . The location varies smoothly from x c = 0 for γ/ ˜ b → x c → / γ/ ˜ b → ∞ , and there is no error threshold. However, for α < / x = 0, and there is no solution for small γ/ ˜ b . The function λ ( x ) is thenmonotonically decreasing, which implies that the maximum in (40) is located at theboundary point x = 0 over a ﬁnite interval of γ/ ˜ b . Increasing γ/ ˜ b the function λ ( x )develops a local maximum, which eventually exceeds the boundary value λ (0) = w − γ .At this point the population discontinuously delocalizes to an interior point x c ∈ (0 , obustness and epistasis in mutation-selection models Figure 8.

Magnetization as a function of mutation rate for the ﬁtness landscape(38) with epistasis exponent α = 0 . α = 0 .

52, respectively. For α = 0 . α = 0 .

52, it changessmoothly. Calculations have been done for a sequence length of L = 500. The error threshold condition is of the form γ/ ˜ b = g ( α ) where the function g ( α ) is notexplicitly available. This translates into the expression µ tr = γ tr L = ˜ bg ( α ) L = g ( α ) bL α − , (42)for the critical mutation rate µ tr . This scaling of µ tr with L was also obtained in [14].In the sharp peak limit α → γ/ ˜ b = γ/b = 1, which impliesthat g (0) = 1. On the other hand, for α = 1 / λ ( x ) near x = 0 reads λ ( x ) ≈ w − (˜ b − γ ) x / − γx / , (43)which shows that g (1 /

2) = 1 /

2. For γ/ ˜ b > / x c = (2 − ˜ b/γ ) /

3, which moves continuously away from x = 0. In the language of phasetransitions, α = 1 / α < / α < / α > / M as a function of γ for two diﬀerent cases. The magnetization displays a non-analytic jump for α < / α > /

2. In ﬁgure 9 we show the error threshold as a function γ/ ˜ b = g ( α ), which interpolates between the limits g (0) = 1 and g (1 /

2) = 1 / α = 1 / x matchesthat of the “entropic” term ∼ p x (1 − x ) in the maximum principle (4). Since a similarterm appears also for general alphabet sizes [see (50)], the considerations of this sectionhold in that case as well. obustness and epistasis in mutation-selection models Figure 9.

Numerically determined phase diagram for the epistatic ﬁtness landscape(38). At the thick line the population undergoes a ﬁrst order phase transition from astate localized at x c = 0 (below the line) to a delocalized state x c > α = 1 /

2. The deviationfrom the prediction γ/ ˜ b = g (1 /

2) = 1 / α = 1 / α > /

2, the populationchanges smoothly. Calculations have been performed for a sequence lengths of L = 750.The slight modulation of the line is due to the ﬁnite numerical step size.

6. Conclusions

In this paper we discussed the properties of epistatic ﬁtness landscapes, with particularemphasis on mesa landscapes describing mutational robustness of phenotypes, whichhave been studied previously in the context of regulatory motifs [3, 4]. As populationevolution model we used the continuous time Crow-Kimura model, which is aquasispecies model for asexual and haploid organisms, and analysed its stationary statesfor sequences consisting of two letters. As explained in Appendix B, it is straightforwardto generalize our results to sequence alphabets of general size. Similarly, the extension todiscrete time dynamics can be carried out by replacing the Malthusian ﬁtness landscape w k by its Wrightian counterpart W k ∼ exp( w k ) [6, 18, 19].We reviewed two existing approaches [3, 16, 18] to this problem and explainedthe discrepancy between their predictions by extending the approach of Gerland andHwa [3] beyond the harmonic approximation. Based on a quantum mechanical analogywe derived a novel ﬁnite size correction term to the maximum principle of [16], whichsigniﬁcantly improves the agreement with numerical calculations. Our central result isthat the relative number of tolerable mismatches x = k /L is the relevant parameterfor the ﬁtness eﬀect of mutational robustness, and we provide accurate formulae for itsquantitative evaluation. As a consequence, we showed that the selection transition ﬁrstdescribed by Schuster and Swetina [7] disappears for long sequences.Finally, in section 5, we discussed more general forms of epistatic ﬁtness landscapes obustness and epistasis in mutation-selection models α < Acknowledgements

We are grateful to Alexander Altland, Ellen Baake, Ulrich Gerland, Kavita Jain, LucaPeliti and David Saakian for useful discussions. This work was supported by DFG withinthe Bonn Cologne Graduate School of Physics and Astronomy, SFB 680

Molecular basisof evolutionary innovations and SFB/TR12

Symmetries and Universality in MesoscopicSystems . Appendix A: The large deviations approach

We start by symmetrizing the eigenvalue problem (2). The discrete analogue of thetransformation (14) is Q k = (cid:18) Lk (cid:19) / P ∗ k , (44)which leads toΛ Q k = ( w k − γ ) Q k + µ p ( L − k )( k + 1) Q k +1 + µ p ( L − k + 1) k Q k − . (45)Following [20], we now perform the continuum limit by making a large deviations ansatzfor Q k , Q k = Q xL = ψ ( x ) = exp[ − ǫ − u ( x )] (46)with ǫ = 1 /L . Inserting this into (45) one ﬁnds(Λ − f ( x ) + γ ) ψ = 2 γ p x (1 − x ) cosh[ u ′ ] ψ. (47)Cancelling ψ on both sides yields a Hamilton-Jacobi equation for the ‘action’ u ( x ),with u ′ = du/dx playing the role of a canonical momentum [20]. In order to cast (47)into the form of a Schr¨odinger equation, we expand the momentum-dependent factor toquadratic order, cosh( u ′ ) ≈ u ′ ) /

2, and make use of the relation( u ′ ) = ǫ ψ − d ψdx , (48)which follows from (46) to leading order in ǫ . Inserting this into (47) results in (28). Appendix B: General alphabet size

Here we show how our results generalize to the case where the symbols in the geneticsequence are taken from an alphabet of

A > A = 4).We assume a uniform point mutation rate µ connecting any two of the A possible statesof a site in the sequence, and a ﬁtness function w k that depends on the relative number of obustness and epistasis in mutation-selection models − w k ) P ∗ k = (49)= ( A − γ (cid:20) k + 1 A − P ∗ k +1 + ( L − k + 1) P ∗ k − − (cid:18) L − k + kA − (cid:19) P ∗ k (cid:21) . Applying the results of [17, 29, 30] to this problem we ﬁnd that, asymptotically for large L , the principal eigenvalue is given by the maximum principleΛ = max x ∈ [0 , ( f ( x ) − ( A − γ "(cid:18) − ( A − xA − (cid:19) − p x (1 − x ) √ A − . (50)For the case of the mesa landscape (6) this implies that the population is localized nearthe optimal genotype for w > w c with w c = γ ( A − " − ( A − x A − − p x (1 − x ) √ A − . (51)The right hand side is a monotonically decreasing function of x which vanishes at x = 1 − /A . For ﬁxed x it is an increasing function of A . References [1] J. A. G. M. de Visser et al. Perspective: Evolution and detection of genetic robustness.

Evolution ,57:1959–1972, 2003.[2] D. C. Krakauer and J. B. Plotkin. Redundancy, antiredundancy, and the robustness of genomes.

Proc. Nat. Acad. Sci. USA , 99:1405–1409, 2002.[3] U. Gerland and T. Hwa. On the selection and evolution of regulatory DNA motifs.

J. Mol. Evol. ,55:386–400, 2002.[4] J. Berg, S. Willmann, and M. L¨assig. Adaptive evolution of transcription factor binding sites.

BMC Evolutionary Biology , 4:42, 2004.[5] L. Peliti. Quasispecies evolution in general mean-ﬁeld landscapes.

Europhysics Letters , 57:745,2002.[6] K. Jain and J. Krug. Adaptation in simple and complex ﬁtness landscapes. In U. Bastolla,M. Porto, H. E. Roman, and M. Vendruscolo, editors,

Structural approaches to sequenceevolution: Molecules,networks and populations . Springer, Berlin, 2007.[7] P. Schuster and J. Swetina. Stationary mutant distributions and evolutionary optimization.

Bulletin of Mathematical Biology , 50:635–660, 1988.[8] C. O. Wilke, J. L. Wang, C. Ofria, R. E. Lenski, and C. Adami. Evolution of digital organismsat high mutation rates leads to survival of the ﬂattest.

Nature , 412:331–333, 2001.[9] F. M. Codoner, J.-A. Daros, R. V. Sol´e, and S. F. Elena. The ﬁttest versus the ﬂattest:Experimental conﬁrmation of the quasispecies eﬀect with subviral pathogens.

PLoS Pathogens ,2:e136, 2006.[10] R. Sanjuan, J. M. Cuevas, V. Furio, E. C. Holmes, and A. Moya. Selection for robustness inmutagenized RNA viruses.

PLoS Genetics , 3:e93, 2007.[11] M. Rolland, C. Brander, D. C. Nickle, J. T. Herbeck, G. S. Gottlieb, M. S. Campbell, B. S.Maust, and J. I. Mullins. HIV-1 over time: ﬁtness loss or robustness gain?

Nature ReviewsMicrobiology , 5, 2007.[12] J. Sardany´es, S.F. Elena, and R.V. Sol´e. Simple quasispecies models for the survival-of-the-ﬂattesteﬀect: The role of space.

J. Theor. Biol. , 250:560–568, 2008. obustness and epistasis in mutation-selection models [13] P. C. Phillips, S. P. Otto, and M. C. Whitlock. Beyond the average: The evolutionary importanceof gene interactions and variability of epistatic eﬀects. In J. B. Wolf, E. D. Brodie III, and M. J.Wade, editors, Epistasis and the evolutionary process . Oxford University Press, Oxford, 2000.[14] T. Wiehe. Model dependency of error thresholds: the role of ﬁtness functions and contrastsbetween the ﬁnite and inﬁnte sites models.

Genet. Res. Camb. , 69:127–136, 1997.[15] K. Jain. Loss of least-loaded class in asexual populations due to drift and epistasis.

Genetics ,179:2125, 2008.[16] J. Hermisson, O. Redner, H. Wagner, and E. Baake. Mutation-selection balance: Ancestry, load,and maximum principle.

Theor. Pop. Biol. , 62:9–46, 2002.[17] E. Baake, M. Baake, A. Bovier, and M. Klein. An asymptotic maximum principle for essentiallylinear evolution models.

J. Math. Biol. , 50:83–114, 2005.[18] D. B. Saakian and C.-K. Hu. Exact solution of the Eigen model with general ﬁtness functions anddegradation rates.

Proc. Nat. Acad. Sci. USA , 103:4935–4939, 2006.[19] J. M. Park and M. W. Deem. Schwinger boson formulation and solution of the Crow-Kimura andEigen models of quasispecies theory.

J. Stat. Phys. , 125:975, 2006.[20] D. B. Saakian. A new method for the solution of model of biological evolution: Derivation ofexact steady-state distributions.

J. Stat. Phys. , 128:781–798, 2007.[21] E. Baake and H.-O. Georgii. Mutation, selection, and ancestry in branching models: a variationalapproach.

J. Math. Biol. , 54:257, 2007.[22] K. Sato and K. Kaneko. Evolution equation of phenotype distribution: General formulation andapplication to error catastrophe.

Phys. Rev. E , 75:061909, 2007.[23] P. Tarazona. Error thresholds for molecular quasispecies as phase transitions: From simplelandscapes to spin-glass models.

Phys. Rev. A , 45:6038, 1992.[24] C. J. Thompson and J. L. McBride. On Eigen’s theory of the self-organization of matter and theevolution of biological macromolecules.

Mathematical Biosciences , 21:127, 1974.[25] D. Sornette.

Critical Phenomena in Natural Sciences . Springer, Berlin, 2000.[26] H. Rollnik.

Quantentheorie I . Vieweg, Wiesbaden, 1995.[27] I. Leuth¨ausser. Statistical mechanics of Eigen’s evolution model.

J. Stat. Phys. , 48:343, 1987.[28] E. Baake and H. Wagner. Mutation-selection models solved exactly with methods from statisticalmechanics.

Genet. Res. , 78:93, 2001.[29] T. Garske and U. Grimm. A maximum principle for the mutation-selection equilibrium ofnucleotide sequences.

Bull. Math. Biol. , 66:397, 2004.[30] T. Garske and U. Grimm. Maximum principle and mutation thresholds for four-letter sequenceevolution.