Selection biases the prevalence and type of epistasis along adaptive trajectories
SSelection biases the prevalence and type of epistasisalong adaptive tra jectories
Jeremy A. Draghi
1, 2, * and Joshua Plotkin Department of Biology, University of Pennsylvania, Philadelphia,PA, USA Present Address: Department of Zoology, University of BritishColumbia, Vancouver, BC, Canada * Corresponding author a r X i v : . [ q - b i o . P E ] D ec Abstract
The contribution to an organism’s phenotype from one genetic locus may dependupon the status of other loci. Such epistatic interactions among loci are now recog-nized as fundamental to shaping the process of adaptation in evolving populations.Although little is known about the structure of epistasis in most organisms, recentexperiments with bacterial populations have concluded that antagonistic interactionsabound and tend to de-accelerate the pace of adaptation over time. Here, we usea broad class of mathematical fitness landscapes to examine how natural selectionbiases the mutations that substitute during evolution based on their epistatic in-teractions. We find that, even when beneficial mutations are rare, these biases arestrong and change substantially throughout the course of adaptation. In particular,epistasis is less prevalent than the neutral expectation early in adaptation and muchmore prevalent later, with a concomitant shift from predominantly antagonistic inter-actions early in adaptation to synergistic and sign epistasis later in adaptation. Weobserve the same patterns when re-analyzing data from a recent microbial evolutionexperiment. Since these biases depend on the population size and other parame-ters, they must be quantified before we can hope to use experimental data to inferan organism’s underlying fitness landscape or to understand the role of epistasis inshaping its adaptation. In particular, we show that when the order of substitutionsis not known to an experimentalist, then standard methods of analysis may suggestthat epistasis retards adaptation when in fact it accelerates it.2
Author Summary
A major goal in evolutionary biology is to relate the adaptation of an organismstraits to changes in their genome. When researchers identify substitutionspopulation-wide changes in DNA sequencethey often find that the effects of those changes onfitness interact epistatically, or non-linearly. While epistasis is often assumed to berelevant only for very large and diverse populations, here we present computationalresults that demonstrate that epistasis shapes the pattern of adaptive change in allpopulations. Our results allow us to make predictions about what kind of epistasisis most likely to occur, with distinct predictions for patterns in the early or laterphases of adaptation. We also apply these results to recent evolution experimentwith microbes and show that epistasis may appear to slow adaptation, when in factit may be facilitating evolution by natural selection. Our results will provide a newbasis for comparison for all experimentalists who investigate the fitness effects of themolecular changes underlying adaptation.3
Introduction
Two sites in a genome interact epistatically when the contribution to a trait at onesite depends on the state of the other site. While epistasis has been a significanttheme in topics such as the evolution of sex and robustness to mutation, its rolein the dynamics of evolving populations has only begun to be explored. Recentexperimental evolution studies of microbes [1, 2, 3, 4] and biomolecules [5, 6, 7, 8] haverevealed that epistasis is widespread and consequential for adaptation. These studies,combined with experiments that reconstruct ancestral genotypes [9, 10, 11, 12, 13,14, 15] or examine numerous combinations of adaptive mutations [16, 17, 18, 19],have amply demonstrated that molecular evolution cannot be explained or predictedwithout understanding how gene interactions shape adaptive possibilities.Evolution experiments are increasingly used to address two questions that areconceptually distinct, but empirically entangled: of the mutations that could playa role in adaptation, how do their interactions shape evolution, and how do evolu-tionary processes determine what kinds of interactions occur among the sites thatsubstitute? The first question focuses on the properties of the genotype-phenotype-fitness relationship, also called the adaptive landscape, while the second questionasks how the mutations chosen by evolution reflect that underlying landscape. Ifselection and other evolutionary forces are blind to epistasis – that is, if interactionsamong sites do not influence the likelihood that they will substitute – then this sec-ond question is irrelevant, and the genetic changes we see in evolution experimentsperfectly mirror the epistatic properties of the underlying adaptive landscapes. If,however, evolution biases the fixation of groups of mutations with specific patterns4f interactions, then evolution experiments present a complex problem: if epistasisshapes evolution, and evolution distorts the appearance of epistasis, then how can weuse evolution experiments to infer the underlying fitness landscape? This ambiguitycomplicates even qualitative inferences such as whether epistasis among genes canbe said to have slowed or hastened adaptation. To resolve this ambiguity researchersmust first understand how evolution biases the combinations of sites that substitutein an adapting population. Only then can researchers hope to correct for these bi-ases, which will depend upon the size, mutation rate, and other characteristics ofthe population, in order to infer the underlying fitness landscape from experimentaldata.Many theoretical studies of epistasis and patterns of asexual adaptation havefocused on questions of the existence and accessibility of multiple fitness peaks [20,21, 22, 23, 24, 25, 26, 27, 28]. While such work has clarified the broad-scale pictureof how epistasis shapes adaptation, its usefulness in predicting microevolutionarydynamics is limited. In contrast, our interest here is how experiments on adaptingpopulations can be used to infer the properties of an organism’s underlying fitnesslandscape and how epistasis shapes those experimental outcomes.We use a computational model to clarify the evolutionary effects of two contradic-tory roles of epistasis: epistatic interactions can undermine the benefits of previouslyadaptive genetic substitutions, but they can also produce new paths to higher fitness[29, 30]. The first of these effects would tend to retard adaptation, and the lattereffect would accelerate it. Our results show that natural selection biases the preva-lence and type of epistatic interactions among the mutations that substitute, even5hen mutations are too rare to interact directly as coexisting polymorphisms. Herewe work to quantify precisely how selection biases the epistasis among mutationsthat substitute in an adapting population and to understand why these biases arise.
To understand how selection shapes epistasis among the mutations that substitute,we first simulated adaptive walks in which beneficial mutations substitute sequen-tially. Østman et al. suggest that epistatic interactions should not influence substi-tution patterns when rare beneficial mutations fix independently [28]. Our resultsinitially appear to confirm this expectation: Figure 1(a) shows that sites that fixsequentially are almost as likely to interact epistatically as are pairs of randomlychosen sites. However, this concordance disappears when epistasis is examined alongthe sequence of steps comprising an adaptive walk: Fig. 1(b) shows that epistasisis in fact suppressed early in adaptation, and enriched among later steps, comparedto a random (neutral) walk. Thus, selection biases the amount of epistasis amongthe mutations that fix along an adaptive walk, and it does so in a complex manner.These opposing effects produce an apparent agreement with the random expectationfor the overall prevalence of epistasis when observations are coarsely averaged acrossentire walks, but in fact these results demonstrate that epistasis can shape patternsof substitutions even when mutations fix independently, one after another.Why does selection suppress epistasis early in walks and promote it later? To6ddress this, we studied how mutations at sites along an adaptive walk influence thefitness effects of the sites with which they interact (upstream or downstream). Inparticular, in the simple case of K = 1, Fig. 2 shows the distribution of fitness effectsof mutations at those sites that do and do not interact with the site that has justsubstituted along a walk. For a site i that changes early in the walk, mutations at itsinteracting sites are less likely to be beneficial. In other words, adaptive substitutionsearly in the walk partly undermine the benefits that would be conferred by mutationsat their partner sites. Therefore, after an early substitution at one site, its epistaticpartners are less likely to substitute than they would have otherwise – and so theearly steps in an adaptive walks exhibit a deficit of epistasis compared to the neutralexpectation.This bias against epistasis early in a walk is caused by the dependency of selectedsubstitutions on the backgrounds in which they were selectively favored. When asite forms part of the relevant genetic background for a beneficial substitution, it isstatistically likely that changes at such a site would partially undermine the beneficialeffect of this fixed adaptive substitution. Supplemental Figure 6(a) demonstratesthis regression to the mean effect: selective coefficients of mutations at a site j aresuppressed when j interacts with a site i that has just fixed an adaptive mutation,and this suppression is greater when the beneficial effect of the substitution at i islarger. 7 .2 Form of epistasis along an adaptive walk Aside from biasing the amount of epistasis along a walk, selection also biases the typeof epistasis between successive substitutions. We find that the predominant sign ofepistasis, as well as its prevalence, depends on the position of the substitutions alongan adaptive walk. Fig. 3 shows that early substitutions tend to show antagonism withone another, while later substitutions typically exhibit sign epistasis, defined as pairsof mutations where at least one member has a beneficial fitness effect on one back-ground and a deleterious effect on the other. Synergy between beneficial mutationsis present but less common than antagonism at early steps and less common thansign epistasis at later steps. The shift from antagonistic towards synergistic/signepistasis helps to explain why epistasis is suppressed early in an adaptive walk andaugmented later in the walk.Selection also biases the directionality of interactions between successive substi-tutions along an adaptive walk (see Methods). Fig. 4a shows that interactions with i upstream of j are more frequent than the converse along the entire adaptive walk.This difference is explained by disparate effect on the evolvability of site j : j ismore likely to become evolvable (able to substitute beneficially) if it is influencedby the preceding substitution at i , than if it influences i (see Supplemental Text).This result is confirmed by the regressions in Supplemental Figure 6(b & c), andit explains why such interactions are more prevalent than the converse (Figure 4a).When K >
1, another type of interaction is possible: i and j might not influenceeach other directly, but both might influence a third site. Epistasis of this type isexpected to be very common when K is a substantial fraction of N , and results for8 = 5 show that the prevalence of this type of interaction also changes along adap-tive walks (Fig. 4b). Thus, evolution biases the types and directions of interactionsamong substitutions along an adaptive walk.Choosing the K sites which influence each of the N loci defines a network ofdirected interactions. While the number of sites that influence a locus (its in-degree)is fixed at K , the number of sites that a locus influences (its out-degree) is variableand approximately Poisson distributed. Evolutionary preferences for certain types ofepistasis, such as the bias in favor of substitutions at sites that depend on previoussubstitutions, might then lead to systematic variation in the out-degree of sites whichsubstitute adaptively. Supplemental Figure 7 confirms this hypothesis: the out-degree of substituted sites is initially slightly higher than expected, then declineswith substitution number. This evidence implies that those sites with more epistaticconnections are less likely to change during adaptation, because mutations at suchsites have a greater chance of undermining previously selected beneficial changes. The NK model has several features that could amplify the biases in epistasis in-troduced by natural selection. In order to assess whether these model assumptionsmight lead to spurious results, we explored how the patterns shown in Figure 4change as the number of alleles per site and the starting fitnesses were varied. Sup-plemental Figure 8 shows that deviations from the expected prevalence of epistasisare qualitatively similar, and quantitatively greater, when the number of alleles persite, A , is increased. Supplemental Figure 9 confirms that the basic pattern of our9esults is also robust to changes in the fitness of the starting genotype.While we have focused on evolutionary dynamics in the simplified, strong-selection-weak-mutation regime, we can use individual-based simulations to explore epistasisin polymorphic populations with larger values of the population-scaled mutationrate, θ . Supplemental Figures 10 and 11 show very similar patterns for the preva-lence of epistasis among adaptive substitutions, even when θ is much greater than1. Some differences from Fig. 4 are apparent in the high- θ case during the period ofstabilizing selection that follows adaptation. However, these simulations at high θ confirm that the major patterns found in adaptive walks at low θ : there is a deficitof epistasis early in evolution, and a surplus of epistasis later in adaptation, with apredominance of interactions in which the effect of each substitution depends on thepreceding substitution along the line of descent.We also considered a very different set of fitness landscapes – computationallypredicted RNA folding – to assess the generality of our principal findings. RNA se-quences do not have static epistatic interactions between sites; instead, interactionsemerge from the folding topology and change with genotype. However, we can stillmeasure the average frequency of epistasis between substitutions on an evolutionaryline of descent; such data show that the prevalence of epistasis does vary systemati-cally along a series of substitutions, although the trend is toward decreased epistasis(Fig. 7(a)). The type of epistasis can also be quantified, although the high frequencyof conditionally neutral mutations in RNA necessitates a new category of neutralepistasis. Fig. 7(b) shows that, as in the NK model, early antagonism gives way toa high prevalence of sign (and neutral) epistasis later in adaptation. While these10esults differ in some features from those obtained in the NK model, they furtherillustrate that evolution at θ < Two recent studies on experimental populations of bacteria have inferred that an-tagonistic epistasis among beneficial mutations is common and ultimately explainsa trend of diminishing fitness gains over time [2, 3]. These studies relied in parton regression analyses of the fitness effects of observed substitutions in the presenceand absence of the other beneficial substitutions observed in the experiment. Bothstudies found a trend towards smaller beneficial effects when substitutions were as-sayed in backgrounds of higher fitness and so concluded that antagonistic epistasis11ecelerates adaptation. However, we demonstrate below that in the NK model, suchregressions are not a reliable indicator of the effect of epistasis on the speed or extentof adaptation. Our results suggest that a common statistical artifact – regression tothe mean – confounds the interpretations of such regressions and that analyses ofthe role of epistasis in adaptation may be meaningful only when the actual orderedsequence of substitutions is known.We performed the same kinds of regressions as Chou et al. and Khan et al.[2, 3] on adaptive walks simulated on NK landscapes. Specifically, we computedrank regression coefficients of background fitness versus fitness effect for the firstfive substitutions in such adaptive walks (see Methods). The distribution of averageregression coefficients in Fig. 5(a) shows a bias toward negative values similar tothose seen in bacterial experiments [2, 3], suggesting that epistasis becomes morenegative with each substitution and decelerates the pace of adaptation. However,this interpretation is contradicted by our results above: Fig. 3 clearly shows that,on average, epistasis becomes more positive with each substitution. Figs. 1,2, and4 also support this view: epistasis is initially disruptive to the large fitness gains ofearly adaptive changes, but then facilitates later adaptive steps. And, finally, Fig.5(b) shows that a genotype along the line of descent is typically more fit than wouldbe predicted from the fitness effects of its component mutations in the ancestor, andthat this synergistic effect increases along adaptive walks.Our results imply that that regression analysis of fitness effects on different ge-netic backgrounds (e.g. those performed by Khan et al. [3] and Chou et al. [2])may be misleading. To clarify this issue, we examined the mean regression coef-12cients for those adaptive walks that were unequivocally accelerated by epistasis –namely, those walks in which the effect of each subsequent mutation was greater thanexpected under multiplicativity. Even when restricted to these highly synergisticwalks, the regression analyses of the type shown in Fig. 5a. are most often nega-tive and so would erroneously suggest increasing antagonism (Supplemental Figure13). Furthermore, we also performed random (neutral) walks, in which substitutionsalong the line of descent are equally likely to exceed or fall short of their expectedmultiplicative fitness, given the fitness effect in the ancestor; even in these walks,regression coefficients of the type studied by Khan et al. [3] and Chou et al. [2] tendbe negative (Supplemental Figure 14). These data confirm that regressions of fitnesseffects against the fitnesses of genetic backgrounds cannot be reliably used to inferwhether epistasis has slowed or accelerated epistasis, at least in the NK model.The tendency of these regressions toward negative slopes may be caused by thewell-known confound of ”regression to the mean.” In these regressions, the dependentvariable, fitness effect of substitution i on a genetic background, is mathematicallyinterrelated with the independent variable, the fitness of that same background. Ifthe fitness of the background genotype is very poor, then the epistatic effects ofits alleles are statistically likely to be unusually poor. Any change that perturbsthese epistatic effects is likely to improve their fitness contributions. Therefore, asubstitution in a very unfit background is likely to show a large beneficial effectsimply by perturbing the fitness effects of interacting sites. A similar argument canbe made to explain why fitness effects are often small or even deleterious in highly fitbackgrounds and, by extension, why negative correlations are a likely consequence13f the interdependence between the variables in this regression.The results in this section highlight the difficulty of interpreting experimental datawhen the order of substitutions in unknown. The ”regression to the mean” effectwe observed may be magnified by the nature of the NK model. Nonetheless, thisexample shows that the epistatic properties of a selected sequence of substitutionsmay differ strongly and systematically from the broader pattern of epistasis in theunderlying landscape. To solve the dual problem of epistasis in evolution experiments – that epistasis bothshapes the evolutionary process and can be inferred by manipulating the sites thatsubstitute adaptively – requires theory beyond current knowledge in population ge-netics. Because the term “epistasis” encompasses all scenarios in which fitness effectsof alleles do not combine independently, no general model of epistasis has been pro-posed, let alone analyzed. Instead, exploration of a few “toy” models has led toappreciation of the subtle and significant ways that epistasis complicates our under-standing of evolution.Here we have used the NK model to contravene the intuitive notion [28] thatsites substituting one after another will be selected without regard to their epistaticinteractions. To summarize, we have shown that, even when mutations are rare,evolution selects among possible substitutions based upon the number and directionof connections to other loci, and that these selective biases change substantially14long the course of adaptation. In the NK landscapes, epistasis is less prevalent thenthe random expectation early in adaptation and much more prevalent later, with aconcomitant shift from predominantly antagonistic interactions early in adaptationto synergistic and sign epistasis later in adaptation. Additionally, sites with moreepistatic influences on other loci are more likely to substitute early than late inadaptive evolution. These results suggest that even the most basic evolutionaryprocess acting in the context of a simple fitness landscape can produce a complexexpectation for epistasis. Experimentalists must account for this baseline action ofnatural selection on epistasis among substitutions if they hope to infer the propertiesof the fitness landscapes underlying experimental or natural evolutionary outcomes.The basic intuition behind our results is simple. Early on, large-effect mutationstend to act antagonistically, which suppresses the frequency of epistasis among sub-sequent substitutions. Later on, the only way to achieve further fitness gains is byfortunate sign epistasis, and this effect tends to augment the appearance of epistasisas the population approaches a fitness peak.We have focused on evolution by sequential fixation of beneficial substitutionsin order to demonstrate that this seemingly simple case conceals several layers ofcomplexity. However, our results suggest patterns, such as a decrease in observedepistasis among early substitutions and an increase in epistasis among later ones, thatextend to polymorphic populations as well. By focusing on the differences betweenthe distribution of epistasis among all sites, and the specific sequence of substitu-tions, our approach highlights the potential for misleading inferences from evolutionexperiments when the order of substitutions is unknown. Specifically, it may be dif-15cult to reliably determine how the sign of epistasis varies with fitness using only thefitnesses of an ancestral, derived, and possible intermediate genotypes. Regressionanalyses that ignore substitution order might suggest that epistasis is deceleratingadaptation ([2, 3], Fig. 5), whereas in fact epistasis has had an accelerating effect onthe trajectory of fitnesses along the actual path of adaptation. Indeed, our reanalysisof data from [3] supports this possibility. Such discrepancies illustrate the impor-tance of measuring the order in which substitutions occur, in future experimentalstudies, in order to understand how epistasis has shaped a population’s trajectory.While the NK model has the advantages of a tunable level of epistasis and anextensive history of prior work, it certainly does not capture the full range of pos-sibilities of interactions among genes. Our results with larger numbers of alleles inthe NK model, with a computational model of RNA folding, and with data fromevolving microbial populations, suggest that the patterns we have identified in sim-ple cases may be even more pronounced in models that better approximate biolog-ical complexity. However, the most important conclusion from our study is thatnew methods of data-gathering, such as cheap whole-genome sequencing and high-throughput fitness assays, will not suffice to answer basic questions about the role ofgenotype-phenotype maps in evolution. Pioneering studies have provided compellingexamples of the ubiquity of epistasis, but they also serve to exemplify the substantialgap that separates data from hypotheses in experimental evolution. To definitivelylink gene interactions to the rate or predictability of adaptation will require a signif-icant expansion of the theory of population genetics, and a vital first step is seriousengagement with ”toy” models of interacting loci.16
Methods
Invented by Stuart Kauffman to describe evolution on rugged fitness landscapes[31, 32, 20], the NK model produces complex but computationally tractable genotype-fitness maps using only the parameters N , K , and A . The parameter N defines thenumber of sites, each of which can assume any of A alleles. The fitness of a genotypeis calculated in two steps: first, the fitness contribution of each site is determined byreference to a table of pre-calculated values; second, these fitness contributions aremultiplied together and the N-th root of this product is taken as fitness. When K iszero, the fitness contribution of a site depends only on its own allele, and not on thestate of other sites; the lookup table for each site therefore contains only A possiblefitness contributions. When K >
0, the fitness contribution of a site depends onits own allele as well as the alleles at K other sites, yielding a lookup table with A K +1 entries. By specifying that each of the entries in the lookup tables for all N sites are drawn independently from a broad distribution (in our case the uniformdistribution), the NK model ensures that the fitness effect of a substitution dependsstrongly and randomly on some fraction of the genetic background, determined by K . K is constant across sites and genotypes for a particular landscape, and the K sites upon which each locus depend are drawn uniformly from the N − i may depend on the state of site j without implying that the contributionof j also depends on i . We can therefore categorize relationships between sites: i downstream of j if the fitness contribution of i depends on j , and j is conversely upstream of i in this example. While this direction of influence is significant for someof our results below, we note a subtle confusion between epistasis as defined in theNK model, and epistasis as defined by an experimentalist. This confusion stems fromthe fact that an experimentalist measures the fitness effect of a substitution, whilethe NK model considers fitness contributions of sites. Therefore, the fitness effect ofa substitution at site i will show epistasis with site j not only if i is downstream of j , but also if it is upstream of j . Similarly, the fitness effect of a substitution at i will be epistatic with j if both i and j are upstream of some mutual site k , even ifneither i nor j directly influence one another. To minimize this potential confusion,we follow the empirical definition and use the term ‘epistasis’ to refer to any case inwhich the fitness effect of a substitution at a site depends on the state of some othersite. This usage makes the common assumption that independent fitness effects aremultiplicative.Aside from NK landscapes, we also explore epistasis and evolutionary dynamicsusing RNA folding landscapes, as well as experimental bacterial data, describedbelow. To investigate epistasis among genetic substitutions, we employed two types of sim-ulations: adaptive walks, which model simplified fixation dynamics in essentiallymonomorphic populations; and individual-based Monte Carlo simulations of poten-tially diverse populations. In an adaptive walk the population is represented by a18ingle genotype, and a mutation to any of the A − s i = w i w − w is the fitness of the currently fixed genotype and w i the fitness of the mutant i .Evolution is then a Markov process with transition probabilities defined by P i = s i (cid:80) i ∈ M s i (1)where M i is the set of all adaptive, one-mutant neighbors of the current genotype[34, 35]. These walks therefore substitute one mutation at a time, strictly increasingin fitness until a local maximum is reached.We used the Wright-Fisher to describe evolution in polymorphic populations: anasexual population of fixed size n reproduces with discrete generations and selectionon fertility. Mutations occur at Poisson frequencies according to the per-genomerate µ . Because our goal is to investigate evolutionary dynamics when nµ = θ isnear or greater than 1, we required a method of detecting substitutions that doesnot depend on independent, well-demarcated fixation events. We therefore trace theline of descent from the final population back to the initial generation and recordchanges along this lineage. To minimize the false identification of polymorphismsas substitutions, we ran these simulations for n generations past the desired endingtime-point. We then selected the most fit individual and traced its lineage back tothe start, discarding any genetic changes that arose during those last n generations.To compare our results with empirical data, we performed two types of regres-sions on the fitness effects of the first few substitutions in evolutionary simulations.19ollowing recent experimental examples [2, 3], we examined the first five substitu-tions in a simulation and measured the fitness effect of each of these substitutionswith all combinations of the allele states at the other four sites. The fitness of each ofthe sixteen genetic backgrounds is taken as the independent variable, and the fitnesseffect of the focal substitution as the dependent variable; a separate regression wasperformed for each of the five sites, though the resulting five correlation coefficientsare not independent. These analyses were contrasted with a second type of regres-sion, in which the ranks of epistatic deviations of each successive substitution werecompared to their order of substitution. If we consider the first five substitutions,then there are four epistatic deviations: e = W − W W (2) e = W − W W W (3) e = W − W W W W (4) e = W − W W W W W (5)where W , for example, represents the fitness of the genotype with the first twosubstitutions, divided by the fitness of the ancestor. Because these regressions arebased on four points, we use this analysis only to classify epistasis along a walk; forexample, we highlight the significance of walks in which e > e > e > e as examples where epistasis consistently leads to greater than expected fitness.In both cases, we ignore simulations with little epistasis among the first five20ubstitutions in order to avoid spurious correlations; our filtering removed about20% of simulations. Because we remove walks without regard to the direction ofepistatic effects, this removal should not bias any results.New landscapes were generated for each replicate simulation. In both types ofpopulations, a simulation began from a randomly drawn genotype with fitness in acertain range, usually close to the 50th percentile (that is, within 0.002 of the desiredstarting fitness). We also performed Wright-Fisher simulations of evolving RNA populations usingthe Vienna RNA folding package, version 1.8.5, with default folding parameters.RNA sequences of 72 bases in length constituted the genotypes, and the predictedminimum-free-energy structures determined the phenotypes. Fitness of an RNAgenotype was calculated as (1 + s ) − d where d denotes the tree edit distance be-tween the RNA’s phenotype and a defined optimal phenotype. Here s quantifiesthe strength of selection and is equivalent to the multiplicative selective coefficientassociated with a mutation that changes d by a single unit; s = 0 .
01 in the resultsshown. The tree edit distance algorithm, included in the Vienna package, determinesthe minimum number of steps from a group of edit operations that are needed totransform one structure into another. The initial genotype was drawn randomly,and the optimum phenotype, used to impose directional selection, was also createdby randomly drawing genotypes and discarding those whose minimum free energystructure is the trivial, unfolded state. This optimum was also required to be 4021nits from the phenotype of the initial genotype so that the pressure to adapt wasstrong and uniform across replicates. Simulations were run for 50,000 generations.Substitutions in RNA simulations were determined by tracing a line of descent,as described above for NK simulations. Because our goal was to study adaptationamong beneficial mutations, we filtered the resulting records of substitutions byignoring adjacent pairs on the line of descent when both members of the pair wereneutral or deleterious on the background in which they originally fixed. In practice,less than 1% of pairs were excluded by this rule, so our results are not sensitive tothis criterion.
References [1] Blount ZD, Borland CZ, Lenski RE (2008) Historical contingency and the evo-lution of a key innovation in an experimental population of
Escherichia coli . Proceedings of the National Academy of Sciences of the United States of Amer-ica
Science
Science
Es-cherichia coli population.
Science
Chembiochem
Proceedings of the National Academy of Sciences ofthe United States of America
Nature
Plos Genetics
Science
Nature
Proceedings of the National Academy of Sciences of theUnited States of America
Science
Plos Genetics
Plos Pathogens
Plos One
Escherichia coli . Nature Genetics
Plos Genetics
Plos Genetics
Plos Genetics
The origins of order: self-organization and selection inevolution (Oxford University Press, New York).[21] Whitlock M, Phillips P, Moore F, Tonsor S (1995) Multiple fitness peaks andepistasis.
Annual Review of Ecology and Systematics
Evolution
PLoS Comput Biol
Theoretical Population Biology
Pro-ceedings of the National Academy of Sciences of the United States of America
Chaos
Plos Computational Biology
Proceedings of the Royal Society B-Biological Sciences
Genetics
The origins of evolutionary innovations: a theory of transfor-mative change in living systems (Oxford University Press, Oxford).[31] Kauffman S, Levin S (1987) Towards a general theory of adaptive walks onrugged landscapes.
Journal of Theoretical Biology
Journal of TheoreticalBiology
Science
Evolution
Evolution . . . . . K F r a c t i on E p i s t a t i c a) . . . . . Step Number F r a c t i on E p i s t a t i c K = 1
K = 2
K = 3
K = 4
K = 5 b) Figure 1: a) The overall amount of epistasis along an adaptive walk in an NK land-scape roughly agrees with its random expectation. The dots show the frequency ofepistatic interactions between substitution i and its immediate successor j , averagedacross all steps in the adaptive walk. The solid line depicts the predicted incidenceof epistasis, if sites are chosen to substitute randomly (see supplement). N = 20 and A = 2. Standard errors are less than 0.001. b) The frequency of epistasis betweensubsequent substitutions is depressed compared to the random expectations, earlyin adaptation, and augmented late in adaptation. Dots show the frequencies of epis-tasis between substitution i and its immediate successor j , indexed by the positionof substitution i along the adaptive walk. Lines indicate the corresponding randomexpectation. Standard errors are less than 0.01 for all plotted means.27 fter substitution 1 Selective Coefficent D en s i t y -0.2 -0.1 Epistatic
Non-interacting
After substitution 2
Selective Coefficent D en s i t y -0.2 -0.1 After substitution 3
Selective Coefficent D en s i t y -0.2 -0.1 After substitution 4
Selective Coefficent D en s i t y -0.2 -0.1 After substitution 5
Selective Coefficent D en s i t y -0.2 -0.1 After substitution 6
Selective Coefficent D en s i t y -0.2 -0.1 After substitution 7
Selective Coefficent D en s i t y -0.2 -0.1 After substitution 8
Selective Coefficent D en s i t y -0.3 -0.2 -0.1 After substitution 9
Selective Coefficent D en s i t y -0.3 -0.2 -0.1 Figure 2: Mutations at sites that interact with recent substitutions are less favorableearly in evolution and more favorable late, in comparison to non-interacting sites.Histograms depict the frequencies of the selective coefficients of mutations acrossmany replicate adaptive walks with parameters N = 20, K = 1 and A = 2. After anadaptive substitution occurs at site i , all ( A − N −
1) other possible mutations areinspected and classified by whether their fitness effects depend on site i epistatically,or are independent of it. As the adaptive walk proceeds, the selection coefficientsof available mutations shift towards negative values in general. At the same time,among the adaptive mutations available late in evolution, the great majority of theminteract with a recent substitution. Thus, epistasis tends to retard the substitution ofinteracting sites early in evolution, and promote such substitutions late in evolution.28 tep Number F r equen cy o f T y pe o f E p i s t a s i s . . . . . . . . . . . . Synergy
Antagonism
Sign a) Step Number F r equen cy o f T y pe o f E p i s t a s i s . . . . . . . . . . . . b) Figure 3: Prevalence of synergy, antagonism, and sign epistasis among pairs of con-secutive substitutions ( i, j ) along adaptive walks on
N K landscapes. Such walks arecharacterized by an abundance of antagonism early in adaptation, and an abundanceof sign epistasis late in adaptation. a) K = 1, b) K = 5. N = 20 and A = 2.29 . . . . . . . Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i a) . . . . Step Number F r equen cy o f R e l a t i on s h i p b) i depends on jj depends on i Indirect . . . Figure 4: Observed frequency of directional epistatic interactions between substitu-tion i and its immediate successor j along adaptive walks on N K landscapes. a) K = 1; b) K = 5. A single pair may be counted as more than one type of epistasis,as in the case of reciprocal interactions. The dashed lines depict the predicted inci-dences if substitutions are chosen randomly. For K >
1, another class of epistasisis possible: both i and j may jointly influence a third site, which we refer to as anindirect interaction. N = 20 and A = 2. Standard errors are less than 0.01 for allplotted means. 30 ean Spearman's rho -0.5 a) Substitutions P r opo r t i on B e tt e r T han E x pe c t ed . . . . . . b) Figure 5: Two views of epistasis among the first five substitutions in an adaptivewalk. a) Spearman regression coefficients for the relationships between the fitnesseffect of a substitution and the fitness of the genetic background in which that sub-stitution is made [2, 3]. Each substitution is tested against the sixteen genetic back-grounds comprising all combinations of the other four substitutions. b) Proportionof combinations of substitutions along the line of descent which exceed the fitnessexpected from the fitness effects of the individual mutations in the ancestor. Ex-pectations are computed according to Equations 7-10. Standard errors are less than1%. In both figures, replicates are filtered to remove adaptive walks with too littleepistasis among substitutions (see methods) – data shown in panel (b) are addition-ally filtered to remove cases where W = W W /W . The analysis on the left panelwould suggest that epistasis is primarily antagonistic, whereas in fact the right panelshows that synergistic interactions dominate along the walk. N = 20, K = 5 and A = 2. 31 Appendix
The simplicity of the NK model allows the probabilities of different epistatic relation-ships between substituting sites to be calculated, assuming that sites are chosen tosubstitute at random. For distinct loci i and j , the random (neutral) walk prevalencesare given by: p N,u = KN − p N,d = KN − p N,i = 1 − (cid:18) − KN − K − N − (cid:19) N − (8) p N,m = (1 − p N,u )(1 − p N,d )(1 − p N,i )= (cid:18) − KN − (cid:19) (cid:18) − KN − K − N − (cid:19) N − (9) Our results focus on two main types of epistasis: a site may be upstream or down-stream of another site. While both types of influence create an epistatic interactionbetween the fitness effects of changes at both sites, simulation results reveal a pref-erence for interactions where the first site to substitute, i , is upstream of j . Here wetry to quantify both hypothesized effects of epistasis – the detrimental undermin-32ng of fixed beneficial effects, and the positive potential for epistasis to uncover newdirections for adaptation – by calculating conditional probabilities in the NK model.The first goal is to calculate the probability that a site j with alleles b and B iscapable of a beneficial substitution after a beneficial change at site i from allele a toallele A , assuming epistasis between i and j . We consider two ways in which sites i and j can interact: i is upstream of j , and so i determines the effect of substitution at j , or i may be downstream of j ; we will ignore the case where each site may mutually influence one another. In both cases, we want to solve: P { W AB > W Ab } = P { w A ( B ) w B ( A ) > w A ( b ) w b ( A ) } (10)where the notation w A ( b ) indicates the fitness effect of allele b at site j when site i has allele A , and W AB represents the total fitness contribution of both sites whensite i has allele A and site j has allele B . Because fitness is a product across sites, wedefine W AB = w A ( B ) w B ( A ). While fitness is technically the Nth root of the productacross all N sites, our results below depend only on the ranks of fitness contributions;we therefore ignore this complication.These calculations, performed in detail below, demonstrate that the chance for asite to be evolvable – selectively favored to substitute – is diminished if that site’sepistatic partner has beneficially substituted, and that this reduced probability iscaused by the fact that such a substitution will disrupt the genetic background thatwas favorable to the prior substitution. Strikingly, these costs to epistasis seem tobe consistent between the two types of epistasis, despite the observed preference forinteractions where i is upstream of j (Figure 3). While these two kinds of epistasis are33qually likely to undermine the benefit of previous substitutions, and so have a netdeleterious effect, they differ in a more subtle aspect of evolvability. Natural selectionshould exhaust the set of potentially beneficial changes and so late in adaptation weexpect evolvable sites to be rare. To simulate late adaptation, we could assume that w a ( B ) w B ( a ) < w a ( b ) w b ( a ) – that is, that site j was not evolvable prior to change atsite i . Does this assumption reveal a difference between the two classes of epistasis?Simple numerics demonstrate that this additional assumption favors substitutionat j when j is downstream of i : 35.8% ± i , as opposed to only 14.4% ± j is upstream of i . Whilethese probabilities could be derived as above, a more intuitive approach is to considerthe effect of each assumption of the fitness effect of a substitution at site j , aftersubstitution at site i . This effect is related to the ratio: w B ( A ) w A ( B ) w b ( A ) w A ( b )In either model, our expectation of the value of the denominator is increased bythe knowledge that w a ( B ) w B ( a ) < w a ( b ) w b ( a ) < w A ( b ) w b ( A ). When j is downstreamof i , this increase is partially mitigated by two factors: w B ( A ) = w b ( A ), and theremaining term in the numerator, w A ( B ), is an independent draw. However, when j is upstream of i , then w B ( A ) is an independent draw, and w A ( B ) is expected to besmaller than an independent draw, since it equals w a ( B ). This asymmetry betweenthe evolutionary acceptability of two directions of epistatic influence is thereforecaused by another asymmetry: the high likelihood that sites are unevolvable later inan adaptive process. 34 .3 Site i is upstream of j When j depends on i , but i is independent of j , then we have w b ( A ) = w B ( A ) and w b ( a ) = w B ( a ). Eq. 10 therefore reduces to: P { W AB > W Ab } = P { w A ( B ) w B ( A ) > w A ( b ) w b ( A ) } = P { w A ( B ) > w A ( b ) } Because the beneficial substitution a → A occurred on background b , and weassume only beneficial substitutions, w b ( A ) w A ( b ) > w b ( a ) w a ( b ). We want the distri-bution of w A ( b ), given that: w A ( b ) > w b ( a ) w a ( b ) w b ( A )All four of these variables are standard random uniform variates, and we assumethat they are independent. Let x = w b ( a ), y = w a ( b ), z = xy , and v = w b ( A ). Wecan therefore solve for the convolution z = xy , then for the quotient distribution zv .We can find the distribution of z by constructing the cumulative distributionfunction: F xy ( z ) = z + (cid:90) z zx dx = z (1 + lnx | z ) = z (1 − lnz )Then we differentiate to obtain the p.d.f.: f xy ( z ) = − ln ( z )35ext, we obtain the convolution of z over v by a similar construction. Let q = zv ,and note that q ranges from zero to infinity. A simple geometric approach revealsthat q is piecewise at 1. When q ≤
1, we can define the c.d.f.: F z/v ( q | q ≤
1) = (cid:90) q − ln ( z ) (cid:18) − zq (cid:19) dzf z/v ( q | q ≤
1) = 14 (3 − ln ( q ) + 1))and when z > F z/v ( q | q >
1) = (cid:90) − ln ( z ) (cid:18) − zq (cid:19) dzf z/v ( q | q >
1) = 14 q However, our goal is to find the distribution of w A ( b ) given that it is greater than q , and since 0 ≤ w A ( b ) ≤
1, this latter piece of f ( q ) is not needed. f ( x = w A ( b )) = (cid:82) x − ln ( q ) + 1) dq (cid:82) dx (cid:82) x − ln ( q ) + 1) dq = x (3 − ln ( x ))2Focal site j will be beneficial if w A ( B ) > w A ( b ), which we can now calculate as:36 { w A ( B ) > w A ( b ) } = (cid:82) dx (cid:82) x y (3 − ln ( y )) dy (cid:82) dx (cid:82) y (3 − ln ( y )) dy = 718 i is downstream of j If i depends on j , but j is independent of i , then w a ( b ) = w A ( b ) and w a ( B ) = w A ( B ).The condition w A ( b ) w b ( A ) > w a ( b ) w b ( a ) therefore simplifies to w b ( A ) > w b ( a ). Let x = w b ( A ) and y = w b ( a ) and assume that these are independent uniform randomvariates. p ( x | x > y ) = 2 (cid:90) x dx = 2 x Now we let v = w A ( b ) and find the convolution z = xv | x > y by first deriving ac.d.f. F xv | x>y ( z ) = 2 (cid:90) z xdx + 2 (cid:90) z x zx dx = z + 2 z − z = 2 z − z f xv | x>y ( z ) = 2 − z Now, we want to assess the probability that w = w A ( B ) w B ( A ), a pair of inde-pendent random uniform variates, is greater than z . Using the result above that p ( w ) = − ln ( w ), we can write this as a ratio of integrals:37 { w A ( B ) w B ( A ) > w A ( b ) w b ( A ) } = (cid:82) − ln ( w ) dw (cid:82) w − zdz (cid:82) − ln ( w ) dw (cid:82) − zdz = 71838 .5 Supplemental Figures Figure 6: Relation of the beneficial fitness effect of substitution i to the change inthe fitness effect conferred by a mutation at interacting site j . Data are drawn fromthe first step in adaptive walks. a) All types of interactions; b) Interactions in whichsite i is downstream of site j ; c) Interactions in which i is upstream of j . N = 20, K = 1 and A = 2. 39 . . . . . . . K = 1
Step Number O u t D eg r ee K = 5
Step Number O u t D eg r ee Figure 7: Mean out-degree (number of loci which depend epistatically on the focalsite) of substituted sites. Each plotted point is based on at least 5000 adaptive walks. N = 20 and A = 2. 40 . . . . . . . A = 2
Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i . . . . . . . A = 4
Step Number F r equen cy o f R e l a t i on s h i p . . . . . . . A = 8
Step Number F r equen cy o f R e l a t i on s h i p . . . . . . A = 16
Step Number F r equen cy o f R e l a t i on s h i p Figure 8: Observed frequency of directional epistatic interactions between substi-tution i and its immediate successor j for adaptive walks on N K landscapes withdifferent values of A , the number of alleles per locus. A single pair may be countedas more than one type of epistasis, as in the case of reciprocal interactions. Thedashed lines depicts the predicted incidences if sites are chosen uniformly at random,as calculated in Eqs. 2&3. N = 20 and K = 1. Each plotted point is based on atleast 5000 replicates. 41 . . . . . Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i . . . . Step Number F r equen cy o f R e l a t i on s h i p . . . . . Step Number F r equen cy o f R e l a t i on s h i p . . . . . . Step Number F r equen cy o f R e l a t i on s h i p Figure 9: Observed frequency of directional epistatic interactions between substi-tution i and its immediate successor j for adaptive walks on N K landscapes fordifferent values of the fitness percentile of the starting genotype. A single pair maybe counted as more than one type of epistasis, as in the case of reciprocal interac-tions. The dashed lines depicts the predicted incidences if sites are chosen uniformlyat random, as calculated in Eqs. 2&3. N = 20 and K = 1. Each plotted point isbased on at least 5000 replicates. 42 . . . . . . Theta = 1
Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i . . . . . . . . . . . Theta = 10
Step Number F r equen cy o f R e l a t i on s h i p . . . . . Figure 10: Observed frequency of directional epistatic interactions between substitu-tion i and its immediate successor j for individual-based simulations with populationsizes of 1000 and per-genome mutation rates of 0.001 and 0.01. A single pair may becounted as more than one type of epistasis, as in the case of reciprocal interactions.For K >
1, another class of epistasis is possible: both i and j may jointly influence athird site, which we refer to as an indirect interaction. The dashed lines depicts thepredicted incidences if sites are chosen uniformly at random, as calculated in Eqs.2-5. Red lines and corresponding axes depict the mean change in fitness. N = 20and K = 1. 43 . . . . . . . Theta = 1
Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i Indirect . . . . . . . . . . . . . Theta = 10
Step Number F r equen cy o f R e l a t i on s h i p . . . . . Figure 11: Observed frequency of directional epistatic interactions between substitu-tion i and its immediate successor j for individual-based simulations with populationsizes of 1000 and per-genome mutation rates of 0.001 and 0.01. A single pair may becounted as more than one type of epistasis, as in the case of reciprocal interactions.For K >
1, another class of epistasis is possible: both i and j may jointly influence athird site, which we refer to as an indirect interaction. The dashed lines depicts thepredicted incidences if sites are chosen uniformly at random, as calculated in Eqs.2-5. N = 20 and K = 5. 44 . . . . . . Step Number (Index of i) F r equen cy o f E p i s t a s i s be t w een i and j Step Number T y pe o f E p i s t a s i s . . . . . . . . . . . . NeutralSynergyAntagonismSign
Figure 12: Epistasis in individual-based evolution in a model of RNA folding. a)Observed frequency of epistatic interactions between substitution i and its immedi-ate successor j . Error bars indicate confidence intervals. b) Prevalence of four typesof epistasis among those pairs of substitutions that interact. For both figures, pop-ulation of 5000 individuals were simulated for 50,000 generations at a per-genomemutation rate of 2 × − . 45 ean Spearman's rho -0.6 -0.4 -0.2 Figure 13: Spearman rank regression coefficients for the relationships between thefitness effect of a substitution and the fitness of the genetic background in which thatsubstitution is made for adaptive walks which exceed multiplicative expectationsat each step. Each substitution is tested against the sixteen genetic backgroundscomprising all combinations of the other four substitutions. Replicates are filteredto remove walks with too little epistasis among substitutions (see main text). N = 20, K = 5 and A = 2. 46 ean Spearman's rho -0.8 -0.6 -0.4 -0.2 a) Substitutions P r opo r t i on B e tt e r T han E x pe c t ed . . . . . . b) Figure 14: Two views of epistasis among the first five substitutions on randomwalks. a) Spearman rank regression coefficients for the relationships between thefitness effect of a substitution and the fitness of the genetic background in whichthat substitution is made. Each substitution is tested against the sixteen geneticbackgrounds comprising all combinations of the other four substitutions. b) Pro-portion of combinations of substitutions along the line of descent which exceed thefitness expected from the fitness effects of the individual mutations in the ancestor.Expectations are computed according to Equations 7-10. Standard errors are lessthan 1%. In both figures, replicates are filtered to remove walks with too little epis-tasis among substitutions (see main text) – data show in panel (b) are additionallyfiltered to remove cases where W = W W /W . N = 20, K = 5 and A = 2.47 - . . . . Substitution Number O b s e r v ed - E x pe c t ed F i t ne ss a) - . . . . Substitution Number O b s e r v ed - E x pe c t ed F i t ne ss b) Figure 15: Epistasis deviation is initially negative, then becomes positive along thesequence of the first five substitutions observed in a microbial evolution experiment(Khan et al. 2011). The substituted loci are labeled r , t , s , g , and p respectively.Let w r be the fitness of the mutant r allele relative to the ancestor, w rt the fitnessof the double mutant, and so on. Expected fitness can be calculated in at leasttwo ways, each of which shows the same qualitative pattern. a) Expected fitness iscomputed as the product of the effects of each mutation in the ancestral genotype.For example, the deviation for substitution 3 is calculated as w rts w rt − w s . b) Expectedfitness is computed with reference to the effect of the focal allele in the previousgenetic background. For example, the deviation for substitution 3 is here calculatedas w rts w rt − w rs w rr