[PDF] Selection biases the prevalence and type of epistasis along adaptive trajectories

Abstract

The contribution to an organism's phenotype from one genetic locus may depend upon the status of other loci. Such epistatic interactions among loci are now recognized as fundamental to shaping the process of adaptation in evolving populations. Although little is known about the structure of epistasis in most organisms, recent experiments with bacterial populations have concluded that antagonistic interactions abound and tend to de-accelerate the pace of adaptation over time. Here, we use a broad class of mathematical fitness landscapes to examine how natural selection biases the mutations that substitute during evolution based on their epistatic interactions. We find that, even when beneficial mutations are rare, these biases are strong and change substantially throughout the course of adaptation. In particular, epistasis is less prevalent than the neutral expectation early in adaptation and much more prevalent later, with a concomitant shift from predominantly antagonistic interactions early in adaptation to synergistic and sign epistasis later in adaptation. We observe the same patterns when re-analyzing data from a recent microbial evolution experiment. Since these biases depend on the population size and other parameters, they must be quantified before we can hope to use experimental data to infer an organism's underlying fitness landscape or to understand the role of epistasis in shaping its adaptation. In particular, we show that when the order of substitutions is not known to an experimentalist, then standard methods of analysis may suggest that epistasis retards adaptation when in fact it accelerates it.

Full PDF

SSelection biases the prevalence and type of epistasisalong adaptive tra jectories

Jeremy A. Draghi

1, 2, * and Joshua Plotkin Department of Biology, University of Pennsylvania, Philadelphia,PA, USA Present Address: Department of Zoology, University of BritishColumbia, Vancouver, BC, Canada * Corresponding author a r X i v : . [ q - b i o . P E ] D ec Abstract

The contribution to an organism’s phenotype from one genetic locus may dependupon the status of other loci. Such epistatic interactions among loci are now recog-nized as fundamental to shaping the process of adaptation in evolving populations.Although little is known about the structure of epistasis in most organisms, recentexperiments with bacterial populations have concluded that antagonistic interactionsabound and tend to de-accelerate the pace of adaptation over time. Here, we usea broad class of mathematical ﬁtness landscapes to examine how natural selectionbiases the mutations that substitute during evolution based on their epistatic in-teractions. We ﬁnd that, even when beneﬁcial mutations are rare, these biases arestrong and change substantially throughout the course of adaptation. In particular,epistasis is less prevalent than the neutral expectation early in adaptation and muchmore prevalent later, with a concomitant shift from predominantly antagonistic inter-actions early in adaptation to synergistic and sign epistasis later in adaptation. Weobserve the same patterns when re-analyzing data from a recent microbial evolutionexperiment. Since these biases depend on the population size and other parame-ters, they must be quantiﬁed before we can hope to use experimental data to inferan organism’s underlying ﬁtness landscape or to understand the role of epistasis inshaping its adaptation. In particular, we show that when the order of substitutionsis not known to an experimentalist, then standard methods of analysis may suggestthat epistasis retards adaptation when in fact it accelerates it.2

Author Summary

A major goal in evolutionary biology is to relate the adaptation of an organismstraits to changes in their genome. When researchers identify substitutionspopulation-wide changes in DNA sequencethey often ﬁnd that the eﬀects of those changes onﬁtness interact epistatically, or non-linearly. While epistasis is often assumed to berelevant only for very large and diverse populations, here we present computationalresults that demonstrate that epistasis shapes the pattern of adaptive change in allpopulations. Our results allow us to make predictions about what kind of epistasisis most likely to occur, with distinct predictions for patterns in the early or laterphases of adaptation. We also apply these results to recent evolution experimentwith microbes and show that epistasis may appear to slow adaptation, when in factit may be facilitating evolution by natural selection. Our results will provide a newbasis for comparison for all experimentalists who investigate the ﬁtness eﬀects of themolecular changes underlying adaptation.3

Introduction

Two sites in a genome interact epistatically when the contribution to a trait at onesite depends on the state of the other site. While epistasis has been a signiﬁcanttheme in topics such as the evolution of sex and robustness to mutation, its rolein the dynamics of evolving populations has only begun to be explored. Recentexperimental evolution studies of microbes [1, 2, 3, 4] and biomolecules [5, 6, 7, 8] haverevealed that epistasis is widespread and consequential for adaptation. These studies,combined with experiments that reconstruct ancestral genotypes [9, 10, 11, 12, 13,14, 15] or examine numerous combinations of adaptive mutations [16, 17, 18, 19],have amply demonstrated that molecular evolution cannot be explained or predictedwithout understanding how gene interactions shape adaptive possibilities.Evolution experiments are increasingly used to address two questions that areconceptually distinct, but empirically entangled: of the mutations that could playa role in adaptation, how do their interactions shape evolution, and how do evolu-tionary processes determine what kinds of interactions occur among the sites thatsubstitute? The ﬁrst question focuses on the properties of the genotype-phenotype-ﬁtness relationship, also called the adaptive landscape, while the second questionasks how the mutations chosen by evolution reﬂect that underlying landscape. Ifselection and other evolutionary forces are blind to epistasis – that is, if interactionsamong sites do not inﬂuence the likelihood that they will substitute – then this sec-ond question is irrelevant, and the genetic changes we see in evolution experimentsperfectly mirror the epistatic properties of the underlying adaptive landscapes. If,however, evolution biases the ﬁxation of groups of mutations with speciﬁc patterns4f interactions, then evolution experiments present a complex problem: if epistasisshapes evolution, and evolution distorts the appearance of epistasis, then how can weuse evolution experiments to infer the underlying ﬁtness landscape? This ambiguitycomplicates even qualitative inferences such as whether epistasis among genes canbe said to have slowed or hastened adaptation. To resolve this ambiguity researchersmust ﬁrst understand how evolution biases the combinations of sites that substitutein an adapting population. Only then can researchers hope to correct for these bi-ases, which will depend upon the size, mutation rate, and other characteristics ofthe population, in order to infer the underlying ﬁtness landscape from experimentaldata.Many theoretical studies of epistasis and patterns of asexual adaptation havefocused on questions of the existence and accessibility of multiple ﬁtness peaks [20,21, 22, 23, 24, 25, 26, 27, 28]. While such work has clariﬁed the broad-scale pictureof how epistasis shapes adaptation, its usefulness in predicting microevolutionarydynamics is limited. In contrast, our interest here is how experiments on adaptingpopulations can be used to infer the properties of an organism’s underlying ﬁtnesslandscape and how epistasis shapes those experimental outcomes.We use a computational model to clarify the evolutionary eﬀects of two contradic-tory roles of epistasis: epistatic interactions can undermine the beneﬁts of previouslyadaptive genetic substitutions, but they can also produce new paths to higher ﬁtness[29, 30]. The ﬁrst of these eﬀects would tend to retard adaptation, and the lattereﬀect would accelerate it. Our results show that natural selection biases the preva-lence and type of epistatic interactions among the mutations that substitute, even5hen mutations are too rare to interact directly as coexisting polymorphisms. Herewe work to quantify precisely how selection biases the epistasis among mutationsthat substitute in an adapting population and to understand why these biases arise.

To understand how selection shapes epistasis among the mutations that substitute,we ﬁrst simulated adaptive walks in which beneﬁcial mutations substitute sequen-tially. Østman et al. suggest that epistatic interactions should not inﬂuence substi-tution patterns when rare beneﬁcial mutations ﬁx independently [28]. Our resultsinitially appear to conﬁrm this expectation: Figure 1(a) shows that sites that ﬁxsequentially are almost as likely to interact epistatically as are pairs of randomlychosen sites. However, this concordance disappears when epistasis is examined alongthe sequence of steps comprising an adaptive walk: Fig. 1(b) shows that epistasisis in fact suppressed early in adaptation, and enriched among later steps, comparedto a random (neutral) walk. Thus, selection biases the amount of epistasis amongthe mutations that ﬁx along an adaptive walk, and it does so in a complex manner.These opposing eﬀects produce an apparent agreement with the random expectationfor the overall prevalence of epistasis when observations are coarsely averaged acrossentire walks, but in fact these results demonstrate that epistasis can shape patternsof substitutions even when mutations ﬁx independently, one after another.Why does selection suppress epistasis early in walks and promote it later? To6ddress this, we studied how mutations at sites along an adaptive walk inﬂuence theﬁtness eﬀects of the sites with which they interact (upstream or downstream). Inparticular, in the simple case of K = 1, Fig. 2 shows the distribution of ﬁtness eﬀectsof mutations at those sites that do and do not interact with the site that has justsubstituted along a walk. For a site i that changes early in the walk, mutations at itsinteracting sites are less likely to be beneﬁcial. In other words, adaptive substitutionsearly in the walk partly undermine the beneﬁts that would be conferred by mutationsat their partner sites. Therefore, after an early substitution at one site, its epistaticpartners are less likely to substitute than they would have otherwise – and so theearly steps in an adaptive walks exhibit a deﬁcit of epistasis compared to the neutralexpectation.This bias against epistasis early in a walk is caused by the dependency of selectedsubstitutions on the backgrounds in which they were selectively favored. When asite forms part of the relevant genetic background for a beneﬁcial substitution, it isstatistically likely that changes at such a site would partially undermine the beneﬁcialeﬀect of this ﬁxed adaptive substitution. Supplemental Figure 6(a) demonstratesthis regression to the mean eﬀect: selective coeﬃcients of mutations at a site j aresuppressed when j interacts with a site i that has just ﬁxed an adaptive mutation,and this suppression is greater when the beneﬁcial eﬀect of the substitution at i islarger. 7 .2 Form of epistasis along an adaptive walk Aside from biasing the amount of epistasis along a walk, selection also biases the typeof epistasis between successive substitutions. We ﬁnd that the predominant sign ofepistasis, as well as its prevalence, depends on the position of the substitutions alongan adaptive walk. Fig. 3 shows that early substitutions tend to show antagonism withone another, while later substitutions typically exhibit sign epistasis, deﬁned as pairsof mutations where at least one member has a beneﬁcial ﬁtness eﬀect on one back-ground and a deleterious eﬀect on the other. Synergy between beneﬁcial mutationsis present but less common than antagonism at early steps and less common thansign epistasis at later steps. The shift from antagonistic towards synergistic/signepistasis helps to explain why epistasis is suppressed early in an adaptive walk andaugmented later in the walk.Selection also biases the directionality of interactions between successive substi-tutions along an adaptive walk (see Methods). Fig. 4a shows that interactions with i upstream of j are more frequent than the converse along the entire adaptive walk.This diﬀerence is explained by disparate eﬀect on the evolvability of site j : j ismore likely to become evolvable (able to substitute beneﬁcially) if it is inﬂuencedby the preceding substitution at i , than if it inﬂuences i (see Supplemental Text).This result is conﬁrmed by the regressions in Supplemental Figure 6(b & c), andit explains why such interactions are more prevalent than the converse (Figure 4a).When K >

1, another type of interaction is possible: i and j might not inﬂuenceeach other directly, but both might inﬂuence a third site. Epistasis of this type isexpected to be very common when K is a substantial fraction of N , and results for8 = 5 show that the prevalence of this type of interaction also changes along adap-tive walks (Fig. 4b). Thus, evolution biases the types and directions of interactionsamong substitutions along an adaptive walk.Choosing the K sites which inﬂuence each of the N loci deﬁnes a network ofdirected interactions. While the number of sites that inﬂuence a locus (its in-degree)is ﬁxed at K , the number of sites that a locus inﬂuences (its out-degree) is variableand approximately Poisson distributed. Evolutionary preferences for certain types ofepistasis, such as the bias in favor of substitutions at sites that depend on previoussubstitutions, might then lead to systematic variation in the out-degree of sites whichsubstitute adaptively. Supplemental Figure 7 conﬁrms this hypothesis: the out-degree of substituted sites is initially slightly higher than expected, then declineswith substitution number. This evidence implies that those sites with more epistaticconnections are less likely to change during adaptation, because mutations at suchsites have a greater chance of undermining previously selected beneﬁcial changes. The NK model has several features that could amplify the biases in epistasis in-troduced by natural selection. In order to assess whether these model assumptionsmight lead to spurious results, we explored how the patterns shown in Figure 4change as the number of alleles per site and the starting ﬁtnesses were varied. Sup-plemental Figure 8 shows that deviations from the expected prevalence of epistasisare qualitatively similar, and quantitatively greater, when the number of alleles persite, A , is increased. Supplemental Figure 9 conﬁrms that the basic pattern of our9esults is also robust to changes in the ﬁtness of the starting genotype.While we have focused on evolutionary dynamics in the simpliﬁed, strong-selection-weak-mutation regime, we can use individual-based simulations to explore epistasisin polymorphic populations with larger values of the population-scaled mutationrate, θ . Supplemental Figures 10 and 11 show very similar patterns for the preva-lence of epistasis among adaptive substitutions, even when θ is much greater than1. Some diﬀerences from Fig. 4 are apparent in the high- θ case during the period ofstabilizing selection that follows adaptation. However, these simulations at high θ conﬁrm that the major patterns found in adaptive walks at low θ : there is a deﬁcitof epistasis early in evolution, and a surplus of epistasis later in adaptation, with apredominance of interactions in which the eﬀect of each substitution depends on thepreceding substitution along the line of descent.We also considered a very diﬀerent set of ﬁtness landscapes – computationallypredicted RNA folding – to assess the generality of our principal ﬁndings. RNA se-quences do not have static epistatic interactions between sites; instead, interactionsemerge from the folding topology and change with genotype. However, we can stillmeasure the average frequency of epistasis between substitutions on an evolutionaryline of descent; such data show that the prevalence of epistasis does vary systemati-cally along a series of substitutions, although the trend is toward decreased epistasis(Fig. 7(a)). The type of epistasis can also be quantiﬁed, although the high frequencyof conditionally neutral mutations in RNA necessitates a new category of neutralepistasis. Fig. 7(b) shows that, as in the NK model, early antagonism gives way toa high prevalence of sign (and neutral) epistasis later in adaptation. While these10esults diﬀer in some features from those obtained in the NK model, they furtherillustrate that evolution at θ < Two recent studies on experimental populations of bacteria have inferred that an-tagonistic epistasis among beneﬁcial mutations is common and ultimately explainsa trend of diminishing ﬁtness gains over time [2, 3]. These studies relied in parton regression analyses of the ﬁtness eﬀects of observed substitutions in the presenceand absence of the other beneﬁcial substitutions observed in the experiment. Bothstudies found a trend towards smaller beneﬁcial eﬀects when substitutions were as-sayed in backgrounds of higher ﬁtness and so concluded that antagonistic epistasis11ecelerates adaptation. However, we demonstrate below that in the NK model, suchregressions are not a reliable indicator of the eﬀect of epistasis on the speed or extentof adaptation. Our results suggest that a common statistical artifact – regression tothe mean – confounds the interpretations of such regressions and that analyses ofthe role of epistasis in adaptation may be meaningful only when the actual orderedsequence of substitutions is known.We performed the same kinds of regressions as Chou et al. and Khan et al.[2, 3] on adaptive walks simulated on NK landscapes. Speciﬁcally, we computedrank regression coeﬃcients of background ﬁtness versus ﬁtness eﬀect for the ﬁrstﬁve substitutions in such adaptive walks (see Methods). The distribution of averageregression coeﬃcients in Fig. 5(a) shows a bias toward negative values similar tothose seen in bacterial experiments [2, 3], suggesting that epistasis becomes morenegative with each substitution and decelerates the pace of adaptation. However,this interpretation is contradicted by our results above: Fig. 3 clearly shows that,on average, epistasis becomes more positive with each substitution. Figs. 1,2, and4 also support this view: epistasis is initially disruptive to the large ﬁtness gains ofearly adaptive changes, but then facilitates later adaptive steps. And, ﬁnally, Fig.5(b) shows that a genotype along the line of descent is typically more ﬁt than wouldbe predicted from the ﬁtness eﬀects of its component mutations in the ancestor, andthat this synergistic eﬀect increases along adaptive walks.Our results imply that that regression analysis of ﬁtness eﬀects on diﬀerent ge-netic backgrounds (e.g. those performed by Khan et al. [3] and Chou et al. [2])may be misleading. To clarify this issue, we examined the mean regression coef-12cients for those adaptive walks that were unequivocally accelerated by epistasis –namely, those walks in which the eﬀect of each subsequent mutation was greater thanexpected under multiplicativity. Even when restricted to these highly synergisticwalks, the regression analyses of the type shown in Fig. 5a. are most often nega-tive and so would erroneously suggest increasing antagonism (Supplemental Figure13). Furthermore, we also performed random (neutral) walks, in which substitutionsalong the line of descent are equally likely to exceed or fall short of their expectedmultiplicative ﬁtness, given the ﬁtness eﬀect in the ancestor; even in these walks,regression coeﬃcients of the type studied by Khan et al. [3] and Chou et al. [2] tendbe negative (Supplemental Figure 14). These data conﬁrm that regressions of ﬁtnesseﬀects against the ﬁtnesses of genetic backgrounds cannot be reliably used to inferwhether epistasis has slowed or accelerated epistasis, at least in the NK model.The tendency of these regressions toward negative slopes may be caused by thewell-known confound of ”regression to the mean.” In these regressions, the dependentvariable, ﬁtness eﬀect of substitution i on a genetic background, is mathematicallyinterrelated with the independent variable, the ﬁtness of that same background. Ifthe ﬁtness of the background genotype is very poor, then the epistatic eﬀects ofits alleles are statistically likely to be unusually poor. Any change that perturbsthese epistatic eﬀects is likely to improve their ﬁtness contributions. Therefore, asubstitution in a very unﬁt background is likely to show a large beneﬁcial eﬀectsimply by perturbing the ﬁtness eﬀects of interacting sites. A similar argument canbe made to explain why ﬁtness eﬀects are often small or even deleterious in highly ﬁtbackgrounds and, by extension, why negative correlations are a likely consequence13f the interdependence between the variables in this regression.The results in this section highlight the diﬃculty of interpreting experimental datawhen the order of substitutions in unknown. The ”regression to the mean” eﬀectwe observed may be magniﬁed by the nature of the NK model. Nonetheless, thisexample shows that the epistatic properties of a selected sequence of substitutionsmay diﬀer strongly and systematically from the broader pattern of epistasis in theunderlying landscape. To solve the dual problem of epistasis in evolution experiments – that epistasis bothshapes the evolutionary process and can be inferred by manipulating the sites thatsubstitute adaptively – requires theory beyond current knowledge in population ge-netics. Because the term “epistasis” encompasses all scenarios in which ﬁtness eﬀectsof alleles do not combine independently, no general model of epistasis has been pro-posed, let alone analyzed. Instead, exploration of a few “toy” models has led toappreciation of the subtle and signiﬁcant ways that epistasis complicates our under-standing of evolution.Here we have used the NK model to contravene the intuitive notion [28] thatsites substituting one after another will be selected without regard to their epistaticinteractions. To summarize, we have shown that, even when mutations are rare,evolution selects among possible substitutions based upon the number and directionof connections to other loci, and that these selective biases change substantially14long the course of adaptation. In the NK landscapes, epistasis is less prevalent thenthe random expectation early in adaptation and much more prevalent later, with aconcomitant shift from predominantly antagonistic interactions early in adaptationto synergistic and sign epistasis later in adaptation. Additionally, sites with moreepistatic inﬂuences on other loci are more likely to substitute early than late inadaptive evolution. These results suggest that even the most basic evolutionaryprocess acting in the context of a simple ﬁtness landscape can produce a complexexpectation for epistasis. Experimentalists must account for this baseline action ofnatural selection on epistasis among substitutions if they hope to infer the propertiesof the ﬁtness landscapes underlying experimental or natural evolutionary outcomes.The basic intuition behind our results is simple. Early on, large-eﬀect mutationstend to act antagonistically, which suppresses the frequency of epistasis among sub-sequent substitutions. Later on, the only way to achieve further ﬁtness gains is byfortunate sign epistasis, and this eﬀect tends to augment the appearance of epistasisas the population approaches a ﬁtness peak.We have focused on evolution by sequential ﬁxation of beneﬁcial substitutionsin order to demonstrate that this seemingly simple case conceals several layers ofcomplexity. However, our results suggest patterns, such as a decrease in observedepistasis among early substitutions and an increase in epistasis among later ones, thatextend to polymorphic populations as well. By focusing on the diﬀerences betweenthe distribution of epistasis among all sites, and the speciﬁc sequence of substitu-tions, our approach highlights the potential for misleading inferences from evolutionexperiments when the order of substitutions is unknown. Speciﬁcally, it may be dif-15cult to reliably determine how the sign of epistasis varies with ﬁtness using only theﬁtnesses of an ancestral, derived, and possible intermediate genotypes. Regressionanalyses that ignore substitution order might suggest that epistasis is deceleratingadaptation ([2, 3], Fig. 5), whereas in fact epistasis has had an accelerating eﬀect onthe trajectory of ﬁtnesses along the actual path of adaptation. Indeed, our reanalysisof data from [3] supports this possibility. Such discrepancies illustrate the impor-tance of measuring the order in which substitutions occur, in future experimentalstudies, in order to understand how epistasis has shaped a population’s trajectory.While the NK model has the advantages of a tunable level of epistasis and anextensive history of prior work, it certainly does not capture the full range of pos-sibilities of interactions among genes. Our results with larger numbers of alleles inthe NK model, with a computational model of RNA folding, and with data fromevolving microbial populations, suggest that the patterns we have identiﬁed in sim-ple cases may be even more pronounced in models that better approximate biolog-ical complexity. However, the most important conclusion from our study is thatnew methods of data-gathering, such as cheap whole-genome sequencing and high-throughput ﬁtness assays, will not suﬃce to answer basic questions about the role ofgenotype-phenotype maps in evolution. Pioneering studies have provided compellingexamples of the ubiquity of epistasis, but they also serve to exemplify the substantialgap that separates data from hypotheses in experimental evolution. To deﬁnitivelylink gene interactions to the rate or predictability of adaptation will require a signif-icant expansion of the theory of population genetics, and a vital ﬁrst step is seriousengagement with ”toy” models of interacting loci.16

Methods

Invented by Stuart Kauﬀman to describe evolution on rugged ﬁtness landscapes[31, 32, 20], the NK model produces complex but computationally tractable genotype-ﬁtness maps using only the parameters N , K , and A . The parameter N deﬁnes thenumber of sites, each of which can assume any of A alleles. The ﬁtness of a genotypeis calculated in two steps: ﬁrst, the ﬁtness contribution of each site is determined byreference to a table of pre-calculated values; second, these ﬁtness contributions aremultiplied together and the N-th root of this product is taken as ﬁtness. When K iszero, the ﬁtness contribution of a site depends only on its own allele, and not on thestate of other sites; the lookup table for each site therefore contains only A possibleﬁtness contributions. When K >

0, the ﬁtness contribution of a site depends onits own allele as well as the alleles at K other sites, yielding a lookup table with A K +1 entries. By specifying that each of the entries in the lookup tables for all N sites are drawn independently from a broad distribution (in our case the uniformdistribution), the NK model ensures that the ﬁtness eﬀect of a substitution dependsstrongly and randomly on some fraction of the genetic background, determined by K . K is constant across sites and genotypes for a particular landscape, and the K sites upon which each locus depend are drawn uniformly from the N − i may depend on the state of site j without implying that the contributionof j also depends on i . We can therefore categorize relationships between sites: i downstream of j if the ﬁtness contribution of i depends on j , and j is conversely upstream of i in this example. While this direction of inﬂuence is signiﬁcant for someof our results below, we note a subtle confusion between epistasis as deﬁned in theNK model, and epistasis as deﬁned by an experimentalist. This confusion stems fromthe fact that an experimentalist measures the ﬁtness eﬀect of a substitution, whilethe NK model considers ﬁtness contributions of sites. Therefore, the ﬁtness eﬀect ofa substitution at site i will show epistasis with site j not only if i is downstream of j , but also if it is upstream of j . Similarly, the ﬁtness eﬀect of a substitution at i will be epistatic with j if both i and j are upstream of some mutual site k , even ifneither i nor j directly inﬂuence one another. To minimize this potential confusion,we follow the empirical deﬁnition and use the term ‘epistasis’ to refer to any case inwhich the ﬁtness eﬀect of a substitution at a site depends on the state of some othersite. This usage makes the common assumption that independent ﬁtness eﬀects aremultiplicative.Aside from NK landscapes, we also explore epistasis and evolutionary dynamicsusing RNA folding landscapes, as well as experimental bacterial data, describedbelow. To investigate epistasis among genetic substitutions, we employed two types of sim-ulations: adaptive walks, which model simpliﬁed ﬁxation dynamics in essentiallymonomorphic populations; and individual-based Monte Carlo simulations of poten-tially diverse populations. In an adaptive walk the population is represented by a18ingle genotype, and a mutation to any of the A − s i = w i w − w is the ﬁtness of the currently ﬁxed genotype and w i the ﬁtness of the mutant i .Evolution is then a Markov process with transition probabilities deﬁned by P i = s i (cid:80) i ∈ M s i (1)where M i is the set of all adaptive, one-mutant neighbors of the current genotype[34, 35]. These walks therefore substitute one mutation at a time, strictly increasingin ﬁtness until a local maximum is reached.We used the Wright-Fisher to describe evolution in polymorphic populations: anasexual population of ﬁxed size n reproduces with discrete generations and selectionon fertility. Mutations occur at Poisson frequencies according to the per-genomerate µ . Because our goal is to investigate evolutionary dynamics when nµ = θ isnear or greater than 1, we required a method of detecting substitutions that doesnot depend on independent, well-demarcated ﬁxation events. We therefore trace theline of descent from the ﬁnal population back to the initial generation and recordchanges along this lineage. To minimize the false identiﬁcation of polymorphismsas substitutions, we ran these simulations for n generations past the desired endingtime-point. We then selected the most ﬁt individual and traced its lineage back tothe start, discarding any genetic changes that arose during those last n generations.To compare our results with empirical data, we performed two types of regres-sions on the ﬁtness eﬀects of the ﬁrst few substitutions in evolutionary simulations.19ollowing recent experimental examples [2, 3], we examined the ﬁrst ﬁve substitu-tions in a simulation and measured the ﬁtness eﬀect of each of these substitutionswith all combinations of the allele states at the other four sites. The ﬁtness of each ofthe sixteen genetic backgrounds is taken as the independent variable, and the ﬁtnesseﬀect of the focal substitution as the dependent variable; a separate regression wasperformed for each of the ﬁve sites, though the resulting ﬁve correlation coeﬃcientsare not independent. These analyses were contrasted with a second type of regres-sion, in which the ranks of epistatic deviations of each successive substitution werecompared to their order of substitution. If we consider the ﬁrst ﬁve substitutions,then there are four epistatic deviations: e = W − W W (2) e = W − W W W (3) e = W − W W W W (4) e = W − W W W W W (5)where W , for example, represents the ﬁtness of the genotype with the ﬁrst twosubstitutions, divided by the ﬁtness of the ancestor. Because these regressions arebased on four points, we use this analysis only to classify epistasis along a walk; forexample, we highlight the signiﬁcance of walks in which e > e > e > e as examples where epistasis consistently leads to greater than expected ﬁtness.In both cases, we ignore simulations with little epistasis among the ﬁrst ﬁve20ubstitutions in order to avoid spurious correlations; our ﬁltering removed about20% of simulations. Because we remove walks without regard to the direction ofepistatic eﬀects, this removal should not bias any results.New landscapes were generated for each replicate simulation. In both types ofpopulations, a simulation began from a randomly drawn genotype with ﬁtness in acertain range, usually close to the 50th percentile (that is, within 0.002 of the desiredstarting ﬁtness). We also performed Wright-Fisher simulations of evolving RNA populations usingthe Vienna RNA folding package, version 1.8.5, with default folding parameters.RNA sequences of 72 bases in length constituted the genotypes, and the predictedminimum-free-energy structures determined the phenotypes. Fitness of an RNAgenotype was calculated as (1 + s ) − d where d denotes the tree edit distance be-tween the RNA’s phenotype and a deﬁned optimal phenotype. Here s quantiﬁesthe strength of selection and is equivalent to the multiplicative selective coeﬃcientassociated with a mutation that changes d by a single unit; s = 0 .

01 in the resultsshown. The tree edit distance algorithm, included in the Vienna package, determinesthe minimum number of steps from a group of edit operations that are needed totransform one structure into another. The initial genotype was drawn randomly,and the optimum phenotype, used to impose directional selection, was also createdby randomly drawing genotypes and discarding those whose minimum free energystructure is the trivial, unfolded state. This optimum was also required to be 4021nits from the phenotype of the initial genotype so that the pressure to adapt wasstrong and uniform across replicates. Simulations were run for 50,000 generations.Substitutions in RNA simulations were determined by tracing a line of descent,as described above for NK simulations. Because our goal was to study adaptationamong beneﬁcial mutations, we ﬁltered the resulting records of substitutions byignoring adjacent pairs on the line of descent when both members of the pair wereneutral or deleterious on the background in which they originally ﬁxed. In practice,less than 1% of pairs were excluded by this rule, so our results are not sensitive tothis criterion.

References [1] Blount ZD, Borland CZ, Lenski RE (2008) Historical contingency and the evo-lution of a key innovation in an experimental population of

Escherichia coli . Proceedings of the National Academy of Sciences of the United States of Amer-ica

Science

Es-cherichia coli population.

Science

Chembiochem

Proceedings of the National Academy of Sciences ofthe United States of America

Nature

Plos Genetics

Science

Nature

Proceedings of the National Academy of Sciences of theUnited States of America

Science

Plos Genetics

Plos Pathogens

Plos One

Escherichia coli . Nature Genetics

Plos Genetics

The origins of order: self-organization and selection inevolution (Oxford University Press, New York).[21] Whitlock M, Phillips P, Moore F, Tonsor S (1995) Multiple ﬁtness peaks andepistasis.

Annual Review of Ecology and Systematics

Evolution

PLoS Comput Biol

Theoretical Population Biology

Pro-ceedings of the National Academy of Sciences of the United States of America

Chaos

Plos Computational Biology

Proceedings of the Royal Society B-Biological Sciences

Genetics

The origins of evolutionary innovations: a theory of transfor-mative change in living systems (Oxford University Press, Oxford).[31] Kauﬀman S, Levin S (1987) Towards a general theory of adaptive walks onrugged landscapes.

Journal of Theoretical Biology

Journal of TheoreticalBiology

Science

Evolution

Evolution . . . . . K F r a c t i on E p i s t a t i c a) . . . . . Step Number F r a c t i on E p i s t a t i c K = 1

K = 2

K = 3

K = 4

K = 5 b) Figure 1: a) The overall amount of epistasis along an adaptive walk in an NK land-scape roughly agrees with its random expectation. The dots show the frequency ofepistatic interactions between substitution i and its immediate successor j , averagedacross all steps in the adaptive walk. The solid line depicts the predicted incidenceof epistasis, if sites are chosen to substitute randomly (see supplement). N = 20 and A = 2. Standard errors are less than 0.001. b) The frequency of epistasis betweensubsequent substitutions is depressed compared to the random expectations, earlyin adaptation, and augmented late in adaptation. Dots show the frequencies of epis-tasis between substitution i and its immediate successor j , indexed by the positionof substitution i along the adaptive walk. Lines indicate the corresponding randomexpectation. Standard errors are less than 0.01 for all plotted means.27 fter substitution 1 Selective Coefficent D en s i t y -0.2 -0.1 Epistatic

Non-interacting

After substitution 2

Selective Coefficent D en s i t y -0.2 -0.1 After substitution 3

Selective Coefficent D en s i t y -0.2 -0.1 After substitution 4

Selective Coefficent D en s i t y -0.2 -0.1 After substitution 5

Selective Coefficent D en s i t y -0.2 -0.1 After substitution 6

Selective Coefficent D en s i t y -0.2 -0.1 After substitution 7

Selective Coefficent D en s i t y -0.2 -0.1 After substitution 8

Selective Coefficent D en s i t y -0.3 -0.2 -0.1 After substitution 9

Selective Coefficent D en s i t y -0.3 -0.2 -0.1 Figure 2: Mutations at sites that interact with recent substitutions are less favorableearly in evolution and more favorable late, in comparison to non-interacting sites.Histograms depict the frequencies of the selective coeﬃcients of mutations acrossmany replicate adaptive walks with parameters N = 20, K = 1 and A = 2. After anadaptive substitution occurs at site i , all ( A − N −

1) other possible mutations areinspected and classiﬁed by whether their ﬁtness eﬀects depend on site i epistatically,or are independent of it. As the adaptive walk proceeds, the selection coeﬃcientsof available mutations shift towards negative values in general. At the same time,among the adaptive mutations available late in evolution, the great majority of theminteract with a recent substitution. Thus, epistasis tends to retard the substitution ofinteracting sites early in evolution, and promote such substitutions late in evolution.28 tep Number F r equen cy o f T y pe o f E p i s t a s i s . . . . . . . . . . . . Synergy

Antagonism

Sign a) Step Number F r equen cy o f T y pe o f E p i s t a s i s . . . . . . . . . . . . b) Figure 3: Prevalence of synergy, antagonism, and sign epistasis among pairs of con-secutive substitutions ( i, j ) along adaptive walks on

N K landscapes. Such walks arecharacterized by an abundance of antagonism early in adaptation, and an abundanceof sign epistasis late in adaptation. a) K = 1, b) K = 5. N = 20 and A = 2.29 . . . . . . . Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i a) . . . . Step Number F r equen cy o f R e l a t i on s h i p b) i depends on jj depends on i Indirect . . . Figure 4: Observed frequency of directional epistatic interactions between substitu-tion i and its immediate successor j along adaptive walks on N K landscapes. a) K = 1; b) K = 5. A single pair may be counted as more than one type of epistasis,as in the case of reciprocal interactions. The dashed lines depict the predicted inci-dences if substitutions are chosen randomly. For K >

1, another class of epistasisis possible: both i and j may jointly inﬂuence a third site, which we refer to as anindirect interaction. N = 20 and A = 2. Standard errors are less than 0.01 for allplotted means. 30 ean Spearman's rho -0.5 a) Substitutions P r opo r t i on B e tt e r T han E x pe c t ed . . . . . . b) Figure 5: Two views of epistasis among the ﬁrst ﬁve substitutions in an adaptivewalk. a) Spearman regression coeﬃcients for the relationships between the ﬁtnesseﬀect of a substitution and the ﬁtness of the genetic background in which that sub-stitution is made [2, 3]. Each substitution is tested against the sixteen genetic back-grounds comprising all combinations of the other four substitutions. b) Proportionof combinations of substitutions along the line of descent which exceed the ﬁtnessexpected from the ﬁtness eﬀects of the individual mutations in the ancestor. Ex-pectations are computed according to Equations 7-10. Standard errors are less than1%. In both ﬁgures, replicates are ﬁltered to remove adaptive walks with too littleepistasis among substitutions (see methods) – data shown in panel (b) are addition-ally ﬁltered to remove cases where W = W W /W . The analysis on the left panelwould suggest that epistasis is primarily antagonistic, whereas in fact the right panelshows that synergistic interactions dominate along the walk. N = 20, K = 5 and A = 2. 31 Appendix

The simplicity of the NK model allows the probabilities of diﬀerent epistatic relation-ships between substituting sites to be calculated, assuming that sites are chosen tosubstitute at random. For distinct loci i and j , the random (neutral) walk prevalencesare given by: p N,u = KN − p N,d = KN − p N,i = 1 − (cid:18) − KN − K − N − (cid:19) N − (8) p N,m = (1 − p N,u )(1 − p N,d )(1 − p N,i )= (cid:18) − KN − (cid:19) (cid:18) − KN − K − N − (cid:19) N − (9) Our results focus on two main types of epistasis: a site may be upstream or down-stream of another site. While both types of inﬂuence create an epistatic interactionbetween the ﬁtness eﬀects of changes at both sites, simulation results reveal a pref-erence for interactions where the ﬁrst site to substitute, i , is upstream of j . Here wetry to quantify both hypothesized eﬀects of epistasis – the detrimental undermin-32ng of ﬁxed beneﬁcial eﬀects, and the positive potential for epistasis to uncover newdirections for adaptation – by calculating conditional probabilities in the NK model.The ﬁrst goal is to calculate the probability that a site j with alleles b and B iscapable of a beneﬁcial substitution after a beneﬁcial change at site i from allele a toallele A , assuming epistasis between i and j . We consider two ways in which sites i and j can interact: i is upstream of j , and so i determines the eﬀect of substitution at j , or i may be downstream of j ; we will ignore the case where each site may mutually inﬂuence one another. In both cases, we want to solve: P { W AB > W Ab } = P { w A ( B ) w B ( A ) > w A ( b ) w b ( A ) } (10)where the notation w A ( b ) indicates the ﬁtness eﬀect of allele b at site j when site i has allele A , and W AB represents the total ﬁtness contribution of both sites whensite i has allele A and site j has allele B . Because ﬁtness is a product across sites, wedeﬁne W AB = w A ( B ) w B ( A ). While ﬁtness is technically the Nth root of the productacross all N sites, our results below depend only on the ranks of ﬁtness contributions;we therefore ignore this complication.These calculations, performed in detail below, demonstrate that the chance for asite to be evolvable – selectively favored to substitute – is diminished if that site’sepistatic partner has beneﬁcially substituted, and that this reduced probability iscaused by the fact that such a substitution will disrupt the genetic background thatwas favorable to the prior substitution. Strikingly, these costs to epistasis seem tobe consistent between the two types of epistasis, despite the observed preference forinteractions where i is upstream of j (Figure 3). While these two kinds of epistasis are33qually likely to undermine the beneﬁt of previous substitutions, and so have a netdeleterious eﬀect, they diﬀer in a more subtle aspect of evolvability. Natural selectionshould exhaust the set of potentially beneﬁcial changes and so late in adaptation weexpect evolvable sites to be rare. To simulate late adaptation, we could assume that w a ( B ) w B ( a ) < w a ( b ) w b ( a ) – that is, that site j was not evolvable prior to change atsite i . Does this assumption reveal a diﬀerence between the two classes of epistasis?Simple numerics demonstrate that this additional assumption favors substitutionat j when j is downstream of i : 35.8% ± i , as opposed to only 14.4% ± j is upstream of i . Whilethese probabilities could be derived as above, a more intuitive approach is to considerthe eﬀect of each assumption of the ﬁtness eﬀect of a substitution at site j , aftersubstitution at site i . This eﬀect is related to the ratio: w B ( A ) w A ( B ) w b ( A ) w A ( b )In either model, our expectation of the value of the denominator is increased bythe knowledge that w a ( B ) w B ( a ) < w a ( b ) w b ( a ) < w A ( b ) w b ( A ). When j is downstreamof i , this increase is partially mitigated by two factors: w B ( A ) = w b ( A ), and theremaining term in the numerator, w A ( B ), is an independent draw. However, when j is upstream of i , then w B ( A ) is an independent draw, and w A ( B ) is expected to besmaller than an independent draw, since it equals w a ( B ). This asymmetry betweenthe evolutionary acceptability of two directions of epistatic inﬂuence is thereforecaused by another asymmetry: the high likelihood that sites are unevolvable later inan adaptive process. 34 .3 Site i is upstream of j When j depends on i , but i is independent of j , then we have w b ( A ) = w B ( A ) and w b ( a ) = w B ( a ). Eq. 10 therefore reduces to: P { W AB > W Ab } = P { w A ( B ) w B ( A ) > w A ( b ) w b ( A ) } = P { w A ( B ) > w A ( b ) } Because the beneﬁcial substitution a → A occurred on background b , and weassume only beneﬁcial substitutions, w b ( A ) w A ( b ) > w b ( a ) w a ( b ). We want the distri-bution of w A ( b ), given that: w A ( b ) > w b ( a ) w a ( b ) w b ( A )All four of these variables are standard random uniform variates, and we assumethat they are independent. Let x = w b ( a ), y = w a ( b ), z = xy , and v = w b ( A ). Wecan therefore solve for the convolution z = xy , then for the quotient distribution zv .We can ﬁnd the distribution of z by constructing the cumulative distributionfunction: F xy ( z ) = z + (cid:90) z zx dx = z (1 + lnx | z ) = z (1 − lnz )Then we diﬀerentiate to obtain the p.d.f.: f xy ( z ) = − ln ( z )35ext, we obtain the convolution of z over v by a similar construction. Let q = zv ,and note that q ranges from zero to inﬁnity. A simple geometric approach revealsthat q is piecewise at 1. When q ≤

1, we can deﬁne the c.d.f.: F z/v ( q | q ≤

1) = (cid:90) q − ln ( z ) (cid:18) − zq (cid:19) dzf z/v ( q | q ≤

1) = 14 (3 − ln ( q ) + 1))and when z > F z/v ( q | q >

1) = (cid:90) − ln ( z ) (cid:18) − zq (cid:19) dzf z/v ( q | q >

1) = 14 q However, our goal is to ﬁnd the distribution of w A ( b ) given that it is greater than q , and since 0 ≤ w A ( b ) ≤

1, this latter piece of f ( q ) is not needed. f ( x = w A ( b )) = (cid:82) x − ln ( q ) + 1) dq (cid:82) dx (cid:82) x − ln ( q ) + 1) dq = x (3 − ln ( x ))2Focal site j will be beneﬁcial if w A ( B ) > w A ( b ), which we can now calculate as:36 { w A ( B ) > w A ( b ) } = (cid:82) dx (cid:82) x y (3 − ln ( y )) dy (cid:82) dx (cid:82) y (3 − ln ( y )) dy = 718 i is downstream of j If i depends on j , but j is independent of i , then w a ( b ) = w A ( b ) and w a ( B ) = w A ( B ).The condition w A ( b ) w b ( A ) > w a ( b ) w b ( a ) therefore simpliﬁes to w b ( A ) > w b ( a ). Let x = w b ( A ) and y = w b ( a ) and assume that these are independent uniform randomvariates. p ( x | x > y ) = 2 (cid:90) x dx = 2 x Now we let v = w A ( b ) and ﬁnd the convolution z = xv | x > y by ﬁrst deriving ac.d.f. F xv | x>y ( z ) = 2 (cid:90) z xdx + 2 (cid:90) z x zx dx = z + 2 z − z = 2 z − z f xv | x>y ( z ) = 2 − z Now, we want to assess the probability that w = w A ( B ) w B ( A ), a pair of inde-pendent random uniform variates, is greater than z . Using the result above that p ( w ) = − ln ( w ), we can write this as a ratio of integrals:37 { w A ( B ) w B ( A ) > w A ( b ) w b ( A ) } = (cid:82) − ln ( w ) dw (cid:82) w − zdz (cid:82) − ln ( w ) dw (cid:82) − zdz = 71838 .5 Supplemental Figures Figure 6: Relation of the beneﬁcial ﬁtness eﬀect of substitution i to the change inthe ﬁtness eﬀect conferred by a mutation at interacting site j . Data are drawn fromthe ﬁrst step in adaptive walks. a) All types of interactions; b) Interactions in whichsite i is downstream of site j ; c) Interactions in which i is upstream of j . N = 20, K = 1 and A = 2. 39 . . . . . . . K = 1

Step Number O u t D eg r ee K = 5

Step Number O u t D eg r ee Figure 7: Mean out-degree (number of loci which depend epistatically on the focalsite) of substituted sites. Each plotted point is based on at least 5000 adaptive walks. N = 20 and A = 2. 40 . . . . . . . A = 2

Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i . . . . . . . A = 4

Step Number F r equen cy o f R e l a t i on s h i p . . . . . . . A = 8

Step Number F r equen cy o f R e l a t i on s h i p . . . . . . A = 16

Step Number F r equen cy o f R e l a t i on s h i p Figure 8: Observed frequency of directional epistatic interactions between substi-tution i and its immediate successor j for adaptive walks on N K landscapes withdiﬀerent values of A , the number of alleles per locus. A single pair may be countedas more than one type of epistasis, as in the case of reciprocal interactions. Thedashed lines depicts the predicted incidences if sites are chosen uniformly at random,as calculated in Eqs. 2&3. N = 20 and K = 1. Each plotted point is based on atleast 5000 replicates. 41 . . . . . Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i . . . . Step Number F r equen cy o f R e l a t i on s h i p . . . . . Step Number F r equen cy o f R e l a t i on s h i p . . . . . . Step Number F r equen cy o f R e l a t i on s h i p Figure 9: Observed frequency of directional epistatic interactions between substi-tution i and its immediate successor j for adaptive walks on N K landscapes fordiﬀerent values of the ﬁtness percentile of the starting genotype. A single pair maybe counted as more than one type of epistasis, as in the case of reciprocal interac-tions. The dashed lines depicts the predicted incidences if sites are chosen uniformlyat random, as calculated in Eqs. 2&3. N = 20 and K = 1. Each plotted point isbased on at least 5000 replicates. 42 . . . . . . Theta = 1

Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i . . . . . . . . . . . Theta = 10

Step Number F r equen cy o f R e l a t i on s h i p . . . . . Figure 10: Observed frequency of directional epistatic interactions between substitu-tion i and its immediate successor j for individual-based simulations with populationsizes of 1000 and per-genome mutation rates of 0.001 and 0.01. A single pair may becounted as more than one type of epistasis, as in the case of reciprocal interactions.For K >

1, another class of epistasis is possible: both i and j may jointly inﬂuence athird site, which we refer to as an indirect interaction. The dashed lines depicts thepredicted incidences if sites are chosen uniformly at random, as calculated in Eqs.2-5. Red lines and corresponding axes depict the mean change in ﬁtness. N = 20and K = 1. 43 . . . . . . . Theta = 1

Step Number F r equen cy o f R e l a t i on s h i p i depends on jj depends on i Indirect . . . . . . . . . . . . . Theta = 10

Step Number F r equen cy o f R e l a t i on s h i p . . . . . Figure 11: Observed frequency of directional epistatic interactions between substitu-tion i and its immediate successor j for individual-based simulations with populationsizes of 1000 and per-genome mutation rates of 0.001 and 0.01. A single pair may becounted as more than one type of epistasis, as in the case of reciprocal interactions.For K >

1, another class of epistasis is possible: both i and j may jointly inﬂuence athird site, which we refer to as an indirect interaction. The dashed lines depicts thepredicted incidences if sites are chosen uniformly at random, as calculated in Eqs.2-5. N = 20 and K = 5. 44 . . . . . . Step Number (Index of i) F r equen cy o f E p i s t a s i s be t w een i and j Step Number T y pe o f E p i s t a s i s . . . . . . . . . . . . NeutralSynergyAntagonismSign

Figure 12: Epistasis in individual-based evolution in a model of RNA folding. a)Observed frequency of epistatic interactions between substitution i and its immedi-ate successor j . Error bars indicate conﬁdence intervals. b) Prevalence of four typesof epistasis among those pairs of substitutions that interact. For both ﬁgures, pop-ulation of 5000 individuals were simulated for 50,000 generations at a per-genomemutation rate of 2 × − . 45 ean Spearman's rho -0.6 -0.4 -0.2 Figure 13: Spearman rank regression coeﬃcients for the relationships between theﬁtness eﬀect of a substitution and the ﬁtness of the genetic background in which thatsubstitution is made for adaptive walks which exceed multiplicative expectationsat each step. Each substitution is tested against the sixteen genetic backgroundscomprising all combinations of the other four substitutions. Replicates are ﬁlteredto remove walks with too little epistasis among substitutions (see main text). N = 20, K = 5 and A = 2. 46 ean Spearman's rho -0.8 -0.6 -0.4 -0.2 a) Substitutions P r opo r t i on B e tt e r T han E x pe c t ed . . . . . . b) Figure 14: Two views of epistasis among the ﬁrst ﬁve substitutions on randomwalks. a) Spearman rank regression coeﬃcients for the relationships between theﬁtness eﬀect of a substitution and the ﬁtness of the genetic background in whichthat substitution is made. Each substitution is tested against the sixteen geneticbackgrounds comprising all combinations of the other four substitutions. b) Pro-portion of combinations of substitutions along the line of descent which exceed theﬁtness expected from the ﬁtness eﬀects of the individual mutations in the ancestor.Expectations are computed according to Equations 7-10. Standard errors are lessthan 1%. In both ﬁgures, replicates are ﬁltered to remove walks with too little epis-tasis among substitutions (see main text) – data show in panel (b) are additionallyﬁltered to remove cases where W = W W /W . N = 20, K = 5 and A = 2.47 - . . . . Substitution Number O b s e r v ed - E x pe c t ed F i t ne ss a) - . . . . Substitution Number O b s e r v ed - E x pe c t ed F i t ne ss b) Figure 15: Epistasis deviation is initially negative, then becomes positive along thesequence of the ﬁrst ﬁve substitutions observed in a microbial evolution experiment(Khan et al. 2011). The substituted loci are labeled r , t , s , g , and p respectively.Let w r be the ﬁtness of the mutant r allele relative to the ancestor, w rt the ﬁtnessof the double mutant, and so on. Expected ﬁtness can be calculated in at leasttwo ways, each of which shows the same qualitative pattern. a) Expected ﬁtness iscomputed as the product of the eﬀects of each mutation in the ancestral genotype.For example, the deviation for substitution 3 is calculated as w rts w rt − w s . b) Expectedﬁtness is computed with reference to the eﬀect of the focal allele in the previousgenetic background. For example, the deviation for substitution 3 is here calculatedas w rts w rt − w rs w rr