Antibiotic resistance landscapes: a quantification of theory-data incompatibility for fitness landscapes
Kristina Crona, Dayonna Patterson, Kelly Stack, Devin Greene, Christiane Goulart, Mentar Mahmudi, Stephen D. Jacobs, Marcelo Kallman, Miriam Barlow
ABSTRACT. Fitness landscapes are central in analyzing evolution, in particular for drug resistance mutations for bacteria and viruses. We show that the fitness landscapes associated with antibiotic resistance are not compatible with any of the classical models: additive, uncorrelated and block fitness landscapes. The NK model is also discussed.

It is frequently stated that virtually nothing is known about fitness landscapes in nature. We demonstrate that available records of antimicrobial drug mutations can reveal interesting properties of fitness landscapes in general. We apply the methods to analyze the TEM family of β-lactamases associated with antibiotic resistance. Laboratory results agree with our observations. The qualitative tools we suggest are well suited for comparisons of empirical fitness landscapes. Fitness landscapes are central in the theory of recombination and there is a potential for finding relations between the tools and recombination strategies.
1. BACKGROUND
The fitness landscape was introduced as a metaphor for adaptation. Informally, the surface of the landscape consists of genotypes, where similar genotypes are close to each other, and the fitness of a genotype is represented as a height coordinate. Adaptation can then be pictured as an uphill walk in the fitness landscape (Wright, 1931). It is frequently claimed that we know virtually nothing about fitness landscapes in nature. The scarcity of fitness measurements, along with the difficulty of measuring fitness, is cited as the reason. The purpose of this work is to demonstrate that available records of drug resistance mutations can reveal interesting properties of the underlying fitness landscapes. We suggest qualitative tools that are easy to apply and interpret in order to learn properties of fitness landscapes from data. The setting we have in mind is a record of clinically found antimicrobial drug resistance mutations, where there is a well defined wild-type and several mutant variants with some degree of drug resistance. Other cases of adaptation could work as well.

There are several advantages of mutation records as a source of information. The records are already available. The quantity of data is substantial and growing, and the quality tends to be high since the data are of medical importance. Moreover, the data reflect nature, whereas laboratory data sometimes disagree with clinical observations; see our discussion of the TEM family below. However, the most important reason is that these data reflect adaptation. In contrast, empirical studies where fitness is measured often consider combinations of deleterious mutations. The majority of such combinations are probably exceedingly rare in nature. If one is interested in adaptation, one needs knowledge about beneficial mutations as well. For an overview of recent empirical work where fitness is measured, see e.g. Szendro et al. (2012); Weinreich et al.
(2005); Carneiro and Hartl (2010) and references therein. Most existing studies concern few loci (4 or 5) and a specific selective environment. For studies with many loci, see e.g. Kouyos et al. (2012); Schenk et al. (2012), and for a case where fitness ranks for the same genotypes are determined in several different selective environments, see Goulart et al. (2013).

We apply our results to the TEM family of beta-lactamases associated with antibiotic resistance. TEM stands for Temoneira, the name of the patient from whom the enzyme was first isolated. TEM beta-lactamases have been found in
Escherichia coli, Klebsiella pneumoniae and other gram-negative bacteria. TEM-1 is considered the wild-type. The length of TEM-1 is 287, i.e., TEM-1 can be represented as a sequence of 287 letters in the 20-letter alphabet corresponding to the amino acids. Over 170 TEM variants have been found clinically, where 41 are single mutants, i.e., they have exactly one amino acid substitution, and the majority (90%) have at most 4 amino acid substitutions. We use the record of the TEM family from the Lahey Clinic.

The quality of the TEM record (from now on we will simply refer to "the TEM record") is assumed to be high. The TEM record represents a case of multiple environments, but probably not an excessive number of completely different environments. Several antibiotics have similar effects. We have good reason to believe that the TEM record is fairly complete, and that there is not an abundance of neutral mutations.

As a complement, we use laboratory results (Goulart et al., 2013). Our study determines whether the TEM data are compatible with classical models of fitness landscapes. More precisely, we compare with additive, uncorrelated and block models of fitness landscapes. The NK model is also discussed.

Throughout our article, our focus is on the extent to which beneficial mutations combine well. We start with a brief description of our approach in the context of the TEM family. Using standard notation, TEM-2 is a single mutant with the mutation Q39K, which means that the amino acid denoted "Q" (glutamine) at position 39 of the wild-type is substituted by the amino acid denoted "K" (lysine). TEM-174 is the single mutant A213V. One can ask if the double mutant with substitutions Q39K and A213V confers antibiotic resistance, since the double mutant combines two resistance mutations.
However, the double mutant does not occur in the record.

Roughly, we compare the candidates for double mutants, such as the one described, with double mutants that do occur in the record, and consider the patterns for how candidates occur or are absent in the record. This approach is motivated by an evolutionary perspective. Provided that the quality of a record of resistance mutations is good, most single mutants are more fit, i.e., confer more drug resistance than the wild-type in some environment. Likewise, if a double mutant occurs in a mutation record, it is plausible that the double mutant is more fit than at least one of the corresponding single mutants in some environment.
Put briefly, we consider if "good+good=better" for mutations. The goal is to capture the relation between fitness landscapes and mutation records. One reason for being interested in mutation records is that laboratory results do not always reflect clinical facts. A striking example is that the triple mutant of the TEM family with substitutions A42G, E104K, and G238S confers a high degree of cefotaxime resistance according to laboratory results (Weinreich et al., 2006), but this mutant has never been observed clinically. Examples where single mutants not found outside of the laboratory confer a high degree of cefotaxime resistance are given in Schenk et al. (2012).

It may seem surprising that qualitative information can be useful for analyzing fitness landscapes. However, we will show that predictions from some classical models relate well to qualitative information. Our approach could be useful for comparisons of empirical landscapes, and there is a potential for relating the information derived directly to recombination strategies.

We define fitness as the expected reproductive success, and use the convention that the wild-type has fitness 1. Fitness is called additive if the fitness effects of mutations sum. Consider a biallelic two-loci system. Suppose that the genotype ab has fitness 1, the genotype Ab has fitness 1.03 and the genotype aB has fitness 1.01. If fitness is additive, then the genotype AB has fitness

$1.04 = 1 + 0.01 + 0.03.$

(In the literature non-epistatic fitness is sometimes defined as multiplicative, so that the double mutant would have fitness 1.0403. If the fitness effects and the number of beneficial mutations are small, there is not much difference between the definitions.) Values greater than 1.04 imply positive epistasis. Values smaller than 1.04 imply negative epistasis.

Sign epistasis means that a particular mutation is beneficial or deleterious depending on genetic background. For example, if ab, Ab and aB have fitness values as above (1, 1.03, 1.01), but AB has fitness 1.02, then there is sign epistasis. Indeed, in this case the mutation B is beneficial for the ab-genotype, and deleterious for the Ab-genotype.

The concept of a fitness landscape has been formalized in different ways. A genotype may be represented as a string in the 20, 4 or 2 letter alphabet, depending on whether one considers the amino acids, the base pairs or biallelic systems. Throughout the paper, we will consider amino acids.

Let $\Sigma$ denote the 20-letter alphabet. The genotype space $\Sigma^L$ consists of all strings of length $L$. A fitness landscape $w : \Sigma^L \mapsto \mathbb{R}$ assigns a fitness value to each genotype. The fitness of a genotype $g$ is denoted $w_g$. If two genotypes differ by a single mutation, they are mutational neighbors.

Remark 1.1. Following the Orr-Gillespie approach we assume that the wild-type has very high fitness also in the new environment as compared to a randomly generated genotype. Consequently, only a small number of mutations of the wild-type are beneficial.

The paper is structured as follows. In Section 1.1 we briefly review classical models of fitness landscapes. Section 2 provides basic observations of the TEM record. Section 3 concerns additive and uncorrelated fitness, and Section 4 block models. For all models, we compare with the TEM record, and in Section 5 a laboratory study of TEM alleles is used as a complement to the record.

1.1. Classical models of fitness landscapes.
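Before turning to the models, the definitions of additive fitness and of positive, negative and sign epistasis given in the background section can be made concrete with a small sketch. It is only an illustration for a biallelic two-loci system; the function names and the tolerance `tol` are assumptions introduced here, not part of the paper.

```python
def additive_expectation(w_wt, w_a, w_b):
    """Double-mutant fitness if the effects of the two mutations sum."""
    return w_wt + (w_a - w_wt) + (w_b - w_wt)


def multiplicative_expectation(w_wt, w_a, w_b):
    """Double-mutant fitness under the multiplicative convention."""
    return w_a * w_b / w_wt


def classify_epistasis(w_wt, w_a, w_b, w_ab, tol=1e-9):
    """Return (kind, sign_epistasis) relative to the additive expectation.

    kind is "none", "positive" or "negative"; sign_epistasis is True if
    one of the mutations changes from beneficial to deleterious (or the
    reverse) depending on the genetic background.
    """
    expected = additive_expectation(w_wt, w_a, w_b)
    if abs(w_ab - expected) <= tol:
        kind = "none"
    elif w_ab > expected:
        kind = "positive"
    else:
        kind = "negative"
    # Effect of each mutation on the wild-type and on the other single mutant.
    effect_a_on_wt, effect_a_on_b = w_a - w_wt, w_ab - w_b
    effect_b_on_wt, effect_b_on_a = w_b - w_wt, w_ab - w_a
    sign = (effect_a_on_wt * effect_a_on_b < 0) or (effect_b_on_wt * effect_b_on_a < 0)
    return kind, sign


if __name__ == "__main__":
    # Values from the text: ab = 1, Ab = 1.03, aB = 1.01.
    print(additive_expectation(1.0, 1.03, 1.01))
    print(multiplicative_expectation(1.0, 1.03, 1.01))
    print(classify_epistasis(1.0, 1.03, 1.01, 1.02))
```

With the values from the text, a double mutant of fitness 1.02 falls below the additive expectation, and the mutation B is beneficial on the ab background but deleterious on the Ab background, so the sketch reports negative epistasis together with sign epistasis.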
Additive fitness landscapes, uncorrelated fitness landscapes, the block model and Kauffman's NK model have had a broad influence in evolutionary biology. We will give a brief overview of the four classical models.

Additive fitness landscapes, or non-epistatic landscapes, have been defined above. An additive fitness landscape is single peaked.

In contrast, for an uncorrelated (also called random, rugged or House of Cards [HOC]) fitness landscape, there is no correlation between the fitness of a genotype and the fitness of its mutational neighbors, i.e., alleles that differ by one substitution only.

Consider an uncorrelated landscape where, say, a small proportion of the single mutants are more fit than the wild-type. It follows that for double mutants corresponding to two beneficial single mutations, approximately the same proportion are more fit than the wild-type as well. In other words, by Remark 1.1, beneficial mutations usually do not combine well for an uncorrelated fitness landscape.

Uncorrelated fitness and additivity can be considered as two extremes with regard to the amount of structure in the fitness landscape, and most fitness landscapes fall between the extremes. Uncorrelated fitness has been studied extensively in the literature (see e.g. Kingman, 1978; Kauffman and Levin, 1987; Flyvbjerg and Lautrup, 1992; Rokyta et al., 2006; Park and Krug, 2008).

For the block model (see e.g. Macken and Perelson, 1989; Orr, 2006) the string representing a genotype can be subdivided into blocks, where each block makes an independent contribution to the fitness of the string. Each block has uncorrelated fitness, and the fitness of the string is the sum of contributions from each block. In particular, a block model consisting of one block only is an uncorrelated fitness landscape. The rationale behind this model is that if two blocks have completely different functions, then the effects of two changes in different blocks should be independent.

Kauffman's NK model (see e.g.
Kauffman and Weinberger (1989)) is defined so that the epistatic effects are random, whereas the fitness of a genotype is the average of the "contributions" from each locus.

More precisely, for the NK model the genotypes have length $N$ (in our notation $L = N$), and the parameter $K$, where $0 \le K \le N - 1$, reflects interactions between loci. The fitness contribution $\phi_i$ from the locus $i$ is determined by its state $g_i$ and the states at $K$ other loci $i_1, \dots, i_K$. The key assumption is that this contribution is assigned at random from some probability distribution. The fitness of a genotype $g$ is the average of the contributions $\phi_i$, so that

$w_g = \frac{1}{N} \sum_{i=1}^{N} \phi_i(g_i, g_{i_1}, \dots, g_{i_K}),$

where $\{i_1, \dots, i_K\} \subset \{1, \dots, i-1, i+1, \dots, N\}$. Several important properties of NK landscapes depend mainly on $N$ and $K$, rather than the exact structure of the epistatic interactions. The fact that the fitness of the genotype is the average of these $N$ contributions means that fitness effects of non-interacting mutations sum. Notice that the fitness landscape is additive for $K = 0$ and uncorrelated for $K = N - 1$. The popularity of the NK model rests on the fact that the model is "tunably rugged". This expression means that the ruggedness is expected to increase with $K$, from the single peaked additive landscape for $K = 0$ to the uncorrelated landscape with a maximal number of peaks for $K = N - 1$. Published results on the NK model of (potential) relevance to evolutionary biology concern the number of peaks, the length of mutational trajectories, fitness distributions of genotypes and fitness trajectories.

Notice that the block model also includes additive landscapes and uncorrelated landscapes as special cases. More importantly, NK models and block models are similar in that there is a sharp division between effects which are completely random and effects which are additive.
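The NK construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: alleles are binary (0/1) rather than amino acids, the interaction partners of each locus are chosen uniformly at random, and the contributions $\phi_i$ are drawn from the uniform distribution on [0, 1). The names `make_nk_landscape` and `count_peaks` are hypothetical.

```python
import itertools
import random


def make_nk_landscape(n, k, rng):
    """Return a fitness function w(g) for binary genotypes g of length n.

    Each locus i gets k randomly chosen partner loci; its contribution
    phi_i depends on the states of locus i and of its partners.
    """
    partners = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One lookup table per locus, filled lazily with random contributions,
    # so that repeated evaluations of the same genotype agree.
    tables = [dict() for _ in range(n)]

    def fitness(g):
        total = 0.0
        for i in range(n):
            key = (g[i],) + tuple(g[j] for j in partners[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n  # average of the N contributions

    return fitness


def count_peaks(n, fitness):
    """Count genotypes that are strictly fitter than all mutational neighbors."""
    peaks = 0
    for g in itertools.product((0, 1), repeat=n):
        w = fitness(g)
        neighbors = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(n))
        if all(fitness(h) < w for h in neighbors):
            peaks += 1
    return peaks


if __name__ == "__main__":
    n = 6
    for k in (0, 2, n - 1):
        w = make_nk_landscape(n, k, random.Random(0))
        print(k, count_peaks(n, w))
```

For K = 0 each locus contributes independently, so the landscape is additive and has a single peak; larger K values typically produce more peaks, which is the "tunable ruggedness" referred to in the text.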
One should keep in mind that the block models and Kauffman's NK model are equipped with very special structures. In order to provide some intuition for how empirical landscapes relate to the models, we will consider examples.

Example 1.2.
w = 1 , w = w = w = 1 . , w = 1 . , w = w = 1 . , w = 1 .
For every locus, replacing 0 by 1 increases fitness by . . Fitness is additive and the genotype 111 is at a peak.

Example 1.3.
w = 1 , w = w = w = w = 1 . ,
w = w = w = 1 . , w = 1 . ,
w = 1 . , w = w = w = 1 . ,
w = w = w = 1 . , w = 1 . .
For every locus, replacing 0 by 1 increases fitness. For the first locus the increase is . , regardless of background. For the other loci, the magnitude of the difference depends on the background. For instance, the magnitude is 0.02 for the change from 0000 to 0001, 0.01 for the change from 0001 to 0011, and 0.005 for the change from 0011 to 0111.

Fitness is obviously not additive in the second example, since the fitness values of the double and triple mutants are below linear expectations based on the wild-type and the single mutants. The landscape deviates from expectations for an uncorrelated landscape as well, since replacing 0 by 1 always gives higher fitness. It remains to consider the more general models. As for the block model, the first locus is independent. The remaining three loci interact with each other in a symmetric way. Consequently, the natural candidates for blocks would be one block consisting of the first locus, and another block consisting of the remaining three loci. However, the second block deviates considerably from random expectations. Consequently, the block model is not a good fit.

For the NK model, the independent fitness contribution of the first locus suggests that K = 0. However, the second locus depends on the third and the fourth loci, suggesting that K = 2. (Similar arguments for the third and the fourth loci suggest that K = 2 as well.)
Since the observations suggest different K-values, the NK model does not seem ideal.

Remark 1.4. The NK model allows different interactive patterns. However, it is not expected that some loci are more or less independent, whereas other loci have considerable interactions. Some degree of symmetry is expected, reflecting the K value.

Several other models for fitness landscapes have been suggested, including neutral models; see Szendro et al. (2012) for an empirical perspective. For some approaches to fitness landscapes not related to the models mentioned, see the geometric theory of gene interactions (Beerenwinkel et al., 2007b; Crona, 2013), and the Orr-Gillespie theory (Orr, 2002). Notice also that fitness landscapes have been used in chemistry, physics and computer science, in addition to evolutionary biology. In combinatorial optimization the fitness function is referred to as the cost function. For a survey on combinatorial landscapes in general see Reidys and Stadler (2002).

2. THE QUALITATIVE MEASURE OF ADDITIVITY AND THE TEM RECORD
Throughout the paper, we focus on single and double mutants in a record. The motivation is straightforward. If a single mutant has high fitness, it is likely to be found in nature. However, if a k-tuple mutant is very fit for some large k, the mutant may never be found because of the time span necessary before the substitutions have accumulated. Single and double mutants are likely to appear relatively early in the process of adaptation.

Roughly, we are interested in the proportion of beneficial mutations among all possible single mutations, as well as to what extent beneficial mutations combine well. The information we consider is coarse, and a record of mutations will rarely be perfect.

As indicated, we will work with words in the 20-letter alphabet where there is a well defined wild-type. A single mutant is a genotype resulting from one amino acid substitution. However, amino acid substitutions are not comparable to substitutions of letters in a string. Not every amino acid substitution can occur as the result of a single point mutation. For instance, suppose that the amino acid is Valine at a particular locus, and that the codon is GTT. Then one can obtain exactly 6 single mutants at the locus, corresponding to A, D, F, G, I, L (Alanine, Aspartic acid, Phenylalanine, Glycine, Isoleucine, Leucine). On the other hand, one can obtain exactly 8 single mutants (A, D, E, F, G, I, L, M) starting from Valine if all of its codons are considered (the codons for Valine are GTT, GTC, GTA, GTG). In general, the number of single mutants one can obtain varies depending on the amino acid. Moreover, in some cases, such as for Valine, the number depends on whether one considers a particular codon for the amino acid or all possible codons.

To make matters more complicated, the wild-type allele under consideration may be unique in terms of amino acids, but not in terms of codons. For a precise analysis, one may want to consider codon variations in the wild-type allele.
However, for our purposes it is sufficient to consider amino acids.

We assume that there are approximately 7 possible single point mutations for a given locus, so that the number of mutational neighbors of a wild-type of length N is proportional to N. For the reader's convenience, we included a table of possible single mutants (see Section 7).
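The Valine counts quoted above (6 amino acids reachable from the codon GTT, 8 from all Valine codons) can be checked with a short sketch based on the standard genetic code. The function names are hypothetical, and excluding stop codons from the set of single mutants is an assumption made here for illustration.

```python
# Standard genetic code, codons ordered with bases T, C, A, G
# ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def neighbors_of_codon(codon):
    """Amino acids reachable from `codon` by one point mutation.

    Synonymous changes and stop codons are excluded.
    """
    wild = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            amino = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if amino != wild and amino != "*":
                reachable.add(amino)
    return reachable


def neighbors_of_amino_acid(amino):
    """Union of the single-point-mutation neighbors over all codons of `amino`."""
    reachable = set()
    for codon, encoded in CODON_TABLE.items():
        if encoded == amino:
            reachable |= neighbors_of_codon(codon)
    return reachable


if __name__ == "__main__":
    print(sorted(neighbors_of_codon("GTT")))
    print(sorted(neighbors_of_amino_acid("V")))
```

Running the same functions over all 20 amino acids gives the kind of table of possible single mutants that the text refers to, under the assumptions stated above.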
Remark 2.1.
Throughout the paper, we assume that the number of mutational neighbors of a genotype is a fixed multiple of its length N.

We are interested in the proportion of beneficial mutations among all possible mutations. By Remark 1.1, the proportion of beneficial mutations is expected to be small. For the TEM record N = 287 and there are 46 single mutants in the record. Consequently, the ratio of the number $s_R$ of single mutants in the record to the number of mutational neighbors gives an estimated proportion of beneficial single mutations between 2% and 3%.

Another interesting property of a fitness landscape is how beneficial mutations combine. More precisely, consider a double mutant which combines two beneficial single mutations. If the double mutant is less fit than both single mutants, then the double mutant would most likely not appear in the record. The qualitative measure of additivity is motivated by this observation. More precisely, we will use the following definition.
Definition 2.2.
Let $B_p$ be the set consisting of all double mutants such that both corresponding single mutations are beneficial. The set $B \subset B_p$ consists of all double mutants in $B_p$ which are more fit than at least one of the corresponding single mutants. The qualitative measure of additivity for a fitness landscape is the ratio $|B| / |B_p|$.

Consider the single mutations in a record and the corresponding double mutants. Whenever two single mutants at different sites occur in the record, the corresponding double mutant is considered a candidate for a double mutant of high fitness. Let $\hat{B}_p$ be the set of candidates for double mutants. Let $\hat{B} \subset \hat{B}_p$ be the set of double mutants in the record among the candidates. Loosely speaking, one can consider $|\hat{B}| / |\hat{B}_p|$ the observed qualitative measure of additivity. Under ideal circumstances, the ratios $|\hat{B}| / |\hat{B}_p|$ and $|B| / |B_p|$ are approximately the same, at least for antimicrobial resistance mutations in the context we consider. We assume that the adaptation, or the resistance development, will take place repeatedly at different geographic locations. If a double mutant is more fit than at least one of the single mutants, the double mutant should occur sooner or later.

If fitness is additive, then $|B| / |B_p| = 1$, and for uncorrelated fitness one expects the value to be close to 0 by Remark 1.1. Of course the measure is coarse. However, it is valuable to have a simple method for comparing fitness landscapes in different contexts. Whenever fitness is measured, one can determine $|B| / |B_p|$, and for any record one can determine $|\hat{B}| / |\hat{B}_p|$. Notice that one expects the qualitative measure to decrease with increasing block size for the block model, as well as with increasing K for the NK model.

For some background, a measure of additivity which reflects quantitative fitness differences is called "roughness" (Carneiro and Hartl, 2010; Aita et al., 2001). Roughness 0 implies that the landscape is additive.
A problem with roughness is a possible size bias, i.e., all else equal, the roughness may be greater for a large number of loci. The qualitative measure of additivity does not have any size bias.
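The observed qualitative measure of additivity from Definition 2.2 and the paragraph following it can be computed directly from a record. The sketch below is only an illustration: the representation of a single mutant as a `(position, new_amino_acid)` pair and the function names are assumptions, and the substitutions at position 69 in the usage example are hypothetical placeholders.

```python
from itertools import combinations


def candidate_pairs(single_mutants):
    """Candidates for double mutants: pairs of single mutants at different sites.

    single_mutants is an iterable of (position, new_amino_acid) pairs.
    """
    singles = sorted(set(single_mutants))
    return {
        frozenset((a, b))
        for a, b in combinations(singles, 2)
        if a[0] != b[0]  # two substitutions at the same site cannot be combined
    }


def observed_measure(single_mutants, double_mutants):
    """Return (number of candidates found in the record, number of candidates).

    double_mutants is an iterable of pairs of (position, new_amino_acid).
    Double mutants that are not candidates are ignored.
    """
    candidates = candidate_pairs(single_mutants)
    observed = {frozenset(d) for d in double_mutants} & candidates
    return len(observed), len(candidates)


if __name__ == "__main__":
    # Q39K and A213V are single mutants mentioned in the text; the two
    # substitutions at position 69 are hypothetical placeholders.
    singles = [(39, "K"), (213, "V"), (69, "L"), (69, "V")]
    doubles = [((39, "K"), (69, "L"))]
    found, total = observed_measure(singles, doubles)
    print(found, total, found / total)
```

Representing each candidate as a `frozenset` makes the pair unordered, so a double mutant listed as (Q39K, A213V) or (A213V, Q39K) is counted once.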
Analyzing epistasis is closely related to analyzing additivity. For a thorough discussion of different measures of epistasis and empirical fitness landscapes, see Szendro et al. (2012). The most fine-scaled approach to epistasis is the geometric theory of gene interactions, which uses triangulations of polytopes (Beerenwinkel et al., 2007b,c; Crona, 2013).

We will consider the $|\hat{B}| / |\hat{B}_p|$ value for the TEM record. The record has 46 single mutants. The substitutions are at position 69 for 3 single mutants, at position 164 for 3 single mutants, at position 244 for 5 single mutants and at position 275 for 2 single mutants. Each remaining single mutant has its mutation at a unique position. Since two substitutions at the same position cannot be combined, it follows that the number of candidates is

$\binom{46}{2} - \binom{3}{2} - \binom{3}{2} - \binom{5}{2} - \binom{2}{2} = 1018.$

The record has 35 double mutants in the set $\hat{B}$ (see Table 1 for a list of the double mutants in the set $\hat{B}$). Consequently,

$\frac{|\hat{B}|}{|\hat{B}_p|} = \frac{35}{1018} \approx 3.4\%.$

We summarize the results for the TEM record in the following observation.
Observation 1.
For the TEM record,
(1) the proportion of beneficial single mutations is between 2% and 3%,
(2) the ratio $|\hat{B}| / |\hat{B}_p| = 35/1018 \approx 3.4\%$. Consequently, 3.4% is an estimate of the qualitative measure of additivity $|B| / |B_p|$.

3. RECORDS OF MUTATIONS, ADDITIVE FITNESS AND UNCORRELATED FITNESS
Consider a record of drug resistance mutations. We first consider conditions ideal for our purposes. Then we discuss the consequences of relaxing some of the conditions.

3.1. The perfect record conditions.
Assume that we have a well defined wild-type and several mutant variants associated with drug resistance. Assume that the records of drug resistance mutations satisfy the following conditions.
(1) The organism adapts to a single environment. [Single environment condition]
(2) The record is complete with respect to single and double mutants in the sense that
    (a) all single mutants which are more fit than the wild-type occur in the record,
    (b) all double mutants which are more fit than both corresponding single mutants occur in the record. [Completeness condition]
(3) Every single and double mutant in the record is a result of adaptation. In particular, the single mutants are the result of beneficial mutations. [Absence of neutral mutations condition]
Remark 3.1.
Assume that a record satisfies the perfect record conditions, as described.
(i) If $|\hat{B}| / |\hat{B}_p| < 1$, then the fitness landscape is not additive.
(ii) Suppose that there are $s_R$ single mutants in the record. If the fitness landscape is uncorrelated, then one expects $|\hat{B}| / |\hat{B}_p|$ to equal two thirds of the proportion of beneficial single mutations, that is, two thirds of the ratio of $s_R$ to the number of mutational neighbors, under the (simplified) assumption of Remark 2.1.

The first claim is obvious. For the second claim, fitness being uncorrelated, the proportion of double mutants that are more fit than the wild-type is approximately the same as the proportion of beneficial single mutations. If one restricts to the category of double mutants where both corresponding single mutants are more fit than the wild-type, the proportion is the same as well. Fitness being uncorrelated, one third of the double mutants in this category are expected to be less fit than both single mutants. Indeed, there are three possible fitness ranks, and the double mutant is as likely to have the lowest fitness as any other rank. The resulting proportion is two thirds of the proportion of beneficial single mutations, which explains the second claim in the remark.

3.2. Relaxing the perfect record assumptions.
The TEM family has adapted to different selective environments, since antibiotics have different effects, so that the single environment condition is not satisfied. First we relax the single environment condition for a record.
Multiple environments and additive landscapes.
In contrast to the single environment case, even if the fitness landscape associated with each drug is additive, the $|\hat{B}| / |\hat{B}_p|$-value may be lower than 1. The reason is that if two different single mutants are adapted to different environments, then a combination of the two corresponding mutations may not be fit in any environment. As an illustration, consider the following examples with additive landscapes.

Example 3.2.
Consider two different environments and 50 single resistance mutations, where 25 mutants are adapted to each one of the two environments. Assume that the fitness landscapes associated with both environments are additive. Moreover, assume that two mutations that constitute adaptations to different environments never combine well, so that the corresponding double mutants do not occur in the record. Then

$\frac{|B|}{|B_p|} = \frac{\binom{25}{2} + \binom{25}{2}}{\binom{50}{2}} = \frac{600}{1225} \approx 0.49.$

Consider exactly the same situation with 50 single mutants but instead 10 different environments, where 5 single mutants are adapted to each environment. Then

$\frac{|B|}{|B_p|} = \frac{10 \binom{5}{2}}{\binom{50}{2}} = \frac{100}{1225} \approx 0.08.$

We conclude that in the case of multiple environments the $|B| / |B_p|$-value may be low even if each fitness landscape is additive. The case described, where mutations which are beneficial in different environments never combine well, is probably not realistic. However, it is clear that multiple environments may lead to a lower $|B| / |B_p|$-value.

Multiple environments and uncorrelated landscapes. Consider a situation with multiple environments where the fitness landscape associated with each environment is uncorrelated. For simplicity, we assume that there is not an excessive number of different environments. By assumption, very few single and double mutants should be more fit than the wild-type in any particular single environment. Multiple environments imply more chance for a mutant to be fit in at least one environment. However, fitness being uncorrelated, that effect is exactly the same for single and double mutants.

Few double mutants will be more fit than the wild-type in any of the different environments, so that the $|B| / |B_p|$-value will be very low also in the case of multiple environments. In other words, unless the $|B| / |B_p|$-value is very small, we can rule out that all landscapes are uncorrelated fitness landscapes. (Multiple environments may lead to more beneficial mutations.
However, there is no difference between single and double mutants in that respect, so that the $|B| / |B_p|$-value should not be influenced.)

Incomplete records.
Missing single mutants in the record will normally have little effect, since $|\hat{B}| / |\hat{B}_p|$ concerns only single mutants in the record and associated double mutants, by definition. However, missing double mutants will make the $|\hat{B}| / |\hat{B}_p|$-value smaller as compared to the result for a more complete record. Consequently, incompleteness may lead to an underestimate of $|B| / |B_p|$.

Neutral mutations.
An abundance of neutral mutations will make the $|\hat{B}| / |\hat{B}_p|$-value difficult to interpret.

Remark 3.3.
An abundance of neutral mutations makes the record difficult to interpret. In the case of multiple environments or incomplete records, Observation 2 (i) holds, but not 2 (ii).

The TEM record represents a case of multiple environments, but probably not an excessive number of completely different environments. We have good reason to believe that the record is fairly complete, and that there is not an abundance of neutral mutations in the record.

For the TEM record, $|\hat{B}| / |\hat{B}_p| = 35/1018 \approx 3.4\%$ from Observation 1, whereas the value expected for an uncorrelated landscape (Remark 3.1 (ii)) is $\frac{46}{9 \cdot 287} \approx 1.8\%$.

Observation 2.
(i) Under the perfect record assumptions, the TEM landscape is not compatible with additive or uncorrelated fitness landscapes.
(ii) The TEM record is not compatible with uncorrelated fitness landscapes under realistic assumptions for the TEM record.
(iii) The TEM record, combined with knowledge of the context, suggests that fitness is not additive for the TEM family.

Part (iii) rests on the fact that there does not exist an excessive number of completely different environments for the TEM family. It would be remarkable to have $|\hat{B}| / |\hat{B}_p| \approx 3\%$ if all the fitness landscapes associated with individual drugs were additive.

The TEM record has in total 46 double mutants, where 35 are included in the set $\hat{B}$. We conclude the section with some remarks about the remaining double mutants (see Section 7 for a list of them, and Table 2 of the same section for a list of the double mutants in $\hat{B}$). For the double mutant TEM-164, none of the single substitutions correspond to single mutants in the record. For the other 9 double mutants, exactly one of the substitutions corresponds to a single mutant. The most likely reason for a double mutant in $B_p$ not to be included in $B$ is sign epistasis. Specifically, the single mutation not in the record is selected for only if the other single mutation has already occurred. In such a case, the sign of the effect (positive or negative) of the second mutation depends on background (the effect is negative for the wild-type and positive if the first mutation has occurred). Constraints on the order in which mutations accumulate are known from different contexts (see e.g. Desper et al., 1999; Beerenwinkel et al., 2007a), including HIV drug resistance.

4. RECORDS OF MUTATIONS, BLOCK MODELS AND POSITION GRAPHS
If fitness is neither additive nor uncorrelated, then one may consider more general models. We will discuss the block model with a focus on how single beneficial mutations combine. However, in this context one has to consider the structure of how beneficial mutations combine, not only the proportion of good combinations. For simplicity, we will discuss loci rather than amino acid substitutions. The position graph is intended to display the structure of the combinations.
Definition 4.1.
For a record of mutations, each node of the position graph corresponds to a locus associated with a single mutant in the record. An edge between two nodes indicates that a double mutant occurs in the record, such that the two mutations correspond to the nodes.

Notice that the position graph reflects the sites but ignores the actual amino acid substitutions (such as whether the substituted amino acid is glutamine or lysine, or whether both of them occur at the site). Single mutants with substitutions at the same site may of course differ in how well they combine with other mutations. One may want to look at more fine-scaled information and distinguish between different substitutions at the same site. However, for simplicity, we ignore this complication.

Figure 1 shows the position graph for the TEM family, except that nodes of degree zero are omitted.
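The position graph of Definition 4.1 is straightforward to build from a record. The following sketch is an illustration under stated assumptions: mutations are reduced to their positions, graphs are stored as a node set plus a set of unordered edges, and the function names and the toy positions in the usage example are hypothetical rather than taken from the TEM record.

```python
from itertools import combinations


def position_graph(single_positions, double_mutants):
    """Build the position graph as (nodes, edges).

    single_positions: positions of the single mutants in the record.
    double_mutants: pairs (p, q) of positions of double mutants in the record.
    An edge {p, q} is added only if both positions are nodes.
    """
    nodes = set(single_positions)
    edges = set()
    for p, q in double_mutants:
        if p in nodes and q in nodes and p != q:
            edges.add(frozenset((p, q)))
    return nodes, edges


def degrees(nodes, edges):
    """Number of edges at each node."""
    degree = {v: 0 for v in nodes}
    for edge in edges:
        for v in edge:
            degree[v] += 1
    return degree


def complement_edges(nodes, edges):
    """Edges of the complement graph on the same node set."""
    all_pairs = {frozenset(pair) for pair in combinations(sorted(nodes), 2)}
    return all_pairs - edges


if __name__ == "__main__":
    # Toy record with hypothetical positions 1..5; position 5 combines with nothing.
    nodes, edges = position_graph([1, 2, 3, 4, 5], [(1, 3), (1, 4), (2, 3), (2, 4)])
    print(degrees(nodes, edges))
    print(sorted(sorted(e) for e in complement_edges(nodes, edges)))
```

Nodes of degree zero, such as position 5 in the toy record, correspond to single mutants that do not appear in any double mutant of the record; these are the nodes omitted from Figure 1.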
Remark 4.2. The position graph considers loci, but not the amino acid substitutions. One may want to look at finer-scaled information and distinguish between different substitutions at the same site.
[Figure 1: position graph on the nodes 21, 39, 69, 84, 104, 130, 164, 182, 184, 224, 238, 240, 244, 265, 275, 276.]

FIGURE 1. The position graph for the TEM-family, where we have omitted the 21 nodes of degree 0.

Recall that the degree of a node is the number of edges to other nodes. The complement $\overline{G}$ of a graph $G$ is a graph on the same nodes, where a pair of nodes is connected by an edge exactly if the pair is not connected by an edge in $G$. For a complete bipartite graph, the nodes can be partitioned into two subsets, such that every pair of nodes from different subsets is connected by an edge, and there are no other edges.

The following observation is elementary by Remark 1.1.

Remark 4.3.
Assume the block model (with at least two blocks). Let $G$ denote the position graph. Consider $G$ and the complement $\overline{G}$. Under the perfect record assumptions, the nodes of $G$ have degree one or more. Moreover, by Remark 1.1, the following statements hold modulo a few errors:
(1) For the case of two blocks, $G$ is a complete bipartite graph. It follows that $\overline{G}$ is a disconnected graph with two components, both of which are complete.
(2) In general, for $l$ blocks, $\overline{G}$ is a disconnected graph with $l$ components, all of which are complete.

For the TEM record, the single mutants correspond to substitutions at exactly 37 positions. Exactly 21 nodes out of the 37 have degree zero. The position graph has 37 nodes and 25 edges.

Consider the block model (for at least two blocks). The following observations are potentially problematic for a block model:
(1) there is an abundance of nodes of degree zero (21 out of 37),
(2) the total number of edges is (only) 25 and there are 37 nodes,
(3) the maximal degree of a node is 8,
(4) $G$ has several triangles, in particular a triangle consisting of the three nodes of highest degree out of all nodes of $G$.

Under the perfect record assumptions, the block model implies that the degree of each node is at least one (provided that the nodes are distributed over at least two blocks). For the case of two blocks, the number of edges is between 36 and $18 \times 19 = 342$, where the minimum corresponds to the distribution of 1 and 36 nodes per block, and the maximum corresponds to the distribution of 18 and 19 nodes per block. In the first case, one node should have degree 36. However, the maximal degree of nodes in the position graph (Fig. 1) is 8. It is clear that the position graph has too few edges for similar block lengths, and too low a maximal degree for unequal block lengths.

For more than two blocks, even more edges are expected, leading to similar problems. Clearly the block model is not compatible with the data under the perfect record conditions.

Consider the case of exactly two blocks in a more realistic situation. Then the position graph should have essentially no triangles by Remark 1.1. This is because at least two of the nodes in a triangle are on the same block. (Moreover, for two blocks a reasonable guess would be that the three nodes of highest degree (39, 69, 164) are on the same block [the shorter one]. If so, a triangle consisting of the three nodes is unexpected.)

It remains to consider the single record condition and more than two blocks. In that case, let us analyze the fact that there are relatively few edges. 21 out of 37 nodes have degree zero, and in fact one can find a set of 31 nodes (including the 21 nodes) in the position graph, such that no pairs in the set are neighbors. That implies that among 31 nodes one cannot find a single pair of nodes on different blocks, such that both of them have high fitness in the same environment.

This is of course possible, especially taking into account that the record may be incomplete. However, from knowledge of the context, it does not seem plausible. The number of completely different environments is limited.
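The two-block edge bounds and the complement criterion of Remark 4.3 can be checked with a short Python sketch. The helper names `two_block_edge_counts` and `blocks_from_graph` are hypothetical, and the second function assumes an idealized perfect record in which nodes are on the same block exactly when they are not joined by an edge.

```python
def two_block_edge_counts(n_nodes):
    """Edge count a * (n_nodes - a) of a complete bipartite graph,
    for every split of n_nodes into blocks of size a and n_nodes - a."""
    return {a: a * (n_nodes - a) for a in range(1, n_nodes // 2 + 1)}


def blocks_from_graph(nodes, edges):
    """Return the blocks if the complement of the graph is a disjoint
    union of complete graphs, and None otherwise.

    Under the idealized block model, a node together with all of its
    non-neighbours forms one block, and every node of that block must
    give the same set.
    """
    nodes = set(nodes)
    adjacent = {v: set() for v in nodes}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    blocks, seen = [], set()
    for v in sorted(nodes):
        if v in seen:
            continue
        block = nodes - adjacent[v]
        if any(nodes - adjacent[u] != block for u in block):
            return None
        blocks.append(block)
        seen |= block
    return blocks


counts = two_block_edge_counts(37)
print(min(counts.values()), max(counts.values()))  # 36 342

# Toy graphs (hypothetical node labels, not TEM loci).
print(blocks_from_graph({1, 2, 3}, [(1, 2), (1, 3)]))             # [{1}, {2, 3}]
print(blocks_from_graph({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)]))  # None
```

A graph with an isolated node and at least one edge also returns `None`, which mirrors the remark that nodes of degree zero are incompatible with the block model under the perfect record assumptions.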
Observation 3. From the TEM record and some knowledge of the context, the block model is probably not a good fit for the TEM family.

5. A LABORATORY STUDY OF TEM ALLELES
We will compare the results from the TEM record with a laboratory study. The advantage of the laboratory study is that one can use drug-specific information, which takes care of the difficulties resulting from multiple environments.

The study Goulart et al. (2013) considered the antibiotics Ampicillin (AMP), Ceftazidime (CAZ), Cefpodoxime (CPD), Cefprozil (CRP), Cefotetan (CTT), Cefotaxime (CTX), Cefepime (FEP) and Piperacillin/tazobactam penicillin/inhibitor (TZP). Fitness ranks were determined for the 4 single mutants L21F, R164S, T265M, E240K, as well as the 6 double mutants that can be obtained from them, for each of the 8 antibiotics.
In particular, for the drug Ceftazidime (CAZ), 4 single mutants were more fit than the wild-type, so that one can obtain 6 double mutants from the single mutants. Out of the 6 double mutants for Ceftazidime, 5 double mutants were more fit than at least one of the single mutants, and one double mutant (the combination of R164S and E240K) was more fit than both corresponding single mutants.

For the 9 antibiotics, we list the number of single mutants with higher fitness than the wild-type, and below it the $|B|/|B_p|$-value (values as read from the rankings in Section 7).

Drug             CAZ   CTX   TZP   CPD   CTT   CRP   AMP   FEP   Amoxicillin+Clavulanate
Single mutants   4     4     2     2     2     2     2     2     2
|B|/|B_p|        5/6   2/6   1     1     1     1     0     0     0

The mean value is 0.57. Obviously the data deviate considerably from additive fitness as well as from uncorrelated fitness.

For a comparison, combinations of 5 beneficial mutations from an experimental Escherichia coli population were considered in Khan et al. (2011). Negative epistasis dominated, but sign epistasis was rare. Consider the 10 double mutants combining pairs of the 5 beneficial mutations. Every double mutant had higher fitness than at least one of its corresponding single mutants, so that $|B|/|B_p| = 10/10 = 1$.

As for the block model, with very few exceptions, the double mutants in $B$ should be more fit than both single mutants, or less fit than both corresponding single mutants, by Remark 1.1. For the 9 drugs, one can form in total 19 double mutants. 5 of them are more fit than both corresponding single mutants, 6 of them are more fit than exactly one corresponding single mutant, and 8 of them are less fit than both corresponding single mutants. This observation indicates that the block model does not apply. Notice also that the most plausible block distribution of nodes differs from drug to drug (for some drugs nodes 21 and 265 should be on the same block, and for other drugs not).

Observation 4. Neither additive fitness, uncorrelated fitness, nor the block model is compatible with data from the laboratory study Goulart et al. (2013).

6. DISCUSSION
We have compared expectations from additive, uncorrelated, and block models of fitness landscapes with empirical data. We argue that the TEM family of β-lactamases is not compatible with the three models. Under the simplified assumptions of a complete record and a single environment, we found that the TEM data was not compatible with any one of the three models. Under more realistic assumptions for the TEM family, the data was not compatible with uncorrelated fitness. Similarly, using the record and some knowledge of the biological context, it seems plausible that neither the additive nor the block model is a good fit. Our conclusions for the three models were confirmed by a laboratory study of TEM alleles. We did not compare the TEM family and the NK model. However, the symmetry aspect (see Remark 1.4) could be problematic.

The literature on additive, uncorrelated, block and NK models is extensive. Some approaches in the field have been motivated by theoretical considerations, such as relating epistasis to the number of peaks of a fitness landscape. The purpose of our study was not to debate the value of the classical models. We appreciate that toy models can be used for generating fruitful hypotheses, which can be tested empirically. As for additive and uncorrelated fitness, the extremes will always be of interest as a theoretical starting point.

However, the classical models have been used for interpretations of empirical data as well. A standard assumption for several topics in evolutionary biology and breeding is additive (or multiplicative) fitness, in particular for studies of fitness inheritance and sexual selection (e.g. Kokko et al., 2003). Statistical methods which are suitable for uncorrelated fitness landscapes have been used in empirical studies (see e.g. Crona et al., 2013b, for a discussion), and the NK model is frequently used in empirical contexts. From this perspective, it is reasonable to discuss to what extent the classical models are realistic.

We have suggested elementary tools, including the qualitative measure of additivity and the position graph, for comparing models and data. Our approach demonstrates that one can determine properties of fitness landscapes from a record of mutations. The ideal setting is a single environment. It would be of interest to compare the qualitative measure of additivity with observed behavior for microbes, such as recombination strategies. The position graph can be used as a test of modularity.

We consider it an advantage that our approach does not depend on any structural assumptions about the underlying fitness landscapes. Any case of adaptation where one has a well-defined wild-type and some direct or indirect method for determining fitness ranks of genotypes works.

Qualitative information has its limitations. The ideal information for determining properties of fitness landscapes is of course direct fitness measurements. For obvious reasons such information is sometimes difficult, if at all possible, to obtain. Moreover, laboratory results do not always agree with clinical findings. Consequently, direct methods for interpretations of nature are of interest as a complement to experimental results.

REFERENCES
Aita, T., Iwakura, M. and Husimi, Y. (2001). A cross-section of the fitness landscape of dihydrofolate reductase. Protein Eng. Sep; 14(9):633–8.
Beerenwinkel, N., Eriksson, N., and Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli; 13: 893-909.
Beerenwinkel N., Pachter, L. and Sturmfels B. (2007). Epistasis and shapes of fitness landscapes. Statistica Sinica
BMC Evolutionary Biology
Proc. Natl. Acad. Sci USA 107 suppl 1: 1747-1751.
Crona, K. Polytopes, graphs and fitness landscapes http://arxiv.org/abs/1212.0465v1.
Crona, K., Greene, D. and Barlow, M. (2013). The Peaks and Geometry of Fitness Landscapes. Journal of Theoretical Biology
Am Nat.
J. Comput. Biol.
Phys RevA
Theor. Pop. Biol. 23:202–215.
Gillespie, J. H. (1984). The molecular clock may be an episodic clock. Proc. Natl. Acad. Sci. USA 81: 8009–8013.
Goulart, C. P., Mentar, M., Crona, K., Jacobs, S. J., Kallmann, M., Hall, B. G., Greene D., Barlow M. (2013). Designing antibiotic cycling strategies by determining and understanding local adaptive landscapes. PLoS ONE
Proceedings of the Royal Society Series B
J Theor Biol.
J Theor Biol.; 141:211.
Khan, A. I., Dinh, D. M., Schneider, D., Lenski, R. E., Cooper, T. F. (2011). Science
J Appl Prob
PLoS Genet.
Proc. Natl. Acad. Sci. USA, 106, 18638–18643.
Macken, C. A. and Perelson, A. S. (1989). Protein evolution on rugged landscapes. Proc. Natl. Acad. Sci. USA 86:6191.
Orr, H. A. (2002). The population genetics of adaptation: the adaptation of DNA sequences, Evolution
Evolution.; 60:1113
Park, S. C., Krug J. (2008). Evolution in random fitness landscapes: The infinite sites model. J Stat Mech P04014.
Poelwijk, F. J., Kiviet, D. J., Weinreich, D. M. and Tans, S. J. (2007). Empirical fitness landscapes reveal accessible evolutionary paths. Nature
SIAM Review
J Theor Biol. 243:114120.
Schenk, M. F., Szendro, I. G., Krug, J. and de Visser, J. A. (2012). Quantifying the adaptive potential of an antibiotic resistance enzyme. PLoS Genet. Jun; 8(6):e1002783.
Szendro, I. G., Schenk, M. F., Franke, J., Krug, J., de Visser, J. A. G. M. (2012). Quantitative analyses of empirical fitness landscapes. arXiv:1202.4378.
Wright, S. (1931). Evolution in Mendelian populations. Genetics, 16 97–159.
Weinberger, E. E. (1991). Taylor and Fourier representations of fitness landscapes. Biological Cybernetics, 65: 321–330.
Weinreich, D. M., Watson R. A., Chao, L. (2005). Sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174.
Weinreich, D. M., Delaney N. F., Depristo, M. A., and Hartl, D. L. (2006). Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312:111–114.

7. STATISTICS AND TABLES

TABLE 1. Loci with more than one substitution. [Only the column header "locus" is recoverable.]
We use information from the record of the TEM family from the Lahey Clinic as of April 2012 for this study. All mutants have been found clinically. The record is continuously updated with new mutants.

The following loci, in total 37, correspond to single mutations:

21, 28, 34, 39, 68, 69, 84, 92, 104, 115, 120, 124, 130, 145, 155, 157, 158, 163, 164, 176, 182, 184, 189, 204, 213, 218, 224, 230, 238, 240, 244, 265, 271, 275, 276, 280, 283

Exactly 4 loci, out of the 37 listed above, correspond to more than one substitution (see Table 1).

7.1. Double mutants and the position graph. The total number of double mutants in the record is 45, where 35 combine single mutations from the record (see Table 3). The remaining 10 mutants are as follows: TEM-58, TEM-81, TEM-112, TEM-126, TEM-137, TEM-145, TEM-146, TEM-163, TEM-164, TEM-169.

Information about double mutants in the record is expressed in the position graph (see Fig. 1). Recall that for the position graph a node corresponds to a locus with a single mutation. An edge denotes that there exists at least one double mutant which combines the substitutions at the two loci. The position graph has 37 nodes and 24 edges. The following (21) nodes of the position graph have degree zero:

28, 34, 68, 92, 115, 120, 124, 145, 155, 157, 158, 163, 176, 189, 204, 213, 218, 230, 271, 280, 283

The degrees of the remaining (16) nodes are listed by locus (locus:degree):

21:3, 39:6, 69:7, 84:1, 104:3, 130:1, 164:8, 182:3, 184:1, 238:4, 224:1, 240:3, 244:2, 265:3, 275:1, 276:1
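As a quick consistency check of the list above, the sum of the degrees of a graph equals twice its number of edges. The Python sketch below (the variable names are illustrative only) applies this to the listed degrees, and also recovers the number of degree-zero nodes and the three nodes of highest degree.

```python
# Degrees of the 16 nodes with at least one edge, as listed (locus: degree).
DEGREES = {
    21: 3, 39: 6, 69: 7, 84: 1, 104: 3, 130: 1, 164: 8, 182: 3,
    184: 1, 238: 4, 224: 1, 240: 3, 244: 2, 265: 3, 275: 1, 276: 1,
}
TOTAL_NODES = 37  # loci with a single mutation in the record

# Every edge contributes to the degree of exactly two nodes.
n_edges = sum(DEGREES.values()) // 2
n_isolated = TOTAL_NODES - len(DEGREES)
top_three = sorted(DEGREES, key=DEGREES.get, reverse=True)[:3]

print(n_edges)     # 24
print(n_isolated)  # 21
print(top_three)   # [164, 69, 39]
```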
TABLE 2. Single mutants of the record

1   TEM-2    Q39K
2   TEM-12   R164S
3   TEM-17   E104K
4   TEM-19   G238S
5   TEM-29   R164H
6   TEM-30   R244S
7   TEM-31   R244C
8   TEM-33   M69L
9   TEM-34   M69V
10  TEM-40   M69I
11  TEM-51   R244H
12  TEM-54   R244L
13  TEM-55   G218E
14  TEM-57   G92D
15  TEM-70   R204Q
16  TEM-76   S130G
17  TEM-79   R244G
18  TEM-84   N276D
19  TEM-90   D115G
20  TEM-95   P145A
21  TEM-96   D163G
22  TEM-103  R275L
23  TEM-104  A280V
24  TEM-105  S124N
25  TEM-117  L21F
26  TEM-122  R275Q
27  TEM-127  H158N
28  TEM-128  D157E
29  TEM-135  M182T
30  TEM-141  K34E
31  TEM-143  R164C
32  TEM-148  T189K
33  TEM-150  E28D
34  TEM-156  M155I
35  TEM-166  R120G
36  TEM-168  T265M
37  TEM-170  G283C
38  TEM-171  V84I
39  TEM-174  A213V
40  TEM-176  A224V
41  TEM-181  A184V
42  TEM-183  F230L
43  TEM-186  D176N
44  TEM-191  E240K
45  TEM-192  M68I
46  TEM-198  T271I

TABLE 3. The double mutants of the record which combine single mutations of the record

1   TEM-6    E104K  R164H
2   TEM-7    Q39K   R164S
3   TEM-10   R164S  E240K
4   TEM-11   Q39K   R164H
5   TEM-13   Q39K   T265M
6   TEM-15   E104K  G238S
7   TEM-18   Q39K   E104K
8   TEM-20   M182T  G238S
9   TEM-26   E104K  R164S
10  TEM-28   R164H  E240K
11  TEM-32   M69I   M182T
12  TEM-35   M69L   N276D
13  TEM-36   M69V   N276D
14  TEM-37   M69I   N276D
15  TEM-38   M69V   R275L
16  TEM-44   Q39K   R244S
17  TEM-45   M69L   R275Q
18  TEM-53   L21F   R164S
19  TEM-59   Q39K   S130G
20  TEM-65   Q39K   R244C
21  TEM-71   G238S  E240K
22  TEM-77   M69L   R244S
23  TEM-82   M69V   R275Q
24  TEM-106  E104K  M182T
25  TEM-110  L21F   T265M
26  TEM-115  L21F   R164H
27  TEM-116  V84I   A184V
28  TEM-118  R164H  T265M
29  TEM-120  L21F   G238S
30  TEM-144  R164C  E240K
31  TEM-147  R164H  A224V
32  TEM-154  M69L   R164S
33  TEM-160  Q39K   M69V
34  TEM-165  R164S  M182T
35  TEM-189  M69L   E240K
Ceftazidime:
w(L21F), w(R164S), w(E240K), w(T265M) > w(TEM-1)
w({R164S, E240K}) > w(R164S), w(E240K)
w({L21F, R164S}) > w(L21F)
w({L21F, T265M}) > w(T265M)
w({R164S, T265M}) > w(T265M)
w({E240K, T265M}) > w(T265M)

Cefotaxime:
w(L21F), w(R164S), w(E240K), w(T265M) > w(TEM-1)
w({R164S, E240K}) > w(R164S), w(E240K)
w({L21F, R164S}) > w(L21F)

Piperacillin/tazobactam penicillin/inhibitor:
w(L21F), w(T265M) > w(TEM-1)
w({L21F, T265M}) > w(L21F), w(T265M)

Cefpodoxime:
w(R164S), w(E240K) > w(TEM-1)
w({R164S, E240K}) > w(E240K), w(R164S)

Cefotetan:
w(R164S), w(E240K) > w(TEM-1)
w({R164S, E240K}) > w(R164S), w(E240K)

Cefprozil:
w(L21F), w(T265M) > w(TEM-1)
w({L21F, T265M}) > w(L21F)

Ampicillin:
w(L21F), w(T265M) > w(TEM-1)
No double mutants to list.

Cefepime:
w(L21F), w(R164S) > w(TEM-1)
No double mutants to list.

Amoxicillin+Clavulanate:
w(L21F), w(T265M) > w(TEM-1)
No double mutants to list.
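The rankings above are enough to recompute the summary figures quoted in Section 5. The following Python sketch is an illustration under stated assumptions: for each drug, every double mutant formed from the beneficial single mutants is encoded by the number of its single mutants that it outranks (2, 1 or 0), double mutants not listed in the rankings are taken to be less fit than both single mutants, and the names `BEATS` and `additivity_ratio` are hypothetical.

```python
from fractions import Fraction

# One entry per double mutant in B_p: how many of its two single mutants
# the double mutant outranks (encoding read off the rankings; unlisted
# double mutants are assumed to outrank neither single mutant).
BEATS = {
    "Ceftazidime": [2, 1, 1, 1, 1, 0],
    "Cefotaxime": [2, 1, 0, 0, 0, 0],
    "Piperacillin/tazobactam": [2],
    "Cefpodoxime": [2],
    "Cefotetan": [2],
    "Cefprozil": [1],
    "Ampicillin": [0],
    "Cefepime": [0],
    "Amoxicillin+Clavulanate": [0],
}


def additivity_ratio(beats):
    """|B| / |B_p|: share of double mutants fitter than at least one
    of their single mutants."""
    return Fraction(sum(1 for b in beats if b >= 1), len(beats))


ratios = {drug: additivity_ratio(b) for drug, b in BEATS.items()}
mean_ratio = sum(ratios.values()) / len(ratios)

all_beats = [b for beats in BEATS.values() for b in beats]
tally = {k: all_beats.count(k) for k in (2, 1, 0)}

print(round(float(mean_ratio), 2))  # 0.57
print(len(all_beats), tally)        # 19 {2: 5, 1: 6, 0: 8}
```

Under this encoding the mean of 0.57 and the split of the 19 double mutants into 5, 6 and 8 agree with the values stated in Section 5.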