Antibiotic resistance landscapes: a quantification of theory-data incompatibility for fitness landscapes

Abstract

Fitness landscapes are central to the analysis of evolution, in particular for drug resistance mutations in bacteria and viruses. We show that the fitness landscapes associated with antibiotic resistance are not compatible with any of the classical models: additive, uncorrelated and block fitness landscapes. The NK model is also discussed. It is frequently stated that virtually nothing is known about fitness landscapes in nature. We demonstrate that available records of antimicrobial drug mutations can reveal interesting properties of fitness landscapes in general. We apply the methods to analyze the TEM family of β-lactamases associated with antibiotic resistance. Laboratory results agree with our observations. The qualitative tools we suggest are well suited for comparisons of empirical fitness landscapes. Fitness landscapes are central in the theory of recombination, and there is potential for finding relations between the tools and recombination strategies.


KRISTINA CRONA, DAYONNA PATTERSON, KELLY STACK, DEVIN GREENE, CHRISTIANE GOULART, MENTAR MAHMUDI, STEPHEN D. JACOBS, MARCELO KALLMAN, MIRIAM BARLOW


1. BACKGROUND

arXiv preprint [q-bio.PE], Mar.

The fitness landscape was introduced as a metaphor for adaptation. Informally, the surface of the landscape consists of genotypes, where similar genotypes are close to each other, and the fitness of a genotype is represented as a height coordinate. Adaptation can then be pictured as an uphill walk in the fitness landscape (Wright, 1931). It is frequently claimed that we know virtually nothing about fitness landscapes in nature. The scarcity of fitness measurements and the difficulty of measuring fitness are cited as reasons. The purpose of this work is to demonstrate that available records of drug resistance mutations can reveal interesting properties of the underlying fitness landscapes. We suggest qualitative tools that are easy to apply and interpret in order to learn properties of fitness landscapes from data. The setting we have in mind is a record of clinically found antimicrobial drug resistance mutations, where there is a well defined wild-type and several mutant variants with some degree of drug resistance. Other cases of adaptation could work as well.

There are several advantages to mutation records as a source of information. The records are already available. The quantity of data is substantial and growing, and the quality tends to be high since the data is of medical importance. Moreover, the data reflects nature, whereas laboratory data sometimes disagree with clinical observations; see our discussion of the TEM family below. However, the most important reason is that this data reflects adaptation. In contrast, empirical studies where fitness is measured often consider combinations of deleterious mutations. The majority of such combinations are probably exceedingly rare in nature. If one is interested in adaptation, one needs knowledge about beneficial mutations as well. For an overview of recent empirical work where fitness is measured, see e.g. Szendro et al. (2012); Weinreich et al. (2005); Carneiro and Hartl (2010) and references. Most existing studies concern few loci (4 or 5) and a specific selective environment. For studies with many loci, see e.g. Kouyos et al. (2012); Schenk et al. (2012), and for a case where fitness ranks for the same genotypes are determined in several different selective environments, see Goulart et al. (2013).

We apply our results to the TEM family of beta-lactamases associated with antibiotic resistance. TEM stands for Temoneira, the name of the patient from whom the enzyme was first isolated. TEM beta-lactamases have been found in Escherichia coli, Klebsiella pneumoniae and other gram-negative bacteria. TEM-1 is considered the wild-type. The length of TEM-1 is 287, i.e., TEM-1 can be represented as a sequence of 287 letters in the 20-letter alphabet corresponding to the amino acids. Over 170 TEM variants have been found clinically, where 41 are single mutants, i.e., they have exactly one amino acid substitution, and the majority (90%) have at most 4 amino acid substitutions. We use the record of the TEM family from the Lahey Clinic, which from now on we simply call "the TEM record".

The quality of the TEM record is assumed to be high. The TEM record represents a case of multiple environments, but probably not an excessive number of completely different environments. Several antibiotics have similar effects. We have good reason to believe that the TEM record is fairly complete, and that there is not an abundance of neutral mutations.

As a complement, we use laboratory results (Goulart et al., 2013). Our study determines whether TEM data is compatible with classical models of fitness landscapes. More precisely, we compare with additive, uncorrelated and block models of fitness landscapes. The NK model is also discussed.

Throughout our article, our focus is on the extent to which beneficial mutations combine well. We start with a brief description of our approach in the context of the TEM family. Using standard notation, TEM-2 is a single mutant with the mutation Q39K, which means that the amino acid denoted "Q" (glutamine) at position 39 of the wild-type is substituted by the amino acid denoted "K" (lysine). TEM-174 is the single mutant A213V. One can ask whether the double mutant with substitutions Q39K and A213V confers antibiotic resistance, since the double mutant combines two resistance mutations. However, the double mutant does not occur in the record.

Roughly, we compare the candidates for double mutants, such as the one described, with double mutants that do occur in the record, and consider the patterns for how candidates occur or are absent in the record. This approach is motivated by an evolutionary perspective. Provided that the quality of a record of resistance mutations is good, most single mutants are more fit, i.e., confer more drug resistance, than the wild-type in some environment. Likewise, if a double mutant occurs in a mutation record, it is plausible that the double mutant is more fit than at least one of the corresponding single mutants in some environment.
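The comparison of candidate double mutants with recorded double mutants can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, substitutions are written in the standard notation used above (such as Q39K), and the toy record is invented except for the two single mutants Q39K and A213V named in the text.

```python
import re
from itertools import combinations


def position(substitution):
    """Return the position in a substitution such as 'Q39K' (-> 39)."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", substitution)
    if match is None:
        raise ValueError(f"not a substitution: {substitution!r}")
    return int(match.group(1))


def candidate_doubles(single_mutants):
    """Pairs of recorded single substitutions located at different positions."""
    return {
        frozenset(pair)
        for pair in combinations(single_mutants, 2)
        if position(pair[0]) != position(pair[1])
    }


def split_candidates(single_mutants, recorded_doubles):
    """Split the candidates into those present in the record and those absent."""
    candidates = candidate_doubles(single_mutants)
    recorded = {frozenset(d) for d in recorded_doubles}
    return candidates & recorded, candidates - recorded


# Hypothetical toy record: Q39K and A213V are single mutants named in the text;
# A10C, A10D and the recorded double are invented for illustration only.
toy_singles = ["Q39K", "A213V", "A10C", "A10D"]
toy_doubles = [("Q39K", "A10C")]

present, absent = split_candidates(toy_singles, toy_doubles)
print(len(present), "candidates occur in the toy record;", len(absent), "are absent")
```

Pairs at the same position (A10C and A10D in the toy data) are excluded, since two substitutions at one site cannot be combined into a double mutant.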


Put briefly, we consider whether "good + good = better" for mutations. The goal is to capture the relation between fitness landscapes and mutation records. One reason for being interested in mutation records is that laboratory results do not always reflect clinical facts. A striking example is that the triple mutant of the TEM family with substitutions A42G, E104K, and G238S confers a high degree of cefotaxime resistance according to laboratory results (Weinreich et al., 2006), but this mutant has never been observed clinically. Examples where single mutants not found outside of the laboratory confer a high degree of cefotaxime resistance are given in Schenk et al. (2012).

It may seem surprising that qualitative information can be useful for analyzing fitness landscapes. However, we will show that predictions from some classical models relate well to qualitative information. Our approach could be useful for comparisons of empirical landscapes, and there is potential for relating the information derived directly to recombination strategies.

We define fitness as the expected reproductive success, and use the convention that the wild-type has fitness 1. Fitness is called additive if the fitness effects of mutations sum. Consider a biallelic two-loci system. Suppose that the genotype ab has fitness 1, the genotype Ab has fitness 1.03 and the genotype aB has fitness 1.01. If fitness is additive, then the genotype AB has fitness $1.04 = 1 + 0.01 + 0.03$. (In the literature non-epistatic fitness is sometimes defined as multiplicative, so that the double mutant would have fitness 1.0403. If the fitness effects and the number of beneficial mutations are small, there is not much difference between the definitions.) Values greater than 1.04 imply positive epistasis. Values smaller than 1.04 imply negative epistasis.

Sign epistasis means that a particular mutation is beneficial or deleterious depending on genetic background. For example, if ab, Ab and aB have fitness values as above (1, 1.03, 1.01), but AB has fitness 1.02, then there is sign epistasis. Indeed, in this case the mutation B is beneficial for the ab-genotype, and deleterious for the Ab-genotype.

The concept of a fitness landscape has been formalized in different ways. A genotype may be represented as a string in the 20, 4 or 2 letter alphabet, depending on whether one considers the amino acids, the base pairs or biallelic systems. Throughout the paper, we will consider amino acids.

Let $\Sigma$ denote the 20 letter alphabet. The genotype space $\Sigma^L$ consists of all $20^L$ strings of length $L$. A fitness landscape $w : \Sigma^L \to \mathbb{R}$ assigns a fitness value to each genotype. The fitness of a genotype $g$ is denoted $w_g$. If two genotypes differ by a single mutation, they are mutational neighbors.

Remark 1.1. Following the Orr-Gillespie approach we assume that the wild-type has very high fitness also in the new environment as compared to a randomly generated genotype. Consequently, only a small number of mutations of the wild-type are beneficial.

The paper is structured as follows. In Section 1.1 we briefly review classical models of fitness landscapes. Section 2 provides basic observations of the TEM record. Section 3 concerns additive and uncorrelated fitness, and Section 4 block models. For all models, we compare with the TEM record, and in Section 5 a laboratory study of TEM alleles is used as a complement to the record.

1.1. Classical models of fitness landscapes.

Additive fitness landscapes, uncorrelated fitness landscapes, the block model and Kauffman's NK model have had a broad influence in evolutionary biology. We will give a brief overview of the four classical models.

Additive fitness landscapes, or non-epistatic landscapes, have been defined above. An additive fitness landscape is single peaked.

In contrast, for an uncorrelated (also called random, rugged or House of Cards [HOC]) fitness landscape, there is no correlation between the fitness of a genotype and the fitness of its mutational neighbors, i.e., alleles that differ by one substitution only.

Consider an uncorrelated landscape where a small proportion $p$ of the single mutants are more fit than the wild-type. It follows that, among the double mutants corresponding to two beneficial single mutations, approximately the same proportion $p$ are more fit than the wild-type as well. In other words, beneficial mutations usually do not combine well for an uncorrelated fitness landscape, by Remark 1.1.

Uncorrelated fitness and additivity can be considered as two extremes with regard to the amount of structure in the fitness landscape, and most fitness landscapes fall between the extremes. Uncorrelated fitness has been studied extensively in the literature (see e.g. Kingman, 1978; Kauffman and Levin, 1987; Flyvbjerg and Lautrup, 1992; Rokyta et al., 2006; Park and Krug, 2008).

For the block model (see e.g. Macken and Perelson, 1989; Orr, 2006) the string representing a genotype can be subdivided into blocks, where each block makes an independent contribution to the fitness of the string. Each block has uncorrelated fitness, and the fitness of the string is the sum of contributions from each block. In particular, a block model consisting of one block only is an uncorrelated fitness landscape. The rationale behind this model is that if two blocks have completely different functions, then the effects of two changes in different blocks should be independent.

Kauffman's NK model (see e.g. Kauffman and Weinberger (1989)) is defined so that the epistatic effects are random, whereas the fitness of a genotype is the average of the "contributions" from each locus.

More precisely, for the NK model the genotypes have length $N$ (in our notation $L = N$), and the parameter $K$, where $0 \le K \le N - 1$, reflects interactions between loci. The fitness contribution $\phi_i$ from the locus $i$ is determined by its state $g_i$ and the states at $K$ other loci $i_1, \dots, i_K$. The key assumption is that this contribution is assigned at random from some probability distribution. The fitness of a genotype $g$ is the average of the contributions $\phi_i$, so that

$$w_g = \frac{1}{N} \sum_{i=1}^{N} \phi_i(g_i, g_{i_1}, \dots, g_{i_K}),$$

where $\{i_1, \dots, i_K\} \subset \{1, \dots, i-1, i+1, \dots, N\}$. Several important properties of NK landscapes depend mainly on $N$ and $K$, rather than the exact structure of the epistatic interactions. The fact that the fitness of the genotype is the average of these $N$ contributions means that fitness effects of non-interacting mutations sum. Notice that the fitness landscape is additive for $K = 0$ and uncorrelated for $K = N - 1$. The popularity of the NK model rests on the fact that the model is "tunably rugged". This expression means that the ruggedness is expected to increase with $K$, from the single peaked additive landscape for $K = 0$ to the uncorrelated landscape with a maximal number of peaks for $K = N - 1$. Published results on the NK model of (potential) relevance to evolutionary biology concern the number of peaks, the length of mutational trajectories, fitness distributions of genotypes and fitness trajectories.

Notice that the block model also includes additive landscapes and uncorrelated landscapes as special cases. More importantly, NK models and block models are similar in that there is a sharp division between effects which are completely random and effects which are additive.
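To make the NK definition concrete, the following is a minimal sketch of a random NK landscape, written under simplifying assumptions that are not in the paper: loci are biallelic (states 0 and 1) rather than drawn from the 20 letter alphabet, the contributions $\phi_i$ are uniform on $[0, 1)$, and the $K$ interacting loci are chosen at random. The names `make_nk_landscape` and `count_peaks` are hypothetical.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function for a random biallelic NK landscape."""
    rng = random.Random(seed)
    # For each locus i, pick the K other loci whose states influence phi_i.
    partners = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # phi_i: one random contribution per joint state of locus i and its partners.
    tables = [
        {state: rng.random() for state in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        contributions = (
            tables[i][(genotype[i],) + tuple(genotype[j] for j in partners[i])]
            for i in range(n)
        )
        return sum(contributions) / n  # average of the N contributions

    return fitness


def count_peaks(fitness, n):
    """Count genotypes that are strictly fitter than all n mutational neighbors."""
    peaks = 0
    for g in itertools.product((0, 1), repeat=n):
        neighbors = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(n))
        if all(fitness(g) > fitness(h) for h in neighbors):
            peaks += 1
    return peaks


for k in (0, 2, 5):
    w = make_nk_landscape(6, k, seed=1)
    print("K =", k, "peaks:", count_peaks(w, 6))
```

For $K = 0$ the sketch always yields a single peak, in line with the additive case; for larger $K$ the number of peaks depends on the random draw.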
One should keep in mind that the block models and Kauffman's NK model are equipped with very special structures. In order to provide some intuition for how empirical landscapes relate to the models, we will consider examples.

Example 1.2. w = 1 , w = w = w = 1 . , w = 1 . , w = w = 1 . , w = 1 . For every locus, replacing 0 by 1 increases fitness by the same amount. Fitness is additive and the genotype with 1 at every locus is at a peak.

Example 1.3. w = 1 , w = w = w = w = 1 . ,w = w = w = 1 . , w = 1 . ,w = 1 . , w = w = w = 1 . ,w = w = w = 1 . , w = 1 . . For every locus, replacing 0 by 1 increases fitness. For the first locus the increase is the same regardless of background. For the other loci, the magnitude of the difference depends on the background. For instance, the magnitude is 0.02 for the change from 0000 to 0001, 0.01 for the change from 0001 to 0011, and 0.005 for the change from 0011 to 0111.

Fitness is obviously not additive in the second example, since the fitness values of the double and triple mutants are below linear expectations based on the wild-type and the single mutants. The landscape deviates from expectations for an uncorrelated landscape as well, since replacing 0 by 1 always gives higher fitness. It remains to consider the more general models. As for the block model, the first locus is independent, and the remaining three loci interact with each other in a symmetric way. Consequently, the natural candidates for blocks would be one block consisting of the first locus, and another block consisting of the remaining three loci. However, the second block deviates considerably from random expectations. Consequently, the block model is not a good fit.

For the NK model, the independent fitness contribution of the first locus suggests that $K = 0$. However, the second locus depends on the third and the fourth loci, suggesting that $K = 2$. (Similar arguments for the third and the fourth loci suggest that $K = 2$ as well.) Since the observations suggest different $K$-values, the NK model does not seem ideal.
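The comparison with "linear expectations" used in the examples rests on the definitions of epistasis and sign epistasis from the Background section. As a small sketch (function names are hypothetical, and the tolerance is an assumption to absorb floating point error), the two-locus case can be classified as follows, using the fitness values 1, 1.03 and 1.01 from the text.

```python
def epistasis(w_ab, w_Ab, w_aB, w_AB, tol=1e-9):
    """Compare the double mutant AB with the additive expectation."""
    expected = w_Ab + w_aB - w_ab  # effects of the two single mutations sum
    diff = w_AB - expected
    if abs(diff) <= tol:
        return "none"
    return "positive" if diff > 0 else "negative"


def has_sign_epistasis(w_ab, w_Ab, w_aB, w_AB):
    """True if a mutation is beneficial on one background and deleterious on the other."""
    effect_A = (w_Ab - w_ab, w_AB - w_aB)  # a -> A on backgrounds b and B
    effect_B = (w_aB - w_ab, w_AB - w_Ab)  # b -> B on backgrounds a and A
    return any(x * y < 0 for x, y in (effect_A, effect_B))


# Values from the text: ab = 1, Ab = 1.03, aB = 1.01.
print(epistasis(1, 1.03, 1.01, 1.04))           # additive case
print(epistasis(1, 1.03, 1.01, 1.02))           # below the additive expectation
print(has_sign_epistasis(1, 1.03, 1.01, 1.02))  # B is deleterious on the Ab background
```

With AB at 1.02 the sketch reports negative epistasis and sign epistasis, matching the discussion in the text; a value such as 1.035 would give negative epistasis without sign epistasis.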
Remark 1.4. The NK model allows different interactive patterns. However, it is not expected that some loci are more or less independent, whereas other loci have considerable interactions. Some degree of symmetry is expected, reflecting the $K$ value.

Several other models for fitness landscapes have been suggested, including neutral models; see Szendro et al. (2012) for an empirical perspective. For some approaches to fitness landscapes not related to the models mentioned, see the geometric theory of gene interactions (Beerenwinkel et al., 2007b; Crona, 2013) and the Orr-Gillespie theory (Orr, 2002). Notice also that fitness landscapes have been used in chemistry, physics and computer science, in addition to evolutionary biology. In combinatorial optimization the fitness function is referred to as the cost function. For a survey on combinatorial landscapes in general see Reidys and Stadler (2002).

2. THE QUALITATIVE MEASURE OF ADDITIVITY AND THE TEM RECORD

Throughout the paper, we focus on single and double mutants in a record. The motivation is straightforward. If a single mutant has high fitness, it is likely to be found in nature. However, if a $k$-tuple mutant is very fit for some large $k$, the mutant may never be found because of the time span necessary before the substitutions have accumulated. Single and double mutants are likely to appear relatively early in the process of adaptation.

Roughly, we are interested in the proportion of beneficial mutations among all possible single mutations, as well as in the extent to which beneficial mutations combine well. The information we consider is coarse, and a record of mutations will rarely be perfect.

As indicated, we will work with words in the 20 letter alphabet where there is a well defined wild-type. A single mutant is a genotype resulting from one amino acid substitution. However, amino acid substitutions are not comparable to substitutions of letters in a string. Not every amino acid substitution can occur as the result of a single point mutation. For instance, suppose that the amino acid is Valine at a particular locus, and that the codon is GTT. Then one can obtain exactly 6 single mutants at the locus, corresponding to A, D, F, G, I, L (Alanine, Aspartic acid, Phenylalanine, Glycine, Isoleucine, Leucine). On the other hand, one can obtain exactly 8 single mutants (A, D, E, F, G, I, L, M) starting from Valine if all of its codons (GTT, GTC, GTA, GTG) are considered. In general, the number of single mutants one can obtain varies depending on the amino acid. Moreover, in some cases, such as for Valine, the number depends on whether one considers a particular codon for the amino acid or all possible codons.

To make matters more complicated, the wild-type allele under consideration may be unique in terms of amino acids, but not in terms of codons. For a precise analysis, one may want to consider codon variations in the wild-type allele. However, for our purposes it is sufficient to consider amino acids.

We assume that there are approximately 7 possible single point mutations for a given locus, so that the number of mutational neighbors of a wild-type of length $N$ is proportional to $N$. For the reader's convenience, we included a table of possible single mutants (see Section 7).
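The Valine counts can be reproduced from the standard genetic code. The sketch below is an illustration, not the table from Section 7: it builds the codon table from the usual 64-letter amino acid string in TCAG order, and the function names are hypothetical. Stop codons and synonymous changes are not counted as single mutants.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_step_substitutions(codon):
    """Amino acids reachable from a codon by exactly one nucleotide change."""
    own = CODON_TABLE[codon]
    reachable = set()
    for i, base in enumerate(codon):
        for other in BASES:
            if other != base:
                aa = CODON_TABLE[codon[:i] + other + codon[i + 1:]]
                if aa not in (own, "*"):  # skip synonymous changes and stops
                    reachable.add(aa)
    return reachable


def substitutions_from_amino_acid(amino_acid):
    """Union of single-step substitutions over all codons of an amino acid."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    return set().union(*(single_step_substitutions(c) for c in codons))


print(sorted(single_step_substitutions("GTT")))    # 6 amino acids for codon GTT
print(sorted(substitutions_from_amino_acid("V")))  # 8 amino acids over all Valine codons
```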

Remark 2.1. Throughout the paper, we assume that a genotype of length $N$ has $6N$ mutational neighbors.

We are interested in the proportion of beneficial mutations among all possible mutations. By Remark 1.1, the proportion of beneficial mutations is expected to be small. For the TEM record $N = 287$ and there are 46 single mutants in the record. Consequently, with $s_R$ denoting the number of single mutants in the record,

$$\frac{s_R}{6N} = \frac{46}{6 \cdot 287} \approx 2.7\%.$$

Another interesting property of a fitness landscape is how beneficial mutations combine. More precisely, consider a double mutant which combines two beneficial single mutations. If the double mutant is less fit than both single mutants, then the double mutant would most likely not appear in the record. The qualitative measure of additivity is motivated by this observation. More precisely, we will use the following definition.

Definition 2.2. Let $B_p$ be the set consisting of all double mutants such that both corresponding single mutations are beneficial. The set $B \subset B_p$ consists of all double mutants in $B_p$ which are more fit than at least one of the corresponding single mutants. The qualitative measure of additivity for a fitness landscape is the ratio $|B| / |B_p|$.

Consider the single mutations in a record and the corresponding double mutants. Whenever two single mutants at different sites occur in the record, the corresponding double mutant is considered a candidate for a double mutant of high fitness. Let $\hat{B}_p$ be the set of candidates for double mutants. Let $\hat{B} \subset \hat{B}_p$ be the set of double mutants in the record among the candidates. Loosely speaking, one can consider $|\hat{B}| / |\hat{B}_p|$ the observed qualitative measure of additivity. Under ideal circumstances, the ratios $|\hat{B}| / |\hat{B}_p|$ and $|B| / |B_p|$ are approximately the same, at least for antimicrobial resistance mutations in the context we consider. We assume that the adaptation, or the resistance development, will take place repeatedly at different geographic locations. If a double mutant is more fit than at least one of the single mutants, the double mutant should occur sooner or later.

If fitness is additive, then $|B| / |B_p| = 1$, and for uncorrelated fitness one expects the value to be close to 0 by Remark 1.1. Of course the measure is coarse. However, it is valuable to have a simple method for comparing fitness landscapes in different contexts. Whenever fitness is measured, one can determine $|B| / |B_p|$, and for any record one can determine $|\hat{B}| / |\hat{B}_p|$. Notice that one expects the qualitative measure to decrease with increasing block size for the block model, as well as with increasing $K$ for the NK model.

For some background, a measure of additivity which reflects quantitative fitness differences is called "roughness" (Carneiro and Hartl, 2010; Aita et al., 2001). Roughness 0 implies that the landscape is additive. A problem with roughness is a possible size bias, i.e., all else equal, the roughness may be greater for a large number of loci. The qualitative measure of additivity does not have any size bias.
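When fitness has been measured, the ratio $|B| / |B_p|$ from Definition 2.2 can be computed directly. The following is a minimal sketch, assuming one substitution per locus and that every double mutant of two beneficial single mutations has a measured fitness; the function name, the data layout and the toy fitness values are hypothetical.

```python
from itertools import combinations


def qualitative_additivity(wild_type, singles, doubles):
    """Return |B| / |B_p| for measured fitness values.

    wild_type: fitness of the wild-type
    singles:   dict mapping a mutation label to the fitness of that single mutant
    doubles:   dict mapping a frozenset of two labels to the double mutant's fitness
    """
    beneficial = [m for m, w in singles.items() if w > wild_type]
    # B_p: double mutants whose two single mutations are both beneficial.
    b_p = [frozenset(pair) for pair in combinations(beneficial, 2)]
    if not b_p:
        raise ValueError("fewer than two beneficial single mutations")
    # B: members of B_p that are fitter than at least one of their single mutants.
    b = [d for d in b_p if doubles[d] > min(singles[m] for m in d)]
    return len(b) / len(b_p)


# Hypothetical toy values: three beneficial single mutations.
toy_singles = {"a": 1.01, "b": 1.02, "c": 1.03}
additive = {frozenset("ab"): 1.03, frozenset("ac"): 1.04, frozenset("bc"): 1.05}
mixed = {frozenset("ab"): 1.015, frozenset("ac"): 1.0, frozenset("bc"): 1.0}

print(qualitative_additivity(1.0, toy_singles, additive))  # additive toy landscape
print(qualitative_additivity(1.0, toy_singles, mixed))     # only one pair combines well
```

The additive toy landscape gives the value 1, as stated in the text; the second toy landscape shows how pairs that combine poorly pull the ratio down.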

Analyzing epistasis is closely related to analyzing additivity. For a thorough discussion about different measures of epistasis and empirical fitness landscapes, see Szendro et al. (2012). The most fine-scaled approach to epistasis is the geometric theory of gene interactions, which uses triangulations of polytopes (Beerenwinkel et al., 2007b,c; Crona, 2013).

We will consider the $|\hat{B}| / |\hat{B}_p|$ value for the TEM record. The record has 46 single mutants. The substitutions are at position 69 for 3 single mutants, at position 164 for 3 single mutants, at position 244 for 5 single mutants and at position 275 for 2 single mutants. Each remaining single mutant has its mutation at a unique position. It follows that the number of candidates is

$$\binom{46}{2} - \binom{3}{2} - \binom{3}{2} - \binom{5}{2} - \binom{2}{2} = 1018.$$

The record has 35 double mutants in the set $\hat{B}$ (see Table 1 for a list of the double mutants in the set $\hat{B}$). Consequently,

$$\frac{|\hat{B}|}{|\hat{B}_p|} = \frac{35}{1018} \approx 3.4\%.$$

We summarize the results for the TEM record in Observation 1.
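The candidate count can be checked with a few lines of arithmetic: take all pairs of the 46 single mutants and subtract the pairs that share a position (3, 3, 5 and 2 single mutants at positions 69, 164, 244 and 275). The variable names below are illustrative only.

```python
from math import comb

# Number of single mutants sharing a position, as stated for the TEM record.
shared_positions = {69: 3, 164: 3, 244: 5, 275: 2}
n_singles = 46
n_recorded_doubles = 35  # size of the set B-hat

all_pairs = comb(n_singles, 2)
# Two substitutions at the same position cannot form a double mutant.
same_site_pairs = sum(comb(m, 2) for m in shared_positions.values())
n_candidates = all_pairs - same_site_pairs
observed_measure = n_recorded_doubles / n_candidates

print(n_candidates, round(observed_measure, 3))  # prints: 1018 0.034
```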

Observation 1. For the TEM record,
(1) the proportion of beneficial single mutations is approximately 2.7%,
(2) the ratio $|\hat{B}| / |\hat{B}_p| = 35/1018 \approx 3.4\%$. Consequently, 3.4% is an estimate of the qualitative measure of additivity $|B| / |B_p|$.

3. RECORDS OF MUTATIONS, ADDITIVE FITNESS AND UNCORRELATED FITNESS

Consider a record of drug resistance mutations. We first consider conditions ideal for our purposes. Then we discuss the consequences of relaxing some of the conditions.

3.1. The perfect record conditions.

Assume that we have a well defined wild-type and several mutant variants associated with drug resistance. Assume that the records of drug resistance mutations satisfy the following conditions.
(1) The organism adapts to a single environment. [single environment condition]
(2) The record is complete with respect to single and double mutants in the sense that
(a) all single mutants which are more fit than the wild-type occur in the record,
(b) all double mutants which are more fit than both corresponding single mutants occur in the record. [completeness condition]
(3) Every single and double mutant in the record is a result of adaptation. In particular, the single mutants are the result of beneficial mutations. [absence of neutral mutations condition]

Remark 3.1. Assume that a record satisfies the perfect record conditions, as described.
(i) If $|\hat{B}| / |\hat{B}_p| < 1$, then the fitness landscape is not additive.
(ii) Suppose that there are $s_R$ single mutants in the record. If the fitness landscape is uncorrelated, then one expects $|\hat{B}| / |\hat{B}_p|$ to equal $\frac{s_R}{9L}$ under the (simplified) assumption that a genotype of length $L$ has $6L$ mutational neighbors.

The first claim is obvious. For the second claim, fitness being uncorrelated, approximately $\frac{s_R}{6L}$ of double mutants are more fit than the wild-type. If one restricts to the category of double mutants where both corresponding single mutants are more fit than the wild-type, the proportion is $\frac{s_R}{6L}$ as well. Fitness being uncorrelated, one third of the double mutants in this category are expected to be less fit than both single mutants. Indeed, there are three possible fitness ranks, and the double mutant is as likely to have the lowest fitness as any other rank. The resulting proportion is

$$\frac{2}{3} \cdot \frac{s_R}{6L} = \frac{s_R}{9L},$$

which explains the second claim in the remark.

3.2. Relaxing the perfect record assumptions.

The TEM family has adapted to different selective environments, since antibiotics have different effects, so that the single environment condition is not satisfied. First we relax the single environment condition for a record.
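Before the conditions are relaxed, the benchmark of Remark 3.1 for an uncorrelated landscape under the perfect record conditions can be sanity-checked by simulation. The sketch below makes assumptions of its own: fitness values are drawn uniformly from $[0, 1)$, the wild-type sits at the quantile $1 - p$, where $p$ stands for the proportion of beneficial single mutations, and the function name is hypothetical. The rank argument in the remark predicts a value close to $\frac{2}{3} p$.

```python
import random


def uncorrelated_expectation(p, trials=200_000, seed=0):
    """Estimate how often a double mutant beats at least one of its single mutants."""
    rng = random.Random(seed)
    threshold = 1.0 - p  # wild-type fitness: a fraction p of genotypes is fitter
    hits = 0
    for _ in range(trials):
        single_1 = rng.uniform(threshold, 1.0)  # conditioned on being beneficial
        single_2 = rng.uniform(threshold, 1.0)
        double = rng.random()                   # uncorrelated with its neighbors
        if double > min(single_1, single_2):
            hits += 1
    return hits / trials


p = 0.03
print(uncorrelated_expectation(p), "vs. rank argument", 2 * p / 3)
```

The estimate fluctuates with the seed and the number of trials, so it should be read as a rough check of the two-thirds factor rather than as an exact value.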

Multiple environments and additive landscapes.

In contrast to the single environment case, even if the fitness landscape associated with each drug is additive, the $|\hat{B}| / |\hat{B}_p|$-value may be lower than 1. The reason is that if two different single mutants are adapted to different environments, then a combination of the two corresponding mutations may not be fit in any environment. As an illustration, consider the following examples with additive landscapes.

Example 3.2.

Consider two different environments and 50 single resistance mutations, where 25 mutants are adapted to each one of the two environments. Assume that the fitness landscapes associated with both environments are additive. Moreover, assume that two mutations that constitute adaptations to different environments never combine well, so that the corresponding double mutants do not occur in the record. Then

$$\frac{|B|}{|B_p|} = \frac{\binom{25}{2} + \binom{25}{2}}{\binom{50}{2}} = \frac{600}{1225} \approx 0.49.$$

Consider exactly the same situation with 50 single mutants but instead 10 different environments, where 5 single mutants are adapted to each environment. Then

$$\frac{|B|}{|B_p|} = \frac{10 \cdot \binom{5}{2}}{\binom{50}{2}} = \frac{100}{1225} \approx 0.08.$$

We conclude that in the case of multiple environments the $|B| / |B_p|$-value may be low even if each fitness landscape is additive. The case described, where mutations which are beneficial in different environments never combine well, is probably not realistic. However, it is clear that multiple environments may lead to a lower $|B| / |B_p|$-value.

Multiple environments and uncorrelated landscapes.

Consider a situation with multiple environments where the fitness landscape associated with each environment is uncorrelated. For simplicity, we assume that there is not an excessive number of different environments. By assumption, very few single and double mutants should be more fit than the wild-type in any particular single environment. Multiple environments imply more chances for a mutant to be fit in at least one environment. However, fitness being uncorrelated, that effect is exactly the same for single and double mutants.

Double mutants will rarely be more fit than the wild-type in any of the different environments, so that the $|B| / |B_p|$-value will be very low also in the case of multiple environments. In other words, unless the $|B| / |B_p|$-value is very small, we can rule out that all landscapes are uncorrelated fitness landscapes. (Multiple environments may lead to more beneficial mutations. However, there is no difference between single and double mutants in that respect, so that the $|B| / |B_p|$-value should not be influenced.)

Incomplete records.

Missing single mutants in the record will normally have little effect, since $|\hat{B}| / |\hat{B}_p|$ concerns only single mutants in the record and associated double mutants, by definition. However, missing double mutants will make the $|\hat{B}| / |\hat{B}_p|$-value smaller as compared to the result for a more complete record. Consequently, incompleteness may lead to an underestimate of $|B| / |B_p|$.

Neutral mutations.

An abundance of neutral mutations will make the $|\hat{B}| / |\hat{B}_p|$-value difficult to interpret.

Remark 3.3.

An abundance of neutral mutations makes the record difficult to interpret. In the case of multiple environments or incomplete records, Observation 2 (i) holds, but not 2 (ii).

The TEM record represents a case of multiple environments, but probably not an excessive number of completely different environments. We have good reason to believe that the record is fairly complete, and that there is not an abundance of neutral mutations in the record.

For the TEM record, $|\hat{B}| / |\hat{B}_p| = 35/1018 \approx 3.4\%$ from Observation 1, and

$$\frac{s_R}{9L} = \frac{46}{9 \cdot 287} \approx 1.8\%.$$

Observation 2.
(i) Under the perfect record assumptions, the TEM landscape is not compatible with additive or uncorrelated fitness landscapes.
(ii) The TEM record is not compatible with uncorrelated fitness landscapes under realistic assumptions for the TEM record.
(iii) The TEM record, combined with knowledge of the context, suggests that fitness is not additive for the TEM family.

Part (iii) rests on the fact that there does not exist an excessive number of completely different environments for the TEM family. It would be remarkable to have $|\hat{B}| / |\hat{B}_p| \approx 3\%$ if all the fitness landscapes associated with individual drugs were additive.

The TEM record has in total 46 double mutants, where 35 are included in the set $\hat{B}$. We conclude the section with some remarks about the remaining double mutants (see Section 7 for a list of them, and Table 2 of the same section for a list of the double mutants in $\hat{B}$). For the double mutant TEM-164, none of the single substitutions correspond to single mutants in the record. For the other 9 double mutants, exactly one of the substitutions corresponds to a single mutant. The most likely reason for a double mutant in $B_p$ not to be included in $B$ is sign epistasis. Specifically, the single mutation not in the record is selected for only if the other single mutation has already occurred. In such a case, the sign of the effect (positive or negative) of the second mutation depends on background (the effect is negative for the wild-type and positive if the first mutation has occurred). Constraints on the order in which mutations accumulate are known from different contexts (see e.g. Desper et al., 1999; Beerenwinkel et al., 2007a), including HIV drug resistance.

4. RECORDS OF MUTATIONS, BLOCK MODELS AND POSITION GRAPHS

If fitness is neither additive nor uncorrelated, then one may consider more general models. We will discuss the block model with a focus on how single beneficial mutations combine. However, in this context one has to consider the structure of how beneficial mutations combine, not only the proportion of good combinations. For simplicity, we will discuss loci rather than amino acid substitutions. The position graph is intended to display the structure of the combinations.

Deﬁnition 4.1.

For a record of mutations, each node of the position graph corresponds to a locus associated with a single mutant in the record. An edge between two nodes indicates that a double mutant occurs in the record, such that the two mutations correspond to the nodes.

Notice that the position graph reflects the sites but ignores the actual amino acid substitutions (such as whether the substitution is glutamine or lysine, or whether both of them occur at the site). Single mutants with substitutions at the same site may of course differ in how well they combine with other mutations. One may want to look at finer-scale information and distinguish between different substitutions at the same site. However, for simplicity, we ignore this complication.

Figure 1 shows the position graph for the TEM family, except that nodes of degree zero are omitted.

Remark 4.2.

The position graph considers loci, but not the amino acid substitutions. One may want to look at finer-scale information and distinguish between different substitutions at the same site.

[Figure 1: drawing of the position graph; node labels 21, 39, 69, 84, 104, 130, 164, 182, 184, 224, 238, 240, 244, 265, 275, 276]

FIGURE 1. The position graph for the TEM family, where we have omitted the 21 nodes of degree 0.

Recall that the degree of a node is the number of edges to other nodes. The complement $\overline{G}$ of a graph $G$ is a graph on the same nodes, where a pair of nodes is connected by an edge exactly if the pair is not connected by an edge in $G$. For a complete bipartite graph, the nodes can be partitioned into two subsets, such that every pair of nodes from different subsets is connected by an edge, and there are no other edges.

The following observation is elementary by Remark 1.1.

Remark 4.3.

Assume the block model (with at least two blocks). Let $G$ denote the position graph and $\overline{G}$ its complement. Under the perfect record assumptions, the nodes of $G$ have degree one or more. Moreover, by Remark 1.1, the following statements hold modulo a few errors:
(1) For the case of two blocks, $G$ is a complete bipartite graph. It follows that $\overline{G}$ is a disconnected graph with two components, both of which are complete.
(2) In general, for $l$ blocks, $\overline{G}$ is a disconnected graph with $l$ components, all of which are complete.

For the TEM record, the single mutants correspond to substitutions at exactly 37 positions. Exactly 21 nodes out of the 37 have degree zero. The position graph has 37 nodes and 25 edges.

Consider the block model (for at least two blocks). The following observations are potentially problematic for a block model:
(1) there is an abundance of nodes of degree zero (21 out of 37),
(2) the total number of edges is only 25, although there are 37 nodes,
(3) the maximal degree of a node is 8,
(4) $G$ has several triangles, in particular a triangle consisting of the three nodes of highest degree among all nodes of $G$.

Under the perfect record assumptions, the block model implies that the degree of each node is at least one (provided that the nodes are distributed over at least two blocks). For the case of two blocks, the number of edges is between 36 and $18 \times 19 = 342$, where the minimum corresponds to the distribution of 1 and 36 nodes per block, and the maximum corresponds to the distribution of 18 and 19 nodes per block. In the first case, one node should have degree 36. However, the maximal degree of nodes in the position graph (Fig. 1) is 8. It is clear that the position graph has too few edges for similar block lengths, and too low a maximal degree for unequal block lengths.

For more than two blocks, even more edges are expected, leading to similar problems. Clearly the block model is not compatible with the data under the perfect record conditions.

Consider the case of exactly two blocks in a more realistic situation. Then the position graph should have essentially no triangles by Remark 1.1. This is because at least two of the nodes in a triangle are on the same block. (Moreover, for two blocks a reasonable guess would be that the three nodes of highest degree (39, 69, 164) are on the same block [the shorter one]. If so, a triangle consisting of these three nodes is unexpected.)

It remains to consider the single record condition and more than two blocks. In that case, let us analyze the fact that there are relatively few edges. 21 out of 37 nodes have degree zero, and in fact one can find a set of 31 nodes (including the 21 nodes) in the position graph, such that no two nodes in the set are neighbors. That implies that among 31 nodes one cannot find a single pair of nodes on different blocks, such that both of them have high fitness in the same environment.

This is of course possible, especially taking into account that the record may be incomplete. However, from knowledge of the context, it does not seem plausible. The number of completely different environments is limited.
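The edge counts quoted for the block model can be checked with a few lines of Python. This is a minimal sketch, assuming our reading of Remark 1.1 that, under the perfect record assumptions, two loci are joined exactly when they lie on different blocks; the function name `block_model_edges` is illustrative and not from the paper.

```python
from itertools import combinations

def block_model_edges(block_sizes):
    """Number of edges in the position graph predicted by a block model.

    Assumption (our reading of Remark 1.1): under the perfect record
    assumptions two loci are joined exactly if they lie on different
    blocks, so the predicted graph is complete multipartite.
    """
    return sum(a * b for a, b in combinations(block_sizes, 2))

N_LOCI = 37  # loci with a single mutant in the TEM record

# All ways to split the 37 loci over two non-empty blocks (a <= b).
two_block_counts = {
    (a, N_LOCI - a): block_model_edges((a, N_LOCI - a))
    for a in range(1, N_LOCI // 2 + 1)
}

fewest = min(two_block_counts, key=two_block_counts.get)
most = max(two_block_counts, key=two_block_counts.get)
print(fewest, two_block_counts[fewest])  # (1, 36) 36
print(most, two_block_counts[most])      # (18, 19) 342

# The most unbalanced split over three blocks already needs more than 36 edges.
print(block_model_edges((1, 1, 35)))     # 71
```

Both extremes are far from the observed position graph, which has 25 edges and maximal degree 8.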

Observation 3.

From the TEM record and some knowledge of the context, the block model is probably not a good fit for the TEM family.

5. A LABORATORY STUDY OF TEM ALLELES

We will compare the results from the TEM record with a laboratory study. The advantage of the laboratory study is that one can use drug-specific information, which takes care of the difficulties resulting from multiple environments.

The study Goulart et al. (2013) considered the antibiotics Ampicillin (AMP), Ceftazidime (CAZ), Cefpodoxime (CPD), Cefprozil (CRP), Cefotetan (CTT), Cefotaxime (CTX), Cefepime (FEP) and Piperacillin/tazobactam penicillin/inhibitor (TZP). Fitness ranks were determined for the 4 single mutants L21F, R164S, T265M and E240K, as well as the 6 double mutants that can be obtained from them, for each of the 8 antibiotics.

In particular, for the drug Ceftazidime (CAZ), 4 single mutants were more fit than the wild-type, so that one can obtain 6 double mutants from the single mutants. Out of the 6 double mutants for Ceftazidime, 5 double mutants were more fit than at least one of the single mutants, and one double mutant (the combination of R164S and E240K) was more fit than both corresponding single mutants.

For the 9 antibiotics, we list the number of single mutants with higher fitness than the wild-type, followed by the $|B|/|B_p|$-value (as obtained from the fitness ranks listed at the end of the paper):

Ceftazidime: 4, 5/6
Cefotaxime: 4, 2/6
Piperacillin/tazobactam: 2, 1/1
Cefpodoxime: 2, 1/1
Cefotetan: 2, 1/1
Cefprozil: 2, 1/1
Ampicillin: 2, 0/1
Cefepime: 2, 0/1
Amoxicillin+Clavulanate: 2, 0/1

The mean value is 0.57. Obviously the data deviates considerably from additive fitness as well as uncorrelated fitness.

For a comparison, combinations of 5 beneficial mutations from an experimental Escherichia coli population were considered in Khan (2011). Negative epistasis dominated, but sign epistasis was rare. Consider the 10 double mutants combining pairs of the 5 beneficial mutations. Every double mutant had higher fitness than at least one of its corresponding single mutants, so that $|B|/|B_p| = 10/10 = 1$.

As for the block model, with very few exceptions, the double mutants in $B$ should be more fit than both single mutants, or less fit than both corresponding single mutants, by Remark 1.1. For the 9 drugs, one can form in total 19 double mutants. 5 of them are more fit than both corresponding single mutants, 6 of them are more fit than exactly one corresponding single mutant, and 8 of them are less fit than both corresponding single mutants. This observation indicates that the block model does not apply. Notice also that the most plausible block distribution of nodes differs from drug to drug (for some drugs nodes 21 and 265 should be on the same block, and for other drugs not).

Observation 4.

Neither additive fitness, uncorrelated fitness, nor the block model is compatible with data from the laboratory study Goulart et al. (2013).

6. DISCUSSION

We have compared expectations from additive, uncorrelated, and block models of fitness landscapes with empirical data. We argue that the TEM family of β-lactamases is not compatible with the three models. Under the simplified assumptions of a complete record and a single environment, we found that the TEM data was not compatible with any one of the three models. Under more realistic assumptions for the TEM family, the data was not compatible with uncorrelated fitness. Similarly, using the record and some knowledge of the biological context, it seems plausible that neither the additive nor the block model is a good fit. Our conclusions for the three models were confirmed by a laboratory study of TEM alleles. We did not compare the TEM family with the NK model. However, the symmetry aspect (see Remark 1.4) could be problematic.

The literature on additive, uncorrelated, block and NK models is extensive. Some approaches in the field have been motivated by theoretical considerations, such as relating epistasis to the number of peaks of a fitness landscape. The purpose of our study was not to debate the value of the classical models. We appreciate that toy models can be used for generating fruitful hypotheses, which can be tested empirically. As for additive and uncorrelated fitness, the extremes will always be of interest as a theoretical starting point.

However, the classical models have been used for interpretations of empirical data as well. A standard assumption for several topics in evolutionary biology and breeding is additive (or multiplicative) fitness, in particular for studies of fitness inheritance and sexual selection (e.g. Kokko et al., 2003). Statistical methods which are suitable for uncorrelated fitness landscapes have been used in empirical studies (see e.g. Crona et al., 2013b, for a discussion), and the NK model is frequently used in empirical contexts. From this perspective, it is reasonable to discuss to what extent the classical models are realistic.

We have suggested elementary tools, including the qualitative measure of additivity and the position graph, for comparing models and data. Our approach demonstrates that one can determine properties of fitness landscapes from a record of mutations. The ideal setting is a single environment. It would be of interest to compare the qualitative measure of additivity with observed behavior for microbes, such as recombination strategies. The position graph can be used as a test of modularity.

We consider it an advantage that our approach does not depend on any structural assumptions about the underlying fitness landscapes. Any case of adaptation where one has a well-defined wild-type and some direct or indirect method for determining fitness ranks of genotypes works.

Qualitative information has its limitations. The ideal information for determining properties of fitness landscapes is of course direct fitness measurements. For obvious reasons such information is sometimes difficult, if at all possible, to obtain. Moreover, laboratory results do not always agree with clinical findings. Consequently, direct methods for interpretations of nature are of interest as a complement to experimental results.

REFERENCES

Aita, T., Iwakura, M. and Husimi, Y. (2001). A cross-section of the fitness landscape of dihydrofolate reductase. Protein Eng. Sep; 14(9):633–8.

Beerenwinkel, N., Eriksson, N., and Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli; 13: 893-909.

Beerenwinkel N., Pachter, L. and Sturmfels B. (2007). Epistasis and shapes of fitness landscapes. Statistica Sinica

BMC Evolutionary Biology

Proc. Natl. Acad. Sci USA 107 suppl 1: 1747-1751.

Crona, K. Polytopes, graphs and fitness landscapes http://arxiv.org/abs/1212.0465v1.

Crona, K., Greene, D. and Barlow, M. (2013). The Peaks and Geometry of Fitness Landscapes. Journal of Theoretical Biology

Am Nat.

J. Comput. Biol.

Phys RevA

Theor. Pop. Biol. 23:202–215.

Gillespie, J. H. (1984). The molecular clock may be an episodic clock. Proc. Natl. Acad. Sci. USA 81: 8009–8013.

Goulart, C. P., Mentar, M., Crona, K., Jacobs, S. J., Kallmann, M., Hall, B. G., Greene D., Barlow M. (2013). Designing antibiotic cycling strategies by determining and understanding local adaptive landscapes. PLoS ONE

Proceedings of the Royal Society Series B

J Theor Biol.

J Theor Biol.; 141:211.

Khan, A. I., Dinh, D. M., Schneider, D., Lenski, R. E., Cooper, T. F. (2011). Science

J Appl Prob

PLoS Genet.

Proc. Natl. Acad. Sci. USA, 106, 18638–18643.

Macken, C. A. and Perelson A. S. (1989). Protein evolution on rugged landscapes. Proc. Natl. Acad. Sci. USA 86:6191.

Orr, H. A. (2002). The population genetics of adaptation: the adaptation of DNA sequences, Evolution

Evolution.; 60:1113

Park, S. C., Krug J. (2008). Evolution in random fitness landscapes: The infinite sites model. J Stat Mech P04014.


Poelwijk, F. J., Kiviet, D. J., Weinreich, D. M. and Tans, S. J. (2007). Empirical fitness landscapes reveal accessible evolutionary paths. Nature

SIAM Review

J Theor Biol. 243:114120.

Schenk, M. F., Szendro, I. G., Krug, J. and de Visser, J. A. (2012). Quantifying the adaptive potential of an antibiotic resistance enzyme. PLoS Genet. Jun;8(6):e1002783.

Szendro, I. G., Schenk, M. F., Franke, J., Krug, J., de Visser, J. A. G. M. (2012). Quantitative analyses of empirical fitness landscapes. arXiv:1202.4378.

Wright, S. (1931). Evolution in Mendelian populations. Genetics, 16 97–159.

Weinberger, E. E. (1991). Taylor and Fourier representations of fitness landscapes. Biological Cybernetics, 65: 321–330.

Weinreich, D. M., Watson R. A., Chao, L. (2005). Sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174.

Weinreich, D. M., Delaney N. F., Depristo, M. A., and Hartl, D. L. (2006). Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312:111–114.

7. STATISTICS AND TABLES

TABLE 1. Loci with more than one substitution

We use information from the record of the TEM family from the Lahey Clinic as of April 2012 for this study. All mutants have been found clinically. The record is continuously updated with new mutants.

The following loci, in total 37, correspond to single mutations:
21, 28, 34, 39, 68, 69, 84, 92, 104, 115, 120, 124, 130, 145, 155, 157, 158, 163, 164, 176, 182, 184, 189, 204, 213, 218, 224, 230, 238, 240, 244, 265, 271, 275, 276, 280, 283

Exactly 4 loci, out of the 37 listed above, correspond to more than one substitution (see Table 1).

7.1. Double mutants and the position graph.

The total number of double mutants in the record is 45, where 35 combine single mutations from the record (see Table 3). The remaining 10 mutants are as follows: TEM-58, TEM-81, TEM-112, TEM-126, TEM-137, TEM-145, TEM-146, TEM-163, TEM-164, TEM-169.

Information about double mutants in the record is expressed in the position graph (see Fig. 1). Recall that for the position graph a node corresponds to a locus with a single mutation. An edge denotes that there exists at least one double mutant which combines the substitutions at the two loci. The position graph has 37 nodes and 24 edges. The following (21) nodes of the position graph have degree zero:
28, 34, 68, 92, 115, 120, 124, 145, 155, 157, 158, 163, 176, 189, 204, 213, 218, 230, 271, 280, 283

The degrees of the remaining (16) nodes are listed by locus (locus: degree).

21: 3, 39: 6, 69: 7, 84: 1, 104: 3, 130: 1, 164: 8, 182: 3, 184: 1, 238: 4, 224: 1, 240: 3, 244: 2, 265: 3, 275: 1, 276: 1

TABLE 2. Single mutants of the record

1   TEM-2    Q39K
2   TEM-12   R164S
3   TEM-17   E104K
4   TEM-19   G238S
5   TEM-29   R164H
6   TEM-30   R244S
7   TEM-31   R244C
8   TEM-33   M69L
9   TEM-34   M69V
10  TEM-40   M69I
11  TEM-51   R244H
12  TEM-54   R244L
13  TEM-55   G218E
14  TEM-57   G92D
15  TEM-70   R204Q
16  TEM-76   S130G
17  TEM-79   R244G
18  TEM-84   N276D
19  TEM-90   D115G
20  TEM-95   P145A
21  TEM-96   D163G
22  TEM-103  R275L
23  TEM-104  A280V
24  TEM-105  S124N
25  TEM-117  L21F
26  TEM-122  R275Q
27  TEM-127  H158N
28  TEM-128  D157E
29  TEM-135  M182T
30  TEM-141  K34E
31  TEM-143  R164C
32  TEM-148  T189K
33  TEM-150  E28D
34  TEM-156  M155I
35  TEM-166  R120G
36  TEM-168  T265M
37  TEM-170  G283C
38  TEM-171  V84I
39  TEM-174  A213V
40  TEM-176  A224V
41  TEM-181  A184V
42  TEM-183  F230L
43  TEM-186  D176N
44  TEM-191  E240K
45  TEM-192  M68I
46  TEM-198  T271I

TABLE 3. The double mutants of the record which combine single mutations of the record

1   TEM-6:   E104K R164H
2   TEM-7:   Q39K R164S
3   TEM-10:  R164S E240K
4   TEM-11:  Q39K R164H
5   TEM-13:  Q39K T265M
6   TEM-15:  E104K G238S
7   TEM-18:  Q39K E104K
8   TEM-20:  M182T G238S
9   TEM-26:  E104K R164S
10  TEM-28:  R164H E240K
11  TEM-32:  M69I M182T
12  TEM-35:  M69L N276D
13  TEM-36:  M69V N276D
14  TEM-37:  M69I N276D
15  TEM-38:  M69V R275L
16  TEM-44:  Q39K R244S
17  TEM-45:  M69L R275Q
18  TEM-53:  L21F R164S
19  TEM-59:  Q39K S130G
20  TEM-65:  Q39K R244C
21  TEM-71:  G238S E240K
22  TEM-77:  M69L R244S
23  TEM-82:  M69V R275Q
24  TEM-106: E104K M182T
25  TEM-110: L21F T265M
26  TEM-115: L21F R164H
27  TEM-116: V84I A184V
28  TEM-118: R164H T265M
29  TEM-120: L21F G238S
30  TEM-144: R164C E240K
31  TEM-147: R164H A224V
32  TEM-154: M69L R164S
33  TEM-160: Q39K M69V
34  TEM-165: R164S M182T
35  TEM-189: M69L E240K
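The position graph of Definition 4.1 can be rebuilt directly from the double mutants listed in Table 3. The following Python sketch is one way to do so; the names `TABLE_3`, `locus` and `is_triangle` are illustrative and not from the paper, and the entry for TEM-13 is read as Q39K T265M (an assumption, since the table prints "265M").

```python
import re
from collections import Counter
from itertools import combinations

# Substitution pairs of the 35 double mutants in Table 3 (TEM number: pair).
# Assumption: TEM-13 is read as "Q39K T265M".
TABLE_3 = {
    6: "E104K R164H", 7: "Q39K R164S", 10: "R164S E240K", 11: "Q39K R164H",
    13: "Q39K T265M", 15: "E104K G238S", 18: "Q39K E104K", 20: "M182T G238S",
    26: "E104K R164S", 28: "R164H E240K", 32: "M69I M182T", 35: "M69L N276D",
    36: "M69V N276D", 37: "M69I N276D", 38: "M69V R275L", 44: "Q39K R244S",
    45: "M69L R275Q", 53: "L21F R164S", 59: "Q39K S130G", 65: "Q39K R244C",
    71: "G238S E240K", 77: "M69L R244S", 82: "M69V R275Q", 106: "E104K M182T",
    110: "L21F T265M", 115: "L21F R164H", 116: "V84I A184V", 118: "R164H T265M",
    120: "L21F G238S", 144: "R164C E240K", 147: "R164H A224V", 154: "M69L R164S",
    160: "Q39K M69V", 165: "R164S M182T", 189: "M69L E240K",
}

def locus(substitution):
    """Extract the site number from a substitution, e.g. 'E104K' -> 104."""
    return int(re.search(r"\d+", substitution).group())

# Nodes are loci; two loci are joined if some double mutant in the record
# combines substitutions at them. Using a set merges repeated locus pairs.
edges = {frozenset(locus(s) for s in pair.split()) for pair in TABLE_3.values()}
degree = Counter(node for edge in edges for node in edge)

def is_triangle(a, b, c):
    """True if all three pairs among a, b, c are edges of the position graph."""
    return all(frozenset(p) in edges for p in combinations((a, b, c), 2))

print(len(TABLE_3), len(edges), len(degree))  # 35 25 16
print(degree.most_common(3))                  # [(164, 8), (69, 7), (39, 6)]
print(is_triangle(39, 69, 164))               # True
```

Counted this way from the table entries as printed, the 35 double mutants give 25 distinct edges among 16 loci, and the three loci of highest degree (164, 69 and 39) form a triangle, as noted in Section 4.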


Ceftazidime:
w(L21F), w(R164S), w(E240K), w(T265M) > w(TEM-1)
w({R164S, E240K}) > w(R164S), w(E240K)
w({L21F, R164S}) > w(L21F)
w({L21F, T265M}) > w(T265M)
w({R164S, T265M}) > w(T265M)
w({E240K, T265M}) > w(T265M)

Cefotaxime:
w(L21F), w(R164S), w(E240K), w(T265M) > w(TEM-1)
w({R164S, E240K}) > w(R164S), w(E240K)
w({L21F, R164S}) > w(L21F)

Piperacillin/tazobactam penicillin/inhibitor:
w(L21F), w(T265M) > w(TEM-1)
w({L21F, T265M}) > w(L21F), w(T265M)

Cefpodoxime:
w(R164S), w(E240K) > w(TEM-1)
w({R164S, E240K}) > w(E240K), w(R164S)

Cefotetan:
w(R164S), w(E240K) > w(TEM-1)
w({R164S, E240K}) > w(R164S), w(E240K)

Cefprozil:
w(L21F), w(T265M) > w(TEM-1)
w({L21F, T265M}) > w(L21F)

Ampicillin:
w(L21F), w(T265M) > w(TEM-1)
No double mutants to list.

Cefepime:
w(L21F), w(R164S) > w(TEM-1)
No double mutants to list.

Amoxicillin+Clavulanate:
w(L21F), w(T265M) > w(TEM-1)
No double mutants to list.
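The summary figures quoted in Section 5 (a mean $|B|/|B_p|$ of 0.57, and 5, 6 and 8 double mutants that beat both, exactly one, or neither of their single mutants) can be recomputed from the listing above. The Python sketch below rests on three assumptions that are ours rather than stated here: $B_p$ is taken to be all double mutants formed from single mutants that beat TEM-1, $B$ is taken to be those that beat at least one of their single mutants, and a double mutant that is not listed is taken to beat neither. The incomplete Cefotetan and Piperacillin/tazobactam entries are read as E240K and T265M. The name `RANKS` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Per drug: single mutants fitter than TEM-1, and for each listed double
# mutant the single mutants it beats. Assumption: unlisted double mutants
# beat neither of their single mutants.
RANKS = {
    "Ceftazidime": (("L21F", "R164S", "E240K", "T265M"), {
        ("R164S", "E240K"): ("R164S", "E240K"),
        ("L21F", "R164S"): ("L21F",),
        ("L21F", "T265M"): ("T265M",),
        ("R164S", "T265M"): ("T265M",),
        ("E240K", "T265M"): ("T265M",)}),
    "Cefotaxime": (("L21F", "R164S", "E240K", "T265M"), {
        ("R164S", "E240K"): ("R164S", "E240K"),
        ("L21F", "R164S"): ("L21F",)}),
    "Piperacillin/tazobactam": (("L21F", "T265M"), {
        ("L21F", "T265M"): ("L21F", "T265M")}),
    "Cefpodoxime": (("R164S", "E240K"), {
        ("R164S", "E240K"): ("R164S", "E240K")}),
    "Cefotetan": (("R164S", "E240K"), {
        ("R164S", "E240K"): ("R164S", "E240K")}),
    "Cefprozil": (("L21F", "T265M"), {("L21F", "T265M"): ("L21F",)}),
    "Ampicillin": (("L21F", "T265M"), {}),
    "Cefepime": (("L21F", "R164S"), {}),
    "Amoxicillin+Clavulanate": (("L21F", "T265M"), {}),
}

ratios = {}
beats = {0: 0, 1: 0, 2: 0}  # double mutants beating 0, 1 or 2 of their singles
for drug, (singles, doubles) in RANKS.items():
    beaten = {frozenset(pair): len(won) for pair, won in doubles.items()}
    possible = [frozenset(p) for p in combinations(singles, 2)]  # B_p
    for pair in possible:
        beats[beaten.get(pair, 0)] += 1
    in_b = sum(1 for pair in possible if beaten.get(pair, 0) >= 1)  # |B|
    ratios[drug] = Fraction(in_b, len(possible))

mean_ratio = sum(ratios.values()) / len(ratios)
print(round(float(mean_ratio), 2))  # 0.57
print(beats)                        # {0: 8, 1: 6, 2: 5}
```

Under these assumptions the 9 drugs give 19 double mutants in total, and the mean ratio is 31/54, which rounds to the value 0.57 reported in Section 5.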
