Modelling SARS-CoV-2 coevolution with genetic algorithms
Aymeric Vié
Mathematical Institute, University of Oxford
Institute of New Economic Thinking, University of Oxford
February 25, 2021
Abstract
At the end of 2020, policy responses to the SARS-CoV-2 outbreak were shaken by the emergence of virus variants, impacting public health and policy measures worldwide. The emergence of these strains, suspected to be more contagious, more severe, or even resistant to antibodies and vaccines, seems to have taken health services and policymakers by surprise, leaving them struggling to adapt to the constraints of the new variants. Anticipating the emergence of these mutations to plan ahead adequate policies, and understanding how human behaviors may affect the evolution of viruses by coevolution, are key challenges. In this article, we propose coevolution with genetic algorithms (GAs) as a credible approach to model this relationship, highlighting its implications, potential and challenges. Because of their ability to explore large spaces of possible solutions, their capacity to generate novelty, and their natural genetic focus, GAs are relevant for this issue. We present a dual GA model in which both viruses aiming for survival and policy measures aiming at minimising infection rates in the population competitively evolve. This artificial coevolution system may offer us a laboratory to "debug" our current policy measures, identify the weaknesses of our current strategies, and anticipate the evolution of the virus to plan ahead relevant policies. It also constitutes a decisive opportunity to develop new genetic algorithms capable of simulating much more complex objects. We highlight some structural innovations for GAs in this virus evolution context that may carry promising developments in evolutionary computation, artificial life and AI.
As early as June 2020, the initial SARS-CoV-2 strain identified in China was replaced as the dominant variant by the D614G mutation (Figure 1). Having appeared in January 2020, this strain differed by a substitution in the gene encoding the spike protein. The D614G substitution has been found to increase infectivity and transmission (WHO, 2020a; Korber et al., 2020).
On November 5 2020, a new strain of SARS-CoV-2 was reported in Denmark (WHO, 2020b), linked with the mink industry. The "unique" mutations identified in one cluster, "Cluster 5", seemingly as contagious or severe as others, have been found to moderately decrease sensitivity to neutralising antibodies. Culling of farmed minks, an increase in genome sequencing activities and numerous border closures to Denmark residents followed.
On 14 December 2020, the United Kingdom reported a new variant, VOC 202012/01, with a remarkable number of 23 mutations and an unclear origin (Kupferschmidt, 2020). Early analyses have found that the variant has increased transmissibility, though no change in disease severity was identified (WHO, 2020a). One of these 23 mutations, the deletion 69/70del, was found to affect the performance of some PCR tests, currently at the center of national testing strategies. Quickly becoming dominant, this variant was held responsible for a significant increase in mortality, ICU occupation and infections across the country (Iacobucci, 2021; Wallace and Ackland, 2021).
On 18 December, the variant 501Y.V2 was detected in South Africa, after rapidly displacing other virus lineages in the region. Preliminary studies showed that this variant was associated with a higher viral load, which may cause increased transmissibility (WHO, 2020a).
Recent findings have shown that this variant significantly reduced the efficacy of vaccines (Mahase, 2021).
Figure 1: Shift over time from orange (the original D type of the virus) to blue (the now-widespread G form, D614G); (Los Alamos National Laboratory, 2020)
RNA viruses have high mutation rates (Duffy, 2018). Although many mutations are not beneficial for the organisms, and some are inconsequential, a small fraction of them are beneficial. We refer the reader to Duffy (2018) and Domingo et al. (1996) for a discussion of RNA virus mutation rates. A notable consequence of these high mutation rates is higher evolvability, i.e. a higher capacity to adapt to changing environments. This allows RNA viruses to emerge in new hosts, escape vaccine-induced immunity, or circumvent disease resistance. However, RNA viruses seem to be just below the threshold for critical error: if the majority of mutations are deleterious, higher mutation rates may cause ecological collapse in the virus population. As an RNA virus (Lima, 2020), SARS-CoV-2 shares these characteristics, and mutates very frequently (Phan, 2020; Benvenuto et al., 2020; Matyášek and Kovařík, 2020). Especially relevant for this class of virus, the priorities of many researchers, including the WHO Virus Evolution Working Group, have been to strengthen ways to identify relevant mutations, study their characteristics and impacts, and outline mitigation strategies to respond to these mutations (WHO, 2020a).
Anticipating the emergence of these mutations to plan ahead adequate policies, and understanding how human behaviors may affect the evolution of viruses by coevolution, are key challenges. Human adaptation of policies and behaviors can impact the reproduction of SARS-CoV-2, and target specific characteristics such as airborne transmission. The impact of human policies and behaviors on outbreak trajectory, and the evaluation of non-pharmaceutical measures, have been the object of numerous analyses.
However, most of these analyses do not include the possibility for viruses to mutate, with novel effects and increased transmission rates. The space of possible virus strains is huge and to some extent quasi open-ended, challenging attempts to model this arms race.
In this article, we propose coevolution with genetic algorithms (GAs) as a credible approach to model this relationship, highlighting its implications, potential and challenges. We provide a proof-of-concept implementation of this coevolution dual GA. Because of their ability to explore large spaces of possible solutions, their capacity to generate novelty, and their natural genetic focus, GAs are relevant for this issue. We present a dual GA model in which both viruses aiming for survival and policy measures aiming at minimising infection rates in the population competitively evolve. Under coevolution, virus adaptation towards more infectious variants appears considerably faster than when the virus evolves against a static policy. More contagious strains become dominant in the virus population under coevolution. The coevolution regime can generate multiple outbreak waves as the more infectious variants become more dominant in the virus population. Seeing more infectious virus variants becoming dominant may signify that our policy measures are effective.
This artificial coevolution system may offer us a laboratory to "debug" our current policy measures, identify the weaknesses of our current strategies, and anticipate the evolution of the virus to plan ahead relevant policies. It highlights how human behaviors can shape the evolution of the virus, and how reciprocally the evolution of the virus shapes the adaptation of public policy measures.
To overcome the simplifications of the implementation in this article, several key innovations for evolutionary algorithms may be required, in particular bringing more advanced biological and genetic concepts into current evolutionary algorithms.
We first present in Section 2 the concept of coevolution, both generally in complex systems, and specifically in our study of the evolution of viruses and policies. We propose genetic algorithms as a modelling tool for this context. Genetic algorithms are briefly introduced in Section 3. We present our perspective on using genetic algorithms to generate an artificial coevolution of SARS-CoV-2, and present its main concepts and design in Section 4. Then, we propose an example of implementation of a dual genetic algorithm to model this coevolution process in Section 5, describing the model, the operators, the parameters, and some key results. We develop further the implications and perspectives of this work in Section 6. Section 7 presents data and code availability, and Section 8 concludes.
Co-evolution opens a promising and new way to model such ecosystems. Investors in the stock market evolve financial strategies to obtain higher profit, and this evolution can be captured by a GA model. But they are evolving in an environment that notably includes financial regulations set by policy makers. These regulations are themselves evolving, as policy makers strive to identify the best policy to stabilise the market and avoid large crashes: the evolution of regulation and financial strategies is a co-evolution of two species.
Policy makers attempt to close new loopholes exploited by investors that pose a threat to the real economy; investors adapt to the new regulations by seeking other ways to extract profit, finding new niches that trigger new adaptations of regulations. By capturing this interplay, a GA approach could act as a debugging tool for financial regulations, a stress-test program that invents novel ways to challenge our organisations.
Most sports competitions see such interplay between rules and strategies. The 2008 Olympic Games saw controversy over new swimming suits with novel materials that allowed unprecedented speed and records, leading to their ban and causing a change in the innovation strategies of manufacturers. This new direction may some day spark a similar story, calling for new regulation and sparking a different evolution trajectory. Formula 1 constructors actively seek grey areas in the regulation, hoping for marginal performance gains. One team creatively bypassed the action of a regulatory sensor to increase its engine power, pushing the regulators to add a second sensor and regulate the use of engine modes, impacting all teams' performance. Another racing team exploited unclear rules on purchases and copying of other cars' parts, leading to a change in the regulations that impacts the evolution of other teams' development programs, and that may as well create further unclear rules to be abused in the future. Another instance of coevolution in complex systems, of high public interest, is the co-evolution of viruses and population behaviors or policy measures.
2.2 The coevolution of SARS-CoV-2 and policy measures
Figure 2: Illustration of the mutation leading to the variant D614G; (Los Alamos National Laboratory, 2020)
The emergence of virus mutations is a complex topic, both in the mechanisms involved at the virus genome level, and in what causes some particular mutations to appear, or to be rewarded: that is, the fitness (dis)advantage of the new trait encoded by a mutation, in its environment. We can see the struggle between SARS-CoV-2 mutations, illustrated in Figure 2, and human behaviors and policy measures, as an arms race, a coevolution. Humans adopt new restrictions, wear face coverings, adopt social distancing measures, and develop testing methods, to reduce the fatalities and infections due to the virus. Facing this pressure, the virus' mutations unconsciously change its genome, and some of these changes improve its chances of survival. As some mutations allow the virus to acquire new, beneficial traits, possibly higher transmissibility (Priya and Shanker, 2021), resistance to antibodies (Callaway, 2020) or anomalies in PCR tests (WHO, 2020a), human behaviors may adapt, continuing the arms race. This evolutionary change in traits of individuals in one population, in response to a change of trait in a second population, followed by a reciprocal response, is a phenomenon known as coevolution (Janzen, 1980). Viruses are walking on the fitness landscape (Wright, 1931), a physical representation of the relationship between traits and fitness, and humans change this fitness landscape by their behavior. If, for example, all humans were hypothetically wearing perfectly hermetic face coverings, airborne transmission would fail, causing the virus either to go extinct, or to find other means of transmission.
The continuous interplay between individual genomes or characteristics, and their environment, is an endless source of novelty and niches for adaptation. Individuals are influenced by their environment, and the environment itself is influenced by individuals.
This dynamic is difficult to model, especially in our context of virus and policy coevolution. The space of possible actions or policy measures is very large. Humans can adopt a large diversity of measures, with many levels of stringency or public support. Likewise, the large size of the space of possible genomes for viruses, and the diversity of phenotypes, i.e. observable characteristics, that they can exhibit, challenge our modelling attempts. Coevolution can give birth to novel traits that did not exist before, in a quasi open-ended process. Random or enumerative search methods struggle to evaluate such a large number of possible combinations. We propose here an alternative framework to simulate this coevolution phenomenon in spite of the complexity of the task. Modelling coevolutionary dynamics has seen a large variety of approaches: stochastic processes and mathematical modelling (Dieckmann and Law, 1996; Hui et al., 2018), network science (Guimaraes et al., 2017), dynamical systems (Caldarelli et al., 1998), and more biological or genetic methods (Gilman et al., 2012). Evolutionary algorithms (EAs, used for coevolution by Rosin and Belew, 1997), in particular genetic algorithms (GAs), offer one promising approach to this end. Let us first introduce them briefly, before outlining the properties that make them relevant for this task.
A genetic algorithm (GA) is a member of the family of evolutionary algorithms (EAs), which are computational search methods inspired by natural selection (Holland, 1992). They simulate Darwinian evolution on individual entities, gathered in a population. Genetic algorithms represent these entities with a genome, i.e. a collection of genes, often represented as a bit string, that determines the entity's phenotype, i.e. its observable characteristics. The entities undergo selection based on fitness, reproduction of the fittest entities, and mutations of the genome that affect their traits (Mirjalili, 2019). As this simplified evolution process is iterated, the characteristics of the entities may change, improving the fitness of the population.
As population-based search methods, GAs are efficient in the exploration of search spaces, i.e. spaces of possible solutions, that can be very large (Axelrod, 1987) or rugged (Wiransky, 2020), that is, spaces that admit several extrema or have a very irregular structure. They quickly identify regions of the search space that are associated with higher fitness, showing satisfying optimisation capacities (Bhandari et al., 1996). They can also be used to model evolutionary systems, from economies and financial strategies to biological ecologies. Vie (2020a) reviews in more detail their qualities and perspectives as a search method and a modelling tool.
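To make the selection, reproduction and mutation loop concrete, the following is a minimal sketch of a generic GA on bit strings. It is not the model of this article: the toy fitness (the number of ones in the genome), the function names and all parameter values are assumptions chosen purely for illustration.

```python
import random


def one_max(genome):
    """Toy fitness (an assumption for illustration): number of ones in the genome."""
    return sum(genome)


def evolve(pop_size=20, genome_len=16, generations=30,
           crossover_rate=0.7, mutation_rate=0.02, seed=0):
    """Run a basic generational GA and return the final population and
    the best fitness observed at each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [one_max(g) for g in population]
        history.append(max(scores))
        # Fitness-proportional selection; +1 keeps every weight strictly positive.
        weights = [s + 1 for s in scores]
        next_population = []
        while len(next_population) < pop_size:
            mum, dad = rng.choices(population, weights=weights, k=2)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, genome_len - 1)  # one-point crossover
                child = mum[:cut] + dad[cut:]
            else:
                child = list(mum)
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_population.append(child)
        population = next_population
    return population, history
```

Calling `evolve()` returns the last generation together with the per-generation best fitness, which can be inspected to see how the population drifts towards fitter genomes.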
An artificial coevolution of SARS-CoV-2
Provided we can formulate an adequate representation of i) the virus genome and ii) policy measures, and under the assumption that the mappings a) between the virus genome and the virus phenotype and b) between the policy actions and the virus phenotype fitness can be modelled in a satisfying way, we can represent their coevolution as a dual genetic algorithm with two populations: a population of viruses, and a population of policy measures. Both interact indirectly through a third population: the general human population. Viruses survive by infecting new humans in that population, and policy measures modify, to some extent, the behavior of the human population, as Figure 3 illustrates.
Figure 3: The coevolution model with two genetic algorithms
Why GAs? Genetic algorithms are relevant tools to model this coevolution relationship for several reasons. First, evolutionary algorithms appear relevant to model natural selection contexts, as this is precisely their main focus (Holland, 1992), though a significant fraction of the literature has used this method for optimisation. Second, among evolutionary algorithms, the inherently genetic approach of GAs gives them an adequate baseline to encode more complex genomes and phenotypes. The computational architecture of GAs, centered on a genetic representation subject to evolution operators, appears to be the closest to the biological objects we are interested in modelling here. Third, genetic algorithms are particularly powerful in exploring new regions of large search spaces (Whitley, 1994) that may have a non-trivial structure (Wiransky, 2020). In our coevolution context, we are interested in seeing what new features may emerge from both viruses and policy responses. GAs, which can generate this novelty, thus constitute a relevant option.
Fourth, coevolution has already been modelled using GAs for optimisation (Potter and De Jong, 1994; Vie, 2020b), giving solid foundations for further work in the area, and existing tools to understand the complex dynamics of the artificial SARS-CoV-2 coevolution.
How could this artificial coevolution be implemented? Starting from initial conditions constituted by i) a population distribution of SARS-CoV-2 variants with identified genome sequences and traits and ii) a distribution of the current policy measures, we can simulate the evolution of viruses and policy actions in response to one another.
To define fitness in this world, one could assume that viruses simply aim at surviving, and do not have an objective function defining some metric to maximise; the performance of policy measures could be evaluated by the number of deaths or infections, to be minimised.
The source of novelty in this coevolution system would essentially be mutations for viruses, and both mutations and recombination for policy measures. While viruses infect new hosts, and do not reproduce between themselves, it is reasonable to consider that national policy makers are exchanging information, taking note of what happened in other countries, and changing their own actions in response to positive effects.
From this starting condition, and under these evolution criteria and mechanisms, a large number of runs of the system could be simulated. By observing the behavior of the artificial viruses and policies, and the outbreak dynamics in the artificial human population, some insights could emerge.
We could discover some regularities, such as whether and when viruses evolve towards greater transmissibility, but also observe the changes in the genome, providing useful indications on where to look experimentally during physical genome sequencing.
This artificial coevolution system may offer us a laboratory to "debug" our current policy measures, identify the weaknesses of our current strategies, and anticipate the evolution of the virus. If a significant portion of the simulations produced viruses that find a way to evade detection by PCR tests, or to evolve a resistance to our current vaccines, policy makers could be advised in advance of this possibility, and work ahead to prevent this issue from happening.
At times when policy makers face significant uncertainty on the impact of their measures, a difficulty exacerbated by the rather long incubation time of SARS-CoV-2 (Lei et al., 2020), this artificial coevolution system can provide them with a complementary way to assess the impact of prospective policy measures, with an emphasis on the evolution of the virus. In other words, such simulation possibilities may give the policy maker not only an estimate of the impact of the measures on infection rates and death rates, but also the possibility to consider the consequences of such measures for the possible future traits of the virus.
An example of implementation
In this section, we present an implementation example of a coevolution model with dual genetic algorithms. We highlight the building blocks of the model, the parameter configuration, and the key results.
Genetic representation
Individual viruses' genomes in the model are represented as a binary string whose length is the virus size. Viruses are initialised with a genome composed exclusively of zeros: this assumes that at the start, viruses are an original form of the disease with no mutations. Each element of this genome represents activation (if equal to 1) or non-activation (if equal to 0) of a specific mutated gene. Each mutated gene has an effect on the virus reproduction rate. These effects are drawn uniformly in the interval [-1,1]. This means that some mutations will be detrimental to the virus reproduction, others will have very small or null effects, and some will favor reproduction. We thereby simplify the process and effects of mutations, collapsing all these dimensions onto the virus reproduction rate. The number of viruses in the population at the start is set by the parameter initial virus size.
Individual policies are represented as a binary string as well, initialised with only zeros. This illustrates a starting point in which government policies start with no measure at all. Each element of the policy genome is a measure that is activated when the corresponding genome location takes the value 1. Again, we restrict our attention to the virus reproduction rate, and ignore all other dimensions. Each measure will have an effect on the virus reproduction rate, illustrating the efficiency of different measures in preventing the spread of the disease. The effects of these measures are calibrated from the values obtained by Haug et al. (2020) in their influential analysis of the impact of non-pharmaceutical interventions. Our model captures the uncertainty on the effects of these policies by drawing each effect uniformly from the 95% confidence interval identified by Haug et al. (2020), illustrated in Figure 4. This draw is done once at the beginning of the run. The number of policies considered is set by the policy population size parameter, and remains constant during the run. Policies can include up to 46 measures, corresponding to the measures studied by the above reference.
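The representation described above can be sketched as follows. This is an illustrative sketch rather than the article's source code: the function names are hypothetical, and the confidence intervals in the usage note are placeholder numbers, not the values reported by Haug et al. (2020).

```python
import random


def draw_mutation_effects(virus_size, rng):
    """One effect per virus gene, drawn uniformly in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(virus_size)]


def draw_policy_effects(confidence_intervals, rng):
    """One effect per measure, drawn once per run, uniformly within its
    (low, high) confidence interval."""
    return [rng.uniform(low, high) for low, high in confidence_intervals]


def initial_population(size, genome_len):
    """All-zero genomes: no mutation for viruses, no active measure for policies."""
    return [[0] * genome_len for _ in range(size)]
```

For instance, with the Table 1 sizes, `initial_population(10, 10)` builds the starting viruses, and `draw_policy_effects([(0.05, 0.30), (0.00, 0.20)], random.Random(0))` draws effects for two hypothetical measures with placeholder intervals.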
Infection process
We adopt in this illustration a very simplified model of infection. Each individual virus in the population is characterised by a reproduction rate that incorporates two elements: first, a "base" reproduction rate, corresponding to the reproduction rate of the original SARS-CoV-2; second, the sum of the effects of the mutations activated in this particular individual virus' genome, which is added to the base rate. In the infection step, each virus will infect as many hosts as its effective reproduction rate. This effective reproduction rate is equal to the virus reproduction rate, minus the average reduction in reproduction rate in the policy population.
Figure 4: Effects of Covid 19 government interventions (From Haug et al., 2020). With permission from Nature Human Behavior - Reproduction License 4994130245697 (Jan. 22 2021)
For each new infection, random mutations will happen with a given probability: the virus mutation rate. Each element of the virus genome can mutate independently. Higher mutation rates will lead the virus to mutate more frequently during infections. The mutation operator flips the given element of the genome: a 0 becomes a 1, and conversely. As a result, and as the pandemic grows or diminishes, the size of the population of viruses handled by the genetic algorithm will vary, and some diversity may appear within this population.
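A minimal sketch of this infection step is given below. Two points are assumptions not stated in the text: since reproduction rates are not integers, the number of new infections is obtained by stochastic rounding of the effective rate, and each generation of viruses is replaced by the new infections it causes. Function names are hypothetical.

```python
import math
import random


def reproduction_rate(genome, effects, base_rate):
    """Base rate plus the effects of the mutations activated in the genome."""
    return base_rate + sum(e for g, e in zip(genome, effects) if g == 1)


def infection_step(viruses, effects, base_rate, mean_policy_reduction,
                   mutation_rate, rng):
    """Return the next virus population after one infection period."""
    next_viruses = []
    for genome in viruses:
        rate = reproduction_rate(genome, effects, base_rate)
        r_eff = max(0.0, rate - mean_policy_reduction)
        # Assumption: stochastic rounding, e.g. 2.3 gives 2 hosts, or 3 with
        # probability 0.3.
        n_new = math.floor(r_eff)
        if rng.random() < r_eff - n_new:
            n_new += 1
        for _ in range(n_new):
            # Each gene flips independently with the virus mutation rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in genome]
            next_viruses.append(child)
    return next_viruses
```

With an effective rate below 1 the population shrinks on average, and with a rate of 0 it goes extinct in a single step, which matches the extinction scenario discussed later in this section.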
Fitness
In this model, we reduce the decision makers' problem to a minimisation of the reproduction rate of the virus, which essentially encompasses the objective of reducing deaths. Each individual policy is characterised by a total reduction in the reproduction rate, equal to the weighted sum of the effects of its activated measures. The fitness, or value, of each individual policy is evaluated against the weighted effective reproduction rate of three viruses chosen at random in the virus population, in a tournament selection process. The policy's reduction in the reproduction rate is applied, and the net, effective reproduction rate is recorded. Policies that obtain lower effective reproduction rates will be more likely to be selected in the creation of the next generation of policies.
Viruses do not mutate with an objective. Hence, we have not included a fitness function for the evolution of viruses. Mutations remain unguided by any objective. The changes in the population of viruses will be driven by the differential reproduction rates of the various strains, as described below.
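The policy evaluation can be sketched as below. The text does not specify the weights, so this sketch assumes a plain sum of the activated measures' effects and a plain average over the three sampled viruses; sampling is done with replacement so that the function also works when fewer than three viruses remain. Function names are hypothetical.

```python
import random


def policy_reduction(policy, measure_effects):
    """Total reduction in reproduction rate from the activated measures."""
    return sum(e for p, e in zip(policy, measure_effects) if p == 1)


def policy_effective_rate(policy, measure_effects, virus_rates, rng, sample_size=3):
    """Effective reproduction rate a policy obtains against randomly sampled
    viruses; lower values indicate a better policy."""
    sampled = rng.choices(virus_rates, k=sample_size)  # with replacement
    mean_rate = sum(sampled) / sample_size
    return max(0.0, mean_rate - policy_reduction(policy, measure_effects))
```

Here `virus_rates` is assumed to hold the reproduction rate of each virus currently in the population, and the returned value is the quantity that the selection step of the policy GA favours when it is low.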
Policy learning
After the fitness of the policies has been determined, policies will be selected to form the basis of the next generation of policies using "roulette wheel" cumulative fitness selection. Each policy's selection probability will be equal to the ratio of its adjusted fitness (decreasing in r, e.g. 1/r, where r is the effective reproduction rate obtained by the policy) to the sum of adjusted fitness scores. Selected policies then undergo a crossover step, which models a process of communication between successful policies: decision makers observe their peers in other countries, the measures they implement and the associated results. Measures that appear efficient abroad tend to be implemented nationally by means of this imitation step. The crossover step occurs with probability equal to the policy crossover rate. After selecting two policies, a crossover point is drawn uniformly at random, and the two policies' genomes are interchanged after this crossover point. The result of this procedure is two children policies for the next generation. Otherwise, with probability 1 - policy crossover rate, the crossover operator is not activated and the children policies are exact copies of their parents.
Learning to improve policies also includes a mutation step, modelling small perturbations or explorations. This illustrates for instance a country implementing quarantine restrictions for various reasons. With a policy mutation probability, any element can mutate from value 0 to value 1. We outline here one important limitation: we do not allow policies in our model to revert back after some measures have been implemented; we essentially forbid detrimental mutations. Extending our space of possible measures to measures that do not work could be an interesting direction as well. We also do not consider other factors such as economic output or political situation that could act as a pressure towards relaxation of measures. Again, these constraints would be an interesting addition to this model, but we have chosen to present a simple illustration of coevolution.
Evolution run and parameters
The simulation runs for Tmax periods. We run our simulations for a base reproduction rate of 2.63 (Mahase, 2020). Note however that simply changing the value of the base reproduction rate, or including uncertainty on its determination, is easily achievable in the source code (see below for availability). Higher base rates will likely make the infection spike faster and higher, while lower base rates may lead to the virus extinction in some cases, or to reductions of the outbreak peaks. In the model, we consider the time periods to be indexed as weeks, assuming that each virus is transmitted every seven days.
Parameter                        Value
Virus initial population size    10
Virus size                       10
Policy population size           100
Base reproduction rate           2.63
Tmax                             20
Policy crossover rate            0.5
Policy mutation rate             0.05
Virus mutation rate              0.0001
Table 1: Parameter configuration for the dual genetic algorithm
We define coevolution as a run in which both the viruses and the policies can evolve: that is, their mutation rates and the policy crossover rate are strictly positive. When the virus mutation rate is null, but the policy mutation rate and policy crossover rate are positive, we model a situation in which only the policy is evolving, against a static virus. When the virus mutation rate is positive, and the policy mutation rate and crossover rate are null, we illustrate a situation in which the virus evolves, and policies remain indifferent and void. All other parameters remain unchanged.
Before turning to the simulation results, we make a note on the impact of the parameters on the results, and on the outbreak dynamics that are generated. A major challenge in this example implementation was to avoid excessively large epidemics: as each virus is simulated individually, handling hundreds of millions of viruses can incur a significant computational cost. The development of the simulation allowed us to simulate in reasonable time (seconds) up to ten billion individual viruses.
Higher virus mutation rates, higher initial virus sizes, or less effective policies can lead to exponential growth of the virus population size. Alternatively, if policies are very efficient (high mutation rates and crossover rates), and if the virus does not mutate frequently enough, the model may manage to make the virus go extinct. We must acknowledge that simulation results can be sensitive to small variations of the parameters. The configuration shown in Table 1 keeps computation tractable for the 20 time periods considered. Outside extreme situations (complete virus takeover or virus extinction), the main insights presented below hold.
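Using the policy crossover and mutation rates of Table 1, the policy learning step described earlier (roulette wheel selection, one-point crossover, one-way mutation) can be sketched as follows. This is an illustration under stated assumptions, not the article's code: the adjusted fitness is taken to be 1/r, a small constant guards against division by zero, and the function name is hypothetical.

```python
import random


def next_policy_generation(policies, effective_rates, crossover_rate,
                           mutation_rate, rng):
    """Build the next generation of policies from the current one."""
    eps = 1e-9  # assumption: avoids division by zero when a rate is exactly 0
    # Assumed adjusted fitness 1/r: lower effective rates get higher weights.
    weights = [1.0 / (r + eps) for r in effective_rates]
    genome_len = len(policies[0])
    children = []
    while len(children) < len(policies):
        # Roulette wheel selection of two parents.
        first, second = rng.choices(policies, weights=weights, k=2)
        first, second = list(first), list(second)
        if genome_len > 1 and rng.random() < crossover_rate:
            cut = rng.randint(1, genome_len - 1)  # one-point crossover
            first, second = first[:cut] + second[cut:], second[:cut] + first[cut:]
        for child in (first, second):
            # One-way mutation: a measure can switch on (0 to 1) but never off.
            children.append([1 if (g == 0 and rng.random() < mutation_rate) else g
                             for g in child])
    return children[:len(policies)]
```

Setting `crossover_rate=0.5` and `mutation_rate=0.05` reproduces the coevolution configuration of Table 1, while setting both to 0 gives the static policy regime in which policies stay at their all-zero starting point.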
We now run the evolution of viruses and policies in the three situations above, to identify specific features of the coevolution regime. Figure 5 presents the main results. Their observation allows us to formulate a few "stylized facts" of the coevolution of viruses and policies.
[Figure 5 panels: (a) Average reproduction rate of the population of viruses over time; (b) Average impact in reproduction rate of policies over time; (c) Number of different virus strains over time; (d) Frequency of extreme variant genes over time]
Figure 5: Key results from the coevolution dual genetic algorithm
1. Under coevolution, virus adaptation towards more infectious variants is considerably faster than when the virus evolves against static policy. In Figure 5a, we can observe that the average reproduction rate in the virus population rises to 3.1 after 20 time periods under coevolution (red curve). When the virus does not evolve (blue), the average reproduction rate naturally stays at the initial value of 2.63. Interestingly, when the virus can evolve but the policy does not (green curve), the average reproduction rate tends to increase slightly, but much less than under the coevolution regime. Having the virus face a more severe struggle for its survival makes its evolution more efficient.
2. More contagious strains become dominant in the virus population under coevolution. Figure 5d shows the frequency of viruses in the virus population containing the mutation gene granting the highest increase in reproduction rate. This fraction rises to 0.35 in the coevolution case, while this share is considerably lower under virus-only evolution. This point supports the idea that coevolution makes virus adaptation much more efficient. The number of different variants in the population, shown in Figure 5c, adds to this picture. In the virus-only evolution, up to 800 variants appear during the 20 time periods. This is due to the outbreak dynamic: in the virus-only evolution, policies do not do anything and do not change, hence the virus is free to spread everywhere. As its population size grows, more mutations happen, and more variants emerge. Under coevolution, only up to 200 variants emerge, but the frequency of the strongest mutations shows that virus evolution is made much more efficient by the challenge posed by learning policies.
Figure 6: Average effective reproduction rate over time
3. The coevolution regime can generate multiple outbreak waves as the more infectious variants become more dominant in the virus population. In European countries, a so-called third wave currently seems to have coincided with VOC 202012/01 (the "UK variant") becoming dominant; this pattern occurred as well during our evolution run. Figure 5b shows that policies evolve to be more efficient over time, leading the average effective reproduction rate of the virus to go below 1, on a path to extinction. Under the coevolution regime, the more efficient adaptation of the virus instead allows the effective reproduction rate to increase again. Multiple waves seem empirically to stem from relaxing measures, a behavior that our model does not include. However, the same pattern and insight would hold. In this simulation of coevolution, multiple waves of infection can occur because of increasing virus reproduction rates, or relaxation of policy measures.
4. Seeing more infectious virus variants becoming dominant may signify that our policy measures are effective. These sets of figures show that when policies are not evolving and not effective, more infectious variants take a much longer time to become dominant in the population. Only when policies evolve and actively undermine the virus reproduction do weaker forms progressively disappear, to be replaced by stronger virus variants. Several countries today see numerous variants quickly increase their share of new infections. While this dynamic constitutes a key challenge and difficulty, it can be seen as a sign that the current measures are putting stress on the virus: they are efficient in pushing weaker forms to reduction and eventually extinction. Only by continuously adapting, and adapting faster than the virus strains, can policies and human behaviors push all variants to final extinction.
Our future work with this model will strive to include vaccines as a policy measure, allow viruses to obtain a vaccine-resistant trait by mutation, and observe how the evolution of policies shapes the emergence of vaccine-resistant strains of SARS-CoV-2.
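As a rough illustration of the three regimes compared above (no evolution, virus-only evolution, coevolution), the following Python sketch shows how such a dual GA loop could be organised. It is not the implementation behind Figures 5 and 6 and is not expected to reproduce them: the gene bonuses, the effect of each measure, the mutation rates, the population size and the multiplicative combination of measures are all hypothetical choices made for the example. Only the initial reproduction rate of 2.63 and the 20 time periods come from the text.

```python
import random

BASE_R = 2.63                   # initial reproduction rate quoted in the text
GENE_BONUS = (0.1, 0.2, 0.4)    # hypothetical gain in R per mutation gene
MEASURE_CUT = (0.1, 0.2, 0.3)   # hypothetical share of transmission removed per measure


def virus_rate(genome):
    """Reproduction rate of one virus: base rate plus the bonus of each active gene."""
    return BASE_R + sum(bonus for gene, bonus in zip(genome, GENE_BONUS) if gene)


def remaining_transmission(policy):
    """Share of transmission left once the active measures are applied (assumed multiplicative)."""
    share = 1.0
    for active, cut in zip(policy, MEASURE_CUT):
        if active:
            share *= 1.0 - cut
    return share


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return tuple(1 - b if rng.random() < rate else b for b in bits)


def step(population, fitness, rate, rng):
    """One GA generation: fitness-proportional selection followed by mutation."""
    weights = [fitness(individual) for individual in population]
    parents = rng.choices(population, weights=weights, k=len(population))
    return [mutate(parent, rate, rng) for parent in parents]


def run(evolve_virus, evolve_policy, periods=20, size=200, seed=0):
    """Return, for each period, (average virus rate, average effective rate)."""
    rng = random.Random(seed)
    viruses = [(0, 0, 0)] * size     # no mutation gene active at the start
    policies = [(0, 0, 0)] * size    # no measure active at the start
    history = []
    for _ in range(periods):
        mean_r = sum(map(virus_rate, viruses)) / size
        mean_share = sum(map(remaining_transmission, policies)) / size
        history.append((mean_r, mean_r * mean_share))
        if evolve_virus:
            # Viruses are rewarded for a high effective reproduction rate.
            viruses = step(viruses, lambda v: virus_rate(v) * mean_share, 0.01, rng)
        if evolve_policy:
            # Policies are rewarded for a low effective reproduction rate.
            policies = step(
                policies,
                lambda p: 1.0 / (mean_r * remaining_transmission(p)),
                0.05,
                rng,
            )
    return history


if __name__ == "__main__":
    regimes = {
        "no evolution": run(False, False),
        "virus-only evolution": run(True, False),
        "coevolution": run(True, True),
    }
    for name, history in regimes.items():
        final_r, final_effective = history[-1]
        print(f"{name}: average R = {final_r:.2f}, effective R = {final_effective:.2f}")
```

In this sketch the policy scales every virus by the same factor, so it does not by itself change the relative advantage between variants. Capturing the acceleration reported above would require the population dynamics of the full model.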
This perspective for artificial coevolution first faces the challenges inherent to the use of GAs, which were recently reviewed by Vie (2020a). Their computational cost increases significantly with the size of the populations they consider. If we wanted to simulate a very large population of viruses, knowing that the evolution of SARS-CoV-2 is a hugely parallel process occurring over millions of hosts simultaneously, the computational cost of the simulation would be significant. In addition, parameter configuration, including population size, mutation rates and selection intensity, is difficult in GAs, as different sets of parameters may yield different results and affect the algorithm's performance or convergence properties (Grefenstette, 1986). Last but not least, the genetic representation needs careful design to cover the diversity of possible solutions in a realistic manner, without creating unintended loopholes that could be exploited by the algorithm (Juzonis et al., 2012) and bias the results. Several recent works shed new light on these challenges, and provide new means to mitigate their effects. The computational cost of GAs is offset by how well they scale with parallelism (Mitchell, 1998), and by the computing power of GPUs (Cheng and Gen, 2019) or cloud computing hardware. New methods have been introduced in parameter configuration (Hansen, 2016; Huang et al., 2019; Case and Lehre, 2020). A large diversity of genetic representations exists in GAs, and some further inspiration from key biological concepts can open the way to representations allowing these algorithms to evolve more complex artificial organisms (Miikkulainen and Forrest, 2021).

Specifically in the perspective of the artificial coevolution laboratories discussed here, a key challenge remains in establishing a proper algorithmic representation of the SARS-CoV-2 genome, and the mapping between this genome and the virus traits.
By proper, we mean that this representation might not need to be comprehensive or perfectly exact, but should not oversimplify the object being studied, or neglect important determinants of traits. The work perspective described here faces important limitations, and as these algorithms could be used for essential matters of public health, the biases they may contain require careful consideration. These programs cannot perfectly simulate natural selection or comprehensive genetics, simply because we do not fully understand them yet.

Attempting to model the coevolution of viruses with more realistic simulations than the example provided here is certainly a challenging endeavor. It however entails significant benefits and opportunities. The recent mutations of SARS-CoV-2 have raised public awareness about this critical issue for public health, and make attempts to address it a matter of public interest, with immense benefits when we consider the cost faced by the general public due to variant-caused restrictions. This challenge constitutes as well an opportunity for evolutionary algorithms to grow. If we can make these computer programs that simulate natural selection capable of representing and simulating the evolution of viruses, which are organisms considerably more complex than what EAs are currently handling, these improved EAs could in the future lead to breakthroughs in bioinformatics, optimisation, artificial life and AI.

How could such algorithms evolve organisms with that level of complexity? Modifications of GAs that move from the simple bit string representation to more complex genomes can start this transformation. Key phenomena in genetics and biology such as pleiotropy (where one gene impacts several traits), polygeny (where one trait is impacted by several genes), the evolution of evolvability, and realistic mutations are yet to be included in these algorithms, and their addition carries significant benefits and new opportunities.
These "structural" genetic algorithms, which place such emphasis on the genome structure, may make us able to evolve much more complex, adaptive artificial entities to study virus evolution as illustrated here, but also to create advanced forms of artificial life, or foster progress in generative artificial intelligence. The challenge of modelling SARS-CoV-2 coevolution with genetic methods can inspire such decisive innovations.
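To make the notions of pleiotropy and polygeny concrete, a genome-to-trait mapping can be sketched as a gene-by-trait effect matrix applied to a bit string genome. The Python example below is only an illustration under stated assumptions: the trait names, the effect values, the additive mapping and the baseline values for severity and antibody escape are hypothetical and do not describe the actual SARS-CoV-2 genome. Only the baseline reproduction rate of 2.63 is taken from the text.

```python
TRAITS = ("transmissibility", "severity", "antibody_escape")  # hypothetical trait names

# Hypothetical effect matrix: EFFECTS[g][t] is the change in trait t when gene g is mutated.
EFFECTS = (
    (0.30, 0.00, 0.00),  # gene 0: affects a single trait
    (0.10, 0.05, 0.00),  # gene 1: pleiotropic, shifts two traits
    (0.00, 0.00, 0.20),  # gene 2: affects a single trait
    (0.15, 0.00, 0.10),  # gene 3: pleiotropic, shifts two traits
)
BASELINE = (2.63, 1.0, 0.0)  # 2.63 is the rate quoted in the text; the rest is assumed


def express(genome):
    """Map a bit string genome to trait values with an additive effect matrix."""
    values = list(BASELINE)
    for active, row in zip(genome, EFFECTS):
        if active:
            for t, effect in enumerate(row):
                values[t] += effect
    return dict(zip(TRAITS, values))


def pleiotropic_genes():
    """Genes that influence more than one trait (rows with several nonzero entries)."""
    return [g for g, row in enumerate(EFFECTS) if sum(e != 0 for e in row) > 1]


def polygenic_traits():
    """Traits influenced by more than one gene (columns with several nonzero entries)."""
    return [
        TRAITS[t]
        for t in range(len(TRAITS))
        if sum(row[t] != 0 for row in EFFECTS) > 1
    ]


if __name__ == "__main__":
    print(express((0, 1, 0, 1)))
    print("pleiotropic genes:", pleiotropic_genes())
    print("polygenic traits:", polygenic_traits())
```

With such a matrix, a single mutation can move several traits at once, so selection on one trait indirectly shapes the others. A simple bit string with one gene per trait cannot express this.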
The main simulation code of the GA proof of concept is freely available at https://github.com/aymericvie/Covid19_coevolution . Model parameters such as the efficiency of different non-pharmaceutical interventions, the basic reproduction rate of SARS-CoV-2, mutation rates, or learning rates for policies can be easily changed in the code. The code is designed to work on Google Colab, and the script is self-sufficient to run.
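For readers who wish to explore how results depend on such parameters, one possible way to organise a sensitivity study is sketched below in Python. This is not the interface of the repository above: the class `SimulationConfig`, its field names, the measure names and their values, and the multiplicative combination of interventions are hypothetical and serve only as an example. The default reproduction rate of 2.63 is the value used in the text.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class SimulationConfig:
    """Hypothetical container for the parameters listed above."""
    basic_reproduction_rate: float = 2.63           # value used in the text
    virus_mutation_rate: float = 0.01               # assumed value
    policy_learning_rate: float = 0.05              # assumed value
    npi_efficiency: tuple = (("masks", 0.10), ("school_closure", 0.20))  # assumed values


def parameter_grid(base, **choices):
    """Yield one configuration per combination of the listed parameter values."""
    names = list(choices)
    for values in product(*(choices[name] for name in names)):
        yield replace(base, **dict(zip(names, values)))


def rate_under_all_measures(config):
    """Reproduction rate if every measure is active (assumes multiplicative effects)."""
    rate = config.basic_reproduction_rate
    for _name, efficiency in config.npi_efficiency:
        rate *= 1.0 - efficiency
    return rate


if __name__ == "__main__":
    grid = parameter_grid(
        SimulationConfig(),
        virus_mutation_rate=[0.001, 0.01],
        policy_learning_rate=[0.01, 0.05, 0.1],
    )
    for config in grid:
        print(config.virus_mutation_rate, config.policy_learning_rate)
```

Running each configuration over several random seeds, rather than once, helps separate the effect of a parameter from run-to-run noise.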
In this article, we propose coevolution with genetic algorithms (GAs) as a credible approach to model the relationship between virus evolution and policy measures, highlighting its implications, potential and challenges. We provide a proof-of-concept implementation of this coevolution dual GA. Because of their qualities of exploration of large spaces of possible solutions, capacity to generate novelty, and natural genetic focus, GAs are relevant for this issue. We present a dual GA model in which both viruses aiming for survival and policy measures aiming at minimising infection rates in the population competitively evolve. Under coevolution, virus adaptation towards more infectious variants appears considerably faster than when the virus evolves against a static policy. More contagious strains become dominant in the virus population under coevolution. The coevolution regime can generate multiple outbreak waves as more infectious variants become dominant in the virus population. Seeing more infectious virus variants becoming dominant may signify that our policy measures are effective. This artificial coevolution system may offer us a laboratory to "debug" our current policy measures, identify the weaknesses of our current strategies, and anticipate the evolution of the virus to plan ahead relevant policies. It also constitutes a decisive opportunity to develop new genetic algorithms capable of simulating much more complex objects. We highlight some structural innovations for GAs in this virus evolution context that may carry promising developments in evolutionary computation, artificial life and AI.
References

[1] Axelrod, R. (1987). The evolution of strategies in the iterated prisoner's dilemma. Genetic algorithms and simulated annealing, 32-41.
[2] Benvenuto, D., Giovanetti, M., Ciccozzi, A., Spoto, S., Angeletti, S., & Ciccozzi, M. (2020). The 2019-new coronavirus epidemic: evidence for virus evolution. Journal of medical virology, 92(4), 455-459.
[3] Bhandari, D., Murthy, C. A., & Pal, S. K. (1996). Genetic algorithm with elitist model and its convergence. International journal of pattern recognition and artificial intelligence, 10(06), 731-747.
[4] Callaway, E. (2020). Making sense of coronavirus mutations. Nature, 174-177.
[5] Caldarelli, G., Higgs, P. G., & McKane, A. J. (1998). Modelling coevolution in multispecies communities. Journal of theoretical biology, 193(2), 345-358.
[6] Case, B., & Lehre, P. K. (2020). Self-Adaptation in Nonelitist Evolutionary Algorithms on Discrete Problems With Unknown Structure. IEEE Transactions on Evolutionary Computation, 24(4), 650-663.
[7] Cheng, J. R., & Gen, M. (2019). Accelerating genetic algorithms with GPU computing: A selective overview. Computers & Industrial Engineering, 128, 514-525.
[8] Dieckmann, U., & Law, R. (1996). The dynamical theory of coevolution: a derivation from stochastic ecological processes. Journal of mathematical biology, 34(5), 579-612.
[9] Domingo, E., Escarmís, C., Sevilla, N., Moya, A., Elena, S. F., Quer, J., Novella, I. S., & Holland, J. J. (1996). Basic concepts in RNA virus evolution. The FASEB Journal, 10(8), 859-864.
[10] Duffy, S. (2018). Why are RNA virus mutation rates so damn high? PLoS biology, 16(8). Public Library of Science, San Francisco, CA, USA.
[11] Gilman, R. T., Nuismer, S. L., & Jhwueng, D. C. (2012). Coevolution in multidimensional trait space favours escape from parasites and pathogens. Nature, 483(7389), 328-330.
[12] Guimaraes, P. R., Pires, M. M., Jordano, P., Bascompte, J., & Thompson, J. N. (2017). Indirect effects drive coevolution in mutualistic networks. Nature, 550(7677), 511-514.
[13] Grefenstette, J. J. (1986). Optimization of control parameters for genetic algorithms. IEEE Transactions on systems, man, and cybernetics, 16(1), 122-128.
[14] Hansen, N. (2016). The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772.
[15] Holland, J. H. (1992). Genetic algorithms. Scientific american, 267(1), 66-73.
[16] Holland, J. H. (1992). Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press.
[17] Huang, C., Li, Y., & Yao, X. (2019). A survey of automatic parameter tuning methods for metaheuristics. IEEE transactions on evolutionary computation, 24(2), 201-216.
[18] Hui, C., Minoarivelo, H. O., & Landi, P. (2018). Modelling coevolution in ecological networks with adaptive dynamics. Mathematical Methods in the Applied Sciences, 41(18), 8407-8422.
[19] Iacobucci, G. (2021). Covid-19: New UK variant may be linked to increased death rate, early data indicate. bmj, 372, n230.
[20] Janzen, D. H. (1980). When is it coevolution?
[21] Juzonis, V., Goranin, N., Cenys, A., & Olifer, D. (2012). Specialized genetic algorithm based simulation tool designed for malware evolution forecasting. Annales Universitatis Mariae Curie-Sklodowska, sectio AI–Informatica, 12(4), 23-37.
[22] Kupferschmidt, K. (2020). Mutant coronavirus in the United Kingdom sets off alarms, but its importance remains unclear. Accessed: 31 December 2020.
[23] Korber, B., Fischer, W. M., Gnanakaran, S., Yoon, H., Theiler, J., Abfalterer, W., ... & Montefiori, D. C. (2020). Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell, 182(4), 812-827.
[24] Los Alamos National Laboratory (2020). Newer variant of COVID-19–causing virus dominates global infections. https://lanl.gov/discover/news-release-archive/2020/July/0702-newer-variant-covid-dominates-infections.php . Accessed: 19 February 2021.
[25] Lei, S., Jiang, F., Su, W., Chen, C., Chen, J., Mei, W., ... & Xia, Z. (2020). Clinical characteristics and outcomes of patients undergoing surgeries during the incubation period of COVID-19 infection. EClinicalMedicine, 21, 100331.
[26] Lima, C. (2020). Information about the new coronavirus disease (COVID-19). Radiologia Brasileira, 53(2), V-VI.
[27] Mahase, E. (2020). Covid-19: What is the R number? BMJ, 369, m1891.
[28] Mahase, E. (2021). Covid-19: Novavax vaccine efficacy is 86% against UK variant and 60% against South African variant.
[29] Matyásek, R., & Kovarík, A. (2020). Mutation patterns of human SARS-CoV-2 and bat RaTG13 coronavirus genomes are strongly biased towards C>U transitions, indicating rapid evolution in their hosts. Genes, 11(7), 761.
[30] Miikkulainen, R., & Forrest, S. (2021). A biological perspective on evolutionary computation. Nat Mach Intell, 3, 9-15. https://doi.org/10.1038/s42256-020-00278-8
[31] Mirjalili, S. (2019). Genetic algorithm. In Evolutionary algorithms and neural networks (pp. 43-55). Springer, Cham.
[32] Mitchell, M. (1998). An introduction to genetic algorithms. MIT press.
[33] Haug, N., Geyrhofer, L., Londei, A., Dervic, E., Desvars-Larrive, A., Loreto, V., ... & Klimek, P. (2020). Ranking the effectiveness of worldwide COVID-19 government interventions. Nature human behaviour, 4(12), 1303-1312.
[34] Phan, T. (2020). Genetic diversity and evolution of SARS-CoV-2. Infection, genetics and evolution, 81.
[35] Potter, M. A., & De Jong, K. A. (1994). A cooperative coevolutionary approach to function optimization. In International Conference on Parallel Problem Solving from Nature (pp. 249-257). Springer, Berlin, Heidelberg.
[36] Priya, P., & Shanker, A. (2021). Coevolutionary forces shaping the fitness of SARS-CoV-2 spike glycoprotein against human receptor ACE2. Infection, Genetics and Evolution, 87, 104646.
[37] Rosin, C. D., & Belew, R. K. (1997). New methods for competitive coevolution. Evolutionary computation, 5(1), 1-29.
[38] Vie, A. (2020a). Qualities, challenges and future of genetic algorithms: a literature review. arXiv preprint arXiv:2011.05277.
[39] Vie, A. (2020b). Genetic algorithm approach to asymmetrical blotto games with heterogeneous valuations. SSRN.
[40] Wallace, D. J., & Ackland, G. J. (2021). Abrupt increase in the UK coronavirus death-case ratio in December 2020. medRxiv.
[41] Whitley, D. (1994). A genetic algorithm tutorial. Statistics and computing, 4(2), 65-85.
[42] WHO (2020a). SARS-CoV-2 Variants. Accessed: 31 December 2020.
[43] WHO (2020b). SARS-CoV-2 mink-associated variant strain – Denmark.