Explaining the geographic origins of seasonal influenza A (H3N2)
EExplaining the geographic origins of seasonal influenza A (H3N2)
Frank Wen ∗ , Trevor Bedford , and Sarah Cobey Department of Ecology and Evolution, University of Chicago, 1101 E. 57th St., Chicago,IL 60637 Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research CenterNovember 6, 2018
Most antigenically novel and evolutionarily successful strains of seasonal influenza A (H3N2) originate inEast, South, and Southeast Asia. To understand this pattern, we simulated the ecological and evolutionarydynamics of influenza in a host metapopulation representing the temperate north, tropics, and temperatesouth. Although seasonality and air traffic are frequently used to explain global migratory patterns ofinfluenza, we find that other factors may have a comparable or greater impact. Notably, a region’s basicreproductive number ( R ) strongly affects the antigenic evolution of its viral population and the probabilitythat its strains will spread and fix globally: a 17-28% higher R in one region can explain the observedpatterns. Seasonality, in contrast, increases the probability that a tropical (less seasonal) population willexport evolutionarily successful strains but alone does not predict that these strains will be antigenicallyadvanced. The relative sizes of different host populations, their birth and death rates, and the region inwhich H3N2 first appears affect influenza’s phylogeography in different but relatively minor ways. Theseresults suggest general principles that dictate the spatial dynamics of antigenically evolving pathogens andoffer predictions for how changes in human ecology might affect influenza evolution. Antigenic variants of seasonal influenza continuously emerge and escape human immunity in a process knownas antigenic drift. These drifted strains are less easily recognized by host immunity and therefore have atransmission advantage. More antigenically advanced strains are also more likely to spread globally andsuccessfully perpetuate the evolutionary lineage of subsequent variants.Asia has long been recognized as a major source of not only new influenza subtypes but also new strainsof seasonal influenza [1–4]. Influenza A/H3N2, A/H1N1, and two B lineages currently circulate in the humanpopulation, with the H3N2 subtype causing the most disease [5]. Phylogeographic analyses show that East,South, and Southeast Asia contribute disproportionately to the evolution of seasonal H3N2, exporting mostof the evolutionarily successful strains that eventually spread globally [6–10]. The trunk of H3N2’s phylogenytraces the evolutionary path of the most successful lineage and was estimated to be located in Asia 87% ofthe time from 2000 to 2010 [10]. Additionally, strains of H3N2 isolated in E-SE Asia appear to be moreantigenically advanced, with new antigenic variants emerging earlier in E-SE Asia than in the rest of theworld [7, 11]. These observations suggest that ecological differences between regions, such as climate andhuman demography, affect the local antigenic evolution of H3N2, which in turn shapes its global migratorypatterns. Here we ask what ecological factors might cause disproportionate contributions of particular hostpopulations to the evolution of an influenza-like pathogen. This information may be immediately useful ∗ [email protected] a r X i v : . [ q - b i o . P E ] A ug or viral forecasting. Over the long term, it could help predict changes in influenza’s phylogeography andidentify source populations to improve global vaccination strategies.The conspicuous role of Asia in H3N2’s evolution has been attributed to the seasonal nature of influenzain temperate regions [2, 6–9, 12]. Approximately 85% of Asia’s population and 48% of the global populationresides in a climatically tropical or subtropical region [13] where semiconnected host populations supportasynchronous epidemics that enable regional persistence year-round [7, 12, 14]. Uninterrupted transmissionmight increase both the efficiency of selection and the probability of strain survival and global spread.By contrast, transmission bottlenecks from late spring through autumn in temperate populations necessarilylimit local evolution and reduce opportunities for strain emigration [15,16]. Smaller contributions from othertropical and subtropical regions might arise from the weaker connectivity of their host populations [9,17,18].Although seasonality clearly affects temporal patterns of viral migration [8], a robust explanation fordifferences in regions’ long-term contributions to the evolution of H3N2 would consider the effects of seasonalvariation in transmission in light of other potentially influential differences among host populations, including: Host population size.
E-S-SE Asia alone contains more than half of the global population [19]. Largerhost populations should sustain larger viral populations, and in the absence of other effects, they shouldcontribute a proportionally larger fraction of strains that happen to spread globally. Additionally, if raremutations limit the generation of antigenic variants, larger populations could contribute a disproportionatenumber of antigenically novel strains with high fitness.
Host population turnover.
Birth rates have historically been higher in E-S-SE Asia than in most tem-perate populations [19]. Demographic rates influence the replenishment of susceptibles and loss of immuneindividuals, thereby modulating selection for antigenic change. Faster replenishment of susceptibles increasesprevalence, and thus viral abundance and diversity, but weakens the fitness advantage of antigenic variants.A more immune population imposes greater selection for antigenic change but supports a smaller, less diverseviral population. Thus, the rate of antigenic evolution may vary in a complex way with the rate of hostpopulation turnover [20].
Initial conditions.
H3N2 first emerged in or near Hong Kong in 1968. The region in which a subtypeemerges may effectively give the viral population a head start on evolution. The first epidemic will almostcertainly occur in this region, and viruses here will be the first to experience selective pressure for antigenicchange. If host migration rates are low and the founding viral population persists, this antigenic lead couldbe maintained or even grow in time.
Transmission rates.
Differences in human behaviour can affect transmission rates. The transmission rateaffects a strain’s intrinsic reproductive number ( R ), the expected number of secondary cases caused bya single infection in an otherwise susceptible population. Differences in regional R could affect evolutionin at least two ways. Higher R increases the equilibrium prevalence, increasing the probability that rarebeneficial mutations will appear. In addition, the rate of antigenic drift increases with R in models thatinclude mutation as a diffusion-like process [10, 21–23]. A higher intrinsic reproductive number in onepopulation could thus accelerate the emergence of novel mutants in that area.To understand the potential effects of these five factors on the evolution of H3N2 in space, we simulatedan influenza-like pathogen in a simplified representation of the global human metapopulation. The simulatedmetapopulation consisted of three connected host populations, representing the temperate north, tropics, andtemperate south. Conceptually, the tropics in the model approximate Asia, where most of the populationis tropical or subtropical [13] and epidemics are asynchronous, and exclude other less connected tropicaland subtropical populations on other continents [9, 17, 18]. The two temperate populations approximatenorthern and southern populations where influenza is strongly seasonal. The model can also be generalizedto represent three arbitrary populations by reducing seasonality.We analysed the effects of these factors on two key metrics of influenza’s spatial evolutionary and antigenicdynamics. The first metric measures the proportion of the trunk of the phylogeny present in the tropics(figure 1 a ). The phylogenetic trunk represents the most evolutionarily successful lineage that goes on toseed all future outbreaks. The second metric measures the degree to which tropical strains are antigenicallyadvanced (figure 1 b ). Phenotypically, antigenic dissimilarities can be quantified as distances in antigenicspace using pairwise measures of cross reactivity [11, 24]. Our model uses an analogous measure of antigenicdistances, allowing us to determine the relative antigenic advancement of strains from each region. Weanalysed these two metrics from simulations to test whether any of the five ecological factors could createspatial evolutionary patterns of a similar magnitude to the observed data.2 Results
We simulated an individual-based model that included ecological and evolutionary dynamics in a metapop-ulation with three demes [25]. By default, in one deme, transmission rates are constant throughout the year,and in the two others, transmission rates vary sinusoidally with opposing phases. Viral phenotypes occuras points in 2D Euclidean space, and mutation displaces phenotypes in this 2D space according to a fixedkernel [25]. This space is analogous to an antigenic map constructed from pairwise measurements of cross-reactivity between influenza strains using a hemagglutination inhibition (HI) assay [11, 24]. Susceptibilityto infection is proportional to the distance in antigenic space between the challenging strain and the neareststrain in the host’s infection history, giving distant or antigenically advanced strains greater transmissiveadvantage.Figure 1: Representative output showing influenza-like behaviour from a sample simulation using the defaultparameters (table 2). Statistics reported here are based on 53 replicate simulations. ( a ) The phylogeny of thepathogen is reconstructed explicitly from the recorded ancestry of simulated strains. Branches are coloredby region indicated in panel d . The trunk is determined by tracing the recorded ancestry of surviving strainsat the end of the simulation. Side branches show lineages that go extinct. ( b ) Viruses evolve antigenicallyaway from the founding strain in a canalized fashion. On average, the antigenic distance from the foundingstrain follows the trajectory indicated by the black LOESS spline fitted to viruses from all three regions. Atany given point in time, strains above this line have drifted farther from the founder compared to average,and are thus considered antigenically leading. Conversely, strains below this line are considered antigenicallylagging. Antigenic lead is calculated as the distance to the spline in antigenic units. ( c ) Prevalence ofinfection over time for each region. ( d ) Depiction of the totally connected model population, composed ofthe temperate north, tropics, and temperate south.The model reproduces the characteristic ecological and evolutionary features of H3N2, except for theantigenic lead (table 1), under the default parameters (table 2). We restricted our analyses to simulationswhere the virus remained endemic and where the time to the most recent common ancestor (TMRCA) never3xceeded 10 years during the 40 years of simulation. We chose this cutoff because in some simulations, theviral population developed unrealistically deep branches. In excluding extinctions and excessive diversity(branching), we assume that H3N2’s historical evolutionary patterns represent the virus’ likeliest evolutionarydynamics. Of 100 replicate simulations, the viral population went extinct in 18 cases and exceeded theTMRCA threshold 29 times, leaving 53 simulations for analysis. The model tracks the ancestry of individualstrains, allowing us to explicitly reconstruct the phylogeny of the virus and the geographic location oflineages. The phylogeny has the characteristically well-defined trunk with short branches of the H3N2hemagglutinin (figure 1). This shape arises due repeated selective sweeps of antigenic variants, which reducesstanding diversity; the average TMRCA across replicates was 3.72 years (SD = 0.26), comparable to empiricalestimates of 3.89 years [10]. The antigenic distance from the founder increased linearly with time (figure 1),characteristic of H3N2’s canalized antigenic evolution [24, 25]. The mean antigenic drift across replicatesimulations was 0.97 antigenic units per year (SD = 0.11), comparable to observed rates of 1.01 antigenicunits per year [11]. The mean annual incidence was 9.1% (SD = 0.8%). Reported annual incidence acrossall subtypes of seasonal influenza range from 9-15% [26]. Since we only modeled one lineage (e.g., the H3N2subtype), the low estimate from the model is comparable to observed incidence.Although all three host populations were the same size, the tropical strains were on average more evolu-tionarily successful. The phylogenetic trunk traces the most evolutionarily successful lineage and was locatedin the tropics 77% (SD = 13%) of the time, comparable to the observed 87% of H3N2’s trunk in E-S-SEAsia between 2000-2010 [10]. However, the default parametrization does not produce an antigenic lead inany population, despite the observed antigenic lead of Asian strains (table 1). Antigenic cartography showsthat while H3N2 drifts on average at 1.01 antigenic units per year globally [11], Asian strains tend to befarther drifted at any given time, and the region is thus considered to lead antigenically [7, 11].Table 1: Properties of the default modelStatistic Model mean ± SD Observed (Ref)Annual incidence 0.091 ± − ) 0.97 ± ± ± ± R ) 1.8 [27, 28]Duration of infection ν N
45 million (see ESM)Birth/death (turnover) rate γ − [19]Mutation rate µ − day − (see ESM)Mean mutation step size δ mean δ sd c m − day − (see ESM)Seasonal amplitude (cid:15) .2 Seasonality We first varied the strength of seasonal forcing, holding other parameters at their default values. Seasonalityby itself in the two temperate populations could not cause the tropics to produce more antigenically advancedstrains; however, seasonality did cause the tropics to contribute a greater fraction of evolutionarily successfulstrains (figure 2). By linear regression, we estimate that the trunk would spend 87% of its time in the tropics(the same fraction that is observed in Asia [10]) with a seasonal transmission amplitude ( (cid:15) ) of 0.19 (95% CI:0.18, 0.20). Reduced seasonal forcing in the temperate populations equalized the fraction of the trunk ineach population. In multivariate sensitivity analysis, the amplitude of seasonal transmission accounted for33% of the variation in the tropical fraction of the trunk (electronic supplementary material, figure S2, tableS2). This result suggests that seasonal bottlenecks in temperate populations discourage seasonal strainsfrom fixing globally, in agreement with other models [15]. However, seasonality alone could not explain anyvariation in the tropic’s antigenic lead (electronic supplementary material, figure S2, table S3). We thereforehypothesized that ecological factors besides seasonality must contribute to regional differences in relativeantigenic fitness.
Increasing R in the tropics relative to the temperate populations caused the tropics to produce strains thatled antigenically while also preserving the tropics’ contribution to the trunk (figure 3). Linear regressionimplies that a 28% (95% CI: 25%, 30%) increase in R in the tropics causes the tropics to produce strainsthat are, on average, 0.25 antigenic units ahead of global mean, reproducing the observed antigenic leadin Asia [7, 11]. We also estimate that a 17% increase in R (95% CI: 15%, 19%) causes the phylogenetictrunk to be located in the tropics 87% of the time, reproducing the observed fraction of the H3N2 trunk inAsia [10].The effects of R on antigenic lead were robust to changes in other ecological variables and over a rangeof baseline values of global R . When we varied the other parameters (table 2), relative R in the tropicsaccounted for 77% of the variance in antigenic lead, making it the best predictor of antigenic lead in thetropics (electronic supplementary material, figure S2, table S3). The fraction of the trunk in the tropics alsoincreased with the relative R , although R explained less of the variation in trunk proportion (41%), dueto the effect of seasonality (electronic supplementary material, figure S2, table S2).Notably increased R in one deme was sufficient by itself to make strains more evolutionarily successfuland antigenically advanced. When we removed seasonality altogether to model three climatically identicalpopulations, the population with the highest R produced both the most antigenically leading and evolu-tionarily successful strains (figure 4). Thus, higher R alone in one region can cause it to attain an antigeniclead and fraction of the trunk as large as is observed in Asia.To better understand why increasing regional R causes that region to produce more antigenically ad-vanced strains, we examined the effect of R on antigenic evolution in a single deme. Simulations showedthat increasing R increases the rate of antigenic drift (electronic supplementary material, figure S3). Toinvestigate further, we derived an analytic expression for the invasion fitness of a novel mutant in a popu-lation at the endemic equilibrium (electronic supplementary material, equation S1). When the resident andmutant strains have the same intrinsic fitness ( R ), the growth rate of an antigenically distinct, invadingmutant increases linearly with R (electronic supplementary material, figure S4). This linearity holds aslong as the conversion between antigenic distance and host susceptibility (equation 3) is independent of R .As R increases, not only do mutants invade faster, but the invasion speed increases faster as a function ofantigenic distance (electronic supplementary material, figure S4).Although seasonality alone did not affect antigenic lead, the effects of R on antigenic lead could beinfluenced by seasonality (figure 4). Introducing seasonality in the temperate populations reduced differencesin antigenic phenotype between regions. When tropical strains were antigenically ahead of temperate strains(due to higher tropical R ), introducing seasonality reduced the tropics’ antigenic lead. When tropical strainswere antigenically behind temperate strains (due to lower tropical R ), introducing seasonality reducedthe antigenic lag. Two factors explain the equalizing effect of seasonality on antigenic phenotype. First,higher contact rates during transmission peaks in the two temperate populations increase the rate of strainimmigration from the tropics. Second, seasonal troughs in prevalence allow tropical strains to invade moreeasily due to reduced competition with local strains. 5 ll lll llll llllll llllll lllllll lll llll lll l lll lll lll ll llllllll llll lllll llllll llllll lllllll lll lllll lllll lllll llll t r op i c s t r unk p r opo r ti on ( a ) lll lll llll llllll llllll lllllll lll llll lll l lll lll lll ll llllllll llll lllll llllll llllll lllllll lll lllll lllll lllll llll −0.4−0.20.00.20.40.00 0.05 0.10 0.15 0.20 0.25seasonal amplitude t r op i c s a n ti g e n i c l ea d ( b ) Figure 2: Seasonal amplitude (cid:15) in the temperate populations increases the tropics’ contribution to the mostevolutionarily successful lineage but alone does not affect regional differences in antigenic advancement.Transmission rates β in the temperate north and south oscillate sinusoidally in opposite phase, with amplitude (cid:15) . All other parameters remain at their default values (table 2). ( a ) Effects of seasonality on the fractionof the trunk in the tropics (Pearson’s r = 0 . p < . R = 0 . b )Effects on seasonality on the antigenic lead of the tropics (Pearson’s r = − . p = 0 . R = 0 . ll ll lllllll llllll lllll llll lllllllll lllllll llll ll ll llll llll ll llll llllll lllll llllll lllllll R t r op i c s t r unk p r opo r ti on ( a ) ll ll lllllll llllll lllll llll lllllllll lllllll llll ll ll llll llll ll llll llllll lllll llllll lllllll −0.4−0.20.00.20.4 0.8 0.9 1.0 1.1 1.2relative R t r op i c s a n ti g e n i c l ea d ( b ) Figure 3: Increased R in the tropics increases the tropics’ contribution to the most evolutionarily successfullineage and the antigenic advancement of tropical strains. Relative R is calculated as R in the tropicsdivided by R in the temperate regions. R in the tropics was varied while R in the temperate regions waskept at its default. Other parameters were also kept at their default values (table 2). ( a ) Effect of R inthe tropics on the fraction of the trunk in the tropics (Pearson’s r = 0 . p < . R = 0 . b ) Effectof R in the tropics on the antigenic lead in the tropics (Pearson’s r = 0 . p < . R = 0 . .4 Demographic rates, population size, and initial conditions Other ecological factors affected regional contributions to evolution but could not reproduce the observedpatterns as well as differences in R (electronic supplementary material, figures S1, S2). Notably, strainswere slightly more antigenically advanced in older populations (electronic supplementary material, figure S1).When the rate of population turnover in the tropics was half that in the temperate regions, the tropics ledby 0.04 antigenic units (SD = 0.03). Larger populations generally contributed more to the trunk, althoughthere was much variation that population size alone did not explain (electronic supplementary material,figures S1, S2 and table S2, S3). Initial conditions did not have a lasting effect (electronic supplementarymaterial, figure S5). Influenza A/H1N1 and influenza B both evolve slowly compared to H3N2 and are suspected to have lower R [10, 11]. Specifically, H1N1 drifts at a rate of 0.62 antigenic units per year, and the B/Victoria andYamagata strains drift at 0.42 and 0.32 antigenic units per year respectively [11]. H1N1 and B viruses arealso less apt to have Asian origins than H3N2 [10]. When we simulate with lower baseline R , we find thatdifferences in R between regions have a weaker influence on spatial patterns of evolution (electronic sup-plementary material, figure S8). Based on the relationship between mean R and antigenic drift (electronicsupplementary material, figure S3), we would expect seasonal H1N1, for example, to have an R of 1.6. Forthis R , a 17% increase in R causes the tropics to occupy only 79% (versus 87% for H3N2-like R of 1.8)of the trunk, and a 28% increase in R causes the tropics to lead by 0.20 (versus 0.25 for H3N2) antigenicunits. R s ea s on a l a m p lit ud e a ) 0.000.050.100.150.200.25 0.8 0.9 1.0 1.1 1.2relative R s ea s on a l a m p lit ud e −0.4 −0.2 0.0 0.2 0.4tropics antigenic lead( b ) Figure 4: Seasonality in temperate populations has an equalizing effect on antigenic differences. Relative R is calculated as R in the tropics divided by R in the temperate regions. ( a ) Effects of seasonality and R on the fraction of the trunk in the tropics. Blue indicates that the phylogenetic trunk is located in thetropics less than 1/3 of the time, and red indicates that the trunk is the tropics more than 1/3 of the time.( b ) Effects of seasonality and R on antigenic lead in the tropics. Blue indicates that tropical strains are onaverage ahead antigenically relative to other global strains and red indicates that tropical strains are behindantigenically. Each square averages 1 to 17 replicate simulations. In our model, we find that the simplest explanation for why a host population produces more antigenicallynovel and evolutionarily successful strains than other populations is that its strains have a higher intrinsic7tness, or R . The strong effect of regional R on spatial patterns of viral evolution is caused by the effectof R on antigenic drift. Higher regional R facilitates invasion of antigenically novel strains, resulting infaster antigenic drift. Seasonality reduces the rate at which temperate populations export strains that areevolutionarily successful, but seasonality alone cannot explain regional differences in the production of strainsthat are antigenically novel. Size and age can influence global patterns too, but to a lesser extent: largerpopulations export more strains that fix, and populations with slower replenishment of susceptibles increasethe rate of antigenic evolution. These last two effects are sensitive to changes in seasonality and R . Theseresults highlight the relationship between human ecology and influenza’s phylogeography. Regions with hightransmission rates may be expected to contribute disproportionately to influenza’s evolution and may alsobe ideal targets for vaccine campaigns. Accordingly, changes in human ecology can be expected to alterinfluenza’s phylogeography. These generalizations assume that H3N2 will evolve mostly as it has, with highstrain turnover and limited genetic variation at any time, but more complex dynamics may be possible.To make general predictions, we used a simple model. Although our three-deme metapopulation pre-vents us from replicating influenza’s phylogeographic dynamics precisely, the model nonetheless reveals howecological differences in populations create spatial patterns in the evolution of an influenza-like pathogen.Simulations with more complex metapopulation models showed the same trends as the simple three-dememodel (figures S9, S10), suggesting that our results are robust to changes in metapopulation populationstructure.These results immediately raise the question of whether there is evidence of regional variation in R .Low reporting rates and antigenic evolution make the R of influenza difficult to measure with traditionalmethods, but we can conjecture from several lines of evidence. Low absolute humidity favors transmissionvia aerosol in experimental settings [33] and influences the timing of the influenza season in the UnitedStates [34]. Based on absolute humidity and aerosol transmission alone, these results suggest that R oftropical and subtropical Asia would be lower than in temperate latitudes. However, in Vietnam the onset ofinfluenza-like illness is associated with periods of high humidity [35]. This observation suggests that humidityis not the dominant driver of influenza transmission, at least in this region.Contact rates also influence transmission [36]. Multiple studies have detected a significant effect of schoolclosure on influenza spread [37–39], although this trend is not without exception [40]. Households alsoinfluence risk: after one household member is infected, the average risk of secondary infection in a householdcontact is 10% [41]. Differences in classroom and household sizes may thus influence local transmission, andboth are higher in, for instance, China and India than in Europe and the U.S. [42, 43]. Contact surveysreport higher contact rates in Guangdong, China, than in European communities, whereas those in Vietnamare lower, although differences may arise from differences in survey design [44–46]. These surveys notablymiss non-social, casual contacts (e.g., shared cafeterias and elevators) that might be important for influenzatransmission.Differences in local transmission rates may not scale: high rates of local transmission may be offset orattenuated by the structure of contact networks over larger areas. At the regional level, commuter and airpassenger flows affect the spread of influenza epidemics, suggesting that adults are important to the long-range dispersal of the virus [12,18]. The frequency of long-distance contacts differs between communities [44].Although sensitivity of R to network topology is well known theoretically [47,48], there is a need to integratethe features of local and regional empirical transmission networks to infer large-scale differences R .Empirical estimates of R are in theory attainable from seroprevalence. Under a simplistic, single-strain SIR model, which assumes random mixing and no maternal immunity, differences in R should appearin differences in seropositivity by age. For instance, if R =1.8, approximately 5.1% of 2 year-olds wouldbe seropositive, whereas 7.4% would be seropositive if R were 20% higher. R variation in this rangecould be detected by sampling as few as 1500 2 year-olds in each population. Detailed surveys of H3N2seropositivity by age cohort exist for some European countries [49, 50] but show much faster increases inseropositivity with age than expected under the SIR model: 100% of tested children are seropositive toH3N2 by age 7 in the Netherlands and by age 12 in Germany. This discrepancy between theory and datamay be due to antigenic drift resulting in higher attack rates [10]. The spatial difference in seroprevalencemay also reflect greater contact rates among school-aged children [45] and highlights the possibility thatdifferences in exposure rates at young ages do not reflect mean differences in the populations. Such effectsmay be reduced by examining seroprevalence at older ages, but these estimates must balance a tradeoffbetween minimizing age-related correlations in transmission rates and increasing sample sizes required to8etect asymptotically small differences in seropositivity. Another potential approach to measuring R is torefine estimates of annual incidence in different populations. Estimates of R based on annual incidencewould have to incorporate the histories of recent circulating strains, survey timing and titer dynamics, andvaccination in each population.A greatly reduced birth rate confers a slight antigenic lead, but actual differences in birth rates betweenregions appear too small to explain Asia’s observed lead. Current birth rates across most of Europe, China,and the United States are within 10% of each other [19]. Birth rates are almost twice as high in someSE Asian countries, including Cambodia, Laos, and the Philippines. The highest birth rates are foundin Africa and the Middle East, and are three to four times higher than birth rates in the United Statesand China. Our model suggests that these regions should contribute relatively less to influenza’s antigenicevolution, assuming the differences in population structure are not associated with higher R , and ignoringother differences. However, taking age-assortative mixing into account may negate this expectation, withyounger populations having increased R [48, 51] thus contributing more to antigenic evolution.We expect these results to apply to other antigenically varying, fast-evolving pathogens, including othertypes of influenza. Enterovirus-71 circulates globally, and its VP1 capsid protein experiences continuouslineage replacement through time, similarly to H3N2 hemagglutinin [52]. Norovirus also demonstrates rapidantigenic evolution by amino acid replacements in its capsid protein [53]. We might expect that areaswith high transmission contribute disproportionately to the antigenic evolution and global spread of thesepathogens. In addition, when we simulate with lower R , we find that differences in R between regionsinfluence spatial patterns of antigenic variation less (electronic supplementary material, figure S8). This mayexplain why influenza A H1N1 and influenza B, which are suspected to have lower R [10, 11], are less aptto have Asian origins than H3N2 [10]. We implemented an individual-based
SIR compartmental model of an influenza-like pathogen, originallydescribed by Bedford et al. [25]. In this model, a global metapopulation is composed of three connectedpopulations, representing tropics and temperate north and south. Individuals’ compartments are updatedusing a τ -leaping algorithm. Within a region i , the force of infection is given by F i ( t ) = β i ( t ) I i N i (1)where I is the number of infected hosts. Between regions i and j , the force of infection is given by F ij ( t ) = mβ j ( t ) I i N j (2)where region i is where the infection originates and region j is the destination. Here, m is a scaling factor forinterregional transmission, and β j is the transmission rate of the destination region. Transmission rates inthe seasonal north and south oscillate sinusoidally in opposite phase with amplitude (cid:15) . After recovery frominfection, a host acquires complete immunity to viruses with that specific antigenic phenotype. Hosts thatclear infection accumulate an infection history that defines their immunity. In a contact event, the distancesbetween the infecting viral phenotype and each phenotype in the susceptible host’s immune history arecalculated. The probability of infection after contact is proportional to the distance d to the closest phenotypein the host’s immune history. An individual’s risk of infection by such a strain isRisk = min { , cd } (3)where the proportionality constant for converting antigenic distance to a risk of infection c = 0 .
07 [25]; inother words, one unit of antigenic distance corresponds to 7% reduction in immunity. The linear relationship c between antigenic distance and susceptibility derives from studies of vaccine efficacy [25, 30, 31].Antigenic phenotypes are represented by points in a two-dimensional Euclidean antigenic space. Oneunit of antigenic distance in this space corresponds to a twofold dilution of antiserum in an HI assay [24].The model is initialized at the endemic equilibrium with antigenically identical viruses. By default, all of theinitial infections occur in the tropics. Mutational events occur at a rate µ mutations per day. When a virus9utates, it moves in a random radial direction with a gamma-distributed step size. This mutation rate,along with the mutation size parameters ( δ mean , δ sd ) determine the accessibility of more distant mutationsin antigenic space. The radial direction of mutation is chosen from a uniform distribution.Additional methods are described in the electronic supplementary material. Code implementing the model is available at https://github.com/cobeylab/antigen-phylogeography.git . The complete code for reproducing these results is available at https://github.com/cobeylab/influenza_phylogeography_manuscript.git . We have no competing interests.
TB and SC conceived the study. FW performed the analysis and wrote the first draft of the paper. All ofthe authors contributed to and approve the final version.
We thank Daniel Zinder, Maciej Boni, and Greg Dwyer for helpful discussion and Ed Baskerville for pro-gramming guidance. This work was completed in part with resources provided by the University of ChicagoResearch Computing Center. SC was supported by NIH grant DP2AI117921. FW was supported by NIHgrant T32GM007281.
References
Annual report of the ICAO council: 2014 , 2014.
10 Supporting Information
Parameter values were selected to be consistent with influenza’s biology and to reproduce its major epi-demiological and evolutionary patterns (table 1). The population size N was chosen to minimize extinctionswhile also making efficient use of computational resources. The population birth/death rate γ = 1/30 year − reflects the global crude birth rate estimates of 34 births per 1000 [19].The proportionality constant m for calculating the between-region contact rate was calculated from thenumber of international air travel passengers reported by the International Civil Aviation Organizationdivided by the global population [54].We chose a baseline R = 1 .
8. Estimates of R from the first pandemic wave H3N2 in 1968 range from1.06-2.06 [27], and estimates of R for seasonal influenza range from 1.16 to 2.5, averaging approximately1.8 [28].The five-day duration of infection, 1 /ν , is based on estimates from viral shedding [29]. The transmissionrate, β , is calculated using the definition of R : R = βν + γ
13e chose the seasonal amplitude to ensure consistent troughs during the off-season in temperate populationswhile remaining within reasonable estimates of seasonal transmission rates [32].Mutational parameters were selected to maximize the number of simulations where evolution was influenza-like (figure S6). Mutations occur at a rate of 10 − mutations per day. This phenotypic mutation rate cor-responds to 10 antigenic sites mutating at 10 − mutations per day [6, 25]. The distance of each mutation issampled from a gamma distribution with parameters chosen to yield a mean step size of 0.6 and a standarddeviation of 0.3 antigenic units. These values correspond to a reduction in immunity of 4.2% for an averagemutation (SD = 2.1%). These mutation effect parameters give the gamma distribution an exponential-likeshape, so that most mutations yield small differences in antigenic fitness, while occasionally mutations willyield greater differences. We chose µ , δ mean , and δ sd so that the simulations would exhibit influenza-likebehaviour as consistently as possible (figure S6). Here, the criteria for influenza-like behavior included en-demism, reduced genealogical diversity (TMRCA <
10 years) [10], and a biologically plausible mean rate ofantigenic drift (1.01 antigenic units per year) [11] and incidence (9-15%) [26] (table 1, figure S6).
We examined two metrics that describe influenza’s evolutionary dynamics. For computational tractability,these metrics were calculated using a subset of strains sampled over course of the simulation. Strains weresampled proportionally to prevalence.To calculate antigenic lead, we first calculated the antigenic distance of each sampled strain from thefounding strain (figure 1 a ). We then fit a LOESS spline to these distances over time. The spline describesthe expected antigenic drift of circulating lineages at any point in time. Strains above the spline have driftedfarther than average and are considered antigenically leading. Strains below the spline have drifted lessthan average and are considered antigenically lagging. The antigenic lead in the tropics is calculated as theaverage antigenic distance to this spline for all sampled tropical strains.To calculate the fraction of the trunk in each population, we first identified strains that comprise thetrunk by tracing the lineage of strains that survive to the end of the simulation (figure 1 b ). Because multiplelineages may coexist at the end of the simulation, we excluded the last five years of strains from trunkcalculations. The fraction of the trunk in the tropics is calculated as the fraction of the time the trunk wascomposed of tropical strains. In the univariate sensitivity analyses, we created regional differences in host ecology by varying each of thefive ecological parameters individually ( R , population turnover rate, seasonality, population size, and initialconditions) while keeping all other parameters at their default values (table 2). To test the effects of regional R , we changed R only in the tropics. Similarly, we tested the effects of the rate of population turnover byvarying it only in the tropics. To investigate seasonality, we varied the seasonal amplitude of the transmissionrate in the temperate populations. (The transmission rate in the tropics was always constant over time.) Toexplore population size, we examined the ratio of tropical to temperate population sizes, keeping the globalpopulation constant. We initialized all simulations at the endemic equilibrium such that the total numberof initial infecteds was constant. We then scaled the number of initial infecteds in the tropics while keepingthe number in the two temperate demes the same. We ran twenty replicates for each unique combination ofparameter values and discarded any simulations in which the virus went extinct or the TMRCA exceeded10 years at any time in the 40-year simulation. The analyses for antigenic lead and trunk proportion wereperformed on the remaining simulations (figure S1). To test the robustness of the effects of individual parameters on the antigenic lead and the phylogenetictrunk, we simulated 500 points from a Latin hypercube with dimensions representing relative R , seasonality,relative population size, relative population turnover rate, and the fraction of initial infecteds in the tropics(figure S2). The ranges for each parameter (table S1) were chosen to remain within reasonable estimates.We simulated twenty replicates for each of the 500 unique parameter combination and discarded simulationsin which the virus went extinct or the TMRCA exceeded 10 years at any time in the 40-year simulation.14e performed an ANOVA on the remaining 4119 influenza-like simulations to determine each parameter’scontribution to the variance in antigenic lead and the fraction of the trunk in the tropics (table S2, S3). We assume that the host population supports a resident strain at the endemic equilibrium. We develop anexpression for the fitness of an invading mutant strain to explain how the selection coefficient of the mutantchanges with R .Here, S, I , and R represent the fraction of susceptible, infected, and recovered individuals. The birth rate γ and the death rate are equal, so the population size is constant. All individuals are born into the susceptibleclass. Transmission occurs at rate β , and recovery occurs at rate ν . dSdt = γ (1 − S ) − βSIdIdt = βSI − ( ν + γ ) IdRdt = νI − γR We solve for the endemic equilibrium values of S eq , I eq , R eq . dIdt = 0 = βS eq I eq − ( ν + γ ) I eq S eq = ν + γβ ≡ R R , the basic reproductive number, is defined as the number of secondary infections from a single infectedindividual in a totally susceptible population. Continuing to solve for I eq and R eq , we have dSdt = 0 = γ (1 − S eq ) − βS eq I eq I eq = γβ ( R − dRdt = 0 = νI eq − γR eq R eq = νβ ( R − R e for boththe resident and mutant strains. R e is the expected number of secondary infections from a single infectedindividual in a given population. We will use the relationship R e = SR The mutant strain is d antigenic units from the resident strain. The conversion factor between antigenicunits and infection risk is notated by c . Thus, the susceptibility to the mutant is given by min { cd, } , andimmunity to the mutant is max { − cd, } . For ease of notation, we assume cd ≤
1, and use k = 1 − cd .The fraction of the population immune to the invading strain is denoted by R (cid:48) . Note that the population isat the endemic equilibrium of the resident strain, and not the mutant. R (cid:48) = (1 − cd ) R eq = νβ ( R − k
15e start by allowing coinfection. The fraction of susceptibles to the mutant strain is given by S (cid:48) = 1 − R (cid:48) − N = 1 − νkβ ( R − − N For large N , we have S (cid:48) = 1 − νkβ ( R − dI (cid:48) dt = I (cid:48) [ βS (cid:48) − ( ν + γ )] dIdt = I eq [ βS eq − ( ν + γ )]To get the selection coefficient, we take the difference between the growth rates: s = [ βS (cid:48) − ( ν + γ )] − [ βS eq − ( ν + γ )]= β − γk ( R − − βR Recall that β = ( ν + γ ) R s = ( ν + γ ) R − νk ( R − − ( ν + γ )Simplifying, s = ( νcd + γ )( R −
1) (S1)Now disallowing coinfection, we have S (cid:48) = 1 − R (cid:48) − I eq − I (cid:48) = 1 − νβ ( R − k − γβ ( R − − N For large N , S (cid:48) = 1 − ( R − νk + γβ ]Using the same arithmetic as in the case with coinfection, it follows that16 = β − ( νk + γ )( R − − βR Simplifying, s = ( νcd )( R −
1) (S2)In summary, the selection coefficient of an invading mutant strain increases linearly with the R , whichis shared by both strains. The slope of this relationship is proportional to the distance d between the twostrains in antigenic space (figure S4). Naturally, relationship between the selection coefficient on the distance d between strains depends on the functional relationship between antigenic distance and immunity. However,the linear dependence of the selection coefficient on R holds as long as the functional relationship betweenantigenic distance and immunity is independent of R . R In the
SIR model, the force of infection is F = βI where β is the transmission rate and I is the fraction of infecteds. At the endemic equilibrium, the cumulativefraction of seropositive individuals at a given age a is f ( a ) = 1 − exp( − βI eq a )= 1 − exp( − R ( ν + γI eq a )where ν is the recovery rate and γ is the birth/death rate. I eq , the fraction of infecteds at the endemicequilibrium, is given by I eq = γβ ( R − R = 1 . R = 2 .
16. The difference in the percentage of seropositive two-year-olds between the two groups is approx-imately 2.3%. The sample size in each group required to detect a difference f ( a ) − f ( a ) with α confidenceand 1 − β power is N = f (1 − f ) + f (1 − f )( f − f ) (Φ α/ + Φ β ) For legibility, f i ( a ) is written as f i . To detect a 20% difference in R between two populations with 0.05significance and 0.80 power, we would require a sample of at least 1503 individuals in both groups.
11 Supplemental tables and figures R − (cid:15) ) in temperate populations 0.0 − − γ ) 0.5 − I ) in tropics 0.0 − > F)Relative N < I in tropics 1 0.62 0.002 0.62 38.94 < R < < < > F)Relative N < I in tropics 1 0.02 < R < < < a ) the antigenic leadand ( b ) the fraction of the phylogenetic trunk in the tropics. In each column of plots, only the parameterindicated on the x-axis is varying; all others are held constant at the default value. Each point representsthe mean value over a single simulation. Blue lines indicate linear least squares regression. The dashedlines represent the null hypotheses where ( a ) the trunk is distributed equally among the three regions or( b ) tropical strains are neither antigenically ahead or behind. ( c ) Number of simulations that went extinct(red), exceeded the TMRCA limit (green), or were suitable for analysis (blue).19igure S2: Multivariate sensitivity analysis showing effects of individual parameters on the ( a ) antigeniclead and ( b ) the fraction of the phylogenetic trunk in the tropics. Horizontal axes are projections of a Latinhypercube with dimensions corresponding to the five parameters indicated. Each point shows the mean valueover a single simulation. The dashed lines represent the null hypotheses where ( a ) the trunk is distributedequally among the three regions or ( b ) tropical strains are neither antigenically ahead or behind. Pearson’scorrelation coefficients and associated 95% confidence intervals are indicated.20 ll lllll llll lllll llll llll llll l l R m ea n a n ti g e n i c d r i f t r a t e ( a n ti g e n i c un it s p e r y ea r) Figure S3: Effect of R on antigenic drift in a single deme. Each point shows the mean antigenic driftrate from a single simulation. The blue line represents linear least squares regression, and the dashed lineindicates the empirical estimate of the rate of antigenic drift for H3N2 [11]. Pearson’s r = 0 . p < . .5 2.0 2.5 3.0 . . . . R s e l ec ti on c o e ff i c i e n t s Figure S4: Relationship between R and the selection coefficient for an invading strain with the resident atendemic equilibrium. The relationship for three different antigenic distances d between the invading strainand the resident strain is shown: 0.6 (solid line), 1.0 (dashed line), or 2.0 (dotted line) antigenic units. du r a ti on o f s i m u l a ti on ( y ea r s ) a ) 12345 0.0 0.3 0.6 0.9fraction of initial infecteds in tropics du r a ti on o f s i m u l a ti on ( y ea r s ) −0.10 −0.05 0.00 0.05 0.10tropics antigenic lead( b ) Figure S5: Effect of initial conditions on the antigenic lead and the fraction of the trunk in the tropics earlyin the simulation. Each square represents the average value for n=5 to 11 replicate simulations.22 .20.40.60.20.40.60.20.40.6 0.25 0.50 0.75 0.25 0.50 0.75 0.25 0.50 0.75 0.25 0.50 0.75 0.25 0.50 0.75 d mean d mean d mean d mean d mean d s d d s d d s d m = m = m = Figure S6: Sensitivity of influenza-like behaviour to changes in the mutational parameters, the mutationrate µ , mean mutation size δ mean , and standard deviation of the mutation size δ sd . Within each plot, eachsquare represents ten replicate simulations. Each row of plots shows results from simulations using differentmutation rates µ . The number of simulations where the virus went extinct is shown in the second column ofplots, and the number of simulations where the viral population exceeded a TMRCA of 10 years is shown inthe third column of plots. The remaining simulations are considered influenza-like and are shown in the firstcolumn of plots. The reported mean antigenic drift rates and prevalences are averaged over the influenza-likereplicates. The color scales for mean antigenic drift and incidence are centered (white) at the observed valuesfor H3N2 (table 1). 23 . . . . . . age (years) fr ac ti on s e r opo s iti v e R = 1.8 R = 2.16difference Figure S7: Theoretical increase in the fraction of seropositive individuals with age with R = 1 . R = 2 . R b a s e li n e R a ) 1.41.61.82.0 0.8 0.9 1.0 1.1 1.2relative R b a s e li n e R −0.2 −0.1 0.0 0.1 0.2tropics antigenic lead( b ) Figure S8: Lowering baseline R decreases the effect of relative R on the fraction of the trunk and antigeniclead in the tropics. ( a ) Effects of baseline and relative R on the fraction of the trunk in the tropics. Blueindicates that the phylogenetic trunk is located in the tropics less than 1/3 of the time, and red indicatesthat the trunk is the tropics more than 1/3 of the time. ( b ) Effects of baseline R and relative R onantigenic lead in the tropics. Blue indicates that tropical strains are on average ahead antigenically relativeto other strains, and red indicates that tropical strains are behind antigenically. Each square representsan average from 1 to 14 replicate simulations. Grey squares indicate parameter combinations where all oftwenty attempted simulations either went extinct or exceeded the TMRCA threshold of 10 years.24igure S9: Univariate sensitivity analysis using a fully connected 5-deme model ( a ) showing the effects ofindividual parameters on ( b ) the antigenic lead and ( c ) the fraction of the phylogenetic trunk in the tropics.By default, the tropics have a population size that is twice as large as any single temperate deme. In eachcolumn of plots, only the parameter indicated on the x-axis is varying; all others are held constant at thedefault value. Each point represents the mean value over a single simulation. Blue lines indicate linearleast squares regression. The dashed lines represent the null hypotheses where ( b ) the trunk is distributedproportionally to the default population size among the regions or ( c ) tropical strains are neither antigenicallyahead or behind. ( d ) Number of simulations that went extinct (red), exceeded the TMRCA limit (green),or were suitable for analysis (blue). 25igure S10: Univariate sensitivity analysis using a fully connected 6-deme model ( a ) showing the effectsof individual parameters on ( b ) the antigenic lead and ( c ) the fraction of the phylogenetic trunk in eachof the two tropical demes. By default, all demes have the same population size. In each column of plots,only the parameter indicated on the x-axis is varying; all others are held constant at the default value.Each point represents the mean value over a single simulation. Black points show results from one tropicaldeme and red points from the other. Blue lines indicate linear least squares regression to the combineddata from both tropical demes. The dashed lines represent the null hypotheses where ( b ) the trunk isdistributed proportionally to the default population size among the regions or ( c ) tropical strains are neitherantigenically ahead or behind. ( dd