[PDF] Explaining the geographic origins of seasonal influenza A (H3N2)

Abstract

Most antigenically novel and evolutionarily successful strains of seasonal influenza A (H3N2) originate in East, South, and Southeast Asia. To understand this pattern, we simulated the ecological and evolutionary dynamics of influenza in a host metapopulation representing the temperate north, tropics, and temperate south. Although seasonality and air traffic are frequently used to explain global migratory patterns of influenza, we find that other factors may have a comparable or greater impact. Notably, a region's basic reproductive number ( R 0 ) strongly affects the antigenic evolution of its viral population and the probability that its strains will spread and fix globally: a 17-28% higher R 0 in one region can explain the observed patterns. Seasonality, in contrast, increases the probability that a tropical (less seasonal) population will export evolutionarily successful strains but alone does not predict that these strains will be antigenically advanced. The relative sizes of different host populations, their birth and death rates, and the region in which H3N2 first appears affect influenza's phylogeography in different but relatively minor ways. These results suggest general principles that dictate the spatial dynamics of antigenically evolving pathogens and offer predictions for how changes in human ecology might affect influenza evolution.

Full PDF

EExplaining the geographic origins of seasonal inﬂuenza A (H3N2)

Frank Wen ∗ , Trevor Bedford , and Sarah Cobey Department of Ecology and Evolution, University of Chicago, 1101 E. 57th St., Chicago,IL 60637 Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research CenterNovember 6, 2018

Most antigenically novel and evolutionarily successful strains of seasonal inﬂuenza A (H3N2) originate inEast, South, and Southeast Asia. To understand this pattern, we simulated the ecological and evolutionarydynamics of inﬂuenza in a host metapopulation representing the temperate north, tropics, and temperatesouth. Although seasonality and air traﬃc are frequently used to explain global migratory patterns ofinﬂuenza, we ﬁnd that other factors may have a comparable or greater impact. Notably, a region’s basicreproductive number ( R ) strongly aﬀects the antigenic evolution of its viral population and the probabilitythat its strains will spread and ﬁx globally: a 17-28% higher R in one region can explain the observedpatterns. Seasonality, in contrast, increases the probability that a tropical (less seasonal) population willexport evolutionarily successful strains but alone does not predict that these strains will be antigenicallyadvanced. The relative sizes of diﬀerent host populations, their birth and death rates, and the region inwhich H3N2 ﬁrst appears aﬀect inﬂuenza’s phylogeography in diﬀerent but relatively minor ways. Theseresults suggest general principles that dictate the spatial dynamics of antigenically evolving pathogens andoﬀer predictions for how changes in human ecology might aﬀect inﬂuenza evolution. Antigenic variants of seasonal inﬂuenza continuously emerge and escape human immunity in a process knownas antigenic drift. These drifted strains are less easily recognized by host immunity and therefore have atransmission advantage. More antigenically advanced strains are also more likely to spread globally andsuccessfully perpetuate the evolutionary lineage of subsequent variants.Asia has long been recognized as a major source of not only new inﬂuenza subtypes but also new strainsof seasonal inﬂuenza [1–4]. Inﬂuenza A/H3N2, A/H1N1, and two B lineages currently circulate in the humanpopulation, with the H3N2 subtype causing the most disease [5]. Phylogeographic analyses show that East,South, and Southeast Asia contribute disproportionately to the evolution of seasonal H3N2, exporting mostof the evolutionarily successful strains that eventually spread globally [6–10]. The trunk of H3N2’s phylogenytraces the evolutionary path of the most successful lineage and was estimated to be located in Asia 87% ofthe time from 2000 to 2010 [10]. Additionally, strains of H3N2 isolated in E-SE Asia appear to be moreantigenically advanced, with new antigenic variants emerging earlier in E-SE Asia than in the rest of theworld [7, 11]. These observations suggest that ecological diﬀerences between regions, such as climate andhuman demography, aﬀect the local antigenic evolution of H3N2, which in turn shapes its global migratorypatterns. Here we ask what ecological factors might cause disproportionate contributions of particular hostpopulations to the evolution of an inﬂuenza-like pathogen. This information may be immediately useful ∗ [email protected] a r X i v : . [ q - b i o . P E ] A ug or viral forecasting. Over the long term, it could help predict changes in inﬂuenza’s phylogeography andidentify source populations to improve global vaccination strategies.The conspicuous role of Asia in H3N2’s evolution has been attributed to the seasonal nature of inﬂuenzain temperate regions [2, 6–9, 12]. Approximately 85% of Asia’s population and 48% of the global populationresides in a climatically tropical or subtropical region [13] where semiconnected host populations supportasynchronous epidemics that enable regional persistence year-round [7, 12, 14]. Uninterrupted transmissionmight increase both the eﬃciency of selection and the probability of strain survival and global spread.By contrast, transmission bottlenecks from late spring through autumn in temperate populations necessarilylimit local evolution and reduce opportunities for strain emigration [15,16]. Smaller contributions from othertropical and subtropical regions might arise from the weaker connectivity of their host populations [9,17,18].Although seasonality clearly aﬀects temporal patterns of viral migration [8], a robust explanation fordiﬀerences in regions’ long-term contributions to the evolution of H3N2 would consider the eﬀects of seasonalvariation in transmission in light of other potentially inﬂuential diﬀerences among host populations, including: Host population size.

E-S-SE Asia alone contains more than half of the global population [19]. Largerhost populations should sustain larger viral populations, and in the absence of other eﬀects, they shouldcontribute a proportionally larger fraction of strains that happen to spread globally. Additionally, if raremutations limit the generation of antigenic variants, larger populations could contribute a disproportionatenumber of antigenically novel strains with high ﬁtness.

Host population turnover.

Birth rates have historically been higher in E-S-SE Asia than in most tem-perate populations [19]. Demographic rates inﬂuence the replenishment of susceptibles and loss of immuneindividuals, thereby modulating selection for antigenic change. Faster replenishment of susceptibles increasesprevalence, and thus viral abundance and diversity, but weakens the ﬁtness advantage of antigenic variants.A more immune population imposes greater selection for antigenic change but supports a smaller, less diverseviral population. Thus, the rate of antigenic evolution may vary in a complex way with the rate of hostpopulation turnover [20].

Initial conditions.

H3N2 ﬁrst emerged in or near Hong Kong in 1968. The region in which a subtypeemerges may eﬀectively give the viral population a head start on evolution. The ﬁrst epidemic will almostcertainly occur in this region, and viruses here will be the ﬁrst to experience selective pressure for antigenicchange. If host migration rates are low and the founding viral population persists, this antigenic lead couldbe maintained or even grow in time.

Transmission rates.

Diﬀerences in human behaviour can aﬀect transmission rates. The transmission rateaﬀects a strain’s intrinsic reproductive number ( R ), the expected number of secondary cases caused bya single infection in an otherwise susceptible population. Diﬀerences in regional R could aﬀect evolutionin at least two ways. Higher R increases the equilibrium prevalence, increasing the probability that rarebeneﬁcial mutations will appear. In addition, the rate of antigenic drift increases with R in models thatinclude mutation as a diﬀusion-like process [10, 21–23]. A higher intrinsic reproductive number in onepopulation could thus accelerate the emergence of novel mutants in that area.To understand the potential eﬀects of these ﬁve factors on the evolution of H3N2 in space, we simulatedan inﬂuenza-like pathogen in a simpliﬁed representation of the global human metapopulation. The simulatedmetapopulation consisted of three connected host populations, representing the temperate north, tropics, andtemperate south. Conceptually, the tropics in the model approximate Asia, where most of the populationis tropical or subtropical [13] and epidemics are asynchronous, and exclude other less connected tropicaland subtropical populations on other continents [9, 17, 18]. The two temperate populations approximatenorthern and southern populations where inﬂuenza is strongly seasonal. The model can also be generalizedto represent three arbitrary populations by reducing seasonality.We analysed the eﬀects of these factors on two key metrics of inﬂuenza’s spatial evolutionary and antigenicdynamics. The ﬁrst metric measures the proportion of the trunk of the phylogeny present in the tropics(ﬁgure 1 a ). The phylogenetic trunk represents the most evolutionarily successful lineage that goes on toseed all future outbreaks. The second metric measures the degree to which tropical strains are antigenicallyadvanced (ﬁgure 1 b ). Phenotypically, antigenic dissimilarities can be quantiﬁed as distances in antigenicspace using pairwise measures of cross reactivity [11, 24]. Our model uses an analogous measure of antigenicdistances, allowing us to determine the relative antigenic advancement of strains from each region. Weanalysed these two metrics from simulations to test whether any of the ﬁve ecological factors could createspatial evolutionary patterns of a similar magnitude to the observed data.2 Results

We simulated an individual-based model that included ecological and evolutionary dynamics in a metapop-ulation with three demes [25]. By default, in one deme, transmission rates are constant throughout the year,and in the two others, transmission rates vary sinusoidally with opposing phases. Viral phenotypes occuras points in 2D Euclidean space, and mutation displaces phenotypes in this 2D space according to a ﬁxedkernel [25]. This space is analogous to an antigenic map constructed from pairwise measurements of cross-reactivity between inﬂuenza strains using a hemagglutination inhibition (HI) assay [11, 24]. Susceptibilityto infection is proportional to the distance in antigenic space between the challenging strain and the neareststrain in the host’s infection history, giving distant or antigenically advanced strains greater transmissiveadvantage.Figure 1: Representative output showing inﬂuenza-like behaviour from a sample simulation using the defaultparameters (table 2). Statistics reported here are based on 53 replicate simulations. ( a ) The phylogeny of thepathogen is reconstructed explicitly from the recorded ancestry of simulated strains. Branches are coloredby region indicated in panel d . The trunk is determined by tracing the recorded ancestry of surviving strainsat the end of the simulation. Side branches show lineages that go extinct. ( b ) Viruses evolve antigenicallyaway from the founding strain in a canalized fashion. On average, the antigenic distance from the foundingstrain follows the trajectory indicated by the black LOESS spline ﬁtted to viruses from all three regions. Atany given point in time, strains above this line have drifted farther from the founder compared to average,and are thus considered antigenically leading. Conversely, strains below this line are considered antigenicallylagging. Antigenic lead is calculated as the distance to the spline in antigenic units. ( c ) Prevalence ofinfection over time for each region. ( d ) Depiction of the totally connected model population, composed ofthe temperate north, tropics, and temperate south.The model reproduces the characteristic ecological and evolutionary features of H3N2, except for theantigenic lead (table 1), under the default parameters (table 2). We restricted our analyses to simulationswhere the virus remained endemic and where the time to the most recent common ancestor (TMRCA) never3xceeded 10 years during the 40 years of simulation. We chose this cutoﬀ because in some simulations, theviral population developed unrealistically deep branches. In excluding extinctions and excessive diversity(branching), we assume that H3N2’s historical evolutionary patterns represent the virus’ likeliest evolutionarydynamics. Of 100 replicate simulations, the viral population went extinct in 18 cases and exceeded theTMRCA threshold 29 times, leaving 53 simulations for analysis. The model tracks the ancestry of individualstrains, allowing us to explicitly reconstruct the phylogeny of the virus and the geographic location oflineages. The phylogeny has the characteristically well-deﬁned trunk with short branches of the H3N2hemagglutinin (ﬁgure 1). This shape arises due repeated selective sweeps of antigenic variants, which reducesstanding diversity; the average TMRCA across replicates was 3.72 years (SD = 0.26), comparable to empiricalestimates of 3.89 years [10]. The antigenic distance from the founder increased linearly with time (ﬁgure 1),characteristic of H3N2’s canalized antigenic evolution [24, 25]. The mean antigenic drift across replicatesimulations was 0.97 antigenic units per year (SD = 0.11), comparable to observed rates of 1.01 antigenicunits per year [11]. The mean annual incidence was 9.1% (SD = 0.8%). Reported annual incidence acrossall subtypes of seasonal inﬂuenza range from 9-15% [26]. Since we only modeled one lineage (e.g., the H3N2subtype), the low estimate from the model is comparable to observed incidence.Although all three host populations were the same size, the tropical strains were on average more evolu-tionarily successful. The phylogenetic trunk traces the most evolutionarily successful lineage and was locatedin the tropics 77% (SD = 13%) of the time, comparable to the observed 87% of H3N2’s trunk in E-S-SEAsia between 2000-2010 [10]. However, the default parametrization does not produce an antigenic lead inany population, despite the observed antigenic lead of Asian strains (table 1). Antigenic cartography showsthat while H3N2 drifts on average at 1.01 antigenic units per year globally [11], Asian strains tend to befarther drifted at any given time, and the region is thus considered to lead antigenically [7, 11].Table 1: Properties of the default modelStatistic Model mean ± SD Observed (Ref)Annual incidence 0.091 ± − ) 0.97 ± ± ± ± R ) 1.8 [27, 28]Duration of infection ν N

45 million (see ESM)Birth/death (turnover) rate γ − [19]Mutation rate µ − day − (see ESM)Mean mutation step size δ mean δ sd c m − day − (see ESM)Seasonal amplitude (cid:15) .2 Seasonality We ﬁrst varied the strength of seasonal forcing, holding other parameters at their default values. Seasonalityby itself in the two temperate populations could not cause the tropics to produce more antigenically advancedstrains; however, seasonality did cause the tropics to contribute a greater fraction of evolutionarily successfulstrains (ﬁgure 2). By linear regression, we estimate that the trunk would spend 87% of its time in the tropics(the same fraction that is observed in Asia [10]) with a seasonal transmission amplitude ( (cid:15) ) of 0.19 (95% CI:0.18, 0.20). Reduced seasonal forcing in the temperate populations equalized the fraction of the trunk ineach population. In multivariate sensitivity analysis, the amplitude of seasonal transmission accounted for33% of the variation in the tropical fraction of the trunk (electronic supplementary material, ﬁgure S2, tableS2). This result suggests that seasonal bottlenecks in temperate populations discourage seasonal strainsfrom ﬁxing globally, in agreement with other models [15]. However, seasonality alone could not explain anyvariation in the tropic’s antigenic lead (electronic supplementary material, ﬁgure S2, table S3). We thereforehypothesized that ecological factors besides seasonality must contribute to regional diﬀerences in relativeantigenic ﬁtness.

Increasing R in the tropics relative to the temperate populations caused the tropics to produce strains thatled antigenically while also preserving the tropics’ contribution to the trunk (ﬁgure 3). Linear regressionimplies that a 28% (95% CI: 25%, 30%) increase in R in the tropics causes the tropics to produce strainsthat are, on average, 0.25 antigenic units ahead of global mean, reproducing the observed antigenic leadin Asia [7, 11]. We also estimate that a 17% increase in R (95% CI: 15%, 19%) causes the phylogenetictrunk to be located in the tropics 87% of the time, reproducing the observed fraction of the H3N2 trunk inAsia [10].The eﬀects of R on antigenic lead were robust to changes in other ecological variables and over a rangeof baseline values of global R . When we varied the other parameters (table 2), relative R in the tropicsaccounted for 77% of the variance in antigenic lead, making it the best predictor of antigenic lead in thetropics (electronic supplementary material, ﬁgure S2, table S3). The fraction of the trunk in the tropics alsoincreased with the relative R , although R explained less of the variation in trunk proportion (41%), dueto the eﬀect of seasonality (electronic supplementary material, ﬁgure S2, table S2).Notably increased R in one deme was suﬃcient by itself to make strains more evolutionarily successfuland antigenically advanced. When we removed seasonality altogether to model three climatically identicalpopulations, the population with the highest R produced both the most antigenically leading and evolu-tionarily successful strains (ﬁgure 4). Thus, higher R alone in one region can cause it to attain an antigeniclead and fraction of the trunk as large as is observed in Asia.To better understand why increasing regional R causes that region to produce more antigenically ad-vanced strains, we examined the eﬀect of R on antigenic evolution in a single deme. Simulations showedthat increasing R increases the rate of antigenic drift (electronic supplementary material, ﬁgure S3). Toinvestigate further, we derived an analytic expression for the invasion ﬁtness of a novel mutant in a popu-lation at the endemic equilibrium (electronic supplementary material, equation S1). When the resident andmutant strains have the same intrinsic ﬁtness ( R ), the growth rate of an antigenically distinct, invadingmutant increases linearly with R (electronic supplementary material, ﬁgure S4). This linearity holds aslong as the conversion between antigenic distance and host susceptibility (equation 3) is independent of R .As R increases, not only do mutants invade faster, but the invasion speed increases faster as a function ofantigenic distance (electronic supplementary material, ﬁgure S4).Although seasonality alone did not aﬀect antigenic lead, the eﬀects of R on antigenic lead could beinﬂuenced by seasonality (ﬁgure 4). Introducing seasonality in the temperate populations reduced diﬀerencesin antigenic phenotype between regions. When tropical strains were antigenically ahead of temperate strains(due to higher tropical R ), introducing seasonality reduced the tropics’ antigenic lead. When tropical strainswere antigenically behind temperate strains (due to lower tropical R ), introducing seasonality reducedthe antigenic lag. Two factors explain the equalizing eﬀect of seasonality on antigenic phenotype. First,higher contact rates during transmission peaks in the two temperate populations increase the rate of strainimmigration from the tropics. Second, seasonal troughs in prevalence allow tropical strains to invade moreeasily due to reduced competition with local strains. 5 ll lll llll llllll llllll lllllll lll llll lll l lll lll lll ll llllllll llll lllll llllll llllll lllllll lll lllll lllll lllll llll t r op i c s t r unk p r opo r ti on ( a ) lll lll llll llllll llllll lllllll lll llll lll l lll lll lll ll llllllll llll lllll llllll llllll lllllll lll lllll lllll lllll llll −0.4−0.20.00.20.40.00 0.05 0.10 0.15 0.20 0.25seasonal amplitude t r op i c s a n ti g e n i c l ea d ( b ) Figure 2: Seasonal amplitude (cid:15) in the temperate populations increases the tropics’ contribution to the mostevolutionarily successful lineage but alone does not aﬀect regional diﬀerences in antigenic advancement.Transmission rates β in the temperate north and south oscillate sinusoidally in opposite phase, with amplitude (cid:15) . All other parameters remain at their default values (table 2). ( a ) Eﬀects of seasonality on the fractionof the trunk in the tropics (Pearson’s r = 0 . p < . R = 0 . b )Eﬀects on seasonality on the antigenic lead of the tropics (Pearson’s r = − . p = 0 . R = 0 . ll ll lllllll llllll lllll llll lllllllll lllllll llll ll ll llll llll ll llll llllll lllll llllll lllllll R t r op i c s t r unk p r opo r ti on ( a ) ll ll lllllll llllll lllll llll lllllllll lllllll llll ll ll llll llll ll llll llllll lllll llllll lllllll −0.4−0.20.00.20.4 0.8 0.9 1.0 1.1 1.2relative R t r op i c s a n ti g e n i c l ea d ( b ) Figure 3: Increased R in the tropics increases the tropics’ contribution to the most evolutionarily successfullineage and the antigenic advancement of tropical strains. Relative R is calculated as R in the tropicsdivided by R in the temperate regions. R in the tropics was varied while R in the temperate regions waskept at its default. Other parameters were also kept at their default values (table 2). ( a ) Eﬀect of R inthe tropics on the fraction of the trunk in the tropics (Pearson’s r = 0 . p < . R = 0 . b ) Eﬀectof R in the tropics on the antigenic lead in the tropics (Pearson’s r = 0 . p < . R = 0 . .4 Demographic rates, population size, and initial conditions Other ecological factors aﬀected regional contributions to evolution but could not reproduce the observedpatterns as well as diﬀerences in R (electronic supplementary material, ﬁgures S1, S2). Notably, strainswere slightly more antigenically advanced in older populations (electronic supplementary material, ﬁgure S1).When the rate of population turnover in the tropics was half that in the temperate regions, the tropics ledby 0.04 antigenic units (SD = 0.03). Larger populations generally contributed more to the trunk, althoughthere was much variation that population size alone did not explain (electronic supplementary material,ﬁgures S1, S2 and table S2, S3). Initial conditions did not have a lasting eﬀect (electronic supplementarymaterial, ﬁgure S5). Inﬂuenza A/H1N1 and inﬂuenza B both evolve slowly compared to H3N2 and are suspected to have lower R [10, 11]. Speciﬁcally, H1N1 drifts at a rate of 0.62 antigenic units per year, and the B/Victoria andYamagata strains drift at 0.42 and 0.32 antigenic units per year respectively [11]. H1N1 and B viruses arealso less apt to have Asian origins than H3N2 [10]. When we simulate with lower baseline R , we ﬁnd thatdiﬀerences in R between regions have a weaker inﬂuence on spatial patterns of evolution (electronic sup-plementary material, ﬁgure S8). Based on the relationship between mean R and antigenic drift (electronicsupplementary material, ﬁgure S3), we would expect seasonal H1N1, for example, to have an R of 1.6. Forthis R , a 17% increase in R causes the tropics to occupy only 79% (versus 87% for H3N2-like R of 1.8)of the trunk, and a 28% increase in R causes the tropics to lead by 0.20 (versus 0.25 for H3N2) antigenicunits. R s ea s on a l a m p lit ud e a ) 0.000.050.100.150.200.25 0.8 0.9 1.0 1.1 1.2relative R s ea s on a l a m p lit ud e −0.4 −0.2 0.0 0.2 0.4tropics antigenic lead( b ) Figure 4: Seasonality in temperate populations has an equalizing eﬀect on antigenic diﬀerences. Relative R is calculated as R in the tropics divided by R in the temperate regions. ( a ) Eﬀects of seasonality and R on the fraction of the trunk in the tropics. Blue indicates that the phylogenetic trunk is located in thetropics less than 1/3 of the time, and red indicates that the trunk is the tropics more than 1/3 of the time.( b ) Eﬀects of seasonality and R on antigenic lead in the tropics. Blue indicates that tropical strains are onaverage ahead antigenically relative to other global strains and red indicates that tropical strains are behindantigenically. Each square averages 1 to 17 replicate simulations. In our model, we ﬁnd that the simplest explanation for why a host population produces more antigenicallynovel and evolutionarily successful strains than other populations is that its strains have a higher intrinsic7tness, or R . The strong eﬀect of regional R on spatial patterns of viral evolution is caused by the eﬀectof R on antigenic drift. Higher regional R facilitates invasion of antigenically novel strains, resulting infaster antigenic drift. Seasonality reduces the rate at which temperate populations export strains that areevolutionarily successful, but seasonality alone cannot explain regional diﬀerences in the production of strainsthat are antigenically novel. Size and age can inﬂuence global patterns too, but to a lesser extent: largerpopulations export more strains that ﬁx, and populations with slower replenishment of susceptibles increasethe rate of antigenic evolution. These last two eﬀects are sensitive to changes in seasonality and R . Theseresults highlight the relationship between human ecology and inﬂuenza’s phylogeography. Regions with hightransmission rates may be expected to contribute disproportionately to inﬂuenza’s evolution and may alsobe ideal targets for vaccine campaigns. Accordingly, changes in human ecology can be expected to alterinﬂuenza’s phylogeography. These generalizations assume that H3N2 will evolve mostly as it has, with highstrain turnover and limited genetic variation at any time, but more complex dynamics may be possible.To make general predictions, we used a simple model. Although our three-deme metapopulation pre-vents us from replicating inﬂuenza’s phylogeographic dynamics precisely, the model nonetheless reveals howecological diﬀerences in populations create spatial patterns in the evolution of an inﬂuenza-like pathogen.Simulations with more complex metapopulation models showed the same trends as the simple three-dememodel (ﬁgures S9, S10), suggesting that our results are robust to changes in metapopulation populationstructure.These results immediately raise the question of whether there is evidence of regional variation in R .Low reporting rates and antigenic evolution make the R of inﬂuenza diﬃcult to measure with traditionalmethods, but we can conjecture from several lines of evidence. Low absolute humidity favors transmissionvia aerosol in experimental settings [33] and inﬂuences the timing of the inﬂuenza season in the UnitedStates [34]. Based on absolute humidity and aerosol transmission alone, these results suggest that R oftropical and subtropical Asia would be lower than in temperate latitudes. However, in Vietnam the onset ofinﬂuenza-like illness is associated with periods of high humidity [35]. This observation suggests that humidityis not the dominant driver of inﬂuenza transmission, at least in this region.Contact rates also inﬂuence transmission [36]. Multiple studies have detected a signiﬁcant eﬀect of schoolclosure on inﬂuenza spread [37–39], although this trend is not without exception [40]. Households alsoinﬂuence risk: after one household member is infected, the average risk of secondary infection in a householdcontact is 10% [41]. Diﬀerences in classroom and household sizes may thus inﬂuence local transmission, andboth are higher in, for instance, China and India than in Europe and the U.S. [42, 43]. Contact surveysreport higher contact rates in Guangdong, China, than in European communities, whereas those in Vietnamare lower, although diﬀerences may arise from diﬀerences in survey design [44–46]. These surveys notablymiss non-social, casual contacts (e.g., shared cafeterias and elevators) that might be important for inﬂuenzatransmission.Diﬀerences in local transmission rates may not scale: high rates of local transmission may be oﬀset orattenuated by the structure of contact networks over larger areas. At the regional level, commuter and airpassenger ﬂows aﬀect the spread of inﬂuenza epidemics, suggesting that adults are important to the long-range dispersal of the virus [12,18]. The frequency of long-distance contacts diﬀers between communities [44].Although sensitivity of R to network topology is well known theoretically [47,48], there is a need to integratethe features of local and regional empirical transmission networks to infer large-scale diﬀerences R .Empirical estimates of R are in theory attainable from seroprevalence. Under a simplistic, single-strain SIR model, which assumes random mixing and no maternal immunity, diﬀerences in R should appearin diﬀerences in seropositivity by age. For instance, if R =1.8, approximately 5.1% of 2 year-olds wouldbe seropositive, whereas 7.4% would be seropositive if R were 20% higher. R variation in this rangecould be detected by sampling as few as 1500 2 year-olds in each population. Detailed surveys of H3N2seropositivity by age cohort exist for some European countries [49, 50] but show much faster increases inseropositivity with age than expected under the SIR model: 100% of tested children are seropositive toH3N2 by age 7 in the Netherlands and by age 12 in Germany. This discrepancy between theory and datamay be due to antigenic drift resulting in higher attack rates [10]. The spatial diﬀerence in seroprevalencemay also reﬂect greater contact rates among school-aged children [45] and highlights the possibility thatdiﬀerences in exposure rates at young ages do not reﬂect mean diﬀerences in the populations. Such eﬀectsmay be reduced by examining seroprevalence at older ages, but these estimates must balance a tradeoﬀbetween minimizing age-related correlations in transmission rates and increasing sample sizes required to8etect asymptotically small diﬀerences in seropositivity. Another potential approach to measuring R is toreﬁne estimates of annual incidence in diﬀerent populations. Estimates of R based on annual incidencewould have to incorporate the histories of recent circulating strains, survey timing and titer dynamics, andvaccination in each population.A greatly reduced birth rate confers a slight antigenic lead, but actual diﬀerences in birth rates betweenregions appear too small to explain Asia’s observed lead. Current birth rates across most of Europe, China,and the United States are within 10% of each other [19]. Birth rates are almost twice as high in someSE Asian countries, including Cambodia, Laos, and the Philippines. The highest birth rates are foundin Africa and the Middle East, and are three to four times higher than birth rates in the United Statesand China. Our model suggests that these regions should contribute relatively less to inﬂuenza’s antigenicevolution, assuming the diﬀerences in population structure are not associated with higher R , and ignoringother diﬀerences. However, taking age-assortative mixing into account may negate this expectation, withyounger populations having increased R [48, 51] thus contributing more to antigenic evolution.We expect these results to apply to other antigenically varying, fast-evolving pathogens, including othertypes of inﬂuenza. Enterovirus-71 circulates globally, and its VP1 capsid protein experiences continuouslineage replacement through time, similarly to H3N2 hemagglutinin [52]. Norovirus also demonstrates rapidantigenic evolution by amino acid replacements in its capsid protein [53]. We might expect that areaswith high transmission contribute disproportionately to the antigenic evolution and global spread of thesepathogens. In addition, when we simulate with lower R , we ﬁnd that diﬀerences in R between regionsinﬂuence spatial patterns of antigenic variation less (electronic supplementary material, ﬁgure S8). This mayexplain why inﬂuenza A H1N1 and inﬂuenza B, which are suspected to have lower R [10, 11], are less aptto have Asian origins than H3N2 [10]. We implemented an individual-based

SIR compartmental model of an inﬂuenza-like pathogen, originallydescribed by Bedford et al. [25]. In this model, a global metapopulation is composed of three connectedpopulations, representing tropics and temperate north and south. Individuals’ compartments are updatedusing a τ -leaping algorithm. Within a region i , the force of infection is given by F i ( t ) = β i ( t ) I i N i (1)where I is the number of infected hosts. Between regions i and j , the force of infection is given by F ij ( t ) = mβ j ( t ) I i N j (2)where region i is where the infection originates and region j is the destination. Here, m is a scaling factor forinterregional transmission, and β j is the transmission rate of the destination region. Transmission rates inthe seasonal north and south oscillate sinusoidally in opposite phase with amplitude (cid:15) . After recovery frominfection, a host acquires complete immunity to viruses with that speciﬁc antigenic phenotype. Hosts thatclear infection accumulate an infection history that deﬁnes their immunity. In a contact event, the distancesbetween the infecting viral phenotype and each phenotype in the susceptible host’s immune history arecalculated. The probability of infection after contact is proportional to the distance d to the closest phenotypein the host’s immune history. An individual’s risk of infection by such a strain isRisk = min { , cd } (3)where the proportionality constant for converting antigenic distance to a risk of infection c = 0 .

07 [25]; inother words, one unit of antigenic distance corresponds to 7% reduction in immunity. The linear relationship c between antigenic distance and susceptibility derives from studies of vaccine eﬃcacy [25, 30, 31].Antigenic phenotypes are represented by points in a two-dimensional Euclidean antigenic space. Oneunit of antigenic distance in this space corresponds to a twofold dilution of antiserum in an HI assay [24].The model is initialized at the endemic equilibrium with antigenically identical viruses. By default, all of theinitial infections occur in the tropics. Mutational events occur at a rate µ mutations per day. When a virus9utates, it moves in a random radial direction with a gamma-distributed step size. This mutation rate,along with the mutation size parameters ( δ mean , δ sd ) determine the accessibility of more distant mutationsin antigenic space. The radial direction of mutation is chosen from a uniform distribution.Additional methods are described in the electronic supplementary material. Code implementing the model is available at https://github.com/cobeylab/antigen-phylogeography.git . The complete code for reproducing these results is available at https://github.com/cobeylab/influenza_phylogeography_manuscript.git . We have no competing interests.

TB and SC conceived the study. FW performed the analysis and wrote the ﬁrst draft of the paper. All ofthe authors contributed to and approve the ﬁnal version.

We thank Daniel Zinder, Maciej Boni, and Greg Dwyer for helpful discussion and Ed Baskerville for pro-gramming guidance. This work was completed in part with resources provided by the University of ChicagoResearch Computing Center. SC was supported by NIH grant DP2AI117921. FW was supported by NIHgrant T32GM007281.

References

Annual report of the ICAO council: 2014 , 2014.

10 Supporting Information

Parameter values were selected to be consistent with inﬂuenza’s biology and to reproduce its major epi-demiological and evolutionary patterns (table 1). The population size N was chosen to minimize extinctionswhile also making eﬃcient use of computational resources. The population birth/death rate γ = 1/30 year − reﬂects the global crude birth rate estimates of 34 births per 1000 [19].The proportionality constant m for calculating the between-region contact rate was calculated from thenumber of international air travel passengers reported by the International Civil Aviation Organizationdivided by the global population [54].We chose a baseline R = 1 .

8. Estimates of R from the ﬁrst pandemic wave H3N2 in 1968 range from1.06-2.06 [27], and estimates of R for seasonal inﬂuenza range from 1.16 to 2.5, averaging approximately1.8 [28].The ﬁve-day duration of infection, 1 /ν , is based on estimates from viral shedding [29]. The transmissionrate, β , is calculated using the deﬁnition of R : R = βν + γ

13e chose the seasonal amplitude to ensure consistent troughs during the oﬀ-season in temperate populationswhile remaining within reasonable estimates of seasonal transmission rates [32].Mutational parameters were selected to maximize the number of simulations where evolution was inﬂuenza-like (ﬁgure S6). Mutations occur at a rate of 10 − mutations per day. This phenotypic mutation rate cor-responds to 10 antigenic sites mutating at 10 − mutations per day [6, 25]. The distance of each mutation issampled from a gamma distribution with parameters chosen to yield a mean step size of 0.6 and a standarddeviation of 0.3 antigenic units. These values correspond to a reduction in immunity of 4.2% for an averagemutation (SD = 2.1%). These mutation eﬀect parameters give the gamma distribution an exponential-likeshape, so that most mutations yield small diﬀerences in antigenic ﬁtness, while occasionally mutations willyield greater diﬀerences. We chose µ , δ mean , and δ sd so that the simulations would exhibit inﬂuenza-likebehaviour as consistently as possible (ﬁgure S6). Here, the criteria for inﬂuenza-like behavior included en-demism, reduced genealogical diversity (TMRCA <

10 years) [10], and a biologically plausible mean rate ofantigenic drift (1.01 antigenic units per year) [11] and incidence (9-15%) [26] (table 1, ﬁgure S6).

We examined two metrics that describe inﬂuenza’s evolutionary dynamics. For computational tractability,these metrics were calculated using a subset of strains sampled over course of the simulation. Strains weresampled proportionally to prevalence.To calculate antigenic lead, we ﬁrst calculated the antigenic distance of each sampled strain from thefounding strain (ﬁgure 1 a ). We then ﬁt a LOESS spline to these distances over time. The spline describesthe expected antigenic drift of circulating lineages at any point in time. Strains above the spline have driftedfarther than average and are considered antigenically leading. Strains below the spline have drifted lessthan average and are considered antigenically lagging. The antigenic lead in the tropics is calculated as theaverage antigenic distance to this spline for all sampled tropical strains.To calculate the fraction of the trunk in each population, we ﬁrst identiﬁed strains that comprise thetrunk by tracing the lineage of strains that survive to the end of the simulation (ﬁgure 1 b ). Because multiplelineages may coexist at the end of the simulation, we excluded the last ﬁve years of strains from trunkcalculations. The fraction of the trunk in the tropics is calculated as the fraction of the time the trunk wascomposed of tropical strains. In the univariate sensitivity analyses, we created regional diﬀerences in host ecology by varying each of theﬁve ecological parameters individually ( R , population turnover rate, seasonality, population size, and initialconditions) while keeping all other parameters at their default values (table 2). To test the eﬀects of regional R , we changed R only in the tropics. Similarly, we tested the eﬀects of the rate of population turnover byvarying it only in the tropics. To investigate seasonality, we varied the seasonal amplitude of the transmissionrate in the temperate populations. (The transmission rate in the tropics was always constant over time.) Toexplore population size, we examined the ratio of tropical to temperate population sizes, keeping the globalpopulation constant. We initialized all simulations at the endemic equilibrium such that the total numberof initial infecteds was constant. We then scaled the number of initial infecteds in the tropics while keepingthe number in the two temperate demes the same. We ran twenty replicates for each unique combination ofparameter values and discarded any simulations in which the virus went extinct or the TMRCA exceeded10 years at any time in the 40-year simulation. The analyses for antigenic lead and trunk proportion wereperformed on the remaining simulations (ﬁgure S1). To test the robustness of the eﬀects of individual parameters on the antigenic lead and the phylogenetictrunk, we simulated 500 points from a Latin hypercube with dimensions representing relative R , seasonality,relative population size, relative population turnover rate, and the fraction of initial infecteds in the tropics(ﬁgure S2). The ranges for each parameter (table S1) were chosen to remain within reasonable estimates.We simulated twenty replicates for each of the 500 unique parameter combination and discarded simulationsin which the virus went extinct or the TMRCA exceeded 10 years at any time in the 40-year simulation.14e performed an ANOVA on the remaining 4119 inﬂuenza-like simulations to determine each parameter’scontribution to the variance in antigenic lead and the fraction of the trunk in the tropics (table S2, S3). We assume that the host population supports a resident strain at the endemic equilibrium. We develop anexpression for the ﬁtness of an invading mutant strain to explain how the selection coeﬃcient of the mutantchanges with R .Here, S, I , and R represent the fraction of susceptible, infected, and recovered individuals. The birth rate γ and the death rate are equal, so the population size is constant. All individuals are born into the susceptibleclass. Transmission occurs at rate β , and recovery occurs at rate ν . dSdt = γ (1 − S ) − βSIdIdt = βSI − ( ν + γ ) IdRdt = νI − γR We solve for the endemic equilibrium values of S eq , I eq , R eq . dIdt = 0 = βS eq I eq − ( ν + γ ) I eq S eq = ν + γβ ≡ R R , the basic reproductive number, is deﬁned as the number of secondary infections from a single infectedindividual in a totally susceptible population. Continuing to solve for I eq and R eq , we have dSdt = 0 = γ (1 − S eq ) − βS eq I eq I eq = γβ ( R − dRdt = 0 = νI eq − γR eq R eq = νβ ( R − R e for boththe resident and mutant strains. R e is the expected number of secondary infections from a single infectedindividual in a given population. We will use the relationship R e = SR The mutant strain is d antigenic units from the resident strain. The conversion factor between antigenicunits and infection risk is notated by c . Thus, the susceptibility to the mutant is given by min { cd, } , andimmunity to the mutant is max { − cd, } . For ease of notation, we assume cd ≤

1, and use k = 1 − cd .The fraction of the population immune to the invading strain is denoted by R (cid:48) . Note that the population isat the endemic equilibrium of the resident strain, and not the mutant. R (cid:48) = (1 − cd ) R eq = νβ ( R − k

15e start by allowing coinfection. The fraction of susceptibles to the mutant strain is given by S (cid:48) = 1 − R (cid:48) − N = 1 − νkβ ( R − − N For large N , we have S (cid:48) = 1 − νkβ ( R − dI (cid:48) dt = I (cid:48) [ βS (cid:48) − ( ν + γ )] dIdt = I eq [ βS eq − ( ν + γ )]To get the selection coeﬃcient, we take the diﬀerence between the growth rates: s = [ βS (cid:48) − ( ν + γ )] − [ βS eq − ( ν + γ )]= β − γk ( R − − βR Recall that β = ( ν + γ ) R s = ( ν + γ ) R − νk ( R − − ( ν + γ )Simplifying, s = ( νcd + γ )( R −

1) (S1)Now disallowing coinfection, we have S (cid:48) = 1 − R (cid:48) − I eq − I (cid:48) = 1 − νβ ( R − k − γβ ( R − − N For large N , S (cid:48) = 1 − ( R − νk + γβ ]Using the same arithmetic as in the case with coinfection, it follows that16 = β − ( νk + γ )( R − − βR Simplifying, s = ( νcd )( R −

1) (S2)In summary, the selection coeﬃcient of an invading mutant strain increases linearly with the R , whichis shared by both strains. The slope of this relationship is proportional to the distance d between the twostrains in antigenic space (ﬁgure S4). Naturally, relationship between the selection coeﬃcient on the distance d between strains depends on the functional relationship between antigenic distance and immunity. However,the linear dependence of the selection coeﬃcient on R holds as long as the functional relationship betweenantigenic distance and immunity is independent of R . R In the

SIR model, the force of infection is F = βI where β is the transmission rate and I is the fraction of infecteds. At the endemic equilibrium, the cumulativefraction of seropositive individuals at a given age a is f ( a ) = 1 − exp( − βI eq a )= 1 − exp( − R ( ν + γI eq a )where ν is the recovery rate and γ is the birth/death rate. I eq , the fraction of infecteds at the endemicequilibrium, is given by I eq = γβ ( R − R = 1 . R = 2 .

16. The diﬀerence in the percentage of seropositive two-year-olds between the two groups is approx-imately 2.3%. The sample size in each group required to detect a diﬀerence f ( a ) − f ( a ) with α conﬁdenceand 1 − β power is N = f (1 − f ) + f (1 − f )( f − f ) (Φ α/ + Φ β ) For legibility, f i ( a ) is written as f i . To detect a 20% diﬀerence in R between two populations with 0.05signiﬁcance and 0.80 power, we would require a sample of at least 1503 individuals in both groups.

11 Supplemental tables and ﬁgures R − (cid:15) ) in temperate populations 0.0 − − γ ) 0.5 − I ) in tropics 0.0 − > F)Relative N < I in tropics 1 0.62 0.002 0.62 38.94 < R < < < > F)Relative N < I in tropics 1 0.02 < R < < < a ) the antigenic leadand ( b ) the fraction of the phylogenetic trunk in the tropics. In each column of plots, only the parameterindicated on the x-axis is varying; all others are held constant at the default value. Each point representsthe mean value over a single simulation. Blue lines indicate linear least squares regression. The dashedlines represent the null hypotheses where ( a ) the trunk is distributed equally among the three regions or( b ) tropical strains are neither antigenically ahead or behind. ( c ) Number of simulations that went extinct(red), exceeded the TMRCA limit (green), or were suitable for analysis (blue).19igure S2: Multivariate sensitivity analysis showing eﬀects of individual parameters on the ( a ) antigeniclead and ( b ) the fraction of the phylogenetic trunk in the tropics. Horizontal axes are projections of a Latinhypercube with dimensions corresponding to the ﬁve parameters indicated. Each point shows the mean valueover a single simulation. The dashed lines represent the null hypotheses where ( a ) the trunk is distributedequally among the three regions or ( b ) tropical strains are neither antigenically ahead or behind. Pearson’scorrelation coeﬃcients and associated 95% conﬁdence intervals are indicated.20 ll lllll llll lllll llll llll llll l l R m ea n a n ti g e n i c d r i f t r a t e ( a n ti g e n i c un it s p e r y ea r) Figure S3: Eﬀect of R on antigenic drift in a single deme. Each point shows the mean antigenic driftrate from a single simulation. The blue line represents linear least squares regression, and the dashed lineindicates the empirical estimate of the rate of antigenic drift for H3N2 [11]. Pearson’s r = 0 . p < . .5 2.0 2.5 3.0 . . . . R s e l ec ti on c o e ff i c i e n t s Figure S4: Relationship between R and the selection coeﬃcient for an invading strain with the resident atendemic equilibrium. The relationship for three diﬀerent antigenic distances d between the invading strainand the resident strain is shown: 0.6 (solid line), 1.0 (dashed line), or 2.0 (dotted line) antigenic units. du r a ti on o f s i m u l a ti on ( y ea r s ) a ) 12345 0.0 0.3 0.6 0.9fraction of initial infecteds in tropics du r a ti on o f s i m u l a ti on ( y ea r s ) −0.10 −0.05 0.00 0.05 0.10tropics antigenic lead( b ) Figure S5: Eﬀect of initial conditions on the antigenic lead and the fraction of the trunk in the tropics earlyin the simulation. Each square represents the average value for n=5 to 11 replicate simulations.22 .20.40.60.20.40.60.20.40.6 0.25 0.50 0.75 0.25 0.50 0.75 0.25 0.50 0.75 0.25 0.50 0.75 0.25 0.50 0.75 d mean d mean d mean d mean d mean d s d d s d d s d m = m = m = Figure S6: Sensitivity of inﬂuenza-like behaviour to changes in the mutational parameters, the mutationrate µ , mean mutation size δ mean , and standard deviation of the mutation size δ sd . Within each plot, eachsquare represents ten replicate simulations. Each row of plots shows results from simulations using diﬀerentmutation rates µ . The number of simulations where the virus went extinct is shown in the second column ofplots, and the number of simulations where the viral population exceeded a TMRCA of 10 years is shown inthe third column of plots. The remaining simulations are considered inﬂuenza-like and are shown in the ﬁrstcolumn of plots. The reported mean antigenic drift rates and prevalences are averaged over the inﬂuenza-likereplicates. The color scales for mean antigenic drift and incidence are centered (white) at the observed valuesfor H3N2 (table 1). 23 . . . . . . age (years) fr ac ti on s e r opo s iti v e R = 1.8 R = 2.16difference Figure S7: Theoretical increase in the fraction of seropositive individuals with age with R = 1 . R = 2 . R b a s e li n e R a ) 1.41.61.82.0 0.8 0.9 1.0 1.1 1.2relative R b a s e li n e R −0.2 −0.1 0.0 0.1 0.2tropics antigenic lead( b ) Figure S8: Lowering baseline R decreases the eﬀect of relative R on the fraction of the trunk and antigeniclead in the tropics. ( a ) Eﬀects of baseline and relative R on the fraction of the trunk in the tropics. Blueindicates that the phylogenetic trunk is located in the tropics less than 1/3 of the time, and red indicatesthat the trunk is the tropics more than 1/3 of the time. ( b ) Eﬀects of baseline R and relative R onantigenic lead in the tropics. Blue indicates that tropical strains are on average ahead antigenically relativeto other strains, and red indicates that tropical strains are behind antigenically. Each square representsan average from 1 to 14 replicate simulations. Grey squares indicate parameter combinations where all oftwenty attempted simulations either went extinct or exceeded the TMRCA threshold of 10 years.24igure S9: Univariate sensitivity analysis using a fully connected 5-deme model ( a ) showing the eﬀects ofindividual parameters on ( b ) the antigenic lead and ( c ) the fraction of the phylogenetic trunk in the tropics.By default, the tropics have a population size that is twice as large as any single temperate deme. In eachcolumn of plots, only the parameter indicated on the x-axis is varying; all others are held constant at thedefault value. Each point represents the mean value over a single simulation. Blue lines indicate linearleast squares regression. The dashed lines represent the null hypotheses where ( b ) the trunk is distributedproportionally to the default population size among the regions or ( c ) tropical strains are neither antigenicallyahead or behind. ( d ) Number of simulations that went extinct (red), exceeded the TMRCA limit (green),or were suitable for analysis (blue). 25igure S10: Univariate sensitivity analysis using a fully connected 6-deme model ( a ) showing the eﬀectsof individual parameters on ( b ) the antigenic lead and ( c ) the fraction of the phylogenetic trunk in eachof the two tropical demes. By default, all demes have the same population size. In each column of plots,only the parameter indicated on the x-axis is varying; all others are held constant at the default value.Each point represents the mean value over a single simulation. Black points show results from one tropicaldeme and red points from the other. Blue lines indicate linear least squares regression to the combineddata from both tropical demes. The dashed lines represent the null hypotheses where ( b ) the trunk isdistributed proportionally to the default population size among the regions or ( c ) tropical strains are neitherantigenically ahead or behind. ( dd