Forecasting hospital demand during COVID-19 pandemic outbreaks
FForecasting hospital demand during COVID-19pandemic outbreaks
Marcos A. Capistr´an, ∗ Antonio Capella, J. Andr´es Christen Centro de Investigaci´on en Matem´aticas, CIMAT-CONACYT,Jalisco S/N, Valenciana, Guanajuato, GTO, Mexico. Instituto de Matem´aticas, UNAM, Circuito Exterior, CU, CDMX, Mexico ∗ To whom correspondence should be addressed; E-mail: [email protected] .
June 2, 2020
We present a compartmental SEIRD model aimed at forecasting hospital oc-cupancy in metropolitan areas during the current COVID-19 outbreak. Themodel features asymptomatic and symptomatic infections with detailed hospi-tal dynamics. We model explicitly branching probabilities and non exponentialresidence times in each latent and infected compartments. Using both hospi-tal admittance confirmed cases and deaths we infer the contact rate and theinitial conditions of the dynamical system, considering break points to modellockdown interventions. Our Bayesian approach allows us to produce timelyprobabilistic forecasts of hospital demand. The model has been used by thefederal government of Mexico to assist public policy, and has been applied forthe analysis of more than 70 metropolitan areas and the 32 states in the coun-try. a r X i v : . [ q - b i o . P E ] J un ntroduction The ongoing COVID-19 pandemic has posed a major challenge to public health systems of allcountries with the imminent risk of saturated hospitals and patients not receiving proper medicalcare. Although the scientific community and public health authorities had insight regarding therisks and preparedness measures required in the onset of a zoonotic pandemic, our knowledge ofthe fatality and spread rates of COVID-19 remains limited ( ). In terms of disease handling,two leading issues determining the current situation are the lack of a pharmaceutical treatmentand our inability to diagnose the asymptomatic infection of COVID-19 ( ).Under the current circumstances, control measures reduce new infections by limiting thenumber of contacts through mitigation and suppression ( ). Mitigation includes social distanc-ing, testing, tracing and isolating of infected individuals, while suppression imposes temporarycancellation of non-essential activities. Most certainly, mitigation and suppression pose a bur-den on the economy, affecting more those individuals living in low income conditions, thusaffecting the capacity of the population as a whole to comply with control measures.Undoubtedly, one key task during the early pandemic response efforts is using epidemiolog-ical records and mathematical and statistical modeling to forecast excess hospital care demand,with formal quantified uncertainty. In this paper we pose a compartmental SEIRD model thattakes into account both asymptomatic and symptomatic infection, including hospital dynamics.We model explicitly the residence time in each latent and infected compartments (
8, 9 ) and weuse records of daily confirmed cases and deaths to pose a statistical model that accounts fordata overdispersion (
10, 11 ). Furthermore, we use Bayesian inference to estimate both the ini-tial state of the governing equations and the contact rate in order to make probabilistic forecastsof the required hospital beds, including the number of intensive care units. We have applied thismodel to forecast hospital demand in metropolitan areas of Mexico. We remark that this model2as been used by Mexican federal public health authorities to assist decision making during theCOVID-19 pandemic.
Contributions and limitations
Broadly speaking, data-driven epidemiological models are built out of necessity of makingforecasts. There are many lessons learnt on emergency preparedness and epidemic surveillancefrom previous pandemic events: AH1N1 influenza ( ), MERS ( ), SARS ( ), Zika ( ),Ebola ( ), etcetera. However, surveillance data during a pandemic event often suffers fromserious deficiencies such as incompleteness and backlogs. Another important issue is the designof data acquisition taking into account geographical granularity ( ). Epidemic surveillance ofCOVID-19 is no different since there is an unknown percentage of asymptomatic infections,and susceptibility is related to economic vulnerability.In this paper we demonstrate that it is possible to produce accurate mid–term (several weeks)probabilistic forecasting of COVID-19 hospital pressure, namely hospital beds and respiratorysupport or mechanical ventilation demands, using confirmed records of cases at hospital admit-tance and deaths.Our analysis uses the Bayesian approach to inverse problems (Bayesian Uncertainty Quan-tification) as a natural mathematical tool to investigate the pandemic evolution and provideactionable forecasts to public health controls during the pandemic.On the other hand, since asymptomatic infection is not fully described so far ( ), it is im-possible to forecast the population fraction that will be in contact with the virus by the end of thecurrent outbreak. Likewise, although it is possible to make model based analysis of scenarios oflockdown exit strategies, reliability is limited due to the lack of information regarding popula-tion viral seroprevalence. Therefore, without serological studies in the open population after anoutbreak, it is not possible to assess the original susceptible population. This is a key piece of3nformation to model and assess the possible next pandemic outbreaks, which our model doesnot address at this point.The model introduced here assumes the contact rate remains relatively constant for severalweeks. Nevertheless, the model accounts for break points due to change of policies in thecontrol measures, such as school closures ( ).Finally, the model does not account for biases due to behavioral changes ( ). In particu-lar, our model may underestimate the decay of outbreaks due to possible granularity of contacts,superspreading events ( ) and other factors. Related work
There are many modeling efforts aimed at forecasting hospital occupancy during the ongoingCOVID-19 pandemic ( ). Broadly speaking, models are informed with evolving informa-tion about COVID-19 cases, clinical description of the patient residence time in each compart-ment, fraction of cases per age group, number of deaths, hospital bed occupancy, etc.Columbia University metapopulation SEIR model ( ) forecasts are based on assumptionsrelating an effective contact rate with population density at a metropolitan area and social dis-tancing policies. The Covid Act Now model ( ) forecasts the replacement number R t andthe fraction of infections requiring hospitalization using the Bayesian paradigm to fit a SEIRmodel to case, hospitalization, death, and recovery counts. The Imperial College response teammathematical model ( ) uses an unweigthed ensamble of four models to produce forecasts ofthe number of deaths in the week ahead for each country with active transmission. The IHMEmodel ( ) combines a mechanistic model of transmission with curve fitting to forecast thenumber of infections and deaths. Moghadas et al. ( ) pose a mechanistic model parametrizedwith demographic data to project hospital utilization in the United States during the COVID-19pandemic. The main goal of Moghadas et al. is to estimate hospital pressure throughout.4ther COVID-19 models have been used to explore exit strategies (
29, 30 ), the role of re-covered individuals as human shields ( ), digital contact tracing ( ) and break points in thecontact rate to account for changes in suppression and mitigation policies ( ). Methods “Models should not be presented as scientific truth” ( ). They are tools that intend to servea specific purpose, evaluate or forecast some particular aspects of a phenomena under certainconditions and ideally should be developed following the processes of predictive science ( ).Namely, identify the quantities of interest (QoI), verify the computational and mathematicalapproximation error, including their implication in the estimation of QoI, and calibrate the pa-rameters to adjust the model in light of data to bring it closer to experimental observation. Whenconsidering uncertainty, Bayesian inference may be used to calibrate some key features of themodel given measured data. Finally, a validation process must be used to build confidence onthe accuracy of the QoI predictions of the model. Dynamical model
As a proxy of hospital pressure the QoI in our model are: the evolution demand of no-ICUhospital beds and number of ICU/respiratory support beds. To estimate the QoI we developed afull compartmental SEIRD model featuring several compartments to describe hospital dynam-ics (see Figure 1 and supporting materials, SM) and sub-compartments to describe explicitlyresidence rates as Erlang distributions (
8, 9 ). The model has two variants, one with age struc-ture and one that assumes age independent dynamics. Here we describe the only latter (seesupporting material for some additional comments on the age dependent mode).Succinctly our model goes as follows (see Figure 1 and SM). Once the susceptible individu-als ( S ) become infected they remain in the incubation compartment ( E ) for mean time of /σ σ ). After the incubation period, exposed individuals become infec-tious and a proportion f become sufficiently sever symptomatic cases ( I S ) to approach a hos-pital, while the remaining cases become mild to asymptomatic ( I A ). The asymptomatic/mild–symptomatic cases remain infectious a mean time of /γ days and eventually recover.For the symptomatic cases ( I S ) we assume that after an average time of /σ days a pro-portion g of infected individuals will need hospitalization ( H ), while the rest ( I C ) will re-ceive ambulatory care, recovering after an average convalescent time of /γ days in quar-antine. The hospitalized patients ( H ) remain an average time of /σ days until a fraction h will need assisted respiratory measures or ICU care such as mechanical ventilation ( U ).The remaining fraction − h of hospitalized patients ( H ) will recover after an average of /γ days. Respiratory-assisted/ICU patients ( U ) remain in that state an average of /σ days until a critical day is reached when a proportion i of them will die ( D ) and the remain-ing proportion − i will recover ( H ) after an average period of /γ days. Similar modelshave been proposed by (
2, 31, 32, 35 ). For the infection force ( λ ) we assume that only mild–symptomatic/asymptomatic ( I A ) and symptomatic ( I S ) individuals spread the infection, thatis λ = β A I A + β S I S N eff where β A and β S are the contact rates of asymptomatic/mild–symptomatic and symptomaticindividuals, respectively.The model has two kind of parameters that have to be calibrated or inferred, those relatedto COVID-19 disease and hospitalization dynamics (such as residence times and proportionsof individuals that split at each bifurcation of the model) and those related to the public re-sponse to mitigation measures such as the contact rates β ’s and the effective number of sus-ceptible individuals N eff at the beginning of an outbreak. Some of these parameters may beestimated from hospital records, be found in recent literature or inferred from reported cases6 E I A I S H I C U H U H R D λ (1 − f ) σ f σ γ (1 − g ) σ γ gσ (1 − h ) σ γ hσ σ iµ (1 − i ) µγ Figure 1: Schematic diagram of model compartments without Erlang sub-compartments. Fora precise definition of parameters see supplementary material.and deaths, but some of remain largely unknown. In the latter category we have the effectivenumber of susceptible individuals N eff at the beginning of an outbreak and the fraction − f ofasymptomatic/mild–symptomatic infections. The number N eff is lower than the full populationfor a metropolitan area and depends on different aspects, but at least in the case of COVID-19it is likely to be a consequence of unequal follow–up of social distancing public policies amongthe population. Reported values of the proportion of asymptomatic/mild–symptomatic infec-tions cases − f range from to , and even in children population (
6, 7, 36 ).Notice that in our model the total number of patients that will visit a hospital is given roughly(bounded) by the product N eff × f and the total number of patients admitted to the hospital isgiven by N eff × f × g , where g is the portion of infected persons that need hospitalization.We have evidence (see SM) that given our choice of observation model (see below), theinference of our QoI only depends on the product N eff × f × g , and not on the value oftheir individual factors. Since the fraction g is far easier to estimate (from hospital records ofadmissions and ambulatory patients) and reported in the literature, we only require to postulate7 value for the product N eff × f .Since our QoI are concerned with hospital pressure, we choose from the available data twosources of information for the observational model: (A) the registered confirmed COVID-19patients at hospitals, with or without hospitalization, and (B) deceased patients. Even underan outbreak, these data are reasonable consistent and systematic information on the inflow (A,registered patients at hospitals) and outflow (B, number of hospital deaths), that “hedge” thehospital dynamics. The model assumes constant, and more or less reliable, reported fractions( g, h, i ) of patients that transit on each hospital dynamics bifurcation. Intuitively, this providesan explanation for why using only A and B the hospital dynamics (our QoI) may be estimated,while only the product N eff × f is important and not to a great extent the nominal values of N eff and f .Regarding data availability for our observational model. Due to administrative reportingdelays we discard the last 7 days of reporting. First visit to the hospital date and the registereddeceased date are used as time stamps for A and B, respectively. We do not use patient reportedsymptoms onset as a time stamp. Bayesian Inference
Undoubtedly, the impact of local testing practices on the number of confirmed cases would needto be analyzed based on the region of interest. In Mexico testing has been relatively low butconsistent. Patients are tested when arriving to hospitals with probable COVID-19 symptomsand limited testing is done elsewhere; therefore most confirmed COVID-19 cases are limitedto A as described above. Therefore, make our inferences we use both confirm cases (A) anddeceased counts (B), in the sense explained in the previous section.For inference, we therefore consider daily confirmed cases c i of patients arriving at H anddaily reported deaths d i , for the metropolitan area or region being analyzed.8o account for over dispersion, we use a negative binomial (NB) distribution N B ( µ, ω, θ ) ,following ( ), with mean µ and over dispersion parameters θ and ω (see SM for details). Theuse of an over disperse NB has proved its value and adequacy in analyzing data from regions inMexico (and other regions). For data y i we let y i ∼ N B ( pµ ( t i ) , ω, θ ) , with fixed values for theoverdispersion parameters ω, θ and an additional reporting probability p . For both confirmedcases and deaths we found good performance fixing ω = 2 . To model daily deaths we fixed θ = 0 . and for daily cases θ = 1 implying higher variability for the later. The reportingprobabilities are 0.95 for deaths and 0.85 for cases, with the assumption, as explained above,that the c i ’s are confirmed sufficiently severe cases arriving at hospitals.For daily deaths counts d i the mean is simply the daily counts µ D ( t i ) = D ( t i ) − D ( t i − ) .For cases c i , the mean µ c ( t i ) we consider is the daily flux entering the H compartment, whichmay be calculated as ( ) µ c ( t i ) = (cid:90) t i t i − gσ I Sm ( t ) dt, where I Sm ( t ) is the last state variable in the I S Erlang series (see SM). We calculate the aboveintegral using a simple trapezoidal rule with 10 points (1/10 day).We assume conditional independence in the data and therefore from the NB model we obtaina likelihood. Our parameters are the contact rate parameter β and crucially we also infer theinitial conditions E (0) , I A (0) , I S (0) . Letting S (0) = N − ( E (0) + I A (0) + I S (0)) and settingthe rest of the parameters to zero, we have all initial conditions defined and the model may besolved numerically to obtain µ D and µ c to evaluate our likelihood. We use the lsoda solveravailable in the scipy.integrate.odeint Python function.To model interventions, a break point is established at which β = β before and β = β after the intervention day. This creates a non-linear time dependent β ( t ) (
19, 37 ). Thisadditional parameters are then included in the inference. Normally only the initial β and anafter lockdown (22 March 2020) β parameters are considered (in Mexico city a third β was9onsidered for modeling a further local intervention in early April).We use Gamma (1 , priors for each E (0) , I A (0) , I S (0) , modeling the low, near to 10,and close to zero counts for the number of infectious initial conditions. The prior for the β ’s arethe same, and a priori independent, long tails, log Normal with σ = 1 and scale parameter 1;that is log ( β ) ∼ N (0 , . To sample from the posterior we resort to MCMC using the “t-walk”generic sampler ( ). The MCMC runs semi automatic, with fairly consistent performances inmost data sets. For any state variable V , the MCMC allows us to sample from the posteriorpredictive distribution for V ( t i ) , and by plotting sequentially some of its quantiles we mayproduce predictions with quantified probabilistic uncertainty, as seen in Figure 2.As in the case of climate forecasting, due to the stochastic nature of an pandemic outbreakpoint-wise estimates such as the maximum a posteriori estimate (MAP) do not give good de-scriptions of the outbreak evolution. No single trajectory of the SEIRD model provide a gooddescription of the outbreak evolution, nor give accurate forecasts. Results
Local transmission started in a different date in each Mexican metropolitan area provided eachone of them has different communicability with Mexico City and with the rest of the world.On the other hand, all metropolitan areas in the country are in lock-down since May 22, with achange of policy due to start at the beginning of the month of June.Figure 2 (a) shows the model forecast, with quantified uncertainty, of the daily records ofCOVID-19 confirmed cases in Mexico City with data trimming to April 17 and April 24 forcomparison. Figures shows both, the absolute number of cases and the number of cases per100,000 inhabitants. Of note, the update in the median, shown with a continuous black line, isalmost negligible while the update in quantiles 10% and 90%, shown with dotted lines, exhibitsa contraction around the median as more data is included. Figure 2 (b) depicts records and10orecasts for the aggregated number of deaths. In Figure 2 (c) and (d) we compare the modelforecasts with hospital bed and ICU occupancy obtained from a secondary official source ofepidemiological surveillance. We remark that the total number of hospital beds and ICU units isconsistently overestimated and shifted to latter times. Our modeling strategy was based on dailydemand of hospital beds and intensive care units records from
Instituto Mexicano del SeguroSocial or Mexican Social Security Institute (IMSS) to calibrate the residence time within thehospital. We assumed that it was necessary to forecast a higher demand in both types of hospitaloccupancy given the emerging nature of the virus and evidence of excess of hospital pressure inplaces like Spain, Italy and New York.In the supplementary material we show the outbreak analysis for the cities of Canc ´un, Ti-juana and La Paz. The rationale for choosing these Mexican cities is as follows. Mexico City isamong the top ten most populated metropolitan areas in the world and comprises roughly onesixth of the Mexican population. Canc ´un and Tijuana are medium size cities with considerableinternational communicability where the first ones with an outbreak among cities in Mexico. LaPaz is the capital of the Mexican state of Baja California Sur in the Baja California peninsulawhere the outbreak was relatively small. Despite the varying features among these cities weforecast the acme of the developing outbreak with up to three weeks for small and medium sizecities.
Mexico city, metropolitan area
Discussion
We presented a Bayesian approach to inform a compartmental SEIRD model to produce prob-abilistic forecasts of hospital pressure during COVID-19 outbreaks.Besides the examples presented in this paper, our model has been applied to other 70metropolitan areas and the 32 states in Mexico (“ama” model; https://coronavirus. , , inhabitants conacyt.mx/proyectos/ama.html , in Spanish). From simple model forecast compar-isons it was apparent from early April that pandemic outbreaks ran at different speeds through-out metropolitan areas in the country, suggesting more regionalized policies than a single na-tionwide strategy. Consequently, future re-opening strategies may need to be thought on aregional basis also. 12he age independent model has proven to be adequate to produce accurate forecast for thehospitalization dynamics during current outbreaks. Therefore the added complexity of the age-structure model may not justify its used at this point. However we continue to work in our agestructure model.Regarding interventions, we model lock-down and social distancing policies by settingbreak points in the contact rates at specific dates and inferring their values. This method doesnot serve to detect change points but it does measure the effectiveness of these policies and itsconsequences on our forecast. Moreover, it makes our model more flexible adapting inferencefor changing circumstances.Our observational model is designed to integrate data after the non-linear term in the dy-namic model and the epidemic curves we inferred and forecast are for the hospitalization dy-namics. Assuming (as the model does) that the rest of the dynamics is proportional to thesecurves, the dynamics of the rest of the model will follow –bounded by residence times– thesame paths of the hospitalization dynamics. Thereby, our estimates and forecasts on hospital-ization pressure can be used as a proxy of the full outbreak. In turn we are also able to produceprobabilistic forecasts for the date when an outbreak will reach its peak.As already mentioned, as yet the asymptomatic infection has not been fully described ( ).The confounding effect of the effective population that participates in the outbreak and thefraction of asymptomatic/mild–symptomatic infections − f makes it impossible to reliablyforecast the population fraction that will be in contact with the virus (i.e. the final popula-tion viral seroprevalence) at the end of an outbreak. Likewise, although it is possible to makemodel based analyses of scenarios of lock-down exit strategies, scenario reliability estimationis limited due to the lack of information regarding population viral seroprevalence. Therefore,without serological studies in the open population after an outbreak it is not possible to assesthe final outbreak size. This is a key piece of information to model and assess possible next pan-13emic outbreaks, from the possibility of reopening and relaxation of social distancing measures.Our model as it stands cannot predict multiple outbreaks, but once epidemic data is available,estimation of the previous and new effective population may provide a toll to predict secondoutbreaks (for a fixed proxy f probability; work in progress).There has been substantial discussion regarding testing strategies for prevalence of COVID-19 cases in different countries. These strategies may serve dissimilar purposes and it shouldbe clear the intended use of each strategy. Due to the lack of description of the asymptomaticinfection most of the strategies for large amount of testing that includes the asymptomatic in-fections will produce biased data and these biases are hard to identify. Mexico is one of thecountries with the smallest ratio of COVID-19 testing (). However, our results show that it ispossible to produce reliable forecast for hospital pressure during an outbreak with this relativelow amount of testing exclusively at hospital admissions (and deceased). The advantage of thistesting strategy is that it is systematic, biases are well identified and it is possible to develop amodel well adapted to this observational method. Acknowledgments
The authors wish to thank Elena ´Alvarez-Buylla, Paola Villarreal (CONACYT) and HugoL´opez-Gatell (Secretar´ıa de Salud) for their support and comments for improving this forecastmodel and also Instituto Mexicano de Seguridad Social (IMSS) for sharing data to adjust thehospitalization dynamics. Very special thanks to CIMAT technicians Judith Esquivel-V´azquezand Oscar Gonz´alez-V´azquez (CIMAT-CONACYT), help in retrieving data and running ourprograms at a CIMAT cluster. We also thank the CONACYT COVID-19 response group, for ad-ditional comments in the developing of our model. The authors are partially founded by CONA-CYT CB-2016-01-284451 grant. AC was partially supported by UNAM PAPPITIN106118grant. 14 eferences
1. N. M. Ferguson, et al. , London: Imperial College COVID-19 Response Team, March (2020).2. R. Verity, et al. , medRxiv (2020).3. C. P. E. R. E. Novel, et al. , Zhonghua liu xing bing xue za zhi= Zhonghua liuxingbingxuezazhi , 145 (2020).4. X. Zhou, Y. Li, T. Li, W. Zhang, Clinical Microbiology and Infection (2020).5. M. Gandhi, D. S. Yokoe, D. V. Havlir,
The New England Journal of Medicine .6. K. Mizumoto, K. Kagaya, A. Zarebski, G. Chowell,
Eurosurveillance (2020).7. M. Day, BMJ (2020).8. D. Champredon, J. Dushoff, D. J. Earn,
SIAM Journal on Applied Mathematics , 3258(2018).9. H. J. Wearing, P. Rohani, M. J. Keeling, PLOS Medicine (2005).10. A. Lind´en, S. M¨antyniemi, Ecology , 1414 (2011).11. A. E. Zarebski, P. Dawson, J. M. McCaw, R. Moss, Infectious Disease Modelling , 56(2017).12. J. A. Cordova-Villalobos, et al. , Gac Med Mex , 102 (2017).13. K.-M. Ha,
Journal of Hospital Infection , 232 (2016).14. E. J. Emanuel, Annals of Internal Medicine , 589 (2003).155. S. J. Hoffman, S. L. Silverberg,
American journal of public health , 329 (2018).16. B. Gates,
New England Journal of Medicine , 1381 (2015).17. N. Perra, B. Gonc¸alves,
Social phenomena (Springer, 2015), pp. 59–83.18. S. W. Park, et al. , medRxiv (2020).19. J. Dehning, et al. , Science (2020).20. C. Eksin, K. Paarporn, J. S. Weitz,
Epidemics , 96 (2019).21. J. S. Weitz, S. W. Park, C. Eksin, J. Dushoff, medRxiv (2020).22. Y. Liu, R. M. Eggo, A. J. Kucharski, The Lancet , e47 (2020).23. T. Yamana, S. Pei, J. Shaman, medRxiv (2020).24. M. Henderson, et al. , Covid act now.25. S. Bhatia, et al. , Short-term forecasts of covid-19 deaths in multiple countries.26. I. COVID, C. J. Murray, et al. , medRxiv (2020).27. S. M. Moghadas, et al. , Proceedings of the National Academy of Sciences , 9122 (2020).28. H. Salje, et al. , Science (2020).29. O. Karin, et al. , medRxiv (2020).30. L. Di Domenico, G. Pullano, C. E. Sabbatini, P.-Y. Bo¨elle, V. Colizza, medRxiv (2020).31. J. S. Weitz, et al. , Nature Medicine pp. 1–6 (2020).32. L. Ferretti, et al. , Science (2020). 163. N. P. Jewell, J. A. Lewnard, B. L. Jewell,
JAMA (2020).34. J. T. Oden, R. Moser, O. Ghattas,
SIAM News , 1 (2010).35. A. J. Kucharski, et al. , The Lancet Infectious Diseases (2020).36. S. Silal, et al. , Estimating cases for covid-19 in south africa update: 19 may 2020 (2020).37. W. Wang, Epidemic models with nonlinear infection forces.38. J. A. Christen, C. Fox,
Bayesian Anal. , 263 (2010).39. A. Cori, N. M. Ferguson, C. Fraser, S. Cauchemez, American Journal of Epidemiology ,1505 (2013).40. P. Van den Driessche, J. Watmough,
Mathematical biosciences , 29 (2002).41. E. E. Team, et al. , Eurosurveillance (2020).42. J. Zhang, et al. , The Lancet Infectious Diseases (2020).43. U. Buchholz, et al. (2020).44. L. Zou, et al. , New England Journal of Medicine , 1177 (2020).45. X. He, et al. , Nature medicine pp. 1–4 (2020).46. Preliminary estimate of excess mortality during the covid-19 outbreak new york city, march11may 2, 2020.47. S. M. Moghadas, et al. , Proceedings of the National Academy of Sciences , 9122 (2020).48. K. Prem, A. Cook, M. Jit,
PLOS Computational Biology , e1005697 (2017).17 upplementary materialsOther examples Tijuana, Mex (a) (b)(c) (d)Figure 3: Outbreak analysis for Tijuana metropolitan area, using data from 17 April and 24April. (a) Confirmed cases (b) Deaths (c) No ICU (d) ICU occupied hospital beds. Totalpopulation , , inhabitants 18 ancun, Mex (a) (b)(c) (d)Figure 4: Outbreak analysis for Cancun metropolitan area, using data from 17 April and 24April. (a) Confirmed cases (b) Deaths (c) No ICU (d) ICU occupied hospital beds. Totalpopulation , inhabitants. 19 a Paz In cases where there are small counts of confirmed cases and deaths we included a fictitiousintervention point after the first 10 confirmed cases ( ). In terms of the outbreak evolutionthis serves the purpose of distinguish the imported cases from the community spread where themodel should be applied.(a) (b)(c) (d)Figure 5: Outbreak analysis for La Paz metropolitan area, using data from 17 April and 24April. (a) Confirmed cases (b) Deaths (c) No ICU (d) ICU occupied hospital beds. Totalpopulation , inhabitants. 20 ata sources Delimitation of the metropolitan areas and population was taken from and https://datos.gob.mx/busca/dataset/proyecciones-de-la-poblacion-de-mexico-y-de-las-entidades-federativas-2016-2050
Model
We developed a dynamic transmission compartmental model to simulate the spread of the novelcoronavirus SARS-CoV-2. A description of the state variables is found in Table 1. Additionallyand “Erlang series” is included for most of these state variables to account for non-exponentialresidente times; see below. The model may be described conceptually with the graph in Fig-ure 1. Without showing the Erlang series for the sub-compartments the system of equations inthe model is as follows: dSdt = − (cid:0) β A I A + β S I S (cid:1) N eff S dEdt = (cid:0) β A I A + β S I S (cid:1) N eff S − σ EdI A dt = (1 − f ) σ E − γ I A dI S dt = f σ E − σ I S dI C dt = (1 − g ) σ I S − γ I C dH dt = gσ I S − σ H dH dt = (1 − h ) σ H − γ H dU dt = hσ H − σ U dU dt = σ U − µU dDdt = iµU dH dt = (1 − i ) µU − γ H dRdt = γ I A + γ I C + γ H + γ H . A brief description of all the parameters in our model may be found in Table 2.21able 1: Description of the state variables for our dynamic transmission model
Variable Description S Susceptibles E Latent individuals I A Asymptomatic/mild–symptomatic individuals I S Symptomatic individuals I C Out-patients H Hospitalized patients, initial stage H Hospitalized patients (no ICU) U Hospitalized patients (ICU or respiratory support) U Hospitalized patients (ICU or respiratory support) critical day H Hospitalized patients recovering after ICU or respiratory support R Recovered D Deceased
S E I A I S H I C U H U H R D λ (1 − f ) σ f σ γ (1 − g ) σ γ gσ (1 − h ) σ γ hσ σ iµ (1 − i ) µγ Figure 6: Schematic diagram of model compartments without Erlang sub-compartments.22able 2: Description of the parameters included in our model.
Parameter Description Units N eff Number of initially susceptible individuals in the population – β A Transmission rate of asymptomatic/mild–symptomatic individuals (asx/mild–sym) per day β S Transmission rate of symptomatic individuals per day κ Relative strength between the transmission the rate of asx/mild–sym and symptomatic f Portion of infected persons with strong enough symptoms to visit a hospital g Portion of infected persons that need hospitalization h Portion of hospitalized patients requiring respiratory support or ICU care i Portion of Respiratory-assisted or ICU patients deceased /σ Average incubation time day /σ Average time from symptomatic onset to hospital visit day /σ Average time from hospital admission to respiratory support or ICU care day /σ Average time with respiratory support or ICU care day /µ Average length of critical stage of respiratory–support/ICU between death and recovery day /γ Average time that asymptomatic/mild–symptomatic individuals remain infectious day /γ Average time of symptomatic individuals that recover without visiting a hospital day /γ Average time from hospital admission to hospital discharge day /γ Average time from respiratory–support/ICU care release to hospital discharge day
Infection force and basic reproductive number R For the infection force ( λ ) we assume that the only individuals that spread the infection cor-respond to the mild–symptomatic/asymptomatic ( I A ) and symptomatic individuals ( I S ) beforetheir contact with the health care system or doctor, e.g. λ = β A I A + β S I S N eff We compute the basic reproductive number R of the epidemic by the next generation matrixmethod ( ) and obtain R = (1 − f ) β A γ + f β S σ . alues of model parameters Since our QoI are related with the hospital pressure we choose all parameters conservatively.For each metropolitan area we assume that N eff corresponds to its full population as defined byInstituto Nacional de Estadistica y Geografia (INEGI). The values of the transition probabilitiesare summarized in Table 3.Table 3: Transition probabilities at bifurcations in the model Parameter Value Reference f g ), IMSS h ), IMSS i ), IMSS Erlang series and sub-compartments
In order to make the intrinsic generation-interval of the renewal equation in each compartmentmore realistic we divide each compartment of the model into m equal sub–compartments togenerate an Erlang–distributed waiting time ( ). The Erlang distributions of each compartmentis calibrated by two parameters: the rate λ E and the shape m , a positive integer that correspondsto the number of sub–compartments on the model. In terms of these parameters the mean of theErlang distribution is m/λ E , this mean correspond to the average times in the dynamic model.We use recent publications and information generously shared by the Instituto Mexicano deSeguridad Social (IMSS) to estimate the average time and the shape parameter of the Erlangseries in each compartment. Details of Erlang series lengths, residence times and the imputedvalues may be found in Table 4. 24able 4: Average times and Erlang shape parameters for sub-compartments Variable Rates Average time Erlang shape m Reference
S β S Inferred 1 – E /σ ) I A /γ ) I S /σ ) I C /γ ) H /σ ) H /γ
10 days 3 ( ) U /σ
10 days 3 IMSS U /µ ) H /γ ) R None – – – D None – – –
Relative strength between the transmission the rate of asymptomatic/mild–symptomaticand symptomatic
In our methodology we aim to infer the force of the infection λ . This parameter is definedin terms of contact rate of asymptomatic/mildsymptomatic individuals β A and contact rate ofsymptomatic individuals β S . Due to the functional dependence of λ in these parameters thereis a lack of identifiability between β A and β S that can not be resolved without further assump-tions. We assume that the relative strength between the transmission rate of asymptomatic/mild–symptomatic and symptomatic is modelled as a fixed ratio κ . We model the value of κ directly asthe ratio of the viral load of symptomatic and asymptomatic/mild–symptomatic patients ( )and is fixed to κ = 2 . Hence, the force of infection becomes λ = β S ( I S + κI A ) /N eff . Data and observational model
To make our inferences we use both confirm cases and deceased counts. In some regions subreporting of COVID-19 related deaths may become relevant, specially in places hit by a severe25utbreak ( ). Nonetheless, deaths are a more reliable data source to estimate a COVID-19outbreak, specially for the purposes of hospital demand. Certainly, daily confirm cases arealso needed for more accurate predictions as a proxy for outbreak dynamics. The problemhere is that the number of confirmed cases depends heavily on local practices, in particular inrelation to the intensity of testing, adding a level of complication if testing intensity has varieddue to ambiguous policies. Undoubtedly, the impact of local testing practices on the numberof confirmed cases would need to be analyzed based on the region of interest. In Mexicotesting has been relatively low but consistent. Patients are tested when arriving to hospitalswith probable COVID-19 symptoms and limited testing is done elsewhere. For inference, wetherefore consider daily counts of patients arriving at H , as we explain next.Accordingly, as explained in the main text, we use daily reported deaths d i and daily con-firmed cases c i , for the metropolitan area or region being analyzed, to perform our inferences.The first default model for count data is a Poisson distribution, however, epidemiological datatends to be over disperse and an over disperse generalized Poisson distribution may be needed tocorrectly, and safely, model these type of data, as it is the case in other ecological studies. ( )suggest the use of and over disperse negative binomial (NB) model. We have followed theirsuggestion, and the use of an over disperse NB has proved its value and adequacy in analyzingdata from regions in Mexico, and elsewhere, as we have already illustrated.Following ( ) the NB distribution is re parametrized in terms of its mean µ and “overdis-persion” parameters θ and ω , with r = µω − θµ and p = ω + θµ , the number of failures beforestopping and the success probability, respectively, in the usual NB parametrization. For data y i we let y i ∼ N B ( pµ ( t i ) , ω, θ ) , with fixed values for the overdispersion parameters ω, θ and anadditional reporting probability p . The index of dispersion is σ /µ = ω + θµ . Over dispersionwith respect to the Poisson distribution is achieved when ω > and the index of dispersionincreases with size if θ (cid:54) = 0 ; both desirable characteristics in outbreak data, adding variability26s counts increase. In both cases we found good performance fixing ω = 2 . To model dailydeaths we fixed θ = 0 . and for daily cases θ = 1 implying higher variability for the later. Thereporting probabilities are 0.95 for deaths and 0.85 for cases, with the assumption, as explainedabove, that the c i ’s are confirmed sufficiently severe cases arriving at hospitals.For daily deaths counts d i the mean is simply the daily counts µ D ( t i ) = D ( t i ) − D ( t i − ) .For cases c i , the mean µ c ( t i ) we consider the daily flux entering the H compartment, whichmay be calculated as ( ) µ c ( t i ) = (cid:90) t i t i − gσ I Sm ( t ) dt, where I Sm ( t ) is the last state variable in the I S Erlang series. We calculate the above integralusing a simple trapezoidal rule with 10 points (1/10 day).
Modeling interventions and Bayesian Inference
We assume conditional independence in the data and therefore from the NB model we obtaina likelihood. Our parameters are the contact rate parameter β and crucially we also infer theinitial conditions E (0) , I A (0) , I S (0) . Letting S (0) = N − ( E (0) + I A (0) + I S (0)) and settingthe rest of the parameters to zero, we have all initial conditions defined and the model may besolved numerically to obtain µ D and µ c to evaluate our likelihood. We use the lsoda solveravailable in the scipy.integrate.odeint Python function.To model interventions, a break point is established at which β = β before and β = β after the intervention day. This creates a non-linear time dependent β ( t ) (
19, 37 ). Thisadditional parameters are then included in the inference. Normally only the initial β and anafter lockdown (22 March 2020) β parameters are considered (in Mexico city a third β wasconsidered for modeling a further local intervention in early April).We use Gamma (1 , priors for each E (0) , I A (0) , I S (0) , modeling the low, near to 10,and close to zero counts for the number of infectious initial conditions. The prior for the β ’s27re the same and a priori independent (a possible generalization is to consider dependent β ’smodeling a decrease in contact rates after lockdowns etc.). These are rather diffuse and longtails LogNormal with σ = 1 and scale parameter 1; that is log ( β ) ∼ N (0 , .To sample from the posterior we resort to MCMC using the “t-walk” generic sampler ( ).The MCMC runs semi automatic, with a fairly consistent burn in of 1,000 iterations (samplinginitial values from the prior). We perform subsampling using the Intergrated AutocorrelationTime, with psuedo-independent sample sizes of 1,000 to 1,500 with 200,000 iterations of theMCMC. This roughly takes 30 min in a 2.2 GHz processor.To illustrate the whole posterior distribution, for any state variable V (or µ c ( t i ) ), for eachsampled initial conditions and β ’s the model is solved at time t , t , . . . , t k , including possiblyfuture dates, obtaining a sample of V ( t i ) values for each t i . The median, and other desiredquantiles are plotted vertically for each date considered, obtaining the plots as in Figure 2.Note that the traced median or other plotted quantiles do not necessary correspond to any givenmodel trajectory, providing a far richer Uncertainty Quantification approach than the classicalparameter estimates plug-in approach. Indeed, the sampled values for V ( t i ) do correspond toMonte Carlo samples of the posterior predictive distribution for V ( t i ) . Adding age structure
Adding group ages is straightforward in these type of models. The number a of ages groupsis decided and our model is repeated a times. Different residence times may be included ( )but we preferred to concentrate on the different transition probabilities g, h that vary nearly twoorders of magnitude using age groups [0 , , (25 , , (50 , , (65 , ( ). The age structureis used to divide the initial infectious and susceptible population proportional to the size of eachage group. The same number of parameters are inferred, using a single β , with an optionalweighting contact matrix ( ) to model different contact rates among age groups in specific28egions. Using a sufficiently flexible software design, progressing to an age structure model isnot complex, nonetheless the MCMC may run substantially slower. Although we have exper-imented with our age structured model, using census data from Mexico and both uniform andnon uniform contact matrices, we do not report any of these results here, given that our non-agestructure model has sufficiently enough predictive power, as already discussed. Confounding effect of N ef f × f To explain the confounding effect of N eff × f we have two observations. First, if we let f = ˜ f /α for some α ∈ (0 , ˜ f ) then differential equations for the variables I c , H , H , H , U , U and D remain invariant and the equations for I s becomes dI S dt = ˜ fα σ E − σ I S . By letting ˜ E = E/α the equation for I s is also invariant with the substitution of E by ˜ E . Now,the equation for ˜ E is given by d ˜ Edt = (cid:0) β A I A + β S I S (cid:1) αN eff S − σ ˜ E. By letting ˜ N eff = αN eff the latter equations becomes also invariant under the substitution of E by ˜ E . Therefore for the lower branch in the model (see Figure 6) the system of equationsis invariant under the change of f and N eff by ˜ f and ˜ N eff provided ˜ N eff × ˜ f = N eff × f holds. Clearly, to have a consistent system of equations, the equations for S , I A and R have tobe adapted in each case to obtain a consistent set of equations.Second, int the inference of parameter β we inform the system with the data at the H and D compartments. If ˜ N eff × ˜ f = N eff × f holds, in view of our first observation, to fit thesedata the fluxes f σ E and ˜ f σ ˜ E in either case have to be the same. Clearly, the solutions in thecompartment I S and after do not change in this case, but the individuals in the I A compartmentdo change depending on which combination of N eff and f or ˜ N eff and ˜ f is considered. There29s a range of validity for α where the inference of ββ