Data-driven modeling for different stages of pandemic response

Abstract

Some of the key questions of interest during the COVID-19 pandemic (and all outbreaks) include: where did the disease start, how is it spreading, who is at risk, and how to control the spread. There are a large number of complex factors driving the spread of pandemics, and, as a result, multiple modeling techniques play an increasingly important role in shaping public policy and decision making. As different countries and regions go through phases of the pandemic, the questions and data availability also change. Especially of interest is aligning model development and data collection to support response efforts at each stage of the pandemic. The COVID-19 pandemic has been unprecedented in terms of real-time collection and dissemination of a number of diverse datasets, ranging from disease outcomes, to mobility, behaviors, and socio-economic factors. These datasets have been critical from the perspective of disease modeling and analytics to support policymakers in real-time. In this overview article, we survey the data landscape around COVID-19, with a focus on how such datasets have aided modeling and response through different stages so far in the pandemic. We also discuss some of the current challenges and the needs that will arise as we plan our way out of the pandemic.

Aniruddha Adiga, Jiangzhuo Chen, Madhav Marathe, Henning Mortveit, Srinivasan Venkatramanan and Anil Vullikanti
Biocomplexity Institute and Initiative; Department of Systems Engineering and Environment; Department of Computer Science; University of Virginia

∗ To appear in the “Journal of the Indian Institute of Science,” Volume 100.

As the SARS-CoV-2 pandemic has demonstrated, the spread of a highly infectious disease is a complex dynamical process. A large number of factors are at play as infectious diseases spread, including variable individual susceptibility to the pathogen (e.g., by age and health conditions), variable individual behaviors (e.g., compliance with social distancing and the use of masks), differing response strategies implemented by governments (e.g., school and workplace closure policies and criteria for testing), and potential availability of pharmaceutical interventions. Governments have been forced to respond to the rapidly changing dynamics of the pandemic, and are becoming increasingly reliant on different modeling and analytical techniques to understand, forecast, plan and respond; this includes statistical methods and decision support methods using multi-agent models, such as: (i) forecasting epidemic outcomes (e.g., case counts, mortality and hospital demands), using a diverse set of data-driven methods, e.g., ARIMA type time series forecasting, Bayesian techniques and deep learning, e.g., [1–5], (ii) disease surveillance, e.g., [6, 7], and (iii) counter-factual analysis of epidemics using multi-agent models, e.g., [8–13]; indeed, the results of [11, 14] were very influential in the early decisions for lockdowns in a number of countries.

The specific questions of interest change with the stage of the pandemic. In the pre-pandemic stage, the focus was on understanding how the outbreak started, epidemic parameters, and the risk of importation to different regions. Once outbreaks started (the acceleration stage), the focus is on determining the growth rates, the differences in spatio-temporal characteristics, and testing bias. In the mitigation stage, the questions are focused on non-prophylactic interventions, such as school and workplace closures and other social-distancing strategies, determining the demand for healthcare resources, and testing and tracing. In the suppression stage, the focus shifts to using prophylactic interventions, combined with better tracing. These phases are not linear, and overlap with each other. For instance, the acceleration and mitigation stages of the pandemic might overlap spatially, temporally as well as within certain social groups.

Different kinds of models are appropriate at different stages, and for addressing different kinds of questions. For instance, statistical and machine learning models are very useful in forecasting and short-term projections. However, they are not very effective for longer-term projections, understanding the effects of different kinds of interventions, and counter-factual analysis. Mechanistic models are very useful for such questions. Simple compartmental type models, and their extensions, namely, structured metapopulation models, are useful for several population level questions. However, once the outbreak has spread, and complex individual and community level behaviors are at play, multi-agent models are most effective, since they allow for a more systematic representation of complex social interactions, individual and collective behavioral adaptation and public policies.

As with any mathematical modeling effort, data plays a big role in the utility of such models. Till recently, data on infectious diseases was very hard to obtain due to various issues, such as privacy and sensitivity of the data (since it is information about individual health), and logistics of collecting such data.
The data landscape during the SARS-CoV-2 pandemic has been very different: a large number of datasets are becoming available, ranging from disease outcomes (e.g., time series of the number of confirmed cases, deaths, and hospitalizations), some characteristics of their locations and demographics, healthcare infrastructure capacity (e.g., number of ICU beds, number of healthcare personnel, and ventilators), and various kinds of behaviors (e.g., level of social distancing, usage of PPEs); see [15–17] for comprehensive surveys on available datasets. However, using these datasets for developing good models, and addressing important public health questions remains challenging.

The goal of this article is to use the widely accepted stages of a pandemic as a guiding framework to highlight a few important problems that require attention in each of these stages. We will aim to provide a succinct model-agnostic formulation while identifying the key datasets needed, how they can be used, and the challenges arising in that process. We will also use SARS-CoV-2 as a case study unfolding in real-time, and highlight some interesting peer-reviewed and preprint literature that pertains to each of these problems. An important point to note is the necessity of randomly sampled data, e.g., data needed to assess the number of active cases and various demographics of individuals that were affected. Census provides an excellent rationale. It is the only way one can develop rigorous estimates of various epidemiologically relevant quantities.

There have been numerous surveys on the different types of datasets available for SARS-CoV-2, e.g., [15–18], as well as different kinds of modeling approaches. However, they do not describe how these models become relevant through the phases of pandemic response.
An earlier similar attempt to summarize such response-driven modeling efforts can be found in [19], based on the 2009-H1N1 experience; this paper builds on their work and discusses these phases in the present context and the SARS-CoV-2 pandemic. Although the paper touches upon different aspects of model-based decision making, we refer the readers to a companion article in the same special issue [20] for a focused review of models used for projection and forecasting.

Multiple organizations including CDC and WHO have their frameworks for preparing and planning response to a pandemic. For instance, the Pandemic Intervals Framework from CDC describes the stages in the context of an influenza pandemic; these are illustrated in Figure 1. These six stages span investigation, recognition and initiation in the early phase, followed by most of the disease spread occurring during the acceleration and deceleration stages. They also provide indicators for identifying when the pandemic has progressed from one stage to the next [21]. As envisioned, risk evaluation (i.e., using tools like Influenza Risk Assessment Tool (IRAT) and Pandemic Severity Assessment Framework (PSAF)) and early case identification characterize the first three stages, while non-pharmaceutical interventions (NPIs) and available and phases of pandemic alert.

While such frameworks aid in streamlining the response efforts of these organizations, they also enable effective messaging. To the best of our knowledge, there has not been a similar characterization of mathematical modeling efforts that go hand in hand with supporting the response.

For summarizing the key models, we consider four of the stages of pandemic response mentioned in Section 2: pre-pandemic, acceleration, mitigation and suppression. Here we provide the key problems in each stage, the datasets needed, the main tools and techniques used, and pertinent challenges.
We structure our discussion based on our experience with modeling the spread of COVID-19 in the US, done in collaboration with local and federal agencies.

• Pre-pandemic (Section 4): in the initial time period, there are few human infections, and the key questions involve understanding the epidemiological parameters, and the risks of importation to different countries. The primary sources of data used in this stage include line lists, clinical investigations and prior literature on similar diseases (for the former question), and mobility data such as airline flows, and information on travel restrictions.
• Acceleration (Section 5): this stage is relevant once the epidemic takes root within a country. There is usually a big lag in surveillance and response efforts, and the key questions are to model spread
• Mitigation (Section 6): in this stage, different interventions, which are mostly non-pharmaceutical in the case of a novel pathogen, are implemented by government agencies, once the outbreak has taken hold within the population. This stage involves understanding the impact of interventions on case counts and health infrastructure demands, taking individual behaviors into account. The additional datasets needed in this stage include those on behavioral changes and hospital capacities.
• Suppression (Section 7): this stage involves designing methods to control the outbreak by contact tracing & isolation and vaccination. Data on contact tracing, associated biases, vaccine production schedules, and compliance & hesitancy are needed in this stage.

Figure 2 gives an overview of this framework and summarizes the data needs in these stages. These stages also align well with the focus of the various modeling working groups organized by CDC which include epidemic parameter estimation, international spread risk, sub-national spread forecasting, impact of interventions, healthcare systems, and university modeling.
In reality, one should note that these stages may overlap, and may vary based on geographical factors and response efforts. Moreover, specific problems can be approached prospectively in earlier stages, or retrospectively during later stages. This framework is thus meant to be more conceptual than interpreted along a linear timeline. Results from such stages are very useful for policymakers to guide real-time response.

Table 1. Parameter values: best guess, 2020-04-14 version of the “COVID-19 Pandemic Planning Scenarios” document prepared by the Centers for Disease Control and Prevention (CDC) SARS-CoV-2 Modeling Team [23].

parameter               values            description
transmissibility (R0)   2.5 [2.0, 3.0]    basic reproduction number
incubation period       5 days            time from infection to onset
latent period           3 ∼

Consider a novel pathogen emerging in human populations that is detected through early cases involving unusual symptoms or unknown etiology. Such outbreaks are characterized by some kind of spillover event, mostly through zoonotic means, like in the case of COVID-19 or past influenza pandemics (e.g., swine flu and avian flu). A similar scenario can occur when an incidence of a well-documented disease with no known vaccine or therapeutics emerges in some part of the world, causing severe outcomes or fatalities (e.g., Ebola and Zika). Regardless of the development status of the country where the pathogen emerged, such outbreaks now carry the risk of causing a worldwide pandemic due to the global connectivity induced by human travel.

Two questions become relevant at this stage: what are the epidemiological attributes of this disease, and what are the risks of importation to a different country? While the first question involves biological and clinical investigations, the latter is more related to societal and environmental factors.

One of the crucial tasks during early disease investigation is to ascertain the transmission and severity of the disease. These are important dimensions along which the pandemic potential is characterized because together they determine the overall disease burden, as demonstrated within the Pandemic Severity Assessment Framework [22]. In addition to risk assessment for right-sizing response, they are integral to developing meaningful disease models.

Formulation

Let $\Theta = \{\theta_T, \theta_S\}$ represent the transmission and severity parameters of interest. They can be further subdivided into sojourn time parameters $\theta^{\delta}_{\cdot}$ and transition probability parameters $\theta^{p}_{\cdot}$. Here $\Theta$ corresponds to a continuous time Markov chain (CTMC) on the disease states. The problem formulation can be represented as follows:

Given $\Pi(\Theta)$, the prior distribution on the disease parameters, and a dataset $\mathcal{D}$, estimate the posterior distribution $P(\Theta \mid \mathcal{D})$ over all possible values of $\Theta$. In a model-specific form, this can be expressed as $P(\Theta \mid \mathcal{D}, \mathcal{M})$, where $\mathcal{M}$ is a statistical, compartmental or agent-based disease model.

Data needs

In order to estimate the disease parameters sufficiently, line lists for individual confirmed cases are ideal. Such datasets contain, for each record, the date of confirmation, possible date of onset, severity (hospitalization/ICU) status, and date of recovery/discharge/death. Furthermore, age- and demographic/co-morbidity information allow development of models that are age- and risk group stratified. One such crowd-sourced line list was compiled during the early stages of COVID-19 [24] and later released by CDC for US cases [25]. Data from detailed clinical investigations from other countries such as China, South Korea, and Singapore was also used to parameterize these models [26]. In the absence of such datasets, past parameter estimates of similar diseases (e.g., SARS, MERS) were used for early analyses.
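The posterior estimation formulated above can be sketched for a single sojourn-time parameter. The example below is only an illustration under stated assumptions, not the calibration procedure used in the cited works: it assumes that onset-to-confirmation delays taken from a line list are exponentially distributed with mean theta (consistent with a CTMC), uses a flat prior on a discrete grid, and the delay values and function name are hypothetical.

```python
import math

def grid_posterior(delays, grid, prior=None):
    """Posterior P(theta | D) on a grid, for exponential delays with mean theta."""
    if prior is None:
        prior = [1.0 / len(grid)] * len(grid)  # flat prior Pi(theta), an assumption
    n, total = len(delays), sum(delays)
    # Log-posterior up to a constant: log prior + log-likelihood of n i.i.d. delays
    log_post = [math.log(p) - n * math.log(th) - total / th
                for th, p in zip(grid, prior)]
    shift = max(log_post)                      # subtract the max for numerical stability
    weights = [math.exp(lp - shift) for lp in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical onset-to-confirmation delays (days) from a line list
delays = [3, 5, 4, 6, 2, 7, 5, 4]
grid = [1.0 + 0.1 * k for k in range(141)]     # candidate means from 1 to 15 days
post = grid_posterior(delays, grid)

map_theta = grid[post.index(max(post))]        # posterior mode
post_mean = sum(th * p for th, p in zip(grid, post))
print(f"MAP mean delay: {map_theta:.1f} days, posterior mean: {post_mean:.2f} days")
```

With a flat prior the posterior mode coincides with the sample mean of the delays, while the posterior mean is pulled upward by the right-skewed tail; a model-specific version would replace the closed-form likelihood with one induced by the disease model.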

Modeling approaches

For a model agnostic approach, the delays and probabilities are obtained by various techniques, including Bayesian and Ordinary Least Squares fitting to various delay distributions. For a particular disease model, these are estimated through model calibration techniques such as MCMC and particle filtering approaches. A summary of community estimates of various disease parameters is provided at https://github.com/midas-network/COVID-19. Further, such estimates allow the design of pandemic planning scenarios varying in levels of impact, as seen in the CDC scenarios page. See [27–29] for methods and results related to estimating COVID-19 disease parameters from real data. Current models use a large set of disease parameters for modeling COVID-19 dynamics; they can be broadly classified as transmission parameters and hospital resource parameters. For instance in our work, we currently use parameters (with explanations) shown in Table 1.

Challenges

Often these parameters are model specific, and hence one needs to be careful when reusing parameter estimates from literature. They are related but not identifiable with respect to population level measures such as basic reproductive number $R_0$ (or effective reproductive number $R_{\mathrm{eff}}$) and doubling time, which allow tracking the rate of epidemic growth. Also, the estimation is hindered by inherent biases in case ascertainment rate, reporting delays and other gaps in the surveillance system. Aligning different data streams (e.g., outpatient surveillance, hospitalization rates, mortality records) is in itself challenging.

When a disease outbreak occurs in some part of the world, it is imperative for most countries to estimate their risk of importation through spatial proximity or international travel. Such measures are incredibly valuable in setting a timeline for preparation efforts, and initiating health checks at the borders. Over centuries, pandemics have spread faster and faster across the globe, making it all the more important to characterize this risk as early as possible.

Formulation

Let $\mathcal{C}$ be the set of countries, and $G = \{\mathcal{C}, \mathcal{E}\}$ an international network, where edges (often weighted and directed) in $\mathcal{E}$ represent some notion of connectivity. The importation risk problem can be formulated as below:

Given $C_o \in \mathcal{C}$, the country of origin with an initial case at time 0, and $C_i$, the country of interest, using $G$, estimate the expected time taken $T_i$ for the first cases to arrive in country $C_i$.

In its probabilistic form, the same can be expressed as estimating the probability $P_i(t)$ of seeing the first case in country $C_i$ by time $t$.

Data needs

Assuming we have initial case reports from the origin country, the first data needed is a network that connects the countries of the world to represent human travel. The most common source of such information is the airline network datasets, from sources such as IATA, OAG, and OpenFlights; [30] provides a systematic review of how airline passenger data has been used for infectious disease modeling. These datasets could either capture static measures such as number of seats available or flight schedules, or a dynamic count of passengers per month along each itinerary. Since the latter has intrinsic delays in collection and reporting, for an ongoing pandemic they may not be representative. During such times, data on ongoing travel restrictions [31] become important to incorporate. Multi-modal traffic will also be important to incorporate for countries that share land borders or have heavy maritime traffic. For diseases such as Zika, where establishment risk is more relevant, data on vector abundance or prevailing weather conditions are appropriate.

Modeling approaches

Simple structural measures on networks (such as degree, PageRank) could provide static indicators of vulnerability of countries. By transforming the weighted, directed edges into probabilities, one can use simple contagion models (e.g., Independent Cascades) to simulate disease spread and empirically estimate expected time of arrival. Global metapopulation models (GLEaM) that combine SEIR type dynamics with an airline network have also been used in the past for estimating importation risk. Brockmann and Helbing [32] used a similar framework to quantify effective distance on the network, which seemed to be well correlated with time of arrival for multiple pandemics in the past; this has been extended to COVID-19 [8, 33]. In [34], the authors employ air travel volume obtained through IATA from ten major cities across China to rank various countries along with the IDVI to convey their vulnerability. [35] consider the task of forecasting international and domestic spread of COVID-19 and employ Official Airline Guide (OAG) data for determining air traffic to various countries, and [36] fit a generalized linear model for observed number of cases in various countries as a function of air traffic volume obtained from OAG data to determine countries with potential risk of under-detection. Also, [37] provide an Africa-specific case study of vulnerability and preparedness using data from the Civil Aviation Administration of China.
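As a rough illustration of the effective distance idea, the sketch below converts passenger flows into outgoing transition fractions P, assigns each edge the length 1 - ln(P) (our reading of the definition in [32]), and runs Dijkstra's algorithm from the origin. The four-node network, its flow values and the function name are hypothetical; real analyses would use airline data such as IATA or OAG.

```python
import heapq
import math

def effective_distances(flows, origin):
    """Shortest-path effective distance from origin; edge length is 1 - ln(P)."""
    # Turn raw passenger flows into the fraction of travelers leaving each node
    lengths = {}
    for src, dests in flows.items():
        total = sum(dests.values())
        lengths[src] = {dst: 1.0 - math.log(f / total)
                        for dst, f in dests.items() if f > 0}
    # Standard Dijkstra over the non-negative effective edge lengths
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in lengths.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

# Hypothetical monthly passenger flows between four countries
flows = {
    "A": {"B": 900, "C": 100},
    "B": {"A": 900, "C": 500, "D": 100},
    "C": {"A": 100, "B": 500, "D": 400},
    "D": {"B": 100, "C": 400},
}
dist = effective_distances(flows, "A")
ranking = sorted(dist, key=dist.get)  # countries ordered by expected arrival
print(ranking, {k: round(v, 2) for k, v in dist.items()})
```

Countries with smaller effective distance from the origin would be expected to see imported cases earlier; note that a weakly connected direct route can be longer, in this metric, than an indirect route through a hub.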

Challenges

Note that arrival of an infected traveler will precede a local transmission event in a country. Hence the former is more appropriate to quantify in early stages. Also, the formulation is agnostic to whether it is the first infected arrival or first detected case. However, in the real world, the former is difficult to observe, while the latter is influenced by security measures at ports of entry (land, sea, air) and the ease of identification for the pathogen. For instance, in the case of COVID-19, the long incubation period and the high likelihood of asymptomaticity could have resulted in many infected travelers being missed by health checks at PoEs. We also noticed potential administrative delays in reporting by multiple countries fearing travel restrictions.

As the epidemic takes root within a country, it may enter the acceleration phase. Depending on the testing infrastructure and agility of the surveillance system, response efforts might lag or lead the rapid growth in case rate. Under such a scenario, two crucial questions emerge that pertain to how the disease may spread spatially/socially and how the case rate may grow over time.

Within the country, there is need to model the spatial spread of the disease at different scales: state, county, and community levels. Similar to the importation risk, such models may provide an estimate of when cases may emerge in different parts of the country. When coupled with vulnerability indicators (socio-economic, demographic, co-morbidities) they provide a framework for assessing the heterogeneous impact the disease may have across the country. Detailed agent-based models for urban centers may help identify hotspots and potential case clusters that may emerge (e.g., correctional facilities, nursing homes, food processing plants, etc. in the case of COVID-19).

Formulation

Given a population representation $P$ at appropriate scale and a disease model $\mathcal{M}$ per entity (individual or sub-region), model the disease spread under different assumptions of underlying connectivity and disease parameters $\Theta$. The result will be a spatio-temporal spread model that results in $Z_{s,t}$, the time series of disease states over time for region $s$.

Data needs

Some of the common datasets needed by most modeling approaches include: (1) social and spatial representation, which includes Census and population data, which are available from Census departments (see, e.g., [38]), and Landscan [39], (2) connectivity between regions (commuter, airline, road/rail/river), e.g., [30, 31], (3) data on locations, including points of interest, e.g., OpenStreetMap [40], and (4) activity data, e.g., the American Time Use Survey [41]. These datasets help capture where people reside and how they move around, and come in contact with each other. While some of these are static, more dynamic measures, such as from GPS traces, become relevant as individuals change their behavior during a pandemic.

Modeling approaches

Different kinds of structured metapopulation models [8, 42–45], and agent based models [46–50] have been used in the past to model the sub-national spread; we refer to [13, 51, 52] for surveys on different modeling approaches. These models incorporate typical mixing patterns, which result from detailed activities and co-location (in the case of agent based models), and different modes of travel and commuting (in the case of metapopulation models).
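To make the metapopulation idea concrete, the following is a minimal sketch of SEIR dynamics on two coupled patches, producing a toy version of the time series Z_{s,t}. It is not any of the cited models: the mixing matrix, the population sizes, the Euler time step and the parameter values (chosen so that beta/gamma = 2.5 with a 3-day latent period) are all illustrative assumptions.

```python
def simulate_metapop_seir(pop, mixing, beta, sigma, gamma, seed_patch, days, dt=0.1):
    """Euler-integrated SEIR on coupled patches.

    mixing[i][j] is the assumed fraction of contacts that residents of
    patch i make with patch j; each row sums to 1.
    """
    n = len(pop)
    S = [float(p) for p in pop]
    E, I, R = [0.0] * n, [0.0] * n, [0.0] * n
    S[seed_patch] -= 1.0
    I[seed_patch] = 1.0                       # one initial infectious individual
    steps_per_day = int(round(1.0 / dt))
    infectious_by_day = []
    for _day in range(days):
        infectious_by_day.append(list(I))     # record state at the start of each day
        for _step in range(steps_per_day):
            prev = [I[j] / pop[j] for j in range(n)]
            # Force of infection in patch i mixes prevalence across patches
            lam = [beta * sum(mixing[i][j] * prev[j] for j in range(n)) for i in range(n)]
            for i in range(n):
                new_e = lam[i] * S[i] * dt
                new_i = sigma * E[i] * dt
                new_r = gamma * I[i] * dt
                S[i] -= new_e
                E[i] += new_e - new_i
                I[i] += new_i - new_r
                R[i] += new_r
    return infectious_by_day, (S, E, I, R)

pop = [100_000, 50_000]
mixing = [[0.99, 0.01], [0.02, 0.98]]         # hypothetical commuting-driven mixing
history, final = simulate_metapop_seir(
    pop, mixing, beta=0.5, sigma=1 / 3, gamma=1 / 5, seed_patch=0, days=300
)
peak_day = [max(range(len(history)), key=lambda d: history[d][i]) for i in range(len(pop))]
print("peak day per patch:", peak_day)
```

Because the outbreak is seeded in the first patch and the coupling is weak, the second patch peaks later; the size of that lag is the kind of spatial-arrival estimate such models provide. An agent based model would replace the patch-level mixing matrix with individual activities and co-location.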

Challenges

While metapopulation models can be built relatively rapidly, agent based models are much harder: the datasets need to be assembled at a large scale, with detailed construction pipelines, see, e.g., [46–50]. Since detailed individual activities drive the dynamics in agent based models, schools and workplaces have to be modeled, in order to make predictions meaningful. Such models will get reused at different stages of the outbreak, so they need to be generic enough to incorporate dynamically evolving disease information. Finally, a common challenge across modeling paradigms is the ability to calibrate to the dynamically evolving spatio-temporal data from the outbreak; this is especially challenging in the presence of reporting biases and data insufficiency issues.

Given the early growth of cases within the country (or sub-region), there is need for quantifying the rate of increase in comparable terms across the duration of the outbreak (accounting for the exponential nature of such processes). These estimates also serve as references, when evaluating the impact of various interventions. As an extension, such methods and more sophisticated time series methods can be used to produce short-term forecasts for disease evolution.
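One simple, scale-independent way to express the rate of increase is the exponential growth rate obtained from a least-squares fit of log counts against time, together with the implied doubling time ln(2)/r. The sketch below is a minimal illustration under the assumption of purely exponential growth with strictly positive counts; the synthetic case series and function names are hypothetical, and the naive extrapolation at the end is not a substitute for the forecasting methods discussed in this section.

```python
import math

def growth_rate(counts):
    """Per-day exponential growth rate: slope of log(counts) versus day index."""
    ys = [math.log(c) for c in counts]        # requires strictly positive counts
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx

def doubling_time(rate):
    """Days needed for counts to double at a constant growth rate."""
    return math.log(2) / rate

# Synthetic cumulative case counts that double every 3 days
counts = [100 * 2 ** (t / 3) for t in range(14)]
r = growth_rate(counts)
td = doubling_time(r)

# Naive short-term projection: extend the fitted exponential h days ahead
forecast = [counts[-1] * math.exp(r * h) for h in range(1, 8)]
print(f"growth rate {r:.3f}/day, doubling time {td:.1f} days")
```

Because the growth rate does not depend on the absolute number of cases, it can be compared across regions of different sizes and across phases of the outbreak.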

Formulation

Given the disease time series data within the country $Z_{s,t}$ until data horizon $T$, provide scale-independent growth rate measures $G_s(T)$, and forecasts $\hat{Z}_{s,u}$ for $u \in [T, T + \Delta T]$, where $\Delta T$ is the forecast horizon.

Data needs

Models at this stage require datasets such as (1) time series data on different kinds of disease outcomes, including case counts, mortality, hospitalizations, along with attributes, such as age, gender and location, e.g., [53–57], (2) any associated data for reporting bias (total tests, test positivity rate) [58], which need to be incorporated into the models, as these biases can have a significant impact on the dynamics, and (3) exogenous regressors (mobility, weather), which have been shown to have a significant impact on other diseases, such as Influenza, e.g., [59].

Modeling approaches

Even before building statistical or mechanistic time series forecasting methods, one can derive insights through analytical measures of the time series data. For instance, the effective reproductive number, estimated from the time series [60], can serve as a scale-independent metric to compare the outbreaks across space and time. Additionally, multiple statistical methods ranging from autoregressive models to deep learning techniques can be applied to the time series data, with additional exogenous variables as input. While such methods perform reasonably for short-term targets, mechanistic approaches as described earlier can provide better long-term projections. Various ensembling techniques have also been developed in the recent past to combine such multi-model forecasts to provide a single robust forecast with better uncertainty quantification. One such effort that combines more than 30 methods for COVID-19 can be found at the COVID Forecasting Hub. We also point to the companion paper for more details on projection and forecasting models.

Challenges

Data on epidemic outcomes usually has a lot of uncertainties and errors, including missing data, collection bias, and backfill. For forecasting tasks, these time series data need to be near real-time, else one needs to do both nowcasting, as well as forecasting. Other exogenous regressors can provide valuable lead time, due to inherent delays in disease dynamics from exposure to case identification. Such frameworks need to be generalized to accommodate qualitative inputs on future policies (shutdowns, mask mandates, etc.), as well as behaviors, as we discuss in the next section.

Once the outbreak has taken hold within the population, local, state and national governments attempt to mitigate and control its spread by considering different kinds of interventions. Unfortunately, as the COVID-19 pandemic has shown, there is a significant delay in the time taken by governments to respond. As a result, this has caused a large number of cases, a fraction of which lead to hospitalizations. Two key questions in this stage are: (1) how to evaluate different kinds of interventions, and choose the most effective ones, and (2) how to estimate the healthcare infrastructure demand, and how to mitigate it. The effectiveness of an intervention (e.g., social distancing) depends on how individuals respond to them, and the level of compliance. The health resource demand depends on the specific interventions which are implemented. As a result, both these questions are connected, and require models which incorporate appropriate behavioral responses.

In the initial stages, only non-prophylactic interventions are available, such as: social distancing, school and workplace closures, and use of PPEs, since no vaccinations and anti-virals are available. As mentioned above, such analyses are almost entirely model based, and the specific model depends on the nature of the intervention and the population being studied.

Formulation

Given a model, denoted abstractly as $\mathcal{M}$, the general goals are (1) to evaluate the impact of an intervention (e.g., school and workplace closure, and other social distancing strategies) on different epidemic outcomes (e.g., average outbreak size, peak size, and time to peak), and (2) find the most effective intervention from a suite of interventions, with given resource constraints. The specific formulation depends crucially on the model and type of intervention. Even for a single intervention, evaluating its impact is quite challenging, since there are a number of sources of uncertainty, and a number of parameters associated with the intervention (e.g., when to start school closure, how long, and how to restart). Therefore, finding uncertainty bounds is a key part of the problem.

Data needs

While all the data needs from the previous stages for developing a model are still there, representation of different kinds of behaviors is a crucial component of the models in this stage; this includes: use of PPEs, compliance to social distancing measures, and level of mobility. Statistics on such behaviors are available at a fairly detailed level (e.g., counties and daily) from multiple sources, such as (1) the COVID-19 Impact Analysis Platform from the University of Maryland [56], which gives metrics related to social distancing activities, including level of staying home, outside county trips, outside state trips, (2) changes in mobility associated with different kinds of activities from Google [61], and other sources, (3) survey data on different kinds of behaviors, such as usage of masks [62].

Footnote: COVID Forecasting Hub, https://covid19forecasthub.org/

Modeling approaches

As mentioned above, such analyses are almost entirely model based, including structured metapopulation models [8, 42–45], and agent based models [46–50]. Different kinds of behaviors relevant to such interventions, including compliance with using PPEs and compliance to social distancing guidelines, need to be incorporated into these models. Since there is a great deal of heterogeneity in such behaviors, it is conceptually easiest to incorporate them into agent based models, since individual agents are represented. However, calibration, simulation and analysis of such models pose significant computational challenges. On the other hand, the simulation of metapopulation models is much easier, but such behaviors cannot be directly represented; instead, modelers have to estimate the effect of different behaviors on the disease model parameters, which can pose modeling challenges.
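The indirect route described for population-level models, where behavior enters through the disease parameters, can be sketched as follows. This is a deliberately simplified single-population SIR example, not one of the cited models: it assumes that an intervention with a given compliance and per-person contact reduction scales the transmission rate by (1 - compliance × contact reduction) from its start day onward, and all parameter values and names are hypothetical.

```python
def sir_outcomes(beta, gamma, pop, i0, days, start=None, reduction=0.0, dt=0.1):
    """Attack rate, peak size and peak day of an SIR epidemic with an optional NPI.

    From day `start` onward, transmission is scaled by (1 - reduction).
    """
    S, I = float(pop - i0), float(i0)
    peak, peak_day = I, 0
    steps_per_day = int(round(1.0 / dt))
    for day in range(days):
        active = start is not None and day >= start
        b = beta * (1.0 - reduction) if active else beta
        for _step in range(steps_per_day):
            new_inf = b * S * I / pop * dt
            new_rec = gamma * I * dt
            S -= new_inf
            I += new_inf - new_rec
        if I > peak:
            peak, peak_day = I, day + 1
    return {"attack_rate": (pop - S) / pop, "peak": peak, "peak_day": peak_day}

# Assumed behavioral inputs: 80% compliance, 50% fewer contacts among compliers
compliance, contact_reduction = 0.8, 0.5
reduction = compliance * contact_reduction

baseline = sir_outcomes(beta=0.5, gamma=0.2, pop=1_000_000, i0=10, days=365)
distancing = sir_outcomes(beta=0.5, gamma=0.2, pop=1_000_000, i0=10, days=365,
                          start=30, reduction=reduction)
for name, out in (("baseline", baseline), ("distancing", distancing)):
    print(name, {k: round(v, 3) for k, v in out.items()})
```

Sweeping the start day, duration and reduction over plausible ranges, rather than using single values, is one way to obtain the uncertainty bounds that the formulation calls for.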

Challenges

There are a number of challenges in using data on behaviors, which depend on the specific datasets. Much of the data available for COVID-19 is estimated through indirect sources, e.g., through cell phone and online activities, and crowd-sourced platforms. These sources can provide large spatio-temporal datasets, but have unknown biases and uncertainties. On the other hand, survey data is often more reliable, and provides several covariates, but is typically very sparse. Handling such uncertainties, rigorous sensitivity analysis, and incorporating the uncertainties into the analysis of the simulation outputs are important steps for modelers.

The COVID-19 pandemic has led to a significant increase in hospitalizations. Hospitals are typically optimized to run near capacity, so there have been fears that the hospital capacities would not be adequate, especially in several countries in Asia, but also in some regions in the US. Nosocomial transmission could further increase this burden.

Formulation

The overall problem is to estimate the demand for hospital resources within a population; this includes the number of hospitalizations, and more refined types of resources, such as ICUs, CCUs, medical personnel, and equipment such as ventilators. An important issue is whether the capacity of hospitals within the region would be overrun by the demand, when this is expected to happen, and how to design strategies to meet the demand; this could be through augmenting the capacities at existing hospitals, or building new facilities. Timing is of the essence, and projections of when the demands exceed capacity are important for governments to plan.

Data needs

The demands for hospitalization and other health resources can be estimated from the epidemic models mentioned earlier, by incorporating suitable health states, e.g., [43, 63]; in addition to the inputs needed for setting up the models for case counts, datasets are needed for hospitalization rates and durations of hospital stay, ICU care, and ventilation. The other important inputs for this component are hospital capacity, and the referral regions (which represent where patients travel for hospitalization). Different public and commercial datasets provide such information, e.g., [64, 65].
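As a rough illustration of how these inputs combine, the sketch below converts a daily case series into bed occupancy using a hospitalization rate, a fixed delay from case to admission, and a fixed length of stay, and then reports the first day on which occupancy exceeds capacity. The fixed delay and fixed stay are simplifying assumptions (real models use distributions over these durations), and the function names and parameter values are hypothetical.

```python
def hospital_occupancy(daily_cases, hosp_rate, admit_delay, length_of_stay):
    """Beds occupied on each day, given new cases per day.

    A fraction `hosp_rate` of each day's cases is admitted `admit_delay`
    days later, and every admission occupies a bed for `length_of_stay` days.
    """
    horizon = len(daily_cases) + admit_delay + length_of_stay
    occupancy = [0.0] * horizon
    for day, cases in enumerate(daily_cases):
        admissions = hosp_rate * cases
        start = day + admit_delay
        for d in range(start, start + length_of_stay):
            occupancy[d] += admissions
    return occupancy


def first_day_over_capacity(occupancy, capacity):
    """Index of the first day with occupancy above capacity, or None."""
    for day, beds in enumerate(occupancy):
        if beds > capacity:
            return day
    return None


# Hypothetical exponentially growing case series (8% growth per day).
daily_cases = [100.0 * 1.08 ** t for t in range(60)]
occupancy = hospital_occupancy(daily_cases, hosp_rate=0.05,
                               admit_delay=7, length_of_stay=10)
overrun_day = first_day_over_capacity(occupancy, capacity=400)
print("first day demand exceeds capacity:", overrun_day)
```

The same bookkeeping can be repeated per resource type (ICU beds, ventilators) with resource-specific rates and durations.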

Modeling approaches

Demand for health resources is typically incorporated into both metapopulation and agent based models, by having a fraction of the infectious individuals transition into a hospitalization state. An important issue to consider is what happens if there is a shortage of hospital capacity. Studying this requires modeling the hospital infrastructure, i.e., different kinds of hospitals within the region, and which hospital a patient goes to. There is typically limited data on this, and data on hospital referral regions, or a Voronoi tessellation, can be used. Understanding the regimes in which hospital demand exceeds capacity is an important question to study. Nosocomial transmission is typically much harder to study, since it requires more detailed modeling of processes within hospitals.

Challenges

There is a lot of uncertainty and variability in all the datasets involved in this process, making its modeling difficult. For instance, forecasts of the number of cases and hospitalizations have huge uncertainty bounds for medium- or long-term horizons, which is the kind of input necessary for understanding hospital demands, and whether there would be any deficits.
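One simple way to carry forecast uncertainty through to the capacity question is to work with an ensemble of demand trajectories rather than a single curve. The sketch below assumes such an ensemble is already available (for example, from repeated simulation runs with sampled parameters) and reports, for each day, the fraction of members in which demand exceeds capacity. The trajectories and the capacity value are made-up numbers for illustration only.

```python
def exceedance_probability(trajectories, capacity):
    """Per-day fraction of ensemble members with demand above capacity."""
    n_members = len(trajectories)
    horizon = len(trajectories[0])
    return [
        sum(1 for traj in trajectories if traj[day] > capacity) / n_members
        for day in range(horizon)
    ]


# Four hypothetical demand trajectories over three time points.
trajectories = [
    [80, 90, 100],
    [90, 110, 130],
    [100, 120, 160],
    [70, 80, 105],
]
print(exceedance_probability(trajectories, capacity=100))
```

A wide spread in these per-day fractions at longer horizons is one way the "huge uncertainty bounds" mentioned above show up in the deficit estimate.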

The suppression stage involves methods to control the outbreak, including reducing the incidence rate and potentially leading to the eradication of the disease in the end. Eradication in the case of COVID-19 appears unlikely as of now; what is more likely is that it will become one of the seasonal human coronaviruses that mutate continuously, much like the influenza virus.

The contact tracing problem refers to the ability to trace the neighbors of an infected individual. Ideally, if one is successful, each neighbor of an infected individual would be identified and isolated from the larger population to reduce the growth of a pandemic. In some cases, each such neighbor could be tested to see if the individual has contracted the disease. Contact tracing is the workhorse of epidemiology and has been immensely successful in controlling slow moving diseases. When combined with vaccination and other pharmaceutical interventions, it provides the best way to control and suppress an epidemic.
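The tracing step can be sketched on a toy contact network as follows. The network, the node labels, and the helper `neighbors_of` are illustrative assumptions; the sketch only shows the set operations involved, comparing the contacts reachable from the identified cases with the contacts of all infected individuals.

```python
def neighbors_of(graph, nodes):
    """Union of the neighbors of `nodes`, excluding the nodes themselves."""
    found = set()
    for node in nodes:
        found.update(graph.get(node, ()))
    return found - set(nodes)


# Hypothetical contact network as an adjacency map.
contacts = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d", "e"},
    "d": {"b", "c", "f"},
    "e": {"c"},
    "f": {"d"},
}
infected = {"a", "d"}   # everyone who is actually infected
identified = {"a"}      # the subset known to health authorities

to_trace = neighbors_of(contacts, identified)   # contacts tracers can reach
at_risk = neighbors_of(contacts, infected)      # contacts of all infected
coverage = len(to_trace & at_risk) / len(at_risk)
print(sorted(to_trace), sorted(at_risk), round(coverage, 2))
```

Here the contact "f" is missed because the infected individual "d" has not been identified, which is the gap that better case identification is meant to close.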

Formulation

The basic contact tracing problem is stated as follows: given a social contact network $G(V, E)$, a subset of nodes $S \subset V$ that are infected, and a subset $S_1 \subset S$ of nodes identified as infected, find all neighbors of $S$. Here a neighbor means an individual who is likely to have had substantial contact with the infected person. One then tests them (if tests are available) and, following that, isolates these neighbors, vaccinates them, or administers antivirals. The measures of effectiveness for the problem include: (i) maximizing the size of $S_1$; (ii) maximizing the size of the set $N(S_1) \subseteq N(S)$, i.e., the potential number of neighbors of the set $S_1$; (iii) doing this within a short period of time, so that these neighbors either do not become infectious or spend as few days as possible infectious while still interacting in the community in a normal manner; (iv) reducing the incidence rate in the community, which is the eventual goal: if all the neighbors of $S$ cannot be identified, one aims to identify those individuals who, when isolated or treated, lead to a large impact; and (v) verifying that these individuals indeed came in contact with the infected individuals and thus can be asked to isolate or be treated.

Data needs

Data needed for the contact tracing problem includes: (i) a line list of individuals who are currently known to be infected (this is needed in the case of human-based contact tracing). In the real world, when deploying human contact tracers, one interviews all the individuals who are known to be infectious and reaches out to their contacts.

Modeling approaches

Human contact tracing is routinely done in epidemiology. Most states in the US have hired such contact tracers. They obtain the daily incidence report from the state health departments and then proceed to contact the individuals who are confirmed to be infected. Earlier, human contact tracers used to go from house to house and identify the potential neighbors through a well defined interview process. Although very effective, this is very time consuming and labor intensive. Phones have been used extensively in the last 10-20 years, as they allow the contact tracers to reach individuals. They are helpful but have the downside that it might be hard to reach all individuals. During the COVID-19 outbreak, for the first time, societies and governments have considered and deployed digital contact tracing tools [66–70]. These can be quite effective but also have certain weaknesses, including privacy, accuracy, and limited market penetration of the digital apps.

Challenges

These include: (i) inability to identify everyone who is infectious (the set $S$); this is virtually impossible for a COVID-19-like disease unless the incidence rate has come down drastically, because many individuals are infected but asymptomatic; (ii) identifying all contacts of $S$ (or of the identified subset $S_1 \subset S$); this is hard since individuals cannot recall everyone they met, and some of the people they were in close proximity to might have been in stores or at social events and thus not known to individuals in the set $S$. Furthermore, even if a person is able to identify the contacts, it is often hard to reach all the individuals due to resource constraints (each human tracer can only contact a small number of individuals).

The overall goal of the vaccine allocation problem is to allocate vaccine efficiently and in a timely manner to reduce the overall burden of the pandemic.

Formulation

The basic version of the problem can be cast in a very simple manner (for networked models): given a graph $G(V, E)$ and a budget $B$ on the number of vaccines available, find a set $S$ of size $B$ to vaccinate so as to optimize a certain measure of effectiveness. The measure of effectiveness can be (i) minimizing the total number of individuals infected (or maximizing the total number of uninfected individuals); (ii) minimizing the total number of deaths (or maximizing the total number of deaths averted); (iii) optimizing the above quantities but keeping in mind certain equity and fairness criteria (across socio-demographic groups, e.g., age, race, income); (iv) taking into account vaccine hesitancy of individuals; (v) taking into account the fact that all vaccines are not available at the start of the pandemic, and when they become available, one gets a limited number of doses each month; (vi) deciding how to share the stockpile between countries, states, and other organizations; (vii) taking into account the efficacy of the vaccine.

Data needs

As in other problems, vaccine allocation problems need as input a good representation of the system; network based, meta-population based and compartmental mass action models can be used. One other key input is the vaccine budget, i.e., the production schedule and timeline, which serves as the constraint for the allocation problem. Additional data on prevailing vaccine sentiment and past compliance with seasonal/neonatal vaccinations are useful to estimate coverage.

Modeling approaches

The problem has been studied actively in the literature; the network science community has focused on optimal allocation schemes, while the public health community has focused on using meta-population models and assessing certain fixed allocation schemes based on socio-economic and demographic considerations. Game theoretic approaches that try to understand the strategic behavior of individuals and organizations have also been studied.

Challenges

The problem is computationally challenging, and thus simulation based optimization techniques are used most of the time. A challenge to the optimization approach comes from the fact that the optimal allocation scheme might be hard to compute or hard to implement. Other challenges include fairness criteria (e.g., the optimal set might be a specific group) and the multiple objectives that one needs to balance.
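As one possible instance of simulation based optimization for the networked version of the problem, the sketch below greedily picks nodes to vaccinate, scoring each candidate by the average outbreak size over repeated runs of a simple cascade on the graph. The cascade model, the greedy heuristic, the toy graph, and all names and parameter values are assumptions for illustration; the greedy choice is a heuristic and is not guaranteed to find the optimal set.

```python
import random


def outbreak_size(graph, seeds, vaccinated, p, rng):
    """One run of a simple cascade; vaccinated nodes are never infected."""
    infected = {s for s in seeds if s not in vaccinated}
    frontier = list(infected)
    while frontier:
        node = frontier.pop()
        for nbr in graph[node]:
            if nbr not in infected and nbr not in vaccinated and rng.random() < p:
                infected.add(nbr)
                frontier.append(nbr)
    return len(infected)


def expected_outbreak(graph, seeds, vaccinated, p, runs=200, seed=0):
    """Average outbreak size; the fixed seed gives every candidate the same random stream."""
    rng = random.Random(seed)
    total = sum(outbreak_size(graph, seeds, vaccinated, p, rng) for _ in range(runs))
    return total / runs


def greedy_allocation(graph, seeds, budget, p=0.5):
    """Add, one at a time, the node whose vaccination most reduces the outbreak."""
    vaccinated = set()
    for _ in range(budget):
        candidates = [v for v in graph if v not in vaccinated and v not in seeds]
        best = min(
            candidates,
            key=lambda v: expected_outbreak(graph, seeds, vaccinated | {v}, p),
        )
        vaccinated.add(best)
    return vaccinated


# Toy graph: node 0 is the initial case, node 1 bridges to nodes 2-5, node 6 is a leaf.
graph = {0: [1, 6], 1: [0, 2, 3, 4, 5], 2: [1], 3: [1], 4: [1], 5: [1], 6: [0]}
print(greedy_allocation(graph, seeds=[0], budget=2, p=0.5))
```

Each candidate evaluation requires many simulation runs, so the cost grows quickly with graph size and budget, which is one reason the problem is computationally demanding. Equity constraints or dose schedules would need to be added as restrictions on the candidate set or as additional terms in the objective.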

While the above sections provide an overview of salient modeling questions that arise during the key stages of a pandemic, mathematical and computational model development is equally if not more important as we approach the post-pandemic (or more appropriately inter-pandemic) phase. Often referred to as peacetime efforts, this phase allows modelers to retrospectively assess individual and collective models on how they performed during the pandemic. In order to encourage continued development and to identify data gaps, synthetic forecasting challenge exercises [71] may be conducted, where multiple modeling groups are invited to forecast synthetic scenarios with varying levels of data availability. Another set of models that are quite relevant for policymakers during the winding down stages are those that help assess the overall health burden and economic costs of the pandemic.

Acknowledgments.

The authors would like to thank members of the Biocomplexity COVID-19 Response Team and Network Systems Science and Advanced Computing (NSSAC) Division for their thoughtful comments and suggestions related to epidemic modeling and response support. We thank members of the Biocomplexity Institute and Initiative, University of Virginia for useful discussion and suggestions. This work was partially supported by National Institutes of Health (NIH) Grant R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF DIBBS Grant OAC-1443054, NSF Grant No.: OAC-1916805, NSF Expeditions in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, US Centers for Disease Control and Prevention 75D30119C05935, DTRA subcontract/ARA S-D00189-15-TO-01-UVA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.

References

[1] Bijaya Adhikari, Xinfeng Xu, Naren Ramakrishnan, and B. Aditya Prakash. EpiDeep: Exploiting embeddings for epidemic forecasting. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '19, pages 577–586, New York, NY, USA, 2019. Association for Computing Machinery. https://doi.org/10.1145/3292500.3330917.

[2] Gaetano Perone. An ARIMA model to forecast the spread and the final size of COVID-2019 epidemic in Italy (first version on SSRN 31 March). SSRN Electronic Journal, 03 2020.

[3] Angel Desai, Moritz Kraemer, Sangeeta Bhatia, Anne Cori, Pierre Nouvellet, Mark Herringer, Emily Cohn, Malwina Carrion, John Brownstein, Lawrence Madoff, and Britta Lassmann. Real-time epidemic forecasting: Challenges and opportunities. Health Security, 17:268–275, 08 2019.

[4] Nicholas G. Reich, Craig J McGowan, Teresa K. Yamana, Abhinav Tushar, Evan L. Ray, Dave Osthus, Sasikiran Kandula, Logan C. Brooks, Willow Crawford-Crudell, Graham Casey Gibson, Evan Moore, Rebecca Silva, Matthew Biggerstaff, Michael A. Johansson, Roni Rosenfeld, and Jeffrey L Shaman. Accuracy of real-time multi-model ensemble forecasts for seasonal influenza in the U.S. PLoS Computational Biology, 15, 2019.

[5] Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rosalind M. Eggo, and W. John Edmunds. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics, 22:56–61, 2018. The RAPIDD Ebola Forecasting Challenge.

[6] HealthMap. https://healthmap.org/en/.

[7] Isaac Fung, Zion Tse, and King-wa Fu. The use of social media in public health surveillance. Western Pacific Surveillance and Response Journal: WPSAR, 6:3–6, 04 2015.

[8] Matteo Chinazzi, Jessica T. Davis, Marco Ajelli, Corrado Gioannini, Maria Litvinova, Stefano Merler, Ana Pastore y Piontti, Kunpeng Mu, Luca Rossi, Kaiyuan Sun, Cécile Viboud, Xinyue Xiong, Hongjie Yu, M. Elizabeth Halloran, Ira M. Longini, and Alessandro Vespignani. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science, 368(6489):395–400, 2020. https://science.sciencemag.org/content/368/6489/395.

[9] Tom Britton. Basic prediction methodology for COVID-19: estimation and sensitivity considerations. medRxiv, 2020.

[10] Joacim Rocklöv, Henrik Sjödin, and Annelies Wilder-Smith. COVID-19 outbreak on the Diamond Princess cruise ship: estimating the epidemic potential and effectiveness of public health countermeasures. Journal of Travel Medicine, 27(3):taaa030, 2020.

[11] Neil Ferguson, Daniel Laydon, Gemma Nedjati Gilani, Natsuko Imai, Kylie Ainslie, Marc Baguelin, Sangeeta Bhatia, Adhiratha Boonyasiri, Zulma Cucunuba Perez, Gina Cuomo-Dannenburg, et al. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Imperial College Technical Report, 2020.

[12] Stephen Eubank, Hasan Guclu, V. S. Anil Kumar, Madhav Marathe, Aravind Srinivasan, Zoltan Toroczkai, and Nan Wang. Modelling disease outbreaks in realistic urban social networks. Nature, 429:180–184, 2004.

[13] Madhav Marathe and Anil Vullikanti. Computational epidemiology. Communications of the ACM, 56(7):88–96, 2013.

[14] IHME COVID, Christopher JL Murray, et al. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months. medRxiv, 2020.

[15] Teodoro Alamo, Daniel G Reina, Martina Mammarella, and Alberto Abella. Open data resources for fighting COVID-19. arXiv preprint arXiv:2004.06111, 2020. https://arxiv.org/pdf/2004.06111.pdf.

[16] Teodoro Alamo, DG Reina, and Pablo Millán. Data-driven methods to monitor, model, forecast and control COVID-19 pandemic: Leveraging data science, epidemiology and control theory. arXiv preprint arXiv:2006.01731, 2020. https://arxiv.org/pdf/2006.01731.pdf.

[17] Junaid Shuja, Eisa Alanazi, Waleed Alasmary, and Abdulaziz Alashaikh. COVID-19 datasets: A survey and future challenges. medRxiv, 2020.

[18] Reza Sameni. Mathematical modeling of epidemic diseases; a case study of the COVID-19 coronavirus. arXiv preprint arXiv:2003.11371, 2020. https://arxiv.org/pdf/2003.11371.pdf.

[19] Joseph T Wu and Benjamin J Cowling. The use of mathematical models to inform influenza pandemic preparedness and response. Experimental Biology and Medicine, 236(8):955–961, 2011.

[20] Aniruddha Adiga, Devdatt Dubhashi, Bryan Lewis, Madhav Marathe, Srinivasan Venkatramanan, and Anil Vullikanti. Mathematical models for COVID-19 pandemic: a comparative analysis. Journal of IISc, 2020.

[21] Rachel Holloway, Sonja A Rasmussen, Stephanie Zaza, Nancy J Cox, Daniel B Jernigan, and Influenza Pandemic Framework Workgroup. Updated preparedness and response framework for influenza pandemics. Morbidity and Mortality Weekly Report: Recommendations and Reports, 63(6):1–18, 2014.

[22] Carrie Reed, Matthew Biggerstaff, Lyn Finelli, Lisa M Koonin, Denise Beauvais, Amra Uzicanin, Andrew Plummer, Joe Bresee, Stephen C Redd, and Daniel B Jernigan. Novel framework for assessing epidemiologic effects of influenza epidemics and pandemics. Emerging Infectious Diseases, 19(1):85, 2013.

[23] Centers for Disease Control and Prevention. COVID-19 pandemic planning scenarios, 2020. [Online, accessed September 14, 2020].

[24] Bo Xu, Bernardo Gutierrez, Sumiko Mekaru, Kara Sewalk, Lauren Goodwin, Alyssa Loskill, Emily L Cohn, Yulin Hswen, Sarah C Hill, Maria M Cobo, et al. Epidemiological data from the COVID-19 outbreak, real-time case information. Scientific Data, 7(1):1–6, 2020.

[25] CDC. COVID-19 Case Surveillance Public Use Data | Data | Centers for Disease Control and Prevention. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf. (Accessed on 08/24/2020).

[26] Long-quan Li, Tian Huang, Yong-qing Wang, Zheng-ping Wang, Yuan Liang, Tao-bi Huang, Hui-yun Zhang, Weiming Sun, and Yuping Wang. COVID-19 patients' clinical characteristics, discharge rate, and fatality rate of meta-analysis. Journal of Medical Virology, 92(6):577–583, 2020.

[27] Tapiwa Ganyani, Cécile Kremer, Dongxuan Chen, Andrea Torneri, Christel Faes, Jacco Wallinga, and Niel Hens. Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data, March 2020. Eurosurveillance, 25(17):2000257, 2020.

[28] Stephen A Lauer, Kyra H Grantz, Qifang Bi, Forrest K Jones, Qulu Zheng, Hannah R Meredith, Andrew S Azman, Nicholas G Reich, and Justin Lessler. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Annals of Internal Medicine, 172(9):577–582, 2020.

[29] Joseph T Wu, Kathy Leung, Mary Bushman, Nishant Kishore, Rene Niehus, Pablo M de Salazar, Benjamin J Cowling, Marc Lipsitch, and Gabriel M Leung. Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nature Medicine, 26(4):506–510, 2020.

[30] Margaux Marie Isabelle Meslé, Ian Melvyn Hall, Robert Matthew Christley, Steve Leach, and Jonathan Michael Read. The use and reporting of airline passenger data for infectious disease modelling: a systematic review. Eurosurveillance, 24(31):1800216, 2019.

[31] Srini Venkatramanan. Flight cancellations related to 2019-nCoV (COVID-19). University of Virginia Dataverse, 2020.

[32] D. Brockmann and Dirk Helbing. The hidden geometry of complex, network-driven contagion phenomena. Science (New York, N.Y.), 342:1337–1342, 12 2013.

[33] Aniruddha Adiga, Srinivasan Venkatramanan, James Schlitt, Akhil Peddireddy, Allan Dickerman, Andrei Bura, Andrew Warren, Brian D Klahn, Chunhong Mao, Dawen Xie, Dustin Machi, Erin Raymond, Fanchao Meng, Golda Barrow, Henning Mortveit, Jiangzhuo Chen, Jim Walke, Joshua Goldstein, Mandy L Wilson, Mark Orr, Przemyslaw Porebski, Pyrros A Telionis, Richard Beckman, Stefan Hoops, Stephen Eubank, Young Yun Baek, Bryan Lewis, Madhav Marathe, and Chris Barrett. Evaluating the impact of international airline suspensions on the early global spread of COVID-19. medRxiv, 2020.

[34] Isaac I Bogoch, Alexander Watts, Andrea Thomas-Bachli, Carmen Huber, Moritz UG Kraemer, and Kamran Khan. Potential for global spread of a novel coronavirus from China. Journal of Travel Medicine, 2020.

[35] J.T. Wu, K. Leung, and G.M. Leung. Forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study. The Lancet, 2020.

[36] P.M. De Salazar, R. Niehus, A. Taylor, C.O. Buckee, and M. Lipsitch. Using predicted imports of 2019-nCoV cases to determine locations that may not be identifying all imported cases. medRxiv, 2020.

[37] M. Gilbert, G. Pullano, F. Pinotti, E. Valdano, C. Poletto, P.Y. Boelle, E. D'Ortenzio, Y. Yazdanpanah, S.P. Eholie, M. Altmann, and B. Gutierrez. Preparedness and vulnerability of African countries against introductions of 2019-nCoV. medRxiv, 2020.

[38] R. Beckman, J. Baggerly, A. Keith, and M. McKay. Creating synthetic baseline populations. Transportation Research-A, 30:415–429, 1996.

[39] LandScan. https://landscan.ornl.gov/landscan-datasets.

[40] OpenStreetMap. https://wiki.openstreetmap.org/wiki/Downloading_data.

[41] American Time Use Survey.

[42] Duygu Balcan, Vittoria Colizza, Bruno Gonçalves, Hao Hu, José J. Ramasco, and Alessandro Vespignani. Multiscale mobility networks and the spatial spreading of infectious diseases. Proceedings of the National Academy of Sciences, 106:21484–21489, 2009.

[43] Srinivasan Venkatramanan, Jiangzhuo Chen, Arindam Fadikar, Sandeep Gupta, Dave Higdon, Bryan Lewis, Madhav Marathe, Henning Mortveit, and Anil Vullikanti. Optimizing spatial allocation of seasonal influenza vaccine under temporal constraints. PLoS Computational Biology, 15(9):e1007111, 2019.

[44] Marcelo F. C. Gomes, Ana Pastore y Piontti, Luca Rossi, Dennis L Chao, Ira M. Longini, M. Elizabeth Halloran, and Alessandro Vespignani. Assessing the international spreading risk associated with the 2014 West African Ebola outbreak. PLoS Currents, 6, 2014.

[45] Qian Zhang, Kaiyuan Sun, Matteo Chinazzi, Ana Pastore y Piontti, Natalie E. Dean, Diana Patricia Rojas, Stefano Merler, Dina Mistry, Piero Poletti, Luca Rossi, Margaret Bray, M. Elizabeth Halloran, Ira M. Longini, and Alessandro Vespignani. Spread of Zika virus in the Americas. PNAS, 114(22):E4334–E4343, 2017.

[46] Stephen Eubank, VS Anil Kumar, Madhav V Marathe, Aravind Srinivasan, and Nan Wang. Structure of social contact networks and their impact on epidemics. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 70:181, 2006.

[47] Christopher L Barrett, Richard J Beckman, Maleq Khan, V. S. Anil Kumar, Madhav V Marathe, Paula E Stretz, Tridib Dutta, and Bryan Lewis. Generation and analysis of large synthetic social contact networks. In Winter Simulation Conference, pages 1003–1014. Winter Simulation Conference, 2009.

[48] Stephen Eubank, Hasan Guclu, VS Anil Kumar, Madhav V Marathe, Aravind Srinivasan, Zoltan Toroczkai, and Nan Wang. Modelling disease outbreaks in realistic urban social networks. Nature, 429(6988):180–184, 2004.

[49] Ira M. Longini, Azhar Nizam, Shufu Xu, Kumnuan Ungchusak, Wanna Hanshaoworakul, Derek A. Cummings, and Elizabeth M. Halloran. Containing pandemic influenza at the source. Science, 309(5737):1083–1087, August 2005.

[50] Neil Ferguson, Daniel Laydon, Gemma Nedjati Gilani, Natsuko Imai, Kylie Ainslie, Marc Baguelin, Sangeeta Bhatia, Adhiratha Boonyasiri, Zulma Cucunuba Perez, Gina Cuomo-Dannenburg, et al. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Imperial College Technical Reports, 2020.

[51] Linda JS Allen, Fred Brauer, Pauline Van den Driessche, and Jianhong Wu. Mathematical Epidemiology, volume 1945. Springer, 2008.

[52] Mark EJ Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.

[53] Amazon Web Services. A public data lake for analysis of COVID-19 data. https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/, 2020.

[54] MIDAS network. MIDAS 2019 novel coronavirus repository, 2020. https://github.com/midas-network/COVID-19.

[55] The New York Times. Coronavirus (Covid-19) Data in the United States, 2020. https://github.com/nytimes/covid-19-data.

[56] COVID-19 impact analysis platform. https://data.covid.umd.edu/.

[57] Biocomplexity Institute. COVID-19 Surveillance Dashboard, 2020. http://ncov.bii.virginia.edu/dashboard/.

[58] The COVID Tracking Project. https://covidtracking.com/.

[59] Jeffrey Shaman, Virginia Pitzer, Cécile Viboud, Bryan Grenfell, and Marc Lipsitch. Absolute humidity and the seasonal onset of influenza in the continental United States. PLoS Biology, 8:e1000316, 02 2010.

[60] A Cori. EpiEstim: a package to estimate time varying reproduction numbers from epidemic curves. R package version, pages 1–1, 2013.

[61] Google COVID-19 community mobility reports.

[62] Mask-wearing survey data. https://github.com/nytimes/covid-19-data/tree/master/mask-use.

[63] Xutong Wang, Remy F Pasco, Zhanwei Du, Michaela Petty, Spencer J Fox, Alison P Galvani, Michael Pignone, S Claiborne Johnston, and Lauren Ancel Meyers. Impact of social distancing measures on coronavirus disease healthcare demand, central Texas, USA. Emerging Infectious Diseases, 26(10), July 2020.

[64] Current hospital capacity estimates - snapshot.

[65] Total hospital bed occupancy (COVID-19).

[66] Lars Lorch, William Trouleau, Stratis Tsirtsis, Aron Szanto, Bernhard Schölkopf, and Manuel Gomez-Rodriguez. Quantifying the effects of contact tracing, testing, and containment. arXiv preprint arXiv:2004.07641, 2020.

[67] Marcel Salathé, Christian L Althaus, Richard Neher, Silvia Stringhini, Emma Hodcroft, Jacques Fellay, Marcel Zwahlen, Gabriela Senti, Manuel Battegay, Annelies Wilder-Smith, et al. COVID-19 epidemic in Switzerland: on the importance of testing, contact tracing and isolation. Swiss Medical Weekly, 150(1112), 2020.

[68] Luca Ferretti, Chris Wymant, Michelle Kendall, Lele Zhao, Anel Nurtay, Lucie Abeler-Dörner, Michael Parker, David Bonsall, and Christophe Fraser. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science, 2020.

[69] Mirjam Kretzschmar, Ganna Rozhnova, and Michiel van Boven. Isolation and contact tracing can tip the scale to containment of COVID-19 in populations with social distancing. Available at SSRN 3562458, 2020. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3562458.

[70] Justin Chan, Shyam Gollakota, Eric Horvitz, Joseph Jaeger, Sham Kakade, Tadayoshi Kohno, John Langford, Jonathan Larson, Sudheesh Singanamalla, Jacob Sunshine, et al. PACT: Privacy sensitive protocols and mechanisms for mobile contact tracing. arXiv preprint arXiv:2004.03544, 2020. https://arxiv.org/pdf/2004.03544.pdf.

[71] Cécile Viboud, Kaiyuan Sun, Robert Gaffey, Marco Ajelli, Laura Fumanelli, Stefano Merler, Qian Zhang, Gerardo Chowell, Lone Simonsen, Alessandro Vespignani, et al. The RAPIDD Ebola forecasting challenge: Synthesis and lessons learnt.