A hierarchical spatio-temporal model to analyze relative risk variations of COVID-19: a focus on Spain, Italy and Germany
AA hierarchical spatio-temporal model to analyzerelative risk variations of COVID-19: a focus onSpain, Italy and Germany
Abdollah Jalilian and Jorge Mateu Department of Statistics, Razi University, Kermanshah,67149-67346, Iran Universitat Jaume I, Castell´on, Spain
Abstract
The novel coronavirus disease (COVID-19) has spread rapidly acrossthe world in a short period of time and with a heterogeneous pattern. Un-derstanding the underlying temporal and spatial dynamics in the spreadof COVID-19 can result in informed and timely public health policies. Inthis paper, we use a spatio-temporal stochastic model to explain the tem-poral and spatial variations in the daily number of new confirmed casesin Spain, Italy and Germany from late February to mid September 2020.Using a hierarchical Bayesian framework, we found that the temporaltrend of the epidemic in the three countries rapidly reached their peaksand slowly started to decline at the beginning of April and then increasedand reached their second maximum in August. However decline and in-crease of the temporal trend seems to be sharper in Spain and smootherin Germany. The spatial heterogeneity of the relative risk of COVID-19in Spain is also more pronounced than Italy and Germany.
Keywords:
Autoregressive model Besag model COVID-19 Diseasemapping Spatio-temporal prediction
MSC 2010:
Started from Wuhan, the capital of Hubei province, China in December 2019,the outbreak of 2019 novel coronavirus disease (COVID-19) has spread rapidlyacross more than 200 countries, areas or territories in a short period of timewith so far over 4.4 million confirmed cases and 296 thousand confirmed deaths(World Health Organization, 2020).The spread of COVID-19 across and within countries has not followed a ho-mogeneous pattern (Giuliani et al., 2020). The causes of this heterogeneity are1 a r X i v : . [ s t a t . A P ] S e p ot yet clearly identified, but different countries have different levels of nationalcapacity based on their abilities in prevention, detection, response strategies,enabling function, and operational readiness (Kandel et al., 2020). Besides, dif-ferent countries have implemented different levels of rigorous quarantine andcontrol measures to prevent and contain the epidemic, which affect the popula-tion movement and hence the spread pattern of COVID-19. Given the highlycontagious nature of COVID-19, the spatial pattern of the spread of the diseasechanges rapidly over time. Thus, understanding the spatio-temporal dynamicsof the spread of COVID-19 in different countries is undoubtedly critical.The spatial or geographical distribution of relative location of incidence (newcases) of COVID-19 in a country is important in the analyses of the disease riskacross the country. In disease mapping studies, the spatial domain of interestis partitioned into a number of contiguous smaller areas, usually defined byadministrative divisions such as provinces, counties, municipalities, towns orcensus tracts, and the aim of the study is to estimate the relative risk of eacharea at different times (Lee, 2011; Lawson, 2018). Spatio-temporal models arethen required to explain and predict the evolution of incidence and risk of thedisease in both space and time simultaneously (Anderson and Ryan, 2017).Estimation of area-specific risks over time provides information on the dis-ease burden in specific areas and identifies areas with elevated risk levels (hotspots). In addition, identifying the changes in the spatial patterns of the dis-ease risk over time may result in detecting either regional or global trends,and contributes to make informed and timely public health resource allocation(Wakefield, 2007).To account for the underlying temporal and spatial autocorrelation structurein the spread of COVID-19, available data on the daily number of new cases anddeaths in different countries/regions have already been analyzed in a consider-able number of studies. For example, Kang et al. (2020) used Moran’s I spatialstatistic with various definitions of neighbors and observed a significant spatialassociation of COVID-19 in daily number of new cases in provinces of main-land China. Gayawan et al. (2020) used a zero-inflated Poisson model for thedaily number of new COVID-19 cases in the African continent and found thatthe pandemic varies geographically across Africa with notable high incidencein neighboring countries. Briz-Red´on and Serrano-Aroca (2020) conducted aspatio-temporal analysis for exploring the effect of daily temperature on the ac-cumulated number of COVID-19 cases in the provinces of Spain. They found noevidence suggesting a relationship between the temperature and the prevalenceof COVID-19 in Spain. Gross et al. (2020) studied the spatio-temporal spread ofCOVID-19 in China and compare it to other global regions and concluded thathuman mobility/migration from Hubei and the spread of COVID-19 are highlyrelated. Danon et al. (2020) combined 2011 census data to capture populationsizes and population movement in England and Wales with parameter estimatesfrom the outbreak in China and found that the COVID-19 outbreak is going topeak around 4 months after the start of person-to-person transmission. Usinglinear regression, multilayer perceptron and vector autoregression, Sujath et al.(2020) modeled and forecasted the spread of COVID-19 cases in India.2s pointed out in Alamo et al. (2020), there are many national and interna-tional organizations that provide open data on the number of confirmed casesand deaths. However, these data often suffer from incompleteness and inac-curacy, which are considerable limitations for any analyses and modeling con-ducted on the available data on COVID-19 (Langousis and Carsteanu, 2020).We highlight that we are yet in the center of the pandemic crisis and due to thepublic health problem, and also to the severe economical situation, we do nothave access to all sources of data. Thus reseachers know only a portion of all theelements related to COVID-19. In addition, data on many relevant variablessuch as population movement and interaction and the impact of quarantine andsocial distancing policies are not either available or accurately measured. Com-bined with the unknown nature of the new COVID-19 virus, any analysis suchas the present study only provides an approximate and imprecise description ofthe underlying spatio-temporal dynamic of the pandemic. Nevertheless, havinga vague idea is better than having no idea, and the results should be interpretedwith caution.Currently, a wealth of studies have appeared in the very recent literature.Many of them follow the compartmental models in epidemiology, partition thepopulation into subpopulations (compartments) of susceptible (S), exposed (E),infectious (I) and recovered (R), and fit several variations of the classical deter-ministic SIR and SEIR epidemiological models (Peng et al., 2020; Roda et al.,2020; Bastos and Cajueiro, 2020). We believe that considering stochastic compo-nents is important, if not essential, to explain the complexity and heterogeneityof the spread of COVID-19 over time and space. For this reason, in the presentwork we propose a spatio-temporal stochastic modeling approach that is able toaccount for the spatial, temporal and interactions effects, together with possibledeterministic covariates.We acknowledge that the proposed model in its current form requires devel-opment and refinements as more information becomes available, but at the stageof the pandemic we are now, it can provide a reasonable modeling frameworkfor the spatio-temporal spread of COVID-19. This is illustrated by modelingthe daily number of new confirmed cases in Spain, Italy and Germany fromlate February to mid August 2020. The R code for implementing the proposedmodel can be made available upon request. We also provide a Shiny web ap-plication (Chang et al., 2020) based on the model discussed in this paper at https://ajalilian.shinyapps.io/shinyapp/ .The structure of the paper is the following. The open data resources used inthis study are introduced in Section 2. A model for the daily number of regionalcases is considered in Section 3. As described in Section 4, this model explainsthe spatio-temporal variations in the relative risk of each country in terms of anumber of temporal, spatial and spatio-temporal random effects. The results offitting the considered model to the number of daily confirmed cases in Spain,Italy and Germany are given in Section 5. The paper concludes in Section 6with a few last remarks. 3 Data on the daily number of COVID-19 cases
Governmental and non-governmental organizations across the world are col-lecting and reporting regional, national and global data on the daily number ofconfirmed cases, deaths and recovered patients and provide open data resources.Incompleteness, inconsistency, inaccuracy and ambiguity of these open data areamong limitations of any analysis, modeling and forecasting based on the data(Alamo et al., 2020). Particularly, the number of cases mainly consist of casesconfirmed by a laboratory test and do not include infected asymptomatic casesand infected symptomatic cases without a positive laboratory test.In this study, we focus on the daily number of confirmed cases in Spain, Italyand Germany and used the following open data resources.
Spain:
DATADISTA, a Spanish digital communication medium that extractsthe data on confirmed cases (PCR and antibody test) registered by the Au-tonomous Communities of Spain and published by the Ministry of Healthand the Carlos III Health Institute. DATADISTA makes the data availablein an accessible format at the GitHub repository https://github.com/datadista/datasets/tree/master/COVID%2019 . The daily accumulatednumber of total confirmed cases registered in the 19 Autonomous Com-munities of Spain are updated by DATADISTA on a daily bases.
Italy:
Data on the daily accumulated number of confirmed cases in the 20regions of Italy are reported by the Civil Protection Department (Diparti-mento della Protezione Civile), a national organization in Italy that dealswith the prediction, prevention and management of emergency events.These data are available at the GitHub repository https://github.com/pcm-dpc/COVID-19 and are being constantly updated every day at 18:00.
Germany:
The Robert Koch Institute, a federal government agency and re-search institute responsible for disease control and prevention, collectsdata and publishes official daily situation reports on COVID-19 in Ger-many. Data on the daily accumulated number of confirmed cases in the16 federal states of Germany extracted from the situation reports of theRobert Koch Institute are available at the GitHub repository https://github.com/jgehrcke/covid-19-germany-gae and are being updatedon a daily basis.Table 1 summarizes the number of regions, study period and country-widedaily incidence rate of the data for each country.Data on distribution population of the considered countries are extractedfrom the Gridded Population of the World, Version 4 (GPWv4), which pro-vides estimates of the number of persons per pixel (1 degree resolution) for theyear 2020 (Center International Earth Science Information Network (CIESIN)Columbia University, 2018). These data are consistent with national censusesand population registers. 4able 1: Considered countries with their corresponding number of regions ( m ),length of study period and the estimated country-wide daily incidence rate ( (cid:98) (cid:37) ).country number of regions study period incidence rateSpain 18 a Autonomous Communities 2020-02-25 to 2020-09-13 62 . × − Italy 20 Regions 2020-02-25 to 2020-09-15 24 . × − Germany 16 Federal States 2020-03-03 to 2020-09-12 16 . × − a The Autonomous Communities Ceuta and Melilla are merged into “Ceuta y Melilla”
Suppose that a country, the spatial domain of interest, is partitioned into re-gions A , . . . , A m , defined by administrative divisions such as states, provinces,counties, etc (see Table 1). Let Y it denote the number of new COVID-19 casesin region A i , i = 1 , . . . , m , at time (day) t = 1 , . . . , T and E it = E [ Y it ] be theexpected number of new COVID-19 cases in region A i at time t . The expected number of new cases is given by E it = P i (cid:37) it , where P i is thepopulation of region A i and (cid:37) it is the incidence rate of COVID-19 in region A i at time t . Under the null model of spatial and temporal homogeneity of theincidence rate (cid:37) it = (cid:37) and (cid:98) E it = P i (cid:98) (cid:37) , (1)provides an estimate for E it , where (cid:98) (cid:37) = 1 T T (cid:88) t =1 (cid:80) mi =1 Y it (cid:80) mi =1 P i is an estimate of the country-wide homogeneous daily incidence rate (Walleret al., 1997). The estimated daily incidence rate per million population (10 (cid:98) (cid:37) )so far is around 68, 46 and 29 for Spain, Italy and Germany, respectively (seeTable 1). Consul and Jain (1973) introduced a generalization of the Poisson distribution,which is a suitable model to most unimodal or reverse J-shaped counting dis-tributions. Given nonnegative random rates Λ it , i = 1 , . . . , m , t = 1 , . . . , T , weassume that Y it ’s are independent random variables following the generalizedPoisson distribution with P ( Y it = y it | Λ it , ϕ, α ) = exp ( − Ω it − Ψ it y it ) Ω it (Ω it + Ψ it y it ) y it − y it ! , y it = 0 , , . . . , it = Λ it ϕ Λ α − it , Ψ it = ϕ Λ α − it ϕ Λ α − it , where ϕ ≥ α ∈ R . Then P ( Y it = y it | Λ it , ϕ, α ) = exp (cid:18) − Λ it + ϕ Λ α − it y it ϕ Λ α − it (cid:19) Λ it (cid:0) Λ it + ϕ Λ α − it y it (cid:1) y it − (cid:0) ϕ Λ α − it (cid:1) y it y it !and E [ Y it | Λ it , ϕ, α ] = Λ it , V ar( Y it | Λ it , ϕ, α ) = Λ it (cid:0) ϕ Λ α − it (cid:1) . Thus ϕ is the dispersion parameter and the case ϕ = 0 represents the ordinaryPoisson distribution (no dispersion) with P ( Y it = y it | Λ it ) = exp ( − Λ it ) Λ y it it y it ! , y it = 0 , , . . . . Here, parameter α controls the shape (power) of the relation between the con-ditional variance of Y it | Λ it and its conditional mean. For example, the relationbetween V ar( Y it | Λ it ) and E [ Y it | Λ it ] is linear if α = 1 and cubic if α = 2 (Zamaniand Ismail, 2012). The underlying random rates Λ it , i = 1 , . . . , m, t = 1 , . . . , T , account for theextra variability (overdispersion), which may represent unmeasured confoundersand model misspecification (Wakefield, 2007). Variations of the random rate Λ it relative to the expected number of cases E it provide useful information aboutthe spatio-temporal risk of COVID-19 in the whole spatial domain of interestduring the study period. In disease mapping literature, the nonnegative random quantities θ it = E [ Y it | Λ it ] E [ Y it ] = Λ it E it , i = 1 , . . . , m, are called the area-specific relative risks at time t (Lawson, 2018, Section 5.1.4).Obviously E θ it = 1 and C ov ( θ it , θ i (cid:48) t (cid:48) ) = C ov (Λ it , Λ i (cid:48) t (cid:48) ) E it E i (cid:48) t (cid:48) , which means that the temporal and spatial correlation structure of the underly-ing random rates Λ it determine the spatio-temporal correlations between θ it ’s.6able 2: Considered terms in the additive model for the spatio-temporal randomeffect of the log-linear model for relative risks.term description model δ t temporal trend RW2 ε t temporal correlation AR(2) ζ i spatial correlation due to distance between regions GMRF ξ i spatial correlation due to neighborhood relation between regions BYMBy ignoring these correlations, the standardized incidence ratio (cid:98) θ it = Y it / (cid:98) E it provides a naive estimate for the relative risks (Lee, 2011). However, in amodel-based approach the variations of the relative risks are often related toregional and/or temporal observed covariates and the correlation between θ it ’sare explained in terms of regional and/or temporal random effects using, forexample, a log linear model (Wakefield, 2007; Lee, 2011; Lawson, 2018). In the present study, we consider the log linear model θ it = exp ( µ + βd i + η it ) , (2)where µ is the intercept and d i is the population density of region A i , i.e. thepopulation of A i , P i , divided by the area of A i . The population density isstandardized to have mean 0 and variance 1 and β is its regression coefficient.Moreover, η it is a zero mean random effect which represents spatio-temporalvariations in relative risks due to temporal and spatial trend and correlation.Among many different possibilities, we assume that η it takes the additive form η it = δ t + ε t + ζ i + ξ i (3)where δ i represents the temporal trend, ε t accounts for temporal correlation and ζ i and ξ i explain spatial correlation due to spatial distance and neighborhoodrelations among regions A , . . . , A m , respectively (see Table 2).The latent (stochastic) temporal trend δ t is expected to be a smooth functionof t . Since the second order random walk (RW2) model is appropriate forrepresenting smooth curves (Fahrmeir and Kneib, 2008), δ = ( δ , . . . , δ T ) isassumed to follow a RW2 model, i.e.,∆ δ t +1 = δ t +1 − δ t + δ t − = (cid:15) t , t = 2 , . . . , T − , where (cid:15) , . . . , (cid:15) T − are independent and identically distributed (i.i.d.) zero meanGaussian random variables with variance 1 /τ δ . Here the precision parameter τ δ > δ t (Fahrmeir and Kneib, 2008).To account for temporal correlation, we assume that ε t follows a stationaryautoregressive model of order 2, AR(2); i.e., ε t = ψ (1 − ψ ) ε t − + ψ ε t − + (cid:15) (cid:48) t , t = 2 , . . . , T, − < ψ < − < ψ < ε t and (cid:15) (cid:48) , . . . , (cid:15) (cid:48) T are i.i.d. zero mean Gaussian random variables withvariance 1 /τ ε .On the other hand, to account for spatial correlation, we assume that ζ =( ζ , . . . , ζ m ) follows a Gaussian Markov random field (GMRF). More specifically,we assume that ζ is a zero mean Gaussian random vector with the structuredcovariance matrix V ar( ζ ) = 1 τ ζ (cid:18) I m − ωe max C (cid:19) − where I m is the m × m identity matrix, 0 ≤ ω < e max is the largesteigenvalue of the m × m symmetric positive definite matrix C = [ C ii (cid:48) ]. Theentry C ii (cid:48) of matrix C represents to what extend the regions A i and A i (cid:48) areinterconnected. For example, C ii (cid:48) can be related to a data on commuting orpopulation movement between regions A i and A i (cid:48) . In absence of most recentand reliable movement data between the regions of Spain, Italy and Germany,we set C ii (cid:48) to be the Euclidean distance between the centroids of A i and A i (cid:48) .In addition to interconnectivity and correlations due to spatial distance, theneighbourhood structure of regions A , . . . , A m may induce spatial correlationamong relative risks of regions because neighbouring regions often tend to havesimilar relative risks. To include spatial correlation due to neighborhood struc-ture of regions in the model, we assume that ξ = ( ξ , . . . , ξ m ) follows a scaledversion of the Besag-York-Molli´e (BYM) model (Besag et al., 1991); i.e., ξ is azero mean Gaussian random vector with (Riebler et al., 2016) V ar( ξ ) = 1 τ ξ (cid:0) (1 − φ ) Q − + φ I m (cid:1) . Here Q − denotes the generalized inverse of the m × m spatial precision matrix Q = [ Q ii (cid:48) ] with entries Q ii (cid:48) = n i i = i (cid:48) − i ∼ i (cid:48) n i is the number of neighbors of region A i and i ∼ i (cid:48) means that regions A i and A i (cid:48) share a common border. The parameter τ ξ > ≤ φ ≤ In a Bayesian framework, it is necessary to specify prior distributions for all un-known parameters of the considered model. The Gaussian prior with mean zeroand variance 10 is considered as a non-informative prior for the dispersion pa-rameter of generalized Poisson distribution, log ϕ , and for the parameters of thelog linear model for the relative risks µ , β , log τ δ , log τ ε , log τ ζ , log τ ξ , log ω − ω ,8able 3: Parameters of the considered model for the daily number of new cases,their transformation for non-informative uniform (flat) priors and initial values.parameter notation transformation initial valuedispersion parameter of generalized Poisson ϕ log ϕ α α µ µ β β τ δ log τ δ ε t τ ε log τ ε ε t ψ log ψ − ψ ε t ψ log ψ − ψ ζ i τ ζ log τ ζ C in the variance of ζ ω log ω − ω ξ i τ ξ log τ ξ Q − in the variance of ξ φ log φ − φ φ − φ , log ψ − ψ and log ψ − ψ . The prior distribution for the α parameter ofthe generalized Poisson distribution is considered to be a Gaussian distributionwith mean 1 . . Table 3 summarizes the model parametersand their necessary transformation for imposing the non-informative Gaussianpriors.Since all random effects of the model are Gaussian, the integrated nestedLaplace approximation (INLA) method (Rue et al., 2009) can be used for deter-ministic fast approximation of posterior probability distributions of the modelparameters and latent random effects (Martins et al., 2013; Lindgren et al.,2015). The R-INLA package, an R interface to the INLA program and availableat , is used for the implementation of the Bayesian computa-tions in the present work. The R code can be made available upon request. Theinitial values for all parameters in the INLA numerical computations are set tobe the mean of their corresponding prior distribution. The initial value of α ischosen to be one (see Table 3). For count data Y it and in a Bayesian framework, a probabilistic forecast is aposterior predictive distribution on Z + . It is expected to generate values thatare consistent with the observations (calibration) and concentrated around theirmeans (sharpness) as much as possible (Czado et al., 2009). Following a leave-9ne-out cross-validation approach, let B − ( it ) = m (cid:92) i (cid:48) (cid:54) = i =1 T (cid:92) t (cid:48) (cid:54) = t =1 { Y i (cid:48) t (cid:48) = y i (cid:48) t (cid:48) } be the event of observing all count values except the one for region A i at time t .Dawid (1984) proposed the cross-validated probability integral transform (PIT)PIT it ( y it ) = P (cid:0) Y it ≤ y it (cid:12)(cid:12) B − ( it ) (cid:1) for calibration checks. Thus, PIT it is simply the value that the predictive dis-tribution function of Y it attains at the observation point y it . The conditionalpredictive ordinate (CPO)CPO it ( y it ) = P (cid:0) Y it = y it (cid:12)(cid:12) B − ( it ) (cid:1) is another Bayesian model diagnostic. Small values of CPO it ( y it ) indicate pos-sible outliers, high-leverage and influential observations (Pettit, 1990).For count data, Czado et al. (2009) suggested a nonrandomized yet uniformversion of the PIT with F it ( u | y it ) = , u < PIT it ( y it − , u − PIT it ( y it − it ( y it ) − PIT it ( y it − , PIT it ( y it − ≤ u < PIT it ( y it ) , , u ≥ PIT it ( y it ) , which is equivalent to F it ( u | y it ) = (cid:18) − PIT it ( y it ) − u CPO it ( y it ) (cid:19) [0 < PIT it ( y it ) − u ≤ CPO it ( y it )]+ [PIT it ( y it ) − u ≤ , where [ · ] is the indicator function, becausePIT it ( y it −
1) = P (cid:0) Y it ≤ y it − (cid:12)(cid:12) B − ( it ) (cid:1) = P (cid:0) Y it ≤ y it (cid:12)(cid:12) B − ( it ) (cid:1) − P (cid:0) Y it = y it (cid:12)(cid:12) B − ( it ) (cid:1) = PIT it ( y it ) − CPO it ( y it ) . The mean PIT F ( u ) = 1 mT m (cid:88) i =1 T (cid:88) t =1 F it ( u | y it ) , ≤ u ≤ , can then be comparing with the standard uniform distribution for calibration.For example, a histogram with heights f j = F (cid:18) jJ (cid:19) − F (cid:18) j − J (cid:19) , j = 1 , . . . , J, ϕ α µ -1.346 (-1.508, -1.205) -0.951 (-1.101, -0.827) -0.753 (-0.885, -0.620) β -0.062 (-1.182, 1.065) 0.085 (-0.096, 0.274) 0.323 (0.031, 0.615) τ δ τ ε ψ ψ -0.932 (-0.981, -0.832) -0.973 (-0.998, -0.926) -0.688 (-0.841, -0.416) τ ζ ω τ ξ φ j − /J, j/J ), j = 1 , . . . , J , can be compared withits counterpart from the standard uniform distribution with f j = 1 /J . Anydeparture from uniformity indicates forecast failures and model deficiencies. Asmentioned in Czado et al. (2009), U-shaped (reverse U-shaped) histograms in-dicate underdispersed (overdispersed) predictive distributions and when centraltendencies of the predictive distributions are biased, the histograms are skewed. Table 4 presents the Bayesian estimates (posterior means) for every parameterof the considered model fitted to the daily number of new COVID-19 cases inSpain, Italy and Germany. The corresponding 95% credible intervals of themodel parameters are also reported in parentheses.Comparing the estimated parameters among different countries, it can beseen that the dispersion parameter ϕ of the generalized Poisson distribution forItaly is higher than Spain and Germany, but its shape parameter α is around 1 . δ t has at least 35 times larger contribution (smaller precision)than ε t which represents temporal correlation. The opposite signs of ψ and ψ indicate rough oscillations in ε t . The spatial random effect ζ i has largercontribution (smaller precision) than ξ i in the total variations of the relativerisks only in Spain, while for Italy and Germany it is the opposite. This couldbe a result of large Euclidean distance between Spain continental Europeanterritory from two archipelagos territories, which is affecting the consideredcovariance structure of ζ i .In summary, the higher contribution (lower precision) in the total variationsof the relative risks for Spain, Italy and Germany is due to the temporal trend,spatial correlation and finally temporal correlation, respectively. This may hintthat spatial correlations have a greater impact on the relative risks of COVID-19than temporal correlations.The Bayesian estimates and 95% credible intervals for the temporal trend δ t , t = 1 , . . . , T , are shown in Figure 1. These plots can be interpreted as asmoothed temporal trend of the relative risk in the whole country. In fact,Figure 1 suggests that the COVID-19 epidemic in all three countries rapidlyreached their peaks and slowly started to decline at the beginning of April andthen increased and reached its maximum in August. In addition, the secondwave of the epidemic seems to be stronger in Spain and Germany shows a moresmoother trend during the study period.Figure 2 shows the the posterior means of the spatial random effects ζ i and ξ i , i = 1 , . . . , m , on the corresponding map of each country. The plotillustrates spatial heterogeneity of the relative risk of COVID-19 across regionsin each country, particularly in Spain. Regions with positive (negative) ζ i + ξ i values are expected to have elevated (lower) relative risks than the the baselinecountry-wide risk during the study period.In order to see how the estimated relative risks under the fitted model arein agreement with the observed data, Figure 3 shows the spatially accumulateddaily number of cases (cid:80) mi =1 Y it , t = 1 , . . . , T , and their expected values un-der the fitted model, namely the posterior mean and 95% credible interval of (cid:80) mi =1 (cid:98) E it (cid:98) θ it , t = 1 , . . . , T . Except some discrepancies for Spain and Italy, theobserved values are inside the 95% credible intervals and close to the expectedvalues under the fitted model. Figure 3 in addition shows 4-days ahead fore-casts of the total daily number of new cases at the end of study period of eachcountry.Finally, histograms of the normalized PIT values described in Section 4.4are obtained using J = 20 from the fitted models and plotted in Figure 4.The normalized PIT values for the fitted models to data do not show a clearvisible pattern and the histograms seems to be close to the standard uniformdistribution.The above results and more details on observed and predicated values fromthe fitted model are also provided in an interactive Shiny web application at https://ajalilian.shinyapps.io/shinyapp/ .12 date R W t e m po r a l r ando m e ff e c t Spain −2−1012 Apr Jul date R W t e m po r a l r ando m e ff e c t Italy −3−2−1012 Apr Jul date R W t e m po r a l r ando m e ff e c t Germany
Figure 1: Smoothed temporal trend of the relative risks of COVID-19, obtainedfrom posterior mean and 95% credible interval of the structured temporal ran-dom effect of the fitted model 13
Spain
Italy
Germany
Figure 2: Posterior mean of the spatial random effects ζ i (left) and ξ i (right) inthe fitted model 14 date c oun t r y − w i de nu m be r o f ne w c a s e s Spain date c oun t r y − w i de nu m be r o f ne w c a s e s Italy date c oun t r y − w i de nu m be r o f ne w c a s e s Germany
Figure 3: Observed value (solid line), predicted value (dashed line) and 95%prediction interval (grey area) for the daily number of new COVID-19 cases inthe whole country, based on the posterior mean and 95% credible interval of thespatially accumulated relative risks of the fitted model15 ormalized PIT values r e l a t i v e f r equen cy . . . . Spain normalized PIT values r e l a t i v e f r equen cy . . . . Italy normalized PIT values r e l a t i v e f r equen cy . . . . Germany
Figure 4: Histograms of normalized PIT values to check for uniformity16
Concluding remarks
There are some limitations in the analyses and modeling of data on the num-ber of new cases of COVID-19, including data incompleteness and inaccuracy,unavailability or inaccuracy of relevant variables such as population movementand interaction, as well as the unknown nature of the new COVID-19 virus.Nevertheless, understanding the underlying spatial and temporal dynamics ofthe spread of COVID-19 can result in detecting regional or global trends andto further make informed and timely public health policies such as resourceallocation.In this study, we used a spatio-temporal model to explain the spatial andtemporal variations of the relative risk of the disease in Spain, Italy and Ger-many. Despite data limitations and the complexity and uncertainty in the spreadof COVID-19, the model was able to grasp the temporal and spatial trends in thedata. However, the posterior predictive checks using the normalized probabilityintegral transform (PIT) showed that there is room for the model improvements.Obliviously, there are many relevant information and covariates that can beconsidered in our modeling framework and improve the model’s predictive ca-pabilities. One good possibility would be considering most recent and accuratehuman mobility amongst regions. We would expect our model would benefitfrom this information, which right now can not be accessed. Moreover, theconsidered spatio-temporal model in this paper is one instance among manypossibilities. For example, one possibility is to include a random effect term inthe model that represents variations due to joint spatio-temporal correlations;e.g., a separable sptaio-temporal covariance structure. However, the consid-ered model was adequate and no joint spatio-temporal random effect term wasconsidered to avoid increasing the model’s complexity.We focused here on a stochastic spatio-temporal model as a good alternativeto existing deterministic compartmental models in epidemiology to explain thespatio-temporal dynamics in the spread of COVID-19. However, it should beemphasized that one step forward would be considering a combination of adeterministic compartmental model in terms of differential equations for thenumber of susceptible, exposed, infectious and recovered cases with our sort ofstochastic modeling approach. This is a clear novelty and a direction for futureresearch.
Acknowledgements
J. Mateu has been partially funded by research projects UJI-B2018-04 (Universi-tat Jaume I), AICO/2019/198 (Generalitat Valenciana) and MTM2016-78917-R(Ministry of Science). 17 eferences
Alamo, T., Reina, D. G., Mammarella, M., and Abella, A. (2020). Open dataresources for fighting covid-19. arXiv preprint arXiv:2004.06111 . 3, 4Anderson, C. and Ryan, L. M. (2017). A comparison of spatio-temporal diseasemapping approaches including an application to ischaemic heart disease innew south wales, australia.
International journal of environmental researchand public health , 14(2):146. 2Bastos, S. B. and Cajueiro, D. O. (2020). Modeling and forecasting the covid-19pandemic in brazil. arXiv preprint arXiv:2003.14288 . 3Besag, J., York, J., and Molli´e, A. (1991). Bayesian image restoration, withtwo applications in spatial statistics.
Annals of the institute of statisticalmathematics , 43(1):1–20. 8Briz-Red´on, ´A. and Serrano-Aroca, ´A. (2020). A spatio-temporal analysis forexploring the effect of temperature on covid-19 early evolution in spain.
Sci-ence of The Total Environment , page 138811. 2Center International Earth Science Information Network (CIESIN) ColumbiaUniversity (2018). Gridded population of the world, version 4 (gpwv4): Pop-ulation count, revision 11. https://doi.org/10.7927/H4JW8BX5 . Accessed15 May 2020. 4Chang, W., Cheng, J., Allaire, J., Xie, Y., and McPherson, J. (2020). shiny:Web Application Framework for R . R package version 1.4.0.2. 3Consul, P. C. and Jain, G. C. (1973). A generalization of the poisson distribu-tion.
Technometrics , 15(4):791–799. 5Czado, C., Gneiting, T., and Held, L. (2009). Predictive model assessment forcount data.
Biometrics , 65(4):1254–1261. 9, 10, 11Danon, L., Brooks-Pollock, E., Bailey, M., and Keeling, M. J. (2020). A spatialmodel of covid-19 transmission in england and wales: early spread and peaktiming. medRxiv . 2Dawid, A. P. (1984). Present position and potential developments: Some per-sonal views statistical theory the prequential approach.
Journal of the RoyalStatistical Society: Series A (General) , 147(2):278–290. 10Fahrmeir, L. and Kneib, T. (2008). On the identification of trend and correlationin temporal and spatial regression. In
Recent advances in linear models andrelated areas , pages 1–27. Springer. 7Gayawan, E., Awe, O., Oseni, B. M., Uzochukwu, I. C., Adekunle, A. I., Samuel,G., Eisen, D., and Adegboye, O. (2020). The spatio-temporal epidemic dy-namics of covid-19 outbreak in africa. medRxiv . 218iuliani, D., Dickson, M. M., Espa, G., and Santi, F. (2020). Modelling andpredicting the spatio-temporal spread of coronavirus disease 2019 (covid-19)in italy.
Available at SSRN 3559569 . 1Gross, B., Zheng, Z., Liu, S., Chen, X., Sela, A., Li, J., Li, D., and Havlin, S.(2020). Spatio-temporal propagation of covid-19 pandemics. 2Kandel, N., Chungong, S., Omaar, A., and Xing, J. (2020). Health securitycapacities in the context of covid-19 outbreak: an analysis of internationalhealth regulations annual report data from 182 countries.
The Lancet . 2Kang, D., Choi, H., Kim, J.-H., and Choi, J. (2020). Spatial epidemic dy-namics of the covid-19 outbreak in china.
International Journal of InfectiousDiseases . 2Langousis, A. and Carsteanu, A. A. (2020). Undersampling in action and atscale: application to the COVID-19 pandemic.
Stochastic EnvironmentalResearch and Risk Assessment , 34(8):1281–1283. 3Lawson, A. B. (2018).
Bayesian Disease Mapping: Hierarchical Modeling inSpatial Epidemiology . Chapman and Hall/CRC, 3 edition. 2, 6, 7Lee, D. (2011). A comparison of conditional autoregressive models used inbayesian disease mapping.
Spatial and spatio-temporal epidemiology , 2(2):79–89. 2, 7Lindgren, F., Rue, H., et al. (2015). Bayesian spatial modelling with r-inla.
Journal of Statistical Software , 63(19):1–25. 9Martins, T. G., Simpson, D., Lindgren, F., and Rue, H. (2013). Bayesiancomputing with inla: new features.
Computational Statistics & Data Analysis ,67:68–83. 9Peng, L., Yang, W., Zhang, D., Zhuge, C., and Hong, L. (2020). Epi-demic analysis of covid-19 in china by dynamical modeling. arXiv preprintarXiv:2002.06563 . 3Pettit, L. (1990). The conditional predictive ordinate for the normal distri-bution.
Journal of the Royal Statistical Society: Series B (Methodological) ,52(1):175–184. 10Riebler, A., Sørbye, S. H., Simpson, D., and Rue, H. (2016). An intuitivebayesian spatial model for disease mapping that accounts for scaling.
Statis-tical methods in medical research , 25(4):1145–1165. 8Roda, W. C., Varughese, M. B., Han, D., and Li, M. Y. (2020). Why is it difficultto accurately predict the covid-19 epidemic?
Infectious Disease Modelling . 319ue, H., Martino, S., and Chopin, N. (2009). Approximate bayesian infer-ence for latent gaussian models by using integrated nested laplace approxi-mations.
Journal of the royal statistical society: Series b (statistical method-ology) , 71(2):319–392. 9Sujath, R., Chatterjee, J. M., and Hassanien, A. E. (2020). A machine learningforecasting model for COVID-19 pandemic in india.
Stochastic EnvironmentalResearch and Risk Assessment , 34(7):959–972. 2Wakefield, J. (2007). Disease mapping and spatial regression with count data.
Biostatistics , 8(2):158–183. 2, 6, 7Waller, L. A., Carlin, B. P., Xia, H., and Gelfand, A. E. (1997). Hierarchicalspatio-temporal mapping of disease rates.
Journal of the American Statisticalassociation , 92(438):607–617. 5World Health Organization (2020). Coronavirus disease (covid-19) outbreak. . 1Zamani, H. and Ismail, N. (2012). Functional form for the generalized pois-son regression model.
Communications in Statistics-Theory and Methods ,41(20):3666–3675. 6