[PDF] An unbiased spatiotemporal risk model for COVID-19 with epidemiologically meaningful dynamics (A systematic framework for spatiotemporal modelling of COVID-19 disease)

Abstract

Spatiotemporal modelling of infectious diseases such as COVID-19 involves using a variety of epidemiological metrics such as regional proportion of cases or regional positivity rates. Although observing their changes over time is critical to estimate the regional disease burden, the dynamical properties of these measures as well as cross-relationships are not systematically explained. Here we provide a spatiotemporal framework composed of six commonly used and newly constructed epidemiological metrics and conduct a case study evaluation. We introduce a refined risk model that is biased neither by the differences in population sizes nor by the spatial heterogeneity of testing. In particular, the proposed methodology is useful for the unbiased identification of time periods with elevated COVID-19 risk, without sensitivity to spatial heterogeneity of neither population nor testing. Our results also provide insights regarding regional prioritization of testing and the consequences of potential synchronization of epidemics between regions.

Full PDF

1 Title 1 (old):

A systematic framework for spatiotemporal modelling of COVID-19 disease

Title 2:

An unbiased spatiotemporal risk model for COVID-19 with epidemiologically meaningful dynamics

Author line:

Michał Paweł Michalak , Jack Cordes , Agnieszka Kulawik , Sławomir Sitek , Sławomir Pytel , Elżbieta Zuzańska-Żyśko , Radosław Wieczorek Author Affiliations: Institute of Earth Sciences, Faculty of Natural Sciences, University of Silesia in Katowice, Będzińska 60, 41-205 Sosnowiec, Poland, [email protected] Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston 02115, MA, USA Faculty of Science and Technology, University of Silesia in Katowice, Bankowa 14, 40-007 Katowice, Poland Institute of Social and Economic Geography and Spatial Management, Faculty of Natural Sciences, University of Silesia in Katowice, Będzińska 60, 41-205 Sosnowiec, Poland Institute of Mathematics, Faculty of Science and Technology, University of Silesia in Katowice, Bankowa 14, 40-007 Katowice, Poland Word count: 4731 Summary

Spatiotemporal modelling of infectious diseases such as COVID-19 involves using a variety of epidemiological metrics such as regional proportion of cases or regional positivity rates. Although observing their changes over time is critical to estimate the regional disease burden, the dynamical properties of these measures as well as cross-relationships are not systematically explained. Here we provide a spatiotemporal framework composed of six commonly used and newly constructed epidemiological metrics and conduct a case study evaluation. We introduce a refined risk model that is biased neither by the differences in population sizes nor by the spatial heterogeneity of testing. In particular, the proposed methodology is useful for the unbiased identification of time periods with elevated COVID-19 risk, without sensitivity to spatial heterogeneity of neither population nor testing. We offer a case study in Poland that shows improvement over the bias of currently used methods and we believe our method can be implemented in other areas of the world. Our results also provide insights regarding regional prioritization of testing and the consequences of potential synchronization of epidemics between regions. Main Text Introduction

COVID-19 disease is caused by the novel coronavirus SARS-CoV-2 which was first discovered in late 2019 in Wuhan, Hubei Province, China. The main signs of infection include respiratory symptoms, fever, cough and breathing difficulties [1]. Spatiotemporal analysis plays a critical role in estimating the disease burden in specific regions [2], and it has been studied in many countries, including China [3,4], Spain [5], Italy [6,7], Sweden [8], Israel [9], Brazil [10] and United States [11,12]. It is important to note that comparisons between regions may be challenging, and this is not only because of differences in population sizes but also differences in health policies (e.g. testing regimes) that can change over time [11]. For example, one of the most common limitations raised when using spatiotemporal approaches relates to the lack of incorporation of the spatial heterogeneity of testing [13–15]. This omission may be misleading to public health officials in terms of an adequate public health response in regions with relatively high or low testing capabilities. From among many measures applicable in the spatiotemporal modelling of the COVID-19 disease, the following attracted the greatest attention: local [6] and global [16] cumulative number of cases, different versions of population-based relative risk (observed cases/expected cases) [14,17], testing rates (tests/population) [18], local [18] and global [16] positivity rates (confirmed cases/tests) [18,19] and population-based positivity (confirmed cases/population) [15,19]. Although potentially useful, the above measures were not investigated for their suitability in dynamical modelling of infectious disease. In particular, to the best of our knowledge, their dynamical properties and cross-relationships were not systematically explained. However, this has crucial importance for guiding the public health response: for example, it may be decided that an uninterrupted increase in cumulative version of population-based relative risk [14] entails the introduction of specific non-pharmaceutical interventions (NPIs) [20] in specific regions. Moreover, from the definition of these measures it follows that all of them are sensitive to spatial heterogeneity of either population or testing or both in their identification of elevated COVID-19, leading to bias. This study is intended to offer a systematic contribution regarding a variety of spatiotemporal epidemiological measures independently used in both comparative [14,18,19] and non-comparative [16] contexts. We reveal dynamical properties and prove relationships between the commonly used and newly constructed epidemiological metrics with a case study application. The proposed methodology has the potential to enhance the framework of infectious disease modelling and provides insights into how a more harmonized management of the crisis can be achieved.

Simultaneous standardisation with respect to population and testing

The notion of relative risk is often used to investigate the spatial distribution of cases [17,21,22] and is inherently related to the concept of indirect standardisation [22]. It involves calculating the standardised incidence ratio (SIR) which accounts for the differences in population sizes among regions. Typically, this value depends on individual daily infection rates and is calculated as the ratio between the observed number of cases (infections) in a region and the expected number of cases based on the regional population. Values greater than one suggest an elevated risk compared to the population average which may indicate infection clusters or a greater number of vulnerable groups [17]. To alleviate the impact of daily fluctuations and create a framework for comparing the present state with historical reference, a cumulative model can be applied [14], which we refer to as the cumulative standardised incidence ratio (CSIR). It is calculated for a specific day 𝑡 as the ratio of the confirmed number of cases since the outbreak of the pandemic, including day 𝑡 , to the expected cumulative number of cases on day 𝑡 . We mathematically show that from a dynamical viewpoint it is equivalent to the regional cumulative proportion of cases; that is, for a specific day 𝑡 and region 𝑖 , it may decrease or increase depending on whether the proportion of cases for 𝑖 on day 𝑡 is lower or greater than the cumulative proportion for 𝑖 on day 𝑡 − 1 . The limitation of using this risk estimate to compare regions is that it is only unbiased if there is spatial homogeneity regarding testing intensity. To overcome this problem, we first applied an analogous procedure for the data related to regional testing. Namely, we calculated a quantity that we refer to as the cumulative standardised testability ratio (CSTR). It is calculated as the proportion of observed tests to expected tests in a given region. It is important to note that the resulting value can be interpreted as an estimate of the relative safety. This is because the greater the quantity is, the more efficiently NPIs can be applied. We then divided the CSIR by the CSTR to get a refined estimate of the relative risk, which we call the weighted cumulative standardized incidence ratio (WCSIR). It follows that the WCSIR remains unchanged from the CSIR only if the CSTR is equal to one, that is, the number of tests is equal to the expected number of tests. Otherwise, the relative risk increases or decreases depending on whether the CSTR is smaller than or greater than one, respectively. The resulting risk estimate is therefore biased neither by differences in population sizes nor by differences in amount of testing in specific regions. In other words, the WCSIR measure allows testing intensity to be heterogeneous and it captures the change in relative risk by honouring the following expectations: 1) For a specific region, a risk measure should decrease/increase if the regional infection rate (regional infected/global infected) decreases/increases and the analogous test rate (regional tests/global tests) increases/decreases. 2) For a specific region, a risk measure should decrease/increase if the cumulative global positivity rate (GPR) (cumulative positive/cumulative tests) increases/decreases, while the analogous local positivity rate (LPR) decreases/increases. Alternatively, but equivalently (Theorem 2, Supplementary Materials), the WCSIR could be conceptualized as the ratio of LPR to GPR for a given region. Therefore, WCSIR equals one only if LPR=GPR, otherwise the relative risk increases or decreases depending on whether the LPR is greater than or less than GPR, respectively. As such, we anticipate our study to be a starting point for considering more sophisticated models of relative risk. For example, even if the dynamics of the regional proportion of cases show a positive trend for a given region, our methodology can classify this region as one with decreasing risk if the local and global positivity ratios are in opposite directions (i.e. are decreasing and increasing respectively). Case study motivation

On March 4 th , 2020, the first confirmed case was registered in Poland [23]. Four days later, cases were identified in the densely populated Silesian region [24]. As of August 17, 2020, 57,286 cases were verified in Poland, with 18,874 cases (34.8%) attributed to the Silesian region [25]. The core of the Silesian region is referred to as the Katowice conurbation, a polycentric area consisting of 16 towns and approximately 2 million people with a population density of 1,485 per km (24). The largest urban centre is Katowice with 280 thousand inhabitants [26]. The concentration of public health efforts in Silesia follows from its large proportion of the population employed in industry (28.7%), compared to the mean-country value of 20.6%. Of those employed in industry in the area, 16.7% are employed in mining and exploration, which is also the largest value in Poland (mean-country value of 4.6 %) [27]. After the first case in Poland was identified, NPIs were introduced, including cancelling mass events and closing borders, schools, and universities among other measures [28,29]. These interventions helped flatten the curve of total infected individuals and delayed the peak of the disease burden (Fig. S1). However, the lockdown measures applied were ultimately insufficient in terms of containing the spread of the disease in the densely populated and relatively industrially oriented Silesian region. Because it is difficult to apply social distancing recommendations [1] in crowded mine shafts, it is hypothesized that the spread of COVID-19 in the Silesian region was facilitated by miners who might have acted as asymptomatic carriers. This has critical importance as it has been demonstrated that asymptomatic carriers play a vital role in the spread of the novel coronavirus [30] and undocumented infectious cases can facilitate rapid dissemination [31]. Indeed, according to partial results related to 50,053 screening tests within a group of mines conducted between May 7, 2020 and June 25, 2020 as a proxy for the whole mining population, nearly 98 percent of the infected mine employees did not indicate any symptoms [32,33]. As of August 11, 7,934 miners were tested positive for COVID-19 which yields nearly 44% of infections in Silesian region [34]. Although the time period related to screening tests in mines resulted in greater population-based relative risk for Silesia, two major problems remain: 1) whether the decision on screening tests was a result of an already deteriorating epidemiological situation in Silesia – with the beginning of this deterioration being unknown and 2) whether in other regions with relatively low testing capabilities the risk related to COVID-19 disease exists but is undetected. We evaluated the proposed methodology on infection and testing rates that were provided by Ministry of Health in Poland: between March 4 and August 17 (infections) as well as between May 11 and August 17 (tests). The data were provided for 16 administrative regions of Poland (abbreviations presented in Fig. S2) that span the area of 312,696 km . Because the official testing rates were published on a weekly basis, to estimate the testing rates for individual days, we used interpolation. Other official reports supplemented by reports from news sources (as of March 26) were also used to perform interpolation for individual days between March 26 and May 11. Complete details regarding interpolation are available in the Methods. Methods Materials

We collected daily data on infections in Poland starting March 4, 2020 until August 17, 2020 from the Ministry of Health in Poland. From a technical viewpoint, the data are stored in a data frame in which columns represent days, while rows are administrative regions. A limitation of using this data is that reporting inaccuracies were not always adequately described. Although we applied 18 corrections, sometimes the dates corresponding to false positive cases or detected duplicates of confirmed cases were not provided. The inaccuracies and the corresponding metadata including dates of errors and applied corrections are summarised in Table S1. The data on regional testing are available for the time period between March 26, 2020 and August 17, 2020. The reports from May 11, 2020 onward were provided by the Ministry of Health in Poland on a weekly basis in the form of cumulative number of tests conducted for every region. It should be noted, however, that these cumulative data do not include the full day of publishing but are restricted to a specific hour of the day. For example, cumulative regional data on testing published on August 17 cover the time period between March 4 and August 17, 1:00 p.m. We also used official fragmentary reports as well as an unofficial incomplete report for March 26, 2020 from a news source (Table S2). For days preceding March 26, 2020 no data on regional testing are publicly available, therefore we excluded this period from the analysis . There is concern about limited reliability of testing data for Kielce region for which about 241 000 tests were erroneously registered [54]. Although the correction was applied on August 8 [54], the historical data were not officially corrected. We therefore subtracted the superfluous number of tests evenly throughout the time period for the Kielce region. Population census data were obtained from the official repository Demographic Yearbook of Poland [55]. Estimating relative risk To estimate the relative risk in regions of interest we used the concept of indirect standardisation [21,22]. The general formula for estimating the relative risk can be written as follows: 𝑂𝐸 , where 𝑂 and 𝐸 denote the observed and the expected number of cases respectively. Given a specific region 𝑖 , to obtain 𝐸 𝑖 it is first necessary to calculate a global ratio 𝑟 = ∑ 𝑂 𝑖𝑖 ∑ 𝑃 𝑖𝑖 , where ∑ 𝑃 𝑖𝑖 denotes the total population. It is then straightforward to calculate 𝐸 𝑖 as 𝑃 𝑖 𝑟 , with 𝑃 𝑖 being the population in region 𝑖 . For example, if the proportion of cases in Poland is 2%, then the expected number of cases in Silesia would be 2% of the population of Silesia, which assumes a spatially homogenous distribution of cases [21]. We must now differentiate between two versions of calculating the relative risks in our study: Standardised Incidence Ratio (SIR), Cumulative Standardised Incidence Ratio (CSIR) and Weighted Cumulative Standardised Incidence Ratio (WCSIR) based on Cumulative Standardised Testability Ratio (CSTR). For the SIR, the totals of observed number of cases and the expected number of cases 𝐸 𝑖 refer to the daily number of cases. If we denote 𝑥 𝑖,𝑡 as the observed number of cases on day 𝑡 , and 𝐸 𝑖,𝑡 as the expected number of cases on day 𝑡 , we can write the following formula: 𝑆𝐼𝑅 𝑖 (𝑡) = 𝑥 𝑖,𝑡 𝐸 𝑖,𝑡 = 𝑥 𝑖,𝑡 𝑃 𝑖 𝑟 𝑡 = 𝑥 𝑖,𝑡 𝑃 𝑖∑ 𝑥𝑖,𝑡𝑖∑ 𝑃𝑖𝑖 (1) For the CSIR, we assume that, for a specific day 𝑡 , the observed number of cases in region 𝑖 is the sum of the observed number of cases in 𝑖 for days 𝑗 ∈ [1, 𝑡] . Similarly, to calculate the corresponding ratio we assume that the observed number of cases is the sum of all cases that were confirmed in Poland for days 𝑗 ∈ [1, 𝑡] . Then, the ratio is multiplied by the population size in 𝑖 to get the expected number of cases. The CSIR is then calculated as the proportion between the observed and expected number of cases. If we denote 𝑂 𝑖,𝑡−1 = ∑ 𝑥 𝑖,𝑗𝑡−1𝑗=1 as the cumulative number of cases for region 𝑖 up to day 𝑡 − 1 , and 𝐸 𝑖,1:𝑡 as the expected cumulative number of cases for 𝑖 on day 𝑡 , then the formula for CSIR can be written as follows: 𝐶𝑆𝐼𝑅 𝑖 (𝑡) = 𝑂 𝑖,𝑡−1 +𝑥 𝑖,𝑡 𝐸 𝑖,1:𝑡 = 𝑂 𝑖,𝑡−1 +𝑥 𝑖,𝑡 𝑃 𝑖∑ (𝑂𝑖,𝑡−1+𝑥𝑖,𝑡)𝑖 ∑ 𝑃𝑖𝑖 (2) From this it follows that the current value of CSIR is to a large extent governed by the past (cumulative cases from days up to 𝑡 − 1 ) and the contribution of present day 𝑡 weakens with time. This explains why the curves are smooth: the present has a smaller effect than the past. To sum up, the past contributes primarily to the present state and, with time, its role in shaping the present increases. Similarly, the information on the number of tests conducted for each region can be employed to estimate the relative safety (not to be confused with the safety perspective presented in Fig. S11). The corresponding Cumulative Standardised Testability Ratio (CSTR) was calculated in an analogous procedure to that of the CSIR. If we denote 𝑇 𝑖,𝑡−1 as the cumulative number of tests conducted for region 𝑖 up to day 𝑡 − 1 , 𝑦 𝑖,𝑡 as the number of tests conducted for 𝑖 on day 𝑡 and 𝑇𝐸 𝑖,1:𝑡 as the cumulative expected number of tests for 𝑖 on day 𝑡 , then the formula for CSTR can be written as follows: 𝐶𝑆𝑇𝑅 𝑖 (𝑡) = 𝑇 𝑖,𝑡−1 +𝑦 𝑖,𝑡 𝑇𝐸 𝑖,1:𝑡 = 𝑇 𝑖,𝑡−1 +𝑦 𝑖,𝑡 𝑃 𝑖∑ (𝑇𝑖,𝑡−1+𝑦𝑖,𝑡)𝑖 ∑ 𝑃𝑖𝑖 (3) The interpolation procedure was conducted as follows. We first used interpolate_tests function to estimate the cumulative number of tests for individual days assuming a constant intercept: for example, if the cumulative number of tests for region 𝑖 were 1000 and 8000 on May 11 and May 18 respectively, then the intercept for every day related to this time window is equal to = = 1000 . Then, to obtain the approximate value of CSTRs for individual days, relrisk function was used. We note that the assumption of constant intercept may not be realistic but we are not ready to commit to the idea of the best interpolation method in case of differences in temporal resolution between data on infections and testing. To avoid edge-effects that are inherent for the interpolation method, statistical approaches including best-fitting curves could be employed. However, the main disadvantage of the statistical procedure is that one cannot expect that the official data of cumulative number of tests will be honoured at nodes. The interpolation enabled the weighted model to be obtained through the division of data frames corresponding to data on infections and testing. The corresponding equation of the resulting WCSIR is as follows : (𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝐶𝑎𝑠𝑒𝑠/𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝐶𝑎𝑠𝑒𝑠)/(𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑇𝑒𝑠𝑡𝑠/𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑇𝑒𝑠𝑡𝑠) . 𝑊𝐶𝑆𝐼𝑅 𝑖 (𝑡) = 𝑂𝑖,𝑡−1+𝑥𝑖,𝑡𝐸𝑖,1:𝑡𝑇𝑖,𝑡−1+𝑦𝑖,𝑡𝑇𝐸𝑖,1:𝑡 = 𝑂𝑖,𝑡−1+𝑥𝑖,𝑡𝑃𝑖∑ (𝑂𝑖,𝑡−1+𝑥𝑖,𝑡)𝑖 ∑ 𝑃𝑖𝑖𝑇𝑖,𝑡−1+𝑦𝑖,𝑡𝑃𝑖∑ (𝑇𝑖,𝑡−1+𝑦𝑖,𝑡)𝑖 ∑ 𝑃𝑖𝑖 (4) From a methodological viewpoint, we note that the curves presented here could also be viewed as a measure of relative safety in the form of . This safety perspective provides a better visualization of regions with lowest values of relative risk that in the CSIR or WCSIR models are difficult to distinguish (Fig. S11). Local and global positivity ratios (LPR and GPR) Because the relationship between local and global dynamics of the cumulative proportion of positive cases exerts influence on the dynamics of WCSIR, we provide formulas for these measures:

𝐿𝑃𝑅 𝑖 (𝑡) = 𝑂 𝑖,𝑡 𝑇 𝑖,𝑡 (5), 𝐺𝑃𝑅(𝑡) =

𝑂(𝑡)𝑇(𝑡) (6) The computational objectives corresponding to functions included in the computer code are summarised in Table 2. We used the following R packages: dplyr [56], ggplot2 [57], ggpubr [58], reshape2 [59], tibble [60], sf [61], tmap [62], broom [63], plotly [64] and magrittr [65]. Results

Unweighted risk

The SIR analysis reveals that in the Silesian region (12 KAT) the relative risk values were largely greater than one since mid-April and on five days in late April and early May this value was greater than 3, all before the decision to screen for COVID-19 in mines (Fig. 1A). In the 28 days after implementation of this testing policy, SIR values were always greater than 3, with maximum value 6.92 on May 12 th and fluctuations between 4 and 6. However, greater than expected number of cases since mid-April suggest that the outbreak might have originated earlier than the decision to test miners. Indeed, CSIR curves show a rising trend prior to the decision to implement screening tests in mines in Silesia largely after the Easter holiday (Fig. 1B). The nearly monotonously rising trajectory of both CSIR between mid-April and mid-June denotes the progressive relative deterioration of the epidemiological situation in Silesia for this time period. Weighted risk

However, as discussed above, the estimates of relative risks presented in Fig. 1 are biased by the regional differences in testing intensity. Figure S3 shows that throughout the epidemic the highest testing intensity was observed for the Warszawa region (7 WAR), with lowest positions occupied by the Opole (8 OPO) and Rzeszów (9 RZE) regions. According to the WCSIR model (Fig. 2; Fig. 3 – lowest part) the Opole region (8 OPO) was the least safe region during the height of the epidemic in Poland (11 April-18 May), and at the end of the study period occupies second position, with a significant distance to the third ranked Rzeszów region (9 RZE). Although positive differences between CSIR and WCSIR for both Silesia (12 KAT) and Opole (8 OPO) can be observed (Figs. 2, 3), for the Silesia region this difference is smaller and the corresponding curves are more or less parallel (Fig. 2). This is not the case for the Opole region, whose CSIR and WCSIR curves show a greater difference and a diverging pattern in mid-May: while the WCSIR was increasing during May 13-18 with a simultaneous decrease in CSTR, an increase in CSIR was observed only between May 15-17. This means that the weighted risk estimate may increase even if the unweighted risk decreases. Dynamical patterns

A more comprehensive analysis of the relationship between cumulative measures (CSTR, CSIR, WCSIR and the LPR-GPR tandem (Fig. S4)) can be conducted by plotting daily values of two selected measures on a Cartesian plane (Figs. 4, S5-S10). In summary of all sixteen regions, we highlighted six potentially distinct trajectories that emerged. Fig. 4A shows the ideal situation when CSIR and WCSIR decrease together throughout the period, as exemplified by the Wrocław region (1 WRO). This pattern can be summarised that although sometimes CSTR is increasing, the CSIR is decreasing (Fig. S5A). The second pattern for Warszawa (7 WAR) region shows that the decreasing value of CSIR is associated with an increasing value of WCSIR and LPR (Figs. 4B, S6B). The trajectory of the Opole region (8 OPO) region represents the third pattern: while the CSIR was approximately constant, the WCSIR was rising fast (Fig. 4C), confirming the divergent pattern in mid-May observed in Figure 2. We note however that the LPR and WCSIR show a decreasing trend since June (Figs. 4C, S6C) in Opole region. Figure 4D shows the fourth pattern with an undesireable change of the trajectory of the relationship, as typified by the Rzeszów region (9 RZE) since mid-June. For Silesia (12 KAT) (Fig. 4E) we observe the fifth pattern of a plateau in the relationship after initial growth of CSIR and WCSIR. This may be due to largely decreasing LPR since June when it reached a maximum value of 10.25% (Fig. S4). The last pattern, exemplified by the Poznań (15 POZ) region (Figs. S5F, Figs. S8-S9F), shows a change in direction of CSTR trajectory that started to decrease in mid-June. Although at the beginning of this change LPR was still decreasing, within a month from this change it started to increase (Fig. S10F) with undulating CSIR and WCSIR (Figs. 4F, S5F). Discussion

In this article, we highlighted the properties and relationships between commonly used and newly constructed epidemiological measures applicable in spatiotemporal infectious disease relative risk modelling. We stressed the role of including information on testing intensity in estimating the relative risks of COVID-19 infection, with a case study in Poland. The weighted approach is particularly useful when spatial homogeneity in testing intensity cannot be assumed, which was the case in our example. For instance, as of August 17, the Warszawa region was tested nearly 4.5 times more intensely (CSTR=1.66) than the Rzeszów region (CSTR=0.37). Given these disproportions, inferring the epidemic dynamics from the confirmed number of cases is not justified [35]. We also show the official statements regarding the Silesia region as the most tested (CSTR never greater than 1, as of August 17) [36], the epidemiologically safest region [37], or epidemiologically unexceptional (both CSIR and WCSIR>1) [38] to be false. The refined results could be utilized by authorities and health crisis managers to introduce more integrated NPI policies for adjacent regions that can be epidemiologically synchronized [39,40]. In our case, the similarity between WCSIR values and relatively high positivity rates throughout the study period suggest that this synchronization may be the case for the Opole and Silesia regions. As of now in these regions, differences in organising public gatherings remain: for example in the Opole region, church authorities allowed the organisation of city-wide processions at the Feast of Corpus Christi (June 11) [41], whereas in Silesia mass gatherings of this kind were forbidden [42]. Because public gatherings played a vital role in the spread of the 1918 influenza pandemic [20,43], it is now necessary to stress the significance of the joint effect of testing and infection rates to prevent downplaying the epidemiological risk in poorly tested regions. We moreover illustrated that incorporating the historical data for calculating the relative risk can be beneficial to better identify time periods related to investigated trends. Using a cumulative model of relative risk, we demonstrate that systematic growth in infections already started in mid-April, three weeks before the decision to implement screening tests in mines in Silesia. This is particularly concerning as the lack of screening tests at the time of potentially greater, yet unknown, mobility corresponding to Easter may have facilitated the local spread of the disease among these mostly asymptomatic and thus undetected carriers. We note however that the epidemiological deterioration did not affect all mines equally and it was highly variable throughout the study period: while at the end of June the positivity ratio calculated for a group of several mines did not exceed 4% (1,862 confirmed cases/50,053 tests) [33], one month later a screening test conducted for one mine from this group revealed about 35% population-based positivity (156 confirmed cases/452 employees) [44]. Surprisingly, the systematic deterioration and highest positivity ratio in May for Silesia coincides with a temporal concentration of increase in one particular mine in which the population-based positivity increased from about 5% (245/4,982 employees, as of May 14) to 28% (as of May 24), to 33% (as of July 29). The investigation of COVID-19 spread in similar conditions was already carried out in the densely populated region of Buenos Aires with 13 million inhabitants in 41 districts [45]. This research was based on analysing anonymized mobile phones and it revealed that the spread of the disease was radial in nature: from the central city of Buenos Aires, through the suburban districts, to the neighbouring regions. There are however two major differences underlying the spread of the disease in Buenos Aires versus the Silesian region. While the first difference is related to a very specific spatial structure of the Silesian region, the second points at the greater role of industry rather than population density in the spread of the disease. Other studies have likewise shown that population density [15,46], age structure [15], and socioeconomic status [11,19,47] among other variables affect COVID-19 transmission patterns, but this study adds knowledge regarding how the area’s relative dominance in an economic sector can play a role in the transmission. We note however that the mining industry should be only regarded as a proxy of the infectious potential of large industrial plants and indeed similar events were registered in other regions (e.g. in 15 POZ region in meat-processing company in early August – 313 confirmed cases/800 employees) [48]. Given these plants may attract employees from more distant localities, they have the potential to synchronize the epidemics at the sub-regional level, sometimes trespassing the administrative borders of a higher level [40]. Therefore, particular attention should be paid if this synchronization may be the case for administrative regions with different testing capabilities. The main limitation of this study follows from the incompleteness of official data related to testing data that are publicly available only since May 11 th (see details in Methods). For the time period between March 26 and May 11, the results of weighting approaches and positivity ratios are estimated with greater uncertainty and we urge more caution in interpreting estimates at the beginning of the time period. The confirmed yet unexplained underreporting for Silesia resulting in about an 8% (as of July 9) underestimate of cases as well as limited reliability of testing data for Kielce region [49] poses additional interpretation difficulties [50]. An additional limitation is that there may be other unknown differences in testing regimes influencing the results. For example, poorly tested regions may decide that only very suspicious cases will be tested which will result in a relatively high positivity ratio [16]. We also did not include information on recovered individuals, defined as those with two subsequent negative tests [51] which underestimates the positivity ratios for the time period in which the number of recovered increases. It should also be stressed that the theorem regarding the dynamics of WCSIR has the form of an implication, so one is not allowed to establish a causal relationship between a decrease of WCSIR and either of the alternatives in the implication given. We stress that the WCSIR can be thought of in two alternative ways (CSIR/CSTR or LPR/GPR, Observation 1 in Theorem 2). This flexibility affords opportunities overcome potential difficulties with obtaining population data, which may be the case if a testing site cannot be easily split into regions with known population. Because testing capabilities are always limited, the obtained results after weighting could also be used to consider regional prioritization in the availability of tests. For example, from Table 1 it could be inferred that the following regions are in particular need for increasing the intensity of testing: Opole (8 OPO: + 170.53%) and Rzeszów (9 RZE: + 171.41%). A more meaningful analysis of testing capabilities should however always involve the temporal aspect (Fig. S3) and the relationship between the epidemiological indices that we proposed and developed throughout time (Figs. 4, S5-S10). Therefore, we support calls for the radical increase in the identification of positive cases and accompanying isolation as well as encourage behavioural changes and increase awareness of the disease to help reduce the spread of COVID-19 [31,52,53]. We also believe that this method holds promise to guide efforts in countries without a robust health care infrastructure and to counter a misinterpretation of the perception of a high relative risk in densely tested areas compared to areas with low apparent relative risk due to limited testing availability, when in fact a high risk of COVID-19 may simply be undetected. References WHO . Coronavirus disease (COVID-19) advice for the public

Franch-Pardo I, et al.

Spatial analysis and GIS in the study of COVID-19. A review.

Science of the Total Environment : 140033. 3.

Jia JS, et al.

Population flow drives spatio-temporal distribution of COVID-19 in China.

Nature : 389–394. 4.

Huang R, Liu M, Ding Y . Spatial-temporal distribution of COVID-19 in China and its prediction: A data-driven modeling analysis.

Journal of Infection in Developing Countries : 246–253. 5. Briz-Redón Á, Serrano-Aroca Á . A spatio-temporal analysis for exploring the effect of temperature on COVID-19 early evolution in Spain.

Science of the Total Environment : 138811. 6.

Gatto M, et al.

Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures.

Proceedings of the National Academy of Sciences of the United States of America : 10484–10491. 7.

Bertuzzo E, et al.

The geography of COVID-19 spread in Italy and implications for the relaxation of confinement measures.

Nature Communications : 4264. 8. Gémes K, et al.

Burden and prevalence of prognostic factors for severe COVID-19 in Sweden.

European Journal of Epidemiology

Rossman H, et al.

A framework for identifying regional outbreak and spread of COVID-19 from one-minute population-wide surveys.

Nature Medicine : 634–638. 10. Candido DS, et al.

Evolution and epidemic spread of SARS-CoV-2 in Brazil.

Science Mollalo A, Vahedi B, Rivera KM . GIS-based spatial modeling of COVID-19 incidence rate in the continental United States.

Science of the Total Environment : 138884. 12.

Miller IF, et al.

Disease and healthcare burden of COVID-19 in the United States.

Nature Medicine : 1212–1217. 13. Hohl A, et al.

Daily surveillance of COVID-19 using the prospective space-time scan statistic in the United States.

Spatial and Spatio-temporal Epidemiology

Rohleder S, Bozorgmehr K . Monitoring the spatiotemporal epidemiology of Covid-19 incidence and mortality: a small-area analysis in Germany. researchsquare.com

Zhang CH, Schwartz GG . Spatial Disparities in Coronavirus Incidence and Mortality in the United States: An Ecological Analysis as of May 2020.

Journal of Rural Health : 433–445. 16. Omori R, Mizumoto K, Chowell G . Changes in testing rates could mask the novel coronavirus disease (COVID-19) growth rate.

International Journal of Infectious Diseases : 116–118. 17. Desjardins MR, Hohl A, Delmelle EM . Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters.

Applied Geography : 102202. 18.

Lieberman-Cribbin W, et al.

Disparities in COVID-19 Testing and Positivity in New York City.

American Journal of Preventive Medicine

Cordes J, Castro MC . Spatial analysis of COVID-19 clusters and contextual factors in New York City.

Spatial and Spatio-temporal Epidemiology

Hatchett RJ, Mecher CE, Lipsitch M . Public health interventions and epidemic intensity during the 1918 influenza pandemic.

Proceedings of the National Academy of Sciences of the United States of America : 7582–7587. Bivand RS, Pebesma E, Gómez-Rubio V . Applied Spatial Data Analysis with R . Springer, 2013. 22.

Waller LA, Gotway CA . Applied Spatial Statistics for Public Health Data . Vol. 368. John Wiley & Sons, 2004. 23.

Raciborski F, et al.

Dynamics of COVID ‐

19 outbreak in Poland: an epidemiological analysis of the first two months of the epidemic.

Polish Archives of Internal Medicine

Krzysztofik R, Kantor-Pietraga I, Spórna T . Spatial and functional dimensions of the COVID-19 epidemic in Poland.

Eurasian Geography and Economics

Ministry Of Health in Poland - official Twitter profile . (https://twitter.com/mz_gov_pl?lang=pl). Accessed 5 August 2020. 26.

Runge A, et al.

Can depopulation create urban sustainability in postindustrial regions? A case from Poland.

Sustainability (Switzerland) : 4633. 27. Statistics Poland . Local Data Bank . 28.

Jarynowski A, et al.

Attempt to understand public health relevant social dimensions of COVID-19 outbreak in Poland.

Society Register : 7–44. 29. Pinkas J, et al.

Public health interventions to mitigate early spread of SARS-CoV-2 in Poland.

Medical Science Monitor : e924730-1–e924730-7. 30. Bai Y, et al.

Presumed Asymptomatic Carrier Transmission of COVID-19.

JAMA - Journal of the American Medical Association : 1406–1407. 31.

Li R, et al.

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Science : 489–493. 32.

Polish Press Agency . Szumowski: rekord zachorowań wynika z badań przesiewowych górników i innych ognisk

Polska Grupa Górnicza . Wygasa epidemia w Polskiej Grupie Górniczej . Błoński M . Kolejne przypadki koronawirusa w kopalniach. Chorzy w Rybniku i Rudzie Śląskiej . 2020. 35.

TVN24 . Morawiecki: ‘Śmiało idźcie do urn’. A co pokazują dane o pandemii?

Polish Press Agency . Sasin: od jutra wstrzymamy prace w dwóch kopalniach JSW i w 10 kopalniach PGG

Polsat . Sasin: Śląsk jest najbezpieczniejszym miejscem w Polsce

RMF24 . Szumowski: Rozważamy powrót do obostrzeń. Polacy zapomnieli, że mamy epidemię

Dalziel BD, et al.

Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities.

Science : 75–79. 40.

Balcan D, et al.

Multiscale mobility networks and the spatial spreading of infectious diseases.

Proceedings of the National Academy of Sciences of the United States of America : 21484–21489. 41.

Ogiolda K . Boże Ciało 2020 w Opolu. Biskupi poprowadzili procesję z katedry ‘na górkę’ . opole.naszemiasto.pl . Opole, 2020; Published online: 2020. 42. Chruścińska-Dragan M . Procesje Bożego Ciała 2020. Czy się odbędą? W tym roku zastąpią je procesje wokół świątyń i przykościelnych placów . Dziennik Zachodni . 2020; Published online: 2020. 43.

Cobey S . Modeling infectious disease dynamics.

Science : 713–714. 44.

Błoński M . Śląskie: ponad jedna trzecia przebadanych górników z kopalni Bielszowice z koronawirusem . 2020. 45.

Tagliazucchi E, et al.

Lessons from being challenged by COVID-19.

Chaos, Solitons and Fractals : 109923. 46.

Ramírez-Aldana R, Gomez-Verjan JC, Bello-Chavolla OY . Spatial analysis of COVID-19 spread in Iran: Insights into geographical and structural transmission determinants at a province level. medRxiv

Ramírez IJ, Lee J . COVID-19 emergence and social and health determinants in Colorado: A rapid spatial analysis.

International Journal of Environmental Research and Public Health : 3856. 48. Orlikowski P . Nowe ognisko koronawirusa. Przebadano 800 osób, uruchomiono mobilną stację . 2020. 49.

Onet . Poważny błąd w liczbie testów w Kielcach. Szpital wydał oświadczenie . 2020; Published online: 2020. 50.

Watoła J . Statystyki koronawirusa coraz bardziej fałszywe. Ministerstwu Zdrowia brakuje na Śląsku już 1,1 tys. przypadków . Gazeta Wyborcza . Katowice, 2020; Published online: 2020. 51.

Niżankowski R, Myśliwiec M, Szymański P . Zalecenia w COVID-19 . 2020. 52.

Chinazzi M, et al.

The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak.

Science : 395–400. 53.

Bergquist R, Rinaldi L . Covid-19: Pandemonium in our time.

Geospatial Health Published online: 2020.doi:10.4081/gh.2020.880. 54.

Ministry of Health in Poland . Report on tests . 2020. 55.

Statistics Poland . Demographic Yearbook of Poland . 2020. 56.

Wickham H, et al. dplyr: A Grammar of Data Manipulation.

R package version 0.8.3. Wickham H . ggplot2 Elegant Graphics for Data Analysis (Use R!) . Springer . 2016. 58.

Kassambara A . ‘ggpubr’: ‘ggplot2’ Based Publication Ready Plots.

R package version 0.2.5.

Wickham H . Reshaping data with the reshape package.

Journal of Statistical Software : 1–20. 60. Müller K, Wickham H . tibble: Simple Data Frames. R package version 3.0.1. Pebesma E . Simple features for R: Standardized support for spatial vector data.

R Journal : 439–446. 62. Tennekes M . Tmap: Thematic maps in R.

Journal of Statistical Software : 1–39. 63. David R, Hayes A . broom: Convert Statistical Analysis Objects into Tidy Tibbles. R package version 0.5.2. Sievert C . plotly for R . 2018. 65. Milton Bache S, Wickham H . Magrittr: a forward-pipe operator for R . 2014. Table 1

Estimates of relative risk (as of August 17) using unweighted (CSIR) and weighted (WCSIR) approaches, relative changes and their corresponding positions

Region CSIR Rank 1 Relative Change (%) WCSIR Rank 2 -6.70 0.79 +11.60 0.37 +31.76 0.53 +112.31 0.83 -2.03 1.24 -0.73 1.09 -39.66 0.60 +170.53 2.66 +171.41 1.61

10 BIA 0.65 +19.97 0.78

11 GDA 0.47 -15.24 0.40

12 KAT 2.80 +3.10 2.89

13 KIE 0.64 +33.05 0.85

14 OLS 0.30 +43.36 0.42

15 POZ 0.91 -5.82 0.86

16 SZC 0.39 +13.04 0.44 Table 2

The description of functions included in the computer code (R script)

Function What does it do? cumulate_df A function that cumulates a data frame relrisk Calculates SIR. Note that it can also be used to provide input to interpolate_tests function. relrisk_cum Calculates CSIR. sum_cum Calculates cumulative sum of observed cases for every day. interpolate_tests Calculates CSTR. weighted_risk Calculates WCSIR. reorder It is used to properly assign risk CSIR and WCSIR values to the polygons in shapefile. Figures Fig. 1

Daily time series of (A) SIR and (B) CSIR by region in Poland. The gradient denotes the assumed decreasing intensity of screening tests with time Fig. 2

WCSIR by region in Poland. Note that we included two unweighted CSIR curves (bolded) to illustrate the impact of weighting. The gradient denotes the assumed decreasing intensity of screening tests with time Fig. 3

COVID-19 epidemiological measures for administrative regions in Poland. (CSIR – Cumulative Standardised Incidence Ratio, CSTR – Cumulative Standardised Testability Ratio, LPR – Local Positivity Rate, GPR – Global Positivity Rate, WCSIR – Weighted Cumulative Standardised Incidence Ratio) Fig. 4

Relationship between CSIR and WCSIR curves for individual regions representing six distinct patterns: (A) 1 WRO: Both CSIR and WCSIR values are decreasing, (B) 7 WAR: CSIR is decreasing and WCSIR is increasing, (C) 8 OPO: divergent pattern of CSIR and WCSIR, (D) 9 RZE: both CSIR and WCSIR are increasing, (E) 12 KAT: proportional growth of CSIR and WCSIR, then the relationship plateaus, (F) 15 POZ: a change in trajectory followed by a zigzag pattern between CSIR and WCSIR Financial support : The study (APC) was funded by University of Silesia in Katowice.

Conflicts of Interest: None Data Availability Statement

Processed data on infections and testing are available at https://github.com/michalmichalak997/COVID-19/blob/master/Data%20and%20code/Data%20and%20code_December_4_update.zip . Original data on infections and testing are available at https://twitter.com/mz_gov_pl?lang=pl. Processed data that are used for generating individual plots are available at https://github.com/michalmichalak997/COVID-19/blob/master/Data%20and%20code/Data%20for%20figures_December_4_update.zip Population census data are available at https://stat.gov.pl/en/ .

Code availability

Computer code is available at https://github.com/michalmichalak997/COVID-19/blob/master/README.md. Interactive plots are publicly available at https://michalmichalak997.shinyapps.io/shiny_corona/ Title 1 (old):

A systematic framework for spatiotemporal modelling of COVID-19 disease

Title 2:

An unbiased spatiotemporal risk model for COVID-19 with epidemiologically meaningful dynamics

Authors: Michał Paweł Michalak, Jack Cordes, Agnieszka Kulawik, Sławomir Sitek, Sławomir Pytel, Elżbieta Zuzańska-Żyśko, Radosław Wieczorek 'Supplementary Material' Mathematical formalism Theorem 1 . For a specific region 𝑖 , the 𝐶𝑆𝐼𝑅 on day 𝑡 will decrease if the proportion of new cases for this region on day 𝑡 will be not greater than the proportion of all cases for this region on day 𝑡 − 1 . Proof.

Let 𝑖 be a specific region and 𝐶𝑆𝐼𝑅 𝑖 (𝑡 − 1) = 𝑂 𝑖,𝑡−1 /𝐸 𝑖,1:(𝑡−1) and 𝐶𝑆𝐼𝑅 𝑖 (𝑡) = (𝑂 𝑖,𝑡−1 + 𝑥 𝑖,𝑡 )/𝐸 𝑖,1:𝑡 be the cumulative estimates of relative risks for region 𝑖 on days 𝑡 − 1 and 𝑡 , respectively, where 𝑂 𝑖,𝑡−1 denotes the total (historical) sum of observed number of cases for 𝑖 registered on day 𝑡 − 1 , 𝑥 𝑖,𝑡 denotes new cases for 𝑖 confirmed on day 𝑡 , and 𝐸 𝑖,1:(𝑡−1) and 𝐸 𝑖,1:𝑡 denote the expected cumulative number of cases for 𝑖 on days 𝑡 − 1 and 𝑡 , respectively. The decrease in 𝐶𝑆𝐼𝑅 between days 𝑡 − 1 and 𝑡 may be written in the form of an inequality: 𝑂 𝑖,𝑡−1 +𝑥 𝑖,𝑡 𝐸 𝑖,1:𝑡 ≤ 𝑂 𝑖,𝑡−1 𝐸 𝑖,1:(𝑡−1) ↔ (7) 𝑂 𝑖,𝑡−1 + 𝑥 𝑖,𝑡 ≤ 𝑂 𝑖,𝑡−1 𝐸 𝑖,1:𝑡 𝐸 𝑖,1:(𝑡−1) ↔ (8) 𝑥 𝑖,𝑡 ≤ 𝑂 𝑖,𝑡−1 𝐸 𝑖,1:𝑡 𝐸 𝑖,1:(𝑡−1) − 𝑂 𝑖,𝑡−1 𝐸 𝑖,1:(𝑡−1) 𝐸 𝑖,1:(𝑡−1) = 𝑂 𝑖,𝑡−1 (𝐸 𝑖,1:𝑡 −𝐸 𝑖,1:(𝑡−1) )𝐸 𝑖,1:(𝑡−1) ↔ (9) 𝑥 𝑖,𝑡 𝑂 𝑖,𝑡 ≤ 𝐸 𝑖,1:𝑡 −𝐸 𝑖,1:(𝑡−1) 𝐸 𝑖,1:(𝑡−1) ↔ (10) But 𝐸 𝑖,1:(𝑡−1) = 𝑃 𝑖 𝑟 𝑡−1 and 𝐸 𝑖,1:𝑡 = 𝑃 𝑖 𝑟 𝑡 with 𝑟 𝑡−1 = 𝑂(𝑡−1)∑ 𝑃 𝑖𝑖 and 𝑟 𝑡 = 𝑂(𝑡−1)+𝑥(𝑡)∑ 𝑃 𝑖𝑖 , where 𝑂(𝑡 − 1) denotes the total (historical) sum of observed number of cases for all regions on day 𝑡 − 1 , 𝑥(𝑡) =∑ 𝑥 𝑖,𝑡𝑖 denotes the sum of cases confirmed for all regions on day 𝑡 , 𝑃 𝑖 is the population size in 𝑖 , and ∑ 𝑃 𝑖𝑖 is the country-wide population. Thus, 𝑥 𝑖,𝑡 𝑂 𝑖,𝑡−1 ≤ 𝑃 𝑖 (𝑟 𝑡 −𝑟 𝑡−1 )𝑃 𝑖 𝑟 𝑡−1 = 𝑟 𝑡 −𝑟 𝑡−1 𝑟 𝑡−1 (11) Substituting for 𝑟 𝑡 and 𝑟 𝑡−1 : 𝑥 𝑖,𝑡 𝑂 𝑖,𝑡−1 ≤ 𝑂(𝑡−1)+𝑥(𝑡)∑ 𝑃𝑖𝑖 − 𝑂(𝑡−1)∑ 𝑃𝑖𝑖𝑂(𝑡−1)∑ 𝑃𝑖𝑖 = 𝑥(𝑡)∑ 𝑃𝑖𝑖𝑂(𝑡−1)∑ 𝑃𝑖𝑖 = 𝑥(𝑡)𝑂(𝑡−1) (12) But the above result is equivalent to: 𝑥 𝑖,𝑡 𝑥(𝑡) ≤ 𝑂 𝑖,𝑡−1 𝑂(𝑡−1) (13) Note that 𝑥 𝑖,𝑡 𝑥(𝑡) is the proportion of cases for region 𝑖 on day t, while 𝑂 𝑖,𝑡−1 𝑂(𝑡−1) is the “cumulative” proportion of cases for region 𝑖 on day 𝑡 − 1 , i.e. the total (historical sum) of confirmed cases for 𝑖 up to day 𝑡 − 1 , divided by the total (historical sum) of confirmed cases in the country up to day 𝑡 − 1 . It is also possible to write a simplified version of the above proof. Let 𝑂(𝑡) = ∑ 𝑂 𝑖,𝑡𝑖 , 𝑃 = ∑ 𝑃 𝑖 𝑖 , 𝜌 𝑖 = 𝑃 𝑖 𝑃 and 𝑥(𝑡) = ∑ 𝑥 𝑖,𝑡𝑖 . Then

𝐶𝑆𝐼𝑅 𝑖 (𝑡) = 𝑂 𝑖,𝑡 𝐸 𝑖,𝑡 = 𝑂 𝑖,𝑡 𝑃𝑖𝑂(𝑡)𝑃 = 𝑖 𝑂 𝑖,𝑡 𝑂(𝑡) . (14) Then,

𝐶𝑆𝐼𝑅 𝑖 (𝑡 − 1) ≥ 𝐶𝑆𝐼𝑅 𝑖 (𝑡) ↔ (15) 𝑖 𝑂 𝑖,𝑡−1 𝑂(𝑡−1) ≥ 𝑖 𝑂 𝑖,𝑡 𝑂(𝑡) ↔ (16) 𝑂 𝑖,𝑡−1 𝑂(𝑡−1) ≥ 𝑂 𝑖,𝑡 𝑂(𝑡) ↔ (17) 𝑂 𝑖,𝑡−1 𝑂(𝑡) ≥ 𝑂 𝑖,𝑡

𝑂(𝑡 − 1) (18) 𝑂 𝑖,𝑡−1 (𝑂(𝑡 − 1) + 𝑥(𝑡)) ≥ (𝑂 𝑖,𝑡−1 + 𝑥 𝑖,𝑡 )𝑂(𝑡 − 1) (19) 𝑂 𝑖,𝑡−1 𝑥(𝑡) ≥ 𝑥 𝑖,𝑡 𝑂(𝑡 − 1) (20) 𝑂 𝑖,𝑡−1 𝑂(𝑡−1) ≥ 𝑥 𝑖,𝑡 𝑥(𝑡) (21) Theorem 2.

The WCSIR will decrease if either of the two following conditions in the form of conjuctions are met (we note that an analogous theorem can be formulated for an increasing WCSIR): (i) The CSIR is decreasing (proportion of infected for region 𝑖 on day 𝑡 is not greater than the cumulative proportion of infected for region 𝑖 on day 𝑡 − 1 ) and the CSTR is increasing (proportion of tests in region 𝑖 on day 𝑡 is not smaller than the cumulative proportion of tests on day 𝑡 − 1 .) (ii) LPR is decreasing (the ratio of positive cases for region 𝑖 on day 𝑡 is not greater than the cumulative ratio of positive cases on day 𝑡 − 1 ) and GPR is increasing (the ratio of positive cases for the whole country on day 𝑡 is not smaller than the cumulative ratio for the whole country of positive cases on day 𝑡 − 1 ). Proof. (i) If

𝐶𝑆𝐼𝑅 𝑖 (𝑡) ≤ 𝐶𝑆𝐼𝑅 𝑖 (𝑡 − 1) and 𝐶𝑆𝑇𝑅 𝑖 (𝑡) ≥ 𝐶𝑆𝑇𝑅 𝑖 (𝑡 − 1) , then 𝑊𝐶𝑆𝐼𝑅 𝑖 (𝑡) = 𝐶𝑆𝐼𝑅 𝑖 (𝑡)𝐶𝑆𝑇𝑅 𝑖 (𝑡) ≤ 𝐶𝑆𝐼𝑅 𝑖 (𝑡−1)𝐶𝑆𝑇𝑅 𝑖 (𝑡−1) = 𝑊𝐶𝑆𝐼𝑅 𝑖 (𝑡 − 1) . (ii) Let us first prove a following observation. Observation 1. 𝑊𝐶𝑆𝐼𝑅 𝑖 (𝑡) = 𝐿𝑃𝑅 𝑖 (𝑡)𝐺𝑃𝑅(𝑡) . Proof.

𝑊𝐶𝑆𝐼𝑅 𝑖 (𝑡) = 𝑂 𝑖,𝑡 𝐸 𝑖,1:𝑡 𝑇 𝑖,𝑡 𝑇𝐸 𝑖,1:𝑡 = 𝑂 𝑖,𝑡 𝑇 𝑖,𝑡 𝐸 𝑖,1:𝑡 𝑇𝐸 𝑖,1:𝑡 = 𝐿𝑃𝑅 𝑖 (𝑡)𝑃 𝑖 𝑂(𝑡)∑ 𝑃 𝑖𝑖 𝑃 𝑖 𝑇(𝑡)∑ 𝑃 𝑖𝑖 = 𝐿𝑃𝑅 𝑖 (𝑡)𝐺𝑃𝑅(𝑡) . It results that if

𝐿𝑃𝑅 𝑖 (𝑡) ≤ 𝐿𝑃𝑅 𝑖 (𝑡 − 1) and 𝐺𝑃𝑅(𝑡) ≥ 𝐺𝑃𝑅(𝑡 − 1) , then 𝑊𝐶𝑆𝐼𝑅 𝑖 (𝑡) = 𝐿𝑃𝑅 𝑖 (𝑡)𝐺𝑃𝑅(𝑡) ≤ 𝐿𝑃𝑅 𝑖 (𝑡−1)𝐺𝑃𝑅(𝑡−1) = 𝑊𝐶𝑆𝐼𝑅 𝑖 (𝑡 − 1) . Supplementary Figures

Fig. S1

The development of the epidemic in Poland throughout the study period: (A) Number of daily infections; (B) Cumulative number of infections Fig. S2

Administrative regions of Poland with full names and capitals (in parentheses): 1 WRO – dolnośląskie (Wrocław), 2 BYD – kujawsko-pomorskie (Bydgoszcz), 3 LUB – lubelskie (Lublin), 4 GOR – lubuskie (Gorzów Wielkopolski), 5 LOD – łódzkie (Łódź), 6 KRA – małopolskie (Kraków), 7 WAR – mazowieckie (Warszawa), 8 OPO – opolskie (Opole), 9 RZE – podkarpackie (Rzeszów), 10 BIA – podlaskie (Białystok), 11 GDA – pomorskie (Gdańsk) , 12 KAT – śląskie (Katowice), 13 KIE – świętokrzyskie (Kielce), 14 OLS – warmińsko-mazurskie (Olsztyn), 15 POZ – wielkopolskie (Poznań), 16 SZC – zachodniopomorskie (Szczecin) Fig. S3

Relative testing intensity throughout the study period. Please note that the data are interpolated which causes “edge” effects at the nodes Fig. S4

Local (LPR) and global (GPR) cumulative positivity ratios Fig. S5

Relationship between CSTR and CSIR: (A) 1 WRO: CSIR is monotonically decreasing even if the CSTR is sometimes increasing (B) 7 WAR: both CSTR and CSIR are decreasing up to early August, (C) 8 OPO: decreasing CSTR with a simultaneous growth in CSIR in late-May, (D) 9 RZE: CSIR increasing since mid-June, yet CSTR increased in August, (E) 12 KAT: similar growth rate between CSTR and CSIR in early June that followed a greater growth rate of CSIR in May, (F) 15 POZ: decreasing CSTR with undulating CSIR Fig. S6

Relationship between LPR and WCSIR: (A) 1 WRO: simultaneous decrease in LPR and WCSIR (B) 7 WAR: constant LPR with increasing WCSIR in mid-July, (C) 8 OPO: slightly decreasing LPR with an increasing WCSIR in late-May, (D) 9 RZE: both LPR and WCSIR are increasing since June, (E) 12 KAT: LPR and WCSIR are together increasing and then together decreasing, (F) 15 POZ: faster growth of WCSIR than that of LPR since July Fig. S7

Relationship between CSIR and LPR: (A) 1 WRO: both CSIR and LPR are decreasing (B) 7 WAR: CSIR is decreasing and LPR is increasing, (C) 8 OPO: a zigzag trajectory, trend shows decreasing CSIR and LPR, (D) 9 RZE: both CSIR and LPR are increasing since mid-June, (E) 12 KAT: constant CSIR with decreasing LPR in July, (F) 15 POZ: change in simultaneous decrease of both CSIR and LPR since July Fig. S8

Relationship between CSTR and LPR: (A) 1 WRO: LPR is monotonically decreasing, CSTR is sometimes increasing (B) 7 WAR: CSTR is decreasing and LPR increasing since late-July, (C) 8 OPO: CSTR increasing since June with a constant LPR in August, (D) 9 RZE: CSTR increased slightly in August with dramatically increasing LPR, (E) 12 KAT: decreasing LPR and increasing CSTR since mid-June, (F) 15 POZ: faster decrease rate for CSTR than the rate of increase of LPR since July Fig. S9

Relationship between CSTR and WCSIR: (A) 1 WRO: WCSIR is monotonically decreasing, the CSTR is sometimes increasing (B) 7 WAR: CSTR decreasing with a trend of increasing WCSIR since late-May, (C) 8 OPO: decreasing CSTR with a simultaneous growth in WCSIR in mid-May, (D) 9 RZE: CSTR increased slightly in August with a dramatically increasing WCSIR, (E) 12 KAT: decreasing WCSIR since mid-June and increasing CSTR, (F) 15 POZ: decreasing CSTR with undulating WCSIR since July Fig. S10

Relationship between GPR and LPR: (A) 1 WRO: decreasing LPR with increasing GPR in early-June and August, (B) 7 WAR: in August both LPR and GPR are increasing, (C) 8 OPO: in August LPR approximately constant with increasing GPR, (D) 9 RZE: in August LPR increasing faster than GPR, (E) 12 KAT: increasing LPR with decreasing GPR in May, (F) 15 POZ: in August GPR increasing faster than LPR Fig. S11

Relative safety perspective assumed as 1/risk: (A) unweighted safety (1/CSIR); (B) weighted safety (1/WCSIR) Supplementary Tables Tab. S1

Reported inaccuracies related to the number of confirmed cases

Error ID No. of errors Reporting date Error date Correction date Reference

1 4 June 26, 2020 ND June 25, 2020 [1] 2 5 June 26, 2020 ND June 24, 2020 [2] 3 51 June 19, 2020 ND June 16-19, 2020 [3] 4 1 June 2, 2020 ND June 1, 2020 [4] 5 2 June 9, 2020 June 6, 2020 June 6, 2020 [5] 6 4 May 31, 2020 ND May 31, 2020 [6] 7 1 May 29, 2020 ND May 27, 2020 [7] 8 2 May 26, 2020 ND May 26, 2020 [8] 9 1 June 1, 2020 ND May 25, 2020 [9] 10 34 May 25, 2020 ND May 25, 2020 [10] 11 4 May 23, 2020 ND May 23, 2020 [11] 12 2 May 25, 2020 ND May 23, 2020 [12] 13 39 May 13, 2020 May 12, 2020 May 12, 2020 [13] 14 21 May 8, 2020 May 7, 2020 May 7, 2020 [14] 15 17 May 7, 2020 May 5, 2020 May 5, 2020 [15] 16 2 May 6, 2020 ND May 5, 2020 [16] 17 63 April 30, 2020 April 16 - 19, 2020 April 16 - 19, 2020 [17] 18 5 June 2, 2020 ND June 1, 2020 [18] Tab. S2

Cumulative number of tests, as of March 26, 2020

Region Cumulative number of tests (as of March 26, 2020) Source (OD – official data sent on request) SI References

1. Ministry of Health Poland. Report on errors 51