Climate & BCG: Effects on COVID-19 Death Growth Rates

Abstract

Multiple studies have suggested the spread of COVID-19 is affected by factors such as climate, BCG vaccinations, pollution and blood type. We perform a joint study of these factors using the death growth rates of 40 regions worldwide with both machine learning and Bayesian methods. We find weak, non-significant (<3σ) evidence for temperature and relative humidity as factors in the spread of COVID-19 but little or no evidence for BCG vaccination prevalence or PM2.5 pollution. The only variable detected at a statistically significant level (>3σ) is the rate of positive COVID-19 tests, with higher positive rates correlating with higher daily growth of deaths.

Chris Finlay and Bruce A. Bassett

South African Radio Astronomy Observatory, Observatory, Cape Town, 7295
Department of Maths and Applied Maths, University of Cape Town, Rondebosch, Cape Town, 7700
African Institute for Mathematical Sciences, Muizenberg, Cape Town, 7950, South Africa
South African Astronomical Observatory, Observatory, Cape Town, 7295

July 14, 2020

The COVID-19 pandemic has triggered extensive efforts to predict the severity of COVID-19 to aid in decision making around interventions such as lockdown and the closure of schools [1]. Regions hit hard by the pandemic, such as Wuhan, Lombardy and New York, with doubling times of 2-3 days and high crude mortality rates, stand in stark contrast to other countries only mildly affected, such as Hong Kong, South Korea and New Zealand.

Potential explanations for the apparent differences in transmissivity (encoded by the time-dependent reproductive number, $R_t$) and lethality (encoded by the Infection Fatality Rate, IFR) currently fall into four broad categories. The first posits that the differences are largely fictitious, driven by the heterogeneity of testing and reporting of cases and deaths; this is a known issue, and one that we are particularly concerned with in this paper.

The next three categories posit that the differences are primarily real and are driven by (1) cultural and policy factors (swift lockdown, efficient contact tracing and quarantining, use of masks, obedient populations, or social structures that are naturally distant or isolated), (2) local environmental factors (such as temperature and humidity variations, population density, comorbid factors, vaccinations, vitamin D levels, blood type, etc.) or (3) the existence of multiple strains with different transmissibility or lethality [4]. Our primary interest lies in disentangling the first category (testing) from a subset of the potential environmental factors listed under (2).

Finding the relative contributions of each of these four categories is key to understanding and optimally fighting the pandemic. The widely different testing capabilities between countries, particularly between the developed and developing world, imply that, if not correctly treated, testing variability will create spurious evidence that can lead to false hope and sub-optimal interventions.

In the wake of the spreading pandemic there has been a host of studies that have examined the possible impact of climate [8]-[19], blood type [20, 21], haplogroup [31], pollution [26]-[30] and BCG vaccination prevalence [22, 23, 24, 25] on the spread of COVID-19. Our main conclusion is that testing issues are significant and the factors above are likely not the main causes of variability in growth rates of deaths worldwide.

We note that both the basic reproductive number, $R_0$, and the IFR will in general be affected by the four categories above. However, since we do not currently have access to the true number of infections, we cannot address the potential effect of environmental factors on the IFR. Similarly, the Case Fatality Rate cannot be used for this purpose due to testing differences around the world. We therefore focus on their potential effect on $R_0$, which we quantify through the daily growth rate of deaths of countries around the world for which we have sufficiently good data.

To explore the potential impact of climate (e.g. through temperature, relative humidity and UV Index), BCG vaccination prevalence, blood type and pollution (PM2.5) on COVID-19, one must think carefully about the choice of both data and methodology.

We choose to focus on deaths instead of the number of confirmed infections in the belief that deaths are significantly less affected by sampling and testing issues than infection numbers: the number of tests per 1000 population (Tests/1k) currently varies by more than three orders of magnitude across the world (http://worldometers.info/coronavirus/), making infection numbers highly biased and correlated with confounding variables such as GDP and healthcare. In countries with limited testing capability, only the more severe patients are typically tested.

Since these are also the patients most likely to die, patients who die from COVID-19 are probably more likely to be tested than typical COVID-19 sufferers, who may be mostly asymptomatic. Without detailed knowledge of testing protocols in each country, separating out the effects of the factors of interest (climate, BCG vaccination, etc.) is extremely challenging. Although deaths are not immune to testing issues, with many missing deaths revealed by excess mortality studies, especially in overwhelmed medical systems [47, 48], we argue that this is likely to be much less of an issue in the first month after the initial COVID-19 deaths, which is the period we focus on, since we are interested in $R_0$.

The second key choice is whether to focus on absolute death counts or growth rates. Absolute death counts are also highly problematic. First, testing efficiencies may vary systematically from country to country. Second, the widely varying start dates of the pandemic in different countries are hard to deal with rigorously. As a result we focus on the daily growth rate of deaths, $G_c$, for country c during the initial phase of the epidemic. The initial daily growth correlates with $R_c$, the basic reproductive number in each country; in simple models $R_c = (1 + G_c)^{\tau}$, where τ is the period in days for which a person is infectious on average. Hence $G_c = 0.…$ and τ = 5 days yields $R_c \simeq …$.

Separating out the true causes of variation in $G_c$ or $R_c$ is still a highly complex problem, and we discuss our approach in detail in the additional material in section (5). Part of the complexity arises from the fact that $R_c$ depends explicitly both on properties of the virus and on social factors (e.g. the average number and nature of contacts), as well as on any of the other potential factors we wish to study. Hence, we expect there to be a large intrinsic variation in the growth of deaths from country to country, driven e.g. by the average number of people in a household, population density [5], etc., that may correlate with, and hence lead to spurious evidence for, potential factors such as climate, vaccination coverage, blood type and pollution.

We model this complexity by allowing each country, c, to have its own unique base growth rate, denoted $G_c^0$, that is estimated from its own data. This freedom allows the model to account for the myriad unmodelled factors specific to each country (population density, GDP, culture, health care quality, etc.). However, we tie these base country growth rates, $G_c^0$, together through a parent distribution, expressing the prior belief that there is a single dominant strain of COVID-19 globally.

Our primary goal in this study is to examine worldwide data to investigate whether climate and BCG vaccination prevalence are important drivers of COVID-19 spread. Because of the danger of confounding variables, we used the base growth rate, $G_c^0$, for each country in a machine learning feature extraction algorithm to pick the most important additional variables to include in our computationally intensive hierarchical Bayesian analysis. This led us to exclude PM2.5 pollution. In addition, we undertook a separate analysis including A+ blood type.
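The simple-model relation $R_c = (1 + G_c)^{\tau}$ quoted above, together with the related doubling time $\ln 2 / \ln(1 + G_c)$, can be made concrete with a short sketch. The function names are hypothetical and the growth rates in the demo are illustrative assumptions, not fitted values; $G_c$ is taken here as a fraction per day, as in the formula.

```python
import math


def reproduction_number(daily_growth: float, infectious_days: float) -> float:
    """Simple-model R_c = (1 + G_c) ** tau, with G_c a fraction per day."""
    if daily_growth <= -1.0:
        raise ValueError("daily growth must exceed -100%")
    return (1.0 + daily_growth) ** infectious_days


def doubling_time(daily_growth: float) -> float:
    """Days for cumulative deaths to double at a constant daily growth G_c > 0."""
    if daily_growth <= 0.0:
        raise ValueError("doubling time is only defined for positive growth")
    return math.log(2.0) / math.log(1.0 + daily_growth)


if __name__ == "__main__":
    # Illustrative growth rates only; tau = 5 days as in the text.
    for g in (0.10, 0.20, 0.30):
        print(f"G = {g:.2f}/day -> R ~ {reproduction_number(g, 5):.2f}, "
              f"doubling time ~ {doubling_time(g):.1f} days")
```

Because $\ln(1 + G_c) \approx G_c$ for small $G_c$, doubling the daily growth rate roughly halves the doubling time, while $R_c$ responds exponentially in τ.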
The data for blood type came from unpublished online sources and are therefore kept separate, since they are less trustworthy.

The remaining four most promising environmental factors were temperature (T, °C), relative humidity (RH, %), BCG vaccination coverage (BCG, %) and the ultraviolet index (UVIndex), the latter included as a proxy for vitamin D production. To this set we added two testing-related variables to serve as diagnostics for potential contamination from testing issues: (1) the fraction of tests that return positive (Pos-Rate, %) and (2) the number of tests per 1,000 population (Tests/1k), yielding our final set of global parameters, Θ ≡ (T, RH, BCG, UV, Pos-Rate, Tests/1k). Due to concerns over data integrity, we study the impact of A+ blood type separately in section (5.4.1).

We then explicitly fit for Θ using our sample of 40 regions in 37 countries for which there are data for all of the parameters in Θ. We cannot use all the death data for each country, since a constant-growth-rate model fails quickly for most countries due to non-pharmaceutical interventions (NPIs) or nonlinear effects. To address this we extract the initial, purely exponential part of the data on a country-by-country basis, as illustrated in Fig. (1) and discussed in detail in section (5); this is the only data we use in our main analysis.

The early-phase death data for each country c, $D_c(t)$, are then fit to the model:

$$D_c(t) = D_c^0 \prod_{\tau=1}^{t} \left(1 + \beta \left\{ G_c^0 + \Theta \cdot X_c \right\}\right) \qquad (1)$$

where $\beta = 10^{-2}$ accounts for our growth rates being expressed in percent, and the $X_c$ are the country-specific data corresponding to Θ.

We use a hierarchical Bayesian analysis in which each country's growth rate is drawn from a parent distribution. This allows each country's base growth rate, $G_c^0$, to be intrinsically different, rather than forcing any differences to be due only to the factors encoded by Θ.

Figure 1:

Example fit to the log of deaths vs time for India, showing how we select the initial linear part of the logarithmic data for inclusion in our analysis. Data are shown as blue points. Data covered by red fits are those used in our main analysis, in this case at $t \le t_c$ = … days after reaching three deaths. Green samples are drawn from the five-parameter sigmoid model, which covers all the data and models the changes in the growth rate due to social interventions or nonlinear effects. This is one of the 40 countries/provinces in our analysis; the full set is shown in Fig. (9).

We use Markov Chain Monte Carlo (MCMC) methods to simultaneously fit the base growth rates, $G_c^0$, for all countries, our parameters of interest, Θ, and the parent distribution hyperpriors, implying a large Bayesian hierarchical model with 90 parameters in total. After marginalising over all country growth rates and parent distribution parameters, we are left with the marginal distributions of Θ, our parameters of interest, which provide our main results.

We assess the importance of each of the Θ parameters both through standard model selection metrics, such as the Bayesian evidence and the Bayesian Information Criterion (see Table 1), and by computing the statistical significance with which the parameters deviate from zero in the marginalised posterior chains. We now discuss these results.

Figure (4) shows the main results of our paper: only the parameter associated with the positive rate, or fraction of positive tests (Pos-Rate), is non-zero at more than 3σ. This result is stable to significant changes in our priors and hyperpriors and to including or removing the other parameters in Θ in the analysis. In addition, the positive rate is selected as the most important variable by all model-selection metrics (see Table 1) and by our machine learning analysis, section (5.5). A plot of regional death growth rates versus positive rate is shown in Fig. (2), showing the clear correlation.

The simplest explanation of this result is that regions that experienced the most rapid spread of the disease, i.e. those with the largest $R_0$, were also the regions least able to keep up with testing demand, and hence where the fraction of polymerase chain reaction (PCR) tests for COVID-19 returning positive was highest on average. A high positive rate is then not a cause, but rather a result, of a high growth rate. We find that Pos-Rate correlates positively with tests per 1000 population (see Fig. 10): a high death growth rate and a growing positive rate likely spurred increased testing. Neither of these observations helps us identify the cause of increased death growth rates, however.

Temperature is the only other variable that is non-zero at more than …σ in our multivariate fits (see section 5.4.3 for more discussion), and relative humidity is the only other parameter non-zero at more than …σ, providing some weak evidence for a climatic impact on the spread of the disease. Increasing temperature and humidity tend to decrease the spread of COVID-19, in agreement with previous studies.

In our model-selection metrics, where we compare fits with one variable at a time, relative humidity is preferred over temperature by the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), and it is the only variable other than Pos-Rate that has a BIC value more than 2 units lower than the No Factor model (which has Θ = 0). The remaining parameters have poor or mixed results relative to the No Factor model, and all perform very poorly relative to the model with just the positive rate.

In particular, our regression and machine learning analyses provide no evidence in support of BCG fraction. This is not surprising if we look at Fig. (3), which plots regional death growth rates versus BCG coverage: there is no discernible correlation. Our results are therefore in agreement with [24, 25]. Further, we do not find any correlation between UV index and death growth rate.
This is pertinent since UV index is relevant to the natural production of vitamin D [38], and vitamin D has been suggested as a protective factor against the spread of the disease [35, 36, 37].

(A BIC difference, ΔBIC, of 2 to 6 is the standard demarcation of positive evidence [40].)

Now we briefly discuss results for the other testing variable, Tests/1k. We find that this is not a significant correlate of growth rate. This is perhaps not unexpected. If testing efficiency in each country is approximately constant, i.e. the fraction of true infections detected changes only slowly with time, then it has little impact on the growth rate of deaths, since a constant detection fraction rescales the counts without changing their growth. Only if testing efficiency varies rapidly with time would we expect it to be significant. Our results suggest that this was not the case, at least within the initial phase of the epidemic in each country. Neither the Bayesian nor the machine learning analyses found the number of tests per 1000 population to be significant.

We present technical details of the data, algorithms and analysis in section (5).

Figure 2:

The growth rates with error bars for the 40 regions, taken from our No Factor model and plotted against the positive rate data, are shown in blue. The red lines are posterior samples from the univariate positive rate model. The intercept was taken to be the parent-level mean of the growth rate and the slope is the coefficient for positive rate of the same model, showing why this variable is detected at above 3σ.

Finally, in our separate analysis of blood types discussed in sections 5.4.1 and 5.5, we analyse the potential correlation of blood type with growth rate. We find marginal evidence for A+ blood type being relevant, subject to the caveats discussed in section (5.4.1).

How should our results be taken in the context of the many claims of climate, blood type, BCG, etc. being significant factors in COVID-19 spread? First, many studies were based on confirmed COVID-19 test cases which, as we discussed earlier, are affected by differences in testing capability and protocols between countries.

Second, many of the studies present regressions that do not allow for unmodelled confounding sources of variation in the growth. Hence, if a country shows high growth, the algorithm will try to force one of the potential factors under study to explain it. Instead, the hierarchical Bayesian framework allows the base growth rate of each country to be different, and hence potential factors will only be given credit for the difference in the growth rate if they provide a genuinely better fit.

Figure 3:

BCG population coverage estimate against our estimated base growth rates. We see that there is no clear trend with increasing BCG coverage. The growth rates with error bars are taken from our No Factors model fit.

Further, many studies do not model the intrinsic uncertainties associated with the data as we have done. We too find that the best fits are non-zero (as can be seen by looking at the peaks of the posteriors in Figure (4) or at the last rows of Table (6)). Hence our results are not in disagreement with regression results; the issue is the statistical significance of such claims.

Finally, although we do not detect environmental factors at more than 3σ, it is interesting to examine how big the environmental effects would be if our best-fit parameter values describe reality: a …°C increase (decrease) in temperature implies a … decrease (increase) in the base daily growth rate, while an increase (decrease) of 20% in relative humidity would mean a … decrease (increase) in the base daily growth rate. The decreased spread at higher temperature and humidity agrees with previous work [9].

This may not seem significant, but for a city such as Johannesburg, where both temperature and humidity drop significantly in winter, the combined effect could add more than … to the base daily growth rate. For a daily growth rate of …, which was approximately the value in May 2020, this would halve the doubling time of the disease, a significant impact.

Contrary to previous claims, our analysis of growth rates of deaths from countries worldwide is consistent with no effect from climate, pollution or BCG vaccination. The only significant correlation detected is with the positive rate of tests: a country that intrinsically had a high $R_0$ (due to high population density, etc.) would naturally tend to be more overwhelmed and hence run low on testing kits earlier, leading to an increased fraction of positive tests.
We did find some weak, suggestive evidence, at <3σ, that temperature and relative humidity correlate with death growth rates.

A separate analysis of blood type data shows that A+ is the most important blood type, though the result is marginal, both because the data quality is low and because the statistical significance is weak. Our combined statistical and machine learning analysis finds no evidence for PM2.5 pollution, other blood types or UV Index as drivers of COVID-19.

More data could be obtained by dropping the requirement that all countries in the sample have data for all the potential factors. This could allow some of the effects to be detected at higher statistical significance, but at the cost of making model comparison significantly more difficult.

The data and code for our Bayesian analysis are available at https://github.com/chrisfinlay/covid19/.

We thank Michael van Niekerk for extracting data from the BCG ATLAS, and Niayesh Afshordi, Ewan Cameron, Inger Fabris-Rotelli, Ben Holder and Nadeem Oozeer for discussions and comments. This research has been conducted using resources provided by the United Kingdom Science and Technology Facilities Council (UK STFC) through the Newton Fund and the South African Radio Astronomy Observatory.

Model                    ΔAIC    ΔBIC    ΔDIC
Positive Rate             0.0     0.0     0.0
Relative Humidity         5.5     5.5    31.8
No Factors               15.3    11.6    22.9
Temperature              13.7    13.7    24.0
BCG Vaccine              14.7    14.7    13.6
UV Index                 14.7    14.7    30.5
Tests / 1k               15.2    15.2    28.6
All Incl. Tests         114.5   132.9   113.2
All Excl. Tests         125.9   137.0   127.4

Table 1:

Model selection rankings relative to the best model (first row) for three metrics estimated from the MCMC chains: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Deviance Information Criterion (DIC). Here we consider each factor in Θ separately (univariate), as well as the model with no Θ factors, the model with all factors, and a model with all factors excluding the two testing factors (Positive Rate and Tests / 1k). The Positive Rate is unanimously selected as the most important feature, followed, under AIC and BIC, by the Relative Humidity. Temperature does not perform well here, but this is not surprising: the statistical significance of the temperature increased by …σ in the joint multivariate fit relative to the univariate fit alone. A similar increase in significance is visible for the Relative Humidity; see Fig. (4). Note that the model including the two testing parameters (All Incl. Tests) outperforms the model excluding the testing parameters, reinforcing the fact that testing is important. A+ blood type results are presented separately in section (5.4.1).
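To show how differences like those in Table 1 arise, here is a minimal sketch of the textbook AIC and BIC definitions, reported relative to the best model. The maximum log-likelihoods are hypothetical placeholders, and the parameter counts assume 4 hyperparameters plus one $D_c^0$ and one $G_c^0$ per region (40 regions) plus the Θ coefficients, which matches the 90-parameter total of the full model. This is not ChainConsumer's API, and its estimators (and its DIC) may differ in detail.

```python
import math
from typing import Dict, Tuple


def aic(max_log_like: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L_max."""
    return 2.0 * n_params - 2.0 * max_log_like


def bic(max_log_like: float, n_params: int, n_data: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L_max."""
    return n_params * math.log(n_data) - 2.0 * max_log_like


def relative_to_best(scores: Dict[str, float]) -> Dict[str, float]:
    """Shift scores so that the best (lowest) model sits at zero."""
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


N_DATA = 613               # data points remaining after the cuts
BASE_PARAMS = 4 + 40 + 40  # hyperparameters + one D_c^0 and one G_c^0 per region

# Hypothetical (maximum log-likelihood, number of Theta coefficients) per model.
MODELS: Dict[str, Tuple[float, int]] = {
    "No Factors": (-1500.0, 0),
    "Positive Rate": (-1490.0, 1),
    "All Incl. Tests": (-1488.0, 6),
}

if __name__ == "__main__":
    d_aic = relative_to_best(
        {name: aic(ll, BASE_PARAMS + k) for name, (ll, k) in MODELS.items()})
    d_bic = relative_to_best(
        {name: bic(ll, BASE_PARAMS + k, N_DATA) for name, (ll, k) in MODELS.items()})
    for name in MODELS:
        print(f"{name:16s} dAIC = {d_aic[name]:6.1f}  dBIC = {d_bic[name]:6.1f}")
```

With n = 613, BIC charges about ln 613 ≈ 6.4 per extra parameter versus 2 for AIC, which is one reason the two criteria can rank models differently.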

Figure 4:

Marginal distributions for the various factors we consider in the case where all factors are considered simultaneously (blue; multivariate) and individually (red; univariate). The results in both cases are consistent: only the coefficient associated with the positive rate of tests is non-zero at more than 3σ, with temperature non-zero at …σ in the multivariate case. Notice that the significance of both temperature and relative humidity increases in the multivariate fits relative to the univariate fits where they are fit alone. Both BCG and UV Index are consistent with no effect.

We chose to look at the growth rate in deaths, as deaths are less likely than confirmed cases to be affected by the widely varying testing protocols between countries. The cumulative death data for each country come from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) [44].

We cannot, however, use all data on deaths for all times, since a simple exponential model clearly fails quickly: as soon as interventions occur, the simple exponential model fails and we would obtain contaminated estimates of the growth rate due to the flattening of the curve, which would skew our analysis; see e.g. Fig. (1). We therefore need to know which data should be included in our analysis for each country. This reduces to knowing the time, $t_c$, at which the growth of deaths deviates from a simple exponential. We then only use data at $t < t_c$ for country c.

Doing this cut by hand introduces the possibility of human bias. Instead, we compute $t_c$ for each country by fitting a nonlinear sigmoid transition function to the growth rate of the death data of each country over time:

$$\sigma_c(t; t_c, \alpha_c) = \frac{1}{1 + \exp\left[-\alpha_c (t - t_c)\right]} \qquad (2)$$

This function allows the growth rate to transition smoothly from an initial value (which we are interested in) to a final value, reflecting the impact of social interventions or nonlinearity in the system.

In this first phase of the analysis each country, c, therefore has 5 parameters ($D_c^0$, $G_c$, $\delta G_c$, $\alpha_c$, $t_c$) to describe the trajectory of its deaths, where $\delta G_c$ and $\alpha_c$ represent the change in growth rate and the suddenness of the transition with time from $G_c$ to $G_c + \delta G_c$ for each country, c (the index c has been left out in Eq. (3) below):

$$D(t; D^0, G, \delta G, \alpha, t_c) = D^0 \prod_{\tau=1}^{t} \left(1 + \beta \left[ G + \delta G \, \sigma(\tau; t_c, \alpha) \right]\right) \qquad (3)$$

To maximise the probability that the nonlinear model in Eq. (3) is a good fit to the data, we only consider data up to the time when a country passes 1000 deaths. All five parameters are treated hierarchically, each with its own parent distribution, where the means are Normally distributed and the standard deviations are Half-Normal distributed.

We only use data from $t < t_c$ in our main analysis (see section 5.2), where $t_c$ is given by the marginalised mean from the chains. Typical values for $t_c$ were around 15 days. We excluded countries where $t_c <$ … because of concerns about the quality of the underlying model fit in such cases, i.e. a simple exponential model was not a good fit even at early times, which could lead to spurious growth rates.

The resulting fits to country death data for both the full five-parameter model (green) and just the region $t < t_c$ are given in Figure (9), showing that the technique performs well in isolating the initial exponential growth from a first-principles approach.

Once we have $t_c$ for each country, we further cut our data using the following rules:
• We only use countries for which there were a minimum of 20 deaths by 25 April 2020.
• We choose day zero as the first day a country passes 3 deaths. This leads to typical starting numbers of deaths of around 5.
• Data for our variables of interest (climate, BCG, etc.) were not available for all countries. To ensure the same amount of data for all models, only countries with data for all our parameters in Θ were included.

We did not model the potential correlations and interactions of our parameters with these data cuts. This could potentially alter our conclusions: if a parameter were extremely important in determining growth rates, then countries with very small numbers of deaths would systematically not make it past our data cuts, and hence the signal from those data would be lost. However, it is unclear how to model this censorship rigorously, and it is left to future work.

After these data cuts we were left with 40 regions in 37 countries, with 613 data points in total, shown in Figure (9). The only country with more than one province was Canada, which includes Alberta, Ontario, Quebec and British Columbia (BC).

For each region and country the climate data were gathered from the Dark Sky API [43], where the locations sampled are taken from the latitudes and longitudes given for each country or region in the JHU COVID-19 data.
For each country, c, the mean temperature, mean relative humidity and mean UV Index are calculated as an average over an N-day window starting 28 days prior to day 0 for each region, where N is the number of days of death data for that country. The 28-day offset is chosen as an estimate of the average time from infection to death. Since mean climate variables change relatively slowly, the exact delay is not important. See Fig. (9) for the distribution of $t_c$ times. We used UV Index as a proxy for natural vitamin D production. Data for all regions and countries are shown in Table (5).

Bacillus Calmette–Guérin (BCG) vaccination policies have been in place in many countries across the world, with widely differing start dates. We would like to estimate the percentage of the population in each country that has received a BCG vaccine in their lifetime. For this purpose we need the age demographics of each country, as well as the dates when BCG vaccination became, or stopped being, mandatory.

To estimate the percentage of the population vaccinated with BCG we draw on three sources of data: the BCG Atlas, World Health Organisation (WHO) BCG vaccination rates amongst 1-year-olds, and the age demographics for each country from the United Nations (UN). Firstly, we looked to the BCG Atlas [45]. This is a heterogeneous dataset, so not all information was available for each country. We collected the following fields from the BCG Atlas:
1. Current BCG vaccination?
2. Which year was vaccination introduced?
3. Year BCG stopped?
4. Year of changes to BCG schedule
5. Details of changes
6. BCG coverage (%)

We did not use the last field directly due to missing information in this field, specifically whether it applied to a small age range of the population or to its entirety. For some countries, data were missing from fields 1-3. In these cases, where appropriate and possible, the missing data were obtained from fields 4 and 5.

Age demographics for each country were used to compute the expected fraction of the population who have had the BCG vaccination. In 15% of cases the BCG Atlas did not have data and we instead used WHO data [6] to perform the estimate. The derived BCG fractions and the origin of the data are shown in Table (2), while a plot of the BCG fractions versus growth rates is shown in Fig. (3).

There is one caveat here: our BCG coverage is an estimate of the percentage of the population of each region and country that has had the BCG vaccination. This is arguably not the optimal quantity to use in our analyses, however; it might be better to use the fraction of the vaccinated population weighted by the probability of infection as a function of age. However, the latter is currently unknown and hard to compute even in the best of situations: how can we know how many people were exposed but never got infected? The large scatter in Figure (3) suggests that this is unlikely to make a significant difference to our conclusion that the BCG vaccine is not important in the spread of COVID-19.
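A minimal sketch of how an age distribution and the BCG Atlas policy years can be combined into a population coverage fraction follows. It rests on a simplifying assumption not spelled out in the text: every birth cohort born between the introduction year (inclusive) and the stop year (exclusive) is vaccinated at birth with a fixed cohort coverage (for example a WHO 1-year-old rate). The function name, the reference year of 2020 and the toy age pyramid are hypothetical.

```python
from typing import Dict, Optional


def bcg_population_fraction(
    population_by_age: Dict[int, float],
    year_introduced: int,
    year_stopped: Optional[int] = None,
    cohort_coverage: float = 1.0,
    reference_year: int = 2020,
) -> float:
    """Fraction of a population vaccinated, assuming vaccination at birth.

    A birth cohort counts as covered if it was born in
    [year_introduced, year_stopped); each covered cohort is vaccinated at
    the fraction `cohort_coverage`.
    """
    total = sum(population_by_age.values())
    if total <= 0:
        raise ValueError("population must be positive")
    vaccinated = 0.0
    for age, count in population_by_age.items():
        birth_year = reference_year - age
        covered = birth_year >= year_introduced and (
            year_stopped is None or birth_year < year_stopped
        )
        if covered:
            vaccinated += cohort_coverage * count
    return vaccinated / total


if __name__ == "__main__":
    # Hypothetical flat age pyramid: equal numbers at every age 0-89.
    toy_population = {age: 1.0 for age in range(90)}
    print(bcg_population_fraction(toy_population, year_introduced=1950,
                                  year_stopped=1990, cohort_coverage=0.9))
```

Real programmes often vaccinate at school age or change schedules (fields 4 and 5 above), which would shift the covered birth years; the sketch ignores this.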

Data on the prevalence of blood types for each country were taken from [32, 33], while PM2.5 pollution data came from [34]. We discuss the blood type data and results in section (5.4.1).

As described earlier, our basic regression model for the deaths in country c at day t is:

$$D_c(t) = D_c^0 \prod_{\tau=1}^{t} \left(1 + \beta \, G_c\right) \qquad (4)$$

which is assumed valid for $t \le t_c$, as described before. $G_c$ is measured in percent (hence $\beta = 10^{-2}$, as in Eq. (1)), and we model its potential dependence on our variables of interest as:

$$G_c(\Theta) = G_c^0 + \Theta \cdot X_c \qquad (5)$$

where $G_c^0$ and $X_c$ are the country-specific base growth rate and data for climate, BCG, etc., shown in Table (5), and Θ are the global parameters we are interested in. In general the $X_c$ could be time-dependent. In this analysis, because of missing data and the fact that our data for each region typically span a short period ($t_c$ is less than 3 weeks, as discussed in section 5.1.1), we use average values for $X_c$.

Country              BCG coverage   Data source
Argentina            58.5 %         WHO
Australia / NSW      38.9 %         BCG ATLAS
Austria              53.1 %         BCG ATLAS
Canada / Alberta     0.0 %          BCG ATLAS
Canada / BC          0.0 %          BCG ATLAS
Canada / Ontario     0.0 %          BCG ATLAS
Canada / Quebec      0.0 %          BCG ATLAS
Chile                93.1 %         BCG ATLAS
Colombia             86.4 %         BCG ATLAS
Cuba                 46.9 %         WHO
Czechia              73.5 %         BCG ATLAS
Denmark              50.7 %         BCG ATLAS
Estonia              27.2 %         WHO
Finland              80.4 %         BCG ATLAS
France               71.4 %         BCG ATLAS
Germany              49.7 %         BCG ATLAS
Greece               13.9 %         WHO
Hungary              82.3 %         BCG ATLAS
India                96.9 %         BCG ATLAS
Israel               29.0 %         BCG ATLAS
Italy                0.0 %          BCG ATLAS
Japan                70.0 %         BCG ATLAS
Korea, South         51.3 %         BCG ATLAS
Lithuania            28.2 %         WHO
Luxembourg           0.0 %          BCG ATLAS
Mexico               98.9 %         BCG ATLAS
Netherlands          0.0 %          BCG ATLAS
Norway               79.8 %         BCG ATLAS
Pakistan             77.8 %         BCG ATLAS
Peru                 96.5 %         BCG ATLAS
Philippines          61.8 %         WHO
Poland               80.8 %         BCG ATLAS
Slovenia             75.2 %         BCG ATLAS
South Africa         79.2 %         BCG ATLAS
Sweden               40.9 %         BCG ATLAS
Switzerland          31.9 %         BCG ATLAS
Thailand             53.4 %         BCG ATLAS
Turkey               92.8 %         BCG ATLAS
US                   0.0 %          BCG ATLAS
United Kingdom       67.2 %         BCG ATLAS

Table 2:

BCG vaccine coverage for the populations of all our regions/countries, estimating the fraction of the population who have received the vaccine.
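The constant-growth forward model of Eqs. (4) and (5), and the five-parameter sigmoid extension of Eq. (3), can each be written in a few lines. The following is an illustrative plain-Python sketch (not the NumPyro model used for the fits) under the percent convention $\beta = 10^{-2}$ of Eq. (1); the function names and the demo inputs are hypothetical.

```python
import math
from typing import List, Sequence

BETA = 1e-2  # growth rates are quoted in percent per day


def sigmoid(t: float, t_c: float, alpha: float) -> float:
    """Eq. (2): smooth switch from 0 (t << t_c) to 1 (t >> t_c), overflow-safe."""
    z = alpha * (t - t_c)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def growth_rate(g0: float, theta: Sequence[float], x_c: Sequence[float]) -> float:
    """Eq. (5): G_c(Theta) = G_c^0 + Theta . X_c, linear in the country data."""
    if len(theta) != len(x_c):
        raise ValueError("theta and x_c must have the same length")
    return g0 + sum(t * x for t, x in zip(theta, x_c))


def expected_deaths(d0: float, g: float, n_days: int, delta_g: float = 0.0,
                    t_c: float = 0.0, alpha: float = 1.0) -> List[float]:
    """Cumulative deaths on days 0..n_days from the product model.

    With delta_g = 0 this is Eq. (4) (equivalently Eq. (1) when g comes from
    growth_rate); otherwise the daily growth moves from g to g + delta_g as in Eq. (3).
    """
    deaths = [d0]
    for tau in range(1, n_days + 1):
        daily = g + delta_g * sigmoid(tau, t_c, alpha)
        deaths.append(deaths[-1] * (1.0 + BETA * daily))
    return deaths


if __name__ == "__main__":
    # Hypothetical country: 5 initial deaths, 20%/day base growth, two factors.
    g = growth_rate(20.0, theta=[-0.3, 0.5], x_c=[10.0, 4.0])
    print(expected_deaths(5.0, g, 7))
    # Growth that slows from 20%/day to 5%/day around day 15.
    print(expected_deaths(5.0, 20.0, 30, delta_g=-15.0, t_c=15.0, alpha=0.5))
```

These deterministic trajectories serve as the means of the count likelihood; the hierarchical priors and sampling are handled separately.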

Figure 5:

Graphical model for our hierarchical Bayesian analysis, showing the parameters, data and their connections. Rectangles with rounded corners represent parameters in the model, with their priors shown as normal or half-normal distributions with their respective parameters. Diamonds are fixed inputs/data. Squares represent repetition plates with repeated variables, with the index given in the top-left corner of the plate. $\tilde{D}_c(t)$ are the observed deaths for country c on day t.

Our goal is to determine whether any of the parameters Θ are rigorously required to be non-zero by the data. One limitation of Eq. (5) is that it is linear in the country data $X_c$. We justify this by noting that the growth rates and the underlying data vary over narrow ranges, so that retaining only the linear terms in the Taylor series expansion of $G_c(\Theta)$ is a reasonable step. For the temperature variable, we have verified that assuming instead a step change in the growth at around …°C, with two hierarchical growth parameters, one on each side of the step, did not lead to any increase in the significance of the detection of temperature as an effect.
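One way to turn marginalised chains into the "number of σ from zero" used throughout is sketched below. The estimator is an assumption, since the text does not state which one was used: `gaussian_sigma` uses a Gaussian approximation (|posterior mean| divided by posterior standard deviation), while `tail_sigma` converts the fraction of samples lying on the opposite side of zero into an equivalent one-sided Gaussian significance, which is more robust for skewed marginals. Both names are hypothetical, and the demo chain is synthetic.

```python
import random
import statistics
from statistics import NormalDist
from typing import Sequence


def gaussian_sigma(samples: Sequence[float]) -> float:
    """|posterior mean| / posterior standard deviation (Gaussian approximation)."""
    return abs(statistics.fmean(samples)) / statistics.stdev(samples)


def tail_sigma(samples: Sequence[float]) -> float:
    """One-sided Gaussian-equivalent significance from the tail fraction.

    Counts samples on the opposite side of zero from the mean. If none do,
    the count is floored at half a sample, so the result stays finite and
    is limited by the chain length.
    """
    positive_mean = statistics.fmean(samples) > 0
    opposite = sum(1 for s in samples if (s > 0) != positive_mean)
    p = max(opposite, 0.5) / len(samples)
    return NormalDist().inv_cdf(1.0 - p)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for a marginalised chain centred 3 std devs from zero.
    chain = [rng.gauss(3.0, 1.0) for _ in range(10_000)]
    print(f"Gaussian: {gaussian_sigma(chain):.2f} sigma, "
          f"tail: {tail_sigma(chain):.2f} sigma")
```

For a Gaussian marginal the two estimates agree; they diverge when the posterior is skewed or when too few samples reach the tail.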

We write a hierarchical Bayesian probabilistic model by assuming a Poisson likelihood, suitable for count data, with mean given by the deterministic forward model defined in Eqs. (4) and (5), together with priors and hyperpriors for all 90 of our parameters, as shown in the schematic graphical representation in Fig. (5); see e.g. [39]. We do not try to model missing deaths due to testing irregularities. As long as the fraction of missing deaths remains approximately constant during the early phase of the spread within each country, which is the phase we consider, this should have little impact on our results.

Since our data cover 40 regions in 37 countries worldwide, and we know that the growth rate of the disease will depend both on properties of the disease (which are universal) and on properties specific to each country (e.g. culture and population density), it is natural to model the data with a hierarchical Bayesian structure. This allows growth rates to vary somewhat from country to country while remaining somewhat similar between countries.

The priors and hyperpriors for each variable to be estimated are chosen to be:
• µ_D ∼ N(7, …); the hyperprior for the mean of the prior on initial deaths, D_c^0, for each country (D_c^0 > … as part of our data cuts).
• σ_D ∼ HalfNormal(3); the standard deviation of the prior on initial deaths for each country.
• µ_G ∼ N(20, …); the mean (in percent) of the prior on the growth rate of each country, G_c^0.
• σ_G ∼ HalfNormal(20); the standard deviation of the prior on the growth rate of each country.
• D_c^0 ∼ N(µ_D, σ_D); the prior on the initial deaths parameter for each country, c. All countries together will be denoted by D.
• G_c^0 ∼ N(µ_G, σ_G); the prior on the growth rate parameter for each country, c.
  All countries together will be denoted by G.
• Θ ∼ N(0, I); the prior on the vector of parameters of key interest.

The complete vector of all parameters is therefore:

$$\left(\mu_D,\ \sigma_D,\ \mu_G,\ \sigma_G,\ \mathbf{D},\ \mathbf{G},\ \Theta\right)^{T} \tag{6}$$

The joint prior over our full set of parameters is assumed factorizable, i.e. a product of the prior distributions listed above.

We assume our data to be statistically independent both between countries and from day to day. Our model simultaneously fits the death data from all countries. We use NumPyro [42] for Markov Chain Monte Carlo (MCMC) probabilistic sampling from our posterior using the No-U-Turn Sampler (NUTS) algorithm [41]. NumPyro has built-in diagnostic tools and runs easily on accelerators such as GPUs, which was key to iterating quickly through our many models, including all the multivariate and univariate combinations, and to studying the effect of priors.

The MCMC simulations were generally run as 4 independent chains, each starting from a random position sampled from the prior for our parameters, and run for 2000 steps so that length scales (and other sampling options) could be determined automatically by NumPyro. After this initial "burn-in", each chain was typically run for 5000 samples. Chains were long enough to ensure that the Gelman-Rubin convergence statistic was always less than 1.01.

Each chain was then thinned by a factor of two in order to further improve the independence of samples. This leads to the final 10k samples collected for each model and led to good, converged trace plots and posteriors; see Fig. (4). We used ChainConsumer for the analysis of the chains, the model selection metrics in Table (1) and some of the plots [51].

The data for blood type prevalence come from online, heterogeneous collections of published and unpublished data sources covering a wide range of publication dates.
As a result, data integrity is an issue, and we have chosen to separate the blood type analysis from our main Bayesian analysis.

A random forest analysis finds only A+ blood type to be potentially relevant amongst the different ABO blood types (see section 5.5). Here we present the results of both univariate and multivariate A+ blood type Bayesian analyses using the formalism described in section (5.2).

Fig. (6) shows the result of the Bayesian analysis for the A+ blood type coefficient: it is consistent with zero in both the univariate and multivariate cases. If we look at the AIC, BIC and DIC for A+ blood type we find values of (10.…, …, …), compared to the no-factor model results of (15.…, …, …). The A+ AIC and BIC are better than those of the no-factor model, but the DIC value is worse. In addition, the random forest analysis found A+ blood type to be more important than both BCG and Tests/1k.

As a result we conclude that there is somewhat conflicting evidence regarding the potential effect of A+ blood type, but that none of the evidence is strong.

Figure 6:

The univariate and multivariate results for the coefficient of A+ blood type are consistent with zero, i.e. no effect.

https://samreay.github.io/ChainConsumer/

Θ on Base Parameters

In the results section we focused on the best-fit values of Θ. The reverse question is interesting too: what is the impact of including the Θ parameters on the hyperprior parameters (the mean µ_G and standard deviation σ_G) of the base growth rates G_c^0 for the 40 regions? If the Θ do explain some of the variation, we would expect the standard deviation of the base growth rates (the parent distribution) to get smaller, which is captured by the hyperparameter σ_G. The mean of the base growth rates (parent distribution), given by µ_G, is nothing more than an offset dictating the value when all factors are zero, i.e. Θ = 0.

This is what we see: in the No-Factor model σ_G = 7.…, which changes to … when we allow all the Θ to vary, a significant shrinkage. On the other hand, if we include all parameters except for the two testing parameters, σ_G returns to …, while σ_G = 6.… when we add only the testing parameters (Positive Rate and Tests/1k). This shows that it is primarily the positive rate parameter that drives the shrinkage in the width of the parent distribution of the base growth rates.

To assess the stability of our results we fit each of our key parameters in Θ both alone (univariate) and simultaneously with all the other parameters (multivariate). As shown in Table (3) and Fig. (4), the results are quite stable. Positive rate and temperature are still the most significant parameters in both cases: the positive rate is non-zero at more than 3σ in both cases, while the significance for temperature increases from …σ to …σ when going from the univariate to the multivariate fit. Relative humidity also increases in significance in the multivariate case.

To assess correlations between parameters we show in Fig. (10) the one- and two-dimensional marginalised posterior plots.
Correlations are typically small, though there is some small positive correlation between Tests per 1k and Positive Rate, and a small negative correlation between Temperature and UV Index. Our full results for all the hyperpriors, the base country growth rates G_c^0 and the Θ parameters for each of the different univariate and multivariate models are shown in Table (6).

Since our goal in this analysis is to assess whether there is evidence for external factors such as climate, blood type etc. in addition to known country-specific factors, an important internal check is to look for potential sensitivity to our priors and hyperpriors.

In our hierarchical analysis, the data for each country try to pull the measured growth rates of countries towards their own best-fit values, while the hierarchical nature


Parameter Multivariate UnivariateBCG Vaccine 0.01 ± ± -0.29 ± -0.19 ± ± ± ± ± ± ± UV Index -0.01 ± ± Table 3:

Comparison of univariate and multivariate fits to the data. Parameters whose means are more than 2σ away from zero (temperature and positive rate) are shown in bold. Only Positive Rate is nonzero at more than 3σ.

of our model described in section (5.3) attempts to pull them all together. In this tug of war, the algorithm searches for joint values of Θ that will improve the fits to all the data. One concern might be that if we make the hyperpriors on the parent distributions much stronger or much weaker, we give the base growth rates G_c^0 less or more freedom, which in turn may affect the best-fit parameters Θ and our main conclusions.

To test this we tightened the following hyperpriors and priors listed in section (5.3) by an order of magnitude, to:
• µ_G ∼ N(20, …); the mean (in percent) of the prior on the growth rate of each country, G_c^0.
• σ_G ∼ HalfNormal(2); the standard deviation of the prior on the growth rate of each country.
• Θ ∼ N(0, 0.1); the prior on the vector of parameters of key interest.

The results of these changes are shown in Fig. (8). As expected, tightening the prior on Θ pulled most parameters slightly closer to zero, but it also tightened the posterior, so that none of our conclusions were altered: the statistical significance of all variables was unaltered.

To provide a largely independent test of our Bayesian results we also undertook a machine learning analysis of the deaths data using random forest regression [52]. Random forests are a powerful ensemble method that naturally provides feature selection capability, and they are hundreds of times faster to run than our computationally intensive hierarchical Bayesian framework. While random forests do not provide estimates of the statistical significance of factors, they do allow us to rank additional factors in terms of importance and hence to explore the potential of additional explanatory features for inclusion in the main set of parameters, Θ, for the Bayesian analysis.
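The leave-one-feature-out ranking used in the random forest analysis (remove a feature, refit, and record how much the test RMSE grows, averaged over random 70-30 training-test splits) can be sketched without any dependencies. This is an illustration under stated assumptions, not the authors' pipeline: they used scikit-learn random forests [52], whereas this sketch swaps in a simple k-nearest-neighbour regressor so that it runs with the Python standard library alone. All function names are hypothetical and the demo data are synthetic.

```python
import math
import random
from typing import List, Sequence


def knn_predict(train_x: List[List[float]], train_y: List[float],
                x: Sequence[float], k: int = 3) -> float:
    """Stand-in regressor (not a random forest): mean target of the k nearest rows."""
    nearest = sorted(zip((math.dist(row, x) for row in train_x), train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(train_x, train_y, test_x, test_y, k: int = 3) -> float:
    """Root-mean-square error of the k-NN predictions on the test rows."""
    sq = [(knn_predict(train_x, train_y, x, k) - y) ** 2
          for x, y in zip(test_x, test_y)]
    return math.sqrt(sum(sq) / len(sq))


def drop_column_importance(X, y, n_splits=500, test_frac=0.3, seed=0):
    """Mean percentage increase in test RMSE when each feature is left out."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    totals = [0.0] * p

    def cols(rows, keep):
        return [[X[i][j] for j in keep] for i in rows]

    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random training-test split
        n_test = max(1, round(test_frac * n))
        test, train = idx[:n_test], idx[n_test:]
        y_tr = [y[i] for i in train]
        y_te = [y[i] for i in test]
        all_cols = list(range(p))
        # Guard against a perfect fit so the percentage change stays finite.
        base = max(rmse(cols(train, all_cols), y_tr, cols(test, all_cols), y_te), 1e-12)
        for j in range(p):
            keep = [c for c in all_cols if c != j]
            dropped = rmse(cols(train, keep), y_tr, cols(test, keep), y_te)
            totals[j] += 100.0 * (dropped - base) / base
    return [t / n_splits for t in totals]


# Synthetic demo: the target depends on feature 0 only; feature 1 is noise.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [10.0 * a + rng.gauss(0.0, 0.1) for a, _ in X]
print(drop_column_importance(X, y, n_splits=50))
```

In the setting of this paper the rows of X would be the X_c values of Table 5 and y the No-Factor base growth rates. Because k-NN distances depend on feature scale while random-forest splits do not, the columns would need to be standardized first for this stand-in to be meaningful.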
To undertake the random forest regression we used the results from our No-Factor Bayesian run (i.e. Θ = 0) to obtain the base growth rates G_c^0 for each country, and used these as the target for the random forest regression with the X_c data as features. In the random forest analysis we augmented the data with both PM2.5 pollution and extra blood type data (namely A−, AB+, O−, AB−, B−).

Feature selection was done using the standard random forest impurity method [52] and by ranking variables by the impact they had on the average root-mean-square error (RMSE) of the regression: leaving out important explanatory variables is expected to significantly degrade the performance of the algorithm, while leaving out irrelevant features does not. All feature importance scores were averaged over 500 random 70-30 training-test splits of the data. We found that the only variables that made more than a one percent difference to the RMSE value were the Positive Rate (3.98%) and A+ (2.28%). This led us to select A+ blood type as a variable in our separate full Bayesian analysis shown in section (5.4.1).

The results from the impurity-based feature selection are shown in Table (4). Again Positive Rate and A+ were the most significant features, followed by Tests per 1k and BCG vaccine coverage, with PM2.5 and the other blood types at significantly lower importance. As a result, blood types other than A+ were not included in the Bayesian analysis, since additional parameters significantly increase the computational complexity of the analysis, both because of the increased time to convergence and because of the increase in the number of models to run (since we fit both multivariate and univariate models in all cases). The distribution of a selection of feature importances is shown in Fig. (7).

In summary, the machine learning analysis confirms, to the extent that they overlap, the results of our much more intensive Bayesian analysis: Positive Rate is the most important feature, followed by A+ (subject to the caveats discussed in section (5.4.1)), while BCG is not relevant.

References

[1] S. Flaxman, S. Mishra, A. Gandy, et al.


Feature    Mean   Std. Dev
Pos-Rate   0.36   0.14
A+         0.20   0.11
Tests/1k   0.08   0.04
BCG        0.08   0.05
B+         0.08   0.06
PM2.5      …      …
Table 4:

Mean and standard deviation of impurity-based random forest feature importances from 500 runs with 100 estimators and a 30-70 test-training split. The sample density of feature importances is shown in Fig. (7). Positive Fraction is selected as the most important feature, as in the Bayesian analysis. Climatic variables (temperature, humidity and UV Index) were not included in the analysis.

Figure 7:

Histograms of feature importances from 500 random forest runs showing, from most important to least: Positive-Fraction (beige), A+ blood type (teal), BCG (salmon), Tests per 1k (red) and PM2.5 (dark brown).

[9] H. V. Fineberg et al.
[10] … et al., https://arxiv.org/abs/2003.05003; see also: http://covid19-report.com/
[11] … et al., medRxiv 2020.04.03.20052787; doi:10.1101/2020.04.03.20052787
[12] P. Shi et al., medRxiv 2020.03.22.20038919; doi:10.1101/2020.03.22.20038919
[13] M. Sajadi et al., https://ssrn.com/abstract=3550308; doi:10.2139/ssrn.3550308
[14] J. Xi and Y. Zhu, Science of The Total Environment, 724, 138201, 2020; doi:10.1016/j.scitotenv.2020.138201
[15] J. Ma et al., Science of The Total Environment, 724, 138226, 2020; doi:10.1016/j.scitotenv.2020.138226
[16] A. Anis, https://ssrn.com/abstract=3567639; doi:10.2139/ssrn.3567639
[17] S. Pawar et al., medRxiv 2020.03.29.20044461; doi:10.1101/2020.03.29.20044461
[18] D. Gupta, doi:10.2139/ssrn.3558470
[19] A. Notari, medRxiv 2020.03.26.20044529; doi:10.1101/2020.03.26.20044529
[20] J. Zhao et al., medRxiv 2020.03.11.20031096; doi:10.1101/2020.03.11.20031096
[21] M. Zietz, N. P. Tatonetti, medRxiv 2020.04.08.20058073; doi:10.1101/2020.04.08.20058073
[22] A. Miller et al., medRxiv 2020.03.24.20042937; doi:10.1101/2020.03.24.20042937
[23] L. E. Escobar, A. Molina-Cruz, C. Barillas-Mury, medRxiv 2020.05.05.20091975; doi:10.1101/2020.05.05.20091975
[24] S. Singh, medRxiv 2020.04.11.20062232; doi:10.1101/2020.04.11.20062232
[25] M. Asahara, medRxiv 2020.04.17.20068601; doi:10.1101/2020.04.17.20068601
[26] E. Conticini, B. Frediani, D. Caro, Environmental Pollution, 114465, 2020; doi:10.1016/j.envpol.2020.114465
[27] X. Wu et al., medRxiv 2020.04.05.20054502; doi:10.1101/2020.04.05.20054502
[28] M. Travaglio et al., medRxiv 2020.04.16.20067405; doi:10.1101/2020.04.16.20067405
[29] D. Liang, L. Shi, J. Zhao et al., preprint, medRxiv 2020.05.04.20090746; doi:10.1101/2020.05.04.20090746
[30] V. Bianconi et al., Archives of Medical Science, 2020; doi:10.5114/aoms.2020.95336
[31] A. Gómez-Carballa et al.
…
[46] … et al., PLoS Medicine, 8, 3, 2011.
[47] C. Modi et al., medRxiv 2020.04.15.20067074; doi:10.1101/2020.04.15.20067074
[48] S. Vandoros, Social Science & Medicine; doi:10.1016/j.socscimed.2020.113101
[49] P. Wikramaratna, R. S. Paton, M. Ghafari, J. Lourenco, medRxiv 2020.04.05.20053355; doi:10.1101/2020.04.05.20053355
[50] I. Arevalo-Rodriguez et al., medRxiv 2020.04.16.20066787; doi:10.1101/2020.04.16.20066787
[51] S. Hinton, The Journal of Open Source Software, 1, 00045, 2016; doi:10.21105/joss.00045
[52] F. Pedregosa et al., Journal of Machine Learning Research, 12, 2825, 2011.

Figure 8:

Marginal distributions for the various factors in the case of the standard priors/hyperpriors and in the case where all the priors are tightened by a factor of ten. We see that the significance of the best fit changes by less than …σ, although the means are typically shifted towards zero, other than for BCG and Positive Rate. Our main conclusions are unchanged and stable to changing the priors dramatically.
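The chain post-processing described earlier (a Gelman-Rubin check, then thinning each chain by a factor of two) and the "nσ from zero" language used for Table 3 and Fig. (8) can be made concrete with a short sketch. It assumes the classic, non-split form of the Gelman-Rubin statistic and takes the significance of a parameter to be |posterior mean| / posterior standard deviation; NumPyro and ChainConsumer provide their own diagnostics, which may differ in detail (for example, split-chain R-hat). The function names are hypothetical and the chains are synthetic.

```python
import math
import random
import statistics
from typing import List, Sequence


def gelman_rubin(chains: List[Sequence[float]]) -> float:
    """Classic (non-split) Gelman-Rubin R-hat for m equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(means)
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


def thin(chain: Sequence[float], factor: int = 2) -> List[float]:
    """Keep every `factor`-th sample to reduce autocorrelation."""
    return list(chain[::factor])


def significance_in_sigma(samples: Sequence[float]) -> float:
    """Distance of the posterior mean from zero, in posterior standard deviations."""
    return abs(statistics.fmean(samples)) / statistics.stdev(samples)


# Illustrative run with synthetic "chains" (not the paper's posterior):
# 4 chains x 5000 post-burn-in draws, thinned by 2 -> 4 x 2500 = 10,000 samples.
rng = random.Random(0)
chains = [[rng.gauss(0.8, 0.2) for _ in range(5000)] for _ in range(4)]
r_hat = gelman_rubin(chains)
pooled = [s for c in chains for s in thin(c)]
print(f"R-hat = {r_hat:.4f}, samples = {len(pooled)}, "
      f"significance = {significance_in_sigma(pooled):.1f} sigma")
```

Under the mean/standard-deviation convention assumed here, applying `significance_in_sigma` to the pooled samples of each Θ component is one way to read off statements such as "Positive Rate is non-zero at more than 3σ".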

Country/Region    BCG (%)  T (°C)   RH (%)  Tests/1k  Pos-Rate (%)  A+ (%)  UV
Argentina         58.5     21.6     51.0    0.8       12.7          34.3    1.3
Australia / NSW   38.9     19.7     74.2    17.3      3.4           31.0    1.4
Austria           53.1     1.3      60.3    21.5      15.0          33.0    1.0
Canada / Alberta  0.0      -3.8     74.5    15.1      7.2           36.0    0.7
Canada / BC       0.0      7.9      70.9    15.1      4.4           36.0    0.8
Canada / Ontario  0.0      -2.9     72.5    15.1      10.5          36.0    0.8
Canada / Quebec   0.0      -10.8    68.8    14.8      27.3          36.0    0.7
Chile             93.1     15.8     62.8    6.6       13.7          8.7     1.5
Colombia          86.4     14.2     90.5    1.2       12.9          26.1    2.4
Cuba              46.9     22.3     79.3    2.6       7.8           32.8    2.5
Czechia           73.5     3.0      56.7    16.8      6.7           36.0    1.0
Denmark           50.7     4.8      61.6    17.3      16.1          37.0    0.7
Estonia           27.2     1.9      65.1    32.5      5.1           30.8    0.6
Finland           80.4     0.2      68.1    11.0      10.4          38.0    0.4
France            71.4     8.6      71.6    1.5       53.3          37.0    0.7
Germany           49.7     0.8      48.0    13.8      12.0          37.0    0.8
Greece            13.9     7.6      80.1    4.8       8.8           32.9    1.1
Hungary           82.3     6.2      57.4    5.2       8.9           33.0    0.9
India             96.9     30.1     29.3    0.3       14.5          20.8    2.4
Israel            29.0     15.3     58.3    28.4      10.4          34.0    1.7
Italy             0.0      4.5      46.5    1.2       40.0          36.0    1.2
Japan             70.0     4.4      64.8    1.4       16.8          39.8    1.2
Lithuania         28.2     2.7      54.9    24.8      4.5           33.0    0.7
Luxembourg        0.0      6.7      55.5    57.9      12.5          37.0    1.0
Mexico            98.9     20.9     29.7    0.3       57.0          29.9    2.8
Netherlands       0.0      4.5      44.9    4.0       29.9          35.0    0.8
Norway            79.8     -1.8     63.0    26.7      7.3           42.5    0.6
Pakistan          77.8     16.5     63.1    0.5       22.2          20.6    1.6
Peru              96.5     24.5     86.5    4.7       35.5          18.4    1.9
Philippines       61.8     27.5     82.5    0.5       14.7          28.9    2.4
Poland            80.8     4.3      51.1    5.6       11.6          31.3    0.8
Slovenia          75.2     6.3      47.1    20.6      4.4           33.0    1.1
South Africa      79.2     21.2     43.3    2.2       4.5           32.0    1.9
South Korea       51.3     8.2      50.7    11.1      3.3           32.8    1.3
Sweden            40.9     -0.1     62.4    7.7       25.8          37.0    0.3
Switzerland       31.9     -1.0     80.3    20.6      18.1          37.0    0.8
Thailand          53.4     31.1     60.8    0.6       10.0          16.9    2.4
Turkey            92.8     5.9      57.9    3.3       40.3          37.8    1.2
US                0.0      11.8     85.0    0.9       61.0          35.7    1.2
United Kingdom    67.2     3.5      71.3    1.6       19.6          35.0    0.5

Table 5:

Factor data, X_c, for all regions and countries in our study. The columns are BCG fraction, temperature (T), relative humidity (RH), tests per 1000 population (Tests/1k), positive test rate (Pos-Rate), A+ blood type prevalence and UV Index (UV). We analyse blood type separately, as discussed in section (5.4.1).

Figure 9:

Example fits to all 40 regions from 37 countries in our data sample. Posterior samples from the simple exponential model are shown in red, plotted up to the cutoff date, showing the data we use in our analysis. The full model with time-varying growth rate is shown in green, while data points are shown in blue. Only the USA had a larger exponent after t_c than before.

Figure 10:

1-, 2- and 3-σ contours from the marginalised posterior samples of the parameters Θ, showing that most parameters are weakly correlated, aside from one positive (Pos-Rate & Tests/1k) and one negative (Temp & UV Index) correlation. Zero values for the parameters are shown by dashed lines to help assess statistical significance.

Parameter NoFactors(68% CI) BCGVaccine(68% CI) Temperature(68% CI) RelativeHumid-ity(68% CI) Tests /1k(68% CI) PositiveRate(68% CI) A+BloodType(68% CI) UV Index(68% CI) Excl.Tests(68% CI) Incl.Tests(68% CI) µ D . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . ± . σ D . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . µ G . +1 . − . . +2 . − . . +1 . − . . +6 . − . . +1 . − . . +1 . − . . +6 . − . . +1 . − . . +7 . − . . +5 . − . σ G . +1 . − . . +1 . − . . +0 . − . . +1 . − . . +1 . − . . +1 . − . . +0 . − . . +1 . − . . +0 . − . . +1 . − . G Argentina . +2 . − . . +3 . − . . +4 . − . . +6 . − . . +2 . − . . +2 . − . . +6 . − . . +2 . − . . +6 . − . . +5 . − . G Australia / NSW . +2 . − . . ± . . +3 . − . . +7 . − . . +2 . − . . ± . . +7 . − . . +2 . − . . +7 . − . . +6 . − . G Austria . ± . . +2 . − . . +2 . − . . +6 . − . . +3 . − . . +3 . − . . +5 . − . . +2 . − . . +6 . − . . +6 . − . G Canada / Alberta . +1 . − . . ± . . ± . . +7 . − . . +1 . − . . +1 . − . . +7 . − . . +1 . − . . +6 . − . . +6 . − . G Canada / BC . +2 . − . . +2 . − . . +2 . − . . +7 . − . . +3 . − . . +2 . − . . +6 . − . . +2 . − . . +9 . − . . +6 . − . G Canada / Ontario . +2 . − . . +1 . − . . +1 . − . . +6 . − . . +2 . − . . +1 . − . . +7 . − . . +1 . − . . +6 . − . . +8 . − . G Canada / Quebec . ± . . +1 . − . . +2 . − . . +7 . − . . +1 . − . . +2 . − . . +7 . − . . +1 . − . . +5 . − . . +5 . − . G Chile . +2 . − . . +4 . − . . +3 . − . . +7 . − . . +1 . − . . ± . . +3 . − . . +3 . − . . +6 . − . . +7 . − . G Colombia . +1 . − . . +3 . − . . +1 . − . . +8 . − . . +1 . − . . +2 . − . . +5 . − . . +2 . − . . +6 . − . . +6 . − . G Cuba . +1 . − . . +2 . − . . ± . . +6 . − . . +1 . − . . +2 . − . . +6 . − . . +2 . − . . +5 . − . . +7 . − . G Czechia . +3 . − . . +4 . − . . +2 . − . . +5 . − . . +3 . − . . +3 . − . . +6 . − . . +2 . − . . +7 . − . . +6 . − . G Denmark . +2 . − . . 
+3 . − . . +2 . − . . +5 . − . . +2 . − . . +2 . − . . +7 . − . . +2 . − . . ± . . +6 . − . G Estonia . +2 . − . . +2 . − . . +2 . − . . +6 . − . . +3 . − . . ± . . +6 . − . . +2 . − . . +5 . − . . +5 . − . G Finland . +1 . − . . +3 . − . . +1 . − . . +6 . − . . +2 . − . . +2 . − . . +7 . − . . +1 . − . . +6 . − . . +5 . − . G France . +1 . − . . +3 . − . . +1 . − . . +6 . − . . ± . . +4 . − . . +6 . − . . +1 . − . . +6 . − . . +7 . − . G Germany . +1 . − . . +2 . − . . +1 . − . . +4 . − . . +2 . − . . +1 . − . . +7 . − . . +1 . − . . +5 . − . . +6 . − . G Greece . ± . . +1 . − . . +1 . − . . +8 . − . . +1 . − . . +2 . − . . +4 . − . . +1 . − . . +6 . − . . +6 . − . G Hungary . +2 . − . . +3 . − . . +1 . − . . +6 . − . . +1 . − . . +1 . − . . +7 . − . . +2 . − . . +6 . − . . +8 . − . G India . +0 . − . . +4 . − . . +3 . − . . +2 . − . . +0 . − . . +1 . − . . +3 . − . . +2 . − . . +6 . − . . +6 . − . G Israel . +1 . − . . +2 . − . . +2 . − . . +5 . − . . +3 . − . . +2 . − . . +5 . − . . +2 . − . . +6 . − . . +9 . − . G Italy . +0 . − . . +0 . − . . +1 . − . . +3 . − . . +0 . − . . +2 . − . . +7 . − . . +0 . − . . +7 . − . . +6 . − . G Japan . +1 . − . . +3 . − . . +1 . − . . +5 . − . . +1 . − . . +2 . − . . +7 . − . . +2 . − . . +7 . − . . +6 . − . G Lithuania . +1 . − . . +1 . − . . +1 . − . . +5 . − . . +3 . − . . +1 . − . . +5 . − . . +1 . − . . +5 . − . . +4 . − . G Luxembourg . +2 . − . . ± . . +2 . − . . +5 . − . . +4 . − . . +2 . − . . +7 . − . . +2 . − . . +9 . − . . +5 . − . G Mexico . +1 . − . . +3 . − . . +3 . − . . +3 . − . . ± . . +4 . − . . +6 . − . . +2 . − . . +8 . − . . +7 . − . G Netherlands . +1 . − . . ± . . +1 . − . . +4 . − . . +1 . − . . +3 . − . . +6 . − . . +1 . − . . +5 . − . . +6 . − . G Norway . +1 . − . . +3 . − . . +1 . − . . +5 . − . . +3 . − . . +1 . − . . +7 . − . . +0 . − . . +7 . − . . +6 . − . G Pakistan . +2 . − . . +4 . − . . +2 . − . . +7 . − . . +2 . − . . +3 . − . . +4 . − . . +3 . − . . +8 . − . . +7 . − . G Peru . +1 . − . 
. +4 . − . . +4 . − . . +10 . − . . +1 . − . . +3 . − . . +4 . − . . +2 . − . . +6 . − . . +7 . − . G Philippines . +1 . − . . +2 . − . . +4 . − . . +6 . − . . +1 . − . . +1 . − . . +5 . − . . +2 . − . . +4 . − . . +8 . − . G Poland . +1 . − . . +3 . − . . +1 . − . . +4 . − . . ± . . ± . . +5 . − . . +1 . − . . +5 . − . . +5 . − . G Slovenia . +1 . − . . +3 . − . . +1 . − . . +4 . − . . +3 . − . . +1 . − . . +6 . − . . +2 . − . . +4 . − . . +5 . − . G South Africa . +1 . − . . +3 . − . . +2 . − . . +4 . − . . +1 . − . . ± . . +5 . − . . ± . . +6 . − . . +7 . − . G South Korea . +2 . − . . +3 . − . . +2 . − . . +3 . − . . +2 . − . . +2 . − . . +6 . − . . +2 . − . . +7 . − . . +5 . − . G Sweden . +0 . − . . +1 . − . . +0 . − . . +6 . − . . ± . . ± . . +7 . − . . +0 . − . . +7 . − . . +5 . − . G Switzerland . +1 . − . . ± . . +1 . − . . +8 . − . . +2 . − . . +1 . − . . +7 . − . . +1 . − . . +5 . − . . +6 . − . G Thailand . +2 . − . . +3 . − . . +4 . − . . +7 . − . . ± . . +2 . − . . +3 . − . . +1 . − . . +7 . − . . +7 . − . G Turkey . +0 . − . . +3 . − . . ± . . +4 . − . . +1 . − . . +3 . − . . +6 . − . . +1 . − . . +5 . − . . +5 . − . G US . +1 . − . . +1 . − . . +2 . − . . ± . . ± . . +5 . − . . +7 . − . . ± . . +6 . − . . +7 . − . G United Kingdom . ± . . +4 . − . . +2 . − . . +7 . − . . +2 . − . . +2 . − . . +6 . − . . ± . . +6 . − . . +7 . − . Θ BCG – . +0 . − . – – – – – – . +0 . − . − . +0 . − . Θ Temp – – − . +0 . − . – – – – – − . +0 . − . − . +0 . − . Θ RH – – – − . +0 . − . – – – – − . +0 . − . − . +0 . − . Θ Tests/1k – – – – − . +0 . − . – – – – − . +0 . − . Θ Pos.Rate – – – – – . +0 . − . – – – . +0 . − . Θ A+ – – – – – – . +0 . − . – – – Θ UV – – – – – – – − . +0 . − . . +0 . − . − . +0 . − . Table 6:

Base growth rates, Θ parameters and hierarchical hyperparameters for the 40 regions. For space reasons we use the abbreviations RH (relative humidity), NSW (New South Wales) and BC (British Columbia).