Bridging the COVID-19 Data and the Epidemiological Model using Time Varying Parameter SIRD Model
BBridging the COVID-19 Data and the Epidemiological Modelusing Time Varying Parameter SIRD Model
Cem C¸ akmaklı ∗ ,1 and Yasin S¸im¸sek † ,11 Ko¸c University
July 7, 2020
Abstract
This paper extends the canonical model of epidemiology, SIRD model, to allow fortime varying parameters for real-time measurement of the stance of the COVID-19pandemic. Time variation in model parameters is captured using the generalizedautoregressive score modelling structure designed for the typically daily count datarelated to pandemic. The resulting specification permits a flexible yet parsimoniousmodel structure with a very low computational cost. This is especially crucial at theonset of the pandemic when the data is scarce and the uncertainty is abundant. Fullsample results show that countries including US, Brazil and Russia are still not able tocontain the pandemic with the US having the worst performance. Furthermore, Iranand South Korea are likely to experience the second wave of the pandemic. A real-timeexercise show that the proposed structure delivers timely and precise information onthe current stance of the pandemic ahead of the competitors that use rolling window.This, in turn, transforms into accurate short-term predictions of the active cases. Wefurther modify the model to allow for unreported cases. Results suggest that the effectsof the presence of these cases on the estimation results diminish towards the end ofsample with the increasing number of testing.
Keywords : COVID-19, SIRD, Observation driven models, Score models, Count data,time varying parameters
JEL Classification : C13, C32, C51, I19 ∗ Correspondence to: Cem C¸ akmaklı, Ko¸c University, Rumelifeneri Yolu 34450 Saryer Istanbul Turkey,e–mail: [email protected]. † e–mail: [email protected] a r X i v : . [ q - b i o . P E ] J u l Introduction
The outbreak of the new coronavirus, COVID-19, is one of most severe health crisis theworld has encountered in last decades if not century. The spread of the virus has beenat an unexpected pace since the burst of the pandemic first in Wuhan, China in earlyJanuary of 2020. The World Health Organization has declared the outbreak of COVID-19as a global pandemic in March 11 2020. Official records indicate that as of the end ofJune, more than 10 million people are infected with a total death toll approaching to halfa million.Anticipating the devastating humanitarian and economic effects of the COVID-19,countries have taken various measures to contain the pandemic. A variety of measureshave been imposed ranging from complete lockdown, essentially freezing the flow of life foran uncertain period, to partial lockdown implying a partial closure to the daily routinesfor the protection of the most vulnerable in the population. On the contrary, some coun-tries including Sweden, England and Netherlands were reluctant to consider any measuresat first at the onset of the outbreak but rapidly switch to impose lockdown measures.Recently, a vast majority of countries has launched the process for normalization con-fronting economic pressures. The decision of imposing and/or relaxation of these varioussorts of measures and evaluating the outcome of these actions evidently rely on efficientlymonitoring the course of the pandemic. Therefore, epidemiological models for estimating,and perhaps even more crucially for predicting the trajectory of pandemic come to theforefront. However, if these measures indeed turn out to be effective and changing thenatural course of the pandemic, then this implies that the parameters of the epidemiolog-ical models alter to comply this changing trajectory. This is the departure point of thispaper. Specifically, we develop a simple and statistically coherent model that allows fortime variation of the parameters of the conventional workhorse epidemiological model.We start our analysis by confronting a simple version of the workhorse epidemiologicalmodel with the actual data. From the perspective of econometrics, we specify a countprocess for modeling the course of the COVID-19 pandemic for a selected set of countriesbased on the SIRD model. The SIRD model identifies the four states of the pandemicas Susceptible, Infected, Recovered and Death and it depicts the evolution of these statesdepending on the total number of infected individuals, see Kermack and McKendrick(1927); Allen (2008). It is the contestation of these forces, i.e. the parameters governing1he rates of infection and resolution (in the form of recovery or death) that determinesthe course of pandemic. Therefore, we extend the econometric model by allowing for timevariation in the structural parameters resorting to the Generalized Autoregressive Score(GAS) modeling framework which is a class of observation-driven models. The proposedmodel permits a flexible yet feasible framework that can track the evolution of structuralparameters quite timely and accurately. One important aspect of our specification is itsrelatively low computational cost, which might be crucial especially at the beginning ofthe pandemic when the data is scarce and the uncertainty is overwhelming.We construct a set of selected countries that so far have experienced different courses ofpandemic to demonstrate the efficacy of the proposed framework. These include countriesthat can mitigate the pandemic but with differential momentum, countries where thepandemic starts relatively late and countries that experience a second wave of pandemic.This provides us a testing ground with a wide variety of patterns to examine the potentialof the proposed model. Our results indicate that for a majority of countries the structuralparameters alter over time. The rate of infection typically starts at a high level at theonset of the pandemic and but it decreases at distinct paces depending on the success ofthe country in containing the transmission of the virus. On the contrary, the recovery ratestarts at a low level and gets stabilized after an increase from these low values reflecting theperformance of health systems in handling the active cases. As a result, the reproductionrates, computed as the ratio of the infection rate to the recovery and mortality rates, startat high levels often exceeding the value 5, but diminish at differential rates. Two crucialfindings emerge from the outcomes of the proposed model with time varying parameters.First, US, Russia and Brazil still cannot tackle with the (first wave of) pandemic as thereproduction rates have not fallen below the critical value of 1. Second, Iran and SouthKorea seem to experience the second wave of the pandemic.We further examine the real-time performance of the models. It is crucial for themodels to indicate the stance of the pandemic in real-time and to provide accurate andtimely predictions at least in the short-run. A real-time estimation and forecasting exercisestarting from April show that the proposed model with time varying parameters indeedprovide timely information on the current stance of the pandemic ahead of the compet-ing models. Moreover, it also provides superior forecasting performance up to 1 weekahead, especially for those countries who are currently experiencing the second wave of2he pandemic. Finally, we extend the model for including the cases that are undocumented(as these infected individuals do not show symptoms) using the strategy in Grewelle andDe Leo (2020). While inclusion of those leads to large discrepancies in parameters com-pared to initial findings especially at the onset of the pandemic, parameter values convergeto similar values towards the end of May as the cumulative figures mount.The literature on estimating the SIRD model (with fixed parameters) and variants toevaluate the current stance of the COVID-19 pandemic has exploded since the outbreakof the pandemic. Relatively earlier analysis include Read et al. (2020) and Lourencoet al. (2020) who estimate a SIRD based model with the data from China for the formerand for the UK and Italy for the latter using a likelihood based inference strategy. Wuet al. (2020) blend data related to COVID-19 for China with mobility data and estimatethe epidemiological model using Bayesian inference to predict the spread of the virusdomestically and internationally. Li et al. (2020) conduct a similar analysis employinga modified SIR model together with a network structure and mobility data to uncoverthe size of the undocumented cases, see also Horta¸csu et al. (2020). Zhang et al. (2020)extend the standard SIR model with many additional compartments and estimate a partof parameters using Bayesian inference.Several factors might lead to time variation in the parameters of the epidemiologicalmodel regarding to the course of pandemic. On the one hand, the lockdown measurestaken by the policy makers are intended to isolate the infected from the susceptible in-dividuals. Therefore, the parameter governing the rate of infection which is simply theaverage number of contacts of a person is likely to alter with the conduct of lockdown,see for example Hale et al. (2020). On the other hand, advancements in the fight againstCOVID-19 including the recovery of drugs that could effectively mitigate the course of thedisease, the installment or the lack of the medical equipment such as ventilators mightalter the rate of recovery or in other words the duration of the state of being infected, seefor example Greenhalgh and Day (2017) on time variation in recovery rates. Accordingly,Anastassopoulou et al. (2020) use a least squares based approach on a rolling windowof daily observations and document the time variation of parameters in the SIRD basedmodel using Chinese data. Tan and Chen (2020) also employ a similar but more artic-ulated rolling window strategy to capture the time variation in the model parameters.Other frameworks with time varying model parameters almost exclusively allow for the3ime variation only in the infection rate. An application prior to COVID-19 outbreakincludes, for example, Xu et al. (2016) among others, who utilize a Gaussian process priorfor the incidence rate involving the rate of infection using a Bayesian nonparametric struc-ture. In the context of COVID-19 pandemic, Kucharski et al. (2020) estimate a modifiedSIR model using a parameter driven model framework allowing the infection rate to followa geometric random walk with the remaining parameters kept as constant, see Marioliet al. (2020) for a similar approach. Similarly, Yang et al. (2020) and Fernndez-Villaverdeand Jones (2020) allow for time variation in the rate of infection keeping the remainingparameters constant.In this paper, we propose an alternative modeling strategy to capture the time variationin the full set of structural parameters of the SIRD model. On the one hand, our model-ing framework is statistically consistent with the typical count data structure related topandemic unlike the models that either employ least squares based inference or likelihoodbased inference using Gaussian distribution, e.g. Kalman filter. On the other hand, ourframework is computationally inexpensive unlike the models that are statistically consis-tent but employing particle filter type of methods for inference which are computationallyquite costly. This might be crucial especially at the onset of the pandemic when the datais scarce and uncertainty about the course of pandemic is abounding. Our framework be-longs to the class of observation driven models and specifically to the class of GeneralizedAutoregressive Score Models (henceforth GAS Models) proposed by Creal et al. (2013).GAS models involve many of the celebrated econometric models including Generalized Au-toregressive Heteroskedasticy (GARCH) model and various variants as a special case, andthus, they proved to be useful in both model fitting as well as prediction. Koopman et al.(2016) provides a comprehensive analysis on predictive power of these models comparedto parameter driven models in many settings including models with count data.Independent of the analysis of COVID-19 pandemic, observation-driven models forcount data are considered in many different cases. Davis et al. (2003) provides a com-prehensive analysis on observation driven models with a focus on data with (conditional)Poisson distributions. Ferland et al. (2006) derives an integer-valued analogue of theGARCH model (IN-GARCH) using Poisson distribution instead of Gaussian distribution.Fokianos et al. (2009) considers the Poisson autoregression of linear as well as nonlinearform including IN-GARCH model as a specific case. Chen and Lee (2016) extend the4oisson autoregression to allow for smooth regime switches in parameters. Our frameworknaturally extends these approaches to the epidemiological model framework for each of thecompartments of the SIRD model essentially using a multivariate structure.The remainder of the paper is organized as follows. In Section 2 we describe thecanonical SIRD model and introduce the SIRD model with time varying parameters withfull details provided in the Appendix. In this section, we discuss the estimation resultsusing full sample data from various countries. In Section 3, we discuss the real-timeperformance of our model framework in capturing the current stance of the pandemic aswell as in short-term forecasting compared to frequently used competitors. In Section 4, weextend the model to account for infected individuals who are not diagnosed and thereforenot included in the sample. Finally, we conclude in Section 5.
We start our analysis with the epidemiological model denoted as the SIRD model of Ker-mack and McKendrick (1927), which is the acronym of Susceptible, Infected, Recoveredand Death individuals. Specifically, the SIRD model categorizes the population into thesefour classes of individuals representing four distinct states of the pandemic as Susceptible( S t ), Infected ( I t ) and Recovered ( R t ) and Death ( D t ) in period t . The susceptible groupdoes not yet have immunity to disease, and individuals in this group have the possibility ofgetting infected. The recovered group, on the other hand, consists of individuals who areimmune to the disease, and finally D t represents individuals that have succumbed to thedisease. The Susceptible-Infected-Recovered-Death (SIRD) model builds on the principlethat fraction of the infected individuals in the population, I t − N , can transmit the diseaseto susceptible ones S t − with an (structural) infection rate of β by assuming a quadraticmatching in the spirit of gravity law, see Acemoglu et al. (2020) for details on alternativematching structures. Therefore, the number of newly infected individuals in the currentperiod is βS t − I t − N . The newly infected individuals, i.e. confirmed cases, should be de-ducted from the susceptible individuals in the current period. Meanwhile, in each period,a fraction γ of the infected people recovers from the disease, which in turn reduces thenumber of actively infected individuals. Similarly, a fraction ν of the infected people havesuccumbed to the disease further reducing the number of actively infected individuals.5ence, a fraction γ + ν of the infections are ‘resolved’ in total. This leads to the followingsets of equations: ∆ S t = − βS t − I t − N ∆ R t = γI t − ∆ D t = νI t − ∆ I t = βS t − I t − N − ( γ + ν ) I t − (1)Note that the last equation defines the law of motion for the number of infected individualsand it is the outcome of the first three equations as ∆ S t + ∆ R t + ∆ I t + ∆ D t = 0 holds atany given time, assuming that the size of the population remains constant. The parameters of interest are the structural parameters β , γ and ν that provide informa-tion on the transmission and resolution rates of the COVID-19 pandemic, respectively. Acentral metric which characterizes the course of the pandemic is the reproduction number, R . The reproduction number refers to the speed of the diffusion which can be computedby the ratio of newly confirmed cases, denoted as ∆ C t , to the resolved cases, that is∆ C t / (∆ R t + ∆ D t ). Therefore, it serves as a threshold parameter of many epidemiologicalmodels for disease extinction or spread. Considering the fact that S t /N ≈ R ,t canbe approximated by β/ ( γ + ν ) in (1) and it holds exactly when t = 0, referred as thebasic reproduction rate R . In this sense, a value of R being less than unity indicatesthat the pandemic is contained and if it exceeds unity, this implies that the spread of thepandemic continues. Inference on β , γ and ν and thereby inference on R enables us totrack the trajectory of the pandemic. Our main motivation for employing the model fromthe econometrics point of view is to conform this canonical epidemiological model withthe actual datasets and pinpoint the stance of the pandemic timely. For that purpose, wefirst discretesize (1) as the typical COVID-19 dataset involves daily observations on thecounts of individuals belonging to these states of health. Motivated by this, we specifya counting process for the states using the Poisson distribution with conditional arrivalsimplying a nonhomogenous Poisson process for all the counts see for example Allen (2008);Yan (2008); Rizoiu et al. (2018) for earlier examples and Li et al. (2020) in the COVID-19 In fact, the number of deaths reduces the total population. We assume that the total number of deathsis negligible compared to the population for the sake of tractability of the resulting SIRD model. ∆ C t is identical to − ∆ S t , i.e. the deduction in the number of susceptible individuals. C t | Ω t − ∼ P oisson ( β S t − N I t − )∆ R t | Ω t − ∼ P oisson ( γI t − )∆ D t | Ω t − ∼ P oisson ( νI t − )∆ I t = ∆ C t − ∆ R t − ∆ D t (2)where Ω t stands for information set that is available up to time t . We assume that ∆ C t ,∆ R t and δD t are independent conditional on Ω t − . The resulting distribution for theactive number of infections, I t , is a Skellam distribution (conditional on Ω t − ) with themean πI t − = (1 + β ( R − − I t − and the variance as β ( R − + 1) I t − , where we use theidentity in the last equation together with the definition of R . Therefore, stationarity ofthe resulting process depends on whether R < R ≥
1, i.e. whether the pandemic istaken under control or not. While moments conditional on I t − is identical due to Poissondistribution, the first and second moments diverge when we consider the unconditonalmoments. In this case, E [ I t ] = π t I V ar ( I t ) = β (1 + R − ) π t − (1 − π t )1 − π I , (3)where we assume that the initial condition, I , is known. If the initial condition is consid-ered as a parameter to be estimated then the variance is further amplified with a factor interms of the variance of the initial condition. Accordingly, the unconditional moments ofthe states of the pandemic are linear functions of these unconditional moments of I t . Werefer to Appendix B.1 for details.We conduct Bayesian inference using the likelihood implied by (2) together with non-informative priors for estimating the model parameters. Specifically, for all the models,we use independent Metropolis-Hastings algorithm using Normal distribution around theposterior mode and Hessian as the candidate density, see Robert and Casella (2013) fordetails. We use the data of selected set of countries starting from the early days of pan-demic until the end of second week of June. For each country, we use the day when thenumber of confirmed cases exceeds 1000 as the starting point of the sample. We displaythe dataset in Table 1. 7Insert Table 1 about here]These countries exhibit extensive heterogeneity in terms of their experience related topandemic. Some countries in this set impose strict measures of full lockdown and successfulpolicies of testing and tracing promptly at the onset of the pandemic including SouthKorea, while other countries including Italy has imposed these immense measures aftera certain threshold regarding the number of infected individuals. Some others opt forimposing mixed strategies involving partial lockdowns and voluntary quarantine such asTurkey and US. We also include other interesting cases including Brazil and Russia whoare going through differential phases of the pandemic. Finally, Iran experiences a secondwave of the pandemic, though South Korea seems to start a similar pattern, albeit muchmilder. Hence, this relatively rich and heterogeneous dataset involving countries withall sorts of pandemic experience enables us to examine the success of the econometricmodel in tracking the changes in the structural parameters as a response to this policyimplementations. The estimates of the model parameters are displayed in Table 2.[Insert Table 2 about here]We first focus on the basic reproduction rate, R as displayed in the last column of Table 2.For all countries but South Korea R exceeds the threshold of 1 and for some it also exceeds2. Apparently, when the full sample of several days is taken into account, estimation resultsshow that in almost none of the countries in our sample the pandemic could be taken undercontrol. For Brazil, Russia and US who experience relatively prolonged earlier phases ofthe pandemic we estimate an R that is very close to 2 or that exceeds 2 departing fromthe rest of the countries in the sample. This is due to the almost exceptional low recoveryrate for the case of US and high rate of infection for Brazil. Indeed, two groups comeinto view considering the infection rate. Brazil and Iran constitute the group with highinfection rate departing from the remaining countries. A similar grouping appears also forthe mortality rate. In this case Italy joins to the group of high mortality rate togetherwith Brazil and Iran.The estimation results in Table 2 rely on the SIRD model with fixed parameters asdemonstrated in (2). While, as of the second week of June it is widely accepted that thepandemic is taken under control in countries including South Korea, we still obtain an R very close to 1. This might be due to rapid pace of infectiousness, captured by β , at the8nset of the pandemic which is brought under control due to rapidly imposed measures.Moreover, increasing knowledge about the SARS-Cov-2 virus, availability of the medicalcare facilities such as ICU’s and more effective treatment of the infection potentially leadto changes in the recovery rate γ and the mortality rate ν . Therefore, it might be crucial tomodel this time variation efficiently in a data scarce environment allowing for estimationat the onset of the pandemic as well. In this section we put forward the SIRD model with time varying parameters. For mod-eling the time variation we use the framework of Generalized Autoregressive Score model(GAS hereafter) which encompasses a wide range of celebrated models in econometrics, in-cluding Generalized Autoregressive Conditional Heeteroscedasticity model (GARCH) andits variants. Briefly, the GAS model relies on the intuitive principle of modeling the timevariation in key parameters in an autoregressive manner which evolves in the directionimplied by the score function and thereby improving the (local) likelihood, see Creal et al.(2013) for a detailed analysis of the GAS model. As in the case of the GARCH model, iteffectively captures the time dependence in long lags in a parsimonious yet quite flexiblestructure. Perhaps more importantly, since it admits a recursive deterministic structurethe resulting data driven time variation in parameters is computationally inexpensive toestimate. This might be crucial given that these flexible models are evaluated throughoutthe course of the pandemic when data is often scarce especially at the onset of it. Considerthe SIRD model with time varying parameters as β t , γ t and ν t . We first transform theseparameters into logarithmic terms to ensure positivity of these parameters and therebythe positivity of the predicted counts every time period. Let the parameter with a tildedenote the logarithmic transformations as ˜ β t = ln ( β t ), ˜ γ t = ln ( γ t ) and ˜ ν t = ln ( ν t ). The9esulting Time Varying Paramaters - SIRD (TVP-SIRD) model is as follows∆ C t | Ω t − ∼ P oisson ( β t S t − N I t − )∆ R t | Ω t − ∼ P oisson ( γ t I t − )∆ D t | Ω t − ∼ P oisson ( ν t I t − )˜ β t = α + α ˜ β t − + α s ,t ˜ γ t = φ + φ ˜ γ t − + φ s ,t ˜ ν t = ψ + ψ ˜ ν t − + ψ s ,t ∆ I t = ∆ C t − ∆ R t − ∆ D t (4)where s ,t , s ,t and s ,t are the (scaled) score functions of the joint likelihood. Since thelikelihood function of the SIRD model is constituted by the (conditionally) independentPoisson processes, each score function is derived using the corresponding compartment.Specifically, let ∇ ,t = ∂L (∆ C t ; ˜ β t ) ∂ ˜ β t , ∇ ,t = ∂L (∆ R t ; ˜ γ t ) ∂ ˜ γ t and ∇ ,t = ∂L (∆ D t ; ˜ ν t ) ∂ ˜ ν t denote the scorefunctions of the likelihood function for period t observation. We specify s i,t such that thescore functions are scaled by their variance as s i,t = ∇ i,t Var( ∇ i,t ) for i = 1 , , In the specificcase of SIRD model, this modeling strategy leads to the following specification for the(scaled score functions) in terms of the logarithmic link function s ,t = ∆ C t − − λ ,t − λ ,t − s ,t = ∆ R t − − λ ,t − λ ,t − s ,t = ∆ D t − − λ ,t − λ ,t − (5)where λ ,t = β t I t − S t − N , λ ,t = γ t I t − and λ ,t = ν t I t − . The resulting specification impliesan intuitive updating rule in the sense that the parameters (in logarithmic form) are up-dated using a combination of recent parameter value and recent percentage deviation fromthe mean. We refer to Appendix A for the details on derivation of (5). The specificationin (4) leads to quite rich dynamics both in terms of mean and the variance of the resultingprocess. This enables us to capture the evolution of the pandemic accurately which isreflected as timely and prompt response of the parameters to the changes in the data, i.e.in the states of the pandemic. We refer to Appendix B.2 for the details on the properties Alternative approaches for scaling the score function include the standard deviation rather than thevariance and using score function without scaling. Our experience on this experimentation is that usingthe variance as the scaling function leads to smoother and more robust evolution of model parameters overtime.
10f the process described in (4).An appealing feature of the TVP-SIRD model is that it encompasses the SIRD modelwith fixed parameters. For example, when α = 1 together with α and α to be zero, thenthe rate of infection, β t , remains fixed over the course of pandemic. This would indicatethat the lockdown measures are proved to be ineffective as it does not lead to a systematicchange in the infection rate. Similar reasoning also applies to γ t and ν t . Therefore, itprovides a solid framework for statistically testing the efficiency of measures for taking thepandemic under control. We display the estimates of the underlying parameters governing(logarithms of) β t , γ t and ν t in Table 3.[Insert Table 3 about here]The parameter estimates in Table 3 indicate that the structural parameters governing thediffusion of the infection exhibit time varying behaviour. For all the countries, the 95%credible intervals for the posterior joint distribution exclude the set of α = α = 0 and α = 1 implying that β t indeed varies over time. The same conclusion also applies for therecovery rate γ t and mortality rate ν t which is considered as constant parameters in vastmajority of studies.We display the evolution of the underlying structural parameters, β t , γ t , ν t and theresulting R ,t over time in Figure 1.[Insert Figure 1 about here]Figure 1 reveals interesting patterns for the underlying parameters of the TVP-SIRD modelwhen applied to the real datasets. First, considering β t , which is the rate of infection, forItaly, Russia, Turkey and US, we observe a decreasing pattern, which seems to be stabilizedafterwards. It seems that the full lockdown measures imposed by these countries haveweaken the transmission of the virus throughout the society. Still, we observe a mildincrease for Turkey reflecting the outcome of the normalization process. On the otherhand, limited reaction to the virus in Brazil leads to fluctuations in the rate of infectionswhich exhibits a weakly decreasing trend only starting from the end of April. Iran andSouth Korea exhibit an alarming pattern in the sense that while the rate of infection hadlanded at very low levels compared to initial values, starting with the first of week of May,we observe a take off again hinting a potential second wave of the pandemic. The resultsshow that the dynamic structure of the flexible modeling strategy could capture many11ypes of pattern of the infection rate affected by the containment measures imposed bythe countries.The recovery rate seems to have less variation for a majority of countries, which usuallystarts with low levels as still the first wave of recoveries are limited and then gettingstabilized around some fixed values. Turkey seems to be an exception with an increasingrate of recovery towards the end of April first and then at the end of May. This high rateof recovery coincides perfectly with the days right after the peak of active cases at the endof the third week of April. We also note that the uncertainty around these values also risesas a result of this rapid change. For Italy and Russia the improvement in the recovery rateis quite gradual approaching to a stable and high level only towards the last week of May.When we consider the mortality rate, an interesting structure emerges. For all countrieswith the exception of South Korea the value of mortality rate is smoothly stabilized arounda fixed value. While this fixed value is 0.001 for a majority of countries, it is lower forSouth Korea and larger for Brazil and Iran reflecting the varying capability of the healthsystems of these countries in coping with the pandemic. For South Korea the mortalityrate of the pandemic is quite low and the seemingly volatile nature of the mortality ratemight be due to these minuscule rates prone to fluctuations over time.The course of the reproduction rate, R ,t is of central importance for tracing the efficacyof the containment efforts of the pandemic. The last column of the Figure 1 displays theseand we discuss our findings country by country. For Brazil, although the reproductionrate has decreased from record high levels to values around 2, it is still larger than 1.Even more crucially, the drop from 2 to values just above 1 at the beginning of June isdue to sudden increase in the rate of recovery rather than a decrease in the infection rate.For Italy, the reproduction rate has fallen below the value of 1 at the beginning of Mayand remained there since then. Therefore, Italy seems to take the pandemic successfullyunder control. For Iran, the reproduction rate fell below 1 as early as mid April but it hasexceeded this critical threshold starting from the second week of May and still remainsabove 1. As discussed earlier, this is due to the increasing infection rate, to a large extent,reflecting the potential threat of the second wave. A similar pattern is also observed forSouth Korea with the reproduction rate exceeded the threshold of 1 as of June. For USand Russia, while the reproduction rate has been stabilized, it does that above 1 wherefor Russia it is close to 1 and for US it is around the value of 2. Hence, for these countries12he pandemic is far from being contained and it continues to diffuse rapidly. Finally, forTurkey the pandemic is contained as of first week of May but the increasing pattern ofthe reproduction rate after May leads it to wander around values just below 1 with thebeginning of normalization process similar to other countries. Considering the cases ofIran and South Korea, it might easily cross the threshold especially given the fact that therate of infection exhibits an increasing tendency starting from the first week of June. The results in the previous section display our findings based on the estimates using fullsample dataset as of the end of second week of June. These results indicate that our flex-ible modelling structure can accommodate various forms of parameter changes reflectingthe course of pandemic. However, exploring the real-time performance of the model woulduncover whether this additional flexibility brought by the time varying parameters couldprovide timely and accurate information on the real-time stance of the pandemic. There-fore, we provide the estimates of the model parameters in real-time using the model withfixed parameters and using the model with time varying parameters. We use a movingwindow for performing the estimations of the SIRD model with fixed parameters as theevidence in the previous sections show that the values at the onset of the pandemic coulddrive the findings intensively. Specifically, using the dataset from t − M, t − M +1 , . . . , t , weestimate the SIRD model and the resulting parameter estimates are those for the period t ,and we repeat the process by adding one more observation (and dropping one observationat the beginning of the sample) recursively. We consider three cases by setting M = 10, 20and 30, i.e. starting from ten days of data up to one month of data. For the TVP-SIRDmodel we use the expanding window at hand up to time period t as the parameters in thiscase are time varying. We display the evolution of the structural parameters, β t , γ t , ν t and the resulting R ,t over time in Figure 2.[Insert Figure 2 about here]When we consider the rolling window estimates using the SIRD model with fixed param-eters, we observe that there is a trade-off between the speed of reaction to the evolutionof pandemic and the window size as expected. When the window size is taken as 30 daysthe parameters evolve quite smoothly but cannot react to the rapid changes promptly. On13he contrary, when 10 days of window size is used parameters adjust to the new conditionsmore quickly. When we focus on the time varying parameters SIRD model, we observethat the parameters can accommodate to the newly changing conditions swiftly, ahead ofthe SIRD model with fixed parameters regardless of the window size. In addition, theycan also react to the abrupt changes in the data. Specifically, parameter estimates ob-tained by the TVP model is leading the estimates of the fixed parameter model with 10days of window size by almost 2 weeks ahead. This lead time increases when we considerwider rolling window sizes. While the lead time is larger for the first half of the sample,it decreases over the course of pandemic, when the pandemic is stabilized and thus thevariation in parameters are limited. This indicates that the TVP model is most usefulwhen there are abrupt changes in the pattern of the data and the uncertainty about thecourse of pandemic is at highest level. To give a concrete example, we consider Brazil. Inthis case, the relatively mild increases in the number of active cases after 10 th of April isinstantly reflected to the rate of infection with the TVP model whereas this process takesmuch longer for the SIRD model with 10 days of window size for estimation. Consideringthe recovery rate, the jump in the number of recoveries in April 14 is instantly reflected tothe recovery rate with the release of first set of recoveries. For the SIRD model with 10days of window size in inference, this process takes almost 2 weeks to reflect the changesin the numbers. As a combination of these two forces, the reproduction rate immediatelyfell to the levels around 1 in mid April, which bounced back again to values around 2 inabout two weeks. Considering the SIRD model with 10 days of window size, the repro-duction rate decreased to values around 1 around April 23 rd and returned back to thevalues around 2 with around one week of lag. Considering the severity of the pandemicthese lead times might be crucial in comprehending the current stance of the pandemic.Here we do not discuss the results with 20 and 30 days of rolling window size for the fixedparameter SIRD model as these perform worse than the model with 10 days of window sizeused for inference. Finally, we explore whether this capability of the TVP-SIRD modelin reflecting the stance of the pandemic in a timely manner indeed proved to be useful inforecasting the number of active cases. This would also indicate whether the TVP-SIRDmodel indeed reflects the current stance of the pandemic in a timely but accurate manner.We therefore perform a real-time forecasting exercise where using our recursive estima- Here, we do not take a stance on the quality of the official statistics as all competing models areestimated using the same data. t ,we perform h = 1 , , . . . , − day ahead predictions of active cases, i.e. I t + h . We use thefirst one third of the full sample as the estimation sample and we expand the window byadding one more observation and repeat the procedure. This roughly provides us at least40 days of evaluation period for each country. We display the results involving RMSFEsof the competing models relative to TVP-SIRD model in Table 4.[Insert Table 4 about here]Table 4 reveals that the TVP-SIRD model outperforms the SIRD models estimated usingan expanding window (EW) or using a rolling window with 10, 20 or 30 days of observationsin almost all of the considered cases. First, models estimated using an expanding windowor rolling windows with more than 10 days of observations perform inferior relative tothe TVP-SIRD model and the SIRD model using 10 days of observations in the sensethat the relative RMSFE increases monotonically with the use of more data. The closestcompetitor to the TVP-SIRD model is the SIRD model using 10 days of rolling window(RW-10). In this case, for 1-day ahead predictions of active cases, TVP-SIRD modelprovides muuch better predictions than the RW-10 with relative RMSFEs ranging from1.26 for South Korea up to the value as large as 2.22 for Iran and Turkey. However, for allthe models the forecasting performance deteriorates with the increasing prediction horizon.Therefore, the relative RMSFEs of the models monotonically approaches to values closeto 1 for a majority of models. Still, for all countries with some exceptions the TVP-SIRDmodel outperforms all competing models for all horizons. For Italy and Russia, the SIRDmodel using 10 days of rolling window performs slightly better than the TVP-SIRD modelwith the relative RMSFE getting smaller as the forecast horizon increases. For Turkeyand US, this outperformance of the RW-10 model starts with the 4- and 5-day forecasthorizon. An interesting finding is on South Korea and Iran. In these cases, where thesecountries presumably go through the second wave of the pandemic, TVP-SIRD modelevidently surpasses all competing models reflecting the ability of the flexible structure toaccommodate the changing behavior of the pandemic rapidly.15 Accounting for sample selection
A key underlying assumption of the model specification in previous sections is that thevariables of the infected, recovered and succumbed individuals represent the aggregatenumbers in the society. However, one of the stylized facts related to COVID-19 pandemicis the presence of the infected individuals who do not show any symptoms, denoted asasymptomatic. This complicates the analysis by leading to a selection bias in econometricinference among other factors, see Manski and Molinari (2020). The underlying reason forthis bias is that only the patients who show symptoms can be detected as those are theones who comply for the tests. These exclude a portion of the infected cases which plagueseconometric inference. In this section, we provide results on the total number of individualsbased on explicit assumptions on the model structure to capture asymptomatic infectedindividuals. Let I ∗ t be the number of infected individuals involving both asymptomatic andsymptomatic cases. Similarly, let S ∗ t , R ∗ t and D ∗ t denote the total number of susceptible,recovered and dead individuals, respectively. Throughout the analysis, we use the followingassumptions, Assumption 1
The total number of individuals in the state of ‘infected’ that are symp-tomatic constitutes a − δ t fraction of the total number of individuals in the state of‘infected’, i.e. I t = I ∗ t (1 − δ t ) , Assumption 2
The total number of recovered individuals that were infected with symp-toms constitutes a − δ t fraction of the total number of individuals in the state of ‘recovered’,i.e. R t = R ∗ t (1 − δ t ) , Assumption 3
The infected individuals that are asymptomatic always recover, and thus,they do not switch to the state of ‘death’, i.e. D t = D ∗ t . The second assumption implies that the recovery process for the infected individuals withand without symptoms are identical. This assumption is obviously subject to doubt,however, it saves us an extra parameter to calibrate. These assumptions serve as a roughapproximation to the entire sample without deep epidemiological insight and therefore theestimation outputs should be taken with caution. Using these assumptions, the TVP-SIRD16odel in terms of the total numbers can be written as∆ C ∗ t | Ω t − ∼ P oisson ( β t S ∗ t − N I ∗ t − )∆ R ∗ t | Ω t − ∼ P oisson ( γ t I ∗ t − )∆ D t | Ω t − ∼ P oisson ( ν t I ∗ t − )˜ β t = α + α ˜ β t − + α s ,t ˜ γ t = φ + φ ˜ γ t − + φ s ,t ˜ ν t = ψ + ψ ˜ ν t − + ψ s ,t ∆ C ∗ t = − ∆ S ∗ t = ∆ I ∗ t + ∆ R ∗ t + ∆ D t (6)where ∆ I ∗ t = ∆ I t (1 − δ t ) and ∆ R ∗ t = ∆ R t (1 − δ t ) . The key observation in these new set of equations isthat the observed number of deaths are identical to the total number if the third assumptionindeed holds. This, in turn, prevents the number of individuals who are susceptible tobe computed as a fraction of 1 / (1 − δ t ) as the final equation suggests. Therefore, theevolution of the structural parameters differs from the counterpart in the previous cases,where observed data is assumed to represent the full sample. While this source of variationmight suffice for the identification of the δ t parameter, we note that the number of deathsconstitutes only a minuscule fraction of the total number of susceptible individuals. Toenhance the identification of δ t we exploit the information in the total number of testingfollowing Grewelle and De Leo (2020). Briefly, the underlying idea stems from the factthat the detection of the infections including the asymptomatic individuals would improvewith the increasing number of testing. In that sense, the fraction of tested individualsamong the population should be related to the ratio of reported infections to the totalnumber of infections. This leads to the following expression I t I ∗ t = exp( − kρ t ) δ t = 1 − exp( − kρ t ) . (7)Here ρ t is the fraction of tests with positive outcomes among all tests. With the increasingnumber of testing on the population this fraction is expected to be low and therefore δ t approaches to 1. On the other hand, if testing is concentrated only on symptomaticindividuals then this fraction is close to 1 and δ t approaches to a lower bound, capturedby the parameter exp( − k ) where the functional form admits for exponential decay. Wedisplay the evolution of the model parameters estimated using (6) and (7) in Figure 317or selected countries. Compared to previous sections, countries including Brazil, Iran andRussia are excluded from the sample as these countries do not provide accurate informationon testing at daily frequency. [Insert Figure 3 about here]As can be seen from the graphs in the first row of Figure 3, we observe a sizable fraction ofthe infected individuals that do not show symptoms for many countries at the beginning ofthe sample. While this fraction is smallest for the South Korea, starting from 15% at thebeginning of the sample decreasing to 4% in June, it is largest for US starting from 50%of all infected individuals decreasing to 30% in June. For Italy and Turkey, the fraction isabout 20% and it declines to about 10% in June. The temporary increases in the fractionof asymptomatic individuals at the beginning of the sample is due to the efforts of thesecountries to increase their testing capacity in April. In this case, the speed of increase inthe fraction of positive outcomes falls behind the speed of increase in the number of testsleading to ‘spuriously’ low values of δ t . However, once the capacity is reached we observea monotonically decreasing pattern in the course of δ t as expected.The impact of the relatively sizable fraction of the asymptomatic individuals for UScan be seen from the last column of Figure 3. While the pattern of the evolution of theparameters remains unaffected there are some level shifts in all rates which seem to bediminishing towards June. We also observe similar level shifts of parameters for othercountries, albeit limited compared to US case.These effects are further aggravated when we consider R ,t which is the ratio of theinfection rate over the rate of resolution. In this case, for US and Italy, R ,t differs consid-erably from the earlier estimates computed using the reported numbers at the beginningof the sample. The parameter values converge as the cumulative figures mount and thisseems to have a little effect after the first half of the samples. Therefore, under these as-sumptions, the R ,t computed using official statistics reflects progressively more and morethe actual stance of the pandemic. The world is struggling heavily to mitigate the spread of COVID-19 pandemic, which sofar has devastating effects both from humanitarian and economic point of view. Countries18ave been imposing various measures to fight the pandemic ranging from partial curfewto full lockdown to lower the transmission of the pandemic. These measures supposedlypave the way for the normalization of economies and reopening policies which has startedsince the early June in many countries. Health systems with overloaded intensive careunits lead to substantial variation in the number of recoveries as well as daily death tollsover the course of the pandemic. Additionally, many countries including South Korea andIran start to experience a second wave of the pandemic after the easing of the first wave.Therefore, the parameters in the workhorse epidemiological SIRD model, and ultimatelythe key statistics of the pandemic, i.e. the reproduction rate, change over time due to thechange in these structural parameters.In this paper, we extend the SIRD model allowing for time varying structural param-eters for timely and accurate measurement of the stance of the pandemic. Our modelingframework falls into the class of generalized autoregressive score models, where the pa-rameters evolve deterministically according to an autoregressive process in the directionimplied by the score function. Therefore, the resulting approach permits quite a flexibleyet parsimonious and statistically coherent framework that can operate in data scarceenvironments easily due to low computational cost. We demonstrate the potential of theproposed model using daily data from seven countries ranging from US to South Koreathat have distinct pandemic dynamics over the last 6 months.Our results show that the proposed framework can nicely track the stance of the pan-demic in real-time. For all countries, the infection rate has reduced considerably but atdifferential speed depending on the success in containing the pandemic. Our findings sug-gest that there is considerable fluctuation in recovery and mortality rates, which seems toget more stable towards June. For US, Russia and Brazil the reproduction rate is abovethe critical level of 1 implying that these countries are unable to contain the pandemic yet.Our findings confirm the observation that Iran and South Korea experience the secondwave of the pandemic. We further extend the model for including the infected individualsthat do not show symptoms and therefore are not diagnosed. This seems to have a sizableimpact on the estimated level of reproduction rate at the onset of the sample but convergeto similar levels with the earlier findings towards the end of the sample.19 eferences
Acemoglu D, Chernozhukov V, Werning I, Whinston MD. 2020. A multi-risk SIR modelwith optimally targeted lockdown. Working Paper 27102, National Bureau of EconomicResearch.Allen LJS. 2008.
An Introduction to Stochastic Epidemic Models . Berlin, Heidelberg:Springer Berlin Heidelberg, 81–130.Anastassopoulou C, Russo L, Tsakris A, Siettos C. 2020. Data-based analysis, modellingand forecasting of the covid-19 outbreak.
PloS one : e0230405.Chen CW, Lee S. 2016. Generalized poisson autoregressive models for time series of counts. Computational Statistics & Data Analysis : 51 – 67.Creal D, Koopman SJ, Lucas A. 2013. Generalized autoregressive score models withapplications. Journal of Applied Econometrics : 777–795.Davis RA, Dunsmuir WTM, Streett SB. 2003. Observation-driven models for poissoncounts. Biometrika : 777–790.Ferland R, Latour A, Oraichi D. 2006. Integer-valued garch process. Journal of TimeSeries Analysis : 923–942.Fernndez-Villaverde J, Jones CI. 2020. Estimating and simulating a sird model of covid-19 for many countries, states, and cities. Working Paper 27128, National Bureau ofEconomic Research.Fokianos K, Rahbek A, Tjstheim D. 2009. Poisson autoregression. Journal of the AmericanStatistical Association : 1430–1439.Greenhalgh S, Day T. 2017. Time-varying and state-dependent recovery rates in epidemi-ological models.
Infectious Disease Modelling : 419 – 430.Grewelle R, De Leo G. 2020. Estimating the global infection fatality rate of covid-19. medRxiv .URL Hale T, Petherick A, Phillips T, Webster S. 2020. Variation in government responses tocovid-19.
Blavatnik school of government working paper .Horta¸csu A, Liu J, Schwieg T. 2020. Estimating the fraction of unreported infections inepidemics with a known epicenter: an application to covid-19. Technical report, NationalBureau of Economic Research.Kermack WO, McKendrick AG. 1927. A contribution to the mathematical theory ofepidemics. Proceedings of the royal society of london. Series A, Containing papers of amathematical and physical character : 700–721.Koopman SJ, Lucas A, Scharth M. 2016. Predicting time-varying parameters withparameter-driven and observation-driven models.
Review of Economics and Statistics : 97–110. 20ucharski AJ, Russell TW, Diamond C, Liu Y, Edmunds J, Funk S, Eggo RM, Sun F, JitM, Munday JD, Davies N, Gimma A, [van Zandvoort] K, Gibbs H, Hellewell J, JarvisCI, Clifford S, Quilty BJ, Bosse NI, Abbott S, Klepac P, Flasche S. 2020. Early dynamicsof transmission and control of covid-19: a mathematical modelling study. The LancetInfectious Diseases : 553 – 558. ISSN 1473-3099.Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, Shaman J. 2020. Substantial undoc-umented infection facilitates the rapid dissemination of novel coronavirus (sars-cov-2). Science : 489–493.Lourenco J, Paton R, Ghafari M, Kraemer M, Thompson C, Simmonds P, Klenerman P,Gupta S. 2020. Fundamental principles of epidemic spread highlight the immediate needfor large-scale serological surveys to assess the stage of the sars-cov-2 epidemic.
MedRxiv .Manski CF, Molinari F. 2020. Estimating the covid-19 infection rate: Anatomy of aninference problem.
Journal of Econometrics .Marioli FA, Bullano F, Kuˇcinskas S, Rond´on-Moreno C. 2020. Tracking r of covid-19: Anew real-time estimation using the kalman filter. medRxiv .Read JM, Bridgen JR, Cummings DA, Ho A, Jewell CP. 2020. Novel coronavirus 2019-ncov: early estimation of epidemiological parameters and epidemic predictions .Rizoiu MA, Mishra S, Kong Q, Carman M, Xie L. 2018. Sir-hawkes: linking epidemicmodels and hawkes processes to model diffusions in finite populations. In
Proceedingsof the 2018 World Wide Web Conference . 419–428.Robert C, Casella G. 2013.
Monte Carlo statistical methods . Springer Science & BusinessMedia.Tan SX, Chen L. 2020. Real-time differential epidemic analysis and prediction for COVID-19 pandemic. arXiv preprint arXiv:2004.06888 .Wu JT, Leung K, Leung GM. 2020. Nowcasting and forecasting the potential domestic andinternational spread of the 2019-ncov outbreak originating in wuhan, china: a modellingstudy.
The Lancet : 689–697.Xu X, Kypraios T, O’Neill PD. 2016. Bayesian non-parametric inference for stochasticepidemic models using gaussian processes.
Biostatistics : 619–633.Yan P. 2008. Distribution Theory, Stochastic Processes and Infectious Disease Modelling .Berlin, Heidelberg: Springer Berlin Heidelberg, 229–293.Yang Q, Yi C, Vajdi A, Cohnstaedt LW, Wu H, Guo X, Scoglio CM. 2020. Short-term fore-casts and long-term mitigation evaluations for the covid-19 epidemic in hubei province,china. medRxiv .Zhang Y, You C, Cai Z, Sun J, Hu W, Zhou XH. 2020. Prediction of the covid-19 outbreakbased on a realistic stochastic model. medRxiv .21 ables and Figures
Table 1: The datasetCountry Time spanBrazil March 08 - June 12Italy March 29 - June 12Iran March 29 - June 12S. Korea Febru. 26 - June 12Russia March 09 - June 12Turkey March 22 - June 12US March 11 - June 12
Note:
The data is obtained from GitHub,COVID-19 Data Repository by the Center forSystems Science and Engineering (CSSE) atJohns Hopkins University.
Table 2: Estimation results of SIRD model β γ ν R Brazil 0.101 (0.003) 0.050 (0.003) 0.005 (0.001) 1.834 (0.103)Italy 0.036 (0.003) 0.026 (0.001) 0.005 (0.000) 1.163 (0.102)Iran 0.097 (0.003) 0.077 (0.002) 0.005 (0.000) 1.194 (0.045)S. Korea 0.033 (0.004) 0.033 (0.003) 0.001 (0.000) 0.979 (0.141)Russia 0.055 (0.003) 0.028 (0.001) 0.001 (0.000) 1.957 (0.118)Turkey 0.053 (0.003) 0.044 (0.003) 0.001 (0.000) 1.163 (0.104)US 0.032 (0.002) 0.008 (0.000) 0.002 (0.000) 3.130 (0.221)
Note:
The table displays the estimation results of the model in (2). We displaythe posterior modes and posterior standard deviations (in parenthesis) of the cor-responding parameter shown in the first row and for the country shown in the firstcolumn.
Table 3: Estimation results of TVP-SIRD model parameters
Brazil Italy Iran S. Korea Russia Turkey US α -0.94 (0.02) -0.84 (0.03) -0.18 (0.02) -0.23 (0.02) -1.05 (0.02) -1.08 (0.03) -0.72 (0.00) α α φ -1.07 (0.05) -1.34 (0.02) -0.18 (0.02) -0.34 (0.03) -1.35 (0.05) -1.07 (0.02) -1.53 (0.05) φ φ ψ -1.63 (0.06) -1.52 (0.09) -0.12 (0.01) -0.49 (0.05) -1.79 (0.00) -1.78 (0.00) -1.56 (0.21) ψ ψ Note:
The table displays the estimation results of the model in (4). We display the posterior modesand posterior standard deviations (in parenthesis) of the corresponding parameter shown in the firstcolumn and for the country shown in the first row. h = 1 h = 2 h = 3 h = 4 h = 5 h = 6 h = 7Brazil EW 2.19 1.97 1.81 1.65 1.49 1.37 1.26RW-10 1.63 1.49 1.39 1.31 1.22 1.15 1.07RW-20 1.88 1.71 1.57 1.44 1.31 1.22 1.12RW-30 2.03 1.83 1.68 1.54 1.40 1.30 1.19Italy EW 6.09 4.29 3.42 2.91 2.58 2.36 2.21RW-10 1.28 0.93 0.77 0.67 0.61 0.57 0.54RW-20 2.61 1.88 1.53 1.32 1.19 1.10 1.04RW-30 3.94 2.82 2.28 1.97 1.77 1.63 1.54Iran EW 3.50 2.53 2.09 1.85 1.72 1.64 1.58RW-10 2.22 1.65 1.39 1.26 1.17 1.11 1.06RW-20 3.02 2.20 1.82 1.63 1.51 1.43 1.38RW-30 3.40 2.46 2.04 1.81 1.68 1.59 1.54S. Korea EW 5.27 5.45 5.24 5.02 4.76 4.56 4.35RW-10 1.26 1.32 1.28 1.25 1.20 1.15 1.09RW-20 1.57 1.62 1.56 1.49 1.41 1.35 1.28RW-30 2.13 2.18 2.06 1.93 1.80 1.70 1.62Russia EW 4.76 2.95 2.31 1.98 1.79 1.66 1.57RW-10 1.51 0.99 0.82 0.73 0.69 0.66 0.65RW-20 2.68 1.71 1.37 1.21 1.11 1.06 1.02RW-30 3.63 2.29 1.82 1.59 1.45 1.37 1.31Turkey EW 5.52 3.39 2.62 2.23 2.01 1.86 1.77RW-10 2.22 1.42 1.14 1.00 0.92 0.87 0.83RW-20 3.95 2.46 1.92 1.65 1.49 1.39 1.33RW-30 5.01 3.09 2.40 2.06 1.86 1.73 1.65US EW 7.25 5.14 4.13 3.50 3.10 2.83 2.64RW-10 1.76 1.26 1.03 0.89 0.81 0.75 0.72RW-20 3.34 2.42 1.99 1.73 1.56 1.45 1.39RW-30 4.79 3.47 2.84 2.46 2.21 2.05 1.94 Note:
The table displays the Root Mean Squared Forecast Errors (RMSFE) of the compet-ing models relative to the TVP-SIRD model introduced in (4) for the country shown in thefirst column. EW stands for the Expanding Window. RW-M stands Rolling Window with M observations as the sample size for M = 10 , , β t , γ t , ν t and R ,t estimated using TVP-SIRD model β t γ t ν t R ,t B r a z il I t a l y I r a nS . K o r e a R u ss i a T u r k e y U S Note:
The graphs show the evolution of the time varying parameters, β t , the rate of infection, γ t ,the rate of recovery, ν t , the mortality rate and the resulting reproduction rate, R ,t estimated using theTVP-SIRD model introduced in (4) for the countries shown in the first column. β t γ t ν t R ,t B r a z il I t a l y I r a nS . K o r e a R u ss i a T u r k e y U S Note:
The graphs show the evolution of the time varying parameters in real-time, i.e. estimatedusing the sample up to the period t for the TVP-SIRD model (the blue line), estimated using the last M observations up to the period t for the SIRD model shown using the red, yellow and purple lines for M = 10 ,
20 and 30 respectively. See Figure 1 for further details. δ t , β t , γ t , ν t and R ,t when asymptomatic infected individualsare also considered in the sampleItaly S. Korea Turkey US δ t β t γ t ν t R ,t Note:
The graphs show the evolution of the time varying parameters, δ t , the fraction of asymptomaticcases in total cases, β t , the rate of infection, γ t , the rate of recovery, ν t , the mortality rate and the resultingreproduction rate, R ,t estimated using the TVP-SIRD model introduced in (4) displayed with the blueline and the TVP-SIRD model that also takes the unreported cases into account introduced in (6) and (7)displayed with the red line. ppendix A Derivation of updating rules Let f (∆ C t | Ω t − ), f (∆ R t | Ω t − ) and f (∆ D t | Ω t − ) denote the conditional probability den-sity functions for ∆ C t , ∆ R t and ∆ D t conditional on the information set at time period t − t − , respectively. Assuming (conditional) independence among these variables, the con-ditional joint probability density function could be written as f (∆ C t , ∆ R t , ∆ D t | Ω t − ) = f (∆ C t | Ω t − ) f (∆ R t | Ω t − ) f (∆ D t | Ω t − ). We assume that these marginal distributionsfollow Poisson distribution with arrival rates specified in SIRD model in equation (2).Thus, the joint density is as follows, f (∆ C t , ∆ R t , ∆ D t | Ω t − ) = λ ∆ Ct ,t exp( − λ ,t )Γ(∆ C t +1) λ ∆ Rt ,t exp( − λ ,t )Γ(∆ R t +1) λ ∆ Dt ,t exp( − λ ,t )Γ(∆ D t +1) , (A.1)where λ ,t = β t S t − I t − N , λ ,t = γ t I t − and λ ,t = ν t I t − . The score functions, denoted as ∇ ,t , ∇ ,t and ∇ ,t , can be written as ∇ ,t = (cid:16) ∆ C t λ ,t − (cid:17) (cid:16) I t − S t − N (cid:17) ∇ ,t = (cid:16) ∆ R t λ ,t − (cid:17) I t − ∇ ,t = (cid:16) ∆ D t λ ,t − (cid:17) I t − (A.2)We use the variance of the score functions for the scaling parameter. Variance of the scorefunction for ∇ ,t , for example, can be computed as V ar ( ∇ ,t | Ω t − ) = E [ ∇ ,t ∇ (cid:48) ,t | Ω t − ]= E (cid:104) ( ∆ C t λ ,t − (cid:12)(cid:12)(cid:12) Ω t − ] (cid:16) I t − S t − N (cid:17) = E [(∆ C t − λ ,t ) | Ω t − ] λ ,t (cid:16) I t − S t − N (cid:17) . (A.3)As E [(∆ C t − λ ,t ) ] refers to the variance of Poisson distributed random variable ∆ C t , itis identical to λ ,t . Hence, the resulting expression is, V ar ( ∇ t | Ω t − ) = (cid:18) λ ,t (cid:19) (cid:18) I t − S t − N (cid:19) (A.4)Similar computations lead to the variances of score functions for ∆ R t and ∆ D t as V ar ( ∇ ,t | Ω t − ) = I t − λ ,t V ar ( ∇ ,t | Ω t − ) = I t − λ ,t (A.5)27caling (A.2) together with (A.4) and (A.5), the scaled score functions can be written asfollows, s ,t = (∆ C t − λ ,t ) (cid:16) NI t − S t − (cid:17) s ,t = ∆ R t − λ ,t I t − s ,t = ∆ D t − λ ,t I t − (A.6)The final step includes the division of the scaled score functions by β t , γ t and ν t , re-spectively, to obtain the scaled score function in terms of parameters with logarithmictransformations applying the chain rule. The resulting time evolution for ˜ β t = log ( β t ),˜ γ t = log ( γ t ) and ˜ ν t = log ( ν t ) is˜ β t = α + α ˜ β t − + α (cid:16) ∆ C t − − λ ,t − λ ,t − (cid:17) ˜ γ t = φ + φ ˜ γ t − + φ (cid:16) ∆ R t − − λ ,t − λ ,t − (cid:17) ˜ ν t = ψ + ψ ˜ ν t − + ψ (cid:16) ∆ D t − − λ ,t − λ ,t − (cid:17) (A.7)Combining (A.7) together with the SIRD equations, the final model becomes as follows∆ C t | Ω t − ∼ P oisson ( β t S t − N I t − )∆ R t | Ω t − ∼ P oisson ( γ t I t − )∆ D t | Ω t − ∼ P oisson ( ν t I t − )˜ β t = α + α ˜ β t − + α s ,t ˜ γ t = φ + φ ˜ γ t − + φ s ,t ˜ ν t = ψ + ψ ˜ ν t − + ψ s ,t ∆ C t = − ∆ S t = ∆ I t + ∆ R t + ∆ D t (A.8)28 ppendix B Implied Moments of the SIRD models B.1 SIRD Model
We assume that the initial values of states, S , I , R and D are known. For the sakeof simplicity, we assume that S t ≈ N . We focus on the general form of equations as Y t | Ω t − ∼ P oisson ( λI t − ) where λ = β , γ and ν for Y t = ∆ C t , ∆ R t and ∆ D t , respectively.The conditional mean and the variance for the Poisson distributed variables are E [ Y t | Ω t − ] = λI t − V ar ( Y t | Ω t − ) = λI t − (B.9)The resulting process is stationary if the underlying process for I t is stationary. This, inturn depends on the basic reproduction rate, R . To see this, we first start with ∆ I t anduse the fact that E [∆ I t ] = E [∆ C t ] − E [∆ R t ] − E [∆ D t ]. Therefore, the difference equationgoverning I t takes the form of I t = I t − + ∆ C t − ∆ R t − ∆ D t E [ I t | Ω t − ] = I t − + βI t − − γI t − − νI t − = (1 + β − γ − ν ) I t − E [ I t ] = (1 + β (1 − R − )) E [ I t − ] (B.10)where the last equation uses the definition of the basic reproduction rate, R , as the ratioof the infection rate to the resolution rate. In case R exceeds unity, as β is positive,the process is explosive, i.e. the pandemic progress exponentially. On the other hand,if R falls below unity, the process becomes stationary. We can track down the processconditional on the starting value for I . Let π = (1 + β (1 − R − )). E [ I t ] = π t I , (B.11)29n case the initial condition is known, otherwise it is replaced by E [ I ]. For the variancewe can use a similar recursion. We start with computing V ar ( I t | Ω t − ). V ar ( I t | Ω t − ) = V ar [∆ C t − ∆ R t − ∆ D t | I t − ]= V ar [∆ C t | I t − ] + V ar [∆ R t | I t − ] + V ar [∆ D t | I t − ]= ( β + γ + v ) I t − = β (1 + R − ) I t − (B.12)By the law of total variance and using forward iteration, the unconditional variance canbe computed as V ar ( I t ) = β (1 + R − ) E [ I t − ] + π V ar ( I t − )...= β (1 + R − )( (cid:80) t − i =0 π i ) π t − E [ I ] + π t V ar ( I )= β (1 + R − ) π t − (1 − π t )1 − π E [ I ] + π t V ar ( I ) (B.13)In case the initial condition is known, the second term drops. Finally, we can use (B.11)and (B.13) for construction of the unconditional moments of Y t as follows E [ Y t ] = λE [ I t ] V ar ( Y t ) = λE [ I t ] + λ V ar ( I t ) . (B.14)A common drawback of the models which involves variables assumed to follow Poisson dis-tribution is that the conditional mean is assumed to be identical to the variance. However,as can be seen in above derivations, the spread of the random variable Y t is greater thanits expected value, i.e V ar [ Y t ] > E [ Y t ]. Therefore, the model allows for overdispersion inthe data. 30 .2 TVP-SIRD As in the previous section, we focus on the general form of equations as Y t | Ω t − ∼ P oisson ( λ t ) where λ t = Ψ t I t − where Ψ t = β t , γ t and ν t for Y t = ∆ C t , ∆ R t and ∆ D t , re-spectively. We consider W t = log( λ t ) = log(Ψ t ) + log( I t − ) = θ t + K t − to insure positivityof the dynamic arrival rate. The time evolution of the model parameters is as follows θ t = α + α θ t − + α e t − (B.15)where e t − = Y t − − λ t − λ t − . Assuming stationarity for the evolution of the parameters, thatwould imply for θ t that θ t = α − α + (cid:80) t − i =1 α i α e t − i = γ + (cid:80) ∞ i =1 γ i e t − i (B.16)For e t , given the initial conditions e s = 0 for s ≤
0, the mean of e t E [ e t | Ω t − ] = 0 for s > , hence E [ e t ] = E [ E [ e t | Ω t − ]] = 0 (B.17)Accordingly, the variance could be computed as, E [ e t ] = E [ E [ e t | λ t ]] = E [ λ − t ] for s > . (B.18)Moreover, Cov ( e t , e s ) = 0 for t (cid:54) = s . This suggests that e t series can be considered as i.i.d.innovations. Therefore, unconditional moments of θ t follows as E [ θ t ] = γ = α − α . (B.19)and V ar ( θ t ) = (cid:80) ∞ i =1 γ i V ar [ e t − i ]= (cid:80) ∞ i =1 γ i E [ λ − t − i ] Cov ( θ t , θ t − h ) = (cid:80) ∞ i =1 γ i γ i + h E [ λ − t − i ] (B.20)Considering W t = θ t + K t − = γ + (cid:80) ∞ i =1 γ i e t − i + K t − E [ W t ] = γ + E [ K t − ] (B.21)31nd V ar ( W t ) = (cid:80) ∞ i =1 γ i E [ λ − t − i ] + V ar ( K t − ) Cov ( W t , W t − h ) = (cid:80) ∞ i =1 γ i γ i + h E [ λ − t − i ] + Cov ( K t − , K t − − h ) (B.22)where we use the i.i.d. property of the innovations. For deriving the expectations of thevariables on the states of the pandemic, we use an approximation to Gaussian distributionand we can write E [ Y t ] = E [ E [ Y t | λ t ]] = E [ λ t ] = E [exp( W t )] ≈ exp (cid:0) γ + E [ K t − ] + (cid:0)(cid:80) ∞ i =1 γ i E [ λ − t − i ] + V ar ( K t − ) (cid:1)(cid:1) (B.23)see Davis et al. (2003). Moreover the unconditional variance of Y t series follows from thelaw of total variance together with delta method as V ar ( Y t ) = E [ λ t ] + V ar ( λ t ) ≈ E [ λ t ] + E [ λ t ] V ar ( W t ) (B.24)Notice that λ t is a linear function of I t − , and thus, the discussion provided on the prop-erties of I t in the previous section also applies here. An important distinction is that theprocess governing I t in (B.10) is replaced with parameters that are time varying. Hence,score component of the model has a significant impact on the long run behavior of themodel variables. We impose stationarity restrictions on the dynamic process governing θ t .Therefore, unconditional moments regarding to I t can be derived similarly with θ t replacedwith the unconditional mean and the variance of θ t .Furthermore, let us consider the conditional variance V ar ( θ t | Ω t − ) = α λ − t − . There-fore, the time variation in the conditional variance of the data stems from both K t − and θ tt