A Vector Autoregression Prediction Model for COVID-19 Outbreak

Abstract

Since two people in a county north of Seattle tested positive for COVID-19 (coronavirus disease 2019) in 2019, total cases in the United States (U.S.) have grown to over 12 million. Predicting the pandemic trend from effective variables is crucial to finding a way to control the epidemic. Based on the available literature, we propose a validated Vector Autoregression (VAR) time series model to predict positive COVID-19 cases. A real-data prediction for the U.S. is provided based on U.S. coronavirus data. The key message from our study is that the pandemic will get worse if there is no effective control.


O R I G I N A L A R T I C L E

Qinan Wang | Yaomu Zhou | Xiaofei Chen

Lyle School of Engineering, Southern Methodist University, Dallas, TX, 75275, USA
Department of Statistical Science, Southern Methodist University, Dallas, TX, 75275, USA
Department of Population & Data Sciences, UT Southwestern Medical Center, Dallas, TX, 75390, USA

Correspondence

Xiaofei Chen, PhD, Department of Statistical Science, Southern Methodist University, Dallas, TX, 75275, USA. Email: [email protected]

K E Y W O R D S

COVID-19, Prediction, Time series data, Vector autoregression, Internal validation

1 | INTRODUCTION

COVID-19 (coronavirus disease 2019) is caused by a new virus of the Coronaviridae family that began spreading from Wuhan, China in 2019. [1] The Coronaviridae family consists of two main subfamilies: Coronavirinae and Torovirinae. These viruses affect the neurological, gastrointestinal, hepatic, and respiratory systems and can infect humans, livestock, and other hosts. [2, 3, 4]

Since its appearance, COVID-19 has infected over 59 million people worldwide. [5] The United States (U.S.) has experienced the worst situation, followed by the United Kingdom, Italy, France, and Spain. The U.S. has a cumulative 12 million positive cases to date. It found itself grappling with the worst outbreak after Italy and Spain. [6] The Centers for Disease Control and Prevention (CDC) has verified evidence that COVID-19 is transmitted from human to human, and has also reported that COVID-19 spreads through close contact, through the air, or by touching surfaces or objects that contain viral particles. It can spread to others during the incubation period. The incubation period and the median age of confirmed cases are 3 days and 47 years, respectively. [7, 8]

The economic and social disruption caused by the pandemic is devastating. Disease prevention and control urgently need guidance from disease prediction. Efficient models for short-term forecasting have a pivotal role in developing strategic planning methods in the public health system. With the guidance of a prediction model, we can learn the severity and trend of the epidemic under different strategies. Such a model can raise public awareness and help the government take the most beneficial measures to avoid deaths and reduce infections, such as ordering school closures, case-based measures, banning public events, encouraging social distancing, and lockdowns.

According to the literature, a system of differential equations for Susceptible-Infected-Removed (SIR) sequences is a typical mathematical epidemiological model for COVID-19 forecasting. [5, 9, 10, 11, 12, 13, 14] Building on SIR models, Khan et al. proposed the SQUIDER compartmental model to predict the spread of coronavirus disease 2019 [15], and Xu et al. applied a generalized fractional-order SEIR model. [16] The SIR model fits simulations and data well in the early stage of an outbreak. However, its limitations include, among others, that the model system allows little external control and that the number of patients grows exponentially, because the model assumes no drugs or preventive measures.

Other work on COVID-19 prediction has been carried out with deep learning and with the univariate ARIMA (Auto Regressive Integrated Moving Average) time series model. Time series analysis tools and deep learning are also widely used in publications to assess the dynamics of epidemic diseases. Zeroual et al., Shahid et al., and Chimmula et al. used recurrent neural networks to predict the spread. [17, 18, 19] With time series tools, Alzahrani et al., Sahai et al., and Kumar et al. predicted COVID-19 with the univariate ARIMA model. [20, 21, 22] Deep learning requires a large number of training samples. However, the data we have are still few, so such models generalize poorly, that is, they overfit. In the time series field, the ARIMA model is quite simple, requiring only endogenous variables and no exogenous variables. It requires the time series to be stationary, or to become stationary after differencing. More fundamentally, it falls short in explaining the causality between different variables.

This article aims to build a generalized VAR (Vector Autoregressive) model for predicting the dynamics of daily COVID-19 cases. VAR is a comprehensive model that combines the advantages of multiple linear regression with those of time series models (the influence of lag terms can be analyzed). It applies linear relations to describe a stable system. Under the stationarity condition, least-squares estimation yields a consistent estimator. In addition, VAR can describe the dynamic linear correlation between variables that affect each other, and it can be used directly for prediction, interpretation, or sensitivity analysis. With correlated variables selected from among undetected infected, detected deaths, detected recovered, average temperature, precipitation, wind speed, humidity, population density, social trust, and civic engagement, which are commonly cited in other epidemiology publications, the multivariate VAR model can forecast better and provide an interpretable result. [15, 23, 24, 25, 26, 27, 28] The correlated variables we choose are useful for analyzing the critical factors driving epidemics. The model may inform the prediction of other epidemics as well as COVID-19. As for the results, some publications are more concerned with cumulative positive cases, whereas this article focuses on the daily increase in cases. A prediction of cumulative positive cases is less meaningful than a prediction of the daily increase, since the latter is a better signal of epidemic severity. It is also a critical indicator for assessing the efficiency of COVID-19 control.

In the next section (Section 2), we describe the data used in the analyses. The Method section (Section 3) elaborates the proposed VAR model and analysis plan. The Results section (Section 4) provides the prediction results of the VAR modeling and an internal validation/evaluation of the model. Section 5 discusses the model performance, further improvement, and comparison with other models.

2 | THE DAILY REPORTED COVID-19 DATA

COVID-19 cases are reported by the CDC (Centers for Disease Control and Prevention) and published at the national level and for the fifty states by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. We obtain the data from https://covidtracking.com/data/download, maintained by “The Atlantic Monthly Group”. The data contain the numbers of confirmed deaths, death increase, probable deaths, hospitalized patients, cumulative hospitalizations, confirmed positives, viral positive cases, positive increase, etc. The available data run from January 22, 2020 to November 24, 2020 (now).

3 | METHOD

3.1 | A Vector Autoregressive Panel Time Series Model

The VAR model was proposed by Christopher Sims in the 1980s. It regresses all current variables in the model on their lagged values. It is an extension of the AR (autoregression) model, which has been widely used for time series. The VAR model takes each endogenous variable as a function of the lagged values of all endogenous variables in the system, thus extending the univariate autoregressive model to a "vector" autoregressive model composed of multiple time series variables.

Let $X_t$ be a causal, stationary multivariate process. Then the VAR model can be expressed as

$$X_t = \alpha + \Phi_1 X_{t-1} + \cdots + \Phi_p X_{t-p} + a_t \qquad (1)$$

where
1. $X_t = (X_{t1}, \ldots, X_{tm})^T$ is an $m \times 1$ vector;
2. $\Phi_k$ is a real-valued $m \times m$ matrix for each $k = 1, \ldots, p$;
3. $a_t$ is multivariate white noise with covariance matrix $E[a_t a_t^T] = \Gamma_a$;
4. $\alpha = (I - \Phi_1 - \cdots - \Phi_p)\mu$, with $\mu = E(X_t)$ and $I = \mathrm{diag}\{1, 1, \ldots, 1\}$ the $m \times m$ identity matrix.

$X_t$ is then called a VAR($p$) process, that is, a vector AR process of order $p$. Equation (1) can be written in multivariate operator notation as $\Phi(B)(X_t - \mu) = a_t$, where $\Phi(B) = I - \Phi_1 B - \cdots - \Phi_p B^p$ and $B^k X_t = X_{t-k}$.

A multivariate process $X_t$ satisfying the difference equation in Equation (1) is a stationary and causal VAR($p$) process if and only if the roots of the determinantal equation $|\Phi(z)| = |I - \Phi_1 z - \cdots - \Phi_p z^p| = 0$ lie outside the unit circle. For a detailed proof, see Brockwell et al. and Reinsel et al. [29, 30]

3.2 | Variables potentially correlated to outcome

Several variables might influence the number of positive COVID-19 cases according to the literature [15, 23, 24, 25, 26, 27, 28]: undetected infected, detected deaths, detected recovered, average temperature, precipitation, wind speed, humidity, population density, social trust, and civic engagement, which are considered in other publications on epidemic prediction.

According to Chowdhury et al. [31], climate change directly affects the transmission of five infectious diseases. Altered climatic conditions may increase the vector biting rate and the vector's reproduction rate and shorten the pathogen incubation period. Furthermore, according to the report, when the temperature is higher than 25.0 °C, there is a significant negative correlation between increasing temperature and pneumonia (p = 0. ). [31] That is, as the temperature falls below 25.0 °C, pneumonia would spread faster. [31] In Liu et al. [32], when the temperature is lower than 13.0 °C, the number of hospital admissions increases, which means that the speed of infection also rises. These findings are the reason why confirmed positive COVID-19 cases show a rebound after October. [32]

According to data reported by Tasci et al. [33], across periods of high, normal, and low humidity, the number of days with pneumonia admissions was higher at high humidity (p < . ). As a result, the speed of COVID-19 transmission would increase in high-humidity conditions. In other words, confirmed positive cases show a significant positive relationship with humidity. [33]

According to Brundage et al. [34], the pneumonia rate has a strong positive correlation with mortality. An increase in mortality would be associated with COVID-19 spreading faster than before. However, the increase in the number of deaths would occur after the rise in COVID-19 transmission. [34]

Recovered cases should also have a negative correlation with COVID-19. As recovered cases increase, the number of patients carrying the virus should fall. Fewer patients carrying the virus would in turn correspond to a lower spread of the virus. When the recovered cases are increasing, the transmission of COVID-19 would be controlled.

3.3 | Model selection

As shown in Section 3.1, we need to determine the lag order $p$ of the VAR model. Several criteria are available for finding the optimal $p$: the Akaike Information Criterion (AIC), the Hannan-Quinn Criterion (HQC), the Schwarz Criterion (SC), and the Final Prediction Error (FPE). Specifically, AIC is an estimator of out-of-sample prediction error and thereby of the relative quality of statistical models for a given set of data. [35, 36] Suppose that we have a statistical model of some data. Let $k$ be the number of estimated parameters in the model and $L$ be the maximum value of the likelihood function for the model. Then the AIC value of the model is $\mathrm{AIC} = 2k - 2\ln(L)$. [37, 38] HQC is an alternative to AIC and the Bayesian information criterion (BIC). It is given as $\mathrm{HQC} = -2\ln(L) + 2k\ln(\ln(n))$, where $n$ is the number of observations.

The Schwarz criterion (SC) is given as $\mathrm{SC} = \log(n)\,k - 2\log(L(\hat{\theta}))$, where $\theta$ is the set of all parameter values and $L(\hat{\theta})$ is the likelihood of the model given the data, evaluated at the maximum likelihood values of $\theta$. The Final Prediction Error (FPE) criterion provides a measure of model quality by simulating the situation where the model is tested on a different data set. It is given as

$$\mathrm{FPE} = \det\left(\frac{1}{n}\sum_{t=1}^{n} e(t, \hat{\theta}_i)\, e(t, \hat{\theta}_i)^T\right)\left(\frac{1 + d/n}{1 - d/n}\right),$$

where $n$ is the number of values in the estimation data set, $e(t)$ is an $n$-by-1 vector of prediction errors, $\hat{\theta}_i$ represents the $i$-th estimated parameters, and $d$ is the number of estimated parameters.

The model is estimated by ordinary least squares (OLS). In addition, the model residuals are examined to check whether the VAR model assumptions are met.

4 | RESULT

4.1 | Preliminary analysis

According to the literature and correlation analysis, we include cumulative deaths, cumulative recovered patients, temperature, and humidity in the VAR model. Because we predict positive cases nationwide, we use the climate data of Washington D.C. as a representative.

The correlation analyses are shown in Figure 1. Even though 'Death' and 'Humidity' have relatively small correlation coefficients with the daily positive case increase, 0.071 and 0.016, we still keep these two variables, because COVID-19 has been verified to be correlated with cumulative death cases and with humidity. [33, 34]

A descriptive analysis of cumulative death cases, cumulative recovered cases, temperature, and humidity is shown in Figure 2. The cumulative death cases and the cumulative recovered cases rise steadily.

Since the VAR model requires co-integration between daily positive cases and the other selected variables, we run a co-integration test (Engle-Granger test) on all the variables (series). The test pairs the daily increase in positive cases with each of the other variables. The null hypothesis is that there is no co-integration relationship between the two variables. If the variables are all co-integrated with daily positive cases, we can claim that they are stably correlated in the long run. Results are given in Appendix Table 3. As all p-values are less than 0.05, all variables are co-integrated with the daily increase in positive cases. Based on these results, we consider that there is a stable relationship and no spurious regression in the constructed model.
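The Engle-Granger procedure described above can be illustrated with a minimal sketch. It assumes NumPy, uses synthetic data rather than the COVID-19 series, and omits the lag augmentation that statistical packages normally add. The function name `engle_granger_stat` and the critical value of about -3.34 (an approximate asymptotic 5% value for two series with a constant) are assumptions for illustration, not the implementation used in this study.

```python
import numpy as np


def engle_granger_stat(y, x):
    """Engle-Granger two-step statistic for two series (no lag augmentation).

    Step 1: regress y on a constant and x by OLS.
    Step 2: Dickey-Fuller regression on the residuals,
            diff(e_t) = rho * e_{t-1} + u_t, and return the t-ratio of rho.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    # Step 1: cointegrating regression y_t = b0 + b1 * x_t + e_t.
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta

    # Step 2: Dickey-Fuller regression (no constant) on the residuals.
    lagged = resid[:-1]
    delta = np.diff(resid)
    rho = (lagged @ delta) / (lagged @ lagged)
    err = delta - rho * lagged
    sigma2 = (err @ err) / (len(delta) - 1)
    se_rho = np.sqrt(sigma2 / (lagged @ lagged))
    return rho / se_rho


# Synthetic illustration (not the COVID-19 data): x is a random walk and
# y shares its stochastic trend, so the pair is cointegrated by construction.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))
y = 2.0 * x + rng.normal(size=500)

stat = engle_granger_stat(y, x)
# Assumed approximate asymptotic 5% critical value (two series, constant).
CRITICAL_5PCT = -3.34
print(f"EG statistic = {stat:.2f}, reject no-cointegration: {stat < CRITICAL_5PCT}")
```

A statistic below the critical value rejects the null hypothesis of no co-integration. In practice, a library routine that reports p-values, such as `coint` in statsmodels, would likely be preferable to this hand-rolled statistic.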

F I G U R E 1

Correlation matrix plot. (Death: cumulative death cases; Pos.Case: daily positive cases; Recovered: cumulative recovered cases; Temp: temperature)

4.2 | Model selection

To determine the lag order of our VAR model, we take AIC, HQC, SC, and FPE into consideration (results are given in Appendix Table 4). The optimal lag order is determined to be 8.

With the suggested lag order of 8, we estimate our model using the ordinary least squares technique. The parameter estimates are shown in Table 1.

To verify the assumptions of the VAR model, we plot the residuals and the residual autocorrelations, as shown in Appendix Figure 4. The mean of the residuals is almost zero (-1.45e-14) and the autocorrelation coefficients are within the 95% confidence interval (CI; blue dotted line). We also test the residuals with the Ljung-Box test and obtain a p-value of 0.20 (the null hypothesis is that the data are independently distributed). Hence, the fitted model satisfies the assumptions mentioned above: $E(e_t) = 0$ and $E(e_t e_{t-k}^T) = 0$ for $k \neq 0$, where $e_t$ is the residual at time $t$.

4.3 | COVID-19 daily positive cases prediction
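Forecasts of the kind reported in this section are obtained by estimating the VAR coefficients with OLS and then iterating Equation (1) forward with the noise term set to zero. The following is a minimal sketch under stated assumptions: it uses NumPy, a synthetic two-variable VAR(1) rather than the five-variable, lag-8 model of this study, and hypothetical helper names (`fit_var`, `is_stable`, `forecast`). It produces point forecasts only and does not compute confidence intervals.

```python
import numpy as np


def fit_var(data, p):
    """OLS fit of a VAR(p); data has shape (T, m). Returns (alpha, [Phi_1..Phi_p])."""
    T, m = data.shape
    # Regressor row for each t >= p: [1, X_{t-1}, ..., X_{t-p}].
    rows = [np.concatenate([[1.0]] + [data[t - k] for k in range(1, p + 1)])
            for t in range(p, T)]
    Z = np.array(rows)
    Y = data[p:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # shape (1 + m * p, m)
    alpha = B[0]
    phis = [B[1 + k * m:1 + (k + 1) * m].T for k in range(p)]
    return alpha, phis


def is_stable(phis):
    """True if all companion-matrix eigenvalues lie inside the unit circle.

    Equivalent to the roots of |I - Phi_1 z - ... - Phi_p z^p| = 0
    lying outside the unit circle.
    """
    p, m = len(phis), phis[0].shape[0]
    companion = np.zeros((m * p, m * p))
    companion[:m, :] = np.hstack(phis)
    companion[m:, :-m] = np.eye(m * (p - 1))
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)


def forecast(data, alpha, phis, steps):
    """Iterate X_{t+1} = alpha + sum_k Phi_k X_{t+1-k} from the end of the sample."""
    history = [row for row in data[-len(phis):]]
    out = []
    for _ in range(steps):
        nxt = alpha + sum(phi @ history[-k] for k, phi in enumerate(phis, start=1))
        history.append(nxt)
        out.append(nxt)
    return np.array(out)


# Synthetic VAR(1) with known coefficients (illustrative values only).
rng = np.random.default_rng(1)
true_phi = np.array([[0.5, 0.1], [0.0, 0.3]])
true_alpha = np.array([1.0, 2.0])
X = np.zeros((2000, 2))
for t in range(1, 2000):
    X[t] = true_alpha + true_phi @ X[t - 1] + rng.normal(scale=0.1, size=2)

alpha_hat, phis_hat = fit_var(X, p=1)
print("stable:", is_stable(phis_hat))
print(forecast(X, alpha_hat, phis_hat, steps=3))
```

Checking stability before forecasting matters because an unstable fit would produce forecasts that diverge as the horizon grows.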

The objective is to predict the trend of the daily increase in positive cases. We predict 30-day daily positive cases starting from July 2, August 21, and November 24, respectively, for internal validation purposes. We pick these three dates for particular reasons. First, the 30-day daily increases in positive cases after July 2 and August 21 do not fluctuate much, so they serve as a preliminary test of model performance. Critically, the trend after September 6 suddenly becomes steep, although the trend before September 6 is similar to the trends before July 2 and before August 21. This could mislead the model when it tries to distinguish the three situations. We want to test whether the model predicts the rapid increase after September 6 correctly.

F I G U R E 2

Time series from March 25 to November 24, 2020: (a) Cumulative deaths from COVID-19 in the U.S.; (b) Cumulative recovered cases; (c) Temperature in Washington D.C.; (d) Humidity in Washington D.C.

TA B L E 1

VAR model parameter estimates

Lag order     Death   Pos.Case   Recovered     Temp   Humidity
1              3.62       2.29        7.49     1.30       4.97
2             -5.07      -1.28       -3.46    -3.68      -1.21
3              3.25      -4.38       -1.53    -2.96       4.12
4             -5.00     -40.62        9.61     2.66      -3.91
5              9.06       2.09        1.67    -1.61      -1.69
6              7.43      -2.46       -2.92    -3.96       8.07
7             -6.89       5.80       -1.50     3.61      -6.47
8              1.53      -1.29        1.94    -5.22      -2.83

Note: the “constant” item is 8327.24.

F I G U R E 3

Real data (black line) and prediction (red line) with 95% confidence interval (blue dotted line). (i) For validation purposes: (a) 30-day prediction from July 2, 2020; (b) 30-day prediction from August 21, 2020; (c) 30-day prediction from September 6, 2020. (ii) Real prediction: (d) 30-day prediction from November 24, 2020.

As mentioned, the first three plots (a)-(c) in Figure 3 are for internal validation purposes. The model appears useful, since the real data (black) are covered by the predicted 95% confidence interval (blue dotted line). Specifically, in Figure 3 (a), the black line is the real daily confirmed positive case data, which are stable for the first three months at around 40,000 cases per day. Then, in the middle of June, the real data begin to increase. The short red line is our prediction from the proposed model. The black line and the red line almost overlap. Note that the black line fluctuates slightly more than the red line, but it is mostly covered by the 95% confidence interval. We consider this a satisfactory prediction. In Figure 3 (b), after the middle of July, the number of daily confirmed positive cases passes its first peak of 80,000 cases and decreases. Both the black and red lines then show a stable trend, and the figure looks almost the same as the first 30-day prediction. In Figure 3 (c), the real data first show a decreasing trend. At the beginning of September, however, the number of daily confirmed positive cases rebounds and climbs straight to the second peak value.
The peak value even reaches two hundred thousand cases in one day. Our predicted values are a little lower than the real values. Possible reasons include Halloween parties and other gatherings, which took place at the end of October and could lead to many COVID-19 cases being confirmed in early November. These are extrinsic factors beyond the ones in our VAR model. As a result, it is reasonable that the black line is higher than the red prediction line and the upper bound of the confidence interval. The success is that the model correctly predicts the rapidly increasing trend after August 21.

The last plot (d) in Figure 3 is our main result. It predicts the daily positive COVID-19 cases for the 30 days starting from November 24 (now), that is, the unknown future trend (to December 24). The prediction indicates that cases will keep growing over the next 30 days if the government takes no new measures to control the transmission of COVID-19. Around Christmas, the predicted number of daily positive cases is around 240,000 in the U.S.

Table 2 shows a comparison of real values and predictions with a 95% confidence interval.

TA B L E 2

The real values and predictions of the daily increase in positive cases on Tuesdays, with 95% confidence intervals.

Date         Real value   Prediction   Lower 95% CI   Upper 95% CI
2020-07-09        58961        64116          58240          69993
2020-07-16        70446        66497          57820          75173
2020-07-23        71225        66129          56216          76043
2020-07-30        68806        71059          60634          81484
2020-08-25        36588        35466          28817          42115
2020-09-01        42426        30940          21343          40536
2020-09-08        22137        30037          17357          42717
2020-09-15        34904        32092          16668          47515
2020-11-03        86662        86890          77433          96348
2020-11-10       131182       104960          89508         120411
2020-11-17       156722       121455         100033         142878
2020-11-24       167012       152604         123769         181438
2020-12-01                    176995         161998         191993
2020-12-08                    185548         160309         210787
2020-12-15                    194945         159893         229998
2020-12-22                    208196         164852         251539

Note: Real values for December 1, December 8, December 15, and December 22 are blank because real data were not yet available at the time of writing.
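The interval coverage in Table 2 can be checked with a few lines of code. The sketch below copies the twelve rows of Table 2 that have a real value and counts how many real values fall inside the 95% interval; the names `TABLE_2` and `coverage` are illustrative.

```python
# Rows of Table 2 with a real value: (date, real, prediction, lower, upper).
TABLE_2 = [
    ("2020-07-09", 58961, 64116, 58240, 69993),
    ("2020-07-16", 70446, 66497, 57820, 75173),
    ("2020-07-23", 71225, 66129, 56216, 76043),
    ("2020-07-30", 68806, 71059, 60634, 81484),
    ("2020-08-25", 36588, 35466, 28817, 42115),
    ("2020-09-01", 42426, 30940, 21343, 40536),
    ("2020-09-08", 22137, 30037, 17357, 42717),
    ("2020-09-15", 34904, 32092, 16668, 47515),
    ("2020-11-03", 86662, 86890, 77433, 96348),
    ("2020-11-10", 131182, 104960, 89508, 120411),
    ("2020-11-17", 156722, 121455, 100033, 142878),
    ("2020-11-24", 167012, 152604, 123769, 181438),
]


def coverage(rows):
    """Return (count of real values inside the interval, dates outside it)."""
    outside = [date for date, real, _, lo, hi in rows if not lo <= real <= hi]
    return len(rows) - len(outside), outside


inside, outside = coverage(TABLE_2)
print(f"{inside} of {len(TABLE_2)} real values fall inside the 95% interval")
for date, real, _, lo, hi in TABLE_2:
    if date in outside:
        # Signed distance from the real value to the bound it violates.
        gap = real - hi if real > hi else real - lo
        print(f"{date}: real value is {gap:+d} relative to the nearest bound")
```

On these rows, 9 of the 12 real values lie inside the interval. The exceptions are September 1, November 10, and November 17, where the real value is above the upper bound.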
Because the daily case increase reported on Monday partly reflects cases accumulated over the weekend, we compare the predicted values on each Tuesday with the real values. In Table 2, the real values are generally within the 95% confidence interval. For November 3, 10, 17, and 24, the model predicts 86,890, 104,960, 121,455, and 152,604 cases, with upper bounds of 96,348, 120,411, 142,878, and 181,438. The real values exceed the upper bounds on November 10 and 17, by 10,771 and 13,844 cases respectively. We still regard these predictions as acceptable, since the real values do not deviate far from the upper bound. The model performs well in capturing both a rapid increase and a regular trend.

5 | DISCUSSION

The study proposed and applied the VAR model for predicting the dynamics of daily positive COVID-19 cases. We selected relevant variables according to the literature and checked their correlation coefficients and co-integration. We evaluated our model by comparing the predicted values with the real values.

We can introduce more relevant variables in the future to improve performance if outside forces appear that influence, control, or exacerbate viral transmission. The most likely available variables are estimates of social distancing and the number of vaccinations. The model will then remain useful after a vaccine comes out, as these variables would enable it to predict the decrease in infections at the initial stage of vaccination. This is also the reason why we investigate the application of the VAR model to pandemic prediction. The VAR model differs from SIR and ARIMA and is better suited to this setting, because SIR and ARIMA perform unsatisfactorily when outside forces are involved.

The proposed model generalizes well because its structure is fixed and it is not limited to specific data. Based on this generalization, the model can be used to predict other epidemics with the same characteristics as COVID-19.

REFERENCES

[1] Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet 2020;395(10224):565–574.
[2] Chen Y, Liu Q, Guo D. Emerging coronaviruses: genome structure, replication, and pathogenesis. Journal of medical virology 2020;92(4):418–423.
[3] Tang B, Wang X, Li Q, Bragazzi NL, Tang S, Xiao Y, et al. Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions. Journal of clinical medicine 2020;9(2):462.
[4] Wang L, Kraemer R, Borngraeber J. An improved highly-linear low-power down-conversion micromixer for 77 GHz automotive radar in SiGe technology. In: 2006 IEEE MTT-S International Microwave Symposium Digest IEEE; 2006. p. 1834–1837.
[5] Kufel T, et al. ARIMA-based forecasting of the dynamics of confirmed Covid-19 cases for selected European countries. Equilibrium Quarterly Journal of Economics and Economic Policy 2020;15(2):181–204.
[6] Konarasinghe K. Modeling COVID-19 Epidemic of USA, UK and Russia. Journal of New Frontiers in Healthcare and Biological Sciences 2020;1(1):1–14.
[7] Guan Wj, Ni Zy, Hu Y, Liang Wh, Ou Cq, He Jx, et al. Clinical characteristics of 2019 novel coronavirus infection in China. MedRxiv 2020.
[8] Maleki M, Mahmoudi MR, Wraith D, Pho KH. Time series modelling to forecast the confirmed and recovered cases of COVID-19. Travel Medicine and Infectious Disease 2020;p. 101742.
[9] Malavika B, Marimuthu S, Joy M, Nadaraj A, Asirvatham ES, Jeyaseelan L. Forecasting COVID-19 epidemic in India and high incidence states using SIR and logistic growth models. Clinical Epidemiology and Global Health 2020.
[10] Dhanwant JN, Ramanathan V. Forecasting COVID 19 growth in India using Susceptible-Infected-Recovered (SIR) model. arXiv preprint arXiv:200400696 2020.
[11] Ndiaye BM, Tendeng L, Seck D. Analysis of the COVID-19 pandemic by SIR model and machine learning technics for forecasting. arXiv preprint arXiv:200401574 2020.
[12] Bastos SB, Cajueiro DO. Modeling and forecasting the Covid-19 pandemic in Brazil. arXiv preprint arXiv:200314288 2020.
[13] Fanelli D, Piazza F. Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos, Solitons & Fractals 2020;134:109761.
[14] Chen YC, Lu PE, Chang CS, Liu TH. A Time-dependent SIR model for COVID-19 with undetectable infected persons. IEEE Transactions on Network Science and Engineering 2020.
[15] Khan Z, Van Bussel F, Hussain F. A predictive model for Covid-19 spread–with application to eight US states and how to end the pandemic. Epidemiology & Infection 2020;148.
[16] Xu C, Yu Y, Yang Q, Lu Z. Forecast analysis of the epidemics trend of COVID-19 in the United States by a generalized fractional-order SEIR model. arXiv preprint arXiv:200412541 2020.
[17] Zeroual A, Harrou F, Dairi A, Sun Y. Deep learning methods for forecasting COVID-19 time-Series data: A Comparative study. Chaos, Solitons & Fractals 2020;140:110121.
[18] Shahid F, Zameer A, Muneeb M. Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos, Solitons & Fractals 2020;140:110212.
[19] Chimmula VKR, Zhang L. Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos, Solitons & Fractals 2020;p. 109864.
[20] Alzahrani SI, Aljamaan IA, Al-Fakih EA. Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions. Journal of infection and public health 2020;13(7):914–919.
[21] Sahai AK, Rath N, Sood V, Singh MP. ARIMA modelling & forecasting of COVID-19 in top five affected countries. Diabetes & Metabolic Syndrome: Clinical Research & Reviews 2020;14(5):1419–1427.
[22] Kumar P, Kalita H, Patairiya S, Sharma YD, Nanda C, Rani M, et al. Forecasting the dynamics of COVID-19 Pandemic in Top 15 countries in April 2020: ARIMA Model with Machine Learning Approach. medRxiv 2020.
[23] Siedner MJ, Harling G, Reynolds Z, Gilbert RF, Haneuse S, Venkataramani AS, et al. Social distancing to slow the US COVID-19 epidemic: Longitudinal pretest–posttest comparison group study. PLoS medicine 2020;17(8):e1003244.
[24] Behnood A, Golafshani EM, Hosseini SM. Determinants of the infection rate of the COVID-19 in the US using ANFIS and virus optimization algorithm (VOA). Chaos, Solitons & Fractals 2020;139:110051.
[25] Bialek S, Bowen V, Chow N, Curns A, Gierke R, Hall A, et al. Geographic differences in COVID-19 cases, deaths, and incidence - United States, February 12–April 7, 2020. 2020.
[26] Elgar FJ, Stefaniak A, Wohl MJ. The trouble with trust: Time-series analysis of social capital, income inequality, and COVID-19 deaths in 84 countries. Social Science & Medicine 2020;263:113365.
[27] Bruine de Bruin W. Age differences in COVID-19 risk perceptions and mental health: Evidence from a national US survey conducted in March 2020. The Journals of Gerontology: Series B 2020.
[28] James N, Menzies M. Cluster-based dual evolution for multivariate time series: Analyzing COVID-19. Chaos: An Interdisciplinary Journal of Nonlinear Science 2020;30(6):061108.
[29] Brockwell PJ, Davis RA, Fienberg SE. Time series: theory and methods. Springer Science & Business Media; 1991.
[30] Reinsel GC. Elements of multivariate time series analysis. Springer Science & Business Media; 2003.
[31] Chowdhury FR, Ibrahim QSU, Bari MS, Alam MJ, Dunachie SJ, Rodriguez-Morales AJ, et al. The association between temperature, rainfall and humidity with common climate-sensitive infectious diseases in Bangladesh. PLoS One 2018;13(6):e0199579.
[32] Liu Y, Kan H, Xu J, Rogers D, Peng L, Ye X, et al. Temporal relationship between hospital admissions for pneumonia and weather conditions in Shanghai, China: a time-series analysis. BMJ open 2014;4(7).
[33] Tasci SS, Kavalci C, Kayipmaz AE. Relationship of meteorological and air pollution parameters with pneumonia in elderly patients. Emergency Medicine International 2018;2018.
[34] Brundage JF, Shanks GD. Deaths from bacterial pneumonia during 1918–19 influenza pandemic. Emerging infectious diseases 2008;14(8):1193.
[35] McElreath R. Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press; 2020.
[36] Taddy M. Business data science: Combining machine learning and economics to optimize, automate, and accelerate business decisions. McGraw Hill Professional; 2019.
[37] Burnham KP, Anderson DR. A practical information-theoretic approach. Model selection and multimodel inference, 2nd ed Springer, New York 2002;2.
[38] Akaike H. A new look at the statistical model identification. IEEE transactions on automatic control 1974;19(6):716–723.

Appendix

TA B L E 3

Result for Engle-Granger test (co-integration test)

Variables                    Statistics   p-value   Co-integration
Cumulative death cases           0.0656   < .0001   Y
Cumulative recovered cases       0.0864   < .0001   Y
Temperature                      0.0472   < .0001   Y
Humidity                         0.0373   < .0001   Y

TA B L E 4

Lag order selection: AIC, HQC, SC, and FPE

Lag order    AIC    HQC     SC    FPE
1           5.57   5.59   5.62   1.57
2           5.50   5.54   5.59   7.82
3           5.48   5.53   5.61   6.52
4           5.47   5.53   5.63   5.67
5           5.45   5.53   5.65   4.76
6           5.44   5.54   5.68   4.45
7           5.42   5.52   5.69   3.42
8           5.39   5.51   5.69   2.49
9           5.39   5.53   5.73   2.55
10          5.39   5.55   5.78   2.76
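A lag-order table like Table 4 could be produced along the following lines. This is a sketch, not the code used in this study: it assumes NumPy, synthetic data from a two-variable VAR(2), and the standard multivariate forms of the criteria based on the log-determinant of the residual covariance matrix, which differ in scaling from the generic likelihood-based formulas in Section 3.3. The function name `lag_criteria` is hypothetical.

```python
import numpy as np


def lag_criteria(data, max_lag):
    """Information criteria for VAR(p), p = 1..max_lag, on a common sample.

    With T effective observations, m variables, and Sigma_p the ML residual
    covariance of the VAR(p) fit (assumed standard multivariate forms):
        AIC = ln|Sigma_p| + 2 p m^2 / T
        HQC = ln|Sigma_p| + 2 ln(ln T) p m^2 / T
        SC  = ln|Sigma_p| + ln(T) p m^2 / T
        FPE = ((T + m p + 1) / (T - m p - 1))^m |Sigma_p|
    """
    n_obs, m = data.shape
    T = n_obs - max_lag                 # same estimation sample for every p
    Y = data[max_lag:]
    table = {}
    for p in range(1, max_lag + 1):
        # Column block k holds X_{t-k} for t = max_lag, ..., n_obs - 1.
        lags = [data[max_lag - k:n_obs - k] for k in range(1, p + 1)]
        Z = np.hstack([np.ones((T, 1))] + lags)
        B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        resid = Y - Z @ B
        sigma = resid.T @ resid / T
        logdet = np.linalg.slogdet(sigma)[1]
        k = p * m * m
        table[p] = {
            "AIC": logdet + 2 * k / T,
            "HQC": logdet + 2 * np.log(np.log(T)) * k / T,
            "SC": logdet + np.log(T) * k / T,
            "FPE": ((T + m * p + 1) / (T - m * p - 1)) ** m * np.exp(logdet),
        }
    return table


# Synthetic stable VAR(2) (illustrative coefficients, not estimates from data).
rng = np.random.default_rng(2)
phi1 = np.array([[0.3, 0.1], [0.0, 0.3]])
phi2 = np.array([[0.4, 0.0], [0.0, 0.4]])
X = np.zeros((1500, 2))
for t in range(2, 1500):
    X[t] = phi1 @ X[t - 1] + phi2 @ X[t - 2] + rng.normal(size=2)

criteria = lag_criteria(X, max_lag=5)
best_sc = min(criteria, key=lambda p: criteria[p]["SC"])
for p, row in criteria.items():
    print(p, {name: round(float(v), 3) for name, v in row.items()})
print("lag order chosen by SC:", best_sc)
```

Fitting every candidate order on the same effective sample keeps the criteria comparable across rows; each criterion is then minimized over the lag order.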