A Data-driven Understanding of COVID-19 Dynamics Using Sequential Genetic Algorithm Based Probabilistic Cellular Automata
Sayantari Ghosh (a), Saumik Bhattacharya (b, ∗)
(a) Department of Physics, National Institute of Technology Durgapur, India
(b) Department of E & ECE, Indian Institute of Technology Kharagpur, India
Abstract
The COVID-19 pandemic is severely impacting the lives of billions across the globe. Even after taking massive protective measures like nation-wide lockdowns, discontinuation of international flight services, rigorous testing etc., the infection spreading is still growing steadily, causing thousands of deaths and a serious socio-economic crisis. Thus, the identification of the major factors of this infection spreading dynamics is becoming crucial to minimize the impact and lifetime of COVID-19 and any future pandemic. In this work, a probabilistic cellular automata based method has been employed to model the infection dynamics for a significant number of different countries. This study proposes that for an accurate data-driven modeling of this infection spread, cellular automata provide an excellent platform, with a sequential genetic algorithm for efficiently estimating the parameters of the dynamics. To the best of our knowledge, this is the first attempt to understand and interpret COVID-19 data using optimized cellular automata, through a genetic algorithm. It has been demonstrated that the proposed methodology can be flexible and robust at the same time, and can be used to model the daily active cases, total number of infected people and total death cases through systematic parameter estimation. Elaborate analyses of COVID-19 statistics of forty countries from different continents have been performed, with markedly divergent time evolution of the infection spreading because of demographic and socioeconomic factors. The substantial predictive power of this model has been established with conclusions on the key players in this pandemic dynamics.
Keywords:
Epidemiological model; Probabilistic cellular automata; Genetic algorithm; Real data modeling.

∗ Corresponding author
Email addresses: [email protected] (Sayantari Ghosh), [email protected] (Saumik Bhattacharya ∗)

Preprint submitted to Elsevier, August 28, 2020

1. Introduction

With its outbreak in Wuhan, China, Coronavirus disease-2019 (COVID-19) has spread across the world within a few months. Due to its explosive growth and considerable rate of fatality, the World Health Organization (WHO) declared COVID-19 a pandemic and a global health emergency [1]. According to the available statistics in June, 2020, the total number of infections by SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), the causative agent of this disease, is approaching 19 million around the world, causing around 700,000 deaths in 213 countries and territories, with no effective vaccination available in the market so far. Beyond respiratory discomforts including pneumonia, dry cough, cold and sneezing [2, 3], it has been reported to cause liver and gastrointestinal tract maladies, kidney dysfunction and heart inflammation in cases of severe infection [4, 5, 6]. This highly infectious disease transmits from person to person through respiratory droplets produced by an infected person. Fomite-mediated and nosocomially acquired infections are also being identified as important sources of viral diffusion [7, 8, 9]. A typical incubation time from exposure to symptoms has been reported for COVID-19, while infection transmission from asymptomatic individuals has been observed as well [10, 11, 12].

Immediately after the detection of human-to-human transmission, the government agencies of various countries started implementing several mitigation strategies to control the epidemic. The measures thus taken include social distancing, restrictions on domestic as well as international travel, cancelling social events, shutting down of public as well as commercial activities etc., which can effectively reduce the possibilities of physical human contact.
Moreover, contact tracing, aggressive testing as well as hospital or home quarantine for infected individuals and suspected cases have also been executed to track and prevent further spread. However, these strategies are directly contributing to enormous economic loss. The optimum estimation of this novel disease dynamics is emerging as a challenging problem in this context. The immense disruption caused by COVID-19, resulting in overwhelming disorder in the health, economy and lives of billions of people around the globe, has brought the necessity for accurate modelling of infectious diseases into focus. The effect and effectiveness of this complex interplay between differing length-scales and time-scales with the applied control strategies can only be understood and predicted with the help of precisely designed quantitative models.
With a tremendous effort from researchers around the world, a spectrum of various mathematical and computational approaches is being used to understand and predict COVID-19 statistics, addressing its different perspectives. In a rudimentary sense, the studies being pursued can be segmented into two categories: (i) data science and machine learning approaches and (ii) differential equation based mathematical modelling techniques. The first group of studies relied mostly on data mining from national/international repositories (e.g., WHO, country-specific data centres etc.) or popular social media platforms to [...]

Figure 1: An overview of the dynamics: (a) Object process diagram of the proposed model; (b) The schematic diagram of the disease transmission dynamics in the form of a modified SEIQR model. Transition probabilities $p_{se}$, $p_{ei}$, $p_{iq}$, $p_{ir}$ and $p_{qr}$ are pointed out. The associated state transition delays are indicated on the timeline of the disease dynamics. (c) Time evolution of the spatial lattice during spread of the infection in a population. The colors of the respective subpopulations (i.e., susceptible, exposed, infected, quarantined and removed) are the same as depicted in (a).

Table 1: Comparison of the proposed method with the state-of-the-art COVID-19 models

Basic methodology | Differential equation models | Data science approaches
References | [33-37], [39], [40] | [13-22]
Limitations | a) Homogeneous mixing; b) Most models are considered as deterministic | a) No way to track person-to-person transmission; b) No neighbourhood consideration
Contribution | The proposed method a) accommodates heterogeneity in population; b) includes stochasticity and probabilistic dynamics; c) estimates optimum epidemic dynamics parameters; d) considers neighbourhood and demography explicitly; e) performs robust prediction with limited data.

In this study, we propose a probabilistic cellular automata based dynamical model, optimised through a sequential genetic algorithm, for an accurate assessment of the extent of COVID-19 dynamics. The major motivation for using cellular automata (CA) is their ability to depict extremely complex macroscopic outcomes while being based on local interactions that rely on the interaction of a multitude of single individuals [41, 42]. This methodology is capable of giving a direct correspondence to the physical system and also rectifies the major drawbacks of ODE models by (i) tracking individual contact processes, (ii) giving room for introducing probabilistic individual behaviour, and (iii) capturing neighbourhood as well as global spatial information. For these reasons, CA based approaches have been successfully used as a competent substitute method to simulate physical, biological, environmental and social contagion-like spreading [43, 44, 45, 46]. For studying past epidemics as well as interpreting COVID-19, some studies have proposed cellular automata as an alternative method [47, 48, 49, 50]. However, capturing and interpreting the behaviour of real data through CA needs a large-scale parameter optimization that could be time consuming as well as sub-optimal. Thus, though extremely flexible and powerful, CA has not yet been optimized to understand and interpret COVID-19 data for countries worldwide. To explore this, in this study, a genetic algorithm (GA) has been employed, which is a well-known method for generating the optimal parameter subset through stochastic search procedures based on the principle of the survival of the fittest [51, 52, 53, 54, 55]. Crossover and mutation, two key operations of a genetic algorithm, help to optimize the parameter set efficiently in limited steps.
Cellular automata coupled with a genetic algorithm have been used before to explore evolutionary aspects of game theoretical problems [56], but to the best of our knowledge, analyzing and developing understanding from real pandemic data like COVID-19 using an optimized CA platform has not been attempted yet. The main contributions of this work are as follows:

• To build a CA model which is probabilistic, so that it can take into account demographic variations, neighbourhood diversity and uncertainties of real dynamics.
• To create an easily implementable framework where optimization using GA will be done sequentially for all parameters associated with the transition rules of the CA model for real data interpretation.
• To interpret and understand COVID-19 disease transmission dynamics with an optimized CA framework, which can be extended for prediction as well.

Through this, on one hand, one can track the individual contact process through time and space; on the other hand, a self-adapting process of evolutionary strategies has been created by designing the chromosome with parametric genes and establishing a fitness function that is maximised over the generations. The main limitations of the state-of-the-art algorithms and the major contributions of the proposed method are listed in Table 1 for a clear understanding. The main rationale behind this approach is that it is extremely difficult to find the optimal parameters of the complex spatial epidemiological model using random search or analytical techniques. The proposed GA based framework helps to search the parameter space more efficiently for the optimal performance of the entire algorithm.

The rest of this article is organized as follows: Section 2 includes the proposed concepts of the epidemiological model, probabilistic cellular automata and the sequential genetic algorithm used in this work. In Section 3, the results are discussed elaborately, where the optimized CA model has been employed for simultaneously understanding as well as analyzing active infections, total infections and total deaths caused by COVID-19 for several countries, considering the demographic and spatial population density variations. Section 4 comprises the concluding remarks.
2. Proposed Methodology
An object process diagram of the proposed method is depicted in Fig. 1(a). The methodology starts with the infection spreading according to the SEIQR epidemiological model in a random human population over a 2D grid, initialized on a country-specific basis. The parameters of the epidemiological model are continuously optimized using the proposed sequential genetic algorithm to match the real country-specific infection spread data. The proposed methodology consists of three distinct parts: (A) the epidemiological model that governs the infection spreading, (B) probabilistic cellular automata (PCA) to model the dynamics of the pandemic spread and (C) optimization of the parameters associated with PCA using a genetic algorithm (GA) to fit real-world data.

Table 2: Descriptions of the parameters used in the proposed work

Notation | Description
$L$ | Spatial lattice
$A$ | Set of possible states on lattice
$n^t_i$ | Total number of people at state $a_i$ at time $t$
$\Omega_d$ | $d$-neighbourhood of $x \in L$
$\Lambda^a_t$ | Mapping $L \to a$ at time $t$
$p^t_{ij}$ | Probability at time $t$ that $x \in L$ moves from $a_i$ to $a_j$
$\tau_{ij}$ | Transitional delay for $x$ to move from $a_i$ to $a_j$
$e_t$, $i_t$ | Number of exposed and infected people in the $d$-neighbourhood of $x$ at time $t$
$p_e$, $p_i$ | Probabilities that an exposed or an infected person spreads the infection to a susceptible person when they meet
$\Theta$ | A gene containing all the parameters of the PCA method
$B$ | Binary encoded representation of $\Theta$
$G(\Theta)$ | The PCA model with parameter $\Theta$
$y$ | Time series of an epidemiological state in a country
$\hat{y}$ | Time series estimate of epidemiological state from PCA
$e^j_i$ | Estimation error of the $j$th gene in the $i$th generation
$N_g$ | Total number of chromosomes in the gene pool
$F$ | Number of parents selected for mating from $N_g$
$p_\beta$ | Fraction of $r_t$ that recovers from the disease
$\rho$ | Fraction of parents $F$ that lives in the next generation

In the epidemiological model, the entire population is partitioned into five distinct parts. At the very beginning, every person is healthy but vulnerable to the infection. These people are denoted as the susceptible ($S$) subpopulation. At time instance $t = 0$, some people in the population get exposed to the infection from some known or unknown source. These exposed people do not have any particular symptom of the infection, but they can spread the infection to the susceptible people. These asymptomatic people are referred to as the exposed ($E$) subpopulation. At time instance $t = 0$, there are also some people who have clear symptoms of the infection and who also have the potential to spread the infection among susceptible people. These symptomatic people are considered as the infected ($I$) subpopulation. After an incubation period, some of the exposed people show the symptoms of the infection and move to subpopulation $I$. Because of the health facilities and testing time, the infected people are detected with some average delay and put into quarantine. The people who are quarantined cannot spread the infection to other people, though they themselves remain in the infectious stage. These people are denoted as the quarantined ($Q$) subpopulation. Both the quarantined people and the infected (but not detected) people eventually come out of the infectious stage, and after that they no longer contribute to the infection spreading dynamics. These people are denoted as the removed ($R$) subpopulation in the model. This removed subpopulation contains two kinds of people: those who have recovered from the infection completely and neither infect nor get infected in future, and those who have died due to the severity of the infection. A schematic diagram of the transitions, probabilities and timelines corresponding to the dynamics of infection is shown in Fig. 1(b). In the analysis, normalized subpopulations are considered, and the respective normalized subpopulation is denoted using the same lowercase character. For example, the normalized susceptible and infected subpopulations are denoted by $s$ and $i$ respectively.
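The compartments and transitions described above can be written down compactly. The following is a minimal sketch, not the authors' implementation: the names `State`, `ALLOWED`, `is_allowed` and `normalized` are hypothetical, and the transition table only encodes the moves stated in the text (S to E, E to I, I to Q, I to R, Q to R).

```python
from collections import Counter
from enum import Enum


class State(Enum):
    S = "susceptible"
    E = "exposed"
    I = "infected"
    Q = "quarantined"
    R = "removed"


# Transitions named in the text: S->E, E->I, I->Q, I->R and Q->R.
# R is absorbing (recovered or deceased people leave the dynamics).
ALLOWED = {
    State.S: {State.E},
    State.E: {State.I},
    State.I: {State.Q, State.R},
    State.Q: {State.R},
    State.R: set(),
}


def is_allowed(src, dst):
    """Return True if the model permits a direct move from src to dst."""
    return dst in ALLOWED[src]


def normalized(people):
    """Normalized subpopulations (s, e, i, q, r) for a list of State values."""
    if not people:
        raise ValueError("population must not be empty")
    counts = Counter(people)
    total = len(people)
    return {state: counts[state] / total for state in State}
```

For instance, `normalized([State.S, State.S, State.E, State.I])` gives a susceptible fraction of 0.5, matching the lowercase notation $s$ used in the text.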
As shown in Fig. 1(c), this epidemiological time evolution has been implemented on a 2D lattice using PCA as discussed below.

Let $L$ be a finite subset of $\mathbb{Z}^2$ at time instance $t$, denoted as $L \Subset \mathbb{Z}^2$, which defines a regular 2D lattice. Every point on this lattice $x \in L$ can acquire a finite number of states $A$. In this particular problem, the set $A$ can be defined as $A = \{0, s, e, i, q, r\}$, where the terms $s$, $e$, $i$, $q$ and $r$ denote the particular possible states of infection as discussed in Sec. 2.1, and $0$ denotes no human occupant or an empty space. At time $t = 0$, $n_i$ points are randomly selected on $L$ and assigned the state $a_i$, where $i \in A$. The total initial population is defined as $N = \sum_{i \in A \setminus 0} n_i$. At any instance of time $t$, $n^t_i$, $i \in A \setminus 0$, denotes the total number of people at state $a_i$.

For the neighbourhood criterion, a modified-Moore neighbourhood or $d$-neighbourhood has been used. A finite subset $\Omega_d \Subset \mathbb{Z}^2$ is defined, containing the origin $\mathbf{0} = (0, 0)$; the number of neighbours for a given $d$ is $4d(d + 1)$. A general probabilistic cellular automaton (PCA) is a stochastic process that describes a sequence of mappings $\Lambda^a_t : L \to a$, $a \in A$, where any particular state $\Lambda^a_t(x)$ of $x \in L$ at a particular time instance $t$ depends, with certain probabilities, on the previous states of the $d$-neighbourhood of $x$, denoted as $x + \Omega_d = \{x + \omega : \forall \omega \in \Omega_d\}$. More precisely, in COVID-19 infection spread, $\Lambda^E_t(x)$ is decided by $\Lambda_{t-1}(x + \omega)$, $\forall \omega \in \Omega_d$. The other mappings $\Lambda^a_t(x)$, $a \in A \setminus E$, depend on the sequence of states $\Lambda^a_\kappa(x)$, $0 \leq \kappa < t$.

The transition probability $p^t_{a_i a_j}$ denotes the probability of transition at time $t$ from state $a_i$ to state $a_j$, where $a_i, a_j \in A$. Without any loss of generality, $p^t_{a_i a_j}$ is denoted as $p^t_{ij}$ and the transition from state $a_i$ to $a_j$ as $a_{ij}$ in the rest of the discussion for a simpler notation.
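As an illustration of the $d$-neighbourhood, the sketch below enumerates its offsets, counts exposed and infected neighbours of a cell, and evaluates the exposure probability in the product form $p^t_{se} = 1 - (1 - p_i)^{i_{t-1}} (1 - p_e)^{e_{t-1}}$ that the transition-rule discussion of this section gives. It is only a sketch under stated assumptions: the function names and the string cell labels are hypothetical, the origin is excluded when counting neighbours, and cells outside the lattice are simply ignored (the source does not state how boundaries are handled).

```python
def d_neighbourhood(d):
    """Offsets of the d-neighbourhood around a cell, origin excluded."""
    return [(dx, dy)
            for dx in range(-d, d + 1)
            for dy in range(-d, d + 1)
            if (dx, dy) != (0, 0)]


def count_neighbours(grid, x, y, d):
    """Count exposed ('e') and infected ('i') cells around grid[x][y].

    Assumption: a non-periodic lattice, so offsets that fall outside
    the grid are skipped.
    """
    rows, cols = len(grid), len(grid[0])
    exposed = infected = 0
    for dx, dy in d_neighbourhood(d):
        nx, ny = x + dx, y + dy
        if 0 <= nx < rows and 0 <= ny < cols:
            if grid[nx][ny] == "e":
                exposed += 1
            elif grid[nx][ny] == "i":
                infected += 1
    return exposed, infected


def exposure_probability(p_e, p_i, exposed, infected):
    """Probability that a susceptible cell becomes exposed in one step."""
    # p_ss is the probability of escaping infection from every neighbour.
    p_ss = (1 - p_i) ** infected * (1 - p_e) ** exposed
    return 1 - p_ss
```

The offset list has $(2d + 1)^2 - 1 = 4d(d + 1)$ entries, which agrees with the neighbour count quoted in the text (8 for $d = 1$, 24 for $d = 2$).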
In cases where $a_i \neq a_j$, $p^t_{ij}$ is referred to as the state transitional probability, and if $a_i = a_j$, $p^t_{ii}$ is called the self transitional probability.

If a state transition $a_{ij}$, $i \neq j$, happens in $x$ at time $t$ following the transition probability $p^t_{ij}$ and the transition $a_{ij}$ has a transitional delay $\tau_{ij}$, then

$$p^t_{ij} = \begin{cases} 0 & \text{if } t < t_{ui} + \tau_{ij} \\ p_{ij} & \text{if } t \geq t_{ui} + \tau_{ij} \end{cases}$$

where $t_{ui}$ is the time instance when the transition $a_{ui}$, $u \neq i$, happened. In this infection diffusion model, only the state transitional probabilities $p^t_{se}$, $p^t_{ei}$, $p^t_{iq}$, $p^t_{qr}$ and $p^t_{ir}$ are considered to be nonzero at certain instances of time, and for all the other transitional probabilities, $\tau_{ij}$ is set to infinity, where $p_{ij}$ and $\tau_{ij}$ are user-defined parameters. However, for the transition $a_{se}$, $t_{ui}$ and $\tau_{ij}$ are set to zero, and for $x \in L$, let us define $p^t_{se} = p_{ij} = 1 - p^t_{ss}$ and the self-transition probability

$$p^t_{ss} = (1 - p_i)^{i_{t-1}} (1 - p_e)^{e_{t-1}}$$

where $i_{t-1}$ and $e_{t-1}$ are the number of cells in states $i$ and $e$ respectively in the $\Omega_d$ neighbourhood of $x$ at time $t - 1$. Here $p_e$ and $p_i$ are defined as 'infection probabilities', which can be considered as the probabilities that a susceptible person becomes exposed to the infection when that person meets an exposed or an infected person respectively. An empty cell does not contribute to the infection spread, and thus its self transitional probability is $p^t_{00} = 1$, $\forall t$. Among the total removed population $r_t$ at time instance $t$, a population fraction $p_\beta r_t$ is considered to recover from the infection at time instance $t$ and acquire long-term immunity towards the disease, and a population fraction $(1 - p_\beta) r_t$ is considered to be deceased. The removed population $r_t$ is not considered further in the infection dynamics and it is taken that $p^{t'}_{rr} = 1$, $t' > t$.

Figure 2: Time series data for active cases (blue) of the COVID-19 pandemic in different countries where the peak of the first wave of the infection spread has passed, and estimated active cases (red) from the proposed PCA-GA method.

2.3. Parameter optimization using GA

Though PCA has the potential to model the probabilistic transition of states on a spatial lattice, the main challenge in using it for modeling a real-world scenario is to find the optimal parameters for the PCA. As the search space for the proposed PCA model is very large, it is practically impossible to search for the optimal parameter setting manually to analyse the characteristics of the infection spread from real data. Thus, a genetic algorithm (GA) has been applied to find the optimal parameter set given real time-series data.

Let us assume a discrete time signal $y[n]$, $0 \leq n \leq (T - 1)$, associated with the real-world infection spread. The PCA model is denoted by $G(\Theta)$, where $\Theta = [\theta_1, \theta_2, \ldots, \theta_h]$ denotes the set of parameters used for the PCA model. If $\hat{y}[n]$, $0 \leq n \leq (T - 1)$, is the time evolution of the desired variable in the model $G(\Theta)$, then the objective is to find an optimal parameter set $\Theta^*$ such that $\hat{y}[n] \to y[n]$, $\forall n$. To apply GA, each $\theta_i$, $1 \leq i \leq h$, is encoded as a string of binary digits $b_i$ [54, 55], assuming that $\theta_i$ has a bound $|\theta_i| < \zeta_i$, $1 \leq i \leq h$. This binary string is referred to as a gene, and the concatenation of the genes in the order of appearance of the respective $\theta_i$ in $\Theta$ is called the chromosome. For example, if $B$ is the chromosome corresponding to the parameter set $\Theta$, $G(B)$ is equivalent to $G(\Theta)$. A collection of $N_g$ chromosomes of estimated parameters, often referred to as the gene pool, is evaluated at every time step (called a generation). In our work, the error of each chromosome has been evaluated using the $l_1$ norm distance. At the $i$th generation, the error of the $j$th chromosome $B^j_i$ is computed as

$$e^j_i = \| y - \hat{y}^j_i \|_1 = \sum_{n=0}^{T-1} \left| y[n] - \hat{y}^j_i[n] \right|$$

where $\hat{y}^j_i$ is the estimated output of $G(B^j_i)$ in vector form and $\hat{y}^j_i[n]$ is the value of $\hat{y}^j_i$ at time instance $n$. At each generation, GA finds $\min(e^j_i)$, $\forall j$, and tries to make $e^j_i \to 0$ as $i \to \infty$. In the proposed framework, some of the parameters are probabilities with a range of 0 to 1, and some of the parameters are associated with time (in days), which are discrete integers greater than or equal to zero in our case. Thus, the parameters are initialized randomly, keeping their domain restrictions intact.

For mating, two chromosomes, often referred to as parents, are selected from the gene pool considering their 'fitness'. For the two selected parents, a crossover point or splice point is selected at $b_i$, $1 \leq i \leq h$, in both chromosomes, and a crossover [55] happens that produces two offspring. In our approach, the fitness $f^j_i$ of each chromosome is defined as the inverse of its error at a particular generation.
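The encoding, error and fitness definitions above can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the fixed-point scheme in `encode`/`decode`, the 16-bit default and the small `eps` guard against division by zero are assumptions, since the text only states that each $\theta_i$ is bounded by $\zeta_i$ and stored as a binary string, and the sketch further restricts itself to non-negative parameters.

```python
def encode(theta, zeta, bits=16):
    """Fixed-point binary gene for a parameter theta in [0, zeta] (assumed scheme)."""
    if not 0 <= theta <= zeta:
        raise ValueError("theta must lie in [0, zeta]")
    level = round(theta / zeta * (2 ** bits - 1))
    return format(level, f"0{bits}b")


def decode(gene, zeta):
    """Inverse of encode: map a binary gene back to a real parameter."""
    return int(gene, 2) / (2 ** len(gene) - 1) * zeta


def l1_error(y, y_hat):
    """l1 distance between the real series y and the model estimate y_hat."""
    if len(y) != len(y_hat):
        raise ValueError("series must have the same length")
    return sum(abs(a - b) for a, b in zip(y, y_hat))


def fitness(error, eps=1e-12):
    """Fitness as the inverse of the error; eps is an added safeguard."""
    return 1.0 / (error + eps)
```

A chromosome in this scheme is simply the concatenation of the genes, for example `encode(0.3, 1.0) + encode(7, 14)` for one probability and one delay. Integer-valued delays would need rounding after `decode`, which is left out here.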
At each generation, the $F$ best chromosomes, i.e. those having the maximum fitness, are selected from the gene pool for mating. Following the idea of [52], $\rho F$ parents are kept for the next generation along with the new chromosomes to ensure that the error in the next generation is always less than or equal to that of the current generation. After selecting $\rho F$ chromosomes from the parents, $N_g - \rho F$ children are produced from mating to keep the size of the gene pool constant. After the offspring are generated, $s$ genes are randomly selected in the parameter space and small perturbations are added individually to mimic mutation.

Figure 3: Parameter estimations and goodness of model estimation: (a) RMSE, correlation and $\chi^2$ distance, $d_l$, $d_c$ and $d_\chi$, for all 40 countries considered in this work in terms of goodness of agreement with model estimations, shown in percentage. The colors green, orange and red signify the level of agreement. Values between (0:0.05) for $d_l$, (0:0.01) for $d_c$ and (0:1) for $d_\chi$ are considered as good (green). Values between (0.05:0.08) for $d_l$, (0.01:0.1) for $d_c$ and (1:3) for $d_\chi$ are considered as moderate (orange). Values above moderate are considered as poor (red). For all three metrics, 65-75% of the countries have shown good agreement with model estimation; (b) and (c) represent boxplots for the best-fit parameters of state transition probabilities and state transitional delays respectively, for all the 20 countries shown in Fig. 2. The height of the boxplots represents the interquartile range (IQR). The dark line inside the box represents the median. The lower and upper whiskers extend to the lowest and highest values within 1.5 IQR of the first and third quartile, respectively.

As shown by several researchers [57], the homogeneity in the gene pool increases with the generations, and as the perturbations due to mutation are typically small, the reduction of error becomes a problem after a few generations. Thus, to restrict homogeneity in the gene pool, a small number of offspring $\mu$ are selected from the total $N_g - \rho F$ generated offspring and replaced with randomly generated chromosomes to maintain diversity. This step is called 'diversification' of the gene pool.

In our problem, the parameters $\Theta$ of the PCA model $G(\Theta)$ are the state transitional probabilities $p_{ei}$, $p_{iq}$, $p_{ir}$, $p_{qr}$, the infection probabilities $p_e$ and $p_i$, the state transition delays $\tau_{ei}$, $\tau_{iq}$, $\tau_{qr}$, $\tau_{ir}$, the neighbourhood $d$, and the recovery fraction $p_\beta$ (which fixes the deceased fraction $1 - p_\beta$) as mentioned in Sec. 2.2. As optimizing this many parameters simultaneously might be challenging and require a huge amount of resources, we propose a variant of GA with a sequential evolution mechanism where, instead of optimizing all the parameters simultaneously, the parameters are optimized sequentially. Let us define a set of generations as an era. For the first era, containing a small number of generations, the traditional GA methodology discussed thus far is followed to obtain a set of initial parameters. From the next era onward, two parameters are selected and optimized in each era, so that the parameters are optimized sequentially over the eras.
Mutation and crossover are restricted to those two respective genes, whereas parent selection is done based on the performance of the entire chromosome. This newly proposed sequential optimization of the parameters of PCA using GA is defined as PCA-GA. The proposed approach can optimize a large number of parameters efficiently using limited resources. All the notations used in PCA-GA are briefly summarized in Table 2.

Figure 4: Time series data for active cases (blue) of the COVID-19 pandemic in different countries where the cases are saturating, and estimated active cases (red) from the proposed PCA-GA method.

The proposed PCA-GA has a complexity which can be approximated as $O(N_g T_g O(f))$, where $N_g$ is the size of the gene pool, $T_g$ is the total number of generations and $O(f)$ is the complexity of measuring the fitness in the GA. For a large enough $N_g$, $T_g$ is considered as a comparatively smaller constant, and thus the complexity of the entire algorithm is mainly governed by $N_g$ and $O(f)$. The complexity of estimating the fitness can be approximated as $O(f) = O(T + 8N\tau T)$ for the Moore neighbourhood criterion, where $N$ is the total population on the 2D grid. The length of the original time series data $T$, and $\tau$, the maximum of $\tau_{ij}$, are both constant, and thus $O(f)$ can be represented as $O(N)$.

Though GA has been selected as the strategy to optimize the parameters of the proposed PCA model, it is evident that, because of the generalized construction of the proposed framework, other meta-heuristic methods could also be employed to search the parameters of the spatially driven SEIQR model, which is the main focus of this work. However, the presence of mutation and diversification in GA helps to search for better solutions, as the search space is extremely large.
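One generation of the sequential scheme described in this section (parent selection on the whole chromosome, retention of $\rho F$ parents, crossover and mutation restricted to the genes of the current era, and diversification) might look as follows. This is a hedged sketch, not the authors' implementation: it works on real-valued genes instead of binary strings, uses a uniform per-gene crossover and Gaussian perturbation, and all names (`next_generation`, `active`, `sigma`, `n_random`) are hypothetical.

```python
import random


def next_generation(pool, error_fn, n_parents, rho, n_random,
                    active, bounds, sigma=0.05, rng=random):
    """Produce the next gene pool from the current one.

    pool      : list of chromosomes, each a list of real-valued genes
    error_fn  : maps a chromosome to its error (lower is fitter)
    n_parents : F, number of best chromosomes used for mating (>= 2)
    rho       : fraction of parents copied unchanged to the next pool
    n_random  : number of offspring replaced by random chromosomes
    active    : indices of the genes optimised in the current era
    bounds    : (low, high) range for every gene
    """
    ranked = sorted(pool, key=error_fn)            # fitness = 1 / error
    parents = ranked[:n_parents]
    n_keep = max(1, int(rho * n_parents))
    survivors = [p[:] for p in parents[:n_keep]]   # keeps the best error

    children = []
    while len(survivors) + len(children) < len(pool):
        mother, father = rng.sample(parents, 2)
        child = mother[:]
        for g in active:                           # crossover on era genes only
            if rng.random() < 0.5:
                child[g] = father[g]
        g = rng.choice(active)                     # mutate one era gene
        low, high = bounds[g]
        step = rng.gauss(0.0, sigma) * (high - low)
        child[g] = min(high, max(low, child[g] + step))
        children.append(child)

    # Diversification: overwrite a few offspring with random chromosomes.
    for k in range(min(n_random, len(children))):
        children[k] = [rng.uniform(lo, hi) for lo, hi in bounds]
    return survivors + children
```

Because the best parent is always carried over, the minimum error in the pool cannot increase from one generation to the next, which is the property the text attributes to keeping $\rho F$ parents. An era would call this function repeatedly with a fixed pair of indices in `active` before moving on to the next pair.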
3. Results
To validate the effectiveness of the proposed framework using PCA-GA, the actual statistics of COVID-19 spread till 20th June, 2020 in different countries are used. For finalizing the data-set from the available data of 213 countries, several aspects have been considered. At first, 102 countries were dropped due to a small number of reported cases (fewer than 1000 reported cases till 20th June 2020). Out of the remaining countries, some countries, like Iran, Greece, Paraguay etc., were removed due to data inconsistency, and finally 40 countries were randomly selected ensuring the following points:

• At least 2 countries from each continent were selected to maintain demographic diversity in our data.
• Care has been taken to maintain significant variation in population density, which we believe to be a major factor contributing to disease transmission.
• It was ensured that countries from three distinct stages of COVID-19 infection are considered: (i) where the infection is significantly diminished, (ii) where the peak infection has been reached but substantial infection still persists, and (iii) where consistent growth in infection is occurring.

With this widely varying spectrum of time series data, we proceed to quantitative calibration and interpretation through the proposed methodology. All data samples are taken from the website worldometers.info.

To point out the major contributing factors in the dynamics of infection spread, for every country under consideration, three available time series, namely daily active cases, total number of infected cases and total number of deaths, are accumulated. Out of these three series, the daily active cases time series is used for model formulation, and the rest are considered for model validation.
It isimportant to mention that the population q t is the relevant observable here, asinfected people as i t and e t remain latent and undetected in the population.The reported daily active case data is associated with lifetime of the infection,and are used in this study to check the effectiveness of the proposed frameworkas follows. By applying PCA-GA on the daily active case data of a particularcountry, the parameters Θ ∗ that gives the minimum l error is extracted. Forvalidation of the optimized parameters and understanding the robustness of thealgorithm, results generated by using G (Θ ∗ ) for the total infected states and de-ceased states are then compared with the real-world data. Here it must be notedthat the optimal parameters Θ ∗ remain unaltered and no further optimizationis performed. For all the simulations, PCA is initialized with a fixed lattice size of 100 × n e = 50 and n i = 4. The population n q and n r are set to zero at t = 0.The susceptible population n s has been initiated depending on the populationdensity of a country as follows: among the countries considered in our study, igure 5: Time series data for active cases (blue) of COVID-19 pandemic in different countrieswhere the cases are increasing exponentially, and estimated active cases (red) from proposedPCA-GA method. for the country with lowest population density (Canada), n s = 2500 has beenselected, and for the country with highest population density (Singapore), n s =6000 has been fixed. For any other country, n s has been assigned within thisrange using logarithmic scaling based on the population of that country. As eachof the parameters of PCA-GA has physical relevance, the sequential searchingprocess has been initiated by following restrictions of ranges. It is important tonote that in our problem, genes associated with probabilities are initiated in therange [0 ,
1] and clipped during the optimization process accordingly. The statetransition delays τ ei (incubation period) and τ iq (testing delay) are consideredto be within the range (0 , τ ir and τ qr (correspondingrecovery periods) are initialized in the range (20 , The daily active cases can be defined as the c t = c t − + q t − r t where c t is thenumber of active cases at time instance t having the initialization c = 0. In Fig.2, the active cases of 20 different countries are shown along with the respectiveestimated active cases using PCA-GA model. For the countries shown in Fig.2, the first peak of the infection is already crossed and a steady fall in theinfection spread is observed. It can also be seen that some of the active cases ofthe countries like China, Israel, Switzerland, follow smooth bell-shaped curves,whereas for some countries, like Australia, Cyprus, Hungary etc., the timesseries data deviates from bell-shaped curves with substantial degree of noises.In all the cases, PCA-GA has successfully captures the trend of the time seriesdata estimating the parameters of the epidemiological process. To measurethe goodness of the model estimation, three different metrics has been used tomeasure the quality of the estimated values. The root mean square (RMSE)14istance, correlation distance and chi-square distance [58, 59, 60], denoted as d l , d c and d χ respectively, are computed between the real data and the estimatedvalues from the PCA-GA model to evaluate the effectiveness of the optimizedmodel. For two vectors u and v , we define d l = (cid:118)(cid:117)(cid:117)(cid:116) T T (cid:88) i =1 ( u i − v i ) , d c = 1 − ( u − ¯ u ) . ( v − ¯ v ) (cid:107) ( u − ¯ u ) (cid:107) (cid:107) ( v − ¯ v ) (cid:107) , d χ = T (cid:88) i =0 ( u i − v i ) v i where T is the length of each vector, u i and v i are the i th elements of u and v respectively and (.) denotes dot product of two vectors. As shown in Fig. 
3(a),the proposed model performs well in modelling the real data. When evaluatedover all the countries considered in this work, the proposed model fits the datawell, and for only 0% -12.5% cases the fittings were poor depending on theevaluation metric. It is important to mention that all the distance measures areevaluated on normalized data.In Fig. 2, an interesting point to notice is that the peak of the active casesare located at markedly differing time instances, and the other properties, likevariance, skewness etc., of the observed distributions are also varying drastically.The fundamental differences between the fitted curves are quantified with thehelp of boxplot of the parameters in Fig.3(b)-(c) by analysing basic statisticalproperties. The reported boxplots are specifically for the countries selected inFig. 2. It can be noted that p e , p i and p ei exhibit a wide variability in Fig. 3(b).During our analysis, a strong positive correlation with population density for p e and p i has been also observed. This can be thus inferred that the variationin population density in the considered countries causes the wide range of theseparameters. It can be also concluded that high density of population increasesthe probability of transmission of the disease. The considerable difference inthe mean magnitudes of the infection associated probabilities ( p e , p i and p ei )and recovery-related probabilities ( p iq , p ir and p qr ) indicate the sharper riseand slower fall of active cases curves, which results into a skewed distributionin most of the cases (see Fig. 2). In Fig. 3(c), it is also shown that τ ei , whichis identified as the incubation time in the model, exhibits a range of 3-14 dayswith a mean at 7.3, which perfectly aligns with the observed cases all aroundthe world [61]. 
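For reference, the three goodness-of-fit measures defined earlier in this section can be sketched as below. This is a plain-Python illustration under stated assumptions: the function names are hypothetical, the chi-square form divides by the elements of the second vector as in the formula in the text, and terms with a zero denominator are skipped (an added safeguard). The normalization of the input series is left to the caller, since the text does not specify how it is done.

```python
import math


def rmse_distance(u, v):
    """d_l: root mean square error between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def correlation_distance(u, v):
    """d_c: one minus the Pearson correlation of u and v.

    Raises ZeroDivisionError if either series is constant.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    dot = sum(a * b for a, b in zip(du, dv))
    norm_u = math.sqrt(sum(a * a for a in du))
    norm_v = math.sqrt(sum(b * b for b in dv))
    return 1 - dot / (norm_u * norm_v)


def chi_square_distance(u, v):
    """d_chi: sum of (u_i - v_i)^2 / v_i, skipping terms with v_i == 0."""
    return sum((a - b) ** 2 / b for a, b in zip(u, v) if b != 0)
```

With these definitions, two perfectly proportional series such as `[1, 2, 3]` and `[2, 4, 6]` have a correlation distance of (numerically) zero even though their RMSE is not, which is why the three measures are reported side by side.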
In Fig. 3(c), a wide variability in the range of $\tau_{ir}$ and $\tau_{qr}$ is also observed, which points to the substantial difference in the health infrastructure of these countries.

Here it must be mentioned that, while performing this statistical analysis with all 40 countries, some countries were detected as consistent outliers (not included in Fig. 3(b)-(c)) in terms of four transitional parameters: $p_{ir}$, $p_{qr}$, $\tau_{ir}$ and $\tau_{qr}$. While analyzing the active case distributions of these outliers, it was found that the time series data for all these countries have a saturating trend, where the daily active cases do not show an average descent with time. Some such cases are shown in Fig. 4. Even for these data, which have a drastically different qualitative trend compared to the countries shown in Fig. 2, the proposed PCA-GA framework has accurately captured the trend of the real time series data.

Figure 6: Total infected cases (blue) of COVID-19 pandemic in different countries, and estimated total cases (red) from proposed PCA-GA method.

There are also certain countries, like India, Brazil, Chile, Mexico, etc., for which the infection spreading started later than in countries like China or Italy, and the daily active cases are still growing almost exponentially. As shown in Fig. 5, PCA-GA is able to estimate the time series data for these countries where the infection is spreading rapidly. The dynamics of COVID-19 spread in these countries are of particular interest, as the prediction of the peak positions in these countries might help immensely to understand the maximum socioeconomic impact of the disease at a time in that geographical location.

While analyzing complex dynamics like the spread of a pandemic, it is not always sufficient to model the input real data only. The optimized model should be robust and should provide meaningful interpretations without further retraining or parameter tuning for real-world applications. 
To validate the robustness and the effectiveness of the proposed algorithm, the optimized model is now employed for three different tasks. First, the robustness of the optimized model is checked by estimating the total number of infected cases, followed by the total number of death cases, without any further training, tuning or supervision. Finally, to further validate the efficiency of the model, its performance has been evaluated for the prediction task by training the model with partitioned data and evaluating its future predictions without any further optimization.

Figure 7: Total deaths (blue) of COVID-19 pandemic in different countries, and estimated total deaths (red) from proposed PCA-GA method.
The total number of infected cases $z_t$ at time instance $t$ can be defined as $z_t = \sum_{i=0}^{t} q_i$. This cumulative sum indicates the total number of people who have suffered from the disease at any point of time. For a country where the first wave of the infection has passed, e.g., Croatia, Italy, etc., $z_t$ approximately follows a sigmoid function, whereas for countries like India, Mexico etc., where the infection has not reached the peak, $z_t$ follows an exponential function. As PCA-GA is optimized using the time series information of daily active cases $c_t$, $z_t$ is used to validate the parameters learnt by the sequential GA framework in the following way. Once a particular country is selected, $\Theta^*$ is estimated using PCA-GA with the actual $c_t$. Next, $\hat{z}_t$ for $G(\Theta^*)$ is calculated without any further fine-tuning of the parameters, and $\hat{z}_t$ is compared with the actual $z_t$. In Fig. 6, the total cases (blue) of six such countries are shown along with the best-fit results obtained from PCA-GA (red), which show an excellent agreement with the data. It must be mentioned that for all three dynamical stages of infection spreading discussed in Sec. 3.2, i.e., where the first wave of infection has passed, where the active cases are currently almost saturated, or where the active cases are increasing rapidly, our estimated $\hat{z}_t$ closely matches $z_t$ without any further parameter optimization. When evaluated over all 40 countries for the number of infected people, the proposed method gives average $d_l$, average $d_c$ and average $d_\chi$ of 0.037, 0.006 and 0.53 respectively, which exhibits the robustness of the model.

To further validate the ‘goodness’ of the estimated parameters, the parameter set $\Theta^*$ optimized over the daily active cases of a particular country is taken, and the identical parameter values are used to compare the estimated total deaths with the actual total deaths of that country. Death in the population is the prime concern in the case of the COVID-19 pandemic, and as mentioned in Sec. 2.2.1, the daily deceased population is a fraction of $r_t$ in our model. So, the total estimated death cases can be defined as $\hat{d}_t = (1 - p_\beta)\sum_{i=0}^{t} r_i$, where $p_\beta$ and $r_i$ for $0 \le i \le t$ are given by $\Theta^*$ and $G(\Theta^*)$ respectively. Fig. 7 compares the actual total death cases $d_t$ with the estimated total death cases $\hat{d}_t$ for $\Theta^*$, the identical set of parameters used previously for estimating the active cases as well as the total cases. The same countries shown in Fig. 6 have been selected to show the robustness of the estimated parameter $\Theta^*$ using the proposed technique. Excellent agreement with data has been found for this case as well; when evaluated over all 40 countries for the total number of death cases, the proposed method gives average $d_l$, average $d_c$ and average $d_\chi$ of 0.041, 0.006 and 0.48 respectively.

Figure 8: Prediction of daily active cases from truncated data. For Israel and Switzerland, real data up to 54 and 43 days have been used to predict the daily active cases for 100 days. For prediction, the average of 50 independent PCA-GA simulations is considered.

Prediction of future events is always challenging in data modeling [62]. For the final stage of validation of the methodology, the predictive power of the model has been tested. As the impact of this pandemic becomes far-reaching and the socioeconomic contexts vary, a considerably accurate prediction of the dynamics of the infection spread can be crucial and useful in many ways. As PCA-GA successfully estimates the optimal parameters $\Theta^*$, this set of parameters can also be utilised to predict the future course of the infection in that country. To validate the capacity of the prediction strategy, the daily active cases $c_t$ of a country are truncated to $c_P$, keeping the first $P$ values.
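The bookkeeping introduced in this subsection reduces to simple array operations on the simulated series: running sums for $\hat{z}_t$ and $\hat{d}_t$, and a prefix slice for $c_P$. The sketch below is only an illustration under assumptions: the function names are hypothetical, `q` and `r` stand for the daily series $q_i$ and $r_i$ produced by the optimized model $G(\Theta^*)$, and the numbers in the usage lines are made up.

```python
import numpy as np

def total_infected(q):
    """z_t = sum_{i=0}^{t} q_i, returned for every t."""
    return np.cumsum(np.asarray(q, dtype=float))

def total_deaths(r, p_beta):
    """d_t = (1 - p_beta) * sum_{i=0}^{t} r_i, returned for every t."""
    return (1.0 - p_beta) * np.cumsum(np.asarray(r, dtype=float))

def truncate(c, P):
    """c_P: the first P values of the daily active cases."""
    return np.asarray(c, dtype=float)[:P]

# Toy usage with made-up series (not output of the PCA-GA model).
q = [5, 8, 13, 9, 4]      # hypothetical daily q_t values
r = [0, 2, 6, 10, 12]     # hypothetical daily r_t values
z_hat = total_infected(q)
d_hat = total_deaths(r, p_beta=0.95)
c_P = truncate([5, 11, 18, 17, 9], P=3)
```

No refitting takes place in any of these steps, which mirrors the point made above: the totals are derived from the parameters already fitted to the active cases.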
PCA-GA is applied on $c_P$ to estimate the parameters $\Theta_P$. Then $\Theta_P$ is used to predict the daily active cases $\hat{c}_t$. As shown in Fig. 8, for two countries, Israel and Switzerland, the daily active case information up to 54 and 43 days respectively is considered in an attempt to predict the daily active cases up to 100 days. In the figure, the estimated curve (shown in red) is optimized using all the real data points available, whereas the predicted curve (shown in black) is optimized using the truncated real data. It can be observed that the predictive estimation closely follows the real active case data, even though only ∼
50% of the data points are used for parameter estimation. For Israel and Switzerland, the 100-day prediction of the algorithm produces ($d_l$, $d_c$, $d_\chi$) as (0.…, …, ….95) and (0.…, …, …) … ($d_l$, $d_c$, $d_\chi$) as (0.…, …, ….82) and (0.…, …, ….6) respectively for the truncated time series of Switzerland. SVM regression with an RBF kernel performs satisfactorily on the same truncated data and produces ($d_l$, $d_c$, $d_\chi$) as (0.…, …, …) … ($d_l$, $d_c$, $d_\chi$) as (0.…, …, …).

As the PCA-GA methodology has been elaborately validated in Section 3.3, in this section it is employed for the prediction of consistently rising real epidemic data. Though the parameter estimation works well even when only minimal information about the peak position in $c_t$ is available, the prediction task becomes really challenging when $c_t$ is exponential in nature. For a particular country where $c_t$ is rising almost exponentially, the best set of parameters $\Theta^*$ is first detected by PCA-GA, with fitness $f^*$ and error $e^*$. As the drop of the infection heavily depends on the transitional probabilities $p_{ir}$, $p_{qr}$ and the state transitional delays $\tau_{ir}$ and $\tau_{qr}$, these parameters are tuned to find a region of predictions bounded by the possible best case and worst case scenarios. While estimating the best case scenario, $p_{ir}$ and $p_{qr}$ are chosen equal to the maximum and minimum $p_{ir}$ and $p_{qr}$ observed in the continent to which the country belongs. The reason behind this strategy is that the parameters related to the infection spreading are different in each continent, which is also observed by [64]. In the best case scenario, the transitional delays $\tau^*_{ir}$ and $\tau^*_{qr}$ are reduced to obtain the best case transitional delays $\tau^{\ominus}_{ir}$ and $\tau^{\ominus}_{qr}$ respectively, such that the fitness remains within 90% of $f^*$, where $\tau^*_{ir}$ and $\tau^*_{qr}$ are the corresponding optimized delays available in $\Theta^*$. For the worst case scenario, we consider $\tau^{\oplus}_{ir} = \tau^*_{ir} + \alpha_{ir}$ and $\tau^{\oplus}_{qr} = \tau^*_{qr} + \alpha_{qr}$, where $\alpha_{ir} = \tau^*_{ir} - \tau^{\ominus}_{ir}$ and $\alpha_{qr} = \tau^*_{qr} - \tau^{\ominus}_{qr}$.

Fig. 9 depicts the prediction of the daily active cases using the method discussed so far. In Fig.
9, the black dotted line indicates the prediction using the optimal parameters $\Theta^*$ estimated using PCA-GA. The orange line indicates the best case scenario, where the maximum daily active cases would be minimized given the real data. The red line indicates the worst case scenario based on the specific conditions mentioned above. The best case and the worst case scenarios act as limiting cases of an area (shaded in pink) of probable future states. Any curve inside the pink region that contains the real data could be the evolution of the daily active cases in future, given the real time series data that are currently in an exponentially rising state. This indicates that for India, which is now one of the biggest epicenters of COVID-19 in South-eastern Asia, the disease can start to decline very soon if vigorous measures from the government and complete support from the public can be achieved. It also shows that the maximum active cases on a day, which put a direct burden on the health infrastructure of the country, can be restricted below 750,000 if people participate in government-indicated mitigation strategies and the recovery rate remains at its current value. In that case, the peak of the disease is expected to pass during mid-September to mid-October, and the first wave of the disease can be over by March 2021. But these predictions also imply that the range of future states that are possible for exponentially rising daily active cases not only depends on the evolution of the epidemic so far, but is also highly affected by the consistency and implementation efficiency of mitigation strategies.

Figure 9: Prediction of the course of the disease: Exponentially rising daily active cases for India (blue) till 20th July, 2020 are used for parameter estimation and the predictions.
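The construction of the two bounding delays used for the best case and worst case curves can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the `fitness` callable is a hypothetical stand-in for a full PCA-GA simulation evaluated at a given delay, the one-day step and the lower bound `tau_min` are assumptions, and "within 90% of $f^*$" is read here as fitness no smaller than $0.9 f^*$.

```python
from typing import Callable

def best_case_delay(tau_star: int, fitness: Callable[[int], float],
                    f_star: float, keep: float = 0.9, tau_min: int = 1) -> int:
    """Shrink the optimized delay one day at a time while the fitness
    stays at or above keep * f_star (assumed reading of the 90% rule)."""
    tau = tau_star
    while tau - 1 >= tau_min and fitness(tau - 1) >= keep * f_star:
        tau -= 1
    return tau

def worst_case_delay(tau_star: int, tau_best: int) -> int:
    """Mirror the best-case reduction: tau_plus = tau_star + alpha,
    with alpha = tau_star - tau_best."""
    alpha = tau_star - tau_best
    return tau_star + alpha

# Toy usage: a made-up fitness curve peaking at the optimized delay of 20 days.
def toy_fitness(tau: int) -> float:
    return 100.0 - 3.0 * abs(tau - 20)

tau_best = best_case_delay(20, toy_fitness, f_star=100.0)
tau_worst = worst_case_delay(20, tau_best)
```

By construction the worst-case delay lies as far above the optimized delay as the best-case delay lies below it, so the shaded region is centred on the prediction from $\Theta^*$ in terms of the delays.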
4. Conclusion
The COVID-19 outbreak has created a massive impact all across the globe. Even after nation-wide lockdowns, extensive testing strategies and medical support, the spread of the virus has overwhelmed several countries. Thus, it is becoming more and more important to understand the nature of the infection spread and the key parameters that control it. In this work, we proposed a probabilistic cellular automata model to understand and depict COVID-19 spread using an appropriate choice of loss functions and an evolutionary optimization framework. The parameters of this cellular automata model are optimised using a sequential evolutionary genetic algorithm. It has been shown that this self-adapting methodology can be highly flexible and has the power to accurately estimate time trajectories of epidemics. This model works with physically interpretable parameters, which are accessible for analysis, data collection and further experiment, and can be readily identified with ground reality. This model has been successfully employed for optimizing all these parameters simultaneously for the daily active cases, total infected cases and total deaths with extreme robustness. The performance of the model has been exhibited for a large number of countries with huge diversity in population density, continents and available healthcare infrastructures. The predictive strength of the model has also been validated extensively, and demonstrated to estimate the course of the pandemic for the countries where the infection peak has not been reached yet.

It is important to mention that the motivation of the work was to develop a data-driven, generalized, spatial framework that can be used to estimate relevant epidemiological parameters. This methodology is so powerful and flexible that physical interpretations of the results obtained from these analyses can have wide-ranging implications. 
Once the data are properly interpreted with the proposed methodology, interesting realistic features can be identified for specific countries. For example, in a pandemic situation, easily relatable factors like population clusters, variable population density, variable health facilities at different places of a country etc. can be studied to understand and predict the emergence of new hotspots, which can be used to design selective area containment strategies. While we propose and establish the applicability and strength of this framework in this work, we wish to address these application perspectives in our upcoming research studies.

With this proposed platform, the impact of individuality on the contagion process can be explicitly studied, which might be directly related to questions like lockdown behavioral differences, influence of rumors, vaccination opinion differences etc. As the effects of more complex dynamical factors like periodic lockdown or population clusters are not considered in the present model, the prediction capability of the proposed model, in its present form, is not satisfactory for time series data with abrupt discontinuities. The proposed framework could be enhanced with other $l_p$ norm distances and different optimization techniques like multi-objective genetic algorithm or strength Pareto evolutionary algorithm. Other swarm-based optimization techniques can also be explored for further refinement of the model. The potential of the proposed approach can be utilized to better understand disease spreading and control, beyond the pandemic the world is currently facing, by keeping track of the spatial information of the dynamics, incorporating realistic behavioural aspects, and optimizing in terms of demographic as well as socioeconomic features.

References
[1] World Health Organization, Coronavirus disease (COVID-2019) situation reports, Available at url: (accessed June 2020), 2020.
[2] X. Jin, J.-S. Lian, J.-H. Hu, J. Gao, L. Zheng, Y.-M. 
Zhang, S.-R. Hao, H.-Y. Jia, H. Cai, X.-L. Zhang, et al., Epidemiological, clinical and virological characteristics of 74 cases of coronavirus-infected disease 2019 (COVID-19) with gastrointestinal symptoms, Gut 69 (2020) 1002–1009.
[3] L. Pan, M. Mu, P. Yang, Y. Sun, R. Wang, J. Yan, P. Li, B. Hu, J. Wang, C. Hu, et al., Clinical characteristics of COVID-19 patients with digestive symptoms in Hubei, China: a descriptive, cross-sectional, multicenter study, The American journal of gastroenterology 115 (2020).
[4] Y. Cheng, R. Luo, K. Wang, M. Zhang, Z. Wang, L. Dong, J. Li, Y. Yao, S. Ge, G. Xu, Kidney disease is associated with in-hospital death of patients with COVID-19, Kidney International (2020).
[5] C. Han, C. Duan, S. Zhang, B. Spiegel, H. Shi, W. Wang, L. Zhang, R. Lin, J. Liu, Z. Ding, et al., Digestive symptoms in COVID-19 patients with mild disease severity: clinical presentation, stool viral RNA testing, and outcomes, The American journal of gastroenterology (2020).
[6] Y.-Y. Zheng, Y.-T. Ma, J.-Y. Zhang, X. Xie, COVID-19 and the cardiovascular system, Nature Reviews Cardiology 17 (2020) 259–260.
[7] C. Wang, R. Pan, X. Wan, Y. Tan, L. Xu, C. S. Ho, R. C. Ho, Immediate psychological responses and associated factors during the initial stage of the 2019 coronavirus disease (COVID-19) epidemic among the general population in China, International journal of environmental research and public health 17 (2020) 1729.
[8] Y. Wang, Y. Wang, Y. Chen, Q. Qin, Unique epidemiological and clinical features of the emerging 2019 novel coronavirus pneumonia (covid-19) implicate special control measures, Journal of medical virology 92 (2020) 568–576.
[9] N. van Doremalen, T. Bushmaker, D. H. Morris, M. G. Holbrook, A. Gamble, B. N. Williamson, A. Tamin, J. L. Harcourt, N. J. Thornburg, S. I. Gerber, et al., Aerosol and surface stability of SARS-CoV-2 as compared with SARS-CoV-1, New England Journal of Medicine 382 (2020) 1564–1567.
[10] Y. Bai, L. Yao, T. Wei, F. Tian, D.-Y. Jin, L. 
Chen, M. Wang, Presumed asymptomatic carrier transmission of COVID-19, Jama 323 (2020) 1406–1407.
[11] H. Nishiura, T. Kobayashi, T. Miyama, A. Suzuki, S. Jung, K. Hayashi, R. Kinoshita, Y. Yang, B. Yuan, A. R. Akhmetzhanov, et al., Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19), medRxiv (2020).
[12] P. Yu, J. Zhu, Z. Zhang, Y. Han, A familial cluster of infection associated with the 2019 novel coronavirus indicating possible person-to-person transmission during the incubation period, The Journal of infectious diseases 221 (2020) 1757–1761.
[13] G. Giordano, F. Blanchini, R. Bruno, P. Colaneri, A. Di Filippo, A. Di Matteo, M. Colaneri, Modelling the covid-19 epidemic and implementation of population-wide interventions in italy, Nature Medicine (2020) 1–6.
[14] Z. Yang, Z. Zeng, K. Wang, S.-S. Wong, W. Liang, M. Zanin, P. Liu, X. Cao, Z. Gao, Z. Mai, et al., Modified seir and ai prediction of the epidemics trend of covid-19 in china under public health interventions, Journal of Thoracic Disease 12 (2020) 165.
[15] V. Volpert, M. Banerjee, S. Petrovskii, On a quarantine model of coronavirus infection and data analysis, Mathematical Modelling of Natural Phenomena 15 (2020) 24.
[16] C. Li, L. J. Chen, X. Chen, M. Zhang, C. P. Pang, H. Chen, Retrospective analysis of the possibility of predicting the covid-19 outbreak from internet searches and social media data, china, 2020, Eurosurveillance 25 (2020) 2000199.
[17] L. Li, Z. Yang, Z. Dang, C. Meng, J. Huang, H. Meng, D. Wang, G. Chen, J. Zhang, H. Peng, et al., Propagation analysis and prediction of the covid-19, Infectious Disease Modelling 5 (2020) 282–292.
[18] S. J. Fong, G. Li, N. Dey, R. G. Crespo, E. Herrera-Viedma, Composite monte carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction, Applied Soft Computing (2020) 106282.
[19] K. Chatterjee, K. Chatterjee, A. Kumar, S. 
Shankar, Healthcare impact of covid-19 epidemic in india: A stochastic mathematical model, Medical Journal Armed Forces India (2020).
[20] S. J. Fong, G. Li, N. Dey, R. G. Crespo, E. Herrera-Viedma, Finding an accurate early forecasting model from small dataset: A case of 2019-ncov novel coronavirus outbreak, arXiv preprint arXiv:2003.10776 (2020).
[21] G. Baltas, F. A. Prieto Rodríguez, M. Frantzi, C. García Alonso, P. Rodríguez Cortés, et al., Monte carlo deep neural network model for spread and peak prediction of covid-19 (2020).
[22] D. Khatua, A. De, S. Kar, E. Samanta, A. A. Seikh, D. Guha, A fuzzy dynamic optimal model for covid-19 epidemic in india based on granular differentiability, Available at SSRN 3621640 (2020).
[23] P. Liu, P. Beeler, R. K. Chakrabarty, Covid-19 progression timeline and effectiveness of response-to-spread interventions across the united states, medRxiv (2020).
[24] M. C. Traini, C. Caponi, G. V. De Socio, Modelling the epidemic 2019-ncov event in italy: a preliminary note, medRxiv (2020).
[25] S. Lai, I. I. Bogoch, N. W. Ruktanonchai, A. Watts, X. Lu, W. Yang, H. Yu, K. Khan, A. J. Tatem, Assessing spread risk of wuhan novel coronavirus within and beyond china, january-april 2020: a travel network-based modelling study, medRxiv (2020).
[26] L. Wynants, B. Van Calster, M. M. Bonten, G. S. Collins, T. P. Debray, M. De Vos, M. C. Haller, G. Heinze, K. G. Moons, R. D. Riley, et al., Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal, bmj 369 (2020).
[27] C. T. Bauch, J. O. Lloyd-Smith, M. P. Coffee, A. P. Galvani, Dynamically modeling sars and other newly emerging respiratory illnesses: past, present, and future, Epidemiology (2005) 791–801.
[28] G. R. Shinde, A. B. Kalamkar, P. N. Mahalle, N. Dey, J. Chaki, A. E. Hassanien, Forecasting models for coronavirus disease (covid-19): A survey of the state-of-the-art, SN Computer Science 1 (2020) 1–15.
[29] B. M. Althouse, J. Lessler, A. A. Sall, M. 
Diallo, K. A. Hanley, D. M. Watts, S. C. Weaver, D. A. Cummings, Synchrony of sylvatic dengue isolations: a multi-host, multi-vector sir model of dengue virus transmission in senegal, PLoS Negl Trop Dis 6 (2012) e1928.
[30] R. M. Anderson, R. M. May, Infectious diseases of humans: dynamics and control, Oxford university press, 1992.
[31] H. W. Hethcote, Asymptotic behavior in a deterministic epidemic model, Bulletin of Mathematical Biology 35 (1973) 607–614.
[32] H. Behncke, Optimal control of deterministic epidemics, Optimal control applications and methods 21 (2000) 269–285.
[33] S. Bhattacharya, K. Gaurav, S. Ghosh, Viral marketing on social networks: An epidemiological perspective, Physica A: Statistical Mechanics and its Applications 525 (2019) 478–490.
[34] Y. Liu, A. A. Gayle, A. Wilder-Smith, J. Rocklöv, The reproductive number of COVID-19 is higher compared to SARS coronavirus, Journal of travel medicine (2020).
[35] E. Shim, A. Tariq, W. Choi, Y. Lee, G. Chowell, Transmission potential and severity of COVID-19 in South Korea, International Journal of Infectious Diseases (2020).
[36] A. J. Kucharski, T. W. Russell, C. Diamond, Y. Liu, J. Edmunds, S. Funk, R. M. Eggo, F. Sun, M. Jit, J. D. Munday, et al., Early dynamics of transmission and control of covid-19: a mathematical modelling study, The lancet infectious diseases (2020).
[37] L. Peng, W. Yang, D. Zhang, C. Zhuge, L. Hong, Epidemic analysis of covid-19 in china by dynamical modeling, arXiv preprint arXiv:2002.06563 (2020).
[38] W. O. Kermack, A. G. McKendrick, A contribution to the mathematical theory of epidemics, Proceedings of the royal society of london. Series A, Containing papers of a mathematical and physical character 115 (1927) 700–721.
[39] A. Rachah, D. F. Torres, Analysis, simulation and optimal control of a seir model for ebola virus with demographic effects, arXiv preprint arXiv:1705.01079 (2017).
[40] T. Berge, J.-S. Lubuma, G. Moremedi, N. Morris, R. 
Kondera-Shava, A simple mathematical model for ebola in africa, Journal of biological dynamics 11 (2017) 42–74.
[41] T. Toffoli, N. Margolus, Cellular automata machines: a new environment for modeling, MIT press, 1987.
[42] S. Wolfram, Cellular automata and complexity: collected papers, CRC Press, 2018.
[43] N. Boccara, K. Cheong, M. Oram, A probabilistic automata network epidemic model with births and deaths exhibiting cyclic behaviour, Journal of Physics A: Mathematical and General 27 (1994) 1585.
[44] C. Beauchemin, J. Samuel, J. Tuszynski, A simple cellular automaton model for influenza a viral infections, Journal of theoretical biology 232 (2005) 223–234.
[45] H. Fuks, A. T. Lawniczak, Individual-based lattice model for spatial spread of epidemics, Discrete Dynamics in Nature and Society 6 (2001).
[46] R. Willox, B. Grammaticos, A. Carstea, A. Ramani, Epidemic dynamics: discrete-time and cellular automaton models, Physica A: Statistical Mechanics and its Applications 328 (2003) 13–22.
[47] P. Eosina, T. Djatna, H. Khusun, A cellular automata modeling for visualizing and predicting spreading patterns of dengue fever, Telkomnika 14 (2016) 228.
[48] K. S. Pokkuluri, S. U. D. Nedunuri, A novel cellular automata classifier for covid-19 prediction, Journal of Health Sciences 10 (2020) 34–38.
[49] M. Dascalu, M. Malita, A. Barbilian, E. Franti, G. M. Stefan, Enhanced cellular automata with autonomous agents for covid-19 pandemic modeling, Romanian Journal of Information Science and Technology 23 (2020) S15–S27.
[50] S. Ghosh, S. Bhattacharya, Computational model on covid-19 pandemic using probabilistic cellular automata, arXiv preprint arXiv:2006.11270 (2020).
[51] A. H. Wright, Genetic algorithms for real parameter optimization, in: Foundations of genetic algorithms, volume 1, Elsevier, 1991, pp. 205–218.
[52] L. Yao, W. A. Sethares, Nonlinear parameter estimation via the genetic algorithm, IEEE Transactions on signal processing 42 (1994) 927–935.
[53] S. Katare, A. Bhan, J. M. 
Caruthers, W. N. Delgass, V. Venkatasubramanian, A hybrid genetic algorithm for efficient parameter estimation of large kinetic models, Computers & chemical engineering 28 (2004) 2569–2581.
[54] M. Gulsen, A. Smith, D. Tate, A genetic algorithm approach to curve fitting, International Journal of Production Research 33 (1995) 1911–1923.
[55] C. L. Karr, B. Weck, D.-L. Massart, P. Vankeerberghen, Least median squares curve fitting using a genetic algorithm, Engineering Applications of Artificial Intelligence 8 (1995) 177–189.
[56] P. H. Schimit, Evolutionary aspects of spatial prisoner's dilemma in a population modeled by continuous probabilistic cellular automata and genetic algorithm, Applied Mathematics and Computation 290 (2016) 178–188.
[57] J. H. Holland, et al., Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, MIT press, 1992.
[58] T. W. Liao, Clustering of time series data - a survey, Pattern recognition 38 (2005) 1857–1874.
[59] J. Gao, H. Sultan, J. Hu, W.-W. Tung, Denoising nonlinear time series by adaptive filtering and wavelet shrinkage: a comparison, IEEE signal processing letters 17 (2009) 237–240.
[60] O. Salem, Y. Liu, A. Mehaoua, Anomaly detection in medical wsns using enclosing ellipse and chi-square distance, in: 2014 IEEE International Conference on Communications (ICC), IEEE, pp. 3658–3663.
[61] World Health Organization, Coronavirus disease (COVID-2019) situation reports, Available at url: