A mathematical epidemic model using genetic fitting algorithm with cross-validation and application to early dynamics of COVID-19 in Algeria
Mohamed Taha Rouabah, Abdellah Tounsi, Nacer Eddine Belaloui
M. T. Rouabah ∗, A. Tounsi, N. E. Belaloui
Laboratoire de Physique Mathématique et Subatomique, Frères Mentouri University Constantine - 1, Ain El Bey Road, Constantine, 25017, Algeria.
Abstract.
A compartmental epidemic model based on a genetic fitting algorithm and using a cross-validation method to overcome the overfitting problem is proposed. This generic enhanced SEIR model provides approximate nowcasts and forecasts of epidemic evolution, including key epidemic parameters and the non-measurable asymptomatic infected portion of the susceptible population. The model is used to study COVID-19 outbreak dynamics in Algeria between February 25th and May 24th, 2020. The basic reproduction number on Feb. 25th is estimated at 3.78 (95% CI 3.033-4.53) and the effective reproduction number on May 24th, after three months of the outbreak, is estimated at 0.651 (95% CI 0.539-0.761). The infection peak time is predicted for the end of April, while the active cases peak time is predicted for the end of May 2020. The disease incidence, CFR and IFR are calculated. Information provided by this study could help establish a realistic assessment of the current situation in Algeria, inform predictions about potential future evolution, and guide the design of appropriate public health measures.
Keywords: COVID-19; mathematical modeling; genetic algorithm; cross-validation; Algeria.
∗ Corresponding author: [email protected], [email protected].
arXiv [q-bio.PE], Jun
1. Introduction
The recent outbreak of the highly infectious COVID-19 disease caused by SARS-CoV-2, first reported in Wuhan and other cities in China in 2019, became a global pandemic in the first quarter of 2020, as declared by the World Health Organization (WHO). SARS-CoV-2 was first imported to Algeria on Feb. 17th, 2020 by an Italian national who was confirmed positive for COVID-19 on Feb. 25th [1]. The man was repatriated to Italy on a special flight on Feb. 28th, and the Algerian authorities reported no individuals contaminated by this first confirmed case [2]. As far as we know, the effective outbreak of COVID-19 in Algeria started in late Feb. 2020. Indeed, the Algerian Health Ministry (AHM) reported in a statement on March 2nd the first two confirmed cases of COVID-19 in Blida province, south of the capital Algiers [3]. Since then the spread of the virus in Algeria has gone through different epidemic phases [4].

Besides medical and biological research, theoretical studies based on either statistics or mathematical modeling may also play a key role in understanding the epidemic characteristics of the outbreak. Epidemic modeling is a crucial tool for forecasting the inflection point and ending time, and it provides insights into the epidemiological situation. Such analysis can predict the potential future evolution, help estimate the efficiency of measures already taken, and guide the design of alternative interventions. To the best of our knowledge, few theoretical studies on the COVID-19 outbreak in Algeria have been carried out so far [5]. The lack of publicly accessible theoretical and clinical studies on the spread of SARS-CoV-2 in Algeria, exposing the actual situation and analyzing possible evolution scenarios, makes the situation more confusing for the Algerian public and scientific community. Many studies on COVID-19 characteristics and dynamics around the world are published every day, some of which include the Algerian case [6, 7]. However, we believe that any analysis of the COVID-19 outbreak in Algeria should take into consideration many specific aspects that are not considered in such universal studies and online simulators, which use raw data accessible in many databases. Beyond the fact that the majority of those databases contain many wrongly reported data for Algeria, data nomenclature and interpretation, as well as the test methods specific to each country, should be taken into consideration for more accurate outcomes. In our analysis, instead of relying only on official Reverse Transcriptase Polymerase Chain Reaction (RT-PCR) confirmed SARS-CoV-2 infection cases, which are strongly affected by limited test capacities, we combine them with the official number of patients admitted to hospital due to SARS-CoV-2 infection in order to deduce the effective number of new confirmed infections per day. This choice makes a significant difference not only to the cumulative number of confirmed cases but also to the nowcast and forecast of the virus spread.

The paper is organized as follows: in the next section we present the mathematical model we use for the dynamical modeling of COVID-19 propagation, along with some results of the model for the pandemic spread in specific countries. The third section is devoted to the application of the model to the Algerian case through the estimation of key epidemic parameters and a forecast analysis. Results are shown and discussed in the fourth section. The concluding section includes some ideas about future developments of this work.
2. Model and Methods
At the very beginning of the epidemic, during the free spread phase, it is common to assume an initial exponential growth, which is characteristic of most human infectious diseases [8]. However, spontaneous herd immunity, protections and lockdown measures will counteract this geometric growth. A dynamical model is then required to describe the evolution of the disease.
The classical compartmental Susceptible Exposed Infectious Recovered (SEIR) model [8, 9] has been the most widely adopted model for characterizing many historical infectious diseases such as the Spanish flu [10]. The SEIR model is extensively used to study the COVID-19 pandemic in China and many other countries, with variations best suiting the region and time period under study [11, 12, 13].

Given the novel time course of infection shown by the disease and the required protection measures, we simulate COVID-19 spread with a SEIQRDP model in which, at time $t$, the population is split into compartments that represent the different stages of the disease [12, 14]. $S(t)$ represents the susceptible portion of the population, i.e. those yet to be infected. $P(t)$ represents the effectively protected population, mainly individuals who tend to strictly follow the standard advised protection measures such as wearing masks and physical distancing. Hence, this part of the population is considered as not susceptible to infection. Introducing this compartment is crucial to reflect the increasing awareness within the major part of the population as the pandemic evolves, and it allows us to take into consideration the control measures taken by authorities to fight the pandemic, such as closing public areas, suspending public transportation and lockdown. $E(t)$ represents a latent state in which individuals have been exposed to the disease but are not yet infectious, i.e. the individuals in this stage carry the virus but cannot infect others, whereas $I(t)$ represents those that are currently infectious. The asymptomatic exposed and infectious portions of the population are not detectable and hence non-measurable. Their proportion can only be revealed by theoretical modeling of the disease.
$Q(t)$ represents quarantined individuals considered as active cases, $R(t)$ represents individuals who have recovered from the disease and are assumed to no longer take part in its spread, and $D(t)$ represents closed cases or deaths. $N = S(t) + E(t) + I(t) + Q(t) + R(t) + D(t) + P(t)$ is the total population at time $t$, considered constant on the time scale of the epidemic evolution. The SEIQRDP model represents the virus propagation by a set of ordinary differential equations (ODEs) that associate transition parameters with the flow of individuals between the population compartments defined above:

\begin{align}
\dot{S} &= -\beta S(t) I(t)/N - \alpha S(t), \tag{1}\\
\dot{E} &= \beta S(t) I(t)/N - \gamma E(t), \tag{2}\\
\dot{I} &= \gamma E(t) - \delta I(t), \tag{3}\\
\dot{Q} &= \delta I(t) - \lambda Q(t) - \kappa Q(t), \tag{4}\\
\dot{R} &= \lambda Q(t), \tag{5}\\
\dot{D} &= \kappa Q(t), \tag{6}\\
\dot{P} &= \alpha S(t), \tag{7}
\end{align}

where $\dot{S}$ refers to the time derivative of $S$. The positive rate $\alpha$, called the protection rate, is introduced into the model assuming that the susceptible population is steadily decreasing as a result of increasing population awareness and public health authorities' actions [12]. All the other parameters depend on the evolution of the epidemic and on testing and health care capacities, and are calculated from the official daily numbers of confirmed cases, deaths and recoveries. The transmission rate $\beta$ represents the ability of an infected individual to infect others (depending on the population density, the toxicity of the virus, etc.) and $\beta S(t) I(t)/N$ is the incidence of the disease, i.e. the number of newly infected individuals per unit time at time $t$ [15]. $\gamma^{-1}$ is the average latent time that an individual spends incubating the virus before becoming infectious (infected but not yet infectious), and $\delta^{-1}$ is the average infectiousness time, i.e. the time for an infectious individual to develop symptoms and be detected and quarantined. $\lambda$ is the cure rate and $\kappa$ is the mortality rate, while $\lambda^{-1}$ and $\kappa^{-1}$ represent the quarantine-to-recovery time and the quarantine-to-death time, respectively. These transition parameters are used by the model to define a time-dependent number of secondary cases generated by a primary infectious individual, $R_t = \beta \delta^{-1} S(t)/N$, known as the effective reproduction number. It is a very important parameter for the analysis of any epidemic outbreak and provides a measure of the intensity of interventions required to control the virus propagation. In general, if $R_t > 1$ the disease keeps spreading, whereas if $R_t \le 1$, which mathematically corresponds to $\dot{E} + \dot{I} \le 0$, the disease dies out. At the beginning of the epidemic, which corresponds to a completely susceptible population, this quantity is known as the basic reproduction number $R_0 = \beta \delta^{-1}$ and is obtained by the next generation matrix method [16].

Even though many COVID-19 studies try to calculate universal mean values of the reproduction number and transition parameters in specific spots of the outbreak, these values remain strongly tied to local data and can change from one country to another, and even from one region to another within the same country. Such parameters are the kind of valuable information this model can provide, in addition to approximate peak times of the disease (infection peak time and active cases peak time) and approximate numbers of non-measurable asymptomatic cases, active cases, and total quarantined, recovered and deceased cases. Prior knowledge of those numbers, though approximate, could help optimize human and material resources at the national and local scales. In this SEIQRDP model, key parameters are extracted from the official numbers of cumulative confirmed cases, recoveries and deaths available for a given period of the epidemic. The parameters, obtained either by direct calculation or by a fitting algorithm, are used to construct the variable curves that fit the initial data. Those curves are then extrapolated to a longer period of time, thus forecasting the evolution of the epidemic.

To calibrate our model's parameters and fit the real data originating from a specific region of the world, many fitting methods are available, most of which are widely used in epidemiological studies and machine learning models. Single-stage problems such as calibrating the parameters of our model are usually solved with modified deterministic optimization methods such as the L-BFGS-B method [17, 18]. However, a stochastic method has the benefit of taking into account the diversity of the possible calibration scenarios.
Evolutionary genetic algorithms are among those stochastic methods and have a good reputation for solving optimization problems. In the following section we discuss the advantages that the genetic algorithm method brings to our study.
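As an illustration of the SEIQRDP system of Eqs. (1)-(7) and of the effective reproduction number defined in this section, the following minimal sketch integrates the ODEs with a fixed-step fourth-order Runge-Kutta scheme in pure Python. It is not the authors' package: the function names, the integrator, the population size and the initial compartment sizes are illustrative assumptions. The rates $\alpha$, $\beta$, $\gamma$ and $\delta$ are set to the mean values reported for Algeria in Table 1, and $\lambda$ and $\kappa$ are assumed, for illustration only, to equal the mean recovery and fatality rates quoted in Section 4.

```python
def seiqrdp_rhs(state, params, n_pop):
    """Right-hand side of Eqs. (1)-(7) for state = (S, E, I, Q, R, D, P)."""
    s, e, i, q, r, d, p = state
    alpha, beta, gamma, delta, lam, kappa = params
    incidence = beta * s * i / n_pop         # new infections per unit time
    return (
        -incidence - alpha * s,              # dS/dt, Eq. (1)
        incidence - gamma * e,               # dE/dt, Eq. (2)
        gamma * e - delta * i,               # dI/dt, Eq. (3)
        delta * i - lam * q - kappa * q,     # dQ/dt, Eq. (4)
        lam * q,                             # dR/dt, Eq. (5)
        kappa * q,                           # dD/dt, Eq. (6)
        alpha * s,                           # dP/dt, Eq. (7)
    )


def rk4_step(state, params, n_pop, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = seiqrdp_rhs(state, params, n_pop)
    k2 = seiqrdp_rhs(shifted(state, k1, dt / 2), params, n_pop)
    k3 = seiqrdp_rhs(shifted(state, k2, dt / 2), params, n_pop)
    k4 = seiqrdp_rhs(shifted(state, k3, dt), params, n_pop)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state0, params, days, dt=0.1):
    """Return the list of daily states, starting with state0."""
    n_pop = sum(state0)                      # N is constant in the model
    steps_per_day = round(1 / dt)
    trajectory = [state0]
    state = state0
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, params, n_pop, dt)
        trajectory.append(state)
    return trajectory


def effective_reproduction(state, params, n_pop):
    """R_t = beta * delta^-1 * S(t) / N."""
    beta, delta = params[1], params[3]
    return beta / delta * state[0] / n_pop


# Illustrative inputs (assumptions, not fitted values from the paper's code).
N_POP = 43_000_000.0                         # assumed population size
PARAMS = (0.015, 0.64, 1 / 2.7, 1 / 5.9, 0.019, 0.0102)
STATE0 = (N_POP - 19.0, 10.0, 8.0, 1.0, 0.0, 0.0, 0.0)

trajectory = simulate(STATE0, PARAMS, days=90)
r_start = effective_reproduction(trajectory[0], PARAMS, N_POP)
r_end = effective_reproduction(trajectory[-1], PARAMS, N_POP)
```

Because the right-hand sides of Eqs. (1)-(7) sum to zero, the total population is conserved by the integration, which gives a simple sanity check on any implementation of the model.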
Definition
A genetic algorithm (GA) is an optimization approach inspired by Darwinian evolution in which an initial set of candidate solutions, called the initial population, each represented by a set of values making up a genome, evolves by breeding and reproducing while being subject to random mutations [19]. The key mechanism in a GA is that only the best performing solutions get to reproduce and pass on their genes, just as in Darwinian natural selection. The evolution process is finally stopped, after a certain number of generations, when a defined stopping condition is met.
Application
In our case, applying a GA to find the best fit for the SEIQRDP model parameters is a straightforward application of the above definition. The genome is the set of parameters itself, breeding is the process of creating a new set from two parent sets by randomly selecting genes from either one of the parents, and mutation is a random alteration of one of the genes of the resulting new genome. The best performing set of parameters is the one for which the curves produced by the model best match the original data. This is measured by a normalized least squares method. We can speed up the process by constraining the randomly generated initial population to lie around already published values for COVID-19 epidemic parameters [20]. Different runs of the GA give slightly different solutions. From these solutions, an error on the prediction made by the model can be computed.
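The breeding, mutation and selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: to stay short it fits a hypothetical two-parameter exponential curve instead of the full SEIQRDP system, and the population size, number of parents, mutation rate and Gaussian mutation width are arbitrary choices. The fitness is a normalized least-squares error, and the best parents are carried over unchanged to the next generation.

```python
import math
import random


def toy_model(genome, t):
    """Hypothetical stand-in for the epidemic model: y = a * exp(b * t)."""
    a, b = genome
    return a * math.exp(b * t)


def fitness(genome, data):
    """Normalized least-squares error between model and data (lower is better)."""
    residual = sum((toy_model(genome, t) - y) ** 2 for t, y in data)
    scale = sum(y ** 2 for _, y in data)
    return residual / scale


def breed(parent_a, parent_b, rng):
    """Build a child by picking each gene at random from one of the parents."""
    return [rng.choice(genes) for genes in zip(parent_a, parent_b)]


def mutate(genome, bounds, rng, rate=0.3, width=0.1):
    """With probability `rate`, randomly alter one gene, kept within its bounds."""
    child = list(genome)
    if rng.random() < rate:
        k = rng.randrange(len(child))
        low, high = bounds[k]
        perturbed = child[k] * (1.0 + rng.gauss(0.0, width))
        child[k] = min(high, max(low, perturbed))
    return child


def run_ga(data, bounds, generations, pop_size=60, n_parents=15, seed=0):
    """Return the best genome found at the start of each generation."""
    rng = random.Random(seed)
    # Initial population drawn inside the bounds, e.g. around published values.
    population = [
        [rng.uniform(low, high) for low, high in bounds] for _ in range(pop_size)
    ]
    best_by_generation = []
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, data))
        parents = population[:n_parents]          # only the best reproduce
        best_by_generation.append(list(parents[0]))
        children = [
            mutate(breed(rng.choice(parents), rng.choice(parents), rng), bounds, rng)
            for _ in range(pop_size - n_parents)
        ]
        population = parents + children           # parents survive unchanged
    return best_by_generation


# Synthetic data generated from a = 5, b = 0.1 (illustrative only).
DATA = [(t, 5.0 * math.exp(0.1 * t)) for t in range(30)]
BOUNDS = [(1.0, 20.0), (0.01, 0.3)]
history = run_ga(DATA, BOUNDS, generations=30)
```

Running `run_ga` with several different `seed` values gives slightly different solutions, whose spread can serve as an error estimate on the prediction. The list of best genomes per generation is also the input needed to select a stopping generation by cross-validation.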
The overfitting problem
Since finding the correct SEIQRDP parameters for an epidemic is essentially a curve fitting problem, the predictive effectiveness of the model can be considerably reduced if we underfit or overfit the available real data, which we will call training data. If the training data is underfitted, the model could simply diverge or give overestimated numbers with very large variance. Conversely, if the data is overfitted, the predictive curves produced by the model will be strongly influenced by the given training data and will have very low variance, thus artificially reducing the error on the predicted numbers and eventually leading to an unrealistic forecast. Overfitting remains a major problem with epidemic dynamical models [21]. In many of them overfitting occurs because so many parameters can fluctuate over their range of uncertainty that their fitted values can become excessively influenced by noise in the original data [22]. Thus, restrictions have been applied in some epidemic analyses, including of the COVID-19 outbreak, in order to reduce the number of free parameters and avoid overfitting affecting the pertinence of those studies [12]. To overcome this issue we cut off the fitting process after a number of generations which is large enough to actually fit the training data and small enough not to overfit it. We call this number the optimum generation number $G_{opt}$ and we compute it using data from a given province or country through a two-sample cross-validation procedure. Thus, $G_{opt}$ corresponds to the fitting depth that ensures a good balance between underfitting and overfitting in our model.

Computation of the optimum generation number
Cross-validation is a procedure where an original training set is split into training and validation subsets, and where the model is trained on the first subset and tested on the second one [23]. In our case, the original training set is the whole available data on the COVID-19 epidemic for a given country or region over $n$ days. This data is then split into a training subset containing the data of the first $n - v$ days, and a validation subset of the last $v$ days. The ratio $v/n$ depends on the number of adjustable parameters in the regression problem [24]. This ratio is around 1/4 for the SEIQRDP model.

To determine $G_{opt}$, we run our genetic algorithm to fit the training subset. After each generation we measure the fitness of the best solution on the validation set. The expected result of this process is a poor fitness for a very low number of generations $G$, which improves with every new generation until we start overfitting the training subset (high $G$), resulting in a worse fitness. The value $G \equiv G_{opt}$ for which the fitness on the validation set is the best is chosen as the stopping point for the genetic algorithm when applied for predictive purposes.

In order to allow others to take advantage of the fitting algorithm and the cross-validation method presented in this study for other epidemic cases, a tailored set of Python programs developed by the authors has been gathered in a Python package and made accessible online with all the necessary instructions for installation and efficient use [25]. This package is adapted for parallel computation and includes tools to: download data from online repositories, determine the optimum fitting depth for a given city, region or country using the cross-validation method, calibrate the SEIQRDP model by fitting the real data with the genetic algorithm, and solve the system of ODEs to produce the forecast. Hence, our generic programs might be easily applied to study any outbreak for which a compartmental analysis is adequate, in any region of the world, provided that a sufficient amount of epidemic data is available.

Figure 1: Results of the SEIQRDP model forecast for the COVID-19 outbreak in (a) Italy ($G_{opt} = 40$, $S_N = 17\%$), (b) Spain ($G_{opt} = 20$, $S_N = 10\%$), (c) Germany ($G_{opt} = 10$, $S_N = 7\%$) and (d) South Korea ($G_{opt} = 10$, $S_N = 3\%$). Training sets of 30, 45, 60 and 90 days of official data (green tri-down marker lines), respectively, are initially used to fit the model's parameters. The forecast curves (blue lines) are calculated using the SEIQRDP model and compared to the real active cases curves (red lines) for each country. Light blue shading represents 95% confidence intervals of the model estimate. Even with only four or six weeks of training data the model is able to produce a realistic forecasting estimate. All fits have a coefficient of determination $R^2 >$ […].

Provided reasonably accurate data, our model successfully reproduces the evolution of COVID-19 in different spots worldwide for which a sufficient amount of data is available. In this section we present the results obtained using the SEIQRDP model to estimate the evolution of active cases in Italy, Spain, Germany and South Korea. For those countries we use publicly available numbers of confirmed cases, recoveries and deaths from online raw data sets [26, 27]. We pick training data starting from the date at which the numbers of confirmed cases, recoveries and deaths all take non-zero values, to avoid computational bugs and optimize parameter fitting. The active cases curve is then reproduced for 6 months following that date. In order to highlight the efficiency of the model we use only an early part of the official data to train the model instead of all the available data. We have used 30 days of training data for Italy, 45 days for Spain, 60 days for Germany and 90 days for South Korea. Fig. 1 shows that the results obtained using SEIQRDP are in very good accordance with official statistics for the number of active cases in those countries. All the fits have a coefficient of determination $R^2 >$ […]. […] $S_N$ to evaluate the goodness of the fits. We recall that, in order to avoid overfitting the training data and obtain the best forecast, we look for the optimum fit corresponding to $G_{opt}$ rather than the best one. For Italy, the model is able to reproduce to a good approximation ($S_N = 17\%$) the active cases curve with only 30 days of training data. Fig. 1 reveals that the larger the training data sample, the lower the optimum number of generations $G_{opt}$ used by the genetic algorithm to fit the data, and the lower the resulting $S_N$. The active cases curve is one of the most pertinent in our opinion, as it reflects the amplitude of the epidemic outbreak as well as the efficiency of the measures applied to control it. Moreover, the epidemic will end only when all the active cases are closed. Germany and South Korea are very special cases and need an in-depth analysis that is beyond the scope of this paper, but one can clearly observe the quicker decrease of their active cases curves after the epidemic peak time, due to particular strategies to control the epidemic.

Figure 2: Estimation of key epidemic parameters during the early stage of the COVID-19 outbreak in Algeria (Feb. 25th - May 24th). Intermediate values are calculated for five different time periods corresponding to specific phases of the virus propagation, each with its own circumstances and severity of authorities' measures. The intermediate values of each parameter are compared to its mean value over the whole three-month period (dashed red line). Protection rate, transmission rate, latent time and infectious time are estimated using the SEIQRDP model, while recovery and fatality rates are calculated from official data. Error bars represent 95% confidence intervals of the model estimate (color online).
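The selection of the optimum generation number by cross-validation, as described in this section, can be sketched in a few lines. This is a hedged illustration rather than the authors' script: `split_series` and `optimum_generation` are hypothetical helper names, the per-generation candidates are hard-coded stand-ins for the best genome of each GA generation, and the validation error is a toy least-squares error on a straight line.

```python
def split_series(series, n_validation):
    """Split n days of data into the first n - v days and the last v days."""
    return series[:-n_validation], series[-n_validation:]


def optimum_generation(best_by_generation, validation_error):
    """Return (G_opt, errors), with G_opt the 1-based generation whose best
    candidate has the lowest error on the validation subset."""
    errors = [validation_error(candidate) for candidate in best_by_generation]
    g_opt = min(range(len(errors)), key=errors.__getitem__) + 1
    return g_opt, errors


# 90 days of data split into 70 training days and 20 validation days.
days = list(range(90))
train_days, validation_days = split_series(days, 20)

# Toy validation data following y = 1.0 * t (illustrative assumption).
validation_data = [(t, 1.0 * t) for t in validation_days]


def toy_validation_error(slope):
    """Least-squares error of the line y = slope * t on the validation data."""
    return sum((slope * t - y) ** 2 for t, y in validation_data)


# Hypothetical best candidate of each generation: the fit drifts towards a
# value (1.3) that suits the training subset but overshoots the validation data.
candidates = [0.2, 0.6, 0.9, 1.05, 1.2, 1.28, 1.3]
g_opt, validation_errors = optimum_generation(candidates, toy_validation_error)
```

In this toy run the validation error first decreases and then increases again as the candidates overfit, and `g_opt` marks the generation at which the fitting would be stopped when the algorithm is later applied to the whole data set.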
3. Model estimation for Algeria
For the study of COVID-19 dynamics in Algeria, we use official public data provided by the AHM [28, 29]. The specificity of our analysis, which we believe makes its forecasts for Algeria more accurate than those of other studies in which Algeria is presented as an example [6], is that instead of relying on official numbers of RT-PCR-confirmed SARS-CoV-2 infection cases, which are strongly affected by limited test capacities, we deduce the effective number of confirmed infections per day by considering the number of patients admitted to hospital. This number is considered as the effective number of active cases in our study. The effective number of confirmed cases for a given date is then deduced by adding the cases confirmed by computed tomography (CT) scans to the official RT-PCR confirmed cases (see Appendix A).
Besides the more exciting forecasting use of the SEIQRDP model, it is particularly efficient for nowcast estimations. Indeed, fitting the official data allows us to estimate key epidemic parameters of the early-stage spread of COVID-19 in Algeria. Even though hundreds of studies estimate those parameters for COVID-19 in different spots of the world, a local estimation is of major importance, as their values are strongly related to local population discipline, public health capacities and the severity of local containment measures at the very beginning of and during the epidemic period.

During the month following the first confirmed case of COVID-19 in Algeria on Feb. 25th, the disease underwent a practically free propagation phase. On March 12th universities, schools and nurseries were closed. On March 19th, all trips between Algeria and European countries were canceled by the Algerian authorities, who decided the first strong containment measures against COVID-19 spread on March 24th. A total lockdown of Blida province and a partial lockdown in many other provinces were applied. Coffee shops, restaurants and all non-essential shops were closed, public transportation was suspended and gatherings of more than two persons were forbidden. On April 24th, the authorities decided a partial release of lockdown measures in Blida and other provinces and allowed many commercial activities to resume. This date coincided with the start of the holy month of Ramadan, resulting in a sharp increase in social and commercial activities. Due to poor compliance with physical distancing and protection measures, the number of new confirmed cases increased significantly and shops were closed again in many provinces from May 7th. In light of this chronology of measures, we have estimated intermediate mean values of the epidemic parameters during the free-propagation phase (Feb. 25th - Mar. 25th) and then for every two-week intermediate period until May 24th. Those intermediate mean values, shown in Fig. 2, provide valuable information revealing the evolution of the epidemic in Algeria during its first three months and the impact of the applied control measures.

Figure 3: SEIQRDP model forecast. (a) Infected cases: number of exposed (infected not yet infectious), infectious (asymptomatic infectious) and active (quarantined) cases. The figure shows the epidemic peak time, corresponding to the maximum of active cases, to be in the period May 20th - May 30th with roughly ten thousand active cases. (b) Cumulative numbers: total quarantined, recoveries and deaths. Real data are represented by the red, green and blue dashed lines, respectively. (c) Reproduction number since Feb. 25th: time-dependent reproduction number. The $R_t = 1$ point is in perfect accordance with the inflection points of the exposed, infected and active cases curves. (d) Incidence of the disease: number of newly infected individuals per day. Light shadings represent 95% confidence intervals of the model estimate (color online).

In order to forecast the evolution of COVID-19 in Algeria we apply the SEIQRDP model with a training data period from Feb. 25th to May 24th. The cross-validation script is applied to the first 70 days of the data set and tested on the remaining 20 days to calculate the optimum number of generations. For the chosen set of data we obtain $G_{opt} = 20$.
Then, the genetic algorithm and the rest of the SEIQRDP set of programs are applied to the whole training data to calculate the optimum fit and reproduce the SEIQRDP variable curves using the fitted parameters. We present in this paper a forecast of COVID-19 outbreak dynamics until the end of September 2020, the time for which the reopening of schools and universities is scheduled. That step would represent a crucial period in the disease evolution and will require a specific analysis in due course.

| Parameter | Definition | Value for Algeria (95% CI) | Value for Wuhan (95% CI) | Reference (Wuhan) |
|---|---|---|---|---|
| $\alpha$ | Protection rate (mean) | 0.015 (0.014-0.017) | 0.085 | Peng et al. [12] |
| $\beta$ | Transmission rate (mean) | 0.64 (0.62-0.66) | 0.99 | Peng et al. [12] |
| $\gamma^{-1}$ | Latent time (mean, days) | 2.7 (2.6-2.8) | 2 | Peng et al. [12] |
| $\delta^{-1}$ | Infectious time (mean, days) | 5.9 (5.7-6.1) | 7.4 | Peng et al. [12] |
| $R_0$ | Basic reproduction number | 3.78 (3.033-4.53) | 6.47 (5.71-7.23) | Tang et al. [14] |

Table 1: Summary of SEIQRDP parameter estimates for COVID-19 in Algeria compared to Wuhan (China). $R_0$ is estimated on Feb. 25th, while the mean values of the other parameters are calculated for the first three months of the outbreak in Algeria.
4. Results and Discussion
Our model estimates that on Feb. 25th, in addition to the first confirmed SARS-CoV-2 infected case in Algeria, at least 7 other individuals had been infected without showing any symptoms. On March 2nd, when the first two cases were confirmed in Blida, we estimate that the number of asymptomatic infectious people had already reached 10 individuals and that at least 10 others were in a latent period. One week later the number of asymptomatic infected persons had already exceeded 70 according to our estimations. Officially, 20 of them had been confirmed at that time.

The model estimates of the epidemic parameters for the first three months of COVID-19 in Algeria are in good agreement with the on-the-ground evolution of the outbreak. The estimated basic reproduction number on Feb. 25th is $R_0 = 3.78$ (95% CI 3.033-4.53), while the value of $R_t$ on May 24th is estimated at 0.651 (95% CI 0.539-0.761) and the mean effective reproduction number during the first three months of the epidemic is evaluated at 1.74 (95% CI 1.55-1.92). The notable decline in $R_t$ during this period might reflect outbreak control efforts and growing awareness of SARS-CoV-2. By the same token, we observe a significant rise of the protection rate $\alpha$ after the first control measures on March 24th, jumping from 0.0041 during the free propagation phase before March 25th to 0.0089 in the period March 26th - April 10th and doubling again to 0.021 in the next period (see Fig. 2, upper-left corner). Interestingly, the protection rate curve reflects the release of containment and the poorer compliance with protection measures in the period between April 27th and May 12th, resulting in a decline of $\alpha$ during the next period. The mean value of the protection rate over the whole study period is estimated at 0.015 (95% CI 0.014-0.017). The increase of the transmission rate shown in Fig. 2, lower-left corner, is reasonable given the continuous propagation of the virus and the appearance of many clusters in densely populated provinces. In addition, the low number of daily tests and the relatively long test-to-result time of the testing technology used increase the probability that an asymptomatic infectious individual spreads the virus before being quarantined. The mean value of the transmission rate is estimated at 0.64 (95% CI 0.62-0.66). The mean latent time is evaluated at 2.7 (95% CI 2.6-2.8) days and the mean infectious time is estimated at 5.9 (95% CI 5.7-6.1) days. The incubation time (latent time + infectiousness time) has a mean value of 8.6 (95% CI 8.3-8.9) days. One remarkable point that can be observed in the middle panel of Fig. 2 is that, apart from the first period, the incubation time remains relatively stable, taking values within the range [7.9-8.6] days.
This reflects the fact that the model effectively calibrates the global features of the evolution of the hidden variables representing the exposed $E(t)$ and infectious $I(t)$ portions of the population, which are not measurable. The decrease of the incubation time after the first period of the study might be a consequence of a better detection scheme. In fact, a high diagnosis capacity allowing a large-scale testing strategy and efficient tracking are essential tools to reduce the onset-to-quarantine (incubation) period, since early and quick detection of infectious individuals enables authorities to quarantine them before they show symptoms, hence limiting the number of their contacts. Moreover, this helps reduce the effective reproduction number $R_t$ and thus better control the disease spread.

In contrast to other epidemic parameters, the recovery and fatality rates shown in the right panel of Fig. 2 are directly calculated from official data. The recovery rate varies in the range [1.1% - 2.7%] with a mean value of 1.9%, and the fatality rate, initially estimated as the highest in the world at the time, fell below 0.5% from mid-April, with a mean value estimated at 1.02%. The significant decrease of the fatality rate, even though affected by the growing test capacities after the number of daily RT-PCR tests was increased and the CT-scan diagnosis of COVID-19 was adopted at the beginning of April, could also be interpreted as the consequence of better medical care. The fatality rate seems to stabilize during the last month of the study (0.071% in April 27th - May 12th and 0.058% in May 13th - May 24th) as the newly deployed RT-PCR test capacities are again reaching their limits. The epidemic analysis of the parameters' values is beyond the scope of this paper, as it requires information to which we do not have access. Nevertheless, we notice that the key parameter values obtained through our model for Algeria fall within the ranges estimated for the Chinese city of Wuhan, where SARS-CoV-2 first appeared [12, 14, 20], as shown in Table 1.

Figure 4: Case Fatality Rate (CFR) and Infection Fatality Rate (IFR) for the COVID-19 outbreak in Algeria between Feb. 25th and May 24th. Light red shading represents 95% confidence intervals of the model estimate (color online).

The forecast simulations (Fig. 3), based on the available official data, estimate that the infection peak time, corresponding to the maximum incidence, occurred on April 24th-26th with 387 (95% CI 267-509) new infections per day, as shown in Fig. 3d. The effective reproduction number continuously decreased, reflecting a better control of the disease spread, and crossed the line $R_t = 1$ by May 1st (see Fig. 3c). At that crucial point the disease entered the attenuation phase. The SEIQRDP model evaluates the active cases peak time for the COVID-19 outbreak in Algeria, corresponding to the maximum of active cases, to be in the period between May 20th and May 30th with 9794 (95% CI 8770-1024) active cases (see Fig. 3a).

We estimate that the number of new infections will vanish by mid-September. At that time the number of active quarantined cases will still be above 500. Assuming that the epidemic remains ongoing until all active cases have been closed, the model predicts the outbreak to end no earlier than October 2020, with an estimated total of 24021 (95% CI 20768-27274) quarantined individuals, 15291 (95% CI 13272-17310) recovered and 8172 (95% CI 7093-9251) deaths, as shown in Fig. 3b. Notice that the projected total number of deaths appears to be particularly overestimated compared to the official number (blue dashed line). A solution to this technical issue is under investigation.
We emphasize that the numbers we present in this forecast are only estimates that could be seriously affected by the behaviour of the population and by any measures taken by the authorities during the epidemic. An abrupt release of containment could result in a reversal of the curves as long as no vaccine has been developed for large-scale use.

An important piece of information that can be extracted from the official public data is the Case Fatality Rate (CFR), corresponding to the ratio of deaths to effective confirmed cases. The Infection Fatality Rate (IFR), often confused with the CFR, is the ratio of deaths to infected cases, including asymptomatic cases, which are non-measurable. For that reason we calculate the CFR based on official data, while the IFR is calculated as the ratio of the official cumulative deaths to the cumulative number of infected individuals obtained from the SEIQRDP model (see Fig. 4). The mean CFR over the period Feb. 25th - May 24th is estimated at 5.3%, while the mean value of the IFR over the same period is 2.9% (95% CI 1.7%-3.9%). Notice that the mean IFR for the first three months of the outbreak in Algeria is higher than the global value, estimated at 1.4% by a recent study using cumulative COVID-19 data from 139 countries [30].

It is worth noting that compartmental models, including the SEIQRDP model, are fully valid only when certain conditions on the studied population are assumed. Indeed, the SEIQRDP model requires a well-mixed and homogeneous population. A well-mixed population means that all individuals in the population have the same chance of being infected by an infectious one. Homogeneity means that all individuals behave alike toward the disease and are thus governed by the same transition probabilities between the different population compartments. As a consequence, all calibrated parameters in this study should be seen as statistical averages over the population.
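As a minimal illustration of the CFR and IFR definitions given above, the following Python sketch computes both ratios from cumulative counts. The function name fatality_ratio and all numerical values are hypothetical and chosen only for illustration; they are not the Algerian data analyzed in this study, and the sketch is not taken from the package of Ref. [25].

```python
def fatality_ratio(cumulative_deaths, cumulative_cases):
    """Return deaths / cases as a percentage, or None when there are no cases."""
    if cumulative_cases <= 0:
        return None
    return 100.0 * cumulative_deaths / cumulative_cases


# Hypothetical illustrative numbers, NOT the official Algerian data.
deaths = 50            # official cumulative deaths
confirmed = 1000       # official cumulative confirmed cases
model_infected = 2000  # cumulative infections estimated by a model (assumed value)

# CFR uses only official counts; IFR uses the model-estimated infections,
# which also include undetected (e.g. asymptomatic) cases.
cfr = fatality_ratio(deaths, confirmed)
ifr = fatality_ratio(deaths, model_infected)

print(f"CFR = {cfr:.1f}%, IFR = {ifr:.1f}%")
```

Whenever the model-estimated cumulative infections exceed the confirmed cases, the IFR computed this way is lower than the CFR, which is consistent with the mean values of 2.9% and 5.3% reported above.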
The SEIQRDP model is also fundamentally not additive, i.e., the sum of different SEIQRDP models applied to different provinces of a given country is not necessarily equivalent to the SEIQRDP model applied to the whole country. In view of all these considerations, it would be very interesting to apply our study separately to the major infected cities of the country.
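The non-additivity can be illustrated with a deliberately simplified sketch. The code below uses a basic SIR model integrated with a forward Euler scheme instead of the full SEIQRDP model, and all names (simulate_sir, beta, gamma) and parameter values are hypothetical choices made for illustration only. Two provinces of equal size are simulated separately, one with initial infections and one without, and the sum of their cumulative infections is compared with a single well-mixed model of the whole country.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate a basic SIR model with forward Euler.

    Returns the cumulative number of individuals ever infected.
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infectious
    steps = int(round(days / dt))
    for _ in range(steps):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
    return population - s


# Hypothetical parameters (assumed for illustration, not calibrated values).
beta, gamma, days = 0.3, 0.1, 300

# Province A starts with 100 infectious individuals, province B with none.
cumulative_a = simulate_sir(1_000_000, 100, beta, gamma, days)
cumulative_b = simulate_sir(1_000_000, 0, beta, gamma, days)
sum_of_provinces = cumulative_a + cumulative_b

# The whole country treated as one well-mixed population.
whole_country = simulate_sir(2_000_000, 100, beta, gamma, days)

print(f"Sum of province models: {sum_of_provinces:.0f}")
print(f"Whole-country model:    {whole_country:.0f}")
```

In this toy setting the country-level model lets the infection reach the whole population, while the province without initial cases stays untouched in the separate models, so the two totals differ substantially.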
5. Conclusion
In this paper we have presented an enhanced compartmental SEIQRDP model for epidemics in which a protection rate has been introduced and noteworthy compartments of quarantined and protected populations have been added compared to the most widely used SEIR models. Our approach is based on a genetic fitting algorithm and makes use of a cross-validation method to overcome the overfitting problem. We have designed a generic open-source package containing all the computational tools used in our analysis [25]. Based on official cumulative recoveries, cumulative deaths and deduced effective cumulative confirmed cases, this model allowed us to estimate key epidemic parameters and to forecast the time evolution of the effective reproduction number of the disease, in order to evaluate the epidemic situation and the effect of the control measures applied. Using our SEIQRDP model, we have been able to provide a valuable approximate estimate of the daily evolution of the non-measurable asymptomatic exposed and infectious cases, in addition to the daily active cases, from the beginning until a very advanced stage of the COVID-19 outbreak in Algeria. We have estimated the periods in which these numbers will peak and approximated the maximum values they could reach. We have also estimated the time at which the number of new infections will vanish. Even though the SEIQRDP model we presented, like many SEIR derivatives, is effective in different contexts, we are still studying the Algerian case carefully because the reported COVID-19 epidemic evolution in Algeria quickly reached the country's maximum diagnostic capacity, which is well reflected in the linear form of the official confirmed cases data.
Moreover, we should note that the SEIQRDP model is fundamentally designed to simulate a well-mixed closed population and that it is very sensitive to data accuracy. In this context we stress that our estimates depend strongly on the data publicly available at the time this study was carried out. In addition, the epidemic evolution could be significantly affected by future containment or release measures and then deviate from the forecast we presented. Furthermore, the scenario in which the epidemic vanishes after a first peak without a secondary wave is one among many others and not the most probable.

We are investigating several ways to optimize our model so that it fits the COVID-19 evolution in Algeria and elsewhere with more refined methods. Additionally, a completely different agent-based epidemic model is already at an advanced stage of development and will be used to tackle the virus spread from a different perspective.

We hope this study can serve as a useful guideline for Algerian scientists and the Algerian government and contribute efficiently to the fight against the COVID-19 pandemic on the national and international scale.
CRediT authorship contribution statement
M. T. R.: Conceptualization, Methodology, Software, Formal analysis, Writing - Original Draft, Supervision.
A. T.: Conceptualization, Methodology, Software, Formal analysis, Writing - Review & Editing.
N. E. B.: Conceptualization, Methodology, Software, Formal analysis, Writing - Review & Editing.
Conflict of interest
The authors declare no conflict of interest.
Acknowledgment
This study presents the results of curiosity-driven research, which was carried out solely with the personal resources of the authors.
References
[1] Algerian-Health-Ministry Situation report of 25/02/2020. URL http://covid19.sante.gov.dz/fr/2020/02/25/25-fevrier-2020/
[2] Algerian-Health-Ministry Situation report of 29/02/2020. URL http://covid19.sante.gov.dz/fr/2020/03/03/point-de-situation-au-29-02-2020/
[3] Algerian-Health-Ministry Situation report of 02/03/2020. URL http://covid19.sante.gov.dz/fr/2020/03/03/point-de-situation-au-02-03-2020-2/
[4] Lounis M 2020 European Journal of Medical and Educational Technologies
[5] Hamidouche M Bull World Health Organ. [Preprint] URL http://dx.doi.org/10.2471/BLT.20.256065
[6] Zhao Z, Li X, Liu F, Zhu G, Ma C and Wang L 2020 Science of The Total Environment
[7] Luo J Data-Driven Innovation Lab URL https://ddi.sutd.edu.sg/
[8] Anderson M and May R M 1992 Infectious Diseases of Humans, Dynamics and Control (Oxford University Press) ISBN 9780198540403
[9] Meloni S, Perra N, Arenas A, Gómez S, Moreno Y and Vespignani A 2011 Scientific Reports 62 URL https://doi.org/10.1038/srep00062
[10] Chowell G, Nishiura H and Bettencourt L M 2007 Journal of The Royal Society Interface URL https://royalsocietypublishing.org/doi/abs/10.1098/rsif.2006.0161
[11] Wu J T, Leung K and Leung G M 2020 The Lancet URL https://doi.org/10.1016/S0140-6736(20)30260-9
[12] Peng L, Yang W, Zhang D, Zhuge C and Hong L 2020 medRxiv
[13] Labadin J and Hong B H 2020 medRxiv
[14] Tang B, Wang X, Li Q, Bragazzi N L, Tang S, Xiao Y and Wu J 2020 Journal of Clinical Medicine 462 ISSN 2077-0383 URL http://dx.doi.org/10.3390/jcm9020462
[15] Ma Z, Zhou Y and Wu J 2009 Modeling and Dynamics of Infectious Diseases (co-published with Higher Education Press)
[16] Diekmann O; Heesterbeek H B T 2013 Mathematical tools for understanding infectious diseases Princeton series in theoretical and computational biology (Princeton University Press) ISBN 9780691155395, 0691155399 URL http://gen.lib.rus.ec/book/index.php?md5=a5ac7b120c717866af712961deca8bf3
[17] Hannah L 2015 International Encyclopedia of the Social & Behavioral Sciences
[18] Giudici M, Comunian A and Gaburro R 2020 Inversion of a SIR-based model: a critical analysis about the application to COVID-19 epidemic (Preprint)
[19] Chambers L 1995 Practical Handbook of Genetic Algorithms, Applications, Volume 1 (CRC-Press, Inc.) ISBN 0849325196
[20] Lin Q, Zhao S, Gao D, Lou Y, Yang S, Musa S S, Wang M H, Cai Y, Wang W, Yang L and He D 2020 International Journal of Infectious Diseases URL https://doi.org/10.1016/j.ijid.2020.02.058
[21] Yang W, Zhang D, Peng L, Zhuge C and Hong L 2020 medRxiv
[22] Basu S and Andrews J 2013 PLOS Medicine URL https://doi.org/10.1371/journal.pmed.1001540
[23] Browne M W 2000 Journal of Mathematical Psychology 108-132 ISSN 0022-2496
[24] Guyon I 1997 A scaling law for the validation-set training-set size ratio
[25] URL https://github.com/Taha-Rouabah/COVID-19
[26] Data-Packaged-Core-Datasets URL https://raw.githubusercontent.com/datasets/covid-19/master/data/time-series-19-covid-combined.csv
[27] Ritchie H Our world in data URL https://ourworldindata.org/coronavirus-source-data
[28] Algerian-Health-Ministry-EADN Epidemic map COVID-19 in Algeria. URL http://covid19.sante.gov.dz/carte/
[29] CDTA Epidemic map COVID-19 in Algeria URL https://covid19.cdta.dz/dashboard/production/index.php
[30] Grewelle R and De Leo G 2020 medRxiv
[31] INSP Epidemic report 1, 18/04/2020, Algeria
Appendix A. Effective confirmed cases
In Fig. A1 we show the official number of patients admitted to hospital due to COVID-19 in Algeria (blue dashed line) and the official number of active cases (yellow dashed line), computed by subtracting the official numbers of recoveries and deaths from the
Figure A1: