A mathematical epidemic model using genetic fitting algorithm with cross-validation and application to early dynamics of COVID-19 in Algeria

Abstract

A compartmental epidemic model based on a genetic fitting algorithm is proposed, together with a cross-validation method to overcome the overfitting problem. This generic enhanced SEIR model provides approximate nowcasts and forecasts of the epidemic evolution, including key epidemic parameters and the non-measurable asymptomatic infected portion of the susceptible population. The model is used to study COVID-19 outbreak dynamics in Algeria between February 25th and May 24th, 2020. The basic reproduction number on Feb. 25th is estimated at 3.78 (95% CI 3.033-4.53) and the effective reproduction number on May 24th, after three months of the outbreak, is estimated at 0.651 (95% CI 0.539-0.761). The infection peak time is predicted for the end of April while the active cases peak time is predicted for the end of May 2020. The disease incidence, CFR and IFR are calculated. Information provided by this study could help establish a realistic assessment of the situation in Algeria for the time being, inform predictions about potential future evolution, and guide the design of appropriate public health measures.



M. T. Rouabah*, A. Tounsi, N. E. Belaloui
Laboratoire de Physique Mathématique et Subatomique, Frères Mentouri University Constantine - 1, Ain El Bey Road, Constantine, 25017, Algeria.


Keywords: COVID-19; mathematical modeling; genetic algorithm; cross-validation; Algeria.

* Corresponding author: [email protected], [email protected].

arXiv [q-bio.PE], Jun

1. Introduction

The recent outbreak of the highly infectious COVID-19 disease caused by SARS-CoV-2 in Wuhan and other cities in China in 2019 has become a global pandemic since the first quarter of 2020, as declared by the World Health Organization (WHO). SARS-CoV-2 was first imported to Algeria on Feb. 17th, 2020 by an Italian national who was confirmed positive for COVID-19 on Feb. 25th [1]. The Italian man was repatriated to Italy via a special flight on Feb. 28th, and no individuals contaminated by this first confirmed case have been reported by the Algerian official authorities [2]. As far as we know, the effective outbreak of COVID-19 in Algeria started in late Feb. 2020. Indeed, the Algerian Health Ministry (AHM) reported in a statement on March 2nd the first two confirmed cases of COVID-19 in Blida province, south of the capital Algiers [3]. Since then the spread of the virus in Algeria has gone through different epidemic phases [4].

Besides medical and biological research, theoretical studies based on either statistics or mathematical modeling may also play a key role in understanding the epidemic characteristics of the outbreak. Epidemic modeling represents a crucial tool in forecasting the inflection point and ending time and provides insights into the epidemiological situation. Such analysis can predict the potential future evolution, help estimate the efficiency of measures already taken, and guide the design of alternative interventions. To the best of our knowledge, few theoretical studies on the COVID-19 outbreak in Algeria have so far been carried out [5]. The lack of publicly accessible theoretical and clinical studies about the spread of SARS-CoV-2 in Algeria, exposing the actual situation and analyzing possible evolution scenarios, is making the situation more confusing for the Algerian public and scientific community. Many studies on COVID-19 characteristics and dynamics around the world are published every day, some of which include the Algerian case [6, 7]. However, we believe that any analysis of the COVID-19 outbreak in Algeria should take into consideration many specific aspects that are not considered in such universal studies and online simulators, which use raw data accessible in many databases. Beyond the fact that the majority of those databases contain many wrongly reported data for Algeria, data nomenclature and interpretation, as well as the test methods proper to every country, should be taken into consideration for more accurate outcomes. In our analysis, instead of relying only on official Reverse Transcriptase Polymerase Chain Reaction (RT-PCR) confirmed SARS-CoV-2 infection cases, which are strongly affected by limited test capacities, we combine them with the official number of patients admitted to hospital due to SARS-CoV-2 infection in order to deduce the effective number of new confirmed infections per day. This choice makes a significant difference not only to the cumulative number of confirmed cases but also to the nowcast and forecast of the virus spread.

The paper is organized as follows: in the next section we present the mathematical model we use for the dynamical modeling of COVID-19 propagation and some results of the model with reference to the pandemic spread in some specific countries. The third section is devoted to the application of the model to the Algerian case through the estimation of key epidemic parameters and a forecast analysis. Results are shown and discussed in the fourth section. The concluding section includes some ideas about future developments of this work.

2. Model and Methods

At the very beginning of the epidemic, during the free spread phase, it is common to assume an initial exponential growth, which is characteristic of most human infectious diseases [8]. However, spontaneous herd immunity, protections and lockdown measures will counteract the geometrical evolution. A dynamical model is then required to describe the evolution of the disease.

The classical compartmental Susceptible Exposed Infectious Recovered (SEIR) model [8, 9] has been the most widely adopted model for characterizing many historical propagating infectious diseases such as the Spanish flu [10]. The SEIR model is extensively used to study the COVID-19 pandemic in China and many other countries, with variations best suiting the subject region and time period [11, 12, 13].

Given the novelty of the time course of infection shown by the disease and the required protection measures, to simulate COVID-19 spread we use a SEIQRDP model in which at time $t$ the population is split into compartments that represent the different stages of a disease [12, 14]. $S(t)$ represents the susceptible portion of the population, i.e. those yet to be infected. $P(t)$ represents the effectively protected population, mainly individuals who tend to strictly follow the standard advised protection measures such as wearing masks and physical distancing. Hence, this part of the population is considered as not susceptible to infection. Introducing this compartment is crucial to reflect increasing awareness within the major part of the population as the pandemic evolves, and it allows the model to take into consideration the control measures taken by authorities to fight the pandemic, such as closing public areas, suspending public transportation and lockdown. $E(t)$ represents a latent state in which individuals have been exposed to the disease but are not yet infectious, i.e. the individuals in this stage carry the virus but cannot infect others, whereas $I(t)$ represents those that are currently infectious. The asymptomatic exposed and infectious portions of the population are not detectable and hence non-measurable. The proportion of this part of the population can only be revealed by theoretical modeling of the disease.
$Q(t)$ represents quarantined individuals considered as active cases, $R(t)$ represents individuals who have recovered from the disease and are assumed to no longer take part in the disease spread, and $D(t)$ represents closed cases or deaths. $N = S(t) + E(t) + I(t) + Q(t) + R(t) + D(t) + P(t)$ is the total population at time $t$, considered constant on the time scale of the epidemic evolution. The SEIQRDP model represents the virus propagation by a set of ordinary differential equations (ODEs) that associate transition parameters with the movement of individuals between the population compartments defined above:

\begin{align}
\dot{S} &= -\beta S(t) I(t)/N - \alpha S(t), \tag{1}\\
\dot{E} &= \beta S(t) I(t)/N - \gamma E(t), \tag{2}\\
\dot{I} &= \gamma E(t) - \delta I(t), \tag{3}\\
\dot{Q} &= \delta I(t) - \lambda Q(t) - \kappa Q(t), \tag{4}\\
\dot{R} &= \lambda Q(t), \tag{5}\\
\dot{D} &= \kappa Q(t), \tag{6}\\
\dot{P} &= \alpha S(t), \tag{7}
\end{align}

where $\dot{S}$ refers to the time derivative of $S$. The positive rate $\alpha$, called the protection rate, is introduced into the model assuming that the susceptible population is steadily decreasing as a result of increasing population awareness and public health authorities' actions [12]. All the other parameters depend on the evolution of the epidemic, testing and health care capacities and are calculated based on the official daily confirmed cases, deaths and recoveries numbers. The transmission rate $\beta$ represents the ability of an infected individual to infect others (depending on the population density, the toxicity of the virus, etc.) and $\beta S(t) I(t)/N$ is the incidence of the disease, i.e., the number of newly infected individuals per unit time at time $t$ [15]. $\gamma^{-1}$ is the average latent time that an individual spends incubating the virus before becoming infectious (infected but not yet infectious) and $\delta^{-1}$ is the average infectiousness time, i.e., the time for an infectious individual to develop symptoms and be detected and quarantined.
$\lambda$ is the cure rate and $\kappa$ is the mortality rate, while $\lambda^{-1}$ and $\kappa^{-1}$ represent the quarantine-to-recovery time and the quarantine-to-death time respectively. These transition parameters are used by the model to define a time-dependent number of secondary cases generated by a primary infectious individual, $R_t = \beta\delta^{-1} S(t)/N$, known as the effective reproduction number. It is a very important parameter for the analysis of any epidemic outbreak and provides a measure of the intensity of interventions required to control the virus propagation. In general, if $R_t > 1$ the disease keeps spreading, whereas if $R_t \le 1$, which mathematically corresponds to $\dot{E} + \dot{I} \le 0$ since adding Eqs. (2) and (3) gives $\dot{E} + \dot{I} = \delta I(t)\,(R_t - 1)$, the disease dies out. At the beginning of the epidemic, matching a situation of a completely susceptible population, this quantity is known as the basic reproduction number $R_0 = \beta\delta^{-1}$ and is obtained by the next generation matrix method [16].

Even though many COVID-19 studies try to calculate universal mean values of the reproduction number and transition parameters in some specific spots of the outbreak, these quantities remain strongly related to local data and could change from one country to another and even from one region to another within the same country. Such parameters are the kind of valuable information this model can provide, in addition to approximate peak times of the disease (infection peak time and active cases peak time) and approximate numbers of the non-measurable asymptomatic cases, active cases, and total quarantined, recovered and death cases. A priori knowledge of those numbers, though approximate, could help to optimize human and material resources on the global and local scales of a country. In this SEIQRDP model, key parameters are extracted from official numbers of cumulative confirmed cases, recoveries and deaths available at a given period of the epidemic. The parameters obtained either by direct calculation or by a fitting algorithm are used to construct the variable curves that fit the initial data. Those curves are then extrapolated to a longer period of time, thus forecasting the evolution of the epidemic.

To calibrate our model's parameters and fit the real data originating from a specific region of the world, many fitting methods are available, most of which are widely used in epidemiological studies and machine learning models. Single-stage problems such as calibrating the parameters of our model are usually solved with modified deterministic optimization methods such as the L-BFGS-B method [17, 18]. However, a stochastic method would have the benefit of taking into account the diversity of the possible calibration scenarios.
Evolutionary genetic algorithms are one of those stochastic methods that have a good reputation for solving optimization problems. In the following section we discuss the advantages that the genetic algorithm method can bring to our study.
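Equations (1)-(7) and the definition of $R_t$ can be illustrated with a minimal numerical sketch. It is not the authors' package: the fixed-step Runge-Kutta integrator, the function names, the population size and the initial compartment values are all assumptions made for illustration, and the constant rates are simply the three-month mean values reported later in the paper (with the mean recovery and fatality rates read as daily rates).

```python
from typing import Dict, List, Tuple

State = Tuple[float, ...]  # (S, E, I, Q, R, D, P)


def seiqrdp_rhs(y: State, p: Dict[str, float], n: float) -> State:
    """Right-hand side of Eqs. (1)-(7) for one state vector."""
    s, e, i, q, _r, _d, _prot = y
    incidence = p["beta"] * s * i / n  # new infections per unit time
    return (
        -incidence - p["alpha"] * s,                     # Eq. (1)
        incidence - p["gamma"] * e,                      # Eq. (2)
        p["gamma"] * e - p["delta"] * i,                 # Eq. (3)
        p["delta"] * i - p["lam"] * q - p["kappa"] * q,  # Eq. (4)
        p["lam"] * q,                                    # Eq. (5)
        p["kappa"] * q,                                  # Eq. (6)
        p["alpha"] * s,                                  # Eq. (7)
    )


def rk4_step(y: State, p: Dict[str, float], n: float, h: float) -> State:
    """One classical fourth-order Runge-Kutta step of size h (in days)."""
    def shifted(k: State, w: float) -> State:
        return tuple(a + w * b for a, b in zip(y, k))

    k1 = seiqrdp_rhs(y, p, n)
    k2 = seiqrdp_rhs(shifted(k1, h / 2), p, n)
    k3 = seiqrdp_rhs(shifted(k2, h / 2), p, n)
    k4 = seiqrdp_rhs(shifted(k3, h), p, n)
    return tuple(
        a + h / 6 * (b + 2 * c + 2 * d + e)
        for a, b, c, d, e in zip(y, k1, k2, k3, k4)
    )


def simulate(y0: State, p: Dict[str, float], days: int,
             steps_per_day: int = 10) -> List[State]:
    """Return the state at the end of each day, starting with y0."""
    n = sum(y0)  # total population, constant in this model
    h = 1.0 / steps_per_day
    trajectory = [y0]
    y = y0
    for _ in range(days):
        for _ in range(steps_per_day):
            y = rk4_step(y, p, n, h)
        trajectory.append(y)
    return trajectory


def effective_r(y: State, p: Dict[str, float], n: float) -> float:
    """Effective reproduction number R_t = beta * delta^{-1} * S(t) / N."""
    return p["beta"] / p["delta"] * y[0] / n


# Illustrative inputs only: constant mean rates, hypothetical population
# and hypothetical initial compartments.
params = {"alpha": 0.015, "beta": 0.64, "gamma": 1 / 2.7,
          "delta": 1 / 5.9, "lam": 0.019, "kappa": 0.0102}
population = 1_000_000.0
y0 = (population - 19.0, 10.0, 8.0, 1.0, 0.0, 0.0, 0.0)
trajectory = simulate(y0, params, days=90)
print(effective_r(trajectory[0], params, population),
      effective_r(trajectory[-1], params, population))
```

Because the right-hand sides of Eqs. (1)-(7) sum to zero, the total population stays constant along the trajectory, and $R_t$ can only decrease since $S(t)$ never grows in this model.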

Definition

A genetic algorithm (GA) is an optimization approach inspired by Darwinian evolution in which an initial set of candidate solutions, called the initial population, each represented by a set of values making up a genome, evolves by breeding and reproducing while being subject to random mutations [19]. The key mechanism in a GA is that only the best performing solutions get to reproduce and pass on their genes, just as in Darwinian natural selection. The evolution process is finally stopped after a certain number of generations, when a defined stopping condition is met.

Application

In our case, applying a GA to find the best fit for the SEIQRDP model parameters is a straightforward application of the above definition. The genome is the set of parameters itself, breeding is the process of creating a new set from two parent sets by randomly selecting genes from either one of the parents, and mutation is a random alteration of one of the genes of the resulting new genome. The best performing set of parameters is the one for which the curves produced by the model best match the original data. This is measured by a normalized least squares method. We can speed up the process by constraining the randomly generated initial population to lie close to already published values for COVID-19 epidemic parameters [20]. Different runs of the GA give slightly different solutions. From these solutions, an error on the prediction made by the model can be computed.
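The breeding, mutation and selection steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the population size, number of parents, mutation scale, the exact form of the normalized least squares cost, and the two-parameter exponential target used in place of the SEIQRDP curves are all hypothetical choices.

```python
import math
import random
from typing import Callable, List, Sequence

Genome = List[float]


def normalized_least_squares(model: Sequence[float],
                             data: Sequence[float]) -> float:
    """Sum of squared residuals divided by the sum of squared data
    (one possible normalization, assumed here)."""
    residual = sum((m - d) ** 2 for m, d in zip(model, data))
    return residual / sum(d ** 2 for d in data)


def breed(mother: Genome, father: Genome, rng: random.Random) -> Genome:
    """Pick each gene at random from either parent."""
    return [rng.choice(pair) for pair in zip(mother, father)]


def mutate(genome: Genome, rng: random.Random, scale: float = 0.1) -> Genome:
    """Randomly alter exactly one gene by up to +/- scale (relative)."""
    child = list(genome)
    k = rng.randrange(len(child))
    child[k] *= 1 + rng.uniform(-scale, scale)
    return child


def run_ga(cost: Callable[[Genome], float], published: Genome,
           generations: int, seed: int, pop_size: int = 40,
           n_parents: int = 10, spread: float = 0.5) -> Genome:
    """Evolve a population seeded around previously published values."""
    rng = random.Random(seed)
    population = [[g * (1 + rng.uniform(-spread, spread)) for g in published]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Only the best performing genomes reproduce; they are also kept.
        parents = sorted(population, key=cost)[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(breed(mother, father, rng), rng))
        population = parents + children
    return min(population, key=cost)


# Toy target: recover (a, b) in a * exp(b * t) from noise-free data.
days = range(20)
data = [5.0 * math.exp(0.2 * t) for t in days]


def toy_cost(genome: Genome) -> float:
    a, b = genome
    return normalized_least_squares([a * math.exp(b * t) for t in days], data)


start = run_ga(toy_cost, [4.0, 0.25], generations=0, seed=1)
best = run_ga(toy_cost, [4.0, 0.25], generations=50, seed=1)
print(toy_cost(start), toy_cost(best))
```

Running the sketch with several seeds gives slightly different best genomes, which is the spread the text uses to attach an error to the model prediction.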

The overfitting problem

Since finding the correct SEIQRDP parameters for an epidemic is essentially a curve fitting problem, the predictive effectiveness of the model can be considerably reduced if we underfit or overfit the available real data, which we will call training data. If the training data is underfitted, the model could simply diverge or give overestimated numbers with very large variance. Conversely, if the data is overfitted, the predictive curves produced by the model will be strongly influenced by the given training data and will have very low variance, thus artificially reducing the error on the predicted numbers and eventually leading to a non-realistic forecast. Overfitting remains a major problem with epidemic dynamical models [21]. In many of them overfitting occurs because so many parameters can fluctuate over their range of uncertainty that their fitted values can become excessively influenced by noise in the original data [22]. Thus, restrictions have been applied in some epidemic analyses, including of the COVID-19 outbreak, in order to diminish the number of free parameters and avoid overfitting affecting the pertinence of those studies [12]. To overcome this issue we cut off the fitting process after a number of generations which is large enough to actually fit the training data and small enough not to go beyond overfitting limits. We call this number the optimum generation number $G_{opt}$ and we compute it using data from a given province or country passing through a two-sample cross-validation procedure. Thus, $G_{opt}$ corresponds to the fitting depth that ensures a good balance between underfitting and overfitting in our model.

Computation of the optimum generation number

Cross-validation is a procedure where an original training set is split into training and validation subsets, and where the model is trained on the first subset and tested on the second one [23]. In our case, the original training set is the whole available data on the COVID-19 epidemic for a given country or region for $n$ days. This data is then split into a training subset containing the data of the first $n - v$ days, and a validation subset of the last $v$ days. The ratio $v/n$ depends on the number of adjustable parameters in the regression problem [24]. This ratio is around 1/4 for the SEIQRDP model.

To determine $G_{opt}$, we run our genetic algorithm for fitting with the training subset. After each generation we measure the fitness of the best solution on the validation set. The expected result of this process is a bad fitness for a very low number of generations $G$, which gets better with every new generation until we start overfitting the training subset (high $G$), resulting in a worse fitness. The value $G \equiv G_{opt}$ for which the fitness on the validation set is the best is chosen as the stopping point for the genetic algorithm when applied for predictive purposes.

In order to allow others to take advantage of the fitting algorithm and the cross-validation method presented in this study for other epidemic cases, a tailored set of Python programs developed by the authors has been gathered in a Python package and made accessible online with all the necessary instructions for installation and efficient use [25]. This package is adapted for parallel computation and includes tools to: download data from online repositories, determine the optimum fitting depth for a given city, region or country using the cross-validation method, calibrate the SEIQRDP model by fitting the real data with the genetic algorithm, and solve the system of ODEs to produce the forecast. Hence, our generic programs might be easily applied to study any outbreak for which a compartmental analysis is adequate, in any region of the world, provided that a sufficient amount of epidemic data is available.

Figure 1: Results of the SEIQRDP model forecast for the COVID-19 outbreak in (a) Italy ($G_{opt} = 40$, $S_N = 17\%$), (b) Spain ($G_{opt} = 20$, $S_N = 10\%$), (c) Germany ($G_{opt} = 10$, $S_N = 7\%$) and (d) South Korea ($G_{opt} = 10$, $S_N = 3\%$). Training sets of 30, 45, 60 and 90 days of official data (green triangle-down marker lines), respectively, are initially used to fit the model's parameters. The forecast curves (blue lines) are calculated using the SEIQRDP model and compared to real active cases curves (red lines) for each country. Light blue shading represents 95% confidence intervals of the model estimate. Even with only four or six weeks of training data the model is able to produce a realistic forecasting estimate. All fits have a coefficient of determination $R^2 >$ [...].

Provided reasonably accurate data, our model successfully reproduces the evolution of COVID-19 in different spots worldwide for which a sufficient amount of data is available. In this section we present the results obtained using the SEIQRDP model to estimate the active cases evolution in Italy, Spain, Germany and South Korea. For those countries we use publicly available confirmed cases, recoveries and deaths numbers from online raw data sets [26, 27]. We pick training data starting from the date for which all confirmed cases, recoveries and deaths numbers take non-zero values, to avoid computational bugs and optimize parameter fitting. The active cases curve is then reproduced for 6 months following that date. In order to highlight the efficiency of the model we use only an early part of the official data to train the model instead of all the available data. We have used 30 days of training data for Italy, 45 days for Spain, 60 days for Germany and 90 days for South Korea. Fig. 1 shows that the results obtained using SEIQRDP are in very good accordance with official statistics for the number of active cases in those countries. All the fits have a coefficient of determination $R^2 >$ [...] $S_N$ to evaluate the goodness of the fits.
We recall that, in order to avoid overfitting the training data and obtain the best forecast, we look for the optimum fit corresponding to $G_{opt}$ rather than the best one. For Italy, the model is able to reproduce to a good approximation ($S_N = 17\%$) the active cases curve with only 30 days of training data. Fig. 1 reveals that the larger the training data sample, the lower the optimum number of generations $G_{opt}$ used by the genetic algorithm to fit the data, and the lower $S_N$. The active cases curve is one of the most pertinent in our opinion, as it reflects the amplitude of the epidemic outbreak as well as the efficiency of the measures applied to control it. Moreover, the epidemic will end only if all the active cases are closed. Germany and South Korea are very special cases and need a deeper analysis that is beyond the scope of this paper, but one can clearly observe the quicker decrease of their active cases curves after the epidemic peak time, due to particular strategies to control the epidemic.

Figure 2: Estimation of key epidemic parameters during the early stage of the COVID-19 outbreak in Algeria (Feb. 25th - May 24th). Intermediate values are calculated for five different time periods corresponding to specific phases of the virus propagation, with specific circumstances and severity of the authorities' measures. The intermediate values of each parameter are compared to its mean value over the whole three-month period (dashed red line). Protection rate, transmission rate, latent time and infectious time are estimated using the SEIQRDP model, while recovery and fatality rates are calculated from official data. Error bars represent 95% confidence intervals of the model estimate (color online).
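The procedure described under "Computation of the optimum generation number" (train on the first $n - v$ days, score the best genome of every generation on the last $v$ days, keep the generation with the best validation score) can be sketched as below. This is a hedged illustration, not the authors' script: the toy exponential model, the noise level, the population settings and all function names are assumptions, and a two-parameter toy model will not necessarily show the overfitting behaviour expected for the full SEIQRDP fit.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Genome = List[float]
Cost = Callable[[Genome], float]


def split_series(series: Sequence[float],
                 v: int) -> Tuple[Sequence[float], Sequence[float]]:
    """Training subset = first n - v days, validation subset = last v days."""
    return series[:-v], series[-v:]


def make_cost(days: Sequence[int], values: Sequence[float]) -> Cost:
    """Normalized least squares of a toy model a * exp(b * t) on given days."""
    scale = sum(x * x for x in values)

    def cost(genome: Genome) -> float:
        a, b = genome
        return sum((a * math.exp(b * t) - x) ** 2
                   for t, x in zip(days, values)) / scale

    return cost


def optimum_generation(train_cost: Cost, valid_cost: Cost, guess: Genome,
                       max_generations: int, seed: int, pop_size: int = 30,
                       n_parents: int = 8) -> Tuple[int, float]:
    """Return (G_opt, validation score) for one GA run on the training set."""
    rng = random.Random(seed)
    population = [[g * (1 + rng.uniform(-0.5, 0.5)) for g in guess]
                  for _ in range(pop_size)]
    g_opt, best_valid = 0, float("inf")
    for generation in range(1, max_generations + 1):
        # Selection and breeding use the training subset only.
        parents = sorted(population, key=train_cost)[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            mother, father = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(mother, father)]
            k = rng.randrange(len(child))
            child[k] *= 1 + rng.uniform(-0.1, 0.1)
            children.append(child)
        population = parents + children
        # The best training genome is then scored on the validation subset.
        champion = min(population, key=train_cost)
        score = valid_cost(champion)
        if score < best_valid:
            g_opt, best_valid = generation, score
    return g_opt, best_valid


# Hypothetical noisy series of n = 40 days with v = 10 (ratio v/n = 1/4).
noise = random.Random(0)
n, v = 40, 10
series = [5.0 * math.exp(0.1 * t) * (1 + noise.gauss(0, 0.05))
          for t in range(n)]
train, valid = split_series(series, v)
train_cost = make_cost(range(n - v), train)
valid_cost = make_cost(range(n - v, n), valid)
g_opt, valid_error = optimum_generation(train_cost, valid_cost,
                                        [4.0, 0.12], 30, seed=1)
print(g_opt, valid_error)
```

In a predictive run, the GA would then be restarted on the whole data set and stopped after `g_opt` generations, as the text describes.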

3. Model estimation for Algeria

For the study of COVID-19 dynamics in Algeria, we use official public data provided by the AHM [28, 29]. The specificity of our analysis, which we believe makes its forecast results for Algeria more accurate than those of other studies in which Algeria is presented as an example [6], is that instead of relying on official numbers of RT-PCR-confirmed SARS-CoV-2 infection cases, which are strongly affected by limited test capacities, we deduce the effective number of confirmed infections per day by considering the number of patients admitted to hospital. This number is considered as the effective number of active cases in our study. The effective confirmed cases number for a given date is then deduced by adding computed tomography (CT) scan confirmed cases to the official RT-PCR confirmed cases (see Appendix A).

Besides the more exciting forecast use of the SEIQRDP model, the model is particularly efficient for nowcast estimations. Indeed, fitting the official data allows us to estimate key epidemic parameters of the early stage spread of COVID-19 in Algeria. Even though hundreds of studies are estimating those parameters for COVID-19 in different spots of the world, a local estimation is of major importance as their values are strongly related to local population discipline, public health capacities and severity of local containment measures at the very beginning and during the epidemic period.

During one month after the first confirmed case of COVID-19 in Algeria on Feb. 25th, the disease underwent a practically free propagation phase. On March 12th universities, schools and nurseries were closed. On March 19th, all trips between Algeria and European countries were canceled by the Algerian authorities, who decided the first strong containment measures against COVID-19 spread on March 24th. A total lockdown of Blida province and partial lockdown in many other provinces were applied. Coffee shops, restaurants and all non-essential shops were closed, public transportation suspended and gatherings of more than two persons forbidden. On April 24th, the authorities decided a partial release of lockdown measures in Blida and other provinces and allowed many commercial activities to resume. This date coincided with the start of the holy month of Ramadan, resulting in a sharp increase of social and commercial activities. Due to low respect of physical distancing and protection measures the number of new confirmed cases increased significantly, and shops have been closed again in many provinces since May 7th. In the light of this chronology of measures we have estimated intermediate mean values of the epidemic parameters during the free-propagation phase (Feb. 25th - Mar. 25th) and then for every two-week intermediate period until May 24th. Those intermediate mean values, shown in Fig. 2, provide valuable information revealing the evolution of the epidemic in Algeria during its first three months and the impact of the applied control measures.

Figure 3: SEIQRDP model forecast. (a) Infected cases: number of exposed (infected not yet infectious), infectious (asymptomatic infectious) and active (quarantined) cases. The figure shows the epidemic peak time, corresponding to the maximum of active cases, to be in the period May 20th - May 30th with roughly ten thousand active cases. (b) Cumulative numbers: total quarantined, recoveries and deaths. Real data are represented with the red, green and blue dashed lines respectively. (c) Reproduction number since Feb. 25th: time-dependent reproduction number. The $R_t = 1$ point is in perfect accordance with the exposed, infected and active cases inflection points. (d) Incidence of the disease: number of newly infected individuals per day. Light shadings represent 95% confidence intervals of the model estimate (color online).

In order to forecast the evolution of COVID-19 in Algeria we apply the SEIQRDP model with a training data period from Feb. 25th to May 24th. The cross-validation script is applied to the first 70 days of the data set and tested on the 20 remaining days to calculate the optimum number of generations. For the chosen set of data we obtain $G_{opt} = 20$.
Then, the genetic algorithm and the rest of the SEIQRDP set of programs are applied to the whole training data to calculate the optimum fit and reproduce the SEIQRDP variable curves using the fit parameters obtained. We present in this paper a forecast of COVID-19 outbreak dynamics until the end of September 2020, the time at which the reopening of schools and universities is scheduled. That step would represent a crucial period in the disease evolution and will require a specific analysis in due course.

| Parameter | Definition | Value for Algeria (95% CI) | Value for Wuhan (95% CI) | Reference (Wuhan) |
|---|---|---|---|---|
| $\alpha$ | Protection rate (mean) | 0.015 (0.014-0.017) | 0.085 | Peng et al. [12] |
| $\beta$ | Transmission rate (mean) | 0.64 (0.62-0.66) | 0.99 | Peng et al. [12] |
| $\gamma^{-1}$ | Latent time (mean) | 2.7 (2.6-2.8) | 2 | Peng et al. [12] |
| $\delta^{-1}$ | Infectious time (mean) | 5.9 (5.7-6.1) | 7.4 | Peng et al. [12] |
| $R_0$ | Basic reproduction number | 3.78 (3.033-4.53) | 6.47 (5.71-7.23) | Tang et al. [14] |

Table 1: Summary of SEIQRDP parameter estimates for COVID-19 in Algeria compared to Wuhan (China). $R_0$ is estimated on Feb. 25th while the mean values of the other parameters are calculated for the first three months of the outbreak in Algeria.
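As a quick consistency check on Table 1, the relation $R_0 = \beta\delta^{-1}$ from Section 2 can be applied directly to the tabulated mean values. This is a worked illustration only: the text states that $R_0$ is estimated on Feb. 25th, so the agreement below is indicative and is not presented as the way the reported value was obtained.

$$
R_0 = \beta\,\delta^{-1} \approx 0.64 \times 5.9 \approx 3.78,
\qquad
\gamma^{-1} + \delta^{-1} \approx 2.7 + 5.9 = 8.6\ \text{days}.
$$

The first result agrees with the basic reproduction number in the last row of the table; the second is the sum of the mean latent and infectious times.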

4. Results and Discussion

Our model estimates that on Feb. 25th, in addition to the first confirmed SARS-CoV-2 infected case in Algeria, at least 7 other individuals had been infected without showing any symptoms. On March 2nd, when the first two cases were confirmed at Blida, we estimate that the number of asymptomatic infectious people had already reached 10 individuals and at least 10 others were in a latent period. One week later the number of asymptomatic infected persons had already exceeded 70 according to our estimations. Officially, 20 of them had been confirmed at that time.

The model estimates of the epidemic parameters for the first three months of COVID-19 in Algeria are in good agreement with the on-the-ground evolution of the outbreak. The estimated basic reproduction number on Feb. 25th is $R_0 = 3.78$ (95% CI 3.033-4.53), while the value of $R_t$ on May 24th is estimated at 0.651 (95% CI 0.539-0.761) and the mean effective reproduction number during the first three months of the epidemic is evaluated at 1.74 (95% CI 1.55-1.92). The notable decline in $R_t$ during this period might reflect outbreak control efforts and growing awareness of SARS-CoV-2. By the same token, we distinguish a significant rise of the protection rate $\alpha$ after the first control measures on March 24th, jumping from 0.0041 during the free propagation phase before March 25th to 0.0089 in the period March 26th - April 10th and doubling again to 0.021 in the next period (see Fig. 2, upper-left corner). Interestingly, the protection rate curve reflects the release of containment and a lower respect of protection measures in the period between April 27th and May 12th, resulting in a decline of $\alpha$ during the next period. The mean value of the protection rate over the whole study period is estimated at 0.015 (95% CI 0.014-0.017). The increase of the transmission rate shown in Fig. 2 (lower-left corner) is reasonable given the continuous propagation of the virus and the appearance of many clusters in densely populated provinces. In addition, the low number of daily tests and the relatively long test-to-result time of the testing technology used increase the probability that an asymptomatic infectious individual spreads the virus before being quarantined. The mean value of the transmission rate is estimated at 0.64 (95% CI 0.62-0.66). The mean latent time is evaluated at 2.7 (95% CI 2.6-2.8) days and the mean infectious time is predicted to be 5.9 (95% CI 5.7-6.1) days. The incubation time (latent time + infectiousness time) has a mean value of 8.6 (95% CI 8.3-8.9) days. One remarkable point that can be observed in the middle panel of Fig. 2 is that, apart from the first period, the incubation time remains relatively stable, taking values within the range [7.9-8.6] days.
This reflects the fact that the model effectively calibrates the global features of the evolution of the hidden variables representing the exposed $E(t)$ and infectious $I(t)$ portions of the population, which are not measurable. The decrease of the incubation time after the first period of the study might be a consequence of a better detection scheme. In fact, a high diagnosis capacity allowing a large-scale testing strategy and efficient tracking are essential tools to reduce the onset-to-quarantine (incubation) period, since early and quick detection of infectious individuals enables authorities to quarantine them before they show symptoms, hence limiting the number of their contacts. Moreover, this will help diminish the effective reproduction number $R_t$ and thus better control the disease spread.

In contrast to other epidemic parameters, the recovery and fatality rates shown in the right panel of Fig. 2 are directly calculated from official data. The recovery rate varies in the range [1.1% - 2.7%] with a mean value of 1.9%, and the fatality rate, initially estimated as the highest in the world at the time, has fallen below 0.5% since mid-April, with a mean value estimated at 1.02%. The significant decrease of the fatality rate, even though affected by the growing test capacities after the number of RT-PCR daily tests was increased and the CT-scan diagnostic of COVID-19 adopted at the beginning of April, could also be interpreted as the consequence of better medical care. The fatality rate seems to stabilize during the last month of the study (0.071% on April 27th - May 12th and 0.058% on May 13th - May 24th) as newly deployed RT-PCR test capacities are again reaching their limits. The epidemic analysis of the parameters' values is beyond the scope of this paper as it requires information to which we do not have access. Nevertheless, we notice that the key parameter values obtained through our model for Algeria fall within the value ranges estimated for the Chinese city of Wuhan, where SARS-CoV-2 first appeared [12, 14, 20], as shown in Table 1.

Figure 4: Case to Fatality Rate (CFR) and Infection to Fatality Rate (IFR) for the COVID-19 outbreak in Algeria between Feb. 25th and May 24th. Light red shading represents 95% confidence intervals of the model estimate (color online).

The forecast simulations (Fig. 3), based on the available official data, estimate that the infection peak time corresponding to the maximum incidence occurred on April 24th-26th with 387 (95% CI 267-509) new infections per day, as shown in Fig. 3d. The effective reproduction number continuously decreased, reflecting a better control of the disease spread, and crossed the line $R_t = 1$ by May 1st (see Fig. 3c). At that crucial point the disease entered the attenuation phase. The SEIQRDP model evaluates the active cases peak time for the COVID-19 outbreak in Algeria, corresponding to the active cases maximum, to be in the period between May 20th and May 30th with 9794 (95% CI 8770-1024) active cases (see Fig. 3a).

We estimate that the number of new infections will vanish by mid-September. At that time the number of active quarantined cases will still be above 500. Assuming that the epidemic will remain ongoing as long as all active cases have not been closed, the model predicts the outbreak to end no earlier than October 2020, with an estimated total of 24021 (95% CI 20768-27274) quarantined individuals, 15291 (95% CI 13272-17310) recovered and 8172 (95% CI 7093-9251) deaths, as shown in Fig. 3b. Notice that the provisional total number of deaths appears to be particularly overestimated compared to the official number (blue dashed line). A solution to this technical issue is under investigation.
We emphasize that the numbers we present in this forecast are only estimations that could be seriously affected by the behaviour of the population and by any measures taken by the authorities during the period of the epidemic. An abrupt release of containment could result in a reversal of the curves as long as no vaccine has been developed for large-scale use.

An important piece of information that can be extracted from the official public data is the Case Fatality Rate (CFR), corresponding to the ratio of deaths to effective confirmed cases. The Infection Fatality Rate (IFR), often confused with the CFR, is the ratio of deaths to infected cases, including asymptomatic cases which are non-measurable. For that reason we calculate the CFR based on official data, while the IFR is calculated as the ratio of the official cumulative deaths to the cumulative number of infected individuals obtained from the SEIQRDP model (see Fig. 4). The mean CFR over the period Feb. 25th - May 24th is estimated at 5.3%, while the mean value of the IFR over the same period is 2.9% (95% CI 1.7%-3.9%). Notice that the mean IFR for the first three months of the outbreak in Algeria is higher than the global value estimated at 1.4% by a recent study using cumulative COVID-19 data from 139 countries [30].

It is worth noting that compartmental models, including the SEIQRDP model, are reliable only when certain conditions on the studied population are assumed. Indeed, the SEIQRDP model requires a well-mixed and homogeneous population. A well-mixed population means that all individuals in the population have the same chance of being infected by an infectious one. Homogeneity means that all individuals behave alike toward the disease and thus are governed by the same transition probabilities between the different population compartments. As a consequence, all calibrated parameters in this study should be seen as a statistical average over the population.
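The CFR and IFR definitions given in this section can be illustrated with a minimal sketch. The function name `fatality_ratios` and the input numbers are hypothetical and chosen only for illustration; they are not the Algerian data, and the model-based infected series is assumed to be supplied by whatever SEIQRDP fit is available.

```python
import math

def fatality_ratios(cum_deaths, cum_confirmed, cum_infected_model):
    """Return daily (CFR, IFR) series as fractions.

    CFR = cumulative deaths / cumulative confirmed cases (official data).
    IFR = cumulative deaths / cumulative infected individuals (model estimate,
          which also counts non-measurable asymptomatic cases).
    Days with a zero denominator yield NaN.
    """
    cfr = [d / c if c > 0 else math.nan
           for d, c in zip(cum_deaths, cum_confirmed)]
    ifr = [d / i if i > 0 else math.nan
           for d, i in zip(cum_deaths, cum_infected_model)]
    return cfr, ifr

# Hypothetical cumulative series, for illustration only.
deaths = [1, 2, 4]
confirmed = [20, 50, 80]
infected = [40, 100, 200]   # assumed output of a fitted model

cfr, ifr = fatality_ratios(deaths, confirmed, infected)
mean_cfr = sum(cfr) / len(cfr)
mean_ifr = sum(ifr) / len(ifr)
```

Because the model-based infected count includes undetected cases, it is at least as large as the confirmed count in this example, so the IFR comes out below the CFR, which is the ordering reported above.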
Moreover, the SEIQRDP model is fundamentally not additive, i.e. the sum of different SEIQRDP models applied to different provinces of a given country is not necessarily equivalent to the SEIQRDP model applied to the whole country. Given all the previous considerations, it would be very interesting to apply our study separately to the major infected cities of the country.
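The non-additivity remark above can be illustrated numerically. The sketch below is not the SEIQRDP model: it uses a plain SIR model as a stand-in, with hypothetical region sizes and rates. The merged model's contact rate is taken as the mean of the two regional rates, which is an assumption made only for this illustration.

```python
def sir_final_size(population, beta, gamma, infected0=1.0, days=1000, dt=0.1):
    """Integrate a basic SIR model with explicit Euler steps.

    Returns the number of individuals ever infected (I + R at the end).
    """
    s, i, r = population - infected0, infected0, 0.0
    for _ in range(round(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return i + r

GAMMA = 0.2  # hypothetical recovery rate (1/day)

# Two hypothetical regions with different contact rates, modelled separately.
region_a = sir_final_size(1000, beta=0.5, gamma=GAMMA)  # R0 = 2.5
region_b = sir_final_size(1000, beta=0.1, gamma=GAMMA)  # R0 = 0.5
total_separate = region_a + region_b

# One model for the merged population, using the mean contact rate (assumption).
total_merged = sir_final_size(2000, beta=0.3, gamma=GAMMA, infected0=2.0)  # R0 = 1.5

print(f"sum of regional models: {total_separate:.0f}")
print(f"single merged model:    {total_merged:.0f}")
```

In this toy setting the merged model predicts a noticeably larger outbreak than the sum of the regional models, because averaging the contact rate hides the fact that one region never sustains transmission.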

5. Conclusion

In this paper we have presented an enhanced compartmental SEIQRDP model for epidemics in which a protection rate has been introduced and noteworthy compartments of quarantined and protected population have been added compared to the most widely used SEIR models. Our approach is based on a genetic fitting algorithm and makes use of a cross-validation method to overcome the overfitting problem. We have designed a generic open-source package containing all the computational tools used in our analysis [25]. Based on official cumulative recoveries, cumulative deaths and deduced effective cumulative confirmed cases, this model allowed us to estimate key epidemic parameters and to forecast the time evolution of the effective reproduction number of the disease, in order to evaluate the epidemic situation and the effect of the control measures applied. Using our SEIQRDP model, we have been able to provide a valuable approximate estimation of the daily evolution of the non-measurable asymptomatic exposed and infectious cases, in addition to the daily active cases, from the beginning until a very advanced stage of the COVID-19 outbreak in Algeria. We have estimated the periods in which these numbers will be at their peak and approximated the maximum values they could reach. We have also estimated the time at which the number of new infections will vanish. Even though the SEIQRDP model we presented, like many SEIR derivatives, is effective in different contexts, we are still studying the Algerian case carefully, because the reported COVID-19 epidemic evolution in Algeria quickly reached the country's maximum diagnosis capacity, which is well reflected in the linear form of the official confirmed cases data. Moreover, we should note that the SEIQRDP model is fundamentally designed to simulate a well-mixed closed population and that it is very sensitive to data accuracy. In this context we stress the fact that our estimations depend strongly on the data publicly available at the time this study was carried out. In addition, the epidemic evolution could be significantly affected by future containment or release measures and then deviate from the forecast we presented. Furthermore, the scenario in which the epidemic vanishes after a first peak without a secondary wave is one among many others and not the most probable.

We are investigating many possibilities to optimize our model to fit the COVID-19 evolution in Algeria and elsewhere with more ingenious methods. Additionally, a completely different agent-based epidemic model is already at an advanced development stage and will be used to tackle the virus spread from a different perspective.

We hope this study can serve as a useful guideline for Algerian scientists and the Algerian government and contribute efficiently to the fight against the COVID-19 pandemic on a national and international scale.

CRediT authorship contribution statement

M. T. R.: Conceptualization, Methodology, Software, Formal analysis, Writing - Original Draft, Supervision.

A. T.: Conceptualization, Methodology, Software, Formal analysis, Writing - Review & Editing.

N. E. B.: Conceptualization, Methodology, Software, Formal analysis, Writing - Review & Editing.

Conflict of interest

The authors declare no conflict of interest.

Acknowledgment

This study presents results of a curiosity-driven research, which has been achieved only through the personal resources of the authors.

References

[1] Algerian-Health-Ministry Situation report of 25/02/2020. URL http://covid19.sante.gov.dz/fr/2020/02/25/25-fevrier-2020/
[2] Algerian-Health-Ministry Situation report of 29/02/2020. URL http://covid19.sante.gov.dz/fr/2020/03/03/point-de-situation-au-29-02-2020/
[3] Algerian-Health-Ministry Situation report of 02/03/2020. URL http://covid19.sante.gov.dz/fr/2020/03/03/point-de-situation-au-02-03-2020-2/
[4] Lounis M 2020 European Journal of Medical and Educational Technologies
[5] Hamidouche M Bull World Health Organ. [Preprint] URL http://dx.doi.org/10.2471/BLT.20.256065
[6] Zhao Z, Li X, Liu F, Zhu G, Ma C and Wang L 2020 Science of The Total Environment
[7] Luo J Data-Driven Innovation Lab URL https://ddi.sutd.edu.sg/
[8] Anderson M and May R M 1992 Infectious Diseases of Humans, Dynamics and Control (OXFORD University Press) ISBN 9780198540403
[9] Meloni S, Perra N, Arenas A, Gómez S, Moreno Y and Vespignani A 2011 Scientific Reports 62 URL https://doi.org/10.1038/srep00062
[10] Chowell G, Nishiura H and Bettencourt L M 2007 Journal of The Royal Society Interface URL https://royalsocietypublishing.org/doi/abs/10.1098/rsif.2006.0161
[11] Wu J T, Leung K and Leung G M 2020 The Lancet URL https://doi.org/10.1016/S0140-6736(20)30260-9
[12] Peng L, Yang W, Zhang D, Zhuge C and Hong L 2020 medRxiv
[13] Labadin J and Hong B H 2020 medRxiv
[14] Tang B, Wang X, Li Q, Bragazzi N L, Tang S, Xiao Y and Wu J 2020 Journal of Clinical Medicine 462 ISSN 2077-0383 URL http://dx.doi.org/10.3390/jcm9020462
[15] Ma Z, Zhou Y and Wu J 2009 Modeling and Dynamics of Infectious Diseases (CO-PUBLISHED WITH HIGHER EDUCATION PRESS)
[16] Diekmann O; Heesterbeek H B T 2013 Mathematical tools for understanding infectious diseases Princeton series in theoretical and computational biology (Princeton University Press) ISBN 9780691155395,0691155399 URL http://gen.lib.rus.ec/book/index.php?md5=a5ac7b120c717866af712961deca8bf3
[17] Hannah L 2015 International Encyclopedia of the Social & Behavioral Sciences
[18] Giudici M, Comunian A and Gaburro R 2020 Inversion of a sir-based model: a critical analysis about the application to covid-19 epidemic (Preprint)
[19] Chambers L 1995 Practical Handbook of Genetic Algorithms, Applications, Volume 1 (CRC-Press, Inc.) ISBN 0849325196
[20] Lin Q, Zhao S, Gao D, Lou Y, Yang S, Musa S S, Wang M H, Cai Y, Wang W, Yang L and He D 2020 International Journal of Infectious Diseases URL https://doi.org/10.1016/j.ijid.2020.02.058
[21] Yang W, Zhang D, Peng L, Zhuge C and Hong L 2020 medRxiv
[22] Basu S and Andrews J 2013 PLOS Medicine URL https://doi.org/10.1371/journal.pmed.1001540
[23] Browne M W 2000 Journal of Mathematical Psychology 108-132 ISSN 0022-2496
[24] Guyon I 1997 A scaling law for the validation-set training-set size ratio
[25] URL https://github.com/Taha-Rouabah/COVID-19
[26] Data-Packaged-Core-Datasets URL https://raw.githubusercontent.com/datasets/covid-19/master/data/time-series-19-covid-combined.csv
[27] Ritchie H Our world in data URL https://ourworldindata.org/coronavirus-source-data
[28] Algerian-Health-Ministry-EADN Epidemic map covid-19 in algeria. URL http://covid19.sante.gov.dz/carte/
[29] CDTA Epidemic map covid-19 in algeria URL https://covid19.cdta.dz/dashboard/production/index.php
[30] Grewelle R and De Leo G 2020 medRxiv
[31] INSP Epidemic report 1, 18/04/2020, algeria

Appendix A. Effective confirmed cases

Fig. A1 shows the official number of patients admitted to hospital due to COVID-19 in Algeria (blue dashed line) and the official number of active cases (yellow dashed line), computed by subtracting the official numbers of recoveries and deaths from the

Figure A1: