[PDF] Epidemiological Forecasting with Model Reduction of Compartmental Models. Application to the COVID-19 pandemic

Abstract

We propose a forecasting method for predicting epidemiological health series on a two-week horizon at the regional and interregional resolution. The approach is based on model order reduction of parametric compartmental models, and is designed to accommodate small amount of sanitary data. The efficiency of the method is shown in the case of the prediction of the number of infected and removed people during the two pandemic waves of COVID-19 in France, which have taken place approximately between February and November 2020. Numerical results illustrate the promising potential of the approach.

Full PDF

EEpidemiological short-term Forecasting withModel Reduction of Parametric Compartmental Models.Application to the ﬁrst pandemic wave of COVID-19 in France.

Athmane Bakhta, Thomas Boiveau, Yvon Maday, Olga Mula

Abstract

We propose a forecasting method for predicting epidemiological health series on a two-week horizon at the regional and interregional resolution. The approach is based on modelorder reduction of parametric compartmental models, and is designed to accommodate smallamount of sanitary data. The eﬃciency of the method is shown in the case of the predictionof the number of hospitalized and infected and removed people during the ﬁrst pandemicwave of COVID-19 in France, which has taken place approximately between February andMay 2020. Numerical results illustrate the promising potential of the approach.

Providing reliable epidemiological forecasts during an ongoing pandemic is crucial to mitigatethe potentially disastrous consequences on global public health and economy. As the ongoingpandemic on COVID-19 sadly illustrates, this is a daunting task in the case of new diseasesdue to the incomplete knowledge of the behavior of the disease, and the heterogeneities anduncertainties in the health data count. Despite these diﬃculties, many forecasting strategies existand we could cast them into two main categories. The ﬁrst one is purely data-based, and involvesstatistical and learning methods such as time series analysis, multivariate linear regression, greyforecasting or neural networks [6, 1, 10, 24, 17]. The second approach uses epidemiologicalmodels which are appealing since they provide an interpretable insight of the mechanisms ofthe outbreak. They also provide high ﬂexibility in the level of detail to describe the evolutionof the pandemic, ranging from simple compartmental models that divide the population intoa few exclusive categories, to highly detailed descriptions involving numerous compartments oreven agent-based models (see, e.g., [20, 9, 12]). One salient drawback of using epidemiologicalmodels for forecasting purposes lies in the very high uncertainty in the estimation of the involvedparameters. This is due to the fact that most of the time the parameters cannot be inferredfrom real observations and the available data are insuﬃcient or too noisy to provide any reliableestimation. The situation is aggravated by the fact that the number of parameters can quicklybecome large even in moderately simple compartmental models [9]. As a result, forecastingwith these models involves making numerous a priori hypothesis which can sometimes be hardlyjustiﬁed by data observations.In this paper, our goal is to forecast the time-series of hospitalized, recovered and deadpatients with compartmental models that involve as few parameters as possible in order to inferthem solely from the data. The model involving the least number of parameters is probably theSIR model [15] which is based on a partition of the population into:• Uninfected people, called susceptible (S),1 a r X i v : . [ s t a t . M E ] N ov Infected and contagious people (I), with more or less marked symptoms,• People removed (R) from the infectious process, either because they are cured or unfortu-nately died after being infected.If N denotes the total population size that we assume to be constant over a certain time interval [0 , T ] , we have N = S ( t ) + I ( t ) + R ( t ) , ∀ t ∈ [0 , T ] , and the evolution from S to I and from I to R is given for all t ∈ [0 , T ] by dSdt ( t ) = − βI ( t ) S ( t ) NdIdt ( t ) = βI ( t ) S ( t ) N − γI ( t ) dRdt ( t ) = γI ( t ) . The SIR model has only two parameters:• γ > represents the recovery rate. In other words, its inverse γ − can be interpreted asthe length (in days) of the contagious period.• β > is the transmission rate of the disease. It essentially depends on two factors: thecontagiousness of the disease and the contact rate within the population. The larger thissecond parameter is, the faster the transition from susceptible to infectious will be. Asa consequence, the number of hospitalized patients may increase very fast, and may leadto a collapse of the health system [2]. Strong distancing measures like conﬁnement caneﬀectively act on this parameter [25, 11], helping to keep it low.Our forecasting strategy is motivated by the following observation: by allowing the parameters β and γ to be time-dependent, we can ﬁnd optimal coeﬃcients β ∗ ( t ) and γ ∗ ( t ) that exactly ﬁt anyseries of hospitalized and removed patients. In other words, we can perfectly ﬁt any observedhealth series with a SIR model having time-dependent coeﬃcients.As we explain later on in the paper, the high ﬁtting power stems from the fact that theparameters β and γ are searched in L ∞ ([0 , T ] , R + ) , the space of essentially bounded measurablefunctions. For our forecasting purposes, this space is however too large to give any predictivepower and we need to ﬁnd a smaller manifold that has simultaneously good ﬁtting and forecastingproperties. To this aim, we develop a method based on model order reduction. The idea is to ﬁnda space of reduced dimension that can host the dynamics of the current epidemic. This reducedspace is learnt from a series of detailed compartmental models based on precise underlyingmechanisms of the disease. One major diﬃculty in these models is to ﬁt the correct parameters.In our approach, we do not seek to estimate them. Instead, we consider a large range of possibleparameter conﬁgurations with a uniform sampling that allows us to simulate virtual epidemicscenarios on a longer range than the ﬁtting window [0 , T ] . We next cast each virtual epidemicsfrom the family of detailed compartmental models into the family of SIR models with time-dependent coeﬃcients in a sense that we explain later on. This procedure yields time-dependentparameters β and γ for each detailed virtual epidemics. The set of all such β (resp. γ ) is thencondensed into a reduced basis of small dimension. We ﬁnally use the available health data onthe time window [0 , T ] to ﬁnd the functions β and γ from the reduced space that ﬁt at best thecurrent epidemics over [0 , T ] . Since the reduced basis functions are deﬁned over a longer timerange [0 , T + τ ] with τ > (say, e.g., two weeks), the strategy automatically provides forecasts2rom T to T + τ . Its accuracy will be related to the pertinence of the mechanistic mathematicalmodels that have been used in the learning process.We note that an important feature of our approach is that all virtual simulations are con-sidered equally important on a ﬁrst stage and the procedure automatically learns what are thebest scenarios (or linear combinations of scenarios) to describe the available data. Moreover, theapproach even mixes diﬀerent models to accommodate these available data. This is in contrastto other existing approaches which introduce a strong a priori belief on the quality of a certainparticular model. Note also that we can add other models, related to other illness and use thelarge manifold to ﬁt a possible new epidemic.The paper is organized as follows. In Section 2, we present the forecasting method in the caseof one single region with constant population. For this, we brieﬂy introduce in Section 2.1 theepidemiological models involved in the procedure, namely the SIR model with time-dependentcoeﬃcients and more detailed compartmental models used for the training step. In Section2.2, after proving that SIR models with time-dependent coeﬃcients in L ∞ ([0 , T ]) have perfectﬁtting properties, we present the main steps of the forecasting method. The method involves acollapsing step from detailed models to SIR models with time-dependent coeﬃcients, and modelreduction techniques. We detail these points in Sections 2.3 and 2.4. In Section 3 we explain howthe method can easily be extended to a multi-regional context involving population mobility,and regional health data observations (provided, of course, that mobility data is available). Westart in Section 3.1 by clarifying that the nature of the mobility data will dictate the kind ofmulti-regional SIR model to use in this context. In Section 3.2 we outline how to adapt themain steps of the method to the multi-regional case. Finally, in Section 4, we present numericalresults for the period of the ﬁrst pandemic wave of COVID-19 in France, which has taken placeapproximately between February and May 2020. Concluding comments are given in Section 5followed by two appendices A and B that present details about the processing of the data noise,and the forecasting error. For the sake of clarity, we ﬁrst consider the case of one single region with constant populationand no population ﬂuxes with other regions. Here, the term region is generic and may be appliedto very diﬀerent geographical scales, ranging from a full country, to a department within thecountry, or even smaller partitions of a territory.

The ﬁnal output of our method is a mono-regional SIR model with time dependent coeﬃcients asexposed below. This SIR model with time-dependent coeﬃcients will be evaluated with reducedmodeling techniques involving a large family of models with ﬁner compartments proposed in theliterature. Before presenting the method in the next section, we introduce here all the modelsthat we use in this paper along with useful notations for the rest of the paper.

SIR models with time-dependent parameters:

We will ﬁt and forecast the series ofhospitalized and removed patients from hospitals (dead and recovered) with SIR models where3he coeﬃcients β and γ are time-dependent: dSdt ( t ) = − β ( t ) I ( t ) S ( t ) NdIdt ( t ) = β ( t ) I ( t ) S ( t ) N − γ ( t ) I ( t ) dRdt ( t ) = γ ( t ) I ( t ) . In the following, we use bold-faced letters for past-time quantities. For example, f := { f ( t ) : 0 ≤ t ≤ T } for any function f ∈ L ∞ ([0 , T ]) . Using this notation, for any given β and γ ∈ L ∞ ([0 , T ]) we denote by ( S , I , R ) = SIR ( β , γ , [0 , T ]) the solution of the associated SIR dynamics in [0 , T ] . Detailed compartmental models:

Models involving many compartments oﬀer a detaileddescription of epidemiological mechanisms at the expense of involving many parameters. Inour approach, we use them to generate virtual scenarios. One of the initial motivations behindthe present work is to provide forecasts for the COVID-19 pandemic, thus we have selected thetwo following models which are speciﬁc for this disease, but note that any other compartmentalmodel [16, 9, 20] or agent-based simulation could also be used .•

First model , SEI CHRD : This model is inspired from the one proposed in [9]. It involves11 diﬀerent compartments and a set of 19 parameters. The dynamics of the model isillustrated in Figure 1 and the system of equations reads dSdt ( t ) = − N S ( t ) ( β p I p ( t ) + β a I a ( t ) + β ps I ps ( t ) + β ms I ms ( t ) + β ss I ss ( t ) + β H H ( t ) + β C C ( t )) dEdt ( t ) = 1 N S ( t ) ( β p I p ( t ) + β a I a ( t ) + β ps I ps ( t ) + β ms I ms ( t ) + β ss I ss ( t ) + β H H ( t ) + β C C ( t )) − εE ( t ) dI p dt ( t ) = εE ( t ) − µ p I p ( t ) dI a dt ( t ) = p a µ p I p ( t ) − µI a ( t ) dI ps dt ( t ) = p ps (1 − p a ) µ p I p ( t ) − µI ps ( t ) dI ms dt ( t ) = p ms (1 − p a ) µ p I p ( t ) − µI ms ( t ) dI ss dt ( t ) = p ss (1 − p a ) µ p I p ( t ) − µI ss ( t ) dCdt ( t ) = p c µI ss ( t ) − ( λ C,R + λ C,D ) C ( t ) dHdt ( t ) = (1 − p c ) µI ss ( t ) − ( λ H,R + λ H,D ) H ( t ) dRdt ( t ) = λ C,R C ( t ) + λ H,R H ( t ) dDdt ( t ) = λ C,D C ( t ) + λ H,D H ( t ) The diﬀerent parameters involved in the model are described in Table 2 and detailed inthe appendix of [9]. 4 ps I ms I ss I a I p ES C DRH

Figure 1:

SEI CHRD modelCompartment Description S Susceptible E Exposed (non infectious) I p Infected and pre-symptomatic (already infectious) I a Infected and a-symptomatic (but infectious) I ps Infected and paucisymptomatic I ms Infected with mild symptoms I ss Infected with severe symptoms H Hospitalized C Intensive Care Unit R Removed D DeadTable 1: Description of the compartments in Model

SEI CHRD β p Relative infectiousness of I p β a Relative infectiousness of I a β ps Relative infectiousness of I ps β ms Relative infectiousness of I ms β ss Relative infectiousness of I ss β H Relative infectiousness of I H β C Relative infectiousness of I C ε − Latency period µ − p Duration of prodromal phase p a Probability of being asymptomatic µ − Infectious period of I a , I ps , I ms , I ss p ps If symptomatic, probability of being paucisymptomatic p ms If symptomatic, probability of developing mild symptoms p ss If symptomatic, probability of developing severe symptoms (note that p ps + p ms + p ss = 1 ) p C If severe symptoms, probability of going in C λ CR If in C, daily rate entering in Rλ CD If in C, daily rate entering in Dλ HR If hospitalized, daily rate entering in Rλ HD If hospitalized, daily rate entering in D Table 2: Description of the parameters involved in Model

SEI CHRD

We denote by u = SEI CHRD ( u (0) , β p ,β a , β ps , β ms , β ss , β H , β C ,ε, µ p , p a , µ, p ps , p ms , p ss , p C ,λ CR , λ CD , λ HR , λ HD , [0 , T ]) the parameter-to-solution map where u = ( S , E , I p , I a , I ps , I ms , I ss , C , H , R , D ) .• Second model , SE IUR : This model is a variant of the one proposed in [20]. It involves 5diﬀerent compartments and a set of 6 parameters. The dynamics of the model is illustratedin Figure 2 and the set of equations is dSdt ( t ) = − N βS ( t )( E ( t ) + U ( t ) + I ( t )) dE dt ( t ) = 1 N βS ( t )( E ( t ) + U ( t ) + I ( t )) − δE ( t ) dE dt ( t ) = δE ( t ) − σE ( t ) dIdt ( t ) = νσE ( t ) − γ I ( t ) dUdt ( t ) = (1 − ν ) σE ( t ) − γ U ( t ) dRdt ( t ) = γ I ( t ) + γ U ( t ) U RE E S Figure 2: SE IUR modelCompartment Description S Susceptible E Exposed (non infectious) E Infected and pre-symptomatic (already infectious) I Infected and symptomatic U Un-noticed R dead and removedTable 3: Description of the compartments in Model SE IUR

We denote by u = SE IUR ( u (0) , β, δ, σ, ν, γ , γ , [0 , T ]) the parameter-to-solution map where u = ( S , E1 , E2 , I , U , R ) . The diﬀerent parametersinvolved in the model are described in Table 4.• Generalization:

In the following, we abstract the procedure as follows. For a any

Detailed_Model with d compartments involving a vector µ ∈ R p of p parameters, wedenote by u ( µ ) = Detailed_Model ( µ, [0 , (cid:101) T ]) , u ( µ ) ∈ L ∞ ([0 , (cid:101) T ] , R d ) the parameter-to-solution map, where (cid:101) T is some given time simulation that can be as largeas wished because this is a virtual scenario. Note that, because the initial condition of theillness at time 0 is not known, we include in the parameter set the initial condition u (0) . We assume that we are given health data in a time window [0 , T ] , where T > is assumed to bethe present time. The observed data is the series of hospitalized people, denoted I obs , and peopleParameter Description β Relative infectiousness of I , U , E δ − Latency period σ − Duration of prodromal phase ν Proportion of I among I + Uγ If I , daily rate entering in Rγ If U , daily rate entering in R Table 4: Description of the parameters involved in Model SE IUR R obs . They are usually given at a national or a regional scale andon a daily basis. For our discussion, it will be useful to work with time-continuous functions and t → I obs ( t ) will denote the piecewise constant approximation in [0 , T ] from the given data (andsimilarly for R obs ( t ) ). Our goal is to give short-term forecasts of the series in a time window τ > whose size will be about two weeks. We denote by I ( t ) and R ( t ) the approximations tothe series I obs ( t ) and R obs ( t ) at any time t ∈ [0 , T ] .As already brought up, we propose to ﬁt the data and provide forecasts with SIR modelswith time-dependent parameters β and γ . The main motivation for using such a simple familyis because it possesses optimal ﬁtting and forecasting properties for our purposes in the sensethat we explain next. Deﬁning the cost function J ( β , γ | I obs ( t ) , R obs ( t ) , [0 , T ]) := (cid:90) T (cid:0) | I ( t ) − I obs ( t ) | + | R ( t ) − R obs ( t ) | (cid:1) d t (1)such that ( S , I , R ) = SIR ( β , γ , [0 , T ]) , the ﬁtting problem can be expressed at the continuous level as the optimal control problem ofﬁnding J ∗ = inf ( β , γ ) ∈ L ∞ ([0 ,T ]) × L ∞ ([0 ,T ]) J ( β , γ | I obs , R obs , [0 , T ]) . (2)The following result ensures the existence of a unique minimizer under very mild constraints. Proposition 2.1.

Let N ∈ N ∗ and T > . For any real-valued functions S obs , I obs , R obs of class C , deﬁned on [0 , T ] satisfying(i) S obs ( t ) + I obs ( t ) + R obs ( t ) = N for every t ∈ [0 , T ] ,(ii) S obs in nonincreasing on [0 , T ] ,(iii) R obs is nondeacreasing on [0 , T ] ,there exists a unique minimizer ( β ∗ obs , γ ∗ obs ) to problem (2.2) .Proof. One can set  β ∗ obs ( t ) := − NI obs ( t ) S obs ( t ) dS obs dt ( t ) γ ∗ obs ( t ) := 1 I obs ( t ) dR obs dt ( t ) (3)so that ( S obs , I obs , R obs ) = SIR ( β ∗ , γ ∗ , [0 , T ]) and J ( β ∗ obs , γ ∗ obs , [0 , T ]) = 0 which obviously implies that J ∗ = 0 .Note that one can equivalently set  β ∗ obs ( t ) := − NI obs ( t ) S obs ( t ) dS obs dt ( t ) γ ∗ obs ( t ) := 1 I obs ( t ) (cid:20) dI obs dt ( t ) − β ∗ obs ( t ) I obs ( t ) S obs ( t ) N (cid:21)

8r again  γ ∗ obs ( t ) := 1 I obs ( t ) dR obs dt ( t ) β ∗ obs ( t ) := NI obs ( t ) S obs ( t ) (cid:20) dI obs dt ( t ) − γ ∗ obs ( t ) I obs ( t ) (cid:21) This simple observation means that there exists a time-dependent SIR model which canperfectly ﬁt the data of any (epidemiological) scenario that satisﬁes properties (i), (ii) and(iii). In particular, we can perfectly ﬁt the series of sick people with a time-dependent SIRmodel (modulo a smoothing of the local oscillations due to noise). Since the health data areusually given on a daily basis, we can approximate β ∗ obs , γ ∗ obs by approximating the derivativesby classical ﬁnite diﬀerences in equation (2.2).The fact that we can build β ∗ obs and γ ∗ obs such that J ( β ∗ obs , γ ∗ obs ) = J ∗ = 0 implies thatthe family of time-dependent SIR models is rich enough not only to ﬁt the evolution of anyepidemiological series, but also to deliver perfect predictions of the health data. However, thisgreat approximation power comes at the cost of deﬁning the parameters β and γ in L ∞ ([0 , T ]) which is a space that is too large in order to be able to deﬁne any feasible prediction strategy.In order to pin down a smaller manifold where these parameters may vary without sacriﬁcingmuch on the ﬁtting and forecasting power, our strategy is the following:1. Learning phase:

The fundamental hypothesis of our approach is the conﬁdence thatthe specialists of epidemiology have well understood the mechanics of the propagation ofthe illness. The second hypothesis is that these accurate models suﬀer from the properdetermination of the parameters they contain. We thus propose to generate a large numberof virtual epidemics, following these mechanistic models with parameters that can bechosen in a vicinity of the suggested parameters values, including the diﬀerent initialconditions.(a)

Generate virtual scenarios using detailed models with constant coeﬃcients: • Deﬁne the notion of

Detailed_Model which is most appropriate for the epidemio-logical study. Several models could be considered. For our numerical application,the detailed models were deﬁned in Section 2.1.• Deﬁne an interval range

P ⊂ R p where the parameters µ of Detailed_Model willvary. We call the solution manifold U the set of virtual dynamics over [0 , T + τ ] ,namely U := { u ( µ ) = Detailed_Model ( µ, [0 , T + τ ]) : µ ∈ P} . • Draw a ﬁnite training set P tr = { µ , . . . , µ K } ⊆ P of K (cid:29) parameter instances and we compute u ( µ ) = Detailed_Model ( µ, [0 , T + τ ]) for µ ∈ P tr . Each u ( µ ) is a virtual epidemiological scenario. An importantdetail for our prediction purposes is that the simulations are done in [0 , T + τ ] ,that is, we simulate not only in the ﬁtting time interval but also in the predictiontime interval. We call U tr = { u ( µ ) : µ ∈ P tr } the training set of all virtual scenarios.9b) Collapse:

Collapse every detailed model u ( µ ) ∈ U tr into a SIR model following theideas which we explain in Section 2.3. For every u ( µ ) , the procedure gives time-dependent parameters β ( µ ) and γ ( µ ) and associated SIR solutions ( S , I , R )( µ ) in [0 , T + τ ] . This yields the sets B tr := { β ( µ ) : µ ∈ P tr } and G tr := { γ ( µ ) : µ ∈ P tr } . (4)(c) Compute reduced models:

We apply model reduction techniques using B tr and G tr as training sets in order tobuild two basis B n = span { b , . . . , b n } , G n = span { g , . . . , g n } ⊂ L ∞ ([0 , T + τ ] , R ) , which are deﬁned over [0 , T + τ ] . The space B n is such that it approximates well allfunctions β ( µ ) ∈ B tr (resp. all γ ( µ ) ∈ G tr can be well approximated by elements of G n ). We outline in Section 2.4 the methods we have explored in our numerical tests.2. Fitting on the reduced spaces:

We next solve the ﬁtting problem (2.2) in the interval [0 , T ] by searching β (resp. γ ) in B n (resp. in G n ) instead of in L ∞ ([0 , T ]) , that is, J ∗ (B n , G n ) = min ( β , γ ) ∈ B n × G n J ( β , γ | I obs , R obs , [0 , T ]) . (5)Note that the respective dimensions of B n and G n can be diﬀerent, for simplicity we takethem equal in the following. Obviously, since B n and G n ⊂ L ∞ ([0 , T ]) , we have that J ∗ ≤ J ∗ (B n , G n ) , but we numerically observe that the function n (cid:55)→ J ∗ (B n , G n ) decreases very rapidly as n increases, which indirectly conﬁrms the fact that the manifold generated by the two abovemodels accommodates well the COVID-19 epidemics.The solution of problem (2) gives us coeﬃcients ( c ∗ i ) ni =1 and (˜ c ∗ i ) ni =1 ∈ R n such that thetime-dependent parameters β ∗ n ( t ) = n (cid:88) i =1 c ∗ i b i ( t ) , ∀ t ∈ [0 , T + τ ] ,γ ∗ n ( t ) = n (cid:88) i =1 ˜ c ∗ i g i ( t ) . achieve the minimum (2).3. Forecast:

For a given dimension n of the reduced spaces, propagate in [0 , T + τ ] theassociated SIR model ( S ∗ n , I ∗ n , R ∗ n ) = SIR ( β ∗ n , γ ∗ n , [0 , T + τ ]) The values I ∗ n ( t ) and R ∗ n ( t ) for t ∈ [0 , T [ are by construction close to the observed data I obs , R obs (up to some numerical optimization error). The values I ∗ n ( t ) and R ∗ n ( t ) for t ∈ [ T, T + τ ] are then used for prediction.4. Forecast Combination/Aggregation of Experts (optional step):

By varying thedimension n and using diﬀerent model reduction approaches, we can easily produce acollection of diﬀerent forecasts and the question of how to select the best predictive model10rises. Alternatively, we can also resort to Forecast Combination techniques [22]: denoting ( I , R ) , . . . , ( I P , R P ) the diﬀerent forecasts, the idea is to search for an appropriate linearcombination I FC ( t ) = P (cid:88) p =1 w p I p ( t ) and similarly for R . Note that these combinations do not need to involve forecasts fromour methodology only. Other approaches like time series forecasts could also be included.One simple forecast combination is the average, in which all alternative forecasts are giventhe same weight w p = 1 /P, p = 1 , . . . P . More elaborate approaches consist in estimatingthe weights that minimize a loss function involving the forecast error.Before going into the details of some of the steps, two remarks are in order:1. To bring out the essential mechanisms, we have idealized some elements in the abovediscussion by omitting certain unavoidable discretization aspects. To start with, the ODEsolutions cannot be computed exactly but only up to some accuracy given by a numericalintegration scheme. In addition, the optimal control problems (2.2) and (2) are non-convex.As a result, in practice we can only ﬁnd a local minima. Note however that modern solversﬁnd solutions which are very satisfactory for all practical purposes. In addition, note thatsolving the control problem in a reduced space as in (2) could be interpreted as introducinga regularizing eﬀect with respect to the control problem (2.2) in the full L ∞ ([0 , T ]) space.It is to be expected that the search of global minimizers is facilitated in the reducedlandscape.2. routine-IR and routine- βγ : A variant for the ﬁtting problem (2) which we have studiedin our numerical experiments is to replace the cost function J ( β , γ | I obs , R obs , [0 , T ]) bythe cost function (cid:101) J ( β , γ | β ∗ obs , γ ∗ obs , [0 , T ]) := (cid:90) T (cid:0) | β − β ∗ obs | + | γ − γ ∗ obs ) | (cid:1) d t. (6)In other words, we use the variables β ∗ obs and γ ∗ obs from (2.2) as observed data insteadof working with the observed health series I obs , R obs . In Section 4, we will refer to thestandard ﬁtting method as routine-IR and to this variant as routine- βγ . Let u ( µ ) = Detailed_Model ( µ, [0 , T + τ ]) ∈ L ∞ ([0 , T + τ ] , R d ) be the solution in [0 , T + τ ] to a detailed model for a given vector of parameters µ ∈ R d . Here d ispossibly large ( d = 11 in the case of SEI CHRD model and d = 5 in the case of SE IUR ’s one). Thegoal of this section is to explain how to collapse the detailed dynamics u ( µ ) into a SIR dynamicswith time-dependent parameters. The procedure can be understood as a projection of a detaileddynamics into the family of dynamics given by

SIR models with time-dependent parameters.For the

SEI CHRD model, we collapse the variables by setting S col = S + EI col = I p + I a + I ps + I ms + I ss + C + HR col = R + D SE IUR model, we set S col = S + E i I col = E i + I i + U i R col = R Note that S col , I col and R col depend on µ but we have omitted the dependence to lighten thenotation.Once the collapsed variables are obtained, we identify the time-dependent parameters β and γ of the SIR model by solving the ﬁtting problem ( β , γ ) ∈ arg inf ( β , γ ) ∈ L ∞ ([0 ,T + τ ] , R ) × L ∞ ([0 ,T + τ ] , R ) J ( β , γ | I col , R col , [0 , T + τ ]) (7)where ( S , I , R ) = SIR ( β , γ , [0 , T + τ ]) . Note that problem (2.3) has the same structure as problem (2.2), the diﬀerence coming from thefact that the collapsed variables I col , R col in (2.3) play the role of the health data I obs , R obs in(2.2). Therefore, it follows from Proposition 2.1 that problem (2.3) has a very simple solutionsince it suﬃces to apply formula (2.2) for solving it. Note here that the exact derivatives of S col , I col and R col can be obtained from the Detailed_Model .Since the solution ( β , γ ) to (2.3) depends on the parameter µ of the detailed model, repeatingthe above procedure for every detailed scenario u ( µ ) for any µ ∈ P tr yields the two families oftime-dependent functions B tr = { β ( µ ) : µ ∈ P tr } and G tr = { γ ( µ ) : µ ∈ P tr } deﬁned on theinterval [0 , T + τ ] which were introduced in section (1b). Model Order Reduction is a family of methods aiming at approximating a set of solutions ofparametrized PDEs or ODEs (or related quantities) with linear spaces, which are called reducedmodels or reduced spaces. In our case, the sets to approximate are B = { β ( µ ) : µ ∈ P} and G = { γ ( µ ) : µ ∈ P} , where each µ is the vector of parameters of the detailed model which take values over P , and β ( µ ) and γ ( µ ) are the associated time-dependent coeﬃcients of the collapsed SIR evolution. Inthe following, we view B and G as subsets of L ([0 , T ]) , and we denote by (cid:107) · (cid:107) and (cid:104)· , ·(cid:105) its normand inner product. Indeed, in view of Proposition 2.1, B and G ⊂ L ∞ ([0 , T ]) ⊂ L ([0 , T ]) .Let us carry the discussion for B (the same will hold for G ). If we measure performancein terms of the worst error in the set B , the best possible performance that reduced models ofdimension n can achieve is given by the Kolmogorov n -width d n ( B ) L ([0 ,T ]) := inf Y ∈ L ([0 ,T ])dim( Y ) ≤ n max u ∈B || u − P Y u (cid:107) where P Y is the orthogonal projection onto the subspace Y . In the case of measuring errors inan average sense, the benchmark is given by δ n ( B , ν ) L ([0 ,T ]) := inf Y ∈ L ([0 ,T ])dim( Y )= n (cid:90) P || u ( y ) − P Y u ( y ) (cid:107) d ν ( y ) ν is some given measure on P .In practice, building spaces that meet these benchmarks is generally not possible. However,it is possible to build sequences of spaces for which their error decay is comparable to the onegiven by (cid:0) d n ( B ) L ([0 ,T ]) (cid:1) n or (cid:0) δ n ( B ) L ([0 ,T ]) (cid:1) n . As a result, when the Kolmogorov width decaysfast, the constructed reduced spaces will deliver a very good approximation of the set B withfew modes (see [5, 8, 7, 18]).We next present the reduced models that we have used in our numerical experiments. Othermethods could of course be considered and we refer to [23, 4, 14, 19] for general references onmodel reduction. We carry the discussion in a fully discrete setting in order to simplify thepresentation and keep it as close to the practical implementation as possible. All the claimsbelow could be written in a fully continuous sense at the expense of introducing additionalmathematical objects such as certain Hilbert-Schmidt operators to deﬁne the continuous versionof the Singular Value Decomposition (SVD).We build the reduced models using the two discrete training sets of functions B tr = { β i } Ki =1 and G tr = { γ i } Ki =1 from B and G where K denotes the number of virtual scenarios considered.The sets have been generated in step 1-(b) of our general pipeline (see Section 2.2). We considera discretization of the time interval [0 , T + τ ] into a set of Q ∈ N ∗ points as follows { t =0 , · · · , t P = T, · · · , t Q = T + τ } where P < Q . Thus, we can represent each function β i as avector of Q values β i = ( β i ( t ) , · · · , β i ( t Q )) T ∈ R Q + . and hence assemble all the functions of the family { β i } Ki =1 into a matrix M B ∈ R Q × K + . The sameremark applies for the family { γ i } Ki =1 that gives a matrix M G ∈ R Q × K + .1. SVD:

The eigenvalue decomposition of the correlation matrix M T B M B ∈ R K × K writes M T B M B = V Λ V T , where V = ( v i,j ) ∈ R K × K is an orthogonal matrix and Λ ∈ R K × K is a diagonal matrixwith non-negative entries which we denote λ i and order them in decreasing order. The (cid:96) ( R Q ) -orthogonal basis functions { b , . . . , b K } are then given by the linear combinations b i = K (cid:88) j =1 v j,i β j , ≤ i ≤ K. For n ≤ K , the space B n = span { b , . . . b n } is the best n -dimensional space to approximate the set { β i } Ki =1 in the average sense. Wehave δ n ( { β i } Ki =1 ) (cid:96) ( R Q +1 ) = (cid:32) K K (cid:88) i =1 || β i − P B n β i (cid:107) (cid:96) ( R Q +1 ) (cid:33) / = (cid:32) K (cid:88) i>n λ i (cid:33) / and the average approximation error is given by the sum of the tail of the eigenvalues.Therefore the SVD method is particularly eﬃcient if there is a fast decay of the eigenvalues,meaning that the set B tr = { β i } Ki =1 can be approximated by only few modes. However,note that by construction, this method does not ensure positivity in the sense that P B n β i ( t ) may become negative for some t ∈ [0 , T ] although the original function β i ( t ) ≥ for all t ∈ [0 , T ] . This is due to the fact that the vectors b i are not necessarily nonnegative.As we will see later, in our study, ensuring positivity especially for extrapolation (i.e.,forecasting) is particularly important, and motivates the next methods.13. Nonnegative Matrix Factorization (NMF, see [21, 13]): NMF is a variant of SVD in-volving nonnegative modes and expansion coeﬃcients. In this approach, we build a familyof non-negative functions { b i } ni =1 and we approximate each β i with a linear combination β NMF i = n (cid:88) j =1 a i,j b j , ≤ i ≤ K, (8)where for every ≤ i ≤ K and ≤ j ≤ n , the coeﬃcients a i,j ≥ and the basis function b j ≥ . In other words, we solve the following constrained optimization problem ( W ∗ , B ∗ ) ∈ arg min ( W,B ) ∈ R K × n + × R n × Q + (cid:107) M B − W B (cid:107) F . We refer to [13] for further details on the NMF and its numerical aspects.3.

Enlarged Nonnegative Greedy (ENG): greedy algorithm with projection onan extended cone of positive functions:

We next present a novel model reductionmethod, which is of interest in itself since it allows to build reduced models that preservepositivity and even other types of bounds. The method stems from the observation thatNMF approximates functions in the cone of positive functions of span { b i ≥ } ni =1 sinceit imposes that a i,j ≥ in the linear combination (2). However, note that the positivityof the linear combination is not equivalent to the positivity of the coeﬃcients a i,j sincethere are obviously linear combinations involving very small a i,j < for some j whichmay still deliver a nonnegative linear combination (cid:80) nj =1 a i,j b j . We can thus widen thecone of linear combinations yielding positive values by carefully including these negativecoeﬃcients a i,j . One possible strategy for this is proposed in Algorithm 1, which describesa routine that we call Enlarge_Cone . The routine { ψ , . . . , ψ n } = Enlarge_Cone [ { b , . . . , b n } , ε ] takes a set of nonnegative functions { b , . . . , b n } as input, and modiﬁes each function b i by iteratively adding negative linear combinations of the other basis functions b j for j (cid:54) = i (see line 11 of the routine). The coeﬃcients are chosen in an optimal way so asto maintain the positivity of the ﬁnal linear combination while minimizing the L ∞ -norm.The algorithm returns a set of functions ψ i = b i − (cid:88) j (cid:54) = i σ ij b j , ∀ i ∈ { , . . . , n } with σ ij ≥ . Note that the algorithm requires to set a tolerance parameter ε > for thecomputation of the σ ij .Once we have run Enlarge_Cone , the approximation of any function β is then sought as β ( EC ) = arg min c ,...,c n ≥ (cid:107) β − n (cid:88) j =1 c j ψ j (cid:107) L ([0 ,T + τ ]) We emphasize that the routine is valid for any set of nonnegative input functions. Wecan thus apply

Enlarge_Cone to the functions { b i ≥ } ni =1 from NMF, but also to thefunctions selected by a greedy algorithm such as the following:• For n = 1 , ﬁnd b = arg max β ∈{ β i } Ki =1 (cid:107) β (cid:107) L ([0 ,T + τ ])

14 At step n > , we have selected the set of functions { b , . . . , b n − } . We next ﬁnd b n = arg max β ∈{ β i } Ki =1 min c ,...,c n ≥ (cid:107) β − n − (cid:88) j =1 c j b j (cid:107) L ([0 ,T + τ ]) In our numerical tests, we call Enlarged Nonnegative Greedy (ENG) the routine involvingthe above greedy algorithm combined with our routine

Enlarge_Cone . Algorithm 1

Enlarge_Cone [ { b , . . . , b n } , ε ] → { ψ , . . . , ψ n } Input:

Set of nonnegative functions { b , . . . , b n } . Tolerance ε > . for i ∈ { , . . . , n } do Set σ ij = 0 , ∀ j (cid:54) = i. for (cid:96) ∈ { , . . . , n } do α i, ∗ (cid:96) = arg max { α ≥ (cid:2) b i − (cid:80) j (cid:54) = i σ ij b j − α b (cid:96) (cid:3) ( t ) > , ∀ t ∈ [0 , T + τ ] } σ i(cid:96) = σ i(cid:96) + α i, ∗ (cid:96) while α i, ∗ (cid:96) ≥ ε do α i, ∗ (cid:96) = arg max { α ≥ (cid:2) b i − (cid:80) j (cid:54) = i σ ij b j − α b (cid:96) (cid:3) ( t ) > , ∀ t ∈ [0 , T + τ ] } σ i(cid:96) = σ i(cid:96) + α i, ∗ (cid:96) end while end for ψ i = b i − (cid:80) j (cid:54) = i σ ij b j end forOutput: { ψ , . . . , ψ n } We remark that if we work with positive functions that are upper bounded by a constant

L > , we can ensure that the approximations, denoted as Ψ , and written as a linearcombination of basis functions, will also be between these bounds and L by deﬁning, onthe one hand and as we have just done, a cone of positive functions generated by the abovefamily { ψ i } i , and, on the other hand, by considering the base of the functions L − ϕ , ϕ being as above the set all greedy elements of the reduced basis to which we also apply anenlargement of these positive functions. We then impose that the approximation is writtenas a positive combination of the ﬁrst (positive) functions and that L − Ψ is also writtenas a combination with positive components in the second basis.In this frame, the approximation appears under the form of a least square approximationwith n linear constraints on the n coeﬃcients expressing the fact that in the two abovetransformed basis the coeﬃcients are nonnegative. Remark 2.2.

Reduced models on I = { I ( µ ) : µ ∈ P} and R = { R ( µ ) : µ ∈ P} Instead ofapplying model reduction to the sets B and G as we do in our approach, we could apply the sameabove techniques directly to the sets of solutions I and R of the SIR models with time-dependentcoeﬃcients in B and G . In this case, the resulting approximation would however not follow aSIR dynamics. The forecasting method of Section 2.2 for one single region can be extended to the treatment ofmultiple regions involving population mobility. The prediction scheme will now be based on a15ultiregional SIR with time-dependent coeﬃcients. Compared to other more detailed models,its main advantage is that it reduces drastically the number of parameters to be estimated.Indeed, detailed multiregional models such as multiregional extensions of the above

SEI CHRD and SE IUR models from Section 2.3 require a number of parameters which quickly grows withthe number P of regions involved. Their calibration thus requires large amounts of data which,in addition, may be unknown, very uncertain, or not available. In a forthcoming paper, we willapply the fully multi-regional setting for the post-lockdown period.The structure of this section is the same as the previous one for the case of a single region. Westart by introducing in Section 3.1 the multi-regional SIR model with time-dependent coeﬃcientsand associated detailed models. As any multiregional model, mobility data are required as aninput information, and the nature and level of detail of the available data motivates certainchoices on the modelling of the multiregional SIR (as well as the other detailed models). Wenext present in Section 3.2 the general pipeline, in which we emphasize the high modularity ofthe approach. In the spirit of ﬂuid ﬂow modeling, there are essentially two ways of describing mobility betweenregions:• In a Eulerian description, we take the regions as ﬁxed references for which we recordincoming and outgoing travels.• In a Lagrangian description, we follow the motion of people domiciled in a certain regionand record their travels in the territory. We can expect this modeling to be more informa-tive on the geographical spread of the disease but it comes at the cost of additional detailson the people’s domicile region.Note that both descriptions hold at any coarse or ﬁne geographical level in the sense that whatwe call the regions could be taken to be full countries, departments within a country, or verysmall geographical partitions of a territory. We next describe the multi-regional SIR modelswith the Eulerian and Lagrangian description of population ﬂuxes which will be the output ofour methodology.

Assume that we have P regions and the numberof people in region i is N i for i = 1 , . . . , P . Due to mobility, the population in each region varies,so N i depends on t . However, the total population is assumed to be constant and equal to N ,that is N = P (cid:88) i =1 N i ( t ) , ∀ t ≥ . For any t ≥ , let λ i → j ( t ) ∈ [0 , be the probability that people from i travel to j at time t . Inother words, λ i → j ( t ) N i ( t ) δt is the number of people from region i that have travelled to region j between time t and t + δt . Note that we have P (cid:88) j =1 λ i → j ( t ) = 1 , ∀ t ≥ . δt ≥ , N i ( t + δt ) = N i ( t ) − (cid:88) j (cid:54) = i λ i → j ( t ) N i ( t ) δt + (cid:88) j (cid:54) = i λ j → i ( t ) N j ( t ) δt dividing by δt and taking the limit δt → yields d N i d t ( t ) = − (cid:88) j (cid:54) = i λ i → j ( t ) N i ( t ) + (cid:88) j (cid:54) = i λ j → i ( t ) N j ( t ) . Note that we have P (cid:88) i =1 d N i d t ( t ) = 0 , ∀ t ≥ . Thus (cid:80) i N i ( t ) = (cid:80) i N i (0) = N , which is consistent with our assumption that the total popu-lation is constant.The time evolution of the N i is known in this case if we are given the λ i → j ( t ) from Eulerianmobility data. In addition to this mobility data, we also have the data of the evolution ofhospitalized, deceased and healed people and our goal is to ﬁt a multiregional SIR model thatis in accordance with this data. We propose the following model.Denoting S i , I i an R i the number of Susceptible, Infectious and Recovered people in region i at time t , we ﬁrst have the relation N i ( t ) = S i ( t ) + I i ( t ) + R i ( t ) ⇔ S i ( t ) N i ( t ) + I i ( t ) N i ( t ) + R i ( t ) N i ( t ) . Note that from the second relation, it follows that t S i N i + dd t I i N i + dd t R i N i . (9)To model the evolution between compartments, one possibility is the following SIR model dd t S i N i = −  β i λ i → i I i N i + (cid:88) j (cid:54) = i β j λ j → i I j N j  S i N i (10) dd t I i N i = − dd t S i N i − γ i I i N i dd t R i N i = γ i I i N i , The parameters β i , γ i , N i depend on t but we have omitted the dependence to ease the reading.Introducing the compartmental densities s i = S i N i , i i = I i N i , r i = R i N i , the system equivalently reads dd t s i = −  β i λ i → i i i + (cid:88) j (cid:54) = i β j λ j → i i j  s i (11) dd t i i = − dd t s i − γ i i i dd t r i = γ i i i , Before going further, some comments are in order:17 The model is consistent in the sense that it satisﬁes (3.1.1) and when P = 1 we recoverthe traditional SIR model.• Under lockdown measures, λ i → j ≈ δ i,j and the population N i ( t ) remains practically con-stant. As a result, the evolution of each region is decoupled from the others and eachregion can be addressed with the mono-regional approach.• The use of β j in equation (3.1.1) is debatable. When the people from region j arrive toregion i , it may be reasonable to assume that the contact rate is β i .• The use of λ j → i in equation (3.1.1) is also very debatable. The probability λ j → i wasoriginally deﬁned to account for the mobility of people from region j to region i withoutspecifying the compartment. However, in equation (3.1.1), we need the probability ofmobility of infectious people from region j to region i , which we denote by µ j → i in thefollowing. It seems reasonable to think that µ j → i may be smaller than λ j → i because assoon as people become symptomatic and suspect their illness, they will probably not move.Two possible options would be: – We could try to make a guess on µ j → i . If the symptoms arise, say, 2 days afterinfection and if we recover in 15 days in average, then we could say that µ j → i =2 / λ j → i . – Since, the above seems however pretty empirical, another option is to use λ j → i andabsorb the uncertainty in the values of the β j that one ﬁts. Lagrangian description of population ﬂux:

We call the above description Eulerian becausewe have ﬁxed the regions as a ﬁxed reference. Another point of view is to follow the trajectoriesof inhabitants of each region, in the same spirit as when we follow the trajectories of ﬂuidparticles.Let now S i , I i , R i the number of susceptible, infectious and recovered people who are domi-ciled in region i, i = 1 , . . . P . It is reasonable to assume that S i ( t ) + I i ( t ) + R i ( t ) is constant intime. However, all the dwellers of region i may not all be in that region at time t . Let λ ( i ) j → k ( t ) be the probability that susceptible people domiciled at i travel from region j to region k at time t . With this notation, λ ( i ) i → i ( t ) is the probability that susceptible people domiciled at i remain inregion i at time t . Similarly, let µ ( i ) j → k ( t ) be the probability that infectious people domiciled at i travel from region j to k at time t . Hence the total number of susceptible and infectious peoplethat are in region i at time t is S i ( t ) = P (cid:88) k =1 P (cid:88) j =1 (cid:16) λ ( k ) j → i ( t ) − λ ( k ) i → j ( t ) (cid:17) S k ( t ) I i ( t ) = P (cid:88) k =1 P (cid:88) j =1 (cid:16) µ ( k ) j → i ( t ) − µ ( k ) i → j ( t ) (cid:17) S k ( t ) We can thus write the evolution over S i , I i , R i as d S i d t = − P (cid:88) j =1 P (cid:88) k =1 β k ( t ) λ ( i ) j → k ( t ) S i ( t ) I k ( t ) (12) d I i d t = − d S i d t − γ i ( t ) I i ( t )d R i d t = γ i ( t ) I i ( t ) S i ( t ) + I i ( t ) + R i ( t ) is constant, which is consistent with the fact that in our model dd t ( S i + I i + R i ) = 0 . We emphasize that, to implement this model, one needs the Lagrangian mobility data λ ( i ) j → k forall ( i, j, k ) ∈ { , . . . , P } . Notation:

In the following, we gather the compartmental variables in vectors (cid:126) S := ( S ) Pi =1 , (cid:126) I := ( I ) Pi =1 , (cid:126) R := ( R ) Pi =1 as well as the time-dependent coeﬃcients (cid:126) β = ( β ) Pi =1 , (cid:126) γ = ( γ ) Pi =1 . For any (cid:126) β and (cid:126) γ ∈ ( L ∞ ([0 , T ])) P , we denote by ( (cid:126) S ,(cid:126) I , (cid:126) R ) = Multiregional_SIR ( (cid:126) S (0) ,(cid:126) I (0) , (cid:126) R (0) , (cid:126) β , (cid:126) γ , [0 , T ]) the output of any of the above multiregional SIR models. For simplicity in what follows, we willomit the initial condition in the notation. In the spirit of the multi-regional SIR, one can formulate detailed multi-regional versions of moredetailed models such as the ones introduced in Section 2.1. We omit the details for the sake ofbrevity.

Similarly as in the mono-regional case, we assume that we are given health data in [0 , T ] in allregions. The observed data in region i is the series of hospitalized people, denoted I obs i , andpeople sent home or dead, denoted R obs i . They are usually given at a national or a regional scaleand on a daily basis.We propose to ﬁt the data and provide forecasts with SIR models with time-dependentparameters β i and γ i for each region i . Like in the mono-regional case, we can prove that sucha simple family possesses optimal ﬁtting properties for our purposes. In the current case, thecost function reads J ( (cid:126) β , (cid:126) γ | (cid:126) I obs , (cid:126) R obs , [0 , T ]) := P (cid:88) i =1 (cid:90) T (cid:0) | I i ( t ) − I obs i ( t ) | + | R i ( t ) − R obs i ( t ) | (cid:1) d t such that (cid:16) (cid:126) S ,(cid:126) I , (cid:126) R (cid:17) = Multiregional_SIR (cid:16) (cid:126) β , (cid:126) γ , [0 , T ] (cid:17) , and the ﬁtting problem is the optimal control problem of ﬁnding J ∗ = inf (cid:126) β ,(cid:126) γ ∈ ( L ∞ ([0 ,T ])) P × ( L ∞ ([0 ,T ])) P J ( (cid:126) β , (cid:126) γ | (cid:126) I obs , (cid:126) R obs , [0 , T ]) . (13)The following proposition ensures the existence of a unique minimizer under certain conditions.To prove it, it will be useful to remark that any of the above multi-regional SIR models (see193.1.1), (3.1.1)) can be written in the general form d (cid:126) S d t = M(Λ( t ) , (cid:126) S ( t ) ,(cid:126) I ( t )) (cid:126) β d (cid:126) I d t = − d (cid:126) S d t − diag( I ( t )) (cid:126) γ d (cid:126) R d t = diag( I ( t )) (cid:126) γ , where, by a slight abuse of notation, the vectors (cid:126) S , (cid:126) I and (cid:126) R are densities of population in thecase of the Eulerian approach (see equation (3.1.1)). They are classical population numbers inthe case of the Lagrangian approach (see equation (3.1.1)). diag( I ( t )) is the P × P diagonalmatrix with diagonal entries given by the vector I ( t ) . M(Λ( t ) , (cid:126) S ( t ) ,(cid:126) I ( t )) is a matrix of size P × P which depends on the vectors of susceptible and infectious people (cid:126) S ( t ) , (cid:126) I ( t ) and on the mobilitymatrix Λ . In the case of the Eulerian description, Λ( t ) = ( λ i,j ( t )) ≤ i,j ≤ P and in the case of theLagrangian approach Λ( t ) is the P × P × P tensor Λ( t ) = ( λ ( i ) j,k ( t )) ≤ i,j,k ≤ P . For example, in thecase of the Eulerian model (3.1.1), the matrix M reads M(Λ( t ) , (cid:126) S ( t ) ,(cid:126) I ( t )) = − diag( (cid:126) S ( t ))Λ T diag( (cid:126) I ( t )) = − ( S i λ i → j I j ) ≤ i,j ≤ P (14) Proposition 3.1. If M(Λ( t ) , (cid:126) S ( t ) ,(cid:126) I ( t )) is invertible for all t ∈ [0 , T ] , then there exists a uniqueminimizer ( (cid:126) β ∗ , (cid:126) γ ∗ ) to problem (3.2) .Proof. Since we assume that

M(Λ( t ) , (cid:126) S ( t ) ,(cid:126) I ( t )) is invertible for every t ∈ [0 , T ] , we can set (cid:40) (cid:126) β ∗ ( t ) := M − ( t ) d (cid:126) S d t (cid:126) γ ∗ ( t ) := diag − ( I ( t )) d (cid:126) R d t or equivalently  (cid:126) β ∗ ( t ) := M − ( t ) d (cid:126) S d t (cid:126) γ ∗ ( t ) := − diag − ( I ( t )) (cid:16) d (cid:126) I d t + M(Λ( t ) , (cid:126) S ( t ) ,(cid:126) I ( t )) (cid:126) β ∗ (cid:17) so that ( (cid:126) S obs ,(cid:126) I obs , (cid:126) R obs ) = Multiregional_SIR (cid:16) (cid:126) β ∗ , (cid:126) γ ∗ , [0 , T ] (cid:17) and J ( (cid:126) β ∗ , (cid:126) γ ∗ | (cid:126) I obs , (cid:126) R obs , [0 , T ]) = 0 which implies that J ∗ = 0 .Before going further, let us comment on the invertibility of M(Λ( t ) , (cid:126) S ( t ) ,(cid:126) I ( t )) which is neces-sary in Proposition 3.1. A suﬃcient condition to ensure it is if the matrix is diagonally dominantrow-wise or column-wise. This yields certain conditions on the mobility matrix Λ( t ) with re-spect to the values of (cid:126) S ( t ) , (cid:126) I ( t ) . For example, if M is deﬁned as in equation (3.2), the matrix isdiagonally dominant per rows if for every ≤ i ≤ P , λ i → i > (cid:88) j (cid:54) = i λ i → j I j I i . Similarly, if for every ≤ j ≤ P , λ j → j > (cid:88) i (cid:54) = j λ i → j S i S j , λ i → i ≈ δ i,j .Now that we have exactly deﬁned the set up for the multi-regional case, we can follow thesame steps of Section 2.2 to derive forecasts involving model reduction for the time-dependentvariables (cid:126) β and (cid:126) γ . In this section we apply our forecasting method to the ﬁrst pandemic wave of COVID-19 inFrance which has taken place approximately between February and May 2020. In particular, weconsider the period going from March 5 to May 7, 2020. Due to the lockdown imposed betweenMarch 11 and May 11, inter-regional population mobility was drastically reduced. Studiesusing anonymized Facebook data have estimated the reduction in 80% (see [3]). As a result,it is reasonable to treat each region independently from the rest, and we apply the mono-regional setting of Section 2. Here, we focus on the case of the Paris region, and we reportdiﬀerent forecasting errors obtained using the methods introduced in this paper. Forecasts willbe done on a τ = 14 days window, starting from three diﬀerent dates, namely T = 25 / / , T = 04 / / and T = 14 / / .Section 4.1 explains the sources of health data. Section 4.2.1 discussed the behavior ofour models in terms of their ﬁtting power. Sections 4.2.2 and 4.2.3 study them in terms oftheir forecasting power. In order not to overload the section, we have put some additionalcomputations in Appendices A and B that bring additional information about certain numericalaspects. We use data from the SI-VIC database to get the number I obs ( t ) of hospitalized, and R obs ( t ) ofrecovered people (sent home or dead). The database was pre-processed and made available to usby the DREES . As shown in Figure 3, the delivered data by DREES present sharp oscillationsat the scale of the week (see red curves), which are due to administrative delays in which thecases where oﬃcially reported by the hospitals. For our methodology, we have smoothed thedata by applying a 7 days moving average ﬁlter (see green curves). In order to account for thetotal number of infected people, we also multiply the data by an adjustment factor α = 15 ashinted from [9]. Obviously, this factor is uncertain and could be improved in the light of furtherretrospective studies of the outbreak. However, note that when S (cid:39) N which is the case in thestart of the epidemic, the impact of this factor is negligible in the dynamics as can be understoodfrom (2.2). Using the observations I obs ( t ) and R obs ( t ) , we apply a ﬁnite diﬀerence scheme in formula (2.2)to derive β ∗ obs ( t ) and γ ∗ obs ( t ) for t ∈ [0 , T ] . We next follow the steps of Section 2.2 to computeour forecasts. In the learning phase (step 1), we use two parametric detailed models of type SE IUR and

SEI CHRD to generate training sets B tr and G tr composed of K = 2618 trainingfunctions β ( µ ) and γ ( µ ) where µ are uniformly sampled in the set of parameters P tr in thevicinity of the parameter values suggested in the literature [9, 20]. Based on these training DREES stands for Direction de la Recherche, des Études, de l’Évaluation et des Statis-tiques. It is a department within the French Ministry of Solidarity and Health. Website: https://drees.solidarites-sante.gouv.fr/etudes-et-statistiques/ - - - - - (a) Hospital Admissions. - - - - - (b) Sent home. - - - - - (c) Dead. Figure 3: Example of raw and smoothed data stemming from SI-VIC database for the depart-ment of Paris (France). .sets, we build three types of reduced models: SVD, NMF and ENG (see section 2.4). Givenreduced bases B n and G n , we search for the optimal β ∗ n ∈ B n and γ ∗ n ∈ G n that ﬁt at best theobservations (step 2 of our procedure). For this ﬁtting step we consider two loss functions:1. routine-IR : loss function J ( β , γ | I obs , R obs , [0 , T ]) from (2.2),2. routine- βγ : loss function (cid:101) J ( β , γ | β ∗ obs , γ ∗ obs , [0 , T ]) from (2)For each type of ﬁtting routine, we study the performance of each of the three reduced modelsand the impact of the dimension n of the reduced model in terms of ﬁtting and forecasting error. Given reduced basis B n and G n for the sets B and G , the ﬁtting error for routine-IR was deﬁnedin equation (2) as J ∗ (B n , G n ) = min ( β , γ ) ∈ B n × G n J ( β , γ | I obs , R obs , [0 , T ])= min ( β , γ ) ∈ B n × G n (cid:90) T (cid:0) | I ( t ) − I obs ( t ) | + | R ( t ) − R obs ( t ) | (cid:1) d t, s.t. ( S , I , R ) = SIR ( β , γ , [0 , T ]) . The ﬁtting error for routine- βγ has a similar deﬁnition, namely J ∗ (B n , G n ) = min ( β , γ ) ∈ B n × G n (cid:101) J ( β , γ | β ∗ obs , γ ∗ obs , [0 , T ])= min ( β , γ ) ∈ B n × G n (cid:90) T (cid:0) | β ( t ) − β ∗ obs ( t ) | + | γ ( t ) − γ ∗ obs ( t ) | (cid:1) d t We then report the error components associated to β and γ of these L errors. We study thebehavior of the errors as the dimension n of the reduced models increases. We recall that theﬁtting procedure works both on the components of the reduced basis and the initial time of theepidemics to minimize the loss function. Fitting from March 12, 2020 to April 4, 2020:

Figure 4 summarizes the ﬁtting resultsfor an example ﬁtting period taken from t = 12 / to T = 04 / and for routine- βγ . Fig-ures 4a and 4b show the relative ﬁtting errors (cid:107) β ∗ n − β ∗ obs (cid:107) L ([ t ,T ]) / (cid:107) ¯ β ∗ obs (cid:107) L ([ t ,T ]) and (cid:107) γ ∗ n − γ ∗ obs (cid:107) L ([ t ,T ]) / (cid:107) ¯ γ ∗ obs (cid:107) L ([ t ,T ]) using SVD, NMF and ENG reduced bases, where the notation ¯ x denotes the mean of the quantity x over [0 , T ] . Figures 4c and 4d show the L and L ∞ relativeerrors on I n and R n after the propagation of the ﬁttings of β ∗ n and γ ∗ n .22rom Figures 4a and 4b, we observe that the ﬁtting error with SVD decreases at a moderaterate as the dimension n of the reduced basis is increased. The error with NMF and ENG doesnot decrease and oscillates around certain constant error value of order − . Note that thisvalue is small and yields to small errors in the approximation of I and R as Figures 4c and 4dillustrate. One may however wonder why the ﬁtting error from the reduced models decreasesso slowly and is even non decreasing in the case of NMF and ENG. In Appendix A, we showthrough some numerical tests that this behavior is due to a combination of several elements:1. Observation noise:

The observed health data I obs and R obs presents several sources ofnoise: there may be some inaccuracies in the reporting of cases, and the data are thenpostprocessed. This eventually yields to noisy time series I obs and R obs for which it isdiﬃcult to estimate the uncertainty. These noisy data are then used to produce β ∗ obs and γ ∗ obs which are in turn also noisy.2. Model errors: (a)

Intrinsic model error on B and G : The families of detailed models that we use arerich but they may not cover all possible evolutions of I obs and R obs . In other words,our manifolds B and G may not perfectly cover the real epidemiological scenario.(b) Sampling error of B and G : The size of the training sets B tr and G tr and the di-mension n of the reduced models B n and G n also limit the approximation capabilitiesof the continuous target sets B and G Note however that the model error are here of a smaller order than the errors due to thenoise in the observation data

Fitting from April 4, 2020 to April 27, 2020:

Figure 5 presents under the same format asbefore the ﬁtting results for another example ﬁtting period taken from t = 04 / to T = 27 / and for routine- βγ . Observations and conclusions are the same as the ones for the previousexample.By comparing the behaviour of the error in ﬁgures 4, 5 with ﬁgures 15, 16, we observe thatSVD is the best method for ﬁtting whereas the errors for NMF and ENG remains higher evenwhen n gets larger. For ﬁgures 4, 5, the convergence of SVD is slower and the error is two ordersof magnitude higher for NMF and ENG. Figures 6, 7 show the ﬁttings obtained using the threemethods for the two time intervals considered.We observe that SVD is the best at ﬁtting β ∗ obs and γ ∗ obs , in another hand ENG is producinga smooth ﬁtting of the data. Despite the error observed for ENG, the smooth trend that itproduces seems more promising for the forecasting. In this section we illustrate the behavior of our strategy in terms of forecasting power. Weconsider a forecasting window of τ = 14 days and we examine three diﬀerent starting days T = 25 / , T = 04 / and T = 14 / , which are shown in Figures 8, and 9 and 10 respectively.Recall that that the forecasting uses the coeﬃcients of the reduced basis obtained by the ﬁttingprocedure, but also the optimal initial time of the forecast that minimises the error on the threedays prior to the start of the forecast. For each given ﬁtting strategy ( routine-IR , routine- βγ ),and each given type of reduced model (SVD, NMF, ENG), we have chosen to plot an averageforecast computed with predictions using reduced dimensions n ∈ { , , , , , } . This choiceis a simple type of forecast combination but of course other more elaborate aggregation optionscould be considered. The labels of the plots correspond to the following:23 − − − SV DNMFENG (a) n (cid:55)→ (cid:107) β ∗ n − β ∗ obs (cid:107) L t ,T ]) (cid:107) ¯ β ∗ obs (cid:107) L t ,T ]) − − − SV DNMFENG (b) n (cid:55)→ (cid:107) γ ∗ n − γ ∗ obs (cid:107) L t ,T ]) (cid:107) ¯ γ ∗ obs (cid:107) L t ,T ]) − − − SV DNMFENG (c) n (cid:55)→ || I n − I obs || L t ,T ]) || ¯ I obs || L t ,T ]) − − − SV DNMFENG (d) n (cid:55)→ || R n − R obs || L ∞ ([ t ,T ]) || ¯ R obs || L ∞ ([ t ,T ]) Figure 4:

Fitting errors from t = 12 / / to T = 03 / / − − SV DNMFENG (a) n (cid:55)→ (cid:107) β ∗ n − β ∗ obs (cid:107) L t ,T ]) (cid:107) ¯ β ∗ obs (cid:107) L t ,T ]) − − SV DNMFENG (b) n (cid:55)→ (cid:107) γ ∗ n − γ ∗ obs (cid:107) L t ,T ]) (cid:107) ¯ γ ∗ obs (cid:107) L t ,T ]) − − − SV DNMFENG (c) n (cid:55)→ || I n − I obs || L t ,T ]) || ¯ I obs || L t ,T ]) − − − − SV DNMFENG (d) n (cid:55)→ || R n − R obs || L ∞ ([ t ,T ]) || ¯ R obs || L ∞ ([ t ,T ]) Figure 5:

Fitting errors from t = 04 / / to T = 27 / / . - - - - - - - - - - - - - - - - - - . . . β ∗ obs β ∗ ,ENG β ∗ ,SV D β ∗ ,NMF (a) β - - - - - - - - - - - - - - - - - - . . . . . γ ∗ obs γ ∗ ,ENG γ ∗ ,SV D γ ∗ ,NMF (b) γ Figure 6:

Fitting from t = 12 / / to T = 03 / / - - - - - - - - - - - - - - - - - - . . . . . β ∗ obs β ∗ ,ENG β ∗ ,SV D β ∗ ,NMF (a) β - - - - - - - - - - - - - - - - - - . . . . . γ ∗ obs γ ∗ ,ENG γ ∗ ,SV D γ ∗ ,NMF (b) γ Figure 7:

Fitting from t = 04 / / to T = 27 / / . • I SV D , I NMF , I ENG , R SV D , R NMF , R ENG are the average forecasts obtained using routine- βγ .• I ∗ SV D , I ∗ NMF , I ∗ ENG , R ∗ SV D , R ∗ NMF , R ∗ ENG are the average forecasts obtained using routine-IR .The main result that Figures 8 to 10 illustrate is that routine- βγ using ENG reduced modelsis the most robust and accurate forecasting method. These ﬁgures only provide visual proof ofthis claim though, so we conﬁrm it by making a study of the numerical forecasting errors of thediﬀerent methods. In order not overload the section with too many plots, this study is presentedin Appendix B.Figures 8 to 10 also show that the SVD reduced model is very unstable and it providesforecasts that blow up. This behavior illustrates one of the other important ideas of the paper,namely that a method with high ﬁtting power may present poor forecasting power due to in-stabilities. In the case of the SVD, the instabilities stem from the fact that approximations areallowed to take negative values. This is the reason why NMF, which incorporates the nonneg-ative constraint, performs better than SVD. The reason why ENG outperforms NMF relies inthe enlargement of the cone of nonnegative functions. - - - - - - - - - - - - - - - - - - - - I ENG I SVD I NMF I ∗ ENG I ∗ NMF I ∗ SVD I obs data limit (a) Infected - - - - - - - - - - - - - - - - - - - - R ENG R SVD R NMF R ∗ ENG R ∗ NMF R ∗ SVD R obs data limit (b) Recovered Figure 8: 14 day forecasts from T = 25 / . - - - - - - - - - - - - - - - - - - - - I ENG I SVD I NMF I ∗ ENG I ∗ NMF I ∗ SVD I obs data limit (a) Infected - - - - - - - - - - - - - - - - - - - - R ENG R SVD R NMF R ∗ ENG R ∗ NMF R ∗ SVD R obs data limit (b) Recovered Figure 9: 14 day forecasts from T = 04 / .26 - - - - - - - - - - - - - - - - - - - - I ENG I SVD I NMF I ∗ ENG I ∗ NMF I ∗ SVD I obs data limit (a) Infected - - - - - - - - - - - - - - - - - - - - R ENG R SVD R NMF R ∗ ENG R ∗ NMF R ∗ SVD R obs data limit (b) Recovered Figure 10: 14 day forecasts from T = 14 / . For our best forecasting method ( routine- βγ using ENG), we plot in Figures 11 to 13 theforecasts for each given dimension n = 5 to . We see that the method performs reasonablywell for all values of n , and proves that the results of the previous section with the averagedforecast are not compensating spurious eﬀects which could occur for certain values of n . Weintentionally chose to display the forecast from 04/05 as it is one of the worse predictions obtainedusing this method. Note that the method was able to predict the peak of the epidemic severaldays in advance (see Figure 11). - - - - - - - - - - - - - - - - - - - - n = n = n = n = n = n = I ENG I obs data limit (a) Infected - - - - - - - - - - - - - - - - - - - - n = n = n = n = n = n = R ENG R obs data limit (b) Recovered Figure 11: Greedy forecast from T = 25 / - - - - - - - - - - - - - - - - - - - - n = n = n = n = n = n = I ENG I obs data limit (a) Infected - - - - - - - - - - - - - - - - - - - - n = n = n = n = n = n = R ENG R obs data limit (b) Recovered Figure 12: Greedy forecast from T = 04 / - - - - - - - - - - - - - - - - - - - - n = n = n = n = n = n = I ENG I obs data limit (a) Infected - - - - - - - - - - - - - - - - - - - - n = n = n = n = n = n = R ENG R obs data limit (b) Recovered Figure 13: Greedy forecast from T = 14 / Conclusion

We have developed an epidemiological forecasting method based on reduced modeling techniques.Among the diﬀerent strategies that have been explored, the one that outperforms the restin terms of robustness and forecasting power involves reduced models that are built with anEnlarged Nonnegative Greedy (ENG) strategy. This method is novel and of interest in itselfsince it allows to build reduced models that preserve positivity and even other types of bounds.Despite that ENG does not have optimal ﬁtting properties (i.e. interpolation properties), itis well suited for forecasting since, due to the preservation of the natural constraints of thecoeﬃcients, it generates realistic dynamics with few modes. The results have been presented fora mono-regional test case, and we plan to extend the present methodology to a multi-regionalsetting using mobility data.Last but not least, we would like to emphasize that the developed strategy is very general,and could be applied to the forecasting of other types of dynamics. The speciﬁcity of eachproblem may however require adjustments in the reduced models. This is exempliﬁed in thepresent problem through the construction of reduced models that preserve positivity.

Acknowledgments and disclosure of funding

We would like to thank our colleagues from the “Face au virus” initiative in PSL University:Jamal Atif, Laurent Massoulié, Olivier Cappé, and Akin Kazakcy. We also thank GabrielTurinici for his feedback on the multiregional models. Finally, we are grateful to the DREESfor providing the health data. Part of this research has been supported by the Emergencesproject grant “Models and Measures” of the Paris city council, by the generous donation madeavailable by Alkan together with the complementary funding given by the Sorbonne UniversityFoundation. This research is done in the frame of the project Pandemia/Covidia and also theGIS Obepine, both projects aiming at a better understanding of the Covid-19 pandemic.

A Analysis of model error and observation noise

In this section we study the impact of observation noise and model error in the quality andbehavior of the ﬁtting step. The elements that impact the accuracy of our procedure are thefollowing:1.

Observation noise:

The observed health data I obs and R obs presents several sources ofnoise: there may be some inaccuracies in the reporting of cases, and the data are thenpost-processed. This eventually yields to noisy time series I obs and R obs for which it isdiﬃcult to estimate the uncertainty. These noisy data are then used to produce (throughﬁnite diﬀerences) β ∗ obs and γ ∗ obs which are in turn also noisy.2. Model errors: (a)

Intrinsic model error on B and G : The families of detailed models that we use arerich but they may not cover all possible evolutions of I obs and R obs . In other words,our manifolds B and G may not perfectly cover the real epidemiological scenario.(b) Sampling error of B and G : The size of the training sets B tr and G tr and the di-mension n of the reduced models B n and G n also limit the approximation capabilitiesof the continuous target sets B and G In this section, we aim at disentangling these eﬀects in order to give a better insight of the ﬁttingerror behavior obtained for the ﬁtting with real data from Paris observed (see Figures 4 and 5).28 .1 Study of the impact of the sampling error: Fitting a virtual scenario

Our set of virtual epidemiological dynamics is U . After the collapsing step, the manifolds for β and γ are B and G . These sets are potentially inﬁnite and we have used ﬁnite training subsets B tr ⊆ B and G tr ⊆ G to build the reduced models B n and G n .First we look at the eigenvalues for β and γ when performing an SVD decomposition for thevirtual scenarios. Figure 14 shows a rapid decay of the eigenvalues obtained by SVD decomposi-tion, it shows that we can obtain a very good approximation of elements of B tr ⊆ B and G tr ⊆ G with only a few modes. − − − − (a) Normalised eigenvalues for β vs n − − − − (b) Normalised eigenvalues for γ vs n Figure 14: Training step: Decay of SVD eigenvaluesAmong the sources of noise and model error given at the beginning of Section A, we canstudy the impact of the sampling error (point 2-b) as follows. We consider the ﬁtting erroron an interval of days for two functions β and γ which belong to the manifold B and G and are in the training sets B tr and G tr . We recall that the ﬁtting procedure works both onthe components of the reduced basis and the initial time of the epidemics to minimize the lossfunction. Figure 15 shows the ﬁtting error on the virtual scenario considered using SVD, NMFand ENG reduced bases for n = 2 , . . . , . We observe that the ﬁtting errors behave similarlyas the error decay of the training step (Figure 14). This illustrates that the sampling error isnot playing a major role in our experiments. − − − − SV DNMFENG (a) L relative error of β vs n − − − − SV DNMFENG (b) L relative error of γ vs n Figure 15: Study of sampling errors: Fitting errors of functions β and γ from B and G thatbelong to the training sets B tr and G tr .Figure 16 shows the L and L ∞ errors obtained after the propagation of the ﬁttings of β and γ . In both metrics, the error for I and R obtained using SVD is decreasing to − . Whenthe NMF and the ENG are used, the error decreases and stagnates at − for both I and R .29 − − − − − SV DNMFENG (a) L relative error of I vs n − − − − − SV DNMFENG (b) L ∞ relative error of R vs n Figure 16: Study of sampling errors: Errors on I and R obtained by a SIR model using theﬁtted β and γ .Now we consider the ﬁtting error for two functions β and γ which belong to the manifold B and G but which are not in the training sets B tr and G tr . Figure 17 shows the ﬁtting erroron the virtual scenario considered using SVD, NMF and ENG reduced bases for n = 2 , . . . , .Comparing the results to Figure 16, one notes that weather or not β and γ are in the trainingsets B tr and G tr the quality of the ﬁtting obtained by each method is similar. − − − − SV DNMFENG (a) L relative error of β vs n − − − − SV DNMFENG (b) L relative error of γ vs n Figure 17: Study of sampling errors: Fitting errors of functions β and γ from B and G butwhich do not belong to the training sets B tr and G tr .Figure 18 shows the L and L ∞ errors obtained after the propagation of the ﬁttings of β and γ . In both metrics, the error for I and R obtained using SVD is decreasing to each − .When the NMF and the ENG are used, the error decreases and stagnates respectively at − and − for both I and R . 30 − − − − SV DNMFENG (a) L relative error of I vs n − − − − SV DNMFENG (b) L ∞ relative error of R vs n Figure 18: Study of sampling errors: Errors on I and R obtained by a SIR model using theﬁtted β and γ . A.2 Study of the impact of noisy data and intrinsic model error

To investigate the impact of noise in the observed data, we now add noise to the two previouslychosen functions β ∈ B and γ ∈ G , and we study the ﬁtting error on this noisy data. The levelof the noise has been chosen to be of the same level as the one estimated for the real dynamics.In order to estimate the noise we performed a ﬁtting of the real data using SVD reduced bases,the level of noise is deﬁned as the diﬀerence between this ﬁtting and the real data. This level ofnoise is then added to the virtual scenario considered here. Figure 19 shows the ﬁtting error on β and γ using SVD, NMF and NEG reduced bases for n = 2 , . . . , . − − − SV DNMFENG (a) L relative error of β vs n − − − SV DNMFENG (b) L relative error of γ vs n Figure 19: Fitting errors of β and γ on noisy virtual scenarioWe observe that the noise strongly deteriorates the ﬁtting error obtained using NMF, and theerror becomes oscillatory and very unstable. For ENG, the error remains low and consistentlyaround − for β . We observe the same behaviour for γ after an initial decay. For SVD theerror is slightly lower than in the ENG case. Figure 20 shows the L and L ∞ errors obtainedafter the propagation of the ﬁttings of β and γ . In line with the observations from Figure 19the error obtained using NMF is very unstable. Using ENG, we observe a decay from n = 2 to n = 10 and stagnates around − . The decay observed for SVD is faster and stagnates from n = 4 at − . 31 − − − − SV DNMFENG (a) L relative error of I vs n − − − − SV DNMFENG (b) L ∞ relative error of R vs n Figure 20: Fitting errors of I and R on noisy dataFinally, it remains to add intrinsic model error on top of the previous sampling error andobservation noise. If we do so, the main change is that the previous ﬁtting error plots fromFigure 19 have essentially the same behavior but the error values are increased depending onthe degree of model inaccuracy.We have therefore disentangled all the eﬀects of model error and noisy data, and all theobservations from this section give thus a better insight of the ﬁtting error behavior obtainedfor the ﬁtting with real data from Paris observed (see Figures 4 and 5). B Study of forecasting errors

In this section we give a detailed study of the forecasting errors for each diﬀerent ﬁtting strat-egy ( routine-IR , routine- βγ ), reduced model (SVD, NMF, ENG) and starting date T . Weanticipate the main conclusion: ENG ﬁtted with routine- βγ outperforms the other reducedmodels and is the most robust and accurate reduced model to use in a forecasting strategy. B.1 Forecasting with routine- βγ Figure 21 shows the relative errors of a 7 days and 14 days forecast from T = 25 / for eachforecasting method and each reduced basis. Similarly Figures 22 and 23 show forecasts relativeerrors respectively from T = 04 / and T = 14 / .We observe that the quality of the forecast depends on the reduced basis but also stronglyon the starting day T from which the forecast is done. Using the SVD reduced bases, the 7 daysand 14 days forecasts lead to large relative errors which even blow up as n increases. The NMFreduced model leads to errors that are in most cases worse than the ones obtained with ENG.The error is often above . The ENG reduced model produces more accurate and consistentforecasts than SVD and NMF. Particularly, the relative error is always below . − and oftenbelow − for 7 day forecasts. 32 − − SV DNMFENG (a) L relative error of I vs n (7 days) − − SV DNMFENG (b) L ∞ relative error of R vs n (7 days) − SV DNMFENG (c) L relative error of I vs n (14 days) − SV DNMFENG (d) L ∞ relative error of R vs n (14 days) Figure 22: Forecasting relative errors of I and R (from T = 04 / , routine- βγ ) − − SV DNMFENG (a) L relative error of I vs n (7 days) − − SV DNMFENG (b) L ∞ relative error of R vs n (7 days) − SV DNMFENG (c) L relative error of I vs n (14 days) − SV DNMFENG (d) L ∞ relative error of R vs n (14 days) Figure 21: Forecasting errors of I and R (from T = 25 / , routine- βγ )33 SV DNMFENG (a) L relative error of I vs n (7 days) − − SV DNMFENG (b) L ∞ relative error of R vs n (7 days) − SV DNMFENG (c) L relative error of I vs n (14 days) − SV DNMFENG (d) L ∞ relative error of R vs n (14 days) Figure 23: Forecasting relative errors of I and R (from T = 14 / using routine- βγ ) B.2 Forecasting with routine-IR

Figure 24 shows the relative errors of a 7 days and 14 days forecast from T = 25 / for eachextrapolation method and each reduced basis. Similarly ﬁgures 25 and 26 show forecasts relativeerrors respectively from T = 05 / and T = 14 / .We observe that the forecasts obtained by SVD are not accurate and produce a constanterror for every value of n . This is due to the fact that the optimisation method used for theﬁtting did not converge in every case. The accuracies obtained by NMF and ENG are relativelyoscillatory as the errors sometimes blow up with n . This makes them less reliable compared tothe behavior observed with routine- βγ . 34 − SV DNMFENG (a) L relative error of I vs n (7 days) − SV DNMFENG (b) L ∞ relative error of R vs n (7 days) − SV DNMFENG (c) L relative error of I vs n (14 days) × − × − × − SV DNMFENG (d) L ∞ relative error of R vs n (14 days) Figure 24: Forecasting relative errors of I and R (from T = 25 / , routine-IR ) − SV DNMFENG (a) L relative error of I vs n (7 days) − SV DNMFENG (b) L ∞ relative error of R vs n (7 days) − SV DNMFENG (c) L relative error of I vs n (14 days) − SV DNMFENG (d) L ∞ relative error of R vs n (14 days) Figure 25: Forecasting relative errors of I and R (from T = 05 / , routine-IR )35 − SV DNMFENG (a) L relative error of I vs n (7 days) − × − × − SV DNMFENG (b) L ∞ relative error of R vs n (7 days) − SV DNMFENG (c) L relative error of I vs n (14 days) − SV DNMFENG (d) L ∞ relative error of R vs n (14 days) Figure 26: Forecasting relative errors of I and R (from T = 15 / , routine-IR ) References [1] Anastassopoulou, C., Russo, L., Tsakris, A. and Siettos, C. Data-based analysis, modellingand forecasting of the COVID-19 outbreak.

PloS one , 15(3), p. e0230405, 2020.[2] Armocida, B., Formenti, B., Ussai, S., Palestra, F. and Missoni, E. The Italian healthsystem and the COVID-19 challenge.

The Lancet Public Health , 5(5), p. e253, 2020.[3] Atif, J., Cappé, O., Kazakçi, A., Léo, Y., Massoulié, L. and Mula, O. Initiative face auvirus Observations sur la mobilité pendant l’épidémie de Covid-19. Technical report, PSLUniversity, May 2020. URL https://hal.archives-ouvertes.fr/hal-02921194 .[4] Benner, P., Cohen, A., Ohlberger, M. and Willcox, K.

Model Reduction and Approximation:Theory and Algorithms , volume 15. SIAM, 2017.[5] Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G. and Wojtaszczyk, P. Conver-gence Rates for Greedy Algorithms in Reduced Basis Methods.

SIAM Journal on Mathemat-ical Analysis , 43(3), pp. 1457–1472, 2011. URL http://dx.doi.org/10.1137/100795772 .[6] Ceylan, Z. Estimation of COVID-19 prevalence in Italy, Spain, and France.

Science of TheTotal Environment , p. 138817, 2020.[7] Cohen, A. and DeVore, R. Kolmogorov widths under holomorphic mappings.

IMA Journalof Numerical Analysis , 36(1), pp. 1–12, 2016. URL http://dx.doi.org/10.1093/imanum/dru066 .[8] DeVore, R., Petrova, G. and Wojtaszczyk, P. Greedy algorithms for reduced bases in Banachspaces.

Constructive Approximation , 37(3), pp. 455–466, 2013.369] Di Domenico, L., Pullano, G., Sabbatini, C. E., Boëlle, P.-Y. and Colizza, V. Impact attendudu conﬁnement en Ile-de-France et stratégie de sortie possibles. URL .[10] Fang, X., Liu, W., Ai, J., He, M., Wu, Y., Shi, Y., Shen, W. and Bao, C. Forecastingincidence of infectious diarrhea using random forest in Jiangsu Province, China.

BMCInfectious Diseases , 20(1), pp. 1–8, 2020.[11] Ferguson, N., Laydon, D., Nedjati Gilani, G., Imai, N., Ainslie, K., Baguelin, M., Bha-tia, S., Boonyasiri, A., Cucunuba Perez, Z., Cuomo-Dannenburg, G. et al. Report 9:Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality andhealthcare demand. 2020. URL .[12] Flaxman, S., Mishra, S., Gandy, A., Unwin, H. J. T., Mellan, T. A., Coupland, H.,Whittaker, C., Zhu, H., Berah, T., Eaton, J. W. et al. Estimating the eﬀects of non-pharmaceutical interventions on COVID-19 in Europe.

Nature , 584(7820), pp. 257–261,2020.[13] Gillis, N. The why and how of nonnegative matrix factorization.

Regularization, optimiza-tion, kernels, and support vector machines , 12(257), pp. 257–291, 2014.[14] Hesthaven, J. S., Rozza, G. and Stamm, B.

Certiﬁed reduced basis methods for parametrizedpartial diﬀerential equations , volume 590. Springer, 2016.[15] Kermack, W. O., McKendrick, A. G. and Walker, G. T. A contribution to the mathematicaltheory of epidemics.

Proceedings of the Royal Society of London. Series A, ContainingPapers of a Mathematical and Physical Character , 115(772), pp. 700–721, 1927. URL http://dx.doi.org/10.1098/rspa.1927.0118 .[16] Liu, Z., Magal, P., Seydi, O. and Webb, G. A COVID-19 epidemic model with latencyperiod.

Infectious Disease Modelling , 5, pp. 323–337, 2020.[17] Liu, Z., Magal, P., Seydi, O. and Webb, G. Predicting the cumulative number of cases forthe COVID-19 epidemic in China from early data. arXiv preprint arXiv:2002.12298 , 2020.[18] Maday, Y., Mula, O. and Turinici, G. Convergence analysis of the Generalized EmpiricalInterpolation Method.

SIAM Journal on Numerical Analysis , 54(3), pp. 1713–1731, 2016.URL http://dx.doi.org/10.1137/140978843 .[19] Maday, Y. and Patera, A. Reduced basis methods. In P. Benner, S. Grivet-Talocia, A. Quar-teroni, G. Rozza, W. Schilders and L. Silveira (editors),

Model Order Reduction , chapter 4,pp. 139–179. De Gruyter, Oxford, 16 Dec. 2020.[20] Magal, P. and Webb, G. Predicting the number of reported and unreported cases for theCOVID-19 epidemic in South Korea, Italy, France and Germany.

Medrxiv , 2020. URL https://doi.org/10.1101/2020.03.21.20040154 .[21] Paatero, P. and Tapper, U. Positive matrix factorization: A non-negative factor modelwith optimal utilization of error estimates of data values.

Environmetrics , 5(2), pp. 111–126, 1994. URL https://doi.org/10.1002/env.3170050203 .[22] Poncela, P., Rodríguez, J., Sánchez-Mangas, R. and Senra, E. Forecast combination throughdimension reduction techniques.

International Journal of Forecasting , 27(2), pp. 224–237,April 2011. URL https://ideas.repec.org/a/eee/intfor/v27yi2p224-237.html .3723] Quarteroni, A., Manzoni, A. and Negri, F.

Reduced basis methods for partial diﬀerentialequations: an introduction , volume 92. Springer, 2015.[24] Roda, W. C., Varughese, M. B., Han, D. and Li, M. Y. Why is it diﬃcult to accuratelypredict the COVID-19 epidemic?

Infectious Disease Modelling , 2020.[25] Roques, L., Klein, E. K., Papaïx, J., Sar, A. and Soubeyrand, S. Impact of Lockdown onthe Epidemic Dynamics of COVID-19 in France.