Robust inference for nonlinear regression models from the Tsallis score: application to Covid-19 contagion in Italy
Paolo Girardi, Luca Greco, Valentina Mameli, Monica Musio, Walter Racugno, Erlis Ruli, Laura Ventura
RRobust inference for nonlinear regression models from the Tsallis score: application to
Covid-19 contagion in Italy
Paolo Girardi , Luca Greco , Valentina Mameli ,Monica Musio , Walter Racugno , Erlis Ruli andLaura Ventura Department of Developmental Psychology and Socialisation, University of Padova,Padova, Italy Department of Law, Economics, Management and Quantitative Methods, SannioUniversity, Benevento, Italy Department of Economic and Statistical Sciences, University of Udine, Udine, Italy Department of Mathematics and Informatics, University of Cagliari, Cagliari, Italy Department of Statistical Sciences, University of Padova, Padova, Italy
Address for correspondence:
Laura Ventura, Department of Statistical Sciences,Via C. Battisti 241, 35121 Padova, Italy.
E-mail: [email protected] . Phone: (+39) 049 8274177.
Fax: +39) 049 8274170. a r X i v : . [ s t a t . A P ] A p r Paolo Girardi et al.
Abstract:
We discuss an approach for fitting robust nonlinear regression models,which can be employed to model and predict the contagion dynamics of the Covid-19in Italy. The focus is on the analysis of epidemic data using robust dose-responsecurves, but the functionality is applicable to arbitrary nonlinear regression models.
Key words:
Influence function; Model misspecification; Nonlinear regression; SARSCoV-2 disease; Scoring rules.
We aim to discuss a robust model which can be useful to model and predict the spreadof the Coronavirus disease SARS CoV-2 (Covid-19) in Italy. In particular, we focus ona statistical model for daily deaths and cumulated intensive care unit hospitalizationsand that can help to understand when the peak and the upper asymptote of contagionis reached, so that preventive measures (such as mobility restrictions) can be appliedand/or relaxed. For these data robust procedures are particularly useful since theyallow us to deal at the same time with model misspecifications and data reliability.Nonlinear regression is an extension of classical linear regression, in which data aremodeled by a function, which is a nonlinear combination of unknown parameters anddepends on an independent variable. A relevant application of non-linear regressionmodels concerns the modeling of so called dose-response relation, useful in toxicol-ogy, pharmacology and in the analysis of epidemic data. In these frameworks, theparameters of the model have a relevant interpretation, such as the upper limit andthe inflection point. obust inference for nonlinear regression models
3A normal non-linear regression model is obtained by replacing the linear predictor x T β by a known non-linear function µ ( x, β ), called the mean function. The model(see, e.g., Bates and Watts, 2007) y i = µ ( x i , β ) + ε i , with i = 1 , . . . , n, (1.1)is called a non-linear regression model, where x i is a scalar covariate, β is an unknown p -dimensional parameter, and ε i are independent and identically distributed N (0 , σ )random variables.Likelihood inference is the usual approach to deal with nonlinear models. The log-likelihood function for θ = ( β, σ ) is (cid:96) ( θ ) = − n πσ ) − σ n (cid:88) i =1 [ y i − µ ( x i , β )] , (1.2)and all likelihood quantities (maximum likelihood estimates, tests, confidence inter-vals, prediction, etc) can be easily derived. In practice, using the statistical envi-ronment R , the package drc (Ritz et al. , 2015) provides a user-friendly interface tospecify the model assumptions about the nonlinear relationship and comes with anumber of extractors for summarizing fitted models and carrying out inference onderived parameters.A large number of more or less well-known model functions are available (log-logistic,Weibull, Gamma, etc, see for instance Ritz et al. , 2015, Table 1). These models areparameterized using a unified structure with a coefficient b denoting the steepness ofthe curve, c and d the lower and upper asymptotes or limits of the response, and,for some models, e the inflection point. For instance, the five parameter log-logisticmodel assumes µ ( x i , β ) = c + d − c (1 + exp( b (log( x ) − log( e )))) f , with β = ( b, c, d, e, f ) . (1.3) Paolo Girardi et al.
However, in the presence of model misspecifications or deviations in the observed data,classical likelihood inference may be innacurate (see, e.g., Huber and Ronchetti, 2009,and references therein). The aim of this paper is to discuss the use of robust inferenceon nonlinear regression models. In particular, we discuss a general approach basedon the Tsallis score (Basu et al. , 1988, Ghosh and Basu, 2013, Dawid et al. , 2016,Mameli et al. , 2018).
To deal with model misspecifications, useful surrogate likelihoods are given by properscoring rules (see Dawid et al. , 2016, and references therein). A scoring rule is a lossfunction which is used to measure the quality of a given probability distribution fora random variable Y , in view of the result y of Y .When working with a parametric model with probability density function f ( y ; θ ),with θ ∈ Θ ⊆ IR d , an important example of proper scoring rules is the log-score,which corresponds to minus the log-likelihood function (Good, 1952). In this paper,to deal with robustness, we focus on the Tsallis score (Tsallis, 1988), given by S ( y ; θ ) = ( γ − (cid:90) f ( y ; θ ) γ dy − γf ( y ; θ ) γ − , γ > . (2.1)The density power divergence d α of Basu et al. (1998) is just (2.1), with γ = α + 1,multiplied by 1 /α . The Tsallis score gives in general robust procedures (Ghosh andBasu, 2013, Dawid et al. , 2016), and the parameter γ is a trade-off between efficiencyand robustness. obust inference for nonlinear regression models θ = ( β, σ ) is S ( θ ) = n (cid:88) i =1 (cid:34) − γ (cid:18) √ πσ (cid:19) ( γ − exp (cid:18) − ( γ − σ ( y i − µ i ( x i , β )) (cid:19) + ( γ − √ γ (2 πσ ) ( γ − (cid:35) . (2.2) The validity of inference about θ = ( β, σ ) using scoring rules can be justified byinvoking the general theory of unbiased M -estimating functions. Indeed, inferencebased on proper scoring rules is a special kind of M -estimation (see, e.g., Dawid etal. , 2016, and references therein). The class of M -estimators is broad and includesa variety of well-known estimators. For example it includes the maximum likelihoodestimator (MLE) and robust estimators (see e.g. Huber and Ronchetti, 2009) amongothers.Let s ( y ; θ ) be the gradient vector of S ( y ; θ ) with respect to θ , i.e. s ( y ; θ ) = ∂S ( y ; θ ) /∂θ .Under broad regularity conditions (see Mameli and Ventura, 2015, and referencestherein), the scoring rule estimator ˜ θ is the solution of the unbiased estimating equa-tion s ( θ ) = (cid:80) ni =1 s ( y i ; θ ) = 0 (see Dawid et al. , 2016) and it is asymptotically normal,with mean θ and covariance matrix V ( θ ) /n , where V ( θ ) = K ( θ ) − J ( θ )( K ( θ ) − ) T , (2.3)where K ( θ ) = E θ ( ∂s ( Y ; θ ) /∂θ T ) and J ( θ ) = E θ ( s ( Y ; θ ) s ( Y ; θ ) T ) are the sensitivityand the variability matrices, respectively. The matrix G ( θ ) = V ( θ ) − is known as theGodambe information and its form is due to the failure of the information identitysince, in general, K ( θ ) (cid:54) = J ( θ ). Paolo Girardi et al.
Asymptotic inference on the parameter θ can be based on the Wald-type statistic w S ( θ ) = n (˜ θ − θ ) T V (˜ θ ) − (˜ θ − θ ) , (2.4)which has an asymptotic chi-square distribution with d degrees of freedom; see Dawid et al. (2016). In contrast, the asymptotic distribution of the scoring rule ratio statistic W S ( θ ) = 2 (cid:110) S ( θ ) − S (˜ θ ) (cid:111) is a linear combination of independent chi-square randomvariables with coefficients related to the eigenvalues of the matrix J ( θ ) K ( θ ) − (Dawid et al. , 2016). More formally, W S ( θ ) ˙ ∼ (cid:80) dj =1 µ j Z j , with µ , . . . , µ d eigenvalues of J ( θ ) K ( θ ) − and Z , . . . , Z d independent standard normal variables. Adjustments ofthe scoring rule ratio statistic have received consideration in Dawid et al. (2016),extending results of Pace et al. (2011) for composite likelihoods. In particular, usingthe rescaling factor A ( θ ) = ( s ( θ ) T J ( θ ) s ( θ )) / ( s ( θ ) T K ( θ ) s ( θ )), we have (Dawid et al. ,2016, Theorem 4) W adjS ( θ ) = A ( θ ) W S ( θ ) ˙ ∼ χ d .A consistent estimate of V ( θ ) can be obtained by using a parametric bootstrap; seeVarin et al. (2011) for a detailed discussion of the issues related to the estimationof J ( θ ) and K ( θ ). However, for Tsallis score (2.2) the matrices K ( θ ) and J ( θ ) canbe derived analitically. Indeed, under the same assumptions of Theorem 3.1 in Goshand Basu (2013), it is possibile to show that K ( θ ) = ξ α n ∂µ∂β T ∂µ∂β ς α , where ∂µ∂β T = ( ∂µ ∂β , · · · , ∂µ n ∂β ) is a p × n matrix, with µ i = µ ( x i , β ), i = 1 , . . . , n ,and ξ α and ς α are the same as given in Gosh and Basu (2013) for the linear re-gression model (see Sect.6), namely ξ α = (2 π ) − α/ σ − ( α +2) / (1 + α ) − / and ς α = obust inference for nonlinear regression models (2 π ) − α/ σ − ( α +4) / α (1+ α ) / . Moreover, the computation of J ( θ ) leads to J ( θ ) = ξ α n ∂µ∂β T ∂µ∂β ς α − α ξ α . These matrices can then be used in (2.3) to derive the asymptotic distribution of ˆ θ .Note that ˆ β and ˆ σ are asymptotically independent. From the general theory of M -estimators, the influence function ( IF ) of the estimator˜ θ is given by IF ( y ; ˜ θ ) = K ( θ ) − s ( y ; θ ) , (2.5)and it measures the effect on the estimator ˜ θ of an infinitesimal contamination at thepoint y , standardised by the mass of the contamination. The estimator ˜ θ is B-robustif and only if s ( y ; θ ) is bounded in y (see Hampel et al. , 1986). Note that the IF of theMLE is proportional to the score function; therefore, in general, MLE has unbounded IF , i.e. it is not B-robust. Sufficient conditions for the robustness of the Tsallis scoreare discussed in Basu et al. (1998) and Dawid et al. (2016).For the Tsallis score (2.2), straightforward calculation show that the IF for the Tsallisestimator of the regression coefficients becomes IF ( y ; ˜ β ) ∝ ( y − µ ( x, β )) ∂µ ( x, β ) ∂β exp (cid:26) − ( γ − σ ( y − µ ( x, β )) (cid:27) and that the IF for the Tsallis estimator of the error variance becomes IF ( y ; ˜ σ ) ∝ γ exp (cid:26) − ( γ − σ ( y − µ ( x, β )) (cid:27) (cid:20) σ γ − − ( γ − σ γ − ( y − µ ( x, β )) (cid:21) − ( γ − √ γσ γ − . Since the functions s exp {− s } and s exp {− s } are bounded in s ∈ IR, than boththe influence functions IF ( y ; ˜ β ) and IF ( y ; ˜ σ ) are bounded in y for all γ > Paolo Girardi et al.
We illustrate statistical modelling of daily deaths (DD) and cumulative intensive careunit hospitalizations (ICU) in Italy and in the italian region Lombardia using theTsallis scoring rule. Following Dawid et al. (2016), for DD and ICU data the Tsallisscore (2.2) represents a composite scoring rule , based on only marginals. In particular,it can be interpreted as an independent scoring rule (see also Varin et al. , 2011, forcomposite likelihoods).In Italy Covid-19 epidemic reported the presence of first positive cases to swab inFebruary 20 in well-defined villages of Lombardia and Veneto, immediately followedby other regions. Note that, due to a large increase of cases, the Italian Governmentintroduced on March 8 a new law decree with strong restrictions on mobility regard-ing Lombardia and other provinces. A further decree on March 9 and subsequentmeasures protracted restrictions to all regions of Italy until April 13.We aim to build a data-driven model which can provide support to policy makersengaged in contrasting the spread of the Covid-19. In particular, we have appliedrobust inference for the model (1.1) to the available data, which cover the period fromJanuary 20 to April 4, 2020. The data sources are the daily report of the ProtezioneCivile ( opendatadpc.maps.arcgis.com/ apps/opsdashboard/index.html ).The reader is also pointed to the work of the StatGroup-19 research group that modelsthe mean function by using the five parameters Richards curve (Divino et al. , 2020).Here, we consider two independent applications to: • Daily Deaths (DD) for Covid-19, deaths confirmed by the Istituto Superiore di obust inference for nonlinear regression models • Cumulative Intensive Care Unit (ICU) hospitalizations with positive Covid-19swab; this data can be interpreted as a ”department use index”.Here we model the considered data in two different geographical extensions: Italy andLombardia. However, the proposed methodology has been applied to all the italianregions.Figures 1–4 present the robust fitted models for, respectively, the daily deaths (DD)and cumulative intensive care unit hospitalizations (ICU) in Italy and Lombardia. Ineach plot, black points are the observed data and red curves are the estimates/forecastswith our robust nonlinear models. Furthermore, the plots on the left and centre showthe cumulated data and the ones on the right show the daily data (new hospitaliza-tions and new deaths, every day).We remark that in all the figures the models fit well the observed data. Moreover, itcan be also noted that with respect to the DD the peaks have been reached few dayago, while for the expected number of daily ICU we are on to the peaks. The peak isnot a single day; it is a stabilization period that can last even several days, before astable decrease.The robust fits (Tsallis estimates and 95% confidence intervals) of the parameters e (inflection point) and d (upper asymptote) for the models are summarized in Tables1 and 2 for DD and ICU, respectively. With respect to DD data the estimated0 Paolo Girardi et al.
Mar 01 Mar 15 Apr 01
Italy C u m u l a t i v e nu m be r o f DD Mar Apr Mag Giu
Italy C u m u l a t i v e nu m be r o f DD Mar Apr Mag Giu
Italy E x pe c t ed nu m be r o f da il y DD Figure 1:
Fitted nonlinear robust models for the daily deaths in Italy.
Mar 01 Mar 15 Apr 01
Italy C u m u l a t i v e nu m be r o f I CU Mar Apr Mag Giu
Italy C u m u l a t i v e nu m be r o f I CU Mar Apr Mag Giu
Italy E x pe c t ed nu m be r o f da il y I CU Figure 2:
Fitted nonlinear robust models for cumulated intensive care unit hospitalizationsin Italy.
Mar 01 Mar 15 Apr 01
Lombardia C u m u l a t i v e nu m be r o f dea t h s Mar Apr Mag Giu
Lombardia C u m u l a t i v e nu m be r o f dea t h s Mar Apr Mag Giu
Lombardia E x pe c t ed nu m be r o f da il y dea t h s Figure 3:
Fitted nonlinear robust models for the daily deaths in Lombardia. obust inference for nonlinear regression models Mar 01 Mar 15 Apr 01
Lombardia C u m u l a t i v e nu m be r o f I CU Mar Apr Mag Giu
Lombardia C u m u l a t i v e nu m be r o f I CU Mar Apr Mag Giu
Lombardia E x pe c t ed nu m be r o f da il y I CU Figure 4:
Fitted nonlinear robust models for cumulated intensive care unit hospitalizationsin Lombardia.
Tsallis estimate e IC e Tsallis estimate d IC d Italy DD 39.9 (39.0;40.8) 29392 (28530;30881)Lombardia DD 38.0 (36.8;39.2) 15157 (14141;16173)Table 1: Robust estimates (and 95% confidence intervals) of the parameters e and d for the models for daily deaths.inflection point has been reached in Italy (day 39 corresponds to April 2, 2020) andin Lombardia. On the contrary, with respect to the cumulated ICU data the inflectionpoint will be reached from the end of April 2020 (day 61 corresponds to April 24,2020). Note that the inflection point corresponds to the median of the curves of thedaily data in Figures 1–4, while the peak corresponds to the mode.Tsallis estimate e IC e Tsallis estimate d IC d Italy ICU 61.0 (59.9;62.2) 489456 (489336;489575)Lombardia ICU 74.5 (62.2;84.7) 274213 (263057;285369)Table 2: Robust estimates (and 95% confidence intervals) of the parameters e and d for the models for cumulative intensive care unit hospitalizations.2 Paolo Girardi et al. . . . . . . . estimative density DD MLETsallis 81800 82000 82200 82400 82600 82800 83000 . . . . . . estimative density cumulated ICU MLETsallis
Figure 5:
Estimative predictive densities based on the Tsallis and the ML estimators forDD and cumulated ICU.
In the simplest instance of prediction, the object of inference is a future or a yetunobserved random variable Z . Let p Z ( z ; θ ) be the density of Z . The basic frequentistapproach to prediction of Z , on the basis of the observed y from Y , consists in usingthe estimative predictive density function p e ( z ) = p Z ( z ; ˆ θ ) , obtained by substituting the unknown θ with a consistent estimator of θ , such asthe Tsallis estimator or the MLE. Figure 5 reports the estimative predictive densi-ties based on both the estimators for DD and cumulated ICU. Note the the Tsallisestimative predictive density is shifted on the right and exhibits a larger variability.To compare the predictive performances of the Tsallis method with respect to theMLE, a simulation study has been performed, based on N = 10000 Monte Carloreplications. Figure 6 reports the boxplots of point estimators of the mean µ ( z, β )for the future observation z based on the predictive density estimated with the MLEand of the predictive density estimated with the Tsallis scoring rule, both under the obust inference for nonlinear regression models mle tsallis mle tsallis Figure 6:
Boxplot of point estimators of the mean µ ( z, β ) for the future observation z underthe central model (left) and under the contaminated model (right) based on the MLE andon the Tsallis scoring rule. central model (left) and under a contaminated model (right).We note that, under the central model, the two estimators present a similar behaviour.On the contrary, under the contaminated model, only the Tsallis estimator presenta robust performance. The values of the bias (and sd) are quite similar under thecentral model, on the contrary of the contaminated model. To conclude, we believe that our procedure can constitute a useful statistical toolin modelling Italian Covid-19 contagion data. Indeed, the Tsallis robust proceduresallow us to take into account for the inevitable inaccuracy of the Italian Covid-19 data,which are often underestimated. And example are those deaths of patients who diedwith symptoms compatible with Covid-19 but who have not had a tampon, or whathas been described by many media regarding the growing number of elderly peoplewho remain in their homes while needing to be hospitalized in intensive care. Thus,4
Paolo Girardi et al. for these data robust procedures are particularly useful since the allow us to deal atthe same time with model misspecifications (with respect to the normal assumption,independence or with respect to homoschedasticicy) and data reliability.With respect to the italian Covid-19, we are now exploring the Bayesian approach(see Giummol´e et al. , 2019), since it allow us to include prior information on theparameters of the model. Moreover, the plots of the posterior distributions for theparameters of the model and of the predictive distibutions may be quite useful inpractice.As a final remarks, since the variables are daily counts we will investigate the use ofthe Tsallis scoring rule in the context of nonlinear Poisson regression models.Updates on the results and on the Italian regions may be found in the web page ofthe
Robbayes-C19 research group: https://homes.stat.unipd.it/lauraventura/content/ricerca . Acknowledgements
This research work was partially supported by University of Padova (
BIRD197903 )and by PRIN 2015 (
Grant 2015EASZFS003 ). References
Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C. (1998). Robust and efficient estimationby minimising a density power divergence.
Biometrika , , 549–559. obust inference for nonlinear regression models Nonlinear regression analysis and its applications .Wiley, New York.Dawid, A.P., Musio, M., Ventura, L. (2016). Minimum scoring rule inference.
Scand.J. Statist. , , 123–138.Divino, F., Farcomeni, A., Jona Lasinio, G., Lovison, G., Maruotti, A. (2020) avail-able at http://afarcome.altervista.org/growthGLM.pdf Ghosh, M., Basu, A. (2013). Robust estimation for independent non-homogeneousobservations using density power divergence with applications to linear regression.
Electr. J. Statist. , , 2420–2456.Ghosh, M., Basu, A. (2016). Robust Bayes estimation using the density power diver-gence. Ann. Inst. Stat. Math. , , 413–437.Giummol´e, F., Mameli, V., Ruli, E., Ventura, L. (2019). Objective Bayesian inferencewith proper scoring rules. Test , , 728–755.Huber, P.J., Ronchetti, E.M. (2009). Robust Statistics . Wiley, New York.Mameli, V., Musio, M., Ventura, L. (2018), Bootstrap adjustments of signed scoringrule root statistics,
Comm. Stat. Simul. Comput. , , 1204–1215.Mameli, V., Ventura, L. (2015). Higher-order asymptotics for scoring rules. J. Statist.Plann. Inf. , , 13–26.Ritz, C., Baty, F., Streibig, J.C., Gerhard, D. (2015). Dose-Response Analysis using R . PLoS ONE
J. Statist.Physics , , 479–487.6 Paolo Girardi et al.
Varin, C., Reid, N., Firth, D. (2011). An overview of composite likelihood methods.
Statist. Sinica , , 5–42.Ventura, L., Racugno, W. (2016). Pseudo-likelihoods for Bayesian inference. In: Top-ics on methodological and applied statistical inference, series studies in theoreticaland applied statistics , pp. 205–220. Springer, Berlin., pp. 205–220. Springer, Berlin.