sensobol: an R package to compute variance-based sensitivity indices
Arnald Puy, Samuele Lo Piano, Andrea Saltelli, Simon A. Levin
Arnald Puy
Ecology and Evolutionary Biology, Princeton University

Samuele Lo Piano
School of the Built Environment, University of Reading

Andrea Saltelli
Open Evidence Research, Universitat Oberta de Catalunya

Simon A. Levin
Ecology and Evolutionary Biology, Princeton University
Abstract
The R package sensobol provides several functions to conduct variance-based uncertainty and sensitivity analysis, from the estimation of sensitivity indices to the visual representation of the results. It implements several state-of-the-art first and total-order estimators and allows the computation of up to third-order effects, as well as of the approximation error, in a swift and user-friendly way. Its flexibility makes it also appropriate for models with either a scalar or a multivariate output. We illustrate its functionality by conducting a variance-based sensitivity analysis of three classic models: the Sobol' (1998) G function, the logistic population growth model of Verhulst (1845), and the spruce budworm and forest model of Ludwig, Jones, and Holling (1976).

Keywords: R, Uncertainty, Sensitivity Analysis, Modeling.
1. Introduction
It has been argued that any form of knowledge based on mathematical modeling is conditional on a set, perhaps a hierarchy, of either stated or unspoken assumptions (Kay 2012; Saltelli, Bammer, Bruno, Charters, Di Fiore, Didier, Nelson Espeland, Kay, Lo Piano, Mayo, Pielke Jr, Portaluri, Porter, Puy, Rafols, Ravetz, Reinert, Sarewitz, Stark, Stirling, van der Sluijs, and Vineis 2020). Such assumptions range from the choice of the data and of the methods to the framing of the problem, including normative elements that identify the nature and the relevance of the problem itself. This conditional uncertainty is a property of the model and not of the reality that the model has the ambition to depict. Yet it affects the model output and hence any model-based inference aiming at guiding policies in the "real world." Identifying and understanding this conditional uncertainty is especially paramount when the model output serves to inform a political decision, and boils down to answering two classes of questions:

• How uncertain is the inference? Is this uncertainty compatible with the taking of a decision based on the model outcomes? Given the uncertainty, are the policy options distinguishable in their outcome?

• Which factor is dominating this uncertainty? Is this uncertainty reducible, e.g. with more data or deeper research? Are there a few dominating factors or is the uncertainty originating from several factors? Do the factors act singularly or in combination with one another?

The second class of questions is the realm of global sensitivity analysis, which aims to offer a diagnosis as to the composition of the uncertainty affecting the model output, and hence the model-based inference (Saltelli, Andres, and Homma 1993; Homma and Saltelli 1996a; Saltelli, Ratto, Andres, Campolongo, Cariboni, Gatelli, Saisana, and Tarantola 2008).
In helping to appreciate the extent and the nature of the problems linked to the use of a given model in a practical setting, global sensitivity analysis can be considered as a tool for the hermeneutics of mathematical modeling.

Global sensitivity analysis is well represented in international guidelines for impact assessment (Azzini, Listorti, Mara, and Rosati 2020a; Gilbertson 2018), as well as in many disciplinary journals (Jakeman, Letcher, and Norton 2006; Puy, Lo Piano, and Saltelli 2020c). However, the uptake of state-of-the-art global sensitivity analysis tools is still in its infancy. Most studies continue to prioritize local sensitivity or one-at-a-time analyses, which explore how the model output changes when one factor is varied and the rest are kept fixed at their nominal values (Saltelli, Aleksankina, Becker, Fennell, Ferretti, Holst, Li, and Wu 2019). This approach underexplores the input space and cannot appraise interactions between factors, which are ubiquitous in many models. Some reasons behind the scarce use of global sensitivity analysis methods are the lack of technical skills or resources available, unawareness of global sensitivity methods, or simply reluctance due to their "destructive honesty": if applied properly, the uncertainty uncovered by a global sensitivity analysis might be so wide as to render the model largely impractical for policy-making (Leamer 2010; Saltelli et al. 2019).
The sparse uptake of global sensitivity methods contrasts with the many packages available in different languages. In Python there is the SALib package (Herman and Usher 2017), which includes the Sobol', Morris and the Fourier Amplitude Sensitivity Test (FAST) methods. In MATLAB, UQLab offers the Morris method, the Borgonovo (2007) indices, Sobol' indices (with the Sobol' and Janon estimators) and the Kucherenko indices (Marelli and Sudret 2014). The SAFE package (Pianosi, Sarrazin, and Wagener 2015), developed originally for MATLAB/Octave but with scripts available for R and Python, includes variance-based analysis, elementary effects and the PAWN method (Pianosi and Wagener 2015).

To our knowledge, there are three packages on CRAN that implement global sensitivity analysis in R (R Core Team 2020): the multisensi package (Bidot, Lamboni, and Monod 2018); the fast package (Reusser 2015), which implements FAST (removed from CRAN on 29-08-2020); and the sensitivity package (Iooss, Janon, Pujol, with contributions from Baptiste Broto, Boumhaout, Veiga, Delage, Amri, Fruth, Gilquin, Guillaume, Le Gratiet, Lemaitre, Marrel, Meynaoui, Nelson, Monari, Oomen, Rakovec, Ramos, Roustant, Song, Staum, Sueur, Touati, and Weber 2020), the most comprehensive collection of functions in R for screening, global sensitivity analysis and robustness analysis.

sensobol differs from these R packages by the following characteristics:

1. It offers a state-of-the-art compilation of variance-based sensitivity estimators.
Variance-based sensitivity analysis is regarded as the gold standard of global sensitivity methods. In its current version, sensobol comprises four first-order and eight total-order variance-based estimators, from the classic formulae of Sobol' (1993) or Jansen (1999) to the more recent contributions by Glen and Isaacs (2012), Razavi and Gupta (2016b,a) (VARS-TO) or Azzini, Mara, and Rosati (2020b).

2. It is very flexible and user-friendly. Firstly, there is only one function to compute Sobol'-based sensitivity indices, sobol_indices(). Any first and total-order estimator can be simultaneously fed into the function provided that the user correctly specifies the sampling design (see Section 2.1). This contrasts with the sensitivity package (Iooss et al. 2020). Secondly, the integration of sobol_indices() with the data.table syntax makes the calculation of sensitivity indices for scalar outputs as easy as for multivariate outputs (see Section 3.3) (Dowle and Srinivasan 2019).

3. It permits the computation of up to third-order effects. Appraising high-order effects is paramount when models are non-additive (see Section 2). Although the total-order index already informs on whether a parameter is involved in interactions, sometimes a more precise account of the nature of this interaction is needed. sensobol opens the possibility to probe into these interactions through the computation of second and third-order effects regardless of the selected estimator.

4. It offers publication-ready figures of the model output and sensitivity-related analysis. sensobol relies on ggplot2 (Wickham 2016) and the grammar of graphics to yield high-quality plots which can be easily modified by the user.

5. It is more efficient than current implementations of variance-based estimators in R. Our benchmark of sensobol and sensitivity functions suggests that the former may be approximately two times faster than the latter (see Annex, Section 6.1).

The paper is organized as follows: in Section 2 we briefly describe variance-based sensitivity analysis. In Section 3 we walk through three examples of models with different characteristics and increasing complexity to show all the functionalities of sensobol. Finally, we summarize the main contributions of the package in Section 4.
2. Variance-based sensitivity analysis
Variance-based sensitivity indices use the variance to describe the model output uncertainty. Given a model of the form $y = f(x)$, $x = (x_1, x_2, \ldots, x_i, \ldots, x_k) \in \mathbb{R}^k$, where $y$ is a scalar output and $x_1, \ldots, x_k$ are $k$ uncertain parameters described by probability distributions, the analyst might be interested in assessing how sensitive $y$ is to changes in $x_i$. One way of tackling this question is to check how much the variance in $y$ decreases after fixing $x_i$ to its "true" value $x_i^*$, i.e. $V(y \mid x_i = x_i^*)$. But the true value of $x_i$ is unknown, so instead of fixing it to an arbitrary number, we take the mean of the variance of $y$ after fixing $x_i$ to all its possible values over its uncertainty range, while all other parameters are left to vary. This can be expressed as $E_{x_i}[V_{x_{\sim i}}(y \mid x_i)]$, where $x_{\sim i}$ denotes all-parameters-but-$x_i$ and $E(.)$ and $V(.)$ are the mean and the variance operators respectively. $E_{x_i}[V_{x_{\sim i}}(y \mid x_i)] \leq V(y)$, and in fact

$$V(y) = V_{x_i}\left[E_{x_{\sim i}}(y \mid x_i)\right] + E_{x_i}\left[V_{x_{\sim i}}(y \mid x_i)\right],$$   (1)

where $V_{x_i}[E_{x_{\sim i}}(y \mid x_i)]$ is known as the first-order effect of $x_i$ and $E_{x_i}[V_{x_{\sim i}}(y \mid x_i)]$ is the residual. When a parameter is important in conditioning $V(y)$, $V_{x_i}[E_{x_{\sim i}}(y \mid x_i)]$ is high.

To illustrate this property, let's imagine we run a three-dimensional model, plot the model output $y$ against the range of values in $x_i$, divide the latter in $n$ bins and compute the mean $y$ in each bin. This is represented in Figure 1, with the red dots showing the mean in each bin. The parameter whose mean $y$ values vary the most has the highest direct influence on the model output. This procedure applied over very small bins is actually $V_{x_i}[E_{x_{\sim i}}(y \mid x_i)]$, the conditional variance of $x_i$ on $V(y)$, denoted $V_i$ (Saltelli et al. 2008).

Figure 1: Scatterplot of $y$ against $x_i$. The red dots show the mean $y$ value in each bin (we have set the number of bins arbitrarily at 30). The model is the polynomial function shown in Becker and Saltelli (2015), with $x_i \sim U(0, 1)$.

If $x_1, x_2, \ldots, x_k$ are independent parameters, $V(y)$ can be decomposed as the sum of all partial variances up to the $k$-th order, as

$$V(y) = \sum_{i=1}^{k} V_i + \sum_{i} \sum_{j > i} V_{i,j} + \ldots + V_{1,2,\ldots,k},$$   (2)

where $V_i$ is the first-order effect of $x_i$ and the higher-order terms reflect the variance conveyed by interactions between parameters. Dividing each term by $V(y)$ yields the corresponding sensitivity indices, with $S_i = V_i / V(y)$ being the first-order index of $x_i$.

Figure 2 repeats the binning exercise for the Ishigami and Homma (1990) function, for which the flat mean $y$ values through the range of $x_3$ suggest that $S_3 = 0$. However, and unlike what $S_3 = 0$ might suggest, $x_3$ does influence $y$ given the shape of the scatterplot, so it can not be an inconsequential parameter. Indeed, $x_3$ influences $y$ through high-order effects, i.e. by interacting with some other parameter(s). This notwithstanding, such appraisal of interactions can rarely be made through the visual inspection of scatterplots alone, and often requires computing higher-order terms of Equation 2.

Since there are $2^k - 1$ terms in Equation 2, their full computation quickly becomes impractical as the model dimensionality increases. Much of this information can nevertheless be summarized by the total-order index $T_i$, which measures the first-order effect of $x_i$ jointly with its interactions with all the other parameters. In other words, $T_i$ includes all terms in Equation 2 that include the index $i$, and is computed as follows:

$$T_i = 1 - \frac{V_{x_{\sim i}}\left[E_{x_i}(y \mid x_{\sim i})\right]}{V(y)} = \frac{E_{x_{\sim i}}\left[V_{x_i}(y \mid x_{\sim i})\right]}{V(y)}.$$   (9)

For a three-dimensional model, the total-order index of $x_1$ will thus be computed as $T_1 = S_1 + S_{1,2} + S_{1,3} + S_{1,2,3}$. Since $T_i = 0$ indicates that $x_i$ does not convey any uncertainty to the model output, the total-order index has been used to screen influential from non-influential parameters, a setting known as "factor fixing" (Saltelli et al. 2008).

Figure 2: Scatterplot of $y$ against $x_i$, $i = 1, 2, 3$. The red dots show the mean $y$ value in each bin (we have set the number of bins arbitrarily at 30). The model is the Ishigami and Homma (1990) function.

The computation of variance-based sensitivity indices requires two elements: 1) a sampling design, i.e. a strategy to arrange the sample points into the multidimensional space of the input factors, and 2) an estimator, i.e. a formula to compute the sensitivity measures (Lo Piano, Ferretti, Puy, Albrecht, and Saltelli 2020). Both elements are inextricably intertwined: the reliance on a given sampling design determines which estimators can be used, and the other way around.

The package sensobol currently offers support for five first-order and eight total-order sensitivity estimators, which rely on specific combinations of the $A$, $B$, $A_B^{(i)}$ or $B_A^{(i)}$ matrices (Tables 1–2). The VARS-TO estimator in Table 2 requires a different sampling design, based on star-centers and cross-sections (Razavi and Gupta 2016a,b); we provide further information about VARS-TO in the Annex, Section 6.2. All these estimators are sample-based and hence sensobol does not include emulators or surrogate models.

How are these matrices formed, and why are they required? Let $Q$ be a $(N, 2k)$ matrix constructed using either random or quasi-random number generators, such as the Sobol' (1967, 1976) sequence or a Latin Hypercube Sampling design (McKay, Beckman, and Conover 1979). The $A$ and the $B$ matrices include respectively the leftmost and the rightmost $k$ columns of the $Q$ matrix. As for the $A_B^{(i)}$ ($B_A^{(i)}$) matrices, they are formed by all columns from the $A$ ($B$) matrix except the $i$-th, which comes from $B$ ($A$) (Equation 10, Figure 3).
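The construction just described can be sketched in a few lines of base R. This is an illustration of the sampling design only, not sensobol's internal code, and the helper name build_design() is ours:

```r
# Sketch of the sampling design described above: split a Q matrix with 2k
# columns into A (leftmost k columns) and B (rightmost k columns), then form
# each AB^(i) by replacing the i-th column of A with the i-th column of B.
build_design <- function(Q) {
  k <- ncol(Q) / 2
  A <- Q[, 1:k, drop = FALSE]
  B <- Q[, (k + 1):(2 * k), drop = FALSE]
  AB <- lapply(seq_len(k), function(i) {
    M <- A
    M[, i] <- B[, i]  # only the i-th column differs from A
    M
  })
  list(A = A, B = B, AB = AB)
}
```

The $B_A^{(i)}$ matrices follow symmetrically, swapping the roles of $A$ and $B$.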
Table 1: First-order estimators included in sensobol (v1.0.0).

Nº | Estimator | first | Author
1 | $\left[ \frac{1}{N} \sum_{v=1}^{N} f(A)_v f(B_A^{(i)})_v - f_0^2 \right] / V(y)$ | "sobol" | Sobol' (1993)
2 | $\frac{1}{N} \sum_{v=1}^{N} f(B)_v \left[ f(A_B^{(i)})_v - f(A)_v \right] / V(y)$ | "saltelli" | Saltelli, Annoni, Azzini, Campolongo, Ratto, and Tarantola (2010)
3 | $\left[ V(y) - \frac{1}{2N} \sum_{v=1}^{N} \left( f(B)_v - f(A_B^{(i)})_v \right)^2 \right] / V(y)$ | "jansen" | Jansen (1999)
4 | $\sum_{v=1}^{N} \left( f(B_A^{(i)})_v - f(B)_v \right) \left( f(A)_v - f(A_B^{(i)})_v \right) / \sum_{v=1}^{N} \left[ \left( f(A)_v - f(B)_v \right)^2 + \left( f(B_A^{(i)})_v - f(A_B^{(i)})_v \right)^2 \right]$ | "azzini" | Azzini and Rosati (2019)

In these matrices each column is a model input and each row a sampling point. First and total-order effects are then calculated by averaging several elementary effects computed rowwise: for $S_i$, we need pairs of points where all factors but $x_i$ have different values (i.e. $A_v$, $(B_A^{(i)})_v$; or $B_v$, $(A_B^{(i)})_v$), and for $T_i$ pairs of points where all factors except $x_i$ have the same values (i.e. $A_v$, $(A_B^{(i)})_v$; or $B_v$, $(B_A^{(i)})_v$). The elementary effect for $S_i$ thus requires moving from $A_v$ to $(B_A^{(i)})_v$ (or from $B_v$ to $(A_B^{(i)})_v$), therefore taking a step along $x_{\sim i}$, whereas the elementary effect for $T_i$ involves moving from $A_v$ to $(A_B^{(i)})_v$ (or from $B_v$ to $(B_A^{(i)})_v$), hence moving along $x_i$ (Saltelli et al. 2008).

The function sobol_matrices() allows the user to create these sampling designs using either Sobol' (1967, 1976) quasi-random numbers (type = "QRN"), Latin Hypercube Sampling (type = "LHS") or random numbers (type = "R"). In Figure 4 we show how these sampling methods differ in two dimensions. Comparatively, quasi-random numbers fill the input space quicker and more evenly, leaving smaller unexplored volumes. However, random numbers might provide more accurate sensitivity indices when the model under examination has important high-order terms (Kucherenko, Feil, Shah, and Mauntz 2011).
In addition, Latin Hypercube Sampling might outperform quasi-random numbers for some specific function typologies (Kucherenko, Delpuech, Iooss, and Tarantola 2015). In general, quasi-random numbers are assumed to be the safest bet when selecting a sampling algorithm for a function of unknown behavior, and are the default setting in sensobol.

Table 2: Total-order estimators included in sensobol (v1.0.0).

Nº | Estimator | total | Author
1 | $\frac{1}{2N} \sum_{v=1}^{N} \left[ f(A)_v - f(A_B^{(i)})_v \right]^2 / V(y)$ | "jansen" | Jansen (1999)
2 | $\frac{1}{N} \sum_{v=1}^{N} f(A)_v \left[ f(A)_v - f(A_B^{(i)})_v \right] / V(y)$ | "sobol" | Sobol' (2001)
3 | $\left[ V(y) - \frac{1}{N} \sum_{v=1}^{N} f(A)_v f(A_B^{(i)})_v + f_0^2 \right] / V(y)$ | "homma" | Homma and Saltelli (1996b)
4 | $1 - \frac{\frac{1}{N} \sum_{v=1}^{N} f(B)_v f(B_A^{(i)})_v - f_0^2}{\frac{1}{N} \sum_{v=1}^{N} f(A)_v^2 - f_0^2}$ | "saltelli" | Saltelli et al. (2008)
5 | $1 - \frac{\frac{1}{N} \sum_{v=1}^{N} f(A)_v f(A_B^{(i)})_v - f_0^2}{\frac{1}{N} \sum_{v=1}^{N} \frac{f(A)_v^2 + f(A_B^{(i)})_v^2}{2} - f_0^2}$ | "janon" | Janon, Klein, Lagnoux, Nodet, and Prieur (2014); Monod, Naud, and Makowski (2006)
6 | $1 - \frac{1}{N-1} \sum_{v=1}^{N} \frac{\left[ f(A)_v - \langle f(A)_v \rangle \right] \left[ f(A_B^{(i)})_v - \langle f(A_B^{(i)})_v \rangle \right]}{\sqrt{V\left[ f(A)_v \right] V\left[ f(A_B^{(i)})_v \right]}}$ | "glen" | Glen and Isaacs (2012)
7 | $\sum_{v=1}^{N} \left\{ \left[ f(B)_v - f(B_A^{(i)})_v \right]^2 + \left[ f(A)_v - f(A_B^{(i)})_v \right]^2 \right\} / \sum_{v=1}^{N} \left\{ \left[ f(A)_v - f(B)_v \right]^2 + \left[ f(B_A^{(i)})_v - f(A_B^{(i)})_v \right]^2 \right\}$ | "azzini" | Azzini and Rosati (2019); Azzini et al. (2020b)
8 | $\left\{ E_{x_{\sim i}^*} \left[ \gamma_{x_{\sim i}^*}(h_i) \right] + E_{x_{\sim i}^*} \left[ C_{x_{\sim i}^*}(h_i) \right] \right\} / V(y)$ (see Annex, Section 6.2) | VARS-TO | Razavi and Gupta (2016b,a)

Figure 4: Sampling methods (LHS, QRN and R) in two dimensions. Each dot is a sampling point.

Once the sampling design is set, the computation of Sobol' indices is done with the function sobol_indices().
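Before turning to sobol_indices()'s interface, it may help to see how one of the formulas in Table 2 maps to code. Below is a base-R sketch of the Jansen (1999) total-order estimator; it is an illustration rather than the package's implementation, and estimating V(y) from the A runs alone is our simplifying assumption:

```r
# Jansen (1999) total-order estimator (Table 2, row 1):
# T_i = (1 / (2N)) * sum_v [f(A)_v - f(AB^(i))_v]^2 / V(y).
# yA and yABi are the model outputs over the rows of the A and AB^(i) matrices.
jansen_ti <- function(yA, yABi) {
  sum((yA - yABi)^2) / (2 * length(yA)) / var(yA)
}
```

If $x_i$ is completely non-influential, $f(A)_v = f(A_B^{(i)})_v$ in every row and the estimate is exactly zero.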
The arguments first, total and matrices are set by default at first = "saltelli", total = "jansen" and matrices = c("A", "B", "AB"), following best practices in sensitivity analysis (Saltelli et al. 2010). Any other combination of the first and total-order estimators listed in Tables 1–2 is possible with the appropriate sampling design (Table 3). If the analyst selects estimators whose combination does not match the specific designs listed in Table 3, sobol_indices() will generate an error and urge the user to revise the specifications. This would be the case, for instance, if the analyst sets first = "sobol", total = "glen" and matrices = c("A", "AB", "BA").

Figure 3: Example of the creation of an A and a B matrix, as well as of the $A_B^{(i)}$ and $B_A^{(i)}$ matrices (Equation 10). The Q matrix has been created with Sobol' (1967, 1976) quasi-random numbers, k = 4 and N = 5. The figure is based on Puy et al. (2020a).
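For the first-order side, the default "saltelli" estimator (Table 1, row 2) is equally compact in base R. Again, this is an illustrative sketch, not sensobol's code; pooling the A and B runs to estimate V(y) is our assumption:

```r
# Saltelli et al. (2010) first-order estimator (Table 1, row 2):
# S_i = (1 / N) * sum_v f(B)_v * [f(AB^(i))_v - f(A)_v] / V(y).
saltelli_si <- function(yA, yB, yABi) {
  mean(yB * (yABi - yA)) / var(c(yA, yB))
}
```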
Table 3: Available combinations of first and total-order estimators in sensobol (v1.0.0).

first | total | matrices | Nº model runs
"saltelli", "jansen" | "jansen", "sobol", "homma", "janon", "glen" | c("A", "B", "AB") | $N(k + 2)$
"sobol" | "saltelli" | c("A", "B", "BA") | $N(k + 2)$
"azzini" | "jansen", "sobol", "homma", "janon", "glen", "azzini", "saltelli" | c("A", "B", "AB", "BA") | $2N(k + 1)$
"saltelli", "jansen", "sobol", "azzini" | "azzini" | c("A", "B", "AB", "BA") | $2N(k + 1)$

3. Usage

In this section we illustrate the functionality of sensobol through three different examples of increasing complexity. Let us first load the required libraries:

R> library("sensobol")
R> library("data.table")
R> library("ggplot2")

In sensitivity analysis, the accuracy of sensitivity estimators is usually checked against test functions for which the variance and the sensitivity indices can be expressed analytically. sensobol includes five of these test functions, among them Ishigami and Homma (1990)'s and Sobol' (1998)'s (Table 4). In this first example we will use sensobol with the Sobol' G function, one of the most used benchmark functions in sensitivity analysis (Lo Piano et al. 2020). For the Sobol' G function, $S_1 > S_2 > S_3 > S_4$ and $(S_5, \ldots, S_8) \approx 0$, making it a Type A function according to Kucherenko et al. (2011)'s taxonomy (a function with few important factors and minor interactions), with Type B and Type C functions designating those with equally important parameters but with few and large interactions respectively.

We first define the settings of the uncertainty and sensitivity analysis: we set the sample size N of the base sample matrix and the number of uncertain parameters k, and create a vector with the parameters' names. Since we will bootstrap the indices to get confidence intervals, we also set the number of bootstrap replicas R and define the bootstrap confidence interval method, which in this case will be the normal method.
Finally, we set the confidence intervals at 0.95:

R> N <- 2 ^ 10
R> k <- 8
R> params <- paste("$x_", 1:k, "$", sep = "")
R> R <- 10 ^ 3
R> type <- "norm"
R> conf <- 0.95

Table 4: Test functions included in sensobol (v1.0.0).

Nº | Test function | Author
1 | $y = \sin(x_1) + a \sin^2(x_2) + b x_3^4 \sin(x_1)$, where $a = 2$, $b = 1$ and $(x_1, x_2, x_3) \sim U(-\pi, +\pi)$ | Ishigami and Homma (1990)
2 | $y = \prod_{i=1}^{k} \frac{|4x_i - 2| + a_i}{1 + a_i}$, where $k = 8$, $x_i \sim U(0, 1)$ and $a = (0, 1, 4.5, 9, 99, 99, 99, 99)$ | Sobol' (1998)
3 | $y = \sum_{i=1}^{k} (-1)^i \prod_{j=1}^{i} x_j$, where $x_i \sim U(0, 1)$ | Bratley et al. (1992)
4 | $y = \prod_{i=1}^{k} |4x_i - 2|$, where $x_i \sim U(0, 1)$ | Bratley and Fox (1988)
5 | $y = a_1^T x + a_2^T \sin(x) + a_3^T \cos(x) + x^T M x$, where $x = (x_1, x_2, \ldots, x_k)$, $k = 15$, and the values for $a_i$, $i = 1, 2, 3$, and $M$ are defined by the authors | Oakley and O'Hagan (2004)
6 | $y = \sum_{i=1}^{k} \alpha_i f^{u_i}(x_i) + \sum_{i=1}^{k} \beta_i f^{u_{V_{i,1}}}(x_{V_{i,1}}) f^{u_{V_{i,2}}}(x_{V_{i,2}}) + \sum_{i=1}^{k} \gamma_i f^{u_{W_{i,1}}}(x_{W_{i,1}}) f^{u_{W_{i,2}}}(x_{W_{i,2}}) f^{u_{W_{i,3}}}(x_{W_{i,3}})$ | See Becker (2020) and Puy et al. (2020a) for details

The next step is to create the sample matrix. In this specific case we will use an $A$, $B$, $A_B^{(i)}$ design and Sobol' quasi-random numbers to compute first and total-order indices. These are default settings in sobol_matrices(). In our call to the function we only need to define the sample size and the parameters:

R> mat <- sobol_matrices(N = N, params = params)

Once the sample matrix is defined we can run our model.
Note that in mat each column is a model input and each row a sample point, so the model has to be coded so as to run rowwise. This is already the case of the Sobol' G function included in sensobol:

R> y <- sobol_Fun(mat)

The package also allows the user to swiftly visualize the model output uncertainty by plotting a histogram of the model output obtained from the A matrix:

R> plot_uncertainty(Y = y, N = N) +
+   labs(y = "Counts", x = "$y$")

Figure 5: Empirical distribution of the Sobol' G model output.

Before computing Sobol' indices, it is recommended to explore how the model output maps onto the model input space. sensobol includes two functions to that aim, plot_scatter() and plot_multiscatter(). The first displays the model output y against each x_i while showing the mean y value (i.e. as in Figures 1–2), and allows the user to identify patterns denoting sensitivity (Pianosi, Beven, Freer, Hall, Rougier, Stephenson, and Wagener 2016).

R> plot_scatter(data = mat, N = N, Y = y, params = params)

Figure 6: Scatter plots of model inputs against the model output for the Sobol' G function.

The scatter plots in Figure 6 evidence that the first parameters have more "shape" than the rest and thus have a higher influence on y than the remaining ones. However, scatter plots do not always permit to detect which parameters have a joint effect on the model output. To gain a first insight on these interactions, the function plot_multiscatter() plots x_i against x_j and maps the resulting coordinate to its respective model output value. Interactions are then visible by the emergence of colored patterns.

By default, plot_multiscatter() plots all possible combinations of x_i and x_j, which equal $\frac{k!}{2!(k-2)!}$ — 6 possible combinations in this specific case. In high-dimensional models with several inputs this might lead to overplotting.
To avoid this drawback, the user can subset the parameters they wish to focus on following the results obtained with plot_scatter(): if x_i does not show "shape" in the scatterplot of x_i against y, then it may be excluded from plot_multiscatter(). Below we plot all possible combinations of pairs of inputs between x_1–x_4, which are influential according to Figure 6:

R> plot_multiscatter(data = mat, N = N, Y = y,
+   params = paste("$x_", 1:4, "$", sep = ""))

Figure 7: Scatterplot matrix of pairs of model inputs for the Sobol' G function. The topmost and bottommost label facets refer to the x and the y axis respectively.

The results suggest that x_1 might interact with x_2 given the colored pattern of the (x_1, x_2) facet: the highest values of the model output are concentrated in the corners of the (x_1, x_2) input space and thus result from combinations of high/low x_1 values with high/low x_2 values. In case the analyst is interested in assessing the exact weight of this high-order interaction, the computation of second-order indices would be required.

The last step is the computation of Sobol' indices. In this specific case, we set boot = TRUE to bootstrap the Sobol' indices and get confidence intervals:

R> ind <- sobol_indices(Y = y, N = N, params = params, boot = TRUE,
+   R = R, type = type, conf = conf)

Let's have a look at the output of sobol_indices(); to improve the visualization, we set the number of digits in each numerical column to 3:

R> cols <- colnames(ind)[1:5]
R> ind[, (cols) := round(.SD, 3), .SDcols = (cols)]
R> print(ind)
    original   bias std.error low.ci high.ci sensitivity parameters
 1:    0.724  0.005     0.069  0.584   0.854          Si      $x_1$
 2:    0.184 -0.002     0.039  0.110   0.261          Si      $x_2$
 3:    0.025  0.000     0.015 -0.005   0.054          Si      $x_3$
 4:    0.010  0.000     0.008 -0.007   0.025          Si      $x_4$
 5:    0.000  0.000     0.001 -0.001   0.002          Si      $x_5$
 6:    0.000  0.000     0.001 -0.002   0.002          Si      $x_6$
 7:    0.000  0.000     0.001 -0.001   0.002          Si      $x_7$
 8:    0.000  0.000     0.001 -0.002   0.002          Si      $x_8$
 9:    0.799  0.001     0.036  0.728   0.868          Ti      $x_1$
10:    0.243  0.000     0.013  0.217   0.269          Ti      $x_2$
11:    0.035  0.000     0.002  0.030   0.039          Ti      $x_3$
12:    0.011  0.000     0.001  0.009   0.012          Ti      $x_4$
13:    0.000  0.000     0.000  0.000   0.000          Ti      $x_5$
14:    0.000  0.000     0.000  0.000   0.000          Ti      $x_6$
15:    0.000  0.000     0.000  0.000   0.000          Ti      $x_7$
16:    0.000  0.000     0.000  0.000   0.000          Ti      $x_8$

The output when boot = TRUE is a data.table with the bootstrap statistics in the five leftmost columns (the observed statistic, the bias, the standard error and the low and high confidence intervals), and two extra columns linking each statistic to a sensitivity index (sensitivity) and a parameter (parameters). If boot = FALSE, sobol_indices() computes a point estimate of the indices and the output includes only the columns original, sensitivity and parameters.

The results indicate that x_1 is responsible for 72% of the uncertainty in y, followed by x_2 (18%). x_3 and x_4 have a very minor first-order effect, while the rest are non-influential. Note that the observed statistics suggest the presence of non-additivities: T_1 and T_2 (0.79 and 0.24) are respectively higher than S_1 and S_2 (0.72 and 0.18). As we have seen in Figure 7, x_1 and x_2 have a non-additive effect in y.

We can also compute the Sobol' indices of a dummy parameter, i.e. a parameter that has no influence on the model output, to estimate the numerical approximation error. This will be used later on to identify parameters whose contribution to the output variance is less than the approximation error, and therefore can not be considered influential.
Like sobol_indices(), the function sobol_dummy() allows the user to obtain point estimates (the default) or bootstrap estimates. In this example we use the latter option:

R> ind.dummy <- sobol_dummy(Y = y, N = N, params = params, boot = TRUE, R = R)

The last stage is to plot the Sobol' indices and their confidence intervals, as well as the Sobol' indices of the dummy parameter, with the function plot_sobol():

R> plot_sobol(data = ind, dummy = ind.dummy)

Figure 8: Sobol' indices of the Sobol' G function.

The error bars of S_1 and S_2 overlap with those of T_1 and T_2 respectively. In the case of the Sobol' G function we know that T_1 > S_1 and T_2 > S_2 because the analytic variance is known, but for models where this is not the case such overlap might hamper the identification of non-additivities. Under these circumstances, narrower confidence intervals can be obtained by increasing the sample size N and re-running the analysis from the creation of the sample matrix onwards.

The horizontal blue / red dashed lines respectively mark the upper limit of the T_i and S_i indices of the dummy parameter. This helps in identifying which parameters do condition the model output given the sample size constraints of the analysis: only parameters whose lower confidence intervals are not below the S_i and T_i indices of the dummy parameter can be considered truly influential, in this case x_1 and x_2. Note that although T_3 ≠ 0, the T_i index of the dummy parameter is higher than T_3 and therefore T_3 can not be distinguished from the approximation error.

It is also worth stating that the presence of non-additivities can also be probed with

R> ind[sensitivity == "Si", sum(original)]

[1] 0.9419303

with $(\sum_{i=1}^{k} S_i) < 1$ suggesting that part of the output variance is conveyed by interactions.

In this section we show how sensobol can be implemented to conduct a global uncertainty and sensitivity analysis of a dynamic model.
We will use the following logistic population growth model:

$$\frac{dN}{dt} = rN \left( 1 - \frac{N}{K} \right),$$   (11)

Malthusian models of population growth (i.e. exponential growth) can not forever describe the growth of a population because resources, including space, are limited and competition increases as the population grows. In the logistic model, the growth of the population N at time t is dependent on the growth rate r, the number of individuals N and the carrying capacity K, defined as the maximum number of individuals that a given environment can sustain. When N approaches K, the population growth slows down until the number of individuals converges to a constant (Figure 9).

Figure 9: Dynamics of the logistic population growth model for N_0 = 3 and K = 100.

Before moving forward to conduct a global sensitivity analysis of Equation 11, we set the sample size N of the base sample matrix at 2^13 and create a vector with the names of the parameters. For this specific example we will use the Azzini et al. (2020b) estimators, which require a sampling design based on A, B, $A_B^{(i)}$, $B_A^{(i)}$ matrices. We will compute up to second-order effects, bootstrap the indices 10^3 times and compute the 95% confidence intervals using the percentile method.

R> N <- 2 ^ 13
R> params <- c("$r$", "$K$", "$N_0$")
R> matrices <- c("A", "B", "AB", "BA")
R> first <- total <- "azzini"
R> order <- "second"
R> R <- 10 ^ 3
R> type <- "percent"
R> conf <- 0.95

In the next two code snippets we code Equation 11 and wrap it up in a mapply() call to make it run rowwise:

R> population_growth <- function(r, K, X0) {
+   X <- X0
+   for (i in 0:20) {
+     X <- X + r * X * (1 - X / K)
+   }
+   return(X)
+ }
R> population_growth_run <- function(dt) {
+   return(mapply(population_growth, dt[, 1], dt[, 2], dt[, 3]))
+ }

We now construct the sample matrix.
This time we set type = "LHS" to use a Latin Hypercube Sampling design:

R> mat <- sobol_matrices(matrices = matrices, N = N, params = params,
+                       order = order, type = "LHS")

Let's assume that, after surveying the literature and conducting fieldwork, we have agreed that the uncertainty in the model inputs can be fairly approximated with the distributions presented in Table 5. We thus transform each model input in mat to its specific probability distribution using the appropriate inverse quantile transformation:

Table 5: Summary of the parameters and their distributions (Chalom and Inacio Knegt Lopez 2017).

Parameter   Description                 Distribution
$r$         Population growth rate      $\mathcal{N}(1.7, 0.3)$
$K$         Maximum carrying capacity   $\mathcal{N}(40, 1)$
$N_0$       Initial population size     $\mathcal{U}(10, 50)$

R> mat[, "$r$"] <- qnorm(mat[, "$r$"], 1.7, 0.3)
R> mat[, "$K$"] <- qnorm(mat[, "$K$"], 40, 1)
R> mat[, "$N_0$"] <- qunif(mat[, "$N_0$"], 10, 50)

The sample matrix in mat is now ready, and we can run the model:

R> y <- population_growth_run(mat)

And display the model output uncertainty:

R> plot_uncertainty(Y = y, N = N) + labs(y = "Counts", x = "$y$")

Figure 10: Empirical distribution of the logistic population growth model output.

The results suggest that after 20 time steps the number of individuals will most likely concentrate around 40. Note however the right and left tails of the distribution, indicating that a few simulations also yielded significantly lower and higher population values. These tails result from some specific combinations of parameter values and are indicative of interaction effects, which will be explored later on. We can also compute some statistics to get a better grasp of the output distribution, such as quantiles:

R> quantile(y, probs = c(0.01, 0.025, 0.5, 0.975, 0.99, 1))

With plot_scatter() we can map the model inputs onto the model output.
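The inverse quantile transformation underlying the qnorm() and qunif() calls is the probability integral transform: a sample $u \in [0, 1]$ from the Latin Hypercube design is mapped through the quantile function $F^{-1}$ of the target distribution:

```latex
% Inverse quantile (probability integral) transform:
x = F^{-1}(u), \qquad u \sim \mathcal{U}(0, 1)
\;\Longrightarrow\; x \sim F .
% e.g. for the growth rate r, with \Phi the standard normal CDF:
% r = \mu + \sigma \, \Phi^{-1}(u), \quad \mu = 1.7, \; \sigma = 0.3 .
```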
Instead of plotting one dot per simulation, in this example we use hexagon bins by setting method = "bin" and internally calling ggplot2::geom_hex(). With this specification we divide the plane into regular hexagons and map the number of simulations falling in each hexagon to the bin color. method = "bin" is therefore a useful resource to avoid overplotting with plot_scatter() when the sample size of the base sample matrix ($N$) is high, as in this case ($N = 2^{13}$):

R> plot_scatter(data = mat, N = N, Y = y, params = params, method = "bin")

Figure 11: Scatterplot of model inputs against the model output for the population growth model.

Zones with a higher density of dots are highlighted by lighter blue colors. Note also the bifurcation of the model output at $r \approx 2$. This behavior is the result of the discretization of the logistic (May and Oster 1976). At $r > 2$, a cycle of length 2 emerges, followed as $r$ is increased further by an infinite sequence of period-doubling bifurcations approaching a critical value after which chaotic behavior and strange attractors result.

We can also avoid overplotting in plot_multiscatter() by randomly sampling and displaying only $n$ simulations. This is controlled by the argument smpl, which is NULL by default. Here we set smpl at $2^{11}$ and plot only 1/4th of the simulations.

R> plot_multiscatter(data = mat, N = N, Y = y, params = params, smpl = 2 ^ 11)

Figure 12: Scatterplot matrix of pairs of model inputs for the population growth model.

The results suggest that there may be interactions between ($r$, $K$) and ($r$, $N_0$): note the yellow-green colours concentrated on the right side of the ($r$, $K$) plot as well as on the top right and bottom right sides of the ($r$, $N_0$) facet.
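The bifurcation at $r \approx 2$ can be traced analytically: population_growth() iterates a discrete-time version of Equation 11, whose fixed point at $N^* = K$ loses stability once the slope of the update map at $N^*$ leaves the interval $(-1, 1)$:

```latex
% Discrete-time logistic update iterated in population_growth():
N_{t+1} = f(N_t) = N_t + r N_t \left(1 - \frac{N_t}{K}\right) .
% Linearizing around the fixed point N^* = K:
f'(N^*) = 1 + r - \frac{2 r N^*}{K} = 1 - r ,
% so N^* is stable only while |1 - r| < 1, i.e. 0 < r < 2;
% at r = 2 the first period-doubling bifurcation occurs.
```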
In the case of the pair ($K$, $N_0$), no clear pattern emerges.

These interactions, as well as the first-order effects of $N_0$, $r$, $K$, can be quantified with a call to sobol_indices():

R> ind <- sobol_indices(matrices = matrices, Y = y, N = N, params = params,
+                      first = first, total = total, order = order,
+                      boot = TRUE, R = R, parallel = "no", type = type,
+                      conf = conf)

We round the numeric columns to 3 digits to better inspect the results, and print ind:

R> cols <- colnames(ind)[1:5]
R> ind[, (cols):= round(.SD, 3), .SDcols = (cols)]
R> print(ind)

   original   bias std.error low.ci high.ci sensitivity parameters
1:    0.028 -0.001     0.019 -0.010   0.067          Si        $r$
2:    0.114  0.000     0.004  0.107   0.122          Si        $K$
3:    0.115  0.000     0.011  0.093   0.136          Si      $N_0$
4:    0.760  0.000     0.011  0.738   0.781          Ti        $r$
5:    0.185  0.000     0.010  0.165   0.205          Ti        $K$
6:    0.861  0.001     0.020  0.825   0.900          Ti      $N_0$
7:   -0.004  0.000     0.010 -0.024   0.015         Sij    $r$.$K$
8:    0.673  0.001     0.022  0.627   0.716         Sij  $r$.$N_0$
9:    0.012  0.000     0.007 -0.002   0.025         Sij  $K$.$N_0$

Since in this example we have set order = "second", the output also displays the second-order effects ($S_{ij}$) of the pairs ($r$, $K$), ($r$, $N_0$) and ($K$, $N_0$). Note that $S_{r,N_0}$ conveys $\sim 67\%$ of the uncertainty in $y$, and that $S_r + S_K + S_{N_0} \approx 0.26$: circa 74% of the model output variance is thus driven by interactions. $K$ and $N_0$ have the highest first-order effects on the model output. If the aim is to reduce the uncertainty in the prediction (i.e. to better assess the potential impact of a species on a territory), these results suggest focusing the efforts on better quantifying the initial population $N_0$, and/or on conducting further research on the maximum carrying capacity $K$ of the environment for this particular species.
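The interaction share quoted above follows directly from the printed indices (values taken from the ind table):

```latex
% First-order effects, from the print(ind) output:
S_r + S_K + S_{N_0} \approx 0.028 + 0.114 + 0.115 = 0.257 ,
% so the share of the output variance due to interactions is roughly
1 - 0.257 \approx 0.74 ,
% of which the pairwise term S_{r, N_0} \approx 0.673 accounts for most.
```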
Priority should be given to $N_0$ given its strong interaction with $r$.

In order to get an estimation of the approximation error, we compute the Sobol' indices also for the dummy parameter:

R> ind.dummy <- sobol_dummy(Y = y, N = N, params = params, boot = TRUE, R = R)

And plot the output:

R> plot_sobol(data = ind, dummy = ind.dummy)

Figure 13: First and total-order Sobol' indices of the population growth model.

Note the importance of interactions, reflected in $T_{N_0} \gg S_{N_0}$ and $T_r \gg S_r$. It is also important to highlight that $T_K$ is below the $T_i$ index of the dummy parameter (the dashed, horizontal blue line at c. 0.5). This makes $T_K$ indistinguishable from the approximation error. The same applies to $S_r$, whose 95% confidence interval overlaps with the $S_i$ of the dummy parameter (the dashed, red line).

Finally, we can also plot second-order indices by setting order = "second" in a plot_sobol() call:

R> plot_sobol(data = ind, order = "second")

Figure 14: Second-order Sobol' indices.

Only $S_{r,N_0}$ is displayed because plot_sobol() only returns second-order indices for which the lower confidence interval does not overlap with zero.

The last example illustrates the flexibility of sensobol with systems of differential equations. Under these circumstances, the analyst might be interested in exploring the sensitivity of each model output or state variable to the uncertain inputs at different points in time, i.e. at the transient phase and/or at equilibrium. sensobol integrates with the deSolve and the data.table packages to achieve this goal in an easy and user-friendly manner (Soetaert, Petzoldt, and Setzer 2010; Dowle and Srinivasan 2019).

We rely here on the spruce budworm and forest model of Ludwig et al. (1976).
Spruce budworm is a devastating pest of Canadian and high-latitude US balsam fir and spruce forests. A half-century ago, research teams led by Crawford Holling developed detailed models of the interaction between the budworm and their target species, models capable of reproducing the boom-and-bust dynamics and spatial patterns exhibited in real forests. Donald Ludwig pointed out that these models were overparameterized and that much simpler models could capture the essential dynamics in a more robust manner.

The basic idea of the model is that the dynamics of the system play out on multiple time scales. In particular, budworm population dynamics respond to forest quality on a fast time scale, leading to changes in forest quality on a slower time scale. In turn, the slow dynamics change the topological profile of the fast-time-scale dynamics, introducing hysteretic oscillations reminiscent of relaxation oscillations. The simplest version of the budworm model in Ludwig et al. (1976) takes the form

$$\begin{aligned}
\frac{dB}{dt} &= r_B B \left(1 - \frac{B}{KS}\,\frac{T_E^2 + E^2}{E^2}\right) - \frac{\beta B^2}{(\alpha S)^2 + B^2} \\
\frac{dS}{dt} &= r_S S \left(1 - \frac{S K_E}{E K_S}\right) \\
\frac{dE}{dt} &= r_E E \left(1 - \frac{E}{K_E}\right) - P\,\frac{B}{S}\,\frac{E^2}{T_E^2 + E^2},
\end{aligned} \quad (12)$$

where $B$, $S$ and $E$ are the budworm density, the average size of the trees and the energy reserve of the trees respectively (Figure 15). Equation 12 allows a full characterization of the parameter space with empirical data (Table 6).

Table 6: Summary of the parameters and their distributions (Ludwig et al. 1976).

Parameter   Description                       Distribution
$r_B$       Intrinsic budworm growth rate     $\mathcal{U}(1.52, 1.6)$
$K$         Maximum budworm density           $\mathcal{U}(100, 355)$
$\beta$     Maximum budworm predation rate    $\mathcal{U}(20000, 43200)$
$\alpha$    Maximum density for predation     $\mathcal{U}(1, 2)$
$r_S$       Intrinsic branch growth rate      $\mathcal{U}(0.095, 0.15)$
$K_S$       Maximum branch density            $\mathcal{U}(24000, 25440)$
$K_E$       Maximum $E$ level                 $\mathcal{U}(1, 1.2)$
$r_E$       Intrinsic $E$ growth rate         $\mathcal{U}(0.92, 1)$
$P$         Consumption rate of $E$           $\mathcal{U}(0.0015, 0.00195)$
$T_E$       $E$ proportion                    $\mathcal{U}(0.7, 0.9)$
We first define the settings of the analysis: the sample size $N$ of the base sample matrix, a vector with the names of the parameters, the order of the sensitivity indices investigated, the number of bootstrap replicas $R$ and the type of confidence intervals. Furthermore, we create a time sequence timeOutput specifying the times at which we will extract the model output to conduct the sensitivity analysis (25, 50, 75, 100, 125, 150). These time slices have been selected to get an insight into all the stages of the model (i.e. growth, equilibrium) (Figure 15).

R> N <- 2 ^ 9
R> params <- c("r_b", "K", "beta", "alpha", "r_s", "K_s", "K_e",
+             "r_e", "P", "T_e")
R> order <- "first"
R> R <- 10 ^ 3
R> type <- "norm"
R> conf <- 0.95
R> timeOutput <- seq(25, 150, 25)

Since the model in Equation 12 is a system of differential equations, we can easily code it following the guidelines set by the deSolve package (Soetaert et al. 2010):

R> budworm_fun <- function(t, state, parameters) {
+   with(as.list(c(state, parameters)), {
+     dB <- r_b * B * (1 - B / (K * S) * (T_e^2 + E^2) / E^2) -
+       beta * B^2 / ((alpha * S)^2 + B^2)
+     dS <- r_s * S * (1 - (S * K_e) / (E * K_s))
+     dE <- r_e * E * (1 - E / K_e) - P * (B / S) * E^2 / (T_e^2 + E^2)
+     list(c(dB, dS, dE))
+   })
+ }

We can then create the sample matrix as in the previous examples:

R> mat <- sobol_matrices(N = N, params = params, order = order)

And transform each column to the probability distributions specified in Table 6:

R> mat[, "r_b"] <- qunif(mat[, "r_b"], 1.52, 1.6)
R> mat[, "K"] <- qunif(mat[, "K"], 100, 355)
R> mat[, "beta"] <- qunif(mat[, "beta"], 20000, 43200)
R> mat[, "alpha"] <- qunif(mat[, "alpha"], 1, 2)
R> mat[, "r_s"] <- qunif(mat[, "r_s"], 0.095, 0.15)
R> mat[, "K_s"] <- qunif(mat[, "K_s"], 24000, 25440)
R> mat[, "K_e"] <- qunif(mat[, "K_e"], 1, 1.2)
R> mat[, "r_e"] <- qunif(mat[, "r_e"], 0.92, 1)
R> mat[, "P"] <- qunif(mat[, "P"], 0.0015, 0.00195)
R> mat[, "T_e"] <- qunif(mat[, "T_e"], 0.7, 0.9)
Figure 15: Dynamics of the spruce budworm and forest model. The vertical, dashed lines mark the times at which we will conduct the sensitivity analysis. The parameter values are the mean values of the distributions shown in Table 6.

Since we wish to appraise how the uncertainties in the model inputs affect the three state variables through time, we need to run budworm_fun() rowwise over mat up to each of the times specified in timeOutput, and retrieve the corresponding model output. In order to speed up the computation, it might be convenient to arrange a parallel setting. To that aim, we load the packages foreach (Microsoft and Weston 2020b), parallel (R Core Team 2020), and doParallel (Microsoft and Weston 2020a).

R> library("foreach")
R> library("parallel")
R> library("doParallel")

In the next code snippet we design a nested loop to conduct the computations. Note that the function budworm_fun() is called through sensobol's sobol_ode(), a wrapper around deSolve's ode function which allows the user to retrieve only the model output at time $t$. In the nested loop, sobol_ode() runs budworm_fun() up to the $j$-th element of timeOutput with the value of the parameters as defined in the $i$-th row of mat. Once it has run over all the rows of mat it starts over again, and computes the model output up to the $j+1$ time with the parameters as defined in the $i$-th row. Since $j = 25, 50, 75, 100, 125, 150$, the loop sweeps six times over all the rows of mat.

Before executing the nested loop, we instruct R to use 75% of the cores available in order to take advantage of parallel computing.
R> n.cores <- makeCluster(floor(detectCores() * 0.75))
R> registerDoParallel(n.cores)
R> y <- foreach(j = timeOutput,
+              .combine = "rbind") %:%
+   foreach(i = 1:nrow(mat),
+           .combine = "rbind") %dopar%
+   {
+     sensobol::sobol_ode(d = mat[i, ],
+                         times = seq(0, j, 1),
+                         state = c(B = 0.1, S = 007, E = 1),
+                         func = budworm_fun)
+   }
R> stopCluster(n.cores)

Now we can rearrange the data for the sensitivity analysis. We first create a column to link the model output with each time in timeOutput:

R> full.dt <- data.table(cbind(y, times = rep(timeOutput, each = nrow(mat))))
R> print(full.dt)

          B         S         E times
1: 16164.60 131.65941 0.9505648    25
2: 16502.68 182.76395 1.0475317    25
3: 14454.00  94.79514 0.8514826    25
4: 20518.10 215.36948 0.9395692    25
5: 19044.85 111.68735 0.8902224    25
---

And transform the resulting data.table from wide to long format:

R> indices.dt <- melt(full.dt, measure.vars = c("B", "S", "E"))
R> print(indices.dt)

        times variable        value
     1:    25        B 1.616460e+04
     2:    25        B 1.650268e+04
     3:    25        B 1.445400e+04
     4:    25        B 2.051810e+04
     5:    25        B 1.904485e+04
---
110588:   150        E 8.937556e-01
110589:   150        E 9.330312e-01
110590:   150        E 1.078799e+00
110591:   150        E 9.380807e-01
110592:   150        E 9.975085e-01

With this transformation and the compatibility of sensobol with the data.table package (Dowle and Srinivasan 2019), we can easily compute variance-based sensitivity indices at each selected time step for each state variable.
We first activate 75% of the CPUs to bootstrap the Sobol' indices in parallel, and then compute the Sobol' indices grouping by times and variable:

R> ncpus <- floor(detectCores() * 0.75)
R> indices <- indices.dt[, sobol_indices(Y = value, N = N, params = params,
+                                        order = order, boot = TRUE,
+                                        first = "jansen", R = R,
+                                        parallel = "multicore",
+                                        ncpus = ncpus),
+                        .(variable, times)]

We can then compute the Sobol' indices of the dummy parameter:

R> indices.dummy <- indices.dt[, sobol_dummy(Y = value, N = N, params = params),
+                             .(variable, times)]

Finally, with some lines of code we can visualize the evolution of the $S_i$ and $T_i$ indices through time for each state variable and uncertain model input:

R> ggplot(indices, aes(times, original, fill = sensitivity,
+                      color = sensitivity,
+                      group = sensitivity)) +
+   geom_line() +
+   geom_ribbon(aes(ymin = low.ci,
+                   ymax = high.ci,
+                   color = sensitivity),
+               alpha = 0.1, linetype = 0) +
+   geom_hline(data = indices.dummy[, parameters:= NULL][sensitivity == "Ti"],
+              aes(yintercept = original, color = sensitivity, group = times),
+              lty = 2, size = 0.1) +
+   guides(linetype = FALSE, color = FALSE) +
+   facet_grid(parameters ~ variable) +
+   scale_y_continuous(breaks = scales::pretty_breaks(n = 3)) +
+   labs(x = expression(italic(t)),
+        y = "Sobol' indices") +
+   theme_AP() +
+   theme(legend.position = "top")

Figure 16: Evolution of Sobol' indices through time in the spruce budworm and forest model. The dashed, horizontal blue line shows the $T_i$ of the dummy parameter.

The main results in Figure 16 can be summarized as follows:
1. The spruce budworm and forest model is largely additive up to $t \approx 80$, as interactions are very small and only affect the behavior of $B$ ($S_\alpha < T_\alpha$, $S_K < T_K$, $S_{r_S} < T_{r_S}$). From $t > 80$, the model seems to be fully additive on all three state variables ($S_i \approx T_i$).

2. Given the uncertainty ranges defined in Table 6, only four parameters out of 10 are influential in conveying uncertainty to $B$, $S$ and $E$: $\alpha$, $K$, $K_E$ and $r_S$:

(a) The uncertainty in $B$ is determined by $\alpha$, $K$ and $r_S$ at $0 < t < 100$ and by $K$ afterwards.
(b) The uncertainty in $S$ is fully driven by $r_S$ up to $t \approx 80$ and by $K$ afterwards.
(c) The uncertainty in $E$ is influenced by $\alpha$, $K_E$ and $K$ at $0 < t < 40$ and by $K_E$ and $K$ afterwards.

4. Conclusions

Mathematical models are widely used to gain insights into complex processes, to predict the outcome of a variable of interest or to explore "what if" scenarios. In order to increase their transparency and ensure the quality of model-based inferences, it is paramount to scrutinize these models with a global sensitivity analysis. sensobol aims at furthering the uptake of global sensitivity analysis methods by the modeling community with a convenient, easy-to-use set of functions to compute variance-based analysis, the yardstick of global sensitivity methods. sensobol allows the user to combine several first and total-order estimators, to estimate up to third-order effects and to visualize the results in publication-ready plots. Due to its integration with data.table and deSolve, sensobol can easily compute variance-based indices for models with a scalar or multivariate model output, as well as for systems of differential equations.

sensobol will keep on developing as the search for more efficient variance-based estimators is an active field of research. We encourage users to provide feedback and suggestions on how the package can be improved. The most recent updates can be followed on https://github.com/arnaldpuy/sensobol.

5. Acknowledgements
This work has been funded by the European Commission (Marie Skłodowska-Curie Global Fellowship, grant number 792178 to AP).

6. Annex

This annex compares the performance of sensobol and sensitivity, from the design of the sample matrix to the computation of the model output and the Sobol' indices. To ensure that the results were not critically conditioned by a particular benchmark design, we draw on Becker (2020) and Puy et al. (2020a) and compare the efficiency of sensobol and sensitivity on several randomly defined sensitivity settings. These settings are created by treating the base sample size $N$, the model dimensionality $k$ and the functional form of the model as random parameters: $N$ and $k$ are described with the probability distributions shown in Table 7, and the test function is Becker (2020)'s metafunction (Table 4), which randomly combines $p$ univariate functions in a multivariate function of dimension $k$. Becker's metafunction can be called in sensobol with metafunction(), and in its current implementation includes cubic, discontinuous, exponential, inverse, linear, no-effect, non-monotonic, periodic, quadratic and trigonometric functions. We direct the reader to Becker (2020) and Puy et al. (2020a) for further information.

Table 7: Summary of the benchmark parameters $N$ and $k$. DU stands for discrete uniform.

Parameter   Description                              Distribution
$N$         Base sample size of the sample matrix    $\mathcal{DU}(10, 10^2)$
$k$         Model dimensionality                     $\mathcal{DU}(3, 100)$

We benchmark sensobol and sensitivity as follows:

• We create a $(2^{11}, 2)$ sample matrix using random numbers, where the first column is labeled $N$ and the second column $k$.

• We describe $N$ and $k$ with the probability distributions in Table 7.

• For $v = 1, 2, \ldots, 2^{11}$ rows, we conduct two parallel sensitivity analyses using the functions and guidelines of sensitivity and sensobol respectively, with the base sample matrix as defined by $N_v$ and $k_v$.
The metafunction runs separately in both sensitivity analyses, and we bootstrap the sensitivity indices 100 times.

• We time the computation for both sensobol and sensitivity in each row.

The results suggest that sensobol may be a median of two times faster than sensitivity. We provide the code below.

Load the required packages:

R> library(microbenchmark)

Define the settings of the analysis:

R> N <- 2 ^ 11
R> parameters <- c("N", "k")
R> R <- 10 ^ 2

Create the sample matrix:

R> dt <- sensobol::sobol_matrices(matrices = "A", N = N, params = parameters)
R> dt[, 1] <- floor(qunif(dt[, 1], 10, 10 ^ 2 + 1))
R> dt[, 2] <- floor(qunif(dt[, 2], 3, 100))

Run the benchmark in parallel:

R> n.cores <- makeCluster(floor(detectCores() * 0.75))
R> registerDoParallel(n.cores)

Arrange the data and transform from nanoseconds to milliseconds:

R> out <- rbindlist(y)[, time := time / 1e+06]

Plot the results:

Figure 17: Benchmark of the sensitivity and sensobol packages. The comparison has been done with the Jansen estimators.

R> out[, median(time), expr]

          expr       V1
1: sensitivity 255.4467
2:    sensobol 115.9619

Given its reliance on variance and co-variance matrices, sensobol also offers support to compute the Variogram Analysis of Response Surfaces Total-Order index (VARS-TO) (Razavi and Gupta 2016b,a). VARS uses variograms and co-variograms to characterize the spatial structure and variability of a given model output across the input space, and makes it possible to differentiate sensitivities as a function of the scale $h$: if $x_A$ and $x_B$ are two points separated by a distance $h$, and $y(x_A)$ and $y(x_B)$ is the corresponding model output $y$, the variogram $\gamma(\cdot)$ is calculated as

$$\gamma(x_A - x_B) = \frac{1}{2} V\left[y(x_A) - y(x_B)\right], \quad (13)$$

and the covariogram $C(\cdot)$ as

$$C(x_A - x_B) = COV\left[y(x_A), y(x_B)\right]. \quad (14)$$
Since

$$V\left[y(x_A) - y(x_B)\right] = V\left[y(x_A)\right] + V\left[y(x_B)\right] - 2\,COV\left[y(x_A), y(x_B)\right], \quad (15)$$

and $V[y(x_A)] = V[y(x_B)]$, then

$$\gamma(x_A - x_B) = V\left[y(x)\right] - C(x_A, x_B). \quad (16)$$

If we want to compute the variogram for the factor $x_i$, then

$$\gamma(h_i) = \frac{1}{2}\, E\left[\left(y(x_1, \ldots, x_i + h_i, \ldots, x_k) - y(x_1, \ldots, x_i, \ldots, x_k)\right)^2\right]. \quad (17)$$

Note that the difference in parentheses in Equation 17 involves taking a step along the $x_i$ direction, and is analogous to computing the total-order index $T_i$ (see Section 2.1). The equivalent of Equation 16 for the model input $x_i$ would therefore be

$$\gamma_{x^*_{\sim i}}(h_i) = V(y \mid x^*_{\sim i}) - C_{x^*_{\sim i}}(h_i), \quad (18)$$

where $x^*_{\sim i}$ is a fixed point in the space of non-$x_i$. To compute $T_i$ in the framework of VARS (labelled as VARS-TO by Razavi and Gupta (2016b)), the mean value across the factors' space should be taken on both sides of Equation 18, e.g.

$$E_{x^*_{\sim i}}\left[\gamma_{x^*_{\sim i}}(h_i)\right] = E_{x^*_{\sim i}}\left[V(y \mid x^*_{\sim i})\right] - E_{x^*_{\sim i}}\left[C_{x^*_{\sim i}}(h_i)\right], \quad (19)$$

which can also be written as

$$E_{x^*_{\sim i}}\left[\gamma_{x^*_{\sim i}}(h_i)\right] = V(y)\,T_i - E_{x^*_{\sim i}}\left[C_{x^*_{\sim i}}(h_i)\right], \quad (20)$$

and therefore

$$T_i = \text{VARS-TO} = \frac{E_{x^*_{\sim i}}\left[\gamma_{x^*_{\sim i}}(h_i)\right] + E_{x^*_{\sim i}}\left[C_{x^*_{\sim i}}(h_i)\right]}{V(y)}. \quad (21)$$

The computation of VARS does not require the $A$, $B$, $A_B^{(i)}, \ldots$ matrices, but a sampling design based on stars. Such stars are created as follows: firstly, $N_{star}$ star points across the factor space need to be selected by the analyst using random or quasi-random numbers. These are the star centres and their location can be denoted as $s_v = s_{v_1}, \ldots, s_{v_i}, \ldots, s_{v_k}$, where $v = 1, 2, \ldots, N_{star}$. Then, for each star centre, a cross section of equally spaced points $\Delta h$ apart needs to be generated for each of the $k$ factors, including and passing through the star centre. The cross section is produced by fixing $s_{v_{\sim i}}$ and varying $s_i$.
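The step from Equation 19 to Equation 20 uses the classic identity behind the total-order index (Homma and Saltelli 1996a), with $V(y \mid x_{\sim i})$ denoting the variance over $x_i$ with the non-$x_i$ factors fixed:

```latex
% Total-order index as the normalized expected residual variance:
T_i = \frac{E_{x_{\sim i}}\!\left[V(y \mid x_{\sim i})\right]}{V(y)}
\;\Longleftrightarrow\;
E_{x_{\sim i}}\!\left[V(y \mid x_{\sim i})\right] = V(y)\,T_i ,
% which follows from the law of total variance:
% V(y) = V\!\left(E(y \mid x_{\sim i})\right) + E\!\left[V(y \mid x_{\sim i})\right].
```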
Finally, for each factor, all pairs of points with $h$ values of $\Delta h$, $2\Delta h$, $3\Delta h$ and so on should be extracted. The total computational cost of this design is $N_t = N_{star}\left[k(1/\Delta h - 1) + 1\right]$.

In order to use VARS in sensobol, the analyst should follow the same steps as in the previous examples. Firstly, she should define the settings of the analysis, i.e. the number of star centers and the distance $h$, and create a vector with the names of the parameters:

R> star.centers <- 100
R> h <- 0.1
R> params <- paste("X", 1:8, sep = "")

Then, the function vars_matrices() creates the sample matrix needed to compute VARS-TO:

R> mat <- vars_matrices(star.centers = star.centers, h = h, params = params)

We can then run the model rowwise, in this case the Sobol' (1998) G function (Table 4):

R> y <- sobol_Fun(mat)

And compute VARS-TO with the vars_to() function:

R> ind <- vars_to(Y = y, star.centers = star.centers, params = params, h = h)
R> print(ind)

             Ti parameters
1: 0.8213028904         X1
2: 0.2526291054         X2
3: 0.0346579957         X3
4: 0.0104013502         X4
5: 0.0001079858         X5
6: 0.0001034112         X6
7: 0.0001076416         X7
8: 0.0001055614         X8

Currently, vars_to() does not allow bootstrapping of the indices. This is planned for future sensobol releases.

References

Azzini I, Listorti G, Mara TA, Rosati R (2020a). "Uncertainty and Sensitivity Analysis for policy decision making. An introductory guide." doi:10.2760/922129.
Azzini I, Mara T, Rosati R (2020b). "Monte Carlo estimators of first- and total-order Sobol' indices." URL http://arxiv.org/abs/2006.08232.
Azzini I, Rosati R (2019). "The IA-Estimator for Sobol' sensitivity indices." In Ninth International Conference on Sensitivity Analysis of Model Output. Barcelona.
Becker W (2020). "Metafunctions for benchmarking in sensitivity analysis." Reliability Engineering and System Safety, 107189. doi:10.1016/j.ress.2020.107189.
Becker W, Saltelli A (2015).
"Design for sensitivity analysis." In A Dean, M Morris, J Stufken, D Bingham (eds.), Handbook of Design and Analysis of Experiments, pp. 627–674. CRC Press, Taylor & Francis, Boca Ratón. ISBN 9780429096341. doi:10.1201/b18619.
Bidot C, Lamboni M, Monod H (2018). "multisensi: Multivariate Sensitivity Analysis." URL https://cran.r-project.org/web/packages/multisensi/index.html.
Borgonovo E (2007). "A new uncertainty importance measure." Reliability Engineering and System Safety, (6), 771–784. doi:10.1016/j.ress.2006.04.015.
Bratley P, Fox BL (1988). "ALGORITHM 659: implementing Sobol's quasirandom sequence generator." ACM Transactions on Mathematical Software (TOMS), (1), 88–100.
Bratley P, Fox BL, Niederreiter H (1992). "Implementation and tests of low-discrepancy sequences." ACM Transactions on Modeling and Computer Simulation (TOMACS), (3), 195–213.
Chalom A, Inacio Knegt Lopez P (2017). "pse: Parameter Space Exploration with Latin Hypercubes."
Dowle M, Srinivasan A (2019). "data.table: Extension of 'data.frame'." URL https://cran.r-project.org/package=data.table.
Ferretti F, Saltelli A, Tarantola S (2016). "Trends in sensitivity analysis practice in the last decade." Science of the Total Environment, 666–670. doi:10.1016/j.scitotenv.2016.02.133.
Gilbertson A (2018). "An approach for using probabilistic risk assessment in risk-informed decisions on plant-specific changes to the licensing basis. Regulatory guide 1.174, revision 3." Technical report, US Nuclear Regulatory Commission.
Glen G, Isaacs K (2012). "Estimating Sobol sensitivity indices using correlations." Environmental Modelling and Software, 157–166. doi:10.1016/j.envsoft.2012.03.014.
Herman J, Usher W (2017).
"SALib: An open-source Python library for Sensitivity Analysis." The Journal of Open Source Software, (9), 97. doi:10.21105/joss.00097.
Homma T, Saltelli A (1996a). "Importance measures in global sensitivity analysis of nonlinear models." Reliability Engineering & System Safety, (1), 1–17. doi:10.1016/0951-8320(96)00002-6.
Homma T, Saltelli A (1996b). "Importance measures in global sensitivity analysis of nonlinear models." Reliability Engineering & System Safety, 1–17. doi:10.1016/0951-8320(96)00002-6.
Iooss B, Janon A, Pujol G, with contributions from Baptiste Broto, Boumhaout K, Veiga SD, Delage T, Amri RE, Fruth J, Gilquin L, Guillaume J, Le Gratiet L, Lemaitre P, Marrel A, Meynaoui A, Nelson BL, Monari F, Oomen R, Rakovec O, Ramos B, Roustant O, Song E, Staum J, Sueur R, Touati T, Weber F (2020). sensitivity: Global Sensitivity Analysis of Model Outputs. URL https://cran.r-project.org/package=sensitivity.
Ishigami T, Homma T (1990). "An importance quantification technique in uncertainty analysis for computer models." Proceedings. First International Symposium on Uncertainty Modeling and Analysis, 398–403.
Jakeman A, Letcher R, Norton J (2006). "Ten iterative steps in development and evaluation of environmental models." Environmental Modelling & Software, (5), 602–614. doi:10.1016/j.envsoft.2006.01.004.
Janon A, Klein T, Lagnoux A, Nodet M, Prieur C (2014). "Asymptotic normality and efficiency of two Sobol index estimators." ESAIM: Probability and Statistics, (3), 342–364. doi:10.1051/ps/2013040.
Jansen M (1999). "Analysis of variance designs for model output." Computer Physics Communications, (1-2), 35–43. doi:10.1016/S0010-4655(98)00154-4.
Kay JA (2012). "Knowing when we don't know." Technical report, Institute for Fiscal Studies.
Kucherenko S, Delpuech B, Iooss B, Tarantola S (2015). "Application of the control variate technique to estimation of total sensitivity indices." Reliability Engineering and System Safety, 251–259. doi:10.1016/j.ress.2014.07.008.
Kucherenko S, Feil B, Shah N, Mauntz W (2011). "The identification of model effective dimensions using global sensitivity analysis." Reliability Engineering & System Safety, (4), 440–449. doi:10.1016/j.ress.2010.11.003.
Leamer EE (2010). "Tantalus on the road to asymptopia." Journal of Economic Perspectives, (2), 31–46. doi:10.1257/jep.24.2.31.
Lo Piano S, Ferretti F, Puy A, Albrecht D, Saltelli A (2020). "Variance-based sensitivity analysis: The quest for better estimators between explorativity and efficiency." Reliability Engineering & System Safety, 107300. doi:10.1016/j.ress.2020.107300.
Lotka A (1925). Elements of Physical Biology. Williams & Wilkins Company.
Ludwig D, Jones DD, Holling CS (1976). "Qualitative analysis of insect outbreak systems: the spruce budworm and forest." The Journal of Animal Ecology, (1), 315. doi:10.2307/3939.
Marelli S, Sudret B (2014). "UQLab: a framework for uncertainty quantification in Matlab." In The 2nd International Conference on Vulnerability and Risk Analysis and Management (ICVRAM 2014), pp. 2554–2563. University of Liverpool, United Kingdom, July 13-16, Liverpool.
May RM, Oster GF (1976).
"Bifurcations and dynamic complexity in simple ecological models." The American Naturalist, (974), 573–599.
McKay MD, Beckman RJ, Conover WJ (1979). "Comparison of three methods for selecting values of input variables in the analysis of output from a computer code." Technometrics, (2), 239–245. doi:10.1080/00401706.1979.10489755.
Microsoft, Weston S (2020a). "doParallel: Foreach Parallel Adaptor for the 'parallel' Package."
Microsoft, Weston S (2020b). "foreach: Provides Foreach Looping Construct."
Monod H, Naud C, Makowski D (2006). Uncertainty and sensitivity analysis for crop models. ISBN 0-444-52135-6. doi:10.1016/j.ress.2007.06.003. URL http://reseau-mexico.fr/sites/reseau-mexico.fr/files/06{_}zebook{_}CH-03.pdf.
Oakley J, O'Hagan A (2004). "Probabilistic sensitivity analysis of complex models: a Bayesian approach." Journal of the Royal Statistical Society B, (3), 751–769. doi:10.1111/j.1467-9868.2004.05304.x.
Pearl R, Reed LJ (1920). "On the rate of growth of the population of the United States since 1790 and its mathematical representation." Proceedings of the National Academy of Sciences, (6), 275–288. doi:10.1073/pnas.6.6.275.
Pianosi F, Beven K, Freer J, Hall JW, Rougier J, Stephenson DB, Wagener T (2016). "Sensitivity analysis of environmental models: A systematic review with practical workflow." Environmental Modelling and Software, 214–232. doi:10.1016/j.envsoft.2016.02.008.
Pianosi F, Sarrazin F, Wagener T (2015). "A Matlab toolbox for Global Sensitivity Analysis." Environmental Modelling and Software, 80–85. doi:10.1016/j.envsoft.2015.04.009.
Pianosi F, Wagener T (2015).
"A simple and efficient method for global sensitivity analysis based on cumulative distribution functions." Environmental Modelling and Software, 1–11. ISSN 13648152. doi:10.1016/j.envsoft.2015.01.004.

Puy A, Becker W, Lo Piano S, Saltelli A (2020a). "The battle of total-order sensitivity estimators." PLoS ONE. URL http://arxiv.org/abs/2009.01147.

Puy A, Lo Piano S, Saltelli A (2020b). "A sensitivity analysis of the PAWN sensitivity index." Environmental Modelling and Software, 104679. ISSN 13648152. doi:10.1016/j.envsoft.2020.104679. URL http://arxiv.org/abs/1904.04488.

Puy A, Lo Piano S, Saltelli A (2020c). "Current models underestimate future irrigated areas." Geophysical Research Letters, (8). ISSN 0094-8276. doi:10.1029/2020GL087360.

R Core Team (2020). "R: A language and environment for statistical computing." R Foundation for Statistical Computing.

Razavi S, Gupta HV (2016a). "A new framework for comprehensive, robust, and efficient global sensitivity analysis: 1. Theory." Water Resources Research, (1), 423–439. ISSN 0043-1397. doi:10.1002/2015WR017558.

Razavi S, Gupta HV (2016b). "A new framework for comprehensive, robust, and efficient global sensitivity analysis: 2. Application." Water Resources Research, (1), 440–455. ISSN 0043-1397. doi:10.1002/2015WR017559.

Reusser D (2015). "fast: Implementation of the Fourier amplitude sensitivity test (FAST)."

Saltelli A, Aleksankina K, Becker W, Fennell P, Ferretti F, Holst N, Li S, Wu Q (2019). "Why so many published sensitivity analyses are false: A systematic review of sensitivity analysis practices." Environmental Modelling and Software, (January), 29–39. ISSN 13648152.
doi:10.1016/j.envsoft.2019.01.012.

Saltelli A, Andres TH, Homma T (1993). "Sensitivity analysis of model output. An investigation of new techniques." Computational Statistics and Data Analysis, (2), 211–238. ISSN 01679473. doi:10.1016/0167-9473(93)90193-W.

Saltelli A, Annoni P, Azzini I, Campolongo F, Ratto M, Tarantola S (2010). "Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index." Computer Physics Communications, (2), 259–270. ISSN 00104655. doi:10.1016/j.cpc.2009.09.018. URL https://linkinghub.elsevier.com/retrieve/pii/S0010465509003087.

Saltelli A, Bammer G, Bruno I, Charters E, Di Fiore M, Didier E, Nelson Espeland W, Kay J, Lo Piano S, Mayo D, Pielke Jr R, Portaluri T, Porter TM, Puy A, Rafols I, Ravetz JR, Reinert E, Sarewitz D, Stark PB, Stirling A, van der Sluijs J, Vineis P (2020). "Five ways to ensure that models serve society: a manifesto." Nature, (7813), 482–484. ISSN 0028-0836. doi:10.1038/d41586-020-01812-9.

Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, Saisana M, Tarantola S (2008). Global Sensitivity Analysis. The Primer. John Wiley & Sons, Ltd, Chichester, UK. ISBN 9780470725184. doi:10.1002/9780470725184.

Sobol' IM (1967). "On the distribution of points in a cube and the approximate evaluation of integrals." USSR Computational Mathematics and Mathematical Physics, (4), 86–112. ISSN 00415553. doi:10.1016/0041-5553(67)90144-9. URL https://linkinghub.elsevier.com/retrieve/pii/0041555367901449.

Sobol' IM (1976). "Uniformly distributed sequences with an additional uniform property." USSR Computational Mathematics and Mathematical Physics, (5), 236–242. ISSN 00415553. doi:10.1016/0041-5553(76)90154-3. URL https://linkinghub.elsevier.com/retrieve/pii/0041555376901543.

Sobol' IM (1993). "Sensitivity analysis for nonlinear mathematical models." Mathematical Modeling and Computational Experiment, (3), 407–414. ISSN 0234-0879.

Sobol' IM (1998). "On quasi-Monte Carlo integrations." Mathematics and Computers in Simulation, (2-5), 103–112. ISSN 03784754.
doi:10.1016/S0378-4754(98)00096-2. URL http://linkinghub.elsevier.com/retrieve/pii/S0378475498000962.

Sobol' IM (2001). "Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates." Mathematics and Computers in Simulation, (1-3), 271–280. ISSN 03784754. doi:10.1016/S0378-4754(00)00270-6. URL http://linkinghub.elsevier.com/retrieve/pii/S0378475400002706.

Soetaert K, Petzoldt T, Setzer RW (2010). "Solving differential equations in R: Package deSolve." Journal of Statistical Software, (9), 1–25. ISSN 15487660. doi:10.18637/jss.v033.i09.

Steinmann P, Wang JR, van Voorn GAK, Kwakkel JH (2020). "Don't try to predict COVID-19. If you must, use deep uncertainty methods." Review of Artificial Societies and Social Simulation, (April 17).

Verhulst PF (1845). "Recherches mathématiques sur la loi d'accroissement de la population." Nouveaux Mémoires de l'Académie Royale des Sciences et Belles-Lettres de Bruxelles, (8).

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York. URL https://ggplot2.tidyverse.org.

Affiliation:
Arnald Puy
Ecology and Evolutionary Biology
Princeton University
M31 Guyot Hall
New Jersey 08544, USA
E-mail: [email protected]