Gaussian linear approximation for the estimation of the Shapley effects
Baptiste Broto, François Bachoc, Marine Depecker, Jean-Marc Martinez
CEA, LIST, Université Paris-Saclay, F-91120 Palaiseau, France
Institut de Mathématiques de Toulouse, Université Paul Sabatier, F-31062 Toulouse, France
CEA, DES/DM2S, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France

June 4, 2020
Abstract
In this paper, we address the estimation of the sensitivity indices called "Shapley effects". These sensitivity indices make it possible to handle dependent input variables. The Shapley effects are generally difficult to estimate, but they are easily computable in the Gaussian linear framework. The aim of this work is to use the values of the Shapley effects in an approximated Gaussian linear framework as estimators of the true Shapley effects corresponding to a non-linear model. First, we assume that the input variables are Gaussian with small variances. We provide rates of convergence of the estimated Shapley effects to the true Shapley effects. Then, we focus on the case where the inputs are given by a non-Gaussian empirical mean. We prove that, under some mild assumptions, when the number of terms in the empirical mean increases, the difference between the true Shapley effects and the estimated Shapley effects given by the Gaussian linear approximation converges to 0. Our theoretical results are supported by numerical studies, which show that the Gaussian linear approximation is accurate and significantly decreases the computational time.
Sensitivity analysis, and particularly sensitivity indices, have become important tools in applied sciences. The aim of sensitivity indices is to quantify the impact of the input variables $X_1, \dots, X_p$ on the output $Y = f(X_1, \dots, X_p)$ of a model $f$. This information improves the interpretability of the model. In global sensitivity analysis, the input variables are assumed to be random variables. In this framework, the Sobol indices [Sob93] were the first suggested indices to be applicable to general classes of models. Nevertheless, one of the most important limitations of these indices is the assumption of independence between the input variables. Hence, many variants of the Sobol indices have been suggested for dependent input variables [MT12, Cha13, MTA15, CGP12].

Recently, Owen defined new sensitivity indices in [Owe14] called "Shapley effects". These sensitivity indices have many advantages over the Sobol indices for dependent inputs [IP19]. For general models, [SNS16] suggested an estimator of the Shapley effects. However, this estimator requires the ability to generate samples from the conditional distributions of the input variables. Then, a consistent estimator was suggested in [BBD20], requiring only a sample of the inputs-output. However, in practice, this estimator requires a large sample and is very costly in terms of computational time.

Let us now consider the framework where the distribution of $X_1, \dots, X_p$ is Gaussian and $f$ is linear, which we call the Gaussian linear framework. This framework is considered relatively commonly (see for example [KHF+06, HT11, Ros04, Clo19]),
since the unknown function $f(X_1, \dots, X_p)$ can be approximated by its linear approximation around $E(X)$. The Gaussian linear setting is highly beneficial, since the theoretical values of the Shapley effects can be computed explicitly [OP17, IP19, BBDM19, BBCM20]. These values depend on the covariance matrix of the inputs and on the coefficients of the linear model. An algorithm enabling to compute these values is implemented as the function "ShapleyLinearGaussian" in the R package sensitivity [IAP20]. It is shown in [BBDM19] that this computation is almost instantaneous when the number $p$ of input variables is smaller than 15, but becomes more difficult for larger $p$. However, "ShapleyLinearGaussian" uses the possible block-diagonal structure of the covariance matrix to reduce the dimension, thereby reducing the computation cost [BBDM19].

The aim of this paper is to use the Shapley values computed from a Gaussian linear model as estimates of the true Shapley values corresponding to a non-linear model $f$. We provide convergence guarantees, as the Gaussian linear approximation becomes more and more accurate. We address the two following settings.

First, we assume that $X = (X_1, \dots, X_p)$ is a Gaussian vector with variances decreasing to 0, and $f$ is not linear. We give the rate of convergence of the difference between the true Shapley effects and the ones given by the first-order Taylor polynomial of $f$ at the mean of $X$. To estimate the Shapley effects in a broader setting, we also provide the rate of convergence when the Taylor polynomial is unknown and the linear approximation is given by a finite difference approximation and a linear regression. To strengthen these theoretical results, we compare the three linear approximations on simulated data.

Second, we consider the case where the input vector is non-Gaussian and given by an empirical mean, and the model $f$ is non-linear. We address the estimators of the Shapley values obtained by treating the input vector as Gaussian and the model as linear. We show that, as the number of summands goes to infinity, the estimators of the Shapley values converge to the true Shapley values, corresponding to the non-Gaussian input vector and the non-linear model. Then, we treat the particular case where the Shapley effects evaluate the impact of the individual estimation errors on a global estimation error. In numerical experiments, we compare the estimator of the Shapley effects given by the Gaussian linear framework with the estimator of the Shapley effects given by the general procedure of [BBD20], to the advantage of the former.

The rest of the article is organized as follows. In Section 2, we recall the definition of the Shapley effects and we detail the particular form of the Gaussian linear framework. Section 3 provides the rates of convergence for Gaussian inputs and non-linear models. In Section 4, we address the case where the inputs are given by an empirical mean and $f$ is non-linear. The conclusions are given in Section 5. All the proofs are postponed to the supplementary material.

Let $(X_i)_{i \in [1:p]}$ be random input variables on $\mathbb{R}^p$ and let $Y = f(X)$ be the real random output variable, which is square integrable. We assume that $\mathrm{Var}(Y) \neq 0$. Here, $f : \mathbb{R}^p \to \mathbb{R}$ can be a numerical simulation model [SWNW03]. If $u \subset [1:p]$ and $x = (x_i)_{i \in [1:p]} \in \mathbb{R}^p$, we write $x_u := (x_i)_{i \in u}$.
We can define the Shapley effects as in [Owe14], where for each input variable $X_i$, the Shapley effect is
$$\eta_i(X, f) := \frac{1}{p\,\mathrm{Var}(Y)} \sum_{u \subset -i} \binom{p-1}{|u|}^{-1} \Big( \mathrm{Var}\big(E(Y \mid X_{u \cup \{i\}})\big) - \mathrm{Var}\big(E(Y \mid X_u)\big) \Big), \qquad (1)$$
where $-i$ is the set $[1:p] \setminus \{i\}$. We let $\eta(X, f)$ be the vector of dimension $p$ composed of $\eta_1(X, f), \dots, \eta_p(X, f)$. One can see in Equation (1) that adding $X_i$ to $X_u$ changes the conditional expectation of $Y$ and increases the variability of this conditional expectation. The Shapley effect $\eta_i(X, f)$ is large when, on average, the variance of this conditional expectation increases significantly when $X_i$ is observed. Thus, a large Shapley effect $\eta_i(X, f)$ corresponds to an important input variable $X_i$.

The Shapley effects have interesting properties for global sensitivity analysis. Indeed, there is only one Shapley effect for each variable (contrary to the Sobol indices). Moreover, the sum of all the Shapley effects is equal to 1 (see [Owe14]) and all these values lie in $[0, 1]$, even with dependent inputs. This is very convenient for the interpretation of these sensitivity indices.

An estimator of the Shapley effects has been suggested in [SNS16]. It is implemented in the R package sensitivity as the function "shapleyPermRand". However, it requires the ability to generate samples from the conditional distributions of the inputs, which limits the application framework. [BBD20] suggested another estimator, which requires only a sample of the inputs-output. This estimator uses nearest-neighbour methods to mimic the generation of samples from these conditional distributions. It is implemented in the R package sensitivity as the function "shapleySubsetMC". However, in practice, this estimator requires a large sample and is very costly in terms of computational time.

Consider now the case where $X \sim \mathcal{N}(\mu, \Sigma)$, with $\Sigma \in \mathcal{S}_p^{++}(\mathbb{R})$, and where the model is linear, that is, $f : x \mapsto \beta_0 + \beta^T x$ for a fixed $\beta_0 \in \mathbb{R}$ and a fixed vector $\beta \in \mathbb{R}^p$. In this framework, the sensitivity indices can be calculated explicitly [OP17]:
$$\eta_i(X, f) = \frac{1}{p\,\mathrm{Var}(Y)} \sum_{u \subset -i} \binom{p-1}{|u|}^{-1} \big( \mathrm{Var}(Y \mid X_u) - \mathrm{Var}(Y \mid X_{u \cup \{i\}}) \big), \qquad (2)$$
with
$$\mathrm{Var}(Y \mid X_u) = \mathrm{Var}\big( \beta_{-u}^T X_{-u} \mid X_u \big) = \beta_{-u}^T \big( \Sigma_{-u,-u} - \Sigma_{-u,u} \Sigma_{u,u}^{-1} \Sigma_{u,-u} \big) \beta_{-u}, \qquad (3)$$
where $\Gamma_{v,w} := (\Gamma_{i,j})_{i \in v, j \in w}$. Thus, in the Gaussian linear framework, the Shapley effects are functions of the parameters $\beta$ and $\Sigma$. The Gaussian linear framework is thus very beneficial from an estimation point of view, because in general one needs to estimate conditional moments of the form $\mathrm{Var}(E(Y \mid X_v))$ for $v \subset [1:p]$ using nearest-neighbour methods, while in the Gaussian linear framework, only standard matrix-vector operations are required.
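To make equations (2)-(3) concrete, here is a minimal Python sketch of the Gaussian linear computation (our own illustration, not the authors' code; the reference implementation is the function "ShapleyLinearGaussian" of the R package sensitivity, and the function names below are ours):

```python
import itertools
from math import comb

import numpy as np

def cond_var(beta, Sigma, u):
    """Var(Y | X_u) for Y = beta_0 + beta^T X with X ~ N(mu, Sigma), eq. (3)."""
    p = len(beta)
    rest = [i for i in range(p) if i not in u]  # the set -u
    if not rest:                # conditioning on all inputs: no variance left
        return 0.0
    S = Sigma[np.ix_(rest, rest)]
    if u:                       # Schur complement Sigma_{-u,-u} - Sigma_{-u,u} Sigma_{u,u}^{-1} Sigma_{u,-u}
        S = S - Sigma[np.ix_(rest, u)] @ np.linalg.solve(
            Sigma[np.ix_(u, u)], Sigma[np.ix_(u, rest)])
    b = beta[rest]
    return float(b @ S @ b)

def shapley_linear_gaussian(beta, Sigma):
    """Shapley effects (eta_1, ..., eta_p) of eq. (2), by enumeration of subsets."""
    beta, Sigma = np.asarray(beta, float), np.asarray(Sigma, float)
    p = len(beta)
    eta = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            for u in itertools.combinations(others, size):
                u = list(u)
                eta[i] += (cond_var(beta, Sigma, u)
                           - cond_var(beta, Sigma, u + [i])) / comb(p - 1, size)
    return eta / (p * cond_var(beta, Sigma, []))   # divide by p * Var(Y)
```

For independent inputs (diagonal $\Sigma$), the returned vector reduces to the normalized coefficients $\beta_i^2 \Sigma_{i,i} / \beta^T \Sigma \beta$, and in all cases it sums to 1.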
To model uncertain physical values, it can be convenient to consider them as a Gaussian vector. For example, the international libraries [McL05, JEF13, JEN11] on real data from the field of nuclear safety provide the average and covariance matrix of the input variables, so it is natural to model them with the Gaussian distribution. Hence, to quantify the impact of the uncertainties of the physical inputs of a model on a quantity of interest, it is common to estimate the Shapley effects of Gaussian inputs. The model $f$ is in general non-linear, and the estimation procedures dedicated to non-linear models [SNS16, BBD20] are typically computationally costly, with an accuracy that can be sensitive to the specific situation. Nevertheless, when the uncertainty on the inputs becomes small, the input vector converges to its mean $\mu$, and a linear approximation of the model at $\mu$ seems more and more appropriate.

To formalize this idea, let $X^{\{n\}} \sim \mathcal{N}(\mu^{\{n\}}, \Sigma^{\{n\}})$ be the input vector, with a sequence of mean vectors $(\mu^{\{n\}})_n$ and a sequence of covariance matrices $(\Sigma^{\{n\}})_n$. The index $n$ can represent for instance the number of measures of an uncertain input, in which case the covariance matrix $\Sigma^{\{n\}}$ will decrease with $n$.

Assumption 1. The covariance matrix $\Sigma^{\{n\}}$ decreases to $0$ in such a way that the eigenvalues of $a^{\{n\}} \Sigma^{\{n\}}$ are lower-bounded and upper-bounded in $\mathbb{R}_+^*$, with $a^{\{n\}} \to +\infty$ as $n \to +\infty$. Moreover, $\mu^{\{n\}} \to \mu$ as $n \to +\infty$, where $\mu$ is a fixed vector.

In Assumption 1, the condition on the eigenvalues of $a^{\{n\}} \Sigma^{\{n\}}$ means that the correlation matrix obtained from $\Sigma^{\{n\}}$ cannot get close to a singular matrix. This condition is necessary in our proofs.

If $j \in \mathbb{N}$ and if $f$ is $\mathcal{C}^j$ at $\mu^{\{n\}}$, we will write
$$f_j^{\{n\}}(x) = \frac{1}{j!} D^j f(\mu^{\{n\}})(x - \mu^{\{n\}})$$
(where $D^j f(\mu^{\{n\}})(z)$ is the image of $(z, z, \dots, z) \in (\mathbb{R}^p)^j$ through the multilinear function $D^j f(\mu^{\{n\}})$, which gathers all the partial derivatives of order $j$ of $f$ at $\mu^{\{n\}}$) and
$$R_j^{\{n\}}(x) = f(x) - \sum_{l=0}^{j} f_l^{\{n\}}(x),$$
the remainder of the $j$-th order Taylor approximation of $f$ at $\mu^{\{n\}}$. In particular, $f_1^{\{n\}}(x) = Df(\mu^{\{n\}})(x - \mu^{\{n\}})$, where $Df = D^1 f$. We identify the linear function $Df(\mu^{\{n\}})$ with the corresponding row gradient vector of size $1 \times p$, and the bilinear function $D^2 f(\mu^{\{n\}})$ with the corresponding Hessian matrix of size $p \times p$. We also write $f_1(x) = Df(\mu)(x - \mu)$. Finally, we assume that the function $f$ is subpolynomial, that is, there exist $k \in \mathbb{N}$ and $C > 0$ such that, for all $x \in \mathbb{R}^p$, $|f(x)| \leq C(1 + \|x\|^k)$.

First, we study the asymptotic difference between the Shapley effects given by the true model $f$ and the ones given by the first-order Taylor polynomial of $f$ at $\mu^{\{n\}}$. Remark that adding a constant to the function does not affect the values of the Shapley effects. Thus, the Shapley effects $\eta(X^{\{n\}}, f(\mu^{\{n\}}) + f_1^{\{n\}})$ given by the first-order Taylor polynomial of $f$ at $\mu^{\{n\}}$ are equal to $\eta(X^{\{n\}}, f_1^{\{n\}})$. In the next proposition, we show that approximating the true Shapley effects of the non-linear $f$ by the Shapley effects of the linear approximation $f_1^{\{n\}}$ yields a vanishing error of order $1/a^{\{n\}}$ as $n \to \infty$.

Proposition 1.
Assume that $X^{\{n\}} \sim \mathcal{N}(\mu^{\{n\}}, \Sigma^{\{n\}})$, Assumption 1 holds, $f$ is subpolynomial and $\mathcal{C}^3$ on a neighbourhood of $\mu$, and $Df(\mu) \neq 0$. Then,
$$\big\| \eta(X^{\{n\}}, f) - \eta(X^{\{n\}}, f_1^{\{n\}}) \big\| = O\!\left( \frac{1}{a^{\{n\}}} \right).$$

We remark that, when $f$ is a computer model, it can be the case that the gradient vector is available. First, the computer model can already provide it, by means of the Adjoint Sensitivity Method [Cac03]. Second, automatic differentiation methods can be used on the source file of the code and yield a differentiated code [HP04].

Remark 1. The rate $O(1/a^{\{n\}})$ is the best rate that we can reach under the assumptions of Proposition 1. Indeed, letting $X^{\{n\}} = (X_1^{\{n\}}, X_2^{\{n\}}) \sim \mathcal{N}(0, \frac{1}{a^{\{n\}}} I_2)$ and $Y^{\{n\}} = f(X^{\{n\}}) = X_1^{\{n\}} + (X_2^{\{n\}})^2$, we have $\eta_1(X^{\{n\}}, f_1^{\{n\}}) = 1$ and $\eta_2(X^{\{n\}}, f_1^{\{n\}}) = 0$. Moreover, $\eta_1(X^{\{n\}}, f) = \frac{a^{\{n\}}}{a^{\{n\}} + 2}$ and $\eta_2(X^{\{n\}}, f) = \frac{2}{a^{\{n\}} + 2}$. Thus, the rate of the difference between $\eta(X^{\{n\}}, f)$ and $\eta(X^{\{n\}}, f_1^{\{n\}})$ is exactly $1/a^{\{n\}}$.

In Proposition 1, we bound the difference between the Shapley effects given by $f$ and the ones given by the first-order Taylor polynomial of $f$. Moreover, when the matrix $a^{\{n\}} \Sigma^{\{n\}}$ converges, Proposition 2 shows that the Shapley effects given by the Taylor polynomial converge.

Proposition 2.
Assume that $X^{\{n\}} \sim \mathcal{N}(\mu^{\{n\}}, \Sigma^{\{n\}})$, Assumption 1 holds, $f$ is $\mathcal{C}^2$ on a neighbourhood of $\mu$, $Df(\mu) \neq 0$ and $a^{\{n\}} \Sigma^{\{n\}} \to \Sigma \in \mathcal{S}_p^{++}(\mathbb{R})$ as $n \to +\infty$. Then, if $X^* \sim \mathcal{N}(\mu, \Sigma)$,
$$\big\| \eta(X^{\{n\}}, f_1^{\{n\}}) - \eta(X^*, f_1) \big\| = O(\|\mu^{\{n\}} - \mu\|) + O(\|a^{\{n\}} \Sigma^{\{n\}} - \Sigma\|).$$

Proposition 1 shows that replacing $f$ by its first-order Taylor polynomial $f_1^{\{n\}}$ does not impact the Shapley effects significantly when the input variances are small. Thus, the knowledge of $f_1^{\{n\}}$ would enable us to use the explicit expression (3) of the Gaussian linear case, and for instance the function "ShapleyLinearGaussian" of the package sensitivity, to estimate the true Shapley effects $\eta(X^{\{n\}}, f)$. However, in practice, the first-order Taylor polynomial $f_1^{\{n\}}$ is not always available, except for instance in the situations described above. Thus, one may be interested in replacing the true first-order Taylor polynomial $f_1^{\{n\}}$ by an approximation. We will study two such approximations, given by finite differences and by linear regression.

For $h = (h_1, \dots, h_p) \in (\mathbb{R}_+^*)^p$ and writing $(e_1, \dots, e_p)$ for the canonical basis of $\mathbb{R}^p$, let
$$\widehat{D}_h f(x) := \left( \frac{f(x + e_1 h_1) - f(x - e_1 h_1)}{2 h_1}, \ \dots, \ \frac{f(x + e_p h_p) - f(x - e_p h_p)}{2 h_p} \right) \qquad (4)$$
be the approximation of the differential of $f$ at $x$ with the steps $h_1, \dots, h_p$. If $(h^{\{n\}})_n$ is a sequence of $(\mathbb{R}_+^*)^p$ converging to $0$, let
$$\tilde f_{1, h^{\{n\}}}(x) := \tilde f_{1, h^{\{n\}}, \mu^{\{n\}}}(x) := \widehat{D}_{h^{\{n\}}} f(\mu^{\{n\}})(x - \mu^{\{n\}})$$
be the approximation of the first-order Taylor polynomial of $f - f(\mu^{\{n\}})$ at $\mu^{\{n\}}$ with the steps $h_1^{\{n\}}, \dots, h_p^{\{n\}}$. The next proposition ensures that the Shapley effects computed from the true Taylor polynomial and the approximated one are close, for small steps.

Proposition 3. Under the assumptions of Proposition 1, we have
$$\big\| \eta(X^{\{n\}}, f_1^{\{n\}}) - \eta(X^{\{n\}}, \tilde f_{1, h^{\{n\}}}) \big\| = O\big( \|h^{\{n\}}\|^2 \big).$$
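As an illustration, here is a minimal sketch of the central finite-difference approximation (4) and of the resulting linear surrogate, reusing the `shapley_linear_gaussian` helper sketched above (both function names are ours, not the paper's):

```python
import numpy as np

def finite_diff_gradient(f, x, h):
    """Central finite-difference approximation (4) of Df at x, with steps h."""
    x, h = np.asarray(x, float), np.asarray(h, float)
    grad = np.empty_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h[i]
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h[i])
    return grad

def shapley_finite_diff(f, mu, Sigma):
    """Shapley effects of the surrogate x -> D_h f(mu) (x - mu),
    with the steps h_i = sqrt(Var(X_i)) suggested in Corollary 1 below."""
    h = np.sqrt(np.diag(Sigma))
    beta = finite_diff_gradient(f, mu, h)
    return shapley_linear_gaussian(beta, Sigma)   # from the previous sketch
```

Only $2p$ evaluations of $f$ are needed here, which makes this surrogate essentially free compared with sampling-based estimators of the Shapley effects.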
Then, the next corollary extends Propositions 1 and 2 to the approximated Taylor polynomial based on finite differences.

Corollary 1.
Under the assumptions of Proposition 1, and if $\|h^{\{n\}}\| \leq C_{\sup}/\sqrt{a^{\{n\}}}$ (for example, choosing $h_i^{\{n\}} := \sqrt{\mathrm{Var}(X_i^{\{n\}})}$, the standard deviation of $X_i^{\{n\}}$), we have
$$\big\| \eta(X^{\{n\}}, f) - \eta(X^{\{n\}}, \tilde f_{1, h^{\{n\}}}) \big\| = O\!\left( \frac{1}{a^{\{n\}}} \right).$$
Moreover, if $a^{\{n\}} \Sigma^{\{n\}} \to \Sigma$ as $n \to +\infty$, then, letting $X^* \sim \mathcal{N}(\mu, \Sigma)$,
$$\big\| \eta(X^{\{n\}}, \tilde f_{1, h^{\{n\}}}) - \eta(X^*, f_1) \big\| = O(\|\mu^{\{n\}} - \mu\|) + O(\|a^{\{n\}} \Sigma^{\{n\}} - \Sigma\|) + O\!\left( \frac{1}{a^{\{n\}}} \right).$$

For $n \in \mathbb{N}$ and $N \in \mathbb{N}^*$, let $(X^{\{n\}(l)})_{l \in [1:N]}$ be an i.i.d. sample of $X^{\{n\}}$ of size $N$, and assume that we compute the image of $f$ at each sample point, obtaining the vector $Y^{\{n\}}$. Then, we can approximate $f$ with a linear regression, by least squares. In this case, we estimate the coefficients of the linear regression by the vector
$$\begin{pmatrix} \hat\beta_0^{\{n\}} \\ \hat\beta^{\{n\}} \end{pmatrix} = \big( A^{\{n\}T} A^{\{n\}} \big)^{-1} A^{\{n\}T} Y^{\{n\}},$$
where $A^{\{n\}} \in \mathcal{M}_{N, p+1}(\mathbb{R})$ is such that, for all $j \in [1:N]$, the $j$-th line of $A^{\{n\}}$ is $(1 \ X^{\{n\}(j)T})$. The function $f$ is then approximated by
$$\hat f_{\mathrm{lin}}^{\{n\}(N)} : x \mapsto \hat\beta_0^{\{n\}} + \hat\beta^{\{n\}T} x.$$
Remark that the linear function $\hat f_{\mathrm{lin}}^{\{n\}(N)}$ is random, and so the deduced Shapley effects $\eta(X^{\{n\}}, \hat f_{\mathrm{lin}}^{\{n\}(N)})$ are random variables.
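A minimal sketch of this least-squares surrogate (again our own illustration; we use `np.linalg.lstsq` instead of forming $(A^{\{n\}T}A^{\{n\}})^{-1}A^{\{n\}T}$ explicitly, which computes the same least-squares solution more stably):

```python
import numpy as np

def shapley_linear_regression(f, mu, Sigma, N, rng=None):
    """Shapley effects of the least-squares linear surrogate of f,
    fitted on an i.i.d. sample of X ~ N(mu, Sigma) of size N."""
    rng = np.random.default_rng(rng)
    X = rng.multivariate_normal(mu, Sigma, size=N)   # sample of X^{n}
    Y = np.array([f(x) for x in X])                  # outputs Y^{n}
    A = np.column_stack([np.ones(N), X])             # design matrix A^{n}
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)     # (beta_0, beta)
    return shapley_linear_gaussian(coef[1:], Sigma)  # the intercept plays no role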
The next proposition and corollary correspond to Proposition 3 and Corollary 1, for the linear regression approximation of $f$.

Proposition 4. Under Assumption 1, if $f$ is $\mathcal{C}^3$ on a neighbourhood of $\mu$ with $Df(\mu) \neq 0$, there exist $C_{\inf} > 0$, $C_{\sup}^{(1)} < +\infty$ and $C_{\sup}^{(2)} < +\infty$ such that, with probability at least $1 - C_{\sup}^{(1)} \exp(-C_{\inf} N)$, we have
$$\big\| \eta(X^{\{n\}}, f_1^{\{n\}}) - \eta(X^{\{n\}}, \hat f_{\mathrm{lin}}^{\{n\}(N)}) \big\| \leq \frac{C_{\sup}^{(2)}}{\sqrt{a^{\{n\}}}}.$$

Corollary 2. Under the assumptions of Proposition 1, there exist $C_{\inf} > 0$, $C_{\sup}^{(1)} < +\infty$ and $C_{\sup}^{(2)} < +\infty$ such that, with probability at least $1 - C_{\sup}^{(1)} \exp(-C_{\inf} N)$, we have
$$\big\| \eta(X^{\{n\}}, f) - \eta(X^{\{n\}}, \hat f_{\mathrm{lin}}^{\{n\}(N)}) \big\| \leq \frac{C_{\sup}^{(2)}}{\sqrt{a^{\{n\}}}}.$$
Moreover, if $a^{\{n\}} \Sigma^{\{n\}} \to \Sigma$ as $n \to +\infty$, then, letting $X^* \sim \mathcal{N}(\mu, \Sigma)$, there exists $C_{\sup}^{(3)} < +\infty$ such that, with probability at least $1 - C_{\sup}^{(1)} \exp(-C_{\inf} N)$,
$$\big\| \eta(X^{\{n\}}, \hat f_{\mathrm{lin}}^{\{n\}(N)}) - \eta(X^*, f_1) \big\| \leq C_{\sup}^{(3)} \left( \|\mu^{\{n\}} - \mu\| + \|a^{\{n\}} \Sigma^{\{n\}} - \Sigma\| + \frac{1}{\sqrt{a^{\{n\}}}} \right).$$

In this section, we compute the Shapley effects of the true function $f$ and the ones obtained from the three previous linear approximations, to illustrate the previous theoretical results. Let $p = 4$ and let $f$ be of the form
$$f(x) = \cos(x_{i_1})\, x_{i_2} + \sin(x_{i_3}) + 2 \cos(x_{i_1})\, x_{i_4} - \sin(x_{i_2}),$$
for fixed coordinate indices $i_1, \dots, i_4 \in [1:4]$. This function is Lipschitz continuous and $\mathcal{C}^\infty$ on $\mathbb{R}^4$. We choose $\Sigma^{\{n\}} = \frac{1}{n} \Sigma$ (that is, $a^{\{n\}} = n$), where $\Sigma$ is defined by $\Sigma = A^T A$ for a fixed invertible matrix $A \in \mathcal{M}_4(\mathbb{R})$.
Let $\mu \in \mathbb{R}^4$ be a fixed vector and $\mu^{\{n\}} = \mu + \frac{1}{n} v$ for a fixed vector $v \in \mathbb{R}^4$. On Figure 1, we plot, for different values of $n$, the vector $\eta(X^{\{n\}}, \hat f_{\mathrm{lin}}^{\{n\}(N)})$ (given by the linear regression), the vector $\eta(X^{\{n\}}, f_1^{\{n\}})$ (given by the true Taylor polynomial), the vector $\eta(X^{\{n\}}, \tilde f_{1, h^{\{n\}}})$ (given by the finite difference approximation of the derivatives), and the boxplots of 200 estimates of $\eta(X^{\{n\}}, f)$ computed by the R function "shapleyPermRand" from the R package sensitivity (see [SNS16, IP19]), which is adapted to non-linear functions, with parameters $N_V = 10$, $m = 10$ and $N_I = 3$. To compute the linear regression, we observed a sample of size $N = 40$. To compute the finite difference approximation, we took $h_i^{\{n\}} = \sqrt{\mathrm{Var}(X_i^{\{n\}})}$.

The differences between the Shapley effects given by $f$ and the ones given by the linear approximations of $f$ seem to converge to $0$, as proved by Propositions 1, 3 and 4. Moreover, Figure 1 emphasizes that the Shapley effects obtained from the linear regression approach the true ones more slowly than the ones given by the other linear approximations.

We remark that we have here $\Sigma^{\{n\}} = \frac{1}{a^{\{n\}}} \Sigma$, and thus the assumptions of Proposition 2 hold. Hence, the values of the true Shapley effects $\eta(X^{\{n\}}, f)$ converge, as we can see on Figure 1.

Figure 1: Shapley effects of the linear approximations $\hat f_{\mathrm{lin}}^{\{n\}(N)}$, $f_1^{\{n\}}$, $\tilde f_{1, h^{\{n\}}}$, and boxplots of estimates of the Shapley effects of the function $f$.

The computation time for each estimate of the Shapley effects is around 5 seconds using "shapleyPermRand", and several orders of magnitude smaller using the linear approximations $f_1^{\{n\}}$, $\tilde f_{1, h^{\{n\}}}$ or $\hat f_{\mathrm{lin}}^{\{n\}(N)}$. Remark that this time difference can become even more pronounced if the function $f$ is a costly computer code.

Here, we extend the results of Section 3 to the case where the distribution of the input (which we now write $\hat X^{\{n\}}$) is close to a Gaussian distribution $X^{\{n\}}$. We focus on the setting where the input vector is an empirical mean
$$\hat X^{\{n\}} = \frac{1}{n} \sum_{l=1}^{n} U^{(l)},$$
where $(U^{(l)})_{l \in [1:n]}$ is an i.i.d. sample of a random vector $U$ in $\mathbb{R}^p$ such that $E(\|U\|^2) < +\infty$ and $\mathrm{Var}(U) \neq 0$. Let $\mu := E(U)$ and let $\Sigma$ be the covariance matrix of $U$. Remark that, as in Section 3, the input vector $\hat X^{\{n\}}$ is a random vector converging to its mean, and its covariance matrix $\Sigma^{\{n\}}$ is equal to $\frac{1}{n} \Sigma$.

Contrary to Section 3, $\hat X^{\{n\}}$ is not Gaussian, but, thanks to the central limit theorem, its distribution is close to $\mathcal{N}(\mu, \frac{1}{n}\Sigma)$. Hence, we would like to estimate the Shapley effects $\eta(\hat X^{\{n\}}, f)$ by $\eta(X^*, Df(\mu))$, where $X^* \sim \mathcal{N}(0, \Sigma)$, since $\eta(X^*, Df(\mu))$ can be computed using the explicit expression (3) of the Gaussian linear case, and for instance the function "ShapleyLinearGaussian" of the package sensitivity.

Proposition 5.
Assume that $f$ is $\mathcal{C}^1$ on a neighbourhood of $\mu$ with $Df(\mu) \neq 0$, and that $f$ is subpolynomial, that is, there exist $k \in \mathbb{N}^*$ and $C > 0$ such that for all $x \in \mathbb{R}^p$, we have $|f(x)| \leq C(1 + \|x\|^k)$. If $E(\|U\|^{2k}) < +\infty$ and if $U$ has a bounded probability density function, then
$$\eta(\hat X^{\{n\}}, f) \underset{n \to +\infty}{\longrightarrow} \eta(X^*, Df(\mu)).$$

Proposition 5 justifies that $\eta(X^*, Df(\mu))$ is a good approximation of $\eta(\hat X^{\{n\}}, f)$. Furthermore, if $\mu$, $\Sigma$ and $Df(\mu)$ are unknown, the following corollary shows that they can be replaced by approximations. Let $(U^{(l)\prime})_{l \in [1:n']}$ and $(U^{(l)\prime\prime})_{l \in [1:n'']}$ be independent of $(U^{(l)})_{l \in [1:n]}$, composed of i.i.d. copies of $U$, with $n' = n'(n)$ and $n'' = n''(n)$ such that $n', n'' \to \infty$ when $n \to \infty$. We can estimate $\mu$ (resp. $\Sigma$) by the empirical mean $\hat X^{\{n'\}\prime}$ of $(U^{(l)\prime})_{l \in [1:n']}$ (resp. the empirical covariance matrix $\hat\Sigma^{\{n''\}\prime\prime}$ of $(U^{(l)\prime\prime})_{l \in [1:n'']}$), and we can estimate $Df$ by a finite difference approximation. The next corollary guarantees that the error stemming from these additional estimations goes to $0$ as $n \to \infty$.

Corollary 3.
Assume that the assumptions of Proposition 5 hold and that $(h^{\{n\}})_{n \in \mathbb{N}}$ is a sequence of $(\mathbb{R}_+^*)^p$ converging to $0$. Let $X_n^*$ be a random vector with distribution $\mathcal{N}(\mu, \hat\Sigma^{\{n''\}\prime\prime})$ conditionally to $\hat\Sigma^{\{n''\}\prime\prime}$. Then
$$\Big\| \eta(\hat X^{\{n\}}, f) - \eta\big(X_n^*, \tilde f_{1, h^{\{n\}}, \hat X^{\{n'\}\prime}}\big) \Big\| \overset{a.s.}{\underset{n \to +\infty}{\longrightarrow}} 0,$$
where $\tilde f_{1, h^{\{n\}}, \hat X^{\{n'\}\prime}}$ is the linear approximation of $f$ at $\hat X^{\{n'\}\prime}$ obtained from Equation (4) by replacing $\mu^{\{n\}}$ by $\hat X^{\{n'\}\prime}$.

Remark 2. If $\mu$, $\Sigma$ or $Df$ is known, the previous corollary holds replacing $\hat X^{\{n'\}\prime}$, $\hat\Sigma^{\{n''\}\prime\prime}$ or $\tilde f_{1, h^{\{n\}}, \hat X^{\{n'\}\prime}}$ by $\mu$, $\Sigma$ or $Df(\hat X^{\{n'\}\prime})$ respectively.

Remark 3. The notation $\eta(X_n^*, \tilde f_{1, h^{\{n\}}, \hat X^{\{n'\}\prime}})$ is to be understood conditionally to $\hat\Sigma^{\{n''\}\prime\prime}, \hat X^{\{n'\}\prime}$. That is, conditionally to $\hat\Sigma^{\{n''\}\prime\prime}, \hat X^{\{n'\}\prime}$, the Shapley effects $\eta(X_n^*, \tilde f_{1, h^{\{n\}}, \hat X^{\{n'\}\prime}})$ are defined with the fixed linear function $\tilde f_{1, h^{\{n\}}, \hat X^{\{n'\}\prime}}$ and the Gaussian distribution for $X_n^*$.

Let us show an example of application of the results of Section 4.1. Let $U$ be a continuous random vector of $\mathbb{R}^p$, with a bounded density and with an unknown mean $\mu$. Assume that we observe an i.i.d. sample $(U^{(l)})_{l \in [1:n]}$ of $U$, and that we focus on the estimation of a parameter $\theta = f(\mu)$, where $f$ is $\mathcal{C}^1$. This parameter is estimated by $f(\hat X^{\{n\}})$ (which is asymptotically efficient by the delta-method), where $\hat X^{\{n\}}$ is the empirical mean of $(U^{(l)})_{l \in [1:n]}$. The estimation error of each variable $\hat X_i^{\{n\}}$ (for $i = 1, \dots, p$) propagates through $f$. To quantify the part of the estimation error of $Y = f(\hat X^{\{n\}})$ caused by the individual estimation errors of each $\hat X_i^{\{n\}}$ (for $i = 1, \dots, p$), one can estimate the Shapley effects $\eta(\hat X^{\{n\}}, f) = \eta(\hat X^{\{n\}} - \mu, f(\cdot + \mu) - f(\mu))$, which assess the impact of the individual errors on the global error. To that end, Proposition 5 and Corollary 3 state that the Shapley effects can be estimated using a Gaussian linear approximation, with an error that vanishes as $n$ increases.

For example, let $f = \|\cdot\|$ and $p = 5$. In this case, the derivative $Df$ is known and no finite difference approximation is required. To generate $U$ with a bounded density and with dependencies, we define independent variables $A_1, \dots, A_5$, where $A_1$ has a uniform distribution, $A_2$ a Gaussian distribution, $A_3$ a symmetric triangular distribution, $A_4$ a Beta distribution, and $A_5$ an exponential distribution. Then, each component $U_i$ is defined as a fixed linear combination of three of the variables $A_1, \dots, A_5$, which creates dependence between the components of $U$. Since the mean $\mu$ and the covariance matrix $\Sigma$ are unknown, we need to estimate them (as in Corollary 3). Using the notation of Section 4.1, we choose $n = n' = n''$ and $(U^{(l)\prime})_{l \in [1:n']} = (U^{(l)\prime\prime})_{l \in [1:n']}$ (that is, we estimate the empirical mean and the empirical covariance matrix with the same sample).
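The following minimal sketch illustrates this procedure (our own illustration: the distribution parameters and the mixing matrix are placeholders, since the exact values are immaterial here, and `shapley_linear_gaussian` is the helper sketched in Section 2):

```python
import numpy as np

def sample_U(n, rng):
    """Dependent inputs: independent A_j's mixed linearly (placeholder values)."""
    A = np.column_stack([
        rng.uniform(5.0, 6.0, n),               # A_1: uniform
        rng.normal(0.0, 1.0, n),                # A_2: Gaussian
        rng.triangular(-1.0, 0.0, 1.0, n),      # A_3: symmetric triangular
        rng.beta(1.0, 2.0, n),                  # A_4: Beta
        rng.exponential(1.0, n),                # A_5: exponential
    ])
    M = np.array([[1, 2, -0.5, 0, 0],           # placeholder mixing matrix:
                  [0, 1, 2, -0.5, 0],           # each U_i combines three A_j's
                  [0, 0, 1, 2, -0.5],
                  [-0.5, 0, 0, 1, 2],
                  [2, -0.5, 0, 0, 1]])
    return A @ M.T

rng = np.random.default_rng(0)
n = 1000
U = sample_U(n, rng)
mu_hat = U.mean(axis=0)                         # empirical mean
Sigma_hat = np.cov(U, rowvar=False)             # empirical covariance
grad = mu_hat / np.linalg.norm(mu_hat)          # known gradient of f = ||.||
eta = shapley_linear_gaussian(grad, Sigma_hat)  # Gaussian linear Shapley estimates
```

Note that the $1/n$ factor in the covariance of $\hat X^{\{n\}}$ cancels in the Shapley effects of a linear model, which is why $\hat\Sigma$ can be passed directly.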
We estimate the Shapley effects $\eta(\hat X^{\{n\}}, f)$ by $\eta(X_n^*, Df(\hat X^{\{n\}\prime}))$, where $X_n^*$ is a random vector with distribution $\mathcal{N}(\mu, \hat\Sigma^{\{n\}\prime\prime})$ conditionally to $\hat\Sigma^{\{n\}\prime\prime}$. By Corollary 3 and Remark 2, the difference between $\eta(\hat X^{\{n\}}, f)$ and $\eta(X_n^*, Df(\hat X^{\{n\}\prime}))$ converges to 0 almost surely when $n$ goes to $+\infty$.

Here, we compute 1000 estimates of $\mu$ and $\Sigma$, and we compute the 1000 corresponding Shapley effects of the Gaussian linear approximation $\eta(X_n^*, Df(\hat X^{\{n\}\prime}))$. To compare with these estimates, we also compute 1000 estimates given by the function "shapleySubsetMC" suggested in [BBD20], with parameters $N_{tot} = 1000$, $N_i = 3$, and with an i.i.d. sample of $\hat X^{\{n\}}$ of size 1000. We plot the results on Figure 2.

Figure 2: Boxplots of the estimates of the Shapley effects given by the general estimation function "shapleySubsetMC" (in red) and by the Gaussian linear approximation (in black), for $n = 100$, $200$, $500$ and $1000$.

We observe that the estimates of the Shapley effects given by "shapleySubsetMC" and by the Gaussian linear approximation are rather similar, even for $n = 100$. However, the variance of the estimates given by the Gaussian linear approximation is smaller than that of the general estimates given by "shapleySubsetMC". Moreover, each Gaussian linear estimation requires only a sample of $(U^{(l)\prime})_{l \in [1:n]}$ (to compute $\hat X^{\{n\}\prime}$ and $\hat\Sigma^{\{n\}\prime\prime}$) and takes around 0.007 second on a personal computer, whereas each general estimation with "shapleySubsetMC" requires here 1000 samples of $(U^{(l)\prime})_{l \in [1:n]}$ and takes around 11 seconds. Remark that this time difference can become even more pronounced if the function $f$ is a costly computer code. Finally, the estimator of the Shapley effects given by the linear approximation converges almost surely when $n$ goes to $+\infty$, whereas the estimator of the Shapley effects given by "shapleySubsetMC" is only shown to converge in probability when the sample size and $N_{tot}$ go to $+\infty$ (see [BBD20]).

To conclude, we have provided a framework where the theoretical results of Section 4.1 can be applied. We have illustrated this framework with numerical experiments on generated data. We have shown that, in this framework, the Gaussian linear approximation provides an estimator of the Shapley effects that is much faster and much more accurate than the general estimator given by "shapleySubsetMC".

In this paper, we worked on the Gaussian linear framework approximation to estimate the Shapley effects, in order to take advantage of the simplicity brought by this framework. First, we focused on the case where the inputs are Gaussian variables converging to their means. This setting is motivated, in particular, by the case of uncertainties on physical quantities that are reduced by taking more and more measurements. We showed that, to estimate the Shapley effects, one can replace the true model $f$ by three possible linear approximations: the exact Taylor polynomial approximation, a finite difference approximation, and a linear regression. We gave the rate of convergence of the difference between the Shapley effects of the linear approximations and the Shapley effects of the true model. These results are illustrated by a simulated application that highlights the accuracy of the approximations.
Then, we focused on the case where the inputs are given by an empirical mean. In this case, we proved that the intuitive idea of replacing the empirical mean by a Gaussian vector and the true model by a linear approximation around the mean indeed gives good approximations of the Shapley effects. We highlighted the benefits of these estimators in numerical experiments.

Several questions remain open for future work. In particular, it would be valuable to obtain more insight on the choice between the general estimator of the Shapley effects for non-linear models and the estimators based on Gaussian linear approximations. Quantitative criteria for this choice, based for instance on the magnitude of the input uncertainties or on the number of input samples that are available, would be beneficial. Regarding the results on the impact of individual estimation errors in Section 4.2, it would be interesting to obtain extensions to estimators of quantities of interest that are not only empirical means, for instance general M-estimators.

Acknowledgements

We acknowledge the financial support of the Cross-Disciplinary Program on Numerical Simulation of CEA, the French Alternative Energies and Atomic Energy Commission. We would like to thank BPI France for co-financing this work, as part of the PIA (Programme d'Investissements d'Avenir) - Grand Défi du Numérique 2, supporting the PROBANT project. We acknowledge the Institut de Mathématiques de Toulouse.
References

[BBCM20] Baptiste Broto, François Bachoc, Laura Clouvel, and Jean-Marc Martinez. Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis. https://hal.archives-ouvertes.fr/hal-02196583v2, February 2020.

[BBD20] Baptiste Broto, François Bachoc, and Marine Depecker. Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2):693-716, 2020.

[BBDM19] Baptiste Broto, François Bachoc, Marine Depecker, and Jean-Marc Martinez. Sensitivity indices for independent groups of variables. Mathematics and Computers in Simulation, 163:19-31, September 2019.

[BR86] Rabi N. Bhattacharya and R. Ranga Rao. Normal Approximation and Asymptotic Expansions, volume 64. SIAM, 1986.

[Cac03] Dan G. Cacuci. Sensitivity and Uncertainty Analysis, Volume 1: Theory, 2003.

[CGP12] Gaëlle Chastaing, Fabrice Gamboa, and Clémentine Prieur. Generalized Hoeffding-Sobol decomposition for dependent variables - application to sensitivity analysis. Electronic Journal of Statistics, 6:2420-2448, 2012.

[Cha13] Gaëlle Chastaing. Indices de Sobol généralisés pour variables dépendantes. PhD thesis, Université de Grenoble, September 2013.

[Clo19] Laura Clouvel. Quantification de l'incertitude du flux neutronique rapide reçu par la cuve d'un réacteur à eau pressurisée. PhD thesis, Université Paris-Saclay, November 2019.

[GJK+16] Fabrice Gamboa, Alexandre Janon, Thierry Klein, A. Lagnoux, and Clémentine Prieur. Statistical inference for Sobol pick-freeze Monte Carlo method. Statistics, 50(4):881-902, 2016.

[HP04] Laurent Hascoët and Valérie Pascual. Tapenade 2.1 user's guide. 2004.

[HT11] Hugo Hammer and Håkon Tjelmeland. Approximate forward-backward algorithm for a switching linear Gaussian model. Computational Statistics & Data Analysis, 55(1):154-167, January 2011.

[IAP20] Bertrand Iooss, Alexandre Janon, and Gilles Pujol. sensitivity: Global Sensitivity Analysis of Model Outputs, February 2020.

[IP19] Bertrand Iooss and Clémentine Prieur. Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol' indices, numerical estimation and applications. International Journal for Uncertainty Quantification, 9(5):493-514, 2019.

[JEF13] JEFF-3.1. Validation of the JEFF-3.1 nuclear data library: JEFF report 23, 2013.

[JEN11] JENDL-4.0. JENDL-4.0: A new library for nuclear science and engineering. Journal of Nuclear Science and Technology, 48(1):1-30, 2011.

[KHF+06] T. Kawano, K. M. Hanson, S. Frankle, P. Talou, M. B. Chadwick, and R. C. Little. Evaluation and propagation of the $^{239}$Pu fission cross-section uncertainties using a Monte Carlo technique. Nuclear Science and Engineering, 153(1):1-7, May 2006.

[McL05] V. McLane. ENDF-6 data formats and procedures for the evaluated nuclear data file ENDF-VII, 2005.

[MT12] Thierry A. Mara and Stefano Tarantola. Variance-based sensitivity indices for models with dependent inputs. Reliability Engineering & System Safety, 107:115-121, November 2012.

[MTA15] Thierry A. Mara, Stefano Tarantola, and Paola Annoni. Non-parametric methods for global sensitivity analysis of model output with dependent inputs. Environmental Modelling and Software, 72:173-183, July 2015.

[OP17] Art B. Owen and Clémentine Prieur. On Shapley value for measuring importance of dependent inputs. SIAM/ASA Journal on Uncertainty Quantification, 5(1):986-1002, 2017.

[Owe14] Art B. Owen. Sobol' indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification, 2(1):245-251, January 2014.

[Ros70] Haskell P. Rosenthal. On the subspaces of $l^p$ ($p > 2$) spanned by sequences of independent random variables. Israel Journal of Mathematics, 8(3):273-303, 1970.

[Ros04] Antti-Veikko Ilmari Rosti. Linear Gaussian Models for Speech Recognition. PhD thesis, University of Cambridge, 2004.

[She71] T. L. Shervashidze. On a uniform estimate of the rate of convergence in the multidimensional local limit theorem for densities. Theory of Probability & Its Applications, 16(4):741-743, 1971.

[SNS16] Eunhye Song, Barry L. Nelson, and Jeremy Staum. Shapley effects for global sensitivity analysis: Theory and computation. SIAM/ASA Journal on Uncertainty Quantification, 4(1):1060-1083, January 2016.

[Sob93] Ilya M. Sobol. Sensitivity estimates for nonlinear mathematical models. Mathematical Modelling and Computational Experiments, 1(4):407-414, 1993.

[SWNW03] Thomas J. Santner, Brian J. Williams, and William Notz. The Design and Analysis of Computer Experiments, volume 1. Springer, 2003.

Appendices
We will write $C_{\sup}$ for a generic non-negative finite constant. The actual value of $C_{\sup}$ is of no interest and can change within the same sequence of equations. Similarly, we will write $C_{\inf}$ for a generic strictly positive constant. Moreover, for all $u \subset [1:p]$, if $Z$ is a random vector in $\mathbb{R}^p$ and $g$ is a function from $\mathbb{R}^p$ to $\mathbb{R}$ such that $E(g(Z)^2) < +\infty$ and $\mathrm{Var}(g(Z)) > 0$, let $S_u^{\mathrm{cl}}(Z, g)$ be the closed Sobol index (see
[GJK+16] for example) for the input vector $Z$ and the model $g$, defined by
$$S_u^{\mathrm{cl}}(Z, g) = \frac{\mathrm{Var}\big(E(g(Z) \mid Z_u)\big)}{\mathrm{Var}(g(Z))}.$$

Proof of Proposition 1
We divide the proof into several lemmas. We assume that the assumptions of Proposition 1 hold throughout this proof. Let $\varepsilon \in \,]0, 1[$ be such that $f$ is $\mathcal{C}^3$ on $B(\mu, \varepsilon)$ and such that, for all $x \in B(\mu, \varepsilon)$, we have $Df(x) \neq 0$. Since $\mu^{\{n\}}$ converges to $\mu$, there exists $n_0 \in \mathbb{N}$ such that, for all $n \geq n_0$, $\mu^{\{n\}} \in B(\mu, \varepsilon/2)$. In the following, we assume that $n$ is larger than $n_0$.

Lemma 1.
For all $x \in B(\mu^{\{n\}}, \varepsilon/2)$, we have
$$|R_1^{\{n\}}(x)| \leq C_1 \|x - \mu^{\{n\}}\|^2, \qquad |R_2^{\{n\}}(x)| \leq C_1' \|x - \mu^{\{n\}}\|^3,$$
and for all $x \notin B(\mu^{\{n\}}, \varepsilon/2)$,
$$|R_1^{\{n\}}(x)| \leq C_2 \|x - \mu^{\{n\}}\|^k, \qquad |R_2^{\{n\}}(x)| \leq C_2' \|x - \mu^{\{n\}}\|^k,$$
where $C_1$, $C_1'$, $C_2$ and $C_2'$ are positive constants that do not depend on $n$.

Proof. Using Taylor's theorem, for all $x \in B(\mu^{\{n\}}, \varepsilon/2)$, there exist $\theta_1(n, x), \theta_2(n, x) \in \,]0, 1[$ such that
$$f(x) = f_0^{\{n\}} + f_1^{\{n\}}(x) + \frac{1}{2} D^2 f\big(\mu^{\{n\}} + \theta_1(n, x)(x - \mu^{\{n\}})\big)(x - \mu^{\{n\}})$$
$$= f_0^{\{n\}} + f_1^{\{n\}}(x) + f_2^{\{n\}}(x) + \frac{1}{6} D^3 f\big(\mu^{\{n\}} + \theta_2(n, x)(x - \mu^{\{n\}})\big)(x - \mu^{\{n\}}).$$
Let $C_1 = \max_{x \in B(\mu, \varepsilon)} \|D^2 f(x)\|$ and $C_1' = \max_{x \in B(\mu, \varepsilon)} \|D^3 f(x)\|$, where $\|\cdot\|$ also denotes the operator norm of a multilinear form. Thus, for all $x \in B(\mu^{\{n\}}, \varepsilon/2)$,
$$|R_1^{\{n\}}(x)| \leq C_1 \|x - \mu^{\{n\}}\|^2, \qquad |R_2^{\{n\}}(x)| \leq C_1' \|x - \mu^{\{n\}}\|^3.$$
Since $f$ is subpolynomial, there exist $k \geq 2$ and $C < +\infty$ such that, for all $x \in \mathbb{R}^p$, $|f(x)| \leq C(1 + \|x\|^k)$. Hence, taking $C' = C(2\|\mu\| + 2)^k$, we have
$$|f(x)| \leq C\big(1 + 2^k \|x - \mu^{\{n\}}\|^k + 2^k \|\mu^{\{n\}}\|^k\big) \leq C'\big(1 + \|x - \mu^{\{n\}}\|^k\big).$$
Hence, taking $C'' := C' + \max_{y \in B(\mu, \varepsilon)} \|Df(y)\|$, we have
$$|R_1^{\{n\}}(x)| \leq |f(x)| + \max_{y \in B(\mu, \varepsilon)} \|Df(y)\| \, \|x - \mu^{\{n\}}\| \leq C''\big(1 + \|x - \mu^{\{n\}}\|^k\big).$$
Now, taking $C_2 := C''\big(1 + (2/\varepsilon)^k\big)$, we have, for all $x \notin B(\mu^{\{n\}}, \varepsilon/2)$,
$$|R_1^{\{n\}}(x)| \leq C'' + C'' \|x - \mu^{\{n\}}\|^k \leq C_2 \|x - \mu^{\{n\}}\|^k.$$
Similarly, there exists $C_2' < +\infty$ such that $|R_2^{\{n\}}(x)| \leq C_2' \|x - \mu^{\{n\}}\|^k$.

Lemma 2.
We have
$$\mathrm{cov}\Big( E\big(f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big), \, E\big(f_2^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big) \Big) = 0.$$

Proof.
Let $n \in \mathbb{N}$. To simplify notation, let $A = X^{\{n\}} - \mu^{\{n\}}$, let $\beta \in \mathbb{R}^p$ be the vector of the linear map $Df(\mu^{\{n\}})$ and let $\Gamma \in \mathcal{M}_p(\mathbb{R})$ be the symmetric matrix of the quadratic form $D^2 f(\mu^{\{n\}})$. Then,
$$\mathrm{cov}\big( E(f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}), E(f_2^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big) = \mathrm{cov}\big( E(\beta^T A \mid A_u), E(A^T \Gamma A \mid A_u) \big)$$
$$= E\Big( \big[\beta_u^T A_u + \beta_{-u}^T E(A_{-u} \mid A_u)\big] \big[ A_u^T \Gamma_{u,u} A_u + 2 A_u^T \Gamma_{u,-u} E(A_{-u} \mid A_u) + E(A_{-u}^T \Gamma_{-u,-u} A_{-u} \mid A_u) \big] \Big)$$
$$= E\Big( \big[\beta_u^T A_u + \beta_{-u}^T E(A_{-u} \mid A_u)\big] \, E\big(A_{-u}^T \Gamma_{-u,-u} A_{-u} \mid A_u\big) \Big),$$
since all the other terms are linear combinations of expectations of products of three zero-mean Gaussian variables. Indeed, the coefficients of $E(A_{-u} \mid A_u)$ are linear combinations of the coefficients of $A_u$. Now,
$$E\big( \beta_u^T A_u \times E(A_{-u}^T \Gamma_{-u,-u} A_{-u} \mid A_u) \big) = E\big( E(\beta_u^T A_u \times A_{-u}^T \Gamma_{-u,-u} A_{-u} \mid A_u) \big) = E\big( \beta_u^T A_u \times A_{-u}^T \Gamma_{-u,-u} A_{-u} \big) = 0.$$
Similarly, the term $E\big( \beta_{-u}^T E(A_{-u} \mid A_u) \, E(A_{-u}^T \Gamma_{-u,-u} A_{-u} \mid A_u) \big)$ is equal to 0.

Lemma 3. There exists $C_{\sup} < +\infty$ such that, for all $u \subset [1:p]$,
$$\mathrm{Var}\Big( E\big( \sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}} \big) \Big) \leq \frac{C_{\sup}}{a^{\{n\}}},$$
and
$$\Big| \mathrm{cov}\Big( E\big( \sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}} \big), \, E\big( \sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}} \big) \Big) \Big| \leq \frac{C_{\sup}}{a^{\{n\}}}.$$

Proof.
Using Lemma 1, we have
$$E\big( |\sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}})|^2 \big) = E\big( |\sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}})|^2 \mathbb{1}_{\|X^{\{n\}} - \mu^{\{n\}}\| < \varepsilon/2} \big) + E\big( |\sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}})|^2 \mathbb{1}_{\|X^{\{n\}} - \mu^{\{n\}}\| \geq \varepsilon/2} \big)$$
$$\leq \frac{C_1^2}{a^{\{n\}}} E\Big( \big\|\sqrt{a^{\{n\}}}(X^{\{n\}} - \mu^{\{n\}})\big\|^4 \Big) + \frac{C_2^2}{a^{\{n\}(k-1)}} E\Big( \big\|\sqrt{a^{\{n\}}}(X^{\{n\}} - \mu^{\{n\}})\big\|^{2k} \Big) \leq \frac{C_{\sup}}{a^{\{n\}}},$$
since $a^{\{n\}} \Sigma^{\{n\}}$ is bounded. Hence,
$$\mathrm{Var}\big( \sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \big) \leq \frac{C_{\sup}}{a^{\{n\}}}.$$
Moreover, for all $u \subset [1:p]$,
$$0 \leq \mathrm{Var}\Big( E\big( \sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}} \big) \Big) \leq \mathrm{Var}\big( \sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \big) \leq \frac{C_{\sup}}{a^{\{n\}}}.$$
For all $u \subset [1:p]$,
$$\mathrm{cov}\Big( E\big(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big), E\big(\sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big) \Big)$$
$$= \mathrm{cov}\Big( E\big(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big), E\big(\sqrt{a^{\{n\}}} f_2^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big) \Big) + \mathrm{cov}\Big( E\big(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big), E\big(\sqrt{a^{\{n\}}} R_2^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big) \Big)$$
$$= \mathrm{cov}\Big( E\big(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big), E\big(\sqrt{a^{\{n\}}} R_2^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big) \Big),$$
using Lemma 2. Now, by the Cauchy-Schwarz inequality,
$$\Big| \mathrm{cov}\Big( E\big(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big), E\big(\sqrt{a^{\{n\}}} R_2^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big) \Big) \Big| \leq \sqrt{\mathrm{Var}\big( \sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \big)} \sqrt{\mathrm{Var}\big( \sqrt{a^{\{n\}}} R_2^{\{n\}}(X^{\{n\}}) \big)}.$$
As above,
$$E\big( |\sqrt{a^{\{n\}}} R_2^{\{n\}}(X^{\{n\}})|^2 \big) = E\big( |\sqrt{a^{\{n\}}} R_2^{\{n\}}(X^{\{n\}})|^2 \mathbb{1}_{\|X^{\{n\}} - \mu^{\{n\}}\| \leq \varepsilon/2} \big) + E\big( |\sqrt{a^{\{n\}}} R_2^{\{n\}}(X^{\{n\}})|^2 \mathbb{1}_{\|X^{\{n\}} - \mu^{\{n\}}\| \geq \varepsilon/2} \big)$$
$$\leq \frac{C_1'^2}{a^{\{n\}2}} E\Big( \big\|\sqrt{a^{\{n\}}}(X^{\{n\}} - \mu^{\{n\}})\big\|^6 \Big) + \frac{C_2'^2}{a^{\{n\}(k-1)}} E\Big( \big\|\sqrt{a^{\{n\}}}(X^{\{n\}} - \mu^{\{n\}})\big\|^{2k} \, \mathbb{1}_{\|\sqrt{a^{\{n\}}}(X^{\{n\}} - \mu^{\{n\}})\| \geq \sqrt{a^{\{n\}}}\varepsilon/2} \Big) \leq \frac{C_{\sup}}{a^{\{n\}2}}.$$
Furthermore,
$$\mathrm{Var}\big( \sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \big) \leq \max_{x \in B(\mu^{\{n\}}, \varepsilon/2)} \|Df(x)\|^2 \, E\Big( \big\|\sqrt{a^{\{n\}}}(X^{\{n\}} - \mu^{\{n\}})\big\|^2 \Big) \leq C_{\sup}.$$
Finally,
$$\Big| \mathrm{cov}\Big( E\big(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big), E\big(\sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big) \Big) \Big| \leq \frac{C_{\sup}}{a^{\{n\}}},$$
which concludes the proof of Lemma 3.

Lemma 4.
For all $u \subset [1:p]$,
$$S_u^{\mathrm{cl}}(X^{\{n\}}, f) = S_u^{\mathrm{cl}}(X^{\{n\}}, f_1^{\{n\}}) + O\!\left( \frac{1}{a^{\{n\}}} \right).$$

Proof.
We have $f(X^{\{n\}}) = f(\mu^{\{n\}}) + f_1^{\{n\}}(X^{\{n\}}) + R_1^{\{n\}}(X^{\{n\}})$. For all $u \subset [1:p]$, we have
$$E\big(f(X^{\{n\}}) \mid X_u^{\{n\}}\big) = f(\mu^{\{n\}}) + E\big(f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big) + E\big(R_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}\big),$$
so
$$a^{\{n\}} \mathrm{Var}\big( E(f(X^{\{n\}}) \mid X_u^{\{n\}}) \big) = \mathrm{Var}\big( E(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big) + \mathrm{Var}\big( E(\sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big)$$
$$+ 2\,\mathrm{cov}\big( E(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}), E(\sqrt{a^{\{n\}}} R_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big) = \mathrm{Var}\big( E(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big) + O\!\left( \frac{1}{a^{\{n\}}} \right),$$
by Lemma 3. Hence, for $u = [1:p]$, we have
$$a^{\{n\}} \mathrm{Var}\big( f(X^{\{n\}}) \big) = \mathrm{Var}\big( \sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \big) + O\!\left( \frac{1}{a^{\{n\}}} \right).$$
Thus, for all $u \subset [1:p]$,
$$S_u^{\mathrm{cl}}(X^{\{n\}}, f) = \frac{\mathrm{Var}\big(E(f(X^{\{n\}}) \mid X_u^{\{n\}})\big)}{\mathrm{Var}\big(f(X^{\{n\}})\big)} = \frac{a^{\{n\}} \mathrm{Var}\big(E(f(X^{\{n\}}) \mid X_u^{\{n\}})\big)}{a^{\{n\}} \mathrm{Var}\big(f(X^{\{n\}})\big)} = \frac{\mathrm{Var}\big(E(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}})\big) + O\big(\frac{1}{a^{\{n\}}}\big)}{\mathrm{Var}\big(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}})\big) + O\big(\frac{1}{a^{\{n\}}}\big)}$$
$$= \frac{\mathrm{Var}\big(E(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}})\big)}{\mathrm{Var}\big(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}})\big)} + O\!\left( \frac{1}{a^{\{n\}}} \right) = S_u^{\mathrm{cl}}(X^{\{n\}}, f_1^{\{n\}}) + O\!\left( \frac{1}{a^{\{n\}}} \right),$$
where we used that
$$\mathrm{Var}\big( \sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \big) = Df(\mu^{\{n\}}) \big( a^{\{n\}} \Sigma^{\{n\}} \big) Df(\mu^{\{n\}})^T \geq \lambda_{\min}\big(a^{\{n\}} \Sigma^{\{n\}}\big) \inf_{x \in B(\mu, \varepsilon/2)} \|Df(x)\|^2 \geq C_{\inf}.$$

Now that we have proved the convergence of the closed Sobol indices, we can prove Proposition 1 easily.
Proof.
By Lemma 4 and applying the linearity of the Shapley effects with respect to the Sobol indices, we have
$$\eta(X^{\{n\}}, f) = \eta(X^{\{n\}}, f_1^{\{n\}}) + O\!\left( \frac{1}{a^{\{n\}}} \right).$$

Proof of Remark 1
Proof.
Let $X^{\{n\}} = (X_1^{\{n\}}, X_2^{\{n\}}) \sim \mathcal{N}(0, \frac{1}{a^{\{n\}}} I_2)$ and $Y^{\{n\}} = f(X^{\{n\}}) = X_1^{\{n\}} + (X_2^{\{n\}})^2$. We have $f_1^{\{n\}}(X^{\{n\}}) = X_1^{\{n\}}$ and $R_1^{\{n\}}(X^{\{n\}}) = (X_2^{\{n\}})^2$. Thus, $\eta_1(X^{\{n\}}, f_1^{\{n\}}) = 1$ and $\eta_2(X^{\{n\}}, f_1^{\{n\}}) = 0$. Now, let us compute the Shapley effects $\eta(X^{\{n\}}, f)$. We have
$$\mathrm{Var}\big(f(X^{\{n\}})\big) = \mathrm{Var}(X_1^{\{n\}}) + \mathrm{Var}\big((X_2^{\{n\}})^2\big) = \mathrm{Var}(X_1^{\{n\}}) + E\big((X_2^{\{n\}})^4\big) - E\big((X_2^{\{n\}})^2\big)^2 = \frac{1}{a^{\{n\}}} + \frac{3}{a^{\{n\}2}} - \frac{1}{a^{\{n\}2}} = \frac{a^{\{n\}} + 2}{a^{\{n\}2}}.$$
Moreover,
$$\mathrm{Var}\Big( E\big(f(X^{\{n\}}) \mid X_1^{\{n\}}\big) \Big) = \mathrm{Var}\Big( X_1^{\{n\}} + \frac{1}{a^{\{n\}}} \Big) = \mathrm{Var}(X_1^{\{n\}}) = \frac{1}{a^{\{n\}}}$$
and
$$\mathrm{Var}\Big( E\big(f(X^{\{n\}}) \mid X_2^{\{n\}}\big) \Big) = \mathrm{Var}\big( (X_2^{\{n\}})^2 \big) = E\big((X_2^{\{n\}})^4\big) - E\big((X_2^{\{n\}})^2\big)^2 = \frac{3 - 1}{a^{\{n\}2}} = \frac{2}{a^{\{n\}2}}.$$
Hence,
$$\eta_1(X^{\{n\}}, f) = \frac{a^{\{n\}2}}{(a^{\{n\}} + 2)\,2} \left( \frac{1}{a^{\{n\}}} + \frac{a^{\{n\}} + 2}{a^{\{n\}2}} - \frac{2}{a^{\{n\}2}} \right) = \frac{a^{\{n\}}}{a^{\{n\}} + 2},$$
and
$$\eta_2(X^{\{n\}}, f) = \frac{2}{a^{\{n\}} + 2}.$$
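As a quick numerical sanity check of these closed-form values (our own addition, not part of the original proof), a short Monte Carlo computation in Python:

```python
import numpy as np

# Monte Carlo check of Remark 1: f(x) = x_1 + x_2^2 with X ~ N(0, (1/a) I_2).
rng = np.random.default_rng(0)
a, m = 10.0, 10**6
X = rng.normal(0.0, 1.0 / np.sqrt(a), size=(m, 2))
Y = X[:, 0] + X[:, 1] ** 2

var_y = Y.var()
v1 = X[:, 0].var()                          # Var(E(Y | X_1)) = Var(X_1)
v2 = (X[:, 1] ** 2).var()                   # Var(E(Y | X_2)) = Var(X_2^2)
eta1 = 0.5 * (v1 + (var_y - v2)) / var_y    # Shapley formula for p = 2
print(eta1, a / (a + 2))                    # both close to 0.833...
```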
Proof of Proposition 2

As in the proof of Proposition 1, we first prove the convergence for the closed Sobol indices. To simplify notation, let $\Gamma^{\{n\}} := a^{\{n\}} \Sigma^{\{n\}}$.

Lemma 5.
Under the assumptions of Proposition 2, for all $u \subset [1:p]$, we have
$$S_u^{\mathrm{cl}}(X^{\{n\}}, f_1^{\{n\}}) = S_u^{\mathrm{cl}}(X^*, f_1) + O(\|\mu^{\{n\}} - \mu\|) + O(\|\Gamma^{\{n\}} - \Sigma\|).$$

Proof.
We have
$$\mathrm{Var}\big( \sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \big) - \mathrm{Var}\big( f_1(X^*) \big) = Df(\mu^{\{n\}}) \Gamma^{\{n\}} Df(\mu^{\{n\}})^T - Df(\mu) \Sigma Df(\mu)^T$$
$$= Df(\mu^{\{n\}}) \Gamma^{\{n\}} \big[ Df(\mu^{\{n\}})^T - Df(\mu)^T \big] + Df(\mu^{\{n\}}) \big[ \Gamma^{\{n\}} - \Sigma \big] Df(\mu)^T + \big[ Df(\mu^{\{n\}}) - Df(\mu) \big] \Sigma Df(\mu)^T$$
$$= O(\|Df(\mu^{\{n\}}) - Df(\mu)\|) + O(\|\Gamma^{\{n\}} - \Sigma\|) = O(\|\mu^{\{n\}} - \mu\|) + O(\|\Gamma^{\{n\}} - \Sigma\|),$$
using that $Df$ is Lipschitz continuous on a neighbourhood of $\mu$ (thanks to the continuity of $D^2 f$).

Moreover, for all $\emptyset \subsetneq u \subsetneq [1:p]$, we have
$$\mathrm{Var}\big( E(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big) - \mathrm{Var}\big( E(f_1(X^*) \mid X_u^*) \big)$$
$$= \mathrm{Var}\big( \sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \big) - E\big( \mathrm{Var}(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big) - \mathrm{Var}\big( f_1(X^*) \big) + E\big( \mathrm{Var}(f_1(X^*) \mid X_u^*) \big)$$
$$= Df(\mu^{\{n\}}) \Gamma^{\{n\}} Df(\mu^{\{n\}})^T - Df(\mu^{\{n\}})_{-u} \big( \Gamma^{\{n\}}_{-u,-u} - \Gamma^{\{n\}}_{-u,u} \Gamma^{\{n\}-1}_{u,u} \Gamma^{\{n\}}_{u,-u} \big) Df(\mu^{\{n\}})_{-u}^T$$
$$- Df(\mu) \Sigma Df(\mu)^T + Df(\mu)_{-u} \big( \Sigma_{-u,-u} - \Sigma_{-u,u} \Sigma_{u,u}^{-1} \Sigma_{u,-u} \big) Df(\mu)_{-u}^T = O(\|\mu^{\{n\}} - \mu\|) + O(\|\Gamma^{\{n\}} - \Sigma\|),$$
so that
$$S_u^{\mathrm{cl}}(X^{\{n\}}, f_1^{\{n\}}) = S_u^{\mathrm{cl}}(X^*, f_1) + O(\|\mu^{\{n\}} - \mu\|) + O(\|\Gamma^{\{n\}} - \Sigma\|).$$

Now, we can easily prove Proposition 2.
Proof.
By Lemma 5 and applying the linearity of the Shapley effects with respect to the Sobol indices, we have
$$\eta(X^{\{n\}}, f_1^{\{n\}}) = \eta(X^*, f_1) + O(\|\mu^{\{n\}} - \mu\|) + O(\|\Gamma^{\{n\}} - \Sigma\|).$$

Proof of Proposition 3
Under the assumptions of Proposition 3, let $\varepsilon > 0$ be such that $f$ is $\mathcal{C}^3$ on $B(\mu, \varepsilon)$ and such that, for all $x \in B(\mu, \varepsilon)$, we have $Df(x) \neq 0$. Since $\mu^{\{n\}}$ converges to $\mu$, there exists $n_0 \in \mathbb{N}$ such that, for all $n \geq n_0$, $\mu^{\{n\}} \in B(\mu, \varepsilon/2)$. In the following, we assume that $n$ is larger than $n_0$.

Lemma 6.
For all $x \in B(\mu, \varepsilon/2)$ and $h \in (\mathbb{R}_+^*)^p$ such that $\|h\|_\infty \leq \varepsilon/2$, we have
$$\big\| \widehat{D}_h f(x) - Df(x) \big\| \leq \frac{1}{6} \max_{i \in [1:p]} \max_{y \in B(\mu, \varepsilon)} \big| \partial_i^3 f(y) \big| \, \|h\|^2.$$

Proof.
Let $x \in B(\mu, \varepsilon/2)$ and $h \in (\mathbb{R}_+^*)^p$ such that $\|h\|_\infty \leq \varepsilon/2$. For all $i \in [1:p]$, using Taylor's theorem, there exist $\theta_{x,h,i}^+, \theta_{x,h,i}^- \in \,]0, 1[$ such that
$$\frac{f(x + e_i h_i) - f(x - e_i h_i)}{2 h_i} = \partial_i f(x) + \frac{h_i^2}{12} \Big( \partial_i^3 f\big(x + \theta_{x,h,i}^+ e_i h_i\big) + \partial_i^3 f\big(x - \theta_{x,h,i}^- e_i h_i\big) \Big).$$
Hence,
$$\big\| \widehat{D}_h f(x) - Df(x) \big\| \leq \sum_{i=1}^p \Big| \big[ \widehat{D}_h f(x) - Df(x) \big]_i \Big| \leq \frac{1}{6} \max_{i \in [1:p]} \max_{y \in B(\mu, \varepsilon)} |\partial_i^3 f(y)| \sum_{i=1}^p h_i^2 = \frac{1}{6} \max_{i \in [1:p]} \max_{y \in B(\mu, \varepsilon)} |\partial_i^3 f(y)| \, \|h\|^2.$$

Lemma 7.
For all linear functions $l_1$ and $l_2$ from $\mathbb{R}^p$ to $\mathbb{R}$, we have
$$\Big| \mathrm{Var}\big( E(l_1(X^{\{n\}}) \mid X_u^{\{n\}}) \big) - \mathrm{Var}\big( E(l_2(X^{\{n\}}) \mid X_u^{\{n\}}) \big) \Big| \leq \frac{C_{\sup}}{a^{\{n\}}} \|l_1 - l_2\|.$$

Proof. For all $u \subsetneq [1:p]$, let $\varphi_u^{\{n\}} : \mathbb{R}^{|u|} \to \mathbb{R}^p$ be defined by
$$\varphi_u^{\{n\}}(x_u) = \begin{pmatrix} x_u \\ \mu_{-u}^{\{n\}} + \Gamma_{-u,u}^{\{n\}} \Gamma_{u,u}^{\{n\}-1} (x_u - \mu_u^{\{n\}}) \end{pmatrix}$$
and $\varphi_{[1:p]}^{\{n\}} = \mathrm{id}_{\mathbb{R}^p}$. Let $u \subset [1:p]$. Then $E(X^{\{n\}} \mid X_u^{\{n\}}) = \varphi_u^{\{n\}}(X_u^{\{n\}})$. Now, for any linear function $l : \mathbb{R}^p \to \mathbb{R}$, we have
$$E\big(l(X^{\{n\}}) \mid X_u^{\{n\}}\big) = l\big( E(X^{\{n\}} \mid X_u^{\{n\}}) \big) = l\big( \varphi_u^{\{n\}}(X_u^{\{n\}}) \big),$$
so, identifying a linear function from $\mathbb{R}^p$ to $\mathbb{R}$ with its matrix of size $1 \times p$, we have
$$\mathrm{Var}\Big( E\big(l(X^{\{n\}}) \mid X_u^{\{n\}}\big) \Big) = l \, \varphi_u^{\{n\}} \, \frac{\Gamma_{u,u}^{\{n\}}}{a^{\{n\}}} \, \varphi_u^{\{n\}T} l^T.$$
Hence, for $l = l_1$ and $l = l_2$, one can show that
$$\Big| \mathrm{Var}\big( E(l_1(X^{\{n\}}) \mid X_u^{\{n\}}) \big) - \mathrm{Var}\big( E(l_2(X^{\{n\}}) \mid X_u^{\{n\}}) \big) \Big| \leq \frac{C_{\sup}}{a^{\{n\}}} \|l_1 - l_2\|.$$
Now, we can prove Proposition 3.
Proof.
By Lemmas 6 and 7, we have, for all $u \subset [1:p]$,
$$\mathrm{Var}\big( E(\sqrt{a^{\{n\}}} f_1^{\{n\}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big) - \mathrm{Var}\big( E(\sqrt{a^{\{n\}}} \tilde f_{1,h^{\{n\}}}(X^{\{n\}}) \mid X_u^{\{n\}}) \big) = O\big( \|h^{\{n\}}\|^2 \big).$$
Thus,
$$S_u^{\mathrm{cl}}(X^{\{n\}}, f_1^{\{n\}}) - S_u^{\mathrm{cl}}(X^{\{n\}}, \tilde f_{1,h^{\{n\}}}) = O\big( \|h^{\{n\}}\|^2 \big),$$
so
$$\eta(X^{\{n\}}, f_1^{\{n\}}) - \eta(X^{\{n\}}, \tilde f_{1,h^{\{n\}}}) = O\big( \|h^{\{n\}}\|^2 \big).$$

Proof of Proposition 4
Under the assumptions of Proposition 3, let $\varepsilon > 0$ be such that $f$ is $\mathcal{C}^3$ on $B(\mu, \varepsilon)$ and such that, for all $x \in B(\mu, \varepsilon)$, we have $Df(x) \neq 0$. Since $\mu^{\{n\}}$ converges to $\mu$, there exists $n_0 \in \mathbb{N}$ such that, for all $n \geq n_0$, $\mu^{\{n\}} \in B(\mu, \varepsilon/2)$. In the following, we assume that $n$ is larger than $n_0$.

Lemma 8.
There exists $C_{\sup}$ such that, with probability at least $1 - p^2 \exp(-C_{\inf} N) - 2p \exp(-C_{\inf} N)$,
$$\Big\| \big( A^{\{n\}T} A^{\{n\}} \big)^{-1} A^{\{n\}T} \Big\| \leq C_{\sup} \frac{\sqrt{a^{\{n\}}}}{\sqrt{N}}.$$

Proof. We have
$$\Big\| \big( A^{\{n\}T} A^{\{n\}} \big)^{-1} A^{\{n\}T} \Big\|^2 = \lambda_{\max}\Big[ \big( A^{\{n\}T} A^{\{n\}} \big)^{-1} \Big] = \frac{a^{\{n\}}}{N} \lambda_{\max}\left[ \left( \frac{a^{\{n\}}}{N} A^{\{n\}T} A^{\{n\}} \right)^{-1} \right].$$
Now, by the strong law of large numbers, we have almost surely
$$\frac{a^{\{n\}}}{N} A^{\{n\}T} A^{\{n\}} - (a^{\{n\}} - 1) \begin{pmatrix} 1 \\ \mu^{\{n\}} \end{pmatrix} \begin{pmatrix} 1 \\ \mu^{\{n\}} \end{pmatrix}^T \underset{N \to +\infty}{\longrightarrow} M^{\{n\}} := \begin{pmatrix} 1 & \mu^{\{n\}T} \\ \mu^{\{n\}} & \Gamma^{\{n\}} + \mu^{\{n\}} \mu^{\{n\}T} \end{pmatrix}.$$
Let
$$\underline{M}^{\{n\}} := \begin{pmatrix} 1 & \mu^{\{n\}T} \\ \mu^{\{n\}} & \lambda_{\inf} I_p + \mu^{\{n\}} \mu^{\{n\}T} \end{pmatrix} \quad \text{and} \quad \underline{M} := \begin{pmatrix} 1 & \mu^T \\ \mu & \lambda_{\inf} I_p + \mu \mu^T \end{pmatrix},$$
where $\lambda_{\inf} > 0$ is a lower bound of the eigenvalues of $(\Gamma^{\{n\}})_n$. We can see that $M^{\{n\}} \geq \underline{M}^{\{n\}} \to \underline{M}$ as $n \to +\infty$. Now,
$$\det(\underline{M}) = \det(1) \det\big( [\lambda_{\inf} I_p + \mu \mu^T] - \mu \cdot 1^{-1} \cdot \mu^T \big) = \lambda_{\inf}^p > 0.$$
Hence, writing $\lambda'_{\inf} > 0$ for the smallest eigenvalue of $\underline{M}$, the eigenvalues of $\underline{M}^{\{n\}}$ are lower-bounded by $\lambda'_{\inf}/2$ for $n$ large enough. Similarly, let
$$\overline{M}^{\{n\}} := \begin{pmatrix} 1 & \mu^{\{n\}T} \\ \mu^{\{n\}} & \lambda_{\sup} I_p + \mu^{\{n\}} \mu^{\{n\}T} \end{pmatrix} \quad \text{and} \quad \overline{M} := \begin{pmatrix} 1 & \mu^T \\ \mu & \lambda_{\sup} I_p + \mu \mu^T \end{pmatrix},$$
where $\lambda_{\sup} > 0$ is an upper bound of the eigenvalues of $(\Gamma^{\{n\}})_n$. Writing $\lambda'_{\sup} < +\infty$ for the largest eigenvalue of $\overline{M}$, the eigenvalues of $\overline{M}^{\{n\}}$ are upper-bounded by $2\lambda'_{\sup}$ for $n$ large enough.

Now, since the eigenvalues of $(M^{\{n\}})_n$ are lower-bounded and upper-bounded, there exists $\alpha > 0$ such that, for all $n \in \mathbb{N}$ (large enough) and all $M \in \mathcal{S}_{p+1}(\mathbb{R})$,
$$\|M - M^{\{n\}}\| \leq \alpha \implies |\lambda_{\min}(M) - \lambda_{\min}(M^{\{n\}})| \leq \frac{\lambda'_{\inf}}{4}.$$
Now, by Bernstein's inequality,
$$P\left( \left\| \frac{a^{\{n\}}}{N} A^{\{n\}T} A^{\{n\}} - (a^{\{n\}} - 1) \begin{pmatrix} 1 \\ \mu^{\{n\}} \end{pmatrix} \begin{pmatrix} 1 \\ \mu^{\{n\}} \end{pmatrix}^T - M^{\{n\}} \right\| \leq \alpha \right) \geq 1 - p^2 \exp(-C_{\inf} N) - 2p \exp(-C_{\inf} N) \geq 1 - C_{\sup} \exp(-C_{\inf} N),$$
where the term $p^2 \exp(-C_{\inf} N)$ bounds the deviation of the submatrix of indices $[2:p+1] \times [2:p+1]$, and the term $2p \exp(-C_{\inf} N)$ bounds those of the submatrices of indices $\{1\} \times [2:p+1]$ and $[2:p+1] \times \{1\}$. Hence, with probability at least $1 - C_{\sup} \exp(-C_{\inf} N)$, we have
$$\lambda_{\min}\left( \frac{a^{\{n\}}}{N} A^{\{n\}T} A^{\{n\}} - (a^{\{n\}} - 1) \begin{pmatrix} 1 \\ \mu^{\{n\}} \end{pmatrix} \begin{pmatrix} 1 \\ \mu^{\{n\}} \end{pmatrix}^T \right) \geq \frac{\lambda'_{\inf}}{4},$$
and so
$$\lambda_{\min}\left( \frac{a^{\{n\}}}{N} A^{\{n\}T} A^{\{n\}} \right) \geq \frac{\lambda'_{\inf}}{4}.$$

Lemma 9.
With probability at least $1 - C_{\sup} \exp(-C_{\inf} N)$, we have
$$\big\| \hat\beta^{\{n\}} - \nabla f(\mu^{\{n\}}) \big\| \leq \frac{C_{\sup}}{\sqrt{a^{\{n\}}}}.$$

Proof.
Let $Z^{\{n\}} \sim \mathcal{N}(0, \Gamma^{\{n\}})$. Then $\|X^{\{n\}} - \mu^{\{n\}}\| \leq \varepsilon/2$ with probability $P\big(\|Z^{\{n\}}\| \leq \sqrt{a^{\{n\}}}\,\varepsilon/2\big) \to 1$ as $n \to +\infty$. Let
$$\Omega_N^{\{n\}} := \big\{ \omega \in \Omega \ \big| \ \forall j \in [1:N], \ \|X^{\{n\}(j)}(\omega) - \mu^{\{n\}}\| \leq \varepsilon/2 \big\}.$$
Hence,
$$P\big(\Omega_N^{\{n\}}\big) \geq 1 - N \exp\big( -C_{\inf} \, a^{\{n\}} \big) \underset{n \to +\infty}{\longrightarrow} 1.$$
On $B(\mu^{\{n\}}, \varepsilon/2)$, we have $f = f(\mu^{\{n\}}) + f_1^{\{n\}} + R_1^{\{n\}}$. Hence, on $\Omega_N^{\{n\}}$, for all $j \in [1:N]$,
$$f\big(X^{\{n\}(j)}\big) = f(\mu^{\{n\}}) + f_1^{\{n\}}\big(X^{\{n\}(j)}\big) + R_1^{\{n\}}\big(X^{\{n\}(j)}\big).$$
Thus,
$$\begin{pmatrix} \hat\beta_0^{\{n\}} \\ \hat\beta^{\{n\}} \end{pmatrix} = \big( A^{\{n\}T} A^{\{n\}} \big)^{-1} A^{\{n\}T} \Big( f(\mu^{\{n\}}) + f_1^{\{n\}}(X^{\{n\}(j)}) + R_1^{\{n\}}(X^{\{n\}(j)}) \Big)_{j \in [1:N]}.$$
Since $f(\mu^{\{n\}}) + f_1^{\{n\}}$ is a linear function with gradient vector $\nabla f(\mu^{\{n\}})$ and with value at zero $f(\mu^{\{n\}}) - Df(\mu^{\{n\}}) \mu^{\{n\}}$, we have
$$\big( A^{\{n\}T} A^{\{n\}} \big)^{-1} A^{\{n\}T} \Big( f(\mu^{\{n\}}) + f_1^{\{n\}}(X^{\{n\}(j)}) \Big)_{j \in [1:N]} = \begin{pmatrix} f(\mu^{\{n\}}) - Df(\mu^{\{n\}}) \mu^{\{n\}} \\ \nabla f(\mu^{\{n\}}) \end{pmatrix}.$$
Hence, it remains to check that
$$\big( A^{\{n\}T} A^{\{n\}} \big)^{-1} A^{\{n\}T} \big( R_1^{\{n\}}(X^{\{n\}(j)}) \big)_{j \in [1:N]}$$
is small enough. By Lemma 1, we have on $\Omega_N^{\{n\}}$,
$$\big\| \big( R_1^{\{n\}}(X^{\{n\}(j)}) \big)_{j \in [1:N]} \big\|^2 = \sum_{j=1}^N R_1^{\{n\}}\big(X^{\{n\}(j)}\big)^2 \leq C_{\sup} \sum_{j=1}^N \|X^{\{n\}(j)} - \mu^{\{n\}}\|^4 \leq \frac{C_{\sup}}{a^{\{n\}2}} \sum_{j=1}^N \big\| \sqrt{a^{\{n\}}} \big(X^{\{n\}(j)} - \mu^{\{n\}}\big) \big\|^4.$$
Hence, on $\Omega_N^{\{n\}}$,
$$\big\| \big( R_1^{\{n\}}(X^{\{n\}(j)}) \big)_{j \in [1:N]} \big\| \leq C_{\sup} \frac{\sqrt{N}}{a^{\{n\}}}.$$
Thus,
$$\Big\| \big( A^{\{n\}T} A^{\{n\}} \big)^{-1} A^{\{n\}T} \big( R_1^{\{n\}}(X^{\{n\}(j)}) \big)_{j \in [1:N]} \Big\| \leq \Big\| \big( A^{\{n\}T} A^{\{n\}} \big)^{-1} A^{\{n\}T} \Big\| \, \Big\| \big( R_1^{\{n\}}(X^{\{n\}(j)}) \big)_{j \in [1:N]} \Big\| \leq \frac{C_{\sup}}{\sqrt{a^{\{n\}}}},$$
with probability at least $1 - C_{\sup} \exp(-C_{\inf} N)$.

Now, it is easy to prove Proposition 4.

Proof.
By Lemma 7 applied to $l_1 = \hat\beta^{\{n\}T}$ and $l_2 = Df(\mu^{\{n\}})$, and by Lemma 9, we have, with probability at least $1 - C_{\sup} \exp(-C_{\inf} N)$,
$$\Big| \mathrm{Var}\big( E(\sqrt{a^{\{n\}}} Df(\mu^{\{n\}}) X^{\{n\}} \mid X_u^{\{n\}}) \big) - \mathrm{Var}\big( E(\sqrt{a^{\{n\}}} \hat\beta^{\{n\}T} X^{\{n\}} \mid X_u^{\{n\}}) \big) \Big| \leq C_{\sup} \big\| Df(\mu^{\{n\}}) - \hat\beta^{\{n\}T} \big\| \leq \frac{C_{\sup}}{\sqrt{a^{\{n\}}}},$$
where the conditional expectations and the variances are conditional to $(X^{\{n\}(j)})_{j \in [1:N]}$. Thus, with probability at least $1 - C_{\sup} \exp(-C_{\inf} N)$, there exists $C_{\inf} > 0$ such that, for $n$ large enough, $\|\hat\beta^{\{n\}T}\| \geq C_{\inf}$, and thus $\mathrm{Var}\big(\sqrt{a^{\{n\}}} \hat\beta^{\{n\}T} X^{\{n\}}\big)$ is lower-bounded. Hence, with probability at least $1 - C_{\sup} \exp(-C_{\inf} N)$,
$$\Big| S_u^{\mathrm{cl}}(X^{\{n\}}, f_1^{\{n\}}) - S_u^{\mathrm{cl}}(X^{\{n\}}, \hat\beta^{\{n\}T}) \Big| \leq \frac{C_{\sup}}{\sqrt{a^{\{n\}}}},$$
and so
$$\Big\| \eta(X^{\{n\}}, f_1^{\{n\}}) - \eta(X^{\{n\}}, \hat\beta^{\{n\}T}) \Big\| \leq \frac{C_{\sup}}{\sqrt{a^{\{n\}}}}.$$

Proofs for Section 4
B Proofs for Section 4

In this section, we prove Proposition 5 in Subsections B.1 to B.6, and we prove Corollary 3 in Subsection B.7.
B.1 Structure of the proof of Proposition 5

Recall that $(U^{(l)})_{l\in[1:n]}$ is an i.i.d. sample of $U$ with $E(U) = \mu$ and $\mathrm{Var}(U) = \Sigma$, and that
$$\widehat X^{\{n\}} = \frac{1}{n}\sum_{l=1}^n U^{(l)}.$$
Let $X^{\{n\}}\sim N\left(\mu, \frac{1}{n}\Sigma\right)$. By Proposition 1 (applied with $a^{\{n\}} = n$), we have
$$\eta\left(X^{\{n\}}, f\right) = \eta\left(X^{\{n\}}, Df(\mu)\right) + O\left(\frac{1}{\sqrt n}\right) = \eta(X^*, Df(\mu)) + O\left(\frac{1}{\sqrt n}\right).$$
Hence, it remains to prove that
$$\left\|\eta\left(\widehat X^{\{n\}}, f\right) - \eta\left(X^{\{n\}}, f\right)\right\| \xrightarrow[n\to+\infty]{} 0,$$
that is, writing $f_n := \sqrt n\left(f\left(\frac{\cdot}{\sqrt n} + \mu\right) - f(\mu)\right)$ and $\tilde X^{\{n\}} := \sqrt n\left(\widehat X^{\{n\}} - \mu\right)$, that
$$\left\|\eta\left(\tilde X^{\{n\}}, f_n\right) - \eta(X^*, f_n)\right\| \xrightarrow[n\to+\infty]{} 0.$$
In Subsection B.2, we give some lemmas on $f_n$. Then, defining
$$E_{u,n,K}(Z) := E\left(E\left[f_n(Z)\mathbb{1}_{\|Z\|_\infty\le K}\mid Z_u\right]^2\right),\qquad E_{u,n}(Z) := E\left(E\left[f_n(Z)\mid Z_u\right]^2\right),$$
we prove in Subsection B.3 that $\sup_n |E_{u,n,K}(\tilde X^{\{n\}}) - E_{u,n}(\tilde X^{\{n\}})|$ converges to $0$ when $K\to+\infty$. In particular, for $U\sim N(\mu,\Sigma)$, the result holds for $\tilde X^{\{n\}} = X^*$. Hence, for any $\varepsilon > 0$, choosing $K$ such that $|E_{u,n,K}(\tilde X^{\{n\}}) - E_{u,n}(\tilde X^{\{n\}})| < \varepsilon/3$ and $|E_{u,n,K}(X^*) - E_{u,n}(X^*)| < \varepsilon/3$, we show in Subsection B.4 that
$$|E_{u,n,K}(X^*) - E_{u,n,K}(\tilde X^{\{n\}})| \xrightarrow[n\to+\infty]{} 0.$$
In Subsection B.5, we conclude the proof that
$$\left|\mathrm{Var}\left(E\left(f_n(\tilde X^{\{n\}})\mid \tilde X_u^{\{n\}}\right)\right) - \mathrm{Var}\left(E\left(f_n(X^*)\mid X_u^*\right)\right)\right| \xrightarrow[n\to+\infty]{} 0.$$
In Subsection B.6, we conclude the proof that
$$\left\|\eta\left(\tilde X^{\{n\}}, f_n\right) - \eta(X^*, f_n)\right\| \xrightarrow[n\to+\infty]{} 0.$$
The key of the proof is that the probability density function of $\tilde X^{\{n\}}$ converges uniformly to the one of $X^*$ by the local limit theorem (see [She71] or Theorem 19.1 of [BR86]).
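The local limit theorem at the heart of this proof can be visualized in a one-dimensional toy case: the density of $\tilde X^{\{n\}} = \sqrt n(\widehat X^{\{n\}} - \mu)$ for Uniform$(0,1)$ inputs, obtained here by repeated numerical convolution, approaches the $N(0, 1/12)$ density in sup norm. This sketch is our illustration only; the grid step and the values of $n$ are arbitrary.

```python
import numpy as np

# Density of X_tilde_n = sqrt(n)(mean(U_1..U_n) - mu) for U ~ Uniform(0, 1),
# obtained by repeated numerical convolution, compared in sup norm with the
# N(0, 1/12) limit density suggested by the local limit theorem.
dx = 1e-3
grid = np.arange(0.0, 1.0, dx)
f_U = np.ones_like(grid)  # Uniform(0, 1) density on a grid

for n in [2, 5, 20]:
    f_sum = f_U
    for _ in range(n - 1):
        f_sum = np.convolve(f_sum, f_U) * dx      # density of U_1 + ... + U_k
    s = np.arange(len(f_sum)) * dx                # support [0, n] of the sum
    x = (s - n / 2) / np.sqrt(n)                  # x_tilde = (sum - n mu)/sqrt(n)
    f_x = f_sum * np.sqrt(n)                      # change of variables
    f_gauss = np.exp(-x ** 2 / (2 / 12)) / np.sqrt(2 * np.pi / 12)
    print(f"n = {n:3d}   sup|f_X_tilde - f_N(0,1/12)| = {np.max(np.abs(f_x - f_gauss)):.4f}")
```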
B.2 Part 1

Lemma 10. There exists $C_{\sup} < +\infty$ such that, for all $x\in\mathbb{R}^p$,
$$|f_n(x)| \le C_{\sup}\left(\|x\|\,\mathbb{1}_{\|x\|\le\sqrt n} + \frac{\|x\|^k}{\sqrt n^{\,k-1}}\,\mathbb{1}_{\|x\|>\sqrt n}\right),$$
where we recall that $k\in\mathbb{N}^*$ is such that, for all $x\in\mathbb{R}^p$, we have $|f(x)| \le C(1+\|x\|^k)$.

Proof. For all $x\in\mathbb{R}^p$, we have
$$\left|f\left(\frac{x}{\sqrt n} + \mu\right) - f(\mu)\right| \le \left|f\left(\frac{x}{\sqrt n} + \mu\right)\right| + |f(\mu)| \le C_{\sup}\left(1 + \left\|\frac{x}{\sqrt n} + \mu\right\|^k\right) + |f(\mu)| \le C_{\sup}\left(1 + \left\|\frac{x}{\sqrt n}\right\|^k\right).$$
Thus, for all $\|x\| \ge \sqrt n$, we have
$$|f_n(x)| \le C_{\sup}\frac{\|x\|^k}{\sqrt n^{\,k-1}}.$$
If $\|x\| \le \sqrt n$, we have
$$\left|f\left(\frac{x}{\sqrt n} + \mu\right) - f(\mu)\right| \le \max_{\|y\|\le\|\mu\|+1}\|Df(y)\|\,\left\|\frac{x}{\sqrt n} + \mu - \mu\right\| \le C_{\sup}\left\|\frac{x}{\sqrt n}\right\|,$$
and thus $|f_n(x)| \le C_{\sup}\|x\|$. In particular, $|f_n(x)| \le C_{\sup}(\|x\| + \|x\|^k)$ and $f_n(x)^2 \le C_{\sup}(\|x\|^2 + \|x\|^{2k})$.
Lemma 11. For $i = 1, 2$, we have $E\left(f_n(\tilde X^{\{n\}})^{2i}\right) \le C_{\sup}$.

Proof. We have
$$E\left(f_n(\tilde X^{\{n\}})^{2i}\right) \le C_{\sup}\left(E\left(\|\tilde X^{\{n\}}\|^{2ik}\right) + E\left(\|\tilde X^{\{n\}}\|^{2i}\right)\right) \le C_{\sup}\left(E\left(\|\tilde X^{\{n\}}\|_{2ik}^{2ik}\right) + E\left(\|\tilde X^{\{n\}}\|_{2i}^{2i}\right)\right).$$
Now, for all $j\in[1:p]$, by the Rosenthal inequality,
$$E\left(|\tilde X_j^{\{n\}}|^{2ik}\right) = \frac{1}{n^{ik}}E\left(\left(\sum_{l=1}^n U_j^{(l)} - \mu_j\right)^{2ik}\right) \le \frac{C_{\sup}}{n^{ik}}\max\left(nE\left([U_j^{(1)} - \mu_j]^{2ik}\right),\ \left[nE\left([U_j^{(1)} - \mu_j]^2\right)\right]^{ik}\right) \le C_{\sup},$$
and similarly for the moments of order $2i$, which concludes the proof.
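A Monte Carlo illustration of Lemma 11 (ours, in dimension $p = 1$): with Uniform$(0,1)$ inputs and a smooth polynomially-bounded test function (both assumptions of ours), the estimated moments $E(f_n(\tilde X^{\{n\}})^2)$ and $E(f_n(\tilde X^{\{n\}})^4)$ remain stable as $n$ grows.

```python
import numpy as np

# Monte Carlo check that E(f_n^2) and E(f_n^4) stay bounded in n, where
# f_n(x) = sqrt(n)(f(x/sqrt(n) + mu) - f(mu)) and
# X_tilde_n = sqrt(n)(mean of n i.i.d. U - mu).
rng = np.random.default_rng(2)
mu = 0.5  # mean of Uniform(0, 1)

def f(x):
    return np.exp(x) + x ** 3  # smooth test function with polynomial growth

for n in [5, 50, 500]:
    U = rng.uniform(0.0, 1.0, size=(10_000, n))
    x_tilde = np.sqrt(n) * (U.mean(axis=1) - mu)
    f_n = np.sqrt(n) * (f(x_tilde / np.sqrt(n) + mu) - f(mu))
    print(f"n = {n:4d}   E(f_n^2) = {np.mean(f_n ** 2):.4f}   E(f_n^4) = {np.mean(f_n ** 4):.4f}")
```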
Lemma 12. For all $v\subset[1:p]$, $v\ne\emptyset$, and for $i = 1, 2$, we have
$$\sup_n E\left(|f_n(\tilde X^{\{n\}})|^i\,\mathbb{1}_{\tilde X_v^{\{n\}}\notin[-K,K]^{|v|}}\right) \xrightarrow[K\to+\infty]{} 0.$$

Proof. We have
$$E\left(|f_n(\tilde X^{\{n\}})|^i\,\mathbb{1}_{\tilde X_v^{\{n\}}\notin[-K,K]^{|v|}}\right) \le \sqrt{E\left(f_n(\tilde X^{\{n\}})^{2i}\right)}\sqrt{P\left(\tilde X_v^{\{n\}}\notin[-K,K]^{|v|}\right)}.$$
By Lemma 11, $\sup_n\sqrt{E\left(f_n(\tilde X^{\{n\}})^{2i}\right)}$ is bounded. Now, since $(\tilde X_v^{\{n\}})_n$ converges in distribution, it is a tight sequence, hence
$$\sup_n P\left(\tilde X_v^{\{n\}}\notin[-K,K]^{|v|}\right) \le \sup_n P\left(\|\tilde X_v^{\{n\}}\| \ge K\right) \xrightarrow[K\to+\infty]{} 0.$$
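Similarly, a small Monte Carlo sketch (ours, same toy setting as the previous sketch) of the uniform-in-$n$ truncation property of Lemma 12:

```python
import numpy as np

# The truncated moments E(f_n(X_tilde)^2 1{|X_tilde| > K}) vanish as K
# grows, with the bound holding uniformly over the tested values of n.
rng = np.random.default_rng(3)
mu = 0.5

def f(x):
    return np.exp(x) + x ** 3

for K in [1.0, 2.0, 3.0, 4.0]:
    worst = 0.0
    for n in [5, 50, 500]:
        U = rng.uniform(0.0, 1.0, size=(10_000, n))
        x = np.sqrt(n) * (U.mean(axis=1) - mu)
        f_n = np.sqrt(n) * (f(x / np.sqrt(n) + mu) - f(mu))
        worst = max(worst, float(np.mean(f_n ** 2 * (np.abs(x) > K))))
    print(f"K = {K}   max over n of E(f_n^2 1(|X| > K)) = {worst:.5f}")
```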
Lemma 13. The sequence $(f_n)_n$ converges pointwise to $Df(\mu)$.

Proof. For all $x\in\mathbb{R}^p$,
$$f\left(\frac{x}{\sqrt n} + \mu\right) - f(\mu) = Df(\mu)\frac{x}{\sqrt n} + O\left(\left\|\frac{x}{\sqrt n}\right\|^2\right),$$
so
$$f_n(x) = Df(\mu)x + O\left(\frac{\|x\|^2}{\sqrt n}\right).$$

B.3 Part 2

We want to prove that, for all $u\subset[1:p]$, $u\ne\emptyset$, we have
$$\sup_n\left|E_{u,n,K}(\tilde X^{\{n\}}) - E_{u,n}(\tilde X^{\{n\}})\right| \xrightarrow[K\to+\infty]{} 0.$$
We will prove this result for $\emptyset\subsetneq u\subsetneq[1:p]$, since it is easier for $u = [1:p]$ (see Remark 4). We have
$$\left|\int_{\mathbb{R}^{|u|}}\left(\int_{\mathbb{R}^{|-u|}}f_n(x)\,dP_{\tilde X_{-u}^{\{n\}}\mid \tilde X_u^{\{n\}}=x_u}(x_{-u})\right)^2 dP_{\tilde X_u^{\{n\}}}(x_u) - \int_{[-K,K]^{|u|}}\left(\int_{[-K,K]^{|-u|}}f_n(x)\,dP_{\tilde X_{-u}^{\{n\}}\mid \tilde X_u^{\{n\}}=x_u}(x_{-u})\right)^2 dP_{\tilde X_u^{\{n\}}}(x_u)\right|$$
$$\le \int_{([-K,K]^{|u|})^c}\left(\int_{\mathbb{R}^{|-u|}}f_n(x)\,dP_{\tilde X_{-u}^{\{n\}}\mid \tilde X_u^{\{n\}}=x_u}(x_{-u})\right)^2 dP_{\tilde X_u^{\{n\}}}(x_u)$$
$$\quad + \int_{[-K,K]^{|u|}}\left|\left(\int_{\mathbb{R}^{|-u|}}f_n(x)\,dP_{\tilde X_{-u}^{\{n\}}\mid \tilde X_u^{\{n\}}=x_u}(x_{-u})\right)^2 - \left(\int_{[-K,K]^{|-u|}}f_n(x)\,dP_{\tilde X_{-u}^{\{n\}}\mid \tilde X_u^{\{n\}}=x_u}(x_{-u})\right)^2\right| dP_{\tilde X_u^{\{n\}}}(x_u).$$
We have to bound the two summands of the previous upper-bound. The first term converges to $0$ uniformly in $n$ by Lemma 12. Let us bound the second term. By the mean-value inequality applied to the square function, writing $dP(x_{-u})$ for $dP_{\tilde X_{-u}^{\{n\}}\mid \tilde X_u^{\{n\}}=x_u}(x_{-u})$, we have
$$\int_{[-K,K]^{|u|}}\left|\left(\int_{\mathbb{R}^{|-u|}}f_n(x)\,dP(x_{-u})\right)^2 - \left(\int_{[-K,K]^{|-u|}}f_n(x)\,dP(x_{-u})\right)^2\right| dP_{\tilde X_u^{\{n\}}}(x_u)$$
$$\le 2\int_{[-K,K]^{|u|}}\left(\int_{\mathbb{R}^{|-u|}}|f_n(x)|\,dP(x_{-u})\right)\left(\int_{\mathbb{R}^{|-u|}}\mathbb{1}_{x_{-u}\notin[-K,K]^{|-u|}}|f_n(x)|\,dP(x_{-u})\right)dP_{\tilde X_u^{\{n\}}}(x_u)$$
$$\le 2\sqrt{E\left(E\left(|f_n(\tilde X^{\{n\}})|\mid \tilde X_u^{\{n\}}\right)^2\right)}\sqrt{\int_{\mathbb{R}^{|u|}}\left(\int_{\mathbb{R}^{|-u|}}\mathbb{1}_{x_{-u}\notin[-K,K]^{|-u|}}|f_n(x)|\,dP(x_{-u})\right)^2 dP_{\tilde X_u^{\{n\}}}(x_u)}.$$
Now, $E\left(E\left(|f_n(\tilde X^{\{n\}})|\mid \tilde X_u^{\{n\}}\right)^2\right) \le E\left(f_n(\tilde X^{\{n\}})^2\right)$, which is bounded by Lemma 11, and the other term converges to $0$ uniformly in $n$ by Lemma 12.
Remark 4. In the case where $u = [1:p]$, it is much simpler, since
$$E\left(f_n(\tilde X^{\{n\}})^2\right) - E\left(f_n(\tilde X^{\{n\}})^2\mathbb{1}_{\tilde X^{\{n\}}\in[-K,K]^p}\right) = E\left(f_n(\tilde X^{\{n\}})^2\mathbb{1}_{\tilde X^{\{n\}}\notin[-K,K]^p}\right),$$
which converges to $0$ uniformly in $n$ when $K\to+\infty$ by Lemma 12.

B.4 Part 3

Let $K\in\mathbb{R}_+^*$ and $u\subset[1:p]$ such that $u\ne\emptyset$. We want to prove that
$$\left|E_{u,n,K}(X^*) - E_{u,n,K}(\tilde X^{\{n\}})\right| \xrightarrow[n\to+\infty]{} 0.$$
The case $u = [1:p]$ is much easier (see Remark 5), hence assume that $\emptyset\subsetneq u\subsetneq[1:p]$. Since $K$ is fixed, the probability density function $f_{X^*}$ of $X^*$ is lower-bounded by some $a > 0$ on $[-K,K]^p$. Let
$$\varepsilon_n := \max_{\emptyset\subsetneq u\subseteq[1:p]}\sup_{x\in\mathbb{R}^p}\left|f_{X_u^*}(x_u) - f_{\tilde X_u^{\{n\}}}(x_u)\right|.$$
Using the local limit theorem (see Theorem 19.1 of [BR86] or [She71]), $\varepsilon_n\to 0$ as $n\to+\infty$. We assume that $n$ is large enough so that $\varepsilon_n \le a/2$. Let $b < +\infty$ be the maximum of $f_{X^*}$.

We have
$$\left|E_{u,n,K}(X^*) - E_{u,n,K}(\tilde X^{\{n\}})\right|$$
$$\le \int_{[-K,K]^{|u|}}\left|\left(\int_{[-K,K]^{|-u|}}f_n(x)\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)}dx_{-u}\right)^2 - \left(\int_{[-K,K]^{|-u|}}f_n(x)\frac{f_{\tilde X^{\{n\}}}(x)}{f_{\tilde X_u^{\{n\}}}(x_u)}dx_{-u}\right)^2\right| f_{X_u^*}(x_u)\,dx_u$$
$$\quad + \int_{[-K,K]^{|u|}}\left(\int_{[-K,K]^{|-u|}}f_n(x)\frac{f_{\tilde X^{\{n\}}}(x)}{f_{\tilde X_u^{\{n\}}}(x_u)}dx_{-u}\right)^2\left|f_{X_u^*}(x_u) - f_{\tilde X_u^{\{n\}}}(x_u)\right| dx_u.$$
Hence, we have to prove the convergence of the two summands of the previous upper-bound. For the second term, it suffices to remark that
$$\left|f_{X_u^*}(x_u) - f_{\tilde X_u^{\{n\}}}(x_u)\right| \le \varepsilon_n \le \frac{2\varepsilon_n}{a}f_{\tilde X_u^{\{n\}}}(x_u).$$
Hence, the second term is smaller than $\frac{2\varepsilon_n}{a}E\left(f_n(\tilde X^{\{n\}})^2\right)$, which converges to $0$. It remains to prove that the first term converges to $0$. By the mean-value inequality, we have
$$\int_{[-K,K]^{|u|}}\left|\left(\int_{[-K,K]^{|-u|}}f_n(x)\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)}dx_{-u}\right)^2 - \left(\int_{[-K,K]^{|-u|}}f_n(x)\frac{f_{\tilde X^{\{n\}}}(x)}{f_{\tilde X_u^{\{n\}}}(x_u)}dx_{-u}\right)^2\right| f_{X_u^*}(x_u)\,dx_u$$
$$\le 2\int_{[-K,K]^{|u|}}\left(\int_{[-K,K]^{|-u|}}|f_n(x)|\max\left(\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)}, \frac{f_{\tilde X^{\{n\}}}(x)}{f_{\tilde X_u^{\{n\}}}(x_u)}\right)dx_{-u}\right)\left(\int_{[-K,K]^{|-u|}}|f_n(x)|\left|\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)} - \frac{f_{\tilde X^{\{n\}}}(x)}{f_{\tilde X_u^{\{n\}}}(x_u)}\right|dx_{-u}\right)f_{X_u^*}(x_u)\,dx_u.$$
Now,
$$\left|\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)} - \frac{f_{\tilde X^{\{n\}}}(x)}{f_{\tilde X_u^{\{n\}}}(x_u)}\right| \le \frac{\left|f_{X^*}(x) - f_{\tilde X^{\{n\}}}(x)\right|}{f_{X_u^*}(x_u)} + f_{\tilde X^{\{n\}}}(x)\left|\frac{1}{f_{X_u^*}(x_u)} - \frac{1}{f_{\tilde X_u^{\{n\}}}(x_u)}\right|$$
$$\le \frac{\left|f_{X^*}(x) - f_{\tilde X^{\{n\}}}(x)\right|}{f_{X_u^*}(x_u)} + f_{\tilde X^{\{n\}}}(x)\frac{4}{a^2}\left|f_{X_u^*}(x_u) - f_{\tilde X_u^{\{n\}}}(x_u)\right| \le \frac{\varepsilon_n}{f_{X_u^*}(x_u)} + f_{\tilde X^{\{n\}}}(x)\frac{4}{a^2}\varepsilon_n$$
$$\le \frac{\varepsilon_n}{f_{X_u^*}(x_u)} + f_{X^*}(x)\frac{8}{a^2}\varepsilon_n \le \frac{\varepsilon_n}{a}\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)} + \frac{8b}{a^2}\varepsilon_n\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)} \le C_{\sup}\varepsilon_n\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)}.$$
Hence, for $n$ large enough such that $C_{\sup}\varepsilon_n \le 1$, we have
$$\int_{[-K,K]^{|u|}}\left|\left(\int_{[-K,K]^{|-u|}}f_n(x)\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)}dx_{-u}\right)^2 - \left(\int_{[-K,K]^{|-u|}}f_n(x)\frac{f_{\tilde X^{\{n\}}}(x)}{f_{\tilde X_u^{\{n\}}}(x_u)}dx_{-u}\right)^2\right| f_{X_u^*}(x_u)\,dx_u$$
$$\le C_{\sup}\int_{[-K,K]^{|u|}}\left(\int_{[-K,K]^{|-u|}}|f_n(x)|\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)}dx_{-u}\right)\left(\int_{[-K,K]^{|-u|}}|f_n(x)|\,\varepsilon_n\frac{f_{X^*}(x)}{f_{X_u^*}(x_u)}dx_{-u}\right)f_{X_u^*}(x_u)\,dx_u \le C_{\sup}\varepsilon_n E\left(f_n(X^*)^2\right),$$
which converges to $0$.

Remark 5. If $u = [1:p]$, it suffices to remark that $\left|f_{X^*}(x) - f_{\tilde X^{\{n\}}}(x)\right| \le \varepsilon_n \le \frac{\varepsilon_n}{a}f_{X^*}(x)$ on $[-K,K]^p$. Thus,
$$\left|E_{u,n,K}(X^*) - E_{u,n,K}(\tilde X^{\{n\}})\right| \le \int_{[-K,K]^p}f_n(x)^2\left|f_{X^*}(x) - f_{\tilde X^{\{n\}}}(x)\right| dx \le \frac{\varepsilon_n}{a}E\left(f_n(X^*)^2\right) \le C_{\sup}\varepsilon_n.$$

B.5 Part 4

Let us prove that $E(f_n(\tilde X^{\{n\}})) - E(f_n(X^*)) \to 0$ as $n\to+\infty$. By Lemma 12, we have
$$\sup_n\left|E\left(f_n(\tilde X^{\{n\}})\right) - E\left(f_n(\tilde X^{\{n\}})\mathbb{1}_{\tilde X^{\{n\}}\in[-K,K]^p}\right)\right| \xrightarrow[K\to\infty]{} 0.$$
Let $\varepsilon > 0$ and let $K$ be such that
$$\sup_n\left|E\left(f_n(\tilde X^{\{n\}})\right) - E\left(f_n(\tilde X^{\{n\}})\mathbb{1}_{\tilde X^{\{n\}}\in[-K,K]^p}\right)\right| < \varepsilon \quad\text{and}\quad \sup_n\left|E\left(f_n(X^*)\right) - E\left(f_n(X^*)\mathbb{1}_{X^*\in[-K,K]^p}\right)\right| < \varepsilon.$$
By the local limit theorem, we have
$$\left|E\left(f_n(\tilde X^{\{n\}})\mathbb{1}_{\tilde X^{\{n\}}\in[-K,K]^p}\right) - E\left(f_n(X^*)\mathbb{1}_{X^*\in[-K,K]^p}\right)\right| \xrightarrow[n\to+\infty]{} 0.$$
Since $\mathrm{Var}(E(f_n(Z)\mid Z_u)) = E_{u,n}(Z) - E(f_n(Z))^2$, combining this convergence of the means with Parts 2 and 3 gives, for all $u\subset[1:p]$,
$$\mathrm{Var}\left(E\left(f_n(\tilde X^{\{n\}})\mid \tilde X_u^{\{n\}}\right)\right) - \mathrm{Var}\left(E\left(f_n(X^*)\mid X_u^*\right)\right) \xrightarrow[n\to+\infty]{} 0.$$

B.6 Part 5

To prove the convergence of the Shapley effects, it suffices to prove that
$\mathrm{Var}(f_n(X^*))$ is lower-bounded for $n$ large enough. Hence, we show that $\mathrm{Var}(f_n(X^*))$ converges to $\mathrm{Var}(Df(\mu)X^*) > 0$.

Let $i = 1, 2$ and let $\varepsilon > 0$. By Lemma 12, let $K$ be such that
$$\sup_n E\left(|f_n(X^*)|^i\,\mathbb{1}_{X^*\notin[-K,K]^p}\right) \le \varepsilon,\qquad E\left(|Df(\mu)X^*|^i\,\mathbb{1}_{X^*\notin[-K,K]^p}\right) \le \varepsilon.$$
By Lemmas 10 and 13 and by the dominated convergence theorem, we have
$$E\left(f_n(X^*)^i\,\mathbb{1}_{X^*\in[-K,K]^p}\right) \xrightarrow[n\to+\infty]{} E\left([Df(\mu)X^*]^i\,\mathbb{1}_{X^*\in[-K,K]^p}\right).$$
Hence, $\mathrm{Var}(f_n(X^*))$ converges to $\mathrm{Var}(Df(\mu)X^*)$. Thus, for all $u\subset[1:p]$,
$$S_u^{\mathrm{cl}}\left(\tilde X^{\{n\}}, f_n\right) - S_u^{\mathrm{cl}}(X^*, f_n) \xrightarrow[n\to+\infty]{} 0.$$
Hence,
$$\left\|\eta\left(\tilde X^{\{n\}}, f_n\right) - \eta(X^*, f_n)\right\| \xrightarrow[n\to+\infty]{} 0.$$

B.7 Proof of Corollary 3

Since $\widehat X^{\{n'\}}\xrightarrow[n'\to+\infty]{\mathrm{a.s.}}\mu$ and $\widehat\Sigma^{\{n''\}}\xrightarrow[n''\to+\infty]{\mathrm{a.s.}}\Sigma$, it suffices to prove that, if $(x^{\{n\}})_n$ converges to $\mu$ and $(\Sigma^{\{n\}})_n$ converges to $\Sigma$, we have
$$\left\|\eta\left(\widehat X^{\{n\}}, f\right) - \eta\left(X_n^*, \tilde f^{\{n\}}_{h^{\{n\}},x^{\{n\}}}\right)\right\| \xrightarrow[n\to+\infty]{} 0,$$
where $X_n^*$ is a random vector with distribution $N(\mu, \Sigma^{\{n\}})$. Let $(x^{\{n\}})_n$ and $(\Sigma^{\{n\}})_n$ be such sequences. Recall that
$$\left\|\eta\left(\tilde X^{\{n\}}, f_n\right) - \eta(X^*, f_n)\right\| \xrightarrow[n\to+\infty]{} 0,$$
where $X^*\sim N(0,\Sigma)$, that is,
$$\left\|\eta\left(\widehat X^{\{n\}}, f\right) - \eta\left(X^{\{n\}}, f\right)\right\| \xrightarrow[n\to+\infty]{} 0,$$
where $X^{\{n\}}\sim N\left(\mu, \frac{1}{n}\Sigma\right)$. Hence, we have to prove that
$$\left\|\eta\left(X^{\{n\}}, f\right) - \eta\left(X_n^*, \tilde f^{\{n\}}_{h^{\{n\}},x^{\{n\}}}\right)\right\| \xrightarrow[n\to+\infty]{} 0.$$
By Propositions 1 and 2, remark that $\eta(X^{\{n\}}, f)$ converges to $\eta(X^*, Df(\mu))$. Moreover,
$$\eta\left(X_n^*, \tilde f^{\{n\}}_{h^{\{n\}},x^{\{n\}}}\right) = \eta\left(X_n^* + x^{\{n\}} - \mu, \tilde f^{\{n\}}_{h^{\{n\}},x^{\{n\}}}\right) \xrightarrow[n\to+\infty]{} \eta(X^*, Df(\mu)),$$
so the triangle inequality concludes the proof.
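As a last illustration (ours, not the paper's implementation), the finite-difference linearization $\tilde f_{h,x}$ appearing in Corollary 3 can be checked numerically to recover $Df(\mu)$ as $h\to 0$ and $x\to\mu$; the helper `fd_gradient` and the test function below are hypothetical choices.

```python
import numpy as np

# Sketch of the finite-difference linearization behind Corollary 3: the
# slope vector of forward differences with step h at a point x near mu
# approaches Df(mu) when h -> 0 and x -> mu.
def fd_gradient(f, x, h):
    """Forward-difference approximation of the gradient of f at x."""
    p = len(x)
    return np.array([(f(x + h * e) - f(x)) / h for e in np.eye(p)])

def f(x):
    return np.sin(x[0]) + x[0] * x[1] ** 2

mu = np.array([1.0, -0.5])
grad_mu = np.array([np.cos(mu[0]) + mu[1] ** 2, 2.0 * mu[0] * mu[1]])

for h, x in [(1e-1, mu + 0.05), (1e-2, mu + 0.01), (1e-4, mu + 1e-3)]:
    err = np.linalg.norm(fd_gradient(f, x, h) - grad_mu)
    print(f"h = {h:.0e}   ||x - mu|| = {np.linalg.norm(x - mu):.0e}   error = {err:.1e}")
```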