Local Polynomial Estimation for Sensitivity Analysis on Models With Correlated Inputs
Sébastien DA VEIGA (1),(2), François WAHL (1), Fabrice GAMBOA (2)
(1): IFP-Lyon, France. (2): Institut de Mathématiques, Toulouse, France

Abstract
Sensitivity indices for models whose inputs are not independent are estimated by local polynomial techniques. Two original estimators based on local polynomial smoothers are proposed. Both have good theoretical properties, which are exhibited and also illustrated through analytical examples. The estimators are then used to carry out a sensitivity analysis on a real case of a kinetic model with correlated parameters.

KEY WORDS: Nonparametric regression; Global sensitivity indices; Conditional moments estimation.
Achieving better knowledge of refining processes usually requires building a kinetic model that predicts the output components produced in a unit given the input components introduced (the "feed") and the operating conditions. Such a model is based on the choice of a reaction mechanism depending on various parameters (e.g. kinetic constants). But the complexity of the mechanism, the variability of the behavior of catalysts when they are used, and the difficulty of observations and experiments imply that most often these parameters cannot be inferred from theoretical considerations and need to be estimated through practical experiments. This estimation procedure leads to considering them uncertain, and this uncertainty spreads to the model predictions, which can be highly problematic in real situations. It is then essential to quantify this uncertainty and to study the influence of parameter variations on the model outputs through uncertainty and sensitivity analysis.

During the last decades much effort in the mathematical analysis of physical processes has focused on modeling and reasoning with uncertainty and sensitivity. Model calibration and validation are examples where sensitivity and uncertainty analysis have become essential investigative scientific tools. Roughly speaking, uncertainty analysis refers to the inherent variations of a model (e.g. a modeled physical process) and is helpful in finding the relation between some variability or probability distribution on input parameters and the variability and probability distribution of outputs, while sensitivity analysis investigates the effects of varying a model input on the outputs and ascertains how much a model depends on each or some of its inputs.

Over the years several mathematical and computer-assisted methods have been developed to carry out global sensitivity analysis, and the reader may refer to the book of Saltelli, Chan & Scott (2000) for a wide and thorough review. Among these methods, a particularly popular class is the one composed of "variance-based" methods, which is detailed below. Let us consider a mathematical model given by
$$Y = \eta(X) \qquad (1)$$
where $\eta : \mathbb{R}^d \to \mathbb{R}$ is the modeling function, $Y \in \mathbb{R}$ represents the output or prediction of the model and $X = (X_1, \ldots, X_d)$ is the $d$-dimensional real vector of the input factors or parameters. The vector of input parameters is treated as a random vector, which implies that the output is also a random variable. In variance-based methods, we are interested in explaining the variance $\mathrm{Var}(Y)$ through the variations of the $X_i$, $i = 1, \ldots, d$, and we decompose $\mathrm{Var}(Y)$ as follows:
$$\mathrm{Var}(Y) = \mathrm{Var}(E(Y|X_i)) + E(\mathrm{Var}(Y|X_i))$$
for all $i = 1, \ldots, d$, where $E(Y|X_i)$ and $\mathrm{Var}(Y|X_i)$ are respectively the conditional expectation and variance of $Y$ given $X_i$. The importance of $X_i$ on the variance of $Y$ is linked to how well $E(Y|X_i)$ fits $Y$ and can then be measured by the first order sensitivity index
$$S_i = \frac{\mathrm{Var}(E(Y|X_i))}{\mathrm{Var}(Y)},$$
also called the correlation ratio. We can also introduce sensitivity indices of higher orders to take into account input interactions. For example, the second order sensitivity index for $X_i$ and $X_j$ is
$$S_{ij} = \frac{\mathrm{Var}(E(Y|X_i, X_j)) - \mathrm{Var}(E(Y|X_i)) - \mathrm{Var}(E(Y|X_j))}{\mathrm{Var}(Y)},$$
and so on for other orders, see Saltelli et al. (2000) for details. In the case of independent inputs, two techniques, Sobol (Sobol' 1993) and FAST (Cukier, Fortuin, Shuler, Petschek & Schaibly 1973), are the most popular methods for estimating the $S_i$ indices.
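To fix ideas before the correlated case is addressed, the definition of $S_i$ translates directly into a brute-force "double-loop" Monte-Carlo estimate when the inputs are independent and the model is cheap to evaluate. The following minimal sketch is our own illustration, not from the paper; the toy model `eta` and the sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    # Hypothetical toy model with three independent U(0,1) inputs.
    return x[..., 0] + 2.0 * x[..., 1] ** 2 + np.sin(x[..., 2])

def first_order_index(i, n_outer=2000, n_inner=500, d=3):
    """Brute-force double-loop Monte-Carlo estimate of
    S_i = Var(E(Y|X_i)) / Var(Y) for independent inputs."""
    cond_means = np.empty(n_outer)
    for j in range(n_outer):
        xi = rng.uniform()                  # fix X_i at a random draw
        x = rng.uniform(size=(n_inner, d))  # re-sample the other inputs
        x[:, i] = xi
        cond_means[j] = eta(x).mean()       # estimates E(Y | X_i = xi)
    var_y = eta(rng.uniform(size=(100_000, d))).var()
    return cond_means.var() / var_y

print([round(first_order_index(i), 3) for i in range(3)])
```

The inner loop re-samples all inputs except $X_i$; its cost (n_outer × n_inner model runs per index) is precisely what the methods discussed next try to avoid.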
Although powerful and computationally efficient, the Sobol and FAST methods rely on the assumption of independent inputs, which rarely holds in practice for kinetic models. Nevertheless, three original methods, due to Ratto, Tarantola & Saltelli (2001), Jacques, Lavergne & Devictor (2004) and Oakley & O'Hagan (2004), try to deal with this problem. The first one sets out to calculate the sensitivity indices by using a replicated latin hypercube sampling, but this approach requires a large number of model evaluations to reach an acceptable precision. The second one is based on the idea of building new sensitivity indices which generalize the original ones by taking into account blocks of correlations among the inputs. This method is however useless when many input factors are correlated. The last approach is that of Oakley & O'Hagan (2004) and relies upon the idea of approximating the function η in model (1) by a so-called 'kriging' response surface (Santner, Williams & Notz 2003) and of computing analytical expressions of the sensitivity indices based on the results of the kriging approximation. However appealing and accurate, these analytical expressions involve multidimensional integrals that are only tractable when the conditional densities of the input factors are known and easy to integrate. If this is not the case, the multidimensional integrals must be approximated numerically, but at high computational cost. We therefore propose a new way of estimating sensitivity indices through an intermediate technique, in the sense that it is based on a sample from the joint density of the inputs and the output like Ratto et al. (2001), but also on a nonparametric regression model like Oakley & O'Hagan (2004). This approach does not require as many model evaluations as Ratto et al. (2001) and, in the general case, does not require approximating multidimensional integrals as Oakley & O'Hagan (2004) does.

To deal with correlated inputs, we consider in this paper a new method based on local polynomial approximations of conditional moments (see the work of Fan & Gijbels (1996) and Fan, Gijbels, Hu & Huang (1996) on the conditional expectation, and the papers of Fan & Yao (1998) and Ruppert, Wand, Holst & Hössjer (1997) on the conditional variance). Given the form of the sensitivity indices, local polynomial regression can be used to estimate them. This approach not only allows computing a sensitivity index through an easy black-box procedure but also reaches a good precision.

The paper is organized as follows. In Section 1 we review the methods of Ratto et al. (2001), Jacques et al. (2004) and Oakley & O'Hagan (2004) and discuss their merits and drawbacks. In Section 2 we propose and study two new estimators for sensitivity indices relying on local polynomial methods. In Section 3 we present both analytical and practical examples. In Section 4 we finally give some conclusions and directions for future research.
1. MODELS WITH CORRELATED INPUTS
When the inputs are independent, Sobol showed that the sum of the sensitivity indices of all orders is equal to 1, due to an orthogonal decomposition of the function η (Sobol' 1993). Indeed, sensitivity indices naturally arise from this functional ANOVA decomposition. Nevertheless, when the inputs are correlated, this property no longer holds, because such a decomposition cannot be done without taking into account the joint distribution of the inputs. If one decides to estimate sensitivity indices under the independence hypothesis although it does not hold, the results and consequently their interpretation can be highly misleading, see the first example of Section 3.1. But we can still consider the initial ANOVA decomposition and work with the original sensitivity indices without ignoring the correlation; then, when quantifying the first order sensitivity index of a particular input factor, a part of the sensitivity of all the other input factors correlated with it is also taken into account. Thus the same information is considered several times, and the interpretation of sensitivity indices when the inputs are not independent becomes problematic. However, whether the input factors are independent or not, the first-order sensitivity index still points out which factor (if fixed) will mostly reduce the variance of the output. Thus, if the goal of the practitioner is to conduct a 'Factors Prioritisation' (Saltelli, Tarantola, Campolongo & Ratto 2004), i.e. identifying the factor that one should fix to achieve the greatest reduction in the uncertainty of the output, first-order sensitivity indices remain the measure of importance to study, see Saltelli et al. (2004). Considering that this goal is common for practitioners, being able to compute first-order sensitivity indices when the inputs are no longer independent is an interesting challenge.

Beyond this problem of interpretation, correlation also makes the computational methods FAST and Sobol unusable, as they have been designed for the independent case. To get over these difficulties, it is first possible to build 'new' sensitivity indices that would generalize the original ones and match their properties, allowing interpretation. This is the idea of the multidimensional sensitivity analysis of Jacques (Jacques et al. 2004) detailed in the next section. Secondly, Ratto et al. (2001) tried to keep working with the original sensitivity indices and to compute them as described in Section 1.2, even if they do not give clues for interpretation. The authors generate replicated latin hypercube samples to approximate conditional densities. Finally, Oakley & O'Hagan (2004) suggest approximating the function η in model (1) by a kriging response surface, which allows analytical expressions of the sensitivity indices to be obtained through multidimensional integrals.

1.1 Multidimensional Sensitivity Analysis

To define multidimensional sensitivity indices, Jacques et al. (2004) suggest splitting $X$ into $p$ vectors $U_j$, $j = 1, \ldots, p$, each of size $k_j$, such that $U_j$ is independent from $U_l$ for $1 \le j, l \le p$, $j \ne l$:
$$X = (X_1, \ldots, X_d) = (\underbrace{X_1, \ldots, X_{k_1}}_{U_1}, \underbrace{X_{k_1+1}, \ldots, X_{k_1+k_2}}_{U_2}, \ldots, \underbrace{X_{k_1+\cdots+k_{p-1}+1}, \ldots, X_{k_1+\cdots+k_p}}_{U_p})$$
where $k_1 + k_2 + \cdots + k_p = d$. For example, if $X = (X_1, X_2, X_3)$ where $X_1$ is independent of $X_2$ and $X_3$ but $X_2$ and $X_3$ are correlated, we set $U_1 = X_1$ and $U_2 = (X_2, X_3)$. They then build first order multidimensional sensitivity indices using the $U_j$ vectors:
$$S_{U_j} = \frac{\mathrm{Var}(E(Y|U_j))}{\mathrm{Var}(Y)} = \frac{\mathrm{Var}\big(E(Y|X_{k_1+\cdots+k_{j-1}+1}, \ldots, X_{k_1+\cdots+k_j})\big)}{\mathrm{Var}(Y)}$$
for $j = 1, \ldots, p$.
Note that if the inputs are independent, these sensitivity indices have the same expression as in classical sensitivity analysis. Finally, it is possible to compute these indices by Monte-Carlo estimation.

Nevertheless, this method has a main drawback that is hard to overcome. If all the inputs are correlated, the $U_j$ vectors cannot be defined (except in the trivial case $U_1 = X$) and interpretation is not possible. The problem remains the same if too many inputs are dependent, because this situation leads to very few multidimensional indices. Moreover, identifying a set of correlated variables $U_j$ with a high sensitivity index does not allow pointing out whether this is due to one particular input of the set, as we cannot differentiate among them. We will illustrate this phenomenon in the second example of Section 3.1.

1.2 Correlation-Ratios With Known Conditional Density Functions

The estimator introduced by Ratto et al. (2001) was first discussed in McKay (1996) and is based on samples from the conditional density functions of $Y$ given $X_i$, $i = 1, \ldots, d$.

Let $(X^j)_{j=1,\ldots,n}$ be an i.i.d. sample of size $n$ from the distribution of the vector $X$. Then $(X_i^j)_{j=1,\ldots,n}$ is an i.i.d. sample of size $n$ from the distribution of the input factor $X_i$. For each realization $X_i^j$ of this sample, let $(Y_i^{jk})_{k=1,\ldots,r}$ be an i.i.d. sample of size $r$ from the conditional density function of $Y$ given $X_i = X_i^j$, and define the sample means
$$\bar{Y}_i^j = \frac{1}{r} \sum_{k=1}^r Y_i^{jk}, \qquad \bar{Y}_i = \frac{1}{n} \sum_{j=1}^n \bar{Y}_i^j.$$
Note that $\bar{Y}_i^j$ and $\frac{1}{r}\sum_{k=1}^r (Y_i^{jk} - \bar{Y}_i^j)^2$ respectively estimate the conditional expectation $E(Y|X_i = X_i^j)$ and the conditional variance $\mathrm{Var}(Y|X_i = X_i^j)$, while $\bar{Y}_i$ estimates $E(Y)$.

Using these moment estimators, the numerator of the first order sensitivity index $S_i$, $\mathrm{Var}(E(Y|X_i))$, can be estimated by the empirical estimator
$$\frac{1}{n} \sum_{j=1}^n \big(\bar{Y}_i^j - \bar{Y}_i\big)^2.$$
Similarly the denominator of $S_i$, $\mathrm{Var}(Y)$, is estimated by
$$\frac{1}{n} \sum_{j=1}^n \frac{1}{r} \sum_{k=1}^r \big(Y_i^{jk} - \bar{Y}_i\big)^2.$$
The estimator of the first order sensitivity index $S_i$ of the input factor $X_i$, $i = 1, \ldots, d$, is then defined as
$$\hat{S}_i = \frac{SSB}{SST}$$
where
$$SSB = r \sum_{j=1}^n \big(\bar{Y}_i^j - \bar{Y}_i\big)^2 \quad \text{and} \quad SST = \sum_{j=1}^n \sum_{k=1}^r \big(Y_i^{jk} - \bar{Y}_i\big)^2.$$
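As an illustration of this estimator, here is a minimal sketch (ours, not the authors' code) for a toy case where the conditional density of $Y$ given $X_1$ is known exactly, namely $Y = X_1 + X_2$ with $(X_1, X_2)$ standard bivariate normal with correlation 0.5:

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.5   # correlation between X1 and X2 (toy value)

def sample_y_given_x1(x1, r):
    # Y = X1 + X2 with X2 | X1 = x1 ~ N(rho * x1, 1 - rho**2)
    return x1 + rng.normal(rho * x1, np.sqrt(1.0 - rho ** 2), size=r)

n, r = 500, 100                        # outer and inner sample sizes
x1 = rng.normal(size=n)                # sample of the input factor X1
y = np.stack([sample_y_given_x1(v, r) for v in x1])   # shape (n, r)

ybar_j = y.mean(axis=1)                # estimates of E(Y | X1 = x1_j)
ybar = ybar_j.mean()
SSB = r * np.sum((ybar_j - ybar) ** 2)
SST = np.sum((y - ybar) ** 2)
print("S1 estimate:", SSB / SST)       # exact value (1 + rho)**2 / (2 + 2*rho) = 0.75
```

Note the $n \times r = 50{,}000$ model evaluations needed for this single index, which is the computational burden pointed out below.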
To compute these indices and to generate the samples needed, Ratto uses two different methods: pure Monte-Carlo sampling and a single replicated Latin Hypercube (r-LHS) sampling. It is crucial to note, however, that these two methods require a huge number of model evaluations to reach a good precision and can only be used for cases where model runs have very low computational cost.

1.3 Bayesian Sensitivity Analysis

The idea of Oakley & O'Hagan (2004) is to see the function η(·) as an unknown smooth function and to formulate a prior distribution for it. More precisely, it is modeled as the realization of a Gaussian stationary random field with given mean and covariance functions. Then, given a set of values $y_i = \eta(x_i)$, we can derive the posterior distribution of η(·) by classical Bayesian considerations. The prior distribution of η(x) is a Gaussian stationary field:
$$\eta(x) = h(x)^t \beta + Z(x)$$
conditionally on β and σ², where h(·) is a vector of q known regression functions and Z(x) is a Gaussian stationary random field with zero mean and covariance function $\sigma^2 c(x, x')$. The vector h(·) and the correlation function c(·,·) are to be chosen in order to incorporate some information about how the output responds to the inputs and about the amount of smoothness we require on the output, respectively. We refer the reader to Santner et al. (2003) and to Kennedy & O'Hagan (2001) for a detailed discussion of these choices. The second stage prior concerns the conjugate prior form for β and σ², which is chosen to be a normal inverse gamma distribution. Now, assuming we observe a set y of n values $y_i = \eta(x_i)$, we can derive that the posterior distribution of η(·) given these data is a Student distribution, see Oakley & O'Hagan (2004) for details.

Using this posterior distribution, sensitivity indices can be computed analytically through multidimensional integrals involving functions of the observations and the conditional distributions of the input factors only. The main advantage of this Bayesian approach is that the model is only evaluated to calculate the quantities above, i.e. to 'fit' the response surface. Once this is done, the estimation of sensitivity indices involves only the conditional distributions of the input factors. When the number of model runs is fixed, this method clearly reduces the standard errors of the estimated sensitivity indices relative to Monte-Carlo methods such as Sobol (when the input factors are independent), and it can still be used when the input factors are not independent.

However, the multidimensional integrals leading to the computation of the sensitivity indices, if not tractable analytically, need to be estimated. Let us describe more particularly one of the estimators proposed in Oakley & O'Hagan (2004). We keep the authors' notations and denote by $E^*$ the expectations defined with respect to the posterior distribution of η(·).
The numerator of the first-order sensitivity index of $Y$ with respect to $X_1$ is estimated by
$$E^*\big(\mathrm{Var}(E(Y|X_1))\big) = E^*\big(E(E(Y|X_1)^2)\big) - E^*\big(E(Y)^2\big)$$
and one of the quantities involved in the computation of $E^*(E(E(Y|X_1)^2))$ is for example
$$U_1 = \int_{\mathbb{R}} \int_{\mathbb{R}^{d-1}} \int_{\mathbb{R}^{d-1}} c(x, x^*) \, dF_{-1|1}(x_{-1}|x_1) \, dF_{-1|1}(x'_{-1}|x_1) \, dF_1(x_1)$$
where $F_{-1|1}$ is the conditional distribution of $X_{-1}$ (the subvector of $X$ containing all elements except $X_1$) given $X_1$, $F_1$ is the marginal distribution of $X_1$, and $x^*$ denotes the vector with elements made up of $x_1$ and $x'_{-1}$ in the same way as $x$ is composed of $x_1$ and $x_{-1}$. If the conditional distribution $F_{-1|1}$ is not analytically known, we first need to estimate it with a sample from the joint distribution $F$. Many methods have been developed to do so; let us just mention kernel techniques, for example. But in general, in high dimension the data are very sparsely distributed and it is difficult to get an accurate approximation of conditional distributions, since the so-called curse of dimensionality arises. For instance, the best possible MSE rate with kernel techniques is of order $n^{-4/(4+d)}$, which becomes worse as $d$ gets larger. Moreover, even if we could get a good approximation of $F_{-1|1}$, there still remains the problem of evaluating the multidimensional integrals. Indeed the dimensionality of these integrals can reach $2d-1$, as for $U_1$ above. Since these integrals cannot in general be separated into unidimensional integrals, approximating them with sufficient accuracy is not an obvious mathematical problem. Deterministic schemes cannot reasonably be considered, and with Monte-Carlo or quasi Monte-Carlo sampling (Owen 2005) thousands (or millions) of draws are required to get a reasonable accuracy.

With unknown densities, even if, conceptually, sampling rather than analytical integration in the Oakley and O'Hagan approach seems reasonable, the results could be highly affected by the curse of dimensionality. Let us mention that Pr. O'Hagan has public domain software carrying out this analysis; however it does not yet allow dependent inputs.
2. NEW ESTIMATION METHODOLOGY
Our approach is to estimate the conditional moments $E(Y|X_i = X_i^j)$ and $\mathrm{Var}(Y|X_i = X_i^j)$ with an intermediate method between the one of Ratto et al. (2001) and that of Oakley & O'Hagan (2004). We first use a sample $(X_i, Y_i)$ to estimate the conditional moments with nonparametric tools (provided they are smooth functions of the input factors). Then, we compute sensitivity indices by using another sample of the input factors only (and thus no more model runs are needed). While Oakley & O'Hagan (2004) approximate the function $\eta(X)$ in $\mathbb{R}^d$, we approximate it marginally, i.e. we approximate the conditional expectations $E(\eta(X)|X_i)$ in $\mathbb{R}$. This allows us to overcome the multidimensional integration problem of the Bayesian sensitivity analysis.

To simplify the notations, until Section 2.4 $(X, Y)$ will stand for a bivariate random vector (i.e. $X$ is unidimensional). As the variance may be decomposed as $\mathrm{Var}(Y) = \mathrm{Var}(E(Y|X)) + E(\mathrm{Var}(Y|X))$, the index we wish to estimate can be written
$$S = \frac{\mathrm{Var}(E(Y|X))}{\mathrm{Var}(Y)} \quad \text{or} \quad S = 1 - \frac{E(\mathrm{Var}(Y|X))}{\mathrm{Var}(Y)}. \qquad (2)$$
These expressions clearly give two ways of estimating $S$: the issue is to be able to estimate $\mathrm{Var}(E(Y|X))$ or alternatively $E(\mathrm{Var}(Y|X))$, obviously by first estimating the conditional moments $E(Y|X = x)$ and $\mathrm{Var}(Y|X = x)$ ($x \in \mathbb{R}$). In both cases the denominator term $\mathrm{Var}(Y)$ can be easily estimated. To approximate the conditional moments, we propose to use local polynomial regression. This highly efficient statistical tool is easy to apprehend, as it is close to the weighted least-squares approach in regression problems. Only basic results will be presented here; for a detailed picture of the subject the interested reader is referred to Fan & Gijbels (1996).

2.1 Formulation of the Estimators

Let $(X_i, Y_i)_{i=1,\ldots,n}$ be a two-dimensional i.i.d. sample of a real random vector $(X, Y)$. Assuming that $X$ and $Y$ are square integrable, we may write a heteroskedastic regression model of $Y_i$ on $X_i$, exhibiting the conditional expectation and variance, as
$$Y_i = m(X_i) + \sigma(X_i)\epsilon_i, \qquad i = 1, \ldots, n,$$
where $m(x) = E(Y|X = x)$ and $\sigma^2(x) = \mathrm{Var}(Y|X = x)$ ($x \in \mathbb{R}$) are the conditional moments and the errors $\epsilon_1, \ldots, \epsilon_n$ are independent random variables satisfying $E(\epsilon_i|X_i) = 0$ and $\mathrm{Var}(\epsilon_i|X_i) = 1$. Usually $\epsilon_i$ and $X_i$ are assumed to be independent, although this is not the case in our work. Note that results for correlated errors have been developed recently (Vilar-Fernández & Francisco-Fernández (2002) for the autoregressive case, for example). Local polynomial fitting consists in approximating the regression function $m$ locally by a $p$-th order polynomial,
$$m(z) \approx \sum_{j=0}^{p} \beta_j (z - x)^j$$
for $z$ in a neighborhood of $x$. This polynomial is then fitted to the observations $(X_i, Y_i)$ by solving the weighted least-squares problem
$$\min_{\beta} \sum_{i=1}^{n} \Big( Y_i - \sum_{j=0}^{p} \beta_j (X_i - x)^j \Big)^2 K_1\Big(\frac{X_i - x}{h_1}\Big) \qquad (3)$$
where $K_1(\cdot)$ denotes a kernel function and $h_1$ is a smoothing parameter (or bandwidth). In this case, if $\hat{\beta}(x) = (\hat{\beta}_0(x), \ldots, \hat{\beta}_p(x))^T$ denotes the minimizer of (3), we have
$$\hat{m}(x) = \hat{\beta}_0(x),$$
while the $\nu$-th derivative of $m(x)$ is estimated via the relation $\hat{\beta}_\nu(x) = \hat{m}^{(\nu)}(x)/\nu!$. The bandwidth $h_1$ is chosen to balance the bias and variance of the estimator.
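As a concrete illustration, the weighted least-squares problem (3) can be solved pointwise in a few lines. This is a minimal sketch under assumptions of our own choosing (Gaussian kernel, fixed bandwidth; the paper selects the bandwidth by EBBS, see Section 2.2, and the function names are ours):

```python
import numpy as np

def local_poly_fit(x0, X, Y, h, p=1):
    """Solve the weighted least-squares problem (3) at point x0 and
    return beta_hat; beta_hat[0] is m_hat(x0)."""
    # Gaussian kernel weights centered at x0
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    # Design matrix with columns (X_i - x0)^j, j = 0..p
    D = np.vander(X - x0, N=p + 1, increasing=True)
    # Solve the sqrt-weighted least-squares system for stability
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * D, sw * Y, rcond=None)
    return beta

# Toy usage: recover m(x) = sin(x) from noisy data
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=200)
Y = np.sin(X) + 0.3 * rng.normal(size=200)
m_hat = [local_poly_fit(x0, X, Y, h=0.4)[0] for x0 in (-1.0, 0.0, 1.0)]
print(m_hat)  # should be close to sin(-1), sin(0), sin(1)
```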
Finally, remark that the particular case $p = 0$ (constant fit) leads to the well-known Nadaraya-Watson estimator $\hat{m}_{NW}(x)$ of the conditional expectation, given explicitly by
$$\hat{m}_{NW}(x) = \frac{\sum_{i=1}^{n} Y_i K_1\big(\frac{X_i - x}{h_1}\big)}{\sum_{i=1}^{n} K_1\big(\frac{X_i - x}{h_1}\big)},$$
see Wand & Jones (1994).

Estimation of the conditional variance is less straightforward. If the regression function $m$ were known, the problem of estimating $\sigma^2(\cdot)$ could be regarded as a local polynomial regression of $r_i$ on $X_i$ with $r_i = (Y_i - m(X_i))^2$, since $E(r|X = x) = \sigma^2(x)$ with $r = (Y - m(X))^2$. But in practice $m$ is unknown. A natural approach is to substitute $m(\cdot)$ by its estimate $\hat{m}(\cdot)$ defined as above and to get the residual-based estimator $\hat{\sigma}^2(x)$ by solving, as previously, the weighted least-squares problem
$$\min_{\gamma} \sum_{i=1}^{n} \Big( \hat{r}_i - \sum_{j=0}^{q} \gamma_j (X_i - x)^j \Big)^2 K_2\Big(\frac{X_i - x}{h_2}\Big) \qquad (4)$$
where $\hat{r}_i = (Y_i - \hat{m}(X_i))^2$, $K_2(\cdot)$ is a kernel and $h_2$ a smoothing parameter. Note that the kernel $K_2(\cdot)$ is not necessarily chosen equal to the kernel $K_1(\cdot)$. Then
$$\hat{\sigma}^2(x) = \hat{\gamma}_0(x)$$
where $\hat{\gamma}(x) = (\hat{\gamma}_0(x), \ldots, \hat{\gamma}_q(x))$ is the minimizer of (4). As previously, the smoothing parameter $h_2$ has to be chosen to balance the bias and variance of the estimator, see Fan & Yao (1998).

Going back to the equalities in (2), the last step is to estimate the quantities $\mathrm{Var}(E(Y|X))$ and $E(\mathrm{Var}(Y|X))$ by using the local polynomial estimators of the conditional moments defined right above. To do this, let us assume we have another i.i.d. sample $(\tilde{X}_j)_{j=1,\ldots,n'}$ with the same distribution as $X$. If the functions $m(\cdot)$ and $\sigma^2(\cdot)$ were known, we could estimate $\mathrm{Var}(E(Y|X)) = \mathrm{Var}(m(X))$ and $E(\mathrm{Var}(Y|X)) = E(\sigma^2(X))$ with the classical empirical moments
$$\frac{1}{n'-1} \sum_{j=1}^{n'} \big( m(\tilde{X}_j) - \bar{m} \big)^2 \quad \text{and} \quad \frac{1}{n'} \sum_{j=1}^{n'} \sigma^2(\tilde{X}_j)$$
where $\bar{m} = \frac{1}{n'} \sum_{j=1}^{n'} m(\tilde{X}_j)$. As $m(\cdot)$ and $\sigma^2(\cdot)$ are unknown, the main idea is to replace them by their local polynomial estimators, which leads to considering
$$\hat{T}_1 = \frac{1}{n'-1} \sum_{j=1}^{n'} \big( \hat{m}(\tilde{X}_j) - \hat{\bar{m}} \big)^2 \quad \text{and} \quad \hat{T}_2 = \frac{1}{n'} \sum_{j=1}^{n'} \hat{\sigma}^2(\tilde{X}_j)$$
where $\hat{\bar{m}} = \frac{1}{n'} \sum_{j=1}^{n'} \hat{m}(\tilde{X}_j)$ and $\hat{m}(\cdot)$ and $\hat{\sigma}^2(\cdot)$ are the local polynomial estimators of $m(\cdot)$ and $\sigma^2(\cdot)$ introduced above. It is important to note that we need two samples: the first one, $(X_i, Y_i)_{i=1,\ldots,n}$, to compute $\hat{m}(\cdot)$ and $\hat{\sigma}^2(\cdot)$, and the second one, $(\tilde{X}_j)_{j=1,\ldots,n'}$, to finally compute the empirical estimators $\hat{T}_1$ and $\hat{T}_2$.

2.2 Bandwidth and Order Selection

The selection of the smoothing parameters $h_1$ and $h_2$, and to a lesser extent of the polynomial orders $p$ and $q$, can be crucial to get the least mean squared error (MSE) for the estimators $\hat{T}_1$ and $\hat{T}_2$. Classically the MSE consists of a bias term plus a variance term and so is minimized by finding a compromise between bias and variance.

Concerning this choice, the reader is referred to Fan et al. (1996), Fan & Yao (1998) or Ruppert (1997). Most of the methods suggested by these authors rely upon asymptotic arguments, and their efficiency for finite sample cases is not clear. In practice, cross-validation methods can be used for the finite sample case (Jones, Marron & Sheather 1996), but in the examples of Section 3 we will use the empirical-bias bandwidth selector (EBBS) of Ruppert, which appears to be efficient on simulated data. EBBS is based on estimating the MSE empirically rather than with an asymptotic expression. The choice of the polynomial orders is more subjective.
Concerning the estimation of the conditional expectation, Fan & Gijbels (1996) recommend using a $(\nu+1)$-th or $(\nu+3)$-th order polynomial to estimate the $\nu$-th derivative of $m(x)$, following theoretical considerations on the asymptotic bias of $\hat{m}(x)$ at the boundary. We would then be led to take $p = 1$ or $p = 3$ to estimate the 0th derivative $m(x)$. But Ruppert, Wand & Carroll (2003) suggest that this conclusion should be balanced by simulation studies and stress that $p = 2$ often outperforms $p = 1$ and $p = 3$. The only common conclusion is that local linear regression ($p = 1$) is usually superior to kernel regression (the Nadaraya-Watson estimator obtained with $p = 0$). This is the reason why we will only consider and study local linear regression for $m(x)$ in the next theoretical and practical sections. The choice is still difficult when estimating the conditional variance, as we have to choose $p$ and $q$ simultaneously. Once more, the authors are not unanimous: Fan & Yao (1998) recommend the case $p = 1, q = 1$, whereas Ruppert et al. (1997) suggest $p = 2, q = 1$ or $p = 3, q = 1$. However, in the simulations we have carried out, the choice $p = 1, q = 1$ is adequate and satisfactory in terms of precision. This is the reason why we have decided to consider only the case $p = 1, q = 1$ for both theoretical and practical results.

2.3 Theoretical Properties of the Estimators

The properties of $\hat{T}_1$ and $\hat{T}_2$ strongly depend on the asymptotic results for the bias and variance of the local linear estimators $\hat{m}(\cdot)$ and $\hat{\sigma}^2(\cdot)$. We only give here the two main results; all assumptions (A0)-(A4), (B0)-(B4), (C0) and proofs are given in the appendix for readability. $E_X$ and $\mathrm{Var}_X$ stand for the conditional expectation and variance given the predictors $X = (X_1, \ldots, X_n)$. The expression $o_P(\varphi(h))$ is equal to $\varphi(h)\,o_P(1)$ for a given function $\varphi$, where $o_P(1)$ is the standard notation for a sequence of random variables that converges to zero in probability.

Theorem 1.
Under assumptions (A0)-(A4) and (C0), the estimator $\hat{T}_1$ is asymptotically unbiased. More precisely,
$$E_X(\hat{T}_1) = \mathrm{Var}(E(Y|X)) + M_1 h_1^2 + \frac{M_2}{n h_1} + o_P(h_1^2),$$
where $M_1$ and $M_2$ are constants given in the appendix.

Remark 1. It would be interesting to calculate the variance of this estimator, but this would require expressions for the third and fourth moments of the local linear estimator $\hat{m}(\cdot)$ (see the appendix). This is not an obvious problem and, to the best of our knowledge, it has not been addressed in the literature. It is beyond the scope of the present paper but is an interesting problem for future research. Nevertheless, the variance can be estimated in practical cases through bootstrap methods, for example (Efron & Tibshirani 1994).
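For Remark 1, a generic bootstrap sketch may help (our own illustration, not the authors' code; `t1` stands for any routine computing $\hat{T}_1$ from the two samples and is hypothetical):

```python
import numpy as np

def bootstrap_var(t1, X, Y, Xtilde, B=200, seed=0):
    """Bootstrap estimate of Var(T1_hat): resample the (X_i, Y_i) pairs,
    recompute the estimator B times and take the empirical variance."""
    rng = np.random.default_rng(seed)
    n = len(X)
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        stats[b] = t1(X[idx], Y[idx], Xtilde)
    return stats.var(ddof=1)
```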
Theorem 2.

Under assumptions (B0)-(B4) and (C0), the estimator $\hat{T}_2$ is consistent. More precisely,
$$E_X(\hat{T}_2) = E(\mathrm{Var}(Y|X)) + V_1 h_2^2 + o_P(h_1^2 + h_2^2)$$
and
$$\mathrm{Var}_X(\hat{T}_2) = \frac{1}{n'} \Big\{ \mathrm{Var}(\mathrm{Var}(Y|X)) + V_2 h_2^2 + V_3 h_2^2 + \frac{V_4}{n h_2} + o_P\Big(h_1^2 + h_2^2 + \frac{1}{\sqrt{n h_2}}\Big) \Big\}$$
where $V_1$, $V_2$, $V_3$ and $V_4$ are constants given in the appendix.

2.4 Estimation of the Sensitivity Indices

We now return to the general case where $X$ is multidimensional. The goal is to get an estimate of $S_i$ for $i = 1, \ldots, d$ by using one of the two estimators $\hat{T}_1$ and $\hat{T}_2$. We need two samples to compute each of them, i.e. a sample $(X_i^k, Y^k)_{k=1,\ldots,n}$ to estimate $\hat{m}(\cdot)$ and $\hat{\sigma}^2(\cdot)$ and a sample $(\tilde{X}_i^l)_{l=1,\ldots,n'}$ to get $\hat{T}_1$ and $\hat{T}_2$, where $(X^k)_{k=1,\ldots,n}$ and $(\tilde{X}^l)_{l=1,\ldots,n'}$ are samples from the joint distribution of the $d$-dimensional input factors $X = (X_i)_{i=1,\ldots,d}$ and $(Y^k)_{k=1,\ldots,n}$ is a sample of the output $Y$. Note that the model is run only for the first sample and not for the second one. Three situations can arise:

1. Sampling from the joint distribution of $X$ has low computational cost and running the model to compute $(Y^k)_{k=1,\ldots,n}$ is cheap. This is the ideal situation. Indeed, in this case the two samples $(X_i^k, Y^k)_{k=1,\ldots,n}$ and $(\tilde{X}_i^l)_{l=1,\ldots,n'}$ can be generated independently and be as large as required;

2. Sampling from the joint distribution of $X$ has low computational cost but model evaluations do not. In this case (also pointed out by Oakley & O'Hagan (2004)) a moderate-sized sample $(X_i^k, Y^k)_{k=1,\ldots,n}$ is used in order to fit the conditional moments. However, to compute $\hat{T}_1$ and $\hat{T}_2$ we can then use a sample $(\tilde{X}_i^l)_{l=1,\ldots,n'}$ of large size;

3. Sampling from the joint distribution of $X$ has high computational cost. This case can arise in practice, for example, when the input factors are obtained through a procedure based on experimental data and optimization routines. We then have an initial sample $(X^j)_{j=1,\ldots,N}$ of limited size $N$ that we wish to use for the two steps of the estimation. The first idea is to split it and to use the first part to get the sample $(X_i^k, Y^k)_{k=1,\ldots,n}$ and the second one to get $(\tilde{X}_i^l)_{l=1,\ldots,n'}$. The drawback of this method clearly arises if $N$ is very small. Another way to tackle the problem is to use the well-known leave-one-out procedure, which gives a better approximation than data splitting. As suggested by the Associate Editor, another possible method could be to use the sample of size $N$ to estimate the conditional moments and also to estimate the marginal densities of each input using, for instance, a nonparametric density estimator. One could then use these density estimates to get the sample $(\tilde{X}_i^l)_{l=1,\ldots,n'}$. The clear disadvantage of this procedure is that it may bias the final estimators. Some simulation runs, not reported here for lack of space, show that such a procedure leads to less efficient estimates, probably due to the large bias produced by nonparametric methods.

The last situation obviously leads to the least accurate approximations of first-order sensitivity indices. However, in general, the literature and results on sensitivity analysis assume that, if not analytically known, the joint distribution of the input factors can at least be sampled at low computational cost. This is the reason why we only describe here the procedure for estimating first-order sensitivity indices in cases 1 or 2.
We now assume that we have two samples $(X_i^k)_{k=1,\ldots,n}$ and $(\tilde{X}_i^l)_{l=1,\ldots,n'}$ obtained by one of the methods described right above. The estimation procedure for $S_i = \mathrm{Var}(E(Y|X_i))/\mathrm{Var}(Y)$ is the following (a code sketch is given below):

Step 1: Compute the output sample $(Y^k)_{k=1,\ldots,n}$ by running the model at $(X^k)_{k=1,\ldots,n}$;

Step 2: Compute $\hat{\sigma}_Y^2$, the classical unbiased estimator of the variance $\mathrm{Var}(Y)$:
$$\hat{\sigma}_Y^2 = \frac{1}{n-1} \sum_{k=1}^{n} \big(Y^k - \bar{Y}\big)^2;$$

Step 3: Use the sample $(X_i^k, Y^k)_{k=1,\ldots,n}$ to obtain $\hat{m}(\tilde{X}_i^l)$ for $l = 1, \ldots, n'$ and $\hat{m}(X_i^k)$ for $k = 1, \ldots, n$, using the smoothing parameter $h_1$ given by EBBS;

Step 4: Compute the squared residuals $\hat{r}_k = (Y^k - \hat{m}(X_i^k))^2$ for $k = 1, \ldots, n$ and apply the smoothing parameter $h_2$ obtained by EBBS to compute $\hat{\sigma}^2(\tilde{X}_i^l)$ for $l = 1, \ldots, n'$;

Step 5: Compute $\hat{T}_1$ with the $\hat{m}(\tilde{X}_i^l)$, $l = 1, \ldots, n'$, from Step 3, and compute $\hat{T}_2$ with the $\hat{\sigma}^2(\tilde{X}_i^l)$, $l = 1, \ldots, n'$, from Step 4;

Step 6: The estimates of $S_i$ are then
$$\hat{S}_i^{(1)} = \frac{\hat{T}_1}{\hat{\sigma}_Y^2} \quad \text{and} \quad \hat{S}_i^{(2)} = 1 - \frac{\hat{T}_2}{\hat{\sigma}_Y^2}. \qquad (5)$$

To obtain all the first-order sensitivity indices, repeat the procedure from Step 3 to Step 6 for $i = 1, \ldots, d$.
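A minimal end-to-end sketch of Steps 1-6 follows (ours, not the authors' implementation): it uses a local linear fit as in Section 2.1 and, as an admitted simplification, replaces EBBS by a crude rule-of-thumb bandwidth:

```python
import numpy as np

def local_linear(x0, X, Y, h):
    # Local linear fit (p = 1) at x0, Gaussian kernel; returns m_hat(x0).
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    D = np.column_stack([np.ones_like(X), X - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * D, sw * Y, rcond=None)
    return beta[0]

def first_order_index(Xi, Y, Xi_tilde, h1=None, h2=None):
    """Steps 2-6 for one input factor X_i.  (Xi, Y): learning sample,
    the model having been run in Step 1; Xi_tilde: second sample."""
    n = len(Xi)
    if h1 is None:
        # Crude rule-of-thumb bandwidth -- a stand-in for EBBS.
        h1 = 1.06 * Xi.std() * n ** (-1 / 5)
    h2 = h1 if h2 is None else h2
    var_y = Y.var(ddof=1)                                               # Step 2
    m_tilde = np.array([local_linear(x, Xi, Y, h1) for x in Xi_tilde])  # Step 3
    m_learn = np.array([local_linear(x, Xi, Y, h1) for x in Xi])
    r = (Y - m_learn) ** 2                                              # Step 4
    s2_tilde = np.array([local_linear(x, Xi, r, h2) for x in Xi_tilde])
    T1 = m_tilde.var(ddof=1)                                            # Step 5
    T2 = s2_tilde.mean()
    return T1 / var_y, 1.0 - T2 / var_y                                 # Step 6, eq. (5)
```

With the sample sizes of Remark 3 below (n around 100, n' around 2000), this runs in seconds, and the same two samples are reused for every input factor i.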
Remark 2. Given the theoretical properties of $\hat{T}_1$ and $\hat{T}_2$, and more precisely their nonparametric convergence rates, we can also expect a nonparametric convergence rate for $\hat{S}^{(1)}$ and $\hat{S}^{(2)}$.
Remark 3. In practice, our simulations show that $n$ of the order of 100 and $n'$ around 2000 are enough for an accurate estimation of the sensitivity indices.
3. EXAMPLES
In all the following examples we use the two estimators $\hat{S}^{(1)}$ and $\hat{S}^{(2)}$ defined in (5). As mentioned in Section 2.2, the conditional expectation is estimated here with local linear regression ($p = 1$) and the conditional variance with $p = 1$ and $q = 1$, the bandwidths being selected by the empirical-bias method of Ruppert (1997).

3.1 Analytical Examples

In this section, we carry out two different comparisons in order to study our two estimators from a numerical point of view. The first model has been chosen to underline their precision in correlated cases, where the FAST and Sobol methods are no longer efficient and where Jacques' approach for multidimensional sensitivity analysis is limited. We also show how interpretation with sensitivity indices obtained by neglecting correlation can be false. The second one is an example illustrating the performance of our estimators with respect to the method of Oakley and O'Hagan in a two-dimensional setting.

In the first analytical example, we study the model
$$Y = X_1 + X_2 + X_3$$
where $(X_1, X_2, X_3)$ is a three-dimensional normal vector with mean $0$ and covariance matrix
$$\Gamma = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & \rho\sigma \\ 0 & \rho\sigma & \sigma^2 \end{pmatrix}$$
where $\rho$ is the correlation of $X_2$ and $X_3$ and $\sigma > 0$ is the standard deviation of $X_3$. The first order sensitivity indices can be evaluated analytically:
$$S_1 = \frac{1}{2 + \sigma^2 + 2\rho\sigma}, \qquad S_2 = \frac{(1 + \rho\sigma)^2}{2 + \sigma^2 + 2\rho\sigma}, \qquad S_3 = \frac{(\sigma + \rho)^2}{2 + \sigma^2 + 2\rho\sigma}.$$
The first crucial remark to be made here is that we must take the correlations into account when estimating sensitivity indices if we want a serious investigation of this model. Indeed, let us consider for instance the case where $\sigma = 1.2$ and $\rho = -0.8$.
The formulas above then give
$$S_1 \approx 0.66, \qquad S_2 \approx 0.00, \qquad S_3 \approx 0.11,$$
indicating that $X_1$ is the input to fix in order to reach the highest variance reduction on $Y$. But if one neglects the correlation, for instance by computing these indices with the FAST method, i.e. working with a three-dimensional normal vector with mean $0$ and covariance matrix $I$ instead of $\Gamma$, one would estimate
$$S_1^0 \approx 0.29, \qquad S_2^0 \approx 0.29, \qquad S_3^0 \approx 0.42,$$
where $S_i^0$ stands for the sensitivity indices when $\rho = 0$. These results indicate that $X_3$ should be fixed to mostly reduce the variance of $Y$, which is absolutely wrong as the calculations above have shown. This simple example highlights the danger of neglecting the correlations between the inputs and the importance of taking them into consideration when computing sensitivity indices.

Otherwise, applying Jacques' idea to $X_1$ and the couple $(X_2, X_3)$, we also get the expression of the first order multidimensional sensitivity index
$$S_{\{2,3\}} = \frac{1 + \sigma^2 + 2\rho\sigma}{2 + \sigma^2 + 2\rho\sigma}.$$
Choosing $\rho = -0.2$ and $\sigma = 0.4$, we have
$$S_1 = S_{\{2,3\}} = 0.5, \qquad S_2 \approx 0.42, \qquad S_3 = 0.02,$$
so that the couple $(X_2, X_3)$ has the same importance as $X_1$: indeed $S_{\{2,3\}} = S_1$. But actually the high value of $S_{\{2,3\}}$ comes from $X_2$, as shown by the exact calculations above, which implies that the information on $S_{\{2,3\}}$ alone is not sufficient. With our method, however, we can estimate all the first order sensitivity indices individually: over 100 simulations with $n = 50$ and $n' = 1000$, both $\hat{S}_i^{(1)}$ and $\hat{S}_i^{(2)}$, $i = 1, 2, 3$, are close to the analytical values. We display in Figure 1 the boxplots of the distribution of the sensitivity indices over these 100 simulations with the estimator $\hat{T}_2$. Because of the mathematical complexity mentioned before for the computation of the variance of $\hat{T}_1$, we are not able to recommend one estimator over the other from a theoretical point of view. But in practice, we have observed that the variance of $\hat{T}_2$ is at least comparable to the variance of $\hat{T}_1$, and sometimes lower. Nevertheless, the computation of $\hat{T}_2$ is more difficult, as illustrated in Section 2.4.
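The analytical values used as dotted lines in Figure 1 below can also be checked independently of our estimators with a crude binned conditional-mean computation; here is a sketch, assuming the same values $\sigma = 1.2$ and $\rho = -0.8$ as above:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, rho = 1.2, -0.8          # values assumed in the text above
Gamma = np.array([[1, 0, 0],
                  [0, 1, rho * sigma],
                  [0, rho * sigma, sigma ** 2]])
X = rng.multivariate_normal(np.zeros(3), Gamma, size=200_000)
Y = X.sum(axis=1)

for i in range(3):
    # Crude estimate of Var(E(Y|X_i)): average Y within 100 quantile bins of X_i.
    bins = np.quantile(X[:, i], np.linspace(0, 1, 101))
    idx = np.clip(np.digitize(X[:, i], bins) - 1, 0, 99)
    cond_mean = np.bincount(idx, weights=Y, minlength=100) / np.bincount(idx, minlength=100)
    print(f"S_{i+1} approx {cond_mean.var() / Y.var():.3f}")
# Expected from the closed forms: approximately 0.66, 0.00, 0.11.
```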
Figure 1. Boxplots of the estimated sensitivity indices ($\hat{S}^{(2)}$) for the three-factor additive model, 100 simulations. Dotted lines are the true values.

Computing $S_2$ and $S_3$ with our method, even though both of them take the correlations into account, allows us to confirm the expected result: all the variability comes from $X_2$, and not from $X_3$. This simple example thus brings out the limitation of the multidimensional approach.

In the second analytical example we consider a model $Y = f(X_1, X_2)$, where $f$ is a smooth but strongly nonlinear bivariate function combining exponential, absolute-value, polynomial and trigonometric terms (it is displayed in Figure 2), and where $X_1$ and $X_2$ are independent random variables uniformly distributed on $[-1, 1]$.

Figure 2. Function proposed in model 2.
In this case the sensitivity indices $S_1$ and $S_2$ can be computed analytically. We generated a learning sample and used it both to estimate the posterior distribution in the method of Oakley and O'Hagan and to estimate the conditional moments in our method. Then we calculated analytically the multidimensional integrals in the Bayesian approach, while using a sample of size 5000 to compute $\hat{S}_i^{(2)}$ for $i = 1, 2$.
The Bayesian approach and our estimators $\hat{S}_1^{(2)}$, $\hat{S}_2^{(2)}$ give comparable results on this example, close to the analytical values. However, on this simple case the multidimensional integrals could be computed analytically, which would no longer be possible in a non-independent setting. If not, a numerical integration, when feasible, would lead to less accurate approximations, as discussed in Section 1.3.

3.2 Practical Example from the Chemical Field: Isomerization of the Normal Butane

The isomerization of the normal butane, i.e. molecules with four carbon atoms, is a chemical process aiming at transforming normal butane (nC4) into iso-butane (iC4) in order to obtain a higher octane number, favored by iC4. A simplified reaction mechanism has been used:

nC4 <--> iC4   (1)
2 iC4 --> C3 + C5   (3)
nC4 + iC4 --> C3 + C5   (4)

where reaction (1) is the main reversible reaction converting normal butane into iso-butane. Reactions (3) and (4) are secondary and irreversible reactions which produce propane (C3) and a lump of normal and iso-pentane (C5), paraffins with three and five carbon atoms. The model linked to this process can be written as
$$Y = \eta(c, \theta)$$
where
- $Y$ is the 3-dimensional result vector (mole fractions of the components nC4, iC4 and C3+C5; note that their sum is 1),
- $c$ is the vector containing the operating conditions (pressure, temperature, ...) and the mole fractions of the input components (nC4 and iC4; this is called the feed),
- $\theta = (\theta_i)_{i=1,\ldots,8}$ is the 8-dimensional random vector of the parameters of the reactions (pre-exponential factors, activation energies, adsorption constants, ...),
- $\eta$ is the function modeling the chemical reactor in which the reaction takes place. It is evaluated through the resolution of an ordinary differential equation system which cannot be solved analytically and is computed numerically.

The first step here is to get the distribution of $\theta$, which is unknown. However, it is possible to use the experience and knowledge of chemical engineers to suggest a reasonable approximation of this distribution. Classically, we assume that $\theta$ has a multivariate Gaussian distribution with mean zero (once the parameters are centered). The correlation matrix is built with experts and with the help of bootstrap simulations, and is given by:
$$\Gamma = \begin{pmatrix}
1 & 0.43 & 0.09 & 0.29 & 0.55 & 0.66 & 0.10 & -\ast \\
0.43 & 1 & -0.54 & 0.11 & 0.37 & 0.25 & \ast & -0.48 \\
0.09 & -0.54 & 1 & -0.02 & 0.20 & \ast & -0.40 & 0.73 \\
0.29 & 0.11 & -0.02 & 1 & -0.41 & -0.07 & -0.22 & 0.01 \\
0.55 & 0.37 & 0.20 & -0.41 & 1 & 0.43 & 0.31 & 0 \\
0.66 & 0.25 & \ast & -0.07 & 0.43 & 1 & 0.17 & -\ast \\
0.10 & \ast & -0.40 & -0.22 & 0.31 & 0.17 & 1 & -0.61 \\
-\ast & -0.48 & 0.73 & 0.01 & 0 & -\ast & -0.61 & 1
\end{pmatrix}$$
(the four entries marked $\ast$ are not legible in the available source; only their signs are shown, and the matrix is symmetric).
In order to compute sensitivity indices, we generate a sample of size $n = 5000$ from this distribution. We wish to estimate, for a given operating-conditions and feed vector $c$, the sensitivity indices of the outputs with respect to the input factors in $\theta$, i.e.
$$S_i^j = \frac{\mathrm{Var}(E(Y_j|\theta_i))}{\mathrm{Var}(Y_j)}$$
for $j = 1, \ldots, 3$ and $i = 1, \ldots, 8$.
Actually, our goal is to identify on which factor we should make the effort of reducing the uncertainty by carrying out new experiments. This factor should be chosen in order to reduce as much as possible the uncertainty of the outputs.

We consider two particular vectors $c_1$ and $c_2$ containing the same operating conditions but a different feed ($c_1$: nC4 = 1 and iC4 = 0; $c_2$: iC4 = 1 and nC4 = 0). For each vector $c_i$, $i = 1, 2$, we have drawn a sample of size $n$ from $Y$ by Monte-Carlo simulation, i.e. by computing $Y^j = \eta(c_i, \theta^j)$ for $j = 1, \ldots, n$. Thus we have a sample from $(Y, \theta)$ for each particular $c_1$ and $c_2$. For instance, the estimates of the sensitivity indices of the third output C3+C5 with the $\hat{T}_1$ estimator are given in Figure 4; filled bars correspond to $c_1$ and empty bars to $c_2$. Note that the estimates given by the $\hat{T}_2$ estimator are similar.

These results highlight the behavior of the C3+C5 output when the feed changes. Indeed, when we only use nC4 in the feed ($c_1$), the production of C3+C5 is mainly linked to the production of iC4 by reaction (1). This is confirmed by the importance of parameters 1 and 6 in Figure 4, which are the parameters involved in reaction (1). When the feed only contains iC4 ($c_2$), the first reaction is no longer dominating for the production of C3+C5, which is now mainly linked to reaction (3). Parameters 4 and 2, which are the most important in Figure 4 for $c_2$, are connected to reaction (3). We can thus conclude that the results confirm the expected behavior of the C3+C5 output.

Figure 4. Sensitivity indices of the C3+C5 output in the isomerization model for the particular conditions $c_1$ (filled bars) and $c_2$ (empty bars).

We could obviously study the sensitivity indices for the other outputs and for other operating conditions. Such a study has been carried out and showed that the most influential parameters depend on the operating conditions and the feed. But it also underlined that each parameter of the model has an influence on at least one output for at least one operating condition. In this case these sensitivity index estimates bring out the fact that all the parameters are potentially important. A discussion with chemical engineers would then be necessary in order to identify which outputs are most critical for their goals (controlling, for instance, the first output iC4, which is strongly linked to the octane rating) and would thus help us to choose which input parameters deserve the most attention.
4. DISCUSSION AND CONCLUSION
The estimation method proposed in this paper is an efficient way to carry out sensitivity analysis by computing first order sensitivity indices when the inputs are not independent. The use of local polynomial estimators is the key point of the estimation procedure. It guarantees some interesting theoretical properties and ensures the good quality of the estimators we have introduced. Beyond these theoretical results, practical examples also show a good precision for a rather low computation time. Obviously, higher precision requires higher calculation time, and the user has the possibility to adapt the estimators by fixing some hyper-parameter values such as the polynomial orders.

The main advantage of our estimators is that they only make the assumption that the marginals are smooth, and they require fewer model runs than classical sampling methods. Compared with the Bayesian approach of Oakley & O'Hagan (2004), our method has the same philosophy, as it uses model runs to fit a response surface under smoothness assumptions, but it avoids the numerical integration issue in high dimension. Moreover, our approach is appealing for practitioners in the sense that they can see it as a black-box routine, as each step of the procedure is data-driven once the user has provided the two samples needed for the estimation.

Finally, we think that a practitioner willing to carry out a sensitivity analysis should combine different approaches to get the most accurate result, for example computing the indices both with the method we introduce here and with the one of Oakley and O'Hagan. Indeed these two methods are not concurrent but complementary.

Future work will also be based on building multi-output sensitivity indices through multivariate nonparametric regression techniques.
ACKNOWLEDGMENTS
We thank Professor Anestis Antoniadis for very helpful discussions. We also thank the referees and the Associate Editor for very useful suggestions.
APPENDIX: PROOFS OF THEOREMS
A.1 Assumptions

We list below all the assumptions we use in the development of our proofs. Note that the bandwidths $h_1$ and $h_2$ are by definition positive real numbers.

(A0) As $n \to \infty$, $h_1 \to 0$ and $nh_1 \to \infty$;

(A1) The kernel $K_1(\cdot)$ is a bounded, symmetric and continuous density function with finite 7th moment;

(A2) $f_X(x) > 0$ and $f_X(\cdot)$ is bounded in a neighborhood of $x$, where $f_X(\cdot)$ denotes the marginal density function of $X$;

(A3) $\dddot{m}(\cdot)$ exists and is continuous in a neighborhood of $x$;

(A4) $\sigma^2(\cdot)$ has a bounded third derivative in a neighborhood of $x$, and $\ddot{m}(x) \neq 0$;

(B0) As $n \to \infty$, $h_i \to 0$ and $nh_i \to \infty$ for $i = 1, 2$;

(B1) The kernel $K_2(\cdot)$ is a symmetric density function with bounded support in $\mathbb{R}$. Further, $|K_2(x_1) - K_2(x_2)| \le c\,|x_1 - x_2|$ for $x_1, x_2 \in \mathbb{R}$;

(B2) The marginal density function $f_X(\cdot)$ satisfies $f_X(x) > 0$ and $|f_X(x_1) - f_X(x_2)| \le c\,|x_1 - x_2|$ for $x_1, x_2 \in \mathbb{R}$;

(B3) $E(Y^4) < \infty$;

(B4) $\sigma^2(x) > 0$ and $E(Y^k|X = \cdot)$ is continuous at $x$ for $k = 3, 4$. Further, $\dddot{m}(\cdot)$ and $\dddot{\sigma^2}(\cdot)$ are uniformly continuous on an open set containing the point $x$;

(C0) $f_X(\cdot)$ has compact support $[a, b]$.
Assumptions (A0) and (B0) are standard ones in kernel estimation theory. Classical considerations on the MSE or MISE (Mean Integrated Squared Error) lead to theoretically optimal constant bandwidths of order $n^{-1/5}$.

Assumptions (A1) and (B1) are directly satisfied by commonly used kernel functions. We can note that they require a kernel with bounded support, but this is only a technical assumption for brevity of proofs. For example, the Gaussian kernel can be used.

The assumption $f_X(x) > 0$ and the requirement that $f_X(\cdot)$ be bounded in a neighborhood of $x$ are natural. The Lipschitz condition on $f_X$ in (B2) is directly satisfied if $f_X$ is sufficiently regular and with compact support.

Assumptions (A3), (A4), (B3) and (B4) are natural and ensure sufficient regularity of the conditional moments.

Assumption (C0) is made to ease the presentation. It can be relaxed by means of the conventional truncation techniques used in real cases (Mack & Silverman (1982)). Nevertheless, in practice the input factors considered in sensitivity analysis are bounded and so have densities with compact support.
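To make the $n^{-1/5}$ order explicit, here is the standard one-line computation (textbook material, not specific to this paper): the leading MSE of a local linear fit at an interior point behaves as $B^2h^4 + V/(nh)$ for constants $B, V > 0$ depending on $\ddot{m}(x)$, $\sigma^2(x)$, $f_X(x)$ and the kernel, and minimizing over $h$ gives

```latex
\[
\frac{d}{dh}\Big( B^2 h^4 + \frac{V}{nh} \Big)
  = 4B^2h^3 - \frac{V}{nh^2} = 0
\quad\Longrightarrow\quad
h_{\mathrm{opt}} = \Big( \frac{V}{4B^2 n} \Big)^{1/5} \propto n^{-1/5}.
\]
```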
A.2 Proof of Theorem 1

This theorem is a direct consequence of the asymptotic behavior of the bias and variance in local linear regression. Under assumptions (A0)-(A4), Fan et al. (1996) established that, for a given kernel $K_1(\cdot)$,
$$E_X(\hat{m}(x)) = m(x) + \tfrac{1}{2}\mu_2 \ddot{m}(x) h_1^2 + o_P(h_1^2) \qquad (6)$$
and
$$\mathrm{Var}_X(\hat{m}(x)) = \frac{\nu_0\,\sigma^2(x)}{f_X(x)\,n h_1} + o_P\Big(\frac{1}{n h_1}\Big) \qquad (7)$$
where $\mu_k = \int u^k K_1(u)\,du$ and $\nu_k = \int u^k K_1^2(u)\,du$. Now, as the estimator $\hat{T}_1$ is
$$\hat{T}_1 = \frac{1}{n'-1} \sum_{j=1}^{n'} \big( \hat{m}(\tilde{X}_j) - \hat{\bar{m}} \big)^2,$$
we can write $\hat{T}_1 = \frac{1}{n'-1}\sum_{j=1}^{n'} (Z_j - \bar{Z})^2$ where $(Z_j)_{j=1,\ldots,n'} := (\hat{m}(\tilde{X}_j))_{j=1,\ldots,n'}$ and $\bar{Z} = \frac{1}{n'}\sum_{j=1}^{n'} Z_j$. Conditionally on the predictors $X$, the sample $(Z_j|X)_{j=1,\ldots,n'}$ is an i.i.d. sample distributed as $Z|X$, and the conditional bias of $\hat{T}_1$ can then be obtained through the classical formula for the empirical estimator of the variance:
$$E_X(\hat{T}_1) = \mathrm{Var}_X(Z) = E_X(Z^2) - E_X(Z)^2.$$
Note that we could also compute its variance,
$$\mathrm{Var}_X(\hat{T}_1) = \frac{1}{n'}\Big( E_X\big((Z - E_X(Z))^4\big) - \frac{n'-3}{n'-1}\,\mathrm{Var}_X(Z)^2 \Big),$$
even though we do not use this result here (see Remark 1).

As $\tilde{X}$ is independent of $X$ and $Y$, we write
$$E_X(Z^2) = \int E_X(\hat{m}(x)^2) f_{\tilde{X}}(x)\,dx = \int \big( \mathrm{Var}_X(\hat{m}(x)) + E_X(\hat{m}(x))^2 \big) f_X(x)\,dx.$$
Considering assumptions (A3), (A4) and (C0), we then get, using (6) and (7) in a similar way as for the standard MISE evaluation,
$$E_X(Z^2) = \int m^2(x) f_X(x)\,dx + \frac{\nu_0}{n h_1} \int \sigma^2(x)\,dx + \mu_2 h_1^2 \int m(x)\ddot{m}(x) f_X(x)\,dx + o_P(h_1^2)$$
and, by the same arguments, we also have
$$E_X(Z) = \int m(x) f_X(x)\,dx + \tfrac{1}{2}\mu_2 h_1^2 \int \ddot{m}(x) f_X(x)\,dx + o_P(h_1^2).$$
Hence
$$E_X(\hat{T}_1) = E_X(Z^2) - E_X(Z)^2 = \mathrm{Var}(E(Y|X)) + M_1 h_1^2 + \frac{M_2}{n h_1} + o_P(h_1^2)$$
where
$$M_1 = \mu_2 \Big[ \int m(x)\ddot{m}(x) f_X(x)\,dx - \Big(\int m(x) f_X(x)\,dx\Big) \Big(\int \ddot{m}(x) f_X(x)\,dx\Big) \Big] \quad \text{and} \quad M_2 = \nu_0 \int \sigma^2(x)\,dx.$$

A.3 Proof of Theorem 2

Similarly, we first recall the asymptotic results for the residual-based estimator of the conditional variance. Under assumptions (B0)-(B4), Fan & Yao (1998) showed that
$$E_X(\hat{\sigma}^2(x)) = \sigma^2(x) + \tfrac{1}{2}\mu_2 \ddot{\sigma^2}(x) h_2^2 + o_P(h_1^2 + h_2^2)$$
and
$$\mathrm{Var}_X(\hat{\sigma}^2(x)) = \frac{\nu_0\,\sigma^4(x)\lambda(x)}{f_X(x)\,n h_2} + o_P\Big(\frac{1}{\sqrt{n h_2}}\Big)$$
where $\lambda(x) = E\big((\epsilon^2 - 1)^2 \,\big|\, X = x\big)$ and $\mu_2$ and $\nu_0$ are as defined above (computed with $K_2$ in place of $K_1$). The estimator $\hat{T}_2$ can be written as
$$\hat{T}_2 = \frac{1}{n'} \sum_{j=1}^{n'} U_j$$
where $(U_j)_{j=1,\ldots,n'} := (\hat{\sigma}^2(\tilde{X}_j))_{j=1,\ldots,n'}$. As in the proof of Theorem 1, we then get the conditional bias and variance of $\hat{T}_2$:
$$E_X(\hat{T}_2) = E_X(U) \quad \text{and} \quad \mathrm{Var}_X(\hat{T}_2) = \frac{1}{n'}\,\mathrm{Var}_X(U).$$
As $\tilde{X}$ is independent of $X$ and $Y$, we have
$$E_X(U) = \int E_X(\hat{\sigma}^2(x)) f_{\tilde{X}}(x)\,dx.$$
Considering assumptions (B4) and (C0) as in the proof of Theorem 1, we then get
$$E_X(\hat{T}_2) = E(\mathrm{Var}(Y|X)) + \tfrac{1}{2}\mu_2 h_2^2 \int \ddot{\sigma^2}(x) f_X(x)\,dx + o_P(h_1^2 + h_2^2) = E(\mathrm{Var}(Y|X)) + V_1 h_2^2 + o_P(h_1^2 + h_2^2)$$
where $V_1 = \tfrac{1}{2}\mu_2 \int \ddot{\sigma^2}(x) f_X(x)\,dx$, and, using the same arguments,
$$\mathrm{Var}_X(\hat{T}_2) = \frac{1}{n'} \Big\{ \mathrm{Var}(\mathrm{Var}(Y|X)) + V_2 h_2^2 + V_3 h_2^2 + \frac{V_4}{n h_2} + o_P\Big(h_1^2 + h_2^2 + \frac{1}{\sqrt{n h_2}}\Big) \Big\}$$
where
$$V_2 = \mu_2 \int \sigma^2(x)\,\ddot{\sigma^2}(x) f_X(x)\,dx, \qquad V_3 = -\mu_2 \Big(\int \ddot{\sigma^2}(x) f_X(x)\,dx\Big) \Big(\int \sigma^2(x) f_X(x)\,dx\Big), \qquad V_4 = \nu_0 \int \sigma^4(x)\lambda(x)\,dx.$$

REFERENCES
Cukier, R., Fortuin, C., Shuler, K., Petschek, A. & Schaibly, J. (1973), 'Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory', The Journal of Chemical Physics 59, 3873-3878.

Efron, B. & Tibshirani, R. J. (1994), An Introduction to the Bootstrap, New York: Chapman and Hall/CRC.

Fan, J. & Gijbels, I. (1996), Local Polynomial Modelling and its Applications, London: Chapman and Hall.

Fan, J., Gijbels, I., Hu, T.-C. & Huang, L.-S. (1996), 'An asymptotic study of variable bandwidth selection for local polynomial regression', Statistica Sinica 6, 113-127.

Fan, J. & Yao, Q. (1998), 'Efficient estimation of conditional variance functions in stochastic regression', Biometrika 85, 645-660.

Jacques, J., Lavergne, C. & Devictor, N. (2004), Sensitivity analysis in presence of model uncertainty and correlated inputs, in 'Proceedings of SAMO2004'.

Jones, M., Marron, J. & Sheather, S. (1996), 'A brief survey of bandwidth selection for density estimation', Journal of the American Statistical Association 91, 401-407.

Kennedy, M. & O'Hagan, A. (2001), 'Bayesian calibration of computer models (with discussion)', Journal of the Royal Statistical Society Series B 63, 425-464.

Mack, Y. & Silverman, B. (1982), 'Weak and strong uniform consistency of kernel regression estimates', Z. Wahrsch. Verw. Gebiete 61, 405-415.

McKay, M. (1996), 'Variance-based methods for assessing uncertainty importance in NUREG-1150 analyses', LA-UR-96-2695, pp. 1-27.

Oakley, J. & O'Hagan, A. (2004), 'Probabilistic sensitivity analysis of complex models: a Bayesian approach', Journal of the Royal Statistical Society Series B 66, 751-769.

Owen, A. B. (2005), Multidimensional variation for quasi-Monte Carlo, in 'Fan, J. and Li, G., editors, International Conference on Statistics in honour of Professor Kai-Tai Fang's 65th birthday'.

Ratto, M., Tarantola, S. & Saltelli, A. (2001), Estimation of importance indicators for correlated inputs, in 'Proceedings of ESREL2001'.

Ruppert, D. (1997), 'Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation', Journal of the American Statistical Association 92, 1049-1062.

Ruppert, D., Wand, M. & Carroll, R. (2003), Semiparametric Regression, Cambridge: Cambridge University Press.

Ruppert, D., Wand, M., Holst, U. & Hössjer, O. (1997), 'Local polynomial variance function estimation', Technometrics 39, 262-273.

Saltelli, A., Chan, K. & Scott, E. M. (2000), Sensitivity Analysis, Chichester: Wiley Series in Probability and Statistics.

Saltelli, A., Tarantola, S., Campolongo, F. & Ratto, M. (2004), Sensitivity Analysis in Practice, Chichester: Wiley.

Santner, T. J., Williams, B. J. & Notz, W. I. (2003), The Design and Analysis of Computer Experiments, New York: Springer Verlag.

Sobol', I. (1993), 'Sensitivity estimates for nonlinear mathematical models', Mathematical Modeling and Computational Experiment 1, 407-414.