Stratified Random Sampling for Dependent Inputs
Anirban Mondal (Case Western Reserve University, Cleveland, OH 44106, USA) and Abhijit Mandal (Wayne State University, Detroit, MI 48202, USA)

April 2, 2019
Abstract
A new approach for obtaining stratified random samples from statistically dependent random variables is described. The proposed method can be used to obtain samples from the input space of a computer forward model in order to estimate expectations of functions of the corresponding output variables. The advantage of the proposed method over existing methods is that it preserves the exact form of the joint distribution of the input variables. The asymptotic distribution of the new estimator is derived. Asymptotically, the variance of the estimator under the proposed method is smaller than that obtained under simple random sampling, with the degree of variance reduction depending on the degree of additivity in the function being integrated. The technique is applied to a practical example concerning the performance of a river flood inundation model.
Keywords: Stratified sampling; Latin hypercube sampling; Variance reduction; Sampling with dependent random variables; Monte Carlo simulation.
1 Introduction

Mathematical models are widely used by engineers and scientists to describe physical, economic, and social processes. Often these models are complex in nature and are described by a system of ordinary or partial differential equations which cannot be solved analytically. Given the values of the input parameters of the process, complex computer codes are widely used to solve such systems numerically, providing the corresponding outputs. These computer models, also known as forward models, are used for prediction, uncertainty analysis, sensitivity analysis, and model calibration. For such analyses, the computer code is evaluated multiple times to estimate some function of the outputs. However, complex computer codes are often too expensive to be used directly for such studies. The design of computer experiments plays an important role in these situations, where the goal is to choose the input configuration in an optimal way so that the uncertainty and sensitivity analyses can be carried out with a small number of simulation runs. Moreover, to avoid the computational challenge, it is often useful to replace the computer model either by a simpler mathematical approximation, called a surrogate or meta model (Volkova et al., 2008, Simpson et al., 2001), or by a statistical approximation, called an emulator (Oakley and O'Hagan, 2002, O'Hagan, 2006, Conti et al., 2009). Such surrogate models or emulators are also widely used for model calibration (Kennedy and O'Hagan, 2001), which is the inverse problem of inferring the model input parameters given the outputs. The surrogate or emulator model is built from a set of training simulations of the original computer model. A full, optimal exploration of the variation domain of the input variables is therefore very important in order to avoid non-informative simulation points (Fang et al., 2006, Sacks et al., 1989).

In most situations, simple random sampling (SRS) needs a very large number of samples to achieve a fixed level of efficiency in estimating some function of the outputs of such complex models, and the required sample size increases rapidly with the dimension of the input space. On the other hand, the Latin hypercube sampling (LHS) introduced by McKay et al. (1979) needs smaller sample sizes than the SRS to achieve the same level of efficiency. Compared to the SRS, which only ensures independence between samples, the LHS also ensures full coverage of the range of the input variables through stratification over the marginal probability distributions of the inputs. Thus, the LHS method is more efficient and robust if the components of the inputs are independently distributed. For dependent inputs, Iman and Conover (1980) proposed an approximate version of the LHS based on the rank correlation. Stein (1987) further improved this procedure such that each sample vector has approximately the correct joint distribution when the sample size is large. However, in many scenarios the rank-based method results in a large bias and small efficiency of the estimators, particularly for small sample sizes. Moreover, the joint distribution of the inputs is not completely preserved by these sampling schemes even for moderately large sample sizes. Therefore, these rank-based techniques are not very useful for many real applications where one is restricted to a small sample size due to the computational burden of the expensive forward simulator.
To overcome this situation, we propose a novel sampling scheme for dependent inputs that gives a random sample from exactly the target joint distribution while keeping the mean squared error of the estimator smaller than for the existing methods. In the traditional LHS, where the components of the inputs are independent, the stratification of the marginal distributions leads to the stratification of the joint distribution. However, for dependent random variables, all the marginal distributions and the joint distribution cannot be stratified simultaneously. Here, we propose a new sampling scheme, called Latin hypercube sampling for dependent random variables (LHSD), in which the conditional probability distributions of the inputs are stratified. The main algorithm of the LHSD is similar to the traditional LHS, hence it retains all the important properties of the LHS. The joint distribution of the inputs is preserved in our sampling scheme, as it is precisely the product of the conditional distributions.

In some practical situations, the joint probability distribution of the inputs, and hence the corresponding conditional probability distributions, may be unknown. In these situations, we propose a copula-based method to construct the joint probability distribution from the marginal distributions. For finite samples, it is shown that the variance of the estimator based on the LHSD is always smaller than the variance of the estimator based on the SRS. The large sample properties of the LHSD-based estimators are also provided. We consider two simulation-based examples and one practical example, where the results show that the traditional LHS and the rank-based LHS methods have considerable bias in the estimator when the inputs are dependent. Our proposed LHSD outperforms the SRS, the traditional LHS, and the rank-based LHS in terms of the mean squared error (MSE) for small and moderately large sample sizes. The simulation results also show that the rank-based LHS fails to retain the joint probability distribution of the input variables even for moderately large sample sizes, which is the reason for the considerable bias in the estimators. On the other hand, the joint distribution is completely retained in our proposed sampling scheme and hence the estimators are unbiased.

The paper is organized as follows. In the next section, we first formulate the estimation problem, then describe the LHS algorithm for independent inputs (McKay et al., 1979) and our proposed LHSD algorithm for dependent inputs. Another variant of the proposed method, called the centered Latin hypercube sampling for dependent random variables (LHSD$_c$), is also described there, as is the use of copulas when the joint probability distribution of the inputs is not known. The large sample properties of the estimators using the LHSD are discussed in Section 3. Section 4 provides numerical results from two simulation examples, where the performance of the different sampling schemes is compared. The application of the proposed sampling scheme to a real field example on a river model is presented in Section 5. A concluding remark is given at the end of the paper, and the proofs of the theorems are provided in the Appendix.

2 The estimation problem and sampling schemes

Consider a process (or device) which depends on an input random vector $X = (X_1, X_2, \ldots, X_K)^T$ of fixed dimension $K$. Suppose we want to estimate the expected value of some measure of the process output, given by a function $h(X)$.
As described before, in many cases $h$ is highly non-linear and/or $K$ is very large, so the probability distribution of $h(X)$ is not analytically tractable even though the probability distribution of $X$ is completely known. For example, $h$ could depend on the output of a physical process that involves a system of partial differential equations. In such cases, Monte Carlo methods are usually used to estimate $\tau = E(h(X))$, the expected value of $h(X)$. Suppose $X_1, X_2, \ldots, X_N$ form a random sample of size $N$, generated from the joint distribution of $X$ using a suitable sampling scheme; then $\tau$ is estimated by $\hat\tau = N^{-1}\sum_{j=1}^N h(X_j)$. Since it is costly, in terms of time or other complexity, to compute $h(X)$, we are interested in a sampling scheme that estimates $\tau$ efficiently while keeping the sample size $N$ as small as possible.

2.1 Latin hypercube sampling for independent inputs

Suppose $X = (X_1, X_2, \ldots, X_K)^T$ is a random vector with $K$ components. Let $F$ be the joint cumulative distribution function (c.d.f.) of $X$, and $F_k$ the marginal distribution function of $X_k$, $k = 1, 2, \ldots, K$. McKay et al. (1979) proposed a sampling scheme to generate a Latin hypercube sample of size $N$ from $X$, assuming that the components of $X$ are mutually independent. The details are given in Algorithm 1.
Algorithm 1 (LHS).
1. Generate $P = ((p_{jk}))_{N \times K}$, an $N \times K$ random matrix, where each column of $P$ is an independent random permutation of $1, 2, \ldots, N$.
2. Using simple random sampling, generate $NK$ independent and identically distributed (i.i.d.) uniform random variables over $[0,1]$: $u_{jk} \overset{i.i.d.}{\sim} U(0,1)$ for $j = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, K$.
3. The LHS of size $N$ from the independent $X_k$'s is given by $x_{jk} = F_k^{-1}((p_{jk} - u_{jk})/N)$, $j = 1, 2, \ldots, N$, $k = 1, 2, \ldots, K$, where the inverse function $F_k^{-1}(\cdot)$ is the quantile function defined by $F_k^{-1}(y) = \inf\{x : F_k(x) \ge y\}$.

In the above algorithm, all the marginal distributions of $X$ are stratified. Such marginal stratification leads to the stratification of the joint distribution, and hence there is a reduction in the variance of the estimator compared to simple random sampling. Note that, for any fixed $j$, each component $x_{jk}$ of $(x_{j1}, x_{j2}, \ldots, x_{jK})$ follows the marginal distribution of $X_k$, $k = 1, 2, \ldots, K$; but, as a whole, these vectors are not a set of random samples from the joint distribution of $X$ unless the component variables $X_1, X_2, \ldots, X_K$ are mutually independent. Hence the estimator $\hat\tau$ based on LHS samples is a biased estimator of $\tau$ when $X_1, X_2, \ldots, X_K$ are not mutually independent. This bias can lead to a large MSE of the estimator even if there is a significant reduction of variance due to stratification.
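For concreteness, the following is a minimal NumPy/SciPy sketch of Algorithm 1; the function name `latin_hypercube` and the use of `scipy.stats` frozen distributions to represent the marginals $F_k$ are our own illustrative choices, not part of the original description.

```python
import numpy as np
from scipy import stats

def latin_hypercube(n, dists, rng=None):
    """Algorithm 1 (LHS) for independent inputs.

    dists is a list of K scipy.stats frozen distributions; column k of the
    result is a stratified sample from the marginal F_k = dists[k].
    """
    rng = np.random.default_rng(rng)
    K = len(dists)
    # Step 1: one independent random permutation of 1, ..., n per column.
    p = np.column_stack([rng.permutation(n) + 1 for _ in range(K)])
    # Step 2: i.i.d. U(0,1) variables placing each point inside its stratum.
    u = rng.uniform(size=(n, K))
    # Step 3: map the stratified uniforms through the marginal quantiles.
    return np.column_stack([dists[k].ppf((p[:, k] - u[:, k]) / n)
                            for k in range(K)])

# Example: 10 points for two independent inputs.
x = latin_hypercube(10, [stats.norm(0, 1), stats.expon()], rng=42)
```

Each column of the result hits every one of the $N$ equi-probable strata of its marginal exactly once, which is the defining property of a Latin hypercube sample.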
2.2 Latin hypercube sampling for dependent inputs

In this section, we propose a general method for Latin hypercube sampling where the components of $X$ are not necessarily independent. Define $F_k(x_k \mid X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1})$ as the conditional distribution function of $X_k$ given $X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}$, for $k = 2, 3, \ldots, K$. First, we transform $X$ to $Z = (Z_1, Z_2, \ldots, Z_K)^T$ such that $Z_1 = F_1(x_1)$ and $Z_k = F_k(x_k \mid X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1})$, $k = 2, 3, \ldots, K$. Note that the components of $Z$ are i.i.d. $U(0,1)$. We then generate a LHS of size $N$ from $Z$ using the LHS algorithm, and finally convert it back to a random sample of $X$ by inverting the transformation. The different steps of the proposed LHSD algorithm are given in Algorithm 2.

Algorithm 2 (LHSD).
1. The components of $Z$ are i.i.d. $U(0,1)$; generate $z_{jk}$, $j = 1, 2, \ldots, N$, $k = 1, 2, \ldots, K$, a LHS of size $N$ from $Z$, using Algorithm 1.
2. Sequentially obtain $x_j = (x_{j1}, x_{j2}, \ldots, x_{jK})^T$, the $j$-th random sample from $X$, for $j = 1, 2, \ldots, N$, as follows:
   • $x_{j1} = F_1^{-1}(z_{j1})$,
   • $x_{j2} = F_2^{-1}(z_{j2} \mid X_1 = x_{j1})$,
   • $x_{j3} = F_3^{-1}(z_{j3} \mid X_1 = x_{j1}, X_2 = x_{j2})$,
   ...
   • $x_{jK} = F_K^{-1}(z_{jK} \mid X_1 = x_{j1}, X_2 = x_{j2}, \ldots, X_{K-1} = x_{j,K-1})$,
where $F_k^{-1}(\cdot \mid X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1})$ is the inverse conditional distribution function of $X_k$ given $X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}$.

In this sampling scheme, the conditional distributions of $X$ are stratified, but none of the marginal distributions is, except for the first one. In fact, it can be shown that, for dependent random variables, all marginal distributions cannot be stratified simultaneously while preserving the complete joint distribution. However, if the components of $X$ are independently distributed, then both the marginal and the conditional distributions are stratified by this method, and the proposed algorithm reduces to the traditional LHS scheme of McKay et al. (1979). The advantage of the proposed method over the traditional LHS is that the joint distribution of $X$ is fully preserved, even when its components are dependent, because the product of the conditional distributions is precisely the corresponding joint distribution. As a result, $\hat\tau$ is an unbiased estimator of $\tau$ when samples from the LHSD are used. The stratification of the conditional distributions also guarantees a reduction in the variance of the estimator. For these two reasons, the LHSD method is more efficient in terms of the MSE of the estimator when the components of $X$ are dependent.

Note that the definition of $Z$ depends on the order of the components of $X$. Moreover, after generating a random sample from $Z$, we sequentially recover the components of the random sample of $X$, so our method depends on the order of the components of $X$ in the definition of $Z$. Theoretically, we can take any order to get a random sample from $X$. However, the efficiency of the sampling scheme is maximized if we arrange the components of $X$ in descending order of conditional sensitivity. This is discussed further in the following sections.
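A minimal sketch of Algorithm 2 follows, assuming the user supplies the conditional quantile functions $F_k^{-1}(\cdot \mid x_1, \ldots, x_{k-1})$; the interface (a list `inv_conds` of callables) and the `centered` flag, which anticipates the LHSD$_c$ variant of the next subsection, are our own illustration.

```python
import numpy as np
from scipy import stats

def lhsd(n, inv_conds, rng=None, centered=False):
    """Algorithm 2 (LHSD): stratify the conditional distributions.

    inv_conds[k](z, x_prev) must return F_k^{-1}(z | x_1, ..., x_{k-1}),
    the conditional quantile of the k-th input given the earlier ones.
    centered=True fixes u_jk = 1/2 (the LHSD_c variant).
    """
    rng = np.random.default_rng(rng)
    K = len(inv_conds)
    # Step 1: an ordinary LHS of i.i.d. U(0,1) variables (Algorithm 1).
    p = np.column_stack([rng.permutation(n) + 1 for _ in range(K)])
    u = np.full((n, K), 0.5) if centered else rng.uniform(size=(n, K))
    z = (p - u) / n
    # Step 2: invert the conditional distributions sequentially.
    x = np.empty((n, K))
    for j in range(n):
        for k in range(K):
            x[j, k] = inv_conds[k](z[j, k], x[j, :k])
    return x

# Example: a bivariate normal with correlation rho, written through its
# conditional quantiles (a hypothetical test case, not from the paper).
rho = 0.7
inv_conds = [
    lambda z, xp: stats.norm.ppf(z),
    lambda z, xp: stats.norm.ppf(z, loc=rho * xp[0],
                                 scale=np.sqrt(1.0 - rho**2)),
]
x = lhsd(20, inv_conds, rng=0)
```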
2.2.1 Centered LHSD

Algorithm 1 shows that Latin hypercube sampling has two stages of randomization. In the first stage, $p_{jk}$ decides in which stratum the $j$-th observation of $X_k$ will fall, for $j = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, K$. In the second stage, $u_{jk}$ fixes the position of the observation within that stratum. The variability in the estimation of $\tau$ comes from these two sources of randomization. So, if we fix $u_{jk} = 1/2$, the small-sample variance of $\hat\tau$ is expected to reduce further. This introduces a small bias, as it can easily be shown that $E(\hat\tau) \ne \tau$. However, simulation studies show that the reduction in variance is larger than the bias term, and as a result the MSE of $\hat\tau$ is reduced in small samples. Stein (1987) proved that, as the sample size $N \to \infty$, $u_{jk}$ plays no role in the asymptotic distribution of $\hat\tau$ for the traditional LHS of Algorithm 1; therefore the large sample properties of the modified LHS remain unchanged. With $u_{jk} = 1/2$ fixed, $p_{jk}$ is still randomized in the first stage of the algorithm, so the centered LHS is also a random sampling scheme in which all observations are stratified. As our primary goal in this paper is to draw a LHS from a set of dependent random variables, we apply this modification in Algorithm 2: in the first stage we generate $z_{jk}$ using the centered LHS instead of the traditional LHS, and the rest of the method remains unchanged. We denote this modified sampling scheme as the centered LHSD, or LHSD$_c$.

2.2.2 Copula-based LHSD

It should be noted that the proposed LHSD method assumes that the joint distribution of the inputs, $F$, and the corresponding conditional distributions $F_k(x_k \mid x_1, x_2, \ldots, x_{k-1})$, $k = 2, 3, \ldots, K$, are known. However, in some applications the joint distribution and the corresponding conditional distributions may be unknown. In those situations, we propose to use a copula-based method to construct the joint distribution from the known marginal distributions. Let the marginal cumulative distribution function of $X_k$ be $F_k(x_k)$, $k = 1, 2, \ldots, K$. Then, according to Sklar's theorem (Sklar, 1959), there exists a copula $C : [0,1]^K \to [0,1]$ such that the joint distribution of $X_1, X_2, \ldots, X_K$ can be written as
$$F(x_1, x_2, \ldots, x_K) = C(F_1(x_1), F_2(x_2), \ldots, F_K(x_K)). \tag{2.1}$$
Let us define $U_k = F_k(X_k)$, $k = 1, 2, \ldots, K$. If $F$ is continuous, then $C$ is uniquely written as
$$C(u_1, u_2, \ldots, u_K) = F(F_1^{-1}(u_1), F_2^{-1}(u_2), \ldots, F_K^{-1}(u_K)). \tag{2.2}$$
For $k = 2, 3, \ldots, K$, the $k$-th conditional copula function is defined as
$$C_k(u_k \mid u_1, \ldots, u_{k-1}) = P(U_k \le u_k \mid U_1 = u_1, \ldots, U_{k-1} = u_{k-1}) = \frac{\partial^{k-1} C_k(u_1, u_2, \ldots, u_k)/\partial u_1 \cdots \partial u_{k-1}}{\partial^{k-1} C_{k-1}(u_1, u_2, \ldots, u_{k-1})/\partial u_1 \cdots \partial u_{k-1}}, \tag{2.3}$$
where $C_k(u_1, u_2, \ldots, u_k) = C(u_1, u_2, \ldots, u_k, 1, 1, \ldots, 1)$ is the $k$-dimensional marginal copula function (with $C_1(u_1) = u_1$). The conditional distribution of $X_k$ given $X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}$ is then
$$F(x_k \mid X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}) = C_k(u_k \mid u_1, \ldots, u_{k-1})\Big|_{u_1 = F_1(x_1), \ldots, u_k = F_k(x_k)}. \tag{2.4}$$
By choosing a suitable copula and then finding the conditional distributions from equation (2.4), one can use the same LHSD algorithm (Algorithm 2) to sample from the joint distribution. Alternatively, one can use the conditional copulas in equation (2.3) directly to sample from the joint distribution, as in Algorithm 3.

There are several families of copula functions; the most commonly used are the elliptical family, such as the Gaussian copula and the t-copula, and the Archimedean family, such as the Gumbel, Clayton, and Frank copulas. Different copula functions are suitable for modeling different dependence structures among the random variables. For example, the Gumbel copula is suitable for dependence structures with upper tail dependence, while the Frank copula is adopted for symmetric dependence structures.
Algorithm 3 (LHSD using copula).
1. The components of $Z$ are i.i.d. $U(0,1)$; generate $z_{jk}$, $j = 1, 2, \ldots, N$, $k = 1, 2, \ldots, K$, a LHS of size $N$ from $Z$, using Algorithm 1.
2. Sequentially obtain $u_j = (u_{j1}, u_{j2}, \ldots, u_{jK})^T$, the $j$-th random sample from the joint copula, for $j = 1, 2, \ldots, N$, using the inverse conditional copula functions:
   • $u_{j1} = z_{j1}$,
   • $u_{j2} = C_2^{-1}(z_{j2} \mid u_{j1})$,
   • $u_{j3} = C_3^{-1}(z_{j3} \mid u_{j1}, u_{j2})$,
   ...
   • $u_{jK} = C_K^{-1}(z_{jK} \mid u_{j1}, \ldots, u_{j,K-1})$.
3. Obtain $x_j = (x_{j1}, x_{j2}, \ldots, x_{jK})^T$, the $j$-th random sample from $X$, for $j = 1, 2, \ldots, N$, using the inverse marginal distribution functions: $x_{jk} = F_k^{-1}(u_{jk})$ for $k = 1, 2, \ldots, K$.

Given a data set, choosing a suitable copula function that fits the data is an important but difficult problem. Since the true data-generating mechanism is unknown, several candidate copula functions may fit the data reasonably well; on the other hand, none of the candidates may give a reasonable fit. A common procedure for selecting a suitable copula function consists of the following three steps: (i) choose several commonly used copula functions; (ii) for a given data set, estimate the parameters of the selected copula functions by some method, such as maximum likelihood estimation; and (iii) select the optimal copula function. A divergence-based shortest-distance measure between the empirical distribution function and the estimated copula function is used to select the optimal copula.

It is to be noted that Algorithm 3 can also be used to sample from any joint distribution when the conditional distributions are unknown or difficult to calculate in closed form but the conditional copulas are easier to find.
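The sketch below implements the copula route of Algorithm 3, specialized for a bivariate Gaussian copula, whose conditional copula can be inverted in closed form through the normal c.d.f.; the function names and the exponential margins of the example are our own assumptions.

```python
import numpy as np
from scipy import stats

def lhsd_copula(z, inv_cond_copulas, marginal_ppfs):
    """Algorithm 3: map an LHS z in (0,1)^{n x K} to X via conditional copulas.

    inv_cond_copulas[k-1](z, u_prev) returns C_k^{-1}(z | u_1, ..., u_{k-1});
    marginal_ppfs[k] is the quantile function of the k-th margin.
    """
    n, K = z.shape
    u = np.empty_like(z)
    u[:, 0] = z[:, 0]                       # step 2: u_j1 = z_j1
    for k in range(1, K):
        for j in range(n):
            u[j, k] = inv_cond_copulas[k - 1](z[j, k], u[j, :k])
    # Step 3: push the copula sample through the marginal quantiles.
    return np.column_stack([marginal_ppfs[k](u[:, k]) for k in range(K)])

# Bivariate Gaussian copula: C_2^{-1}(z | u1) = Phi(rho * Phi^{-1}(u1)
#                                     + sqrt(1 - rho^2) * Phi^{-1}(z)).
rho = 0.5
def inv_gauss_cond(z, u_prev):
    a = stats.norm.ppf(u_prev[0])
    return stats.norm.cdf(rho * a + np.sqrt(1.0 - rho**2) * stats.norm.ppf(z))

rng = np.random.default_rng(1)
n, K = 20, 2
p = np.column_stack([rng.permutation(n) + 1 for _ in range(K)])
z = (p - rng.uniform(size=(n, K))) / n      # step 1: LHS on (0,1)^2
x = lhsd_copula(z, [inv_gauss_cond], [stats.expon().ppf, stats.expon().ppf])
```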
3 Large sample properties

Suppose we are interested in a measurable function $h : \mathbb{R}^K \to \mathbb{R}$. Our goal is to estimate $\tau = E(h(X))$, where $X$ is a random vector with distribution function $F(\cdot)$. We assume that $h(\cdot)$ has a finite second moment. Based on $X_1, X_2, \ldots, X_N$, a LHSD of size $N$, the estimator of $\tau$ is $\hat\tau = N^{-1}\sum_{j=1}^N h(X_j)$. In this section, we discuss the asymptotic properties of $\hat\tau$.

We define $Z = (Z_1, Z_2, \ldots, Z_K)^T$ by the following transformations:
   • $Z_1 = F_1(X_1)$,
   • $Z_2 = F_2(X_2 \mid X_1)$,
   • $Z_3 = F_3(X_3 \mid X_1, X_2)$,
   ...
   • $Z_K = F_K(X_K \mid X_1, X_2, \ldots, X_{K-1})$,
where $F_k(\cdot \mid X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1})$ is the conditional distribution function of $X_k$ given $X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}$ for $k = 2, 3, \ldots, K$. Suppose this transformation is written as $Z = m(X)$, where $m : \mathbb{R}^K \to \mathbb{R}^K$. Then we have
$$h(X) = h(m^{-1}(Z)) \equiv h^*(Z), \tag{3.1}$$
where $h^* : \mathbb{R}^K \to \mathbb{R}$. Thus, $\tau = E(h(X)) = E(h^*(Z))$. Note that the components of $Z$ are i.i.d. $U(0,1)$ variables, so once we transform $X$ to $Z$ and, correspondingly, $h(\cdot)$ to $h^*(\cdot)$, our problem reduces to estimating the expectation of $h^*(Z)$ with i.i.d. $U(0,1)$ components.
Thus, the theoretical properties of $\hat\tau$ can be derived from the work of Stein (1987) and Owen (1992). Let us assume that $\int_{A_K} h^*(z)^2\, dz < \infty$, where $A_K = [0,1]^K$ denotes the unit hypercube. Following Stein (1987), we decompose $h^*$ as
$$h^*(z) = \tau + \sum_{k=1}^K \alpha_k(z_k) + r(z), \tag{3.2}$$
where
$$\alpha_k(z_k) = \int_{A_{K-1}} (h^*(z) - \tau)\, dz_{-k}, \tag{3.3}$$
$z_{-k}$ being the vector $z$ without the $k$-th component. Here, $\alpha_k(z_k)$ is called the "main effect" function of the $k$-th component of $Z$, and $r(z)$, defined by subtraction in equation (3.2), is the "residual from additivity" of $h^*$. It is clear from the definition that
$$\int_0^1 \alpha_k(z_k)\, dz_k = 0 \quad \text{and} \quad \int_{A_{K-1}} r(z)\, dz_{-k} = 0, \tag{3.4}$$
for all $k = 1, 2, \ldots, K$.
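As a small worked instance of this decomposition (our illustration, not from the paper), take $K = 2$ and $h^*(z) = z_1 z_2$. Then $\tau = 1/4$,
$$\alpha_k(z_k) = \int_0^1 \left(z_1 z_2 - \tfrac{1}{4}\right) dz_{-k} = \frac{z_k}{2} - \frac{1}{4}, \quad k = 1, 2,$$
and
$$r(z) = z_1 z_2 - \tfrac{1}{4} - \alpha_1(z_1) - \alpha_2(z_2) = \left(z_1 - \tfrac{1}{2}\right)\left(z_2 - \tfrac{1}{2}\right).$$
Both identities in (3.4) are immediate; moreover $\sum_{k} \int_0^1 \alpha_k(z_k)^2\, dz_k = 1/24$ and $\int_{A_2} r(z)^2\, dz = 1/144$, so the additive main effects carry six sevenths of the total variance $7/144$ of $h^*(Z)$.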
The variance of $\hat\tau$ under the LHSD is derived from the following theorem.

Theorem 1. As $N \to \infty$,
$$\mathrm{Var}_{\mathrm{LHSD}}(\hat\tau) = \frac{1}{N}\int_{A_K} r(z)^2\, dz + o\!\left(\frac{1}{N}\right). \tag{3.5}$$

The proof of the theorem is given in the Appendix, and it follows directly from Stein (1987). Note that the variance of $\hat\tau$ under the SRS is given by
$$\mathrm{Var}_{\mathrm{SRS}}(\hat\tau) = \frac{1}{N}\int_{A_K} r(z)^2\, dz + \frac{1}{N}\sum_{k=1}^K \int_0^1 \alpha_k(z_k)^2\, dz_k + o\!\left(\frac{1}{N}\right). \tag{3.6}$$
This shows that the variance of $\hat\tau$ under the LHSD is smaller than that under the SRS unless all the main effects $\alpha_k$ are equal to zero. For a fixed $N$, the variance of $\hat\tau$ is reduced by $N^{-1}\sum_{k=1}^K \int_0^1 \alpha_k(z_k)^2\, dz_k$ if the LHSD is used instead of the SRS. Therefore, if the main effects are large, the reduction in variance can be substantial. One may notice that the functions $\alpha_k$ in equation (3.2) depend on the order of $X$ in the transformation to $Z$; hence it is possible to minimize the variance by choosing an appropriate order of $X$. Our conjecture is that if the components of $X$ are arranged in decreasing order of their sensitivity in estimating the function $h$, the reduction in variance is maximized. This will be addressed in more detail in our future research. Now, the asymptotic distribution of $\hat\tau$ under the LHSD is obtained from the following theorem.
Theorem 2. The asymptotic distribution of $\hat\tau$ under the LHSD is given by
$$N^{1/2}(\hat\tau - \tau) \overset{a}{\sim} N\!\left(0, \int_{A_K} r(z)^2\, dz\right). \tag{3.7}$$

After transforming $X$ to $Z$, the proof of the theorem follows directly from Owen (1992); an outline of the proof is given in the Appendix. Using the above theorem, we can construct a confidence interval for $\tau$, provided we have an estimate of $\int_{A_K} r(z)^2\, dz$. For this we must estimate the additive main effects $\alpha_k$ from the data. One may use non-parametric regression for this purpose, which assumes some degree of smoothness of $\alpha_k$ in applications. Equation (3.2) shows that a non-parametric regression of $Y = h^*(Z)$ on $Z_k$ provides an estimate $\hat g_k$ of the function $g_k = \tau + \alpha_k$. So, $\alpha_k$ is estimated by $\hat\alpha_k = \hat g_k - \bar Y$, where
$$\bar Y = \frac{1}{N}\sum_{j=1}^N Y_j = \frac{1}{N}\sum_{j=1}^N h^*(Z_j) = \frac{1}{N}\sum_{j=1}^N h(X_j) = \hat\tau.$$
Then the residual for the $i$-th observation is estimated as
$$\hat r_i = Y_i - \bar Y - \sum_{k=1}^K \hat\alpha_k(Z_{ik}).$$
Finally, the variance $\int_{A_K} r(z)^2\, dz$ is estimated by $\sum_{i=1}^N \hat r_i^2/N$. The theoretical properties of this variance estimate are discussed in Owen (1992).
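The variance estimate is easy to assemble in code. The sketch below uses scikit-learn's `KNeighborsRegressor` as the non-parametric smoother (Section 4 mentions a k-nearest-neighbour regression in this role; the function name and default settings are our own choices).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def lhsd_variance_estimate(z, y, n_neighbors=10):
    """Plug-in estimate of int r(z)^2 dz from an LHSD sample.

    z: (N, K) array of the U(0,1) variables Z; y: (N,) array of h(X).
    Each main effect alpha_k is estimated by a k-nearest-neighbour
    regression of y on the k-th column of z.
    """
    n, K = z.shape
    y_bar = y.mean()
    resid = y - y_bar
    for k in range(K):
        g_k = (KNeighborsRegressor(n_neighbors=n_neighbors)
               .fit(z[:, [k]], y).predict(z[:, [k]]))
        resid = resid - (g_k - y_bar)   # subtract alpha_hat_k(Z_ik)
    return np.sum(resid**2) / n         # estimates int r(z)^2 dz
```

By Theorem 2, an approximate 95% confidence interval for $\tau$ is then $\hat\tau \pm 1.96\sqrt{\hat\sigma^2/N}$, with $\hat\sigma^2$ the value returned above.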
4 Simulation studies

To compare the performance of our proposed LHSD and LHSD$_c$ methods with the other existing methods, we perform two simulation studies. In the first example, the set of input variables is assumed to follow a multivariate normal distribution; the second example takes Gumbel's bivariate logistic distribution. The estimated quantity is the mean of a non-linear function of the output variables.

4.1 Multivariate normal distribution

We assume that the input variable $X = (X_1, X_2, \ldots, X_K)^T$ follows a multivariate normal distribution $N_K(\mu, \Sigma)$, where $\mu = (\mu_1, \mu_2, \ldots, \mu_K)^T$ is the mean vector and $\Sigma = ((\sigma_{ij}))_{K \times K}$ is the covariance matrix. Now, $X_1 \sim N(\mu_1, \sigma_{11})$, and for $k = 2, 3, \ldots, K$ the conditional distribution of the $k$-th component of $X$ is given by $X_k \mid x_1, \ldots, x_{k-1} \sim N(\mu^*_k, \sigma^*_k)$, where
$$\mu^*_k = \mu_k + \sigma_k^T \Sigma_{kk}^{-1}\left(x^{(k)} - \mu^{(k)}\right), \tag{4.1}$$
$$\sigma^*_k = \sigma_{kk} - \sigma_k^T \Sigma_{kk}^{-1} \sigma_k. \tag{4.2}$$
Here, $x^{(k)} = (x_1, x_2, \ldots, x_{k-1})^T$, $\mu^{(k)} = (\mu_1, \mu_2, \ldots, \mu_{k-1})^T$, $\sigma_k = (\sigma_{1k}, \sigma_{2k}, \ldots, \sigma_{(k-1)k})^T$, and $\Sigma_{kk}$ is the $(k-1) \times (k-1)$ matrix containing the first $(k-1)$ rows and $(k-1)$ columns of $\Sigma$. Then, using the proposed LHSD algorithm (Algorithm 2), we get samples from the joint distribution through the following steps:
1. Using Algorithm 1, generate $z_{jk}$, $j = 1, 2, \ldots, N$, $k = 1, 2, \ldots, K$, a LHS of size $N$ from $Z$ whose components are i.i.d. $U(0,1)$.
2. For $j = 1, 2, \ldots, N$ the LHSD from $X$ is given by
   • $x_{j1} = \Phi^{-1}(z_{j1}; \mu_1, \sigma_{11})$,
   • $x_{j2} = \Phi^{-1}(z_{j2}; \mu^*_2, \sigma^*_2)$,
   • $x_{j3} = \Phi^{-1}(z_{j3}; \mu^*_3, \sigma^*_3)$,
   ...
   • $x_{jK} = \Phi^{-1}(z_{jK}; \mu^*_K, \sigma^*_K)$,
where $\Phi^{-1}(\cdot\,; \mu^*, \sigma^*)$ is the inverse distribution function of the normal distribution with mean $\mu^*$ and variance $\sigma^*$.
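These steps translate directly into code, assuming nothing beyond NumPy/SciPy; `lhsd_mvn` is our hypothetical name.

```python
import numpy as np
from scipy import stats

def lhsd_mvn(n, mu, Sigma, rng=None):
    """LHSD for X ~ N_K(mu, Sigma) using the conditionals (4.1)-(4.2)."""
    rng = np.random.default_rng(rng)
    mu, Sigma = np.asarray(mu, float), np.asarray(Sigma, float)
    K = len(mu)
    p = np.column_stack([rng.permutation(n) + 1 for _ in range(K)])
    z = (p - rng.uniform(size=(n, K))) / n            # step 1: LHS on (0,1)^K
    x = np.empty((n, K))
    x[:, 0] = stats.norm.ppf(z[:, 0], loc=mu[0], scale=np.sqrt(Sigma[0, 0]))
    for k in range(1, K):
        S_inv = np.linalg.inv(Sigma[:k, :k])
        s_k = Sigma[:k, k]
        cond_sd = np.sqrt(Sigma[k, k] - s_k @ S_inv @ s_k)       # (4.2)
        for j in range(n):
            cond_mean = mu[k] + s_k @ S_inv @ (x[j, :k] - mu[:k])  # (4.1)
            x[j, k] = stats.norm.ppf(z[j, k], loc=cond_mean, scale=cond_sd)
    return x
```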
In this simulation study, the input variable $X$ is generated from a four-dimensional multivariate normal distribution $N_4(\mu, \Sigma)$. The parameters of the distribution, i.e. $\mu$ and $\Sigma$, are randomly chosen and then kept fixed for all replications. Here, $\mu$ is generated from the standard normal distribution; for the covariance matrix $\Sigma$, we first generated a $4 \times 4$ matrix $P$ from the standard normal distribution and then took $\Sigma = P^T P$. We generated random samples using five different methods: (i) LHSD, the LHS for dependent inputs using our proposed Algorithm 2; (ii) LHSD$_c$, the centered LHSD proposed in Section 2.2.1; (iii) LHS(ind), the LHS that assumes the components of $X$ are independent, as given in Section 2.1; (iv) LHS(Stein), the rank-based LHS proposed by Stein (1987); and (v) SRS, simple random sampling. We used a fixed sample size of $N = 30$ in all five cases and replicated the simulations 10,000 times. We are interested in estimating $\tau = E(h(X))$, where
$$h(X) = x_1 + x_2 x_3 - x_4 \log(|x_1|) + \exp(x_2), \tag{4.3}$$
and $X = (x_1, x_2, x_3, x_4)^T$. We estimate $\tau$ by $\hat\tau = N^{-1}\sum_{j=1}^N h(X_j)$ under each of the five sampling procedures. The empirical densities of $\hat\tau$ are plotted in Figure 1.

[Figure 1: The empirical density plot of $\hat\tau$ using five different sampling methods, where samples are drawn from a multivariate normal distribution.]

We also repeated the experiment with sample sizes 20, 75, and 100. The bias, variance, and MSE of $\sqrt{N}\hat\tau$ for all five sampling methods are presented in Table 1. The true value of $\tau = E(h(X))$ is calculated from a very large SRS of $X$. The LHS(ind) estimator shows a considerable bias; all other methods give very small bias. The variance of the SRS is very large, and for this reason its MSE is also very large. On the other hand, our LHSD-based methods give the lowest MSE among all the methods. As mentioned in the previous section, the centered LHSD introduces a small bias but substantially reduces the variance of the estimator; as a result, it outperforms the other sampling schemes.

Methods      Bias    Variance   MSE      Bias    Variance   MSE
             N = 20                      N = 30
LHSD         0.08    10.55      10.56    0.03    10.22      10.22
LHSD_c      -0.20     7.19       7.23   -0.15     8.24       8.26
LHS(ind)    -6.59    21.24      64.61   -8.15    21.03      87.45
LHS(Stein)  -0.39    13.06      13.21   -0.22    12.81      12.86
SRS         -0.03    16.50      16.50    0.12    16.75      16.77
             N = 75                      N = 100
LHSD         0.04     9.18       9.18   -0.42     9.09       9.27
LHSD_c      -0.09     8.25       8.26   -0.56     8.10       8.42
LHS(ind)   -12.77    20.03     183.16  -15.29    19.40     253.24
LHS(Stein)  -0.24    11.96      12.02   -0.60    12.13      12.49
SRS         -0.06    16.55      16.55   -0.41    16.64      16.81

Table 1: The bias, variance and MSE of $\sqrt{N}\hat\tau$ using five different sampling methods, where samples are drawn from a multivariate normal distribution.

Table 1 also shows that the bias and variance of $\sqrt{N}\hat\tau$ for the LHSD are stable as the sample size increases, which demonstrates that the estimator is $\sqrt{N}$-consistent, as proved in Theorem 2.
We also calculated the theoretical variances of the LHSD and SRS methods using equations (3.5) and (3.6), respectively, for sample size 30. We generated a large SRS of size $N$ = 10,000 and estimated $\alpha_k(z_k)$ and $r(z)$ as discussed at the end of Section 3, using the k-nearest neighbor algorithm for the non-parametric regression. The theoretical variances of $\sqrt{N}\hat\tau$ for the LHSD and SRS are obtained as 9.45 and 17.60, respectively; these values are very close to the simulation results given in Table 1.

Now we present a comparison study of how well the different sampling schemes represent the target distribution. We need a goodness-of-fit measure for the deviation of a random sample from the target distribution $N(\mu, \Sigma)$; the Kullback-Leibler (KL) divergence is used for this purpose (Kullback and Leibler, 1951). Suppose $f(\cdot)$ is the probability density function (p.d.f.) of the target distribution, and $f_n(\cdot)$ is the empirical p.d.f. Then the KL-divergence between the two densities is defined as
$$\mathrm{KL}(f_n, f) = \int f_n(x)\log\frac{f_n(x)}{f(x)}\, dx = \int f_n(x)\log f_n(x)\, dx - \int f_n(x)\log f(x)\, dx. \tag{4.4}$$
The first term in equation (4.4) is the negative Shannon entropy of $f_n$, which is estimated non-parametrically, while the second term is estimated by $N^{-1}\sum_{i=1}^N \log f(x_i)$. A small value of the KL-divergence indicates that the densities $f_n$ and $f$ are close to each other; if the two densities are identical, the KL-divergence is zero, and otherwise it is always positive. The mean and variance of the KL-divergence over 10,000 replications are presented in Table 2 for different sample sizes.

Methods   Mean      Variance   Mean      Variance
          N = 20               N = 30
LHSD      13.372    0.067      12.962    0.034
          N = 75               N = 100
LHSD      12.287    0.009      12.123    0.006

Table 2: The mean and variance of the KL-divergence over 10,000 replications, where samples are drawn from a multivariate normal distribution.

The SRS gives the minimum mean, as it generates unbiased random samples from the target distribution. Our LHSD-based sampling schemes also give mean KL-divergences very close to the SRS, and their variances are around ten times smaller than that of the SRS. The mean from the rank-based LHS method is very high for small sample sizes, which indicates a large bias in sampling from the target distribution. We also noticed that the variance of the KL-divergence using the rank-based LHS is considerably high in this example.
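The entropy term can be estimated in several ways; the sketch below approximates $f_n$ with a Gaussian kernel density estimate and both integrals by averages over the sample itself, which is one simple plug-in choice and not necessarily the estimator used for the tables above.

```python
import numpy as np
from scipy import stats

def kl_divergence_estimate(x, logpdf_target):
    """Plug-in estimate of KL(f_n, f) in (4.4).

    The empirical density f_n is approximated by a Gaussian KDE and both
    integrals by sample averages.
    """
    kde = stats.gaussian_kde(x.T)                  # x has shape (N, K)
    return float(np.mean(np.log(kde(x.T)) - logpdf_target(x)))

# Example: deviation of a sample from a bivariate standard normal target.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=500)
target = stats.multivariate_normal([0.0, 0.0], np.eye(2))
kl = kl_divergence_estimate(sample, target.logpdf)
```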
4.2 Gumbel's bivariate logistic distribution

In this section, we assume that the input variable $X$ follows Gumbel's bivariate logistic distribution, whose joint distribution function is given by
$$F(x_1, x_2) = \frac{1}{1 + \exp(-x_1) + \exp(-x_2)}, \quad -\infty < x_1, x_2 < \infty.$$
The marginal distribution of $X_k$ is given by $F_k(x_k) = (1 + \exp(-x_k))^{-1}$, $k = 1, 2$. The corresponding bivariate copula function is given by
$$C(u_1, u_2) = \frac{u_1 u_2}{u_1 + u_2 - u_1 u_2}, \quad u_1, u_2 \in [0,1],$$
and the conditional copula is
$$C(u_2 \mid u_1) = \left(\frac{u_2}{u_1 + u_2 - u_1 u_2}\right)^2.$$
The samples from the joint distribution are obtained using the following steps:
1. Using Algorithm 1, generate $z_{jk}$, $j = 1, 2, \ldots, N$, $k = 1, 2$, a LHS of size $N$ from $Z$ whose components are i.i.d. $U(0,1)$.
2. For $j = 1, 2, \ldots, N$, the corresponding samples from the joint copula are given by
   • $u_{j1} = z_{j1}$,
   • $u_{j2} = C^{-1}(z_{j2} \mid u_{j1}) = \dfrac{u_{j1}\sqrt{z_{j2}}}{1 - \sqrt{z_{j2}} + u_{j1}\sqrt{z_{j2}}}$.
3. For $j = 1, 2, \ldots, N$, the LHSD from $X$ is given by $x_{jk} = F_k^{-1}(u_{jk}) = -\log\left(\dfrac{1 - u_{jk}}{u_{jk}}\right)$, $k = 1, 2$.
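Because the inverse conditional copula and the logistic quantile are both available in closed form, the whole sampler is a few lines (the function name is ours):

```python
import numpy as np

def lhsd_gumbel_logistic(n, rng=None):
    """LHSD for Gumbel's bivariate logistic distribution (steps 1-3 above)."""
    rng = np.random.default_rng(rng)
    p = np.column_stack([rng.permutation(n) + 1 for _ in range(2)])
    z = (p - rng.uniform(size=(n, 2))) / n       # step 1: LHS on (0,1)^2
    u1 = z[:, 0]                                 # step 2: u_j1 = z_j1
    s = np.sqrt(z[:, 1])
    u2 = u1 * s / (1.0 - s + u1 * s)             # inverse conditional copula
    u = np.column_stack([u1, u2])
    return -np.log((1.0 - u) / u)                # step 3: logistic quantiles
```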
We are interested in estimating $\tau = E(h(X))$, where $h(X) = x_1 - x_2 + x_1\log(|x_2|)$. We generate random samples using the same five methods discussed in the previous example. Here we use $N = 30$ in all five cases and replicate the simulations 10,000 times. The empirical density functions of $\hat\tau$ using these methods are presented in Figure 2. The biases and MSEs are given in Table 3 for four different sample sizes, $N$ = 20, 30, 75, and 100. It is seen that the centered LHSD performs best. The rank-based LHS method and the LHSD have similar performance, but the LHSD is slightly better in terms of the MSE. The LHS assuming independence produces a very large bias, and the SRS produces a large variance, as expected.

Like the previous example, we also estimated the KL-divergence using equation (4.4); the result is shown in Table 4. On average, the SRS produces the minimum KL-divergence, as the sampling scheme is unbiased. Both the LHSD and LHSD$_c$ give slightly higher mean values than the SRS, but lower than the independent LHS and the rank-based LHS methods. The variance of the KL-divergence using the LHSD$_c$ method is very low compared to the other methods.

[Figure 2: The empirical density plot of $\hat\tau$ using five different sampling methods, where samples are drawn from the bivariate logistic distribution.]

Methods      Bias    Variance   MSE     Bias    Variance   MSE
             N = 20                     N = 30
LHSD         0.24    3.74       3.79    0.22    3.42       3.46
             N = 75                     N = 100
LHSD        -0.23    3.19       3.24    0.06    3.14       3.15
LHSD_c      -0.09    2.55       2.55    0.16    2.76       2.79
LHS(ind)     2.61    4.45      11.24    3.33    4.38      15.45
LHS(Stein)  -0.11    3.26       3.27    0.20    3.21       3.25
SRS         -0.23    9.92       9.97    0.06    9.90       9.91

Table 3: The bias, variance and MSE of $\sqrt{N}\hat\tau$ using five different sampling methods, where samples are drawn from the bivariate logistic distribution.

Methods   Mean     Variance   Mean     Variance
          N = 20              N = 30
LHSD      7.732    0.033      7.682    0.016
          N = 75              N = 100
LHSD      7.639    0.004      7.636    0.003

Table 4: The mean and variance of the KL-divergence over 10,000 replications, where samples are drawn from the bivariate logistic distribution.

5 Application to a river flood inundation model

We now illustrate our sampling methodology on a real application related to a river flood inundation model (Iooss, 2011). The study case concerns an industrial site located near a river and protected from it by a dyke; when the river height exceeds the height of the dyke, flooding occurs. The goal is to study the water level with respect to the dyke height. The model considered here is based on a crude simplification of the 1D hydro-dynamical Saint-Venant equations under the assumptions of a uniform and constant flow rate and a large rectangular section. It consists of equations involving the characteristics of the river stretch:
$$S = Z_v + H - H_d - C_b, \tag{5.1}$$
$$H = \left(\frac{Q}{B K_s \sqrt{(Z_m - Z_v)/L}}\right)^{0.6}. \tag{5.2}$$
The model output $S$ is the maximal annual overflow (in meters), and $H$ is the maximal annual height of the river (in meters).
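A direct transcription of (5.1) and (5.2); the example input values sit near the modes of the distributions in Table 5 and are our own illustration.

```python
import numpy as np

def flood_model(Q, Ks, Zv, Zm, Hd, Cb, L, B):
    """River flood model of equations (5.1)-(5.2); returns (S, H) in metres."""
    H = (Q / (B * Ks * np.sqrt((Zm - Zv) / L))) ** 0.6   # (5.2)
    S = Zv + H - Hd - Cb                                  # (5.1)
    return S, H

# Example near the modes of the input distributions in Table 5.
S, H = flood_model(Q=1013.0, Ks=30.0, Zv=50.0, Zm=55.0,
                   Hd=8.0, Cb=55.5, L=5000.0, B=300.0)
```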
The inputs of the model (8 inputs), together with their probability distributions, are defined in Table 5. Among the input variables of the model, the dyke height ($H_d$) is a design parameter; its variation range corresponds to a design domain. The randomness of the other variables is due to their spatio-temporal variability or our ignorance of their true values.

Input  Description                  Unit   Probability distribution
Q      Maximal annual flowrate      m³/s   Truncated Gumbel G(1013, 558) on [500, 3000]
K_s    Strickler coefficient        -      Truncated normal N(30, 8) on [15, ∞)
Z_v    River downstream level       m      Triangular T(49, 50, 51)
Z_m    River upstream level         m      Triangular T(54, 55, 56)
H_d    Dyke height                  m      Uniform U(7, 9)
C_b    Bank level                   m      Triangular T(55, 55.5, 56)
L      Length of the river stretch  m      Triangular T(4990, 5000, 5010)
B      Breadth of the river         m      Triangular T(295, 300, 305)

Table 5: The input variables of the river flood inundation model and their probability distributions.

The maximal annual flowrate ($Q$) and the Strickler coefficient ($K_s$) are correlated, with correlation coefficient $\rho_{Q,K_s} = 0.5$. The river downstream level ($Z_v$) and the river upstream level ($Z_m$) are known to be correlated with correlation coefficient $\rho_{Z_m,Z_v} = 0.3$. The correlation coefficient between the length of the river ($L$) and the breadth of the river ($B$) is also known to be $\rho_{L,B} = 0.3$. The other input variables are assumed to be uncorrelated.

The quantity of interest is the mean maximal annual overflow, $\tau = E(S)$. We compare our proposed LHSD-based methods with the SRS, LHS(ind), and LHS(Stein) in estimating $\tau$. Although the marginal distributions of the inputs $X = (Q, K_s, Z_v, Z_m, H_d, C_b, L, B)^T$ are known, the joint distribution is not. We construct a joint distribution using the multivariate normal copula with the known pairwise correlation coefficients, as described in Section 2.2.2; this choice of copula is appropriate here as it retains the known correlation structure of the inputs. The multivariate normal copula function for $K$ variables is given by
$$C(u_1, u_2, \ldots, u_K; \Sigma) = \Phi_K(\Phi^{-1}(u_1), \Phi^{-1}(u_2), \ldots, \Phi^{-1}(u_K); \Sigma), \tag{5.3}$$
where $\Phi_K(\cdot\,; \Sigma)$ is the joint c.d.f. of the multivariate normal distribution with covariance matrix $\Sigma$, and $\Phi(\cdot)$ is the c.d.f. of the standard normal distribution. In this example $K = 8$ and $\Sigma = ((\sigma_{ij}))$, where $\sigma_{ii} = 1$ for $i = 1, \ldots, K$, $\sigma_{12} = \sigma_{21} = 0.5$, $\sigma_{34} = \sigma_{43} = 0.3$, $\sigma_{78} = \sigma_{87} = 0.3$, and all other elements of $\Sigma$ are 0. As most of the components of the copula are independent and the dependence exists only pairwise, the conditional copulas are easily obtained from the properties of the bivariate normal distribution, and their inverses from the inverse of the univariate normal distribution. Hence we use Algorithm 3 to sample the inputs by the LHSD. The same conditional copulas are used to sample from the joint distribution with the SRS, and they are also used in the first step of the rank-based LHS method.
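Since dependence enters only through the three pairs, each conditional copula in Algorithm 3 reduces to a bivariate Gaussian conditional. A sketch (our own arrangement, with the marginal quantile functions supplied by the caller, and with $\rho_{Q,K_s} = 0.5$ as in $\Sigma$ above):

```python
import numpy as np
from scipy import stats

# Indices follow the order (Q, Ks, Zv, Zm, Hd, Cb, L, B);
# PAIRS maps k to (already-sampled partner index, rho).
PAIRS = {1: (0, 0.5), 3: (2, 0.3), 7: (6, 0.3)}

def flood_lhsd(n, ppfs, rng=None):
    """LHSD for the flood inputs via the normal copula of equation (5.3)."""
    rng = np.random.default_rng(rng)
    K = len(ppfs)
    p = np.column_stack([rng.permutation(n) + 1 for _ in range(K)])
    z = (p - rng.uniform(size=(n, K))) / n        # LHS on (0,1)^K
    u = z.copy()                                  # independent coordinates
    for k, (i, rho) in PAIRS.items():             # bivariate normal conditionals
        a = stats.norm.ppf(u[:, i])
        u[:, k] = stats.norm.cdf(rho * a + np.sqrt(1.0 - rho**2)
                                 * stats.norm.ppf(z[:, k]))
    return np.column_stack([ppfs[k](u[:, k]) for k in range(K)])
```

Here `ppfs` holds the eight marginal quantile functions, e.g. `stats.uniform(loc=7, scale=2).ppf` for $H_d \sim U(7, 9)$ and `stats.triang(c=0.5, loc=49, scale=2).ppf` for $Z_v \sim T(49, 50, 51)$; the truncated Gumbel and truncated normal margins require custom quantile functions.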
We sample 30 observations of the inputs using all five methods and compute the corresponding values of the outputs using (5.1). The value of $\tau$ is estimated by the sample mean of the outputs. We repeat this procedure 10,000 times to estimate the bias and variance of the estimator for the five methods, and we also repeat the experiment with sample sizes 20, 75, and 100. The results are shown in Table 6, and the empirical probability distributions of the estimators are shown in Figure 4. Here, the target value of $\tau$ is computed from a very large simple random sample of the inputs.

Methods   Bias    Variance   MSE     Bias    Variance   MSE
          N = 20                     N = 30
LHSD      0.013   0.010      0.010   0.062   0.008      0.011
          N = 75                     N = 100
LHSD      0.017   0.006      0.006   0.002   0.005      0.005

Table 6: The bias, variance and MSE of $\sqrt{N}\hat\tau$ using five different sampling methods, where samples are drawn from the river flood inundation model.

[Figure 4: The empirical density plot of $\hat\tau$ using five different sampling methods, where samples are drawn from the river flood inundation model.]

The biases and MSEs of the estimators of all pairwise correlation coefficients are also computed using the five sampling methods (see Table 7). The true correlation coefficients of the first three pairs in that table, i.e. $(Q, K_s)$, $(Z_v, Z_m)$, and $(L, B)$, are non-zero. The last column, for both bias and MSE, combines all other pairs, in which the variables are pairwise independent; we report the maximum absolute bias and MSE over those pairs. The table shows that all methods except LHS(ind) estimate the correlation coefficients efficiently. The LHSD and LHSD$_c$ methods give smaller bias and MSE for the correlation coefficients than the rank-based LHS method. Combining these with the earlier results, we observe that the centered LHSD produces the best samples for estimating the output mean.

                 Bias                                    MSE
Methods     (Q,K_s)  (Z_v,Z_m)  (L,B)   max|Other|  (Q,K_s)  (Z_v,Z_m)  (L,B)   max|Other|
LHSD        -0.026   -0.006     -0.009  0.003       0.019    0.027      0.027   0.036
LHSD_c      -0.023   -0.009     -0.011  0.003       0.018    0.027      0.027   0.036
LHS(ind)    -0.502   -0.297     -0.300  0.003       0.286    0.123      0.125   0.036
LHS(Stein)  -0.038   -0.015     -0.013  0.002       0.026    0.030      0.030   0.036
SRS         -0.017   -0.005     -0.004  0.002       0.023    0.029      0.029   0.036

Table 7: The biases and MSEs of the correlation coefficients for different pairs of variables, where samples are drawn from the river flood inundation model.

6 Conclusion

In this paper, we have presented new sampling methods, based on Latin hypercube sampling, for estimating the mean of an output variable when the inputs are treated as random variables with dependent components. It is theoretically proved that the asymptotic variance of the estimator is smaller than that of the estimator based on the SRS. The simulation results show that our estimators outperform the SRS and also give smaller mean squared errors than the existing LHS methods. The main advantage of our method is that it retains the exact joint distribution of the input variables; to the best of our knowledge, no other LHS-based sampling scheme satisfies this property in the case of dependent inputs.
References
Chastaing, G., Gamboa, F., and Prieur, C. (2012). Generalized Hoeffding-Sobol decomposition for dependent variables: application to sensitivity analysis. Electron. J. Stat., 6:2420–2448.

Conti, S., Gosling, J. P., Oakley, J. E., and O'Hagan, A. (2009). Gaussian process emulation of dynamic computer codes. Biometrika, 96(3):663–676.

Fang, K.-T., Li, R., and Sudjianto, A. (2006). Design and Modeling for Computer Experiments. Computer Science and Data Analysis Series. Chapman & Hall/CRC, Boca Raton, FL.

Iman, R. L. and Conover, W. J. (1980). Small sample sensitivity analysis techniques for computer models, with an application to risk assessment. Comm. Statist. A—Theory Methods, 9(17):1749–1874.

Iooss, B. (2011). Revue sur l'analyse de sensibilité globale de modèles numériques. J. SFdS, 152(1):3–25.

Kennedy, M. C. and O'Hagan, A. (2001). Bayesian calibration of computer models. J. Roy. Statist. Soc. Ser. B, 63(3):425–464.

Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statistics, 22:79–86.

McKay, M. D., Beckman, R. J., and Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245.

Oakley, J. and O'Hagan, A. (2002). Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784.

O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering & System Safety, 91(10):1290–1300.

Owen, A. B. (1992). A central limit theorem for Latin hypercube sampling. J. Roy. Statist. Soc. Ser. B, 54(2):541–551.

Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989). Design and analysis of computer experiments. Statist. Sci., 4(4):409–435.

Simpson, T. W., Poplinski, J., Koch, P. N., and Allen, J. K. (2001). Metamodels for computer-based engineering design: survey and recommendations. Engineering with Computers, 17(2):129–150.

Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8:229–231.

Stein, M. (1987). Large sample properties of simulations using Latin hypercube sampling. Technometrics, 29(2):143–151.

Volkova, E., Iooss, B., and Van Dorpe, F. (2008). Global sensitivity analysis for a numerical model of radionuclide migration from the RRC 'Kurchatov Institute' radwaste disposal site. Stochastic Environmental Research and Risk Assessment, 22(1):17–31.
Appendix
Proof of Theorem 1.
Suppose $Z_1, Z_2, \ldots, Z_N$ form a LHS from $Z$, where the components of $Z$ are i.i.d. $U(0,1)$ variables. Then, using the inverse transformation in Equation (3.1), we get
$$\hat\tau = \frac{1}{N}\sum_{j=1}^N h(X_j) = \frac{1}{N}\sum_{j=1}^N h^*(Z_j),$$
where $X_1, X_2, \ldots, X_N$ form a LHSD from $X$ having the target distribution $F$. For $0 \le y_1, y_2 < 1$, define
$$G_N(y_1, y_2) = \begin{cases} 1, & \text{if } [Ny_1] = [Ny_2], \\ 0, & \text{otherwise}, \end{cases}$$
where $[y]$ denotes the greatest integer less than or equal to $y$. Following McKay et al. (1979), the joint density of $Z_1$ and $Z_2$ can be written as
$$f(z_1, z_2) = \left(\frac{N}{N-1}\right)^K \prod_{k=1}^K \left(1 - G_N(z_{1k}, z_{2k})\right),$$
where $z_j = (z_{j1}, z_{j2}, \ldots, z_{jK})^T$ for $j = 1, 2$. With $A_K = [0,1]^K$, we then have
$$\mathrm{Cov}(h^*(Z_1), h^*(Z_2)) = \left(\frac{N}{N-1}\right)^K \int_{A_K}\int_{A_K} h^*(z_1)\, h^*(z_2) \prod_{k=1}^K \left(1 - G_N(z_{1k}, z_{2k})\right) dz_1\, dz_2 - \tau^2.$$
Using $\left(\frac{N}{N-1}\right)^K = 1 + \frac{K}{N} + O\!\left(\frac{1}{N^2}\right)$ and keeping only the first order terms in the expansion of the product, we get
$$\begin{aligned}
\mathrm{Cov}(h^*(Z_1), h^*(Z_2)) &= \left(\frac{N}{N-1}\right)^K \int\!\!\int h^*(z_1)\, h^*(z_2)\, dz_1\, dz_2 \\
&\quad - \left(\frac{N}{N-1}\right)^K \sum_{k=1}^K \int\!\!\int h^*(z_1)\, h^*(z_2)\, G_N(z_{1k}, z_{2k})\, dz_1\, dz_2 - \tau^2 + O\!\left(\frac{1}{N^2}\right) \\
&= \left(\frac{K}{N} + O\!\left(\frac{1}{N^2}\right)\right)\tau^2 - \left(1 + O\!\left(\frac{1}{N}\right)\right)\sum_{k=1}^K \int\!\!\int h^*(z_1)\, h^*(z_2)\, G_N(z_{1k}, z_{2k})\, dz_1\, dz_2 + O\!\left(\frac{1}{N^2}\right).
\end{aligned} \tag{6.1}$$
Define $I_j = ((j-1)/N, j/N]$. Using Equation (3.3), we have
$$\int\!\!\int h^*(z_1)\, h^*(z_2)\, G_N(z_{1k}, z_{2k})\, dz_1\, dz_2 = \int_0^1\!\!\int_0^1 (\alpha_k(z_{1k}) + \tau)(\alpha_k(z_{2k}) + \tau)\, G_N(z_{1k}, z_{2k})\, dz_{1k}\, dz_{2k} = \sum_{j=1}^N \left(\int_{I_j} (\alpha_k(z) + \tau)\, dz\right)^2. \tag{6.2}$$
If $\alpha_k$ is Riemann integrable, one can show that
$$N \sum_{j=1}^N \left(\int_{I_j} (\alpha_k(z) + \tau)\, dz\right)^2 \longrightarrow \int_0^1 (\alpha_k(z) + \tau)^2\, dz. \tag{6.3}$$
Combining Equations (6.1), (6.2) and (6.3), we have
$$\mathrm{Cov}(h^*(Z_1), h^*(Z_2)) = \frac{K}{N}\tau^2 - \frac{1}{N}\sum_{k=1}^K \int_0^1 (\alpha_k(z_k) + \tau)^2\, dz_k + o\!\left(\frac{1}{N}\right). \tag{6.4}$$
Therefore,
$$\begin{aligned}
\mathrm{Var}_{\mathrm{LHSD}}(\hat\tau) &= \frac{1}{N}\mathrm{Var}(h^*(Z_1)) + \frac{N-1}{N}\mathrm{Cov}(h^*(Z_1), h^*(Z_2)) \\
&= \frac{1}{N}\left\{\int_{A_K} h^*(z)^2\, dz - \tau^2 + K\tau^2 - \sum_{k=1}^K \int_0^1 (\alpha_k(z_k) + \tau)^2\, dz_k\right\} + o\!\left(\frac{1}{N}\right) \\
&= \frac{1}{N}\left\{\int_{A_K} h^*(z)^2\, dz - \tau^2 - \sum_{k=1}^K \int_0^1 \alpha_k(z_k)^2\, dz_k\right\} + o\!\left(\frac{1}{N}\right) \\
&= \frac{1}{N}\int_{A_K} \left\{h^*(z) - \tau - \sum_{k=1}^K \alpha_k(z_k)\right\}^2 dz + o\!\left(\frac{1}{N}\right) \\
&= \frac{1}{N}\int_{A_K} r(z)^2\, dz + o\!\left(\frac{1}{N}\right).
\end{aligned}$$
Proof of Theorem 2. From Theorem 1 we find that the asymptotic variance of $N^{1/2}(\hat\tau - \tau)$ is $\int_{A_K} r(z)^2\, dz$. As $E(\hat\tau) = \tau$, it is enough to prove that the asymptotic distribution of $N^{1/2}(\hat\tau - \tau)$ is normal. We write
$$N^{1/2}(\hat\tau - \tau) = N^{1/2}\left(\sum_{k=1}^K \bar\alpha_k + \bar R\right), \tag{6.5}$$
where
$$\bar\alpha_k = \frac{1}{N}\sum_{j=1}^N \alpha_k(Z_{jk}) \quad \text{and} \quad \bar R = \frac{1}{N}\sum_{j=1}^N r(Z_j).$$
Theorem 1 shows that Var