Gaussian process surrogates for failure detection: a Bayesian experimental design approach
Hongqiao Wang (Institute of Natural Sciences and Department of Mathematics, Shanghai Jiao Tong University, Shanghai 200240, China)
Guang Lin (Department of Mathematics and School of Mechanical Engineering, Purdue University, 150 N. University Street, West Lafayette, IN 47907-2067, USA)
Jinglai Li (Institute of Natural Sciences, Department of Mathematics, and MOE Key Laboratory of Scientific and Engineering Computing, Shanghai Jiao Tong University, Shanghai 200240, China; corresponding author)
Abstract
An important task of uncertainty quantification is to identify the probability of undesired events, in particular system failures, caused by various sources of uncertainties. In this work we consider the construction of Gaussian process surrogates for failure detection and failure probability estimation. In particular, we consider the situation that the underlying computer models are extremely expensive, and in this setting, determining the sampling points in the state space is of essential importance. We formulate the problem as an optimal experimental design for Bayesian inferences of the limit state (i.e., the failure boundary) and propose an efficient numerical scheme to solve the resulting optimization problem. In particular, the proposed limit-state inference method is capable of determining multiple sampling points at a time, and thus it is well suited for problems where multiple computer simulations can be performed in parallel. The accuracy and performance of the proposed method is demonstrated by both academic and practical examples.
Key words:
Bayesian inference, experimental design, failure detection, Gaussian processes, Monte Carlo, response surfaces, uncertainty quantification
Preprint submitted to Elsevier 16 September 2015

1 Introduction
Real-life engineering systems are unavoidably subject to various uncertainties such as material properties, geometric parameters, boundary conditions and applied loadings. These uncertainties may cause undesired events, in particular system failures or malfunctions, to occur. Accurate identification of the failure region and evaluation of the failure probability of a given system is an essential task in many fields of engineering such as risk management, structural design, reliability-based optimization, etc.

Conventionally the failure probability is often computed by constructing linear or quadratic expansions of the system model around the so-called most probable point, known as the first/second-order reliability method (FORM/SORM); see e.g. [15,39] and the references therein. It is well known that FORM/SORM may fail for systems with nonlinearity or multiple failure regions. The Monte Carlo (MC) simulation, which estimates the failure probability by repeatedly simulating the underlying system, is another popular method for solving such problems. The MC method makes no approximation to the underlying computer models and thus can be applied to any system. On the other hand, the MC method is notorious for its slow convergence, and can become prohibitively expensive when the underlying computer model is computationally intensive and/or the system failures are rare, since each sample requires a full-scale numerical simulation of the system. To reduce the computational effort, one can construct a computationally inexpensive approximation to the true model, and then evaluate the approximate model in the MC simulations. Such approximate models are also known as response surfaces, surrogates, metamodels, emulators, etc. These methods are referred to as response surface (RS) methods [20,10,37,21,22,36] in this work.
The response surface can often provide a reliable estimate of the failure probability at a much lower computational cost than direct MC simulations. In this work we focus on a specific kind of RS, the Gaussian process (GP) surrogate, also known as kriging in many fields of application. GP surrogates have been widely used in machine learning [44], geostatistics [34], engineering optimization [40], and most recently, uncertainty quantification [8,9]. A number of GP-based methods have also been successfully implemented for failure probability estimation [28,4]. In this work we consider the situation where the underlying computer models are extremely expensive and one can only afford a very limited number of simulations. In this setting, choosing the sampling points (i.e. the parameter values with which the simulation is performed) in the state space is of essential importance.
Email addresses: [email protected] (Hongqiao Wang), [email protected] (Guang Lin), [email protected] (Jinglai Li).

In this work we adopt an information-theoretic optimal experimental design, which uses the relative entropy from the prior to the posterior as the design criterion, to determine the sampling points. We also present an efficient numerical scheme for solving the resulting optimal design problem, modified from the simulation-based method developed in [24]. We compare the performance of the proposed limit-state inference (LSI) method with that of the stepwise uncertainty reduction (SUR) method by numerical examples.

We note that another line of research in failure probability estimation is to develop more efficient sampling schemes, such as subset simulation [3,2], importance sampling [32,19], sequential Monte Carlo [11], the cross-entropy method [38,43], etc. For practical engineering systems, computer simulations can be extremely time consuming, and in many cases one can only afford a very limited number of simulations, nothing beyond a few hundred. In this case, even the most effective sampling method is not applicable. To this end, surrogates are needed even in those advanced sampling schemes, and in particular the LSI method can be easily integrated into the aforementioned sampling schemes, resulting in more efficient estimation schemes. Examples of combining surrogates and efficient sampling schemes include [27,28,4,16].

The rest of this paper is organized as follows. We first review the preliminaries of our work in Section 2, including the mathematical formulation of failure probability computation and the GP surrogates. Our Bayesian experimental design framework and its numerical implementations are presented in Section 3. Numerical examples are presented in Section 4 to demonstrate the effectiveness of the proposed method, and finally Section 5 offers some closing remarks.
2 Preliminaries

2.1 Failure probability estimation

Here we describe the failure probability estimation problem in a general setting. We consider a probabilistic model where $\mathbf{x}$ is a $d$-dimensional random variable that represents the uncertainty in the model, and the system failure is defined using a real-valued function $g(\cdot): \mathbb{R}^d \to \mathbb{R}$, known as the limit state function or the performance function. Specifically, the event of failure is defined as $g(\mathbf{x}) < 0$, and the failure probability is

$$P = \mathbb{P}(g(\mathbf{x}) < 0) = \int_{\{\mathbf{x} \in \mathbb{R}^d \mid g(\mathbf{x}) < 0\}} p(\mathbf{x})\, d\mathbf{x} = \int I_g(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x},$$

where $I_g(\mathbf{x})$ is the indicator function

$$I_g(\mathbf{x}) = \begin{cases} 1, & g(\mathbf{x}) < 0, \\ 0, & g(\mathbf{x}) \ge 0, \end{cases}$$

and $p(\mathbf{x})$ is the probability density function (PDF) of $\mathbf{x}$. In what follows we shall omit the integration domain when it is simply $\mathbb{R}^d$. This is a general definition of the failure probability, used widely in many disciplines involving reliability analysis and risk management. $P$ can be computed with the standard Monte Carlo estimator

$$\hat{P} = \frac{1}{n} \sum_{i=1}^{n} I_g(\mathbf{x}_i), \qquad (1)$$

where the samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are drawn from the distribution $p(\mathbf{x})$.

In practice, many engineering systems require high reliability, namely a failure probability $P \ll 1$. In this case, MC requires a rather large number of samples to produce a reliable estimate of the failure probability: an estimate with a 10% coefficient of variation requires roughly $100/P$ samples. On the other hand, in almost all practical cases, the limit state function $g(\mathbf{x})$ does not admit an analytical expression and has to be evaluated through expensive computer simulations, which renders the crude MC estimation of the failure probability prohibitive. To reduce the number of full-scale computer simulations, one can construct a computationally inexpensive surrogate $G(\mathbf{x})$ to replace the true function $g(\mathbf{x})$ in the MC estimation. In this work we choose Gaussian process surrogates, described in the next section.

2.2 Gaussian process surrogates

The GP method constructs an approximation of $g(\mathbf{x})$ in a nonparametric Bayesian regression framework [33,44]. Specifically, the target function $g(\mathbf{x})$ is cast as

$$g(\mathbf{x}) = \mu(\mathbf{x}) + \epsilon(\mathbf{x}), \qquad (2)$$

where $\mu(\mathbf{x})$ is a real-valued mean function and $\epsilon(\mathbf{x})$ is a zero-mean Gaussian process whose covariance is specified by a kernel $K(\mathbf{x}, \mathbf{x}')$, namely

$$\mathrm{COV}[\epsilon(\mathbf{x}), \epsilon(\mathbf{x}')] = K(\mathbf{x}, \mathbf{x}').$$

The kernel $K(\mathbf{x}, \mathbf{x}')$ is positive semidefinite and bounded. Suppose that $n$ computer simulations of the function $g(\mathbf{x})$ are performed at the parameter values $X^* := [\mathbf{x}^*_1, \ldots, \mathbf{x}^*_n]$, yielding function evaluations $\mathbf{y}^* := [y^*_1, \ldots, y^*_n]$, where $y^*_i = g(\mathbf{x}^*_i)$ for $i = 1, \ldots, n$. Suppose we want to predict the function values at the points $D := [\mathbf{x}_1, \ldots, \mathbf{x}_m]$, i.e., $\mathbf{y} = [y_1, \ldots, y_m]$ where $y_j = g(\mathbf{x}_j)$. The posterior distribution of $\mathbf{y}$ is

$$\mathbf{y} \mid D, X^*, \mathbf{y}^* \sim N(\mathbf{u}, C), \qquad (3a)$$

where the posterior mean $\mathbf{u} = (u_1, \ldots, u_m)$ is an $m$-dimensional vector with elements

$$u_j = \mu(\mathbf{x}_j) + \mathbf{r}_j^T R^{-1} (\mathbf{y}^* - \boldsymbol{\mu}) \quad \text{for } j = 1, \ldots, m, \qquad (3b)$$

where $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)$ and $\mu_i = \mu(\mathbf{x}^*_i)$ for $i = 1, \ldots, n$.
The posterior covariance matrix $C$ is given by

$$(C)_{j,j'} = K(\mathbf{x}_j, \mathbf{x}_{j'}) - \mathbf{r}_j^T R^{-1} \mathbf{r}_{j'}. \qquad (3c)$$

In the equations above, $\mathbf{r}_j$ is an $n$-dimensional vector whose $i$-th component is $K(\mathbf{x}^*_i, \mathbf{x}_j)$, and the matrix $R$ is given by $(R)_{i,i'} = K(\mathbf{x}^*_i, \mathbf{x}^*_{i'})$ for $i, i' = 1, \ldots, n$.

3 The experimental design framework
A key question in constructing a GP surrogate is to determine the locations $D$ where the true limit state function $g(\mathbf{x})$ is evaluated, a task often known as the design of computer experiments. In this section, we show that determining the sampling locations can be translated into a Bayesian experimental design problem whose goal is to infer the location of the limit state under a given prior distribution.

We formulate a Bayesian inference problem based on the following argument. In failure probability estimation, the limit state function $g$ is only used inside the indicator function $I_g(\mathbf{x})$, and so one is really interested in the sign of $g(\mathbf{x})$ rather than its precise value. To this end, the essential task in constructing surrogates for failure probability estimation is to learn the boundary of the failure domain. Let $\mathbf{z}$ represent the boundary of the failure domain, i.e., the solutions of $g(\mathbf{x}) = 0$. We want to identify the boundary $\mathbf{z}$ with Bayesian inference. In this setting, the data are obtained by making observations, i.e., evaluating $g(\mathbf{x})$ at locations $D = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$, resulting in data $\mathbf{y} = (y_1, \ldots, y_n)$ with each $y_i = g(\mathbf{x}_i)$. The likelihood function $p(\mathbf{y} \mid \mathbf{z}, D)$ in this case is given by Eqs. (3) with $X^* = [\mathbf{z}]$ and $\mathbf{y}^* = [0]$. We also need to choose a prior distribution for $\mathbf{z}$, which, without additional information, we simply take to be $p(\mathbf{z})$. The posterior distribution of $\mathbf{z}$ given the data $\mathbf{y}$ can then be computed with Bayes' theorem:

$$p(\mathbf{z} \mid \mathbf{y}, D) = \frac{p(\mathbf{y} \mid \mathbf{z}, D)\, p(\mathbf{z})}{p(\mathbf{y} \mid D)},$$

where $p(\mathbf{y} \mid D)$ is the evidence. As mentioned earlier, determining the locations $D$ can thus be formulated as the design of these Bayesian inference experiments.
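For concreteness, the GP conditioning of Eqs. (3), which supplies the likelihood above, can be sketched in a few lines. The zero prior mean, the squared-exponential kernel, the hyperparameter values, and the test function below are all illustrative assumptions, not choices made in the paper.

```python
import numpy as np

# GP posterior following Eqs. (3a)-(3c), with a zero prior mean mu(x) = 0
# and an (assumed) squared-exponential kernel.
def kernel(A, B, alpha=1.0, beta=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return alpha * np.exp(-d2 / beta)

def gp_posterior(X_star, y_star, D):
    R = kernel(X_star, X_star) + 1e-10 * np.eye(len(X_star))  # jitter for stability
    r = kernel(X_star, D)            # column j is the vector r_j
    w = np.linalg.solve(R, r)        # R^{-1} r_j for every j
    u = w.T @ y_star                 # posterior mean, Eq. (3b) with mu = 0
    C = kernel(D, D) - r.T @ w       # posterior covariance, Eq. (3c)
    return u, C

# Toy data: three evaluations of sin(x), prediction at two new points.
X_star = np.array([[0.0], [0.5], [1.0]])
y_star = np.sin(X_star[:, 0])
D = np.array([[0.25], [0.75]])
u, C = gp_posterior(X_star, y_star, D)
```

The posterior mean interpolates the observations, and the posterior variances on the diagonal of `C` shrink to zero at the evaluated points, which is what drives the design criterion below.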
Following the decision-theoretic optimal experimental design framework, an objective for experimental design can be formed in general as

$$U(D) = \int\!\!\int u(D, \mathbf{y}, \mathbf{z})\, p(\mathbf{z}, \mathbf{y} \mid D)\, d\mathbf{z}\, d\mathbf{y} = \int\!\!\int u(D, \mathbf{y}, \mathbf{z})\, p(\mathbf{z} \mid \mathbf{y}, D)\, p(\mathbf{y} \mid D)\, d\mathbf{z}\, d\mathbf{y}, \qquad (4)$$

where $u(D, \mathbf{y}, \mathbf{z})$ is a utility function and $U(D)$ is the expected utility. The utility function $u$ should reflect the usefulness of an experiment at conditions $D$, i.e., of obtaining a particular outcome $\mathbf{y}$ at conditions $D$ for a particular value of the parameters $\mathbf{z}$. Since we do not know the exact values of $\mathbf{z}$ and $\mathbf{y}$ in advance, we take the expectation of $u$ over the joint distribution of $\mathbf{z}$ and $\mathbf{y}$, resulting in the expected utility $U(D)$. The optimal choice of $D$ is then obtained by maximizing the expected utility over the design space $\Xi$:

$$D^* = \arg\max_{D \in \Xi} U(D). \qquad (5)$$

A popular choice for the utility function is the relative entropy, also known as the Kullback-Leibler divergence (KLD), between the prior and the posterior distributions. For two distributions $p_A(\mathbf{z})$ and $p_B(\mathbf{z})$, the KLD from $p_A$ to $p_B$ is defined as

$$D_{\mathrm{KL}}(p_A \,\|\, p_B) = \int p_A(\mathbf{z}) \ln\!\left[\frac{p_A(\mathbf{z})}{p_B(\mathbf{z})}\right] d\mathbf{z} = \mathbb{E}_A\!\left[\ln \frac{p_A(\mathbf{z})}{p_B(\mathbf{z})}\right], \qquad (6)$$

where we define $0 \ln 0 \equiv$
$0$. This quantity is non-negative and non-symmetric, and reflects the difference in information carried by the two distributions. When the KLD is used in our inference problem, the utility function becomes

$$u(D, \mathbf{y}, \mathbf{z}) \equiv D_{\mathrm{KL}}\big(p(\cdot \mid \mathbf{y}, D)\,\|\,p(\cdot)\big) = \int p(\tilde{\mathbf{z}} \mid \mathbf{y}, D) \ln\!\left[\frac{p(\tilde{\mathbf{z}} \mid \mathbf{y}, D)}{p(\tilde{\mathbf{z}})}\right] d\tilde{\mathbf{z}}. \qquad (7)$$

The utility function $u(D, \mathbf{y}, \mathbf{z})$ in Eq. (7) can be understood as the information gained by performing experiments under conditions $D$, and a larger value of $u$ implies that the experiment is more informative for parameter inference. Note that the utility function in Eq. (7) is independent of $\mathbf{z}$, and as such the expected utility $U(D)$ reduces to

$$U(D) = \int D_{\mathrm{KL}}\big(p(\cdot \mid \mathbf{y}, D)\,\|\,p(\cdot)\big)\, p(\mathbf{y} \mid D)\, d\mathbf{y} = \int\!\!\int p(\mathbf{z} \mid \mathbf{y}, D) \ln\!\left[\frac{p(\mathbf{z} \mid \mathbf{y}, D)}{p(\mathbf{z})}\right] d\mathbf{z}\; p(\mathbf{y} \mid D)\, d\mathbf{y}, \qquad (8)$$

where $\tilde{\mathbf{z}}$ in Eq. (7) is replaced by $\mathbf{z}$ for simplicity. Next we discuss how to numerically solve the optimization problem Eq. (5) with $U(D)$ given by Eq. (8).

Following the recommendation of [24], we use the simultaneous perturbation stochastic approximation (SPSA) method to solve the optimization problem (5). SPSA is a derivative-free stochastic optimization method first proposed by Spall [41,42], and we provide the detailed algorithm in Appendix A. Note that, since it is derivative-free, the algorithm only uses the function values of $U(D)$.

Next we discuss the evaluation of $U(D)$. To start, we re-write Eq. (8) as

$$U(D) = \int\!\!\int p(\mathbf{z} \mid \mathbf{y}, D) \ln\!\left[\frac{p(\mathbf{z} \mid \mathbf{y}, D)}{p(\mathbf{z})}\right] d\mathbf{z}\; p(\mathbf{y} \mid D)\, d\mathbf{y} = -\frac{1}{2}\int \ln|C|\, p(\mathbf{z})\, d\mathbf{z} - \int \ln[p(\mathbf{y} \mid D)]\, p(\mathbf{y} \mid D)\, d\mathbf{y} + Z = -\frac{1}{2}\mathbb{E}[\ln|C|] + H[p(\mathbf{y} \mid D)] + Z, \qquad (9)$$

where $H(\cdot)$ denotes the entropy and $Z$ is a constant independent of $D$. The detailed derivation of Eq. (9) is given in Appendix B. Note that Eq.
(9) typically has no closed-form expression and has to be evaluated with MC simulations. Drawing $m$ pairs of samples $\{(\mathbf{y}_1, \mathbf{z}_1), \ldots, (\mathbf{y}_m, \mathbf{z}_m)\}$ from $p(\mathbf{y}, \mathbf{z} \mid D)$, the MC estimator of $-\frac{1}{2}\mathbb{E}[\ln|C|]$ is

$$\hat{U}_1 = -\frac{1}{2m}\sum_{i=1}^{m} \ln|C(\mathbf{z}_i, D)|. \qquad (10)$$

Recall that in [24] the entropy term $H[p(\mathbf{y} \mid D)]$ is computed using a nested MC. For efficiency's sake, we instead use the resubstitution method developed in [1] to estimate the entropy. The basic idea of the method is rather straightforward: given a set of samples $\{\mathbf{y}_1, \ldots, \mathbf{y}_m\}$ of $p(\mathbf{y} \mid D)$, one first computes an estimator of the density $p(\mathbf{y} \mid D)$, say $\hat{p}(\mathbf{y})$, with a certain density estimation approach, and then estimates the entropy with

$$\hat{U}_2 = -\frac{1}{m}\sum_{i=1}^{m} \ln \hat{p}(\mathbf{y}_i). \qquad (11)$$

Theoretical properties of the resubstitution method are analyzed in [25,23], and other entropy estimation methods can be found in [6]. In the original work [1], the distribution $\hat{p}$ is obtained with kernel density estimation, which can become very costly when the dimensionality of $\mathbf{y}$ gets high; to address this issue, we use a Gaussian mixture based density estimation method [31]. Without loss of generality we can simply set the constant $Z = 0$, and thus an estimate of $U$ is

$$\hat{U} = \hat{U}_1 + \hat{U}_2, \qquad (12)$$

which provides the function values used in the SPSA algorithm. Finally we reiterate that the numerical implementation in this work differs from the more general implementation outlined in [24] in two aspects: first, thanks to the Gaussianity of $p(\mathbf{y} \mid \mathbf{z}, D)$, we can analytically integrate the variable $\mathbf{y}$ in the first integral on the right hand side of Eq. (9); second, we use the Gaussian mixture density estimation method to approximate $p(\mathbf{y} \mid D)$ to avoid the nested MC integration. We note that there are other methods, such as [29], that can be used to approximate the nested integral.
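The resubstitution idea of Eq. (11) can be sketched as follows. For simplicity, a single Gaussian fit stands in for the Gaussian mixture density estimator used in the paper, and the sample distribution is an arbitrary illustrative choice; with the single-Gaussian fit the estimate can be checked against the known entropy of a normal distribution.

```python
import numpy as np

# Resubstitution entropy estimate: fit a density p_hat to samples of y,
# then average -ln p_hat(y_i) over the same samples (Eq. (11)).
rng = np.random.default_rng(2)
y = rng.normal(3.0, 2.0, 100_000)        # samples standing in for p(y|D)

# Density estimation step: here a single Gaussian fit (the paper uses a
# Gaussian mixture; one component is the simplest instance of that idea).
mu, sigma = y.mean(), y.std()
log_p_hat = -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
H_hat = -np.mean(log_p_hat)              # resubstitution entropy estimate

# Reference value: differential entropy of N(3, 2^2) in closed form.
H_exact = 0.5 * np.log(2 * np.pi * np.e * 2.0**2)
```

A Gaussian mixture fit would replace the two `mu, sigma` lines with an EM-based fit and evaluate the mixture log-density instead; the averaging step is unchanged.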
Until this point we have presented our method under the assumption that the sampling points are all determined in advance of performing computer simulations, which is often referred to as an open-loop design. In many applications, a more practical strategy is to choose the sampling points in a sequential fashion: determine a set of sampling points, perform simulations, determine another set of points based on the previous results, and so forth. A sequential (closed-loop) design can be readily derived from the open-loop version. For computational efficiency, we adopt a "greedy" approach for the sequential design, while noting that this approach can in general only obtain sub-optimal solutions. Simply speaking, the greedy sequential design iterates as follows until a prescribed stopping criterion is met:

(1) determine n sampling points with an open-loop design;
(2) perform simulations at the chosen sampling points;
(3) use the simulation results to update p(z) and p(y | z, D);
(4) return to step (1).

A key ingredient in the greedy procedure is that the prior $p(\mathbf{z})$ and the likelihood $p(\mathbf{y} \mid \mathbf{z}, D)$ are updated based on the simulation results in each stage. To be specific, the prior distribution at stage $k+1$ is the posterior distribution of $\mathbf{z}$ at stage $k$:

$$p_{k+1}(\mathbf{z}) = p_k(\mathbf{z} \mid \mathbf{y}, D).$$

The update of the likelihood is a bit more complicated. Namely, let $X_k$ represent all the sampling points at which the function $g(\cdot)$ has been evaluated in the first $k$ stages and $\mathbf{y}_k$ be the associated function values; the likelihood function $p_{k+1}(\mathbf{y} \mid \mathbf{z}, D)$ is once again given by Eqs. (3) with $X^* = [X_k, \mathbf{z}]$ and $\mathbf{y}^* = [\mathbf{y}_k, 0]$. Note that we do not have an analytical expression for $p_{k+1}(\mathbf{z})$, or equivalently $p_k(\mathbf{z} \mid \mathbf{y}, D)$; however, our implementation only requires samples drawn from $p_{k+1}(\mathbf{z})$, which are available from the previous iteration. Another issue that should be noted is that, every time a new observation $(\mathbf{z},$
$0)$ is added to $X^*$, the posterior covariance needs to be updated. Recomputing the covariance matrix for each sample can be rather time consuming, so here we use a covariance matrix update method which computes the new covariance matrix from the previous one. The detailed update formulas are given in [18,13] and we shall not repeat them here. Finally, we also need to specify a stopping criterion for the algorithm; following previous works, we stop the iterations when the difference between the estimated probabilities in two consecutive iterations is smaller than a threshold value.

4 Numerical examples

Fig. 1. The rescaled Branin function. The solid line is the limit state g(x) = 0.

To compare the performance of our method with SUR, we first test both methods on the rescaled Branin function

$$g(x_1, x_2) = 80 - \left[\left(15 x_2 - \frac{5.1}{4\pi^2}(15 x_1 - 5)^2 + \frac{5}{\pi}(15 x_1 - 5) - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos(15 x_1 - 5) + 10\right], \qquad (13)$$

which is plotted in Fig. 1. Here we assume that $x_1$ and $x_2$ are independent and each follows a uniform distribution on $[0, 1]$.
5) + 10] . (13)which is plotted in Fig. 1. Here we assume that x and x are independent andeach follows a uniform distribution in [0 , . P r obab ili t y o f e rr o r LSISUR 4 6 8 10 12 1400.020.040.060.080.10.12 Number of sample points E rr o r i n f a il u r e o f p r obab ili t y LSISUR
Fig. 2. Left: the probability of detection error plotted against the number of design points. Right: the error in the failure probability estimates plotted against the number of design points.

In this example we use a squared exponential covariance

$$K(\mathbf{x}, \mathbf{x}') = \alpha \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{\beta}\right), \qquad (14)$$

where we choose $\alpha = 0.$ and $\beta = 1$ in this example. If desired, the parameter values in Eq. (14) can also be obtained with the maximum marginal likelihood method. The prior mean is determined with a quadratic regression. In all three examples, the algorithm is terminated after 50 iterations of the SPSA method, and in each iteration 5000 samples are used to evaluate the value of $U(D)$. We draw 10000 samples from the uniform distribution, and each time a new design point is added, we use the resulting GP surrogate to detect which points are in the failure region and which are not. Using the detection results we compute the failure probability. To compare the performance of the two methods, we compute two quantities as indicators of performance. The first is the error between the failure probability estimate of each method and the true failure probability, which is estimated by standard MC. The second is the probability that a point is mis-identified:

$$P\big[\mathbf{x} \in \{\mathbf{x} \mid g(\mathbf{x}) < 0,\ \hat{g}(\mathbf{x}) > 0\} \cup \{\mathbf{x} \mid g(\mathbf{x}) > 0,\ \hat{g}(\mathbf{x}) < 0\}\big],$$

where $\hat{g}(\mathbf{x})$ represents the surrogate. In Fig. 2, we plot both indicators as functions of the number of design points for our method and the SUR. Both indicators suggest that our LSI method has a better performance than SUR in this example.

Our second example is the so-called four-branch system, which is a popular benchmark test case in reliability analysis. In this example the limit state

Fig. 3. The limit state function of the four-branch model. The solid lines in both figures are the limit state g(x) = 0.
Fig. 4. Left: the probability of detection error plotted against the number of design points. Right: the error in the failure probability estimates plotted against the number of design points.

function reads

$$g(x_1, x_2) = \min \begin{cases} 3 + 0.1 (x_1 - x_2)^2 - (x_1 + x_2)/\sqrt{2}, \\ 3 + 0.1 (x_1 - x_2)^2 + (x_1 + x_2)/\sqrt{2}, \\ (x_1 - x_2) + 7/\sqrt{2}, \\ (x_2 - x_1) + 7/\sqrt{2}, \end{cases}$$

which is shown in Fig. 3. The input random variables $x_1$ and $x_2$ are assumed to be independent and follow the standard normal distribution. We first compute the failure probability with a standard MC estimation of $10^5$ samples, resulting in an estimate of $2.34 \times 10^{-3}$, i.e., 234 samples fall in the failure region. We use the squared exponential covariance function (14) in this example, but with $\alpha = 0.$ and $\beta = 0.8$. The prior mean of the GP surrogate is computed with a quadratic regression. We perform the sequential design described in
Fig. 5. Top: the left figure shows the initial design points and the right one shows the design points determined in the first iteration. Bottom: the left and the right figures are the design points determined in the 7th and the 13th iteration respectively.
Section 3 with 4 sampling points determined in each iteration. The algorithm terminates in 13 iterations, resulting in a total of 62 design points. We plot the errors in the failure probability estimation as a function of the number of iterations in Fig. 4 (left), and the mis-detection probability, also as a function of the number of iterations, in Fig. 4 (right). In Fig. 5, we plot the initial design points and the design points found in the first, the 7th, and the last iterations. In Fig. 6 (left), we show all 62 design points, including both the initial points and those found by our method. Also shown in the figure is the surrogate constructed with the obtained design points. We can see that our method allocates more points near the boundary of the failure region than random sampling does. In Fig. 6 (right), we show the samples that are incorrectly identified (crosses) and those correctly identified as failures (dots).

We note that the failure probability estimate obtained in the final iteration is $2.37 \times 10^{-3}$, while the estimate of standard MC is $2.34 \times 10^{-3}$. Also, compared to the MC simulation with the true model, we find that a total of 19 samples are incorrectly classified: 11 safe samples are identified as failures, and 8 failure samples are identified as safe ones. The numerical results indicate that our method can obtain reliable estimates of the failure probability with a rather small number of sampling points.
Fig. 6. Left: all the design points (asterisks), the limit state (solid line) and the GP surrogate (dashed line). Right: the dots are the points that are correctly identified as failures and the crosses are the mis-identified samples.
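The four-branch benchmark can be checked with a direct MC run. The limit state below is the standard form of the benchmark (assumed here), with both inputs standard normal.

```python
import numpy as np

# Four-branch series system (standard benchmark form assumed):
# failure when the minimum over the four branches is negative.
def g_four_branch(x1, x2):
    b1 = 3 + 0.1 * (x1 - x2) ** 2 - (x1 + x2) / np.sqrt(2)
    b2 = 3 + 0.1 * (x1 - x2) ** 2 + (x1 + x2) / np.sqrt(2)
    b3 = (x1 - x2) + 7 / np.sqrt(2)
    b4 = (x2 - x1) + 7 / np.sqrt(2)
    return np.minimum(np.minimum(b1, b2), np.minimum(b3, b4))

rng = np.random.default_rng(5)
x1, x2 = rng.standard_normal((2, 1_000_000))   # independent standard normals
p_hat = np.mean(g_four_branch(x1, x2) < 0.0)   # MC failure probability
```

With a large sample the estimate settles near the commonly reported value of roughly $2 \times 10^{-3}$ for this benchmark, consistent with the order of magnitude quoted in the text.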
Our last example is a practical problem concerning the dynamic behavior of a beam clamped at both ends, to which a uniform pressure load is suddenly applied, as shown in Fig. 7. We are interested in the dynamics of the deflection at the center of the beam caused by the pressure.

The Lagrangian formulation of the equilibrium equations without body forces can be written as

$$\rho \frac{d^2 \mathbf{u}}{dt^2} - \nabla \cdot (F \cdot S) = 0,$$

where $\rho$ is the material density, $\mathbf{u}$ is the displacement vector, $F$ is the deformation gradient,

$$F_{ik} = \frac{\partial u_i}{\partial x_k} + \delta_{ik},$$

and $S$ is the second Piola-Kirchhoff stress tensor. For simplicity, we assume linear elastic constitutive relations and an isotropic material. As a result, the constitutive relations may be written in matrix form:

$$\begin{pmatrix} S_{11} \\ S_{22} \\ S_{12} \end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & 0 \\ C_{12} & C_{11} & 0 \\ 0 & 0 & G \end{pmatrix} \begin{pmatrix} E_{11} \\ E_{22} \\ 2E_{12} \end{pmatrix},$$

where

$$E_{ij} = \frac{1}{2}\left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} + \frac{\partial u_k}{\partial x_i}\frac{\partial u_k}{\partial x_j} \right),$$

and

$$C_{11} = \frac{E}{1 - \nu^2}, \qquad C_{12} = \frac{E\nu}{1 - \nu^2}, \qquad G = \frac{E}{2(1 + \nu)}.$$

Here $E$ is Young's modulus and $\nu$ is Poisson's ratio. The initial conditions are

$$\mathbf{u}(0, \mathbf{x}) = 0, \qquad \frac{\partial \mathbf{u}}{\partial t}(0, \mathbf{x}) = 0.$$

Readers interested in more details of the Lagrangian formulation for nonlinear elasticity can consult, for example, [30].

We assume in this example that the beam length $L$, the height $h$, the material density $\rho$, the Young's modulus $E$, the Poisson ratio $\nu$, and the applied load $\delta$ are random. All the random parameters follow normal distributions and are independent of each other. The means and the variances of the parameters are summarized in Table 1.

Table 1
Parameter | L | h | ρ | E | ν | δ
Mean      | 5 | 0.1 | … | … | … | …

To demonstrate the statistical variations of the beam dynamics, we randomly generate 10 parameter samples and plot the 10 resulting beam dynamics in Fig. 8. We define failure as the maximum deflection at the center of the beam being larger than a threshold value $u_{\max}$. To be specific, we take $u_{\max} = 0.23$. We first run a standard MC simulation to compute the failure probability, resulting in an estimate of 3. × 10−. In the GP method, we use 64 initial sampling points drawn by the Latin hypercube approach. Our algorithm determines 4 points in each iteration, and it is terminated after 25 iterations, resulting in a total of 164 sampling points. As before, we plot the errors in the failure probability estimation and the mis-detection probability, both against the number of iterations, in Fig. 9. In both plots we can see that the accuracy of the estimates improves rapidly as the number of sampling points increases, indicating that our method can effectively identify good sampling points for this practical problem. Our failure probability estimate is 3. × 10−, while 12 points in the failure region are mis-identified as safe ones and 10 safe ones are mis-identified as failures.

5 Closing remarks

In conclusion, we have presented a GP-based failure-detection method to construct GP surrogates for failure probability estimation. In particular, the method recasts failure detection as inferring the contour $g(\mathbf{x}) = 0$ with Bayesian methods, and then determines the optimal sampling locations for the inference via a Bayesian experimental design approach.

Fig. 7. Schematic illustration of the clamped beam.

Fig. 8. Dynamics of the deflection at the beam center for ten randomly generated samples.

Fig. 9. Left: the probability of detection error plotted against the number of design points. Right: the error in the failure probability estimates plotted against the number of design points.
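The Latin hypercube strategy used to draw the initial sampling points in the beam example can be sketched as follows; this is a minimal generic implementation of the stratification idea, not the authors' code.

```python
import numpy as np

# Latin hypercube design on [0, 1)^d: in each dimension the n samples occupy
# the n equal-width strata exactly once; shuffling decouples the dimensions.
def latin_hypercube(n, d, rng):
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n  # one point per stratum
    for j in range(d):
        rng.shuffle(u[:, j])                              # permute strata per dim
    return u

rng = np.random.default_rng(6)
X = latin_hypercube(64, 6, rng)   # 64 points, 6 random beam parameters
```

For the beam example the unit-cube samples would then be mapped through the inverse CDFs of the normal parameter distributions in Table 1.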
Acknowledgment
The work was partially supported by the National Natural Science Foundation of China under grant number 11301337. G. Lin would like to acknowledge the support of the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program, as part of the Multifaceted Mathematics for Complex Energy Systems (M2ACS) project and part of the Collaboratory on Mathematics for Mesoscopic Modeling of Materials project, and NSF Grant DMS-1115887.
A The SPSA method
Here we briefly introduce the SPSA method in the context of our specific application. In each step, the method uses only two random perturbations to estimate the gradient regardless of the problem's dimension, which makes it particularly attractive for high dimensional problems. Specifically, in step $j$, one first draws an $n_d$-dimensional random vector $\Delta_j = [\delta_{j,1}, \ldots, \delta_{j,n_d}]$, where $n_d = \dim(D)$, from a prescribed distribution that is symmetric and has finite inverse moments. The algorithm then updates the solution using the following equations:

$$D_{j+1} = D_j - a_j \mathbf{b}_j(D_j), \qquad \text{(A.1a)}$$

$$\mathbf{b}_j(D_j) = \frac{\hat{U}(D_j + c_j \Delta_j) - \hat{U}(D_j - c_j \Delta_j)}{2 c_j}\, \Delta_j^{-1}, \qquad \text{(A.1b)}$$

where $\hat{U}(\cdot)$ is computed with Eq. (12), $\Delta_j^{-1} = [\Delta_{j,1}^{-1}, \ldots, \Delta_{j,n_d}^{-1}]^T$, and

$$a_j = \frac{a}{(A + j + 1)^{\alpha}}, \qquad c_j = \frac{c}{(j + 1)^{\gamma}},$$

with $A$, $\alpha$, $c$, and $\gamma$ being algorithm parameters. Following the recommendations of Spall [41,42], we choose each component of $\Delta_j$ from a symmetric Bernoulli distribution (taking values $\pm 1$ with probability 0.5) and $A = 100$, $\alpha = 0.602$, $c = 1$, and $\gamma = 0.101$ in this work.
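An SPSA iteration following Eqs. (A.1) can be sketched on a toy quadratic objective (an illustrative stand-in for the noisy utility estimate of Eq. (12)). The $A$, $\alpha$, $\gamma$ values follow the text; the step sizes $a$ and $c$ are tuned to the toy problem, and the routine is written as a minimizer (the paper maximizes $U$, which amounts to descending on $-\hat{U}$).

```python
import numpy as np

# SPSA update of Eqs. (A.1a)-(A.1b), demonstrated on U(D) = ||D||^2.
def spsa_minimize(U, D0, steps=500, a=0.2, A=100, alpha=0.602,
                  c=0.1, gamma=0.101, seed=0):
    rng = np.random.default_rng(seed)
    D = np.array(D0, dtype=float)
    for j in range(steps):
        a_j = a / (A + j + 1) ** alpha         # gain sequences
        c_j = c / (j + 1) ** gamma
        delta = rng.choice([-1.0, 1.0], size=D.shape)  # symmetric Bernoulli
        # two-sided simultaneous perturbation gradient estimate, Eq. (A.1b)
        b_j = (U(D + c_j * delta) - U(D - c_j * delta)) / (2 * c_j) / delta
        D = D - a_j * b_j                      # Eq. (A.1a)
    return D

D_opt = spsa_minimize(lambda D: np.sum(D ** 2), [2.0, -1.5])
```

Note that each iteration costs exactly two evaluations of $U$, independent of $n_d$, which is the property that makes SPSA attractive when each utility evaluation requires a full MC estimate.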
B Derivation of Equation (9)
From Eq. (8), we have

$$\begin{aligned} U(D) &= \int\!\!\int p(\mathbf{z} \mid \mathbf{y}, D) \ln\!\left[\frac{p(\mathbf{z} \mid \mathbf{y}, D)}{p(\mathbf{z})}\right] d\mathbf{z}\; p(\mathbf{y} \mid D)\, d\mathbf{y} \\ &= \int\!\!\int \ln\!\left[\frac{p(\mathbf{y} \mid \mathbf{z}, D)}{p(\mathbf{y} \mid D)}\right] p(\mathbf{y} \mid \mathbf{z}, D)\, p(\mathbf{z})\, d\mathbf{z}\, d\mathbf{y} \\ &= \int\!\!\int \ln[p(\mathbf{y} \mid \mathbf{z}, D)]\, p(\mathbf{y} \mid \mathbf{z}, D)\, p(\mathbf{z})\, d\mathbf{z}\, d\mathbf{y} - \int\!\!\int \ln[p(\mathbf{y} \mid D)]\, p(\mathbf{y} \mid \mathbf{z}, D)\, p(\mathbf{z})\, d\mathbf{z}\, d\mathbf{y}. \end{aligned} \qquad \text{(B.1)}$$

Recall that $p(\mathbf{y} \mid \mathbf{z}, D)$ is a multivariate normal distribution:

$$p(\mathbf{y} \mid \mathbf{z}, D) = \frac{1}{(2\pi)^{n/2}\, |C|^{1/2}} \exp\!\left[-\tfrac{1}{2}(\mathbf{y} - \mathbf{u})' C^{-1} (\mathbf{y} - \mathbf{u})\right].$$

Because $\mathbf{u}$ and $C$ depend only on $\mathbf{z}$ and $D$, we make the change of variable $\mathbf{s} = C^{-1/2}(\mathbf{y} - \mathbf{u})$:

$$\begin{aligned} \int\!\!\int \ln[p(\mathbf{y} \mid \mathbf{z}, D)]\, p(\mathbf{y} \mid \mathbf{z}, D)\, p(\mathbf{z})\, d\mathbf{z}\, d\mathbf{y} &= \int\!\!\int \left(-\tfrac{n}{2}\ln(2\pi) - \tfrac{1}{2}\ln|C| - \tfrac{1}{2}(\mathbf{y}-\mathbf{u})' C^{-1} (\mathbf{y}-\mathbf{u})\right) \frac{\exp\!\left[-\tfrac{1}{2}(\mathbf{y}-\mathbf{u})' C^{-1}(\mathbf{y}-\mathbf{u})\right]}{(2\pi)^{n/2}|C|^{1/2}}\, p(\mathbf{z})\, d\mathbf{z}\, d\mathbf{y} \\ &= \int\!\!\int \left(-\tfrac{n}{2}\ln(2\pi) - \tfrac{1}{2}\ln|C| - \tfrac{1}{2}\mathbf{s}'\mathbf{s}\right) \frac{\exp(-\tfrac{1}{2}\mathbf{s}'\mathbf{s})}{(2\pi)^{n/2}}\, p(\mathbf{z})\, d\mathbf{z}\, d\mathbf{s} \\ &= -\tfrac{1}{2}\int \ln|C|\, p(\mathbf{z})\, d\mathbf{z} - \int \tfrac{1}{2}\mathbf{s}'\mathbf{s}\, \frac{\exp(-\tfrac{1}{2}\mathbf{s}'\mathbf{s})}{(2\pi)^{n/2}}\, d\mathbf{s} - \tfrac{n}{2}\ln(2\pi). \end{aligned} \qquad \text{(B.2)}$$

Note that the last two terms on the right hand side of Eq. (B.2) do not depend on $D$, and so we can define a constant $Z$ such that

$$Z = -\int \tfrac{1}{2}\mathbf{s}'\mathbf{s}\, \frac{\exp(-\tfrac{1}{2}\mathbf{s}'\mathbf{s})}{(2\pi)^{n/2}}\, d\mathbf{s} - \tfrac{n}{2}\ln(2\pi).$$

It follows immediately that

$$\int\!\!\int \ln[p(\mathbf{y} \mid \mathbf{z}, D)]\, p(\mathbf{y} \mid \mathbf{z}, D)\, p(\mathbf{z})\, d\mathbf{z}\, d\mathbf{y} = -\tfrac{1}{2}\mathbb{E}_{\mathbf{z}}[\ln|C|] + Z,$$

which in turn yields Eq. (9).

References

[1] Ibrahim Ahmad and Pi-Erh Lin. A nonparametric estimation of the entropy for absolutely continuous distributions (Corresp.).
IEEE Transactions on Information Theory, 22(3):372–375, 1976.

[2] S.K. Au and J. Beck. A new adaptive importance sampling scheme for reliability calculations. Structural Safety, 21(2):135–158, 1999.

[3] S.K. Au and J. Beck. Estimation of small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16:263–277, 2001.

[4] Mathieu Balesdent, Jérôme Morio, and Julien Marzat. Kriging-based adaptive importance sampling algorithms for rare event estimation. Structural Safety, 44:1–10, 2013.

[5] Julien Bect, David Ginsbourger, Ling Li, Victor Picheny, and Emmanuel Vazquez. Sequential design of computer experiments for the estimation of a probability of failure. Statistics and Computing, 22(3):773–793, 2012.

[6] Jan Beirlant, Edward J. Dudewicz, László Györfi, and Edward C. van der Meulen. Nonparametric entropy estimation: an overview. International Journal of Mathematical and Statistical Sciences, 6(1):17–39, 1997.

[7] Barron J. Bichon, Michael S. Eldred, Laura Painton Swiler, Sankaran Mahadevan, and John M. McFarland. Efficient global reliability analysis for nonlinear implicit performance functions. AIAA Journal, 46(10):2459–2468, 2008.

[8] Ilias Bilionis and Nicholas Zabaras. Multi-output local Gaussian process regression: applications to uncertainty quantification. Journal of Computational Physics, 231(17):5718–5746, 2012.

[9] Ilias Bilionis, Nicholas Zabaras, Bledar A. Konomi, and Guang Lin. Multi-output separable Gaussian process: towards an efficient, fully Bayesian paradigm for uncertainty quantification. Journal of Computational Physics, 241:212–239, 2013.

[10] C.G. Bucher and U. Bourgund. A fast and efficient response surface approach for structural reliability problems. Structural Safety, 7:57–66, 1990.

[11] Frédéric Cérou, Pierre Del Moral, Teddy Furon, and Arnaud Guyader. Sequential Monte Carlo for rare event estimation. Statistics and Computing, 22(3):795–808, 2012.

[12] Clément Chevalier, David Ginsbourger, Julien Bect, Emmanuel Vazquez, Victor Picheny, and Yann Richet. Fast parallel kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. Technometrics, 56(4), 2014.

[13] Clément Chevalier, David Ginsbourger, and Xavier Emery. Corrected kriging update formulae for batch-sequential data assimilation. In Mathematics of Planet Earth, pages 119–122. Springer, 2014.

[14] Clément Chevalier, Victor Picheny, and David Ginsbourger. KrigInv: an efficient and user-friendly implementation of batch-sequential inversion strategies based on kriging.
Computational Statistics and Data Analysis ,71:1021 – 1034, 2014.[15] Ove Ditlevsen and Henrik O Madsen.
Structural reliability methods , volume178. Wiley New York, 1996.[16] Vincent Dubourg, B Sudret, and F Deheeger. Metamodel-based importancesampling for structural reliability analysis.
Probabilistic Engineering Mechanics ,33:47–57, 2013.[17] B Echard, N Gayton, and M Lemaire. Ak-mcs: an active learning reliabilitymethod combining kriging and monte carlo simulation.
Structural Safety ,33(2):145–154, 2011.[18] Xavier Emery. The kriging update equations and their application to theselection of neighboring data.
Computational Geosciences , 13(3):269–280, 2009.
19] S. Engelund and R. Rackwitz. A benchmark study on importance samplingtechniques in structural reliability.
Struct. Safety , 12:255–276, 1993.[20] L. Faravelli. Response surface approach for reliability analysis.
J. Eng. Mech. ,115(12):2763–2781, 1989.[21] N. Gayton, J.M. Bourinet, and M. Lemaire. CQ2RS: a new statistical approachto the response surface method for reliability analysis.
Struct. Safety , 25:99–121,2003.[22] S. Gupta and C.S. Manohar. An improved response surface method for thedetermination of failure probability and importance measures.
Struct. Safety ,26:123–139, 2004.[23] Peter Hall and Sally C Morton. On the estimation of entropy.
Annals of theInstitute of Statistical Mathematics , 45(1):69–88, 1993.[24] Xun Huan and Youssef M. Marzouk. Simulation-based optimal Bayesianexperimental design for nonlinear systems.
Journal of Computational Physics ,232(1):288–317, 2013.[25] Harry Joe. Estimation of entropy and other functionals of a multivariate density.
Annals of the Institute of Statistical Mathematics , 41(4):683–697, 1989.[26] Andreas Krause, Ajit Singh, and Carlos Guestrin. Near-Optimal SensorPlacements in Gaussian Processes-Theory, Efficient Algorithms and EmpiricalStudies.
The Journal of Machine Learning Research , 9(May):235–284, 2008.[27] Jing Li, Jinglai Li, and Dongbin Xiu. An efficient surrogate-based methodfor computing rare failure probability.
Journal of Computational Physics ,230(24):8683–8697, 2011.[28] Ling Li, Julien Bect, and Emmanuel Vazquez. Bayesian subset simulation:a kriging-based subset simulation algorithm for the estimation of smallprobabilities of failure. arXiv preprint arXiv:1207.1963 , 2012.[29] Quan Long, Marco Scavino, Ra´ul Tempone, and Suojin Wang. Fastestimation of expected information gains for bayesian experimental designsbased on laplace approximations.
Computer Methods in Applied Mechanicsand Engineering , 259:24–39, 2013.[30] Lawrence E Malvern.
Introduction to the Mechanics of a Continuous Medium .Number Monograph. 1969.[31] Geoffrey McLachlan and David Peel.
Finite mixture models . John Wiley &Sons, 2004.[32] RE Melchers. Importance sampling in structural systems.
Structural safety ,6(1):3–10, 1989.[33] Anthony O’Hagan and JFC Kingman. Curve fitting and optimal design forprediction.
Journal of the Royal Statistical Society. Series B (Methodological) ,pages 1–42, 1978.
34] Margaret A Oliver and Richard Webster. Kriging: a method of interpolationfor geographical information systems.
International Journal of GeographicalInformation System , 4(3):313–332, 1990.[35] V. Picheny, D. Ginsbourger, O. Roustant, R. T. Haftka, and N. H. Kim.Adaptive designs of experiments for accurate approximation of a target region.
Journal of Mechanical Design , 2010.[36] R. Pulch. Polynomial chaos for the computation of failure probabilities inperiodic problems. In J. Roos and L. Costa, editors,
Scientific Computing inElectrical Engineering SCEE 2008 , 2010.[37] M.R. Rajashekhar and B.R. Ellingwood. A new look at the response surfaceapproach for reliability analysis.
Struct. Safety , 12:205–220, 1993.[38] R.Y. Rubinstein and D.P. Kroese.
The cross-entropy method . SpringerScience+Business Media, Inc., New York, NY, 2004.[39] G.I. Schueller, H.J. Pradlwarter, and P.S. Koutsourelakis. A critical appraisalof reliability estimation procedures for high dimensions.
Prob. Eng. Mech. ,19:463–474, 2004.[40] Timothy W Simpson, Timothy M Mauery, John J Korte, and Farrokh Mistree.Kriging models for global approximation in simulation-based multidisciplinarydesign optimization.
AIAA journal , 39(12):2233–2241, 2001.[41] James C Spall. Multivariate stochastic approximation using a simultaneousperturbation gradient approximation.
Automatic Control, IEEE Transactionson , 37(3):332–341, 1992.[42] James C Spall. Implementation of the simultaneous perturbation algorithm forstochastic optimization.
Aerospace and Electronic Systems, IEEE Transactionson , 34(3):817–823, 1998.[43] Hui Wang and Xiang Zhou. A cross-entropy scheme for mixtures.
ACMTransactions on Modeling and Computer Simulation (TOMACS) , 25(1):6, 2015.[44] Christopher KI Williams and Carl Edward Rasmussen. Gaussian processes formachine learning. the MIT Press , 2(3):4, 2006., 2(3):4, 2006.
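As a supplementary remark on the Appendix B derivation: the key identity there is that the expected log-density of a multivariate normal N(u, C) equals -(1/2)ln|C| + Z with Z = -n/2 - (n/2)ln(2π), since E[s's] = n for s ~ N(0, I). The sketch below (not from the paper; all variable names are illustrative) checks this with a plain Monte Carlo average under an arbitrary positive-definite covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary mean u and well-conditioned covariance C for p(y) = N(u, C).
n = 3
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)
u = rng.standard_normal(n)

# Draw y ~ N(u, C) via the Cholesky factor, then average ln p(y) by Monte Carlo.
L = np.linalg.cholesky(C)
y = u + rng.standard_normal((200_000, n)) @ L.T
diff = y - u
quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(C), diff)
sign, logdet = np.linalg.slogdet(C)
logpdf = -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad
mc = logpdf.mean()

# Closed form from Eq. (B.2): -1/2 ln|C| + Z, with Z = -n/2 - (n/2) ln(2*pi).
closed_form = -0.5 * logdet - 0.5 * n - 0.5 * n * np.log(2 * np.pi)
print(mc, closed_form)  # the two values should agree to within MC error
```

The agreement illustrates why maximizing the expected information gain U(D) reduces to minimizing the expectation of ln|C| over the prior on z: every other term in the Gaussian entropy is a design-independent constant.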