Sequential Bayesian experimental design for estimation of extreme-event probability in stochastic dynamical systems
Xianliang Gong, Yulin Pan∗
Department of Naval Architecture and Marine Engineering, University of Michigan, 48109, MI, USA
Abstract
We consider a dynamical system with two sources of uncertainties: (1) parameterized input with a known probability distribution and (2) stochastic input-to-response (ItR) function with heteroscedastic randomness. Our purpose is to efficiently quantify the extreme response probability when the ItR function is expensive to evaluate. This problem setup arises often in physics and engineering problems, with randomness in ItR coming from either intrinsic uncertainties (say, as a solution to a stochastic equation) or additional (critical) uncertainties that are not incorporated in the input parameter space. To reduce the required number of samples, we develop a sequential Bayesian experimental design method leveraging the variational heteroscedastic Gaussian process regression (VHGPR) to account for the stochastic ItR, along with a new criterion to select the next-best samples sequentially. The validity of our new method is first tested in two synthetic problems with the stochastic ItR functions defined artificially. Finally, we demonstrate the application of our method to an engineering problem of estimating the extreme ship motion probability in an ensemble of wave groups, where the uncertainty in ItR naturally originates from the uncertain initial condition of ship motion in each wave group.
Keywords: extreme events, Bayesian experimental design, stochastic systems
∗Corresponding author
Email address: [email protected] (Yulin Pan)
Preprint submitted to Journal of Computational Physics, February 23, 2021

1. Introduction

Extreme events, considered as abnormally large responses of a dynamical system, can happen in many natural and engineering systems [1, 2, 3]. Examples of such events include tsunamis, extreme precipitation and extreme structural vibrations, which often cause catastrophic consequences to society and the environment. The quantification of the probability of extreme events is of vital importance for the design of engineering systems to alleviate their devastating impact.

Many systems of interest are characterized by an input-to-response (ItR) function which can only be queried through expensive experiments or simulations. For example, the ship motion response induced by an input wave group is obtained from an expensive model of computational fluid dynamics (CFD). If the ItR function is deterministic, the probability distribution of the response is induced only by the (assumed) known probability distribution of the input. The quantification of extreme-event probability for such systems has been studied extensively in the context of Monte Carlo simulations. Due to the expensive evaluations of ItR and the rareness of the extreme events, many studies aim to reduce the number of samples required for the computation. While techniques such as importance sampling and control variates [4] provide certain levels of acceleration, a more successful category of methods makes use of sequential Bayesian experimental design (BED) [5] (or active learning [6]), where the next-best samples are selected based on the existing information. Two key components of sequential BED are a surrogate model (usually a Gaussian process regression [7]) to approximate the ItR, and an optimization problem maximizing a predefined acquisition function to select the next-best samples.
Varying in the details of these two components, a spectrum of sequential BED methods for the purpose of estimating extreme-event probability have been developed [8, 9, 10, 11, 12, 13].

However, in many cases, the ItR has to be considered as a stochastic function instead of a deterministic one. These situations may originate from (a) an intrinsically stochastic dynamical system, e.g., stochastic differential equations modeling a physical diffusion process or stock prices [14], or the stochastic model of climate variability including the non-average 'weather' component as random forcing terms [15]; and (b) some uncertain variables that are not easily incorporated in the input parameter space. Under such situations, the probability distribution of the response is critically influenced by the uncertainty in the ItR (in addition to the uncertainty of the input parameters). Moreover, the uncertainty of the ItR (in particular the variance of the response) is often inhomogeneous for different input parameters. For example, the uncertainty of the power production of a wind turbine is different among various wind speed inputs [16]; the uncertain ItR of ship motion resulting from uncertain initial conditions is sensitive to the wave group parameters as the input. A sequential BED method which considers the heteroscedastic uncertainty in ItR and its effect on extreme-event probability is not yet available.

In this work, we propose a new method to quantify the probability of extreme events (defined as an observable above a given threshold) considering an ItR with heteroscedastic uncertainty. The core of our algorithm is a variational heteroscedastic Gaussian process regression (VHGPR) which approximates the ItR with sufficiently low computational cost and high accuracy. This is a critical improvement upon the standard Gaussian process regression (SGPR) used in previous work, which is unable to resolve the heteroscedasticity in ItR.
The VHGPR also provides significant benefits in computational cost relative to other methods such as [17], [18] and [19], where a much larger dataset is needed to resolve the ItR. Accordingly, we formulate a new acquisition function for selecting the next-best sample considering the uncertainties in both the ItR and the input parameter space. We first demonstrate the effectiveness of our method in two synthetic problems of estimating the extreme-event probability. We show that drastically improved performance is achieved compared to existing approaches due to the resolution of the heteroscedasticity in ItR. Finally, we demonstrate the superiority of our method (to existing methods) in solving an engineering problem of estimating the extreme ship motion probability in irregular waves, where the ItR is established through a phenomenological nonlinear differential equation with uncertainty in its initial conditions.

2. Computational framework
We start from a dynamical system with input x ∈ R^d of known distribution X ∼ p_X(x) and response y ∈ R. An ItR function S directly relates x to y, with its randomness represented by ω:

y(ω) = S(x, ω),  ω ∈ Ω.   (1)

Our interest is the exceeding probability of y(ω) above a threshold δ:

P_e ≡ P(S(X, ω) > δ) = ∫ P(S(X, ω) > δ | X = x) p_X(x) dx = ∫ P(S(x, ω) > δ) p_X(x) dx.   (2)

It is clear that both the uncertainties associated with p_X and ω contribute to the exceeding probability in (2). Moreover, the dependence of S on x and ω, as in (1), cannot be decoupled in general, resulting in a different variance (introduced by ω) for different input x, i.e., a heteroscedastic uncertainty. Resolving this heteroscedasticity in the ItR is critical in computing the exceeding probability (2).

A brute-force computation of (2) calls for extensive Monte Carlo samples in the probability space associated with both X and ω [17], which is prohibitive under expensive queries of S(x, ω). Therefore, we seek to develop a sampling algorithm following the sequential BED framework, where each sample is selected making use of the existing information from previous samples. Two key components in our new approach are (1) an inexpensive surrogate model based on the variational heteroscedastic Gaussian process regression (VHGPR) to approximate the heteroscedastic ItR; and (2) an optimization based on an acquisition function to provide the next-best samples with fast convergence in computing (2). We next describe the two components in detail.

To introduce the surrogate model for the ItR, we first rewrite (1) as

S(x, ω) = f(x) + R(x, ω),   (3)

where f(x) ≡ E_ω[S(x, ω)] is the mean of S(x, ω) with respect to ω, and R(x, ω) is the uncertain component with zero mean.
Given a dataset (from previous samples) D = {x_i, y_i}_{i=1}^n, our purpose is to approximate (3) using Gaussian process regression, as involved in many sequential BED problems.

In standard Gaussian process regression (SGPR), as implemented in most previous applications for extreme-event probability, one approximates (3) as (given D)

f(x) ∼ GP(µ_f(x), cov_f(x, x')),   (4)
R(x, ω) ∼ N(0, γ),   (5)

where GP(·, ·) represents a Gaussian process with the first argument as the mean and the second as the covariance function. The uncertain component R(x, ω) is approximated by an independent normal distribution at all x with constant variance γ. Clearly, the heteroscedasticity in ItR (i.e., the dependence of R on x) cannot be captured by the SGPR.

To incorporate the heteroscedasticity, we need to rely on heteroscedastic Gaussian process regression (implemented as VHGPR following [20] in this work). In VHGPR, we are able to approximate (3) as (given D)

f(x) ∼ GP(µ_f(x), cov_f(x, x')),   (6)
R(x, ω) ∼ N(0, e^{g(x)}),   (7)
g(x) ∼ GP(µ_g(x), cov_g(x, x')),   (8)

where the heteroscedastic (log) variance of the uncertain term R(x, ω) is represented by another GP with mean µ_g(x) and covariance function cov_g(x, x').

Both approximations in SGPR (in terms of (4)) and VHGPR ((6) and (8)) are computed as posterior predictive distributions under a Bayesian framework, with hyperparameters (say θ) determined by maximizing the likelihood function p(D|θ). For SGPR, both the likelihood function (used in training of the model) and the posterior (4) can be derived analytically, allowing a straightforward and inexpensive numerical implementation.
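As a concrete illustration of the closed-form SGPR posterior in (4)-(5), a minimal NumPy sketch is given below, with an assumed squared-exponential kernel and illustrative hyperparameters (not the paper's implementation):

```python
import numpy as np

def rbf_kernel(X1, X2, ell=1.0, sf=1.0):
    # Squared-exponential covariance k(x, x') = sf^2 exp(-(x - x')^2 / (2 ell^2))
    return sf**2 * np.exp(-0.5 * (X1[:, None] - X2[None, :])**2 / ell**2)

def sgpr_posterior(X, y, Xs, gamma=0.01, ell=1.0, sf=1.0):
    """Posterior mean/variance of f at test points Xs (Eq. (4)),
    with constant noise variance gamma as in Eq. (5)."""
    K = rbf_kernel(X, X, ell, sf) + gamma * np.eye(len(X))
    Ks = rbf_kernel(Xs, X, ell, sf)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha                       # posterior mean mu_f(Xs)
    v = np.linalg.solve(L, Ks.T)
    var = sf**2 - np.sum(v**2, axis=0)    # posterior variance cov_f(Xs, Xs)
    return mu, var

# Toy usage: noisy observations of a smooth 1D function
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 25)
y = np.sin(X) + rng.normal(0.0, 0.1, X.shape)
mu, var = sgpr_posterior(X, y, np.array([5.0]))
```

With dense data and small noise, the posterior mean at an interior test point closely tracks the underlying function, and the posterior variance shrinks toward the noise floor.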
In contrast, for VHGPR, the introduction of the Gaussian process on g(x) prohibits analytical results for the likelihood function and posterior, posing great challenges in the numerical computation (which involves high-dimensional integration).

In order to reduce the computational cost, variational inference is applied in VHGPR, which uses parameterized Gaussian distributions to approximate certain critical distributions involved in the posterior and likelihood function. These Gaussian distributions can be determined efficiently through an optimization problem minimizing their differences from the critical distributions. As a result of this approximation, the high-dimensional integration can be reduced to analytical formulations, which leads to inexpensive computations (approximations) of the posterior and the likelihood function. More details on the algorithms of VHGPR, along with SGPR, are summarized in Appendix A for completeness. The interested reader can also refer to [7, 20] for details.

In summary, the VHGPR provides us with an estimation of the ItR, S(x, ω | ω_f, ω_g), where ω_f and ω_g denote realizations of f and g in (6) and (8), and ω represents the randomness associated with (7). In other words, given realizations of ω_f and ω_g (or f and g), the intrinsic randomness in ItR is expected to be captured by ω, i.e., the heteroscedastic distribution in (7).

Given the VHGPR of the ItR, the exceeding probability can be expressed as

P(S(X, ω | ω_f, ω_g) > δ) = ∫ P(S(x, ω | ω_f, ω_g) > δ) p_X(x) dx,   (9)

which depends on the realizations of ω_f and ω_g. The purpose here is to construct an acquisition function, based on which the next sample can be selected to minimize the variance of the estimation (9), i.e., var_{ω_f, ω_g}[P(S(X, ω | ω_f, ω_g) > δ)].
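For fixed realizations of f and g, the integrand in (9) is simply a Gaussian tail probability. A minimal sketch with illustrative numbers, cross-checked by Monte Carlo sampling over ω:

```python
import numpy as np
from math import erf, sqrt

def tail_prob(f, g, delta):
    # P(S(x, omega) > delta) with S ~ N(f(x), exp(g(x))), i.e. the Gaussian
    # tail 1 - Phi((delta - f) / exp(g / 2))
    z = (delta - f) / np.exp(0.5 * g)
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

# One illustrative realization of (f, g) at a single input x
rng = np.random.default_rng(1)
f_x, g_x, delta = 1.0, 0.5, 2.0
samples = f_x + np.exp(0.5 * g_x) * rng.standard_normal(200_000)
mc_estimate = np.mean(samples > delta)   # Monte Carlo over omega
analytic = tail_prob(f_x, g_x, delta)
```

The analytic tail and the Monte Carlo estimate agree to within sampling error, which is what makes the variance computation over (ω_f, ω_g) in the next step cheap.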
For simplicity in computation, we use an upper bound of the estimation variance as its proxy (see Appendix B for the derivation of the upper bound):

var_{ω_f, ω_g}[P(S(X, ω | ω_f, ω_g) > δ)] ≤ ∫ std_{ω_f, ω_g}[P(S(x, ω | ω_f, ω_g) > δ)] p_X(x) dx,   (10)

where std_{ω_f, ω_g} is the standard deviation operator with respect to ω_f and ω_g. The next-best sample is selected at the value of x which contributes most to the integral on the right-hand side of (10), i.e., the maximum point of the integrand, corresponding to the most uncertain part in the estimation of P(S(X, ω | ω_f, ω_g) > δ):

x* = argmax_x std_{ω_f, ω_g}[P(S(x, ω | ω_f, ω_g) > δ)] p_X(x).   (11)

We note that (11) is closely related to the so-called U criterion [10] widely used in computing the exceeding probability associated with a deterministic ItR. In general, the U criterion seeks the most 'dangerous' point (i.e., the point with maximum local variance), which in our problem corresponds to x* = argmax_x std_{ω_f, ω_g}[P(S(x, ω | ω_f, ω_g) > δ)] (i.e., the variance of (9) without considering the correlation at different x and p_X(x)). Therefore, our acquisition function in (11) can be considered as a weighted U criterion which incorporates the influence of the input distribution p_X(x) in computing the variance of (9).

In practice, we approximate the operator std_{ω_f, ω_g} ≡ (var_{ω_f, ω_g})^{1/2}
by the two-dimensional spherical cubature integration [21] with 4 quadrature points (although extension to more quadrature points is straightforward):

m = E_{ω_f, ω_g}[P(S(x, ω | ω_f, ω_g) > δ)] = (1/4) Σ_{i=1}^{4} P(S(x, ω | u_i) > δ),   (12)

var_{ω_f, ω_g}[P(S(x, ω | ω_f, ω_g) > δ)] = (1/4) Σ_{i=1}^{4} (P(S(x, ω | u_i) > δ) − m)²,   (13)

u_1 = {f(x) = µ_f(x) + √(2 cov_f(x, x)), g(x) = µ_g(x)},
u_2 = {f(x) = µ_f(x), g(x) = µ_g(x) + √(2 cov_g(x, x))},
u_3 = {f(x) = µ_f(x) − √(2 cov_f(x, x)), g(x) = µ_g(x)},
u_4 = {f(x) = µ_f(x), g(x) = µ_g(x) − √(2 cov_g(x, x))},   (14)

where the quadrature points (14) and the corresponding weights 1/4 follow the standard cubature rule [21]. We remark that other choices of acquisition function may incorporate additional information at x in addition to the standard deviation in (11). For example, techniques developed for deterministic ItR, such as using a hypothetical point [1, 23] and global sensitivity analysis [9], may be transferred here. However, they may lead to significantly increased computational cost when combined with VHGPR (e.g., the retraining of the variational parameters when hypothetical points are used). These potential developments are left to our future work.

Combining the VHGPR surrogate model (Eqs. (6), (7), and (8)) and the optimization of the acquisition function (11), we are able to sequentially select the next-best samples starting from an initial dataset. The final estimation of the exceeding probability P_e is computed by VHGPR with µ_f and µ_g representing the functions f and g:

P_e = P(S(X, ω | f(x) = µ_f(x), g(x) = µ_g(x)) > δ).   (15)

We summarize the algorithm of this sequential sampling process in Algorithm 1.
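The acquisition (11) with the cubature approximation (12)-(14) can be sketched as follows, assuming posterior means and variances of f and g are available on a candidate grid (all functions and numbers here are illustrative):

```python
import numpy as np
from scipy.stats import norm

def acquisition(mu_f, var_f, mu_g, var_g, p_x, delta):
    """std over (omega_f, omega_g) of the tail probability P(S > delta),
    via a 4-point cubature as in (12)-(14), weighted by the input density p_X."""
    c = np.sqrt(2.0)
    realizations = [  # perturb one GP at a time, the other held at its mean
        (mu_f + c * np.sqrt(var_f), mu_g),
        (mu_f - c * np.sqrt(var_f), mu_g),
        (mu_f, mu_g + c * np.sqrt(var_g)),
        (mu_f, mu_g - c * np.sqrt(var_g)),
    ]
    P = np.stack([norm.sf(delta, loc=f, scale=np.exp(0.5 * g))
                  for f, g in realizations])
    m = P.mean(axis=0)                          # cubature mean, Eq. (12)
    std = np.sqrt(np.mean((P - m)**2, axis=0))  # sqrt of Eq. (13)
    return std * p_x                            # integrand of (11)

# Illustrative posteriors on a candidate grid; next sample = argmax
x = np.linspace(0.0, 10.0, 101)
mu_f, var_f = (x - 5.0)**2, 0.2 * np.ones_like(x)
mu_g, var_g = np.log(0.5 + 0.05 * x), 0.1 * np.ones_like(x)
acq = acquisition(mu_f, var_f, mu_g, var_g, norm.pdf(x, 5.0, 1.0), delta=4.0)
x_next = x[np.argmax(acq)]
```

In a real run the grid-based argmax would be replaced by a proper optimizer, but the structure of the weighted-U criterion is the same.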
3. Results

In this section, we validate our approach using two synthetic problems and a (more) realistic engineering application of quantifying the extreme ship motion probability in irregular waves. For all cases, we present the results from our current method of sequential sampling (or sequential BED) with VHGPR as a surrogate model (Seq-VHGPR), as well as other methods for validation and comparison. These other methods include Monte Carlo sampling using one million samples for accurate estimation of the mean and variance of ItR (Exact-MC, which serves as the exact result to validate Seq-VHGPR); space-filling Latin hypercube (LH) sampling [24] with VHGPR as a surrogate (LH-VHGPR, which serves as a reference to demonstrate the efficiency of sequential sampling); and LH sampling with SGPR as a surrogate (LH-SGPR, to demonstrate the necessity of using VHGPR). We remark that the asymptotic LH-SGPR solution (with increasing sample number) corresponds to the best solution that can be achieved by the vast body of previous methods using SGPR [9, 10, 12, 25].

Algorithm 1 Sequential sampling for systems with stochastic ItR
Require: Number of initial points n_init, number of iterations n_iter
Input: Initial dataset D = {x_i, y_i}_{i=1}^{n_init}
Initialization: j = 0
while j < n_iter do
    Train the surrogate model (Eqs. (6), (7), and (8)) with D
    Maximize the acquisition function (11) to find the next-best sample x_{j+1}
    Run the numerical simulation to obtain y_{j+1} = S(x_{j+1}, ω)
    Update the dataset D = D ∪ {x_{j+1}, y_{j+1}}
    j = j + 1
end while
Output: Compute the exceeding probability (15) based on the surrogate model
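The loop of Algorithm 1 can be sketched end-to-end on a 1D toy problem. For brevity, a homoscedastic GP stands in for the VHGPR step and the acquisition uses a two-point perturbation of f only; every function and constant here is illustrative, not the paper's implementation:

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(X, y, Xs, ell=1.0, sf=2.0, noise=0.05):
    """Posterior mean/variance of a GP surrogate (homoscedastic stand-in
    for the VHGPR training step of Algorithm 1)."""
    k = lambda A, B: sf**2 * np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ell**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = sf**2 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.maximum(var, 1e-12)

def sequential_bed(S, delta, n_init=8, n_iter=10, seed=0):
    """Toy version of Algorithm 1 on a 1D input X ~ N(5, 1)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 10.0, 201)          # candidate pool for the argmax
    X = rng.uniform(0.0, 10.0, n_init)          # initial design
    y = np.array([S(x, rng) for x in X])
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        # weighted-U-style acquisition: spread of the tail probability under
        # f-uncertainty, times the input density (noise std 0.3 assumed known)
        p_hi = norm.sf(delta, loc=mu + np.sqrt(2.0 * var), scale=0.3)
        p_lo = norm.sf(delta, loc=mu - np.sqrt(2.0 * var), scale=0.3)
        acq = 0.5 * np.abs(p_hi - p_lo) * norm.pdf(grid, 5.0, 1.0)
        x_next = grid[np.argmax(acq)]           # next-best sample
        X = np.append(X, x_next)
        y = np.append(y, S(x_next, rng))        # one new (stochastic) ItR query
    mu, _ = gp_posterior(X, y, grid)            # final estimate, cf. Eq. (15)
    pe = np.sum(norm.sf(delta, loc=mu, scale=0.3)
                * norm.pdf(grid, 5.0, 1.0)) * (grid[1] - grid[0])
    return pe, X

# Toy stochastic ItR with constant noise 0.3 (illustration only)
S = lambda x, rng: np.sin(x) + 0.3 * rng.standard_normal()
pe, X_final = sequential_bed(S, delta=1.2)
```

The design choice mirrors the paper's structure: refit the surrogate, maximize the acquisition, query, and repeat; only the surrogate and the variance proxy are simplified here.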
We start from a 1D synthetic problem, where the true ItR S(x, ω) in (3) is constructed with a quadratic mean function f(x) and a heteroscedastic Gaussian component R(x, ω) ∼ N(0, γ²(x)), whose standard deviation γ(x) grows linearly in x (see figure 1(a) for an illustration). The input X is assumed to follow a Gaussian distribution p_X(x) centered at x = 5, and we compute the exceeding probability P_e = P(S(X, ω) > δ) for a fixed threshold δ.

Figure 1: The mean f(x) and uncertainty bounds f(x) ± γ(x) of the 1D ItR, as well as the threshold in defining the exceeding probability.
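An Exact-MC reference of this type can be sketched as follows, with an illustrative quadratic mean and linearly growing noise (stand-ins, not the paper's exact constants), cross-checked against a semi-analytic quadrature:

```python
import numpy as np
from scipy.stats import norm

def exact_mc_pe(f, gamma, delta, n=1_000_000, seed=0):
    """Brute-force P_e = P(S(X, omega) > delta) for a 1D stochastic ItR
    S(x, omega) = f(x) + gamma(x) * N(0, 1), with input X ~ N(5, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(5.0, 1.0, n)                    # sample the input X
    y = f(x) + gamma(x) * rng.standard_normal(n)   # one ItR query per sample
    return np.mean(y > delta)

# Illustrative mean and heteroscedastic standard deviation
f = lambda x: (x - 5.0)**2
gamma = lambda x: 0.1 + 0.1 * x
delta = 8.0
pe_mc = exact_mc_pe(f, gamma, delta)

# Semi-analytic check: P_e = E_X[1 - Phi((delta - f(X)) / gamma(X))]
xg = np.linspace(0.0, 10.0, 4001)
integrand = norm.sf(delta, loc=f(xg), scale=gamma(xg)) * norm.pdf(xg, 5.0, 1.0)
pe_quad = np.sum(integrand) * (xg[1] - xg[0])
```

The quadrature form of the check is exactly the integrand structure in (2), which is also what figure 3(b) visualizes.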
Figure 2: P_e in the 1D synthetic problem, computed by Seq-VHGPR, LH-VHGPR, LH-SGPR with its asymptotic value, and Exact-MC (in terms of the upper and lower 5% error bounds). The shaded region represents one standard deviation above the mean estimated from 200 applications of each method.

Figure 3: (a) Typical positions of the initial 40 samples and 60 sequential samples in Seq-VHGPR, as well as the learned functions f(x) and f(x) ± γ(x) compared to the corresponding exact functions; (b) the function P(S(x, ω) > δ) p_X(x).

Figure 2 plots the results of P_e computed by Exact-MC, Seq-VHGPR, LH-VHGPR and LH-SGPR (where in Seq-VHGPR and LH-VHGPR, P_e is estimated by (15); in LH-SGPR, P_e is estimated by (15) with constant variance g(x) = log(γ)). Also included in figure 2 are the standard deviations of Seq-VHGPR, LH-VHGPR and LH-SGPR obtained from 200 applications of the methods. (These uncertainties come from the initial sampling positions and the randomness ω of ItR in computing S(x, ω) for each query, which is much larger than the uncertainty associated with f and g in each application.) We see that the result from Seq-VHGPR converges rapidly to that from Exact-MC (shown in terms of the 5%-error region) in the first 20 sequential samples, with an accurate estimation of the exceeding probability. In contrast, the LH-VHGPR result converges much more slowly, with a significant difference from the Exact-MC result at the end of 100 samples in figure 2 (in spite of a favorable trend). Furthermore, the LH-SGPR result converges to an asymptotic value which is 3 times the Exact-MC result, showing the incapability of this class of methods (i.e., most previous methods using SGPR) in estimating the exceeding probability induced by a heteroscedastic ItR. Finally, as shown by the shaded area in figure 2, Seq-VHGPR leads to a significantly reduced standard deviation compared to the other approaches.

We further examine the reason for the fast convergence achieved by Seq-VHGPR (relative to LH-VHGPR). Figure 3(a) plots the positions of 100 samples (i.e., 40 initial and 60 sequential samples) in Seq-VHGPR, as well as the learned functions f(x) and γ(x). While the initial 40 samples are randomly chosen (providing the overall trend of f(x) and γ(x)), the 60 sequential samples are concentrated near x = 6.5, providing a more accurate estimation of f(x) and γ(x) in the nearby region. This point corresponds to the maximum of P(S(x, ω) > δ) p_X(x) (the integrand in (2)) as shown in figure 3(b), leading to the largest contribution in computing the exceeding probability (2).

We construct a 2D synthetic problem by setting f(x) to be a four-branch function that has been widely used in estimating extreme-event probability:

f(x₁, x₂) = −min{ 8 + 0.1(x₁ − x₂)² + (x₁ + x₂)/√2,
                  8 + 0.1(x₁ − x₂)² − (x₁ + x₂)/√2,
                  (x₁ − x₂) + 6/√2,
                  (x₂ − x₁) + 6/√2 }.

To generate an uncertain ItR, we add a Gaussian randomness R(x₁, x₂, ω) ∼ N(0, γ²(x₁, x₂)) to f(x₁, x₂), with the standard deviation γ(x₁, x₂) specified as a function of f(x₁, x₂) (see figure 4 for illustrations of f(x₁, x₂) and γ(x₁, x₂)). We assume the input X to follow a Gaussian distribution p_{X₁,X₂}(x₁, x₂) = N(0, I), with I being a 2 × 2 identity matrix, and compute the exceeding probability P_e = P(S(X₁, X₂, ω) > 5).

Figure 4: (a) f(x₁, x₂) and (b) γ(x₁, x₂) as in R(x₁, x₂, ω) in the 2D stochastic four-branch ItR.

Figure 5: P_e in the 2D synthetic problem, computed by Seq-VHGPR, LH-VHGPR, LH-SGPR with its asymptotic value, and Exact-MC (in terms of the upper and lower 5% error bounds). The shaded region represents one standard deviation above the mean estimated from 200 applications of each method.

Figure 6: Typical positions of the initial 60 samples and 60 sequential samples in Seq-VHGPR, as well as (a) the learned f(x₁, x₂) compared to the exact function; (b) the learned γ(x₁, x₂) compared to the exact function; (c) the (log) function P(S(x₁, x₂, ω) > δ) p_{X₁,X₂}(x₁, x₂).

Figure 6 shows typical sample positions together with the learned f(x₁, x₂) and γ(x₁, x₂). Similar to the 1D case, the sequential samples are expected to concentrate in regions where P(S(x₁, x₂, ω) > δ) p_{X₁,X₂}(x₁, x₂) is maximized, i.e., the four regions enclosed by the −4 contour lines in figure 6(c). As shown in figures 6(a) and (b), most sequential samples lie in three out of the four regions (although the situation depends on the initial samples, and for some cases all four regions can be filled). The difficulty of the sequential samples transiting to all four regions within 60 samples can be anticipated, which is consistent with the observation in the case of deterministic ItR when the U criterion is used as the acquisition function [10]. While the design of a better acquisition function is possible referring to the counterpart in the deterministic case [9], the current results already show the adequacy of Seq-VHGPR in estimating the exceeding probability (even if not all four regions are filled and the estimation of γ(x₁, x₂) is relatively less accurate than that of f(x₁, x₂)).

Figure 7: (a) η(t) and the corresponding ρ(t) in a narrow-band wave field. (b) ρ(t) fitted by an ensemble of Gaussian wave groups ρ_c(t) with parameters L and A. (c) p_{L,A}(L, A) obtained from wave fields.

We further consider an engineering application of our method to estimate the probability of extreme ship roll motions in uni-directional irregular waves.
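The deterministic backbone of this example can be sketched with the standard four-branch form; the constants c and k below are illustrative parameters of that form, and the heteroscedastic noise model is likewise only a stand-in:

```python
import numpy as np

def four_branch(x1, x2, c=8.0, k=6.0):
    """Negative four-branch series-system function, the usual backbone in
    extreme-event/reliability benchmarks; c and k are illustrative."""
    s2 = np.sqrt(2.0)
    b1 = c + 0.1 * (x1 - x2)**2 + (x1 + x2) / s2
    b2 = c + 0.1 * (x1 - x2)**2 - (x1 + x2) / s2
    b3 = (x1 - x2) + k / s2
    b4 = (x2 - x1) + k / s2
    return -np.minimum(np.minimum(b1, b2), np.minimum(b3, b4))

# A stochastic ItR then adds input-dependent Gaussian noise on top, e.g.
rng = np.random.default_rng(3)
x1, x2 = rng.standard_normal((2, 10))
noise_std = 0.25 * np.abs(four_branch(x1, x2))   # illustrative heteroscedastic std
samples = four_branch(x1, x2) + noise_std * rng.standard_normal(10)
```

The function is symmetric under swapping x₁ and x₂, which is why the four high-contribution regions in figure 6(c) appear in symmetric pairs.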
In marine engineering, the ship motion problem can often be treated as a dynamical system where the input is a time series of wave (or surface) elevation η(t), and the output is, say, the ship roll motion ξ(t). The ItR connecting η(t) and ξ(t) can be computed by computational fluid dynamics (CFD) simulations. However, the resolution of the exact exceeding probability requires running expensive CFD simulations with a very long-time input η(t) (due to the rareness of the extreme roll motion), leading to prohibitively high computational cost. Therefore, for the purpose of validating our approach, we use an inexpensive phenomenological nonlinear roll equation [26] to construct the ItR (with the uncertainty associated with ω introduced later):

ξ̈ + α₁ξ̇ + α₂ξ̇³ + (β₁ + ε₁ cos(φ) η(t)) ξ + β₂ξ³ = ε₂ sin(φ) η(t),   (18)

with empirical coefficients α₁, α₂, β₁, β₂, ε₁, ε₂ and φ taken following [27].

The wave input η(t) is usually specified from a wave spectral process, which resides in a high-dimensional input space. A typical procedure to reduce the dimension is to describe η(t) by an ensemble of wave groups embedded in its envelope process [28, 1] (see figure 7(a) for an illustration, which is only strictly valid for a narrow-band wave spectrum). Specifically, we compute the envelope process ρ(t) from η(t) through the Hilbert transform [29], and then construct two-parameter Gaussian-like wave groups ρ_c(t) which best fit ρ(t):

ρ_c(t) ∼ A exp(−(t − t_c)²/L²),   (19)

where t_c is the temporal location of the group, and the two parameters A (group amplitude) and L (group length) describe the geometry of the group (figure 7(b)). This dimension-reduction procedure allows η(t) to be described by an ensemble of (L, A) wave groups, i.e., a two-dimensional input parameter space (see figure 7(c)). We can then construct an ItR with the input as (L, A) to (18) and the output as the maximum roll through this wave group, and consider the group-based probability.

However, sampling in the (L, A) space (with each sample as the input to (18) with a pre-defined initial condition, say ξ(0) = 0 and ξ̇(0) = 0 as in [8, 30]) loses critical information associated with η(t). The most important one among them is the varying initial condition of the ship at the encounter of each wave group. As shown in [31, 32], (18) (and the ship roll in general) may be sensitive to the initial conditions for some A and L (see figure 8 for an example). This creates heteroscedastic uncertainty in the ItR (associated with ω) that needs to be dealt with by our current approach Seq-VHGPR.

Figure 8: The roll responses (c) and (d) with different initial conditions ξ(0) (with ξ̇(0) = 0), computed from (18) with input from a wave group with respectively (a) larger and (b) smaller amplitudes.

In the following, we show the results with input η(t) extracted from a narrow-band Gaussian wave spectrum:

F(k) = (H_s²/16) (1/(√π K)) exp(−(k − k₀)²/K²),   (20)

with significant wave height H_s = 12 m, peak (carrier) wavenumber k₀ corresponding to peak period T_p = 15 s, and bandwidth K set as a fraction of k₀. The exact exceeding probability is computed by simulating 1500 hours of ship responses. To compute the ItR incorporating the heteroscedastic randomness in ω, after a sample (L, A) is chosen, we randomly select a wave group of this (L, A) in ρ(t), and simulate (18) starting from (on average) 5 groups ahead of the (L, A) group with a (0, 0) initial condition. Since the impact of initial conditions typically decays within O(1) wave groups, we are able to naturally capture the true initial condition as the ship encounters the (L, A) group.

Figure 9 plots P_e = P(max(ξ_{L,A}(t)) > 0.3) (the probability that the maximum roll in a group exceeds 17 degrees) obtained from Seq-VHGPR, LH-VHGPR, LH-SGPR and the exact solution. We see that the Seq-VHGPR result converges to the exact solution within the first 30 sequential samples, much faster than the convergence of the LH-VHGPR result. The LH-SGPR result converges to an asymptotic value which is appreciably larger than the exact solution. We remark that this value represents the best result that can be achieved on this problem by all previous methods based on SGPR [8, 30]. These results, again, demonstrate the effectiveness of Seq-VHGPR in computing the exceeding probability relative to all other approaches.
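One ItR query of this kind, integrating a roll equation of the form (18) through a single Gaussian wave group (19), can be sketched as below; the coefficients and group parameters are illustrative placeholders, not the empirical values of [27]:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative coefficients (placeholders for the empirical values of [27])
a1, a2 = 0.10, 0.05          # linear and cubic damping
b1, b2 = 0.80, -0.20         # linear and cubic restoring
e1, e2 = 0.10, 0.10          # parametric and direct wave forcing
phi = np.pi / 4.0

def eta(t, A=3.0, L=30.0, tc=60.0, w0=2.0 * np.pi / 15.0):
    # Wave elevation: Gaussian group envelope as in (19) times a carrier wave
    return A * np.exp(-((t - tc)**2) / L**2) * np.cos(w0 * (t - tc))

def rhs(t, z):
    # State z = (xi, xi_dot) for the roll equation of the form (18)
    xi, v = z
    acc = (e2 * np.sin(phi) * eta(t) - a1 * v - a2 * v**3
           - (b1 + e1 * np.cos(phi) * eta(t)) * xi - b2 * xi**3)
    return [v, acc]

# One query: start from rest well ahead of the group, record the maximum roll
sol = solve_ivp(rhs, (0.0, 120.0), [0.0, 0.0], max_step=0.1)
max_roll = float(np.max(np.abs(sol.y[0])))
```

Randomizing the initial condition (or, as in the paper, starting several groups ahead of the sampled one) turns max_roll into a stochastic observable over ω for a given (L, A).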
4. Conclusions
In this paper, we present a new method (Seq-VHGPR) to efficiently estimate the extreme-event probability (in terms of the exceeding probability) induced by an ItR with heteroscedastic uncertainty. The method is established in the framework of sequential BED, and leverages VHGPR as a surrogate model to estimate the uncertain ItR. A new acquisition function corresponding to the VHGPR estimation is developed to select the next-best sequential sample, which leads to fast convergence of the exceeding probability. We validate our new method in two synthetic problems and one engineering application of estimating the extreme ship motion probability in irregular waves. In all cases, we find fast convergence of Seq-VHGPR to the exact solution, demonstrating its superiority to all existing methods when an ItR with heteroscedastic uncertainty is associated with the problem. This is indeed due to the effectiveness of VHGPR in estimating the ItR, although there is still room for improvement of the acquisition function to accelerate the convergence, as done in the case of deterministic ItR [9]. Finally, we remark that the present method also provides a potential way for high-dimensional BED, where the most influential dimensions can be selected as the (low-dimensional) input X, with the other secondary ones packaged into Ω in the ItR (as in the ship motion problem).

Figure 9: P_e in the ship roll problem, computed by Seq-VHGPR, LH-VHGPR, LH-SGPR with its asymptotic value, and the exact solution (in terms of the upper and lower 5% error bounds). The shaded region represents one standard deviation above the mean estimated from 200 applications of each method.

Acknowledgement
This research is supported by the Office of Naval Research grant N00014-20-1-2096. We thank the program manager Dr. Woei-Min Lin for several helpful discussions on the research.
References

[1] M. Farazmand, T. P. Sapsis, Extreme events: Mechanisms and prediction, Applied Mechanics Reviews 71 (5) (2019).
[2] M. Ghil, P. Yiou, S. Hallegatte, B. Malamud, P. Naveau, A. Soloviev, P. Friederichs, V. Keilis-Borok, D. Kondrashov, V. Kossobokov, et al., Extreme events: dynamics, statistics and prediction, Nonlinear Processes in Geophysics 18 (3) (2011) 295–350.
[3] S. Rahmstorf, D. Coumou, Increase of extreme events in a warming world, Proceedings of the National Academy of Sciences 108 (44) (2011) 17905–17909.
[4] A. B. Owen, Monte Carlo theory, methods and examples, 2013.
[5] K. Chaloner, I. Verdinelli, Bayesian experimental design: A review, Statistical Science (1995) 273–304.
[6] D. A. Cohn, Z. Ghahramani, M. I. Jordan, Active learning with statistical models, Journal of Artificial Intelligence Research 4 (1996) 129–145.
[7] C. E. Rasmussen, Gaussian processes in machine learning, in: Summer School on Machine Learning, Springer, 2003, pp. 63–71.
[8] M. A. Mohamad, T. P. Sapsis, Sequential sampling strategy for extreme event statistics in nonlinear dynamical systems, Proceedings of the National Academy of Sciences 115 (44) (2018) 11138–11143.
[9] Z. Hu, S. Mahadevan, Global sensitivity analysis-enhanced surrogate (GSAS) modeling for reliability analysis, Structural and Multidisciplinary Optimization 53 (3) (2016) 501–521.
[10] B. Echard, N. Gayton, M. Lemaire, AK-MCS: An active learning reliability method combining Kriging and Monte Carlo simulation, Structural Safety 33 (2) (2011) 145–154.
[11] H. Wang, G. Lin, J. Li, Gaussian process surrogates for failure detection: A Bayesian experimental design approach, Journal of Computational Physics 313 (2016) 247–259.
[12] B. J. Bichon, M. S. Eldred, L. P. Swiler, S. Mahadevan, J. M. McFarland, Efficient global reliability analysis for nonlinear implicit performance functions, AIAA Journal 46 (10) (2008) 2459–2468.
[13] A. Blanchard, T. Sapsis, Output-weighted optimal sampling for Bayesian experimental design and uncertainty quantification, arXiv e-prints (2020) arXiv–2006.
[14] H. Holden, B. Øksendal, J. Ubøe, T. Zhang, Stochastic partial differential equations, in: Stochastic Partial Differential Equations, Springer, 1996, pp. 141–191.
[15] K. Hasselmann, Stochastic climate models. Part I. Theory, Tellus 28 (6) (1976) 473–485.
[16] T. Rogers, P. Gardner, N. Dervilis, K. Worden, A. Maguire, E. Papatheou, E. Cross, Probabilistic modelling of wind turbine power curves with application of heteroscedastic Gaussian process regression, Renewable Energy 148 (2020) 1124–1136.
[17] P. Perdikaris, D. Venturi, J. O. Royset, G. E. Karniadakis, Multi-fidelity modelling via recursive co-kriging and Gaussian–Markov random fields, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 471 (2179) (2015) 20150018.
[18] L. Le Gratiet, Multi-fidelity Gaussian process regression for computer experiments, Ph.D. thesis (2013).
[19] L. Parussini, D. Venturi, P. Perdikaris, G. E. Karniadakis, Multi-fidelity Gaussian process regression for prediction of random fields, Journal of Computational Physics 336 (2017) 36–50.
[20] M. Lázaro-Gredilla, M. K. Titsias, Variational heteroscedastic Gaussian process regression, in: ICML, 2011.
[21] E. A. Wan, R. Van Der Merwe, The unscented Kalman filter for nonlinear estimation, in: Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), IEEE, 2000, pp. 153–158.
[22] Z. Zhu, X. Du, Reliability analysis with Monte Carlo simulation and dependent Kriging predictions, Journal of Mechanical Design 138 (12) (2016).
[23] P. Pandita, I. Bilionis, J. Panchal, Bayesian optimal design of experiments for inferring the statistical expectation of expensive black-box functions, Journal of Mechanical Design 141 (10) (2019).
[24] M. D. McKay, R. J. Beckman, W. J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 42 (1) (2000) 55–61.
[25] Z. Sun, J. Wang, R. Li, C. Tong, LIF: A new Kriging-based learning function and its application to structural reliability analysis, Reliability Engineering & System Safety 157 (2017) 152–165.
[26] N. Umeda, H. Hashimoto, D. Vassalos, S. Urano, K. Okou, Nonlinear dynamics on parametric roll resonance with realistic numerical modelling, International Shipbuilding Progress 51 (2, 3) (2004) 205–220.
[27] A. H. Nayfeh, D. T. Mook, P. Holmes, Nonlinear Oscillations (1980).
[28] M. S. Longuet-Higgins, On the joint distribution of wave periods and amplitudes in a random wave field, Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences 389 (1797) (1983) 241–258.
[29] K. Shum, W. K. Melville, Estimates of the joint statistics of amplitudes and periods of ocean waves using an integral transform technique, Journal of Geophysical Research: Oceans 89 (C4) (1984) 6467–6476.
[30] X. Gong, Z. Zhang, K. Maki, Y. Pan, Full resolution of extreme ship response statistics, 33rd Symposium on Naval Hydrodynamics (2000).
[31] P. A. Anastopoulos, K. J. Spyrou, Evaluation of the critical wave groups method in calculating the probability of ship capsize in beam seas, Ocean Engineering 187 (2019) 106213.
[32] M. S. Soliman, J. Thompson, Transient and steady state analysis of capsize phenomena, Applied Ocean Research 13 (2) (1991) 82–92.
[33] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[34] S. Kullback, R. A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics 22 (1) (1951) 79–86.
Appendix A. Algorithms for Gaussian Process Regression
We consider the task of inferring the input-to-response (ItR) function from a dataset D = {x_i, y_i}_{i=1}^{n}, consisting of n inputs X = {x_i ∈ R^d}_{i=1}^{n} and the corresponding outputs y = {y_i ∈ R}_{i=1}^{n}.

Appendix A.1. Standard Gaussian Process Regression (SGPR)
SGPR assumes the response to be the sum of a mean function f(x) and Gaussian randomness with constant variance γ (the same at all x):

y = f(x) + R,   R ~ N(0, γ).   (A.1)

A prior, representing our beliefs over all possible functions we expect to observe, is placed on f as a Gaussian process f(x) ~ GP(0, k_f(x, x′)) with zero mean and covariance function k_f, usually defined by a radial-basis-function kernel:

k_f(x, x′) = τ² exp( − Σ_{j=1}^{d} (x_j − x′_j)² / l_j² ),   (A.2)

where the amplitude τ and the length scales l_j, together with γ, constitute the hyperparameters θ = {τ, l_j, γ} of SGPR.

Following the Bayes formula, the posterior prediction for f given the dataset D can be derived to be another Gaussian:

p(f(x)|D) = p(f(x), y) / p(y) = N(µ_f(x), cov_f(x)),   (A.3)

with analytically tractable mean µ_f(x) and covariance cov_f(x, x′):

µ_f(x) = k_f(x, X)^T (K_f(X, X) + γI)^{−1} y,   (A.4)
cov_f(x, x′) = k_f(x, x′) − k_f(x, X)^T (K_f(X, X) + γI)^{−1} k_f(x′, X),   (A.5)

where the matrix elements are K_f(X, X)_{ij} = k_f(x_i, x_j). The hyperparameters θ are determined by maximizing the likelihood function p(D|θ) ≡ p(y|θ) = N(0, K_f(X, X) + γI).
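As a concrete illustration of (A.4)–(A.5), the minimal NumPy sketch below (an illustration only, not the implementation used in this work) computes the SGPR posterior with fixed, hand-picked hyperparameters rather than likelihood-maximized ones; with nearly noise-free data the posterior mean interpolates the training samples:

```python
import numpy as np

def rbf_kernel(X1, X2, tau=1.0, ls=0.5):
    """RBF kernel of (A.2): tau^2 exp(-sum_j (x_j - x'_j)^2 / l_j^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 / np.asarray(ls) ** 2).sum(-1)
    return tau**2 * np.exp(-d2)

def sgpr_posterior(X, y, Xs, tau=1.0, ls=0.5, gamma=1e-3):
    """Posterior mean (A.4) and covariance (A.5) at test inputs Xs;
    gamma is the constant noise variance of (A.1)."""
    Kn = rbf_kernel(X, X, tau, ls) + gamma * np.eye(len(X))
    Ks = rbf_kernel(X, Xs, tau, ls)                 # k_f(X, x*)
    A = np.linalg.solve(Kn, Ks)                     # (K_f + gamma I)^{-1} k_f
    mu = A.T @ y                                    # (A.4)
    cov = rbf_kernel(Xs, Xs, tau, ls) - Ks.T @ A    # (A.5)
    return mu, cov

# Illustrative data from a smooth function; hyperparameters are assumed values.
X = np.linspace(0.0, 3.0, 10)[:, None]
y = np.sin(X[:, 0])
mu, cov = sgpr_posterior(X, y, X)
```

In practice the hyperparameters would be obtained by maximizing the likelihood p(y|θ) as described above, rather than fixed by hand.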
Appendix A.2. Variational Heteroscedastic Gaussian Process Regression (VHGPR)
In this section, we briefly outline the algorithm of VHGPR. The purpose is to provide the reader with enough information to understand the logic behind VHGPR; for conciseness of the presentation, some details of the algorithm have to be omitted. We recommend that interested readers consult [20] and §10 of [33] for details.

In VHGPR, the ItR function is considered as the sum of a mean and a Gaussian term (independent at all x) with heteroscedastic uncertainty:

y = f(x) + R,   R ~ N(0, e^{g(x)}).   (A.6)

Two Gaussian priors are placed on the mean and the (log) variance functions:

f(x) ~ GP(0, k_f(x, x′)),   (A.7)
g(x) ~ GP(µ_0, k_g(x, x′)),   (A.8)

where k_f and k_g are respectively the radial-basis-function kernels for f and g, defined similarly to (A.2), and µ_0 is the prior mean for g. The hyperparameters in VHGPR can then be defined as θ = (µ_0, θ_f, θ_g), where θ_f and θ_g include the amplitudes and length scales involved in k_f and k_g, respectively. Moreover, we assume f is independent of g.

The increased expressive power provided by the heteroscedastic variance comes at the cost of an analytically intractable likelihood (for determination of the hyperparameters) and posterior (for prediction). Let f = f(X) and g = g(X) denote the realizations of the mean and variance functions at the training inputs X, following the distributions in (A.7) and (A.8). The likelihood and prediction are formulated as:

p(D|θ) ≡ p(y|θ) = ∫∫ p(y|f, g) p(g|θ) p(f|θ) dg df,   (A.9)

p(f(x), g(x)|D) = ∫∫ p(f(x), g(x)|f, g) p(f, g|y) df dg.   (A.10)

Since analytical integration cannot be achieved for (A.9) and (A.10), numerical evaluation of the integrals is needed. However, the dimension of the integration (with respect to f and g) is the same as the number of data points n, which can be prohibitively high for direct integration, say, using quadrature methods. While Monte-Carlo methods (e.g., MCMC) offer some advantages in computational cost, their application is still too expensive for most practical problems.
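To make the generative model (A.6) concrete, the following sketch draws repeated samples of y at a fixed input and checks that the empirical mean and standard deviation recover f(x) and e^{g(x)/2}; the particular f and g used here are arbitrary illustrative choices, not the ItR of this work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean and log-variance functions, for illustration only.
f = lambda x: np.sin(2.0 * x)
g = lambda x: -2.0 + 1.5 * x      # log of the noise variance at x

def sample_itr(x, n_rep):
    """Draw n_rep realizations of y = f(x) + R, with R ~ N(0, e^{g(x)})."""
    return f(x) + rng.normal(0.0, np.exp(0.5 * g(x)), size=n_rep)

# Replicated samples at one input: the noise level depends on x through g.
x0 = 0.8
ys = sample_itr(x0, 200_000)
```

Repeating this at several x values shows the sample scatter widening or shrinking with e^{g(x)/2}, which is exactly the heteroscedasticity that SGPR's constant γ cannot represent.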
For these problems, VHGPR leverages variational inference to provide a practical solution with low computational cost and sufficient accuracy.

The key distribution in computing (A.9) and (A.10) is p(f, g|y): it is directly involved in (A.10) and related to (A.9) through (A.11) below, but it is expensive to compute directly. The key idea of VHGPR is to approximate p(f, g|y) by q(f, g), where the latter is assumed to be a Gaussian distribution with parameters (the multi-dimensional mean and covariance) denoted here by θ_q. By minimizing the KL divergence [34] between p(f, g|y) and q(f, g), the parameters θ_q can be determined, and both the posterior and the likelihood can then be evaluated, as discussed below.

For the posterior, (A.10) becomes a linear Gaussian model (an integration of the exponential of a quadratic function of f and g with Gaussian weights), which has an analytical formulation. For the likelihood, we avoid directly using (A.9) and instead decompose log p(y|θ) as the sum of the evidence lower bound (ELBO) L(q(f, g)) and the KL divergence between q(f, g) and p(f, g|y):

log p(y|θ) = L(q(f, g)) + KL(q(f, g) | p(f, g|y)),   (A.11)

where

L(q(f, g)) = ∫∫ q(f, g) log [ p(y, f, g) / q(f, g) ] df dg,   (A.12)

KL(q(f, g) | p(f, g|y)) = ∫∫ q(f, g) log [ q(f, g) / p(f, g|y) ] df dg.   (A.13)

We then formulate an optimization problem on L (which, as a weighted integration of an exponential function of f and g with Gaussian weights q(f, g), has an analytical expression [33]) to determine both the parameters of q(f, g) (i.e., θ_q) and the hyperparameters θ:

θ_q*, θ* = argmax_{θ_q, θ} L(θ_q, θ).   (A.14)

We remark that (A.14) can be conceived as two sequential optimization problems with respect to θ_q and θ.
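The decomposition (A.11) can be verified explicitly on a model small enough to be tractable. The one-dimensional conjugate-Gaussian toy below (an illustration of the ELBO mechanics only, not VHGPR itself) shows that the ELBO never exceeds log p(y), with equality exactly when q equals the true posterior:

```python
import numpy as np

# Toy model: f ~ N(0, 1), y | f ~ N(f, sig2). Both log p(y) and the posterior
# are closed-form, so log p(y) = ELBO(q) + KL(q || posterior) can be checked
# directly for a Gaussian q(f) = N(m, s2).
sig2 = 0.5
y = 1.3

def log_evidence(y):
    v = 1.0 + sig2                           # p(y) = N(0, 1 + sig2)
    return -0.5 * np.log(2 * np.pi * v) - y**2 / (2 * v)

def elbo(m, s2):
    """E_q[log p(y|f)] + E_q[log p(f)] + entropy of q, for q = N(m, s2)."""
    e_loglik = -0.5 * np.log(2 * np.pi * sig2) - ((y - m) ** 2 + s2) / (2 * sig2)
    e_logprior = -0.5 * np.log(2 * np.pi) - (m**2 + s2) / 2
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return e_loglik + e_logprior + entropy

# True posterior p(f|y) = N(y/(1+sig2), sig2/(1+sig2)): the ELBO is tight there.
m_star, s2_star = y / (1 + sig2), sig2 / (1 + sig2)
```

Maximizing `elbo` over (m, s2) thus recovers the posterior, which is the θ_q-step of (A.14); VHGPR applies the same logic with the 2n-dimensional q(f, g).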
For the former, maximizing L is equivalent to minimizing the KL divergence (the aforementioned goal), since log p(y|θ) in (A.11) is not a function of θ_q; indeed, manipulating (A.13) gives KL(q(f, g) | p(f, g|y)) = −∫∫ q(f, g) log [ p(y, f, g) / q(f, g) ] df dg + log p(y) = log p(y) − L(q(f, g)). For the latter, since the KL divergence has been minimized, the ELBO gives a good approximation of the log-likelihood log p(y|θ). Therefore, the solution of (A.14) simultaneously provides the optimal θ_q, leading to a good approximation of p(f, g|y) by q(f, g), and the optimal θ, leading to a maximized likelihood.

However, (A.14) with respect to θ_q is still a prohibitively expensive optimization, with 2n + 2n(2n + 1)/2 dimensions (the mean and covariance parameters of q(f, g)). To reduce the number of dimensions, a key procedure employed in VHGPR is to assume that q(f, g) is separable in f and g, i.e., q(f, g) = q_f(f) q_g(g). This brings two benefits. First, one can show that the maximizing solution of L involves a relation between q_f(f) and q_g(g), i.e., q_f(f) can be represented as a function of q_g(g), so that the parameters in θ_q are reduced to the mean µ and covariance Σ of q_g(g) (with dimension n + n(n + 1)/2). Second, maximizing L with respect to µ and Σ (by setting ∂L/∂µ = 0 and ∂L/∂Σ = 0) leads to the analytical form

µ = K_g(X, X)(Λ − ½ I) 1 + µ_0 1,   (A.15)
Σ = (K_g(X, X)^{−1} + Λ)^{−1},   (A.16)

with K_g(X, X)_{ij} = k_g(x_i, x_j), Λ a diagonal matrix involving n unknown parameters, and 1 a vector with all elements equal to one. Therefore the optimization (A.14) is finally reduced to

Λ*, θ* = argmax_{Λ, θ} L(µ(Λ), Σ(Λ), θ),   (A.17)

with only n parameters in θ_q (or Λ). This can be solved by a gradient-based method, with the major computational cost lying in the n × n matrix inversions in L. The computational cost of each iteration of (A.17) is only approximately twice that of SGPR.

With q_f(f), q_g(g) (computed from Λ*), and θ available, the posterior predictions for f and g in (A.10) are:

p(f(x), g(x)|D) = ∫∫ p(f(x), g(x)|f, g) q_f(f) q_g(g) df dg
= ∫∫ p(f(x)|f) p(g(x)|g) q_f(f) q_g(g) df dg   (independence of f(x) and g(x))
= ∫ p(f(x)|f) q_f(f) df ∫ p(g(x)|g) q_g(g) dg
= p(f(x)|D) p(g(x)|D),   (A.18)

where:

p(f(x)|D) = N(µ_f(x), cov_f(x, x′)),   (A.19)
p(g(x)|D) = N(µ_g(x), cov_g(x, x′)),   (A.20)
µ_f(x) = k_f(x, X)^T (K_f(X, X) + Z)^{−1} y,   (A.21)
cov_f(x, x′) = k_f(x, x′) − k_f(x, X)^T (K_f(X, X) + Z)^{−1} k_f(x′, X),   (A.22)
µ_g(x) = k_g(x, X)^T (Λ − ½ I) 1 + µ_0,   (A.23)
cov_g(x, x′) = k_g(x, x′) − k_g(x, X)^T (K_g(X, X) + Λ^{−1})^{−1} k_g(x′, X),   (A.24)

with Z a diagonal matrix with entries Z_ii = e^{µ_i − Σ_ii/2}, and K_f(X, X)_{ij} = k_f(x_i, x_j).

Appendix B. The upper bound of the estimation variance
Here we show the construction of an upper bound of the estimation variance. For convenience we use p̂_ω(x) to represent P(S(x, ω|ω_f, ω_g) > δ):

var_{ω_f,ω_g}[ P(S(X, ω|ω_f, ω_g) > δ) ]
= var_{ω_f,ω_g}[ ∫ p̂_ω(x) p_X(x) dx ]
= E_{ω_f,ω_g}[ ( ∫ p̂_ω(x) p_X(x) dx )² ] − ( E_{ω_f,ω_g}[ ∫ p̂_ω(x) p_X(x) dx ] )²
= E_{ω_f,ω_g}[ ∫ p̂_ω(x) p_X(x) dx ∫ p̂_ω(x′) p_X(x′) dx′ ] − ( E_{ω_f,ω_g}[ ∫ p̂_ω(x) p_X(x) dx ] ) ( E_{ω_f,ω_g}[ ∫ p̂_ω(x′) p_X(x′) dx′ ] )
= ∫∫ E_{ω_f,ω_g}[ p̂_ω(x) p̂_ω(x′) ] p_X(x) p_X(x′) dx dx′ − ∫∫ E_{ω_f,ω_g}[ p̂_ω(x) ] E_{ω_f,ω_g}[ p̂_ω(x′) ] p_X(x) p_X(x′) dx dx′
= ∫∫ cov_{ω_f,ω_g}[ p̂_ω(x), p̂_ω(x′) ] p_X(x) p_X(x′) dx dx′
= ∫ std_{ω_f,ω_g}[ p̂_ω(x) ] ( ∫ std_{ω_f,ω_g}[ p̂_ω(x′) ] ρ_{ω_f,ω_g}[ p̂_ω(x), p̂_ω(x′) ] p_X(x′) dx′ ) p_X(x) dx
≤ (1/2) ∫ std_{ω_f,ω_g}[ p̂_ω(x) ] p_X(x) dx,   (B.1)

where ρ[·, ·] denotes the correlation coefficient. The last inequality comes from std_{ω_f,ω_g}[p̂_ω(x′)] ≤ 1/2 and ρ_{ω_f,ω_g}[p̂_ω(x), p̂_ω(x′)] ≤ 1. The equality in (B.1) holds when p̂_ω degenerates to a Bernoulli random variable taking the values p̂_ω = 0 and p̂_ω = 1 with equal probability 0.5, and ρ_{ω_f,ω_g}[p̂_ω(x), p̂_ω(x′)] = 1 for all x and x′.
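The inequality (B.1) can also be checked by direct Monte-Carlo computation. The sketch below uses an illustrative random probability field p̂_ω(x) = sigmoid(w₀ + w₁x), with random (w₀, w₁) standing in for (ω_f, ω_g), and a Gaussian input density p_X; all of these choices are hypothetical stand-ins, not the quantities of this work:

```python
import numpy as np

rng = np.random.default_rng(1)

# Grid for the x-integrals and a discretized standard-normal input density p_X.
x = np.linspace(-3.0, 3.0, 200)
dx = x[1] - x[0]
p_X = np.exp(-x**2 / 2.0)
p_X /= p_X.sum() * dx

# Illustrative random field: p_hat(x; w) = sigmoid(w0 + w1 x), so that
# p_hat lies in (0, 1) and hence std[p_hat(x)] <= 1/2 at every x.
n_samp = 20_000
w = rng.normal(size=(n_samp, 2))
p_hat = 1.0 / (1.0 + np.exp(-(w[:, [0]] + w[:, [1]] * x)))   # (n_samp, 200)

# Left side of (B.1): variance over realizations of the integral of p_hat * p_X.
lhs = np.var(p_hat @ (p_X * dx))
# Right side of (B.1): (1/2) * integral of std[p_hat(x)] * p_X(x) dx.
rhs = 0.5 * np.sum(p_hat.std(axis=0) * p_X * dx)
```

Because any [0, 1]-valued random variable has standard deviation at most 1/2, the bound holds for every such field, as the comparison of `lhs` and `rhs` confirms numerically.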