Goal-oriented adaptive sampling under random field modelling of response probability distributions
A. Gautier, D. Ginsbourger and G. Pirot

Abstract.
In the study of natural and artificial complex systems, responses that are not completely determined by the considered decision variables are commonly modelled probabilistically, resulting in response distributions varying across decision space. We consider cases where the spatial variation of these response distributions does not only concern their mean and/or variance but also other features including for instance shape or uni-modality versus multi-modality. Our contributions build upon a non-parametric Bayesian approach to modelling the thereby induced fields of probability distributions, and in particular on a spatial extension of the logistic Gaussian model. The considered models deliver probabilistic predictions of response distributions at candidate points, allowing for instance to perform (approximate) posterior simulations of probability density functions, to jointly predict multiple moments and other functionals of target distributions, as well as to quantify the impact of collecting new samples on the state of knowledge of the distribution field of interest. In particular, we introduce adaptive sampling strategies leveraging the potential of the considered random distribution field models to guide system evaluations in a goal-oriented way, with a view towards parsimoniously addressing calibration and related problems from non-linear (stochastic) inversion and global optimisation.

1. Introduction
Many problems in science and engineering boil down to studying the effect on a response of interest of varying some decision or control variables x. Yet it is typically unrealistic to assume a deterministic relationship between x and the response, be it for instance because of uncertainty in other input variables or because the assumed response and/or observation generating processes themselves involve some randomness. Addressing optimisation and related tasks in such frameworks comes with both conceptual and implementation challenges. While a number of contributions around stochastic and "noisy" optimisation have been developed, they often rely on homoscedasticity or specific distributional assumptions. In contrast, expensive-to-evaluate systems for which various features of response distributions evolve in decision space call for versatile modelling approaches.

Here we consider the situation where responses follow probability distributions µ_x indexed by decision variables x ∈ D and with a common support I. Without loss of generality, the index set (or decision space) D is assumed to be a compact subset of R^d (d = 1 or 2 in forthcoming examples). We furthermore assume that an objective function g(x) = ρ(µ_x) depending on x through µ_x is to be minimised, where ρ returns for any considered probability distribution a real-valued quantity such as a moment or a quantile with some given level. One of the main challenges in the considered framework is that typically, g(x) cannot be exactly evaluated, as one only observes draws from µ_x. We use the letter t (resp. T in random form) to denote such draws.

In stochastic optimisation (Ruszczyński and Shapiro [42]; Birge and Louveaux [7]; Prékopa [36]), having suitable assumptions on µ_x is important to achieve good results. Approximation procedures have been studied, including but not limited to the Robbins-Monro procedure and further developments in Bayesian approximation (Robbins and Monro [39]; Mandt et al.
[27]) or the multi-armed bandit paradigm (Thompson [45]; Cesa-Bianchi and Lugosi [9]; Bubeck and Cesa-Bianchi [8]). The most natural choice for the functional ρ is to consider the expectation. Yet, many other choices for ρ have been considered, including quantiles (Rostek [41]; Torossian et al. [49]), the conditional value at risk (Rockafellar and Uryasev [40]) or expectiles (Bellini and Di Bernardino [3]). It is also possible to learn the unknown distribution µ_x (Hall et al.

Institute of Mathematical Statistics and Actuarial Science, University of Bern, Switzerland
Centre for Exploration Targeting, The University of Western Australia, Australia
Figure 1.
Schematic representation of a field of probability distributions µ_x (represented by their probability density functions, in red) to be predicted based on scattered samples (with associated histograms in grey). Here the end goal is to collect additional samples towards the minimisation of g(x) = ρ(µ_x).

[17]; Efromovich [13]; Moutoussamy et al. [30]; Zhu and Sudret [50]), but it generally requires a large sample size.

In the context of global optimisation under expensive objective function evaluations, Bayesian Optimisation (BO) leverages the potential of meta-modelling, especially Gaussian Processes (GP) (Rasmussen and Williams [37]), to keep a memory of explored points with the aim to explore decision space while keeping a parsimonious evaluation budget. First introduced by Močkus et al. [29] and further developed and popularized by Jones et al. [21] in the noise-free setting, it has later been extended to stochastic black-box optimisation (Frazier et al. [15]; Frazier [16]; Srinivas et al. [43]; Picheny et al. [33]; Hernández-Lobato et al. [18]; Jalali et al. [19] and references therein) and further sequential strategies have been studied (Risk and Ludkovski [38]; Binois et al. [6]). However, existing approaches typically assume Gaussian response distributions µ_x. Another branch of study pertaining to geostatistics, and that does not rely on Gaussian or specific distributional assumptions, is so-called distributional Kriging (Aitchison [1]; Egozcue et al. [14]; Talská et al. [44]), but such approaches are ill-suited in the case of moderate sample sizes heterogeneously scattered across space.

In this paper, we investigate and leverage a non-parametric Bayesian model for fields of probability distributions that generalizes the logistic Gaussian process model. We focus on the situation where, for each x, the distribution µ_x admits a density p_x with respect to a prescribed dominating measure.
Our approach is adapted to the case of moderate sample sizes heterogeneously scattered across space. It delivers a probabilistic prediction of the field of probability distributions, which allows us to derive sampling acquisition schemes. We introduce the aforementioned model in Section 2. Then, we study an application of this model in sequential design in Section 3 and demonstrate its applicability on toy probability density fields and on a contaminant localization problem in Section 4. The main contribution of this paper lies in the exploitation of a spatial extension of the logistic Gaussian process model for the sequential design of stochastic simulations towards optimisation and inversion.

2. Sample-based modelling of density-valued fields
2.1 The logistic Gaussian process for density estimation
Before considering the full-scale problem of density field estimation, let us briefly consider the simpler problem of (univariate) density estimation without any indexing on a spatial variable. Our focus is on a class of Bayesian density estimation methods where prior distributions of probability density functions are constructed by rescaling positive random processes in such a way that the resulting realisations integrate to one. When the considered random processes are obtained by taking the logistic density transform (exponentiation and normalisation) of a Gaussian process, the resulting model is called the Logistic Gaussian Process (LGP). LGPs for density estimation were established and studied by Leonard [26] and Lenk [24, 25]. More recently, Tokdar and Ghosh [47] demonstrated that a large class of LGPs achieves strong and weak consistency (under some regularity assumptions) at true densities that are either continuous or piecewise continuous. Using this prior within a Markov Chain Monte Carlo framework allows approximate sampling from the posterior distribution over the space of densities on I, as illustrated in Figure 2.

Figure 2.
Using the logistic Gaussian process model in density estimation.

Models relying on other mappings than the logistic density transform have been introduced and studied. In the particular case where the sigmoid (logistic function) is used instead of an exponential, Murray et al. [31] introduced an exact sampling approach relying on the boundedness of the sigmoid. More recently, Donner and Opper [11] proposed an augmented model that allows for efficient inference by Gibbs sampling and an approximate variational mean-field approach. However, our first experiments hinted that choosing a bounded function such as the sigmoid often leads to a lack of expressiveness, hence we decided to rely on LGP settings.
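As a minimal illustration of the logistic density transform, the LGP prior can be sampled on a grid. The sketch below (in Python) draws latent GP paths and rescales them into densities; the squared-exponential covariance, its hyperparameters, and the grid are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Grid over the support I = [0, 1] (an assumed discretisation).
u = np.linspace(0.0, 1.0, 200)
du = u[1] - u[0]

# Squared-exponential covariance for the latent GP W (illustrative choice).
lengthscale, variance = 0.1, 1.0
K = variance * np.exp(-0.5 * (u[:, None] - u[None, :]) ** 2 / lengthscale**2)

# Draw three prior realisations of W on the grid.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(u)))
W = L @ rng.standard_normal((len(u), 3))

# Logistic density transform: exponentiate, then normalise so that each
# realisation integrates to one over I (here via a simple Riemann sum).
expW = np.exp(W - W.max(axis=0))  # subtract the max for numerical stability
densities = expW / (expW.sum(axis=0) * du)

# Each column of `densities` is now a random probability density on [0, 1].
print(densities.sum(axis=0) * du)  # -> [1. 1. 1.]
```

Conditioning on observed draws (the MCMC step mentioned above) then amounts to reweighting or resampling such paths according to their likelihood.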
2.2 A spatial extension of the logistic Gaussian process for density field estimation
When modelling and optimising functions depending on both decision and uncontrolled variables, it can be a relevant option to use augmented models incorporating the uncontrolled variables as part of their inputs. In Picheny et al. [33], a GP indexed by the joint space of design parameters and computational time is considered to model partially converged simulation data. In Janusevskis and Le Riche [20], a GP model over the joint space of controllable and uncontrollable variables is used for simulation-based robust optimisation.

Here we consider a spatial extension of the logistic Gaussian process. We work with a GP indexed by the product space D × I and apply a logistic density transform to each x-slice of the process to produce a probability density field. A similar spatial LGP was introduced in Tokdar et al. [48] in the context of conditional density modelling (i.e., with random x). To our knowledge, this family of models has not yet been used in the present context of sequential design of experiments, be it for stochastic optimisation or towards other goals.

Here we assume that our probability measures µ_x of interest possess probability densities p_x (with respect to some prescribed reference measure(s)) and we appeal to the said generalization of LGP to model the resulting density field. The resulting Spatial Logistic Gaussian Process (SLGP) is defined by

    p_x(t) = e^{W_{x,t}} / ∫_I e^{W_{x,u}} du   (x ∈ D, t ∈ I),   (1)

where W_{x,t} ~ GP(m, k) is a measurable GP with given mean function m : D × I → R and covariance kernel k : (D × I) × (D × I) → R such that for any x ∈ D, ∫_I e^{W_{x,u}} du < ∞ almost surely.

Remark 2.1.
When the index space D is a singleton, the SLGP coincides with the LGP mentioned in Section 2.1.

Remark 2.2.
Throughout the paper, the Lebesgue measure is assumed as dominating one, and I is [0, 1].

2.3 Conditioning on data and practicalities
In the SLGP model, the denominator involves values of W over the whole domain. This infinite-dimensional object makes likelihood-based computations cumbersome. In Tokdar [46], this problem is reduced to that of a finite-dimensional integral by relying on a moderate number of inducing points, introduced to reduce the dimensionality of the LGP. These inducing points are selected according to a life and death process. Here, we opt for another way to reduce the dimensionality that is more suitable for sequential optimisation. We consider a finite-rank GP with the following parametrisation:

    W_{x,t} = m(x, t) + Σ_{j=1}^{p} √λ_j e_j(x, t) ε_j   (x ∈ D, t ∈ I),   (2)

where p ∈ N, the e_j's are functions, the λ_j's are scalars satisfying λ_1 ≥ ... ≥ λ_p > 0, and the ε_j's are i.i.d. N(0, 1). In practice, one can pre-compute the e_j's at the grid points and then swiftly obtain evaluations of W and its integral for varying realisations (or tentative values) ϵ_j of the ε_j's.

Now, consider that our data consist in n couples of locations and observations {(x_i, t_i)}_{1≤i≤n} ∈ (D × I)^n. Moreover, assume the t_i's are realisations of some independent random variables T_i, and let µ_{x_i} be the (unknown) distribution of T_i. We will note T_n = (T_i)_{1≤i≤n} and t_n = (t_i)_{1≤i≤n}. The GP W defined by Equation 2, as well as the associated SLGP, inherit their randomness from the Gaussian random vector ε = (ε_j)_{1≤j≤p}. To highlight this dependency we shall also denote the SLGP p_x(·) by p_x(·; ε) and use the notation p_x(·; ϵ) when instantiating the SLGP with a fixed value ϵ of ε. We perform a Bayesian estimation of the true unknown density field p_{x_i} in the following way:

• We use the prior ε = (ε_j)_{1≤j≤p} ~ N(0, I_p), with density φ(·) (with respect to the Lebesgue measure in R^p).
• The finite-rank SLGP

    p_x(·; ε) := exp( m(x, ·) + Σ_{j=1}^{p} √λ_j e_j(x, ·) ε_j ) / ∫_I exp( m(x, u) + Σ_{j=1}^{p} √λ_j e_j(x, u) ε_j ) du

is used to model p_x(·). Hence, for T_1, ..., T_n independent responses at locations x_1, ..., x_n, one obtains the likelihood

    L(·; t_n) : ϵ ∈ R^p → L(ϵ; t_n) = Π_{i=1}^{n} p_{x_i}(t_i; ϵ).   (3)

• The posterior density of ε, given by Bayes' formula, is π(ϵ | T_n = t_n) ∝ φ(ϵ) Π_{i=1}^{n} p_{x_i}(t_i; ϵ).

We leverage the fact that the posterior is known (up to a constant) to implement our density field estimation via a preconditioned Crank-Nicolson algorithm (Cotter et al. [10]). This approach delivers a probabilistic prediction of (p_x)_{x∈D}, and allows us to approximately sample from the SLGP posterior distribution and recommend new sampling locations.

3. A first contribution in stochastic optimisation
One of the main challenges in optimisation is to define suitable sequential design strategies. Choosing whereto add observations is indeed crucial and a good design strategy must achieve a trade-off between explorationand exploitation of the input space so as to discover regions of interest while avoiding getting trapped in thevicinity of local minimisers or of artefactual basins of attraction.Deriving such strategies requires anticipating (probabilistically) the effect of adding new observations. Tothis end, meta-models are commonly used. We now present the considered optimisation problem, the criterionused to derive a sequential design strategy, and a stochastic simulation-based approach to estimate this criterion.
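Before specifying the optimisation problem, it may help to see the pieces from Section 2.3 assembled end to end. The sketch below (in Python) evaluates a finite-rank SLGP density, the unnormalised log-posterior built from Equation 3, and one preconditioned Crank-Nicolson chain; the basis functions e_j, weights λ_j, and data are made-up stand-ins, not the paper's actual choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite-rank parametrisation (Eq. 2) with a toy basis on D x I = [0,1]^2.
p = 8
freqs = np.arange(1, p + 1)
lam = 1.0 / (1.0 + freqs.astype(float) ** 2)
tgrid = np.linspace(0.0, 1.0, 101)
du = tgrid[1] - tgrid[0]

def log_density(x, eps):
    """log p_x(.) on the t-grid for W_{x,t} = sum_j sqrt(lam_j) e_j(x,t) eps_j."""
    e = np.sin(np.pi * freqs * tgrid[:, None]) * np.cos(np.pi * freqs * x)
    W = e @ (np.sqrt(lam) * eps)
    W -= W.max()  # stabilise before exponentiating
    return W - np.log(np.exp(W).sum() * du)  # normalise (Riemann sum over I)

def log_posterior(eps, xs, ts):
    # log pi(eps | data) up to a constant: N(0, I_p) prior + likelihood (Eq. 3).
    lp = -0.5 * eps @ eps
    for x, t in zip(xs, ts):
        lp += np.interp(t, tgrid, log_density(x, eps))
    return lp

def pcn(xs, ts, n_iter=500, beta=0.2):
    """Preconditioned Crank-Nicolson MCMC: proposals preserve the N(0, I_p) prior."""
    eps = rng.standard_normal(p)
    lp = log_posterior(eps, xs, ts)
    draws = []
    for _ in range(n_iter):
        prop = np.sqrt(1.0 - beta**2) * eps + beta * rng.standard_normal(p)
        lp_prop = log_posterior(prop, xs, ts)
        # pCN accepts with the likelihood ratio only; the prior terms in
        # log_posterior cancel through the explicit correction below.
        if np.log(rng.random()) < (lp_prop - lp) + 0.5 * (prop @ prop - eps @ eps):
            eps, lp = prop, lp_prop
        draws.append(eps.copy())
    return np.array(draws)

draws = pcn(xs=[0.2, 0.2, 0.8], ts=[0.3, 0.35, 0.7])
print(draws.shape)  # -> (500, 8)
```

Each retained draw induces a full density field via `log_density`, which is what allows the probabilistic predictions exploited in the sequel.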
3.1 Optimisation problem considered
We recall that we are interested in minimising g(x) = ρ(µ_x), while we do not know the field µ_x (or equivalently its density p_x). We consider G_x, the random function obtained by applying ρ to the density-valued field delivered by the SLGP model. G_x is the model for g(x) induced by the SLGP p_x(·). In the rest of the paper we will consider the particular case where ρ is the median, but the presented approach is not restricted to this choice and can be applied to arbitrary (measurable) mappings, potentially also mappings depending on x.

Remark 3.1. G_x remains uncertain knowing T_n = t_n because of the conditional variability of ε.

In the spirit of robust optimisation, we will consider the problem of minimising an α-quantile of the random function G_x. Here the value of α is arbitrarily set to 0.9 and stays fixed throughout the optimisation procedure. We note Q_n(x) the α-quantile of G_x. Q_n(x) is a random function as the observations T_n are left in random form. We shall also consider Q_{n+K}(x; x_new), the α-quantile of G_x conditioned on past observations T_n and on a batch of K i.i.d. observations to be made at x_new ∈ D. Denoting E_n[·] = E[· | T_n = t_n], we consider:

    EQI_n(x_new, K) = E_n[ ( min_{x∈D} Q_n(x) − min_{x∈D} Q_{n+K}(x; x_new) )_+ ].   (4)

Remark 3.2.
This criterion was inspired by the Expected Quantile Improvement presented in Picheny et al. [32], which would write here as E_n[(min_{i≤n} Q_n(x_i) − Q_{n+1}(x_new; x_new))_+], but it has been modified in the spirit of knowledge-gradient approaches from Frazier et al. [15] to account for improvements over the whole domain.

3.2 Simulation-based computation of criteria
Classically, in Sequential Uncertainty Reduction (SUR) approaches (Bect et al. [2]), it is assumed that the function of interest is a realisation of a GP. Under these assumptions, several criteria enjoy (semi-)analytical forms, favouring criterion optimisation and the implementation of design strategies. However, in our situation it does not appear feasible to obtain a closed-form formula for the considered EQI criterion, and we therefore estimate it via stochastic simulation.

In order to quantify the effect of adding an observation T_{n+1} at a given location x_new, one needs to study the probability density of a new observation T at x conditioned on T_n = t_n. By the law of total probability, and by considering Equation 3, one finds that it is given by:

    π(t | T_n = t_n) ∝ ∫ π(t | ϵ) π(ϵ | t_n) dϵ ∝ ∫ p_x(t; ϵ) π(ϵ | t_n) dϵ.   (5)

This motivates the following approach, which can be considered as a basic application of Sequential Monte Carlo (Doucet et al. [12]), where we use a simple simulation-based particle filter to approximate an unknown future quantity:

(1) The generative model given by the SLGP is implemented within an MCMC framework and used to obtain N approximate realisations of ε | T_n = t_n, noted (ϵ^(j))_{1≤j≤N}. The density of a new observation at x (see Equation 5) is approximated by the mixture N^{-1} Σ_{j=1}^{N} p_x(·; ϵ^(j)).

(2) The impact of adding K observations at a given location x_new is estimated by performing M simulations: for the i-th simulation, K realisations of the random variable T̃_new are independently drawn from the density N^{-1} Σ_{j=1}^{N} p_{x_new}(·; ϵ^(j)).
The corresponding batch of response values is noted t̃_new^(i) = (t̃_new^(i),1, ..., t̃_new^(i),K). In light of Equation 5, the response density at x conditional on past data and on the simulated batch is given by

    π(t | T_n = t_n, T̃_new^(i) = t̃_new^(i)) ∝ ∫ p_x(t; ϵ) Π_{ℓ=1}^{K} p_{x_new}(t̃_new^(i),ℓ; ϵ) π(ϵ | t_n) dϵ

and can be approximated by Σ_{j=1}^{N} p_x(·; ϵ^(j)) ω_{i,j}, with weights ω_{i,j} proportional to Π_{ℓ=1}^{K} p_{x_new}(t̃_new^(i),ℓ; ϵ^(j)) and summing to one.

(3) The density field distribution after adding K observations at x_new is predicted by M^{-1} Σ_{i=1}^{M} Σ_{j=1}^{N} p_x(·; ϵ^(j)) ω_{i,j}.

4. Applications
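As a concrete illustration of the particle reweighting of Section 3.2 (Equation 5 and steps (1)-(3) above), the sketch below works through one simulated batch (one of the M simulations). The density family and the posterior draws are toy stand-ins chosen only to make the reweighting mechanics visible.

```python
import numpy as np

rng = np.random.default_rng(2)

tgrid = np.linspace(0.0, 1.0, 201)
du = tgrid[1] - tgrid[0]

def density(x, eps):
    # Toy stand-in for p_x(.; eps): a density whose mode moves with x and eps.
    logpdf = -0.5 * ((tgrid - 0.5 * x - 0.1 * eps[0]) / 0.1) ** 2
    w = np.exp(logpdf - logpdf.max())
    return w / (w.sum() * du)

# Stand-ins for N posterior draws eps^(j) of the latent Gaussian vector.
N, K = 200, 5
eps_draws = rng.standard_normal((N, 3))
x_new = 0.7

# Step (1): posterior-predictive density at x_new, an equal-weight mixture.
mix = np.mean([density(x_new, e) for e in eps_draws], axis=0)

# Step (2): simulate a batch of K responses at x_new from the mixture, then
# reweight each particle by its likelihood of that batch.
probs = mix * du
t_batch = rng.choice(tgrid, size=K, p=probs / probs.sum())
logw = np.array([
    sum(np.log(np.interp(t, tgrid, density(x_new, e))) for t in t_batch)
    for e in eps_draws
])
w = np.exp(logw - logw.max())
w /= w.sum()  # weights omega_j, proportional to the batch likelihood, summing to one

# Step (3): predicted density after the batch, the reweighted mixture
# (evaluated here at x = x_new for brevity).
updated = np.sum(w[:, None] * np.stack([density(x_new, e) for e in eps_draws]), axis=0)
print(abs(updated.sum() * du - 1.0) < 1e-8)  # -> True
```

Averaging such reweighted predictions over M simulated batches, and applying ρ and the α-quantile to each, yields the Monte Carlo estimate of the EQI criterion.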
For all the coming applications, we consider a zero-mean GP W(x, t) = Σ_{j=1}^{p} √λ_j e_j(x, t) ε_j, p ∈ N. To define the e_j's, we start from multivariate Fourier functions of order q > 0: sine and cosine of 2π(ω_t^⊤ t + ω_x^⊤ x), where ω_t and ω_x are integer-valued frequency vectors satisfying ||ω_x||_∞, ||ω_t||_∞ ≤ q. Then, we remove redundant terms as well as those irrelevant in the SLGP setting (i.e. functions independent of t, that would be simplified with the normalisation of the process) and set the corresponding λ_j to be 1/(1 + ||ω_x|| + ||ω_t||). Assuming the λ_j are ordered, this defines a specific finite-rank GP through its Karhunen-Loève decomposition.

4.1 Two analytical applications
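The basis construction just described can be written out explicitly. In the sketch below (Python, for d = 1), redundancy and t-independent terms are handled simply by restricting ω_t ≥ 1; the exact bookkeeping, and hence the resulting basis count, may differ from the paper's.

```python
import itertools
import numpy as np

# Tensorised Fourier features for the SLGP basis, d = 1: sine and cosine of
# 2*pi*(w_t*t + w_x*x) with sup-norm of the frequencies bounded by q.
# Terms with w_t = 0 do not depend on t and cancel in the SLGP normalisation,
# so they are dropped; keeping w_t >= 1 also avoids sign-redundant pairs.
q = 5
features, lambdas = [], []
for w_x, w_t in itertools.product(range(-q, q + 1), range(1, q + 1)):
    lam = 1.0 / (1.0 + abs(w_x) + abs(w_t))
    for trig in (np.sin, np.cos):
        features.append(lambda x, t, wx=w_x, wt=w_t, f=trig:
                        f(2.0 * np.pi * (wt * t + wx * x)))
        lambdas.append(lam)

# Sort so that lambda_1 >= ... >= lambda_p, as assumed in the text.
order = np.argsort(lambdas)[::-1]
lambdas = [lambdas[i] for i in order]
features = [features[i] for i in order]

print(len(features), round(lambdas[0], 3))  # -> 110 0.5
```

Truncating this sorted list at a chosen p then gives the finite-rank Karhunen-Loève-type representation used above.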
The data in the first applications is obtained by corrupting two reference functions with noise. In this setting we have D = I = [0, 1], with reference functions of the form

    f_1(x) = 0.25 sin(16x + 9) + 0.25 sin(4.…x + 2.…) + …   (minimum at x ≈ 0.…),
    f_2(x) = 0.15 + …   (minimum at x ≈ 0.…).

Figure 3.
The two reference density fields used, their medians and their global optima.

We consider two types of noise for this application. The first is a truncated centred homoscedastic Gaussian noise, where the truncation ensures that the support of the resulting density field is included in [0, 1] while the noise remains symmetrical. The second is a multi-modal, non-symmetrical noise with median zero. The distribution fields obtained with this multi-modal noise are shown in Figure 3. We perform density field estimation as presented in Section 2.3 for such reference fields, for different sample sizes and orders of the GP. In this section, we are not yet in an adaptive setting: each time a new observation is added to the model, its location is determined randomly, with uniform distribution over the index set. Figure 4 displays the posterior mean field with two sample sizes for the truncated Gaussian noise.
Figure 4.
True field and samples used (top), posterior mean field (bottom) for a sample size of 100 (left) and 10000 (right). The SLGP uses 121 basis functions.

The goodness of fit of our density estimation procedure is expected to be determined by the number of available observations (as illustrated in Figure 4) and by the GP's order. We define an integrated squared Hellinger distance that quantifies the dissimilarity between two probability-density-valued fields p^i = (p^i_x)_{x∈D} (i = 1, 2):

    d(p^1, p^2) = ∫_D ∫_I ( √(p^1_v(u)) − √(p^2_v(u)) )² du dv.   (6)

In Figure 5 we display the distribution of d between true and estimated fields for various sample sizes and SLGP orders. As both functions f_1 and f_2 yielded close results, we chose to show only the results for f_1. We see that the goodness of fit is comparable for small sample sizes, and that the order becomes limiting when many observations are available, as low-order SLGPs are unable to model small-scale variations.

Figure 5.
Integrated squared Hellinger distance distribution for different sample sizes and process orders, when the reference field is obtained by corrupting f_1 with the two noises.

4.2 Hydrogeological problem
In addition to the analytical probability density fields, we consider a one-dimensional contaminant problem. In this case, given some observed concentration breakthrough curves, we want to localize the depth of a contaminant source injected in a saturated aquifer whose geological properties are highly uncertain. To do so, hydrogeologists must rely on the use of analogues and expert knowledge to generate an ensemble of plausible geological realisations that can be used to quantify prediction uncertainty. To keep the problem simple, the zone of interest of the aquifer is modelled as a 2D vertical section (10 meters deep and 5 meters wide) aligned with the main flow direction. At the domain inlet, the depth of the released contaminant (normalized location x in what follows) is the unknown of the problem. The reference observations consist of concentration breakthrough curves at different depths of the domain outlet and are obtained for a release of the contaminant at depth 1.84 m.

The ensemble of plausible geological realisations and the geological references are multiple-point statistics realisations generated with the Deesse algorithm (Mariethoz et al. [28]) that reproduce the complex features of braided-river aquifer models (Pirot et al. [35]). The contaminant flow and transport are simulated under steady-state flow and fixed boundary conditions (constant hydraulic gradient) using the Maflot Matlab code (Künze and Lunati [23]).

Localizing the contaminant source means finding the depths at which simulations are the most similar to our reference. In particular, we use a normalized misfit, denoted as the response t in what follows, and we focus on finding the source depth that minimises the median of the misfit distribution. In Figure 6, we represent a reference breakthrough curve and two simulations.

Figure 6.
Data considered: first unknown geology and reference curves / plausible geology 1 and simulated response, source at 1.84 m / plausible geology 2 and simulated response, source at 1.84 m.

In this section we consider two cases, corresponding to two latent geologies. In order to know where the real minima of the medians are and to evaluate the performance of our approach, we need known reference fields. We obtain them by computing the misfit between our reference breakthrough curves and 10000 simulations (corresponding to 50 distinct source depths and 200 plausible geological structures) and estimating the distribution of the misfit values depending on the source depth with an SLGP. The posterior mean fields are used as references, and are presented in Figure 7. Using these two reference density fields to draw new samples, we followed the methodology introduced in Section 3 and represent in Figure 8 the posterior mean field and its estimated median before and after running 25 steps of the algorithm on the first geological structure.
Figure 7.
Misfit data (top) and posterior mean field (bottom) for two latent geological structures.
Figure 8.
Results at the beginning of the algorithm [left] and after the 25th step [right]. Mean field estimated and samples available [top]; estimated vs. reference median [bottom].

On this application, the global minimum (at 2.24 m) is easily found and the algorithm focuses on improving its estimation of the median, hence producing a better estimation of it. This corresponds to an exploitation-oriented approach. Although not displayed here, we found, as reflected in Figure 9, that with the other geological structure our approach was also able to locate the minimum, this time by focusing on exploration, adding observations at new locations of interest.
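Stepping back, the adaptive procedure used in these experiments reduces to a simple outer loop: maximise the acquisition criterion over candidate locations, sample a batch of responses there, refit, and repeat. The sketch below (Python) uses toy stand-ins for both the stochastic simulator and the criterion; the real versions rely on the SLGP posterior and on the EQI of Equation 4.

```python
import numpy as np

rng = np.random.default_rng(4)

xgrid = np.linspace(0.0, 1.0, 51)  # candidate (normalised) source locations

def simulate_response(x, size):
    # Hypothetical stochastic simulator: noisy misfit, minimum near x = 0.45.
    return (x - 0.45) ** 2 + 0.05 * rng.standard_normal(size)

def criterion(x, data):
    # Toy acquisition favouring unexplored locations (stand-in for the EQI).
    if not data:
        return 1.0
    return min(abs(x - x_obs) for x_obs, _ in data)

data = []
K = 20  # batch size, as in the experiments above
for step in range(5):
    x_new = max(xgrid, key=lambda x: criterion(x, data))  # maximise criterion
    data.append((x_new, simulate_response(x_new, K)))     # K draws at x_new
    # (re-fitting the SLGP and updating the median field would happen here)

# Recommend the visited location with the smallest empirical median misfit.
x_best = min(data, key=lambda d: np.median(d[1]))[0]
print(len(data), 0.0 <= x_best <= 1.0)  # -> 5 True
```

With the EQI in place of the toy criterion, the same loop trades off exploitation (re-sampling near the current best median) against exploration (probing poorly informed regions), as observed on the two latent geologies.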
4.3 Optimisation benchmark
For our small benchmark, the starting design consists in n = 20 data points (x_i, t_i) from the reference field, heterogeneously scattered across space. Their locations are independent and uniformly distributed across parameter space. At each step, observations are added in batches of 20 at the same location. We repeat 24 independent instances of the optimisation process for each strategy and each application and compare the performances in terms of optimality gap (difference between real and estimated optimal medians). This approach has been favoured due to the relatively high cost of one evaluation of the EQI criterion for the SLGP, but we expect it to be detrimental to our GP-based competitors, as GPs would benefit more from having scattered observations rather than batches scattered over different points.

We compare different strategies for modelling the field and choosing the next sampling location. The value of the minimiser is inferred by modelling the function of interest with one of three models: the first one, that will be called homoscedastic GP, consists in a GP regression where the observation noise level is assumed to be uniform throughout the domain. The second one, a GP regression with input-dependent noise rates as in Kersting et al. [22] and Binois et al. [5], will be called heteroscedastic GP. The last one is the SLGP modelling presented here. For each of these three models, we compare a non-adaptive approach (where at each step the new observations are added at a location chosen at random, with uniform distribution) to an adaptive approach. The criteria used are: the Approximate Knowledge Gradient (AKG) as implemented in the R package DiceOptim version 2.0.1 [34] for the homoscedastic GP, the Expected Improvement as implemented in the R package hetGP version 1.2.1 [4] for the heteroscedastic GP, and the criterion of Equation 4 for the SLGP. The results of the benchmark are shown in Figure 9.

Figure 9.
Median of the log optimality gap for the six considered strategies on the six considered test cases.

Hyperparameters of the two GPs are estimated by maximum likelihood. For the adaptive SLGP, we decided to display the results obtained when minimising a 90% quantile of the random median, as our first experiments showed no strong sensitivity to the chosen level α. On a personal laptop with 16 GB RAM and a simple R implementation, performing the SLGP density estimation took between 13 and 24 minutes depending on the sample size, and inferring the EQI with 150 simulations for 101 locations took 5 more minutes. One notices that in the most complex situations, the sampling scheme based on GP modelling of the functional performs worse than the approaches based on SLGP modelling. We found that GP-based approaches tend to get trapped in local optima. This might be explained by the credit allocation.

5. Conclusion and perspectives
We demonstrated how a spatial generalization of the Logistic Gaussian Process model can be used for sample-based modelling of distribution fields, and how the resulting probabilistic predictions of distribution fields can be leveraged for moderate-dimensional stochastic optimization problems under unknown heterogeneous noise distributions. In particular, we introduced a simple sequential Monte Carlo-based approach to quantify the impact of sampling at a location and showed, on a modest benchmark, that it enables implementing sequential design strategies trading off between exploration and exploitation.

The conducted experiments yielded promising results. However, further work is needed to more extensively compare algorithms with respect to the employed meta-models and criteria, respectively, and to identify which of these degrees of freedom are most influential on performance. Tuning model settings and strategies appropriately could ultimately lead to substantial efficiency gains, be it towards stochastic optimization or broader uncertainty reduction goals. In particular, the SLGP would benefit from more systematic model selection and parameter estimation approaches.

Upcoming research directions encompass making the SLGP model more suitable for higher dimensions and reducing associated computational costs, assessing the performance of the resulting probabilistic predictions, and deriving further uncertainty reduction strategies. It is also of interest to investigate theoretical properties of SLGP models and of the considered sequential strategies, notably in view of underlying mean and covariance functions.
Acknowledgements
The authors wish to thank the reviewing team for their remarks that helped substantially improve the paper. AG's and DG's contributions have taken place within the Swiss National Science Foundation project number 178858. AG and DG would like to warmly thank Dr. Tomasz Kacprzak (ETH Zürich) for early discussions having motivated part of this work, as well as Yves Deville for insightful comments.
References

[1] J. Aitchison. The statistical analysis of compositional data. Journal of the Royal Statistical Society, Series B: Methodological, 44:139–177, 1982.
[2] J. Bect, F. Bachoc, and D. Ginsbourger. A supermartingale approach to Gaussian process based sequential design of experiments. Bernoulli, 25(4A):2883–2919, 2019.
[3] F. Bellini and E. Di Bernardino. Risk management with expectiles. The European Journal of Finance, 23(6):487–506, 2017.
[4] M. Binois and R. B. Gramacy. hetGP: Heteroskedastic Gaussian Process Modeling and Design under Replication, 2019. URL https://CRAN.R-project.org/package=hetGP. R package version 1.1.2.
[5] M. Binois, R. B. Gramacy, and M. Ludkovski. Practical heteroscedastic Gaussian process modeling for large simulation experiments. Journal of Computational and Graphical Statistics, 27(4):808–821, 2018.
[6] M. Binois, J. Huang, R. B. Gramacy, and M. Ludkovski. Replication or exploration? Sequential design for stochastic simulation experiments. Technometrics, 61(1):7–23, 2019.
[7] J. R. Birge and F. Louveaux. Introduction to stochastic programming. Springer Science & Business Media, 2011.
[8] S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
[9] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[10] S. Cotter, G. Roberts, A. Stuart, and D. White. MCMC methods for functions: modifying old algorithms to make them faster. Statistical Science, 28(3):424–446, 2013.
[11] C. Donner and M. Opper. Efficient Bayesian inference for a Gaussian process density model. arXiv:1805.11494, 2018.
[12] A. Doucet, N. De Freitas, and N. Gordon. Sequential Monte Carlo methods in practice. Springer, 2001.
[13] S. Efromovich. Dimension reduction and adaptation in conditional density estimation. Journal of the American Statistical Association, 105(490):761–774, 2010.
[14] J. J. Egozcue, J. Díaz-Barrero, and V. Pawlowsky-Glahn. Hilbert space of probability density functions based on Aitchison geometry. Acta Mathematica Sinica, English Series, 22:1175–1182, 2006.
[15] P. Frazier, W. Powell, and S. Dayanik. The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing, 21(4):599–613, 2009.
[16] P. I. Frazier. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
[17] P. Hall, J. Racine, and Q. Li. Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association, 99(468):1015–1026, 2004.
[18] J. M. Hernández-Lobato, M. W. Hoffman, and Z. Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. Advances in Neural Information Processing Systems, 27:918–926, 2014.
[19] H. Jalali, I. Nieuwenhuyse, and V. Picheny. Comparison of Kriging-based algorithms for simulation optimization with heterogeneous noise. European Journal of Operational Research, 2017.
[20] J. Janusevskis and R. Le Riche. Simultaneous kriging-based estimation and optimization of mean response.
Journal of GlobalOptimization , 55(2):313–336, 2013.[21] D. Jones, M. Schonlau, and W. Welch. Efficient Global Optimization of expensive black-box functions.
Journal of GlobalOptimization , 13:455–492, 1998.[22] K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard. Most likely heteroscedastic gaussian process regression. In
Proceedingsof the 24th International Conference on Machine Learning , ICML ’07, page 393–400, 2007.[23] R. K¨unze and I. Lunati. A matlab toolbox to simulate flow through porous media. Technical report, University of Lausanne,Switzerland, 2011.[24] P. J. Lenk. The logistic normal distribution for Bayesian, nonparametric, predictive densities.
Journal of the AmericanStatistical Association , 83:509–516, 1988.[25] P. J. Lenk. Towards a practicable Bayesian nonparametric density estimator.
Biometrika , 78:531–543, 1991.[26] T. Leonard. Density estimation, stochastic processes and prior information.
Journal of the Royal Statistical Society, SeriesB: Methodological , 40:113–132, 1978.[27] S. Mandt, M. D. Hoffman, and D. M. Blei. Stochastic gradient descent as approximate bayesian inference.
The Journal ofMachine Learning Research , 18(1):4873–4907, 2017.[28] G. Mariethoz, P. Renard, and J. Straubhaar. The direct sampling method to perform multiple-point geostatistical simulations.
Water Resources Research , 46(11), 2010.[29] J. Moˇckus, V. Tiesis, and A. ˇZilinskas. The application of Bayesian methods for seeking the extremum.
Towards globaloptimization , 2(117-129):2, 1978.[30] V. Moutoussamy, S. Nanty, and B. Pauwels. Emulators for stochastic simulation codes.
ESAIM: Proceedings and Surveys , 48:116–155, 2015.[31] I. Murray, D. MacKay, and R. P. Adams. The gaussian process density sampler. In
Advances in Neural Information ProcessingSystems , volume 21, pages 9–16, 2009.[32] V. Picheny, D. Ginsbourger, Y. Richet, and G. Caplin. Quantile-based optimization of noisy computer experiments withtunable precision.
Technometrics , 55(1):2–13, 2013.[33] V. Picheny, T. Wagner, and D. Ginsbourger. A benchmark of kriging-based infill criteria for noisy optimization.
Structuraland Multidisciplinary Optimization , 48(3):607–626, 2013.[34] V. Picheny, D. Ginsbourger, and O. Roustant.
DiceOptim: Kriging-Based Optimization for Computer Experiments , 2020.URL https://CRAN.R-project.org/package=DiceOptim . R package version 2.0.1.[35] G. Pirot, J. Straubhaar, and P. Renard. A pseudo genetic model of coarse braided-river deposits.
Water Resources Research ,51(12):9595–9611, 2015.[36] A. Pr´ekopa.
Stochastic programming , volume 324. Springer Science & Business Media, 2013.[37] C. Rasmussen and C. Williams.
Gaussian Processes for Machine Learning . Adaptive Computation and Machine Learning.MIT Press, Cambridge, MA, USA, 2006.[38] J. Risk and M. Ludkovski. Sequential design and spatial modeling for portfolio tail risk measurement.
SIAM Journal onFinancial Mathematics , 9(4):1137–1174, 2018.[39] H. Robbins and S. Monro. A stochastic approximation method.
The annals of mathematical statistics , pages 400–407, 1951.[40] R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk.
Journal of risk , 2:21–42, 2000.[41] M. Rostek. Quantile maximization in decision theory.
The Review of Economic Studies , 77(1):339–371, 2010.[42] A. Ruszczy´nski and A. Shapiro. Stochastic programming models, 2003.[43] N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian Process optimization in the bandit setting: No regret andexperimental design. In
Proceedings of the 27th International Conference on Machine Learning , ICML’10, page 1015–1022,2010.[44] R. Talsk´a, A. Menafoglio, J. Machalov´a, K. Hron, and E. Fiˇserov´a. Compositional regression with functional response.
Computational Statistics & Data Analysis , 123:66–85, 2018.[45] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples.
Biometrika , 25(3/4):285–294, 1933.[46] S. T. Tokdar. Towards a faster implementation of density estimation with Logistic Gaussian Process priors.
Journal ofComputational and Graphical Statistics , 16(3):633–655, 2007.[47] S. T. Tokdar and J. K. Ghosh. Posterior consistency of logistic Gaussian process priors in density estimation.
Journal ofStatistical Planning and Inference , 137(1):34–42, 2007.[48] S. T. Tokdar, Y. M. Zhu, and J. K. Ghosh. Bayesian density regression with logistic gaussian process and subspace projection.
Bayesian Analysis , 5(2):319–344, 2010. [49] L. Torossian, V. Picheny, R. Faivre, and A. Garivier. A review on quantile regression for stochastic computer experiments. Reliability Engineering & System Safety , 2020.[50] X. Zhu and B. Sudret. Replication-based emulation of the response distribution of stochastic simulators using generalizedlambda distributions.