Multiobjective optimization using Gaussian process emulators via stepwise uncertainty reduction
Victor Picheny
INRA, 31326 Castanet Tolosan, France
October 3, 2013
Abstract
Optimization of expensive computer models with the help of Gaussian process emulators is now commonplace. However, when several (competing) objectives are considered, choosing an appropriate sampling strategy remains an open question. We present here a new algorithm based on stepwise uncertainty reduction principles to address this issue. Optimization is seen as a sequential reduction of the volume of the excursion sets below the current best solutions, and our sampling strategy chooses the points that give the highest expected reduction. Closed-form formulae are provided to compute the sampling criterion, avoiding the use of cumbersome simulations. We test our method on numerical examples, showing that it provides an efficient trade-off between exploration and intensification.

Keywords: Kriging; EGO; Pareto front; Excursion sets
We consider the problem of simultaneous minimization of several objective functions over a design region X ⊂ R^d:

min ( y^(1)(x), …, y^(q)(x) ),

where the y^(i) : X → R are outputs of a complex computer code. The objectives being typically conflicting, there exists no unique minimizer, and the goal is to identify the set of optimal solutions, called the Pareto front (Collette and Siarry, 2003). Defining that a point dominates another if all of its objectives are better, the Pareto-optimal set X* is the subset of the non-dominated points in X:

∀ x* ∈ X*, ∀ x ∈ X, ∃ k ∈ {1, …, q} such that y^(k)(x*) ≤ y^(k)(x).

When the computational cost of a single model evaluation is high, a well-established practice consists of using Gaussian process (GP) emulators to approximate the model outputs and guide the optimization process. Following the seminal article of Jones et al. (1998) and its Efficient Global Optimization (EGO) algorithm for single-objective optimization, several strategies have been proposed in the past few years to address the multi-objective problem (Knowles, 2006; Keane, 2006; Ponweiser et al., 2008; Wagner et al., 2010). They consist in evaluating the computer model sequentially at the set of inputs that maximizes a so-called infill criterion, derived from the GP emulator, that expresses a trade-off between exploration of unsampled areas and sampling intensification in promising regions. While the single-objective case has been extensively discussed (Jones, 2001; Wang and Shan, 2007), finding efficient and statistically consistent infill criteria for the multi-objective case remains an open question.

As an alternative to the EGO paradigm, stepwise uncertainty reduction (SUR) strategies aim at reducing, by sequential sampling, an uncertainty measure about a quantity of interest. In a single-objective optimization context, Villemonteix et al.
(2009) defined the Shannon entropy of the maximizer (computed using the GP model) as an uncertainty measure: a smaller entropy implies that the maximizer is well-identified. They show that their approach outperforms the EGO strategy on a series of problems. Another example, in a reliability assessment context, can be found in Bect et al. (2012). In general, SUR approaches allow policies to be defined rigorously with respect to a given objective, resulting in very good performance. However, they are often challenging to use in practice, as they rely on very expensive GP simulations.

We propose here a novel SUR strategy to address the multi-objective problem. It is based on a measure of uncertainty of the current identification of the Pareto front X*, hence avoiding some of the drawbacks of the existing criteria (hierarchy between objectives, difficult-to-tune parameters, etc.). Following Chevalier et al. (2012), explicit formulae for the expected uncertainty reduction are provided, avoiding the need to rely on simulations.

The paper is organized as follows: Section 2 presents the GP model and the basics of GP-based optimization. Then, we describe our SUR strategy for a single objective in Section 3 and for several objectives in Section 4. We provide some numerical experiments in Section 5 and compare our method to the state of the art. Finally, advantages and drawbacks of the method are discussed in Section 6.

2 Some concepts of Gaussian-process-based optimization
We consider first the emulation of a single computer response y. The response is modelled as

Y(·) = f(·)^T β + Z(·),   (1)

where f(·)^T = (f_1(·), …, f_p(·)) is a vector of trend functions, β is a vector of (unknown) coefficients, and Z(·) is a Gaussian process (GP) with zero mean and known covariance kernel k (Cressie, 1993; Rasmussen and Williams, 2006).

Let us call A_n the event {Y(x_1) = y_1, …, Y(x_n) = y_n}; conditionally on A_n, the mean and covariance of Y are given by:

m_n(x) = E[Y(x) | A_n] = f(x)^T β̂ + k_n(x)^T K_n^{-1} (y_n − F_n β̂),

c_n(x, x′) = cov[Y(x), Y(x′) | A_n]
           = k(x, x′) − k_n(x)^T K_n^{-1} k_n(x′)
             + (f(x)^T − k_n(x)^T K_n^{-1} F_n) (F_n^T K_n^{-1} F_n)^{-1} (f(x′)^T − k_n(x′)^T K_n^{-1} F_n)^T,

where

• y_n = (y_1, …, y_n)^T are the observations,
• K_n = (k(x_i, x_j))_{1 ≤ i,j ≤ n} is the observation covariance matrix,
• k_n(x)^T = (k(x, x_1), …, k(x, x_n)),
• F_n = (f(x_1)^T, …, f(x_n)^T)^T, and
• β̂ = (F_n^T K_n^{-1} F_n)^{-1} F_n^T K_n^{-1} y_n is the best linear unbiased estimate of β.

In addition, the prediction variance is defined as s_n^2(x) = c_n(x, x).

The covariance kernel depends on parameters that are usually unknown and must be estimated from an initial set of responses. Typically, maximum likelihood estimates are obtained by numerical optimization and used at face value, the estimates being updated when new observations are added to the model. The reader can refer to Stein (1999) (chapter 6), Rasmussen and Williams (2006) (chapter 5) or Roustant et al. (2012) for detailed calculations and implementation issues.

When several functions y^(1), …
, y^(q) are predicted simultaneously, it is possible to take their dependency into account (Kennedy and O'Hagan, 2001; Craig et al., 2001). However, in this work we consider all the processes Y^(i) as independent, hence modelled as above, which is in line with current practice.

The EGO strategy, as well as most of its modifications, is based on the following scheme. An initial set of observations is generated, from which the GP model is constructed and validated. Then, new observations are obtained sequentially, at the point in the design space that maximizes the infill criterion, and the model is updated every time a new observation is added to the training set. The two latter steps are repeated until a stopping criterion is met.

The expected improvement criterion (EI) used in EGO relies on the idea that progress is achieved by performing an evaluation at step n if the (n+1)-th design has a lower objective function value than any of the n previous designs. Hence, the improvement is defined as the difference between the current observed minimum and the new function value if it is positive, or zero otherwise, and EI is its conditional expectation under the GP model:

EI(x) = E[ max(0, y_n^min − Y(x)) | A_n ],

where y_n^min denotes the current minimum of y found at step n: y_n^min = min(y_1, …, y_n).

EGO is the one-step optimal strategy (in expectation) regarding improvement: at step n, the new measurement is chosen as

x_{n+1} = arg max_{x ∈ X} EI(x),

which is in practice done by running an optimization algorithm. It has been shown in Jones (2001) that EGO provides, among numerous alternatives, an efficient solution for global optimization.

Several adaptations of EGO to the multi-objective framework have been proposed; a review can be found in Ponweiser et al. (2008).
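The conditional expectation above has the well-known closed form EI(x) = (y_n^min − m_n(x)) Φ(u) + s_n(x) φ(u), with u = (y_n^min − m_n(x))/s_n(x). A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(m, s, y_min):
    """Closed-form EI(x) = E[max(0, y_min - Y(x))] for Y(x) ~ N(m, s^2).
    Vectorized over arrays of posterior means m and standard deviations s."""
    m = np.asarray(m, dtype=float)
    s = np.asarray(s, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        u = (y_min - m) / s
        ei = (y_min - m) * norm.cdf(u) + s * norm.pdf(u)
    # Deterministic prediction (s = 0): the improvement is max(0, y_min - m).
    return np.where(s > 0, ei, np.maximum(y_min - m, 0.0))
```

Maximizing this quantity over X gives the next evaluation point in the EGO scheme.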
The main difficulty is that the concept of improvement cannot be transferred directly, as the current best point is here a set, and the gain is measured on several objectives simultaneously. In Knowles (2006), the objectives are aggregated into a single function using random weights, which allows using the standard EGO. Keane (2006) derived an EI with respect to multiple objectives. Ponweiser et al. (2008) proposed a hypervolume-based infill criterion, where the improvement is measured in terms of hypervolume increase.
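To fix ideas, the current Pareto front of a finite set of evaluated points can be extracted by direct pairwise comparison; a simple O(n²) sketch under the minimization convention used here (names are ours):

```python
import numpy as np

def nondominated_mask(Y):
    """Boolean mask of the non-dominated rows of Y (minimization): row i is
    dominated if some other row is <= in every objective and < in at least one."""
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(Y[j] <= Y[i]) and np.any(Y[j] < Y[i]):
                mask[i] = False  # row j dominates row i
                break
    return mask
```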
3 A SUR strategy for single-objective optimization

We consider first the case of a problem with a single objective y to minimize. In this section, we propose a new strategy in a form similar to EGO that uses an alternative infill criterion based on stepwise uncertainty reduction principles. The adaptation of this criterion to the multi-objective case is presented in Section 4.

3.1 Definition of an uncertainty measure for optimization

The EGO strategy focuses on progress in terms of objective function value. It does not account (or only indirectly) for the knowledge improvement that a new measurement would provide to the GP model, nor for the discrepancy between the location of the current best design found and the actual minimizer (which is actually most users' objective).

Alternative sampling criteria have been proposed to account for these two aspects. In Villemonteix et al. (2009), the IAGO strategy chooses the point that minimizes the posterior Shannon entropy of the minimizer: the interest of performing a new observation is measured in terms of gain of information about the location of the minimizer. Unfortunately, it relies on expensive GP simulations, which makes its use challenging in practice. Gramacy and Lee (2011) proposed an integrated expected conditional improvement to measure the global informational gain of an observation. In the noisy case, Scott et al. (2011) proposed a somewhat similar knowledge gradient policy that also measures global information gain. However, as both criteria rely on notions of improvement, they are difficult to adapt to the multiobjective case. The criterion we propose below addresses this issue.

Consider that n measurements have been performed. As a measure of performance regarding the optimization problem, we consider the expected volume of the excursion set below the current minimum y_n^min:

ev_n = E_X[ P(Y(x) ≤ y_n^min | A_n) ].
(2)

Similarly to the Shannon entropy measure in IAGO, a large volume indicates that the optimum is not yet precisely located (see Figure 1); on the contrary, a small volume indicates that very little can be gained by pursuing the optimization process. Following the stepwise uncertainty reduction paradigm, this volume is an uncertainty measure related to our objective (finding the minimizer of y); minimizing the uncertainty amounts to solving the optimization problem.

The probability p_n(x, y_n^min) := P(Y(x) ≤ y_n^min | A_n), which is often referred to as the probability of improvement (Jones, 2001), can be expressed in closed form, and Eq. (2) writes:

ev_n = ∫_X p_n(x, y_n^min) dx = ∫_X Φ( (y_n^min − m_n(x)) / s_n(x) ) dx,   (3)

where Φ(·) is the cumulative distribution function (CDF) of the standard Gaussian distribution. Hypothesizing that a measurement y_{n+1} is performed at a point x_{n+1}, its benefit can be measured by the reduction of the expected volume of the excursion set, Δ = ev_n − ev_{n+1}, with:

ev_{n+1} = ∫_X p_{n+1}( x, min(y_n^min, y_{n+1}) ) dx
         = ∫_X Φ( (min(y_n^min, y_{n+1}) − m_{n+1}(x)) / s_{n+1}(x) ) dx.

Of course, ev_{n+1} cannot be known exactly without evaluating y_{n+1}. However, we show in the following that its expectation can be calculated in closed form, leading to a suitable infill criterion. To do so, we first formulate a series of propositions in the next subsection.

An interesting property of the GP model is that, when a new observation y_{n+1} = y(x_{n+1}) is added to the training set, its new predictive distribution can be expressed simply as a function of the old one (Emery, 2009):

m_{n+1}(x) = m_n(x) + [ c_n(x, x_{n+1}) / s_n^2(x_{n+1}) ] ( y_{n+1} − m_n(x_{n+1}) ),
s_{n+1}^2(x) = s_n^2(x) − c_n^2(x, x_{n+1}) / s_n^2(x_{n+1}).   (4)

Note that only m_{n+1}(x) depends on the value of the new observation y_{n+1}.
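The update formulae (4) are exact identities of Gaussian conditioning, and can be verified numerically by comparing them with brute-force re-conditioning on all observations. A small sketch (zero-mean simple kriging, i.e. the trend term of Eq. (1) is omitted for brevity; all names and values are ours):

```python
import numpy as np

# Prior covariance over a small set of sites, from a squared-exponential kernel.
x = np.array([0.0, 0.25, 0.4, 0.6, 0.8, 1.0])
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2 ** 2)

def condition(K, obs_idx, y_obs):
    """Gaussian conditioning: mean and covariance at all sites given exact
    observations at the indices obs_idx (zero prior mean)."""
    A = K[np.ix_(obs_idx, obs_idx)]
    B = K[:, obs_idx]
    W = B @ np.linalg.inv(A)
    return W @ y_obs, K - W @ B.T

obs, y = [0, 2, 5], np.array([0.3, -0.1, 0.2])
m_n, C_n = condition(K, obs, y)

# "New" observation at site 3 with value y_new, updated via Eq. (4)...
i_new, y_new = 3, 0.15
m_up = m_n + C_n[:, i_new] / C_n[i_new, i_new] * (y_new - m_n[i_new])
s2_up = np.diag(C_n) - C_n[:, i_new] ** 2 / C_n[i_new, i_new]

# ...and by brute-force re-conditioning on all four observations.
m_ref, C_ref = condition(K, obs + [i_new], np.append(y, y_new))
```

Both routes give the same posterior mean and variance, which is what makes the one-step-ahead reasoning of the next subsection tractable.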
Now, conditionally on the first n observations, Y_{n+1} is a random variable (as the new observation has not yet been performed) with its moments given by the GP model:

Y_{n+1} ∼ N( m_n(x_{n+1}), s_n^2(x_{n+1}) ).

We can then define the future expectation M_{n+1}(x) (or any quantity depending on it) as a random variable conditionally on A_n and on the fact that the next observation will be at x_{n+1}. This applies to any quantity depending on Y_{n+1} or M_{n+1}(x), for instance the probability of being below a threshold a ∈ R:

P_{n+1}(x, a) = Φ( (a − M_{n+1}(x)) / s_{n+1}(x) ).

Proposition 3.1.
Without any restriction on the value of Y_{n+1}, the expectation of the future probability of being below the threshold is equal to the current probability:

P( Y(x) ≤ a | A_n, Y(x_{n+1}) = Y_{n+1} ) = E[ P_{n+1}(x, a) | A_n ] = p_n(x, a).

Proposition 3.2.
Conditioning further on Y_{n+1} ≤ b, the probability expectation writes in simple form using the bivariate Gaussian CDF:

q(x, b, a) := P[ Y(x) ≤ a | A_n, Y(x_{n+1}) = Y_{n+1}, Y_{n+1} ≤ b ] × P[ Y_{n+1} ≤ b | A_n ]
            = E[ P_{n+1}(x, a) × 1_{Y_{n+1} ≤ b} | A_n ]
            = Φ_ρ( b̄, ã ),   (5)

where Φ_ρ is the bivariate Gaussian CDF with zero mean and covariance matrix [[1, ρ], [ρ, 1]],

b̄ = (b − m_n(x_{n+1})) / s_n(x_{n+1}),   ã = (a − m_n(x)) / s_n(x),   and   ρ = c_n(x, x_{n+1}) / ( s_n(x_{n+1}) s_n(x) ).

Corollary 3.3.
Similarly, conditioning on Y_{n+1} ≥ b leads to:

r(x, b, a) := P[ Y(x) ≤ a | A_n, Y(x_{n+1}) = Y_{n+1}, Y_{n+1} ≥ b ] × P[ Y_{n+1} ≥ b | A_n ] = Φ_{−ρ}( −b̄, ã ).   (6)

The final proposition resembles Proposition 3.2, but the fixed threshold a is here replaced by Y_{n+1}:

Proposition 3.4.
The expectation of the probability that Y(x) is smaller than Y_{n+1}, conditionally on Y_{n+1} ≤ b, is given by:

h(x, b) := P[ Y(x) ≤ Y_{n+1} | A_n, Y(x_{n+1}) = Y_{n+1}, Y_{n+1} ≤ b ] × P[ Y_{n+1} ≤ b | A_n ]
         = E[ P_{n+1}(x, Y_{n+1}) × 1_{Y_{n+1} ≤ b} | A_n ]
         = Φ_ν( b̄, η ),   (7)

with:

η = ( m_n(x_{n+1}) − m_n(x) ) / √( s_n^2(x) + s_n^2(x_{n+1}) − 2 c_n(x, x_{n+1}) )

and

ν = ( c_n(x, x_{n+1}) − s_n^2(x_{n+1}) ) / ( s_n(x_{n+1}) √( s_n^2(x) + s_n^2(x_{n+1}) − 2 c_n(x, x_{n+1}) ) ).

All the proofs are reported in Appendix A.
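These formulae are easy to check numerically. For instance, q(x, b, a) in Proposition 3.2 is simply the joint probability that Y_{n+1} ≤ b and Y(x) ≤ a, so the closed form can be compared against a Monte-Carlo estimate over the joint Gaussian of (Y_{n+1}, Y(x)). A sketch with illustrative values of our own (not from the paper):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Posterior quantities at x and x_{n+1} (illustrative values).
m_x, s_x = 0.2, 1.0          # m_n(x), s_n(x)
m_new, s_new = -0.1, 0.8     # m_n(x_{n+1}), s_n(x_{n+1})
c = 0.4                      # c_n(x, x_{n+1})
a, b = 0.5, 0.0

# Proposition 3.2: q(x, b, a) = Phi_rho(b_bar, a_tilde).
rho = c / (s_new * s_x)
b_bar = (b - m_new) / s_new
a_tilde = (a - m_x) / s_x
q = multivariate_normal(mean=[0.0, 0.0],
                        cov=[[1.0, rho], [rho, 1.0]]).cdf([b_bar, a_tilde])

# Monte-Carlo check on the joint distribution of (Y_{n+1}, Y(x)).
rng = np.random.default_rng(0)
samples = rng.multivariate_normal([m_new, m_x],
                                  [[s_new ** 2, c], [c, s_x ** 2]],
                                  size=200_000)
q_mc = np.mean((samples[:, 0] <= b) & (samples[:, 1] <= a))
```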
Coming back to the SUR criterion, at step n the future volume of the excursion set EV_{n+1} is a random variable, and its expectation is:

EEV(x_{n+1}) := E[ EV_{n+1} | A_n, Y(x_{n+1}) = Y_{n+1} ]
             = ∫_X E[ Φ( (min(y_n^min, Y_{n+1}) − M_{n+1}(x)) / s_{n+1}(x) ) | A_n, Y(x_{n+1}) = Y_{n+1} ] dx.

Let ϕ(y_{n+1}) be the probability density function (PDF) of Y_{n+1} conditionally on A_n. We have:

EEV(x_{n+1}) = ∫_X ∫_R Φ( (min(y_n^min, y_{n+1}) − m_{n+1}(x)) / s_{n+1}(x) ) ϕ(y_{n+1}) dy_{n+1} dx
             = ∫_X [ ∫_{−∞}^{y_n^min} Φ( (y_{n+1} − m_{n+1}(x)) / s_{n+1}(x) ) ϕ(y_{n+1}) dy_{n+1}
                     + ∫_{y_n^min}^{+∞} Φ( (y_n^min − m_{n+1}(x)) / s_{n+1}(x) ) ϕ(y_{n+1}) dy_{n+1} ] dx
             = ∫_X [ h(x, y_n^min) + r(x, y_n^min, y_n^min) ] dx.

The first term of the integrand is given by Eq. (7) in Proposition 3.4, with b = y_n^min, and the second term is given by Eq. (6) in Corollary 3.3, with a = b = y_n^min, hence:

EEV(x_{n+1}) = ∫_X [ Φ_ν( ȳ_n^min, η ) + Φ_{−ρ}( −ȳ_n^min, ỹ_n^min ) ] dx,   (8)

with ȳ_n^min = (y_n^min − m_n(x_{n+1})) / s_n(x_{n+1}) and ỹ_n^min = (y_n^min − m_n(x)) / s_n(x).

The SUR optimization strategy consists in adding the experiment that minimizes the expected volume of the excursion set (or, equivalently, maximizes the expected reduction), that is, the one-step optimal policy in terms of reduction of the uncertainty on the objective function minimizer:

x_{n+1} = arg min_{x+ ∈ X} EEV(x+).   (9)

Remark.
In general, the probability of improvement p_n(x, y_n^min) is high where the prediction mean m_n(·) is low and/or the prediction variance s_n^2(·) is high. Simply choosing points that maximize p_n(x, y_n^min) is known to be inefficient (Jones, 2001), as it does not consider the amplitude of the gain in the objective function. Here, EEV(x+) strongly depends on the potential gain amplitude. Indeed, minimizing the expected volume relies on two mechanisms: reducing the local uncertainty and lowering the current minimum value (y_n^min). The first is achieved by adding measurements in unsampled regions (high s_n(·)), the second in regions where this potential reduction is high. Hence, the EEV criterion can be seen as a mixed measure of uncertainty on the current minimum location and of potential gain in the objective function.
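Putting Eqs. (6) to (8) together, the criterion can be evaluated by quadrature over a set of integration points; a minimal sketch (names are ours; uniform weights over a unit-volume domain are assumed):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def _phi2(h, k, rho):
    # Standard bivariate normal CDF with correlation rho.
    return multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, rho], [rho, 1.0]]).cdf([h, k])

def eev(y_min, m_new, s_new, m_x, s_x, c):
    """Eq. (8): expected volume of the excursion set after observing at
    x_{n+1}.  m_new, s_new: posterior mean/std at the candidate; m_x, s_x, c:
    arrays of posterior means/stds and covariances c_n(x_l, x_{n+1}) at the
    integration points x_l."""
    b_bar = (y_min - m_new) / s_new
    total = 0.0
    for mx, sx, cx in zip(m_x, s_x, c):
        a_til = (y_min - mx) / sx
        d = np.sqrt(sx ** 2 + s_new ** 2 - 2.0 * cx)   # sd of Y(x) - Y_{n+1}
        eta = (m_new - mx) / d
        nu = (cx - s_new ** 2) / (s_new * d)
        rho = cx / (s_new * sx)
        total += _phi2(b_bar, eta, nu) + _phi2(-b_bar, a_til, -rho)
    return total / len(m_x)
```

A useful sanity check: since min(y_n^min, Y_{n+1}) ≤ y_n^min and Proposition 3.1 preserves expectations, EEV can never exceed the current volume ev_n.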
Figure 1 illustrates the concept of reduction of the volume of excursion on a toy example. A GP model is built on a six-point training set, from which the probability of improvement p_6(x, y_6^min) is computed for every point in X = [0, 1]. The minimizer x* = 0.47 is not precisely identified, as the model can only predict that x* is likely to lie in a broad interval extending to about 0.6. Then, we consider two candidate points (x+ = 0.05 and x+ = 0.5) and compute, for each, the expected new probability (integrand in Eq. (8)). We see that the probability is likely to remain mostly unchanged by adding the measurement at x+ = 0.05, while it is considerably reduced at x+ = 0.5. In terms of volume of the excursion set, we have EEV(0.05) ≈ ev_6 (no reduction), while EEV(0.5) is substantially smaller; the EEV criterion clearly indicates that x+ = 0.5 should be preferred.

4 Extension to the multi-objective case

Let y(x) = (y^(1)(x), …, y^(q)(x)) be the vector of objective functions to minimize. A point x dominates another point x′ if y^(k)(x) ≤ y^(k)(x′) for all k in {1, …, q}, which we denote by x′ ≺ x in the following. At step n, X_n = {x_1, …, x_n} is the current experimental set and Y_n = {y(x_1), …, y(x_n)} the corresponding set of measures. The non-dominated subset X_n* of X_n constitutes the current Pareto front (of size m ≤ n). In the objective
Figure 1: Illustration of the effect of a new observation on the EV criterion. Left: actual objective function (dotted line), GP model (depicted by its mean as a black plain line and 95% confidence interval in grey) based on six observations (black circles). The horizontal line shows the current minimum; the vertical bars are placed at the two candidate locations. Right: probability of improvement given by the current model, and expected updated probability for each candidate. Adding a point at x+ = 0.5 (plain line) is expected to reduce the probability markedly, while adding a point at x+ = 0.05 (dotted line) has little expected effect.

space, the corresponding subset Y_n* separates the regions dominated and not dominated by the experimental set. Then, we decompose the objective space using a tesselation {Ω_i}_{i ∈ {1, …, I}} of size I = (m+1)^q (∪_i Ω_i = R^q and Ω_i ∩ Ω_j = ∅ for i ≠ j), each cell being a hyperrectangle defined as:

Ω_i = { y ∈ R^q | y_{i−}^(k) ≤ y^(k) < y_{i+}^(k), k ∈ {1, …, q} }.

Each couple (y_{i−}^(k), y_{i+}^(k)) consists of two consecutive values of the vector [ −∞, y^(k)(x_1*), …, y^(k)(x_m*), +∞ ]. An illustration is given in Figure 2.

A cell Ω_i dominates another cell Ω_j (Ω_j ≺ Ω_i) if any point in Ω_i dominates any point in Ω_j, and it partially dominates Ω_j if there exists a point in Ω_j that is dominated by any point in Ω_i. Otherwise, we say that Ω_j is not dominated by Ω_i (Ω_j ⊀ Ω_i).

We denote by I* the indices of all the non-dominated cells at step n, that is, the cells that are not dominated by any point of X_n*. In two dimensions, the non-dominated cells are located in the bottom-left half of the plane (Figure 2).

Now, let us assume that GP models are fitted to each objective y^(k). At step n, the probability that Y(x) belongs to the cell Ω_i is:

p_n^i(x) = P[ Y(x) ∈ Ω_i | A_n ]
         = ∏_{k=1}^q [ Φ( (y_{i+}^(k) − m_n^(k)(x)) / s_n^(k)(x) ) − Φ( (y_{i−}^(k) − m_n^(k)(x)) / s_n^(k)(x) ) ]
         := ∏_{k=1}^q p_n^{i(k)}(x),

by pairwise independence of Y^(1), …, Y^(q). The probability that x is not dominated by any point of X_n is then the probability that Y(x) belongs to one of the non-dominated parts of the objective space. As the Ω_i's are disjoint, it is equal to:

P( x ⊀ X_n | A_n ) = Σ_{i ∈ I*} p_n^i(x).   (10)

Figure 2: Example of a Pareto front generated by four points (circles), and the associated tesselation.
The grey area corresponds to the dominated cells.

Figure 3: Example of a Pareto front modification due to a new measurement. Two points are removed from the Pareto front while the new point is added. The hatched area represents the additional dominated region.

Finally, the volume of the excursion sets behind the Pareto front is equal to the integral of this probability over X:

ev_n = ∫_X P( x ⊀ X_n | A_n ) dx = ∫_X Σ_{i ∈ I*} p_n^i(x) dx = Σ_{i ∈ I*} ∫_X p_n^i(x) dx.   (11)

When ev_n is high, a large proportion of the design space is likely to be better than the current Pareto set; inversely, when X_n* approaches the actual Pareto set X*, the volume tends to zero. Hence, it naturally defines an uncertainty indicator for a SUR strategy.

Now, let us consider that a measurement y_{n+1} is performed at a point x_{n+1}. Compared to step n, the volume ev is modified in two ways. First, the new measurement modifies the quantities m_n^(k)(x) and s_n^(k)(x) (k ∈ {1, …, q}), hence the probabilities p_n^i(x). Second, if the new measurement is not dominated by the current Pareto set, it modifies the Pareto-optimal front, as the new value y(x_{n+1}) is added to Y* and the values of Y* dominated by y(x_{n+1}) (if they exist) are removed. An example of such an update is given in Figure 3.

Focusing on the probability that a point remains non-dominated (Eq. (10)), accounting for the modifications of the models is relatively easy (that is, computing the quantity p_{n+1}^i(·)), but accounting for modifications of the Pareto front is complex, as both the number of elements and their values might change. To address this issue, we consider that the updated probability P( x ⊀ X_{n+1} | A_{n+1} ) can be computed using the same sum as for P( x ⊀ X_n | A_n ) (Eq.
(10)) by modifying its elements p_n^i(x):

P( x ⊀ X_{n+1} | A_{n+1} ) = Σ_{i ∈ I*} p̃_{n+1}^i(x),

with

p̃_{n+1}^i(x) = P( x ⊀ X_{n+1} ∩ Y(x) ∈ Ω_i | A_n, y(x_{n+1}) = y_{n+1} ),

and the Ω_i's defined using Y_n* (not Y_{n+1}*). Seen from step n, the P̃_{n+1}^j(x) are random, as

Y(x_{n+1})^(k) ∼ N( m_n^(k)(x_{n+1}), s_n^{(k)2}(x_{n+1}) ),   ∀ k ∈ {1, …, q}.

The expectation of the new volume is then:

EEV(x_{n+1}) = E[ ∫_X Σ_{j ∈ I*} P̃_{n+1}^j(x) dx ] = Σ_{j ∈ I*} ∫_X E[ P̃_{n+1}^j(x) ] dx.

This expression can be decomposed by conditioning on the range values of the new observation (using the fact that the Ω_i's are disjoint):

E[ P̃_{n+1}^j(x) ] = Σ_{i ∈ I} P_{n+1}[ x ⊀ X_{n+1} ∩ Y(x) ∈ Ω_j | Y_{n+1} ∈ Ω_i ] × P_{n+1}[ Y_{n+1} ∈ Ω_i ] := Σ_{i ∈ I} p_{ij}(x),

where P_{n+1} is the probability conditional on ( A_n, Y(x_{n+1}) = Y_{n+1} ).

We first note from Proposition 3.1 that P_{n+1}[ Y_{n+1} ∈ Ω_i ] = p_n^i(x_{n+1}). Then, leaving aside non-domination, the probability that Y(x) belongs to Ω_j knowing that Y_{n+1} belongs to Ω_i is given by:

P_{n+1}[ Y(x) ∈ Ω_j | Y_{n+1} ∈ Ω_i ] × p_n^i(x_{n+1}) = ∏_{k=1}^q b_{ij}^(k)(x),

with:

b_{ij}^(k)(x) = P_{n+1}[ y_{j−}^(k) ≤ Y^(k)(x) < y_{j+}^(k) | y_{i−}^(k) ≤ Y_{n+1}^(k) < y_{i+}^(k) ] × p_n^{i(k)}(x_{n+1}),
p_n^{i(k)}(x_{n+1}) = P_n[ y_{i−}^(k) ≤ Y_{n+1}^(k) < y_{i+}^(k) ],

by pairwise independence of Y^(1), …, Y^(q).
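The cell decomposition of Eq. (10) is straightforward to implement in the bi-objective case. The sketch below (all names are ours) builds the tesselation from the current front, sums the cell probabilities over the non-dominated cells, and can be validated against a Monte-Carlo estimate:

```python
import numpy as np
from scipy.stats import norm

def prob_nondominated(m, s, front):
    """P(x is not dominated by X_n | A_n) for q = 2 independent GP objectives:
    sum of p_n^i(x) over the non-dominated cells of the tesselation (Eq. 10).
    m, s: length-2 posterior means/stds at x; front: (m_pts, 2) array of the
    current non-dominated observed values."""
    front = np.asarray(front, dtype=float)
    edges = [np.concatenate(([-np.inf], np.sort(front[:, k]), [np.inf]))
             for k in range(2)]
    prob = 0.0
    for i in range(len(edges[0]) - 1):
        for j in range(len(edges[1]) - 1):
            lo = (edges[0][i], edges[1][j])
            # A cell is dominated iff some front point is <= its lower corner.
            if any(p[0] <= lo[0] and p[1] <= lo[1] for p in front):
                continue
            cell = 1.0
            for k, (a, b) in enumerate([(edges[0][i], edges[0][i + 1]),
                                        (edges[1][j], edges[1][j + 1])]):
                cell *= norm.cdf((b - m[k]) / s[k]) - norm.cdf((a - m[k]) / s[k])
            prob += cell
    return prob
```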
We show in Appendix B that b_{ij}^(k)(x) can be expressed in closed form as:

b_{ij}^(k)(x) = Φ_ρ^(k)( ȳ_{i+}^(k), ỹ_{j+}^(k) ) − Φ_ρ^(k)( ȳ_{i+}^(k), ỹ_{j−}^(k) ) − Φ_ρ^(k)( ȳ_{i−}^(k), ỹ_{j+}^(k) ) + Φ_ρ^(k)( ȳ_{i−}^(k), ỹ_{j−}^(k) ),

with the notations introduced in Section 3.2. Now, we define:

d_{ij}^(k)(x) = P_{n+1}[ y_{j−}^(k) ≤ Y^(k)(x) < y_{j+}^(k) ∩ Y_{n+1}^(k) ≤ Y^(k)(x) | y_{i−}^(k) ≤ Y_{n+1}^(k) < y_{i+}^(k) ] × p_n^{i(k)}(x_{n+1}),

which is identical to b_{ij}^(k)(x) with the additional condition Y_{n+1}^(k) ≤ Y^(k)(x). This condition is met when the k-th component of the new observation dominates the k-th component of Y(x). We have x ≺ x_{n+1} only if the condition Y_{n+1}^(k) ≤ Y^(k)(x) is met for all components, hence with probability of occurrence ∏_{k=1}^q d_{ij}^(k)(x). Three cases arise:

• y_{i−}^(k) ≥ y_{j+}^(k): the component cannot be dominated, which implies d_{ij}^(k)(x) = 0;
• y_{i+}^(k) ≤ y_{j−}^(k): the component is always dominated, which implies d_{ij}^(k)(x) = b_{ij}^(k)(x);
• y_{i+}^(k) = y_{j+}^(k) (and y_{i−}^(k) = y_{j−}^(k)): Y^(k)(x) and Y_{n+1}^(k) share the same interval of variation, and:

d_{ij}^(k)(x) = P_{n+1}[ Y_{n+1}^(k) ≤ Y^(k)(x) < y_{i+}^(k) | y_{i−}^(k) ≤ Y_{n+1}^(k) < y_{i+}^(k) ] × p_n^{i(k)}(x_{n+1}),

which is equal (as shown in Appendix B) to:

d_{ij}^(k)(x) = Φ_ρ^(k)( ȳ_{i+}^(k), ỹ_{j+}^(k) ) − Φ_ν^(k)( ȳ_{i+}^(k), η^(k) ) + Φ_ν^(k)( ȳ_{i−}^(k), η^(k) ) − Φ_ρ^(k)( ȳ_{i−}^(k), ỹ_{j+}^(k) ).

The probability of Y(x) being non-dominated while in Ω_j (and Y_{n+1} being in Ω_i) is then:

p_{ij}(x) = ∏_{k=1}^q b_{ij}^(k)(x) − ∏_{k=1}^q d_{ij}^(k)(x).
If Ω_j ≺ Ω_i, the new observation dominates any point in Ω_j, hence d_{ij}^(k)(x) = b_{ij}^(k)(x) for all k, which gives p_{ij}(x) = 0. Inversely, if Ω_j ⊀ Ω_i, the new observation cannot dominate any point in the cell Ω_j; we have d_{ij}^(k)(x) = 0 for at least one value of k, and p_{ij}(x) is the probability that Y(x) belongs to Ω_j: p_{ij}(x) = ∏_{k=1}^q b_{ij}^(k)(x).

Finally, for a given point x ∈ X, we compute the probability that it is non-dominated at step n+1 using:

P_{n+1}( x ⊀ X_{n+1} ) = Σ_{i ∈ I} Σ_{j ∈ I*} p_{ij}(x),

and the SUR criterion is:

EEV(x_{n+1}) = Σ_{i ∈ I} Σ_{j ∈ I*} ∫_X p_{ij}(x) dx,   (12)

with:

p_{ij}(x) = 0                                            if Ω_j ≺ Ω_i,
p_{ij}(x) = ∏_{k=1}^q b_{ij}^(k)(x)                      if Ω_j ⊀ Ω_i,
p_{ij}(x) = ∏_{k=1}^q b_{ij}^(k)(x) − ∏_{k=1}^q d_{ij}^(k)(x)   otherwise.   (13)

The first sum in Eq. (12) accounts for Y_{n+1} potentially being in any cell Ω_i; the second sum accounts for Y(x) potentially being in a non-dominated cell Ω_j.

Evaluating the criterion as in Eq. (12) is a non-trivial task; besides, a relatively fast computation is needed, as it may be embedded in an optimization loop to search for the best new observation (Eq. (9)). We provide here some technical solutions to ease its computation. Some of these issues have also been experienced with SUR criteria for inversion, as reported in Chevalier et al. (2012, 2013).

Firstly, as no closed form exists for the integration over the design domain X in Eq. (12), one may rely on Monte-Carlo integration, with approximations of the form:

∫_X p_{ij}(x) dx ≈ (1/L) Σ_{l=1}^L w_l p_{ij}(x_l),

where the x_l's and w_l's are integration points and weights, respectively. One solution to alleviate the computational cost is to use a fixed set of integration points while searching for the best new observation. Then, many quantities that do not depend on x_{n+1} can be precalculated only once beforehand, outside the optimization loop, as suggested in Chevalier et al.
(2012). Secondly, the criterion relies on the bivariate normal distribution, which must also be computed numerically. Very efficient programs can be found, such as the R package pbivnorm (Kenkel, 2012), which makes this task relatively inexpensive. Thirdly, the tesselation used in the previous section has I = (m+1)^q elements, so the number of terms in Eq. (12) grows quickly with the size of the Pareto front and the number of objectives. We consider here the two-objective case, for which the
EEV criterion can be expressed in a compact and computationally efficient way. With two objectives, the Pareto set can be ordered as follows (the first and second objective functions in ascending and descending order, respectively): y_1^(1)* ≤ … ≤ y_m^(1)* and y_1^(2)* ≥ … ≥ y_m^(2)*.

The non-dominated part of the objective space can be divided into m+1 cells. Then, given a non-dominated cell Ω_j, only four cases arise for Ω_i (the cell of the new observation), as shown in Figure 4, for which the quantities p_{ij}(x) need to be computed. Hence, the criterion can be expressed as a sum over the m+1 non-dominated cells:

EEV(x_{n+1}) = Σ_{j=0}^m ∫_X α_j(x) dx,

Figure 4: Left: the m+1 non-dominated cells (in white). Right: the four cases for Ω_i given Ω_j: three are represented by the different hatched regions; the fourth corresponds to Ω_j = Ω_i.

with:

α_0(x) = [ Φ_ρ^(1)( ȳ_1^(1)*, ỹ_1^(1)* ) − Φ_ν^(1)( ȳ_1^(1)*, η^(1) ) ] × [ Φ( η^(2) ) − 1 ] + Φ( ỹ_1^(1)* ),

α_j(x) = [ Φ_ρ^(1)( ȳ_{j+1}^(1)*, ỹ_{j+1}^(1)* ) − Φ_ν^(1)( ȳ_{j+1}^(1)*, η^(1) ) + Φ_ν^(1)( ȳ_j^(1)*, η^(1) ) − Φ_ρ^(1)( ȳ_j^(1)*, ỹ_j^(1)* ) ]
         × [ Φ_ν^(2)( ȳ_j^(2)*, η^(2) ) − Φ_ρ^(2)( ȳ_j^(2)*, ỹ_j^(2)* ) ]
         + [ Φ( ỹ_{j+1}^(1)* ) − Φ( ỹ_j^(1)* ) ] Φ( ỹ_j^(2)* ),   ∀ j ∈ {1, …, m−1},

and:

α_m(x) = [ 1 − Φ( η^(1) ) + Φ_ν^(1)( ȳ_m^(1)*, η^(1) ) − Φ_ρ^(1)( ȳ_m^(1)*, ỹ_m^(1)* ) ]
         × [ Φ_ν^(2)( ȳ_m^(2)*, η^(2) ) − Φ_ρ^(2)( ȳ_m^(2)*, ỹ_m^(2)* ) ]
         + [ 1 − Φ( ỹ_m^(1)* ) ] Φ( ỹ_m^(2)* ).

Calculations are not detailed, as they are straightforward developments of Eq. (13).
The two extremal terms (j = 0 and j = m) correspond to special cases of Ω_j (first and last cells in Figure 4, right).

5 Numerical experiments

In this section, we apply the method to the following bi-objective problem: F^(1) and F^(2) are independent realizations of one-dimensional GPs, indexed by a 300-point regular grid on [0, 1], with Matérn covariances with regularity parameters ν = 3/2 and
5/2, respectively.

Now, two GP models are built based on four randomly chosen observations. The covariance function is considered as known. Figure 5 shows the initial models and Pareto front. Here, a single point dominates the three others. After building the tesselation as described in Section 4.1, we compute the volume of the excursion sets corresponding to each cell (Eq. (11)). As there are only four observations, the probability of belonging to a non-dominated cell is relatively high (Figure 5, bottom right). Then, the criterion is computed for each point in the grid (Figure 5, bottom left). Its best value is obtained in a region with high uncertainty and low expected values for the two functions.

After 10 iterations (Figure 6), the Pareto front is well approximated. The models are accurate in the optimal regions and have high prediction variances in the other regions, which indicates a good allocation of the computational resources.

Next, we compare these results to a state-of-the-art method, called SMS-EGO (Ponweiser et al., 2008), which has been shown to significantly outperform non-GP-based methods (such as NSGA-II), in particular when only a limited budget of evaluations is available. As measuring performance is non-trivial in multi-criteria optimization, we use a series of three indicators: the hypervolume, epsilon and R2 indicators (Zitzler et al., 2003; Hansen and Jaszkiewicz, 1998), all available in the R package EMOA (Mersmann, 2012). They provide different measures of distance to the actual Pareto set and coverage of the objective space. Results are reported in Figure 7. The Pareto front obtained with SMS-EGO shows that the algorithm only detected one of the two Pareto-optimal regions. As a consequence, its Pareto front is locally more accurate than the one obtained with the SUR strategy, but the indicators are much poorer.
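For reference, the hypervolume indicator used in this comparison can be computed exactly in the bi-objective case by a simple sweep over the sorted front; a sketch under the minimization convention (the reference point and names are ours, not from the EMOA package):

```python
def hypervolume_2d(front, ref):
    """Hypervolume dominated by a 2-D Pareto front (minimization) with
    respect to a reference point `ref`: the area of the dominated staircase,
    accumulated as rectangles while sweeping the front by increasing first
    objective."""
    pts = sorted((p for p in front if p[0] <= ref[0] and p[1] <= ref[1]),
                 key=lambda p: p[0])
    hv, y_prev = 0.0, ref[1]
    for x, y in pts:
        if y < y_prev:                      # dominated points contribute nothing
            hv += (ref[0] - x) * (y_prev - y)
            y_prev = y
    return hv
```

Higher hypervolume means a better coverage of the objective space, which is why this indicator is reported as "higher is better" below.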
Figure 5: Top graphs: initial models (same representation as Figure 1); the actual Pareto-optimal points are represented by red crosses. Bottom right: observations (black circles) represented in the objective space, actual Pareto front (red) and current front (blue). Bottom left: criterion value as a function of x. The vertical bars show the new observation location; the green circle is the new observation.

Figure 6: Models and Pareto front after 10 iterations.

Figure 7: SMS-EGO Pareto front after 10 iterations (left) and performance comparison between SUR (plain line) and SMS-EGO (dotted line) on the one-dimensional problem: hypervolume, epsilon, and R2 indicators as functions of the iteration number. Hypervolume indicator: higher is better; the other indicators should tend to zero.

Here, the objective functions are realizations of six-dimensional GPs indexed by a 2000-point Sobol sequence on [0, 1]^6, with a stationary Matérn covariance with regularity parameter ν = 5/2. The variance and range parameters are taken as one and √6/6, respectively. The initial experimental set consists of 10 randomly chosen points, and 40 points are added iteratively using the SUR and SMS-EGO strategies. The results are given in Figure 8. Again, the SUR strategy shows better performance compared to SMS-EGO.
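For intuition on the indicators used above, the hypervolume of a bi-objective minimization front can be computed by a simple sweep over the sorted front. This is a sketch with an arbitrary reference point, not the EMOA implementation used in the experiments:

```python
import numpy as np

# 2-d hypervolume for minimization: area dominated by the front, bounded
# by a reference point that is worse than every front point in both
# objectives. Assumes `front` contains only non-dominated points.
def hypervolume_2d(front, ref):
    """front: (n, 2) array of non-dominated points; ref: reference point."""
    pts = front[np.argsort(front[:, 0])]   # ascending in f1, descending in f2
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (ref[0] - x) * (prev_y - y)  # horizontal strip added by this point
        prev_y = y
    return hv

front = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
print(float(hypervolume_2d(front, ref=np.array([2.0, 2.0]))))  # 3.25
```

A larger hypervolume means the front dominates more of the objective space, which is why this indicator is to be maximized while the epsilon and R indicators tend to zero.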
Figure 8: Performance comparison between SUR (plain line) and SMS-EGO (dotted line) on the six-dimensional problem. Top left: all 2000 points in the objective space; Pareto-optimal points are the red crosses. Top middle and right: Pareto fronts after 40 iterations.

We have proposed a new sequential sampling strategy, based on stepwise uncertainty reduction principles, for multi-objective optimization. Closed-form expressions were provided for the infill criterion. Numerical experiments showed promising performance of our strategy compared to a state-of-the-art method. We point out here some strengths and weaknesses of our approach. First of all, as it is based on Gaussian process modeling, it shares the limits inherent to such models. In particular, it is well known that classical GP models cannot cope with large datasets.

A Probabilities update
A.1 Proof of Proposition 3.2
Using the model update equations (4), we note first that:
$$p_{n+1}(\mathbf{x}, a) = \Phi\left(\frac{a - m_n(\mathbf{x}) + \frac{c_n(\mathbf{x}, \mathbf{x}_{n+1})}{s_n^2(\mathbf{x}_{n+1})}\left[m_n(\mathbf{x}_{n+1}) - y_{n+1}\right]}{s_{n+1}(\mathbf{x})}\right).$$
Let $\phi(y_{n+1})$ be the PDF of $Y_{n+1}$ (conditional on $\mathcal{A}_n$). We have:
$$q(\mathbf{x}, b, a) = \int_{-\infty}^{b} p_{n+1}(\mathbf{x}, a)\, d\phi(y_{n+1}) = \int_{-\infty}^{b} \Phi\left(\frac{a - m_n(\mathbf{x}) + \frac{c_n(\mathbf{x}, \mathbf{x}_{n+1})}{s_n^2(\mathbf{x}_{n+1})}\left[m_n(\mathbf{x}_{n+1}) - y_{n+1}\right]}{s_{n+1}(\mathbf{x})}\right) d\phi(y_{n+1}).$$
As $Y_{n+1} \sim \mathcal{N}\left(m_n(\mathbf{x}_{n+1}), s_n^2(\mathbf{x}_{n+1})\right)$, we can write (following Chevalier et al. (2012)): $Y_{n+1} = m_n(\mathbf{x}_{n+1}) + s_n(\mathbf{x}_{n+1})\, U$ with $U \sim \mathcal{N}(0, 1)$, which allows us to simplify the previous equations to:
$$q(\mathbf{x}, b, a) = \int_{-\infty}^{\bar b} \Phi\left[\frac{a - m_n(\mathbf{x})}{s_{n+1}(\mathbf{x})} - \left(\frac{c_n(\mathbf{x}, \mathbf{x}_{n+1})}{s_n(\mathbf{x}_{n+1})\, s_{n+1}(\mathbf{x})}\right) u\right] d\phi(u) = \int_{-\infty}^{\bar b} \Phi\left[\hat a - \beta u\right] d\phi(u), \quad (14)$$
with $\beta = c_n(\mathbf{x}, \mathbf{x}_{n+1}) / \left(s_n(\mathbf{x}_{n+1})\, s_{n+1}(\mathbf{x})\right)$, $\hat a = \left(a - m_n(\mathbf{x})\right)/s_{n+1}(\mathbf{x})$ and $\bar b = \left(b - m_n(\mathbf{x}_{n+1})\right)/s_n(\mathbf{x}_{n+1})$.

This quantity can be written as a bivariate Gaussian CDF. Indeed:
\begin{align*}
\int_{-\infty}^{\bar b} \Phi\left[\hat a - \beta u\right] d\phi(u) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\bar b} \Phi\left[\hat a - \beta u\right] \exp\left(-\frac{u^2}{2}\right) du \\
&= \frac{1}{2\pi} \int_{-\infty}^{\bar b} \int_{-\infty}^{\hat a - \beta u} \exp\left[-\frac{1}{2}\left(u^2 + t^2\right)\right] dt\, du \\
&= \frac{1}{2\pi} \int_{-\infty}^{\bar b} \int_{-\infty}^{\hat a} \exp\left[-\frac{1}{2}\left(u^2 + [t - \beta u]^2\right)\right] dt\, du \\
&= \frac{1}{2\pi \sqrt{|\Sigma_\beta|}} \int_{-\infty}^{\bar b} \int_{-\infty}^{\hat a} \exp\left(-\frac{1}{2} \begin{bmatrix} u & t \end{bmatrix} \Sigma_\beta^{-1} \begin{bmatrix} u \\ t \end{bmatrix}\right) dt\, du,
\end{align*}
with $\Sigma_\beta = \begin{bmatrix} 1 & \beta \\ \beta & 1 + \beta^2 \end{bmatrix}$ (noting that $|\Sigma_\beta| = 1$), which is the standard form of the bivariate Gaussian CDF with zero mean and covariance matrix $\Sigma_\beta$, hence:
$$q(\mathbf{x}, b, a) = \Phi_{\Sigma_\beta}\left(\bar b, \hat a\right).$$
Finally, applying the normalization $\tilde a = \hat a / \sqrt{1 + \beta^2}$, we have:
$$q(\mathbf{x}, b, a) = \Phi_\rho\left(\bar b, \tilde a\right), \quad \text{with } \rho = \frac{\beta}{\sqrt{1 + \beta^2}} = \frac{c_n(\mathbf{x}, \mathbf{x}_{n+1})}{s_n(\mathbf{x}_{n+1})\, s_n(\mathbf{x})}.$$

A.2 Proof of Proposition 3.1
The result can be obtained directly from Proposition 3.2 with $b \to +\infty$: we then have $q(\mathbf{x}, b, a) \to \Phi(\tilde a) = p_n(\mathbf{x}, a)$.

A.3 Proof of Corollary 3.3
From Eq. (14), we have directly:
$$r(\mathbf{x}, b, a) = \int_{\bar b}^{+\infty} \Phi\left[\hat a - \beta u\right] d\phi(u) = \int_{-\infty}^{-\bar b} \Phi\left[\hat a + \beta u\right] d\phi(u) = \Phi_{\Sigma_{-\beta}}\left(-\bar b, \hat a\right) = \Phi_{-\rho}\left(-\bar b, \tilde a\right).$$

A.4 Proof of Proposition 3.4
The steps of the proof are similar to those of Proposition 3.2. Using the update equations (4), we have first:
\begin{align*}
P\left(Y(\mathbf{x}) \leq y_{n+1} \,\middle|\, \mathcal{A}_n, Y_{n+1} = y_{n+1}\right) &= \Phi\left[\frac{y_{n+1} - m_{n+1}(\mathbf{x})}{s_{n+1}(\mathbf{x})}\right] \\
&= \Phi\left(\frac{-m_n(\mathbf{x}) + \frac{c_n(\mathbf{x}, \mathbf{x}_{n+1})\, m_n(\mathbf{x}_{n+1})}{s_n^2(\mathbf{x}_{n+1})} + \left[1 - \frac{c_n(\mathbf{x}, \mathbf{x}_{n+1})}{s_n^2(\mathbf{x}_{n+1})}\right] y_{n+1}}{s_{n+1}(\mathbf{x})}\right) \\
&= \Phi\left[\frac{m_n(\mathbf{x}_{n+1}) - m_n(\mathbf{x})}{s_{n+1}(\mathbf{x})} - \left(\frac{c_n(\mathbf{x}, \mathbf{x}_{n+1}) - s_n^2(\mathbf{x}_{n+1})}{s_n(\mathbf{x}_{n+1})\, s_{n+1}(\mathbf{x})}\right) u\right],
\end{align*}
using again $y_{n+1} = m_n(\mathbf{x}_{n+1}) + s_n(\mathbf{x}_{n+1})\, u$. Now:
\begin{align*}
h(\mathbf{x}, b) &= \int_{-\infty}^{b} P\left(Y(\mathbf{x}) \leq Y_{n+1} \,\middle|\, \mathcal{A}_n, Y_{n+1} = y_{n+1}\right) d\phi(y_{n+1}) \\
&= \int_{-\infty}^{\bar b} \Phi\left[\frac{m_n(\mathbf{x}_{n+1}) - m_n(\mathbf{x})}{s_{n+1}(\mathbf{x})} - \left(\frac{c_n(\mathbf{x}, \mathbf{x}_{n+1}) - s_n^2(\mathbf{x}_{n+1})}{s_n(\mathbf{x}_{n+1})\, s_{n+1}(\mathbf{x})}\right) u\right] d\phi(u) \\
&= \int_{-\infty}^{\bar b} \Phi\left[\mu - \tau u\right] d\phi(u) = \Phi_{\Sigma_\tau}\left(\bar b, \mu\right),
\end{align*}
as we obtain a form similar to Equation (14), with $\Phi_{\Sigma_\tau}$ the CDF of the centered bivariate Gaussian with covariance
$$\Sigma_\tau = \begin{bmatrix} 1 & \tau \\ \tau & 1 + \tau^2 \end{bmatrix}, \quad \mu = \frac{m_n(\mathbf{x}_{n+1}) - m_n(\mathbf{x})}{s_{n+1}(\mathbf{x})}, \quad \tau = \frac{c_n(\mathbf{x}, \mathbf{x}_{n+1}) - s_n^2(\mathbf{x}_{n+1})}{s_n(\mathbf{x}_{n+1})\, s_{n+1}(\mathbf{x})}.$$
Normalizing $\eta = \mu / \sqrt{1 + \tau^2}$ delivers the final result.

B $b_{ij}^{(k)}(\mathbf{x})$ and $d_{ij}^{(k)}(\mathbf{x})$ computation

Let $X$ and $Y$ be two dependent random variables, and $a$, $b$, $c$ and $d$ four real numbers.
By direct application of Bayes' formula, we have:
\begin{align*}
P(a \leq X < b \,|\, c \leq Y < d)\, P(c \leq Y < d) = {} & P(Y < d) \times \left[P(X < b \,|\, Y < d) - P(X \leq a \,|\, Y < d)\right] \\
& - P(Y \leq c) \times \left[P(X < b \,|\, Y \leq c) - P(X \leq a \,|\, Y \leq c)\right]
\end{align*}
and
\begin{align*}
P(Y \leq X < b \,|\, a \leq Y < b)\, P(a \leq Y < b) = {} & P(Y < b) \times \left[P(X < b \,|\, Y < b) - P(X \leq Y \,|\, Y < b)\right] \\
& - P(Y \leq a) \times \left[P(X < b \,|\, Y \leq a) - P(X \leq Y \,|\, Y \leq a)\right].
\end{align*}
Now, by definition, $b_{ij}^{(k)}$ is of the form of the first equation:
$$b_{ij}^{(k)}(\mathbf{x}) := P_{n+1}\left(y_{j-}^{(k)} \leq Y^{(k)}(\mathbf{x}) < y_{j+}^{(k)} \,\middle|\, y_{i-}^{(k)} \leq Y_{n+1}^{(k)} < y_{i+}^{(k)}\right) \times P_n\left[y_{i-}^{(k)} \leq Y_{n+1}^{(k)} < y_{i+}^{(k)}\right],$$
hence writes as the sum of four terms:
\begin{align*}
b_{ij}^{(k)}(\mathbf{x}) = {} & p_n^{(k)}\left(\mathbf{x}_{n+1}, y_{i+}^{(k)}\right) \left(P_{n+1}\left[Y^{(k)}(\mathbf{x}) \leq y_{j+}^{(k)} \,\middle|\, Y_{n+1}^{(k)} \leq y_{i+}^{(k)}\right] - P_{n+1}\left[Y^{(k)}(\mathbf{x}) \leq y_{j-}^{(k)} \,\middle|\, Y_{n+1}^{(k)} \leq y_{i+}^{(k)}\right]\right) \\
& - p_n^{(k)}\left(\mathbf{x}_{n+1}, y_{i-}^{(k)}\right) \left(P_{n+1}\left[Y^{(k)}(\mathbf{x}) \leq y_{j+}^{(k)} \,\middle|\, Y_{n+1}^{(k)} \leq y_{i-}^{(k)}\right] - P_{n+1}\left[Y^{(k)}(\mathbf{x}) \leq y_{j-}^{(k)} \,\middle|\, Y_{n+1}^{(k)} \leq y_{i-}^{(k)}\right]\right) \\
= {} & q^{(k)}\left(\mathbf{x}, y_{i+}^{(k)}, y_{j+}^{(k)}\right) - q^{(k)}\left(\mathbf{x}, y_{i+}^{(k)}, y_{j-}^{(k)}\right) - q^{(k)}\left(\mathbf{x}, y_{i-}^{(k)}, y_{j+}^{(k)}\right) + q^{(k)}\left(\mathbf{x}, y_{i-}^{(k)}, y_{j-}^{(k)}\right),
\end{align*}
with $q^{(k)}(\mathbf{x}, b, a)$ given by Eq. (5), thus:
$$b_{ij}^{(k)}(\mathbf{x}) = \Phi_\rho^{(k)}\left(\bar y_{i+}^{(k)}, \tilde y_{j+}^{(k)}\right) - \Phi_\rho^{(k)}\left(\bar y_{i+}^{(k)}, \tilde y_{j-}^{(k)}\right) - \Phi_\rho^{(k)}\left(\bar y_{i-}^{(k)}, \tilde y_{j+}^{(k)}\right) + \Phi_\rho^{(k)}\left(\bar y_{i-}^{(k)}, \tilde y_{j-}^{(k)}\right).$$
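These interval decompositions rest on elementary identities for joint CDFs, which can be checked by simulation. In the sketch below, an arbitrary correlated Gaussian pair plays the role of $(Y^{(k)}(\mathbf{x}), Y^{(k)}_{n+1})$, the empirical joint CDF plays the role of the $q^{(k)}$ terms, and the empirical $P(X \leq Y, Y \leq t)$ plays the role of $h^{(k)}$; all numerical values are illustrative choices, not taken from the paper.

```python
import numpy as np

# Simulate a dependent pair (X, Y); the particular law is an arbitrary choice.
rng = np.random.default_rng(3)
n = 200_000
y = rng.standard_normal(n)
x = 0.5 * y + rng.standard_normal(n)

F = lambda s, t: float(np.mean((x <= s) & (y <= t)))  # empirical joint CDF (plays q)
h = lambda t: float(np.mean((x <= y) & (y <= t)))     # empirical P(X <= Y, Y <= t)

a, b, c, d = -0.5, 0.8, -1.0, 0.6

# Rectangle identity behind b_ij: four CDF terms with alternating signs.
rect = F(b, d) - F(a, d) - F(b, c) + F(a, c)
direct = float(np.mean((a <= x) & (x < b) & (c <= y) & (y < d)))
assert abs(rect - direct) < 1e-9

# Identity behind d_ij, with a random lower bound Y on X
# (here d <= b, so the bound Y stays below b on the event of interest):
lhs = float(np.mean((y <= x) & (x < b) & (c <= y) & (y < d)))
rhs = F(b, d) - h(d) + h(c) - F(b, c)
assert abs(lhs - rhs) < 1e-9
```

Both identities hold sample-by-sample for continuous distributions (boundary ties have probability zero), which is why the empirical versions agree to floating-point precision rather than only up to Monte Carlo error.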
Similarly, $d_{ij}^{(k)}$ is of the form of the second equation. Starting with the definition:
$$d_{ij}^{(k)}(\mathbf{x}) := P_{n+1}\left(Y_{n+1}^{(k)} \leq Y^{(k)}(\mathbf{x}) < y_{j+}^{(k)} \,\middle|\, y_{i-}^{(k)} \leq Y_{n+1}^{(k)} < y_{i+}^{(k)}\right) \times P_n\left[y_{i-}^{(k)} \leq Y_{n+1}^{(k)} < y_{i+}^{(k)}\right],$$
it writes:
\begin{align*}
d_{ij}^{(k)}(\mathbf{x}) = {} & p_n^{(k)}\left(\mathbf{x}_{n+1}, y_{i+}^{(k)}\right) \left(P_{n+1}\left[Y^{(k)}(\mathbf{x}) \leq y_{j+}^{(k)} \,\middle|\, Y_{n+1}^{(k)} \leq y_{i+}^{(k)}\right] - P_{n+1}\left[Y^{(k)}(\mathbf{x}) \leq Y_{n+1}^{(k)} \,\middle|\, Y_{n+1}^{(k)} \leq y_{i+}^{(k)}\right]\right) \\
& - p_n^{(k)}\left(\mathbf{x}_{n+1}, y_{i-}^{(k)}\right) \left(P_{n+1}\left[Y^{(k)}(\mathbf{x}) \leq y_{j+}^{(k)} \,\middle|\, Y_{n+1}^{(k)} \leq y_{i-}^{(k)}\right] - P_{n+1}\left[Y^{(k)}(\mathbf{x}) \leq Y_{n+1}^{(k)} \,\middle|\, Y_{n+1}^{(k)} \leq y_{i-}^{(k)}\right]\right) \\
= {} & q^{(k)}\left(\mathbf{x}, y_{i+}^{(k)}, y_{j+}^{(k)}\right) - h^{(k)}\left(\mathbf{x}, y_{i+}^{(k)}\right) + h^{(k)}\left(\mathbf{x}, y_{i-}^{(k)}\right) - q^{(k)}\left(\mathbf{x}, y_{i-}^{(k)}, y_{j+}^{(k)}\right),
\end{align*}
with $q^{(k)}(\mathbf{x}, b, a)$ given by Eq. (5) and $h^{(k)}(\mathbf{x}, b)$ given by Eq. (7), thus:
$$d_{ij}^{(k)}(\mathbf{x}) = \Phi_\rho^{(k)}\left(\bar y_{i+}^{(k)}, \tilde y_{j+}^{(k)}\right) - \Phi_\nu^{(k)}\left(\bar y_{i+}^{(k)}, \eta^{(k)}\right) + \Phi_\nu^{(k)}\left(\bar y_{i-}^{(k)}, \eta^{(k)}\right) - \Phi_\rho^{(k)}\left(\bar y_{i-}^{(k)}, \tilde y_{j+}^{(k)}\right).$$

References
Banerjee, A., Dunson, D.B., Tokdar, S.T.: Efficient Gaussian process regression for large datasets. Biometrika 100(1), 75–89 (2013)
Bect, J., Ginsbourger, D., Li, L., Picheny, V., Vazquez, E.: Sequential design of computer experiments for the estimation of a probability of failure. Statistics and Computing 22(3), 773–793 (2012)
Chevalier, C., Bect, J., Ginsbourger, D., Vazquez, E., Picheny, V., Richet, Y.: Fast parallel kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. http://hal.inria.fr/hal-00641108/en (2012)
Chevalier, C., Picheny, V., Ginsbourger, D.: KrigInv: An efficient and user-friendly implementation of batch-sequential inversion strategies based on kriging. Computational Statistics & Data Analysis (2013)
Collette, Y., Siarry, P.: Multiobjective Optimization: Principles and Case Studies. Springer (2003)
Craig, P.S., Goldstein, M., Rougier, J.C., Seheult, A.H.: Bayesian forecasting for complex systems using computer simulators. Journal of the American Statistical Association 96(454), 717–729 (2001)
Cressie, N.: Statistics for Spatial Data, revised edition, vol. 928. Wiley, New York (1993)
Emery, X.: The kriging update equations and their application to the selection of neighboring data. Computational Geosciences 13(3), 269–280 (2009)
Gramacy, R.B., Lee, H.K.H.: Optimization under unknown constraints. Bayesian Statistics 9, 229 (2011)
Gramacy, R.B., Lee, H.K.H.: Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association 103(483) (2008)
Hansen, M.P., Jaszkiewicz, A.: Evaluating the quality of approximations to the non-dominated set. IMM, Department of Mathematical Modelling, Technical University of Denmark (1998)
Jones, D.R.: A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 21(4), 345–383 (2001)
Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13(4), 455–492 (1998)
Keane, A.J.: Statistical improvement criteria for use in multiobjective design optimization. AIAA Journal 44(4), 879–891 (2006)
Kenkel, B.: pbivnorm: Vectorized Bivariate Normal CDF (2012). URL http://CRAN.R-project.org/package=pbivnorm. R package version 0.5-1
Kennedy, M.C., O'Hagan, A.: Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(3), 425–464 (2001)
Knowles, J.: ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation 10(1), 50–66 (2006)
Mersmann, O.: emoa: Evolutionary Multiobjective Optimization Algorithms (2012). URL http://CRAN.R-project.org/package=emoa. R package version 0.5-0
Ponweiser, W., Wagner, T., Biermann, D., Vincze, M.: Multiobjective optimization on a limited budget of evaluations using model-assisted S-metric selection. In: Parallel Problem Solving from Nature, pp. 784–794. Springer (2008)
Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. MIT Press (2006)
Roustant, O., Ginsbourger, D., Deville, Y.: DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. Journal of Statistical Software 51(1), 1–55 (2012)
Scott, W., Frazier, P., Powell, W.: The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression. SIAM Journal on Optimization 21(3), 996–1026 (2011)
Stein, M.: Interpolation of Spatial Data: Some Theory for Kriging. Springer Verlag (1999)
Villemonteix, J., Vazquez, E., Walter, E.: An informational approach to the global optimization of expensive-to-evaluate functions. Journal of Global Optimization 44(4), 509–534 (2009)
Wagner, T., Emmerich, M., Deutz, A., Ponweiser, W.: On expected-improvement criteria for model-based multi-objective optimization. In: Parallel Problem Solving from Nature, PPSN XI, pp. 718–727. Springer (2010)
Wang, G.G., Shan, S.: Review of metamodeling techniques in support of engineering design optimization. Journal of Mechanical Design 129(4), 370 (2007)
Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., Da Fonseca, V.G.: Performance assessment of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation 7(2), 117–132 (2003)