Iterative Method for Tuning Complex Simulation Code
Yun Am Seo (a), Youngsaeng Lee (b), and Jeong-Soo Park (c,*)

a: AI Weather Forecast Research Team, National Institute of Meteorological Science, Korea
b: Digital Transformation Department, Korea Electric Power Corporation, Korea
c: Department of Statistics, Chonnam National University, Gwangju 500-757, Korea
*: Corresponding author. E-mail: [email protected], Tel: +82-62-530-3445

February 2020

Published in Communications in Statistics – Simulation and Computation, 2020. doi:10.1080/03610918.2020.1728317
Abstract
Tuning a complex simulation code refers to the process of improving the agreement of a code calculation with respect to a set of experimental data by adjusting parameters implemented in the code. This process belongs to the class of inverse problems or model calibration. For this problem, the approximated nonlinear least squares (ANLS) method based on a Gaussian process (GP) metamodel has been employed by some researchers. A potential drawback of the ANLS method is that the metamodel is built only once and not updated thereafter. To address this difficulty, we propose an iterative algorithm in this study. In the proposed algorithm, the tuning parameters of the simulation code and the parameters of the GP metamodel are alternately re-estimated and updated by maximum likelihood estimation and the ANLS method. This algorithm uses both the computer and the experimental data repeatedly until convergence. A study using toy models, including an inexact computer code with bias terms, reveals that the proposed algorithm performs better than the ANLS method and the conditional-likelihood-based approach. Finally, an application to a nuclear fusion simulation code is illustrated.

Keywords: Best linear unbiased prediction; Calibration; Computer experiments; Inexact computer model; Kriging; Numerical optimization.
1 Introduction

Modern computer simulation codes contain various unknown parameters. Assuming the validity of the simulation code, we can adjust or estimate such parameters using the nonlinear least squares estimation (NLSE) method, which minimizes the sum of the squared differences between computer responses and real observations. This procedure is called calibration (e.g., Kennedy and O'Hagan 2001; Higdon et al. 2008; Tuo and Wu 2018) or code tuning (e.g., Cox, Park, and Singer 2001; Kumar 2015). It is formally defined as the process of improving the agreement of a code calculation or set of code calculations with respect to a chosen and fixed set of experimental data via adjustment of the parameters implemented in the code (Trucano et al. 2006). Han, Santner, and Rawlinson (2009) differentiated between tuning parameters and calibration parameters. In this study, however, the two are treated as the same and are henceforth referred to as tuning parameters.

If a simulation program is complex, with one execution requiring several hours, the NLSE method may not be computationally feasible. In this case, a statistical metamodel can be built to approximate the unknown functional relationship between a set of controllable input variables and a simulated response variable. This metamodel is then employed in place of the original simulation code in the NLSE method, making the problem solvable and computationally feasible. This method was described by Cox, Park, and Singer (2001), where a Gaussian process (GP) model was employed as the metamodel of a complex simulation code. It is called the approximated NLS (ANLS) method. As alternatives to the ANLS, Cox, Park, and Singer (2001) proposed likelihood-based methods. Kennedy and O'Hagan (2001) introduced a full Bayesian calibration method. See, for example, Higdon et al. (2004), Henderson et al. (2009), Guillas, Glover, and Malki-Epshtein (2014), and Pratola and Higdon (2016) for various Bayesian calibrations.

A potential drawback of the ANLS method is that the metamodel is built only once and not updated thereafter; thus, the computer data are not used after the metamodel has been fitted. To address this, in the present report an iterative algorithm is proposed, in which the likelihood function is maximized and the squared distance is minimized iteratively until convergence is achieved. That is, the tuning parameters of the simulation code and the model parameters of the GP are repeatedly re-estimated and updated. The parameters of the GP model are estimated from the updated combined data using the maximum likelihood approach. Then, the tuning parameters are re-estimated using the ANLS method. These two optimizations are executed iteratively until convergence is achieved. We call this method the "Max-min" algorithm.

The remainder of this report proceeds as follows. Section 2 presents a GP model to approximate a complex simulation code and real experimental data. Section 3 describes the calibration methods, including our proposed method, and the calculation of standard errors. Section 4 presents a toy-model simulation study. Section 5 applies the proposed methods to a nuclear fusion simulator, and Section 6 summarizes the results and provides suggestions for future research. Some details of this study are provided in the Supplemental Material.

2 Gaussian process model as a surrogate
For the metamodel of the simulation code, we use a GP or spatial regression model that treats the response $y(x)$ as a realization of a random function superimposed on a regression model:

$$Y(x) = \sum_{j=1}^{k} \beta_j f_j(x) + Z(x), \qquad (1)$$

where the $f_j$ are known functions and the $\beta_j$ are unknown regression coefficients. Here, the random process $Z(\cdot)$, which represents a departure from the assumed linear model, is assumed to be a GP with mean zero and covariance

$$\mathrm{cov}(t, u) = V(t, u) = \sigma^2 R(t, u) \qquad (2)$$

between $Z(t)$ and $Z(u)$ for $t = (t_1, \ldots, t_d)$ and $u = (u_1, \ldots, u_d)$, where $\sigma^2$ represents the process variance (a scale factor) and $R(t, u)$ is the correlation function. When the response of a computer code is stochastic, a random component term $\epsilon$ is added to the model (1). However, we do not include $\epsilon$ in this study because the computer code is assumed to be deterministic.

One possible choice for the correlation function is the Gaussian correlation

$$R(t, u) = \exp\left[-\theta \sum_{i=1}^{d} |t_i - u_i|^2\right], \qquad (3)$$

where $\theta \geq 0$.
This is a special case of the power exponential family with a power of 2. The non-negative parameter $\theta$ determines the covariance structure of $Z$: a small $\theta$ reflects high correlations between nearby observations, whereas a large $\theta$ reflects low nearby correlations. One may consider a different version of (3) by taking several $\theta$ values, as follows:

$$R(t, u) = \exp\left[-\sum_{i=1}^{d} \theta_i |t_i - u_i|^2\right], \qquad (4)$$

where $\theta_i \geq 0$ for $i = 1, 2, \cdots, d$. This is a legitimate separable correlation function because it is a product of valid correlation functions.
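For concreteness, the separable Gaussian correlation (4) is straightforward to vectorize; the following is a minimal numpy sketch (the function name and vectorization are ours, not from the paper). The isotropic correlation (3) is recovered by setting all $\theta_i$ equal.

```python
import numpy as np

def gauss_corr(T, U, theta):
    """Separable Gaussian correlation matrix R(T, U) as in (4).

    T : (n, d) array of sites; U : (m, d) array of sites;
    theta : (d,) array of non-negative range parameters.
    Returns the (n, m) matrix with entries exp(-sum_i theta_i |t_i - u_i|^2).
    """
    T, U, theta = np.asarray(T), np.asarray(U), np.asarray(theta)
    d2 = (T[:, None, :] - U[None, :, :]) ** 2   # coordinate-wise squared differences
    return np.exp(-(d2 * theta).sum(axis=-1))

# Example: the isotropic correlation (3) with a common theta
X = np.random.default_rng(1).uniform(size=(5, 2))
R = gauss_corr(X, X, theta=np.full(2, 0.5))     # 5 x 5, with ones on the diagonal
```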
We refer the readers to Santner, Williams, and Notz (2018) for more information on this GP model and its application to the design and analysis of computer experiments.

Once the data have been collected at the design sites, or "training" inputs $X$, the parameters are estimated via the maximum likelihood estimation (MLE) method and are then plugged in to predict $y(x)$ as in (5), where $x$ is an untried site or "test" input. This prediction is called kriging. The empirical best linear unbiased prediction (Santner, Williams, and Notz 2003), with the MLEs of the parameters plugged in, is

$$\hat{Y}(x) = f^t \hat\beta + r^t \hat{V}^{-1} (y - F \hat\beta), \qquad (5)$$

where $f$ is the known linear regression function vector, $F$ is a design matrix, $r^t$ is the correlation vector between $Y(x)$ and the model outputs $Y(X)$, $y$ is the vector of observations collected at the design sites, and $\hat\beta$ is the generalized least squares estimator of $\beta$ (see the Supplemental Material for the details).

The combinations of the $\beta$s and $\theta$s determine the model, but the following simple GP model is considered first:

$$y(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d + Z(x), \qquad (6)$$

with the correlation function (3) or (4). When the common $\theta$ of (3) is used, we call (6) "Model 1" in this study. When the several $\theta_i$ of (4) are used, we call (6) "Model 2".

3 Calibration methods

3.1 Notation and data structure

For notational convenience, experimental data are denoted by the subscript "E", computer simulation data by the subscript "C", and the combined ("both") computer and experimental data by "B". Let $\tau$ be an adjustable parameter vector to be estimated, and let $T$ be the input variables of the computer code corresponding to $\tau$. The original experimental input variables are denoted by $X$. Let $q$ and $p$ be the dimensions of $\tau$ and $X$, respectively. Further, let $n_C$ and $n_E$ be the numbers of observations; then $n_B = n_C + n_E$. The details of the data structure for calibration are described in the Supplemental Material.

3.2 The ANLS method

In this subsection, the GP model approach used in the ANLS procedure of Cox, Park, and Singer (2001) is described. If the parameters $\sigma^2$, $\beta$, and $\theta$ are known, then, for a given value of $\tau$, a "prediction" of $y_E(\tau, x_{Ei})$ can be calculated for the given computer data. This is obtained using equation (5) with $X_E$, $F_E$, $y_E$, and $V_{EE}$. Here, the computer data alone are used to calculate $\hat\beta$ and $\hat\theta$, and the data are not used thereafter. Note that $X_E$, $F_E$, $R(X_E, X_C)$, $R(X_E, X_E)$, and $\hat\beta$ are functions of $\tau$.

A design site selected for a computer experiment is denoted by $(T, x_C)$. Then, the computer response $y$ (or $y_C$) at $(T, x_C)$ is

$$y_C = Y(T, x_C), \qquad (7)$$

where $Y$ represents the expected value of the output from the computer code; $Y$ can be viewed as the value produced by a theory. Since we assume that the computer code is close to the real experimental data, with some variation, if the tuning parameters in the computer code are optimal, the response of the real experiment $y_E$ at $(\tau, x_E)$ is modeled by

$$y_E = Y(\tau, x_E) + \epsilon_E. \qquad (8)$$

Here, $Y(\tau, x_E)$ also represents the expected value of the response in the real experiment. This common $Y$ in the two equations (7) and (8) connects the computer code and the real experiment. The stochastic term $\epsilon_E$ is assumed to be independent and identically distributed with mean zero and variance $\sigma^2_{\epsilon E}$.
On the other hand, a computer code is sometimes considered inexact because of computer model bias, which is the systematic difference between the model and the truth (Kennedy and O'Hagan 2001; Gramacy 2016; Plumlee 2017). In that case, the response of the real experiment can be modeled by

$$y_E = Y(\tau, x_E) + b(\tau, x_E) + \epsilon_E, \qquad (9)$$

where $b(\tau, x_E)$ represents the model bias or discrepancy.

A difficulty with the computer experiment is that one run of a complex simulator sometimes requires several minutes, say $m$ minutes. The parameter $\tau$ is usually estimated by minimizing the residual sum of squares

$$RSS(\tau) = \sum_{i=1}^{n_E} \left[ y_{Ei} - Y(\tau, x_{Ei}) \right]^2, \qquad (10)$$

where $y_{Ei}$ is an observed response from the real experiments, and $Y(\tau, x_{Ei})$ is the expected value of the output from the computer code at the experimental point $(\tau, x_{Ei})$. One evaluation of $RSS(\tau)$ requires approximately $n_E \times m$ minutes. It is thus computationally infeasible to run the code as many times as needed for an iterative nonlinear optimizer to find $\tau$.

The ANLS method first fits the GP Model 1 or Model 2 defined in equation (6), by MLE using the computer data alone. Then, treating the fitted prediction model as if it were the true model, the method determines the $\hat\tau$ that minimizes the residual sum of squares with predictors:

$$RSS_P(\tau) = \sum_{i=1}^{n_E} \left[ y_{Ei} - \hat{Y}(\tau, x_{Ei}) \right]^2, \qquad (11)$$

where $\hat{Y}(\tau, x_{Ei})$ is the empirical best linear unbiased prediction of $Y(\tau, x_{Ei})$, as in (5). Since a closed-form minimizer of (11) is not available, a numerical optimization routine is needed to determine $\hat\tau$. Note that $\hat{Y}$ is a computationally cheap emulator (surrogate or metamodel) of the expensive simulation code, which makes the problem computationally feasible. The method is applied for each GP model (Model 1 and Model 2) stated in (6).

The advantages of this method are that it is reasonably economical and easy to implement, and that the computer and experimental data are decoupled. The ANLS method does not require the functional relation between the inputs and output to be the same for the computer code and the real experiments. In contrast, the likelihood-based approaches described in the next subsection do require this, but they employ marginal likelihoods to estimate the model parameters. The prediction residuals, $r_i = y_{Ei} - \hat{Y}(\hat\tau, x_{Ei})$, can be used to check the validity of the prediction model, of $\hat\tau$, and of the ANLS method. A potential drawback of the ANLS method is that it does not account for uncertainty in the approximation of $Y_C$ by $\hat{Y}$. Another difficulty is that the metamodel $\hat{Y}$ is built only once and is not updated thereafter.
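As an illustration of the two ANLS ingredients, the sketch below fits a constant-mean GP to computer data and then minimizes $RSS_P(\tau)$ of (11) with a numerical optimizer. It is a simplified stand-in, not the authors' code: the regression part of (6) is reduced to a constant, $\theta$ is held fixed rather than estimated by MLE, and all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def gauss_corr(T, U, theta):
    d2 = (T[:, None, :] - U[None, :, :]) ** 2
    return np.exp(-(d2 * theta).sum(axis=-1))

def fit_gp(Xc, yc, theta, nugget=1e-8):
    """Constant-mean GP fit: GLS beta-hat and kriging weights from computer data."""
    R = gauss_corr(Xc, Xc, theta) + nugget * np.eye(len(Xc))
    L = np.linalg.cholesky(R)
    solve_R = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))
    beta = solve_R(yc).sum() / solve_R(np.ones(len(Xc))).sum()
    return {"X": Xc, "theta": theta, "beta": beta, "w": solve_R(yc - beta)}

def predict(gp, Xnew):
    """BLUP in the spirit of (5), with a constant mean."""
    return gp["beta"] + gauss_corr(Xnew, gp["X"], gp["theta"]) @ gp["w"]

def anls(gp, x_E, y_E, tau0):
    """Minimize RSS_P(tau) of (11) over the tuning parameters."""
    def rss_p(tau):
        Xe = np.column_stack([np.tile(tau, (len(x_E), 1)), x_E])
        return np.sum((y_E - predict(gp, Xe)) ** 2)
    return minimize(rss_p, tau0, method="Nelder-Mead").x

# Toy usage: computer code Y(tau, x) = tau * x, true tau = 2
rng = np.random.default_rng(0)
Tc, xc = rng.uniform(0, 4, 30), rng.uniform(0, 1, 30)
gp = fit_gp(np.column_stack([Tc, xc]), Tc * xc, theta=np.array([1.0, 1.0]))
x_E = rng.uniform(0, 1, (20, 1))
y_E = 2.0 * x_E[:, 0] + rng.normal(0, 0.05, 20)
print(anls(gp, x_E, y_E, tau0=np.array([1.0])))   # tau-hat, near the true value 2
```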
3.3 Likelihood-based approaches

Given a computer code and our GP approach for $y$, a unified statistical approach is available. We have the likelihood for all the parameters, including the tuning parameters $\tau$, the error-term parameter $\sigma^2_{\epsilon E}$, and the random-function parameters $\beta$, $\theta$, and $\sigma^2$. Thus, all the parameters can be estimated using the MLE method. We refer to this method as the full MLE. It can be done for each GP model defined in (6). Twice the negative log-likelihood (up to an additive constant), with $\hat\beta_B$ and $\hat\sigma^2_B$ plugged in, is

$$-2 \log L(\tau, \eta_B; y_B, X_B) = n_B \log \hat\sigma^2_B + \log |V_B|, \qquad (12)$$

where

$$\hat\sigma^2_B = (y_B - F_B \hat\beta_B)^t V_B^{-1} (y_B - F_B \hat\beta_B)/n_B, \qquad (13)$$

$$\hat\beta_B = (F_B^t V_B^{-1} F_B)^{-1} F_B^t V_B^{-1} y_B, \qquad (14)$$

and $\eta_B = (\theta_B, \beta_B, \sigma^2_B, \gamma_E)$ with $\gamma_E = \sigma^2_{\epsilon E}/\sigma^2$.

Some other approaches based on the likelihood function, including the full MLE, were proposed by Cox, Park, and Singer (2001). One of them is the "Separated MLE" (SMLE) method, which maximizes the conditional likelihood function of the experimental data given the computer data. Here, the parameters $\beta$, $\theta$, and $\sigma^2$ are estimated by maximizing the marginal likelihood for the computer data only. These are plugged into the conditional likelihood of the experimental data given the computer data, which is then maximized with respect to $\gamma_E$ and $\tau$ to obtain estimates of those parameters.

One advantage of the likelihood-based methods is that they can simultaneously use both the computer and the experimental data to estimate $\tau$, whereas the ANLS method uses the computer data only. These likelihood-based approaches enrich the tuning methods. Cox, Park, and Singer (2001) found the SMLE method to be better than the full MLE method; thus, SMLE is compared with the proposed method in this study. The details of the SMLE are provided in the Supplemental Material.

3.4 The proposed Max-min algorithm

The following are the steps of the proposed tuning method. We call it the Max-min algorithm because it uses maximization and minimization iteratively.

Algorithm Max-min: iterate Step 3 and Step 4 until convergence is achieved.
Step 1 (model building): build a surrogate (6) using the MLE for the given computer data only.

Step 2 (initial solution): set the iteration counter $i = 1$, and find $\hat\tau_1$ by minimizing $RSS_p(\tau)$ in (11) using the surrogate of Step 1.

Step 3 (maximization): build a new surrogate (6), using the MLE for the combined data with the fixed $\hat\tau_i$ obtained in the previous step.

Step 4 (minimization): set $i = i + 1$, and find $\hat\tau_i$ by minimizing $RSS_p(\tau)$ in (11) using the surrogate built in Step 3. If $\hat\tau_i$ satisfies a stopping rule, then stop; otherwise, go to Step 3.

Note that in each iteration of Steps 3 and 4, $\hat\tau$ is updated; thus, the estimates of the parameters $\theta$, $\beta$, $\sigma^2_{\epsilon E}$, and $\sigma^2$ are updated. We expect this to positively influence the finding of $\hat\tau$ in Step 4. Steps 2 and 4 are the same in terms of minimizing $RSS_p(\tau)$, but Step 2 uses only the computer data, while Step 4 uses either the combined or the computer data. Steps 1 and 3 are the same in terms of obtaining the MLE by maximizing the likelihood function, but Step 1 uses only the computer data, while Step 3 uses the combined data. The likelihood function in Step 3 is $L(\eta_B; \hat\tau, y_B, X_B)$, the one used for the full MLE method, except that here the tuning parameters are fixed at the $\hat\tau$ obtained in the previous step. For the optimizations in Steps 2, 3, and 4, quasi-Newton numerical algorithms were used.

In using the combined data in Step 3, we assume that the functional relation between the inputs and the output is the same for the computer code and the physical process. The computer data and the experimental data are linked by the use of this common response function, with $\hat\tau$ being the value of the tuning parameters in data from the physical process.

The Max-min algorithm stops when one of the following rules is satisfied, for $i = 1, 2, \cdots$:

1 (maximum iteration): the number of iterations reaches the pre-assigned maximum number;

2 (minimum improvement): $RSS_p(\hat\tau_{i+1}) > RSS_p(\hat\tau_i) - \text{ftol}$ for 'maxagain' consecutive iterations;

3 (minimum relative improvement): $[RSS_p(\hat\tau_{i+1}) - RSS_p(\hat\tau_i)]/RSS_p(\hat\tau_i) > -\text{ftol}$ for 'maxagain' consecutive iterations;

where 'ftol' is a pre-assigned small tolerance value. Here, $RSS_p(\hat\tau_1)$ is the minimum value of $RSS_p$ obtained in Step 2, and $RSS_p(\hat\tau_i)$ is the minimum value obtained in Step 4 at the $i$-th iteration.

When the $RSS_p$ in Step 4 is greater than that of Step 2 or that of the last iteration, a small random fluctuation is applied to $\hat\tau$ before Step 3. Without this fluctuation, the algorithm stopped within four iterations. In our experience, the random fluctuation led to a further reduction in $RSS_p$; however, it did not make the algorithm execute more than 20 iterations.
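To make the alternation concrete, the following is a minimal sketch of the Max-min loop (our own illustration, not the authors' implementation). The helpers fit_surrogate and rss_p are hypothetical stand-ins for the MLE refit of Step 3 and the emulator-based objective (11) of Step 4; the initial computer-data-only fit of Steps 1-2 is folded into the same fit_surrogate call for brevity, and only the minimum-improvement stopping rule is implemented.

```python
import numpy as np
from scipy.optimize import minimize

def max_min(fit_surrogate, rss_p, tau0, max_iter=20, ftol=1e-4, maxagain=3):
    """Sketch of the Max-min loop: alternate surrogate refits (Step 3)
    with RSS_p minimizations (Step 4) until the improvement rule fires."""
    tau = np.asarray(tau0, dtype=float)
    surrogate = fit_surrogate(tau)                       # Steps 1-2: initial fit
    res = minimize(lambda t: rss_p(surrogate, t), tau, method="Nelder-Mead")
    tau, rss_old, again = res.x, res.fun, 0
    for _ in range(max_iter):
        surrogate = fit_surrogate(tau)                   # Step 3: maximization (MLE refit)
        res = minimize(lambda t: rss_p(surrogate, t), tau, method="Nelder-Mead")
        tau_new, rss_new = res.x, res.fun                # Step 4: minimization
        again = again + 1 if rss_new > rss_old - ftol else 0
        if again >= maxagain:                            # minimum-improvement stopping rule
            return tau
        if rss_new > rss_old:                            # small random fluctuation on tau-hat
            tau_new += 0.05 * (np.abs(tau_new) + 1e-8) * np.random.randn(tau_new.size)
        tau, rss_old = tau_new, min(rss_old, rss_new)
    return tau

# Trivial usage on a toy where the surrogate needs no refit: Y(tau, x) = tau * x
x_E = np.linspace(0.1, 1.0, 20); y_E = 2.0 * x_E
print(max_min(lambda tau: None,
              lambda s, t: float(np.sum((y_E - t[0] * x_E) ** 2)),
              tau0=[1.0]))                               # converges near tau = 2
```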
When $x$ is a given site representing $(\hat\tau, x_{Ei})$ in the experimental data $X_E$, the prediction formula needed for computing $RSS_p$ in Step 2 is

$$\hat{Y}_C(x) = f^t \hat\beta_C + r_C^t \hat{V}_{CC}^{-1} (y_C - F_C \hat\beta_C), \qquad (15)$$

where $f$ is the known linear regression function vector, $r_C$ is the $n_C \times 1$ vector of correlations between $Y(x)$ and $Y(X_C)$, and $\hat\beta_C = (F_C^t V_{CC}^{-1} F_C)^{-1} F_C^t V_{CC}^{-1} y_C$. The $\hat\theta_C$ plugged into $V_{CC}$ is the MLE from the computer data only, obtained in Step 1.

The prediction formula needed for computing $RSS_p$ in Step 4 is

$$\hat{Y}_B(x) = f^t \hat\beta_B + r_B^t \hat{V}_B^{-1} (y_B - F_B \hat\beta_B), \qquad (16)$$

where $r_B$ is the $(n_E + n_C) \times 1$ vector of correlations between $Y(x)$ and $Y(X_B)$. Here, $\hat\theta_B$, $\hat\beta_B$, and $\hat\gamma_E$ are the MLEs from the combined data in Step 3. Note that $\hat\theta_B$ and $\hat\gamma_E$ are needed for constructing $V_B$. Even though $x$ and $X_E$ are included in the construction of $r_B$ and $V_B$, the prediction $\hat{Y}(x)$ in (16) is not an exact interpolation because of the positive $\hat\gamma_E$.

Our approach was motivated by the iteratively re-weighted least squares method in regression analysis. The recursive, alternating estimation of the parameters is similar to the EM (expectation-maximization) algorithm. One can also view this method as a frequentist version of the Bayesian modularization approach (Liu, Bayarri, and Berger 2009). Some practical suggestions for the calibration of large-scale simulations and optimization are provided in Gramacy (2016, Section 4); however, our study was conducted independently of that work.

The advantages of the proposed Max-min algorithm are as follows. The uncertainty in the approximation of $Y$ by $\hat{Y}$ in Step 4 is smaller than that in the ordinary ANLS, because the MLE method in Step 3 and the prediction model in Step 4 use the combined data (a larger sample size), while the ordinary ANLS method uses only the computer data, in Step 1 for the MLE and in Step 2 for prediction. In addition, the Max-min algorithm accounts for the covariance between the computer data and the experimental data in building the metamodel. Moreover, our test-function experience revealed that the solutions from the Max-min algorithm are less influenced by the initial value of $\hat\tau$ than those of the ANLS method, because the estimated $\hat\tau$ in Max-min is iteratively updated several times.

3.5 Standard errors of the estimates

Once $\tau$ has been estimated, some indication of the accuracy of the estimates is generally necessary. Thus, we rely on the asymptotic theory for the nonlinear least squares estimator that appears in regression analysis (Draper and Smith 1981). The approximate 100(1 − α)% confidence region of $\tau$ is obtained by

$$\left\{ \tau : RSS_P(\tau) \leq RSS_P(\hat\tau)\left[ 1 + \frac{q}{n_E - q} F_\alpha(q, n_E - q) \right] \right\}, \qquad (17)$$

where $F_\alpha(q, n_E - q)$ is the upper $(1 - \alpha)$ percentile of the F distribution with $q$ and $n_E - q$ degrees of freedom ($q$ is the number of parameters in $\tau$). Wong, Storlie, and Lee (2017) considered a bootstrap approach to compute the uncertainty of the estimates.
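The cutoff in (17) is easy to compute numerically; a minimal sketch using scipy (the helper name is ours):

```python
import numpy as np
from scipy.stats import f

def rss_threshold(rss_min, q, n_E, alpha=0.05):
    """RSS_P cutoff defining the approximate 100(1-alpha)% confidence region (17)."""
    return rss_min * (1.0 + q / (n_E - q) * f.ppf(1.0 - alpha, q, n_E - q))

# Example: q = 2 tuning parameters, n_E = 30 experimental points
print(rss_threshold(rss_min=0.15, q=2, n_E=30))
```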
4 Toy-model simulation study

4.1 Exact computer models

In this section, we apply the methodology outlined in the previous sections to seven test functions $Y(\tau, x)$ that are easy to compute. In the first five test functions, the experimental data with sample size $n_E$ are generated by

$$y_E = Y(\tau^*, x) + e, \qquad (18)$$

where the random variable $e$ follows a normal distribution with mean zero and variance $\sigma_e^2$. These five examples assume that the functional relation between the output and the inputs is the same for both the computer code and the physical process. By contrast, in the last two test functions (6 and 7), a model bias term $b(x)$ is added to the function $Y(\tau^*, x)$ in (18). Here, $\tau^*$ stands for the true tuning parameters and is specified for each toy model. No random error is added to the computer responses. The sample sizes for both data sets are set to 30 ($n_C = n_E = 30$) for the first five test functions. For generating the computer and experimental data, the same uniform distributions were used for the random numbers of the $x$ variables. The details of the seven test functions are described in the Supplemental Material.

To address the uncertainty resulting from the designs of the input variables, we repeated the $\tau$ estimation using 30 random Latin-hypercube designs. Thus, 30 sets of estimates of $\tau$ were obtained, and the averages and standard deviations are reported. Two different GP models of (6) were used, with the correlation functions (3) and (4), which are called "Model 1" and "Model 2," respectively.

Figure 1 shows box plots of the distance to the true value for the five test functions in which Model 2 is employed as a surrogate. The Max-min algorithm generally works better than the ANLS and SMLE methods. Tables S1 to S8 in the Supplemental Material present the results for each toy model, each method, and Model 1 and Model 2. Some columns show the average of the Euclidean distances (Dist) between the true $\tau$ and the estimates, with standard deviations in parentheses. The last column shows the mean squared error (MSE) of the estimates obtained using the following formula:

$$MSE(\hat\tau) = (Dist)^2 + \sum_{i=1}^{q} \left( \mathrm{std}(\hat\tau_i) \right)^2, \qquad (19)$$

where $\mathrm{std}(\hat\tau_i)$ is the standard deviation of each estimate computed from the 30 repetitions. Figures S1 to S5 show the box plots for each test function and method. In terms of the average distance to the true value and the MSE in Tables S1 to S5, the Max-min algorithm works better than ANLS for all toy models. Max-min also works better than SMLE for test functions 1, 2, and 4. Finally, SMLE works slightly better than ANLS, except in test function 2.
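Written out, the summary (19) is simple to compute from the repeated estimates; a small sketch (our own helper, not from the paper):

```python
import numpy as np

def mse_of_estimates(tau_hats, tau_true):
    """MSE summary of (19): squared mean Euclidean distance plus the sum of variances."""
    tau_hats = np.asarray(tau_hats, dtype=float)        # shape (n_repeats, q)
    dist = np.linalg.norm(tau_hats - np.asarray(tau_true), axis=1).mean()
    return dist ** 2 + (tau_hats.std(axis=0, ddof=1) ** 2).sum()

# Example: 30 repetitions of a 2-dimensional estimate around the true value (2, 2)
rng = np.random.default_rng(0)
print(mse_of_estimates(rng.normal(loc=2.0, scale=0.2, size=(30, 2)), [2.0, 2.0]))
```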
Figure 1: Box plots of the distance to the true value for the five test functions, in which Model 2 is employed as a surrogate. The acronyms A, S, and M at the bottom stand for the ANLS, SMLE, and Max-min methods, respectively.

Figure S6 shows the typical convergence of the Max-min algorithm for 10 trials of test function 1. In most cases, the algorithm stops at the third or fourth iteration, except for a few cases wherein it stops at the sixth or seventh iteration. This means that the improvement of $\hat\tau$ in the second iteration is significant, while it may not be significant after the second iteration. The small random fluctuation on $\hat\tau$ mentioned in subsection 3.4 was not applied in this computation.

4.2 Another prediction

A potential drawback of the Max-min algorithm is that it partially fails to decouple the estimation of $\tau$ from the building of the metamodel, because it uses the combined data in both Steps 3 and 4. To address this difficulty, instead of using equation (16) in Step 4, one can employ prediction (15) but with $\hat\theta_B$ and $\hat\beta_B$ (or $\hat\beta_C$):

$$\hat{Y}(x) = f^t \hat\beta_B + r_C^t \hat{V}_{CC}^{-1} (y_C - F_C \hat\beta_B). \qquad (20)$$

This leads to another version of the Max-min algorithm, which is applied to the next toy models. Note that this version does not use $\hat\gamma_E$. We denote this prediction by $\hat{Y}_{C|B}(x)$. We could also employ $\hat\beta_C$ instead of $\hat\beta_B$ in (20), but we have not yet attempted to do so.

4.3 Inexact computer models

A computer model is often considered inexact, meaning that it does not perfectly match the real system even if the parameters included in it are optimal (Plumlee 2017). This is the computer model bias: the difference between the model and the truth. The model-bias term $b(x)$ is included in the following two test functions. These are employed to check the performance of the proposed method in cases where the functions generating the output for the computer code and for the physical experiment differ, but we incorrectly assume that (7) and (8) hold.

Test function 6 ($n_C = n_E = 20$):

$$Y(\tau, x) = \tau_1 x_1 + \tau_2 x_2$$

Computer data: $T_1 \sim U(1, \cdot)$, $T_2 \sim U(1, \cdot)$, $x_1 \sim U(0, \cdot)$, $x_2 \sim U(0, \cdot)$.
Experimental data: $y_E = Y(\tau, x) + b(x) + \varepsilon$, with $b(x) = x \sin(5x)$, $\tau_1 = 4.\cdot$, $\tau_2 = 4.\cdot$, $\sigma_E = 0.\cdot$.

Test function 7 ($n_C = n_E = 20$):

$$Y(\tau, x) = \left( 1 - \exp\left( -\frac{1}{2 x_2} \right) \right) \frac{100\,\tau_1 x_1^3 + 1900 x_1^2 + 2092 x_1 + 60}{100\,\tau_1 x_1^3 + 500 x_1^2 + 4 x_1 + 20} + 5 \exp(-\tau_2)\, \frac{x_1^{\tau_3/2}}{100\,(x_1^{\tau_3/2} + 1)}$$

Computer data: $T_1, T_2, T_3 \sim U(0.\cdot, \cdot)$, $x_1, x_2 \sim U(0, \cdot)$.
Experimental data: $y_E = Y(\tau, x) + b(x) + \varepsilon$, with $b(x) = (10 x_1^2 + 4 x_2^2)/(50 x_1 x_2 + 10)$, $\tau_1 = 2.\cdot$, $\tau_2 = 1.\cdot$, $\tau_3 = 3.\cdot$, $\sigma_E = 0.\cdot$.

Test function 6 is modified from Plumlee (2017). Test function 7 was used in Bastos and O'Hagan (2009), in Goh et al. (2013), and in Gramacy (2016). Twenty random Latin hypercube designs were used repeatedly to address the uncertainty.

In the previous exact toy-model study, we used the difference between $\hat\tau$ and the true $\tau$ of the physical process as a measure of performance. This requires that there be a true $\tau$, but there is no such true $\tau$ for an inexact computer model. In this situation, the minimum value of $RSS_p$ is a more appropriate measure.

In calculating $RSS_p$ in the Max-min algorithm for the above two test functions, we set ftol = 1.e-4 and maxagain = 7. For the small fluctuation of $\hat\tau$, random numbers from $N(0, \sigma_\tau^2)$ were used, where $\sigma_\tau = \max(\hat\tau \times 0.\cdot,\ 0.\cdot)$.
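To make the inexact-model setup concrete, the following sketch generates computer and experimental data in the pattern of (9) and test function 6. The design bounds, the bias subscripts, the true $\tau^*$, and the noise level are placeholder assumptions rather than the paper's exact constants.

```python
import numpy as np

rng = np.random.default_rng(7)
n_C = n_E = 20

def Y(T, x):
    """Test function 6 form: linear in the tuning inputs (subscripts assumed)."""
    return T[:, 0] * x[:, 0] + T[:, 1] * x[:, 1]

def b(x):
    """Model bias (discrepancy) term, cf. b(x) = x sin(5x)."""
    return x[:, 0] * np.sin(5.0 * x[:, 0])

# computer data: the tuning inputs T vary over the design; no random error
T   = rng.uniform(1.0, 5.0, (n_C, 2))        # placeholder bounds
x_C = rng.uniform(0.0, 1.0, (n_C, 2))
y_C = Y(T, x_C)

# experimental data: fixed true tau, plus bias and noise as in (9)
tau_true = np.array([4.0, 4.0])              # placeholder for the paper's tau*
x_E = rng.uniform(0.0, 1.0, (n_E, 2))
y_E = Y(np.tile(tau_true, (n_E, 1)), x_E) + b(x_E) + rng.normal(0.0, 0.1, n_E)
```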
Figure 2 shows the $RSS_p$ values for test function 6 computed from the 20 Latin hypercube designs. The $RSS_p$ values are calculated using the ANLS method and the Max-min algorithm, with GP Model 1 and Model 2, via two different predictions. The predictions by $\hat{Y}_B$ in (16) and by $\hat{Y}_{C|B}$ in (20) in calculating $RSS_p$ in the Max-min algorithm were both attempted for the sake of comparison. The Max-min algorithm worked better than ANLS in most cases. The left panel of Figure 3 is an ANOVA-type main-effect plot based on mean values, which shows improvements from ANLS to Max-min, from Model 1 to Model 2, and from $\hat{Y}_{C|B}$ to $\hat{Y}_B$. The mean values for the last two predictions (IB0 and IB1) are calculated within the Max-min algorithm.

The relative improvement (RI) from ANLS to Max-min is provided in Table 1. RI is defined by

$$RI = \frac{\overline{RSS_p}(\mathrm{ANLS}) - \overline{RSS_p}(\mathrm{Max\text{-}min})}{\overline{RSS_p}(\mathrm{ANLS})}. \qquad (21)$$
Figure 2: $RSS_p$ (residual sum of squares with prediction) values for test function 6 computed from 20 Latin hypercube designs. The $RSS_p$ values were calculated via two tuning methods with GP Model 1 and Model 2, and via two different predictions. The acronyms IB=0 and IB=1 stand for prediction by $\hat{Y}_{C|B}$ in (20) and by $\hat{Y}_B$ in (16) in the Max-min algorithm, respectively.
Figure 3: Mean values of $RSS_p$ (residual sum of squares with prediction) for test function 6 (left panel) and for test function 7 (right panel), showing the tuning methods (ANLS and Max-min), the surrogate models (M1 and M2), and the prediction formulas (IB0 and IB1). The acronyms M1 and M2 stand for GP Model 1 and Model 2; IB0 and IB1 for the predictions by $\hat{Y}_{C|B}$ and $\hat{Y}_B$ in the Max-min algorithm; and 'ibias 1' and 'ibias 0' for the cases with and without bias correction, respectively.

Table 1: Relative improvement in % defined in (21) from ANLS to the Max-min method for test functions 6 and 7. The acronyms 'ibias 1' and 'ibias 0' stand for the cases with and without bias correction, and IB = 0 and IB = 1 represent the predictions by $\hat{Y}_{C|B}$ and $\hat{Y}_B$ in the Max-min algorithm, respectively.

                         Test 6, ibias 0      Test 7, ibias 0      Test 7, ibias 1
IB                       Model 1   Model 2    Model 1   Model 2    Model 1   Model 2
0 ($\hat{Y}_{C|B}$)      16.5      27.0       24.5      61.4       23.0      69.5
1 ($\hat{Y}_B$)          23.1      39.9       52.4      37.4       76.2      74.6

For tuning test function 7, we considered a simple bias correction (BC) technique. Two constants, for additive and multiplicative corrections, were introduced into the predictor (Fernandez-Godino et al. 2016) as

$$\hat{Y}^{bc}(x) = \rho\, \hat{Y}(x) + \delta. \qquad (22)$$

This naive BC method may not be better than sophisticated methods, such as those using $\rho(x)$ and $\delta(x)$ or the approach of Kennedy and O'Hagan (2001), but we hope for it to work better
than the non-BC case. The objective function
$RSS_p$ in (11) is now changed to

$$RSS_p^{bc}(\tau, \rho, \delta) = \sum_{i=1}^{n_E} \left[ y_{Ei} - \left( \rho\, \hat{Y}(\tau, x_{Ei}) + \delta \right) \right]^2. \qquad (23)$$

Then, $\tau$, $\rho$, and $\delta$ are estimated simultaneously by minimizing $RSS_p^{bc}$ in both the ANLS and the Max-min methods.
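A minimal sketch of this joint minimization (our own, with a hypothetical stand-in emulator in place of the GP prediction $\hat{Y}$):

```python
import numpy as np
from scipy.optimize import minimize

def rss_bc(params, predict, x_E, y_E):
    """Bias-corrected objective (23); params = (tau..., rho, delta)."""
    tau, rho, delta = params[:-2], params[-2], params[-1]
    return np.sum((y_E - (rho * predict(tau, x_E) + delta)) ** 2)

# usage with a hypothetical stand-in emulator
predict = lambda tau, x: np.exp(-tau[0] * x)
x_E = np.linspace(0.0, 1.0, 20)
y_E = 0.9 * np.exp(-1.5 * x_E) + 0.2
res = minimize(rss_bc, x0=[1.0, 1.0, 0.0], args=(predict, x_E, y_E))
print(res.x)    # approximately (tau, rho, delta) = (1.5, 0.9, 0.2)
```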
Figure 4: The same as Figure 2, but for test function 7, calculated without (non-BC) and with bias correction (BC).

Figure 4 shows parallel-coordinate box plots of the $RSS_p$ values for test function 7, computed from 20 Latin hypercube designs. The left panel is for non-BC, while the right one is for BC. The acronyms in this figure are the same as in Figure 2. The figure shows that the Max-min algorithm worked better than ANLS in most cases. The right panel of Figure 3 is a main-effect plot for test function 7. It shows improvements from ANLS to Max-min, from Model 1 to Model 2, from non-BC to BC, and from $\hat{Y}_{C|B}$ to $\hat{Y}_B$. Note that the patterns of the main effects for test functions 6 and 7 are very similar. The RI from ANLS to Max-min is provided in Table 1. The bias correction was most effective when used with Model 2 and with the prediction $\hat{Y}_B$.

5 Application to a nuclear fusion simulator

A simple measure of energy efficiency in a nuclear fusion device (called a tokamak, from the Russian) is the global energy confinement time $\tau_E$. The theoretically based confinement model can be written as follows (Kaye and Goldston 1985):

$$y_E = f(\tau, P, I, N, B), \qquad (24)$$

where $f$ is a known function calculated using a complex simulation code called Baldur; $P$ is the total input power, $I$ is the plasma current, $N$ is the electron density, $B$ is the magnetic field, and $\tau = (\tau_1, \tau_2, \tau_3, \tau_4)$ are the adjustable parameters that determine energy transfer, that is, drift waves, rippling, resistive ballooning, and the critical value of $\eta_i$ (which provokes increased ion energy losses for the drift waves), respectively.

The experimental data comprise only $P$, $I$, $N$, $B$, and the real observation $y_E$, whereas the computer data comprise the eight independent variables $(T_1, T_2, T_3, T_4, P, I, N, B)$ and the computer response $y_C$ obtained using Baldur. The experimental data were drawn from the database of S. Kaye: 42 observations from the PDX (Poloidal Divertor Experiment) tokamak in Princeton, together with 64 runs from the Baldur simulator (Singer et al. 1988). Because the Baldur simulator requires five CPU minutes on a Cray supercomputer for one execution, a careful selection of input points is required, which is a statistical design problem for a complex simulation code. For this purpose, we used a data-adaptive sequential optimal design strategy, which is described briefly in the Supplemental Material.

Table 2: Tuning results from the nuclear fusion example, where the $\hat\tau_i$ are the estimates of the tuning parameters. $RSS_p$ is the residual sum of squares with predictor, in which GP Model 1 is employed.

Method     $\hat\tau_1$   $\hat\tau_2$   $\hat\tau_3$   $\hat\tau_4$   $RSS_p$
ANLS       1.012          2.035          1.110          1.308          0.4406
SMLE       1.120          2.055          0.118          1.303          0.2908
Max-min    0.667          1.053          0.477          1.823          0.1546
Table S9 in the Supplemental Material presents the MLEs of the parameters of GP Model 1 obtained from the combined data. Table 2 provides the results of the $\tau$ estimation obtained using the ANLS, SMLE, and Max-min methods on the basis of Model 1. A quasi-Newton optimization routine was employed in searching for $\hat\tau$. Several starting values were tried to avoid local minima. The last column in Table 2 shows the value of $RSS_p$ at the convergence of the algorithms. The $RSS_p$ for SMLE was obtained by calculating $RSS_p$ at the estimated $\tau$. The $RSS_p$ for Max-min with Model 1 is the smallest among the three methods.

Figures 5 and 6 show the residual plot (residuals vs. predicted values) and the confidence regions of the tuning parameters ($\tau_1$ and $\tau_2$) for the tokamak data, which were obtained using the Max-min algorithm with Model 1. In Figure 5, the predicted values for the computer data (circles) have a wider horizontal range than those for the experimental data (crosses). In addition, the residuals for the computer data have a narrower vertical range than those for the experimental data. These results indicate that the computer data are fitted relatively well, and that good coverage of the range of experimental observations is ensured.
Figure 5: Nuclear fusion example in which the computer code was tuned by the Max-min algorithm using GP Model 1. The circles and crosses denote the computer and experimental data, respectively.
6 Summary and discussion

Using a GP model, we considered an iteratively re-estimated ANLS method, the Max-min algorithm, to tune complex computer code to data. This method is an extension of the ANLS method. A simulation study using toy functions suggests that the proposed Max-min algorithm works better than the ANLS and SMLE methods. We applied this technique to a computational nuclear fusion model. The proposed method can be useful in other disciplines in which unknown theoretical parameters must be estimated using complex computer codes and real experimental data. Moreover, the Max-min algorithm is also applicable to other metamodels, such as splines (Wong, Storlie, and Lee 2017), support vector machines, and neural networks.

The Max-min algorithm requires more computing time than the ANLS method. This is because Max-min uses the combined data, resulting in a larger correlation matrix that must be factorized, and because it repeats the maximization and minimization steps to reduce $RSS_p$, compared to the ANLS method, which requires just one iteration.
Figure 6: Confidence regions of the tuning parameters ($\tau_1$, $\tau_2$) in the Baldur code for the nuclear fusion data, where the estimate is $(\hat\tau_1, \hat\tau_2) = (0.667, 1.053)$. The contours show the 90%, 95%, and 99% confidence regions (C.R.).

Two versions of predictions were tried in the last two toy models, wherein bias functions were added. The prediction based on both data sources ($\hat{Y}_B$) reduced $RSS_p$ (residual sum of squares of prediction) more than the prediction based on the computer data given the parameter estimates obtained from both data sources ($\hat{Y}_{C|B}$). Even though $\hat{Y}_B$ reduced $RSS_p$ more than $\hat{Y}_{C|B}$, whether it improves the estimation of the true $\tau$ relative to $\hat{Y}_{C|B}$ is not known. A simple bias correction was considered in test function 7. This bias correction was most effective when used with GP Model 2 and with the prediction based on both data sources.

We tried two different GP models in the toy-model study and found that the estimates of the tuning parameters changed according to the selected GP model. Model 1 in (6) is a superimposition of a GP with correlation function (3) on a first-order regression model. We believe that the first-order regression part sometimes works well to fit the dominant relationship, even though the isotropic correlation (3) may be unrealistic for capturing the variability of the responses around the dominant relationship. It is computationally easier to obtain the MLE when the number of input variables is large. Model 2, with (4) in (6), assumes a separable Gaussian correlation, which is a popular family of correlation models in the literature (Santner, Williams, and Notz 2018), but it is computationally expensive to obtain the MLE if many inputs exist.

A model selection procedure among many GP models with various combinations of nonzero $\beta_j$ and $\theta_j$ in (6), as in Marrel et al. (2008), Chen et al. (2016), and Lee and Park (2017), may lead to a better surrogate model. In this study, we did not consider the identifiability of the tuning parameters, which is an important focus of recent developments (e.g., Plumlee 2017; Tuo and Wu 2018).

There are, however, basic limitations in tuning computer code to real-world data regarding the experimental design. We found that the performance of the tuning methods depends significantly on the designs for both the computer experiments and the physical experiments. Some authors, including Cailliez, Bourasseau, and Pernot (2014) and Beck and Guillas (2016), have explored this topic. We agree that a sequential tuning approach is practically useful, as in Pratola et al. (2013), Kumar (2015), and Damblin et al. (2018). Further research on relevant designs under the sequential tuning approach will be helpful.

Acknowledgments
The authors would like to thank the reviewers and the associate editor for their helpful suggestions, which have greatly improved the presentation of this paper. This report follows some parts of an invited presentation by the third author at the GdR Mascot-Num annual conference at the Ecole des Mines de Saint-Etienne, France, in 2015. We would like to thank the organizers of this conference for the invitation and for their hospitality. We are also grateful to Professor Clifford Singer (Department of Nuclear Engineering, University of Illinois at Urbana-Champaign) for providing the tokamak data. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (no. 2016R1A2B4014518). Seo's work was funded by the Korea Meteorological Administration Research and Development Program "Enhancement of Convergence Technology of Analysis and Forecast on Severe Weather" under Grant 1365003081.
References

[1] Bastos, L.S., and A. O'Hagan. 2009. Diagnostics for Gaussian process emulators. Technometrics 51(1):425–438.

[2] Beck, J., and S. Guillas. 2016. Sequential design with mutual information for computer experiments (MICE): Emulation of a tsunami model. SIAM/ASA Journal on Uncertainty Quantification 4:739–766.

[3] Cailliez, F., A. Bourasseau, and P. Pernot. 2014. Calibration of forcefields for molecular simulation: Sequential design of computer experiments for building cost-efficient kriging metamodels. Journal of Computational Chemistry 35:130–149.

[4] Chen, H., J.L. Loeppky, J. Sacks, and W.J. Welch. 2016. Analysis methods for computer experiments: How to assess and what counts. Statistical Science 31(1):40–60.

[5] Cox, D.D., J.S. Park, and C.E. Singer. 2001. A statistical method for tuning a computer code to a data base. Computational Statistics & Data Analysis 37:77–92.

[6] Damblin, G., P. Barbillon, M. Keller, A. Pasanisi, and E. Parent. 2018. Adaptive numerical designs for the calibration of computer codes. SIAM/ASA Journal on Uncertainty Quantification 6(1):151–179.

[7] Draper, N., and H. Smith. 1981. Applied Regression Analysis. 2nd ed. New York: Wiley.

[8] Fernandez-Godino, M.G., C. Park, N.H. Kim, and R.T. Haftka. 2016. Review of multi-fidelity models. arXiv preprint arXiv:1609.07196.

[9] Goh, J., D. Bingham, J.P. Holloway, M.J. Grosskopf, C.C. Kuranz, and E. Rutter. 2013. Prediction and computer model calibration using outputs from multi-fidelity simulators. Technometrics 55(4):501–512.

[10] Gramacy, R.B. 2016. laGP: Large-scale spatial modeling via local approximate Gaussian processes in R. Journal of Statistical Software 72(1):1–46.

[11] Guillas, S., N. Glover, and L. Malki-Epshtein. 2014. Bayesian calibration of the constants of the k-epsilon turbulence model for a CFD model of street canyon flow. Computer Methods in Applied Mechanics and Engineering 279:536–553.

[12] Han, G., T.J. Santner, and J.J. Rawlinson. 2009. Simultaneous determination of tuning and calibration parameters for computer experiments. Technometrics 51:464–474.

[13] Henderson, D.A., R.J. Boys, K.J. Krishnan, C. Lawless, and D.J. Wilkinson. 2009. Bayesian emulation and calibration of a stochastic computer model of mitochondrial DNA deletions in substantia nigra neurons. Journal of the American Statistical Association 104:76–87.

[14] Higdon, D., M. Kennedy, J.C. Cavendish, J.A. Cafeo, and R.D. Ryne. 2004. Combining field data and computer simulations for calibration and prediction. SIAM Journal on Scientific Computing 26:448–466.

[15] Higdon, D., C. Nakhleh, J. Gattiker, and B. Williams. 2008. A Bayesian calibration approach to the thermal problem. Computer Methods in Applied Mechanics and Engineering 197:2431–2441.

[16] Kaye, S.M., and R.J. Goldston. 1985. Global energy confinement scaling for neutral-beam-heated tokamaks. Nuclear Fusion 25:65–69.

[17] Kennedy, M.C., and A. O'Hagan. 2001. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B 63:425–464.

[18] Kumar, A. 2015. Sequential tuning of complex computer models. Journal of Statistical Computation and Simulation 85:393–404.

[19] Lee, Y., and J.S. Park. 2017. Model selection algorithm in Gaussian process regression for computer experiments. Communications for Statistical Applications and Methods 24(4):383–396.

[20] Liu, F., M.J. Bayarri, and J.O. Berger. 2009. Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis 4(1):119–150.

[21] Marrel, A., B. Iooss, F. Van Dorpe, and E. Volkova. 2008. An efficient methodology for modeling complex computer codes with Gaussian processes. Computational Statistics & Data Analysis 52:4731–4744.

[22] Molina, G., M.J. Bayarri, and J.O. Berger. 2005. Statistical inverse analysis of a network microsimulator. Technometrics 47:388–398.

[23] Plumlee, M. 2017. Bayesian calibration of inexact computer models. Journal of the American Statistical Association 112(519):1274–1285.

[24] Pratola, M.T., and D.M. Higdon. 2016. Bayesian additive regression tree calibration of complex high-dimensional spatial-temporal processes. Technometrics 58(2):166–179.

[25] Pratola, M.T., S.R. Sain, D. Bingham, M. Wiltberger, and E.J. Rigler. 2013. Fast sequential computer model calibration of large nonstationary spatial-temporal processes. Technometrics 55(2):232–242.

[26] Sacks, J., W. Welch, T. Mitchell, and H. Wynn. 1989. Design and analysis of computer experiments (with discussion). Statistical Science 4:409–435.

[27] Santner, T.J., B. Williams, and W. Notz. 2003. The Design and Analysis of Computer Experiments. New York: Springer-Verlag.

[28] Santner, T.J., B. Williams, and W. Notz. 2018. The Design and Analysis of Computer Experiments. 2nd ed. New York: Springer-Verlag.

[29] Singer, C., D. Post, D. Mikkelsen, M. Redi, A. McKenney, A. Silverman, F.G.P. Seidl, P.H. Rutherford, R.J. Hawryluk, W.D. Langer, L. Foote, D.B. Heifetz, W.A. Houlberg, M.H. Hughes, R.V. Jensen, G. Lister, and J. Ogden. 1988. BALDUR: A one-dimensional plasma transport code. Computer Physics Communications 49:275–398.

[30] Trucano, T.G., L.P. Swiler, T. Igusa, W.L. Oberkampf, and M. Pilch. 2006. Calibration, validation, and sensitivity analysis: What's what. Reliability Engineering & System Safety 91:1331–1357.

[31] Tuo, R., and C.F.J. Wu. 2018. Prediction based on the Kennedy-O'Hagan calibration model: Asymptotic consistency and other properties. Statistica Sinica 28:743–759.

[32] Wong, R., C. Storlie, and T. Lee. 2017. A frequentist approach to computer model calibration. Journal of the Royal Statistical Society: Series B 79:635–648.

Supplemental Material

A.1 Maximum likelihood estimation in Gaussian process model
Once the data have been collected at the observation sites $\{x_1, \ldots, x_n\}$, we use the maximum likelihood estimation (MLE) method to estimate the parameters in the linear model part and the covariance function. Since we assume that $y(x)$ is a Gaussian process with mean $F\beta$ and covariance matrix $\sigma^2 R$, the likelihood function of $y$ is

$$L(y; \theta, \beta, \sigma^2, \gamma_C, x) = \frac{(2\pi\sigma^2)^{-n/2}}{\sqrt{|V|}} \exp\left( -\frac{(y - F\beta)^t V^{-1} (y - F\beta)}{2\sigma^2} \right), \qquad (25)$$

where $F$ is the so-called design matrix. When the covariance parameters $\theta$ and $\gamma_C$ are specified, the MLEs of $\beta$ and $\sigma^2$ are

$$\hat\beta = (F^t V^{-1} F)^{-1} F^t V^{-1} y, \qquad \hat\sigma^2 = \frac{1}{n} (y - F\hat\beta)^t V^{-1} (y - F\hat\beta). \qquad (26)$$

Here, $\hat\beta$ is the generalized least squares estimator of $\beta$. Since the likelihood equations do not lead to a closed-form solution, a numerical optimization procedure is required. The Cholesky decomposition $V = U^t U$, where $U$ is an upper triangular Cholesky factor, is used as the major computation in calculating the likelihood function. The computational details of calculating and minimizing the negative log-likelihood function are provided in Park and Baek (2001). One can also use the R package DiceKriging (Roustant, Ginsbourger, and Deville 2012).
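A compact sketch of this computation (ours, not the DiceKriging implementation): the concentrated negative log-likelihood with $\hat\beta$ and $\hat\sigma^2$ of (26) profiled out, evaluated via a Cholesky factorization, assuming a Gaussian correlation and no nugget ($\gamma = 0$).

```python
import numpy as np

def neg_log_lik(theta, X, y, F):
    """Concentrated negative log-likelihood of the GP model, cf. (25)-(26),
    up to an additive constant. theta: Gaussian correlation parameters as in (4);
    X: (n, d) sites; y: (n,) responses; F: (n, k) regression design matrix."""
    n = len(y)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    R = np.exp(-(d2 * theta).sum(axis=-1)) + 1e-10 * np.eye(n)  # jitter for stability
    C = np.linalg.cholesky(R)             # numpy returns the lower-triangular factor
    solve_R = lambda b: np.linalg.solve(C.T, np.linalg.solve(C, b))
    beta = np.linalg.solve(F.T @ solve_R(F), F.T @ solve_R(y))  # GLS estimate, as in (26)
    resid = y - F @ beta
    sigma2 = resid @ solve_R(resid) / n   # profiled process variance, as in (26)
    return 0.5 * (n * np.log(sigma2) + 2.0 * np.log(np.diag(C)).sum())

# small usage: first-order regression part, as in Model 1 of (6)
rng = np.random.default_rng(0)
X = rng.uniform(size=(15, 2)); y = np.sin(X[:, 0]) + X[:, 1]
F = np.column_stack([np.ones(15), X])
print(neg_log_lik(np.array([1.0, 1.0]), X, y, F))
```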
A.2 Data structure for code tuning
A.2.1 Computer and experimental data
For notational convenience, experimental data are denoted by the subscript "E" and computer simulation data by the subscript "C". Let $\tau$ be an adjustable parameter vector to be estimated, and let $T$ be the input variables of the computer code corresponding to $\tau$. Here, $\tau$ is a vector of deterministic tuning parameters, and $T$ is a vector of random variables. The original experimental input variables are denoted by $X$. Let $q$ and $p$ be the dimensions of $\tau$ and $X$, respectively. Further, let $n_C$ and $n_E$ be the numbers of observations. Then we have the data matrices of the independent variables, $X_C$ and $X_E$, for the computer and experimental data:

$$X_E = \begin{pmatrix} \tau_1 & \tau_2 & \cdots & \tau_q & x^E_{11} & x^E_{21} & \cdots & x^E_{p1} \\ \tau_1 & \tau_2 & \cdots & \tau_q & x^E_{12} & x^E_{22} & \cdots & x^E_{p2} \\ \vdots & & & & \vdots & & & \vdots \\ \tau_1 & \tau_2 & \cdots & \tau_q & x^E_{1 n_E} & x^E_{2 n_E} & \cdots & x^E_{p n_E} \end{pmatrix} \qquad (27)$$

$$X_C = \begin{pmatrix} t_{11} & t_{21} & \cdots & t_{q1} & x^C_{11} & x^C_{21} & \cdots & x^C_{p1} \\ t_{12} & t_{22} & \cdots & t_{q2} & x^C_{12} & x^C_{22} & \cdots & x^C_{p2} \\ \vdots & & & & \vdots & & & \vdots \\ t_{1 n_C} & t_{2 n_C} & \cdots & t_{q n_C} & x^C_{1 n_C} & x^C_{2 n_C} & \cdots & x^C_{p n_C} \end{pmatrix} . \qquad (28)$$

Here, $t_{ij}$ in $X_C$ represents the $j$-th value of the $i$-th $T$ variable ($T_i$), and $x^E_{ij}$ and $x^C_{ij}$ denote the $j$-th value of the $i$-th $X$ variable of the experimental ($X_{Ei}$) and computer ($X_{Ci}$) inputs. $X_E$
is an $n_E \times (q + p)$ matrix and $X_C$ is an $n_C \times (q + p)$ matrix. Note that the first part of $X_E$ is composed of the unknown parameters $\tau_1, \cdots, \tau_q$, while the corresponding part of $X_C$ comprises input values ($t_{ij}$).

A.2.2 Combined data
The following notation for the combined computer and experimental data is introduced:

$$X_B = \begin{pmatrix} X_C \\ X_E \end{pmatrix}, \quad F_B = \begin{pmatrix} F_C \\ F_E \end{pmatrix} = \begin{pmatrix} f(X_C) \\ f(X_E) \end{pmatrix}, \quad y_B = \begin{pmatrix} y_C \\ y_E \end{pmatrix} \qquad (29)$$

for the data matrix of the independent variables; the so-called design matrix, defined as functions of the values of the input variables; and the computer responses and real observations, respectively. Here, the subscript "B" indicates the combined ("both") computer and experimental data. Note that the variables $T$ are incorporated in the simulation code as design sites. $X_C$ and $F_C$ contain $T$, while $X_E$ and $F_E$ are functions of the unknown parameters $\tau$.

The Gaussian process model is now simultaneously applied to the computer and experimental data. Let $\eta = (\tau, \theta, \gamma_C, \gamma_E, \sigma^2, \beta)$, where $\gamma_C = \sigma^2_{\epsilon C}/\sigma^2$ and $\gamma_E = \sigma^2_{\epsilon E}/\sigma^2$ are the variance ratios for the computer and experimental data. Here, $\sigma^2_{\epsilon C}$ and $\sigma^2_{\epsilon E}$ are the variances of the error term $\epsilon$ in the Gaussian process model for the computer and experimental data, respectively. When necessary, $\beta_C$ and $\beta_E$ are used to denote the regression coefficients for the computer and real experimental data. Then, given the independence and normality assumptions, we have

$$\mathrm{Law}(y_B \mid \eta) = N(F_B \beta_B, V_B), \qquad (30)$$

where

$$\beta_B = (\beta_C, \beta_E)^t, \qquad (31)$$

$$V_B = \begin{pmatrix} V_{CC} & V_{CE} \\ V_{EC} & V_{EE} \end{pmatrix} = \sigma^2 \begin{pmatrix} R(X_C, X_C) & R(X_C, X_E) \\ R(X_E, X_C) & R(X_E, X_E) \end{pmatrix} + \sigma^2 \begin{pmatrix} \gamma_C I & 0 \\ 0 & \gamma_E I \end{pmatrix}, \qquad (32)$$

where $R(X_C, X_E)$ represents the $n_C \times n_E$ matrix composed of the correlations computed between $X_C$ and $X_E$. Note that $V_B$ is an $n_B \times n_B$ positive definite covariance matrix for the combined data, where $n_B = n_C + n_E$. We set $\gamma_C = 0$ because only a deterministic computer model is considered in this study.
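Assembling $V_B$ of (32) from the correlation blocks is mechanical; a small numpy sketch (function name ours):

```python
import numpy as np

def combined_cov(R_CC, R_CE, R_EE, sigma2, gamma_E, gamma_C=0.0):
    """Assemble the combined covariance V_B of (32) from correlation blocks.

    gamma_C = 0 corresponds to a deterministic computer code, as in this study.
    """
    n_C, n_E = R_CC.shape[0], R_EE.shape[0]
    top = np.hstack([R_CC + gamma_C * np.eye(n_C), R_CE])
    bottom = np.hstack([R_CE.T, R_EE + gamma_E * np.eye(n_E)])
    return sigma2 * np.vstack([top, bottom])

# toy usage with 3 computer and 2 experimental points
R = np.exp(-np.abs(np.subtract.outer(np.arange(5.0), np.arange(5.0))))
V_B = combined_cov(R[:3, :3], R[:3, 3:], R[3:, 3:], sigma2=1.0, gamma_E=0.1)
```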
A.3 Separated MLE for code tuning

For the details of the SMLE, we make use of the conditional distribution of the experimental data given the computer data, which is normally distributed with mean

$$\mu_{E|C} = E[y_E \mid y_C; \tau, \eta] = F_E \beta_E + V_{CE}^t V_{CC}^{-1} (y_C - F_C \beta_C) \qquad (33)$$

and covariance

$$V_{E|C} = \mathrm{Cov}[y_E \mid y_C; \tau, \eta] = V_{EE} - V_{CE}^t V_{CC}^{-1} V_{CE}, \qquad (34)$$

where the covariance matrices $V$ are given as in (32). In these formulae, we suppressed the parameter dependencies in $\mu_E = F_E \beta_E$, $\mu_C = F_C \beta_C$, $V_{CE}$, $V_{CC}$, and $V_{EE}$. Now, twice the negative log-likelihood (up to an additive constant), with $\hat\beta$ and $\hat\sigma^2_{E|C}$ plugged in, is

$$-2 \log L(\tau, \gamma_E; X_E, y_E \mid y_C, \hat\gamma_C, \hat\beta_C, \hat\theta, X_C) = n_E \log \hat\sigma^2_{E|C} + \log |V_{E|C}|, \qquad (35)$$

where

$$\hat\sigma^2_{E|C} = (y_E - \hat\mu_{E|C})^t V_{E|C}^{-1} (y_E - \hat\mu_{E|C})/n_E, \qquad (36)$$

$$\hat\mu_{E|C} = F_E \hat\beta_E + V_{CE}^t V_{CC}^{-1} (y_C - F_C \hat\beta_C). \qquad (37)$$
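The conditional moments (33) and (34) are standard Gaussian conditioning formulas; a minimal sketch (names ours):

```python
import numpy as np

def conditional_moments(mu_C, mu_E, V_CC, V_CE, V_EE, y_C):
    """Conditional mean (33) and covariance (34) of y_E given y_C."""
    W = np.linalg.solve(V_CC, V_CE)        # V_CC^{-1} V_CE
    mu_EgC = mu_E + W.T @ (y_C - mu_C)
    V_EgC = V_EE - V_CE.T @ W
    return mu_EgC, V_EgC
```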
A.4 Test functions for toy model study

Test function 1 ($n_C = n_E = 30$ for all five test functions):

$$Y(\tau, x) = \tau_1 \exp(\tau_2 + x_1) + \tau_2 x_2 - \tau_1 x_3$$

Computer data: $T_1 \sim U(0, \cdot)$, $T_2 \sim U(0, \cdot)$, $x_1 \sim U(-\cdot, \cdot)$, $x_2 \sim U(-\cdot, \cdot)$, $x_3 \sim U(0, \cdot)$.
Experimental data: $\tau_1 = 2$, $\tau_2 = 2$, $\sigma_E = 1$.

Test function 2:

$$Y(\tau, x) = \tau_1 \exp(\tau_2 + x_1 + \tau_3) + \tau_1 \tau_2 x_2 - \tau_3 x_3 - \tau_2 \log(x_4)$$

Computer data: $T_1 \sim U(0, \cdot)$, $T_2 \sim U(0, \cdot)$, $T_3 \sim U(1, \cdot)$, $x_1 \sim U(-\cdot, \cdot)$, $x_2 \sim U(-\cdot, \cdot)$, $x_3 \sim U(0, \cdot)$, $x_4 \sim U(1, \cdot)$.
Experimental data: $\tau_1 = 2$, $\tau_2 = 1$, $\tau_3 = 3$, $\sigma_E = 1$.

Test function 3:

$$Y(\tau, x) = \tau_1 \exp(|x_1 + x_2|) + \tau_2 (x_3 + 1.\cdot\, x_4 + 1)/(\cdot\, \tau_1 x_3 + x_4)$$

Computer data: $T_1 \sim U(0, \cdot)$, $T_2 \sim U(1, \cdot)$, $x_1 \sim U(-\cdot, \cdot)$, $x_2 \sim U(-\cdot, \cdot)$, $x_3 \sim U(-\cdot, \cdot)$, $x_4 \sim U(-\cdot, \cdot)$.
Experimental data: $\tau_1 = 2$, $\tau_2 = 3$, $\sigma_E = 0.\cdot$.

Test function 4 (borehole form):

$$Y(\tau, x) = \frac{\tau_1 x_1 (x_2 - x_3)}{\log(x_4/x_5)\left( 1 + \dfrac{\tau_2 x_6 x_1}{\log(x_4/x_5)\, x_5^2 x_7} + \dfrac{x_1}{x_8} \right)}$$

Computer data: $T_1 \sim U(5, \cdot)$, $T_2 \sim U(1, \cdot)$, $x_1 \sim U(6370, \cdot)$, $x_2 \sim U(990, \cdot)$, $x_3 \sim U(700, \cdot)$, $x_4 \sim U(100, \cdot)$, $x_5 \sim U(0.\cdot, \cdot)$, $x_6 \sim U(1120, \cdot)$, $x_7 \sim U(9855, \cdot)$, $x_8 \sim U(63.\cdot, \cdot)$.
Experimental data: $\tau_1 = 2\pi$, $\tau_2 = 2$, $\sigma_E = 2$.

Test function 5:

$$Y(\tau, x) = \tau_1 x_1 + \tau_2 x_2 + \tau_3 \cos(x_3 \pi) + \tau_4 \sin(x_4 \pi)$$

Computer data: $T_1, T_2, T_3, T_4 \sim U(0, \cdot)$, $x_1, x_2, x_3, x_4 \sim U(0, \cdot)$.
Experimental data: $\tau_1 = 1$, $\tau_2 = 2$, $\tau_3 = 3$, $\tau_4 = 2$, $\sigma_E = 4$.

Test function 4 was used in Morris and Mitchell (1995); it has a physical interpretation in which $y_C$ represents the steady-state flow of water through a borehole between two aquifers. In each test function, $n_C$ values of $(T, x)$ were selected as inputs for the "computer code"; that is, the function $Y(T, x)$ was evaluated at $n_C$ values. The inputs were chosen to be well spread over a reasonable space known to potentially contain the true parameter value. The $n_C$ computer data points and $n_E$ experimental data points were generated using random Latin-hypercube designs, except that the $\tau$ values were used instead of $T$ for the experimental data. In a real application, one would consider more sophisticated designs, such as the data-adaptive sequential optimal experiments described in the next section.

A.5 Sequential designs in tuning a nuclear fusion simulator
For a given Gaussian process model, the A-optimal design is obtained by minimizing the integrated mean squared error of prediction (MSEP) with respect to a design $S$ (Sacks et al. 1989):

$$ \mathrm{IMSE}_S(\hat Y(x)) = \int_Q \mathrm{MSEP}(\hat Y(x))\, d\mu(x), \qquad (38) $$

where $Q$ is the design region and $\mu$ is a "weight function," which may be the empirical measure of uniformly distributed random points. Note that neither the MSEP nor the IMSE depends on the unknown parameters $\beta$ and $\sigma^2$; both depend only on $\theta$, $\gamma_C$, and the design $S$. This makes it possible to design an experiment (for specified values of $\theta$ and $\gamma_C$) before taking the data.

Because $\theta$ is generally not available at the initial design stage, in our example we used a rough estimate of $\theta$, based on a previous similar work, given by a Baldur specialist. Initially, we found 10 optimal design points for the eight variables $(T, P, I, N, B)$ that minimize the IMSE over the design region $Q$, for prespecified values of $\theta$ and $\gamma_C$.
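Before the update steps are listed, the following sketch shows one way to approximate the IMSE criterion (38) by Monte Carlo over $Q = [0,1]^d$, using the simple-kriging (zero-mean) form of the MSEP. The regression correction of universal kriging is omitted, and the Gaussian correlation and all function names are our assumptions.

```python
import numpy as np

def gauss_corr(X1, X2, theta):
    # Gaussian correlation; an illustrative stand-in for the paper's family.
    return np.exp(-theta * ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2))

def imse(S, sigma2, theta, n_mc=2000, seed=0):
    """Approximate Eq. (38): average the simple-kriging MSEP over uniform
    Monte Carlo points in Q = [0, 1]^d (gamma_C = 0, so no nugget)."""
    d = S.shape[1]
    Q = np.random.default_rng(seed).random((n_mc, d))  # empirical measure on Q
    V = sigma2 * gauss_corr(S, S, theta)               # covariance at design S
    v = sigma2 * gauss_corr(Q, S, theta)               # cross-covariances
    msep = sigma2 - np.einsum('ij,ij->i', v, np.linalg.solve(V, v.T).T)
    return msep.mean()
```

An optimal design would then be sought by minimizing imse(S, ...) over candidate designs $S$, for example with an exchange algorithm. The sequential procedure proceeds as follows.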
Step 1. Collect computer observations based on the given optimal design ($n_1$ points, say).
Step 2. Find an appropriate model and estimates of the parameters ($\theta, \beta, \sigma^2, \gamma_C$) from the computer data.
Step 3. Check the MMSE, and stop constructing the next-stage design if the MMSE is smaller than a preassigned target value. Otherwise, go to the next step.
Step 4. Use the estimates and model found in Step 2 to choose the next-stage optimal design ($n_2$ points, say) under the condition that the previous design is given (i.e., add $n_2$ more points to the previous design to make $n_1 + n_2$ points).

Step 5. Collect the $n_2$ additional observations, and go to Step 2.

The MMSE (maximum mean squared error of prediction) is defined as

$$ \mathrm{MMSE}_S(\hat Y(x)) = \max_{x_i \in Q} \mathrm{MSEP}(\hat Y(x_i)), \qquad (39) $$

where $x_i$, $i = 1, 2, \ldots, K$, are $d$-dimensional random vectors. Note that the MMSE is used here as a measure of the accuracy of a given prediction model.
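Steps 2-5 can be organized as a simple loop around the MMSE stopping rule (39). In the sketch below, the conditional optimal-design search of Step 4 is replaced by a greedy rule that adds the candidate points with the largest current MSEP, and the re-estimation of Step 2 and the simulator runs of Steps 1 and 5 are left abstract; all names are ours, not the paper's.

```python
import numpy as np

def gauss_corr(X1, X2, theta):
    # Gaussian correlation, as in the previous sketch.
    return np.exp(-theta * ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2))

def msep_at(Q, S, sigma2, theta):
    # Simple-kriging MSEP at each row of Q for design S (gamma_C = 0).
    V = sigma2 * gauss_corr(S, S, theta)
    v = sigma2 * gauss_corr(Q, S, theta)
    return sigma2 - np.einsum('ij,ij->i', v, np.linalg.solve(V, v.T).T)

def sequential_design(S, sigma2, theta, target, n2=5, max_stages=10, seed=0):
    """Skeleton of Steps 2-5: stop once the MMSE of Eq. (39) falls below
    `target`; otherwise add the n2 candidate points of largest current MSEP
    (a greedy stand-in for the conditional optimal-design search of Step 4).
    Re-estimating (theta, beta, sigma2) at Step 2 and collecting simulator
    output at Step 5 are omitted for brevity."""
    rng = np.random.default_rng(seed)
    for _ in range(max_stages):
        Q = rng.random((5000, S.shape[1]))           # random x_i in Q = [0, 1]^d
        m = msep_at(Q, S, sigma2, theta)
        if m.max() < target:                         # Step 3: MMSE small enough
            break
        S = np.vstack([S, Q[np.argsort(m)[-n2:]]])   # Step 4: augment the design
    return S
```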
References

[1] Morris, M.D., and T.J. Mitchell. 1995. Exploratory designs for computational experiments. Journal of Statistical Planning and Inference 43:381-402.
[2] Park, J.S., and J. Baek. 2001. Efficient computation of maximum likelihood estimators in a spatial linear model with power exponential covariogram. Computers and Geosciences 27:1-7.
[3] Roustant, O., D. Ginsbourger, and Y. Deville. 2012. DiceKriging and DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. Journal of Statistical Software 51(1):1-55.
[4] Sacks, J., W. Welch, T. Mitchell, and H. Wynn. 1989. Design and analysis of computer experiments (with discussion). Statistical Science 4:409-435.
[5] Santner, T.J., B. Williams, and W. Notz. 2003. The Design and Analysis of Computer Experiments. New York: Springer-Verlag.

Figure 7: Box plot of the tuning parameter estimates (ˆτ) for test function 1, obtained from 30 Latin-hypercube design experiments using Gaussian process Model 1. The horizontal dotted line denotes the true value.

Table 3: Results for test functions 1, 3, and 4 with Models 1 and 2. The standard deviation (SD) computed from 30 Latin-hypercube repetitions is given in parentheses.

Method  | Average of ˆτ1 (SD) | Average of ˆτ2 (SD) | Average distance to true value (SD) | MSE
Test function 1 (τ1 = 2, τ2 = 2), Model 1:
ANLSE   | 1.609 (0.207) | 1.773 (0.246) | 0.527 (0.168) | 0.381
SMLE    | 1.790 (0.245) | 1.758 (0.275) | 0.462 (0.147) | 0.349
Max-min | 2.296 (0.135) | 1.966 (0.224) | 0.377 (0.109) | 0.211
Test function 1 (τ1 = 2, τ2 = 2), Model 2:
ANLSE   | 1.448 (0.249) | 1.602 (0.351) | 0.749 (0.288) | 0.746
SMLE    | 2.228 (0.409) | 1.465 (0.267) | 0.703 (0.280) | 0.732
Max-min | 2.382 (0.399) | 2.151 (0.575) | 0.718 (0.342) | 1.005
Test function 3 (τ1 = 2, τ2 = 3), Model 1:
ANLSE   | 2.000 (0.276) | 2.984 (0.264) | 0.370 (0.089) | 0.283
SMLE    | 2.038 (0.217) | 2.980 (0.183) | 0.249 (0.136) | 0.143
Max-min | 2.009 (0.228) | 2.985 (0.194) | 0.258 (0.146) | 0.146
Test function 4 (τ1 = 2π, τ2 = 2), Model 1:
ANLSE   | 5.990 (0.442) | 2.078 (0.140) | 0.494 (0.239) | 0.460
SMLE    | 6.016 (0.459) | 2.055 (0.135) | 0.485 (0.251) | 0.464
Max-min | 6.120 (0.426) | 2.109 (0.122) | 0.420 (0.240) | 0.373

Table 4: Results for test function 2 with Model 1 when τ1 = 2, τ2 = 1, τ3 = 3. The standard deviation (SD) computed from 30 Latin-hypercube repetitions is given in parentheses.

Method  | Average of ˆτ1 (SD) | Average of ˆτ2 (SD) | Average of ˆτ3 (SD) | Average distance to true value (SD) | MSE
ANLSE   | 1.801 (0.324) | 1.349 (0.225) | 3.001 (0.319) | 0.598 (0.232) | 0.615
SMLE    | 1.737 (0.565) | 1.080 (0.527) | 2.957 (0.519) | 0.873 (0.398) | 1.637
Max-min | 1.706 (0.293) | 0.842 (0.127) | 3.003 (0.435) | 0.535 (0.327) | 0.577

Table 5: Results for test function 5 with Model 1 when τ1 = 1, τ2 = 2, τ3 = 3, τ4 = 2. The standard deviation (SD) computed from 30 repetitions is given in parentheses.

Method  | Average of ˆτ1 (SD) | Average of ˆτ2 (SD) | Average of ˆτ3 (SD) | Average of ˆτ4 (SD) | Average distance to true value (SD) | MSE
ANLSE   | 0.935 (0.237) | 1.753 (0.396) | 3.044 (0.908) | 1.951 (0.610) | 1.025 (0.628) | 2.460
SMLE    | 0.845 (0.291) | 1.975 (0.758) | 2.875 (0.532) | 1.852 (0.607) | 1.069 (0.437) | 2.453
Max-min | 0.562 (0.261) | 2.155 (0.472) | 3.087 (0.549) | 2.049 (0.560) | 0.955 (0.291) | 1.818

Table 6: Results for test function 2 with Model 2 when τ1 = 2, τ2 = 1, τ3 = 3. The standard deviation (SD) computed from 30 repetitions is given in parentheses.

Method  | Average of ˆτ1 (SD) | Average of ˆτ2 (SD) | Average of ˆτ3 (SD) | Average distance to true value (SD) | MSE
ANLSE   | 1.715 (0.244) | 1.107 (0.322) | 3.034 (0.344) | 0.570 (0.209) | 0.606
SMLE    | 1.912 (0.347) | 0.901 (0.381) | 2.858 (0.416) | 0.585 (0.353) | 0.781
Max-min | 1.768 (0.231) | 1.106 (0.314) | 3.054 (0.283) | 0.511 (0.176) | 0.493
Table 7: Results for test function 5 with Model 2 when τ1 = 1, τ2 = 2, τ3 = 3, τ4 = 2. The standard deviation (SD) computed from 30 repetitions is given in parentheses.

Method  | Average of ˆτ1 (SD) | Average of ˆτ2 (SD) | Average of ˆτ3 (SD) | Average of ˆτ4 (SD) | Average distance to true value (SD) | MSE
ANLSE   | 0.874 (0.250) | 1.852 (0.414) | 2.765 (0.846) | 1.729 (0.672) | 1.149 (0.456) | 2.721
SMLE    | 1.008 (0.515) | 1.895 (0.895) | 2.740 (0.475) | 1.816 (0.953) | 1.398 (0.530) | 4.154
Max-min | 0.600 (0.230) | 2.319 (0.549) | 3.114 (0.749) | 2.104 (0.407) | 1.095 (0.362) | 2.280

Table 8: Maximum likelihood estimates of the parameters of Gaussian process Model 1 obtained using both the computer and experimental data for PDX nuclear fusion.

Symbol | Description                          | Estimate
n_E    | Sample size of the experimental data | 42
n_C    | Sample size of the computer data     | 64
θ      | Parameter for the covariance         | 0.980
β      | Regression coefficient (intercept)   | 0.025
β      | Regression coefficient for T1        | -0.027
β      | Regression coefficient for T2        | -0.010
β      | Regression coefficient for T3        | ·
β      | Regression coefficient for T4        | -0.015
β      | Regression coefficient for P         | -0.031
β      | Regression coefficient for I         | ·
β      | Regression coefficient for N         | -0.012
β      | Regression coefficient for B         | ·
σ²     | Variance of Y                        | ·
γ_E    | Variance ratio σ²_{εE}/σ²            | ·

Figure 8: Same as Figure S1 but for test function 2.

Figure 9: Same as Figure S1 but for test function 3.

Figure 10: Same as Figure S1 but for test function 4.

Figure 11: Same as Figure S1 but for test function 5.
Figure: Convergence of the RSS against the iteration number for data sets 1-10.