Bayes Linear Emulation of Simulator Networks
Using Bayes Linear Emulators to Analyse Networks of Simulators
Samuel E. Jackson∗, David C. Woods
Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, UK
October 18, 2019
Abstract
The key dynamics of processes within physical systems are often represented in the form of computer models, or simulators. Often, the overarching system of interest, comprising multiple processes, may be represented as a network of simulators, with some of the inputs to some simulators arising from the outputs of other simulators. Much of the computational statistics literature focusses on approximating computationally intensive simulators using statistical emulators. An important, yet underexplored, question in this literature is: can emulating the individual simulators within a network be more powerful than emulating the composite simulator network as a single simulator? In this article we present two novel approaches for linking Bayes linear emulators of several component simulators together to obtain an approximation for the composite simulator, comparing these ideas to approximating the composite simulator using a single emulator. These techniques, termed the Bayes linear sampling approach and the Uncertain Input Bayes linear emulator, are demonstrated on two illustrative simulated examples, and then applied to an important dispersion dose-response chain of simulators used for disease modelling.
1 Introduction

Physical processes of scientific interest are often represented in the form of computer models, or simulators. Such simulators aim to represent the key kinetics and dynamics of a physical system using, for example, sets of differential equations. Such a representation can aid the understanding of the behaviour of the physical system itself. Scientists are frequently interested in complex physical systems which can be thought of in terms of simpler component processes. In this case, it is common for a simulator of the overall physical system to be composed of a chain or network of less complex simulators, each representing one of these component processes. For example, the disease modelling application of this article involves an atmospheric dispersion model of a biological agent linked to a dose-response model, representing how the population will respond to contact with certain doses of the agent. The chaining of simulators also exists in many other important applications, including climate modelling [36] and volcanic eruption modelling [2].

∗ [email protected]

For any individual component simulator, there are various sources of uncertainty arising as a result of the modelling procedure [15]. In particular, there is often substantial uncertainty resulting from uncertain knowledge of simulator behaviour across the vast majority of the input space, since the computational intensity of running the simulator even once prohibits arbitrarily large numbers of runs. Powerful analytical tools, such as emulators, are often used to aid the analysis of these computationally intensive simulators. Bayes linear emulators are fast statistical approximations of simulators, built using a set of simulator runs across the input space. These emulators provide an expected value of simulator output at any input, along with a corresponding uncertainty estimate reflecting our beliefs about the uncertainty in the approximation [13, 39].
The key advantage of these emulators is their computational efficiency, allowing them to be used as surrogates for the simulators themselves for the purposes of inference and analysis.

An important, yet rather underexplored, question in the literature is: can combining emulators for the individual simulators within a network be more powerful than emulating the network as a single composite simulator? In this article we present two novel approaches for linking Bayes linear emulators of several component simulators together to obtain an approximation for the composite simulator, comparing these ideas to approximating the composite simulator using a single emulator. We demonstrate the discussed techniques on an illustrative simulated example throughout the paper, before applying them to a scientifically relevant chain of simulators used in epidemiology.

The epidemiological chain that we consider consists of an anthrax dispersion model d linked through to a dose-response model ρ (based on [19]), which, as illustrated by the graphical representation of Figure 1, can be represented by the composite simulator h(x) = ρ(d(x)). The dispersion model simulates the dispersion behaviour of a biological agent that has been released across a specific region. Input parameters of interest to this simulator correspond to the physical quantities wind speed (x_WS), wind direction (x_WD) and source mass (x_SM). Since interest in this article is in linking simulators together as a network, for simplicity we take the output of the dispersion model to be the dose of the agent at a single specific location, although dose across a spatial domain could also be incorporated. The dose-response simulator ρ takes dose as input z and outputs a resulting proportion of casualties ρ(z). Whilst the two aspects of this overall epidemiological system are modelled by two different groups of experts, there is overarching interest in the effect of the release conditions on the proportion of casualties.
Figure 1: Graphical representation of the dispersion dose-response network of simulators h(x) = ρ(d(x)), where d represents the dispersion model, and ρ represents the dose-response model.

1.1 Bayes Linear Statistics

In this article, we focus on the Bayes linear approach to analysis (and hence emulation; see Section 1.2). The Bayes linear approach [20, 32, 13, 16] to statistical inference takes expectation as primitive, following De Finetti [8, 9, 40], and deals with second-order belief specifications (that is, expectations, variances and covariances) of observable quantities. Probabilities can be represented as the expectation of the corresponding indicator function when required. More precisely, suppose that there are two collections of random quantities, B = (B_1, ..., B_r) and D = (D_1, ..., D_s). Bayes linear analysis involves updating subjective beliefs about B given observation of D. In order to do so, prior mean vectors and covariance matrices for B and D (that is, E[B], E[D], Var[B] and Var[D]), along with a covariance matrix between B and D (that is, Cov[B, D]), must be specified. Second-order beliefs about B can be adjusted in the light of D using the Bayes linear update formulae:

E_D[B] = E[B] + Cov[B, D] Var[D]^{-1} (D − E[D]),    (1)
Var_D[B] = Var[B] − Cov[B, D] Var[D]^{-1} Cov[D, B],    (2)
Cov_D[B_1, B_2] = Cov[B_1, B_2] − Cov[B_1, D] Var[D]^{-1} Cov[D, B_2].    (3)

E_D[B] and Var_D[B] are termed the adjusted expectation and variance of B given D [16]. Cov_D[B_1, B_2] is termed the adjusted covariance of B_1 and B_2 given D, where B_1 and B_2 are subcollections of B. For a more detailed overview of Bayes linear methods, see [13], and for a thorough treatment, see [16]. For a comparison of Bayes linear methods with the full Bayesian approach, see, for example, [16, 13, 38].

1.2 Bayes Linear Emulation

We consider a simulator represented by the function f, which takes a vector x ∈ X ⊂ R^p of input parameters, and outputs a vector f(x). We choose to represent our beliefs about the behaviour of any scalar simulator output component f(x) ∈ f(x), for any input x, in the following form [14]:

f(x) = t(x)^T β + u(x) = Σ_{j=1}^m β_j t_j(x) + u(x),    (4)

where t(x) is a vector of known basis regression functions of x, β is a vector of unknown regression coefficients, and u(x) is a second-order weakly stationary stochastic process.

Suppose F = (f(x^(1)), ..., f(x^(n)))^T represents simulator output evaluated at n simulator input locations X_F = {x^(1), ..., x^(n)}. We can adjust a second-order prior belief specification about f(x) across X using the Bayes linear update Equations (1)-(3) to obtain an emulator prediction for simulator output at a new x, given by

E_F[f(x)] = E[f(x)] + Cov[f(x), F] Var[F]^{-1} (F − E[F]),    (5)

along with a measure of the uncertainty in the prediction, given by

Var_F[f(x)] = Var[f(x)] − Cov[f(x), F] Var[F]^{-1} Cov[F, f(x)].    (6)

In Equations (5) and (6), the notation E_F[f(x)] and Var_F[f(x)] reflects the fact that we have adjusted our prior beliefs about f(x) by simulator runs F. The terms on the right-hand side correspond to the simulator runs F and a second-order prior belief specification for f(x) across X, which can be derived from the assumed form given by Equation (4) and a second-order prior belief specification across the collection {β, u(x) : x ∈ X}. Prior specification, as for a full Bayesian analysis, is not trivial. It is common, however, to assume E[u(x)] = 0, Cov[β, u(x)] = 0, and a covariance structure between the values of u(x) and u(x′), at two inputs x and x′, of the following form:

Cov[u(x), u(x′)] = σ² c(x, x′),    (7)

where c(x, x′) represents a stationary correlation function, for example

c(x, x′) = exp( − Σ_{k=1}^p {(x_k − x′_k) / θ_k}² ).    (8)

The form of Equation (8) is known as the Gaussian correlation function, and involves specification of the variance and correlation length parameters σ² and θ_k, k = 1, ..., p.

As already discussed, we choose to focus on the Bayes linear approach to emulation. In the literature, it is common to instead assume normal and Gaussian process priors for β and u(x) in Equation (4) [27, 4, 25], thus resulting in Gaussian process emulation, otherwise known as kriging [26, 34]. In this case, the resulting Bayesian update equations are practically similar to Equations (5) and (6) presented above; however, they methodologically involve additional distributional assumptions that we may rather not make, nor want any consequential inferences or decisions to be made in the light of. Therefore, whilst carefully specified distributions may allow for a more detailed Bayesian analysis of uncertain parameters, there are times when it may be preferential to restrict ourselves to a second-order specification.
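To make the adjustment concrete, Equations (5) and (6) can be sketched numerically for a one-dimensional input. This is a minimal illustration rather than the authors' implementation: the prior mean uses the linear basis t(x) = (1, x)^T, the covariance is the Gaussian correlation of Equation (8), and the toy simulator and all hyperparameter values are chosen purely for illustration.

```python
import numpy as np

def gauss_corr(a, b, theta):
    # Gaussian correlation function of Equation (8), one-dimensional input
    return np.exp(-((a[:, None] - b[None, :]) / theta) ** 2)

def bl_adjust(x_new, X_F, F, beta, sigma2, theta):
    """Adjusted expectation and variance of f(x), Equations (5) and (6).
    Prior: f(x) = (1, x) beta + u(x), E[u(x)] = 0, Cov[u(x), u(x')] = sigma2 c(x, x')."""
    t = lambda x: np.column_stack([np.ones_like(x), x])
    prior_F = t(X_F) @ beta                            # E[F]
    prior_new = t(x_new) @ beta                        # E[f(x)]
    var_F = sigma2 * gauss_corr(X_F, X_F, theta)       # Var[F]
    cov = sigma2 * gauss_corr(x_new, X_F, theta)       # Cov[f(x), F]
    adj_mean = prior_new + cov @ np.linalg.solve(var_F, F - prior_F)
    adj_var = sigma2 * gauss_corr(x_new, x_new, theta) - cov @ np.linalg.solve(var_F, cov.T)
    return adj_mean, adj_var

# adjust beliefs about a toy simulator by n = 8 runs
X_F = np.linspace(0.0, 10.0, 8)
F = np.sin(X_F)
mean, var = bl_adjust(np.linspace(0.0, 10.0, 50), X_F, F, np.zeros(2), 1.0, 2.0)
```

At each training input the adjusted expectation reproduces the observed run and the adjusted variance collapses to zero, as Equations (5) and (6) imply for a noise-free simulator.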
It is for this reason that we present novel approaches to linking Bayes linear emulators for networks of simulators in this article. It is also worth noting that various results in the literature, such as Chebyshev's inequality [3] or Pukelsheim's 3σ rule [33] (at least 95% of the probability mass of any unimodal continuous distribution will lie within ± 3 standard deviations of its mean), allow probabilistic statements to be made from a second-order specification.

The remainder of this article is outlined as follows. Section 2 formally introduces the notion of linking simulators together in a network. Section 3 introduces two novel approaches to linking Bayes linear emulators of each individual simulator in a network, namely the Bayes linear sampling approach and the Uncertain Input Bayes linear emulator. Section 4 demonstrates and compares these novel approaches to emulating the application chain of simulators introduced in Section 1 with the direct emulation approach. Whereas much of the article focusses on a relatively simple composite simulator formed of two individual simulators, Section 5 demonstrates the methodology on a more complicated network of simulators. The conclusions of our research are discussed in Section 6.
2 Networks of Simulators
Linking of simulators exists in many areas of science, including epidemiology and seismic activity modelling [24]. Such a network of simulators can be thought of as a single simulator which is made up of several modules [23]; some of the inputs to many of the simulators in the network are taken to be outputs of other simulators. More precisely, suppose we have a simulator g which takes an input vector z ∈ Z ⊂ R^q, and generates an output vector g(z). In the case of a network of simulators, some of the input parameters of z arise from (one or more) other simulators, that is, z = f(x) for some scalar output component of simulator f. These networks of simulators may exist, for example, as a result of the different modules being constructed by different groups of scientific experts, but where overarching interest lies in the entire physical process described across the network. In this section, we define some notation for a generic network of two simulators linked together as a chain. We discuss some of the current approaches in the literature, before introducing the idea of emulating the entire network as a single simulator.

For the purposes of clarity, we from here on focus on perhaps the simplest of simulator networks: a chain of two simulators f and g forming a composite simulator h = g ∘ f, taking input vector x and generating output vector h(x) = g(f(x)), as illustrated in Figure 2. In addition, we have dropped the bold vector notation for functions f and g, essentially considering scalar output functions for clarity. The methods discussed, however, directly generalise to more complicated networks of vector-output simulators; this is demonstrated in Section 5 on the network of simulators illustrated in Figure 7.
Figure 2: Graphical representation of a generic chain of models h(x) = g(f(x)).

Even for the simple network of Figure 2 there are many problems to be addressed, such as possible links and discrepancies between the output from f and the input z to g. Such links should likely be made with reference to the physical system properties associated with z and f(x). We put such issues aside for the time being, focussing instead on the power of approximating networks of simulators by linking Bayes linear emulators of the modules, in comparison to using a single Bayes linear emulator to approximate the simulator network. Crucially, however, a new challenge arises from the fact that the output of the first simulator becomes uncertain, resulting in an uncertain input to the second simulator.

In relation to this, Kyzyurova et al. [28] propose coupling simulators by linking independently developed Gaussian process emulators of the simulators. Their motivation arose from potentially having separate training runs for the two simulators f and g, this prohibiting direct emulation of h (see Section 2.1). For certain Gaussian processes, they derive closed-form expressions for the overall mean and variance of the emulator chain, approximating the distribution by a normal distribution with equivalent mean and variance. The lack of a closed-form distribution, but availability of second-order posterior statistics, for their linked emulator has inspired our investigation into Bayes linear approaches to emulation in this context.

Firstly, use of the Bayes linear approach removes the requirement to assume that z follows a particular distribution (specifically normal, Laplace or exponential). Secondly, in the methods that we present in this article, there is no requirement to restrict the correlation functions to being power exponential correlation functions (with power parameter 1 or 2).
Finally, it is not directly clear how the calculations in [28] generalise to larger networks of simulators. In contrast, our methods generalise directly, as discussed in Section 5. We feel that, if the assumptions required by [28] are deemed reasonable, and the chain only involves two simulators, then the precise results of [28] are appropriate for analysis. On the other hand, we feel that the methods presented in this article are more flexible, primarily as a result of working in a Bayes linear framework. The proposed approaches allow inferences to be made about a network of simulators, but without relying on distributional assumptions, to which any consequential analysis may be sensitive.

We also note here the similarity between emulating networks of simulators using Gaussian processes and modelling using deep Gaussian processes [7, 10]. Deep Gaussian processes arise from belief networks about simulator behaviour based on Gaussian process mappings, such that layers of Gaussian process latent variables exist between the input x and output h(x). Intermediate nodes act as inputs for the layer below and outputs for the layer above. Each layer adds model parameters and a regularisation challenge, arising from needing to choose the size of each latent layer. The latent variables are effectively marginalised out variationally (see, for example, [37]). The primary difference between the latent variables of a deep Gaussian process and the intermediate variables in a simulator network is that, in a network, the variables themselves represent physical system properties, these aiding the construction and modelling of the emulators for each of the individual component processes. Use of a deep Gaussian process in place of chaining individual simulators in a network is likely to discard this additional information.
This does not mean that the ideas of a deep Gaussian process could not be applied to the individual simulators within a network, these then being linked together using the methods that we present in this article.

2.1 Direct Emulation of h(x)

The direct method of emulating h neglects the fact that h is composed of two modules. Such an approach relies on being able to run the composite simulator for any choice of input to the first simulator; that is, the training runs for g are at the output locations of the training runs for f. In this case, we have a set of training runs

H = (h(x^(1)), ..., h(x^(n))),    (9)

and use the standard Bayes linear update for h as given by Equations (5) and (6), namely

E_H[h(x)] = E[h(x)] + Cov[h(x), H] Var[H]^{-1} (H − E[H]),    (10)
Var_H[h(x)] = Var[h(x)] − Cov[h(x), H] Var[H]^{-1} Cov[H, h(x)].    (11)

We demonstrate the direct emulation approach in the following example, which thus serves as an example of standard Bayes linear emulation methodology. This example will be used throughout the article to demonstrate the novel methods for emulating networks of simulators.

2.2 Illustrative Example

For the purposes of this example, we will consider the three functions:

f(x) = 0.x + cos(x),    (12)
g(z) = exp(z) − sin(5z),    (13)
h(x) = g(f(x)),    (14)

over a domain of interest for f of x ∈ [0, 10], and for g of z ∈ [−., .5] (roughly corresponding to the output domain of f). These three functions are plotted in the top left, top right and bottom left panels of Figure 3 respectively. Note how chaining even simple functions together can lead to much more complex behaviour. We treat these three functions as computationally intensive in order to demonstrate the emulation techniques discussed throughout this article.

In this section, we focus on emulating simulator h directly. We represent our beliefs about the behaviour of h using the form given by Equation (4), with covariance structure given by Equations (7) and (8). We let the regression functions be t_h(x) = (1, x)^T, that is, a first-order polynomial function of x.

We let the regression and covariance parameters for the emulator for h be denoted by β_h, σ_h and θ_h. Specification of σ_h and θ_h is in general non-trivial, and several methods are proposed in the literature. Ideally, this specification should be made a priori, although this is often difficult in practice. Alternatively, the parameters may enter into our belief network; however, this is also challenging, and, owing to the parameters not having a physical interpretation or "correct" value, somewhat meaningless. As a result, many pragmatic approaches are available for estimating σ_h and θ_h from the data. For example, we can use maximum likelihood estimates if we are happy to specify distributions [1], we can use simple heuristics [38], or we can use predictive diagnostics, such as Leave-One-Out Cross-Validation (LOOCV) [5, 29]. However they are chosen, the resulting choices should be checked using rigorous diagnostics.

In this section, we fit σ_h and θ_h via a LOOCV diagnostic predictions approach, following the guidelines suggested in [35] for when only small numbers of training runs are available.
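A LOOCV choice of correlation length can be sketched as follows. This is an illustration of the general idea, not the specific procedure of [35]: the zero prior mean, the toy training data and the candidate θ grid are all assumptions made for the example.

```python
import numpy as np

def bl_mean(x_new, X, F, sigma2, theta):
    # adjusted expectation with zero prior mean and Gaussian correlation
    c = lambda a, b: sigma2 * np.exp(-((a[:, None] - b[None, :]) / theta) ** 2)
    return c(x_new, X) @ np.linalg.solve(c(X, X), F)

def loocv_rmse(X, F, sigma2, theta):
    # leave each training run out in turn and predict it from the rest
    sq_errs = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        pred = bl_mean(X[i:i + 1], X[keep], F[keep], sigma2, theta)[0]
        sq_errs.append((F[i] - pred) ** 2)
    return float(np.sqrt(np.mean(sq_errs)))

# grid search over candidate correlation lengths for a toy simulator
X = np.linspace(0.0, 10.0, 8)
F = np.cos(X)
candidates = [0.5, 1.0, 2.0, 4.0]
theta_hat = min(candidates, key=lambda t: loocv_rmse(X, F, 1.0, t))
```

Whatever value the search selects, the resulting emulator should still be checked with the diagnostics described below.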
In addition, we specify vague prior beliefs on β_h (which results in the adjusted specification for β_h being the generalised least squares estimate mean and variance for β_h). Having specified prior beliefs about simulator h, we proceed to adjust these beliefs for any x in light of simulator runs H using Equations (10) and (11), where H is chosen to be 8 equally spaced points over the input domain.

The results of this emulation process are shown in the bottom right panel of Figure 3. The blue lines represent the emulator expectation E_H[h(x)]. The red lines represent the emulator mean plus and minus three standard deviations, that is, E_H[h(x)] ± 3 √Var_H[h(x)], these being bounds for a 95% credible interval, following Pukelsheim's 3σ rule [33]. Also plotted is the true simulator (green line). Note that this would be unavailable in general for computationally intensive simulators. There are several ways to diagnose the overall ability of an emulator; these measures are used throughout this article for emulator comparison purposes. Standardised prediction errors, given by

Λ_h(x) = (h(x) − E_H[h(x)]) / √Var_H[h(x)],    (15)

can be used to assess the validity of an emulator. Large absolute errors |Λ_h(x)| indicate conflict between simulator and emulator. If these are observed, the emulator is not valid for inference. It may be possible that the emulator prior beliefs were misspecified, for example, as a result of incorrect prior specifications for the parameters β, σ and θ. Alternatively, it could be an indication of an erratically behaved simulator that would require substantially more simulator runs in order to be emulated well. In accordance with this diagnostic, the fitted emulator in Figure 3 is valid, since the green line largely falls within the ±3 standard deviation bounds.

Figure 3: Top panels: simulator functions f(x) and g(z) evaluated across the input domains [0, 10] and [−., .5] respectively. Bottom left panel: simulator function h(x). Bottom right panel: a Bayes linear emulator of h constructed using n = 8 training points. The blue lines represent emulator expectation, and the red lines represent emulator expectation plus and minus 3 standard deviations.

We note at this point that various approaches exist in the literature for obtaining a better emulator for an erratically behaved function, such as h. Examples include local Gaussian processes [17], treed Gaussian processes [18] and the aforementioned deep Gaussian processes, amongst others. The aim of this article is not to compete with such existing methods, but to demonstrate the efficacy of emulating individual simulators in a network and linking them together over emulating the composite simulator directly. In particular, these alternative methods, with possible slight modification for use in the Bayes linear paradigm, may still be used on any component simulator in a network to improve emulation of that component.

3 Linking Bayes Linear Emulators

In Section 2, we motivated finding alternative approaches to emulating networks of simulators (rather than direct emulation of the entire network) as a result of the composite simulator potentially arising from a chain or network of less complex simulators.
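Returning briefly to the diagnostic of Equation (15), standardised prediction errors are straightforward to compute from any emulator's adjusted mean and variance at a set of held-out validation runs. The numbers below are purely illustrative, not taken from the example above.

```python
import numpy as np

def standardised_errors(h_true, adj_mean, adj_var):
    # Equation (15): conflict is flagged by large |Lambda|, e.g. beyond +/- 3
    return (h_true - adj_mean) / np.sqrt(adj_var)

lam = standardised_errors(np.array([1.0, 2.0]),    # held-out simulator runs
                          np.array([1.1, 1.0]),    # emulator adjusted expectations
                          np.array([0.04, 0.04]))  # emulator adjusted variances
# the first point is consistent; the second would flag emulator-simulator conflict
```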
In addition, it may well be the case that the training runs for f and g simply do not match up (that is, the outputs of f do not correspond to training run input locations for g). We are therefore interested in making inference for a new input x to the composite model h given the union of training runs F = (f(x^(1)), ..., f(x^(n))) and G = (g(z^(1)), ..., g(z^(l))), at input locations X_F = (x^(1), ..., x^(n)) ⊂ X and Z_G = (z^(1), ..., z^(l)) ⊂ Z respectively. From a Bayes linear point of view we are interested in E_{F,G}[h(x)] and Var_{F,G}[h(x)] for any input x.

Such adjusted beliefs can be calculated via sequential adjustment of our second-order prior beliefs about h(x) by F and then G via a sequential Bayes linear update:

E_{F,G}[h(x)] = E_F[h(x)] + Cov_F[h(x), G] Var_F[G]^{-1} (G − E_F[G]),    (16)
Var_{F,G}[h(x)] = Var_F[h(x)] − Cov_F[h(x), G] Var_F[G]^{-1} Cov_F[G, h(x)].    (17)

Equivalently, we could adjust sequentially by G then F by swapping F ↔ G in Equations (16) and (17). Consideration of the required belief specification to calculate the above expressions is likely to be challenging. We can avoid direct specification by claiming that f(x) is Bayes linear sufficient for F for adjusting h(x) [16], written:

⌊F ⊥⊥ h(x)⌋ / f(x).    (18)

In other words, the training runs F have no effect on our beliefs about h(x) once adjusted by f(x). This is very similar to a conditional independence property in the full Bayesian paradigm. As a result, we have that:

E_{F,G}[h(x)] = E_F[E_{f(x),G}[h(x)]],    (19)
Var_{F,G}[h(x)] = E_F[Var_{f(x),G}[h(x)]] + Var_F[E_{f(x),G}[h(x)]],    (20)

where we are treating E_{f(x),G}[h(x)] and Var_{f(x),G}[h(x)] as uncertain quantities, our beliefs about which we wish to adjust in light of F. From this point there are several approaches to making inferences about h(x) given F and G.
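The decomposition in Equation (20) mirrors the familiar decomposition of a variance into expected conditional variance plus variance of the conditional expectation, and can be checked by simulation. In this sketch the functions standing in for the adjusted mean and variance of h given the intermediate variable are arbitrary illustrative choices, not quantities from the paper's application.

```python
import numpy as np

rng = np.random.default_rng(1)

m = lambda z: np.sin(z)             # stand-in for E_{f(x),G}[h(x)] as a function of z
v = lambda z: 0.1 + 0.05 * z ** 2   # stand-in for Var_{f(x),G}[h(x)] as a function of z

# uncertainty in the intermediate variable z = f(x) after adjusting by F
z = rng.normal(1.0, 0.3, size=200_000)
# draw h consistent with mean m(z) and variance v(z) given z
h = m(z) + rng.standard_normal(z.size) * np.sqrt(v(z))

total_var = h.var()
decomposed = v(z).mean() + m(z).var()   # E[Var] + Var[E], as in Equation (20)
```

The two variance computations agree up to Monte Carlo error, illustrating why both terms of Equation (20) are needed.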
We present three such approaches here; the first is a generalisation of Section 2.1, and the latter two are novel, these crucially making use of the fact that, for any f(x) = z, we have E_{f(x),G}[h(x)] = E_G[g(z)] and Var_{f(x),G}[h(x)] = Var_G[g(z)].

3.1 Emulating the Adjusted Expectation and Variance

This first approach is a generalisation of the direct emulation approach of Section 2.1, and is discussed for completeness. We define the adjusted second-order belief specification for h(x) given G and f(x) as two simulators k_E(x) = E_{f(x),G}[h(x)] and k_V(x) = Var_{f(x),G}[h(x)] with the same input x as f. We proceed to construct Bayes linear emulators for k_E(x) and k_V(x) in the usual way (using Equations (4)-(6) with f ↔ k_E, k_V, etc.). Note that training runs K_E and K_V can be obtained from the set of training runs F, since we can trivially calculate E_G[g(z)] and Var_G[g(z)] for any z = f(x) ∈ F. Such emulation results in expressions for E_F[k_E(x)], Var_F[k_E(x)] and E_F[k_V(x)], which we can then plug into Equations (19) and (20) in order to calculate E_{F,G}[h(x)] and Var_{F,G}[h(x)].

There are benefits to splitting the composite simulator into two simulators f and g before emulating, primarily as this removes the restriction that Z_G = F. As a result, we no longer need the number of training points for f and g to be the same, which may be of advantage if one simulator is much quicker to evaluate than the other. However, there are limits to the benefits, in terms of approximating h, of improving the emulators for f and g in this manner. Emulation of k_E(x) and k_V(x) becomes more accurate and precise as the number of training points in F (equivalently K_E, K_V) increases. However, the emulated approximation k_E(x) → h(x) as the number of training points in G for g increases. In other words, for a fixed set of training points x ∈ X_F, an increase in the number of training points for g is highly unlikely to lead to an improved approximation over the direct emulation of Section 2.1 with the same X_F.

The direct emulator of Section 2.1 can be recovered by setting Z_G = F, in which case we have that K_E = H and K_V = 0. It is not unreasonable to assume that having K_V = 0 leads to E_{K_V}[k_V(x)] = 0 for all x. Having K_E = H results in E_{K_E}[k_E(x)] = E_H[h(x)] and Var_{K_E}[k_E(x)] = Var_H[h(x)]. Combining these results with Equations (19) and (20) leads to E_{F,G}[h(x)] = E_H[h(x)] and Var_{F,G}[h(x)] = Var_H[h(x)] when Z_G = F. This is a logical conclusion; it would be strange if we were unable to adjust our belief specification for h(x) given {F, G} with Z_G = F in the same way as directly adjusting by H.

In this approach, the training runs F are used to calculate the second-order belief specification of h(x) given G, but thereafter any additional knowledge about the behaviour of f is lost. For example, for prediction of h(x) at a new x, we have information about f(x), but this is not utilised. It is for this reason that we present the two novel approaches in Sections 3.2 and 3.3. We will compare these two approaches with the direct emulation approach of Section 2.1, which, as discussed, is a special case of the emulation method discussed in this section.

3.1.1 Emulator Design

Emulator design is the process of selecting the points in simulator parameter space at which the simulator will be run in order to construct an emulator [35]. If h is emulated as a composite simulator, this amounts to selecting X_H = {x^(1), ..., x^(n)} for H. Computer experimental design is well covered in the literature in the context of emulating a single model (see, for example, [11, 31]). On the other hand, if each simulator is emulated separately, a design for each is required, for example, X_F = {x^(1), ..., x^(n)} and Z_G = {z^(1), ..., z^(l)} for F and G respectively. Experimental design in this context likely involves optimising over the two parameter spaces simultaneously, and is scope for much research in its own right. As a result, in this article we will use the popular Maximin Latin Hypercube (MLH) design [30, 6] over assumed known simulator parameter spaces whenever a design is required, whether this be for an emulator of an individual simulator or the entire simulator network. The exception to this is for simulators with only one parameter, in which case an evenly spaced set of points across the parameter range is used, such as for the simulated example of Section 2.2.

3.2 Bayes Linear Sampling Approach

Let us consider Equations (19) and (20) once again. The required second-order belief specification is across {f(x), E_{f(x),G}[h(x)] : x ∈ X}. In the previous section, we treated k_E(x) as a simulator itself, and noted that, given f(x), we could calculate {k_E(x), k_V(x)}, thus effectively avoiding making this complex second-order belief specification. In this section, we also seek to avoid making this specification; however, we now wish to utilise knowledge about f(x) (namely E_F[f(x)] and Var_F[f(x)]) directly for making inference about h(x) at a new input point x. Since F only affects E_{f(x),G}[h(x)] through f(x), we have that

E_{F,G}[h(x)] = E_{E_F[f(x)], Var_F[f(x)], G}[h(x)],    (21)
Var_{F,G}[h(x)] = Var_{E_F[f(x)], Var_F[f(x)], G}[h(x)].    (22)

Note that obtaining E_{F,G}[h(x)] and Var_{F,G}[h(x)] through Equations (21) and (22) is tantamount to requiring belief statements about g(Z) for simulator g, where the input Z is a random variable with a second-order belief specification: E[Z] = E_F[f(x)], Var[Z] = Var_F[f(x)].
We present two approaches to emulating g(Z) in this case. The approach in the next section remains fully in the Bayes linear paradigm, whereas the approach in this section utilises appropriate distributions to effectively integrate Z out of the emulator specification. In particular, we assume that the random variable Z follows an appropriate probability distribution which is consistent with our second-order adjusted belief specification for f at x, for example

Z ∼ N(E_F[f(x)], Var_F[f(x)]).    (23)

It is important to note that being Bayes linear in principle does not prevent us from investigating the consequences of assuming certain distributions, for example, as tools for sampling purposes. The dependence of any results on this choice of distribution should be explored by performing a robustness analysis.

We then assume interest lies in E_{Z,G}[g(Z)] and Var_{Z,G}[g(Z)], which has effectively replaced the second-order specification given in Equations (21) and (22) with that of Z given by Expression (23). The fact that we have now assumed a distribution on Z allows us to integrate Z out in a fully Bayesian fashion so that

E_{F,G}[h(x)] ≈ E_{Z,G}[g(Z)] ≈ 𝔼_Z[E_G[g(z)]],    (24)

Var_{F,G}[h(x)] ≈ Var_{Z,G}[g(Z)] ≈ 𝕍ar_Z[E_G[g(z)]] + 𝔼_Z[Var_G[g(z)]].    (25)

In Equations (24) and (25) it is convenient to distinguish between expectations and variances in the Bayes linear framework (where expectation is primitive) and the fully Bayesian framework (where expectation is derived) using slightly different notation: E[·], Var[·] and 𝔼[·], 𝕍ar[·] respectively.

Although integration over the specified distribution (23) may be possible for specific distributional choices, we instead propose taking a Monte Carlo sample of k possible z-values for each input x, and using these to approximate Expressions (24) and (25) above, so that

E_{F,G}[h(x)] ≈ (1/k) Σ_{i=1}^{k} E_G[g(z^(i))],    (26)

Var_{F,G}[h(x)] ≈ (1/k) Σ_{i=1}^{k} (E_G[g(z^(i))] − E_{F,G}[h(x)])² + (1/k) Σ_{i=1}^{k} Var_G[g(z^(i))].    (27)

Note that such Monte Carlo sampling should be relatively fast, as it only involves evaluation of emulators. In addition, we anticipate that, by emulating a (hopefully less complex) single module, each emulator should be simpler and thus cheaper to evaluate (by requiring fewer training points). Further, if we wish to evaluate h(x) at very many points, then we could emulate the approximation calculation itself by viewing it as a stochastic simulator. This would avoid the need to run g multiple times for each x.

We can also gain much insight into the uncertainty arising from emulating the separate modules, since each corresponds to one of the two parts of Equation (25):

• 𝕍ar_Z[E_G[g(z)]] ≈ Var_F[E_{f(x),G}[g(z)]] reflects uncertainty in h(x) as a result of emulating f.

• 𝔼_Z[Var_G[g(z)]] ≈ E_F[Var_{f(x),G}[g(z)]] reflects uncertainty in h(x) as a result of emulating g.

This separation of contributions to the overall variance of h could be insightful for multiple reasons.
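Assuming each emulator exposes adjusted mean and variance functions, the Monte Carlo approximation of Equations (26) and (27) can be sketched as follows; the interface and names are illustrative, not the authors' implementation, and the normal sampling distribution of Expression (23) is assumed:

```python
import numpy as np

def link_emulators_mc(mean_f, var_f, mean_g, var_g, x, k=100, rng=None):
    """Monte Carlo approximation to E_{F,G}[h(x)] and Var_{F,G}[h(x)]
    (Equations (26)-(27)), assuming Z ~ N(E_F[f(x)], Var_F[f(x)])."""
    rng = np.random.default_rng(rng)
    # Sample k plausible outputs of f at x from the assumed normal (23).
    z = rng.normal(mean_f(x), np.sqrt(var_f(x)), size=k)
    Eg = np.array([mean_g(zi) for zi in z])   # E_G[g(z^(i))]
    Vg = np.array([var_g(zi) for zi in z])    # Var_G[g(z^(i))]
    E_h = Eg.mean()                                  # Equation (26)
    # First term: uncertainty in h from emulating f; second: from emulating g.
    Var_h = ((Eg - E_h) ** 2).mean() + Vg.mean()     # Equation (27)
    return E_h, Var_h
```

The two averaged terms in the variance line correspond to the decomposition in Equation (25): the spread of the E_G[g(z^(i))] values reflects uncertainty from emulating f, and the mean of the Var_G[g(z^(i))] values reflects uncertainty from emulating g.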
An example is experimental design, briefly discussed in Section 3.1.1, where one is allocating a computational resource budget between training runs of simulators f and g. Finally, having obtained a Monte Carlo sample, it is trivial to calculate an approximation to 𝕍ar_Z[Var_G[g(z)]], which may also be useful for design purposes, as well as for diagnostics.

In this section, we continue the example of Section 2.2. We construct Bayes linear emulators for each of the functions f and g using n = 8 training points and the same prior beliefs and techniques for eliciting the hyperparameters as discussed for h in the previous section. The results of emulating these two simulators are shown in the top two panels of Figure 4. We can see that both of these emulators are valid and accurate, with low uncertainty. The emulator for f is more precise, resulting from its simpler behaviour relative to that of g.

We proceed to combine the two emulators for f and g using the Bayes linear sampling approach to emulating networks of simulators. We begin by emulating f at 1000 evenly-spaced points x across the input space to obtain E_F[f(x)] and Var_F[f(x)]. For each of these points x we then sampled 100 points from a normal distribution with the corresponding mean and variance. We then approximated E_{F,G}[h(x)] and Var_{F,G}[h(x)] using Equations (26) and (27). The results of this approximation are shown in the bottom left panel of Figure 4. We observe that h has been emulated well using this sampling approach, with low uncertainty. Areas of slightly larger uncertainty can be associated with regions of the input spaces for f and/or g with larger uncertainty, as should be expected.

To assess the effect of the chosen sampling distribution, we repeat the sampling approximation using a uniform distribution, the results of which are shown in the bottom right panel of Figure 4. We can see that the results are fairly similar, suggesting that the effect of the exact distributional specification is not too influential in this case.

Figure 4: Top panels: Bayes linear emulators of f and g constructed using n = 8 training points. The blue lines represent emulator expectation, and the red lines represent emulator expectation plus and minus 3 standard deviations. The green lines represent the true function for comparison purposes. Bottom panels: emulators for h(x) resulting from applying the Bayes linear sampling approach (using normal and uniform distributions respectively).

In this section, we propose a third method for emulating networks of simulators. In particular, following Equations (21) and (22), we consider a Bayes linear emulator approach for simulators whose uncertain inputs are random variables with a second-order specification. In our case, this uncertainty arises as a result of the input to one simulator being (connected to) the output of another simulator about which we are uncertain. Whilst work has been done with regard to uncertain input emulators, it usually involves Gaussian assumptions which we prefer not to make [12]. The suggested approach is efficient, and results in reasonable approximations to the overall simulator network. We proceed to describe our approach to Bayes linear emulation with uncertain inputs, the linking of Bayes linear emulators following naturally.

To be consistent with the chained simulators discussed above, we consider a scalar-output simulator g, for which we have a set of training runs

G = g(Z_G) = (g(z^(1)), ..., g(z^(n)))^T    (28)

at known inputs Z_G = {z^(1), ..., z^(n)}. We wish to make inference about g(Z), where Z ∈ R^q is an uncertain input to the simulator, and g is uncertain across the majority of the input space, so that

g(Z) = t(Z)^T β + u(Z).    (29)

We assume a prior specification as follows: E[β] = μ_β, Var[β] = Σ_β, E[u(Z)] = 0, Cov[β, u(Z)] = 0. These are similar to a specification that may be made in the case of known inputs.
One of the key differences in the prior specification in the context of unknown inputs is the specification of a correlation function for unknown input points. We assume a general form, similar to that given by Equation (7), as follows:

Cov[u(Z), u(Z′)] = σ² c(Z, Z′) = σ² c(E[Z], Var[Z], E[Z′], Var[Z′]).    (30)

For example, we propose a reasonable extension to the Gaussian correlation function, presented in Equation (8), to be

c(Z, Z′) = exp( − Σ_{j=1}^{p} ( E[Z_j − Z′_j]² + Var[Z_j − Z′_j] ) / θ_j² ),    (31)

this being derived from the general formula for the expected value of a quadratic form [21]. In addition, this choice follows a general rule we believe should hold, namely that the generic correlation function involving E[Z], Var[Z], E[Z′], Var[Z′] should reduce down to a standard correlation function form if (Z, Z′) = (z, z′) are known.

Given this choice of correlation function, the Bayes linear update at an uncertain input follows relatively straightforwardly from that assuming known inputs. For details of calculating a Bayes linear update in the case of known inputs, see, for example, [22]. We express G, following Equation (4), as

G = Tβ + U,    (32)

where T is an n × m design matrix, β is the m-vector of regression parameters and U is an n-vector of residuals. Following the prior specification above, we have that E[U] = 0, Var[U] = Ω = σ²C and Cov[β, U] = 0_M, where we define

C = [ c(z^(1), z^(1))  c(z^(1), z^(2))  ···  c(z^(1), z^(n))
      c(z^(2), z^(1))  c(z^(2), z^(2))  ···  c(z^(2), z^(n))
      ⋮                ⋮                ⋱    ⋮
      c(z^(n), z^(1))  c(z^(n), z^(2))  ···  c(z^(n), z^(n)) ].    (33)

The expected value of g at random input Z is

E_G[g(Z)] = E_G[t(Z)^T β] + E_G[u(Z)]
          = E[t(Z)^T] E_G[β] + σ² c(Z)^T Ω^{−1} (G − T E_G[β]),    (34)

which follows since Cov_G[t(Z), β] = 0, Cov[Z, G] = 0 and t is assumed to be a known function of z (as usual). Specification of E[t(Z)] (and later Var[t(Z)]) is straightforward for first-order linear basis functions. It is also possible for further functions of the input components, but these transformed input components will require a sensible second-order specification as a result of emulating the previous simulator. There are several ways to achieve this, the most straightforward approach being to emulate the transformed inputs as further output quantities of the previous simulator.

We now proceed to adjust our beliefs about the variance of g(Z) using the Bayes linear update Equation (2):

Var_G[g(Z)] = Var_G[t(Z)^T β + u(Z)]
            = Var_G[t(Z)^T β] + Var_G[u(Z)] + 2 Cov_G[t(Z)^T β, u(Z)],    (35)

where

Var_G[t(Z)^T β] = E[Var_{G∪Z}[t(Z)^T β]] + Var[E_{G∪Z}[t(Z)^T β]]
               = trace(Var_G[β] E[t(Z)t(Z)^T]) + E_G[β^T] Var[t(Z)] E_G[β],    (36)

Var_G[u(Z)] = σ² − σ² c(Z)^T Ω^{−1} c(Z) σ² + σ² c(Z)^T Ω^{−1} T Var_G[β] T^T Ω^{−1} c(Z) σ²,    (37)

and

Cov_G[t(Z)^T β, u(Z)] = − E[t(Z)^T] Var_G[β] T^T Ω^{−1} c(Z) σ²,    (38)

so that

Var_G[g(Z)] = trace(Var_G[β] E[t(Z)t(Z)^T]) + E_G[β^T] Var[t(Z)] E_G[β]
            + σ² − σ² c(Z)^T Ω^{−1} c(Z) σ² + σ² c(Z)^T Ω^{−1} T Var_G[β] T^T Ω^{−1} c(Z) σ²
            − 2 E[t(Z)^T] Var_G[β] T^T Ω^{−1} c(Z) σ².    (39)

These equations permit the output of a simulator to be emulated at an unknown input, provided a second-order specification is provided for that input. We can apply this approach to emulation directly to a network of simulators, using the adjusted second-order belief structure about the output of any simulator as the required specification about the unknown input to another simulator.
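For intuition, the calculations above can be sketched numerically in the simplified case of a one-dimensional input, basis t(z) = (1, z)^T, and Var[β] = 0 (β fixed at its prior mean), under which the terms involving Var_G[β] vanish and only a basis term and a residual term remain; we also assume Z and Z′ independent when evaluating Equation (31). All names and values are illustrative:

```python
import numpy as np

def c_uncertain(mz, vz, mz2, vz2, theta):
    """Uncertain-input Gaussian correlation in the spirit of Equation (31),
    assuming Z and Z' independent, so Var[Z - Z'] = Var[Z] + Var[Z']."""
    return np.exp(-(((mz - mz2) ** 2 + vz + vz2) / theta ** 2))

def adjust_uncertain(mz, vz, z_train, g_train, beta, sigma, theta):
    """Adjusted expectation and variance of g(Z) (Equations (34) and (39))
    in the simplified case Var[beta] = 0, with basis t(z) = (1, z)^T."""
    T = np.column_stack([np.ones_like(z_train), z_train])
    # Training inputs are known, i.e. carry zero variance.
    C = c_uncertain(z_train[:, None], 0.0, z_train[None, :], 0.0, theta)
    c_Z = c_uncertain(mz, vz, z_train, 0.0, theta)
    w = np.linalg.solve(C, c_Z)
    mean = np.array([1.0, mz]) @ beta + w @ (g_train - T @ beta)
    # With Var[beta] = 0, Equation (39) reduces to the basis term
    # beta_1^2 Var[Z] (from Var[t(Z)]) plus the residual term.
    var = beta[1] ** 2 * vz + sigma ** 2 * (1.0 - w @ c_Z)
    return mean, var
```

Setting Var[Z] = 0 recovers the standard known-input adjustment, and the emulator then interpolates the training runs exactly.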
Since the result in each case is a second-order belief specification, this approach can be applied to arbitrarily large networks of simulators. We now demonstrate this on the illustrative example.

We emulate f at 1000 evenly-spaced points x across the input space to obtain E_F[f(x)] and Var_F[f(x)]. The uncertain input emulator for g is trained using the same training runs as in the previous sections, resulting in the same values for the parameters σ_g and θ_g. Given these parameters, we can now approximate the output of h at each corresponding uncertain input Z resulting from the adjusted second-order belief specification for f(x) at each x of interest. The results of doing this are shown in Figure 5.

The results of emulating h using this uncertain inputs approach are slightly different from those obtained using the sampling approach. The blue-line prediction is very similar; however, the ± 3 standard deviation bounds differ.

Figure 5: Emulator for h(x) resulting from applying the Bayes linear uncertain inputs approach discussed in Section 3.3. The blue lines represent emulator expectation, and the red lines represent emulator expectation plus and minus 3 standard deviations. The green lines represent the true function for comparison purposes.

In this section, we consider applying the three techniques discussed in Section 3 to the dispersion dose-response network of simulators introduced in Section 1. Due to the behaviour of the dispersion model d, we chose to emulate a transformation of the output, namely f(x) = log(d(x) + 1), treating this transformed function f as the first simulator of the network. To be consistent with this output, we considered the second simulator g to take input ẑ = log(z + 1) (so that g(ẑ) = ρ(z)), where z represents dose. g(ẑ), like ρ(z), represents a number of casualties. The combined simulator h = g ∘ f = ρ ∘ d takes wind speed, wind direction and source mass as input, and directly outputs a number of casualties.
The graphical representation showing the links between the quantities of interest and the inputs and outputs of simulators f (dispersion) and g (dose-response) is detailed in Figure 1.

We proceeded to construct Bayes linear emulators for each of the individual simulators f and g, as well as the combined simulator h. We consider the ranges of interest of the inputs to simulators d, f (and thus h) to be as given in Table 1, each of which was scaled to [−1, 1] for the purposes of our analysis. We constructed a training point design for f and h using an MLH of size 50 across the three input dimensions. In contrast, simulator g is one-dimensional, and thus needs far fewer training points, so we take a sample of 20 points from a uniform distribution. Note that, as a result of needing to run simulator g many times at each training point in order to find the mean function, this simulator is slow, hence if fewer training points are required to train the emulator for this simulator this is already an advantage.

Input parameter   Minimum   Maximum
x_WD              37        63
x_WS
x_SM

Table 1: Ranges of interest of the input parameters to simulators d, f, h (which were scaled to [−1, 1] for analysis).

For each of the emulators for f, g and h, we assumed a Gaussian correlation function, as given by Equation (8), along with a first-order polynomial mean function. We represent the scalar variance parameters and correlation length vectors as σ_f, σ_g, σ_h and θ_f, θ_g, θ_h respectively. We fit these parameters using maximum likelihood for each emulator, as recommended in [35] for models of moderate/large training run size, this also permitting a fair comparison between the emulation methods presented. It should also be noted that, whilst the focus of the article concerns the use of Bayes linear emulators, this does not prevent us from considering the effects of assuming certain distributions, in this case for selecting parameters for the correlation functions using maximum likelihood in order to obtain adequate emulators [38].

Given the individual emulators for f and g, we can then combine them using the sampling method of Section 3.2 and the uncertain inputs method of Section 3.3 to yield chained emulators for h. Figure 6 shows diagnostic plots of adjusted expectation ± 3 standard deviations: the top row for the dispersion simulator f and dose-response simulator g respectively; the left panel of the second row showing the results of emulating the joint dispersion dose-response network of simulators h directly as a single simulator; the right panel of the second row and left panel of the bottom row showing the results of emulating h by combining the emulators for f and g using the sampling approach of Section 3.2, first using a normal and then a uniform distribution; the right panel of the bottom row showing the results of emulating h by combining the emulators for f and g using the uncertain inputs approach of Section 3.3.
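Maximum likelihood fitting of σ and θ, as used here, can be sketched for a zero-mean, one-dimensional emulator with Gaussian correlation by profiling out the variance parameter; the grid search and test function below are purely illustrative, not the fitted emulators of this section:

```python
import numpy as np

def concentrated_nll(theta, z, y):
    """Concentrated negative log marginal likelihood for a zero-mean
    emulator with Gaussian correlation; the variance parameter is
    profiled out as sigma2_hat = y^T C^{-1} y / n."""
    n = len(z)
    d2 = (z[:, None] - z[None, :]) ** 2
    C = np.exp(-d2 / theta ** 2) + 1e-8 * np.eye(n)  # jitter for stability
    _, logdet = np.linalg.slogdet(C)
    sigma2_hat = y @ np.linalg.solve(C, y) / n
    return n * np.log(sigma2_hat) + logdet

# Simple grid search over candidate correlation lengths.
z = np.linspace(0, 1, 10)
y = np.sin(4 * z)
thetas = np.linspace(0.05, 2.0, 40)
theta_hat = min(thetas, key=lambda t: concentrated_nll(t, z, y))
```

Given θ̂, the profiled variance estimate is σ̂² = y^T C^{−1} y / n; in practice a mean function and a multi-dimensional θ would also be included, as in the emulators of this section.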
The inputs for these diagnostic runs of size 50 and 20 (for f/h and g respectively) were constructed in the same manner as the training runs. In addition to the plots, Table 2 shows Mean Absolute Standardised Prediction Errors (MASPEs):

MASPE = (1/n) Σ_{i=1}^{n} |ψ(x^(i)) − μ_ψ(x^(i))| / √(ν_ψ(x^(i))),    (40)

and Root Mean Squared Errors (RMSEs):

RMSE = √( (1/n) Σ_{i=1}^{n} (ψ(x^(i)) − μ_ψ(x^(i)))² ),    (41)

for the diagnostic runs for each of the six emulators, with ψ = f, g or h as appropriate and μ_ψ, ν_ψ representing the appropriate emulator mean and variance.

We can see that the emulator for f is fairly accurate, with the exception of points towards the bottom end of the output range, where there are several cases of severe overestimation (with underestimated uncertainty). The emulator for g is very accurate, reflecting the fact that emulator predictions can be taken with almost as much certainty as evaluations of the simulator itself.

Figure 6: Adjusted expectation ± 3 standard deviations: the top row for the dispersion simulator f and dose-response simulator g respectively; the left panel of the second row showing the results of emulating the joint dispersion dose-response network of simulators h directly as a single simulator; the right panel of the second row and left panel of the bottom row showing the results of emulating h by combining the emulators for f and g using the sampling approach of Section 3.2, first using a normal and then a uniform distribution; the right panel of the bottom row showing the results of emulating h by combining the emulators for f and g using the uncertain inputs approach of Section 3.3.

Table 2: MASPEs and RMSEs for the diagnostic runs of each of the six emulators.

The direct emulator yields predictions with underestimated uncertainty. By comparison, the estimated uncertainty for the remaining methods is larger, with more appropriate MASPE values. In addition, the accuracy of the predictions for the chained emulators is, on the whole, improved, this being confirmed by the RMSE values in Table 2.
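The two diagnostics in Equations (40) and (41) are straightforward to compute from emulator output; a minimal sketch with illustrative array names:

```python
import numpy as np

def maspe(y_true, y_pred_mean, y_pred_var):
    """Mean Absolute Standardised Prediction Error, Equation (40)."""
    return np.mean(np.abs(y_true - y_pred_mean) / np.sqrt(y_pred_var))

def rmse(y_true, y_pred_mean):
    """Root Mean Squared Error, Equation (41)."""
    return np.sqrt(np.mean((y_true - y_pred_mean) ** 2))
```

Large MASPE values indicate underestimated emulator uncertainty, and very small values overestimated uncertainty, which is how the MASPE values in Table 2 are interpreted.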
The RMSE values for the sampling approach and uncertain inputs approach are comparable. It is interesting to note, however, that the uncertainty attributed to each diagnostic point differs between the two emulators, with the uncertainty of the uncertain input emulators being larger for runs resulting in low or high values of h(x), and smaller for those points in the middle. This is likely to be a consequence of the way the uncertainty in f is propagated through g in the two methods. The sampling approach propagates uncertainty in f by sampling possible values of g according to possible values of f. This results in a heteroscedastic error structure across the emulator for h (for example, if g is expected to change little regardless of the possible values of f, the uncertainty is small). In contrast, the uncertain inputs approach derives its uncertainty from the regression part and the covariance structure. As with a standard Bayes linear emulator that uses a single correlation structure across X (or, in this case, E[X] and Var[X]), there is some averaging of the uncertainty estimates for simulator prediction across the input space, even if the behaviour at some points is smoother than at others. Incorporation of more sophisticated methodology into this current emulator network methodology, for example utilising similar ideas to local Gaussian processes [17], may be of benefit in this case. To summarise, we feel that the results presented give evidence for preferring the two presented methods for linking networks of emulators over applying a single emulator to a composite simulator in many cases. We defer further discussion to Section 6.

The previous sections of this article have demonstrated how emulating individual simulators in a network can result in better overall emulators than emulating the entire network directly.
The previous examples have focussed on chains of two models; we now proceed to demonstrate how the applicability of the sampling approach and uncertain input approach directly generalises to more complex networks of simulators. In each case, the second-order specification resulting from one emulator provides the sampling distribution or uncertain input specification of the next.

Figure 7: Graphical representation of the one-dimensional network example of Section 5.1, in which inputs v_1 = x, v_2 = y and v_3 = w feed simulators f, g, h and l, with h_1(x) = g(f(x)), h_2(y) = h(y), and k(v) = l(h_1(x), h_2(y), w).
We here demonstrate the applicability of our methods to a more complicated network of simulators. We analyse the small network of simulators shown in Figure 7. f and g are the same functions as in Section 2.2, h is the one-dimensional function

h(y) = √|y| − . y,    (42)

with domain of interest y ∈ [− , 6], and l is the three-dimensional function

l(w) = w w + w w + cos(w + w),    (43)

with domains of interest given by w_1 ∈ [0, ], w_2 ∈ [− , 8] (roughly corresponding to the output ranges of h_1 and h_2 respectively) and w_3 ∈ [1, ]. The composite simulator k has input v = (v_1, v_2, v_3) = (x, y, w). This illustrative example demonstrates potential aspects of utilising networks of emulators over emulating the whole network directly.

To begin with, we construct Bayes linear emulators for f, g, h, l and k. We take the training points for k to be a Latin hypercube of size 30 across the three dimensions, appealing to the rough heuristic suggesting a minimum of 10d design points, where d is the parameter space dimension. The relevant inputs of this Latin hypercube can then also be used as the training points for f and h. For g, a new set of 30 equally spaced points is taken as the training points z. A Latin hypercube of size 30 was taken as the training points w for the emulator for l. We again assume emulators of the form given by Equation (4) with covariance structure given by Equation (8). We specify vague prior beliefs on β, fitting σ and θ by maximum likelihood as recommended in [35] for simulators of moderate/large training run size. The emulators for f, g, h and l were then combined similarly to the previous examples using the uncertain input emulator approach to yield an approximation for k. The results of these six emulators are shown in Figure 8.

The top three panels of Figure 8 show diagnostic plots for the individual emulators f, g and h. Since these are relatively simple one-dimensional functions, 30 training points allow almost-perfect predictions.

Figure 8: Adjusted expectation ± 3 standard deviations: the top row for emulators of f, g and h, each constructed using 30 training points; the bottom row for an emulator of l constructed using 30 training points, a direct emulator of the whole network k constructed using 30 training points, and the results of emulating k by combining the emulators for f, g, h and l using the uncertain inputs approach of Section 3.3.
Moving on to the bottom row, the left panel suggests that l is emulated well with some uncertainty, whereas the middle panel shows that the behaviour of k is hard to mimic using a direct emulator. The uncertain input emulator for k (bottom right panel) is much more accurate than the direct emulator. The most obvious benefit of emulating the individual simulators in this example, as for many networks of complex simulators, arises from their reduced dimension. Since many of the individual simulators are one-dimensional, their behaviour can be captured accurately.

The results shown in Figure 8 indicate that linking emulators together may sometimes be a useful strategy for obtaining more accurate emulators of the entire network of simulators for the same amount of computational expenditure. Figure 9 explores the results of reducing the number of training points for the emulators of the one-dimensional simulators f, g and h to 8, as was the case in Sections 3.2.1 and 3.3.1. Due to the reduction in training point number, we this time fitted σ and θ using leave-one-out cross-validation (LOOCV), as recommended in [35] for emulating simulators using small numbers of training points. All other aspects of emulator construction remained the same. The top row of Figure 9 shows diagnostic plots for the individual emulators for f, g and h, now constructed using a reduced number of training points. Whilst f and h still have fairly low uncertainty, the uncertainty on g is higher. All three emulators have high accuracy. The middle row shows diagnostic plots for an emulator of l constructed using 30 training points, and then direct emulators of k constructed using 30 and 120 training points respectively. Whilst the construction method for the first two of these emulators is as for the corresponding emulation in Figure 8, we note that a different Latin hypercube was sampled for the training points, thus explaining the variability in the results.
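The LOOCV fitting of θ mentioned above can be sketched using the closed-form leave-one-out residuals of a zero-mean emulator with Gaussian correlation; the grid, test function and zero-mean simplification are ours, not the exact procedure of [35]:

```python
import numpy as np

def loo_sse(theta, z, y):
    """Sum of squared leave-one-out prediction errors for a zero-mean
    emulator with Gaussian correlation. The i-th LOO residual has the
    closed form (C^{-1} y)_i / (C^{-1})_{ii}; the variance parameter
    cancels in this ratio, so only theta is selected here."""
    d2 = (z[:, None] - z[None, :]) ** 2
    C = np.exp(-d2 / theta ** 2) + 1e-8 * np.eye(len(z))
    Cinv = np.linalg.inv(C)
    resid = (Cinv @ y) / np.diag(Cinv)
    return float(resid @ resid)

# Grid search over candidate correlation lengths.
z = np.linspace(0, 1, 8)
y = np.sin(3 * z)
thetas = np.linspace(0.05, 1.0, 20)
theta_hat = min(thetas, key=lambda t: loo_sse(t, z, y))
```

Since σ cancels in the leave-one-out residual ratio, it must be estimated separately once θ has been chosen.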
Although we use MLHs for a fair comparison, it is interesting to note that the direct emulator for k is particularly susceptible to the sampled Latin hypercube of training run locations. The direct emulator for k constructed using 120 training points is much more accurate, although only comparable to the sampling and uncertain input emulator approaches used to construct the emulators for which diagnostics are shown in the bottom row of Figure 9. Whilst again providing evidence for the advantages of the methods proposed in this article, there seems to be little discrepancy between the sampling approach and uncertain input emulator approach for this example. This is in contrast to the application example of Section 4, for which the sampling and uncertain input emulator approaches yielded different, though comparably valid and accurate, results.

We have presented novel methodology for efficient emulation of networks of simulators. We have demonstrated these techniques on a couple of simulated illustrative examples and on a scientifically relevant network from the disease modelling literature, where it is frequently the case that different experts construct simulators representing the inherent physics in different parts of the physical system.
We have demonstrated how both presented approaches to linking individual Bayes linear emulators together can result in more accurate emulators compared to direct emulation of the entire network.

Both the sampling approach and uncertain input approach emulators were comparable in the examples considered, with little evidence that either was particularly better than the other. Having said that, in the application of Section 4, each of these two approaches to emulating the network showed different levels of accuracy in different parts of the overall input space. In addition, the uncertain input emulator is much more efficient, since the sampling approach requires running the emulators at many more points as a result of sampling, although this will most likely only be noticeable for simulators with large numbers of training runs. In the examples discussed this was not an issue, and run times were fast on a standard laptop computer.

Figure 9: Adjusted expectation ± 3 standard deviations: the top row for emulators of f, g and h, each constructed using 8 training points; the middle row for an emulator of l constructed using 30 training points, and direct emulators of the whole network k constructed using 30 and 120 training points respectively; the bottom row showing the results of emulating k by combining the emulators for f, g, h and l, first using the sampling approach of Section 3.2 with a normal and then a uniform sampling distribution, and then the uncertain inputs approach of Section 3.3.

The work in this article can be extended in many ways, for example by developing the methodology to allow for stochastic simulators and ensembles of simulators. In addition, there is an interesting design question when several simulators are linked together. In particular, the efficiency of running the various simulators may vary; for example, multiple runs of one simulator may involve a similar amount of computational intensity as a single run of another.
As an example, the dose-response simulator of Section 4 was computationally cheap relative to the dispersion simulator. Whilst in this case the dose-response model was both faster and simpler, it may be that some component simulators of a network are computationally heavy whilst exhibiting relatively simple behaviour when analysed in isolation from the entire chain.

As a final thought, there is scope for future research in the area of parameter estimation for chains of emulators. In this article, we constructed the individual emulators such that they satisfied diagnostics and then combined them together. However, it may be more beneficial to utilise a combined parameter estimation process over all the simulators in a network.

References

[1] Y. Andrianakis and P. G. Challenor. Parameter estimation for Gaussian process emulators. Technical report, MUCM, 2011.

[2] M. Bayarri, J. O. Berger, E. S. Calder, K. Dalbey, S. Lunagomez, A. K. Patra, E. B. Pitman, E. T. Spiller, and R. L. Wolpert. Using statistical and computer models to quantify volcanic hazards.
Technometrics, 51:402–413, 2009.

[3] P. Chebyshev. Des valeurs moyennes. Journal de mathématiques pures et appliquées, 2(12):177–184, 1867.

[4] S. Conti, J. P. Gosling, J. Oakley, and A. O'Hagan. Gaussian process emulation of dynamic computer codes. MUCM, 2009.

[5] N. Cressie. Statistics for Spatial Data. Wiley, 1991.

[6] C. Currin, T. Mitchell, M. Morris, and D. Ylvisaker. Bayesian prediction of deterministic functions with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86(416):953–963, 1991.

[7] A. C. Damianou and N. D. Lawrence. Deep Gaussian processes. Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, 31, 2013.

[8] B. de Finetti. Theory of Probability, volume 1. Wiley, 1974.

[9] B. de Finetti. Theory of Probability, volume 2. Wiley, 1975.

[10] M. M. Dunlop, M. A. Girolami, A. M. Stuart, and A. L. Teckentrup. How deep are deep Gaussian processes? Journal of Machine Learning Research, 19:1–46, 2018.

[11] R. A. Fisher. The Design of Experiments. Oliver and Boyd, 1937.

[12] A. Girard and R. Murray-Smith. Gaussian processes: Prediction at a noisy input and application to iterative multiple-step ahead forecasting of time-series. Switching and Learning in Feedback Systems, pages 158–184, 2005.

[13] M. Goldstein. Bayes linear analysis. In S. Kotz, C. B. Read, N. Balakrishnan, and B. Vidakovic, editors, Encyclopedia of Statistical Sciences, pages 29–34. Wiley, New York, 1999.

[14] M. Goldstein and J. C. Rougier. Probabilistic formulations for transferring inferences from mathematical models to physical systems. SIAM Journal on Scientific Computing, 26(2):467–487, 2004.

[15] M. Goldstein, A. Seheult, and I. Vernon. Assessing model adequacy. In J. Wainwright and M. Mulligan, editors, Environmental Modelling: Finding Simplicity in Complexity. John Wiley and Sons, Chichester, 2013.

[16] M. Goldstein and D. Wooff. Bayes Linear Statistics. Wiley, Chichester, 2007.

[17] R. B. Gramacy and D. W. Apley. Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2):561–578, 2015.

[18] R. B. Gramacy and H. K. H. Lee. Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483):1119–1130, 2008.

[19] P. G. Groer. Dose-response curves and competing risks. Proceedings of the National Academy of Sciences of the United States of America, 75(9):4087–4091, 1978.

[20] J. A. Hartigan. Linear Bayesian methods. Journal of the Royal Statistical Society, 31:446–454, 1969.

[21] D. A. Harville. Linear Models and the Relevant Distributions and Matrix Algebra. CRC Press, Boca Raton, 2018.

[22] S. E. Jackson. Design of Physical System Experiments Using Bayes Linear Emulation and History Matching Methodology with Application to Arabidopsis Thaliana. PhD thesis, Durham University, 2018.

[23] P. E. Jacob, L. M. Murray, C. C. Holmes, and C. P. Robert. Better together? Statistical learning in models made of modules. arXiv:1708.08719v1, August 2017.

[24] B. Jha and R. Juanes. Coupled multiphase flow and poromechanics: A computational model of pore pressure effects on fault slip and earthquake triggering. Water Resources Research, 50:3776–3808, 2014.

[25] J. S. Johnson, J. P. Gosling, and M. C. Kennedy. Gaussian process emulation for second-order Monte Carlo simulations. Journal of Statistical Planning and Inference, 141:1838–1848, 2011.

[26] A. G. Journel and C. J. Huijbregts. Mining Geostatistics. Academic Press, Amsterdam, 1978.

[27] M. C. Kennedy and A. O'Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society, 63(3):425–464, 2001.

[28] K. N. Kyzyurova, J. O. Berger, and R. L. Wolpert. Coupling computer models through linking their statistical emulators. Draft: for review purposes only, 2018.

[29] H. Maatouk, O. Roustant, and Y. Richet. Cross-validation estimations of hyper-parameters of Gaussian processes with inequality constraints. Procedia Environmental Sciences, 27:38–44, 2015.

[30] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245, 1979.

[31] D. C. Montgomery. Design and Analysis of Experiments. Wiley, 2009.

[32] A. O'Hagan. Bayes linear estimators for randomized response models. Journal of the American Statistical Association, 82:580–585, 1987.

[33] F. Pukelsheim. The three sigma rule. The American Statistician, 48(2):88–91, 1994.

[34] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[35] T. J. Santner, B. J. Williams, and W. I. Notz. The Design and Analysis of Computer Experiments. Springer, New York, 2003.

[36] K. E. Taylor, R. J. Stouffer, and G. A. Meehl. An overview of CMIP5 and the experiment design. Bulletin of the American Meteorological Society, 93:485–498, 2012.

[37] M. K. Titsias and N. D. Lawrence. Bayesian Gaussian process latent variable model. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 9:844–851, 2010.

[38] I. Vernon, M. Goldstein, and R. G. Bower. Galaxy formation: A Bayesian uncertainty analysis. Bayesian Analysis, 5(4):619–669, 2010.

[39] I. Vernon, M. Goldstein, J. Rowe, J. Topping, J. Liu, and K. Lindsey. Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions. BMC Systems Biology, 12(1), 2018.

[40] P. Whittle.