Bayesian prediction for physical models with application to the optimization of the synthesis of pharmaceutical products using chemical kinetics
Antony M. Overstall†∗, David C. Woods† & Kieran J. Martin‡
† University of Southampton, Southampton SO17 1BJ, UK
‡ Roche, Welwyn Garden City AL7 1TW, UK
Quality control in industrial processes is increasingly making use of prior scientific knowledge, often encoded in physical models that require numerical approximation. Statistical prediction, and subsequent optimization, is key to ensuring the process output meets a specification target. However, the numerical expense of approximating the models poses computational challenges to the identification of combinations of the process factors where there is confidence in the quality of the response. Recent work in Bayesian computation and statistical approximation (emulation) of expensive computational models is exploited to develop a novel strategy for optimizing the posterior probability of a process meeting specification. The ensuing methodology is motivated by, and demonstrated on, a chemical synthesis process to manufacture a pharmaceutical product, within which an initial set of substances evolve according to chemical reactions, under certain process conditions, into a series of new substances. One of these substances is a target pharmaceutical product and two are unwanted by-products. The aim is to determine the combinations of process conditions and amounts of initial substances that maximize the probability of obtaining sufficient target pharmaceutical product whilst ensuring unwanted by-products do not exceed a given level. The relationship between the factors and amounts of substances of interest is theoretically described by the solution to a system of ordinary differential equations incorporating temperature dependence. Using data from a small experiment, it is shown how the methodology can approximate the multivariate posterior predictive distribution of the pharmaceutical target and by-products, and therefore identify suitable operating values. Materials to replicate the analysis can be found at .

Keywords:
Approximate coordinate exchange; multivariate Gaussian process; Gibbs sampling; parallel tempering; Riemann manifold Langevin Metropolis-Hastings
1 Introduction

Pharmaceutical products are often manufactured by chemical synthesis, with an initial set of chemical substances evolving over time via a series of chemical reactions into a substance or substances of interest. One (or more) of these substances will be the target pharmaceutical product while the others will be unwanted or harmful by-product substances. In addition to a time dependence, the chemical synthesis process may be hypothesised to depend on various controllable factors. The aim of the chemical engineers will be to manipulate the controllable factors such that specification limits on the final substances are satisfied. For example, the quantity of pharmaceutical product may be required to exceed some specified level whilst the amounts of unwanted by-products are correspondingly kept below a certain level.

∗ Corresponding author. Southampton Statistical Sciences Research Institute, University of Southampton, Southampton SO17 1BJ, UK. Email: [email protected]

It is common for a physical model, for example derived from scientific theory as the solution to a set of ordinary differential equations (ODEs), to be postulated to approximately describe the dependence of the synthesis process on the controllable factors (and time); see chapters in am Ende (2011). In addition, often an experiment can be performed within which the amounts of the substances of interest are measured for several different specifications of the controllable factors. These observed responses can then be used to inform the relationship between controllable factors and the synthesis process, e.g. through the estimation of unknown tuning parameters.

In this paper, we develop and apply methodology for the statistical modelling of such experiments, motivated by a pharmaceutical exemplar which measured three substances of interest: the pharmaceutical product and two unwanted by-products. The dynamics behind the synthesis process are approximately described by a system of non-linear ODEs, the solution to which forms a physical model for the amounts of the three substances at a given time. This model depends on a set of controllable factors (including time) and unknown parameters. A small experiment has been conducted where the amounts of the substances of interest have been observed for different controllable factors at certain time points. The aim is to use this observed data to estimate the unknown parameters and then to use these estimates to maximize the probability of the predicted amounts satisfying known specification limits as a function of the controllable factors and time. The approach used here will be Bayesian, giving a coherent method of propagating uncertainty on the unknown parameters (given the observed data) through to the predicted amounts. This approach is closely aligned with the concept of "pharmaceutical design space", a set of combinations of values of the controllable factors that have been demonstrated to provide assurance of quality. See, for example, Peterson (2008), Peterson and Yahyah (2009) and Lebrun et al. (2013) for related Bayesian approaches to the definition of a design space using linear statistical modelling methods. Beyond pharmaceutical design space, in the wider field of process optimization, see, for example, Chiao and Hamada (2001), Peterson (2004) and Del Castillo (2007).

Despite the apparent simplicity and appealing nature of the Bayesian modelling approach, several computational challenges remain.
Firstly, the solution to the system of ODEs is analytically intractable and so a computationally expensive numerical solution is required (e.g. Iserles, 2009). The computational expense of this approximation will impact the computing time required both for the estimation of the unknown parameters, through the evaluation of their posterior distribution, and the predictions for new combinations of the controllable factors. To reduce computational expense, we make use of (multivariate) Gaussian process emulators to accelerate both the Markov chain Monte Carlo (MCMC) algorithm employed and the prediction of future responses. Secondly, some of the observations fall below a minimum threshold, χ, under which the actual amount cannot be measured reliably, resulting in left-censored observations. Instead of assuming that the true observations are equal to zero or χ with certainty, we assume that the true observations are unknown, subject to being in the interval (0, χ), within a natural Bayesian formulation. Thirdly, we know that the physical model derived from the solution to the ODEs only provides an approximation to the true process. We account for any potential mismatch by introducing an explicit model discrepancy term (c.f. Kennedy and O'Hagan, 2001). Fourthly, the function giving the probability of satisfying the specification limits for given controllable factors will be noisy and computationally expensive to evaluate, making optimization non-trivial. The contribution of this paper is the novel application of a combination of modern computational and statistical methodology to address these challenges, and the demonstration of this approach on an important practical problem.

The paper is organised as follows. In Section 2 we describe the motivating experiment, and the physical and statistical models for the observations, in more detail. In Section 3 we describe the methodology including construction of Gaussian process emulators. In Section 4 we present the results of applying the methodology to the motivating experiment, before concluding with a discussion in Section 5.

2 Background
The motivating chemical synthesis process involves nine substances, labelled A to I. The substances of interest are E, F and H, where F constitutes the pharmaceutical product with E and H being unwanted by-product substances. Let A(t) denote the amount (in mols) of substance A at time t (in seconds), with similar notation for the remaining substances. At time t = 0, initial amounts of A, D and E, denoted by A₀ = A(0), D₀ = D(0) and E₀ = E(0), respectively, are introduced into the laboratory apparatus at a specified temperature (λ, in Kelvins) and volume (V, in litres). Let x = (x₁, ..., x_F) = (A₀, D₀, E₀, λ, V) be an F × 1 vector of controllable factors (with F = 5) and assume that x ∈ X = ∏_{f=1}^{F} [x_{f,min}, x_{f,max}]. The upper and lower limits, x_{f,min} and x_{f,max}, for each factor are given in Table 1.

[Table 1: definition, name, lower limit x_{f,min}, upper limit x_{f,max} and units for each of the F = 5 controllable factors: the initial amounts of A, D and E (x₁, x₂, x₃), the temperature λ (x₄) and the volume V (x₅).]

The substances evolve according to the chemical reactions
\[
\begin{aligned}
\mathrm{A} &\xrightarrow{\;k_1\;} \mathrm{B} + \mathrm{C},\\
\mathrm{B} + \mathrm{D} + \mathrm{E} &\overset{k_2}{\longleftrightarrow} \mathrm{B} + \mathrm{G} + \mathrm{F},\\
\mathrm{F} + \mathrm{B} &\xrightarrow{\;k_3\;} \mathrm{H} + \mathrm{I},
\end{aligned}
\tag{1}
\]
where k₁, k₂ and k₃ denote the chemical reaction rates. The second line denotes a reversible reaction and thus k₂ is split into k₂₋ and k₂₊, for the forward and backward reactions, respectively.

Let [A](t) = A(t)/V be the concentration (in mol/litre) at time t of A, let Ȧ(t) = d[A](t)/dt be the corresponding time derivative, and assume similar definitions for the remaining substances, B to I. Chemical reactions (1) lead to the following system of non-linear ODEs (suppressing the dependence on t):
\[
\begin{aligned}
\dot{A} &= -k_1[A], & \dot{B} &= k_1[A] - k_3[F][B], & \dot{C} &= k_1[A],\\
\dot{D} &= -k_{2-}[B][D][E] + k_{2+}[B][G][F], & \dot{E} &= -k_{2-}[B][D][E] + k_{2+}[B][G][F], & &\\
\dot{F} &= k_{2-}[B][D][E] - k_{2+}[B][G][F] - k_3[F][B], & \dot{G} &= k_{2-}[B][D][E] - k_{2+}[B][G][F], & &\\
\dot{H} &= k_3[F][B], & \dot{I} &= k_3[F][B], & &
\end{aligned}
\tag{2}
\]
for t ∈ T seconds. The initial concentrations of all nine substances are zero, except for A, D and E, which are [A](0) = A₀/V, [D](0) = D₀/V and [E](0) = E₀/V, respectively.

For given chemical reaction rates, the solution ([A](t), ..., [I](t)) from (2) provides the concentrations of the nine substances for given combinations of the controllable factors, x, and reaction rates. The amounts of the nine substances are then given by (A(t), ..., I(t)) = V × ([A](t), ..., [I](t)). However, the solution is analytically intractable and hence we employ a computationally expensive numerical method to find an approximate solution. In particular, we use the variable coefficient ordinary differential equation solver (VODE; Brown et al., 1989) as implemented in the R (R Core Team, 2018) package deSolve (Soetaert et al., 2010). However, the methodology described in Section 3 can be used in conjunction with any numerical method.

The first reaction is assumed to be instantaneous, which the chemical engineers model by fixing k₁ at a known, large value. The remaining reaction rates, k₂₋, k₂₊ and k₃, are unknown, and the temperature dependence is incorporated into the system of ODEs via the Arrhenius equation (see, for example, Laidler, 1984). Hence, the general form for a reaction rate is
\[
k_i = k_i^{(r)} \exp\left\{-\frac{E_i}{G}\left(\frac{1}{\lambda} - \frac{1}{\lambda^{(r)}}\right)\right\}, \qquad i = 2, 3,
\]
where λ^{(r)} is the reference temperature (here 298.15 K), k_i^{(r)} > 0 is the reaction rate at the reference temperature, E_i > 0 is the activation energy, and G is the gas constant. By definition, the activation energies for the reversible reactions are equal, i.e. E₂₊ = E₂₋ = E₂. This means there are p = 5 unknown physical parameters, γ = (γ₁, ..., γ₅) = (k₂₋^{(r)}, k₂₊^{(r)}, k₃^{(r)}, E₂, E₃). To increase computational efficiency of the methodology described in Section 3, we operate on the logarithm scale, i.e. consider θ_k = log γ_k, for k = 1, ..., p, and let θ = (θ₁, ..., θ_p) ∈ Θ = R^p.

The quality control question of interest is to make choices of initial amounts of A, D and E, temperature, volume and time such that the amounts of the three substances of interest satisfy the following specification limits (in mols):
\[
E(t) < 3, \qquad F(t) > 20, \qquad H(t) < 3. \tag{3}
\]
Recall that F is the pharmaceutical product and its amount needs to be sufficiently large to make the production process economically viable. However, E and H represent unwanted by-products with upper limits for safety reasons.
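For concreteness, the system (2) can be integrated numerically in R with the deSolve package mentioned above. The sketch below is illustrative only: the initial amounts, volume, temperature, reference rates and activation energies are placeholder values (not the experimental settings or the estimates obtained later), and the call uses deSolve's general-purpose ode() interface.

```r
# Sketch only: numerical solution of the kinetics ODEs (2) using deSolve.
# All numerical values below are illustrative placeholders.
library(deSolve)

# Arrhenius-type adjustment of a reference rate to temperature lambda (Kelvin)
arrhenius <- function(k_ref, E_act, lambda, lambda_ref = 298.15, gas_const = 8.314) {
  k_ref * exp(-(E_act / gas_const) * (1 / lambda - 1 / lambda_ref))
}

kinetics <- function(t, conc, pars) {
  with(as.list(c(conc, pars)), {
    dA <- -k1 * A
    dB <-  k1 * A - k3 * F * B
    dC <-  k1 * A
    dD <- -k2m * B * D * E + k2p * B * G * F
    dE <- -k2m * B * D * E + k2p * B * G * F
    dF <-  k2m * B * D * E - k2p * B * G * F - k3 * F * B
    dG <-  k2m * B * D * E - k2p * B * G * F
    dH <-  k3 * F * B
    dI <-  k3 * F * B
    list(c(dA, dB, dC, dD, dE, dF, dG, dH, dI))
  })
}

V <- 30; lambda <- 310                           # volume (litres) and temperature (K)
conc0 <- c(A = 10, B = 0, C = 0, D = 20, E = 25,
           F = 0, G = 0, H = 0, I = 0) / V       # initial concentrations (mol/litre)
pars <- c(k1  = 100,                             # first reaction treated as near-instantaneous
          k2m = arrhenius(0.05, 50, lambda),     # forward rate of the reversible reaction
          k2p = arrhenius(0.01, 40, lambda),     # backward rate
          k3  = arrhenius(0.02, 60, lambda))
out <- ode(y = conc0, times = seq(0, 3000, by = 10), func = kinetics, parms = pars)
amounts_EFH <- V * out[, c("E", "F", "H")]       # amounts (mols) of the substances of interest
```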
We estimate the values of the unknown physical parameters θ using observations from a small experiment. The chemical reactions (1) are observed for I′ = 6 runs, each defined by a combination of the controllable factors x, given in Table 2. For run i = 1, ..., I′, we observe the amounts of E, F and H at a series of n_i times t_{i1}, ..., t_{in_i}. The times range from 0.5 to 2902 seconds, with n_i ranging from 17 to 20 time points per run. In total, there are n = Σ_{i=1}^{I′} n_i = 109 observations of the amounts over all I′ runs. Note how run 6 is a repetition of run 5; thus there are I = I′ − 1 = 5 runs with unique combinations of the controllable factors, and m = 93 unique combinations of controllable factors and time points. Let (x′_j, t_j) be the values of the controllable factors and time for each of these unique combinations (j = 1, ..., m). The experiment was not designed to be statistically optimal for either the estimation of parameters θ or prediction for new treatments.

[Table 2: the values of the controllable factors (initial amounts of A, D and E, temperature λ in Kelvin, and volume V in litres) for each of the I′ = 6 runs.]

Let K = 3 be the number of substances of interest and let µ(θ; x, t) = log(E(t), F(t), H(t)) be the K × 1 vector of log amounts given by the physical model for controllable factors x, time t and parameters θ. Let Y_i be the n_i × K matrix containing the log observed amounts of E, F and H for the ith run, for i = 1, ..., I′. The jth row of Y_i, denoted by y_ij, is a K × 1 vector of log observed amounts at the jth time point of the ith run, for i = 1, ..., I′ and j = 1, ..., n_i. Let Y be the n × K matrix formed by stacking the matrices Y₁, ..., Y_{I′}, i.e.
\[
\mathbf{Y} = \begin{pmatrix} \mathbf{Y}_1 \\ \vdots \\ \mathbf{Y}_{I'} \end{pmatrix}.
\]

The measurement process cannot reliably detect amounts below a threshold of χ = 0.01 mols. Therefore, we have censored values when the observed amount is less than χ. In total, there are zero censored observations in the first column of Y, and 5 and 45 in the second and third columns, respectively. We take account of the censoring in our statistical modelling strategy described in Section 2.3.

2.3 Statistical model

We assume that
\[
\mathbf{Y} = \mathbf{G}\left(\mathbf{M}(\theta) + \mathbf{D}\right) + \mathbf{E}. \tag{4}
\]
In (4), M(θ) is the m × K matrix of unique model solutions, given by
\[
\mathbf{M}(\theta) = \begin{pmatrix} \mathbf{M}_1(\theta) \\ \vdots \\ \mathbf{M}_I(\theta) \end{pmatrix},
\]
where M_i(θ), for i = 1, ..., I − 1, is the n_i × K matrix with jth row given by µ(θ; x_i, t_ij), and M_I(θ) is the (n_I + 2) × K matrix where the jth row is given by µ(θ; x_I, t_Ij), for j = 1, ..., n_I, and the (n_I + 1)th and (n_I + 2)th rows are given by µ(θ; x_I, ·) evaluated at the two time points that differ between runs 5 and 6.

Furthermore, D is the m × K matrix of model discrepancy errors, given by
\[
\mathbf{D} = \begin{pmatrix} \mathbf{D}_1 \\ \vdots \\ \mathbf{D}_I \end{pmatrix},
\]
which represents the discrepancy between the physical model and the true mean value of the process (Kennedy and O'Hagan, 2001). The n × m binary incidence matrix G identifies the rows of M(θ) + D corresponding to the rows of Y. Finally, E is the n × K matrix of observational errors.

The distribution of E is assumed to be E ∼ MN(0, Ω, T), where MN denotes the matrix normal distribution (e.g. Dawid, 1981), Ω is an unknown K × K column covariance matrix, and T is an n × n row correlation matrix. The matrix T = diag{T₁, ..., T_{I′}} is block-diagonal, with the jlth element of T_i given by
\[
T_{i,jl} = \exp\left\{-\rho\left(t_{ij} - t_{il}\right)^2\right\}, \qquad j, l = 1, \dots, n_i,
\]
with time correlation parameter ρ > 0. This structure for E assumes that the correlation between observations from the same run depends on the difference in time, but observations from different runs are independent.

We complete the model specification by specifying prior distributions for the discrepancy D and the model parameters. The discrepancy is assumed to follow a matrix normal distribution,
\[
\mathbf{D} \sim \mathrm{MN}\left(\mathbf{0}, \boldsymbol{\Sigma}, \mathbf{S}\right), \tag{5}
\]
where Σ is an unknown K × K covariance matrix and S is an m × m correlation matrix. The jlth element of S is given by
\[
S_{jl} = \exp\left\{-\psi_1\sum_{f=1}^{F}\left(d_{jf} - d_{lf}\right)^2 - \psi_2\left(t_j - t_l\right)^2\right\}, \tag{6}
\]
for j, l = 1, ..., m, where the model discrepancy correlation parameters ψ₁ > 0 and ψ₂ > 0, and d_jf = (x′_{jf} − x_{f,min})/(x_{f,max} − x_{f,min}) is the scaled value of the fth controllable factor, for f = 1, ..., F and j = 1, ..., m, with x′_{jf} the fth element of x′_j. This correlation structure allows the differences in model discrepancy between different treatments to depend on the "distance" between the controllable factors and time points for each treatment.

The chemical engineers have some limited prior knowledge on the location of θ. It is believed that each reaction rate, and each activation energy, is very likely to lie between elicited lower and upper limits. We encode this information by assuming that the reaction rates and activation energies have independent log-normal prior distributions with hyperparameters chosen so that there is probability 0.95 that the value of interest lies between the two limits. This is achieved by setting the 0.025 and 0.975 quantiles to be the lower and upper limits, respectively. In summary, we assume θ = log γ ∼ N(µ₀, ∆₀), where µ₀ and ∆₀ ∝ I_p are determined by this quantile matching. There is no prior information available for the remaining parameters, so we specify vague prior distributions that contribute negligible information. The variance matrices Ω and Σ are both assumed to have inverse-Wishart prior distributions with ν degrees of freedom, and ρ, ψ₁ and ψ₂ are given independent exponential prior distributions with mean equal to one, following the arguments of, for example, Overstall and Woods (2016). In Section 4 we discuss the results of a sensitivity analysis to assess the robustness to the choice of these vague prior distributions.

Let y = vec(Y), m(θ) = vec(M(θ)), d = vec(D) and e = vec(E), with the vec operator stacking the columns of a matrix.
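The Gaussian correlation structures for the blocks T_i and for S in (6) are straightforward to compute directly. The following R sketch is illustrative only: the design values and the correlation parameters ρ, ψ₁ and ψ₂ are arbitrary placeholders, not estimates from the analysis.

```r
# Sketch only: squared-exponential correlation matrices T_i (within-run errors)
# and S (model discrepancy, equation (6)). All values are placeholders.
time_corr <- function(times, rho) {
  exp(-rho * outer(times, times, "-")^2)
}

discrepancy_corr <- function(d_scaled, times, psi1, psi2) {
  # d_scaled: m x F matrix of controllable factors scaled to [0, 1]
  D2 <- as.matrix(dist(d_scaled))^2        # sum_f (d_jf - d_lf)^2
  T2 <- outer(times, times, "-")^2         # (t_j - t_l)^2
  exp(-psi1 * D2 - psi2 * T2)
}

t_run    <- c(0.5, 10, 60, 300, 1200, 2900)           # illustrative time points (s)
T_i      <- time_corr(t_run, rho = 1e-6)
d_scaled <- matrix(runif(6 * 5), nrow = 6, ncol = 5)   # 6 treatments, F = 5 factors
S_mat    <- discrepancy_corr(d_scaled, t_run, psi1 = 1, psi2 = 1e-6)
```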
Then a model specification equivalent to (4) is given by
\[
\mathbf{y} = \mathbf{H}\left(\mathbf{m}(\theta) + \mathbf{d}\right) + \mathbf{e}, \tag{7}
\]
where H = I_K ⊗ G,
\[
\mathbf{e} \sim \mathrm{N}\left(\mathbf{0}, \boldsymbol{\Omega}\otimes\mathbf{T}\right), \tag{8}
\]
\[
\mathbf{d} \sim \mathrm{N}\left(\mathbf{0}, \boldsymbol{\Sigma}\otimes\mathbf{S}\right), \tag{9}
\]
and ⊗ denotes the Kronecker product.

Let y_S and y_C denote the elements of y which are fully observed (i.e. greater than log χ) and censored (i.e. less than log χ), respectively. Therefore, we need to evaluate the posterior distribution (conditional on y_S) of the model parameters and censored observations. This distribution is given by
\[
\pi\left(\theta, \boldsymbol{\Omega}, \rho, \mathbf{d}, \boldsymbol{\Sigma}, \psi, \mathbf{y}_C \mid \mathbf{y}_S\right) \propto \pi\left(\mathbf{y}\mid\theta, \mathbf{d}, \boldsymbol{\Omega}, \rho\right)\pi\left(\mathbf{d}\mid\boldsymbol{\Sigma}, \psi\right)\pi(\theta)\pi(\boldsymbol{\Omega})\pi(\rho)\pi(\boldsymbol{\Sigma})\pi(\psi), \tag{10}
\]
where π(y | θ, d, Ω, ρ) is the complete data likelihood given by (7) and (8), π(d | Σ, ψ) is the density of the vectorised model discrepancy (9), and π(θ), π(Ω), π(ρ), π(Σ) and π(ψ) are prior densities for the model parameters.

2.4 Prediction

Let y be a K × 1 vector of log amounts of E, F and H at a new point x ∈ X and t ∈ T. The posterior predictive distribution of y is given by
\[
\pi\left(\mathbf{y}\mid\mathbf{y}_S\right) = \int \pi\left(\mathbf{y}\mid\theta, \mathbf{D}, \psi, \boldsymbol{\Omega}, \boldsymbol{\Sigma}\right)\pi\left(\theta, \mathbf{D}, \psi, \boldsymbol{\Omega}, \boldsymbol{\Sigma}\mid\mathbf{y}_S\right)\mathrm{d}\theta\,\mathrm{d}\mathbf{D}\,\mathrm{d}\psi\,\mathrm{d}\boldsymbol{\Omega}\,\mathrm{d}\boldsymbol{\Sigma}, \tag{11}
\]
where
\[
\mathbf{y}\mid\theta, \mathbf{D}, \psi, \boldsymbol{\Omega}, \boldsymbol{\Sigma} \sim \mathrm{N}\left(\mu(\theta; x, t) + \mathbf{D}^T\mathbf{S}^{-1}\mathbf{s},\; \left(1 - \mathbf{s}^T\mathbf{S}^{-1}\mathbf{s}\right)\boldsymbol{\Sigma} + \boldsymbol{\Omega}\right),
\]
and s is an m × 1 vector with jth element given by
\[
s_j = \exp\left\{-\psi_1\sum_{f=1}^{F}\left(d_{jf} - d_f\right)^2 - \psi_2\left(t_j - t\right)^2\right\},
\]
with d_f = (x_f − x_{f,min})/(x_{f,max} − x_{f,min}) being the scaled value of each element of x, for f = 1, ..., F.

The probability of E, F and H satisfying the specification limits (3) at the point (x, t) is given by
\[
\mathrm{P}\left(\mathbf{y}\in\mathcal{Y}\mid\mathbf{y}_S\right) = \int_{\mathcal{Y}}\pi\left(\mathbf{y}\mid\mathbf{y}_S\right)\mathrm{d}\mathbf{y},
\]
where Y = {η ∈ R^K | η₁ < log 3, η₂ > log 20, η₃ < log 3}.

3 Methodology

The posterior and predictive distributions given by (10) and (11), respectively, will be analytically intractable. Our general approach to numerically approximating these distributions uses two phases. The Sampling Phase involves generating a sample from the posterior distribution using MCMC methods. The Prediction Phase uses this MCMC sample to generate a sample from the posterior predictive distribution and thus estimate the probability P(y ∈ Y | y_S) by the proportion of sampled values satisfying the specification limits. These phases make use of Gibbs sampling and parallel tempering algorithms, and multivariate Gaussian process emulators for the numerical solution to the ODEs.

3.1 Gibbs sampling and parallel tempering

We use Gibbs sampling in conjunction with parallel tempering to generate a sample from the joint posterior distribution (10) of model parameters and censored observations. To improve the efficiency of the Gibbs sampler, we employ hierarchical centring (Papaspiliopoulos et al., 2003) and reparameterise model (7) as
\[
\mathbf{y} = \mathbf{H}\mathbf{d}^* + \mathbf{e}, \tag{12}
\]
where
\[
\mathbf{d}^* \sim \mathrm{N}\left(\mathbf{m}(\theta), \boldsymbol{\Sigma}\otimes\mathbf{S}\right). \tag{13}
\]
Sampled values of d* can easily be transformed to values of d using d = d* − m(θ). The posterior distribution of model parameters and censored observations is now given by
\[
\pi\left(\theta, \boldsymbol{\Omega}, \rho, \mathbf{d}^*, \boldsymbol{\Sigma}, \psi, \mathbf{y}_C \mid \mathbf{y}_S\right) \propto \pi\left(\mathbf{y}\mid\mathbf{d}^*, \boldsymbol{\Omega}, \rho\right)\pi\left(\mathbf{d}^*\mid\theta, \boldsymbol{\Sigma}, \psi\right)\pi(\theta)\pi(\boldsymbol{\Omega})\pi(\rho)\pi(\boldsymbol{\Sigma})\pi(\psi),
\]
where π(y | d*, Ω, ρ) is the complete-data likelihood defined by (12) and π(d* | θ, Σ, ψ) is defined by (13). The full conditional posterior distributions of d*, Ω and Σ are of known form and are given in Appendix A. The full conditional posterior distribution of y_C is a truncated multivariate normal distribution (see Appendix A) and there exist efficient methods for generating values from such distributions (Geweke, 1991).

For the remaining parameters θ, ρ and ψ, the full conditional distributions are not of known form, so a Metropolis-within-Gibbs step is employed. Specifically, we exploit the Riemann manifold Langevin Metropolis-Hastings (RMLMH) algorithm (Girolami and Calderhead, 2011). This method uses derivative information on the posterior surface to construct an efficient proposal distribution. The algorithmic details are briefly described in Appendix B and the components necessary for its application to the full conditional posterior distributions of θ, ρ and ψ are given in Appendix C.

Parallel tempering (see, for example, Gelman et al., 2014, pp. 299-300) is used to improve the efficiency of sampling from a potentially multi-modal posterior distribution. It involves defining R distributions by a series of increasing "temperatures", where the lowest temperature distribution corresponds to the posterior distribution. An MCMC chain is initialized under each distribution, and the parallel tempering scheme allows for swaps between the states of the chains for distributions at different temperatures. The lowest temperature MCMC chain is a sample from the posterior distribution. Parallel tempering allows larger moves (e.g. between different areas of high posterior density) made at the higher temperature chains to be passed down to the lower temperature chains, hence improving mixing.

Let δ = (θ, Ω, ρ, d*, Σ, ψ, y_C) be the collection of unknown parameters and censored observations. For r = 1, ..., R, define the distribution for the rth chain to have density proportional to
\[
U_r(\delta) = \exp\left\{\frac{\log\pi\left(\mathbf{y}_S\mid\delta\right) + \log\pi(\delta)}{\tau_r}\right\},
\]
where 1 = τ₁ < ··· < τ_R is a series of temperatures. A parallel tempering iteration involves applying either (i) a sampling step (with probability ω), or (ii) a swap step (with probability 1 − ω). In the sampling step, the current values δ^c_{(r)} in the rth chain are updated to δ^{c+1}_{(r)}, for r = 1, ..., R, using the Gibbs sampling algorithm. In the swap step, two neighbouring chains (r and r + 1) are chosen at random. With probability
\[
\min\left[1,\; \frac{U_{r+1}\left(\delta^c_{(r)}\right)U_r\left(\delta^c_{(r+1)}\right)}{U_{r+1}\left(\delta^c_{(r+1)}\right)U_r\left(\delta^c_{(r)}\right)}\right],
\]
we set δ^{c+1}_{(r)} = δ^c_{(r+1)} and δ^{c+1}_{(r+1)} = δ^c_{(r)}. To complete sampling step (i) for each distribution at temperature τ_r, the full conditional and RMLMH components need to be adjusted (see Appendices A and B for details).
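A minimal sketch of the swap step just described, assuming a hypothetical helper log_post() that returns log π(y_S | δ) + log π(δ), and a list state holding the current value of δ in each chain:

```r
# Sketch only: one swap move between neighbouring tempered chains. log_post()
# and the list `state` of current chain values are assumed to be defined elsewhere.
swap_step <- function(state, log_post, temps) {
  R <- length(temps)
  r <- sample(R - 1, 1)                                  # neighbouring pair (r, r + 1)
  logU <- function(delta, tau) log_post(delta) / tau     # log U_r(delta)
  log_accept <- logU(state[[r]], temps[r + 1]) + logU(state[[r + 1]], temps[r]) -
                logU(state[[r + 1]], temps[r + 1]) - logU(state[[r]], temps[r])
  if (log(runif(1)) < log_accept) {                      # accept: swap the two chain states
    tmp <- state[[r]]
    state[[r]] <- state[[r + 1]]
    state[[r + 1]] <- tmp
  }
  state
}
```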
3.2 Accelerating the phases using an emulator

One iteration of the RMLMH step in the Gibbs sampling algorithm outlined in Section 3.1 requires one evaluation of m(θ) to calculate the acceptance probability, and one evaluation each of the sensitivities ∂m(θ)/∂θ and ∂²m(θ)/∂θ∂θ_k to form the proposal distribution. To evaluate m(θ) we are required to evaluate µ(θ; x_i, t_ij), for i = 1, ..., I and j = 1, ..., n_i. The derivative terms require evaluation of ∂µ(θ; x_i, t_ij)/∂θ and ∂²µ(θ; x_i, t_ij)/∂θ∂θ_k, for i = 1, ..., I, j = 1, ..., n_i and k = 1, ..., p.

The system of ODEs can be augmented with additional equations to numerically solve for the first- and second-order sensitivities (Valko and Vajda, 1984; Girolami and Calderhead, 2011). However, this will substantially increase the computational burden of the numerical solution. Note that generating a sample of size B using parallel tempering with R chains will require BR evaluations of the three functions for each x_i and t_ij, for i = 1, ..., I, j = 1, ..., n_i and k = 1, ..., p. Once we have generated a sample of size B from the posterior distribution, to estimate the probability P(y ∈ Y | y_S) we require B further evaluations of µ(θ; x, t) for each x ∈ X and t ∈ T at which we wish to evaluate this probability.

Making such a large number of evaluations of the numerical solution is clearly computationally infeasible. Instead we form an approximation to the numerical solution by the construction of a statistical emulator through a computer experiment (Sacks et al., 1989). The numerical solution to the ODEs is evaluated at a selection of N combinations of parameters θ, controllable variables x, and times t. These evaluations are treated as data to which a statistical model, termed an emulator, is fitted. The emulator is essentially a predictive distribution (denoted by Q(θ, x, t)) from which fast predictions of the numerical solution can be obtained. For example, the predictive mean, µ̂(θ; x, t), will be used as a point prediction. For more details on computer experiments and statistical emulators, see, for example, Santner et al. (2003), Fang et al. (2006) and Dean et al. (2015, Section V).

In the Sampling Phase, we apply the Gibbs sampling and parallel tempering algorithm with the following amendments:

1. All evaluations of the sensitivities ∂µ(θ; x, t)/∂θ and ∂²µ(θ; x, t)/∂θ∂θ_k are replaced by ∂µ̂(θ; x, t)/∂θ and ∂²µ̂(θ; x, t)/∂θ∂θ_k, respectively, for k = 1, ..., p.

2. R + 1 chains are used, with temperatures 1 = τ₀ = τ₁ < τ₂ < ··· < τ_R. In chains r ≥ 1, evaluations of µ(θ; x, t) in the evaluation of U_r(δ) in the acceptance probability are replaced by evaluations of µ̂(θ; x, t). For evaluation of U₀(δ) in chain r = 0, with τ₀ = 1, evaluation of the actual µ(θ; x, t) is retained.

This modification results in the sample generated in chain r = 0 being a sample from the posterior distribution of δ, but the overall sampling scheme requiring only B evaluations of µ(θ; x_i, t_ij) for i = 1, ..., I and j = 1, ..., n_i.

In the Prediction Phase, to estimate the probability of satisfying the specification limits, all evaluations of µ(θ; x, t) are replaced by a value generated from Q(θ, x, t). Sampling from this predictive distribution means the estimate of the probability takes account of the additional uncertainty introduced by using an emulator approximation.

3.3 Multivariate Gaussian process emulation

In this paper, we employ the multivariate Gaussian process (MGP; Conti and O'Hagan, 2010) emulator. The numerical solution to the ODEs, µ(θ; x, t), is evaluated at each of the elements of the set (or meta-design) ζ = {(θ*₁, x*₁, t*₁), ..., (θ*_N, x*_N, t*_N)}. The superscript ∗ notation has been introduced to distinguish the meta-design from the design of the physical experiment. Let z_i = µ(θ*_i; x*_i, t*_i), let Z be the N × K matrix with ith row given by z_i^T, and let w_i = (θ*_i, x*_i, t*_i) be the U × 1 vector of combined inputs, where U = p + F + 1 = 11. The central assumption of the MGP is that Z follows a matrix normal distribution,
\[
\mathbf{Z} \sim \mathrm{MN}\left(\mathbf{1}_N\beta^T, \boldsymbol{\Phi}, \mathbf{P}\right),
\]
where β is a K × 1 vector containing the mean of each column of Z, Φ is a K × K unstructured column covariance matrix, P is an N × N row correlation matrix and 1_N is the N × 1 vector of ones. We define the ijth element of P as
\[
P_{ij} = \exp\left\{-\sum_{u=1}^{U}\alpha_u\left(w_{iu} - w_{ju}\right)^2\right\},
\]
where w_{iu} is the uth element of w_i and α_u > 0, for u = 1, ..., U.

Suppose we wish to predict z = µ(θ; x, t). Let
\[
\hat{\beta} = \frac{\mathbf{Z}^T\mathbf{P}^{-1}\mathbf{1}_N}{\mathbf{1}_N^T\mathbf{P}^{-1}\mathbf{1}_N}, \tag{14}
\]
\[
\hat{\Psi} = \mathbf{Z}^T\mathbf{P}^{-1}\left(\mathbf{I}_N - \frac{\mathbf{1}_N\mathbf{1}_N^T\mathbf{P}^{-1}}{\mathbf{1}_N^T\mathbf{P}^{-1}\mathbf{1}_N}\right)\mathbf{Z}, \tag{15}
\]
and w = (θ, x, t), with I_N the N × N identity matrix. It can be shown that the predictive distribution of z, i.e. Q(θ, x, t), conditional on the outputs z₁, ..., z_N, is given by
\[
\mathrm{N}\left(\hat{\mu}(\theta; x, t), \hat{\Pi}(\theta; x, t)\right), \tag{16}
\]
where
\[
\hat{\mu}(\theta; x, t) = \hat{\beta} + \left(\mathbf{Z} - \mathbf{1}_N\hat{\beta}^T\right)^T\mathbf{P}^{-1}\mathbf{p}_0, \tag{17}
\]
\[
\hat{\Pi}(\theta; x, t) = \left(1 - \mathbf{p}_0^T\mathbf{P}^{-1}\mathbf{p}_0\right)\hat{\Psi}, \tag{18}
\]
with the N × 1 vector p₀ having ith element
\[
p_{0i} = \exp\left\{-\sum_{u=1}^{U}\alpha_u\left(w_{iu} - w_u\right)^2\right\},
\]
and w_u being the uth element of w. The predictive distribution given by (16) is conditional on the correlation parameters, α = (α₁, ..., α_U). We replace these values by their marginal posterior mode (e.g. Overstall and Woods, 2016), although a fully Bayesian approach could be taken with these parameters integrated out with respect to their posterior distribution, conditional on Z.

3.3.1 Meta-design

The two emulators formed prior to the Sampling and Prediction Phases depend on the choice of the meta-design ζ. The numerical solution, µ(θ; x, t), has three different inputs: θ, x and t. We construct the meta-design as a cartesian product of designs for each of these three input types, ζ = ζ₁ × ζ₂ × ζ₃, where ζ₁ = {θ*₁, ..., θ*_{N₁}}, ζ₂ = {x*₁, ..., x*_{N₂}}, ζ₃ = {t*₁, ..., t*_{N₃}}, and N = ∏_{i=1}^{M} N_i with M = 3. Separate designs for t and for θ and x allow us to exploit computational efficiencies in the computation of the numerical solution to the ODEs: computing µ(θ; x, t₁) and µ(θ; x, t₂), for t₁ ≠ t₂, requires a negligible increase in computational effort over just computing µ(θ; x, t₁).

This structure of meta-design also allows the emulator row correlation matrix, P, to be decomposed as
\[
\mathbf{P} = \bigotimes_{i=1}^{M}\mathbf{P}_i = \mathbf{P}_1\otimes\mathbf{P}_2\otimes\mathbf{P}_3,
\]
where the ijth elements of P₁, P₂ and P₃ are
\[
P_{1,ij} = \exp\left\{-\sum_{u=1}^{p}\alpha_u\left(\theta^*_{iu} - \theta^*_{ju}\right)^2\right\}, \quad
P_{2,ij} = \exp\left\{-\sum_{u=1}^{F}\alpha_{p+u}\left(x^*_{iu} - x^*_{ju}\right)^2\right\}, \quad
P_{3,ij} = \exp\left\{-\alpha_U\left(t^*_i - t^*_j\right)^2\right\},
\]
for i, j = 1, ..., N₁, i, j = 1, ..., N₂ and i, j = 1, ..., N₃, respectively. Hence,
\[
\mathbf{P}^{-1} = \bigotimes_{i=1}^{M}\mathbf{P}_i^{-1},
\]
significantly reducing the computational burden of constructing the emulator and improving numerical stability. Similarly, for a meta-design with this structure, (14), (15), (17) and (18) also simplify:
\[
\hat{\beta} = \frac{\mathbf{Z}^T\left(\bigotimes_{i=1}^{M}\mathbf{P}_i^{-1}\mathbf{1}_{N_i}\right)}{\prod_{i=1}^{M}\mathbf{1}_{N_i}^T\mathbf{P}_i^{-1}\mathbf{1}_{N_i}},
\qquad
\hat{\Psi} = \mathbf{Z}^T\left[\bigotimes_{i=1}^{M}\mathbf{P}_i^{-1} - \frac{\bigotimes_{i=1}^{M}\mathbf{P}_i^{-1}\mathbf{1}_{N_i}\mathbf{1}_{N_i}^T\mathbf{P}_i^{-1}}{\prod_{i=1}^{M}\mathbf{1}_{N_i}^T\mathbf{P}_i^{-1}\mathbf{1}_{N_i}}\right]\mathbf{Z},
\]
\[
\hat{\mu}(\theta; x, t) = \hat{\beta} + \left(\mathbf{Z} - \mathbf{1}_N\hat{\beta}^T\right)^T\left(\bigotimes_{i=1}^{M}\mathbf{P}_i^{-1}\mathbf{p}_i\right),
\qquad
\hat{\Pi}(\theta; x, t) = \left(1 - \prod_{i=1}^{M}\mathbf{p}_i^T\mathbf{P}_i^{-1}\mathbf{p}_i\right)\hat{\Psi},
\]
where the vectors p₁, p₂ and p₃ have jth elements given by
\[
p_{1j} = \exp\left\{-\sum_{u=1}^{p}\alpha_u\left(\theta^*_{ju} - \theta_u\right)^2\right\}, \quad
p_{2j} = \exp\left\{-\sum_{u=1}^{F}\alpha_{p+u}\left(x^*_{ju} - x_u\right)^2\right\}, \quad
p_{3j} = \exp\left\{-\alpha_U\left(t^*_j - t\right)^2\right\},
\]
for j = 1, ..., N₁, j = 1, ..., N₂ and j = 1, ..., N₃, respectively.

When we choose ζ to construct the emulator for the Sampling Phase, we choose the values of x and t to coincide with the design of the physical experiment, i.e. to be equal to x₁, ..., x_I and t₁, ..., t_m. Hence uncertainty in the prediction will only result from the different choice of parameter θ. For our experiment, this means fixing ζ₂ = {x₁, ..., x₅} and ζ₃ = {t₁, ..., t_m}, i.e. N₂ = I = 5 and N₃ = m = 93.

We use the following exploratory algorithm to adaptively choose ζ₁. This uses a hybrid of the methodologies proposed by Rasmussen (2003), Fielding et al. (2011) and Overstall and Woods (2013). Rasmussen (2003) first proposed using MCMC methods to adaptively improve an emulator of an unnormalised posterior density based on a computationally expensive function in light of observed data. Fielding et al. (2011) extended this methodology by using parallel tempering, and Overstall and Woods (2013) proposed to emulate the computationally expensive function, as opposed to the posterior density, to make model criticism more efficient.

The following exploratory algorithm iteratively improves the accuracy of the emulator in the region of the parameter space Θ corresponding to high posterior density.

1. Generate the set ζ₁ = {θ*₁, ..., θ*_{N₁}} as a sample of size N₁ from the prior distribution of θ.

2. Let ζ = ζ₁ × ζ₂ × ζ₃ and fit the MGP to the resulting evaluations of µ(θ; x, t).

3. Repeat the following steps until µ(θ; x, t) has been evaluated a total of N times.

   (a) Perform L iterations of the Gibbs sampling and parallel tempering scheme (for chains r = 1, ..., R) where evaluation of µ(θ; x, t) is replaced by evaluation of µ̂(θ; x, t) in all instances.

   (b) Evaluate z̃_{rij} = µ(θ_{(r)}; x_i, t_j), for r = 1, ..., R, i = 1, ..., I and j = 1, ..., n_i, where θ_{(r)} is the current value of θ in the rth chain. Augment the matrix Z by z̃_{rij} for r = 1, ..., R, i = 1, ..., I and j = 1, ..., n_i. Refit the MGP.

The meta-design for the Prediction Phase is constructed as the cartesian product of three space-filling designs. Firstly, ζ₁ is constructed as a space-filling (e.g. Johnson et al., 1990) sub-sample of size N₁, selected using the cover.design function in the R package fields (Nychka et al., 2015), using the sample generated from the posterior distribution of θ in the Sampling Phase as a candidate list. Secondly, ζ₂ and ζ₃ are chosen as Latin hypercube designs in X and T of sizes N₂ and N₃, respectively (Santner et al., 2003, Ch. 5).
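As described above, in the Prediction Phase the probability P(y ∈ Y | y_S) at a candidate (x, t) is estimated by the proportion of posterior predictive draws satisfying (3). The following R sketch shows the two ingredients; the inputs (a draw of µ(θ; x, t) from the emulator, together with D, Σ, Ω, S and s from one posterior draw) are assumed to be available.

```r
# Sketch only: estimating P(y in Y | y_S) at a single (x, t) from B predictive draws.
predictive_draw <- function(mu, D, Sigma, Omega, S, s) {
  # One draw from (11): y | . ~ N(mu + D^T S^{-1} s, (1 - s^T S^{-1} s) Sigma + Omega)
  Sinv_s <- solve(S, s)
  mean_y <- as.numeric(mu + t(D) %*% Sinv_s)
  var_y  <- as.numeric(1 - crossprod(s, Sinv_s)) * Sigma + Omega
  MASS::mvrnorm(1, mean_y, var_y)
}

prob_in_spec <- function(y_pred) {
  # y_pred: B x 3 matrix of predictive draws of the log amounts of (E, F, H)
  mean(y_pred[, 1] < log(3) & y_pred[, 2] > log(20) & y_pred[, 3] < log(3))
}
```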
4 Results

4.1 Sampling Phase

To create the meta-design for the Sampling Phase, we used an initial sample of size N₁ = 100 from the prior distribution of θ and set up R parallel chains, with L = 50 iterations of the scheme between refits of the emulator. We adapt the temperatures of the parallel chains using the methodology of Miasojedow et al. (2013). Figure 1 shows the resulting values of θ* in the meta-design ζ₁. It is clear that the exploratory algorithm selects points from a concentrated region of Θ.

We then obtain a sample of size B from the posterior distribution of δ using the amended Gibbs sampling and parallel tempering algorithm presented in Section 3.2. Convergence was assessed informally via trace plots (not shown), which showed that the chains had mixed adequately and each posterior was unimodal. Figure 2 shows plots of the prior and estimated posterior densities for each element of θ as well as the elements of ψ and ρ. Clearly the data has led to an increase in information from the prior to the posterior distributions. Diagnostics for assessing the adequacy of the fitted model (see, for example, Gelman et al., 2014, Chapter 6) indicated that there were no reasons to believe that the fit was inadequate.

Figure 1: Values of θ* in the meta-design ζ₁ for the Sampling Phase as found by the exploratory algorithm in Section 3.3.1: grey points are the N₁ initial values generated from the prior distribution and black points are the values selected in step 3 of the exploratory algorithm.
Figure 2: Trace plots of the posterior sample (left hand panels) and the estimated posterior and prior densities (right hand panels) for each element of θ.

4.2 Prediction Phase

We now use the sample generated from the posterior distribution of δ, and, in particular, θ, to construct an emulator Q(θ, x, t), as defined in (16), for the numerical solution to the ODEs. We let N₁ = N₂ = 50 and N₃ = 50. It could be noted that this value of N₁ is quite small compared to the rule of thumb of Loeppky et al. (2009) that the sample size should be approximately ten times the number of input dimensions. To assess the adequacy of the emulator, we created a test meta-design of the same size (as above, a space-filling sub-sample with a candidate list excluding the points from the original meta-design) and implemented the diagnostic methods of Overstall and Woods (2016). In particular, the root mean squared errors between the numerical solution and the predictive mean were 3.·×10^{−·}, 0.19 and 0.40 for each of the K = 3 outputs of µ(θ; x, t), respectively. The overall coverage of the 95% predictive intervals, as assessed on the test design, was 89.0%.

[Table 3: the optimum values of the controllable factors x (name, symbol, value and unit for the initial amounts of A, D and E, the temperature λ and the volume V) and of the time t, as found by ACE.]

Now we can use the emulator Q(θ, x, t) to approximate the probability, P(y ∈ Y | y_S), of satisfying the specification limits for given x and t. We need to maximize this probability over the space S = X × T. To do this we use the approximate coordinate exchange (ACE; Overstall and Woods, 2017) algorithm. ACE was originally developed for finding Bayesian optimal experimental designs, where the objective function is maximised over a design space. In these situations, the objective function is usually analytically intractable. ACE uses a cyclic ascent algorithm (e.g. Lange, 2013, Chapter 7) to maximise the objective function via a univariate Gaussian process emulator based on evaluations of a Monte Carlo approximation to the objective function. We use the implementation of ACE given by the R package acebayes (Overstall et al., 2017). We use 100 random starts of ACE, where the initial value of (x, t) is uniformly generated in S. The optimum values of x and t, as found by ACE, are shown in Table 3. The approximate maximum value of P(y ∈ Y | y_S) attained was 0.77 (subject to Monte Carlo error). We investigated the sensitivity of the above approach to the choice of prior distributions for ρ, ψ, Ω and Σ. We did this by repeating the above analysis to find the optimal settings for x and t under different prior specifications. We found the results to be broadly robust to the choice of prior distribution; details are provided in Appendix D.

To further explore how P(y ∈ Y | y_S) depends on x and t, we fix the initial amount of E, the volume and the temperature at 26.47 mols, 31.28 litres and 313.5 K, respectively, i.e. the optimum values as found by ACE. We then vary the initial amounts of A and D, and time, over the ranges identified in Table 1. The last row of Figure 3 shows P(y ∈ Y | y_S) for time against the initial amount of A, with different columns for different initial amounts of D. The first three rows show corresponding plots for the marginal probability of satisfying each of the specification limits (3) on E, F and H, respectively. The trade-off between the objectives given by the specification limits can be clearly seen. To satisfy the limit E(t) < 3, t has to be large. This is intuitively obvious since the initial amount of E is 26.47 mols, so the process will need to progress for some time before the amount of E has decreased enough to satisfy the limit. The opposite is true for H (initial amount of 0 mols), where satisfying H(t) < 3 requires t to be small. This means that there is a very narrow window of values of t that will satisfy all three constraints, as shown in the last row of Figure 3. Note that the optimum values of x and t (as shown in Table 3) lie within this narrow window.
Figure 3: The probability of satisfying each of the three marginal constraints and the joint constraints for time (in s; x-axis) against initial amount of A (in mols; y-axis) for five different initial amounts of D. The colours indicate the magnitude of the probability, with black being 0 and white being 1.
5 Discussion

This paper has developed and applied methodology for choosing combinations of values of the controllable factors that produce a high probability of meeting process specification when the process is at least approximately determined by a physical model. Our application was a chemical synthesis process used to produce a pharmaceutical product, where specification was determined by (a) the amount of the substance constituting the pharmaceutical product being above some level; and (b) the amounts of two unwanted by-product substances being below some levels. The relationship between the controllable factors and the amounts of substances of interest is hypothesised to be governed by the analytically intractable solution to a system of ODEs. Responses from a physical experiment are used to refine the knowledge on this relationship, and the probability of satisfying the constraints is maximized. Statistical emulators for the numerical solution are constructed to accelerate the model-fitting process and the estimation of the probability of satisfying the constraints. The methodology produces computationally feasible results that take account of the different uncertainties involved in the experimental and modelling processes (measurement error, model inadequacy, emulator error). The methods have the potential to be applied to a variety of quality control and pharmaceutical design space problems that involve the numerically expensive approximation of physical models.

The modelling and computational approach taken in this paper can be modified in a variety of different ways to suit the application of interest. For example, we have explicitly taken account of the model discrepancy. If the experimenters believed the model given by the solution to the ODEs was a very close approximation to the true physical process, then the model discrepancy could be discarded by setting D = 0 and omitting the steps for sampling from the full conditional distributions of Σ and ψ. Another example of modification is the introduction of noise into the specification of the controllable factors. In this paper, we assumed that the chemical engineers have complete control over the specification of the controllable factors when we perform the maximization of the probability of satisfying the specification limits given by (3). In some cases, the chemical engineer would not be able to control these exactly. Following the approach in this paper, it would be straightforward to introduce variability as an intermediate step between specifying the controllable variables and evaluating the Gaussian process emulator. Lastly, we employed the RMLMH algorithm to sample from the full conditional distributions of θ, ρ and ψ. It would also be possible to use the methodology in this paper with Riemann manifold Hamiltonian Monte Carlo methodology (Girolami and Calderhead, 2011).

Acknowledgments
The authors would like to thank Drs John Peterson, Mohammad Yahyah and Neil Hodnett (GlaxoSmithKline) for providing the original kinetics modelling problem and data set. Additionally, the authors are grateful to the editor, associate editor and two referees who provided valuable feedback on earlier versions of the paper. DCW was supported by UK Engineering and Physical Sciences Research Council (EPSRC) Fellowship EP/J018317/1 and KJM was supported by a PhD award from EPSRC and GlaxoSmithKline.
A Full-conditional distributions for Gibbs sampling
A.1 Full conditional distribution for model discrepancy
For the rth chain with temperature τ_r, the full-conditional distribution of d* is
\[
\mathbf{d}^* \mid \mathbf{y}, \theta, \boldsymbol{\Sigma}, \boldsymbol{\Omega}, \rho, \psi \sim \mathrm{N}\left(\mu_{d^*}, \tau_r\mathbf{V}_{d^*}\right),
\]
where
\[
\mathbf{V}_{d^*} = \left(\mathbf{H}^T\left(\boldsymbol{\Omega}^{-1}\otimes\mathbf{T}^{-1}\right)\mathbf{H} + \left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\right)\right)^{-1},
\qquad
\mu_{d^*} = \mathbf{V}_{d^*}\left(\mathbf{H}^T\left(\boldsymbol{\Omega}^{-1}\otimes\mathbf{T}^{-1}\right)\mathbf{y} + \left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\right)\mathbf{m}(\theta)\right).
\]

A.2 Full conditional distributions for covariance matrices

For the rth chain with temperature τ_r, the full-conditional distributions of Ω and Σ are given by
\[
\boldsymbol{\Omega}\mid\mathbf{Y}, \mathbf{D}^*, \rho \sim \mathrm{IW}\left(\frac{\nu + n + K + 1}{\tau_r} - K - 1,\; \frac{1}{\tau_r}\left(\mathbf{I}_K + \left(\mathbf{Y} - \mathbf{G}\mathbf{D}^*\right)^T\mathbf{T}^{-1}\left(\mathbf{Y} - \mathbf{G}\mathbf{D}^*\right)\right)\right)
\]
and
\[
\boldsymbol{\Sigma}\mid\mathbf{D}^*, \theta, \psi \sim \mathrm{IW}\left(\frac{\nu + m + K + 1}{\tau_r} - K - 1,\; \frac{1}{\tau_r}\left(\mathbf{I}_K + \left(\mathbf{D}^* - \mathbf{M}(\theta)\right)^T\mathbf{S}^{-1}\left(\mathbf{D}^* - \mathbf{M}(\theta)\right)\right)\right),
\]
respectively, where IW denotes the inverse-Wishart distribution.
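A direct transcription of the full conditional for d* into R, assuming the matrices H, Ω, T, Σ, S and the vectors y and m(θ) are available (their construction is not shown):

```r
# Sketch only: one Gibbs draw of d* from its full conditional at temperature tau_r.
draw_dstar <- function(H, Omega, Tmat, Sigma, S, y, m_theta, tau_r = 1) {
  A1 <- kronecker(solve(Omega), solve(Tmat))   # Omega^{-1} (x) T^{-1}
  A2 <- kronecker(solve(Sigma), solve(S))      # Sigma^{-1} (x) S^{-1}
  V  <- solve(t(H) %*% A1 %*% H + A2)
  mu <- V %*% (t(H) %*% A1 %*% y + A2 %*% m_theta)
  as.numeric(MASS::mvrnorm(1, as.numeric(mu), tau_r * V))
}
```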
A.3 Full conditional distribution for censored observations

Let A be the nK × nK permutation matrix which re-orders the elements of y such that we can write
\[
\mathbf{A}\mathbf{y} = \begin{pmatrix}\mathbf{y}_S\\ \mathbf{y}_C\end{pmatrix} \sim \mathrm{N}\left(\mathbf{A}\mathbf{H}\mathbf{d}^*,\; \mathbf{A}\left(\boldsymbol{\Omega}\otimes\mathbf{T}\right)\mathbf{A}^T\right).
\]
Define b = AHd* and L = A(Ω ⊗ T)A^T. Let b = (b_S, b_C), where b_S and b_C are the n_S × 1 and n_C × 1 vectors corresponding to y_S and y_C, respectively. Similarly, partition L as
\[
\mathbf{L} = \begin{pmatrix}\mathbf{L}_{SS} & \mathbf{L}_{SC}\\ \mathbf{L}_{CS} & \mathbf{L}_{CC}\end{pmatrix}.
\]
The full conditional distribution of y_C is
\[
\mathbf{y}_C \sim \mathrm{N}\left(\mathbf{b}_C + \mathbf{L}_{CS}\mathbf{L}_{SS}^{-1}\left(\mathbf{y}_S - \mathbf{b}_S\right),\; \tau_r\left(\mathbf{L}_{CC} - \mathbf{L}_{CS}\mathbf{L}_{SS}^{-1}\mathbf{L}_{SC}\right)\right),
\]
truncated to the hypercube [−∞, log χ]^{n_C}.
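A sketch of a draw from this truncated full conditional, assuming b, L and the index vectors of observed and censored elements are available; the tmvtnorm package is one option for truncated multivariate normal draws:

```r
# Sketch only: a draw of the censored block y_C from its truncated full conditional.
draw_censored <- function(b, L, obs, cen, y_S, log_chi, tau_r = 1) {
  bS <- b[obs]; bC <- b[cen]
  LSS <- L[obs, obs]; LSC <- L[obs, cen]; LCS <- L[cen, obs]; LCC <- L[cen, cen]
  m <- as.numeric(bC + LCS %*% solve(LSS, y_S - bS))
  V <- tau_r * (LCC - LCS %*% solve(LSS, LSC))
  tmvtnorm::rtmvnorm(1, mean = m, sigma = V,
                     upper = rep(log_chi, length(cen)), algorithm = "gibbs")
}
```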
B Riemann manifold Langevin Metropolis-Hastings (RMLMH) algorithm

Consider generating a sample from the distribution of δ ∈ R^p, conditional on y, having density π(δ | y) ∝ (π(y | δ)π(δ))^{1/τ_r}, for temperature τ_r, using the Metropolis-Hastings algorithm. Let h(δ) = log π(y | δ) + log π(δ) and define
\[
\nabla h(\delta) = \frac{\partial\log\pi(\mathbf{y}\mid\delta)}{\partial\delta} + \frac{\partial\log\pi(\delta)}{\partial\delta},
\qquad
\mathbf{G}(\delta) = -\,\mathrm{E}_{\mathbf{y}\mid\delta}\left[\frac{\partial^2\log\pi(\mathbf{y}\mid\delta)}{\partial\delta\,\partial\delta^T}\right] - \frac{\partial^2\log\pi(\delta)}{\partial\delta\,\partial\delta^T}.
\]
If δ^c is the current value of δ, then under the RMLMH algorithm the proposal distribution is N(µ_δ, τ_r V_δ), where
\[
\mu_\delta = \delta^c + \frac{\epsilon^2}{2}\mathbf{G}(\delta^c)^{-1}\nabla h(\delta^c) - \epsilon^2\mathbf{n}^{(1)} + \frac{\epsilon^2}{2}\mathbf{n}^{(2)},
\qquad
\mathbf{V}_\delta = \epsilon^2\,\mathbf{G}(\delta^c)^{-1},
\]
with ε a step-size parameter and with n^{(1)} and n^{(2)} having ith elements
\[
n^{(1)}_i = \sum_{j=1}^{p}\left\{\mathbf{G}(\delta^c)^{-1}\,\frac{\partial\mathbf{G}(\delta)}{\partial\delta_j}\bigg|_{\delta=\delta^c}\,\mathbf{G}(\delta^c)^{-1}\right\}_{ij},
\qquad
n^{(2)}_i = \sum_{j=1}^{p}\left\{\mathbf{G}(\delta^c)^{-1}\right\}_{ij}\,\mathrm{tr}\left\{\mathbf{G}(\delta^c)^{-1}\,\frac{\partial\mathbf{G}(\delta)}{\partial\delta_j}\bigg|_{\delta=\delta^c}\right\},
\]
respectively.

C Components required for the Riemann manifold MH algorithm
C.1 Physical parameters
The log density of the full conditional distribution of θ is given by
\[
h(\theta) \propto -\frac{1}{2}\left(\mathbf{d}^* - \mathbf{m}(\theta)\right)^T\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\right)\left(\mathbf{d}^* - \mathbf{m}(\theta)\right) - \frac{1}{2}\left(\theta - \mu_0\right)^T\boldsymbol{\Delta}_0^{-1}\left(\theta - \mu_0\right).
\]
The derivative with respect to θ is
\[
\nabla h(\theta) = \frac{\partial\mathbf{m}(\theta)}{\partial\theta}^T\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\right)\left(\mathbf{d}^* - \mathbf{m}(\theta)\right) - \boldsymbol{\Delta}_0^{-1}\left(\theta - \mu_0\right).
\]
The metric tensor for θ is
\[
\mathbf{G}(\theta) = \frac{\partial\mathbf{m}(\theta)}{\partial\theta}^T\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\right)\frac{\partial\mathbf{m}(\theta)}{\partial\theta} + \boldsymbol{\Delta}_0^{-1},
\]
with derivatives
\[
\frac{\partial\mathbf{G}(\theta)}{\partial\theta_k} = \frac{\partial\mathbf{m}(\theta)}{\partial\theta}^T\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\right)\frac{\partial^2\mathbf{m}(\theta)}{\partial\theta\,\partial\theta_k} + \frac{\partial^2\mathbf{m}(\theta)}{\partial\theta\,\partial\theta_k}^T\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\right)\frac{\partial\mathbf{m}(\theta)}{\partial\theta},
\]
for k = 1, ..., p.
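A small R sketch of these two quantities, assuming the stacked sensitivity matrix ∂m(θ)/∂θ (an mK × p matrix, denoted dm below) has been obtained, e.g. via the emulator as described in Section 3.2; all other inputs are assumed available.

```r
# Sketch only: gradient and metric tensor for theta (Appendix C.1).
grad_and_tensor_theta <- function(dm, d_star, m_theta, Sigma, S, mu0, Delta0, theta) {
  A    <- kronecker(solve(Sigma), solve(S))          # Sigma^{-1} (x) S^{-1}
  grad <- t(dm) %*% A %*% (d_star - m_theta) - solve(Delta0, theta - mu0)
  G    <- t(dm) %*% A %*% dm + solve(Delta0)
  list(grad = as.numeric(grad), G = G)
}
```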
C.2 Correlation parameter for time dependency

Consider a log transformation of ρ, i.e. a = log ρ. The log density of the full conditional distribution of a is given by
\[
h(a) \propto -\frac{K}{2}\sum_{i=1}^{I'}\log\left|\mathbf{T}_i\right| - \frac{1}{2}\left(\mathbf{y} - \mathbf{H}\mathbf{d}^*\right)^T\left(\boldsymbol{\Omega}^{-1}\otimes\operatorname{diag}_{i=1,\dots,I'}\left\{\mathbf{T}_i^{-1}\right\}\right)\left(\mathbf{y} - \mathbf{H}\mathbf{d}^*\right) + a - \exp(a).
\]
The derivative with respect to a is
\[
\frac{\partial h(a)}{\partial a} = -\frac{K}{2}\sum_{i=1}^{I'}\mathrm{tr}\left\{\mathbf{T}_i^{-1}\mathbf{T}_{ia}\right\} + \frac{1}{2}\left(\mathbf{y} - \mathbf{H}\mathbf{d}^*\right)^T\left(\boldsymbol{\Omega}^{-1}\otimes\operatorname{diag}_{i=1,\dots,I'}\left\{\mathbf{T}_i^{-1}\mathbf{T}_{ia}\mathbf{T}_i^{-1}\right\}\right)\left(\mathbf{y} - \mathbf{H}\mathbf{d}^*\right) + 1 - \exp(a),
\]
where T_{ia} = ∂T_i/∂a has jlth element −exp(a)(t_{ij} − t_{il})² T_{i,jl}. The metric tensor for a is
\[
G(a) = \frac{K}{2}\sum_{i=1}^{I'}\mathrm{tr}\left\{\mathbf{T}_i^{-1}\mathbf{T}_{ia}\mathbf{T}_i^{-1}\mathbf{T}_{ia}\right\} + \exp(a),
\]
with derivative
\[
\frac{\partial G(a)}{\partial a} = K\sum_{i=1}^{I'}\mathrm{tr}\left\{\mathbf{T}_i^{-1}\mathbf{T}_{ia}\mathbf{T}_i^{-1}\left(\mathbf{T}_{iaa} - \mathbf{T}_{ia}\mathbf{T}_i^{-1}\mathbf{T}_{ia}\right)\right\} + \exp(a),
\]
where T_{iaa} = ∂²T_i/∂a² has jlth element exp(a)(t_{ij} − t_{il})²(exp(a)(t_{ij} − t_{il})² − 1)T_{i,jl}.

C.3 Correlation parameters for model discrepancy

Consider a log transformation of each element of ψ, i.e. b_i = log ψ_i for i = 1, 2.
The log density of the full conditional distribution of b = (b₁, b₂) is given by
\[
h(\mathbf{b}) \propto -\frac{K}{2}\log\left|\mathbf{S}\right| - \frac{1}{2}\left(\mathbf{d}^* - \mathbf{m}(\theta)\right)^T\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\right)\left(\mathbf{d}^* - \mathbf{m}(\theta)\right) + \sum_{i=1}^{2}\left(b_i - \exp(b_i)\right).
\]
The derivative with respect to b_i is
\[
\frac{\partial h(\mathbf{b})}{\partial b_i} = -\frac{K}{2}\mathrm{tr}\left\{\mathbf{S}^{-1}\mathbf{S}_i\right\} + \frac{1}{2}\left(\mathbf{d}^* - \mathbf{m}(\theta)\right)^T\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{S}^{-1}\mathbf{S}_i\mathbf{S}^{-1}\right)\left(\mathbf{d}^* - \mathbf{m}(\theta)\right) + 1 - \exp(b_i),
\]
where S_i = ∂S/∂b_i has jlth element given by
\[
\begin{cases}
-\exp(b_1)\left[\sum_{f=1}^{F}\left(d_{jf} - d_{lf}\right)^2\right]S_{jl}, & \text{if } i = 1,\\[4pt]
-\exp(b_2)\left(t_j - t_l\right)^2 S_{jl}, & \text{if } i = 2.
\end{cases}
\]
The metric tensor, G(b), has ijth element
\[
G(\mathbf{b})_{ij} = \frac{K}{2}\mathrm{tr}\left\{\mathbf{S}^{-1}\mathbf{S}_i\mathbf{S}^{-1}\mathbf{S}_j\right\} + I(i = j)\exp(b_i).
\]
The derivatives of G(b) are
\[
\frac{\partial G(\mathbf{b})_{ij}}{\partial b_k} = \frac{K}{2}\mathrm{tr}\left\{\mathbf{S}^{-1}\mathbf{S}_i\mathbf{S}^{-1}\mathbf{S}_{jk} - \mathbf{S}^{-1}\mathbf{S}_i\mathbf{S}^{-1}\mathbf{S}_k\mathbf{S}^{-1}\mathbf{S}_j + \mathbf{S}^{-1}\mathbf{S}_j\mathbf{S}^{-1}\mathbf{S}_{ik} - \mathbf{S}^{-1}\mathbf{S}_i\mathbf{S}^{-1}\mathbf{S}_j\mathbf{S}^{-1}\mathbf{S}_k\right\} + I(i = j = k)\exp(b_i),
\]
where S_{ik} = ∂²S/∂b_i∂b_k has jlth element given by
\[
\begin{cases}
\exp(b_1)\left[\sum_{f=1}^{F}\left(d_{jf} - d_{lf}\right)^2\right]\left(\exp(b_1)\left[\sum_{f=1}^{F}\left(d_{jf} - d_{lf}\right)^2\right] - 1\right)S_{jl}, & \text{if } i = k = 1,\\[4pt]
\exp(b_2)\left(t_j - t_l\right)^2\left(\exp(b_2)\left(t_j - t_l\right)^2 - 1\right)S_{jl}, & \text{if } i = k = 2,\\[4pt]
\exp(b_1 + b_2)\left[\sum_{f=1}^{F}\left(d_{jf} - d_{lf}\right)^2\right]\left(t_j - t_l\right)^2 S_{jl}, & \text{if } i \neq k.
\end{cases}
\]

D Prior sensitivity analysis
Suppose
\[
\rho \sim \mathrm{Gamma}(\alpha, \beta), \quad \psi_i \sim \mathrm{Gamma}(\alpha, \beta), \quad \boldsymbol{\Sigma} \sim \mathrm{IW}(\nu, \mathbf{I}), \quad \boldsymbol{\Omega} \sim \mathrm{IW}(\nu, \mathbf{I}),
\]
for i = 1, 2, where the Gamma distribution is parameterised such that the expectation and variance are α/β and α/β², respectively, and the inverse-Wishart is parameterised such that the mode is I/ν. Let the prior distributions for ρ, ψ, Σ and Ω outlined in Section 2.3 be denoted as Prior 0, for which α = β = 1 and ν = 4.
We consider three different prior specifications (denoted Priors 1, 2 and 3) given by different combinations of α, β and ν, corresponding to a sequence of increasingly diffuse prior distributions. For each prior, the analysis described in Section 4 was undertaken and the optimum values of x and t were found by ACE. Table 4 shows the values of α, β and ν for Priors 1, 2 and 3, as well as the optimum values of x and t. Also included is the probability of satisfying the specification limits given by (3) under each prior distribution for the optimum values also found under each prior distribution. There is some variability with respect to the initial amounts of A and D and, particularly, with respect to time. However, the probability of satisfying the specification limits remains broadly insensitive to these changes. We conclude that the results are robust to the choice of prior distribution.

[Table 4: for Priors 0 to 3, (a) the values of α, β and ν specifying the prior distributions for ψ, Σ and Ω; (b) the optimum values of x and t for maximizing the probability of satisfying the specification limits; and (c) the probability of satisfying the specification limits given by (3) under each prior distribution for the optimum values also found under each prior distribution.]

References

am Ende, D.J. (Ed.), 2011. Chemical Engineering in the Pharmaceutical Industry. Wiley, Hoboken, NJ.

Brown, P., Byrne, G., Hindmarsh, A., 1989. VODE: a variable coefficient ODE solver. SIAM Journal on Scientific and Statistical Computing 10, 1038-1051.

Chiao, C., Hamada, M., 2001. Analyzing experiments with correlated multiple responses. Journal of Quality Technology 33, 451-465.

Conti, S., O'Hagan, A., 2010. Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference 140, 640-651.

Dawid, A., 1981. Some matrix-variate distribution theory: notational considerations and a Bayesian application. Biometrika 68, 265-274.

Dean, A., Morris, M., Stufken, J., Bingham, D. (Eds.), 2015. Handbook of Design and Analysis of Experiments. CRC Press, Boca Raton.

Del Castillo, E., 2007. Process Optimization - A Statistical Approach. Springer, New York.

Fang, K., Li, R., Sudjianto, A., 2006. Design and Modelling for Computer Experiments. Chapman and Hall, Boca Raton.

Fielding, M., Nott, D.J., Liong, S.Y., 2011. Efficient MCMC schemes for computationally expensive posterior distributions. Technometrics 53, 16-28.

Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B., 2014. Bayesian Data Analysis. 3rd ed., Chapman and Hall.

Geweke, J.F., 1991. Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints, in: Computer Science and Statistics: Proceedings of the 23rd Symposium on the Interface.

Girolami, M., Calderhead, B., 2011. Riemann manifold Langevin and Hamiltonian Monte Carlo methods (with discussion). Journal of the Royal Statistical Society, Series B 73, 123-214.

Iserles, A., 2009. A First Course in the Numerical Analysis of Differential Equations. 2nd ed., Cambridge University Press.

Johnson, M.E., Moore, L.M., Ylvisaker, D., 1990. Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26, 131-148.

Kennedy, M.C., O'Hagan, A., 2001. Bayesian calibration of computer models (with discussion). Journal of the Royal Statistical Society, Series B 63, 425-464.

Laidler, K.J., 1984. The development of the Arrhenius equation. Journal of Chemical Education 61, 494.

Lange, K., 2013. Optimization. 2nd ed., Springer, New York.
Lebrun, P., Boulanger, B., Debrus, B., Lambert, P., Hubert, P., 2013. A Bayesian design space for analytical methods based on multivariate models and predictions. Journal of Biopharmaceutical Statistics 23, 1330-1351.

Loeppky, J.L., Sacks, J., Welch, W.J., 2009. Choosing the sample size of a computer experiment: a practical guide. Technometrics 51, 366-376.

Miasojedow, B., Moulines, E., Vihola, M., 2013. An adaptive parallel tempering algorithm. Journal of Computational and Graphical Statistics 22, 649-664.

Nychka, D., Furrer, R., Paige, J., Sain, S., 2015. fields: Tools for spatial data. R package version 8.15.

Overstall, A.M., Woods, D.C., 2013. A strategy for Bayesian inference for computationally expensive models with application to the estimation of stem cell properties. Biometrics 69, 458-468.

Overstall, A.M., Woods, D.C., 2016. Multivariate emulation of computer simulators: model selection and diagnostics with application to a humanitarian relief model. Journal of the Royal Statistical Society C 65, 483-505.

Overstall, A.M., Woods, D.C., 2017. Bayesian design of experiments using approximate coordinate exchange. Technometrics 59, 458-470.

Overstall, A.M., Woods, D.C., Adamou, M., 2017. acebayes: An R package for Bayesian optimal design of experiments via approximate coordinate exchange. arXiv:1705.08096.

Papaspiliopoulos, O., Roberts, G.O., Skold, M., 2003. Non-centered parameterizations for hierarchical models and data augmentation, in: Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M. (Eds.), Bayesian Statistics 7, Oxford.

Peterson, J., 2004. A posterior predictive approach to multiple response surface optimization. Journal of Quality Technology 36, 139-153.

Peterson, P.J., 2008. A Bayesian approach to the ICH Q8 definition of design space. Journal of Biopharmaceutical Statistics 18, 959-975.

Peterson, P.J., Yahyah, M., 2009. A Bayesian design space approach to robustness and system suitability for pharmaceutical assays and other processes. Statistics in Biopharmaceutical Research 1, 441-449.

R Core Team, 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rasmussen, C.E., 2003. Gaussian processes to speed up hybrid Monte Carlo for expensive Bayesian integrals, in: Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M. (Eds.), Bayesian Statistics 7, Oxford.

Sacks, J., Welch, W., Mitchell, T., Wynn, H., 1989. Design and analysis of computer experiments. Statistical Science 4, 409-435.

Santner, T.J., Williams, B.J., Notz, W.I., 2003. The Design and Analysis of Computer Experiments. Springer, New York.

Soetaert, K., Petzoldt, T., Woodrow Setzer, R., 2010. Solving differential equations in R: package deSolve. Journal of Statistical Software 33, 1-25.