Design of Experiments for Verifying Biomolecular Networks
Ruby Sedgwick, John Goertz, Molly Stevens, Ruth Misener, Mark van der Wilk
Department of Materials, Department of Bioengineering and Institute of Biomedical Engineering, Department of Computing
Imperial College London
{r.sedgwick19, j.goertz, m.stevens, r.misener, m.vdwilk}@imperial.ac.uk
Abstract
There is a growing trend in molecular and synthetic biology of using mechanistic (non machine learning) models to design biomolecular networks. Once designed, these networks need to be validated by experimental results to ensure the theoretical network correctly models the true system. However, these experiments can be expensive and time consuming. We propose a design of experiments approach for validating these networks efficiently. Gaussian processes are used to construct a probabilistic model of the discrepancy between experimental results and the designed response, and a Bayesian optimization strategy is then used to select the next sample points. We compare different design criteria and develop a stopping criterion based on a metric that quantifies this discrepancy over the whole surface, together with its uncertainty. We test our strategy on simulated data from computer models of biochemical processes.
Machine Learning for Molecules Workshop at NeurIPS 2020. https://ml4molecules.github.io

Introduction

As mathematical models of biological systems improve, they are increasingly used to design systems for a particular purpose. Our motivating example is the creation of bespoke networks of interacting biomolecules that respond to changes in input concentrations as logic gates (e.g. AND or OR). These can be used to create biosensors [1], targeted drug delivery [2], and tunable genetic circuits [3]. Due to discrepancies between the predictions of the mathematical model and biological reality, each network has to be validated experimentally before use to verify that the real-world system exhibits the desired response. In the context of verifying many networks, the cost of these experiments becomes large. In this work, we propose a method for verifying whether the true response of the network is suitably similar to the designed response.

The responses of biomolecular networks have previously been validated [1, 4, 5] by running experiments at a fixed, predetermined set of input points that were deemed important for the application. The number of points was chosen to satisfy the experimenter that the surfaces were similar. A similar, but distinct, problem is investigated in the surrogate modeling literature, where it is also important to understand the discrepancy between experimental data and a true reference [6, 7, 8, 9]. We take inspiration from these methods in (1) modeling the discrepancy, so that the observed data informs our prediction of the discrepancy over the entire surface, and (2) using the discrepancy model to intelligently determine the next experiment to run. In our work we follow a similar procedure: each time more experimental data is obtained, a Gaussian process is fitted to the discrepancy between the experimental data and the computer model. We use standard Bayesian optimization design criteria to select the next experimental point, although this element of our method could be improved. The Gaussian process is then updated, before the process repeats. Our main contribution is the proposal of a principled stopping criterion designed to recognize when enough experiments have been conducted to accept or reject a network. We assessed the performance of this method by applying it to the validation of simulated biomolecular networks, although the method could be useful for any application where it is necessary to test whether a designed response surface matches experimental reality.

Method

We wish to reject an incorrect (or accept a correct) biomolecular network as quickly as possible. To do this, our method can be broken down into: (i) modeling the discrepancy between the theoretical assay design and the true underlying response surface, (ii) the stopping criterion, and (iii) the design criterion. A minimal sketch of the resulting loop is given below.
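The sketch below illustrates the overall structure of such a loop. All helper callables (run_experiment, model_prediction, fit_gp, stopping_decision) and the gp.predict interface are hypothetical placeholders introduced for illustration, not the paper's code; concrete versions of the fitting and stopping steps are sketched in the following section.

```python
import numpy as np

def verify_network(run_experiment, model_prediction, candidates,
                   fit_gp, stopping_decision, n_min=8, budget=50):
    """Sequential verification loop; all callables are hypothetical placeholders."""
    X, g = [], []
    x_next = candidates[0]                         # arbitrary first experiment
    for t in range(budget):
        y = run_experiment(x_next)                 # costly wet-lab (or simulated) measurement
        X.append(x_next)
        g.append(y - model_prediction(x_next))     # observed discrepancy g(x), Eq. (1)
        gp = fit_gp(np.asarray(X), np.asarray(g))  # refit probabilistic discrepancy model
        if t + 1 >= n_min:                         # stopping rule applied only after n_min runs
            decision = stopping_decision(gp)       # "accept", "reject", or None (continue)
            if decision is not None:
                return decision, t + 1
        _, var = gp.predict(candidates)            # maximum variance design criterion
        x_next = candidates[np.argmax(var)]
    return "inconclusive", budget
```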
To understand how well the experimental data fit the designed surface, we model the discrepancy g(x), g: R^D → R, as in Eq. 1 [10]. In an input domain X ⊆ R^D, where D is the dimension of the input domain, the experimental observations y(x), y: R^D → R, at an input x ∈ X can be split into: the model prediction η(x, θ_m), η: R^D → R, with best-fit parameter values, where θ_m is a vector of the mechanistic model parameters; the mechanistic model inadequacy ψ(x), ψ: R^D → R, which is unknown in untested conditions; and the experimental noise ε(x), ε: R^D → R. This breakdown allows for explicit modeling of the inadequacy term ψ(x) to account for systematic deviations from the model.

$$g(\mathbf{x}) = y(\mathbf{x}) - \eta(\mathbf{x}, \boldsymbol{\theta}_m) = \psi(\mathbf{x}) + \epsilon(\mathbf{x}) \qquad (1)$$

We use a Gaussian process with a squared exponential kernel as a probabilistic model for the discrepancy, implemented using GPflow [11]. To better understand how the algorithm works, we simulate the experimental data as noiseless, and therefore set the noise variance to a very small value and make it untrainable. We set gamma priors with a high mode on both the signal variance and the lengthscale. To train the model, the most probable values of the hyperparameters are found using the maximum a posteriori (MAP) estimate. A maximum variance design criterion was used to select the next sample point.
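As a concrete illustration of this setup, the sketch below fits such a discrepancy model in GPflow 2 and applies the maximum variance criterion over a candidate set. The data, the prior parameters, the fixed noise value, and the candidate set are placeholders (the paper does not report the exact values), and the API calls assume GPflow 2.x.

```python
import numpy as np
import gpflow
import tensorflow_probability as tfp
from gpflow.utilities import set_trainable, to_default_float

# Placeholder data: tested inputs X (concentrations) and observed
# discrepancies g = y(x) - eta(x, theta_m); these come from experiments.
X = np.random.rand(8, 2)
g = 0.1 * np.random.randn(8, 1)

kernel = gpflow.kernels.SquaredExponential(lengthscales=[1.0, 1.0])
model = gpflow.models.GPR(data=(X, g), kernel=kernel, noise_variance=1e-5)

# Simulated data are noiseless, so fix the noise variance (value illustrative)
set_trainable(model.likelihood.variance, False)

# Gamma priors with a high mode on signal variance and lengthscales
# (the paper's exact prior parameters are not reproduced here)
model.kernel.variance.prior = tfp.distributions.Gamma(
    to_default_float(2.0), to_default_float(1.0))
model.kernel.lengthscales.prior = tfp.distributions.Gamma(
    to_default_float(2.0), to_default_float(1.0))

# MAP estimate: training_loss includes the log prior when priors are set
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Maximum variance design criterion: query the most uncertain candidate input
candidates = np.random.rand(200, 2)
_, var = model.predict_f(candidates)
x_next = candidates[np.argmax(var.numpy().ravel())]
```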
The aim of the stopping criterion is to terminate the process when a reasonable degree of certainty that the network architecture should be accepted or rejected has been reached. To do this, we need a measure of the average discrepancy over the entire surface: only the absolute discrepancy matters, regardless of its sign, and a measure of certainty in the estimates is required. We therefore developed a stopping metric M based on the root mean squared error (RMSE) of the surface.

As the Gaussian process is fitted to the signed discrepancy, the stopping metric needs to convert it into an absolute value. The metric should also incorporate the uncertainty in the model, to prevent a decision from being made on the basis of too little information. N function samples f_{s_h} = {f_{s_h}(x*_j)}_{j=1}^m are drawn from the Gaussian process posterior f* at m sample inputs X* = {x*_j}_{j=1}^m spaced uniformly across the input domain, giving F_s = {f_{s_h}}_{h=1}^N. The RMSE metric for each Gaussian process sample is calculated as in Eq. 2.

$$M_h = \sqrt{\frac{1}{m} \sum_{j=1}^{m} f_{s_h}(\mathbf{x}^*_j)^2} \qquad (2)$$

This metric directly measures the average discrepancy of the surface and allows uncertainty to propagate from the probabilistic model of the surface to the metric itself. Calculating this metric for numerous function draws gives a distribution of metric values, as shown in Fig. 1. The probability p(M_true < T_avg) that the true RMSE of the deviation lies below a threshold for average deviation T_avg, set by a domain expert, is then calculated. If this probability is smaller than a predetermined minimum probability p_reject, or larger than a predetermined maximum probability p_accept, then the network architecture is rejected or accepted respectively. Otherwise, the process continues. MAP inference for the GP hyperparameters has a tendency to cause predictions to be over-confident for very small sets of observations. To prevent premature termination, the stopping criterion is therefore only applied after a given number of data points have been observed.

Figure 1: Samples drawn from a one-dimensional Gaussian process posterior (left) and a histogram of the distribution of metrics M for 40 samples drawn from this Gaussian process (right), with a skewed normal distribution fitted to the data (black line). The RMSE threshold T_avg is marked with the dashed red line.
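Continuing the GPflow sketch above (reusing the fitted `model`), the stopping probability can be estimated by Monte Carlo as follows. The grid resolution, number of samples, threshold, and certainty levels are illustrative placeholders; the paper additionally fits a skewed normal distribution to the metric samples, whereas this sketch simply uses the empirical fraction.

```python
import numpy as np

# Uniform grid of m points over a 2-D input domain (resolution illustrative)
xs = np.linspace(0.0, 1.0, 15)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)

# Draw N posterior function samples at the grid and compute Eq. (2) per sample
f_samples = model.predict_f_samples(grid, num_samples=1000).numpy()  # [N, m, 1]
m_h = np.sqrt(np.mean(f_samples[..., 0] ** 2, axis=1))               # one RMSE per sample

T_AVG = 0.1                      # placeholder threshold; set by a domain expert
P_ACCEPT, P_REJECT = 0.95, 0.05  # placeholder certainty levels

p = np.mean(m_h < T_AVG)         # empirical estimate of p(M_true < T_avg)
decision = "accept" if p >= P_ACCEPT else ("reject" if p <= P_REJECT else None)
```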
Results

Four Bayesian optimization design criteria (maximum variance, expected improvement, and upper confidence bound with weightings of 1 and 3 on the standard deviation) were compared to three standard experimental strategies: grid search, random, and Sobol random. Each strategy was tested multiple times on 8 test surfaces, and the absolute error in mean and maximum discrepancy recorded (Figure 5 in Appendix A). None of the regimes tested outperformed the others for both maximum and average discrepancies, but the maximum variance, UCB with a high weighting on the standard deviation, and Sobol methods were found to perform the best.

We ran tests to determine how accurately, and in how many runs, the stopping metric can determine whether a designed network should be accepted or rejected. 12 simulated biochemical networks were compared to the ideal logic gate surfaces they were designed to recreate. We ran the algorithm 10 times for each network to account for the semi-random nature of the acquisition function. Figure 6 in Appendix B demonstrates the sequential procedure of the method. The ideal AND, OR and XOR surfaces provide a +1/−1 response based on the input concentrations of biomolecules. The RMSE metrics at each iteration were recorded so that the stopping metric could be applied retrospectively, meaning all tests were done on the same dataset. The two stopping metric parameters that we explored are the minimum number of experiments before the stopping metric is applied, and the percentage certainty we wish to have before accepting or rejecting a network.

To investigate the effect of the minimum number of experiments, we measured the numbers of false acceptances and rejections over a range of values of the minimum number of experiments n_min. The RMSE threshold T_avg was set by a domain expert at a level deemed an acceptable degree of discrepancy. Figure 2 shows a plot of sensitivity against specificity for different values of n_min and different percentage certainties (a sketch of how these rates are computed follows below). These results show that there is a trade-off between the minimum number of experiments and the number of false acceptances and rejections made by the algorithm: for a larger minimum number of experiments, the algorithm has seen more data points and so has more information about what the discrepancy surface looks like.

We chose the optimal minimum number of experiments to be n_min = 8, as the improvement in sensitivity and specificity is small for larger values of n_min. Figure 3 shows when terminations of the algorithm occur, and the type of termination, for n_min = 8 at three levels of certainty in the result, given by paired values of p_accept and p_reject. Increasing the level of certainty required before accepting or rejecting a network increases the number of experiments before termination. When choosing the values of p_accept and p_reject, the tolerance for erroneously accepting or rejecting a network should take the application into consideration.

Figure 2: Sensitivity-specificity plots for different levels of certainty p_accept and p_reject. The numbers next to the points are the minimum number of runs.
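For concreteness, the sensitivity and specificity plotted in Figure 2 can be computed from the four outcome counts as follows; the counts below are illustrative only, not the paper's results.

```python
# TA/TR/FA/FR: true/false acceptances and rejections tallied over repeated runs
TA, TR, FA, FR = 52, 55, 6, 7    # illustrative counts only

sensitivity = TA / (TA + FR)     # fraction of valid networks correctly accepted
specificity = TR / (TR + FA)     # fraction of invalid networks correctly rejected
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```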
Figure 3: The number of experiments to termination of the experimental run for different levels of confidence in the result. These results are split by outcome: true acceptance (TA), true rejection (TR), false acceptance (FA) and false rejection (FR).
Interestingly, for larger values of n_min and higher values of p_accept, the algorithm failed to determine whether 1 network out of the 120 tested should be accepted or rejected within the experimental budget. The network in question has a true RMSE value very close to the threshold, so the method never became certain enough that the network should be accepted. Figure 4 shows a plot of the RMSE metric distribution for this network with 50 observed data points.

The number of runs to termination and the nature of the termination for n_min = 8 are shown in the histograms in Figure 3. These results show that the stopping criterion can identify valid and invalid network architectures with few data points, and that the time to termination depends on the confidence in the prediction required.
Figure 4: RMSE metric distribution for the run of our method that produced an inconclusive result once the experimental budget was exhausted. The grey dotted line shows the true RMSE value for this surface and the red dashed line is the threshold T_avg.

Conclusion

We have presented a method for determining whether a network should be accepted or rejected for a given application, using a sequential experimental approach and a stopping metric designed to give a measure of discrepancy across the entire surface. We tested this method on simulated biomolecular networks and demonstrated that it can accept or reject networks with a reasonable degree of accuracy in a small number of experimental runs.

Two main improvements will be made in future work. Firstly, we can use acquisition functions tailored for Bayesian quadrature, since we are trying to maximize our certainty about an integral. Recent work on modeling non-negative functions is particularly relevant [12, 13]. We also aim to eliminate the need for the minimum number of experiments parameter by integrating over the Gaussian process hyperparameters.

Acknowledgments and Disclosure of Funding
This work was supported by the UKRI CDT in AI for Healthcare http://ai4health.io [Grant No. EP/S023283/1], UK Research and Innovation [Grant No. EP/P016871/1] and US NIH [Grant No. 5F32GM131594].
References

[1] Piro Siuti, John Yazbek, and Timothy K. Lu. Synthetic circuits integrating logic and memory in living cells. Nature Biotechnology, 31(5):448–452, May 2013. doi: 10.1038/nbt.2510.
[2] Barry A. Badeau, Michael P. Comerford, Christopher K. Arakawa, Jared A. Shadish, and Cole A. DeForest. Engineered modular biomaterial logic gates for environmentally triggered therapeutic delivery. Nature Chemistry, 10(3):251–258, March 2018. doi: 10.1038/nchem.2917.
[3] Vittorio Bartoli, Grace A. Meaker, Mario di Bernardo, and Thomas E. Gorochowski. Tunable genetic devices through simultaneous control of transcription and translation. Nature Communications, 11(1):1–11, April 2020. doi: 10.1038/s41467-020-15653-7.
[4] Caleb J. Bashor, Nikit Patel, Sandeep Choubey, Ali Beyzavi, Jané Kondev, James J. Collins, and Ahmad S. Khalil. Complex signal processing in synthetic gene circuits using cooperative regulatory assemblies. Science, 364(6440):593–597, May 2019. doi: 10.1126/science.aau8287.
[5] Yolanda Schaerli, Andreea Munteanu, Magüi Gili, James Cotterell, James Sharpe, and Mark Isalan. A unified design space of synthetic stripe-forming networks. Nature Communications, 5(1):1–10, September 2014. doi: 10.1038/ncomms5905.
[6] Sez Atamturktur, Matthew C. Egeberg, François M. Hemez, and Garrison N. Stevens. Defining coverage of an operational domain using a modified nearest-neighbor metric. Mechanical Systems and Signal Processing, 50-51:349–361, January 2015. doi: 10.1016/j.ymssp.2014.05.040.
[7] Brian J. Williams, Jason L. Loeppky, Leslie M. Moore, and Mason S. Macklem. Batch sequential design to achieve predictive maturity with calibrated computer models. Reliability Engineering & System Safety, 96(9):1208–1219, September 2011. doi: 10.1016/j.ress.2010.04.017.
[8] C. Unal, B. Williams, F. Hemez, S. H. Atamturktur, and P. McClure. Improved best estimate plus uncertainty methodology, including advanced validation concepts, to license evolving nuclear reactors. Nuclear Engineering and Design, 241(5):1813–1833, May 2011. doi: 10.1016/j.nucengdes.2011.01.048.
[9] Leslie M. Moore, Brian J. Williams, and Jason L. Loeppky. Batch sequential designs for computer experiments. Journal of Statistical Planning and Inference, January 2009. doi: 10.1016/j.jspi.2009.12.004.
[10] Marc C. Kennedy and Anthony O'Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3):425–464, 2001. doi: 10.1111/1467-9868.00294.
[11] Alexander G. de G. Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. GPflow: A Gaussian process library using TensorFlow. The Journal of Machine Learning Research, 18(1):1299–1304, January 2017.
[12] Tom Gunter, Michael A. Osborne, Roman Garnett, Philipp Hennig, and Stephen J. Roberts. Sampling for inference in probabilistic models with fast Bayesian quadrature. In Advances in Neural Information Processing Systems 27, pages 2789–2797, 2014.
[13] Michael Osborne, Roman Garnett, Zoubin Ghahramani, David K. Duvenaud, Stephen J. Roberts, and Carl E. Rasmussen. Active learning of model evidence using Bayesian quadrature. In Advances in Neural Information Processing Systems 25, pages 46–54, 2012.
A Design Criteria Comparison
Figure 5: The absolute error of the predicted mean and maximum discrepancy for 8 test functions compared to the true mean discrepancies for Sobol random, random, grid and the following design criteria: maximum variance, upper confidence bound (UCB), upper confidence bound with a higher weighting on the standard deviation term (UCB with κ = 3), and expected improvement (EI). The grid search, random and Sobol methods were all run on each test surface 100 times. Due to time constraints, the DoE strategies with different design criteria were only run 20 times for each test function. For the grid method, the tests were only run for numbers of points that are square numbers, and it is assumed that for a given experimental budget, the number of points selected would be the largest square number within that budget. The solid line in each plot is the median value and the shaded region denotes the interquartile range.

B Biomolecular Networks Validation
Figure 6: An illustration of the sequential nature of the method. α and β are input concentrations of biomolecules, and the two plots in the top left corner show the ideal XOR response surface and the true response surface as generated by the biomolecular simulations. The plot second from top left shows the true discrepancy between these surfaces. The remaining plots show the Gaussian process belief over the surface after 1, 5, 10, 15 and 20 experiments. The black crosses show where on the surface experiments have been conducted. Each new experimental point is selected using the maximum variance acquisition function.