Physics-informed Gaussian Process for Online Optimization of Particle Accelerators
Adi Hanuka,∗ X. Huang, J. Shtalenkova, D. Kennedy, A. Edelen, V. R. Lalchand, D. Ratner, and J. Duris
SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
University of Cambridge, United Kingdom
(Dated: September 9, 2020)

High-dimensional optimization is a critical challenge for operating large-scale scientific facilities. We apply a physics-informed Gaussian process (GP) optimizer to tune a complex system by conducting efficient global search. Typical GP models learn from past observations to make predictions, but this reduces their applicability to new systems where archive data is not available. Instead, here we use a fast approximate model from physics simulations to design the GP model. The GP is then employed to make inferences from sequential online observations in order to optimize the system. Simulation and experimental studies were carried out to demonstrate the method for online control of a storage ring. We show that the physics-informed GP outperforms currently routinely used online optimizers in terms of convergence speed and robustness on this task. The ability to inform the machine-learning model with physics may have wide applications in science.
Online control and tuning of modern particle accelerators, such as free electron lasers and storage ring light sources, is a challenging task, since those systems often consist of hundreds of correlated parameters that could be adjusted in order to find a set of parameter values that achieves optimal target performance. Automated tuning can help deliver the highest beam quality to scientific users during operation, and reduce tuning time when switching operation modes. This would be enabled by efficient online optimization algorithms, which are necessary in particle accelerators because, although physics models exist, there are often significant differences between the simulation and the real accelerator. The critical requirement for a suitable tuning algorithm is the ability to robustly find the optimum in a complex parameter space with high efficiency (minimum number of steps).

Traditional model-independent optimization methods that do not require the gradient of the system, such as Nelder-Mead simplex [1], may not work well for online applications when the target is noisy. Other local, model-independent methods, such as robust conjugate direction search (RCDS) [2, 3] and extremum seeking (ES) [4], offer resilience to noise by taking a large number of samples, thereby often taking a long time to converge, or require fair initial conditions [5]. Machine learning (ML) model-based optimization methods may improve the quality of the solution, the speed of convergence, and robustness to noise.

ML model-based methods for online optimization typically rely on learning from previously observed data. However, limited sparse sampling of high-dimensional archived data may be insufficient, for example when learning correlations between various control variables. In addition, learning from archive data becomes impossible when preparing for new configurations where relevant experimental data does not exist.
On the other hand, approximate physics models cannot be applied directly to the system to be optimized. They need to be calibrated, and even then cannot exactly fit the observed data. In this Letter we circumvent the limitations of both approaches by approximating the covariance of the system directly from the physics model and then building a model from a few online observations. Physics models may capture the qualitative response of the objective with respect to controls better than archive data. Incorporating them into ML models may increase the speed of convergence and robustness of an online tuning process.

Bayesian optimization is a model-based approach to optimizing expensive-to-evaluate, black-box systems with possibly noisy inputs and outputs [6-8]. Its effectiveness derives from probabilistic models of the system, such as Gaussian processes (GPs) [9], which provide not only a prediction of the system's response, but also the uncertainty in that prediction. GPs predict a distribution of possible functions compatible with observations by utilizing a covariance function, called the kernel, describing relationships between those observations. An attractive feature of GP modeling is the interpretability of the kernel's functional form. The flexibility to capture the complex dependencies encountered in modern experiments lies in the design of this kernel. Learning the kernel function rather than the target function itself is less prone to errors resulting from dependencies on drift or random hidden variables.

Recently, Bayesian optimization with Gaussian process surrogate models has been successfully demonstrated on linear accelerators [10-13]. Refs. [10, 11] used GPs with diagonal kernels (without correlations) learned from archived experimental data. In Ref. [12], we learned correlations from a physics model, but still required archived machine data to learn the length scales needed to build the full kernel.
This was done in part because a complete physics model was not available. The ability to easily learn the full kernel directly from a physics model would turn GPs into a practical tool applicable for tuning new machines and configurations without any archived data. In this Letter we experimentally demonstrate a physics-informed
Bayesian optimization, where we use a physical model to directly derive the GP kernel, including correlations. As an alternative to the traditional empirical kernel learning procedure using prior data [9], we construct the kernel from the physical model's basis functions. The basis function kernel eliminates the need for many data samples (either observed or simulated) and for empirical kernel selection through marginal likelihood maximization (referred to as ML-II). As our primary result, we experimentally demonstrate the effectiveness of the physics-informed basis function approach by comparing its performance with the traditional data-informed ML-II approach, and several other algorithms, on the SPEAR3 storage ring [14] facility, minimizing the vertical emittance with respect to 13 skew quadrupole magnets. We finally discuss the importance of constructing a kernel and prior mean that are representative of the system to be modeled.

Methods.—
Online tuning by Bayesian optimization involves two main components: (i) an online surrogate model g(x) of how the objective f(x) responds to a vector of input control values x (e.g. beam loss rate with respect to 13 skew quadrupole magnet strengths); this model is iteratively updated with observed data during optimization. (ii) An acquisition function, which chooses the next state based on the current state of the model built from the observed data.

The surrogate model we chose is a Gaussian process (GP) [9], a Bayesian non-parametric model which induces a prior over functions, g(x) ~ GP(m(x), k(x_i, x_j)), where x_i, x_j are all possible pairs in the input domain. The mean function m(x) describes the expected value of the objective, and the kernel k(x_i, x_j) characterizes similarities between possible objective function values at different input points x_i and x_j. While the optimum of the objective function may fluctuate day to day, the kernel captures the underlying behavior, allowing it to represent the function well given sampled data. To account for the observations' noise, we model the noise as independent and identically distributed Gaussian random variables with zero mean and variance σ_n². The corresponding Gaussian noise kernel is k_noise(x_i, x_j) = σ_n² δ_{i,j}, where δ is the Kronecker delta function. The GP is constructed directly from sampled instances, thus allowing the model's complexity to grow with observations and adapt to previously unexplored regions of the input space.

One of the critical steps in achieving an operational GP optimizer for complex systems is constructing a kernel which encodes the underlying behaviour and relationships in the modeled data. For systems with complex high-dimensional data structures, expressive kernels facilitate efficient learning from online acquired data.
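The GP posterior update described above (RBF kernel with a precision matrix, plus the i.i.d. Gaussian noise kernel σ_n² δ_{ij}, and a constant prior mean) can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the function names and the unit kernel amplitude are assumptions of this sketch.

```python
import numpy as np

def rbf_kernel(X1, X2, Sigma):
    # k(x_i, x_j) = exp(-0.5 (x_i - x_j)^T Sigma (x_i - x_j)); Sigma = precision matrix
    d = X1[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * np.einsum('abi,ij,abj->ab', d, Sigma, d))

def gp_predict(X_train, y_train, X_test, Sigma, sigma_n, prior_mean=0.0):
    # GP posterior with noise kernel k_noise(x_i, x_j) = sigma_n^2 * delta_ij
    K = rbf_kernel(X_train, X_train, Sigma) + sigma_n ** 2 * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train, Sigma)
    mean = prior_mean + Ks @ np.linalg.solve(K, y_train - prior_mean)
    V = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.einsum('ij,ji->i', Ks, V)   # prior variance normalized to 1
    return mean, var
```

Near a training point the posterior mean follows the observation and the variance collapses; far away both revert to the prior, which is exactly the behavior the prior-mean discussion later in the Letter relies on.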
Existing techniques to create expressive kernels from simpler ones include adding or multiplying kernels [15, 16] or applying a nonlinear transform to the input data [17-19]. In general, properties of kernels are controlled by a number of hyperparameters.

Usually, kernels and their hyperparameters are chosen by the type-II maximum likelihood method [9]. This ML-II method learns the hyperparameters of a chosen kernel by maximizing the marginal likelihood of historical data; see the Supplemental Material for more details on this approach [20]. When using experimental archived data, we refer to this approach as the data-informed ML-II Gaussian process. However, estimating a kernel's hyperparameters from archive data becomes impossible when preparing for new configurations.

As an alternative, a physics simulation could be used instead of experimental data [21, 22], making it possible to learn a kernel when there is only little or even no historical data at all. We refer to this approach to kernel construction as physics-informed. However, as in the data-informed case, care must be taken in sampling the simulation input space to capture the objective's complexity as well as correlations between the input parameters. Generating simulation data can be an expensive process and may require long computational time, since a high-dimensional input space requires many evaluations of a possibly slow simulation.
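For concreteness, the ML-II procedure amounts to maximizing the log marginal likelihood log p(y|X, θ) = −½ yᵀK⁻¹y − ½ log|K| − (n/2) log 2π over the hyperparameters θ. A minimal sketch for a single length scale of a 1D RBF kernel, using a simple grid search rather than gradient-based optimization (both the grid and the fixed noise level are assumptions of this sketch):

```python
import numpy as np

def log_marginal_likelihood(X, y, length_scale, sigma_n):
    # log p(y | X, theta) for a 1D RBF kernel plus Gaussian noise
    d = X[:, None] - X[None, :]
    K = np.exp(-0.5 * (d / length_scale) ** 2) + sigma_n ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))          # -0.5 * log|K|
            - 0.5 * len(X) * np.log(2.0 * np.pi))

def ml2_length_scale(X, y, sigma_n=0.1):
    # ML-II: scan a grid of length scales, keep the marginal-likelihood maximizer
    grid = np.logspace(-2, 2, 200)
    lls = [log_marginal_likelihood(X, y, l, sigma_n) for l in grid]
    return grid[int(np.argmax(lls))]
```

The −½ log|K| term penalizes overly flexible kernels, which is why ML-II balances data fit against model complexity; with one kernel evaluation per pair of points, each likelihood evaluation costs O(n³), as noted below.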
Then, using ML-II is costly, since the computational complexity scales as n³ for n data points. Therefore, there is a need to develop methods to find the best kernel and its hyperparameters without relying on many data samples (either observed or simulated), while allowing for the incorporation of prior physics knowledge. This would increase the kernel's interpretability, and may help in gaining real insight into the system.

In order to address this need, and to eliminate the requirement of empirical kernel selection using data (either observed or simulated), we calculate the kernel directly from a physical model. There is growing interest in incorporating domain knowledge into kernel construction, including calculating the kernel directly from a physical model. For example, previous studies used governing partial differential equations to numerically calculate the covariance matrices [23-25]. In this work, we leverage the connection between infinitely wide Bayesian neural networks and Gaussian processes to calculate the covariance function from an explicit basis function [9, 26, 27]:

k(x_i, x_j) ∝ ∫_{-∞}^{∞} φ(x_i − c) φ(x_j − c) dc,    (1)

where c denotes the center of the basis function φ(x). We refer to GPs based on kernels designed in this way as basis-function GPs. Alternatively, if the power spectral density (PSD) of the system is easier to model, the covariance function can be calculated from the amplitude of the Fourier transform of the PSD using the Wiener–Khinchin theorem [28].

For example, a radial basis function (RBF) of the form φ(x) = exp(−xᵀΣx/2) yields k(x_i, x_j) ∝ exp(−(x_i − x_j)ᵀ(Σ/2)(x_i − x_j)/2) [9, 26], where Σ is the precision matrix and (..)ᵀ is the transpose operation. This type of basis function is useful for modeling many smooth functions. The precision matrix is a symmetric matrix encoding properties of the function. For example, if there are no correlations between input parameters, Σ = diag(l)⁻² is a diagonal matrix, wherein l is a vector of characteristic length-scales. The latter specify how function values at two points separated in space along a single dimension (for example, a quadrupole magnet strength) relate to each other.

FIG. 1: (a) Layout of the SPEAR3 storage ring with the 13 free skew quadrupoles used for the online optimization of beam loss rate. The non-destructive current monitor is shown as a yellow dot. (b) Beam loss rate projected on a single skew quadrupole current. The data is taken from archived operations scans.

In what follows, we use an approximation to the physics model as the basis function to design the kernel. This allows the GP to make predictions of the system using the covariance of the physics model as an estimate of that of the system. We refer to this approach as the physics-informed basis-function
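Eq. (1) can be checked numerically for the Gaussian basis function above: integrating the product of two shifted copies of φ over the center c reproduces an RBF kernel whose precision is half that of the basis function. A one-dimensional sketch (the grid limits and precision value are arbitrary choices for this check):

```python
import numpy as np

# Gaussian basis function phi(x) = exp(-0.5 * sigma * x^2) in one dimension
sigma = 1.0                                   # basis-function precision
phi = lambda x: np.exp(-0.5 * sigma * x ** 2)

def kernel_eq1(xi, xj, c, dc):
    # Eq. (1): k(x_i, x_j) ∝ ∫ phi(x_i - c) phi(x_j - c) dc, on a uniform grid
    return np.sum(phi(xi - c) * phi(xj - c)) * dc

c = np.linspace(-20.0, 20.0, 40001)
dc = c[1] - c[0]

# normalized kernel value at separation 1; closed form is
# exp(-0.5 * (x_i - x_j) * (sigma/2) * (x_i - x_j)) = exp(-0.25)
k_ratio = kernel_eq1(0.0, 1.0, c, dc) / kernel_eq1(0.0, 0.0, c, dc)
```

The agreement with exp(−0.25) confirms that the kernel precision is Σ/2, the factor that reappears below when the Hessian of the physics model is converted into a kernel.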
GP. Learning the kernel from simulated data instead of machine data is a form of kernel transfer learning [29, 30]. Furthermore, constructing the kernel from basis functions without using the likelihood function is a form of Gaussian process with likelihood-free inference [31, 32].

In this Letter we consider the task of finding the peak of a system which has a physics model of sufficient fidelity to capture the qualitative system response [33]. For example, the simulation could have an unknown scaling and translation with respect to the machine, but its functional form is similar. In order to calculate the physics model's basis function, in this work we consider systems that can be roughly approximated with a Gaussian around the optimum of the simulation. We then approximate the basis function by expanding the log of the simulation f̂(x) about a point x₀ close to the global optimum with an analytic expansion to second order, after subtracting off the asymptotic behavior f̂(∞). We calculate the gradient (G) and the Hessian (H) of the log of the simulation,

G_i = ∂_{x_i} log[f̂(x) − f̂(∞)] |_{x=x₀},    H_{i,j} = ∂_{x_i} ∂_{x_j} log[f̂(x) − f̂(∞)] |_{x=x₀},

via numerical differentiation. The resulting expansion is log[f̂(x₀) − f̂(∞)] + (x − x₀)ᵀG + ½(x − x₀)ᵀH(x − x₀). If the expansion point x₀ is an optimum, as in the work presented here, then the gradient may be neglected, and the basis function has the functional form of a Gaussian:

φ(x) = [f̂(x₀) − f̂(∞)] exp[½(x − x₀)ᵀH(x − x₀)].    (2)

Then Eq. 2 is used to calculate the associated covariance function by applying Eq. 1. The resulting covariance function has the same functional form as the RBF kernel [9], with a precision matrix half that of the Hessian above, Σ = −H/2. The function value f̂(∞) was taken into account as the GP prior mean.

Experiment.—
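The Hessian construction above can be sketched with a toy stand-in for the physics simulation: a correlated Gaussian peak plus an asymptotic offset, differentiated by central finite differences. The curvature matrix, offset, and function names here are hypothetical, chosen only to illustrate the procedure; for this toy model the recovered H equals the negative of the assumed curvature, so Σ = −H/2 matches the kernel precision quoted in the text.

```python
import numpy as np

def hessian_log_response(f_hat, x0, f_inf, eps=1e-3):
    # H_ij = d^2/dx_i dx_j log[f_hat(x) - f_hat(inf)] at x0, by central differences
    n = len(x0)
    g = lambda x: np.log(f_hat(x) - f_inf)
    I = np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (g(x0 + eps * I[i] + eps * I[j])
                       - g(x0 + eps * I[i] - eps * I[j])
                       - g(x0 - eps * I[i] + eps * I[j])
                       + g(x0 - eps * I[i] - eps * I[j])) / (4.0 * eps ** 2)
    return H

# toy stand-in for the simulation: correlated Gaussian peak plus offset f_hat(inf)
A = np.array([[2.0, 0.5], [0.5, 1.0]])     # hypothetical curvature matrix
f_inf = 0.3                                # asymptotic value f_hat(inf)
f_hat = lambda x: f_inf + 1.5 * np.exp(-0.5 * x @ A @ x)

H = hessian_log_response(f_hat, np.zeros(2), f_inf)
Sigma = -H / 2.0   # kernel precision matrix, as in the text
```

Because the log of the toy model is exactly quadratic, the finite-difference Hessian recovers −A to numerical precision; for a real simulation the same routine returns the local quadratic approximation around the optimum.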
In what follows we demonstrate experimentally the effectiveness of the physics-informed basis function approach on SPEAR3 [14], a third-generation storage ring light source operating with low emittance, which results in high photon beam brightness. The goal of this optimization task is to minimize the average vertical emittance with skew quadrupoles. In an ideal electron storage ring, the vertical emittance is nearly zero. However, in reality there are various sources of errors that give rise to a finite vertical emittance, such as vertical dispersion in dipole magnets and linear betatron coupling between the horizontal and vertical planes. Those error sources can be compensated by skew quadrupole magnets. In SPEAR3, there are 13 free skew quadrupoles for vertical emittance control (they do not change the horizontal emittance); see Fig. 1a.

Minimizing the vertical emittance is equivalent to minimizing the vertical beam area. Since the beam loss in the experiment is dominated by Touschek scattering, minimizing the beam size corresponds to maximizing the beam loss rate (Amperes per minute) [34], which is non-disruptively monitored. We calculated the Hessian at the maximum beam loss rate point (see Fig. 1b) in two ways. First, we used a fast-executing surrogate model trained on simulation data of the
SPEAR3 storage ring Matlab simulator [3] (details described in the Supplemental Material [20]). This facilitates fast calculation of the Hessian. Second, we numerically calculated the Hessian directly from a noiseless SPEAR3 physics simulation. We found these to be in acceptable agreement.

While the precision matrix of the kernel, containing both length scales and correlations, was calculated from physics simulations, the kernel's amplitude was evaluated from the variance of a uniform distribution spanning the objective's range, which was similar to the value obtained by ML-II. The kernel's noise was measured from a few live machine measurements. The GP prior mean was set to the asymptotic value f̂(∞), estimated from the first observed point.

FIG. 2: (a) Comparison of optimization of beam loss rate over 13 skew quadrupole magnets for a Gaussian process (GP) with physics-informed kernel including off-diagonal elements (blue), a GP with diagonal-only data-informed kernel (red), Nelder-Mead simplex (green), and RCDS (black). Each step corresponds to approximately 1 to 2 seconds for GP and simplex, and to 6 seconds for RCDS. (b) Simulations using the conditions of (a). Six individual scans for each method, with means shown by thick lines, are consistent with the relative performance of the online optimizations. (c) Comparison of GP optimizers with the objective's offset as prior mean (solid), and without (dashed).
Results.—
In what follows, we show that online optimization using the physics-informed basis function approach converges faster than the traditional data-informed ML-II approach. Since the basis-function approach makes it easier to calculate correlations, it is feasible to create a kernel including off-diagonal elements, whereas for the available data, the ML-II approach is limited to resolving a diagonal-only kernel. We also show that both methods surpass the currently established optimization algorithms (Nelder-Mead simplex [1] and robust conjugate direction search (RCDS) [2]), which are routinely used to tune particle accelerator systems [35].

Figure 2a shows results from online optimization of the beam loss rate simultaneously on 13 skew quadrupole magnets. The GP optimizer with the physics-informed basis function kernel reached an optimum of 1.67 mA/min in the smallest number of function evaluations (30 to 40 steps, equivalent to 0.5 to 1 minutes). The archive data-informed ML-II GP achieved 1.62 mA/min in 40 to 60 steps (0.66 to 1.2 minutes). The Nelder-Mead simplex optimizer achieved 1.32 mA/min in approximately 160 steps (2.6 minutes). The RCDS optimizer achieved 1.66 mA/min, but took longer to converge: approximately 180 steps, wherein each step is 6 seconds, for a total of 20 minutes. This increased measurement step time for RCDS allows for a reduced measurement noise of 0.02 mA/min, which was found helpful for RCDS to converge. In contrast, the GP optimizers handle the noisier measurements better, resulting in shorter step times.

Although all optimizers, with the exception of Nelder-Mead simplex, found a similar optimal loss rate within the measurement uncertainty (0.02 mA/min RMS for RCDS and 0.04 mA/min for the rest), the physics-informed basis-function GP found the optimum faster than even the data-informed ML-II GP, owing to the fact that it incorporates correlations between the quadrupoles to produce a better model.
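The kind of loop these GP optimizers run can be sketched end-to-end on a toy objective with correlated curvature, standing in for the beam loss rate. This is a minimal illustration under stated assumptions: the Letter does not specify the acquisition function used, so an upper confidence bound is assumed here, and the candidate sampling, bounds, and toy objective are choices of this sketch.

```python
import numpy as np

def rbf(X1, X2, Sigma):
    # RBF kernel with fixed (physics-informed) precision matrix Sigma
    d = X1[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * np.einsum('abi,ij,abj->ab', d, Sigma, d))

def bo_maximize(objective, Sigma, x0, n_steps=30, sigma_n=1e-3, kappa=2.0,
                bounds=(-2.0, 2.0), n_cand=2000, seed=0):
    # Next point maximizes mu + kappa * std over random candidates (assumed UCB)
    rng = np.random.default_rng(seed)
    X = np.atleast_2d(np.asarray(x0, dtype=float))
    y = np.array([objective(X[0])])
    for _ in range(n_steps):
        K = rbf(X, X, Sigma) + sigma_n ** 2 * np.eye(len(X))
        cand = rng.uniform(bounds[0], bounds[1], size=(n_cand, X.shape[1]))
        Ks = rbf(cand, X, Sigma)
        mu = Ks @ np.linalg.solve(K, y)   # zero prior mean for simplicity
        var = np.clip(1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T)),
                      0.0, None)
        x_next = cand[np.argmax(mu + kappa * np.sqrt(var))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X[np.argmax(y)], float(y.max())

# toy stand-in for the loss-rate objective, with correlated curvature A
A = np.array([[1.5, 0.4], [0.4, 1.0]])
f = lambda x: np.exp(-0.5 * x @ A @ x)
x_best, y_best = bo_maximize(f, A / 2.0, [1.5, -1.0])
```

Here the kernel precision A/2 plays the role of Σ = −H/2 from the physics model; because the surrogate already encodes the objective's correlated curvature, the loop needs few evaluations to climb toward the peak.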
We also found that these results were consistent with subsequent tests where each optimizer started from the same random starting point with the same initial beam loss rate as before.

A comparison of the above optimizers in a simulation environment is shown in Fig. 2b. Although the SPEAR3 simulator does not capture the full complexity of the live machine, it allows us to compare the relative performance of the optimizers with a simulated objective function, which we find consistent with the online optimization. In simulation, on average, the physics-informed basis-function approach finds a better optimum in fewer iterations than the other methods. In addition, the spread of six individual scans for each method (with means shown as thick lines) reveals the robustness of the GPs, which follow similar trajectories for individual scans.

Notably, the maximum available value of the simulated objective function is higher than the corresponding online optimization value. This is understandable, as the actual machine has more coupling error sources than modeled in the simulation, so in reality the machine cannot be expected to reach the simulated maximum.

Another important design choice is the GP prior mean (m(x) = f̂(∞)). In what follows, we demonstrate the effect of the prior mean by comparing a GP model with a prior mean to one without (m(x) = 0). Figure 2c shows a comparison of the optimization results of the GP approaches with m(x) = 0 and f̂(∞). For both cases, the GP optimizers with zero prior mean (dashed lines) converged more slowly to a lower optimum, further validating the importance of the prior mean choice. In addition, the physics-informed basis function GP optimizer converged faster than the data-informed ML-II GP optimizer in both cases.

Conclusion.—
We presented and experimentally demonstrated a method incorporating physics models directly into a Gaussian process (GP) optimizer. Our method presents a simple way to construct the GP kernel, including correlations between devices. The physics-informed GP, which is more representative of the system, performed faster in an online optimization task compared to routinely used optimizers.

In general, ML model-based methods for online optimization typically rely on many data samples. On the other hand, physics abounds with well-verified mathematical models which we can exploit to learn approximate system dynamics in order to optimize new systems without prior data. We computed the kernel using an approximated basis function from a physics model rather than from data samples. This method is faster at constructing the full kernel, and could be easily adapted to other systems. It would also be applicable for automatic tuning and control of new machines and other complex configurations where historical data is unavailable or insufficient to resolve the kernel's hyperparameters, including correlations. The basis function method is particularly well suited to analytical or differentiable models [37], as well as surrogate models [38]. The incorporation of prior physics knowledge would increase the attractiveness of Bayesian optimization with GPs for practitioners across various scientific domains, and may have wide applications in science.

The authors are grateful to the SPEAR3 operators and engineers for their help with live tests on the storage ring. This work was supported by the Department of Energy, Laboratory Directed Research and Development program at SLAC National Accelerator Laboratory, under contract DE-AC02-76SF00515, and by the Office of Advanced Scientific Computing Research under FWP 2018-SLAC-100469ASCR.

∗ Corresponding author: [email protected]

[1] J. A. Nelder and R. Mead, The Computer Journal, 308 (1965).
[2] X. Huang, Phys. Rev. Accel. Beams, 104601 (2018).
[3] X. Huang and J.
Safranek, Phys. Rev. ST Accel. Beams, 084001 (2015).
[4] A. Scheinker, X. Pang, and L. Rybarcyk, Phys. Rev. ST Accel. Beams, 102803 (2013).
[5] A. Scheinker, A. Edelen, D. Bohler, C. Emma, and A. Lutman, Phys. Rev. Lett., 044801 (2018).
[6] J. Močkus, Optimization Techniques IFIP Technical Conference, Novosibirsk, July 1–7, 400 (1975).
[7] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, Proceedings of the IEEE, 148 (2016).
[8] E. Brochu, V. M. Cora, and N. de Freitas, Univ. of British Columbia Tech. Rep. UBC TR-2009-023 and arXiv:1012.2599 (2010).
[9] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006).
[10] M. McIntire, D. Ratner, and S. Ermon, Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 517 (2016).
[11] J. Kirschner, M. Mutný, N. Hiller, R. Ischebeck, and A. Krause, Proceedings of the 36th International Conference on Machine Learning (ICML), 3429 (2019).
[12] J. Duris, D. Kennedy, A. Hanuka, J. Shtalenkova, A. Edelen, P. Baxevanis, A. Egger, T. Cope, M. McIntire, S. Ermon, and D. Ratner, Phys. Rev. Lett., 124801 (2020).
[13] A. Hanuka, J. Duris, J. Shtalenkova, D. Kennedy, A. Edelen, D. Ratner, and X. Huang, Proceedings of the Machine Learning for the Physical Sciences Workshop, NeurIPS (2019).
[14] R. Hettel, in 9th European Particle Accelerator Conference (2004).
[15] D. Duvenaud, J. Lloyd, R. Grosse, J. Tenenbaum, and G. Zoubin, Proceedings of the 30th International Conference on Machine Learning, 1166 (2013).
[16] S. Sun, G. Zhang, C. Wang, W. Zeng, J. Li, and R. Grosse, Proceedings of the 35th International Conference on Machine Learning, 4828 (2018).
[17] A. G. Wilson, Z. Hu, R. R. Salakhutdinov, and E. P. Xing, in Advances in Neural Information Processing Systems 29 (2016), pp. 2586–2594.
[18] R. Calandra, J. Peters, C. E. Rasmussen, and M. P. Deisenroth, Proceedings of the International Joint Conference on Neural Networks, 3338 (2016).
[19] A. C. Damianou and N. D.
Lawrence, Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), 207 (2013).
[20] See Supplemental Material at [URL will be inserted by publisher] for further details.
[21] X. Yang, D. Barajas-Solano, G. Tartakovsky, and A. M. Tartakovsky, Journal of Computational Physics, 410 (2019).
[22] G. Camps-Valls, L. Martino, D. H. Svendsen, M. Campos-Taberner, J. Muñoz-Marí, V. Laparra, D. Luengo, and F. J. García-Haro, Applied Soft Computing, 69 (2018).
[23] A. M. Tartakovsky and R. Tipireddy, 52nd Hawaii International Conference on System Sciences, HICSS 2019.
[24] J.-L. Wu, C. Michelén-Ströfer, and H. Xiao, Computers and Fluids, 104292 (2019).
[25] E. M. Constantinescu and M. Anitescu, International Journal for Uncertainty Quantification, Tech. Rep. 1 (2013).
[26] D. MacKay, ASI Series F: Computer and System Sciences, 133–165 (1998).
[27] R. M. Neal, Bayesian Learning for Neural Networks, Tech. Rep. (1996).
[28] N. Wiener, Acta Math., 117 (1930).
[29] F. Aiolli, JMLR: Workshop and Conference Proceedings, 81 (2012).
[30] 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 1763 (2017).
[31] M. U. Gutmann and J. Corander, Journal of Machine Learning Research, 1 (2016).
[32] E. Meeds and M. Welling, Uncertainty in Artificial Intelligence - Proceedings of the 30th Conference, UAI 2014, 593 (2014), arXiv:1401.2838.
[33] R. C. Conant and W. Ross Ashby, International Journal of Systems Science, 89 (1970).
[34] X. Huang, Beam-based Correction and Optimization for Accelerators (CRC Press, 2019).
[35] S. Tomin, G. Geloni, I. Agapov, I. Zagorodnov, Y. Fomin, Y. Krylov, A. Valintinov, W. Colocho, T. Cope, A. Egger, and D. Ratner, Proceedings of the 7th International Particle Accelerator Conference (2016).
[36] J. Safranek, Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 27 (1997).
[37] J. Degrave, M. Hermans, J. Dambre, and F.
wyffels, Frontiers in Neurorobotics, 6 (2019).
[38] A. Edelen, N. Neveu, M. Frey, Y. Huber, C. Mayes, and A. Adelmann, Phys. Rev. Accel. Beams 23.