Machine learning for beam dynamics studies at the CERN Large Hadron Collider
P. Arpaia, G. Azzopardi, F. Blanc, G. Bregliozzi, X. Buffat, L. Coyle, E. Fol, F. Giordano, M. Giovannozzi, T. Pieloni, R. Prevete, S. Redaelli, B. Salvachua, B. Salvant, M. Schenk, M. Solfaroli Camillocci, R. Tomàs, G. Valentino, F.F. Van der Veken, J. Wenninger
Dipartimento di Ingegneria Elettrica e Tecnologie dell'Informazione (DIETI), Università degli studi di Napoli Federico II, 80125 Napoli, Italy
Beams Department, CERN, Esplanade des Particules 1, 1211 Geneva 23, Switzerland
École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Technology Department, CERN, Esplanade des Particules 1, 1211 Geneva 23, Switzerland
Johann Wolfgang Goethe Universität, Max-von-Laue-Str. 9, 60438 Frankfurt, Germany
University of Malta, MSD2080 Msida, Malta

18th September 2020

ABSTRACT
Machine learning entails a broad range of techniques that have been widely used in science and engineering for decades. High-energy physics has also profited from the power of these tools for advanced analysis of collider data. Only recently, however, has machine learning started to be applied successfully in the domain of accelerator physics, as testified by the intense efforts deployed in this domain by several laboratories worldwide. This is also the case at CERN, where focused efforts have recently been devoted to the application of machine learning techniques to beam dynamics studies at the Large Hadron Collider (LHC). This implies a wide spectrum of applications, from beam measurements and machine performance optimisation to the analysis of numerical data from tracking simulations of non-linear beam dynamics. In this paper, the LHC-related applications that are currently pursued are presented and discussed in detail, paying attention also to future developments.

Keywords: Machine Learning, Beam dynamics, LHC
Machine Learning (ML) is the process of building a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed [61]. ML is a subset of Artificial Intelligence (AI), and encompasses a number of learning paradigms, including Supervised Learning (SL), Unsupervised Learning (UL), and Reinforcement Learning. Typical ML tasks include classification, regression, clustering, anomaly detection, dimensionality reduction, and reward maximisation [16].

The process of using ML to train a mathematical model to achieve a particular task successfully involves a number of steps. These include data collection and curation, feature (input) engineering, feature selection and dimensionality reduction, model hyper-parameter optimisation, model training, performance evaluation, and finally deployment in operation.

In the SL paradigm, ML algorithms are trained on labelled data sets, meaning that there exists a ground-truth output (continuous or discrete) for each input. On the other hand, in UL [17] no ground-truth output is available, and

∗ Corresponding author: [email protected]
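As an illustration of these steps, the following minimal sketch (entirely synthetic data, with a simple nearest-centroid rule standing in for any SL model; it is not one of the paper's actual pipelines) walks through data collection, a train/test split, model training, and performance evaluation on unseen data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data collection: two labelled Gaussian blobs (the "training data").
X0 = rng.normal(loc=-2.0, scale=1.0, size=(100, 2))
X1 = rng.normal(loc=+2.0, scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Train/test split, so that evaluation is done on unseen data.
idx = rng.permutation(len(X))
train, test = idx[:150], idx[150:]

# Model training: store one centroid per class.
centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

def predict(points):
    # Assign each point to the class of the nearest centroid.
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Performance evaluation on the held-out test set.
accuracy = (predict(X[test]) == y[test]).mean()
```

The same skeleton (collect, split, train, evaluate) carries over unchanged when the nearest-centroid rule is replaced by any of the models discussed in the following sections.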
The layout of the LHC ring is shown schematically in Fig. 1 (see [25] for more detail). In Run 2 (2015-2018), the LHC accelerated proton and ion beams from an injection energy of 450 GeV to a maximum flat-top energy of 6.5 TeV, where most of the physics programme was performed (see, e.g. [90]). The eight-fold symmetry is clearly visible, as well as the main function of each long straight section. Note that the RF system is located in the same straight section as the non-distributed beam instrumentation devices, such as transverse and longitudinal profile monitors and beam current monitors. It is also worth mentioning that LHC Sectors are defined as the parts of the ring between the mid-points of consecutive Octants, e.g. Sector 4-5 is defined as the fraction of the ring circumference between the mid-point of Octant 4 and the mid-point of Octant 5.

Great emphasis has been put on the design and development [35] of the linear optics and on its measurement and correction, and these efforts have been rewarded by outstanding results [2, 68, 82, 67], which are among the key items that led to the excellent performance of the LHC. To improve these results even further, one should tackle two key issues, namely devise techniques to recognise efficiently faulty Beam Position Monitor (BPM) readings and build effective models to reproduce the impact of field errors distributed in the ring. Both aspects are particularly suited to the use of ML techniques, which have been actively pursued in recent years.

Optics measurements and corrections at the LHC are incorporating ML techniques in two different forms, namely SL and UL. Supervised methods are used to explore the opportunity to build regression models that aim to reconstruct individual magnet errors from the optics perturbations caused by these errors, while currently available correction techniques compute circuit strength settings to compensate the measured optics deviations from design. The preliminary results presented in [36, 39, 41, 40] clearly demonstrate the ability of ML-based regression models [20, 71, 70, 72, 54]
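The regression idea can be sketched as follows, under the strong simplifying assumption of a linear response matrix mapping hypothetical magnet errors to optics perturbations; a closed-form ridge regression stands in for the models of [36, 39, 41, 40], and all quantities (sizes, noise levels, the matrix itself) are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

n_magnets, n_bpms, n_samples = 8, 40, 500

# Hypothetical linear response: optics perturbation = R @ magnet_errors + noise.
R = rng.normal(size=(n_bpms, n_magnets))
errors = rng.normal(scale=1e-3, size=(n_samples, n_magnets))      # "labels"
perturb = errors @ R.T + rng.normal(scale=1e-4, size=(n_samples, n_bpms))

# Ridge regression in closed form: W = (X^T X + a I)^(-1) X^T Y.
a = 1e-4
X, Y = perturb, errors
W = np.linalg.solve(X.T @ X + a * np.eye(n_bpms), X.T @ Y)

# Reconstruct the individual magnet errors from the measured perturbations.
reconstructed = X @ W
rms_residual = np.sqrt(np.mean((reconstructed - errors) ** 2))
rms_error = np.sqrt(np.mean(errors ** 2))
```

In the real machine the response is non-linear and the training data come from simulations with realistic error distributions, but the supervised structure (perturbations in, magnet errors out) is the same.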
The LHC is susceptible to beam losses from normal and abnormal conditions, which can damage the state of superconductivity in its magnets and eventually lead to a quench [25]. As a result, the equipment must be protected from any damage or down-time that may be caused by beam losses. The LHC relies on a robust collimation system to dispose safely of such unavoidable beam losses. The LHC collimation system consists of around 100 collimators distributed along the 27 km ring, whereby each collimator is made of two parallel absorbing blocks. Each of the four jaw corners can be moved individually using dedicated stepper motors, for a total of about 400 degrees of freedom for the whole system. The collimator jaws are positioned with an accuracy of 5 µm around the circulating beam, with the tightest operational gap being around 1 mm at top energy.

The halo cleaning performance provided by the LHC collimation system relies on a precise multi-stage transverse setting hierarchy of different collimator "families" (primary, secondary, and tertiary collimators; shower absorbers; protection devices) [23, 24]. The collimator settings are determined following a beam-based alignment (BBA) procedure established in [3], to determine the beam centre and beam size at their locations. This procedure moves collimator jaws separately towards the beam halo, whilst monitoring the measured beam loss signal. Each collimator has a dedicated Beam Loss Monitoring (BLM) device positioned immediately downstream, to detect beam losses generated when halo particles impact the collimator jaws. A collimator is said to be aligned when both jaws are centred around the beam after touching the beam halo, which is indicated by a signature spike pattern in the recorded beam loss signal. The local beam size is determined by comparing the aligned jaw positions against a reference beam halo established with the primary collimators [84].
At present, the BBA is semi-automated [84] and collimation experts are required to manually detect and classify such spikes following training and experience. Given the complexity of the LHC collimation system and of the LHC operational cycle, the procedure to establish settings from injection to collision is tedious and time consuming.

Each year during beam commissioning, the collimators must be aligned to ensure the correct setup for the specific LHC run configuration, prior to achieving nominal operation. They are aligned at different machine states: at injection (450 GeV) 79 collimators are aligned, and at flat top (6.5 TeV) 75 collimators are aligned. Their settings are monitored throughout the year, and different collimator setups are required when machine parameters are changed. This alignment procedure is crucial, as it is a prerequisite for every machine configuration to set up the system for high-intensity beams. This motivated the development of an automatic method, to allow for collimator alignments to be performed more efficiently and upon request, at regular intervals.

In addition to this motivation, it is noted that collimators are at present aligned assuming no tilt between the collimator and the beam; therefore any tank misalignments or beam envelope angles at large-divergence locations could introduce a tilt with respect to the optimum orientation, which might limit the collimation performance. This is a concern in particular if the collimation hierarchy is pushed to tighter retractions between families to optimise the performance [23]. It is planned to improve the angular accuracy in the future, to optimise further the collimation system performance.
A recent study [4] introduced three novel angular alignment methods to determine a collimator's optimal angle; however, these methods also make use of the semi-automated software and require much longer setup times.

Collimator alignment campaigns involve continuously moving the jaws towards the beam, whilst ignoring any non-alignment spikes, until a clear alignment spike is observed. An alignment spike, as shown in Figure 2(a), indicates that the moving jaw touched the beam halo and is hence in contact with the primary beam. It consists of a steady-state signal before the spike (corresponding to movements of the jaws before the beam is reached), the loss spike itself, the temporal decay of losses, and a steady-state signal after the spike. This second steady state, with larger losses than the first one, is a result of the continuous scraping of halo particles when the jaw positions are fixed. The further a jaw cuts into the beam halo, the more the steady-state signal increases, as the density of the particles near the jaw increases. Any other spikes which do not follow this pattern are classified as non-alignment spikes, as shown in Figure 2(b). They do not have a fixed structure and can contain spurious high spikes. Such non-alignment spikes arise due to other factors, e.g. beam instabilities or mechanical vibrations of the opposite jaw, thus indicating that the jaw has not yet touched the beam and must resume its alignment. In order to achieve a reliable alignment, one has to be able to correctly identify such alignment spikes. Note that in a single alignment campaign, hundreds of such spikes need to be analysed.

Recent work [10] sought to fully automate the BBA by casting the process of spike recognition as a classification problem, such that ML models were trained to distinguish between the two spike patterns in the BLM losses.
Data were gathered from 11 semi-automatic collimator alignment campaigns performed in 2016 and 2018, both at injection and at flat top. A total of 6446 samples were extracted: 4379 positive (alignment spikes) and 2067 negative (non-alignment spikes). The data logged during alignment campaigns consist of the 100 Hz BLM signals and the collimator jaw positions logged at a frequency of 1 Hz. The data extracted for the data set consist of the moments
(a) Alignment spike: the corresponding collimator is aligned with the beam. (b) Non-alignment spikes: the corresponding collimator is far from the beam.
Figure 2: Typical BLM signals at 100 Hz showing (a) a clear alignment spike and (b) non-alignment spikes, after inward collimator movements at approximately t = 0.5 s.

when each collimator jaw(s) stopped moving, when the losses exceeded the threshold defined for the semi-automatic alignment.

Fourteen manually-engineered features were extracted from this data set and analysed. In order to select the most relevant features, the strength of association between each pair of variables was first analysed using the Spearman correlation. A feature selection analysis was then performed using five different ML models to see how they order the importance of each of the features. The models were individually trained using all features and output the features ranked in ascending order, according to their importance. Finally, the sequential forward selection (SFS) algorithm [69] was performed, to select the best features with the best hyper-parameters. The SFS algorithm tries all feature combinations by introducing one feature at a time and keeping the best feature for future combinations. The resulting five most important features were:

• Height (1 feature): calculated by subtracting the average steady-state losses before the spike from the maximum value. The average steady state is calculated from the BLM signal after the decay of the previous alignment, until the current collimator was stopped.

• Spike decay (3 features): exponential fit to the decay in the BLM signal using $a e^{-bx} + c$.
• Position in sigma (1 feature): a beam-size-invariant way of expressing the fraction of the normally distributed beam intercepted by the jaw, as the beam size in mm varies across locations in the accelerator.

These features were used to train and compare six ML models for binary classification, namely: Logistic Regression, Neural Network, Support Vector Machine, Decision Tree, Random Forest, and Gradient Boost.

When aligning any collimator, it is vital that, if the spike detection predicts a collimator to be aligned, the collimator has actually touched the beam and is aligned. Otherwise, if the spike detection predicts the collimator to be aligned when the collimator is still far from the beam, this would result in a misalignment and the collimator must be realigned from the beginning. As a result, false detection of an alignment spike is more grievous than not detecting an alignment spike; therefore precision was used as the main performance metric. Each model was analysed in depth, its hyper-parameters optimised, and thoroughly tested on unseen data. The results were collected by applying cross-validation on the training set, which was performed by dividing the original training set into ten randomly-defined subsets: nine used for training and the last one for validating the results. This procedure was then repeated 30 times, each time with a different random partition of the training set, in order to handle lucky splits. Figure 3 plots the precision distribution obtained by each of the models and their Ensemble. In addition, Tukey's HSD
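The evaluation scheme just described (ten random subsets, nine for training and one for validation, repeated thirty times, with precision as the figure of merit) can be sketched as follows; a trivial threshold rule on a synthetic "spike height" feature stands in for the six models actually compared:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic one-feature data set: "spike height", larger for alignment spikes.
height = np.concatenate([rng.normal(3.0, 1.0, 300),   # positives (alignment)
                         rng.normal(0.0, 1.0, 300)])  # negatives
label = np.array([1] * 300 + [0] * 300)

def precision(y_true, y_pred):
    # Fraction of predicted positives that are actual positives.
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fp) if tp + fp else 0.0

precisions = []
for _ in range(30):                        # 30 repetitions against lucky splits
    idx = rng.permutation(len(label))
    folds = np.array_split(idx, 10)        # 10 random subsets
    for k in range(10):                    # 9 for training, 1 for validation
        val = folds[k]
        tr = np.concatenate([folds[j] for j in range(10) if j != k])
        thr = height[tr][label[tr] == 1].mean() - 1.0   # crude "trained" threshold
        pred = (height[val] > thr).astype(int)
        precisions.append(precision(label[val], pred))

mean_precision = float(np.mean(precisions))
```

The distribution of the 300 per-fold precision values is what a plot like Figure 3 summarises for each model.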
Figure 4: Parallel alignment MD results at injection: the beam centres obtained in the 2018 commissioning using BLMs as a function of the centre values obtained in the MD. The results show a good correlation.

This new fully-automatic alignment software was successfully used throughout 2018 LHC operation. The first version was used during commissioning, such that the collimators in the two beams were automatically aligned sequentially, at injection and at flat top. A machine development (MD) study was then scheduled to test the alignment of the collimators of the two beams in parallel. Finally, another MD was scheduled to test the parallel fully-automatic software with angular alignments.

The collimator centres measured at injection with BLM detectors during injection commissioning in 2018 are similar to the centres obtained during the parallel alignment MD. This is depicted in Figure 4, evidently showing the reproducibility of the LHC and the quality of the orbit and optics corrections that enable such an excellent stability of the collimator alignment.

The time to align the collimators at injection was decreased by 71.4% compared to the semi-automatic alignment in 2017, namely from 2.8 hours to 50 minutes [5, 9], as shown in Figure 5. Finally, this fully-automatic tool was also incorporated into the angular alignment implementation and successfully decreased the alignment time by 70%, requiring no human intervention. Overall, the full automation with the use of ML has proven to be more efficient and
The LHC is a complex machine with numerous intertwined systems, each potentially impacting the dynamics and stability of the beams. As such, building a rigorous model of particle losses occurring in the LHC is a very daunting task, but it would offer valuable insight into the inner workings of the machine. This would help push its performance further, and also allow exploring ML techniques for potential use in the design and operation of a future FCC [14].

The main goal of this work is to develop a system capable of determining the optimal set of operational parameters so as to maximise the beams' intensity lifetimes for a given machine configuration [28]. This system could then assist in the setup of the LHC, and potentially help identify unknown correlations between different machine parameters. Eventually, the objective is also to compare the model built from experimental data with results from particle tracking simulations.

The approach we took to develop the system is to make use of the swaths of LHC data acquired through the several instrumentation systems in order to build a data-driven surrogate model of the beams' lifetime. This can then be coupled with an optimisation algorithm to determine the optimal operational parameters.

This problem was treated within a supervised-learning framework. The output of the model is the beam lifetime and the inputs are the operational knobs of the machine, i.e. the tunes, chromaticities, and magnet currents. The data cover an entire operational year but, to simplify the input/output relationship, they are taken from a small section of the complete machine cycle, corresponding to the end of the injection energy plateau just before launching the ramp. This will need to be extended in the future.

Several SL models were trained and compared. The best performance was achieved with a Gradient Boosted Decision Tree model [50]. Once a surrogate model is trained, it can be paired with a variety of optimisers.
In our case, an off-the-shelf simplex optimiser [63] was used to extract the optimal machine configuration from the trained lifetime response. We observed, however, that the distribution of the input data was far from ideal. This is to be expected, as LHC operation relies on reproducing strictly the same parameter set in every cycle, to avoid uncontrolled beam losses with very high stored energy due to accidental 'exploration' of the beam parameter space. Consequently, the surrogate model trained on the available data represents operational machine configurations well, but has rather limited predictive power for non-operational machine setups. The parameter space was hence explored further with the help of a dedicated MD session in which multiple random tune scans were performed over varying machine configurations [29]. The data collected during this study are used to benchmark and supplement the current model. A number of beam instabilities that increased the beams' emittances, thus reducing the machine's performance, had been observed. Such instabilities are currently not taken into account by the model, which is as yet a weakness of the setup. Nonetheless, ignoring this blind spot and restricting ourselves to the, albeit naive, lifetime optimisation problem, the model does agree with the lifetime-optimal regions of the vertical $Q_v$ vs horizontal $Q_h$ tune diagrams, see Fig. 6. The model proves to be capable of moving towards the optimal regions, achieving a lifetime improvement of a factor of two, but falls short of the maximum. This could be due to the fact that the observed maximum is quite far from the nominal working point, so the model will have sampled relatively few configurations as eccentric in the training
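The surrogate-plus-optimiser pairing can be sketched as follows; a quadratic least-squares fit of a synthetic lifetime response stands in for the gradient-boosted model, and SciPy's Nelder-Mead routine plays the role of the simplex optimiser. The working point, response shape, and noise level are all invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Synthetic "lifetime" response over the tunes (Qh, Qv), peaking at a
# hypothetical optimal working point (0.31, 0.32); purely illustrative.
def true_lifetime(qh, qv):
    return 30.0 - 2e3 * ((qh - 0.31) ** 2 + (qv - 0.32) ** 2)

# "Logged" machine data: tunes clustered near operation, plus measurement noise.
Q = 0.31 + 0.02 * rng.standard_normal((400, 2))
tau = true_lifetime(Q[:, 0], Q[:, 1]) + 0.3 * rng.standard_normal(400)

# Quadratic surrogate fitted by least squares (a stand-in for the
# gradient-boosted decision trees used in the study).
def features(q):
    qh, qv = q[..., 0], q[..., 1]
    return np.stack([np.ones_like(qh), qh, qv, qh * qv, qh ** 2, qv ** 2], axis=-1)

coef, *_ = np.linalg.lstsq(features(Q), tau, rcond=None)

def surrogate(q):
    return features(np.asarray(q)) @ coef

# Simplex (Nelder-Mead) optimiser paired with the surrogate lifetime model.
res = minimize(lambda q: -surrogate(q), x0=[0.30, 0.30], method="Nelder-Mead")
q_opt = res.x
```

The caveat discussed above applies directly to this sketch: since the training tunes are clustered near the operational point, the surrogate is trustworthy only in that neighbourhood, which is why the dedicated tune-scan MD was needed.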
Collective instabilities can lead to a severe deterioration of beam quality, in terms of reduced beam intensity and increased beam emittance, and consequently a reduction of the collider's luminosity. It is therefore crucial for the operation of the LHC to understand the conditions in which they appear, in order to find appropriate mitigation measures. For that purpose the LHC is equipped with a few dedicated measurement devices. Here we focus on the transverse damper's observation box (ObsBox) [26].

This device handles data coming from various transverse beam position monitors and keeps a rolling buffer of the data. When triggered, the ObsBox writes the buffer to disk. The trigger can be issued manually, but the analyses presented here focus on data coming from an automatic system, which should ideally trigger the data saving when an instability is detected. The data saved are of very high resolution and contain bunch-by-bunch and turn-by-turn transverse beam position information throughout the machine cycle, covering all beam modes and fill types. The automatic triggering system has so far accumulated several terabytes of data. The analysis presented here focuses on the horizontal motion of Beam 1, but it can be trivially extended to the other beam and planes. This data set was acquired between September 2017 and December 2018 and contains a total of 36196 triggers. Unfortunately, the vast majority of

, while explaining 93% of the variance of the extracted features, as shown in Fig. 7.

Figure 7: Principal Component Analysis of ObsBox automatic triggering data. Histogram: explained variance of each PCA component. Curve: cumulative explained variance.

We then apply an off-the-shelf Isolation Forest [59] algorithm to this PCA space to isolate the anomalous samples.
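A heavily simplified sketch of the isolation idea is given below: points are separated by random axis-parallel splits, and anomalous points need fewer splits to end up alone. The 2-D synthetic data stand in for the PCA space; this is an illustration of the principle, not the Isolation Forest implementation used in the analysis:

```python
import numpy as np

rng = np.random.default_rng(5)

def isolation_depth(X, rng, max_depth=10):
    """Depth (number of random axis-parallel splits) at which each point is isolated."""
    depths = np.zeros(len(X))

    def split(idx, depth):
        if len(idx) <= 1 or depth >= max_depth:
            depths[idx] = depth
            return
        dim = rng.integers(X.shape[1])              # random feature
        lo, hi = X[idx, dim].min(), X[idx, dim].max()
        if lo == hi:
            depths[idx] = depth
            return
        cut = rng.uniform(lo, hi)                   # random cut value
        split(idx[X[idx, dim] < cut], depth + 1)
        split(idx[X[idx, dim] >= cut], depth + 1)

    split(np.arange(len(X)), 0)
    return depths

# Nominal cluster plus a few far-away anomalies (stand-in for the PCA space).
nominal = rng.standard_normal((300, 2))
anomalies = rng.standard_normal((5, 2)) + 8.0
X = np.vstack([nominal, anomalies])

# Ensemble of random trees: anomalies end up with a smaller average depth.
avg_depth = np.mean([isolation_depth(X, rng) for _ in range(50)], axis=0)
```

Thresholding the averaged depth (or the anomaly score derived from it) then separates anomalous triggers from nominal ones, which is exactly the role the Isolation Forest plays on the PCA-reduced ObsBox features.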
The IF works by iteratively splitting the space using randomly placed hyperplanes, with the intuition that anomalous points take fewer iterations to isolate than nominal points; based on this, it is able to distinguish between the two. The quality of the IF's predictions is hard to evaluate quantitatively, as there are no easily-accessible labels for these data. However, there is a small list of manually-classified instabilities [1], which was used to qualitatively tune and evaluate the accuracy of the IF. An example of a nominal and an anomalous signal, as predicted by the IF, is shown in Fig. 8. The nominal signals are from fill 6595, recorded in July 2018 at 22:07:21, and represent the situation during the injection beam process. On the other hand, the anomalous signals occurred in fill 7392, in October 2018 at 20:21:10, and represent the situation at top energy, prior to bringing the beams into collision.

Therefore, by using the trained IF we are able to filter out the false triggers from the ObsBox data. After retaining only the predicted anomalous samples from the entire set of ObsBox data, more computationally-expensive algorithms can be run for further analyses. In order to simplify the clustering, each bunch is considered as independent. Although with this assumption the model is not able to distinguish between single- and coupled-bunch instabilities, it greatly simplifies the modelling, as we are left with a clustering problem for a univariate time series. The clustering of this
(a) Predicted inlier by Isolation Forest. (b) Predicted outlier by Isolation Forest.
Figure 8: Examples of signals for 108 nominal bunches from fill 6595 (a) and 10 anomalous ones from fill 7392 (b), as predicted by the IF. |Δx| is the absolute change in horizontal beam amplitude.

type of time series requires determining an adequate distance metric, which should quantify how similar two time series are. A well-regarded metric for time-series similarity is Dynamic Time Warping (DTW) [13, 76]: as shown in [45], DTW and its variants outperform other time-series similarity metrics, and DTW is able to determine the similarity of two time series while remaining invariant to local warping. Using DTW it is possible to create a distance matrix describing the similarity of the evolution of each bunch with respect to all other bunches. This distance matrix is then fed to a Hierarchical Clustering Algorithm [62], which iteratively identifies and links clusters to form a dendrogram. A random subset of such a dendrogram, obtained from the whole data set, along with the corresponding signals, is shown in Fig. 9.

Signals with similar behaviour do, for the most part, get clustered together. However, it is clear that these data contain much more than simply transverse excitation due to collective instabilities. Indeed, other phenomena are visible here. Notably, the step-function-like signals highlighted with a blue background are caused by the AC dipole, installed in the LHC to measure the optical functions of the magnetic lattice [77]. The AC dipole induces large transverse offsets of the beam that are clearly observed and clustered correctly. In the subset of plotted signals we do observe some well-clustered instabilities, highlighted with a red background. It is also possible to observe some patterns that appear in different clusters while they should be put together, highlighted with a green background. This shows a limitation of the DTW implementation used, which does not allow for partial matches.
Other implementations can relax the end-point conditions, which could be used if deemed necessary. The remaining signals in this random subset are clustered correctly; however, the cause of their behaviour remains unknown, and further investigations must be performed to explain these patterns. It is clear from this analysis that the ObsBox's triggering system can be improved. In fact, using a data-driven model, similar to those presented, to control the triggering in a more intelligent manner would drastically reduce the number of false triggers.
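A minimal DTW implementation, and the pairwise distance matrix that would then be fed to the hierarchical clustering, can be sketched on toy signals as follows (the signals are synthetic; this basic dynamic-programming formulation has the fixed end-point condition whose limitations are discussed above):

```python
import numpy as np

def dtw(a, b):
    """Dynamic Time Warping distance between two 1-D series (full window)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match along the warping path.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 80)
bunch_a = np.sin(t)              # oscillatory "instability"-like signal
bunch_b = np.sin(t - 0.4)        # same pattern, locally shifted in time
bunch_c = np.zeros_like(t)       # quiet bunch

# Pairwise DTW distance matrix, as fed to the hierarchical clustering.
series = [bunch_a, bunch_b, bunch_c]
dist = np.array([[dtw(p, q) for q in series] for p in series])
```

Because the warping path can absorb the local time shift, the distance between the two oscillatory signals is much smaller than their distance to the quiet one, so a linkage algorithm applied to this matrix groups the first two together.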
The charged particle beams stored in a high-energy, high-intensity accelerator such as the LHC may induce heating of the surrounding equipment. The main sources of heating are: electron cloud [73], particles lost on the beam surroundings [22], synchrotron radiation [78], and beam-induced RF heating due to impedance [93], which was one of the limitations to reaching nominal performance of the machine during LHC Run 1 [75].

In principle, beam-induced heating can be directly monitored by means of temperature probes, such as the PT100 [86] devices in the LHC ring or the optical fibres in the CMS [74] detector. However, large fractions of the LHC ring are left without temperature monitoring. A temperature increase in a high-vacuum environment may lead to outgassing [47], which can be observed as a pressure increase in the vacuum gauges; this was the case, e.g. for the injection protection device (TDI) during the 2012 LHC run [75]. However, this was not an isolated case. Indeed, abnormal outgassing levels were observed in the Beam Gas Ionization (BGI) profile monitor during Run 1, and the installation of temperature probes confirmed during Run 2 that the outgassing was in fact linked with heating effects. Similarly, some vacuum modules implementing RF fingers featured high levels of pressure spikes, which turned out to be due to RF heating and could be solved by an improved design of the modules. Overall, the behaviours observed during LHC operation point clearly to the need for an efficient tool for the early detection of heating effects, to avoid serious damage to the LHC hardware.

Vacuum monitoring is much denser and more systematic than temperature monitoring, with more than 1200 vacuum gauges distributed all over the LHC circumference [49]. Nevertheless, analysing the patterns of the readings from these vacuum gauges one by one after each LHC fill, in order to detect abnormal behaviour, represents a tedious and time-consuming task.
Moreover, a robust technique to translate the vacuum readings into equivalent values of temperature is not completely trivial. Hence, automating this pattern-classification process is expected to result in a significant gain of time for the physics run and in manpower.

The applicability of ML to this task has been investigated by building an automatic classification algorithm for the pressure readings produced by the vacuum gauges, in order to detect heating patterns. Heating in a pressure-reading pattern can be observed as an anomalous pressure increase, as seen in Fig. 10. The pressure evolution of a vacuum gauge located in Sector 4-5, close to the stand-alone magnets D4 and Q5, on the Beam 2 channel, is shown as a function of time during a typical LHC cycle. The evolution of beam intensity, energy, and the average (over the bunches) bunch length is also added. The pressure reading features sudden changes that are not related to bunch length variations, thus hinting at outgassing induced by a temperature increase. It is clear that in this approach the underlying physical phenomenon triggering the temperature rise is disregarded and only its consequences are analysed.

Since the goal is to reduce the time needed to find the abnormal gauges, the classifier aims to select a subset of all the gauges producing data in which all the abnormal ones are present. Statistically speaking, this means that the classifier should reach a high recall score [46], which is defined as the fraction of true positives detected over the total number of positive cases. In order to apply SL techniques, more than 700 readings have been labelled with expert supervision, creating a data set of 700 time series with 3000 time steps each, where each time series is a vacuum gauge reading.
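The recall score used as the figure of merit can be written down explicitly; in this toy example, a hypothetical classifier flags three of the four truly abnormal gauges:

```python
import numpy as np

def recall(y_true, y_pred):
    """Fraction of true positives detected among all actual positive cases."""
    tp = np.sum((y_pred == 1) & (y_true == 1))   # abnormal, correctly flagged
    fn = np.sum((y_pred == 0) & (y_true == 1))   # abnormal, missed
    return tp / (tp + fn)

# Toy labels: 4 truly abnormal gauges (1), of which 3 are flagged.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
r = recall(y_true, y_pred)   # 3 detected out of 4 -> 0.75
```

Note that the false alarm in `y_pred` lowers the precision but not the recall, which is exactly why recall is the right metric when missing an abnormal gauge is worse than inspecting a normal one.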
Even though the data set is made of time series, the goal here is to classify them, and not to predict future values as is usually done in common time-series problems.

To reduce the dimensionality of the data set, a PCA [91] has been performed, leading to the retention of only 12 features. These 12 features do not have any physical meaning, but they explain most of the variance of the full data set. In this way, the dimensionality reduction does not lead to a significant information loss. On the resulting data set with only 12 features, first a K-Nearest Neighbour (KNN) classifier [31] and then a Multi-Layer Perceptron (MLP) [66] have been trained. For the KNN algorithm, the best value of k, the best algorithm to use between the k-dimensional Tree (k-d Tree) and the Ball Tree [65], and a boolean value that indicates whether the data set has been scaled to have zero mean and unit variance, have been tuned using Grid Search [53]. For the MLP algorithm, a sigmoid activation function (also called logistic function) has been applied at the output layer to perform the classification, while a rectified linear unit activation function, defined as the positive part of its argument, i.e. max{0, x} where x is the input value, has been used for the hidden layers. Cross-entropy has been used as the loss function, this being a classification task. Randomised Search [15] has been applied to identify the optimal number of layers and neurons, in order to explore a wide range of input values. The goal of the hyper-parameter tuning was to maximise the recall score.

To evaluate the performance of both the KNN and the MLP classifiers, a 4-fold cross-validation technique [52] has been applied while training each model. Stratified splitting [64] has been used for the 4-fold technique, in which the folds are made by preserving the percentage of samples of each class.

The results of the parameter-set scan maximising the recall for the KNN are shown in Fig. 11 (upper).
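A brute-force sketch of the KNN classifier and of a grid search over k is given below; the 12-feature data are synthetic stand-ins for the PCA-reduced gauge readings, the neighbour search is done by explicit distance computation rather than k-d or Ball trees, and only k is scanned (the real study also tunes the search algorithm and the scaling):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic 12-feature data set (stand-in for the PCA-reduced gauge readings).
X_norm = rng.standard_normal((300, 12))          # normal gauges
X_abn = rng.standard_normal((60, 12)) + 1.5      # abnormal (heating) gauges
X = np.vstack([X_norm, X_abn])
y = np.array([0] * 300 + [1] * 60)

idx = rng.permutation(len(y))
tr, va = idx[:270], idx[270:]                    # train / validation split

def knn_predict(k, X_train, y_train, X_query):
    # Brute-force distances; majority vote among the k nearest neighbours.
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    return tp / max(np.sum(y_true == 1), 1)

# Grid search over k, keeping the value with the best validation recall.
scores = {k: recall(y[va], knn_predict(k, X[tr], y[tr], X[va])) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
```

In the actual study this scan is wrapped in the stratified 4-fold cross-validation, so each candidate parameter set receives an averaged recall rather than a single-split one.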
Note that the red dots corresponding to recall = 1 in the KNN classifier are parameter sets for which the algorithm is overfitting the training data; recall = 1 on the training data is a perfect example of overfitting. The best parameter set is found to be that at index 7 in the plot, i.e. the first set of parameters that does not overfit the training set. The parameter set for the KNN contains the value k of the algorithm, the algorithm used, and a boolean value that indicates whether the data set has been scaled to have zero mean and unit variance. For the best parameter set, k = 5, the algorithm is the k-d Tree, and the data set has been scaled. Note that the KNN reaches a precision of 0.97, meaning that, of the vacuum gauges classified as abnormal, 97% are actually abnormal.

The MLP scan over the network parameters is shown in Fig. 11 (bottom). The best result is achieved for parameter-set index 0, corresponding to a network made of 2 hidden layers with 176 neurons per layer. The recall score of the neural network improves on that of the KNN, whereas its precision score is lower than the KNN one. However, the MLP is to be preferred over the KNN, since recall is the main figure of merit and the increased computational burden is not an issue, given the relatively small data set used for training.

The implementation of a very simple neural network already shows promising recall scores, which motivates testing more refined ML techniques on this task. Convolutional Neural Networks [51] and ensemble methods are currently being investigated to push the performance of the classifier further.
Figure 11: Recall scores of the KNN classifier (top) and of the MLP (bottom). The red dots at recall = 1 in the KNN classifier are parameter sets for which the algorithm is overfitting the training data.
One of the most relevant and useful concepts in the study of non-linear beam dynamics is that of Dynamic Aperture (DA), which represents the radius of the largest sphere inscribed in the connected volume in phase space in which motion is bounded over a given time interval [81]. It can be estimated from tracking simulations, where a given set of initial conditions, uniformly distributed in polar co-ordinates in normalised physical space, is probed for bounded motion. All this is repeated for a number of different realisations of the magnetic field errors (the so-called seeds) for a given accelerator model, according to the following formula

DA_ave = \frac{1}{N_{seed}} \sum_{i=1}^{N_{seed}} \int d\theta \, r_i(\theta) ,   (1)

where r_i(\theta) represents the last stable amplitude for seed i in the direction given by the angle \theta. This definition is typically used for a refined understanding of the features ruling the DA (see, e.g. [12]). However, for design studies, where conservative estimates are more appropriate, the DA is evaluated as

DA_min = \min_{i,j} r_{i,j} , \quad 1 \le i \le N_{seed} , \; 1 \le j \le N_{angle} ,   (2)

where r_{i,j} represents the last stable amplitude for the i-th seed and j-th angle. Such a definition might be strongly affected by outliers, which is the reason for our attempts to provide automatic tools to deal with outlier recognition. An example of DA plots is given in Fig. 16, where the results of DA computations for two LHC configurations are shown for sixty seeds, eleven angles, and a given number of simulated turns. The left plot refers to the optics version 1.3 for the High-Luminosity LHC at top energy, with β∗ = 15 cm, Q′ = 15, and strong powering of the Landau octupoles, but without beam-beam effects.
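On a table of last stable amplitudes r_{i,j}, the two estimates of Eqs. (1) and (2) reduce to a few lines of numpy. The amplitudes below are synthetic, and the normalisation of the angular integral by the scanned range is an assumption made here so that DA_ave stays in amplitude units:

```python
# Sketch of Eqs. (1) and (2) on a synthetic table of last stable
# amplitudes r[i, j] for seed i and angle theta_j.
import numpy as np

n_seed, n_angle = 60, 11
theta = np.linspace(0.0, np.pi / 2, n_angle)  # polar scan in (x, y)
rng = np.random.default_rng(2)
r = 10.0 + rng.normal(scale=0.5, size=(n_seed, n_angle))

# Eq. (1): for uniformly spaced angles, the angular integral of
# r_i(theta), normalised by the scanned range, reduces to the average
# over the angles; DA_ave then averages over the seeds.
da_ave = r.mean(axis=1).mean()

# Eq. (2): conservative estimate, sensitive to outliers in r[i, j].
da_min = r.min()
print(da_ave, da_min)
```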
The right plot refers to the optics configuration of the LHC during the 2016 proton run at injection, with Q′ = 8 and strong Landau octupoles to fight electron-cloud effects.

It is not uncommon that, for a given angle, the stable amplitude differs considerably from seed to seed, thus generating a distribution of stable amplitudes over seeds with outliers, which can strongly affect DA_min. Outliers may arise because the distribution of non-linear magnetic errors excites particular resonances in a way that is highly seed-dependent. It is clear that outliers possibly represent unlikely configurations that might be removed from the analysis of the numerical data for the computation of DA_min. ML techniques have been used in large-scale DA simulations to flag certain results as outliers, which can then be dealt with accordingly.

One has to make sure to distinguish a set of outliers from a justifiable split of a set of points into a certain number of clusters. For this reason, the outlier detection is done in several steps. First, for each angle, the r_{i,j} values for that angle and for the different seeds are rescaled between the minimum and maximum values. We then investigated two types of ML approaches to automatically detect outliers. In the SL approach, we treat the goal of outlier detection as a classification problem and train an SVM to distinguish between normal and abnormal points. Following a hyper-parameter search, we identified the Radial Basis Function (RBF) kernel [87] with a penalty factor C of unity as the best hyper-parameters for the SVM model.

It is useful to observe the performance of the model as a function of the number of training points. This is known as a learning curve, and is shown in Fig. 12.
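The learning-curve protocol can be sketched as follows: an RBF-kernel SVM with C = 1, as identified by the hyper-parameter search, is trained on all anomalous points plus a growing number of normal points, and TP/FP/FN/TN are counted on a fixed 25% test split. scikit-learn and the two-blob data are assumptions for illustration:

```python
# Sketch of the SVM learning-curve study: RBF kernel with C = 1, as in
# the hyper-parameter search described in the text; data are synthetic.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X_norm = 0.8 + rng.normal(scale=0.05, size=(400, 2))  # normal points
X_anom = 0.2 + rng.normal(scale=0.05, size=(40, 2))   # anomalous points
X = np.vstack([X_norm, X_anom])
y = np.r_[np.zeros(len(X_norm)), np.ones(len(X_anom))]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

anom, norm = X_tr[y_tr == 1], X_tr[y_tr == 0]
for n_normal in (30, 100, len(norm)):  # increasingly skewed training sets
    X_i = np.vstack([anom, norm[:n_normal]])
    y_i = np.r_[np.ones(len(anom)), np.zeros(n_normal)]
    clf = SVC(kernel="rbf", C=1.0).fit(X_i, y_i)
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
    print(n_normal, tp, fp, fn, tn)
```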
Each point in the curves represents the number of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) obtained on a test data set whose size corresponds to 25% of the overall data set available, when the model is trained on all anomalous points plus an increasing number of normal points. A TP corresponds to a ground-truth anomalous point that was correctly predicted to be anomalous. The results show that, when the training data set is close to being balanced between anomalous and normal points, the number of TP is quite high, while the FP and FN are low. However, as the data set becomes more and more skewed towards normal points, the model achieves a lower performance. This is understandable, given the assumption of balance in the SVM algorithm.

We also investigated two UL approaches for detecting anomalies on an angle-by-angle basis. The first algorithm is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method [34]. DBSCAN is a density-based, non-parametric clustering algorithm: it groups points that are closely packed together. The points that are not assigned to any cluster after applying the algorithm are automatically considered to be outliers. The second approach is the Local Outlier Factor (LOF) algorithm [21]. LOF quantifies the outlier strength of each point based on a concept of local density, where locality is given by the k nearest neighbours, whose distance is used to estimate the density. Comparing the local density of an object to the local densities of its neighbours allows regions of similar density to be identified.
The points that have a substantially lower density than their neighbours are considered to be outliers.

Figure 12: Learning curves for the SVM training, showing the TP, FP and FN (top) and the TN (bottom).

Following hyper-parameter optimisation, the hyper-parameters determined for each method are:

• DBSCAN: eps = 1 (the maximum distance between two samples for one to be considered as in the neighbourhood of the other); min_samples = 3 (the number of samples, or total weight, in a neighbourhood for a point to be considered as a core point, including the point itself);

• LOF: n_neighbors = 58 (the number of neighbours used to measure the local deviation of density of a given sample with respect to the same neighbours); contamination = 0.001 (the expected proportion of outliers in the data set).

A comparison between the SVM, DBSCAN, and LOF algorithms is shown in Fig. 13. The labels predicted by the DBSCAN and LOF algorithms were also combined through a binary OR operation to produce a fourth set of labels. A further, fifth set of labels is created by removing false positives using a statistical method (following an initial labelling by DBSCAN) to determine whether this would add to the robustness of the prediction. For a point flagged by DBSCAN as an outlier to be considered a true outlier, we demand that it satisfies three additional criteria: the distance from the mean should be at least 3 standard deviations (where mean and standard deviation are calculated over the regular points only); the distance to the nearest regular point should be at least . , in absolute units; and the distance to the nearest regular point should be at least 34% of the total spread of the regular points.
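For a single angle, the unsupervised search and the binary OR combination can be sketched as follows. The amplitudes are synthetic and, since they are min-max rescaled to [0, 1] here, a smaller eps and a larger contamination than the production values quoted above are used; these are adaptations to the toy data, not the paper's settings:

```python
# Sketch of the per-angle DBSCAN + LOF outlier search with a binary OR
# of the two label sets; data synthetic, hyper-parameters adapted to it.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(4)
r = 10.0 + 0.1 * rng.normal(size=60)  # last stable amplitudes, 60 seeds
r[7] = 6.0                            # planted outlier at seed 7
x = ((r - r.min()) / (r.max() - r.min())).reshape(-1, 1)  # min-max rescale

db_labels = DBSCAN(eps=0.1, min_samples=3).fit_predict(x)
out_db = db_labels == -1              # points in no cluster are outliers
lof = LocalOutlierFactor(n_neighbors=58, contamination=0.05)
out_lof = lof.fit_predict(x) == -1    # -1 marks LOF outliers
out_or = out_db | out_lof             # binary OR of the two label sets
print(np.flatnonzero(out_or))
```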
This post-processing is performed in an iterative manner, starting at the minimum (maximum) point and working outwards (inwards), recalculating the statistical variables of the regular points at every step. The values of these thresholds are chosen empirically so as to help catch FP points that arise from strongly-packed clusters.

The results show that the unsupervised methods perform an order of magnitude better than the SVM in terms of false positives; however, they are worse in terms of false negatives, especially when using LOF. The post-processing following DBSCAN clearly contributes to reducing the number of false positives, while maintaining the TP and FN rates.

It is also very useful to investigate the dependence of the number of outliers on the angular distribution and on the seed number. A comparison between the ground-truth and predicted anomalies (using post-processing following DBSCAN) is shown in Fig. 14, where it can be seen that the anomaly profiles for both seeds and angles are similar between the
ground-truth and the predictions. It is worth noting the peculiar profile of the outliers as a function of the seed, featuring three clusters with a very large number of outliers. As far as the distribution of anomalies as a function of the angle is concerned, there is a tendency towards a larger number of outliers for angles close to ◦. As a last observation, the number of outliers at low amplitude equals that at large amplitude. A combined visualisation for seeds and angles is shown in Fig. 15.

Figure 13: Results from anomaly detection using SVM, DBSCAN, LOF, a binary OR between DBSCAN and LOF, and post-processing following DBSCAN. TP = True Positives (anomaly correctly detected), TN = True Negatives (normal point correctly detected), FP = False Positives, FN = False Negatives.

Figure 14: Visualizations of the anomalies by seed (top) and by angle (bottom), showing the similarity between the ground-truth and the result of the post-processing following DBSCAN.
Figure 15: Visualizations of the anomalies by seed and angle for the ground-truth (top) and the result of the post-processing following DBSCAN (bottom).

This analysis potentially provides insight into the sensitivity of the underlying physics; investigations on this matter are still on-going. Two examples of the classification obtained by means of the post-processing following DBSCAN are shown in Fig. 16.
Figure 16: DA simulations for two LHC configurations. The initial 4D co-ordinates are of the type (x, 0, y, 0) and a polar scan is performed in (x, y). The various markers represent the results of the sixty seeds used. Left: example of a DA computation where an outlier is correctly flagged (in green). Right: examples of false positives (in red). The FP cases are less worrisome, as they refer to the determination of the maximum stable amplitude, which does not affect DA_min.

It should be noted that the dynamics governing the DA can be very different as a function of the angle θ; hence, even when the neighbouring points are similar in amplitude, the spotted outlier might be a genuine one. It is clear that
Conclusions

In this paper, a selection of ML applications, based on algorithms of either the Supervised Learning or the Unsupervised Learning type, for a variety of domains linked with beam-dynamics aspects at the LHC, has been presented and discussed in detail.

All this started from the quest to improve optics measurements and corrections, in particular by detecting faulty BPMs in the harmonic analysis of turn-by-turn measurements, to avoid the appearance of outliers in the computed optics functions. Data cleaning has also been successfully achieved by means of anomaly-detection and clustering techniques. As far as optics corrections are concerned, already the basic Neural Network implementation produces very interesting and promising results. Note that a larger data set is being generated, and more error sources and non-linearities are being added, in order to create a more general model. Further improvements are foreseen by using an autoencoder network to improve the quality of betatron-phase measurements, which is the fundamental part of optics and correction computations. Furthermore, autoencoders could be used to perform denoising of turn-by-turn data.

Excellent results have been obtained with the automated alignment of collimators, in which ML has been used to distinguish genuine beam-loss spikes from spurious events during the alignment process of the collimators’ jaws. This approach has achieved a remarkable speed-up of the setup time of the collimation system, with an overall beneficial impact on LHC operation, as the ML implementation has become the operational one. A continued analysis of the cross-talk effects between the losses of the two beams will, in the future, make it possible to perform more alignments in parallel, and is being actively pursued.

The ML model of the LHC beam lifetimes showed promising results once the parameter space was uncorrelated by a dedicated machine experiment in the LHC.
While it manages to represent operational machine setups well, at present it lacks predictive power when considering non-standard configurations. Future work includes exploring alternative ML approaches, such as treating the data as time series, evaluating the model performance across several years of LHC operation, and supporting the experimental data with a surrogate model based on detailed beam-loss simulations from particle tracking codes.

Unsupervised Learning has been employed to detect and classify beam instabilities, as well as other anomalies, in the beam position data acquired by the LHC ObsBox system through an automatic triggering system. Promising results have been obtained by combining anomaly-detection techniques with a Hierarchical Clustering algorithm. Dynamic Time Warping was found to provide a robust distance metric to group similar-looking time series. The results demonstrate the power of these techniques when dealing with drastically unbalanced data sets, i.e. where only a small fraction of the data actually represents beam instabilities. To further improve the analysis, potentially important changes are already envisioned, such as using autoencoders in place of the feature-extraction, Principal Component Analysis, and Isolation Forest steps. Additionally, it will be explored whether the ML approach can help improve the automatic ObsBox triggering, with the goal of reducing the number of false triggers in the first place.

The classification of vacuum-gauge measurements to identify possible heating issues during LHC operation has also been considered as a candidate for ML applications. Promising results have been obtained, and a Multi-Layer Perceptron has been shown to perform better in terms of recall score. Currently, more ML and deep-learning approaches are under investigation to push the performance of the classification algorithms further.
The ultimate goal is to develop an application to be deployed in routine operation during the LHC Run 3.

While the majority of the ML applications presented here are connected with beam measurements, the refined analysis of numerical simulations for DA computation has also been considered as a suitable candidate for ML techniques. Indeed, the detection of outliers in the DA values, carried out for sixty realisations of the magnetic errors of the LHC model, has proven to be effectively tackled by Unsupervised Learning. A certain improvement in outlier detection has been achieved by using voting between two different clustering algorithms, but the best outcome has been achieved with a careful post-processing of the results obtained with DBSCAN. Supervised Learning has been attempted too, but, even though it has the lowest number of false negatives, it creates ten times as many false positives, severely hampering its usability. Thanks to these analyses, it has been possible to study the distribution of outliers over the various realisations of the magnetic field errors, as well as over the angle in the x–y space. These results need further physical analyses to get more insight into the observed features, but they represent a very useful new tool.

In the future, the time evolution of the DA will be considered in conjunction with ML techniques. It is known from theory that such an evolution follows well-defined scaling laws [44, 43, 85, 12]. These laws can be used to extrapolate a CPU-intensive simulation, performed over a relatively short number of turns, to realistic timescales in an inexpensive way.
Acknowledgments
We would like to thank the LHC team of the Operations Group for the support during the experimental sessions. EPFL studies are supported by the Swiss Accelerator Research and Technology institute (CHART).
References

[1] LHC transverse instability table 2018. Available at http://lhcinstability.web.cern.ch/lhcinstability/csv-to-html-table/2018.html (17/04/2020).

[2] M. Aiba, S. Fartoukh, A. Franchi, M. Giovannozzi, V. Kain, M. Lamont, R. Tomás, G. Vanbavinckhove, J. Wenninger, F. Zimmermann, R. Calaga, and A. Morita. First β-beating measurement and optics analysis for the CERN Large Hadron Collider. Phys. Rev. ST Accel. Beams, 12:081002, 2009.

[3] Ralph W. Aßmann, Eva Barbara Holzer, Jean-Bernard Jeanneret, Verena Kain, Stefano Redaelli, Guillaume Robert-Demolaize, and Jörg Wenninger. Expected performance and beam-based optimization of the LHC collimation system. In Proc. 9th European Particle Accelerator Conf. (EPAC’04), pages 1825–1827, Lucerne, Switzerland, 2004.

[4] G. Azzopardi, G. Valentino, A. Muscat, B. Salvachua, A. Mereghetti, and S. Redaelli. Automatic angular alignment of LHC collimators. In Proc. 16th Int. Conf. on Accelerator and Large Experimental Physics Control Systems (ICALEPCS’17), Barcelona, Spain, 2017.

[5] G. Azzopardi, A. Muscat, S. Redaelli, B. Salvachua, and G. Valentino. Operational results of LHC collimator alignment using machine learning. In Proc. 10th Int. Particle Accelerator Conf. (IPAC’19), pages 1208–1211, Melbourne, Australia, 2019. doi: 10.18429/JACoW-IPAC2019-TUZZPLM1.

[6] Gabriella Azzopardi, Stefano Redaelli, Belen Salvachua, Adrian Muscat, and Gianluca Valentino. Software Architecture for Automatic LHC Collimator Alignment using Machine Learning. In Proc. 17th Int. Conf. on Accelerator and Large Experimental Physics Control Systems (ICALEPCS’19), New York, USA, 2019.

[7] Gabriella Azzopardi, Stefano Redaelli, Belen Salvachua, Adrian Muscat, and Gianluca Valentino. Automatic Beam Loss Threshold Selection for LHC Collimator Alignment. In Proc. 17th Int. Conf. on Accelerator and Large Experimental Physics Control Systems (ICALEPCS’19), New York, USA, 2019.

[8] Gabriella Azzopardi, Belen Salvachua, and Gianluca Valentino. Data-driven cross-talk modeling of beam losses in LHC collimators. Phys. Rev. Accel. Beams, 22(8):083002, 2019. doi: 10.1103/PhysRevAccelBeams.22.083002.

[9] Gabriella Azzopardi, Belen Salvachua, Gianluca Valentino, Stefano Redaelli, and Adrian Muscat. Operational results on the fully automatic LHC collimator alignment. Phys. Rev. Accel. Beams, 22(9):093001, 2019. doi: 10.1103/PhysRevAccelBeams.22.093001.

[10] Gabriella Azzopardi, Gianluca Valentino, Adrian Muscat, and Belen Salvachua. Automatic spike detection in beam loss signals for LHC collimator alignment. Nucl. Instrum. Methods Phys. Res. A, 934:10–18, 2019. doi: 10.1016/j.nima.2019.04.057.

[11] Gustavo Batista, Xiaoyue Wang, and Eamonn Keogh. A complexity-invariant distance measure for time series. In Proceedings of the 11th SIAM International Conference on Data Mining (SDM 2011), pages 699–710, April 2011. doi: 10.1137/1.9781611972818.60.

[12] A. Bazzani, M. Giovannozzi, E. H. Maclean, C. E. Montanari, F. F. Van der Veken, and W. Van Goethem. Advances on the modeling of the time evolution of dynamic aperture of hadron circular accelerators. Phys. Rev. Accel. Beams, 22:104003, 2019. doi: 10.1103/PhysRevAccelBeams.22.104003. URL https://link.aps.org/doi/10.1103/PhysRevAccelBeams.22.104003.

[13] R. Bellman and R. Kalaba. On adaptive control processes. IRE Transactions on Automatic Control, 4(2):1–9, November 1959. ISSN 1558-3651. doi: 10.1109/TAC.1959.1104847.

[14] Michael Benedikt, Mar Capeans Garrido, Francesco Cerutti, Brennan Goddard, Johannes Gutleber, Jose Miguel Jimenez, Michelangelo Mangano, Volker Mertens, John Andrew Osborne, Thomas Otto, John Poole, Werner Riegler, Daniel Schulte, Laurent Jean Tavian, Davide Tommasini, and Frank Zimmermann. FCC-hh: The Hadron Collider.
The European Physical Journal Special Topics, 228(4):755–1107, Dec 2019. doi: 10.1140/epjst/e2019-900087-0.

[15] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.

[16] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 0387310738.

[17] Giuseppe Bonaccorso. Hands-on unsupervised learning with Python: implement machine learning and deep learning models using Scikit-Learn, TensorFlow, and more. Packt Publishing, Birmingham, 2019.

[18] Rosalin Bonetta and Gianluca Valentino. Machine learning techniques for protein function prediction. Proteins: Structure, Function, and Bioinformatics, 88(3):397–413, 2020. doi: 10.1002/prot.25832.

[19] E. Bozoki and A. Friedman. Neural networks and orbit control in accelerators. In Proc. 4th European Particle Accelerator Conf. (EPAC’94), pages 1589–1592, London, England, 1994.

[20] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, 2001. doi: 10.1023/A:1010933404324.

[21] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying Density-based Local Outliers. In Proc. of the 2000 ACM SIGMOD International Conference on Management of Data, pages 93–104, Dallas, USA, 2000. doi: 10.1145/335191.335388.

[22] R. Bruce, R. W. Aßmann, V. Boccone, C. Bracco, M. Brugger, M. Cauchi, F. Cerutti, D. Deboy, A. Ferrari, L. Lari, A. Marsili, A. Mereghetti, D. Mirarchi, E. Quaranta, S. Redaelli, G. Robert-Demolaize, A. Rossi, B. Salvachua, E. Skordis, C. Tambasco, G. Valentino, T. Weiler, V. Vlachoudis, and D. Wollmann. Simulations and measurements of beam loss patterns at the CERN Large Hadron Collider. Physical Review Special Topics – Accelerators and Beams, 17(8):081004, 2014.

[23] R. Bruce, R. W. Aßmann, and S. Redaelli. Calculations of safe collimator settings and β∗ at the CERN Large Hadron Collider. Phys. Rev. ST Accel. Beams, 18:061001, Jun 2015. doi: 10.1103/PhysRevSTAB.18.061001. URL http://link.aps.org/doi/10.1103/PhysRevSTAB.18.061001.

[24] R. Bruce, C. Bracco, R. De Maria, M. Giovannozzi, A. Mereghetti, D. Mirarchi, S. Redaelli, E. Quaranta, and B. Salvachua. Reaching record-low β∗ at the CERN Large Hadron Collider using a novel scheme of collimator settings and optics. Nucl. Instrum. Methods Phys. Res. A, 848:19–30, Jan 2017. doi: 10.1016/j.nima.2016.12.039.

[25] Oliver Sim Brüning, Paul Collier, P. Lebrun, Stephen Myers, Ranko Ostojic, John Poole, and Paul Proudlock. LHC Design Report. CERN Yellow Reports: Monographs. CERN, Geneva, 2004. doi: 10.5170/CERN-2004-003-V-1. URL http://cds.cern.ch/record/782076.

[26] Lee Carver, Xavier Buffat, Andrew Butterworth, Wolfgang Höfle, Giovanni Iadarola, Gerd Kotzian, Kevin Li, Elias Métral, Miguel Ojeda Sandonís, Martin Söderén, and Daniel Valuch. Usage of the Transverse Damper Observation Box for High Sampling Rate Transverse Position Data in the LHC. Technical Report ACC-2017-117, CERN, Switzerland, 2017. URL https://cds.cern.ch/record/2289712.

[27] S. K. Chalup, C. L. Murch, and M. J. Quinlan. Machine learning with AIBO robots in the four-legged league of RoboCup. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37(3):297–310, May 2007. ISSN 1558-2442. doi: 10.1109/TSMCC.2006.886964.

[28] Loic Coyle. Machine learning applications for hadron colliders: LHC lifetime optimization. Master’s thesis, Grenoble INP, France and EPFL, Switzerland, 2018. URL https://cds.cern.ch/record/2719933.

[29] Loic Thomas Davies Coyle, Tatiana Pieloni, Lenny Rivkin, and Belen Maria Salvachua Ferrando. MD 4510: Working point exploration for use in lifetime optimization by machine learning. Dec 2019. URL https://cds.cern.ch/record/2705860.

[30] A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi. Credit card fraud detection: A realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8):3784–3797, Aug 2018. ISSN 2162-2388. doi: 10.1109/TNNLS.2017.2736643.

[31] Sahibsingh A. Dudani. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, (4):325–327, 1976.

[32] A. Edelen, C. Mayes, D. Bowring, D. Ratner, A. Adelmann, R. Ischebeck, J. Snuverink, I. Agapov, R. Kammering, J. Edelen, I. Bazarov, G. Valentino, and J. Wenninger. Opportunities in Machine Learning for Particle Accelerators. 2018.
[33] Phys. Rev. Accel. Beams, 21:112802, 2018.

[34] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD’96), pages 226–231, Portland (OR), USA, 1996.

[35] S. Fartoukh. Achromatic telescopic squeezing scheme and application to the LHC and its luminosity upgrade. Phys. Rev. ST Accel. Beams, 16:111002, 2013.

[36] Elena Fol. Evaluation of machine learning methods for LHC optics measurements and corrections software. Master’s thesis, University of Applied Sciences, Karlsruhe, 2017.

[37] Elena Fol and Rogelio Tomás. Isolation forest algorithm for faulty beam position monitors detection. To be published.

[38] Elena Fol, Rogelio Tomás, and Giuliano Franchetti. Regression models for magnet errors prediction in the LHC. To be published.

[39] Elena Fol, Felix Simon Carlier, Ana Coello de Portugal, Jaime Maria Garcia-Tabares, and Rogelio Tomás. Machine Learning Methods for Optics Measurements and Corrections at LHC. In Proc. 9th Int. Particle Accelerator Conf. (IPAC’18), pages 1967–1970, Vancouver, Canada, 2018. doi: 10.18429/JACoW-IPAC2018-WEPAF062.

[40] Elena Fol, Jaime Maria Coello de Portugal, Giuliano Franchetti, and Rogelio Tomás. Application of Machine Learning to Beam Diagnostics. In Proc. 39th Int. Free Electron Laser Conf. (FEL’19), Hamburg, Germany, 2019.

[41] Elena Fol, Jaime Maria Coello de Portugal, Giuliano Franchetti, and Rogelio Tomás. Optics corrections using machine learning in the LHC. In Proc. 10th Int. Particle Accelerator Conf. (IPAC’19), pages 3990–3993, Melbourne, Australia, 2019. doi: 10.18429/JACoW-IPAC2019-THPRB077.

[42] Elena Fol, Jaime Maria Coello de Portugal, and Rogelio Tomás. Unsupervised Machine Learning for Detection of Faulty Beam Position Monitors. In Proc. 10th Int. Particle Accelerator Conf. (IPAC’19), pages 2668–2671, Melbourne, Australia, 2019. doi: 10.18429/JACoW-IPAC2019-WEPGW081.

[43] Massimo Giovannozzi and Frederik F. Van der Veken. Description of the luminosity evolution for the CERN LHC including dynamic aperture effects. Part II: application to Run 1 data. Nucl. Instrum. Methods Phys. Res. A, 908:1–9, 2018. doi: 10.1016/j.nima.2018.08.019.

[44] Massimo Giovannozzi and Frederik F. Van der Veken. Description of the luminosity evolution for the CERN LHC including dynamic aperture effects. Part I: the model. Nucl. Instrum. Methods Phys. Res. A, 905:171–179, 2018. doi: 10.1016/j.nima.2019.01.072. [Erratum: Nucl. Instrum. Methods Phys. Res. A, 927:471 (2019)].

[45] R. Giusti and G. E. A. P. A. Batista. An empirical comparison of dissimilarity measures for time series classification. In , pages 82–88, 2013.

[46] Cyril Goutte and Eric Gaussier. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In European Conference on Information Retrieval, pages 345–359. Springer, 2005.

[47] Oswald Gröbner. Dynamic outgassing. Technical report, CERN, 1999.

[48] S. Hussein, P. Kandel, C. W. Bolan, M. B. Wallace, and U. Bagci. Lung and pancreatic tumor characterization in the deep learning era: Novel supervised and unsupervised learning approaches. IEEE Transactions on Medical Imaging, 38(8):1777–1787, Aug 2019. ISSN 1558-254X. doi: 10.1109/TMI.2019.2894349.

[49] J. M. Jimenez. LHC: The world’s largest vacuum systems being operated at CERN. Vacuum, 84(1):2–7, 2009.

[50] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 3146–3154. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf.

[51] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[52] Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross validation, and active learning. In Advances in Neural Information Processing Systems, pages 231–238, 1995.

[53] Steven M. LaValle, Michael S. Branicky, and Stephen R. Lindemann. On the relationship between classical grid search and probabilistic roadmaps. The International Journal of Robotics Research, 23(7-8):673–692, 2004.

[54] Yann LeCun and Yoshua Bengio. Convolutional networks for images, speech, and time-series. MIT Press, 1995.
Nature , 521(7553):436–444, 2015. doi:10.1038/nature14539. URL https://doi.org/10.1038/nature14539 .[56] S. C. Leemann, S. Liu, A. Hexemer, M. A. Marcus, C. N. Melton, H. Nishimura, and C. Sun. Demonstration ofMachine Learning-Based Model-Independent Stabilization of Source Properties in Synchrotron Light Sources.
Phys. Rev. Lett. , 123:194801, 2019.[57] Yongjun Li, Weixing Cheng, Li Hua Yu, and Robert Rainer. Genetic algorithm enhanced by machine learning indynamic aperture optimization.
Phys. Rev. Accel. Beams , 21:054601, 2018.[58] D. Lien Minh, A. Sadeghi-Niaraki, H. D. Huy, K. Min, and H. Moon. Deep learning approach for short-termstock trends prediction based on two-stream gated recurrent unit network.
IEEE Access, 6:55392–55404, 2018. ISSN 2169-3536. doi: 10.1109/ACCESS.2018.2868970.
[59] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In Proc. 8th IEEE Int. Conf. on Data Mining (ICDM'08), pages 413–422. IEEE Computer Society, 2008. doi: 10.1109/ICDM.2008.17.
[60] E. Meier, Y. E. Tan, and G. LeBlanc. Orbit Correction Studies using Neural Networks. In Proc. 3rd Int. Particle Accelerator Conf. (IPAC'12), pages 2837–2839, New Orleans, USA, 2012.
[61] Tom M. Mitchell. Machine Learning. McGraw-Hill, New York, 1997. ISBN 978-0-07-042807-2.
[62] Daniel Müllner. Modern hierarchical, agglomerative clustering algorithms. arXiv preprint arXiv:1109.2378, 2011.
[63] J. A. Nelder and R. Mead. A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313, 1965. ISSN 0010-4620. doi: 10.1093/comjnl/7.4.308.
[64] Jerzy Neyman. On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. In Breakthroughs in Statistics, pages 123–150. Springer, 1992.
[65] M. Otair et al. Approximate k-nearest neighbour based spatial clustering using kd tree. arXiv preprint arXiv:1303.1951, 2013.
[66] Sankar K. Pal and Sushmita Mitra. Multilayer perceptron, fuzzy sets, classification. IEEE Transactions on Neural Networks, 3:683–697, 1992. doi: 10.1109/72.159058.
[67] T. Persson, F. Carlier, J. Coello de Portugal, A. Garcia-Tabares Valdivieso, A. Langner, E. H. Maclean, L. Malina, P. Skowronski, B. Salvant, R. Tomás, and A. C. García Bonilla. LHC optics commissioning: A journey towards 1% optics control. Phys. Rev. Accel. Beams, 20:061002, 2017.
[68] R. Tomás, O. Brüning, M. Giovannozzi, P. Hagen, M. Lamont, F. Schmidt, G. Vanbavinckhove, M. Aiba, R. Calaga, and R. Miyamoto. CERN Large Hadron Collider optics model, measurements, and corrections. Phys. Rev. ST Accel. Beams, 13:121004, 2010.
[69] Stanley J. Reeves and Zhao Zhe. Sequential algorithms for observation selection. IEEE Transactions on Signal Processing, 47(1):123–132, 1999.
[70] Ryan Rifkin and Ross Lippert. Notes on regularized least-squares. Technical report, Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory, 2007.
[71] H. Robbins and T. L. Lai. Strong consistency of least-squares estimates in regression models. Journal of Multivariate Analysis, 23:77–92, 1987. doi: 10.1016/0047-259X(87)90179-5.
[72] R. Rubinstein, M. Zibulevsky, and M. Elad. Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit. Technical report, CS Technion report CS-2008-08, 2008.
[73] Giovanni Rumolo, A. Z. Ghalam, T. Katsouleas, C. K. Huang, V. K. Decyk, C. Ren, W. B. Mori, F. Zimmermann, and F. Ruggiero. Electron cloud effects on beam evolution in a circular accelerator. Phys. Rev. ST Accel. Beams, 6:081002, 2003.
[74] Andrea Saccomanno, Armando Laudati, Zoltan Szillasi, Noemi Beni, Antonello Cutolo, Andrea Irace, Michele Giordano, Salvatore Buontempo, Andrea Cusano, and Giovanni Breglio. Long-term temperature monitoring in CMS using fiber optic sensors. IEEE Sensors Journal, 12(12):3392–3398, 2012.
[75] B. Salvant, O. Aberle, G. Arduini, R. Aßmann, V. Baglin, M. Barnes, W. Bartmann, P. Baudrenghien, O. Berrig, C. Bracco, E. Bravin, G. Bregliozzi, R. Bruce, A. Bertarelli, F. Carra, G. Cattenoz, F. Caspers, S. Claudet, H. Day, M. Garlasche, L. Gentini, B. Goddard, A. Grudiev, B. Henrist, R. Jones, O. Kononenko, G. Lanza, L. Lari, T. Mastoridis, V. Mertens, E. Métral, N. Mounet, J. Muller, A. Nosych, J. Nougaret, S. Persichelli, A. Piguiet, S. Redaelli, F. Roncarolo, G. Rumolo, B. Salvachua, M. Sapinski, R. Schmidt, E. Shaposhnikova, L. Tavian, M. Timmins, J. Uythoven, A. Vidal, J. Wenninger, D. Wollmann, M. Zerlauth, P. Fassnacht, S. Jakobsen, and M. Deile. Update on beam induced RF heating in the LHC. In Proc. 4th Int. Particle Accelerator Conf. (IPAC'13), pages 1646–1648, Shanghai, China, 2013.
[76] P. Senin. Dynamic time warping algorithm review, 2008. URL http://seninp.github.io/assets/pubs/senin_dtw_litreview_2008.pdf.
[77] J. Serrano and M. Cattin. The LHC AC Dipole system: an introduction. Technical Report BE-Note-2010-014, CERN, Switzerland, May 2010. URL https://cds.cern.ch/record/1263248.
[78] Arsenij Aleksandrovič Sokolov and Igor Michajlovič Ternov. Synchrotron radiation.
Akademia Nauk SSSR, Moskovskoie Obshchestvo Ispytatelei prirody, Sektsia Fiziki. Nauka, Moscow, 1966 (Russian title: Sinkhrotronnoie izluchenie), 228 pp.
[79] J. Su, J. Wu, P. Cheng, and J. Chen. Autonomous vehicle control through the dynamics and controller learning. IEEE Transactions on Vehicular Technology, 67(7):5650–5657, July 2018. ISSN 1939-9359. doi: 10.1109/TVT.2018.2819806.
[80] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018.
[81] E. Todesco and M. Giovannozzi. Dynamic aperture estimates and phase-space distortions in nonlinear betatron motion. Phys. Rev. E, 53:4067–4076, 1996.
[82] R. Tomás, T. Bach, R. Calaga, A. Langner, Y. I. Levinsen, E. H. Maclean, T. H. B. Persson, P. K. Skowronski, M. Strzelczyk, G. Vanbavinckhove, and R. Miyamoto. Record low β-beating in the LHC. Phys. Rev. ST Accel. Beams, 15:091001, 2012.
[83] John W. Tukey. Comparing individual means in the analysis of variance. Biometrics, 5(2):99–114, 1949.
[84] Gianluca Valentino, Ralph Aßmann, Roderik Bruce, Stefano Redaelli, Adriana Rossi, Nicholas Sammut, and Daniel Wollmann. Semiautomatic beam-based LHC collimator alignment. Phys. Rev. ST Accel. Beams, 15:051002, 2012.
[85] Frederik F. Van der Veken and Massimo Giovannozzi. Scaling Laws for the Time Dependence of Luminosity in Hadron Circular Accelerators based on Simple Models of Dynamic Aperture Evolution. In Proc. 61st ICFA Advanced Beam Dynamics Workshop on High-Intensity and High-Brightness Hadron Beams (HB2018), pages 260–265, Daejeon, Korea, 2018. doi: 10.18429/JACoW-HB2018-WEP2PO002.
[86] L. Vega, A. Abánades, M. J. Barnes, V. Vlachodimitropoulos, and W. Weterings. Thermal analysis of the LHC injection kicker magnets. In Journal of Physics: Conference Series, volume 874, page 012100. IOP Publishing, 2017.
[87] J.-P. Vert, K. Tsuda, and B. Schölkopf. A Primer on Kernel Methods, pages 35–70. MIT Press, Cambridge, MA, USA, 2004.
[88] D. Vilsmeier, M. Sapinski, and R. Singh. Space-charge distortion of transverse profiles measured by electron-based ionization profile monitors and correction methods. Phys. Rev. Accel. Beams, 22:052801, 2019.
[89] Jinyu Wan, Paul Chu, Yi Jiao, and Yongjun Li. Improvement of machine learning enhanced genetic algorithm for nonlinear beam dynamics optimization. Nucl. Instrum. Methods Phys. Res. A, 946:162683, 2019.
[90] J. Wenninger. LHC Status and Performance. In Proc. Prospects for Charged Higgs Discovery at Colliders (CHARGED2018), Uppsala, Sweden, 2018. URL https://pos.sissa.it/339/001/pdf.
[91] Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3):37–52, 1987.
[92] Xingyi Xu, Yimei Zhou, and Yongbin Leng. Machine learning based image processing technology application in bunch longitudinal phase information extraction. Phys. Rev. Accel. Beams, 23:032805, 2020.
[93] Carlo Zannini, G. Rumolo, and G. Iadarola. Power loss calculation in separated and common beam chambers of the LHC. In Proc. 5th Int. Particle Accelerator Conf. (IPAC'14)