A Novel Approach for Classification and Forecasting of Time Series in Particle Accelerators
Sichen Li, Mélissa Zacharias, Jochem Snuverink, Jaime Coello de Portugal, Fernando Perez-Cruz, Davide Reggiani, Andreas Adelmann
Paul Scherrer Institut, 5232 Villigen PSI, Switzerland
Swiss Data Science Center, ETH Zürich and EPFL, Universitätstrasse 25, 8092 Zürich, Switzerland
* Correspondence: [email protected]
† These authors contributed equally to this work.
Preprint submitted: 01.02.2021
Abstract:
The beam interruptions (interlocks) of particle accelerators, despite being necessary safety measures, lead to abrupt operational changes and a substantial loss of beam time. A novel time series classification approach is applied to decrease beam time loss in the High Intensity Proton Accelerator complex by forecasting interlock events. The forecasting is performed through binary classification of windows of multivariate time series. The time series are transformed into Recurrence Plots, which are then classified by a Convolutional Neural Network; this not only captures the inner structure of the time series but also exploits the advances of image classification techniques. Our best performing interlock-to-stable classifier reaches an Area under the ROC Curve (AUC) value of 0.71.

Keywords: time series classification; recurrence plot; convolutional neural network; random forest; charged particle accelerator
1. Introduction
Recent years have seen a boost in the development of machine learning (ML) algorithms and their applications [1]. With the rapid growth of data volume and processing capabilities, the value of data has been increasingly recognized both academically and in social life, and the prospect of ML has been pushed to an unprecedented level.

Particle accelerators, as large facilities operating with a steady stream of structured data, are naturally suitable for ML applications. The highly complicated operation conditions and precise control objectives fall perfectly inside ML's scope [2]. Over the past few years, the interest and engagement of ML applications in the field of particle accelerators has started to grow [1,3], along with the trend of data-driven approaches in other disciplines. Fast and accurate ML-based beam dynamics modelling can serve as a guide for future accelerator design and commissioning [4]. An ML surrogate model could achieve the same precision by utilizing only a few runs of the original high fidelity simulation [5]. In addition, various types of problems in accelerator control and operation, such as fast and safe parameter tuning to achieve optimal beam energy in the SwissFEL [6], optics correction [7] and collimator alignment in the LHC [8], and model-free stabilization of source size in the ALS [9], are intrinsically suitable for ML solutions due to the complexity of the input space and a target function that is rather straightforward to implement. Applications in beam diagnostics have also attracted much interest due to ML's advantages on non-linear behavior and multi-objective optimization [10].

Apart from the above-mentioned applications, anomaly detection and prevention has always been an inescapable and significant issue for accelerators to achieve high efficiency and reliability. There is also an increasing number of studies on data-driven approaches on this topic [11,12].
However, most existing research focuses on diagnostic techniques that distinguish anomalies from normal circumstances, rather than prognostic approaches that predict potential anomalies in advance. Several recent studies have attempted to address the issue of prediction. Donon et al. present different approaches, including signal processing and statistical feature extraction, to identify and predict jitters in the RF sources of Linac4 [13,14]. All approaches manage to detect the example jitters when they occur, and some even detect first symptoms ahead of time, which implies predictive power. Contrary to our beam interruptions, most jitters appear progressively rather than abruptly, which makes it more likely to locate an earlier precursor. Their studies also lack a quantified validation of the model performance over all the jitters. Rescic et al. applied various binary classification algorithms on beam pulses from the SNS to identify the last pulse ahead of the failure from normal pulses [15], in which predictive power is embedded, but their approach depends highly on the discrete pulse structure and cannot be directly adapted to continuous cases. It is also worth noting that both studies deal with univariate time series, i.e. RF power for Donon et al. and beam current for Rescic et al. Our work presents a novel and more ambitious approach that classifies a continuous stream of multivariate time series data by cutting out windows from different regions, and triggers an alarm for a potential beam interruption before it happens. Model performance on the validation set is evaluated by two metrics, with examples from mimicked live prediction as well.

Outside the accelerator community, the prediction and prevention of failures has long attracted researchers' interest, especially in the field of predictive maintenance [16,17], where a typical study deals with the lifetime prediction of an engine or bearing.
However, in contrast to the gradual degradation process of machinery equipment, accelerator interruptions may appear rather suddenly due to momentary causes; thus traditional signal processing methods, such as noise reduction or vibration analysis [18,19], are largely ineffective in this rare-events scenario. Instead of long-term evolution in time, short-scale structures should raise more concern. By generating Recurrence Plots (RPs) in close adjacency of the failures as well as in the middle of stable operation periods, we managed to extract finer information out of the original time series, which is better suited to the abrupt cases.

This work aims at capturing the precursors of beam interruptions, or "interlocks" for short, of the High Intensity Proton Accelerators (HIPA) by classifying whether the accelerator is in stable or unstable operation mode. A novel approach, a Recurrence Plot based Convolutional Neural Network (RPCNN), is introduced and adapted to the problem setting. The method transforms multivariate time series into images, which allows the extraction of enough structure and exploits mature ML techniques of image classification. The RPCNN model achieves an AUC value of 0.71.
2. Materials and Methods
The HIPA at the Paul Scherrer Institut (PSI) is one of the most powerful proton cyclotron facilities in the world, with nearly 1.4 MW of beam power [20]. The HIPA Archiver is an in-house developed archiver software that stores values of EPICS [21] Process Variables, i.e. channels, according to tailored configuration settings [22]. A data API is available as a Python library to access the archiver and export the data into Pandas Dataframes [23].

The dataset was taken from 14 weeks in 2019 starting from September from the HIPA Archiver, composed of 376 operating channels of the RING section and the PKANAL section (see Figure 1) for the model input (the SINQ beamline was not operating during the period), and the interlock records for the model output. Figure 1 shows the different sections of the channels, and Table 1 lists some example channels with detailed information. Figure 2 shows some examples of typical interlock records, which are text messages reporting the timestamp and all possible causes. Each interlock has at least one cause, though multiple causes can be attributed to one interlock if no single cause could be identified. For instance, an interlock of "MRI GR.3" indicates that at least one loss monitor at the end of the Ring Cyclotron exceeds its limit, as shown in the second line of Figure 2. The detailed description "MRI14>MAX-H" gives the exact cause that the beam loss monitor "MRI14" is over its maximal allowed value.
Figure 1.
Different HIPA sections of the channels.
Table 1.
List of some example channels.
Channel Section Description (unit)
AHA:IST:2     PKANAL   Bending magnet (A)
CR1IN:IST:2   RING     Voltage of Ring RF Cavity 1 (kVp)
MHC1:IST:2    PKANAL   Beam current measurement at the ring exit (µA)

Figure 2.
Example interlock record messages.
Taking into account the original causes of the interlocks, we separated them into four general types to ease the classification problem. Table 2 and Figure 3 show the four types and their statistics in the considered dataset, which contains 2027 interlocks in total.
Table 2.
The different types of interlocks in the dataset.
Type Description
Electrostatic   Interlock related to electrostatic elements
Transmission    Interlock related to transmission through target
Losses          Interlock related to beam losses
Other type      Interlock related to another type or unknown
Figure 3.
Distribution of the interlock events by type. "Other Types" denotes all interlock types that are not prevalent enough to have a separate category. Note that an interlock may be labeled as more than one type.
The archiver logs each value change with a certain maximum frequency per channel. In order to have input samples at a fixed frequency, during preprocessing the data of all channels are synchronized onto a predefined 5 Hz time grid by taking the last point that was recorded before each point in the grid (Figure 4). The interlock timestamps as given by the archiver were found to be up to a second off with respect to the real time of the beam interruption. Therefore, the interlock timestamp is chosen to be the closest point in the 5 Hz grid at which the beam current drops below a certain threshold.

Figure 4.
Synchronization of the beam current as an example channel and the interlocks.
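The last-value synchronization onto the 5 Hz grid can be sketched with pandas, which the archiver API already targets. This is a minimal illustration under our own assumptions, not the production preprocessing code; the timestamps and beam-current values below are made up.

```python
import pandas as pd

def synchronize(series: pd.Series, start, end, freq="200ms") -> pd.Series:
    """Align an irregularly archived channel onto a fixed 5 Hz grid by
    taking the last value recorded at or before each grid point."""
    grid = pd.date_range(start=start, end=end, freq=freq)
    # reindex onto the union of timestamps, forward-fill, then keep the grid
    return series.reindex(series.index.union(grid)).ffill().reindex(grid)

# irregularly archived beam-current values (illustrative numbers)
raw = pd.Series(
    [1480.0, 1478.5, 0.0],
    index=pd.to_datetime(["2019-09-01 00:00:00.03",
                          "2019-09-01 00:00:00.55",
                          "2019-09-01 00:00:01.12"]),
)
aligned = synchronize(raw, "2019-09-01 00:00:00.2", "2019-09-01 00:00:01.2")
print(aligned)
```

Each grid point carries the most recent archived value, so the drop to zero at an interlock appears at the first grid point after the raw drop, as in Figure 4.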
The problem of interlock prediction is formulated as a classification problem of two types of samples: interlock windows labeled as class 1 and stable windows labeled as class 0, taken from the channel time series. The model prediction of a sample is a value in [0, 1], and a sample is predicted to be class 1 once its output gets above a custom classification threshold determined by performance.

An interlock window is taken by setting its endpoint closely before, but not exactly at, an interlock event. This allows the model to identify the precursors of the interlock and not the interlock event itself, in case the signal synchronization is slightly off. Thus, if the last timestamp of the window has been taken 1 second before the interlock event, the classification of a sample as interlock by the model means that an interlock is expected to happen in 1 second. To increase the number of interlock samples, an interval of 5 overlapping windows is assigned the label "interlock", rather than only one window per interlock, as shown in the first two columns of Table 3. The stable windows are taken as the series of non-overlapping windows that lie in the periods between two interlocks. To stay away from unstable situations, two buffer regions of 10 minutes before and after the interlocks are ignored, as displayed in Figure 5.

Taking sliding windows as interlock samples mimics the real-time application scenario where there is a constant incoming data flow. Since the machine is mostly in stable operation mode, the stable samples are abundant and similar; thus it is neither necessary nor economical to cut sliding stable windows. All windows have a length of 12.8 seconds (64 time steps, each taking 0.2 seconds), which is determined by the model performance.

Figure 5.
Definition of interlock windows (orange) and stable windows (green). 5 overlapping interlock windows are cut at 1 second before the interlocks, and non-overlapping stable windows are cut in between two interlocks with a 10 minute buffer region. Each window has a length of 12.8 seconds (64 time steps).
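The window-cutting scheme of Figure 5 can be sketched as follows. The function and constant names are ours, and the toy array stands in for the synchronized (time × channel) matrix; this is a sketch of the labeling scheme, not the actual pipeline code.

```python
import numpy as np

FS = 5                 # 5 Hz grid
WIN = 64               # 64 time steps = 12.8 s
LEAD = 1 * FS          # last window point is 1 s before the interlock
N_OVERLAP = 5          # 5 overlapping windows per interlock
BUFFER = 10 * 60 * FS  # 10 min buffer around interlocks

def interlock_windows(data: np.ndarray, ilk_idx: int) -> np.ndarray:
    """5 overlapping windows, each shifted one step further back,
    ending ~1 s before the interlock at grid index ilk_idx."""
    wins = []
    for k in range(N_OVERLAP):
        end = ilk_idx - LEAD - k
        wins.append(data[end - WIN:end])
    return np.stack(wins)

def stable_windows(data: np.ndarray, prev_ilk: int, next_ilk: int) -> np.ndarray:
    """Non-overlapping windows between two interlocks, skipping the
    10 min buffer after the first and before the second interlock."""
    lo, hi = prev_ilk + BUFFER, next_ilk - BUFFER
    return np.stack([data[s:s + WIN] for s in range(lo, hi - WIN, WIN)])

rng = np.random.default_rng(0)
data = rng.normal(size=(8000, 3))        # toy (time, channel) array
ilk = interlock_windows(data, ilk_idx=7000)
stb = stable_windows(data, prev_ilk=0, next_ilk=7000)
print(ilk.shape, stb.shape)
```

Each interlock thus contributes 5 labeled windows of shape (64, channels), while the stable stretch yields as many non-overlapping windows as fit between the buffers.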
In addition, data where the beam current (channel "MHC1:IST:2") presents a value of less than 1000 µA at the first timestamp of the sample are excluded from the dataset. This measure removes samples where the HIPA machine was not in a stable state. Such samples may not be true representatives of their respective class and are thus not suitable as training data.

As shown in Figure 3, interlocks of the type "Losses" make up a majority (60.8%) of the available samples. Since the vast difference in types may be problematic for a binary classifier, and as not enough samples of each type are available to perform a multi-class classification, only interlocks of the type "Losses" are considered in the current study. The number of considered interlocks reduces from 2027 to 894 after the above cleaning measures.

The model was trained on the first 80% of the available data and validated on the remaining 20% in time order, to assess the predictive power of the model on future samples.

On average, there is about one interlock event per hour in the dataset (the interlock distribution and counts during our data collection period are given in Appendix A). Due to the nature of the data, the two classes of windows are highly imbalanced, as shown in Table 3. In order to compensate for the class imbalance, bootstrapping is employed on the training set. Interlock samples are drawn with replacement from the training set until their number equals the number of stable samples. The numbers of interlock and stable samples in the training and validation sets after all preprocessing steps, with and without bootstrapping, are listed in Table 3.

Table 3.
Numbers of samples in the training and validation sets, before and after bootstrapping.
Set              Interlocks   Interlock samples   Stable samples   Interlock samples after bootstrapping
Training set     731          3655                176046           176046
Validation set   163          815                 44110            not applied
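The bootstrapping step can be sketched in a few lines of numpy; the function name is ours, and the toy counts below only mimic the class imbalance of Table 3.

```python
import numpy as np

def bootstrap_balance(X_ilk: np.ndarray, X_stb: np.ndarray, seed: int = 0):
    """Draw interlock windows with replacement until both classes have
    the same number of training samples (interlock = 1, stable = 0)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X_ilk), size=len(X_stb))   # resample with replacement
    X = np.concatenate([X_ilk[idx], X_stb])
    y = np.concatenate([np.ones(len(X_stb)), np.zeros(len(X_stb))])
    return X, y

# toy stand-ins for the imbalanced training windows (shape: samples, steps, channels)
X_ilk = np.zeros((37, 64, 3))
X_stb = np.ones((1760, 64, 3))
X, y = bootstrap_balance(X_ilk, X_stb)
print(X.shape, y.mean())   # balanced: half interlock, half stable
```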
The HIPA channels present behaviors on various time scales, such as slow signal drifts or sudden jumps, which are very hard to clean up in the manner necessary for usual time series prediction methods. Recurrence plots are intrinsically suited to deal with these issues, as they are capable of extracting time dependent correlations on different time scales and complexity via tunable parameters.

2.3.1. Theory
Figure 6.
Generation of a recurrence plot from a signal with a fixed ε.

Recurrence plots were developed as a method to analyze dynamical systems and detect hidden dynamical patterns and nonlinearities [24]. The structures in a recurrence plot contain information about the time evolution of the system it portrays. In this study, recurrence plots are used to transform the time series into images, which are then classified by a Convolutional Neural Network.

The original recurrence plot described in [24] is defined as

R_{i,j} = θ(ε_i − ||x⃗_i − x⃗_j||),   x⃗_i ∈ R^m,   i, j = 1, . . . , N,

with θ the Heaviside function, i, j the indices of time steps inside a time window taken from the signal, and N the length of the window. Here the radius ε_i is chosen for each i in such a way that the neighborhood it defines contains a fixed number of states x⃗_j. As a consequence, the recurrence plot is not symmetric, but all columns have the same recurrence density. The most common definitions use a formulation with a fixed radius ε_i = ε, ∀i, which was first introduced by Zbilut et al. in [25]. The variation we use here is a so-called global recurrence plot [26] with a fixed ε, as defined in Equation (1):

D_{i,j} = ||x⃗_i − x⃗_j||   if ||x⃗_i − x⃗_j|| ≤ ε,
D_{i,j} = ε                if ||x⃗_i − x⃗_j|| > ε,      (1)

where D is symmetric. Figure 6 explains the process of transforming an original signal to a recurrence plot with a fixed ε.

Figure 7.
Examples of recurrence plots generated from the RPCNN model. From left to right:uncorrelated stochastic data, data starting to grow, and stochastic data with a linear trend. Thetop row shows the signals before the recurrence plot generation, and the bottom row shows thecorresponding recurrence plots.
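The global recurrence plot of Equation (1) amounts to computing all pairwise distances between states and clipping them at the radius ε, which can be sketched in numpy (the function name is ours):

```python
import numpy as np

def global_recurrence_plot(x: np.ndarray, eps: float) -> np.ndarray:
    """Global recurrence plot of Equation (1): pairwise distances
    between states, clipped at the radius eps."""
    if x.ndim == 1:
        x = x[:, None]                 # treat a scalar signal as 1-D states
    # pairwise Euclidean distances between all pairs of states x_i, x_j
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return np.minimum(d, eps)          # D_ij = d_ij if d_ij <= eps, else eps

t = np.linspace(0, 4 * np.pi, 64)      # one 64-step window of a toy signal
rp = global_recurrence_plot(np.sin(t), eps=0.5)
print(rp.shape, rp.max())
```

The result is a symmetric 64 × 64 image with a zero diagonal, whose texture encodes the recurrence structure of the window.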
Figure 8.
Schematic representation of the RPCNN model architecture. of 20
Table 4.
Overview of the RPCNN model training parameters.
Parameter Value
Learning rate               0.001
Batch size                  512
Optimizer                   Adam
Window size                 12.8 s
Number of channels          97
Std of the Gaussian noise   0.3
Dropout ratio               0.1
Preprocessing

In order to decrease the number of model parameters and thus reduce over-fitting, only 97 of the available 376 channels were used as model input. Since there are groups of input channels that measure approximately the same quantity and have a high correlation with each other (for instance beam position monitors), a lot of redundancy exists in the input channels, which enables an automated random feature selection procedure based on model performance. The selection was done by a combination of random search and evaluation of past experimental results, with details presented in Algorithm 1. Note that the resulting set of channels is only one out of many possible channel combinations.
Algorithm 1:
Random feature selection
N = total number of channels, n = sampled number of channels, M = total number of runs;
Fix the initialization by fixing the Tensorflow seed;
for n in a list of randomly chosen numbers < N do
    for M runs do
        randomly sample n out of N channels;
        train a model with the sampled channels;
        calculate beam time saved for that model;
    end
end
Take the set of channels which optimizes the beam time saved, considering model convergence and losses;
Repeat the subsampling among the currently selected set, to obtain the best performing set;
Randomly change the initialization seed with the channels fixed to the best performing set, to obtain the best performing configuration;

The channel signals that are used as input features differ vastly with respect to their scale. As this might impact the convolution operations, the channel signals are standardized to mean 0 and standard deviation 1.

RP generation

The RP generation part starts with an L-regularized fully connected layer to reduce the number of available features from 97 to 20. Gaussian noise with a standard deviation of 0.3 is added as a regularization and data augmentation measure before the signals are converted into recurrence plots. The process is illustrated in Figure 9.

Figure 9.
Illustration of section (a) RP generation of the RPCNN model (c.f. Figure 8).
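The random search of Algorithm 1 can be sketched as follows. Since training a model per subset is out of scope here, a hypothetical `score_fn` stands in for "train a model and calculate the beam time saved"; all names are ours.

```python
import random

def random_feature_search(channels, score_fn, n_candidates, runs, seed=42):
    """Random subset search in the spirit of Algorithm 1: repeatedly
    sample channel subsets of size n, score each subset (stand-in for
    train + evaluate beam time saved), and keep the best one."""
    rng = random.Random(seed)          # fixed seed, as in Algorithm 1
    best_set, best_score = None, float("-inf")
    for n in n_candidates:             # candidate subset sizes n < N
        for _ in range(runs):          # M runs per subset size
            subset = rng.sample(channels, n)
            s = score_fn(subset)       # stand-in for training a model
            if s > best_score:
                best_set, best_score = subset, s
    return sorted(best_set), best_score

channels = [f"ch{i}" for i in range(376)]
# hypothetical score: favours small subsets that contain channel "ch0"
score = lambda sub: ("ch0" in sub) * 10 - 0.01 * len(sub)
best, s = random_feature_search(channels, score, n_candidates=[20, 97], runs=30)
print(len(best), s)
```

In the paper's procedure, the search is then repeated within the selected set, and finally the initialization seed is varied with the channels held fixed.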
The recurrence plots are produced internally by a custom "Recurrence Plot layer" in the model. This approach allows setting trainable ε parameters and generating recurrence plots on the fly, which eases deployment of the trained models, as the recurrence plots do not have to be generated and stored explicitly during preprocessing. For each feature a recurrence plot is drawn, and the respective ε parameters are initialized with random numbers in the interval [0, 1]. Examples of recurrence plots of an interlock sample and a stable sample are given in Appendix B, where each sample contains 20 plots.

Feature extraction

The structure of the feature extraction section was adapted from [31] and consists of a depthwise separable convolution performed in parallel with a pointwise convolution. Both are followed by a ReLU non-linearity. Batch normalization layers are added after both convolutional layers.

Classification

The classification part of the network consists of three fully connected layers separated by dropout layers with a ratio of 0.1. L regularization was applied to all fully connected layers.

As a feasibility study of the dataset and baseline model for the RPCNN, we trained another Random Forest [32] model on all 376 input channels. The Random Forest classifier is an ensemble learning method built upon Decision Trees. While each tree decides its optimal splits based on a random sub-sample of training data and features, the ensemble takes the average classification scores from all trees as the output classification. It is widely applied due to its robustness against over-fitting, straightforward implementation and relatively simple hyper-parameter tuning. It also does not require the input features to be standardized or re-scaled, and no difference was noticed in trial runs.

For the RPCNN model, information on the time dimension is embedded in the recurrence plots implicitly. It is thus incompatible to use a window with both time and channel dimensions directly as input for the RF model. Instead, only the last time steps of the same interlock and stable windows are fed as input into the RF model. The RF model is implemented using the RandomForestClassifier method of the Scikit-learn [33] ensemble module with the parameters listed in Table 5. The RF on the 97 channels from the RPCNN shows the same performance as with all 376 channels.
Table 5.
Overview of the RF model training parameters.
Parameter Value
Number of estimators   90
Maximum depth          9
Number of channels     376
Maximum leaf nodes     70
Criterion              Gini
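The RF baseline with the parameters of Table 5 can be sketched with scikit-learn. The data here are synthetic stand-ins: one feature vector per window (the last time step across the 376 channels) with an artificial label; the real features and labels come from the preprocessing described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 376))                 # last-time-step values, 376 channels
y = (X[:, 0] + 0.5 * X[:, 1] > 1).astype(int)   # synthetic "interlock" label

# Random Forest with the Table 5 parameters
clf = RandomForestClassifier(
    n_estimators=90, max_depth=9, max_leaf_nodes=70,
    criterion="gini", random_state=0,
)
clf.fit(X[:400], y[:400])                       # chronological 80/20 split
scores = clf.predict_proba(X[400:])[:, 1]       # classification scores in [0, 1]
print(scores.shape)
```

Thresholding `scores` at the chosen classification threshold (0.70 for the RF in Table 8) yields the binary interlock/stable decision.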
Typical binary classification metrics in a confusion matrix are defined and applied in our setting. A True Positive (TP) means an interlock sample (a sample less than 15 seconds before an interlock) being classified as an interlock. A False Positive (FP) means a stable sample (a sample at least 10 minutes away from an interlock) being mistaken for an interlock. A True Negative (TN) means a stable sample being reported as stable. And the remaining case, a False Negative (FN), means that the model fails to identify an interlock. All the metrics are counted with regard to samples, namely TP + FN = 815 for the validation set according to Table 3.

For an imbalanced dataset such as the present one, evaluation of the model performance through the classification accuracy, (TP + TN)/(TP + FN + FP + TN), is not sufficient. The performance of the model is evaluated with a Receiver Operating Curve (ROC) as well as a target defined to maximize the beam time saved.

2.5.1. Receiver Operating Curve

The ROC curve shows the true positive rate (TPR = TP/(TP + FN)) against the false positive rate (FPR = FP/(FP + TN)) of the predictions of a binary classification model as a function of a varying classification threshold [34]. The AUC measures the area under the ROC curve bounded by the axes. It represents the probability that a random positive (i.e. interlock) sample receives a higher value than a random negative (i.e. stable) sample; it ranges from 0 to 1, and 0.5 corresponds to a random guess. Aggregated over all possible classification thresholds, it indicates the general capacity of a model to distinguish between the two classes.

2.5.2. Beam time saved

We also propose a more practical metric for this particular use case. The beam is shut off when an interlock occurs. After each interlock the beam current is ramped up automatically again. Each interlock event causes a beam time loss of about 25 seconds. The average uptime of the HIPA facility is about 90% [35]. Short interruptions, mostly from beam losses and electrostatic elements, are responsible for about 2% of beam time lost at HIPA. Hence an "actually negative" state (i.e. FP + TN) is about 45 times (90% over 2%) more likely to occur than an "actually positive" state (i.e. TP + FN). We denote p as the probability of "actually positive", i.e. the probability that an interlock would occur. Thus p = 1/(1 + 45) = 1/46 ≈ 0.022.

For the target definition it is assumed that each correctly predicted interlock can be prevented by a 10% beam current reduction for 60 seconds. Each interlock thus leads to an equivalent of six seconds of lost beam time. Note that this is only for performance evaluation purposes, and the 10% reduction is not necessarily able to prevent interlocks in real life. The actual mitigation measure is still under investigation. Under this assumption, the evaluation of beam time saved is illustrated in Figure 10 and shown in detail in Table 6.
Figure 10.
Expected lost beam time without (top) and with (bottom) the proposed 10% current reduction. An interlock equivalently leads to 25 seconds of lost beam time. With the current reduction, the interlock is expected to be avoided at a cost of six seconds of lost beam time.
Table 6.
Detail of beam time saved according to the different classification results of one sample.

T_s (s)                     TP    FN    FP   TN
Without current reduction   -25   -25   0    0
With current reduction      -6    -25   -6   0
Incentive                   19    0     -6   0
Based on the above analysis, the "beam time saved", denoted by T_s, per interlock is defined in Equation (2). We decide on the best classification threshold as the one corresponding to the largest T_s, rather than the largest AUC value.

T_s = (19 · TP − 6 · FP) / (TP + FN)
    = 19 · TP/(TP + FN) − 6 · FP/(TP + FN)
    = 19 · TPR − 6 · FPR · (FP + TN)/(TP + FN)
    = 19 · TPR − 6 · FPR · (1 − p)/p
    = 19 · TPR − 270 · FPR,   measured in (s/interlock).   (2)
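Equation (2) can be checked numerically; the counts below are the best-RPCNN values from Table 8, with TN inferred from the 44110 stable validation samples.

```python
def beam_time_saved(tp: int, fn: int, fp: int, tn: int) -> float:
    """Beam time saved per interlock, Equation (2): 19 s gained per
    prevented interlock, 6 s lost per false alarm, with the 45:1
    stable-to-interlock prior folded into the FPR coefficient."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return 19 * tpr - 270 * fpr   # 270 = 6 * (1 - p)/p with p = 1/46

# best RPCNN counts from Table 8 (815 interlock, 44110 stable samples)
ts = beam_time_saved(tp=40, fn=775, fp=75, tn=44110 - 75)
print(round(ts, 2))
```

This evaluates to roughly 0.47 s per interlock, consistent with the ~0.5 s reported for the RPCNN in Section 3; the break-even line 19 · TPR − 270 · FPR = 0 is the dashed boundary drawn in Figure 11.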
3. Results
Figure 11 shows the ROC curves of the best performing RF and RPCNN models, as well as the ROC curve over varying initializations of the RPCNN model.

The confidence intervals are a measure of the variability of the model, which may result from a different selection of the validation dataset, or from randomness in model and parameter initialization. The variability over configurations of the validation dataset was calculated for both the RF and RPCNN models in Figure 11 (a) and (b). Specifically, the models are trained on the same training set, then validated 20 times over randomly sampled subsets of the validation set. Initialization in the RPCNN model could also introduce various uncertainties depending on the convergence behavior of the model. Therefore, a corresponding confidence interval of the RPCNN model is calculated. In Figure 11 (c), the validation results over varying initializations of the RPCNN model from (b) are displayed.

The black dashed line marks the boundary 19 · TPR − 270 · FPR = 0. A model can only save beam time if its ROC curve reaches the left side of this dashed line, and it saves more beam time the further away it is from this line.
Figure 11.
ROC curves of the best-performing models in terms of beam time saved and AUC: (a) RF; (b) RPCNN. The mean and 95% confidence interval are calculated from re-sampling of the validation sets. (c) Varying initialization of the same RPCNN model as in (b). The mean and 95% confidence interval are calculated from the validation results of 25 model instances.

Table 7.

Beam time saved (T_s) and AUC results of the best-performing RF and RPCNN models over re-sampling of the validation sets.

Model                            T_s [s/interlock]   AUC
RF best                          0.7 ± …             …
RPCNN best                       0.5 ± …             0.71 ± …
RPCNN mean over initialization   0.1 ± …             …

Table 8.
Classification results in terms of True Positive (TP), False Positive (FP) and False Negative (FN) of the best-performing RF and RPCNN models. The thresholds for both models were obtained from the ROC curves, with 0.70 for the RF and 0.65 for the RPCNN.
Model TP FP FN
RF best      25   28   790
RPCNN best   40   75   775
The AUC and beam time saved for the models displayed in Figure 11 can be found in Table 7. The TP, FP and FN counts for the best performing RF and RPCNN models are shown in Table 8. The RF model saves 0.7 s per interlock, better than the 0.5 s of the RPCNN, but the RPCNN achieves a higher AUC value. The RPCNN successfully predicts 4.9% of interlock samples (40 out of 815). While it correctly identifies more interlocks than the RF (40 vs. 25), it also triggers more false alarms (75 vs. 28).

During the HIPA run from September to December 2019, there were 894 interlocks of the "Losses" type, ignoring the secondary ones. Taking the result of the best RPCNN model, 0.5 seconds would be saved for each interlock, which means we would deliver 7.45 minutes more beam altogether.
While the AUC and beam time saved metrics allow an evaluation of the expected performance of the model, the final test is the behavior of the model during live, i.e. continuous, predictions. Figure 12 presents a selection of snapshots of the RPCNN and RF models during mimicked live predictions on the extended validation set; to examine the model behavior in unseen regions, the buffer regions are reduced from 10 min to 14 s, and predictions are generated until 0.2 s before the interlock event.

Figure 12 shows that the RPCNN model achieves an overall better performance than the RF model in this mimicked live setting. In the successful cases of (a) and (b), the RPCNN not only reports imminent predictions of interlocks on a time scale of seconds (green circle), as the training set is defined, but can also generate alarms farther in advance (purple circle), even several minutes ahead of time. However, there is much subtlety in the distinction between an early alarm and a FP, depending on the time from the earliest threshold crossing until the interlock. The three crossings (red circle) in (c) could either be taken as FPs, if the machine is believed to remain stable and only start deviating shortly before interlocks, or be recognized as early precursors, if the machine keeps operating in an unstable mode. As for the prediction failure in (d), the RPCNN still outperforms the RF in the sense of a growing trend in output values (red arrow), which indicates some underlying progression of anomalies.
Figure 12.
Screenshots from simulated live predictions over the validation data. The blue line is the prediction value from the model, with higher values indicating a higher probability that an interlock is approaching. The orange line is the readout of the beam current channel "MHC1:IST:2", where a drop to zero indicates an interlock. The green line is the binary classification threshold, here taking 0.65 for the RPCNN and 0.7 for the RF. (a) and (b) show two examples of successful prediction of interlocks by the RPCNN, compared to the result of the RF on the same time periods. The positive predictions closer to interlocks are regarded as TPs, marked by green circles, and the earliest times that the model output crosses the threshold are enclosed in purple circles. (c) shows the FPs from the RPCNN in red circles, compared to the result of the RF. (d) shows an example of a FN from the RPCNN compared to the result of the RF. A clear trend increasing with time is present in the output of the RPCNN, shown by the red arrow.

Figure 13.
Reverse cumulative histogram of the first instance at which the alarm threshold was crossed before an interlock event and the model prediction values remained above the threshold. From evaluation of the ROC curve a threshold value of 0.65 is chosen. Displayed are the results for the best performing RPCNN model. Note that the interlock event is at 0 s, i.e. time runs from right to left.
The behavior of the RPCNN model as seen in Figure 12 suggests that some interlocks are detected well before the 1 s range defined in the problem formulation. This observation is supported by the findings shown in Figure 13. The time range before the interlock event at which the RPCNN model prediction first crossed and stayed above the threshold is obtained for the interlocks in the validation set and compiled into a reverse cumulative histogram. Detection times vary between the interlock events, with some events being detected only a second before their occurrence and others close to 5 min beforehand.
4. Discussion
As shown in Table 7, all variations of the models achieve an AUC value larger than 0.6, indicating that our models possess the ability to distinguish between potential interlocks and stable operation. The customized target T_s, the "beam time saved", poses strict requirements on the model performance, since it has little tolerance for FPs, as described in Equation (2). In spite of such constraints, our models are able to increase the amount of beam for the experiments.

It is observed from Figure 12 that the RPCNN model, while showing comparable performance with the RF on binary classification metrics in Table 7, is more suitable for real-time predictions.

Despite the mentioned achievements and potential, there are still limitations in multiple aspects. As shown in Figure 11 (c), the RPCNN model under random initialization does not converge well, which is expected to be resolved by the acquisition of more training data. Moreover, the available channels might change for each data run. For example, the 2019 model is invalid in 2020, since the SINQ beamline was restarted, so that several old channels were no longer available.

Another limitation lies in the ambiguity in manipulating the data and model to achieve better training, from sample selection (number and size of windows, buffer region etc.), the extent of preprocessing and data augmentation, and feature engineering considering the redundancy in channels, to the selection of the model architecture and hyper-parameter values.

Although a thorough cross-validation would be a more convincing proof of the model performance, the confidence intervals in Figure 11 (a) and (b) are computed solely from constructing different subsets of the validation dataset. A cross-validation was done for the RF model with the same level of performance. For the RPCNN model, there has not yet been a complete cross-validation over the full dataset, since the model still diverges depending on initialization. Future studies should address these issues with new data.
The results obtained in a mimicked live setting, as displayed in Figures 12 and 13, indicate that the problem formulation as well as the performance metrics need to be reevaluated. As the detection times of the interlock events are inconsistent, labeling a fixed number of 5 samples per event as interlock might not be optimal. A next step could be a closer investigation to determine whether the detection time can be linked to some property of the interlocks, such as their type, length or location, and to adjust the sample labeling accordingly.

Following the discussion in Section 3.2 about the FPs in Figure 12 (c), another adaptation could be changing the currently sample-based counting of TP, FP and FN to bring the metrics closer to the intended use case of live predictions. For instance, TPs could be counted with regard to interlocks rather than samples, i.e. if at least one sample is positive in some range before an interlock, one TP is counted. It would also be reasonable to ignore consecutive FPs in a certain range, or to ignore FPs and FNs occurring in a certain range before the interlock. Our customized target T_s (beam time saved) is constructed upon the definition of TP and FP. With the above adaptations, T_s could also be brought closer to reality and become more instructive for real practice.

To enhance the model performance, one possible method is to tailor the training set according to the model output. By excluding the interlocks that receive a low detection score from the model, it might be possible to improve the detection of the remaining interlocks and reduce the number of FPs, thus increasing the overall model performance.

Currently, a GUI for a real-time implementation of the model has already been developed and tested on archived data. One of the next steps is to apply transfer learning to the model, train and update it continuously with new data, and eventually develop a real prognostic tool with a GUI to be controlled by the machine operators.
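The proposed event-based counting of TP, FP and FN could be sketched as follows. The function and parameter names, as well as the window sizes, are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

def event_based_counts(pred, event_idx, pre_window=5, fp_gap=10):
    """Event-based counting sketch (hypothetical parameters).

    pred:       binary per-sample predictions over a data run
    event_idx:  sample indices at which interlocks occurred
    pre_window: a TP is counted if any positive prediction falls within
                this many samples before an interlock; otherwise one FN
    fp_gap:     consecutive positive predictions within this many samples
                of each other are collapsed into a single FP
    """
    pred = np.asarray(pred)
    covered = np.zeros(len(pred), dtype=bool)
    tp = fn = 0
    for e in event_idx:
        lo = max(0, e - pre_window)
        covered[lo:e + 1] = True  # positives here belong to the event
        if pred[lo:e + 1].any():
            tp += 1
        else:
            fn += 1
    # Remaining positives are false alarms; collapse consecutive runs.
    fp_idx = np.flatnonzero(pred & ~covered)
    fp = 0
    last = -np.inf
    for i in fp_idx:
        if i - last > fp_gap:
            fp += 1
        last = i
    return tp, fp, fn

pred = [0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(event_based_counts(pred, event_idx=[10, 19]))  # (2, 1, 0)
```

Here both interlocks are counted as detected because at least one positive prediction falls inside their pre-windows, while the isolated positive at index 2 contributes a single FP.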
Author Contributions: conceptualization, S.L., F.P. and A.A.; methodology, M.Z., F.P. and A.A.; software, M.Z. and S.L.; validation, J.S., J.C. and A.A.; formal analysis, M.Z. and S.L.; investigation, J.S., J.C. and D.R.; resources, J.S. and D.R.; data curation, J.S., J.C. and D.R.; writing–original draft preparation, S.L. and M.Z.; writing–review and editing, J.S., J.C., F.P. and A.A.; visualization, S.L. and M.Z.; funding acquisition, J.S. and A.A.
Funding:
This research was partly funded by the Swiss Data Science Center.
Acknowledgments:
We would like to express our special thanks to Dr. Anastasia Pentina and other colleagues from the Swiss Data Science Center for their insightful collaboration and generous support throughout the research. We also thank Hubert Lutz and Simon Gregor Ebner from PSI for their expert knowledge on the HIPA Archiver and help in data collection. We acknowledge the assistance of Dr. Derek Feichtinger and Marc Chaubet for their help with the Merlin cluster, which enabled the computational work of this research.
Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A Interlock statistics
Figure A1.
Time of all interlocks from 18.9.2019 to 25.12.2019. Regular maintenance periods can be observed.
Figure A2.
Number of interlocks per day from 18.9.2019 to 25.12.2019.
Appendix B Example Recurrence Plots
Figure A3.
The 20 Recurrence Plots of an interlock sample.
Figure A4.
The 20 Recurrence Plots of a stable sample.
Appendix C Architecture of the RPCNN model
Figure A5.
The RPCNN architecture.
References
1. Edelen, A.; Mayes, C.; Bowring, D.; Ratner, D.; Adelmann, A.; Ischebeck, R.; Snuverink, J.; Agapov, I.; Kammering, R.; Edelen, J.; et al. Opportunities in machine learning for particle accelerators. arXiv preprint arXiv:1811.03172.
2. Edelen, A.L.; Biedron, S.; Chase, B.; Edstrom, D.; Milton, S.; Stabile, P. Neural networks for modeling and control of particle accelerators. IEEE Transactions on Nuclear Science, 878–897.
3. Arpaia, P.; Azzopardi, G.; Blanc, F.; Bregliozzi, G.; Buffat, X.; Coyle, L.; Fol, E.; Giordano, F.; Giovannozzi, M.; Pieloni, T.; et al. Machine learning for beam dynamics studies at the CERN Large Hadron Collider. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 164652.
4. Zhao, W.; Patil, I.; Han, B.; Yang, Y.; Xing, L.; Schüler, E. Beam data modeling of linear accelerators (linacs) through machine learning and its potential applications in fast and robust linac commissioning and quality assurance. Radiotherapy and Oncology.
5. Adelmann, A. On nonintrusive uncertainty quantification and surrogate model construction in particle accelerator modeling. SIAM/ASA Journal on Uncertainty Quantification, 383–416.
6. Kirschner, J.; Nonnenmacher, M.; Mutný, M.; Krause, A.; Hiller, N.; Ischebeck, R.; Adelmann, A. Bayesian Optimisation for Fast and Safe Parameter Tuning of SwissFEL. FEL2019, Proceedings of the 39th International Free-Electron Laser Conference. JACoW Publishing, 2019, pp. 707–710.
7. Fol, E.; Coello de Portugal, J.; Franchetti, G.; Tomás, R. Optics corrections using Machine Learning in the LHC. Proceedings of the 2019 International Particle Accelerator Conference, Melbourne, Australia, 2019.
8. Azzopardi, G.; Muscat, A.; Redaelli, S.; Salvachua, B.; Valentino, G. Operational Results of LHC Collimator Alignment Using Machine Learning. Proc. 10th International Particle Accelerator Conference (IPAC'19), Melbourne, Australia, 19–24 May 2019; JACoW Publishing: Geneva, Switzerland, 2019; pp. 1208–1211. doi:10.18429/JACoW-IPAC2019-TUZZPLM1.
9. Leemann, S.; Liu, S.; Hexemer, A.; Marcus, M.; Melton, C.; Nishimura, H.; Sun, C. Demonstration of Machine Learning-Based Model-Independent Stabilization of Source Properties in Synchrotron Light Sources. Physical Review Letters, 194801.
10. Fol, E.; Coello de Portugal, J.; Franchetti, G.; Tomás, R. Application of Machine Learning to Beam Diagnostics. Proc. FEL'19. JACoW Publishing, Geneva, Switzerland, 2019, pp. 311–317. doi:10.18429/JACoW-FEL2019-WEB03.
11. Tilaro, F.; Bradu, B.; Gonzalez-Berges, M.; Roshchin, M.; Varela, F. Model Learning Algorithms for Anomaly Detection in CERN Control Systems. 16th Int. Conf. on Accelerator and Large Experimental Control Systems (ICALEPCS'17), Barcelona, Spain, 8–13 October 2017. JACoW, Geneva, Switzerland, 2018, pp. 265–271.
12. Piekarski, M.; Jaworek-Korjakowska, J.; Wawrzyniak, A.I.; Gorgon, M. Convolutional neural network architecture for beam instabilities identification in Synchrotron Radiation Systems as an anomaly detection problem. Measurement, 108116.
13. Donon, Y.; Kupriyanov, A.; Kirsh, D.; Di Meglio, A.; Paringer, R.; Serafimovich, P.; Syomic, S. Anomaly Detection and Breakdown Prediction in RF Power Source Output: a Review of Approaches. NEC 2019, 27th Symposium on Nuclear Electronics and Computing, 2019.
14. Donon, Y.; Kupriyanov, A.; Kirsh, D.; Di Meglio, A.; Paringer, R.; Rytsarev, I.; Serafimovich, P.; Syomic, S. Extended anomaly detection and breakdown prediction in LINAC 4's RF power source output. 2020 International Conference on Information Technology and Nanotechnology (ITNT). IEEE, 2020, pp. 1–7.
15. Rescic, M.; Seviour, R.; Blokland, W. Predicting particle accelerator failures using binary classifiers. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 163240.
16. Hashemian, H.M. State-of-the-art predictive maintenance techniques. IEEE Transactions on Instrumentation and Measurement, 226–236.
17. Selcuk, S. Predictive maintenance, its implementation and latest trends. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 1670–1679.
18. Scheffer, C.; Girdhar, P. Practical Machinery Vibration Analysis and Predictive Maintenance; Elsevier, 2004.
19. Kovalev, D.; Shanin, I.; Stupnikov, S.; Zakharov, V. Data mining methods and techniques for fault detection and predictive maintenance in housing and utility infrastructure. 2018 International Conference on Engineering Technologies and Computer Science (EnT). IEEE, 2018, pp. 47–52.
20. Reggiani, D.; Blau, B.; Dölling, R.; Duperrex, P.A.; Kiselev, D.; Talanov, V.; Welte, J.; Wohlmuther, M. Improving beam simulations as well as machine and target protection in the SINQ beam line at PSI-HIPA. Journal of Neutron Research, 1–11.
21. Dalesio, L.R.; Kozubal, A.; Kraimer, M. EPICS architecture. Technical report, Los Alamos National Lab., NM (United States), 1991.
22. Lutz, H.; Anicic, D. Database driven control system configuration for the PSI proton accelerator facilities. Proc. ICALEPCS'11. JACoW Publishing, Geneva, Switzerland, 2011, pp. 289–291.
23. Ebner, S.; et al. Data API for PSI SwissFEL DataBuffer and EPICS Archiver. https://github.com/paulscherrerinstitute/data_api_python, 2020.
24. Eckmann, J.P.; Kamphorst, S.O.; Ruelle, D. Recurrence Plots of Dynamical Systems. EPL (Europhysics Letters), 973.
25. Zbilut, J.P.; Koebbe, M.; Loeb, H.; Mayer-Kress, G. Use of recurrence plots in the analysis of heart beat intervals. Proceedings Computers in Cardiology. IEEE, 1990, pp. 263–266.
26. Webber, Jr., C.L.; Zbilut, J.P. Recurrence quantification analysis of nonlinear dynamical systems. Tutorials in Contemporary Nonlinear Methods for the Behavioral Sciences.
Physics Reports, 237–329.
29. Webber, Jr., C.L.; Marwan, N. Recurrence Quantification Analysis: Theory and Best Practices; Springer, 2015.
30. Chollet, F.; et al. Keras. https://keras.io, 2015.
31. Chollet, F. Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
32. Ho, T.K. Random decision forests. Proceedings of 3rd International Conference on Document Analysis and Recognition. IEEE, 1995, Vol. 1, pp. 278–282.
33. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2825–2830.
34. Fawcett, T. An introduction to ROC analysis. Pattern Recognition Letters, 27.