Classification of tokamak plasma confinement states with convolutional recurrent neural networks
F. Matos, V. Menkovski, F. Felici, A. Pau, F. Jenko, the TCV Team‡ and the EUROfusion MST1 Team§
Max Planck Institute for Plasma Physics, Boltzmannstraße 2, 85748 Garching, Germany
Eindhoven University of Technology, 5612 AZ Eindhoven, Netherlands
École Polytechnique Fédérale de Lausanne (EPFL), Swiss Plasma Center (SPC), CH-1015 Lausanne, Switzerland
E-mail: [email protected]
Abstract.
During a tokamak discharge, the plasma can vary between different confinement regimes: Low (L), High (H) and, in some cases, a temporary intermediate state, called Dithering (D). In addition, while the plasma is in H mode, Edge Localized Modes (ELMs) can occur. The automatic detection of changes between these states, and of ELMs, is important for tokamak operation. Motivated by this, and by recent developments in Deep Learning (DL), we developed and compared two methods for automatic detection of the occurrence of L-D-H transitions and ELMs, applied on data from the TCV tokamak. These methods consist in a Convolutional Neural Network (CNN) and a Convolutional Long Short Term Memory Neural Network (Conv-LSTM). We measured our results with regards to ELMs using ROC curves and Youden's score index, and regarding state detection using Cohen's Kappa Index.
Keywords: CNN, LSTM, Deep Learning, ELM, H mode, L mode, Dither, Automated Detection
1. Introduction
In a fusion experiment, plasma can typically be described as being in one of two different confinement regimes or modes: Low (L) and High (H). Furthermore, the plasma can also sometimes be described as being in a third, additional mode, called the Intermediate or Dithering (D)[1] phase. In addition, when the plasma is in H mode, Edge Localized Modes (ELMs) can periodically occur.

Current tokamaks regularly run in H mode, which motivates the necessity for some measure of control (and therefore, detection) of ELMs and transitions between plasma modes. Furthermore, it is expected that future machines will also run in the same operating conditions[2]. Thus, the development of data-based approaches to automatically detect the occurrence of certain events would be useful for both existing and future tokamak experiments and operation. A detector would not only simplify and speed up the post-experimental, offline analysis of shots, but also (ideally) detect ELMs and plasma states rapidly enough to allow for its usage in the real-time control systems of a fusion experiment, for purposes of plasma control and real-time discharge monitoring and supervision[3].

Due to uncertainties in the scaling laws, it is difficult to determine, a priori, when, during a discharge, a switch between different plasma modes will occur[4]. Nevertheless, physicists can usually pinpoint, through a post-experimental visual analysis of several diagnostic signal time-traces, at what point in time any transitions between different modes did take place. Similarly to transitions between plasma modes, the occurrence of an ELM can usually be pinpointed by looking at the time-traces of several diagnostics from a plasma discharge post-shot.

‡ See author list of S. Coda et al 2019 Nucl. Fusion 59 112023
§ See author list of B. Labit et al 2019 Nucl. Fusion 59 086020
Yet, through an analysis of signals, some types of ELMs can be easily confused with dithers; a distinction between the two phenomena cannot always be clearly made[5].

Although the identification by an expert, through post-experimental visual analysis of signal time-traces, of a single ELM, or of a single transition between plasma modes, is relatively straightforward for a typical shot, it becomes much more cumbersome to carry out that analysis effectively for many shots, especially when the associated time-series data is long, and when a shot has many transitions between different modes.

Recent advances in the ML field with the introduction of Deep Learning (DL) approaches deal with exactly such challenges. In the past years, the field of Deep Learning has brought about significant advances in Computer Vision and Sequential Data Processing. Convolutional Neural Networks (CNNs) have proven adept at localization, recognition and detection tasks in both 2-dimensional[6, 7, 8, 9, 10] and 1-dimensional[11, 12, 13, 14, 15, 16] data (i.e. signal analysis) in many different fields of science. In addition, Long Short-Term Memory (LSTM) networks, which are one type of Recurrent Neural Network, have been successfully used for processing of sequential data where one expects correlations to exist across time, namely automatic translation, natural language modelling[17], traffic analysis[18], and automated video description[19]. These tasks are much akin to what one can expect to find in terms of processing fusion shot data.

Given this, a Deep Learning approach is well motivated to address this challenge. Specifically, deep neural network models offer particular advantages when modeling high-dimensional data as given in this setting. In this work, we develop an approach for automatic classification of L-D-H plasma states and detection of ELMs based on two deep neural network models.
The first model is based on a sliding-window feed-forward neural network, specifically a convolutional neural network (CNN). The second model is based on a recurrent neural network (RNN), specifically a long short-term memory network (LSTM) with convolutional layers. The first model captures the local correlations within the windows to classify the transitions between plasma states from the shape of the signals. The second model extends this to capture longer-term dependencies in the evolution of the states with the recurrent neural network layers. We empirically demonstrate the approach on data collected from the TCV tokamak, labelled by an ensemble of experts. The presented results demonstrate the effectiveness of the proposed models in detecting the states and events of the plasma. We further discuss the trade-offs between the increased precision and the increased complexity of both models.

This paper is organized as follows: Section 2 discusses related work and Section 3 describes the physical phenomena being analyzed. Section 4 formalizes our problem, details the data we have available, and explains our decisions regarding how we model the data and design and train the neural networks. Section 5 gives an overview of the metrics we used to evaluate our results and our rationale behind using those metrics. Section 6 gives an overview of the results achieved, and we wrap up with a discussion in Section 7.
2. Previous work
Several different approaches for automated detection of events in plasma experiments exist. One such approach is to use threshold-based detectors. This corresponds to defining a point or series of points (in time) at which a signal surpasses a certain amplitude as corresponding to a detection[20, 21, 22], with additional constraints such as an increasing probability of the occurrence of an ELM as time passes since the last one. These approaches are limited to simple thresholding and cannot compute complex patterns in the data. Other work builds upon methods such as Kalman Filters to model the expected characteristics of the signal over a period of time[23], whilst also keeping track (at each time point) of the current plasma mode, according to a pre-defined model. In both of these cases, a detection algorithm's performance depends on the extent to which the theoretical assumptions and mathematical descriptions of how the signals should behave are correct, whether those assumptions are exhaustive (i.e., whether there may be additional causes which are unaccounted for), and whether some of those assumptions are more important than others; in other words, it is difficult to design an exhaustive rule-based system to detect the occurrence of transitions between plasma modes, as well as to detect ELMs.

The alternative is to use a purely data-based, supervised Machine Learning (ML) approach, whereby a set of data, previously manually labeled by an expert (for example, through visual analysis), is used to train a detector. In this case, one does not specify which characteristics or correlations in the data are thought to correspond to the occurrence of an event; rather, it is expected that the algorithm can automatically learn what those correlations are, based on the labels, and then use the learned data features to make correct classifications on new data.
Examples of such work are the usage of Support Vector Machines (SVMs)[24, 25, 26, 27] and Multi-Layer Perceptron (MLP) Neural Networks[28] on data from several tokamaks for detection of L-H transitions, classification of L and H modes, and detection of ELMs.

This type of scenario is, indeed, well suited for the application of ML methods towards enabling automation. However, traditional ML methods such as SVMs and MLPs typically have limitations when faced with data with complex dynamics, such as the long sequences (i.e., signal time-series) present in this environment. SVMs typically depend on expert-defined feature engineering, which, while being superior to simple threshold-based detectors, is nevertheless insufficient when considering the complex data correlations which are observed in this setting. On the other hand, MLPs, while not requiring that sort of expert-defined input, are very inefficient when compared to modern Deep Learning models such as CNNs and RNNs, requiring much larger numbers of neurons and layers to perform the same task. These limitations are what motivate us to use Deep Learning approaches instead.
3. Background
When a discharge starts, the plasma is considered to be in Low (L) confinement mode. Once a certain threshold of input heating power to the plasma is reached[29], the plasma can spontaneously transition into High (H) confinement mode. Originally discovered at the ASDEX tokamak[30], H mode is nowadays regularly observed in almost all other machines[31]. H mode is characterized by the appearance, at the plasma edge, of a steep gradient in the electron density and the electron/ion temperatures, and a reduction in the transport of particles and energy. As a consequence of this edge transport barrier, the temperature and energy in the plasma core increase. When compared to L mode, H mode allows for a larger amount of stored plasma energy per input power, thus rendering the fusion process more efficient. Yet the actual input power threshold that triggers the transition between the two modes depends on many factors, such as, for example, the configuration of the magnetic field, the plasma density, and the plasma size[4]. Furthermore, when the input heating power passes the aforementioned threshold but a change from L to H mode does not immediately occur, the plasma can be considered to be in a Dithering (D)[1] phase. In this case, a temporary, weak edge transport barrier starts to develop at the plasma edge, only to collapse and reappear in rapid succession[29]. These oscillations then repeat themselves until the plasma transitions into L or H mode. The localization of transitions into, and out of, D mode can, however, be difficult to identify, and there are often disagreements between experts as to which periods of a shot are in a Dithering phase[32].
When the plasma enters H mode, the corresponding accumulation of energy and the large pressure gradient at the plasma edge can trigger the occurrence of Edge Localized Modes (ELMs). These consist of periodic bursts of particles and energy which, if a long amount of time passes between successive ELMs, can impose a significant power load on the divertor, potentially damaging it. However, ELMs also allow for the periodic removal of accumulated impurities from the plasma, and for a relaxation of the plasma density, which can otherwise increase as the H mode progresses, eventually triggering a disruption[33]. On the other hand, frequent, less energetic ELMs lower the power load on the divertor, at the cost of reduced plasma confinement. Thus, tokamak operation requires knowledge of the occurrence of ELMs, in particular for larger machines where ELMs may cause deterioration of in-vessel components. Although several different types of ELMs exist, for the purposes of this work, we did not make any distinctions between them; we train the models to detect all occurring ELMs equally, regardless of their subclass.
4. Methods
To develop a model for this task, we formulate the problem as follows. We observe a sequence of measurements x_t for 0 < t ≤ N from the sensors for each shot. These observations are conditioned on the state of the plasma z_t at the corresponding time t, where z_t ∈ Z and Z = {Low, Dither, High}. Our goal is to find the most likely sequence of states z^N, and the occurrences of ELMs e^N, that explain the observations x^N:

    ẑ^N = arg max_{z^N} Σ_t log p(z_t | x_t, z_{t−1})

    ê^N = arg max_{e^N} Σ_t log p(e_t | x_t)

For this purpose, we develop two models. The first model is trained to detect the transitions between the different states of the plasma, defined as q_t ∈ Q, where Q = {Low→Dither, Dither→Low, Low→High, High→Low, Dither→High, High→Dither, No transition}, and to detect the ELM events as e_t ∈ E, where E = {ELM, No ELM}.

We implement this model with a feed-forward CNN that processes a window of observations x_{t−w}, ..., x_t, ..., x_{t+w} and produces a probability distribution over the transitions, p(q_{z_{t−1}→z_t} | x_{t−w:t+w}), and over the presence of an ELM, p(ELM_t | x_{t−w:t+w}), at t. We now model the probability of transitioning to z_t after being in z_{t−1}, i.e. p(z_t | x_t, z_{t−1}), with p(q_{z_{t−1}→z_t} | x_{t−w:t+w}), where w is the number of observations around t; therefore:

    ẑ^N = arg max_{z^N} Σ_t log p(q_{z_{t−1}→z_t} | x_{t−w:t+w})

In practice, we implement the arg max given above as the state evolution of a finite state machine S_t(z^(a) → z^(b)), where z^(a) and z^(b) are elements of Z and the transition probabilities are given by p(q_{z_{t−1}→z_t} | x_{t−w:t+w}) at time t (see Figure 1).
The evolution of the state machine produces several possible sequences of states, and the one most likely to have generated the observed sequence of transitions can be found through an implementation of the Viterbi algorithm[34].
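To make the decoding step concrete, the following is a minimal numpy sketch of Viterbi decoding over per-step transition probabilities. The function name, the array layout, and the convention of starting in L mode are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def viterbi_states(log_p_trans, states=("Low", "Dither", "High")):
    """Most likely plasma-state sequence given per-step log-probabilities
    of the transitions q_{a->b}; the diagonal entries play the role of
    'no transition'.

    log_p_trans: array of shape (T, S, S), where log_p_trans[t, a, b]
    is the log-probability, at step t, of moving from state a to state b.
    The shot is assumed to start in Low mode (index 0).
    """
    T, S, _ = log_p_trans.shape
    score = np.full(S, -np.inf)
    score[0] = 0.0                       # start in Low
    back = np.zeros((T, S), dtype=int)   # backpointers
    for t in range(T):
        cand = score[:, None] + log_p_trans[t]   # cand[a, b]: from a to b
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]         # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return [states[i] for i in path]
```

In a real run, the (T, S, S) array would be filled from the CNN's per-window outputs p(q | x_{t−w:t+w}) before decoding.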
Figure 1: State machine for processing of the CNN outputs
Figure 2: Representation of how a CNN can be used to model the transitions between different plasma modes. The network's output prediction for a time slice t depends only on the data features in a defined region immediately surrounding t.

The first model can capture well the localized correlations in the signals that indicate a transition of the plasma state. However, it is incapable of capturing the longer-distance correlations that may be present in the signal. To generalize the approach further, we introduce a sequence model that models the full sequence of observations up to time t and produces a probability distribution p(z_t | x_{1:t}) for 0 < t ≤ N, as well as a distribution over the presence of ELMs, p(ELM_t | x_{1:t}). This model is implemented by extending the CNN with a recurrent (LSTM) neural network. In this case, the model now observes a sequence of sliding windows x_{t−w}, ..., x_t, ..., x_{t+w} for each t in the range {1, ..., N}.
Figure 3: Schematic representation of the flow of data inside a convolutional LSTM Neural Network. The network's prediction (i.e. output probability) at any time t of a shot depends not only on whatever features the convolutional layers have extracted from the points immediately around t, but also on features extracted in the past.

The first model has a lower computational complexity and can be trained more efficiently, as we only need windows of the signal with or without the different transitions, but it is limited to the information present in the given window (see Figure 2). Increasing the size of the window that forms the context increases both the complexity of the model and the likelihood of multiple transitions appearing within a single window. The second model addresses these challenges by modeling the sequence rather than a fixed window (see Figure 3). As a sequential model, it has an internal representation of the past observations x_1, ..., x_t, which enables it to weigh in the likelihood of a transition based on information in the more distant past[35]. The LSTM effectively assumes the role of the finite state machine, and so the model can directly model the state of the plasma rather than the transitions. However, it introduces a higher level of complexity, particularly for training, as we need to train on sequences rather than fixed-length windows.

For the purposes of this work, we have assembled a dataset based on the time-traces of four signals originating from the TCV tokamak[36, 37].
We opted, for the purposes of this work, to use the same, limited set of diagnostic signals that experimentalists use to determine, in post-shot analysis, the state of the plasma (Figure 4).
Figure 4: Switches between different plasma modes (Low, Dither and High), and time-traces of the collected signals, TCV shot
Figure 5: ELMs and L and H plasma modes, TCV shot
(i) Photodiode (PD) signal. Corresponds to the measurements given by the photodiode diagnostic at TCV along a vertical chord, measuring the line-integrated emitted visible radiation; the photodiode has an Hα filter which measures radiation at 656.3 nm. Transitions between different plasma states, as well as ELMs, can be most easily observed through analysis of the photodiode (PD) signal (Figure 5). Transitions from L to H mode are characterized by a sudden drop in the baseline value of the signal, whereas transitions back into L mode have the opposite trace, i.e., the baseline PD signal suddenly increases and remains at a steady level. ELMs are characterized by a sudden spike in the PD signal, followed by a relaxation that takes at most 2 ms. D modes generate rapid fluctuations in the signal (see Figure 7); they do not necessarily correspond to a change in the baseline signal value, unless they are followed by a transition into a different state from the one at the point where they started.

(ii) Interferometer (FIR) signal. The interferometers at TCV measure the line-integrated electron density in the plasma along 14 parallel, vertical lines of sight. Of these, we take the mean value, per time instant, of the 12 inner-most detectors. In the interferometer signal, the transition between L and H mode can most easily be seen as a sudden increase in the time derivative of the signal, while transitions back into L mode correspond to a decrease in the derivative. Similarly to what happens with the photodiode signal, ELMs may provoke short (albeit less pronounced) spikes in the FIR signal.

(iii)
Diamagnetic Loop (DML) signal. Refers to the measurement of the total toroidal magnetic flux of the plasma[38]. The derivative of the DML signal frequently switches sign when a transition occurs between L and H mode, as well as when an ELM occurs (Figure 6). Furthermore, the sign of this signal's derivative changes depending on the sign of the plasma current.

(iv)
Plasma Current (IP) signal. Refers to the total plasma electric current. For this work, we use the current value to determine when the actual classification of plasma states should begin. Specifically, we ignore, for classification purposes, time points where the absolute value of the current is lower than 50 kA.
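The gating on plasma current described above can be sketched as a one-line mask; the function name and array convention here are our own, only the 50 kA threshold comes from the text:

```python
import numpy as np

def classification_mask(ip_signal, threshold_amps=50e3):
    """Boolean mask of the time slices that should be classified:
    True wherever the absolute plasma current is at least 50 kA."""
    return np.abs(ip_signal) >= threshold_amps
```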
Figure 6: ELMs, and L and H modes from a section of TCV shot
Figure 7: L, D and H modes from a section of TCV shot
The two proposed models develop different maps. The first model is a map between a fixed window of observations and a distribution over transitions, while the second models a sequence of observations and produces a sequence of states (see Figure 8).

Accordingly, the training data has different arrangements. For transition classification, we need to prepare a dataset D_1 = {(x, q, e)}, where a training point consists of a section of the recorded signal (x_{t−w}, ..., x_t, ..., x_{t+w}), the corresponding label of one of the transitions q_t in Q, and the matching label e_t indicating the presence (or not) of an ELM. For the second model, D_2 = {(x, z, e)}, a training point consists of a sequence of windows of observations drawn from x_t to x_{t+l+w} (where l is a defined sequence length, and w is the window length), a sequence of state labels z_t in Z of length l, with each label corresponding to the state of the plasma at time t, and a sequence of labels e_t of length l corresponding to the presence of an ELM at time t. Figure 9 illustrates this in detail.

There is an inherent uncertainty in the labeling of the ELMs and plasma states, particularly when it comes to transitions into and out of dithers. The raw data only has hard, binary, one-hot encodings[39]; that is, a transition between two states, for example, is labeled as a sudden switch (from one time slice to the next) from one state to another. This means that it is easy to mistakenly label an event or transition in a slightly shifted time slice. This type of hard threshold also makes it difficult for a neural network to generalize outside of its training set[40]. Therefore, for the first model (CNN), we process the target time-series such that the probability of an ELM, or of a given state transition, is a continuous value, starting at zero and peaking at one, with several intermediate probabilities.
In practical terms, we apply to each event a Gaussian smoothing such that, if an ELM or state transition occurs at time t, its probability at that point is 1, and we define an interval Δt before and after t where the probability, respectively, smoothly increases and decreases. We defined these smoothing intervals as corresponding to 2 ms, which, at the defined sampling rate, translates to 20 time slices. We do the same with the states z_t for the second model (Conv-LSTM), such that a switch between two different states, from z_1 to z_2, does not happen immediately from one time slice to the next; rather, the probability of z_1 decreases, while that of z_2 increases, over a span of 20 time slices. This procedure not only models the uncertainty in the labeling process, but also acts as an automatic regularization for the neural network training process, i.e., it makes it easier for the neural network to generalize what it learns to unseen data[41].
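The label-smoothing step can be sketched as follows. This is a minimal numpy illustration under our own assumptions (in particular the choice of sigma, and taking the pointwise maximum where neighbouring events overlap), not the paper's exact code:

```python
import numpy as np

def smooth_event_labels(event_times, n_slices, half_width=20):
    """Turn hard one-hot event labels into smooth target probabilities.

    An event at slice t gets probability 1.0 at t, decaying as a Gaussian
    over roughly `half_width` slices on each side (20 slices = 2 ms at
    the 10 kHz sampling rate). Sigma is an illustrative choice.
    """
    target = np.zeros(n_slices)
    sigma = half_width / 3.0  # ~99.7% of the bump within +/- half_width
    offsets = np.arange(-half_width, half_width + 1)
    bump = np.exp(-0.5 * (offsets / sigma) ** 2)
    for t in event_times:
        lo, hi = max(0, t - half_width), min(n_slices, t + half_width + 1)
        seg = bump[lo - (t - half_width): hi - (t - half_width)]
        # overlapping events keep the larger probability
        target[lo:hi] = np.maximum(target[lo:hi], seg)
    return target
```

The same idea, applied to the state labels z_t, yields the cross-fading state probabilities used for the Conv-LSTM targets.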
Figure 8: Representation of the different types of encoding of the target "smooth" data distributions, to be learned by the two classifiers, from TCV shot
Figure 9: Representation of the sliding temporal windows fed to the CNN on top of the PD signal, and their corresponding ELM probability output. At inference time, these windows slide over the 4 signals across the whole shot, each of them rendering an output probability for a single time slice.

The choice of the size of the temporal windows with which the CNN is trained is a trade-off between the assumptions made about the data and computational feasibility. Larger windows contain more information and thus, intuitively, should make the classification at a particular time slice more precise, but they also make training and inference by the network slower. Smaller windows contain arguably less information, but can be processed faster. We opted to train the CNN with temporal windows with a length of 20 ms, which we judged to be a good compromise between those two requirements. At our sampling rate, these windows are 200 time slices long. This is illustrated in Figure 9: the green region represents a window of signals (in this case, only the PD signal) which is fed to the neural network, together with its associated targets, p(e_t | x_{t−w1:t+w2}) and p(q_{z_{t−1}→z_t} | x_{t−w1:t+w2}), where w1 = 180 and w2 = 20; the classified time slice t is thus offset from the end of the window. In practice, in a real-time setting, that offset would constitute a minimum delay between the occurrence of an event in the machine and a detection by the classifier. Once again, the size of this offset is a trade-off: a smaller offset is ideal for real-time applications because it gives more time for feedback control mechanisms, but it also contains less information for the network to accurately classify an event.

We train the Conv-LSTM not with windows, but with sequences of windows. The distinction is an important one, for it implies different assumptions about the data.

Figure 10: Example of a sequence fed to the LSTM.
At a 10 kHz sampling rate, it consists of 200 overlapping temporal windows of length 40. The output probability for a given window depends not only on what data features are present in that window, but also on the past windows in the sequence.

In the case of the windows fed to the CNN, it is assumed that each window is independent of the others. In the data fed to the Conv-LSTM, each sequence itself is composed of several windows, with future windows depending on past ones. We defined each of those sequences to consist of 200 windows (since that was also the length of the windows fed to the CNN). In this case, each of the individual windows has a length of 4 ms (40 time slices), with an offset of 2 ms, as in the data for the CNN (see Figure 10). The sequences have a stride[42] of 1: each window starts exactly 1 time slice after the previous one. Each of these sequences is randomly subsampled from the whole shots, and the corresponding targets for them are chosen randomly from one of the three labelers.

Although not all of these subsamples start in L mode, our expectation is that the network would learn by itself that an actual shot always begins in that state. There are several reasons for this. First, the network will learn to recognize any features in the subsequences that are consistent with the beginning of a shot, and learn that those features correlate with L mode. Second, even if some training sequences start in D or H mode, the network will statistically learn that these modes are more frequently the result of a transition from a previous mode.
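The construction of one such training sequence can be sketched as below; this is a minimal numpy version, with names and the array layout chosen by us for illustration:

```python
import numpy as np

def make_lstm_sequence(signals, start, n_windows=200, win_len=40):
    """Build one Conv-LSTM training sequence: `n_windows` overlapping
    windows of `win_len` slices each, advancing with a stride of 1 slice.

    signals: array of shape (n_slices, n_channels), e.g. the four
    channels PD, FIR, DML and IP.
    Returns an array of shape (n_windows, win_len, n_channels).
    """
    return np.stack([signals[start + i: start + i + win_len]
                     for i in range(n_windows)])
```

In training, `start` would be drawn at random from each shot, implementing the random subsampling described above.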
The architecture of the neural network used for transition detection starts with a 1-D convolutional input with four channels, which receive the values of the PD, FIR, IP and DML signals. This is followed by several convolutional layers, interspersed with pooling and dropout layers, which are trained for feature extraction, with deeper layers extracting higher-level data features (Figure 11). The last layers of the network are fully connected, and are responsible for receiving the extracted high-level features and producing an appropriate output, i.e., the desired classification. This model is loosely inspired by the VGG architecture for classification of images, where fixed-size filters are used[43].
Figure 11: Architecture of the Convolutional NN: Conv1D(64,3) → Conv1D(128,3) → Dropout(0.5) → Maxpool(2) → Conv1D(256,3) → Conv1D(256,3) → Conv1D(256,3) → Dropout(0.5) → Maxpool(2) → Conv1D(256,3) → Conv1D(256,3) → Conv1D(256,3) → Dropout(0.5) → Maxpool(2) → Dense(64) → Dense(16) → Dense(7)/Dense(2)

Our convolutional LSTM network builds on top of the CNN model that showed the best performance on the transition detection task. We add a recurrent layer that processes the output of the CNN to capture the longer-distance correlations in the data (Figure 12). We designed the networks using the Keras framework for Deep Learning[44]. Both networks used a categorical cross-entropy loss function, and were trained with the Adam optimizer[45] using the default learning rate value provided by Keras.
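For reference, the categorical cross-entropy loss that both networks minimize has the following form; this is a plain numpy sketch of the standard definition, not Keras's internal implementation:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch.

    y_true: one-hot (or smoothed) target distributions, shape (n, k).
    y_pred: softmax network outputs, shape (n, k).
    Predictions are clipped away from 0 for numerical stability.
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```

Note that because the targets here are the smoothed probabilities described earlier, y_true need not be strictly one-hot.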
Figure 12: Architecture of the convolutional LSTM: a time-distributed convolutional stack (Conv1D(64,3) → Conv1D(128,3) → Dropout(0.5) → Maxpool(2) → Conv1D(256,3) → Conv1D(256,3) → Conv1D(256,3) → Dropout(0.5) → Maxpool(2) → Conv1D(256,3) → Conv1D(256,3) → Conv1D(256,3) → Dropout(0.5) → Maxpool(2) → Dense(64) → Dense(16)), followed by LSTM(32) → LSTM(32) → Dense(32) → Dropout(0.5) → Dense(3)/Dense(2). All layers and nodes use ReLU activation functions, apart from the final output layer, which uses Softmax activation.
In total, we possessed 54 shots fully labeled by the three experts. In a typical Deep Learning setting, some sort of normalization[46] is usually applied to the available data. The most common procedure would have been to normalize across the entire dataset. However, because of the different calibrations of the PD signals and the consequent large variance and multimodal distribution associated with them, we decided, at this stage, to normalize each shot separately, dividing each signal in each shot by its own mean across the whole shot. For potential real-time applications, as any new shots could fall outside the normalization range, the procedure would require grouping and normalizing the shots with respect to the different signal gains and calibrations.

From these normalized full sequences, we draw batches of smaller temporal windows and subsequences to train the neural networks. There are several reasons for this subsampling. First, the full shot time-series are up to about 20,000 time slices long, but the actual length of a shot can vary significantly. Yet, for purposes of training the networks, we require batches of data of fixed length, which can be achieved by subsampling from the full sequences. Second, this method allows us to automatically perform data augmentation for training, since one long sequence contains many shorter subsequences and windows. Third, feeding very large temporal windows to a CNN would be computationally difficult, as the number of network parameters requiring training would grow considerably. Finally, the distribution of the data in the full sequences is highly unbalanced: in most shots, dithering phases are significantly shorter than L and H phases; only a few dozen transitions happen at most per shot; and some transitions tend to be more frequent than others. Training with whole sequences would significantly bias the networks towards the events and transitions that occur more frequently in the labeled data.
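The per-shot normalization described above amounts to the following (a short numpy sketch; the function name is our own):

```python
import numpy as np

def normalize_shot(signals):
    """Per-shot normalization: divide each channel by its own mean over
    the whole shot. signals: array of shape (n_slices, n_channels)."""
    return signals / signals.mean(axis=0, keepdims=True)
```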
Drawing subsequences allows us to control the data fed to the network such that this inherent bias is mitigated. To do this, the training data batches must be balanced, i.e., generated such that they contain roughly equal fractions of the different types of events and/or transitions of interest. For the CNN, there are 8 possible events of interest: LH, HL, HD, DH, LD, DL, ELM, and no transition. Generating batches for the CNN means that, for a batch containing n data samples, n/8 of those samples will correspond to each of those different event types. Similarly, for the Conv-LSTM, the batches are generated such that the three target distributions (L, D and H) each correspond to approximately 1/3 of the data samples.
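Balanced batch generation can be sketched as follows. The equal per-class fractions come from the text; sampling with replacement is our own assumption for coping with the rarer classes (e.g. dithers):

```python
import random

def balanced_batch(samples_by_class, batch_size):
    """Draw a batch with equal fractions per class.

    samples_by_class: dict mapping each class label (e.g. the 8 CNN
    events LH, HL, HD, DH, LD, DL, ELM, no-transition) to its list of
    training samples. Samples are drawn with replacement, since rare
    classes have far fewer examples than common ones.
    """
    classes = list(samples_by_class)
    per_class = batch_size // len(classes)
    batch = []
    for c in classes:
        batch += [random.choice(samples_by_class[c]) for _ in range(per_class)]
    random.shuffle(batch)
    return batch
```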
5. Evaluation metrics
We consider a detection of a single, discrete ELM by the networks to correspond to a point in time (in a shot) where the direct network output for the ELM probability ê_N reaches a local maximum. This is not necessarily a point where the output probability for an ELM is 1, but rather a point t where the output probability P(ELM_t) follows a series of strictly increasing probability values and precedes a series of strictly decreasing ones. Because we defined the length of the Gaussian smoothing of the probabilities as 20 time slices, we here consider a local maximum of P(ELM_t) within a 20-slice-wide interval to correspond to the detection of a single ELM, which we denote as a positive. The remaining points are considered non-detections, i.e., negatives. In addition, we defined different probability thresholds for what can be considered a detection of an ELM by the network. For example, a threshold of 50% implies that only ELM probability maxima above that value are considered positives.

Positives and negatives must then be compared to the labeled ELMs. To that end, we build the ELM confusion matrix, which defines several variables: negatives that match their label at the same point in time are True Negatives (TN), while those that do not are False Negatives (FN). Similarly, positives that match their label are True Positives (TP), and those that do not are False Positives (FP). Using this method to determine the points at which the network detects individual ELMs, one can then compute the True Positive Rate (TPR) and False Positive Rate (FPR) for different detection thresholds:

TPR = TP / (TP + FN)    (1)
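A possible implementation of this local-maximum criterion, assuming a 1-D array of smoothed per-slice ELM probabilities, is sketched below; the function is illustrative and not the actual detection code:

```python
import numpy as np

def detect_elms(elm_prob, window=20, threshold=0.5):
    """Mark time slices where the smoothed ELM probability is a strict
    local maximum within a `window`-wide interval and exceeds `threshold`.

    elm_prob: 1-D array of per-slice ELM probabilities.
    Returns a list of indices of detected ELMs (positives).
    """
    half = window // 2
    detections = []
    for t in range(1, len(elm_prob) - 1):
        lo, hi = max(0, t - half), min(len(elm_prob), t + half + 1)
        seg = elm_prob[lo:hi]
        # A positive: above threshold, the maximum of its window, and a
        # strict peak (rising before, falling after).
        if (elm_prob[t] >= threshold and elm_prob[t] == seg.max()
                and elm_prob[t] > elm_prob[t - 1]
                and elm_prob[t] > elm_prob[t + 1]):
            detections.append(t)
    return detections
```

Varying the `threshold` argument reproduces the family of detectors whose TPR/FPR trade-off is analyzed below.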
FPR = FP / (FP + TN)    (2)

Plotting the TPR versus the FPR for a series of different detection thresholds yields the classifier's ROC curve[47], which illustrates the network's capacity for discrimination at different detection thresholds. There are several ways to compute the ideal detection threshold based on the ROC curve, depending on the task at hand. In our case, we use the Youden index[48], whereby the best threshold is the value which maximizes the difference TPR − FPR, the maximum possible value being 1.
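The ROC construction and Youden-based threshold selection can be sketched as follows, given per-point detection scores and binary ELM labels; this is an illustrative sketch, not the evaluation code used in the paper:

```python
import numpy as np

def roc_and_youden(scores, labels, thresholds):
    """Compute (FPR, TPR) pairs for each threshold, plus the threshold
    that maximizes the Youden index TPR - FPR.

    scores: array of detection scores (e.g. ELM probability maxima).
    labels: binary array, 1 where a labeled ELM is present, 0 otherwise.
    Returns (curve, (best_youden, best_threshold)).
    """
    best = (-1.0, None)  # (Youden index, threshold)
    curve = []
    for th in thresholds:
        pred = scores >= th
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        tn = np.sum(~pred & (labels == 0))
        tpr = tp / (tp + fn) if tp + fn else 0.0  # equation (1)
        fpr = fp / (fp + tn) if fp + tn else 0.0  # equation (2)
        curve.append((fpr, tpr))
        if tpr - fpr > best[0]:
            best = (tpr - fpr, th)
    return curve, best
```

A perfect classifier places some threshold at (FPR, TPR) = (0, 1), giving a Youden index of 1.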
To compare the models' accuracy with that of the human labelers, we use Cohen's kappa coefficient, which measures the agreement between two sets of categorical data[49], defined as

κ = (p − p_e) / (1 − p_e)    (3)

where p denotes the actual relative agreement between the two sets, and p_e denotes the probability of the two sets randomly agreeing with each other. Generically, the κ coefficient's values range between 0 and 1, the former indicating poor performance and the latter indicating perfect performance. In our case, given two sequences z and z′ of plasma states, Cohen's kappa measures the overlap between them: if z_t = z′_t for all time instants t, the metric will yield a score of 1; if there are mismatches between the two sequences, the score will go down.

The κ-statistic can be interpreted differently based on the sections of the data for which it is computed. For that reason, we now define several variables that allow us to interpret the κ-statistic scores. Recall that we possess labels drawn from three different experts; as such, the labeled shot state at each point in time t of a shot falls into one of three possible categories:

• No majority agreement, i.e., all labelers disagree as to what state the plasma is in, which we denote as category C_0.
• Majority agreement, i.e., two labelers agree on the state of the plasma, while one disagrees, which we denote as category C_1.
• Consensual agreement, i.e., all labelers agree as to what state the plasma is in, which we denote as category C_2.

We define the union of C_1 and C_2 as the ground truth (C_GT); these are the sections of shots where there is at least a majority opinion as to what state the plasma is in.
We also have, for each shot, the most likely sequence ẑ_N of states (given the observed data) produced by the neural networks, which we denote as C_N. Computing the κ-statistic score κ_l between the individual labelers' sequences and the ground truth C_GT gives an indication of the probability that a single labeler disagrees with the ground truth: a κ_l score of 1 would indicate that all labelers agree all the time, while a lower score would indicate that, at least some of the time, one labeler disagrees with the others. Similarly, computing the κ-statistic score κ_n between the sets C_N and C_GT gives an indication of the networks' performance with respect to the ground truth. In addition, we can directly compare κ_l and κ_n; this comparison allows us to test how a network and a single labeler compare against each other, on average, given the ground truth.

The κ coefficient is calculated separately for each of the three possible labels for the plasma state (L, D and H), and as a weighted mean across all three states. The weights of that mean are taken to be the relative frequencies of each individual state in the dataset, based on the ground-truth (C_GT) labels.
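For concreteness, the unweighted κ between two state sequences can be computed as below; this is a standard textbook implementation, not the authors' code, and the per-state and frequency-weighted variants used in the paper follow by restricting to, or averaging over, the individual states:

```python
from collections import Counter

def cohens_kappa(z1, z2):
    """Cohen's kappa between two equal-length sequences of categorical
    states: kappa = (p - p_e) / (1 - p_e), where p is the observed
    agreement and p_e the chance agreement."""
    assert len(z1) == len(z2)
    n = len(z1)
    p = sum(a == b for a, b in zip(z1, z2)) / n  # observed agreement
    c1, c2 = Counter(z1), Counter(z2)
    # Chance agreement: probability that two independent raters with
    # these marginal state frequencies pick the same state.
    p_e = sum(c1[s] * c2[s] for s in set(z1) | set(z2)) / n**2
    return (p - p_e) / (1 - p_e) if p_e != 1 else 1.0
```

Identical sequences score 1; note that the general definition can dip below 0 for worse-than-chance agreement, although the scores reported below all fall in [0, 1].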
6. Results
We performed several training runs using the data labeled by the three experts; we carried out experiments where we trained both models (CNN and Conv-LSTM) three times, each time randomizing the training and test shots, to test whether differences in the data could lead to different results. In a typical Deep Learning setting, the data is usually split so that approximately 80-90% is used for training and the remaining 10-20% is used for validation of the results, i.e., for testing the network's capability to accurately predict on data that was not used for training. In our case, we opted for a 50% training/test split: of the 54 shots, we used 27 for training and 27 for testing. The results that follow are the best results of those three experiments, for each model. We also experimented with varying offsets (see Figure 9) for the convolutional windows to see what effect that factor could have on the results; we settled on an offset value of 2 ms (20 time slices), as smaller offsets degraded the results, while larger ones did not improve them. We computed the metric scores on the training and test data at several points during training to control for overfitting[50], and we present the results from the epoch where the state detection results on the test data were highest. We ran the neural networks on an NVIDIA Quadro RTX 5000 GPU.

We computed the κ-statistic based on the regions defined in Subsection 5.2: scores based on the network output versus the ground truth (κ_n), and scores based on labeler disagreement versus the ground truth (κ_l). We computed the scores on a per-state (L, D and H) basis, and also computed a mean of the values obtained for each state.

We trained the CNN for 250 epochs, allowing the loss function to stabilize; each epoch consisted of 32 batches, with each batch containing 64 data samples. Upon completion of training, we tested the CNN's accuracy on both the training and test data. The model's results on ELM classification (ROC curve) can be seen in Figure 13.
Table 1 shows the scores κ_n and κ_l for the entire dataset, while Figure 14 contains histograms showing the distribution of κ_n on a per-shot basis.

            L      D      H      Mean
κ_n  Train  0.691  0.358  0.657  0.649
     Test   0.219  0.115  0.157  0.182
κ_l  Train  0.937  0.896  0.987  0.958
     Test   0.941  0.848  0.986  0.962

Table 1: κ-statistic scores (κ_n and κ_l) for each plasma mode and as a mean, on training and test data (values across all shots), for the CNN.

Figure 13: ROC curves for ELM detection for the CNN model, on (a) training and (b) test data. The Youden indexes are 0.993 and 0.99 for the two sets, respectively; using the ideal threshold for the training data (0.2) on the test data gives a slightly lower Youden index.
Figure 14: Distribution of the κ-statistic score (κ_n) on a per-shot basis, for the CNN, on (a) training and (b) test data.

We trained the convolutional LSTM for 400 epochs, allowing the loss function to stabilize. Each epoch consisted of 64 batches, with each batch containing 64 data samples. The results of computing the scores κ_l and κ_n, using the same definitions as for the CNN, can be seen in Table 2. The ROC curves detailing the results on ELM detection can be seen in Figure 15. Figure 16 contains histograms showing the κ_n scores on a per-shot basis.

            L      D      H      Mean
κ_n  Train  0.96   0.889  0.967  0.96
     Test   0.82   0.766  0.85   0.832
κ_l  Train  0.96   0.94   0.992  0.98
     Test   0.901  0.808  0.98   0.935

Table 2: κ-statistic scores (κ_n and κ_l) for each plasma mode on training and test data, for the Conv-LSTM.

Figure 15: ROC curves for ELM detection for the Conv-LSTM model, on (a) training and (b) test data. The Youden indexes are 0.977 and 0.969 for the two sets, respectively; using the ideal threshold for the training data (0.5) on the test data gives a slightly lower Youden index.
Figure 16: Distribution of the κ-statistic score (κ_n) on a per-shot basis, for the Conv-LSTM, on (a) training and (b) test data.

A comparison of the κ_n scores on training and test data for each classifier shows that the convolutional LSTM performs better than the CNN for all three plasma states. Furthermore, looking at the distribution of the mean κ_n scores on a per-shot basis through the histograms, one can see that the worst Conv-LSTM classifications do not score lower than 0.6 on training data, while for the CNN, even on training data, mean κ_n scores lower than 0.2 exist. For both classifiers, the performance on training data surpasses that on test data, both on a state-by-state basis and as a mean across all states, which indicates the occurrence of overfitting.

For both networks, an analysis of the κ_l scores on their training and test data indicates that human labeler disagreement is highest for dithers: the scores for that particular state are consistently lower. Interestingly, both networks also score their lowest results for dithers.

Comparing the Conv-LSTM's κ_l and κ_n scores shows that, at least on training data, the network behaves, on average, similarly to a single human labeler, making errors (or disagreeing with the ground truth) at approximately the same rate: the mean κ_l score for training data is 0.98, while the mean κ_n score for training data is 0.96. On test data, the Conv-LSTM performs slightly worse than a single human labeler, as seen by the fact that the network's mean κ_n score on test data is 0.832, while κ_l is 0.935.

As measured by the Youden index, we achieve excellent performance in the detection of ELMs on both training and test data using both models; the ideal detection thresholds generate true positive detection rates very close to 1, while bringing false positive detection rates essentially to 0. The Youden indexes for test data are only slightly lower than for training data, which suggests that overfitting is minimal.
Furthermore, for both models, on both training and test data, the ROC curves' points are mostly concentrated close to True Positive Rates of 1 and False Positive Rates of 0, which indicates that the choice of ELM detection threshold does not significantly change the behavior of the classifiers. Finally, the fact that the scores for ELMs are essentially the same for both models indicates that the features in the data which allow for the identification of ELMs are mostly local: the CNN, even without knowledge of long-term temporal correlations, performs excellent classification.

Because the Conv-LSTM has the highest κ_n scores, we made a case-by-case analysis of that network's classification of all our available shots. Broadly, the Conv-LSTM's results on state detection, on a per-shot basis, can be placed into six different categories:

(i) A (sometimes very) short detection of a dither that is not labeled in the data. Due to the way κ_n is computed, a mistaken dither classification by the network at a single time point (in a whole sequence), in a shot which has no regions where the ground truth is dithering, will bring the score for that state down to 0, even if the remainder of the shot is correctly classified (17 shots);
(ii) A clearly incorrect classification of a temporal region of a shot as being in a dithering state (4 shots);
(iii) A missed detection of an L-H transition (1 shot);
(iv) A missed detection of an H-L transition (2 shots);
(v) An overall bad detection across an entire shot (7 shots);
(vi) An overall good detection across an entire shot (23 shots).

Table 3 lists 6 shots which are representative of each of the types of results listed above. The table shows the computed κ_n scores for each of those shots on a per-state basis, as well as the score's mean value, and the fraction of time, in the ground truth of each shot, that a particular state is labeled. The table also lists which of the 6 cases above each shot is representative of.
Figures 17 to 22 are plots of those same shots, where the background color in the top plot denotes the state detected by the Conv-LSTM, and in the bottom plot denotes the ground truth label. Small gray areas in the bottom plot denote regions where the ground truth is not defined, i.e., where there is no majority agreement between the labelers.

Case  Shot ID  L (Fraction / Score)  D (Fraction / Score)  H (Fraction / Score)  Mean
1     57751    0.756 / 0.97          0     / 0             0.243 / 0.97          0.97
2     34010    0.679 / 0.856         0.073 / 0.232         0.248 / 0.602         0.748
3     58182    0.22  / 0.912         0.095 / 0.969         0.685 / 0.927         0.928
4     30197    0.951 / 0.384         0     / 1             0.049 / 0.384         0.384
5     33459    0.811 / 0.662         0     / 0             0.189 / 0.846         0.697
6     33942    0.455 / 0.953         0.183 / 0.884         0.412 / 0.997         0.962

Table 3: Kappa statistic (κ_n) scores for each plasma mode on training and test data for selected shots representative of each of the six result categories.
Figure 17: TCV shot 57751 (result category 1 in Table 3).

Figure 18: TCV shot 34010 (result category 2 in Table 3): the network at one point (incorrectly) switches back to dithering.

Figure 19: TCV shot 58182 (result category 3 in Table 3): the network incorrectly switches back to L mode and remains there until the first ELMs (spikes in the PD signal) appear.

Figure 20: TCV shot 30197 (result category 4 in Table 3).

Figure 21: TCV shot 33459 (result category 5 in Table 3): immediately after classifying a D mode, the network oscillates between L and H in quick succession, which to the naked eye might appear in this plot as a gray area; in reality, it is an artifact of the plot, with alternating red and green regions.

Figure 22: TCV shot 33942 (result category 6 in Table 3).
7. Conclusions
We have developed two Deep Learning-based classifiers to perform automatic detection of ELMs and classification of plasma modes. The task was two-fold: on the one hand, to perform a binary classification, for each time slice of a plasma shot, of whether an ELM is occurring or not; on the other, to automatically determine which plasma mode (or, alternatively, which transition between plasma modes) is occurring. One approach is to use a Convolutional Neural Network (CNN), which uses only local correlations in the data to perform classification. The second approach uses a Convolutional LSTM (Conv-LSTM) neural network, which also takes advantage of long-term temporal correlations in the data.

On ELM detection, the two networks achieve essentially equal results. On plasma state classification, a clear difference can be seen between the results obtained with the CNN and those obtained with the Conv-LSTM. Comparing the κ_n scores of each network shows that the Conv-LSTM's scores are clearly higher, which suggests that, at least when it comes to the detection of plasma modes, the processing of long-term correlations in the data facilitates accurate classification. There is some indication that overfitting occurred. However, our monitoring of the training progression indicated that, while the metric values for test data were always lower, they did nevertheless improve as training progressed. Thus, an overfitting-avoidance strategy such as early stopping would, in this case, not have helped achieve better test accuracy. While the results from the Conv-LSTM are better, that network is also more complex, with both training and inference taking longer.

Although this work used data from the TCV tokamak, it should also be possible to adapt it to other machines; as a matter of fact, the data sources used exist on most tokamaks.
As long as the data fed to the neural networks is from those same sources, this model could in principle be used for automatic labeling of shots from a number of different machines.

Acknowledgements
This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 and 2019-2020 under grant agreement No 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission. We would like to express our gratitude to B. Labit, R. Maurizio and O. Sauter at SPC/EPFL for taking the time to manually label the data used for training. This work was supported in part by the Swiss National Science Foundation.

References

[1] Zhang T, Gao X, Zhang S, Wang Y, Han X, Liu Z, Ling B, Team E et al. Physics Letters A
[2] et al. Nuclear Fusion
[3] et al. Physics of Plasmas
[4] et al. Journal of Physics: Conference Series vol 123 (IOP Publishing) p 012033
[5] Ryter F, Buchl K, Fuchs C, Gehre O, Gruber O, Herrmann A, Kallenbach A, Kaufmann M, Koppendorfer W, Mast F et al. Plasma Physics and Controlled Fusion A99
[6] Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A 2015 Going deeper with convolutions The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
[7] Ince T, Kiranyaz S, Eren L, Askar M and Gabbouj M 2016 IEEE Trans. Industrial Electronics
[8] Advances in Neural Information Processing Systems 25 ed Pereira F, Burges C J C, Bottou L and Weinberger K Q (Curran Associates, Inc.) pp 1097-1105
[9] Tompson J, Goroshin R, Jain A, LeCun Y and Bregler C 2015 Efficient object localization using convolutional networks The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
[10] Lähivaara T, Kärkkäinen L, Huttunen J M and Hesthaven J S 2018 The Journal of the Acoustical Society of America
[11] Computers in Biology and Medicine
[12] Journal of Sound and Vibration
[13] Journal of Chemometrics e2977
[14] Golik P, Tüske Z, Schlüter R and Ney H 2015 Convolutional neural networks for acoustic modeling of raw time signal in LVCSR Sixteenth Annual Conference of the International Speech Communication Association
[15] Kiranyaz S, Ince T and Gabbouj M 2015 IEEE Transactions on Biomedical Engineering
[16] Expert Systems with Applications
[17] Thirteenth Annual Conference of the International Speech Communication Association
[18] Ma X, Tao Z, Wang Y, Yu H and Wang Y 2015 Transportation Research Part C: Emerging Technologies
[19] arXiv preprint arXiv:1604.01729
[20] Webster A and Dendy R 2013 Physical Review Letters
[21] Plasma Physics and Controlled Fusion
[23] Shousha R et al.
[24] et al. Nuclear Fusion
[25] et al. Plasma Physics and Controlled Fusion
[26] Murari A, Vagliasindi G, Zedda M K, Felton R, Sammon C, Fortuna L and Arena P 2006 IEEE Transactions on Plasma Science
[27] Plasma Physics and Controlled Fusion
[28] et al. Plasma Physics and Controlled Fusion
[29] et al. Nuclear Fusion
[30] et al. Physical Review Letters
[31] Phys. Rev. Lett. (22) 2276-2279
[32] Basse N, Zoletnik S, Antar G, Baldzuhn J, Werner A et al. Plasma Physics and Controlled Fusion
[34] Speech and Language Processing, second edition (Pearson Education)
[35] Boulanger-Lewandowski N, Bengio Y and Vincent P 2013 High-dimensional sequence transduction (IEEE) pp 3178-3182
[36] Hofmann F, Lister J, Anton W, Barry S, Behn R, Bernel S, Besson G, Buhlmann F, Chavan R, Corboz M et al. Plasma Physics and Controlled Fusion B277
[37] Coda S et al. Nuclear Fusion
[38] Moret J M, Buhlmann F and Tonetti G 2003 Review of Scientific Instruments
[39] Digital Design and Computer Architecture; Computer Organization Bundle, VHDL Bundle (Elsevier Science) ISBN 9780080547060
[40] Szegedy C, Vanhoucke V, Ioffe S, Shlens J and Wojna Z 2016 Rethinking the inception architecture for computer vision Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp 2818-2826
[41] Zheng Q, Yang M, Yang J, Zhang Q and Zhang X 2018 IEEE Access
[42] stat
[43] arXiv preprint arXiv:1409.1556
[44] Chollet F et al. Keras https://keras.io
[45] Kingma D P and Ba J 2014 arXiv preprint arXiv:1412.6980
[46] Han J, Pei J and Kamber M 2011 Data mining: concepts and techniques (Elsevier)
[47] Fawcett T 2006