Automatic classification of plasma regions in near-Earth space with supervised machine learning: application to Magnetospheric Multi Scale 2016-2019 observations
Hugo Breuillard, Romain Dupuis, Alessandro Retino, Olivier Le Contel, Jorge Amaya, Giovanni Lapenta
Hugo Breuillard
Laboratoire de physique des plasmas (LPP), Palaiseau, France
[email protected]

Romain Dupuis
Center for Mathematical Plasma Astrophysics (CmPA), KU Leuven, Leuven, Belgium

Alessandro Retino
Laboratoire de physique des plasmas (LPP), Palaiseau, France

Olivier Le Contel
Laboratoire de physique des plasmas (LPP), Palaiseau, France

Jorge Amaya
Center for Mathematical Plasma Astrophysics (CmPA), KU Leuven, Leuven, Belgium

Giovanni Lapenta
Center for Mathematical Plasma Astrophysics (CmPA), KU Leuven, Leuven, Belgium

September 2, 2020

ABSTRACT
The proper classification of plasma regions in near-Earth space is crucial to perform unambiguous statistical studies of fundamental plasma processes such as shocks, magnetic reconnection, waves and turbulence, jets and their combinations. The majority of available studies have been performed by using human-driven methods, such as visual data selection or the application of predefined thresholds to different observable plasma quantities. While human-driven methods have allowed performing many statistical studies, these methods are often time-consuming and can introduce important biases. On the other hand, the recent availability of large, high-quality spacecraft databases, together with major advances in machine-learning algorithms, now allows meaningful applications of machine learning to in-situ plasma data. In this study, we apply the fully convolutional neural network (FCN) deep machine-learning algorithm to the recent Magnetospheric Multi Scale (MMS) mission data in order to classify ten key plasma regions in near-Earth space for the period 2016-2019. For this purpose, we use available intervals of time series for each such plasma region, which were labeled by using human-driven selective downlink applied to MMS burst data. We discuss several quantitative parameters to assess the accuracy of both methods. Our results indicate that the FCN method is reliable to accurately classify labeled time series data since it takes into account the dynamical features of the plasma data in each region. We also present good accuracy of the FCN method when applied to unlabeled MMS data. Finally, we show how this method used on MMS data can be extended to data from the Cluster mission, indicating that such a method can be successfully applied to any in-situ spacecraft plasma database.

Keywords: heliophysics, classification, near-Earth regions, magnetospheric multiscale mission, time series, machine learning

Near-Earth space region classification is a very challenging topic in space physics because of the strong plasma dynamics resulting from the interaction between the propagating solar wind and the standing Earth's magnetosphere. This obstacle forms a collisionless bow shock (Kivelson et al., 1995) which can reflect and backpropagate energetic ions into the supersonic solar wind, forming the ion foreshock region upstream.
However, most of the plasma is decelerated and heated downstream to a subsonic flow which forms a sheath around the Earth's magnetosphere called the magnetosheath. The boundary on which magnetosheath and magnetospheric pressures balance is called the magnetopause. The plasma flows along the magnetopause towards the magnetotail, driving the frozen-in magnetic field lines through the lobes, the plasma sheet boundary layer and ultimately the plasma sheet, in which they eventually reconnect (Dungey, 1963). This large-scale solar-terrestrial interaction process is dominated by the highly variable solar wind conditions, but also by localized and intermittent small-scale processes in each region. As a result, the plasma throughout near-Earth space is highly dynamic, which makes it impossible to classify the different regions using only spacecraft location along its orbit.

Yet, this classification is needed for statistical studies of the plasma properties of the different regions. These large-scale statistics are allowed by the ever-growing quantity of available in-situ data from space missions that cover most of near-Earth space. A large amount of available data has been used to identify near-Earth space regions using human-driven (i.e., threshold-based) methods, e.g., (Jelínek et al., 2012). However, finding the optimal set of thresholds which characterizes every region of the magnetosphere is time-consuming and presumably not flexible enough with regard to the large variability expected from the strong plasma dynamics. This may introduce important biases depending on the chosen thresholds, and may also result in many "unclassified" regions (i.e., a significant loss of information).

One way to bypass this issue and benefit from this large amount of data is to make use of supervised learning, the most common form of machine learning. The role of supervised algorithms is to learn the relationship between data instances and an associated label for each data instance.
For example, the text from an email may represent the data instance and the label is a binary value encoding the presence or not of spam, in order to detect and filter spam from emails. Supervised methods aim at expanding human knowledge automatically by identifying the intrinsic differences between labeled points in a dataset. This knowledge is then generalized to other unlabeled datasets. Such machine learning techniques show extremely promising (often state-of-the-art) results in various tasks, such as image and speech recognition, analyzing particle accelerator data, or natural language understanding (LeCun et al., 2015). They have already been used specifically in heliophysics: to classify time series into several categories defined by the solar origin of the wind (Camporeale et al., 2017), to detect magnetospheric ultra-low frequency waves (Balasis et al., 2019), to forecast solar flares (Nishizuka et al., 2017), or in particular to identify different regions of near-Earth space (Olshevsky et al., 2019; Nguyen et al., 2019). Nevertheless, these models usually remain region- or mission-specific because the process of human labeling is a time-consuming and tedious task, and there is still a blatant lack of human labels to cover the whole near-Earth space.

However, the Magnetospheric Multi Scale mission (MMS) (Burch et al., 2016), launched in 2015 and whose orbit covers most of near-Earth space, has enabled such a dataset to be built. This mission uses the so-called Scientist In The Loop (SITL) system (see the following section for more details) to human-label phenomena and regions of interest. In the SITL process, an expert scientist is designated to select data of interest that will be transmitted to the ground as high-time-resolution (burst) data. The selected intervals are tagged with timestamps and a comment, which are reported in the SITL reports. This system has been active since July 2015 and thus provides more than 4 years of commented data as of now. This makes it the biggest dataset of commented events and regions to our knowledge. We developed an automatic parser to convert all the comments, written by different scientists, into a unified list of labels that can be used by a machine learning algorithm.

All SITL comments are associated with a time interval, therefore the temporal variations of the measurements can be used to predict the label of the signals.
Thus, each data instance can be interpreted as a time series, and near-Earth classification using SITL data can be seen as a time series classification (TSC) task, a very challenging problem in data mining and machine learning (Esling and Agon, 2012; Fawaz et al., 2019). The main difference with classical classification problems comes from the ordering of the data: the important features helping to discriminate the labels are mainly found in the ordering of the values, while classical methods only consider the values at a given time. We employ two different techniques to solve this multivariate and multicategory classification task: the multilayer perceptron, which is applied naively on instantaneous data and is considered as a baseline for comparison, and the convolutional neural network, which is applied on time series.

In this article, we use the magnetic field and particle measurements of the MMS mission to establish a vast and reproducible automatic detection of 10 plasma regions that cover almost the entire near-Earth space. First, we introduce the MMS and Cluster missions, and how we used the SITL reports to build a large labeled dataset of magnetic field, plasma moments and plasma distributions from the 2016-2019 period. Then, we present the two machine learning algorithms we used and their characteristics, as well as the metrics to measure their performance. Later, we present their respective results for the MMS mission and the adaptation to the Cluster mission. Finally, we discuss these results and present our conclusions.
A fundamental and time-consuming step in any machine learning application is collecting, cleaning, and labelling data. This is a very important task, as the quality of the data defines the accuracy and the generalisation capability of the model.
MMS is a NASA space mission, launched in 2015, designed to study electron-scale physics in Earth's magnetosphere and in particular where magnetic reconnection occurs (Burch et al., 2016). The mission is composed of four identical spacecraft flying in an adjustable tetrahedral formation, and its highly elliptic equatorial orbit is optimized to spend extended periods in locations where reconnection is known to occur, thus covering the majority of the key regions of near-Earth space. The direct-current (DC) magnetic field data are provided by the Fluxgate Magnetometer (FGM) from the FIELDS instrument suite (Torbert et al., 2016; Russell et al., 2016) with a temporal resolution of 0.1 s in "survey" mode, and the plasma parameters by the Fast Plasma Investigation (FPI) instrument (Pollock et al., 2016) with a temporal resolution of 4.5 s in "fast" mode.

Cluster is an ESA space mission, launched in 2000, whose aim is to study the ion-scale physics of Earth's magnetic environment and its interaction with the solar wind. The mission is also composed of four identical spacecraft flying in a tetrahedral formation, and its highly elliptic polar orbit also covers almost all regions of near-Earth space. The magnetic field data are provided by the Fluxgate Magnetometer with a temporal resolution of 4 s (Balogh et al., 2001) and the plasma parameters by the Hot Ion Analyzer instrument (Rème et al., 2001) when the instrument was operating in magnetosphere or magnetosheath mode.
One of the most important innovations brought by the MMS mission is its burst data management and selection system. The MMS spacecraft collect a combined volume of ∼ gigabits per day of particle and field data. On average, only 4 gigabits of that volume can be transmitted to the ground. With nested automation and "Scientist-in-the-Loop" (SITL) processes, this system is designed to maximize the value of the burst data by prioritizing the data segments selected for transmission to the ground.

Concretely, the SITL system consists of a manual selection process by a scientific expert designated to eyeball daily survey (low-time-resolution) data and pick time intervals of interest that will be transmitted to the ground as burst (high-time-resolution) data. The selected intervals are tagged with timestamps and a comment, which usually includes the type of event selected and possibly the near-Earth region in which it occurred. For each orbit, these selections are written up in a "SITL report", notably in the form of a text file. An example of these reports can be found in Argall et al. (2020).

We developed a Python code which parses these text files to extract the timestamps and comments, identifying keywords associated with the near-Earth region where the event occurred. Each time interval was then associated with a label indicating where the spacecraft was located, as follows:

Region                         Label
Solar wind                     SW
Ion foreshock                  FS
Bow shock                      BS
Magnetosheath                  MSH
Magnetopause                   MP
Boundary layer                 BL
Magnetosphere                  MSP
Plasma sheet                   PS
Plasma sheet boundary layer    PSBL
Lobe                           LOBE

Table 1: Labels for the 10 different near-Earth regions in our model.

If a region cannot be identified from the SITL comment, the time interval is rejected. If FGM or FPI data are not available during a labeled time interval, the time interval is also rejected.
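The keyword-matching core of such a parser can be sketched as follows. This is a minimal illustration: the keyword lists and function names here are hypothetical, not the ones used in our actual code, which relies on a much larger hand-curated vocabulary built from the SITL comments.

```python
# Illustrative keyword -> label mapping (a small subset). Longer, more
# specific phrases are listed first so that e.g. "plasma sheet boundary
# layer" is not swallowed by the shorter "plasma sheet".
KEYWORDS = [
    ("PSBL", ["plasma sheet boundary layer", "psbl"]),
    ("PS",   ["plasma sheet"]),
    ("FS",   ["foreshock"]),
    ("BS",   ["bow shock", "shock crossing"]),
    ("MSH",  ["magnetosheath"]),
    ("MP",   ["magnetopause"]),
    ("SW",   ["solar wind"]),
    ("LOBE", ["lobe"]),
]

def label_comment(comment):
    """Return the region label matched in a SITL comment, or None
    (a None result means the time interval is rejected)."""
    text = comment.lower()
    for label, phrases in KEYWORDS:
        if any(p in text for p in phrases):
            return label
    return None
```

Scanning the more specific phrases first is what keeps overlapping region names (PSBL vs. PS) from being mislabeled.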
Using this technique, from the date at which the SITL reports became available as text files, i.e., April 2016, to the end of 2019, we collected , labeled time intervals relevant for our study. We note here that the number of occurrences for each region was somewhat unbalanced, so we added a total of time intervals labeled by hand to undersampled regions (examples of typical plasma parameters for the different magnetospheric regions can be found in textbooks, e.g., Baumjohann and Treumann (1996)), bringing the total to , labeled time intervals of various lengths. We resampled the label data to the same cadence as FPI data, i.e., 4.5 s, representing a total of , , labeled data points. This constitutes the biggest dataset of labeled time intervals for near-Earth space to our knowledge.

Regarding the Cluster mission labels, we use the dataset presented in Nguyen et al. (2019). This dataset covers three near-Earth space regions, namely the magnetosphere, the magnetosheath and the solar wind, over a 2-year time period (2005-2006). The data are resampled to a 1-minute resolution, yielding a total of , labelled points.

Once the labels are defined, we build the final dataset that will feed the ML models. In Earth's magnetosphere, the plasma is dominated by its core magnetic field, thus the first feature we took into account is the magnetic field vector B. Additionally, in each magnetospheric region the plasma has a rather typical distribution function. However, the full particle distribution function constitutes an enormous amount of data when dealing with several years of data (as done here), in particular with the MMS mission (several TB of data). Furthermore, important differences exist between heliospheric missions in the specificities of the distribution functions, e.g., energy and angular ranges as well as energy and angular resolutions. Therefore, we chose not to consider those products and focused on moments only (density, bulk velocity and temperature), to keep our model lightweight and as general as possible (i.e., applicable to different heliospheric missions such as Cluster). For this latter reason, we also decided not to use the spacecraft location as a feature, because different missions cover different locations in near-Earth space (e.g., Cluster has a polar orbit while MMS has an equatorial orbit; apogees/perigees are different, etc.).

We start by loading the raw MMS magnetic field and plasma moments data from the FGM and FPI instruments for the 2016-2019 period using the aidapy package, and resample the data to the FPI cadence, i.e., 4.5 s.
The resulting dataset contains 12 variables: the magnetic field magnitude B_tot and its components B_x, B_y, B_z; the ion density N_i; the bulk velocity magnitude V_i_tot and its components V_ix, V_iy, V_iz; and the parallel and perpendicular temperatures T∥, T⊥ with regard to the ambient magnetic field, as well as the total value T_tot. The dataset is matched with the label points defined in the previous section. If data gaps are present in the specified time interval, the whole interval is rejected. This method yields a total dataset of size , , × .

The TSC model requires as input arrays of equal sizes with one label per array. Thus the dataset is grouped as time series corresponding to the labeled time intervals, resulting in , data blocks of various sizes. We decided to split these time series into equal chunks of 3 minutes, because this time length corresponds roughly to the mean length of the data blocks (i.e., it minimizes the padding) and yields enough points (40) in each block given the time sampling resolution. To do so, we apply the following scheme: if the data block is shorter than 3 minutes, it is padded with the wrap of the vector along the time axis (i.e., the first values are used to pad the end and the end values are used to pad the beginning); if the data block is longer than 3 minutes, it is split into several time series of 3 minutes each (the last one being padded as described above if shorter than 3 minutes). Table 2 gives the distribution of the occurrences for each class.

This operation yields a dataset as a multidimensional array of size (34159, 40, 12). Finally, to input this array to the TSC model (and to avoid temporal bias due to the spacecraft orbit), we shuffle and randomly split the time series into training (56.25%), test (25%) and validation (18.75%) datasets.
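The 3-minute chunking with wrap padding, and the subsequent shuffle and split, can be sketched as follows (a minimal illustration; the function and variable names are ours, not from the actual pipeline). Note that 56.25% / 18.75% corresponds to first holding out 25% of the data for testing and then 25% of the remainder for validation.

```python
import numpy as np

CHUNK = 40  # 3 minutes at the 4.5 s FPI cadence

def to_chunks(block):
    """Split one labeled (n_times, n_vars) data block into chunks of
    CHUNK samples; a short remainder is wrap-padded on both sides
    (end values pad the beginning, first values pad the end)."""
    parts = [block[i:i + CHUNK] for i in range(0, len(block), CHUNK)]
    short = CHUNK - len(parts[-1])
    if short:
        parts[-1] = np.pad(parts[-1],
                           ((short // 2, short - short // 2), (0, 0)),
                           mode="wrap")
    return np.stack(parts)

def shuffle_split(n, rng):
    """Shuffled indices split into 56.25 / 18.75 / 25 % train/val/test."""
    idx = rng.permutation(n)
    n_test = n // 4                 # 25 % test
    n_val = (n - n_test) // 4       # 25 % of the remainder -> 18.75 %
    return (idx[n_test + n_val:],   # train
            idx[n_test:n_test + n_val],
            idx[:n_test])
```

`numpy.pad` with `mode="wrap"` implements exactly the periodic padding described above: the values prepended at the start are taken from the end of the block, and vice versa.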
These three classical categories are defined as follows:

• the training set is used to determine the weights of the model;
• the validation set is an out-of-sample set and allows evaluating the error on data which has not been observed. In particular, the validation set is used for hyperparameter search and early stopping to avoid over-fitting, ensuring a good generalization of the model;
• the test set is also an out-of-sample set which has not been used during training. Its data are only used after training and hyperparameter optimization to assess the performance of the final model.

The aidapy package is available at https://gitlab.com/aidaspace/aidapy.

Region   Time series
SW       2,130
FS       4,714
BS       2,328
MSH      4,274
MP       3,764
BL       4,544
MSP      4,658
PS       3,576
PSBL     2,703
LOBE     1,468
Total    34,159

Table 2: Number of occurrences for each class.
In this section, we give an overview of the machine-learning algorithm we use, the fully convolutional neural network (FCN), and of the evaluation metrics used to assess it. The FCN works with time-dependent inputs (time series) in order to learn dynamical features. The main purpose of classifying time series, and not only instantaneous values, is to learn automatically different features across the time dimension: the events can be characterized by dynamical signatures at different scales learned by the model. We present a second model, the multilayer perceptron (MLP), in the supplemental materials. It considers the input variables as time-independent. The main advantages of this method are the simplicity of its machine learning pipeline and the possibility to work with any temporal resolution; however, its classification performance is much lower.

The FCN belongs to the category of artificial neural networks. These are usually characterized by their architecture: the number of layers, the type of connection (feedforward, feedback, convolution, etc.), and the activation functions. A layer is defined as a set of neurons which are not connected to each other. We selected neural networks as they are able to learn non-linear models and can handle large datasets compared to kernel methods.

Time series classification (TSC) is an active field of research and hundreds of different algorithms have been developed in recent years (Bagnall et al., 2017). Classical TSC usually uses new feature spaces generated from the time series. The best performing models can combine dozens of different classifiers and feature transformations, leading to significant complexity and an important computational cost (Lucas et al., 2019).

Even if deep learning has seen very successful applications in the last decades, only a few examples of algorithms for TSC exist. This gap has been filled recently (Fawaz et al., 2019) and several models show very promising results.
In particular, we are convinced that convolutional architectures can build very accurate classifiers. Convolutional layers use convolution in place of general matrix multiplication. Local groups of values are often highly correlated in time series and they form distinctive local motifs that are easily detected. Convolutional layers are designed to identify these patterns regardless of location: if a motif specific to a label appears in one part of the time series, it could also appear anywhere else. Hence a convolutional layer sharing the same weights at different locations can detect temporal patterns in different parts of the time series (LeCun et al., 2015). Moreover, several deep learning frameworks, such as Tensorflow (Abadi et al., 2016) or Pytorch (Paszke et al., 2019), are nowadays freely available. They provide differentiable programming to build very efficient neural networks. Thus, we consider deep learning for TSC for these two reasons: the availability of high-quality frameworks and the relevance of convolutional architectures.

Figure 1: Learning curve history comparing training and validation loss.

The Fully Convolutional Network (FCN) (Wang et al., 2017) is a competitive deep-learning architecture for TSC yielding good results on large benchmarks (Fawaz et al., 2019). The network architecture is relatively simple and comprises a sequence of three time convolution blocks followed by a global average pooling block (Lin et al., 2013). Each time convolution block is divided into a convolution layer, a batch normalization layer, and a Rectified Linear Unit (ReLU) (Nair and Hinton, 2010) activation function. The convolutional layer is used to extract temporal features from the inputs by performing a convolution between the input signal and several filters. Each convolutional block has different filter lengths in order to analyze several time scales. The extracted features of the last convolutional block are used as inputs for the global average pooling (GAP) module to output the classification result. This last block is composed of the global average pooling layer and the softmax layer. A more complete description of the FCN architecture can be found in Wang et al. (2017); Fawaz et al. (2019).

As regards the numerical parameters, we performed a Bayesian optimisation using the validation set to select four hyperparameters:

• the optimizer: Adam or RMSprop;
• the learning rate: between − and − ;
• the batch size: 32, 64, or 92;
• the number of filters: the baseline of the FCN paper, half the baseline, or twice the baseline. The baseline given by the FCN paper (Wang et al., 2017) is the following: the first convolutional block has 128 filters with a filter length equal to 8, the second convolution extracts features with 256 filters of size 5, and the final convolutional layer is defined by 128 filters, each one with a length equal to 3.

The final FCN is trained for , epochs with the categorical cross-entropy as loss function.
The result of the Bayesian optimization provides the different hyperparameters. The Adam optimizer is used with an initial value of the learning rate set to . . An adaptive learning rate decrease is selected (with a minimum learning rate of − ), reducing the learning rate when the validation accuracy has stopped improving for 50 epochs. A batch size of is used with twice the baseline configuration for the number of filters. The final FCN has , , free parameters. The method of early termination is adopted to avoid over-fitting: it stops the learning process when the validation loss has not decreased for 20 epochs. All the time series are standardized before the training process. The learning curve is presented in Figure 1. The learning process stopped after epochs thanks to the early termination strategy.
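The two training callbacks used here, learning-rate reduction on plateau and early termination, can be sketched generically. In this sketch `step` is a stand-in for one training epoch returning a validation loss, and for simplicity a single loss is monitored for both callbacks (the text monitors validation accuracy for the learning-rate schedule); the numeric defaults mirror the patiences quoted above:

```python
def train_with_callbacks(step, max_epochs=2000, lr=1e-3,
                         lr_patience=50, stop_patience=20, lr_floor=1e-6):
    """Training loop with reduce-LR-on-plateau and early termination.
    `step(lr)` runs one epoch and returns the validation loss."""
    best = float("inf")
    since_best = since_drop = 0
    for epoch in range(max_epochs):
        loss = step(lr)
        if loss < best - 1e-6:
            best, since_best, since_drop = loss, 0, 0
        else:
            since_best += 1
            since_drop += 1
        if since_best >= stop_patience:           # early termination
            return epoch + 1, best
        if since_drop >= lr_patience and lr > lr_floor:
            lr = max(lr / 2.0, lr_floor)          # reduce LR on plateau
            since_drop = 0
    return max_epochs, best
```

The early-termination branch is what halts training well before `max_epochs` once the validation loss plateaus, which is the behavior visible in Figure 1.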
Accurately assessing the quality of a classification model raises significant challenges. A single, unified performance metric cannot effectively evaluate a multi-class classifier from all perspectives, such as class balance or the number of different outcomes. Therefore, several metrics generalized from binary classification are considered.

The confusion matrix is a basic but useful tool to visualize the distribution of the predicted classes. Each column of the matrix gives the instances of a true class while each row represents the instances of a predicted class. Several metrics from binary classification can be generalized to multiclass by considering the prediction of each class as a binary classification problem: the given class has been predicted (positive) or not (negative). Then, the total numbers of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) can be computed and several metrics are defined:

• Accuracy is the most intuitive metric and gives the percentage of correct predictions among all the predictions Np:

    accuracy = (TP + TN) / Np .    (1)

• Precision, also called positive predictive value, measures the proportion of positive values that are correctly identified as such among all predicted positives. In other words, it corresponds to the percentage of detections that are right. High precision is associated with a small number of FP:

    precision = TP / (TP + FP) .    (2)

• Recall, also called the true positive rate, measures the proportion of positives detected. A significant number of missed true values (FN) leads to low recall:

    recall = TP / (TP + FN) .    (3)

• F1-score is the harmonic mean of precision and recall. Usually, high recall is detrimental to precision and vice versa. Therefore, the F1-score gives a trade-off between the two quantities and can help to compare two classifiers with different precision and recall values:

    F1 = 2 · (precision · recall) / (precision + recall) .    (4)

All these metrics are computed for each class. Therefore, averaged quantities can also be defined to provide a more general view of the classifier. A macro-average gives the unweighted mean of the metric without taking imbalance into account, and a micro-average computes the average weighted by the number of true instances for each class; the micro-average thus accounts for label imbalance.

Finally, a receiver operating characteristic (ROC), also called ROC curve, is a graphical tool illustrating the performance of a binary classifier system as the discrimination threshold is varied. Indeed, each class prediction is associated with a probability. Usually, the class predicted by the classifier is chosen as the class with the highest probability. Instead, a threshold value can be used to decide if the class is effectively predicted by the classifier. The ROC curve plots the fraction of true positives out of all actual positives (True Positive Rate) versus the fraction of false positives out of all actual negatives (False Positive Rate) at various threshold values. For instance, all classes are predicted with a zero threshold (False Positive Rate and True Positive Rate of 1.0) while there is no correctly predicted class and no false positive with a unit threshold (False Positive Rate and True Positive Rate of 0.0). A perfect classifier is located at the upper-left corner of the plot, with a true positive rate of 1.0 and a false positive rate of 0.0. A ROC curve can be drawn for each class. For a model with no capacity to discriminate between the positive and negative classes, the ROC curve is a straight line of slope 1. The area under the ROC curve (AUC) summarizes in one number how capable the model is of distinguishing between classes.
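As an illustration, the ROC points and the trapezoidal AUC for one class can be computed from the predicted probabilities as follows (a minimal sketch that assumes both classes are present and ignores ties between scores):

```python
def roc_curve_auc(scores, labels):
    """ROC points (FPR, TPR) obtained by sweeping the decision threshold
    down through the sorted scores, plus the trapezoidal area under the
    curve. `scores` are predicted probabilities of the class, `labels`
    the binary ground truth (1 = class present)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    # trapezoidal integration of TPR over FPR
    auc = sum((x2 - x1) * (y1 + y2) / 2.0
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc
```

A classifier that ranks every positive above every negative traces the upper-left corner and gets an AUC of 1.0; a fully inverted ranking gets 0.0, and chance-level ranking gives the diagonal with an AUC of 0.5.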
We use here the test set (the out-of-sample set that is not involved in the training process) to evaluate the performance of the model. It is obtained by shuffling 25% of the data as described above.

Region         Precision  Recall  F1-score  AUC
SW             0.88       0.92    0.90      1.00
FS             0.93       0.95    0.94      1.00
BS             0.85       0.80    0.83      0.99
MSH            0.91       0.91    0.91      0.99
MP             0.84       0.84    0.84      0.99
BL             0.84       0.84    0.84      0.98
MSP            0.96       0.95    0.96      1.00
PS             0.89       0.89    0.89      0.99
PSBL           0.86       0.87    0.87      0.99
LOBE           0.95       0.95    0.95      1.00
Macro average  0.89       0.89    0.89      0.98
Micro average  0.89       0.89    0.89      0.99
Accuracy       0.89

Table 3: Classification report for FCN computed on the test set.

Table 3 summarizes the precision, recall, F1-score, and AUC evaluated for the FCN for the 10 different classes. In terms of absolute values, the FCN shows great performance metrics. The AUC value is at least 0.98 for all the classes. This means that a significant probability is always associated with the correct region by the classifier, even if it is not the highest probability. For instance, the classifier could give a probability of 45% to the SW and 35% to the FS for a test sample belonging to the FS. Even if the prediction is wrong, the classifier still has detected some features which can belong to the correct class.

Two different groups of regions are identified in Table 3. The first group, consisting of the FS, the MSP, and the LOBE, is very well predicted, with respective F1-scores of 0.94 or higher and an AUC of 1.00. Such regions usually show very specific patterns, explaining why the classification metrics are very high. The second group is formed by the BS, the MP, the BL, the PS, and the PSBL, with smaller metric values, such as F1-scores between 0.83 and 0.89 and an AUC between 0.98 and 0.99. Figure 2 shows the different ROC curves for the FCN. BL and BS are the lowest curves: at a given true positive rate, they have a much higher false positive rate than the other classes.
This means that, for some very specific predictions, the classifier associates a very low probability to the correct class. The BL and the BS are thin regions marking the boundary between larger regions. Therefore, several time intervals labeled by SITLs may overlap two or more nearby regions, which could explain the lower metric values for these two regions.

From the supplemental material, the second model (MLP) also has important problems classifying regions with strong variations, such as the BS with an F1-score below . . Thus, it seems that convolutional methods help to improve the accuracy of the classifier for the most challenging regions, which validates the choice of such algorithms, designed to extract important temporal features and variations.

Figure 2: Receiver operating characteristic curve computed on the test set.

Table 4 details the FCN predictions with a confusion matrix:

           True
Predicted  SW   FS     BS   MSH  MP   BL   MSP    PS   PSBL  LOBE
SW         493  53     8    0    2    1    0      0    0     0
FS         42   1,140  40   0    0    0    0      0    0     0
BS         2    12     450  54   8    1    0      0    0     0
MSH        0    1      52   982  41   5    0      0    0     0
MP         0    0      10   37   801  92   5      3    2     1
BL         0    0      1    4    78   933  49     39   10    0
MSP        0    0      0    0    10   32   1,101  0    0     0
PS         0    0      0    0    5    38   5      803  54    1
PSBL       0    0      0    0    7    13   0      59   572   15
LOBE       0    0      0    0    0    0    0      1    16    354

Table 4: Confusion matrix for FCN computed on the test set.

Very typical errors can be observed for nearby regions, such as: SW and FS; PS, BL, and PSBL; PSBL and LOBE; MP, MSH and BL. These errors can be explained by physical arguments, since the plasma properties can be similar for several nearby regions. For instance, we may expect the MP to share common properties with the MSH and the MSP, as the MP acts as a boundary between the two other regions. The SW region can also be very similar to the FS, as the latter results from the reflection of the SW on the BS, as stated in the introduction. Therefore, it may be concluded that the FCN model really learned the typical patterns associated with each region, as the main errors occur for regions with similar physics.
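The per-class scores of Table 3 follow directly from this confusion matrix. As a quick check for the FS class (second row/column; the matrix values below are copied from Table 4, rows being predictions and columns true classes):

```python
# Confusion matrix of Table 4 (rows: predicted, columns: true).
# Class order: SW, FS, BS, MSH, MP, BL, MSP, PS, PSBL, LOBE.
CM = [
    [493,   53,   8,   0,   2,   1,    0,   0,   0,  0],
    [ 42, 1140,  40,   0,   0,   0,    0,   0,   0,  0],
    [  2,   12, 450,  54,   8,   1,    0,   0,   0,  0],
    [  0,    1,  52, 982,  41,   5,    0,   0,   0,  0],
    [  0,    0,  10,  37, 801,  92,    5,   3,   2,  1],
    [  0,    0,   1,   4,  78, 933,   49,  39,  10,  0],
    [  0,    0,   0,   0,  10,  32, 1101,   0,   0,  0],
    [  0,    0,   0,   0,   5,  38,    5, 803,  54,  1],
    [  0,    0,   0,   0,   7,  13,    0,  59, 572, 15],
    [  0,    0,   0,   0,   0,   0,    0,   1,  16, 354],
]

def precision_recall_f1(cm, k):
    """Metrics of class k: the diagonal entry over the predicted row sum
    (precision), over the true column sum (recall), and their harmonic
    mean (F1)."""
    tp = cm[k][k]
    precision = tp / sum(cm[k])
    recall = tp / sum(row[k] for row in cm)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p_fs, r_fs, f1_fs = precision_recall_f1(CM, 1)  # FS class
```

Rounding these values reproduces the FS line of Table 3 (precision 0.93, recall 0.95, F1 0.94).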
In this section we show some examples of time series observed by the MMS spacecraft as it spanned the different near-Earth regions, along with the labels of the classifications made by the FCN model. The labels of the different regions are sorted roughly according to their distance to the Sun, and summarized in Table 1.

Figure 3 shows the MMS1 probe observations for the whole day of 2019-11-09, during which it flew through the dayside region. From top to bottom, the figure panels show the energy spectrum of the omnidirectional ion flux; the three components of the magnetic field (B_x, B_y, B_z) in GSE coordinates and its magnitude B_tot; the ion number density; the three components of the ion bulk velocity (V_x, V_y, V_z) in GSE coordinates; the ion temperatures parallel and perpendicular (T∥, T⊥) to the ambient magnetic field; and finally the labels given by the FCN model (FCN labels). The time in UT format and the spacecraft position (X_pos, Y_pos, Z_pos in R_E and GSE coordinates) are displayed at the bottom of the X-axis.

In this example, the FCN model classifies the regions well, as its labels follow to a great extent the MMS1 observations: first the spacecraft is located in the magnetosphere, then it crosses the magnetopause to enter the magnetosheath, afterwards it flies through a series of bow shocks and goes back and forth to the solar wind, until it finally enters the solar wind. The model is even able to precisely detect the small-scale dynamics of the boundary regions, such as partial magnetopause, boundary layer and bow shock crossings, and the back-and-forth between the solar wind and the ion foreshock. Only three points seem to be clearly misclassified (as MSP) in the ion foreshock. However, these points can easily be eyeballed as outliers or discriminated by their low quality flag.

Figure 3: Example of MMS dayside classification. Labels, not present in the initial data set, have been manually added to compare the FCN predictions with a ground truth. They are represented by colored regions on the FCN labels plot.

Figure 4 shows another example of the FCN classification method described above, but this time when the MMS spacecraft flew through the nightside region. The MMS1 observations take place during the whole day of 2019-07-13, and the format is the same as in Figure 3. In this example again, the FCN model is able to properly label the regions spanned by MMS1: at the very beginning MMS1 exits the magnetosphere to the plasma sheet boundary layer, then goes in and out of the lobe regions before reaching the plasma sheet. Again, the FCN model is also able to pick up small-scale dynamics such as short plasma sheet crossings.

The two examples shown in this section therefore support the good metrics obtained in the previous section, showing concretely that the FCN model is suited to properly label most of the MMS data with the regions spanned.
Figure 4: Example of MMS nightside classification. Labels, not present in the initial data set, have been manually added to compare the FCN predictions with a ground truth. They are represented by colored regions on the FCN labels plot.
Class Activation Map (CAM) is a technique developed by Zhou et al. (2016) to obtain the discriminative regions used by a CNN to identify a specific class in the input data. CAM was first applied to TSC in the one-dimensional case by Wang et al. (2017). The objective in our case is to highlight the subsequences (i.e., sections) of each time series that are relevant to its class. This approach thus allows one to explain the decision taken by the classifier. The mathematical description of the method can be found in Zhou et al. (2016); Wang et al. (2017); Fawaz et al. (2019). The method works only for models with a global average pooling as the last layer (Lin et al., 2013). The CAM is represented by a univariate time series (of the same size as the FCN input) in which each value gives the importance of the corresponding part of the signal to the classification: low values mean the subsequence does not contribute to the decision of the classifier, while high values mean the section contributes significantly.

Figure 5 shows CAM examples for two different classes: the solar wind and the foreshock. For each class, a well-predicted time series is drawn and colored by the value of the CAM, illustrating the results from MMS data given in section . .

Figure 5: The Class Activation Map highlighting the important sections of the time series contributing most to the classification; low values mean the time step does not contribute to the decision of the classifier, while high values mean the time step contributes significantly. (a) Solar wind. (b) Foreshock.

As regards the SW example, almost all of the input signal contributes to the classification, as indicated by the uniform CAM values above 0.8 between 20 and 160 seconds. This means that the values of the input features matter much more than their variations or than specific patterns. On the other hand, the classifier detects the FS mainly thanks to a specific pattern around 65 seconds, associated with extrema in Bx, By, Ni and Vy and significant slopes in the other features. Only a few specific time intervals are involved in the decision of the classifier for the FS. The CAM analysis strengthens the conclusion that temporal and dynamical analyses are fundamental to classify nearby regions with fluctuating behaviors. For instance, the MLP results given in the supplemental material show that a significant number of SW regions are misclassified as FS, since that model works only on instantaneous quantities.

In this section we test the adaptability of the FCN model to different magnetospheric data, namely data from the Cluster mission. This mission has been chosen because its data are quite similar to MMS data (notably in terms of sampling), and a 2-year period of labeled data is available (see Nguyen et al. (2019)) to audit the classification resulting from the model.

First, we qualitatively assess the quality of the different classification methods by comparing them with C1 probe observations. We randomly select a day during which the C1 probe flew through the dayside region. This example, from the day of 2005-02-13, is shown in Figure 6 in the same format as Figure 3 and Figure 4, except that we include the manual labels from Nguyen et al. (2019) (named "Man labels") in the penultimate panel.
Figure 6: Example of Cluster dayside classification.

The "Man labels" are in good agreement with the observations of C1, as the probe is first located in the magnetosphere, then flies through the magnetosheath and finally reaches the solar wind. The "FCN labels" also show good agreement with the C1 observations: the probe is first located in the magnetosphere and then crosses the magnetopause a few times before entering the magnetosheath; it then crosses the bow shock to enter the solar wind, crosses the bow shock several more times, and finally spans the ion foreshock. As in the examples shown for the MMS mission, the FCN model is able to pick up the small-scale dynamics of the near-Earth boundaries: the magnetopause, the bow shock and the foreshock. As a result, the model is able to outperform the manual labels by identifying the magnetospheric regions in more detail, in particular the small-scale dynamics of the transition regions.

Then, we take advantage of the 2-year period of Cluster labeled data from Nguyen et al. (2019) to quantify the quality of the FCN model classification by investigating the performance metrics. We select the points labeled as 'MSP', 'MSH' and 'SW' by the FCN model using Cluster data from the 2005-2006 period, and compare them with the labels for these 3 regions obtained from Nguyen. Using this method, we obtain an accuracy classification score of 0.97 over a total of , common labeled points. The classification report can be found in Table 5.

               Precision  Recall  f1-score
MSP            0.84       0.97    0.90
MSH            1.00       0.97    0.98
SW             0.95       0.99    0.97
Macro average  0.93       0.97    0.95
Micro average  0.98       0.97    0.97
Accuracy       0.97

Table 5: Classification report for the FCN computed on the Cluster dataset.

We note here that this score is not an exact quantification of the model's accuracy on Cluster data, and the classification score is probably overestimated because only 3 classes (i.e., magnetospheric regions) are considered here instead of the 10 included in the FCN model. Therefore, a dataset of Cluster data labeled with these 10 near-Earth regions would be required to obtain an absolute quantification of its accuracy on the Cluster mission. Nevertheless, this shows overall that our FCN model classification is a reliable method for labeling Cluster mission data and potentially data from other heliophysics missions.
The most time-consuming task in the present work has been the data preparation and the parsing of the SITL reports. Training an FCN model needs about CPU hours on a workstation equipped with an Intel Xeon E5-2670 v3 (12 cores at 2.30 GHz). These values must be multiplied by the number of different trials needed to optimize the hyperparameters (about 40 in this case).
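The hyperparameter search itself (performed here with the optuna library) amounts to repeatedly sampling configurations, training with each, and keeping the best. The sketch below illustrates the idea with a plain random search; the search space and the placeholder objective are illustrative assumptions, not the study's actual settings:

```python
import random

random.seed(0)

# Illustrative search space (assumed, not the paper's actual one).
space = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [16, 32, 64],
    "n_filters": [64, 128, 256],
}

def validation_loss(cfg):
    # Placeholder objective: in practice this would train the FCN with `cfg`
    # and return the validation loss of the trained model.
    return (abs(cfg["learning_rate"] - 1e-3) * 100
            + abs(cfg["batch_size"] - 32) / 32
            + abs(cfg["n_filters"] - 128) / 128)

trials = []
for _ in range(40):  # about 40 trials, as stated in the text
    cfg = {name: random.choice(values) for name, values in space.items()}
    trials.append((validation_loss(cfg), cfg))

best_loss, best_cfg = min(trials, key=lambda t: t[0])
```

Each trial costs one full training run, which is why the CPU-hour figure above must be multiplied by the number of trials; libraries such as optuna improve on this naive loop by sampling promising regions of the space and pruning bad trials early.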
Using deep learning algorithms, namely a fully convolutional neural network (FCN), we built an automatic classification of 10 near-Earth regions: the solar wind, the ion foreshock, the bow shock, the magnetosheath, the magnetopause, the boundary layer, the magnetosphere, the plasma sheet, the plasma sheet boundary layer and the lobes.

Using more than 3 years of labeled (SITL and additional human-labeled) MMS mission data, we showed that this method is reliable for classifying near-Earth regions. The FCN method (1,079,000 free parameters) has been very effective in taking into account the dynamical features of the most challenging plasma regions (with important data variability due to the fluctuations of the plasma), in particular the bow shock and the boundary layer. The high accuracy of the FCN model also highlights the quality of the labeled data set generated from the SITL reports. We demonstrated the good accuracy of the classification predictions on the test and validation datasets, but also on unlabeled data from the MMS mission. We also showed the adaptability of the trained FCN model by applying the classification to data from the ESA Cluster mission. The predictions showed good accuracy and enhanced dynamics in comparison with a previous human-labeled dataset.

These results show, on the one hand, that the model can be applied to the whole MMS dataset, which would be of great interest to map the different magnetospheric boundaries and to build empirical models of the properties and dynamics of the plasma in near-Earth space. On the other hand, they show that the model could be applied to different space missions orbiting the Earth (such as Cluster or THEMIS) or other planets to automatically label the available data. However, we recommend building a small labeled dataset specific to each mission, with limited retraining of the last layer.
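The limited last-layer retraining recommended above can be pictured as follows: the body of the trained FCN is kept frozen and only a new softmax output layer is fitted to the new mission's small labeled set. In the sketch below the frozen FCN body is stood in for by a fixed random projection, and the data, shapes and learning rate are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen FCN body (conv blocks + global average pooling in
# the real model): a fixed projection that is NOT updated during retraining.
n_raw, n_features, n_classes, n_samples = 8, 16, 3, 300
frozen = rng.normal(size=(n_raw, n_features))

# Toy labeled set from the "new mission"; by construction the labels are
# recoverable from the frozen features.
raw = rng.normal(size=(n_samples, n_raw))
feats = raw @ frozen                                  # frozen forward pass
labels = (feats @ rng.normal(size=(n_features, n_classes))).argmax(axis=1)

# Retrain only the last (softmax) layer by gradient descent on cross-entropy.
W = np.zeros((n_features, n_classes))
onehot = np.eye(n_classes)[labels]
for _ in range(300):
    logits = feats @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.05 * feats.T @ (p - onehot) / n_samples    # only W is updated

accuracy = ((feats @ W).argmax(axis=1) == labels).mean()
```

Because only the small output layer is trained, a few hundred labeled intervals per mission could suffice, instead of the years of SITL-labeled data used for the full model.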
This process, called transfer learning, could help improve the generalization of our labeled dataset to other missions such as the JUICE mission, which will be launched in 2022 and for which selective downlink strategies are being discussed.

The model could also be used as a support to the scientific experts in charge of the data selection process (such as the SITL system), providing a classification of the regions that would help ease and speed up this time-consuming and tedious task. Such lightweight and easily adaptable algorithms could also be important for the so-called SOCs (Science Operation Centers) of current and future spacecraft missions, both on the ground and onboard. In the latter case, these algorithms could be implemented within onboard spacecraft digital boards to automatically select regions and events of scientific interest, greatly reducing the complexity and the cost of science operations.

The intended integration of this model into the aidapy package, which allows one to automatically load selected data from open-access databases, will be of particular interest by providing plasma data accompanied by region labels and quality flags on the fly for case studies and statistical studies. The integration of such lightweight algorithms can also be generalized to other programs.

In terms of modeling, we expect to improve the FCN results by generalizing the use of hyperparameter optimization to a higher number of hyperparameters and over larger ranges. We also plan to investigate the use of convolution methods directly on the omnidirectional ion energy fluxes.

Finally, the classification of near-Earth regions represents only a first step for time series classification in heliophysics. It paves the way for more ambitious models, such as the identification and classification of space plasma processes (e.g., magnetic reconnection and structures, waves and turbulence, or plasma jets), as well as their combinations. However, this topic is beyond the scope of the present study and is left for a forthcoming study.

Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships thatcould be construed as a potential conflict of interest.
Author Contributions
HB and RD built the data set, the machine learning models and wrote the manuscript. HB, RD, AR and OL gatheredinformation from SITL reports and provided information for their usage for machine learning. HB, AR, OL and GLprovided physical interpretation of the results. RD and JA provided insights into the use of the different machinelearning techniques. AR and GL supervised the work. All authors contributed to manuscript revision, read and approvedthe submitted version.
Funding
This paper has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 776262 (AIDA, ).

Acknowledgments
Data were retrieved using HelioPy v0.5.3 (Stansby et al., 2020) and processed using aidapy, pandas (Wes McKinney, 2010) and scikit-learn (Pedregosa et al., 2011). Figures were produced using matplotlib (Hunter, 2007). The model has been built with Keras from TensorFlow 2.2 (Abadi et al., 2016), and we modified scripts provided in the dl-4-tsc repository. The hyperparameter optimization is performed with the Python library optuna (Akiba et al., 2019). The authors would like to acknowledge Gautier Nguyen (LPP) for providing classifications on Cluster data, and Nicolas Aunai (LPP) and Alexis Jeandet (LPP) for the helpful discussions and support.

https://gitlab.com/aidaspace/aidapy
https://github.com/hfawaz/dl-4-tsc
Data and Model Availability Statement
The models used for this study, the post-processing tools, and the link to the data can be found in the Jupyter notebook classification results.ipynb in the AIDA repository.

References
Kivelson A, Kivelson M, Russell C. Introduction to Space Physics. Cambridge Atmospheric and Space Science Series (Cambridge University Press) (1995).

Dungey JW. Interactions of solar plasma with the geomagnetic field. Planetary and Space Science (1963) 233–237. doi:10.1016/0032-0633(63)90020-5.

Jelínek K, Němeček Z, Šafránková J. A new approach to magnetopause and bow shock modeling based on automated region identification. Journal of Geophysical Research: Space Physics (2012). doi:10.1029/2011JA017252.

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature (2015) 436–444.

Camporeale E, Carè A, Borovsky JE. Classification of solar wind with machine learning. Journal of Geophysical Research: Space Physics (2017) 10–910.

Balasis G, Aminalragia-Giamini S, Papadimitriou C, Daglis IA, Anastasiadis A, Haagmans R. A machine learning approach for automated ULF wave recognition. Journal of Space Weather and Space Climate (2019) A13.

Nishizuka N, Sugiura K, Kubo Y, Den M, Watari S, Ishii M. Solar flare prediction model with three machine-learning algorithms using ultraviolet brightening and vector magnetograms. The Astrophysical Journal (2017) 156.

Olshevsky V, Khotyaintsev YV, Divin A, Delzanno GL, Anderzen S, Herman P, et al. Automated classification of plasma regions using 3D particle energy distribution. arXiv preprint arXiv:1908.05715 (2019) 1–16.

Nguyen G, Aunai N, Michotte de Welle B, Jeandet A, Fontaine D. Automatic detection of the Earth bow shock and magnetopause from in-situ data with machine learning. Submitted to Annales Geophysicae Discussions (2019) 1–22. doi:10.5194/angeo-2019-149.

Burch J, Moore T, Torbert R, Giles B. Magnetospheric Multiscale overview and science objectives. Space Science Reviews (2016) 5–21. doi:10.1007/s11214-015-0164-9.

Esling P, Agon C. Time-series data mining. ACM Computing Surveys (CSUR) (2012) 1–34.

Fawaz HI, Forestier G, Weber J, Idoumghar L, Muller PA. Deep learning for time series classification: a review. Data Mining and Knowledge Discovery (2019) 917–963.

Torbert R, Russell C, Magnes W, Ergun R, Lindqvist PA, LeContel O, et al. The FIELDS instrument suite on MMS: Scientific objectives, measurements, and data products. Space Science Reviews (2016) 105–135.

Russell C, Anderson B, Baumjohann W, Bromund K, Dearborn D, Fischer D, et al. The Magnetospheric Multiscale magnetometers. Space Science Reviews (2016) 189–256.

Pollock C, Moore T, Jacques A, Burch J, Gliese U, Saito Y, et al. Fast Plasma Investigation for Magnetospheric Multiscale. Space Science Reviews (2016) 331–406.

Balogh A, Carr CM, Acuña MH, Dunlop MW, Beek TJ, Brown P, et al. The Cluster Magnetic Field Investigation: overview of in-flight performance and initial results. Annales Geophysicae (2001) 1207–1217. doi:10.5194/angeo-19-1207-2001.

Rème H, Aoustin C, Bosqued JM, Dandouras I, Lavraud B, Sauvaud JA, et al. First multispacecraft ion measurements in and near the Earth's magnetosphere with the identical Cluster Ion Spectrometry (CIS) experiment. Annales Geophysicae (2001) 1303–1354. doi:10.5194/angeo-19-1303-2001.

https://gitlab.com/aidaspace/notebooks_aida/-/tree/master/04_sitl_classification_region
Argall MR, Small C, Piatt S, Breen L, Petrik M, Barnum J, et al. MMS SITL ground loop: Automating the burst data selection process. arXiv preprint arXiv:2004.07199 (2020).

Baumjohann W, Treumann RA. Basic Space Plasma Physics (1996). doi:10.1142/p015.

Bagnall A, Lines J, Bostrom A, Large J, Keogh E. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery (2017) 606–660.

Lucas B, Shifaz A, Pelletier C, O'Neill L, Zaidi N, Goethals B, et al. Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery (2019) 607–635.

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. TensorFlow: A system for large-scale machine learning. USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2016), 265–283.

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems (2019), 8024–8035.

Wang Z, Yan W, Oates T. Time series classification from scratch with deep neural networks: A strong baseline. (IEEE) (2017), 1578–1585.

Lin M, Chen Q, Yan S. Network in network. arXiv preprint arXiv:1312.4400 (2013).

Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010), 807–814.

Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), 2921–2929.

[Dataset] Stansby D, Rai Y, Argall M, JeffreyBroll, Erwin N, Shaw S, et al. heliopython/heliopy: HelioPy 0.10.0 (2020). doi:10.5281/zenodo.3676651.

Wes McKinney. Data Structures for Statistical Computing in Python. Stéfan van der Walt, Jarrod Millman, editors, Proceedings of the 9th Python in Science Conference (2010), 56–61. doi:10.25080/Majora-92bf1922-00a.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python.