[PDF] Detecting Reconnection Events in Kinetic Vlasov Hybrid Simulations Using Clustering Techniques

Abstract

Kinetic turbulence in magnetized space plasmas has been extensively studied via in situ observations, numerical simulations and theoretical models. In this context, a key point concerns the formation of coherent current structures and their disruption through magnetic reconnection. We present automatic techniques aimed at detecting reconnection events in large data set of numerical simulations. We make use of clustering techniques known as K-means and DBscan (usually referred in literature as unsupervised machine learning approaches), and other methods based on thresholds of standard reconnection proxies. All our techniques use also a threshold on the aspect ratio of the regions selected. We test the performance of our algorithms. We propose an optimal aspect ratio to be used in the automated machine learning algorithm: AR=18. The performance of the unsupervised approach results to be strongly competitive with respect to those of other methods based on thresholds of standard reconnection proxies.

Full PDF

DDraft version December 18, 2020

Typeset using L A TEX manuscript style in AASTeX63

Detecting Reconnection Events in Kinetic Vlasov Hybrid Simulations UsingClustering Techniques

Manuela Sisti ,

1, 2

Francesco Finelli, Giorgio Pedrazzi, Matteo Faganello, Francesco Califano, and Francesca Delli Ponti Aix-Marseille University, CNRS, PIIM UMR 7345, Marseille, France, EU Dipartimento di Fisica, Universit`a di Pisa, Italy, EU HPC Department, Cineca, Bologna, Italy, EU (Received December 18, 2020; Revised; Accepted)

Submitted to ApJABSTRACTKinetic turbulence in magnetized space plasmas has been extensively studied via insitu observations, numerical simulations and theoretical models. In this context, akey point concerns the formation of coherent current structures and their disruptionthrough magnetic reconnection. We present automatic techniques aimed at detect-ing reconnection events in large data set of numerical simulations. We make use ofclustering techniques known as K-means and DBscan (usually referred in literature asunsupervised machine learning approaches), and other methods based on thresholds ofstandard reconnection proxies. All our techniques use also a threshold on the aspectratio of the regions selected. We test the performance of our algorithms. We propose anoptimal aspect ratio to be used in the automated machine learning algorithm: AR=18.

Corresponding author: Manuela [email protected] a r X i v : . [ phy s i c s . p l a s m - ph ] D ec Sisti et al.

The performance of the unsupervised approach results to be strongly competitive withrespect to those of other methods based on thresholds of standard reconnection proxies.

Keywords: magnetic reconnection, methods: data analysis, methods: numerical, unsu-pervised machine learning, Hybrid Vlasov Maxwell simulations INTRODUCTIONMagnetic reconnection is an ubiquitous phenomenon observed in the laboratory as well as in manyspace and astrophysical environments. It is the energy driver for solar ﬂares and coronal massejections (Priest 1982). It occurs routinely at the boundary between the solar wind and the Earth’smagnetosphere (Dungey 1961; Priest & Forbes 2001), both at dayside and at magnetotail, or at theﬂanks induced by the presence of large-scale Kelvin-Helmholtz vortices (Faganello & Califano 2017).An important outcome is the injection of accelerated particles into the magnetosphere. In the contextof turbulent plasmas, such as solar wind (Haynes et al. 2014; Osman et al. 2014) and magnetosheatplasma (Retin`o et al. 2007), reconnection plays a central role representing a possible alternative path,as compared to the usual vortex-vortex interaction, for energy transfer towards smaller and smallerscales (Cerri & Califano 2017; Karimabadi et al. 2013; Camporeale et al. 2018). It is behind many ofthe risks associated with space weather such as, for example, disturbs to Global Navigation SatelliteSystem signals, electronic damage to satellites or power grids (Cassak 2016).Magnetic reconnection is probably the most known and important process in magnetized plasmasfor which Magnetohydrodynamics (MHD) accurately describe the evolution at large scales, wherethe magnetic ﬁeld is frozen in the ﬂuid motion of the plasma and its topology is preserved. Inthis context, magnetic reconnection is the only process being able to rearrange the magnetic ﬁeldtopology on a global, MHD scale despite it occurs in a very narrow region of typical size of theorder of the ion kinetic scale length. The magnetic rearrangement, being on the large MHD scale,is accompanied by a strong energy release due to the conversion of magnetic energy into plasmaﬂow, particle acceleration, heating, and wave generation. The classical picture of reconnection isobtained by considering a 2D domain where a large scale magnetic ﬁeld, the so called equilibrium,presents an inversion line where it changes its direction. This is a 1D equilibrium conﬁguration,since ∂ x = ∂ z = 0 while the inversion takes place along the y direction, that corresponds to a kineticequilibrium for the distribution functions known as the Harris sheet (Harris 1962). The 1D magneticﬁeld can be represented as B = B x ( y ) e x , directed along the x -axis and varying along the y -axis beingzero along the neutral line at a given y = y . The corresponding out-of-plane current has a peakand the region around is dubbed current sheet (hereafter CS) or current layer . In certain favorablecondition this layer becomes unstable to the 2D reconnection instability and the ideal MHD laws arelocally violated (Furth et al. 1963; White 1980; Priest & Forbes 2001) leading to the breaking andreconnection of ﬁeld lines and eventually to the formation of an X-point like structures characterizedby an inﬂow/outﬂow ion and electron ﬂuid velocity advecting the magnetic ﬂux toward/away fromthe reconnecting regions. Often in the laboratory but also in space, a mean out-of-plane magneticﬁeld, B gf = B e z , is associated to the CS. In this case the dynamics is quasi-2D and reconnectinglines do not “cut&past” but actually slip across the CS changing their original connectivity, evenif their projection still seems to break and re-connect. This conﬁguration, the so-called guide-ﬁeldregime, is often studied by means of 2D simulations.In such a system, magnetic reconnection can be easily recognized. On the contrary, as soon as CSshave more complex shapes, to ascertain the presence of reconnection is far less simple, e.g when CSsare dynamically generated by 2D (Henri et al. 2012; Daughton et al. 2014) or 3D (Faganello et al.2014; Borgogno et al. 2015; Sisti et al. 2019) Kelvin-Helmholtz vortices or, even worse, by magnetizedturbulence (Servidio et al. 2010; Zhdankin et al. 2013). For each of these cases a diﬀerent ad hoc method has been developed, based on a deep knowledge of the peculiarity of each system.For investigating the complex non-linear dynamics of such systems, numerical simulations are widelyused, providing also an important support for the understanding of satellite data. However thisapproach generates an impressive amount of data to deal with, in particular when working withkinetic simulations spanning the entire phase space. Large data sets, as well as the diﬃculty of easilyrecognize reconnecting structures in these sets, suggest a possible application of machine learningfor reconnection analysis. Recently, in Dupuis et al. (2020), some machine learning techniques based Sisti et al. on signatures in the particle velocity distribution function in the phase space have been tested toidentify reconnection regions. Moreover, in Hu et al. (2020) for the same purpose to automaticallyﬁnd magnetic reconnection sites, supervised machine learning methods based on CNN (ConvolutionalNeural Network) are used. In Hu et al. (2020) it has been shown that the supervised machine learningapproach turns out to perform very good and promising and that the algorithm seems to be capableof detecting reconnection even in cases in which the human labelling has failed. However, the setup of a large labelled training dataset is a long-term process which requires the contribution of anumber of experts in the ﬁeld. On the other hand, an unsupervised algorithm doesn’t require atraining dataset thus being more ﬂexible and quicker to be implemented (but not simpler). Here weanalyse the performances of unsupervised algorithms in order to prove that such an approach can bean eﬃcient method to automatically detect reconnection. For this purpose, we set up an hybrid 2D-3V (two-dimensional in space, three-dimensional in velocity space) simulation with kinetic ions andﬂuid electrons, describing the turbulent dynamics of a collisionless plasma where magnetic structuresand CSs, continuously generated by plasma motions, interact together and eventually coalesce ordisrupt due to magnetic reconnection. Beyond the physical interest of this simulation set on typicalsolar wind turbulence parameters, it represents a good test for a machine learning approach sincereconnection occurs in a variety of diﬀerent conﬁgurations far from the idealized 1D Harris-like.Aiming at developing an automatic procedure for the detection of physical structures of basicinterest, such as CSs and reconnecting structures, we have developed the following techniques. Theﬁrst group relies on “standard” unsupervised machine learning techniques, such as K-means (inparticular Lloyd’s algorithm) (MacQueen 1967) and DBscan (Density Based Spatial Clustering ofApplications with Noise) (Ester et al. 1996). The second group instead is based on a number ofstandard proxies used in the literature, and the deﬁnition of the corresponding thresholds, as markersto detect and highlight the presence of a reconnection event. In particular we cite: current density,electron vorticity and decoupling of the electron dynamics from the magnetic ﬁeld. Note that allmethods use physical quantities particularly suited to be considered as a signature of reconnection.These quantities, extracted here from numerical simulations, are the ones usually measured by on-board instruments of in-situ satellites. All methods are based on a ﬁnal fundamental step for selectingreconnecting structures: the deﬁnition of a threhsold on the aspect ratio of the CSs. This thresholdis motivated by the physics of the reconnecting CSs whose shape (their typical lenght and width) isnot random but imposed by the development of the reconnection instability.There exist several models of reconnection that could be used to extrapolate a reasonable value forthis threshold. The simplest one is the resistive Sweet-Parker model (Sweet 1958; Parker 1957) wherethe CS length L and width δ depend on the local reconnection rate R , i.e. on the outﬂow velocity(Cassak et al. 2017). Actually, the reconnection rate predicted from the Sweet-Parker model is tooslow to account for observations and simulations in collisionless plasmas. Other models such as theso-called “fast reconnection” predict a reconnection rate of the order of ∼ .

1, quite in agreementwith observations and simulation results recently found in the literature (Cassak et al. 2017). Adiﬀerent approach for the estimation of the CS aspect ratio relies on the role of the tearing modeas a suﬃcient condition for reconnection to occur. By looking at the wavenumber k m of the mostunstable mode in a CS of width δ , k m δ ∼ πδ/L ∼ .

5, we get δ/L ∼ .

08 (Karimabadi et al.2005), not far from ∼ .

1. In other words, there is a direct link between the aspect ratio and thereconnection rate. Therefore, we make use of the aspect ratio to distinguish structures where mostprobably reconnection is on-going using it as a threshold.The paper is organized as follows. In Section 2 the Hybrid Vlasov-Maxwell model (HVM) andthe 2D-3V simulation data are introduced. In Section 3 we describe the methods developed toautomatically identify CSs and reconnecting regions. In particular in Section 3.1 we discuss aboutour main method which uses the unsupervised machine learning techniques. In Section 3.2 and 3.3we present other two alternative methods which don’t use machine learning but are still automaticalgorithms to locate reconnecting regions using standard reconnection proxies. In Section 4 wediscuss the performances of these diﬀerent methods in ﬁnding reconnection sites. Finally, our resultsare summarized in Section 5. SIMULATION

Sisti et al.

To identify a wide panorama of reconnection sites emerging dynamically and not prepared ad-hoc,we make use of a 2D-3V plasma turbulence simulation and use an hybrid model with kinetic ionsand ﬂuid electrons (with mass) (Valentini et al. 2007). This simulation is run using the HVM codebased on an Eulerian approach to solve the ion Vlasov equation (Mangeney et al. 2002) coupledto the Maxwell equations but neglecting the displacement current and adopting the quasi-neutralapproximation, n i (cid:39) n e (cid:39) n . We use ion quantities to normalize the equations. These are: theion mass m i , the ion cyclotron frequency Ω ci = e ¯ B/m i c , Alfv´en velocity v A = ¯ B/ √ πm i n and sothe ion skin depth d i = v A Ω ci . The electron skin depth reads d e = (cid:112) m e /m i , where m e is theelectron mass. Finally ¯ n , ¯ B , ¯ E and ¯ f are the normalizing density, magnetic ﬁeld, electric ﬁeldand distribution function, respectively. Then, the Vlasov equation for the ion distribution function f = f ( x, y, v x , v y , v z , t ) reads: ∂f∂t + v · ∂f∂ x + ( E + v × B ) · ∂f∂ v = 0 (1)The Ohm’s equation for the electron response reads: E − d e ∇ E = − u e × B − n ∇ P e + d e ∇ · [ n ( uu − u e u e )] (2) where u = (cid:82) f v d v / (cid:82) f d v and u e are the ion and the electron ﬂuid velocities, respectively. Fur-thermore n is given by n = (cid:82) f d v . Finally the dimensionless Faraday and Ampere laws read: ∇ × B = J ; ∂ B ∂t = −∇ × E (3) For the sake of simplicity, we take an isothermal closure: P e = nT e . In this approach electroninertia (terms proportional to d e in Eq. 2) is a key ingredient since it allows for reconnection to occurby decoupling the electrons from the magnetic ﬁeld.We take a squared simulation box with L = L x = L y = 2 π

50 using N x = N y = 3072 points. Thecorresponding physics goes from the MHD ﬂuid-like behavior of the largest wavelength λ ∼ L tothe ion kinetic physics λ ∼ d i (= 1 in dimensionless units) to the electron inertial scale, λ ∼ d e . Interms of the corresponding frequencies we are across the ion cyclotron physics (= 1 in dimensionless Figure 1.

Snapshot of the contour plots of current density | J | for two diﬀerent times of our simulation: a) t = 247 [1 / Ω ci ] (current sheets formed), b) t = 494 [1 / Ω ci ] (fully-developed turbulence). units). We set the magnitude of the initial guide ﬁeld along the z -direction equal to one. The initialion distribution function is given by a Maxwellian with corresponding uniform temperature. Theelectron temperature is set equal to that of the ions, T i = T e . As a result, the plasma beta, theratio between the ﬂuid to the magnetic pressure, is equal to one (still in dimensionless units). Wesample the velocity space using 51 uniformly distributed grid point spanning [ − v th,i , v th,i ] in eachdirection, where v th,i = (cid:112) β i / m i /m e = 100allowing us to well separate the ion ( d i ) and electron ( d e ) skin depth. The simulation is initialized byadding a isotropic magnetic perturbation given by the sum of sinusoidal modes with random phase.The corresponding wave number interval is k ∈ [0 . , . δB rms (cid:39) .

28. In the initial phase, the initial perturbation starts toproduce a number of non interacting CSs where, later, at about one eddy turnover time ∼ Sisti et al. METHODSIn order to speed up the identiﬁcation of CSs and reconnecting structures, we develop three diﬀerenttechniques. The main one uses unsupervised machine learning, and will be addressed hereafteras “AML”. Two other alternative algorithms, which don’t use machine learning, are developed inorder compare the performances; they will be addressed as “A1” and “A2”. A comparison betweenthese three methods will be presented hereafter. The physical quantities we use as variables fordetecting reconnection in AML are the current density magnitude J = | J | , the magnitude of thein-plane electron ﬂuid velocity | u e, in − plane | , the magnitude of the in-plane magnetic ﬁeld | B in − plane | ,the magnitude of the electron vorticity Ω e = | Ω e | = |∇ × u e | , the electron decoupling term E dec ,e = | ( E + u e × B ) z | and the energy conversion term J · ( E + u e × B ). The ﬁrst three variables are relatedto the geometry of the CSs. The electron decoupling term describes the de-freezing of magneticﬁeld lines from the electron ﬂuid motion. The last variable is a proxy which accounts for the energydissipation at the reconnection sites (Zenitani et al. 2011). In A1 and A2 we use only some of them,in particular in A1 J , while in A2 J , Ω e and E dec,e .The aspect ratio (hereafter AR) of reconnecting CSs is not random but it’s linked to reconnectiondevelopment, thus all our algorithms have a common step: a threshold on the AR of the CSs selected.We deﬁne the AR as AR = lengthwidth (4)for each structure found. We give an estimation for the CSs width and length using the automatedmethod explained in Califano et al. (2020) and references therein. Note that for methods A1 andA2 the CSs are deﬁned as region where the current density J is greater than a certain thresholds, asdeﬁned in Zhdankin et al. (2013), while in AML, we make use of a diﬀerent technique to deﬁne theCSs in the physical space (as explained in the following sections). Anyway, once a CS is deﬁned asa collection of grid points in the 2D physical space, the automated procedure computes the Hessianmatrix H of | J | at the current peak (the local maximum of J ). We look at the interpolated proﬁle of J along the direction given by the eigenvector associated to the largest eigenvalue of H and deﬁne theCS’s width as the full width at half maximum of J . A similar procedure for computing the length (i.e.interpolating J along the direction of the eigenvector associated to the smallest eigenvalue) wouldbe misleading because in a turbulent system CSs are seldom “straight” along that direction. Wegive an automated estimation for the length by computing the maximal distance between two pointsbelonging to the same structure, similarly to what done in Zhdankin et al. (2013) where, however,the CSs are deﬁned using a threshold on the current density.It is worth noticing that all the variables we use are available in satellite data sets. In particular | u e | and | B | are directly measured by on-board instruments, while J = | n ( u − u e ) | , E dec ,e = | ( E + u e × B ) z | and J · ( E + u e × B ) are simple algebraic combinations of measured quantities. | u e, in − plane | and | B in − plane | can be computed by setting the out-of-plane direction aligned with the average (local)magnetic ﬁeld. Finally Ω e and AR can be calculated by measured quantities and spacecrafts’ relativepositions in the case of multi-satellite missions as CLUSTER or MMS (Dunlop et al. 2002; Fadanelliet al. 2019). 3.1. AML

The core of the ﬁrst method is constituted by two unsupervised machine learning algorithms: K-means (Lloyd’s algorithm) and DBscan (Density Based Spatial Clustering of Applications with Noise).These are both clustering techniques aimed at learning a grouping structure in a data-set. K-meansalgorithm requires to set the parameter “K” corresponding to the number of clusters we expect toﬁnd in our data-set in the variable space. To ﬁx the parameter “K” we make a pre-processing stepin which we tune the best value of “K” for our data. Our approach then summarizes in the followingsteps:1) Pre-processing step for the tuning of the “K” parameter for the K-means algorithm using a cross-validation-like approach with the Davis-Bouldin index applied to the predicted clusters as internalcluster metric (Rhys 2020). The tuning is applied in the variable space deﬁned at the beginningof Section 3. We choose to tune the “K” value at about one eddy-turnover time ( t (cid:39)

247 [1 / Ω ci ]),corresponding to the phase when CSs are well formed but still not interacting one each other. The0 Sisti et al.

Figure 2.

Value of the David-Bouldin index as a function of possible “K” from K = 5 to K = 15, for t = 247 [1 / Ω ci ]. The David-Bouldin index has a minimum for K = 11. result of the tuning is shown in Figure 2. The best “K” value is the one for which the David-Bouldinindex reaches its minimum, in our case it turns out to be K = 11, as shown in Fig.2.2) K-means (Llyod’s algorithm) is applied to the chosen variables which we list here again: thecurrent density magnitude J = | J | , the magnitude of the in-plane electron ﬂuid velocity | V e, in − plane | ,the magnitude of the in-plane magnetic ﬁeld | B in − plane | , the magnitude of the electron vorticityΩ e = | Ω e | = |∇ × V e | , the electron decoupling term (i.e., E dec ,e = | ( E + V e × B ) z | ) and the energyconversion term J · ( E + u e × B ). These variables are normalized between 0 and 1. This is a commonrequirement for many machine learning estimators, since in non-normalized data-set the presence ofoutliers could alter the result. The K-means algorithm returns K = 11 clusters in the variable space,as we show in Table 1 for time t ∼

Table 1.

The eleven clusters in the variable space of our data-set which result from the application of theK-means algorithm to t ∼

247 [1 / Ω ci ]. For each cluster, identiﬁed by an index, we report the mean value ofour variables and the number of grid points which belong to it. In bold the “interesting” cluster with thehighest value of mean current density. with the highest mean value of the current density (since a necessary condition for reconnection tooccur is the presence of a peak in the current density value), which is cluster “1”. This particularcluster in the variables space is made-up of diﬀerent structures in the physical space of our box. InFigure 3, top panel, we draw in the simulation ( x, y ) space domain the shaded iso-contours of theeleven clusters calculated by K-means (in the variables space). Cluster 1 is represented by the redcolor, the others by the diﬀerent blue variations. In the same Figure, bottom panel, we draw theshaded iso-contours of cluster 1 regions (red) superimposed to the contour shaded plots of the currentdensity | J | , suggesting that cluster 1 roughly corresponds to the ensemble of CSs.It is worth noticing that other unsupervised methods are suitable for the same purpose such as,for instance, the K-medoids technique. In particular, by applying K-medoids at t=247 we did notobserved any signiﬁcant improvement of the results. Therefore, K-means turns out to be a very goodapproach in this context, a baseline for this kind of analysis.2 Sisti et al.

Figure 3.

Top panel: the regions which constitute all eleven clusters in the variable space; in red cluster“1”, while in various shades of blue all other clusters (please note that any of the colors refers to a physicalvariable). Bottom panel: the regions which constitute cluster “1” (the interesting one) in the variable space,in red, superimposed to the contour plot of current density | J | . Figure 4.

Contour plot of current density | J | . In diﬀerent colors the diﬀerent structures identiﬁed throughDBscan algorithm applied to cluster “1” of Table 1.

3) DBscan. From Figure 3 we see that cluster 1 in the variable space is composed by diﬀerentstructures in physical space. Thus we need another algorithm able to distinguish diﬀerent structuresusing their location in physical space. A technique which is suited for this scope is DBscan, whichtakes as input the x and y-coordinates of the points belonging to cluster 1. We take as searchingradius for DBscan (cid:15) = 50 grid points which correspond to about 5 d i , and M in pts = 100 points, asthe minimum number of points for a single structure, where (cid:15) and

M in pts are intrinsic variablesof the DBscan algorithm. The results obtained by using DBscan are shown in Figure 4 where thediﬀerent colors represent the diﬀerent structures that are identiﬁed.We underline the fact that DBscan is not used here for image analysis but only for “cutting” cluster1 in diﬀerent subsets corresponding, in physical space, to diﬀerent structures. Figure 3 and 4 helphumans in visualizing the correspondence between these subsets and CSs where reconnection couldoccur.4

Sisti et al.

4) Threshold on the AR of structures found. For each structure found at step 3 we compute theAR (see Section 3, Eq. 4). We consider diﬀerent possible thresholds for the AR to get the betterperformance. In particular we have tried the following values: 10, 12.5, 20, 30, 50 and 70.3.2. A1 A1 is the simplest algorithm to identify structures. It is based on two step:1) A threshold on the current density | J | deﬁned as J thr = √ < J > +3 σ , where σ = (cid:112) < J > − ( < J > ) , see Zhdankin et al. (2013). More precisely, we select all regions in the physicalspace where the current density overcomes the threshold J thr . However, since a region of enhancedcurrent is a necessary condition but not a suﬃcient one for reconnection, we add a second step.2) A threshold on the AR of the structures in order to select the most probably reconnectingstructures. As done in AML, we consider diﬀerent possible threshold values on AR: 10, 12.5, 20, 30,50 and 70. 3.3. A2 A2 is a reﬁnement of A1. Another step is added in order to increase the precision in ﬁndingreconnection events. The method is summarized as follows.1) As in A1 we ﬁx a threshold on the current density and select the regions which overcome thisthreshold.2) We look at each structure and we select only the ones which include some points (at least two)which overcome both a threshold over electron vorticity Ω e and over E dec,e . In particular thesetwo thresholds are deﬁned as: Ω e, thr = (cid:112) < Ω e > +3 σ Ω , where σ Ω = (cid:112) < Ω e > − ( < Ω e > ) , and E dec,e, thr = (cid:113) < E dec,e > +3 σ E , where σ E = (cid:113) < E dec,e > − ( < E dec,e > ) .3) Finally, as in A1, we set a threshold on the AR of the selected structures. Also in this case, weconsider diﬀerent possible values for the AR: 10, 12.5, 20, 30, 50 and 70.In Table 2 we give a summary of the steps of each method, while in Figure 5 we show a ﬂow-chartto sketch the selection procedure to detect possible reconnecting structures. RESULTS5

Figure 5.

Flow-chart for the three methods we are considering.

Once three sets of candidate reconnecting structures have been selected using the three diﬀerentmethods (see ﬂow-chart in Figure 5), we estimate the accuracy of these techniques by looking atcandidate sites one by one. In particular, we check if reconnection is going on in each single siteby looking if typical signatures of reconnection are present: inversion of the in-plane magnetic ﬁeld,X-point conﬁguration of the magnetic ﬂux Ψ, converging electron inﬂows toward the X-point anddiverging outﬂows, magnetic ﬂuctuation along the guide-ﬁeld direction, peaked electron decouplingand energy conversion terms.Now, to compare quantitatively the performances of the three algorithms AML, A1 and A2 fordiﬀerent values of the AR threshold. We deﬁne the following quality-parameters:6

Sisti et al.

Method StepsAML 0) Selection of physical variables we want to use1) pre-processing: tuning of “K” value for the K-means2) K-means on variable space3) DBscan on physical space4) threshold on aspect ratioA1 1) threshold on the current density | J | in the physical space2) threshold on the aspect ratioA2 1) threshold on the current density | J | in the physical space2) request of points which overcome thresholds on | Ω e | and E dec,e

3) threshold on the aspect ratio

Table 2.

Summary of the steps for each method.

Precision = ∼

1. Indeed, in this case we would have at the sametime that the algorithm has selected only reconnection sites and has not excluded reconnection sites.In Table 3, 4 and 5 we show the values of these quality-parameters for AML, A1 and A2, respectively,and for diﬀerent AR thresholds. The results are shown at ﬁve time instants of our simulation: t ∼ / Ω ci ] (initial phase, no evidence of CS structures, t ∼

230 [1 / Ω ci ], t ∼

247 [1 / Ω ci ], t ∼ / Ω ci ] (CS developed regime), t ∼

494 [1 / Ω ci ] (fully developed turbulence). We will refer to thesethree distinct phases as phase I, II and III. In the last two columns we list the mean value of ourquality-parameters during phase II, t = t , t , t and during phase II and III, t = t , t , t , t .Looking at Table 3 corresponding to AML we observe: a) precision increases when AR threshold7increases, b) nMR-precision decreases when AR threshold increases, meaning that we are loosing“good” (reconnecting) sites, c) precision worsen when we include in our analysis phase III (turbulentregime). Same conclusions are drawn by looking at Table 4 and 5, corresponding to methods A1 andA2. In order to ease the comparison for the diﬀerent performances, we plot in Figure 6 precision(solid red, dashed orange and dotted brown lines, AML, A1 and A2, respectively) and nMR-precision(solid blue, dashed cyan and dotted violet lines, AML, A1 and A2, respectively) averaged over phaseII (left panel) and over phase II and III (right panel) as a function of the AR threshold. By comparingthese plots, left panel, we see that the unsupervised machine learning algorithm AML turns out tobe strongly competitive with respect to the methods based on thresholds on standard proxies. Inparticular precision performs better for AML rather than for A1. For what concerns A2, we seethat the precision appears to be less inﬂuenced by the AR threshold variations. In particular, theprecision of A2 (brown dotted line) turns out to be better at small AR thresholds than for AML; onthe other hand, the nMR-precision (dotted violet line) worsen, so that lots of “good” reconnectionsites are excluded. The same holds when including in the average the values at t = t (fully developedturbulent phase), see the right panel of Figure 6. However, and this is a quite general result, precisionand nMR-precision worsen a bit when considering the full turbulent phase. This worsening is theconsequence of the dynamics driven by the turbulence which advects, shrinks, breaks and deforms theCSs. This dynamics together with the merging of nearby structures typical of a 2D geometry makesmuch more diﬃcult a correct estimation of the typical width and length. Moreover, turbulence alsocreates regions with sharp variations in current density where, however, reconnection is not occurringthus “confusing” the various Methods and decreasing their accuracy. Except for this worsening inprecision at the end of the simulations, all Methods perform well both during the initial phase I when,correctly, no reconnection sites are detected, as well as during the central phase II when the CSs arewell detected. According to us, it is possible to deﬁne an optimal AR threshold for AML and A1,which corresponds to a compromise between having a good precision and not loosing lots of goodreconnection sites. For AML, this optimal AR threshold corresponds to the one at the intersectionbetween the plot of precision (left panel of Figure 6, red solid line) and the plot of nMR-precision8 Sisti et al.

Figure 6.

Plots of the values of precision (red, orange and brown lines) and nMR-precision (blue, cyan andviolet lines) averaged over the three central times (left panel) and over four times (right panel), as a functionof the AR threshold, for the three diﬀerent Methods we are analyzing: solid line, AML; dashed line, A1;dotted line, A2. (left panel of Figure 6, blue solid line). In this case, we get AR (cid:39)

18, which gives a precision andnMR-precision accuracy of the order of 78%. For A1, the same is obtained for AR (cid:39)

14 with a scoreof (cid:39) J · ( E + u e × B + ∇ P e ne ) instead of J · ( E + u e × B ).This can be due to the fact that in our model we consider a scalar pressure, thus, in the absence ofoﬀ-diagonal terms in the pressure tensor, this term doesn’t contribute to the eﬀective Ohmic heating(Birn & Hesse 2010). CONCLUSIONS9

Tempo [1/Ω ci ] 20 230 247 282 494 Mean (3t) Mean (4t)N. structures 35 29 19 24 32N. structures AR >

10 0 17 17 21 28

AR > . AR >

20 0 14 15 14 19

AR >

30 0 10 10 13 12

AR >

50 0 6 9 12 7

AR >

70 0 3 7 6 4precision

AR >

10 – 0.65 0.82 0.67 0.43 0.71 0.64

AR > . AR >

20 – 0.79 0.80 0.79 0.42 0.79 0.7

AR >

30 – 0.8 1 0.77 0.33 0.82 0.72

AR >

50 – 1 1 0.75 0.43 0.92 0.79

AR >

70 – 1 1 0.83 0.75 0.94 0.89nMR-precision

AR <

10 1 0.92 1 1 1 0.97 0.98

AR < . AR <

20 1 0.93 0.5 0.7 0.69 0.71 0.7

AR <

30 1 0.79 0.55 0.64 0.6 0.66 0.64

AR <

50 1 0.74 0.5 0.58 0.64 0.61 0.61

AR <

70 1 0.65 0.42 0.5 0.68 0.52 0.56

Table 3.

Number of structures found, precision and nMR-precision for AML, for diﬀerent AR threshold(from 10 to 70). The results are shown for ﬁve diﬀerent times of our simulation: t ∼

20 [1 / Ω ci ], t ∼ / Ω ci ], t ∼

247 [1 / Ω ci ], t ∼

282 [1 / Ω ci ], t ∼

494 [1 / Ω ci ]. In the last two columns we report the mean valueof our quality-parameters for the three central times (230, 247, 282) and for four times (230, 247, 282, and494). In the second row we list the number of structures found at the end of the third step; in rows from 3rdto 8th we give the number of detected structures overcoming the speciﬁed AR threshold. In rows from 9th to14th we give the values of precision for diﬀerent AR threshold computed among the structures enumeratedin rows 3-8; ﬁnally in rows 15th to 20th the same for nMR-precision. Sisti et al.

Tempo [1/Ω ci ] 20 230 247 282 494 Mean (3t) Mean (4t)N. structures 11 45 47 52 454N. structures AR >

10 0 22 23 18 88

AR > . AR >

20 0 15 15 12 43

AR >

30 0 13 12 10 26

AR >

50 0 11 9 7 11

AR >

70 0 6 6 5 4precision

AR >

10 – 0.68 0.61 0.61 0.26 0.63 0.54

AR > . AR >

20 – 0.8 0.8 0.75 0.33 0.78 0.67

AR >

30 – 0.77 0.83 0.8 0.35 0.8 0.69

AR >

50 – 0.82 1 0.86 0.36 0.89 0.76

AR >

70 – 1 1 0.8 0.25 0.93 0.76nMR-precision

AR <

10 1 0.74 0.71 0.65 0.90 0.7 0.75

AR < . AR <

20 1 0.7 0.72 0.65 0.89 0.69 0.74

AR <

30 1 0.66 0.68 0.64 0.88 0.66 0.71

AR <

50 1 0.65 0.68 0.62 0.88 0.65 0.71

AR <

70 1 0.61 0.63 0.60 0.87 0.61 0.68

Table 4.

Number of structures found, precision and nMR-precision for A1, for diﬀerent AR threshold (from10 to 70). The results are shown for ﬁve diﬀerent times of our simulation: t ∼

20 [1 / Ω ci ], t ∼

230 [1 / Ω ci ], t ∼

247 [1 / Ω ci ], t ∼

282 [1 / Ω ci ], t ∼

494 [1 / Ω ci ]. In the last two columns we report the mean value of ourquality-parameters for the three central times (230, 247, 282) and for four times (230, 247, 282, and 494).In the second row we list the number of structures found at the end of the third step; in rows from 3rd to8th we give the number of detected structures overcoming the speciﬁed AR threshold. In rows from 9th to14th we give the values of precision for diﬀerent AR threshold computed among the structures enumeratedin rows 3-8; ﬁnally in rows 15th to 20th the same for nMR-precision. Tempo [1/Ω ci ] 20 230 247 282 494 Mean (3t) Mean (4t)N. structures 2 30 27 26 126N. structures AR >

10 0 21 18 12 52

AR > . AR >

20 0 16 14 10 29

AR >

30 0 13 11 9 20

AR >

50 0 11 9 7 8

AR >

70 0 6 6 5 3precision

AR >

10 – 0.76 0.78 0.92 0.33 0.82 0.70

AR > . AR >

20 – 0.81 0.86 0.9 0.38 0.86 0.74

AR >

30 – 0.77 0.91 0.89 0.4 0.86 0.74

AR >

50 – 0.82 1 0.86 0.37 0.89 0.76

AR >

70 – 1 1 0.8 0.33 0.93 0.78nMR-precision

AR <

10 1 0.89 0.44 0.64 0.81 0.66 0.69

AR < . AR <

20 1 0.71 0.46 0.56 0.79 0.58 0.63

AR <

30 1 0.59 0.44 0.53 0.78 0.52 0.58

AR <

50 1 0.58 0.44 0.47 0.77 0.5 0.56

AR <

70 1 0.54 0.38 0.43 0.76 0.45 0.53

Table 5.

Number of structures found, precision and nMR-precision for A2, for diﬀerent AR threshold (from10 to 70). The results are shown for ﬁve diﬀerent times of our simulation: t ∼

20 [1 / Ω ci ], t ∼

230 [1 / Ω ci ], t ∼

247 [1 / Ω ci ], t ∼

282 [1 / Ω ci ], t ∼

494 [1 / Ω ci ]. In the last two columns we report the mean value of ourquality-parameters for the three central times (230, 247, 282) and for four times (230, 247, 282, and 494).In the second row we list the number of structures found at the end of the third step; in rows from 3rd to8th we give the number of detected structures overcoming the speciﬁed AR threshold. In rows from 9th to14th we give the values of precision for diﬀerent AR threshold computed among the structures enumeratedin rows 3-8; ﬁnally in rows 15th to 20th the same for nMR-precision. Sisti et al.

Summarizing, we developed three diﬀerent Methods aimed at automatically detecting reconnectingCSs in 2D simulation of turbulence. AML uses unsupervised machine learning techniques for thescope, while Methods A1 and A2 use only thresholds on the standard reconnection proxies. AllMethods are based on a threshold on the structures’ AR since the AR is directly linked to thereconnection rate and is a physical adimensional parameter independent from all simulations set up.We have deﬁned two diﬀerent quality parameters in order to estimate quantitatively the performanceof our algorithms: precision, which gives a measure of the capability of the method to select goodreconnection sites, and nMR-precision, which gives us an information about excluded regions, inparticular how many of them are really not reconnecting sites. In other words, it is an informationabout good sites excluded. AML is the one which performs the best. Moreover, although A2 hasa better precision at smallest AR threshold, for AML nMR-precision is better, and it is possible todeﬁne for it an optimal AR (as the intersection between the plot of precision an nMR-precision) whichinstead cannot be done for A2. The impossibility to deﬁne an optimal AR for A2, together with thebad values of nMR-precision for this method (lots of good sites lost), brings us to the conclusion thatA2 is not the best method to be applied. Considering only time periods where CSs are formed but stillnot interacting or signiﬁcantly aﬀected by the turbulent dynamics, we found for AML an optimal ARof about 18, which gives both a precision and a nMR-precision of ∼ Sisti et al.