Automatic classification of nuclear physics data via a Constrained Evolutionary Clustering approach
D. Dell’Aquila a,∗, M. Russo b,c

a Ruđer Bošković Institute, Department of Experimental Physics, Bijenička 54, Zagreb, HR-10000, Croatia
b Dipartimento di Fisica e Astronomia, Università degli Studi di Catania, 95235 Catania, Italy
c INFN-Sezione di Catania, 95235 Catania, Italy

∗ Corresponding author. E-mail address: [email protected]

Preprint submitted to Computer Physics Communications, April 28, 2020
Abstract
This paper presents an automatic method for data classification in nuclear physics experiments based on evolutionary computing and vector quantization. The major novelties of our approach are the fully automatic mechanism and the use of analytical models to provide physics constraints, yielding a fast and physically reliable classification with nearly-zero human supervision. Our method is successfully validated using experimental data produced by stacks of semiconductor detectors. The resulting classification is highly satisfactory for all explored cases and is particularly robust to noise. The algorithm is suitable for integration in the online and offline analysis programs of existing large-complexity detection arrays for the study of nucleus-nucleus collisions at low and intermediate energies.
Keywords:
Nuclear physics data classification, Evolutionary computing, Clustering algorithms, Charged particle identification in nuclear collisions.
1. Introduction
Nuclear physics experiments significantly rely on data classification, i.e. the grouping of data into meaningful physics classes, to reconstruct nucleus-nucleus collision events and enable the exploration of the underlying physics. In studies that exploit the detection of charged particles, the classification
problem is often that of identifying the charge (Z) and mass (A) of detected ions. This process is usually referred to as particle identification. To this end, a number of detection systems capable of recording information useful to the classification process have been developed over the last decades [1-10]. A quite common strategy consists in the use of detector arrays based on stacks of detection layers through which the particle penetrates before being completely stopped. In such arrays, if organized in a 2D correlation plot, the data recorded by pairs of independent layers assemble into bi-dimensional non-overlapping clusters, each representing a certain (Z, A) class. To this extent, the problem of nuclear physics data classification is equivalent to the extraction of clusters in a bi-dimensional space.

Numerous algorithms for Cluster Analysis (CA) or Vector Quantization (VQ) have been proposed in the literature so far, achieving noticeable success in standard partitioning problems, where the focus is on obtaining clusters of nearly equal dispersion (see for example [11-14]). However, the clusterization process in nuclear physics data is made more difficult by the large variability of size and dispersion of the various clusters [15]. Because of these unique features, an optimal solution according to the CA/VQ approach, where a good equalization of the content of each cluster is obtained, might not be directly applicable to nuclear physics classification problems, where an acceptable physical solution usually contains clusters with strongly unbalanced distortions. For this reason, only a reduced number of works have previously attempted to use CA/VQ methods in nuclear data classification problems. For example, fuzzy c-means algorithms have been successfully applied to the identification of particles in nucleus-nucleus collisions, but their use was restricted, exclusively, to cases characterized by the presence of few clusters with nearly equal distortion [16].

Several studies have instead focused on the classification of nuclear physics data in more general cases. Among those, image processing techniques are widely applied. For example, unsupervised learning approaches exploiting contextual image segmentation methods [15], neural networks [17] or spatial density analysis [18] led to quite satisfactory classifications. However, these methods were conceived to classify exclusively Z-values, and are thus not particularly suitable for the vast majority of modern high-resolution experiments, where A-classification is often a crucial requirement [7]. More recently, another automatic classification method was proposed in Ref. [19]. This method allowed a good extraction of clusters, but the procedure does not comprise an explicit link between extracted clusters and physically meaningful classes, thus requiring significant human supervision, especially in the analysis of data produced by large detection arrays.

Because of the above limitations, the vast majority of the approaches commonly used for the classification of data in nuclear physics experiments involve human-supervised techniques. In such approaches, the operator manually extracts information by visually inspecting bi-dimensional distributions of data, which is then used as input for supervised learning procedures.
In this respect, artificial neural networks have been proposed [20]. More usually, error minimization procedures based on mathematical models [21, 22], which contain Z and A values explicitly, are preferred. The latter allow the operator to manually extract information for only a reduced number of clusters, while the resulting classification can be meaningfully extended to any possible (Z, A) ion by model extrapolation. An additional reduction in the required information can finally be obtained by following the procedure suggested in Ref. [23]. However, despite the significant effort devoted to minimizing human supervision, obtaining a physically meaningful data classification in nuclear physics experiments is still a quite repetitive and time-consuming task for the operator, especially in the case of modern detection arrays, where the number of individual bi-dimensional plots to inspect ranges from a few hundred to thousands. Depending on the number of bi-dimensional distributions to classify, the time required by an operator to perform such a task ranges from days to months. In this framework, it is clear that new fully-automatic methods for data classification are highly needed.

In this paper, we present an innovative approach to nuclear physics data classification that reduces the task to a few minutes or hours of machine time with nearly-zero supervision by the operator. Our approach involves Evolutionary Computing (EC) and CA/VQ and is based on a two-level search for the optimal solution of the data clusterization problem. The upper level consists in a global search operated by an EC algorithm that treats each solution as an individual of a given population and applies suitable evolutionary criteria. In our EC approach, an individual is encoded according to a given choice of functional parameters, which constrain a physically meaningful model for the description of bi-dimensional clusters, and of (Z, A) physics classes. (For the present work, the model proposed in Refs. [21, 22] has been used; however, our algorithm is fully general and can be applied exploiting any given model with an arbitrary number of parameters to adjust.) The lower level, used as a local hill-climbing operator for the EC process, performs a fast local search through a suitable VQ algorithm that exploits the resulting EC individual as the initial codebook for the optimization problem. The major novelty with respect to previously published CA/VQ methods is that the codebook has a physical meaning, and the resulting solutions are immediately applicable to the data classification with nearly-zero effort required of the operator. In addition, our algorithm is fully automatic, as no a priori information is required of the experimenter.

Since the proposed methodology is highly interdisciplinary, in order to effectively guide the reader through the paper we provide an introduction to all the individual research fields that cooperate in our algorithm. The paper is organized as follows: Section 2 gives a more quantitative description of the classification problem studied in this paper, Section 3 is concerned with a description of the programming techniques used in our study, i.e. EC and CA/VQ, Section 4 provides a detailed description of the algorithm, in Section 5 we test the performance of the algorithm in a typical experimental case, in Section 6 we compare our methodology with previously published approaches and, finally, Section 7 reports conclusions and possible future perspectives of our work.
2. Experimental context
Charged particles are among the most frequent products of a nucleus-nucleus collision. At low and intermediate incident energies, they consist almost exclusively of ions (Z ≥ 1). Their number in a typical collision event varies depending on the incident energy and the size of the nuclei involved in the collision, and can range from a few units up to several tens. To meet the requirements of modern nuclear physics studies, state-of-the-art apparatus for the detection of charged particles have reached an extremely high level of accuracy in identifying a large variety of ions.

Two numerical quantities are typically used to identify an ion: its number of protons, often called charge and indicated with Z, and its total number of nucleons, i.e. the sum of the number of its protons and neutrons, called mass and indicated with A.
Figure 1: A typical plot obtained by correlating the energy signals of two longitudinally stacked detectors used as a ∆E-E telescope. Data are obtained by means of the GEANT4 toolkit [24] for a Silicon-Silicon telescope (thin ∆E stage); the axes report the residual energy E and the energy loss ∆E, both in MeV. In such a bi-dimensional representation, data group into clusters, each corresponding to a given (Z, A) ion. Colors define the statistics of counts as indicated by the color scale. The insert represents a zoom of the 1 ≤ Z ≤ 2 region. Labels indicate the loci corresponding to the various ions (H, He, Li, Be and B isotopes).

Because the energy deposition of a heavy ion in a given material is a well-known function of its energy (i.e. its velocity), of the properties of the material and of the nature of the ion (i.e. its Z and A values) [25], one can identify an ion produced in a collision event if its velocity and its energy loss in a layer of a material of given properties are known. This is the principle of the particle identification technique called ∆E-E, which is probably the most widely used approach to identify charged particles in nucleus-nucleus collisions at intermediate and low energies. The practical implementation of the ∆E-E technique relies on the use of so-called telescopes, i.e. stacks of longitudinally arranged detectors, each with an independent readout. Two detection stages are typically sufficient to this end, but configurations with more than two stages have also been proposed to account for the need for a particularly large dynamic range in a single experiment (see e.g. [5, 7, 8]). Let us assume, for the sake of clarity, that a telescope composed of two detection stages is used and that the particle has sufficient energy to pass through the first stage, being stopped in the second stage. (The approach is more general and can be applied to any array of longitudinally stacked detectors if signals produced by pairs of independent layers are considered; to this end, the hypothesis that the particle is fully stopped in the second of those layers is required.) In such a telescope, the signal recorded by the first detection stage, associated to the energy deposited by the particle in the detector volume, corresponds only to a portion of the total kinetic energy E_kin of the particle and can therefore be indicated as ∆E. Since the particle is stopped in the second detection stage, the signal produced by the latter will complement the kinetic energy of the particle, i.e. E_kin = ∆E + E. If the first stage is sufficiently thin, the following equation can be easily derived from the non-relativistic formalism introduced in [25]:

$$-\frac{dE}{dX} = \frac{4\pi e^4 Z^2}{m_e v^2}\, N B \approx \frac{\Delta E}{\Delta X} \;\;\Rightarrow\;\; \Delta E \propto \frac{Z^2 A}{E} \qquad (1)$$

where e is the elementary electric charge, m_e is the mass of the electron, v is the velocity of the incident particle, N is the number of atoms per cm³ in the material, I is the average excitation potential of the atoms in the material and B is a dimensionless quantity weakly dependent on the velocity of the particle and on the properties of the material. In the last term, we replaced E_kin with E, since E_kin = ∆E + E ≈ E, we used v² ∝ E/A, and we considered the approximation B ≈ const. It is important to point out that these approximations and the simplified formalism derived from [25] are adequate for the purpose of this section; a fully accurate formalism will instead be used to derive the classification algorithm of Section 4. From the last term of eq. 1, one can state that pairs of charged particle signals recorded by a telescope occupy hyperbolic-like loci in the (E, ∆E) plane depending on their (Z, A) values. Loci are not equally spaced, as the dependency is quadratic on Z and linear on A. This can be observed in the plot shown in Fig. 1, where data simulated by using the GEANT4 toolkit [24] for a typical Silicon-Silicon telescope are shown.
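To make the Z²A/E scaling of eq. 1 concrete, the short Python sketch below tabulates the simplified loci for a few light isotopes. It is only an illustration: the proportionality constant K is an arbitrary placeholder of ours, not a calibrated detector quantity, and the full formalism of Section 4 is needed for real data.

```python
# Minimal illustration of the Delta E - E loci implied by eq. 1 (Delta E ~ Z^2 A / E).
# K is an arbitrary scale factor (illustrative only); real loci require the
# complete formalism of Section 4 and detector-specific calibrations.

K = 5.0  # arbitrary placeholder, NOT a calibrated value

ISOTOPES = [(1, 1, "1H"), (1, 2, "2H"), (1, 3, "3H"),
            (2, 4, "4He"), (3, 7, "7Li")]

def delta_e(e, z, a, k=K):
    """Simplified energy loss in the thin stage at residual energy e (MeV)."""
    return k * z * z * a / e

if __name__ == "__main__":
    for e in (20.0, 40.0, 80.0):
        row = ", ".join(f"{name}: {delta_e(e, z, a):5.2f}" for z, a, name in ISOTOPES)
        print(f"E = {e:5.1f} MeV -> Delta E (MeV): {row}")
    # At any fixed E the loci are ordered by Z^2*A (1H < 2H < 3H < 4He < 7Li),
    # which is why a single index ordered by Z^2*A can label the curves (Section 4).
```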
To produce the data in the figure, we considered, for simplicity, a uniform energy distribution for all emitted particles, discarding the signals of particles not fully stopped in the second detection stage. The Z-value range 1 ≤ Z ≤ 5 was taken into consideration, and a realistic A distribution for each Z-species was introduced. By visually inspecting the bi-dimensional distribution, one can immediately observe that different areas are populated, corresponding to the different Z-values; this is particularly visible in the insert for Z = 1 and Z = 2. Z-lines are additionally separated into A-clusters, each corresponding to a different isotope of the particular Z-species. Because the latter are intrinsically less separated than Z-lines, as a result of the linear dependence on A given by eq. 1, their identification is usually more challenging. In a typical experiment, different Z and A distributions can be recorded by different detectors within the same setup, depending on their geometrical position with respect to the accelerated beam. Another interesting feature observed in Fig. 1 is the difference in population of the various clusters. It reflects the statistics of production of the different ions emitted in the nuclear collision. In the example of Fig. 1, where colors indicate the content of each individual bin, we have chosen a distribution similar to that observed in Li+Li and Li+F collisions (cf. Section 5). It is evident, for instance, how clusters corresponding to Z = 1 are considerably more populated than Z = 5 ones. The dispersion of each cluster around its mean value also differs very significantly cluster-by-cluster. For example, the lines associated to the Z = 1 isotopes, i.e. 1H, 2H and 3H, well visible in the insert, have a much smaller average dispersion (FWHM) than that associated to B. These values, which usually increase for increasing Z-values, are characteristic of the process of interaction of radiation with matter and are therefore affected by the effective thickness of the ∆E detector as well as by the uniformity of the detector itself. In addition, the thickness of the ∆E detector defines the absolute position of clusters in the (E, ∆E) plane, which is typically different detector-by-detector.

To summarize, the problem of charged particle identification with longitudinally stacked detectors consists in recognizing the (E, ∆E) pairs that belong to the same physical class and in linking them to a given (Z, A) ion. Such a task is therefore equivalent to a classification problem. The most relevant features of the bi-dimensional clusters to extract are: (i) the number of clusters and their Z and A values are different for each experiment and often detector-by-detector, (ii) statistics within each cluster differ significantly from cluster to cluster, (iii) clusters have non-equal dispersions, (iv) no reasonable hypothesis can be made regarding the absolute position of clusters in the (E, ∆E) plane.
3. Description of the programming techniques

The newly proposed methodology comprises two different programming techniques that cooperate to obtain, in a fully automatic way, a solution to the proposed classification problem: EC and CA/VQ. In this section, we provide an introduction to the most salient concepts of EC and CA/VQ.

3.1. Evolutionary computing
Evolutionary computing is a scientific field concerned with the resolution of optimization problems through the application of concepts derived from the natural world [26]. Nowadays, EC is applied to numerous domains of science [27-29]. A very frequent scheme, derived from the Darwinian evolutionary theory, can be schematically described in the following way:

1. A set of possible solutions to the optimization problem, encoded according to a predefined scheme, is generated (often randomly). Each such solution is called an individual, while a set of individuals forms a population.

2. A numerical value, called fitness, is associated to each individual. The fitness quantifies how optimal a given solution is for the problem to solve. The higher the fitness associated to an individual, the more promising the individual itself. This is a crucial quantity for the success of the optimization procedure.

3. Until a predefined convergence criterion is reached, the following steps are iterated:
   (a) Some individuals are selected (parents) to be used as a starting point for the generation of new individuals (offspring).
   (b) Offspring are obtained through a suitable mechanism of recombination of the parents' encodings (crossover). In this phase, the chromosomes of the parents, i.e. their encodings, are suitably combined to generate new individuals. A valid crossover should produce individuals whose genetic code is, to some extent, similar to that of the parents. Crossover is usually followed by a random variation, with low probability, of some portions of the derived encoding. Such a process is called mutation and has a crucial importance, as it allows missing genetic code to be introduced and genetic diversity to be maintained in the population. The fitness is finally calculated for all newly obtained individuals.
   (c) Some offspring live sufficiently long to replace other pre-existing individuals.

The mechanism of fitness improvement typical of EC is a result of the implementation of suitable selection criteria. This is mandatory in order to enhance the performance of the algorithm with respect to purely Monte Carlo codes. As an example, the initial choice of the parents in step (a) can be affected by the fitness of the available individuals, i.e. parents can be chosen according to a probability distribution that favors the extraction of high-fitness individuals. In a similar way, the replacement criterion used to introduce newly generated offspring into the population can account for the fitness of pre-existing individuals. The latter are usually selected among those rejected in step (a). To this end, deterministic tournaments between pairs of individuals or fitness-based stochastic criteria are often used. The replacement of individuals is generally required in order to keep the total number of individuals in the population constant.

EC algorithms are strongly CPU-intensive; this makes it crucial to allocate most of the CPU resources towards the more promising individuals, even if a certain CPU quota has to be ensured to all individuals in the population. A similar allocation of available resources is intrinsic in the selection process operated by EC.

Another fundamental aspect is the so-called premature convergence. It consists in the convergence of the algorithm towards a local maximum of the fitness function and often negatively affects the capability of EC to obtain satisfactory results. As shown in previous studies, see e.g. Ref.
[29], the problem of premature convergence can be countered by subdividing the population into multiple sub-populations. In this way, even if a sub-population reaches premature convergence, with a consequent loss of genetic diversity, it is extremely unlikely that all sub-populations simultaneously converge towards the same individual. In addition, migration plays a fundamental role in this context. If a sub-population has converged towards a local maximum, thus ceasing to improve its individuals, injecting an individual from another independent sub-population might restart the evolution of the prematurely converged sub-population.

The global search of EC is generally extremely powerful in obtaining good solutions that are close to a maximum of the fitness function, but is rather slow in the search for the maximum itself. Local search techniques are instead suitable for the fast determination of the closest local maximum. For this reason, one or more hill-climbing operators, devoted to a fast local search in the proximity of a maximum, are sometimes introduced in EC to significantly speed up the determination of the maximum of the fitness function.
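The Darwinian scheme of steps 1-3 above can be condensed into a few lines of code. The following is a minimal sketch under toy assumptions of ours (scalar individuals, a quadratic objective, tournament selection and worst-individual replacement); it is not the C-EC algorithm of Section 4, only the generic loop it builds upon.

```python
# Generic sketch of the Darwinian EC scheme (steps 1-3 above).
# The objective and all numeric settings are illustrative placeholders.
import random

def fitness(x):
    # Toy objective with maximum at x = 1; a real problem supplies its own.
    return -(x - 1.0) ** 2

def evolve(pop_size=20, iterations=500):
    population = [random.uniform(-10, 10) for _ in range(pop_size)]  # step 1
    for _ in range(iterations):                                      # step 3
        # (a) fitness-biased parent selection via 2-way tournaments
        p1 = max(random.sample(population, 2), key=fitness)
        p2 = max(random.sample(population, 2), key=fitness)
        # (b) crossover (average) followed by low-probability mutation
        child = 0.5 * (p1 + p2)
        if random.random() < 0.1:
            child += random.gauss(0.0, 0.5)
        # (c) the offspring replaces a pre-existing low-fitness individual
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
    return max(population, key=fitness)

if __name__ == "__main__":
    print("best individual:", evolve())
```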
3.2. Cluster analysis and vector quantization

Clustering is an important instrument in many scientific disciplines. The partitioning approach known as VQ [30] consists in the derivation of a set (codebook) of reference or prototype vectors (codewords) from a given dataset. In this way, each subset of vectors (patterns) belonging to the original dataset is represented uniquely by one codeword. Clusters can then be easily extracted based on the proximity of patterns to the available codewords. While VQ is concerned with finding a codebook that represents the original multi-dimensional dataset as well as possible, CA is conceived as the problem of identifying clusters of data, regardless of the determination of a codebook. Codewords are determined by a procedure that consists in the minimization of an objective function (distortion) representing the quantization error (QE) [31]. A widely accepted classification scheme distinguishes between K-means and competitive learning. Clustering algorithms belonging to the first of those classes [11, 32] are based on the minimization of the average distortion through a suitable choice of codewords. In approaches belonging to the second category, a codebook is instead obtained as a consequence of a competition process between codewords [33].

Quantitatively, the objective of standard VQ consists in the representation of a given set of vectors x ∈ X ⊆ ℝ^k through a set, Y = {y_1, ..., y_{N_C}}, of N_C reference vectors in ℝ^k. In this definition, the vectors y_i represent the codewords and Y is the codebook. A VQ can therefore be represented by a function q : X → Y. The determination of q allows one to obtain a partition S of the original dataset X constituted by N_C subsets, S_i, called cells:

$$S = \{S_i;\ i = 1, \ldots, N_C\} \qquad (2)$$

where each cell S_i is defined by the following equation:

$$S_i = \{x \in X : q(x) = y_i\} \qquad (3)$$

The QE is the value given by d(x, q(x)), d being a generic distance operator between vectors defined in X × Y. Several functions are conventionally used for distortion measurement [32]. For the purposes of this work, as will be discussed in detail in Section 4, the scheme introduced by eq. 3 has been modified to be suitable for our classification problem, where codewords are represented by hyperbolic-like curves of the type of eq. 1 and the distance operator d is defined accordingly. However, the following discussion is valid without any loss of generality.

The performance of a given quantizer q is usually evaluated through the Mean QE (MQE). When X is constituted by a finite number (N_P) of patterns, MQE is given by:

$$\mathrm{MQE} \equiv D(Y, S) = \frac{1}{N_P} \sum_{i=1}^{N_C} D_i \qquad (4)$$

where D_i is the total distortion of the i-th cell, defined by the following equation:

$$D_i = \sum_{n :\, x_n \in S_i} d(x_n, q(x_n)) \qquad (5)$$

Equation 4 shows that MQE is equivalent to a function (D) of the codebook Y and the corresponding partition S (a minimal numerical example is given below).
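As a small, self-contained sketch of eqs. 2-5, the following code evaluates the MQE of a toy codebook of point codewords with a Euclidean distance; the data and codebook values are made-up placeholders, and the curve-shaped codewords actually used in this work are introduced only in Section 4.

```python
# Sketch of eqs. 2-5 for standard VQ with point codewords and Euclidean distance.
# Data and codebook below are made-up toy values.
import math

def d(x, y):
    return math.dist(x, y)  # generic distance operator d on X x Y

def mqe(patterns, codebook):
    """Mean quantization error (eq. 4) from per-cell distortions D_i (eq. 5)."""
    cell_distortion = [0.0] * len(codebook)
    for x in patterns:
        i = min(range(len(codebook)), key=lambda j: d(x, codebook[j]))  # q(x) = y_i
        cell_distortion[i] += d(x, codebook[i])                         # eq. 5
    return sum(cell_distortion) / len(patterns)                         # eq. 4

if __name__ == "__main__":
    X = [(0.1, 0.2), (0.0, 0.1), (4.1, 3.9), (3.8, 4.2)]
    Y = [(0.0, 0.0), (4.0, 4.0)]
    print("MQE =", mqe(X, Y))
```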
The core of a CA/VQ algorithm consists in the application of two important conditions that are used for the calculation of the optimal partition (when the codebook is fixed) and of the optimal codebook (when the partition is fixed) [32]:

• Nearest Neighbor Condition (NNC). The NNC consists in assigning the nearest codeword, according to the metric d, to each pattern in the original dataset X. For a given codebook Y, the following partition is thus identified by the NNC:

$$\bar{S}_i = \{x \in X : d(x, y_i) \leq d(x, y_j)\ \ \forall\, y_j \in Y\} \qquad (6)$$

The set $\bar{S}_i$ defined by the NNC corresponds to the so-called Voronoi partition [30] of the original dataset and is usually indicated with the symbol $P(Y) = \{\bar{S}_1, \ldots, \bar{S}_{N_C}\}$. P(Y) corresponds to the optimal partition of X, given the codebook Y [32].

• Centroid Condition (CC). The CC is concerned with the procedure of finding the optimal codebook for a certain partition S of the original dataset X, i.e. determining the centroid x̄ of each individual cell S_i according to the given metric. The corresponding codebook is then defined by grouping all centroids x̄(S_i):

$$\bar{X}(S) \equiv \{\bar{x}(S_i);\ i = 1, \ldots, N_C\} \qquad (7)$$

3.2.1. K-means (LBG)

LBG is an iterative algorithm that, N_C being fixed, at each iteration produces a quantizer q better than or equal to the one obtained at the previous iteration. This approach is practically equivalent to that of the traditional K-means [34]. The steps through which LBG develops can be schematically described as follows:

1. Initialization: an initial codebook is chosen according to a given approach, often randomly (see e.g. [32]).
2. Partition calculation: given the codebook determined in the previous step, the related Voronoi partition (eq. 6) is calculated according to the NNC.
3. Termination condition: the MQE at the current iteration, D_curr, is compared with the one obtained at the previous iteration, D_prev. If the ratio |D_prev − D_curr|/D_prev is less than a prefixed threshold (ε), the algorithm ends; otherwise, it continues with the next step.

4. Codebook calculation: by using the partition calculated in step 2, a new codebook is calculated according to the defined CC (eq. 7).

5. Return to step 2. (A compact transcription of these steps in code is given below.)
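The sketch below is a direct transcription of the five LBG steps for two-dimensional point codewords; the random initialization, the threshold value and the toy data are arbitrary choices of ours.

```python
# Direct transcription of the five LBG steps above for 2D point codewords.
# Initialization strategy, threshold and toy data are arbitrary choices.
import math, random

def lbg(patterns, n_c, eps=1e-4):
    codebook = random.sample(patterns, n_c)              # 1. initialization
    d_prev = float("inf")
    while True:
        cells = [[] for _ in range(n_c)]                 # 2. partition (NNC, eq. 6)
        for x in patterns:
            i = min(range(n_c), key=lambda j: math.dist(x, codebook[j]))
            cells[i].append(x)
        d_curr = sum(math.dist(x, codebook[i])           # MQE of current quantizer
                     for i, cell in enumerate(cells) for x in cell) / len(patterns)
        if abs(d_prev - d_curr) / d_prev < eps:          # 3. termination condition
            return codebook
        for i, cell in enumerate(cells):                 # 4. codebook update (CC, eq. 7)
            if cell:  # empty cells keep their previous codeword
                codebook[i] = (sum(x for x, _ in cell) / len(cell),
                               sum(y for _, y in cell) / len(cell))
        d_prev = d_curr                                  # 5. iterate
```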
Key works in the field of VQ have pointed out, both from a theoretical and an experimental point of view, that VQ approaches converge towards partitions whose cells contribute almost equally to the total distortion [35, 36]. Under this hypothesis, one can easily state that the nuclear physics classification problem cannot be approached through standard VQ algorithms [15, 17]. In particular, according to the features observed in Fig. 1, a physically meaningful partition of the (E, ∆E) plane is characterized by hyperbolic-like cells with strongly different distortions. As an example, the cluster associated to (Z = 1, A = 1), indicated with 1H in Fig. 1, contains a number of patterns several orders of magnitude larger than that of (Z = 5, A = 10), 10B. Consequently, the distortion introduced by 1H is significantly larger than that produced by 10B. Even if a normalization to the number of patterns is introduced, distortions would still be unbalanced among cells as a result of the intrinsically different dispersion of patterns around their mean value observed in different clusters; this aspect is discussed more quantitatively in Section 2. Another significant limitation to the application of standard VQ algorithms is represented by the need to know the physically meaningful number of classes N_C a priori. The latter, as stated in Section 2, is usually different for each individual case.

In order to develop a suitable unsupervised learning approach for the classification of nuclear physics data based on VQ, we have modified the original LBG method of Ref. [32] via the introduction of physical constraints deduced from the formal treatment of the interaction of radiation with matter [25]. In our modified LBG, a codebook corresponds to a particular choice of N_par physical parameters P_i that allow the calculation of a family of curves (C), one for each physical class (Z, A), N_C being fixed. Accordingly, the related distance function d is defined in ℝ² × C. The computation of the Voronoi partition, calculated as in eq. 6, is used for the NNC. The CC corresponds instead to the determination of the best set of functional parameters P_i for a given partition. The latter is operated by means of a gradient descent technique [37] applied to the mathematical expression of the total distortion in the parameter space.

The resulting VQ algorithm is used as a hill-climbing operator devoted to the local search for a maximum of the fitness function (and therefore a minimum of the quantization error of the codebook) in the proximity of a good individual determined by an EC approach. To this extent, the initial codebook Y, composed of a suitable number of physics classes N_C, their (Z, A) values and an initial choice of parameters P_i, is determined uniquely and fully automatically through a global search procedure via evolutionary criteria, and the VQ algorithm plays the role of speeding up the search for the maximum. The resulting approach is called Constrained Evolutionary Clustering (C-EC) and is described in detail in Section 4. Sections 5 and 6 are instead dedicated, respectively, to discussing the capabilities of the algorithm in classifying experimental data and to a detailed comparison with previously published algorithms for the classification of data in nuclear physics experiments at low and intermediate energies.
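The essential modification just described (codewords that are curves rather than points) can be sketched as follows. The linear toy model f_p below is a stand-in of ours for a realistic functional such as eq. 8 of Section 4, and all parameter values are placeholders.

```python
# Sketch of the constrained-VQ ingredient described above: codewords are curves
# (E, f_P(E, Z, A)) rather than points. The toy model f_p stands in for a
# realistic functional such as eq. 8; parameter values are placeholders.

def f_p(params, e, z, a):
    # Toy stand-in for a physical Delta E = f_P(E, Z, A) model.
    return params[0] * z * z * a / e + params[1]

def assign(pattern, params, classes):
    """NNC for curve codewords: the distance of a pattern (E, Delta E) from a
    codeword is the vertical residual |Delta E - f_P(E, Z, A)| (eq. 10 below)."""
    e, de = pattern
    return min(classes, key=lambda za: abs(de - f_p(params, e, *za)))

if __name__ == "__main__":
    params = (5.0, 0.0)                         # placeholder parameter set P
    classes = [(1, 1), (1, 2), (1, 3), (2, 4)]  # candidate (Z, A) codewords
    print(assign((40.0, 0.5), params, classes))
```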
4. The Constrained Evolutionary Clustering algorithm

C-EC is an algorithm for automatic data classification in nuclear physics experiments based on EC and VQ. It is conceived for the classification problem typical of experiments that involve the detection of charged particles produced in nucleus-nucleus collisions at low and intermediate energies through longitudinally stacked detectors. In previously published automatic approaches [15, 17-19], the link between extracted clusters and meaningful physics classes (Z, A) was a non-trivial task, often requiring non-negligible human supervision and/or the use of a priori physics information, leading the scientific community to more often rely on human-supervised classification methods [22, 23]. To overcome these limitations, C-EC combines unsupervised learning techniques with physically meaningful constraints. The resulting solution is a codebook whose codewords are directly linked to physical classes.

The algorithm is based on a two-level block structure: the upper block is an EC algorithm devoted to deriving, fully automatically, a physically meaningful codebook for the clusterization problem through the improvement of suitably defined individuals. To enhance the convergence speed, a hill-climbing operator, which deals with the optimization of the codebook via a VQ algorithm, is introduced. The latter constitutes the lower block of the algorithm and can be intended as the fast search for a local maximum of the fitness in the proximity of the individual determined by the EC block. Figures 2 and 3 report flow charts describing, respectively, the steps executed by the EC and VQ blocks. The VQ algorithm is invoked exclusively by the EC block as a local hill-climbing operator.

The mechanism outlined by the flow chart of Fig. 2 can be described as follows:

1. A village V_r is considered, selected according to a random uniform distribution. Two random individuals, I_p1 and I_p2, are selected within the village V_r (including overlap regions with neighbor villages). Within the rest of the village, V_r − {I_p1, I_p2}, the worst individual, i.e. the individual I_w having the lowest fitness value, is identified.
Figure 2: Description of the EC algorithm for initial codebook determination. When the genetic iterations end, i.e. when a pre-defined number of iterations is reached, the VQ block is invoked. The VQ block is also invoked during the genetic iterations, when a particularly promising individual is produced.
2. The crossover between I_p1 and I_p2 is executed, followed by a mutation as described in Section 4.2.5.

3. The offspring is introduced into the population, replacing I_w.

4. If the fitness of the newly introduced individual is greater than a certain prefixed fraction VQ_thr of the fitness of the best individual (I_b) in the population, the hill-climbing operator is invoked for the optimization of the new individual. The latter (described in Section 4.3) is a particularly time-consuming task, and VQ_thr is therefore usually chosen to be large (close to unity).

5. If the number of iterations executed is lower than a certain prefixed value, the algorithm returns to step 1.

6. The best individual in the population is optimized by invoking the hill-climbing operator.

A compact transcription of this loop in code is given below.

Figure 3: Description of the VQ algorithm devoted to codebook optimization. This block is usually invoked by the EC algorithm, which provides the initial codebook Y.
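The following skeleton mirrors steps 1-6 of Fig. 2. It is a sketch under assumptions of ours: the operators crossover(), mutate() and vq_optimize() are left abstract (they are detailed in the rest of this section), fitness() is assumed to implement eq. 14 below, villages are plain lists, and the iteration budget and VQ_thr value are placeholders.

```python
# Sketch of the EC-block loop of Fig. 2. All operators are abstract hooks and
# all numeric settings are placeholders.
import random

def ec_block(villages, fitness, crossover, mutate, vq_optimize,
             iterations=3500, vq_thr=0.98):
    for _ in range(iterations):                          # step 5: iteration budget
        v = random.choice(villages)                      # step 1: random village V_r
        ip1, ip2 = random.sample(v, 2)                   #         two random parents
        rest = [i for i in v if i is not ip1 and i is not ip2]
        iw = min(rest, key=fitness)                      #         worst individual I_w
        child = mutate(crossover(ip1, ip2))              # step 2: crossover + mutation
        v[v.index(iw)] = child                           # step 3: replacement of I_w
        best = max((i for vv in villages for i in vv), key=fitness)
        if fitness(child) >= vq_thr * fitness(best):     # step 4: promising individual,
            v[v.index(child)] = vq_optimize(child)       #         invoke hill-climbing
    best = max((i for vv in villages for i in vv), key=fitness)
    return vq_optimize(best)                             # step 6: final optimization
```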
Figure 4: Experimental data (the color scale represents the counts in each bin, blue corresponding to fewer counts) and a visual representation of the best individual in the population after 0 genetic iterations (top panel), 800 genetic iterations (middle panel) and 3500 genetic iterations (bottom panels). The resulting codebook after 3500 iterations is quite satisfactory for the classification process. The bottom-right panel is a zoom on the low-Z clusters after 3500 iterations.

Figure 4 shows an example of individual improvement, adopting a population of N_v = 13 villages, each with N_i = 5 individuals and an overlap of one individual between neighboring villages. 3500 genetic iterations are considered. The best individual in the population after 0 iterations (top panel), 800 iterations (middle panel) and 3500 iterations (bottom panels; the right panel is a zoom of the low-Z region) is shown. Data, represented by points (the color scale represents the counts), are obtained by using a pair of longitudinally stacked silicon detectors (see Section 5 for additional experimental details). The individual is represented by red lines, each corresponding to a given codeword. It is evident that the result after 0 iterations, i.e. the best individual in the randomly generated initial population, produces a highly unsatisfactory classification of the data. After about 800 iterations, the result is visually satisfactory for low-∆E clusters. The solution obtained after 3500 genetic iterations is very close to a good maximum of the fitness function. By visually inspecting the bottom panels of the figure, one can clearly see that all clusters are correctly identified, including a group of poorly populated and high-dispersion clusters located in the upper part of the bi-dimensional distribution. A number of 3500 iterations is found to be largely sufficient to obtain a fully satisfactory codebook for all the cases explored in this paper. The number of iterations to perform could reasonably be slightly adjusted by the experimenter based on the performance of the algorithm in the classification problem explored.

The EC algorithm schematically described by the flow chart in Fig. 2 is focused on the global search for an optimal solution of the clusterization problem. This is done through the improvement of individuals via the implementation of the selection criteria described in Sect. 4.2.1. Because individuals represent solutions of a clusterization problem, they are suitably encoded to represent codebooks of the data to classify. In this sense, one can state that the EC block operates a codebook improvement. However, differently from the local search performed by the VQ block (Sect. 4.3), which is done in the proximity of the input codebook determined by the EC block, here any possible codebook is explored and the search is in this sense global.

As previously stated, one of the major novelties of our approach consists in the use of physical constraints for the derivation of a physically meaningful codebook. We define the adopted codebook in the following way. Let us consider a functional of the form ∆E = f_P(E, Z, A), where the (E, ∆E) pairs ∈ ℝ² represent the coordinates of patterns in the original bi-dimensional distribution to classify, a (Z, A) pair indicates a given ion and P is a set of N_par numerical parameters that constrain the functional. For a given set P̄ = {P̄_0, ..., P̄_{N_par−1}}, the vector (f_P̄(E, Z, A), E) represents the location of a (Z, A) isotope in the (E, ∆E) plane, corresponding to the abscissa E
and given the parameter vector P̄. A different choice of parameters P_i will result in a different location. If Z = Z̄ and A = Ā are fixed, i.e. if a certain ion is considered, the locus of (f_P̄(E, Z̄, Ā), E) points can be treated as the codeword associated to the cluster (Z̄, Ā), P̄ being the functional parameters. Without full mathematical rigor, let us indicate with C the class of all possible (f_P(E, Z, A), E) curves, corresponding to any (Z, A) isotope and any possible choice of parameter set {P_i ∈ ℝ}. A codebook can then be defined by considering the N_C elements of C corresponding to a certain set of parameters P̄ and a certain choice of (Z, A) pairs {(Z̄_0, Ā_0), ..., (Z̄_{N_C−1}, Ā_{N_C−1})}. In this way, different codewords within a codebook differ exclusively by their (Z, A) values, i.e. each codeword is related to a different ion, the P set being fixed. The clusterization problem is therefore equivalent to the search for the optimal set of parameters P and of (Z, A) pairs.

In this framework, the functional f_P(E, Z, A) is any possible parametric functional suitable for the description of (E, ∆E) signals produced by charged particles in longitudinally stacked detectors. Several valuable models are available in the literature, derived from the well-known formalism described in [25]; see for example [21, 22]. To produce the results described in this paper, we adopted the analytical model introduced in Ref. [21]:

$$\Delta E = f_P(E, Z, A) = \left[ (P_0 E)^{P_1 + P_2 + 1} + \left( P_3 Z^{P_4} A^{P_5} \right)^{P_1 + P_2 + 1} + P_6\, Z^2 A^{P_1} (P_0 E)^{P_2} \right]^{1/(P_1 + P_2 + 1)} - P_0 E \qquad (8)$$

where the dependence on Z and A is explicit. For the functional of equation 8, one has N_par = 7.
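The functional of eq. 8, as reconstructed above, translates directly into code. In the sketch below, the parameter values used in the demo are arbitrary placeholders of ours, not fitted values for any real telescope.

```python
# Transcription of the identification functional of eq. 8 (N_par = 7).
# The parameter values in the demo are arbitrary placeholders, not fitted ones.

def f_p(p, e, z, a):
    """Delta E = f_P(E, Z, A) for the 7-parameter model of eq. 8."""
    mu = p[1] + p[2] + 1.0
    term = ((p[0] * e) ** mu
            + (p[3] * z ** p[4] * a ** p[5]) ** mu
            + p[6] * z ** 2 * a ** p[1] * (p[0] * e) ** p[2])
    return term ** (1.0 / mu) - p[0] * e

if __name__ == "__main__":
    p = (1.0, 0.6, 0.4, 3.0, 1.7, 0.6, 0.05)  # placeholder P_0 .. P_6
    for z, a, name in ((1, 1, "1H"), (1, 2, "2H"), (2, 4, "4He")):
        print(name, [round(f_p(p, e, z, a), 3) for e in (10.0, 20.0, 40.0)])
```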
To fulfill the requirements of the physically meaningful codebook described above, we adopted a hybrid float/integer genetic encoding. Figure 5 schematically represents the encoding of an individual. The first N_par floating point values identify a given parameter set for the functional given by eq. 8. Possible (Z, A) pairs identifying physically meaningful isotopes are considered according to the compilation published in Ref. [38] and organized into a database for increasing Z²A values, as suggested by the simplified formula of eq. 1. In this way, larger indexes are associated to (Z, A) pairs whose data lie in a higher ∆E region of the (E, ∆E) plane, and only one numerical index is needed to identify a given isotope. For the sake of clarity, we have introduced the notation n_i to indicate the index corresponding to the i-th cluster, i = 0, ..., N_C − 1.

Figure 5: Genetic encoding of an individual according to the EC algorithm developed in this work. A hybrid float/integer encoding is chosen in order to suitably implement a codebook for the clusterization problem.

Together with the N_par functional parameters, N_C and the set {n_i; i = 0, ..., N_C − 1} are required to complete the encoding of an individual. In this manner, an individual identifies a unique meaningful codebook.

The initial population is generated randomly. The resulting individuals (I_i) are distributed according to the scheme described in Fig. 6. In order to limit the probability of premature convergence of the algorithm and to suitably implement migration, as discussed in Section 3.1, the population is subdivided into N_v villages, each containing an equal number N_i of individuals. The first and last individuals of each village belong simultaneously to two contiguous villages. In this way, as shown in Fig. 6, the first and last villages in the population are also connected. The presence of such overlap regions effectively implements the migration of individuals through different villages. It is important to note that, to avoid premature convergence of the entire population, overlap involves exclusively pairs of neighboring villages.

Figure 6: The distribution of individuals in the population.
The generation process of each individual involves four different steps. (1) A set of functional parameters is produced, each parameter being generated according to a uniform random distribution. For simplicity, parameters are chosen to vary within physically meaningful ranges. The latter can be easily determined by considering the physical meaning of each parameter, as discussed in Ref. [21]. (2) N_C is generated with a uniform integer distribution, with a prefixed minimum value. (3) N_C numerical indexes n_i are picked randomly, without duplicates. (4) The fitness of the newly generated individual is calculated.

The selection mechanism operated by the EC algorithm is based on the fitness function, which we indicate with f_fit, the latter being the objective to maximize. Consequently, a suitable choice of f_fit is crucial to the success of the algorithm. We define three terms that contribute to the fitness function: f_e, related to the total distortion error associated to the codebook identified by the individual, f_n, related to the number of codewords N_C, and f_avg, related to the distortion per pattern within each Z-group of clusters.

The total distortion is obtained through the following equation:

$$e = \sum_{i=0}^{N_P - 1} d(x_i, q(x_i)) \qquad (9)$$

where $d : \mathbb{R}^2 \times C \to \mathbb{R}_0^+$ is the modified distance function:

$$d(x_i, q(x_i)) = |\Delta E_i - f_P(E_i, Z, A)| \qquad (10)$$

x_i = (E_i, ∆E_i) being the i-th pattern and q(x_i) = {(E, f_P(E, Z, A))} the codeword associated to the i-th pattern within the considered codebook. The latter is easily determined by scanning through the codebook until the codeword with the minimum |∆E_i − f_P(E_i, Z, A)| value is found.

By using the formulation of the total distortion introduced by eq. 9, the f_e term can be defined as follows:

$$f_e = \frac{e_{max} - e}{e_{max}} \qquad (11)$$

where e_max is the maximum error expected a priori. By default, e_max is calculated as the mean absolute deviation of the pattern ordinates, $e_{max} = \frac{1}{N_P} \sum_{i=0}^{N_P - 1} |\Delta E_i - \Delta E_{avg}|$.

From the formulation of f_e, it is clear that introducing a larger number of (Z, A) clusters in the codebook represented by the individual usually contributes to decreasing the distortion error e and therefore to increasing f_e. For this reason, if only f_e contributed to f_fit, the algorithm would naturally converge towards individuals with numerous clusters; f_n is required to counteract this phenomenon. We define f_n in the following way:

$$f_n = \frac{n_{max} - N_C}{n_{max}} \qquad (12)$$

where n_max is the maximum allowed number of clusters.

f_avg is introduced in the fitness function to account for the following facts, pointed out in Section 2: (i) the number of patterns within a cluster differs significantly between clusters, (ii) clusters associated to higher Z-values have larger dispersion, and the dispersion is roughly similar between different sub-clusters of a given Z-value. For f_avg one has:

$$f_{avg} = \frac{1}{N_Z} \sum_{Z'} \sum_{A'} \frac{N_P^{(Z',A')}}{D_{(Z',A')}} \qquad (13)$$

where we have indicated with N_Z the number of different Z-values present in the codebook, the sum on Z' is extended to all available Z-values, the sum on A' is extended to all (Z', A') sub-clusters corresponding to the same Z' value, and D_{(Z',A')} is the total distortion (eq. 9) introduced by the cluster (Z', A'), N_P^{(Z',A')} being its total number of patterns. Equation 13 is built from the distortion per pattern of each cluster, averaged over Z-groups. The distortion per pattern is used to account for requirement (i) outlined above, while averaging over all Z-values has a physical meaning because of (ii).

The terms of equations 11, 12 and 13 are combined to form the fitness function f_fit:

$$f_{fit} = 100\, \frac{f_e\, u(f_e) + \alpha_n f_n\, u(f_n) + \alpha_{avg} f_{avg}}{1 + \alpha_n + \alpha_{avg}} \qquad (14)$$

where we introduced the Heaviside step function u(x) to suppress the contributions of solutions with e > e_max or N_C > n_max. α_n and α_avg are used to tune the relative importance of f_n and f_avg, respectively, with respect to f_e. A sketch of the fitness evaluation is given below.
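The following sketch evaluates eqs. 9-14 given per-cluster bookkeeping. It works under simplifying assumptions of ours: the per-cluster pattern counts and distortions are supplied as dictionaries keyed by (Z, A), and the α weights and toy inputs are placeholders.

```python
# Sketch of the fitness of eqs. 9-14. Inputs: per-cluster pattern counts n[(Z,A)]
# and per-cluster total distortions dist[(Z,A)] (eq. 9 restricted to each cluster).
# Alpha weights and toy values are placeholders.

def u(x):
    return 1.0 if x >= 0.0 else 0.0  # Heaviside step, suppresses unphysical terms

def f_fit(n, dist, e_max, n_max, alpha_n=0.3, alpha_avg=0.3):
    e = sum(dist.values())                                   # eq. 9
    f_e = (e_max - e) / e_max                                # eq. 11
    f_n = (n_max - len(n)) / n_max                           # eq. 12
    z_values = {z for z, _ in n}                             # Z-groups in the codebook
    f_avg = sum(n[za] / dist[za] for za in n if dist[za] > 0) / len(z_values)  # eq. 13
    return 100.0 * (f_e * u(f_e) + alpha_n * f_n * u(f_n)
                    + alpha_avg * f_avg) / (1.0 + alpha_n + alpha_avg)  # eq. 14

if __name__ == "__main__":
    counts = {(1, 1): 5000, (1, 2): 1200, (2, 4): 3000}
    distortions = {(1, 1): 40.0, (1, 2): 15.0, (2, 4): 35.0}
    print(f_fit(counts, distortions, e_max=200.0, n_max=30))
```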
Crossover is operated through the algorithm schematically described in Fig. 7, according to the following procedure. Let us consider two individuals, I and I′, I having encoding {P_0, ..., P_{N_par−1}}, N_C, {n_0, ..., n_{N_C−1}} and I′ having encoding {P′_0, ..., P′_{N_par−1}}, N′_C, {n′_0, ..., n′_{N′_C−1}}. In addition, let us indicate with I″, having encoding {P″_0, ..., P″_{N_par−1}}, N″_C, {n″_0, ..., n″_{N″_C−1}}, the result of the crossover of I and I′. Two distinct processes are performed for the functional parameters and for the clusters. Each functional parameter P″_i of the offspring I″ is obtained via a so-called extended average of the analogous parameters from the encodings of I and I′:

$$P''_i = \alpha P_i + (1 - \alpha) P'_i \qquad (15)$$

Figure 7: A schematic description of the crossover between two individuals.

where α is a random number (newly generated for each i = 0, ..., N_par − 1) uniformly extracted in the range α ∈ [−ε_par, ε_par], ε_par being a suitable constant.

A similar procedure is performed for the number of clusters: N″_C = αN_C + (1 − α)N′_C, α being extracted in the range α ∈ [−ε_C, ε_C]. Once N″_C is determined, a superset of indexes n is produced containing all the non-duplicated indexes that best approximate the original I and I′ codewords in the new parameter set {P″_i}. N″_C indexes are then picked randomly from the superset. If the size of the superset is lower than N″_C, the remaining indexes needed to complete the set {n″_i} are extracted randomly from the database constructed using the data of Ref. [38]. In this way, one obtains an individual with the following characteristics: (i) the functional parameters P″_i have intermediate values between those of the parents, but some probability to obtain external values exists, (ii) the number of codewords is close to those of the parents, but some probability to be larger or smaller than that of the parents exists, (iii) the spatial disposition of the resulting codewords is in good geometrical matching with the codewords of the parents I and I′, even if, with some small probability, clusters in regions of the (E, ∆E) plane not covered by I and I′ codewords can be produced. A minimal sketch of the parameter recombination is given below.
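The sketch below implements the extended average of eq. 15 and the cluster-count recombination. The ε values are placeholders of ours (chosen so that offspring values can occasionally fall outside the parents' range, as point (i) above requires), and the index-superset step is simplified with respect to the geometrical matching described in the text.

```python
# Sketch of the extended-average crossover of eq. 15. Epsilon values are
# placeholders; the codeword-index recombination is simplified with respect
# to the superset construction described in the text.
import random

def extended_average(p, p_prime, eps=1.5):
    # eq. 15: alpha is drawn anew for each parameter; since |alpha| may exceed 1,
    # offspring values can fall slightly outside the parents' range.
    out = []
    for pi, qi in zip(p, p_prime):
        alpha = random.uniform(-eps, eps)
        out.append(alpha * pi + (1.0 - alpha) * qi)
    return out

def crossover(parent, parent_prime, database, eps_c=1.2):
    p = extended_average(parent["P"], parent_prime["P"])
    alpha = random.uniform(-eps_c, eps_c)
    n_c = max(2, round(alpha * len(parent["n"]) + (1.0 - alpha) * len(parent_prime["n"])))
    superset = list(dict.fromkeys(parent["n"] + parent_prime["n"]))  # no duplicates
    random.shuffle(superset)
    indexes = superset[:n_c]
    while len(indexes) < n_c:            # complete from the ion database if needed
        k = random.choice(database)
        if k not in indexes:
            indexes.append(k)
    return {"P": p, "n": sorted(indexes)}
```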
Mutation is implemented both for the float and for the integer part of the individual encoding. For the float part, each of the functional parameters is altered with a small probability p_mut^par. For the integer part, the resulting N″_C is altered by one unit (±1) with a small probability p_mut^C.

An interesting mechanism of p_mut^par and p_mut^C variation is introduced to ensure the stability of the genetic diversity, thus helping to counteract the premature convergence of the algorithm. At the beginning, p_mut^par and p_mut^C are at their minimum possible values. This choice allows a fast elimination of particularly bad individuals. To monitor the genetic diversity, the variation range of each parameter is divided into cells. Each cell corresponds to a given choice of a given parameter, within a small interval. A cell is considered full if at least one individual has a parameter contained in the corresponding interval. At each iteration of the EC block, the total number of non-empty cells is calculated. This value is expressed as a percentage of the total number of cells and is used to evaluate the genetic diversity of the population. If the genetic diversity drops below a prefixed threshold, p_mut^par is increased. On the contrary, p_mut^par is suitably decreased if the genetic diversity surpasses a certain threshold. A sketch of this adaptive mechanism is given below.
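A minimal sketch of the diversity monitor follows. The cell granularity, the thresholds and the adjustment factor are placeholders of ours; individuals are assumed to expose their parameter list under the key "P", as in the crossover sketch above.

```python
# Sketch of the genetic-diversity monitor driving the adaptive mutation
# probability. Cell granularity, thresholds and factors are placeholders.

def genetic_diversity(population, ranges, n_cells=100):
    """Percentage of non-empty cells over all parameter axes."""
    filled, total = 0, 0
    for k, (lo, hi) in enumerate(ranges):
        cells = set()
        for ind in population:
            t = (ind["P"][k] - lo) / (hi - lo)
            cells.add(min(n_cells - 1, max(0, int(t * n_cells))))
        filled += len(cells)
        total += n_cells
    return 100.0 * filled / total

def adapt_mutation(p_mut, diversity, low=20.0, high=60.0, factor=1.5):
    if diversity < low:       # diversity dropping: mutate more
        return p_mut * factor
    if diversity > high:      # diversity abundant: mutate less
        return p_mut / factor
    return p_mut
```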
After the process summarized by Fig. 7 is done, an algorithm of smart elimination and insertion of codewords is executed. This deals with removing unnecessary codewords and inserting more useful ones. Unnecessary codewords are identified by calculating the number of patterns in the corresponding cluster: if the latter is lower than a certain minimum size, the codeword is removed from the codebook. New codewords are then inserted in order to restore the original number. The insertion process is executed according to the following steps: (i) a loop through all the codewords is done, starting from a randomly selected codeword; (ii) if a cluster whose distortion per pattern is above the codebook average is found, the indexes related to its lower and upper neighbor clusters are taken into consideration; (iii) if one of those is not present in the codebook, it is introduced. If after this procedure the number of clusters is lower than the number of clusters before the smart elimination, indexes are picked randomly from the database of ions (without duplicates) until the required number is reached.

Figure 8: An example of crossover (c) between the individuals (a) and (b). Experimental data are represented by gray dots. The codebook encoded by each individual is graphically represented by means of red dashed lines. Each panel shows one individual. The resulting individual (c) contains clusters whose geometrical disposition matches both some (a) and some (b) codewords.

Figure 8 shows a graphical example of crossover between two individuals. In the figure, experimental data are represented by gray dots, while the codebooks encoded by the individuals are represented by dashed red lines. Data are obtained by using two longitudinally stacked silicon detectors, as described in Section 5. Individuals are indicated with labels (a), (b) and (c), one panel per individual: (a) left panel, (b) center panel, (c) right panel. (c) represents the result of the crossover between (a) and (b). In this example, (a) covers only the lower region of the (E, ∆E) plane, while, on the contrary, (b) contains only clusters geometrically located in the upper part of the plane, resulting in a highly unsatisfactory classification for low-Z clusters. In addition, a number of clearly unnecessary clusters is present, especially in the codebook identified by individual (b). The resulting crossover (c), obtained with the prescriptions discussed above by combining the parents (a) and (b), contains codewords whose geometrical disposition suitably covers the entire range of data. In addition, the number of clusters that are unnecessary for the classification process is reduced, as a result of the smart elimination and insertion process.

By carefully inspecting the bottom panels of Fig. 4, one can observe a number of codewords not in fully satisfactory agreement with the observed clusters. This is due to the fact that a large number of iterations is required for an EC algorithm to fully converge to an absolute maximum of the fitness function, and the performance is consequently poorer when approaching extremely high fitness individuals. To speed up the optimization of the codebook, the VQ block is used as a hill-climbing operator for the EC block.

Codebook optimization is operated through the algorithm schematically described in Fig. 3. After the initial codebook Y_0 is calculated (output of the EC block), the following optimization steps are iterated:

1. Iteration m begins. The Voronoi partition S_m = P(Y_m) of the input data is computed (NNC, Sect. 4.3.1), Y_m being the codebook produced by the (m − 1)-th iteration (Y_0 is the input codebook).
2. The total distortion D_m associated to the S_m partition is calculated (Section 4.3.2).
3. If (D_{m−1} − D_m)/D_m < ε, ε being a given prefixed value, then Y_m is used as the final codebook and the optimization process ends.
4. A new codebook Y_{m+1} is calculated (CC, Section 4.3.3) starting from the partition determined in step 1.
5. The algorithm returns to step 1.

4.3.1. NNC

The NNC consists in the calculation of the Voronoi partition P(Y) = {S̄_0, ..., S̄_{N_C−1}} according to the definition of eq. 6. The adopted distance function is that of eq. 10.

4.3.2. MQE

The MQE is defined by eq.
4, D_i, the total distortion within a cell, being calculated according to eq. 5 and by using the modified distance function defined by eq. 10.

4.3.3. CC

Let us assume that a codebook Ȳ is defined by the parameter set P̄ = {P̄_0, ..., P̄_{N_par−1}}, a number of codewords N̄_C and the set of physics classes {(Z̄_0, Ā_0), ..., (Z̄_{N̄_C−1}, Ā_{N̄_C−1})}. If one considers an input partition S̄ = {S̄_0, ..., S̄_{N̄_C−1}}, where S̄_i is the cell associated to the physics class (Z̄_i, Ā_i), the CC consists in a suitable variation of each individual parameter P_i (i = 0, ..., N_par − 1) in order to minimize the MQE associated to Ȳ. It is important to note that, according to the scheme introduced in this paper, a variation of any parameter P̄_i will affect the absolute position of all codewords in the codebook, in a physically meaningful way. To this end, such MQE minimization corresponds to the search for the minimum of the total distortion function in the parameter space.

The function of the total distortion error in the parameter space can be defined as follows:

$$E(P_0, \ldots, P_{N_{par}-1}) = \sum_{i=0}^{N_C - 1} \sum_{j=0}^{N_P^i - 1} |\Delta E_{ij} - f_P(E_{ij}, Z_i, A_i)| \qquad (16)$$

where the sum on i is extended to all the cells in the input partition, the sum on j runs over all the N_P^i patterns in the i-th cell, and (Z_i, A_i) is the physics class associated to the i-th cell. The dependence on the parameters {P_i} is implicit in the definition of f_P, as shown, for example, for the model defined by eq. 8. The CC consists in the minimization of the function E(P_0, ..., P_{N_par−1}). To this end, a gradient descent technique is used [37].

The mathematical expression of the gradient of E(P_0, ..., P_{N_par−1}) in the parameter space can be easily derived from eq. 16:

$$\nabla E(P_0, \ldots, P_{N_{par}-1}) = \left( \frac{\partial E}{\partial P_0}, \ldots, \frac{\partial E}{\partial P_{N_{par}-1}} \right) \qquad (17)$$

and involves the partial derivatives of f_P with respect to P_k, ∂f_P/∂P_k. The latter can be computed analytically for a functional f like the one of eq. 8, but numerical approximations can also be used if more complex functionals are adopted. The vector defined by eq. 17 represents the direction, in the parameter space, of maximum increase of E. The direction identified by −∇E can therefore be used to vary the parameter vector P towards the maximum decrease of E.

The minimization of E(P_0, ..., P_{N_par−1}) can thus be executed through the following steps:

1. A vector η = {η_0, ..., η_{N_par−1}} is defined, the η_i being suitably small numbers. The latter usually depend on the amplitude of the error function gradient. The use of non-optimal initial η_i values will not affect the accuracy of the results but exclusively the number of iterations required for the minimization of E.
2. E_prev = E(P_0, ..., P_{N_par−1}) is calculated and stored.
3. A new set of parameters is obtained starting from the initial parameter vector P and moving opposite to the direction of the gradient: P_i^new = P_i − η_i (∇E(P_0, ..., P_{N_par−1}))_i.
4. E_new = E(P_0^new, ..., P_{N_par−1}^new) is calculated and stored.
5. If E_new > E_prev, then all η_i are multiplied by a suitably small factor and the algorithm returns to step 3.
6. The gradient in the new parameter set, ∇E(P_0^new, ..., P_{N_par-1}^new), is calculated. If (∇E(P_0^new, ..., P_{N_par-1}^new))_i (∇E(P_0, ..., P_{N_par-1}))_i > 0, i.e. if the i-th gradient component has kept its sign, then η_i is multiplied by a factor greater than 1; otherwise it is multiplied by a factor smaller than 1.
7. The parameters are updated to the new values P_i = P_i^new.
8. Steps 2 to 7 are iterated until the condition (E_prev - E_new)/E_new < ε_CC is verified, being ε_CC a prefixed value.
9. The minimum of E(P_0, ..., P_{N_par-1}) is found in correspondence of the parameter set {P_0, ..., P_{N_par-1}} and the CC is terminated.
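The adaptive step-size scheme of steps 1-9 can be sketched as follows. This is a minimal sketch, not the authors' implementation: the callables E and grad_E (eqs. 16 and 17) are assumed to be provided, and the adaptation factors up and down are placeholders, since the numerical values quoted in the text are model-dependent.

import numpy as np

def cc_minimize(E, grad_E, P, eta, eps_cc=1e-6, up=1.1, down=0.5):
    # Gradient descent with one adaptive step size (eta_i) per parameter.
    # E: total distortion of eq. 16; grad_E: its gradient, eq. 17.
    while True:
        E_prev = E(P)                              # step 2
        g = grad_E(P)
        P_new = P - eta * g                        # step 3
        E_new = E(P_new)                           # step 4
        while E_new > E_prev:                      # step 5: shrink all steps
            eta = eta * down
            P_new = P - eta * g
            E_new = E(P_new)
        g_new = grad_E(P_new)                      # step 6: sign-based adaptation
        eta = np.where(g_new * g > 0, eta * up, eta * down)
        P = P_new                                  # step 7
        if (E_prev - E_new) / E_new < eps_cc:      # step 8: convergence test
            return P                               # step 9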
5. Test of the algorithm with experimental data
To probe the capabilities of the C-EC algorithm in the classification of nuclear physics data, we considered experimental data recorded by two longitudinally stacked layers of silicon. The experiment was performed at the TRIUMF laboratory in Vancouver (Canada). An accelerated Li beam was delivered on a LiF target; the collisions of the beam with the Li and F nuclei of the target were investigated, resulting in the production of several ions, especially in the range 1 ≤ Z ≤ 5. The detection apparatus consisted of pairs of semiconducting (silicon) detectors, each segmented into individual detection units, every pair of units providing an independent bi-dimensional distribution to classify. The experiment was aimed at the investigation of the production of resonances in light neutron-rich nuclei.

Figure 9: Result of the classification of data collected in the TRIUMF experiment for a typical detector. Data are represented by dots, whose color reflects the number of counts in a given bin. The deduced codebook used for the classification (red lines represent the position of each codeword) is produced after the genetic iterations and the VQ algorithm optimization process. The insert shows a zoom of the low Z region.

The technique was based on the exploration of the invariant mass of ions emitted in resonance disintegration events. In similar studies, which are widely used in experiments focused on the discovery of new resonances, a high-precision data classification is critical. As an example, the experiment explored the existence of new highly-excited states of Be and their decay in channels involving the emission of clusters such as He+He (in two distinct isotopic configurations), t+Li or p+Li. Such information, and especially the capability of distinguishing among the various emission channels, is relevant to fully understand the production of clustered states in nuclei [39, 40] and the possible formation of nuclear molecules [41]. Despite the relatively low complexity of the apparatus, our experiment represents a good benchmark for automatic classification methods because of the requirement of a highly precise identification of a variety of ions, including both Z and A values.

Figure 9 shows the result of the C-EC processing of a typical detector case after the genetic iterations and the local search operated by the VQ block. If compared to the bottom panels of Figure 4, where the best codebook after the genetic iterations is shown, one can semi-quantitatively observe that the various codewords approximate the observed clusters significantly better than the pure EC individual improvement. This is a result of the codebook optimization procedure. The result is extremely satisfactory for all clusters. The insert in the figure shows a zoom of the low Z region. Even if a deviation in the high E part of the clusters is present, reflecting ions that have sufficient kinetic energy to punch through the second detection layer, resulting in an inversion of the cluster towards low ∆E and E values, the low Z codewords are in fully satisfactory agreement with the clusters corresponding to the 1H, 2H and 3H isotopes. A similar deviation is present in many experiments in this energy domain, and the corresponding data populate a region that is undesired for the classification process. Furthermore, none of the identified codewords is associated to regions populated by the background counts present in the spectrum.
This is a particularly relevant result, as it shows that the algorithm is almost insensitive to noise.

The plot of Figure 9 allows a visual, semi-quantitative inspection of the produced codebook but does not allow a completely quantitative analysis of the quality of the classification. The classification is quantitatively probed through a procedure similar to the one proposed in Ref. [22]. The data shown in Figure 9 are initially subdivided into groups corresponding to their identified Z-value. This is done by considering their proximity to the groups of codewords associated to each Z-value: if a (∆Ē, Ē) point is closer to the group of lines associated to a certain Z̄, then (∆Ē, Ē) is classified as Z = Z̄. Considering all data classified in a given Z̄-group, each point is then associated to the closest possible A for that particular Z̄-value, A = Ā. To quantify the quality of the classification, the normalized distance of the point from the Ā codeword is calculated. The distance is normalized in such a way that the neighbor codeword (A = Ā + 1) has a distance of 1 from the Ā-codeword. In this way, a point is classified as (Z̄, Ā) if it has a normalized distance lower than 0.5 from the (Z̄, Ā) codeword.
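The quality check just described can be sketched in code. The following Python fragment is illustrative only, not the authors' implementation: each codeword line is assumed to be available as an array of sampled (E, ∆E) points, the normalization follows our reading of the procedure above, and all names are hypothetical.

import numpy as np

def nearest(pts, p):
    # Index and distance of the sampled codeword point closest to p.
    d = np.linalg.norm(pts - p, axis=1)
    i = int(np.argmin(d))
    return i, d[i]

def classify(p, codebook):
    # codebook: dict {(Z, A): array of (E, dE) points sampled along the codeword}.
    # Returns the assigned (Z, A) class and the normalized distance of p from it.
    dmin = {za: nearest(pts, p)[1] for za, pts in codebook.items()}
    # Z identification: nearest group of codewords sharing the same Z-value.
    z = min({za[0] for za in dmin},
            key=lambda zz: min(d for za, d in dmin.items() if za[0] == zz))
    # A identification: nearest codeword within the selected Z group.
    a = min((za[1] for za in dmin if za[0] == z), key=lambda aa: dmin[(z, aa)])
    # Normalize so that the neighboring codeword (A +/- 1) lies at distance 1.
    q = codebook[(z, a)][nearest(codebook[(z, a)], p)[0]]
    nn = [za for za in codebook if za[0] == z and abs(za[1] - a) == 1]
    if not nn:
        return (z, a), 0.0
    d_ref = min(nearest(codebook[za], q)[1] for za in nn)
    return (z, a), dmin[(z, a)] / d_ref

Histogramming the returned normalized distances, one panel per identified Z, reproduces the kind of plot discussed below.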
In Figure 10 we report the normalized distance distributions for data classified as Z = 1, 2, 3, 4, 5, respectively, from left to right and from top to bottom. Several peaks are present, associated to the different identified isotopes from H to B. The peaks associated to 1H, 2H and 3H are well visible, even if broader, as a result of the unavoidable punch-through effect discussed above. Very interestingly, unphysical classifications are not present. As an example, no peak is visible at A = 5 for Z = 2, as the corresponding 5He is an unbound nucleus. Similarly, as expected from a meaningful classification, the peaks of unbound Li isotopes and of 8Be are also missing. The latter is characterized by an extremely reduced lifetime (of the order of 10^-16 seconds) and therefore decays into lighter fragments before reaching the detectors.

Figure 10: Quantitative analysis of the quality of the classification obtained from the codebook shown in Fig. 9. We adopted a procedure similar to the one proposed in Ref. [22]. Data are subdivided into different panels, each corresponding to an identified Z-value. The peaks in the panels correspond to each identified A-value associated to a given Z and are identified by labels.

Figure 11: Analysis of the classification results, with the procedure suggested in Ref. [22], for a particularly noisy case (solid line) and a detector characterized by the presence of a reduced number of clusters (dashed red line). The first is a detector placed close to the beam line; in this case, larger statistics of particles in a broader energy and Z domain are expected. Data for the few clusters case are collected by a detector placed at a larger distance from the beam line; a lower rate of ions in a narrower Z domain is expected in this case. The obtained classification is highly satisfactory in both cases.

Extremely satisfactory classifications have been obtained, consistently, for all detectors explored in our benchmark. Figure 11 shows the obtained classifications, according to the method proposed in Ref. [22], for a particularly noisy detector (solid lines in the panels of Fig. 11) and for a detector characterized by a reduced number of clusters (dashed red lines in Fig. 11). The differences between the detectors arise essentially from their geometrical disposition: the noisy detector was placed very close to the beam line and is therefore subject to a larger rate of impinging ions with broader energy distributions, causing the presence of undesired noise, while the other was placed at a larger distance from the beam line, being therefore characterized by a lower rate of incident ions in a narrower Z domain. As clearly observed, meaningful physics classes are correctly identified in both cases. In the top left panel (Z = 1), the three peaks associated to 1H, 2H and 3H are present, but they are broader for the noisy detector, reflecting a larger amount of data in the region corresponding to ion punch-through. The peak corresponding to 3He in the top right panel (Z = 2) is partially merged with the neighboring 4He peak (populated with the highest statistics) in the case of the noisy detector, while the 3He-4He separation is excellent for the few clusters case. The peak of the neutron-rich He isotope is extremely weakly populated in the few clusters case, reflecting its low production statistics for the emission direction covered by the detector. A more populated peak is instead observed for the noisy case, where the production statistics of a larger number of ions is quite significant. Very interestingly, as expected, the Z range is more limited for the detector placed at a larger distance from the beam line and, in particular, no clusters are identified as Z = 5. The two detectors also differ in the Z = 4 classes: one can observe five significant Be peaks for the detector placed close to the beam line, while only two Be isotopes are significantly identified for the one distant from the beam line.

The analysis discussed in this section comprises a set of independent classification problems, one for each pair of detection units. The CPU time requested to complete the whole task amounted to a matter of seconds on a commercial Intel i7-9700K (8 cores) processor at a base frequency of 3.6 GHz. This is a satisfactory result, which testifies that the algorithm is suitable for the analysis of data produced by large apparatuses (where the number of individual classification problems is usually between several hundreds and a few thousands).

6. Comparison with other approaches
Let us summarize the capabilities of the C-EC algorithm: (i) no a priori information is required; the number of clusters to classify and the relevant (Z, A) values are obtained through an individual improvement algorithm based on evolutionary criteria; (ii) after a suitable codebook optimization, the solution of the clusterization problem is readily usable for data classification, with an explicit link to physically meaningful classes; (iii) the algorithm is robust to noise. This section is dedicated to a detailed comparison between the C-EC algorithm and previous approaches published in the literature [15-20, 22, 23].

The work published in Ref. [15] is probably the first example of classification of nuclear physics data recorded by stacks of detectors via unsupervised learning approaches. It is based on the visual analysis of the bi-dimensional assembly of data via sophisticated image segmentation techniques. This approach has been used exclusively in cases where only a Z-classification is meaningful. The major limitation of the approach consists in the need for a priori information, such as the slope of the clusters and their inter-distances. Even if the authors show how to systematically calculate these quantities, within a certain accuracy, on the basis of physical considerations, the method is effective only for a reduced class of detectors for which a reliable energy calibration is possible without assumptions on the data classification.

Ref. [17] is concerned with the use of pre-attentive neural systems. Performances are particularly good for groups of clusters characterized by similar inter-distances and dispersions, e.g. in the high Z region, where the resolution is not sufficient to distinguish the sub-clusters associated to different A-values. The effectiveness of the approach in the low Z region, where one can expect to observe a separation of the clusters due to the A-value, is reduced as a result of the variety of dispersions and inter-distances between clusters. The link to physically meaningful classes relies on the definition of the Z-value of the last identified cluster.

Work        Approach                          Sub-clusters   Z-range       Low-energy class.   No a priori info.   Link to (Z, A) class.
Ref. [15]   image segmentation                no             full          yes                 no                  from a priori info
Ref. [17]   pre-attentive neural system       no             only high Z   yes                 yes                 no
Ref. [18]   spatial density data processing   no             full          no                  yes                 no
Ref. [19]   data slicing                      yes            full          yes                 yes                 no
Ref. [16]   fuzzy c-means                     --             --            yes                 yes                 no
Ref. [20]   artificial neural networks        yes            full          yes                 no                  from a priori info
Ref. [22]   modeling of patterns              yes            full          yes                 no                  from a priori info
Ref. [23]   data slicing                      yes            full          yes                 no                  from a priori info
This work   EC and VQ                         yes            full          yes                 yes                 yes

Table 1: Summary of the comparison with other approaches for nuclear physics data classification.
A spatial density data processing approach is reported in Ref. [18]. The procedure is based on the identification of points along the clusters and their subsequent linearization. Performances are strongly reduced in the low-energy part of the data, i.e. at low E when looking at Fig. 1, because of the rapid change in the slope of the clusters. In addition, the sub-clusters associated to A-values are not identified. The procedure is particularly powerful for online data analysis, where a reduced body of information is typically sufficient, for example to monitor the stability of large detectors.

More recently, Ref. [19] has proposed a classification method based on a simple data slicing procedure. The approach is capable of identifying both Z-clusters and A-clusters, but the association of the extracted clusters to physically meaningful (Z, A) classes is left to the operator.

Fuzzy clustering methods derived from the c-means were adopted in Ref. [16]. However, the approach is restricted exclusively to cases comprising few classes.

Artificial neural networks were adopted in Ref. [20] to approximate the spatial disposition of the physics clusters in bi-dimensional distributions of data. The underlying mechanism is that of a supervised learning approach, where a number of patterns are manually extracted by the operator for each (Z, A) class to then feed the training process of the neural network. Even if the resulting clusters are directly linked to (Z, A) pairs, the whole procedure heavily relies on human supervision.

In a similar way with respect to Ref. [20], the authors of Ref. [22] adopted a supervised learning procedure where the classification capabilities rely on a number of manually extracted patterns. Differently from Ref. [20], a formal model of the data with an explicit (Z, A) dependence is used, thus allowing the extraction of patterns for only a reduced number of classes; the results are then extended to all possible (Z, A) via model extrapolation. The procedure critically relies on the quality of the information extracted by the operator.

Finally, a new approach based on data slicing was very recently proposed in Ref. [23]. The procedure combines peak finding algorithms with information that needs to be manually extracted by an operator. The mechanism is similar to that proposed in Ref. [22], with the major difference that the patterns needed to constrain the model are extracted automatically from an initial piece of information given by the operator. The advantage of this method with respect to [22] consists therefore in the strongly reduced information required by the algorithm, thus significantly minimizing human supervision. Even if an explicit link to (Z, A) classes is provided, the result of the classification is affected by the initial information provided by the operator, and an a priori visual inspection of the data distributions is always required.

The comparisons are summarized in Table 1. As one can easily observe, our approach is the only one where an explicit link between the identified clusters and the (Z, A) values is established in a fully automatic way. The vast majority of the other approaches are exclusively concerned with the extraction of clusters, and human supervision is consequently required to associate them to physically meaningful classes. A link to Z-classes is provided by the algorithm of Ref. [15], but the procedure requires the use of a priori information. It is important to stress that an automatic association is usually a non-trivial task, especially when an A-classification is required.
In such cases, as observed in the example shown in Fig. 11, for a given Z not all A-sub-clusters are populated, and differences can also be observed detector-by-detector. Differently from our approach, the methods of Refs. [15, 17, 18] were not concerned with A-cluster extraction. This is probably because the algorithms of Refs. [15, 17, 18] were developed for the previous generation of detectors, where an A-classification was not a crucial requirement. The only algorithms concerned with an explicit link to the Z and A classification are those of Refs. [20, 22, 23]. In all these cases, the quality of the classification crucially relies on the capability of the operator to manually extract the required information via a visual inspection of the distribution of data.
7. Conclusions and perspectives
The classification of data in nuclear physics experiments is key to extracting the required physics information. Modern experiments have been largely focused on obtaining a high-quality classification over a number of physically meaningful classes. In charged particle experiments at low and intermediate incident energies, the classification problem is that of identifying the charge and mass of the ions produced in nucleus-nucleus collisions. This task is particularly repetitive and time consuming, as it usually requires the supervision of an operator who visually inspects and extracts information from bi-dimensional assemblies of data.

In this framework, we developed a fully automatic algorithm for data classification in nuclear physics experiments by combining Evolutionary Computing and Vector Quantization. In our approach, a two-level search for the individual with the maximum fitness value is operated. The EC block represents the upper level and deals with a global search extended to all possible individuals, applying suitable evolutionary criteria based on the Darwinian scheme. Once an individual close to a maximum of the fitness function is determined, a hill-climbing operator is invoked to perform a fast local search of the maximum. The latter consists in a modified LBG algorithm that performs a codebook optimization procedure, the number of codewords and their physical meaning being kept fixed. Our procedure is innovative as it combines unsupervised learning approaches with suitable physics constraints. The resulting solutions are in the form of a codebook, where the link between codewords and physics classes is explicit, with nearly-zero effort required from the experimenter. Furthermore, no a priori information is used.

The newly developed algorithm is benchmarked against experimental data obtained by pairs of longitudinally stacked silicon detectors. A satisfactory classification is obtained for all explored cases, including cases with a reduced number of clusters or characterized by noise.

With respect to previously published approaches, our method offers the advantage of an explicit link between the extracted clusters and physically meaningful classes, thus significantly simplifying the subsequent analysis of data. The procedure does not rely on any a priori information. This makes the resulting procedure fully automatic, and a minimal human supervision is only required to inspect the result of the classification. In addition, C-EC is particularly suitable to be integrated in the online and offline analysis of nuclear physics experiments at low and intermediate incident energies, thus allowing a significant reduction of the time required for the analysis of data, especially in large acceptance arrays.