Automatic classification of nuclear physics data via a Constrained Evolutionary Clustering approach
D. Dell’Aquila a,∗, M. Russo b,c

a Ruđer Bošković Institute, Department of Experimental Physics, Bijenička 54, Zagreb, HR-10000, Croatia
b Dipartimento di Fisica e Astronomia, Università degli Studi di Catania, 95235 Catania, Italy
c INFN-Sezione di Catania, 95235 Catania, Italy

∗ Corresponding author. E-mail address: [email protected]

Preprint submitted to Computer Physics Communications, April 28, 2020
Abstract
This paper presents an automatic method for data classification in nuclear physics experiments based on evolutionary computing and vector quantization. The major novelties of our approach are the fully automatic mechanism and the use of analytical models to provide physics constraints, yielding a fast and physically reliable classification with nearly-zero human supervision. Our method is successfully validated using experimental data produced by stacks of semiconductor detectors. The resulting classification is highly satisfactory for all explored cases and is particularly robust to noise. The algorithm is suitable for integration in the online and offline analysis programs of existing large-complexity detection arrays for the study of nucleus-nucleus collisions at low and intermediate energies.
Keywords:
Nuclear physics data classification, Evolutionary computing, Clustering algorithms, Charged particle identification in nuclear collisions.
1. Introduction
Nuclear physics experiments significantly rely on data classification, i.e. the grouping of data into meaningful physics classes, to reconstruct nucleus-nucleus collision events and enable the exploration of the underlying physics. In studies that exploit the detection of charged particles, the classification
problem is often that of identifying the charge (Z) and mass (A) of detected ions. This process is usually referred to as particle identification. To this end, a number of detection systems capable of recording information useful to the classification process have been developed over the last decades [1-10]. A quite common strategy consists in the use of detector arrays based on stacks of detection layers through which the particle penetrates before being completely stopped. In such arrays, if organized in a 2D correlation plot, the data recorded by pairs of independent layers assemble into bi-dimensional non-overlapping clusters, each representing a certain (Z, A) class. To this extent, the problem of nuclear physics data classification is equivalent to the extraction of clusters in a bi-dimensional space.

Numerous algorithms for Cluster Analysis (CA) or Vector Quantization (VQ) have been proposed in the literature so far, achieving noticeable success in standard partitioning problems, where the focus is on obtaining clusters of nearly equal dispersion (see for example [11-14]). However, the clusterization process in nuclear physics data is made more difficult by the large variability of size and dispersion of the various clusters [15]. Because of these unique features, an optimal solution according to the CA/VQ approach, where a good equalization of the content of each cluster is obtained, might not be directly applicable to nuclear physics classification problems, where an acceptable physical solution usually contains clusters with strongly unbalanced distortions. For this reason, only a reduced number of works have previously attempted to use CA/VQ methods in nuclear data classification problems. For example, fuzzy c-means algorithms have been successfully applied to the identification of particles in nucleus-nucleus collisions, but their use was restricted, exclusively, to cases characterized by the presence of few clusters with nearly equal distortion [16].

Several studies have instead focused on the classification of nuclear physics data in more general cases. Among those, image processing techniques are widely applied. For example, unsupervised learning approaches exploiting contextual image segmentation methods [15], neural networks [17] or spatial density analysis [18] led to quite satisfactory classifications. However, these methods were conceived to classify exclusively Z-values, and are thus not particularly suitable for the vast majority of modern high-resolution experiments, where A-classification is often a crucial requirement [7]. More recently, another automatic classification method was proposed in Ref. [19]. This method allowed a good extraction of clusters, but the procedure does not comprise an explicit link between extracted clusters and physically meaningful classes, thus requiring significant human supervision, especially in the analysis of data produced by large detection arrays.

Because of the above limitations, the vast majority of the approaches commonly used for the classification of data in nuclear physics experiments involve human-supervised techniques. In such approaches, the operator manually extracts information by visually inspecting bi-dimensional distributions of data, which is then used as input for supervised learning procedures.
In this respect, artificial neural networks have been proposed [20]. More usually, error minimization procedures based on mathematical models [21, 22], which contain Z and A values explicitly, are preferred. The latter allow the operator to manually extract information for only a reduced number of clusters, while the resulting classification can be meaningfully extended to any possible (Z, A) ion by model extrapolation. An additional reduction in the required information can finally be obtained by following the procedure suggested in Ref. [23]. However, despite the significant effort devoted to minimizing human supervision, obtaining a physically meaningful data classification in nuclear physics experiments is still a quite repetitive and time-consuming task for the operator, especially in the case of modern detection arrays, where the number of individual bi-dimensional plots to inspect ranges from a few hundred to thousands. Depending on the number of bi-dimensional distributions to classify, the time required by an operator to perform such a task ranges from days to months. In this framework, it is clear that new fully-automatic methods for data classification are highly needed.

In this paper, we present an innovative approach to nuclear physics data classification that reduces the task to a few minutes or hours of machine time with nearly-zero supervision by the operator. Our approach involves Evolutionary Computing (EC) and CA/VQ and is based on a two-level search for the optimal solution of the data clusterization problem. The upper level consists in a global search operated by an EC algorithm that treats each solution as an individual of a given population and applies suitable evolutionary criteria. In our EC approach, an individual is encoded according to a given choice of functional parameters, which constrain a physically meaningful model for the description of bi-dimensional clusters, and of (Z, A) physics classes. (For the present work, the model proposed in Refs. [21, 22] has been used; however, our algorithm is fully general and can be applied exploiting any given model with an arbitrary number of parameters to adjust.) The lower level, used as a local hill-climbing operator for the EC process, performs a fast local search through a suitable VQ algorithm that exploits the resulting EC individual as the initial codebook for the optimization problem. The major novelty with respect to previously published CA/VQ methods is that the codebook has a physical meaning, and the resulting solutions are immediately applicable to the data classification with nearly-zero effort required of the operator. In addition, our algorithm is fully automatic, as no a priori information is required of the experimenter.

Since the proposed methodology is highly interdisciplinary, in order to effectively guide the reader through the paper we provide an introduction to all the individual research fields that cooperate in our algorithm. The paper is organized as follows: Section 2 gives a more quantitative description of the classification problem studied in this paper, Section 3 is concerned with a description of the programming techniques used in our study, i.e. EC and CA/VQ, Section 4 provides a detailed description of the algorithm, in Section 5 we test the performance of the algorithm in a typical experimental case, in Section 6 we compare our methodology with previously published approaches and, finally, Section 7 reports conclusions and possible future perspectives of our work.
2. Experimental context
Charged particles are among the most frequent products of a nucleus-nucleus collision. At low and intermediate incident energies, they consist almost exclusively of ions (Z ≥ 1). Their number in a typical collision event varies depending on the incident energy and the size of the nuclei involved in the collision, and can range from a few units up to several tens. To meet the requirements of modern nuclear physics studies, state-of-the-art apparatus for the detection of charged particles have reached an extremely high level of accuracy in identifying a large variety of ions.

Two numerical quantities are typically used to identify an ion: its number of protons, often called charge and indicated with Z, and its total number of nucleons, i.e. the sum of the number of its protons and neutrons, called mass and indicated with A.
Figure 1: A typical plot obtained by correlating the energy signals of two longitudinally stacked detectors used as a ∆E-E telescope. Data are obtained by means of the GEANT4 toolkit [24] for a Silicon-Silicon telescope (thin ∆E stage); the axes report the residual energy E and the energy loss ∆E, both in MeV. In such a bi-dimensional representation, data group into clusters, each corresponding to a given (Z, A) ion. Colors define the statistics of counts as indicated by the color scale. The insert represents a zoom of the 1 ≤ Z ≤ 2 region. Labels indicate the loci corresponding to the various ions (H, He, Li, Be and B isotopes).

Because the energy deposition of a heavy ion in a given material is a well-known function of its energy (i.e. its velocity), of the properties of the material and of the nature of the ion (i.e. its Z and A values) [25], one can identify an ion produced in a collision event if its velocity and its energy loss in a layer of a material of given properties are known. This is the principle of the particle identification technique called ∆E-E, which is probably the most widely used approach to identify charged particles in nucleus-nucleus collisions at intermediate and low energies. The practical implementation of the ∆E-E technique relies on the use of so-called telescopes, i.e. stacks of longitudinally arranged detectors, each with an independent readout. Two detection stages are typically sufficient to this end, but configurations with more than two stages have also been proposed to account for the need for a particularly large dynamic range in a single experiment (see e.g. [5, 7, 8]). Let us assume, for the sake of clarity, that a telescope composed of two detection stages is used and that the particle has sufficient energy to pass through the first stage, being stopped in the second stage. (The approach is more general and can be applied to any array of longitudinally stacked detectors if signals produced by pairs of independent layers are considered; to this end, the hypothesis that the particle is fully stopped in the second of those layers is required.) In such a telescope, the signal recorded by the first detection stage, associated to the energy deposited by the particle in the detector volume, corresponds only to a portion of the total kinetic energy E_kin of the particle and can therefore be indicated as ∆E. Since the particle is stopped in the second detection stage, the signal produced by the latter will complement the kinetic energy of the particle, i.e. E_kin = ∆E + E. If the first stage is sufficiently thin, the following equation can be easily derived from the non-relativistic formalism introduced in [25]:

$$-\frac{dE}{dX} = \frac{4\pi e^4 Z^2}{m_e v^2}\, N B \approx \frac{\Delta E}{\Delta X} \;\;\Rightarrow\;\; \Delta E \propto \frac{Z^2 A}{E} \qquad (1)$$

where e is the elementary electric charge, m_e is the mass of the electron, v is the velocity of the incident particle, N is the number of atoms per cm³ in the material, I is the average excitation potential of the atoms in the material and B is a dimensionless quantity weakly dependent on the velocity of the particle and on the properties of the material. In the last term, we replaced E_kin with E, since E_kin = ∆E + E ≈ E, we used v² ∝ E/A, and we considered the approximation B ≈ const. It is important to point out that these approximations and the simplified formalism derived from [25] are adequate for the purpose of this section; a fully accurate formalism will instead be used to derive the classification algorithm of Section 4. From the last term of eq. 1, one can state that pairs of charged particle signals recorded by a telescope occupy hyperbolic-like loci in the (E, ∆E) plane depending on their (Z, A) values. Loci are not equally spaced, as the dependency is quadratic on Z and linear on A. This can be observed in the plot shown in Fig. 1, where data simulated by using the GEANT4 toolkit [24] for a typical Silicon-Silicon telescope are shown.
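To make the Z²A/E scaling of eq. 1 concrete, the short Python sketch below tabulates the simplified loci for a few light isotopes. It is only an illustration: the proportionality constant K is an arbitrary placeholder of ours, not a calibrated detector quantity, and the full formalism of Section 4 is needed for real data.

```python
# Minimal illustration of the Delta E - E loci implied by eq. 1 (Delta E ~ Z^2 A / E).
# K is an arbitrary scale factor (illustrative only); real loci require the
# complete formalism of Section 4 and detector-specific calibrations.

K = 5.0  # arbitrary placeholder, NOT a calibrated value

ISOTOPES = [(1, 1, "1H"), (1, 2, "2H"), (1, 3, "3H"),
            (2, 4, "4He"), (3, 7, "7Li")]

def delta_e(e, z, a, k=K):
    """Simplified energy loss in the thin stage at residual energy e (MeV)."""
    return k * z * z * a / e

if __name__ == "__main__":
    for e in (20.0, 40.0, 80.0):
        row = ", ".join(f"{name}: {delta_e(e, z, a):5.2f}" for z, a, name in ISOTOPES)
        print(f"E = {e:5.1f} MeV -> Delta E (MeV): {row}")
    # At any fixed E the loci are ordered by Z^2*A (1H < 2H < 3H < 4He < 7Li),
    # which is why a single index ordered by Z^2*A can label the curves (Section 4).
```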
To produce the data in the figure, we considered, for simplicity, a uniform energy distribution for all emitted particles, discarding the signals of particles not fully stopped in the second detection stage. The Z-value range 1 ≤ Z ≤ 5 was taken into consideration, and a realistic A distribution for each Z-species was introduced. By visually inspecting the bi-dimensional distribution, one can immediately observe that different areas are populated, corresponding to the different Z-values; this is particularly visible in the insert for Z = 1 and Z = 2. Z-lines are additionally separated into A-clusters, each corresponding to a different isotope of the particular Z-species. Because the latter are intrinsically less separated than Z-lines, as a result of the linear dependence on A given by eq. 1, their identification is usually more challenging. In a typical experiment, different Z and A distributions can be recorded by different detectors within the same setup, depending on their geometrical position with respect to the accelerated beam. Another interesting feature observed in Fig. 1 is the difference in population of the various clusters. It reflects the statistics of production of the different ions emitted in the nuclear collision. In the example of Fig. 1, where colors indicate the content of each individual bin, we have chosen a distribution similar to that observed in Li+Li and Li+F collisions (cf. Section 5). It is evident, for instance, how clusters corresponding to Z = 1 are considerably more populated than Z = 5 ones. The dispersion of each cluster around its mean value also differs very significantly cluster-by-cluster. For example, the lines associated to the Z = 1 isotopes, i.e. 1H, 2H and 3H, well visible in the insert, have a much smaller average dispersion (FWHM) than that associated to B. These values, which usually increase for increasing Z-values, are characteristic of the process of interaction of radiation with matter and are therefore affected by the effective thickness of the ∆E detector as well as by the uniformity of the detector itself. In addition, the thickness of the ∆E detector defines the absolute position of clusters in the (E, ∆E) plane, which is typically different detector-by-detector.

To summarize, the problem of charged particle identification with longitudinally stacked detectors consists in recognizing the (E, ∆E) pairs that belong to the same physical class and in linking them to a given (Z, A) ion. Such a task is therefore equivalent to a classification problem. The most relevant features of the bi-dimensional clusters to extract are: (i) the number of clusters and their Z and A values are different for each experiment and often detector-by-detector, (ii) statistics within each cluster differ significantly from cluster to cluster, (iii) clusters have non-equal dispersions, (iv) no reasonable hypothesis can be made regarding the absolute position of clusters in the (E, ∆E) plane.
3. Description of the programming techniques

The newly proposed methodology comprises two different programming techniques that cooperate to obtain, in a fully automatic way, a solution to the proposed classification problem: EC and CA/VQ. In this section, we provide an introduction to the most salient concepts of EC and CA/VQ.

3.1. Evolutionary computing
Evolutionary computing is a scientific field concerned with the resolution of optimization problems through the application of concepts derived from the natural world [26]. Nowadays, EC is applied to numerous domains of science [27-29]. A very frequent scheme, derived from the Darwinian evolutionary theory, can be schematically described in the following way:

1. A set of possible solutions to the optimization problem, encoded according to a predefined scheme, is generated (often randomly). Each such solution is called an individual, while a set of individuals forms a population.

2. A numerical value, called fitness, is associated to each individual. The fitness quantifies how optimal a given solution is for the problem to solve. The higher the fitness associated to an individual, the more promising the individual itself. This is a crucial quantity for the success of the optimization procedure.

3. Until a predefined convergence criterion is reached, the following steps are iterated:
   (a) Some individuals are selected (parents) to be used as a starting point for the generation of new individuals (offspring).
   (b) Offspring are obtained through a suitable mechanism of recombination of the parents' encodings (crossover). In this phase, the chromosomes of the parents, i.e. their encodings, are suitably combined to generate new individuals. A valid crossover should produce individuals whose genetic code is, to some extent, similar to that of the parents. Crossover is usually followed by a random variation, with low probability, of some portions of the derived encoding. Such a process is called mutation and has a crucial importance, as it allows missing genetic code to be introduced and genetic diversity to be maintained in the population. The fitness is finally calculated for all newly obtained individuals.
   (c) Some offspring live sufficiently long to replace other pre-existing individuals.

The mechanism of fitness improvement typical of EC is a result of the implementation of suitable selection criteria. This is mandatory in order to enhance the performance of the algorithm with respect to purely Monte Carlo codes. As an example, the initial choice of the parents in step (a) can be affected by the fitness of the available individuals, i.e. parents can be chosen according to a probability distribution that favors the extraction of high-fitness individuals. In a similar way, the replacement criterion used to introduce newly generated offspring into the population can account for the fitness of pre-existing individuals. The latter are usually selected among those rejected in step (a). To this end, deterministic tournaments between pairs of individuals or fitness-based stochastic criteria are often used. The replacement of individuals is generally required in order to keep the total number of individuals in the population constant.

EC algorithms are strongly CPU-intensive; this makes it crucial to allocate most of the CPU resources towards the more promising individuals, even if a certain CPU quota has to be ensured to all individuals in the population. A similar allocation of available resources is intrinsic in the selection process operated by EC.

Another fundamental aspect is the so-called premature convergence. It consists in the convergence of the algorithm towards a local maximum of the fitness function and often negatively affects the capability of EC to obtain satisfactory results. As shown in previous studies, see e.g. Ref.
[29], the problem of premature convergence can be countered by subdividing the population into multiple sub-populations. In this way, even if a sub-population reaches premature convergence, with a consequent loss of genetic diversity, it is extremely unlikely that all sub-populations simultaneously converge towards the same individual. In addition, migration plays a fundamental role in this context. If a sub-population has converged towards a local maximum, thus ceasing to improve its individuals, injecting an individual from another independent sub-population might restart the evolution of the prematurely converged sub-population.

The global search of EC is generally extremely powerful in obtaining good solutions that are close to a maximum of the fitness function, but is rather slow in the search for the maximum itself. Local search techniques are instead suitable for the fast determination of the closest local maximum. For this reason, one or more hill-climbing operators, devoted to a fast local search in the proximity of a maximum, are sometimes introduced in EC to significantly speed up the determination of the maximum of the fitness function.
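The Darwinian scheme of steps 1-3 above can be condensed into a few lines of code. The following is a minimal sketch under toy assumptions of ours (scalar individuals, a quadratic objective, tournament selection and worst-individual replacement); it is not the C-EC algorithm of Section 4, only the generic loop it builds upon.

```python
# Generic sketch of the Darwinian EC scheme (steps 1-3 above).
# The objective and all numeric settings are illustrative placeholders.
import random

def fitness(x):
    # Toy objective with maximum at x = 1; a real problem supplies its own.
    return -(x - 1.0) ** 2

def evolve(pop_size=20, iterations=500):
    population = [random.uniform(-10, 10) for _ in range(pop_size)]  # step 1
    for _ in range(iterations):                                      # step 3
        # (a) fitness-biased parent selection via 2-way tournaments
        p1 = max(random.sample(population, 2), key=fitness)
        p2 = max(random.sample(population, 2), key=fitness)
        # (b) crossover (average) followed by low-probability mutation
        child = 0.5 * (p1 + p2)
        if random.random() < 0.1:
            child += random.gauss(0.0, 0.5)
        # (c) the offspring replaces a pre-existing low-fitness individual
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
    return max(population, key=fitness)

if __name__ == "__main__":
    print("best individual:", evolve())
```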
3.2. Cluster analysis and vector quantization

Clustering is an important instrument in many scientific disciplines. The partitioning approach known as VQ [30] consists in the derivation of a set (codebook) of reference or prototype vectors (codewords) from a given dataset. In this way, each subset of vectors (patterns) belonging to the original dataset is represented uniquely by one codeword. Clusters can then be easily extracted based on the proximity of patterns to the available codewords. While VQ is concerned with finding a codebook that represents the original multi-dimensional dataset as well as possible, CA is conceived as the problem of identifying clusters of data, regardless of the determination of a codebook. Codewords are determined by a procedure that consists in the minimization of an objective function (distortion) representing the quantization error (QE) [31]. A widely accepted classification scheme distinguishes between K-means and competitive learning. Clustering algorithms belonging to the first of those classes [11, 32] are based on the minimization of the average distortion through a suitable choice of codewords. In approaches belonging to the second category, a codebook is instead obtained as a consequence of a competition process between codewords [33].

Quantitatively, the objective of standard VQ consists in the representation of a given set of vectors x ∈ X ⊆ ℝ^k through a set, Y = {y_1, ..., y_{N_C}}, of N_C reference vectors in ℝ^k. In this definition, the vectors y_i represent the codewords and Y is the codebook. A VQ can therefore be represented by a function q : X → Y. The determination of q allows one to obtain a partition S of the original dataset X constituted by N_C subsets, S_i, called cells:

$$S = \{S_i;\ i = 1, \ldots, N_C\} \qquad (2)$$

where each cell S_i is defined by the following equation:

$$S_i = \{x \in X : q(x) = y_i\} \qquad (3)$$

The QE is the value given by d(x, q(x)), d being a generic distance operator between vectors defined in X × Y. Several functions are conventionally used for distortion measurement [32]. For the purposes of this work, as will be discussed in detail in Section 4, the scheme introduced by eq. 3 has been modified to be suitable for our classification problem, where codewords are represented by hyperbolic-like curves of the type of eq. 1 and the distance operator d is defined accordingly. However, the following discussion is valid without any loss of generality.

The performance of a given quantizer q is usually evaluated through the Mean QE (MQE). When X is constituted by a finite number (N_P) of patterns, MQE is given by:

$$\mathrm{MQE} \equiv D(Y, S) = \frac{1}{N_P} \sum_{i=1}^{N_C} D_i \qquad (4)$$

where D_i is the total distortion of the i-th cell, defined by the following equation:

$$D_i = \sum_{n :\, x_n \in S_i} d(x_n, q(x_n)) \qquad (5)$$

Equation 4 shows that MQE is equivalent to a function (D) of the codebook Y and the corresponding partition S (a minimal numerical example is given below).
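As a small, self-contained sketch of eqs. 2-5, the following code evaluates the MQE of a toy codebook of point codewords with a Euclidean distance; the data and codebook values are made-up placeholders, and the curve-shaped codewords actually used in this work are introduced only in Section 4.

```python
# Sketch of eqs. 2-5 for standard VQ with point codewords and Euclidean distance.
# Data and codebook below are made-up toy values.
import math

def d(x, y):
    return math.dist(x, y)  # generic distance operator d on X x Y

def mqe(patterns, codebook):
    """Mean quantization error (eq. 4) from per-cell distortions D_i (eq. 5)."""
    cell_distortion = [0.0] * len(codebook)
    for x in patterns:
        i = min(range(len(codebook)), key=lambda j: d(x, codebook[j]))  # q(x) = y_i
        cell_distortion[i] += d(x, codebook[i])                         # eq. 5
    return sum(cell_distortion) / len(patterns)                         # eq. 4

if __name__ == "__main__":
    X = [(0.1, 0.2), (0.0, 0.1), (4.1, 3.9), (3.8, 4.2)]
    Y = [(0.0, 0.0), (4.0, 4.0)]
    print("MQE =", mqe(X, Y))
```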
The core of a CA/VQ algorithm consists in the application of two important conditions that are used for the calculation of the optimal partition (when the codebook is fixed) and of the optimal codebook (when the partition is fixed) [32]:

• Nearest Neighbor Condition (NNC). The NNC consists in assigning the nearest codeword, according to the metric d, to each pattern in the original dataset X. For a given codebook Y, the following partition is thus identified by the NNC:

$$\bar{S}_i = \{x \in X : d(x, y_i) \leq d(x, y_j)\ \ \forall\, y_j \in Y\} \qquad (6)$$

The set $\bar{S}_i$ defined by the NNC corresponds to the so-called Voronoi partition [30] of the original dataset and is usually indicated with the symbol $P(Y) = \{\bar{S}_1, \ldots, \bar{S}_{N_C}\}$. P(Y) corresponds to the optimal partition of X, given the codebook Y [32].

• Centroid Condition (CC). The CC is concerned with the procedure of finding the optimal codebook for a certain partition S of the original dataset X, i.e. determining the centroid x̄ of each individual cell S_i according to the given metric. The corresponding codebook is then defined by grouping all centroids x̄(S_i):

$$\bar{X}(S) \equiv \{\bar{x}(S_i);\ i = 1, \ldots, N_C\} \qquad (7)$$

3.2.1. K-means (LBG)

LBG is an iterative algorithm that, N_C being fixed, at each iteration produces a quantizer q better than or equal to the one obtained at the previous iteration. This approach is practically equivalent to that of the traditional K-means [34]. The steps through which LBG develops can be schematically described as follows:

1. Initialization: an initial codebook is chosen according to a given approach, often randomly (see e.g. [32]).
2. Partition calculation: given the codebook determined in the previous step, the related Voronoi partition (eq. 6) is calculated according to the NNC.
3. Termination condition: the MQE at the current iteration, D_curr, is compared with the one obtained at the previous iteration, D_prev. If the ratio |D_prev − D_curr|/D_prev is less than a prefixed threshold (ε), the algorithm ends; otherwise, it continues with the next step.

4. Codebook calculation: by using the partition calculated in step 2, a new codebook is calculated according to the defined CC (eq. 7).

5. Return to step 2. (A compact transcription of these steps in code is given below.)
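The sketch below is a direct transcription of the five LBG steps for two-dimensional point codewords; the random initialization, the threshold value and the toy data are arbitrary choices of ours.

```python
# Direct transcription of the five LBG steps above for 2D point codewords.
# Initialization strategy, threshold and toy data are arbitrary choices.
import math, random

def lbg(patterns, n_c, eps=1e-4):
    codebook = random.sample(patterns, n_c)              # 1. initialization
    d_prev = float("inf")
    while True:
        cells = [[] for _ in range(n_c)]                 # 2. partition (NNC, eq. 6)
        for x in patterns:
            i = min(range(n_c), key=lambda j: math.dist(x, codebook[j]))
            cells[i].append(x)
        d_curr = sum(math.dist(x, codebook[i])           # MQE of current quantizer
                     for i, cell in enumerate(cells) for x in cell) / len(patterns)
        if abs(d_prev - d_curr) / d_prev < eps:          # 3. termination condition
            return codebook
        for i, cell in enumerate(cells):                 # 4. codebook update (CC, eq. 7)
            if cell:  # empty cells keep their previous codeword
                codebook[i] = (sum(x for x, _ in cell) / len(cell),
                               sum(y for _, y in cell) / len(cell))
        d_prev = d_curr                                  # 5. iterate
```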
Key works in the field of VQ have pointed out, both from a theoretical and an experimental point of view, that VQ approaches converge towards partitions whose cells contribute almost equally to the total distortion [35, 36]. Under this hypothesis, one can easily state that the nuclear physics classification problem cannot be approached through standard VQ algorithms [15, 17]. In particular, according to the features observed in Fig. 1, a physically meaningful partition of the (E, ∆E) plane is characterized by hyperbolic-like cells with strongly different distortions. As an example, the cluster associated to (Z = 1, A = 1), indicated with 1H in Fig. 1, contains a number of patterns several orders of magnitude larger than that of (Z = 5, A = 10), 10B. Consequently, the distortion introduced by 1H is significantly larger than that produced by 10B. Even if a normalization to the number of patterns is introduced, distortions would still be unbalanced among cells as a result of the intrinsically different dispersion of patterns around their mean value observed in different clusters; this aspect is discussed more quantitatively in Section 2. Another significant limitation to the application of standard VQ algorithms is represented by the need to know the physically meaningful number of classes N_C a priori. The latter, as stated in Section 2, is usually different for each individual case.

In order to develop a suitable unsupervised learning approach for the classification of nuclear physics data based on VQ, we have modified the original LBG method of Ref. [32] via the introduction of physical constraints deduced from the formal treatment of the interaction of radiation with matter [25]. In our modified LBG, a codebook corresponds to a particular choice of N_par physical parameters P_i that allow the calculation of a family of curves (C), one for each physical class (Z, A), N_C being fixed. Accordingly, the related distance function d is defined in ℝ² × C. The computation of the Voronoi partition, calculated as in eq. 6, is used for the NNC. The CC corresponds instead to the determination of the best set of functional parameters P_i for a given partition. The latter is operated by means of a gradient descent technique [37] applied to the mathematical expression of the total distortion in the parameter space.

The resulting VQ algorithm is used as a hill-climbing operator devoted to the local search for a maximum of the fitness function (and therefore a minimum of the quantization error of the codebook) in the proximity of a good individual determined by an EC approach. To this extent, the initial codebook Y, composed of a suitable number of physics classes N_C, their (Z, A) values and an initial choice of parameters P_i, is determined uniquely and fully automatically through a global search procedure via evolutionary criteria, and the VQ algorithm plays the role of speeding up the search for the maximum. The resulting approach is called Constrained Evolutionary Clustering (C-EC) and is described in detail in Section 4. Sections 5 and 6 are instead dedicated, respectively, to discussing the capabilities of the algorithm in classifying experimental data and to a detailed comparison with previously published algorithms for the classification of data in nuclear physics experiments at low and intermediate energies.
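The essential modification just described (codewords that are curves rather than points) can be sketched as follows. The linear toy model f_p below is a stand-in of ours for a realistic functional such as eq. 8 of Section 4, and all parameter values are placeholders.

```python
# Sketch of the constrained-VQ ingredient described above: codewords are curves
# (E, f_P(E, Z, A)) rather than points. The toy model f_p stands in for a
# realistic functional such as eq. 8; parameter values are placeholders.

def f_p(params, e, z, a):
    # Toy stand-in for a physical Delta E = f_P(E, Z, A) model.
    return params[0] * z * z * a / e + params[1]

def assign(pattern, params, classes):
    """NNC for curve codewords: the distance of a pattern (E, Delta E) from a
    codeword is the vertical residual |Delta E - f_P(E, Z, A)| (eq. 10 below)."""
    e, de = pattern
    return min(classes, key=lambda za: abs(de - f_p(params, e, *za)))

if __name__ == "__main__":
    params = (5.0, 0.0)                         # placeholder parameter set P
    classes = [(1, 1), (1, 2), (1, 3), (2, 4)]  # candidate (Z, A) codewords
    print(assign((40.0, 0.5), params, classes))
```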
4. The Constrained Evolutionary Clustering algorithm

C-EC is an algorithm for automatic data classification in nuclear physics experiments based on EC and VQ. It is conceived for the classification problem typical of experiments that involve the detection of charged particles produced in nucleus-nucleus collisions at low and intermediate energies through longitudinally stacked detectors. In previously published automatic approaches [15, 17-19], the link between extracted clusters and meaningful physics classes (Z, A) was a non-trivial task, often requiring non-negligible human supervision and/or the use of a priori physics information, leading the scientific community to more often rely on human-supervised classification methods [22, 23]. To overcome these limitations, C-EC combines unsupervised learning techniques with physically meaningful constraints. The resulting solution is a codebook whose codewords are directly linked to physical classes.

The algorithm is based on a two-level block structure: the upper block is an EC algorithm devoted to deriving, fully automatically, a physically meaningful codebook for the clusterization problem through the improvement of suitably defined individuals. To enhance the convergence speed, a hill-climbing operator, which deals with the optimization of the codebook via a VQ algorithm, is introduced. The latter constitutes the lower block of the algorithm and can be intended as the fast search for a local maximum of the fitness in the proximity of the individual determined by the EC block. Figures 2 and 3 report flow charts describing, respectively, the steps executed by the EC and VQ blocks. The VQ algorithm is invoked exclusively by the EC block as a local hill-climbing operator.

The mechanism outlined by the flow chart of Fig. 2 can be described as follows:

1. A village V_r is considered, selected according to a random uniform distribution. Two random individuals, I_p1 and I_p2, are selected within the village V_r (including overlap regions with neighbor villages). Within the rest of the village, V_r − {I_p1, I_p2}, the worst individual, i.e. the individual I_w having the lowest fitness value, is identified.
Figure 2: Description of the EC algorithm for initial codebook determination. When the genetic iterations end, i.e. when a pre-defined number of iterations is reached, the VQ block is invoked. The VQ block is also invoked during the genetic iterations, when a particularly promising individual is produced.
2. The crossover between I_p1 and I_p2 is executed, followed by a mutation as described in Section 4.2.5.

3. The offspring is introduced into the population, replacing I_w.

4. If the fitness of the newly introduced individual is greater than a certain prefixed fraction VQ_thr of the fitness of the best individual (I_b) in the population, the hill-climbing operator is invoked for the optimization of the new individual. The latter (described in Section 4.3) is a particularly time-consuming task, and VQ_thr is therefore usually chosen to be large (close to unity).

5. If the number of iterations executed is lower than a certain prefixed value, the algorithm returns to step 1.

6. The best individual in the population is optimized by invoking the hill-climbing operator.

A compact transcription of this loop in code is given below.

Figure 3: Description of the VQ algorithm devoted to codebook optimization. This block is usually invoked by the EC algorithm, which provides the initial codebook Y.
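The following skeleton mirrors steps 1-6 of Fig. 2. It is a sketch under assumptions of ours: the operators crossover(), mutate() and vq_optimize() are left abstract (they are detailed in the rest of this section), fitness() is assumed to implement eq. 14 below, villages are plain lists, and the iteration budget and VQ_thr value are placeholders.

```python
# Sketch of the EC-block loop of Fig. 2. All operators are abstract hooks and
# all numeric settings are placeholders.
import random

def ec_block(villages, fitness, crossover, mutate, vq_optimize,
             iterations=3500, vq_thr=0.98):
    for _ in range(iterations):                          # step 5: iteration budget
        v = random.choice(villages)                      # step 1: random village V_r
        ip1, ip2 = random.sample(v, 2)                   #         two random parents
        rest = [i for i in v if i is not ip1 and i is not ip2]
        iw = min(rest, key=fitness)                      #         worst individual I_w
        child = mutate(crossover(ip1, ip2))              # step 2: crossover + mutation
        v[v.index(iw)] = child                           # step 3: replacement of I_w
        best = max((i for vv in villages for i in vv), key=fitness)
        if fitness(child) >= vq_thr * fitness(best):     # step 4: promising individual,
            v[v.index(child)] = vq_optimize(child)       #         invoke hill-climbing
    best = max((i for vv in villages for i in vv), key=fitness)
    return vq_optimize(best)                             # step 6: final optimization
```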
Figure 4: Experimental data (the color scale represents the counts in each bin, blue corresponding to fewer counts) and a visual representation of the best individual in the population after 0 genetic iterations (top panel), 800 genetic iterations (middle panel) and 3500 genetic iterations (bottom panels). The resulting codebook after 3500 iterations is quite satisfactory for the classification process. The bottom-right panel is a zoom on the low-Z clusters after 3500 iterations.

Figure 4 shows an example of individual improvement, adopting a population of N_v = 13 villages, each with N_i = 5 individuals and an overlap of one individual between neighboring villages. 3500 genetic iterations are considered. The best individual in the population after 0 iterations (top panel), 800 iterations (middle panel) and 3500 iterations (bottom panels; the right panel is a zoom of the low-Z region) is shown. Data, represented by points (the color scale represents the counts), are obtained by using a pair of longitudinally stacked silicon detectors (see Section 5 for additional experimental details). The individual is represented by red lines, each corresponding to a given codeword. It is evident that the result after 0 iterations, i.e. the best individual in the randomly generated initial population, produces a highly unsatisfactory classification of the data. After about 800 iterations, the result is visually satisfactory for low-∆E clusters. The solution obtained after 3500 genetic iterations is very close to a good maximum of the fitness function. By visually inspecting the bottom panels of the figure, one can clearly see that all clusters are correctly identified, including a group of poorly populated and high-dispersion clusters located in the upper part of the bi-dimensional distribution. A number of 3500 iterations is found to be largely sufficient to obtain a fully satisfactory codebook for all the cases explored in this paper. The number of iterations to perform could reasonably be slightly adjusted by the experimenter based on the performance of the algorithm in the classification problem explored.

The EC algorithm schematically described by the flow chart in Fig. 2 is focused on the global search for an optimal solution of the clusterization problem. This is done through the improvement of individuals via the implementation of the selection criteria described in Sect. 4.2.1. Because individuals represent solutions of a clusterization problem, they are suitably encoded to represent codebooks of the data to classify. In this sense, one can state that the EC block operates a codebook improvement. However, differently from the local search performed by the VQ block (Sect. 4.3), which is done in the proximity of the input codebook determined by the EC block, here any possible codebook is explored and the search is in this sense global.

As previously stated, one of the major novelties of our approach consists in the use of physical constraints for the derivation of a physically meaningful codebook. We define the adopted codebook in the following way. Let us consider a functional of the form ∆E = f_P(E, Z, A), where the (E, ∆E) pairs ∈ ℝ² represent the coordinates of patterns in the original bi-dimensional distribution to classify, a (Z, A) pair indicates a given ion and P is a set of N_par numerical parameters that constrain the functional. For a given set P̄ = {P̄_0, ..., P̄_{N_par−1}}, the vector (f_P̄(E, Z, A), E) represents the location of a (Z, A) isotope in the (E, ∆E) plane, corresponding to the abscissa E
and given the parameter vector P̄. A different choice of parameters P_i will result in a different location. If Z = Z̄ and A = Ā are fixed, i.e. if a certain ion is considered, the locus of (f_P̄(E, Z̄, Ā), E) points can be treated as the codeword associated to the cluster (Z̄, Ā), P̄ being the functional parameters. Without full mathematical rigor, let us indicate with C the class of all possible (f_P(E, Z, A), E) curves, corresponding to any (Z, A) isotope and any possible choice of parameter set {P_i ∈ ℝ}. A codebook can then be defined by considering the N_C elements of C corresponding to a certain set of parameters P̄ and a certain choice of (Z, A) pairs {(Z̄_0, Ā_0), ..., (Z̄_{N_C−1}, Ā_{N_C−1})}. In this way, different codewords within a codebook differ exclusively by their (Z, A) values, i.e. each codeword is related to a different ion, the P set being fixed. The clusterization problem is therefore equivalent to the search for the optimal set of parameters P and of (Z, A) pairs.

In this framework, the functional f_P(E, Z, A) is any possible parametric functional suitable for the description of (E, ∆E) signals produced by charged particles in longitudinally stacked detectors. Several valuable models are available in the literature, derived from the well-known formalism described in [25]; see for example [21, 22]. To produce the results described in this paper, we adopted the analytical model introduced in Ref. [21]:

$$\Delta E = f_P(E, Z, A) = \left[ (P_0 E)^{P_1 + P_2 + 1} + \left( P_3 Z^{P_4} A^{P_5} \right)^{P_1 + P_2 + 1} + P_6\, Z^2 A^{P_1} (P_0 E)^{P_2} \right]^{1/(P_1 + P_2 + 1)} - P_0 E \qquad (8)$$

where the dependence on Z and A is explicit. For the functional of equation 8, one has N_par = 7.
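The functional of eq. 8, as reconstructed above, translates directly into code. In the sketch below, the parameter values used in the demo are arbitrary placeholders of ours, not fitted values for any real telescope.

```python
# Transcription of the identification functional of eq. 8 (N_par = 7).
# The parameter values in the demo are arbitrary placeholders, not fitted ones.

def f_p(p, e, z, a):
    """Delta E = f_P(E, Z, A) for the 7-parameter model of eq. 8."""
    mu = p[1] + p[2] + 1.0
    term = ((p[0] * e) ** mu
            + (p[3] * z ** p[4] * a ** p[5]) ** mu
            + p[6] * z ** 2 * a ** p[1] * (p[0] * e) ** p[2])
    return term ** (1.0 / mu) - p[0] * e

if __name__ == "__main__":
    p = (1.0, 0.6, 0.4, 3.0, 1.7, 0.6, 0.05)  # placeholder P_0 .. P_6
    for z, a, name in ((1, 1, "1H"), (1, 2, "2H"), (2, 4, "4He")):
        print(name, [round(f_p(p, e, z, a), 3) for e in (10.0, 20.0, 40.0)])
```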
To fulfill the requirements of the physically meaningful codebook described above, we adopted a hybrid float/integer genetic encoding. Figure 5 schematically represents the encoding of an individual. The first N_par floating point values identify a given parameter set for the functional given by eq. 8. Possible (Z, A) pairs identifying physically meaningful isotopes are considered according to the compilation published in Ref. [38] and organized into a database for increasing Z²A values, as suggested by the simplified formula of eq. 1. In this way, larger indexes are associated to (Z, A) pairs whose data lie in a higher ∆E region of the (E, ∆E) plane, and only one numerical index is needed to identify a given isotope. For the sake of clarity, we have introduced the notation n_i to indicate the index corresponding to the i-th cluster, i = 0, ..., N_C − 1.

Figure 5: Genetic encoding of an individual according to the EC algorithm developed in this work. A hybrid float/integer encoding is chosen in order to suitably implement a codebook for the clusterization problem.

Together with the N_par functional parameters, N_C and the set {n_i; i = 0, ..., N_C − 1} are required to complete the encoding of an individual. In this manner, an individual identifies a unique meaningful codebook.

The initial population is generated randomly. The resulting individuals (I_i) are distributed according to the scheme described in Fig. 6. In order to limit the probability of premature convergence of the algorithm and to suitably implement migration, as discussed in Section 3.1, the population is subdivided into N_v villages, each containing an equal number N_i of individuals. The first and last individuals of each village belong simultaneously to two contiguous villages. In this way, as shown in Fig. 6, the first and last villages in the population are also connected. The presence of such overlap regions effectively implements the migration of individuals through different villages. It is important to note that, to avoid premature convergence of the entire population, overlap involves exclusively pairs of neighboring villages.

Figure 6: The distribution of individuals in the population.
The generation process of each individual involves four different steps. (1) A set of functional parameters is produced, each parameter being generated according to a uniform random distribution. For simplicity, parameters are chosen to vary within physically meaningful ranges. The latter can be easily determined by considering the physical meaning of each parameter, as discussed in Ref. [21]. (2) N_C is generated with a uniform integer distribution, with a prefixed minimum value. (3) N_C numerical indexes n_i are picked randomly, without duplicates. (4) The fitness of the newly generated individual is calculated.

The selection mechanism operated by the EC algorithm is based on the fitness function, which we indicate with f_fit, the latter being the objective to maximize. Consequently, a suitable choice of f_fit is crucial to the success of the algorithm. We define three terms that contribute to the fitness function: f_e, related to the total distortion error associated to the codebook identified by the individual, f_n, related to the number of codewords N_C, and f_avg, related to the distortion per pattern within each Z-group of clusters.

The total distortion is obtained through the following equation:

$$e = \sum_{i=0}^{N_P - 1} d(x_i, q(x_i)) \qquad (9)$$

where $d : \mathbb{R}^2 \times C \to \mathbb{R}_0^+$ is the modified distance function:

$$d(x_i, q(x_i)) = |\Delta E_i - f_P(E_i, Z, A)| \qquad (10)$$

x_i = (E_i, ∆E_i) being the i-th pattern and q(x_i) = {(E, f_P(E, Z, A))} the codeword associated to the i-th pattern within the considered codebook. The latter is easily determined by scanning through the codebook until the codeword with the minimum |∆E_i − f_P(E_i, Z, A)| value is found.

By using the formulation of the total distortion introduced by eq. 9, the f_e term can be defined as follows:

$$f_e = \frac{e_{max} - e}{e_{max}} \qquad (11)$$

where e_max is the maximum error expected a priori. By default, e_max is calculated as the mean absolute deviation of the pattern ordinates, $e_{max} = \frac{1}{N_P} \sum_{i=0}^{N_P - 1} |\Delta E_i - \Delta E_{avg}|$.

From the formulation of f_e, it is clear that introducing a larger number of (Z, A) clusters in the codebook represented by the individual usually contributes to decreasing the distortion error e and therefore to increasing f_e. For this reason, if only f_e contributed to f_fit, the algorithm would naturally converge towards individuals with numerous clusters; f_n is required to counteract this phenomenon. We define f_n in the following way:

$$f_n = \frac{n_{max} - N_C}{n_{max}} \qquad (12)$$

where n_max is the maximum allowed number of clusters.

f_avg is introduced in the fitness function to account for the following facts, pointed out in Section 2: (i) the number of patterns within a cluster differs significantly between clusters, (ii) clusters associated to higher Z-values have larger dispersion, and the dispersion is roughly similar between different sub-clusters of a given Z-value. For f_avg one has:

$$f_{avg} = \frac{1}{N_Z} \sum_{Z'} \sum_{A'} \frac{N_P^{(Z',A')}}{D_{(Z',A')}} \qquad (13)$$

where we have indicated with N_Z the number of different Z-values present in the codebook, the sum on Z' is extended to all available Z-values, the sum on A' is extended to all (Z', A') sub-clusters corresponding to the same Z' value, and D_{(Z',A')} is the total distortion (eq. 9) introduced by the cluster (Z', A'), N_P^{(Z',A')} being its total number of patterns. Equation 13 is built from the distortion per pattern of each cluster, averaged over Z-groups. The distortion per pattern is used to account for requirement (i) outlined above, while averaging over all Z-values has a physical meaning because of (ii).

The terms of equations 11, 12 and 13 are combined to form the fitness function f_fit:

$$f_{fit} = 100\, \frac{f_e\, u(f_e) + \alpha_n f_n\, u(f_n) + \alpha_{avg} f_{avg}}{1 + \alpha_n + \alpha_{avg}} \qquad (14)$$

where we introduced the Heaviside step function u(x) to suppress the contributions of solutions with e > e_max or N_C > n_max. α_n and α_avg are used to tune the relative importance of f_n and f_avg, respectively, with respect to f_e. A sketch of the fitness evaluation is given below.
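The following sketch evaluates eqs. 9-14 given per-cluster bookkeeping. It works under simplifying assumptions of ours: the per-cluster pattern counts and distortions are supplied as dictionaries keyed by (Z, A), and the α weights and toy inputs are placeholders.

```python
# Sketch of the fitness of eqs. 9-14. Inputs: per-cluster pattern counts n[(Z,A)]
# and per-cluster total distortions dist[(Z,A)] (eq. 9 restricted to each cluster).
# Alpha weights and toy values are placeholders.

def u(x):
    return 1.0 if x >= 0.0 else 0.0  # Heaviside step, suppresses unphysical terms

def f_fit(n, dist, e_max, n_max, alpha_n=0.3, alpha_avg=0.3):
    e = sum(dist.values())                                   # eq. 9
    f_e = (e_max - e) / e_max                                # eq. 11
    f_n = (n_max - len(n)) / n_max                           # eq. 12
    z_values = {z for z, _ in n}                             # Z-groups in the codebook
    f_avg = sum(n[za] / dist[za] for za in n if dist[za] > 0) / len(z_values)  # eq. 13
    return 100.0 * (f_e * u(f_e) + alpha_n * f_n * u(f_n)
                    + alpha_avg * f_avg) / (1.0 + alpha_n + alpha_avg)  # eq. 14

if __name__ == "__main__":
    counts = {(1, 1): 5000, (1, 2): 1200, (2, 4): 3000}
    distortions = {(1, 1): 40.0, (1, 2): 15.0, (2, 4): 35.0}
    print(f_fit(counts, distortions, e_max=200.0, n_max=30))
```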
Crossover is operated through the algorithm schematically described in Fig. 7, according to the following procedure. Let us consider two individuals, I and I′, I having encoding {P_0, ..., P_{N_par−1}}, N_C, {n_0, ..., n_{N_C−1}} and I′ having encoding {P′_0, ..., P′_{N_par−1}}, N′_C, {n′_0, ..., n′_{N′_C−1}}. In addition, let us indicate with I″, having encoding {P″_0, ..., P″_{N_par−1}}, N″_C, {n″_0, ..., n″_{N″_C−1}}, the result of the crossover of I and I′. Two distinct processes are performed for the functional parameters and for the clusters. Each functional parameter P″_i of the offspring I″ is obtained via a so-called extended average of the analogous parameters from the encodings of I and I′:

$$P''_i = \alpha P_i + (1 - \alpha) P'_i \qquad (15)$$

Figure 7: A schematic description of the crossover between two individuals.

where α is a random number (newly generated for each i = 0, ..., N_par − 1) uniformly extracted in the range α ∈ [−ε_par, ε_par], ε_par being a suitable constant.

A similar procedure is performed for the number of clusters: N″_C = αN_C + (1 − α)N′_C, α being extracted in the range α ∈ [−ε_C, ε_C]. Once N″_C is determined, a superset of indexes n is produced containing all the non-duplicated indexes that best approximate the original I and I′ codewords in the new parameter set {P″_i}. N″_C indexes are then picked randomly from the superset. If the size of the superset is lower than N″_C, the remaining indexes needed to complete the set {n″_i} are extracted randomly from the database constructed using the data of Ref. [38]. In this way, one obtains an individual with the following characteristics: (i) the functional parameters P″_i have intermediate values between those of the parents, but some probability to obtain external values exists, (ii) the number of codewords is close to those of the parents, but some probability to be larger or smaller than that of the parents exists, (iii) the spatial disposition of the resulting codewords is in good geometrical matching with the codewords of the parents I and I′, even if, with some small probability, clusters in regions of the (E, ∆E) plane not covered by I and I′ codewords can be produced. A minimal sketch of the parameter recombination is given below.
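The sketch below implements the extended average of eq. 15 and the cluster-count recombination. The ε values are placeholders of ours (chosen so that offspring values can occasionally fall outside the parents' range, as point (i) above requires), and the index-superset step is simplified with respect to the geometrical matching described in the text.

```python
# Sketch of the extended-average crossover of eq. 15. Epsilon values are
# placeholders; the codeword-index recombination is simplified with respect
# to the superset construction described in the text.
import random

def extended_average(p, p_prime, eps=1.5):
    # eq. 15: alpha is drawn anew for each parameter; since |alpha| may exceed 1,
    # offspring values can fall slightly outside the parents' range.
    out = []
    for pi, qi in zip(p, p_prime):
        alpha = random.uniform(-eps, eps)
        out.append(alpha * pi + (1.0 - alpha) * qi)
    return out

def crossover(parent, parent_prime, database, eps_c=1.2):
    p = extended_average(parent["P"], parent_prime["P"])
    alpha = random.uniform(-eps_c, eps_c)
    n_c = max(2, round(alpha * len(parent["n"]) + (1.0 - alpha) * len(parent_prime["n"])))
    superset = list(dict.fromkeys(parent["n"] + parent_prime["n"]))  # no duplicates
    random.shuffle(superset)
    indexes = superset[:n_c]
    while len(indexes) < n_c:            # complete from the ion database if needed
        k = random.choice(database)
        if k not in indexes:
            indexes.append(k)
    return {"P": p, "n": sorted(indexes)}
```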
Mutation is implemented both for the float and for the integer part of the individual encoding. For the float part, each of the functional parameters is altered with a small probability p_mut^par. For the integer part, the resulting N″_C is altered by one unit (±1) with a small probability p_mut^C.

An interesting mechanism of p_mut^par and p_mut^C variation is introduced to ensure the stability of the genetic diversity, thus helping to counteract the premature convergence of the algorithm. At the beginning, p_mut^par and p_mut^C are at their minimum possible values. This choice allows a fast elimination of particularly bad individuals. To monitor the genetic diversity, the variation range of each parameter is divided into cells. Each cell corresponds to a given choice of a given parameter, within a small interval. A cell is considered full if at least one individual has a parameter contained in the corresponding interval. At each iteration of the EC block, the total number of non-empty cells is calculated. This value is expressed as a percentage of the total number of cells and is used to evaluate the genetic diversity of the population. If the genetic diversity drops below a prefixed threshold, p_mut^par is increased. On the contrary, p_mut^par is suitably decreased if the genetic diversity surpasses a certain threshold. A sketch of this adaptive mechanism is given below.
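A minimal sketch of the diversity monitor follows. The cell granularity, the thresholds and the adjustment factor are placeholders of ours; individuals are assumed to expose their parameter list under the key "P", as in the crossover sketch above.

```python
# Sketch of the genetic-diversity monitor driving the adaptive mutation
# probability. Cell granularity, thresholds and factors are placeholders.

def genetic_diversity(population, ranges, n_cells=100):
    """Percentage of non-empty cells over all parameter axes."""
    filled, total = 0, 0
    for k, (lo, hi) in enumerate(ranges):
        cells = set()
        for ind in population:
            t = (ind["P"][k] - lo) / (hi - lo)
            cells.add(min(n_cells - 1, max(0, int(t * n_cells))))
        filled += len(cells)
        total += n_cells
    return 100.0 * filled / total

def adapt_mutation(p_mut, diversity, low=20.0, high=60.0, factor=1.5):
    if diversity < low:       # diversity dropping: mutate more
        return p_mut * factor
    if diversity > high:      # diversity abundant: mutate less
        return p_mut / factor
    return p_mut
```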
After the process summarized by Fig. 7 is done, an algorithm of smart elimination and insertion of codewords is executed. This deals with removing unnecessary codewords and inserting more useful ones. Unnecessary codewords are identified by calculating the number of patterns in the corresponding cluster: if the latter is lower than a certain minimum size, the codeword is removed from the codebook. New codewords are then inserted in order to restore the original number. The insertion process is executed according to the following steps: (i) a loop through all the codewords is done, starting from a randomly selected codeword; (ii) if a cluster whose distortion per pattern is above the codebook average is found, the indexes related to its lower and upper neighbor clusters are taken into consideration; (iii) if one of those is not present in the codebook, it is introduced. If after this procedure the number of clusters is lower than the number of clusters before the smart elimination, indexes are picked randomly from the database of ions (without duplicates) until the required number is reached.

Figure 8: An example of crossover (c) between the individuals (a) and (b). Experimental data are represented by gray dots. The codebook encoded by each individual is graphically represented by means of red dashed lines. Each panel shows one individual. The resulting individual (c) contains clusters whose geometrical disposition matches both some (a) and some (b) codewords.

Figure 8 shows a graphical example of crossover between two individuals. In the figure, experimental data are represented by gray dots, while the codebooks encoded by the individuals are represented by dashed red lines. Data are obtained by using two longitudinally stacked silicon detectors, as described in Section 5. Individuals are indicated with labels (a), (b) and (c), one panel per individual: (a) left panel, (b) center panel, (c) right panel. (c) represents the result of the crossover between (a) and (b). In this example, (a) covers only the lower region of the (E, ∆E) plane, while, on the contrary, (b) contains only clusters geometrically located in the upper part of the plane, resulting in a highly unsatisfactory classification for low-Z clusters. In addition, a number of clearly unnecessary clusters is present, especially in the codebook identified by individual (b). The resulting crossover (c), obtained with the prescriptions discussed above by combining the parents (a) and (b), contains codewords whose geometrical disposition suitably covers the entire range of data. In addition, the number of clusters that are unnecessary for the classification process is reduced, as a result of the smart elimination and insertion process.

By carefully inspecting the bottom panels of Fig. 4, one can observe a number of codewords not in fully satisfactory agreement with the observed clusters. This is due to the fact that a large number of iterations is required for an EC algorithm to fully converge to an absolute maximum of the fitness function, and the performance is consequently poorer when approaching extremely high fitness individuals. To speed up the optimization of the codebook, the VQ block is used as a hill-climbing operator for the EC block.

Codebook optimization is operated through the algorithm schematically described in Fig. 3. After the initial codebook Y_0 is calculated (output of the EC block), the following optimization steps are iterated:

1. Iteration m begins. The Voronoi partition S_m = P(Y_m) of the input data is computed (NNC, Sect. 4.3.1), Y_m being the codebook produced by the (m − 1)-th iteration (Y_0 is the input codebook).
2. The total distortion D_m associated to the S_m partition is calculated (Section 4.3.2).
3. If (D_{m−1} − D_m)/D_m < ε, ε being a given prefixed value, then Y_m is used as the final codebook and the optimization process ends.
4. A new codebook Y_{m+1} is calculated (CC, Section 4.3.3) starting from the partition determined in step 1.
5. The algorithm returns to step 1.

4.3.1. NNC

The NNC consists in the calculation of the Voronoi partition P(Y) = {S̄_0, ..., S̄_{N_C−1}} according to the definition of eq. 6. The adopted distance function is that of eq. 10.

4.3.2. MQE

The MQE is defined by eq.
4, D_i, the total distortion within a cell, being calculated according to eq. 5 and by using the modified distance function defined by eq. 10.

4.3.3. CC

Let us assume that a codebook Ȳ is defined by the parameter set P̄ = {P̄_0, ..., P̄_{N_par−1}}, a number of codewords N̄_C and the set of physics classes {(Z̄_0, Ā_0), ..., (Z̄_{N̄_C−1}, Ā_{N̄_C−1})}. If one considers an input partition S̄ = {S̄_0, ..., S̄_{N̄_C−1}}, where S̄_i is the cell associated to the physics class (Z̄_i, Ā_i), the CC consists in a suitable variation of each individual parameter P_i (i = 0, ..., N_par − 1) in order to minimize the MQE associated to Ȳ. It is important to note that, according to the scheme introduced in this paper, a variation of any parameter P̄_i will affect the absolute position of all codewords in the codebook, in a physically meaningful way. To this end, such MQE minimization corresponds to the search for the minimum of the total distortion function in the parameter space.

The function of the total distortion error in the parameter space can be defined as follows:

$$E(P_0, \ldots, P_{N_{par}-1}) = \sum_{i=0}^{N_C - 1} \sum_{j=0}^{N_P^i - 1} |\Delta E_{ij} - f_P(E_{ij}, Z_i, A_i)| \qquad (16)$$

where the sum on i is extended to all the cells in the input partition, the sum on j runs over all the N_P^i patterns in the i-th cell, and (Z_i, A_i) is the physics class associated to the i-th cell. The dependence on the parameters {P_i} is implicit in the definition of f_P, as shown, for example, for the model defined by eq. 8. The CC consists in the minimization of the function E(P_0, ..., P_{N_par−1}). To this end, a gradient descent technique is used [37].

The mathematical expression of the gradient of E(P_0, ..., P_{N_par−1}) in the parameter space can be easily derived from eq. 16:

$$\nabla E(P_0, \ldots, P_{N_{par}-1}) = \left( \frac{\partial E}{\partial P_0}, \ldots, \frac{\partial E}{\partial P_{N_{par}-1}} \right) \qquad (17)$$

and involves the partial derivatives of f_P with respect to P_k, ∂f_P/∂P_k. The latter can be computed analytically for a functional f like the one of eq. 8, but numerical approximations can also be used if more complex functionals are adopted. The vector defined by eq. 17 represents the direction, in the parameter space, of maximum increase of E. The direction identified by −∇E can therefore be used to vary the parameter vector P towards the maximum decrease of E.

The minimization of E(P_0, ..., P_{N_par−1}) can thus be executed through the following steps:

1. A vector η = {η_0, ..., η_{N_par−1}} is defined, the η_i being suitably small numbers. The latter usually depend on the amplitude of the error function gradient. The use of non-optimal initial η_i values will not affect the accuracy of the results but exclusively the number of iterations required for the minimization of E.
2. E_prev = E(P_0, ..., P_{N_par−1}) is calculated and stored.
3. A new set of parameters is obtained starting from the initial parameter vector P and moving opposite to the direction of the gradient: P_i^new = P_i − η_i (∇E(P_0, ..., P_{N_par−1}))_i.
4. E_new = E(P_0^new, ..., P_{N_par−1}^new) is calculated and stored.
5. If E_new > E_prev, then all η_i are multiplied by a suitably small factor and the algorithm returns to step 3.
6. The gradient in the new parameter set, ∇E(P_0^new, ..., P_{N_par-1}^new), is calculated. If (∇E(P_0^new, ..., P_{N_par-1}^new))_i (∇E(P_0, ..., P_{N_par-1}))_i > 0, i.e. if the i-th gradient component has kept its sign, then η_i is multiplied by a factor greater than 1; otherwise it is multiplied by a factor smaller than 1.
7. The parameters are updated to the new values P_i = P_i^new.
8. Steps 2 to 7 are iterated until the condition (E_prev - E_new)/E_new < ε_CC is verified, being ε_CC a prefixed value.
9. The minimum of E(P_0, ..., P_{N_par-1}) is found in correspondence of the parameter set {P_0, ..., P_{N_par-1}} and the CC is terminated.
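The adaptive step-size scheme of steps 1-9 can be sketched as follows. This is a minimal sketch, not the authors' implementation: the callables E and grad_E (eqs. 16 and 17) are assumed to be provided, and the adaptation factors up and down are placeholders, since the numerical values quoted in the text are model-dependent.

import numpy as np

def cc_minimize(E, grad_E, P, eta, eps_cc=1e-6, up=1.1, down=0.5):
    # Gradient descent with one adaptive step size (eta_i) per parameter.
    # E: total distortion of eq. 16; grad_E: its gradient, eq. 17.
    while True:
        E_prev = E(P)                              # step 2
        g = grad_E(P)
        P_new = P - eta * g                        # step 3
        E_new = E(P_new)                           # step 4
        while E_new > E_prev:                      # step 5: shrink all steps
            eta = eta * down
            P_new = P - eta * g
            E_new = E(P_new)
        g_new = grad_E(P_new)                      # step 6: sign-based adaptation
        eta = np.where(g_new * g > 0, eta * up, eta * down)
        P = P_new                                  # step 7
        if (E_prev - E_new) / E_new < eps_cc:      # step 8: convergence test
            return P                               # step 9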
5. Test of the algorithm with experimental data
To probe the capabilities of the C-EC algorithm in the classification of nuclear physics data, we considered experimental data recorded by two longitudinally stacked layers of silicon. The experiment was performed at the TRIUMF laboratory in Vancouver (Canada). An accelerated Li beam was delivered on a LiF target; the collisions of the beam with the Li and F nuclei of the target were investigated, resulting in the production of several ions, especially in the range 1 ≤ Z ≤ 5. The detection apparatus consisted of pairs of semiconducting (silicon) detectors, each segmented into individual detection units, every pair of units providing an independent bi-dimensional distribution to classify. The experiment was aimed at the investigation of the production of resonances in light neutron-rich nuclei.

Figure 9: Result of the classification of data collected in the TRIUMF experiment for a typical detector. Data are represented by dots, whose color reflects the number of counts in a given bin. The deduced codebook used for the classification (red lines represent the position of each codeword) is produced after the genetic iterations and the VQ algorithm optimization process. The insert shows a zoom of the low Z region.

The technique was based on the exploration of the invariant mass of ions emitted in resonance disintegration events. In similar studies, which are widely used in experiments focused on the discovery of new resonances, a high-precision data classification is critical. As an example, the experiment explored the existence of new highly-excited states of Be and their decay in channels involving the emission of clusters such as He+He (in two distinct isotopic configurations), t+Li or p+Li. Such information, and especially the capability of distinguishing among the various emission channels, is relevant to fully understand the production of clustered states in nuclei [39, 40] and the possible formation of nuclear molecules [41]. Despite the relatively low complexity of the apparatus, our experiment represents a good benchmark for automatic classification methods because of the requirement of a highly precise identification of a variety of ions, including both Z and A values.

Figure 9 shows the result of the C-EC processing of a typical detector case after the genetic iterations and the local search operated by the VQ block. If compared to the bottom panels of Figure 4, where the best codebook after the genetic iterations is shown, one can semi-quantitatively observe that the various codewords approximate the observed clusters significantly better than the pure EC individual improvement. This is a result of the codebook optimization procedure. The result is extremely satisfactory for all clusters. The insert in the figure shows a zoom of the low Z region. Even if a deviation in the high E part of the clusters is present, reflecting ions that have sufficient kinetic energy to punch through the second detection layer, resulting in an inversion of the cluster towards low ∆E and E values, the low Z codewords are in fully satisfactory agreement with the clusters corresponding to the 1H, 2H and 3H isotopes. A similar deviation is present in many experiments in this energy domain, and the corresponding data populate a region that is undesired for the classification process. Furthermore, none of the identified codewords is associated to regions populated by the background counts present in the spectrum.
This is a particularly relevant result, as it shows that the algorithm is almost insensitive to noise.

The plot of Figure 9 allows a visual, semi-quantitative inspection of the produced codebook but does not allow a completely quantitative analysis of the quality of the classification. The classification is quantitatively probed through a procedure similar to the one proposed in Ref. [22]. The data shown in Figure 9 are initially subdivided into groups corresponding to their identified Z-value. This is done by considering their proximity to the groups of codewords associated to each Z-value: if a (∆Ē, Ē) point is closer to the group of lines associated to a certain Z̄, then (∆Ē, Ē) is classified as Z = Z̄. Considering all data classified in a given Z̄-group, each point is then associated to the closest possible A for that particular Z̄-value, A = Ā. To quantify the quality of the classification, the normalized distance of the point from the Ā codeword is calculated. The distance is normalized in such a way that the neighbor codeword (A = Ā + 1) has a distance of 1 from the Ā-codeword. In this way, a point is classified as (Z̄, Ā) if it has a normalized distance lower than 0.5 from the (Z̄, Ā) codeword.
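The quality check just described can be sketched in code. The following Python fragment is illustrative only, not the authors' implementation: each codeword line is assumed to be available as an array of sampled (E, ∆E) points, the normalization follows our reading of the procedure above, and all names are hypothetical.

import numpy as np

def nearest(pts, p):
    # Index and distance of the sampled codeword point closest to p.
    d = np.linalg.norm(pts - p, axis=1)
    i = int(np.argmin(d))
    return i, d[i]

def classify(p, codebook):
    # codebook: dict {(Z, A): array of (E, dE) points sampled along the codeword}.
    # Returns the assigned (Z, A) class and the normalized distance of p from it.
    dmin = {za: nearest(pts, p)[1] for za, pts in codebook.items()}
    # Z identification: nearest group of codewords sharing the same Z-value.
    z = min({za[0] for za in dmin},
            key=lambda zz: min(d for za, d in dmin.items() if za[0] == zz))
    # A identification: nearest codeword within the selected Z group.
    a = min((za[1] for za in dmin if za[0] == z), key=lambda aa: dmin[(z, aa)])
    # Normalize so that the neighboring codeword (A +/- 1) lies at distance 1.
    q = codebook[(z, a)][nearest(codebook[(z, a)], p)[0]]
    nn = [za for za in codebook if za[0] == z and abs(za[1] - a) == 1]
    if not nn:
        return (z, a), 0.0
    d_ref = min(nearest(codebook[za], q)[1] for za in nn)
    return (z, a), dmin[(z, a)] / d_ref

Histogramming the returned normalized distances, one panel per identified Z, reproduces the kind of plot discussed below.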
In Figure 10 we report the normalized distance distributions for data classified as Z = 1, 2, 3, 4, 5, respectively, from left to right and from top to bottom. Several peaks are present, associated to the different identified isotopes from H to B. The peaks associated to 1H, 2H and 3H are well visible, even if broader, as a result of the unavoidable punch-through effect discussed above. Very interestingly, unphysical classifications are not present. As an example, no peak is visible at A = 5 for Z = 2, as the corresponding 5He is an unbound nucleus. Similarly, as expected from a meaningful classification, the peaks of unbound Li isotopes and of 8Be are also missing. The latter is characterized by an extremely reduced lifetime (of the order of 10^-16 seconds) and therefore decays into lighter fragments before reaching the detectors.

Figure 10: Quantitative analysis of the quality of the classification obtained from the codebook shown in Fig. 9. We adopted a procedure similar to the one proposed in Ref. [22]. Data are subdivided into different panels, each corresponding to an identified Z-value. The peaks in the panels correspond to each identified A-value associated to a given Z and are identified by labels.

Figure 11: Analysis of the classification results, with the procedure suggested in Ref. [22], for a particularly noisy case (solid line) and a detector characterized by the presence of a reduced number of clusters (dashed red line). The first is a detector placed close to the beam line; in this case, larger statistics of particles in a broader energy and Z domain are expected. Data for the few clusters case are collected by a detector placed at a larger distance from the beam line; a lower rate of ions in a narrower Z domain is expected in this case. The obtained classification is highly satisfactory in both cases.

Extremely satisfactory classifications have been obtained, consistently, for all detectors explored in our benchmark. Figure 11 shows the obtained classifications, according to the method proposed in Ref. [22], for a particularly noisy detector (solid lines in the panels of Fig. 11) and for a detector characterized by a reduced number of clusters (dashed red lines in Fig. 11). The differences between the detectors arise essentially from their geometrical disposition: the noisy detector was placed very close to the beam line and is therefore subject to a larger rate of impinging ions with broader energy distributions, causing the presence of undesired noise, while the other was placed at a larger distance from the beam line, being therefore characterized by a lower rate of incident ions in a narrower Z domain. As clearly observed, meaningful physics classes are correctly identified in both cases. In the top left panel (Z = 1), the three peaks associated to 1H, 2H and 3H are present, but they are broader for the noisy detector, reflecting a larger amount of data in the region corresponding to ion punch-through. The peak corresponding to 3He in the top right panel (Z = 2) is partially merged with the neighboring 4He peak (populated with the highest statistics) in the case of the noisy detector, while the 3He-4He separation is excellent for the few clusters case. The peak of the neutron-rich He isotope is extremely weakly populated in the few clusters case, reflecting its low production statistics for the emission direction covered by the detector. A more populated peak is instead observed for the noisy case, where the production statistics of a larger number of ions is quite significant. Very interestingly, as expected, the Z range is more limited for the detector placed at a larger distance from the beam line and, in particular, no clusters are identified as Z = 5. The two detectors also differ in the Z = 4 classes: one can observe five significant Be peaks for the detector placed close to the beam line, while only two Be isotopes are significantly identified for the one distant from the beam line.

The analysis discussed in this section comprises a set of independent classification problems, one for each pair of detection units. The CPU time requested to complete the whole task amounted to a matter of seconds on a commercial Intel i7-9700K (8 cores) processor at a base frequency of 3.6 GHz. This is a satisfactory result, which testifies that the algorithm is suitable for the analysis of data produced by large apparatuses (where the number of individual classification problems is usually between several hundreds and a few thousands).

6. Comparison with other approaches
Let us summarize the capabilities of the C-EC algorithm: (i) no a priori information is required; the number of clusters to classify and the relevant (Z, A) values are obtained through an individual improvement algorithm based on evolutionary criteria; (ii) after a suitable codebook optimization, the solution of the clusterization problem is readily usable for data classification, with an explicit link to physically meaningful classes; (iii) the algorithm is robust to noise. This section is dedicated to a detailed comparison between the C-EC algorithm and previous approaches published in the literature [15-20, 22, 23].

The work published in Ref. [15] is probably the first example of classification of nuclear physics data recorded by stacks of detectors via unsupervised learning approaches. It is based on the visual analysis of the bi-dimensional assembly of data via sophisticated image segmentation techniques. This approach has been used exclusively in cases where only a Z-classification is meaningful. The major limitation of the approach consists in the need for a priori information, such as the slope of the clusters and their inter-distances. Even if the authors show how to systematically calculate these quantities, within a certain accuracy, on the basis of physical considerations, the method is effective only for a reduced class of detectors for which a reliable energy calibration is possible without assumptions on the data classification.

Ref. [17] is concerned with the use of pre-attentive neural systems. Performances are particularly good for groups of clusters characterized by similar inter-distances and dispersions, e.g. in the high Z region, where the resolution is not sufficient to distinguish the sub-clusters associated to different A-values. The effectiveness of the approach in the low Z region, where one can expect to observe a separation of the clusters due to the A-value, is reduced as a result of the variety of dispersions and inter-distances between clusters. The link to physically meaningful classes relies on the definition of the Z-value of the last identified cluster.

Work        Approach                          Sub-clusters   Z-range       Low-energy class.   No a priori info.   Link to (Z, A) class.
Ref. [15]   image segmentation                no             full          yes                 no                  from a priori info
Ref. [17]   pre-attentive neural system       no             only high Z   yes                 yes                 no
Ref. [18]   spatial density data processing   no             full          no                  yes                 no
Ref. [19]   data slicing                      yes            full          yes                 yes                 no
Ref. [16]   fuzzy c-means                     --             --            yes                 yes                 no
Ref. [20]   artificial neural networks        yes            full          yes                 no                  from a priori info
Ref. [22]   modeling of patterns              yes            full          yes                 no                  from a priori info
Ref. [23]   data slicing                      yes            full          yes                 no                  from a priori info
This work   EC and VQ                         yes            full          yes                 yes                 yes

Table 1: Summary of the comparison with other approaches for nuclear physics data classification.
A spatial density data processing approach is reported in Ref. [18]. The procedure is based on the identification of points along the clusters and their subsequent linearization. Performances are strongly reduced in the low-energy part of the data, i.e. at low E when looking at Fig. 1, because of the rapid change in the slope of the clusters. In addition, the sub-clusters associated to A-values are not identified. The procedure is particularly powerful for online data analysis, where a reduced body of information is typically sufficient, for example to monitor the stability of large detectors.

More recently, Ref. [19] has proposed a classification method based on a simple data slicing procedure. The approach is capable of identifying both Z-clusters and A-clusters, but the association of the extracted clusters to physically meaningful (Z, A) classes is left to the operator.

Fuzzy clustering methods derived from the c-means were adopted in Ref. [16]. However, the approach is restricted exclusively to cases comprising few classes.

Artificial neural networks were adopted in Ref. [20] to approximate the spatial disposition of the physics clusters in bi-dimensional distributions of data. The underlying mechanism is that of a supervised learning approach, where a number of patterns are manually extracted by the operator for each (Z, A) class to then feed the training process of the neural network. Even if the resulting clusters are directly linked to (Z, A) pairs, the whole procedure heavily relies on human supervision.

In a similar way with respect to Ref. [20], the authors of Ref. [22] adopted a supervised learning procedure where the classification capabilities rely on a number of manually extracted patterns. Differently from Ref. [20], a formal model of the data with an explicit (Z, A) dependence is used, thus allowing the extraction of patterns for only a reduced number of classes; the results are then extended to all possible (Z, A) via model extrapolation. The procedure critically relies on the quality of the information extracted by the operator.

Finally, a new approach based on data slicing was very recently proposed in Ref. [23]. The procedure combines peak finding algorithms with information that needs to be manually extracted by an operator. The mechanism is similar to that proposed in Ref. [22], with the major difference that the patterns needed to constrain the model are extracted automatically from an initial piece of information given by the operator. The advantage of this method with respect to [22] consists therefore in the strongly reduced information required by the algorithm, thus significantly minimizing human supervision. Even if an explicit link to (Z, A) classes is provided, the result of the classification is affected by the initial information provided by the operator, and an a priori visual inspection of the data distributions is always required.

The comparisons are summarized in Table 1. As one can easily observe, our approach is the only one where an explicit link between the identified clusters and the (Z, A) values is established in a fully automatic way. The vast majority of the other approaches are exclusively concerned with the extraction of clusters, and human supervision is consequently required to associate them to physically meaningful classes. A link to Z-classes is provided by the algorithm of Ref. [15], but the procedure requires the use of a priori information. It is important to stress that an automatic association is usually a non-trivial task, especially when an A-classification is required.
In such cases, as observed in the example shown in Fig. 11, for a given Z not all A-sub-clusters are populated, and differences can also be observed detector-by-detector. Differently from our approach, the methods of Refs. [15, 17, 18] were not concerned with A-cluster extraction. This is probably because the algorithms of Refs. [15, 17, 18] were developed for the previous generation of detectors, where an A-classification was not a crucial requirement. The only algorithms concerned with an explicit link to the Z and A classification are those of Refs. [20, 22, 23]. In all these cases, the quality of the classification crucially relies on the capability of the operator to manually extract the required information via a visual inspection of the distribution of data.
7. Conclusions and perspectives
The classification of data in nuclear physics experiments is key to extracting the required physics information. Modern experiments have been largely focused on obtaining a high-quality classification over a number of physically meaningful classes. In charged particle experiments at low and intermediate incident energies, the classification problem is that of identifying the charge and mass of the ions produced in nucleus-nucleus collisions. This task is particularly repetitive and time consuming, as it usually requires the supervision of an operator who visually inspects and extracts information from bi-dimensional assemblies of data.

In this framework, we developed a fully automatic algorithm for data classification in nuclear physics experiments by combining Evolutionary Computing and Vector Quantization. In our approach, a two-level search for the individual with the maximum fitness value is operated. The EC block represents the upper level and deals with a global search extended to all possible individuals, applying suitable evolutionary criteria based on the Darwinian scheme. Once an individual close to a maximum of the fitness function is determined, a hill-climbing operator is invoked to perform a fast local search of the maximum. The latter consists in a modified LBG algorithm that performs a codebook optimization procedure, the number of codewords and their physical meaning being kept fixed. Our procedure is innovative as it combines unsupervised learning approaches with suitable physics constraints. The resulting solutions are in the form of a codebook, where the link between codewords and physics classes is explicit, with nearly-zero effort required from the experimenter. Furthermore, no a priori information is used.

The newly developed algorithm is benchmarked against experimental data obtained by pairs of longitudinally stacked silicon detectors. A satisfactory classification is obtained for all explored cases, including cases with a reduced number of clusters or characterized by noise.

With respect to previously published approaches, our method offers the advantage of an explicit link between the extracted clusters and physically meaningful classes, thus significantly simplifying the subsequent analysis of data. The procedure does not rely on any a priori information. This makes the resulting procedure fully automatic, and a minimal human supervision is only required to inspect the result of the classification. In addition, C-EC is particularly suitable to be integrated in the online and offline analysis of nuclear physics experiments at low and intermediate incident energies, thus allowing a significant reduction of the time required for the analysis of data, especially in large acceptance arrays.