Multiobjective optimization of the dynamic aperture for SLS 2.0 using surrogate models based on artificial neural networks
Marija Kranjcevic, Bernard Riemann, Andreas Adelmann, Andreas Streun
Paul Scherrer Institut, CH-5232 Villigen PSI, Switzerland
Modern synchrotron light source storage rings, such as the Swiss Light Source upgrade (SLS 2.0), use multi-bend achromats in their arc segments to achieve unprecedented brilliance. This performance comes at the cost of increased focusing requirements, which in turn require stronger sextupole and higher-order multipole fields for compensation and lead to a considerable decrease in the dynamic aperture and/or energy acceptance. In this paper, to increase these two quantities, a multiobjective genetic algorithm (MOGA) is combined with a modified version of the well-known tracking code tracy. As a first approach, a massively parallel implementation of a MOGA is used. Compared to a manually obtained solution this approach yields very good results, but it requires a long computation time. As a second approach, a surrogate model based on artificial neural networks is used in the optimization. This improves the computation time, but the solution quality deteriorates. As a third approach, the surrogate model is re-trained during the optimization. This ensures a solution quality comparable to the one obtained with the first approach while also providing an order of magnitude speedup. Finally, good candidate solutions for SLS 2.0 are shown and further analyzed.
I. INTRODUCTION
The upgrade of the Swiss Light Source, called SLS 2.0, is scheduled for 2023–24. To increase the brilliance, the current 3rd generation storage ring will be replaced by one employing seven-bend achromats, including reverse bends and longitudinal gradient bends [1]. The stronger focusing requirements need higher sextupole and higher-order multipole fields for chromatic compensation. This makes finding a reasonably large dynamic aperture (DA) for injection and an energy acceptance for a sufficient beam lifetime more challenging and more important. It can either be done indirectly, by computing and minimizing the dominant resonance driving terms [2], or directly, by computing and maximizing the DA and energy acceptance [3, 4].

In this work the latter approach is used and a constrained multiobjective optimization problem is formulated (section II). The search space comprises the strengths of the sextupole families, as well as the horizontal and vertical linear chromaticity. Similarly to the approach in [3], the objective functions are defined to maximize the transverse DAs at three different energies and to prevent tune resonances from being crossed, thus maximizing the energy acceptance and beam lifetime. These figures of merit are computed using direct particle tracking with a modified version of the well-known tracking code tracy [5].

Out of the many multiobjective optimization algorithms, particle swarm optimization [6], differential evolution [7, 8] and multiobjective genetic algorithms [3, 9–14] have already been successfully applied to the problem of optimizing the DA.
In this work a multiobjective genetic algorithm (MOGA) is chosen and further extended with constraint-handling methods (section III). Previous work includes approaches that speed up the convergence of the multi-generation optimization method by, e.g., preselecting points that are likely to be good using k-means clustering [4] or a surrogate model [15–17], i.e., an approximation model which captures the significant properties of a given simulation model and is also very cheap to evaluate. In this work an artificial neural network (ANN) surrogate model is used for the optimization, in combination with a MOGA, similarly to [18]. Additionally, the solution quality is improved by re-training the surrogate model during the optimization.

First, the ANN surrogate model is built to approximate the objective functions (section IV). In particular, good hyperparameters are determined and the surrogate model quality is shown. The surrogate model is then used for optimization and a new re-training procedure is devised (section V). The run time and the solution quality of the new approach are compared with those of a massively parallel implementation of a MOGA coupled with tracy. The solution quality is determined in comparison with an existing manually obtained solution.

Finally, good candidate solutions are shown and further analyzed (section VI). In particular, the transverse DAs at the three considered energies are shown and compared with those of the manually obtained solution.

II. OPTIMIZATION PROBLEM

A. Dynamic aperture (DA)
The DA can be loosely defined as an area in the transverse phase space in which stable particle motion can occur. To quantify the size of the DA, the approach from [3] together with the modifications from [9] is adopted. As the DA area depends on the local linear optics, given by the Twiss parameters α and β at the starting location of the particle tracking, the DA coordinates r and θ in Floquet space are mapped to the coordinates for tracking via

(x, x')ᵀ = ( √β_x , −α_x/√β_x )ᵀ r cos θ_k ,   (y, y')ᵀ = ( √β_y , −α_y/√β_y )ᵀ r sin θ_k .  (1)

The particle trajectories along 2K rays in the (x, y) Floquet space starting at the origin are considered. The angles between these rays and the x axis are

θ_k = kπ/K  for k ∈ {0, …, 2K − 1}.  (2)

Since particles get lost on the vacuum chamber walls, it is reasonable to assume that a realistic DA will not exceed the aperture at the reference energy that would exist if all sextupoles and higher-order magnets were turned off. This assumed upper limit is referred to as the linear aperture, and the length it spans on the k-th ray is denoted by L̄(θ_k). Similarly, the length that the DA at a relative energy offset δ spans on this ray is denoted by

L(θ_k, δ).  (3)

In order not to reward cases with L(θ_k, δ) > L̄(θ_k), the line objective is defined as

f_{k,δ} = max{ 0, L̄(θ_k) − L(θ_k, δ) } / L̄(θ_k).  (4)

Both L̄(θ_k) and L(θ_k, δ) are computed using the biased binary search as presented in [9]. For L̄(θ_k) the dimensionless initial length in Floquet space is set to a sufficiently large reference radius of 0.01. The initial radius for L(θ_k, δ) is then set to L̄(θ_k). The DA objective for a given relative energy offset δ is defined as

DA_δ = (1/(2K)) Σ_{k=0}^{2K−1} f_{k,δ}.  (5)

In a flat lattice the aperture area is vertically symmetric, so (5) becomes

DA_δ = (1/(2K)) [ f_{0,δ} + f_{K,δ} + 2 Σ_{k=1}^{K−1} f_{k,δ} ].  (6)

Due to the normalizations in Eqs. (4) and (5), the DA objective is always in [0, 1]. The three DA objectives considered in this work are

DA_{−δ_1}, DA_0 and DA_{δ_1},  (7)

where δ_1 = 0.05.

B. Crossing tune resonances
The DA objectives in Eq. (7) are computed to ensure a sufficiently large aperture region in phase space. Unfortunately, the binary search used to compute the line objectives in Eq. (4) is sufficient only when the aperture region is simply connected. This is not always the case, especially when particles cross tune resonances. Therefore, additional requirements that take into account the crossing of tune resonances need to be defined. For this, the tune

ν⃗(x, y, δ) = ( ν_x(x, y, δ), ν_y(x, y, δ) )ᵀ  (8)

is considered as a function of the initial positions x, y in transverse Floquet space and the relative energy deviation δ.
1. Chromatic tune footprint
For a sufficient energy acceptance it is beneficial to constrain the variation of the tunes so that no low-order resonances are crossed. In particular, in this paper the tune footprint is constrained to lie inside the triangle formed by three intersecting 2nd order resonances around the on-momentum tune. In the case of the considered SLS 2.0 lattice, the vertices of this triangle (see Fig. 6, red lines and part of the x axis) are

(39, …), (39.…, …) and (39.…, …).  (9)

To prevent particles from getting lost on resonance stopbands, a margin of 0.025 around the resonance lines is used in this work (see Fig. 6, inner black triangle).

The tune footprint is approximately computed by sampling the energy-dependent tunes ν_x and ν_y at the energy offsets

δ_p = p · δ_max / P,  p = −P, …, P.  (10)

There are cases for which the particle motion is unstable and the tunes cannot be computed. Denoting by g(ν⃗) the squared Euclidean distance of ν⃗ from the aforementioned triangle (see Eq. (9)), and taking into account that the on-momentum tune ν⃗(0, 0, 0) is known and that

g(ν⃗(0, 0, 0)) = 0,  (11)

the tune footprint distance is defined as

ctfp = Σ_{p ≠ 0, computable} g(ν⃗(0, 0, δ_p)).  (12)

Furthermore, as in [3], the objective

unstable_± = 1 − |δ_{u,±}| / δ_max  (13)

is defined, with δ_{u,+} and δ_{u,−} denoting the first (i.e., smallest in magnitude) positive and negative values, respectively, for which the tunes are located outside the triangle or not computable.

The value of P needs to be high enough to achieve a sufficient resolution in tune space without superfluous computational overhead; in this paper P = 25 is used. Furthermore, δ_max = 0.05 is used.
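As an illustration, the distance function g and the sum in Eq. (12) can be sketched in plain Python. The triangle below is a placeholder with made-up vertices, not the actual resonance triangle of Eq. (9), and all names are illustrative rather than taken from the authors' code:

```python
# Sketch of the tune footprint distance of Eq. (12): g() is the squared
# Euclidean distance of a tune from the resonance triangle (zero inside it,
# Eq. (11)), and ctfp sums g over the computable off-momentum tunes.

def _seg_dist2(p, a, b):
    """Squared distance from point p to the segment a-b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))                # clamp to the segment
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2

def _inside(p, tri):
    """Point-in-triangle test via consistent edge orientations."""
    def cross(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    s = [cross(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(x >= 0 for x in s) or all(x <= 0 for x in s)

def g(nu, tri):
    """Squared distance of the tune nu from the triangle (zero inside)."""
    if _inside(nu, tri):
        return 0.0
    return min(_seg_dist2(nu, tri[i], tri[(i + 1) % 3]) for i in range(3))

def ctfp(tunes, tri):
    """Eq. (12): sum of g over the computable (non-None) tunes."""
    return sum(g(nu, tri) for nu in tunes if nu is not None)

tri = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]   # placeholder vertices
print(g((0.5, 0.2), tri))    # 0.0, tune inside the triangle
print(g((0.5, -1.0), tri))   # 1.0, one unit below the bottom edge
```

The same g is reused for the amplitude-dependent distance of Eq. (16), with the tunes evaluated at different amplitudes instead of different energy offsets.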
2. Amplitude-dependent tune shifts (ADTS)
The betatron oscillation is nonlinear and thus anharmonic. Therefore, a number of different amplitudes have to be considered. In this work, to achieve a sufficient resolution, Q = 20 equidistant points are taken on each of the following two line segments in the transverse Floquet plane: the horizontal line segment (see the sentence containing Eq. (3) for the definition of L(·, ·))

{ (t, ∆) | t ∈ ]0, L(0, 0)] }  (14)

and the vertical line segment

{ (∆, t) | t ∈ ]0, L(π/2, 0)] }.  (15)

At these points (with δ = 0) the tunes are computed as the fundamental frequencies of turn-by-turn data in each plane. To compute these frequencies, the FFT of 128 tracked turns with zero padding to 512 samples is used. To excite both oscillation modes, an offset ∆ = 10^{−…} is used, which shifts these line segments away from the original rays of the DA computation.

As in the case of the tune footprint distance ctfp in Eq. (12), the squared Euclidean distance of the tune at these points from the triangle formed by the 2nd order resonances (see Eq. (9)) is subsumed into the amplitude-dependent tune footprint distance

adts = Σ_{q=1, computable}^{Q} g(ν⃗(x_q, ∆, 0)) + Σ_{q=1, computable}^{Q} g(ν⃗(∆, y_q, 0)).  (16)

Here 'computable' refers to the tracked particle not being lost in 512 turns.

C. Search space
Sextupoles are mainly used to compensate chromaticity, but they also limit the on-momentum transverse DA. To have the possibility to extend the DA limits, more than two sextupole families are used. The strengths of two of these families are subsumed into a vector of tuning sextupole strengths t⃗ = (t_1, t_2), whose linear relationship with chromaticity is quantified by a matrix T. The sextupole strengths of the remaining families are grouped into a vector κ⃗, and their influence on the (linear) chromaticity ξ⃗ = (ξ_x, ξ_y) is characterized by a matrix M, so that

ξ⃗ = M κ⃗ + T t⃗ + ξ⃗_ua,  (17)

where ξ⃗_ua is the chromaticity of the unaltered lattice. By magnet design, the applicable sextupole strength is limited to some interval [−κ_max, κ_max].

In addition to the sextupole strengths κ⃗, the chromaticity is also taken to be a part of the search space. To prevent the head-tail instability, it must be non-negative. On the other hand, the upper limit ξ_max can be adjusted. To sum up, a design point in the search space is

d⃗ = (ξ_x, ξ_y, κ_1, …, κ_n),  (18)

where

ξ_{x,y} ∈ [0, ξ_max],  κ_i ∈ [−κ_max, κ_max].  (19)

The SLS 2.0 sextupoles have a bore of 22 mm and a maximum poletip field of 0.71 T at 2.7 GeV, corresponding to κ_max = 650 m⁻³. Furthermore, in this work ξ_max is set to 1.

D. Multiobjective optimization problem
Summing up the three preceding sections, the constrained multiobjective optimization problem considered in this paper is (see Eqs. (6), (7), (13) and (18))

min_{d⃗} ( F_1, F_2, F_3, F_4, F_5 ) = min_{d⃗} ( DA_{−δ_1}, DA_0, DA_{δ_1}, unstable_−, unstable_+ ),  (20)

subject to (see Eqs. (12) and (16))

t_1, t_2 ∈ [−κ_max, κ_max]  and  ctfp + adts = 0.  (21)

The second constraint ensures that the tune footprint distance ctfp and the amplitude-dependent tune footprint distance adts from the triangle formed by the 2nd order resonances (see Eq. (9)) are both zero.

III. MULTIOBJECTIVE GENETIC ALGORITHM (MOGA)
In Eq. (20) multiple objectives have to be optimized simultaneously. There are many multiobjective algorithms for this purpose, such as particle swarm optimization [6, 19–21], ant colony optimization [22], simulated annealing [23], artificial immune systems [24], differential evolution [7, 8] or genetic algorithms [25]. Multiobjective genetic algorithms (MOGA) are probably the most popular, and they have already been successfully applied in the field of particle accelerator physics [12, 18, 26–29], in particular also for DA optimization [3, 4, 10, 11, 13, 14].

A design point d⃗ (see Eq. (18)) dominates a design point d⃗′ if it is not worse in any of the objectives (see Eq. (20)) and is strictly better in at least one objective. A MOGA allows independent evaluations of solution candidates and is therefore suitable for parallelization. In this work a massively parallel implementation of a MOGA [28–30] is used to find points that are not dominated by any other point, called Pareto optimal points. The basic steps of a MOGA are shown in Algorithm 1.
Algorithm 1 Multiobjective genetic algorithm

1: create a random population of M individuals d⃗_i, i = 1, …, M
2: compute F⃗(d⃗_i) for i = 1, …, M
3: while a stopping criterion is not reached do
4:   for pairs of individuals d⃗_i, d⃗_{i+1} do
5:     crossover(d⃗_i, d⃗_{i+1}), mutate(d⃗_i), mutate(d⃗_{i+1})
6:   for each new individual d⃗_new, compute F⃗(d⃗_new)
7:   choose the M fittest individuals for the next generation

In the context of a MOGA a design point is referred to as an individual. First, in line 1, M individuals are chosen uniformly at random from the intervals in Eq. (19). In line 2 their objective function values (Eq. (20)) are computed. Then, in lines 3–7, a number of cycles is performed, each resulting in a new generation. In every cycle, new individuals are created using crossover and mutation operators (lines 4–5) and their objective function values are computed (line 6). Finally, in line 7, approximately M of the fittest individuals are chosen to comprise the new generation. The implementation from [26, 30] (called opt-pilot) is used, where the algorithm is implemented in C++ and parallelized using MPI such that a new generation is created (line 7) once the objective function values have been computed for n new individuals (line 6). In this paper simulated binary crossover and independent bit mutation are used.

A. Particle tracking and lattice configuration
The particle tracking code tracy [5] is used to compute the objective function values (Eq. (20)) and constraint violations (Eq. (21)) for a given design point d⃗ (Eq. (18)). tracy is a flexible and well-tested beam dynamics library that was also used for SLS [31]. It uses a 4th-order symplectic integrator [32] for all multipole orders and allows fast tracking of single particles, enabling a trade-off between computation time and accuracy.

For the purpose of this paper the tracking code was modified, including the computation of the ADTS and chromatic tune footprints (see section II B) and the DA (see section II A). Furthermore, an interface was created so that the values needed in Eqs. (20) and (21) could be obtained by opt-pilot.

In this paper the current lattice for SLS 2.0 is used [33]. This lattice is based on the multi-bend achromat scheme, including reverse bends and bends with a 3-step longitudinal profile and additional quadrupole focusing. The magnet lattice has a 3-fold symmetry. For on-momentum particles a 'virtual' 12-fold symmetry exists due to the proper adjustment of betatron phase advances between sextupoles in the insertion spaces. In this work the particles are tracked for 500 turns.
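As a concrete illustration of how the per-ray tracking results enter the figures of merit, the line objectives of Eq. (4) can be combined into the DA objective of Eq. (6) as in the following sketch. The ray lengths L̄(θ_k) and L(θ_k, δ) are assumed to have been obtained with the biased binary search of [9]; function names are illustrative, not the authors' code:

```python
# Sketch of Eqs. (4) and (6), given the linear-aperture lengths Lbar[k] and
# the DA lengths L[k] on the rays theta_k = k*pi/K, k = 0..K (the flat-lattice
# symmetry of Eq. (6) means only the upper half plane is needed).

def line_objective(L_bar, L):
    """Eq. (4): penalize rays where the DA falls short of the linear aperture."""
    return max(0.0, L_bar - L) / L_bar

def da_objective(L_bar, L, K):
    """Eq. (6): weighted average of the line objectives over K+1 rays."""
    f = [line_objective(L_bar[k], L[k]) for k in range(K + 1)]
    return (f[0] + f[K] + 2.0 * sum(f[1:K])) / (2.0 * K)

K = 8
L_bar = [1.0] * (K + 1)
print(da_objective(L_bar, L_bar, K))            # 0.0: DA reaches the linear aperture
print(da_objective(L_bar, [0.0] * (K + 1), K))  # 1.0: no aperture at all
```

The optimal value 0 is reached when the DA fills the linear aperture on every ray, consistent with the bounds stated after Eq. (6).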
B. Constraint handling
Only some randomly chosen individuals (around 48 %) satisfy the first constraint in Eq. (21), i.e., their tuning sextupole strengths are inside the bounds. In the following, such individuals are called feasible. If an individual is infeasible, its objective function values are not computed. Instead, infeasible individuals are compared based on the severity of their constraint violations. Since the objective function values in Eq. (20) are at most one, this can be done by simply setting (see Eq. (21))

F_i ← max{ 1, |t_i| − κ_max }  for i = 1, 2,  (22)
F_i ← 1  for i = 3, 4, 5.  (23)

On the other hand, for individuals that violate the second constraint in Eq. (21) (i.e., when tune resonances are crossed) the objective function values computed in line 2 or 6 of Algorithm 1 are penalized as

F_i ← F_i + penalty,  (24)

where (see Eqs. (12) and (16))

penalty = α_1 · ctfp + α_2 · adts.  (25)

In this work, α_1 = 0.01 and α_2 = 1 are used.

C. Results
All computations in this paper are run on Intel Xeon Gold 6152 nodes of the PSI Merlin cluster. An optimization using opt-pilot, run for 48 h on three nodes (i.e., 132 processes) with M = 300 and n = 130 (see section III), computed 829 generations. The quality of a generation is quantified by counting the number of distinct design points in that generation which satisfy the constraints in Eq. (21) and have all of the objective function values better than those of the manually obtained solution. The values for a few representative generations, including the last one, are shown in Table I. In particular, since the values of the nonnegative objective functions F_4 and F_5 for the manually obtained solution are zero, they are also zero for all of the newfound points counted in Table I. When comparing the results it should also be taken into account that the DAs of the manually obtained solution were optimized beyond the linear aperture limits.

From these 31 good points in the last generation (see Table I), three points are chosen based on different criteria and their objective function values are compared to those of the manually obtained solution in Table II. The design point called point-1 is chosen because it has the lowest value of the objective F_1, DA_{−δ_1} = 0.021; point-2 because it has the lowest value of F_2 and, coincidentally, also of F_3, with values DA_0 = 0.001 and DA_{δ_1} = 0.002; point-3 was chosen because it improves all three objectives by a comparable amount (for all of these points F_4 = F_5 = 0).

To sum up, using the massively parallel implementation opt-pilot of a MOGA, many design points with very good objective function values are found, but the run time (48 h) is quite long.

TABLE I. The number of design points in a specific generation that satisfy the constraints in Eq. (21) and have all of the objective function values better than those of the manually obtained solution, referred to as the 'design solution'. Generation 829 is the last one that is considered because the optimization was stopped after 48 h. All of the objective function values are computed with 500 turns in tracy.

generation   100  200  300  400  500  829
nof better     1   10   17   18   26   31

TABLE II. A comparison of the manually obtained 'design solution' with three good points found in the optimization with opt-pilot. The objective function values are computed with 500 turns in tracy, and all of these design points satisfy the constraints in Eq. (21). Out of the 31 design points in generation 829 computed in 48 h (see also Table I), point-1 is chosen as the design point that has the lowest value of F_1 and point-2 is chosen to have the lowest value of F_2 (coincidentally, it also has the lowest value of F_3). point-3 is chosen to improve all of these three objectives by a comparable amount. The column labeled 'gen' shows the generation in which the specific point was found.

Objective        F_1    F_2    F_3    F_4  F_5  gen
design solution  0.032  0.004  0.011  0    0    -
point-1          0.021  0.003  0.010  0    0    763
point-2          0.031  0.001  0.002  0    0    769
point-3          0.025  0.001  0.005  0    0    807
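For reference, the constraint handling of section III B (Eqs. (22)–(25)), used in all of the opt-pilot runs above, can be sketched as follows. The exact infeasibility assignment of Eq. (22) is a hedged reading of the text, α_1 = 0.01 and α_2 = 1 as stated there, and the function name is illustrative:

```python
# Sketch of Eqs. (22)-(25): infeasible individuals (tuning sextupoles out of
# bounds) get objective values >= 1, ranking them below every feasible one;
# resonance-crossing individuals get a penalty added to all objectives.

def handle_constraints(F, t, ctfp, adts, kappa_max=650.0, a1=0.01, a2=1.0):
    """Return the (possibly penalized) five objective values of a candidate."""
    if any(abs(ti) > kappa_max for ti in t):
        # Infeasible: Eqs. (22)-(23), ranked by violation severity.
        return [max(1.0, abs(t[i]) - kappa_max) if i < 2 else 1.0
                for i in range(5)]
    penalty = a1 * ctfp + a2 * adts          # Eq. (25)
    return [f + penalty for f in F]          # Eq. (24)

print(handle_constraints([0.1] * 5, t=(700.0, 0.0), ctfp=0.0, adts=0.0))
# [50.0, 1.0, 1.0, 1.0, 1.0]
```

Because every feasible objective value is at most one (Eq. (7)), any value of one or more marks the individual as infeasible during selection.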
IV. BUILDING THE SURROGATE MODEL
The convergence of the optimization method can be improved by using, e.g., k-means clustering [4], an ANN [15] or Gaussian process models [16, 17] to pre-select the points that need to be evaluated. Alternatively, an ANN surrogate model can be trained to approximate the objective function values and then used in the optimization [18]. Due to the encouraging results shown in [18], available tools and promising preliminary computations, the approach in this paper is based on training and using an ANN surrogate model.

First, a random feasible sample is created and evaluated using tracy. In particular, around 7 × 10⁴ design points (see Eq. (18)) are chosen uniformly at random from the intervals in Eq. (19). Their feasibility (see section III B) is then checked according to Eq. (17) and 3 × 10⁴ feasible points are chosen. The run time for this is negligible. These feasible points are evaluated using tracy and divided into a training (70 %), validation (20 %) and test set (10 %). This took 9 h 6 min on five nodes (220 cores).

Second, this random feasible sample is used to train the ANN surrogate model. In particular, a feed-forward ANN with N_layers hidden layers is used. The first hidden layer has N_neurons neurons while the others have 2N_neurons neurons. The activation function is ReLU and the loss function is the mean squared error. The model is generated in Python using the Keras [34] API on top of TensorFlow [35], with some functionality taken from MLLIB [36]. The Talos framework [37] is used to find good hyperparameters

N_layers ∈ {…},  N_neurons ∈ {…}  (26)

and the batch size for the stochastic gradient descent algorithm Adam [38]

N_batch ∈ {…}.  (27)

Other parameters for Adam are the default ones [39] from Keras, including the learning rate 0.001. A comparison of the six best combinations is shown in Table III. On one core this took 52 min.
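The architecture described above (first hidden layer with N_neurons units, remaining hidden layers with 2N_neurons units, ReLU activations) can be made explicit with a framework-agnostic sketch. In the paper the model is built with Keras; here only the layer shapes and the forward pass are mimicked in plain Python with random, untrained weights. The input dimension 12 is illustrative (the true dimension is that of the design point in Eq. (18)), while the seven outputs correspond to the five objectives of Eq. (20) plus ctfp and adts:

```python
# Sketch of the feed-forward architecture: widths
# [n_in, N_neurons, 2*N_neurons, ..., 2*N_neurons, n_out] with ReLU on the
# hidden layers and a linear output layer (regression with MSE loss).
import random

def build_layers(n_in, n_out, n_layers=5, n_neurons=64, seed=1):
    rng = random.Random(seed)
    widths = [n_in, n_neurons] + [2 * n_neurons] * (n_layers - 1) + [n_out]
    # One weight matrix per consecutive pair of widths (biases omitted).
    return [[[rng.gauss(0.0, 0.1) for _ in range(widths[i])]
             for _ in range(widths[i + 1])] for i in range(len(widths) - 1)]

def forward(layers, x):
    for i, W in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in W]
        if i < len(layers) - 1:          # ReLU on hidden layers only
            x = [max(0.0, v) for v in x]
    return x

layers = build_layers(n_in=12, n_out=7)  # 12 inputs -> 7 predicted quantities
y = forward(layers, [0.0] * 12)
print(len(layers), len(y))               # 6 7
```

With N_layers = 5 and N_neurons = 64 (the best combination reported below) this gives hidden widths 64, 128, 128, 128, 128, i.e., six weight matrices in total.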
TABLE III. A comparison of hyperparameters (only the six best combinations are shown). The training is stopped if there is no improvement in the validation loss for 100 epochs.

epochs   validation loss   N_layers   N_neurons   N_batch

FIG. 1. The training loss (blue) and the validation loss (orange) as a function of the number of training epochs. The design point from Eq. (18) is considered and the functions that are approximated are the ones from Eq. (20) and Eq. (21), right (seven in total). The size of the random sample is 3 × 10⁴ and the hyperparameters are N_layers = 5, N_neurons = 64 and N_batch = 128, with the ReLU activation function.
The dependence of the training and validation loss on the number of training epochs for the case with the smallest validation loss (Table III), i.e., the hyperparameters

N_layers = 5, N_neurons = 64 and N_batch = 128,  (28)

is shown in Fig. 1. A comparison of this ANN surrogate model with the particle tracking results of tracy is shown in Fig. 2. The comparison is performed on the test set, i.e., random feasible design points that were not used for training. In each sub-plot the x and y coordinates are the values computed with the ANN surrogate model and tracy, respectively. The line y = x indicates perfect agreement. In the case of F_4 and F_5 the surrogate model prediction is rounded to the nearest fraction (see Eq. (13)) and, to facilitate the presentation of the results, the average of these values is shown in the second row, first column. Moreover, for ctfp and adts negative predictions are set to zero.

V. OPTIMIZING WITH THE SURROGATE MODEL
In this section, the MOGA implemented in the Python pymoo [40] module is used for the optimization, with the ANN surrogate model from section IV used to predict the necessary figures of merit (see Eqs. (20) and (21)). This is compared to the optimization using opt-pilot as described in section III C, the goal being to preserve the solution quality while speeding up the optimization. Additionally, as in section III C, the candidate solutions are again compared with the manually obtained solution. The crossover, mutation and constraint handling used with pymoo are therefore chosen to be as close as possible to the ones used with opt-pilot.

A. Direct approach
As a first approach, an optimization with M = 10⁴ individuals in a generation is run for 1000 generations. This took 60 min on one core. The points in generation 1000 are then re-evaluated using tracy (3 h 50 min on five nodes) and the comparison is shown in Fig. 3 in blue. It can be seen from the scale that the objective function values of these design points are very good compared to the values computed for random points (see Fig. 2); for example, F_2 = DA_0 reached much larger values in the test sample. However, the agreement between the surrogate model predictions (x axis) and the values obtained with tracy (y axis) is quite poor. In particular, for F_1, F_2 and F_3 the surrogate model prediction is generally much smaller than the value computed in tracy – the points that seem good during the optimization with the surrogate model turn out to be mediocre. Therefore, despite the initial surrogate model quality seen in Fig. 2 as evaluated on random feasible points, the surrogate model quality evaluated on good design points, such as those computed in generation 1000, is not adequate for optimization. For example, none of the design points in generation 1000 have F_1 better than the value 0.032 of the manually obtained solution. For comparison, opt-pilot run for 48 h on three nodes computed 829 generations (with M = 300 and n = 130), where 31 points satisfy the constraints and have all objective functions better than the design solution (see section III C). The optimization with the surrogate model is 3.2 × faster, but the solution quality is not as good. The comparison is summarized in Table IV.

FIG. 2. The surrogate model quality on the test set, i.e., random design points that were not used for training. The surrogate model was trained using a training set of size 2.1 × 10⁴ and a validation set of size 6000. Its quality is tested on a test set of size 3000. In each sub-plot the x and y coordinates are the values computed with the surrogate model and tracy, respectively (a point on the line y = x would be perfect agreement). Darker blue colors represent higher design point densities.

To improve the solution quality, the quality of the surrogate model predictions has to be much better for points with good objective function values. To achieve this, in the next section the surrogate model will be re-trained during the optimization.

B. Re-training the surrogate model
The second approach, devised to achieve both the run time of the ANN surrogate model optimization and the solution quality of the opt-pilot optimization, is the following: the surrogate model is re-trained during the optimization. To keep the total run time for training the surrogate models low, the re-training is done only two times: first after generation m_1 and then after generation m_2. The points used for re-training can be chosen as a subset of the random sample used for training the surrogate model in section IV and the points evaluated during the optimization. Preliminary computations showed that using only the points in generation m_i (i ∈ {1, 2}) is not enough to accurately predict the values of new points. On the other hand, using both the initial random sample and some of the points from generation m_i works very well. To keep the total number of tracy evaluations below 3 × 10⁴, the surrogate model is trained on 10⁴ random feasible points and re-trained in generation m_1 with the random feasible points used previously and 5000 points from generation m_1, and again in generation m_2 with the 1.5 × 10⁴ points used previously and 5000 additional points from generation m_2. Preliminary computations showed that m_1 = 50 and m_2 = 500 can be used, due to a more rapid change of the objective function values in the beginning of the optimization. For example, in generation 50 all of the points already had F_2 = DA_0 below 0.… (compare with F_2 = 0.004 for the design solution in Table II).

FIG. 3. Blue color: the quality of the predictions of the surrogate model from Fig. 2 on the design points in generation 1000 (computed with the approach from section V A). For all points the value of adts is computed as zero using tracy and predicted to be zero using the surrogate model. Orange color: the quality of the predictions of the third surrogate model from the approach in section V B on the design points in generation 1000. In each sub-plot the x and y coordinates are the values computed with the surrogate model and tracy, respectively (a point on the line y = x would be perfect agreement). Darker colors represent higher design point densities. The benefit of the approach from section V B can clearly be seen since: (1) the orange smudges overlap better with the line y = x, i.e., the surrogate model predictions at the end of the optimization are more accurate, and (2) the orange points have smaller y coordinate values, i.e., better objective function values computed with tracy.

TABLE IV. Solution quality and run time for different optimization methods. 'SM' is an abbreviation for 'surrogate model'. opt-pilot denotes the massively parallel MOGA implementation combined with tracy, in particular the optimization from section III C. 'SM (3 × 10⁴)' and 'SM + re-train (2 × 10⁴)' refer to the approaches described in sections V A and V B, respectively. 'SM + re-train (10⁴)' and 'SM + re-train (5000)' are described in section V C for N = 5000 and N = 2500, respectively. The number in the parenthesis is the combined size of the used samples. 'nof pts better' refers to the number of design points in the last generation that satisfy the constraints in Eq. (21) and have all objectives better than the manually obtained solution. 're-eval all' refers to the case where all 10⁴ individuals in the last generation are re-evaluated using tracy, which accounts for around 3 h 50 min of the total run time. 're-eval 10 %' refers to the case where only 1000 of the 10⁴ individuals in the last generation are re-evaluated. The speedup is computed with respect to the opt-pilot approach in the first column.

                          opt-pilot  SM (3 × 10⁴)  SM + re-train (2 × 10⁴)  SM + re-train (10⁴)  SM + re-train (5000)
nof pts better                 31         0                148                     368                   87
run time (re-eval all)       48 h    14 h 48 min       12 h 15 min             8 h 31 min            6 h 33 min
core hours (re-eval all)     6336       2847              2325                    1593                  1210
speedup (re-eval all)         1.0        3.2               3.9                     5.6                   7.3
run time (re-eval 10 %)        -     11 h 21 min        8 h 52 min              5 h 5 min             3 h 10 min
core hours (re-eval 10 %)      -        2089              1578                     838                   465
speedup (re-eval 10 %)         -         4.2               5.4                     9.4                  15.1

The quality of the predictions of the third surrogate model on the 10⁴ design points from generation 1000 is shown in Fig. 3 in orange. The total run time, including re-evaluating the entire last generation using tracy, is now around 12 h 15 min, which is a speedup of 3.9 × compared to the approach from section III C. However, not all points in the last generation need to be re-evaluated – if only 1000 of these points (i.e., 10 %) are re-evaluated, the total run time is 8 h 52 min, which is a speedup of 5.4 × (see Table IV). The 1000 points to be re-evaluated can be chosen based on the values of the predictions.

There are 148 design points in the last generation that satisfy the constraints in Eq. (21) and have all of the objective function values better than those of the manually obtained solution, which is significantly more than the 31 points found in section III C (see Tables I and IV). As in Table II, out of these 148 design points, the ones with the lowest value of F_1, F_2 and F_3 are shown in Table V and referred to as point-4, point-5 and point-6, respectively. The quality of these points is clearly comparable with point-1 and point-2 from Table II. For example, the lowest value of F_1 = DA_{−δ_1} is 0.020 for point-4, while point-1 has F_1 = 0.021; point-2 has F_2 = DA_0 = 0.001, as does point-5; and the lowest value of F_3 is 0.004 for point-6. With both approaches there are also points that, like point-3 in Table II, improve all objectives by a comparable amount, e.g., a design point with

(F_1, F_2, F_3, F_4, F_5) = (0.…, 0.…, 0.…, 0, 0),  (29)

ctfp = 0 and adts = 0.

C. Using a smaller sample for training
In section V A a large sample of 3 × 10^4 random feasible design points was used to illustrate that, regardless of the surrogate model quality on random points, the quality on points with good objective function values is very poor. In section V B the combined size of the samples was 2 × 10^4 instead of 3 × 10^4. In this section the size of the samples is further reduced.

The approach is the same as the one from section V B. An initial sample of size N is used to train the first surrogate model. The second surrogate model is trained in generation m₁ = 50 using these N points together with N/2 points from generation m₁. The third surrogate model is trained in generation m₂ = 500 using also N/2 points from generation m₂. While in section V B N = 10^4 was used, this is now lowered to N = 5000 and N = 2500.

As shown in Table IV, the number of points that satisfy the constraints in Eq. (21) and have all of the objective function values better than those of the manually obtained solution is now 368 for N = 5000 and 87 for N = 2500. The run time including the re-evaluation of the entire last generation is 8 h 31 min (speedup 5.6×) and 6 h 33 min (speedup 7.3×), respectively. If only 1000 of the points in the last generation (i.e., 10 %) are re-evaluated using tracy, the speedups in the cases N = 5000 and N = 2500 are 9.4× and 15.1×, respectively. A detailed comparison is shown in Table IV.

Furthermore, in addition to counting 'nof pts better' (Table IV), the points with the lowest values of F₁, F₂ and F₃ out of these design points are shown in Table V. Both approaches from this section found numerous points with very good objective function values in a significantly shorter time, most notably the 3 h 10 min of the fastest case (instead of the 48 h needed for opt-pilot).

TABLE V. The points with the smallest value of F₁ (point-4, -7, -10), F₂ (point-5, -8, -11) and F₃ (point-6, -9, -11), out of all the points (first row in Table IV) that satisfy the constraints in Eq. (21) and have all objectives better than the manually obtained solution (first row in Table II). All the digits of the objective function values computed with 500 turns in tracy are used for the comparison, and the shaded numbers denote the minimal value of the respective objective before rounding. The last column denotes the size of the initial sample. The case with N = 10^4 is described in section V B. The cases with N = 5000 and N = 2500 are described in section V C.

Objective    F₁      F₂      F₃     F₄   F₅      N
point-4     0.020   0.004   0.011   0    0    10^4
point-5     0.029   0.001   0.008   0    0    10^4
point-6     0.028   0.002   0.004   0    0    10^4
point-7     0.024   0.003   0.008   0    0    5000
point-8     0.030   0.002   0.006   0    0    5000
point-9     0.029   0.002   0.004   0    0    5000
point-10    0.025   0.003   0.011   0    0    2500
point-11    0.032   0.001   0.005   0    0    2500

VI. CANDIDATE SOLUTIONS
In this section some of the candidate solutions obtained in sections III and V are compared and further analyzed. In particular, out of the points found in section III C with the massively parallel implementation of a MOGA, point-3 (see Table II) is chosen. Out of the points found with the new method in section V B (N = 10^4), the point shown in Eq. (29) is chosen. Out of the points found with the new method in section V C, using the smallest considered combined sample size (N = 2500), point-11 (see Table V) is chosen. All of these points are compared with the manually obtained solution, referred to as the 'design solution'.

For each of these solution candidates, the transverse DAs at three different energies (δ ∈ {−0.03, 0, 0.03}) are shown in Fig. 4. In each sub-plot the bold black line shows the boundary of the on-momentum DA, computed with 500 turns in tracy as described in section II A. As indicated by the smaller values of the on-momentum objective F₂, all three candidate solutions have a larger on-momentum DA than the design solution. The green and blue areas show the off-momentum DA for δ = −0.03 and δ = 0.03, respectively. These transverse DAs correspond to the objective functions F₁ and F₃, respectively. In the case of δ = 0.03 it can clearly be seen that the computed transverse DA for all three new candidate solutions is larger than that of the design solution, as indicated by the smaller values of F₃. In the case of δ = −0.03 the area of the transverse DA for point-11 is of a similar size to that of the design solution. This agrees with the fact that for both of these design points F₁ = 0.032 (see Tables II and V). The other two new candidate solutions have lower values of F₁ (F₁ = 0.025 in Table II and F₁ = 0.026 in Eq. (29)), and the computed areas of the transverse DA at δ = −0.03 are correspondingly larger. The transverse DAs recomputed in OPA (Fig. 5) are consistent with the tracy-computed transverse DAs (Fig. 4).

The chromatic tune footprint (section II B 1) and the amplitude-dependent tune footprint (section II B 2) for the three new solution candidates and the design solution are shown in Fig. 6. In each sub-plot, the outer triangle (red) is the one formed by three intersecting 2nd order resonances around the on-momentum tune (see Eq. (9)). The inner triangle (black) includes the margin from section II B 1. For all of the candidate solutions the number of unstable off-momentum points vanishes (see Eq. (13)) and the second constraint in Eq. (21) is satisfied, i.e., ctfp + adts = 0 (see Eqs. (12) and (16)), so all footprints are located inside the inner triangle.

VII. CONCLUSIONS
In this paper a multi-objective genetic algorithm isused to find a good dynamic aperture and energy accep-tance for the Swiss Light Source upgrade. To speed upthis expensive computation, artificial neural network sur-rogate models are used in the optimization. Compared toa massively parallel implementation of a multi-objectivegenetic algorithm, the new optimization method resultsin an order of magnitude speedup. At the same time, thesolution quality is preserved. In particular, tens of thedesign points in the last generation are better than thedesign solution in all of the considered objective func-tions.The new, faster method makes it possible to includemore design parameters in the optimization problem,such as the octupole strengths, which could further im-prove the solution quality. Furthermore, it allows for theinclusion of a more accurate and more expensive model,e.g., a model which includes nonlinear synchrotron oscil-lation. In this work the focus is on the lattice for theSwiss Light Source upgrade, but an analogous procedurecould easily be used for a different lattice or a differentmachine.
VIII. ACKNOWLEDGEMENTS
M. Aiba provided the manually obtained solution. J. Kallestrup provided useful reference suggestions. R. Bellotti helped with TensorFlow and with proofreading a part of the manuscript. M. Zacharias helped with TensorFlow. T. Schietinger proofread the manuscript.

FIG. 4. Transverse DAs in Floquet space of the solution candidates for δ = −0.03 (green), δ = 0.03 (blue) and δ = 0 (bold black line), computed using tracy as described in section II A. The particles are tracked for 500 turns. For a clearer presentation of the results, only half of each off-momentum aperture is shown, which is sufficient due to machine-plane symmetry. Each sub-plot corresponds to one candidate solution: the manually obtained solution (referred to as the 'design solution', top left), point-3 from Table II (top right), the design point from Eq. (29) (bottom left) and point-11 from Table V (bottom right). The relative relationships of the DA area sizes are consistent with the computed objective function values.

FIG. 5. The columns show the transverse DAs at δ = −0.03, δ = 0 and δ = 0.03, recomputed in OPA for: the manually obtained solution (referred to as the 'design solution', 1st row), point-3 from Table II (2nd row), the design point from Eq. (29) (3rd row) and point-11 from Table V (4th row). The particles are tracked for 500 turns on a grid in the transverse plane. The results are consistent with the tracy computation in Fig. 4.

FIG. 6. Chromatic tune footprint in the range δ ∈ [−0.05, 0.05] (top row) and amplitude-dependent tune footprint (bottom row) for the solution candidates: the manually obtained solution (referred to as the 'design solution', 1st column), point-3 from Table II (2nd column), the design point from Eq. (29) (3rd column) and point-11 from Table V (4th column). The 2nd order resonances are shown in red and the black triangle includes the additional margin around the resonance lines.

[1] B. Riemann and A. Streun, Low emittance lattice design from first principles: Reverse bending and longitudinal gradient bends, Phys. Rev. Accel. Beams, 021601 (2019).
[2] J. Bengtsson and A. Streun, Robust design strategy for SLS 2, Report SLS2-BJ84-001 (PSI, 2017).
[3] M. P. Ehrlichman, Genetic algorithm for chromaticity correction in diffraction limited storage rings, Phys. Rev. Accel. Beams, 044001 (2016).
[4] Y. Li, W. Cheng, L. H. Yu, and R. Rainer, Genetic algorithm enhanced by machine learning in dynamic aperture optimization, Phys. Rev. Accel. Beams, 054601 (2018).
[5] Tracy-3, self-consistent charged particle beam tracking code based on a symplectic integrator, by J. Bengtsson, available for download at https://github.com/jbengtsson/tracy-3.5.
[6] X. Huang and J. Safranek, Nonlinear dynamics optimization with particle swarm and genetic algorithms for SPEAR3 emittance upgrade, Nucl. Instrum. Methods Phys. Res. A, 48–53 (2014).
[7] R. Husain and A. Ghodke, Constrained multi-objective optimization of storage ring lattices, Nucl. Instrum. Methods Phys. Res. A, 151 (2017).
[8] J. Wu, Y. Zhang, Q. Qin, Y. Wang, C. Yu, and D. Zhou, Dynamic aperture optimization with diffusion map analysis at CEPC using differential evolution algorithm, Nucl. Instrum. Methods Phys. Res. A, 163517 (2020).
[9] M. Kranjčević, B. Riemann, A. Adelmann, and A. Streun, Multi-objective optimization of the dynamic aperture for the Swiss Light Source upgrade (2020), abstract submitted to IPAC'20 (cancelled), arXiv:2002.08685.
[10] W. Gao, L. Wang, and W. Li, Simultaneous optimization of beam emittance and dynamic aperture for electron storage ring using genetic algorithm, Phys. Rev. ST Accel. Beams, 094001 (2011).
[11] Y. Li and L. Yang, Multi-objective dynamic aperture optimization for storage rings, Int. J. Mod. Phys. A, 1644019 (2016).
[12] A. Hofler et al., Innovative applications of genetic algorithms to problems in accelerator physics, Phys. Rev. Accel. Beams, 010101 (2013).
[13] L. Yang, Y. Li, W. Guo, and S. Krinsky, Multiobjective optimization of dynamic aperture, Phys. Rev. ST Accel. Beams, 054001 (2011).
[14] L. Yang, D. Robin, F. Sannibale, C. Steier, and W. Wan, Global optimization of an accelerator lattice using multiobjective genetic algorithms, Nucl. Instrum. Methods Phys. Res. A, 50 (2009).
[15] F. Wang, M. Song, A. Edelen, and X. Huang, Machine learning for design optimization of storage ring nonlinear dynamics (2019), arXiv:1910.14220.
[16] X. Huang, M. Song, and Z. Zhang, Multi-objective multi-generation Gaussian process optimizer for design optimization (2019), SLAC-PUB-17451, arXiv:1907.00250.
[17] M. Song, X. Huang, L. Spentzouris, and Z. Zhang, Storage ring nonlinear dynamics optimization with multi-objective multi-generation Gaussian process optimizer, Nucl. Instrum. Methods Phys. Res. A, 164273 (2020).
[18] A. Edelen, N. Neveu, M. Frey, Y. Huber, C. Mayes, and A. Adelmann, Machine learning for orders of magnitude speedup in multiobjective optimization of particle accelerator systems, Phys. Rev. Accel. Beams, 044601 (2020).
[19] J. Kennedy and R. Eberhart, Particle swarm optimization, in Proceedings of the IEEE International Conference on Neural Networks, Vol. 4 (1995), pp. 1942–1948.
[20] D. Karaboga, An idea based on honey bee swarm for numerical optimization, Techn. Rep. TR06, Erciyes Univ. Press, Erciyes (2005).
[21] H. Shah-Hosseini, The intelligent water drops algorithm: a nature-inspired swarm-based optimization algorithm, Int. J. Bio-Inspired Comput., 71 (2009).
[22] M. Dorigo, V. Maniezzo, and A. Colorni, The ant system: Optimization by a colony of cooperating agents, IEEE Trans. Syst. Man Cybern. Part B Cybern., 29 (1996).
[23] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science, 671 (1983).
[24] L. N. De Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach (Springer, 2002).
[25] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms (Wiley, 2009).
[26] N. Neveu et al., Parallel general purpose multiobjective optimization framework with application to electron beam dynamics, Phys. Rev. Accel. Beams, 054602 (2019).
[27] I. V. Bazarov and C. K. Sinclair, Multivariate optimization of a high brightness dc gun photoinjector, Phys. Rev. Accel. Beams, 034202 (2005).
[28] M. Kranjčević, S. G. Zadeh, A. Adelmann, P. Arbenz, and U. van Rienen, Constrained multiobjective shape optimization of superconducting rf cavities considering robustness against geometric perturbations, Phys. Rev. Accel. Beams, 122001 (2019).
[29] Y. Ineichen, A. Adelmann, C. Bekas, A. Curioni, and P. Arbenz, A fast and scalable low dimensional solver for charged particle dynamics in large particle accelerators, Comput. Sci. Res. Dev., 185 (2013).
[30] Y. Ineichen, Toward massively parallel multi-objective optimization with application to particle accelerators, ETH Research Collection, 10.3929/ethz-a-009792359 (2013).
[31] M. Boege and A. Streun, Beam lifetime studies for the SLS storage ring, in Proceedings of the 1999 Particle Accelerator Conference (PAC) (New York, USA, 1999).
[32] E. Forest and R. Ruth, Fourth-order symplectic integration, Physica D, 105 (1990).
[33] A. Streun, SLS 2.0 Baseline Lattice, Internal Report SLS2-SA81-004-13 (PSI, 2020).
[34] F. Chollet et al., Keras: The Python deep learning library (2015).
[35] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems (2015).
[36] A. Adelmann et al., https://gitlab.psi.ch/adelmann/mllib.
[37] Autonomio Talos [Computer software] (2019), retrieved from http://github.com/autonomio/talos.
[38] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization (2014), arXiv:1412.6980.
[39] Keras API reference / Optimizers / Adam, https://keras.io/api/optimizers/adam (2020).
[40] J. Blank and K. Deb, pymoo: Multi-objective optimization in Python (2020), arXiv:2002.04504.
[41] OPA accelerator optics software with examples, http://ados.web.psi.ch/opa.