[PDF] Towards Large Scale Automated Algorithm Design by Integrating Modular Benchmarking Frameworks

Abstract

We present a first proof-of-concept use-case that demonstrates the efficiency of interfacing the algorithm framework ParadisEO with the automated algorithm configuration tool irace and the experimental platform IOHprofiler. By combing these three tools, we obtain a powerful benchmarking environment that allows us to systematically analyze large classes of algorithms on complex benchmark problems. Key advantages of our pipeline are fast evaluation times, the possibility to generate rich data sets to support the analysis of the algorithms, and a standardized interface that can be used to benchmark very broad classes of sampling-based optimization heuristics. In addition to enabling systematic algorithm configuration studies, our approach paves a way for assessing the contribution of new ideas in interplay with already existing operators -- a promising avenue for our research domain, which at present may have a too strong focus on comparing entire algorithm instances.

Full PDF

TTowards Large Scale Automated Algorithm Designby Integrating Modular Benchmarking Frameworks

Amine Aziz-Alaoui, ISAE-SUPAERO, Universit´e de Toulouse, France ∗ Carola Doerr, Sorbonne Universit´e, CNRS, LIP6, Paris, FranceJohann Dreo, Thales Research & Technology, Palaiseau, France † February 15, 2021

Abstract

We present a ﬁrst proof-of-concept use-case that demonstrates the eﬃciency of interfacingthe algorithm framework

Paradiseo with the automated algorithm conﬁguration tool irace and the experimental platform

IOHproﬁler . By combing these three tools, we obtain apowerful benchmarking environment that allows us to systematically analyze large classesof algorithm spaces on complex benchmark problems. Key advantages of our pipeline arefast evaluation times, the possibility to generate rich data sets to support the analysis of thealgorithms, and a standardized interface that can be used to benchmark very broad classesof sampling-based optimization heuristics.In addition to enabling systematic algorithm conﬁguration studies, our approach paves away for assessing the contribution of new ideas in interplay with already existing operators– a promising avenue for our research domain, which at present may have a too strong focuson comparing entire algorithm instances.

When confronted with an optimization problem in practice, one of the major challenges that weface is the selection (and the conﬁguration) of an algorithm that corresponds well to the givenproblem structure, optimization objective(s), and the available resources (compute, possibilityto parallelize computations, accessibility of the problem, etc.). A vast amount of diﬀerentoptimization techniques exist, which renders this algorithm selection problem non-trivial.In practice, algorithm selection is often biased by personal preferences and experiences, aswell as by practical aspects such as the availability of ready-to-use implementations. Support-ing practitioners in making more systematic choices is one of the key objectives of our researchdomain. A key tool for deriving such recommendations is algorithm benchmarking , i.e. the anal-ysis of empirical performance data and search trajectories of one or several algorithms on one orseveral optimization problems [BDB +

20, HAR + +

16, LJD + +

20, RT18, DWY +

18, WKB + +

19, EPK20, FGLP11], andperformance extrapolation [KHNT19]. However, most of these tool are developed in isolation, ∗ This work was partially done during the M2 master internship of Amine Aziz-Alaoui at ´Ecole Polytechnique,Institut Polytechnique de Paris, CNRS, LIX, Palaiseau, France. He is now a PhD student at Institut de RechercheTechnologique Saint Exup´ery, Toulouse, France. † Corresponding author, [email protected]. a r X i v : . [ c s . N E ] F e b aying little attention to building compatible interfaces to other benchmarking modules. Thissigniﬁcantly hinders their wider adoption.With this work we demonstrate the beneﬁts of a fully modular benchmarking pipeline design,which keeps the diﬀerent steps of the benchmarking study in mind. We see our work as a proofof concept for better compatibility between our benchmarking software. On the practical side,our pipeline paves a way for assessing the beneﬁts of new algorithmic ideas in the context andin interplay with other operators and ideas that our community oﬀers. Our contribution:

Concretely, we propose in this work a benchmarking pipeline thatintegrates the modular algorithm framework

Paradiseo [KMRS02, CMT04] with the algorithmconﬁguration tool irace [LDC + IOHexperimenter [DWY + IOHanalyzer [WVY + Quality of the results:

We show that irace is capable of ﬁnding algorithm instances whichoutperform all baseline algorithms selected by hand, and this for each of the 19 problem instancesthat we consider. The relative advantage of the best out of 15 irace suggestions over the bestbaseline algorithm, measured in terms of area under the ECDF curve (see Sec. 3.1), variesbetween 1% and 30%, with a median gain of 13%.

Scalability:

Our algorithmic framework is capable of generating large set of solvers, up toseveral millions of unique conﬁgurations. We show that it is possible to tackle such spacesthanks to fast computations. For instance, we give irace a budget of 100 000 target runs foreach of 19 problems, and it completes the full task in approximately 3 hours on a laptop. In ourexperience, our C++ pipeline is at least 10 times faster than heavily optimized counterparts inPython, not mentioning that most of the available modular frameworks are not always heavilyoptimized.

Take-away for instance selection:

As a side result, we observe that similar algorithm in-stances can be suggested by irace for some problems, suggesting that the diversity in perfor-mance proﬁles sought in [WCLW20] may be weaker than intended. Our work suggests that anapproach like ours may result in a more reliable instance selection, since it will be less biasedby a small set of baseline algorithms, but rather be built on a large and diverse set of possiblealgorithm instances.

Extendability:

Our pipeline is ready to perform large benchmark studies, covering largeclasses of continuous and discrete optimization algorithms. For example, local searches, particleswarm optimization, estimation of distribution algorithms and using numerical or bitstringencodings. Similarly, the pipeline gives direct access to all problems collected in

IOHproﬁler ,which comprises in particular the BBOB functions from the COCO framework [HAR + + IOHproﬁler would beeasily used to further extend this study.

Comparison to Previous Works:

Our work is a top-down approach for automatic algorithm design [MLDS14], which usea parametrized algorithmic framework to instantiate many algorithm instances . Follow-ing [LIKS17], we observe that this diﬀers from bottom-up “grammar-based” approaches, likeGrammatical Evolution [RCN98, LPC12], which allow for easily designed algorithm space, butcomplicates algorithm instantiation and optimization. In our case, the width of the designspace is already large and we targets fast algorithm instantiation, we thus favor the top-downapproach.A similar approach to ours was suggested in [LS12, BLIS20, BLIS16] for multi-objectiveoptimization. Those studies also uses irace , but the authors implemented their own modularalgorithm frameworks, which is restricted to multi-objective optimization. Our work signiﬁ-cantly scales up this kind of study, by leveraging larger algorithm design spaces, larger set of2

OHexperimenterParadisEO <> eoAlgoFoundryFastGA +crossover_rates: {double}+crossover_selectors: {eoSelectOne}+crossovers: {eoQuadOp}+aftercross_selector: eoSelectOne+mutation_rates: {double}+mutation_selectors: {eoSelectOne}+mutations: {eoMonOp}+replacements: {eoReplacement}+offspring_sizes: {size_t}+eval: eoEvalFunc+select(encoded_algo:vector)+operator()(pop:eoPop)

IOH_ecdf_logger +target_range: RangeLinear+budget_range: RangeLinear+data(): IOH_AttainSuite

IOH_csv_loggerIOH_observer_combine +vector

IOH_logger +do_log(problem_info) <> eoEvalIOHproblem +pb: IOH_problem+log: IOH_observer+operator()(sol:Bits)<>

IOH_problem

W_Model_OneMax +epistasis: int+neutrality: int+ruggedness: int+max_target: int+dimension: int+operator()(sol:Bits): double

IOH_ecdf_sum +operator()(ecdf:IOH_AttainSuite): double

After run <> eoEvalFunc fastga +run(--problem:int, --pop-size:size_t, --crossover-rate:int, --cross-selector:int, --crossover:int, --aftercross-selector:int, --mutation-rate:int, --mut-selector:int, --mutation:int, --replacement:int, --offspring-size:size_t)

Select, Run irace +run()

Figure 1: Summary diagram of the FastGA evaluation pipeline involving the

Paradiseo (up-per part, red colors) and

IOHexperimenter (lower part, blue colors) frameworks along withthe irace entry point. The execution starts from the irace run command on the left, goesthrough the

Paradiseo modules, which call the

IOHexperimenter problem (in blue) and loggers(in green). After the run of the algorithm, a statistic is computed on the ECDF data (incyan), which is then returned to irace as performance metric (i.e., this is the “ﬁtness value”that the evaluation associates to the conﬁguration under evaluation). Involved classes are rep-resented using the UML convention. For the sake of clarity, the

IOHproﬁler preﬁx is writtenas

IOH and the type of the eoAlgoFoundryFastGA slots are indicated as { double } instead of eoOperatorFoundry .benchmarks, with more problems and allowing more detailed analysis of the results. Structure of the Paper:

Sec. 2 brieﬂy introduces the individual modules of our algorithmdesign pipeline and how they interplay with each other. The use-case on which we apply thispipeline, as well as the experimental setup are summarized in Sec. 3. The results of our empiricalanalysis are described in Sec. 4. We conclude our paper in Sec. 5 with a discussion on promisingavenues for future work.

Availability of Code and Data:

Our code is available on GitHub at https://github.com/nojhan/paradiseo . Figure 1 summarizes our automated algorithm design pipeline for the concrete use-case thatwill be studied in Sec. 3. The pipeline links an algorithm selector with an algorithm generatorand a benchmark platform. The algorithm selector asks the algorithm generator to instantiatean algorithm, which then solves a problem of the benchmark platform while being observed bya logger. After this run, the logger’s data are summarized as a scalar performance measure,which is sent back to the algorithm selector. We brieﬂy present in this section the diﬀerentcomponents of our pipeline, and explain the reasons behind our choices.

Algorithm Framework:

Paradis eo Many evolutionary algorithms share similar design pat-terns, and are often composed of similar operators. This has given rise to several platformswhich aim at supporting their users in designing evolutionary heuristics by compiling a set of3eadily-available operators within a standardized software environment. Given the substantialwork that has put into these frameworks, we decided to build our pipeline around one of themost powerful toolboxes. To this end, we have ranked 39 frameworks among the ones easilyavailable on the web, based on an adhoc metric combining rapidity, activity, features and li-cense, e.g., [NDV15, SL19, GP06, Jen, ECF, FDG +

12, CEP08], to name only a few. Sincespeed is a major concern for our pipeline, we favor frameworks written in C++. To select anup-to-date framework and to ensure availability of support in case of technical issues, we alsochecked the contribution activity in recent years. These two criteria reduced our choices to

Par-adiseo [KMRS02, CMT04], OpenBeagle [GP06], and ECF [ECF]. Among these three,

Paradiseo covers the largest portfolio of algorithm families, which are composed in the framework by as-sembling atomic functions (called operators ). Paradiseo is also the most actively maintainedframework among the three, so that we decided to use it for our work.The upper part of Figure 1 shows the core classes of

Paradiseo involved in our setting.

Algorithm Conﬁguration: irace

Several algorithm conﬁguration tools have been developedin the last decade. Among the most common ones used in our community are irace [LDC + + + irace for this study, for practical considerations (previous experience, availability ofdocumentation, support from development team). Experimental Environment:

IOHexperimenter

The

IOHproﬁler project [DWY +

18] is amodular platform for algorithm benchmarking of iterative optimization heuristics (IOH). Withinthis project,

IOHexperimenter provides synthetic benchmarks which are very fast to execute anda standardized way of observing algorithms behavior through so-called loggers . We have chosenthis platform, because it is fast and its modular design made it particularly easy for us tointegrate the algorithm design framework (being written in C++, as

Paradiseo ). IOHproﬁler isalso actively maintained, and provides access to broad ranges of diﬀerent optimization processes.Compared to Nevergrad [RT18], we particularly like the detailed logging options, whichprovide information about the any-time behavior of the algorithms —information that is cur-rently not available in Nevergrad. Compared to the COCO [HAR +

20] environment,

IOHproﬁler makes it considerably easier to test algorithms’ performance on our own benchmark problemsor suites. Finally, the project also supports interactive performance analysis and visualizationmodule,

IOHanalyzer , which we used for the interpretation of our data.The lower part of Figure 1 shows the classes related to the loggers and the problems thatare used in our experimental study in Sec. 3.

Data Records: fast ECDF Logger

In our use-case, we decide to tune algorithms for goodanytime performance, and to use area under the empirical cumulative density function (ECDF)curve as objective. To this end, we have implemented within

IOHexperimenter an eﬃcient wayof computing these values. This “ECDF logger” will be described in Sec. 3.1.

Data Analysis and Visualization with

IOHanalyzer

Data analysis and visualization isperformed via

IOHanalyzer [WVY + IOHproﬁler project [DWY + Our use-case is the optimization of the anytime performance of a genetic algorithm on selectedinstances of the W-model problem. Our performance measure (Sec. 3.1), the algorithmic frame- Version 3.4.1 of https://cran.r-project.org/web/packages/irace/ , ran with R In order to allow for large scale experiments, we implement a fast logger within

IOHexperimenter ,which essentially stores a histogram of the two-dimensional distribution of the ratio of runshaving reached a quality/time target. The time dimension is given as the number of calls tothe objective function, linearly discretized between zero and the allowed budget. The qualitydimension is given as the absolute value of the best solution found during the run, linearlydiscretized between zero and the known V max bound (see Table 1). The W-Model problemis here converted in a minimization problem where the solver seek to optimize − OM ( x ) (seeSec. 3.3). Figure 2 shows two examples of such histograms, arbitrarily chosen. The matrixdeﬁnes the considered quality/time targets ( v, t ). The color of each cell corresponds to theprobability that the algorithm has identiﬁed, within the ﬁrst t function evaluations, a solutionof quality at least v . The darker a cell, the larger the fraction of runs that could successfullymeet the quality/time target.Using the histogram of the performance ECDF instead of its continuous counterpart allowsto keep the data in-memory, in compact data structures, without having to rely on slow diskaccesses.The performance of the considered algorithm is computed as a statistic on this histogram.In our study, we use the area under the curve (AUC) of the discretized ECDF, approximatedas the sum of the histogram. This allows for a compromise between quality and time, which iseasily available because we consider synthetic benchmarks with known bounds. ( µ, λ ) “Fast” GA Family We chose for our use-case a family of ( µ + λ ) GAs, which is to a large extend inspired by thestudy [YWDB20]. Algorithm 1 summarizes the framework, called “FastGA” in the implemen-tation.Essentially, given a parent population of µ points, each of the λ oﬀspring is created byﬁrst deciding which variation operator is applied (line 9): with probability p c the oﬀspring isgenerated by ﬁrst recombining two search points from the parent population (lines 11–13) andthen randomly deciding (with probability p m ) whether or not to apply a mutation operator tothe so-created oﬀspring (lines 15–18). When crossover was not selected in line 10, the oﬀspringis created by mutation (lines 20–21). When all λ oﬀspring have been created, the iteration iscompleted by a replacement step (line 25). Implementation of this Family in

Paradis eo : We implement this family of GAs through

Paradiseo ’s “foundries”, which allow to register a set of operators (e.g., several kind of mutations)within a “slot” (e.g., the step at which mutation is called within the algorithm). Before eachcall, it is possible to instantiate a speciﬁc operator among the registered ones, for each slot,thus assembling one of the algorithm instance among all the possible combinations of operators.Note that operators can be simple numbers, like a probability. Operators are referenced withinslots by their indices.Most of the operators we use were already available in

Paradiseo , to the exception of mu-tations operators with indices 1–5 (see below), which we implemented for this study. We alsoimplemented the algorithm 1 as the eoFastGA class , in which to plug the operators.We consider the following operators and parametrizations, which result in a total numberof 1 630 475 diﬀerent conﬁgurations of Algorithm 1. Numbers in brackets indicate the indices ofthe corresponding operators within its slot. All our code is contributed to the

Paradiseo project. lgorithm 1: A Conﬁgurable Family of ( µ + λ ) Genetic Algorithms. Input:

Budget B , conﬁguration ( µ, λ, p c , p m ), choice of the operators and conditionalparameters. Note that P and P (cid:48) are multi-sets, i.e the same point may appearmultiple times; Initialization: P ← InitialSampling( µ ); evaluate the µ points in P ; Evals ← µ ; Optimization: for t = 1 , , , . . . until Evals = B do P (cid:48) ← ∅ ; for i = 1 , . . . , λ do Sample r c ∈ [0 ,

1] u.a.r.; if r c ≤ p c then (cid:0) y ( i, , y ( i, (cid:1) ← SelectC( P ); (cid:0) y (cid:48) ( i, , y (cid:48) ( i, (cid:1) ← Crossover (cid:0) y ( i, , y ( i, (cid:1) ; Sample z ( i, ∈ (cid:0) y (cid:48) ( i, , y (cid:48) ( i, (cid:1) u.a.r. ; Sample r m ∈ [0 ,

1] u.a.r. ; if r m ≤ p m then z ( i, ← Mutation (cid:0) z ( i, (cid:1) ; else z ( i, ← z ( i, ; else z ( i, ← SelectM( P ); z ( i, ← Mutation (cid:0) z ( i, (cid:1) ; Evaluate z ( i, ; Evals ← Evals+1; P (cid:48) ← P (cid:48) ∪ (cid:8) z ( i, (cid:9) ; P ← Replace(

P, P (cid:48) , µ ); InitialSampling ( µ ) : Initialization of the Algorithm (1 option) We only consider in-dependent uniform sampling, i.e. , the µ points are i.i.d. uniform samples. The correspondingParadisEO operator is eoInitFixedLength . Crossover rate p c (6 options): We consider p c ∈ { , . , . , . , . , . } . Being only ableto use the integer and categorical interface for irace , we predeﬁne the set of rates, which irace will see as integers. SelectC ( P ) : Selection of two points for the crossover operation (7 options). Notethat in the implementation, the selection operator (line 11) is called twice to select the twocandidate points. [0] eoRandomSelect() : Uniformly select a point from P (without removing the ﬁrst selectedindividual from the set P used by the second selection). (1 option). [1] eoStochTournamentSelect( k ) : Select a point from P with tournament selection, i.e. , weselect uniformly at random k diﬀerent points in P and the best one of these is selected. k denotes the tournament size as percentage of population ( i.e., k ∈ [0 , k = 0 . [2] eoSequentialSelect() : Select the best point from P (with respect to the objective func-tion value). This operator is sometimes referred to as elitist selection or truncation selec-tion . When called twice, it selects the two distinct best points from P . (1 option). [3] eoProportionalSelect() : Select a point from P with so-called ﬁtness-proportional selec-tion, i.e. , point x ∈ P is chosen with probability f ( x ) / (cid:80) y ∈ P f ( y ). (1 option). [4--6] eoDetTournamentSelect( k ) : Like eoDetTournamentSelect , but k is deterministic.(3 diﬀerent options, each one for k ∈ [2 , , Crossover ( x, y ) : Bivariate Variation Operators (11 options). [0--4] eoUBitXover( b c ) : Uniform crossover with bias (or “preference” in ParadisEO) b c , set-ting (independently for each position i ∈ [1 ..n ]) z i = x i with probability b c and setting z i = y i otherwise. z denotes the oﬀspring element coming from the crossover of x and y .(5 diﬀerent options, b c ∈ [0 . , . , . , . , . [5--9] eoNPtsBitXover( k ) : k -point crossover, which selects i , . . . , i k uniformly at randomand without replacement from [1 ..n ] and sets z i = x i for i ∈ [1 ..i ] ∪ [ i + 1 ..i ] ∪ . . . andsets z i = y i for i ∈ [ i + 1 ..i ] ∪ [ i + 1 ..i ] ∪ . . . (5 diﬀerent options, k ∈ [1 , , , , [10] eo1PtBitXover() : Classic 1-point crossover. (1 option). Mutation probability p m (6 options): We consider p m ∈ { , . , . , . , . , . } . Mutation ( x ) : Univariate Variation Operator (11 options) All mutation operators areunary unbiased in the sense proposed in [LW12]. For a compact representation, we follow thecharacterization suggested in [DDY20] and deﬁne the mutation operators via the distributionsthat they deﬁne over the possible mutation strengths k ∈ [0 ..n ]. After sampling k from theoperator-speciﬁc distribution, the k -bit ﬂip operator, ﬂip k ( · ), is applied; it ﬂips the entries in k uniformly chosen, pairwise diﬀerent bits ( i.e. , the k bits are chosen u.a.r. without replacement).7

0] eoUniformBitMutation() : The “uniform” mutation operator, which samples k uniformlyat random in the set [0 ..n ]. (1 option). [1] eoStandardBitMutation( p = 1 /n ) : This is the standard bit mutation with mutation rate p . It chooses k from the binomial distribution B ( n, p ). (1 option). [2] eoConditionalBitMutation( p = 1 /n ) : A conditional standard bit mutation operatorwith mutation rate p . It chooses k (cid:48) from B ( n − , p ) and applies the ﬂip k ( · ) operator with k = k (cid:48) + 1. (1 option). [3] eoShiftedBitMutation( p = 1 /n ) : The “shifted” standard bit mutation with mutationrate p , suggested in [CD18]. It samples k (cid:48) from the binomial distribution B ( n, p ). When k (cid:48) = 0, it uses k = 1 and it uses k = k (cid:48) otherwise. (1 option). [4] eoNormalBitMutation( p , σ ): The “normal” mutation operator suggested in [YDB19]. Itsamples k from the normal distribution N ( pn, σ ). When k > n , k is replaced by a valuechosen uniformly at random in the set [0 ..n ]. (1 option, p = 1 /n and σ = 1 . [5] eoFastBitMutation( β ) : The “fast” mutation operator suggested in [DLMN17]. It samples k (cid:48) from the power-law distribution P [ L = k ] = ( C βn/ ) − k − β with C βn/ = (cid:80) n/ i =1 i − β . When k (cid:48) is larger than n , it samples a uniform value k in [0 ..n ], and it uses k = k (cid:48) otherwise. (1option, β = 1 . [6--10] eoDetSingleBitFlip( k ) : Deterministically applies ﬂip k ( · ). (5 diﬀerent options, k ∈ [1 , , , , SelectM ( P ) : Selection of one point for the mutation operation if crossover was notchosen (7 options) We essentially have the same selection operators as for crossover. Theonly diﬀerence is that we select only one point instead of two.

Replace ( P, P (cid:48) , µ ) : Replacement of population (11 options) [0] eoPlusReplacement() : The best µ points of the multiset P ∪ P (cid:48) are chosen. (1 option). [1] eoCommaReplacement() : The best µ points of the oﬀspring multiset P (cid:48) are chosen. (1option). [2] eoSSGAWorseReplacement() : The min ( λ, µ ) points of the oﬀspring multiset P (cid:48) replace theworst points in P . (1 option). [3--5] eoSSGAStochTournamentReplacement( k ) : Like eoSSGADetTournamentReplacement() k being the the tournament size as percentage of population. (3 diﬀerent options, k ∈ [0 . , . , . [6--10] eoSSGADetTournamentReplacement( k ) : The µ points are selected through tourna-ment selection. Each tournament involves k uniformly chosen points in P ∪ P (cid:48) and the bestones of these k points is selected. This procedure is repeated µ times, each time removingan already selected point from the multi-set P ∪ P (cid:48) . (5 diﬀerent options, k ∈ [2 , , , , µ + λ ) GAs.The set of all combinations generates the algorithm design space on which we let irace searchfor the conﬁguration(s) that best solve a given problem instance.8 aseline Algorithms We consider four baseline algorithms, against which we compare theresults of the automated design:1. ( λ + λ ) EA: no crossover, plus replacement, standard bit mutation, random selector formutations.2. ( λ + λ ) fEA: no crossover, plus replacement, fast bit mutation, random selector for mu-tations.3. ( λ + λ ) xGA: sequential selections, uniform crossover, standard bit mutation, plus re-placement, p c = 0 . b c = 0 . λ + λ ) sequential selections, 1-point crossover, standard bit mutation, plusreplacement, p c = 0 . b c = 0 . We evaluate our automated algorithm design pipeline on the W-model functions originallysuggested in [WW18]. In a nutshell, the W-model is a benchmark problem generator, whichallows to tune diﬀerent characteristics of the problems, see below for a description. We selectedfrom this family of benchmark problems the 19 instances suggested in [WCLW20], which aresummarized in Table 1. Note here that the description diﬀers from that given in [WCLW20],since we used the implementation within

IOHexperimenter , which was made available in thecontext of the work [DYH + +

20] to superpose the W-model transformations to diﬀerent op-timization problems. The instances selected in [WCLW20], however, were only selected fromtransformations applied to the

OneMax problem OM : { , } −→ [0 ..n ] , x −→ (cid:80) ni =1 x i . The

One-Max problem has a very smooth and non-deceptive ﬁtness landscape. Due to the well-knowncoupon collector eﬀect [DP09], it is relatively easy to make progress when the function values aresmall, and the probability to obtain an improving move decreases considerably with increasingfunction values. The complexity of the

OneMax problem can be considerably increased throughthe following W-model transformations. (1) Neutrality W ( ., µ, ., . ): The bit string ( x , ..., x n ) is reduced to a string ( y , ..., y m ) with m := n/µ , where µ is a parameter of the transformation. For each i ∈ [ m ] the value of y i is themajority of the bit values in the size- µ substring ( x ( i − µ , x ( i − µ , ..., x iµ ) of x . That is, y i = 1if and only if there are at least µ/ n/µ / ∈ N , the last bits of x arecopied to y . (2) Epistasis W ( ., ., ν, . ): Epistasis introduces local perturbations to the bit strings. Itﬁrst “cuts” the input string ( x , ..., x n ) into subsequent blocks of size ν . Using a permutation e ν : { , } ν −→ { , } ν , each substring ( x ( i − ν +1 , x ( i − ν +2 , ..., x iν ) is mapped to another string( y ( i − ν +1 , y ( i − ν +2 , ..., y iν ) = e ν (( x ( i − ν +1 , x ( i − ν +2 , ..., x iν )). The permutation e ν is chosenin a way that Hamming-1 neighbors are mapped to strings of Hamming distance at least ν − (3) Ruggedness and Deceptiveness W ( ., ., ., γ ): This layer perturbs the ﬁtness values,by applying a permutation σ ( γ ) to the possible ﬁtness values [0 ..n ]. The parameter γ can bethought of as a parameter which controls the distance of the permutation to the identity. Thepermutations σ ( γ ) are chosen in a way such that the “hardness” of the instances monotonicallyincreases with increasing γ , see [WW18] for details. Our test bed is the automated design of Algorithm 1 with the options speciﬁed in Sec. 3.2 andwith the objective to maximize the AUC as deﬁned in Sec. 3.1, and this for each of the 19 prob-9able 1: Test problems on which the pipeline is evaluated, taken from [WCLW20]. In column“best” we list the baseline algorithm with largest average AUC value, reported in column AUC b .AUC i is the average AUC of the elite conﬁgurations suggested by the 15 independent runs of irace . AUC-values are w.r.t. to at least 50 validation runs and “rel.” indicates the relative gain(AUC i − AUC b ) / AUC b .FID dim µ ν γ V max best AUC b AUC i rel.1 20 2 6 10 10 xGA 8378 8740 4%2 20 2 6 18 10 fEA 8402 8754 4%3 16 1 5 72 16 fEA 8352 8397 1%4 48 3 9 72 16 EA 8299 8914 7%5 25 1 23 90 25 fEA 8003 8510 6%6 32 1 2 397 32 1pt 7055 7311 4%7 128 4 11 0 32 1pt 6833 8183 20%8 128 4 14 0 32 EA 6885 8499 23%9 128 4 8 128 32 xGA 8154 8786 8%10 50 1 36 245 50 fEA 7216 8122 13%11 100 2 21 256 50 EA 8314 9139 10%12 150 3 16 613 50 EA 8034 8730 9%13 128 2 32 256 64 fEA 8076 9345 16%14 192 3 21 16 64 fEA 6173 7677 24%15 192 3 21 256 64 fEA 6797 8292 22%16 192 3 21 403 64 fEA 7273 8592 18%17 256 4 52 2 64 xGA 6935 9028 30%18 75 1 60 16 75 EA 5958 7089 19%19 150 2 32 4 75 EA 7399 8717 18%lems listed in Table 1. These instances of the W-model problem were suggested in [WCLW20]based on an empirical study using clustering of algorithm performance data, with the goal toselect a diverse collection of benchmark problems. Note here that we tune the algorithms foreach problem individually. That is, we apply our algorithm design pipeline 19 independenttimes.We ﬁx the population sizes to λ = µ = 5, for the search performed by irace and for ourbaseline algorithms.For each use-case, we set the budget of the algorithms to 5 n function evaluations (FEs).To compute the AUC, we evaluate the performance at 100 linearly distributed budgets b , . . . , b ∈ [1 , n ] and at 100 linearly distributed target values v , . . . , v ∈ [0 , V max ]. Lin-earization computes the bucket index i = (cid:98) ( x − x min ) / ( x max − x min ) · (cid:99) for both budgets andtargets.To ﬁnd the best algorithm design, we allow irace a budget of 100 000 target runs and weensure that it performed at least 50 independent validation runs for the elite conﬁgurations. Werun this irace search 15 independent times, to check the robustness of its selection. To comparethis performance to the four baseline algorithms, we run each of these 50 independent times,on each of the 19 test problems.Running irace with this budget on 19 problems on a computer with four Intel CPU coresi5-7300HQ at 2.50GHz and Crucial P1 solid-state disks takes approximately 3 hours. Comparison of AUC Values by Function

Table 1 compares the AUC values of the bestout of the four baseline algorithms against that of the elite conﬁguration suggested by irace . We10able 2: Conﬁguration of the best out of the elite recommendation suggested by 15 independentruns of irace , for each of the 19 benchmark problems speciﬁed in Table 1, and compared againstthe conﬁguration of the four baseline algorithms. The “op.” column gives the number of optionsper operator. All other integer values correspond to the indices with which the diﬀerent optionsare listed in Sec. 3.2 and “-” indicates a non-applicable element (for instance, no crossoveroperator is used when p c = 0). Operator op. p c SelectC

Crossover

11 1 2 8 1 2 - 3 - 2 2 10 5 2 9 2 10 2 2 2 - - 2 5 p m SelectM

Mutation

11 8 9 3 9 7 6 10 10 10 9 10 9 10 8 8 10 10 8 9 1 5 1 1

Replace

11 8 9 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Table 3: Distribution of operators variants recommended by 15 runs of irace , for problems 5(left), problem 17 (center) and all problems (right). The most selected indices are highlightedin bold and the darker the background color, the more often the operator instance is selected.Empty cells indicates that irace never selected the operator instance, cells with a “-” entrymarks indices which are not deﬁned for this operator.

Problem 5 Problem 17 All problemsOp. index 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10Pc 5 6 1 3 - - - - - - 1 14 - - - - - - 18 74 56 57 65 - - - - - -SelectC 1 5 2 4 3 - - - - 15 - - - - 18 16 141 19 18 30 28 - - - -Crossover 1 5 2 2 1 1 1 2 15 11 29 78 24 6 8 11 19 18 23 43Pm 2 3 10 - - - - - - 15 - - - - - - 4 9 25 49 183 - - - - - -SelectM 14 1 - - - - 6 2 1 2 2 2 - - - - 26 17 61 36 27 41 62 - - - -Mutation 2 5 1 1 2 4 1 9 4 1 2 7 3 1 2 6 10 62 81 96Replace 15 15 245 2 3 6 2 1 2 5 1 3 observe that, for each of the 19 functions, the elite conﬁgurations suggested by irace performbetter than the best baseline algorithm. We report in Table 1 the average values, but thediﬀerences between the individual irace runs are very small, less than 2.1% diﬀerence in AUCvalue between the best and the worst elite conﬁguration for all 19 problems, and less than1% performance diﬀerence for 9 out of the 19 functions. The relative advantage of the irace recommendations over the best baseline algorithms varies between 1% and 30%. When lookingat each of the 15 elite conﬁgurations suggested per function, the best relative advantage is31% for F17, whereas two of the irace elites performed worse than the best of the four baselinealgorithms on function F3. For all other functions, all 15 irace elites have a better AUC valuethan the best of the four baseline algorithms. However, although we see a clear advantage ofthe irace conﬁgurations, we should keep in mind that the irace conﬁgurations are speciﬁcallytuned for each function, whereas the conﬁgurations of the four baseline algorithms are identicalfor all 19 W-model functions.Unfortunately, our pipeline does not yet allow to tune a single best solver, i.e. , a singleconﬁguration which maximized the AUC under the aggregated ECDF curve. Adding this func-tionality is a straightforward extension of our framework, which we plan to address in futurework. The key challenge in implementing this extension is that

Paradiseo does not have thefeature to easily reset on the ﬂy the states of solvers between two runs on diﬀerent problems.

Comparison of the Conﬁgurations.

Table 2 summarizes the best of the 15 elite conﬁgu-rations that were suggested by irace and compares them against the four baseline algorithms.Table 3 shows the distribution of operators chosen by the 15 irace runs. For the latter, wehave chosen problem 17 as an example because we observed here the largest relative gain (seeTable 1). We have added problem 5 for comparison, because the distribution of operators11

20 40 60 80 100 120 140 1601416182022242628

EA fEAxGAirace elite for problem 5

Function Evaluations B e s t - s o - f a r f ( x ) - v a l u e EA fEAxGA 1ptGAirace elite for problem 17

Function Evaluations B e s t - s o - f a r f ( x ) - v a l u e Figure 3: Convergence plots for the baseline algorithms and the elite conﬁgurations suggestedby irace , on problem 5 (left) and problem 17 (right).suggested for it is very distinct from that of problem 17.It is worth noting that each operator is selected at least once in the 19 ×

15 elite conﬁgurationssuggested by irace (Table 3, right), which seems to conﬁrm that i) diﬀerent operators work wellon diﬀerent problems and ii) that irace searches the full design space, giving some indicationthat it is not too large or too complex for automated tuning approaches.We can see that, among all the best conﬁgurations proposed by irace across 15 runs, none aresimilar to one of the baseline algorithms. The probability of mutation p m is most frequently setto higher values and the most often chosen mutation is deterministic bit ﬂip with larger numberof bits (index 10 in the Mutation slot, which is the ﬂip mutation operator). This indicates thatlarger mutation strengths could have been worth investigating, a result that has surprised us,since in most benchmark studies we see small mutation rates as defaults. The results conﬁrmthe superiority of the plus replacement (id. 0 in the Replace slot) and support the use of anelitist selection for the crossover candidates (id. 2 in

SelectC ). We can also see that the uniformcrossover with b c = 0 . Crossover ) is more often chosen, like a small probability ofperforming a crossover (id. 1 in p c ).For some problems, irace almost always suggest a similar algorithm. On problem 17, forexample, it often selects a GA with a large probability of applying uniform crossover in combi-nation with deterministic bit ﬂip mutations. For some other problems, a larger variance on theselected operators can be observed. For instance on problem 5, irace selects a high mutationprobability along with an elitist mutation selection, but does not show a clear preference forthe other slots.These results support the idea that there is not always a single best solver (i.e., “No FreeLunch”), even when considering limited design and benchmarking spaces. We also see that someproblems seem to require certain design choices, whereas others can be solved well by a broadrange of conﬁgurations. A more detailed analysis of how these preferences correlate with thecharacteristics of the problems should oﬀer plenty of interesting insights, but is left for futurework. Fixed-budget solution qualities

Figure 3 shows two examples of convergence plots, wherewe plot the values of the best solutions found so far against the number of objective functionevaluations performed, for each baseline algorithm and for the best elite conﬁguration selectedby irace . Problems 5 and 17 are chosen to allow for comparison with Table 2.We observe that the elite conﬁguration on problem 17 is largely more eﬃcient than any ofthe baseline algorithms. However, on problem 5, the elite conﬁguration is only the most eﬃcientuntil 140 evaluations. It is selected nonetheless, because we consider the AUC of the 2D ECDF,12hich takes into account the average performance (across all budgets and targets) rather thanthe terminal budget of the best target. We believe that, whatever the performance metric wechoose, there will always exists such artifacts, where some algorithm would be the best, hadwe chosen another metric. It is clear, however, that even in this plot the elite conﬁgurationperforms better most of the time . By interfacing the three state-of-the-art benchmarking modules from the evolutionary compu-tation literature, irace [LDC + Paradiseo [KMRS02], and

IOHproﬁler [DWY + where to look for interesting structures.The modular design of the pipeline and its components makes our approach very broadlyapplicable. It is not restricted to particular types of problems nor to speciﬁc algorithms. In par-ticular, extensions to continuous or mixed-integer problems are rather straightforward. Indeed,the Paradiseo framework is designed to separate operators which are independent of the encoding(selection, replacement, etc.) from operator which depends on it (mutation, crossover, etc.), al-lowing for easy reuse of components and extensions to other algorithmic paradigms (estimationof distribution, local search, multi-objective, etc.). Additionally, the

IOHexperimenter providesloggers for vectorial encodings and benchmarks for both numerical and bitstring encodings.Our work is partially motivated by an industrial application that requires an automatedconﬁguration of hardware products. However, we believe that our pipeline is not only interestingfor such practical purposes. For researchers, our pipeline oﬀers an elegant way of assessing newalgorithm operators and their interplay with already existing ones.In terms of further development, we plan to add the necessary features which would i) allowfor running the same algorithm on multiple problems, while using a single logger that aggregatesthe results and would ii) support irace ’s interface for numerical parameters (additionally tocategorical and integer ones).We then plan to test the approach on diﬀerent algorithms families, with a possible extensionto generic “bottom-up” hybridization grammars [MMLIS13] and studies on the most eﬃcientalgorithms design (e.g., on the correlations between elite algorithms’ operators).We also plan to extend the framework by integrating feature extraction methods that usealgorithm trajectory data [DLV +

19, BPRH19] and/or samples speciﬁcally made for exploratorylandscape analysis [MBT +

11, KT16] to couple the algorithm design to such information, similarto the per-instance conﬁguration approaches made in [HHHL06, BDSS17].Our long-term vision is a pipeline for the automated design of algorithms which adjust theirbehavior during the optimization process, by taking into account information accumulated sofar, similar to the dynamic algorithm conﬁgurations studied under the notion of parametercontrol [KHE15]. In contrast to the static designs considered in this work, the automateddesign of dynamic algorithms requires to select suitable update rules (e.g. based on time, onprogress, on self-adaption, etc.).Finally, we also consider interesting the idea to provide a user-friendly front-end which allows13sers to assemble a benchmark study by selecting (e.g. through a graphical user interface) one ormore algorithms and problems, the budget, etc. and then passing on this study to an automatedinterface which tunes (if desired) and runs the algorithm(s) and then automatically directs itsusers to the data summary and visualization platform

IOHanalyzer , where the results of theempirical study can be analyzed. We believe that such a pipeline would greatly improve thedeployment of evolutionary methods in practice.

References [AMS +

15] Carlos Ans´otegui, Yuri Malitsky, Horst Samulowitz, Meinolf Sellmann, and Kevin Tierney,

Model-based genetic algorithms for algorithm conﬁguration , Proc. of International Conference on ArtiﬁcialIntelligence (IJCAI’15), AAAI Press, 2015, pp. 733–739.[BBFKK10] Thomas Bartz-Beielstein, Oliver Flasch, Patrick Koch, and Wolfgang Konen,

SPOT: A toolbox forinteractive and automatic tuning in the R environment , Proc. of the 20. Workshop ComputationalIntelligence, Universit¨atsverlag Karlsruhe, 2010, pp. 264–273.[BDB +

20] Thomas Bartz-Beielstein, Carola Doerr, Jakob Bossek, Sowmya Chandrasekaran, Tome Eftimov,Andreas Fischbach, Pascal Kerschke, Manuel L´opez-Ib´a˜nez, Katherine M. Malan, Jason H. Moore,Boris Naujoks, Patryk Orzechowski, Vanessa Volz, Markus Wagner, and Thomas Weise,

Bench-marking in optimization: Best practice and open issues , CoRR abs/2007.03488 (2020).[BDSS17] Nacim Belkhir, Johann Dreo, Pierre Sav´eant, and Marc Schoenauer,

Per instance algorithm conﬁgu-ration of CMA-ES with limited budget , Proc. of Genetic and Evolutionary Computation Conference(GECCO’17), ACM, 2017, pp. 681–688.[BLIS16] Leonardo C. T. Bezerra, Manuel L´opez-Ib´a˜nez, and Thomas St¨utzle,

Automatic component-wisedesign of multi-objective evolutionary algorithms , IEEE Transactions on Evolutionary Computation (2016), no. 3, 403–417.[BLIS20] , Automatically designing state-of-the-art multi- and many-objective evolutionary algorithms ,Evolutionary Computation (2020), no. 2, 195–226.[BPRH19] Luk´as Bajer, Zbynek Pitra, Jakub Repick´y, and Martin Holena, Gaussian process surrogate modelsfor the CMA evolution strategy , Evolutionary Computation (2019), no. 4, 665–697.[CD18] Eduardo Carvalho Pinto and Carola Doerr, Towards a more practice-aware runtime analysis ofevolutionary algorithms , CoRR abs/1812.00493 (2018).[CEP08] T. Cloete, Andries Petrus Engelbrecht, and Gary Pampara,

Cilib: A collaborative framework forcomputational intelligence algorithms - part II , Proc. of the International Joint Conference on NeuralNetworks (IJCNN’08), IEEE, 2008, pp. 1764–1773.[CMT04] S´ebastien Cahon, Nordine Melab, and El-Ghazali Talbi,

Paradiseo: A framework for the reusabledesign of parallel and distributed metaheuristics , J. Heuristics (2004), no. 3, 357–380, Latestrelease available on https://nojhan.github.io/paradiseo/ .[CSC +

19] Borja Calvo, Ofer M. Shir, Josu Ceberio, Carola Doerr, Hao Wang, Thomas B¨ack, and Jose A.Lozano,

Bayesian performance analysis for black-box optimization benchmarking , Proc. of Geneticand Evolutionary Computation Conference (GECCO’19, Companion), ACM, 2019, pp. 1789–1797.[DDY20] Benjamin Doerr, Carola Doerr, and Jing Yang,

Optimal parameter choices via precise black-boxanalysis , Theoretical Computer Science (2020), 1–34.[DLMN17] Benjamin Doerr, Huu Phuoc Le, R´egis Makhmara, and Ta Duy Nguyen,

Fast genetic algorithms ,Proc. of Genetic and Evolutionary Computation Conference (GECCO’17), ACM, 2017, pp. 777–784.[DLV +

19] Bilel Derbel, Arnaud Liefooghe, S´ebastien V´erel, Hern´an E. Aguirre, and Kiyoshi Tanaka,

Newfeatures for continuous exploratory landscape analysis based on the SOO tree , Proc. of Foundationsof Genetic Algorithms (FOGA’19), ACM, 2019, pp. 72–86.[DP09] Devdatt P. Dubhashi and Alessandro Panconesi,

Concentration of measure for the analysis of ran-domised algorithms , Cambridge University Press, 2009.[DWY +

18] Carola Doerr, Hao Wang, Furong Ye, Sander van Rijn, and Thomas B¨ack,

IOHproﬁler: A Bench-marking and Proﬁling Tool for Iterative Optimization Heuristics , CoRR abs/1810.05281 (2018),Available at http://arxiv.org/abs/1810.05281 . A more up-to-date documentation of IOHproﬁleris available at https://iohprofiler.github.io/ .[DYH +

20] Carola Doerr, Furong Ye, Naama Horesh, Hao Wang, Ofer M. Shir, and Thomas B¨ack,

Benchmark-ing discrete optimization heuristics with iohproﬁler , Applied Soft Computing (2020), 106027. ECF]

Evolutionary Computation Framework (ECF), howpublished = http: // ecf. zemris. fer. hr/ ,note = Last visited: 2021-02-04 .[EPK20] Tome Eftimov, Gasper Petelin, and Peter Korosec,

Dsctool: A web-service-based framework forstatistical comparison of stochastic optimization algorithms , Appl. Soft Comput. (2020), 105977.[FDG +

12] F´elix-Antoine Fortin, Fran¸cois-Michel De Rainville, Marc-Andr´e Gardner, Marc Parizeau, and Chris-tian Gagn´e,

DEAP: Evolutionary algorithms made easy , Journal of Machine Learning Research (2012), 2171–2175.[FGLP11] Carlos M. Fonseca, Andreia P. Guerreiro, Manuel L´opez-Ib´a˜nez, and Lu´ıs Paquete, On the com-putation of the empirical attainment function , Proc. of Evolutionary Multi-Criterion Optimization(EMO’11), LNCS, vol. 6576, Springer, 2011, pp. 106–120.[GP06] Christian Gagn´e and Marc Parizeau,

Genericity in evolutionary computation software tools: Princi-ples and case study , International Journal on Artiﬁcial Intelligence Tools (2006), no. 2, 173–194.[HAR +

20] Nikolaus Hansen, Anne Auger, Raymond Ros, Olaf Mersmann, Tea Tuˇsar, and Dimo Brockhoﬀ,

COCO: a platform for comparing continuous optimizers in a black-box setting , Optimization Meth-ods and Software (2020), 1–31.[HHHL06] Frank Hutter, Youssef Hamadi, Holger H. Hoos, and Kevin Leyton-Brown,

Performance predictionand automated tuning of randomized and parametric algorithms , Proc. of Principles and Practice ofConstraint Programming (CP’06), LNCS, vol. 4204, Springer, 2006, pp. 213–228.[HHLB11] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown,

Sequential model-based optimization forgeneral algorithm conﬁguration , Proc. of Learning and Intelligent Optimization (LION’11), Springer,2011, pp. 507–523.[Jen]

Jenetics, howpublished = https: // jenetics. io/ , note = Last visited: 2021-02-04 .[KHE15] Giorgos Karafotias, Mark Hoogendoorn, and A.E. Eiben,

Parameter control in evolutionary al-gorithms: Trends and challenges , IEEE Transactions on Evolutionary Computation (2015),167–187.[KHNT19] Pascal Kerschke, Holger H. Hoos, Frank Neumann, and Heike Trautmann, Automated algorithmselection: Survey and perspectives , Evolutionary Computation (2019), no. 1, 3–45.[KMRS02] Maarten Keijzer, J. J. Merelo, G. Romero, and M. Schoenauer, Evolving objects: A general purposeevolutionary computation library , Artiﬁcial Evolution (2002), 829–888, Latest release availableon https://nojhan.github.io/paradiseo/ .[KT16] Pascal Kerschke and Heike Trautmann,

The r-package FLACCO for exploratory landscape analysiswith applications to multi-objective optimization problems , Proc. of IEEE Congress on EvolutionaryComputation (CEC’16), IEEE, 2016, pp. 5262–5269.[LDC +

16] Manuel L´opez-Ib´a˜nez, J´er´emie Dubois-Lacoste, Leslie P´erez C´aceres, Mauro Birattari, and ThomasSt¨utzle,

The irace package: Iterated racing for automatic algorithm conﬁguration , Operations Re-search Perspectives (2016), 43–58.[LIKS17] Manuel L´opez-Ib´anez, Marie-El´eonore Kessaci, and Thomas G St¨utzle, Automatic design of hybridmetaheuristic from algorithmic components , Tech. report, 2017.[LJD +

16] Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar,

Hyperband:A novel bandit-based approach to hyperparameter optimization , arXiv preprint arXiv:1603.06560(2016).[LPC12] Nuno Louren¸co, Francisco Pereira, and Ernesto Costa,

Evolving Evolutionary Algorithms , Proc.of Genetic and Evolutionary Computation Conference (GECCO’12, Companion Material), ACM,2012, pp. 51–58.[LS12] Manuel L´opez-Ib´a˜nez and Thomas St¨utzle,

The automatic design of multiobjective ant colony opti-mization algorithms , IEEE Trans. Evol. Comput. (2012), no. 6, 861–875.[LW12] Per Kristian Lehre and Carsten Witt, Black-box search by unbiased variation , Algorithmica (2012), 623–642.[MBT +

11] Olaf Mersmann, Bernd Bischl, Heike Trautmann, Mike Preuss, Claus Weihs, and G¨unter Rudolph,

Exploratory landscape analysis , Proc. of Genetic and Evolutionary Computation Conference(GECCO’11), ACM, 2011, pp. 829–836.[MLDS14] Franco Mascia, Manuel L´opez-Ib´a˜nez, J´er´emie Dubois-Lacoste, and Thomas St¨utzle,

Grammar-based generation of stochastic local search heuristics through automatic algorithm conﬁguration tools ,Comput. Oper. Res. (2014), 190–199.[MMLIS13] Marie-El´eonore Marmion, Franco Mascia, Manuel L´opez-Ib´anez, and Thomas St¨utzle, Towardsthe Automatic Design of Metaheuristics , MIC 2013 - 10th Metaheuristics International Conference(Singapore, Singapore) (Hoong Chuin Lau, G¨unther Raidl, , and Pascal Van Hentenryck, eds.),Proceedings of the 10th Metaheuristics International Conference (MIC2013), August 2013, pp. 1–3. NDV15] Antonio J. Nebro, Juan J. Durillo, and Matthieu Vergne,

Redesigning the jmetal multi-objectiveoptimization framework , Proceedings of the Companion Publication of the 2015 Annual Confer-ence on Genetic and Evolutionary Computation (New York, NY, USA), GECCO Companion ’15,Association for Computing Machinery, 2015, p. 1093–1100.[RCN98] Conor Ryan, John James Collins, and Michael O Neill,

Grammatical evolution: Evolving programsfor an arbitrary language , European Conference on Genetic Programming, Springer, 1998, pp. 83–96.[RT18] J´er´emy Rapin and Olivier Teytaud,

Nevergrad - A gradient-free optimization platform , https://GitHub.com/FacebookResearch/Nevergrad , 2018.[SB15] Kate Smith-Miles and Simon Bowly, Generating new test instances by evolving in instance space ,Comput. Oper. Res. (2015), 102–113.[SL19] Eric O. Scott and Sean Luke, ECJ at 20: toward a general metaheuristics toolkit , Proc. of Ge-netic and Evolutionary Computation Conference (GECCO’19, Companion Material), ACM, 2019,pp. 1391–1398.[WCLW20] Thomas Weise, Yan Chen, Xinlu Li, and Zhize Wu,

Selecting a diverse set of benchmark instancesfrom a tunable model problem for black-box discrete optimization algorithms , Appl. Soft Comput. (2020), 106269.[WKB +

14] Stefan Wagner, Gabriel Kronberger, Andreas Beham, Michael Kommenda, Andreas Scheibenpﬂug,Erik Pitzer, Stefan Vonolfen, Monika Koﬂer, Stephan Winkler, Viktoria Dorfer, and Michael Af-fenzeller,

Architecture and design of the heuristiclab optimization environment , Topics in IntelligentEngineering and Informatics, vol. 6, Springer, 2014, pp. 197–261.[WVY +

20] Hao Wang, Diederick Vermetten, Furong Ye, Carola Doerr, and Thomas B¨ack,

Iohanalyzer: Per-formance analysis for iterative optimization heuristic , CoRR abs/2007.03953 (2020).[WW18] Thomas Weise and Zijun Wu,

Diﬃcult features of combinatorial optimization problems and thetunable w-model benchmark problem for simulating them , Proc. of Genetic and Evolutionary Com-putation Conference (GECCO’18, Companion Material), ACM, 2018, pp. 1769–1776.[XHHL12] Lin Xu, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown,

Evaluating component solvercontributions to portfolio-based algorithm selectors , Proc. of Theory and Applications of SatisﬁabilityTesting (SAT’12), Lecture Notes in Computer Science, vol. 7317, Springer, 2012, pp. 228–241.[YDB19] Furong Ye, Carola Doerr, and Thomas B¨ack,

Interpolating local and global search by controlling thevariance of standard bit mutation , Proc. of IEEE Congress on Evolutionary Computation (CEC’19),IEEE, 2019, pp. 2292–2299.[YWDB20] Furong Ye, Hao Wang, Carola Doerr, and Thomas B¨ack,

Benchmarking a ( µ + λ ) genetic algorithmwith conﬁgurable crossover probability , Proc. of Parallel Problem Solving from Nature (PPSN’20),LNCS, vol. 12270, Springer, 2020, pp. 699–713.[ZR20] Martin Zaeﬀerer and Frederik Rehbach, Continuous optimization benchmarks by simulation , Proc.of Parallel Problem Solving from Nature (PPSN’20), Lecture Notes in Computer Science, vol. 12269,Springer, 2020, pp. 273–286. Comparison of frameworks

Table 4: Comparison of software frameworks for evolutionary computation. Rank is based onan adhoc aggregation of subjective metrics based on performance of the programming language,activity of the project (number of contributors), breadth of features (number of modules, numberof lines of code), and ease of integration in industrial projects (license). Data has been gatheredin 2019. N Perf Activit FeaturLicenc

Score Name Language Type UpdatLicense Contrib. kloc ParadisEO

C++ Framework 2019 LGPLv2 33 822 jMetal

Java Framework 2019 MIT 29 603

ECF

C++ Framework 2017 MIT 19 154

OpenBeagle

C++ Framework 2017 LGPLv3 4 485

Jenetics

Java Framework 2019 Apachev2 10 476

ECJ

Java Framework 2018 AFLv3 33 547

DEAP

Python Framework 2019 LGPLv3 45 98

GP.NET

DGPF

Java Framework 2007 LGPLv2 610

JGAP

Java Library 2015 LGPLv2 111

Watchmaker

Java Framework 2013 Apachev2 212

GenPro

Java Framework 2009 Apachev2 113

GAlib

C++ Library 1998 MIT 114

PyBrain

Python Module 2017 MIT 3315

JCLEC

Java Framework 2014 ? 116

HeuristicLab

GPE

JGAlib

Java Library 2004 ? 119

CIlib

Scala Framework 2019 Apachev2 1720 pycma

Python Solver 2019 BSD 421

PyEvolve

Python Framework 2015 PSF 1222

GPLAB

Matlab Library 2018 LGPLv2 823

Clojush

Clojure Framework 2019 EPLv1 1724 pySTEP

Python Framework 2013 MIT 125 µGP3

C++ Framework 2016 GPLv2 226

Pyvolution

Python Framework 2012 Apachev2 127

PISA

C++ Library 2008 * 428

EvoJ

Java Framework 2015 CC-BY-NC-SA-3 129

Galapagos

Java Framework 2013 GPLv2 130 branecloud

JAGA

Java Framework 2008 GPLv2 132

PMDGP

C++ Framework 2002 GPLv2 133

GPC++

C++ Framework 1997 GPLv2 234

PonyGE

Python Framework 2014 ? 335

Platypus

Python Framework 2019 GPLv3 936

DCTG-GP

Prolog Library 2001 ? 137

Desdeo

Python Framework 2019 MPLv2 638

PonyGE2

Python Framework 2018 GPLv3 939

EvoGrad

Python Framework 2019 Proprietary 1 Average AUC values

Table 5: Average AUC values of the elites returned by irace in each of the 15 independent runsand of the four baseline algorithms on each of the 19 W-model instances

FID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1ptGA EA fEA xGA

AUC of the elite returned by irace run number Baseline Algorithms

Figure 4: Distances between AUCs of elite algorithms and baseline algorithms.18

Diagram of

Paradis eo classes Replacement Selection<>Mutations <>Crossovers eoFastGA +rate_crossover: double+rate_mutation: double+crossover: eoQuadOp+select_aftercross: eoSelectOne+select_cross: eoSelectOne+select_mut: eoSelectOne+mutation: eoMonOp+replace: eoReplacement eoRandomSelecteoSequentialSelecteoDetTournamentSelect +tour_size = [2,6,… 11[ eoStochTournamentSelect +tour_rate = 0.5 eoProportionalSelecteo1PtBitXovereoUBitXover +preference = [0.1, 0.3,… 1[ 5 eoNPtsBitXover +num_points = [1,3,… 10[ 115 eoDetSingleBitFlip +num_bits = [1,2,…11[ [0, 0.2, 0.4,…1] eoUniformBitMutation +rate = 1 eoStandardBitMutation +rate = 1 eoConditionalBitMutation +rate = 1 eoShiftedBitMutation +rate = 1 eoNormalBitMutation +rate = 1 eoFastBitMutation +rate = 1 115 3717 eoPlusReplacementeoCommaReplacementeoSSGAWorseReplacementeoSSGAStochTournamentReplacement +tour_rate = [0.51,0.71,0.91] eoSSGADetTournamentReplacement +tour_rate = [2,4,…11[311 5

Figure 5: Summary UML diagram of the FastGA family of algorithms, as modeled with

Par-adiseo classes. Aggregation arrows shows the cardinality of instances (arrow tail side) and slots(arrow head side) involved in the ﬁnal combination. No cardinality is indicated when it equalsone. 19

Convergence plots for all problems

The following 19 ﬁgures shows the convergence plots of the baseline algorithms against the bestelite selected by irace . Algorithms are denoted in the legend by the set of indices for each slots,using the following code (see Table 2 for the corresponding algorithms):P: population size (always 5 in this study),C: crossover probability,s: crossover selector,c: crossover,a: selector after crossover (always 0 in this study),M: mutation probability,u: mutation selector,m: mutation,r: replacement,O: stopping criterion (always 0 in this study).20

20 40 60 80 10066.577.588.59

FastGA_P=5_C=0_s=0_c=0_a=0_M=0_u=0_m=1_r=0_O=0 FastGA_P=5_C=0_s=0_c=0_a=0_M=0_u=0_m=5_r=0_O=0FastGA_P=5_C=1_s=2_c=1_a=0_M=2_u=2_m=8_r=8_O=0 FastGA_P=5_C=2_s=2_c=2_a=0_M=2_u=2_m=1_r=0_O=0FastGA_P=5_C=2_s=2_c=5_a=0_M=2_u=2_m=1_r=0_O=0

Function Evaluations B ee