Leveraging Benchmarking Data for Informed One-Shot Dynamic Algorithm Selection
Furong Ye
LIACS, Leiden University, Leiden, Netherlands
Carola Doerr
Sorbonne Université, CNRS, LIP6, Paris, France
Thomas Bäck
LIACS, Leiden University, Leiden, Netherlands
ABSTRACT
A key challenge in the application of evolutionary algorithms in practice is the selection of an algorithm instance that best suits the problem at hand. What complicates this decision further is that different algorithms may be best suited for different stages of the optimization process. Dynamic algorithm selection and configuration are therefore well-researched topics in evolutionary computation. However, while hyper-heuristics and parameter control studies typically assume a setting in which the algorithm needs to be chosen while running the algorithms, without prior information, AutoML approaches such as hyper-parameter tuning and automated algorithm configuration assume the possibility of evaluating different configurations before making a final recommendation. In practice, however, we are often in a middle ground between these two settings, where we need to decide on the algorithm instance before the run (“one-shot” setting), but where we have (possibly lots of) data available on which we can base an informed decision.

We analyze in this work how such prior performance data can be used to infer informed dynamic algorithm selection schemes for the solution of pseudo-Boolean optimization problems. Our specific use case considers a family of genetic algorithms.
CCS CONCEPTS
• Theory of computation → Bio-inspired optimization.

KEYWORDS
Genetic algorithms, Dynamic Algorithm Selection, Black-Box Optimization, Evolutionary Computation
It is well known that genetic algorithms (GAs) require proper parameter settings and operators to work efficiently. Though many parameter control methods [2, 4, 16, 17, 20, 23] have been proposed to tune the parameters of algorithms, they usually provide only a fixed suggestion for the algorithms. However, recent research has shown that the optimal parameter settings may change at different optimization stages, so searching for an optimal static parameter setting can prevent us from identifying the best solver. At the same time, self-adaptation has been studied for tuning parameters on the fly, but the effectiveness of an adaptive method also depends on the properties of the problem and the stage of optimization. For example, the optimal mutation rate of the (1 + 𝜆) EA has been proven to be non-static for OneMax, and a (1 + 𝜆) EA_{𝑟/2,2𝑟} with self-adaptive mutation rate has been proposed in [10]. The (1 + 𝜆) EA_{𝑟/2,2𝑟} has shown its ability to follow the dynamic optimal mutation rate on OneMax [10]. However, when we solve the Jump function, the (1 + 𝜆) EA_{𝑟/2,2𝑟} is no longer the best choice. Jump is similar to OneMax, but the values in the interval [𝑛 − 𝑚 + 1, 𝑛 − 1] are either set to zero or to 𝑛 − OneMax(𝑥) (both variants are studied in the literature; see [18] for a discussion and yet another jump function), so that in order to reach the global optimum, elitist algorithms like the (1 + 𝜆) EA need to jump from a solution of fitness 𝑛 − 𝑚 directly to the optimum. While the (1 + 𝜆) EA_{𝑟/2,2𝑟} is still efficient at the stage before reaching fitness layer 𝑛 − 𝑚, it is not very good at jumping to the optimum. For this last step, other methods, including crossover-based algorithms [7, 28], may be a better choice.

To tackle situations as above, we would, intuitively, want to select a best-suited algorithm for each stage of the optimization process. This idea defines a new meta-optimization problem, which is called the dynamic algorithm selection (dynAS) problem.
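For concreteness, the OneMax and Jump functions discussed above can be sketched as follows. This is a minimal illustration of the Jump variant that sets the gap region to zero; the function names are our own:

```python
def one_max(x):
    """OneMax: number of ones in the bit string."""
    return sum(x)

def jump(x, m):
    """Jump_m, in the variant that zeroes out the gap region.

    Solutions with OneMax value in [n - m + 1, n - 1] form a "gap"
    of fitness 0, so elitist algorithms have to jump from fitness
    n - m directly to the optimum n.
    """
    n = len(x)
    om = one_max(x)
    if om == n or om <= n - m:
        return om
    return 0  # inside the gap [n - m + 1, n - 1]

print(jump([1] * 8 + [0] * 2, 3))  # 8 ones, gap region -> 0
```

The sketch makes the difficulty visible: a hill climber that has reached fitness 𝑛 − 𝑚 receives no gradient information inside the gap.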
dynAS is expected to unlock the potential benefit of switching among different algorithms online. Related work has been performed on black-box numerical optimization [25]. Based on the rich BBOB data set [14], [25] investigates the potential improvement that can be achieved from switching between solvers. However, the results presented in [25] are restricted to a theoretical assessment, without an experimental proof.

The dynAS approach is closely related to hyper-heuristics [6, 21] and algorithm control [3]. However, while hyper-heuristics and parameter control studies typically assume a setting in which the algorithm needs to be chosen while running the algorithms, without prior information (“on-the-fly”, “online”, or “adaptive” selection), AutoML approaches such as hyper-parameter tuning and automated algorithm configuration assume the possibility of evaluating different configurations before making a final recommendation (“offline” tuning). In practice, however, we are often in a middle ground between these two settings, where we need to decide on the algorithm instance before the run (“one-shot” decision, no training or partial evaluations possible), but where we have (possibly lots of) data available on which we can base our decision (the “informed” setting).

Our contribution:
We analyze in this work how well existing benchmark data can be used for the selection of suitable algorithm combinations. We base our experiments on the results of the benchmark study presented in [29]. This dataset provides us with detailed performance records for 80 different instances of a family of (𝜇 + 𝜆) GAs run on 25 pseudo-Boolean problems introduced in [13] (first 23 functions) and [29] (last two problems). The data records are stored in a COCO-like format [15] and are conveniently interpretable by IOHanalyzer [26], the data analysis and visualization module of IOHprofiler [12].

Starting with an assessment of the performance improvement that we can expect from using dynAS, extensive experimentation has been performed to reveal the effectiveness of dynAS and the difficulties we may encounter in future study. dynAS is a hard problem, and we cannot fully solve it in this work. Instead of proposing a complete solution for it, our work highlights the advantages of dynAS in some settings, and experimental results show that we can obtain better solvers and spot competitive algorithms for different stages of the optimization process by using dynAS.

After a purely theoretical investigation of the benchmark data of [29], we expected to gain improvements on all problems. In practice, however, these predicted potentials could not be realized on all problems. We analyze both successful and unsuccessful trials of dynAS by considering the set of algorithms, the switching points, and the properties of the problems (local optima), which helps in designing solutions of dynAS in future work.

At the same time, we highlight competitive algorithms for different stages of the optimization process on problems such as LeadingOnes and the W-model [27] problems, which illustrates why we recommend our work on dynAS together with benchmarking.
The reason is that we not only obtain better solvers but also study how algorithms perform at different stages on different problems. By applying dynAS for benchmarking, we can easily spot useful combinations of switching algorithms from the set of possible combinations, which also builds a bridge from practical experiments to theoretical analysis. Overall, we promote the idea of dynamic genetic algorithm selection in this work. By applying dynAS, we obtain better solvers for most of the IOHprofiler benchmark problems, and we address the main challenges of dynAS for future study. Moreover, we highlight the competitive settings of the GA for problems such as LeadingOnes and the W-model problems.

Outline of the paper:
We recall a formalization of the dynAS problem in Sec. 2, summarize the GA family and the benchmark problems in Sec. 3, demonstrate the effectiveness of dynAS in Sec. 4, discuss its generalization to the IOHprofiler problems in Sec. 5, and conclude the paper in Sec. 6.
The algorithm selection problem is to find the best algorithm 𝐴∗ from an algorithm set A to solve a problem 𝑃 [22]. We call this classic version static algorithm selection, and the definition is given below.

Definition 2.1 (AS: Static Algorithm Selection). Given a problem 𝑃, a set A = {𝐴_1, ..., 𝐴_𝑛} of algorithms, and a cost metric 𝑐 : A × 𝑃 ↦→ ℝ (e.g., the expected running time to solve the problem), the objective is to find
𝐴∗ ∈ arg min_{𝐴 ∈ A} 𝑐(𝐴, 𝑃).

For the dynamic algorithm selection (dynAS) discussed in this work, a dynamic selection policy 𝜋 ∈ Π is introduced to define the dynamic method of algorithm switching. We define dynAS below, following the definition of the dynamic algorithm configuration task (dynAC) in [3].

Definition 2.2 (dynAS: Dynamic Algorithm Selection). Given a problem 𝑃, a set A = {𝐴_1, ..., 𝐴_𝑛} of algorithms, a state description 𝑠_𝑡 ∈ S of solving 𝑃 at time point 𝑡, and a cost metric 𝑐 : Π × 𝑃 ↦→ ℝ assessing the cost of a dynamic selection policy 𝜋 on a problem 𝑃 (e.g., the expected running time to solve the problem), the objective is to find a policy 𝜋∗ : S × 𝑃 ↦→ A that selects an algorithm 𝐴 ∈ A at time point 𝑡, by optimizing its cost on the problem 𝑃:
𝜋∗ ∈ arg min_{𝜋 ∈ Π} 𝑐(𝜋, 𝑃).

We note that Definition 2.2 may suggest that the optimal policy depends on the time elapsed. In practice, however, other indicators such as solution quality are also considered [24, 25] or even known to be optimal [5, 9].
To solve the dynAS problem, we need to define the state description 𝑠_𝑡 ∈ S at time point 𝑡 and the cost metric 𝑐. The fixed-target approach of measuring algorithm performance by the expected running time (ERT) matches this requirement well. The ERT of an algorithm 𝐴 hitting a target 𝜙 on a problem 𝑃 is given as follows [15]:

ERT(𝐴, 𝑃, 𝜙) = Σ_{𝑖=1}^{𝑟} min{𝑡_𝑖(𝐴, 𝑃, 𝜙), 𝐵} / Σ_{𝑖=1}^{𝑟} 𝟙{𝑡_𝑖(𝐴, 𝑃, 𝜙) < ∞},   (1)

where 𝑟 is the number of runs of the algorithm 𝐴, 𝑡_𝑖(𝐴, 𝑃, 𝜙) is the number of evaluations that run 𝑖 uses to hit the target (∞ if the run fails to hit it), and 𝐵 is the maximal budget (e.g., the maximal number of function evaluations) of the algorithm 𝐴 on the problem 𝑃.

Application of ERT
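Equation (1) can be computed directly from per-run hitting times. The following sketch (variable names are our own) treats a failed run as an infinite hitting time, so it contributes its full budget 𝐵 to the numerator but nothing to the success count:

```python
import math

def ert(hitting_times, budget):
    """Expected running time (ERT) for one target, cf. Eq. (1).

    hitting_times: per-run number of evaluations used to reach the
                   target, math.inf for runs that never reached it.
    budget: maximal number of evaluations B per run.
    """
    spent = sum(min(t, budget) for t in hitting_times)         # numerator
    successes = sum(1 for t in hitting_times if t < math.inf)  # denominator
    return spent / successes if successes else math.inf

# Example: 3 runs with budget 1000 each, one of which failed.
print(ert([400, 600, math.inf], budget=1000))  # (400+600+1000)/2 = 1000.0
```

With all runs successful, ERT reduces to the average hitting time; with failures it penalizes the algorithm by the wasted budget.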
By using the ERT as the cost metric 𝑐 of the AS problem, we can define the best static algorithm 𝐴∗ as follows:

Definition 2.3 (BSA: Best Static Algorithm). Given a problem 𝑃 and a set A = {𝐴_1, ..., 𝐴_𝑛} of algorithms, the best static algorithm 𝐴∗ for the target 𝜙 is
𝐴∗ = arg min_{𝐴 ∈ A} ERT(𝐴, 𝑃, 𝜙).

As for dynAS, we restrict our attention in this work to dynamic selection policies 𝜋 which switch only once, i.e., from using an algorithm 𝐴_1 to an algorithm 𝐴_2 at the state 𝑠 where a fitness 𝑓 ≥ 𝜙_𝑠 is found for the first time. Therefore, the policy 𝜋 can be written as 𝜋 = (𝐴_1, 𝐴_2, 𝜙_𝑠) for this switch-once dynAS, where 𝐴_1, 𝐴_2 ∈ A and 𝜙_𝑠 is within the domain of fitness values. The predicted performance of 𝜋 hitting the final target 𝜙_𝑓 on a problem 𝑃 can then be calculated as:

𝑇(𝜋, 𝑃, 𝜙_𝑓) = ERT(𝐴_1, 𝑃, 𝜙_𝑠) + ERT(𝐴_2, 𝑃, 𝜙_𝑓) − ERT(𝐴_2, 𝑃, 𝜙_𝑠).   (2)

By using the above predicted performance as the cost metric 𝑐 of the dynAS problem, we define the best dynamic algorithm selection policy 𝜋∗ as follows:

Definition 2.4 (BDA: Best Dynamic Algorithm Selection Policy).
Given a problem P, a set Φ of targets, and a set A = { 𝐴 , ..., 𝐴 𝑛 } ofalgorithms, the best dynamic algorithm selection policy 𝜋 ∗ for thetarget 𝜙 𝑓 is ( 𝐴 , 𝐴 , 𝜙 𝑠 ) ∗ = 𝜋 ∗ = arg min 𝜋 ∈(A×A× Φ ) 𝑇 ( 𝜋, 𝑃, 𝜙 𝑓 ) , where 𝐴 , 𝐴 ∈ A , 𝜙 𝑠 ∈ Φ .urong Ye, Carola Doerr, and Thomas Bäck We describe in this section a configurable GA framework, the IOH-profiler problems, and our prior benchmark data for the dynASproblem in Sec. 5. ( 𝜇 + 𝜆 ) GA To instantiate variants of the GA for the dynAS problem, we workon the configurable GA framework proposed in [29], which allowsus to tune parameters and select from a set of operators. Thisframework can also be used for a future extension of this studyto the dynAC problem. Algorithm 1 presents the details of theframework.The GA initializes its population uniformly at random. For eachiteration, 𝜆 offspring is created either by using crossover (withprobability 𝑝 𝑐 ) or using mutation (with probability − 𝑝 𝑐 ), andthe best 𝜇 of parent and offspring individuals are selected for theparent population of the next iteration. The GA terminates untilhitting the optimum or reaching the maximal budget of functionevaluations.Three well-known crossover operators, one-point crossover , two-point crossover , and uniform crossover , and two mutation operators, standard bit mutation and fast mutation are optional for the GAframework. For the standard bit mutation , we flip bits at ℓ distinct po-sitions, which are randomly chosen. ℓ is sampled from a conditionalbinomial distribution Bin > ( 𝑑, 𝑝 ) [19], where 𝑑 is the dimensionand 𝑝 is fixed as / 𝑑 in this paper. For the fast mutation , ℓ is sam-pled from a power-law distribution, and we follow the suggestionin [11]. The IOHprofiler problem set [13] initially contains 23 real-valuedpseudo-Boolean problems, and another two problems were addedin [29]. 
Based on the prior data set of 80 variants of GAs, which is available at [30], we can investigate the potential improvement that could theoretically be obtained from applying dynAS. For ease of understanding the following discussion, we provide partial definitions of the problems below. Details of the other problems are available in [13, 29].

• F1: OneMax asks to maximize the number of ones, i.e., the function OM : {0, 1}^𝑛 → [0..𝑛], 𝑥 ↦→ Σ_{𝑖=1}^{𝑛} 𝑥_𝑖.
• F2: LeadingOnes asks to maximize the number of initial ones, i.e., the function LO : {0, 1}^𝑛 → [0..𝑛], 𝑥 ↦→ max{𝑖 ∈ [0..𝑛] | ∀𝑗 ≤ 𝑖 : 𝑥_𝑗 = 1}.
• F3: A linear function with harmonic weights. We can see this function as a variant of OneMax with weighted variables, which asks to maximize {0, 1}^𝑛 → ℝ, 𝑥 ↦→ Σ_{𝑖=1}^{𝑛} 𝑖 𝑥_𝑖.
• F5: A W-model extension of OneMax (reduction). Dummy variables are introduced to OneMax for this function: a share of randomly selected bits has no impact on the fitness value, so the optimum of an 𝑛-dimensional F5 is 0.9𝑛. F4 differs only in the number of bits without impact on the fitness value.
• F6: A W-model extension of OneMax (neutrality). The original input bit string (𝑥_1, ..., 𝑥_𝑛) is mapped to a bit string (𝑦_1, ..., 𝑦_{⌊𝑛/3⌋}). The value of 𝑦_𝑖 is the majority of (𝑥_{3𝑖−2}, 𝑥_{3𝑖−1}, 𝑥_{3𝑖}). The fitness value of 𝑥 on F6 is OM(𝑦), so the optimum of an 𝑛-dimensional F6 is ⌊𝑛/3⌋.
• F7: A W-model extension of OneMax (epistasis). An epistasis function is applied to perturb the bit string. Assuming there are two bit strings 𝑏_1, 𝑏_2 with Hamming distance 1, after the transformation with the epistasis function the distance between the two transformed bit strings 𝑏′_1, 𝑏′_2 is 𝜐 − 1, where 𝜐 is the length of the bit string. F7 partitions the input bit string into segments of length 𝜐 = 4 and applies the epistasis function to each segment. More details can be found in Sec. 3.7.3 of [13].
• F8: A W-model extension of OneMax (ruggedness). Ruggedness is introduced by applying a transformation to the fitness value OM(𝑥). The transformation 𝑟 : [0..𝑑] → [0..⌈𝑑/2⌉ + 1] is defined as follows: 𝑟(𝑑) = ⌈𝑑/2⌉ + 1; 𝑟(𝑖) = ⌊𝑖/2⌋ + 1 if 𝑖 is even and 𝑖 < 𝑑; and 𝑟(𝑖) = ⌈𝑖/2⌉ + 1 if 𝑖 is odd and 𝑖 < 𝑑. The fitness value of F8 is 𝑟(OM(𝑥)).
• F24: Concatenated Trap (CT) partitions the input bit string into segments of length 𝑘 and returns the sum of the fitness values of concatenated Trap functions that take the segments as input. The Trap function asks to maximize Trap : {0, 1}^𝑘 → [0, 1], with Trap(𝑥) = 1 if the number 𝑢 of ones equals 𝑘, and Trap(𝑥) = (𝑘 − 1 − 𝑢)/𝑘 otherwise. 𝑘 is set as in [29].

Algorithm 1: A Family of (𝜇 + 𝜆) Genetic Algorithms
Input: population sizes 𝜇, 𝜆; crossover probability 𝑝_𝑐; mutation rate 𝑝.
Initialization: for 𝑖 = 1, ..., 𝜇, sample 𝑥^(𝑖) ∈ {0, 1}^𝑑 uniformly at random (u.a.r.) and evaluate 𝑓(𝑥^(𝑖)); set 𝑃 = {𝑥^(1), 𝑥^(2), ..., 𝑥^(𝜇)}.
Optimization: for 𝑡 = 1, 2, 3, ... do
  𝑃′ ← ∅;
  for 𝑖 = 1, ..., 𝜆 do
    sample 𝑟 ∈ [0, 1] u.a.r.;
    if 𝑟 ≤ 𝑝_𝑐 then
      select two individuals 𝑥, 𝑦 from 𝑃 u.a.r. (with replacement);
      𝑧^(𝑖) ← Crossover(𝑥, 𝑦);
      if 𝑧^(𝑖) ∉ {𝑥, 𝑦} then evaluate 𝑓(𝑧^(𝑖)) else infer 𝑓(𝑧^(𝑖)) from the parents;
    else
      select an individual 𝑥 from 𝑃 u.a.r.;
      𝑧^(𝑖) ← Mutation(𝑥);
      if 𝑧^(𝑖) ≠ 𝑥 then evaluate 𝑓(𝑧^(𝑖)) else infer 𝑓(𝑧^(𝑖)) from the parent;
    𝑃′ ← 𝑃′ ∪ {𝑧^(𝑖)};
  𝑃 is updated by the best 𝜇 points in 𝑃 ∪ 𝑃′ (ties broken u.a.r.);
Termination: the optimum is found or the budget is used up.

Inspired by the observation in [29] that, on LeadingOnes, the optimal crossover probability of Algorithm 1 changes with the problem dimension and population size, we are interested in the performance of the GAs using uniform crossover at different stages.
Dynamic optimal crossover probability.
To obtain the optimal crossover probability at different stages of the algorithm, we test GAs using standard bit mutation with 𝑝 = 1/𝑛 and uniform crossover with a grid of crossover probabilities 𝑝_𝑐 between 0 and 1. The algorithms run at stages of fitness values 𝑓 ∈ [𝑠, 𝑠 + 5], 𝑠 ∈ {5𝑖 | 𝑖 ∈ [0..19]}, on the 100-dimensional LeadingOnes. Practically, we initialize the population of the GAs with all individuals' fitness values equal to 𝑠, and the algorithms terminate once a solution with 𝑓(𝑥) ≥ 𝑠 + 5 is found.

Figure 1 plots the function evaluations used by the GAs at each stage. It shows that the GA with 𝑝_𝑐 = 0 spends the fewest function evaluations at the early stages, but is outperformed by the GAs with 𝑝_𝑐 > 0 as 𝑠 increases. Observing the populations of the GAs at the late stages, we find that, for the GAs with 𝑝_𝑐 > 0, the fitness of most individuals converges quickly to the best found fitness after a better solution is found, whereas for the GA with 𝑝_𝑐 = 0, the fitness of most individuals remains constant even when the best individual has been updated several times. An intuitive explanation for why the former performs better at later stages is that, by using uniform crossover, the GA can copy the leading ones of the current best solution and thereby increase the quality of the whole population.

Dynamic crossover probability selection.
Given the results in Figure 1, we expect to gain improvement by using the optimal crossover probability at every stage. Figure 2 plots the fixed-target ERTs of the GAs with static 𝑝_𝑐 and of the dynamic one. The dynamic policy selects the corresponding best 𝑝_𝑐 at each stage: as the GA finds a solution with 𝑠_1 ≤ 𝑓(𝑥) < 𝑠_2, 𝑠_2 = 𝑠_1 + 5, 𝑠_1 ∈ {5𝑖 | 𝑖 ∈ [0..19]}, 𝑝_𝑐 is adjusted to the corresponding best value from Figure 1. In other words, the dynamic policy is a dynAS policy 𝜋 in which 𝑃 is LeadingOnes, A consists of GAs with different 𝑝_𝑐, and S is the set of targets 𝑠.

We observe in Figure 2 that the GA with dynamic 𝑝_𝑐 outperforms the other GAs at all points in time, which leads to hitting the optimum 𝑓(𝑥) = 100 with the smallest ERT; the ERT of the dynamic policy is clearly below that of the best static runner-up. This result empirically demonstrates that the GA can benefit from a dynamic crossover probability, and it displays a successful case of applying dynAS to the GA. However, the dynAS problem does not usually come with the ideal condition that the candidate algorithms differ in only one parameter, so we consider GAs with more combinations of parameters and operators in the next section. Since the LeadingOnes case shows significant improvement from using dynamic crossover probabilities, which is a particular case of dynAS,
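The stagewise policy just described amounts to a lookup from the current best-so-far fitness to a tuned 𝑝_𝑐. A minimal sketch with illustrative placeholder values (the probabilities below are our own, not the measured values of Figure 1):

```python
import bisect

# Hypothetical per-stage best crossover probabilities; stage s covers
# fitness values [s, s + 5). The real setup has stages 0, 5, ..., 95.
stage_starts = [0, 5, 10, 15]
best_pc      = [0.0, 0.0, 0.3, 0.5]  # illustrative values only

def dynamic_pc(best_so_far):
    """Stagewise policy: return the p_c tuned for the stage that
    contains the current best-so-far fitness."""
    idx = bisect.bisect_right(stage_starts, best_so_far) - 1
    return best_pc[max(idx, 0)]

print(dynamic_pc(7))   # stage [5, 10)  -> 0.0
print(dynamic_pc(17))  # stage [15, 20) -> 0.5
```

The same lookup structure generalizes from switching a single parameter to switching whole algorithm instances, which is exactly the switch-once dynAS studied in the next section.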
Figure 1: Average number of function evaluations needed by different GAs to find a solution 𝑦 with 𝑓(𝑦) ≥ 𝑠 + 5 on the 100-dimensional LeadingOnes function when all ten points in the initial population are uniformly chosen from the set of points 𝑥 that satisfy 𝑓(𝑥) = 𝑠, for 𝑠 ∈ {5𝑖 | 𝑖 ∈ [0..19]}. The GAs differ only in the crossover probability 𝑝_𝑐 (different lines). Results are averaged over independent runs. The connecting lines are only meant to help visual interpretation; the data points are only at the values 0, 5, 10, ..., 95.

Figure 2: Fixed-target ERTs of the GAs on the 100-dimensional LeadingOnes. The legend presents the values of 𝑝_𝑐, and the dynamic one adjusts its 𝑝_𝑐 to the optimal value at each target 𝑓(𝑥) = 𝑠, 𝑠 ∈ {5𝑖 | 𝑖 ∈ [0..19]}, based on the results in Figure 1. Results are averages over independent runs. The figure is produced using the IOHprofiler tool [12].

we study the behavior of dynAS on a broader range of problems and GAs. In this section, we apply dynAS to the 25 IOHprofiler benchmark problems (see Sec. 3.2), considering 80 GAs. The optional parameter settings and operators for the GA (see Sec. 3.1) are listed below:
• population size schemata: (𝜆 + 1), (𝜆 + 𝜆/2), and (𝜆 + 𝜆), 𝜆 ∈ {10, 50, 100}, and (1 + 𝜆), 𝜆 ∈ {1, 10, 50, 100};
• mutation operators: standard bit mutation (sbm) and fast mutation;
• crossover operators: one-point crossover, two-point crossover, and uniform crossover;
• crossover probabilities: 𝑝_𝑐 ∈ {0, 0.5}.
Crossover operators are only applied for the (𝜆 + 1), (𝜆 + 𝜆/2), and (𝜆 + 𝜆) GAs with 𝑝_𝑐 = 0.5; the (1 + 𝜆) GAs are all mutation-only GAs.
First, we investigate the performance of the static GAs. Figure 3 shows the distributions of the ERTs among the 25 functions; the targets used to calculate the ERTs are listed in Table 1. We observe substantial differences among algorithms as well as among problems. A red dashed line connects the best ERTs of the static GAs on each problem.

Based on the data in [30], we can calculate the theoretical performance (predicted ERTs following formula (2)) of all possible policies 𝜋 over combinations of GAs. As mentioned in Sec. 2.2, we consider the switch-once dynAS. To generate the set Φ of targets, we select evenly spaced partition points within [𝜙_𝑚, 𝜙_𝑓] on a linear scale and on a log scale, respectively, where 𝜙_𝑚 is the smallest fitness value of the problem and 𝜙_𝑓 is the final target. Note that we only consider the GAs that hit the corresponding target with a sufficiently high success rate for dynAS.

Table 1 lists the best dynamic algorithm selection policy (BDA, see Definition 2.4) for the IOHprofiler problems, and the corresponding predicted ERTs are visualized by a solid red line in Figure 3. For ease of notation, we denote by dynGA the method given by a dynAS policy. We expect the theoretically best dynGA to outperform the BSA on all problems. We also observe that the BSAs are usually selected for either the first or the second stage of the BDAs, except for F7, F14, and F22-23. The targets at which the BDAs switch from one algorithm to the other either are close to the final target or are located at an early stage, except for F18 and F24.

Figure 3: Distributions of the ERTs (log scale) of all GAs on the 25 IOHprofiler problems in dimension 𝑑 = 100. The dashed line connects the points of the best ERTs for each problem. The solid line connects the predicted ERTs of the best dynGAs. Experimental results are from independent runs.
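The target set Φ can be generated as described above. The sketch below combines a linear grid and a geometric (log-scale) grid of candidate switch targets; how the paper merges the two scales is our assumption, and 21 points per scale is chosen only so that the union has at most the 42 targets mentioned later:

```python
def target_grid(phi_min, phi_final, k):
    """Candidate switch targets: k linearly spaced points plus k
    log-spaced points within [phi_min, phi_final] (the combination of
    the two scales is an assumption for illustration)."""
    linear = [phi_min + (phi_final - phi_min) * i / (k - 1) for i in range(k)]
    lo = max(phi_min, 1e-9)                # log spacing needs positive values
    ratio = (phi_final / lo) ** (1 / (k - 1))
    logarithmic = [lo * ratio ** i for i in range(k)]
    return sorted(set(round(t, 6) for t in linear + logarithmic))

grid = target_grid(1, 100, 21)  # at most 42 distinct targets
```

The log-scale points concentrate candidates near the low end of the fitness range, which matters for problems where most of the running time is spent close to the final target.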
To reveal the practical performance of the predicted BDAs and to study the behavior of dynAS beyond only the theoretically best policy, we test several dynGAs for each problem. Practically, we calculate the predicted ERTs of all combinations of 𝜋 over the 80 algorithms and 42 targets and take the best ones for the experiment. For the dynGAs in which the parent population sizes of 𝐴_1 and 𝐴_2 remain the same, we only adjust the parameter settings and operators when switching. For the dynGAs with 𝜇_1 > 𝜇_2, we select the best 𝜇_2 of the 𝜇_1 parents as the new parents after switching. For the dynGAs with 𝜇_1 < 𝜇_2, the new parent population consists of copies of the previous 𝜇_1 parents, ⌊𝜇_2/2⌋ − 𝜇_1 copies of the best of the previous 𝜇_1, and ⌈𝜇_2/2⌉ new individuals generated at random. The summary data of this paper can be found at [1].

Figure 4 plots the distributions of the relative ERTs compared to the 𝑠𝐸𝑅𝑇 in Table 1, and the result of each dynGA is marked by a black dot. Note that we do not expect the entire group of dynGAs to perform better than the BSA, because not all of the dynGAs can theoretically obtain ERTs better than 𝑠𝐸𝑅𝑇. However, as long as some dynGAs outperform the BSA (dots below the red line in Figure 4), we can expect an improvement from applying dynAS. We observe promising results of better solvers for most problems (all except F5, F8, F10, and F24-25) in Figure 4.

Figure 4: Box plots of relative ERTs of the dynGAs (𝑑𝐸𝑅𝑇) for the IOHprofiler problems in dimension 𝑑 = 100, compared to the 𝑠𝐸𝑅𝑇 in Table 1. The relative deviation is calculated by (𝑑𝐸𝑅𝑇 − 𝑠𝐸𝑅𝑇)/𝑠𝐸𝑅𝑇. The results of each algorithm are plotted as black dots. Negative values (below the red line) indicate better solvers compared to the BSA. Values are capped to [−0.5, 0.5] for visualization, so the results of F24-F25, with values larger than 0.5, are missing here. Results are from 100 independent runs. Detailed data can be found at [1].
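The parent-population handoff rules above can be sketched as follows (function and parameter names are our own; the rule for 𝜇_1 < 𝜇_2 implicitly assumes ⌊𝜇_2/2⌋ ≥ 𝜇_1, which holds for the population sizes used here):

```python
def handoff_population(parents, fitness, mu2, new_individual):
    """Parent-population transfer when switching from A1 (len(parents)
    = mu1 parents) to A2 (mu2 parents), following the rules above.

    new_individual: zero-argument function creating a random individual.
    """
    mu1 = len(parents)
    ranked = sorted(parents, key=fitness, reverse=True)
    if mu1 >= mu2:
        return ranked[:mu2]  # keep only the best mu2 of the mu1 parents
    best = ranked[0]
    # mu1 originals + (floor(mu2/2) - mu1) copies of the best
    # + ceil(mu2/2) fresh random individuals = mu2 parents in total.
    return (list(parents)
            + [best] * (mu2 // 2 - mu1)
            + [new_individual() for _ in range((mu2 + 1) // 2)])
```

The fresh random individuals re-inject diversity when growing the population, while the copies of the incumbent best keep selection pressure on the region already reached by 𝐴_1.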
On the other hand, we would like to investigate and better understand the unsuccessful trials.

Recall that a dynAS policy is described by 𝜋 = (𝐴_1, 𝐴_2, 𝜙_𝑠), with components 𝐴_1, 𝐴_2 ∈ A and 𝜙_𝑠 ∈ Φ. We first discuss the experiments on F3-6 here, concerning the limitation of the candidate algorithm set A. Figure 5 plots the frequencies of the tested parameters and operators and the averaged relative ERTs of the dynGAs with the corresponding GA combinations. The frequency stands for the theoretical prediction, and the relative ERT stands for the experimental result. Tiles are distributed over three zones in the figure: the bottom left shows combinations of (𝜇 + 𝜆) schemata, the middle shows combinations of mutation operators, and the upper right shows combinations of crossover operators. We have erased the operators not being selected from the figures. For example, a tile in the bottom left indicates dynGAs using one population size schema for the first algorithm and another for the second one, and a purple color indicates that the averaged relative ERTs of the corresponding dynGAs are less than 0 compared to the 𝑠𝐸𝑅𝑇.

For F3 and F5-6, dynAS always chooses the (1+1) EA_{>0} for 𝐴_2, and it does not recognize mutation and crossover operators with different (dis)advantages for 𝐴_1. Looking at the 𝜙_𝑠, we observe that the switching points are located around the initial fitness of these variants of OneMax.

Table 1: Theoretical performance of dynAS for the 25 IOHprofiler benchmark problems in dimension 𝑑 = 100. 𝑓𝑇𝑎𝑟𝑔𝑒𝑡 lists the final targets used to calculate the ERTs, and 𝑠𝐸𝑅𝑇 lists the ERTs of the best static algorithms (BSA), which are the best among the 80 tested GAs for each problem. The BDA switches from using 𝐴_1 to using 𝐴_2 upon finding a solution with 𝑓(𝑥) ≥ 𝑠𝑇𝑎𝑟𝑔𝑒𝑡, and the corresponding predicted ERTs are listed as 𝑑𝐸𝑅𝑇. 𝑟𝑎𝑡𝑖𝑜 = (𝑠𝐸𝑅𝑇 − 𝑑𝐸𝑅𝑇)/𝑠𝐸𝑅𝑇. All algorithms are tested with 100 independent runs. For algorithm names: 'EA_{>0}' denotes the mutation-only GAs using sbm with 𝑝 = 1/𝑛, and 'fast GA' denotes the mutation-only GAs using fast mutation. The GAs with 𝑝_𝑐 = 0.5 are named '(𝜇 + 𝜆)-crossover operator-GA/fGA', where 'GA' indicates sbm with 𝑝 = 1/𝑛 and 'fGA' indicates fast mutation.
[Most entries of Table 1 were lost in text extraction; as one recovered example, for F1 the BSA is the (1+1) EA_{>0} with 𝑠𝐸𝑅𝑇 = 705, and the BDA switches from the (1+1) EA_{>0} to the (10+10)-uniform-GA at 𝑠𝑇𝑎𝑟𝑔𝑒𝑡 = 96 with 𝑑𝐸𝑅𝑇 = 638, a 9.5% improvement. The complete data are available at [1].]

According to Table 1, the (1+1) EA_{>0} performs best on the OneMax variants (F3-F6), and dynAS is expected to gain improvement by switching at the very beginning. However, this may not even happen in practice because of the randomness of initialization. Also, because the 𝑠𝐸𝑅𝑇 values are relatively small on these problems, we can see that the GAs using a large 𝜇 deteriorate, since a large population takes unnecessary evaluations. Uniform crossover has shown its advantages on OneMax in a previous study [8], and we gain improvement by switching to a GA with uniform crossover at late stages. However, for the OneMax variants with weighted variables, dummy variables, and neutrality, we do not observe that the dynGAs can benefit from uniform crossover. Apart from F3-6, we did not see significant improvement from using dynAS for the OneMax variants F8 and F10. Recall that we study the informed dynAS, so that we can obtain some preliminary information.
Unlike on F3-6, the (1+1) EA_{>0} is not the BSA for F8-10. The situations for F8 and F10 are similar, and F8 is taken here for the discussion. According to Figure 6(a), the BSA is not selected as a component of the tested dynGAs; the 𝐴_2 of all dynGAs are still GAs using uniform crossover, but with larger parent population sizes 𝜇. To explore potential improvement by increasing the diversity of the dynAS policies 𝜋, we limit the number of times any single GA can be selected for 𝐴_1 and 𝐴_2, respectively. Figure 6(b) plots the results of the dynGAs selected under these constraints. We observe that additional GAs are now included for 𝐴_1, and one of these combinations shows the only tile where an improvement is obtained on average.

Moreover, we plot the distribution of the relative ERTs of the dynGAs selected under the constraints in Figure 7, and the fixed-target result of the best one is given beside it. The dynGAs with the newly included GAs as 𝐴_1 contribute all of the better solvers (dots below the red line). According to the fixed-target result, the dynGA benefits from uniform crossover at the late stage on F8, using fewer function evaluations to handle the ruggedness and deceptiveness. Still, the dynGA using uniform crossover requires a proper setting of 𝜇; according to Figure 6, the advantage disappears for other values of 𝜇.

It is known that local optima bring difficulties for optimization, and in this work we also observe the obstacle they cause for dynAS. Recall that in formula (2) the contribution of 𝐴_1 to the predicted ERT is determined by its ERT of hitting the target 𝑓(𝑥) = 𝜙_𝑠. However, using the ERT as the cost metric of dynAS, we do not obtain information to estimate whether 𝐴_1 is trapped in or around a local optimum.
This lack of knowledge may affect the dynAS, and we observe that it results in failures of this strategy on F24-25. Figure 8 plots the fixed-target result of the best tested dynGA on F24, which uses a (10+10)-two-point-fGA first and switches to a (100+100)-two-point-fGA afterward. By using the small population size (10+10) initially, the dynGA indeed converges to the switch point quickly, but it is trapped there and cannot follow the original trend of the (100+100) GA later.

Figure 5: Averaged relative ERTs of dynGAs with the corresponding operator combinations, relative to the 𝑠𝐸𝑅𝑇, on (a) F3, (b) F5, and (c) F6 in dimension 𝑑 = 100. The x-axis and y-axis indicate the operators selected for 𝐴1 and 𝐴2, respectively. Purple tiles indicate solvers that are better than the BSA. Numbers on the tiles show the frequency with which the combination appears among the algorithms. The plots on top of the tile plots show the distributions of 𝜙𝑠 of the dynGAs, with values scaled by (𝜙𝑠 − 𝜙𝑚)/(𝜙𝑓 − 𝜙𝑚), where 𝜙𝑚 is the minimal fitness of the problem.

We do not solve this problem here, but it is interesting to spot the issue for future work. Concerning 𝐴1, its performance at the switch point should be considered from different perspectives. If 𝐴1 leads the dynAS policy into a local optimum, we should set the switch point earlier. Regarding 𝐴2, it makes sense that a large population size can avoid being trapped on F24, but the question is how the algorithm handles local optima. If the algorithm obtains the ability to escape from local optima by means of the diversity of its population, we can expect to solve the dynAS problem by considering the initialization of 𝐴2. If the algorithm possesses powerful operators to escape from local optima, the method of formula 2 can still be useful to predict the performance of the dynAS policy. If the algorithm can not avoid entering the area of a local optimum and lacks the ability to escape from it, we need to set the switch point before it is trapped.

Figure 6: Averaged relative ERTs of dynGAs with the corresponding operator combinations, relative to the 𝑠𝐸𝑅𝑇, on F8 in dimension 𝑑 = 100, for (a) Rank-select and (b) Restriction-select. The x-axis and y-axis indicate the operators selected for 𝐴1 and 𝐴2, respectively. Purple tiles indicate solvers that are better than the BSA. Numbers on the tiles show the frequency with which the combination appears among the algorithms. The left plot shows the 100 dynGAs that are ranked best by theoretical performance; for the right plot, the number of times each algorithm can be selected is capped.

Figure 7: The left plot is a box plot of the relative ERTs compared to the 𝑠𝐸𝑅𝑇 on F8 in dimension 𝑑 = 100. The right plot shows fixed-target ERTs of the GAs; the dynGA switches from the (1+1) EA>0 to the (10+10)-uniform-GA at the chosen switch target. Results are from 100 independent runs. The figure is produced using the IOHprofiler tool [12].

Although there are problems on which the dynAS does not find better solvers, as discussed, we gain improvements on most of the benchmark problems. Nevertheless, our goals are to obtain better results for the problems and to analyze the performance of GAs by applying the dynAS. In this section, we take the successful trial on F7 as an example to illustrate what we can achieve by using the informed dynAS.

Figure 9 presents the frequencies of the combinations of GAs and their corresponding relative ERTs compared to the 𝑠𝐸𝑅𝑇 on F7. We observe that various GAs are selected for the dynAS policies, and the superior settings can be easily recognized. According to Table 1, the EA>0 is the BSA for F7. Meanwhile, the dynGAs in Figure 9 gain improvements by using the EA>0 as 𝐴2. For 𝐴1, the two-point-GA is the one that can be useful for the dynGAs.

Figure 8: Fixed-target ERTs of GAs on F24 in dimension 𝑑 = 100. The dynGA switches from the (10+10)-two-point-fGA to the (100+100)-two-point-fGA at the chosen switch target. Results are from 100 independent runs. The figure is produced using the IOHprofiler tool [12].
Based on this observation, we expect that, for such a OneMax variant with epistasis, using two-point crossover can save function evaluations in the early stage, and a mutation-only GA will be the right choice for the later stage.

Additionally, we plot the fixed-target result of the best dynGA on F7 in Figure 10. Interestingly, the 𝐴1 of the best dynGA uses one-point crossover instead of two-point crossover. According to Figure 9, we do not observe a significant improvement by using one-point crossover for 𝐴1. By analyzing the raw data, we find that this advantage is hidden by averaging over the other dynAS policies. The distribution of 𝜙𝑠 (Figure 9) shows two peaks, but the performance of the dynAS policies deteriorates for larger 𝜙𝑠, though the theoretical prediction still indicates an improvement. This observation reflects the discussion in Sec. 5.2.2 that the switching point should be chosen by considering the state of 𝐴1.
Figure 9: Averaged relative ERTs of dynGAs with the corresponding operator combinations, relative to the 𝑠𝐸𝑅𝑇, on F7 in dimension 100. The x-axis and y-axis indicate the operators selected for 𝐴1 and 𝐴2, respectively. Purple tiles indicate solvers that are better than the BSA. Numbers on the tiles show the frequency with which the combination appears among the algorithms. The plot on top of the tile plot shows the distribution of 𝜙𝑠 of the dynGAs, with values scaled by (𝜙𝑠 − 𝜙𝑚)/(𝜙𝑓 − 𝜙𝑚), where 𝜙𝑚 is the minimal fitness of the problem.

We have investigated in this work possibilities to leverage existing benchmark data to derive switch-once dynamic algorithm selection
policies. Our use-case was a family of genetic algorithms, applied to the 25 problems suggested in [13, 29].

Figure 10: Fixed-target ERTs of GAs on F7 in dimension 𝑑 = 100. The dynGA switches from the (100+100)-one-point-GA to the (50+50) EA>0 at the chosen switch target. Results are from 100 independent runs. The figure is produced using the IOHprofiler tool [12].

We first used the benchmark data to compute a hypothetical performance of the dynAS policies. We then executed the ones that showed the best improvement potential. Our experimental analysis confirmed the existence of combinations that outperform the best static algorithms. For the dynGAs that do not perform as expected, we either explained the reasons or offered a more fine-grained investigation of our dynAS approach. We have also analyzed the role of the diversity of the candidate algorithms, the choice of the switch points, and the influence of local optima.

Moreover, we highlight the GAs that are competitive at different stages of the optimization process for some problems. Applying uniform crossover can be helpful at the late stage of optimization for LeadingOnes, and the experimental results show that we can gain an improvement by switching to the optimal crossover probability dynamically. Uniform crossover is useful at the late stage of optimization for OneMax, but the dynAS has not recognized this advantage for the OneMax variants with weighted variables, dummy variables, and neutrality. The dynGA gains an improvement over the BSA for the OneMax variant with ruggedness by starting with the EA>0 and switching to the GA with uniform crossover. Conversely, one-point and two-point crossover can accelerate the early optimization for the OneMax variant with epistasis, but standard bit mutation with 𝑝 = 1/𝑛 is a better choice for the late stage.

Understanding and Design of Algorithms.
The previous results on F3-6 have shown that we cannot rely on the dynAS to achieve better solvers when the potential of the set of algorithms is limited, but it can still help us understand how the different algorithms perform in the different stages of the optimization process. Such insights can facilitate the design of new algorithms on the one hand, and support theoretical analyses on the other.
Performance Measures.
We have used in this work the ERT performance measure. Our results revealed that this cost measure has several drawbacks for use within one-shot informed dynAS. Firstly, its value can be affected by the budget of the experiments when runs are unsuccessful. For the second stage of the switch-once dynAS, if an algorithm cannot hit the target at the switch point in all runs, the later segment of formula 2 will not accurately reflect its performance as 𝐴2. Secondly, the ERT only reflects the performance with respect to a single target; when using it for the dynAS, we cannot utilize the performance before the algorithm hits the target. We could mitigate these shortcomings by considering other measures, such as the area under the empirical cumulative distribution function curve, which considers a set of targets and the fraction of successful runs.

REFERENCES

[1] NoName Anonym. 2021.
Data Sets for the study "Leveraging Benchmarking Data for Informed One-Shot Dynamic Algorithm Selection".
[2] Thomas Bartz-Beielstein, Christian W. G. Lasarczyk, and Mike Preuß. 2005. Sequential parameter optimization. Vol. 1. IEEE, 773–780.
[3] André Biedenkapp, H. Furkan Bozkurt, Theresa Eimer, Frank Hutter, and Marius Lindauer. 2020. Dynamic algorithm configuration: foundation of a new meta-algorithmic framework. In Proceedings of the Twenty-fourth European Conference on Artificial Intelligence (ECAI'20).
[4] Mauro Birattari, Zhi Yuan, Prasanna Balaprakash, and Thomas Stützle. 2010. F-Race and iterated F-Race: An overview. In Experimental methods for the analysis of optimization algorithms. Springer, 311–336.
[5] Süntje Böttcher, Benjamin Doerr, and Frank Neumann. 2010. Optimal Fixed and Adaptive Mutation Rates for the LeadingOnes Problem. In Proc. of Parallel Problem Solving from Nature (PPSN'10) (LNCS, Vol. 6238). Springer, 1–10.
[6] Edmund K. Burke, Michel Gendreau, Matthew R. Hyde, Graham Kendall, Gabriela Ochoa, Ender Özcan, and Rong Qu. 2013. Hyper-heuristics: a survey of the state of the art. J. Oper. Res. Soc. 64, 12 (2013), 1695–1724. https://doi.org/10.1057/jors.2013.71
[7] Duc-Cuong Dang, Tobias Friedrich, Timo Kötzing, Martin S. Krejca, Per Kristian Lehre, Pietro S. Oliveto, Dirk Sudholt, and Andrew M. Sutton. 2017. Escaping local optima using crossover with emergent diversity. IEEE Transactions on Evolutionary Computation 22, 3 (2017), 484–497.
[8] Benjamin Doerr and Carola Doerr. 2018. Optimal Static and Self-Adjusting Parameter Choices for the (1+(𝜆,𝜆)) Genetic Algorithm. Algorithmica 80, 5 (2018), 1658–1709. https://doi.org/10.1007/s00453-017-0354-9
[9] Benjamin Doerr, Carola Doerr, and Jing Yang. 2020. Optimal parameter choices via precise black-box analysis. Theoretical Computer Science 801 (2020), 1–34. https://doi.org/10.1016/j.tcs.2019.06.014
[10] Benjamin Doerr, Christian Gießen, Carsten Witt, and Jing Yang. 2019. The (1+𝜆) Evolutionary Algorithm with Self-Adjusting Mutation Rate. Algorithmica 81, 2 (2019), 593–631.
[11] Benjamin Doerr, Huu Phuoc Le, Régis Makhmara, and Ta Duy Nguyen. 2017. Fast genetic algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference. 777–784.
[12] Carola Doerr, Hao Wang, Furong Ye, Sander van Rijn, and Thomas Bäck. 2018. IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics. arXiv:1810.05281 [cs.NE]
[13] Carola Doerr, Furong Ye, Naama Horesh, Hao Wang, Ofer M. Shir, and Thomas Bäck. 2020. Benchmarking discrete optimization heuristics with IOHprofiler. Applied Soft Computing 88 (2020), 106027.
[14] Nikolaus Hansen, Anne Auger, and Dimo Brockhoff. 2020. Data from the BBOB workshops. https://coco.gforge.inria.fr/doku.php?id=algorithms-bbob.
[15] Nikolaus Hansen, Anne Auger, Raymond Ros, Olaf Mersmann, Tea Tušar, and Dimo Brockhoff. 2020. COCO: A platform for comparing continuous optimizers in a black-box setting. Optimization Methods and Software (2020), 1–31.
[16] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. 2011. Sequential Model-Based Optimization for General Algorithm Configuration. In Proc. of Learning and Intelligent Optimization (LION'11). Springer, 507–523.
[17] Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, and Thomas Stützle. 2009. ParamILS: An Automatic Algorithm Configuration Framework. Journal of Artificial Intelligence Research 36 (2009), 267–306.
[18] Thomas Jansen. 2015. On the Black-Box Complexity of Example Functions: The Real Jump Function. In Proc. of Foundations of Genetic Algorithms (FOGA'15). ACM, 16–24. https://doi.org/10.1145/2725494.2725507
[19] Thomas Jansen and Christine Zarges. 2011. Analysis of evolutionary algorithms: From computational complexity analysis to algorithm engineering. In Proceedings of the 11th workshop proceedings on Foundations of genetic algorithms. 1–14.
[20] Manuel López-Ibáñez, Jérémie Dubois-Lacoste, Leslie Pérez Cáceres, Mauro Birattari, and Thomas Stützle. 2016. The irace package: Iterated racing for automatic algorithm configuration. Operations Research Perspectives 3 (2016), 43–58.
[21] Nelishia Pillay and Rong Qu. 2018. Hyper-heuristics: Theory and Applications. Springer.
[22] John R. Rice. 1976. The algorithm selection problem. In Advances in computers. Vol. 15. Elsevier, 65–118.
[23] Bas van Stein, Hao Wang, and Thomas Bäck. 2019. Automatic Configuration of Deep Neural Networks with Parallel Efficient Global Optimization. IEEE, 1–7.
[24] Diederick Vermetten, Sander van Rijn, Thomas Bäck, and Carola Doerr. 2019. Online selection of CMA-ES variants. In Proc. of Genetic and Evolutionary Computation Conference (GECCO'19). ACM, 951–959. https://doi.org/10.1145/3321707.3321803
[25] Diederick Vermetten, Hao Wang, Thomas Bäck, and Carola Doerr. 2020. Towards dynamic algorithm selection for numerical black-box optimization: investigating BBOB as a use case. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference. 654–662.
[26] Hao Wang, Diederick Vermetten, Furong Ye, Carola Doerr, and Thomas Bäck. 2020. IOHanalyzer: Performance Analysis for Iterative Optimization Heuristics. CoRR abs/2007.03953 (2020). https://arxiv.org/abs/2007.03953 IOHanalyzer is available at https://iohprofiler.liacs.nl/.
[27] Thomas Weise and Zijun Wu. 2018. Difficult features of combinatorial optimization problems and the tunable w-model benchmark problem for simulating them. In Proceedings of the Genetic and Evolutionary Computation Conference Companion. 1769–1776.
[28] Darrell Whitley, Swetha Varadarajan, Rachel Hirsch, and Anirban Mukhopadhyay. 2018. Exploration and Exploitation Without Mutation: Solving the Jump Function in 𝛩(𝑛) Time. In International Conference on Parallel Problem Solving from Nature. Springer, 55–66.
[29] Furong Ye, Hao Wang, Carola Doerr, and Thomas Bäck. 2020. Benchmarking a (𝜇+𝜆) Genetic Algorithm with Configurable Crossover Probability. In Parallel Problem Solving from Nature – PPSN XVI, Thomas Bäck, Mike Preuss, André Deutz, Hao Wang, Carola Doerr, Michael Emmerich, and Heike Trautmann (Eds.). Springer International Publishing, Cham, 699–713.
[30] Furong Ye, Hao Wang, Carola Doerr, and Thomas Bäck. 2020.