Optimal Static Mutation Strength Distributions for the (1+λ) Evolutionary Algorithm on OneMax
Maxim Buzdalov
ITMO University, Saint Petersburg, Russia
Carola Doerr
Sorbonne Université, CNRS, LIP6, Paris, France
ABSTRACT
Most evolutionary algorithms have parameters, which allow a great flexibility in controlling their behavior and adapting them to new problems. To achieve the best performance, it is often necessary to control some of the parameters during optimization, which gave rise to various parameter control methods. In recent works, however, similar advantages have been shown, and even proven, for sampling parameter values from certain, often heavy-tailed, fixed distributions. This produced a family of algorithms currently known as "fast evolution strategies" and "fast genetic algorithms".

However, only little is known so far about the influence of these distributions on the performance of evolutionary algorithms, and about the relationships between (dynamic) parameter control and (static) parameter sampling. We contribute to the body of knowledge by presenting, for the first time, an algorithm that computes the optimal static distributions, which describe the mutation operator used in the well-known simple (1 + 𝜆) evolutionary algorithm on the classic benchmark problem OneMax. We show that, for large enough population sizes, such optimal distributions may be surprisingly complicated and counter-intuitive. We investigate certain properties of these distributions, and also evaluate the performance regrets of the (1 + 𝜆) evolutionary algorithm using commonly used mutation distributions.

Many evolutionary algorithms are composed of operators that are used within several algorithms. In the context of bit string representations, a very popular example of such an operator is standard bit mutation (SBM), the variation operator that takes as input a point 𝑥 (the "parent"), modifies it by changing the entry in each position with some positive probability 𝑝 (independently of all other decisions), and outputs the so-obtained "offspring" 𝑦. SBM is a global search operator, since the probability that it generates a particular point 𝑦 is positive, regardless of the input 𝑥.
Another common mutation operator is the flip₁ operator, which creates the offspring by changing the entry in exactly one uniformly selected position. This search operator is a local one. Both mutation operators are unbiased in the sense proposed by Lehre and Witt [24], i.e., they treat all positions and all possible values identically.

By a characterization derived in [13], unary unbiased operators are exactly the ones that can be described by a distribution 𝐷 over the integers {0, 1, . . . , 𝑛}. To apply them, one first samples a mutation strength ℓ from this distribution and then changes the input by applying the flip_ℓ operator, which flips the entry in ℓ uniformly chosen, pairwise different positions. That SBM is unbiased now easily follows from the observation that it can be exactly characterized by the binomial distribution Bin(𝑛, 𝑝) with 𝑛 trials (one for each position) and success probability 𝑝 (success = bit flip). Likewise, flip₁ is the operator associated to the 1-point distribution assigning all probability mass to mutation strength 1.

Our Contribution:
We want to analyze in this work how common mutation operators, such as the ones mentioned above or the fast mutation operator suggested in [14], compare to an optimal one. To analyze this question, we introduce the (1 + 𝜆) UUSD-EA, the family of all (1 + 𝜆)-type mutation-only algorithms whose mutation operator can be defined via a unary unbiased static distribution (which may hence depend on the problem dimension 𝑛 and the offspring population size 𝜆, but which may not change during the run). This family comprises all (1 + 𝜆) evolutionary algorithms (EAs), their Randomized Local Search (RLS) counterparts (which use the flip₁ operator instead of SBM), the fastGAs, the normalized EAs [28], etc.

We numerically compute for various combinations of 𝑛 and 𝜆 the minimal expected runtime that can be obtained by any (1 + 𝜆) UUSD-EA on OneMax in dimension 𝑛, and we compare these runtimes to those of the classically studied (1 + 𝜆)-type algorithms mentioned above. Our lower bounds are constructive, in that we also derive the distributions associated to these optimal (1 + 𝜆) UUSD-EAs. This allows us to study the properties of these optimal distributions.
Approach:
Since the optimal distributions cannot be determined by exact analytical approaches, we apply the separable CMA-ES [27] to identify them. The CMA-ES shows very good performance, and provides us with distributions that are optimal up to the last few digits in the available machine precision. This is also one of the cases where the separable version of CMA-ES is not only computationally faster, but also yields results of the same or better quality than the default CMA-ES implementations.
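To give a rough feeling for the structure that makes the separable variant cheap, the following is a bare-bones toy evolution strategy with independent per-coordinate Gaussian steps. It is our own illustrative stand-in, not the Apache Commons Math CMA-ES used in the paper: it replaces CMA's cumulative step-size and covariance adaptation with a fixed geometric cooling of the per-coordinate step sizes.

```python
import random

def separable_es(f, dim, iters=200, pop=10, sigma0=0.3, seed=1):
    """Toy (mu/mu, lambda) evolution strategy with one independent step size
    per coordinate (the "separable"/diagonal structure) and a fixed geometric
    step-size decay. A crude sketch in the spirit of sep-CMA-ES [27]."""
    rng = random.Random(seed)
    mean = [rng.random() for _ in range(dim)]   # random initial guess in [0, 1]^dim
    sigma = [sigma0] * dim                      # one step size per coordinate
    mu = pop // 2
    best_x, best_f = list(mean), f(mean)
    for _ in range(iters):
        offspring = []
        for _ in range(pop):
            # independent Gaussian step per coordinate, clipped to the box [0, 1]
            x = [min(1.0, max(0.0, m + s * rng.gauss(0.0, 1.0)))
                 for m, s in zip(mean, sigma)]
            offspring.append((f(x), x))
        offspring.sort(key=lambda t: t[0])      # minimization
        if offspring[0][0] < best_f:
            best_f, best_x = offspring[0][0], list(offspring[0][1])
        elite = [x for _, x in offspring[:mu]]  # truncation selection
        mean = [sum(col) / mu for col in zip(*elite)]
        sigma = [s * 0.98 for s in sigma]       # geometric cooling of step sizes
    return best_x, best_f
```

Because the steps in each coordinate are sampled independently, one iteration costs only 𝑂(dim) per offspring; the full CMA-ES with its covariance matrix would pay 𝑂(dim²), which is the computational advantage of the separable variant alluded to above.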
Main result:
Our numerical results show, among other things, that the flip₁ operator is optimal when 𝜆 is small, whereas the conditional binomial distribution Bin>0(𝑛, 1/2) is optimal when 𝜆 is very large – with this distribution, the (1 + 𝜆) UUSD-EA performs a uniform random search. The optimal distributions in the middle regime show a rather complex behavior, for which we can identify a few patterns, but for which we also observe a few phenomena that may look counter-intuitive at first glance. For instance, the first mutation strength different from one-bit flips that gets nonzero probability when 𝜆 grows appears to be either 𝑛 or 𝑛 − 1.

Relationship to black-box complexity and to parameter control:
Our work can be seen as a continuation of black-box complexity theory for 𝜆-parallel [4] elitist [16] unary unbiased [24] black-box algorithms with static configuration. A key advantage of lower bounds such as ours is that they allow to rigorously quantify the impact of individual decisions. In our case, the driving motivation behind our analyses is a rigorous quantification of the difference between static and dynamic algorithm configurations. Put differently, we aim at quantifying the gap between algorithms using parameter control and those that do not. In contrast to classical runtime and black-box complexity results, our work focuses on an exact numeric evaluation of this gap for concrete problem dimensions.

Impact:
While the results of our concrete analysis may mostly appeal to theoreticians, our work invites to take a different view on mutation operators by defining them via distributions over the possible mutation strengths. This alternative view makes it substantially easier to generalize concepts such as SBM, flip₁, etc., an advantage that can be leveraged, for example, for designing smooth convergence from global to local search behavior. But the design principle can also lead to performance gains in the static setting. A first strong result in this context is the fastGA [14], which has become the new state of the art in several applications [25]. Our work indicates that there is quite some untapped potential in the design of new mutation operators, and we hope that our work inspires new work in this direction. Based on the findings of our work, we formulate two conjectures:

Conjecture 1:
We conjecture that for each 𝑛 there exists a threshold 𝜆(𝑛) such that for all 𝜆 ≤ 𝜆(𝑛) the 1-point distribution is optimal. Our guess on the particular dependency is 𝜆(𝑛) = Θ(log 𝑛). However, we currently do not have a formal proof for this.

Conjecture 2:
We conjecture that for each 𝑛 and each arbitrarily small 𝜀 > 0 there exists another threshold 𝜆_𝑏(𝑛, 𝜀) such that, for all 𝜆 ≥ 𝜆_𝑏(𝑛, 𝜀), the optimal distribution is closer than 𝜀, in any imaginable measure – such as the maximum of differences between probabilities across all mutation strengths – to the conditional binomial distribution Bin>0(𝑛, 1/2). This is equivalent to stating that the uniform random search is arbitrarily close to being optimal for large enough 𝜆. Again, we do not yet have a formal proof.

Related Work:
Our study continues the recent works [6] and [7], which provide optimal dynamic configurations for (1 + 1)- and (1 + 𝜆)-type algorithms, respectively. While their works are restricted to specific mutation operators (variants of SBM and the 𝑘-bit flips flip_𝑘), we study in this work a generalization to arbitrary unary unbiased variation operators. In contrast to [6, 7] we focus on static configurations, with the idea to build a rigorous baseline against which we can compare dynamic parameter control methods. For 𝜆 =
1, the work [13] quantifies the asymptotic advantage of the best unary unbiased algorithm with dynamic distributions against the best static one (RLS). To extend this work to (1 + 𝜆)-type algorithms, a rigorous bound on the best static case is needed – a baseline that we provide in this work for various combinations of 𝑛 and 𝜆, with the hope that the insights generated by our examples can be leveraged to rigorously prove certain characteristics of the optimal static unary unbiased operators. For 𝜆 >
1, related works can be found in the context of the parallel black-box complexity model [4, 5, 23] and for the (1 + 𝜆) EA [18, 19]. All these works, however, are either less interested in exact runtime bounds (and focus on asymptotic runtime guarantees instead) or they concern specific mutation operators only. For generalized mutation operators, concrete examples can be found in the mentioned works [14, 28]. We are not aware, however, of previous works explicitly studying optimal unary mutation operators.
Availability of Code and Data:
All project source code and data are available for public use at [1].
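The uniform-random-search claim behind Conjecture 2 can be checked exactly with a few lines of Python (the function names are ours, not from [1]): if the mutation strength ℓ is sampled from Bin>0(𝑛, 1/2) and then ℓ distinct, uniformly chosen bits are flipped, every point 𝑦 ≠ 𝑥 receives the same probability 1/(2ⁿ − 1).

```python
from math import comb

def bin_pos(n):
    """Conditional binomial distribution Bin_{>0}(n, 1/2): Bin(n, 1/2)
    conditioned on a strictly positive mutation strength."""
    total = 1.0 - 0.5 ** n                    # 1 - P[l = 0]
    return [0.0] + [comb(n, l) * 0.5 ** n / total for l in range(1, n + 1)]

def point_probability(n, dist, d):
    """Probability that one mutation sampled from `dist` produces a fixed
    point y at Hamming distance d from the parent: the strength must equal
    d, and each of the comb(n, d) possible d-bit flips is equally likely."""
    return dist[d] / comb(n, d)

n = 6
D = bin_pos(n)
probs = [point_probability(n, D, d) for d in range(1, n + 1)]
# every y != x gets the same probability 1/(2^n - 1): uniform random search
```

The cancellation is immediate: dist[d] = C(𝑛, d)·2⁻ⁿ/(1 − 2⁻ⁿ), so dividing by C(𝑛, d) leaves 1/(2ⁿ − 1) for every target point.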
We are concerned in this work with a generalizing view on unary unbiased variation operators, often referred to in the evolutionary computation context as "mutation". In a nutshell, a mutation operator takes as input a search point 𝑥 ∈ S, S denoting the search space, and creates from it an offspring 𝑦 ∈ S. More formally, a mutation operator is a family (𝐷(𝑥))_{𝑥∈S} of unary distributions over the search space S. When fed with an input 𝑥, a new search point 𝑦 is sampled from 𝐷(𝑥).

One of the most common mutation operators is standard bit mutation (SBM). In the context of pseudo-Boolean optimization (i.e., the maximization of a function 𝑓 : {0, 1}ⁿ → ℝ) – which is the setting that we assume for the remainder of the paper – SBM is often explained as follows: to obtain an offspring 𝑦 from 𝑥, we first create a copy of 𝑥 and then we decide for each bit position 𝑖 ∈ [𝑛] := {1, . . . , 𝑛} whether the entry shall be updated to 1 − 𝑥ᵢ ("bit flip") or whether the current entry is maintained. The bit flip decisions are made independently of each other. The probability 𝑝 ∈ (0, 1] to flip an entry is referred to as the mutation rate.

SBM is a prime example of a unary unbiased mutation operator in the sense proposed by Lehre and Witt in [24]. By a characterization proven in [13, Lemma 1], this class subsumes all variation operators that are fully described by a distribution 𝐷 over the possible mutation strengths ℓ ∈ [0..𝑛] := [𝑛] ∪ {0}. When applied to a search point 𝑥, the operator first samples a mutation strength ℓ ∈ [0..𝑛] from its operator-specific distribution 𝐷 and then creates the offspring 𝑦 by flipping the entries in ℓ pairwise different, uniformly selected positions.

It is not difficult to see that the operator-specific distribution of SBM is the binomial distribution Bin(𝑛, 𝑝) with 𝑛 trials and success probability 𝑝. Another common operator is the 1-bit-flip operator 1pt, which is used, for example, within Randomized Local Search (RLS).
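The distribution view of unary unbiased mutation described above can be sketched in a few lines of Python (names are ours, for illustration only): sample a strength ℓ from 𝐷, then flip ℓ distinct, uniformly chosen positions; SBM is recovered by taking 𝐷 = Bin(𝑛, 𝑝).

```python
import random
from math import comb

def flip_l(x, l, rng):
    """flip_l operator: flip exactly l distinct, uniformly chosen positions."""
    y = list(x)
    for i in rng.sample(range(len(x)), l):
        y[i] = 1 - y[i]
    return y

def unary_unbiased_mutation(x, D, rng):
    """Apply the operator described by a distribution D = (D_0, ..., D_n)
    over the mutation strengths: sample l ~ D, then apply flip_l."""
    l = rng.choices(range(len(D)), weights=D, k=1)[0]
    return flip_l(x, l, rng)

def sbm_distribution(n, p):
    """Standard bit mutation corresponds to D = Bin(n, p)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
```

With the 1-point distribution 𝐷 = (0, 1, 0, . . . , 0), the same routine is exactly the flip₁ operator of RLS discussed next.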
1pt creates the offspring by flipping exactly one uniformly chosen bit. Its operator-specific distribution over the mutation strengths [0..𝑛] is hence the 1-point distribution that assigns all probability mass to ℓ = 1. Likewise, the 𝑘pt operator flips 𝑘 pairwise different, uniformly selected bits, and its operator-specific distribution is the 1-point distribution giving all probability mass to ℓ = 𝑘. Other common unary unbiased mutation operators include the "shift" SBM, which is similar to SBM but which assigns the probability weight from ℓ = 0 to ℓ = 1, and the "resampling" SBM, SBM_{>0}, which assigns the probability weight of sampling mutation strength 0 proportionally to all positive mutation strengths 1 ≤ ℓ ≤ 𝑛 by assigning to each of these values probability Bin(𝑛, 𝑝)(ℓ)/(1 − (1 − 𝑝)ⁿ); see [8, 9] for a discussion and motivation of these two latter variants. Motivated by the observation that infrequent large "jumps" can be beneficial in evolutionary algorithm behavior, B. Doerr et al. introduced in [14] the fast genetic algorithm (fastGA), which samples the mutation strength from the heavy-tailed, power-law distribution P[ℓ = 𝑘] = (𝐶^{𝛽}_{𝑛/2})⁻¹ 𝑘^{−𝛽} with 𝐶^{𝛽}_{𝑛/2} = Σ_{𝑖=1}^{𝑛/2} 𝑖^{−𝛽} and 𝛽 being some constant, often set as 𝛽 = 1.5. Finally, in [28] a normalized mutation operator was suggested, which samples the mutation strength from a normal distribution N(𝜇, 𝜎²). In contrast to the examples discussed above, this operator allows to scale the mean and the variance of the distribution independently of each other.

Algorithm 1: The (1 + 𝜆) unary unbiased static distribution EA (UUSD-EA) maximizing a function 𝑓 : {0, 1}ⁿ → ℝ.
  Initialization: Sample 𝑥 ∈ {0, 1}ⁿ uniformly at random and evaluate 𝑓(𝑥);
  Optimization: for 𝑡 = 1, 2, 3, . . . do
    for 𝑖 = 1, . . . , 𝜆 do
      Sample ℓ(𝑖) ∼ 𝐷(𝑛, 𝜆);
      𝑦(𝑖) ← flip_{ℓ(𝑖)}(𝑥); evaluate 𝑓(𝑦(𝑖));
    𝑦 ← select(arg max {𝑓(𝑦(𝑖)) | 𝑖 ∈ [𝜆]});
    if 𝑓(𝑦) ≥ 𝑓(𝑥) then 𝑥 ← 𝑦;

The characterization from [13, Lemma 1], which identifies mutation operators via their distributions over the set [0..𝑛], is classically only used to verify that a certain operator is unbiased. We use it here the other way around, by asking ourselves how different optimal mutation operators are from those commonly studied in the evolutionary computation community.

The (1 + 𝜆) UUSD-EA.
We study this question in the context of the (1 + 𝜆) unary unbiased static distribution EA (UUSD-EA), which is given by Algorithm 1. The (1 + 𝜆) UUSD-EA is initialized uniformly at random. In each iteration, it samples 𝜆 points, which are all sampled from the same unary unbiased mutation operator. We denote the distribution from which the mutation strength is sampled by 𝐷(𝑛, 𝜆) to indicate that it may depend on 𝑛 and 𝜆, but not on any information accumulated during the run of the algorithm. That is, the (1 + 𝜆) UUSD-EA allows only static mutation operators. For creating the 𝜆 search points that shall be evaluated in the current iteration, the (1 + 𝜆) UUSD-EA samples for each one of them a mutation strength ℓ(𝑖) and then creates the 𝑖-th "offspring" by applying the flip_{ℓ(𝑖)} operator, which flips ℓ(𝑖) pairwise different, uniformly chosen bits in the input 𝑥. The best of these offspring replaces 𝑥 if it is at least as good as it. It is irrelevant for the context of our work how ties are broken in line 8, as our results apply to all tie-breaking rules.

OneMax:
We focus on OneMax, i.e., our goal is to determine the optimal static unary unbiased distributions for maximizing the function OM(𝑥) = Σ_{𝑖=1}^{𝑛} 𝑥ᵢ. In the context of our work, this problem is equivalent to that of minimizing the Hamming distance to an arbitrary bit string 𝑧 ∈ {0, 1}ⁿ [24].

Expected Runtimes:
As common in the literature on the theory of evolutionary computation, we understand as an optimal distribution the one that minimizes the expected runtime, which we measure here in terms of generations. Since the offspring population size 𝜆 is fixed, this does not impact our results: a parallel runtime of 𝑇 generations corresponds to between (𝑇 − 1)𝜆 + 1 and 𝑇𝜆 fitness evaluations. By the runtime we therefore denote the number of iterations that the algorithm performs until it evaluates an optimal solution for the first time.

(Non-)Uniqueness of the Optimal Distributions: We note that we do not have any guarantee at the moment that the optimal distributions are unique. In fact, we observe that for any fixed problem dimension 𝑛, there is a certain threshold 𝜆(𝑛) before which small differences in the distributions cause measurable effects on the expected running times, so that the optimal distribution seems to be unique from the point of view of our computations. After the threshold, however, differences between the distributions have no measurable effect on the expected runtime, so that our algorithm may consider different ones as optimal, and may hence not always return the same distribution.

Notation:
For combinations of 𝑛 and 𝜆 for which the optimal distributions are unique, we denote by 𝑃∗(𝑘 | 𝑛, 𝜆) the probability that this distribution assigns to flipping 𝑘 bits, 𝑘 ∈ [𝑛]. To ease the reading, we sometimes use the same notation also for those distributions which yield expected running times that deviate only negligibly from what appears to be the true optimum.

Our algorithm to compute the optimal static unary unbiased distributions is based on the dynamic programming approach from [7]. We start the description by explaining the principles of dynamic programming that are used to compute the expected running time 𝑇_𝑓 of the (1 + 𝜆) UUSD-EA, measured in iterations, assuming it starts at fitness 𝑓 and the values 𝑇_{𝑓′} are known for all 𝑓′ > 𝑓. We note then that a (practical) analytic solution of the problem of finding an optimal distribution is quite unlikely to exist even for small values of 𝑛, and instead give a black-box minimization scheme with the use of a separable CMA-ES [27], a simplified and computationally more efficient version of the well-known continuous optimizer [20]. We complete with an investigation of convergence properties, which allows us to say, with great confidence, that the separable CMA-ES finds a globally optimal distribution in a constant fraction of runs.

We first explain how to compute 𝑇_𝑓 for a given distribution 𝐷 = (𝐷_𝑘)_{𝑘∈[0..𝑛]}. We begin with computing the probabilities 𝑆_{𝑘,𝑔} of sampling an offspring with fitness 𝑔 by flipping exactly 𝑘 bits chosen uniformly without replacement in a solution of fitness 𝑓. Note that these quantities depend only on the current fitness 𝑓 and the problem properties, that is, they depend neither on 𝜆 nor on 𝐷.
It follows from simple combinatorics that

    𝑆_{𝑘,𝑔} = C(𝑛 − 𝑓, 𝑔 − 𝑓 + 𝑖) · C(𝑓, 𝑖) / C(𝑛, 𝑘),

where 𝑖 = (𝑓 + 𝑘 − 𝑔)/2 is the number of one-bits that are flipped, and where we assume this probability to be zero if 𝑖 is not a non-negative integer or one of the binomial coefficient arguments is out of bounds.

Next we compute the probabilities (𝑄^{(1)}_𝑔)_{𝑔 = 𝑓+1, 𝑓+2, ..., 𝑛} of sampling a single offspring of fitness 𝑔. For 𝑔 > 𝑓 they are derived from 𝑆_{𝑘,𝑔} by using the distribution parameters (𝐷_𝑘)_{𝑘∈[0..𝑛]} as follows: 𝑄^{(1)}_𝑔 = Σ_{𝑘=0}^{𝑛} 𝐷_𝑘 𝑆_{𝑘,𝑔}. As the (1 + 𝜆) UUSD-EA is an elitist algorithm, and the behavior of the algorithm with different parents having the same fitness value is the same, with the remaining probability the algorithm remains in the same state, which we capture as 𝑄^{(1)}_𝑓 = 1 − Σ_{𝑔 = 𝑓+1, 𝑓+2, ..., 𝑛} 𝑄^{(1)}_𝑔.

The probability of each possible fitness improvement after sampling all the 𝜆 offspring is then computed as follows:

    𝑄^{(𝜆)}_𝑔 = (Σ_{𝑖=𝑓}^{𝑔} 𝑄^{(1)}_𝑖)^𝜆 − (Σ_{𝑖=𝑓}^{𝑔−1} 𝑄^{(1)}_𝑖)^𝜆.   (1)

In simple words, the new fitness is 𝑔 if all offspring have their fitness in [𝑓..𝑔], but not all of them have their fitness in [𝑓..𝑔 − 1], counting all offspring with fitness smaller than 𝑓 towards 𝑓.

Finally, the expected time to reach the optimum from the fitness 𝑓 is computed using the following expression:

    𝑇_𝑓 = (1 + Σ_{𝑔=𝑓+1}^{𝑛} 𝑄^{(𝜆)}_𝑔 𝑇_𝑔) / (1 − 𝑄^{(𝜆)}_𝑓),   (2)

by standard arguments as detailed in [6, 7].

The time and memory complexities of such a step are 𝑂(𝑛²) for computing 𝑆_{𝑘,𝑔} and 𝑂(𝑛) for the other stages. As there are 𝑛 different fitness values for 𝑓, the time complexity of computing all the expected running times for one set of algorithm parameters (𝜆, 𝐷) is 𝑂(𝑛³), whereas the memory complexity is still 𝑂(𝑛²), as the 𝑆_{𝑘,𝑔} matrix may be discarded once 𝑓 changes.
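The recurrences above translate almost literally into code. The following sketch is our own transcription (not the authors' Java implementation); it computes the expected number of generations for a given static distribution 𝐷, including the final averaging over the Bin(𝑛, 1/2)-distributed fitness of the uniformly random initial point.

```python
from math import comb

def expected_runtime(n, lam, D):
    """Expected number of generations of the (1+lambda) UUSD-EA on OneMax
    for a static distribution D = (D_0, ..., D_n), via the dynamic program
    of Eqs. (1)-(2). Assumes D gives a positive improvement probability."""
    T = [0.0] * (n + 1)                    # T[n] = 0: the optimum is found
    for f in range(n - 1, -1, -1):
        # Q1[g]: one offspring reaches fitness g > f (S_{k,g} mixed with D_k)
        Q1 = [0.0] * (n + 1)
        for k in range(n + 1):
            if D[k] <= 0.0:
                continue
            for g in range(f + 1, n + 1):
                if (f + k - g) % 2:
                    continue               # parity: g = f + k - 2i must hold
                i = (f + k - g) // 2       # number of one-bits flipped
                if 0 <= i <= f and 0 <= k - i <= n - f:
                    Q1[g] += D[k] * comb(f, i) * comb(n - f, k - i) / comb(n, k)
        Q1[f] = 1.0 - sum(Q1[f + 1:])      # elitism: otherwise the state stays
        acc = Q1[f]
        q_stay = acc ** lam                # Q^(lam)_f: no offspring improves
        T_sum, prev = 0.0, acc
        for g in range(f + 1, n + 1):      # Eq. (1), folded into the sum
            acc += Q1[g]
            T_sum += (acc ** lam - prev ** lam) * T[g]
            prev = acc
        T[f] = (1.0 + T_sum) / (1.0 - q_stay)   # Eq. (2)
    # average over the Bin(n, 1/2)-distributed fitness of the initial point
    return sum(comb(n, f) * T[f] for f in range(n + 1)) / 2.0 ** n
```

As a sanity check, for the 1-point distribution at ℓ = 1 and 𝜆 = 1 this reproduces the classical (1+1) RLS value 𝑇_𝑓 = Σ_{𝑗=𝑓}^{𝑛−1} 𝑛/(𝑛 − 𝑗), e.g. an expected runtime of 3.5 generations for 𝑛 = 3 from a random start.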
However, since the 𝑆_{𝑘,𝑔} depend only on 𝑛 and 𝑓, one may evaluate up to 𝑂(𝑛) different combinations of algorithm parameters (𝜆, 𝐷) in a single run by merging the activities corresponding to each 𝑓, which preserves the total time complexity of 𝑂(𝑛³) and the memory complexity of 𝑂(𝑛²). This feature will later turn out to be beneficial when a population-based optimizer is applied to find the best possible distribution 𝐷.

When solving the problem of finding an optimal distribution 𝐷, one could choose to express each of the 𝑇_𝑓 as a function of the 𝑛 + 1 distribution values (𝐷_𝑘)_{𝑘∈[0..𝑛]}, propagate such expressions through dynamic programming, take their weighted sum for the expected running time from a random point, and perform analytical optimization using standard analysis approaches. However, the presence of 𝑄^{(𝜆)}_𝑓 in the denominator in (2) makes the resulting expression nonlinear even for 𝜆 = 1, and having 𝜆 in the exponent in (1) makes it even harder. The resulting expression appears to be a ratio of polynomials of degree Θ(𝑛𝜆) with the 𝐷_𝑘 as variables, which makes it infeasible to perform an exact analytical minimization even for small problem sizes.

Such a large degree of the polynomials also effectively prevents gradient-based optimization, since the exact numeric computation of the derivatives – although possible – would require considerable computational resources, which furthermore significantly increase with 𝜆. At the same time, the cost of evaluating the expected runtime of the (1 + 𝜆) UUSD-EA for a given distribution does not depend on 𝜆, assuming the computations are done in machine precision. This makes it possible to apply black-box optimization techniques to identify distributions which minimize the expected runtime. From the large set of possible black-box optimization techniques, we chose the CMA-ES family of algorithms [20], which are well suited for this kind of optimization. The application of CMA-ES requires a few clarifications, which we list below.

• Our implementation of CMA-ES is based on the one from Apache Commons Math, version 3.6.1. The choice of the Java
programming platform was due to the computational complexity of the fitness evaluation. On one hand, it is too expensive to be implemented in Python or Matlab (the languages with reference implementations of CMA-ES maintained by the authors of this algorithm). On the other hand, the costs of inter-process communication are not negligible compared to fitness evaluation, so the evaluation of fitness shall happen in the same process as the optimizer.

• To search the distributions, we tune the CMA-ES to respect box constraints (each variable is in [0, 1]) and, before fitness evaluation, we normalize the variable values in the Lamarckian sense, that is, without updating the individuals themselves, which would clash with the assumptions made by CMA-ES.

• As we perform distribution optimization for a noiseless problem, we can safely set 𝐷₀ to zero, which restricts the search to the 𝑛-dimensional cube [0, 1]ⁿ.

We had to modify the implementation of CMA-ES, as the particular class hierarchy in Apache Commons Math does not allow to evaluate the whole population in a single call, which we benefit from. Besides, after some experimentation, we switched to the separable CMA-ES [27], which is not only more efficient in terms of computational costs, but also produced better results (likely due to fewer operations that cause precision loss). We also vastly optimized its implementation, which resulted in an 8× reduction of the wall-clock running time. The configurable parameters of CMA-ES were: population size 10, initial step size 1.0, random initial guess, computational budget 100𝑛. The optimizer, however, did not reach the computational budget, as, in all runs, it converged to a single point and terminated at one of the degeneration criteria much earlier than that.

[Figure 1: Fitness as a function of the number of evaluations for 𝑛 = 16 and 𝜆 = 8. First four runs are shown.]

The preliminary experiments showed that CMA-ES typically converges relatively quickly to nearly the same value in most of the runs. Fig. 1 shows example runs for 𝑛 = 16 and 𝜆 = 8. Runs for all 𝑛 and 𝜆 demonstrate similar behavior. The rest of the paper is based on the data collected for 𝑛 ∈ {…} and 𝜆 ∈ [1..…] ∪ {2^𝑖 | … ≤ 𝑖 ≤ …}. For each (𝑛, 𝜆) pair we performed 50 independent runs of the CMA-ES.

We observed that in a constant fraction of runs the algorithm optimized the distribution to produce the same expected running time up to a precision of 10⁻… and better. Fig. 2 shows examples for some values of 𝜆 for two dimensions; we omit the 𝑛 = 100 plot since in this case all the values were identical. Since the delivered precision is very close to the precision available for 64-bit floating-point machine numbers, we assume that CMA-ES reaches the global optimum of the problem in most of the cases. For further analyses we selected only the runs whose results are at most 1 + 10⁻… times greater than the obtained minimum.

[Figure 2: Relative loss of the expected runtime induced by the result of the 50 CMA-ES runs, when compared against the best expected runtime. Plots are for 𝑛 = … (left) and 𝑛 = … (right), respectively; values are sorted by increasing loss.]

We have also performed a robustness analysis for the distributions produced by the optimizer. For each 𝑛, 𝜆, and 𝑘, the standard deviation of the values 𝑃∗(𝑘 | 𝑛, 𝜆) was computed across all the good enough distributions. The results are presented in Fig. 3, where an intriguing picture appears. First, small 𝜆 produce very small maximal standard deviations, which then jump to an intermediate region and remain there until 𝜆 reaches a certain threshold. Above that threshold, the maximal standard deviations experience some sort of phase shift and rise to very high values, while the (1 + 𝜆) UUSD-EA still shows nearly identical expected running times on such different distributions and values of 𝜆. We show later in the next section that this is, in fact, an expected behavior that corresponds to situations when there is a single global optimum, but a number of different distributions coincide in their running time expectation with that global optimum up to the machine precision.

[Figure 3: Maximal standard deviations, over all 𝑘, of 𝑃∗(𝑘 | 𝑛, 𝜆), as functions of 𝜆 for various 𝑛.]

[Figure 4: Average wall-clock times for various 𝜆 together with guesses for asymptotic bounds.]

[Figure 5: Probabilities of flipping 𝑘 bits in optimal static mutation strength distributions for different population sizes 𝜆 for optimizing OneMax in dimension 𝑛 = 16.]

Fig. 4 displays the average wall-clock times required to find the optimal distribution for all available 𝑛 and a few values of 𝜆.
The available data suggests that the time complexity scales polynomially with a degree of 4 + 𝜀 for some small 𝜀, which, together with the earlier cubic runtime bound for the evaluation of a given distribution, suggests that the CMA-ES requires 𝑂(𝑛^{1+𝜀}) iterations before hitting one of its termination criteria.

We now take a closer look at the distributions returned by the algorithm from Sec. 3. As mentioned, the data is available at [1], and we present here only our main findings. We first study the impact of 𝜆 on the optimal distributions. To this end, we fix the dimension and analyze how for each possible mutation strength 𝑘 ∈ [0..𝑛] its probability of being chosen depends on 𝜆. Fig. 5 illustrates these probabilities for dimension 𝑛 = 16.

The 1-point distribution Pr[ℓ = 1] = 1 is optimal when 𝝀 is small. For 𝑛 =
16, we see that deterministically flipping one bit (Pr[ℓ = 1] = 1) is optimal for 𝜆 ≤ 4. Note that for 𝜆 = 1 this algorithm is the classical randomized local search, which is often used as a baseline for comparisons, both in empirical [17] and in theoretical [15] research.

[Figure 6: Values 𝑃∗(𝑘 | 𝑛 = …, 𝜆) for large 𝜆 plotted against 𝑘. The larger 𝜆, the more the curves resemble the binomial distribution Bin>0(…, 1/2), which is also plotted.]

Our data shows that for 𝑛 ∈ {…} the generalized (1 + 𝜆) RLS is optimal for 𝜆 ≤
4. Similarly, for 𝑛 ∈ {…} it is optimal for 𝜆 ≤ 4, and for 𝑛 ∈ {…} the threshold is 𝜆 = 5. We are confident that this describes a general trend of a positive correlation between the dimension 𝑛 and the maximal 𝜆 for which the 1-point distribution Pr[ℓ = 1] = 1 is optimal.

The conditional binomial distribution
Bin>0(𝒏, 1/2) is arbitrarily close to an optimal one for large 𝝀. We plot in Fig. 6 the optimal probability 𝑃∗(𝑘 | 𝑛 = …, 𝜆) for large 𝜆. The curves for 𝜆 = 512 and above closely resemble Bin>0(𝑛, 1/2), which we plot as a dotted red line. This can be explained as follows: the (1 + 𝜆) UUSD-EA with the static mutation strength distribution Bin>0(𝑛, 1/2) is simply the random search algorithm, which samples all search points, except the parent, uniformly at random. For 𝜆 = Ω(2ⁿ), this algorithm has a very good chance of sampling every point 𝑦 ∈ {0, 1}ⁿ, so that it also has a decent chance of hitting the optimum. When 𝜆 is not much larger than 2ⁿ, the quality of a distribution is significantly influenced by each of its parameters, so for these cases our computations provide distributions that are almost identical to random search in each independent run. When 𝜆 is much bigger than 2ⁿ, however, the situation changes: while common sense suggests that the truly optimal distribution gets even closer to pure random search, the quality of a distribution quickly loses sensitivity to its parameters, and we obtain different distributions, which all yield practically indistinguishable expected runtimes. We provide an example in Tab. 1: for 𝜆 ≤ 128 the averages of the computed static mutation strength distributions converge against Bin>0(𝑛, 1/2), and the standard deviations of the independent runs of our optimizer are negligible in this regime. For larger 𝜆, the standard deviations grow quickly, so the corresponding values are less reliable.

[Table 1: Average values of the recommended distributions 𝑃∗(𝑘 | 𝑛 = …, 𝜆) for different 𝜆 and all mutation strengths 𝑘. For 𝜆 ≤ … the standard deviations are negligible, then grow quickly (see Figure 3). Unreliable values are indicated with gray font. We add Bin>0(…, 1/2) for comparison.]

[Figure 7: 𝑃∗(𝑘 | 𝑛 = …, 𝜆 = …) for each of the 50 independent runs of our optimizer. The expected optimization time is identical for all of them (0.875).]

[Figure 8: 𝑃∗(1 | 𝜆, 𝑛) and 𝑃∗(2 | 𝜆, 𝑛) in dependence of 𝜆, for selected problem dimensions 𝑛.]

𝑷∗(1 | 𝝀, 𝒏) decreases monotonically with increasing 𝝀. For fixed dimension 𝑛, the importance of 1-bit flips significantly decreases as 𝜆 grows. We have seen this for 𝑛 =
16 in Fig. 5. Thistrend generalizes to other problem dimensions, which can be seenin Fig. 8, where we plot the probabilities 𝑃 ∗ ( | 𝜆, 𝑛 ) and 𝑃 ∗ ( | 𝜆, 𝑛 ) for 𝑛 ∈ { , , , , , , } . 𝑷 ∗ ( | 𝝀, 𝒏 ) is non-monotonic in 𝝀 . We clearly see from Fig. 8that, for fixed dimension 𝑛 , the optimal probability to flip two bitsis non-monotonic in 𝜆 . The 𝜆 at which it becomes non-zero appearsto be monotonic in 𝑛 , however, the set of 𝜆 we used is not enoughto determine the exact threshold. It is 𝜆 = 𝑛 =
8, 8 < 𝜆 ≤ 𝑛 ∈ { , , } , and it is 16 < 𝜆 ≤
32 for the five largest tested n. Surprisingly, this threshold is always larger than the value of λ at which P*(1 | λ, n) becomes less than one. We do not see any pattern in the points at which Pr[k = 2] starts to decrease again.

Flipping all bits can be optimal.
Intuitively, flipping more than n/
Table 2: Combinations of n and λ for which P*(n | n, λ) > 0, i.e., for which the optimal probability of flipping all bits is non-zero.

Figure 9: P*(k | n, λ), in dependence of λ, for all values k for which there is at least one λ such that P*(k | n, λ) > 0. Note that, for all these k, P*(k | n, λ) = 0 for four of the tested λ.

2 bits can be helpful only for parents x with OM(x) < n/
2. It may therefore be surprising that even for the mutation strength ℓ = n (i.e., flipping all bits) the optimal static probability can be non-zero already for comparatively small λ. In Tab. 2 we show for which combinations of n and λ the optimal probability of flipping all bits is non-zero. Note that for some n we have not seen any λ for which this probability is non-zero, and this may be related to the parity of n: for some dimensions n the small values of λ rather feature a non-zero P*(n − 1 | n, λ) instead. So far we do not have any explanation for the observed patterns. We take a set Λ of six values of λ and show an example in Fig. 9, where we display P*(k | n, λ ∈ Λ) for all k for which there exists at least one λ ∈ Λ with P*(k | n, λ) > 0; for all of these k, P*(k | n, λ) = 0 once λ >
64. We also see that only nine different k appear, of which at most three have a non-zero optimal probability for any of the tested λ.

The number of mutation strengths k with P*(k | λ, n) > 0 increases with λ, but not monotonically. We summarize in Tab. 3 the number of different mutation strengths k for which P*(k | λ, n) >
0. While there is a general trend towards an increasing number of such k with increasing λ, this trend is non-monotonic. From the previous insights, however, it is clear that for every n there exists a threshold λ(n) such that for all λ > λ(n) the number of k with P*(k | λ, n) > 0 equals n.

Table 3: Number of different mutation strengths k for which P*(k | λ, n) > 0, in dependence of n and λ.

Table 4: Values of P*(k | n = 11, λ) for selected λ. We add Bin>0(11, 1/2) for comparison.

A closer look for small λ. It is well known that small values of λ are preferable for the optimization of simple functions such as OneMax [21]. We therefore take a more detailed look at small values in Tab. 4, where we list, for n = 11, selected λ, and all possible mutation strengths k ∈ [11], the optimal probability P*(k | n, λ). We recall from above that for λ ≤ … only k = 1 has positive probability. For λ = … and λ =
7, the optimal probability P*(1 | n = 11, λ) is slightly less than 1, and the remaining probability mass is assigned to k =
10, i.e., to the operator flipping all bits but one. For λ = …, the mutation strengths k = … and k =
11 are also chosen with positive probability. As already discussed in the context of Tab. 3 and Fig. 6, the number of mutation strengths k with P*(k | n = 11, λ) > 0 grows with λ, and the distribution converges to the conditional binomial distribution Bin>0(11, 1/2), which we include in the table for reference.

After having focused on the distributions in the previous section, we now study the runtime of the optimal (1+λ) UUSD-EA in comparison to other common (1+λ) UUSD-EAs. We include in our comparison the (1+λ) EA variants with the SBM, SBM>0, and SBM0→1 standard bit mutation operators, the (1+λ) RLS, and the (1+λ) fastGA with three different values of β. We also considered the variant of the latter which directly samples mutation strengths ℓ proportional to ℓ^{−β}, for the same parameter values.

It is not difficult to see that, for any fixed n, the expected runtime of the optimal (1+λ) UUSD-EAs converges to 1 − 2^{−n} as λ → ∞. This is also the case for all (1+λ) UUSD-EA variants that assign a positive probability to each positive mutation strength k ∈ [n], and this for all problems f : {0, 1}^n → R. Since SBM and fast mutation satisfy these requirements, the expected runtime of all (1+λ) EA variants as well as that of the fastGA variants also converges to 1 − 2^{−n}, but at a possibly much different speed. The expected runtime of the generalized (1+λ) RLS, in contrast, converges to n/2 as λ → ∞.

In Table 5 we present the expected runtimes of the optimal (1+λ) UUSD-EA(s) on OneMax for all combinations of n and λ that we have. Note that these numbers are lower bounds for all (1+λ) UUSD-EAs, including the algorithms mentioned above. Note also that algorithms obtaining a better expected runtime require adaptive parameter choices.

In Figure 10 we illustrate the regret in the expected runtime of common (1+λ) UUSD-EAs compared to the optimal one, for n = 3 and n = 100, respectively. Corresponding to our discussion above, we observe that for n = 3 the expected runtime of most algorithms converges to 1 − 2^{−n}
= 0.875 as λ → ∞, whereas that of the generalized (1+λ) RLS converges to n/2 = 1.5, i.e., a 1.5/0.875 ≈ 1.71 relative disadvantage. Note that the regrets are not monotone for those algorithms which never flip zero bits, except for the (1+λ) RLS. As λ grows from the small values, their regret decreases, most probably because flipping more than one bit becomes a better choice. However, with a further increase of λ, the fact that these algorithms flip many bits only with a small probability turns into a disadvantage. Note how the heavy-tailed algorithm that samples the mutation strength directly from a power-law distribution with the smallest tested β becomes the best in this group already for λ ≥
7. Similar behavior is also seen for n =
100, with the exception that the tested values of λ are not large enough to observe the complete convergence picture.

We have analyzed in this paper the dependence of P*(k | n, λ), the optimal probability of flipping k bits in the (1+λ) UUSD-EA, on n and λ. Among other insights, we have shown that the (1+λ) RLS is optimal when λ is small, and that the value of λ at which it ceases to be optimal increases with increasing n. We have also seen that, for fixed n, the distribution P*(k | n, λ) converges to the conditional binomial distribution Bin>0(n, 1/2) as λ → ∞. For future work, we consider the following directions particularly exciting.
1) Formalizing the observations into rigorous results:
We believe that some of the observations made in this work could be formalized with reasonable effort.
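One concrete step in this direction is making the reported numbers independently reproducible. The following sketch is our own illustration (not the optimizer used in this work; the function name and interface are ours): it computes the exact expected runtime of the (1+λ) UUSD-EA on OneMax for a given static mutation strength distribution, by backward induction over fitness levels.

```python
import math

def expected_runtime(n, lam, p):
    """Exact expected runtime, in generations, of the (1+lambda) UUSD-EA on
    OneMax in dimension n, where the mutation strength k in {1..n} is drawn
    from the static distribution p[k-1] = P(k) and exactly k distinct bits
    are flipped.  Fitness = number of one-bits; selection is elitist."""
    # q[i][j]: probability that a single offspring of a parent with fitness i
    # has fitness j.  If g of the k flipped bits hit zero-bits, the offspring
    # fitness is i + g - (k - g) = i + 2g - k; g is hypergeometric.
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for k in range(1, n + 1):
            for g in range(max(0, k - i), min(k, n - i) + 1):
                q[i][i + 2 * g - k] += (p[k - 1] * math.comb(n - i, g)
                                        * math.comb(i, k - g) / math.comb(n, k))
    # Backward induction: under elitism the fitness never decreases, so the
    # expected hitting time of each level depends only on the better levels.
    t = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        cdf = [0.0] * (n + 2)          # cdf[b] = Pr[offspring fitness < b]
        for b in range(n + 1):
            cdf[b + 1] = cdf[b] + q[i][b]
        stay = cdf[i + 1] ** lam       # no offspring is strictly better than i
        if stay >= 1.0:
            return float("inf")
        gain = sum((cdf[b + 1] ** lam - cdf[b] ** lam) * t[b]
                   for b in range(i + 1, n + 1))
        t[i] = (1.0 + gain) / (1.0 - stay)
    # The initial search point is uniform, so its fitness is Bin(n, 1/2).
    return sum(math.comb(n, i) / 2 ** n * t[i] for i in range(n + 1))
```

With the 1-bit-flip distribution P(1) = 1 this reproduces, e.g., the values 3.50 (n = 3, λ = 1) and 2.26 (n = 3, λ = 2) of Table 5, and with Bin>0(3, 1/2) and large λ it approaches 1 − 2^{−3} = 0.875.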
2) Analyzing benefits of generalized mutation for more complex problems:
For practitioners, our work is perhaps most interesting in that it invites one to consider mutation operators through the lens of probability distributions over the set of possible radii. This idea should show its full potential on problems that are more complex than OneMax. The fastGA proposed in [14] and the fast AIS from [10] are compelling examples showing that such generalizations can indeed be very beneficial [25, 26].
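To make the "distributions over radii" view concrete, here is a minimal sketch (ours; the function names are illustrative) of the two static distributions that recur throughout this work: the power-law distribution pow(β) used by the direct-sampling fastGA variant, and the conditional binomial Bin>0(n, 1/2).

```python
import math
import random

def power_law_pmf(n, beta):
    """pow(beta): P(l) proportional to l**(-beta) on {1, ..., n}."""
    w = [l ** -beta for l in range(1, n + 1)]
    s = sum(w)
    return [x / s for x in w]

def conditional_binomial_pmf(n):
    """Bin>0(n, 1/2): Binomial(n, 1/2) conditioned on a positive outcome."""
    z = 2.0 ** n - 1.0
    return [math.comb(n, k) / z for k in range(1, n + 1)]

def sample_strength(pmf, rng):
    """Draw a mutation strength (radius) k in {1..n} from the given pmf."""
    return rng.choices(range(1, len(pmf) + 1), weights=pmf)[0]

def mutate(x, k, rng):
    """Flip exactly k distinct, uniformly chosen positions of bit string x."""
    y = list(x)
    for i in rng.sample(range(len(x)), k):
        y[i] = 1 - y[i]
    return y
```

Sampling k from Bin>0(n, 1/2) and flipping k uniformly chosen bits yields an offspring that is uniform over all points except the parent, which is exactly why this distribution turns the (1+λ) UUSD-EA into random search.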
3) Transferring the generalizations to variation operators of higher arity:
The quest for analyzing more general variation

Figure 10: Regrets of the expected runtime of different (1+λ) UUSD-EAs on OneMax, in dependence of λ, compared to the runtime of the optimal (1+λ) UUSD-EA computed in Sec. 4, for n = 3 (top) and n = 100 (bottom). The algorithms that sample the mutation strength ℓ directly from the power-law distribution with parameter β are denoted as ℓ ∼ pow(β). The notation for other algorithms is standard.

operators is not restricted to mutation alone, but generalizes to variation operators of higher arity, called "crossover" or "recombination" operators in evolutionary computation. In a simplest extension, one could study the effects of changing the binomial distribution associated with the number of bits that is taken from each parent in uniform crossover. We did not yet investigate this idea further, but we hope that a de-coupling of mean and variance, similarly as proposed for variation in [28], may be beneficial.
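The extension suggested above can be sketched in a few lines (our illustration; the interface is hypothetical): the pmf `dist` over {0, ..., n} replaces the Bin(n, 1/2) implicit in standard uniform crossover.

```python
import random

def generalized_uniform_crossover(x, y, dist, rng):
    """Uniform crossover seen as a distribution over 'radii': draw the number
    c of positions inherited from the second parent y from the pmf `dist`
    (indexed 0..n), then pick those c positions uniformly at random.  With
    dist = Bin(n, 1/2) this is exactly standard uniform crossover."""
    n = len(x)
    c = rng.choices(range(n + 1), weights=dist)[0]
    child = list(x)
    for i in rng.sample(range(n), c):
        child[i] = y[i]
    return child
```

Decoupling the mean and the variance of `dist`, as suggested for mutation in [28], amounts to choosing this pmf freely instead of tying it to a single binomial parameter.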
4) Interplay of generalized mutation with other variation operators:
The benefits of generalized mutation operators are very likely not restricted to mutation-only algorithms, but could also improve algorithms that use variation operators of different arities. First examples demonstrating clear advantages of heavy-tailed mutation in the (1 + (λ, λ)) GA [12] were recently shown in [2, 3].
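For reference, one iteration of the (1 + (λ, λ)) GA can be sketched as follows. This is our compact illustration with the standard parameter coupling p = λ/n and c = 1/λ; in the heavy-tailed variants studied in [2, 3], λ itself is re-sampled in every iteration from a power-law distribution.

```python
import random

def one_plus_lambda_lambda_iteration(f, x, lam, rng):
    """One elitist iteration of the (1+(lambda,lambda)) GA maximizing f on the
    bit string x, with mutation rate p = lam/n and crossover bias c = 1/lam."""
    n = len(x)
    ell = sum(rng.random() < lam / n for _ in range(n))  # ell ~ Bin(n, lam/n)

    def flip(z, k):  # flip k distinct, uniformly chosen bits of z
        z = list(z)
        for i in rng.sample(range(n), k):
            z[i] = 1 - z[i]
        return z

    # Mutation phase: lam offspring, all flipping the same number ell of bits.
    xp = max((flip(x, ell) for _ in range(lam)), key=f)
    # Crossover phase: lam biased crossovers taking each bit of xp with prob 1/lam.
    yc = max(([b if rng.random() < 1 / lam else a for a, b in zip(x, xp)]
              for _ in range(lam)), key=f)
    return yc if f(yc) >= f(x) else x
```

The elitist return value guarantees that the parent fitness never decreases from one iteration to the next.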
5) Extensions to the dynamic case:
We studied in this work the case of static distributions P*(k | n, λ), whereas it is well known that a dynamic choice of the mutation rates, or of algorithm parameters in general, can lead to significant performance gains [11, 22]. Combining the analyses made in [7] for the optimal dynamic 1pt and the optimal dynamic SBM operators with the approach taken in this work (the generalization to arbitrary distributions) would provide an exact quantification of the disadvantage of static against dynamic mutation operator choices.
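A toy dynamic baseline illustrates what such a fitness-dependent choice looks like (our illustration; this greedy rule maximizes the per-generation improvement probability and is not the runtime-optimal policy of [7], cf. [6] on why maximizing drift can be suboptimal):

```python
import math

def greedy_dynamic_strengths(n):
    """For each OneMax fitness level i < n, return the mutation strength k
    that maximizes the probability of a strict improvement when exactly k
    uniformly chosen bits of an n-bit parent with i one-bits are flipped.
    Ties are broken towards the smallest k."""
    def improve_prob(i, k):
        total = 0.0
        for g in range(max(0, k - i), min(k, n - i) + 1):
            if 2 * g > k:  # offspring fitness i + 2g - k strictly exceeds i
                total += (math.comb(n - i, g) * math.comb(i, k - g)
                          / math.comb(n, k))
        return total
    return [max(range(1, n + 1), key=lambda k: improve_prob(i, k))
            for i in range(n)]
```

Comparing the runtime of such a dynamic policy with the optimal static distributions computed here would give a first, crude estimate of the static-versus-dynamic gap.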
Table 5: Expected runtimes (in generations) of the optimal (1+λ) UUSD-EA(s) on OneMax for different combinations of n (rows) and λ (columns), rounded to two digits. For every fixed n, the optimal expected runtime converges to 1 − 2^{−n} as λ → ∞.

 n \ λ      1      2      3      4      5      6      7      8     16     32     64    128    256    512   1024
  3      3.50   2.26   1.87   1.64   1.48   1.37   1.28   1.21   0.96   0.88   0.88   0.88   0.87   0.87   0.87
  5      7.97   4.78   3.78   3.26   2.92   2.67   2.49   2.35   1.78   1.39   1.10   0.98   0.97   0.97   0.97
  8     16.20   9.32   7.12   6.04   5.36   4.89   4.54   4.27   3.15   2.43   1.98   1.67   1.39   1.14   1.01
 11     25.59  14.45  10.84   9.11   8.04   7.30   6.76   6.34   4.63   3.53   2.82   2.33   2.01   1.83   1.62
 16     43.00  23.87  17.64  14.62  12.83  11.60  10.71  10.02   7.26   5.46   4.31   3.54   3.01   2.61   2.26
 23     69.95  38.35  28.02  22.99  20.04  18.04  16.60  15.50  11.14   8.32   6.51   5.30   4.46   3.83   3.34
 32    107.69  58.52  42.40  34.52  29.91  26.84  24.62  22.94  16.34  12.14   9.42   7.62   6.38   5.47   4.79
 45    166.58  89.83  64.63  52.27  45.03  40.25  36.82  34.23  24.17  17.84  13.74  11.05   9.21   7.87   6.86
 64    259.25 138.90  99.31  79.86  68.44  60.95  55.59  51.56  36.09  26.45  20.23  16.17  13.39  11.41   9.92
 91    400.44 213.38 151.76 121.44 103.60  91.93  83.62  77.39  53.70  39.07  29.70  23.59  19.45  16.50  14.31
100    449.42 239.17 169.88 135.79 115.70 102.57  93.24  86.24  59.70  43.36  32.90  26.09  21.48  18.22  15.79

REFERENCES
[1] Authors Anonymized. 2021. Code and data repository for this paper. Blinded for GECCO reviews.
[2] Denis Antipov, Maxim Buzdalov, and Benjamin Doerr. 2020. Fast mutation in crossover-based algorithms. In
Proc. of Genetic and Evolutionary Computation Conference (GECCO'20). ACM, 1268–1276. https://doi.org/10.1145/3377930.3390172
[3] Denis Antipov and Benjamin Doerr. 2020. Runtime Analysis of a Heavy-Tailed (1 + (λ, λ)) Genetic Algorithm on Jump Functions. In Proc. of Parallel Problem Solving from Nature (PPSN'20) (LNCS), Vol. 12270. Springer, 545–559. https://doi.org/10.1007/978-3-030-58115-2_38
[4] Golnaz Badkobeh, Per Kristian Lehre, and Dirk Sudholt. 2014. Unbiased Black-Box Complexity of Parallel Search. In Proc. of Parallel Problem Solving from Nature (PPSN'14) (LNCS), Vol. 8672. Springer, 892–901.
[5] Golnaz Badkobeh, Per Kristian Lehre, and Dirk Sudholt. 2015. Black-box Complexity of Parallel Search with Distributed Populations. In Proc. of Foundations of Genetic Algorithms (FOGA'15). ACM, 3–15.
[6] Nathan Buskulic and Carola Doerr. 2019. Maximizing drift is not optimal for solving OneMax. In Proc. of Genetic and Evolutionary Computation Conference (GECCO'19, Companion Material). ACM, 425–426. https://doi.org/10.1145/3319619.3321952 An extension of this work is to appear in the Evolutionary Computation journal.
[7] Maxim Buzdalov and Carola Doerr. 2020. Optimal Mutation Rates for the (1+λ) EA on OneMax. In Proc. of Parallel Problem Solving from Nature (PPSN'20) (LNCS), Vol. 12270. Springer, 574–587. https://doi.org/10.1007/978-3-030-58115-2_40
[8] Eduardo Carvalho Pinto and Carola Doerr. 2018. A Simple Proof for the Usefulness of Crossover in Black-Box Optimization. In Proc. of Parallel Problem Solving from Nature (PPSN'18) (LNCS), Vol. 11102. Springer, 29–41. https://doi.org/10.1007/978-3-319-99259-4_3
[9] Eduardo Carvalho Pinto and Carola Doerr. 2018. Towards a More Practice-Aware Runtime Analysis of Evolutionary Algorithms. CoRR abs/1812.00493 (2018). arXiv:1812.00493 http://arxiv.org/abs/1812.00493
[10] Dogan Corus, Pietro S. Oliveto, and Donya Yazdani. 2018. Fast Artificial Immune Systems. In Proc. of Parallel Problem Solving from Nature (PPSN'18) (LNCS), Vol. 11102. Springer, 67–78.
[11] Benjamin Doerr and Carola Doerr. 2020. Theory of Parameter Control Mechanisms for Discrete Black-Box Optimization: Provable Performance Gains Through Dynamic Parameter Choices. In Theory of Evolutionary Computation: Recent Developments in Discrete Optimization. Springer, 271–321.
[12] Benjamin Doerr, Carola Doerr, and Franziska Ebel. 2015. From black-box complexity to designing new genetic algorithms. Theoretical Computer Science 567 (2015), 87–104.
[13] Benjamin Doerr, Carola Doerr, and Jing Yang. 2020. Optimal parameter choices via precise black-box analysis. Theoretical Computer Science 801 (2020), 1–34. https://doi.org/10.1016/j.tcs.2019.06.014
[14] Benjamin Doerr, Huu Phuoc Le, Régis Makhmara, and Ta Duy Nguyen. 2017. Fast genetic algorithms. In Proc. of Genetic and Evolutionary Computation Conference (GECCO'17). ACM, 777–784. https://doi.org/10.1145/3071178.3071301
[15] Benjamin Doerr and Frank Neumann. 2020. Theory of Evolutionary Computation: Recent Developments in Discrete Optimization. Springer. https://doi.org/10.1007/978-3-030-29414-4
[16] Carola Doerr and Johannes Lengler. 2017. Introducing Elitist Black-Box Models: When Does Elitist Behavior Weaken the Performance of Evolutionary Algorithms? Evolutionary Computation 25 (2017). https://doi.org/10.1162/evco_a_00195
[17] Carola Doerr, Furong Ye, Naama Horesh, Hao Wang, Ofer M. Shir, and Thomas Bäck. 2020. Benchmarking discrete optimization heuristics with IOHprofiler. Applied Soft Computing 88 (2020), 106027. https://doi.org/10.1016/j.asoc.2019.106027
[18] Christian Gießen and Carsten Witt. 2017. The Interplay of Population Size and Mutation Probability in the (1+λ) EA on OneMax. Algorithmica 78, 2 (2017), 587–609.
[19] Christian Gießen and Carsten Witt. 2018. Optimal Mutation Rates for the (1+λ) EA on OneMax Through Asymptotically Tight Drift Analysis. Algorithmica 80, 5 (2018), 1710–1731. https://doi.org/10.1007/s00453-017-0360-y
[20] Nikolaus Hansen and Andreas Ostermeier. 2001. Completely Derandomized Self-Adaptation in Evolution Strategies. Evolutionary Computation 9, 2 (2001), 159–195. https://doi.org/10.1162/106365601750190398
[21] Thomas Jansen, Kenneth A. De Jong, and Ingo Wegener. 2005. On the Choice of the Offspring Population Size in Evolutionary Algorithms. Evolutionary Computation 13, 4 (2005), 413–440. https://doi.org/10.1162/106365605774666921
[22] Giorgos Karafotias, Mark Hoogendoorn, and A. E. Eiben. 2015. Parameter Control in Evolutionary Algorithms: Trends and Challenges. IEEE Transactions on Evolutionary Computation 19 (2015), 167–187.
[23] Per Kristian Lehre and Dirk Sudholt. 2019. Parallel Black-Box Complexity with Tail Bounds. CoRR abs/1902.00107 (2019). arXiv:1902.00107 http://arxiv.org/abs/1902.00107
[24] Per Kristian Lehre and Carsten Witt. 2012. Black-Box Search by Unbiased Variation. Algorithmica 64 (2012), 623–642.
[25] Laurent Meunier, Herilalaina Rakotoarison, Pak-Kan Wong, Baptiste Rozière, Jérémy Rapin, Olivier Teytaud, Antoine Moreau, and Carola Doerr. 2020. Black-Box Optimization Revisited: Improving Algorithm Selection Wizards through Massive Benchmarking. CoRR abs/2010.04542 (2020). arXiv:2010.04542 https://arxiv.org/abs/2010.04542
[26] Vladimir Mironovich and Maxim Buzdalov. 2017. Evaluation of heavy-tailed mutation operator on maximum flow test generation problem. In Proc. of Genetic and Evolutionary Computation Conference (GECCO'17, Companion Material). ACM, 1423–1426. https://doi.org/10.1145/3067695.3082507
[27] Raymond Ros and Nikolaus Hansen. 2008. A simple modification in CMA-ES achieving linear time and space complexity. In Proc. of Parallel Problem Solving from Nature (PPSN X) (LNCS), Vol. 5199. Springer, 296–305.
[28] Furong Ye, Carola Doerr, and Thomas Bäck. 2019. Interpolating Local and Global Search by Controlling the Variance of Standard Bit Mutation. In