A Survey on Recent Progress in the Theory of Evolutionary Algorithms for Discrete Optimization
Benjamin Doerr ∗ Frank Neumann † July 1, 2020
Abstract
The theory of evolutionary computation for discrete search spaces has made a lot of progress during the last ten years. This survey summarizes some of the most important recent results obtained in this research area. It reviews important methods such as drift analysis, discusses theoretical insights on parameter tuning and parameter control, and summarizes the advances made for stochastic and dynamic problems. Furthermore, the survey highlights important results in the area of combinatorial optimization with a focus on parameterized complexity and the optimization of submodular functions. Finally, it gives an overview of the large number of new important results for estimation-of-distribution algorithms.
Evolutionary computing techniques have been applied in a large variety of different settings, ranging from classical optimization problems in the context of supply chain management and renewable energy [BM16, TWD+13, NAW20] to the creation of music and art [Dos13, Lew08, NAN20]. The easy applicability of evolutionary algorithms makes them attractive also to users from outside computer science disciplines and is one of the major reasons for their success in a wide range of engineering applications such as the design of water networks [BDM15] or processing and planning in mining [MD10, OWBM13].

The theoretical understanding and analysis of evolutionary algorithms is key to further increasing the applicability and performance of evolutionary computing methods in a wide range of settings. The area of runtime analysis has played a predominant role during the last 25 years in the theory of evolutionary computation for discrete optimization problems. The goal of this survey is to point out important research directions and to summarize the most important recent results.

∗ Laboratoire d'Informatique (LIX), CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
† Optimisation and Logistics, School of Computer Science, The University of Adelaide, South Australia, Australia

Drift analysis has provided a wide range of analytical methods that are frequently used for the runtime analysis of evolutionary algorithms. It establishes conditions on the progress of an evolutionary algorithm that lead to specific runtime bounds. We will summarize the most important drift theorems and their applications, together with the challenges involved when using drift analysis, in Section 2.

After that, we turn to the parameterized complexity analysis of evolutionary computing in Section 3. This area investigates the runtime with respect to the given input size and important structural parameters of a given problem instance. It allows a more fine-grained view on the runtime behavior and reveals how structural parameters influence the runtime.
The analyses carried out in this area focus on classical combinatorial optimization problems such as minimum vertex cover and the Euclidean traveling salesperson problem, and we will summarize the main results for them.

Setting the parameters of evolutionary algorithms is a key challenge in the profitable use of these heuristics. In Section 4, we discuss how recent theoretical works suggest to set the parameters. We also discuss different ways to let the algorithm optimize its parameters itself, which currently appears to be a very powerful, easy-to-use approach.
Dynamic and stochastic problems play a key role in many real-world applications, and evolutionary algorithms have been shown to be very successful in dynamic and stochastic environments. The theoretical investigations in terms of runtime analysis for such problems were started by Droste in the mid-2000s, and a wide range of results have been obtained during the last 10 years. We will summarize such results in Section 5.

Many important problems can be formulated in terms of a submodular function with a given set of constraints. The analysis and the design of evolutionary algorithms for submodular optimization problems has gained a lot of attention during the last 5 years. Various types of constraints as well as dynamic and stochastic settings have been investigated, and provably efficient evolutionary algorithms outperforming previous state-of-the-art greedy approaches have been designed. The most important results and the different areas investigated are presented in Section 6.
Estimation-of-distribution algorithms (EDAs) are evolutionary algorithms which do not evolve a population of good solution candidates, but a probability distribution on the search space from which good solutions can be sampled. Due to the complicated nature of the underlying mathematical objects (a random process taking probability distributions as states), for a long time the theoretical understanding of these algorithms was very limited. The last few years, however, have seen great progress in this topic, both showing new advantages of EDAs such as robustness to noise and giving advice on how to set their parameters. We review some of these results in Section 7.

2 Drift Analysis
Drift analysis has become one of the most heavily employed tools in the mathematical analysis of evolutionary algorithms (EAs). Interestingly, it is one of the few tool sets which were not imported from the classic algorithms field. Rather, the classic algorithms field is now starting to use the drift theorems developed in our field, see, e.g., [BLM+20, GKK18, KU18, OE12].

Drift analysis as a tool in the performance analysis of EAs builds on the insight that it is often easy to estimate the expected progress (with regard to some suitable measure) of an EA in one iteration. Drift analysis therefore tries to translate this information into estimates for the first time that a particular goal is achieved.

As a simple humorous example, inspired by a similar one from [Doe11], consider the following question. You have an initial capital of $1,000. Each day you go to your favorite pub and drink a random number of beers for an expected total price of $10. After how many days are you bankrupt? If there was no randomness involved, that is, if you spent exactly $10 each day, then obviously it takes exactly 100 days to spend your money. So does the answer change with randomness? Interestingly, it does not (of course, we can now only talk about the expected number of days to bankruptcy): The expected number of days until you have spent all your money is exactly 100, regardless of the distribution of the amount you spend per day (which could be different for each day, could depend on previous days, and could also take negative values). This is a simple application of the additive drift theorem (Theorem 1 below).

The additive drift theorem is intuitive, but is in fact a deep mathematical result. Also, we have to note that it is not true that "randomness never changes things". Take for example the opposite process: You start with no money, but each day you earn an expected amount of ten dollars. What is the expected time it takes until you have at least $1,000? Now we can only say that it is at least 100 days (with a slightly less direct application of the additive drift theorem), but it could be much larger. For example, if each day we earn $10,000 with probability 0.001 and $0 otherwise, then it takes an expected number of 1000 days until we have at least $1,000.

Drift analysis was introduced to the field of evolutionary computation in the seminal work [HY01] of He and Yao. The additive drift theorem developed there from Hajek's work [Haj82] was already the elegant tool we still use a lot, but many of its courageous applications, e.g., to the linear functions problem, were highly technical. For that reason, many researchers shied away from using this method and preferred classic arguments like Wegener's fitness level technique [Weg01]. Over time, however, more elegant applications of the additive drift theorem, e.g., Jägersküpper's [Jäg08] analysis of the linear functions problem, and drift theorems better capturing particular scenarios, e.g., the multiplicative drift theorem [DJW12a], paved the way to drift analysis becoming possibly the most powerful tool in the mathematical analysis of EAs.

2.1 Three True Drift Theorems
To show the beauty, simplicity, and power of drift analysis, we now present three central drift theorems. We call them true drift theorems to reflect that all three translate information on the expected one-step progress into a hitting time without further assumptions on the distribution of the one-step progress. We state these theorems in their most basic version and trust that the reader is able to derive more general-looking, but equivalent versions via scaling, shifting, or mirroring the random process.
From a deeper mathematical result of Hajek [Haj82], He and Yao [HY01] derived the additivedrift theorem and used it to prove several runtime bounds.
Theorem 1 (additive drift theorem). Let X_0, X_1, . . . be a sequence of random variables taking values in some finite set S ⊆ R_{≥0} with 0 ∈ S. Let T = inf{t | X_t = 0}.

• Assume that there is a δ > 0 such that for all t ≥ 0 and s ∈ S \ {0}, we have E[X_t − X_{t+1} | X_t = s] ≥ δ. Then E[T | X_0] ≤ X_0 / δ.

• Assume that there is a δ > 0 such that for all t ≥ 0 and s ∈ S \ {0}, we have E[X_t − X_{t+1} | X_t = s] ≤ δ. Then E[T | X_0] ≥ X_0 / δ.

Without going into details, we note that the assumptions can be weakened slightly, e.g., one can replace the "point-wise drift requirement", that is, the conditioning on X_t = s, by an "average drift condition", that is, conditioning only on X_t > 0. Moreover, the first statement holds for arbitrary sets S ⊆ R_{≥0}, and the second part is true also for bounded infinite sets S; see [LS18], where also a short and elegant proof of this result is presented.

The additive drift theorem gives good results if there is a roughly uniform progress regardless of time and state. In fact, as the two estimates together show, the additive drift theorem gives an exact estimate for the hitting time T when the expected progress is known to be exactly δ at all times before hitting the target.

For many natural optimization processes, the progress towards the optimum slows down when getting closer to the optimum. To use the additive drift theorem in such situations, the natural distance measure has to be transformed in such a way that the resulting expected progress is roughly uniform. Since the expected transformed progress is usually not just the transformation of the expected progress, such proofs can become technical and unintuitive. Noting that a common situation is that the expected progress is roughly proportional to the distance to the target, in [DJW12a] a multiplicative drift theorem was derived from the additive drift theorem. With a simpler direct proof, the following variant was later shown in [DG13].
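To make the pub example from the beginning of this section concrete, here is a small simulation sketch of the first statement of Theorem 1. The uniform spending distribution on {5, . . . , 15} dollars is an arbitrary illustrative choice with mean $10; the theorem says only this mean matters.

```python
import random

def days_until_broke(capital=1000, seed=0):
    """Spend a random amount with mean $10 each day; return the day the money runs out.

    The uniform {5, ..., 15} spending distribution is an illustrative assumption;
    the additive drift theorem only requires an expected daily drift of $10.
    """
    rng = random.Random(seed)
    days = 0
    while capital > 0:
        capital -= rng.randint(5, 15)
        days += 1
    return days

runs = 10_000
avg_days = sum(days_until_broke(seed=s) for s in range(runs)) / runs
print(f"average days until bankrupt: {avg_days:.1f}")  # close to X_0/delta = 1000/10 = 100
```

The empirical average lands very close to the X_0/δ = 100 days predicted by the theorem (a tiny overshoot occurs because the last purchase may push the capital slightly below zero).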
According to [Len20], the multiplicative drift theorem is the most often used drift theorem in the theory of evolutionary algorithms.

Theorem 2 (multiplicative drift theorem). Let X_0, X_1, . . . be a sequence of random variables over a state space S ⊆ {0} ∪ R_{≥1} with 0 ∈ S. Let T = min{t | X_t = 0}. Assume that there is a δ > 0 such that for all t ≥ 0 and s ∈ S \ {0}, we have E[X_{t+1} | X_t = s] ≤ (1 − δ)s. Then the following estimates hold.

• E[T | X_0] ≤ (ln(X_0) + 1) / δ.

• For all λ > 0, we have Pr[T > ⌈(ln(X_0) + λ) / δ⌉] ≤ exp(−λ).

While indeed very many processes occurring in the analysis of evolutionary algorithms display an additive or multiplicative drift behavior, there remain processes in which the drift is decreasing when approaching the target (so that the additive drift theorem is hard to use), but not in a multiplicative fashion (so that the multiplicative drift theorem is hard to use). For these, so-called variable drift theorems can be applied. The first variable drift theorem for the analysis of evolutionary algorithms was proposed by Mitavskiy, Rowe, and Cannings [MRC09]; however, the independently developed result of Johannsen [Joh10] appears to be used more often. The following is a variant of Johannsen's result avoiding the use of integrals.
Theorem 3 (variable drift theorem). Let X_0, X_1, . . . be a sequence of random variables over a finite space S ⊆ R_{≥0} with 0 ∈ S. Assume that S = {s_0, s_1, . . . , s_M} with 0 = s_0 < s_1 < · · · < s_M. Let T = min{t | X_t = 0}. Assume that there is a monotonically non-decreasing function h : S \ {0} → R_{>0} such that for all t ≥ 0 and s ∈ S \ {0}, we have E[X_t − X_{t+1} | X_t = s] ≥ h(s). Then E[T | X_0 = s_k] ≤ Σ_{i=1}^{k} (s_i − s_{i−1}) / h(s_i).

The above are, most likely, the three most important drift theorems. We mention that the only other true drift theorem (that is, one not requiring additional assumptions on the one-step distribution) we are aware of is the following result proven in [DLO19]: Let a random process as in the multiplicative drift theorem be given, but with the drift condition E[X_t − X_{t+1} | X_t = s] ≥ δs replaced by the slightly stronger condition E[X_t − X_{t+1} | X_t = s] ≥ δs(log_γ(s) + 1) for some γ > 1. Then E[T | X_0] ≤ (γ + max{0, log_γ log_γ(X_0)}) / δ. We do not know if this result will find other applications, so we state it here mainly to demonstrate that an only slightly stronger assumption on the drift – Ω(s log s) instead of Ω(s) – can lead to a drastically smaller hitting time – O(log log X_0) instead of O(log X_0).

The drift theorems presented so far derive estimates for hitting times solely from the expected one-step progress; however, with two important restrictions: (i) except for the additive drift theorem, only upper bounds for hitting times can be obtained, and (ii) only processes can be analyzed in which there is a drift towards the target that can be uniformly bounded or that decreases when approaching the target.

Consequently, these drift theorems miss out on a large number of behaviors of random processes that occur in the analysis of evolutionary algorithms. In the following, we briefly describe such behaviors and what solutions for their analysis exist. Unfortunately, and this is the reason why we shall state no precise result, all these tools not only require information on the expected one-step change, but also on the distribution of the one-step change (typically, that the one-step change is concentrated around its expectation). For all results, this is not a weakness of the result, but an intrinsic necessity.
From classic algorithms theory we know that it is very valuable to also have lower bounds on runtimes, as these quantify how good our performance guarantees (upper bounds) are. If we have derived an upper bound from a certain drift behavior, say additive, multiplicative, or a certain variable drift, then the most natural approach would be to show a matching or near-matching upper bound on the expected one-step progress and to derive from it (via a suitable drift theorem) a lower bound on the runtime. This works perfectly for the additive drift theorem, as it contains such matching upper and lower bound results.

For multiplicative and variable drift, the theorems presented in the previous section are missing such matching results, and this for good reason, namely because in general they are not true. As a simple example, consider the process on the state space S = {0, n}, starting with probability one in X_0 = n, which moves from state n to 0 with probability 1/n and stays in n otherwise. Apparently, we have E[X_{t+1} | X_t = n] = (1 − 1/n)n, that is, we have perfect multiplicative drift with δ = 1/n. The multiplicative drift theorem thus gives an estimate for the expected hitting time of E[T] = O(n log n). This is best possible in the sense that there are processes with multiplicative drift with δ = 1/n which indeed need Ω(n log n) time, but for this particular process, the truth obviously is E[T] = n. This shows that a matching lower bound cannot exist without additional assumptions.

The assumption that usually gives the desired behavior (and the desired lower bounds) is that the one-step progress is concentrated around its expectation, typically with some exponential tails or by forbidding large progress at all. We spare the details and point the reader to [Wit13, DDK18] for a multiplicative drift theorem for lower bounds and to [DFW11, GW18, DDY20] for variable drift theorems for lower bounds.
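The two-state example above is easy to simulate. The following sketch (parameters chosen purely for illustration) contrasts the true expected hitting time n with the (ln X_0 + 1)/δ guarantee of the multiplicative drift theorem, showing why the upper bound cannot be matched by a lower bound without further assumptions.

```python
import math
import random

def hitting_time(n, rng):
    """The process from the text: X_0 = n; each step, jump to 0 with prob. 1/n."""
    t = 0
    while rng.random() >= 1.0 / n:
        t += 1
    return t + 1  # count the final jump to 0 as one step

n = 50
rng = random.Random(1)
runs = 20_000
avg_T = sum(hitting_time(n, rng) for _ in range(runs)) / runs

# Multiplicative drift bound (ln X_0 + 1)/delta with X_0 = n and delta = 1/n.
bound = (math.log(n) + 1) * n
print(f"empirical E[T] ~ {avg_T:.1f}; truth n = {n}; drift theorem bound ~ {bound:.0f}")
```

The empirical mean concentrates near n = 50, far below the O(n log n) bound of roughly 246, exactly as the discussion above predicts.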
All three main drift theorems require that the one-step progress is not increasing when approaching the target. This is a behavior often observed in evolutionary computation: The better the current solutions are, the harder it is to make progress. However, the opposite behavior can also be found, for example, when we consider how a better individual takes over a population. Here we would expect that the number of copies of the good individual increases in a multiplicative fashion (of course, only up to the point that a certain saturation is reached). Processes showing an increasing multiplicative drift have been analyzed in several papers dealing with population-based EAs, most notably in Lehre's [Leh11] level-based theorem and many follow-up works. An explicit formulation of a drift result for such processes was given in [DK19]. Again, an expected multiplicative one-step progress is not enough, but some additional concentration assumptions are necessary. Motivated by the application to population processes, the additional assumption was made that the one-step progress stochastically dominates a binomial distribution.

A different situation is that a process shows a drift away from the target and that we want to argue that it takes a long time to reach this target. Such a situation naturally arises again in lower-bound proofs. The first such drift theorem was given by Oliveto and Witt [OW11, OW12]. Like many results proven later, see again the survey [Len20], it shows that if there is a constant negative expected progress in some interval of length ℓ and the one-step changes have both-sided exponential tails, then with probability 1 − exp(−Ω(ℓ)), the process takes time exponential in ℓ to reach the target.

A different approach to analyze a negative drift situation was taken in [ADY19].
Instead of the true process X_t, one regards an exponential transformation Y_t = exp(c(X_t − d)) for suitable constants c, d, shows that Y_t has at most a constant additive drift, and then uses the lower bound part of the additive drift theorem to derive the desired result. Depending on how easy it is to compute the drift of the transformed process, this approach might be technically simpler than using the existing negative-drift theorems. Different from all existing negative-drift theorems, it allows deriving explicit constants in the exponent. As shown in [Doe20a], this approach can also give super-exponential lower bounds. Very recently, a negative drift theorem without additional constraints was presented in [Doe20b]. At the moment, it is hard to foresee if it will find other applications than those presented in [Doe20b].

The results discussed so far show that we now have a decent number of drift theorems, which cover many different random processes. While surely new drift theorems will come up and existing ones will be polished, we are optimistic that the drift theorems developed in the last twenty years allow us to analyze most random processes occurring in the analysis of EAs. What is less understood, and often still a challenge, is defining the right random process. To be able to apply a drift theorem, we need to define a random process (X_t) that describes some aspect of the run of our EA on some problem. Formally speaking, we need a function g that maps the full state S_t of the algorithm after iteration t into a real number X_t = g(S_t), and this in a way that the process (X_t) still contains some relevant information of the run of the EA (e.g., that a suitable hitting time of (X_t) corresponds to the time when an optimum was first found) and in a way that a drift theorem can be applied.
While there are some generic solutions to this technical problem, many questions are still open here, and this might be the biggest challenge in the future of drift analysis.

A natural way to define the potential function g is to take the fitness distance of the current best solution to the optimum. This works well when there is a good correlation between the remaining optimization time and the fitness distance, as observed, e.g., for the simple benchmarks OneMax and LeadingOnes (note that the classic analyses [DJW02] stem from the time before drift analysis was introduced and hence use Wegener's [Weg01] fitness level method), or for combinatorial problems such as the minimum spanning tree problem (again, the classic proof [NW07] does not use drift, but the expected multiplicative weight decrease method) or the maximum satisfiability problem with clauses of length 3 [DNS17].

An equally natural potential is the structural distance to the optimum, e.g., the Hamming distance in the case of pseudo-Boolean optimization. This was used, e.g., to show that the (1 + 1) EA with mutation rate c/n, c a constant strictly between 0 and 1, optimizes any strictly monotonic function in time O(n log n) [DJS+13]. A third prominent example is the analysis of the (1 + 1) EA on linear functions f(x) = Σ_{i=1}^{n} a_i x_i. With a sequence of more powerful potential functions, all different from fitness and structural distance, increasingly strong results were obtained [DJW02, DJW12a, DG13, Wit13]. Unfortunately, it remains unclear how to easily derive such potential functions. In fact, the only result regarding this question is a negative one, namely that to prove the results for larger mutation rates such as [DG13, Wit13], it is not possible to use one "universal" potential function for all linear functions, but the potential has to be chosen depending on the problem instance [DJW12b].

In three particular directions, we currently see the greatest lack of understanding of how to define potential functions to use drift analysis. These are the following.
2.3.1 Drift Analysis for Representations Other Than Bit Strings

Once a relatively compact analysis of the runtime of the (1 + 1) EA with standard mutation rate 1/n on linear functions was found [DJW12a], the question was raised how far these methods could be extended. One direction are linear functions defined not on bit strings, but on higher-arity representations {0, . . . , r}^n. While the O(n log n) runtime estimate could be shown for the search space {0, 1, 2}^n [DJS11], it was also shown in this work that there is no universal potential function from r ≥ 43 on. With instance-specific potential functions, an O(rn log n + r²n log log n) upper bound was shown in [DP12]. This extends the O(rn log n) bound to all r = O(log n / log log n), but not beyond. It is an open problem whether larger r indeed lead to an inferior runtime behavior or not. This example and the general shortage of works analyzing EAs with representations different from bit strings via drift analysis (we are only aware of [KLW15a, LW16, DDK18]) suggest that more work is needed in this direction.

We note that Jägersküpper, with a clever averaging argument, could also use the structural distance as potential function.

2.3.2 Drift Analysis for Population-based EAs

All works described above, and in general the vast majority of runtime analyses building on drift arguments, only regard very simple EAs such as the (1 + 1) EA or, occasionally, the (1 + λ) EA or the (1 + (λ, λ)) GA. For such EAs, a potential function only needs to estimate the quality of the single parent individual. For EAs working with a non-trivial parent population, it is much harder to define a suitable potential function. In fact, the main work on lower bounds for such algorithms by Lehre [Leh10] used drift arguments only in the ancestral lines of single individuals and captured the effect of the whole population via family trees (see [Doe20b] for an alternative approach). Again for lower bound proofs, Neumann, Oliveto, and Witt [NOW09] and later [OW15, ADY19] used Σ_{x∈P} c^{OneMax(x)} as potential (to be maximized) of a population P in an algorithm maximizing the OneMax benchmark, where c > 1 is a suitable constant [ADFH18].
2.3.3 Drift Analysis for Dynamic Parameter Choices

With the popularity of dynamic parameter choices both in theory (see also Section 4.2) and practice, there is a strong need for mathematical methods to analyze such algorithms. From the perspective of drift analysis, again the challenge is to define a suitable potential function on the cross product of populations (which in the simplest case are just single individuals) and parameter values (or, more generally, the full inner state of the algorithm). So far, we are only aware of the four works [DDK18, AAG18, Row18, DWY18a] providing solutions to this problem. In the interest of brevity, we refer to [DWY18b, Section 1.3] for a more detailed discussion, and state here only that our impression is that more work on this problem is necessary (and desirable) to ease future analyses of dynamic parameter settings.
3 Parameterized Complexity Analysis

Traditional runtime analysis investigates the runtime of an evolutionary computing technique with respect to the size of the given input. This takes a worst-case perspective over all possible inputs without being able to distinguish important input characteristics that make a problem hard or easy to solve.

Parameterized analysis of algorithms [DF99] allows investigating algorithms not just with respect to the worst-case behaviour regarding the length of the given input, usually denoted by n, but also with respect to some additional parameter(s) that characterize the problem. A problem is called fixed-parameter tractable (FPT) with respect to a parameter k iff there is an algorithm that runs in time O(poly(n) · f(k)), where f(k) is a function only depending on k. We call an algorithm an FPT algorithm with respect to a parameter k iff it runs in time O(poly(n) · f(k)). This implies that an FPT algorithm runs in polynomial time if k is constant.

Algorithm 1: GSEMO
  Choose an initial solution x ∈ {0, 1}^n uniformly at random;
  Determine f(x) and initialize P ← {x};
  repeat forever
    Choose x ∈ P randomly;
    Create x′ by flipping each bit of x independently with probability 1/n;
    Determine f(x′);
    if ∃ x′′ ∈ P with f(x′′) ≤ f(x′) and f(x′′) ≠ f(x′) then
      P is unchanged
    else
      exclude all x′′ with f(x′) ≤ f(x′′) from P and add x′ to P

The approach of analyzing evolutionary algorithms in the context of parameterized complexity was introduced by Kratsch and Neumann [KN13], although there are earlier analyses that investigate RLS and the (1+1) EA for the maximum clique problem on planar graphs [Sto06] and population-based EAs with respect to the runtime dependent on the size of the cliques obtained [Sto07]. We call an evolutionary algorithm a fixed-parameter evolutionary algorithm with respect to a parameter k iff its expected optimization time is O(poly(n) · f(k)).
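As a concrete illustration of Algorithm 1, the following sketch runs GSEMO with the bi-objective f(x) = (number of chosen vertices, number of uncovered edges), mirroring the "uncovered edges" helper objective used for vertex cover in the results discussed in this section. The graph, the step budget, and the representation details are illustrative choices, not the exact setup of [KN13].

```python
import random

def gsemo_vertex_cover(edges, n, steps=20_000, seed=0):
    """Sketch of GSEMO minimizing (|x|_1, number of uncovered edges).

    Returns the size of the smallest cover in the final population, or None
    if no feasible cover was found (which is very unlikely for small inputs).
    """
    rng = random.Random(seed)

    def f(x):
        uncovered = sum(1 for u, v in edges if not (x[u] or x[v]))
        return (sum(x), uncovered)

    x = tuple(rng.randint(0, 1) for _ in range(n))
    pop = {x: f(x)}
    for _ in range(steps):
        parent = rng.choice(list(pop))
        child = tuple(b ^ (rng.random() < 1.0 / n) for b in parent)
        fc = f(child)
        # Reject the child iff some x'' in P has f(x'') <= f(x') and f(x'') != f(x').
        if any(all(a <= b for a, b in zip(fy, fc)) and fy != fc for fy in pop.values()):
            continue
        # Otherwise remove all x'' with f(x') <= f(x'') and insert the child.
        pop = {y: fy for y, fy in pop.items() if not all(a <= b for a, b in zip(fc, fy))}
        pop[child] = fc
    covers = [fy[0] for fy in pop.values() if fy[1] == 0]
    return min(covers) if covers else None

# Path graph on 4 vertices; its minimum vertex cover has size 2 (vertices 1 and 2).
edges_p4 = [(0, 1), (1, 2), (2, 3)]
cover_size = gsemo_vertex_cover(edges_p4, n=4)
print(cover_size)
```

The population maintained by GSEMO is the set of mutually non-dominated objective vectors found so far, so the trade-off between solution size and infeasibility is explored in a single run.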
As common in the runtime analysis of evolutionary algorithms, the expected optimisation time refers to the expected number of fitness evaluations until an optimal solution has been produced for the first time.

The minimum vertex cover problem is the classical problem in the area of parameterized complexity, and several FPT algorithms are available. The input of the minimum vertex cover problem is an undirected graph G = (V, E), and the goal is to find a minimum set of vertices V′ ⊆ V such that each edge is covered by at least one node of V′, i.e. e ∩ V′ ≠ ∅ holds for all e ∈ E.

Kratsch and Neumann [KN13] showed that a simple evolutionary multi-objective algorithm called GSEMO (see Algorithm 1), frequently used in the area of runtime analysis [NW06, GL10, FHH+10], is able to compute a kernelization for the problem. A kernelization is a reduced problem where the decision for some nodes, whether or not to include them, has already been made in an optimal way. It is shown in [KN13] that such a kernelization can be obtained by using two different types of helper objectives as a second objective. The first one considered in the article is the number of uncovered edges of a given solution x. The second approach uses the optimal value of the linear programming relaxation of the graph consisting only of the uncovered edges of a given solution x. Note that both helper objectives estimate the degree of infeasibility of a solution x, which is quite common when using multi-objective models for single-objective optimization problems in the context of evolutionary computing.

Having obtained such a solution, an alternative mutation operator flipping bits corresponding to nodes that are adjacent to so-far uncovered edges can obtain an optimal solution in time O(f(k) · poly(n)), which leads to the result that the examined evolutionary algorithms are fixed-parameter evolutionary algorithms. It has also been shown that a factor (1 + ε)-approximation, 0 ≤ ε ≤
1, can be obtained in expected time O(n² log n + OPT · n² + n · 4^{(1−ε)·OPT}), where OPT is the value of an optimal solution, when using the LP relaxation as the second objective. This gives a trade-off between approximation quality and runtime. Setting ε = 1, it shows that the approach computes a factor 2-approximation in expected polynomial time.

In the weighted vertex cover problem, each node has a positive weight, and the goal is to minimize the sum of the weights of the chosen nodes under the condition that all edges are covered. The use of the dual formulation of the vertex cover problem in the form of edge sets has been investigated by Pourhassan et al. [PSN19]. They generalized the edge-based representation by Jansen et al. [JOZ13] to the weighted case and showed that their evolutionary multi-objective algorithm is a fixed-parameter algorithm for the weighted vertex cover problem. The authors showed that a 2-approximation for the weighted vertex cover problem is obtained by the algorithm in expected polynomial time and presented a population-based approach which achieves a (1 + ε)-approximation in expected time O(n² · min{n, 4^{(1−ε)·OPT}} + n³). Setting ε = 1, it shows that the approach computes a factor 2-approximation in expected polynomial time for the weighted vertex cover problem.

The traveling salesperson problem (TSP) is another very prominent problem in the area of combinatorial optimization. Given a set of n cities i = 1, . . . , n and distances d(i, j) between them, the goal is to compute a tour of minimal cost visiting each city exactly once and returning to the origin. A possible solution for the TSP is usually given by a permutation π = (π(1), . . . , π(n)) of the given n cities, and the goal is to find a tour π that minimizes

c(π) = d(π(n), π(1)) + Σ_{i=1}^{n−1} d(π(i), π(i+1)).

In the context of parameterized analysis of evolutionary algorithms, the Euclidean TSP has been investigated by Sutton et al. [SNN14].
Here each city i is given as coordinates (x_i, y_i), and the distance between city i and j is given as d(i, j) = √((x_j − x_i)² + (y_j − y_i)²). The Euclidean TSP is still NP-hard, but admits a PTAS. In terms of parameterized analysis, the impact of the number of inner points has been considered, which is given by the number of points that do not lie on the convex hull of the points in 2D. We denote by n − k the number of points on the convex hull and by k the number of inner points.

The Euclidean TSP can be solved by classical algorithms in time O(poly(n) · f(k)) using dynamic programming [DHOW06]. This makes use of the property that an optimal solution has to visit the points of the convex hull in the order in which they appear on the hull. The difficult task is then to "fill in" the inner points such that an optimal solution is obtained.

Investigations in the area of evolutionary algorithms focused on the runtime analysis with respect to the number of inner points for the Euclidean TSP. The first part of the analysis carried out in [SN12] analyzes the expected time until the classical (1+1) EA, using inversions as the mutation operator, has computed a tour that is intersection-free. The analysis depends on the progress that can be made by inversion operations removing an intersection, and this progress depends on the smallest angle ε > 0 formed by the given points. Starting from an intersection-free tour, a number of inversion operations depending only on k suffices to produce an optimal tour. Together, this implies that the (1+1) EA obtains an optimal solution in expected time bounded by a polynomial in n and m times a function of k, assuming that the points are placed on an m × m grid and no set of three points is collinear. Here the parameter m for the grid directly determines the smallest angle that any set of three non-collinear points can have. Note that this runtime bound does not meet the requirement of a fixed-parameter evolutionary algorithm.

Afterwards, the ability of evolutionary algorithms to fill in the inner points correctly, given that the points on the convex hull are in correct order, has been examined.
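The inversion-based (1+1) EA discussed above can be sketched as follows. For simplicity, this sketch applies exactly one inversion (segment reversal) per iteration and uses a small instance with points in convex position, where the optimal tour is the cyclic hull order; these simplifications and all parameter choices are illustrative, not taken from [SN12, SNN14].

```python
import math
import random

def tour_cost(points, perm):
    """c(pi) = d(pi(n), pi(1)) + sum_{i=1}^{n-1} d(pi(i), pi(i+1))."""
    n = len(perm)
    return sum(math.dist(points[perm[i]], points[perm[(i + 1) % n]]) for i in range(n))

def one_plus_one_ea_tsp(points, steps=60_000, seed=0):
    """(1+1) EA with one inversion per step; accepts the child if not worse."""
    rng = random.Random(seed)
    perm = list(range(len(points)))
    rng.shuffle(perm)
    cost = tour_cost(points, perm)
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(points)), 2))
        child = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]  # reverse segment i..j
        child_cost = tour_cost(points, child)
        if child_cost <= cost:
            perm, cost = child, child_cost
    return perm, cost

# 8 points on a circle: the optimal tour follows the circle (convex position).
pts = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)) for i in range(8)]
_, found = one_plus_one_ea_tsp(pts)
print(f"found {found:.4f}, optimum {8 * 2 * math.sin(math.pi / 8):.4f}")
```

An inversion is exactly a 2-opt move: for points in convex position, every crossing tour admits an improving inversion, so the algorithm reliably reaches the intersection-free (and here optimal) hull-order tour.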
Ant colony optimization [NSN13b] and evolutionary algorithms [NSN13a, SNN14] have been investigated in this context. For ant colony optimization, constructing solutions that follow the order of the points on the convex hull is the crucial aspect for obtaining a good runtime bound in terms of the number of inner points k. For evolutionary algorithms, a population-based algorithm building on a previous approach of Theile [The09], which allows an optimal tour to be built up in the fashion of dynamic programming, leads to a fixed-parameter evolutionary algorithm with respect to the number of inner points. Furthermore, it is shown in [SNN14] that a simple (μ + λ) EA searching for a permutation of the inner points and connecting them to the outer points using the dynamic programming approach of [DHOW06] leads to a fixed-parameter evolutionary algorithm.

The early studies of Storch [Sto06] for the maximum clique problem in planar graphs investigated the runtime of RLS and the (1+1) EA with respect to the size of the maximum clique. The fitness (to be maximized) of a search point x ∈ {0, 1}^n, representing a selection of nodes, is given by the number of selected nodes if x represents a clique, and by −∞ otherwise. The algorithms investigated start with the initial solution x = 0^n, which is a feasible solution. For standard bit mutations, a tight polynomial bound on the expected optimization time has been shown for the (1+1) EA. However, it should be noted that the size of a maximum clique in a planar graph is at most 4, as the complete graph on 5 vertices is not planar. Improved results have been shown in [Sto06] for restart strategies used in RLS and for variants of the (μ + 1) EA that always delete an individual with the worst fitness from the population.

The use of problem-specific mutation operators in the (1+1) EA for the maximum leaf spanning tree problem has been investigated in [KLNO10]. In this work, it has been pointed out that standard bit flip mutations do not lead to fixed-parameter evolutionary algorithms, where the parameter is the value of an optimal solution.
Edge exchanges that include an edge currently not present in a spanning tree and that remove an edge from the resulting cycle are frequently used for spanning tree problems, as they again lead to spanning trees. Using edge exchanges for mutation, where the number of edge exchanges is chosen according to a Poisson distribution with expected value 1, it has been shown in [KLNO10] that the resulting (1+1) EA is a fixed-parameter evolutionary algorithm when taking the value of an optimal solution OPT as the parameter.
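The edge-exchange mutation just described can be sketched as follows (helper names are ours, and the exact details of [KLNO10] may differ; the point is that each exchange inserts a non-tree edge and deletes an edge from the cycle this closes, so spanning trees are mapped to spanning trees):

```python
import math
import random

def poisson_one(rng):
    # Knuth's method for sampling a Poisson random variable with mean 1.
    limit, k, prod = math.exp(-1.0), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def tree_path(tree_edges, u, v):
    """Edges on the unique u-v path in a tree (edges are frozensets)."""
    adj = {}
    for e in tree_edges:
        a, b = tuple(e)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent, stack = {u: None}, [u]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path, x = [], v
    while parent[x] is not None:
        path.append(frozenset((x, parent[x])))
        x = parent[x]
    return path

def edge_exchange_mutation(tree_edges, all_edges, rng):
    tree = set(tree_edges)
    for _ in range(poisson_one(rng)):           # Pois(1) many exchanges
        candidates = sorted(all_edges - tree, key=sorted)
        if not candidates:
            break
        new_edge = rng.choice(candidates)       # insert a non-tree edge ...
        u, v = tuple(new_edge)
        removed = rng.choice(tree_path(tree, u, v))  # ... delete from the cycle
        tree.add(new_edge)
        tree.remove(removed)
    return tree
```

By construction, every exchange preserves the spanning-tree property (n − 1 edges, connected).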
The parameters of an evolutionary algorithm allow one to adjust the EA to the problem to be solved and thus to optimize its performance. This is a great feature of EAs but, at the same time, a difficult challenge [LLM07]. Missing good parameter values often leads to a horrible performance. Unfortunately, there is not much general advice on how to set the parameters. The few suggestions in this direction that exist, however, have been influenced significantly by theoretical works. In this section, we show how theoretical works have helped to understand how the parameters of EAs influence their performance. Recently, the theory of EAs has also made big progress in understanding and even designing automated ways to find good parameter values.

By parameter tuning we understand the problem (or process) of finding suitable parameter values and then running the EA with these parameters. The parameter values are not changed during the run, so we also speak of static parameter values. For reasons of space, we cannot discuss the whole literature on theoretical results that help tuning EA parameters, and therefore pick the mutation rate in standard bit mutation as the most prominent example. Other parameters that have attracted theoretical research include the parent and offspring population sizes (see, e.g., [JJW05, Wit06, RS14, DK15, ADFH18]) and the selection pressure (see, e.g., [JS07, Leh10, Leh11, ADY19]).
For a discussion on how to set the parameters of estimation-of-distribution algorithms, we refer to Section 7.2.3.

The mutation rate is the parameter most discussed in the literature, and for good reason. A too small mutation rate leads to slow progress because the radius of exploration is small. A too high mutation rate is detrimental because the random choice of the bits to be flipped on average increases the distance from the target solution, and this effect is linear in the mutation rate.

An early established [Müh92, Bäc93] and generally accepted [Bäc96, BFM97] recommendation is to use the mutation rate p = 1/n in standard bit mutation, that is, to generate an offspring by flipping each bit independently with probability 1/n. With this choice, the expected distance between parent and offspring is one, so we inherit principles from local search. Different from local search, this mutation operator can leave local optima by flipping more than one bit.

A large number of mathematical runtime analyses shows that p = 1/n often is optimal and thus complements the experimental support for this recommendation (see, e.g., [Och02] and the references therein). For the performance of the (1 + 1) EA on OneMax, a mix of rigorous and heuristic arguments already in [Müh92], and then a fully rigorous proof in [GKS99], show that p = 1/n is asymptotically optimal. For the LeadingOnes benchmark, a rate of p ≈ 1.59/n was proven to be optimal in [BDN10]. The OneMax result was greatly extended in [Wit13] with a proof that p = 1/n is the asymptotically optimal mutation rate for every pseudo-Boolean linear function with non-zero coefficients. In [GW17] it was proven that p = 1/n is the asymptotically optimal mutation rate for the (1 + λ) EA when the offspring population size λ is not too large. The optimality of p = 1/n was also shown for the optimization of long-path functions [Sud13]. For monotone functions, the situation is not fully understood, but again mutation rates around p = 1/n appear to be a good choice.
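Standard bit mutation with rate p = c/n, as discussed above, can be sketched in a few lines (c = 1 gives the classic recommendation p = 1/n; the function name is ours):

```python
import random

# Standard bit mutation: each bit is flipped independently with
# probability p = c/n, so the expected number of flipped bits is c.
def standard_bit_mutation(x, c=1.0, rng=random):
    p = c / len(x)
    return [bit ^ (rng.random() < p) for bit in x]
```

With c = 1, one bit is flipped in expectation, but with constant probability more than one bit is flipped, which is what allows the operator to leave local optima.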
For the runtime of the (1 + 1) EA on strictly monotonically increasing functions, a Θ(n log n) runtime can easily be shown when the mutation rate is c/n for a constant 0 < c < 1. That mutation rates c/n for larger values of c can lead to exponential runtimes was first shown in [DJS+13]; the precise threshold value of c at which the runtime turns from polynomial to exponential is approximately 2.13 [LS18]. In the range around p = 1/n, for a long time only a runtime guarantee of O(n^{3/2}) was known, and only for p being exactly 1/n [Jan07]. Significant progress on this long-standing problem was made only very recently: in [LMS19], an entropy compression argument was used to show that an O(n log n) runtime guarantee holds for all mutation rates p = c/n, where c ≤ c₀ for some constant c₀ > 1.

These results indicate that p = 1/n is a good first choice for the mutation rate, but by no means do they prove that it always is. Indeed, already in [JW00] an example was constructed such that the (1 + 1) EA with any mutation rate that is not Θ(log(n)/n) needs super-polynomial time with high probability to optimize this problem. In [Prü04], the optimal mutation rates for the (1 + 1) EA optimizing hurdle functions with hurdle widths 2 and 3 were shown to be 2/n and 3/n. This result could have led to the following findings, but apparently its broader implications on mutation rates (in a paper primarily discussing crossover) were not detected. So it was only in [DLMN17] that the optimal mutation rate of the (1 + 1) EA on jump functions was shown to be roughly k/n, where k is the size of the fitness gap of the jump function. Also, it was shown that a small deviation from the optimal rate, say by a factor of (1 ± ε) for a constant ε > 0, leads to a performance loss exponential in k. This result shows that the optimal mutation rate depends strongly on the input instance, that there is no rate that is universally good for all jump functions, and that the price for missing the right rate is significant. This led the authors of [DLMN17] to suggest using a random mutation rate, chosen independently for each mutation from a power-law distribution. This heavy-tailed mutation operator shares with the classic mutation operator the property that a single bit (and more generally, any constant number of bits) is flipped with constant probability.
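The heavy-tailed operator just described can be sketched as follows: the mutation rate is chosen anew for each application as α/n, where α follows a power-law distribution on {1, ..., n/2} (the exponent β is a hyperparameter; β = 1.5 is a common suggestion, and the function names are ours):

```python
import random

def power_law_sample(m, beta, rng=random):
    # Sample i in {1, ..., m} with probability proportional to i^(-beta).
    weights = [i ** (-beta) for i in range(1, m + 1)]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights, start=1):
        acc += w
        if r <= acc:
            return i
    return m

def heavy_tailed_mutation(x, beta=1.5, rng=random):
    n = len(x)
    alpha = power_law_sample(max(1, n // 2), beta, rng)
    return [bit ^ (rng.random() < alpha / n) for bit in x]
```

Small values of α (and thus small mutation rates) remain the most likely, but large rates are used with polynomially instead of exponentially small probability.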
When the power-law exponent is above two, it also shares the property that the expected number of flipped bits is constant. Different from the classic recommendation, however, higher numbers of bits are flipped with larger probability. This essentially parameterless operator was shown to give, on any jump function, a performance of the (1 + 1) EA that differs from the one with the instance-optimal mutation rate by only a small factor polynomial in k. Heavy-tailed mutation operators proved to be successful in several other discrete optimization problems [FQW18, FGQW18b, FGQW18a, WQT18, ABD20a, ABD20b, AD20]. From a broader perspective, this line of work is an example showing that theoretical work cannot only help to understand evolutionary algorithms, but can also propose new operators and algorithms.

Instead of trying to find a good parameter setting before starting the EA and sticking to this choice throughout the run, one could also think of optimizing the parameters during the run of the algorithm. This sophisticated-looking idea is called parameter control and turns out to be less frightening than it appears at first. Indeed, the decision space (and thus also the opportunity to take an unsuitable decision) is much larger now, since in principle we could choose different parameter values in each iteration, but there are several powerful ways to overcome this difficulty. The advantage of parameter control is that we can react to the performance observed so far. This has two particularly positive consequences: (i) The need for finding good parameter values before the start of the algorithm, based on a perhaps only vague understanding of the problem to be solved, is reduced since a suboptimal initial choice can be corrected.
(ii) In the common situation that different parameter settings are optimal during different stages of the optimization process, we have the chance to use the optimal parameters in each stage (whereas a static choice would need to find a suitable trade-off).

It is clear that the large space of different parameter settings for each iteration renders it unlikely to find the absolutely best dynamic choice of the parameters. However, it turns out that often very simple success-based or learning-based approaches lead to a very good performance, and often one that is better than the best static parameter setting. This is confirmed in many practical applications, see, e.g., [KHE15], but also in a by now decent number of theoretical works.

The theoretical superiority of dynamic parameter settings over static ones was already demonstrated in [DJW00] (see also [JW06] for an extension of this work), albeit for a simple algorithm with a simple time-dependent parameter choice optimizing an artificial problem. Nevertheless, this result has rigorously proven that, in principle, dynamic parameter choices can efficiently solve problems where classic static choices badly fail. Interestingly, the idea of time-dependent mutation rates was recently taken up again [RW20] to help EAs leave local optima.

It took ten years until dynamic parameter choices could be shown to be superior also for classic benchmark problems. The first such work [BDN10] (see also [Doe19a, Section 2.3] for an extension) showed that a constant-factor runtime gain can be obtained from a fitness-dependent choice of the mutation rate when optimizing the classic
LeadingOnes benchmark via the (1 + 1) EA. Again it took some time until, in [BLS14], a super-constant runtime gain (of order O(log log λ)) from a dynamic parameter setting was shown for the (1 + λ) EA optimizing OneMax. Other fitness-dependent parameter choices were discussed in [DDE15, DDY20]. A main problem with fitness-dependent parameter settings (or, more generally speaking, parameter choices that depend on the current state of the algorithm) is that one needs a very good understanding of the problem to define a suitable functional dependence of the parameter value on the algorithm state. For the two examples from [BLS14, DDE15], it appears unlikely that someone would have found the optimal functional dependence without a mathematical analysis. Finding sub-optimal state-dependent parameter values that beat the best static values appears more realistic, but this remains a challenging task requiring a lot of expert knowledge.

Fortunately, there are dynamic parameter settings that need much less expertise. Generally speaking, these observe how the algorithm performs with the current parameter values (and sometimes also with the values used in a longer history) and, based on this, try to adjust the parameter values to more profitable values. The easiest of these on-the-fly parameter choices are success-based multiplicative parameter updates. Assume that we have a parameter for which we suspect that an increase raises the chance to find an improvement, but also increases the computational cost of one iteration. Then increasing the current parameter value after each iteration without improvement and decreasing it after each iteration with improvement is a simple way to try to move the parameter value into a profitable region.
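A minimal sketch of such a success-based multiplicative update, here for the offspring population size λ of a (1 + λ) EA on OneMax (update factor 2, as in the analyses discussed next; all other details are illustrative choices of ours):

```python
import random

def onemax(x):
    return sum(x)

# (1 + lambda) EA with success-based control of lambda: double lambda
# after an iteration without improvement, halve it after an improvement.
def one_plus_lambda_adaptive(n, rng, max_iters=50000):
    x = [rng.randint(0, 1) for _ in range(n)]
    lam, iters = 1, 0
    while onemax(x) < n and iters < max_iters:
        iters += 1
        offspring = [[b ^ (rng.random() < 1.0 / n) for b in x]
                     for _ in range(lam)]
        best = max(offspring, key=onemax)
        if onemax(best) > onemax(x):
            x = best
            lam = max(1, lam // 2)  # success: decrease lambda
        else:
            lam = min(n, lam * 2)   # no success: increase lambda
    return x, iters
```

The intuition is that when improvements are easy, a small λ keeps iterations cheap, while near hard-to-improve points λ grows automatically to raise the per-iteration success chance.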
Exactly this was suggested for the offspring population size λ of the (1 + λ) EA in [JJW05] and was rigorously analyzed in [LS11], where an asymptotically optimal speed-up of the parallel runtime (number of iterations, ignoring the different costs of the iterations) was shown. The same basic idea was shown to give a (small) asymptotic improvement of the total runtime (number of fitness evaluations) for the (1 + (λ, λ)) GA optimizing OneMax [DD18] and certain random SAT instances [BD17].

The usual way to change the parameter value is to multiply or divide it by suitable constant factors. In [LS11], simply the factor 2 was used, and it is clear that any other constant factor would have given the same asymptotic runtime. In general, as observed in [DD18], smaller update factors can be the safer choice, and also the relation between the factors used in the case of success and no success can be important. In [DDL19], a detailed analysis of how the choice of these hyperparameters influences the runtime of the (1 + 1) EA with dynamic mutation rate on the
LeadingOnes function was conducted. Other theoretical works on multiplicative parameter updates include [DDK18] for multi-valued decision variables, [MS15] for migration intervals of island models, and [DLOW18] for the learning period of a hyperheuristic. We note that the results just described are the first examples of success-based parameter updates in discrete evolutionary optimization. In continuous optimization, a multiplicative update of the step size known as the one-fifth rule was already proposed in [Rec73].

Multiplicative update rules work best if there is a simple monotonic influence of the parameter on the success, as seen, e.g., for the offspring population size of the (1 + λ) EA. Since such a simple relation is harder to find for the mutation rate in the (1 + λ) EA, a different success-based scheme was developed in [DGWY19]. Here half of the offspring are generated with twice the current rate, the other half with half the current rate. The mutation rate is then updated to the rate the best offspring was generated with (however, only with probability one half; with the remaining probability of one half, the new rate is chosen randomly from the two alternatives). This mechanism was shown to let the (1 + λ) EA optimize OneMax in asymptotically the same time as with the optimal fitness-dependent mutation rate developed in [BLS14].

A second way to go beyond multiplicative updates, and to additionally take more stable decisions, was proposed in [DDY16]. Here, for a small number of possible values of a parameter, a time-discounted estimate of the effectiveness of each parameter value is computed. In each iteration, with large probability the best-performing value is used (exploitation), and with small probability a random one of the other values is used. With the right choice of the hyperparameters, this mechanism was shown to approach arbitrarily well the optimal mutation strengths of the (1 + 1) EA optimizing
OneMax that were computed in [DDY20].

The most generic way to let an EA optimize its parameters itself is self-adaptation, which means that the parameters are made part of the encoding of the solution candidates and thus become subject to variation and selection. This idea goes back to [Bäc92]. Taking the mutation rate as an example, one appends an encoding of the mutation rate to the representation of the solution candidates. When mutating such an extended individual, one first mutates the mutation rate encoded in the extended individual and then, with the new rate, the remainder of the individual. The hope is that the suitability of a rate is visible from a higher fitness of the resulting individuals, and that the selection mechanisms of the EA bring these individuals (and thus the good mutation rates) forward in the population. While this way of adjusting parameters is clearly more natural for an EA than parameter adjustment mechanisms outside the evolutionary process, only two rigorous results supporting the usefulness of self-adaptation in discrete evolutionary computation have been published. In a first proof-of-concept work [DL16], an example is constructed showing that self-adaptation can be useful. In this example, only two different mutation rates are available and it is assumed that the whole initial population starts in a particular search point. In [DWY18a], the (1, λ) EA with self-adapting mutation rate is analyzed. With suitably chosen hyperparameters, it can evolve sufficiently good mutation rates to obtain asymptotically the same performance on
OneMax as was previously obtained with the optimal fitness-dependent setting [BLS14] and the two-population self-adjustment [DGWY19].
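The self-adaptation scheme described above (the rate is encoded with the individual and is mutated first, before the bit string) can be sketched as follows; the doubling/halving of the rate and the clamping interval [2/n, 1/4] are illustrative assumptions of ours, not the exact setup of the cited analyses:

```python
import random

# One step of a (1, lambda)-type EA with self-adaptive mutation rate:
# each offspring first derives its own rate from the parent's rate
# (doubling or halving with equal probability, then clamping), then
# mutates the bit string with that rate; the best offspring survives
# together with the rate that created it.
def self_adaptive_step(x, rate, lam, rng, fitness=sum):
    n = len(x)
    best_y, best_rate = None, rate
    for _ in range(lam):
        r = rate * 2.0 if rng.random() < 0.5 else rate / 2.0
        r = min(0.25, max(2.0 / n, r))  # clamp the encoded rate
        y = [b ^ (rng.random() < r) for b in x]
        if best_y is None or fitness(y) > fitness(best_y):
            best_y, best_rate = y, r
    return best_y, best_rate
```

Note the comma selection: the parent is always replaced, so the scheme relies on the offspring population to keep both fitness and rate in a good region.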
The existing results show that we are now able to analyze a variety of static and dynamic parameter choices with a precision high enough to clearly distinguish good from bad choices. Some of these works not only analyzed existing algorithms or parameter adjusting mechanisms, but also suggested new approaches. Clearly, as is true for all theoretical works, the algorithms and problems that were regarded are much simpler than those occurring in a practical application of EAs. To what extent the recommendations obtained from these simple settings generalize to more realistic ones is a crucial question which can only be answered in a collaboration between theoretical and applied researchers.

From the theory perspective, the following questions appear timely and interesting.

• Interaction of parameters: So far, the vast majority of runtime analyses vary at most one parameter of the algorithm. Experience from practice shows that the interaction of several parameters is even harder to understand, so more runtime analyses discussing several parameters at once are clearly needed. Also, to the best of our knowledge, there is currently no theoretical work regarding two or more independent heavy-tailed parameters, or self-adjusting or self-adaptive settings of two or more parameters.

• Self-adaptation: The most natural way to let an algorithm optimize its parameters is self-adaptation, where the parameters are integrated into the evolutionary cycle. So far, only very little theoretical advice exists on how to successfully control parameters via self-adaptation. Here clearly more work is required.

• Connections with machine learning: The area of machine learning has made tremendous progress in the last decades. Given that EAs are iterative algorithms in which the state of the system often changes only little per iteration, one could envisage that dynamic parameter choices can profit from ideas and concepts borrowed from machine learning.
While some ideas used in EAs can be related to similar ideas in machine learning, it seems to us that the full power of this connection has not yet been exploited.
Dynamic and stochastic environments play a key role in real-world applications, as information is often uncertain and circumstances change over time. Evolutionary algorithms have the ability to deal with changing circumstances and perform well in noisy environments, which makes them well suited for dealing with dynamic and stochastic problems. The area of runtime analysis initially focused on simple toy problems in dynamic and stochastic settings. Again the function OneMax has played a crucial role in gaining initial insights. An important aspect in the context of dynamic optimisation is how often and how drastically a function changes over time. We will describe important results for settings where the function or the constraints of a given problem change dynamically. Furthermore, we will summarize results where the fitness evaluation is impacted by noise and point out different results according to the different noise models studied in the literature. Additional investigations regarding dynamic and stochastic constraints in the context of submodular optimization are summarized in Section 6.
Runtime analysis for dynamically changing functions in discrete search spaces was started by Droste [Dro02, Dro03]. He investigated a dynamic variant of the classical OneMax problem on binary strings. In the first dynamic setting, one randomly chosen bit is flipped in each iteration with probability p. Droste [Dro02] showed that the expected optimization time of the (1+1) EA is polynomial iff p = O(log(n)/n). In the case where each bit is flipped in each iteration with a given probability p, investigated in [Dro03], the runtime becomes super-polynomial if p = ω(log(n)/n) and is polynomial if p = O(log(n)/n).

These investigations were revisited ten years later using drift analysis and generalized to the case where each element is not binary but can take on r different values [KLW15b]. A comparison of the ability of simple evolutionary algorithms and ant colony optimization approaches to deal with dynamic fitness functions has been carried out in [KM12, LW16]. These studies show that ant colony optimization can beat evolutionary algorithms due to its ability to adjust slowly to changes in the fitness function. Investigations of parallel evolutionary algorithms using island models, carried out in [LW18] for the MAZE function introduced in [KM12], show that infrequent migration of individuals is necessary for dense topologies, whereas infrequent migration becomes less necessary when working with sparse topologies in the island model.

There are also some results on classical combinatorial optimization problems in the dynamic setting. Lissovoi and Witt [LW15] have investigated ACO algorithms and shown how the number of ants impacts the types of changes that can be tracked over time. They also give examples of dynamic oscillations that cannot be tracked with a polynomial number of ants. Dynamic makespan scheduling for two machines has been investigated by Neumann and Witt [NW15].
They studied dynamic settings where a solution of small discrepancy between the two machines has to be recomputed after a change. The results show that a worst-case discrepancy of U, where U is an upper bound on the maximal job length, can be maintained. Furthermore, better upper bounds on the runtime and lower discrepancies are shown for the case where the processing times of the jobs change randomly.

Dynamic variants of the minimum vertex cover problem have been considered in [PGN15, PRN20]. Following the edge-based encoding for the minimum vertex cover problem introduced in [JOZ13], the problem formulation makes use of the dual formulation of the problem in order to represent solutions. In [PGN15], the expected time to recompute a 2-approximation when edges are added or removed has been studied, and improved results have been presented in [PRN20].

Dynamic settings of the classical graph coloring problem have been investigated in [BNPS19]. Here, in particular, bipartite graphs have been studied, and the necessity of complex mutation operators has been revealed even if there are only slight dynamic changes to the graph structure. These investigations have recently been extended in [BNPS20], and it has been shown that a dynamic setting where edges are presented to the algorithm in an iterative way can provably lead to better optimization times than presenting the algorithm with the whole input graph at once.

Studies on the runtime behaviour of evolutionary computing techniques for discrete search spaces involving noisy objective functions were again started by Droste [Dro04], who analyzed the (1+1) EA on a noisy version of OneMax. He studied a prior noise model. In this case, some bits of a solution x are flipped prior to the fitness evaluation.
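The prior noise model just described perturbs the search point before evaluation; the posterior noise model, discussed below, instead adds a random term to the fitness value after evaluation. Both can be sketched for OneMax (Gaussian posterior noise is one common choice; parameter names are ours):

```python
import random

def onemax(x):
    return sum(x)

def prior_noise_eval(x, p, rng=random):
    # Prior noise: flip each bit with probability p, then evaluate.
    noisy = [b ^ (rng.random() < p) for b in x]
    return onemax(noisy)

def posterior_noise_eval(x, sigma, rng=random):
    # Posterior noise: evaluate the true solution, then add noise.
    return onemax(x) + rng.gauss(0.0, sigma)
```

The distinction matters for the analysis: under prior noise, the algorithm sees the fitness of a perturbed point, while under posterior noise the point itself is evaluated correctly but the value is distorted.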
The studies considered flipping each bit with probability p prior to the fitness evaluation, and Droste showed that the (1+1) EA can still obtain the optimal solution for OneMax in expected polynomial time if p = O(log(n)/n), whereas the expected optimization time becomes super-polynomial if p = ω(log(n)/n). In general, investigations can be separated into those studying prior noise as described above and those studying posterior noise. In the case of posterior noise, the fitness is evaluated on the solution x itself, but noise is added afterwards to the fitness value f(x). Gießen and Kötzing [GK16] built on this initial study by Droste and extended it to population-based evolutionary algorithms, considering both prior and posterior noise. Results for prior bit-wise noise for the classical benchmark functions OneMax and LeadingOnes have been obtained in [BQT18, QBJT19]. Additional and improved results, including an example where noise helps, have been provided by Sudholt [Sud18], and estimation-of-distribution algorithms (see Section 7) have been studied for OneMax in [FKKS17]. A method that can be used for the analysis of dynamic and noisy fitness functions has been developed in [DNDD+].

5.4 Combinatorial Optimization Problems with Dynamic and Stochastic Constraints

Dynamic constraints reflect the change in resources available to solve a given problem. This is often a crucial aspect in many planning problems where resources such as trucks and trains might become unavailable due to failures or become available (again) after maintenance. Considering dynamic constraints, the objective function to be optimized is often assumed to be fixed and only changes to the constraint are considered. The simplest example is the maximization of a linear function subject to a uniform constraint, which limits the number of chosen elements to at most B.
The first runtime analysis in this area considered the case where the bound B changes to a new bound B* and asked how long an evolutionary algorithm needs to recompute, from an optimal solution for the bound B, an optimal solution for the updated bound B*. The (1+1) EA and simple evolutionary multi-objective algorithms have been studied in [SSF+19]. In the stochastic setting, chance constraints require optimizing a given function f under the condition that the constraint is violated with probability at most α, where α is usually a small value close to 0. Evolutionary algorithms for the chance-constrained knapsack problem have been investigated in [XHA+19, XNN20]. Furthermore, Assimi et al. [AHX+20] investigated, through experimental studies, evolutionary multi-objective algorithms for the dynamic chance-constrained knapsack problem, where the constraint bound of the knapsack changes dynamically over time. A first runtime analysis for problems with chance constraints has been carried out by Neumann and Sutton [NS19] for special instances of the knapsack problem. It shows that even very simple linear functions with a simple linear constraint can lead to local optima with large inferior neighbourhoods that may make it hard for the (1+1) EA to produce an optimal solution.

Important results on evolutionary algorithms for the optimization of submodular functions under dynamic and stochastic constraints have been obtained recently and are summarized in Section 6.3. Furthermore, a more comprehensive and technical survey on the theory of evolutionary computing in dynamic and stochastic environments can be found in [RPN18].
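A chance constraint as discussed above requires the constraint to be violated with probability at most α. When the weights of the chosen elements are stochastic, this violation probability can be estimated by sampling; the following Monte Carlo sketch assumes weights uniform in [mean − spread, mean + spread], an illustrative model of ours rather than the specific one of the cited works:

```python
import random

def violation_probability(means, spread, bound, samples, rng=random):
    # Estimate P(total stochastic weight > bound) by sampling.
    violations = 0
    for _ in range(samples):
        total = sum(m + rng.uniform(-spread, spread) for m in means)
        if total > bound:
            violations += 1
    return violations / samples

def chance_feasible(means, spread, bound, alpha, samples, rng=random):
    # A solution is treated as feasible if the estimated violation
    # probability does not exceed alpha.
    return violation_probability(means, spread, bound, samples, rng) <= alpha
```

In the theoretical works, such probabilities are instead bounded analytically (e.g. via tail inequalities), which avoids the sampling error inherent in this sketch.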
Submodular functions play a key role in the area of optimization, as many real-world problems can be stated in terms of a submodular function and many problems face a diminishing return when adding additional components to a solution. The recent book by Zhou et al. [ZYQ19] gives a very comprehensive presentation of submodular optimization by evolutionary algorithms, solving a wide range of submodular problems in the areas of optimisation and machine learning.

We consider the following setting. Given a set V = {v_1, ..., v_n} of elements, the goal is to maximize a function f: 2^V → R⁺ subject to a given set of constraints. Submodular functions are usually considered in terms of the marginal value when adding a new element. We denote by F_i(A) = f(A ∪ {i}) − f(A) the marginal value of i with respect to A. A function f is submodular iff F_i(A) ≥ F_i(B) for all A ⊆ B ⊆ V and i ∈ V \ B. Furthermore, a function f is called monotone iff f(A) ≤ f(B) for all A ⊆ B.

The first investigations in terms of the runtime behaviour of evolutionary algorithms for submodular functions that we are aware of were carried out by Rudolph [Rud97] in the 1990s. More than 15 years later, this research area was restarted by Friedrich and Neumann [FN14] and has since gained significant attention. The research in the context of static optimisation can be grouped with respect to the type of objective functions and the type of constraints that are considered. In terms of objective functions, one usually differentiates between monotone and non-monotone submodular functions. Furthermore, the submodularity ratio plays a crucial role when broadening the class of functions to functions that are not submodular. This ratio measures how close a function is to being submodular. The submodularity ratio α_f of a given function f is defined as

α_f = min_{X ⊆ Y, v ∉ Y} (f(X ∪ {v}) − f(X)) / (f(Y ∪ {v}) − f(Y)).

Note that if f is submodular then α_f = 1 holds.
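The marginal value and the submodularity condition above can be checked by brute force on small ground sets; a sketch (exponential in |V|, for illustration only; function names are ours):

```python
from itertools import combinations

# Marginal value F_i(A) = f(A | {i}) - f(A), and a brute-force test of
# submodularity: F_i(A) >= F_i(B) for all A subset of B and i outside B.
def marginal(f, A, i):
    return f(A | {i}) - f(A)

def is_submodular(f, V):
    subsets = [set(c) for r in range(len(V) + 1)
               for c in combinations(sorted(V), r)]
    for A in subsets:
        for B in subsets:
            if A <= B:
                for i in V - B:
                    if marginal(f, A, i) < marginal(f, B, i):
                        return False
    return True
```

Coverage functions, for instance, satisfy the diminishing-returns condition, while a strictly supermodular function such as |S|² violates it.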
The other important component in these investigations is the type of constraints considered. Constraints are usually of the type c(X) ≤ B, where c: 2^V → R≥0 assigns a non-negative cost to each set of elements X ⊆ V and B is a given constraint bound. This type of constraint includes the case of a simple uniform constraint, where c(X) = |X| holds and the number of elements that can be included in a feasible solution is limited to B. The maximization of a monotone submodular function under a uniform constraint is already NP-hard and can be approximated within a factor of (1 − 1/e) by a simple greedy algorithm [NWF78]. More complex constraints involve partition or matroid constraints, which are given in the form of linear functions. Complex constraints considered also include cost values that can only be approximated, e.g. costs involving NP-hard routing problems. The approximation obtained for the submodular function then depends on the type of objective function as well as on the ability to calculate the cost of the considered constraint. In the following, we summarize some of the main results in this currently very active research area.

Optimal solutions for monotone submodular functions f(X) with a cost constraint c(X) ≤ B can often be approximated well by simple greedy algorithms (see [KG14] for a comprehensive survey). Such greedy algorithms start with the empty set and add in each iteration an element x with the largest marginal gain

(f(X ∪ {x}) − f(X)) / (c(X ∪ {x}) − c(X))

that does not violate the constraint. The algorithm stops if no element can be added without violating the constraint bound.

Variants of GSEMO (see Algorithm 1) have been widely studied in the context of optimizing submodular functions. The initial analysis carried out in [FN14] considered the maximization of monotone submodular functions with different types of constraints. After this
GSEMO has been widely studied in the context of submodular optimization under the umbrella of Pareto optimization, which formulates a given constrained optimization problem as a multi-objective problem by establishing an additional objective based on the considered constraint. Such approaches have been widely used already before this in the context of the runtime analysis of
GSEMO. Solving single-objective problems by multi-objective formulations is a well-known concept in the evolutionary computation literature and has been studied from a practical and theoretical perspective since the mid-2000s [Jen04, NW06, BFH+ ]. In the case of a uniform constraint, i.e., c(X) = |X|, GSEMO selects in each step an element with the largest marginal gain with respect to f. Friedrich and Neumann [FN14] have shown that GSEMO produces a (1 − 1/e)-approximation for monotone submodular functions with a uniform constraint in expected time O(n^2(log n + B)), where B ≤ n. For monotone submodular functions with k matroid constraints, local search and simple single-objective evolutionary algorithms such as the classical (1+1) EA are able to obtain good approximation results. It has been shown by Lee et al. [LMNS09] that local search introducing at most 2p new elements and removing at most 2kp elements is able to obtain a (1/(k + 1/p + ε))-approximation in polynomial time if k ≥ p ≥ 1. The (1+1) EA obtains the same approximation quality in expected time O((1/ε) · n^{2p(k+1)+1} · k · log n). The crucial part of the proof is a result from [LMNS09] which shows that every solution x for which there is no y in the defined neighborhood with f(y) ≥ (1 + ε/(n(k+1))) · f(x) is already a (1/(k + 1/p + ε))-approximation. Further investigations led to a wide range of results for GSEMO on various submodular problems with cost constraints. The algorithm
GSEMO is often called
POMC (or similar) in such articles, and the approaches are referred to as Pareto optimization. However, usually the difference lies only in the formulation of the objective functions used to state the constrained submodular problem as a multi-objective optimisation problem. An important result covering a wide range of monotone functions and a broad class of cost constraints has been obtained by Qian et al. [QSYT17]. They investigated monotone functions in terms of their submodularity ratio and general cost functions, including ones for which it is hard to compute an optimal solution exactly. Their theoretical results make use of proof ideas used for an adaptive greedy algorithm and show that a variant of
GSEMO called
POMC is able to obtain the same approximation guarantee in expected pseudo-polynomial time. The expected runtime may be exponential with respect to the given input here if both the submodular function and the cost function can take on exponentially many values. In this case, the population size of
GSEMO may become exponential during the run. More precisely, they have shown that
POMC obtains an (α/2) · (1 − e^{−α})-approximation, where α is the submodularity ratio, for a relaxed cost constraint bound B̂ (instead of B), where B̂ depends on how well the given cost constraint can be approximated. Note that this setting includes problems where the cost of a solution may be hard to compute; e.g., for a selection of items it could be an approximation of a minimum Traveling Salesperson tour. The experimental results show that POMC clearly outperforms the adaptive greedy approach if the evolutionary algorithm is given a sufficiently large number of fitness evaluations. Recently, an evolutionary multi-objective algorithm called EAMC has been introduced in [BFQY20] which obtains the same worst-case approximation ratio as
POMC in expected polynomial time if the submodularity ratio of the given problem is known and used in the algorithm. However, EAMC usually performs worse than
POMC on important benchmark problems.

Subset selection has also been investigated in the context of sparse regression. Here the submodularity ratio α_f of the underlying function to be optimized plays a crucial role for the approximation quality obtained. Again a variant of GSEMO called POSS [QYZ15] achieves in expected polynomial time the same approximation quality as a greedy approach called forward regression [DK11], namely a solution X with f(X) ≥ (1 − e^{−α_f}) · OPT. Furthermore, POSS outperforms forward regression and other simple heuristics in experimental investigations in terms of solution quality when given a sufficient amount of time to improve solutions during the evolutionary optimisation process.
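The marginal-gain greedy rule described at the beginning of this section translates almost line by line into code. The sketch below is our own illustration (all function and variable names are ours, not taken from the cited works); `f` and `cost` are user-supplied set functions.

```python
# Sketch of a generalized greedy algorithm for maximizing a monotone
# submodular function f under a cost constraint cost(X) <= budget.
# f and cost map a frozenset of elements to a number; this is an
# illustrative sketch, not the exact algorithm analyzed in the papers.

def generalized_greedy(f, cost, ground_set, budget):
    X = frozenset()
    while True:
        best, best_ratio = None, 0.0
        for v in ground_set - X:
            Y = X | {v}
            if cost(Y) > budget:
                continue  # adding v would violate the constraint
            dc = cost(Y) - cost(X)
            gain = f(Y) - f(X)
            ratio = gain / dc if dc > 0 else float("inf")
            if best is None or ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:  # no element fits within the budget anymore
            return X
        X = X | {best}
```

For a modular f with unit costs this simply picks the `budget` heaviest elements; for genuinely submodular f the ratio rule trades off gain against cost exactly as in the displayed formula.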
For symmetric functions which are not necessarily monotone, k matroid constraints have also been studied. Evolutionary algorithms and local search approaches can increase the function value by local operations to obtain a good approximation. Lee et al. [LMNS09] have shown that if the value of a solution x cannot be increased by a factor of at least (1 + ε/n) by changing at most k + 1 elements, then x is a 1/((k+2)(1+ε))-approximation. The series of such local improvements requires that the algorithm first obtains a solution x of value at least f(x) ≥ OPT/n. Such a solution can be obtained from the empty set by adding the single element with the largest function value. Consequently, local search algorithms building on such a solution and exchanging at most k + 1 elements obtain a solution with the stated approximation quality in polynomial time [LMNS09]. It has been shown that GSEMO obtains a 1/((k+2)(1+ε))-approximation in expected time O((1/ε) n^{k+6} log n). The proof analyzes the process until a solution x with f(x) ≥ OPT/n is obtained and the required number of local improvements until a solution of the stated approximation quality is reached.

In their recent work, Qian et al. [QYT +
19] give other major results which broaden the setting of previous investigations. They considered an evolutionary multi-objective algorithm called
GSEMO-C which differs from
GSEMO by producing from the offspring x′ a second offspring x″ which is the complement of the first offspring. The selection step of GSEMO is then applied to both x′ and x″. The authors first showed that for the case of non-monotone submodular functions without any constraint, GSEMO-C is able to obtain a (1/3 − ε/n)-approximation in expected time O((n^4/ε) log n). For ε-monotone submodular functions, ε ≥
0, where f(X ∪ {x}) ≥ f(X) − ε holds for any X ⊆ V and x ∈ V \ X, and a uniform constraint with bound B, they showed that GSEMO-C achieves a solution x with f(x) ≥ (1 − 1/e) · (OPT − Bε) in expected time O(n^2(B + log n)), which generalizes the result given in [FN14] to a wider range of functions by taking their closeness to monotonicity into account. Similar approximation results also hold for GSEMO-C when considering ε-approximately submodular functions, i.e., functions f for which a submodular function g exists such that (1 − ε) g(X) ≤ f(X) ≤ (1 + ε) g(X) holds for all X ⊆ V. The authors showed that suitable approximations can also be obtained for a wider range of functions with a cardinality constraint in expected time O(n^2(B + log n)). Specifically, they obtained results that depend on the submodularity ratio of the problem and investigated functions that are ε-approximately submodular. Functions with bounded curvature under partition matroid constraints have been investigated in [FGN+ ] using a variant of GSEMO which is able to guarantee the same approximation quality as greedy but usually outperforms greedy in practice.
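To illustrate the Pareto-optimization idea behind the GSEMO/POMC variants discussed here, the following minimal sketch treats a cost-constrained problem via the bi-objective formulation (f(X), −c(X)) with infeasible solutions penalized. The dominance handling, the penalty, and all names are our own simplifications, not the exact algorithms analyzed in the cited papers.

```python
import random

# Minimal GSEMO/POMC-style sketch for maximizing f under cost c <= B on
# bit strings of length n. Bi-objective formulation and infeasibility
# handling are simplified assumptions of this illustration.

def dominates(a, b):
    """a, b are objective tuples to be maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def gsemo(f, c, n, B, iterations, rng=random.Random(0)):
    def obj(x):
        cx = c(x)
        # infeasible solutions get the worst possible first objective
        return (f(x) if cx <= B else -1.0, -cx)

    pop = {(0,) * n: obj((0,) * n)}          # start with the empty set
    for _ in range(iterations):
        parent = rng.choice(list(pop))
        # standard bit mutation: flip each bit independently with prob. 1/n
        child = tuple(b ^ (rng.random() < 1.0 / n) for b in parent)
        oc = obj(child)
        if any(dominates(po, oc) for po in pop.values()):
            continue                          # child is dominated, discard
        # keep child, remove solutions it dominates (or duplicates)
        pop = {x: o for x, o in pop.items() if not (dominates(oc, o) or o == oc)}
        pop[child] = oc
    feasible = [x for x in pop if c(x) <= B]
    return max(feasible, key=f)
```

Because the population keeps one best solution per cost value, the algorithm can imitate the greedy behavior by repeatedly improving the cheapest nondominated solutions, which is the core of the runtime arguments sketched above.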
Recent studies extended the investigations for monotone submodular functions to problems with dynamic constraints as well as constraints involving stochastic components. Roostapour et al. [RNNF19] investigated the setting of general cost constraints where the constraint bound B changes over time. Generalizing the results of Qian et al. [QSYT17] summarized in Section 6.1, they have shown that the evolutionary multi-objective approach POMC computes an (α/2) · (1 − e^{−α})-approximation for every budget b, 0 ≤ b ≤ B. Furthermore, they have shown that if B is increased to B*, then such an approximation for every b, 0 ≤ b ≤ B*, is obtained in pseudo-polynomial time. In contrast to this, it has been pointed out in [RNNF19] that simple adaptations of the generalized greedy algorithm are not able to maintain good approximations when dynamic changes are carried out. Furthermore, POMC is able to learn the dynamic problem over time, which gives it significant advantages over the greedy approaches, as shown in comprehensive experimental investigations [RNNF18].

Recently, the investigations in the area of submodular optimisation have also been extended to stochastic constraints. Chance constraints play an important role in stochastic settings. They model situations where components of a constraint are stochastic and the goal is to optimize a given submodular objective function such that the probability of violating a given constraint bound is at most α. Doerr et al. [DDN +
19] investigated greedy algorithms for the optimization of monotone submodular functions in two settings. In the first setting, the stochastic weights are independently and identically uniformly distributed within a given interval [a − δ, a + δ], δ ≤ a, where δ models the uncertainty of the items. In the second setting, each element s has its own expected weight and its weight is chosen independently of the others uniformly at random in [a(s) − δ, a(s) + δ], δ ≤ min_{s ∈ V} a(s). The investigations have recently been extended by Neumann and Neumann [NN20] to GSEMO, and it has been shown that this algorithm is able to obtain the same approximation guarantee as the greedy approach in expected polynomial time in the case of independently and identically uniformly distributed weights. For the second setting, the same approximation guarantee as the one obtained for the greedy approach is achieved in expected pseudo-polynomial time. Furthermore, experimental investigations carried out for the influence maximization problem in social networks and the maximum coverage problem show that
GSEMO significantly outperforms the greedy approach. A comparison of
GSEMO to a standard setup of NSGA-II reveals that
GSEMO often outperforms NSGA-II for the investigated settings, which suggests that the ability of
GSEMO to construct solutions in a greedy fashion is also crucial for the success of the algorithm in practice.
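For intuition on how such chance constraints are made tractable, a sum of independent weights can be checked conservatively with the one-sided Chebyshev (Cantelli) inequality: requiring E[W(X)] + sqrt(Var[W(X)] · (1 − α)/α) ≤ B guarantees Pr[W(X) > B] ≤ α. The helper below is our own illustration of this standard trick for the uniform-weight setting above, not code from the cited works.

```python
import math

# Cantelli (one-sided Chebyshev) surrogate for the chance constraint
# Pr[W(X) > B] <= alpha, where W(X) is a sum of num_items independent
# Uniform[a - delta, a + delta] weights (variance delta^2/3 per item).
# Purely illustrative; the parameter names are our own.

def surrogate_weight(num_items, a, delta, alpha):
    mean = num_items * a
    var = num_items * delta * delta / 3.0  # Var of Uniform[a-d, a+d] is d^2/3
    return mean + math.sqrt(var * (1.0 - alpha) / alpha)

def chance_feasible(num_items, a, delta, alpha, B):
    # If the surrogate weight fits the budget, Cantelli guarantees
    # Pr[W(X) > B] <= alpha.
    return surrogate_weight(num_items, a, delta, alpha) <= B
```

Such a deterministic surrogate is what allows a greedy algorithm or GSEMO to treat the stochastic constraint like an ordinary cost constraint.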
Estimation-of-distribution algorithms (EDAs) are a more recent class of evolutionary algorithms (EAs). As a main difference to classic EAs, they do not evolve a population (that is, a finite set of solution candidates), but a probabilistic model of a solution candidate (that is, a probability distribution over the search space). Whereas a traditional EA selects individuals from a parent population, creates from them offspring via mutation and crossover, evaluates the offspring, and based on this evaluation selects from parents and offspring the next parent population, an EDA samples individuals from the current probabilistic model, evaluates them, and based on this evaluation defines the next probabilistic model. When viewing a parent population of a classic EA as a probabilistic model (uniformly distributed on the individuals of the population), one can interpret population-based EAs as particular EDAs, but it is clear that the probabilistic models of EDAs are much more expressive than models building on finite populations. The obvious hope is that this richer class of algorithms contains better optimizers. However, there is also the additional fantasy that the probabilistic model evolved by an EDA can give insights beyond the good solutions that can be sampled from it.

Most EDAs were defined in the 1990s, first in 1993 in an unpublished work [JBS93] by Ari Juels, Shumeet Baluja, and Alistair Sinclair (see [Lob07]) proposing the equilibrium genetic algorithm. While clearly containing the right ideas, this paper was never published and the algorithm is little known. Acknowledging the joint work with Ari Juels, Shumeet Baluja [Bal94] proposed a very similar algorithm called population-based incremental learning (PBIL). As an important special case of it, Mühlenbein and Paass [MP96] two years later suggested the univariate marginal distribution algorithm (UMDA). In 1999, Harik, Lobo, and Goldberg [HLG99] proposed the compact genetic algorithm (cGA).
These and several other EDAs found numerous successful applications in the following years; see, e.g., the surveys [HP11, LL02, PHL15].

First attempts to understand EDAs via theoretical means soon followed, starting, as often, with convergence results such as [HR97]. We note, however, that many of these very early results work with simplifying assumptions such as infinite population models and thus are not fully rigorous in the strict mathematical sense. In a series of works, Shapiro [Sha02, Sha05, Sha06] analyzed how the parameters of EDAs influence the effect of genetic drift. We discuss this central topic in more detail in Section 7.2.3.

The first rigorous runtime analysis for an EDA was presented by Droste at GECCO 2005 (journal version [Dro06]). Chen, Lehre, Tang, and Yao [CLTY09] exhibited an artificial example problem which is easily solved by the UMDA, but for which the (1 + 1) EA with any Θ(1/n) mutation rate needs exponential time to find the optimum. In [CTCY10], Chen, Tang, Chen, and Yao discussed the use of frequency boundaries to prevent premature convergence. After these early works, it took another five years without theoretical works on EDAs until this area gained significant momentum in 2015–2016 with works like [DLN19] (conference version at GECCO 2015), conducting a runtime analysis of the UMDA on OneMax and
LeadingOnes, [FKKS17] (conference version at ISAAC 2015) on the robustness of EDAs to noise, [SW19] (conference version at GECCO 2016) on how the update strength influences the runtime of the cGA, and [FKK16] pointing out that the main known EDAs are balanced, but not stable (that is, subject to genetic drift). These works generated a broad interest in theoretical analyses of EDAs, resulting in a large number of strong papers by a decent number of different authors. We refer to the recent survey [KW20a] for more details.
We now describe the compact genetic algorithm (cGA), which will serve as the central example in this section. Other EDAs such as the UMDA or PBIL are substantially different, but appear to have similar strengths and challenges, so expecting similar results for these is a reasonable rule of thumb. However, we only concentrate on EDAs for discrete optimization problems here and we expect very different results in the continuous world. For reasons of simplicity, we only regard pseudo-Boolean problems, that is, the optimization of functions f : {0,1}^n → R.

The compact genetic algorithm (cGA) was proposed by Harik, Lobo, and Goldberg [HLG99]. Being a univariate EDA, it develops a probabilistic model described by a frequency vector p ∈ [0,1]^n. This frequency vector determines the following probability distribution on the search space {0,1}^n. If X = (X_1, ..., X_n) ∈ {0,1}^n is a search point sampled according to this distribution (we write X ∼ Sample(p) to indicate this), then we have Pr[X_i = 1] = p_i independently for all i ∈ [1..n] := {1, ..., n}. In other words, the probability that X equals some fixed search point y is Pr[X = y] = ∏_{i : y_i = 1} p_i · ∏_{i : y_i = 0} (1 − p_i).

In each iteration, the cGA updates this probabilistic model by sampling two search points x^1, x^2 ∼ Sample(p), computing their fitness, sorting them, that is, defining (y^1, y^2) = (x^1, x^2) if x^1 is at least as fit as x^2 and (y^1, y^2) = (x^2, x^1) otherwise, and updating the frequency vector to p := p + (1/K)(y^1 − y^2), capped into the interval [0,1]. Hence if y^1 and y^2 differ in some bit position i, the i-th frequency moves by a step of 1/K into the direction of y^1_i (but not below zero and above one).
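The sampling-and-update loop just described can be transcribed directly into code. The sketch below is our own illustration (names are ours) and omits the frequency borders, simply capping frequencies into [0, 1].

```python
import random

# Sketch of the cGA described above: per iteration, sample two search
# points from the frequency vector, rank them by fitness, and move each
# frequency by 1/K toward the bit values of the better sample.
# Illustrative only; frequencies are capped into [0, 1] (no 1/n borders).

def cga(f, n, K, iterations, rng=random.Random(1)):
    p = [0.5] * n
    for _ in range(iterations):
        x1 = [int(rng.random() < pi) for pi in p]
        x2 = [int(rng.random() < pi) for pi in p]
        if f(x2) > f(x1):
            x1, x2 = x2, x1                      # make x1 the fitter sample
        p = [min(1.0, max(0.0, pi + (b1 - b2) / K))
             for pi, b1, b2 in zip(p, x1, x2)]
    return p
```

On OneMax (f = number of ones), the frequencies drift toward 1 under this update, provided K is not chosen too small.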
The hypothetical population size K, often also denoted by µ, is an algorithm parameter that controls how strong this update is. To avoid a premature convergence, one often works with the frequency boundaries 1/n and 1 − 1/n, that is, one caps the new frequency vector into the interval [1/n, 1 − 1/n] instead of [0,1]. Since we are usually only interested in the time until an optimum is sampled (the runtime of the cGA), we do not specify a termination criterion and pretend that the algorithm runs forever.

Algorithm 2:
The compact genetic algorithm (cGA) to maximize a function f : {0,1}^n → R.

p = (1/2, ..., 1/2) ∈ [0,1]^n;
repeat
  x^1 ∼ Sample(p);
  x^2 ∼ Sample(p);
  if f(x^1) ≥ f(x^2) then (y^1, y^2) ← (x^1, x^2) else (y^1, y^2) ← (x^2, x^1);
  p ← p + (1/K)(y^1 − y^2), capped into [0,1] or [1/n, 1 − 1/n];
until forever;

In this section, we discuss three main insights which the theoretical analysis of EDAs has produced. For reasons of space, we point out two of them only briefly, namely that EDAs can perform well in noisy optimization and that they can cope well with local optima, and then discuss in detail how to set the parameters of EDAs, as this might be the biggest obstacle to successfully using EDAs.
In their remarkable work [FKKS17], Friedrich, Kötzing, Krejca, and Sutton demonstrate that the cGA is extremely robust to noise. More precisely, they show that the cGA with a suitable parameter choice can optimize the
OneMax function subject to additive, normally distributed noise in a runtime that depends only polynomially on the variance σ^2 of the noise. As they also show, such a performance cannot be obtained with many classic EAs. The reason for this robustness is the cautious update of the probabilistic model in each iteration (as opposed to the "drastic" alternatives of a classic EA, rejecting an offspring or keeping it and discarding some other individual). This caution of the EDA implies that a single wrong evaluation of a search point only has a small influence on the future run of the algorithm. In the only other study on how EDAs cope with noise, Lehre and Nguyen [LN19b] show that the UMDA with suitable parameter choices can optimize the LeadingOnes problem in time O(n^2) also in the presence of constant-probability one-bit prior noise. We note that a strong robustness to noise was previously found in ant-colony optimizers [DHK12, FK13, ST12], which bear some similarity to EDAs.

7.2.2 EDAs Can Cope Well with Local Optima

Another difficulty for many EAs is posed by local optima. Once the population of an EA is concentrated on a local optimum, it is difficult to leave it. As Hasenöhrl and Sutton [HS18] (see also [Doe19c]) show, the larger sampling variance of the cGA (in the regime without genetic drift) enables the algorithm to leave local optima much faster than many classic EAs. More specifically, Hasenöhrl and Sutton show that the cGA can optimize a jump function with jump size k in time exp(O(k + log n)), whereas many mutation-based EAs need time Ω(n^k).

Both the result on noisy optimization and the one on local optima indicate that EDAs can have significant advantages over classic EAs.
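For reference, the jump function with jump size k used in these analyses is commonly defined as follows (a standard definition, transcribed by us):

```python
# Standard Jump_k function: behaves like OneMax plus k on most of the
# search space, but has a fitness valley of width k just below the
# optimum; the all-ones string is the unique optimum.

def jump(x, k):
    n = len(x)
    ones = sum(x)
    if ones <= n - k or ones == n:
        return k + ones      # easy slope toward n - k ones, plus the optimum
    return n - ones          # the valley: fitness decreases toward the optimum
```

A mutation-based EA sitting at n − k ones must flip the k missing bits simultaneously, which is where the Ω(n^k) bounds for classic EAs come from.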
For reasons of brevity, we nevertheless omit further details and now turn to an important topic where a large sequence of works together has greatly increased our understanding, namely how to choose the parameters of EDAs and what the role of genetic drift in EDAs is.

While choosing optimal parameters for EAs is never easy, for many classic EAs a number of easy rules of thumb have been developed. For example, for mutation-based EAs the general recommendation to use standard bit mutation with mutation rate p = 1/n often gives reasonable results (though [DLMN17] suggests that this impression is caused by an overfitting to unimodal problems). For EDAs, such general rules that are true over different classes of problems appear to be harder to find. From a large number of theoretical works, we now understand quite well why, and we also have a number of different solutions to this problem.

The main challenge is choosing an appropriate speed of adapting the probabilistic model. If this speed of adaptation is low, then it simply takes long to change the initial, usually uniform, model into a model that samples good solutions with reasonable probability. However, if the speed of adaptation is high, then the small random signals stemming from the random choices in the sampling of solutions are over-interpreted and the model is quickly adjusted to an incorrect model. When an EDA without frequency boundaries is used, this means that the model has (at least partially) converged to an incorrect model without the possibility to ever return. With frequency boundaries, there is still the chance to revert to a good model, but practical experience and theory show (i) that this can take long and (ii) that usually the EDA continues to work with degenerate models and thus, to some extent, imitates classic EAs (and consequently does not profit from the more general model-building ability).
The effect that sampling frequencies without a justification from the fitness function move to boundary values is known as genetic drift.

Since genetic drift can lead to significant performance problems and since the risk of encountering genetic drift via unfortunate parameter choices is high, the question of how to avoid genetic drift is, explicitly or implicitly, a common theme of almost all theoretical works on EDAs. Shapiro's very early works [Sha02, Sha05, Sha06] discussed this question explicitly, and Droste's first rigorous runtime analysis regarded how the cGA optimizes OneMax only in the regime where the update strength 1/K is O(n^{−1/2−ε}) for some constant ε >
0, a parameter regime in which the cGA with high probability finds the optimum of
OneMax in a way that no sampling frequency ever goes below 1/3, that is, without encountering genetic drift. For reasons of space, we shall not describe in detail the whole history of understanding genetic drift of EDAs, but present immediately the final result, only mentioning that both explicit investigations of genetic drift like [Sha02, Sha05, Sha06, FKK16, DZ20b] and the insights gained from many runtime analyses like [Dro06, DLN19, LN17, SW19, Wit19, KW20b, LSW18, HS18, Doe19c, Doe19b] paved the way towards this result.

Before discussing how to avoid genetic drift, let us quickly describe what is known on the danger of genetic drift. A first indication that genetic drift could be dangerous can be derived from the positive results: the majority of the proven upper bounds for runtimes of EDAs only apply to regimes in which there is provably no genetic drift, and in fact, most proofs heavily exploit this. Rigorous proofs that genetic drift can lead to performance losses are much rarer and appeared only very recently, owing to the fact that lower bound proofs for EDAs are often very difficult. In their deep analysis [LSW18], Lengler, Sudholt, and Witt showed that the cGA with K = Θ(n^{1/2}/(log n · log log n)) needs time Ω(n^{7/6}/(log n · log log n)) to optimize OneMax, and the proof of this result shows that genetic drift is present. For K = c n^{1/2} ln n, c a sufficiently large constant, the cGA only needs time O(n log n), and here no genetic drift occurs [SW19]. A more drastic loss from genetic drift, albeit on an artificial example problem, was observed in [LN19a, DK20c]. Lehre and Nguyen [LN19a] define the deceiving-leading-blocks (DLB) problem and show that the UMDA with Ω(log n) ≤ µ = o(n) needs time exponential in µ to find the optimum. By [DZ20b], in this parameter regime genetic drift is encountered when the runtime is ω(n).
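The drift effect itself is easy to reproduce in simulation. The sketch below models a single neutral bit under the cGA without borders as an unbiased random walk with step size 1/K (our own simplification: the two samples differ in this bit with probability 2p(1 − p), and then the better sample is equally likely to carry either value) and returns the boundary hitting time, which is typically of order K^2.

```python
import random

# Random-walk model of one neutral bit under the cGA without borders.
# The frequency is tracked as c/K with integer c to avoid floating-point
# drift; this is an illustration, not code from the cited analyses.

def neutral_drift_time(K, rng):
    c, steps = K // 2, 0                  # frequency c/K, starting near 1/2
    while 0 < c < K:
        steps += 1
        p = c / K
        if rng.random() < 2.0 * p * (1.0 - p):   # the two samples differ here
            c += 1 if rng.random() < 0.5 else -1
    return steps   # boundary hitting time, typically of order K^2
```

Running this for increasing K illustrates why upper bounds that avoid genetic drift require the time budget to stay below roughly K^2 iterations.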
In [DK20c], it is shown that the UMDA with µ = Θ(n log n) can optimize the DLB problem in time O(n^2 log n) by profiting from the fact that now there is no genetic drift. A few experimental results also discuss the influence of genetic drift on the performance of an EDA; e.g., Figure 3 in [KW20a] shows the runtime of the UMDA on OneMax and Figure 1 in [DZ20a] shows the runtimes of the cGA on
OneMax, LeadingOnes, jump functions, and the DLB problem. These results show a mild negative impact of genetic drift in the two
OneMax experiments, a stronger impact for
LeadingOnes, and a drastic impact for jump functions and DLB.

We now discuss how to predict and avoid genetic drift. A good way to measure genetic drift is by regarding a fitness function with a neutral bit, that is, a bit position that has no influence on the fitness. This might be overly pessimistic, since for such a bit the risk that the sampling frequency approaches an unwanted boundary value might be higher than for a bit with strong influence on the fitness, but (i) a pessimistic view cannot be wrong here, as a slightly too weak model update strength only slightly increases the runtime, whereas genetic drift, as just seen, can be detrimental, and (ii) the results just described show that the estimates from regarding neutral bits cannot, for these examples, be far from the truth.

The up to now most complete answer to the question of genetic drift was given in [DZ20b], as said, a work that would not exist without the long sequence of previous works named above. We discuss this result in detail for the cGA and note that similar results are true for the UMDA and PBIL.

Theorem 4.
Let f : {0,1}^n → R. Assume that the i-th bit of f is neutral, that is, f(x) = f(y) for all x, y ∈ {0,1}^n with x_j = y_j for all j ∈ [1..n] \ {i}. Consider optimizing f via the cGA with hypothetical population size K using the frequency range [ε, 1 − ε] for some ε ∈ [0, 1/2]. Denote by p^{(t)} the frequency vector resulting from the t-th iteration.

1. Let T* = min{t | p_i^{(t)} ∈ {ε, 1 − ε}}. Then E[T*] = O(K^2).

2. Let T_{1/4} = min{t | p_i^{(t)} ∈ [0, 1/4] ∪ [3/4, 1]} ≤ T* be the first time the i-th frequency leaves the interval (1/4, 3/4) of the frequency range. Then E[T_{1/4}] = Ω(K^2).

3. For all γ > 0 and T ∈ N, we have Pr[∀t ∈ [0..T] : |p_i^{(t)} − 1/2| < γ] ≥ 1 − 2 exp(−γ^2 K^2 / (2T)).

In very simple words, the above result states that if we run the cGA for fewer than roughly K^2 iterations, then we do not encounter genetic drift, whereas after more than roughly K^2 iterations, genetic drift is likely to occur.

The tail bound (3) together with a simple union bound admits more precise guarantees, e.g., the following two formulations.

• If our aim is to run the cGA for T iterations on some pseudo-Boolean function of dimension n, then by taking K ≥ √(32 T ln(2n^2)) we can ensure that with probability at least 1 − 1/n no neutral bit has its frequency leave the interval (1/4, 3/4) within these T iterations.

• When K is given, the probability that within T ≤ K^2 / (32 ln(2n)) iterations a given neutral frequency leaves the interval (1/4, 3/4) is at most 1/n.

We note without further details that similar statements hold for bits which are not neutral, but which have a preference for a particular value b, that is, where changing the bit value to b can never decrease the fitness. Here the above statements hold for the undesired event that the frequency of this bit approaches the wrong boundary value 1 − b. We refer to [DZ20b] for a precise statement of this result. This extension allows one to determine good values for the hypothetical population size for simple test functions like OneMax or LeadingOnes: if we run the cGA on one of these functions for T iterations, then taking K ≥ √(32 T ln(2n^2)) ensures that with probability at least 1 − 1/n no frequency will go below 1/4.

For bit values that have no uniform preference for a particular value (which is, naturally, the typical case for difficult optimization problems), we would still recommend sticking to the above-derived recommendations for setting K, since this at least avoids that frequencies reach the wrong value due to genetic drift. If a fitness landscape is badly deceptive, clearly, such arguments cannot avoid that frequencies approach the wrong end of the frequency range due to the misleading fitness signal. We note though that the heuristic argument for setting K along the lines from above gives a good value and a good optimization behavior for the non-unimodal jump function class [HS18, Doe19c].

We finally note that there are three "automated" ways to approach the difficulty of finding the right parameter value. Inspired by the above insight, Doerr and Zheng [DZ20a] proposed to start with a small value of K, run the cGA until either a satisfying solution is found or the time exceeds a limit up to which we are sure to not observe genetic drift, and then restart with twice the K-value. In [Doe19c], a strategy is proposed that works with different K-values in parallel.
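The rule of thumb for choosing K and the K-doubling restart scheme of [DZ20a] can be sketched as follows; the function names and the exact constants are our own rendering of the bounds discussed in the text, not a fixed recipe from the cited works.

```python
import math

# Safe hypothetical population size for a planned budget of T cGA
# iterations on an n-dimensional problem, following the union-bound
# rule of thumb discussed above (our own rendering; constants may vary).

def safe_population_size(T, n):
    return math.ceil(math.sqrt(32.0 * T * math.log(2.0 * n * n)))

# K-doubling scheme in the spirit of [DZ20a]: run the cGA with a budget
# proportional to K^2 (the drift-free horizon), restart with doubled K
# until a success. run_cga(K, budget) is a placeholder returning True
# on success; it is an assumption of this illustration.

def k_doubling(run_cga, K0=2, max_rounds=20):
    K = K0
    for _ in range(max_rounds):
        budget = K * K            # stay within the drift-free horizon ~ K^2
        if run_cga(K, budget):
            return K
        K *= 2
    return None
```

Since the budgets grow geometrically, the total work of the doubling scheme exceeds the work at the final, successful K only by a constant factor, which is the intuition behind the logarithmic-factor overhead stated below.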
Both approaches were proven to optimize simple test functions in a time that is at most a logarithmic factor larger than the runtime obtained from using the optimal value of K. An experimental comparison [DZ20a] gives no clear picture of which of the two approaches is superior. Clearly, both perform better than what results from a static, but inappropriate choice of K.

Instead of solving the genetic drift problem via a suitable choice of K, the significance-based cGA proposed in [DK20b] tries to avoid genetic drift outright. We recall that genetic drift is caused by random fluctuations of the frequencies, which again are caused by sampling search points from the probabilistic model and updating the model based on these. Therefore, the significance-based cGA avoids updating the model based on such short-sighted insights. Instead, this algorithm does not update the model until the history of the process gives sufficient evidence that some bit should better have a particular value. In this case, a drastic model update is performed by setting the corresponding frequency to 1/n or 1 − 1/n. This algorithm was shown to optimize both OneMax and
LeadingOnes in time O(n log n), a performance not observed with any other classic EA or EDA so far. With no research on non-unimodal objective functions and no practical experience so far, of course, this is still a very preliminary line of research.

Being a very recent research topic, it is clear that the theory of EDAs contains more open problems than solved ones, and many open problems are fundamental for our understanding and the future use of EDAs. We first mention briefly research topics where we feel that more results would greatly help and then give more details on two particular research questions.

• Robust optimization: The only two results [FKKS17, LN19b] here show that the cGA can efficiently optimize
OneMax in the presence of normally distributed additive posterior noise and that the UMDA can efficiently optimize
LeadingOnes in the presence of one-bit prior noise. Having such results for other EDAs, other optimization problems, and other noise models (and other stochastic disturbances such as dynamically changing problem instances) would be highly desirable.

• Combinatorial optimization: While for classic EAs a large number of runtime analyses for combinatorial optimization problems exist [NW10], no such results have been shown for EDAs.

• Representations different from bit strings: For classic EAs, a number of results exist for problem representations different from bit strings, e.g., [STW04, DJ10, DDK18], and these results show that the choice of the representation and the choice of the variation operators for these can make a crucial difference. For EDAs, all results so far only discuss bit-string representations.

We now discuss in more detail two possible directions for future research.
In the regime without genetic drift, EDAs often show a regular optimization behavior which often allows one to prove matching upper and lower bounds for runtimes. In the presence of genetic drift, the runtime is strongly influenced by how some frequencies approach the boundaries of the frequency range. It is thus rare events that determine the runtime, and this makes it much harder to prove tight bounds. One could argue that runtime analyses in this regime are less interesting since we rather expect larger runtimes and rather an undesired behavior (e.g., imitating EAs), but this is not the full truth. For example, the UMDA optimizes
OneMax in time Θ(n log n) both for µ = Θ(log n) in the regime with (strong) genetic drift and for µ = Θ(√n log n) in the regime without genetic drift.

Apart from sporadic results, which most likely are not tight in most cases, not much is known about the runtimes of EDAs in the genetic drift regime. In particular, the following questions are not understood.

• Runtimes of EDAs for very high update strengths: The few runtime analyses in the genetic drift regime all assume that the update strength is at least so small that a (sufficiently large) logarithmic number of frequency updates is necessary to bring a frequency to a boundary value, e.g., that K ≥ C ln n for some sufficiently large constant C when considering the cGA. Nothing nontrivial is known for even larger update strengths, but it is conjectured that one will typically encounter a super-polynomial runtime here.

• Runtimes of EDAs on
OneMax for moderate update strengths: For the case that the update strength is smaller than in the previous paragraph, but still high enough to lead to genetic drift, a general Ω(n log n) lower bound for the cGA and UMDA [KW20b, SW19] and an O(nλ) upper bound for the UMDA [DLN19, Wit19] are known. For the cGA, a slightly stronger lower bound of Ω(K^{1/3}n) was shown in the regime K = Ω(log n) ∩ O(√n / log n) [LSW18]. With this being all that is known, a true understanding of this regime is far from established.

• Runtimes of EDAs on jump functions: In the regime without genetic drift, a reasonable understanding of the runtime of the cGA on jump functions has been obtained in [HS18, Doe19c, Doe19b]. The lower bound of [Doe19b], exponential in the jump size k, also applies to the regime with genetic drift, but is by far not sufficient to explain the huge runtimes observed experimentally [DZ20a] in this regime. Hence a proof that the cGA optimizing jump functions suffers significantly from genetic drift is still missing.

Essentially all theoretical research so far regarded only univariate EDAs, that is, EDAs which evolve a univariate probabilistic model in which the bits are sampled independently. This is not surprising given how difficult it already was to obtain our limited understanding of univariate EDAs. Since more complex EDAs hold the promise both of being better optimizers for complex optimization problems (in which the decision variables are often highly interdependent) and of evolving better probabilistic models that represent the structure of interesting parts of the search space, a better understanding of multivariate EDAs is highly desirable.

So far, only two results in this direction exist, and both rely on theory-driven experiments rather than on proven results. In [LN19a], the authors claim that the bivariate EDA mutual information maximization for input clustering (MIMIC) can cope better with fitness landscapes in which the decision variables are interdependent.
They define an artificial fitness landscape with strong inter-variable dependencies, the DLB problem, prove that the UMDA with µ = o(n) needs time exponential in µ, and show experimentally that the MIMIC can optimize this landscape in time polynomial in n. Based on this finding, they suggest “that one should consider EDAs with more complex probabilistic models when optimizing problems with some degree of epistasis and deception.” As discussed in Section 7.2.3, the lower bound on the runtime of the UMDA only applies to the regime with strong genetic drift, and from µ = Ω(n log n) on, the runtime of the UMDA on DLB becomes O(µn) [DK20c]. For this reason, it is not clear if the MIMIC, and more generally bivariate EDAs, are superior to the UMDA with a good choice of its parameters.

While hence no example exists in which a multivariate EDA shows a better optimization behavior than a univariate one (with good parameters), the recent work [DK20a] shows (again only experimentally) that bivariate EDAs can evolve very expressive probabilistic models. For a simple fitness landscape with 2^{n/2} global optima, it is shown that the MIMIC very quickly evolves a probabilistic model which allows it to sample global optima with constant probability, and in such a way that an optimum is only rarely sampled repeatedly. Hence the evolved model indeed represents, to some extent, the structure of the set of optimal solutions. It is clear that this would not be possible with a univariate EDA or a population-based EA.

In summary, there is a cautious indication that multivariate EDAs could be interesting both from the viewpoint of good optimization times and of good representations of the structure of the fitness landscape, but almost all of the work in this direction still needs to be done.

We have given an overview of areas of research in the field of theory of evolutionary computation in discrete search spaces that have gained significant attention during the last 10 years.
The survey tried to capture the most important aspects from the perspective of the authors. We refer to the recent edited book [DN20b] for a more comprehensive overview, which also includes other evolutionary computing techniques such as genetic programming and artificial immune systems. For the full technical details, we naturally invite the reader to consult the original articles.

There are many areas where we see a lot of room for progress. Analyses for constrained problems, be they static, dynamic, or stochastic, have only recently been started, and understanding the behavior of evolutionary algorithms for linear functions even under very simple constraints is still a challenging task [NPW19]. A first analysis of differential evolution in discrete search spaces has been carried out in [ZYD18]; it indicates, however, that our current methods cannot cope well with the complicated stochastic dependencies arising in this optimization process. The entropy compression method has found a first application in evolutionary computation [LMS19], but other applications of this powerful method are not yet in sight. From a broader perspective, our understanding of the impact of populations, crossover operators, and diversity mechanisms still lags behind their practical success, and proving the usefulness of such modules of an evolutionary algorithm for complex optimization problems remains a challenging task.

We hope that the readers find this survey useful and that it helps them to understand the current theoretical research and to pursue their own research in this area. Although tremendous progress has been made during the last 10 years, there are still many open questions and problems, some of which have been outlined in this article. We encourage the reader to make their own contribution to this field of research and to help transfer theoretical knowledge into the design of high-performing evolutionary computing techniques.
References

[AAG18] Youhei Akimoto, Anne Auger, and Tobias Glasmachers. Drift theory in continuous search spaces: expected hitting time of the (1 + 1)-ES with 1/5 success rule. In Genetic and Evolutionary Computation Conference, GECCO 2018, pages 801–808. ACM, 2018.

[ABD20a] Denis Antipov, Maxim Buzdalov, and Benjamin Doerr. Fast mutation in crossover-based algorithms. In Genetic and Evolutionary Computation Conference, GECCO 2020. ACM, 2020. To appear.

[ABD20b] Denis Antipov, Maxim Buzdalov, and Benjamin Doerr. First steps towards a runtime analysis when starting with a good solution. In Parallel Problem Solving From Nature, PPSN 2020. Springer, 2020. To appear.

[AD20] Denis Antipov and Benjamin Doerr. Runtime analysis of a heavy-tailed (1 + (λ, λ)) genetic algorithm on jump functions. In Parallel Problem Solving From Nature, PPSN 2020. Springer, 2020. To appear.

[ADFH18] Denis Antipov, Benjamin Doerr, Jiefeng Fang, and Tangi Hetet. Runtime analysis for the (µ + λ) EA optimizing OneMax. In Genetic and Evolutionary Computation Conference, GECCO 2018, pages 1459–1466. ACM, 2018.

[ADY19] Denis Antipov, Benjamin Doerr, and Quentin Yang. The efficiency threshold for the offspring population size of the (µ, λ) EA. In Genetic and Evolutionary Computation Conference, GECCO 2019, pages 1461–1469. ACM, 2019.

[AHX+20] Hirad Assimi, Oscar Harper, Yue Xie, Aneta Neumann, and Frank Neumann. Evolutionary bi-objective optimization for the dynamic chance-constrained knapsack problem based on tail bound objectives. CoRR, abs/2002.06766, 2020. Conference version to appear at ECAI 2020.

[Bäc92] Thomas Bäck. Self-adaptation in genetic algorithms. In European Conference on Artificial Life, ECAL 1992, pages 263–271. MIT Press, 1992.

[Bäc93] Thomas Bäck. Optimal mutation rates in genetic search. In International Conference on Genetic Algorithms, ICGA 1993, pages 2–8. Morgan Kaufmann, 1993.

[Bäc96] Thomas Bäck. Evolutionary Algorithms in Theory and Practice – Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, 1996.

[Bal94] Shumeet Baluja. Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical report, Carnegie Mellon University, 1994.

[BD17] Maxim Buzdalov and Benjamin Doerr. Runtime analysis of the (1 + (λ, λ)) genetic algorithm on random satisfiable 3-CNF formulas. In Genetic and Evolutionary Computation Conference, GECCO 2017, pages 1343–1350. ACM, 2017.

[BDM15] Weiwei Bi, Graeme C. Dandy, and Holger R. Maier. Improved genetic algorithm optimization of water distribution system design by incorporating domain knowledge. Environ. Model. Softw., 69:370–381, 2015.

[BDN10] Süntje Böttcher, Benjamin Doerr, and Frank Neumann. Optimal fixed and adaptive mutation rates for the LeadingOnes problem. In Parallel Problem Solving from Nature, PPSN 2010, pages 1–10. Springer, 2010.

[BFH+09] Dimo Brockhoff, Tobias Friedrich, Nils Hebbinghaus, Christian Klein, Frank Neumann, and Eckart Zitzler. On the effects of adding objectives to plateau functions. IEEE Trans. Evol. Comput., 13(3):591–603, 2009.

[BFM97] Thomas Bäck, David B. Fogel, and Zbigniew Michalewicz. Handbook of Evolutionary Computation. IOP Publishing Ltd., 1997.

[BFQY20] Chao Bian, Chao Feng, Chao Qian, and Yang Yu. An efficient evolutionary algorithm for subset selection with general cost constraints. In AAAI, pages 3267–3274. AAAI Press, 2020.

[BLM+20] Daniel Bertschinger, Johannes Lengler, Anders Martinsson, Robert Meier, Angelika Steger, Miloš Trujić, and Emo Welzl. An optimal decentralized (δ + 1)-coloring algorithm. CoRR, abs/2002.05121, 2020.

[BLS14] Golnaz Badkobeh, Per Kristian Lehre, and Dirk Sudholt. Unbiased black-box complexity of parallel search. In
Parallel Problem Solving from Nature, PPSN 2014, pages 892–901. Springer, 2014.

[BM16] Mohammad Reza Bonyadi and Zbigniew Michalewicz. Evolutionary computation for real-world problems. In Challenges in Computational Statistics and Data Mining, volume 605 of Studies in Computational Intelligence, pages 1–24. Springer, 2016.

[BNPS19] Jakob Bossek, Frank Neumann, Pan Peng, and Dirk Sudholt. Runtime analysis of randomized search heuristics for dynamic graph coloring. In GECCO, pages 1443–1451. ACM, 2019.

[BNPS20] Jakob Bossek, Frank Neumann, Pan Peng, and Dirk Sudholt. More effective randomized search heuristics for graph coloring through dynamic optimization. CoRR, abs/2005.13825, 2020. To appear as full paper at GECCO 2020.

[BQT18] Chao Bian, Chao Qian, and Ke Tang. Towards a running time analysis of the (1+1)-EA for OneMax and LeadingOnes under general bit-wise noise. In Parallel Problem Solving from Nature, PPSN 2018, Part II, pages 165–177. Springer, 2018.

[CDEL18] Dogan Corus, Duc-Cuong Dang, Anton V. Eremeev, and Per Kristian Lehre. Level-based analysis of genetic algorithms and other search processes. IEEE Transactions on Evolutionary Computation, 22:707–719, 2018.

[CHS+09] Tianshi Chen, Jun He, Guangzhong Sun, Guoliang Chen, and Xin Yao. A new approach for analyzing average time complexity of population-based evolutionary algorithms on unimodal problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 39:1092–1106, 2009.

[CLTY09] Tianshi Chen, Per Kristian Lehre, Ke Tang, and Xin Yao. When is an estimation of distribution algorithm better than an evolutionary algorithm? In Congress on Evolutionary Computation, CEC 2009, pages 1470–1477. IEEE, 2009.

[CTCY10] Tianshi Chen, Ke Tang, Guoliang Chen, and Xin Yao. Analysis of computational time of simple estimation of distribution algorithms. IEEE Transactions on Evolutionary Computation, 14:1–22, 2010.

[DD18] Benjamin Doerr and Carola Doerr. Optimal static and self-adjusting parameter choices for the (1 + (λ, λ)) genetic algorithm. Algorithmica, 80:1658–1709, 2018.

[DDE15] Benjamin Doerr, Carola Doerr, and Franziska Ebel. From black-box complexity to designing new genetic algorithms. Theoretical Computer Science, 567:87–104, 2015.

[DDK18] Benjamin Doerr, Carola Doerr, and Timo Kötzing. Static and self-adjusting mutation strengths for multi-valued decision variables. Algorithmica, 80:1732–1768, 2018.

[DDL19] Benjamin Doerr, Carola Doerr, and Johannes Lengler. Self-adjusting mutation rates with provably optimal success rules. In Genetic and Evolutionary Computation Conference, GECCO 2019, pages 1479–1487. ACM, 2019.

[DDN+19] Benjamin Doerr, Carola Doerr, Aneta Neumann, Frank Neumann, and Andrew M. Sutton. Optimization of chance-constrained submodular functions. CoRR, abs/1911.11451, 2019. To appear in proceedings of AAAI 2020.

[DDY16] Benjamin Doerr, Carola Doerr, and Jing Yang. k-bit mutation with self-adjusting k outperforms standard bit mutation. In Parallel Problem Solving from Nature, PPSN 2016, pages 824–834. Springer, 2016.

[DDY20] Benjamin Doerr, Carola Doerr, and Jing Yang. Optimal parameter choices via precise black-box analysis.
Theoretical Computer Science, 801:1–34, 2020.

[DF99] Rodney G. Downey and Michael R. Fellows. Parameterized Complexity. Springer, 1999.

[DFW11] Benjamin Doerr, Mahmoud Fouz, and Carsten Witt. Sharp bounds by probability-generating functions and variable drift. In Genetic and Evolutionary Computation Conference, GECCO 2011, pages 2083–2090. ACM, 2011.

[DG13] Benjamin Doerr and Leslie A. Goldberg. Adaptive drift analysis. Algorithmica, 65:224–250, 2013.

[DGWY19] Benjamin Doerr, Christian Gießen, Carsten Witt, and Jing Yang. The (1 + λ) evolutionary algorithm with self-adjusting mutation rate. Algorithmica, 81:593–631, 2019.

[DHK12] Benjamin Doerr, Ashish Ranjan Hota, and Timo Kötzing. Ants easily solve stochastic shortest path problems. In Genetic and Evolutionary Computation Conference, GECCO 2012, pages 17–24. ACM, 2012.

[DHOW06] Vladimir G. Deineko, Michael Hoffmann, Yoshio Okamoto, and Gerhard J. Woeginger. The traveling salesman problem with few inner points. Oper. Res. Lett., 34(1):106–110, 2006.

[DJ10] Benjamin Doerr and Daniel Johannsen. Edge-based representation beats vertex-based representation in shortest path problems. In Genetic and Evolutionary Computation Conference, GECCO 2010, pages 759–766. ACM, 2010.

[DJS11] Benjamin Doerr, Daniel Johannsen, and M. Schmidt. Runtime analysis of the (1+1) evolutionary algorithm on strings over finite alphabets. In Foundations of Genetic Algorithms, FOGA 2011, pages 119–126. ACM, 2011.

[DJS+13] Benjamin Doerr, Thomas Jansen, Dirk Sudholt, Carola Winzen, and Christine Zarges. Mutation rate matters even when optimizing monotone functions. Evolutionary Computation, 21:1–21, 2013.

[DJW00] Stefan Droste, Thomas Jansen, and Ingo Wegener. Dynamic parameter control in simple evolutionary algorithms. In Foundations of Genetic Algorithms, FOGA 2000, pages 275–294. Morgan Kaufmann, 2000.

[DJW02] Stefan Droste, Thomas Jansen, and Ingo Wegener. On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 276:51–81, 2002.

[DJW12a] Benjamin Doerr, Daniel Johannsen, and Carola Winzen. Multiplicative drift analysis. Algorithmica, 64:673–697, 2012.

[DJW12b] Benjamin Doerr, Daniel Johannsen, and Carola Winzen. Non-existence of linear universal drift functions. Theoretical Computer Science, 436:71–86, 2012.

[DK11] Abhimanyu Das and David Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In ICML, pages 1057–1064. Omnipress, 2011.

[DK15] Benjamin Doerr and Marvin Künnemann. Optimizing linear functions with the (1 + λ) evolutionary algorithm—different asymptotic runtimes for different instances. Theoretical Computer Science, 561:3–23, 2015.

[DK19] Benjamin Doerr and Timo Kötzing. Multiplicative up-drift. In Genetic and Evolutionary Computation Conference, GECCO 2019, pages 1470–1478. ACM, 2019.

[DK20a] Benjamin Doerr and Martin S. Krejca. Bivariate estimation-of-distribution algorithms can find an exponential number of optima. In Genetic and Evolutionary Computation Conference, GECCO 2020. ACM, 2020. To appear.

[DK20b] Benjamin Doerr and Martin S. Krejca. Significance-based estimation-of-distribution algorithms. IEEE Transactions on Evolutionary Computation, 2020. To appear.

[DK20c] Benjamin Doerr and Martin S. Krejca. The univariate marginal distribution algorithm copes well with deception and epistasis. In
Evolutionary Computation in Combinatorial Optimization, EvoCOP 2020, pages 51–66. Springer, 2020.

[DL15] Duc-Cuong Dang and Per Kristian Lehre. Simplified runtime analysis of estimation of distribution algorithms. In Genetic and Evolutionary Computation Conference, GECCO 2015, pages 513–518. ACM, 2015.

[DL16] Duc-Cuong Dang and Per Kristian Lehre. Self-adaptation of mutation rates in non-elitist populations. In Parallel Problem Solving from Nature, PPSN 2016, pages 803–813. Springer, 2016.

[DLMN17] Benjamin Doerr, Huu Phuoc Le, Régis Makhmara, and Ta Duy Nguyen. Fast genetic algorithms. In Genetic and Evolutionary Computation Conference, GECCO 2017, pages 777–784. ACM, 2017.

[DLN19] Duc-Cuong Dang, Per Kristian Lehre, and Phan Trung Hai Nguyen. Level-based analysis of the univariate marginal distribution algorithm. Algorithmica, 81:668–702, 2019.

[DLO19] Benjamin Doerr, Andrei Lissovoi, and Pietro Simone Oliveto. Evolving boolean functions with conjunctions and disjunctions via genetic programming. In Genetic and Evolutionary Computation Conference, GECCO 2019, pages 1003–1011. ACM, 2019.

[DLOW18] Benjamin Doerr, Andrei Lissovoi, Pietro S. Oliveto, and John Alasdair Warwicker. On the runtime analysis of selection hyper-heuristics with adaptive learning periods. In Genetic and Evolutionary Computation Conference, GECCO 2018, pages 1015–1022. ACM, 2018.

[DN20a] Viet Anh Do and Frank Neumann. Maximizing submodular or monotone functions under partition matroid constraints by multi-objective evolutionary algorithms. In PPSN, Lecture Notes in Computer Science. Springer, 2020. To appear.

[DN20b] Benjamin Doerr and Frank Neumann, editors. Theory of Evolutionary Computation—Recent Developments in Discrete Optimization. Springer, 2020. Also available at https://cs.adelaide.edu.au/~frank/papers/TheoryBook2019-selfarchived.pdf.

[DNDD+18] Raphaël Dang-Nhu, Thibault Dardinier, Benjamin Doerr, Gautier Izacard, and Dorian Nogneng. A new analysis method for evolutionary optimization of dynamic and noisy objective functions. In Genetic and Evolutionary Computation Conference, GECCO 2018, pages 1467–1474. ACM, 2018.

[DNS17] Benjamin Doerr, Frank Neumann, and Andrew M. Sutton. Time complexity analysis of evolutionary algorithms on random satisfiable k-CNF formulas. Algorithmica, 78:561–586, 2017.

[Doe11] Benjamin Doerr. Drift analysis. In Genetic and Evolutionary Computation Conference, GECCO 2011, Companion Material, pages 1311–1320. ACM, 2011.

[Doe19a] Benjamin Doerr. Analyzing randomized search heuristics via stochastic domination. Theoretical Computer Science, 773:115–137, 2019.

[Doe19b] Benjamin Doerr. An exponential lower bound for the runtime of the compact genetic algorithm on jump functions. In Foundations of Genetic Algorithms, FOGA 2019, pages 25–33. ACM, 2019.

[Doe19c] Benjamin Doerr. A tight runtime analysis for the cGA on jump functions: EDAs can cross fitness valleys at no extra cost. In Genetic and Evolutionary Computation Conference, GECCO 2019, pages 1488–1496. ACM, 2019.

[Doe20a] Benjamin Doerr. Does comma selection help to cope with local optima? In Genetic and Evolutionary Computation Conference, GECCO 2020. ACM, 2020. To appear.

[Doe20b] Benjamin Doerr. Lower bounds for non-elitist evolutionary algorithms via negative multiplicative drift. In Parallel Problem Solving From Nature, PPSN 2020. Springer, 2020. To appear.

[Dos13] Martin Dostál. Evolutionary music composition. In Handbook of Optimization, volume 38 of Intelligent Systems Reference Library, pages 935–964. Springer, 2013.

[DP12] Benjamin Doerr and Sebastian Pohl. Run-time analysis of the (1+1) evolutionary algorithm optimizing linear functions over a finite alphabet. In Genetic and Evolutionary Computation Conference, GECCO 2012, pages 1317–1324. ACM, 2012.

[Dro02] Stefan Droste. Analysis of the (1+1) EA for a dynamically changing OneMax-variant. In Congress on Evolutionary Computation, CEC 2002, pages 55–60. IEEE, 2002.

[Dro03] Stefan Droste. Analysis of the (1+1) EA for a dynamically bitwise changing OneMax. In GECCO, volume 2723 of Lecture Notes in Computer Science, pages 909–921. Springer, 2003.

[Dro04] Stefan Droste. Analysis of the (1+1) EA for a noisy OneMax. In GECCO (1), volume 3102 of Lecture Notes in Computer Science, pages 1088–1099. Springer, 2004.

[Dro06] Stefan Droste. A rigorous analysis of the compact genetic algorithm for linear functions. Natural Computing, 5:257–283, 2006.

[DWY18a] Benjamin Doerr, Carsten Witt, and Jing Yang. Runtime analysis for self-adaptive mutation rates. In Genetic and Evolutionary Computation Conference, GECCO 2018, pages 1475–1482. ACM, 2018.

[DWY18b] Benjamin Doerr, Carsten Witt, and Jing Yang. Runtime analysis for self-adaptive mutation rates.
CoRR, abs/1811.12824, 2018.

[DZ20a] Benjamin Doerr and Weijie Zheng. A parameter-less compact genetic algorithm. In Genetic and Evolutionary Computation Conference, GECCO 2020. ACM, 2020. To appear.

[DZ20b] Benjamin Doerr and Weijie Zheng. Sharp bounds for genetic drift in estimation-of-distribution algorithms. IEEE Transactions on Evolutionary Computation, 2020. To appear.

[FGN+19] Tobias Friedrich, Andreas Göbel, Frank Neumann, Francesco Quinzan, and Ralf Rothenberger. Greedy maximization of functions with bounded curvature under partition matroid constraints. In AAAI, pages 2272–2279. AAAI Press, 2019.

[FGQW18a] Tobias Friedrich, Andreas Göbel, Francesco Quinzan, and Markus Wagner. Evolutionary algorithms and submodular functions: Benefits of heavy-tailed mutations. CoRR, abs/1805.10902, 2018.

[FGQW18b] Tobias Friedrich, Andreas Göbel, Francesco Quinzan, and Markus Wagner. Heavy-tailed mutation operators in single-objective combinatorial optimization. In Parallel Problem Solving from Nature, PPSN 2018, Part I, pages 134–145. Springer, 2018.

[FHH+10] Tobias Friedrich, Jun He, Nils Hebbinghaus, Frank Neumann, and Carsten Witt. Approximating covering problems by randomized search heuristics using multi-objective models. Evolutionary Computation, 18:617–633, 2010.

[FK13] Matthias Feldmann and Timo Kötzing. Optimizing expected path lengths with ant colony optimization using fitness proportional update. In Foundations of Genetic Algorithms, FOGA 2013, pages 65–74. ACM, 2013.

[FKK16] Tobias Friedrich, Timo Kötzing, and Martin S. Krejca. EDAs cannot be balanced and stable. In Genetic and Evolutionary Computation Conference, GECCO 2016, pages 1139–1146. ACM, 2016.

[FKKS17] Tobias Friedrich, Timo Kötzing, Martin S. Krejca, and Andrew M. Sutton. The compact genetic algorithm is efficient under extreme Gaussian noise. IEEE Transactions on Evolutionary Computation, 21:477–490, 2017.

[FKQS17] Tobias Friedrich, Timo Kötzing, Francesco Quinzan, and Andrew M. Sutton. Resampling vs recombination: a statistical run time estimation. In FOGA, pages 25–35. ACM, 2017.

[FN14] Tobias Friedrich and Frank Neumann. Maximizing submodular functions under matroid constraints by multi-objective evolutionary algorithms. In Parallel Problem Solving from Nature, PPSN 2014, volume 8672 of Lecture Notes in Computer Science, pages 922–931. Springer, 2014.

[FQW18] Tobias Friedrich, Francesco Quinzan, and Markus Wagner. Escaping large deceptive basins of attraction with heavy-tailed mutation operators. In Genetic and Evolutionary Computation Conference, GECCO 2018, pages 293–300. ACM, 2018.

[GK16] Christian Gießen and Timo Kötzing. Robustness of populations in stochastic environments. Algorithmica, 75(3):462–489, 2016.

[GKK18] Andreas Göbel, Timo Kötzing, and Martin S. Krejca. Intuitive analyses via drift theory. CoRR, abs/1806.01919, 2018.

[GKS99] Josselin Garnier, Leila Kallel, and Marc Schoenauer. Rigorous hitting times for binary mutations. Evolutionary Computation, 7:173–203, 1999.

[GL10] Oliver Giel and Per Kristian Lehre. On the effect of populations in evolutionary multi-objective optimisation. Evol. Comput., 18(3):335–356, 2010.

[GW17] Christian Gießen and Carsten Witt. The interplay of population size and mutation probability in the (1 + λ) EA on OneMax. Algorithmica, 78:587–609, 2017.

[GW18] Christian Gießen and Carsten Witt. Optimal mutation rates for the (1 + λ) EA on OneMax through asymptotically tight drift analysis. Algorithmica, 80:1710–1731, 2018.

[Haj82] Bruce Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, 13:502–525, 1982.

[HLG99] Georges R. Harik, Fernando G. Lobo, and David E. Goldberg. The compact genetic algorithm. IEEE Transactions on Evolutionary Computation, 3:287–297, 1999.

[HP11] Mark Hauschild and Martin Pelikan. An introduction and survey of estimation of distribution algorithms. Swarm and Evolutionary Computation, 1:111–128, 2011.

[HR97] Markus Hohfeld and Günter Rudolph. Towards a theory of population-based incremental learning. In Conference on Evolutionary Computation, pages 1–5. IEEE Press, 1997.

[HS18] Václav Hasenöhrl and Andrew M. Sutton. On the runtime dynamics of the compact genetic algorithm on jump functions. In Genetic and Evolutionary Computation Conference, GECCO 2018, pages 967–974. ACM, 2018.

[HY01] Jun He and Xin Yao. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127:51–81, 2001.

[Jäg08] Jens Jägersküpper. A blend of Markov-chain and drift analysis. In
Parallel Problem Solving From Nature, PPSN 2008, pages 41–51. Springer, 2008.

[Jan07] Thomas Jansen. On the brittleness of evolutionary algorithms. In Foundations of Genetic Algorithms, FOGA 2007, pages 54–69. Springer, 2007.

[JBS93] Ari Juels, Shumeet Baluja, and Alistair Sinclair. The equilibrium genetic algorithm and the role of crossover. Unpublished, 1993.

[Jen04] Mikkel T. Jensen. Helper-objectives: Using multi-objective evolutionary algorithms for single-objective optimisation. J. Math. Model. Algorithms, 3(4):323–347, 2004.

[JJW05] Thomas Jansen, Kenneth A. De Jong, and Ingo Wegener. On the choice of the offspring population size in evolutionary algorithms. Evolutionary Computation, 13:413–440, 2005.

[Joh10] Daniel Johannsen. Random Combinatorial Structures and Randomized Search Heuristics. PhD thesis, Universität des Saarlandes, 2010.

[JOZ13] Thomas Jansen, Pietro Simone Oliveto, and Christine Zarges. Approximating vertex cover using edge-based representations. In FOGA, pages 87–96. ACM, 2013.

[JS07] Jens Jägersküpper and Tobias Storch. When the plus strategy outperforms the comma strategy and when not. In Foundations of Computational Intelligence, FOCI 2007, pages 25–32. IEEE, 2007.

[JW00] Thomas Jansen and Ingo Wegener. On the choice of the mutation probability for the (1+1) EA. In Parallel Problem Solving from Nature, PPSN 2000, pages 89–98. Springer, 2000.

[JW06] Thomas Jansen and Ingo Wegener. On the analysis of a dynamic evolutionary algorithm. Journal of Discrete Algorithms, 4:181–199, 2006.

[KG14] Andreas Krause and Daniel Golovin. Submodular function maximization. In Tractability, pages 71–104. Cambridge University Press, 2014.

[KHE15] Giorgos Karafotias, Mark Hoogendoorn, and Ágoston E. Eiben. Parameter control in evolutionary algorithms: trends and challenges. IEEE Transactions on Evolutionary Computation, 19:167–187, 2015.

[KLNO10] Stefan Kratsch, Per Kristian Lehre, Frank Neumann, and Pietro Simone Oliveto. Fixed parameter evolutionary algorithms and maximum leaf spanning trees: A matter of mutation. In PPSN (1), volume 6238 of Lecture Notes in Computer Science, pages 204–213. Springer, 2010.

[KLW15a] Timo Kötzing, Andrei Lissovoi, and Carsten Witt. (1+1) EA on generalized dynamic OneMax. In Foundations of Genetic Algorithms, FOGA 2015, pages 40–51. ACM, 2015.

[KLW15b] Timo Kötzing, Andrei Lissovoi, and Carsten Witt. (1+1) EA on generalized dynamic OneMax. In FOGA, pages 40–51. ACM, 2015.

[KM12] Timo Kötzing and Hendrik Molter. ACO beats EA on a dynamic pseudo-boolean function. In PPSN (1), volume 7491 of Lecture Notes in Computer Science, pages 113–122. Springer, 2012.

[KN13] Stefan Kratsch and Frank Neumann. Fixed-parameter evolutionary algorithms and the vertex cover problem. Algorithmica, 65(4):754–771, 2013.

[KU18] Adrian Kosowski and Przemyslaw Uznanski. Population protocols are fast. CoRR, abs/1802.06872, 2018.

[KW20a] Martin Krejca and Carsten Witt. Theory of estimation-of-distribution algorithms. In Benjamin Doerr and Frank Neumann, editors, Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pages 405–442. Springer, 2020. Also available at https://arxiv.org/abs/1806.05392.

[KW20b] Martin S. Krejca and Carsten Witt. Lower bounds on the run time of the Univariate Marginal Distribution Algorithm on OneMax. Theoretical Computer Science, 2020. To appear.

[Leh10] Per Kristian Lehre. Negative drift in populations. In Parallel Problem Solving from Nature, PPSN 2010, pages 244–253. Springer, 2010.

[Leh11] Per Kristian Lehre. Fitness-levels for non-elitist populations. In Genetic and Evolutionary Computation Conference, GECCO 2011, pages 2075–2082. ACM, 2011.

[Len20] Johannes Lengler. Drift analysis. In Benjamin Doerr and Frank Neumann, editors, Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pages 89–131. Springer, 2020. Also available at https://arxiv.org/abs/1712.00964.

[Lew08] Matthew R. Lewis. Evolutionary visual art and design. In The Art of Artificial Evolution, Natural Computing Series, pages 3–37. Springer, 2008.

[LL02] Pedro Larrañaga and José Antonio Lozano, editors.
Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation. Springer, 2002.

[LLM07] Fernando G. Lobo, Cláudio F. Lima, and Zbigniew Michalewicz, editors. Parameter Setting in Evolutionary Algorithms. Springer, 2007.

[LMNS09] Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In STOC, pages 323–332. ACM, 2009.

[LMS19] Johannes Lengler, Anders Martinsson, and Angelika Steger. When does hillclimbing fail on monotone functions: an entropy compression argument. In Analytic Algorithmics and Combinatorics, ANALCO 2019, pages 94–102. SIAM, 2019.

[LN17] Per Kristian Lehre and Phan Trung Hai Nguyen. Improved runtime bounds for the univariate marginal distribution algorithm via anti-concentration. In Genetic and Evolutionary Computation Conference, GECCO 2017, pages 1383–1390. ACM, 2017.

[LN19a] Per Kristian Lehre and Phan Trung Hai Nguyen. On the limitations of the univariate marginal distribution algorithm to deception and where bivariate EDAs might help. In Foundations of Genetic Algorithms, FOGA 2019, pages 154–168. ACM, 2019.

[LN19b] Per Kristian Lehre and Phan Trung Hai Nguyen. Runtime analysis of the univariate marginal distribution algorithm under low selective pressure and prior noise. In Genetic and Evolutionary Computation Conference, GECCO 2019, pages 1497–1505. ACM, 2019.

[Lob07] Fernando G. Lobo. Lost gems of EC: The equilibrium genetic algorithm and the role of crossover. SIGEVOlution, 2(2):14–15, 2007.

[LS11] Jörg Lässig and Dirk Sudholt. Adaptive population models for offspring populations and parallel evolutionary algorithms. In Foundations of Genetic Algorithms, FOGA 2011, pages 181–192. ACM, 2011.

[LS18] Johannes Lengler and Angelika Steger. Drift analysis and evolutionary algorithms revisited. Combinatorics, Probability & Computing, 27:643–666, 2018.

[LSW18] Johannes Lengler, Dirk Sudholt, and Carsten Witt. Medium step sizes are harmful for the compact genetic algorithm. In Genetic and Evolutionary Computation Conference, GECCO 2018, pages 1499–1506. ACM, 2018.

[LW15] Andrei Lissovoi and Carsten Witt. Runtime analysis of ant colony optimization on dynamic shortest path problems. Theor. Comput. Sci., 561:73–85, 2015.

[LW16] Andrei Lissovoi and Carsten Witt. MMAS versus population-based EA on a family of dynamic fitness functions. Algorithmica, 75(3):554–576, 2016.

[LW18] Andrei Lissovoi and Carsten Witt. The impact of a sparse migration topology on the runtime of island models in dynamic optimization. Algorithmica, 80(5):1634–1657, 2018.

[MD10] Christie Myburgh and Kalyanmoy Deb. Evolutionary algorithms in large-scale open pit mine scheduling. In GECCO, pages 1155–1162. ACM, 2010.

[MP96] Heinz Mühlenbein and Gerhard Paass. From recombination of genes to the estimation of distributions I. Binary parameters. In Parallel Problem Solving from Nature, PPSN 1996, pages 178–187. Springer, 1996.

[MRC09] Boris Mitavskiy, Jonathan E. Rowe, and Chris Cannings. Theoretical analysis of local search strategies to optimize network communication subject to preserving the total number of links. International Journal on Intelligent Computing and Cybernetics, 2:243–284, 2009.

[MS15] Andrea Mambrini and Dirk Sudholt. Design and analysis of schemes for adapting migration intervals in parallel evolutionary algorithms. Evolutionary Computation, 23:559–582, 2015.

[Müh92] Heinz Mühlenbein. How genetic algorithms really work: mutation and hillclimbing. In Parallel Problem Solving from Nature, PPSN 1992, pages 15–26. Elsevier, 1992.

[NAN20] Aneta Neumann, Bradley Alexander, and Frank Neumann. Evolutionary image transition and painting using random walks. CoRR, abs/2003.01517, 2020. To appear in the journal Evolutionary Computation (MIT Press).

[NAW20] Mehdi Neshat, Bradley Alexander, and Markus Wagner. A hybrid cooperative co-evolution algorithm framework for optimising power take off and placements of wave energy converters. Inf. Sci., 534:218–244, 2020.

[NN20] Aneta Neumann and Frank Neumann. Optimising chance-constrained submodular functions using evolutionary multi-objective algorithms. In PPSN, Lecture Notes in Computer Science. Springer, 2020. To appear, available at http://arxiv.org/abs/2006.11444.

[NOW09] Frank Neumann, Pietro S. Oliveto, and Carsten Witt. Theoretical analysis of fitness-proportional selection: landscapes and efficiency. In Genetic and Evolutionary Computation Conference, GECCO 2009, pages 835–842. ACM, 2009.

[NPW19] Frank Neumann, Mojgan Pourhassan, and Carsten Witt. Improved runtime results for simple randomised search heuristics on linear functions with a uniform constraint. In
GECCO , pages 1506–1514. ACM, 2019.[NS19] Frank Neumann and Andrew M. Sutton. Runtime analysis of the (1 + 1)evolutionary algorithm for the chance-constrained knapsack problem. In
FOGA ,pages 147–153. ACM, 2019.[NSN13a] Samadhi Nallaperuma, Andrew M. Sutton, and Frank Neumann. Fixed-parameter evolutionary algorithms for the euclidean traveling salesperson prob-lem. In
IEEE Congress on Evolutionary Computation , pages 2037–2044. IEEE,2013.[NSN13b] Samadhi Nallaperuma, Andrew M. Sutton, and Frank Neumann. Parame-terized complexity analysis and more effective construction methods for ACOalgorithms and the euclidean traveling salesperson problem. In
IEEE Congresson Evolutionary Computation , pages 2045–2052. IEEE, 2013.[NW06] Frank Neumann and Ingo Wegener. Minimum spanning trees made easier viamulti-objective optimization.
Nat. Comput. , 5(3):305–319, 2006.[NW07] Frank Neumann and Ingo Wegener. Randomized local search, evolutionaryalgorithms, and the minimum spanning tree problem.
Theoretical ComputerScience , 378:32–40, 2007.[NW10] Frank Neumann and Carsten Witt.
Bioinspired Computation in CombinatorialOptimization – Algorithms and Their Computational Complexity . Springer,2010.[NW15] Frank Neumann and Carsten Witt. On the runtime of randomized local searchand simple evolutionary algorithms for dynamic makespan scheduling. In
IJ-CAI , pages 3742–3748. AAAI Press, 2015.[NWF78] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysisof approximations for maximizing submodular set functions - I.
Math. Program. ,14(1):265–294, 1978. 48Och02] Gabriela Ochoa. Setting the mutation rate: scope and limitations of the 1/Lheuristic. In
Genetic and Evolutionary Computation Conference, GECCO 2002 ,pages 495–502. Morgan Kaufmann, 2002.[OE12] Adrian Ogierman and Robert Els¨asser. The impact of the power law exponenton the behavior of a dynamic epidemic type process. In
Symposium on Par-allelism in Algorithms and Architectures, SPAA 2012 , pages 131–139. ACM,2012.[OW11] Pietro S. Oliveto and Carsten Witt. Simplified drift analysis for proving lowerbounds in evolutionary computation.
Algorithmica , 59:369–386, 2011.[OW12] Pietro S. Oliveto and Carsten Witt. Erratum: Simplified drift analysis for prov-ing lower bounds in evolutionary computation.
CoRR , abs/1211.7184, 2012.[OW15] Pietro S. Oliveto and Carsten Witt. Improved time complexity analysis of thesimple genetic algorithm.
Theoretical Computer Science , 605:21–41, 2015.[OWBM13] Yuki Osada, R. Lyndon While, Luigi Barone, and Zbigniew Michalewicz.Multi-mine planning using a multi-objective evolutionary algorithm. In
IEEECongress on Evolutionary Computation , pages 2902–2909. IEEE, 2013.[PGN15] Mojgan Pourhassan, Wanru Gao, and Frank Neumann. Maintaining 2-approximations for the dynamic vertex cover problem using evolutionary al-gorithms. In
GECCO , pages 903–910. ACM, 2015.[PHL15] Martin Pelikan, Mark Hauschild, and Fernando G. Lobo. Estimation of distri-bution algorithms. In Janusz Kacprzyk and Witold Pedrycz, editors,
SpringerHandbook of Computational Intelligence , pages 899–928. Springer, 2015.[PRN20] Mojgan Pourhassan, Vahid Roostapour, and Frank Neumann. Runtime analysisof RLS and (1+1) EA for the dynamic weighted vertex cover problem.
Theor.Comput. Sci. , 832:20–41, 2020.[Pr¨u04] Adam Pr¨ugel-Bennett. When a genetic algorithm outperforms hill-climbing.
Theoretical Computer Science , 320:135–153, 2004.[PSN19] Mojgan Pourhassan, Feng Shi, and Frank Neumann. Parameterized analysis ofmultiobjective evolutionary algorithms and the weighted vertex cover problem.
Evol. Comput. , 27(4):559–575, 2019.[QBJT19] Chao Qian, Chao Bian, Wu Jiang, and Ke Tang. Running time analysis of the(1 + 1)-EA for OneMax and LeadingOnes under bit-wise noise.
Algorithmica ,81:749–795, 2019. 49QSYT17] Chao Qian, Jing-Cheng Shi, Yang Yu, and Ke Tang. On subset selection withgeneral cost constraints. In Carles Sierra, editor,
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017,Melbourne, Australia, August 19-25, 2017 , pages 2613–2619. ijcai.org, 2017.[QYT +
19] Chao Qian, Yang Yu, Ke Tang, Xin Yao, and Zhi-Hua Zhou. Maximizing sub-modular or monotone approximately submodular functions by multi-objectiveevolutionary algorithms.
Artif. Intell. , 275:279–294, 2019.[QYZ15] Chao Qian, Yang Yu, and Zhi-Hua Zhou. Subset selection by pareto optimiza-tion. In
NIPS , pages 1774–1782, 2015.[Rec73] Ingo Rechenberg.
Evolutionsstrategie . Friedrich Fromman Verlag (G¨untherHolzboog KG), Stuttgart, 1973.[RNN20] Vahid Roostapour, Aneta Neumann, and Frank Neumann. Evolutionarymulti-objective optimization for the dynamic knapsack problem.
CoRR ,abs/2004.12574, 2020. Conference version appeared at PPSN 2018.[RNNF18] Vahid Roostapour, Aneta Neumann, Frank Neumann, and Tobias Friedrich.Pareto optimization for subset selection with dynamic cost constraints.
CoRR ,abs/1811.07806, 2018.[RNNF19] Vahid Roostapour, Aneta Neumann, Frank Neumann, and Tobias Friedrich.Pareto optimization for subset selection with dynamic cost constraints. In
AAAI , pages 2354–2361. AAAI Press, 2019.[Row18] Jonathan E. Rowe. Linear multi-objective drift analysis.
Theoretical ComputerScience , 736:25–40, 2018.[RPN18] Vahid Roostapour, Mojgan Pourhassan, and Frank Neumann. Analysis ofevolutionary algorithms in dynamic and stochastic environments.
CoRR ,abs/1806.08547, 2018.[RS14] Jonathan E. Rowe and Dirk Sudholt. The choice of the offspring population sizein the (1, λ ) evolutionary algorithm. Theoretical Computer Science , 545:20–38,2014.[Rud97] G¨unter Rudolph.
Convergence properties of evolutionary algorithms . Kovac,1997.[RW20] Amirhossein Rajabi and Carsten Witt. Self-adjusting evolutionary algorithmsfor multimodal optimization. In
Genetic and Evolutionary Computation Con-ference, GECCO 2020 . ACM, 2020. To appear.50Sha02] Jonathan L. Shapiro. The sensitivity of PBIL to its learning rate, and howdetailed balance can remove it. In
Foundations of Genetic Algorithms, FOGA2002 , pages 115–132. Morgan Kaufmann, 2002.[Sha05] Jonathan L. Shapiro. Drift and scaling in estimation of distribution algorithms.
Evolutionary Computing , 13:99–123, 2005.[Sha06] Jonathan L. Shapiro. Diversity loss in general estimation of distribution algo-rithms. In
Parallel Problem Solving from Nature, PPSN 2006 , pages 92–101.Springer, 2006.[SN12] Andrew M. Sutton and Frank Neumann. A parameterized runtime analysisof evolutionary algorithms for the Euclidean traveling salesperson prob-lem. In
Proceedings of the Twenty-Sixth Conference on Artificial Intelligence(AAAI’12) , pages 1105–1111. AAAI Press, 2012.[SNN14] Andrew M. Sutton, Frank Neumann, and Samadhi Nallaperuma. Parameterizedruntime analyses of evolutionary algorithms for the planar euclidean travelingsalesperson problem.
Evol. Comput. , 22(4):595–628, 2014.[SSF +
19] Feng Shi, Martin Schirneck, Tobias Friedrich, Timo K¨otzing, and Frank Neu-mann. Reoptimization time analysis of evolutionary algorithms on linear func-tions under dynamic uniform constraints.
Algorithmica , 81(2):828–857, 2019.[ST12] Dirk Sudholt and Christian Thyssen. A simple ant colony optimizer for stochas-tic shortest path problems.
Algorithmica , 64:643–672, 2012.[Sto06] Tobias Storch. How randomized search heuristics find maximum cliques inplanar graphs. In
GECCO , pages 567–574. ACM, 2006.[Sto07] Tobias Storch. Finding large cliques in sparse semi-random graphs by simplerandomized search heuristics.
Theor. Comput. Sci. , 386(1-2):114–131, 2007.[STW04] Jens Scharnow, Karsten Tinnefeld, and Ingo Wegener. The analysis of evolu-tionary algorithms on sorting and shortest paths problems.
Journal of Mathe-matical Modelling and Algorithms , 3:349–366, 2004.[Sud13] Dirk Sudholt. A new method for lower bounds on the running time of evolution-ary algorithms.
IEEE Transactions on Evolutionary Computation , 17:418–435,2013.[Sud18] Dirk Sudholt. On the robustness of evolutionary algorithms to noise: refinedresults and an example where noise helps. In
Genetic and Evolutionary Com-putation Conference, GECCO 2018 , pages 1523–1530. ACM, 2018.51SW19] Dirk Sudholt and Carsten Witt. On the choice of the update strength inestimation-of-distribution algorithms and ant colony optimization.
Algorith-mica , 81:1450–1489, 2019.[The09] Madeleine Theile. Exact solutions to the traveling salesperson problem bya population-based evolutionary algorithm. In
Evolutionary Computation inCombinatorial Optimization, EvoCOP 2009 , pages 145–155. Springer, 2009.[TWD +
13] Raymond Tran, Junhua Wu, Christopher Denison, Thomas Ackling, MarkusWagner, and Frank Neumann. Fast and effective multi-objective optimisationof wind turbine placement. In
GECCO , pages 1381–1388. ACM, 2013.[Weg01] Ingo Wegener. Theoretical aspects of evolutionary algorithms. In
Automata,Languages and Programming, ICALP 2001 , pages 64–78. Springer, 2001.[Wit06] Carsten Witt. Runtime analysis of the ( µ + 1) EA on simple pseudo-Booleanfunctions. Evolutionary Computation , 14:65–86, 2006.[Wit13] Carsten Witt. Tight bounds on the optimization time of a randomized searchheuristic on linear functions.
Combinatorics, Probability & Computing , 22:294–318, 2013.[Wit19] Carsten Witt. Upper bounds on the running time of the univariate marginaldistribution algorithm on OneMax.
Algorithmica , 81:632–667, 2019.[WQT18] Mengxi Wu, Chao Qian, and Ke Tang. Dynamic mutation based Pareto op-timization for subset selection. In
Intelligent Computing Methodologies, ICIC2018, Part III , pages 25–35. Springer, 2018.[XHA +
19] Yue Xie, Oscar Harper, Hirad Assimi, Aneta Neumann, and Frank Neumann.Evolutionary algorithms for the chance-constrained knapsack problem. In
GECCO , pages 338–346. ACM, 2019.[XNN20] Yue Xie, Aneta Neumann, and Frank Neumann. Specific single- and multi-objective evolutionary algorithms for the chance-constrained knapsack problem.
CoRR , abs/2004.03205, 2020. Conference version to appear at GECCO 2020.[ZYD18] Weijie Zheng, Guangwen Yang, and Benjamin Doerr. Working principles ofbinary differential evolution. In
Genetic and Evolutionary Computation Con-ference, GECCO 2018 , pages 1103–1110. ACM, 2018.[ZYQ19] Zhi-Hua Zhou, Yang Yu, and Chao Qian.