The Univariate Marginal Distribution Algorithm Copes Well With Deception and Epistasis
Benjamin Doerr [email protected]
Laboratoire d'Informatique (LIX), CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
Martin S. Krejca [email protected]
Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
Abstract
In their recent work, Lehre and Nguyen (FOGA 2019) show that the univariate marginal distribution algorithm (UMDA) needs time exponential in the parent population size to optimize the DeceptiveLeadingBlocks (DLB) problem. They conclude from this result that univariate EDAs have difficulties with deception and epistasis.

In this work, we show that this negative finding is caused by an unfortunate choice of the parameters of the UMDA. When the population sizes are chosen large enough to prevent genetic drift, then the UMDA optimizes the DLB problem with high probability with at most λ(n/2 + 2e ln n) fitness evaluations. Since an offspring population size λ of order n log n can prevent genetic drift, the UMDA can solve the DLB problem with O(n² log n) fitness evaluations. In contrast, for classic evolutionary algorithms no better run time guarantee than O(n³) is known (which we prove to be tight for the (1 + 1) EA), so our result rather suggests that the UMDA can cope well with deception and epistasis.

From a broader perspective, our result shows that the UMDA can cope better with local optima than evolutionary algorithms; such a result was previously known only for the compact genetic algorithm. Together with the result of Lehre and Nguyen, our result for the first time rigorously proves that running EDAs in the regime with genetic drift can lead to drastic performance losses.

Keywords
Estimation-of-distribution algorithm, univariate marginal distribution algorithm, runtime analysis, epistasis, theory.
Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that evolve a probabilistic model of the search space in an iterative manner. Starting with the uniform distribution, an EDA takes samples from its current model and then adjusts it such that better solutions have a higher probability of being generated in the next iteration. This method of refinement leads to gradually better solutions and performs well on many practical problems, often outperforming competing approaches (Pelikan et al., 2015).

Theoretical analyses of EDAs also often suggest an advantage of EDAs when compared to evolutionary algorithms (EAs); for an in-depth survey of run time results for EDAs, please refer to the article by Krejca and Witt (2020). With respect to simple unimodal functions, EDAs seem to be comparable to EAs. For example, Sudholt and Witt (2019) proved that the two EDAs cGA and 2-MMAS_ib have an expected run time of Θ(n log n) on the standard theory benchmark function OneMax (assuming optimal parameter settings; n being the problem size), which is a run time that many EAs share. The same is true for the EDA UMDA, as shown by the results of Krejca and Witt (2017), Lehre and Nguyen (2017), and Witt (2019). For the benchmark function LeadingOnes, Dang and Lehre (2015) proved an expected run time of O(n²) for the UMDA when setting the parameters right, which is, again, a common run time bound for EAs on this function. One result suggesting that EDAs can outperform EAs on unimodal functions was given by Doerr and Krejca (2018). They proposed an EDA called sig-cGA, which has an expected run time of O(n log n) on both OneMax and
LeadingOnes, a performance not known for any classic EA or EDA.

For the class of all linear functions, the EDAs perform slightly worse than EAs. The classical (1 + 1) evolutionary algorithm optimizes all linear functions in time O(n log n) in expectation (Droste et al., 2002). In contrast, the conjecture of Droste (2006) that the cGA does not perform equally well on all linear functions was recently proven by Witt (2018), who showed that the cGA has an Ω(n²) expected run time on the binary value function. We note that the binary value function was found harder also for classical EAs. While the (1 + λ) evolutionary algorithm optimizes OneMax with Θ(nλ log log λ / log λ) fitness evaluations, it takes Θ(nλ) fitness evaluations for the binary value function (Doerr and Künnemann, 2015).

For the bimodal Jump_k benchmark function, which has a local optimum with a Hamming distance of k away from the global optimum, EDAs seem to drastically outperform EAs. Hasenöhrl and Sutton (2018) recently proved that the cGA only has a run time of exp(O(k + log n)). Doerr (2019b) proved that the cGA even has an expected run time of O(n log n) on Jump_k if k < (1/20) ln n, meaning that the cGA is unfazed by the gap of size k separating the local from the global optimum. In contrast, common EAs have run times such as Θ(n^k) ((1 + 1) EA, Droste et al. (2002)), Ω(n^k) ((µ, λ) EA, Doerr (2020a)), and Θ(n^{k−1}) ((µ + 1) GA, Dang et al. (2018)), and only go down to smaller run times such as O(n log n + kn + 4^k) by using crossover in combination with diversity mechanisms like island models (Dang et al., 2016).

Another result in favor of EDAs was given by Chen et al. (2009), who introduced the Substring function and proved that the UMDA optimizes it in polynomial time, whereas the (1 + 1) evolutionary algorithm has an exponential run time, both with high probability. In the Substring function, only substrings of length αn, for α ∈ (0, 1), of the individual contribute to the fitness.

Most relevant to our work, Lehre and Nguyen (2019) consider the DeceptiveLeadingBlocks function (DLB for short), which they introduce and which consists of blocks of size 2, each with a deceptive function value, that need to be solved sequentially. The authors prove that many common EAs optimize DLB within O(n³) fitness evaluations in expectation, whereas the UMDA has a run time of e^{Ω(µ)} (where µ is an algorithm-specific parameter that often is chosen as a small power of n) for a large regime of parameters. Only for extreme parameter values, the authors prove an expected run time of O(n³) also for the UMDA.

Evolutionary Computation Volume x, Number x
In this paper, we prove that the UMDA is, in fact, able to optimize
DLB in time O(n² log n) with high probability if its parameters are chosen more carefully (Theorem 3). Note that our result is better than any of the run times proven in the paper by Lehre and Nguyen (2019). We achieve this run time by choosing the parameters of the UMDA such that its model is unlikely to degenerate during the run time (Lemma 2). Here, by degenerate we mean that the sampling frequencies approach the boundary values 0 and 1 without that this is justified by the objective function. This leads to a probabilistic model that is strongly concentrated around a single search point. This effect is often called genetic drift (Sudholt and Witt, 2019). While it appears natural to choose the parameters of an EDA so as to prevent genetic drift, it has also been proven that genetic drift can lead to a complicated run time landscape and inferior performance (see Lengler et al. (2018) for the cGA).

In contrast to our setting, for their exponential lower bound, Lehre and Nguyen (2019) use parameters that lead to genetic drift. Once the probabilistic model is sufficiently degenerated, the progress of the UMDA is so slow that, even to leave the local optima of DLB (which have a better search point in Hamming distance two only), the EDA takes time exponential in µ.

Since the UMDA shows a good performance in the (more natural) regime without genetic drift and was shown inferior only in the regime with genetic drift, we disagree with the statement of Lehre and Nguyen (2019) that there are "inherent limitations of univariate EDAs against deception and epistasis".

In addition to the improved run time, we derive our result using only tools commonly used in the analysis of EDAs and EAs, whereas the proof of the polynomial run time of O(n³) for the UMDA with uncommon parameter settings (Lehre and Nguyen, 2019) uses the level-based population method (Lehre, 2011; Dang and Lehre, 2016; Corus et al., 2018; Doerr and Kötzing, 2019), which is an advanced tool that can be hard to use.
We are thus optimistic that our analysis method can be useful also in other run time analyses of EDAs.

We recall that the previous work (Lehre and Nguyen, 2019) only proved upper bounds for the run time of EAs on DLB, namely of order O(n³) unless overly large population sizes are used. To support our claim that the UMDA shows a better performance on DLB than EAs, we rigorously prove a lower bound of Ω(n³) for the run time of the (1 + 1) EA on DLB (Theorem 4). More precisely, we determine a precise expression for the expected run time of this algorithm on DLB, which is asymptotically equal to (1 ± o(1)) (e − 1)/4 · n³.

Last, we complement our theoretical result with an empirical comparison of the UMDA to various other evolutionary algorithms. The outcome of these experiments suggests that the UMDA outperforms the competing approaches while also having a smaller variance (Figure 1). Further, we compare the UMDA to the EDA MIMIC (Bonet et al., 1996), which is similar to the UMDA but uses a more sophisticated probabilistic model that is capable of capturing dependencies among bit positions. Our comparison shows that, for the same parameter regime of both algorithms, the UMDA and the MIMIC behave asymptotically equally, with the MIMIC being slightly faster. This, again, highlights that the UMDA is well suited to optimize DLB.

The remainder of this paper is structured as follows: in Section 2, we introduce our notation, formally define
DLB and the UMDA, and we state the tools we use in our analysis. Section 3 contains our main result (Theorem 3) and discusses its proof informally before stating the different lemmas used to prove it. In Section 4, we conduct a tight run time analysis of the (1 + 1) EA on
DLB . In Section 5, we discuss our empirical
results. Last, we conclude this paper in Section 6.

This paper extends our conference version (Doerr and Krejca, 2020b) in two major ways: (i) we prove a lower bound of Ω(n³) for the run time of the (1 + 1) EA on DLB (Theorem 4); (ii) our empirical analysis contains more EAs and the MIMIC (Figure 1), and also considers the impact of the UMDA's population size (Figure 2).
We are concerned with the run time analysis of algorithms optimizing pseudo-Boolean functions, that is, functions f: {0, 1}^n → R, where n ∈ N denotes the dimension of the problem. Given a pseudo-Boolean function f and a bit string x, we refer to f as a fitness function, to x as an individual, and to f(x) as the fitness of x.

For n₁, n₂ ∈ N := {0, 1, 2, ...}, we define [n₁..n₂] = [n₁, n₂] ∩ N, and, for an n ∈ N, we define [n] = [1..n]. From now on, if not stated otherwise, the variable n always denotes the problem size. For a vector x of length n, we denote its component at index i ∈ [n] by x_i and, for an index set I ⊆ [n], we denote the subvector of length |I| consisting only of the components at indices in I by x_I. Further, let |x|₁ denote the number of 1s of x and |x|₀ its number of 0s.

DeceptiveLeadingBlocks.
The pseudo-Boolean function DeceptiveLeadingBlocks (abbreviated as DLB) was introduced by Lehre and Nguyen (2019) as a deceptive version of the well-known benchmark function LeadingOnes. In DLB, an individual x of length n is divided into blocks of equal size 2. Each block consists of a trap, where the fitness of each block is determined by its number of 0s (minus 1), except that a block of all 1s has the best fitness of 2. The overall fitness of x is then determined by the longest prefix of blocks with fitness 2 plus the fitness of the following block. Note that in order for the chunking of DLB to make sense, it needs to hold that 2 divides n. In the following, we always assume this implicitly.

We now provide a formal definition of DLB. To this end, we first introduce the function DeceptiveBlock: {0, 1}² → [0..2] (abbreviated as DB), which determines the fitness of a block (of size 2). For all x ∈ {0, 1}², we have

\[ \mathrm{DB}(x) = \begin{cases} 2 & \text{if } |x|_1 = 2,\\ |x|_0 - 1 & \text{else.} \end{cases} \]

Further, we introduce the function Prefix: {0, 1}^n → [0..n/2], which determines the longest prefix of x consisting of blocks of fitness 2. For a logic formula P, let [P] be 1 if P is true and 0 otherwise. We define, for all x ∈ {0, 1}^n,

\[ \mathrm{Prefix}(x) = \sum_{i=1}^{n/2} \big[\forall j \le i\colon \mathrm{DB}(x_{\{2j-1,\,2j\}}) = 2\big]. \]

DLB is now defined as follows for all x ∈ {0, 1}^n:

\[ \mathrm{DLB}(x) = \begin{cases} n & \text{if } \mathrm{Prefix}(x) = n/2,\\ \sum_{i=1}^{\mathrm{Prefix}(x)+1} \mathrm{DB}(x_{\{2i-1,\,2i\}}) & \text{else.} \end{cases} \]

The univariate marginal distribution algorithm.
Our algorithm of interest is the UMDA ((Mühlenbein and Paaß, 1996); Algorithm 1) with parameters µ, λ ∈ N⁺, µ ≤ λ. It maintains a vector p (frequency vector) of probabilities (frequencies) of length n as its probabilistic model. This vector is used to sample an individual x ∈ {0, 1}^n, which we denote as x ∼ sample(p), such that, for all y ∈ {0, 1}^n,

\[ \Pr[x = y] = \prod_{i \in [n]\colon y_i = 1} p_i \prod_{i \in [n]\colon y_i = 0} (1 - p_i). \]

The UMDA updates this vector iteratively in the following way: first, λ individuals are sampled. Then, among these λ individuals, a subset of µ with the highest fitness is chosen (breaking ties uniformly at random), and, for each index i ∈ [n], the frequency p_i is set to the relative number of 1s at position i among the µ best individuals. Last, if a frequency p_i is below 1/n, it is increased to 1/n, and, analogously, frequencies above 1 − 1/n are set to 1 − 1/n. Capping the frequencies into the interval [1/n, 1 − 1/n] prevents them from getting stuck at the extremal values 0 or 1. Last, we denote the frequency vector of iteration t ∈ N by p^{(t)}.

Algorithm 1:
The UMDA (Mühlenbein and Paaß, 1996) with parameters µ and λ, µ ≤ λ, maximizing a fitness function f: {0, 1}^n → R with n ≥ 2

t ← 0;
p^{(t)} ← (1/2)_{i ∈ [n]};
repeat  ▷ iteration t
    for i ∈ [λ] do x^{(i)} ∼ sample(p^{(t)});
    let y^{(1)}, ..., y^{(µ)} denote the µ best individuals out of x^{(1)}, ..., x^{(λ)} (breaking ties uniformly at random);
    for i ∈ [n] do p^{(t+1)}_i ← (1/µ) Σ_{j=1}^{µ} y^{(j)}_i;
    restrict p^{(t+1)} to the interval [1/n, 1 − 1/n];
    t ← t + 1;
until termination criterion met;

Run time analysis.
When analyzing the run time of the UMDA optimizing a fitness function f, we are interested in the number T of fitness function evaluations until an optimum of f is sampled for the first time. Since the UMDA is a randomized algorithm, this run time T is a random variable. Note that the run time of the UMDA is at most λ times the number of iterations until an optimum is sampled for the first time, and it is at least λ times this number minus λ − 1.

In the area of run time analysis of randomized search heuristics, it is common to give bounds for the expected value of the run time of the algorithm under investigation. This is uncritical when the run time is concentrated around its expectation, as often observed for classical evolutionary algorithms. For EDAs, it has been argued, among others in (Doerr, 2019b), that it is preferable to give bounds that hold with high probability. This is what we shall aim at in this work as well. Of course, it would be even better to give estimates in a distributional sense, as argued for in (Doerr, 2019a), but this appears to be difficult for EDAs, among others, because of the very different behavior in the regimes with and without strong genetic drift.
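To make the preceding definitions concrete, the following sketch (our own illustration with our own names, not part of the paper) implements DLB and the UMDA of Algorithm 1; counting fitness evaluations as above simply means counting calls to the fitness function.

```python
import random

def db(block):
    """DeceptiveBlock: 2 for a 11-block, otherwise (number of 0s) - 1."""
    if block == (1, 1):
        return 2
    return (2 - sum(block)) - 1  # |x|_0 - 1 for a block of length 2

def dlb(x):
    """DeceptiveLeadingBlocks (n assumed even): n if every block is 11,
    else twice the number of leading 11-blocks plus DB of the next block."""
    n = len(x)
    prefix = 0
    for i in range(0, n, 2):
        if (x[i], x[i + 1]) != (1, 1):
            return 2 * prefix + db((x[i], x[i + 1]))
        prefix += 1
    return n

def umda(f, n, mu, lam, iterations):
    """UMDA sketch: sample lam individuals, set each frequency to the
    fraction of 1s among the mu best, cap frequencies into [1/n, 1 - 1/n]."""
    p = [0.5] * n
    best = None
    for _ in range(iterations):
        pop = [tuple(int(random.random() < p[i]) for i in range(n))
               for _ in range(lam)]
        pop.sort(key=f, reverse=True)  # ties broken arbitrarily here
        if best is None or f(pop[0]) > f(best):
            best = pop[0]
        for i in range(n):
            freq = sum(y[i] for y in pop[:mu]) / mu
            p[i] = min(max(freq, 1 / n), 1 - 1 / n)
    return best
```

For example, umda(dlb, 20, 100, 500, 30) typically returns the all-ones string when µ is large relative to n ln n, in line with Theorem 3; with a very small µ, genetic drift sets in, as discussed below.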
Probabilistic tools.
We use the following results in our analysis. In order to prove statements on random variables that hold with high probability, we use the following commonly known Chernoff bound.
Theorem 1 (Chernoff bound (Doerr, 2020b)). Let k ∈ N, δ ∈ [0, 1], and let X be the sum of k independent random variables, each taking values in [0, 1]. Then

\[ \Pr\big[X \le (1-\delta)\mathrm{E}[X]\big] \le \exp\left(-\frac{\delta^2 \mathrm{E}[X]}{2}\right). \]

The next lemma tells us that, for a random variable X following a binomial law, the probability of exceeding E[X] is bounded from above by roughly the term with the highest probability.

Lemma 1 ((Doerr, 2020b, Eq. (10.62))). Let k ∈ N, p ∈ [0, 1], X ∼ Bin(k, p), and let m ∈ [E[X] + 1 .. k]. Then

\[ \Pr[X \ge m] \le \frac{m(1-p)}{m - \mathrm{E}[X]} \cdot \Pr[X = m]. \]

We use Lemma 1 for the converse case, that is, in order to bound the probability that a binomially distributed random variable is smaller than its expected value.
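Before stating the corollary, a quick numerical sanity check of Lemma 1 (our own script, not part of the paper) compares the bound with the exact binomial upper tail:

```python
from math import comb

def pmf(k, p, m):
    """Probability that a Bin(k, p) random variable equals m."""
    return comb(k, m) * p**m * (1 - p)**(k - m)

# Lemma 1: for X ~ Bin(k, p) and m in [E[X] + 1 .. k],
# Pr[X >= m] <= m * (1 - p) / (m - E[X]) * Pr[X = m].
k, p = 100, 0.3
ex = k * p
for m in range(int(ex) + 1, k + 1):
    tail = sum(pmf(k, p, j) for j in range(m, k + 1))
    bound = m * (1 - p) / (m - ex) * pmf(k, p, m)
    assert tail <= bound + 1e-12
```

The bound is tightest for m close to k; for m = k it holds with equality.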
Corollary 1.
Let k ∈ N, p ∈ [0, 1], X ∼ Bin(k, p), and let m ∈ [0 .. E[X] − 1]. Then

\[ \Pr[X \le m] \le \frac{(k-m)p}{\mathrm{E}[X] - m} \cdot \Pr[X = m]. \]
Let X̄ := k − X, and let m̄ := k − m. Note that X̄ ∼ Bin(k, 1 − p) with E[X̄] = k − E[X] and that m̄ ∈ [E[X̄] + 1 .. k]. With Lemma 1, we compute

\[ \Pr[X \le m] = \Pr[\bar X \ge \bar m] \le \frac{\bar m\, p}{\bar m - \mathrm{E}[\bar X]} \cdot \Pr[\bar X = \bar m] = \frac{(k-m)p}{\mathrm{E}[X] - m} \cdot \Pr[X = m], \]

which proves the claim.

Last, the following theorem deals with a neutral bit in a fitness function f, that is, a position i ∈ [n] such that bit values at i do not contribute to the fitness value at all. The following theorem from (Doerr and Zheng, 2020) states that if the UMDA optimizes such an f, then the frequency at position i stays close to its initial value for Ω(µ) iterations. We go more into detail about how this relates to DLB at the beginning of Section 3.
Theorem 2 ((Doerr and Zheng, 2020, Theorem 2)). Consider the UMDA optimizing a fitness function f with a neutral bit i ∈ [n]. Then, for all d > 0 and all t ∈ N, we have

\[ \Pr\Big[\forall t' \in [0..t]\colon \big|p^{(t')}_i - \tfrac{1}{2}\big| < d\Big] \ge 1 - 2\exp\left(-\frac{d^2 \mu}{2t}\right). \]
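Theorem 2 can be illustrated empirically: for a neutral bit, one UMDA update replaces p_i by a Bin(µ, p_i)-distributed fraction, so the frequency performs an unbiased random walk whose step size shrinks with µ. The toy simulation below (our own sketch, not from the paper) estimates how often such a frequency stays within d = 1/4 of 1/2:

```python
import random

def stay_fraction(mu, steps, trials, d=0.25):
    """Fraction of runs in which a neutral frequency, updated as
    p <- Bin(mu, p) / mu, stays within d of 1/2 for all steps."""
    stayed = 0
    for _ in range(trials):
        p, ok = 0.5, True
        for _ in range(steps):
            p = sum(random.random() < p for _ in range(mu)) / mu
            if abs(p - 0.5) >= d:
                ok = False
                break
        stayed += ok
    return stayed / trials
```

With µ large relative to the number of steps (as in Theorem 2 with t = o(µ)), the frequency essentially never escapes; with a small µ, genetic drift takes over quickly.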
DeceptiveLeadingBlocks efficiently, which is the following theorem.6
Theorem 3.
Let c_µ, c_λ ∈ (0, ∞) be constants chosen sufficiently large or small, respectively. Consider the UMDA optimizing DeceptiveLeadingBlocks with µ ≥ c_µ n ln n and µ/λ ≤ c_λ. Then the UMDA samples the optimum after λ(n/2 + 2e ln n) fitness function evaluations with a probability of at least 1 − 4/n.

Before we present the proof, we sketch its main ideas and introduce important notation. We show that the frequencies of the UMDA are set to 1 − 1/n block-wise from left to right with high probability. We formalize this concept by defining that a block i ∈ [n/2] is critical (in iteration t) if and only if p^{(t)}_{2i−1} + p^{(t)}_{2i} < 2 − 2/n and, for each index j ∈ [2i − 2], the frequency p^{(t)}_j is at 1 − 1/n. Intuitively, a critical block is the first block whose frequencies are not at their maximum value. We prove that a critical block is optimized within a single iteration with high probability if we assume that its frequencies are not below (1 − ε)/2, for ε ∈ (0, 1) being a constant.

In order to assure that the frequencies of each block are at least (1 − ε)/2 until the block becomes relevant for the selection process, we show that, up to this point, these frequencies are neutral. More formally, a frequency p_i is neutral in iteration t if and only if the probability to have a 1 at position i in each of the µ selected individuals equals p^{(t)}_i. Note that since we assume that µ = Ω(n log n), the impact of the genetic drift on neutral frequencies is low with high probability (Theorem 2).

We know which frequencies are neutral and which are not by the following key observation: consider a population of λ individuals of the UMDA during iteration t; only the first (leftmost) block that has strictly fewer than µ 11s is relevant for selection, since the fitness of individuals that do not have a 11 in this block cannot be changed by bits to the right anymore. We call this block selection-relevant. Note that this is a notion that depends on the random offspring population in iteration t, whereas the notion critical depends only on p^{(t)}.

The consequences of a selection-relevant block are as follows: if block i ∈ [n/2] is selection-relevant, then all frequencies in blocks left of i are set to 1 − 1/n, since there are at least µ individuals with 11s in all of these blocks. All blocks right of i have no impact on the selection process: if an individual has no 11 in block i, its fitness is already fully determined by all of its bits up to block i by the definition of DLB. If an individual has a 11 in block i, it is definitely chosen during selection, since there are fewer than µ such individuals and since its fitness is better than that of all of the other individuals that do not have a 11 in block i. Thus, its bits at positions in blocks right of i are irrelevant for selection. Overall, since the bits in blocks right of i do not matter, the frequencies right of block i get no signal from the fitness function and are thus neutral (Lemma 2).

Regarding block i itself, all of the individuals with 11s are chosen, since they have the best fitness. Nonetheless, individuals with a 00, 01, or 10 can also be chosen, where an individual with a 00 in block i is preferred, as a 00 has the second-best fitness after a 11. Since the fitness for a 10 or 01 is the same, this does not impact the number of 1s at positions 2i − 1 and 2i in expectation. However, if more 00s than 11s are sampled for block i, it can happen that the frequencies of block i are decreased. Since we assume that µ = Ω(n log n), the frequency is sufficiently high before the update, and the frequencies of block i do not decrease by much with high probability (Lemma 3). Since, in the next iteration, block i is the critical block, it is then optimized within a single iteration (Lemma 4), and we do not need to worry about its frequencies decreasing again.

Neutral frequencies.
We now prove that the frequencies right of the selection-relevant block do not decrease by too much within the first n iterations.

Lemma 2. Let ε ∈ (0, 1) be a constant. Consider the UMDA with λ ≥ µ ≥ (16n/ε²) ln n optimizing DeceptiveLeadingBlocks. Let t₀ ≤ n be the first iteration such that block i ∈ [n/2] becomes selection-relevant. Then, with a probability of at least 1 − 2/n, all frequencies at the positions [2i + 1..n] are at least (1 − ε)/2 within the first t₀ iterations.

Proof. Let j ∈ [2i + 1..n] denote the index of a frequency right of block i. Note that, by the assumption that t₀ is the first iteration such that block i becomes selection-relevant, it follows that, for all t ≤ t₀, the frequency p^{(t)}_j is neutral, as we discussed above.

Since p^{(t)}_j is neutral for all t ≤ t₀, by Theorem 2 with d = ε/2, we see that the probability that p_j leaves the interval ((1 − ε)/2, (1 + ε)/2) within the first t₀ ≤ n iterations is at most 2exp(−ε²µ/(8t₀)) ≤ 2exp(−ε²µ/(8n)) ≤ 2/n², where we used our bound on µ.

Applying a union bound over all n − 2i ≤ n neutral frequencies yields that at least one frequency leaves the interval ((1 − ε)/2, (1 + ε)/2) within the first t₀ iterations with a probability of at most 2/n, as desired.

Update of the selection-relevant block.
As mentioned at the beginning of the section, while the frequencies right of the selection-relevant block do not drop below (1 − ε)/2 with high probability, the frequencies of the selection-relevant block itself can drop below (1 − ε)/2, as the following example shows.
Example 1.
Consider the UMDA with µ = 0.05λ and λ ≥ c ln n, for a sufficiently large constant c, optimizing DLB. Consider an iteration t and assume that block i = n/2 − o(n) is critical. Assume that the frequencies in blocks i and i + 1 are all at 2/5. Then the offspring population in iteration t has roughly ((2/5)²/e)λ ≈ 0.059λ > µ individuals with at least 2i leading 1s in expectation. By Theorem 1, this also holds with high probability. Thus, the frequencies in block i are set to 1 − 1/n with high probability.

The expected number of individuals with at least 2i + 2 leading 1s is roughly ((2/5)⁴/e)λ ≈ 0.009λ, and the expected number of individuals with 2i leading 1s followed by a 00 is roughly ((2/5)²(3/5)²/e)λ ≈ 0.021λ. In total, we expect approximately 0.031λ < µ individuals with 2i leading 1s followed by either a 11 or a 00. Again, by Theorem 1, these numbers occur with high probability. Note that this implies that block i + 1 is selection-relevant with high probability.

Consider block i + 1. For selection, we choose all of the roughly 0.031λ individuals with 2i leading 1s followed by either a 11 or a 00 (which are sampled with high probability). For the remaining µ − 0.031λ ≈ 0.019λ selected individuals with 2i leading 1s, we expect half of them, that is, roughly 0.010λ individuals, to have a 1 at position 2i + 1. Thus, with high probability, the frequency p^{(t+1)}_{2i+1} is set to roughly (0.009 + 0.010)λ/µ ≈ 0.38, which is less than 2/5 = p^{(t)}_{2i+1}. Thus, this frequency decreased.

The next lemma shows that such frequencies do not drop too low, however.
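Before turning to the lemma, note that the arithmetic in Example 1 can be checked mechanically. The script below (ours, not part of the paper) recomputes the expected population fractions under the example's assumptions as we read them: frequencies of 2/5 in blocks i and i + 1, a prefix-sampling probability of roughly 1/e, and µ = 0.05λ:

```python
from math import e

p = 2 / 5          # assumed frequency at the four positions of blocks i, i+1
prefix = 1 / e     # approximate probability of sampling 1s at all prefix positions
mu = 0.05          # mu as a fraction of lambda

at_least_2i = p**2 * prefix              # 11 in block i: ~0.059 > mu
both_11 = p**4 * prefix                  # 11 in blocks i and i+1: ~0.009
then_00 = p**2 * (1 - p)**2 * prefix     # 11 in block i, 00 in block i+1: ~0.021
chosen = both_11 + then_00               # all of these are selected: ~0.031 < mu

rest = mu - chosen                       # remaining selected: 01 or 10 in block i+1
new_freq = (both_11 + rest / 2) / mu     # expected p_{2i+1} after the update

print(round(at_least_2i, 3), round(chosen, 3), round(new_freq, 2))
```

The printed values are 0.059, 0.031, and 0.38, so the frequency indeed drops below its previous value of 0.4.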
Lemma 3.
Let ε, δ ∈ (0, 1) be constants, and let c be a sufficiently large constant. Consider the UMDA with λ ≥ µ ≥ c ln n optimizing DeceptiveLeadingBlocks. Further, consider an iteration t such that block i ∈ [2..n/2] is selection-relevant, and assume that its frequencies p^{(t)}_{2i−1} and p^{(t)}_{2i} are at least (1 − ε)/2 when sampling the population. Then the frequencies p^{(t+1)}_{2i−1} and p^{(t+1)}_{2i} are at least (1 − δ)(1 − ε)²/4 with a probability of at least 1 − 4/n³.

Proof. Let k denote the number of individuals with at least 2i − 2 leading 1s. Since block i is selection-relevant, it follows that k ≥ µ. We consider a random variable X that follows a binomial law with k trials and with a success probability of p^{(t)}_{2i−1} p^{(t)}_{2i} =: p̃ ≥ (1 − ε)²/4. We now bound the probability that at least (1 − δ)p̃µ =: m of these individuals have 2i leading 1s, that is, we bound Pr[X ≥ m | X < µ], where the condition follows from the definition of block i being selection-relevant.

Elementary calculations show that

\[ \Pr[X \ge m \mid X < \mu] = 1 - \Pr[X < m \mid X < \mu] = 1 - \frac{\Pr[X < m,\, X < \mu]}{\Pr[X < \mu]} = 1 - \frac{\Pr[X < m]}{\Pr[X < \mu]}. \tag{1} \]

To show a lower bound for (1), we consider separately the two cases that E[X] < µ and E[X] ≥ µ.

Case 1: E[X] < µ. We first bound the numerator of the subtrahend in (1). Since m/(1 − δ) = p̃µ ≤ p̃k = E[X], we have Pr[X < m] ≤ Pr[X < (1 − δ)E[X]]. By Theorem 1, by E[X] ≥ p̃µ, and by our assumption that µ ≥ c ln n, choosing c sufficiently large, we have

\[ \Pr\big[X < (1-\delta)\mathrm{E}[X]\big] \le \exp\left(-\frac{\delta^2 \mathrm{E}[X]}{2}\right) \le \exp\left(-\frac{\delta^2 \tilde p \mu}{2}\right) \le n^{-3}. \]

For bounding the denominator, we note that p̃ ≤ 1 − 1/n and use the fact that a binomially distributed random variable with a success probability of at most 1 − 1/n is at most its expectation with a probability of at least 1/4 (Doerr, 2020b, Lemma 10.20 (b)). Since X is integral and E[X] < µ, this yields Pr[X < µ] ≥ Pr[X ≤ E[X]] ≥ 1/4.

Combining these bounds, we obtain Pr[X ≥ m | X < µ] ≥ 1 − 4/n³ for this case.

Case 2: E[X] ≥ µ > m. We bound the subtrahend in (1) from above. By basic estimates and by Corollary 1, we see that

\[ \frac{\Pr[X < m]}{\Pr[X < \mu]} \le \frac{\Pr[X \le m-1]}{\Pr[X = \mu-1]} \le \frac{(k-m+1)\tilde p}{\mathrm{E}[X] - m + 1} \cdot \frac{\Pr[X = m-1]}{\Pr[X = \mu-1]}. \tag{2} \]

We bound the first factor of (2) as follows:

\[ \frac{(k-m+1)\tilde p}{\mathrm{E}[X] - m + 1} \le \frac{\mathrm{E}[X]}{\mathrm{E}[X] - m} = 1 + \frac{m}{\mathrm{E}[X] - m} \le 1 + \frac{m}{\mu - m} \le 1 + \frac{m}{m/\tilde p - m} = \frac{1}{1-\tilde p} \le n, \]

where the last inequality uses that p̃ ≤ (1 − 1/n)² ≤ 1 − 1/n.

For the second factor of (2), we compute

\[ \frac{\Pr[X = m-1]}{\Pr[X = \mu-1]} = \frac{\binom{k}{m-1}\,\tilde p^{\,m-1}(1-\tilde p)^{k-m+1}}{\binom{k}{\mu-1}\,\tilde p^{\,\mu-1}(1-\tilde p)^{k-\mu+1}} = \frac{(\mu-1)!\,(k-\mu+1)!}{(m-1)!\,(k-m+1)!}\cdot\left(\frac{1-\tilde p}{\tilde p}\right)^{\mu-m}. \tag{3} \]

Since p̃ ≥ (1 − ε)²/4, we see that (1 − p̃)/p̃ ≤ 4/(1 − ε)².

For the first factor of (3), let p* := (1 − δ)p̃; thus µp* = m. Noting that, for all a, b ∈ R with a < b, the function j ↦ (a + j)(b − j) is maximal for j = (b − a)/2, we first bound

\[ \frac{(\mu-1)!}{(m-1)!} = \prod_{j=0}^{\mu-m-1}(\mu-1-j) \le \prod_{j=0}^{\lfloor(\mu-m-1)/2\rfloor}\big((\mu-1-j)(m+j)\big) \le \left(\frac{\mu+m}{2}\right)^{\mu-m} = \left(\frac{\mu(1+p^*)}{2}\right)^{\mu-m}. \]

Substituting this into the first factor of (3), we bound

\[ \frac{(\mu-1)!\,(k-\mu+1)!}{(m-1)!\,(k-m+1)!} \le \left(\frac{\mu(1+p^*)}{2}\right)^{\mu-m}\cdot\frac{(k-\mu+1)!}{(k-m+1)!} = \frac{\big(\mu(1+p^*)/2\big)^{\mu-m}}{\prod_{j=0}^{\mu-m-1}(k-m+1-j)} = \prod_{j=0}^{\mu-m-1}\frac{\mu(1+p^*)}{2(k-m+1-j)}. \]

By noting that kp̃ = E[X] ≥ µ, we bound the above estimate further:

\[ \prod_{j=0}^{\mu-m-1}\frac{\mu(1+p^*)}{2(k-m+1-j)} \le \prod_{j=0}^{\mu-m-1}\frac{\mu(1+p^*)}{2(\mu/\tilde p-m+1-j)} \le \left(\frac{\mu(1+p^*)}{2\mu(1/\tilde p-1)+2}\right)^{\mu-m} \le \left(\frac{\tilde p(1+p^*)}{2(1-\tilde p)}\right)^{\mu-m}. \]

Substituting both bounds into (3) and recalling that m = µp*, we obtain

\[ \frac{\Pr[X = m-1]}{\Pr[X = \mu-1]} \le \left(\frac{\tilde p(1+p^*)}{2(1-\tilde p)}\cdot\frac{1-\tilde p}{\tilde p}\right)^{\mu-m} = \left(\frac{1+p^*}{2}\right)^{\mu(1-p^*)} = \exp\left(-\mu(1-p^*)\ln\frac{2}{1+p^*}\right). \]

Finally, substituting this back into our bound of (2), using our assumption that µ ≥ c ln n and noting that p* is constant, choosing c sufficiently large, we obtain

\[ \frac{\Pr[X < m]}{\Pr[X < \mu]} \le n\exp\left(-\mu(1-p^*)\ln\frac{2}{1+p^*}\right) \le n^{-3}. \]

Concluding the proof. In both cases, we see that the number of 11s in block i is at least m = (1 − δ)p̃µ ≥ ((1 − δ)(1 − ε)²/4)µ with a probability of at least 1 − 4/n³. Since each 11 contributes to the new values of p_{2i−1} and p_{2i}, after the update, both frequencies are at least (1 − δ)(1 − ε)²/4, as we claimed.
Optimizing the critical block.
Our next lemma considers the critical block i ∈ [n/2] of an iteration t. It shows that, with high probability, for all j ∈ [2i], we have that p^{(t+1)}_j = 1 − 1/n. Informally, this means that (i) all frequencies left of the critical block remain at 1 − 1/n, and (ii) the frequencies of the critical block are increased to 1 − 1/n.

Lemma 4. Let δ, ε, ζ ∈ (0, 1) be constants, and let q = (1 − δ)²(1 − ε)⁴/16. Consider the UMDA optimizing DeceptiveLeadingBlocks with λ ≥ (4e/(ζ²q)) ln n and µ/λ ≤ (1 − ζ)q/e, and consider an iteration t such that block i ∈ [n/2] is critical and that p^{(t)}_{2i−1} and p^{(t)}_{2i} are at least √q. Then, with a probability of at least 1 − 1/n², at least µ offspring are generated with at least 2i leading 1s. In other words, the selection-relevant block of iteration t is at a position in [i + 1..n/2].

Proof. Let X denote the number of individuals that have at least 2i leading 1s. Since block i is critical, each frequency at a position j ∈ [2i − 2] is at 1 − 1/n. Thus, the probability that all of these frequencies sample a 1 for a single individual is (1 − 1/n)^{2i−2} ≥ (1 − 1/n)^{n−2} ≥ 1/e. Further, since the frequencies p^{(t)}_{2i−1} and p^{(t)}_{2i} are at least √q, the probability to sample a 11 at these positions is at least q. Hence, we have E[X] ≥ qλ/e.

We now apply Theorem 1 to show that it is unlikely that fewer than µ individuals from the current iteration have at least 2i leading 1s. Using our bounds on µ and λ, we compute

\[ \Pr[X < \mu] \le \Pr\Big[X \le (1-\zeta)\frac{q}{e}\lambda\Big] \le \Pr\big[X \le (1-\zeta)\mathrm{E}[X]\big] \le \exp\left(-\frac{\zeta^2 q\lambda}{2e}\right) \le n^{-2}. \]

Thus, with a probability of at least 1 − 1/n², at least µ individuals have at least 2i leading 1s. This concludes the proof.

The run time of the UMDA on DLB.
We now prove our main result.
Proof of Theorem 3.
We prove that the UMDA samples the optimum after n + 2 e ln n iterations with a probability of at least 1 − n − . Since it samples λ individuals eachiteration, the theorem follows.Due to Lemma 2 and µ ≥ c µ n log n , for an ε ∈ (0 , n iterations,with a probability of at least 1 − n − , no frequency drops below (1 − ε ) / δ ∈ (0 , − n − ,once a block becomes selection-relevant, its frequencies do not drop below (1 − δ )(1 − ε ) / n consecutive times witha probability of at least 1 − n − . Note that a selection-relevant block becomes criticalin the next iteration.Consider a critical block i ∈ [ n ]. By Lemma 4, choosing c λ sufficiently large, with aprobability of at least 1 − n − , all frequencies at positions in [2 i ] are immediately set to1 − n in the next iteration, and the selection-relevant block has an index of at least i + 1,thus, moving to the right. Applying a union bound for the first n iterations of the UMDAand noting that each frequency belongs to a selection-relevant block at most once showsthat all frequencies are at 1 − n after the first n iterations, since each block containstwo frequencies, and stay there for at least n additional iterations with a probability ofat least 1 − n − . Evolutionary Computation Volume x, Number x . Doerr and M. S. Krejca Consequently, after the first n iterations, the optimum is sampled in each iterationwith a probability of (1 − n ) n ≥ / (2 e ). Thus, after 2 e ln n additional iterations, theoptimum is sampled with a probability of at least 1 − (cid:0) − / (2 e ) (cid:1) e ln n ≥ − n − .Overall, by applying a union bound over all failure probabilities above, the UMDAneeds at most n + ln n iterations to sample the optimum with a probability of at least1 − n − . (1 + 1) EA Aiming at a lower bound for the run time of an EA on
DeceptiveLeadingBlocks, we now conduct a precise run time analysis of the (1 + 1) EA on DLB. Being of order Ω(n³), it shows that the corresponding O(n³) bound from Lehre and Nguyen (2019) is tight apart from constant factors and lower-order terms. We note that Lehre and Nguyen (2019) have also shown upper bounds for the run time of other evolutionary algorithms, again typically of order O(n³). We conjecture that for these settings an Ω(n³) lower bound is valid as well, as our empirical results in Section 5 suggest, but we do not prove this here.

To make the following result precise, we recall that the (1 + 1) EA for the maximization of f : {0, 1}^n → R is the simple EA which (i) starts with an individual x chosen uniformly at random from {0, 1}^n and then (ii) in each iteration creates from the current solution x an offspring y via flipping each bit independently with probability 1/n and replaces x by y if and only if f(y) ≥ f(x).

The following result determines precisely the run time of the (1 + 1) EA on DLB.

Theorem 4.
In expectation, the (1 + 1) EA samples the optimum of DeceptiveLeadingBlocks after

(n²/4)(n − 1)((1 + 1/(n − 1))^n − 1) = (1 + o(1)) ((e − 1)/4) n³

fitness function evaluations.

Proof. Let ℓ ∈ [0..n/2 − 1] and b ∈ {0, 1}. Let x ∈ {0, 1}^n such that Prefix(x) = 2ℓ and DB(x_{2ℓ+1}, x_{2ℓ+2}) = b, in other words, such that x_i = 1 for all i ≤ 2ℓ and the contribution of the (ℓ + 1)-st block is DB(x_{2ℓ+1}, x_{2ℓ+2}) = b. Consider a run of the (1 + 1) EA on DLB, started with the search point x instead of a random initial solution. Let X be the random variable describing the first time that a solution with Prefix-value greater than 2ℓ is found.

We first observe that X is independent of the x_i, i > 2ℓ + 2. In the case that b = 0, X is also independent of whether (x_{2ℓ+1}, x_{2ℓ+2}) = (1, 0) or (x_{2ℓ+1}, x_{2ℓ+2}) = (0, 1). Hence we can write X_{ℓ,b} := X without ambiguity later in this proof.

We compute E[X_{ℓ,b}]. If b = 1, then X follows a geometric law with success probability p = (1 − 1/n)^{2ℓ} n^{−2}; hence

E[X_{ℓ,1}] = 1/p = (1 − 1/n)^{−2ℓ} n².

If b = 0, in principle we could also precisely describe the distribution of X, but since this is more complicated and we only regard expected run times in this proof, we avoid this and take the simpler route of only determining the expectation. From x, the (1 + 1) EA in one iteration finds a search point with Prefix-value greater than 2ℓ with probability (1 − 1/n)^{2ℓ+1} n^{−1}. With the same probability, it finds a search point with (unchanged) Prefix-value equal to 2ℓ and (higher) b-value 1. Otherwise it finds no true improvement and stays with x or an equally good search point. In summary, we have

E[X_{ℓ,0}] = 1 + (1 − 1/n)^{2ℓ+1} n^{−1} · 0 + (1 − 1/n)^{2ℓ+1} n^{−1} E[X_{ℓ,1}] + (1 − 2(1 − 1/n)^{2ℓ+1} n^{−1}) E[X_{ℓ,0}],

hence

E[X_{ℓ,0}] = (1/2)(1 − 1/n)^{−2ℓ−1} n + (1/2) E[X_{ℓ,1}].

Building on the random initialization of the algorithm, in a fashion analogous to (Doerr, 2019a, Theorem 3), the run time T of the (1 + 1) EA with random initialization on DLB has the distribution Σ_{ℓ=0}^{n/2−1} Σ_{b=0}^{1} [Z_ℓ = b] X_{ℓ,b}, where the X_{ℓ,b} are copies of the random variables introduced above, the Z_ℓ are identically distributed with Pr[Z_ℓ = 0] = 1/2, Pr[Z_ℓ = 1] = 1/4, and Pr[Z_ℓ = 2] = 1/4, and all these random variables are independent. Consequently,

E[T] = Σ_{ℓ=0}^{n/2−1} ((1/2) E[X_{ℓ,0}] + (1/4) E[X_{ℓ,1}])
     = Σ_{ℓ=0}^{n/2−1} (1/4)(1 − 1/n)^{−2ℓ−1} n + Σ_{ℓ=0}^{n/2−1} (1/2)(1 − 1/n)^{−2ℓ} n².

Noting that Σ_{ℓ=0}^{n/2−1} (1 − 1/n)^{−2ℓ} = (n − 1)²((1 + 1/(n − 1))^n − 1)/(2n − 1), we obtain

E[T] = (n²/4)(n − 1)((1 + 1/(n − 1))^n − 1).

Experiments

In their paper, Lehre and Nguyen (2019) analyze the run time of many EAs on
DLB. For an optimal choice of parameters, they prove an expected run time of O(n³) for all considered algorithms. Since these are only upper bounds and since we showed in Section 4 an Ω(n³) lower bound only for the (1 + 1) EA, it is not clear how well the other algorithms actually perform against the UMDA, which has a run time in the order of n² ln n (Theorem 3) for optimal parameters. Thus, we provide some empirical results in Figure 1 on how well these algorithms compare against each other.

Considered algorithms.
Besides the UMDA, we analyze the run time of the following EAs: the (1 + 1) EA, the (μ, λ) EA, and the (μ, λ) GA with uniform crossover, which are most of the EAs that Lehre and Nguyen (2019) consider in their paper. Lehre and Nguyen also analyze the (1 + λ) EA and the (μ + 1) EA. However, in our preliminary experiments with μ = ⌈ln n⌉ and λ = ⌈√n⌉, they were always slower than the (1 + 1) EA, so we do not include these algorithms in Figure 1.

Further, we also depict the run time of the mutual-information-maximizing input clustering algorithm (MIMIC; De Bonet et al. (1996)), which is one of the algorithms that Lehre and Nguyen (2019) also analyze empirically. The MIMIC is an EDA with a more sophisticated probabilistic model than the UMDA. This model is capable of capturing dependencies among the bit positions by storing a permutation π of indices and conditional probabilities. An individual is created bit by bit, following π. Each bit is sampled with respect to a conditional probability, depending on the bit sampled at the prior position. The MIMIC updates its model as follows: similar to the UMDA, the MIMIC samples each iteration λ individuals and selects the μ best. To update its richer probabilistic model, the MIMIC then searches for the position with the least (empirical) entropy, that is, the position that has the most 0s or 1s, and sets the new frequency equal to the relative number of 1s in the selected subpopulation. Iteratively, for all remaining positions, it determines the position with the lowest conditional (empirical) entropy and sets the conditional frequencies to the conditional relative numbers of 1s, where conditional is always with respect to the prior position in π. Since a precise description of this algorithm would need a significant amount of space, we refer the reader to a recent paper by Doerr and Krejca (2020a) for more details.

Parameter choice.
For each of the EAs depicted in Figure 1, we choose parameters such that the upper run time bound proven by Lehre and Nguyen (2019) is O(n³), that is, optimal. For both the (μ, λ) EA and the (μ, λ) GA, we choose μ = ⌈ln n⌉ and λ = 9μ. For the latter, we further choose uniform crossover with a crossover probability of 1/2. For the UMDA and the MIMIC, we choose μ = ⌈3n ln n⌉ and λ = 12μ. Our parameter choice for the UMDA is based on Theorem 3. For the MIMIC, we speculate that the similarities with the UMDA imply that these parameter values are suitable as well. In any case, with these parameter values all runs of these two algorithms were successful, that is, the optimum of DLB was found within 10n³ function evaluations.

Results.
When interpreting the plots from Figure 1, please note that both axes use a logarithmic scale. Thus, any polynomial is depicted as a linear function, where the slope depends on the exponent of the highest term. Figure 1 clearly shows two separate regimes: the EAs with a larger slope and the EDAs with a smaller slope.

We see that the (1 + 1) EA, from n ≥ 200 on, performs the worst out of all algorithms. The EAs all have a very similar slope which, according to a regression fitting a power law, corresponds to exponents close to 3. This indicates that all of these EAs have a run time of Θ(n³), as we proved for the (1 + 1) EA in Theorem 4. In contrast, the UMDA and the MIMIC have a smaller slope, with fitted exponents close to 2 for both, suggesting that our upper bound of O(n² log n) for the UMDA (Theorem 3) is tight. Further note that the variance of the EDAs is very small, suggesting that their run time holds with a higher concentration than what we prove for the UMDA in Theorem 3.

Interestingly, the MIMIC behaves very similarly to the UMDA. However, this may be a result of the same parameter choice for both algorithms. Lehre and Nguyen (2019) also analyze the MIMIC empirically, with choices of μ in the orders of √n, √n log n, and n. In their results, the run time of the MIMIC is a bit faster; for n = 100, the empirical run time of the MIMIC as reported by Lehre and Nguyen (2019) is in the order of roughly 2¹⁹ ≈ 5.2 · 10⁵. In Figure 1, for the same value of n, the run time of the MIMIC is about 6 · 10⁵, which is very close. This suggests that the parameter regime without strong genetic drift, which we consider in this paper for the UMDA, is not that important for the MIMIC. While the MIMIC seems to have a larger tolerable parameter regime, it is impressive that the UMDA has a similar performance to the MIMIC.

Figure 1: A log-log plot depicting the number of fitness evaluations of the UMDA, the MIMIC, the (μ, λ) EA, the (μ, λ) GA, and the (1 + 1) EA until the optimum of DLB is sampled for the first time, against the input size n from 50 to 500 in steps of 50. For each value of n, 100 independent runs were started per algorithm. The lines depict the median of these runs, and the shaded areas denote the center 50 %. Please refer to Section 5 for more details.

It is important to note that we only report the number of fitness evaluations in Figure 1. When comparing the time to perform an update, the UMDA is faster, as the model update can be computed in time Θ(n), whereas the one for the MIMIC takes time Θ(n²) due to the iterated search of the minimal conditional entropy.

The impact of μ. Figure 1 compares the run time of the UMDA to that of other algorithms. To this end, we chose the parameters in the regime without strong genetic drift, as proposed in Theorem 3. However, Lehre and Nguyen (2019) prove a lower bound of exp(Ω(μ)) for the UMDA on DLB, which can be smaller than our upper bound of O(n² log n) if μ is sufficiently small. Thus, we analyze in Figure 2 the impact of μ on the run time of the UMDA on DLB for n = 300.

Parameter choice.
In each run, we choose λ = 12μ, as we did before; μ ranges from 2¹ to 2¹² in powers of 2. Note that our range for μ ends with 2¹² = 4,096, which is slightly below the choice of μ = ⌈3n ln n⌉ from the previous experiment, which results in a value of μ = 5,134 for n = 300.

Results.
In Figure 2, we show the run times for those values of μ where at least one out of 100 runs was successful, that is, where the UMDA found the optimum of DLB within 10n³ function evaluations. There are two very distinct ranges of μ leading to successful runs. The first regime consists of the values 2, 4, and 8; note that the effect of genetic drift is strong in this regime. For each of these values of μ, the number of fitness evaluations is at least 10⁸, which is worse than the performance of the UMDA in Figure 1. Interestingly, in this regime the UMDA is successful in every run. This suggests that the range of frequency values of the UMDA is coarse enough such that it is able to quickly change frequencies from 1/n to 1 − 1/n. If a frequency is at 1/n, it takes some time until a 1 is sampled and selected, but once this occurs, the respective frequency has a decent chance of being increased to 1 − 1/n.

For values of μ between 2⁴ and 2⁹, the range of frequency values becomes finer, consequently leading to longer times for a frequency to be increased from 1/n to 1 − 1/n. The growth of the plot in the regime of smaller values of μ suggests a run time superpolynomial in μ for this behavior. Thus, it is not surprising that no run of the UMDA is successful for the medium values of μ.

The second successful regime of μ consists of the values 2¹⁰, 2¹¹, and 2¹². While not all runs are successful, in particular not those for the smallest of these values of μ, the trend of the curve seems to be linear, which suggests a polynomial run time of the UMDA (because of the log-log scale). Further, the run time is comparable to that of the UMDA seen in Figure 1. Likely, starting with μ = 2¹⁰, the UMDA enters the regime where the effect of genetic drift starts to diminish. For μ = 2¹⁰, the effect is still rather large, and oftentimes frequencies reach the lower border 1/n, however, not all of the time. With increasing μ, this happens less and less. For μ = 2¹², almost all runs are successful.
In comparison, in Figure 1, all runs of the UMDA were successful, and there the UMDA uses a slightly larger value of μ than 2¹². Overall, the results from Figure 2 show an interesting transition in the run time of the UMDA on DLB as the effect of genetic drift vanishes.
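Experiments of the kind above are easy to re-run in miniature. The following is our own sketch, not the authors' experiment code: it implements the (1 + 1) EA on DLB as recalled in Section 4 and numerically checks our reading of the closed form from Theorem 4, whose leading constant is (e − 1)/4 ≈ 0.43; the DLB definition in code is the assumed standard one.

```python
import math
import random

def dlb(x):
    """DeceptiveLeadingBlocks: 2 per leading 11-block; the first other
    block adds 1 if it is the deceptive 00 and 0 otherwise."""
    value = 0
    for i in range(0, len(x), 2):
        if x[i] == 1 and x[i + 1] == 1:
            value += 2
        else:
            value += 1 if (x[i] == 0 and x[i + 1] == 0) else 0
            break
    return value

def one_plus_one_ea(n, rng, max_evals):
    """(1 + 1) EA: flip each bit independently with probability 1/n and
    accept the offspring if it is at least as good. Returns the number
    of evaluations until the optimum is sampled, or None."""
    x = [rng.randrange(2) for _ in range(n)]
    fx = dlb(x)
    for evals in range(1, max_evals + 1):
        y = [1 - b if rng.random() < 1 / n else b for b in x]
        fy = dlb(y)
        if fy >= fx:
            x, fx = y, fy
        if fx == n:
            return evals
    return None

def expected_runtime(n):
    # Closed form of Theorem 4 (as we read it):
    # E[T] = (n^2 / 4) (n - 1) ((1 + 1/(n - 1))^n - 1).
    return n * n / 4 * (n - 1) * ((1 + 1 / (n - 1)) ** n - 1)

# The normalized expectation approaches the constant (e - 1)/4 ~ 0.4296.
for n in (10, 100, 1000):
    print(n, expected_runtime(n) / n ** 3)
```

Averaging `one_plus_one_ea` over many runs for a fixed small n should land near `expected_runtime(n)`; the deterministic formula check avoids relying on a statistical test.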
Conclusion

We conducted a rigorous run time analysis of the UMDA on the DeceptiveLeadingBlocks function. In particular, our analysis shows that the algorithm with the right parameter choice finds the optimum in O(n² log n) fitness evaluations with high probability (Theorem 3). This result shows that the lower bound by Lehre and Nguyen (2019), which is exponential in μ, is not due to the UMDA being ill-suited for coping with epistasis and deception, but rather due to an unfortunate choice of the algorithm's parameters. For several EAs, Lehre and Nguyen (2019) showed a run time bound of O(n³) on DeceptiveLeadingBlocks. We proved a matching lower bound for the (1 + 1) EA (Section 4) and conducted experiments which suggest that also other EAs perform worse than the UMDA on
DeceptiveLeadingBlocks (Section 5). In this light, our result suggests that the UMDA can handle epistasis and deception even better than many evolutionary algorithms and similarly to a more complex EDA.

Our run time analysis holds for parameter regimes that prevent genetic drift. When comparing our run time with the one shown in (Lehre and Nguyen, 2019), we obtain a strong suggestion for running EDAs in regimes of low genetic drift. In contrast to the work of Lengler et al. (2018), which indicates moderate performance losses due to genetic drift, here we obtain the first fully rigorous proof of such a performance loss, and in addition one that is close to exponential in n (the exp(Ω(μ)) lower bound of Lehre and Nguyen (2019) holds for μ up to o(n)). Our proven upper and lower bounds also show that the UMDA has an advantage in coping with local optima compared to EAs. Such an observation has previously only been made for the compact genetic algorithm (when optimizing jump functions, see Hasenöhrl and Sutton (2018); Doerr (2019b)).

On the technical side, our result indicates that the regime of low genetic drift admits relatively simple and natural analyses of run times of EDAs, in contrast, e.g., to the level-based methods previously used in comparable analyses, e.g., in (Dang and Lehre, 2015; Lehre and Nguyen, 2019).

We conjecture that our result can be generalized to a version of the DeceptiveLeadingBlocks function with a block size of k ≤ n.

Figure 2: A log-log plot depicting the number of fitness evaluations of the UMDA until the optimum of DLB for n = 300 is sampled for the first time, against the parameter μ from 2¹ to 2¹², doubling the value in each step. For each value of μ, 100 independent runs were started. We terminated each run after 10n³ = 2.7 · 10⁸ function evaluations if the UMDA did not find the optimum until then. The number over each data point denotes the ratio of runs in which the UMDA actually found an optimum. The lines depict the median of the successful runs, and the shaded areas denote their center 50 %. Note that for the values of μ from 2⁴ to 2⁹, no run was successful. Thus, no data points are depicted. Please refer to Section 5.1 for more details.

Acknowledgments
This work was supported by COST Action CA15140 and by a public grant as part of the Investissements d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with the Gaspard Monge Program for optimization, operations research and their interactions with data sciences.
References
De Bonet, J. S., Isbell, C. L., Jr., and Viola, P. A. (1996). MIMIC: finding optima by estimating probability densities. In Proc. of NIPS '96, pages 424–430.

Chen, T., Lehre, P. K., Tang, K., and Yao, X. (2009). When is an estimation of distribution algorithm better than an evolutionary algorithm? In Proc. of CEC '09, pages 1470–1477.

Corus, D., Dang, D., Eremeev, A. V., and Lehre, P. K. (2018). Level-based analysis of genetic algorithms and other search processes. IEEE Transactions on Evolutionary Computation, 22(5):707–719.

Dang, D., Friedrich, T., Kötzing, T., Krejca, M. S., Lehre, P. K., Oliveto, P. S., Sudholt, D., and Sutton, A. M. (2016). Escaping local optima with diversity mechanisms and crossover. In Proc. of GECCO '16, pages 645–652.

Dang, D., Friedrich, T., Kötzing, T., Krejca, M. S., Lehre, P. K., Oliveto, P. S., Sudholt, D., and Sutton, A. M. (2018). Escaping local optima using crossover with emergent diversity. IEEE Transactions on Evolutionary Computation, 22(3):484–497.

Dang, D. and Lehre, P. K. (2015). Simplified runtime analysis of estimation of distribution algorithms. In Proc. of GECCO '15, pages 513–518.

Dang, D. and Lehre, P. K. (2016). Runtime analysis of non-elitist populations: from classical optimisation to partial information. Algorithmica, 75(3):428–461.

Doerr, B. (2019a). Analyzing randomized search heuristics via stochastic domination. Theoretical Computer Science, 773:115–137.

Doerr, B. (2019b). A tight runtime analysis for the cGA on jump functions: EDAs can cross fitness valleys at no extra cost. In Proc. of GECCO '19, pages 1488–1496.

Doerr, B. (2020a). Does comma selection help to cope with local optima? In Proc. of GECCO '20. To appear.

Doerr, B. (2020b). Probabilistic tools for the analysis of randomized optimization heuristics. In Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pages 1–87. Springer. Also available at https://arxiv.org/abs/1801.06733.

Doerr, B. and Künnemann, M. (2015). Optimizing linear functions with the (1 + λ) evolutionary algorithm – different asymptotic runtimes for different instances. Theoretical Computer Science, 561:3–23.

Doerr, B. and Kötzing, T. (2019). Multiplicative up-drift. In Proc. of GECCO '19, pages 1470–1478.

Doerr, B. and Krejca, M. S. (2018). Significance-based estimation-of-distribution algorithms. In Proc. of GECCO '18, pages 1483–1490.

Doerr, B. and Krejca, M. S. (2020a). Bivariate estimation-of-distribution algorithms can find an exponential number of optima. In Proc. of GECCO '20. To appear.

Doerr, B. and Krejca, M. S. (2020b). The univariate marginal distribution algorithm copes well with deception and epistasis. In Proc. of EvoCOP '20, pages 51–66.

Doerr, B. and Zheng, W. (2020). Sharp bounds for genetic drift in estimation-of-distribution algorithms. IEEE Transactions on Evolutionary Computation. To appear.

Droste, S. (2006). A rigorous analysis of the compact genetic algorithm for linear functions. Natural Computing, 5(3):257–283.

Droste, S., Jansen, T., and Wegener, I. (2002). On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 276(1-2):51–81.

Hasenöhrl, V. and Sutton, A. M. (2018). On the runtime dynamics of the compact genetic algorithm on jump functions. In Proc. of GECCO '18, pages 967–974.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30.

Krejca, M. S. and Witt, C. (2017). Lower bounds on the run time of the univariate marginal distribution algorithm on OneMax. In Proc. of FOGA '17, pages 65–79.

Krejca, M. S. and Witt, C. (2020). Theory of estimation-of-distribution algorithms. In Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pages 405–442. Springer. Also available at http://arxiv.org/abs/1806.05392.

Lehre, P. K. (2011). Fitness-levels for non-elitist populations. In Proc. of GECCO '11, pages 2075–2082.

Lehre, P. K. and Nguyen, P. T. H. (2017). Improved runtime bounds for the univariate marginal distribution algorithm via anti-concentration. In Proc. of GECCO '17, pages 1383–1390.

Lehre, P. K. and Nguyen, P. T. H. (2019). On the limitations of the univariate marginal distribution algorithm to deception and where bivariate EDAs might help. In Proc. of FOGA '19, pages 154–168.

Lengler, J., Sudholt, D., and Witt, C. (2018). Medium step sizes are harmful for the compact genetic algorithm. In Proc. of GECCO '18, pages 1499–1506.

Mühlenbein, H. and Paaß, G. (1996). From recombination of genes to the estimation of distributions I. Binary parameters. In Proc. of PPSN '96, pages 178–187.

Pelikan, M., Hauschild, M., and Lobo, F. G. (2015). Estimation of distribution algorithms. In Springer Handbook of Computational Intelligence, pages 899–928. Springer.

Sudholt, D. and Witt, C. (2019). On the choice of the update strength in estimation-of-distribution algorithms and ant colony optimization. Algorithmica, 81(4):1450–1489.

Witt, C. (2018). Domino convergence: why one should hill-climb on linear functions. In Proc. of GECCO '18, pages 1539–1546.

Witt, C. (2019). Upper bounds on the running time of the univariate marginal distribution algorithm on OneMax. Algorithmica, 81(2):632–667.