A Simplified Run Time Analysis of the Univariate Marginal Distribution Algorithm on LeadingOnes
Benjamin Doerr, Laboratoire d'Informatique (LIX), CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France

Martin S. Krejca, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany

April 13, 2020
Abstract
With elementary means, we prove a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LeadingOnes benchmark function in the desirable regime with low genetic drift. If the population size is at least quasilinear, then, with high probability, the UMDA samples the optimum within a number of iterations that is linear in the problem size divided by the logarithm of the UMDA's selection rate. This improves over the previous guarantee, obtained by Dang and Lehre (2015) via the deep level-based population method, both in terms of the run time and by demonstrating further run time gains from small selection rates. With similar arguments as in our upper-bound analysis, we also obtain the first lower bound for this problem. Under similar assumptions, we prove that a bound that matches our upper bound up to constant factors holds with high probability.
1 Introduction

Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that create a probabilistic model of the search space and refine it iteratively. In each iteration, the current model of an EDA is used to create some samples which, in turn, are used to adjust the model such that better solutions are more likely to be created in the following iteration. Thus, the model evolves over time into one that creates very good solutions. EDAs have been applied to real-world problems with great success [PHL15].

Within the last few years, the theoretical analysis of EDAs has gained increasing interest, as summarized by Krejca and Witt [KW20]. One of the first papers in this period was by Dang and Lehre [DL15], who proved run time guarantees for the univariate marginal distribution algorithm (UMDA, [MP96]) when optimizing the two classical benchmark functions OneMax and LeadingOnes. While their run time bound for OneMax has been improved since then independently by Lehre and Nguyen [LN17] and Witt [Wit17], the run time bound of O(n² + nλ log λ), where n is the problem dimension and λ is the offspring population size of the UMDA, is the best known result so far on LeadingOnes. (In an extension of [DL15], Dang et al. [DLN19] show the same run time bound but slightly improve the required population sizes.)

In this work, we improve with Theorem 5 the second term of this bound from O(nλ log λ) to O(nλ/log(λ/µ)) when µ = Ω(n log n), where µ ≤ λ is the size of the subpopulation selected for the model update. In the regime of µ = Ω(n log n), the UMDA shows the generally desirable behavior of low genetic drift, that is, the sampling frequencies stay in the middle range of, say, (1/4, 3/4) until a sufficiently strong fitness signal moves them into the right direction. While EDAs are not necessarily inefficient in the presence of stronger genetic drift, their optimization behavior then often becomes similar to a slowed-down version of the (1+1) evolutionary algorithm. Genetic drift, however, can also lead to a performance loss, since it may take long to move a frequency from the wrong boundary value back into the middle range. This has been rigorously shown by Lengler et al. [LSW18].

Equally interesting to the improved run time guarantee is our elementary proof method. While it was truly surprising that Dang and Lehre [DL15] could use the level-based population method to analyze an EDA (which does not have a population that is transferred from one iteration to the next), this method is a highly advanced tool and one that can be difficult to use. In contrast to this, our proof only uses elementary arguments common in the analysis of evolutionary algorithms. We are thus optimistic that our arguments can more easily be applied to other EDAs as well.

We further demonstrate the usability of our proof method by proving a matching lower bound (see Theorem 6), which improves the previously best known lower bounds by Lehre and Nguyen [LN19] for the regime of µ = Ω(n log n). For the regime of µ = Ω(log n) ∩ o(n log n), the bound Ω(nλ/log(λ − µ)) by Lehre and Nguyen remains the best known lower bound. Additionally, Lehre and Nguyen prove a lower bound of e^{Ω(µ)} for µ = Ω(log n) and λ ≤ eµ, which remains untouched by our result.

We note that both of our bounds do not require the fraction µ/λ to be constant, which is a common requirement of many other analyses of the UMDA [DLN19, LN17, Wit17, KW18] (although this is not always explicitly stated in the result). In particular, our bounds show that the gain from reducing the selection rate µ/λ (which often requires a costly increase of λ) is very small, namely, only logarithmic in µ/λ.

Another advantage of our approach is that it gives run time guarantees that hold with high probability, whereas the level-based method, relying on drift arguments, can only give bounds on expected run times. Consequently, the result of Dang and Lehre [DL15] also concerns the expectation only. We believe that a result that holds with high probability is often more relevant, as has also been argued by Doerr [Doe19].

2 Preliminaries

We are concerned with the run time analysis of algorithms optimizing pseudo-Boolean functions, that is, functions f: {0, 1}ⁿ → ℝ, where n ∈ ℕ denotes the dimension of the problem. Given a pseudo-Boolean function f and a bit string x, we refer to f as a fitness function, to x as an individual, and to f(x) as the fitness of x.
For an n ∈ ℕ, we define [n] = [1, n] ∩ ℕ. From now on, if not stated otherwise, the variable n always denotes a natural number. For a vector x of length n, we denote its component at position i ∈ [n] by x_i.

We consider the optimization of the pseudo-Boolean function LeadingOnes: {0, 1}ⁿ → {0} ∪ [n], which returns, for a bit string of length n, the length of the longest prefix of 1s within that bit string. More formally, for all x ∈ {0, 1}ⁿ,

LeadingOnes(x) = Σ_{i=1}^{n} Π_{j=1}^{i} x_j.

Note that the all-1s bit string is the unique global optimum of LeadingOnes.
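As a concrete illustration of this definition, here is a short Python sketch (our addition, not part of the original analysis):

def leading_ones(x):
    """LeadingOnes(x) = sum_{i=1}^{n} prod_{j=1}^{i} x_j,
    i.e., the length of the longest prefix of 1s."""
    count = 0
    for bit in x:
        if bit != 1:
            break  # the prefix of 1s ends here
        count += 1
    return count

# Example: the first three bits are 1, so the fitness is 3.
assert leading_ones([1, 1, 1, 0, 1]) == 3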
Our algorithm of interest is the UMDA (Algorithm 1) with parameters µ, λ ∈ ℕ, µ ≤ λ. It maintains a vector p of probabilities (the frequency vector) of length n, whose components we call frequencies, and it updates this vector iteratively in the following way: first, λ individuals are created independently from one another such that, for each individual x ∈ {0, 1}ⁿ and each position i ∈ [n], it holds that x_i is 1 with probability p_i and 0 otherwise. Then, from these λ individuals, a subset of µ individuals with the highest fitness is chosen (breaking ties uniformly at random), and, for each position i ∈ [n], the frequency p_i is set to the relative number of 1s at position i among the µ best individuals. Last, if a frequency p_i is below 1/n, it is increased to 1/n, and if it is above 1 − 1/n, it is decreased to 1 − 1/n. This prevents frequencies from being stuck at the extremal values 0 or 1. We denote the frequency vector of iteration t ∈ ℕ by p^(t). Note that we start with iteration t = 0.

Algorithm 1: The UMDA [MP96] with parameters µ and λ, µ ≤ λ, maximizing a fitness function f: {0, 1}ⁿ → ℝ with n ≥ 2
  t ← 0;
  p^(t) ← (1/2)_{i ∈ [n]};
  repeat  ⊲ iteration t
      for i ∈ [λ] do x^(i) ← individual sampled via p^(t);
      let y^(1), …, y^(µ) denote the µ best individuals out of x^(1), …, x^(λ) (breaking ties uniformly at random);
      for i ∈ [n] do p_i^(t+1) ← (1/µ) Σ_{j=1}^{µ} y_i^(j);
      restrict p^(t+1) to the interval [1/n, 1 − 1/n];
      t ← t + 1;
  until termination criterion met;
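For readers who prefer running code, the following self-contained Python sketch mirrors Algorithm 1. It is our illustrative reimplementation, not part of the paper; the function names and the tie-breaking via a stable sort (instead of uniformly at random) are our simplifications.

import random

def umda(f, n, mu, lam, max_iters):
    """Minimal UMDA (cf. Algorithm 1): sample lam individuals from the
    frequency vector p, select the mu best, set each frequency to the
    relative number of 1s among them, and clamp to [1/n, 1 - 1/n].
    Returns the first sampled optimum (fitness n), or None."""
    p = [0.5] * n
    for _ in range(max_iters):
        pop = [[1 if random.random() < p[i] else 0 for i in range(n)]
               for _ in range(lam)]
        for x in pop:
            if f(x) == n:       # assumes the optimum has fitness n,
                return x        # as is the case for LeadingOnes
        pop.sort(key=f, reverse=True)   # mu best first (ties broken by sort
        best = pop[:mu]                 # order, not uniformly at random)
        for i in range(n):
            freq = sum(x[i] for x in best) / mu
            p[i] = min(max(freq, 1 / n), 1 - 1 / n)  # border restriction
    return None

# Example with toy parameters of our choosing:
# umda(leading_ones, n=30, mu=60, lam=600, max_iters=100)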
In the context of optimizing LeadingOnes, we say that a position i ∈ [n] of p^(t) is critical in iteration t ∈ ℕ if and only if all of the frequencies at indices less than i are 1 − 1/n and p_i^(t) is less than 1 − 1/n. Intuitively, a critical frequency is the next one that needs to be set to 1 − 1/n in order to optimize LeadingOnes efficiently.

When analyzing the run time of the UMDA optimizing a fitness function f, we are interested in the number T of fitness function evaluations until an optimum of f is sampled for the first time. Since the UMDA is a randomized algorithm, T is a random variable, and we are interested in a bound on T that holds with high probability. Note that the run time T of the UMDA is at most λ times the number I of iterations until an optimum is sampled for the first time. Likewise, T is at least (I − 1)λ + 1.

In order to prove statements on random variables that hold with high probability, we use the following commonly known Chernoff bounds.

Theorem 1 (Chernoff bound [Doe20]). Let k ∈ ℕ, δ ∈ [0, 1], and let X be the sum of k independent random variables, each taking values in [0, 1]. Then

Pr[X ≤ (1 − δ)E[X]] ≤ e^{−δ²E[X]/2}.

Theorem 2 (Chernoff bound [Doe20]). Let k ∈ ℕ, δ ∈ [0, 1], and let X be the sum of k independent random variables, each taking values in [0, 1]. Then

Pr[X ≥ (1 + δ)E[X]] ≤ e^{−δ²E[X]/3}.

The next two theorems, recently proven in [DZ19], give upper bounds on the negative effect of genetic drift on the UMDA. The first result considers the optimization of fitness functions f that weakly prefer a 1 at a position i ∈ [n], that is, for all bit strings x, x′ ∈ {0, 1}ⁿ with x_i = 1, x′_i = 0, and x_j = x′_j for all other positions j ∈ [n] \ {i}, it holds that f(x) ≥ f(x′). In other words, having a 1 at position i always yields a fitness at least as good as when having a 0 at i. Note that LeadingOnes weakly prefers a 1 in all bit positions. The theorem states that the frequency at such a position i does not drop far below its initial value for a long time.

Theorem 3 ([DZ19, Theorem 7]). Consider the UMDA with parameters µ and λ optimizing a function f that weakly prefers a 1 at position i ∈ [n]. Then, for all d > 0 and all iterations t ∈ ℕ, we have

Pr[∀ t′ ∈ [0..t]: p_i^(t′) > 1/2 − d] ≥ 1 − 2e^{−d²µ/(2t)}.

The next theorem considers the case that there is no preference for a bit value at position i ∈ [n], that is, for all bit strings x, x′ ∈ {0, 1}ⁿ with x_i = 1, x′_i = 0, and x_j = x′_j for all other positions j ∈ [n] \ {i}, it holds that f(x) = f(x′). Given this assumption, we call position i neutral.

Theorem 4 ([DZ19, Corollary 2]). Consider the UMDA with parameters µ and λ optimizing a function f such that position i ∈ [n] is neutral. Then, for all d > 0 and all iterations t ∈ ℕ, we have

Pr[∀ t′ ∈ [0..t]: p_i^(t′) ∈ (1/2 − d, 1/2 + d)] ≥ 1 − 2e^{−d²µ/(2t)}.
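The behavior behind Theorem 4 is easy to observe empirically: at a neutral position, the bits of the µ selected individuals are independent Bernoulli(p) samples, so the frequency is updated to Bin(µ, p)/µ (before the border restriction) and performs an unbiased random walk. The following small experiment is our illustration, with arbitrary toy parameters; for a number of iterations well below µ, the walk typically stays inside (1/4, 3/4).

import random

def neutral_walk(mu, iters, n=100):
    """Frequency at a neutral position: each of the mu selected
    individuals carries a 1 with probability p, independently of
    selection, so p is updated to Bin(mu, p) / mu (then clamped)."""
    p = 0.5
    for _ in range(iters):
        ones = sum(random.random() < p for _ in range(mu))
        p = min(max(ones / mu, 1 / n), 1 - 1 / n)
    return p

random.seed(1)
samples = [neutral_walk(mu=1000, iters=30) for _ in range(20)]
print(min(samples), max(samples))  # typically well inside (1/4, 3/4)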
3 Upper Bound

In the following, we present our simple and intuitive run time analysis for the upper bound of the UMDA optimizing LeadingOnes, which gives the following theorem.

Theorem 5.
Let δ ∈ (0, 1) be a constant, and let ζ = (1 − δ)/(4e). Consider the UMDA optimizing LeadingOnes with µ ≥ 128 n ln n and λ ≥ 4µ/ζ. Further, let d = ⌊log₄(ζλ/µ)⌋. Then the UMDA samples the optimum after at most λ(⌈n/(d + 1)⌉ + ⌈(n/(n − 1)) e ln n⌉) fitness function evaluations with a probability of at least 1 − 4n^{−1}.

As discussed in the introduction, we only want to consider the regime with low genetic drift. Hence, we first argue that no frequency drops below 1/4 before the optimum is sampled (Lemma 1). Then we show that, in this case, in each iteration, roughly log(λ/µ) additional frequencies are set to 1 − 1/n. More specifically, if i ∈ [n] is critical, then all frequencies at positions roughly up to i + log(λ/µ) are set to 1 − 1/n (Lemma 2). Thus, a total of roughly n/log(λ/µ) iterations suffice to move all frequencies to 1 − 1/n. From such a state, the optimum is sampled with high probability after a logarithmic number of iterations.
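To get a feeling for the magnitude of the guarantee of Theorem 5, the following snippet (our addition) evaluates the bound for one hypothetical parameter choice; the constants follow the statement above.

import math

def theorem5_bound(n, mu, lam, delta=0.5):
    """Evaluate the run time bound of Theorem 5:
    lam * (ceil(n / (d + 1)) + ceil(n / (n - 1) * e * ln n)),
    where d = floor(log4(zeta * lam / mu)) and zeta = (1 - delta) / (4e)."""
    zeta = (1 - delta) / (4 * math.e)
    d = math.floor(math.log(zeta * lam / mu, 4))
    return lam * (math.ceil(n / (d + 1))
                  + math.ceil(n / (n - 1) * math.e * math.log(n)))

# Hypothetical example: n = 100, mu = 128 n ln n, lam = 4 mu / zeta.
n = 100
mu = math.ceil(128 * n * math.log(n))
lam = math.ceil(4 * mu / ((1 - 0.5) / (4 * math.e)))
print(theorem5_bound(n, mu, lam))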
We start by proving that the following parameter setting ensures that no frequency drops below the value 1/4 within 2n iterations with high probability.

Lemma 1. Consider the UMDA with λ ≥ µ ≥ 128 n ln n. Assume that it optimizes a function that weakly prefers a 1 at all positions. Then, with a probability of at least 1 − 2n^{−1}, each frequency will stay at a value of at least 1/4 for the first 2n iterations.

Proof. Consider an iteration t ≤ 2n as well as a position i ∈ [n]. By Theorem 3 with d = 1/4, we see that the probability that p_i drops below 1/4 within the first t ≤ 2n iterations is at most 2e^{−µ/(32t)} ≤ 2e^{−µ/(32·2n)} ≤ 2n^{−2}, where we used our bound on µ. Applying a union bound over all n frequencies gives the claim.

We now prove that a critical frequency, all its preceding frequencies, as well as roughly log₄(λ/µ) following frequencies are set to 1 − 1/n within a single iteration. That is, we increase roughly 1 + log₄(λ/µ) new frequencies to their maximum value.
Lemma 2. Let δ ∈ (0, 1) be a constant, and let ζ = (1 − δ)/(4e). Consider the UMDA optimizing LeadingOnes with µ ≥ (6(1 − δ)/δ²) ln n and λ ≥ 4µ/ζ. Furthermore, consider an iteration t ∈ ℕ such that position i ∈ [n] is critical and that, for all positions j ≥ i, we have p_j^(t) ≥ 1/4. Let d = ⌊log₄(ζλ/µ)⌋. Then, with a probability of at least 1 − n^{−3}, for all positions j ∈ [min{n, i + d}], we have p_j^(t+1) = 1 − 1/n.

Proof. Note that d ≥ 1, by our assumption on λ. We look at the population of λ individuals that is sampled in iteration t and determine the number X of individuals that have at least i′ := min{n, i + d} leading 1s. Since the frequencies at all positions less than i are at 1 − 1/n, the probability that all of these frequencies sample a 1 for a single individual is (1 − 1/n)^{i−1} ≥ (1 − 1/n)^{n−1} ≥ 1/e. Further, since the probability to sample a 1 at positions at least i is at least 1/4, we have E[X] ≥ (λ/e)·4^{−(1+d)} ≥ µ/(4eζ) = µ/(1 − δ).

We now apply Theorem 1 in order to show that it is unlikely that fewer than µ individuals from iteration t have at least i′ leading 1s. Using our bounds on µ and our estimate on E[X] from above, we compute

Pr[X < µ] ≤ Pr[X ≤ (1 − δ)E[X]] ≤ e^{−δ²E[X]/2} ≤ e^{−δ²µ/(2(1−δ))} ≤ n^{−3}.

Thus, with a probability of at least 1 − n^{−3}, at least µ individuals have at least i′ leading 1s. Since the UMDA is optimizing LeadingOnes, in this case, all of the selected top µ individuals have at least i′ leading 1s, which results in all frequencies at positions in [i′] being set to 1 − 1/n, that is, for all j ∈ [i′], we have p_j^(t+1) = 1 − 1/n.
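The mechanism of Lemma 2 can be observed directly by simulating a single UMDA update from the assumed state (first i − 1 frequencies at 1 − 1/n, the remaining ones at 1/2 ≥ 1/4). The following sketch is our illustration with toy parameters of our choosing; leading_ones is redefined for self-containedness.

import random

def leading_ones(x):
    k = 0
    for bit in x:
        if bit != 1:
            break
        k += 1
    return k

def one_update(n, i, d, mu, lam):
    """One UMDA update in the setting of Lemma 2. Returns the number of
    leading frequencies that reach the cap 1 - 1/n after the update."""
    p = [1 - 1 / n] * (i - 1) + [0.5] * (n - i + 1)
    pop = [[1 if random.random() < q else 0 for q in p] for _ in range(lam)]
    pop.sort(key=leading_ones, reverse=True)
    new = [sum(x[j] for x in pop[:mu]) / mu for j in range(n)]
    k = 0
    while k < n and new[k] >= 1 - 1 / n:  # frequencies capped at 1 - 1/n
        k += 1
    return k  # typically at least min(n, i + d)

random.seed(2)
print(one_update(n=50, i=10, d=2, mu=200, lam=4000))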
We now prove our main result.

Proof of Theorem 5. We prove that the UMDA samples the optimum after ⌈n/(d + 1)⌉ + ⌈(n/(n − 1)) e ln n⌉ iterations with a probability of at least 1 − 4n^{−1}. Since it performs λ fitness function evaluations each iteration, the theorem follows.

Since LeadingOnes weakly prefers 1 at all positions, by Lemma 1 and µ ≥ 128 n ln n, no frequency drops below 1/4 within 2n iterations with a probability of at least 1 − 2n^{−1}.

Consider an iteration t ≤ 2n such that position i ∈ [n] is critical. Note that µ ≥ ⌈(6(1 − δ)/δ²) ln n⌉ for sufficiently large n. By Lemma 2, with a probability of at least 1 − n^{−3}, for each frequency at position j ∈ [min{n, i + d}], we have p_j^(t+1) = 1 − 1/n. That is, d + 1 additional frequencies are set to 1 − 1/n. Applying a union bound for the first 2n iterations of the UMDA shows that all frequencies are at 1 − 1/n after the first ⌈n/(d + 1)⌉ iterations and stay there for at least n additional iterations with a probability of at least 1 − 2n^{−2}.

Consequently, after the first ⌈n/(d + 1)⌉ iterations, the optimum is sampled in each iteration with a probability of (1 − 1/n)ⁿ ≥ (1 − 1/n)/e. Thus, after ⌈(n/(n − 1)) e ln n⌉ additional iterations, the optimum is sampled with a probability of at least

1 − (1 − (1 − 1/n)/e)^{⌈(n/(n−1)) e ln n⌉} ≥ 1 − n^{−1}.

Overall, by applying a union bound over all failure probabilities, the UMDA needs at most ⌈n/(d + 1)⌉ + ⌈(n/(n − 1)) e ln n⌉ iterations to sample the optimum for the first time with a probability of at least 1 − 4n^{−1}.

We note that we stated explicit constants in the result above as we felt that this eases reading, but we did not try to optimize them. For example, a selection rate of at most some constant less than 1/(16e) can give the same run time guarantee when raising λ by a sufficiently large constant factor. A selection rate of at most some constant less than 1/(4e) can also be tolerated. Then it takes a constant number of iterations to move a critical frequency to 1 − 1/n, so the run time guarantee increases by a constant factor.

4 Lower Bound

Our main insight, which gave our sharper upper bound with a proof simpler than in previous works, was that the UMDA, when optimizing LeadingOnes in the regime of low genetic drift, makes steady progress in each iteration: it sets the frequencies to the maximum value 1 − 1/n in a left-to-right fashion, keeping the other frequencies close to the middle value of 1/2. The increase of the number of frequencies at the maximum value, with a simple Chernoff bound argument, could be shown to be logarithmic in the reciprocal λ/µ of the selection rate.

In this section, we show that the same proof approach (with small modifications) can also be employed to show lower bounds, and in this case, a matching lower bound, which also is the first lower bound for this setting at all.

Theorem 6.
Let δ ∈ (0, 1) be a constant, and let ζ = 1 + δ. Consider the UMDA optimizing LeadingOnes with λ ≥ µ ≥ 128 n ln n and λ ≥ µ · max{1, 4/(3ζ)}. Further, let d = ⌈log_{4/3}(ζλ/µ)⌉, and let ξ = ⌈log_{4/3}(n²λ)⌉ + 1. Then the UMDA samples the optimum after more than λ⌊(n − ξ)/(d + 1)⌋ fitness function evaluations with a probability of at least 1 − 4n^{−1}.

To prove a lower bound via the general idea laid out above, we need to show that frequencies that do not receive a fitness signal do not approach 1 − 1/n due to genetic drift. Here we have to be slightly more careful than in our upper-bound analysis, since now the fitness signal does move the frequencies into the undesired (from the viewpoint of lower-bound proofs) direction. Consequently, we can employ the low-genetic-drift argument only while we are sure that we do not receive a fitness signal (Lemma 3).

Using a Chernoff-type concentration argument (which in principle works similarly for upper and lower bounds), we show that at most roughly log(λ/µ) frequencies above the critical position receive a fitness signal (and thus potentially leave the middle range), see Lemma 4.

Consequently, in the first O(n/log(λ/µ)) iterations, we have many frequencies that are far from the maximum value, and thus sampling the optimum is unlikely (Lemma 5). This yields our lower bound.

To make these arguments precise, we define when a frequency of the UMDA stops being neutral, that is, receives a fitness signal. To this end, we say that a position i ∈ [n] is selection-relevant (with respect to LeadingOnes) in iteration t ∈ ℕ if and only if the offspring population of the UMDA in iteration t has at least µ individuals with at least i − 1 leading 1s, that is, the bit at position i can decide whether an individual is selected for the update or not. We call the largest selection-relevant position in an iteration the maximum selection-relevant position. Note that all positions greater than the maximum selection-relevant position are neutral during this iteration.

Since, by the definition of a selection-relevant position i, all frequencies at positions less than i are set to 1 − 1/n, the critical position for the next iteration is i, too. Thus, bounding the progress of the selection-relevant position also bounds the overall progress of the UMDA on LeadingOnes.
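For illustration, the maximum selection-relevant position of a given offspring population is easy to compute: it is the µ-th largest number of leading 1s in the population, plus one (capped at n). A minimal sketch of ours, reusing leading_ones from the earlier snippet:

def max_selection_relevant(pop, mu, n):
    """Largest position i in [n] such that at least mu individuals have
    at least i - 1 leading 1s, i.e., bit i can still decide selection."""
    los = sorted((leading_ones(x) for x in pop), reverse=True)
    return min(los[mu - 1] + 1, n)  # mu-th largest value, plus one

# Example: if the mu-th best individual has 7 leading 1s, then bit 8 can
# still decide selection, so the maximum selection-relevant position is 8.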
We start by showing that each frequency stays in the interval (1/4, 3/4) until its position becomes selection-relevant.

Lemma 3. Consider the UMDA with λ ≥ µ ≥ 128 n ln n. Further, for each position i ∈ [n], let t′_i ∈ ℕ denote the first iteration such that position i is selection-relevant, and let t_i^sel = min{t′_i, 2n}. Then, with a probability of at least 1 − 2n^{−1}, within the first 2n iterations, for each position i ∈ [n] and for each iteration t ≤ t_i^sel, it holds that p_i^(t) ∈ (1/4, 3/4).

Proof. Consider a position i ∈ [n]. Note that, for all iterations t ≤ t_i^sel, the frequency p_i is neutral. By Theorem 4 with d = 1/4, we see that the probability that p_i leaves the interval (1/4, 3/4) within the first t_i^sel ≤ 2n iterations is at most 2e^{−µ/(32 t_i^sel)} ≤ 2e^{−µ/(32·2n)} ≤ 2n^{−2}, where we used our lower bound on µ. Applying a union bound over all n frequencies yields that at least one frequency leaves the interval (1/4, 3/4) within the first 2n iterations before being selection-relevant with a probability of at most 2n^{−1}, which concludes the proof.

We now show that the maximum selection-relevant position is only roughly log(λ/µ) larger than the critical position during each iteration.
Lemma 4. Let δ ∈ (0, 1) be a constant, and let ζ = 1 + δ. Consider the UMDA optimizing LeadingOnes with µ ≥ (9(1 + δ)/δ²) ln n and λ ≥ µ · max{1, 4/(3ζ)}. Furthermore, consider an iteration t ∈ ℕ such that position i ∈ [n] is critical and that, for all positions j > i, we have p_j^(t) ≤ 3/4. Let d = ⌈log_{4/3}(ζλ/µ)⌉. Then, with a probability of at least 1 − n^{−3}, the maximum selection-relevant position for iteration t is at most min{n, i + d + 1}.

Proof. Note that d ≥ 1, by our assumption on λ. Similar to the proof of Lemma 2, we consider the offspring population of λ individuals sampled in iteration t. Let X denote the number of individuals that have at least i′ := min{n, i + d + 1} leading 1s. By assumption, all frequencies at positions greater than i are at most 3/4. Thus, E[X] ≤ λ(3/4)^{1+d} = λ(4/3)^{−(1+d)} ≤ 3µ/(4ζ) ≤ µ/(1 + δ).

We now apply Theorem 2 in order to show that it is unlikely that at least µ individuals from iteration t have at least i′ leading 1s. Using our bounds on µ and our estimate on E[X] from above, we compute

Pr[X ≥ µ] ≤ Pr[X ≥ (1 + δ)E[X]] ≤ e^{−δ²E[X]/3} ≤ e^{−δ²µ/(3(1+δ))} ≤ n^{−3}.

Thus, with a probability of at least 1 − n^{−3}, fewer than µ individuals have at least i′ leading 1s. This means that the maximum selection-relevant position in this iteration is in [i′].

Before we prove our lower bound, we show that the UMDA does not sample the optimal solution of LeadingOnes with high probability while the critical position is at least logarithmically far away from the end.
Lemma 5.
Consider the UMDA optimizing LeadingOnes with λ ≥ µ. Further, consider an iteration t ∈ ℕ and a position i ∈ [n] such that, for all positions j > i, we have p_j^(t) ≤ 3/4. Then, with a probability of at least 1 − λ(3/4)^{n−i}, the UMDA does not sample the optimum in this iteration.

Proof. By our assumption on the frequencies and on i, the probability that a single individual in the offspring population in iteration t is the all-1s string (that is, the optimum of LeadingOnes) is at most (3/4)^{n−i}. Thus, the probability that none of the λ offspring is optimal is, by Bernoulli's inequality, at least (1 − (3/4)^{n−i})^λ ≥ 1 − λ(3/4)^{n−i}, as desired.
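A quick numeric check of the Bernoulli estimate in this proof, with arbitrary example values of our choosing:

# Probability that none of lam offspring is optimal vs. the Bernoulli bound.
lam, n, i = 10**4, 100, 60
exact = (1 - 0.75 ** (n - i)) ** lam
bound = 1 - lam * 0.75 ** (n - i)
print(exact, bound)  # exact >= bound = 1 - lam * (3/4)^(n - i)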
We now prove our lower bound.

Proof of Theorem 6. We prove that the UMDA needs, with a probability of at least 1 − 4n^{−1}, more than ⌊(n − ξ)/(d + 1)⌋ iterations until it samples the optimum. Since it performs λ fitness function evaluations each iteration, the theorem then follows.

In the following, we assume that all frequencies remain in the interval (1/4, 3/4) for the first 2n iterations as long as they have never been selection-relevant. By Lemma 3, this happens with a probability of at least 1 − 2n^{−1}.

We now prove by induction on the iteration index t ∈ ℕ that, with a probability of at least 1 − (t + 1)n^{−3}, for each position i > 1 + (t + 1)(d + 1), we have that position i is not selection-relevant up to iteration t.

For the base case t = 0, note that position i = 1 is critical and that all frequencies are 1/2 and thus at most 3/4. By Lemma 4, with a probability of at least 1 − n^{−3}, the maximum selection-relevant position is at most d + 2. Thus, all positions greater than d + 2 are not selection-relevant up to iteration 0.

For the inductive step, we assume that our inductive hypothesis holds up to iteration t. Note that this means that, with a probability of at least 1 − (t + 1)n^{−3}, the maximum selection-relevant index is in [1 + (t + 1)(d + 1)] and, thus, the critical position for iteration t + 1 is in [1 + (t + 1)(d + 1)]. By Lemma 3, all frequencies at positions greater than 1 + (t + 1)(d + 1) are thus at most 3/4. (Note that such frequencies are at most 3/4 with a probability of 1, as we condition on this event throughout the proof, as stated at the beginning of the proof.) Then, in iteration t + 1, again by Lemma 4, with a probability of at least 1 − n^{−3}, the maximum selection-relevant index is at most 1 + (t + 1)(d + 1) + d + 1 = 1 + (t + 2)(d + 1). Consequently, via a union bound on the error probabilities of the inductive hypothesis and the current iteration t + 1, with a probability of at least 1 − (t + 2)n^{−3}, no position greater than 1 + (t + 2)(d + 1) is selection-relevant up to iteration t + 1, which proves our claim.

We now assume that n − ξ ≥ 1, as Theorem 6 is trivial otherwise. Our claim above then yields that, up to iteration t′ := ⌊(n − ξ)/(d + 1)⌋ − 1, with a probability of at least 1 − ⌊(n − ξ)/(d + 1)⌋n^{−3} ≥ 1 − n^{−2}, each position greater than 1 + n − ξ was never selection-relevant. This means that, by Lemma 3, all such frequencies are at most 3/4. By Lemma 5 with i = 1 + n − ξ, with a probability of at least 1 − λ(3/4)^{n−i} = 1 − λ(3/4)^{ξ−1} ≥ 1 − n^{−2}, the UMDA does not sample the optimum of LeadingOnes within a single iteration. Applying a union bound over the first t′ + 1 ≤ n iterations, with a probability of at least 1 − n^{−1}, the UMDA does not sample the optimum up to iteration t′ (which are t′ + 1 iterations).

Overall, by a union bound over all error probabilities, with a probability of at least 1 − 4n^{−1}, the UMDA does not sample the optimum within the first t′ + 1 = ⌊(n − ξ)/(d + 1)⌋ iterations, which concludes the proof.

For the sake of completeness, we state the combined result of our upper and lower bound.

Corollary 1 (combining Theorems 5 and 6). Let C be a sufficiently large constant. Consider the UMDA optimizing LeadingOnes with λ ≥ Cµ ≥ 128C n ln n and with λ being bounded from above by a polynomial in n. With a probability of at least 1 − 8n^{−1}, it samples the optimum after Θ(nλ/log(λ/µ)) fitness function evaluations.

Proof. From the assumptions of Theorems 5 and 6, we take the stricter ones. The additive term ⌈(n/(n − 1)) e ln n⌉ in Theorem 5 vanishes in asymptotic notation, and the term n − ξ in Theorem 6 is Ω(n), due to λ being bounded from above by a polynomial in n. Taking the union bound over the failure probabilities of both theorems concludes the proof.
5 Conclusion

We improved the best known upper bound for the run time of the UMDA on LeadingOnes for the case of µ ∈ Ω(n log n) from O(nλ log λ) to O(nλ/log(λ/µ)). This result improves the previous best result both by removing an unnecessary log λ factor and, not discussed in previous works, by gaining a log(λ/µ) factor and thus showing an advantage of using a low selection rate µ/λ. We obtained these results via a different proof method that avoids the technically demanding level-based method. Our arguments can also be employed for lower bounds. We did so and provided the first lower bound for the run time of the UMDA on LeadingOnes. Combined, these results provide a run time estimate for the UMDA on LeadingOnes that is tight up to constant factors.

We note that the general proof idea can be extended also to the parameter regime of µ ∈ o(n log n) for the UMDA. We conjecture that a more general upper bound of the UMDA (with λ ∈ Ω(log n)) on LeadingOnes is

O(λ(n + (n/e^{µ/n})(n/λ + log min{µ, n}))).

Speaking in terms of iterations and thus ignoring the factor of λ, this expression can be explained as follows: the first term of n considers O(n) frequencies that do not drop below constant values. Each of these frequencies is set to 1 − 1/n within a constant number of iterations with high probability. Since λ ∈ Ω(log n), frequencies at 1 − 1/n do not drop until the optimum is sampled with high probability.

The second, more complicated term deals with frequencies that, pessimistically, reached the lower border 1/n. There are O(n/e^{µ/n}) of these frequencies, by the same argument as used in the proof of Lemma 1. The other factor is concerned with the time it takes a critical frequency to be increased to 1 − 1/n with high probability. Here, a case distinction needs to be made with respect to whether µ ≥ n. The inverse of the maximum of µ and n determines the step size in which a critical frequency can be increased. The term log min{µ, n} stems from the multiplicative up-drift [DK19] of such a frequency in order to reach a constant value. Afterward, it is set to 1 − 1/n within a constant number of iterations (as the first O(n) frequencies). Last, the term n/λ is only important if λ ∈ o(n) and denotes the waiting time for a critical frequency to sample at least one 1 with λ tries (given that the prefix consists of only 1s).

Last, we are positive that our proof technique is applicable to a greater class of EDAs. In order to transfer the proof of the upper bound to other univariate EDAs, only Lemmas 1 and 2 need to be adjusted to the specific algorithm, which should work similarly for other EDAs too. For the lower bound, Lemmas 3 to 5 need to be changed.
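Purely as an illustration of this conjectured expression (it is not a proven bound), the following snippet of ours evaluates the term inside the O(·) for parameters of our choosing:

import math

def conjectured_bound(n, mu, lam):
    """Evaluate lam * (n + n / e^(mu/n) * (n / lam + log(min(mu, n)))),
    the expression inside the conjectured O(.) above."""
    drift_freqs = n / math.exp(mu / n)  # frequencies that may hit the lower border
    per_freq = n / lam + math.log(min(mu, n))
    return lam * (n + drift_freqs * per_freq)

print(conjectured_bound(n=100, mu=50, lam=500))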
Acknowledgments

This work was supported by a public grant as part of the Investissement d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with the Gaspard Monge Program for optimization, operations research and their interactions with data sciences. This publication is based upon work from COST Action CA15140, supported by COST.
References

[DK19] Benjamin Doerr and Timo Kötzing. Multiplicative up-drift. In Proc. of GECCO ’19, pages 1470–1478, 2019.

[DL15] Duc-Cuong Dang and Per Kristian Lehre. Simplified runtime analysis of estimation of distribution algorithms. In Proc. of GECCO ’15, pages 513–518, 2015.

[DLN19] Duc-Cuong Dang, Per Kristian Lehre, and Phan Trung Hai Nguyen. Level-based analysis of the univariate marginal distribution algorithm. Algorithmica, 81(2):668–702, 2019.

[Doe19] Benjamin Doerr. A tight runtime analysis for the cGA on jump functions: EDAs can cross fitness valleys at no extra cost. In Proc. of GECCO ’19, pages 1488–1496, 2019.

[Doe20] Benjamin Doerr. Probabilistic tools for the analysis of randomized optimization heuristics. In Theory of Evolutionary Computation—Recent Developments in Discrete Optimization, pages 1–87. Springer International Publishing, 2020. Also available at https://arxiv.org/abs/1801.06733.

[DZ19] Benjamin Doerr and Weijie Zheng. Sharp bounds for genetic drift in EDAs. CoRR, abs/1910.14389, 2019.

[KW18] Martin S. Krejca and Carsten Witt. Lower bounds on the run time of the univariate marginal distribution algorithm on OneMax. Theoretical Computer Science, 2018.

[KW20] Martin S. Krejca and Carsten Witt. Theory of estimation-of-distribution algorithms. In Theory of Evolutionary Computation—Recent Developments in Discrete Optimization, pages 405–442. Springer, 2020. Also available at http://arxiv.org/abs/1806.05392.

[LN17] Per Kristian Lehre and Phan Trung Hai Nguyen. Improved runtime bounds for the univariate marginal distribution algorithm via anti-concentration. In Proc. of GECCO ’17, pages 1383–1390, 2017.

[LN19] Per Kristian Lehre and Phan Trung Hai Nguyen. Runtime analysis of the univariate marginal distribution algorithm under low selective pressure and prior noise. In Proc. of GECCO ’19, pages 1497–1505, 2019.

[LSW18] Johannes Lengler, Dirk Sudholt, and Carsten Witt. Medium step sizes are harmful for the compact genetic algorithm. In Proc. of GECCO ’18, pages 1499–1506, 2018.

[MP96] Heinz Mühlenbein and Gerhard Paaß. From recombination of genes to the estimation of distributions I. Binary parameters. In Proc. of PPSN ’96, pages 178–187, 1996.

[PHL15] Martin Pelikan, Mark Hauschild, and Fernando G. Lobo. Estimation of distribution algorithms. In Springer Handbook of Computational Intelligence, pages 899–928. Springer, 2015.

[Wit17] Carsten Witt. Upper bounds on the runtime of the univariate marginal distribution algorithm on OneMax. In Proc. of GECCO ’17, pages 1415–1422, 2017.