A Simplified Run Time Analysis of the Univariate Marginal Distribution Algorithm on LeadingOnes
Benjamin Doerr, Laboratoire d'Informatique (LIX), CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France

Martin S. Krejca, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany

April 13, 2020
Abstract
With elementary means, we prove a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LeadingOnes benchmark function in the desirable regime with low genetic drift. If the population size is at least quasilinear, then, with high probability, the UMDA samples the optimum within a number of iterations that is linear in the problem size divided by the logarithm of the UMDA's selection rate. This improves over the previous guarantee, obtained by Dang and Lehre (2015) via the deep level-based population method, both in terms of the run time and by demonstrating further run time gains from small selection rates. With similar arguments as in our upper-bound analysis, we also obtain the first lower bound for this problem. Under similar assumptions, we prove that a bound that matches our upper bound up to constant factors holds with high probability.
1 Introduction

Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that create a probabilistic model of the search space and refine it iteratively. In each iteration, the current model of an EDA is used to create some samples which, in turn, are used to adjust the model such that better solutions are more likely to be created in the following iteration. Thus, the model evolves over time into one that creates very good solutions. EDAs have been applied to real-world problems with great success [PHL15].

Within the last few years, the theoretical analysis of EDAs has gained increasing interest, as summarized by Krejca and Witt [KW20]. One of the first papers in this period was by Dang and Lehre [DL15], who proved run time guarantees for the univariate marginal distribution algorithm (UMDA, [MP96]) when optimizing the two classical benchmark functions OneMax and LeadingOnes. While their run time bound for OneMax has been improved since then independently by Lehre and Nguyen [LN17] and Witt [Wit17], the run time bound of O(n² + nλ log λ), where n is the problem dimension and λ is the offspring population size of the UMDA, is the best known result so far on LeadingOnes. (In an extension of [DL15], Dang et al. [DLN19] show the same run time bound but slightly improve the required population sizes.)

In this work, we improve with Theorem 5 the second term of this bound from O(nλ log λ) to O(nλ/log(λ/µ)) when µ = Ω(n log n), where µ ≤ λ is the size of the subpopulation selected for the model update. In the regime of µ = Ω(n log n), the UMDA shows the generally desirable behavior of low genetic drift, that is, the sampling frequencies stay in the middle range of, say, (1/4, 3/4) until a sufficiently strong fitness signal moves them into the right direction. While EDAs are not necessarily inefficient in the presence of stronger genetic drift, their optimization behavior then often becomes similar to a slowed-down version of the (1+1) evolutionary algorithm. Genetic drift, however, can also lead to a performance loss, since it may take long to move a frequency from the wrong boundary value back into the middle range. This has been rigorously shown by Lengler et al. [LSW18].

Equally interesting to the improved run time guarantee is our elementary proof method. While it was truly surprising that Dang and Lehre [DL15] could use the level-based population method to analyze an EDA (which does not have a population that is transferred from one iteration to the next), this method is a highly advanced tool and one that can be difficult to use. In contrast to this, our proof only uses elementary arguments common in the analysis of evolutionary algorithms. We are thus optimistic that our arguments can more easily be applied to other EDAs as well.

We further demonstrate the usability of our proof method by proving a matching lower bound (see Theorem 6), which improves the previously best known lower bounds by Lehre and Nguyen [LN19] for the regime of µ = Ω(n log n). For the regime of µ = Ω(log n) ∩ o(n log n), the bound Ω(nλ/log(λ − µ)) by Lehre and Nguyen remains the best known lower bound. Additionally, Lehre and Nguyen prove a lower bound of e^{Ω(µ)} for µ = Ω(log n) and λ ≤ eµ, which remains untouched by our result.

We note that both of our bounds do not require the fraction µ/λ to be constant, which is a common requirement of many other analyses of the UMDA [DLN19, LN17, Wit17, KW18] (although this is not always explicitly stated in the result). In particular, our bounds show that the gain from reducing the selection rate µ/λ (which often requires a costly increase of λ) is very small, namely, only logarithmic in µ/λ.

Another advantage of our approach is that it gives run time guarantees that hold with high probability, whereas the level-based method, relying on drift arguments, can only give bounds on expected run times. Consequently, the result of Dang and Lehre [DL15] also concerns the expectation only. We believe that a result that holds with high probability is often more relevant, as has also been argued by Doerr [Doe19].

2 Preliminaries

We are concerned with the run time analysis of algorithms optimizing pseudo-Boolean functions, that is, functions f: {0, 1}ⁿ → ℝ, where n ∈ ℕ denotes the dimension of the problem. Given a pseudo-Boolean function f and a bit string x, we refer to f as a fitness function, to x as an individual, and to f(x) as the fitness of x.
For an n ∈ ℕ, we define [n] = [1, n] ∩ ℕ. From now on, if not stated otherwise, the variable n always denotes a natural number. For a vector x of length n, we denote its component at position i ∈ [n] by x_i.

We consider the optimization of the pseudo-Boolean function LeadingOnes: {0, 1}ⁿ → {0} ∪ [n], which returns, for a bit string of length n, the length of the longest prefix of 1s within that bit string. More formally, for all x ∈ {0, 1}ⁿ,

LeadingOnes(x) = Σ_{i=1}^{n} Π_{j=1}^{i} x_j.

Note that the all-1s bit string is the unique global optimum of LeadingOnes.
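As a concrete illustration of this definition, here is a short Python sketch (our addition, not part of the original analysis):

def leading_ones(x):
    """LeadingOnes(x) = sum_{i=1}^{n} prod_{j=1}^{i} x_j,
    i.e., the length of the longest prefix of 1s."""
    count = 0
    for bit in x:
        if bit != 1:
            break  # the prefix of 1s ends here
        count += 1
    return count

# Example: the first three bits are 1, so the fitness is 3.
assert leading_ones([1, 1, 1, 0, 1]) == 3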
Our algorithm of interest is the UMDA (Algorithm 1) with parameters µ, λ ∈ ℕ, µ ≤ λ. It maintains a vector p of probabilities (the frequency vector) of length n, whose components we call frequencies, and it updates this vector iteratively in the following way: first, λ individuals are created independently from one another such that, for each individual x ∈ {0, 1}ⁿ and each position i ∈ [n], it holds that x_i is 1 with probability p_i and 0 otherwise. Then, from these λ individuals, a subset of µ individuals with the highest fitness is chosen (breaking ties uniformly at random), and, for each position i ∈ [n], the frequency p_i is set to the relative number of 1s at position i among the µ best individuals. Last, if a frequency p_i is below 1/n, it is increased to 1/n, and if it is above 1 − 1/n, it is decreased to 1 − 1/n. This prevents frequencies from being stuck at the extremal values 0 or 1. We denote the frequency vector of iteration t ∈ ℕ by p^(t). Note that we start with iteration t = 0.

Algorithm 1: The UMDA [MP96] with parameters µ and λ, µ ≤ λ, maximizing a fitness function f: {0, 1}ⁿ → ℝ with n ≥ 2
  t ← 0;
  p^(t) ← (1/2)_{i ∈ [n]};
  repeat  ⊲ iteration t
      for i ∈ [λ] do x^(i) ← individual sampled via p^(t);
      let y^(1), …, y^(µ) denote the µ best individuals out of x^(1), …, x^(λ) (breaking ties uniformly at random);
      for i ∈ [n] do p_i^(t+1) ← (1/µ) Σ_{j=1}^{µ} y_i^(j);
      restrict p^(t+1) to the interval [1/n, 1 − 1/n];
      t ← t + 1;
  until termination criterion met;
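For readers who prefer running code, the following self-contained Python sketch mirrors Algorithm 1. It is our illustrative reimplementation, not part of the paper; the function names and the tie-breaking via a stable sort (instead of uniformly at random) are our simplifications.

import random

def umda(f, n, mu, lam, max_iters):
    """Minimal UMDA (cf. Algorithm 1): sample lam individuals from the
    frequency vector p, select the mu best, set each frequency to the
    relative number of 1s among them, and clamp to [1/n, 1 - 1/n].
    Returns the first sampled optimum (fitness n), or None."""
    p = [0.5] * n
    for _ in range(max_iters):
        pop = [[1 if random.random() < p[i] else 0 for i in range(n)]
               for _ in range(lam)]
        for x in pop:
            if f(x) == n:       # assumes the optimum has fitness n,
                return x        # as is the case for LeadingOnes
        pop.sort(key=f, reverse=True)   # mu best first (ties broken by sort
        best = pop[:mu]                 # order, not uniformly at random)
        for i in range(n):
            freq = sum(x[i] for x in best) / mu
            p[i] = min(max(freq, 1 / n), 1 - 1 / n)  # border restriction
    return None

# Example with toy parameters of our choosing:
# umda(leading_ones, n=30, mu=60, lam=600, max_iters=100)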
In the context of optimizing LeadingOnes, we say that a position i ∈ [n] of p^(t) is critical in iteration t ∈ ℕ if and only if all of the frequencies at indices less than i are 1 − 1/n and p_i^(t) is less than 1 − 1/n. Intuitively, a critical frequency is the next one that needs to be set to 1 − 1/n in order to optimize LeadingOnes efficiently.

When analyzing the run time of the UMDA optimizing a fitness function f, we are interested in the number T of fitness function evaluations until an optimum of f is sampled for the first time. Since the UMDA is a randomized algorithm, T is a random variable, and we are interested in a bound on T that holds with high probability. Note that the run time T of the UMDA is at most λ times the number I of iterations until an optimum is sampled for the first time. Likewise, T is at least (I − 1)λ + 1.

In order to prove statements on random variables that hold with high probability, we use the following commonly known Chernoff bounds.

Theorem 1 (Chernoff bound [Doe20]). Let k ∈ ℕ, δ ∈ [0, 1], and let X be the sum of k independent random variables, each taking values in [0, 1]. Then

Pr[X ≤ (1 − δ)E[X]] ≤ e^{−δ²E[X]/2}.

Theorem 2 (Chernoff bound [Doe20]). Let k ∈ ℕ, δ ∈ [0, 1], and let X be the sum of k independent random variables, each taking values in [0, 1]. Then

Pr[X ≥ (1 + δ)E[X]] ≤ e^{−δ²E[X]/3}.

The next two theorems, recently proven in [DZ19], give upper bounds on the negative effect of genetic drift on the UMDA. The first result considers the optimization of fitness functions f that weakly prefer a 1 at a position i ∈ [n], that is, for all bit strings x, x′ ∈ {0, 1}ⁿ with x_i = 1, x′_i = 0, and x_j = x′_j for all other positions j ∈ [n] \ {i}, it holds that f(x) ≥ f(x′). In other words, having a 1 at position i always yields a fitness at least as good as when having a 0 at i. Note that LeadingOnes weakly prefers a 1 in all bit positions. The theorem states that the frequency at such a position i does not drop far below its initial value for a long time.

Theorem 3 ([DZ19, Theorem 7]). Consider the UMDA with parameters µ and λ optimizing a function f that weakly prefers a 1 at position i ∈ [n]. Then, for all d > 0 and all iterations t ∈ ℕ, we have

Pr[∀ t′ ∈ [0..t]: p_i^(t′) > 1/2 − d] ≥ 1 − 2e^{−d²µ/(2t)}.

The next theorem considers the case that there is no preference for a bit value at position i ∈ [n], that is, for all bit strings x, x′ ∈ {0, 1}ⁿ with x_i = 1, x′_i = 0, and x_j = x′_j for all other positions j ∈ [n] \ {i}, it holds that f(x) = f(x′). Given this assumption, we call position i neutral.

Theorem 4 ([DZ19, Corollary 2]). Consider the UMDA with parameters µ and λ optimizing a function f such that position i ∈ [n] is neutral. Then, for all d > 0 and all iterations t ∈ ℕ, we have

Pr[∀ t′ ∈ [0..t]: p_i^(t′) ∈ (1/2 − d, 1/2 + d)] ≥ 1 − 2e^{−d²µ/(2t)}.
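The behavior behind Theorem 4 is easy to observe empirically: at a neutral position, the bits of the µ selected individuals are independent Bernoulli(p) samples, so the frequency is updated to Bin(µ, p)/µ (before the border restriction) and performs an unbiased random walk. The following small experiment is our illustration, with arbitrary toy parameters; for a number of iterations well below µ, the walk typically stays inside (1/4, 3/4).

import random

def neutral_walk(mu, iters, n=100):
    """Frequency at a neutral position: each of the mu selected
    individuals carries a 1 with probability p, independently of
    selection, so p is updated to Bin(mu, p) / mu (then clamped)."""
    p = 0.5
    for _ in range(iters):
        ones = sum(random.random() < p for _ in range(mu))
        p = min(max(ones / mu, 1 / n), 1 - 1 / n)
    return p

random.seed(1)
samples = [neutral_walk(mu=1000, iters=30) for _ in range(20)]
print(min(samples), max(samples))  # typically well inside (1/4, 3/4)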
3 Upper Bound

In the following, we present our simple and intuitive run time analysis for the upper bound of the UMDA optimizing LeadingOnes, which gives the following theorem.

Theorem 5.
Let δ ∈ (0, 1) be a constant, and let ζ = (1 − δ)/(4e). Consider the UMDA optimizing LeadingOnes with µ ≥ 128 n ln n and λ ≥ 4µ/ζ. Further, let d = ⌊log₄(ζλ/µ)⌋. Then the UMDA samples the optimum after at most λ(⌈n/(d + 1)⌉ + ⌈(n/(n − 1)) e ln n⌉) fitness function evaluations with a probability of at least 1 − 4n^{−1}.

As discussed in the introduction, we only want to consider the regime with low genetic drift. Hence, we first argue that no frequency drops below 1/4 before the optimum is sampled (Lemma 1). Then we show that, in this case, in each iteration, roughly log(λ/µ) additional frequencies are set to 1 − 1/n. More specifically, if i ∈ [n] is critical, then all frequencies at positions roughly up to i + log(λ/µ) are set to 1 − 1/n (Lemma 2). Thus, a total of roughly n/log(λ/µ) iterations suffice to move all frequencies to 1 − 1/n. From such a state, the optimum is sampled with high probability after a logarithmic number of iterations.
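To get a feeling for the magnitude of the guarantee of Theorem 5, the following snippet (our addition) evaluates the bound for one hypothetical parameter choice; the constants follow the statement above.

import math

def theorem5_bound(n, mu, lam, delta=0.5):
    """Evaluate the run time bound of Theorem 5:
    lam * (ceil(n / (d + 1)) + ceil(n / (n - 1) * e * ln n)),
    where d = floor(log4(zeta * lam / mu)) and zeta = (1 - delta) / (4e)."""
    zeta = (1 - delta) / (4 * math.e)
    d = math.floor(math.log(zeta * lam / mu, 4))
    return lam * (math.ceil(n / (d + 1))
                  + math.ceil(n / (n - 1) * math.e * math.log(n)))

# Hypothetical example: n = 100, mu = 128 n ln n, lam = 4 mu / zeta.
n = 100
mu = math.ceil(128 * n * math.log(n))
lam = math.ceil(4 * mu / ((1 - 0.5) / (4 * math.e)))
print(theorem5_bound(n, mu, lam))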
We start by proving that the following parameter setting ensures that no frequency drops below the value 1/4 within 2n iterations with high probability.

Lemma 1. Consider the UMDA with λ ≥ µ ≥ 128 n ln n. Assume that it optimizes a function that weakly prefers a 1 at all positions. Then, with a probability of at least 1 − 2n^{−1}, each frequency will stay at a value of at least 1/4 for the first 2n iterations.

Proof. Consider an iteration t ≤ 2n as well as a position i ∈ [n]. By Theorem 3 with d = 1/4, we see that the probability that p_i drops below 1/4 within the first t ≤ 2n iterations is at most 2e^{−µ/(32t)} ≤ 2e^{−µ/(32·2n)} ≤ 2n^{−2}, where we used our bound on µ. Applying a union bound over all n frequencies gives the claim.

We now prove that a critical frequency, all its preceding frequencies, as well as roughly log₄(λ/µ) following frequencies are set to 1 − 1/n within a single iteration. That is, we increase roughly 1 + log₄(λ/µ) new frequencies to their maximum value.
Lemma 2. Let δ ∈ (0, 1) be a constant, and let ζ = (1 − δ)/(4e). Consider the UMDA optimizing LeadingOnes with µ ≥ (6(1 − δ)/δ²) ln n and λ ≥ 4µ/ζ. Furthermore, consider an iteration t ∈ ℕ such that position i ∈ [n] is critical and that, for all positions j ≥ i, we have p_j^(t) ≥ 1/4. Let d = ⌊log₄(ζλ/µ)⌋. Then, with a probability of at least 1 − n^{−3}, for all positions j ∈ [min{n, i + d}], we have p_j^(t+1) = 1 − 1/n.

Proof. Note that d ≥ 1, by our assumption on λ. We look at the population of λ individuals that is sampled in iteration t and determine the number X of individuals that have at least i′ := min{n, i + d} leading 1s. Since the frequencies at all positions less than i are at 1 − 1/n, the probability that all of these frequencies sample a 1 for a single individual is (1 − 1/n)^{i−1} ≥ (1 − 1/n)^{n−1} ≥ 1/e. Further, since the probability to sample a 1 at positions at least i is at least 1/4, we have E[X] ≥ (λ/e)·4^{−(1+d)} ≥ µ/(4eζ) = µ/(1 − δ).

We now apply Theorem 1 in order to show that it is unlikely that fewer than µ individuals from iteration t have at least i′ leading 1s. Using our bounds on µ and our estimate on E[X] from above, we compute

Pr[X < µ] ≤ Pr[X ≤ (1 − δ)E[X]] ≤ e^{−δ²E[X]/2} ≤ e^{−δ²µ/(2(1−δ))} ≤ n^{−3}.

Thus, with a probability of at least 1 − n^{−3}, at least µ individuals have at least i′ leading 1s. Since the UMDA is optimizing LeadingOnes, in this case, all of the selected top µ individuals have at least i′ leading 1s, which results in all frequencies at positions in [i′] being set to 1 − 1/n, that is, for all j ∈ [i′], we have p_j^(t+1) = 1 − 1/n.
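The mechanism of Lemma 2 can be observed directly by simulating a single UMDA update from the assumed state (first i − 1 frequencies at 1 − 1/n, the remaining ones at 1/2 ≥ 1/4). The following sketch is our illustration with toy parameters of our choosing; leading_ones is redefined for self-containedness.

import random

def leading_ones(x):
    k = 0
    for bit in x:
        if bit != 1:
            break
        k += 1
    return k

def one_update(n, i, d, mu, lam):
    """One UMDA update in the setting of Lemma 2. Returns the number of
    leading frequencies that reach the cap 1 - 1/n after the update."""
    p = [1 - 1 / n] * (i - 1) + [0.5] * (n - i + 1)
    pop = [[1 if random.random() < q else 0 for q in p] for _ in range(lam)]
    pop.sort(key=leading_ones, reverse=True)
    new = [sum(x[j] for x in pop[:mu]) / mu for j in range(n)]
    k = 0
    while k < n and new[k] >= 1 - 1 / n:  # frequencies capped at 1 - 1/n
        k += 1
    return k  # typically at least min(n, i + d)

random.seed(2)
print(one_update(n=50, i=10, d=2, mu=200, lam=4000))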
We now prove our main result.

Proof of Theorem 5. We prove that the UMDA samples the optimum after ⌈n/(d + 1)⌉ + ⌈(n/(n − 1)) e ln n⌉ iterations with a probability of at least 1 − 4n^{−1}. Since it performs λ fitness function evaluations each iteration, the theorem follows.

Since LeadingOnes weakly prefers 1 at all positions, by Lemma 1 and µ ≥ 128 n ln n, no frequency drops below 1/4 within 2n iterations with a probability of at least 1 − 2n^{−1}.

Consider an iteration t ≤ 2n such that position i ∈ [n] is critical. Note that µ ≥ ⌈(6(1 − δ)/δ²) ln n⌉ for sufficiently large n. By Lemma 2, with a probability of at least 1 − n^{−3}, for each frequency at position j ∈ [min{n, i + d}], we have p_j^(t+1) = 1 − 1/n. That is, d + 1 additional frequencies are set to 1 − 1/n. Applying a union bound for the first 2n iterations of the UMDA shows that all frequencies are at 1 − 1/n after the first ⌈n/(d + 1)⌉ iterations and stay there for at least n additional iterations with a probability of at least 1 − 2n^{−2}.

Consequently, after the first ⌈n/(d + 1)⌉ iterations, the optimum is sampled in each iteration with a probability of (1 − 1/n)ⁿ ≥ (1 − 1/n)/e. Thus, after ⌈(n/(n − 1)) e ln n⌉ additional iterations, the optimum is sampled with a probability of at least

1 − (1 − (1 − 1/n)/e)^{⌈(n/(n−1)) e ln n⌉} ≥ 1 − n^{−1}.

Overall, by applying a union bound over all failure probabilities, the UMDA needs at most ⌈n/(d + 1)⌉ + ⌈(n/(n − 1)) e ln n⌉ iterations to sample the optimum for the first time with a probability of at least 1 − 4n^{−1}.

We note that we stated explicit constants in the result above as we felt that this eases reading, but we did not try to optimize them. For example, a selection rate of at most some constant less than 1/(16e) can give the same run time guarantee when raising λ by a sufficiently large constant factor. A selection rate of at most some constant less than 1/(4e) can also be tolerated. Then it takes a constant number of iterations to move a critical frequency to 1 − 1/n, so the run time guarantee increases by a constant factor.

4 Lower Bound

Our main insight, which gave our sharper upper bound with a proof simpler than in previous works, was that the UMDA, when optimizing LeadingOnes in the regime of low genetic drift, makes steady progress in each iteration: it sets the frequencies to the maximum value 1 − 1/n in a left-to-right fashion, keeping the other frequencies close to the middle value of 1/2. The increase of the number of frequencies at the maximum value, with a simple Chernoff bound argument, could be shown to be logarithmic in the reciprocal λ/µ of the selection rate.

In this section, we show that the same proof approach (with small modifications) can also be employed to show lower bounds, and in this case, a matching lower bound, which also is the first lower bound for this setting at all.

Theorem 6.
Let δ ∈ (0, 1) be a constant, and let ζ = 1 + δ. Consider the UMDA optimizing LeadingOnes with λ ≥ µ ≥ 128 n ln n and λ ≥ µ · max{1, 4/(3ζ)}. Further, let d = ⌈log_{4/3}(ζλ/µ)⌉, and let ξ = ⌈log_{4/3}(n²λ)⌉ + 1. Then the UMDA samples the optimum after more than λ⌊(n − ξ)/(d + 1)⌋ fitness function evaluations with a probability of at least 1 − 4n^{−1}.

To prove a lower bound via the general idea laid out above, we need to show that frequencies that do not receive a fitness signal do not approach 1 − 1/n due to genetic drift. Here we have to be slightly more careful than in our upper-bound analysis, since now the fitness signal does move the frequencies into the undesired (from the viewpoint of lower-bound proofs) direction. Consequently, we can employ the low-genetic-drift argument only while we are sure that we do not receive a fitness signal (Lemma 3).

Using a Chernoff-type concentration argument (which in principle works similarly for upper and lower bounds), we show that at most roughly log(λ/µ) frequencies above the critical position receive a fitness signal (and thus potentially leave the middle range), see Lemma 4.

Consequently, in the first O(n/log(λ/µ)) iterations, we have many frequencies that are far from the maximum value, and thus sampling the optimum is unlikely (Lemma 5). This yields our lower bound.

To make these arguments precise, we define when a frequency of the UMDA stops being neutral, that is, receives a fitness signal. To this end, we say that a position i ∈ [n] is selection-relevant (with respect to LeadingOnes) in iteration t ∈ ℕ if and only if the offspring population of the UMDA in iteration t has at least µ individuals with at least i − 1 leading 1s, that is, the bit at position i can decide whether an individual is selected for the update or not. We call the largest selection-relevant position in an iteration the maximum selection-relevant position. Note that all positions greater than the maximum selection-relevant position are neutral during this iteration.

Since, by the definition of a selection-relevant position i, all frequencies at positions less than i are set to 1 − 1/n, the critical position for the next iteration is i, too. Thus, bounding the progress of the selection-relevant position also bounds the overall progress of the UMDA on LeadingOnes.
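For illustration, the maximum selection-relevant position of a given offspring population is easy to compute: it is the µ-th largest number of leading 1s in the population, plus one (capped at n). A minimal sketch of ours, reusing leading_ones from the earlier snippet:

def max_selection_relevant(pop, mu, n):
    """Largest position i in [n] such that at least mu individuals have
    at least i - 1 leading 1s, i.e., bit i can still decide selection."""
    los = sorted((leading_ones(x) for x in pop), reverse=True)
    return min(los[mu - 1] + 1, n)  # mu-th largest value, plus one

# Example: if the mu-th best individual has 7 leading 1s, then bit 8 can
# still decide selection, so the maximum selection-relevant position is 8.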
We start by showing that each frequency stays in the interval (1/4, 3/4) until its position becomes selection-relevant.

Lemma 3. Consider the UMDA with λ ≥ µ ≥ 128 n ln n. Further, for each position i ∈ [n], let t′_i ∈ ℕ denote the first iteration such that position i is selection-relevant, and let t_i^sel = min{t′_i, 2n}. Then, with a probability of at least 1 − 2n^{−1}, within the first 2n iterations, for each position i ∈ [n] and for each iteration t ≤ t_i^sel, it holds that p_i^(t) ∈ (1/4, 3/4).

Proof. Consider a position i ∈ [n]. Note that, for all iterations t ≤ t_i^sel, the frequency p_i is neutral. By Theorem 4 with d = 1/4, we see that the probability that p_i leaves the interval (1/4, 3/4) within the first t_i^sel ≤ 2n iterations is at most 2e^{−µ/(32 t_i^sel)} ≤ 2e^{−µ/(32·2n)} ≤ 2n^{−2}, where we used our lower bound on µ. Applying a union bound over all n frequencies yields that at least one frequency leaves the interval (1/4, 3/4) within the first 2n iterations before being selection-relevant with a probability of at most 2n^{−1}, which concludes the proof.

We now show that the maximum selection-relevant position is only roughly log(λ/µ) larger than the critical position during each iteration.
Lemma 4. Let δ ∈ (0, 1) be a constant, and let ζ = 1 + δ. Consider the UMDA optimizing LeadingOnes with µ ≥ (9(1 + δ)/δ²) ln n and λ ≥ µ · max{1, 4/(3ζ)}. Furthermore, consider an iteration t ∈ ℕ such that position i ∈ [n] is critical and that, for all positions j > i, we have p_j^(t) ≤ 3/4. Let d = ⌈log_{4/3}(ζλ/µ)⌉. Then, with a probability of at least 1 − n^{−3}, the maximum selection-relevant position for iteration t is at most min{n, i + d + 1}.

Proof. Note that d ≥ 1, by our assumption on λ. Similar to the proof of Lemma 2, we consider the offspring population of λ individuals sampled in iteration t. Let X denote the number of individuals that have at least i′ := min{n, i + d + 1} leading 1s. By assumption, all frequencies at positions greater than i are at most 3/4. Thus, E[X] ≤ λ(3/4)^{1+d} = λ(4/3)^{−(1+d)} ≤ 3µ/(4ζ) ≤ µ/(1 + δ).

We now apply Theorem 2 in order to show that it is unlikely that at least µ individuals from iteration t have at least i′ leading 1s. Using our bounds on µ and our estimate on E[X] from above, we compute

Pr[X ≥ µ] ≤ Pr[X ≥ (1 + δ)E[X]] ≤ e^{−δ²E[X]/3} ≤ e^{−δ²µ/(3(1+δ))} ≤ n^{−3}.

Thus, with a probability of at least 1 − n^{−3}, fewer than µ individuals have at least i′ leading 1s. This means that the maximum selection-relevant position in this iteration is in [i′].

Before we prove our lower bound, we show that the UMDA does not sample the optimal solution of LeadingOnes with high probability while the critical position is at least logarithmically far away from the end.
Lemma 5.
Consider the UMDA optimizing LeadingOnes with λ ≥ µ. Further, consider an iteration t ∈ ℕ and a position i ∈ [n] such that, for all positions j > i, we have p_j^(t) ≤ 3/4. Then, with a probability of at least 1 − λ(3/4)^{n−i}, the UMDA does not sample the optimum in this iteration.

Proof. By our assumption on the frequencies and on i, the probability that a single individual in the offspring population in iteration t is the all-1s string (that is, the optimum of LeadingOnes) is at most (3/4)^{n−i}. Thus, the probability that none of the λ offspring is optimal is, by Bernoulli's inequality, at least (1 − (3/4)^{n−i})^λ ≥ 1 − λ(3/4)^{n−i}, as desired.
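A quick numeric check of the Bernoulli estimate in this proof, with arbitrary example values of our choosing:

# Probability that none of lam offspring is optimal vs. the Bernoulli bound.
lam, n, i = 10**4, 100, 60
exact = (1 - 0.75 ** (n - i)) ** lam
bound = 1 - lam * 0.75 ** (n - i)
print(exact, bound)  # exact >= bound = 1 - lam * (3/4)^(n - i)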
We now prove our lower bound.

Proof of Theorem 6. We prove that the UMDA needs, with a probability of at least 1 − 4n^{−1}, more than ⌊(n − ξ)/(d + 1)⌋ iterations until it samples the optimum. Since it performs λ fitness function evaluations each iteration, the theorem then follows.

In the following, we assume that all frequencies remain in the interval (1/4, 3/4) for the first 2n iterations as long as they have never been selection-relevant. By Lemma 3, this happens with a probability of at least 1 − 2n^{−1}.

We now prove by induction on the iteration index t ∈ ℕ that, with a probability of at least 1 − (t + 1)n^{−3}, for each position i > 1 + (t + 1)(d + 1), we have that position i is not selection-relevant up to iteration t.

For the base case t = 0, note that position i = 1 is critical and that all frequencies are 1/2 and thus at most 3/4. By Lemma 4, with a probability of at least 1 − n^{−3}, the maximum selection-relevant position is at most d + 2. Thus, all positions greater than d + 2 are not selection-relevant up to iteration 0.

For the inductive step, we assume that our inductive hypothesis holds up to iteration t. Note that this means that, with a probability of at least 1 − (t + 1)n^{−3}, the maximum selection-relevant index is in [1 + (t + 1)(d + 1)] and, thus, the critical position for iteration t + 1 is in [1 + (t + 1)(d + 1)]. By Lemma 3, all frequencies at positions greater than 1 + (t + 1)(d + 1) are thus at most 3/4. (Note that such frequencies are at most 3/4 with a probability of 1, as we condition on this event throughout the proof, as stated at the beginning of the proof.) Then, in iteration t + 1, again by Lemma 4, with a probability of at least 1 − n^{−3}, the maximum selection-relevant index is at most 1 + (t + 1)(d + 1) + d + 1 = 1 + (t + 2)(d + 1). Consequently, via a union bound on the error probabilities of the inductive hypothesis and the current iteration t + 1, with a probability of at least 1 − (t + 2)n^{−3}, no position greater than 1 + (t + 2)(d + 1) is selection-relevant up to iteration t + 1, which proves our claim.

We now assume that n − ξ ≥ 1, as Theorem 6 is trivial otherwise. Our claim above then yields that, up to iteration t′ := ⌊(n − ξ)/(d + 1)⌋ − 1, with a probability of at least 1 − ⌊(n − ξ)/(d + 1)⌋n^{−3} ≥ 1 − n^{−2}, each position greater than 1 + n − ξ was never selection-relevant. This means that, by Lemma 3, all such frequencies are at most 3/4. By Lemma 5 with i = 1 + n − ξ, with a probability of at least 1 − λ(3/4)^{n−i} = 1 − λ(3/4)^{ξ−1} ≥ 1 − n^{−2}, the UMDA does not sample the optimum of LeadingOnes within a single iteration. Applying a union bound over the first t′ + 1 ≤ n iterations, with a probability of at least 1 − n^{−1}, the UMDA does not sample the optimum up to iteration t′ (which are t′ + 1 iterations).

Overall, by a union bound over all error probabilities, with a probability of at least 1 − 4n^{−1}, the UMDA does not sample the optimum within the first t′ + 1 = ⌊(n − ξ)/(d + 1)⌋ iterations, which concludes the proof.

For the sake of completeness, we state the combined result of our upper and lower bound.

Corollary 1 (combining Theorems 5 and 6). Let C be a sufficiently large constant. Consider the UMDA optimizing LeadingOnes with λ ≥ Cµ ≥ 128C n ln n and with λ being bounded from above by a polynomial in n. With a probability of at least 1 − 8n^{−1}, it samples the optimum after Θ(nλ/log(λ/µ)) fitness function evaluations.

Proof. From the assumptions of Theorems 5 and 6, we take the stricter ones. The additive term ⌈(n/(n − 1)) e ln n⌉ in Theorem 5 vanishes in asymptotic notation, and the term n − ξ in Theorem 6 is Ω(n), due to λ being bounded from above by a polynomial in n. Taking the union bound over the failure probabilities of both theorems concludes the proof.
5 Conclusion

We improved the best known upper bound for the run time of the UMDA on LeadingOnes for the case of µ ∈ Ω(n log n) from O(nλ log λ) to O(nλ/log(λ/µ)). This result improves the previous best result both by removing an unnecessary log λ factor and, not discussed in previous works, by gaining a log(λ/µ) factor and thus showing an advantage of using a low selection rate µ/λ. We obtained these results via a different proof method that avoids the technically demanding level-based method. Our arguments can also be employed for lower bounds. We did so and provided the first lower bound for the run time of the UMDA on LeadingOnes. Combined, these results provide a run time estimate for the UMDA on LeadingOnes that is tight up to constant factors.

We note that the general proof idea can be extended also to the parameter regime of µ ∈ o(n log n) for the UMDA. We conjecture that a more general upper bound of the UMDA (with λ ∈ Ω(log n)) on LeadingOnes is

O(λ(n + (n/e^{µ/n})(n/λ + log min{µ, n}))).

Speaking in terms of iterations and thus ignoring the factor of λ, this expression can be explained as follows: the first term of n considers O(n) frequencies that do not drop below constant values. Each of these frequencies is set to 1 − 1/n within a constant number of iterations with high probability. Since λ ∈ Ω(log n), frequencies at 1 − 1/n do not drop until the optimum is sampled with high probability.

The second, more complicated term deals with frequencies that, pessimistically, reached the lower border 1/n. There are O(n/e^{µ/n}) of these frequencies, by the same argument as used in the proof of Lemma 1. The other factor is concerned with the time it takes a critical frequency to be increased to 1 − 1/n with high probability. Here, a case distinction needs to be made with respect to whether µ ≥ n. The inverse of the maximum of µ and n determines the step size in which a critical frequency can be increased. The term log min{µ, n} stems from the multiplicative up-drift [DK19] of such a frequency in order to reach a constant value. Afterward, it is set to 1 − 1/n within a constant number of iterations (as the first O(n) frequencies). Last, the term n/λ is only important if λ ∈ o(n) and denotes the waiting time for a critical frequency to sample at least one 1 with λ tries (given that the prefix consists of only 1s).

Last, we are positive that our proof technique is applicable to a greater class of EDAs. In order to transfer the proof of the upper bound to other univariate EDAs, only Lemmas 1 and 2 need to be adjusted to the specific algorithm, which should work similarly for other EDAs too. For the lower bound, Lemmas 3 to 5 need to be changed.
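Purely as an illustration of this conjectured expression (it is not a proven bound), the following snippet of ours evaluates the term inside the O(·) for parameters of our choosing:

import math

def conjectured_bound(n, mu, lam):
    """Evaluate lam * (n + n / e^(mu/n) * (n / lam + log(min(mu, n)))),
    the expression inside the conjectured O(.) above."""
    drift_freqs = n / math.exp(mu / n)  # frequencies that may hit the lower border
    per_freq = n / lam + math.log(min(mu, n))
    return lam * (n + drift_freqs * per_freq)

print(conjectured_bound(n=100, mu=50, lam=500))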
Acknowledgments

This work was supported by a public grant as part of the Investissement d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with the Gaspard Monge Program for optimization, operations research and their interactions with data sciences. This publication is based upon work from COST Action CA15140, supported by COST.
References

[DK19] Benjamin Doerr and Timo Kötzing. Multiplicative up-drift. In Proc. of GECCO ’19, pages 1470–1478, 2019.

[DL15] Duc-Cuong Dang and Per Kristian Lehre. Simplified runtime analysis of estimation of distribution algorithms. In Proc. of GECCO ’15, pages 513–518, 2015.

[DLN19] Duc-Cuong Dang, Per Kristian Lehre, and Phan Trung Hai Nguyen. Level-based analysis of the univariate marginal distribution algorithm. Algorithmica, 81(2):668–702, 2019.

[Doe19] Benjamin Doerr. A tight runtime analysis for the cGA on jump functions: EDAs can cross fitness valleys at no extra cost. In Proc. of GECCO ’19, pages 1488–1496, 2019.

[Doe20] Benjamin Doerr. Probabilistic tools for the analysis of randomized optimization heuristics. In Theory of Evolutionary Computation—Recent Developments in Discrete Optimization, pages 1–87. Springer International Publishing, 2020. Also available at https://arxiv.org/abs/1801.06733.

[DZ19] Benjamin Doerr and Weijie Zheng. Sharp bounds for genetic drift in EDAs. CoRR, abs/1910.14389, 2019.

[KW18] Martin S. Krejca and Carsten Witt. Lower bounds on the run time of the univariate marginal distribution algorithm on OneMax. Theoretical Computer Science, 2018.

[KW20] Martin S. Krejca and Carsten Witt. Theory of estimation-of-distribution algorithms. In Theory of Evolutionary Computation—Recent Developments in Discrete Optimization, pages 405–442. Springer, 2020. Also available at http://arxiv.org/abs/1806.05392.

[LN17] Per Kristian Lehre and Phan Trung Hai Nguyen. Improved runtime bounds for the univariate marginal distribution algorithm via anti-concentration. In Proc. of GECCO ’17, pages 1383–1390, 2017.

[LN19] Per Kristian Lehre and Phan Trung Hai Nguyen. Runtime analysis of the univariate marginal distribution algorithm under low selective pressure and prior noise. In Proc. of GECCO ’19, pages 1497–1505, 2019.

[LSW18] Johannes Lengler, Dirk Sudholt, and Carsten Witt. Medium step sizes are harmful for the compact genetic algorithm. In Proc. of GECCO ’18, pages 1499–1506, 2018.

[MP96] Heinz Mühlenbein and Gerhard Paaß. From recombination of genes to the estimation of distributions I. Binary parameters. In Proc. of PPSN ’96, pages 178–187, 1996.

[PHL15] Martin Pelikan, Mark Hauschild, and Fernando G. Lobo. Estimation of distribution algorithms. In Springer Handbook of Computational Intelligence, pages 899–928. Springer, 2015.

[Wit17] Carsten Witt. Upper bounds on the runtime of the univariate marginal distribution algorithm on OneMax. In Proc. of GECCO ’17, pages 1415–1422, 2017.