Stagnation Detection with Randomized Local Search
Amirhossein Rajabi, Technical University of Denmark, Kgs. Lyngby, Denmark ([email protected])
Carsten Witt, Technical University of Denmark, Kgs. Lyngby, Denmark ([email protected])
February 9, 2021
Abstract
Recently a mechanism called stagnation detection was proposed that automatically adjusts the mutation rate of evolutionary algorithms when they encounter local optima. The so-called SD-(1+1) EA introduced by Rajabi and Witt (GECCO 2020) adds stagnation detection to the classical (1+1) EA with standard bit mutation, which flips each bit independently with some mutation rate, and raises the mutation rate when the algorithm is likely to have encountered local optima.

In this paper, we investigate stagnation detection in the context of the k-bit flip operator of randomized local search that flips k bits chosen uniformly at random, and let stagnation detection adjust the parameter k. We obtain improved runtime results compared to the SD-(1+1) EA amounting to a speed-up of up to e ≈ 2.72. Moreover, we propose additional schemes that prevent infinite optimization times even if the algorithm misses a working choice of k due to unlucky events. Finally, we present an example where standard bit mutation still outperforms the local k-bit flip with stagnation detection.

1 Introduction

Evolutionary Algorithms (EAs) are parameterized algorithms, so it has been an ongoing research question how to choose their parameters best. Static parameter settings are not efficient for a wide range of problems. Moreover, given a specific problem, there might be different scenarios during the optimization, so that one static parameter configuration is inefficient for the whole run. Self-adjusting mechanisms address this issue as a non-static parameter control framework that can learn acceptable or even near-optimal parameter settings on the fly. See also the survey article by Doerr and Doerr (2020) for a detailed coverage of static and non-static parameter control.

Many studies have been conducted on frameworks which adjust the mutation rate of different mutation operators, in particular the standard bit mutation on the search space of bit strings $\{0,1\}^n$, to make the rate efficient on unimodal functions. For example, the (1+(λ,λ)) GA self-adjusts its parameters via the 1/5-rule on OneMax (Doerr and Doerr, 2018), resulting in asymptotic speed-ups compared to static settings. Likewise, the self-adjusting mechanism in the (1+λ) EA with two rates proposed in Doerr et al. (2019) performs on unimodal functions as efficiently as the best λ-parallel unary unbiased black-box algorithm.

The self-adjusting frameworks mentioned above are mainly designed to optimize unimodal functions. Generally, they are not able to suggest an efficient parameter setting when algorithms get stuck in a local optimum, since they mainly work based on the number of successes, so there is no signal in such a situation. On multimodal functions, where some specific number of bits has to flip to make progress, Stagnation Detection (SD), introduced in Rajabi and Witt (2020), can overcome local optima in efficient time. This module can be added to most of the existing algorithms to leave local optima without any significant increase of the optimization time on unimodal (sub)problems.

To our knowledge, no study has put forward other runtime analyses of self-adjusting mechanisms on multimodal functions. However, in the broader context of mutation-based randomized search heuristics, the heavy-tailed mutation presented in Doerr et al. (2017) has been able to leave a local optimum in much more efficient time than the standard bit mutation does. Moreover, in the context of artificial immune systems (Corus, Oliveto and Yazdani, 2018) and hyperheuristics (Lissovoi, Oliveto and Warwicker, 2019), there are proofs that specific search operators and the selection of low-level heuristics can speed up multimodal optimization compared to the classical mutation operators.

Recent theoretical research on evolutionary algorithms in discrete search spaces mainly considers global mutations which can create all possible points in one iteration.
These mutations have been functional in optimization scenarios where information about the difficulties of the local optima is not available. For example, the standard bit mutation, which flips each bit independently with a non-zero probability, can produce any point in the search space. However, local mutations can only create a fixed set of offspring points. The 1-bit flip mutation that is often found in the Randomized Local Search algorithm (RLS) can only reach a limited number of search points, which can result in being stuck in a local optimum under elitist selection. Nevertheless, local mutations may outperform global mutations on unimodal functions and on multimodal functions with known gap sizes. It is of special interest to use the advantages of local mutations on unimodal (sub)functions and additionally to overcome local optima efficiently.

This paper investigates k-bit flip mutation as a local mutation in the context of the above-mentioned stagnation detection mechanism. This mechanism detects when the algorithm is stuck in a local optimum and gradually increases the mutation strength (i.e., the number of flipped bits) to a value the algorithm needs to leave the local optimum. Similarly, we aim to show that algorithms using the k-bit flip can use stagnation detection to tune the parameter k. One of the key benefits of such algorithms is the efficiency of RLS, which performs very well on unimodal (sub)problems, without the fear of infinite running time in local optima. An additional advantage of using k-bit flip mutation accompanied by stagnation detection is that it overcomes local optima more efficiently than global mutations.
Moreover, the outcome points out the advantages and practicability of our self-adjusting approach that makes local-mutation algorithms able to optimize functions that have been intractable for them so far.

We propose two algorithms combining stagnation detection with local mutations. The first algorithm, called SD-RLS, gradually increases the mutation strength when the current strength has been unsuccessful in finding improvements for a significantly long time. In the most extreme case, the strength ends at n, i.e., mutations flipping all bits. With high probability, SD-RLS has a runtime that is by a factor of $(en/m)^m/\binom{n}{m}$ (up to lower-order terms) smaller on functions with Hamming gaps of size m than that of the SD-(1+1) EA previously considered in Rajabi and Witt (2020). This improvement is especially strong for small m and amounts to a factor of e on unimodal functions. Although it is unlikely that the algorithm fails to find an improvement when the current strength allows this, there is a risk that this algorithm misses the "right" strength and therefore it can have infinite expected runtime. To address this, we propose a second algorithm called SD-RLS∗ that repeatedly loops over all strengths smaller than the last attempted one when it fails to find an improvement. This results in expected finite optimization time on all problems and only increases the typical runtime by lower-order terms compared to SD-RLS. We also observe that the algorithms we obtain can still follow the same search trajectory as the classical RLS when one-bit flips are sufficient to make improvements.
In those cases, well-established techniques for the analysis of RLS like the fitness-level method carry over to our variant enhanced with stagnation detection. This is not necessarily the case in related approaches like variable neighborhood search (Hansen and Mladenovic, 2018) and quasirandom evolutionary algorithms (Doerr, Fouz and Witt, 2010), both of which employ more determinism and do not generally follow the trajectory of RLS.

We shall investigate the two suggested algorithms on unimodal functions and on functions with local optima of different so-called gap sizes, corresponding to the number of bits that need to be flipped to escape from the optima. Many results are obtained following the analysis of the SD-(1+1) EA (Rajabi and Witt, 2020), which uses a global operator with self-adjusted mutation strength. In fact, often the general proof structure could be taken over almost literally, but with improved overall bounds. In conclusion, the self-adjusting local mutation seems to be the preferable alternative to the SD-(1+1) EA with global mutation. However, we will also investigate carefully chosen scenarios where global mutations are superior.

This paper is structured as follows: in Section 2, we state the classical RLS algorithm and introduce our self-adjusting variants with stagnation detection; moreover, we collect important mathematical tools. Section 3 shows runtime results for the simpler variant SD-RLS, concentrating on the probability of leaving local optima, while Section 4 gives a more detailed analysis of the variant SD-RLS∗ on benchmark functions like OneMax and
Jump. Section 5 analyzes an example function which the standard (1+1) EA with standard bit mutation can solve in polynomial time with high probability, whereas the k-bit flip mutation with stagnation detection needs exponential time. Through improved upper bounds, we give in Section 6 indications that our approach may also be superior to static settings on instances of the minimum spanning tree problem. This problem and other scenarios are investigated experimentally in Section 7 before we finally conclude the paper.

2 Preliminaries

In this paper, we consider pseudo-boolean functions $f\colon \{0,1\}^n \to \mathbb{R}$ that w.l.o.g. are to be maximized. One of the first randomized search heuristics studied in the literature is randomized local search (RLS) (Doerr and Doerr, 2016), displayed in Algorithm 1. This heuristic starts with a random search point and then repeatedly mutates the point by flipping s uniformly chosen bits (without replacement) and replaces it with the offspring if the offspring is not worse than the parent.
Algorithm 1: RLS with static strength s
  Select x uniformly at random from $\{0,1\}^n$.
  for t ← 1, 2, ... do
    Create y by flipping s bit(s) in a copy of x.
    if f(y) ≥ f(x) then x ← y.

The runtime or optimization time of a heuristic on a function f is the first point in time t where a search point of maximal fitness has been created; often the expected runtime, i.e., the expected value of this time, is analyzed.

Theoretical research on evolutionary algorithms mainly studies algorithms on simple unimodal well-known benchmark problems like OneMax$(x_1,\dots,x_n) := |x|_1$, where $|x|_1$ denotes the number of one-bits of x, but also on the multimodal Jump$_m$ function with gap size m, defined as follows:

$$\mathrm{Jump}_m(x_1,\dots,x_n) = \begin{cases} m + |x|_1 & \text{if } |x|_1 \le n - m \text{ or } |x|_1 = n,\\ n - |x|_1 & \text{otherwise.} \end{cases}$$

The mutation used in RLS is a local mutation as it only produces a limited number of offspring. This mutation, which we call s-flip in the following (in the introduction, we used the classical name k-bit flip), flips exactly s bits randomly chosen from the bit string of length n, so from any point $x \in \{0,1\}^n$, RLS can only sample among $\binom{n}{s}$ possible points. As a result, the s-flip is often more efficient compared to global mutations when we know the difficulty of making progress, since the algorithm only looks at a certain part of the search space. To be more precise, we recall the so-called gap of a point $x \in \{0,1\}^n$, defined in Rajabi and Witt (2020) as the minimum Hamming distance to points with strictly larger fitness value. Formally,

$$\mathrm{gap}(x) := \min\{H(x,y) : f(y) > f(x),\ y \in \{0,1\}^n\}.$$
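As a concrete illustration (our own sketch, not part of the paper), Algorithm 1 and the Jump_m function defined above can be written in Python as follows; the names rls and jump are ours:

```python
import random

def jump(x, m):
    """Jump_m as defined above: fitness m + |x|_1 if |x|_1 <= n - m or |x|_1 = n,
    and n - |x|_1 otherwise (a valley of width m in front of the optimum)."""
    n, ones = len(x), sum(x)
    return m + ones if ones <= n - m or ones == n else n - ones

def rls(f, n, s=1, iters=50_000, rng=None):
    """Algorithm 1: RLS with static strength s and elitist selection."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(iters):
        y = x[:]
        for i in rng.sample(range(n), s):   # flip s distinct bits (no replacement)
            y[i] ^= 1
        if f(y) >= f(x):                    # accept if not worse than the parent
            x = y
    return x
```

With s = 1 and f = sum (OneMax), rls reliably reaches the all-ones string; on Jump_m it gets stuck on the plateau with n − m one-bits, which is exactly the situation stagnation detection addresses below.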
It is not possible to make progress by flipping fewer than gap(x) bits of the current search point x. However, if the algorithm uses the s-flip with s = gap(x), it can make progress with positive probability. In addition, on unimodal functions, where the gap of all points in the search space (except for global optima) is one, the algorithm makes progress with strength s = 1.

Nevertheless, understanding the difficulty of a local optimum has not generally been possible so far, and benefiting from domain knowledge to determine the strength is not always feasible from the perspective of black-box optimization. Therefore, despite the advantages of the s-flip, global mutations, e.g., standard bit mutation, which can produce any point in the search space, have been used in the literature frequently. For example, the (1+1) EA, which uses a similar approach to Algorithm 1, employs standard bit mutation that implicitly uses the binomial distribution to determine how many bits must flip. Consequently, even if the algorithm uses strength 1 (i.e., mutation rate 1/n), it can escape from any local optimum with positive probability.

We study the search and success probability of Algorithm 1 and its relation to stagnation detection more closely. With similar arguments as presented in Rajabi and Witt (2020), if the gap of the current search point is 1, then the algorithm makes an improvement with probability at least 1/n at strength 1, and the probability of not finding it within $n \ln R$ steps is at most $(1 - 1/n)^{n \ln R} \le 1/R$ (where R is a parameter to be discussed). Similarly, the probability of not finding an improvement for a point with gap k within $\binom{n}{k}\ln R$ steps is at most

$$\left(1 - \binom{n}{k}^{-1}\right)^{\binom{n}{k}\ln R} \le \frac{1}{R}.$$
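The failure bound just derived can be checked numerically; the following sketch (our own illustration) verifies $(1 - \binom{n}{k}^{-1})^{\binom{n}{k}\ln R} \le 1/R$ for a few strengths k:

```python
from math import comb, log

n, R = 50, 1000.0
for k in (1, 2, 3, 4):
    phase_len = comb(n, k) * log(R)              # threshold used by stagnation detection
    p_miss = (1 - 1 / comb(n, k)) ** phase_len   # prob. of no success within the phase
    assert p_miss <= 1 / R                       # matches the 1/R failure bound
```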
Hence, after $\binom{n}{k}\ln R$ steps without improvement, there is a probability of at least 1 − 1/R that no improvement at Hamming distance k exists, so for large enough R the probability of failing is small.

We use this idea to develop the first algorithm: we add the stagnation detection mechanism to RLS to manage the strength s. As shown in Algorithm 2, hereinafter called SD-RLS, the initial strength is 1. Also, there is a counter u for counting the number of unsuccessful steps since the last success. When the counter exceeds the threshold $\binom{n}{s}\ln R$, the strength s is increased by one, and when the algorithm makes progress, the counter and strength are reset to their initial values. In the case that the algorithm fails to have a success while the strength equals the gap of the current search point, the algorithm misses its chance of making progress. Therefore, with probability at most 1/R, the optimization time would be infinite. Choosing a large enough R to have an overwhelming probability of making progress could be a solution to this problem. However, we propose another algorithm that resolves this issue, although its running time is not always as efficient as that of Algorithm 2.

In Algorithm 3, hereinafter called SD-RLS∗, we introduce a new variable r called the radius. This parameter determines the largest Hamming distance from the current search point that the algorithm must investigate. In detail, when the radius becomes r, the algorithm starts with strength r (i.e., s = r), and when the threshold is exceeded, it decreases the strength by one as long as the strength is greater than 1. This results in a more robust behavior. In the case that the threshold is exceeded and the current strength is 1, the radius is increased by one to cover a more expanded space. Also, when the radius would exceed n/2, the algorithm increases the radius to n, which means that the algorithm covers all possible strengths between 1 and n. We note that the strategy of repeatedly returning to lower strengths remotely resembles the 1/…
Algorithm 2: RLS with stagnation detection (SD-RLS)
  Select x uniformly at random from $\{0,1\}^n$ and set $s_1 \leftarrow 1$, $u \leftarrow 0$.
  for t ← 1, 2, ... do
    Create y by flipping $s_t$ bits in a copy of x, chosen uniformly.
    u ← u + 1.
    if f(y) > f(x) then
      x ← y; $s_{t+1} \leftarrow 1$; u ← 0.
    else if f(y) = f(x) and $s_t = 1$ then
      x ← y.
    if $u > \binom{n}{s_t}\ln R$ then
      $s_{t+1} \leftarrow \min\{s_t + 1, n\}$; u ← 0.
    else
      $s_{t+1} \leftarrow s_t$.

The parameter R controls the probability of failing to find an improvement at the "right" strength. More precisely, as we will see in Theorem 1 and Lemma 2 (for SD-RLS and SD-RLS∗, respectively), the probability of not finding an improvement where there is a potential of making progress is at most 1/R. We recommend R ≥ |Im f| for SD-RLS (where Im f is the image set of f), and R ≥ $n^{\varepsilon}$ · |Im f| for a constant ε > 0 for SD-RLS∗, so that the probability of ever missing an improvement at the right strength is sufficiently small throughout the run.

The following lemma, containing some combinatorial inequalities, will be used in the analyses of the algorithms. The first part of the lemma seems to be well known, has already been proved in Lugo (2017), and is also a consequence of Lemma 1.10.38 in Doerr (2020). The second part follows from elementary manipulations.
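For concreteness, Algorithm 2 might be implemented as the following Python sketch (our own rendering; the function name sd_rls is not from the paper):

```python
import random
from math import comb, log

def sd_rls(f, n, R, iters=300_000, rng=None):
    """Algorithm 2 (SD-RLS): the strength s starts at 1 and is raised to
    min(s + 1, n) whenever the counter u of unsuccessful steps exceeds the
    threshold comb(n, s) * ln(R); strict improvements reset s and u."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    s, u = 1, 0
    for _ in range(iters):
        y = x[:]
        for i in rng.sample(range(n), s):
            y[i] ^= 1
        u += 1
        if f(y) > f(x):
            x, s, u = y, 1, 0            # improvement: reset strength and counter
        elif f(y) == f(x) and s == 1:
            x = y                        # accept equally good points only at strength 1
        if u > comb(n, s) * log(R):
            s, u = min(s + 1, n), 0      # stagnation detected: raise the strength
    return x
```

On Jump_2 with a generous R, this escapes the plateau by raising the strength to 2; as discussed above, a miss at the right strength (probability at most 1/R) would leave plain SD-RLS stuck, which is what motivates SD-RLS∗.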
Algorithm 3: RLS with robust stagnation detection (SD-RLS∗)
  Select x uniformly at random from $\{0,1\}^n$ and set $r_1 \leftarrow 1$, $s_1 \leftarrow 1$, $u \leftarrow 0$.
  for t ← 1, 2, ... do
    Create y by flipping $s_t$ bits in a copy of x, chosen uniformly.
    u ← u + 1.
    if f(y) > f(x) then
      x ← y; $s_{t+1} \leftarrow 1$; $r_{t+1} \leftarrow 1$; u ← 0.
    else if f(y) = f(x) and $r_t = 1$ then
      x ← y.
    if $u > \binom{n}{s_t}\ln R$ then
      if $s_t = 1$ then
        if $r_t < n/2$ then $r_{t+1} \leftarrow r_t + 1$ else $r_{t+1} \leftarrow n$.
        $s_{t+1} \leftarrow r_{t+1}$.
      else
        $r_{t+1} \leftarrow r_t$; $s_{t+1} \leftarrow s_t - 1$.
      u ← 0.
    else
      $s_{t+1} \leftarrow s_t$; $r_{t+1} \leftarrow r_t$.
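Algorithm 3 differs from the sketch of Algorithm 2 only in the radius bookkeeping; a Python rendering (again our own; the name sd_rls_star is hypothetical) could look like this:

```python
import random
from math import comb, log

def sd_rls_star(f, n, R, iters=400_000, rng=None):
    """Algorithm 3 (SD-RLS*): a radius r caps the strength; after a failed phase
    the strength counts down r, r - 1, ..., 1 before the radius grows (jumping
    to n once it would exceed n/2), so no strength is ever skipped for good."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    r, s, u = 1, 1, 0
    for _ in range(iters):
        y = x[:]
        for i in rng.sample(range(n), s):
            y[i] ^= 1
        u += 1
        if f(y) > f(x):
            x, r, s, u = y, 1, 1, 0            # improvement: reset radius and strength
        elif f(y) == f(x) and r == 1:
            x = y
        if u > comb(n, s) * log(R):
            if s == 1:
                r = r + 1 if r < n / 2 else n  # open the next, larger radius
                s = r                          # restart at the top strength
            else:
                s -= 1                         # count down within the radius
            u = 0
    return x
```

Unlike sd_rls, a missed success at the right strength is not fatal here: the strength m is tried again in every later phase with radius at least m.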
Lemma 1. For any integer m ≤ n/2, we have

(a) $\sum_{i=1}^{m}\binom{n}{i} \le \frac{n-(m-1)}{n-(2m-1)}\binom{n}{m}$,

(b) $\binom{n}{M} \le \binom{n}{m}\left(\frac{n-m}{m}\right)^{M-m}$ for m < M < n/2.

Proof. (a) We use the following proof due to Lugo (2017). Through the identity $\binom{n}{k-1} = \frac{k}{n-k+1}\binom{n}{k}$, which comes from the definition of the binomial coefficient, and the geometric series sum formula, we achieve the following result:

$$\frac{\sum_{i=1}^{m}\binom{n}{i}}{\binom{n}{m}} = \frac{\binom{n}{m}}{\binom{n}{m}} + \frac{\binom{n}{m-1}}{\binom{n}{m}} + \dots + \frac{\binom{n}{1}}{\binom{n}{m}} < 1 + \frac{m}{n-m+1} + \left(\frac{m}{n-m+1}\right)^2 + \dots = \frac{n-(m-1)}{n-(2m-1)}.$$

(b) For 0 < t < M, by using $\binom{n}{m} = \frac{n-m+1}{m}\binom{n}{m-1}$ repeatedly, we have

$$\binom{n}{M} = \frac{(n-M+1)\cdots(n-M+t)}{M\cdots(M-t+1)}\binom{n}{M-t} \le \left(\frac{n-M+t}{M-t}\right)^{t}\binom{n}{M-t}.$$

Thus, by setting t = M − m, we obtain the statement. □

3 Analysis of SD-RLS

In this section, we study the first algorithm, called SD-RLS; see Algorithm 2. At the beginning of the section, we show upper and lower bounds on the time for escaping from local optima. Then, in Theorem 2, we show the important result that on unimodal functions, SD-RLS with probability 1 − |Im f|/R behaves in the same way as RLS with strength 1, including the same asymptotic bound on the expected optimization time. The following theorem shows the time SD-RLS takes, with probability 1 − 1/R, to make progress from a search point x with a gap of m. Theorem 1.
Let $x \in \{0,1\}^n$ be the current search point of SD-RLS on a pseudo-boolean function $f\colon \{0,1\}^n \to \mathbb{R}$. Define $T_x$ as the time to create a strict improvement if gap(x) = m, and let U be the event of finding an improvement at Hamming distance m. Then we have

$$E(T_x \mid U) \le \begin{cases} \binom{n}{m}\left(1 + O\!\left(\frac{m\ln R}{n}\right)\right) & \text{if } m = o(n),\\ O\!\left(\binom{n}{m}\ln R\right) & \text{if } m = \Theta(n) \wedge m < n/2,\\ O(2^n\ln R) & \text{if } m \ge n/2. \end{cases}$$

Moreover, Pr(U) ≥ 1 − 1/R.

Compared to the corresponding theorems in Rajabi and Witt (2020), the bounds in Theorem 1 are by a factor of $(en/m)^m/\binom{n}{m}$ (up to lower-order terms) smaller. This speed-up is roughly e for m = 1, i.e., unimodal functions (like OneMax), but becomes less pronounced for larger m since, intuitively, the number of flipped bits in a standard bit mutation becomes more and more concentrated and starts resembling the m-bit flip mutation.
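Both the speed-up factor and Lemma 1 are easy to sanity-check numerically; the sketch below (our own) does so for n = 30:

```python
from math import comb, e

n = 30
# Speed-up factor (en/m)^m / comb(n, m) over the SD-(1+1) EA bounds
speedup = lambda m: (e * n / m) ** m / comb(n, m)
assert abs(speedup(1) - e) < 1e-9      # exactly e for m = 1, the unimodal case

# Lemma 1(a): sum_{i=1}^{m} comb(n, i) <= (n-(m-1))/(n-(2m-1)) * comb(n, m)
for m in range(1, 6):
    lhs = sum(comb(n, i) for i in range(1, m + 1))
    assert lhs <= (n - (m - 1)) / (n - (2 * m - 1)) * comb(n, m)

# Lemma 1(b): comb(n, M) <= comb(n, m) * ((n-m)/m)**(M-m), here m=2, M=5
assert comb(n, 5) <= comb(n, 2) * ((n - 2) / 2) ** 3
```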
Proof of Theorem 1. The algorithm SD-RLS can make an improvement only while the current strength s equals m, and the probability of not finding an improvement during this phase is

$$\left(1 - \binom{n}{m}^{-1}\right)^{\binom{n}{m}\ln R} \le \frac{1}{R}.$$

If the improvement event happens, the running time of the algorithm to escape from this local optimum is

$$E(T_x \mid U) < \underbrace{\sum_{i=1}^{m-1}\binom{n}{i}\ln R}_{=:\,S_1} + \underbrace{\binom{n}{m}}_{=:\,S_2},$$

where $S_1$ is the number of iterations at strengths s < m and $S_2$ is the expected number of iterations needed to make an improvement at s = m.

By using Lemma 1 for m < n/2, we have

$$E(T_x \mid U) < \sum_{i=1}^{m-1}\binom{n}{i}\ln R + \binom{n}{m} < \frac{n-m+2}{n-2m+3}\binom{n}{m-1}\ln R + \binom{n}{m} = \frac{n-m+2}{n-2m+3}\cdot\frac{m}{n-m+1}\binom{n}{m}\ln R + \binom{n}{m} = \binom{n}{m}\left(\frac{n-m+2}{n-2m+3}\cdot\frac{m}{n-m+1}\ln R + 1\right),$$

and for m ≥ n/2, we know that $\sum_{i=1}^{n}\binom{n}{i} < 2^n$, so we can compute

$$E(T_x \mid U) = \sum_{i=1}^{m-1}\binom{n}{i}\ln R + \binom{n}{m} \le O(2^n\ln R).$$

Altogether, we achieve

$$E(T_x \mid U) \le \begin{cases} \binom{n}{m}\left(1 + O\!\left(\frac{m\ln R}{n}\right)\right) & \text{if } m = o(n),\\ O\!\left(\binom{n}{m}\ln R\right) & \text{if } m = \Theta(n) \wedge m < n/2,\\ O(2^n\ln R) & \text{if } m \ge n/2. \end{cases}$$ □

Using the previous theorem, we obtain the following result that allows us to reuse existing results for RLS on unimodal functions.
Theorem 2.
Let $f\colon \{0,1\}^n \to \mathbb{R}$ be a unimodal function and consider SD-RLS with R ≥ |Im f|. Then, with probability at least 1 − |Im f|/R, SD-RLS never increases the strength and behaves stochastically like RLS before finding an optimum of f. Proof.
As on unimodal functions the gap of all points is 1, the probability of not finding an improvement at strength 1 within the threshold is

$$\left(1 - \frac{1}{n}\right)^{n\ln R} \le \frac{1}{R}.$$

At most |Im f| improving steps happen before finding the optimum, so by a union bound the probability of Algorithm 2 ever increasing the strength beyond 1 is at most |Im f|/R, which proves the theorem. □

With these two general results, we conclude the analysis of SD-RLS and turn to the variant SD-RLS∗ that always has finite expected optimization time. In fact, we will present similar results in general optimization scenarios and supplement them by analyses on specific benchmark functions. It is possible to analyze the simpler SD-RLS on these benchmark functions as well, but we do not feel that this gives additional insights.

4 Analysis of SD-RLS∗

In this section, we turn to the algorithm SD-RLS∗ that iteratively returns to lower strengths to avoid missing the "right" strength. We recall that $T_x$ is the number of steps SD-RLS∗ takes to find an improving point from the current search point x. Let phase r consist of all points of time where radius r is used in the algorithm. When the algorithm enters phase r, it starts with strength r, but when the counter exceeds the threshold, the strength decreases by one as long as it is greater than 1. In the case of strength 1, the radius r is increased to r + 1 (or to n if r + 1 would exceed n/2), and the algorithm enters phase r + 1 (or phase n).

Let $E_r$ be the event of not finding the optimum within phase r, and for j > i let $U_i^j$ be the event of not finding the optimum during phases i to j − 1 but finding it within phase j; in other words, $U_i^j = E_i \cap \dots \cap E_{j-1} \cap \overline{E_j}$. For i = j, we define $U_i^i = \overline{E_i}$. We obtain the following result on the failure probability, which follows from the fact that the algorithm tries to find an improvement for $\binom{n}{m}\ln R$ iterations, each with success probability $\binom{n}{m}^{-1}$, whenever the radius is at least m. Lemma 2.
Let $x \in \{0,1\}^n$ be the current search point of SD-RLS∗ on a pseudo-boolean fitness function $f\colon \{0,1\}^n \to \mathbb{R}$, and let m = gap(x). Then

$$\Pr(E_r) \le \begin{cases} 1/R & \text{if } m \le r < n,\\ 0 & \text{if } r = n. \end{cases}$$

Proof. Assume r ≥ m. During phase r, the algorithm spends $\binom{n}{m}\ln R$ steps at strength m before it changes the strength or the phase. Hence, the probability of not improving at strength s = m in phase r is at most

$$\Pr(E_r) \le \left(1 - \binom{n}{m}^{-1}\right)^{\binom{n}{m}\ln R} \le \frac{1}{R}.$$

If r = n, the algorithm does not change the radius r anymore, and it continues to flip s bits for varying values of s, including m, until making progress, so the probability of eventually failing to find the improvement in this phase is 0. □

The following lemma bounds the time to leave a local optimum conditional on the event that the "right" strength was missed.
Lemma 3. Let $x \in \{0,1\}^n$ with m = gap(x) < n/2 be the current search point of SD-RLS∗ with R ≥ $n^{\varepsilon}$ · |Im f| for an arbitrary constant ε > 0 on a pseudo-boolean function $f\colon \{0,1\}^n \to \mathbb{R}$, and let $T_x$ be the time to create a strict improvement. Then we have

$$E(T_x \mid E_m) = o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right),$$

where $E_m$ is the event of not finding the optimum when the radius r equals m.

The reason behind the factor R/|Im f| in Lemma 3 is the following: when proving a running time bound for SD-RLS∗ on a function f, the event $E_m$ happens with probability at most 1/R for each improvement to be found, so in the worst case, during the run, there are in expectation |Im f|/R search points where the counter exceeds the threshold, resulting in an expected number of at most $|\mathrm{Im}\,f|/R \cdot o(R/|\mathrm{Im}\,f| \cdot \binom{n}{m}) = o(\binom{n}{m})$ extra iterations for the whole run in the case of exceeding the thresholds. Also, note that we always have R/|Im f| = Ω(1) since, according to the assumption, R > |Im f|. Proof of Lemma 3.
Using the law of total probability with respect to the events $U_{m+1}^i$ defined above, we have

$$E(T_x \mid E_m) = \underbrace{\sum_{i=m+1}^{\lfloor n/2\rfloor-1} E\!\left(T_x \mid U_{m+1}^i\right)\Pr\!\left(U_{m+1}^i\right)}_{=:\,S_1} + \underbrace{E\!\left(T_x \mid U_{m+1}^n\right)\Pr\!\left(U_{m+1}^n\right)}_{=:\,S_2}.$$

In order to estimate $\Pr(U_{m+1}^i)$, using Lemma 2 we have

$$\Pr\!\left(U_{m+1}^i\right) < \prod_{j=m+1}^{i-1}\Pr(E_j) < R^{-(i-m-1)}.$$

For $S_1$, by using Lemma 1 multiple times, we compute

$$S_1 \le \sum_{i=m+1}^{\lfloor n/2\rfloor-1}\sum_{r=1}^{i}\sum_{s=1}^{r}\binom{n}{s}\ln R\cdot R^{-(i-m-1)} \le \sum_{i=m+1}^{\lfloor n/2\rfloor-1}\sum_{r=1}^{i}\frac{n-r+1}{n-2r+1}\binom{n}{r}\ln R\cdot R^{-(i-m-1)} \le R\sum_{i=m+1}^{\lfloor n/2\rfloor-1}\left(\frac{n-i+1}{n-2i+1}\right)^{2}\binom{n}{i}\ln R\cdot R^{-(i-m)} \le R\sum_{i=m+1}^{\lfloor n/2\rfloor-1}\left(\frac{n-i+1}{n-2i+1}\right)^{2}\left(\frac{n-m}{m}\right)^{i-m}\binom{n}{m}\ln R\cdot R^{-(i-m)}.$$

Using the fact that R ≥ $n^{\varepsilon}$ · |Im f| and i > m, the last expression is bounded from above by $o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right)$.

Regarding $S_2$, when the radius r is increased to n, the algorithm mutates s bits of the current search point for all possible strengths 1 to n periodically. In each cycle through the different strengths, according to Lemma 2, the algorithm escapes from the local optimum with probability at least 1 − 1/R, so there are at most R/(R−1) cycles in expectation by the geometric distribution. Besides, each cycle at radius n costs $\sum_{s=1}^{n}\binom{n}{s}\ln R$ steps. Overall, we have $\frac{R}{R-1}\sum_{s=1}^{n}\binom{n}{s}\ln R$ extra fitness function calls if the algorithm fails to find the improvement in the first ⌊n/2⌋ − 1 phases, which happens with probability at most $R^{-(\lfloor n/2\rfloor-m-1)}$. Thus, we have

$$S_2 \le \left(\sum_{r=1}^{\lfloor n/2\rfloor}\sum_{s=1}^{r}\binom{n}{s}\ln R + \frac{R}{R-1}\sum_{s=1}^{n}\binom{n}{s}\ln R\right) R^{-(\lfloor n/2\rfloor-m-1)},$$

which, bounding the binomial sums via Lemma 1 as before and using R ≥ $n^{\varepsilon}$ · |Im f| and m < n/2, is $o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right)$ as well.

Altogether, we finally have $E(T_x \mid E_m) = S_1 + S_2 = o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right)$, as suggested. □

The following theorem and its proof are similar to Theorem 1 but require a more careful analysis to cover the repeated use of smaller strengths. We note that the bounds differ from Theorem 1 only in lower-order terms unless m is very big. Theorem 3.
Let $x \in \{0,1\}^n$ be the current search point of SD-RLS∗ with R ≥ $n^{\varepsilon}$ · |Im f| for an arbitrary constant ε > 0 on a pseudo-boolean function $f\colon \{0,1\}^n \to \mathbb{R}$. Define $T_x$ as the time to create a strict improvement if gap(x) = m. Then we have

$$E(T_x) \le \begin{cases} \binom{n}{m}\left(1 + O\!\left(\frac{m(n-m)}{(n-2m+3)^2}\ln R\right)\right) & \text{if } m < n/2,\\ O(2^n n\ln R) & \text{if } m \ge n/2, \end{cases}$$

and $E(T_x) \ge \binom{n}{m}/W$, where W is the number of strictly better search points at Hamming distance m.

Proof. Using the law of total probability with respect to the event $E_m$ defined above, i.e., the event of not finding the improvement by the end of phase m, we have

$$E(T_x) = \underbrace{E\!\left(T_x \mid \overline{E_m}\right)\Pr\!\left(\overline{E_m}\right)}_{=:\,S_1} + \underbrace{E(T_x \mid E_m)\Pr(E_m)}_{=:\,S_2}.$$

Regarding $S_1$, it takes $\sum_{i=1}^{m-1}\sum_{j=1}^{i}\binom{n}{j}\ln R$ steps until SD-RLS∗ has increased both radius and strength to m. When the mutation strength is m, a better point will be found within an expected number of $\binom{n}{m}$ steps. In regard to $S_2$, if the improvement is not found by the end of phase m, there are at most $o(R/|\mathrm{Im}\,f| \cdot \binom{n}{m})$ further iterations in expectation by Lemma 3, and this event, i.e., $E_m$, happens with probability at most 1/R.

Altogether, for m < n/2, using Lemma 1, we have

$$E(T_x) \le E\!\left(T_x \mid \overline{E_m}\right) + \Pr(E_m)\,E(T_x \mid E_m) \le \sum_{i=1}^{m-1}\sum_{j=1}^{i}\binom{n}{j}\ln R + \binom{n}{m} + \frac{1}{R}\cdot o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right) \le \sum_{i=1}^{m-1}\frac{n-(i-1)}{n-(2i-1)}\binom{n}{i}\ln R + \binom{n}{m} + o\!\left(\binom{n}{m}\right) \le \frac{n-m+2}{n-2m+3}\sum_{i=1}^{m-1}\binom{n}{i}\ln R + \binom{n}{m} + o\!\left(\binom{n}{m}\right) \le \left(\frac{n-m+2}{n-2m+3}\right)^{2}\binom{n}{m-1}\ln R + \binom{n}{m} + o\!\left(\binom{n}{m}\right) = \binom{n}{m}\left(1 + O\!\left(\frac{m(n-m)}{(n-2m+3)^2}\ln R\right)\right).$$

For m ≥ n/2, the algorithm cannot make an improvement at any radius r less than n/2. However, once the radius r is increased to n, the algorithm mutates s bits of the current search point for all possible strengths 1 to n periodically. Thus, according to Lemma 2, the algorithm escapes from the local optimum with probability at least 1 − 1/R in each cycle, so there are at most R/(R−1) cycles in expectation by the geometric distribution. Finally, we compute

$$E(T_x) \le \sum_{i=1}^{\lfloor n/2\rfloor-1}\sum_{j=1}^{i}\binom{n}{j}\ln R + \frac{R}{R-1}\sum_{i=1}^{n}\binom{n}{i}\ln R \le O(2^n n\ln R).$$

Moreover, an improvement can only be made at strength s = m, where the success probability in each step is $W/\binom{n}{m}$, so the expected number of iterations for making an improvement is at least $\binom{n}{m}/W$. Therefore, we have $E(T_x) \ge \binom{n}{m}/W$. □ Lemma 4.
Let $f\colon \{0,1\}^n \to \mathbb{R}$ be a unimodal function and consider SD-RLS∗ with R ≥ $n^{\varepsilon}$ · |Im f| for an arbitrary constant ε > 0. Then, with probability at least 1 − |Im f|/R, SD-RLS∗ never increases the radius and behaves stochastically like RLS before finding an optimum of f.

Denote by T the runtime of SD-RLS∗ on f. Let $f_i$ be the i-th fitness value in an increasing order of all fitness values in Im f, and let $s_i$ be a lower bound on the probability that RLS finds an improvement from search points with fitness value $f_i$. Then

$$E(T) \le \sum_{i=1}^{|\mathrm{Im}\,f|}\frac{1}{s_i} + o(n).$$ Proof.
As on unimodal functions the gap of all points is 1, the probability of not finding an improvement at strength 1 within the threshold is $\left(1 - \binom{n}{1}^{-1}\right)^{\binom{n}{1}\ln R} \le \frac{1}{R}$. This argumentation holds for each improvement that has to be found. Since at most |Im f| improving steps happen before finding the optimum, by a union bound the probability of SD-RLS∗ ever increasing the strength beyond 1 is at most |Im f|/R, which proves the first claim.

We let the random set W contain the search points from which SD-RLS∗ does not find an improvement within phase 1 (i.e., while $r_t = 1$). To prove the second claim, we consider all fitness levels $A_1, \dots, A_{|\mathrm{Im}\,f|}$ such that $A_i$ contains the search points with fitness value $f_i$, and sum up upper bounds on the expected times to leave each of these fitness levels. Under the condition that the strength is not increased before leaving a fitness level, the worst-case time to leave fitness level $A_i$ is $1/s_i$, similarly to RLS. Hence, we bound the expected optimization time of SD-RLS∗ from above by adding the waiting times of RLS on all fitness levels, which is given by $\sum_{i=1}^{|\mathrm{Im}\,f|} 1/s_i$, and the expected times spent to leave the points in W; formally,

$$E(T) \le \sum_{i=1}^{|\mathrm{Im}\,f|}\frac{1}{s_i} + \sum_{x\in W} E(T_x).$$

Each point contributes to W with probability $\Pr(E_1)$. Hence, $E(|W|) \le |\mathrm{Im}\,f|\Pr(E_1)$. As on unimodal functions the gap of all points is 1, by Lemma 3 we compute

$$\sum_{x\in W} E(T_x) \le |\mathrm{Im}\,f|\cdot\Pr(E_1)\cdot E(T_x \mid E_1) \le |\mathrm{Im}\,f|\cdot\frac{1}{R}\cdot o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{1}\right) = o(n).$$

Thus, we finally have $E(T) \le \sum_{i=1}^{|\mathrm{Im}\,f|} 1/s_i + o(n)$, as suggested. □

We now turn to the runtime of SD-RLS∗ on the Jump function, for which the following bound seems to be the best available for mutation-based hillclimbers.
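As a small illustration of the second claim (our own numeric sketch): on OneMax, the level with i one-bits is left by flipping one of the n − i zero-bits, so $s_i = (n-i)/n$ and the fitness-level sum is the harmonic-style bound $n\cdot H_n \approx n\ln n$:

```python
from math import log

n = 100
# s_i = (n - i) / n for the OneMax level with i one-bits, so the
# fitness-level sum of Lemma 4 is sum_i n / (n - i) = n * H_n
bound = sum(n / (n - i) for i in range(n))
assert n * log(n) < bound < n * (log(n) + 1)   # n * H_n is about n ln n
```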
Theorem 4. Let n ∈ ℕ. For all 2 ≤ m ≤ n, the expected runtime E(T) of SD-RLS* with R ≥ n^ε for an arbitrary constant ε > 0 on Jump_m satisfies

  E(T) ≤ \binom{n}{m} (1 + O((m/(n − 2m)) ln n))   if m < n/2,
  E(T) = O(2^n n ln n)                              otherwise.

Proof.
Before reaching the plateau consisting of all points with n − m one-bits, Jump_m is equivalent to OneMax; hence, according to Lemma 4, the expected time SD-RLS* takes to reach the plateau is at most O(n ln n). Note that this bound was obtained via the fitness-level method with s_i = (n − i)/n as minimum probability for leaving the set of search points with i one-bits.

Every plateau point x with n − m one-bits satisfies gap(x) = m according to the definition of Jump_m. Thus, using Theorem 3, the algorithm finds the optimum from such a point within expected time

  E(T_x) ≤ \binom{n}{m} (1 + O((m/(n − 2m)) ln n))   if m < n/2,
  E(T_x) = O(2^n n ln n)                              if m ≥ n/2.

This dominates the expected time of the algorithm before reaching the plateau and results in the runtime bound claimed in the theorem. □
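For reference, Jump_m in its common parametrization (assumed here, since the excerpt does not restate the definition) can be written as:

```python
def jump(x, m):
    """Jump_m: OneMax with a fitness valley of width m.

    Points with n - m one-bits form the plateau; from there, exactly the
    m missing bits must flip simultaneously to reach the optimum, i.e.,
    gap(x) = m as used in the proof above.
    """
    n, ones = len(x), sum(x)
    if ones <= n - m or ones == n:
        return m + ones
    return n - ones
```

For example, for n = 6 and m = 2, the all-ones string has fitness 8, while a string with five one-bits lies in the valley.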
While our s-flip mutation along with stagnation detection can outperform the (1+1) EA on Jump functions, it is clear that its different search behavior may be disadvantageous on other examples. Concretely, we will present a function that has a unimodal path to a local optimum with a large Hamming distance to the global optimum. SD-RLS will with high probability follow this path and incur exponential optimization time. However, the function has a second gradient that requires two-bit flips to make progress. The classical (1+1) EA will be able to follow this gradient and to arrive at the global optimum before one-bit flips have reached the end of the path to the local optimum.

In a broader context, our function illustrates an advantage of global mutation operators. By a simple swap of local and global optimum, it immediately turns into the direct opposite, i.e., an example where using global instead of local mutations is highly detrimental and increases the runtime from polynomial to exponential with overwhelming probability. An example of such a function was previously presented in Doerr, Jansen and Klein (2008); however, both the underlying construction and the proof of exponential runtime for the (1+1) EA seem much more complicated than our example.

In the following, we define the example function NeedGlobalMut and give proofs for the behavior of SD-RLS and the (1+1) EA. In fact, NeedGlobalMut is obtained from the function NeedHighMut defined in Rajabi and Witt (2020) to show advantages of stagnation detection adjusting the rate of a global mutation operator. The only change is to adjust the length of the suffix part of the function, which rather elegantly allows us to re-use the previous technique of construction and a major part of the analysis. We also encourage the reader to read the corresponding section in Rajabi and Witt (2020) for further insights into the construction.

In the following, we will imagine any bit string x of length n as being split into a prefix a := a(x) of length n − m and a suffix b := b(x) of length m, where m is defined below. Hence, x = a(x) ∘ b(x), where ∘ denotes concatenation. The prefix a(x) is called valid if it is of the form 1^i 0^{n−m−i}, i.e., i leading ones followed by n − m − i trailing zeros. The prefix fitness pre(x) of a string x ∈ {0,1}^n with valid prefix a(x) = 1^i 0^{n−m−i} equals i, the number of leading ones. The suffix consists of ⌈√n⌉ consecutive blocks of ⌈n^{1/3}⌉ bits each, altogether m ≤ 2n^{5/6} = o(n) bits. Such a block is called valid if it contains either 0 or 2 one-bits; moreover, it is called active if it contains 2 and inactive if it contains 0 one-bits. A suffix where all blocks are valid and where all blocks following the first inactive block are also inactive is called valid itself, and the suffix fitness suff(x) of a string x with valid suffix b(x) is the number of leading active blocks before the first inactive one. Finally, we call x ∈ {0,1}^n valid if both its prefix and its suffix are valid.

The final fitness function is a weighted combination of pre(x) and suff(x). We define for x ∈ {0,1}^n, where x = a ∘ b with the above-introduced a and b,

  NeedGlobalMut(x) :=
    n² · suff(x) + pre(x)               if pre(x) ≤ 9(n−m)/10 ∧ x valid,
    n² · ⌈√n⌉ + pre(x) + suff(x) − n    if pre(x) > 9(n−m)/10 ∧ x valid,
    −OneMax(x)                           otherwise.

The function NeedGlobalMut equals
NeedHighMut_ξ from Rajabi and Witt (2020) for the setting ξ = 1/10. Valid search points falling into the second case have fitness at least n²⌈√n⌉ − n + 1, which is bigger than n²(⌈√n⌉ − 1) + n, an upper bound on the fitness of search points that fall into the first case without having ⌈√n⌉ leading active blocks in the suffix. Hence, search points x where pre(x) = n − m and suff(x) = ⌈√n⌉ represent local optima of second-best overall fitness. The set of global optima equals the points where pre(x) = 9(n − m)/10 and suff(x) = ⌈√n⌉, which implies that (n − m)/10 = Ω(n) bits have to be flipped simultaneously to escape from the local toward the global optimum.

Theorem 5.
With probability 1 − o(1), SD-RLS with R ≥ n² needs 2^{Ω(n)} steps to optimize NeedGlobalMut. The (1+1) EA optimizes this function in time O(n²) with probability 1 − 2^{−Ω(n^{1/4})}.

Proof.
As in the proof of Theorem 4.1 in Rajabi and Witt (2020), we have that the first valid search point (i.e., the first search point of non-negative fitness) of both SD-RLS and the (1+1) EA has pre- and suff-value of o(n) with probability 1 − 2^{−Ω(n^{1/4})}. In the following, we tacitly assume that we have reached a valid search point of the described maximum pre- and suff-value and note that this changes the required number of improvements to reach the local or global optimum only by a 1 − o(1) factor. For readability this factor will not be spelt out any more.

As long as the counter threshold of SD-RLS is not exceeded, the algorithm behaves like RLS. We re-use the argumentation from Theorem 2 up to the point in time where pre(x) = n − m, since before that it is possible to improve the function value by one-bit flips. Hence, the probability of ever increasing the suff-value before pre(x) = n − m is at most n/R ≤ 1/n. Afterwards, the fitness can only be further improved if at least (n − m)/10 bits flip simultaneously. This requires the strength to be increased to (n − m)/10, which takes at least \binom{n}{(n−m)/10 − 1} = 2^{Ω(n)} steps since the phases at all smaller strengths must be completed first. This proves the statement for SD-RLS.

We now analyze the success probability of the (1+1) EA. To this end, we first bound the probability of a mutation being accepted after a valid search point has been reached. Even if a mutation changes up to o(n) consecutive bits of the prefix or suffix, it must maintain n − o(n) prefix bits in order to result in a valid search point. Hence, the probability of an accepted step at mutation probability 1/n is at most (1 − 1/n)^{n−m−o(n)} = (1 + o(1))e^{−1}. Since the probability of flipping Ω(n) bits is n^{−Ω(n)}, the probability of an accepted step is altogether, by the law of total probability, (1 ± o(1))(1 − 1/n)^n = (1 ± o(1))e^{−1}. By similar arguments, the probability of a mutation improving the pre-value by k is at most (1 + o(1))e^{−1}/n^k, and the probability of improving the suff-value is at least (1 − o(1))(e^{−1}/2) n^{−4/3}, since there are \binom{⌈n^{1/3}⌉}{2} = (1 − o(1)) n^{2/3}/2 two-bit flips activating the next block, each having probability at least (1 − o(1)) e^{−1} n^{−2}.

We now consider a phase of 3emn steps. Using the bound on improving the suff-value, we expect (1 − o(1))(3/2)√n activated blocks. By Chernoff bounds, with overwhelming probability we have at least ⌈√n⌉ such blocks, i.e., a complete suffix. The probability of improving the pre-value by k ≥ 1 is at most (1 + o(1)) e^{−1} n^{−k}, amounting to an expected number of improvements by k of (1 + o(1)) 3 m n^{1−k} = (1 + o(1)) 3 n^{11/6−k} and, using Chernoff bounds and union bounds over all k = o(n), the probability of improving the pre-value by at least (9/10)(n − m) during the phase is 2^{−Ω(n^{1/4})}. Hence, with probability 1 − 2^{−Ω(n^{1/4})}, the suffix is completed before the prefix reaches the threshold 9(n − m)/10; afterwards, crossing the threshold is no longer accepted, the (1+1) EA climbs the prefix to the global optimum, and the total number of steps is O(n²). □

Minimum Spanning Trees

Our self-adjusting s-flip mutation operator can also have advantages on classical combinatorial optimization problems. We reconsider the minimum spanning tree (MST) problem, on which EAs and RLS were analyzed before (Neumann and Wegener, 2007). The known bounds for the globally searching (1+1) EA are not tight.
More precisely, they depend on log(w_max), the logarithm of the largest edge weight. This is different with RLS variants that flip only one or two bits, due to an equivalence first formulated in Raidl, Koller and Julstrom (2006): if only up to two bits flip in each step, then the MST instance becomes indistinguishable from the MST instance formed by replacing all edge weights with their rank in the increasingly sorted weight sequence. This results in a tight upper bound of O(m² ln n), where m is the number of edges, for RLS_{1,2}, an algorithm that uniformly at random decides to flip either one or two uniformly chosen bits (Witt, 2014). Although not spelt out in that paper, it is easy to see that the leading constant in the bound O(m² ln m) is at most 2. This 2 stems from the logarithm of the sum of the weight ranks, which can be in the order of m². We will see that this factor of 2 can, in some sense, be avoided in our SD-RLS*.

The following theorem bounds the optimization time of SD-RLS* in the case that the algorithm has reached a spanning tree and the fitness function only allows spanning trees to be accepted. It is well known that with the fitness functions from Neumann and Wegener (2007), the expected time to find the first spanning tree is O(m log m), which also transfers to SD-RLS*; hence we do not consider this lower-order term further. However, our bound comes with an additional term related to the number of strict improvements. We will discuss this term after the proof.

Theorem 6.
The expected optimization time of SD-RLS* with R = m⁴ on the MST problem with m edges, starting with an arbitrary spanning tree, is at most

  (1 + o(1)) ((m²/2)(1 + ln(r_1 + · · · + r_m)) + (4m ln m) E(S)) = (1 + o(1)) (m² ln m + (4m ln m) E(S)),

where r_i is the rank of the i-th edge in the sequence sorted by increasing edge weights and E(S) is the expected number of strict improvements that the algorithm makes, conditioned on the strength never exceeding 2.

Proof.
We aim at using multiplicative drift analysis with g(x) = Σ_{i=1}^{m} x_i r_i as potential function. Since the algorithm has different states, we do not always have the same lower bound on the drift towards the optimum. However, at strength 1 no mutation is accepted since the fitness function from Neumann and Wegener (2007) gives a huge penalty to non-trees. Hence, our plan is to conduct the drift analysis conditioned on the strength being at most 2 and to account for the steps spent at strength 1 separately. Cases where the strength exceeds 2 will be handled by an error analysis and a restart argument.

Let X^(t) := g(x^(t)) − g(x_opt) for the current search point x^(t) and an optimal search point x_opt. Since the algorithm behaves stochastically the same on the original fitness function f and on the potential function g, we obtain that

  E(X^(t) − X^(t+1) | X^(t)) ≥ X^(t)/\binom{m}{2} ≥ 2X^(t)/m²,

since the g-value can be decreased by altogether g(x^(t)) − g(x_opt) via a sequence of at most \binom{m}{2} disjoint two-bit flips; see also the proof of Theorem 15 in Doerr, Johannsen and Winzen (2012) for the underlying combinatorial argument.

Let T' denote the number of steps at strength 2 until g is minimized, assuming no larger strength to occur. Using the multiplicative drift theorem, we have E(T') ≤ (m²/2)(1 + ln(r_1 + · · · + r_m)) ≤ (m²/2)(1 + 2 ln m), and by the tail bounds for multiplicative drift (e.g., Lengler, 2020) it holds that

  Pr(T' > (m²/2)(1 + 2 ln m + ln m)) ≤ e^{−ln m} = 1/m.

Note that this bound on T' is below the phase threshold for strength 2 since \binom{m}{2} ln R = 2(m² − m) ln m ≥ (m²/2)(1 + 3 ln m) for m large enough. Hence, with probability at most 1/m the algorithm fails to find the optimum before the strength can change from 2 to a different value due to the threshold being exceeded.

We next bound the expected number of steps spent at larger strengths.
Since each increase of the radius implies an unsuccessful phase at strength 2, the probability that radius r, where 3 ≤ r ≤ m/2, is selected before finding the optimum is at most (1/m)^{r−2}. According to Lemma 1, the expected number of steps spent at radius r is O(m² ln m), since within each strength loop an improving two-bit flip is still available at strength 2, and strengths beyond 2 are only entered with probability at most 1/R each. By the law of total probability, the expected number of steps at strengths larger than 2 is at most

  Σ_{r=3}^{m/2} (1/m)^{r−2} · O(m² ln m) = O(m ln m) = o(m²)

and contributes only a lower-order term captured by the o(1) in the statement of the theorem. If the strength exceeds 2, we wait for it to become 2 again and restart the previous drift analysis, which is conditional on the strength being at most 2. Since the probability of a failure is at most 1/m, this accounts for an expected number of at most 1/(1 − 1/m) restarts, which is 1 + o(1) as well.

It remains to bound the number of steps at strength 1. For each strict improvement, the strength is reset to 1. Thereafter, m ln R = 4m ln m steps pass before the strength becomes 2 again. Hence, if the strength does not exceed 2 before the optimum is reached, this adds a term of (4m ln m) S, where S is the number of strict improvements in the run, to the running time. The expected number of strict improvements is bounded by E(S), where we assume a random starting point of the algorithm and count the number of strict improvements after reaching the first spanning tree. If an error occurs and the strength exceeds 2, the remaining expected number of strict improvements will not be bigger. □

The term E(S) appearing in the previous theorem is not easy to bound. If E(S) = o(m), the upper bound suggests that SD-RLS* may be more efficient than the classical RLS_{1,2} algorithm, with the caveat that we are talking about upper bounds only. However, it is not difficult to find examples where E(S) = Ω(m), e.g., on the worst-case graph used for the lower-bound proof in Neumann and Wegener (2007), which we will study below experimentally, and we cannot generally rule out that E(S) is asymptotically bigger than m on certain instances.
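The rank equivalence and the leading terms discussed here can be made concrete in a small sketch. The helper names are ours, and `sd_rls_main_term` evaluates the main term of Theorem 6 as reconstructed above, so this is an illustration rather than a definitive formula:

```python
import math

def edge_ranks(weights):
    """Rank of each edge in the increasingly sorted weight sequence
    (ties broken by index); flipping at most two bits per step cannot
    distinguish the instance from this rank-weighted one."""
    order = sorted(range(len(weights)), key=lambda i: (weights[i], i))
    ranks = [0] * len(weights)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def sd_rls_main_term(ranks):
    """Main term (m^2/2) * (1 + ln(r_1 + ... + r_m)) of the bound above."""
    m = len(ranks)
    return (m * m / 2) * (1 + math.log(sum(ranks)))

def rls12_main_term(m):
    """Leading term 2 m^2 ln m discussed for RLS_{1,2}."""
    return 2 * m * m * math.log(m)
```

Since r_1 + ... + r_m ≤ m², the first term is at most (m²/2)(1 + 2 ln m), i.e., roughly half of 2m² ln m, which is the point of the comparison above.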
However, empirically SD-RLS* can be faster than RLS_{1,2} and the (1+1) EA on MST instances, as we will see in Section 7. In any case, although the algorithm can search globally, the bound in Theorem 6 does not suffer from the log(w_max) factor appearing in the analysis of the (1+1) EA.

We also considered variants of SD-RLS* that do not reset the strength to 1 after each strict improvement and would therefore be able to work with strength 2 for a long while on the MST problem. However, such an approach is risky in scenarios where, e.g., both one-bit flips and two-bit flips are possible and one-bit flips should be exploited for the sake of efficiency. Instead, we think that a combination of stagnation detection and selection hyperheuristics (Warwicker, 2019) based on the s-flip operator, or the learning mechanism from Doerr, Doerr and Yang (2016), which performs very well on the MST problem, would be more promising here.

Experiments

In this section, we present the results of the experiments conducted to assess the performance of the proposed algorithms for small problem dimensions. This experimental design was employed because our theoretical results are asymptotic.

In the first experiment, we ran an implementation of Algorithm 3 (SD-RLS*) on the Jump fitness function with jump size m = 4 and n varying from 80 to 160. We compared our algorithm against the (1+1) EA with standard mutation rate 1/n, the (1+1) EA with mutation probability m/n, Algorithm (1+1) FEA_β from Doerr et al. (2017) with three different choices of β, and the SD-(1+1) EA presented in Rajabi and Witt (2020). In Figure 1, we observe that SD-RLS* outperforms the rest of the algorithms.

Figure 1: Average number of fitness calls (over 1000 runs) the mentioned algorithms took to optimize Jump.

In the second experiment, we ran an implementation of four algorithms: SD-RLS*, (1+1) FEA_β with β = 1.5
, RLS_{1,2}, and the (1+1) EA on the MST problem with the fitness function from Neumann and Wegener (2007) for two types of graphs called TG and Erdős–Rényi. The graph TG with n vertices and m = 3n/4 + \binom{n/2}{2} edges contains a sequence of p = n/4 triangles attached to a complete graph on q = n/2 vertices. Regarding the weights, the edges of the complete graph have weight 1, and we set the weights of the edges in each triangle to 2a and 3a for the side edges and the main edge, respectively. In this paper, we consider a = n². The graph TG is used for estimating lower bounds on the expected runtime of the (1+1) EA and RLS in the literature (Neumann and Wegener, 2007). In this experiment, we use four values of n up to 24. As can be seen in Figure 4b, (1+1) FEA_β is faster than the rest of the algorithms, but SD-RLS* outperforms the standard (1+1) EA and RLS_{1,2}.

Figure 2: Box plots comparing the number of fitness calls (over 1000 runs) the mentioned algorithms took to optimize Jump.

Figure 3: Example graph TG with p = n/4 triangles and a complete graph on q = n/2 vertices with edges of weight 1. The image is taken from Neumann and Wegener (2007).

Figure 4: Average number of fitness calls (over 400 runs) the mentioned algorithms took to optimize the MST fitness function of the graphs: (a) Erdős–Rényi graphs, (b) graphs TG.

Regarding the Erdős–Rényi graphs, we produced random Erdős–Rényi graphs with edge probability p = (2 ln n)/n and assigned each edge an integer weight in the range [1, n] uniformly at random. We also checked that the graphs actually had a spanning tree. Then, we ran the implementations on the MST fitness function of these graphs. The obtained results can be seen in Figure 4a. As we discussed in Section 6, SD-RLS* does not outperform the (1+1) EA and RLS_{1,2} on MST instances where the number of strict improvements of SD-RLS* is large.

For statistical tests, we ran the algorithms on the graphs TG and Erdős–Rényi over 400 times, and all p-values obtained from a Mann–Whitney U-test between the algorithms, with respect to the null hypothesis of identical behavior, are less than 10^{−3}, except for the results regarding the graph TG with n = 24.

Conclusions
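The worst-case instance TG can be generated with the sizes and weights as reconstructed above (p = n/4 triangles, a complete graph on q = n/2 vertices, a = n²). This generator is our own sketch, not the authors' implementation, and assumes n divisible by 4:

```python
def tg_graph(n, a=None):
    """Edge list (u, v, weight) of the graph TG: a chain of p = n/4
    triangles attached to a complete graph on q = n/2 vertices.
    Clique edges have weight 1; triangle side edges weight 2a and the
    main edge weight 3a, with a = n^2 by default (reconstructed sizes)."""
    if a is None:
        a = n * n
    p, q = n // 4, n // 2
    edges = []
    # chain of triangles on vertices 0..2p: triangle i has main edge
    # (2i, 2i+2) and two side edges through the apex vertex 2i+1
    for i in range(p):
        u, w, apex = 2 * i, 2 * i + 2, 2 * i + 1
        edges.append((u, apex, 2 * a))
        edges.append((apex, w, 2 * a))
        edges.append((u, w, 3 * a))
    # complete graph on the last q vertices (vertex 2p = n - q is shared)
    for u in range(n - q, n):
        for v in range(u + 1, n):
            edges.append((u, v, 1))
    return edges
```

For n = 8 this yields 3·8/4 + \binom{4}{2} = 12 edges, matching the edge count stated above.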
We have transferred stagnation detection, previously proposed for EAs with standard bit mutation, to the operator flipping exactly s uniformly randomly chosen bits as typically encountered in randomized local search. Through both theoretical runtime analyses and experimental studies, we have shown that this combination of stagnation detection and local search leaves local optima efficiently and often outperforms the previously considered variants with global mutation. We have also introduced techniques that make the algorithm robust if it, due to its randomized nature, misses the right number of bits to flip, and we have analyzed scenarios where global mutations are still preferable. In the future, we would like to investigate stagnation detection more thoroughly on instances of classical combinatorial optimization problems like the minimum spanning tree problem, for which the present paper only gives preliminary but promising results.

Acknowledgement

This work was supported by a grant by the Danish Council for Independent Research (DFF-FNU 8021-00260B).
References
Bassin, Anton and Buzdalov, Maxim (2019). The 1/5-th rule with rollbacks: on self-adjustment of the population size in the (1+(λ,λ)) GA. In Proc. of GECCO 2019 (Companion), 277–278. ACM Press.

Corus, Dogan, Oliveto, Pietro S., and Yazdani, Donya (2018). Fast artificial immune systems. In Proc. of PPSN 2018, vol. 11102 of LNCS, 67–78. Springer.

Doerr, Benjamin (2020). Probabilistic tools for the analysis of randomized optimization heuristics. In Doerr, Benjamin and Neumann, Frank (eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, 1–87. Springer.

Doerr, Benjamin and Doerr, Carola (2016). The impact of random initialization on the runtime of randomized search heuristics. Algorithmica, 75(3), 529–553.

Doerr, Benjamin and Doerr, Carola (2018). Optimal static and self-adjusting parameter choices for the (1+(λ,λ)) genetic algorithm. Algorithmica, 80(5), 1658–1709.

Doerr, Benjamin and Doerr, Carola (2020). Theory of parameter control for discrete black-box optimization: Provable performance gains through dynamic parameter choices. In Doerr, B. and Neumann, F. (eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, 271–321. Springer.

Doerr, Benjamin, Doerr, Carola, and Yang, Jing (2016). k-bit mutation with self-adjusting k outperforms standard bit mutation. In Proc. of PPSN 2016, vol. 9921 of Lecture Notes in Computer Science, 824–834. Springer.

Doerr, Benjamin, Fouz, Mahmoud, and Witt, Carsten (2010). Quasirandom evolutionary algorithms. In Proc. of GECCO '10, 1457–1464. ACM Press.

Doerr, Benjamin, Gießen, Christian, Witt, Carsten, and Yang, Jing (2019). The (1+λ) evolutionary algorithm with self-adjusting mutation rate. Algorithmica, 81(2), 593–631.

Doerr, Benjamin, Jansen, Thomas, and Klein, Christian (2008). Comparing global and local mutations on bit strings. In Ryan, Conor and Keijzer, Maarten (eds.), Proc. of GECCO '08, 929–936. ACM Press.

Doerr, Benjamin, Johannsen, Daniel, and Winzen, Carola (2012). Multiplicative drift analysis. Algorithmica, 64, 673–697.

Doerr, Benjamin, Le, Huu Phuoc, Makhmara, Régis, and Nguyen, Ta Duy (2017). Fast genetic algorithms. In Proc. of GECCO '17, 777–784. ACM Press.

Hansen, Pierre and Mladenovic, Nenad (2018). Variable neighborhood search. In Martí, Rafael, Pardalos, Panos M., and Resende, Mauricio G. C. (eds.), Handbook of Heuristics, 759–787. Springer.

Lengler, Johannes (2020). Drift analysis. In Doerr, B. and Neumann, F. (eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, 89–131. Springer.

Lissovoi, Andrei, Oliveto, Pietro S., and Warwicker, John Alasdair (2019). On the time complexity of algorithm selection hyper-heuristics for multimodal optimisation. In Proc. of AAAI 2019, 2322–2329. AAAI Press.

Lugo, Michael (2017). Sum of "the first k" binomial coefficients for fixed n. MathOverflow. URL https://mathoverflow.net/q/17236 (version: 2017-10-01).

Neumann, Frank and Wegener, Ingo (2007). Randomized local search, evolutionary algorithms, and the minimum spanning tree problem. Theoretical Computer Science, 378, 32–40.

Raidl, Günther R., Koller, Gabriele, and Julstrom, Bryant A. (2006). Biased mutation operators for subgraph-selection problems. IEEE Transactions on Evolutionary Computation, 10(2), 145–156.

Rajabi, Amirhossein and Witt, Carsten (2020). Self-adjusting evolutionary algorithms for multimodal optimization. In Proc. of GECCO '20, 1314–1322. ACM Press.

Warwicker, John Alasdair (2019). On the runtime analysis of selection hyper-heuristics for pseudo-Boolean optimisation. Ph.D. thesis, University of Sheffield, UK. URL http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.786561.

Wegener, Ingo (2001). Methods for the analysis of evolutionary algorithms on pseudo-Boolean functions. In Sarker, Ruhul, Mohammadian, Masoud, and Yao, Xin (eds.), Evolutionary Optimization. Kluwer Academic Publishers.

Witt, Carsten (2014). Revised analysis of the (1+1) EA for the minimum spanning tree problem. In