Stagnation Detection with Randomized Local Search
Amirhossein Rajabi, Technical University of Denmark, Kgs. Lyngby, Denmark ([email protected])
Carsten Witt, Technical University of Denmark, Kgs. Lyngby, Denmark ([email protected])
February 9, 2021
Abstract
Recently a mechanism called stagnation detection was proposed that automatically adjusts the mutation rate of evolutionary algorithms when they encounter local optima. The so-called SD-(1+1) EA introduced by Rajabi and Witt (GECCO 2020) adds stagnation detection to the classical (1+1) EA with standard bit mutation, which flips each bit independently with some mutation rate, and raises the mutation rate when the algorithm is likely to have encountered local optima.

In this paper, we investigate stagnation detection in the context of the k-bit flip operator of randomized local search that flips k bits chosen uniformly at random, and let stagnation detection adjust the parameter k. We obtain improved runtime results compared to the SD-(1+1) EA amounting to a speed-up of up to e ≈ 2.72. Moreover, we propose additional schemes that prevent infinite optimization times even if the algorithm misses a working choice of k due to unlucky events. Finally, we present an example where standard bit mutation still outperforms the local k-bit flip with stagnation detection.

1 Introduction

Evolutionary Algorithms (EAs) are parameterized algorithms, so it has been an ongoing research question how to choose their parameters best. Static parameter settings are not efficient for a wide range of problems. Moreover, given a specific problem, there might be different scenarios during the optimization, so that one static parameter configuration is inefficient for the whole run. Self-adjusting mechanisms address this issue as a non-static parameter control framework that can learn acceptable or even near-optimal parameter settings on the fly. See also the survey article by Doerr and Doerr (2020) for a detailed coverage of static and non-static parameter control.

Many studies have been conducted on frameworks which adjust the mutation rate of different mutation operators, in particular the standard bit mutation on the search space of bit strings $\{0,1\}^n$, to make the rate efficient on unimodal functions. For example, the (1+(λ,λ)) GA self-adjusts its parameters via the 1/5-rule on OneMax (Doerr and Doerr, 2018), resulting in asymptotic speed-ups compared to static settings. Likewise, the self-adjusting mechanism in the (1+λ) EA with two rates proposed in Doerr et al. (2019) performs on unimodal functions as efficiently as the best λ-parallel unary unbiased black-box algorithm.

The self-adjusting frameworks mentioned above are mainly designed to optimize unimodal functions. Generally, they are not able to suggest an efficient parameter setting when algorithms get stuck in a local optimum, since they mainly work based on the number of successes, so there is no signal in such a situation. On multimodal functions, where some specific number of bits has to flip to make progress, Stagnation Detection (SD), introduced in Rajabi and Witt (2020), can overcome local optima in efficient time. This module can be added to most of the existing algorithms to leave local optima without any significant increase of the optimization time on unimodal (sub)problems.

To our knowledge, no study has put forward other runtime analyses of self-adjusting mechanisms on multimodal functions. However, in the broader context of mutation-based randomized search heuristics, the heavy-tailed mutation presented in Doerr et al. (2017) has been able to leave a local optimum in much more efficient time than the standard bit mutation does. Moreover, in the context of artificial immune systems (Corus, Oliveto and Yazdani, 2018) and hyperheuristics (Lissovoi, Oliveto and Warwicker, 2019), there are proofs that specific search operators and the selection of low-level heuristics can speed up multimodal optimization compared to the classical mutation operators.

Recent theoretical research on evolutionary algorithms in discrete search spaces mainly considers global mutations which can create all possible points in one iteration.
These mutations have been functional in optimization scenarios where information about the difficulties of the local optima is not available. For example, the standard bit mutation, which flips each bit independently with a non-zero probability, can produce any point in the search space. However, local mutations can only create a fixed set of offspring points. The 1-bit flip mutation that is often found in the Randomized Local Search algorithm (RLS) can only reach a limited number of search points, which can result in being stuck in a local optimum under elitist selection. Nevertheless, local mutations may outperform global mutations on unimodal functions and on multimodal functions with known gap sizes. It is of special interest to use the advantages of local mutations on unimodal (sub)functions and additionally to overcome local optima efficiently.

This paper investigates k-bit flip mutation as a local mutation in the context of the above-mentioned stagnation detection mechanism. This mechanism detects when the algorithm is stuck in a local optimum and gradually increases the mutation strength (i.e., the number of flipped bits) to a value the algorithm needs to leave the local optimum. Similarly, we aim to show that algorithms using the k-bit flip can use stagnation detection to tune the parameter k. One of the key benefits of such algorithms is the efficiency of RLS, which performs very well on unimodal (sub)problems, without the fear of infinite running time in local optima. An additional advantage of using k-bit flip mutation accompanied by stagnation detection is that it overcomes local optima more efficiently than global mutations.
Moreover, the outcome points out the advantages and practicability of our self-adjusting approach that makes local-mutation algorithms able to optimize functions that have been intractable for them so far.

We propose two algorithms combining stagnation detection with local mutations. The first algorithm, called SD-RLS, gradually increases the mutation strength when the current strength has been unsuccessful in finding improvements for a significantly long time. In the most extreme case, the strength ends at n, i.e., mutations flipping all bits. With high probability, SD-RLS has a runtime that is by a factor of $(en/m)^m/\binom{n}{m}$ (up to lower-order terms) smaller on functions with Hamming gaps of size m than that of the SD-(1+1) EA previously considered in Rajabi and Witt (2020). This improvement is especially strong for small m and amounts to a factor of e on unimodal functions. Although it is unlikely that the algorithm fails to find an improvement when the current strength allows this, there is a risk that this algorithm misses the "right" strength and therefore it can have infinite expected runtime. To address this, we propose a second algorithm called SD-RLS∗ that repeatedly loops over all strengths smaller than the last attempted one when it fails to find an improvement. This results in expected finite optimization time on all problems and only increases the typical runtime by lower-order terms compared to SD-RLS. We also observe that the algorithms we obtain can still follow the same search trajectory as the classical RLS when one-bit flips are sufficient to make improvements.
In those cases, well-established techniques for the analysis of RLS like the fitness-level method carry over to our variant enhanced with stagnation detection. This is not necessarily the case in related approaches like variable neighborhood search (Hansen and Mladenovic, 2018) and quasirandom evolutionary algorithms (Doerr, Fouz and Witt, 2010), both of which employ more determinism and do not generally follow the trajectory of RLS.

We shall investigate the two suggested algorithms on unimodal functions and on functions with local optima of different so-called gap sizes, corresponding to the number of bits that need to be flipped to escape from the optima. Many results are obtained following the analysis of the SD-(1+1) EA (Rajabi and Witt, 2020), which uses a global operator with self-adjusted mutation strength. In fact, often the general proof structure could be taken over almost literally, but with improved overall bounds. In conclusion, the self-adjusting local mutation seems to be the preferable alternative to the SD-(1+1) EA with global mutation. However, we will also investigate carefully chosen scenarios where global mutations are superior.

This paper is structured as follows: in Section 2, we state the classical RLS algorithm and introduce our self-adjusting variants with stagnation detection; moreover, we collect important mathematical tools. Section 3 shows runtime results for the simpler variant SD-RLS, concentrating on the probability of leaving local optima, while Section 4 gives a more detailed analysis of the variant SD-RLS∗ on benchmark functions like OneMax and
Jump. Section 5 analyzes an example function which the standard (1+1) EA with standard bit mutation can solve in polynomial time with high probability, whereas the k-bit flip mutation with stagnation detection needs exponential time. Through improved upper bounds, we give in Section 6 indications that our approach may also be superior to static settings on instances of the minimum spanning tree problem. This problem and other scenarios are investigated experimentally in Section 7 before we finally conclude the paper.

2 Preliminaries

In this paper, we consider pseudo-boolean functions $f\colon \{0,1\}^n \to \mathbb{R}$ that w.l.o.g. are to be maximized. One of the first randomized search heuristics studied in the literature is randomized local search (RLS) (Doerr and Doerr, 2016), displayed in Algorithm 1. This heuristic starts with a random search point and then repeatedly mutates the point by flipping s uniformly chosen bits (without replacement) and replaces it with the offspring if the offspring is not worse than the parent.
Algorithm 1: RLS with static strength s
  Select x uniformly at random from $\{0,1\}^n$.
  for t ← 1, 2, ... do
    Create y by flipping s bit(s) in a copy of x.
    if f(y) ≥ f(x) then x ← y.

The runtime or optimization time of a heuristic on a function f is the first point in time t where a search point of maximal fitness has been created; often the expected runtime, i.e., the expected value of this time, is analyzed.

Theoretical research on evolutionary algorithms mainly studies algorithms on simple unimodal well-known benchmark problems like OneMax$(x_1,\dots,x_n) := |x|_1$, where $|x|_1$ denotes the number of one-bits of x, but also on the multimodal Jump$_m$ function with gap size m, defined as follows:

$$\mathrm{Jump}_m(x_1,\dots,x_n) = \begin{cases} m + |x|_1 & \text{if } |x|_1 \le n - m \text{ or } |x|_1 = n,\\ n - |x|_1 & \text{otherwise.} \end{cases}$$

The mutation used in RLS is a local mutation as it only produces a limited number of offspring. This mutation, which we call s-flip in the following (in the introduction, we used the classical name k-bit flip), flips exactly s bits randomly chosen from the bit string of length n, so from any point $x \in \{0,1\}^n$, RLS can only sample among $\binom{n}{s}$ possible points. As a result, the s-flip is often more efficient compared to global mutations when we know the difficulty of making progress, since the algorithm only looks at a certain part of the search space. To be more precise, we recall the so-called gap of a point $x \in \{0,1\}^n$, defined in Rajabi and Witt (2020) as the minimum Hamming distance to points with strictly larger fitness value. Formally,

$$\mathrm{gap}(x) := \min\{H(x,y) : f(y) > f(x),\ y \in \{0,1\}^n\}.$$
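As a concrete illustration (our own sketch, not part of the paper), Algorithm 1 and the Jump_m function defined above can be written in Python as follows; the names rls and jump are ours:

```python
import random

def jump(x, m):
    """Jump_m as defined above: fitness m + |x|_1 if |x|_1 <= n - m or |x|_1 = n,
    and n - |x|_1 otherwise (a valley of width m in front of the optimum)."""
    n, ones = len(x), sum(x)
    return m + ones if ones <= n - m or ones == n else n - ones

def rls(f, n, s=1, iters=50_000, rng=None):
    """Algorithm 1: RLS with static strength s and elitist selection."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(iters):
        y = x[:]
        for i in rng.sample(range(n), s):   # flip s distinct bits (no replacement)
            y[i] ^= 1
        if f(y) >= f(x):                    # accept if not worse than the parent
            x = y
    return x
```

With s = 1 and f = sum (OneMax), rls reliably reaches the all-ones string; on Jump_m it gets stuck on the plateau with n − m one-bits, which is exactly the situation stagnation detection addresses below.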
It is not possible to make progress by flipping fewer than gap(x) bits of the current search point x. However, if the algorithm uses the s-flip with s = gap(x), it can make progress with positive probability. In addition, on unimodal functions, where the gap of all points in the search space (except for global optima) is one, the algorithm makes progress with strength s = 1.

Nevertheless, understanding the difficulty of a local optimum has not generally been possible so far, and benefiting from domain knowledge to determine the strength is not always feasible from the perspective of black-box optimization. Therefore, despite the advantages of the s-flip, global mutations, e.g., standard bit mutation, which can produce any point in the search space, have been used in the literature frequently. For example, the (1+1) EA, which uses a similar approach to Algorithm 1, employs standard bit mutation that implicitly uses the binomial distribution to determine how many bits must flip. Consequently, even if the algorithm uses strength 1 (i.e., mutation rate 1/n), it can escape from any local optimum with positive probability.

We study the search and success probability of Algorithm 1 and its relation to stagnation detection more closely. With similar arguments as presented in Rajabi and Witt (2020), if the gap of the current search point is 1, then the algorithm makes an improvement with probability at least 1/n at strength 1, and the probability of not finding it within $n \ln R$ steps is at most $(1 - 1/n)^{n \ln R} \le 1/R$ (where R is a parameter to be discussed). Similarly, the probability of not finding an improvement for a point with gap k within $\binom{n}{k}\ln R$ steps is at most

$$\left(1 - \binom{n}{k}^{-1}\right)^{\binom{n}{k}\ln R} \le \frac{1}{R}.$$
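The failure bound just derived can be checked numerically; the following sketch (our own illustration) verifies $(1 - \binom{n}{k}^{-1})^{\binom{n}{k}\ln R} \le 1/R$ for a few strengths k:

```python
from math import comb, log

n, R = 50, 1000.0
for k in (1, 2, 3, 4):
    phase_len = comb(n, k) * log(R)              # threshold used by stagnation detection
    p_miss = (1 - 1 / comb(n, k)) ** phase_len   # prob. of no success within the phase
    assert p_miss <= 1 / R                       # matches the 1/R failure bound
```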
Hence, after $\binom{n}{k}\ln R$ steps without improvement, there is a probability of at least 1 − 1/R that no improvement at Hamming distance k exists, so for large enough R the probability of failing is small.

We use this idea to develop the first algorithm: we add the stagnation detection mechanism to RLS to manage the strength s. As shown in Algorithm 2, hereinafter called SD-RLS, the initial strength is 1. Also, there is a counter u for counting the number of unsuccessful steps since the last success. When the counter exceeds the threshold $\binom{n}{s}\ln R$, the strength s is increased by one, and when the algorithm makes progress, the counter and strength are reset to their initial values. In the case that the algorithm fails to have a success while the strength equals the gap of the current search point, the algorithm misses its chance of making progress. Therefore, with probability at most 1/R, the optimization time would be infinite. Choosing a large enough R to have an overwhelming probability of making progress could be a solution to this problem. However, we propose another algorithm that resolves this issue, although its running time is not always as efficient as that of Algorithm 2.

In Algorithm 3, hereinafter called SD-RLS∗, we introduce a new variable r called the radius. This parameter determines the largest Hamming distance from the current search point that the algorithm must investigate. In detail, when the radius becomes r, the algorithm starts with strength r (i.e., s = r), and when the threshold is exceeded, it decreases the strength by one as long as the strength is greater than 1. This results in a more robust behavior. In the case that the threshold is exceeded and the current strength is 1, the radius is increased by one to cover a more expanded space. Also, when the radius would exceed n/2, the algorithm increases the radius to n, which means that the algorithm covers all possible strengths between 1 and n. We note that the strategy of repeatedly returning to lower strengths remotely resembles the 1/…
Algorithm 2: RLS with stagnation detection (SD-RLS)
  Select x uniformly at random from $\{0,1\}^n$ and set $s_1 \leftarrow 1$, $u \leftarrow 0$.
  for t ← 1, 2, ... do
    Create y by flipping $s_t$ bits in a copy of x, chosen uniformly.
    u ← u + 1.
    if f(y) > f(x) then
      x ← y; $s_{t+1} \leftarrow 1$; u ← 0.
    else if f(y) = f(x) and $s_t = 1$ then
      x ← y.
    if $u > \binom{n}{s_t}\ln R$ then
      $s_{t+1} \leftarrow \min\{s_t + 1, n\}$; u ← 0.
    else
      $s_{t+1} \leftarrow s_t$.

The parameter R controls the probability of failing to find an improvement at the "right" strength. More precisely, as we will see in Theorem 1 and Lemma 2 (for SD-RLS and SD-RLS∗, respectively), the probability of not finding an improvement where there is a potential of making progress is at most 1/R. We recommend R ≥ |Im f| for SD-RLS (where Im f is the image set of f), and R ≥ $n^{\varepsilon}$ · |Im f| for a constant ε > 0 for SD-RLS∗, so that the probability of ever missing an improvement at the right strength is sufficiently small throughout the run.

The following lemma, containing some combinatorial inequalities, will be used in the analyses of the algorithms. The first part of the lemma seems to be well known, has already been proved in Lugo (2017), and is also a consequence of Lemma 1.10.38 in Doerr (2020). The second part follows from elementary manipulations.
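For concreteness, Algorithm 2 might be implemented as the following Python sketch (our own rendering; the function name sd_rls is not from the paper):

```python
import random
from math import comb, log

def sd_rls(f, n, R, iters=300_000, rng=None):
    """Algorithm 2 (SD-RLS): the strength s starts at 1 and is raised to
    min(s + 1, n) whenever the counter u of unsuccessful steps exceeds the
    threshold comb(n, s) * ln(R); strict improvements reset s and u."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    s, u = 1, 0
    for _ in range(iters):
        y = x[:]
        for i in rng.sample(range(n), s):
            y[i] ^= 1
        u += 1
        if f(y) > f(x):
            x, s, u = y, 1, 0            # improvement: reset strength and counter
        elif f(y) == f(x) and s == 1:
            x = y                        # accept equally good points only at strength 1
        if u > comb(n, s) * log(R):
            s, u = min(s + 1, n), 0      # stagnation detected: raise the strength
    return x
```

On Jump_2 with a generous R, this escapes the plateau by raising the strength to 2; as discussed above, a miss at the right strength (probability at most 1/R) would leave plain SD-RLS stuck, which is what motivates SD-RLS∗.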
Algorithm 3: RLS with robust stagnation detection (SD-RLS∗)
  Select x uniformly at random from $\{0,1\}^n$ and set $r_1 \leftarrow 1$, $s_1 \leftarrow 1$, $u \leftarrow 0$.
  for t ← 1, 2, ... do
    Create y by flipping $s_t$ bits in a copy of x, chosen uniformly.
    u ← u + 1.
    if f(y) > f(x) then
      x ← y; $s_{t+1} \leftarrow 1$; $r_{t+1} \leftarrow 1$; u ← 0.
    else if f(y) = f(x) and $r_t = 1$ then
      x ← y.
    if $u > \binom{n}{s_t}\ln R$ then
      if $s_t = 1$ then
        if $r_t < n/2$ then $r_{t+1} \leftarrow r_t + 1$ else $r_{t+1} \leftarrow n$.
        $s_{t+1} \leftarrow r_{t+1}$.
      else
        $r_{t+1} \leftarrow r_t$; $s_{t+1} \leftarrow s_t - 1$.
      u ← 0.
    else
      $s_{t+1} \leftarrow s_t$; $r_{t+1} \leftarrow r_t$.
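Algorithm 3 differs from the sketch of Algorithm 2 only in the radius bookkeeping; a Python rendering (again our own; the name sd_rls_star is hypothetical) could look like this:

```python
import random
from math import comb, log

def sd_rls_star(f, n, R, iters=400_000, rng=None):
    """Algorithm 3 (SD-RLS*): a radius r caps the strength; after a failed phase
    the strength counts down r, r - 1, ..., 1 before the radius grows (jumping
    to n once it would exceed n/2), so no strength is ever skipped for good."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    r, s, u = 1, 1, 0
    for _ in range(iters):
        y = x[:]
        for i in rng.sample(range(n), s):
            y[i] ^= 1
        u += 1
        if f(y) > f(x):
            x, r, s, u = y, 1, 1, 0            # improvement: reset radius and strength
        elif f(y) == f(x) and r == 1:
            x = y
        if u > comb(n, s) * log(R):
            if s == 1:
                r = r + 1 if r < n / 2 else n  # open the next, larger radius
                s = r                          # restart at the top strength
            else:
                s -= 1                         # count down within the radius
            u = 0
    return x
```

Unlike sd_rls, a missed success at the right strength is not fatal here: the strength m is tried again in every later phase with radius at least m.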
Lemma 1. For any integer m ≤ n/2, we have

(a) $\sum_{i=1}^{m}\binom{n}{i} \le \frac{n-(m-1)}{n-(2m-1)}\binom{n}{m}$,

(b) $\binom{n}{M} \le \binom{n}{m}\left(\frac{n-m}{m}\right)^{M-m}$ for m < M < n/2.

Proof. (a) We use the following proof due to Lugo (2017). Through the identity $\binom{n}{k-1} = \frac{k}{n-k+1}\binom{n}{k}$, which comes from the definition of the binomial coefficient, and the geometric series sum formula, we achieve the following result:

$$\frac{\sum_{i=1}^{m}\binom{n}{i}}{\binom{n}{m}} = \frac{\binom{n}{m}}{\binom{n}{m}} + \frac{\binom{n}{m-1}}{\binom{n}{m}} + \dots + \frac{\binom{n}{1}}{\binom{n}{m}} < 1 + \frac{m}{n-m+1} + \left(\frac{m}{n-m+1}\right)^2 + \dots = \frac{n-(m-1)}{n-(2m-1)}.$$

(b) For 0 < t < M, by using $\binom{n}{m} = \frac{n-m+1}{m}\binom{n}{m-1}$ repeatedly, we have

$$\binom{n}{M} = \frac{(n-M+1)\cdots(n-M+t)}{M\cdots(M-t+1)}\binom{n}{M-t} \le \left(\frac{n-M+t}{M-t}\right)^{t}\binom{n}{M-t}.$$

Thus, by setting t = M − m, we obtain the statement. □

3 Analysis of SD-RLS

In this section, we study the first algorithm, called SD-RLS; see Algorithm 2. At the beginning of the section, we show upper and lower bounds on the time for escaping from local optima. Then, in Theorem 2, we show the important result that on unimodal functions, SD-RLS with probability 1 − |Im f|/R behaves in the same way as RLS with strength 1, including the same asymptotic bound on the expected optimization time. The following theorem shows the time SD-RLS takes, with probability 1 − 1/R, to make progress from a search point x with a gap of m. Theorem 1.
Let $x \in \{0,1\}^n$ be the current search point of SD-RLS on a pseudo-boolean function $f\colon \{0,1\}^n \to \mathbb{R}$. Define $T_x$ as the time to create a strict improvement if gap(x) = m, and let U be the event of finding an improvement at Hamming distance m. Then we have

$$E(T_x \mid U) \le \begin{cases} \binom{n}{m}\left(1 + O\!\left(\frac{m\ln R}{n}\right)\right) & \text{if } m = o(n),\\ O\!\left(\binom{n}{m}\ln R\right) & \text{if } m = \Theta(n) \wedge m < n/2,\\ O(2^n\ln R) & \text{if } m \ge n/2. \end{cases}$$

Moreover, Pr(U) ≥ 1 − 1/R.

Compared to the corresponding theorems in Rajabi and Witt (2020), the bounds in Theorem 1 are by a factor of $(en/m)^m/\binom{n}{m}$ (up to lower-order terms) smaller. This speed-up is roughly e for m = 1, i.e., unimodal functions (like OneMax), but becomes less pronounced for larger m since, intuitively, the number of flipped bits in a standard bit mutation becomes more and more concentrated and starts resembling the m-bit flip mutation.
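Both the speed-up factor and Lemma 1 are easy to sanity-check numerically; the sketch below (our own) does so for n = 30:

```python
from math import comb, e

n = 30
# Speed-up factor (en/m)^m / comb(n, m) over the SD-(1+1) EA bounds
speedup = lambda m: (e * n / m) ** m / comb(n, m)
assert abs(speedup(1) - e) < 1e-9      # exactly e for m = 1, the unimodal case

# Lemma 1(a): sum_{i=1}^{m} comb(n, i) <= (n-(m-1))/(n-(2m-1)) * comb(n, m)
for m in range(1, 6):
    lhs = sum(comb(n, i) for i in range(1, m + 1))
    assert lhs <= (n - (m - 1)) / (n - (2 * m - 1)) * comb(n, m)

# Lemma 1(b): comb(n, M) <= comb(n, m) * ((n-m)/m)**(M-m), here m=2, M=5
assert comb(n, 5) <= comb(n, 2) * ((n - 2) / 2) ** 3
```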
Proof of Theorem 1. The algorithm SD-RLS can make an improvement only while the current strength s equals m, and the probability of not finding an improvement during this phase is

$$\left(1 - \binom{n}{m}^{-1}\right)^{\binom{n}{m}\ln R} \le \frac{1}{R}.$$

If the improvement event happens, the running time of the algorithm to escape from this local optimum is

$$E(T_x \mid U) < \underbrace{\sum_{i=1}^{m-1}\binom{n}{i}\ln R}_{=:\,S_1} + \underbrace{\binom{n}{m}}_{=:\,S_2},$$

where $S_1$ is the number of iterations at strengths s < m and $S_2$ is the expected number of iterations needed to make an improvement at s = m.

By using Lemma 1 for m < n/2, we have

$$E(T_x \mid U) < \sum_{i=1}^{m-1}\binom{n}{i}\ln R + \binom{n}{m} < \frac{n-m+2}{n-2m+3}\binom{n}{m-1}\ln R + \binom{n}{m} = \frac{n-m+2}{n-2m+3}\cdot\frac{m}{n-m+1}\binom{n}{m}\ln R + \binom{n}{m} = \binom{n}{m}\left(\frac{n-m+2}{n-2m+3}\cdot\frac{m}{n-m+1}\ln R + 1\right),$$

and for m ≥ n/2, we know that $\sum_{i=1}^{n}\binom{n}{i} < 2^n$, so we can compute

$$E(T_x \mid U) = \sum_{i=1}^{m-1}\binom{n}{i}\ln R + \binom{n}{m} \le O(2^n\ln R).$$

Altogether, we achieve

$$E(T_x \mid U) \le \begin{cases} \binom{n}{m}\left(1 + O\!\left(\frac{m\ln R}{n}\right)\right) & \text{if } m = o(n),\\ O\!\left(\binom{n}{m}\ln R\right) & \text{if } m = \Theta(n) \wedge m < n/2,\\ O(2^n\ln R) & \text{if } m \ge n/2. \end{cases}$$ □

Using the previous theorem, we obtain the following result that allows us to reuse existing results for RLS on unimodal functions.
Theorem 2.
Let $f\colon \{0,1\}^n \to \mathbb{R}$ be a unimodal function and consider SD-RLS with R ≥ |Im f|. Then, with probability at least 1 − |Im f|/R, SD-RLS never increases the strength and behaves stochastically like RLS before finding an optimum of f. Proof.
As on unimodal functions the gap of all points is 1, the probability of not finding an improvement at strength 1 within the threshold is

$$\left(1 - \frac{1}{n}\right)^{n\ln R} \le \frac{1}{R}.$$

At most |Im f| improving steps happen before finding the optimum, so by a union bound the probability of Algorithm 2 ever increasing the strength beyond 1 is at most |Im f|/R, which proves the theorem. □

With these two general results, we conclude the analysis of SD-RLS and turn to the variant SD-RLS∗ that always has finite expected optimization time. In fact, we will present similar results in general optimization scenarios and supplement them by analyses on specific benchmark functions. It is possible to analyze the simpler SD-RLS on these benchmark functions as well, but we do not feel that this gives additional insights.

4 Analysis of SD-RLS∗

In this section, we turn to the algorithm SD-RLS∗ that iteratively returns to lower strengths to avoid missing the "right" strength. We recall that $T_x$ is the number of steps SD-RLS∗ takes to find an improving point from the current search point x. Let phase r consist of all points of time where radius r is used in the algorithm. When the algorithm enters phase r, it starts with strength r, but when the counter exceeds the threshold, the strength decreases by one as long as it is greater than 1. In the case of strength 1, the radius r is increased to r + 1 (or to n if r + 1 would exceed n/2), and the algorithm enters phase r + 1 (or phase n).

Let $E_r$ be the event of not finding the optimum within phase r, and for j > i let $U_i^j$ be the event of not finding the optimum during phases i to j − 1 but finding it within phase j; in other words, $U_i^j = E_i \cap \dots \cap E_{j-1} \cap \overline{E_j}$. For i = j, we define $U_i^i = \overline{E_i}$. We obtain the following result on the failure probability, which follows from the fact that the algorithm tries to find an improvement for $\binom{n}{m}\ln R$ iterations, each with success probability $\binom{n}{m}^{-1}$, whenever the radius is at least m. Lemma 2.
Let $x \in \{0,1\}^n$ be the current search point of SD-RLS∗ on a pseudo-boolean fitness function $f\colon \{0,1\}^n \to \mathbb{R}$, and let m = gap(x). Then

$$\Pr(E_r) \le \begin{cases} 1/R & \text{if } m \le r < n,\\ 0 & \text{if } r = n. \end{cases}$$

Proof. Assume r ≥ m. During phase r, the algorithm spends $\binom{n}{m}\ln R$ steps at strength m before it changes the strength or the phase. Hence, the probability of not improving at strength s = m in phase r is at most

$$\Pr(E_r) \le \left(1 - \binom{n}{m}^{-1}\right)^{\binom{n}{m}\ln R} \le \frac{1}{R}.$$

If r = n, the algorithm does not change the radius r anymore, and it continues to flip s bits for varying values of s, including m, until making progress, so the probability of eventually failing to find the improvement in this phase is 0. □

The following lemma bounds the time to leave a local optimum conditional on the event that the "right" strength was missed.
Lemma 3. Let $x \in \{0,1\}^n$ with m = gap(x) < n/2 be the current search point of SD-RLS∗ with R ≥ $n^{\varepsilon}$ · |Im f| for an arbitrary constant ε > 0 on a pseudo-boolean function $f\colon \{0,1\}^n \to \mathbb{R}$, and let $T_x$ be the time to create a strict improvement. Then we have

$$E(T_x \mid E_m) = o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right),$$

where $E_m$ is the event of not finding the optimum when the radius r equals m.

The reason behind the factor R/|Im f| in Lemma 3 is the following: when proving a running time bound for SD-RLS∗ on a function f, the event $E_m$ happens with probability at most 1/R for each improvement to be found, so in the worst case, during the run, there are in expectation |Im f|/R search points where the counter exceeds the threshold, resulting in an expected number of at most $|\mathrm{Im}\,f|/R \cdot o(R/|\mathrm{Im}\,f| \cdot \binom{n}{m}) = o(\binom{n}{m})$ extra iterations for the whole run in the case of exceeding the thresholds. Also, note that we always have R/|Im f| = Ω(1) since, according to the assumption, R > |Im f|. Proof of Lemma 3.
Using the law of total probability with respect to the events $U_{m+1}^i$ defined above, we have

$$E(T_x \mid E_m) = \underbrace{\sum_{i=m+1}^{\lfloor n/2\rfloor-1} E\!\left(T_x \mid U_{m+1}^i\right)\Pr\!\left(U_{m+1}^i\right)}_{=:\,S_1} + \underbrace{E\!\left(T_x \mid U_{m+1}^n\right)\Pr\!\left(U_{m+1}^n\right)}_{=:\,S_2}.$$

In order to estimate $\Pr(U_{m+1}^i)$, using Lemma 2 we have

$$\Pr\!\left(U_{m+1}^i\right) < \prod_{j=m+1}^{i-1}\Pr(E_j) < R^{-(i-m-1)}.$$

For $S_1$, by using Lemma 1 multiple times, we compute

$$S_1 \le \sum_{i=m+1}^{\lfloor n/2\rfloor-1}\sum_{r=1}^{i}\sum_{s=1}^{r}\binom{n}{s}\ln R\cdot R^{-(i-m-1)} \le \sum_{i=m+1}^{\lfloor n/2\rfloor-1}\sum_{r=1}^{i}\frac{n-r+1}{n-2r+1}\binom{n}{r}\ln R\cdot R^{-(i-m-1)} \le R\sum_{i=m+1}^{\lfloor n/2\rfloor-1}\left(\frac{n-i+1}{n-2i+1}\right)^{2}\binom{n}{i}\ln R\cdot R^{-(i-m)} \le R\sum_{i=m+1}^{\lfloor n/2\rfloor-1}\left(\frac{n-i+1}{n-2i+1}\right)^{2}\left(\frac{n-m}{m}\right)^{i-m}\binom{n}{m}\ln R\cdot R^{-(i-m)}.$$

Using the fact that R ≥ $n^{\varepsilon}$ · |Im f| and i > m, the last expression is bounded from above by $o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right)$.

Regarding $S_2$, when the radius r is increased to n, the algorithm mutates s bits of the current search point for all possible strengths 1 to n periodically. In each cycle through the different strengths, according to Lemma 2, the algorithm escapes from the local optimum with probability at least 1 − 1/R, so there are at most R/(R−1) cycles in expectation by the geometric distribution. Besides, each cycle at radius n costs $\sum_{s=1}^{n}\binom{n}{s}\ln R$ steps. Overall, we have $\frac{R}{R-1}\sum_{s=1}^{n}\binom{n}{s}\ln R$ extra fitness function calls if the algorithm fails to find the improvement in the first ⌊n/2⌋ − 1 phases, which happens with probability at most $R^{-(\lfloor n/2\rfloor-m-1)}$. Thus, we have

$$S_2 \le \left(\sum_{r=1}^{\lfloor n/2\rfloor}\sum_{s=1}^{r}\binom{n}{s}\ln R + \frac{R}{R-1}\sum_{s=1}^{n}\binom{n}{s}\ln R\right) R^{-(\lfloor n/2\rfloor-m-1)},$$

which, bounding the binomial sums via Lemma 1 as before and using R ≥ $n^{\varepsilon}$ · |Im f| and m < n/2, is $o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right)$ as well.

Altogether, we finally have $E(T_x \mid E_m) = S_1 + S_2 = o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right)$, as suggested. □

The following theorem and its proof are similar to Theorem 1 but require a more careful analysis to cover the repeated use of smaller strengths. We note that the bounds differ from Theorem 1 only in lower-order terms unless m is very big. Theorem 3.
Let $x \in \{0,1\}^n$ be the current search point of SD-RLS∗ with R ≥ $n^{\varepsilon}$ · |Im f| for an arbitrary constant ε > 0 on a pseudo-boolean function $f\colon \{0,1\}^n \to \mathbb{R}$. Define $T_x$ as the time to create a strict improvement if gap(x) = m. Then we have

$$E(T_x) \le \begin{cases} \binom{n}{m}\left(1 + O\!\left(\frac{m(n-m)}{(n-2m+3)^2}\ln R\right)\right) & \text{if } m < n/2,\\ O(2^n n\ln R) & \text{if } m \ge n/2, \end{cases}$$

and $E(T_x) \ge \binom{n}{m}/W$, where W is the number of strictly better search points at Hamming distance m.

Proof. Using the law of total probability with respect to the event $E_m$ defined above, i.e., the event of not finding the improvement by the end of phase m, we have

$$E(T_x) = \underbrace{E\!\left(T_x \mid \overline{E_m}\right)\Pr\!\left(\overline{E_m}\right)}_{=:\,S_1} + \underbrace{E(T_x \mid E_m)\Pr(E_m)}_{=:\,S_2}.$$

Regarding $S_1$, it takes $\sum_{i=1}^{m-1}\sum_{j=1}^{i}\binom{n}{j}\ln R$ steps until SD-RLS∗ has increased both radius and strength to m. When the mutation strength is m, a better point will be found within an expected number of $\binom{n}{m}$ steps. In regard to $S_2$, if the improvement is not found by the end of phase m, there are at most $o(R/|\mathrm{Im}\,f| \cdot \binom{n}{m})$ further iterations in expectation by Lemma 3, and this event, i.e., $E_m$, happens with probability at most 1/R.

Altogether, for m < n/2, using Lemma 1, we have

$$E(T_x) \le E\!\left(T_x \mid \overline{E_m}\right) + \Pr(E_m)\,E(T_x \mid E_m) \le \sum_{i=1}^{m-1}\sum_{j=1}^{i}\binom{n}{j}\ln R + \binom{n}{m} + \frac{1}{R}\cdot o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{m}\right) \le \sum_{i=1}^{m-1}\frac{n-(i-1)}{n-(2i-1)}\binom{n}{i}\ln R + \binom{n}{m} + o\!\left(\binom{n}{m}\right) \le \frac{n-m+2}{n-2m+3}\sum_{i=1}^{m-1}\binom{n}{i}\ln R + \binom{n}{m} + o\!\left(\binom{n}{m}\right) \le \left(\frac{n-m+2}{n-2m+3}\right)^{2}\binom{n}{m-1}\ln R + \binom{n}{m} + o\!\left(\binom{n}{m}\right) = \binom{n}{m}\left(1 + O\!\left(\frac{m(n-m)}{(n-2m+3)^2}\ln R\right)\right).$$

For m ≥ n/2, the algorithm cannot make an improvement at any radius r less than n/2. However, once the radius r is increased to n, the algorithm mutates s bits of the current search point for all possible strengths 1 to n periodically. Thus, according to Lemma 2, the algorithm escapes from the local optimum with probability at least 1 − 1/R in each cycle, so there are at most R/(R−1) cycles in expectation by the geometric distribution. Finally, we compute

$$E(T_x) \le \sum_{i=1}^{\lfloor n/2\rfloor-1}\sum_{j=1}^{i}\binom{n}{j}\ln R + \frac{R}{R-1}\sum_{i=1}^{n}\binom{n}{i}\ln R \le O(2^n n\ln R).$$

Moreover, an improvement can only be made at strength s = m, where the success probability in each step is $W/\binom{n}{m}$, so the expected number of iterations for making an improvement is at least $\binom{n}{m}/W$. Therefore, we have $E(T_x) \ge \binom{n}{m}/W$. □ Lemma 4.
Let $f\colon \{0,1\}^n \to \mathbb{R}$ be a unimodal function and consider SD-RLS∗ with R ≥ $n^{\varepsilon}$ · |Im f| for an arbitrary constant ε > 0. Then, with probability at least 1 − |Im f|/R, SD-RLS∗ never increases the radius and behaves stochastically like RLS before finding an optimum of f.

Denote by T the runtime of SD-RLS∗ on f. Let $f_i$ be the i-th fitness value in an increasing order of all fitness values in Im f, and let $s_i$ be a lower bound on the probability that RLS finds an improvement from search points with fitness value $f_i$. Then

$$E(T) \le \sum_{i=1}^{|\mathrm{Im}\,f|}\frac{1}{s_i} + o(n).$$ Proof.
As on unimodal functions the gap of all points is 1, the probability of not finding an improvement at strength 1 within the threshold is $\left(1 - \binom{n}{1}^{-1}\right)^{\binom{n}{1}\ln R} \le \frac{1}{R}$. This argumentation holds for each improvement that has to be found. Since at most |Im f| improving steps happen before finding the optimum, by a union bound the probability of SD-RLS∗ ever increasing the strength beyond 1 is at most |Im f|/R, which proves the first claim.

We let the random set W contain the search points from which SD-RLS∗ does not find an improvement within phase 1 (i.e., while $r_t = 1$). To prove the second claim, we consider all fitness levels $A_1, \dots, A_{|\mathrm{Im}\,f|}$ such that $A_i$ contains the search points with fitness value $f_i$, and sum up upper bounds on the expected times to leave each of these fitness levels. Under the condition that the strength is not increased before leaving a fitness level, the worst-case time to leave fitness level $A_i$ is $1/s_i$, similarly to RLS. Hence, we bound the expected optimization time of SD-RLS∗ from above by adding the waiting times of RLS on all fitness levels, which is given by $\sum_{i=1}^{|\mathrm{Im}\,f|} 1/s_i$, and the expected times spent to leave the points in W; formally,

$$E(T) \le \sum_{i=1}^{|\mathrm{Im}\,f|}\frac{1}{s_i} + \sum_{x\in W} E(T_x).$$

Each point contributes to W with probability $\Pr(E_1)$. Hence, $E(|W|) \le |\mathrm{Im}\,f|\Pr(E_1)$. As on unimodal functions the gap of all points is 1, by Lemma 3 we compute

$$\sum_{x\in W} E(T_x) \le |\mathrm{Im}\,f|\cdot\Pr(E_1)\cdot E(T_x \mid E_1) \le |\mathrm{Im}\,f|\cdot\frac{1}{R}\cdot o\!\left(\frac{R}{|\mathrm{Im}\,f|}\binom{n}{1}\right) = o(n).$$

Thus, we finally have $E(T) \le \sum_{i=1}^{|\mathrm{Im}\,f|} 1/s_i + o(n)$, as suggested. □

We now turn to the runtime of SD-RLS∗ on the Jump function, for which the following bound seems to be the best available for mutation-based hillclimbers.
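As a small illustration of the second claim (our own numeric sketch): on OneMax, the level with i one-bits is left by flipping one of the n − i zero-bits, so $s_i = (n-i)/n$ and the fitness-level sum is the harmonic-style bound $n\cdot H_n \approx n\ln n$:

```python
from math import log

n = 100
# s_i = (n - i) / n for the OneMax level with i one-bits, so the
# fitness-level sum of Lemma 4 is sum_i n / (n - i) = n * H_n
bound = sum(n / (n - i) for i in range(n))
assert n * log(n) < bound < n * (log(n) + 1)   # n * H_n is about n ln n
```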
Theorem 4. Let n ∈ ℕ. For all 2 ≤ m ≤ n, the expected runtime E(T) of SD-RLS* with R ≥ n^ε for an arbitrary constant ε > 0 on Jump_m satisfies

  E(T) ≤ \binom{n}{m} (1 + O((m/(n − 2m)) ln n))   if m < n/2,
  E(T) = O(2^n n ln n)                              otherwise.

Proof.
Before reaching the plateau consisting of all points with n − m one-bits, Jump_m is equivalent to OneMax; hence, according to Lemma 4, the expected time SD-RLS* takes to reach the plateau is at most O(n ln n). Note that this bound was obtained via the fitness-level method with s_i = (n − i)/n as minimum probability for leaving the set of search points with i one-bits.

Every plateau point x with n − m one-bits satisfies gap(x) = m according to the definition of Jump_m. Thus, using Theorem 3, the algorithm finds the optimum from such a point within expected time

  E(T_x) ≤ \binom{n}{m} (1 + O((m/(n − 2m)) ln n))   if m < n/2,
  E(T_x) = O(2^n n ln n)                              if m ≥ n/2.

This dominates the expected time of the algorithm before reaching the plateau and results in the runtime bound claimed in the theorem. □
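For reference, Jump_m in its common parametrization (assumed here, since the excerpt does not restate the definition) can be written as:

```python
def jump(x, m):
    """Jump_m: OneMax with a fitness valley of width m.

    Points with n - m one-bits form the plateau; from there, exactly the
    m missing bits must flip simultaneously to reach the optimum, i.e.,
    gap(x) = m as used in the proof above.
    """
    n, ones = len(x), sum(x)
    if ones <= n - m or ones == n:
        return m + ones
    return n - ones
```

For example, for n = 6 and m = 2, the all-ones string has fitness 8, while a string with five one-bits lies in the valley.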
While our s-flip mutation along with stagnation detection can outperform the (1+1) EA on Jump functions, it is clear that its different search behavior may be disadvantageous on other examples. Concretely, we will present a function that has a unimodal path to a local optimum with a large Hamming distance to the global optimum. SD-RLS will with high probability follow this path and incur exponential optimization time. However, the function has a second gradient that requires two-bit flips to make progress. The classical (1+1) EA will be able to follow this gradient and to arrive at the global optimum before one-bit flips have reached the end of the path to the local optimum.

In a broader context, our function illustrates an advantage of global mutation operators. By a simple swap of local and global optimum, it immediately turns into the direct opposite, i.e., an example where using global instead of local mutations is highly detrimental and increases the runtime from polynomial to exponential with overwhelming probability. An example of such a function was previously presented in Doerr, Jansen and Klein (2008); however, both the underlying construction and the proof of exponential runtime for the (1+1) EA seem much more complicated than our example.

In the following, we define the example function NeedGlobalMut and give proofs for the behavior of SD-RLS and the (1+1) EA. In fact, NeedGlobalMut is obtained from the function NeedHighMut defined in Rajabi and Witt (2020) to show advantages of stagnation detection adjusting the rate of a global mutation operator. The only change is to adjust the length of the suffix part of the function, which rather elegantly allows us to re-use the previous technique of construction and a major part of the analysis. We also encourage the reader to read the corresponding section in Rajabi and Witt (2020) for further insights into the construction.

In the following, we will imagine any bit string x of length n as being split into a prefix a := a(x) of length n − m and a suffix b := b(x) of length m, where m is defined below. Hence, x = a(x) ∘ b(x), where ∘ denotes concatenation. The prefix a(x) is called valid if it is of the form 1^i 0^{n−m−i}, i.e., i leading ones followed by n − m − i trailing zeros. The prefix fitness pre(x) of a string x ∈ {0,1}^n with valid prefix a(x) = 1^i 0^{n−m−i} equals i, the number of leading ones. The suffix consists of ⌈√n⌉ consecutive blocks of ⌈n^{1/3}⌉ bits each, altogether m ≤ 2n^{5/6} = o(n) bits. Such a block is called valid if it contains either 0 or 2 one-bits; moreover, it is called active if it contains 2 and inactive if it contains 0 one-bits. A suffix where all blocks are valid and where all blocks following the first inactive block are also inactive is called valid itself, and the suffix fitness suff(x) of a string x with valid suffix b(x) is the number of leading active blocks before the first inactive one. Finally, we call x ∈ {0,1}^n valid if both its prefix and its suffix are valid.

The final fitness function is a weighted combination of pre(x) and suff(x). We define for x ∈ {0,1}^n, where x = a ∘ b with the above-introduced a and b,

  NeedGlobalMut(x) :=
    n² · suff(x) + pre(x)               if pre(x) ≤ 9(n−m)/10 ∧ x valid,
    n² · ⌈√n⌉ + pre(x) + suff(x) − n    if pre(x) > 9(n−m)/10 ∧ x valid,
    −OneMax(x)                           otherwise.

The function NeedGlobalMut equals
NeedHighMut_ξ from Rajabi and Witt (2020) for the setting ξ = 1/10. Valid search points falling into the second case have fitness at least n²⌈√n⌉ − n + 1, which is bigger than n²(⌈√n⌉ − 1) + n, an upper bound on the fitness of search points that fall into the first case without having ⌈√n⌉ leading active blocks in the suffix. Hence, search points x where pre(x) = n − m and suff(x) = ⌈√n⌉ represent local optima of second-best overall fitness. The set of global optima equals the points where pre(x) = 9(n − m)/10 and suff(x) = ⌈√n⌉, which implies that (n − m)/10 = Ω(n) bits have to be flipped simultaneously to escape from the local toward the global optimum.

Theorem 5.
With probability 1 − o(1), SD-RLS with R ≥ n² needs 2^{Ω(n)} steps to optimize NeedGlobalMut. The (1+1) EA optimizes this function in time O(n²) with probability 1 − 2^{−Ω(n^{1/4})}.

Proof.
As in the proof of Theorem 4.1 in Rajabi and Witt (2020), we have that the first valid search point (i.e., the first search point of non-negative fitness) of both SD-RLS and the (1+1) EA has pre- and suff-value of o(n) with probability 1 − 2^{−Ω(n^{1/4})}. In the following, we tacitly assume that we have reached a valid search point of the described maximum pre- and suff-value and note that this changes the required number of improvements to reach the local or global optimum only by a 1 − o(1) factor. For readability this factor will not be spelt out any more.

As long as the counter threshold of SD-RLS is not exceeded, the algorithm behaves like RLS. We re-use the argumentation from Theorem 2 up to the point in time where pre(x) = n − m, since before that it is possible to improve the function value by one-bit flips. Hence, the probability of ever increasing the suff-value before pre(x) = n − m is at most n/R ≤ 1/n. Afterwards, the fitness can only be further improved if at least (n − m)/10 bits flip simultaneously. This requires the strength to be increased to (n − m)/10, which takes at least \binom{n}{(n−m)/10 − 1} = 2^{Ω(n)} steps since the phases at all smaller strengths must be completed first. This proves the statement for SD-RLS.

We now analyze the success probability of the (1+1) EA. To this end, we first bound the probability of a mutation being accepted after a valid search point has been reached. Even if a mutation changes up to o(n) consecutive bits of the prefix or suffix, it must maintain n − o(n) prefix bits in order to result in a valid search point. Hence, the probability of an accepted step at mutation probability 1/n is at most (1 − 1/n)^{n−m−o(n)} = (1 + o(1))e^{−1}. Since the probability of flipping Ω(n) bits is n^{−Ω(n)}, the probability of an accepted step is altogether, by the law of total probability, (1 ± o(1))(1 − 1/n)^n = (1 ± o(1))e^{−1}. By similar arguments, the probability of a mutation improving the pre-value by k is at most (1 + o(1))e^{−1}/n^k, and the probability of improving the suff-value is at least (1 − o(1))(e^{−1}/2) n^{−4/3}, since there are \binom{⌈n^{1/3}⌉}{2} = (1 − o(1)) n^{2/3}/2 two-bit flips activating the next block, each having probability at least (1 − o(1)) e^{−1} n^{−2}.

We now consider a phase of 3emn steps. Using the bound on improving the suff-value, we expect (1 − o(1))(3/2)√n activated blocks. By Chernoff bounds, with overwhelming probability we have at least ⌈√n⌉ such blocks, i.e., a complete suffix. The probability of improving the pre-value by k ≥ 1 is at most (1 + o(1)) e^{−1} n^{−k}, amounting to an expected number of improvements by k of (1 + o(1)) 3 m n^{1−k} = (1 + o(1)) 3 n^{11/6−k} and, using Chernoff bounds and union bounds over all k = o(n), the probability of improving the pre-value by at least (9/10)(n − m) during the phase is 2^{−Ω(n^{1/4})}. Hence, with probability 1 − 2^{−Ω(n^{1/4})}, the suffix is completed before the prefix reaches the threshold 9(n − m)/10; afterwards, crossing the threshold is no longer accepted, the (1+1) EA climbs the prefix to the global optimum, and the total number of steps is O(n²). □

Minimum Spanning Trees

Our self-adjusting s-flip mutation operator can also have advantages on classical combinatorial optimization problems. We reconsider the minimum spanning tree (MST) problem, on which EAs and RLS were analyzed before (Neumann and Wegener, 2007). The known bounds for the globally searching (1+1) EA are not tight.
More precisely, they depend on log(w_max), the logarithm of the largest edge weight. This is different with RLS variants that flip only one or two bits, due to an equivalence first formulated in Raidl, Koller and Julstrom (2006): if only up to two bits flip in each step, then the MST instance becomes indistinguishable from the MST instance formed by replacing all edge weights with their rank in the increasingly sorted weight sequence. This results in a tight upper bound of O(m² ln n), where m is the number of edges, for RLS_{1,2}, an algorithm that uniformly at random decides to flip either one or two uniformly chosen bits (Witt, 2014). Although not spelt out in that paper, it is easy to see that the leading constant in the bound O(m² ln m) is at most 2. This 2 stems from the logarithm of the sum of the weight ranks, which can be in the order of m². We will see that this factor of 2 can, in some sense, be avoided in our SD-RLS*.

The following theorem bounds the optimization time of SD-RLS* in the case that the algorithm has reached a spanning tree and the fitness function only allows spanning trees to be accepted. It is well known that with the fitness functions from Neumann and Wegener (2007), the expected time to find the first spanning tree is O(m log m), which also transfers to SD-RLS*; hence we do not consider this lower-order term further. However, our bound comes with an additional term related to the number of strict improvements. We will discuss this term after the proof.

Theorem 6.
The expected optimization time of SD-RLS* with R = m⁴ on the MST problem with m edges, starting with an arbitrary spanning tree, is at most

  (1 + o(1)) ((m²/2)(1 + ln(r_1 + · · · + r_m)) + (4m ln m) E(S)) = (1 + o(1)) (m² ln m + (4m ln m) E(S)),

where r_i is the rank of the i-th edge in the sequence sorted by increasing edge weights and E(S) is the expected number of strict improvements that the algorithm makes, conditioned on the strength never exceeding 2.

Proof.
We aim at using multiplicative drift analysis with g(x) = Σ_{i=1}^{m} x_i r_i as potential function. Since the algorithm has different states, we do not always have the same lower bound on the drift towards the optimum. However, at strength 1 no mutation is accepted since the fitness function from Neumann and Wegener (2007) gives a huge penalty to non-trees. Hence, our plan is to conduct the drift analysis conditioned on the strength being at most 2 and to account for the steps spent at strength 1 separately. Cases where the strength exceeds 2 will be handled by an error analysis and a restart argument.

Let X^(t) := g(x^(t)) − g(x_opt) for the current search point x^(t) and an optimal search point x_opt. Since the algorithm behaves stochastically the same on the original fitness function f and on the potential function g, we obtain that

  E(X^(t) − X^(t+1) | X^(t)) ≥ X^(t)/\binom{m}{2} ≥ 2X^(t)/m²,

since the g-value can be decreased by altogether g(x^(t)) − g(x_opt) via a sequence of at most \binom{m}{2} disjoint two-bit flips; see also the proof of Theorem 15 in Doerr, Johannsen and Winzen (2012) for the underlying combinatorial argument.

Let T' denote the number of steps at strength 2 until g is minimized, assuming no larger strength to occur. Using the multiplicative drift theorem, we have E(T') ≤ (m²/2)(1 + ln(r_1 + · · · + r_m)) ≤ (m²/2)(1 + 2 ln m), and by the tail bounds for multiplicative drift (e.g., Lengler, 2020) it holds that

  Pr(T' > (m²/2)(1 + 2 ln m + ln m)) ≤ e^{−ln m} = 1/m.

Note that this bound on T' is below the phase threshold for strength 2 since \binom{m}{2} ln R = 2(m² − m) ln m ≥ (m²/2)(1 + 3 ln m) for m large enough. Hence, with probability at most 1/m the algorithm fails to find the optimum before the strength can change from 2 to a different value due to the threshold being exceeded.

We next bound the expected number of steps spent at larger strengths.
Since each increase of the radius implies an unsuccessful phase at strength 2, the probability that radius r, where 3 ≤ r ≤ m/2, is selected before finding the optimum is at most (1/m)^{r−2}. According to Lemma 1, the expected number of steps spent at radius r is O(m² ln m), since within each strength loop an improving two-bit flip is still available at strength 2, and strengths beyond 2 are only entered with probability at most 1/R each. By the law of total probability, the expected number of steps at strengths larger than 2 is at most

  Σ_{r=3}^{m/2} (1/m)^{r−2} · O(m² ln m) = O(m ln m) = o(m²)

and contributes only a lower-order term captured by the o(1) in the statement of the theorem. If the strength exceeds 2, we wait for it to become 2 again and restart the previous drift analysis, which is conditional on the strength being at most 2. Since the probability of a failure is at most 1/m, this accounts for an expected number of at most 1/(1 − 1/m) restarts, which is 1 + o(1) as well.

It remains to bound the number of steps at strength 1. For each strict improvement, the strength is reset to 1. Thereafter, m ln R = 4m ln m steps pass before the strength becomes 2 again. Hence, if the strength does not exceed 2 before the optimum is reached, this adds a term of (4m ln m) S, where S is the number of strict improvements in the run, to the running time. The expected number of strict improvements is bounded by E(S), where we assume a random starting point of the algorithm and count the number of strict improvements after reaching the first spanning tree. If an error occurs and the strength exceeds 2, the remaining expected number of strict improvements will not be bigger. □

The term E(S) appearing in the previous theorem is not easy to bound. If E(S) = o(m), the upper bound suggests that SD-RLS* may be more efficient than the classical RLS_{1,2} algorithm, with the caveat that we are talking about upper bounds only. However, it is not difficult to find examples where E(S) = Ω(m), e.g., on the worst-case graph used for the lower-bound proof in Neumann and Wegener (2007), which we will study below experimentally, and we cannot generally rule out that E(S) is asymptotically bigger than m on certain instances.
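The rank equivalence and the leading terms discussed here can be made concrete in a small sketch. The helper names are ours, and `sd_rls_main_term` evaluates the main term of Theorem 6 as reconstructed above, so this is an illustration rather than a definitive formula:

```python
import math

def edge_ranks(weights):
    """Rank of each edge in the increasingly sorted weight sequence
    (ties broken by index); flipping at most two bits per step cannot
    distinguish the instance from this rank-weighted one."""
    order = sorted(range(len(weights)), key=lambda i: (weights[i], i))
    ranks = [0] * len(weights)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def sd_rls_main_term(ranks):
    """Main term (m^2/2) * (1 + ln(r_1 + ... + r_m)) of the bound above."""
    m = len(ranks)
    return (m * m / 2) * (1 + math.log(sum(ranks)))

def rls12_main_term(m):
    """Leading term 2 m^2 ln m discussed for RLS_{1,2}."""
    return 2 * m * m * math.log(m)
```

Since r_1 + ... + r_m ≤ m², the first term is at most (m²/2)(1 + 2 ln m), i.e., roughly half of 2m² ln m, which is the point of the comparison above.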
However, empirically SD-RLS* can be faster than RLS_{1,2} and the (1+1) EA on MST instances, as we will see in Section 7. In any case, although the algorithm can search globally, the bound in Theorem 6 does not suffer from the log(w_max) factor appearing in the analysis of the (1+1) EA.

We also considered variants of SD-RLS* that do not reset the strength to 1 after each strict improvement and would therefore be able to work with strength 2 for a long while on the MST problem. However, such an approach is risky in scenarios where, e.g., both one-bit flips and two-bit flips are possible and one-bit flips should be exploited for the sake of efficiency. Instead, we think that a combination of stagnation detection and selection hyperheuristics (Warwicker, 2019) based on the s-flip operator, or the learning mechanism from Doerr, Doerr and Yang (2016), which performs very well on the MST problem, would be more promising here.

Experiments

In this section, we present the results of the experiments conducted to assess the performance of the proposed algorithms for small problem dimensions. This experimental design was employed because our theoretical results are asymptotic.

In the first experiment, we ran an implementation of Algorithm 3 (SD-RLS*) on the Jump fitness function with jump size m = 4 and n varying from 80 to 160. We compared our algorithm against the (1+1) EA with standard mutation rate 1/n, the (1+1) EA with mutation probability m/n, Algorithm (1+1) FEA_β from Doerr et al. (2017) with three different choices of β, and the SD-(1+1) EA presented in Rajabi and Witt (2020). In Figure 1, we observe that SD-RLS* outperforms the rest of the algorithms.

Figure 1: Average number of fitness calls (over 1000 runs) the mentioned algorithms took to optimize Jump.

In the second experiment, we ran an implementation of four algorithms: SD-RLS*, (1+1) FEA_β with β = 1.5
, RLS_{1,2}, and the (1+1) EA on the MST problem with the fitness function from Neumann and Wegener (2007) for two types of graphs called TG and Erdős–Rényi. The graph TG with n vertices and m = 3n/4 + \binom{n/2}{2} edges contains a sequence of p = n/4 triangles attached to a complete graph on q = n/2 vertices. Regarding the weights, the edges of the complete graph have weight 1, and we set the weights of the edges in each triangle to 2a and 3a for the side edges and the main edge, respectively. In this paper, we consider a = n². The graph TG is used for estimating lower bounds on the expected runtime of the (1+1) EA and RLS in the literature (Neumann and Wegener, 2007). In this experiment, we use four values of n up to 24. As can be seen in Figure 4b, (1+1) FEA_β is faster than the rest of the algorithms, but SD-RLS* outperforms the standard (1+1) EA and RLS_{1,2}.

Figure 2: Box plots comparing the number of fitness calls (over 1000 runs) the mentioned algorithms took to optimize Jump.

Figure 3: Example graph TG with p = n/4 triangles and a complete graph on q = n/2 vertices with edges of weight 1. The image is taken from Neumann and Wegener (2007).

Figure 4: Average number of fitness calls (over 400 runs) the mentioned algorithms took to optimize the MST fitness function of the graphs: (a) Erdős–Rényi graphs, (b) graphs TG.

Regarding the Erdős–Rényi graphs, we produced random Erdős–Rényi graphs with edge probability p = (2 ln n)/n and assigned each edge an integer weight in the range [1, n] uniformly at random. We also checked that the graphs actually had a spanning tree. Then, we ran the implementations on the MST fitness function of these graphs. The obtained results can be seen in Figure 4a. As we discussed in Section 6, SD-RLS* does not outperform the (1+1) EA and RLS_{1,2} on MST instances where the number of strict improvements of SD-RLS* is large.

For statistical tests, we ran the algorithms on the graphs TG and Erdős–Rényi over 400 times, and all p-values obtained from a Mann–Whitney U-test between the algorithms, with respect to the null hypothesis of identical behavior, are less than 10^{−3}, except for the results regarding the graph TG with n = 24.

Conclusions
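The worst-case instance TG can be generated with the sizes and weights as reconstructed above (p = n/4 triangles, a complete graph on q = n/2 vertices, a = n²). This generator is our own sketch, not the authors' implementation, and assumes n divisible by 4:

```python
def tg_graph(n, a=None):
    """Edge list (u, v, weight) of the graph TG: a chain of p = n/4
    triangles attached to a complete graph on q = n/2 vertices.
    Clique edges have weight 1; triangle side edges weight 2a and the
    main edge weight 3a, with a = n^2 by default (reconstructed sizes)."""
    if a is None:
        a = n * n
    p, q = n // 4, n // 2
    edges = []
    # chain of triangles on vertices 0..2p: triangle i has main edge
    # (2i, 2i+2) and two side edges through the apex vertex 2i+1
    for i in range(p):
        u, w, apex = 2 * i, 2 * i + 2, 2 * i + 1
        edges.append((u, apex, 2 * a))
        edges.append((apex, w, 2 * a))
        edges.append((u, w, 3 * a))
    # complete graph on the last q vertices (vertex 2p = n - q is shared)
    for u in range(n - q, n):
        for v in range(u + 1, n):
            edges.append((u, v, 1))
    return edges
```

For n = 8 this yields 3·8/4 + \binom{4}{2} = 12 edges, matching the edge count stated above.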
We have transferred stagnation detection, previously proposed for EAs with standard bit mutation, to the operator flipping exactly s uniformly randomly chosen bits as typically encountered in randomized local search. Through both theoretical runtime analyses and experimental studies, we have shown that this combination of stagnation detection and local search leaves local optima efficiently and often outperforms the previously considered variants with global mutation. We have also introduced techniques that make the algorithm robust if it, due to its randomized nature, misses the right number of bits to flip, and we have analyzed scenarios where global mutations are still preferable. In the future, we would like to investigate stagnation detection more thoroughly on instances of classical combinatorial optimization problems like the minimum spanning tree problem, for which the present paper only gives preliminary but promising results.

Acknowledgement

This work was supported by a grant by the Danish Council for Independent Research (DFF-FNU 8021-00260B).
References
Bassin, Anton and Buzdalov, Maxim (2019). The 1/5-th rule with rollbacks: on self-adjustment of the population size in the (1+(λ,λ)) GA. In Proc. of GECCO 2019 (Companion), 277–278. ACM Press.

Corus, Dogan, Oliveto, Pietro S., and Yazdani, Donya (2018). Fast artificial immune systems. In Proc. of PPSN 2018, vol. 11102 of LNCS, 67–78. Springer.

Doerr, Benjamin (2020). Probabilistic tools for the analysis of randomized optimization heuristics. In Doerr, Benjamin and Neumann, Frank (eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, 1–87. Springer.

Doerr, Benjamin and Doerr, Carola (2016). The impact of random initialization on the runtime of randomized search heuristics. Algorithmica, 75(3), 529–553.

Doerr, Benjamin and Doerr, Carola (2018). Optimal static and self-adjusting parameter choices for the (1+(λ,λ)) genetic algorithm. Algorithmica, 80(5), 1658–1709.

Doerr, Benjamin and Doerr, Carola (2020). Theory of parameter control for discrete black-box optimization: Provable performance gains through dynamic parameter choices. In Doerr, B. and Neumann, F. (eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, 271–321. Springer.

Doerr, Benjamin, Doerr, Carola, and Yang, Jing (2016). k-bit mutation with self-adjusting k outperforms standard bit mutation. In Proc. of PPSN 2016, vol. 9921 of Lecture Notes in Computer Science, 824–834. Springer.

Doerr, Benjamin, Fouz, Mahmoud, and Witt, Carsten (2010). Quasirandom evolutionary algorithms. In Proc. of GECCO '10, 1457–1464. ACM Press.

Doerr, Benjamin, Gießen, Christian, Witt, Carsten, and Yang, Jing (2019). The (1+λ) evolutionary algorithm with self-adjusting mutation rate. Algorithmica, 81(2), 593–631.

Doerr, Benjamin, Jansen, Thomas, and Klein, Christian (2008). Comparing global and local mutations on bit strings. In Ryan, Conor and Keijzer, Maarten (eds.), Proc. of GECCO '08, 929–936. ACM Press.

Doerr, Benjamin, Johannsen, Daniel, and Winzen, Carola (2012). Multiplicative drift analysis. Algorithmica, 64, 673–697.

Doerr, Benjamin, Le, Huu Phuoc, Makhmara, Régis, and Nguyen, Ta Duy (2017). Fast genetic algorithms. In Proc. of GECCO '17, 777–784. ACM Press.

Hansen, Pierre and Mladenovic, Nenad (2018). Variable neighborhood search. In Martí, Rafael, Pardalos, Panos M., and Resende, Mauricio G. C. (eds.), Handbook of Heuristics, 759–787. Springer.

Lengler, Johannes (2020). Drift analysis. In Doerr, B. and Neumann, F. (eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, 89–131. Springer.

Lissovoi, Andrei, Oliveto, Pietro S., and Warwicker, John Alasdair (2019). On the time complexity of algorithm selection hyper-heuristics for multimodal optimisation. In Proc. of AAAI 2019, 2322–2329. AAAI Press.

Lugo, Michael (2017). Sum of "the first k" binomial coefficients for fixed n. MathOverflow. URL https://mathoverflow.net/q/17236 (version: 2017-10-01).

Neumann, Frank and Wegener, Ingo (2007). Randomized local search, evolutionary algorithms, and the minimum spanning tree problem. Theoretical Computer Science, 378, 32–40.

Raidl, Günther R., Koller, Gabriele, and Julstrom, Bryant A. (2006). Biased mutation operators for subgraph-selection problems. IEEE Transactions on Evolutionary Computation, 10(2), 145–156.

Rajabi, Amirhossein and Witt, Carsten (2020). Self-adjusting evolutionary algorithms for multimodal optimization. In Proc. of GECCO '20, 1314–1322. ACM Press.

Warwicker, John Alasdair (2019). On the runtime analysis of selection hyper-heuristics for pseudo-Boolean optimisation. Ph.D. thesis, University of Sheffield, UK. URL http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.786561.

Wegener, Ingo (2001). Methods for the analysis of evolutionary algorithms on pseudo-Boolean functions. In Sarker, Ruhul, Mohammadian, Masoud, and Yao, Xin (eds.), Evolutionary Optimization. Kluwer Academic Publishers.

Witt, Carsten (2014). Revised analysis of the (1+1) EA for the minimum spanning tree problem. In