Analyzing Adaptive Parameter Landscapes in Parameter Adaptation Methods for Differential Evolution
Ryoji Tanabe
Yokohama National University, Yokohama, Japan
ABSTRACT
Since the scale factor and the crossover rate significantly influence the performance of differential evolution (DE), parameter adaptation methods (PAMs) for the two parameters have been well studied in the DE community. Although PAMs can sufficiently improve the effectiveness of DE, PAMs are poorly understood (e.g., the working principle of PAMs). One of the difficulties in understanding PAMs comes from the unclarity of the parameter space that consists of the scale factor and the crossover rate. This paper addresses this issue by analyzing adaptive parameter landscapes in PAMs for DE. First, we propose the concept of an adaptive parameter landscape, which captures a moment in a parameter adaptation process. For each iteration, each individual in the population has its own adaptive parameter landscape. Second, we propose a method of analyzing adaptive parameter landscapes using a 1-step-lookahead greedy improvement metric. Third, we examine adaptive parameter landscapes in three PAMs by using the proposed method. Results provide insightful information about PAMs in DE.
CCS CONCEPTS
• Mathematics of computing → Evolutionary algorithms

KEYWORDS
DE, parameter adaptation methods, landscape analysis
ACM Reference Format:
Ryoji Tanabe. 2020. Analyzing Adaptive Parameter Landscapes in Parameter Adaptation Methods for Differential Evolution. In Genetic and Evolutionary Computation Conference (GECCO '20), July 8–12, 2020, Cancún, Mexico.
ACM,New York, NY, USA, 9 pages. https://doi.org/10.1145/3377930.3389820
GECCO '20, July 8–12, 2020, Cancún, Mexico
© 2020 Association for Computing Machinery. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Genetic and Evolutionary Computation Conference (GECCO '20), July 8–12, 2020, Cancún, Mexico, https://doi.org/10.1145/3377930.3389820.

1 INTRODUCTION

This paper considers black-box numerical optimization. These problems involve finding a d-dimensional solution x = (x_1, ..., x_d)^T that minimizes a given objective function f : R^d → R, x ↦ f(x). No explicit knowledge of f is given in black-box optimization. Differential evolution (DE) is a variant of evolutionary algorithms (EAs) mainly for black-box numerical optimization [40]. The results of the annual IEEE CEC competitions have shown that DE is competitive with more complex optimizers despite its relative simplicity. A number of previous studies have also demonstrated the effectiveness of DE in real-world applications [8, 9].

The main control parameters of the basic DE [40] are the population size n, the scale factor F, and the crossover rate C. From the late 1990s to the early 2000s, it was believed that the performance of DE is robust with respect to the settings of F and C [40]. However, some studies in the mid-2000s demonstrated that the performance of DE is sensitive to the settings of the control parameters [7, 14, 55]. In general, the performance of EAs significantly depends on the characteristics of a given problem and the state of the search progress [12]. Thus, a fixed parameter setting (e.g., F = 0.5 and C = 0.9) does not always yield the best performance of an EA. For these reasons, DE algorithms that automatically adjust the control parameters (mainly F and C) have received much attention in the DE community since the mid-2000s. Representative adaptive DE algorithms include jDE [7], SaDE [37], JADE [53], EPSDE [28], and SHADE [42]. These adaptive DE algorithms have mechanisms to adaptively adjust the F and C parameters during the search process. Parameter control methods in EAs can be classified into deterministic, adaptive, and self-adaptive control methods [12]. Although some DE algorithms with deterministic and self-adaptive approaches have been proposed (e.g., [32, 48]), adaptive approaches have mainly been studied in the DE community [45].

As in [44, 45], this paper explicitly distinguishes "an adaptive DE" and "a parameter adaptation method (PAM) in an adaptive DE". While "an adaptive DE" is a complex algorithm that consists of multiple components, "a PAM" is a single component only for adaptively adjusting F and C values. As explained in [45], "L-SHADE" [46] is "an adaptive DE" that mainly consists of the following four components: (a) the current-to-pbest/1 mutation strategy [53], (b) the binomial crossover, (c) the "PAM" in SHADE [42], and (d) the linear population size reduction strategy. In this paper, we are interested in (c) the "PAM" in SHADE, rather than L-SHADE.

While most previous studies focused on "adaptive DE algorithms" (e.g., [38, 52]), only a few previous studies tried to examine "PAMs" in DE. Zielinski et al. investigated the performance of some PAMs for constrained optimization in an isolated manner [54]. Similar benchmarking studies for multi- and single-objective optimization were performed in [10, 45], respectively. In [43], a lower bound on the performance of PAMs was analyzed by using an oracle-based method for approximating an optimal parameter adaptation process in DE.
A simulation framework for quantitatively evaluating the adaptation ability of PAMs was also proposed in [44].

One of the difficulties in analyzing PAMs comes from the unclarity of the control parameter space that consists of F and C. For each iteration t, a PAM generates a parameter pair of F and C values θ_i^t = (F_i^t, C_i^t) for the i-th individual x_i^t in the population, where i ∈ {1, ..., n}. It is desirable that a trial vector u_i^t (a child or a new solution) generated with θ_i^t is better than its parent x_i^t in terms of their objective values. Generating a good θ_i^t can be viewed as a two-dimensional numerical optimization problem. The goal of this problem is to find the optimal parameter pair θ_i^{*,t} ∈ Θ_i^t that minimizes the objective value of a trial vector u_i^t, where Θ_i^t ⊆ R^2 is the set of all feasible pairs of F and C values. Although the properties of Θ_i^t (NOT the optimal parameter) can provide insightful information about PAMs, they have never been analyzed in the DE community and the evolutionary computation community.

Table 1: Summary of the three landscape analyses.
• Fitness landscape analysis. Target space: solutions of a given problem. Height: fitness (or objective) values of solutions.
• Parameter landscape analysis. Target space: static parameters in an EA (e.g., DE). Height: expected performance of an EA with the parameters.
• Adaptive parameter landscape analysis. Target space: dynamic parameters adjusted by a PAM. Height: 1-step-lookahead greedy improvement metric (G1) of the parameters.

This paper tries to understand Θ_i^t by analyzing its adaptive parameter landscape. The term "adaptive parameter landscapes" is a new concept proposed in this paper, inspired by recent work on parameter landscapes [20, 36, 51]. As reviewed in [27, 31, 35], fitness landscapes have been well studied in the evolutionary computation community. In contrast, the field of parameter landscape analysis is relatively new. A parameter landscape consists of feasible parameter values in an EA. The "height" in parameter landscapes is the expected performance (or the utility) of an EA with control parameters on training problem instances [13]. An adaptive parameter landscape proposed in this paper can be viewed as a dynamic version of a parameter landscape influenced by a PAM. For each iteration t, each individual x_i^t in the population has its adaptive parameter landscape that consists of Θ_i^t.
A PAM can be intuitively analyzed by investigating its adaptive parameter landscapes. Table 1 summarizes the differences among fitness landscapes, parameter landscapes, and adaptive parameter landscapes. They are explained in Sections 2.3, 2.4, and 3.1, respectively.

Our contributions in this paper are at least threefold:
(1) We propose the concept of adaptive parameter landscapes, which are landscapes of dynamic parameters adjusted by PAMs. This is the first study to address such dynamically changing parameter landscapes in the DE community and the evolutionary computation community.
(2) We propose a method of analyzing adaptive parameter landscapes using a 1-step-lookahead greedy improvement metric.
(3) We examine adaptive parameter landscapes in three representative PAMs on the 24 BBOB functions [17] by using the proposed method. Results provide insightful information about PAMs. Our observations are summarized in Section 6.

The rest of this paper is organized as follows. Section 2 provides some preliminaries. Section 3 explains the concept of adaptive parameter landscapes and the proposed analysis method. Section 4 describes the settings of our computational experiments. Section 5 shows analysis results. Section 6 concludes this paper.

2 PRELIMINARIES

First, Section 2.1 explains the basic DE with a PAM. Then, Section 2.2 describes three PAMs in DE (the PAMs in jDE [7], JADE [53], and SHADE [42]). Finally, Sections 2.3 and 2.4 explain fitness landscape analysis and parameter landscape analysis, respectively.
2.1 The Basic DE with a PAM

Algorithm 1 shows the overall procedure of the basic DE algorithm with a PAM. Below, we explain DE in an unusual manner for a better understanding of the proposed G1 metric in Section 3.2.

At the beginning of the search (t = 1), the population P^t = {x_1^t, ..., x_n^t} is initialized (line 1), where n is the population size. For each i ∈ {1, ..., n}, x_i^t is the i-th individual in the population P^t. Each individual represents a d-dimensional solution of a problem. For each j ∈ {1, ..., d}, x_{i,j}^t is the j-th element of x_i^t.

After the initialization of P^t, the following steps (lines 2–14) are repeatedly performed until a termination condition is satisfied. For each x_i^t, a parameter pair θ_i^t = (F_i^t, C_i^t) is generated by a PAM (line 4). The scale factor F_i^t > 0 controls the magnitude of the differential mutation, and the crossover rate C_i^t ∈ [0, 1] controls the number of elements inherited from x_i^t to a trial vector (child) u_i^t. When θ_i^t is fixed for all individuals in the entire search process, Algorithm 1 becomes the classical DE without any PAM [40].

A set of parent indices R = {r_1, r_2, ...} is randomly selected from {1, ..., n} \ {i} such that the indices differ from each other (line 5). For each x_i^t, a mutant vector v_i^t is generated by applying a differential mutation to x_{r1}^t, x_{r2}^t, ... (line 6). Although a number of mutation strategies have been proposed in the literature [9], we consider the following two representative mutation strategies:

v_i^t = x_{r1}^t + F_i^t (x_{r2}^t − x_{r3}^t),   (1)
v_i^t = x_i^t + F_i^t (x_pbest^t − x_i^t) + F_i^t (x_{r1}^t − x̃_{r2}^t),   (2)

where the strategy in (1) is rand/1 [40], and the strategy in (2) is current-to-pbest/1 [53]. The rand/1 strategy is the most basic strategy, and the current-to-pbest/1 strategy is one of the most efficient strategies used in recent work (e.g., [42], [46], [53]). For each individual, the individual x_pbest^t is randomly selected from the top max(⌊n × p⌋, 2) individuals in P^t, where p ∈ [0, 1] controls the greediness of current-to-pbest/1.
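For illustration, the two mutation strategies in (1) and (2) can be sketched in a few lines of Python. This is a minimal sketch using plain lists; as a simplification, it draws x̃_{r2}^t from the population only and ignores the external archive.

```python
def rand1(pop, r, F):
    # rand/1 (Eq. 1): v = x_r1 + F * (x_r2 - x_r3)
    r1, r2, r3 = r
    return [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
            for j in range(len(pop[r1]))]

def current_to_pbest1(pop, i, pbest, r, F):
    # current-to-pbest/1 (Eq. 2):
    # v = x_i + F * (x_pbest - x_i) + F * (x_r1 - x_r2).
    # Drawing x_r2 from the population only (not the archive) is a
    # simplification of this sketch.
    r1, r2 = r
    return [pop[i][j]
            + F * (pop[pbest][j] - pop[i][j])
            + F * (pop[r1][j] - pop[r2][j])
            for j in range(len(pop[i]))]
```

For example, with a population of 2-dimensional individuals, rand1(pop, (0, 1, 2), 0.5) returns the mutant vector of (1) built from the first three individuals with F = 0.5.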
The individual x̃_{r2}^t in (2) is randomly selected from the union of P^t and an external archive A^t, where inferior parent individuals are preserved in A^t (how to update A^t is explained later).

After the mutant vector v_i^t has been generated for each x_i^t, a trial vector u_i^t is generated by applying crossover to x_i^t and v_i^t (lines 7–9). In this paper, we use binomial crossover [40], which is the most representative crossover method in DE.

Algorithm 1: The basic DE algorithm with a PAM
1: t ← 1, initialize P^t = {x_1^t, ..., x_n^t} randomly;
2: while the termination criteria are not met do
3:   for i ∈ {1, ..., n} do
4:     Sample a parameter pair θ_i^t = (F_i^t, C_i^t);
5:     R ← a set of randomly selected indices from {1, ..., n} \ {i};
6:     v_i^t ← mutation(P^t, R, F_i^t);
7:     s ← a randomly generated d-dimensional vector (s_1, ..., s_d)^T;
8:     j_rand ← a randomly selected number from {1, ..., d};
9:     u_i^t ← crossover(x_i^t, v_i^t, C_i^t, s, j_rand);
10:  for i ∈ {1, ..., n} do
11:    if f(u_i^t) ≤ f(x_i^t) then x_i^{t+1} ← u_i^t;
12:    else x_i^{t+1} ← x_i^t;
13:  Update internal parameters for adaptation of F and C;
14:  t ← t + 1;

First, a d-dimensional vector s = (s_1, ..., s_d)^T is generated (line 7), where each element of s is randomly selected from [0, 1]. An index j_rand is also randomly selected from {1, ..., d} (line 8). Then, for each i ∈ {1, ..., n}, the trial vector u_i^t is generated as follows (line 9):

u_{i,j}^t = v_{i,j}^t if s_j ≤ C_i^t or j = j_rand, and u_{i,j}^t = x_{i,j}^t otherwise,   (3)

where the existence of j_rand ensures that at least one element is inherited from v_i^t even when C_i^t = 0. After u_i^t has been generated for each x_i^t, the environmental selection is performed in a pair-wise manner (lines 10–12). For each i ∈ {1, ..., n}, x_i^t is compared with u_i^t. The better of x_i^t and u_i^t survives to the next iteration t + 1. The individuals that were worse than the trial vectors are preserved in the external archive A used in (2). When the size of the archive exceeds a pre-defined size, randomly selected individuals are deleted to keep the archive size constant. After the environmental selection, some internal parameters in a PAM are updated (line 13).

2.2 Parameter Adaptation Methods (PAMs) in DE

We briefly explain the following three representative PAMs for DE: the PAM in jDE (P-jDE), the PAM in JADE (P-JADE), and the PAM in SHADE (P-SHADE). Our explanations are based on [45]. Although we explain the three PAMs only briefly due to space constraints, their detailed explanations with precisely described pseudo-codes can be found in [45]. Below, the generation of the trial vector u_i^t is said to be successful if f(u_i^t) ≤ f(x_i^t) (line 11 in Algorithm 1). Otherwise, the generation of u_i^t is said to have failed.

P-jDE [7] assigns a pair of F_i^t and C_i^t values to each x_i^t in P^t. At the beginning of the search, these parameter values are initialized to F_i^1 = 0.5 and C_i^1 = 0.9 for each i ∈ {1, ..., n}. In each iteration t, the values F_{trial,i}^t and C_{trial,i}^t used for the generation of u_i^t are inherited from x_i^t as follows: F_{trial,i}^t = F_i^t and C_{trial,i}^t = C_i^t. However, with pre-defined probabilities τ_F and τ_C, these values are randomly generated as follows: F_{trial,i}^t = randu[0.1, 1] and C_{trial,i}^t = randu[0, 1]. Here, randu[a, b] is a value selected uniformly at random from [a, b]. In general, the two hyper-parameters τ_F and τ_C are set to 0.1. When the generation of u_i^t is successful, F_i^{t+1} = F_{trial,i}^t and C_i^{t+1} = C_{trial,i}^t. Otherwise, F_i^{t+1} = F_i^t and C_i^{t+1} = C_i^t.

P-JADE [53] adaptively adjusts F and C values using two meta-parameters µ_F and µ_C, respectively. For t = 1, both µ_F and µ_C are initialized to 0.5. In each iteration t, F_i^t and C_i^t are generated as follows: F_i^t = randc(µ_F, 0.1) and C_i^t = randn(µ_C, 0.1). Here, randn(µ, σ) is a value selected randomly from a normal distribution with mean µ and standard deviation σ. Also, randc(µ, σ) is a value selected randomly from a Cauchy distribution with location parameter µ and scale parameter σ. At the end of each iteration, µ_F and µ_C are updated based on the sets S_F and S_C of successful F and C values as follows: µ_F = (1 − c) µ_F + c mean_L(S_F) and µ_C = (1 − c) µ_C + c mean_A(S_C). Here, c ∈ [0, 1] is a learning rate. In general, c = 0.1. While mean_A(S_C) is the arithmetic mean of S_C, mean_L(S_F) is the Lehmer mean of S_F.

P-SHADE [42] adaptively adjusts F and C using historical memories M_F = (M_{F,1}, ..., M_{F,H}) and M_C = (M_{C,1}, ..., M_{C,H}). Here, H is a memory size, and H = 10 was recommended in [44]. For t = 1, all elements in M_F and M_C are initialized to 0.5. As reviewed in [45], some slightly different versions of P-SHADE have been proposed by the same authors. As in [45], this paper considers the simplest version of P-SHADE presented in [44]. In each iteration t, F_i^t and C_i^t are generated as follows: F_i^t = randc(M_{F,r}, 0.1) and C_i^t = randn(M_{C,r}, 0.1), where r is an index randomly selected from {1, ..., H}. At the end of each iteration, the k-th elements of M_F and M_C are updated as follows: M_{F,k} = mean_L(S_F) and M_{C,k} = mean_L(S_C). The index k ∈ {1, ..., H} represents the position to be updated and is incremented on every update. If k > H, k is re-initialized to 1.

2.3 Fitness Landscape Analysis

According to Pitzer and Affenzeller [35], a fitness landscape L_f in a numerical optimization problem is defined by a 3-tuple as follows:

L_f = (X, f, D),   (4)

where X ⊆ R^d is the solution space (i.e., the set of all feasible solutions x). Also, f : x ↦ f(x) is the objective function of a given problem, and D : x × x ↦ R is a distance function between two solutions (e.g., the Euclidean distance).

An analysis of L_f can provide useful information even for black-box optimization. For example, if the features of a given problem (e.g., ruggedness and neutrality) become clear by analyzing L_f, an appropriate optimizer can be selected [31]. A number of methods for analyzing L_f have been proposed in the literature [27, 35]. Representative methods include fitness distance correlation (FDC) [24], the dispersion metric (DISP) [26], and evolvability [39]. Recently, more sophisticated methods have been proposed, such as exploratory landscape analysis (ELA) [29] and local optima networks (LON) [1]. These methods can quantify at least one feature of L_f. For example, the FDC value represents the global structure of L_f based on the correlation between the distance from solutions to the optimal solution and their objective values.
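As a concrete illustration of one of these measures, the FDC value can be computed with a short sketch. The helper below (whose name and signature are ours, not taken from [24]) returns the Pearson correlation between objective values and Euclidean distances to a known optimum.

```python
import math

def fdc(solutions, fvals, x_opt):
    # Fitness distance correlation: Pearson correlation between the
    # Euclidean distance of each solution to the optimum and its
    # objective value. For minimization, values near 1 suggest a
    # globally "funnel-like" structure.
    dists = [math.dist(x, x_opt) for x in solutions]
    n = len(dists)
    md = sum(dists) / n
    mf = sum(fvals) / n
    cov = sum((di - md) * (fi - mf) for di, fi in zip(dists, fvals))
    sd = math.sqrt(sum((di - md) ** 2 for di in dists))
    sf = math.sqrt(sum((fi - mf) ** 2 for fi in fvals))
    return cov / (sd * sf)
```

On a sample of points from a Sphere-like function, for instance, the returned value is close to 1, reflecting the strong correlation between distance to the optimum and objective value.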
2.4 Parameter Landscape Analysis

Roughly speaking, a parameter tuning problem [13] involves finding a tuple θ of control parameters that optimizes the empirically estimated performance of an algorithm on a set of training problem instances. For example, a parameter tuning problem for the basic DE with no PAM can be defined as the problem of finding θ = (F, C) that minimizes the average objective values of best-so-far solutions on the Sphere, Rastrigin, and Rosenbrock functions. In general, the parameter tuning problem addresses only numerical parameters. In contrast, an algorithm configuration problem [22] addresses numerical, ordinal (e.g., low, medium, and high), and categorical parameters (e.g., the choice of mutation strategies). According to [21], parameter tuning is a problem that involves only numerical parameters, while algorithm configuration is a problem that involves many categorical parameters. The parameter tuning problem can be viewed as a special case of the algorithm configuration problem.

Parameter landscapes appear in parameter tuning problems. Since it is difficult to define a distance function for categorical parameters, the field of parameter landscape analysis considers only numerical parameters, as in [36]. The term "parameter landscapes" was first coined in [51]. Parameter landscapes have also been denoted as "performance landscapes" [50], "meta-fitness landscapes" [34], "utility landscapes" [13], "ERT landscapes" [5], "parameter configuration landscapes" [20], and "algorithm configuration landscapes" [36]. To avoid any confusion, we use the term "parameter landscapes" throughout this paper. Since only numerical parameters are considered, we believe that the term "parameter landscapes" is appropriate. Although some previous studies (e.g., [4, 25]) did not use the term "landscapes", they essentially investigated parameter landscapes.

According to Harrison et al. [20], a parameter landscape L_p in a parameter tuning problem is formally defined as follows:

L_p = (Θ, M, D),   (5)

where the definition of L_p in (5) is a slightly modified version of the original one in [20]. Θ is the numerical parameter space (i.e., the set of all feasible parameters θ). Also, M : θ ↦ M(θ) is a performance metric that empirically estimates the performance of a given algorithm on a set of training problem instances (e.g., the average of objective values [20] and PAR10 [36]). Similar to (4), D : θ × θ ↦ R is a distance function between two parameters.

Helpful information about parameter tuning and an algorithm can be obtained by analyzing L_p. For example, as mentioned in [50], if L_p is multimodal, a global parameter tuner may perform better than a local parameter tuner. As demonstrated in [51], the influence of multiple parameters on the performance of an algorithm can be visually discussed by analyzing L_p.

3 ADAPTIVE PARAMETER LANDSCAPE ANALYSIS

First, Section 3.1 explains the proposed concept of adaptive parameter landscapes. Then, Section 3.2 introduces the 1-step-lookahead greedy improvement (G1) metric, which is a performance metric for adaptive parameter landscapes. Finally, Section 3.3 proposes the method of analyzing adaptive parameter landscapes.
3.1 Adaptive Parameter Landscapes

We define an adaptive parameter landscape L_a in a PAM as follows:

L_a = (Θ_i^t, M, D),   (6)

where Θ_i^t is the numerical parameter space for the i-th individual in the population at iteration t (i.e., the set of all feasible parameters θ_i^t). The difference between L_p in (5) and L_a in (6) is only the target space (Θ vs. Θ_i^t). While Θ in L_p is static, Θ_i^t in L_a is dynamic. An adaptive parameter landscape can be viewed as a parameter landscape that captures a moment in a parameter adaptation process.

Our ultimate goal is to understand PAMs for DE. While there have been significant contributions to analyzing DE itself in recent years (e.g., [3, 33]), only a few previous studies examined PAMs for DE (see Section 1). One reason is that very little is known about the dynamically changing parameter spaces handled by PAMs. We believe that this issue can be addressed by analyzing adaptive parameter landscapes. A better understanding of adaptive parameter landscapes in PAMs can also lead to the design of more efficient PAMs.

Recall that Table 1 in Section 1 has already summarized the differences among a fitness landscape L_f in (4), a parameter landscape L_p in (5), and an adaptive parameter landscape L_a in (6). Very recently, Jankovic and Doerr [23] investigated dynamic fitness landscapes seen from CMA-ES [18]. They denoted their analysis as "adaptive landscape analysis". While "adaptive landscape analysis" focuses on fitness landscapes of a problem, our adaptive parameter landscape analysis focuses on dynamic parameter landscapes adaptively adjusted by PAMs. Thus, the names "adaptive landscape analysis" and "adaptive parameter landscape analysis" are similar, but the two analyses are totally different from each other. As analyzed in [6, 11, 19], the best parameter settings in EAs depend on the maximum number of function evaluations when the performance of EAs is estimated based on final results (e.g., the objective value of the best-so-far solution at the end of each run). We are interested in dynamically changing parameter landscapes, rather than such static parameter landscapes limited by a termination criterion.

Automated algorithm selection methods based on fitness landscape features of a given problem have been well studied in the evolutionary computation community [31]. Note that adaptive parameter landscape analysis does not mean such a parameter selection approach that seeks the best static parameters (i.e., F and C, not F_i^t and C_i^t) based on fitness landscape features in a one-shot manner. For example, this paper is unrelated to [5].
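As a reference point for the analysis that follows, the (F, C) sampling rules of P-JADE and P-SHADE from Section 2.2 can be sketched as below. The out-of-range repair (resampling non-positive F, truncating F to 1, and clipping C to [0, 1]) follows common practice for these methods and is an assumption of this sketch, since the text above does not spell it out.

```python
import math
import random

def sample_pjade(mu_f, mu_c):
    # F ~ Cauchy(mu_f, 0.1), drawn via the inverse-CDF trick;
    # non-positive draws are resampled and values above 1 are
    # truncated (assumed repair rule).
    f = 0.0
    while f <= 0.0:
        f = mu_f + 0.1 * math.tan(math.pi * (random.random() - 0.5))
    f = min(f, 1.0)
    # C ~ Normal(mu_c, 0.1), clipped to [0, 1] (assumed repair rule).
    c = min(max(random.gauss(mu_c, 0.1), 0.0), 1.0)
    return f, c

def sample_pshade(mem_f, mem_c):
    # P-SHADE first picks a random memory slot r, then samples
    # around (M_F[r], M_C[r]) in the same way as P-JADE.
    r = random.randrange(len(mem_f))
    return sample_pjade(mem_f[r], mem_c[r])
```

Repeatedly calling sample_pshade with all memory slots at the initial value 0.5 shows how the sampled pairs spread over the (F, C) space, which is exactly the space the adaptive parameter landscapes below are defined on.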
3.2 A 1-Step-Lookahead Greedy Improvement (G1) Metric

One critical obstacle in analyzing an adaptive parameter landscape L_a in (6) is how to define the performance metric M. For an analysis of a parameter landscape L_p in (5), some performance metrics can be derived from the field of parameter tuning without any significant change (e.g., PAR10, as mentioned in Section 2.4). In contrast, M in L_a is not obvious.

Here, we introduce a 1-step-lookahead greedy improvement (G1) metric as a performance metric M for analyzing L_a. As explained in Section 2.1 using Algorithm 1, DE generates the trial vector u_i^t for each parent individual x_i^t (i ∈ {1, ..., n}) at each iteration t. A parameter pair of F and C values θ_i^t = (F_i^t, C_i^t) is used for the generation of u_i^t. We define the G1 value of θ_i^t as follows:

G1(θ_i^t) = |f(x_i^t) − f(u_i^t)| if f(u_i^t) < f(x_i^t), and G1(θ_i^t) = 0 otherwise.   (7)

The G1(θ_i^t) value in (7) represents how significantly θ_i^t contributes to generating a better u_i^t than x_i^t in terms of the objective value. A large G1 value indicates that the corresponding θ_i^t can generate a good u_i^t. For example, let us consider three parameter pairs θ_{i,1}^t, θ_{i,2}^t, and θ_{i,3}^t used for the generation of u_i^t, where G1(θ_{i,1}^t) = 0.7, G1(θ_{i,2}^t) is a smaller positive value, and G1(θ_{i,3}^t) = 0. In this case, θ_{i,1}^t is the best of the three parameter pairs in terms of G1, and the objective value of x_i^t can be significantly improved by using θ_{i,1}^t. G1(θ_{i,3}^t) = 0 means that the u_i^t generated by using θ_{i,3}^t is inferior (or equal) to x_i^t. Note that the G1 value is always non-negative. In DE, a trial vector that is inferior to its parent individual cannot survive to the next iteration. For this reason, we treat all parameter pairs of F and C values that generate worse trial vectors than their parent individuals equally.

The idea of measuring the fitness improvement value as in (7) is itself not new at all. Such an approach can be found in the literature (e.g., [16, 49]). In contrast to previous studies, the G1 metric aims to capture adaptive parameter landscapes via the proposed method explained in the next section.

3.3 The Proposed Method of Analyzing L_a

We explain the proposed method of analyzing L_a. Our proposed method can be incorporated into DE in Algorithm 1 with no change. The procedure of our proposed method is totally independent of that of DE. Thus, the search behavior of DE with and without our proposed method is exactly the same.

First, m parameter pairs θ_{i,1}^t, ..., θ_{i,m}^t are generated for each individual x_i^t at iteration t (line 4 in Algorithm 1). Although any generation method can be used (e.g., the random sampling method), we generate the m parameter pairs in a grid manner in this study. We generate 50 × 50 parameter pairs over the ranges of F and C, i.e., m = 50 × 50 = 2 500 parameter pairs. We notice that any differential mutation strategy with F = 0 performs no actual mutation, so F = 0 is excluded from L_a. We also notice that the upper value of F is unbounded in principle, but it was generally set to 1 in most previous studies (e.g., [7, 28, 42, 53]).

Then, we calculate the G1 values of the m parameter pairs, G1(θ_{i,1}^t), ..., G1(θ_{i,m}^t), by simply generating m trial vectors u_{i,1}^t, ..., u_{i,m}^t. Their objective values f(u_{i,1}^t), ..., f(u_{i,m}^t) are evaluated by f. Then, for each j ∈ {1, ..., m}, G1(θ_{i,j}^t) is calculated by (7). It should be noted that m extra function evaluations by f are needed to calculate the m objective values f(u_{i,1}^t), ..., f(u_{i,m}^t).

In the proposed method, the m extra function evaluations for each individual are not counted in the function evaluations used in the search. This manner is similar to GAO [43]. The m trial vectors are used only for adaptive parameter landscape analysis and are not used for the actual search. Independently of the generation of the m trial vectors, each individual x_i^t generates its trial vector u_i^t as in the traditional DE (lines 4–9 in Algorithm 1). As mentioned above, the behavior of DE with any PAM does not change even when generating the m extra trial vectors.

Figure 1: (a) Distribution of 50 × 50 pairs of F and C values generated in a grid manner. (b) Contour map of the resulting L_a.

The stochastic nature of the basic DE is due to (1) the random selection of individual indices R = {r_1, r_2, ...} for mutation (line 5 in Algorithm 1) and (2) the generation of the random numbers s = (s_1, ..., s_d)^T and j_rand for crossover (lines 7–8 in Algorithm 1). Thus, the stochastic nature of DE can be "virtually" suppressed by fixing these random factors (R, s, and j_rand). For each iteration t, each individual x_i^t generates one actual trial vector u_i^t and the m extra trial vectors u_{i,1}^t, ..., u_{i,m}^t using the same R, s, and j_rand.
When using the current-to-pbest/1 strategy in (2), x_pbest^t must also be fixed. Note that this suppression mechanism is used only to generate the m trial vectors for each individual x_i^t.

Figure 1(b) shows the contour map of L_a based on the 50 × 50 parameter pairs. The "height" in L_a is the normalized G1 value. For the sake of clarity, for each individual x_i^t, we normalize all G1 values into the range [0, 1] by using the maximum G1 value G1_max and the minimum G1 value G1_min as follows: G1'(θ_{i,j}^t) = (G1(θ_{i,j}^t) − G1_min) / (G1_max − G1_min), where j ∈ {1, ..., m}. The contour map in Figure 1(b) is L_a of the 100th individual in P-SHADE on the 20-dimensional Sphere-based function in the BBOB function set [17]. Figure 1(b) shows L_a at 100 function evaluations. We used the same experimental setting explained in Section 4; details of the setting are described in Section 4 later. Figure 1(b) is the same as the bottom-left map of Figure 2. In Figure 1(b), the parameter pair with F = 0.78 is the best among the m = 50 × 50 parameter pairs in terms of G1. As seen from Figure 1(b), the closer a parameter pair is to the best parameter pair, the better its G1 value.
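The grid generation, G1 computation, and normalization described above can be summarized in one short sketch. Here f_obj and make_trial stand in for the objective function and the (fixed-randomness) trial-vector generation, and the exact grid spacing over F ∈ (0, 1] is our assumption.

```python
def grid_pairs(k=50):
    # k x k grid over F in (0, 1] and C in [0, 1]; F = 0 is excluded,
    # so the F axis starts at 1/k (the spacing is an assumption).
    return [((i + 1) / k, j / (k - 1))
            for i in range(k) for j in range(k)]

def g1_values(f_obj, parent, pairs, make_trial):
    # G1 (Eq. 7): the fitness improvement of the trial vector over its
    # parent, with all non-improving pairs mapped to 0.
    fp = f_obj(parent)
    out = []
    for theta in pairs:
        fu = f_obj(make_trial(parent, theta))  # one extra evaluation
        out.append(fp - fu if fu < fp else 0.0)
    return out

def normalize_g1(vals):
    # Min-max normalization into [0, 1]; a flat landscape (all values
    # equal) is mapped to all zeros by convention in this sketch.
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return [0.0] * len(vals)
    return [(v - lo) / (hi - lo) for v in vals]
```

With k = 50 the grid has m = 2 500 pairs, matching the setting above; the normalized values are exactly what the contour maps plot.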
4 EXPERIMENTAL SETTINGS

We performed all experiments using the COCO software (https://github.com/numbbo/coco), which has been standard in the GECCO black-box optimization benchmarking (BBOB) workshops since 2009. We used the 24 BBOB noiseless functions f_1, ..., f_24 [17], which are grouped into the following five categories: separable functions (f_1, ..., f_5), functions with low or moderate conditioning (f_6, ..., f_9), unimodal functions with high conditioning (f_10, ..., f_14), multimodal functions with adequate global structure (f_15, ..., f_19), and multimodal functions with weak global structure (f_20, ..., f_24). The dimensionality d of the BBOB functions was set to 2, 3, 5, 10, 20, and 40, as in the GECCO BBOB workshops. The maximum number of function evaluations was set to 10 000 × d.

Figure 2: Contour maps of adaptive parameter landscapes in P-SHADE on the Sphere-based function with d = 20. The x and y axes of each map represent F and C, respectively; the height is the normalized G1 value.

We analyze the three PAMs (P-jDE, P-JADE, and P-SHADE) explained in Section 2.2. Source code used in this study can be downloaded from https://github.com/ryojitanabe/APL. We set their hyper-parameters to the values recommended in the corresponding articles. As in [7, 42, 53], we set the population size n to 100. We used the rand/1 and current-to-pbest/1 strategies described in Section 2.1. However, we show only results with current-to-pbest/1 due to space constraints. As in [53], the control parameters of the current-to-pbest/1 strategy were set as follows: p = 0.05 and |A| = n. We used binomial crossover.

5 RESULTS

This section analyzes adaptive parameter landscapes in PAMs for DE by the proposed method. Our findings are summarized in Section 6. Section 5.1 discusses the shapes of adaptive parameter landscapes by using contour maps as in Figure 1(b). Section 5.2 examines adaptive parameter landscapes using landscape measures.
Let us consider that a DE with the population size n =
100 termi-nates the search at 10 000 function evaluations on a problem. Inthis case, we can obtain 9 900 contour maps, where the first 100
G1 x axis: F y axis: C t h50 t h75 t h100 t h Figure 3: Contour maps of adaptive parameter landscapes inP-SHADE on f with d = . evaluations out of 10 000 are for the initialization of the popula-tion. Also, we performed 15 runs of the 3 PAMs on the 24 BBOBfunctions with the 6 dimensionalities d ∈ { , , , , , } . Evenif all runs terminate at 10 000 function evaluations, we can obtain64 152 000 contour maps ( = × × × × ) contour maps.It is impossible and meaningless to show 64 152 000 contour mapsin this paper.For the above-mentioned reason, we “thinned” data as followsso that we can focus only on meaningful results. • Data of all runs.
For each PAM, we show results of a single run with a median best-so-far error value, which is the gap between the objective values of the best-so-far solution and the optimal solution. When the error value is smaller than 10^-8, it is treated as 0. Ties are broken by the number of function evaluations used to find the best-so-far solution.

• Data of all individuals.
For each iteration, first, all individuals are sorted based on their objective values in descending order. Then, we show only adaptive parameter landscapes of the 25th, 50th, 75th, and 100th individuals out of 100 individuals. Since a parameter pair for the best (1st) individual is seldom successful, we omit its results. The reason is discussed in Section 5.2.

• Data of all function evaluations.
In order to reduce the computational cost of the proposed method, we calculate adaptive parameter landscapes only at every 1 000 function evaluations. Also, we show results at 100, ⌊0.5 fe_stop⌋, ⌊0.75 fe_stop⌋, and fe_stop function evaluations, where fe_stop is the number of function evaluations at which the best-so-far solution was last updated.

Figures 2 and 3 show the contour maps of adaptive parameter landscapes in P-SHADE on f1 and f15 with d =
20, respectively. Here, f1 and f15 are modified versions of the Sphere function and the Rastrigin function, respectively. In Figures 2 and 3, "fe" stands for "function evaluations". The x and y axes represent F and C, respectively. The star in each figure is the best parameter pair, i.e., the one that maximizes the G1 value. The circle in each figure is the parameter pair actually generated by the PAM. See Section 3.3 for how to generate Figures 2 and 3. When the G1 values of all 50 × 50 parameter pairs are 0 (i.e., no parameter pair can improve the individual), the adaptive parameter landscape is flat. In such a case, we do not show the results (e.g., the result of the 50th individual at 43 000 function evaluations in Figure 3). Figure S.73 in the supplementary file shows error values of P-jDE, P-JADE, and P-SHADE on all 24 BBOB functions with d =
20. Note that we are not interested in benchmarking DE algorithms. As shown in Figures S.73(a) and (c), P-SHADE found the optimal solution on f1 and f15 at about 16 000 and 87 000 function evaluations, respectively, in a median run. Due to space constraints, we show results of P-jDE, P-JADE, and P-SHADE on the 24 BBOB functions (f1, ..., f24) with d =
20 in Figures S.1–S.72 in the supplementary file. Although we show only the results of P-SHADE in this section, the qualitative results of the three PAMs are similar. The results on the functions with d ≥ 5 are similar to those with d = 20. Generating a successful parameter pair of F and C is easy for PAMs in an early stage of evolution. However, the area with non-zero G1 values decreases as the search progresses. Thus, it is relatively difficult to generate a parameter pair of F and C that improves each individual in a mid stage of evolution. As seen from Figures 2 and 3, the shape of adaptive parameter landscapes also differs depending on the rank of each individual. By comparing Figures 2 and 3, we can see that generating a successful parameter pair on a multimodal function is more difficult than on a unimodal function. Adaptive parameter landscapes at 43 000 and 65 000 function evaluations in Figure 3 indicate that the area with non-zero G1 values is very small, like a needle-in-a-haystack landscape. In addition to the multimodality, the nonseparability is an important factor that determines the shape of adaptive parameter landscapes, as seen from the results on the nonseparable unimodal functions (f10–f14) shown in Figures S.58–S.62 in the supplementary file. Interestingly, as shown in the adaptive parameter landscapes at 87 000 function evaluations in Figure 3, the area with non-zero G1 values becomes large again in a late stage of evolution. This is because the population has converged well toward the optimal solution, and generating better trial vectors is not so difficult in such a situation on f15.

The shape of adaptive parameter landscapes is significantly influenced by the global structures of fitness landscapes.

Figure 4: Contour maps of adaptive parameter landscapes in P-SHADE on f22 and f23 with d = 20. (x axis: F; y axis: C.)

For example, Figure 4 shows the contour maps of adaptive parameter landscapes in the 100th individual of P-SHADE on f22 and f23 with d =
20 at 100 function evaluations. Figures 4(a) and (b) are parts of Figures S.70 and S.71 in the supplementary file, respectively. The original functions of f22 and f23 are Gallagher's Gaussian 21 peaks function and the Katsuura function, which have fitness landscapes without any global structure. Figure 4(a) shows that the adaptive parameter landscape on f22 has only a small area with non-zero G1 values even at the beginning of the search. Figure 4(b) also shows that the adaptive parameter landscape does not have any global structure, similar to the fitness landscape of f23.

As seen from the positions of the star and the circle in Figures 2 and 3, a parameter pair actually generated by P-SHADE is far from the best parameter pair. Ideally, it is desirable that a PAM generate a parameter pair close to the best parameter pair. This observation indicates that there is room for improving PAMs in DE.

This section analyzes adaptive parameter landscapes using two representative landscape measures (FDC [24] and DISP [26]) and a non-zero ratio (NZR) measure. FDC measures the correlation between objective values and the distance to the best solution found (or the optimal solution). A large FDC value indicates that the corresponding fitness landscape has a strong global structure. In DISP, first, all solutions are sorted based on their objective values in descending order. Then, the dispersion of the top b solutions is calculated based on the average pairwise distance between them (in this study, b was set to a fixed proportion of m). A large DISP value indicates that the corresponding fitness landscape has a multi-funnel structure. Although FDC and DISP were originally proposed for fitness landscape analysis, they can be extended to parameter landscape analysis with no significant change, as demonstrated in [20]. When using FDC and DISP for adaptive parameter landscape analysis, "the objective value" is replaced with the G1 value, and "the solution" is replaced with the parameter pair of F and C values.
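Both measures are straightforward to compute on the m sampled parameter pairs. The following Python sketch illustrates one way to adapt them (the names are ours; for FDC we correlate the G1 deficit with the distance to the best pair so that a strongly structured landscape scores near +1, and for DISP we take the average pairwise distance of the top pairs as described above; the sign convention and normalization of the original implementation may differ).

```python
import math
from itertools import combinations

def fdc(points, values):
    """FDC adapted to parameter landscapes: Pearson correlation between the
    G1 deficit (best G1 minus G1) of each (F, C) pair and its Euclidean
    distance to the best pair."""
    n = len(points)
    best = points[max(range(n), key=values.__getitem__)]
    dists = [math.dist(pt, best) for pt in points]
    defic = [max(values) - v for v in values]
    mf, md = sum(defic) / n, sum(dists) / n
    cov = sum((x - mf) * (y - md) for x, y in zip(defic, dists))
    sf = math.sqrt(sum((x - mf) ** 2 for x in defic))
    sd = math.sqrt(sum((y - md) ** 2 for y in dists))
    return cov / (sf * sd) if sf > 0 and sd > 0 else 0.0

def disp(points, values, frac=0.1):
    """Dispersion: average pairwise distance among the top-frac pairs
    (here, the pairs with the largest G1 values)."""
    b = max(2, int(frac * len(points)))
    top = sorted(range(len(points)), key=values.__getitem__, reverse=True)[:b]
    pairs = list(combinations(top, 2))
    return sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)
```

On a synthetic landscape whose G1 value decays smoothly with the distance from the best pair, this fdc returns exactly 1, matching the intuition that a large value signals a strong global structure.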
We did not normalize parameter values since F ∈ [0, 1] and C ∈ [0, 1]. FDC and DISP perform poorly in high-dimensional spaces [30], but we address only the two-dimensional space (F and C).

We introduce NZR for analyzing adaptive parameter landscapes. We do not argue that NZR is one of our contributions since it just counts numbers. The NZR value of an adaptive parameter landscape
in the i-th individual at iteration t is given as follows:

NZR(θ^t_{i,1}, ..., θ^t_{i,m}) = (1/m) |{ θ^t_{i,j} | G1(θ^t_{i,j}) > 0, j ∈ {1, ..., m} }|,   (8)

where the NZR value is always in the range [0, 1]. NZR measures the difficulty in generating a "successful" parameter pair of F and C values based on the area with non-zero G1 values (see Section 2.2 for the definition of "successful"). A large NZR value indicates that it is easy to generate a successful parameter pair on the corresponding adaptive parameter landscape.

Figure 5: Average FDC, DISP, and NZR values of adaptive parameter landscapes in P-SHADE (d = 20). (Panels: (a) FDC, (b) DISP, (c) NZR; x axis: the 24 BBOB functions.)

Figure 5 shows the average FDC, DISP, and NZR values of the 1st, 25th, 50th, 75th, and 100th individuals in P-SHADE at 100, 1 000, 2 000, ... function evaluations on the 24 BBOB functions with d = 20.

Figure 5(a) shows that all FDC values are non-negative on all functions. As seen from the results on all 24 functions, the worse the individual is, the larger the FDC value. Adaptive parameter landscapes for individuals with similar ranks (e.g., the 50th and 75th individuals) have similar FDC values. This observation indicates that the global structures of adaptive parameter landscapes can correlate with the rank of individuals. Some previous studies (e.g., [15, 41, 47]) gave a rule of thumb that the appropriate parameter pair of F and C values may depend on the rank of individuals. Although this rule of thumb has never been supported by any result, it can be justified by our observation of adaptive parameter landscapes. The results of NZR in Figure 5(c) show that generating a successful parameter pair is relatively easy for inferior individuals. Ali [2] demonstrated that generating a better trial vector than an inferior individual in the population is easy.
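Unlike FDC and DISP, the NZR in Eq. (8) requires no distance computation; it is simply the fraction of the m sampled parameter pairs whose G1 value is positive. A minimal Python sketch (the function name is ours):

```python
def nzr(g1_values):
    """Non-zero ratio of an adaptive parameter landscape: the fraction of
    the m sampled (F, C) pairs whose G1 value is positive (Eq. (8))."""
    m = len(g1_values)
    return sum(1 for g in g1_values if g > 0) / m
```

For a 50 × 50 grid, nzr receives the 2 500 G1 values; an NZR of 0 corresponds to the flat landscapes omitted from Figures 2 and 3.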
We believe that our observation supports a generalization of Ali's observation since it can be applied even to adaptive parameter landscapes.

As shown in Figure 5(a), the FDC value differs depending on the function. While the average FDC values on the three separable and unimodal functions (f1, f2, and f5) are large, those on the multimodal or nonseparable functions are small, except for f21 and f22. Although f21 and f22 are multimodal functions, each peak of their fitness landscapes is unimodal. This property of f21 and f22 may influence the FDC values of adaptive parameter landscapes.

As shown in Figures 5(a) and (b), the results of DISP are consistent with the above-mentioned results of FDC in most cases. Figure 5(b) can be viewed as an upside-down version of Figure 5(a). This may be because both FDC and DISP quantify the global structures of fitness landscapes. An analysis with other landscape measures (e.g., ELA [29] and LON [1]) is another direction for future work.

Figures S.74–S.82 in the supplementary file show that the results of P-jDE, P-JADE, and P-SHADE for d ∈ {5, 10, 40} are similar to those for d =
20 (Figure 5). In contrast, results for d ∈ {2, 3} are noisy. This may be because DE algorithms with PAMs do not work well on such low-dimensional problems, as reported in [45].

We have analyzed adaptive parameter landscapes based on F and C in PAMs for DE. We introduced the concept of adaptive parameter landscapes (Section 3.1). We proposed the method of analyzing adaptive parameter landscapes based on the G1 metric (Sections 3.2 and 3.3). We also examined adaptive parameter landscapes in P-jDE, P-JADE, and P-SHADE on the 24 BBOB functions by using the proposed method (Section 5).

Our observations in this study can be summarized as follows:

i) An adaptive parameter landscape L_a (NOT the optimal parameter θ*) differs depending on the search progress. For example, it is relatively easy to generate successful parameters in an early stage of evolution.
ii) L_a (NOT θ*) is significantly influenced by the characteristics of a given problem (e.g., the local/global multimodality).
iii) L_a (NOT θ*) differs depending on the rank of an individual, but the L_a of individuals with similar ranks are generally similar.
iv) In most cases, P-jDE, P-JADE, and P-SHADE generate a parameter pair of F and C values far from the best parameter pair. This means that there is room for improving PAMs.

We emphasize that our observations about PAMs could not be obtained without analyzing adaptive parameter landscapes. We believe that our observations can serve as useful clues for designing an efficient PAM. Overall, we conclude that adaptive parameter landscape analysis can provide important information about PAMs for DE. Although we examined adaptive parameter landscapes in PAMs for DE, we believe that the proposed analysis method can be applied to PAMs for other evolutionary algorithms, including genetic algorithms and evolution strategies. Further analysis is needed.

ACKNOWLEDGMENTS
This work was supported by Leading Initiative for Excellent Young Researchers, MEXT, Japan.
REFERENCES
[1] J. Adair, G. Ochoa, and K. M. Malan. 2019. Local optima networks for continuous fitness landscapes. In GECCO (Companion). 1407–1414.
[2] M. M. Ali. 2011. Differential evolution with generalized differentials. J. Comput. Appl. Math.
[3] IEEE TEVC 24, 1 (2020), 84–98.
[4] T. Bartz-Beielstein, C. Lasarczyk, and M. Preuss. 2010. The Sequential Parameter Optimization Toolbox. In Experimental Methods for the Analysis of Optimization Algorithms. 337–362.
[5] N. Belkhir, J. Dréo, P. Savéant, and M. Schoenauer. 2016. Feature Based Algorithm Configuration: A Case Study with Differential Evolution. In PPSN. 156–166.
[6] L. C. T. Bezerra, M. López-Ibáñez, and T. Stützle. 2018. A Large-Scale Experimental Evaluation of High-Performing Multi- and Many-Objective Evolutionary Algorithms. Evol. Comput. 26, 4 (2018).
[7] J. Brest, S. Greiner, B. Bošković, M. Mernik, and V. Žumer. 2006. Self-Adapting Control Parameters in Differential Evolution: A Comparative Study on Numerical Benchmark Problems. IEEE TEVC 10, 6 (2006), 646–657.
[8] S. Das, S. S. Mullick, and P. N. Suganthan. 2016. Recent advances in differential evolution - An updated survey. Swarm and Evol. Comput. 27 (2016), 1–30.
[9] S. Das and P. N. Suganthan. 2011. Differential Evolution: A Survey of the State-of-the-Art. IEEE TEVC 15, 1 (2011), 4–31.
[10] M. Drozdik, H. E. Aguirre, Y. Akimoto, and K. Tanaka. 2015. Comparison of Parameter Control Mechanisms in Multi-objective Differential Evolution. In LION. 89–103.
[11] A. S. D. Dymond, A. P. Engelbrecht, and P. S. Heyns. 2011. The sensitivity of single objective optimization algorithm control parameter values under different computational constraints. In IEEE CEC. 1412–1419.
[12] A. E. Eiben, R. Hinterding, and Z. Michalewicz. 1999. Parameter control in evolutionary algorithms. IEEE TEVC 3, 2 (1999), 124–141.
[13] A. E. Eiben and S. K. Smit. 2011. Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm and Evol. Comput. 1, 1 (2011), 19–31.
[14] R. Gämperle, S. D. Müller, and P. Koumoutsakos. 2002. A Parameter Study for Differential Evolution. In Int. Conf. on Adv. in Intelligent Systems, Fuzzy Systems, Evol. Comput.
[15] Inf. Sci.
[16] Evolution Strategies. Springer.
[17] N. Hansen, S. Finck, R. Ros, and A. Auger. 2009. Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions. Technical Report. INRIA.
[18] N. Hansen and A. Ostermeier. 2001. Completely Derandomized Self-Adaptation in Evolution Strategies. Evol. Comput. 9, 2 (2001), 159–195.
[19] K. R. Harrison, A. P. Engelbrecht, and B. M. Ombuki-Berman. 2018. Optimal parameter regions and the time-dependence of control parameter values for the particle swarm optimization algorithm. Swarm and Evol. Comput. 41 (2018), 20–35.
[20] K. R. Harrison, B. M. Ombuki-Berman, and A. P. Engelbrecht. 2019. The Parameter Configuration Landscape: A Case Study on Particle Swarm Optimization. In IEEE CEC. 808–814.
[21] H. H. Hoos. 2012. Automated Algorithm Configuration and Parameter Tuning. In Autonomous Search. 37–71.
[22] F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle. 2009. ParamILS: An Automatic Algorithm Configuration Framework. JAIR 36 (2009), 267–306.
[23] A. Jankovic and C. Doerr. 2019. Adaptive landscape analysis. In GECCO (Companion). 2032–2035.
[24] T. Jones and S. Forrest. 1995. Fitness Distance Correlation as a Measure of Problem Difficulty for Genetic Algorithms. In ICGA. 184–192.
[25] I. Loshchilov, M. Schoenauer, and M. Sebag. 2012. Alternative Restart Strategies for CMA-ES. In PPSN. 296–305.
[26] M. Lunacek and D. Whitley. 2006. The dispersion metric and the CMA evolution strategy. In GECCO. 477–484.
[27] K. Malan and A. P. Engelbrecht. 2013. A survey of techniques for characterising fitness landscapes and some possible ways forward. Inf. Sci. 241 (2013), 148–163.
[28] R. Mallipeddi, P. N. Suganthan, Q. K. Pan, and M. F. Tasgetiren. 2011. Differential evolution algorithm with ensemble of parameters and mutation strategies. Appl. Soft Comput. 11 (2011), 1679–1696.
[29] O. Mersmann, B. Bischl, H. Trautmann, M. Preuss, C. Weihs, and G. Rudolph. 2011. Exploratory landscape analysis. In GECCO. 829–836.
[30] R. Morgan and M. Gallagher. 2014. Sampling Techniques and Distance Metrics in High Dimensional Continuous Landscape Analysis: Limitations and Improvements. IEEE TEVC 18, 3 (2014), 456–461.
[31] M. A. Muñoz, Y. Sun, M. Kirley, and S. K. Halgamuge. 2015. Algorithm selection for black-box continuous optimization problems: A survey on methods and challenges. Inf. Sci. 317 (2015), 224–245.
[32] M. G. H. Omran, A. A. Salman, and A. P. Engelbrecht. 2005. Self-adaptive Differential Evolution. In CIS. 192–199.
[33] K. R. Opara and J. Arabas. 2019. Differential Evolution: A survey of theoretical analyses. Swarm and Evol. Comput. 44 (2019), 546–558.
[34] M. E. H. Pedersen. 2010. Tuning & Simplifying Heuristical Optimization. Ph.D. Dissertation. University of Southampton.
[35] E. Pitzer and M. Affenzeller. 2012. A Comprehensive Survey on Fitness Landscape Analysis. In Recent Advances in Intelligent Engineering Systems. 161–191.
[36] Y. Pushak and H. H. Hoos. 2018. Algorithm Configuration Landscapes: More Benign Than Expected? In PPSN. 271–283.
[37] A. K. Qin, V. L. Huang, and P. N. Suganthan. 2009. Differential Evolution Algorithm With Strategy Adaptation for Global Numerical Optimization. IEEE TEVC 13, 2 (2009), 398–417.
[38] C. Segura, C. A. C. Coello, E. Segredo, and C. León. 2014. An analysis of the automatic adaptation of the crossover rate in differential evolution. In IEEE CEC. 459–466.
[39] T. Smith, P. Husbands, P. J. Layzell, and M. O'Shea. 2002. Fitness Landscapes and Evolvability. Evol. Comput. 10, 1 (2002), 1–34.
[40] R. Storn and K. Price. 1997. Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Glo. Opt. 11, 4 (1997), 341–359.
[41] T. Takahama and S. Sakai. 2012. Efficient Constrained Optimization by the ϵ Constrained Rank-Based Differential Evolution. In IEEE CEC. 1–8.
[42] R. Tanabe and A. Fukunaga. 2013. Success-History Based Parameter Adaptation for Differential Evolution. In IEEE CEC. 71–78.
[43] R. Tanabe and A. Fukunaga. 2016. How Far Are We from an Optimal, Adaptive DE?. In PPSN. 145–155.
[44] R. Tanabe and A. Fukunaga. 2017. TPAM: a simulation-based model for quantitatively analyzing parameter adaptation methods. In GECCO. 729–736.
[45] R. Tanabe and A. Fukunaga. 2020. Reviewing and Benchmarking Parameter Control Methods in Differential Evolution. IEEE Trans. Cyber. 50, 3 (2020), 1170–1184.
[46] R. Tanabe and A. S. Fukunaga. 2014. Improving the search performance of SHADE using linear population size reduction. In IEEE CEC. 1658–1665.
[47] L. Tang, Y. Dong, and J. Liu. 2015. Differential Evolution With an Individual-Dependent Mechanism. IEEE TEVC 19, 4 (2015), 560–574.
[48] Y. Wang, Z. Cai, and Q. Zhang. 2011. Differential Evolution With Composite Trial Vector Generation Strategies and Control Parameters. IEEE TEVC 15, 1 (2011), 55–66.
[49] Z. Yang, K. Tang, and X. Yao. 2008. Self-adaptive Differential Evolution with Neighborhood Search. In IEEE CEC. 1110–1116.
[50] B. Yuan and M. Gallagher. 2007. Combining Meta-EAs and Racing for Difficult EA Parameter Tuning Tasks. In Parameter Setting in Evolutionary Algorithms. 121–142.
[51] Z. Yuan, M. A. M. de Oca, M. Birattari, and T. Stützle. 2012. Continuous optimization algorithms for tuning real and integer parameters of swarm intelligence algorithms. Swarm Intell. 6, 1 (2012), 49–75.
[52] A. Zamuda and J. Brest. 2015. Self-adaptive control parameters' randomization frequency and propagations in differential evolution. Swarm and Evol. Comput. 25 (2015), 72–99.
[53] J. Zhang and A. C. Sanderson. 2009. JADE: Adaptive Differential Evolution With Optional External Archive. IEEE TEVC 13, 5 (2009), 945–958.
[54] K. Zielinski, X. Wang, and R. Laur. 2008. Comparison of Adaptive Approaches for Differential Evolution. In PPSN. 641–650.
[55] K. Zielinski, P. Weitkemper, R. Laur, and K. D. Kammeyer. 2006. Parameter Study for Differential Evolution Using a Power Allocation Problem Including Interference Cancellation. In