A Modular Hybridization of Particle Swarm Optimization and Differential Evolution
Rick Boks
Leiden Institute of Advanced Computer Science, Leiden, The Netherlands
[email protected]

Hao Wang
LIP6, Sorbonne Université, Paris, France
[email protected]

Thomas Bäck
Leiden Institute of Advanced Computer Science, Leiden, The Netherlands
[email protected]
ABSTRACT
In swarm intelligence, Particle Swarm Optimization (PSO) and Differential Evolution (DE) have been successfully applied in many optimization tasks, and a large number of variants, where novel algorithm operators or components are implemented, have been introduced to boost the empirical performance. In this paper, we first propose to combine the variants of PSO or DE by modularizing each algorithm and incorporating the variants thereof as different options of the corresponding modules. Then, considering the similarity between the inner workings of PSO and DE, we hybridize the algorithms by creating two populations with variation operators of PSO and DE respectively, and selecting individuals from those two populations. The resulting novel hybridization, called PSODE, encompasses most up-to-date variants from both sides, and more importantly gives rise to an enormous number of unseen swarm algorithms via different instantiations of the modules therein. In detail, we consider 16 different variation operators originating from existing PSO and DE algorithms, which, combined with 4 different selection operators, allow the hybridization framework to generate 800 novel algorithms. The resulting set of hybrid algorithms, along with the combined 30 PSO and DE algorithms that can be generated with the considered operators, is tested on the 24 problems from the well-known COCO/BBOB benchmark suite, across multiple function groups and dimensionalities.
GECCO '20 Companion, Cancún, Mexico

1 INTRODUCTION

In this paper, we delve into two naturally-inspired algorithms, Particle Swarm Optimization (PSO) [5] and Differential Evolution (DE) [19], for solving continuous black-box optimization problems f : R^n → R, which are subject to minimization without loss of generality. Here we only consider simple box constraints on R^n, meaning the search space is a hyper-box [x_min, x_max] = ∏_{i=1}^{n} [x_i^min, x_i^max]. In the literature, a huge number of variants of PSO and DE has been proposed to enhance the empirical performance of the respective algorithms. Despite the empirical success of those variants, we found that most of them only differ from the original PSO/DE in one or two operators (e.g., the crossover), where usually some simple modifications are implemented. Therefore, it is almost natural to consider combinations of those variants. Following the so-called configurable CMA-ES approach [22, 23], we first modularize both the PSO and DE algorithms, resulting in a modular framework where different types of algorithmic modules are applied sequentially in each generation loop. When incorporating variants into this modular framework, we first identify the modules at which modifications are made in a particular variant, and then treat the modifications as options of the corresponding modules. For instance, the so-called inertia weight [18], a simple modification to the velocity update in PSO, is considered as an option of the velocity update module. This treatment allows for combining existing variants of either PSO or DE and generating non-existing algorithmic structures. It, in the loose sense, creates a space/family of swarm algorithms, which is configurable via instantiating the modules, and hence potentially primes the application of algorithm selection/configuration [21] to swarm intelligence.
More importantly, we also propose a meta-algorithm called PSODE that hybridizes the variation operators from both PSO and DE, and therefore gives rise to an even larger space of unseen algorithms. By hybridizing PSO and DE, we aim to unify the strengths from both sides, in an attempt to, for instance, improve the population diversity and the convergence rate. On the well-known Black-Box Optimization Benchmark (BBOB) [7] problem set, we extensively tested all combinations of four different velocity updates (PSO), five neighborhood topologies (PSO), two crossover operators (DE), five mutation operators (DE), and four selection operators, leading to 800 algorithms. We benchmark those algorithms on all 24 test functions from the BBOB problem set and analyze the experimental results using the so-called IOHprofiler [4], to identify algorithms that perform well on (a subset of) the 24 test functions.

This paper is organized as follows: Section 2 summarizes the related work. Section 3 reviews the state-of-the-art variants of PSO. Section 4 covers various cutting-edge variants of DE. In Section 5, we describe the novel modular PSODE algorithm. Section 6 specifies the experimental setup on the BBOB problem set. We discuss the experimental results in Section 7 and finally provide, in Section 8, the insights obtained in this paper as well as future directions.
2 RELATED WORK

A hybrid PSO/DE algorithm has been proposed previously [25] to improve the population diversity and prevent premature convergence. This is attempted by using the DE mutation, instead of the traditional velocity and position update, to evolve candidate solutions in the PSO algorithm. This mutation is applied to the particle's best-found solution p_i rather than its current position x_i, resulting in a steady-state strategy. Another approach [8] follows the conventional PSO algorithm, but occasionally applies the DE operator in order to escape local minima. Particles maintain their velocity after being perturbed by the DE operator. Other PSO/DE hybrids include a two-phase approach [16] and a Bare-Bones PSO variant based on DE [15], which requires little parameter tuning.

This work follows the approach of the modular and extensible CMA-ES framework proposed in [22], where many ES structures can be instantiated by arbitrarily combining existing variations of the CMA-ES. The authors of that work implement a Genetic Algorithm to efficiently evolve the ES structures, instead of performing an expensive brute-force search over all possible combinations of operators.

The source code is available at https://github.com/rickboks/pso-de-framework.
3 PARTICLE SWARM OPTIMIZATION

As introduced by Eberhart and Kennedy [5], Particle Swarm Optimization (PSO) is an optimization algorithm that mimics the behaviour of a flock of birds foraging for food. A particle in a swarm of size M is associated with three vectors: the current position x_i, the velocity v_i, and its previous best position p_i, where i ∈ {1, …, M}. After the initialization, where x_i is initialized randomly and v_i is set to 0, the algorithm iteratively controls the velocity v_i of each particle (please see the next subsection) and moves the particle x_i accordingly:

    x_i ← x_i + v_i.    (1)

To prevent the velocity from exploding, v_i is kept in the range [−v_max, v_max]. Each particle is evaluated as f_i = f(x_i). Here, p_i stands for the best solution found by x_i (the personal best), while g_i is used to track the best solution found in the neighborhood of x_i (the global best). Typically, the termination of PSO can be determined by simple criteria, such as the depletion of the function evaluation budget, as well as more complicated ones that rely on the convergence behavior, e.g., detecting whether the average distance between particles has dropped below a predetermined threshold. The pseudo-code is given in Alg. 1.

Algorithm 1
Original Particle Swarm Optimization
    for i = 1 → M do
        x_i ← U(x_min, x_max), v_i ← 0    ▷ Initialize
        f_i^best ← f(x_i), p_i ← x_i
    end for
    while termination criteria are not met do
        for i = 1 → M do
            f_i ← f(x_i)    ▷ Evaluate
            if f_i < f_i^best then
                p_i ← x_i, f_i^best ← f_i    ▷ Update personal best
            end if
            if f_i < f(g_i) then
                g_i ← x_i    ▷ Update global best
            end if
            Calculate v_i according to Eq. (2)
            x_i ← x_i + v_i    ▷ Update position
        end for
    end while
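For concreteness, the loop of Alg. 1 with the gbest topology and the inertia-weight velocity update (Eq. (3)) can be sketched in Python. This is an illustrative reimplementation, not the authors' C++ framework; the function name, default parameter values, and the clamping of positions to the box are our own choices:

```python
import random

def pso_minimize(f, n, M=20, iters=100, omega=0.7298, phi1=1.49618,
                 phi2=1.49618, x_min=-5.0, x_max=5.0, seed=0):
    """Minimal gbest PSO with a fixed inertia weight (illustrative only)."""
    rng = random.Random(seed)
    X = [[rng.uniform(x_min, x_max) for _ in range(n)] for _ in range(M)]
    V = [[0.0] * n for _ in range(M)]           # velocities start at zero
    P = [x[:] for x in X]                       # personal bests
    f_best = [f(x) for x in X]
    g = min(range(M), key=lambda i: f_best[i])  # index of the global best
    for _ in range(iters):
        for i in range(M):
            for j in range(n):
                V[i][j] = (omega * V[i][j]
                           + rng.uniform(0, phi1) * (P[i][j] - X[i][j])
                           + rng.uniform(0, phi2) * (X[g][j] - X[i][j]))
                # keep the particle inside the box (our own convention)
                X[i][j] = min(max(X[i][j] + V[i][j], x_min), x_max)
            fi = f(X[i])
            if fi < f_best[i]:
                P[i], f_best[i] = X[i][:], fi   # update personal best
                if fi < f_best[g]:
                    g = i                        # update global best
    return P[g], f_best[g]

sphere = lambda x: sum(v * v for v in x)
best, val = pso_minimize(sphere, n=3)
```

Note that with the inertia weight no velocity clamping is required, as the paper points out for Eq. (3); the position clamping above is merely one common way to respect the box constraints.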
As proposed in the original paper [5], the velocity vector in original PSO is updated as follows:

    v_i ← v_i + U(0, φ₁) ⊗ (p_i − x_i) + U(0, φ₂) ⊗ (g_i − x_i),    (2)

where U(a, b) stands for a continuous uniform random vector with each component distributed uniformly in the range [a, b], and ⊗ is component-wise multiplication. Note that, henceforth, the parameter settings such as φ₁ and φ₂ will be specified in the experimentation part (Section 6). As discussed before, velocities resulting from Eq. (2) have to be clamped to the range [−v_max, v_max]. Alternatively, the inertia weight [18] ω ∈ [0, 1] is introduced to moderate the velocity update without using v_max:

    v_i ← ω v_i + U(0, φ₁) ⊗ (p_i − x_i) + U(0, φ₂) ⊗ (g_i − x_i).    (3)

A large value of ω results in an exploratory search, while a small value leads to a more exploitative behavior. It has been suggested to decrease the inertia weight over time, as it is desirable to scale down the explorative effect gradually. Here, we consider the inertia method with fixed as well as decreasing weights.

Instead of only being influenced by the best neighbor, the velocity of a particle in the Fully Informed Particle Swarm (FIPS) [14] is updated using the best previous positions of all its neighbors. The corresponding equation is:

    v_i ← χ ( v_i + (1/|N_i|) Σ_{p ∈ N_i} U(0, φ) ⊗ (p − x_i) ),    (4)

where N_i is the set of neighbors of particle i and χ = 2/(φ − 2 + √(φ² − 4φ)) is the constriction coefficient [3]. Finally, the so-called Bare-Bones
PSO [11] is a completely different approach in the sense that velocities are not used at all; instead, every component x_ij (j = 1, …, n) of position x_i is sampled from a Gaussian distribution with mean (p_ij + g_ij)/2 and standard deviation |p_ij − g_ij|, where p_ij and g_ij are the j-th components of p_i and g_i, respectively:

    x_ij ∼ N( (p_ij + g_ij)/2, |p_ij − g_ij| ),  j = 1, …, n.    (5)

Five different topologies from the literature have been implemented in the framework:

• lbest (local best) [5] takes a ring topology, in which each particle is only influenced by its two adjacent neighbors.
• gbest (global best) [5] uses a fully connected graph, so that every particle is influenced by the best particle of the entire swarm.
• In the Von Neumann topology [12], particles are arranged in a two-dimensional array and have four neighbors: the ones horizontally and vertically adjacent to them, with toroidal wrapping.
• The increasing topology [20] starts with an lbest topology and gradually increases the connectivity so that, by the end of the run, the particles are fully connected.
• The dynamic multi-swarm topology (DMS-PSO) [13] creates clusters consisting of three particles each, and creates new clusters randomly after every 5 iterations. If the population size is not divisible by three, every cluster has size three, except one, which is of size 3 + (M mod 3).
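The neighbor indexing of the first and third topology can be sketched as follows. This is a sketch under our own conventions (whether a particle counts itself as a neighbor varies between implementations; here lbest includes it and Von Neumann does not):

```python
def lbest_neighbors(i, M):
    """Ring topology: the particle itself plus its two adjacent neighbors."""
    return [(i - 1) % M, i, (i + 1) % M]

def von_neumann_neighbors(i, M, cols):
    """2-D grid with toroidal wrapping: the four particles above, below,
    left of, and right of particle i (grid shape M = rows * cols)."""
    rows = M // cols
    r, c = divmod(i, cols)
    return [((r - 1) % rows) * cols + c,   # up
            ((r + 1) % rows) * cols + c,   # down
            r * cols + (c - 1) % cols,     # left
            r * cols + (c + 1) % cols]     # right
```

For example, in a swarm of 12 particles arranged in a 3 x 4 grid, particle 0 has neighbors 8, 4, 3, and 1.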
4 DIFFERENTIAL EVOLUTION

Differential Evolution (DE) was introduced by Storn and Price in 1995 [19] and uses scaled difference vectors between randomly selected individuals to perturb the population. The pseudo-code of DE is provided in Alg. 3. After the initialization of the population P = {x_i}_{i=1}^{M} ⊂ R^n (M is again the swarm size), for each individual x_i a donor vector v_i (a.k.a. mutant) is generated according to:

    v_i ← x_{r1} + F · (x_{r2} − x_{r3}),    (6)

where three distinct indices r1 ≠ r2 ≠ r3 ≠ i ∈ [1..M] are chosen uniformly at random (u.a.r.). Here, the scalar F is called the mutation rate, and x_{r1} is referred to as the base vector. Afterwards, a trial vector x'_i is created by means of crossover. In the so-called binomial crossover, each component x'_ij (j = 1, …, n) of x'_i is copied from v_ij with a probability Cr ∈ [0, 1] (a.k.a. the crossover rate), or when j equals an index j_rand ∈ [1..n] chosen u.a.r.:

    x'_ij ← v_ij if U(0, 1) ≤ Cr or j = j_rand; x_ij otherwise.    (7)

In exponential crossover, two integers p, q ∈ {1, …, n} are chosen. The integer p acts as the starting point where the exchange of components begins, and is chosen uniformly at random; q represents the number of elements that will be inherited from the donor vector, and is chosen using Algorithm 2.

Algorithm 2
Assigning a value to q
    q ← 0
    do
        q ← q + 1
    while (U(0, 1) ≤ Cr) and (q ≤ n)

The trial vector x'_i is then generated as:

    x'_ij ← v_ij for j = ⟨p⟩_n, ⟨p + 1⟩_n, …, ⟨p + q − 1⟩_n; x_ij for all other j ∈ {1, …, n},    (8)

where the angular brackets ⟨·⟩_n denote the modulo operator with modulus n. Elitist selection is applied between x_i and x'_i, where the better one is kept for the next iteration.

Algorithm 3 Differential Evolution using Binomial Crossover
    x_i ← U(x_min, x_max), i = 1, …, M    ▷ Initialize
    while termination criteria are not met do
        for i = 1 → M do
            Choose r1 ≠ r2 ≠ r3 ≠ i ∈ [1..M] u.a.r.
            v_i ← x_{r1} + F (x_{r2} − x_{r3})    ▷ Mutate
            Choose j_rand ∈ [1..n] u.a.r.
            for j = 1 → n do
                if U(0, 1) ≤ Cr or j = j_rand then
                    x'_ij ← v_ij
                else
                    x'_ij ← x_ij
                end if
            end for
            if f(x'_i) < f(x_i) then
                x_i ← x'_i    ▷ Select
            end if
        end for
    end while

In addition to the so-called DE/rand/1 mutation operator (Eq. 6), we also consider the following variants:

(1) DE/best/1 [19]: the base vector is chosen as the current best solution in the population, x_best:
    v_i ← x_best + F · (x_{r1} − x_{r2})
(2) DE/best/2 [19]: two difference vectors calculated from four distinct solutions are scaled and combined with the current best solution:
    v_i ← x_best + F · (x_{r1} − x_{r2}) + F · (x_{r3} − x_{r4})
(3) DE/target-to-best/1 [19]: the base vector is the solution on which the mutation will be applied, and the difference from this solution to the current best is used as one of the difference vectors:
    v_i ← x_i + F · (x_best − x_i) + F · (x_{r1} − x_{r2})
(4) DE/target-to-pbest/1 [10]: the same as above, except that instead of the current best we take a solution x_pbest that is randomly chosen from the top 100p% of solutions in the population, with p ∈ (0, 1]:
    v_i ← x_i + F · (x_pbest − x_i) + F · (x_{r1} − x_{r2})
(5) DE/2-Opt/1 [2]:
    v_i ← x_{r1} + F (x_{r2} − x_{r3}) if f(x_{r1}) < f(x_{r2}); x_{r2} + F (x_{r1} − x_{r3}) otherwise

The performance of the DE algorithm is highly dependent on the values of the parameters F and Cr, for which the optimal values are in turn dependent on the optimization problem at hand. The self-adaptive DE variant JADE [10] has been proposed to control these parameters in a self-adaptive manner, without intervention of the user. This self-adaptive parameter scheme is used in both the DE and hybrid algorithm instances.

5 THE HYBRID ALGORITHM: PSODE

Here, we propose a hybrid algorithm framework called PSODE that combines the mutation and crossover operators of DE with the velocity and position updates of PSO. This implementation allows combinations of all operators mentioned earlier in a single algorithm, creating the potential for a large number of possible hybrid algorithms. We list the pseudo-code of PSODE in Alg. 4, which works as follows.
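The DE/rand/1 mutation (Eq. 6) and the two crossover variants (Eq. 7, and Alg. 2 with Eq. 8) can be sketched compactly in Python. Names and signatures are our own, illustrative choices, not the paper's C++ code:

```python
import random

def de_rand_1(pop, i, F, rng):
    """DE/rand/1 donor (Eq. 6): x_r1 + F * (x_r2 - x_r3), with r1, r2, r3
    distinct indices, all different from i."""
    r1, r2, r3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    return [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
            for j in range(len(pop[i]))]

def binomial_crossover(x, v, Cr, rng):
    """Binomial crossover (Eq. 7): inherit v_j with probability Cr; the
    randomly chosen index j_rand is always taken from the donor."""
    n = len(x)
    j_rand = rng.randrange(n)
    return [v[j] if (rng.random() <= Cr or j == j_rand) else x[j]
            for j in range(n)]

def exponential_crossover(x, v, Cr, rng):
    """Exponential crossover (Alg. 2 + Eq. 8): copy a contiguous block of
    length q (indices taken modulo n) from the donor, starting at a
    uniformly random index p."""
    n = len(x)
    p = rng.randrange(n)
    q = 0
    while True:                 # Alg. 2: grow q while the coin flips succeed
        q += 1
        if not (rng.random() <= Cr and q <= n):
            break
    trial = x[:]
    for k in range(q):
        trial[(p + k) % n] = v[(p + k) % n]
    return trial
```

With Cr = 0, exponential crossover copies exactly one donor component, while Cr = 1 makes binomial crossover return the full donor vector.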
(1) The initial population P₁ = {x^(1), …, x^(M)} (M stands for the swarm size) is sampled uniformly at random in the search space, and the corresponding velocity vectors are initialized to zero (as suggested in [6]).
(2) After evaluating P₁, we create P₂ by applying the PSO position update to each solution in P₁.
(3) Similarly, P₃ is created by applying the DE mutation to each solution in P₁.
(4) Then, a population P₄ of size M is generated by recombining information among the solutions in P₂ and P₃, based on the DE crossover.
(5) Finally, a new population is generated by selecting good solutions from P₁, P₂, and P₄ (please see below).

Four different selection methods are considered in this work, two of which are elitist and two non-elitist. A problem arises during the selection procedure: solutions from P₄ have undergone the mutation and crossover of DE, which alter their positions but ignore their velocities, leading to an unmatched pair of positions and velocities. In this case, the velocities that these particles have inherited from P₂ may no longer be meaningful, potentially breaking down the inner workings of PSO in the next iteration. To solve this issue, we propose to re-compute the velocity vector according to the displacement of a particle resulting from the mutation and crossover operators, namely:

    v^(i) ← x^(i,4) − x^(i,2),  for i = 1, 2, …, M,    (9)

where x^(i,4) ∈ P₄ is generated from x^(i,2) ∈ P₂ using the aforementioned procedure.

A selection operator is required to select particles from P₁, P₂, and P₄ for the next generation. Note that P₃ is not considered in the selection procedure, as the solution vectors in this population were recombined and stored in P₄. We have implemented four different selection methods: two of those methods only consider population P₂, resulting from the variation operators of PSO, and population P₄, obtained from the variation operators of DE. This type of selection method is essentially non-elitist, allowing for deteriorations. Alternatively, the other two methods implement elitism by additionally taking population P₁ into account.

We use the following naming scheme for the selection methods:

    [comparison method]/[number of populations P_i considered]

Using this scheme, we can distinguish the four selection methods: pairwise/2, pairwise/3, union/2, and union/3. The "pairwise" comparison method means that the i-th members (assuming the solutions are indexed) of each considered population are compared to each other, and the best one is chosen for the next generation. The "union" method selects the best M solutions from the union of the considered populations. Here, a "2" signals the inclusion of two populations, P₂ and P₄, and a "3" indicates the further inclusion of P₁. For example, the pairwise/2 method selects the best individual from each pair of x^(i,2) and x^(i,4), while the union/3 method selects the best M individuals from P₁ ∪ P₂ ∪ P₄.
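The two comparison methods reduce to two small routines. The following is an illustrative sketch under our own naming, where solutions are plain lists and f is the objective function; pairwise/2 would pass the two populations produced by the PSO and DE variation operators, and pairwise/3 would additionally pass the parent population:

```python
def pairwise_select(pops, f):
    """'Pairwise': compare the i-th member of each considered population
    and keep the best one for the next generation."""
    return [min(members, key=f) for members in zip(*pops)]

def union_select(pops, f, M):
    """'Union': keep the best M solutions from the union of the
    considered populations."""
    return sorted((x for pop in pops for x in pop), key=f)[:M]
```

Both routines assume the considered populations are index-aligned, as required by the pairwise comparison described above.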
Algorithm 4 PSODE
    Sample P₁ = {x^(1), …, x^(M)} uniformly at random in [x_min, x_max]
    Initialize velocities V ← {0, …, 0}
    while termination criteria are not met do
        P₂ ← ∅
        for x ∈ P₁ with its corresponding velocity v ∈ V do
            v' ← velocity-update(x, v)
            x' ← x + v'
            Evaluate x' on f
            P₂ ← P₂ ∪ {x'}
        end for
        P₃ ← ∅
        for x ∈ P₁ do
            x' ← de-mutation(x)
            P₃ ← P₃ ∪ {x'}
        end for
        P₄ ← ∅, V ← ∅
        for i = 1 → M do
            x' ← de-crossover(x^(i,2), x^(i,3))
            Calculate v' for x' using Eq. (9)
            Evaluate x' on f
            P₄ ← P₄ ∪ {x'}
            V ← V ∪ {v'}
        end for
        P₁ ← selection(P₁, P₂, P₄)
    end while

6 EXPERIMENTAL SETUP

A software framework has been implemented in
C++ to generate PSO, DE, and PSODE instances from all aforementioned algorithmic modules, e.g., topologies and mutation strategies. This framework is tested with IOHprofiler on the 24 functions from BBOB/COCO [7], which are organized in five function groups: 1) separable functions, 2) functions with low or moderate conditioning, 3) unimodal functions with high conditioning, 4) multi-modal functions with adequate global structure, and 5) multi-modal functions with weak global structure.

In the experiments conducted, a PSODE instance is considered as a combination of five modules: velocity update strategy, population topology, mutation method, crossover method, and selection method. Combining each option for each of these five modules, we obtain a total of 5 (topologies) × 4 (velocity update strategies) × 5 (mutation methods) × 2 (crossover methods) × 4 (selection methods) = 800 different PSODE instances. By combining the 4 velocity update strategies and 5 topologies, we obtain 4 × 5 = 20 PSO instances, and similarly we obtain 5 (mutation methods) × 2 (crossover methods) = 10 DE instances.
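The instance counts can be verified by enumerating the module codes (the abbreviations listed in Table 1); a small Python check:

```python
from itertools import product

velocity = ["B", "F", "I", "D"]              # 4 velocity update strategies
topology = ["L", "G", "N", "I", "M"]         # 5 topologies
mutation = ["B1", "B2", "T1", "PB", "O1"]    # 5 mutation methods
crossover = ["B", "E"]                       # 2 crossover operators
selection = ["U2", "U3", "P2", "P3"]         # 4 selection methods

# every combination of the five modules yields one PSODE instance
hybrids = list(product(velocity, topology, mutation, crossover, selection))
psos = list(product(velocity, topology))     # PSO: velocity x topology
des = list(product(mutation, crossover))     # DE: mutation x crossover
assert (len(hybrids), len(psos), len(des)) == (800, 20, 10)
```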
Naming Convention of Algorithm Instances.
As each PSO, DE, and hybrid instance can be specified by its composing modules, it is named using the abbreviations of those modules. Hybrid instances are named as follows:
H [velocity strategy] [topology] [mutation] [crossover] [selection]
PSO instances are named as:
P [velocity strategy] [topology]
And DE instances are named as:
D [mutation] [crossover]
Options of all modules are listed in Table 1.
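A small helper illustrates the scheme, assuming the underscore-separated codes used in the result tables (e.g., H_I_L_T1_B_P3); the function name is ours:

```python
def hybrid_name(vel, topo, mut, xo, sel):
    """Build a hybrid-instance name from its module codes (Table 1),
    e.g. velocity 'I', topology 'L', mutation 'T1', crossover 'B',
    selection 'P3' -> 'H_I_L_T1_B_P3'."""
    return "_".join(["H", vel, topo, mut, xo, sel])
```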
Experiment Setup.
The following parameters are used throughout the experiments:

• Function evaluation budget: 10⁴ · n.
• Population (swarm) size: 5n is used for all algorithm instances, due to the relatively consistent performance that instances show across different function groups and dimensionalities when using this value.
• Hyperparameters in PSO: in Eq. (2) and (3), φ₁ = φ₂ = 1.49618; in Eq. (4), φ = 4.1. The fixed inertia weight ω is set to 0.7298, while the decreasing inertia weight is linearly decreased from 0.9 to 0.4. For the target-to-pbest/1 mutation scheme, a value of p = 0.1 is used.
• Hyperparameters in DE: F and Cr are managed by the JADE self-adaptation scheme.
• Number of independent runs per function: 30. Note that only one function instance (instance "1") is used for each function.
• Performance measure: the expected running time (ERT) [17], which is the total number of function evaluations an algorithm is expected to use to reach a given target function value for the first time. ERT is defined as the total number of function evaluations taken over all runs, divided by the number of runs in which f_target was reached; note that f_target might not be reached in every run.

Algorithms are ranked by first ranking them with respect to ERT on each target f_opt + 10^k, k ∈ {2, …, −8}, of every benchmark function, and then taking the average rank across all targets per function. Finally, the presented rank is obtained by taking the average rank over all 24 test functions. This is done for both dimensionalities. A dataset containing the running time of each independent run and the ERTs of each algorithm instance, together with the supporting scripts, is available at [1].

[velocity strategy]                  [mutation]
B – Bare-Bones PSO                   B1 – DE/best/1
F – Fully-informed PSO (FIPS)        B2 – DE/best/2
I – Inertia weight                   T1 – DE/target-to-best/1
D – Decreasing inertia weight        PB – DE/target-to-pbest/1
                                     O1 – 2-Opt/1
[crossover]                          [selection]
B – Binomial crossover               U2 – Union/2
E – Exponential crossover            U3 – Union/3
[topology]                           P2 – Pairwise/2
L – lbest (ring)                     P3 – Pairwise/3
G – gbest (fully connected)
N – Von Neumann
I – Increasing connectivity
M – Dynamic multi-swarm

Table 1: Module options and codings of the velocity strategy, topology, mutation, crossover, and selection modules.
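The ERT computation described above can be sketched as follows (a minimal sketch of the standard definition [17]; the argument names are ours):

```python
def ert(run_lengths, successes):
    """Expected running time: total function evaluations spent across all
    runs, divided by the number of runs that reached the target; returns
    infinity when no run succeeded."""
    n_succ = sum(successes)
    return float("inf") if n_succ == 0 else sum(run_lengths) / n_succ
```

For example, three runs costing 100, 200, and 300 evaluations, of which the first two reached the target, give an ERT of 600/2 = 300 evaluations.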
7 RESULTS AND DISCUSSION

Figure 1 depicts the Empirical Cumulative Distribution Functions (ECDFs) of the top-5 highest-ranked algorithm instances in both 5-D and 20-D. Due to overlap, only 8 algorithms are shown. Tables 2 and 3 show the Expected Running Times of the 10 highest-ranked instances, and the 10 instances ranked in the middle, in 5-D and 20-D, respectively. ERT values are normalized using the corresponding ERT values of the state-of-the-art Covariance Matrix Adaptation Evolution Strategy (CMA-ES).

Though many more PSODE instances were tested, DE instances generally showed the best performance in both 5-D and 20-D. All PSO instances were outperformed by DE and by many PSODE instances. This is no complete surprise, as several studies (e.g., [9, 24]) demonstrated the relative superiority of DE over PSO. Looking at the ranked algorithm instances, it is clear that some modules are more successful than others. The (decreasing) inertia weight velocity update strategies are dominant among the top-performing algorithms, as are pairwise/3 selection and binomial crossover. Target-to-pbest/1 mutation is most successful in 5-D, while target-to-best/1 seems a better choice in 20-D. This is surprising, as one may expect the less greedy target-to-pbest/1 mutation to be more beneficial in higher-dimensional search spaces, where it is increasingly difficult to avoid getting stuck in local optima.

The best choice of selection method is convincingly pairwise/3. This seems to be one of the most crucial modules for the PSODE algorithm, as most instances with any other selection method show considerably worse performance. This seemingly high importance of an elitist strategy suggests that the algorithm's convergence with non-elitist selection is too slow, which could be due to the application of two different search strategies. The instances H_I_*_PB_B_P3 and H_I_*_T1_B_P3 appear to be the most competitive PSODE instances, with the topology choice having little influence on the observed performance. The most highly ranked DE instances are D_T1_B and D_PB_B, in both dimensionalities. Binomial crossover seems superior to its exponential counterpart, especially in 20 dimensions.

Interestingly, the PSODE and PSO algorithms "prefer" different module options. As an example, the Fully Informed Particle Swarm works well in PSO instances, but PSODE instances perform better with the (decreasing) inertia weight. Bare-Bones PSO showed the overall poorest performance of the four velocity update strategies. Notable is the large performance difference between the worst and best generated algorithm instances. Some combinations of modules, as is to be expected when arbitrarily combining operators, show very poor performance, failing to solve even the most trivial problems. This stresses the importance of proper module selection.
8 CONCLUSIONS AND FUTURE WORK

We implement an extensible and modular hybridization of PSO and DE, called PSODE, in which a large number of variants of both PSO and DE are incorporated as module options. Interestingly, a vast number of unseen swarm algorithms can easily be instantiated from this hybridization, paving the way for designing and selecting appropriate swarm algorithms for specific optimization tasks. In this work, we investigate, on the 24 benchmark functions from BBOB, 20 PSO variants, 10 DE variants, and 800 PSODE instances resulting from combining the variants of PSO and DE, where we identify some promising hybrid algorithms that surpass PSO but fail to outperform the best DE variants on subsets of the BBOB problems. Moreover, we obtained insights into suitable combinations of algorithmic modules. Specifically, the efficacy of the target-to-(p)best mutation operators, the (decreasing) inertia weight velocity update strategies, and binomial crossover was demonstrated. On the other hand, some inefficient operators, such as Bare-Bones PSO, were identified. The neighborhood topology appeared to have the least effect on the observed performance of the hybrid algorithm.

Future work lies in extending the hybridization framework. Firstly, we are planning to incorporate as many state-of-the-art PSO and DE variants as possible. Secondly, we shall explore alternative ways of combining PSO and DE. Lastly, it is worthwhile to consider the problem of selecting a suitable hybrid algorithm for an unseen optimization problem, taking the approach of automated algorithm selection.

ACKNOWLEDGMENTS
Hao Wang acknowledges the support from the Paris Île-de-France Region.
REFERENCES
[1] Rick Boks, Hao Wang, and Thomas Bäck. 2020. Experimental Results for the study "A Modular Hybridization of Particle Swarm Optimization and Differential Evolution". (May 2020). https://doi.org/10.5281/zenodo.3814197
[2] Cheng-Wen Chiang, Wei-Ping Lee, and Jia-Sheng Heh. 2010. A 2-Opt based differential evolution for global optimization. Applied Soft Computing 10, 4 (2010), 1200–1207. https://doi.org/10.1016/j.asoc.2010.05.012
[3] M. Clerc and J. Kennedy. 2002. The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation 6, 1 (Feb 2002), 58–73. https://doi.org/10.1109/4235.985692
[4] Carola Doerr, Furong Ye, Naama Horesh, Hao Wang, Ofer M. Shir, and Thomas Bäck. 2019. Benchmarking discrete optimization heuristics with IOHprofiler. Applied Soft Computing (2019), 106027.
[5] R. Eberhart and J. Kennedy. 1995. A New Optimizer Using Particle Swarm Theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science. 39–43.
[6] A. Engelbrecht. 2012. Particle swarm optimization: Velocity initialization. In 2012 IEEE Congress on Evolutionary Computation. 1–8. https://doi.org/10.1109/CEC.2012.6256112
[7] N. Hansen, A. Auger, O. Mersmann, T. Tušar, and D. Brockhoff. 2016. COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting. ArXiv e-prints arXiv:1603.08785 (2016).
[8] Tim Hendtlass. 2001. A Combined Swarm Differential Evolution Algorithm for Optimization Problems. In Proceedings of the 14th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems: Engineering of Intelligent Systems (IEA/AIE '01). Springer-Verlag, Berlin, Heidelberg, 11–18.
[9] Mahmud Iwan, R. Akmeliawati, Tarig Faisal, and Hayder M.A.A. Al-Assadi. 2012. Performance Comparison of Differential Evolution and Particle Swarm Optimization in Constrained Optimization. Procedia Engineering 41 (2012), 1323–1328. https://doi.org/10.1016/j.proeng.2012.07.317
[10] Jingqiao Zhang and A. C. Sanderson. 2007. JADE: Self-adaptive differential evolution with fast and reliable convergence performance. In 2007 IEEE Congress on Evolutionary Computation. 2251–2258. https://doi.org/10.1109/CEC.2007.4424751
[11] J. Kennedy. 2003. Bare bones particle swarms. In Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS '03). 80–87. https://doi.org/10.1109/SIS.2003.1202251
[12] J. Kennedy and R. Mendes. 2002. Population structure and particle swarm performance. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02), Vol. 2. 1671–1676. https://doi.org/10.1109/CEC.2002.1004493
[13] J. J. Liang and P. N. Suganthan. 2005. Dynamic multi-swarm particle swarm optimizer. In Proceedings of the 2005 IEEE Swarm Intelligence Symposium (SIS 2005).
[14] R. Mendes, J. Kennedy, and J. Neves. 2004. The fully informed particle swarm: simpler, maybe better. IEEE Transactions on Evolutionary Computation 8, 3 (June 2004), 204–210. https://doi.org/10.1109/TEVC.2004.826074
[15] M. G. H. Omran, A. P. Engelbrecht, and A. Salman. 2007. Differential Evolution Based Particle Swarm Optimization. In 2007 IEEE Swarm Intelligence Symposium. 112–119. https://doi.org/10.1109/SIS.2007.368034
[16] M. Pant, R. Thangaraj, C. Grosan, and A. Abraham. 2008. Hybrid Differential Evolution - Particle Swarm Optimization Algorithm for Solving Global Optimization Problems. In Third International Conference on Digital Information Management (ICDIM 2008). 18–24. https://doi.org/10.1109/ICDIM.2008.4746766
[17] K. V. Price. 1997. Differential evolution vs. the functions of the 2nd ICEO. In Proceedings of 1997 IEEE International Conference on Evolutionary Computation (ICEC '97). 153–157. https://doi.org/10.1109/ICEC.1997.592287
[18] Y. Shi and R. Eberhart. 1998. A modified particle swarm optimizer. In 1998 IEEE International Conference on Evolutionary Computation. 69–73. https://doi.org/10.1109/ICEC.1998.699146
[19] Rainer Storn and Kenneth Price. 1995. Differential Evolution: A Simple and Efficient Adaptive Scheme for Global Optimization Over Continuous Spaces. Journal of Global Optimization 23 (1995).
[20] P. N. Suganthan. 1999. Particle swarm optimiser with neighbourhood operator. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC 99), Vol. 3. 1958–1962. https://doi.org/10.1109/CEC.1999.785514
[21] Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. 2013. Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13). Association for Computing Machinery, New York, NY, USA, 847–855. https://doi.org/10.1145/2487575.2487629
[22] S. van Rijn, H. Wang, M. van Leeuwen, and T. Bäck. 2016. Evolving the structure of Evolution Strategies. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI). 1–8. https://doi.org/10.1109/SSCI.2016.7850138
[23] Sander van Rijn, Hao Wang, Bas van Stein, and Thomas Bäck. 2017. Algorithm Configuration Data Mining for CMA Evolution Strategies. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17). ACM, New York, NY, USA, 737–744. https://doi.org/10.1145/3071178.3071205
[24] J. Vesterstrom and R. Thomsen. 2004. A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In Proceedings of the 2004 Congress on Evolutionary Computation (CEC 2004), Vol. 2. 1980–1987. https://doi.org/10.1109/CEC.2004.1331139
[25] Wen-Jun Zhang and Xiao-Feng Xie. 2003. DEPSO: hybrid particle swarm with differential evolution operator. In SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics, Vol. 4. 3816–3821. https://doi.org/10.1109/ICSMC.2003.1244483
Modular Hybridization of PSO and DE — GECCO '20 Companion, July 8–12, 2020, Cancún, Mexico
[Figure 1: ECDF plots, one panel per BBOB function group, for 5D and 20D. X-axis: runtime / dimension; y-axis: proportion of (function, target) pairs. Legend: DE and Hybrid configurations D_T1_B, D_PB_B, D_PB_E, D_T1_E, D_O1_B, H_I_I_T1_B_P3, H_I_M_T1_B_P3, H_I_L_T1_B_P3.]
Figure 1: Empirical Cumulative Distribution Functions (ECDFs) of the top-5 ranked algorithms in both 5D and 20D for each function group defined in BBOB [7]. ECDFs are aggregated over the target values and the ranking is in accordance with Tables 2 and 3. Note that only eight algorithms appear here since two algorithms are simultaneously among the top five in both 5D and 20D.
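As a minimal illustration of how an ECDF of runtimes is aggregated over (function, target) pairs, the sketch below computes, for each budget, the proportion of pairs solved within that budget. This is our own reading of the standard BBOB-style ECDF; the function name and toy data are not from the paper's code, and unsolved pairs are assumed to be recorded as infinity.

```python
import numpy as np

def ecdf(runtimes, budgets):
    """Proportion of (function, target) pairs solved within each budget.

    `runtimes` holds, per pair, the number of evaluations needed to reach
    the target (np.inf if the target was never reached)."""
    runtimes = np.asarray(runtimes, dtype=float)
    return [float(np.mean(runtimes <= b)) for b in budgets]

# Toy data: 4 (function, target) pairs; the last one was never solved.
print(ecdf([10, 100, 1000, np.inf], [10, 100, 1000, 10000]))
# -> [0.25, 0.5, 0.75, 0.75]
```

The curve is non-decreasing in the budget and plateaus below 1.0 whenever some pairs are never solved, which is exactly the shape seen in such benchmark plots.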
| Rank | Algorithm Instance | F1 | F2 | F6 | F8 | F11 | F12 | F17 | F18 | F21 |
|------|--------------------|----|----|----|----|-----|-----|-----|-----|-----|
| ref. | CMA-ES | 658.933 | 2138.400 | 1653.667 | 2834.714 | 2207.400 | 5456.867 | 9248.600 | 13745.867 | 74140.538 |
| 1 | D_T1_B | 2.472 | 1.175 | 2.261 | 3.177 | 1.640 | 2.362 | 1.907 | 9.397 | 0.592 |
| 2 | D_PB_B | 2.546 | 1.213 | 2.321 | 4.031 | 1.643 | 2.580 | 1.258 | 5.324 | 1.072 |
| 3 | D_PB_E | 3.176 | 1.483 | 3.635 | 5.152 | 1.700 | 2.750 | 1.584 | 4.350 | 0.305 |
| 4 | D_T1_E | 3.060 | 1.477 | 3.583 | 3.670 | 1.660 | 2.281 | 2.036 | 9.112 | 0.352 |
| 5 | D_O1_B | 3.152 | 1.466 | 3.717 | 4.155 | 6.360 | 8.818 | 1.445 | 8.405 | 0.383 |
| 6 | H_I_I_PB_E_P3 | 3.911 | 1.830 | 3.817 | 3.724 | 2.951 | 3.055 | 3.301 | 3.021 | 0.519 |
| 7 | H_I_I_PB_B_P3 | 3.685 | 1.694 | 3.117 | 3.115 | 2.912 | 3.047 | 2.102 | 3.222 | 1.063 |
| 8 | H_I_G_PB_B_P3 | 3.138 | 1.473 | 2.813 | 5.656 | 2.968 | 3.099 | 4.684 | 3.507 | 2.251 |
| 9 | H_I_I_T1_B_P3 | 3.599 | 1.700 | 3.155 | 5.106 | 2.837 | 2.670 | 2.914 | 3.975 | 0.727 |
| 10 | H_I_N_PB_B_P3 | 3.480 | 1.650 | 3.100 | 5.061 | 2.852 | 2.932 | 2.453 | 3.213 | 1.064 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 411 | H_I_N_PB_B_P2 | 4.761 | 2.268 | 4.744 | 12.933 | ∞ | ∞ | ∞ | ∞ | ∞ |

Table 2: On 5D, the normalized Expected Running Time (ERT) values of the top-10 ranked algorithms and one algorithm ranked in the middle (rank 411) among all algorithms. The ranking is first determined on each test problem with respect to ERT and then averaged over all test problems. For the reported ERT values, the target f_opt + 10^−1 is used. All ERT values are normalized per problem with respect to a reference CMA-ES, shown in the first row of the table.
| Rank | Algorithm Instance | F1 | F2 | F6 | F8 | F11 | F12 | F17 | F18 | F21 |
|------|--------------------|----|----|----|----|-----|-----|-----|-----|-----|
| ref. | CMA-ES | 830.800 | 16498.533 | 4018.600 | 19140.467 | 12212.267 | 15316.733 | 5846.400 | 17472.333 | 801759 |
| 1 | D_T1_B | 7.377 | 0.864 | 5.912 | 3.702 | 2.678 | 4.699 | 3.144 | 3.604 | 0.385 |
| 2 | D_PB_B | 7.731 | 0.901 | 6.884 | 6.766 | 3.833 | 5.999 | 3.158 | 1.719 | 0.193 |
| 3 | H_I_I_T1_B_P3 | 10.988 | 1.195 | 7.894 | 4.153 | 6.596 | 7.656 | 3.988 | 3.081 | 0.298 |
| 4 | H_I_M_T1_B_P3 | 12.621 | 1.434 | 9.714 | 5.296 | 8.389 | 8.152 | 4.979 | 3.138 | 0.186 |
| 5 | H_I_L_T1_B_P3 | 11.402 | 1.299 | 9.271 | 5.146 | 8.170 | 7.422 | 4.771 | 3.406 | 0.341 |
| 6 | H_I_N_T1_B_P3 | 10.641 | 1.202 | 8.218 | 4.705 | 7.253 | 7.928 | 4.325 | 2.741 | 0.338 |
| 7 | H_D_M_T1_B_P3 | 12.865 | 1.476 | 10.100 | 6.036 | 8.119 | 8.768 | 5.345 | 3.450 | 0.354 |
| 8 | D_B2_B | 7.983 | 0.885 | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |

Table 3: On 20D, the normalized Expected Running Time (ERT) values of the top-ranked algorithms and algorithms ranked in the middle among all algorithms. The ranking is first determined on each test problem with respect to ERT and then averaged over all test problems. For the reported ERT values, the target f_opt + 10^−1 is used.
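The procedure described in the captions of Tables 2 and 3 can be sketched as follows: compute ERT per problem, normalize by the reference CMA-ES ERT, and rank per problem before averaging. This is our own minimal reading of the standard definitions; the function names and toy numbers are illustrative, not taken from the paper's code.

```python
from collections import defaultdict

def ert(run_evals, run_success):
    """Expected Running Time: total evaluations spent over all runs,
    divided by the number of runs that reached the target (inf if none)."""
    n_success = sum(run_success)
    return sum(run_evals) / n_success if n_success else float('inf')

def average_ranks(per_problem_erts):
    """Rank algorithms by ERT on each problem, then average the ranks."""
    ranks = defaultdict(list)
    for erts in per_problem_erts:  # erts: {algorithm: ERT on this problem}
        for rank, alg in enumerate(sorted(erts, key=erts.get), start=1):
            ranks[alg].append(rank)
    return {alg: sum(r) / len(r) for alg, r in ranks.items()}

# Three runs of 100, 200, and 300 evaluations; only the first two succeed.
print(ert([100, 200, 300], [True, True, False]))  # -> 300.0

# Normalized ERT, as in the tables: divide by the reference CMA-ES ERT.
print(ert([100, 200, 300], [True, True, False]) / 150.0)  # -> 2.0
```

Normalizing per problem keeps the per-problem difficulty out of the comparison, and averaging per-problem ranks (rather than raw ERTs) prevents a single hard problem from dominating the overall ordering.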