A Framework to Handle Multi-modal Multi-objective Optimization in Decomposition-based Evolutionary Algorithms
Ryoji Tanabe,
Member, IEEE, and Hisao Ishibuchi,
Fellow, IEEE
Abstract—Multi-modal multi-objective optimization aims to locate as many (almost) equivalent Pareto optimal solutions as possible. While decomposition-based evolutionary algorithms perform well for multi-objective optimization, they are likely to perform poorly for multi-modal multi-objective optimization because they lack mechanisms to maintain diversity in the solution space. To address this issue, this paper proposes a framework to improve the performance of decomposition-based evolutionary algorithms for multi-modal multi-objective optimization. Our framework is based on three operations: assignment, deletion, and addition. One or more individuals can be assigned to the same subproblem to handle multiple equivalent solutions. In each iteration, a child is assigned to a subproblem based on its objective vector, i.e., its location in the objective space. The child is compared with its neighbors in the solution space that are assigned to the same subproblem. The performance of improved versions of six decomposition-based evolutionary algorithms obtained by applying our framework is evaluated on various test problems with different numbers of objectives, decision variables, and equivalent Pareto optimal solution sets. Results show that the improved versions clearly outperform their original algorithms.
Index Terms—Multi-modal multi-objective optimization, decomposition-based evolutionary algorithms, reference vector-based evolutionary algorithms, solution space diversity
I. INTRODUCTION

MULTI-OBJECTIVE optimization problems (MOPs) appear in real-world applications. Since no solution x can simultaneously minimize multiple objective functions in general, the goal of MOPs is usually to find a Pareto optimal solution preferred by a decision maker [1]. When the decision maker's preference information is unavailable a priori, an "a posteriori" decision making is conducted. The decision maker selects the final solution x_final from a set of solutions that approximates the Pareto front in the objective space.

An evolutionary multi-objective optimization algorithm (EMOA) is frequently used for the "a posteriori" decision making. Since EMOAs are population-based optimizers, they are likely to find a set of solutions in a single run. A number of EMOAs have been proposed in the literature. They can be classified into the following three categories: dominance-based EMOAs (e.g., NSGA-II [2]), indicator-based EMOAs (e.g.,

R. Tanabe and H. Ishibuchi are with the Shenzhen Key Laboratory of Computational Intelligence, University Key Laboratory of Evolving Intelligent Systems of Guangdong Province, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China. E-mail: [email protected], [email protected]. (Corresponding author: Hisao Ishibuchi)
Fig. 1: Illustration of a situation where three solutions are almost the same in the objective space but dissimilar in the solution space. (The left panel shows the solution space; the right panel shows the objective space.)
IBEA [3]), and decomposition-based EMOAs (e.g., MOEA/D [4] and NSGA-III [5]). In particular, decomposition-based EMOAs have shown promising performance [6].

In the EMO community, it has been implicitly assumed that the decision maker is interested only in the distribution of solutions in the objective space. Thus, the distribution of solutions in the solution space has not received much attention. However, after the decision maker has selected the final solution x_final based on its objective vector f(x_final), she/he may want to examine other dissimilar solutions with equivalent quality or slightly inferior quality [7]–[9].

Fig. 1 shows a situation where the four solutions x_a, x_b, x_c, and x_d are far from each other in the solution space but close to each other in the objective space. Although x_d is a dominated solution, it may be acceptable for the decision maker. This is because the difference between f(x_a), f(x_b), f(x_c), and f(x_d) is small enough. If the decision maker has obtained multiple dissimilar solutions with similar quality, she/he can select x_final according to her/his preference in the solution space. For example, suppose that x_a in Fig. 1 becomes unavailable due to some accident (e.g., material shortages and mechanical failures) after the decision maker has selected x_a as x_final. In such a case, she/he can substitute one of x_b, x_c, and x_d for x_a. Schütze et al. give a more practical example on space mission design problems [9]. Sebag et al. also demonstrate the importance of multiple equivalent solutions for the decision maker on functional brain imaging problems [7]. Previous studies on other real-world problems with multiple equivalent solutions include diesel engine design problems [10], distillation plant layout problems [11], and rocket engine design problems [12].

A multi-modal multi-objective problem (MMOP) is to locate as many (almost) equivalent Pareto optimal solutions as possible [7], [13].
On the one hand, it is sufficient for MOPs to find one of x_a, x_b, and x_c in Fig. 1 since their quality is almost the same. On the other hand, all of x_a, x_b, x_c, and x_d should be found for MMOPs. Since multiple equivalent solutions play crucial roles in the reliable decision making process as explained above, MMOPs are important in practice. Evolutionary multi-modal multi-objective optimization algorithms (EMMAs) are specially designed optimizers for MMOPs. Unlike EMOAs, EMMAs have mechanisms to handle multiple equivalent solutions. Representative EMMAs include P_{Q,ε}-MOEA [9], Omni-optimizer [13], DIOP [14], Niching-CMA [15], and MO_Ring_PSO_SCD [16].

In our earlier study [17], we proposed MOEA/D-AD, which is a specially designed MOEA/D for MMOPs. MOEA/D-AD can assign multiple individuals to each subproblem in order to handle equivalent solutions. In each iteration, a child u is assigned to the subproblem whose weight vector is closest to its objective vector f(u) in terms of the perpendicular distance. Then, u is compared to the individuals assigned to the same subproblem based on the weighted Tchebycheff function and the contribution to the solution space diversity.

In this paper, we extend our earlier study [17] and propose a general framework (called ADA) to handle multi-modal multi-objective optimization in a variety of decomposition-based EMOAs (including reference vector-based algorithms such as NSGA-III [5] and θ-DEA [18]). The proposed ADA framework uses assignment, deletion, and addition operations. While decomposition-based EMOAs work well for multi-objective optimization, they are likely to perform poorly for multi-modal multi-objective optimization. This is because most of them do not have mechanisms to maintain the solution space diversity. In this paper, we show that efficient EMMAs can be realized by incorporating a solution space diversity maintenance mechanism into decomposition-based EMOAs.
The proposed ADA framework facilitates the solution space diversity maintenance of existing EMOAs. We incorporate ADA into the following six representative decomposition-based EMOAs: MOEA/D-AGR [19], MOEA/D-DU [20], eMOEA/D [21], NSGA-III [5], θ-DEA [18], and RVEA [22]. We examine the ability of the ADA versions of those EMOAs (MOEA/D-AGR-ADA, MOEA/D-DU-ADA, eMOEA/D-ADA, NSGA-III-ADA, θ-DEA-ADA, and RVEA-ADA) to locate multiple equivalent Pareto optimal solutions on various test problems. We also analyze the behavior of those ADA versions.

The differences between this paper and our previous study [17] are as follows:

1) We propose ADA. While MOEA/D-AD [17] is a single algorithm for MMOPs, ADA is a framework to improve the performance of existing decomposition-based EMOAs for MMOPs. In contrast to MOEA/D-AD, each configuration (e.g., the assignment operation) in ADA depends on the EMOA to be combined.
2) We demonstrate that ADA can be combined with the six decomposition-based EMOAs, including so-called reference vector-based EMOAs. MOEA/D-AD does not have such flexibility.
3) We introduce a practical two-phase decision making method based on a solution set found by an ADA-based algorithm. We also introduce two post-processing methods for benchmarking an ADA-based algorithm.
4) While we used only two-objective and two-variable problems in [17], we use problems with various numbers of objectives M, design variables D, and equivalent Pareto optimal solution subsets O to investigate the scalability of EMMAs. Such a scale-up study has not been performed in the literature. We also use problems with distance-related variables [23], [24].
5) We compare the six ADA-based algorithms to three state-of-the-art EMMAs, including TriMOEA-TA&R [24], which was proposed after the publication of [17].

The rest of this paper is organized as follows. Section II provides some preliminaries of this paper. Section III introduces ADA and explains how to incorporate it into decomposition-based EMOAs.
Section IV describes the experimental setup. Section V examines the performance of the six ADA variants. Section VI concludes this paper.

II. PRELIMINARIES
First, Subsections II-A and II-B give definitions of MOPs and MMOPs, respectively. Then, Subsections II-C and II-D describe decomposition-based EMOAs and reference vector-based EMOAs, respectively. Subsections II-C and II-D mainly explain components of the six existing EMOAs used in ADA. Algorithms S.1–S.6 in the supplementary file provide the overall procedures of the six EMOAs. Section III introduces how each component is used in ADA.
A. Definition of MOPs
A continuous MOP is to find a solution x ∈ S ⊆ R^D that minimizes a given objective function vector f : S → R^M, x ↦ f(x). S = ∏_{j=1}^{D} [x_j^min, x_j^max] is the D-dimensional solution space, where x_j^min ≤ x_j ≤ x_j^max for each index j ∈ {1, ..., D}. R^M is the M-dimensional objective space.

A solution x_1 is said to dominate x_2 if and only if f_i(x_1) ≤ f_i(x_2) for all i ∈ {1, ..., M} and f_i(x_1) < f_i(x_2) for at least one index i. If x* is not dominated by any other solutions, it is called a Pareto optimal solution. The set of all x* is the Pareto optimal solution set, and the set of all f(x*) is the Pareto front. The goal of MOPs for the "a posteriori" decision making is to find a non-dominated solution set that approximates the Pareto front in the objective space.

B. Definition of MMOPs
Although the term "multi-modal multi-objective optimization" was first coined in [7], [25] in 2005, the definition of MMOPs has not been explicitly given in most previous studies. We consider the following two types of MMOPs: (i) Type1-MMOP is to locate all Pareto optimal solutions. (ii) Type2-MMOP is to locate all Pareto optimal solutions and non-Pareto optimal solutions that have acceptable quality for the decision maker. Those non-Pareto optimal solutions should be far from the Pareto optimal solutions and the other non-Pareto optimal solutions in the solution space.

For example, x_a, x_b, and x_c in Fig. 1 should be found for Type1-MMOPs. In addition, the non-Pareto optimal solution x_d should be found for Type2-MMOPs if its quality f(x_d) is acceptable to the decision maker. While most existing studies (e.g., [13], [16]) assume Type1-MMOPs, only a few studies (e.g., [7], [9], [14]) address Type2-MMOPs. Although there is room for discussion, Type2-MMOPs may be more practical than Type1-MMOPs. Diverse solutions with quality similar to the final solution x_final are beneficial to the decision maker even if their quality is slightly worse than that of x_final [7], [9].

Nevertheless, we focus only on Type1-MMOPs in this paper. The main reason is the difficulty in benchmarking EMMAs on Type2-MMOPs. In contrast to Type1-MMOPs, Type2-MMOPs are loosely defined. This is because the terms "acceptable quality" and "far from" in Type2-MMOPs significantly depend on the decision maker. It is difficult to define these two key factors of Type2-MMOPs for benchmarking purposes in a fair manner. How to evaluate the performance of EMMAs for Type2-MMOPs is itself another research topic.

C. Decomposition-based EMOAs (MOEA/D-type algorithms)
Although some frameworks of decomposition-based EMOAs have been proposed in the literature, MOEA/D-type algorithms show promising performance [6]. Here, we describe MOEA/D-type algorithms. First, we explain the most basic MOEA/D [4] and its modified version called MOEA/D-DE [26]. The framework of MOEA/D-DE is used in most MOEA/D-type algorithms. Then, we describe components of three MOEA/D-type algorithms: MOEA/D-AGR [19], MOEA/D-DU [20], and eMOEA/D [21].
1) MOEA/D:
The most basic MOEA/D [4] decomposes an M-objective MOP into N single-objective subproblems using a scalarizing function g : R^M → R and a set of uniformly distributed weight vectors W = {w_1, ..., w_N}. For each i ∈ {1, ..., N}, w_i = (w_{i,1}, ..., w_{i,M})^T and ∑_{k=1}^{M} w_{i,k} = 1. The j-th individual x_j in the population P is assigned to the j-th subproblem with w_j. Thus, the population size µ is always equal to the number of weight vectors N. The j-th subproblem also has its neighborhood index list B_j = {b_{j,1}, ..., b_{j,S}}, which consists of the indices of the S closest weight vectors to w_j in the weight vector space.

After the initialization of P and W, the following steps are repeatedly performed until a termination condition is satisfied. In each iteration t, for the j-th subproblem (j ∈ {1, ..., N}), an index list T is set to B_j. Two indices a and b of the parent individuals x_a and x_b are randomly selected from T. The child u is generated by applying variation operators to x_a and x_b. The SBX crossover and the polynomial mutation [27] are used in the original MOEA/D. After u has been generated, the replacement selection is applied to each neighborhood subproblem k ∈ T. The current solution x_k of the k-th subproblem is replaced with u if g(u | w_k) ≤ g(x_k | w_k).
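The neighborhood index lists B_j above can be precomputed from the weight vectors alone. A minimal Python sketch (the function name and the use of squared Euclidean distance are our own choices, not from [4]; indices are 0-based here for convenience):

```python
def neighbor_lists(W, S):
    """For each weight vector w_j, return the indices of the S closest
    weight vectors (including j itself), i.e., MOEA/D's list B_j."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [sorted(range(len(W)), key=lambda k: sq_dist(W[j], W[k]))[:S]
            for j in range(len(W))]
```

For example, with W = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)] and S = 2, the list for the first subproblem is [0, 1]: each subproblem's nearest neighbor is itself, followed by the adjacent weight vector.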
2) MOEA/D-DE:
The differences between MOEA/D and MOEA/D-DE are threefold. First, the differential evolution (DE) operator [28] is used instead of the SBX crossover [27]. Second, two types of index lists (B_j and {1, ..., N}) are used. The index list T is set to B_j with a probability of δ ∈ [0, 1] and to {1, ..., N} with a probability of 1 − δ. Thus, T can be set to all individual indices. Third, the maximum number of individuals replaced by the child u is restricted to n_rep.
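The second difference (the probabilistic choice of the mating index list) can be sketched as follows; this is a hypothetical helper assuming Python's random module, with 0-based indices:

```python
import random

def select_index_list(B_j, N, delta):
    """MOEA/D-DE mating pool: the neighborhood list B_j with probability
    delta, and the whole index set {0, ..., N-1} otherwise."""
    if random.random() < delta:
        return list(B_j)
    return list(range(N))
```

With δ close to 1, mating is mostly restricted to the neighborhood, which favors exploitation; δ = 0 always uses the whole population.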
3) MOEA/D-AGR:
The difference between MOEA/D-AGR and MOEA/D-DE lies in the replacement method. In MOEA/D-AGR, the indices of the subproblems to be updated by a newly generated solution u are set to the K neighborhood indices of the j-th subproblem in the weight vector space, where j is the index of the subproblem whose scalarizing function value g(u | w_j) is the best among all N subproblems:

j = argmin_{k ∈ {1,...,N}} { g(u | w_k) }.   (1)

The weighted Tchebycheff function g^tch is used in (1):

g^tch(x | w) = max_{i ∈ {1,...,M}} { w_i |f_i(x) − z_i*| },   (2)

where z* = (z_1*, ..., z_M*)^T is the ideal point. Since finding the true ideal point is difficult in general, its approximation is used in (2). The i-th element of the approximation of z* is the minimum value of the i-th objective found during the search process.

The replacement neighborhood size K plays a crucial role in balancing exploration and exploitation in MOEA/D-AGR, in a manner similar to S in MOEA/D. A large K value encourages exploitation. In MOEA/D-AGR, the K value deterministically increases with the number of iterations. The results presented in [19] show that a scheduling method based on the sigmoid function is suitable for MOEA/D-AGR.
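Equation (2) is straightforward to implement. A minimal sketch (for minimization, with a precomputed approximation of the ideal point passed in as `z_star`):

```python
def g_tch(f, w, z_star):
    """Weighted Tchebycheff function g^tch(x | w) of Eq. (2),
    evaluated on the objective vector f = f(x)."""
    return max(wi * abs(fi - zi) for wi, fi, zi in zip(w, f, z_star))
```

For instance, with f = (2, 4), w = (0.5, 0.5), and z* = (0, 0), the per-objective terms are 1 and 2, so g^tch = 2.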
4) MOEA/D-DU:
In contrast to MOEA/D-AGR, the selection of the subproblems to be updated is based on distances in the objective and weight vector spaces in MOEA/D-DU. In the normalized objective space, the perpendicular distance between the normalized objective vector f'(u) and w_j is calculated for each j ∈ {1, ..., N}. The replacement is performed for the K subproblems with the minimum perpendicular distances. Similar to MOEA/D-AGR, K controls the balance between exploration and exploitation.

MOEA/D-DU uses the division version of the weighted Tchebycheff function g^dtch [29]:

g^dtch(x | w) = max_{i ∈ {1,...,M}} { |f_i(x) − z_i*| / w_i },   (3)

where w_i is set to a sufficiently small positive value if w_i = 0, to avoid division by zero. It is reported in [29] that the distribution of the search directions of MOEA/D with g^dtch is more uniform than that with g^tch.
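Eq. (3) can be sketched as follows. The exact small value substituted for zero weights is implementation-dependent, so the `EPS` constant below is our assumption, not a value from [29]:

```python
EPS = 1e-6  # assumed guard value for zero weights

def g_dtch(f, w, z_star):
    """Division version of the weighted Tchebycheff function, Eq. (3)."""
    return max(abs(fi - zi) / max(wi, EPS)
               for wi, fi, zi in zip(w, f, z_star))
```

Note that, unlike g^tch, a small weight here *increases* the corresponding term, which is what makes the induced search directions more uniform.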
5) eMOEA/D:
In eMOEA/D, the replacement method is applied to the K subproblems with the minimum scalarizing function values. MOEA/D-AGR and eMOEA/D are the same regarding the use of scalarizing function values to select subproblems. The following multiplicative scalarizing function (MSF) [21] is used in eMOEA/D:

g^msf(x | w) = ( max_{i ∈ {1,...,M}} { |f_i(x) − z_i*| / w_i } )^{1+α} / ( min_{i ∈ {1,...,M}} { |f_i(x) − z_i*| / w_i } )^{α},   (4)

where α controls the size of the so-called improvement region. α plays a role similar to that of the penalty value θ of the PBI function in (7), which will be explained later. Even if f(x) is far from w_j in terms of the perpendicular distance, x can be evaluated as being superior when a sufficiently large α value is used. In (4), g^msf with α = 0 is identical to g^dtch.

According to the general rule of thumb "emphasize diversity and convergence at the early and later stages, respectively", α decreases linearly with the number of iterations t:

α = β (1 − t / t_max) ( M min_{k ∈ {1,...,M}} { w_k } ),   (5)

where t_max is the maximum number of iterations and β is a control parameter, whose recommended value is reported in [21].

D. Reference vector-based EMOAs
Representative reference vector-based EMOAs include NSGA-III [5], θ-DEA [18], RVEA [22], VaEA [30], and SPEA/R [31]. While w ∈ W is called the weight vector in decomposition-based EMOAs, it is referred to as the reference vector in reference vector-based EMOAs. Below, we explain NSGA-III, θ-DEA, and RVEA.
1) NSGA-III:
NSGA-III is an improved version of NSGA-II [2] for many-objective optimization. In each iteration t, children Q are generated by applying variation operators to randomly selected pairs of individuals from P. In the environmental selection, µ individuals for the next iteration t + 1 are selected from the union of P and Q. The primary and secondary criteria are based on the non-domination levels and the reference vector-based niching method, respectively.

Individuals in the union P ∪ Q are grouped into fronts F_1, F_2, ... according to their non-domination levels. First, P and the front index i are initialized as P = ∅ and i = 1, respectively. Then, individuals in F_i are added to P and i is incremented until |P| + |F_i| ≥ µ. After this operation, P = F_1 ∪ ... ∪ F_{l−1}, where l is the index of the last front. If |P| < µ, the other µ − |P| individuals are selected from the last front F_l using the niching method described below. At the beginning of the niching procedure, each individual x in P and F_l is assigned to the j-th subproblem with the minimum perpendicular distance between its normalized objective vector f'(x) and w_j:

j = argmin_{k ∈ {1,...,N}} { PD( f'(x), w_k ) },   (6)

where the function PD returns the perpendicular distance between its two input vectors. After the assignments of all individuals, the remaining individuals for the next iteration are selected from F_l based on the number of individuals assigned to each subproblem and their perpendicular distances.

2) θ-DEA: The environmental selection in θ-DEA is similar to that of NSGA-III. However, the secondary criterion in θ-DEA is based on the so-called θ-dominance. First, each individual in the union of P and F_l is assigned to the j-th subproblem using (6). Then, individuals assigned to each subproblem are ranked based on their θ-dominance levels. Let us assume that two individuals x and y are assigned to the same j-th subproblem.
x is said to θ-dominate y if g^pbi(x | w_j) < g^pbi(y | w_j). The PBI function g^pbi is given as:

g^pbi(x | w) = d_1 + θ d_2,   (7)
d_1 = ||(f(x) − z*)^T w|| / ||w||,   (8)
d_2 = || f(x) − ( z* + d_1 w / ||w|| ) ||,   (9)

where ||a|| denotes the Euclidean norm of a. The distance d_1 represents how close the objective vector f(x) is to the Pareto front, and d_2 is the perpendicular distance between f(x) and w. The penalty parameter θ balances the convergence (d_1) and the diversity (d_2). A very large θ value is recommended for the M weight vectors w in the axis directions (e.g., (1, 0, ..., 0)^T), and a much smaller θ value for all the other vectors w [18].
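The PBI function of Eqs. (7)–(9) can be sketched as follows. Note that d_2 is also the perpendicular distance used in (6), at least when f(x) − z* lies on the same side of the origin as w (as it does in the non-negative translated objective space):

```python
import math

def g_pbi(f, w, z_star, theta):
    """PBI function of Eq. (7): d1 measures convergence along w,
    d2 is the perpendicular distance to the line spanned by w."""
    diff = [fi - zi for fi, zi in zip(f, z_star)]
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    d1 = abs(sum(di * wi for di, wi in zip(diff, w))) / norm_w
    d2 = math.sqrt(sum((di - d1 * wi / norm_w) ** 2
                       for di, wi in zip(diff, w)))
    return d1 + theta * d2
```

For example, with f = (3, 4), z* = (0, 0), and w = (1, 0), we get d_1 = 3 and d_2 = 4, so g^pbi = 3 + 4θ.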
3) RVEA:
RVEA uses a set of unit reference vectors V = {v_1, ..., v_N} instead of a set of reference vectors W = {w_1, ..., w_N}, where v_i = w_i / ||w_i|| for each i ∈ {1, ..., N}. RVEA adaptively adjusts V based on the current P.

After children Q have been generated, the environmental selection is applied to the union of P and Q. First, for each individual x in P ∪ Q, f(x) is translated as f'(x) = f(x) − z*. Then, x in P ∪ Q is assigned to the j-th subproblem with the minimum angle between f'(x) and v_j:

j = argmin_{k ∈ {1,...,N}} { angle( f'(x), v_k ) },   (10)

where the function angle(a, b) in (10) returns the angle between the two input vectors a and b.

Then, individuals assigned to the same subproblem are compared based on their angle-penalized distance (APD) values. For each j ∈ {1, ..., N}, only the best individual with the minimum APD value survives to the next iteration. The APD value of x with the unit reference vector v is given as:

APD(x) = ( 1 + P(x, v) ) ||f'(x)||,   (11)
P(x, v) = M ( t / t_max )^α ( angle( f'(x), v ) / γ(v) ),   (12)
γ(v) = min_{s ∈ V \ {v}} { angle(v, s) },   (13)

where P(x, v) is a penalty value for x. The larger the angle between f'(x) and v is, the larger the penalty given to x. γ(v) is used to normalize the angle. t is the current number of iterations, and t_max is the maximum number of iterations. The influence of the angle-based penalty scheme increases as the search progresses. The recommended setting of α is 2 [22].

III. PROPOSED ADA FRAMEWORK
This section explains the proposed ADA framework and the six ADA versions. Unlike EMOAs for MOPs, EMMAs for MMOPs need to maintain the diversity of the population in both the objective and solution spaces. For example, Omni-optimizer [13] uses an aggregate crowding distance metric in the objective and solution spaces. While most EMMAs (e.g., [14]–[16]) aggregate the objective and solution space diversity metrics similar to Omni-optimizer, ADA uses them in a two-phase manner similar to TriMOEA-TA&R [24]. More specifically, ADA handles the objective space diversity by the assignment method of the original EMOA and the solution space diversity by a simple niching criterion.

On the one hand, a single individual is assigned to each subproblem in most decomposition-based EMOAs. Thus, the population size µ is equal to the number of weight/reference vectors N (i.e., µ = N). On the other hand, one or more individuals can be assigned to each subproblem in the ADA framework (i.e., µ ≥ N). This mechanism is to maintain multiple equivalent individuals in each subproblem. The µ value is adaptively adjusted in ADA.

Algorithm 1 shows the ADA framework. Algorithms S.7–S.12 in the supplementary file show the six ADA-based algorithms. Algorithms S.7–S.12 are almost the same as Algorithm 1; only lines 3, 11, and 16 differ between Algorithms S.7–S.12 and Algorithm 1. At the beginning of the search, µ is set to N (line 1). The population P and the weight/reference vector set W are also initialized. The i-th individual x_i in P is assigned to the i-th subproblem (lines 2–3). This operation is unnecessary for those EMOAs with a reassignment procedure of individuals to subproblems, such as RVEA. After the child u has been generated by the reproduction operations (lines 6–8), the population P is updated using the assignment (line 10), deletion (lines 16–17), and addition (lines 18–19) operations. The assignment and deletion operations differ depending on the EMOA to be combined with the ADA framework.
After the normalization of the µ + 1 objective vectors (line 9), u is assigned to the j-th subproblem (line 10). X denotes the set of individuals that have been assigned to the j-th subproblem and are in the neighborhood of u in the solution space (line 11). Thus, ADA requires a neighborhood criterion in the solution space. X is explained later in detail using Fig. 2. Two Boolean variables b_winner and b_explorer (line 12) are used in the addition operation (lines 18–19). u enters P if it satisfies either of the following two conditions. One is that there is no individual in X (lines 13–14). The other is that u is better than at least one individual in X in the deletion operation (lines 15–17).

The following Subsections (III-A to III-F) explain each step of ADA in detail. Subsection III-G presents an effective decision making method based on a solution set found by the proposed approach. Apart from the decision making, Subsection III-H introduces two post-processing methods for benchmarking. Subsection III-I discusses the applicability of ADA. Subsection III-J discusses the originality of ADA.

A. Reproduction operation
ADA uses the basic GA operators (i.e., the SBX crossover and the polynomial mutation [27]) and the simplest method of selecting parents. First, two parents x_a and x_b are randomly selected from P such that a ≠ b, regardless of the subproblem to which each individual has been assigned. Then, u is reproduced by applying SBX to x_a and x_b. Finally, the polynomial mutation is applied to u. All ADA-based algorithms use the same mating selection scheme, regardless of the mating selection schemes in their original EMOAs.

In principle, any variation operators can be incorporated into ADA, including DE operators [26], PSO operators [16], and model-based methods [15], [32]. Any parent selection methods can also be used in ADA, including the distance-based selection methods in the solution space [33]. The performance of ADA can be improved by using these more sophisticated methods. However, if such effective methods are used in ADA, it is unclear which algorithmic component mainly contributes to the overall performance of ADA-based algorithms. Since we want to investigate the effectiveness of the ADA framework in an isolated manner, we use the simplest variation operators and parent selection method in this paper.

Algorithm 1: The ADA framework
1:  t ← 1, µ ← N, initialize the population P = {x_1, ..., x_µ} and the weight/reference vector set W = {w_1, ..., w_N};
2:  for i ∈ {1, ..., N} do
3:      Assign x_i to the i-th subproblem;
4:  while the termination criteria are not met do
5:      µ ← |P|;
6:      Randomly select a and b from {1, ..., µ} such that a ≠ b;
7:      Generate the child u by applying the crossover operation to x_a and x_b;
8:      Apply the mutation operation to u;
9:      Normalize the objective vectors f(x_1), ..., f(x_µ), f(u);
10:     Assign u to the j-th subproblem;
11:     X ← {x ∈ P | x has been assigned to the j-th subproblem and is in the neighborhood of u in the solution space};
12:     b_explorer ← FALSE and b_winner ← FALSE;
13:     if X = ∅ then
14:         b_explorer ← TRUE;
15:     for x ∈ X do
16:         if x is worse than u then
17:             P ← P \ {x} and b_winner ← TRUE;
18:     if b_winner = TRUE or b_explorer = TRUE then
19:         P ← P ∪ {u};
20:     t ← t + 1;
21: return P for the decision making (Subsection III-G) or benchmarking (Subsection III-H);

B. Normalization
Since the objective functions of most real-world problems are differently scaled, a normalization method is mandatory. In ADA, the normalization method depends on the EMOA to be combined. For example, NSGA-III-ADA uses the intercept-based normalization method of NSGA-III.

If the original EMOA does not have a normalization method (e.g., MOEA/D-AGR), we use the following simple normalization method. The objective vector f(x) is normalized using the approximated ideal point z* = (z_1*, ..., z_M*)^T and the worst point in the population z^worst = (z_1^worst, ..., z_M^worst)^T. The i-th element of the normalized objective vector f'(x) is given as f'_i(x) = (f_i(x) − z_i*) / (z_i^worst − z_i*).

C. Assignment operation
The child u is assigned to the j-th subproblem based on its normalized objective vector (line 10 in Algorithm 1). The assignment operation plays a crucial role in ADA in maintaining the diversity of the population in the objective space.

Each EMOA has a different method to select the index j of the subproblem to which u is assigned. MOEA/D-AGR-ADA and eMOEA/D-ADA select the j-th subproblem with the minimum scalarizing function value g(u | w_j) as in (1). MOEA/D-DU-ADA, NSGA-III-ADA, and θ-DEA-ADA assign u to the j-th subproblem with the minimum perpendicular distance between f'(u) and w_j as in (6). RVEA-ADA selects the subproblem with the minimum angle between f'(u) and the unit reference vector v_j as in (10). ADA requires only a single index j for u, whereas MOEA/D-AGR, eMOEA/D, and MOEA/D-DU select K subproblem indices from {1, ..., N}.

D. Neighborhood criterion in the solution space
ADA requires a neighborhood criterion in the solution space (line 11 in Algorithm 1). While the objective space diversity is maintained by the assignment operation, the solution space diversity is controlled by the neighborhood criterion.

We use the simple relative distance-based neighborhood criterion presented in [17]. First, the normalized Euclidean distance between each individual x in P and u is calculated in the solution space. The upper and lower bounds of each decision variable of a problem are used for the normalization. Then, all µ individuals in P are sorted based on their distance values in ascending order. If x is within the L nearest individuals to u, x is said to be a neighbor of u in the solution space. L is a control parameter of this neighborhood criterion.

In addition to the relative distance-based neighborhood criterion, any neighborhood criterion can be incorporated into ADA. A number of niching methods have been proposed in the multi-modal single-objective optimization community [34]. Modern niching methods include the adaptive radius-based method [35] and nearest-better clustering [36]. However, our preliminary results show that such sophisticated niching methods do not work well in ADA. The main reason for this failure is that an appropriate parameter specification for such a modern niching method is difficult due to the difficulty of understanding the fitness landscape of an MOP [37]. For this reason, we use the simple neighborhood criterion.

E. Deletion operation
Let X be the set of individuals that are in the neighborhood of u in the solution space among the individuals assigned to the same j-th subproblem as u (line 11 in Algorithm 1). Fig. 2 shows an example of X. In Fig. 2, µ = 10, N = 4, L = 3, and u has been assigned to the third subproblem (w_3). The set of neighborhood individuals of u in the solution space is Y = {x_b, x_c, x_d}. The set of individuals assigned to the third subproblem in the objective space is Z = {x_a, x_b, x_c}. In this case, X = Y ∩ Z = {x_b, x_c}. Whereas x_d is a neighborhood individual of u, x_d has been assigned to the second subproblem (w_2). x_a has been assigned to the third subproblem with u, but x_a is not in the neighborhood of u. Since x_a is dominated by u with respect to the two objectives in Fig. 2, x_a would be deleted in most EMOAs. In contrast, x_a can survive in P in ADA. This is because x_a is not a neighbor of u (or of any other individual assigned to the third subproblem). Thus, x_a is not compared to u. Although the quality of x_a is poor, x_a contributes to the solution space diversity of P.

A paired comparison between u and each x in X is performed (lines 15–17 in Algorithm 1). If x is evaluated as being worse than u (by the evaluation criterion explained below), x is deleted from P. The deletion of such x is reasonable since it is based on both the quality in the objective space and the diversity in the solution space.
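The deletion and addition operations (lines 13–19 of Algorithm 1) can be sketched together in Python. Here `is_neighbor` and `worse_than_u` are placeholders of our own for the neighborhood criterion of Subsection III-D and the EMOA-specific comparison criterion:

```python
def ada_update(pop, u, is_neighbor, worse_than_u):
    """One deletion/addition step of ADA (sketch).
    pop is a list of (solution, subproblem_index) pairs; u is such a pair."""
    # X: members assigned to u's subproblem that are neighbors of u.
    X = [p for p in pop if p[1] == u[1] and is_neighbor(p[0], u[0])]
    b_explorer = not X            # no neighbor: u explores a new region
    b_winner = False
    survivors = []
    for p in pop:
        if p in X and worse_than_u(p[0]):
            b_winner = True       # p is deleted from the population
        else:
            survivors.append(p)
    if b_winner or b_explorer:
        survivors.append(u)       # addition operation
    return survivors
```

Because the comparison is restricted to X, an individual like x_a in Fig. 2 (same subproblem, but not a neighbor of u) is never compared with u and always survives this step.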
Fig. 2: Example of X. In this example, X = {x_b, x_c}. Since x_a and x_d are not compared to the child u in ADA, x_a and x_d survive to the next iteration regardless of their quality.

The comparison criterion depends on the environmental selection of each original EMOA. In MOEA/D-type algorithms, the comparison is based on the scalarizing function g. MOEA/D-AGR-ADA, MOEA/D-DU-ADA, and eMOEA/D-ADA use g^tch in (2), g^dtch in (3), and g^msf in (4), respectively. If g(x | w^j) ≥ g(u | w^j) on the j-th subproblem, x is removed from P. In NSGA-III-ADA and θ-DEA-ADA, the primary comparison between x and u is based on the Pareto dominance relation. In NSGA-III-ADA, ties are broken by the perpendicular distance between the normalized objective vector and w^j. In θ-DEA-ADA, the tie-breaker is the θ-dominance. RVEA-ADA uses the APD value given in (11).

F. Addition operation
The child u is added to the population P if either of the following two conditions is met (lines 18–19 in Algorithm 1). The first condition is that no individual exists in X. An empty X means that there is no neighbor of u in P in the solution space (or the objective space). If this condition is met, u enters P without any comparison. Although a u of inferior quality is likely to enter P, it helps P to maintain the solution space diversity (or the objective space diversity). While a dominated individual is unlikely to survive to the next iteration in most EMOAs, it can remain in P due to this first criterion of the addition operation. The second condition is that u performs better than at least one individual in X in the deletion operation (lines 16–17 in Algorithm 1). Since u has good quality in its neighborhood in the solution space, it should be added to P.

G. Decision making
We introduce an effective decision-making procedure using a solution set found by an ADA-based algorithm. Fig. 3 shows an example of the decision making. Fig. 3 (a) exhibits the distribution of all objective vectors in the final population P of MOEA/D-AGR-ADA on the two-objective and two-variable SYM-PART1 problem [38]. As shown in Fig. 3 (d), nine equivalent Pareto optimal solution sets lie on the nine lines in SYM-PART1. The experimental setting is described in Section IV later. The decision making in ADA is the following two-phase method based on A_primary and A_secondary.

(a) P in the obj. space, with the objective vector of the 77th subproblem marked (b) A_primary in the obj. space
(c) A_secondary in the obj. space (d) A_secondary in the solution space

Fig. 3:
Example of the decision making based on multiple equivalent solutions found by the proposed approach. Solutions obtained by MOEA/D-AGR-ADA on SYM-PART1 are shown. These figures show the results of a single run with the median IGD+ value among 31 runs. In Fig. 3 (a), (b), and (c), the x and y axes represent f_1 and f_2, respectively. In Fig. 3 (d), the x and y axes represent x_1 and x_2, respectively. The nine gray lines in Fig. 3 (d) are the equivalent Pareto optimal solution sets.

A_primary-based decision making: Fig. 3 (a) shows that P contains a large number of solutions. This is undesirable for the decision maker, who usually wants to examine a small number of well-distributed solutions in the objective space [4]. To address this issue, the best solution for the j-th subproblem is selected from P based on the same criterion as in the deletion operation (j ∈ {1, ..., N}). Then, only non-dominated solutions are selected from the N best solutions. Let A_primary be the set of non-dominated solutions obtained by this procedure. Fig. 3 (b) shows the non-dominated solutions in A_primary in the objective space. The decision maker examines the objective vectors in A_primary, rather than P. When the decision maker wants to examine N or fewer non-dominated solutions (e.g., only 10 solutions), other solution selection methods can be used to select A_primary from P. If the number of objectives M is below a threshold, efficient hypervolume-based selection approaches (e.g., [39]) are available; for larger M, computationally cheap distance-based selection methods (e.g., [40]) can be used.

A_secondary-based decision making: After the decision maker has examined A_primary in the objective space, she/he selects the final solution x_final from A_primary. Suppose that the decision maker has selected a solution on the 77th subproblem as x_final in Fig. 3 (b). In ADA, she/he can then examine other dissimilar solutions with quality similar to that of x_final.
Let A_secondary be the set of all solutions assigned to the same 77th subproblem as x_final. Figs. 3 (c) and (d) show the distribution of all nine solutions in A_secondary in the objective and solution spaces, respectively. Although the nine solutions in A_secondary have almost the same quality in the objective space, they are dissimilar in the solution space. In addition to x_final, the decision maker can examine the other candidates in A_secondary based on her/his preference in the solution space.

H. Two post-processing methods for benchmarking
Apart from the practical two-phase decision-making method described in Subsection III-G, we here consider the benchmarking of ADA-based EMMAs. The performance of EMOAs and EMMAs is evaluated using performance indicators. However, a fair performance comparison between ADA-based EMMAs and other optimizers is difficult. This is because the current population P in ADA can contain an unbounded number of solutions. Most performance indicators cannot assess solution sets of different sizes in a fair manner [41]. IGD+ [42] and IGDX [32], described in Section IV later, are not exceptions. Thus, ADA requires a method of selecting a constant number of solutions from P for benchmarking studies.

We introduce two post-processing methods that are used to evaluate the performance of ADA-based algorithms for MOPs and MMOPs, respectively. In general, uniformly distributed solutions do not map to uniformly distributed objective vectors due to the non-uniform mapping from the solution space to the objective space. To address this issue, we use one post-processing method for the objective space and one for the solution space. On the one hand, the same method of selecting A_primary from P described in Subsection III-G is used for performance indicators for MOPs (e.g., IGD [43]). Since indicators for MOPs assess the distribution of a solution set in the objective space, the choice of A_primary is reasonable. On the other hand, the solution distance-based selection method presented in [17] is used for performance indicators for MMOPs (e.g., IGDX [32]). Let us consider the task of selecting N sparsely distributed solutions from all non-dominated solutions in P. Below, D(x, A) denotes the distance between a solution x and its nearest solution in a solution set A in the normalized solution space. First, A_tertiary is set to empty. A solution is randomly selected from the non-dominated solution set and stored into A_tertiary.
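This greedy max–min selection (seed the set with one random non-dominated solution, then repeatedly add the solution farthest from the set) can be sketched in a few lines. The sketch below is our own illustration under that description; the function and parameter names are assumptions, not the authors' code.

```python
import math
import random

def select_sparse(nondominated, N, lb, ub, seed=0):
    """Greedy max-min selection of N sparsely distributed solutions
    in the normalized solution space (used as A_tertiary for IGDX)."""
    rng = random.Random(seed)
    def dist(a, b):
        # normalized Euclidean distance D(., .) in the solution space
        return math.sqrt(sum(((ai - bi) / (u - l)) ** 2
                             for ai, bi, l, u in zip(a, b, lb, ub)))
    pool = list(nondominated)
    chosen = [pool.pop(rng.randrange(len(pool)))]   # random seed solution
    while pool and len(chosen) < N:
        # pick x maximizing D(x, A_tertiary): distance to nearest chosen one
        far = max(pool, key=lambda x: min(dist(x, c) for c in chosen))
        pool.remove(far)
        chosen.append(far)
    return chosen
```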
Then, a solution with the maximum D(x, A_tertiary) value is repeatedly added to A_tertiary until |A_tertiary| = N. Unlike A_primary and A_secondary, A_tertiary is used only for the benchmarking of ADA-based algorithms.

I. Applicability of ADA
ADA is a framework to improve the performance of decomposition-based EMOAs for MMOPs. We do not claim that ADA can be combined with any decomposition-based EMOA. Since ADA requires a method of assigning a child u to a subproblem, ADA is inapplicable to EMOAs with no assignment mechanism. Such EMOAs include MOEA/D [4] and MOEA/D-DRA [44]. Also, the deletion operation performs the pairwise comparison independently of the other individuals. Thus, ADA is not applicable to EMOAs whose environmental selection is performed over all individuals in P, such as MOEA/D-STM [45] and VaEA [30].

In summary, ADA can be combined with EMOAs that have an assignment mechanism and a pairwise comparison-based environmental selection. The six EMOAs explained in Subsections II-C and II-D satisfy these conditions. In addition, ADA is applicable to I-DBEA [46] and SPEA/R [31]. Fortunately, even if a decomposition-based EMOA does not satisfy the above two conditions, ADA can be applied to it after some modifications. For example, since MOEA/D does not have an assignment method, we cannot directly combine ADA with MOEA/D. However, MOEA/D can easily be modified by using any of the three assignment methods described in Subsection III-C. In fact, MOEA/D-AD proposed in [17] is an ADA-based MOEA/D with the assignment operation in (6). Thus, the applicability of ADA is not limited to only a few decomposition-based EMOAs.

J. Originality of ADA
In addition to MOEA/D-AD [17], a variant of MOEA/D for MMOPs is proposed in [47]. This MOEA/D variant assigns K individuals to each subproblem. The fitness value is based on the PBI function value and two distance values in the solution space. The main disadvantage of the MOEA/D variant in [47] is the difficulty of finding a proper K value. Since the number of equivalent Pareto optimal solution subsets is unknown a priori, fine-tuning of K is necessary for a given problem. Although a multi-start decomposition-based approach is proposed in [38], it has a similar disadvantage. In contrast, ADA adaptively adjusts the number of individuals assigned to each subproblem. Thus, ADA does not require a problem-dependent parameter such as K.

TriMOEA-TA&R [24] consists of various advanced components, including the convergence and diversity archive-based strategy as in Two_Arch2 [48], the decision variable decomposition method of MOEA/DVA [49], and the angle-based individual assignment of RVEA [22]. ADA and TriMOEA-TA&R are similar in that they handle diversity in the objective and solution spaces in a two-phase manner. While TriMOEA-TA&R uses an absolute distance-based neighborhood criterion with σ_niche, ADA uses the relative distance-based neighborhood criterion with L. ADA uses a much simpler mating scheme, and it does not use any decomposition method for the decision variables. ADA also uses the adaptive population sizing strategy to handle equivalent solutions, as mentioned above. Whereas TriMOEA-TA&R is an algorithm for MMOPs with distance-related variables, ADA is a general framework for various MMOPs.

IV. EXPERIMENTAL SETTINGS
A. Test problems
We use the following eight multi-modal multi-objective test problems: the Two-On-One problem [50], the three SYM-PART problems [38], the two SSUF problems [51], the Polygon problem [52], and the Omni-test problem [13]. Table I shows their properties, including the number of objectives M, the number of decision variables D, and the number of equivalent Pareto optimal solution sets O.

TABLE I: Properties of the multi-modal multi-objective test problems, where M, D, and O denote the number of objectives, design variables, and equivalent Pareto optimal solution subsets, respectively.

Test problems                    | M   | D   | O
Two-On-One [50] and SSUF1,3 [51] | 2   | 2   | 2
SYM-PART1–3 [38]                 | 2   | 2   | 9
Polygon [52]                     | Any | 2   | Any
Omni-test [13]                   | 2   | Any | 3^D

Two-On-One has two equivalent Pareto optimal solution sets that are symmetrical with respect to the origin. The equivalent Pareto optimal solution sets lie on the nine lines in SYM-PART1, as shown in Fig. 3. The nine lines are rotated in SYM-PART2 and, in addition, distorted in SYM-PART3. SSUF1 and SSUF3 have two symmetrical Pareto optimal solution sets. We evaluate the scalability of EMMAs to M using Polygon. We set M of Polygon as follows: M ∈ { , , , }. Although O can be any number in Polygon, it was set to nine. We investigate the scalability of EMMAs to D and O using Omni-test. We set D as follows: D ∈ { , , , , }. O increases exponentially with increased D in Omni-test. In summary, we use 15 test problem instances.

HPS [23] and MMMOP [24] have been recently proposed. However, HPS and MMMOP have so-called "distance-related" variables that affect only the distance between the objective vector and the Pareto front. For this reason, we do not mainly use HPS and MMMOP in our benchmarking study; they are used only in Subsections V-E and V-F.

B. Performance indicators
Below, A is a set of solutions obtained by an EMMA, and A* is a set of reference solutions in the Pareto optimal solution set. The size of A* was fixed in advance. For each problem, the solutions in A* were selected from 10 000 randomly generated Pareto-optimal solutions using the distance-based solution selection method [17] (Subsection III-H).

We use IGD+ [42] to evaluate A in terms of both convergence to the Pareto front and diversity in the objective space:

IGD+(A) = (1 / |A*|) Σ_{z ∈ A*} min_{x ∈ A} { d( f(x), f(z) ) },   (14)

where d(a, b) = sqrt( Σ_{i=1}^{M} ( max{a_i − b_i, 0} )^2 ). IGD+ is a modified version of IGD [43]. While the original IGD is Pareto non-compliant, IGD+ is weakly Pareto compliant.

We evaluate how well A approximates the Pareto-optimal solution set in the solution space using IGDX [32]:

IGDX(A) = (1 / |A*|) Σ_{z ∈ A*} min_{x ∈ A} { ED(x, z) },   (15)

where ED(a, b) is the Euclidean distance between a and b. EMOAs that find A with small IGD+ and IGDX values are efficient multi-objective optimizers and efficient multi-modal multi-objective optimizers, respectively. In the ADA-based algorithms, A_primary and A_tertiary explained in Subsection III-H are used for the IGD+ and IGDX calculations, respectively.

C. Average performance score
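The two indicators in (14) and (15) above translate directly into code. The following sketch is our own illustration (not the authors' implementation); it assumes A is given as a list of objective vectors for IGD+ and as a list of decision vectors for IGDX.

```python
import math

def igd_plus(A, A_ref):
    """IGD+ per (14): average over reference points z of the minimal
    modified distance d(f, z) = sqrt(sum_i max(f_i - z_i, 0)^2).
    A and A_ref are lists of objective vectors."""
    def d_plus(f, z):
        return math.sqrt(sum(max(fi - zi, 0.0) ** 2 for fi, zi in zip(f, z)))
    return sum(min(d_plus(f, z) for f in A) for z in A_ref) / len(A_ref)

def igdx(A, A_ref):
    """IGDX per (15): same aggregation, but the plain Euclidean
    distance between decision vectors in the solution space."""
    def ed(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return sum(min(ed(x, z) for x in A) for z in A_ref) / len(A_ref)
```

Note how the max(·, 0) term makes IGD+ weakly Pareto compliant: a point of A that dominates a reference point contributes zero distance instead of being penalized.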
We use the average performance score (APS) [53] to aggregate results over various problems. Suppose that n algorithms A_1, ..., A_n are compared on a problem instance based on the indicator values obtained in multiple runs. For each i ∈ {1, ..., n} and j ∈ {1, ..., n}\{i}, if A_j significantly outperforms A_i according to the Wilcoxon rank-sum test with p < 0.05, then δ_{i,j} = 1; otherwise, δ_{i,j} = 0. The score P(A_i) is defined as follows: P(A_i) = Σ_{j ∈ {1,...,n}\{i}} δ_{i,j}. The score P(A_i) represents the number of algorithms that outperform A_i. The APS value of A_i is the average of the P(A_i) values over all problem instances. A small APS value of A_i indicates that A_i performs well among the n algorithms.

D. EMMAs and EMOAs
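The APS computation described above can be sketched as follows. This is our own illustration: the significance matrices δ are taken as precomputed input (running the Wilcoxon rank-sum test for each δ_{i,j} is left out), and all names are assumptions.

```python
def average_performance_score(delta_per_problem):
    """Average performance score (APS).

    delta_per_problem: one n-by-n 0/1 matrix per problem instance,
    where delta[i][j] = 1 iff algorithm A_j significantly outperforms
    algorithm A_i (e.g., by a Wilcoxon rank-sum test at p < 0.05).
    Returns the APS of each algorithm: the per-problem score
    P(A_i) = sum_j delta[i][j] averaged over all problem instances.
    """
    n = len(delta_per_problem[0])
    scores = [0.0] * n
    for delta in delta_per_problem:
        for i in range(n):
            scores[i] += sum(delta[i][j] for j in range(n) if j != i)
    return [s / len(delta_per_problem) for s in scores]
```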
We examine the performance of the six ADA-based EMMAs: MOEA/D-AGR-ADA, MOEA/D-DU-ADA, eMOEA/D-ADA, NSGA-III-ADA, θ-DEA-ADA, and RVEA-ADA. We implemented the ADA-based algorithms using jMetal [54]. Their source code can be downloaded from the supplementary website (https://sites.google.com/view/mmoada). We compare the ADA-based EMMAs to their original EMOAs: MOEA/D-AGR [19], MOEA/D-DU [20], eMOEA/D [21], NSGA-III [5], θ-DEA [18], and RVEA [22]. Our implementations of the six original EMOAs were based on their corresponding articles, except for MOEA/D-AGR, in which we replaced the DE operator with SBX to remove the effect of variation operators. For details, refer to the corresponding articles. The number of maximum function evaluations was
30 000, and 31 runs were performed for each test problem. A set of weight/reference vectors was generated using the simplex-lattice design method for small M and its two-layered version [5] for larger M. The number of weight/reference vectors N was set according to M. The SBX crossover and the polynomial mutation were used in all methods, including MOEA/D-AGR. Their control parameters were set as follows: p_c = 1, η_c = 20, p_m = 1/D, and η_m = 20. According to the analysis presented in [17], the parameter L of the neighborhood criterion in ADA was set to L = ⌊0.1µ⌋. For example, L = 201 when µ = 2 019. The other parameters were set according to the corresponding references.

V. EXPERIMENTAL RESULTS
This section presents a performance analysis of the six ADA-based EMMAs. Subsection V-A describes the effect of ADA on the six EMOAs. Subsection V-B compares the six ADA-based EMMAs to state-of-the-art EMMAs. Subsection V-C discusses the adaptive population sizing in ADA. Subsection V-D investigates the influence of the L value on the performance of the six ADA-based EMMAs. Subsection V-E examines the performance of the six ADA-based EMMAs on test problems with distance-related variables. Subsection V-F presents a comparison with state-of-the-art EMMAs using an unbounded external archive. Subsection V-G discusses the runtime of the ADA-based algorithms.

TABLE II: Results of the six EMOAs and their ADA versions on the 15 test problem instances. Tables (a) and (b) show the APS values of the algorithms for IGD+ and IGDX, respectively. AGR and DU stand for MOEA/D-AGR and MOEA/D-DU, respectively.

(a) IGD+
             | Orig.   | ADA
AGR          | 0.0 (1) | 1.0 (2)
DU           | 0.0 (1) | 0.8 (2)
eMOEA/D      | 0.2 (1) | 0.7 (2)
NSGA-III     | 0.0 (1) | 0.8 (2)
θ-DEA        | 0.1 (1) | 0.6 (2)
RVEA         | 0.5 (2) | 0.3 (1)

(b) IGDX
             | Orig.   | ADA
AGR          | 1.0 (2) | 0.0 (1)
DU           | 0.9 (2) | 0.0 (1)
eMOEA/D      | 1.0 (2) | 0.0 (1)
NSGA-III     | 1.0 (2) | 0.0 (1)
θ-DEA        | 1.0 (2) | 0.0 (1)
RVEA         | 1.0 (2) | 0.0 (1)

A. Effect of ADA
Tables II (a) and (b) show the paired comparison of each EMOA and its ADA version on the 15 test problem instances in terms of IGD+ and IGDX, respectively. Table II shows only the APS value of each algorithm at the final iteration; Table S.1 in the supplementary file shows detailed results. As described in Subsection III-H, only the N solutions in A_primary and A_tertiary were used for the IGD+ and IGDX calculations. Thus, all algorithms are compared under the same number of solutions. We set the p value to 0.05 for the Wilcoxon rank-sum test. M-Polygon denotes the M-objective Polygon problem, and D-Omni-test denotes the D-variable Omni-test problem. Below, we describe the results for IGD+ and IGDX.
1) IGD+: Table II (a) shows that the original EMOAs outperform their ADA-based EMMAs regarding IGD+ (except for RVEA-ADA). In general, EMMAs perform slightly worse than EMOAs for multi-objective optimization [15], [16]. While EMOAs aim to find a good approximation of the Pareto front, EMMAs attempt to locate all Pareto optimal solutions. For this reason, the performance of the ADA versions regarding IGD+ is worse than that of their original EMOAs. However, as shown in Table S.1 in the supplementary file, the IGD+ value of the ADA versions is worse than that of their original EMOAs by only a small factor even in the worst case. Also, the ADA versions perform better than their original EMOAs on some problems. For example, eMOEA/D-ADA obtains better IGD+ values than eMOEA/D on SYM-PART2, SSUF3, and 10-Polygon. As discussed in [55], a mechanism for maintaining the solution space diversity can help the ADA versions to find high-quality solutions.
2) IGDX:
Table II (b) shows that the ADA-based EMMAs significantly outperform their original EMOAs. As shown in Table S.1, the IGDX value of the ADA versions is substantially better than that of their original EMOAs in the best case. These results indicate that ADA improves the performance of the six decomposition-based EMOAs for MMOPs. Table S.1 shows that the ADA-based EMMAs work well on Polygon with M ∈ { , , , } and Omni-test with D ∈ { , , , , }. Since the number of equivalent Pareto
(a) eMOEA/D (b) NSGA-III (c) RVEA (d) eMOEA/D-ADA (e) NSGA-III-ADA (f) RVEA-ADA

Fig. 4:
Distribution of non-dominated solutions found by each method in the objective space on the three-objective Polygon problem. The x, y, and z axes represent f_1, f_2, and f_3, respectively.

(a) eMOEA/D (b) NSGA-III (c) RVEA (d) eMOEA/D-ADA (e) NSGA-III-ADA (f) RVEA-ADA

Fig. 5:
Distribution of non-dominated solutions found by each method in the solution space on the three-objective Polygon problem. The x and y axes represent x_1 and x_2, respectively.

optimal solution sets O increases exponentially with increased D in Omni-test, the results show that ADA can also handle problems with a large O value. In summary, ADA has sufficient scalability with respect to M, D, and O.
3) Distribution of solutions:
Figs. 4 and 5 show the distribution of non-dominated solutions found by eMOEA/D-ADA, NSGA-III-ADA, RVEA-ADA, and their original versions on Polygon with M = 3 in the objective and solution spaces, respectively. These figures show the results of a single run with the median IGD+ and IGDX values, respectively. For the ADA-based algorithms, the non-dominated solutions in A_primary and A_tertiary are shown in Figs. 4 and 5, respectively. Figs. S.1 and S.2 in the supplementary file show the results of the other ADA-based algorithms, which are similar to Figs. 4 and 5. It was shown in [5] that some decomposition-based EMOAs do not work well on problems with convex Pareto fronts. It was also presented in [56] that some decomposition-based EMOAs perform poorly on problems with inverted-triangular Pareto fronts. Since Polygon has a convex and an inverted triangular Pareto front, it is difficult for decomposition-based EMOAs to find good solutions.

TABLE III: Results of the nine EMMAs on the 15 test problem instances. Tables (a) and (b) show the APS values of the algorithms for the IGD+ and IGDX indicators, respectively. The numbers in parentheses are the ranks of the algorithms based on their APS values.

(a) IGD+
MOEA/D-AGR-ADA   3.7 (7)
MOEA/D-DU-ADA    2.8 (5)
eMOEA/D-ADA      2.9 (6)
NSGA-III-ADA     1.7 (2)
θ-DEA-ADA        2.7 (4)
RVEA-ADA         5.9 (9)
TriMOEA-TA&R     2.4 (3)
MO_Ring_PSO_SCD  4.2 (8)
Omni-optimizer   1.6 (1)

(b) IGDX
MOEA/D-AGR-ADA   2.1 (3)
MOEA/D-DU-ADA    2.1 (4)
eMOEA/D-ADA      2.3 (5)
NSGA-III-ADA     0.8 (1)
θ-DEA-ADA        1.0 (2)
RVEA-ADA         4.1 (6)
TriMOEA-TA&R     7.7 (9)
MO_Ring_PSO_SCD  4.3 (7)
Omni-optimizer   5.1 (8)

For the above-mentioned reasons, no method in Fig. 4 can approximate the Pareto front of Polygon well. Nevertheless, some ADA-based algorithms (e.g., eMOEA/D-ADA and RVEA-ADA) find better-distributed objective vectors than their original EMOAs. This unintended effect of ADA may be caused by the diversity of the population in the solution space [55].
Although we do not claim that ADA can improve the performance of EMOAs for MOPs in addition to MMOPs, the solution space diversity maintenance might help the original EMOAs to handle problems with irregular Pareto fronts. In the Polygon problem with M = 3 and O = 9, the equivalent Pareto solution sets are inside the nine regular triangles in the solution space. Figs. 5 (a)–(c) show that the original EMOAs cannot locate all equivalent Pareto solution sets well. In contrast, Figs. 5 (d)–(f) demonstrate that their ADA versions can locate all nine equivalent Pareto solution sets. Results on the other test problems are similar to Fig. 5.

B. Comparison with other EMMAs
In Subsection V-A, we demonstrated that ADA can improve the performance of the six EMOAs for MMOPs. Here, we compare the six ADA-based algorithms to the following three EMMAs: Omni-optimizer [13], MO_Ring_PSO_SCD [16], and TriMOEA-TA&R [24]. Omni-optimizer is the most representative EMMA. MO_Ring_PSO_SCD and TriMOEA-TA&R are recently proposed methods. Default parameter settings were used for the three methods.

Table III shows the APS values of the nine EMMAs on the 15 test problem instances. Table S.2 in the supplementary file shows detailed results. TriMOEA-TA&R incorrectly recognizes that some problems have distance-related variables (e.g., SYM-PART1). In such a case, TriMOEA-TA&R generates a set of additional solutions A_add by recombining the solutions in the diversity archive and the distance-related variables at the end of the search. Since |A_add| > N, methods of selecting N solutions from A_add are needed. We use the two post-processing methods of ADA: in the same manner as in ADA, A_primary and A_tertiary are used for the IGD+ and IGDX calculations, respectively. We use the selection method of NSGA-III-ADA to obtain A_primary from A_add.

Table III (a) shows that Omni-optimizer performs the best regarding IGD+. The good performance of Omni-optimizer

Fig. 6:
Evolution of µ for NSGA-III-ADA on Omni-test with D ∈ { , , , , }. The mean µ values over 31 runs are reported.
Fig. 7:
Cumulative number of activations of the addition operation based on the two criteria (b_explorer and b_winner) and of the deletion operation in NSGA-III-ADA on 3-Polygon (M = 3). Results of a single run with the median IGDX value among 31 runs are shown.

for MOPs is consistent with the results presented in [16], [17]. NSGA-III-ADA shows the best performance regarding IGD+ among the six ADA-based algorithms. Table III (b) shows that the six ADA-based algorithms perform better than the three other algorithms regarding IGDX. TriMOEA-TA&R performs the worst regarding IGDX. This is because the 15 problem instances have no distance-related variables. In summary, the results indicate that the six ADA-based algorithms have better performance than the state-of-the-art algorithms for MMOPs.

C. Adaptive population sizing in ADA
In ADA, the number of individuals assigned to each subproblem is adaptively adjusted by the deletion and addition operations. Thus, the population size µ is not constant. Here, we discuss the adaptive population sizing in ADA.

Fig. 6 shows the change of µ for NSGA-III-ADA on Omni-test with D ∈ { , , , , }. Results for the other ADA versions are similar to Fig. 6. The µ value for D = 3 is always larger than that for D = 2. In contrast, the µ value decreases as the D value increases beyond D = 3. This may be because problems with a large D value are generally difficult for EMMAs. Since the search does not proceed well for the larger D values, the µ value does not significantly increase. The difficulty of finding multiple equivalent Pareto sets for large D values has been reported in [57].

In addition, the trajectory of µ is problem-dependent. Fig. S.3 in the supplementary file shows the change of µ on the other problems. The final µ value on Two-On-One and 3-Polygon differs from that on the other problems. Fig. S.3 shows that the µ value increases sharply in early iterations. After some evaluations, the µ value becomes stable.

Fig. 7 shows the cumulative numbers of activations of the addition operation based on the two criteria (b_explorer and b_winner in Algorithm 1) and of the deletion operation in NSGA-III-ADA on 3-Polygon. On the one hand, the number of activations of the addition operation based on b_winner is almost the same as that of the deletion operation throughout the evolution in Fig. 7. This competitive behavior of the two operations leads to the stable µ value in Fig. S.3. On the other hand, the addition operation based on b_explorer is frequently performed only in an early stage of evolution in Fig. 7. Thus, the sharp increase of the µ value in Fig. S.3 is due to the b_explorer-based addition operation.
Since only one individual is assigned to each subproblem at the beginning of the search, the b_explorer-based addition operation is often activated then. In summary, the µ value is adaptively adjusted by the cooperative behavior of the deletion and addition operations in ADA. In the addition operation, the roles of the two criteria b_explorer and b_winner differ from each other. Thus, it is important for ADA to use both b_explorer and b_winner.

D. Impact of L on the six ADA-based algorithms

We investigate the influence of the L value on the performance of the ADA-based algorithms. Table IV shows the APS values of the six ADA-based algorithms with six L values on the 15 test problem instances. Tables S.3–S.8 in the supplementary file show detailed results.

Table IV (a) shows that the six ADA-based algorithms with the smallest tested L value have the worst performance regarding IGD+: too small L values degrade the performance of the ADA-based algorithms regarding IGD+. Table IV (b) shows that L = ⌊0.1µ⌋ is the most suitable setting for all ADA-based algorithms (except for RVEA-ADA) in terms of IGDX: too large L values degrade the performance regarding IGDX. In summary, L = ⌊0.1µ⌋ is suitable for most ADA-based algorithms for MMOPs.

We also investigate the influence of L on the evolution of µ. Figs. S.4 and S.5 in the supplementary file show the change of µ for NSGA-III-ADA with various L values on Omni-test with various D values and on the other problems, respectively. Results for the other ADA versions are similar to those of NSGA-III-ADA. Figs. S.4 and S.5 indicate that the µ value decreases as the L value increases. The larger the L value is, the more individuals can be neighbors of the child u. As a result, the niching mechanism in ADA deteriorates. These observations are consistent with the above-mentioned results for IGDX.

E. Results on test problems with distance-related variables
This subsection shows the results on the HPS and MMMOP test problems. Although six HPS problem instances (HPS1, ..., HPS6) are proposed in [23], we use only HPS2. This is because the other five problem instances are variants of HPS2 and because the details of only HPS2 are provided in [23]. M, D, and O in HPS2 are as follows: M = 2, D = 7, and O = 4. The 20 MMMOP problem instances (MMMOP1A, ..., MMMOP6D) are proposed in [24]. We also use those 20

TABLE IV:
Results of the six ADA-based algorithms with L ∈ {⌊.µ⌋, ⌊.µ⌋, ⌊.µ⌋, ⌊.µ⌋, ⌊.µ⌋, ⌊.µ⌋} on the 15 test problem instances. Tables (a) and (b) show the APS values of the algorithms for the IGD+ and IGDX indicators, respectively. AGR and DU stand for MOEA/D-AGR and MOEA/D-DU, respectively.

(a) IGD+
             | ⌊.µ⌋    | ⌊.µ⌋    | ⌊.µ⌋    | ⌊.µ⌋    | ⌊.µ⌋    | ⌊.µ⌋
AGR-ADA      | 1.6 (6) | 0.9 (3) | 0.6 (1) | 0.7 (2) | 1.0 (4) | 1.3 (5)
DU-ADA       | 1.6 (6) | 0.7 (2) | 0.5 (1) | 0.7 (3) | 1.0 (4) | 1.3 (5)
eMOEA/D-ADA  | 1.7 (6) | 0.9 (3) | 0.5 (1) | 0.5 (2) | 0.9 (4) | 1.1 (5)
NSGA-III-ADA | 2.6 (6) | 1.5 (5) | 0.7 (4) | 0.1 (1) | 0.1 (1) | 0.2 (3)
θ-DEA-ADA    | 2.2 (6) | 1.3 (5) | 0.4 (4) | 0.2 (3) | 0.0 (1) | 0.0 (1)
RVEA-ADA     | 1.9 (6) | 0.5 (1) | 1.1 (3) | 0.9 (2) | 1.3 (4) | 1.9 (5)

(b) IGDX
             | ⌊.µ⌋    | ⌊.µ⌋    | ⌊.µ⌋    | ⌊.µ⌋    | ⌊.µ⌋    | ⌊.µ⌋
AGR-ADA      | 0.8 (2) | 0.1 (1) | 0.8 (3) | 2.6 (4) | 2.9 (5) | 3.2 (6)
DU-ADA       | 0.8 (2) | 0.3 (1) | 1.0 (3) | 2.5 (4) | 2.9 (5) | 3.5 (6)
eMOEA/D-ADA  | 0.9 (2) | 0.3 (1) | 1.1 (3) | 2.2 (4) | 2.5 (5) | 3.2 (6)
NSGA-III-ADA | 1.0 (3) | 0.1 (1) | 0.5 (2) | 2.2 (4) | 2.7 (5) | 3.5 (6)
θ-DEA-ADA    | 1.1 (3) | 0.2 (1) | 0.9 (2) | 2.1 (4) | 2.7 (5) | 3.2 (6)
RVEA-ADA     | 0.5 (1) | 0.6 (2) | 1.6 (3) | 2.7 (4) | 2.9 (5) | 3.1 (6)

TABLE V:
Results of the nine EMMAs on the 21 test problem instances with distance-related variables: the APS value of each algorithm for IGD+ in (a) and for IGDX in (b).

(a) IGD+
MOEA/D-AGR-ADA   3.0 (5)
MOEA/D-DU-ADA    2.2 (3)
eMOEA/D-ADA      1.5 (2)
NSGA-III-ADA     4.2 (7)
θ-DEA-ADA        3.0 (6)
RVEA-ADA         4.4 (8)
TriMOEA-TA&R     1.0 (1)
MO_Ring_PSO_SCD  6.8 (9)
Omni-optimizer   2.3 (4)

(b) IGDX
MOEA/D-AGR-ADA   2.0 (3)
MOEA/D-DU-ADA    1.5 (2)
eMOEA/D-ADA      0.9 (1)
NSGA-III-ADA     3.0 (6)
θ-DEA-ADA        2.2 (4)
RVEA-ADA         3.7 (7)
TriMOEA-TA&R     2.7 (5)
MO_Ring_PSO_SCD  6.9 (9)
Omni-optimizer   4.1 (8)

instances. M, D, and O in MMMOP are as follows: M ∈ { , }, D ∈ { , ..., }, and O ∈ { , ..., }.

Table S.9 in the supplementary file shows the results of the six EMOAs and their ADA versions on the 21 problem instances. The results in Table S.9 are similar to those in Table II. Since the solution space diversity maintenance mechanism in ADA is not useful for handling distance-related variables, the ADA-based algorithms are likely to perform poorly on problems with distance-related variables. However, the results show that the ADA-based algorithms perform better than their original EMOAs (except for NSGA-III-ADA).

Table V shows the APS values of the nine EMMAs on the 21 problem instances. Table S.10 in the supplementary file shows detailed results. Since TriMOEA-TA&R can explicitly exploit distance-related variables for the search, similar to MOEA/DVA [49], its performance is improved on the 21 problem instances. Table V (a) shows that TriMOEA-TA&R performs the best regarding IGD+. Table V (b) shows that TriMOEA-TA&R is the fifth-ranked algorithm regarding IGDX. MOEA/D-AGR-ADA, MOEA/D-DU-ADA, eMOEA/D-ADA, and θ-DEA-ADA perform better than the three competitors regarding IGDX. In summary, the ADA-based algorithms have high performance for MMOPs even on problems with distance-related variables.

F. Comparison using an unbounded external archive
While the three EMMAs (TriMOEA-TA&R, MO_Ring_PSO_SCD, and Omni-optimizer) keep the µ value constant, the ADA-based algorithms adaptively adjust the µ value. One may therefore think that the comparisons in Subsections V-B and V-E are unfair. Here, we compare the ADA-based algorithms with the three EMMAs using an unbounded external archive (UEA) [58], which stores all non-dominated solutions found during the search process. The UEA can be incorporated into any algorithm without changing its behavior. The three EMMAs with the UEA can thus maintain all non-dominated solutions found so far. Note that only the three EMMAs use the UEA in this section. Since an algorithm with the UEA significantly outperforms its original version (see [58]), such a comparison is unfavorable to the ADA-based algorithms.

Tables S.11 and S.12 in the supplementary file show the comparisons with the three EMMAs using the UEA on the 15 test problem instances without distance-related variables and on the 21 test problem instances with distance-related variables, respectively. The performance of the three EMMAs is improved by the UEA. Nevertheless, Table S.11 indicates that all ADA-based algorithms (except for RVEA-ADA) outperform the three EMMAs with the UEA on the 15 test problems in terms of IGDX. Table S.12 also shows that eMOEA/D-ADA performs the best on the 21 test problems in terms of IGDX. Thus, some ADA-based algorithms perform significantly better than the three EMMAs even when the latter use the UEA. Note that the performance of the ADA-based algorithms themselves could be further improved by also using the UEA, as demonstrated here for the three EMMAs.

G. On the runtime of the ADA-based algorithms
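In ADA's deletion step, the child is compared with its nearest neighbors in the solution space among the individuals assigned to the same subproblem; computing the distance from the child to each of the µ individuals costs O(µD) per iteration, which is the cost attributed to ADA itself below. A minimal sketch of this neighbor selection (hypothetical helper; the names are not from the original paper):

```python
import numpy as np

def nearest_neighbors_in_solution_space(x_child, pop_x, L):
    """Indices of the L individuals nearest to the child in the solution
    (decision) space. Computing the mu distances costs O(mu * D); picking
    the L smallest with argpartition adds O(mu). (Hypothetical helper;
    the names are not from the original paper.)"""
    pop = np.asarray(pop_x, dtype=float)
    d = np.linalg.norm(pop - np.asarray(x_child, dtype=float), axis=1)
    k = min(L, len(d) - 1)
    idx = np.argpartition(d, k)[:L]      # unordered L nearest
    return idx[np.argsort(d[idx])]       # sort those L by distance
```

Because every individual must be examined once, the O(µD) term grows with the adaptively increasing population size, which is the root of the runtime behavior discussed in this subsection.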
The space complexity of ADA itself is O(µD) if D > M; otherwise, it is O(µM). Although the µ value can reach the maximum number of function evaluations n_max in the worst case, µ is naturally bounded in practice, as shown in Subsection V-C. Thus, the empirical space complexity is much smaller than O(n_max D) and O(n_max M).

Below, we discuss the worst-case time complexity of the ADA-based algorithms. ADA itself requires O(µD) per iteration, due to the procedure of selecting the neighbors of the child u in the solution space (line 11 in Algorithm 1). Otherwise, the time complexity of an ADA-based algorithm depends on its original version. Since the time complexity of NSGA-III is the larger of O(µ^2 log^(M−2) µ) and O(µ^2 M) [5], that of NSGA-III-ADA is the largest of O(µ^2 log^(M−2) µ), O(µ^2 M), and O(µD). Similarly, the time complexity of θ-DEA-ADA and RVEA-ADA is the larger of O(µ^2 M) [18], [22] and O(µD). The assignment operations in MOEA/D-AGR, MOEA/D-DU, and eMOEA/D require O(µM) per function evaluation [19]–[21]. Thus, the time complexity of MOEA/D-AGR-ADA, MOEA/D-DU-ADA, and eMOEA/D-ADA in one iteration is the larger of O(µM) and O(µD).

Fig. 8 shows the average CPU time of each EMMA over the five Omni-test problems with D = 2, . . . .

Fig. 8: Average CPU time of the EMMAs over the five Omni-test problems with D = 2, . . . (x-axis: number of decision variables D; y-axis: average CPU time).

We obtained all results using a workstation with a Xeon E5-2620 v4 (2.1 GHz) CPU and 8 GB RAM. Fig. 8 also shows results of θ-DEA. The six ADA-based algorithms and θ-DEA are implemented in Java, Omni-optimizer is implemented in C, and MO_Ring_PSO_SCD and TriMOEA-TA&R are implemented in Matlab. These implementations are based on the publicly available code of the respective algorithms; the use of C and Matlab follows the original implementations [13], [16], [24]. Since different programming languages incur different CPU times, Fig. 8 provides only a rough comparison.

As shown in Fig. 8, Omni-optimizer is the fastest, followed by θ-DEA and TriMOEA-TA&R. The six ADA-based algorithms are much slower than these three algorithms. This is because the µ value of the ADA-based algorithms is always larger than that of the other algorithms (see Subsection V-C). Since some operations require O(µ^2 M) and O(µD) as discussed above, a larger µ value makes an algorithm slower. Although MO_Ring_PSO_SCD and RVEA-ADA perform similarly in Fig. 8, we believe that MO_Ring_PSO_SCD with optimized source code could run much faster than the ADA-based algorithms. Clearly, θ-DEA-ADA is the slowest. This is because θ-DEA-ADA needs to normalize objective vectors in every iteration in a computationally expensive manner. Since the original θ-DEA is a generational EMOA, it can reduce the number of normalizations; in fact, θ-DEA is much faster than θ-DEA-ADA. Note that θ-DEA-ADA can be sped up by using a simple normalization method (e.g., that in Subsection III-B).

As demonstrated here, the slow speed of the ADA-based algorithms is their disadvantage, and speeding them up is an avenue for future work. However, this does not mean that ADA is impractical.
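A simple normalization of the kind mentioned above can be as cheap as min-max scaling over the current population's objective vectors, an O(µM) operation per generation. The following is a hypothetical sketch of this idea, not the normalization used by the original θ-DEA:

```python
import numpy as np

def normalize_objectives(F):
    """Min-max normalize an objective matrix F of shape (mu, M) using the
    per-objective minimum and maximum of the current population: an
    O(mu * M) operation per generation. (Hypothetical sketch; not the
    normalization used by the original theta-DEA.)"""
    F = np.asarray(F, dtype=float)
    f_min, f_max = F.min(axis=0), F.max(axis=0)
    scale = np.where(f_max > f_min, f_max - f_min, 1.0)  # guard degenerate objectives
    return (F - f_min) / scale
```

Unlike hyperplane-based normalization, which fits an M-dimensional hyperplane to extreme points each time, this scheme needs only one pass over the objective matrix.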
Some real-world problems require a long computation time to evaluate a solution, e.g., by executing an expensive computer simulation [59]. In this case, the time for function evaluations dominates that of all other parts of an optimization algorithm. For this reason, comparisons in the EMO community are generally based on the number of function evaluations rather than on CPU time.

VI. CONCLUSION
We proposed the ADA framework to improve the performance of decomposition-based EMOAs for MMOPs. ADA adaptively assigns one or more individuals to each subproblem to locate multiple equivalent Pareto optimal solutions. The population size µ is automatically adjusted during the search process. As presented in Subsection III-G, effective decision making can be performed using the multiple equivalent solutions found by ADA. We incorporated ADA into six decomposition-based EMOAs and examined the performance of the six ADA-based algorithms on the 15 test problem instances and on the 21 test problem instances with distance-related variables. The results show that ADA can improve the performance of the original EMOAs for MMOPs. We also analyzed the adaptive behavior of ADA.

The computational overhead of the ADA-based algorithms is their disadvantage (see Subsection V-G). Thus, the unbounded population size of ADA is a "double-edged sword". However, as discussed in Subsection V-G, this overhead is practically acceptable in real-world applications where solution evaluations account for the dominant share of the computation time of the whole optimization process.

For a fair comparison, we compared the ADA-based algorithms with the three EMMAs using the UEA in Subsection V-F. However, the UEA is inherently a post-processing method, since an algorithm cannot exploit the non-dominated solutions in the UEA during the search process in the way the ADA-based algorithms exploit their populations. For this reason, even the comparison with the three EMMAs using the UEA is not totally fair. A general adaptive population sizing framework that can be combined with any EMMA is needed for a totally fair comparison.

Although we focused on Type1-MMOPs, we believe that ADA is applicable to Type2-MMOPs by using a relaxed dominance relation or a relaxed equivalence relation.
An analysis of ADA on Type2-MMOPs is an avenue for future work. Other post-processing methods that handle diversity in both the objective and solution spaces may be better than the two post-processing methods used in ADA. Further analysis of post-processing methods is a future research topic.

ACKNOWLEDGEMENT
This work was supported by the National Natural Science Foundation of China (Grant No. 61876075), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), the Shenzhen Peacock Plan (Grant No. KQTD2016112514355531), the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284), and the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008).

REFERENCES

[1] K. Miettinen, Nonlinear Multiobjective Optimization. Springer, 1998.
[2] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE TEVC, vol. 6, no. 2, pp. 182–197, 2002.
[3] E. Zitzler and S. Künzli, "Indicator-based selection in multiobjective search," in PPSN, 2004, pp. 832–842.
[4] Q. Zhang and H. Li, "MOEA/D: A multiobjective evolutionary algorithm based on decomposition," IEEE TEVC, vol. 11, no. 6, pp. 712–731, 2007.
[5] K. Deb and H. Jain, "An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints," IEEE TEVC, vol. 18, no. 4, pp. 577–601, 2014.
[6] A. Trivedi, D. Srinivasan, K. Sanyal, and A. Ghosh, "A survey of multiobjective evolutionary algorithms based on decomposition," IEEE TEVC, vol. 21, no. 3, pp. 440–462, 2017.
[7] M. Sebag, N. Tarrisson, O. Teytaud, J. Lefèvre, and S. Baillet, "A multi-objective multi-modal optimization approach for mining stable spatio-temporal patterns," in IJCAI, 2005, pp. 859–864.
[8] G. Rudolph and M. Preuss, "A multiobjective approach for finding equivalent inverse images of Pareto-optimal objective vectors," in MCDM, 2009, pp. 74–79.
[9] O. Schütze, M. Vasile, and C. A. C. Coello, "Computing the set of epsilon-efficient solutions in multiobjective space mission design," JACIC, vol. 8, no. 3, pp. 53–70, 2011.
[10] T. Hiroyasu, S. Nakayama, and M. Miki, "Comparison study of SPEA2+, SPEA2, and NSGA-II in diesel engine emissions and fuel economy problem," in IEEE CEC, 2005, pp. 236–242.
[11] M. Preuss, C. Kausch, C. Bouvy, and F. Henrich, "Decision space diversity can be essential for solving multiobjective real-world problems," in MCDM, 2008, pp. 367–377.
[12] F. Kudo, T. Yoshikawa, and T. Furuhashi, "A study on analysis of design variables in Pareto solutions for conceptual design optimization problem of hybrid rocket engine," in IEEE CEC, 2011, pp. 2558–2562.
[13] K. Deb and S. Tiwari, "Omni-optimizer: A generic evolutionary algorithm for single and multi-objective optimization," EJOR, vol. 185, no. 3, pp. 1062–1087, 2008.
[14] T. Ulrich, J. Bader, and L. Thiele, "Defining and optimizing indicator-based diversity measures in multiobjective search," in PPSN, 2010, pp. 707–717.
[15] O. M. Shir, M. Preuss, B. Naujoks, and M. T. M. Emmerich, "Enhancing decision space diversity in evolutionary multiobjective algorithms," in EMO, 2009, pp. 95–109.
[16] C. Yue, B. Qu, and J. Liang, "A multiobjective particle swarm optimizer using ring topology for solving multimodal multiobjective problems," IEEE TEVC, vol. 22, no. 5, pp. 805–817, 2018.
[17] R. Tanabe and H. Ishibuchi, "A decomposition-based evolutionary algorithm for multi-modal multi-objective optimization," in PPSN, 2018, pp. 249–261.
[18] Y. Yuan, H. Xu, B. Wang, and X. Yao, "A new dominance relation-based evolutionary algorithm for many-objective optimization," IEEE TEVC, vol. 20, no. 1, pp. 16–37, 2016.
[19] Z. Wang, Q. Zhang, A. Zhou, M. Gong, and L. Jiao, "Adaptive replacement strategies for MOEA/D," IEEE Trans. Cyber., vol. 46, no. 2, pp. 474–486, 2016.
[20] Y. Yuan, H. Xu, B. Wang, B. Zhang, and X. Yao, "Balancing convergence and diversity in decomposition-based many-objective optimizers," IEEE TEVC, vol. 20, no. 2, pp. 180–198, 2016.
[21] S. Jiang, S. Yang, Y. Wang, and X. Liu, "Scalarizing functions in decomposition-based multiobjective evolutionary algorithms," IEEE TEVC, vol. 22, no. 2, pp. 296–313, 2018.
[22] R. Cheng, Y. Jin, M. Olhofer, and B. Sendhoff, "A reference vector guided evolutionary algorithm for many-objective optimization," IEEE TEVC, vol. 20, no. 5, pp. 773–791, 2016.
[23] B. Zhang, K. Shafi, and H. A. Abbass, "On benchmark problems and metrics for decision space performance analysis in multi-objective optimization," IJCIA, vol. 16, no. 1, pp. 1–18, 2017.
[24] Y. Liu, G. G. Yen, and D. Gong, "A multi-modal multi-objective evolutionary algorithm using two-archive and recombination strategies," IEEE TEVC, 2018 (in press).
[25] K. Deb and S. Tiwari, "Omni-optimizer: A procedure for single and multi-objective optimization," in EMO, 2005, pp. 47–61.
[26] H. Li and Q. Zhang, "Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II," IEEE TEVC, vol. 13, no. 2, pp. 284–302, 2009.
[27] K. Deb and R. B. Agrawal, "Simulated binary crossover for continuous search space," Complex Systems, vol. 9, no. 2, 1995.
[28] R. Storn and K. Price, "Differential evolution: A simple and efficient heuristic for global optimization over continuous spaces," J. Global Optimiz., vol. 11, no. 4, pp. 341–359, 1997.
[29] Y. Qi, X. Ma, F. Liu, L. Jiao, J. Sun, and J. Wu, "MOEA/D with adaptive weight adjustment," Evol. Comput., vol. 22, no. 2, pp. 231–264, 2014.
[30] Y. Xiang, Y. Zhou, M. Li, and Z. Chen, "A vector angle-based evolutionary algorithm for unconstrained many-objective optimization," IEEE TEVC, vol. 21, no. 1, pp. 131–152, 2017.
[31] S. Jiang and S. Yang, "A strength Pareto evolutionary algorithm based on reference direction for multiobjective and many-objective optimization," IEEE TEVC, vol. 21, no. 3, pp. 329–346, 2017.
[32] A. Zhou, Q. Zhang, and Y. Jin, "Approximating the set of Pareto-optimal solutions in both the decision and objective spaces by an estimation of distribution algorithm," IEEE TEVC, vol. 13, no. 5, pp. 1167–1189, 2009.
[33] J. C. Castillo, C. Segura, A. H. Aguirre, G. Miranda, and C. León, "A multi-objective decomposition-based evolutionary algorithm with enhanced variable space diversity control," in GECCO (Companion), 2017, pp. 1565–1571.
[34] X. Li, M. G. Epitropakis, K. Deb, and A. P. Engelbrecht, "Seeking multiple solutions: An updated survey on niching methods and their applications," IEEE TEVC, vol. 21, no. 4, pp. 518–538, 2017.
[35] S. Bird and X. Li, "Adaptively choosing niching parameters in a PSO," in GECCO, 2006, pp. 3–10.
[36] M. Preuss, "Improved topological niching for real-valued global optimization," in EvoApplications, 2012, pp. 386–395.
[37] P. Kerschke, H. Wang, M. Preuss, C. Grimme, A. H. Deutz, H. Trautmann, and M. Emmerich, "Towards analyzing multimodality of continuous multiobjective landscapes," in PPSN, 2016, pp. 962–972.
[38] G. Rudolph, B. Naujoks, and M. Preuss, "Capabilities of EMOA to detect and preserve equivalent Pareto subsets," in EMO, 2007, pp. 36–50.
[39] K. Bringmann, T. Friedrich, and P. Klitzke, "Two-dimensional subset selection for hypervolume and epsilon-indicator," in GECCO, 2014, pp. 589–596.
[40] H. K. Singh, K. S. Bhattacharjee, and T. Ray, "Distance based subset selection for benchmarking in evolutionary multi/many-objective optimization," IEEE TEVC, 2018 (in press).
[41] H. Ishibuchi, Y. Setoguchi, H. Masuda, and Y. Nojima, "How to compare many-objective algorithms under different settings of population and archive sizes," in IEEE CEC, 2016, pp. 1149–1156.
[42] H. Ishibuchi, H. Masuda, Y. Tanigaki, and Y. Nojima, "Modified distance calculation in generational distance and inverted generational distance," in EMO, 2015, pp. 110–125.
[43] C. A. C. Coello and M. R. Sierra, "A study of the parallelization of a coevolutionary multi-objective evolutionary algorithm," in MICAI, 2004, pp. 688–697.
[44] Q. Zhang, W. Liu, and H. Li, "The performance of a new version of MOEA/D on CEC09 unconstrained MOP test instances," in IEEE CEC, 2009, pp. 203–208.
[45] K. Li, Q. Zhang, S. Kwong, M. Li, and R. Wang, "Stable matching-based selection in evolutionary multiobjective optimization," IEEE TEVC, vol. 18, no. 6, pp. 909–923, 2014.
[46] M. Asafuddoula, T. Ray, and R. A. Sarker, "A decomposition-based evolutionary algorithm for many objective optimization," IEEE TEVC, vol. 19, no. 3, pp. 445–460, 2015.
[47] C. Hu and H. Ishibuchi, "Incorporation of a decision space diversity maintenance mechanism into MOEA/D for multi-modal multi-objective optimization," in GECCO (Companion), 2018, pp. 1898–1901.
[48] H. Wang, L. Jiao, and X. Yao, "Two_Arch2: An improved two-archive algorithm for many-objective optimization," IEEE TEVC, vol. 19, no. 4, pp. 524–541, 2015.
[49] X. Ma, F. Liu, Y. Qi, X. Wang, L. Li, L. Jiao, M. Yin, and M. Gong, "A multiobjective evolutionary algorithm based on decision variable analyses for multiobjective optimization problems with large-scale variables," IEEE TEVC, vol. 20, no. 2, pp. 275–298, 2016.
[50] M. Preuss, B. Naujoks, and G. Rudolph, "Pareto set and EMOA behavior for simple multimodal multiobjective functions," in PPSN, 2006, pp. 513–522.
[51] J. J. Liang, C. T. Yue, and B. Y. Qu, "Multimodal multi-objective optimization: A preliminary study," in IEEE CEC, 2016, pp. 2454–2461.
[52] H. Ishibuchi, Y. Hitotsuyanagi, N. Tsukamoto, and Y. Nojima, "Many-objective test problems to visually examine the behavior of multiobjective evolution in a decision space," in PPSN, 2010, pp. 91–100.
[53] J. Bader and E. Zitzler, "HypE: An algorithm for fast hypervolume-based many-objective optimization," Evol. Comput., vol. 19, no. 1, pp. 45–76, 2011.
[54] J. J. Durillo and A. J. Nebro, "jMetal: A Java framework for multi-objective optimization," AES, vol. 42, no. 10, pp. 760–771, 2011.
[55] T. Ulrich, J. Bader, and E. Zitzler, "Integrating decision space diversity into hypervolume-based multiobjective search," in GECCO, 2010, pp. 455–462.
[56] H. Ishibuchi, Y. Setoguchi, H. Masuda, and Y. Nojima, "Performance of decomposition-based many-objective algorithms strongly depends on Pareto front shapes," IEEE TEVC, vol. 21, no. 2, pp. 169–190, 2017.
[57] H. Ishibuchi, Y. Peng, and K. Shang, "A scalable multimodal multiobjective test problem," in IEEE CEC, 2019, pp. 302–309.
[58] R. Tanabe, H. Ishibuchi, and A. Oyama, "Benchmarking multi- and many-objective evolutionary algorithms under two optimization scenarios," IEEE Access, vol. 5, pp. 19597–19619, 2017.
[59] T. Chugh, K. Sindhya, J. Hakanen, and K. Miettinen, "A survey on handling computationally expensive multiobjective optimization problems with evolutionary algorithms," Soft Comput., vol. 23, no. 9, pp. 3137–3166, 2019.

Ryoji Tanabe is a Research Assistant Professor with the Department of Computer Science and Engineering, Southern University of Science and Technology, China. He was a Post-Doctoral Researcher with ISAS/JAXA, Japan, from 2016 to 2017. He received his Ph.D. in Science from The University of Tokyo, Japan, in 2016. His research interests include stochastic single- and multi-objective optimization algorithms, parameter control in evolutionary algorithms, and automatic algorithm configuration.