Fast Greedy Subset Selection from Large Candidate Solution Sets in Evolutionary Multi-objective Optimization
Weiyu Chen, Hisao Ishibuchi, Fellow, IEEE, and Ke Shang
Abstract—Subset selection is an interesting and important topic in the field of evolutionary multi-objective optimization (EMO). Especially, in an EMO algorithm with an unbounded external archive, subset selection is an essential post-processing procedure to select a pre-specified number of solutions as the final result. In this paper, we discuss the efficiency of greedy subset selection for the hypervolume, IGD and IGD+ indicators. Greedy algorithms can usually handle subset selection efficiently. However, when a large number of solutions are given (e.g., subset selection from tens of thousands of solutions in an unbounded external archive), they often become time-consuming. Our idea is to use the submodular property, which is known for the hypervolume indicator, to improve their efficiency. First, we prove that the IGD and IGD+ indicators are also submodular. Next, based on the submodular property, we propose an efficient greedy inclusion algorithm for each indicator. Then, we demonstrate through computational experiments that the proposed algorithms are much faster than the standard greedy subset selection algorithms.
Index Terms—Evolutionary multi-objective optimization, evolutionary many-objective optimization, subset selection, performance indicators, submodularity.
I. INTRODUCTION

EVOLUTIONARY multi-objective optimization (EMO) algorithms aim to optimize m potentially conflicting objectives concurrently. To compare solutions, we usually use the Pareto-dominance relation [1]. In a multi-objective minimization problem with m objectives f_i(x), i = 1, ..., m, solution a dominates solution b (i.e., a ≺ b) if and only if ∀i ∈ {1, ..., m}, f_i(a) ≤ f_i(b) and ∃j ∈ {1, ..., m}, f_j(a) < f_j(b).

Our computational experiments demonstrate the efficiency of the proposed submodular property-based algorithms for the three performance indicators.

II. HYPERVOLUME SUBSET SELECTION

A. Hypervolume Indicator and Hypervolume Contribution

The hypervolume indicator [27]-[29] is a widely used metric to evaluate the diversity and convergence of a solution set. It is defined as the size of the region of the objective space which is covered by a set of non-dominated solutions and bounded by a reference set R. Formally, the hypervolume of a solution set S is defined as follows:

$$HV(S) := \int_{z \in \mathbb{R}^m} A_S(z)\,dz, \tag{1}$$

where m is the number of dimensions and A_S : R^m → {0, 1} is the attainment function of S with respect to the reference set R, which can be written as

$$A_S(z) = \begin{cases} 1 & \text{if } \exists s \in S,\ r \in R : f(s) \preceq z \preceq r, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Calculating the hypervolume of a solution set is a computationally expensive task, for which algorithms such as HSO [30], [31], HOY [32]-[34] and WFG [35] have been proposed. The hypervolume contribution of a solution p to a set S is

$$HVC(p, S) = HV(S \cup \{p\}) - HV(S). \tag{3}$$

Fig. 1 illustrates the hypervolume of a solution set and the hypervolume contribution of a solution to the solution set in two dimensions. The grey region is the hypervolume of the solution set S = {a, b, c, d, e} and the yellow region is the hypervolume contribution of a solution p to S.

Fig. 1. The hypervolume of the solution set S = {a, b, c, d, e} and the hypervolume contribution of p to the solution set S for a two-objective minimization problem.

Note that calculating the hypervolume contribution based on its definition in (3) requires two hypervolume calculations, which is not very efficient. Bringmann and Friedrich [19] and Bradstreet et al. [36] proposed a new calculation method to reduce the amount of calculation. The hypervolume contribution is calculated as

$$HVC(p, S) = HV(\{p\}) - HV(S'), \tag{4}$$

where

$$S' = \{limit(s, p) \mid s \in S\}, \tag{5}$$

$$limit((s_1, ..., s_m), (p_1, ..., p_m)) = (worse(s_1, p_1), ..., worse(s_m, p_m)). \tag{6}$$

In this formulation, worse(s_i, p_i) takes the larger of the two values for minimization problems. Compared to the straightforward calculation method in (3), this method is much more efficient. The hypervolume of one solution (i.e., HV({p})) can be easily calculated, and we can apply the above-mentioned HSO [30], [31], HOY [32]-[34] and WFG [35] algorithms to calculate the hypervolume of the reduced solution set S' (i.e., HV(S')).

Let us take Fig. 2 as an example. Suppose we want to calculate the hypervolume contribution of solution p to a solution set S = {a, b, c, d, e}. First, for each solution in S, we replace each of its objective values with the corresponding value of solution p if the value of p is larger (i.e., we calculate limit(a, p), ..., limit(e, p)). This leads to S' = {a', b', c', d', e'}. After the replacement, e' is dominated by d'. Thus e' can be removed from S' since e' has no contribution to the hypervolume of S'. Similarly, a' and b' can also be removed from S'. Then, we calculate the hypervolume of S' (i.e., the area of the gray region in Fig. 2) and subtract it from the hypervolume of solution p. The remaining yellow part is the hypervolume contribution of solution p.

Fig. 2. Illustration of the efficient hypervolume contribution computation method.
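To make (4)-(6) concrete, the following is a minimal Python sketch (ours, not the authors' code). The helper hypervolume computes the exact hypervolume of a minimization point set by recursive slicing along the first objective, in the spirit of HSO (the experiments in this paper use the faster WFG algorithm instead), and hv_contribution applies the worse/limit trick. It assumes every point is dominated by the reference point; all names are ours.

```python
def worse(a, b):
    # component-wise worse (i.e., larger, for minimization) of two points;
    # this is the limit operator of (6) applied to a whole point
    return tuple(max(ai, bi) for ai, bi in zip(a, b))

def hypervolume(points, ref):
    """Exact hypervolume of a minimization point set w.r.t. the reference
    point ref, by recursive slicing along the first objective (HSO-style)."""
    pts = sorted(p for p in points if all(pi < ri for pi, ri in zip(p, ref)))
    if not pts:
        return 0.0
    if len(ref) == 1:
        return ref[0] - pts[0][0]          # 1-D base case
    total = 0.0
    for i, p in enumerate(pts):
        upper = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        width = upper - p[0]               # slab thickness along objective 1
        if width > 0:
            # points with f1 <= p[0] cover this slab; recurse on projections
            total += width * hypervolume([q[1:] for q in pts[: i + 1]], ref[1:])
    return total

def hv_contribution(p, S, ref):
    """HVC(p, S) = HV({p}) - HV(S') with S' = {limit(s, p) | s in S}, Eq. (4)."""
    hv_p = 1.0
    for pi, ri in zip(p, ref):
        hv_p *= ri - pi                    # HV({p}) is just a box volume
    return hv_p - hypervolume([worse(s, p) for s in S], ref)
```

For example, with S = {(1, 3), (2, 2), (3, 1)} and ref = (4, 4), hv_contribution((1.5, 1.5), S, ref) returns 1.25, which equals HV(S ∪ {p}) − HV(S) computed directly from (3).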
B. Hypervolume Subset Selection Problem

The hypervolume subset selection problem (HSSP) [13] uses the hypervolume indicator as the solution selection criterion (i.e., the hypervolume indicator is used as the performance indicator g in Definition 1). The HSSP aims to select a pre-specified number of solutions from a given candidate solution set to maximize the hypervolume of the selected solutions. For two-objective problems, the HSSP can be solved with time complexity O(nk + n log n) [18] and O((n − k)k + n log n) [15]. For multi-objective problems with three or more objectives, the HSSP is NP-hard [37]. It is impractical to try to find the exact optimal solution set when the size of the candidate set is large and/or the dimensionality of the objective space is high. In practice, greedy heuristic algorithms and genetic algorithms are employed to obtain an approximated optimal solution set.

C. Greedy Hypervolume Subset Selection

Hypervolume greedy inclusion selects solutions from V one by one. In each iteration, the solution that has the largest hypervolume contribution to the selected solution set is selected, until the required number of solutions have been selected. The pseudocode of greedy inclusion is shown in Algorithm 1. Since the hypervolume is a submodular indicator [38], hypervolume greedy inclusion algorithms provide a (1 − 1/e)-approximation to the HSSP [6].

Hypervolume greedy removal algorithms discard the solution with the least hypervolume contribution to the current solution set in each iteration. Unlike greedy inclusion, greedy removal has no approximation guarantee: it can obtain an arbitrarily bad solution subset [39]. However, in practice, it usually leads to good approximations.

Although the greedy algorithm has polynomial complexity, it still takes a long computation time when the candidate set is large or the dimension is high. Many algorithms have been proposed to accelerate the naïve greedy algorithm. For example, to accelerate the hypervolume-based greedy removal algorithm, Incremental Hypervolume by Slicing Objectives (IHSO*) [36] and Incremental WFG (IWFG) [40] were proposed to quickly identify the solution with the least hypervolume contribution. Experimental results show that these methods can significantly accelerate greedy removal algorithms.

Algorithm 1 Greedy Inclusion Hypervolume Subset Selection
Input: V (a set of non-dominated solutions), k (solution subset size)
Output: S (the selected subset from V)
if |V| < k then
    S = V
else
    S = ∅
    while |S| < k do
        for each s_i in V \ S do
            Calculate the hypervolume contribution of s_i to S
        end for
        p = the solution in V \ S with the largest hypervolume contribution
        S = S ∪ {p}
    end while
end if
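As a baseline for the faster variants discussed below, a direct Python transcription of Algorithm 1 could look as follows. This is a sketch reusing the hypothetical hv_contribution helper from the previous snippet.

```python
def gi_hss(V, k, ref):
    """Standard greedy inclusion HSS (Algorithm 1): in each iteration,
    recompute every remaining candidate's contribution and take the best."""
    if len(V) < k:
        return list(V)
    S, remaining = [], list(V)
    while len(S) < k:
        best = max(remaining, key=lambda p: hv_contribution(p, S, ref))
        remaining.remove(best)
        S.append(best)
    return S
```

Note that every outer iteration re-evaluates all remaining candidates; this repeated re-evaluation is exactly what the update-based and lazy methods below try to avoid.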
Hypervolume-based greedy inclusion/removal algorithms can be accelerated by updating hypervolume contributions instead of recalculating them in each iteration (i.e., by utilizing the calculation results of the previous iteration instead of calculating the hypervolume contributions in each iteration independently). Guerreiro et al. [16] proposed an algorithm to update the hypervolume contributions efficiently in three and four dimensions. Using their algorithm, the time complexity of hypervolume-based greedy removal in three and four dimensions can be reduced to O(n(n − k) + n log n) and O(n²(n − k)), respectively.

In a hypervolume-based EMO algorithm called FV-MOEA proposed by Jiang et al. [41], an efficient hypervolume contribution update method applicable to any dimension was proposed. The main idea of their method is that the hypervolume contribution of a solution is only associated with a small number of its neighboring solutions rather than all solutions in the solution set. Suppose that a solution s_j has just been removed from the solution set S. The main process of the hypervolume contribution update method in [41] is shown in Algorithm 2.

Algorithm 2 Hypervolume Contribution Update
Input: HVC (the hypervolume contribution of each solution in S), s_j (the newly removed solution)
Output: HVC (the updated hypervolume contribution of each solution in S)
1: for each s_k ∈ S do
2:     w = worse(s_k, s_j)
3:     W = {limit(t, w) | t ∈ S \ {s_k}}
4:     HVC(s_k) = HVC(s_k) + HV({w}) − HV(W)
5: end for

The worse and limit operations in Algorithm 2 are the same as those in Section II-A. Let us explain the basic idea of Algorithm 2 using Fig. 3. For the solution set S = {a, b, c, d, e} in Fig. 3, the hypervolume contribution of solution c is the blue area. When solution b is removed, the hypervolume contribution of c is updated as follows. The worse solution w in line 2 of Algorithm 2 has the maximum objective values of solutions b and c. In line 3, the limit operator first changes solutions a, d and e to a', d' and e'. Next, the dominated solution e' is removed. Then the solution set W = {a', d'} is obtained. In line 4, the hypervolume contribution of c is updated by adding the term HV({w}) − HV(W) to its original value (i.e., the blue region in Fig. 3). The added term is the joint hypervolume contribution of solutions b and c (i.e., the yellow region in Fig. 3). In this way, the hypervolume contribution of each solution is updated.

Fig. 3. Illustration of the hypervolume contribution update method in FV-MOEA. In this figure, it is assumed that point b has just been removed and the hypervolume contribution of point c is to be updated.

Since the limit process reduces the number of non-dominated solutions, this update method greatly improves the speed of hypervolume-based greedy removal algorithms. Algorithm 2 in [41] is the fastest known algorithm for updating the hypervolume contributions in any dimension.
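In Python, the update rule of Algorithm 2 can be sketched as follows, reusing the worse and hypervolume helpers from the earlier snippet. As the Fig. 3 example indicates, we read line 3 as applying the limit operator to the solutions other than s_k; the function and variable names are ours.

```python
def update_contributions(HVC, S, s_j, ref):
    """Algorithm 2 sketch: s_j has just been removed, S is the remaining set,
    and HVC[idx] holds the current contribution of S[idx]. Each contribution
    grows by the joint contribution of s_j and s_k (yellow region in Fig. 3)."""
    for idx, s_k in enumerate(S):
        w = worse(s_k, s_j)                           # line 2
        W = [worse(t, w) for t in S if t is not s_k]  # line 3: limit(t, w)
        hv_w = 1.0
        for wi, ri in zip(w, ref):
            hv_w *= ri - wi                           # HV({w}) is a box volume
        HVC[idx] += hv_w - hypervolume(W, ref)        # line 4
    return HVC
```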
III. IGD AND IGD+ SUBSET SELECTION

The Inverted Generational Distance (IGD) indicator [42] is another widely used indicator to evaluate solution sets. It calculates the average Euclidean distance from each reference point to its nearest solution. Formally, the IGD of a solution set S with respect to a reference set R is calculated as follows:

$$IGD(S, R) = \frac{1}{|R|} \sum_{r \in R} \min_{s \in S} \sqrt{\sum_{i=1}^{m} (s_i - r_i)^2}. \tag{7}$$

The IGD indicator can evaluate the convergence and diversity of a solution set with polynomial time complexity. However, one disadvantage of the IGD indicator is that it is not Pareto compliant [43]. This means that a solution set A which dominates another solution set B may have a worse IGD value than B. The Inverted Generational Distance plus (IGD+) indicator [43] overcomes this disadvantage. Instead of using the Euclidean distance to measure the difference between a solution and a reference point, the IGD+ indicator uses a different distance called the IGD+ distance, which is based on the dominance relation. The IGD+ distance between a solution s and a reference point r is calculated as follows (for minimization problems):

$$D^{+}(s, r) = \sqrt{\sum_{i=1}^{m} \left(\max\{0,\, s_i - r_i\}\right)^2}. \tag{8}$$

Based on the definition of the IGD+ distance, the IGD+ value of a solution set S with respect to a reference set R is:

$$IGD^{+}(S, R) = \frac{1}{|R|} \sum_{r \in R} \min_{s \in S} \sqrt{\sum_{i=1}^{m} \left(\max\{0,\, s_i - r_i\}\right)^2}. \tag{9}$$

Fig. 4 illustrates the calculation of IGD and IGD+ of a solution set. In this example, the solution set S has three solutions a, b and c (the red points in the figure), and the reference set R has four points α, β, γ, δ (the blue points in the figure). To calculate the IGD value of S, we first find the minimum Euclidean distance from each reference point in R to the solutions in the solution set S (shown as black lines in Fig. 4). Then, the IGD of S is the mean value of these distances. To calculate the IGD+ of S in Fig. 4, we calculate the IGD+ distance between each reference point and each solution. Since solution b is dominated by the reference point β, b_i − β_i is larger than or equal to zero for all objectives. Therefore, the IGD+ distance from reference point β to b is the same as the Euclidean distance. For solution a and reference point α, since a_1 − α_1 is smaller than zero, we only calculate the positive part (i.e., a_2 − α_2). Thus, the IGD+ distance between a and α is the green dashed line between a and α. In a similar way, we can obtain the IGD+ distance from each reference point to each solution and find the minimum IGD+ distance from each reference point to the solution set S (the green dashed lines in Fig. 4). The IGD+ of S is the mean value of these distances.

Fig. 4. Illustration of the calculation of IGD and IGD+.
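Equations (7)-(9) translate directly into NumPy. The sketch below assumes a minimization problem, with S and R given as arrays of shape (|S|, m) and (|R|, m); the function names are ours.

```python
import numpy as np

def igd(S, R):
    """Eq. (7): mean over reference points of the Euclidean distance
    to the nearest solution."""
    # dists[j, i] = Euclidean distance between reference point j and solution i
    dists = np.linalg.norm(R[:, None, :] - S[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def igd_plus(S, R):
    """Eq. (9): same structure as IGD, but with the dominance-based IGD+
    distance (8), which only counts coordinates where the solution is worse."""
    diff = np.maximum(0.0, S[None, :, :] - R[:, None, :])  # max{0, s_i - r_i}
    dists = np.linalg.norm(diff, axis=2)
    return dists.min(axis=1).mean()
```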
A. IGD/IGD+ Subset Selection Problem

In contrast to the hypervolume, a smaller IGD or IGD+ value indicates a better solution set. Therefore, the IGD subset selection problem takes the minus IGD of a solution set as the performance indicator g in Definition 1, and the IGD+ subset selection problem takes the minus IGD+ of a solution set. Similar to the HSSP, it is impractical to obtain the optimal subset unless the candidate solution set size is small.

Although IGD is a popular indicator to evaluate the performance of an EMO algorithm, the IGD subset selection problem is seldom studied. The reason is that calculating the IGD needs a reference point set R. When evaluating an algorithm on an artificial test problem, we can use points on the true Pareto front (PF) of the test problem to evaluate the solution set (since the true PF is known). However, in general, the true PF of the problem is unknown, which brings difficulties to the calculation of IGD [44].

One approach is to generate reference points from the estimated PF as in IGD-based EMO algorithms [45], [46]. First, the extreme points in the current solution set are found. Next, based on these extreme points, we specify a hyperplane in the objective space. Then, we uniformly sample reference points from the hyperplane for IGD/IGD+ calculation.

Another approach is to use the whole candidate solution set as the reference point set. In EMO algorithms with a bounded or unbounded external archive, some or all non-dominated solutions among the evaluated solutions during their execution are stored externally. In this paper, we assume such an EMO algorithm, and we use all the stored non-dominated solutions in the external archive as the reference point set for IGD/IGD+ calculation. As shown in Table III in Section V for some real-world problems, tens of thousands of non-dominated solutions are found by a single run of an EMO algorithm. However, it is often unrealistic to show such a large number of solutions to the decision maker. Thus, subset selection is used to choose a pre-specified number of solutions. Subset selection is also needed in performance comparison of EMO algorithms to compare them using solution sets of the same size. In the proposed IGD/IGD+ subset selection algorithms, all the obtained non-dominated solutions (i.e., all candidate solutions) are used as the reference point set.

The following are two advantages of IGD and IGD+ subset selection over hypervolume subset selection: (i) When the number of dimensions is large, calculating the hypervolume is very time-consuming since hypervolume calculation is #P-hard. (ii) Optimal µ-distributions in three or higher dimensions are still unknown for the hypervolume indicator. In contrast to hypervolume, IGD and IGD+ subset selection can be formulated as a problem to minimize the expected loss function [17]. From this point of view, IGD subset selection and IGD+ subset selection have a clear meaning for the decision-maker.

B. IGD/IGD+ Subset Selection Algorithms

Similar to the HSSP, we can use greedy inclusion algorithms, greedy removal algorithms and evolutionary algorithms to find an approximate solution to the IGD and IGD+ subset selection problems. In greedy algorithms, we always need to calculate the IGD improvement of a solution a to the current solution set S with respect to the reference set R. According to the definition of IGD, the IGD improvement of solution a to S can be calculated as IGD(S, R) − IGD(S ∪ {a}, R). However, this is not efficient since it needs two IGD calculations. The time complexity of this IGD improvement calculation is O(m|S||R|) for an m-objective problem with a candidate solution set S and a reference point set R.

In this paper, we use a more efficient calculation method. When calculating the IGD of a solution set, we also use an array D to store the distance from each reference point to the nearest solution in the solution set S. When we want to calculate the IGD improvement of a new solution a to S, we only need to calculate the distance from each reference point to a and store these distances in a new distance array D'. Then, we compare each item in D and D'. If the item in D' is smaller than that in D, we replace the item in D with the corresponding item in D'. The new IGD of the solution set S ∪ {a} is the average value of the items in the updated array D. Finally, we subtract the new IGD from the original IGD to obtain the IGD improvement. The proposed method reduces the complexity of the IGD improvement calculation to O(m|R|), which significantly decreases the computation time.
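The update-based improvement calculation takes only a few lines of NumPy. This is a sketch with our own names: D is the cached array of current nearest distances (one entry per reference point), R the reference point matrix.

```python
import numpy as np

def igd_improvement(a, D, R):
    """O(m|R|) IGD improvement of adding solution a to the current subset."""
    D_new = np.linalg.norm(R - a, axis=1)    # distance from each r in R to a
    return D.mean() - np.minimum(D, D_new).mean()

def update_cache(a, D, R):
    """Refresh D after a has actually been selected."""
    return np.minimum(D, np.linalg.norm(R - a, axis=1))
```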
The details of the IGD greedy inclusion algorithm with the proposed efficient IGD improvement calculation are shown in Algorithm 3. In this algorithm, mean(·) calculates the mean value of the items in an array, and min(·, ·) takes the smaller of the two items that have the same index in the two arrays. Algorithm 3 is for IGD subset selection. It can be easily changed to an algorithm for IGD+ subset selection by replacing the Euclidean distance with the IGD+ distance. Besides, the efficient IGD improvement calculation method can also be used in IGD and IGD+ greedy removal algorithms.

Algorithm 3 Greedy Inclusion IGD Subset Selection
Input: V (a set of non-dominated solutions), k (solution subset size)
Output: S (the selected subset from V)
if |V| < k then
    S = V
else
    S = ∅
    s_min = the solution s_i in V with the smallest IGD({s_i}, V)
    S = S ∪ {s_min}
    D = Euclidean distance from each reference point to s_min
    while |S| < k do
        for each s_i in V \ S do
            D' = Euclidean distance from each reference point to s_i
            c_i = mean(D) − mean(min(D, D'))
        end for
        p = the solution in V \ S with the largest c_i
        D' = Euclidean distance from each reference point to p
        D = min(D, D')
        S = S ∪ {p}
    end while
end if

IV. THE PROPOSED LAZY GREEDY ALGORITHMS

A. Submodularity of Indicators

The core idea of the proposed algorithms is to exploit the submodular property of the hypervolume, IGD and IGD+ indicators. A submodular function is a set function that satisfies the diminishing returns property. Formally, the definition of a submodular function is as follows [6].

Definition 2 (Submodular Function). A real-valued function z defined on the set of all subsets of V that satisfies

$$z(S_2 \cup \{p\}) - z(S_2) \le z(S_1 \cup \{p\}) - z(S_1), \quad S_1 \subset S_2 \subset V,\ p \in V - S_2,$$

is called a submodular set function.

Note that the submodular property is different from the non-decreasing property of set functions. To further illustrate the difference between them, we show three types of set functions in Fig. 5. The hypervolume indicator is similar to z_1, which is non-decreasing (i.e., HV(Y) ≥ HV(X) if X ⊆ Y) and submodular (i.e., HVC(p, X) ≥ HVC(p, Y) if X ⊆ Y). z_2 is submodular but not non-decreasing. z_3 is non-decreasing but not submodular.

Fig. 5. Illustration of three types of set functions.

TABLE I
SUMMARY OF NOTATIONS

A, B, X           Solution sets
R                 Reference set
x, y              Solutions that do not belong to A
i                 Index of a reference point in the reference set
I_1, I_2          Sets of reference point indexes whose closest distance to the solution set is changed after adding x and y, respectively
Δd_i^x, Δd_i^y    Change of the closest distance between reference point i and the solution set by adding x and y, respectively

The hypervolume indicator has been proved to be non-decreasing submodular [38]. Now, we show that the minus IGD indicator and the minus IGD+ indicator are both non-decreasing submodular. "Minus" here means taking the negative of the indicator value.
Theorem 1. The minus IGD indicator is non-decreasing submodular.

Proof: Let us consider a solution set A, a reference set R and two solutions x and y such that x ∉ A and y ∉ A. By adding the solution x to A, the nearest distance from some reference points to the solution set changes. Let us denote the set of indexes of these reference points as I_1, and the difference between the original distance and the new distance as Δd_i^x, i ∈ I_1. Similarly, by adding the solution y, the set of indexes of reference points whose nearest distance changes is denoted as I_2, and the corresponding distance differences are denoted as Δd_i^y, i ∈ I_2. The notations used in the proof are summarized in Table I.

According to the definition of IGD,

$$IGD(A \cup \{x\}) = IGD(A) - \frac{1}{|R|}\sum_{i \in I_1} \Delta d_i^x.$$

Since Δd_i^x is larger than zero, the IGD indicator is non-increasing. Therefore, the minus IGD indicator is non-decreasing.

Similarly, by adding the solution y to the solution set A ∪ {x},

$$IGD(A \cup \{x\} \cup \{y\}) = IGD(A) - \frac{1}{|R|}\sum_{i \in I_1 \setminus (I_1 \cap I_2)} \Delta d_i^x - \frac{1}{|R|}\sum_{i \in I_2 \setminus (I_1 \cap I_2)} \Delta d_i^y - \frac{1}{|R|}\sum_{i \in I_1 \cap I_2} \max\{\Delta d_i^x, \Delta d_i^y\}.$$

In the expression

$$\frac{1}{|R|}\sum_{i \in I_1 \setminus (I_1 \cap I_2)} \Delta d_i^x + \frac{1}{|R|}\sum_{i \in I_2 \setminus (I_1 \cap I_2)} \Delta d_i^y + \frac{1}{|R|}\sum_{i \in I_1 \cap I_2} \max\{\Delta d_i^x, \Delta d_i^y\},$$

only the larger one of Δd_i^x and Δd_i^y is added for each i ∈ I_1 ∩ I_2, while in Σ_{i∈I_1} Δd_i^x + Σ_{i∈I_2} Δd_i^y all Δd_i^x and Δd_i^y are added together. Hence,

$$\frac{1}{|R|}\sum_{i \in I_1 \setminus (I_1 \cap I_2)} \Delta d_i^x + \frac{1}{|R|}\sum_{i \in I_2 \setminus (I_1 \cap I_2)} \Delta d_i^y + \frac{1}{|R|}\sum_{i \in I_1 \cap I_2} \max\{\Delta d_i^x, \Delta d_i^y\} \le \frac{1}{|R|}\sum_{i \in I_1} \Delta d_i^x + \frac{1}{|R|}\sum_{i \in I_2} \Delta d_i^y.$$

Now,

$$-IGD(A \cup \{y\}) - (-IGD(A)) = \frac{1}{|R|}\sum_{i \in I_2} \Delta d_i^y,$$

and

$$-IGD(A \cup \{x\} \cup \{y\}) - (-IGD(A \cup \{x\})) = \frac{1}{|R|}\sum_{i \in I_1 \setminus (I_1 \cap I_2)} \Delta d_i^x + \frac{1}{|R|}\sum_{i \in I_2 \setminus (I_1 \cap I_2)} \Delta d_i^y + \frac{1}{|R|}\sum_{i \in I_1 \cap I_2} \max\{\Delta d_i^x, \Delta d_i^y\} - \frac{1}{|R|}\sum_{i \in I_1} \Delta d_i^x \le \frac{1}{|R|}\sum_{i \in I_2} \Delta d_i^y.$$

Therefore, −IGD(A ∪ {y}) − (−IGD(A)) ≥ −IGD(A ∪ {x} ∪ {y}) − (−IGD(A ∪ {x})). Then, by mathematical induction, we obtain

$$-IGD(A \cup \{y\}) - (-IGD(A)) \ge -IGD(A \cup X \cup \{y\}) - (-IGD(A \cup X)),$$

where X is a solution set with X ∩ A = ∅. If we let B = A ∪ X, the above formula can be rewritten as −IGD(A ∪ {y}) − (−IGD(A)) ≥ −IGD(B ∪ {y}) − (−IGD(B)) for A ⊂ B, which is exactly the definition of a submodular function. Hence, the minus IGD indicator is non-decreasing submodular.

Theorem 2. The minus IGD+ indicator is non-decreasing submodular.

Proof: We can prove that the minus IGD+ indicator is also non-decreasing submodular in the same manner as in the proof of the submodularity of the IGD indicator. The only difference is to replace the Euclidean distance with the IGD+ distance.
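As an illustration (ours, not part of the original paper), the following short script numerically spot-checks Theorem 1 on random instances: for A ⊂ B and a new solution y, the IGD gain of adding y to A is never smaller than the gain of adding y to B.

```python
import numpy as np

def igd(S, R):
    # mean over reference points of the distance to the nearest solution
    return np.mean([min(np.linalg.norm(r - s) for s in S) for r in R])

rng = np.random.default_rng(0)
R = rng.random((50, 3))                   # reference points
pool = [rng.random(3) for _ in range(8)]  # candidate solutions
y = rng.random(3)
for a_size in range(1, 5):
    A = pool[:a_size]
    B = pool[:a_size + 3]                 # A is a proper subset of B
    gain_A = igd(A, R) - igd(A + [y], R)  # = -IGD(A∪{y}) - (-IGD(A))
    gain_B = igd(B, R) - igd(B + [y], R)
    assert gain_A >= gain_B - 1e-12       # diminishing returns (Theorem 1)
```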
B. Algorithm Proposal

In each iteration of the hypervolume greedy inclusion algorithm, we only need to identify the solution with the largest hypervolume contribution. However, we usually calculate the hypervolume contributions of all solutions. Since the calculation of the contribution of each solution is time-consuming, such an algorithm is not efficient.

As discussed in the last subsection, the hypervolume indicator is non-decreasing submodular. This property can help us avoid unnecessary calculations. The submodular property of the hypervolume indicator means that the hypervolume contribution of a solution to the selected solution subset S never increases as the number of solutions in S increases in a greedy inclusion manner. Hence, instead of recomputing the hypervolume contribution of every candidate solution in each iteration, we can utilize the following lazy evaluation mechanism.

We use a list C to store the candidate (i.e., unselected) solutions and their tentative HVC (hypervolume contribution) values. The tentative HVC value of each solution is initialized with its hypervolume (i.e., its hypervolume contribution when no solution is selected). The tentative HVC value of each solution is an upper bound of its true hypervolume contribution. To find the solution with the largest hypervolume contribution in the list, we pick the most promising solution with the largest tentative HVC value and recalculate its hypervolume contribution to the current solution subset S. If the recalculated hypervolume contribution of this solution is still the largest in the list, we do not have to calculate the hypervolume contributions of the other solutions. This is because the hypervolume contribution of each solution never increases through the execution of greedy inclusion. In this case (i.e., if the recalculated hypervolume contribution of the most promising solution is still the largest in the list), we move this solution from the list to the selected solution subset S. If the recalculated hypervolume contribution of this solution is not the largest in the list, its tentative HVC value is updated with the recalculated value. Then the most promising solution with the largest tentative HVC value in the list is examined (i.e., its hypervolume contribution is recalculated). This procedure is iterated until a recalculated hypervolume contribution is the largest in the list.

In many cases, the recalculation of the hypervolume contribution of a solution results in the same value as, or a slightly smaller value than, its tentative HVC value in the list, since the inclusion of a single solution in the solution subset S changes the hypervolume contributions of only its neighbors in the objective space. Thus, the solution with the largest hypervolume contribution is often found without examining all solutions in the list. By applying this lazy evaluation mechanism, we can avoid a lot of unnecessary calculations in hypervolume-based greedy inclusion algorithms. The details of the proposed lazy greedy inclusion hypervolume subset selection (LGI-HSS) algorithm are shown in Algorithm 4.

Algorithm 4 Lazy Greedy Inclusion Hypervolume Subset Selection (LGI-HSS)
Input: V (a set of non-dominated solutions), k (solution subset size)
Output: S (the selected subset from V)
if |V| < k then
    S = V
else
    S = ∅, C = ∅
    for each s_i in V do
        Insert (s_i, HV({s_i})) into C
    end for
    while |S| < k do
        while C ≠ ∅ do
            c_max = the solution with the largest tentative HVC in C
            Update the HVC of c_max to its contribution to S
            if c_max has the largest HVC in C then
                S = S ∪ {c_max}
                C = C \ {c_max}
                break
            end if
        end while
    end while
end if
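A compact Python sketch of Algorithm 4 keeps the list C as a max-heap (via negated keys) and tags each tentative value with the selection round in which it was computed; hv_contribution is the hypothetical helper from the Section II snippet, and the structure is ours.

```python
import heapq

def lgi_hss(V, k, ref):
    """Lazy greedy inclusion HSS (Algorithm 4) sketch. V is a list of
    objective-vector tuples, ref the reference point, k the subset size."""
    if len(V) < k:
        return list(V)
    S = []
    # max-heap of (negated tentative HVC, candidate index); the initial
    # tentative value HV({p}) is the true contribution to the empty subset
    heap = [(-hv_contribution(p, [], ref), i) for i, p in enumerate(V)]
    heapq.heapify(heap)
    stale = [0] * len(V)   # selection round in which each entry was computed
    while len(S) < k:
        neg, i = heapq.heappop(heap)
        if stale[i] == len(S):
            # the tentative value is fresh, hence exact and still the largest
            S.append(V[i])
        else:
            # recompute against the current subset and push the entry back
            stale[i] = len(S)
            heapq.heappush(heap, (-hv_contribution(V[i], S, ref), i))
    return S
```

Because the recalculated value of the popped candidate is exact and all other tentative values are upper bounds, selecting a fresh top-of-heap entry is always safe; this is exactly the argument given above for the lazy mechanism.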
For the IGD and IGD+ indicators, the same idea can be used. Since the submodularity of the IGD and IGD+ indicators was proved in Section IV-A, the IGD (IGD+) improvement of a solution to the selected solution set S never increases. Thus, we do not need to examine the other solutions if the recalculated IGD (IGD+) improvement of a solution is the largest among all tentative IGD (IGD+) improvement values. Besides, to accelerate the IGD (IGD+) improvement calculation, we use our proposal for the standard IGD greedy inclusion in Section III: an array D is used to store the distance from each reference point to the nearest solution in the solution set S. The details of the lazy greedy inclusion IGD subset selection algorithm (LGI-IGDSS) are shown in Algorithm 5. It can be changed to an algorithm for the IGD+ indicator (LGI-IGD+SS) by using the IGD+ distance instead of the Euclidean distance to calculate the distances between each reference point and each solution.

Algorithm 5 Lazy Greedy Inclusion IGD Subset Selection (LGI-IGDSS)
Input: V (a set of non-dominated solutions), k (solution subset size)
Output: S (the selected subset from V)
if |V| < k then
    S = V
else
    S = ∅, C = ∅
    s_min = the solution s_i in V with the smallest IGD({s_i}, V)
    S = S ∪ {s_min}
    D = distance from each reference point to s_min
    for each s_i in V \ S do
        Insert (s_i, IGD({s_min}, R) − IGD({s_min, s_i}, R)) into C
    end for
    while |S| < k do
        while C ≠ ∅ do
            c_max = the solution with the largest tentative IGD improvement in C
            D' = distance from each reference point to c_max
            Update the IGD improvement of c_max to mean(D) − mean(min(D, D'))
            if c_max has the largest IGD improvement in C then
                S = S ∪ {c_max}
                C = C \ {c_max}
                D = min(D, D')
                break
            end if
        end while
    end while
end if

Note that we need to find the solution with the largest hypervolume contribution in Algorithm 4 and the solution with the largest IGD (IGD+) improvement in Algorithm 5. A priority queue implemented by a maximum heap is used to accelerate this procedure.

The proposed algorithms need only one parameter k, which is the solution subset size. This parameter is needed in all subset selection algorithms. That is, the proposed algorithms do not need any additional parameter.

The idea of lazy evaluation was proposed by Minoux [48] to accelerate the greedy algorithm for maximizing submodular functions. It was later applied to some specific areas such as influence maximization [49] and network monitoring [50]. Minoux [48] proved that if the function is submodular and the greedy solution is unique, the solutions produced by the lazy greedy algorithm and the original greedy algorithm are identical. Since the hypervolume, IGD and IGD+ indicators are submodular, the proposed algorithms (i.e., Algorithm 4 and Algorithm 5) find the same subsets as the corresponding original greedy inclusion algorithms if the greedy solutions are unique.
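Combining the cached distance array D with the lazy mechanism gives the following sketch of Algorithm 5 (ours; the full |V| × |R| distance matrix is precomputed for clarity, which assumes it fits in memory; for LGI-IGD+SS, the distance computation is replaced by the IGD+ distance (8)).

```python
import heapq
import numpy as np

def lgi_igdss(V, k, R):
    """Lazy greedy inclusion IGD subset selection (Algorithm 5) sketch.
    V: (n, m) candidate solutions; R: (|R|, m) reference points."""
    V, R = np.asarray(V, float), np.asarray(R, float)
    if len(V) < k:
        return V
    # dist[i] = Euclidean distances from every reference point to V[i]
    dist = np.linalg.norm(R[None, :, :] - V[:, None, :], axis=2)
    first = int(np.argmin(dist.mean(axis=1)))      # smallest IGD({v}, R)
    selected, D = [first], dist[first]
    # max-heap of (negated tentative improvement, index), computed for |S| = 1
    heap = [(-(D.mean() - np.minimum(D, dist[i]).mean()), i)
            for i in range(len(V)) if i != first]
    heapq.heapify(heap)
    stale = [1] * len(V)
    while len(selected) < k:
        neg, i = heapq.heappop(heap)
        if stale[i] == len(selected):              # fresh value: select it
            selected.append(i)
            D = np.minimum(D, dist[i])             # update the distance cache
        else:                                      # recompute and push back
            stale[i] = len(selected)
            imp = D.mean() - np.minimum(D, dist[i]).mean()
            heapq.heappush(heap, (-imp, i))
    return V[selected]
```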
C. An Illustrative Example

Let us explain the proposed algorithm using a simple example. Fig. 6 shows the changes of the hypervolume contributions in the list C. The value in parentheses is the stored tentative HVC value of each solution with respect to the selected subset. For illustration purposes, the solutions in the list are sorted by their stored HVC values. However, in the actual implementation of the algorithm, this sorting is not necessarily needed (especially when the number of candidate solutions is huge), because our algorithm only needs to find the most promising candidate solution with the largest tentative HVC value in the list.

Fig. 6 (i) shows the initial list C including five solutions a, b, c, d and e. The current solution subset is empty. In Fig. 6 (i), solution a has the largest HVC value. Since the initial HVC value of each solution is its true hypervolume contribution to the current empty solution subset S, no recalculation is needed. Solution a is moved from the list to the solution subset.

In Fig. 6 (ii), solution b has the largest HVC value in the list after solution a is moved. Thus, the hypervolume contribution of b is recalculated. We assume that the recalculated HVC value is 4, as shown in Fig. 6 (iii).

Fig. 6 (iii) shows the list after the recalculation. Since the updated HVC value of b is not the largest, we choose solution e with the largest HVC value in the list and recalculate its hypervolume contribution. We assume that the recalculated HVC value is 6, as shown in Fig. 6 (iv).

Fig. 6 (iv) shows the list after the recalculation. Since the recalculated HVC value is still the largest in the list, solution e is moved from the list to the solution subset.

Fig. 6 (v) shows the list after the removal of e. Solution c with the largest HVC value is examined next.

In this example, for choosing the second solution from the remaining four candidates (b, c, d and e), we evaluate the hypervolume contributions of only two solutions (b and e). In the standard greedy inclusion algorithm, all four candidates are examined. In this manner, the proposed algorithm decreases the computation time of the standard greedy inclusion algorithm.

Fig. 6. Illustration of the proposed algorithm. The values in parentheses are the stored tentative HVC values: (i) a(10), b(9), e(8), c(6), d(5); (ii) b(9), e(8), c(6), d(5); (iii) e(8), c(6), d(5), b(4); (iv) e(6), c(6), d(5), b(4); (v) c(6), d(5), b(4).

V. EXPERIMENTS

A. Algorithms Used in Computational Experiments

The proposed lazy greedy inclusion hypervolume subset selection algorithm LGI-HSS is compared with the following two algorithms:

1) Standard greedy inclusion hypervolume subset selection (GI-HSS): This is the greedy inclusion algorithm described in Section II-C. When calculating the hypervolume contribution, the efficient method described in Section II-A is employed.

2) Greedy inclusion hypervolume subset selection with hypervolume contribution updating (UGI-HSS): The hypervolume contribution updating method proposed in FV-MOEA [41] (Algorithm 2) is used. Since Algorithm 2 is for greedy removal, it is adapted to greedy inclusion here. It is the fastest known greedy inclusion algorithm applicable to any dimension.

The proposed lazy greedy inclusion IGD subset selection algorithm LGI-IGDSS is compared with the standard greedy inclusion algorithm GI-IGDSS with the efficient IGD improvement evaluation method in Section III-B. The proposed lazy greedy inclusion IGD+ subset selection algorithm LGI-IGD+SS is compared with the standard greedy inclusion algorithm GI-IGD+SS with the efficient IGD+ improvement evaluation method in Section III-B.

Our main focus is the selection of a solution subset from an unbounded external archive, where the number of solutions to be selected is much smaller than the number of candidate solutions (i.e., k ≪ n = |V| in the HSSP). In this setting, greedy removal is not efficient. Hence, greedy removal algorithms (e.g., those with IHSO* [36] and IWFG [40]) are not compared in this paper.
B. Test Problems and Candidate Solution Sets

To examine the performance of the above-mentioned subset selection algorithms, we use the following four representative test problems with different Pareto front shapes:
(i) DTLZ2 [51] with a triangular concave (spherical) Pareto front.
(ii) Inverted DTLZ2 (I-DTLZ2) [52] with an inverted triangular concave (spherical) Pareto front.
(iii) DTLZ1 [51] with a triangular linear Pareto front.
(iv) DTLZ7 [51] with a disconnected Pareto front.

For each test problem, we use three problem instances with five, eight and ten objectives (i.e., solution subset selection is performed in five-, eight- and ten-dimensional objective spaces). Candidate solution sets are generated by sampling solutions from the Pareto front of each problem instance. Four different settings of the candidate solution set size are examined: 5000, 10000, 15000 and 20000 solutions. First, we uniformly sample 100,000 solutions from the Pareto front of each problem instance. In each run of a solution subset selection algorithm, a pre-specified number of candidate solutions (i.e., 5000, 10000, 15000 or 20000 solutions) are randomly selected from the generated 100,000 solutions. Computational experiments are performed 11 times for each setting of the candidate solution set size for each problem instance by each subset selection algorithm. The number of solutions to be selected is specified as 100. Thus, our problem is to select 100 solutions from 5000, 10000, 15000 or 20000 candidate solutions to maximize the hypervolume value, or to minimize the IGD or IGD+ value, of the selected 100 solutions.

C. Experimental Settings

In each hypervolume subset selection algorithm, the reference point for hypervolume (contribution) calculation is specified as (1.1, 1.1, ..., 1.1) for all test problems, independent of the number of objectives. We use the WFG algorithm [35] for hypervolume calculation in each subset selection algorithm with the hypervolume indicator. All subset selection algorithms are coded in MATLAB R2018a. The computation time of each run is measured on an Intel Core i7-8700K CPU with 16GB of RAM running Windows 10.

D. Comparison of Hypervolume Subset Selection Algorithms

The average run time of each hypervolume subset selection algorithm on each test problem is summarized in Table II (in the columns labelled GI-HSS, UGI-HSS and LGI-HSS). Since the subsets obtained by the three algorithms on the same data set are the same (i.e., the proposed algorithm does not change the selected subset), we only show the run time of each algorithm.

Compared with the standard GI-HSS algorithm, our LGI-HSS algorithm reduces the computation time by 91% to 99%. That is, the computation time of LGI-HSS is only 1-9% of that of GI-HSS. As the number of objectives increases (i.e., as the dimensionality of the objective space increases), the advantage of LGI-HSS over the other two algorithms becomes larger. Among the four test problems in Table II, all three algorithms are fast on the I-DTLZ2 problem and slow on the DTLZ2 and DTLZ1 problems.

Even when we compare our LGI-HSS algorithm with the fastest known greedy inclusion algorithm UGI-HSS, LGI-HSS is faster. On DTLZ2 and I-DTLZ2, when the number of objectives is not very large (i.e., for the five-objective problems), the difference in the average computation time between the two algorithms is not large (the LGI-HSS average computation time is 41-67% of that of UGI-HSS).
When the number of objectives is larger (i.e., for the eight-objective and ten-objective problems), the difference in the average computation time between the two algorithms becomes larger (i.e., LGI-HSS needs only 6-25% of the UGI-HSS computation time). On DTLZ7, LGI-HSS needs only 4-24% of the UGI-HSS computation time. On DTLZ1, the difference between LGI-HSS and UGI-HSS is small: LGI-HSS needs 36-88% of the UGI-HSS computation time. The reason for the small difference is that each candidate solution on the linear Pareto front of DTLZ1 has a similar hypervolume contribution value. As a result, recalculation is frequently performed in LGI-HSS. On the contrary, there exist large differences among the hypervolume contribution values of candidate solutions on the nonlinear Pareto fronts of DTLZ2, I-DTLZ2 and DTLZ7. This leads to less frequent updates of their hypervolume contribution values (i.e., high efficiency of LGI-HSS).

From the three HSS columns in Table II, we can also observe that the average computation time of each algorithm does not severely increase with the increase in the number of objectives (i.e., with the increase in the dimensionality of the objective space) for DTLZ7 and I-DTLZ2. In some cases, the average computation time decreases with the increase in the number of objectives. The reasons are as follows. Firstly, the WFG algorithm is used in the three algorithms to calculate the hypervolume contribution of each solution, and the computation time of the hypervolume contribution does not increase severely as the number of objectives increases. Besides, the total number of solution evaluations needed by LGI-HSS decreases on some problems as the number of objectives increases. This is because the difference in the hypervolume contribution values of the candidate solutions increases with the number of objectives, which leads to a decrease in the number of updates of the hypervolume contribution value of each solution.

TABLE II. Average computation time (in seconds) on different problems over 11 runs; the best results are highlighted in bold. (Columns: Problem, Candidate Solutions, GI-HSS, UGI-HSS, LGI-HSS, GI-IGDSS, LGI-IGDSS, GI-IGD+SS, LGI-IGD+SS; rows: five-, eight- and ten-objective DTLZ2, I-DTLZ2, DTLZ1 and DTLZ7.)

E. Comparison of IGD/IGD+ Subset Selection Algorithms

As can be observed from the last four columns of Table II, the proposed LGI-IGDSS and LGI-IGD+SS are much faster than the standard greedy algorithms (i.e., GI-IGDSS and GI-IGD+SS). Compared with the GI-IGDSS algorithm, our LGI-IGDSS algorithm needs only 6-10% of the computation time. Compared with the GI-IGD+SS algorithm, our LGI-IGD+SS algorithm needs only 6-9% of the computation time.

Different from the hypervolume subset selection algorithms, the computation time of the IGD and IGD+ subset selection algorithms does not differ largely among the test problems. One interesting observation is that the computation time of LGI-IGD+SS does not increase with the number of objectives whereas that of LGI-IGDSS clearly increases with the number of objectives.
This is because the difference in the IGD+ contribution values of the candidate solutions increases with the number of objectives, which leads to a decrease in the number of updates of the IGD+ contribution value of each solution. However, the difference in the IGD contribution values of the candidate solutions does not clearly increase with the number of objectives. This observation suggests that the IGD+ indicator is more similar to the hypervolume indicator than the IGD indicator, which was also suggested by the optimal distributions of solutions for each indicator [53].

F. Number of Solution Evaluations

To further examine the effectiveness of the proposed LGI-HSS, we monitor the number of solution evaluations (i.e., contribution calculations) performed to select each solution by GI-HSS and LGI-HSS. In GI-HSS, all the remaining solutions need to be re-evaluated in each iteration. For example, when the candidate solution set size is 10,000, 10,000 − (i − 1) solutions are examined to select the i-th solution in GI-HSS. However, LGI-HSS can choose the same i-th solution by evaluating only a small number of solutions. In Fig. 7, we show the number of examined solutions to choose each solution (i.e., the i-th solution for i = 1, 2, ..., 100) in a single run of LGI-HSS on the eight-objective DTLZ2 and I-DTLZ2 problems where the candidate solution set size is 10,000. The single run with the median computation time among 11 runs on each problem is selected in Fig. 7. For comparison, the results of GI-HSS are also shown, which form the line specified by 10,000 − (i − 1). For selecting the first solution, both algorithms examine all the given 10,000 solutions. After that, the proposed LGI-HSS algorithm significantly decreases the number of solution evaluations. On the eight-objective DTLZ2 problem, the number of solution evaluations decreases with fluctuation as the number of selected solutions increases. On the eight-objective I-DTLZ2 problem, the average number of solution evaluations needed to select each solution is much smaller than that on the eight-objective DTLZ2 problem. This difference is the reason for the difference between the results in Table II for these two problems.

In the same manner as in Fig. 7, we show the results of GI-IGDSS and LGI-IGDSS in Fig. 8, and the results of GI-IGD+SS and LGI-IGD+SS in Fig. 9. We can observe from these two figures that the number of solution evaluations needed to select each of the first 50 solutions is much larger than that for the remaining 50 solutions. We can also see that the difference between the results on the two problems in these two figures is much smaller than that in Fig. 7. This is the reason why similar results were obtained for all problems by LGI-IGDSS and LGI-IGD+SS.

Fig. 7. The number of solution evaluations to select each solution by GI-HSS and LGI-HSS.
Fig. 8. The number of solution evaluations to select each solution by GI-IGDSS and LGI-IGDSS.
Fig. 9. The number of solution evaluations to select each solution by GI-IGD+SS and LGI-IGD+SS.

In the worst case, the number of solution evaluations in the proposed algorithms is the same as in the standard greedy algorithms (i.e., the worst-case time complexity of the proposed algorithms is the same as that of the standard greedy algorithms).
However, as shown in Figs. 7-9, the number of actually evaluated solutions is much smaller than the worst-case upper bound (shown by the dotted green line in each figure).

In order to examine the effect of the number of objectives on the efficiency of the proposed algorithms, we show experimental results of the proposed three algorithms on the five-objective and ten-objective I-DTLZ2 problems in Figs. 10-12. As in Figs. 7-9, a single run with the median computation time is selected for each algorithm on each test problem. In Fig. 10 for LGI-HSS, much fewer solutions are examined for the ten-objective problem than for the five-objective problem. As a result, the average computation time on the ten-objective I-DTLZ2 problem by LGI-HSS in Table II was shorter than that on the five-objective I-DTLZ2 problem. In Fig. 11 for LGI-IGDSS, the difference in the number of solution evaluations between the two problems is not as large as in Fig. 10. As a result, the average computation time of LGI-IGDSS in Table II increased with the increase in the number of objectives. The results for LGI-IGD+SS in Fig. 12 are between those in Fig. 10 and Fig. 11.

Fig. 10. The number of solution evaluations to select each solution by LGI-HSS on the five-objective and ten-objective I-DTLZ2 problems.
Fig. 11. The number of solution evaluations to select each solution by LGI-IGDSS on the five-objective and ten-objective I-DTLZ2 problems.
Fig. 12. The number of solution evaluations to select each solution by LGI-IGD+SS on the five-objective and ten-objective I-DTLZ2 problems.

G. Size of Candidate Solution Sets

In the previous subsections, 100 solutions were selected from 5000-20000 solutions of test problems with five to ten objectives. Our algorithms are proposed for choosing a pre-specified number of solutions from an unbounded external archive. In order to show the size of the subset selection problem in this scenario, we monitor the number of non-dominated solutions among all the examined solutions by an EMO algorithm on the four test problems with eight objectives. As EMO algorithms, we use NSGA-II [54] and NSGA-III [25]. The default parameter settings in [54] are used (e.g., the population size is specified as 156). The number of non-dominated solutions is counted at the 200th, 400th, 600th, 800th and 1000th generations. Average results over 11 runs are shown in Fig. 13. From these figures, we can see that hundreds of thousands of non-dominated solutions can be obtained by EMO algorithms for many-objective problems. This observation supports the necessity of highly efficient subset selection algorithms.

Fig. 13. The average number of non-dominated solutions among all the examined solutions at each generation: (a) NSGA-II; (b) NSGA-III.

H. Performance on Real-World Solution Sets

We examine the performance of the three HSS algorithms on four real-world problems (i.e., the car side impact design problem [55], the conceptual marine design problem [56], the water resource planning problem [57] and the car cab design problem [25]). We apply NSGA-II [54] and NSGA-III [25] with an unbounded external archive to each problem. The population size is specified as 220 for the four-objective problems, 182 for the six-objective problem, and 210 for the nine-objective problem. After the 200th generation of each algorithm, the non-dominated solutions in the archive are used as candidate solution sets. In this manner, we generate eight candidate solution sets. Then, each hypervolume-based subset selection algorithm is applied to each candidate solution set to select 100 solutions. This experiment is repeated 11 times.
The average size of the candidate solution sets and the average computation time of each subset selection algorithm are shown in Table III.

TABLE III
AVERAGE COMPUTATION TIME (IN SECONDS) ON REAL-WORLD SOLUTION SETS OVER 11 RUNS

Solution set                                        Size     m    GI-HSS    UGI-HSS
NSGA-III on the car side impact design problem      11453    4      92.7       13.0
NSGA-II on the conceptual marine design problem      8549    4      62.4        6.1
NSGA-III on the conceptual marine design problem    10328    4      81.6        8.0
NSGA-II on the water resource planning problem      12284    6     117.3       22.1
NSGA-III on the water resource planning problem     19985    6     238.1       38.3
NSGA-II on the car cab design problem               22196    9     412.5       81.5
NSGA-III on the car cab design problem              23958    9     516.3       97.6

We can see from Table III that the proposed algorithm always achieves the best results (i.e., the shortest average computation time) for all the eight settings. Especially when the number of objectives is large (e.g., the nine-objective car cab design problem), the proposed algorithm significantly decreases the computation time for subset selection. This is consistent with the experimental results on the artificially generated solution sets. Similar significant improvements in computation time by the proposed algorithms are also observed for subset selection based on the IGD and IGD+ indicators, although these results are not included due to the paper length limitation.

VI. CONCLUDING REMARKS

In this paper, we proposed efficient greedy inclusion algorithms to select a small number of solutions from a large candidate solution set for hypervolume maximization and for IGD and IGD+ minimization. The proposed algorithms are based on the submodular property of the three indicators. The core idea of these algorithms is to use the submodular property to avoid unnecessary contribution calculations. The proposed lazy greedy algorithm finds the same solution subset as the standard greedy inclusion algorithm for each indicator since our algorithm does not change the basic framework of greedy inclusion. Our experimental results on the four test problems (DTLZ1, DTLZ2, DTLZ7 and Inverted DTLZ2) with five, eight and ten objectives showed that the proposed three greedy algorithms are much faster than the standard greedy inclusion algorithms. Besides, the proposed LGI-HSS is much faster than the state-of-the-art fast greedy inclusion algorithm.

Our experimental results clearly showed that the proposed idea can drastically decrease the computation time (e.g., to 6-9% of the computation time of the standard greedy inclusion algorithm for IGD+). The proposed idea is applicable to any performance indicator with the submodular property, such as hypervolume, IGD and IGD+. However, when hypervolume approximation is used instead of exact calculation, the approximated hypervolume indicator is not strictly submodular. One interesting future research topic is to apply the proposed idea to approximate hypervolume subset selection algorithms in order to examine the quality of the subsets obtained by lazy approximate algorithms. Another interesting research direction is to accelerate the proposed algorithms by further exploiting the special properties of these indicators.
Besides, in distance-based greedy subset selection [11], the distance of each candidate solution to the selected set does not increase by the addition of a new solution to the selected solution set. Thus, using a similar lazy idea to reduce the number of distance calculations is also a promising future work.

The source code of the proposed lazy greedy subset selection algorithms is available at https://github.com/weiyuchen1999/LGISS.

REFERENCES

[1] M. Ehrgott, Multicriteria Optimization. Springer Science & Business Media, 2005, vol. 491.
[2] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. da Fonseca, "Performance assessment of multiobjective optimizers: an analysis and review," IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117-132, 2003.
[3] H. Ishibuchi, Y. Setoguchi, H. Masuda, and Y. Nojima, "How to compare many-objective algorithms under different settings of population and archive sizes," in Proc. IEEE Congress on Evolutionary Computation (CEC), 2016, pp. 1149-1156.
[4] B. K. Natarajan, "Sparse approximate solutions to linear systems," SIAM Journal on Computing, vol. 24, no. 2, pp. 227-234, 1995.
[5] G. Davis, "Adaptive greedy approximations," Constructive Approximation, vol. 13, no. 1, pp. 57-98, 1997.
[6] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, "An analysis of approximations for maximizing submodular set functions-I," Mathematical Programming, vol. 14, no. 1, pp. 265-294, 1978.
[7] L. Bradstreet, L. While, and L. Barone, "Incrementally maximising hypervolume for selection in multi-objective evolutionary algorithms," in Proc. IEEE Congress on Evolutionary Computation (CEC), 2007, pp. 3203-3210.
[8] C. Qian, Y. Yu, and Z.-H. Zhou, "Subset selection by Pareto optimization," in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., 2015, pp. 1774-1782.
[9] H. Ishibuchi, H. Masuda, and Y. Nojima, "Selecting a small number of non-dominated solutions to be presented to the decision maker," in Proc. IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2014, pp. 3816-3821.
[10] R. Tanabe, H. Ishibuchi, and A. Oyama, "Benchmarking multi- and many-objective evolutionary algorithms under two optimization scenarios," IEEE Access, vol. 5, pp. 19597-19619, 2017.
[11] H. K. Singh, K. S. Bhattacharjee, and T. Ray, "Distance-based subset selection for benchmarking in evolutionary multi/many-objective optimization," IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 904-912, 2019.
[12] W. Chen, H. Ishibuchi, and K. Shang, "Modified distance-based subset selection for evolutionary multi-objective optimization algorithms," in Proc. IEEE Congress on Evolutionary Computation (CEC), 2020.
[13] J. Bader and E. Zitzler, "HypE: An algorithm for fast hypervolume-based many-objective optimization," Evolutionary Computation, vol. 19, no. 1, pp. 45-76, 2011.
[14] K. Bringmann, T. Friedrich, and P. Klitzke, "Generic postprocessing via subset selection for hypervolume and epsilon-indicator," in International Conference on Parallel Problem Solving from Nature. Springer, 2014, pp. 518-527.
[15] T. Kuhn, C. M. Fonseca, L. Paquete, S. Ruzika, M. M. Duarte, and J. R. Figueira, "Hypervolume subset selection in two dimensions: Formulations and algorithms," Evolutionary Computation, vol. 24, no. 3, pp. 411-425, 2016.
[16] A. P. Guerreiro and C. M. Fonseca, "Computing and updating hypervolume contributions in up to four dimensions," IEEE Transactions on Evolutionary Computation, vol. 22, no. 3, pp. 449-463, 2017.
[17] H. Ishibuchi, L. M. Pang, and K. Shang, "Solution subset selection for final decision making in evolutionary multi-objective optimization," 2020.
[18] K. Bringmann, T. Friedrich, and P. Klitzke, "Two-dimensional subset selection for hypervolume and epsilon-indicator," in Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, 2014, pp. 589-596.
[19] K. Bringmann and T. Friedrich, "Approximating the volume of unions and intersections of high-dimensional geometric objects," Computational Geometry, vol. 43, no. 6-7, pp. 601-610, 2010.
[20] M. Li and X. Yao, "An empirical investigation of the optimality and monotonicity properties of multiobjective archiving methods," in Evolutionary Multi-Criterion Optimization, K. Deb, E. Goodman, C. A. Coello Coello, K. Klamroth, K. Miettinen, S. Mostaghim, and P. Reed, Eds. Cham: Springer International Publishing, 2019, pp. 15-26.
[21] A. Liefooghe, F. Daolio, S. Verel, B. Derbel, H. Aguirre, and K. Tanaka, "Landscape-aware performance prediction for evolutionary multi-objective optimization," IEEE Transactions on Evolutionary Computation, 2019.
[22] L. M. Pang, H. Ishibuchi, and K. Shang, "Algorithm configurations of MOEA/D with an unbounded external archive," 2020.
[23] Y. Nan, K. Shang, H. Ishibuchi, and L. He, "Reverse strategy for non-dominated archiving," IEEE Access, vol. 8, pp. 119458-119469, 2020.
[24] R. Tanabe and H. Ishibuchi, "An analysis of control parameters of MOEA/D under two different optimization scenarios," Applied Soft Computing, vol. 70, pp. 22-40, 2018.
[25] K. Deb and H. Jain, "An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints," IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577-601, 2014.
[26] W. Chen, H. Ishibuchi, and K. Shang, "Lazy greedy hypervolume subset selection from large candidate solution sets," in Proc. IEEE Congress on Evolutionary Computation (CEC), 2020, pp. 1149-1156.
[27] J. D. Knowles, D. W. Corne, and M. Fleischer, "Bounded archiving using the Lebesgue measure," in Proc. IEEE Congress on Evolutionary Computation (CEC), vol. 4, 2003, pp. 2490-2497.
[28] E. Zitzler and L. Thiele, "Multiobjective optimization using evolutionary algorithms - a comparative case study," in International Conference on Parallel Problem Solving from Nature. Springer, 1998, pp. 292-301.
[29] K. Shang, H. Ishibuchi, L. He, and L. M. Pang, "A survey on the hypervolume indicator in evolutionary multi-objective optimization," IEEE Transactions on Evolutionary Computation, 2020.
[30] J. J. Durillo, A. J. Nebro, and E. Alba, "The jMetal framework for multi-objective optimization: Design and architecture," in Proc. IEEE Congress on Evolutionary Computation (CEC), 2010, pp. 1-8.
[31] J. J. Durillo and A. J. Nebro, "jMetal: A Java framework for multi-objective optimization," Advances in Engineering Software, vol. 42, no. 10, pp. 760-771, 2011.
[32] M. H. Overmars and C.-K. Yap, "New upper bounds in Klee's measure problem," in Proc. 29th Annual Symposium on Foundations of Computer Science (FOCS), 1988, pp. 550-556.
[33] M. H. Overmars and C.-K. Yap, "New upper bounds in Klee's measure problem," SIAM Journal on Computing, vol. 20, no. 6, pp. 1034-1045, 1991.
[34] N. Beume, "S-metric calculation by considering dominated hypervolume as Klee's measure problem," Evolutionary Computation, vol. 17, no. 4, pp. 477-492, 2009.
[35] L. While, L. Bradstreet, and L. Barone, "A fast way of calculating exact hypervolumes," IEEE Transactions on Evolutionary Computation, vol. 16, no. 1, pp. 86-95, 2011.
[36] L. Bradstreet, L. While, and L. Barone, "A fast incremental hypervolume algorithm," IEEE Transactions on Evolutionary Computation, vol. 12, no. 6, pp. 714-723, 2008.
[38] T. Ulrich and L. Thiele, “Bounding the effectiveness of hypervolume-based (µ + λ)-archiving algorithms,” in International Conference on Learning and Intelligent Optimization. Springer, 2012, pp. 235–249.
[39] K. Bringmann and T. Friedrich, “An efficient algorithm for computing hypervolume contributions,” Evolutionary Computation, vol. 18, no. 3, pp. 383–402, 2010.
[40] W. Cox and L. While, “Improving the IWFG algorithm for calculating incremental hypervolume,” in Proc. IEEE Congress on Evolutionary Computation (CEC). IEEE, 2016, pp. 3969–3976.
[41] S. Jiang, J. Zhang, Y.-S. Ong, A. N. Zhang, and P. S. Tan, “A simple and fast hypervolume indicator-based multiobjective evolutionary algorithm,” IEEE Transactions on Cybernetics, vol. 45, no. 10, pp. 2202–2213, 2014.
[42] C. A. Coello Coello and M. Reyes Sierra, “A study of the parallelization of a coevolutionary multi-objective evolutionary algorithm,” in Mexican International Conference on Artificial Intelligence, 2004.
[43] H. Ishibuchi, H. Masuda, Y. Tanigaki, and Y. Nojima, “Modified distance calculation in generational distance and inverted generational distance,” in Evolutionary Multi-Criterion Optimization, A. Gaspar-Cunha, C. Henggeler Antunes, and C. C. Coello, Eds. Cham: Springer International Publishing, 2015, pp. 110–125.
[44] H. Ishibuchi, R. Imada, Y. Setoguchi, and Y. Nojima, “Reference point specification in inverted generational distance for triangular linear Pareto front,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 6, pp. 961–975, 2018.
[45] E. M. Lopez and C. A. C. Coello, “IGD+-EMOA: A multi-objective evolutionary algorithm based on IGD+,” in Proc. IEEE Congress on Evolutionary Computation (CEC), 2016, pp. 999–1006.
[46] Y. Sun, G. G. Yen, and Z. Yi, “IGD indicator-based evolutionary algorithm for many-objective optimization problems,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 2, pp. 173–187, 2019.
[47] H. Ishibuchi, R. Imada, Y. Setoguchi, and Y. Nojima, “How to specify a reference point in hypervolume calculation for fair performance comparison,” Evolutionary Computation, vol. 26, no. 3, pp. 411–440, 2018.
[48] M. Minoux, “Accelerated greedy algorithms for maximizing submodular set functions,” in Optimization Techniques. Springer, 1978, pp. 234–243.
[49] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance, “Cost-effective outbreak detection in networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 420–429.
[50] M. Gomez-Rodriguez, J. Leskovec, and A. Krause, “Inferring networks of diffusion and influence,” ACM Transactions on Knowledge Discovery from Data, vol. 5, no. 4, 2012.
[51] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, “Scalable multi-objective optimization test problems,” in Proc. Congress on Evolutionary Computation (CEC), vol. 1, 2002, pp. 825–830.
[52] H. Jain and K. Deb, “An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 602–622, 2014.
[53] H. Ishibuchi, R. Imada, N. Masuyama, and Y. Nojima, “Comparison of hypervolume, IGD and IGD+ from the viewpoint of optimal distributions of solutions,” in Evolutionary Multi-Criterion Optimization, K. Deb, E. Goodman, C. A. Coello Coello, K. Klamroth, K. Miettinen, S. Mostaghim, and P. Reed, Eds. Cham: Springer International Publishing, 2019, pp. 332–345.
[54] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
[55] H. Jain and K. Deb, “An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 602–622, 2014.
[56] M. G. Parsons and R. L. Scott, “Formulation of multicriterion design optimization problems for solution with scalar numerical optimization methods,” Journal of Ship Research, vol. 48, no. 1, pp. 61–76, 2004.
[57] T. Ray, K. Tai, and K. C. Seow, “Multiobjective design optimization by an evolutionary algorithm,”
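As a concrete illustration of the lazy idea for distance-based subset selection mentioned in the conclusion, the following is a minimal Python sketch. It is not the algorithm of [11] or the released implementation; the function name, the use of the Euclidean distance in the objective space, and the choice of the first solution as the initial selection are all illustrative assumptions. The sketch exploits the non-increasing property: since adding a solution can only shrink a candidate's distance to the selected subset, cached distances are valid upper bounds and can be refreshed lazily via a max-heap, in the spirit of the accelerated greedy algorithm of [48].

```python
import heapq
import numpy as np

def lazy_distance_based_subset_selection(points, k):
    """Greedily select k points, each maximizing the distance to the
    already-selected set, with lazily refreshed cached distances.

    points: (n, m) array of objective vectors; returns selected indices.
    """
    n = len(points)
    selected = [0]  # illustrative assumption: start from the first solution
    # Cached distance of each candidate to the selected set. Adding a
    # solution can only decrease the true distance, so caches are upper bounds.
    cached = np.linalg.norm(points - points[0], axis=1)
    checked = np.ones(n, dtype=int)  # how many selected solutions each cache reflects
    heap = [(-cached[i], i) for i in range(1, n)]  # max-heap via negated keys
    heapq.heapify(heap)

    while len(selected) < k and heap:
        _, i = heapq.heappop(heap)  # candidate with the largest cached distance
        # Refresh the cache only against solutions added since the last check.
        for j in selected[checked[i]:]:
            cached[i] = min(cached[i], np.linalg.norm(points[i] - points[j]))
        checked[i] = len(selected)
        # If the refreshed (now exact) distance still beats the largest
        # remaining upper bound, candidate i is truly the farthest one,
        # and no other candidate's cache needs to be updated.
        if not heap or cached[i] >= -heap[0][0]:
            selected.append(i)
        else:
            heapq.heappush(heap, (-cached[i], i))
    return selected
```

For example, selecting 100 solutions from 10,000 three-objective candidates would be called as lazy_distance_based_subset_selection(np.random.rand(10000, 3), 100). The potential saving comes from the final test: in many iterations the candidate popped from the heap is confirmed immediately, so most of the n distance updates performed by the standard greedy algorithm in each iteration are skipped.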