Hybrid Algorithm for Multi-Objective Optimization by Greedy Hypervolume Maximization
Conrado S. Miranda, Fernando J. Von Zuben,
Senior Member, IEEE
Abstract—This paper introduces a high-performance hybrid algorithm, called the Hybrid Hypervolume Maximization Algorithm (H2MA), for multi-objective optimization that alternates between exploring the decision space and exploiting the already obtained non-dominated solutions. The proposal is centered on maximizing the hypervolume indicator, thus converting the multi-objective problem into a single-objective one. The exploitation employs gradient-based methods, but considers a single candidate efficient solution at a time, to overcome limitations associated with population-based approaches and also to allow easy control of the number of solutions provided. There is an interchange between two steps. The first step is a deterministic local exploration, endowed with an automatic procedure to detect stagnation. When stagnation is detected, the search switches to a second step characterized by a stochastic global exploration using an evolutionary algorithm. Using five ZDT benchmarks with 30 variables, the performance of the new algorithm is compared to state-of-the-art algorithms for multi-objective optimization, more specifically NSGA-II, SPEA2, and SMS-EMOA. The solutions found by the H2MA lead to a higher hypervolume and a smaller distance to the true Pareto frontier with significantly fewer function evaluations, even when the gradient is estimated numerically. Furthermore, although only continuous decision spaces are considered here, discrete decision spaces could also be treated by replacing the gradient-based search with hill-climbing. Finally, a thorough explanation is provided to support the expressive gain in performance that was achieved.
Index Terms—Exploration-exploitation algorithm; Gradient-based optimization; Hypervolume maximization; Multi-objective optimization.
I. INTRODUCTION

Multi-objective optimization (MOO) is a generalization of standard single-objective optimization to problems where multiple criteria are defined and conflict with each other [1]. In this case, there can be multiple optimal solutions with different trade-offs between the objectives. Since the optimal set can be continuous, an MOO problem consists of finding samples from the optimal set, called the Pareto set. However, we also wish the projection of the obtained samples of the Pareto set into the objective space to be well-distributed along the Pareto frontier, which is the objective-space counterpart of the Pareto set, so that the solutions present more diverse trade-offs.

The current state-of-the-art for MOO relies on evolutionary algorithms to find the desired samples [2]. One of these algorithms is NSGA-II [3], which performs non-dominance sorting, thus ordering the proposed solutions according to their relative dominance degree and dividing the
solution set into subsequent frontiers of non-dominated solutions. NSGA-II also uses the crowding distance, which measures how close nearby solutions are, to maintain diversity in the objective space. Another well-known algorithm is SPEA2 [4], where the solutions are under selective pressure to move towards the Pareto frontier and also to stay away from each other.

These algorithms are based on heuristics that define what characterizes a good set of solutions. The hypervolume indicator [5], however, defines a performance metric for a set of solutions, thus allowing a direct comparison of multiple distinct sets of solutions [6], with higher values indicating possibly better quality. The hypervolume is maximal at the Pareto frontier and increases when the samples are better distributed along the frontier [7]. Due to these properties, it represents a good candidate to be maximized in MOO, being explicitly explored in the SMS-EMOA [8], where solutions that contribute the least to the hypervolume are discarded.

On the other hand, local search methods have been successful in single-objective optimization (SOO) due to their efficiency in finding a local optimum for some problems [9], [10], so research has been performed to adapt these methods to MOO problems. For instance, [11] defined a method for finding all minimizing directions in an MOO problem, but the proposed algorithm achieved low performance on usual benchmark functions.

Alternatively, instead of adapting single-objective methods to work on MOO problems, we can create an SOO problem associated with the MOO one, such that a good solution for the single-objective case is a good solution for the multi-objective case.

C. S. Miranda and F. J. Von Zuben are with the School of Electrical and Computer Engineering, University of Campinas (Unicamp), Brazil. E-mail: {conrado,vonzuben}@dca.fee.unicamp.br
Since the hypervolume is able to describe how good a population is through a single indicator, the MOO problem can be converted into the maximization of the population's hypervolume.

Based on this idea, [12] proposed a method to compute the hypervolume's gradient for a given population, so that the optimal search direction for each individual could be established. However, [13] showed that adjusting the population through integration of the hypervolume's gradient does not always work, with some initially non-dominated points becoming dominated and others changing very little over the integration.

In this paper, we introduce an algorithm for maximizing the hypervolume by optimizing one point at a time, instead of adjusting a whole population at once. The algorithm alternates between exploring the space for non-dominated solutions and, when they are found, exploiting them using local search methods to maximize the population's hypervolume when only this active point can be moved. Therefore, once the hypervolume has converged, which is guaranteed to happen because the problem is bounded, the point is fixed in all further iterations. We found that this restriction is enough to overcome the issues presented in [13] when using the hypervolume's gradient. The proposed algorithm, called the Hybrid Hypervolume Maximization Algorithm (H2MA), is a hybrid one, since it is composed of global exploration and local exploitation procedures, properly managed to be executed alternately.

Results over the ZDT benchmark [14] show that the new algorithm performs better than state-of-the-art evolutionary algorithms, both in terms of total hypervolume and of distance to the Pareto frontier. Moreover, the algorithm was able to work deterministically in most of the benchmark problems, which makes it less susceptible to variations due to random number generation.
Due to the high quality of the solutions found in fewer function evaluations than required by the current state-of-the-art, we consider the new algorithm a viable choice for solving MOO problems. Moreover, since a single solution is introduced at a time, the user is able to stop the algorithm when the desired number of solutions is found, while evolutionary algorithms must evolve the whole population at the same time.

This paper is organized as follows. Section II introduces the required concepts of multi-objective optimization, including the hypervolume indicator, and discusses the problems with the gradient-based approach for hypervolume maximization analyzed in [13]. Section III provides the details of the new H2MA algorithm, and Section IV shows the comparison with the state-of-the-art algorithms. Finally, Section V summarizes the results and discusses future research directions.

II. MULTI-OBJECTIVE OPTIMIZATION AND THE HYPERVOLUME INDICATOR
A multi-objective optimization problem is described by its decision space 𝒳 and a set of objective functions f_i(x) : 𝒳 → 𝒴_i, i ∈ {1, …, M}, where 𝒴_i ⊆ ℝ is the associated objective space for each objective function [15]. Due to the symmetry between maximization and minimization, only the minimization problem is considered here. Each point x in the decision space has a counterpart in the objective space 𝒴 = 𝒴_1 × ⋯ × 𝒴_M given by y = f(x) = (f_1(x), …, f_M(x)).

Since there are multiple objectives, a new operator for comparing solutions must be used, since the conventional "less than" operator < can only compare two numbers. This operator is denoted the dominance operator and is defined as follows.

Definition 1 (Dominance). Let y and y′ be points in 𝒴, the objective space. Then y dominates y′, denoted y ≺ y′, if y_i < y′_i for all i ∈ {1, …, M}.

Equality could be allowed in some coordinates of this definition, but strict dominance is considered here. In order to define the hypervolume indicator [5], we must first define the Nadir point, which is a point in the objective space that is dominated by every point in a set.

Definition 3 (Nadir Point). Let X = {x_1, …, x_N} ∈ 𝒳^N be a set of points in the decision space. Let z ∈ ℝ^M be a point in the objective space. Then z is a valid Nadir point if, for all x ∈ X and i ∈ {1, …, M}, we have that f_i(x) < z_i. Using Definition 1, this can be written as f(x) ≺ z.

Again, it is possible to allow equality in the definition of the Nadir point, just like in the definition of dominance. However, when equality is allowed, it is possible for some point to have a null hypervolume, which can lead to undesired decisions when using the hypervolume as a performance metric, since such points would not contribute to the hypervolume and would be replaced by other points. Using the definition of a Nadir point, we can define the hypervolume for a set of points.

Definition 4 (Hypervolume). Let X = {x_1, …, x_N} ∈ 𝒳^N be a set of points in the decision space. Let z ∈ ℝ^M be a valid Nadir point in the objective space.
Then the hypervolume can be defined as

    H(X; z) = ∫_{ℝ^M} 1[∃x ∈ X : f(x) ≺ y ≺ z] dy,    (1)

where 1[·] is the indicator function.

The hypervolume measures how much of the objective space is dominated by the current set X and dominates the Nadir point z. Fig. 1 shows an example of the hypervolume for a set of three non-dominated points. For each point, the shaded region represents the area dominated by that point, with colors combining where there is overlap.

Figure 1: Example of hypervolume. The non-dominated solutions in the objective space are shown as black circles, and the Nadir point is shown as a black square. For each non-dominated solution, the region between it and the Nadir point is filled, with colors combining where there is overlap, and the total hypervolume is given by the area of the shaded regions. Best viewed in color.

B. Gradient of the Hypervolume

As stated earlier, since the hypervolume provides such a good indicator of performance in multi-objective problems, it can be used to transform the multi-objective problem into a single-objective one, characterized by the maximization of the hypervolume.

Although this approach proved successful when using evolutionary algorithms as the optimization method [8], the same did not happen when using the hypervolume's gradient to perform the optimization [13]. However, it is well known that gradient methods have been successful in single-objective optimization [9], [10], which suggests that they should be a reasonable choice for multi-objective optimization devoted to maximizing the hypervolume, since the hypervolume operator is well-defined almost everywhere in the objective space.

The hypervolume's gradient for a set of points was introduced in [12], and it can be used to compute the optimal direction in which a given point should move to increase the hypervolume associated with the current set of non-dominated solutions.
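Returning to Eq. (1): for two objectives, the integral reduces to an area that a simple sweep can compute. The sketch below is illustrative only (minimization, M = 2, and a valid Nadir point are assumed; the function names are not from the paper):

```python
def dominates(y, y_prime):
    """Definition 1: y dominates y' iff y_i < y'_i in every objective."""
    return all(a < b for a, b in zip(y, y_prime))

def hypervolume_2d(points, nadir):
    """Area of {y : some p in points has p ≺ y ≺ nadir}, i.e. Eq. (1) for M = 2."""
    # Only points that dominate the Nadir point can contribute.
    pts = sorted(p for p in points if dominates(p, nadir))
    area, prev_f2 = 0.0, nadir[1]
    for f1, f2 in pts:          # sweep in increasing first objective
        if f2 < prev_f2:        # points dominated by earlier ones add nothing
            area += (nadir[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area
```

For the points (1, 3), (2, 2), (3, 1) and the Nadir point (4, 4), the dominated area is 6, and adding a dominated point such as (2.5, 2.5) leaves the value unchanged, matching the fact that dominated points do not contribute.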
Although the hypervolume is not a continuously differentiable function of its arguments, since dominated points do not contribute to the hypervolume and thus have a null gradient, the gradient can be computed whenever any two points have different values for all objectives.

Based on this motivation, [13] used the hypervolume's gradient as a guide for adjusting a set of points by numerical integration, that is, performing a small step in the direction pointed by the gradient. Even though the algorithm was able to achieve the Pareto set in some cases, it failed to converge to efficient points when some points got stuck along the iterative process, either because their gradients became very small or because they became dominated by other points. Once dominated, these points do not contribute to the hypervolume and remain fixed. This is a major issue for using the hypervolume gradient in practice, since dominated points can be discarded, because there is no possibility of reverting them to non-dominated points anymore, and points with small gradients remain almost stagnant.

If we analyze Eq. (1), we can see that points at the border in the objective space are the only ones that can fill some portions of the objective space. On the other hand, points that are not at the border have less influence on the hypervolume, since part of the area dominated by them is also dominated by some other points. In the analysis presented in [13], it is clear that the cases where some points got stuck had higher gradients for the border points in the objective space, which led to the dominance or decreased contribution of some or all central points.

To make this idea clearer, consider the example in Fig. 1. If the point with the largest value for the first objective (the rightmost one in Fig. 1) decreases its value on the second objective, it can increase the population's hypervolume. Moreover, it is the only point that can do so without competition for that portion of the space, since it is the point with the largest value for the first objective.
The same holds for the point with the largest value for the second objective, which can freely decrease the first objective. The middle point, however, has to compete with the other two points to be the sole contributor for some regions. Therefore, its effect on the hypervolume is smaller, which leads to a smaller gradient. Furthermore, if less area is dominated by the middle point alone, which can occur during the adjustment as the middle point moves less, then its influence becomes even smaller and it can become dominated.

It is important to highlight that this behavior does not always happen, but it can occur along the iterative process, as shown in [13]. This leads to the base hypothesis for the algorithm developed in this paper: when using the hypervolume's gradient for optimization, the competition among points for increasing the hypervolume should be avoided.

III. HYBRID HYPERVOLUME MAXIMIZATION ALGORITHM

From the discussion in Section II-B, one can see that the major problem when optimizing the hypervolume directly using its gradient may be the competition among points. Therefore, our proposed algorithm optimizes a single solution at a time, avoiding this competition.

Theoretically, the algorithm can be described by choosing a new point that maximizes the hypervolume when taking into account the previous points, so that its recurrence can be written as

    x_t = argmax_{x ∈ 𝒳} H(X_{t−1} ∪ {x}),    X_t = X_{t−1} ∪ {x_t},    t ∈ ℕ,    (2)

where the initial set is given by X_0 = {}.

Since a single point is optimized at a time, the optimization becomes simpler and, as we will show in Section IV, requires fewer function evaluations. Moreover, one could argue that keeping the previous set fixed reduces flexibility in comparison with a set where all the points are concurrently adjusted. Although this may be true, we will also show in Section IV that the proposed algorithm performs well despite this loss of flexibility. The algorithm described in Eq.
(2) is only an idealization, since finding the exact maximum is hard in practice. Therefore, the actual algorithm proposed is shown in Fig. 2. This algorithm performs exploration of the objective space until a new solution that is not dominated by the previous candidate solutions is found. When this happens, the hypervolume of the whole set is larger than the hypervolume of the previous candidate solutions alone.

The new candidate solution is then exploited to maximize the total hypervolume and, after convergence, is added to the existing set. It is important to highlight that the exploitation phase cannot make the solution become dominated, since that would reduce the hypervolume in comparison with the initial condition. Therefore, the problem of points becoming

    Input: Objectives f
    Input: Design space 𝒳
    Input: Nadir point z
    Output: Set of candidate solutions X
    function HybridGreedyOptimizer(f, 𝒳, z)
        Regions, X ← CreateInitialRegion(f, 𝒳)
        while not stop condition and |Regions| > 0 do
            R ← Regions.pop()    ▷ removes the region with the largest volume
            x ← ExploreDeterministic(f, 𝒳, R, X)
            if x is valid then
                x ← Exploit(f, 𝒳, x, X, z)
                NewRegions ← CreateRegions(R, x, f)
                Regions ← Regions ∪ NewRegions
                X ← X ∪ {x}
            end if
        end while
        while not stop condition do
            x ← ExploreStochastic(f, 𝒳, X)
            x ← Exploit(f, 𝒳, x, X, z)
            X ← X ∪ {x}
        end while
        return X
    end function

Figure 2: Hybrid algorithm that performs deterministic and stochastic exploration until a suitable solution is found, and then exploits it.

    Input: Objectives f
    Input: Design space 𝒳
    Input: Current exploration region R
    Input: Set of candidate solutions X
    Output: New initial condition x₀
    function ExploreDeterministic(f, 𝒳, R, X)
        x ← Mean(R.X)
        Minimize ‖R.mid − f(x)‖ from x until some candidate x′ is not dominated by X
        if a non-dominated x′ was found then
            x₀ ← x′
        else
            x₀ ← some invalid state
        end if
        return x₀
    end function

Figure 3: A deterministic exploration is performed based on some region.

dominated during the exploitation is avoided. Furthermore, the exploitation is a traditional single-objective optimization, so that gradient methods can be used if the decision set 𝒳 is continuous, or hill-climbing methods can be used for a discrete 𝒳.

Once the exploitation is finished, the algorithm begins the exploration phase again. The exploration can be deterministic, based on regions of the objective space defined by previous solutions, or stochastic, where a stochastic algorithm, such as an evolutionary algorithm, is used to find the new candidate. When a non-dominated candidate is found, the algorithm turns to exploitation again.

We highlight that the deterministic exploration algorithm proposed here is based on the definition of these regions, but other deterministic methods can be used. However, the algorithm must be able to establish when it is not able to provide

    Input: Objectives f
    Input: Design space 𝒳
    Output: Set of candidate solutions X
    Output: Initial exploration region R
    function CreateInitialRegion(f, 𝒳)
        X ← {}
        x ← 𝒳.mean    ▷ gets the average candidate
        for i = 1, …, |f| do
            x ← Minimize(f_i, x, 𝒳)
            X ← X ∪ {x}
        end for
        R ← CreateRegion(X, f)
        return {R}, X
    end function

Figure 4: The initial region is created from the points that minimize each objective individually.
further improvements, so that the change to the stochastic global exploration can be made. In the algorithm shown in Fig. 2, regions that do not provide a valid initial condition are discarded without creating new regions, so that eventually the algorithm switches to the stochastic global exploration.

    Input: Current explored region R
    Input: Current solution x
    Input: Objectives f
    Output: New exploration regions NewRegions
    function CreateRegions(R, x, f)
        NewRegions ← {}
        for X′ in Combinations(R.X, |R.X| − 1) do
            R′ ← CreateRegion(X′ ∪ {x}, f)
            NewRegions ← NewRegions ∪ {R′}
        end for
        return NewRegions
    end function

Figure 5: New exploration regions are created by combining the current solution with the previous region.

The algorithm for deterministic exploration is shown in Fig. 3. It combines the points used to create a given region to produce an initial condition and tries to minimize the distance between its objective value and a reference point. Once a non-dominated point is found, it is returned for exploitation. Although this simple optimization provided good results without requiring many function evaluations, other methods could be used to perform this exploration. Alternatively, one can also perform a stochastic exploration instead of a deterministic one, but this may hurt performance if the information provided by the region R is not used, since a global search would then be required.

The first region is created by finding the points that minimize each objective separately, as shown in Fig. 4. This establishes that the initial region has a number of candidate solutions associated with it equal to the number of objectives, so that the solutions are at the border of the region.

When new regions are created after exploitation, we ignore the solutions that created the region, one at a time, and replace each with the proposed new solution, as shown in Fig. 5, to create a new region. This guarantees that the number of solutions for each region is kept equal to the number of objectives.

Finally, Fig. 6 shows how a region is created. If a region does not have a volume, then at least one objective is equal for two of its solutions. Although we could allow such a region to exist without modifying the rest of the algorithm, these regions tend not to provide good candidates for exploitation and delay the change to the stochastic global exploration. Furthermore, one can even prohibit regions with volume smaller than some known constant, as they will probably not provide good exploitation points, so that the change to the stochastic global exploration happens earlier.

    Input: Objectives f
    Input: Set of candidate solutions X
    Output: Exploration region R
    function CreateRegion(X, f)
        V ← ∏_{i=1}^{|f|} (max_{x ∈ X} f_i(x) − min_{x ∈ X} f_i(x))
        if V > 0 then
            R.X ← X
            R.mid ← Mean({f(x) | x ∈ X})
            R.V ← V
        else
            R ← null element such that {R} ≡ {}
        end if
        return R
    end function

Figure 6: An exploration region is created from a set of candidates if the region has some volume.

Figure 7: Deterministic exploration and exploitation steps of the new algorithm in an example problem. The Pareto frontier is shown as the blue line, and the regions used by the deterministic exploration are shown in yellow. (a) Deterministic exploration. (b) Exploitation.

Fig. 7 shows a step of the algorithm in an example problem with two objectives. The deterministic exploration receives a region R, composed of the points x_1 and x_2. The mean of the points that compose the region is given by x_0 = (x_1 + x_2)/2, and its evaluation in the objective space is shown in Fig. 7a. The mean objective of the points that compose the region is also computed and is shown as y = (f(x_1) + f(x_2))/2.
The deterministic exploration is then defined by the problem

    min_{x ∈ 𝒳} ‖f(x) − y‖,    (3)

which uses x_0 as the initial condition for the optimization. Since y is guaranteed to be non-dominated by f(x_1) and f(x_2), this should guide the search to the non-dominated region of the space.

While performing this optimization, some intermediary points are evaluated, either while computing the numeric gradient or after performing a gradient descent step. The deterministic exploration stops as soon as a non-dominated point is found, given by f(x′) in the example of Fig. 7a. Note that this example shows f(x_0) as being dominated by f(x_1) and f(x_2), but it can also be non-dominated; in this case, x′ = x_0 and no optimization step for the problem in Eq. (3) is performed. If no non-dominated point f(x′) is found during the deterministic exploration, the region is simply discarded, without performing an exploitation step.

Using the point x′, whose image f(x′) is non-dominated, provided by the deterministic or the stochastic exploration, the exploitation is performed. Fig. 7b shows the hypervolume contributions for the initial point x′ and the optimal point x∗, which maximizes the total hypervolume as in Eq. (2). Since x′ is non-dominated, its hypervolume contribution H′ is positive and the hypervolume gradient relative to the objectives is non-zero. After finding x∗, and if x′ was provided by the deterministic exploration, new regions must be created to allow further exploration. Therefore, according to Fig. 5, the regions R_1 = (x_1, x∗) and R_2 = (x_2, x∗) are created for further exploration.

This finalizes a step of the algorithm, which is repeated while the given stop condition is not met.
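The deterministic exploration of Eq. (3) can be sketched as a plain finite-difference descent on ‖f(x) − y‖² that stops at the first iterate not dominated by the archive. Everything below (the one-dimensional toy problem, the archive point, the step size) is an illustrative assumption, not the paper's implementation, which uses L-BFGS-B:

```python
def explore_deterministic(f, x0, y_target, archive_images,
                          step=0.05, iters=200, h=1e-6):
    """Minimize ||f(x) - y_target||^2 from x0 and return the first iterate
    whose image is not strictly dominated by the archive (cf. Eq. (3))."""
    def dominated(y):
        return any(all(a < b for a, b in zip(img, y)) for img in archive_images)

    def dist2(x):
        return sum((a - b) ** 2 for a, b in zip(f(x), y_target))

    x = list(x0)
    for _ in range(iters):
        if not dominated(f(x)):
            return x                      # non-dominated: hand over to exploitation
        grad = []
        for i in range(len(x)):           # forward-difference gradient
            xp = list(x)
            xp[i] += h
            grad.append((dist2(xp) - dist2(x)) / h)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return None                           # no non-dominated point: discard region

# Toy problem: f(x) = (x^2, (x - 1)^2); a hypothetical archive point (0.2, 0.2)
# dominates the starting image (0.25, 0.25), so a few descent steps are needed.
toy = lambda x: (x[0] ** 2, (x[0] - 1.0) ** 2)
found = explore_deterministic(toy, [0.5], (0.0, 0.2), [(0.2, 0.2)])
```

The search returns at the first iterate that escapes the dominated region, which is exactly the moment when the algorithm above would switch from exploration to exploitation.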
As at most one point is found in each step, the stop condition can be defined based on the number of desired points.

Note that all the methods used in this algorithm assume that the optimization, either for exploitation or for minimizing one objective alone, requires an initial condition. This is true for hill-climbing and gradient methods, but the algorithm can easily be modified if the optimization does not require one.

IV. EXPERIMENTAL RESULTS

To compare the algorithm proposed in Section III, called the Hybrid Hypervolume Maximization Algorithm (H2MA), with the existing algorithms, the ZDT family of functions [14] was chosen. These functions form a common benchmark in the multi-objective optimization literature, since they define a wide range of problems that test different characteristics of an optimization algorithm. All functions defined in [14] have a continuous decision space, except for ZDT5, which has a binary space. In this paper, only the continuous test functions were used to evaluate the performance of the new algorithm, and their equations are shown in the Appendix.

Figure 8: 0th, 25th, 50th, 75th, and 100th percentiles every 2000 evaluations for all algorithms on ZDT1. (a) Hypervolume. (b) P-distance; zero values not shown.

Table I: Benchmark problems used for evaluation. See the Appendix for Eqs. (4) to (8).

    Problem | Objectives | Decision space 𝒳        | Nadir point
    ZDT1    | Eq. (4)    | [0, 1]^n                | (2, …)
    ZDT2    | Eq. (5)    | [0, 1]^n                | (2, …)
    ZDT3    | Eq. (6)    | [0, 1]^n                | (2, …)
    ZDT4    | Eq. (7)    | [0, 1] × [−5, 5]^(n−1)  | (2, …)
    ZDT6    | Eq. (8)    | [0, 1]^n                | (2, …)

Table I provides a summary of the evaluation functions, their decision spaces, and the Nadir points used to compute the hypervolume.
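As an example of the benchmark family, the standard ZDT1 definition from the literature (assumed here to match Eq. (4) in the Appendix) can be written directly:

```python
def zdt1(x):
    """Standard ZDT1 objectives (minimization), x in [0, 1]^n, n >= 2."""
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)   # g = 1 exactly on the Pareto set
    f2 = g * (1.0 - (f1 / g) ** 0.5)
    return (f1, f2)
```

On the Pareto set all tail variables are zero, so g = 1 and the frontier is f2 = 1 − √f1, which is why a P-distance of zero can be checked directly for this problem.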
The Nadir points are defined by the upper bounds of the objectives, which guarantees that the hypervolume computation is always valid, plus one, since not adding this extra value would mean that points at the border of the frontier would contribute nothing to the hypervolume and would be avoided. In all instances, a total of n = 30 variables was considered, as is common in the literature. The implementations of the evolutionary algorithms and of the evaluation functions were provided by the PaGMO library [16].

We compare our algorithm with existing state-of-the-art multi-objective optimization algorithms, namely NSGA-II [3], SPEA2 [4], and SMS-EMOA [8]. All of them used a population size of 100 individuals. Tests have shown that this size provides good performance due to the balance between exploration of the space and exploitation of the individuals, with much smaller populations not providing good exploration and much larger ones not providing good exploitation. The SMS-EMOA can use two methods for selecting points in dominated fronts: the least hypervolume contribution or the domination count. Both methods were tested, labeled SMS1 and SMS2, respectively, in the following figures. Note that this choice only applies to the dominated fronts, since the domination count is zero for all points in the non-dominated front, where the least-contributor method must be used. Furthermore, the SMS-EMOA performance presented in this paper uses a dynamic Nadir point, found by adding one to the maximum over all points in each objective, since using the Nadir points presented in Table I created a very high selective pressure, which in turn led to poor exploration and performance.

Since the decision space and the objectives are continuous, the exploitation and deterministic exploration methods may resort to a gradient-based algorithm. In this paper, we used the L-BFGS-B method implemented in the SciPy library [17], which is able to handle the bounds of 𝒳 and is very efficient at finding a local optimum.
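The role of the "+1" padding of the Nadir points described above can be seen directly on the hypervolume contribution of an extreme point. A small sketch (the 2-D hypervolume routine and the sample front are illustrative assumptions):

```python
def hv2(points, nadir):
    """2-D hypervolume under minimization, with strict dominance."""
    pts = sorted(p for p in points if p[0] < nadir[0] and p[1] < nadir[1])
    area, prev = 0.0, nadir[1]
    for a, b in pts:
        if b < prev:
            area += (nadir[0] - a) * (prev - b)
            prev = b
    return area

def contribution(p, points, nadir):
    """Exclusive hypervolume contribution of p within the set."""
    rest = [q for q in points if q != p]
    return hv2(points, nadir) - hv2(rest, nadir)

front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]   # sample front with f_i in [0, 1]
tight = (1.0, 1.0)    # Nadir exactly at the upper bounds
padded = (2.0, 2.0)   # upper bounds plus one
```

With the tight Nadir, the extreme point (0.0, 1.0) does not strictly dominate the Nadir and so contributes nothing, meaning an optimizer would drop it; with the padded Nadir its contribution is strictly positive, which is exactly the behavior the padding is meant to ensure.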
As the other algorithms being compared are evolutionary algorithms, which can only access the objective functions by evaluating them at given points, the gradient for the L-BFGS-B is computed numerically, to avoid an unfair advantage in favor of our algorithm.

For the stochastic global exploration, we used an evolutionary algorithm with non-dominance sorting and removal based on the number of dominating points. The population was filled with the given set of previous solutions X; if fewer points than the minimum population size were provided, the others were created by sampling the decision space 𝒳 uniformly at random. Once a new point is introduced into the non-dominated front, it is returned for exploitation, because it increases the hypervolume when added to the previous solutions X. The size of this population was chosen experimentally to provide a good enough exploration of the space toward the initial conditions for the exploitation. This size is smaller than the population size of the pure evolutionary algorithms, because a pure evolutionary algorithm needs diversity to explore and exploit all of its population, while the stochastic part of the H2MA is already initialized with good and diverse candidate solutions provided by the exploitation procedure, reducing its exploration requirements.

Figure 9: 0th, 25th, 50th, 75th, and 100th percentiles every 2000 evaluations for all algorithms on ZDT2. (a) Hypervolume. (b) P-distance; zero values not shown.

Figure 10: 0th, 25th, 50th, 75th, and 100th percentiles every 2000 evaluations for all algorithms on ZDT3. (a) Hypervolume. (b) P-distance; zero values not shown.

Besides computing the solutions' hypervolume, which is the metric that the H2MA is trying to maximize and which provides a good method for comparing solutions, we can compute the distance between the achieved objectives and the Pareto frontier, since the Pareto frontiers of the ZDT functions are known. This defines the P-distance, which is zero, or close to zero due to numerical issues, for points on the frontier.

Figs. 8, 9, 10, 11, and 12 present the results for the problems ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6, respectively. A maximum of 20000 function evaluations was considered, and the graphs show the 0th, 25th, 50th, 75th, and 100th percentiles of each performance indicator over 100 runs of the algorithms. Since the P-distance is shown in log scale, some values obtained by our proposal are absent or only partially present, because they produced a P-distance of zero.

From ZDT1 to ZDT4, the H2MA never ran out of regions to explore, so the stochastic exploration was not used and all runs have the same performance. For the function ZDT6, the first objective, given by Eq. (8a), causes some problems for the deterministic exploration. During the creation of the first region for this problem, the mean point is used as the initial condition for optimizing each objective, as shown in Fig. 4. However, the first objective of ZDT6 has a null derivative at x_1 = 0. In this case, even traditional higher-order methods would not help, since the first non-zero derivative of f_1(x) at this point is the sixth.
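This flat region can be checked numerically with the standard ZDT6 first objective from the literature, f_1(x_1) = 1 − exp(−4x_1) sin⁶(6πx_1), assumed here to match Eq. (8a) in the Appendix: a forward difference at x_1 = 0 is essentially zero, while away from the flat point the slope is large.

```python
import math

def zdt6_f1(x1):
    """First objective of ZDT6 (standard definition from the literature)."""
    return 1.0 - math.exp(-4.0 * x1) * math.sin(6.0 * math.pi * x1) ** 6

h = 1e-3
slope_at_zero = (zdt6_f1(h) - zdt6_f1(0.0)) / h        # ~ -(6*pi*h)**6 / h
slope_later = (zdt6_f1(0.08 + h) - zdt6_f1(0.08)) / h  # ordinary slope
```

The slope at the origin is many orders of magnitude below 1, so a gradient step makes essentially no progress there, matching the observation that the first non-zero derivative at the origin is the sixth.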
As the first objectivedoes not change in this case and it also has local minima that −0.5 0 0.5 1 1.5 2 2.5x 10 H (a) Hypervolume −0.5 0 0.5 1 1.5 2 2.5x 10 −12−10−8−6−4−2024 NSGA−IISPEA2SMS1SMS2H2MA l og P (b) P-distance. Zero values not shown. Figure 11: 0th, 25th, 50th, 75th, and 100th percentiles every 2000 evaluations for the all algorithms on ZDT4. −0.5 0 0.5 1 1.5 2 2.5x 10 H (a) Hypervolume −0.5 0 0.5 1 1.5 2 2.5x 10 −0.8−0.6−0.4−0.200.20.40.60.81 NSGA−IISPEA2SMS1SMS2H2MA l og P (b) P-distance. Zero values not shown. Figure 12: 0th, 25th, 50th, 75th, and 100th percentiles every 2000 evaluations for the all algorithms on ZDT6.are very hard to overcome, the algorithm quickly switches tousing stochastic exploration. Once new regions have candidatepoints, the algorithm is able to exploit them.Besides this issue in the deterministic exploration of theproblem ZDT6, the local minima of the first objective makessome candidate solutions be sub-optimal, increasing the P-distance as shown in Fig. 12b. Nonetheless, the achieved P-distance is better than the evolutionary algorithms and the thpercentile is zero. Moreover, Figs. 8b, 9b, 10b, and 11b showthat the candidate solutions are always on the Pareto frontierfor the problems ZDT1 to ZDT4. This allows the user to stopthe optimization at any number of evaluations, even with veryfew function evaluations, and have a reasonable expectationthat the solutions found are efficient.When we evaluate the hypervolume indicator, we see that,for the problems ZDT1, ZDT2, ZDT4, and ZDT6, the per- formance of the H2MA is much better, even for the lastone using stochastic exploration. Moreover, the H2MA’s worsthypervolume was always better than the best hypervolume forall evolutionary algorithms and it was able to get closer to themaximum hypervolume possible with relatively few functionevaluations, being a strong indication of its efficiency.For the problem ZDT3, whose hypervolume performanceis shown in Fig. 
10a, the H2MA was generally better than the evolutionary algorithms. The Pareto frontier of ZDT3 is composed of disconnected sets of points, which was designed to test an algorithm's ability to deal with discontinuous frontiers. Since the exploitation algorithm used for the results is gradient-based, it is not able to properly handle discontinuous functions, which is the case of the hypervolume on discontinuous frontiers. However, the deterministic exploration method is able to find points whose exploitation lands on different parts of the Pareto frontier, providing the expected diversity.

Figure 13: Comparison between the number of function evaluations required to achieve the same hypervolume using the numeric or the analytic gradient (analytic versus numeric evaluations, for ZDT1 to ZDT4). The dotted line represents a 30-fold improvement.

Fig. 13 shows a comparison between the number of function evaluations required by the numeric and the analytic gradient to achieve the same hypervolume on the problems ZDT1 to ZDT4. The analytic method for computing the hypervolume's gradient is described in [12]. The comparison for ZDT6 is not shown due to its different scale, since many function evaluations are spent in the global stochastic exploration because the deterministic exploration fails to find regions.

As expected, using the analytic gradient yields roughly a 30-fold improvement over the numeric gradient, since the number of decision variables is 30. However, the gain is not exactly this factor. This can be explained by a difference in behavior during the deterministic exploration: the first non-dominated point found is used to perform the exploitation, even if this point was found during the computation of the numeric gradient.
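The cost gap has a simple mechanical origin: a central-difference estimate of the hypervolume gradient spends two objective evaluations per decision variable, and each probe is itself a candidate solution. The sketch below illustrates this with a 2-D sweep-line hypervolume under minimization; all function names are ours, not the paper's implementation:

```python
def hypervolume_2d(points, ref):
    """Area dominated by a set of 2-D objective vectors (minimization)
    with respect to the reference point `ref` (sweep over sorted f1)."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, top = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < top:                      # non-dominated in the sweep
            area += (ref[0] - f1) * (top - f2)
            top = f2
    return area

def numeric_hv_gradient(objective, x, archive, ref, h=1e-6):
    """Central-difference gradient of the hypervolume with respect to the
    decision vector `x`: 2 * len(x) extra objective evaluations per call."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        hv_p = hypervolume_2d(archive + [objective(xp)], ref)
        hv_m = hypervolume_2d(archive + [objective(xm)], ref)
        grad.append((hv_p - hv_m) / (2.0 * h))
    return grad

# Toy bi-objective: f(x) = (x^2, (x - 1)^2). At x = 0.25 the hypervolume
# still grows along +x, so the gradient component is positive.
grad = numeric_hv_gradient(lambda x: (x[0] ** 2, (x[0] - 1.0) ** 2),
                           [0.25], [], (2.0, 2.0))
print(grad)
```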
For ZDT1 and ZDT4, the reuse of points found during the numeric-gradient computation causes the new points to be very close to the original ones, reducing the numeric variant's performance and increasing the relative advantage of the analytic gradient.

Moreover, a similar effect gives ZDT3 a lower improvement when using the analytic gradient. The Pareto frontier of ZDT3 is discontinuous, which causes a discontinuity in the hypervolume; the numeric gradient can perceive these large changes, because small changes in the variables can have large effects on the hypervolume, pulling the solution when the difference is significant, while the analytic gradient is not able to provide such knowledge. Nonetheless, the analytic gradient presents at least a 15-fold improvement over the numeric one on ZDT3.

A. Analysis of the H2MA’s performance

As shown in Section IV, the proposed H2MA is able to surpass the state-of-the-art in multi-objective optimization, based on evolutionary algorithms. It is therefore important to analyze the algorithm and to discuss why this improvement happened.

Evolutionary algorithms perform a guided exploration, with new individuals created from existing high-performing individuals, which allows them to escape local minima but reduces the convergence speed. On the other hand, traditional optimization algorithms tend to find local minima quickly, but the optimal point achieved depends on the minima's regions of attraction.

These two kinds of algorithms have complementary natures, which makes them good candidates for a hybrid algorithm: the evolutionary algorithm explores the space and provides initial conditions for the local optimization, which then finds minima quickly. Although this does produce better results, it only explains the performance on the ZDT6 problem, since the other problems did not enter the stochastic phase.

In order to understand the algorithm's behavior, we must keep in mind that each new point added by the algorithm is solving a very different problem.
Since the previous points that are considered during the hypervolume optimization change as more points are added, the objective surface for each new point is different from the previous ones and takes into account the already achieved portion of the hypervolume. To visualize this, suppose that the hypervolume's gradient is defined over the previously found points and that one previous solution is used as the initial condition for the gradient-based exploitation to find a new point to add to the solution set. Although the initial condition was a local optimum of a previous problem, it is not a local optimum of the current problem, because any small change that creates a non-dominated point will improve the total hypervolume. Therefore, we do not need to worry about the new optimization converging to a previous solution point: the problem landscape is different, so different local minima will be found, increasing the total hypervolume. The deterministic exploration is only required because the hypervolume's gradient is not defined at the border of the hypervolume, so a new independent point must be found.

This explains the performance improvement on ZDT1 to ZDT4: every added point improves the hypervolume as much as it locally can, so an improvement is guaranteed to happen. Evolutionary algorithms, on the other hand, use function evaluations without guarantees of improving the total hypervolume, since dominated solutions can be found.

Moreover, although a local optimum found during exploitation may not be an efficient solution due to irregularities in the objective surface, the experiments show that this is rarely the case, since the P-distance of the solutions found is generally zero.
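In other words, the single-point objective at each step is the hypervolume gained over the current archive: a point already in the archive contributes nothing, while any new non-dominated point still has positive gain. A small 2-D illustration under minimization (helper names are ours, not the paper's):

```python
def hv2d(points, ref):
    """Area dominated by 2-D points (minimization) w.r.t. reference `ref`."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, top = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < top:
            area += (ref[0] - f1) * (top - f2)
            top = f2
    return area

def contribution(candidate, archive, ref):
    """Hypervolume gained by adding `candidate` to the current archive;
    this is the landscape each new exploitation step maximizes."""
    return hv2d(archive + [candidate], ref) - hv2d(archive, ref)

ref = (2.0, 2.0)
archive = [(0.5, 1.0)]                         # previously found solution
print(contribution((0.5, 1.0), archive, ref))  # -> 0.0: an old optimum is flat now
print(contribution((1.0, 0.5), archive, ref))  # -> 0.5: a new point still gains
```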
This near-zero P-distance is expected, since the hypervolume is maximal when computed over points in the Pareto set, and the performance on all ZDT problems provides support to this claim.

We must highlight that we are not saying that evolutionary algorithms should not be used at all, but that they should be applied whenever traditional optimization methods are not able to solve the problem. This is the case of ZDT6, for instance, where an evolutionary algorithm was required to provide initial conditions for the exploitation. We consider it very important to have alternative methods that are better on a subset of the problems and to use them when a problem from that subset is present. This is exactly what the H2MA does: when traditional optimization is not able to find an answer, which indicates that the problem is outside of the subset it can deal with, an evolutionary algorithm, which can handle a superset class of problems, is used until the problem becomes part of the subset again, establishing a switching behavior that takes advantage of both algorithms.

V. Conclusion

This paper proposed the Hybrid Hypervolume Maximization Algorithm (H2MA) for multi-objective optimization, which tries to maximize the hypervolume one point at a time. It first performs deterministic local exploration and, when it gets stuck, switches to stochastic global exploration using an evolutionary algorithm. The optimization algorithm used during the deterministic phase is problem-dependent and can be a gradient-based method, when the decision space is continuous, or a hill-climbing method, when the decision space is discrete. Here we have explored solely continuous decision spaces.

The algorithm was compared with state-of-the-art algorithms for multi-objective optimization, namely NSGA-II, SPEA2, and SMS-EMOA, on the ZDT problems.
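The alternation summarized above can be caricatured as local search with a stochastic fallback. The sketch below substitutes simple hill climbing for the gradient-based exploitation and random restarts for the evolutionary exploration, purely to illustrate the control flow; it is not the authors' implementation:

```python
import random

def deterministic_exploit(f, x, step=0.05, tol=1e-6):
    """Greedy 1-D hill climbing (stand-in for the gradient-based step):
    move while f improves, refine the step on stagnation."""
    while step > tol:
        if f(x + step) > f(x):
            x += step
        elif f(x - step) > f(x):
            x -= step
        else:
            step /= 2.0          # stagnation at this scale: refine
    return x

def hybrid_sketch(f, restarts=20, seed=0):
    """Alternate exploitation with stochastic exploration (random restarts
    standing in for the evolutionary step) and keep the best point found."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        x = deterministic_exploit(f, rng.uniform(0.0, 1.0))
        if best is None or f(x) > f(best):
            best = x
    return best

# Maximize a smooth unimodal function on [0, 1]; the optimum is at 0.7.
print(hybrid_sketch(lambda x: -(x - 0.7) ** 2))
```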
Despite using a numeric gradient for the objective functions, which increases the number of function calls, the algorithm consistently provided a higher hypervolume for the same number of function evaluations when compared to the aforementioned evolutionary algorithms. Only for ZDT3 was the performance slightly reduced, due to the discontinuous nature of the Pareto frontier, which causes a discontinuity in the hypervolume that is not properly handled by gradient-based methods.

Moreover, for all problems except ZDT6, all the solutions found by the algorithm were on the Pareto frontier, which makes them efficient solutions. For ZDT6, the median case also had all solutions on the Pareto frontier, but the use of stochastic exploration did not always guide the search to a solution on the Pareto frontier. Nonetheless, the obtained solutions were better than those provided by the evolutionary algorithms. Moreover, the solutions provided for ZDT1 to ZDT4 achieved high performance using only the deterministic part of the algorithm.

Evolutionary algorithms usually have better performance when their populations are larger, so that diverse individuals can be selected for crossover. However, most of the time users do not require many options, so the H2MA presents itself as an alternative for finding a good set of solutions at a lower computational cost in most problems, although it does not limit the computational burden or the number of points found. If the problem has more reasonable objectives than ZDT6, which was designed with an extreme case in mind, we can expect many points to be found by the deterministic mechanism, which makes the algorithm more reliable. Moreover, the solutions found should be efficient, which is characterized by a low P-distance, and diverse in the objectives, which is characterized by a larger hypervolume when only efficient solutions are considered.

Future work should focus on using surrogates to reduce the number of evaluations [18], [19], [20].
Although the H2MA is very efficient in its use of evaluations, the numeric gradient may consume many evaluations and be unreliable for complicated functions, as their implementation can cause numerical errors larger than the step used. With a surrogate, the gradient can be determined directly and fewer evaluations are required.

Another important research problem is to find a new algorithm for computing the hypervolume, since existing algorithms focus mainly on computing the hypervolume of a given set of points [21]. Since the solution set is constructed one solution at a time in the H2MA, a recursive algorithm that computes the hypervolume of $X \cup \{x\}$ given the hypervolume of $X$ should reduce the computing requirements.

Appendix

ZDT1:

f_1(x) = x_1  (4a)
f_2(x) = g(x) \, h(f_1(x), g(x))  (4b)
g(x) = 1 + \frac{9}{n-1} \sum_{i=2}^{n} x_i  (4c)
h(f_1(x), g(x)) = 1 - \sqrt{f_1(x)/g(x)}  (4d)

ZDT2:

f_1(x) = x_1  (5a)
f_2(x) = g(x) \, h(f_1(x), g(x))  (5b)
g(x) = 1 + \frac{9}{n-1} \sum_{i=2}^{n} x_i  (5c)
h(f_1(x), g(x)) = 1 - (f_1(x)/g(x))^2  (5d)

ZDT3:

f_1(x) = x_1  (6a)
f_2(x) = g(x) \, h(f_1(x), g(x))  (6b)
g(x) = 1 + \frac{9}{n-1} \sum_{i=2}^{n} x_i  (6c)
h(f_1(x), g(x)) = 1 - \sqrt{f_1(x)/g(x)} - \frac{f_1(x)}{g(x)} \sin(10 \pi f_1(x))  (6d)

ZDT4:

f_1(x) = x_1  (7a)
f_2(x) = g(x) \, h(f_1(x), g(x))  (7b)
g(x) = 1 + 10(n-1) + \sum_{i=2}^{n} \left( x_i^2 - 10 \cos(4 \pi x_i) \right)  (7c)
h(f_1(x), g(x)) = 1 - \sqrt{f_1(x)/g(x)}  (7d)

ZDT6:

f_1(x) = 1 - \exp(-4 x_1) \sin^6(6 \pi x_1)  (8a)
f_2(x) = g(x) \, h(f_1(x), g(x))  (8b)
g(x) = 1 + 9 \left( \sum_{i=2}^{n} \frac{x_i}{n-1} \right)^{0.25}  (8c)
h(f_1(x), g(x)) = 1 - (f_1(x)/g(x))^2  (8d)

Acknowledgment

The authors would like to thank CNPq for the financial support.

References

[1] K. Miettinen, Nonlinear Multiobjective Optimization. Springer US, 1999.
[2] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons, 2001.
[3] K. Deb, A. Pratap, S. Agarwal, and T.
Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
[4] E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strength Pareto evolutionary algorithm,” 2001.
[5] E. Zitzler, D. Brockhoff, and L. Thiele, “The hypervolume indicator revisited: On the design of Pareto-compliant indicators via weighted integration,” in Evolutionary Multi-Criterion Optimization. Springer, 2007, pp. 862–876.
[6] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. da Fonseca, “Performance assessment of multiobjective optimizers: An analysis and review,” IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, 2003.
[7] A. Auger, J. Bader, D. Brockhoff, and E. Zitzler, “Theory of the hypervolume indicator: Optimal µ-distributions and the choice of the reference point,” in Proceedings of the Tenth ACM SIGEVO Workshop on Foundations of Genetic Algorithms. ACM, 2009, pp. 87–102.
[8] N. Beume, B. Naujoks, and M. Emmerich, “SMS-EMOA: Multiobjective selection based on dominated hypervolume,” European Journal of Operational Research, vol. 181, no. 3, pp. 1653–1669, 2007.
[9] D. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
[10] D. Luenberger and Y. Ye, Linear and Nonlinear Programming. Springer US, 2008.
[11] P. A. N. Bosman, “On gradients and hybrid evolutionary algorithms for real-valued multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 16, no. 1, pp. 51–69, 2012.
[12] M. Emmerich and A. Deutz, “Time complexity and zeros of the hypervolume indicator gradient field,” in EVOLVE – A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation III. Springer, 2014, pp. 169–193.
[13] V. A. S. Hernández, O. Schütze, and M. Emmerich, “Hypervolume maximization via set based Newton’s method,” in EVOLVE – A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V. Springer, 2014, pp. 15–28.
[14] E.
Zitzler, K. Deb, and L. Thiele, “Comparison of multiobjective evolutionary algorithms: Empirical results,” Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000.
[15] K. Deb, “Multi-objective optimization,” in Search Methodologies. Springer, 2014, pp. 403–449.
[16] F. Biscani, D. Izzo, and C. H. Yam, “A global optimisation toolbox for massively parallel engineering optimisation,” arXiv preprint arXiv:1004.3824, 2010.
[17] E. Jones, T. Oliphant, P. Peterson et al., “SciPy: Open source scientific tools for Python,” 2001.
[18] Y. Jin, “A comprehensive survey of fitness approximation in evolutionary computation,” Soft Computing, vol. 9, no. 1, pp. 3–12, 2005.
[19] J. Knowles and H. Nakayama, “Meta-modeling in multiobjective optimization,” in Multiobjective Optimization. Springer, 2008, pp. 245–284.
[20] I. Voutchkov and A. Keane, “Multi-objective optimization using surrogates,” in Computational Intelligence in Optimization. Springer, 2010, pp. 155–175.
[21] N. Beume, C. M. Fonseca, M. López-Ibáñez, L. Paquete, and J. Vahrenhold, “On the complexity of computing the hypervolume indicator,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 1075–1082, 2009.

Conrado S. Miranda received his M.S. degree in Mechanical Engineering and his B.S. in Control and Automation Engineering from the University of Campinas (Unicamp), Brazil, in 2014 and 2011, respectively. He is currently a Ph.D. student at the School of Electrical and Computer Engineering, Unicamp. His main research interests are machine learning, multi-objective optimization, neural networks, and statistical models.