An Online Prediction Approach Based on Incremental Support Vector Machine for Dynamic Multiobjective Optimization

Dejun Xu, Min Jiang, Senior Member, IEEE, Weizhen Hu, Shaozi Li, Senior Member, IEEE, Renhu Pan, and Gary G. Yen, Fellow, IEEE
Abstract—Real-world multiobjective optimization problems usually involve conflicting objectives that change over time, which requires optimization algorithms to quickly track the Pareto-optimal front (POF) when the environment changes. In recent years, evolutionary algorithms based on prediction models have been considered promising. However, most existing approaches only make predictions based on the linear correlation between a finite number of optimal solutions in two or three previous environments. These incomplete information extraction strategies may lead to low prediction accuracy in some instances. In this paper, a novel prediction algorithm based on incremental support vector machine (ISVM) is proposed, called ISVM-DMOEA. We treat the solving of dynamic multiobjective optimization problems (DMOPs) as an online learning process, using the continuously obtained optimal solutions to update an incremental support vector machine without discarding the solution information from earlier times. The ISVM is then used to filter random solutions and generate an initial population for the next moment. To overcome the obstacle of insufficient training samples, a synthetic minority oversampling strategy is implemented before the training of the ISVM. The advantage of this approach is that the nonlinear correlation between solutions can be explored online by the ISVM, and the information contained in all historical optimal solutions can be exploited to a greater extent. Experimental results and comparisons with chosen state-of-the-art algorithms demonstrate that the proposed algorithm can effectively tackle dynamic multiobjective optimization problems.
Index Terms—Evolutionary algorithm, multiobjective optimization, prediction model, oversampling, incremental support vector machine.
I. INTRODUCTION

This work was supported by the National Natural Science Foundation of China under Grant 61673328. (Corresponding authors: M. Jiang; G. G. Yen.)
D. Xu, M. Jiang, W. Hu, and S. Li are with the School of Informatics, Xiamen University, Fujian 361005, China. R. Pan is with Fujian Longking Co., Ltd., Fujian 364000, China. G. G. Yen is with the School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK, USA.

DYNAMIC multiobjective optimization problems, which refer to a class of optimization problems involving multiple conflicting objectives whose objective functions or constraints change over time, are very common in urban traffic control, power system scheduling, investment management, data mining, and other industrial applications [1]–[3]. For example, in the optimization scheduling of a multi-reservoir system for a large-scale hydropower station, engineers need to minimize irrigation water shortage and maximize power generation under the constraints of maintaining water balance and average power generation rate [4]. Another convincing example is the allocation of emergency supplies after a sudden disaster [5]: the total distance traveled and the allocation time of supply trucks are taken as optimization objectives, with the demand or urgency of each disaster-stricken area and the maximum load of the supply trucks comprehensively considered. It is clear that dynamic multiobjective optimization problems are widespread in the real world and play a very important role. However, solving DMOPs remains a big challenge due to their constant change in time or environment [6]. Therefore, the research of dynamic multiobjective optimization algorithms (DMOAs) is of great importance on both the theoretical front and in practical use.

In recent years, evolutionary algorithms have been widely used in solving dynamic multiobjective optimization problems [7]. An evolutionary algorithm usually starts from an initial population and gradually selects optimal solutions in each iteration by specific rules, which in turn efficiently solves some complex problems. In particular, significant progress has been made in a class of prediction-based methods that reuse valuable information from past moments through machine learning and other means. For example, Koo et al. [8] proposed a dynamic predictive gradient strategy which estimates the direction and magnitude of the next change based on previous solutions via a weighted average approach. Solutions updated with the predictive gradient will remain in the vicinity of the new Pareto-optimal set and be conducive to population convergence. Zhou et al.
[9] presented a method named population prediction strategy (PPS), which maintains a sequence of center points to predict the next center and uses the previous manifold to estimate the next manifold. When changes are detected, PPS can initialize the population by combining the predicted center and the estimated manifold. Various models are used in prediction-based approaches to learn historical knowledge and guide the search, enabling them to respond well to changing environments.

However, room for improvement in prediction-based evolutionary dynamic multiobjective algorithms still remains. First, most of the existing methods are based on linear prediction models [10]. These models cannot accurately predict the new solutions if the optimal solutions in DMOPs are nonlinearly correlated at different times. Second, most available methods predict the position of the new Pareto-optimal set based on optimal solutions in the previous two or three environments, but realistically, only fetching the historical information from adjacent times may lead to the neglect of some distribution patterns existing in earlier searches. Moreover, the accuracy of prediction is highly related to the number of historical optimal solutions. In reality, the optimal solutions obtained by each search are very few compared with the whole decision space. How to extract more information from a limited number of historical optimal solutions remains a challenge.

To address these issues, this paper proposes a novel prediction-based algorithm, called ISVM-DMOEA, which seamlessly integrates several strategies to generate a high-quality population. We believe that there are some implicit correlations between the optimal solutions, from which predictable general patterns can be detected. For a specific problem with a certain regularity, such patterns may exist in all environments experienced.
If we can extract the features of the optimal solutions to a greater extent by an oversampling method, and constantly assimilate these features via online learning, we can build a more efficient and accurate prediction model for DMOPs.

The proposed method can be briefly summarized as follows: support vector machine (SVM) is introduced to explore the potential correlations between the optimal solutions at different times, and the features included in the latest optimal solutions are utilized online in an incremental learning process. To further improve the performance of the incremental support vector machine (ISVM), we use the synthetic minority oversampling technique to deal with the imbalanced data. As the environment changes, a continuously updated ISVM classifier can accurately predict a good initial population for the next moment, which is of great help in handling dynamic multiobjective optimization problems.

The contributions of this work are as follows. First, the kernel function in SVM maps the vectors to a high-dimensional feature space to construct the classifier, which can handle the possible nonlinear correlation between solutions at different times. Second, the incremental SVM not only can learn the optimal solution distribution in the new environment online, but also effectively reuses the information contained in all past moments to extract a more comprehensive distribution pattern. Furthermore, the combination of oversampling and incremental SVM overcomes the sample imbalance caused by the small number of optimal solutions. The experimental results show that the algorithm can significantly improve the convergence rate and the quality of solutions, and can be combined with various population-based static optimization algorithms.

The remainder of the paper is organized as follows. Section II provides the background and some related work on DMOPs. Section III introduces the principles of the incremental support vector machine and the synthetic minority oversampling technique.
Section IV describes the proposed algorithm ISVM-DMOEA in detail. Section V presents the experimental study and analysis. Section VI concludes the paper with suggestions for future work.

II. PRELIMINARIES AND RELATED WORK
A. Dynamic Multiobjective Optimization
The dynamic multiobjective optimization problem can be defined as:
$$\begin{cases} \min\ F(x, t) = (f_1(x, t), f_2(x, t), \ldots, f_M(x, t)) \\ \text{s.t. } x \in \Omega \end{cases} \qquad (1)$$
where $t$ is the discrete time instant and $x$ is the $D$-dimensional decision variable within the decision space $\Omega$. $F$ refers to the objective vector consisting of $M$ time-varying objective functions.

Definition 1. [Pareto Dominance]
At time $t$, a decision vector $x_p$ is said to dominate another vector $x_q$, denoted by $x_p \succ x_q$, if and only if:
$$\begin{cases} \forall i \in \{1, \ldots, M\},\ f_i(x_p, t) \le f_i(x_q, t) \\ \exists i \in \{1, \ldots, M\},\ f_i(x_p, t) < f_i(x_q, t) \end{cases} \qquad (2)$$

Definition 2. [Dynamic Pareto-optimal Set]
At time $t$, a solution $x^*$ is said to be nondominated (Pareto-optimal) if and only if there is no solution $x$ in the decision space that dominates $x^*$. The Dynamic Pareto-optimal Set (DPOS) is the set of all Pareto-optimal solutions:
$$DPOS(t) = \{ x^* \in \Omega \mid \nexists\, x \in \Omega,\ x \succ x^* \} \qquad (3)$$

Definition 3. [Dynamic Pareto-optimal Front]
At time $t$, the Dynamic Pareto-optimal Front includes the corresponding objective vectors of the DPOS:
$$DPOF(t) = \{ F(x^*, t) \mid x^* \in DPOS(t) \} \qquad (4)$$

B. Related Work
Over the years, great progress has been made in the investigation of DMOAs. In general, most existing algorithms can be categorized into three classes: diversity-based approaches, memory-based approaches, and prediction-based approaches.

Diversity-based approaches aim to keep the balance between convergence and diversity. There are two main strategies to enhance the diversity of a population: diversity introduction and diversity maintenance. Diversity introduction can effectively prevent the solutions from being trapped in local optima [11], [12]. Deb et al. [13] proposed two variants of NSGA-II for dynamic optimization problems. The first version, called DNSGA-II-A, introduces randomly generated solutions to replace part of the population; the second version, called DNSGA-II-B, enhances the diversity by replacing a portion of the population with mutated solutions. Liu et al. [14] proposed a method sensitive to change intensity. When an environmental change is detected, two strategies are utilized in different situations: inverse modeling is used for drastic changes, while partial initialization is utilized for mild ones. Ruan et al. [15] presented a hybrid diversity algorithm. In an exploitation step, some diverse individuals within the region of the next probable POS are randomly generated.

Some methods based on diversity maintenance were also presented to solve DMOPs [16], [17]. A steady-state and generational evolutionary algorithm (SGEA) was introduced in [18], which responds to environmental changes in a steady-state manner. When a change occurs, SGEA retains part of the outdated solutions with good diversity and predicts some solutions according to the previous environment. These solutions are mixed with random solutions in a certain proportion to create a new population. Shang et al. proposed a class of evolutionary optimization algorithms based on clonal selection [19], [20].
These algorithms directly use the POS of the current environment as the initial population of the new environment.

Memory-based approaches use additional storage to implicitly or explicitly preserve the solutions from historical environments, and reuse the stored solutions in the new environment. An adaptive hybrid population management strategy using memory, local search, and random strategies was proposed by Azzouz et al. in [21]. In this algorithm, the memory size and the number of random solutions to be extracted are dynamically adjusted according to the severity of the change. Xu et al. [22] presented a memory-enhanced dynamic multiobjective evolutionary algorithm based on Lp decomposition (dMOEA/D-Lp). A subproblem-based bunchy memory scheme is used in dMOEA/D-Lp to store good solutions from past environments and reuse them when necessary. Sahmoud et al. [23] proposed a hybrid storage strategy integrating a memory mechanism within NSGA-II. To improve the ability of NSGA-II to track the non-dominated solutions in dynamic environments, an explicit memory is implemented to store the best solutions in each generation. Helbig et al. [24] introduced a dynamic vector evaluated particle swarm optimisation (DVEPSO) algorithm and investigated various ways to manage the archive when the environment changes. Chen et al. [25] proposed a two-archive algorithm that dynamically reconstructs two populations (one concerned with convergence and the other with diversity) to solve problems with a time-dependent number of objectives. In general, memory-based mechanisms are suitable for DMOPs with periodic changes.

Recently, prediction-based approaches have attracted increasing interest among researchers, and a great number of prediction algorithms have been proposed. Essentially, prediction-based approaches use the information of the historical optimal solutions to predict the location of the new POS. Muruganantham et al.
[26] proposed a Kalman Filter (KF) based dynamic multiobjective optimization algorithm (MOEA/D-KF). In this method, a linear discrete KF composed of time-update equations and measurement-update equations is used to estimate the process state by feedback control. A 2-D KF and a 3-D KF were designed to predict the location of the new POS when a change is detected, and then a decomposition-based differential evolution algorithm was used to obtain the optimal population.

Rong et al. [27] presented a multidirectional prediction strategy (MDP) to enhance the performance of evolutionary algorithms. A number of representative individuals are selected via adaptive clustering, and the population is then classified into several clusters according to the distances between individuals. Subsequently, MDP constructs time series models based on the historical information provided by the representative individuals, which is used to predict a number of evolutionary directions. However, only the trajectories in the previous two environments are considered in MDP.

To reduce computing cost, Li et al. [28] proposed a predictive strategy based on special points (SPPS), including feed-forward center points, boundary points, close-to-center points, close-to-boundary points, and knee points. The special point set that eliminates useless individuals can make the predicted population track the POF more accurately. Knee points were also adopted in [29] and [30] to facilitate the tracking ability. Wu et al. [31] introduced a directed search strategy to predict a new population, in which the moving direction of the POS is estimated from the position of centroid points. Min et al. [32] proposed an adaptive knowledge reuse framework based on multiproblem surrogates, which accelerates the convergence of expensive multiobjective optimization.

Evolutionary transfer optimization (ETO) is an emerging paradigm in prediction [33], [34]. Da et al.
[35] proposed an adaptive transfer framework to utilize the similarity of black-box optimization problems online. Bali et al. [36] dynamically adapted the extent of transfer between different tasks based on the optimal mixing of probabilistic models. Jiang et al. [37] proposed a dynamic multiobjective optimization method based on transfer learning (Tr-DMOEA), which maps the POF in the past environment into a latent space via transfer component analysis (TCA), and then uses these mapped solutions to construct a high-quality population. Inspired by Tr-DMOEA, some transfer learning algorithms combined with memory mechanisms or pre-search strategies were proposed to solve DMOPs [38]–[40].

Differential models and linear models are often used in prediction algorithms. Liu et al. [41] proposed an improved adaptive differential evolution crossover operator to facilitate population evolution, using the information from the past two searches to make predictions. In [9], Zhou et al. constructed a time series for each individual in the population, and used a simple linear model to predict the individual's position in the next time window. Cao et al. [42] introduced a first-order and second-order mixed difference model based on the historical positions to predict the centroid position of the population. Liang et al. [43] proposed a hybrid of memory and prediction strategies for dynamic multiobjective optimization (MOEA/D-HMPS). In response to dissimilar changes, MOEA/D-HMPS exploits the moving direction of the population center at the previous two continuous time steps to predict the moving trajectory at the next moment.

Prediction-based methods can take advantage of the trends in POS changes and show promising performance in solving DMOPs. However, most prediction models assume that a linear correlation exists between the solutions at different times, while in many cases even the POS at adjacent moments are nonlinearly correlated.
In addition, the time series constructed by most models is very short, and the information contained in the earlier searches is lost, which affects the accuracy of prediction. Therefore, it is necessary to improve the generalization ability and information extraction ability of the prediction model. In this paper, the incremental support vector machine (ISVM) and the synthetic minority over-sampling technique (SMOTE) are introduced to enhance the prediction model.

III. INCREMENTAL SUPPORT VECTOR MACHINE AND SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE
As the incremental support vector machine and the synthetic minority over-sampling technique are two critical components of the proposed algorithm, we review them in this section for the completeness of the presentation. In the process of solving DMOPs, ISVM can incrementally learn knowledge from previous POS to accurately determine the quality of the solutions, while SMOTE can extract more information from a limited number of POS samples.
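To make the online-learning idea concrete (parameters are updated as new labelled solutions arrive, without revisiting earlier samples), the following sketch uses a perceptron-style linear classifier as a simplified stand-in for the ISVM reviewed below; the class name, learning rate, and toy data are all illustrative assumptions, not part of the paper's method.

```python
# Minimal stand-in for incremental learning: a perceptron-style linear
# classifier whose weights are updated in place with each new batch of
# labelled samples. This illustrates the online-update idea only; it is
# NOT the exact incremental SVM derived in Section III-A.

class OnlineLinearClassifier:
    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def partial_fit(self, X, y):
        """Update weights using only the new samples (labels in {-1, +1})."""
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)
            if margin <= 0:  # misclassified: perceptron-style correction
                self.w = [wi + self.lr * label * xi for wi, xi in zip(self.w, x)]
                self.b += self.lr * label

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

# one "environment" of labelled solutions: positives near (1,1), negatives near (-1,-1)
clf = OnlineLinearClassifier(dim=2)
clf.partial_fit([(1.0, 1.2), (0.9, 1.0), (-1.0, -0.8), (-1.1, -1.0)],
                [1, 1, -1, -1])
print(clf.predict((1.0, 1.0)))  # prints 1
```

A true ISVM plays the same role here, but updates its dual coefficients exactly so that the KKT conditions stay satisfied for all previously seen samples.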
A. Incremental Support Vector Machine
Support Vector Machine (SVM) is a widely used binary classification model with sparsity and robustness [44]. The strategy of SVM is to build an optimal hyperplane in the feature space for binary classification by maximizing the classification margin [45]. The application of a kernel function enables SVM to solve nonlinear and high-dimensional pattern recognition problems well: it maps the samples from a low-dimensional space to a high-dimensional space and turns the problem into a linearly separable one.

Generally, the training data of a typical SVM are imported in batch. But in some instances, the SVM needs to be trained online. The Incremental Support Vector Machine (ISVM) was proposed to handle incoming samples. ISVM can gradually update the parameters to accommodate new samples without training on all samples repeatedly [46]. Next, the principle of ISVM is briefly introduced.

To generate an ISVM, we need to learn a discriminant function $f(x) = w \cdot \varphi(x) + b$ from the samples $\{(x_i, y_i) \in \mathbb{R}^m \times \{-1, +1\},\ \forall i \in \{1, \ldots, N\}\}$. That is, we solve the quadratic programming problem:
$$\begin{cases} \min_{w, b}\ \frac{1}{2}\|w\|^2 + C \cdot \sum_{i=1}^{N} \varepsilon_i \\ \text{s.t. } y_i (w \cdot x_i + b) \ge 1 - \varepsilon_i,\ i \in \{1, \ldots, N\} \end{cases} \qquad (5)$$
The first term represents the maximized margin, while the second term is the regularization term. $C$ is the penalty parameter, and $\varepsilon_i$ is the slack variable used to build a soft margin. When dealing with nonlinear issues, the quadratic program is typically expressed in its dual form:
$$\min_{0 \le \alpha_i \le C}: \ L = \frac{1}{2} \sum_{i,j} \alpha_i Q_{ij} \alpha_j - \sum_i \alpha_i + b \sum_i y_i \alpha_i \qquad (6)$$
where $Q_{ij} = y_i y_j K(x_i, x_j)$ and $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$. $K$ is the kernel function, which implicitly maps the vectors to a high-dimensional feature space while simplifying the calculation. The dual form of the SVM discriminant function is herein represented as $f(x) = \sum_j \alpha_j y_j K(x_j, x) + b$.
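The dual-form discriminant $f(x) = \sum_j \alpha_j y_j K(x_j, x) + b$ with a Gaussian RBF kernel can be sketched directly; note that the support vectors, coefficients, and bias below are hand-picked toy values for illustration, not the result of actually solving (5)-(6).

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq)

def dual_decision(x, support_vectors, alphas, labels, b, gamma=1.0):
    """Dual-form discriminant f(x) = sum_j alpha_j * y_j * K(x_j, x) + b."""
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

# toy model with two support vectors (coefficients chosen by hand)
svs    = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
f = dual_decision((0.1, 0.0), svs, alphas, labels, b=0.0)
print(f > 0)  # point near the positive support vector lands on the positive side
```

The sign of `f` is the predicted class, which is exactly how the trained classifier is used as a filter later in Section IV.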
The Karush-Kuhn-Tucker (KKT) conditions uniquely define the solution of the dual parameters $\{\alpha, b\}$ through the first-order conditions on $L$:
$$G_i = \frac{\partial L}{\partial \alpha_i} = \sum_j Q_{ij} \alpha_j + y_i b - 1 \begin{cases} > 0, & \alpha_i = 0 \\ = 0, & 0 \le \alpha_i \le C \\ < 0, & \alpha_i = C \end{cases} \qquad \frac{\partial L}{\partial b} = \sum_j y_j \alpha_j = 0 \qquad (7)$$
The KKT conditions partition the training samples into three categories: the set $S$ of margin support vectors with $G_i = 0$, the set $E$ of error support vectors with $G_i < 0$, and the set $R$ of the remaining vectors with $G_i > 0$ [47].

With the continuous introduction of new samples in the incremental learning process, the margin vector coefficients change simultaneously to keep the KKT conditions satisfied for all previously trained samples. For a new sample $m$ considered as a candidate support vector, the KKT conditions can be expressed differentially as:
$$\Delta G_i = Q_{im} \Delta\alpha_m + \sum_{j \in S} Q_{ij} \Delta\alpha_j + y_i \Delta b,\ \forall i \in D \cup \{m\}; \qquad 0 = y_m \Delta\alpha_m + \sum_{j \in S} y_j \Delta\alpha_j \qquad (8)$$
Since $G_i = 0$ for the margin vector set $S = \{s_1, \ldots, s_{l_S}\}$, the changes in coefficients must satisfy:
$$\mathcal{Q} \cdot \begin{bmatrix} \Delta b \\ \Delta\alpha_{s_1} \\ \vdots \\ \Delta\alpha_{s_{l_S}} \end{bmatrix} = - \begin{bmatrix} y_m \\ Q_{s_1 m} \\ \vdots \\ Q_{s_{l_S} m} \end{bmatrix} \Delta\alpha_m \qquad (9)$$
where $\mathcal{Q}$ is a symmetric but not positive-definite Jacobian matrix:
$$\mathcal{Q} = \begin{bmatrix} 0 & y_{s_1} & \cdots & y_{s_{l_S}} \\ y_{s_1} & Q_{s_1 s_1} & \cdots & Q_{s_1 s_{l_S}} \\ \vdots & \vdots & \ddots & \vdots \\ y_{s_{l_S}} & Q_{s_{l_S} s_1} & \cdots & Q_{s_{l_S} s_{l_S}} \end{bmatrix} \qquad (10)$$
Then we can get:
$$\Delta b = \beta \Delta\alpha_m, \qquad \Delta\alpha_j = \beta_j \Delta\alpha_m,\ \forall j \in D \qquad (11)$$
with coefficient sensitivities
$$\begin{bmatrix} \beta \\ \beta_{s_1} \\ \vdots \\ \beta_{s_{l_S}} \end{bmatrix} = -\mathcal{R} \cdot \begin{bmatrix} y_m \\ Q_{s_1 m} \\ \vdots \\ Q_{s_{l_S} m} \end{bmatrix} \qquad (12)$$
where $\mathcal{R} = \mathcal{Q}^{-1}$ and $\beta_j = 0$ for all $j$ outside $S$. Hence, the KKT conditions in equation (7) change according to:
$$\Delta G_i = \gamma_i \Delta\alpha_m,\ \forall i \in D \cup \{m\}; \qquad \gamma_i = Q_{im} + \sum_{j \in S} Q_{ij} \beta_j + y_i \beta,\ \forall i \notin S \qquad (13)$$
To append a candidate vector $m$ into the margin vector set $S$, $\mathcal{R}$ is expanded as:
$$\mathcal{R} \leftarrow \begin{bmatrix} \mathcal{R} & 0 \\ 0 & 0 \end{bmatrix} + \frac{1}{\gamma_m} \begin{bmatrix} \beta \\ \beta_{s_1} \\ \vdots \\ \beta_{s_{l_S}} \\ 1 \end{bmatrix} \begin{bmatrix} \beta, & \beta_{s_1}, & \cdots, & \beta_{s_{l_S}}, & 1 \end{bmatrix} \qquad (14)$$
Conversely, to remove a vector $k$ from $S$, $\mathcal{R}$ is deflated as:
$$\mathcal{R}_{ij} \leftarrow \mathcal{R}_{ij} - \mathcal{R}_{kk}^{-1} \mathcal{R}_{ik} \mathcal{R}_{kj}, \quad \forall i, j \in S \cup \{0\};\ i, j \ne k \qquad (15)$$
Consequently, when a new sample $m$ is added to the training data set $D$, i.e., $D_{l+1} = D_l \cup \{m\}$, the solution of the parameters $\{\alpha, b\}$ is updated with respect to the candidate $(x_m, y_m)$, the present solution, and the inverse Jacobian matrix $\mathcal{R}$. The incremental procedure can be summarized as:
1. Initialize $\alpha_m$ to zero and calculate $G_m$;
2. If $G_m > 0$, terminate ($m$ is not a margin or error vector);
3. If $G_m \le 0$, apply the largest possible increment of $\alpha_m$ so that one of the following conditions occurs: (a) $G_m = 0$: add $m$ to the margin set $S$, update $\mathcal{R}$ accordingly, and terminate; (b) $\alpha_m = C$: add $m$ to the error set $E$ and terminate; (c) elements of $D_l$ migrate across $S$, $E$, and $R$: update the membership of the elements and repeat step 3; if $S$ changes, update $\mathcal{R}$ accordingly.

B. Synthetic Minority Over-sampling Technique
Uniformity in the quantity of samples is a key factor affecting the accuracy of a classifier. Excessive differences in the number of samples may bias the classification results toward the dominant category. The synthetic minority over-sampling technique (SMOTE) is an effective way to deal with imbalanced data [48].

SMOTE suggests that there are some potentially available samples between adjacent minority samples in the feature space. New samples are synthesized based on adjacent samples to balance the data set. As shown in Fig. 1, $x_m$ is a sample belonging to the minority class in a two-dimensional feature space, and $X_m = \{x_m^1, x_m^2, \ldots, x_m^k\}$ are the $k$ neighbors of $x_m$ in the same category. By multiplying the difference of the two vectors by a random number in the range (0, 1), new samples $Y = \{y^1, y^2, \ldots, y^r\}$ are synthesized along the line segment between $x_m$ and the samples in $X_m$. The number of synthesized samples is determined by the oversampling rate $r$. Finally, the $r$ new samples are added to the initial minority sample set to maintain the balance of sample classes and thereby improve the generalization ability of the classifier.

IV. PROPOSED ALGORITHM
In this section, we propose an online algorithm based on the incremental support vector machine to solve DMOPs. The main idea is to train an ISVM classifier online by reusing the optimal solutions obtained in the previous environments, which are preprocessed by a sampling strategy based on SMOTE. When environmental changes are detected, the classifier predicts a high-quality initial population, which helps the optimization algorithm to converge to the true POS more quickly and accurately.

Fig. 1: Illustration of the synthetic minority over-sampling technique (taking a 2D decision space as an example).

The schematic diagram of the proposed ISVM-DMOEA is presented in Fig. 2. Briefly, ISVM-DMOEA consists of two main subroutines: 1) the POSMOTE sampling strategy and 2) the ISVMPRE online prediction strategy. The description of the subroutines is followed by the framework illustration of ISVM-DMOEA.
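The SMOTE-style interpolation underlying the sampling subroutine (reviewed in Section III-B) can be sketched as follows; the function name, parameter defaults, and toy data are illustrative assumptions rather than the paper's implementation.

```python
import random

def smote_synthesize(samples, k=2, rate=1, seed=0):
    """For each minority sample, create `rate` synthetic points by linear
    interpolation toward one of its k nearest neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(samples):
        # k nearest neighbours of x among the other minority samples
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)))
        neighbours = others[:k]
        for _ in range(rate):
            nb = rng.choice(neighbours)
            lam = rng.random()  # random number in (0, 1)
            # new point lies on the segment between x and the chosen neighbour
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_pts = smote_synthesize(minority, k=2, rate=2)
print(len(new_pts))  # 3 samples x rate 2 = 6 synthetic points
```

In the POSMOTE strategy described next, the minority class is the previous POS, and the synthesized points enlarge the positive training set for the ISVM.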
A. POSMOTE Sampling Strategy
As mentioned above, an ISVM classifier will be trained to predict an initial population for the next moment based on the previous distribution of solutions. The optimal solutions obtained in the past environments are regarded as positive samples, while solutions of poor quality are regarded as negative samples accordingly. In DMOPs, the limited number of positive samples composed of optimal solutions will lead to low accuracy of the ISVM. Therefore, it is necessary to generate a sufficient and balanced sample set for the predictor in advance.

SMOTE is employed for the oversampling of the positive samples belonging to the minority class. The optimal solution set obtained at the last moment is expressed as
$POS_{t-1} = \{POS_{t-1}^1, POS_{t-1}^2, \ldots, POS_{t-1}^n\}$. In the first step, the $k$ nearest neighbors of each optimal solution $POS_{t-1}^n$ are identified in $POS_{t-1}$ by the Euclidean distance. To synthesize a new sample $P_{sy}^+$, a vector $POS_{t-1}^m$ is selected from the $k$ nearest neighbors of $POS_{t-1}^n$. The attributes of $P_{sy}^+$ in each dimension are calculated randomly by linear interpolation of $POS_{t-1}^m$ and $POS_{t-1}^n$:
$$P_{sy}^+(d) = POS_{t-1}^n(d) + Rand \times \left( POS_{t-1}^n(d) - POS_{t-1}^m(d) \right) \qquad (16)$$
where $d \in \{1, \ldots, D\}$, and $D$ is the dimension of the decision vector. $Rand$ represents a random number in (0, 1). The number of new samples synthesized from each optimal solution depends on the oversampling rate $r$. The $n \times r$ synthetic samples and the $n$ original samples constitute the positive sample set $P^+$.

Compared with the positive samples, the number of negative samples is very large. The generation of the negative sample set can be regarded as a down-sampling process. Since most of the solutions in the decision space fall outside of $POS_{t-1}$, the negative sample set can be composed of randomly synthesized solutions. To generate a balanced sample set, $n \times (r + 1)$ negative samples are synthesized. As illustrated in Fig. 3, the number of negative samples in $P^-$ is consistent with that of the positive samples in $P^+$, and the samples are sufficient for the training of the ISVM. The details of POSMOTE are shown in Algorithm 1.

Fig. 2: Schematic of ISVM-DMOEA. At time $t$, the search for the POS can be decomposed into: sampling (POSMOTE) → prediction (ISVMPRE) → optimization (SMOA).

Fig. 3: Illustration of the POSMOTE sampling strategy (taking a 2D decision space as an example): positive samples are synthesized by linear interpolation between the solutions in $POS_{t-1}$, and negative samples are randomly generated according to the number of positive samples.

B. ISVMPRE Online Prediction Strategy
For a specific problem, we assume that a general distribution pattern lies in the optimal solutions at different instants. The strategy of prediction is to explore this general pattern with a binary classifier: a well-trained classifier can estimate whether a solution has the characteristics of an optimal solution.

SVM can map the solutions in the decision space to a high-dimensional feature space to construct a linear classifier, which can explore the implicit connection between the optimal solutions in the original space, regardless of whether it is linear or nonlinear. However, if only the optimal solutions of the former environment are used to construct an SVM each time, the information contained in the solutions at earlier instants cannot be extracted; while if all the optimal solutions in previous environments are used to train the SVM, an enormous amount of time and storage space will be needed.

To construct a prediction model with both high efficiency and accuracy, the incremental SVM is employed. The parameters of the ISVM are constantly updated as the environment changes: $ISVM_t$ is an updated model of $ISVM_{t-1}$ based on new samples imported online. A randomly generated solution is recognized as a candidate optimal solution if it is classified as positive by $ISVM_t$. To generate a promising population, $ISVM_t$ works as a filter: random solutions classified as positive are retained, and the negative ones are discarded. Finally, $N_p$ positive samples are placed into $POP_t$, which is a high-quality initial population for the new environment. The pseudocode of ISVMPRE is presented in Algorithm 2.

Algorithm 1: POSMOTE — Sampling Strategy
Input: the POS with n solutions obtained at the last moment, POS_{t-1}; the oversampling rate, r; the number of nearest neighbors considered, k; the number of decision variables, D
Output: a balanced sample set, P_train
initialize the sample set, P_train = ∅
for i = 1 to n do
    identify the k nearest neighbors of POS_{t-1}^i
    while r ≠ 0 do
        randomly select a neighbor among the k neighbors
        for j = 1 to D do
            calculate the attribute of the synthesized vector P_sy^+ in dimension j according to formula (16)
        end for
        P^+ = P^+ ∪ {P_sy^+}; r = r − 1
    end while
end for
P^+ = P^+ ∪ POS_{t-1}
for i = 1 to n(r + 1) do
    randomly generate a vector P_sy^- in the decision space
    P^- = P^- ∪ {P_sy^-}
end for
P_train = {P^+, P^-}
return P_train

C. Framework of ISVM-DMOEA
Algorithm 3 depicts the overall framework of the proposed ISVM-DMOEA. The entire process of solving a dynamic multiobjective optimization problem is accompanied by the online learning process of the ISVM-based predictor. In the first environment, the population is initialized randomly, and the initial population is then optimized by a population-based static multiobjective optimization algorithm (SMOA).

Algorithm 2: ISVMPRE — Online Prediction Strategy
Input: the population size, N_p; the parameters of the ISVM at the last moment, ISVM_{t-1}; the training samples, P_train
Output: a predicted population, POP_t; the updated parameters of the ISVM at time t, ISVM_t
initialize the predicted population: POP_t = ∅, N_count = 0
incrementally train ISVM_t based on ISVM_{t-1} with the samples in P_train
while N_count < N_p do
    randomly generate a solution in the decision space
    if the solution is classified as positive by ISVM_t then
        POP_t = POP_t ∪ {the solution}; N_count = N_count + 1
    end if
end while
return POP_t, ISVM_t

Algorithm 3: Framework of ISVM-DMOEA
Input: the dynamic optimization problem, F(x, t); the population size, N_p; the oversampling rate, r; the number of nearest neighbors considered, k; the number of decision variables, D
Output: the solutions at time t, POS_t
randomly initialize a population POP_0, t = 0
POS_0 = SMOA(POP_0, F(x, 0), N_p)
while change detected do
    t = t + 1
    P_train = POSMOTE(POS_{t-1}, r, k, D)
    POP_t = ISVMPRE(P_train, ISVM_{t-1}, N_p)
    POS_t = SMOA(POP_t, F(x, t), N_p)
end while
return POS_t

V. EXPERIMENTAL STUDY

A. Benchmark Problems and Performance Indicators

In this paper, the performance of the proposed ISVM-DMOEA and the chosen competing algorithms is uniformly examined on the CEC 2018 DMO benchmark suite with nine bi-objective and five tri-objective problems. These problems are named DF1–DF14 and cover diverse problem characteristics such as dynamic POF/POS geometries, irregular POF shapes, variable linkage, and disconnectivity. The time instance $t$ involved in the problems is defined as $t = \frac{1}{n_t} \lfloor \tau / \tau_t \rfloor$, where $n_t$, $\tau_t$, and $\tau$ denote the severity of change, the frequency of change, and the iteration counter, respectively. For each problem, various pairs of $n_t$ and $\tau_t$ are implemented to configure different dynamic characteristics. The definition of the test instances is detailed in [49].

The following metrics are adopted to assess the performance of the algorithms:

1) Inverted Generational Distance (IGD): IGD is a frequently used metric to measure the convergence and diversity of the solutions by computing the difference between the true POF and the POF estimated by an algorithm. At time $t$, IGD is calculated as:
At time t, IGD is calculated as

\mathrm{IGD}(\mathrm{POF}^{*}_{t}, \mathrm{POF}^{e}_{t}) = \frac{\sum_{p \in \mathrm{POF}^{*}_{t}} d(p, \mathrm{POF}^{e}_{t})}{|\mathrm{POF}^{*}_{t}|}    (17)

where POF*_t is a set of points uniformly sampled from the true POF at time t, and POF^e_t is the POF estimated by the tested algorithm. d(p, POF^e_t) denotes the minimum Euclidean distance between a point p belonging to POF*_t and the points in POF^e_t; it is computed for every point in POF*_t and then averaged.

To apply IGD to changing environments, a variant of IGD called MIGD is adopted. MIGD takes the average of the IGD values at different times, given by

\mathrm{MIGD} = \frac{\sum_{t \in T} \mathrm{IGD}(\mathrm{POF}^{*}_{t}, \mathrm{POF}^{e}_{t})}{|T|}    (18)

where T is a set of discrete time instants and |T| is the number of changes in a run. In this paper, 1000 points are uniformly sampled from POF*_t for the calculation of IGD.

2) Hypervolume (HV): HV is used to evaluate the diversity and distribution of the solutions by computing the hypervolume of the region dominated by the obtained POF [50]. HV is formally defined as

\mathrm{HV}(\mathrm{POF}^{e}_{t}, rp) = \Lambda\left( \bigcup_{p \in \mathrm{POF}^{e}_{t}} \{\, q \mid p \prec q \prec rp \,\} \right)    (19)

where Λ denotes the Lebesgue measure, ≺ denotes Pareto dominance, and rp is the reference point whose d-th component is the maximum value of the d-th objective of the POF. Similarly, we can define a metric MHV based on HV, given as

\mathrm{MHV} = \frac{\sum_{t \in T} \mathrm{HV}(\mathrm{POF}^{e}_{t}, rp)}{|T|}    (20)

B. Compared Algorithms and Parameter Settings

Without loss of generality, we choose the regularity model-based multiobjective estimation of distribution algorithm (RM-MEDA) as the static optimizer in the proposed ISVM-DMOEA. RM-MEDA is a widely used algorithm that constructs a posterior probability distribution model of promising solutions based on global statistical information extracted from the selected solutions [51].
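As a concrete illustration of the indicators above, the IGD and MIGD of (17)-(18), and the HV of (19) restricted to the bi-objective case, can be sketched in a few lines of NumPy (function names are ours; hv_2d assumes minimization and that every point dominates the reference point):

```python
import numpy as np

def igd(pof_true, pof_est):
    # Eq. (17): mean over the sampled true front of the minimum
    # Euclidean distance to the estimated front.
    d = np.linalg.norm(pof_true[:, None, :] - pof_est[None, :, :], axis=2)
    return d.min(axis=1).mean()

def migd(fronts_true, fronts_est):
    # Eq. (18): IGD averaged over the set of time instants T.
    return np.mean([igd(ft, fe) for ft, fe in zip(fronts_true, fronts_est)])

def hv_2d(front, ref):
    # Eq. (19) for two objectives: area dominated by the front and
    # bounded by the reference point, via a sweep in increasing f1.
    pts = front[np.argsort(front[:, 0])]
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                      # point adds a new strip
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area
```

For three objectives (DF10-DF14) an exact or Monte Carlo hypervolume routine would be needed instead; the sweep above is only valid in 2-D.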
The proposed algorithm incorporating RM-MEDA is called ISVM-RM-MEDA.

To verify the performance of ISVM-RM-MEDA, four state-of-the-art algorithms are chosen for comparison: the Kalman prediction-based MOEA (MOEA/D-KF) [26], the population prediction strategy (PPS) [52], the support vector regression-based predictor (MOEA/D-SVR) [10], and the transfer learning-based DMOEA (Tr-DMOEA) [37]. These algorithms use different prediction strategies to solve DMOPs and have achieved considerable performance. Besides, RM-MEDA is modified to adapt to dynamic problems, denoted DA-RM-MEDA. For a fair comparison, the baseline optimizer in these algorithms is uniformly replaced by RM-MEDA. In our empirical studies, the compared algorithms are referred to as KF-RM-MEDA, PPS-RM-MEDA, SVR-RM-MEDA, TR-RM-MEDA, and DA-RM-MEDA, respectively.

The parameters of the test environments and algorithms are set as follows:

1) Population size and number of decision variables: The population size is set to 100 for the bi-objective problems (DF1-DF9) and 150 for the tri-objective problems (DF10-DF14). The number of decision variables is set to 10 for all test problems.

2) Dynamic characteristics: For each problem, three pairs of dynamic characteristics are set by different combinations of n_t and τ_t: (n_t = 10, τ_t = 10), (n_t = 5, τ_t = 10), and (n_t = 10, τ_t = 5). τ/τ_t is fixed at 30, which ensures there are 30 changes in each run. Each algorithm is run 20 times on each test instance independently.

3) Parameters in algorithms: In the proposed algorithm, the Gaussian RBF kernel is selected as the kernel function of ISVM, and the kernel scale is determined by grid search; both the oversampling rate r and the number of nearest neighbors k are set to 5 in POSMOTE. The parameters of RM-MEDA and the compared algorithms follow the original papers.

C. Comparison Study

The statistical results of MIGD and MHV obtained by ISVM-RM-MEDA and the five competing algorithms are presented in Table I and Table II, respectively.
For an intuitive comparison, the best value on each instance is highlighted in bold, and the Wilcoxon rank-sum test at the 0.05 significance level is carried out to indicate the significance of the differences between results.

It can be seen from Table I that ISVM-RM-MEDA obtains the best MIGD results on the majority of the tested problems, implying that the proposed algorithm has superior ability to track the dynamic POF in most cases. ISVM-RM-MEDA performs slightly worse than PPS-RM-MEDA in half of the configurations on DF1, DF2, and DF9, which may be attributed to the comprehensive strategy of PPS combining POS center prediction and manifold estimation. The POS of DF8 contains a group of stationary centroids and varies nonlinearly over time, which makes it very difficult to approximate. Therefore, the MIGD values on DF8 obtained by the randomly reinitialized variant of RM-MEDA (DA-RM-MEDA) are better than those of all the prediction-based algorithms. Nevertheless, the two nonlinear prediction-driven algorithms, ISVM-RM-MEDA and SVR-RM-MEDA, achieve better results than the other prediction algorithms on DF8. ISVM-RM-MEDA seems slightly inefficient on the complex problem DF12, which has a time-varying number of POF holes. Except for DF12, ISVM-RM-MEDA significantly outperforms the other algorithms on the tri-objective problems. The MIGD values obtained by ISVM-RM-MEDA are competitive under all three (n_t, τ_t) configurations, demonstrating that ISVM-RM-MEDA is a robust model that is less sensitive to the frequency and severity of change.

Fig. 4 depicts the evolution curves of the IGD values on DF1-DF14 averaged over 20 runs with n_t = 10 and τ_t = 10. We can see that the IGD curves obtained by ISVM-RM-MEDA are relatively low on most problems except for DF12.
Besides, the fluctuation range of the curves obtained by ISVM-RM-MEDA is smaller than that of the other algorithms, indicating that the proposed method can respond to environmental changes quickly and stably.

As shown in Table II, ISVM-RM-MEDA achieves the best performance on 30 out of 42 test instances in terms of the MHV metric. The MHV values obtained by ISVM-RM-MEDA are slightly worse than those of PPS-RM-MEDA on DF1-DF2 and TR-RM-MEDA on DF12, but better than those of the remaining algorithms. The Wilcoxon test results indicate that there is no significant difference between these solution sets.

The MIGD values indicate the competitive convergence of the solutions obtained by ISVM-RM-MEDA, and the MHV values indicate their superior diversity and distribution. In other words, the proposed ISVM-based prediction model can effectively explore the linear or nonlinear correlations between the POS obtained in different environments, thereby generating promising populations for varying environments.

D. Ablation Study

In the proposed algorithm, two key procedures are implemented: the SMOTE-based sampling strategy and the ISVM-based prediction strategy. The results of the comparison study have shown that the combination of the two strategies can significantly improve the quality of the predicted population. However, the role that each strategy plays in solving DMOPs remains unclear. To verify the effectiveness of the two strategies, we carry out an ablation experiment. We modify the sampling strategy and propose two variants in which the oversampling rate r of POSMOTE is set to 0 and 3, where r = 0 indicates that the sampling strategy is switched off. The two variants are denoted as ISVM_{r=0}-RM-MEDA and ISVM_{r=3}-RM-MEDA, respectively.
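The POSMOTE step builds on SMOTE [48], which synthesizes new samples by interpolating each sample toward one of its k nearest neighbors. A simplified decision-space sketch of this idea (our own simplified operator, not the paper's exact POSMOTE):

```python
import numpy as np

def smote_like_oversample(pos, r, k, rng=None):
    # SMOTE-style oversampling: for each POS sample, create r synthetic
    # points by interpolating toward a random one of its k nearest
    # neighbors; returns the originals plus the synthetic points.
    rng = np.random.default_rng(rng)
    n = len(pos)
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # exclude each point itself
    nbrs = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbors
    synth = []
    for i in range(n):
        for _ in range(r):
            j = nbrs[i, rng.integers(k)]
            gap = rng.random()               # interpolation factor in [0, 1)
            synth.append(pos[i] + gap * (pos[j] - pos[i]))
    return np.vstack([pos, np.asarray(synth)]) if synth else pos
```

In ISVM-DMOEA, the resulting enlarged set would be labeled as the positive (optimal-like) class for training ISVM, mitigating the shortage of POS samples.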
In the exploration of the prediction strategy, we keep r = 5 and deactivate the online update mechanism of ISVM; in other words, every time the environment changes, a new SVM is constructed based only on the POS obtained in the former environment. The third variant is denoted as SVM-RM-MEDA.

TABLE I: MEAN AND STANDARD DEVIATION VALUES OF MIGD OBTAINED BY ISVM-RM-MEDA AND COMPARED ALGORITHMS
Prob | (n_t, τ_t) | DA-RM-MEDA | KF-RM-MEDA | PPS-RM-MEDA | SVR-RM-MEDA | TR-RM-MEDA | ISVM-RM-MEDA
(+), (=) and (-) indicate that ISVM-RM-MEDA performs significantly better than, equivalently to, or worse than the compared algorithms, respectively.

The three variants were tested with the same parameters as in the comparison experiment, and the statistical results of MIGD, averaged over the three (n_t, τ_t) configurations for each problem, are shown in Table III. Consistent with the results in Table I, the average MIGD values obtained by ISVM-RM-MEDA are significantly better than those of DA-RM-MEDA on most problems, with 30%-60% improvement. The only exceptions are DF8 and DF12, which also challenge the other compared algorithms.

ISVM_{r=0}-RM-MEDA performs worse than DA-RM-MEDA on 6 out of 14 problems, and only a tiny improvement is observed on the rest of the problems. The ineffectiveness of ISVM_{r=0}-RM-MEDA can be attributed to the limited number of POS training samples, which is insufficient to build an accurate classifier. With the samples tripled by the POSMOTE sampling strategy, ISVM_{r=3}-RM-MEDA achieves significantly better MIGD results than ISVM_{r=0}-RM-MEDA.
It is clear that the inefficient predictions on DF1-DF3 and DF7 are corrected and the MIGD values on most problems are reduced. The quality of the solution sets obtained by the ISVM-based algorithms improves as the oversampling rate increases (r = 0, 3, 5), which demonstrates the necessity and advantage of the proposed sampling strategy.

SVM-RM-MEDA with an oversampling rate of 5 shows effective predictions, but compared with ISVM-RM-MEDA it still has room for improvement.

Fig. 4: Average IGD values versus environmental changes for different problems with n_t = 10 and τ_t = 10.

The superiority of ISVM-RM-MEDA over SVM-RM-MEDA can be attributed to the incremental learning mechanism, which incorporates more general features from samples in previous environments rather than utilizing only the information from the adjacent moment.

E. Adaptation Study

In the experiments above, RM-MEDA is employed to optimize the population in each iteration. To further investigate whether ISVM-DMOEA relies on a specific static multiobjective optimization algorithm, we choose two other popular static algorithms as the optimizer. The first is NSGA-II [53], a genetic algorithm that uses non-dominated sorting and crowding distance to select dominant individuals produced by crossover and mutation. The second is the multiple objective particle swarm optimization algorithm [54], abbreviated MOPSO, in which the flight direction of the particles, determined by Pareto dominance, is regarded as a guide for the solution search.

Similar to the operation on RM-MEDA, these two methods are embedded in ISVM-DMOEA and also modified to adapt to dynamic changes.
The modified algorithms are called DA-NSGA-II, ISVM-NSGA-II, DA-MOPSO, and ISVM-MOPSO, respectively. The average MIGD results obtained by the four algorithms on DF1-DF14 are shown in Table IV.

Owing to the different static optimization strategies, the average MIGD values obtained by the three kinds of algorithms differ considerably. However, it is clear from the table that the ISVM-based versions of NSGA-II and MOPSO significantly surpass the randomly reinitialized algorithms, as was also observed for RM-MEDA. ISVM-MOPSO greatly improves the performance of DA-MOPSO and even reduces the average MIGD values by more than 60% on four problems, because the high-quality initial population accelerates the convergence of the particle swarm to the optimal solutions.

The strong adaptability and outstanding performance with RM-MEDA, NSGA-II, and MOPSO demonstrate that ISVM-DMOEA is a versatile algorithm that can effectively cooperate with various static optimization algorithms.

VI. CONCLUSION

Prediction-based models, which estimate the optimal solutions by taking advantage of existing information, are promising for solving dynamic multiobjective optimization problems. We put forward the assumption that predictable general patterns can be detected from implicit correlations between the optimal solutions obtained in changing environments. In this paper, we propose an online prediction model based on an incremental support vector machine (ISVM) to utilize these patterns.
TABLE II: MEAN AND STANDARD DEVIATION VALUES OF MHV OBTAINED BY ISVM-RM-MEDA AND COMPARED ALGORITHMS
Prob | (n_t, τ_t) | DA-RM-MEDA | KF-RM-MEDA | PPS-RM-MEDA | SVR-RM-MEDA | TR-RM-MEDA | ISVM-RM-MEDA
(+), (=) and (-) indicate that ISVM-RM-MEDA performs significantly better than, equivalently to, or worse than the compared algorithms, respectively.

The parameters of ISVM are updated online to accommodate new samples and to extract the linear or, more commonly in practice, nonlinear correlations between previous POS. When the environment changes, the evolving ISVM classifier can estimate whether a solution has the characteristics of an optimal solution and guide the prediction of the initial population. To avoid low prediction accuracy caused by the limited number of POS samples, a sampling strategy based on SMOTE is implemented in advance.

The experimental results demonstrate that the proposed algorithm achieves competitive performance on DMOPs. Besides, the ablation and adaptation studies indicate the versatility of ISVM-DMOEA and the advantages of its sub-strategies.
However, solving problems with low correlation between the solutions in different environments, or with unusual distributions such as POF holes, remains a challenge. For future work, we would like to combine ISVM-DMOEA with transfer learning to improve the applicability of the algorithm to more instances. In addition, we plan to introduce few-shot learning and sample-quality pre-evaluation, which can enhance the accuracy of prediction. Furthermore, various kinds of machine learning methods are expected to be integrated into evolutionary algorithms for solving real-world problems.

REFERENCES

[1] A. Oyama, T. Nonomura, and K. Fujii, "Data mining of pareto-optimal transonic airfoil shapes using proper orthogonal decomposition," Journal of Aircraft, vol. 47, no. 5, pp. 1756-1762, 2010.
[2] B. C. Barroso, F. G. Ferreira, G. P. Hanaoka, F. D. Paiva, and R. T. Cardoso, "Composition of investment portfolios through a combinatorial multiobjective optimization model using cvar," IEEE, 2017, pp. 1795-1802.
[3] M. Iqbal, B. Xue, H. Al-Sahaf, and M. Zhang, "Cross-domain reuse of extracted knowledge in genetic programming for image classification," IEEE Transactions on Evolutionary Computation, vol. 21, no. 4, pp. 569-587, 2017.
TABLE III: MIGD AVERAGED OVER THREE DYNAMIC CONFIGURATIONS OBTAINED BY ISVM-RM-MEDA AND ITS VARIANTS

Prob | DA-RM-MEDA | ISVM_{r=0}-RM-MEDA | ISVM_{r=3}-RM-MEDA | SVM-RM-MEDA | ISVM-RM-MEDA
DF1  | 0.2212 | 0.2517 (-)  | 0.1774 (19.8) | 0.1090 (50.7) | 0.0871 (60.6)
DF2  | 0.1434 | 0.1605 (-)  | 0.1125 (21.5) | 0.0787 (45.1) | 0.0679 (52.6)
DF3  | 0.4588 | 0.5234 (-)  | 0.3718 (19.0) | 0.2698 (41.2) | 0.2309 (49.7)
DF4  | 1.5398 | 1.4585 (5.3) | 1.3069 (15.1) | 1.1064 (28.1) | 0.9935 (35.5)
DF5  | 1.7366 | 1.6074 (7.4) | 1.4552 (16.2) | 1.3175 (24.1) | 1.0924 (37.1)
DF6  | 8.6197 | 8.2571 (4.2) | 8.0207 (6.9)  | 6.0058 (30.3) | 5.0582 (41.3)
DF7  | 7.3851 | 7.4237 (-)  | 6.9706 (5.6)  | 5.8792 (20.4) | 4.6844 (36.6)
DF8  | 0.7116 | 0.8249 (-)  | 0.7743 (-)    | 0.7697 (-)    | 0.7693 (-)
DF9  | 2.3775 | 2.3015 (3.2) | 2.0593 (13.4) | 1.8020 (24.2) | 1.6047 (32.5)
DF10 | 0.1732 | 0.1611 (7.0) | 0.1374 (20.7) | 0.1236 (28.6) | 0.1125 (35.0)
DF11 | 0.2424 | 0.2243 (7.5) | 0.1694 (30.1) | 0.1209 (50.1) | 0.0893 (63.2)
DF12 | 0.6825 | 1.1603 (-)  | 1.1195 (-)    | 1.1638 (-)    | 1.1594 (-)
DF13 | 1.7704 | 1.5978 (9.7) | 1.4464 (18.3) | 1.3293 (24.9) | 1.2280 (30.6)
DF14 | 1.2048 | 1.1334 (5.9) | 0.9247 (23.2) | 0.8651 (28.2) | 0.7406 (38.5)

The values in parentheses indicate the percentage improvement versus DA-RM-MEDA, and (-) indicates no improvement.
TABLE IV: MIGD AVERAGED OVER THREE DYNAMIC CONFIGURATIONS OBTAINED BY DA-NSGA-II, ISVM-NSGA-II, DA-MOPSO AND ISVM-MOPSO

Prob | DA-NSGA-II | ISVM-NSGA-II | DA-MOPSO | ISVM-MOPSO
DF1  | 0.5660 | 0.4381 (22.6) | 0.1315 | 0.0488 (62.9)
DF2  | 0.4855 | 0.3132 (35.5) | 0.0887 | 0.0282 (68.2)
DF3  | 0.6347 | 0.3744 (41.0) | 0.2008 | 0.1294 (35.6)
DF4  | 1.9088 | 1.2459 (34.7) | 1.4380 | 0.8301 (42.3)
DF5  | 2.0731 | 1.5564 (24.9) | 0.3078 | 0.1380 (55.2)
DF6  | 12.650 | 12.299 (2.8)  | 4.4071 | 0.3611 (91.8)
DF7  | 12.281 | 10.224 (16.7) | 1.6328 | 0.5933 (63.7)
DF8  | 0.6447 | 0.7094 (-)    | 0.6074 | 0.6642 (-)
DF9  | 2.4538 | 2.2416 (8.6)  | 0.8732 | 0.8358 (4.3)
DF10 | 0.2745 | 0.1692 (38.4) | 0.1928 | 0.1262 (34.5)
DF11 | 0.4788 | 0.2822 (41.1) | 0.2289 | 0.1259 (45.0)
DF12 | 0.6737 | 0.8052 (-)    | 0.4208 | 0.7478 (-)
DF13 | 2.1968 | 1.8398 (16.3) | 0.2710 | 0.2552 (5.8)
DF14 | 1.4877 | 1.3651 (8.2)  | 0.4496 | 0.3981 (11.5)

The values in parentheses indicate the percentage improvement of ISVM-NSGA-II versus DA-NSGA-II and that of ISVM-MOPSO versus DA-MOPSO, respectively. (-) indicates no improvement.

[4] L.-C. Chang and F.-J. Chang, "Multi-objective evolutionary algorithm for operating parallel reservoir system," Journal of Hydrology, vol. 377, no. 1-2, pp. 12-20, 2009.
[5] X. Bai, "Two-stage multiobjective optimization for emergency supplies allocation problem under integrated uncertainty," Mathematical Problems in Engineering, vol. 2016, 2016.
[6] C. Cruz, J. R. González, and D. A. Pelta, "Optimization in dynamic environments: a survey on problems, methods and measures," Soft Computing, vol. 15, no. 7, pp. 1427-1448, 2011.
[7] T. T. Nguyen, S. Yang, and J. Branke, "Evolutionary dynamic optimization: A survey of the state of the art," Swarm and Evolutionary Computation, vol. 6, pp. 1-24, 2012.
[8] W. T. Koo, C. K. Goh, and K. C. Tan, "A predictive gradient strategy for multiobjective evolutionary algorithms in a fast changing environment," Memetic Computing, vol. 2, no. 2, pp. 87-110, 2010.
[9] A. Zhou, Y. Jin, Q. Zhang, B. Sendhoff, and E.
Tsang, "Prediction-based population re-initialization for evolutionary dynamic multi-objective optimization," in International Conference on Evolutionary Multi-Criterion Optimization. Springer, 2007, pp. 832-846.
[10] L. Cao, L. Xu, E. D. Goodman, C. Bao, and S. Zhu, "Evolutionary dynamic multiobjective optimization assisted by a support vector regression predictor," IEEE Transactions on Evolutionary Computation, vol. 24, no. 2, pp. 305-319, 2019.
[11] R. Liu, W. Zhang, L. Jiao, F. Liu, and J. Ma, "A sphere-dominance based preference immune-inspired algorithm for dynamic multi-objective optimization," in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, 2010, pp. 423-430.
[12] C. R. Azevedo and A. F. Araújo, "Generalized immigration schemes for dynamic evolutionary multiobjective optimization," IEEE, 2011, pp. 2033-2040.
[13] K. Deb, S. Karthik et al., "Dynamic multi-objective optimization and decision-making using modified nsga-ii: a case study on hydro-thermal power scheduling," in International Conference on Evolutionary Multi-Criterion Optimization. Springer, 2007, pp. 803-817.
[14] R. Liu, L. Peng, J. Liu, and J. Liu, "A diversity introduction strategy based on change intensity for evolutionary dynamic multiobjective optimization," Soft Computing, vol. 24, no. 17, pp. 12789-12799, 2020.
[15] G. Ruan, G. Yu, J. Zheng, J. Zou, and S. Yang, "The effect of diversity maintenance on prediction in dynamic multi-objective optimization," Applied Soft Computing, vol. 58, pp. 631-647, 2017.
[16] C.-K. Goh and K. C. Tan, "A competitive-cooperative coevolutionary paradigm for dynamic multiobjective optimization," IEEE Transactions on Evolutionary Computation, vol. 13, no. 1, pp. 103-127, 2008.
[17] K. Li, S. Kwong, J. Cao, M. Li, J. Zheng, and R. Shen, "Achieving balance between proximity and diversity in multi-objective evolutionary algorithm," Information Sciences, vol. 182, no. 1, pp. 220-242, 2012.
[18] S. Jiang and S.
Yang, "A steady-state and generational evolutionary algorithm for dynamic multiobjective optimization," IEEE Transactions on Evolutionary Computation, vol. 21, no. 1, pp. 65-82, 2016.
[19] R. Shang, L. Jiao, Y. Ren, L. Li, and L. Wang, "Quantum immune clonal coevolutionary algorithm for dynamic multiobjective optimization," Soft Computing, vol. 18, no. 4, pp. 743-756, 2014.
[20] R. Shang, L. Jiao, M. Gong, and B. Lu, "Clonal selection algorithm for dynamic multiobjective optimization," in International Conference on Computational and Information Science. Springer, 2005, pp. 846-851.
[21] R. Azzouz, S. Bechikh, and L. B. Said, "A dynamic multi-objective evolutionary algorithm using a change severity-based adaptive population management strategy," Soft Computing, vol. 21, no. 4, pp. 885-906, 2017.
[22] X. Xu, Y. Tan, W. Zheng, and S. Li, "Memory-enhanced dynamic multi-objective evolutionary algorithm based on lp decomposition," Applied Sciences, vol. 8, no. 9, p. 1673, 2018.
[23] S. Sahmoud and H. R. Topcuoglu, "A memory-based nsga-ii algorithm for dynamic multi-objective optimization problems," in European Conference on the Applications of Evolutionary Computation. Springer, 2016, pp. 296-310.
[24] M. Helbig and A. P. Engelbrecht, "Archive management for dynamic multi-objective optimisation problems using vector evaluated particle swarm optimisation," IEEE, 2011, pp. 2047-2054.
[25] R. Chen, K. Li, and X. Yao, "Dynamic multiobjectives optimization with a changing number of objectives," IEEE Transactions on Evolutionary Computation, vol. 22, no. 1, pp. 157-171, 2017.
[26] A. Muruganantham, K. C. Tan, and P. Vadakkepat, "Evolutionary dynamic multiobjective optimization via kalman filter prediction," IEEE Transactions on Cybernetics, vol. 46, no. 12, pp. 2862-2873, 2015.
[27] M. Rong, D. Gong, Y. Zhang, Y. Jin, and W. Pedrycz, "Multidirectional prediction approach for dynamic multiobjective optimization problems," IEEE Transactions on Cybernetics, vol. 49, no. 9, pp.
3362-3374, 2018.
[28] Q. Li, J. Zou, S. Yang, J. Zheng, and G. Ruan, "A predictive strategy based on special points for evolutionary dynamic multi-objective optimization," Soft Computing, vol. 23, no. 11, pp. 3723-3739, 2019.
[29] J. Zou, Q. Li, S. Yang, H. Bai, and J. Zheng, "A prediction strategy based on center points and knee points for evolutionary dynamic multi-objective optimization," Applied Soft Computing, vol. 61, pp. 806-818, 2017.
[30] L. Wei, Z. Guo, R. Fan, H. Sun, and Z. Zhao, "A prediction strategy based on special points and multiregion knee points for evolutionary dynamic multiobjective optimization," Applied Intelligence, vol. 50, no. 12, pp. 4357-4377, 2020.
[31] Y. Wu, Y. Jin, and X. Liu, "A directed search strategy for evolutionary dynamic multiobjective optimization," Soft Computing, vol. 19, no. 11, pp. 3221-3235, 2015.
[32] A. T. W. Min, Y.-S. Ong, A. Gupta, and C.-K. Goh, "Multiproblem surrogates: Transfer evolutionary multiobjective optimization of computationally expensive problems," IEEE Transactions on Evolutionary Computation, vol. 23, no. 1, pp. 15-28, 2017.
[33] K. C. Tan, L. Feng, and M. Jiang, "Evolutionary transfer optimization - a new frontier in evolutionary computation research," IEEE Computational Intelligence Magazine, vol. 16, no. 1, pp. 22-33, 2021.
[34] A. Gupta, Y.-S. Ong, and L. Feng, "Insights on transfer optimization: Because experience is the best teacher," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 51-64, 2017.
[35] B. Da, A. Gupta, and Y.-S. Ong, "Curbing negative influences online for seamless transfer evolutionary optimization," IEEE Transactions on Cybernetics, vol. 49, no. 12, pp. 4365-4378, 2018.
[36] K. K. Bali, Y.-S. Ong, A. Gupta, and P. S. Tan, "Multifactorial evolutionary algorithm with online transfer parameter estimation: Mfea-ii," IEEE Transactions on Evolutionary Computation, vol. 24, no. 1, pp. 69-83, 2019.
[37] M. Jiang, Z. Huang, L. Qiu, W. Huang, and G. G.
Yen, "Transfer learning-based dynamic multiobjective optimization algorithms," IEEE Transactions on Evolutionary Computation, vol. 22, no. 4, pp. 501-514, 2017.
[38] M. Jiang, Z. Wang, S. Guo, X. Gao, and K. C. Tan, "Individual-based transfer learning for dynamic multiobjective optimization," IEEE Transactions on Cybernetics, 2020.
[39] M. Jiang, Z. Wang, H. Hong, and G. G. Yen, "Knee point based imbalanced transfer learning for dynamic multi-objective optimization," IEEE Transactions on Evolutionary Computation, 2020.
[40] M. Jiang, Z. Wang, L. Qiu, S. Guo, X. Gao, and K. C. Tan, "A fast dynamic evolutionary multiobjective algorithm via manifold transfer learning," IEEE Transactions on Cybernetics, 2020.
[41] R. Liu, J. Fan, and L. Jiao, "Integration of improved predictive model and adaptive differential evolution based dynamic multi-objective evolutionary optimization algorithm," Applied Intelligence, vol. 43, no. 1, pp. 192-207, 2015.
[42] L. Cao, L. Xu, E. D. Goodman, and H. Li, "Decomposition-based evolutionary dynamic multiobjective optimization using a difference model," Applied Soft Computing, vol. 76, pp. 473-490, 2019.
[43] Z. Liang, S. Zheng, Z. Zhu, and S. Yang, "Hybrid of memory and prediction strategies for dynamic multiobjective optimization," Information Sciences, vol. 485, pp. 200-218, 2019.
[44] J. Zheng, T. Chen, H. Xie, and S. Yang, "An improved memory prediction strategy for dynamic multiobjective optimization," IEEE, 2020, pp. 166-171.
[45] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 1992, pp. 144-152.
[46] P. Laskov, C. Gehl, S. Krüger, and K.-R. Müller, "Incremental support vector learning: Analysis, implementation and applications," Journal of Machine Learning Research, vol. 7, pp. 1909-1936, 2006.
[47] G. Cauwenberghs and T.
Poggio, "Incremental and decremental support vector machine learning," Advances in Neural Information Processing Systems, vol. 13, pp. 409-415, 2000.
[48] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "Smote: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[49] S. Jiang, S. Yang, X. Yao, K. C. Tan, M. Kaiser, and N. Krasnogor, "Benchmark problems for cec2018 competition on dynamic multiobjective optimisation," Proc. CEC2018 Competition, pp. 1-8, 2018.
[50] Y. Tian, R. Cheng, X. Zhang, F. Cheng, and Y. Jin, "An indicator-based multiobjective evolutionary algorithm with reference point adaptation for better versatility," IEEE Transactions on Evolutionary Computation, vol. 22, no. 4, pp. 609-622, 2017.
[51] Q. Zhang, A. Zhou, and Y. Jin, "Rm-meda: A regularity model-based multiobjective estimation of distribution algorithm," IEEE Transactions on Evolutionary Computation, vol. 12, no. 1, pp. 41-63, 2008.
[52] A. Zhou, Y. Jin, and Q. Zhang, "A population prediction strategy for evolutionary dynamic multiobjective optimization," IEEE Transactions on Cybernetics, vol. 44, no. 1, pp. 40-53, 2013.
[53] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: Nsga-ii," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197, 2002.
[54] C. A. C. Coello, G. T. Pulido, and M. S. Lechuga, "Handling multiple objectives with particle swarm optimization,"