A Federated Data-Driven Evolutionary Algorithm
Jinjin Xu, Yaochu Jin, Fellow, IEEE, Wenli Du, Sai Gu
Abstract—Data-driven evolutionary optimization has witnessed great success in solving complex real-world optimization problems. However, existing data-driven optimization algorithms require that all data are centrally stored, which is not always practical and may be vulnerable to privacy leakage and security threats if the data must be collected from different devices. To address the above issue, this paper proposes a federated data-driven evolutionary optimization framework that is able to perform data-driven optimization when the data is distributed on multiple devices. On the basis of federated learning, a sorted model aggregation method is developed for aggregating local surrogates based on radial-basis-function networks. In addition, a federated surrogate management strategy is suggested by designing an acquisition function that takes into account the information of both the global and local surrogate models. Empirical studies on a set of widely used benchmark functions in the presence of various data distributions demonstrate the effectiveness of the proposed framework.
Index Terms—Data-driven evolutionary optimization, distributed optimization, federated learning, RBFN surrogate model.
I. INTRODUCTION

EVOLUTIONARY algorithms (EAs), as meta-heuristic techniques, have been shown to be effective solvers for many real-world problems over the past few decades [1] [2] [3] [4]. Apart from their strong capability of tackling non-convex and multi-modal problems, EAs do not rely on analytic and differentiable objective functions and are able to perform optimization on the basis of collected data, which is usually referred to as data-driven optimization [4]. In some data-driven optimization problems, data are collected through time-consuming numerical simulations or expensive physical experiments. For example, a single run of a fire dynamics simulation may take several hours [5]. In these cases, only a limited amount of data is available for data-driven optimization, posing major challenges to building the surrogates needed for data-driven optimization. Therefore, existing work on data-driven surrogate-assisted evolutionary optimization focuses on developing surrogate modeling and management techniques [6] [7] that can help find acceptable solutions with a limited computational budget for single-objective optimization [8] [9] [10] [11], multi-objective optimization [12] [13] [14], and many-objective optimization [15] [16] [17].

Manuscript received xx, 2021; revised xxx, 2021. The work is supported by the National Natural Science Foundation of China (Basic Science Center Program: 61988101), International (Regional) Cooperation and Exchange Project (61720106008) and National Natural Science Fund for Distinguished Young Scholars (61725301). (Jinjin Xu and Yaochu Jin contributed equally to this work.) (Corresponding authors: Yaochu Jin; Wenli Du.)
Jinjin Xu and Wenli Du are with the Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, China. E-mail: [email protected], [email protected].
Yaochu Jin is with the Department of Computer Science, University of Surrey, Guildford, GU2 7XH, UK. E-mail: [email protected].
Sai Gu is with the Department of Chemical and Process Engineering, University of Surrey, Guildford, GU2 7XH, UK. E-mail: [email protected].

One implicit assumption made in most existing work on data-driven evolutionary optimization is that all data for modeling the surrogates are centrally stored, which does not hold for many real-world optimization problems. For example, there are many large-scale complex systems in manufacturing and process industries [18] [19] consisting of sub-systems that may be distributed over multiple locations, and all these sub-systems must be considered at the same time to achieve the optimal performance. Collecting data from the sub-systems and storing the data on a central server not only gives rise to communication problems, but also raises security and privacy concerns. To address the above problem, some research on distributed optimization has been carried out in the field of automatic control, where distributed or decentralized gradient-based optimization methods have been developed [20] [21] [22]. These works assume, however, that exact analytic objective functions are available. Most recently, Li et al. [18] presented a gradient-based distributed optimization method for black-box optimization problems, under the assumption that the approximated objective functions are differentiable and strongly convex.
Thus, their optimization algorithm is not applicable to optimization problems such as airfoil design [23] and trauma system design [24].

In evolutionary computation, distributed evolutionary algorithms (EAs) have been investigated to reduce computation time or to deal with large-scale optimization problems. For example, parallel evolutionary optimization [25] based on a master-slave mode [26], island mode [27] and grid mode [28] has been proposed to perform fitness evaluations in a parallel and distributed manner to reduce the required computation time, assuming that multiple processors are available. In addition, a large number of cooperative and co-evolutionary algorithms [29] have been proposed for solving large-scale and complex optimization problems, which can largely be divided into population-distributed [30] [31] and dimension-distributed [32] [33] approaches. To further reduce the computation time, a surrogate model is built for each sub-population in [34]. A comprehensive survey of distributed EAs, including parallel, hierarchical and co-evolutionary algorithms, can be found in [35]. It should be noted that none of these distributed EAs are meant for solving data-driven optimization problems where the data is distributed on multiple devices.

Meanwhile, a distributed data-driven machine learning paradigm, called federated learning [36] [37], has received increasing attention in the field of machine learning. In federated learning, multiple clients collaboratively train a global model without being required to upload the data collected on the clients to a server, reducing the privacy and security risks. Interestingly, evolutionary algorithms (EAs) have been applied in federated learning, including using EAs to optimize the mixture coefficients or model weights [38] [39], or performing evolutionary neural architecture search in the federated learning framework [40] [41].
To the best of our knowledge, however, none of the above work deals with distributed data-driven surrogate-assisted evolutionary optimization, where surrogates are built on the basis of data collected in a distributed way on multiple devices.

This work aims to propose a framework for federated data-driven optimization to address a class of distributed data-driven optimization problems, focusing on surrogate construction and surrogate management in a distributed environment in the presence of possibly noisy and non-independently identically distributed (non-iid) data. The main contributions of the work are summarized as follows.

1) On the basis of federated learning, a surrogate-assisted federated data-driven evolutionary algorithm, called FDD-EA, is proposed, which does not require data originally collected on multiple devices to be stored on a single server.

2) A sorted averaging method is designed for aggregating local radial-basis-function networks into a global surrogate on the server, thereby enhancing the performance of incremental federated learning, in particular in the presence of non-iid data.

3) The lower confidence bound acquisition function is adapted to the federated optimization environment, which integrates information from both the local and the global surrogate models.

To validate the proposed federated data-driven evolutionary optimization framework, benchmark optimization problems are adopted for generating data distributed on multiple machines. Non-iid data are simulated by assuming that each device is not able to generate data in some decision subspaces. Our experimental results demonstrate that the proposed federated data-driven optimization framework performs comparably well or better than the state-of-the-art centralized online data-driven surrogate-assisted evolutionary algorithms on the majority of the tested instances.

The remainder of this paper is organized as follows.
In Section II, we briefly review the basic federated learning paradigm and a few widely used surrogate-assisted optimization approaches, on both of which the proposed work is based. Section III presents the proposed federated data-driven evolutionary optimization framework. In Section IV, the benchmark problems used in the empirical studies, the experimental settings and the comparative results are given. Finally, conclusions are drawn and future directions are discussed in Section V.

II. RELATED WORK

In this section, we briefly review the background of this work, including federated learning, radial-basis-function networks, and the acquisition functions for surrogate-assisted evolutionary algorithms.
A. Federated Learning
The explosion of data resulting from massive numbers of edge devices, e.g., Internet-of-Things (IoT) devices, mobile phones and enterprise clouds, has led to the emergence of many edge computing algorithms and frameworks. Federated learning is one of the most prevalent edge computing methods due to its capability of privacy preservation, data security and communication efficiency. The concept of federated learning was first proposed in [42] [43], and a large body of work has since been reported to further reduce the communication cost [44] [40] [45], handle vertical data partitions [46], or enhance privacy protection [47] [48] [49]. However, the test accuracy of federated learning usually suffers from the non-iid nature of the client data compared to centralized machine learning methods, since the client data may be drawn from different distributions in real-world applications. To alleviate this problem, a large number of new techniques have been proposed, including data sharing [50], client selection [51], and adaptive federated learning [52].
Fig. 1. An illustrative example of federated learning. The edge devices train local models on the private local data and send the trained model parameters to the server for aggregation to obtain the global model. Then, all clients download the aggregated global model and repeat the process until the global model converges.
As illustrated in Fig. 1, a vanilla (standard) federated learning system consists of a server, a communication network and a number of edge devices, also called local clients [36]. In the first round, the server randomly initializes a global model and sends the model to all participating local clients. Note that not all clients participate in each round of model update. Then, each participating local client trains the received global model on its own local data, typically using a gradient-based method, for a number of epochs. After training, each participating local client uploads its updated local model to the server. Finally, the server aggregates the uploaded local models using weighted averaging, known as FedAvg [36], and sends the updated global model to the participating local clients for the next round of model update. This process repeats until the global model converges.

Assume that in the current round, the global model parameters w are sent to λN participating clients, where λ is the ratio of the local clients participating in the current round and N is the total number of local clients. Then the k-th client trains the received model on the local dataset D_k consisting of n_k training data pairs (x_i, y_i), where i = 1, 2, ..., n_k, resulting in the updated local model w_k. The loss function F_k of the local client k is defined by

    F_k(w_k) = (1/n_k) Σ_{i=1}^{n_k} L(x_i, y_i; w_k) + γ(w_k),   (1)

where L(·) is the user-defined loss function and γ denotes the regularizer. The aim of federated learning is to minimize the global objective F with a global model w. Therefore, the global loss function of the federated learning system can be defined as

    min_w { F(w) = Σ_{k=1}^{λN} p_k F_k(w_k) },   (2)

where p_k is the weight of w_k, calculated by

    p_k = n_k / Σ_{k=1}^{λN} n_k.   (3)
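As a minimal illustration of the weighted averaging in Equations (2) and (3), the following sketch (not the paper's implementation; models are represented as plain NumPy parameter vectors for brevity) computes the FedAvg aggregate:

```python
import numpy as np

def fedavg(local_models, local_sizes):
    """Weighted averaging of local model parameters (FedAvg).

    local_models: list of 1-D parameter vectors w_k
    local_sizes:  list of local dataset sizes n_k
    Returns the global parameter vector w = sum_k p_k * w_k,
    with p_k = n_k / sum_j n_j as in Equation (3).
    """
    sizes = np.asarray(local_sizes, dtype=float)
    weights = sizes / sizes.sum()          # p_k
    stacked = np.stack(local_models)       # shape (num_clients, num_params)
    return weights @ stacked               # parameter-wise weighted average

# Two clients with 10 and 30 samples: the second model gets weight 0.75.
w = fedavg([np.array([0.0, 0.0]), np.array([4.0, 8.0])], [10, 30])
print(w)  # [3. 6.]
```

Note that the clients only contribute model parameters and dataset sizes; no raw data pairs (x_i, y_i) are exchanged.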
In each round, the local model w_k is initialized with the downloaded global model w; then client k updates w_k using mini-batch stochastic gradient descent (SGD) [53] with a learning rate η_k and performs E (≥ 1) local training epochs:

    w_k^{i+1} = w_k^i − η_k^i ∇F_k(w_k^i),  i = 0, 1, ..., E − 1.   (4)

At the end of each round, the server aggregates all updated local models to obtain the updated global model and starts the next round.

Note that the vanilla federated learning algorithm adopts a deep neural network and assumes there is enough training data. However, as we have previously discussed, in data-driven evolutionary optimization the amount of training data is usually very limited, since data collection is very expensive. Thus, the global model we are going to use will be a tiny model. More discussions will be provided in Section III.

B. Radial-Basis-Function Networks
Various regression, classification and interpolation methods can be used as surrogate models to approximate the real objective functions [54]. Among them, radial-basis-function networks (RBFNs) [55], artificial neural networks (ANNs) [8], Gaussian processes (GPs) [56] and polynomial regression (PR) [11] are the most widely used ones, and surrogate ensembles have also been widely studied [55], [57]. In this work, an RBFN is adopted as the surrogate because it has, based on our pilot studies, shown to be more scalable to the number of decision variables and easier to train. The structure of the RBFN is similar to that in [58]. Let x_i ∈ R^d be the i-th input sample of the training data; the prediction of the RBFN is given by

    ŷ_i = Σ_{j=1}^{m} a_j φ_j(||x_i − c_j||) + b,   (5)

where m is the number of centers of the RBFN, a = (a_1, a_2, ..., a_m) are the weights of the RBFN (the subscripts here indicate the weight indices of the RBFN), the center vector C = (c_1, c_2, ..., c_m) is obtained by the k-means clustering algorithm, c_j ∈ R^d, and b ∈ R is the bias. ||x_i − c_j|| is the Euclidean distance between an input and the center c_j. Finally, φ_j(·) denotes the basis function, and there are various choices for φ_j, such as the Gaussian function, logistic function, or thin-plate spline function [58]. In this work, we use the Gaussian function, which is expressed by

    φ_j(||x_i − c_j||) = exp(−||x_i − c_j||² / δ_j²),   (6)

where δ_j denotes the standard deviation, also known as the spread, width or radius. In this work, we select δ_j according to the maximum distance between the centers [58].

For the weights connecting the hidden nodes and the output of the network, we solve the linear system

    Φ a + b ≈ y,  Φ ∈ R^{n_k × m},  Φ_{ij} = φ_j(||x_i − c_j||),   (7)

where y = (y_1, ..., y_{n_k})^T, n_k is the total number of training samples, and the pseudo-inverse method or a gradient-based method [58] can be employed to train the weights.
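A minimal sketch of such an RBFN surrogate (Equations (5)-(7)) is given below. This is an illustrative simplification, not the paper's implementation: the centers are passed in directly rather than obtained by k-means, a single shared spread δ is assumed, and the output weights and bias are fitted jointly by the pseudo-inverse:

```python
import numpy as np

class RBFN:
    """Minimal RBFN surrogate: fixed centers, Gaussian basis,
    output weights fitted by the pseudo-inverse (Equation (7))."""

    def __init__(self, centers, delta=1.0):
        self.centers = np.asarray(centers, dtype=float)  # shape (m, d)
        self.delta = delta                               # shared spread

    def _phi(self, X):
        # Phi[i, j] = exp(-||x_i - c_j||^2 / delta^2), Equation (6)
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / self.delta ** 2)

    def fit(self, X, y):
        # Append a column of ones so the bias b is fitted jointly.
        Phi = np.hstack([self._phi(np.asarray(X, float)), np.ones((len(X), 1))])
        coef = np.linalg.pinv(Phi) @ np.asarray(y, float)
        self.a, self.b = coef[:-1], coef[-1]
        return self

    def predict(self, X):
        # Equation (5): y_hat = sum_j a_j * phi_j(||x - c_j||) + b
        return self._phi(np.asarray(X, float)) @ self.a + self.b

# Fit a 1-D toy function from a few samples.
X = np.linspace(-2, 2, 20).reshape(-1, 1)
y = np.sin(X).ravel()
model = RBFN(centers=np.linspace(-2, 2, 8).reshape(-1, 1)).fit(X, y)
```

With eight Gaussian nodes, this sketch reproduces the smooth toy function closely on its training range; a local client in FDD-EA would refit such a model each time its archive D_k grows.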
C. Acquisition Functions
Surrogate management, which determines which new solutions are to be sampled, i.e., evaluated using the expensive objective functions, is central to the effectiveness of surrogate-assisted evolutionary algorithms (SAEAs) [54]. Among different surrogate management strategies, acquisition functions in Bayesian optimization [59], also known as infill criteria in global optimization [60]–[62], are mathematically solid and have been shown to be very effective in balancing exploration and exploitation in online data-driven surrogate-assisted evolutionary optimization.

Several acquisition functions have been proposed for sampling new solutions [63], such as the lower confidence bound (LCB) [64], expected improvement (EI) [62] and probability of improvement (PI) [65]. Given a candidate solution x_p, the LCB of x_p is calculated by

    LCB(x_p) = f̂(x_p) − µ ŝ(x_p),   (8)

where f̂(x_p) and ŝ(x_p) are the predicted mean and standard deviation (the confidence level of the prediction) of the solution point x_p, respectively. The trade-off hyperparameter µ is usually set to 2, as recommended in [66].

In this work, we adapt the LCB to the federated optimization framework because the original LCB cannot be directly applied. The reason is that the RBFN is adopted as the surrogate and hence the uncertainty information of the predictions, ŝ(x_p), is not directly available. Therefore, a federated infill criterion based on the LCB is proposed, which will be discussed in greater detail in Section III-D.

III. PROBLEM DEFINITION AND PROPOSED ALGORITHM
In this section, we first formulate the federated data-driven optimization problem to be addressed in this work. Then, the overall workflow of the proposed FDD-EA is described. Finally, a new strategy for aggregating the local RBFN surrogates is presented, together with an adapted LCB.
A. Problem Definition
As stated in Section I, the main purpose of the present work is to solve data-driven optimization problems where the raw data for optimization is distributed on different local machines and not allowed to be transmitted to a central server. In addition, we assume that each client has all decision variables, and is able to sample a limited amount of new data as informed by the server. However, each client may be limited to sampling solutions in a given subspace of the decision space, resulting in a horizontal but non-iid data distribution on different clients.

Fig. 2. The overall framework of the proposed FDD-EA. Firstly, the clients build local surrogates trained on the private dataset D_k^init. Secondly, the server obtains the global surrogate w by averaging the received local surrogates using the sorted averaging method, and then conducts evolutionary optimization assisted by the global model w. Thirdly, promising solutions are selected with the help of the proposed surrogate management strategy and broadcast to all clients for data collection, i.e., evaluation of the fitness value using the expensive objective functions f_k. Finally, the local surrogates are updated on the union of D_k^init and the newly sampled data D_k.

Thus, in the federated data-driven evolutionary optimization systems considered in this work, only the local clients can sample new data by performing expensive simulations or experiments in a limited subspace of the whole decision space. Denote the approximated objective function on the k-th client by

    y_k = f_k(x),   (9)

where x ∈ R^d is the decision vector. Since we consider horizontal data partition in this work, the expensive objective functions of all local clients are the same, although the decision variables of the clients may be limited to a subspace of the overall decision space.

On the k-th client, a local dataset D_k = {(x_1, y_1), ..., (x_{n_k}, y_{n_k})} consisting of n_k data pairs of decision vectors and their corresponding fitness values evaluated according to Equation (9) is stored.
Thus, a local surrogate model w_k can be constructed on D_k, which is described by

    f̂_k = w_k(x | D_k).   (10)

Denote the difference between the surrogate and the real fitness function on the k-th client as ε; then we have f_k(x) = f̂_k + ε.

Recall that the server has no direct access to the training data describing the functional relationship between a decision vector and its objective function value. Furthermore, the clients are not allowed to upload such raw data to the server, nor can they communicate the local raw data to other clients. However, the clients are allowed to upload the parameters of their local surrogates to the server for constructing a global surrogate. Consequently, the server can aggregate the uploaded local surrogates to build the global surrogate, based on which the optimum of the overall system, x*, can be found using an optimizer, an EA in this work. In other words, the evolutionary search is conducted on the server instead of on the local clients, assuming that the computational power on the clients is limited.

Similar to centralized data-driven evolutionary optimization, the global surrogate must be properly managed to effectively assist the EA in finding the optimum of the global system. Once a promising solution x_p is selected according to the acquisition function on the server, it is broadcast to all participating clients. If the solution is within the feasible subspace of the k-th client, the objective value of x_p will be sampled using f_k(x_p); in practice, a time-consuming numerical simulation will be performed or an experiment will be conducted. The sampled new data pair (x_p, f_k(x_p)) will be added to the local dataset D_k for updating the local surrogate.
If solution x_p falls in the infeasible decision subspace of the k-th client, i.e., the client is not able to sample this solution, no new data is sampled on it in this round. Nevertheless, the local surrogate can still be updated in the next round if this client participates in the model update.

The proposed federated data-driven evolutionary optimization problem and federated learning share some common features, but they also have clear differences. The main goal of federated learning is to build a high-quality global model for classification or regression without requiring the data distributed on multiple clients to be uploaded and centrally stored on the server, so that data security can be ensured and privacy can be protected. Similarly, federated data-driven optimization also needs to build a global surrogate model, which is used to assist the evolutionary search for the optimum of the global system. The global surrogate is built in a federated learning manner so that the local data does not need to be transmitted to the server, reducing security and privacy issues. However, federated data-driven optimization distinguishes itself from federated learning in at least the following two aspects. First, the goal of federated data-driven optimization is to find the optimum of the whole system, and as a result, a proper surrogate model management strategy must be designed, determining which new solutions should be sampled on the local clients. Similar to centralized surrogate-assisted evolutionary optimization [67], the capability of effectively guiding the evolutionary search, rather than the prediction accuracy, is of paramount importance in federated data-driven evolutionary optimization. The second main difference is that in federated data-driven optimization, new data keeps being generated and therefore the surrogates must be incrementally updated during the optimization.
In contrast to federated learning, where the amount of data on each client is often big, the amount of data on the local clients is usually very limited in federated optimization. The main differences between federated data-driven optimization and federated learning are summarized in Table I.

It should also be pointed out that federated data-driven optimization differs from existing parallel or distributed evolutionary optimization in that, in the former, the raw data is collected and stored in a distributed way, while in the latter, the computation or optimization is proactively distributed to different machines for reducing computation time. As a result, the way of distributing the computation or data in distributed optimization is under the full control of the user, while in federated optimization, the data distribution is mainly determined by the nature of the local clients (subsystems).
TABLE I
THE MAIN DIFFERENCES BETWEEN A VANILLA FEDERATED LEARNING SYSTEM AND FEDERATED DATA-DRIVEN EVOLUTIONARY OPTIMIZATION.

              | Vanilla Federated Learning | Federated Data-Driven Evolutionary Optimization
Server        | Model aggregation          | Model aggregation and surrogate management
Local clients | Local training             | Local training and informed sampling
Data          | (Usually) Big, stationary  | Small, incremental
Objective     | Prediction accuracy        | Solution optimality
B. Overall Framework
The overall framework of FDD-EA is given in Fig. 2. In the beginning, the server samples a certain number of solutions using the Latin hypercube sampling (LHS) method [68]. These solutions are sent to all clients and evaluated on the clients using the local real objective function, which constitutes the initial training set D_k^init of each client. Note that if not all clients can sample the whole decision space, D_k^init can also vary from client to client. Then each client uses D_k^init to train a local RBFN surrogate, and uploads the trained model parameters to the server after the training is completed. A global surrogate is obtained by aggregating the local RBFNs using the sorted averaging method, the details of which will be presented in Section III-C. An EA is then employed to search for the optimal solution of the global surrogate by minimizing the proposed federated LCB (to be described in Section III-D) for a certain number of generations. Finally, the optimal solution found by the EA, x_p, is broadcast to all clients participating in the next round of surrogate update. The point x_p will be evaluated on the participating clients using their real objective function f_k and, if successful (i.e., in their feasible subspace in the non-iid case), added to their database D_k. The surrogate w_k on each participating client will then be updated on the augmented D_k. This process repeats until the maximum computation budget is exhausted. The pseudo code of the proposed FDD-EA is given in Algorithm 1.

Algorithm 1:
Pseudo Code of FDD-EA

Input: number of participating clients N, global surrogate w, local surrogates w_k, empty index set S, local archives D_k, client weights p_k, k = {1, 2, ..., N}, maximum number of real objective function evaluations FE_max.

Init: sample 5d points x_1, x_2, ..., x_{5d} in the decision space by LHS, evaluate their values y_1, y_2, ..., y_{5d} with the real objective functions to form D_k^init, and train the initial local models w_k with D_k^init.

while FE ≤ FE_max do
    update S ← λN clients randomly selected from the N clients
    Server does:                      /* update the global surrogate */
        w ← Algorithm 2
        call F-LCB to evaluate the individuals
        selected solution x_p ← EA
        broadcast w, x_p to clients ∈ S
    end
    f(x_p) ← real evaluation of x_p   /* not on the server */
    for Client k ∈ S in parallel do   /* update local surrogates */
        receive w, x_p from the server
        synchronize w_k ← w
        add {x_p, f_k(x_p)} to D_k
        train w_k incrementally using D_k^init ∪ D_k
        upload w_k to the server
    end
    FE += 1                           /* λN for distinct x_p */
end

Note that in FDD-EA, sampling a candidate solution x_p on multiple clients at the same time is counted as one fitness evaluation (FE) when comparing with centralized data-driven optimization. We consider this fair since the evaluations on multiple clients are done in parallel, and only one and the same point is sampled. Also, only the participating clients receive the selected solution x_p and are involved in the next round of model updates.

In the following, we will present in detail the proposed sorted model aggregation method, as well as the federated surrogate management strategy.
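The server/client interplay in Algorithm 1 can be sketched as follows. This is a simplified, single-process sketch; `fit_local_surrogate`, `aggregate`, and `propose_solution` are hypothetical stand-ins for the local RBFN training, Algorithm 2, and the EA guided by F-LCB, respectively:

```python
import random

def fdd_ea(objective, clients, fe_max, ratio, propose_solution,
           fit_local_surrogate, aggregate):
    """Simplified FDD-EA loop: clients hold private archives D_k;
    only model parameters and candidate solutions are exchanged."""
    archives = {k: list(v) for k, v in clients.items()}     # D_k^init ∪ D_k
    local_models = {k: fit_local_surrogate(a) for k, a in archives.items()}
    fe = 0
    while fe < fe_max:
        # Server: select participating clients S, aggregate, search.
        s = random.sample(sorted(archives), max(1, int(ratio * len(archives))))
        global_model = aggregate([local_models[k] for k in s],
                                 [len(archives[k]) for k in s])
        x_p = propose_solution(global_model, [local_models[k] for k in s])
        # Participating clients: evaluate x_p, augment D_k, retrain, upload.
        for k in s:
            y = objective(x_p)                              # expensive evaluation
            archives[k].append((x_p, y))
            local_models[k] = fit_local_surrogate(archives[k])
        fe += 1                                             # one FE per broadcast point
    # Best (x, f(x)) pair found across all client archives.
    return min((pair for a in archives.values() for pair in a),
               key=lambda p: p[1])
```

Plugging in trivial stand-ins (e.g., a random-search `propose_solution`) already exercises the communication pattern: the server never touches raw (x, y) pairs except for the points it broadcasts itself.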
C. Sorted Averaging
In FDD-EA, the training of the global surrogate is meant to effectively guide the evolutionary search rather than to achieve accurate predictions. For this purpose, a federated surrogate management strategy is proposed to sample a new data point on the participating clients in each round of model update, which is added to the existing database for training the local surrogates. Consequently, federated surrogate training in FDD-EA is an incremental learning process.

In vanilla federated learning, the training data on the local clients is stationary, and a weighted averaging aggregation method, known as FedAvg [36], is used to average the uploaded local models parameter-wise according to the amount of local data. Many variants have been proposed to enhance the learning performance in the presence of non-iid data and asynchronous model updates [69]. However, since RBFNs are used as the surrogates, the centers of the radial basis functions might have been shifted differently during training. Thus, it has been found that a significant performance drop will occur if we average the centers, widths, and weights of the RBFNs simply according to the index of the nodes.
Fig. 3. An illustrative example of aggregating two local univariate RBFNs,each having three nodes. The three Gaussian functions of the local RBFNsare shown on the left, and the resulting three Gaussian functions of theaggregated global RBFN are plotted on the right. Due to the shifted centersof the neurons, averaging the centers of the Gaussian functions accordingto the node index will lead to unreasonable results, causing possible seriousperformance degradation of the global surrogate.
After analysis, it was found that the serious performance degradation of the global model is caused by averaging very different centers of the radial basis functions from different local models. Recall that the initial centers of the RBFNs are generated by clustering the decision variables of the training data. Given the same RBFN structure, the initial centers of the Gaussian functions of the corresponding nodes are similar, and a global surrogate can be generated by weighted averaging. However, it can happen after training that the centers of the Gaussian functions have completely shifted, resulting in very different centers for the corresponding nodes of the RBFNs. Fig. 3 provides an illustrative example of mismatches between two local RBFNs, whose center triples represent the Gaussian functions of nodes 1, 2 and 3 of the respective networks. If the Gaussian functions of the two RBFNs are averaged according to the index of the nodes, the resulting global RBFN will have three similar Gaussian functions whose centers lie between the mismatched local centers. As can be seen clearly on the right of Fig. 3, this global RBFN may perform very differently from the two local RBFNs, causing a possibly big performance drop.

To alleviate this problem, we propose a sorted averaging method to obtain the global RBFN surrogate. The main idea of this strategy is to sort the centers of the radial basis functions of the different models so that nodes with similar centers are averaged. To achieve this, let us denote C_k = (c_{k,1}, c_{k,2}, ..., c_{k,m}) as the center vectors of the k-th uploaded local surrogate w_k, where c_{k,j} ∈ R^d and m is the number of centers. Then the following matching metric, which is the sum of the squared center coordinates over all d dimensions, is calculated for the j-th node (j = 1, ..., m) of the k-th local RBFN:

    M_{k,j} = Σ_{i=1}^{d} c_{k,j,i}²,  j = 1, 2, ..., m.   (11)

Therefore, the matching vector of the k-th RBFN C_k can be denoted by M_k = (M_{k,1}, M_{k,2}, ..., M_{k,m}). Then, the node indices of each local RBFN are sorted according to the matching vector in ascending order. This way, all parameters, including the centers, widths, and connecting weights, are averaged according to the sorted node indices of the different local RBFNs:

    w = Σ_{k=1}^{λN} p_k w_k,   (12)

where p_k is the weight as defined in Equation (3). The pseudo code of sorted averaging is summarized in Algorithm 2.
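The matching metric of Equation (11) and the sorted aggregation of Equation (12) can be sketched as follows. This is a simplified sketch in which each local surrogate is represented by its center matrix and per-node output weights only; the spreads and bias, which Algorithm 2 handles the same way, are omitted for brevity:

```python
import numpy as np

def sort_nodes(centers, weights):
    """Sort RBFN nodes by the matching metric M_j = sum_i c_{j,i}^2
    (Equation (11)) so that similar nodes are aligned across clients."""
    order = np.argsort((centers ** 2).sum(axis=1))
    return centers[order], weights[order]

def sorted_averaging(local_centers, local_weights, sizes):
    """Sorted averaging (Equation (12)): align nodes by the matching
    metric, then average parameter-wise with FedAvg weights p_k."""
    p = np.asarray(sizes, float) / np.sum(sizes)
    aligned = [sort_nodes(c, w) for c, w in zip(local_centers, local_weights)]
    g_centers = sum(pk * c for pk, (c, _) in zip(p, aligned))
    g_weights = sum(pk * w for pk, (_, w) in zip(p, aligned))
    return g_centers, g_weights

# Two clients whose nodes are stored in different orders: plain
# node-index averaging would mix dissimilar centers; sorting by the
# matching metric aligns them before averaging.
c1 = np.array([[0.1, 0.1], [2.0, 2.0]])   # client 1: small node first
c2 = np.array([[2.1, 1.9], [0.0, 0.2]])   # client 2: large node first
gc, gw = sorted_averaging([c1, c2],
                          [np.array([1.0, 3.0]), np.array([3.1, 0.9])],
                          [10, 10])
```

After sorting, the two "small" nodes and the two "large" nodes are averaged with each other, so the aggregated centers stay close to both local models instead of landing between unrelated nodes.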
Algorithm 2: Sorted Averaging

Input: local RBF surrogates w_k (containing centers C_k, weights a_k, spreads δ_k, biases b_k), global RBF surrogate w.

Init: empty matching vectors M_k.

foreach k = 1, 2, ..., λN do
    calculate M_k by Equation (11)
    index ← sort(M_k)
    reorder C_k, a_k, δ_k by index
    w_k = [C_k; a_k; δ_k; b_k]
end
w ← Σ_{k=1}^{λN} p_k w_k

Output: averaged global surrogate w

D. Federated Surrogate Management
In Bayesian optimization, the estimated uncertainty of the predictions provided by the Gaussian process plays an important role in the acquisition functions, since the uncertainty information is vital for striking a balance between exploration and exploitation. In the proposed FDD-EA, the global surrogate is built by the weighted averaging of the local RBFNs, and therefore the uncertainty of the predictions needs to be properly estimated.

In Section III-B, we have mentioned that LCB instead of EI is adopted in our work. We prefer LCB over EI because the latter requires the current real minimums of the participating clients, leading to possible data privacy risks. To avoid this problem, we propose a federated LCB (F-LCB for short) as the acquisition function. The main idea is to make use of the predictions of both the global and the local surrogates:

    \hat{f}(x_p) = \frac{\hat{f}_{local}(x_p) + \hat{f}_{fed}(x_p)}{2},    (13)

where \hat{f}_{local}(x_p) is calculated by

    \hat{f}_{local}(x_p) = \sum_{k=1}^{\lambda N} p_k \hat{f}_k(x_p),    (14)

in which p_k is the weight for the k-th local surrogate and n_k is the number of data on the k-th client; Equation (14) denotes the mean predicted value at x_p over all participating clients. \hat{f}_{fed}(x_p) is the predicted value of the federated global model,

    \hat{f}_{fed}(x_p) = w(x_p).    (15)

Consequently, the square of the standard deviation can be calculated by:

    \hat{s}^2(x_p) = \frac{1}{\lambda N} \left[ \sum_{k=1}^{\lambda N} \left( \hat{f}_k(x_p) - \hat{f}(x_p) \right)^2 + \left( \hat{f}_{fed}(x_p) - \hat{f}(x_p) \right)^2 \right].    (16)

Finally, the federated acquisition function, F-LCB, can be calculated by replacing \hat{f}(x_p) and \hat{s}(x_p) in Equation (8) with the predicted fitness in Equation (13) and the estimated standard deviation in Equation (16), respectively.

The combination of the local and global predictions, instead of the aggregated global surrogate only, aims to further increase the prediction quality and the quality of the uncertainty estimation for surrogate management. A comparative study with discussions will be presented in Section IV-E.

IV. SIMULATION STUDIES
To verify the effectiveness of FDD-EA, we first compare it with several state-of-the-art centralized data-driven evolutionary optimization algorithms on widely used benchmarks. Furthermore, we examine the performance of FDD-EA on the benchmarks when the data distributed on the clients are noisy, or when the data are non-IID. Finally, we present a comparative study of the proposed acquisition function against its variants.
A. Experimental Setting

1) Compared algorithms:
To examine the performance of the proposed algorithm, we compare FDD-EA with four popular online data-driven surrogate-assisted evolutionary algorithms on five benchmark problems, Ellipsoid, Rosenbrock, Ackley, Rastrigin and Griewank (the reader is referred to [56] for details of the benchmarks), with different numbers of decision variables (d = 10, 20, 30). A summary of the main features of the benchmarks is listed in Table II. The algorithms under comparison include CAL-SAPSO [11], GPEME [56], SHPSO [70], and SSLPSO [71]. Note that all these SAEAs assume that the data is centrally stored; to the best of our knowledge, there does not exist any federated data-driven evolutionary optimization algorithm that addresses the problem considered in this work.

1) CAL-SAPSO: An online SAEA with a committee-based active learning strategy, which uses an ensemble surrogate consisting of a quadratic polynomial regression (PR) model, an RBFN and a simple Kriging model. CAL-SAPSO adopts a query-by-committee model management strategy on the basis of prediction quality and amount of uncertainty.
2) GPEME: A Kriging-based online SAEA with LCB as the acquisition function to select solutions for real objective evaluations.
3) SHPSO: An RBF-based online SAEA, which combines the standard PSO and social learning PSO for optimization and selection of the solutions to be sampled, and then updates the RBF surrogate model.
4) SSLPSO: A surrogate-assisted social learning particle swarm optimization method.

In this work, a real-coded genetic algorithm (RCGA) [72], [73] is selected as the base optimizer, which applies simulated binary crossover, polynomial mutation and tournament selection. The RCGA runs for 100 generations to find the minimum of the F-LCB. All algorithms under comparison collect d data pairs using real fitness evaluations (FEs) for building the surrogate before optimization starts, and the optimization ends when a total of d FEs is exhausted.
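As a concrete illustration of the objective the RCGA minimizes, a minimal sketch of the F-LCB computation (Equations (13)-(16)) is given below. The trade-off constant `omega` and all function names are illustrative assumptions; the exact LCB form follows Equation (8).

```python
import math

def f_lcb(local_preds, fed_pred, p, omega=2.0):
    """Sketch of the federated LCB acquisition value at one candidate x_p.

    local_preds: predictions f_k(x_p) of the lambda*N local surrogates
    fed_pred:    prediction w(x_p) of the aggregated global surrogate, Eq. (15)
    p:           aggregation weights p_k
    omega:       LCB exploration constant (an assumption for illustration)
    """
    # Weighted mean of the local predictions, Eq. (14)
    f_local = sum(pk * fk for pk, fk in zip(p, local_preds))
    # Average of the local and federated predictions, Eq. (13)
    f_hat = (f_local + fed_pred) / 2.0
    # Spread of all predictors around f_hat, Eq. (16)
    n = len(local_preds)
    s2 = (sum((fk - f_hat) ** 2 for fk in local_preds)
          + (fed_pred - f_hat) ** 2) / n
    # LCB: predicted fitness minus scaled uncertainty (minimized by the RCGA)
    return f_hat - omega * math.sqrt(s2)
```

When all predictors agree, the uncertainty term vanishes and the F-LCB reduces to the shared prediction; disagreement between local and global surrogates lowers the bound and encourages exploration of that candidate.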
2) Data partitions:

To thoroughly investigate the performance of the proposed algorithm, we verify its optimization performance on three types of data distributions: IID, noisy, and non-IID.

1) IID: All clients are able to sample any data point in the entire decision space, and the initial points x_i (i = 1, ..., d) on all clients are the same, which are sampled using the LHS method. Meanwhile, all clients are able to sample any solutions identified by minimizing the federated acquisition function during the optimization.
2) Noisy environments: The same setting as the above, except that the fitness evaluations on all clients are subject to noise. A detailed definition of the noise is given in Section IV-C.
3) Non-IID: In real-world applications, it is likely that some clients are not able to sample all data points specified by the acquisition function due to, for instance, different operating conditions. To examine the performance of FDD-EA subject to this constraint, we conduct experiments in which some data points specified by the acquisition function of the server are not accessible to some clients, resulting in non-IID data similar to federated learning.

Note, however, that even in the IID case, the training data on different clients may be slightly different, due to the fact that in each round of model update, only a small portion of the clients participate and sample new data.
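The per-round participation mentioned above can be sketched as follows: in each communication round the server selects a fraction λ of the N clients at random, so that even under IID sampling the clients accumulate slightly different training sets. The function name and its defaults are illustrative assumptions.

```python
import random

def select_participants(num_clients, participation_ratio, rng=random):
    """Randomly select lambda*N distinct clients for one round."""
    k = max(1, int(participation_ratio * num_clients))
    return rng.sample(range(num_clients), k)
```

With the settings used later (N = 100, λ = 0.1), each round involves 10 clients, and only those 10 receive new samples from the acquisition step.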
TABLE II
TEST PROBLEMS

Problem      d           Optimum  Characteristics
Ellipsoid    10, 20, 30  0.0      Uni-modal
Ackley       10, 20, 30  0.0      Multi-modal
Rastrigin    10, 20, 30  0.0      Multi-modal
Griewank     10, 20, 30  0.0      Multi-modal
Rosenbrock   10, 20, 30  0.0      Multi-modal
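The initial points on the clients are drawn by Latin hypercube sampling (LHS); a minimal pure-Python sketch is shown below for illustration (not the implementation used in the experiments): one point is drawn from each of n equal-width strata per dimension, with the strata shuffled independently across dimensions.

```python
import random

def lhs(n, d, lb, ub, rng=random):
    """Draw n Latin-hypercube samples in the box [lb, ub] of dimension d."""
    samples = [[0.0] * d for _ in range(n)]
    for j in range(d):
        strata = list(range(n))
        rng.shuffle(strata)               # random stratum order per dimension
        for i in range(n):
            u = (strata[i] + rng.random()) / n   # one point per stratum
            samples[i][j] = lb[j] + u * (ub[j] - lb[j])
    return samples
```

Each one-dimensional projection of the result hits every stratum exactly once, which is what distinguishes LHS from plain uniform sampling.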
3) Parameter settings:
The parameter settings of FDD-EA are as follows:
• Total number of clients: N = 100.
• Participation ratio: λ = 0.1.
• Number of local epochs: E = 20.
• Learning rate for clients: η = 0.12.
• Number of RBF centers: m = 2d + 1.
• Number of generations of the EA: g = 100.

The basis function of the RBF models is the Gaussian function; the m centers are determined by the k-means clustering algorithm; the widths are calculated from the maximum distance between the centers; and the weights and biases are trained using a gradient-based method for E = 20 epochs with a learning rate η = 0.12. Each experiment is performed for 20 independent runs.

B. Results on IID Data
In the case of IID data distribution, LHS is used to sample the initial d data pairs on all clients. The results are presented in Table III, in which the average ranks of the algorithms are calculated by the Friedman test, and the p-values are adjusted according to the Hommel procedure [74] with a significance level of 0.05. The better results are highlighted. It can be concluded that the proposed FDD-EA performs significantly better than the compared algorithms on 11 out of 15 instances. This indicates that FDD-EA performs competitively in comparison with the state-of-the-art centralized data-driven evolutionary algorithms.

To take a closer look at the performance of the compared algorithms, their convergence profiles on the 10D, 20D and 30D test functions are presented in Figs. 4, 5, 6, 7 and 8, respectively. From these results, we can observe that FDD-EA clearly outperforms all other compared algorithms on the Rosenbrock, Ackley and Rastrigin functions. On the Ellipsoid function, CAL-SAPSO and FDD-EA outperform the other algorithms under comparison, although FDD-EA performs worse than CAL-SAPSO on the 10D instance. On the Griewank function, FDD-EA is outperformed by CAL-SAPSO or SHPSO; nevertheless, it still clearly outperforms SSLPSO and GPEME.

To summarize, FDD-EA performs the best on 11 out of 15 instances in the IID data environment, and it consistently converges fast on all test instances. From these results we can conclude that FDD-EA and the ensemble-based method CAL-SAPSO are more competitive than the other compared algorithms. However, it is noticed that FDD-EA quickly gets stuck in a local optimum in the later search stage.

We also compare FDD-EA with SA-COSO on high-dimensional problems of up to 100 decision variables, because the online centralized SAEAs compared above are designed for optimization problems with up to 30 decision variables. Specifically, we compare FDD-EA with the surrogate-assisted cooperative swarm optimization algorithm (SA-COSO) [10], designed for high-dimensional expensive optimization, on the five test instances of 50D and 100D, respectively. The experimental results are presented in Table V and Figs. 16, 17, 18, 19 and 20 of the supplementary material in Appendix A. From these results, we can see that FDD-EA outperforms SA-COSO on all instances except for the 100D Griewank function.

C. Noisy Fitness Evaluations
In data-driven optimization, data might be collected from a production process, which is subject to noise [75] [76]. To study the robustness of the proposed algorithm against noisy fitness evaluations, we add noise to the fitness function as follows:

    f_k(x) = f(x) + \alpha \xi, \quad \forall k \in N,    (17)

where α ∈ [0, 1] is a constant that determines the magnitude of the noise, ξ is noise sampled from the standard Gaussian distribution N(0, 1), and f(x) denotes the benchmark problem.

We examine the performance of FDD-EA on the above-mentioned benchmark problems, with all settings being the same as in Section IV-B, and present the results in Fig. 9. From these results, we find that the performance of FDD-EA on all the tested instances except for the 20D Ackley and 30D Griewank functions does not seriously deteriorate when the noise level changes from 0 to 1. Thus, we conclude that FDD-EA is fairly insensitive to noise in the fitness evaluations.

D. Non-IID
Here, we examine the performance of FDD-EA on non-IID data distributions, considering the fact that different clients may have different operating conditions and specifications, which leads to the situation where some regions of the decision space become infeasible on some local clients. To take this situation into account, we introduce an infeasible domain dom(k, τ) for the k-th client to determine whether a candidate point x_p can be sampled by the k-th client:

    dom(k, \tau) = \left[ x_{lb} + (k - 1) g_k,\; \min\{ x_{lb} + (k + \tau - 1) g_k,\; x_{ub} \} \right],    (18)
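The feasibility check implied by Equation (18) can be sketched as follows. This is a hedged illustration: the step size g_k and the exact index offsets are assumptions, and the intent shown is simply that client k cannot sample points falling into a sliding window of width τ·g_k inside [x_lb, x_ub].

```python
def infeasible_interval(k, tau, x_lb, x_ub, g_k):
    """Interval dom(k, tau) that client k cannot sample, per Eq. (18)."""
    lo = x_lb + (k - 1) * g_k
    hi = min(x_lb + (k + tau - 1) * g_k, x_ub)   # clipped at the upper bound
    return lo, hi

def client_can_sample(x, k, tau, x_lb, x_ub, g_k):
    """True if the scalar coordinate x lies outside client k's window."""
    lo, hi = infeasible_interval(k, tau, x_lb, x_ub, g_k)
    return not (lo <= x <= hi)
```

Because the window slides with the client index k, different clients miss different regions of the decision space, which is what produces the non-IID training data.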
TABLE III
AVERAGE BEST FITNESS VALUES (SHOWN AS AVG ± STD) OBTAINED BY FDD-EA, CAL-SAPSO, GPEME, SHPSO AND SSLPSO. THE AVERAGE RANKS ARE OBTAINED ACCORDING TO THE FRIEDMAN TEST, WITH THE p-VALUES ADJUSTED ACCORDING TO THE HOMMEL PROCEDURE AT A SIGNIFICANCE LEVEL OF 0.05. FDD-EA IS THE CONTROL METHOD.

Problem    d    FDD-EA    CAL-SAPSO    GPEME    SHPSO    SSLPSO
[Numeric entries illegible in the source; the only legible value is 6.17e-01 for FDD-EA on the 10D Ellipsoid.]

[Fig. 4: log average current best fitness vs. real fitness evaluations on the Ellipsoid function, for FDD-EA, CAL-SAPSO and the other compared algorithms.]