A Federated Data-Driven Evolutionary Algorithm
Jinjin Xu, Yaochu Jin, Fellow, IEEE, Wenli Du, Sai Gu
Abstract—Data-driven evolutionary optimization has witnessed great success in solving complex real-world optimization problems. However, existing data-driven optimization algorithms require that all data are centrally stored, which is not always practical and may be vulnerable to privacy leakage and security threats if the data must be collected from different devices. To address the above issue, this paper proposes a federated data-driven evolutionary optimization framework that is able to perform data-driven optimization when the data is distributed on multiple devices. On the basis of federated learning, a sorted model aggregation method is developed for aggregating local surrogates based on radial-basis-function networks. In addition, a federated surrogate management strategy is suggested by designing an acquisition function that takes into account the information of both the global and local surrogate models. Empirical studies on a set of widely used benchmark functions in the presence of various data distributions demonstrate the effectiveness of the proposed framework.
Index Terms—Data-driven evolutionary optimization, distributed optimization, federated learning, RBFN surrogate model.
I. INTRODUCTION

EVOLUTIONARY algorithms (EAs), as meta-heuristic techniques, have been shown to be effective solvers for many real-world problems over the past few decades [1] [2] [3] [4]. Apart from their strong capability of tackling non-convex and multi-modal problems, EAs do not rely on analytic and differentiable objective functions and are able to perform optimization on the basis of collected data, which is usually referred to as data-driven optimization [4]. In some data-driven optimization problems, data are collected through time-consuming numerical simulations or expensive physical experiments. For example, a single run of a fire dynamics simulation may take several hours [5]. In these cases, only a limited amount of data is available for data-driven optimization, posing major challenges to building the surrogates needed for data-driven optimization. Therefore, existing work on data-driven surrogate-assisted evolutionary optimization focuses on developing surrogate modeling and management techniques [6] [7] that can help find acceptable solutions with a limited computational budget for single-objective optimization [8] [9] [10] [11], multi-objective optimization [12] [13] [14], and many-objective optimization [15] [16] [17].

Manuscript received xx, 2021; revised xxx, 2021. The work is supported by the National Natural Science Foundation of China (Basic Science Center Program: 61988101), International (Regional) Cooperation and Exchange Project (61720106008) and National Natural Science Fund for Distinguished Young Scholars (61725301). (Jinjin Xu and Yaochu Jin contributed equally to this work.) (Corresponding authors: Yaochu Jin; Wenli Du.)
Jinjin Xu and Wenli Du are with the Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, China. E-mail: [email protected], [email protected].
Yaochu Jin is with the Department of Computer Science, University of Surrey, Guildford, GU2 7XH, UK. E-mail: [email protected].
Sai Gu is with the Department of Chemical and Process Engineering, University of Surrey, Guildford, GU2 7XH, UK. E-mail: [email protected].

One implicit assumption made in most existing work on data-driven evolutionary optimization is that all data for modeling the surrogates are centrally stored, which does not hold for many real-world optimization problems. For example, there are many large-scale complex systems in manufacturing and process industries [18] [19] consisting of sub-systems that may be distributed over multiple locations, and all these sub-systems must be considered at the same time to achieve the optimal performance. Collecting data from the sub-systems and storing the data on a central server not only gives rise to communication problems, but also raises security and privacy concerns. To address the above problem, some research on distributed optimization has been carried out in the field of automatic control, where distributed or decentralized gradient-based optimization methods have been developed [20] [21] [22]. These works assume, however, that exact analytic objective functions are available. Most recently, Li et al. [18] presented a gradient-based distributed optimization method for black-box optimization problems, under the assumption that the approximated objective functions are differentiable and strongly convex.
Thus, their optimization algorithm is not applicable to optimization problems such as airfoil design [23] and trauma system design [24].

In evolutionary computation, distributed evolutionary algorithms (EAs) have been investigated to reduce computation time or to deal with large-scale optimization problems. For example, parallel evolutionary optimization [25] based on a master-slave mode [26], island mode [27] and grid mode [28] has been proposed to perform fitness evaluations in a parallel and distributed manner to reduce the required computation time, assuming that multiple processors are available. In addition, a large number of cooperative and co-evolutionary algorithms [29] have been proposed for solving large-scale and complex optimization problems, which can largely be divided into population-distributed [30] [31] and dimension-distributed [32] [33] approaches. To further reduce the computation time, a surrogate model is built for each sub-population in [34]. A comprehensive survey of distributed EAs, including parallel, hierarchical and co-evolutionary algorithms, can be found in [35]. It should be noted that none of these distributed EAs are meant for solving data-driven optimization problems where the data is distributed on multiple devices.

Meanwhile, a distributed data-driven machine learning paradigm, called federated learning [36] [37], has received increasing attention in the field of machine learning. In federated learning, multiple clients collaboratively train a global model without being required to upload the data collected on the clients to a server, reducing the privacy and security risks. Interestingly, evolutionary algorithms (EAs) have been applied in federated learning, including using EAs to optimize the mixture coefficients or model weights [38] [39], or performing evolutionary neural architecture search in the federated learning framework [40] [41].
To the best of our knowledge, however, none of the above work deals with distributed data-driven surrogate-assisted evolutionary optimization, where surrogates are built on the basis of data collected in a distributed way on multiple devices.

This work aims to propose a framework for federated data-driven optimization to address a class of distributed data-driven optimization problems, focusing on surrogate construction and surrogate management in a distributed environment in the presence of possibly noisy and non-independently identically distributed (non-iid) data. The main contributions of the work are summarized as follows.

1) On the basis of federated learning, a surrogate-assisted federated data-driven evolutionary algorithm, called FDD-EA, is proposed, which does not require data originally collected on multiple devices to be stored on a single server.

2) A sorted averaging method is designed for aggregating local radial-basis-function networks into a global surrogate on the server, thereby enhancing the performance of incremental federated learning, in particular in the presence of non-iid data.

3) The lower confidence bound acquisition function is adapted to the federated optimization environment, which integrates information from both the local and the global surrogate models.

To validate the proposed federated data-driven evolutionary optimization framework, benchmark optimization problems are adopted for generating data distributed on multiple machines. Non-iid data are simulated by assuming that each device is not able to generate data in some decision subspaces. Our experimental results demonstrate that the proposed federated data-driven optimization framework performs comparably well or better than the state-of-the-art centralized online data-driven surrogate-assisted evolutionary algorithms on the majority of the tested instances.

The remainder of this paper is organized as follows.
In Section II, we briefly review the basic federated learning paradigm and a few widely used surrogate-assisted optimization approaches, on both of which the proposed work is based. Section III presents the proposed federated data-driven evolutionary optimization framework. In Section IV, the benchmark problems used in the empirical studies, the experimental settings and the comparative results are given. Finally, conclusions are drawn and future directions are discussed in Section V.

II. RELATED WORK

In this section, we briefly review the background of this work, including federated learning, radial-basis-function networks, and the acquisition functions for surrogate-assisted evolutionary algorithms.
A. Federated Learning
The explosion of data resulting from massive numbers of edge devices, e.g., Internet-of-Things (IoT) devices, mobile phones and enterprise clouds, has led to the emergence of many edge computing algorithms and frameworks. Federated learning is one of the most prevalent edge computing methods due to its capability of privacy preservation, data security and communication efficiency. The concept of federated learning was first proposed in [42] [43], and a large body of work has since been reported to further reduce the communication cost [44] [40] [45], handle vertical data partitions [46], or enhance privacy protection [47] [48] [49]. However, the test accuracy of federated learning usually suffers from the non-iid nature of the client data compared to centralized machine learning methods, since the client data may be drawn from different distributions in real-world applications. To alleviate this problem, a large number of new techniques have been proposed, including data sharing [50], client selection [51], and adaptive federated learning [52].
Fig. 1. An illustrative example of federated learning. The edge devices train local models on the private local data and send the trained model parameters to the server for aggregation to obtain the global model. Then, all clients download the aggregated global model and repeat the process until the global model converges.
As illustrated in Fig. 1, a vanilla (standard) federated learning system consists of a server, a communication network and a number of edge devices, also called local clients [36]. In the first round, the server randomly initializes a global model and sends the model to all participating local clients. Note that not all clients participate in each round of model update. Then, each participating local client trains the received global model on its own local data, typically using a gradient-based method, for a number of epochs. After training, each participating local client uploads its updated local model to the server. Finally, the server aggregates the uploaded local models using weighted averaging, known as FedAvg [36], and sends the updated global model to the participating local clients for the next round of model update. This process repeats until the global model converges.

Assume that in the current round, the global model parameters w are sent to λN participating clients, where λ is the ratio of the local clients participating in the current round and N is the total number of local clients. Then the k-th client trains the received model on the local dataset D_k consisting of n_k training data pairs (x_i, y_i), where i = 1, 2, ..., n_k, resulting in the updated local model w_k. The loss function F_k of the local client k is defined by

    F_k(w_k) = (1/n_k) Σ_{i=1}^{n_k} L(x_i, y_i; w_k) + γ(w_k),   (1)

where L(·) is the user-defined loss function and γ denotes the regularizer. The aim of federated learning is to minimize the global objective F with a global model w. Therefore, the global loss function of the federated learning system can be defined as

    min_w { F(w) = Σ_{k=1}^{λN} p_k F_k(w_k) },   (2)

where p_k is the weight of w_k, calculated by

    p_k = n_k / Σ_{k=1}^{λN} n_k.   (3)
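As a minimal illustration of the weighted averaging in Equations (2) and (3), the following sketch (not the paper's implementation; models are represented as plain NumPy parameter vectors for brevity) computes the FedAvg aggregate:

```python
import numpy as np

def fedavg(local_models, local_sizes):
    """Weighted averaging of local model parameters (FedAvg).

    local_models: list of 1-D parameter vectors w_k
    local_sizes:  list of local dataset sizes n_k
    Returns the global parameter vector w = sum_k p_k * w_k,
    with p_k = n_k / sum_j n_j as in Equation (3).
    """
    sizes = np.asarray(local_sizes, dtype=float)
    weights = sizes / sizes.sum()          # p_k
    stacked = np.stack(local_models)       # shape (num_clients, num_params)
    return weights @ stacked               # parameter-wise weighted average

# Two clients with 10 and 30 samples: the second model gets weight 0.75.
w = fedavg([np.array([0.0, 0.0]), np.array([4.0, 8.0])], [10, 30])
print(w)  # [3. 6.]
```

Note that the clients only contribute model parameters and dataset sizes; no raw data pairs (x_i, y_i) are exchanged.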
In each round, the local model w_k is initialized with the downloaded global model w; then client k updates w_k using mini-batch stochastic gradient descent (SGD) [53] with a learning rate η_k and performs E (≥ 1) local training epochs:

    w_k^{i+1} = w_k^i − η_k^i ∇F_k(w_k^i),  i = 0, 1, ..., E − 1.   (4)

At the end of each round, the server aggregates all updated local models to obtain the updated global model and starts the next round.

Note that the vanilla federated learning algorithm adopts a deep neural network and assumes there is enough training data. However, as we have previously discussed, in data-driven evolutionary optimization the amount of training data is usually very limited, since data collection is very expensive. Thus, the global model we are going to use will be a tiny model. More discussions will be provided in Section III.

B. Radial-Basis-Function Networks
Various regression, classification and interpolation methods can be used as surrogate models to approximate the real objective functions [54]. Among them, radial-basis-function networks (RBFNs) [55], artificial neural networks (ANNs) [8], Gaussian processes (GPs) [56] and polynomial regression (PR) [11] are the most widely used ones, and surrogate ensembles have also been widely studied [55], [57]. In this work, an RBFN is adopted as the surrogate because it has, based on our pilot studies, shown to be more scalable to the number of decision variables and easier to train. The structure of the RBFN is similar to that in [58]. Let x_i ∈ R^d be the i-th input sample of the training data; the prediction of the RBFN is given by

    ŷ_i = Σ_{j=1}^{m} a_j φ_j(||x_i − c_j||) + b,   (5)

where m is the number of centers of the RBFN, a = (a_1, a_2, ..., a_m) are the weights of the RBFN (the subscripts here indicate the weight indices of the RBFN), the center vector C = (c_1, c_2, ..., c_m) is obtained by the k-means clustering algorithm, c_j ∈ R^d, and b ∈ R is the bias. ||x_i − c_j|| is the Euclidean distance between an input and the center c_j. Finally, φ_j(·) denotes the basis function, and there are various choices for φ_j, such as the Gaussian function, logistic function, or thin-plate spline function [58]. In this work, we use the Gaussian function, which is expressed by

    φ_j(||x_i − c_j||) = exp(−||x_i − c_j||² / δ_j²),   (6)

where δ_j denotes the standard deviation, also known as the spread, width or radius. In this work, we select δ_j according to the maximum distance between the centers [58].

For the weights connecting the hidden nodes and the output of the network, we solve the linear system

    Φ a + b ≈ y,  Φ ∈ R^{n_k × m},  Φ_{ij} = φ_j(||x_i − c_j||),   (7)

where y = (y_1, ..., y_{n_k})^T, n_k is the total number of training samples, and the pseudo-inverse method or a gradient-based method [58] can be employed to train the weights.
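A minimal sketch of such an RBFN surrogate (Equations (5)-(7)) is given below. This is an illustrative simplification, not the paper's implementation: the centers are passed in directly rather than obtained by k-means, a single shared spread δ is assumed, and the output weights and bias are fitted jointly by the pseudo-inverse:

```python
import numpy as np

class RBFN:
    """Minimal RBFN surrogate: fixed centers, Gaussian basis,
    output weights fitted by the pseudo-inverse (Equation (7))."""

    def __init__(self, centers, delta=1.0):
        self.centers = np.asarray(centers, dtype=float)  # shape (m, d)
        self.delta = delta                               # shared spread

    def _phi(self, X):
        # Phi[i, j] = exp(-||x_i - c_j||^2 / delta^2), Equation (6)
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / self.delta ** 2)

    def fit(self, X, y):
        # Append a column of ones so the bias b is fitted jointly.
        Phi = np.hstack([self._phi(np.asarray(X, float)), np.ones((len(X), 1))])
        coef = np.linalg.pinv(Phi) @ np.asarray(y, float)
        self.a, self.b = coef[:-1], coef[-1]
        return self

    def predict(self, X):
        # Equation (5): y_hat = sum_j a_j * phi_j(||x - c_j||) + b
        return self._phi(np.asarray(X, float)) @ self.a + self.b

# Fit a 1-D toy function from a few samples.
X = np.linspace(-2, 2, 20).reshape(-1, 1)
y = np.sin(X).ravel()
model = RBFN(centers=np.linspace(-2, 2, 8).reshape(-1, 1)).fit(X, y)
```

With eight Gaussian nodes, this sketch reproduces the smooth toy function closely on its training range; a local client in FDD-EA would refit such a model each time its archive D_k grows.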
C. Acquisition Functions
Surrogate management, which determines which new solutions are to be sampled, i.e., evaluated using the expensive objective functions, is central to the effectiveness of surrogate-assisted evolutionary algorithms (SAEAs) [54]. Among different surrogate management strategies, acquisition functions in Bayesian optimization [59], also known as infill criteria in global optimization [60]–[62], are mathematically solid and have been shown to be very effective in balancing exploration and exploitation in online data-driven surrogate-assisted evolutionary optimization.

Several acquisition functions have been proposed for sampling new solutions [63], such as the lower confidence bound (LCB) [64], expected improvement (EI) [62] and probability of improvement (PI) [65]. Given a candidate solution x_p, the LCB of x_p is calculated by

    LCB(x_p) = f̂(x_p) − µ ŝ(x_p),   (8)

where f̂(x_p) and ŝ(x_p) are the predicted mean and standard deviation (the confidence level of the prediction) of the solution point x_p, respectively. The trade-off hyperparameter µ is usually set to 2, as recommended in [66].

In this work, we adapt the LCB to the federated optimization framework because the original LCB cannot be directly applied. The reason is that the RBFN is adopted as the surrogate and hence the uncertainty information of the predictions, ŝ(x_p), is not directly available. Therefore, a federated infill criterion based on the LCB is proposed, which will be discussed in greater detail in Section III-D.

III. PROBLEM DEFINITION AND PROPOSED ALGORITHM
In this section, we first formulate the federated data-driven optimization problem to be addressed in this work. Then, the overall workflow of the proposed FDD-EA is described. Finally, a new strategy for aggregating the local RBFN surrogates is presented, together with an adapted LCB.
A. Problem Definition
As stated in Section I, the main purpose of the present work is to solve data-driven optimization problems where the raw data for optimization is distributed on different local machines and not allowed to be transmitted to a central server. In addition, we assume that each client has all decision variables, and is able to sample a limited amount of new data as informed by the server. However, each client may be limited to sampling solutions in a given subspace of the decision space, resulting in a horizontal but non-iid data distribution on different clients.

Fig. 2. The overall framework of the proposed FDD-EA. Firstly, the clients build local surrogates trained on the private dataset D_k^init. Secondly, the server obtains the global surrogate w by averaging the received local surrogates using the sorted averaging method, and then conducts evolutionary optimization assisted by the global model w. Thirdly, promising solutions are selected with the help of the proposed surrogate management strategy and broadcast to all clients for data collection, i.e., evaluation of the fitness value using the expensive objective functions f_k. Finally, the local surrogates are updated on the union of D_k^init and the newly sampled data D_k.

Thus, in the federated data-driven evolutionary optimization systems considered in this work, only the local clients can sample new data by performing expensive simulations or experiments in a limited subspace of the whole decision space. Denote the approximated objective function on the k-th client by

    y_k = f_k(x),   (9)

where x ∈ R^d is the decision vector. Since we consider horizontal data partition in this work, the expensive objective functions of all local clients are the same, although the decision variables of the clients may be limited to a subspace of the overall decision space.

On the k-th client, a local dataset D_k = {(x_1, y_1), ..., (x_{n_k}, y_{n_k})} consisting of n_k data pairs of decision vectors and their corresponding fitness values evaluated according to Equation (9) is stored.
Thus, a local surrogate model w_k can be constructed on D_k, which is described by

    f̂_k = w_k(x | D_k).   (10)

Denote the difference between the surrogate and the real fitness function on the k-th client as ε; then we have f_k(x) = f̂_k + ε.

Recall that the server has no direct access to the training data describing the functional relationship between a decision vector and its objective function value. Furthermore, the clients are not allowed to upload such raw data to the server, nor can they communicate the local raw data to other clients. However, the clients are allowed to upload the parameters of their local surrogates to the server for constructing a global surrogate. Consequently, the server can aggregate the uploaded local surrogates to build the global surrogate, based on which the optimum of the overall system, x*, can be found using an optimizer, an EA in this work. In other words, the evolutionary search is conducted on the server instead of on the local clients, assuming that the computational power on the clients is limited.

Similar to centralized data-driven evolutionary optimization, the global surrogate must be properly managed to effectively assist the EA in finding the optimum of the global system. Once a promising solution x_p is selected according to the acquisition function on the server, it is broadcast to all participating clients. If the solution is within the feasible subspace of the k-th client, the objective value of x_p will be sampled using f_k(x_p); in practice, a time-consuming numerical simulation will be performed or an experiment will be conducted. The sampled new data pair (x_p, f_k(x_p)) will be added to the local dataset D_k for updating the local surrogate.
If solution x_p falls in the infeasible decision subspace of the k-th client, i.e., the client is not able to sample this solution, no new data is sampled on it in this round. Nevertheless, the local surrogate can still be updated in the next round if this client participates in the model update.

The proposed federated data-driven evolutionary optimization problem and federated learning share some common features, but they also have clear differences. The main goal of federated learning is to build a high-quality global model for classification or regression without requiring the data distributed on multiple clients to be uploaded and centrally stored on the server, so that data security can be ensured and privacy can be protected. Similarly, federated data-driven optimization also needs to build a global surrogate model, which is used to assist the evolutionary search for the optimum of the global system. The global surrogate is built in a federated learning manner so that the local data does not need to be transmitted to the server, reducing security and privacy issues. However, federated data-driven optimization distinguishes itself from federated learning in at least the following two aspects. First, the goal of federated data-driven optimization is to find the optimum of the whole system, and as a result, a proper surrogate model management strategy must be designed, determining which new solutions should be sampled on the local clients. Similar to centralized surrogate-assisted evolutionary optimization [67], the capability of effectively guiding the evolutionary search, rather than the prediction accuracy, is of paramount importance in federated data-driven evolutionary optimization. The second main difference is that in federated data-driven optimization, new data keeps being generated and therefore the surrogates must be incrementally updated during the optimization.
In contrast to federated learning, where the amount of data on each client is often big, the amount of data on the local clients is usually very limited in federated optimization. The main differences between federated data-driven optimization and federated learning are summarized in Table I.

It should also be pointed out that federated data-driven optimization differs from existing parallel or distributed evolutionary optimization in that, in the former, the raw data is collected and stored in a distributed way, while in the latter, the computation or optimization is proactively distributed to different machines for reducing computation time. As a result, the way of distributing the computation or data in distributed optimization is under the full control of the user, while in federated optimization, the data distribution is mainly determined by the nature of the local clients (subsystems).
TABLE I
THE MAIN DIFFERENCES BETWEEN A VANILLA FEDERATED LEARNING SYSTEM AND FEDERATED DATA-DRIVEN EVOLUTIONARY OPTIMIZATION.

              | Vanilla Federated Learning | Federated Data-Driven Evolutionary Optimization
Server        | Model aggregation          | Model aggregation and surrogate management
Local clients | Local training             | Local training and informed sampling
Data          | (Usually) Big, stationary  | Small, incremental
Objective     | Prediction accuracy        | Solution optimality
B. Overall Framework
The overall framework of FDD-EA is given in Fig. 2. In the beginning, the server samples a certain number of solutions using the Latin hypercube sampling (LHS) method [68]. These solutions are sent to all clients and evaluated on the clients using the local real objective function, which constitutes the initial training set D_k^init of each client. Note that if not all clients can sample the whole decision space, D_k^init can also vary from client to client. Then each client uses D_k^init to train a local RBFN surrogate, and uploads the trained model parameters to the server after the training is completed. A global surrogate is obtained by aggregating the local RBFNs using the sorted averaging method, the details of which will be presented in Section III-C. An EA is then employed to search for the optimal solution of the global surrogate by minimizing the proposed federated LCB (to be described in Section III-D) for a certain number of generations. Finally, the optimal solution found by the EA, x_p, is broadcast to all clients participating in the next round of surrogate update. The point x_p will be evaluated on the participating clients using their real objective function f_k and, if successful (i.e., in their feasible subspace in the non-iid case), added to their database D_k. The surrogate w_k on each participating client will then be updated on the augmented D_k. This process repeats until the maximum computation budget is exhausted. The pseudo code of the proposed FDD-EA is given in Algorithm 1.

Algorithm 1:
Pseudo Code of FDD-EA

Input: number of participating clients N, global surrogate w, local surrogates w_k, empty index set S, local archives D_k, client weights p_k, k = {1, 2, ..., N}, maximum number of real objective function evaluations FE_max.

Init: sample 5d points x_1, x_2, ..., x_{5d} in the decision space by LHS, evaluate their values y_1, y_2, ..., y_{5d} with the real objective functions to form D_k^init, and train the initial local models w_k with D_k^init.

while FE ≤ FE_max do
    update S ← λN clients randomly selected from the N clients
    Server does:                      /* update the global surrogate */
        w ← Algorithm 2
        call F-LCB to evaluate the individuals
        selected solution x_p ← EA
        broadcast w, x_p to clients ∈ S
    end
    f(x_p) ← real evaluation of x_p   /* not on the server */
    for Client k ∈ S in parallel do   /* update local surrogates */
        receive w, x_p from the server
        synchronize w_k ← w
        add {x_p, f_k(x_p)} to D_k
        train w_k incrementally using D_k^init ∪ D_k
        upload w_k to the server
    end
    FE += 1                           /* λN for distinct x_p */
end

Note that in FDD-EA, sampling a candidate solution x_p on multiple clients at the same time is counted as one fitness evaluation (FE) when comparing with centralized data-driven optimization. We consider this fair since the evaluations on multiple clients are done in parallel, and only one and the same point is sampled. Also, only the participating clients receive the selected solution x_p and are involved in the next round of model updates.

In the following, we will present in detail the proposed sorted model aggregation method, as well as the federated surrogate management strategy.
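The server/client interplay in Algorithm 1 can be sketched as follows. This is a simplified, single-process sketch; `fit_local_surrogate`, `aggregate`, and `propose_solution` are hypothetical stand-ins for the local RBFN training, Algorithm 2, and the EA guided by F-LCB, respectively:

```python
import random

def fdd_ea(objective, clients, fe_max, ratio, propose_solution,
           fit_local_surrogate, aggregate):
    """Simplified FDD-EA loop: clients hold private archives D_k;
    only model parameters and candidate solutions are exchanged."""
    archives = {k: list(v) for k, v in clients.items()}     # D_k^init ∪ D_k
    local_models = {k: fit_local_surrogate(a) for k, a in archives.items()}
    fe = 0
    while fe < fe_max:
        # Server: select participating clients S, aggregate, search.
        s = random.sample(sorted(archives), max(1, int(ratio * len(archives))))
        global_model = aggregate([local_models[k] for k in s],
                                 [len(archives[k]) for k in s])
        x_p = propose_solution(global_model, [local_models[k] for k in s])
        # Participating clients: evaluate x_p, augment D_k, retrain, upload.
        for k in s:
            y = objective(x_p)                              # expensive evaluation
            archives[k].append((x_p, y))
            local_models[k] = fit_local_surrogate(archives[k])
        fe += 1                                             # one FE per broadcast point
    # Best (x, f(x)) pair found across all client archives.
    return min((pair for a in archives.values() for pair in a),
               key=lambda p: p[1])
```

Plugging in trivial stand-ins (e.g., a random-search `propose_solution`) already exercises the communication pattern: the server never touches raw (x, y) pairs except for the points it broadcasts itself.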
C. Sorted Averaging
In FDD-EA, the training of the global surrogate is meant to effectively guide the evolutionary search rather than to achieve accurate predictions. For this purpose, a federated surrogate management strategy is proposed to sample a new data point on the participating clients in each round of model update, which is added to the existing database for training the local surrogates. Consequently, federated surrogate training in FDD-EA is an incremental learning process.

In vanilla federated learning, the training data on the local clients is stationary, and a weighted averaging aggregation method, known as FedAvg [36], is used to average the uploaded local models parameter-wise according to the amount of local data. Many variants have been proposed to enhance the learning performance in the presence of non-iid data and asynchronous model updates [69]. However, since RBFNs are used as the surrogates, the centers of the radial basis functions might have been shifted differently during training. Thus, it has been found that a significant performance drop will occur if we average the centers, widths, and weights of the RBFNs simply according to the index of the nodes.
Fig. 3. An illustrative example of aggregating two local univariate RBFNs,each having three nodes. The three Gaussian functions of the local RBFNsare shown on the left, and the resulting three Gaussian functions of theaggregated global RBFN are plotted on the right. Due to the shifted centersof the neurons, averaging the centers of the Gaussian functions accordingto the node index will lead to unreasonable results, causing possible seriousperformance degradation of the global surrogate.
After analysis, it was found that the serious performance degradation of the global model is caused by averaging very different centers of the radial basis functions from different local models. Recall that the initial centers of the RBFNs are generated by clustering the decision variables of the training data. Given the same RBFN structure, the initial centers of the Gaussian functions of the corresponding nodes are similar, and a global surrogate can be generated by weighted averaging. However, it can happen after training that the centers of the Gaussian functions have completely shifted, resulting in very different centers for the corresponding nodes of the RBFNs. Fig. 3 provides an illustrative example of mismatches between two local RBFNs, whose center triples represent the Gaussian functions of nodes 1, 2 and 3 of the respective networks. If the Gaussian functions of the two RBFNs are averaged according to the index of the nodes, the resulting global RBFN will have three similar Gaussian functions whose centers lie between the mismatched local centers. As can be seen clearly on the right of Fig. 3, this global RBFN may perform very differently from the two local RBFNs, causing a possibly big performance drop.

To alleviate this problem, we propose a sorted averaging method to obtain the global RBFN surrogate. The main idea of this strategy is to sort the centers of the radial basis functions of the different models so that nodes with similar centers are averaged. To achieve this, let us denote C_k = (c_{k,1}, c_{k,2}, ..., c_{k,m}) as the center vectors of the k-th uploaded local surrogate w_k, where c_{k,j} ∈ R^d and m is the number of centers. Then the following matching metric, which is the sum of the squared center coordinates over all d dimensions, is calculated for the j-th node (j = 1, ..., m) of the k-th local RBFN:

    M_{k,j} = Σ_{i=1}^{d} c_{k,j,i}²,  j = 1, 2, ..., m.   (11)

Therefore, the matching vector of the k-th RBFN C_k can be denoted by M_k = (M_{k,1}, M_{k,2}, ..., M_{k,m}). Then, the node indices of each local RBFN are sorted according to the matching vector in ascending order. This way, all parameters, including the centers, widths, and connecting weights, are averaged according to the sorted node indices of the different local RBFNs:

    w = Σ_{k=1}^{λN} p_k w_k,   (12)

where p_k is the weight as defined in Equation (3). The pseudo code of sorted averaging is summarized in Algorithm 2.
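The matching metric of Equation (11) and the sorted aggregation of Equation (12) can be sketched as follows. This is a simplified sketch in which each local surrogate is represented by its center matrix and per-node output weights only; the spreads and bias, which Algorithm 2 handles the same way, are omitted for brevity:

```python
import numpy as np

def sort_nodes(centers, weights):
    """Sort RBFN nodes by the matching metric M_j = sum_i c_{j,i}^2
    (Equation (11)) so that similar nodes are aligned across clients."""
    order = np.argsort((centers ** 2).sum(axis=1))
    return centers[order], weights[order]

def sorted_averaging(local_centers, local_weights, sizes):
    """Sorted averaging (Equation (12)): align nodes by the matching
    metric, then average parameter-wise with FedAvg weights p_k."""
    p = np.asarray(sizes, float) / np.sum(sizes)
    aligned = [sort_nodes(c, w) for c, w in zip(local_centers, local_weights)]
    g_centers = sum(pk * c for pk, (c, _) in zip(p, aligned))
    g_weights = sum(pk * w for pk, (_, w) in zip(p, aligned))
    return g_centers, g_weights

# Two clients whose nodes are stored in different orders: plain
# node-index averaging would mix dissimilar centers; sorting by the
# matching metric aligns them before averaging.
c1 = np.array([[0.1, 0.1], [2.0, 2.0]])   # client 1: small node first
c2 = np.array([[2.1, 1.9], [0.0, 0.2]])   # client 2: large node first
gc, gw = sorted_averaging([c1, c2],
                          [np.array([1.0, 3.0]), np.array([3.1, 0.9])],
                          [10, 10])
```

After sorting, the two "small" nodes and the two "large" nodes are averaged with each other, so the aggregated centers stay close to both local models instead of landing between unrelated nodes.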
Algorithm 2: Sorted Averaging

Input: local RBF surrogates w_k (containing centers C_k, weights a_k, spreads δ_k, biases b_k), global RBF surrogate w.

Init: empty matching vectors M_k.

foreach k = 1, 2, ..., λN do
    calculate M_k by Equation (11)
    index ← sort(M_k)
    reorder C_k, a_k, δ_k by index
    w_k = [C_k; a_k; δ_k; b_k]
end
w ← Σ_{k=1}^{λN} p_k w_k

Output: averaged global surrogate w

D. Federated Surrogate Management
In Bayesian optimization, the estimated uncertainty of the predictions provided by the Gaussian process plays an important role in the acquisition functions, since the uncertainty information is vital for striking a balance between exploration and exploitation. In the proposed FDD-EA, the global surrogate is built by the weighted averaging of the local RBFNs, and therefore the uncertainty of the predictions needs to be properly estimated.

In Section III-B, we have mentioned that LCB instead of EI is adopted in our work. We prefer LCB over EI because the latter requires the current real minimums of the participating clients, leading to possible data privacy risks. To avoid this problem, we propose a federated LCB (F-LCB for short) as the acquisition function. The main idea is to make use of the predictions of both the global and the local surrogates:

    \hat{f}(x_p) = \frac{\hat{f}_{local}(x_p) + \hat{f}_{fed}(x_p)}{2},    (13)

where \hat{f}_{local}(x_p) is calculated by

    \hat{f}_{local}(x_p) = \sum_{k=1}^{\lambda N} p_k \hat{f}_k(x_p),    (14)

in which p_k is the weight for the k-th local surrogate and n_k is the number of data on the k-th client; Equation (14) denotes the mean predicted value at x_p over all participating clients. \hat{f}_{fed}(x_p) is the predicted value of the federated global model,

    \hat{f}_{fed}(x_p) = w(x_p).    (15)

Consequently, the square of the standard deviation can be calculated by:

    \hat{s}^2(x_p) = \frac{1}{\lambda N} \left[ \sum_{k=1}^{\lambda N} \left( \hat{f}_k(x_p) - \hat{f}(x_p) \right)^2 + \left( \hat{f}_{fed}(x_p) - \hat{f}(x_p) \right)^2 \right].    (16)

Finally, the federated acquisition function, F-LCB, can be calculated by replacing \hat{f}(x_p) and \hat{s}(x_p) in Equation (8) with the predicted fitness in Equation (13) and the estimated standard deviation in Equation (16), respectively.

The combination of the local and global predictions, instead of the aggregated global surrogate only, aims to further increase the prediction quality and the quality of the uncertainty estimation for surrogate management. A comparative study with discussions will be presented in Section IV-E.

IV. SIMULATION STUDIES
To verify the effectiveness of FDD-EA, we first compare it with several state-of-the-art centralized data-driven evolutionary optimization algorithms on widely used benchmarks. Furthermore, we examine the performance of FDD-EA on the benchmarks when the data distributed on the clients are noisy, or when the data are non-IID. Finally, we present a comparative study of the proposed acquisition function against its variants.
A. Experimental Setting

1) Compared algorithms:
To examine the performance of the proposed algorithm, we compare FDD-EA with four popular online data-driven surrogate-assisted evolutionary algorithms on five benchmark problems, Ellipsoid, Rosenbrock, Ackley, Rastrigin and Griewank (the reader is referred to [56] for details of the benchmarks), with different numbers of decision variables (d = 10, 20, 30). A summary of the main features of the benchmarks is listed in Table II. The algorithms under comparison include CAL-SAPSO [11], GPEME [56], SHPSO [70], and SSLPSO [71]. Note that all these SAEAs assume that the data is centrally stored; to the best of our knowledge, there does not exist any federated data-driven evolutionary optimization algorithm that addresses the problem considered in this work.

1) CAL-SAPSO: An online SAEA with a committee-based active learning strategy, which uses an ensemble surrogate consisting of a quadratic polynomial regression (PR) model, an RBFN and a simple Kriging model. CAL-SAPSO adopts a query-by-committee model management strategy on the basis of prediction quality and amount of uncertainty.
2) GPEME: A Kriging-based online SAEA with LCB as the acquisition function to select solutions for real objective evaluations.
3) SHPSO: An RBF-based online SAEA, which combines the standard PSO and social learning PSO for optimization and selection of the solutions to be sampled, and then updates the RBF surrogate model.
4) SSLPSO: A surrogate-assisted social learning particle swarm optimization method.

In this work, a real-coded genetic algorithm (RCGA) [72], [73] is selected as the base optimizer, which applies simulated binary crossover, polynomial mutation and tournament selection. The RCGA runs for 100 generations to find the minimum of the F-LCB. All algorithms under comparison collect d data pairs using real fitness evaluations (FEs) for building the surrogate before optimization starts, and the optimization ends when a total of d FEs is exhausted.
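As a concrete illustration of the objective the RCGA minimizes, a minimal sketch of the F-LCB computation (Equations (13)-(16)) is given below. The trade-off constant `omega` and all function names are illustrative assumptions; the exact LCB form follows Equation (8).

```python
import math

def f_lcb(local_preds, fed_pred, p, omega=2.0):
    """Sketch of the federated LCB acquisition value at one candidate x_p.

    local_preds: predictions f_k(x_p) of the lambda*N local surrogates
    fed_pred:    prediction w(x_p) of the aggregated global surrogate, Eq. (15)
    p:           aggregation weights p_k
    omega:       LCB exploration constant (an assumption for illustration)
    """
    # Weighted mean of the local predictions, Eq. (14)
    f_local = sum(pk * fk for pk, fk in zip(p, local_preds))
    # Average of the local and federated predictions, Eq. (13)
    f_hat = (f_local + fed_pred) / 2.0
    # Spread of all predictors around f_hat, Eq. (16)
    n = len(local_preds)
    s2 = (sum((fk - f_hat) ** 2 for fk in local_preds)
          + (fed_pred - f_hat) ** 2) / n
    # LCB: predicted fitness minus scaled uncertainty (minimized by the RCGA)
    return f_hat - omega * math.sqrt(s2)
```

When all predictors agree, the uncertainty term vanishes and the F-LCB reduces to the shared prediction; disagreement between local and global surrogates lowers the bound and encourages exploration of that candidate.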
2) Data partitions:

To thoroughly investigate the performance of the proposed algorithm, we verify its optimization performance on three types of data distributions: IID, noisy, and non-IID.

1) IID: All clients are able to sample any data point in the entire decision space, and the initial points x_i (i = 1, ..., d) on all clients are the same, which are sampled using the LHS method. Meanwhile, all clients are able to sample any solutions identified by minimizing the federated acquisition function during the optimization.
2) Noisy environments: The same setting as the above, except that the fitness evaluations on all clients are subject to noise. A detailed definition of the noise is given in Section IV-C.
3) Non-IID: In real-world applications, it is likely that some clients are not able to sample all data points specified by the acquisition function due to, for instance, different operating conditions. To examine the performance of FDD-EA subject to this constraint, we conduct experiments in which some data points specified by the acquisition function of the server are not accessible to some clients, resulting in non-IID data similar to federated learning.

Note, however, that even in the IID case, the training data on different clients may be slightly different, due to the fact that in each round of model update, only a small portion of the clients participate and sample new data.
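The per-round participation mentioned above can be sketched as follows: in each communication round the server selects a fraction λ of the N clients at random, so that even under IID sampling the clients accumulate slightly different training sets. The function name and its defaults are illustrative assumptions.

```python
import random

def select_participants(num_clients, participation_ratio, rng=random):
    """Randomly select lambda*N distinct clients for one round."""
    k = max(1, int(participation_ratio * num_clients))
    return rng.sample(range(num_clients), k)
```

With the settings used later (N = 100, λ = 0.1), each round involves 10 clients, and only those 10 receive new samples from the acquisition step.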
TABLE II
TEST PROBLEMS

Problem      d           Optimum  Characteristics
Ellipsoid    10, 20, 30  0.0      Uni-modal
Ackley       10, 20, 30  0.0      Multi-modal
Rastrigin    10, 20, 30  0.0      Multi-modal
Griewank     10, 20, 30  0.0      Multi-modal
Rosenbrock   10, 20, 30  0.0      Multi-modal
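The initial points on the clients are drawn by Latin hypercube sampling (LHS); a minimal pure-Python sketch is shown below for illustration (not the implementation used in the experiments): one point is drawn from each of n equal-width strata per dimension, with the strata shuffled independently across dimensions.

```python
import random

def lhs(n, d, lb, ub, rng=random):
    """Draw n Latin-hypercube samples in the box [lb, ub] of dimension d."""
    samples = [[0.0] * d for _ in range(n)]
    for j in range(d):
        strata = list(range(n))
        rng.shuffle(strata)               # random stratum order per dimension
        for i in range(n):
            u = (strata[i] + rng.random()) / n   # one point per stratum
            samples[i][j] = lb[j] + u * (ub[j] - lb[j])
    return samples
```

Each one-dimensional projection of the result hits every stratum exactly once, which is what distinguishes LHS from plain uniform sampling.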
3) Parameter settings:
The parameter settings of FDD-EA are as follows:
• Total number of clients: N = 100.
• Participation ratio: λ = 0.1.
• Number of local epochs: E = 20.
• Learning rate for clients: η = 0.12.
• Number of RBF centers: m = 2d + 1.
• Number of generations of the EA: g = 100.

The basis function of the RBF models is the Gaussian function; the m centers are determined by the k-means clustering algorithm; the widths are calculated from the maximum distance between the centers; and the weights and biases are trained using a gradient-based method for E = 20 epochs with a learning rate η = 0.12. Each experiment is performed for 20 independent runs.

B. Results on IID Data
In the case of IID data distribution, LHS is used to sample the initial d data pairs on all clients. The results are presented in Table III, in which the average ranks of the algorithms are calculated by the Friedman test, and the p-values are adjusted according to the Hommel procedure [74] with a significance level of 0.05. The better results are highlighted. It can be concluded that the proposed FDD-EA performs significantly better than the compared algorithms on 11 out of 15 instances. This indicates that FDD-EA performs competitively in comparison with the state-of-the-art centralized data-driven evolutionary algorithms.

To take a closer look at the performance of the compared algorithms, their convergence profiles on the 10D, 20D and 30D test functions are presented in Figs. 4, 5, 6, 7 and 8, respectively. From these results, we can observe that FDD-EA clearly outperforms all other compared algorithms on the Rosenbrock, Ackley and Rastrigin functions. On the Ellipsoid function, CAL-SAPSO and FDD-EA outperform the other algorithms under comparison, although FDD-EA performs worse than CAL-SAPSO on the 10D instance. On the Griewank function, FDD-EA is outperformed by CAL-SAPSO or SHPSO; nevertheless, it still clearly outperforms SSLPSO and GPEME.

To summarize, FDD-EA performs the best on 11 out of 15 instances in the IID data environment, and it consistently converges fast on all test instances. From these results we can conclude that FDD-EA and the ensemble-based method CAL-SAPSO are more competitive than the other compared algorithms. However, it is noticed that FDD-EA quickly gets stuck in a local optimum in the later search stage.

We also compare FDD-EA with SA-COSO on high-dimensional problems of up to 100 decision variables, because the online centralized SAEAs compared above are designed for optimization problems with up to 30 decision variables. Specifically, we compare FDD-EA with the surrogate-assisted cooperative swarm optimization algorithm (SA-COSO) [10], designed for high-dimensional expensive optimization, on the five test instances of 50D and 100D, respectively. The experimental results are presented in Table V and Figs. 16, 17, 18, 19 and 20 of the supplementary material in Appendix A. From these results, we can see that FDD-EA outperforms SA-COSO on all instances except for the 100D Griewank function.

C. Noisy Fitness Evaluations
In data-driven optimization, data might be collected from a production process, which is subject to noise [75] [76]. To study the robustness of the proposed algorithm against noisy fitness evaluations, we add noise to the fitness function as follows:

    f_k(x) = f(x) + \alpha \xi, \quad \forall k \in N,    (17)

where α ∈ [0, 1] is a constant that determines the magnitude of the noise, ξ is noise sampled from the standard Gaussian distribution N(0, 1), and f(x) denotes the benchmark problem.

We examine the performance of FDD-EA on the above-mentioned benchmark problems, with all settings being the same as in Section IV-B, and present the results in Fig. 9. From these results, we find that the performance of FDD-EA on all the tested instances except for the 20D Ackley and 30D Griewank functions does not seriously deteriorate when the noise level changes from 0 to 1. Thus, we conclude that FDD-EA is fairly insensitive to noise in the fitness evaluations.

D. Non-IID
Here, we examine the performance of FDD-EA on non-IID data distributions, considering the fact that different clients may have different operating conditions and specifications, which leads to the situation where some regions of the decision space become infeasible on some local clients. To take this situation into account, we introduce an infeasible domain dom(k, τ) for the k-th client to determine whether a candidate point x_p can be sampled by the k-th client:

    dom(k, \tau) = \left[ x_{lb} + (k - 1) g_k,\; \min\{ x_{lb} + (k + \tau - 1) g_k,\; x_{ub} \} \right],    (18)
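The feasibility check implied by Equation (18) can be sketched as follows. This is a hedged illustration: the step size g_k and the exact index offsets are assumptions, and the intent shown is simply that client k cannot sample points falling into a sliding window of width τ·g_k inside [x_lb, x_ub].

```python
def infeasible_interval(k, tau, x_lb, x_ub, g_k):
    """Interval dom(k, tau) that client k cannot sample, per Eq. (18)."""
    lo = x_lb + (k - 1) * g_k
    hi = min(x_lb + (k + tau - 1) * g_k, x_ub)   # clipped at the upper bound
    return lo, hi

def client_can_sample(x, k, tau, x_lb, x_ub, g_k):
    """True if the scalar coordinate x lies outside client k's window."""
    lo, hi = infeasible_interval(k, tau, x_lb, x_ub, g_k)
    return not (lo <= x <= hi)
```

Because the window slides with the client index k, different clients miss different regions of the decision space, which is what produces the non-IID training data.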
TABLE III
AVERAGE BEST FITNESS VALUES (SHOWN AS AVG ± STD) OBTAINED BY FDD-EA, CAL-SAPSO, GPEME, SHPSO AND SSLPSO. THE AVERAGE RANKS ARE OBTAINED ACCORDING TO THE FRIEDMAN TEST, WITH THE p-VALUES ADJUSTED ACCORDING TO THE HOMMEL PROCEDURE AT A SIGNIFICANCE LEVEL OF 0.05. FDD-EA IS THE CONTROL METHOD.

Problem    d    FDD-EA    CAL-SAPSO    GPEME    SHPSO    SSLPSO
[Numeric entries illegible in the source; the only legible value is 6.17e-01 for FDD-EA on the 10D Ellipsoid.]

[Fig. 4: log average current best fitness vs. real fitness evaluations on the Ellipsoid function, for FDD-EA, CAL-SAPSO and the other compared algorithms.]