Multi-Space Evolutionary Search for Large-Scale Optimization
Liang Feng, Qingxia Shang, Yaqing Hou, Kay Chen Tan, Yew-Soon Ong
L. Feng and Q. Shang are with the College of Computer Science, Chongqing University, China. E-mail: {liangf, qxshang}@cqu.edu.cn. Y. Hou is with the College of Computer Science and Technology, Dalian University of Technology, China. E-mail: houyq@dlut.edu.cn. K. C. Tan is with the Department of Computer Science, City University of Hong Kong, and the City University of Hong Kong Shenzhen Research Institute. E-mail: kaytan@cityu.edu.hk. Y.-S. Ong is with the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore. E-mail: asysong@ntu.edu.sg.
Abstract:
In recent years, to improve the evolutionary algorithms used to solve optimization problems involving a large number of decision variables, many attempts have been made to simplify the problem solution space of a given problem for the evolutionary search. In the literature, the existing approaches can generally be categorized as decomposition-based methods and dimension-reduction-based methods. The former decomposes a large-scale problem into several smaller subproblems, while the latter transforms the original high-dimensional solution space into a low-dimensional space. However, it is worth noting that a given large-scale optimization problem may not always be decomposable, and it is also difficult to guarantee that the global optimum of the original problem is preserved in the reduced low-dimensional problem space. This paper thus proposes a new search paradigm, namely the multi-space evolutionary search, to enhance the existing evolutionary search methods for solving large-scale optimization problems. In contrast to existing approaches that perform an evolutionary search in a single search space, the proposed paradigm is designed to conduct a search in multiple solution spaces that are derived from the given problem, each possessing a unique landscape. The proposed paradigm makes no assumptions about the large-scale optimization problem of interest, such as that the problem is decomposable or that a certain relationship exists among the decision variables. To verify the efficacy of the proposed paradigm, comprehensive empirical studies in comparison to four state-of-the-art algorithms were conducted using the CEC2013 large-scale benchmark problems.
Index Terms:
Large-scale Optimization, Evolutionary Search, Multi-Space Optimization, Knowledge Transfer
I. Introduction
An evolutionary algorithm (EA) is a stochastic optimization method that takes inspiration from the theory of natural biological evolution. It starts with a population of individuals that undergo reproduction, including crossover and mutation, to produce offspring. This procedure is executed iteratively and terminates when a predefined condition is satisfied. In contrast to traditional optimization approaches such as calculus-based and enumerative strategies, an EA employs flexible procedures and is robust to changing problem circumstances. In recent decades, EAs have attracted significant research attention and have been successfully applied in many complex applications, such as scheduling in logistics [1], [2], image processing [3], [4], and the architecture optimization of deep neural networks [5], [6].

Today, because of the exponential growth of the volume of data in big data applications, large-scale optimization problems (i.e., optimization problems with a large number of decision variables) have become ubiquitous in the real world [7], [8], [9], [10], [11]. Because an increase in the number of decision variables not only leads to an exponential growth of the problem solution space but also raises the computational cost of the solution search and evaluation process, the performance of an EA deteriorates significantly on large-scale optimization problems [12], [13], [14], [15]. To improve the scalability of EAs for problems with large-scale dimensionality, much research effort has been devoted to simplifying the search space of a large-scale optimization problem [16], [17], [18], [19], [20], [21]. According to a recent survey [22], the existing approaches can generally be categorized into two groups: decomposition-based approaches and dimension-reduction-based approaches.
Specifically, decomposition-based methods follow the principle of divide-and-conquer and decompose a problem into several relatively small subcomponents that are optimized concurrently. In contrast, dimension-reduction-based methods reduce the number of decision variables by selecting a set of principal variables or by transforming the high-dimensional space into a space with fewer dimensions. However, despite the successes enjoyed by these two classes of methods, it is worth noting that because decomposition-based methods rely on the accurate detection of the relationships between decision variables, they may fail on large-scale optimization problems that possess complex variable interactions or are not decomposable at all. Moreover, reducing the dimensionality of the search space may discard information that is important for optimization, and it is difficult to guarantee that the global optimum or high-quality solutions are preserved in the reduced solution space.

Evolutionary multitasking (EMT) is a recently emerging research topic in the field of evolutionary computation [23], [24], [25], [26]. In contrast to a traditional single-task evolutionary search, EMT conducts an evolutionary search on multiple tasks, each corresponding to a particular optimization problem. It aims to improve the convergence characteristics of an evolutionary search across multiple optimization problems at once by seamlessly transferring knowledge among the tasks. In the literature, the efficacy of EMT has been verified on sets of continuous tasks, discrete tasks, and mixtures of continuous and combinatorial tasks [27], [28], [29], [30], [31]. Inspired by this, in the context of large-scale optimization, besides replacing the original problem space with a simplified space, the advantage of a simplified search space could also be obtained by configuring the constructed solution space as an auxiliary task of the original problem under EMT.
In this manner, because the solution space of the original problem serves as one task in EMT, no assumptions or requirements are imposed on the relationships between decision variables, and the existence of the global optimum and high-quality solutions is guaranteed. Keeping the above in mind, this paper proposes a new evolutionary search paradigm, namely the multi-space evolutionary search, for solving large-scale optimization problems. In particular, for a given large-scale optimization problem, besides the original problem space, multiple simplified solution spaces, each possessing a unique landscape, are derived for the given problem. Further, instead of conducting an evolutionary search on the given problem space alone, evolutionary searches are concurrently performed on both the original and the constructed simplified spaces of the given problem. By transferring useful traits across the different spaces via EMT while the search progresses online, an enhanced problem-solving process can be obtained. To evaluate the efficacy of the proposed paradigm for large-scale optimization, comprehensive empirical studies on the CEC2013 large-scale optimization benchmarks were conducted against four state-of-the-art representative methods for large-scale optimization, and the results were analyzed.

The remainder of this paper is organized as follows. Section II begins with a review of the literature on the existing decomposition- and dimension-reduction-based approaches for solving large-scale optimization problems. A brief introduction of EMT, which serves as the optimization engine of the proposed multi-space evolutionary search, is also presented in Section II. Further, Section III provides the details of the proposed multi-space evolutionary search for solving large-scale optimization problems. Section IV discusses the comprehensive empirical studies that were conducted on the CEC2013 large-scale optimization benchmarks using four state-of-the-art algorithms for large-scale optimization.
Finally, the concluding remarks of this paper are presented in section V.
II. Preliminary
This section first presents a review of the literature on the existing decomposition- and dimension-reduction-based approaches for solving large-scale optimization problems. Next, a brief introduction of the EMT paradigm is provided.
A. Existing Approaches for Simplifying the Search Space of Large-Scale Optimization Problems
According to recent surveys in the literature [8], [22], [32], the existing approaches to simplifying the search space of a given large-scale optimization problem can generally be categorized as decomposition-based approaches and dimension-reduction-based methods. In particular, the decomposition-based approaches are also known as divide-and-conquer approaches in evolutionary computation and mainly involve cooperative coevolution (CC) algorithms, which decompose a given large-scale optimization problem into several smaller subproblems and then optimize each subproblem separately using different EAs. Generally, decomposition-based approaches consist of three major steps. First, by considering the structure of the underlying decision variable interactions, the original $D$-dimensional problem is exclusively divided into $m$ sub-problems of dimensionality $d_i$, where $\sum_{i=1}^{m} d_i = D$. Next, each subproblem is solved by a particular EA. Finally, the $d_i$-dimensional solutions to these subproblems are merged to form the $D$-dimensional complete solution for the original problem. It is straightforward to see that the decomposition of the problem is essential to the performance of CC algorithms, and that an inappropriate decomposition of the decision variables may even lead to deteriorated optimization performance [32], [33]. Particular examples in this category include strategies that randomly divide the variables into groups without taking the variable interaction into consideration [34], [35], [36], approaches that make use of evolutionary information to learn variable interdependency and then divide variables into groups [37], [38], and static decomposition methods that are performed before conducting an evolutionary search based on the detection of variable interaction [33], [39], [40], [41].
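As an illustration of the divide-and-conquer principle above, the following sketch shows random grouping with subcomponent evaluation in a shared context vector. The helper names and the toy sphere function are illustrative assumptions, not the cited algorithms:

```python
import numpy as np

def random_grouping(D, m):
    """Randomly partition D variable indices into m groups (random grouping CC)."""
    idx = np.random.permutation(D)
    return np.array_split(idx, m)

def cc_evaluate(subsolution, group, context, f):
    """Evaluate a subcomponent by plugging it into a full context vector."""
    x = context.copy()
    x[group] = subsolution
    return f(x)

# Example: a separable sphere function with D = 8 split into m = 4 groups.
D, m = 8, 4
sphere = lambda x: float(np.sum(x ** 2))
context = np.ones(D)              # current best full solution
groups = random_grouping(D, m)    # the union of groups covers all D variables
for g in groups:
    # Each subproblem would be optimized by its own EA; here we simply
    # evaluate the zero subsolution within the shared context.
    fit = cc_evaluate(np.zeros(len(g)), g, context, sphere)
```

In an actual CC algorithm, each group would be evolved in turn while the context holds the best-known values of all other variables.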
On the other hand, instead of decomposing the solution space of the given problem, a dimension-reduction-based approach attempts to create a new solution space with lower dimensionality from the original solution space. The evolutionary search is then performed in the newly created low-dimensional space, and the obtained solution is mapped back to the original space for evaluation. Generally, the existing approaches perform dimension reduction either by selecting a subset of the original decision variables or by transforming the original solution space into a low-dimensional solution space. As can be observed, the preservation of important information for guiding the search toward high-quality solutions in the reduced solution space plays a key role in determining the performance of a dimension-reduction-based approach. Examples belonging to this class include the random matrix projection-based estimation of distribution algorithm [42], the random embedding-based approach for large-scale optimization problems with low effective dimensions [43], and multi-agent system assisted embedding for large-scale optimization [44]. Although the above methods have shown good performance in solving large-scale optimization problems, there are two main drawbacks with these two categories of methods. First, because decomposition-based methods rely heavily on the accurate detection of decision variable interactions, these methods may fail on large-scale optimization problems with complex variable interactions or that are not decomposable. Second, although dimension reduction may not rely on variable interaction, it is difficult to guarantee that the global optimum or high-quality solutions are preserved in the reduced space.
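To make the search-in-a-reduced-space idea concrete, here is a minimal sketch of random-embedding evaluation in the spirit of [43]: the search operates on a low-dimensional candidate y, which is mapped back to the original D-dimensional space for evaluation. The embedding matrix, the bounds, and the toy Rastrigin function are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
D, d = 1000, 10                   # original and embedded dimensionality
A = rng.standard_normal((D, d))   # random embedding matrix (illustrative)

def evaluate_embedded(y, f, bounds=(-5.0, 5.0)):
    """Map a low-dimensional candidate back to the original space and evaluate."""
    x = A @ y
    x = np.clip(x, *bounds)       # keep the mapped point inside the box constraints
    return f(x)

# Toy Rastrigin objective in the original 1000-dimensional space.
rastrigin = lambda x: float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))
y = np.zeros(d)                   # a low-dimensional candidate
val = evaluate_embedded(y, rastrigin)
```

The evolutionary operators would act only on the d-dimensional y vectors, while all fitness evaluations occur in the original space.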
However, because a simplified solution space can provide useful information for efficient and effective problem solving, it is desirable to develop new search paradigms for large-scale optimization that can leverage the advantages of simplified solution spaces without the limitations discussed above.

B. Evolutionary Multitasking
Consider a situation where $K$ optimization tasks are to be performed. EMT has been defined in the literature as an optimization paradigm that solves multiple optimization tasks at the same time, with the aim of improving the problem-solving performance across tasks by seamlessly transferring knowledge between them [23]. In particular, as depicted in Fig. 1, let $T_i$ be a global optimization task on a compact subset $X_i \subseteq \mathbb{R}^{D_i}$ with objective function $f_i : X_i \to \mathbb{R}$ and optimum $x_i^* = \operatorname{argmin}_{x_i \in X_i} f_i(x_i)$. The input of EMT is a set of optimization tasks $IS = \{T_1, \dots, T_i, \dots, T_K\}$, where $K$ denotes the number of tasks. Please note that each task $T_i$ may possess a unique dimensionality $D_i$. The output of EMT is then given by the set of optimized solutions $OS = \{x_1^*, \dots, x_i^*, \dots, x_K^*\}$.

Fig. 1. Illustration of EMT.
In contrast to the traditional single-task optimization, EMT involves automatically exploiting and transferring the latent synergies between distinct (but possibly similar) optimization problems while the optimization progresses online, which could eventually lead to enhanced problem-solving on all the tasks. For large-scale optimization, if each task under EMT corresponds to a unique solution space for the given optimization problem, the useful traits found in different spaces could be transferred across these spaces via EMT, producing a more efficient and effective evolutionary search process. Inspired by this, a multi-space evolutionary search paradigm is proposed for large-scale optimization, which will be discussed in detail in the next section.
III. Proposed Multi-space Evolutionary Search for Large-scale Optimization
This section presents the details of the proposed multi-space evolutionary search for large-scale optimization. In particular, the outline of the proposed paradigm is presented in Fig. 2. For a given problem of interest, besides the original problem space, a simplified problem space for the given problem is first created. Next, the mapping between these two problem spaces is learned, which will be used for knowledge transfer across spaces during the evolutionary search process via EMT. Further, by treating these two problem spaces as two tasks, evolutionary searches can be conducted on the tasks concurrently. As can be observed in the figure, knowledge transfer will be performed across tasks while the evolutionary search progresses online (see the green rectangle in Fig. 2). In this way, the useful traits found in the simplified problem space can be leveraged to facilitate the search in the original space, while the high-quality solutions found in the original problem space may also guide the search direction in the simplified problem space toward promising areas. Furthermore, to explore the usefulness of diverse auxiliary tasks, the simplified problem space will be reconstructed periodically using the solutions found during the evolutionary search process (see the yellow rectangle in Fig. 2). Finally, the EMT process is terminated when certain stopping criteria are satisfied. The following sections present details on the construction of the simplified problem space, the learning of mappings across problem spaces, knowledge transfer across problem spaces, and the reconstruction of the simplified problem space.

A. Construction of the Simplified Problem Space
Because the simplified problem space serves as an auxiliary task of a given problem of interest, there are generally no particular constraints on the construction of the simplified space. Therefore, the existing approaches proposed in the literature, such as random embedding [43], dimension reduction [45], or even search space decomposition [34], [38], could be employed for constructing the space. In this paper, for simplicity, the popular and well-known dimension-reduction approach, principal component analysis (PCA) [46], is considered for constructing a simplified problem space $S_s$ in the proposed multi-space evolutionary search paradigm. In particular, to generate the initial population $POP_s$ for the evolutionary search in $S_s$, an initial population $POP$ is first sampled in the original problem space $S$, as is routine [47]. Next, the obtained $POP$ in $S$ undergoes PCA with dimension $d_s$ to generate $POP_s$ for the evolutionary search in $S_s$.

Fig. 2. Workflow of the proposed multi-space evolutionary search for large-scale optimization.
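The PCA-based construction described above can be sketched as follows; this is a toy illustration with made-up sizes (the benchmark problems in this paper use dimensionality 1000 with a simplified dimension of 600), using a plain SVD rather than any particular PCA library:

```python
import numpy as np

# Sketch: sample an initial population POP in the original space, then
# project it onto the leading d_s principal components to obtain POP_s.
rng = np.random.default_rng(0)
D, d_s, NP = 50, 10, 100                    # toy sizes for illustration
POP = rng.uniform(-100.0, 100.0, (NP, D))   # initial population sampled in S

mean = POP.mean(axis=0)
# SVD of the centered population gives the principal axes as rows of Vt.
_, _, Vt = np.linalg.svd(POP - mean, full_matrices=False)
components = Vt[:d_s]                       # top d_s principal directions
POP_s = (POP - mean) @ components.T         # population in the simplified space S_s
```

Each row of `POP_s` is the d_s-dimensional image of the corresponding row of `POP`, which is exactly the paired labeled data used in the next subsection.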
B. Learning of Mapping across Problem Spaces
Once the simplified problem space has been constructed, the mappings between the simplified problem space $S_s$ and the original problem space $S$ have to be learned, to allow the useful traits found in each space to be transferred across spaces toward efficient and effective problem solving for large-scale optimization. In this paper, the mappings across $S_s$ and $S$ are learned using labeled data from each space via supervised learning. In particular, as discussed in Section III-A, $POP_s$ is generated by performing a PCA of $POP$. Therefore, each solution in $POP_s$ has a unique corresponding solution in $POP$. This correspondence thus provides the label information connecting spaces $S_s$ and $S$. Taking this cue, by configuring $T$ and $X$ as $POP$ and $POP_s$, respectively, the mapping $M_{S_s \to S} : \mathbb{R}^{d_s} \to \mathbb{R}^{d}$ (where $d$ is the dimension of the original problem space) from the simplified space $S_s$ to the original problem space $S$ can then be obtained by minimizing the squared reconstruction loss:

$$\mathcal{L}(M) = \frac{1}{2n} \sum_{i=1}^{n} \left\| t_i - M \times x_i \right\|^2 \qquad (1)$$

where $n$ denotes the number of solutions in $X$ and $T$, $x_i$ is the $i$-th solution in $X$, and $t_i$ is the solution in $T$ that corresponds to $x_i$. Further, to simplify the notation, it is assumed that a constant feature is added to the input, that is, $x_i = [x_i; 1]$, and an appropriate bias is incorporated within the mapping, $M = [W, b]$. The loss in Eq. (1) then reduces to the matrix form:

$$\mathcal{L}(M) = \frac{1}{2n} \operatorname{tr}\!\left[ (T - M \times X)^{\mathsf{T}} (T - M \times X) \right] \qquad (2)$$

where $\operatorname{tr}(\cdot)$ and $(\cdot)^{\mathsf{T}}$ denote the trace and transpose operations of a matrix, respectively. The solution of Eq. (2) can be expressed as the well-known closed-form solution for ordinary least squares [48]:

$$M = (T \times X^{\mathsf{T}})(X \times X^{\mathsf{T}})^{-1} \qquad (3)$$

Finally, it is straightforward to see that the mapping $M_{S \to S_s}$ from space $S$ to $S_s$ can also be learned via Eq. (3) by configuring $T$ and $X$ as $POP_s$ and $POP$, respectively.
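The closed-form solution of Eq. (3) is ordinary least squares on the paired populations. The following sketch, with synthetic data standing in for the paired populations, appends the constant feature and recovers a known linear mapping plus bias:

```python
import numpy as np

# Sketch of Eq. (3): with solutions stored as columns and a constant 1
# appended for the bias, M = (T X^T)(X X^T)^{-1}. Synthetic data only.
rng = np.random.default_rng(1)
n, d_s, d = 200, 4, 6
Y = rng.standard_normal((d_s, n))        # solutions in the simplified space (columns)
W_true = rng.standard_normal((d, d_s))   # ground-truth linear map for this demo
T = W_true @ Y + 0.5                     # corresponding targets in the original space

X = np.vstack([Y, np.ones((1, n))])      # append the constant feature x_i = [x_i; 1]
M = (T @ X.T) @ np.linalg.inv(X @ X.T)   # closed-form least squares, Eq. (3)
W, b = M[:, :-1], M[:, -1]               # recovered mapping and bias, M = [W, b]
```

Because the synthetic targets are exactly linear with a constant offset of 0.5, the closed form recovers both `W_true` and the bias up to numerical precision.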
C. Knowledge Transfer across Problem Spaces
With the learned mappings $M_{S_s \to S}$ and $M_{S \to S_s}$ across the simplified and original problem spaces, a knowledge transfer across these two spaces can easily be conducted by the simple operation of matrix multiplication. In particular, suppose a knowledge transfer from $S_s$ to $S$ occurs at generation $G$. First, the $Q$ best solutions in terms of fitness value are selected from the population of the simplified problem space, denoted by $P_s$, which is a $Q \times d_s$ matrix. Next, the transferred solutions $TS_{S_s \to S}$ are obtained by $M_{S_s \to S} \times P_s$. Finally, the solutions in $TS_{S_s \to S}$ are injected into the population of the original problem space to undergo natural selection for the next generation. On the other hand, at generation $G$, knowledge transfer also occurs from the original problem space to the simplified problem space. In particular, the $Q$ best solutions in terms of fitness value are first selected from the population of the original problem space, denoted by $P_s'$, which is a $Q \times d$ matrix. Subsequently, the transferred solutions $TS_{S \to S_s}$ can be obtained by $M_{S \to S_s} \times P_s'$. Further, the solutions in $TS_{S \to S_s}$ are inserted into the population of the simplified problem space via natural selection. Moreover, after the knowledge transfer process, the updated population of the simplified problem space is further transformed back to the original problem space and archived in $A_s$; the repeated solutions in $A_s$ are removed. As can be observed, $A_s$ preserves the search traces in the present simplified problem space, which will be used for the reconstruction of a new simplified space, as discussed in detail in the next section.

Algorithm 1: Pseudo code of the population re-initialization in $S_s$
Input: $POP_s$: the population in the simplified problem space before the reconstruction; $A_s$: the archive in the original problem space
Output: $POP_s$: the re-initialized population in the new simplified problem space
1: Begin
2:   Transform $POP_s$ back to the original problem space using $M_{S_s \to S}$, denoted by $POP_o'$;
3:   Perform the reconstruction of the simplified problem space using PCA with $A_s$;
4:   Learn the new $M_{S_s \to S}$ and $M_{S \to S_s}$ across problem spaces $S$ and $S_s$, as discussed in Section III-B;
5:   Re-initialize the population in the new simplified problem space by $POP_s = M_{S \to S_s} \times POP_o'$;
6: End
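The knowledge-transfer step in this section amounts to selecting the best rows of a population matrix, appending the constant feature, and mapping them through the learned linear transform. A toy sketch with illustrative names and random stand-in data:

```python
import numpy as np

# Sketch of one transfer direction (simplified space -> original space).
# M_st is a learned mapping including a bias column; data are stand-ins.
rng = np.random.default_rng(2)
d_s, d, NP, Q = 4, 6, 20, 4
M_st = rng.standard_normal((d, d_s + 1))   # learned mapping incl. bias column

pop_s = rng.standard_normal((NP, d_s))     # population in the simplified space (rows)
fit_s = rng.random(NP)                     # fitness values (minimization)

best = pop_s[np.argsort(fit_s)[:Q]]        # the Q best solutions, a Q x d_s matrix
X = np.hstack([best, np.ones((Q, 1))])     # append the constant feature
transferred = X @ M_st.T                   # Q x d solutions to inject into the
                                           # original population for selection
```

The reverse transfer is symmetric: select the Q best rows of the original population and apply the mapping learned in the other direction.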
D. Reconstruction of the Simplified Space
To explore the usefulness of diverse auxiliary tasks for large-scale optimization, instead of using one fixed simplified problem space, the proposal is made to build multiple simplified problem spaces periodically while the evolutionary search progresses online. In particular, if the reconstruction of the simplified problem space occurs in generation $G$, the PCA used in Section III-A is considered here again to reconstruct a simplified problem space from a new set of solutions in the original problem space. Further, in order to preserve the useful traits found in the last simplified space, the solutions in the archive $A_s$ are used as the new set of solutions and subjected to PCA to construct a new $S_s$. Subsequently, the solutions in $A_s$ and the corresponding mapped solutions in $S_s$ are used to learn the mappings $M_{S_s \to S}$ and $M_{S \to S_s}$ across problem spaces $S$ and $S_s$. Finally, the population of the simplified problem space is also re-initialized in the new $S_s$, as shown in detail in Alg. 1. In this study, for simplicity, the dimension of $S_s$ was kept unchanged at $d_s$.

Algorithm 2: Pseudo code of the proposed multi-space evolutionary search for large-scale optimization
Input: $S$: the given problem space of interest; $d_s$: dimensionality of the simplified problem space; $G_t$: interval of knowledge transfer across problem spaces; $G_s$: interval of simplified space reconstruction
Output: $s^*$: optimized solution of the given problem
1: Begin
2:   Construct the simplified problem space $S_s$ with dimension $d_s$ for the given problem space $S$;
3:   Learn the mappings $M_{S_s \to S}$ and $M_{S \to S_s}$ across $S_s$ and $S$;
4:   $gen = 1$; $A_s = \emptyset$;
5:   while the termination condition is not satisfied do
6:     $gen = gen + 1$;
7:     Perform reproduction operators (e.g., crossover and mutation) for $S_s$ and $S$, respectively;
8:     Transform the solutions in $S_s$ back to $S$;
9:     Perform natural selection for both $S_s$ and $S$ in problem space $S$;
10:    if $mod(gen, G_t) = 0$ then
11:      Perform knowledge transfer across $S_s$ and $S$;
12:      Transform the population of $S_s$ back to $S$, and archive the population in $A_s$;
13:      Remove the repeated solutions in $A_s$;
14:    end if
15:    if $mod(gen, G_s) = 0$ then
16:      Reconstruct the new simplified problem space $S_s$ with dimension $d_s$ using the solutions in $A_s$;
17:      Learn the new mappings $M_{S_s \to S}$ and $M_{S \to S_s}$ across the newly constructed $S_s$ and $S$;
18:      Re-initialize the population in $S_s$ using the newly learned $M_{S \to S_s}$;
19:    end if
20:  end while
21: End
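The overall control flow of this section, including the periodic transfer and reconstruction intervals, can be sketched end-to-end as follows. This is a toy illustration only: a simple sphere objective, Gaussian perturbation standing in for reproduction, worst-replacement standing in for natural selection, and PCA back-projection as the space mapping. It is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
D, d_s, NP, G_t, G_s, GENS = 30, 6, 40, 1, 10, 50   # toy sizes and intervals
sphere = lambda X: np.sum(X ** 2, axis=1)           # stand-in objective

def pca_map(A, d_s):
    """Build forward/backward maps between the original and PCA spaces."""
    mean = A.mean(axis=0)
    _, _, Vt = np.linalg.svd(A - mean, full_matrices=False)
    C = Vt[:d_s]
    to_s = lambda X: (X - mean) @ C.T               # S  -> S_s
    to_o = lambda Y: Y @ C + mean                   # S_s -> S (back-projection)
    return to_s, to_o

pop = rng.uniform(-5, 5, (NP, D))                   # population in S
to_s, to_o = pca_map(pop, d_s)
pop_s = to_s(pop)                                   # population in S_s
archive = to_o(pop_s)                               # archive A_s in S

for gen in range(1, GENS + 1):
    pop = pop + 0.1 * rng.standard_normal(pop.shape)        # "reproduction"
    pop_s = pop_s + 0.1 * rng.standard_normal(pop_s.shape)
    if gen % G_t == 0:                                      # knowledge transfer
        q = max(1, NP // 5)
        best_s = pop_s[np.argsort(sphere(to_o(pop_s)))[:q]]
        pop[np.argsort(sphere(pop))[-q:]] = to_o(best_s)    # inject into S
        archive = np.vstack([archive, to_o(pop_s)])[-5 * NP:]
    if gen % G_s == 0:                                      # space reconstruction
        pop_o = to_o(pop_s)                 # back to S with the old mapping
        to_s, to_o = pca_map(archive, d_s)  # rebuild S_s from the archive
        pop_s = to_s(pop_o)                 # re-initialize in the new S_s

best = pop[np.argmin(sphere(pop))]
```

The two `if` blocks correspond to lines 10-14 and 15-19 of Alg. 2, respectively; a real instantiation would plug in SaNSDE, DE, or a PSO variant in place of the Gaussian perturbation.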
E. Summary of Proposed Multi-Space Evolutionary Search
A summary of the proposed multi-space evolutionary search for large-scale optimization is presented in Alg. 2. As can be observed, while the EMT progresses online, the knowledge transfer across $S_s$ and $S$ occurs every $G_t$ generations (see lines 10-13 in Alg. 2), and the reconstruction of a new simplified problem space $S_s$ is performed every $G_s$ generations (see lines 15-18 in Alg. 2). Further, the archive $A_s$ on lines 12-13 of Alg. 2 is used to store the non-repeating search traces of the simplified problem space, with the solution representation in the original problem space. Without loss of generality, the volume of $A_s$ can be configured as needed. However, in this paper, for simplicity, the volume of $A_s$ is configured as 5*NP, where NP denotes the population size of the evolutionary search. Once the search traces exceed the volume of $A_s$, only the latest search traces are archived.

Table I: Properties of the CEC2013 Benchmark Functions

Separability | Function | Modality | Search Space | Base Function
Fully Separable Functions | F1 | unimodal | [-100,100]^D | Elliptic
 | F2 | multimodal | [-5,5]^D | Rastrigin
 | F3 | multimodal | [-32,32]^D | Ackley
Partially Additive Separable Functions I | F4 | unimodal | [-100,100]^D | Elliptic
 | F5 | multimodal | [-5,5]^D | Rastrigin
 | F6 | multimodal | [-32,32]^D | Ackley
 | F7 | multimodal | [-100,100]^D | Schwefel
Partially Additive Separable Functions II | F8 | unimodal | [-100,100]^D | Elliptic
 | F9 | multimodal | [-5,5]^D | Rastrigin
 | F10 | multimodal | [-32,32]^D | Ackley
 | F11 | unimodal | [-100,100]^D | Schwefel
Overlapping Functions | F12 | multimodal | [-100,100]^D | Rosenbrock
 | F13 | unimodal | [-100,100]^D | Schwefel
 | F14 | unimodal | [-100,100]^D | Schwefel
Fully Non-separable Functions | F15 | unimodal | [-100,100]^D | Schwefel

IV. Empirical Study
This section discusses the results of comprehensive empirical studies that were conducted to evaluate the performance of the proposed multi-space evolutionary search paradigm on commonly used large-scale optimization benchmarks, in comparison to several state-of-the-art algorithms proposed in the literature.

A. Experimental Setup
In this study, the commonly used CEC2013 large-scale optimization benchmark [49], which contains 15 functions with diverse properties, was used to investigate the performance of the proposed multi-space evolutionary search. As summarized in Table I, according to [49], the benchmark consists of both unimodal and multimodal minimization functions, which can generally be categorized into the following five classes: 1) fully separable functions, 2) partially additive separable functions I, 3) partially additive separable functions II, 4) overlapping functions, and 5) fully non-separable functions. Further, except for functions F13 and F14, all the functions have a dimensionality of 1000. Because of the overlapping property, functions F13 and F14 both have 905 decision variables. For more details on the CEC2013 large-scale optimization benchmark, interested readers can refer to [49]. Next, to verify the efficacy of the proposed multi-space evolutionary search (referred to as MSES hereafter) for large-scale optimization, four state-of-the-art methods for addressing large-scale optimization, including decomposition-based cooperative coevolution and non-decomposition-based approaches, were considered as the baseline algorithms for comparison. In particular, the cooperative coevolution approaches included the recursive decomposition method proposed by Sun et al. in 2018 (called RDG) [41], and an improved variant of the differential grouping algorithm introduced by Omidvar et al., which is called DG2 [50]. The non-decomposition-based approaches included the level-based learning swarm optimizer proposed by Yang et al. in 2018 (called DLLSO) [51], and the random embedding-based method proposed by Hou et al. in 2019 (called MeMAO) [44]. Further, it should be noted that these compared algorithms use different evolutionary search methods as the basic optimizer.
For example, RDG and DG2 employed self-adaptive differential evolution with neighborhood search (SaNSDE) [41], [50] as the optimizer, while MeMAO considered the classical differential evolution (DE) method as the optimizer [44]. Rather than using differential evolution, DLLSO used the particle swarm optimizer as the basic search method [51]. For a fair comparison with the different baseline algorithms, the optimizer for each space in the proposed MSES was kept consistent with the optimizer used in the compared algorithm. Lastly, the parameter and operator settings of all the compared algorithms and the proposed MSES were kept the same as those in [41], [50], [51], and [44], which are summarized as follows:
- Population size: NP = 50, 100, and 500 for optimizers SaNSDE, DE, and DLLSO, respectively.
- Number of independent runs: runs = 25 for all the compared algorithms.
- Maximum number of fitness evaluations: max_FEs = 3E+06.
- Number of solutions to be transferred across spaces in MSES: Q = 0.2 * NP.
- Interval of knowledge transfer across problem spaces: G_t = 1.
- Interval of simplified space reconstruction: G_s = 10.
- Dimensionality of the simplified problem space: d_s = 600.
- Size of $A_s$: 5 * NP.

B. Results and Discussion
This section presents and discusses the performance of the proposed MSES in comparison to those of the existing state-of-the-art approaches on the CEC2013 large-scale benchmark functions in terms of the solution quality and search efficiency.
1) Solution Quality: Table II tabulates the averaged objective values and standard deviations obtained by all the compared algorithms over 25 independent runs. In particular, based on the evolutionary solver employed for the search (e.g., SaNSDE, PSO, and DE), the comparison results are divided into three groups, with each group sharing the same evolutionary solver. The best performance in each comparison group is highlighted in bold in the table. Further, in order to obtain a statistical comparison, a Wilcoxon rank-sum test with a 95% confidence level was conducted on the experimental results, where "+", "-", and "≈" indicate that the algorithm is statistically significantly better than, significantly worse than, or similar to the proposed MSES, respectively. As can be observed in the table, in all three comparison groups, when using different evolutionary search methods as the optimizer, the proposed MSES obtained a superior solution quality in terms of the averaged objective value on most of the problems compared to the other algorithms. In the comparison groups using SaNSDE and PSO as the optimizers, the proposed approaches, that is, MSES-SaNSDE and MSES-DLLSO, lost to the compared algorithms on large-scale benchmarks F1 and F2. Table I shows that F1 and F2 are fully separable functions. Moreover, F1 is based on the unimodal "Elliptic" function, and the search space of F2 is only within the range of [-5, 5], which indicates the simplicity of the search spaces for these two functions. However, on the other, more complex large-scale benchmarks, such as the partially additive separable, overlapping, and fully non-separable problems, where more appropriate guidance is required for an effective search, the proposed MSES-SaNSDE and MSES-DLLSO achieved superior and competitive averaged objective values in contrast to DG2/RDG and DLLSO, respectively. On benchmarks F11, F13, and F14, only the proposed method was able to consistently find solutions with objective values of approximately e+07 in both of these comparison groups. Over the 15 large-scale benchmarks, the proposed MSES-SaNSDE and MSES-DLLSO achieved significantly better averaged objective values on 13 and 9 problems in contrast to DG2/RDG and DLLSO, respectively. Furthermore, in the comparison group that used DE as the optimizer, the proposed MSES-DE obtained superior or competitive averaged objective values on all the large-scale benchmarks compared to MeMAO. In particular, on benchmarks such as F4 and F8, MSES-DE achieved improvements of orders of magnitude in contrast to MeMAO. The objective values achieved on these benchmarks were even superior to those obtained using SaNSDE and PSO as the optimizers. Over the 15 large-scale benchmarks, the proposed MSES-DE achieved significantly better averaged objective values on 13 problems in contrast to MeMAO.

Table II: Averaged objective values and standard deviations obtained by the proposed MSES and the compared baseline algorithms. (Superior performance in each comparison group is highlighted in bold; "+", "≈", and "-" denote that the compared algorithm is statistically significantly better than, similar to, and worse than the proposed MSES using different EA solvers, respectively.)
[Table II layout: for each problem F1-F15, Comparison 1 reports MSES-SaNSDE vs. DG2 and RDG, Comparison 2 reports MSES-DLLSO vs. DLLSO, and Comparison 3 reports MSES-DE vs. MeMAO; the full numerical entries are omitted here.] In summary, because the proposed method used the same optimizer as the compared algorithms in each comparison group and differed only in the search spaces, the superior solution quality observed in Table II confirmed the effectiveness of the proposed multi-space evolutionary search for large-scale optimization.
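Because each comparison group shares the same optimizer and differs only in the search spaces, the core of the paradigm can be pictured as one loop that evolves populations in both the original space and a simplified space, with periodic solution transfer between them. The sketch below is a heavily simplified, hypothetical illustration: a plain (mu+mu) mutation EA and a random orthonormal projection stand in for the actual optimizers and space construction, and all names and parameter values are illustrative.

```python
import numpy as np

def sphere(x):
    """Toy objective for illustration only."""
    return float(np.dot(x, x))

def ea_step(pop, f, rng, sigma=0.3):
    """One (mu+mu) step: Gaussian mutation, then keep the best half."""
    off = pop + sigma * rng.standard_normal(pop.shape)
    both = np.vstack([pop, off])
    fit = np.array([f(x) for x in both])
    return both[np.argsort(fit)[:len(pop)]]

def multi_space_search(f, dim=100, d_s=10, pop_size=30, gens=60, G_t=5, k=3, seed=0):
    rng = np.random.default_rng(seed)
    # simplified space: random orthonormal projection (illustrative choice only)
    P, _ = np.linalg.qr(rng.standard_normal((dim, d_s)))   # dim x d_s
    pop = rng.uniform(-5, 5, (pop_size, dim))              # original-space population
    sub = pop @ P                                          # simplified-space population
    f_sub = lambda z: f(z @ P.T)                           # evaluate via back-mapping
    for g in range(1, gens + 1):
        pop = ea_step(pop, f, rng)       # search in the original space
        sub = ea_step(sub, f_sub, rng)   # search in the simplified space
        if g % G_t == 0:
            # every G_t generations, map the k best simplified solutions back;
            # selection in the original space keeps them only if they help
            order = np.argsort([f_sub(z) for z in sub])
            cand = sub[order[:k]] @ P.T
            both = np.vstack([pop, cand])
            fit = np.array([f(x) for x in both])
            pop = both[np.argsort(fit)[:pop_size]]
    return min(f(x) for x in pop)
```

On this toy objective, transferred solutions found in the 10-dimensional space quickly pull the 100-dimensional population toward promising regions, mirroring the behavior discussed above.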
2) Search Efficiency: This section presents the convergence graphs obtained by all the compared algorithms on all the large-scale benchmarks, to assess the search efficiency of the proposed multi-space evolutionary search for large-scale optimization. In particular, Fig. 3 and Fig. 4 show the convergence graphs obtained on the fully separable functions, partially additively separable functions, overlapping functions, and fully nonseparable functions. In these figures, the Y-axis denotes the averaged objective values in log scale, while the X-axis gives the respective computational effort required in terms of the number of fitness evaluations. It can be observed from Fig. 3 and Fig. 4 that on benchmarks F1 and F2, the compared algorithm DLLSO obtained the best convergence performance. This indicates that on benchmarks with a relatively simple decision space, a search in the original problem space can efficiently find high-quality solutions using properly designed search strategies. Moreover, on the other CEC2013 benchmark functions with more complex decision spaces (e.g., the partially additively separable, overlapping, and fully nonseparable functions), where more appropriate guidance is required for an efficient search for high-quality solutions, the proposed MSES obtained the best or competitive convergence performance in all three comparison groups. In particular, even on the fully separable function F3, which is based on the complex 'Ackley' function, the proposed MSES-SaNSDE, MSES-DLLSO, and MSES-DE obtained faster convergence than the compared algorithms that shared the same EA solvers. In addition, for functions such as F8, F13, and F14, regardless of which EA was considered as the search optimizer, the proposed MSES obtained the best convergence performance in contrast to the baseline algorithms in all three comparison groups. Because the proposed MSES used the same search optimizers as the compared algorithms, the superior search speed obtained confirmed the efficiency of the proposed multi-space evolutionary search for large-scale optimization. Finally, to provide deeper insights into the superior performance obtained by the proposed MSES, considering the three different search algorithms (SaNSDE, PSO, and DE) as the optimizers, the solutions transferred from the simplified space and the best solutions in the population of the original problem space on representative benchmarks are plotted in Fig. 5. As can be observed in the figure, solutions were transferred across the problem spaces during the evolutionary search process. In particular, in Fig. 5(a), compared to the best solution in the original problem space at different stages of the search, both inferior and superior solutions in terms of the objective value were transferred across the spaces. The former could be eliminated via natural selection, while the latter survived and efficiently guided the evolutionary search in the original problem space toward promising areas of high-quality solutions, which led to the enhanced search performance of the proposed MSES, as observed in Table II, Fig. 3, and Fig. 4. Similar observations can also be made in the cases of using PSO and DE as the optimizers, as depicted in Fig. 5(b) and Fig. 5(c), respectively.
These observations also confirmed that useful traits can be embedded in the different spaces of a given problem, and that concurrently conducting an evolutionary search on multiple spaces can lead to efficient and effective problem solving for large-scale optimization.

Fig. 3. Convergence curves of the average fitness (over 25 independent runs) obtained by MSES and the compared algorithms on the CEC2013 fully separable and partially additively separable functions F1-F6 (panels (a)-(f)). Y-axis: averaged objective value in log scale; X-axis: number of fitness evaluations.
3) Sensitivity Study: Five parameters were used in the proposed multi-space evolutionary search: the size of A_s (A_s^size), the dimensionality of the simplified problem space (d_s), the interval for reconstructing the simplified problem space (G_s), and the interval and number of solutions transferred from the simplified to the original problem space (G_t and k, respectively). This section presents and discusses how these parameters affect the performance of the proposed multi-space evolutionary search. In particular, Figs. 6 to 10 present the averaged objective values obtained by the proposed MSES and DLLSO on the representative benchmarks across 25 independent runs with different configurations of A_s^size, d_s, k, G_t, and G_s. In the figures, the X-axis gives the different benchmark functions, while the Y-axis denotes the normalized averaged objective value obtained with each compared configuration. Specifically, the objective values obtained on each benchmark are normalized by the worst (largest) objective obtained by all the compared algorithms on that benchmark; therefore, values close to 0 and 1 denote the best and worst performances, respectively. Further, because DLLSO was observed to obtain superior solution quality and search speed in contrast to the other compared algorithms in Sections IV-B1 and IV-B2, it is considered as the baseline algorithm here. For a fair investigation, DLLSO was also used as the optimizer in the proposed MSES. Among these parameters, A_s^size, d_s, and G_s were involved in the construction of the simplified problem space. A_s^size defined the number of solutions used for constructing the simplified problem space, while d_s gave the dimensionality of the constructed space. Further, G_s determined the frequency of reconstructing the simplified problem space.
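Since A_s^size and d_s govern how a d_s-dimensional space is derived from an archive of A_s^size solutions, the construction step can be illustrated with a small sketch. The use of principal component analysis here, and the function names, are assumptions for illustration only; the paper's actual space-construction and mapping-learning procedure may differ.

```python
import numpy as np

def build_simplified_space(archive, d_s):
    """Illustrative sketch: derive a d_s-dimensional space from an archive of
    solutions via PCA (a stand-in for the paper's construction). Returns a
    pair of mappings between the original and simplified spaces."""
    A = np.asarray(archive, dtype=float)
    mean = A.mean(axis=0)
    # SVD of the centered archive; the rows of Vt are principal directions
    _, _, Vt = np.linalg.svd(A - mean, full_matrices=False)
    basis = Vt[:d_s]                                 # d_s x dim, orthonormal rows
    to_simplified = lambda x: (x - mean) @ basis.T   # original -> simplified
    to_original = lambda z: mean + z @ basis         # simplified -> original
    return to_simplified, to_original
```

In this sketch, a solution evolved in the simplified space is mapped back with `to_original` before evaluation or transfer; because the basis rows are orthonormal, mapping a simplified solution to the original space and back recovers it exactly.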
Generally, small values of A_s^size and d_s simplify the problem space to a large extent, while large values of these two parameters make the constructed space close to the original problem space. In addition, small and large values of G_s cause the low-dimensional space to be reconstructed frequently and infrequently, respectively, during the evolutionary search.

Fig. 4. Convergence curves of the average fitness (over 25 independent runs) obtained by MSES and the compared algorithms on the CEC2013 partially additively separable, overlapping and fully nonseparable functions F7-F15 (panels (a)-(i)). Y-axis: averaged objective value in log scale; X-axis: number of fitness evaluations.

Fig. 5. Illustration of the transferred solutions and the best solution in the population on representative benchmarks: (a) Comparison 1, SaNSDE as the EA solver; (b) Comparison 2, PSO as the EA solver; (c) Comparison 3, DE as the EA solver.

Fig. 6. Averaged objective values obtained by the proposed MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of A_s^size.

Fig. 7. Averaged objective values obtained by the proposed MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of d_s.

As can be observed in Fig. 6, on the partially additively separable functions, for example, F5 and F8, ps (the population size) solutions were already able to construct a useful problem space that improves the search in the original problem space (see the superior objective values achieved by MSES with A_s^size = ps, A_s^size = 5·ps, and A_s^size = 10·ps). However, on the more complex functions, for example, F12 and F15, a larger A_s^size may be required to provide more information for constructing a useful problem space in MSES. Furthermore, regarding the dimensionality of the simplified problem space, as can be observed in Fig. 7, neither a small nor a large value of d_s is good for building a useful simplified problem space, because a very low dimensionality can lose information that is important for an efficient evolutionary search, while a space with a dimensionality close to that of the original problem cannot play a complementary role to the original problem space in the proposed MSES. Lastly, as depicted in Fig. 8, the frequency of reconstructing the simplified space did not significantly affect the performance of MSES on the considered large-scale benchmarks.

Fig. 8. Averaged objective values obtained by the proposed MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of G_s.

On the other hand, the parameters k and G_t defined the amount and frequency of knowledge sharing across problem spaces. Generally, a small value of k and a large value of G_t significantly reduce the amount and frequency of solution transfer across spaces, while a large value of k and a small value of G_t greatly increase them. It can be observed from Fig. 9 and Fig.
10 that, with different configurations of the k and G_t values, superior solution quality was obtained by the proposed MSES compared to DLLSO on most of the benchmarks. However, while the optimal configurations of these parameters are generally problem-dependent, the configuration considered in the empirical study, as discussed above, was found to provide noteworthy results across a variety of large-scale optimization problems.

Fig. 9. Averaged objective values obtained by the proposed MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of k.

Fig. 10. Averaged objective values obtained by the proposed MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of G_t.

V. Conclusions
This paper proposed a multi-space evolutionary search paradigm for large-scale optimization. In particular, it presented the details of the problem space construction, the learning of the mapping across problem spaces, and the knowledge transfer across problem spaces. In contrast to existing methods, the proposed paradigm conducts an evolutionary search on multiple solution spaces derived from the given problem, each possessing a unique landscape. More importantly, the proposed paradigm makes no assumptions about the given large-scale optimization problem, such as that the problem is decomposable or that a certain relationship exists among the decision variables. To validate the performance of the proposed paradigm, comprehensive empirical studies on the CEC2013 large-scale benchmark problems were conducted. The results were compared to those of recently proposed large-scale evolutionary algorithms, which confirmed the efficacy of the proposed multi-space evolutionary search for large-scale optimization. Future work will further explore effective approaches for constructing simplified problem spaces in the proposed multi-space evolutionary search for efficient problem solving in large-scale optimization. The design of adaptive parameter configurations in the proposed paradigm is also a promising research direction for improving the generality of a multi-space evolutionary search for large-scale optimization.