A Surrogate-Assisted Variable Grouping Algorithm for General Large Scale Global Optimization Problems

Abstract

Problem decomposition plays a vital role when applying cooperative coevolution (CC) to large scale global optimization problems. However, most learning-based decomposition algorithms either apply only to additively separable problems or suffer from false separability detections. To address these limitations, this study proposes a novel decomposition algorithm called surrogate-assisted variable grouping (SVG). SVG first designs a general-separability-oriented detection criterion based on whether the optimum of a variable changes with other variables. This criterion is consistent with the separability definition and thus endows SVG with broad applicability and high accuracy. To reduce the number of fitness evaluations required, SVG seeks the optimum of a variable with the help of a surrogate model rather than the original expensive high-dimensional model. Moreover, it converts the variable grouping process into a dynamic-binary-tree search, which facilitates reusing historical separability detection information and thus reduces the number of detections. To evaluate the performance of SVG, a suite of benchmark functions with up to 2000 dimensions, including additively and non-additively separable ones, was designed. Experimental results on these functions indicate that, compared with six state-of-the-art decomposition algorithms, SVG possesses broader applicability and competitive efficiency. Furthermore, it can significantly enhance the optimization performance of CC.

A Surrogate-Assisted Variable Grouping Algorithm for General Large Scale Global Optimization Problems

An Chen, Zhigang Ren, Muyi Wang, Yongsheng Liang, Hanqing Liu, Wenhao Du

School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, China

Corresponding author name : Zhigang Ren

Affiliation : School of Automation Science and Engineering, Xi’an Jiaotong University

Permanent address : No.28 Xianning West Road, Xi’an Shaanxi, 710049, P.R. China

Email address : [email protected]

Keywords: cooperative coevolution, problem decomposition, surrogate model, large scale global optimization.

1. Introduction

In the past decade, an increasing number of large scale global optimization (LSGO) problems have emerged from scientific research and engineering applications [13, 22, 23, 54]. Parameter training in deep neural networks [48], the design of potable water distribution networks [3], and petroleum reservoir management [47] are a few typical examples. Despite its prevalence, the LSGO problem is a hard nut to crack. For one thing, its mathematical model is difficult to build, making classic mathematical programming methods inapplicable. For another, evolutionary algorithms (EAs) suffer from the curse of dimensionality and lose their efficacy dramatically [2].

To relieve these difficulties, some researchers developed a cooperative coevolution (CC) architecture [5, 22, 28, 33, 37, 38, 50]. Following the divide-and-conquer idea, CC first decomposes a complex LSGO problem into a set of simpler sub-problems and then cooperatively optimizes them using conventional EAs such as differential evolution (DE) [5, 28, 50] and particle swarm optimization [1, 21]. As such, problem decomposition plays a critical role in CC. It aims to group nonseparable variables into the same sub-problem and separable variables into different ones. A proper decomposition can decrease the solving difficulty of an LSGO problem without changing its optimum, but an improper one may lead CC to a Nash equilibrium rather than a real optimum [32]. Besides, the decomposition process is expected to consume as few fitness evaluations (FEs) as possible because even a single evaluation for an LSGO problem is generally very costly [39, 53].

Up to now, much research effort has been devoted to developing decomposition methods [22]. Nevertheless, some fundamental and pivotal challenges remain open. Early methods try to group variables statically [1, 33, 34] or randomly [21, 50, 52]. They perform well on fully separable problems but lose effectiveness on partially separable ones since they seldom consider the separability among variables [5, 28]. To overcome this drawback, learning-based methods perform decomposition by explicitly detecting variable separability. As a representative, differential grouping (DG) provides a simple separability detection criterion, i.e., checking whether the fitness variation caused by the perturbation on one variable is independent of other variables [28]. Since its inception, DG has attracted much research effort, and a series of improved variants such as DG2 [31] and recursive DG (RDG) [41] were developed. Although these methods have achieved impressive decomposition performance, they only apply to additively separable problems. It is worth noting that some early learning-based methods such as variable interaction learning (VIL) [5] attempted to decompose general LSGO problems without exploiting characteristics of additive separability. VIL and its variants [9, 10, 43] group variables by detecting whether the monotonicity of the objective function w.r.t. a variable is affected by other ones and thus have the opportunity to identify non-additive separability. However, besides requiring a large number of FEs, these methods often encounter the issue of false separability detections [19].

To remedy the above shortcomings, this study proposes a surrogate-assisted variable grouping (SVG) algorithm. SVG designs a general-separability-oriented detection criterion. To determine the separability between a variable and other ones, this criterion checks whether the global optimum of this variable remains unchanged after perturbing the latter. It is consistent with the definition of general separability and thus applies well to generally separable problems. Moreover, this criterion only needs to locate the global optimum of a variable once. For a separable variable, the located optimum can be taken as its final optimum, which means that there is no need to tackle it again in the optimization process. In the context of LSGO, locating the global optimum of a variable can be considered a one-dimensional expensive optimization problem. SVG solves it using the well-established surrogate model technique [8, 14, 15, 39]. To be specific, it develops a two-layer polynomial regression scheme, which is able to obtain the global optimum of a variable with only a few FEs. As for the variable grouping process, SVG converts it into a dynamic-binary-tree-based search. In this way, it systematically reuses historical separability detection information and avoids many redundant detections. To verify the efficacy of SVG, we specially designed a more general suite of benchmark functions with up to 2000 dimensions and conducted comprehensive empirical studies on them. The results reveal the superiority of SVG over six state-of-the-art decomposition algorithms.

The main contributions of this study are as follows:

1) A new separability detection criterion is designed. It is consistent with the definition of general separability and possesses broader applicability than existing ones.

2) A surrogate-assisted scheme is introduced to seek the global optimum of a variable required by the new criterion. With the help of the surrogate technique, the global optimum of a variable can be located with a few FEs.

3) A dynamic-binary-tree-based variable grouping procedure is developed. It facilitates reusing historical separability detection information and avoids many redundant detections.

4) A more general benchmark suite, including both additively and non-additively separable functions, is designed. The performance of SVG is comprehensively investigated on this benchmark suite, and a thorough analysis of the empirical results is presented.

The remainder of this paper proceeds as follows: Section 2 reviews the existing decomposition algorithms after briefly introducing the definition of separability and the framework of CC. Section 3 describes the proposed SVG algorithm in detail. Section 4 first introduces the designed benchmark suite and then reports experimental settings and results. Finally, Section 5 concludes this paper and discusses some future research directions.

2. Preliminaries

Definition 1. A problem $f(\mathbf{x})$ is generally separable if it satisfies (assuming minimization) [45]:

$$\arg\min_{\mathbf{x}} f(\mathbf{x}) = \left(\arg\min_{\mathbf{x}_1} f(\mathbf{x}_1, \ldots), \ldots, \arg\min_{\mathbf{x}_k} f(\ldots, \mathbf{x}_k)\right), \qquad (1)$$

where $\mathbf{x} = (x_1, \ldots, x_n)$ is an $n$-dimensional decision vector and $\mathbf{x}_1, \ldots, \mathbf{x}_k$ are $k$ ($k = 2, \ldots, n$) disjoint subcomponents. We say the variables from the same subcomponent (or different subcomponents) are nonseparable (or separable) with each other. A variable that is separable with all the others is called a separable variable.

Definition 2. A problem $f(\mathbf{x})$ is additively separable if it has the following form [28]:

$$f(\mathbf{x}) = \sum_{i=1}^{k} f_i(\mathbf{x}_i), \qquad (2)$$

where $f_i(\cdot)$ denotes the subfunction of the $i$-th nonseparable subcomponent.

It is obvious that additive separability is a special case of the general one. To further distinguish these two kinds of separability, let us compare the generally separable Ridge function and the additively separable Sphere function $f(x_1, x_2) = x_1^2 + x_2^2$. For the latter, a perturbation on $x_2$ only causes the fitness landscape w.r.t. $x_1$ to translate. In other words, it does not affect the fitness variation caused by the change of $x_1$. By contrast, this property does not hold for the Ridge function, whose separability only consists in that the global optimum of $x_1$ remains unchanged after perturbing $x_2$.

The difficulty of tackling an LSGO problem mainly lies in its large solution space, which cannot be fully explored by traditional EAs. Two main categories of approaches have been developed to improve traditional EAs for LSGO, namely, CC methods and non-CC methods [11, 16, 18, 25, 26]. Non-CC methods solve an LSGO problem as a whole and are generally equipped with some exploration-enhancing operators [18, 25, 26] or some dimensionality reduction techniques [11, 16]. By contrast, CC methods are developed by exploiting the separability of LSGO problems. Algorithm 1 outlines the general framework of CC. It involves two main processes: decomposition and optimization. Note that since an explicit low-dimensional simulation model is unavailable for each sub-problem due to the black-box characteristic of the original LSGO problem, CC generally initializes a context vector ($\mathbf{cv}$) with a complete solution to assist the evaluation of sub-solutions [1, 37] in line 2.
Concretely, it inserts a sub-solution to be evaluated into the corresponding positions in $\mathbf{cv}$ and estimates its fitness indirectly by evaluating the modified $\mathbf{cv}$ with the original high-dimensional simulation model.

Algorithm 1: Cooperative coevolution
1. Perform decomposition: $\mathbf{x} \rightarrow \{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$;
2. Initialize the context vector and the population for each sub-problem;
3. while the termination criterion is not met
4.     Determine the sub-problem $i$ to be optimized;
5.     Optimize the $i$-th sub-problem for a certain number of generations;
6.     Re-divide $\mathbf{x}$ if necessary;
7. return the best solution found so far.

2.2 Decomposition methods

According to Algorithm 1, problem decomposition plays a fundamental role in CC. So far, many decomposition methods have been developed. The earliest decomposition methods are static ones. They directly divide an $n$-dimensional problem into $k$ $m$-dimensional sub-problems ($n = k \times m$ and $m < n$) and keep the grouping fixed during the following optimization process [1, 33, 34]. These methods are easy to implement but only suit fully separable problems because they neglect variable separability. To overcome the rigidity of static methods, random decomposition methods re-divide all decision variables in each evolution cycle [21, 50, 52]. The nonseparable variables thus have a chance of being assigned to the same sub-problem. Despite some improvement, random methods still perform poorly on problems involving several groups of nonseparable variables [28].

Different from the above two types of decomposition methods, learning-based methods perform decomposition by explicitly detecting variable separability. Some studies attempted to learn variable separability by analyzing the solutions sampled in the optimization process. For instance, Ray and Yao [35] calculated the correlation coefficient between each pair of variables based on the individuals of the current population and assigned them to the same sub-problem if their correlation coefficient was larger than a predetermined threshold. Xu et al. [46] pointed out that this indicator cannot describe nonlinear separability and replaced it with mutual information. Omidvar et al. [30] detected variable separability according to the variation of each variable in two successive evolutionary cycles and considered the variables with small changes more likely to be nonseparable. By contrast, Yang et al. [49] tended to group the variables showing evolution consistency into the same sub-problem. In general, these decomposition methods can outperform static and random ones.
However, as their separability detection criteria are set in a heuristic way, they can hardly achieve a proper decomposition.

To improve decomposition accuracy, some other studies designed separability detection criteria according to the characteristics of variable separability and judged whether the corresponding criteria hold by investigating the relationship among the fitness values of some purposefully sampled solutions. Inspired by the feature of additive separability, Omidvar et al. [28] developed the DG algorithm, which adopts the following detection criterion:

Criterion 1: For an $n$-dimensional function $f(\mathbf{x})$, its two variables $x_i$ and $x_j$ are additively separable if for $\forall \mathbf{cv} \in [\mathbf{lb}, \mathbf{ub}] \subset \mathbb{R}^n$, $\forall x_i', x_i'' \in [\mathbf{lb}(i), \mathbf{ub}(i)]$ and $\forall x_j', x_j'' \in [\mathbf{lb}(j), \mathbf{ub}(j)]$ ($x_i' \neq x_i''$ and $x_j' \neq x_j''$), the following condition holds:

$$\Delta(x_i', x_i'' \mid x_j') = \Delta(x_i', x_i'' \mid x_j''), \qquad (3)$$

where

$$\Delta(x_i', x_i'' \mid x_j') = f(\mathbf{cv} \mid x_i', x_j') - f(\mathbf{cv} \mid x_i'', x_j'). \qquad (4)$$

Here, $\mathbf{lb}$ and $\mathbf{ub}$ represent the lower and upper bounds of $\mathbf{x}$, respectively, and $\mathbf{cv} \mid x_i', x_j'$ denotes the complete solution generated by inserting $x_i'$ and $x_j'$ into the corresponding positions in $\mathbf{cv}$. Considering the computational roundoff error, DG introduces a predefined threshold $\varepsilon$ into Eq. (3) and converts it to $|\Delta(x_i', x_i'' \mid x_j') - \Delta(x_i', x_i'' \mid x_j'')| \le \varepsilon$.

DG has achieved great success and is now one of the most influential decomposition algorithms. However, the original DG omits indirect interdependency, i.e., the interdependency between two variables that are directly separable but linked by other variables [42], and also faces the difficulty of setting a proper value for $\varepsilon$. To identify indirect interdependency, some DG variants such as global DG (GDG) [24] and DG2 [31] detect the separability between each pair of variables and link each group of nonseparable variables together.
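A single pairwise check in the spirit of Criterion 1 can be sketched as follows. This is only an illustration under stated assumptions: the function names `variation` and `criterion1_separable`, the choice of sample points (lower bound, upper bound and midpoint), the threshold value, and the toy function `coupled` are all hypothetical and are not taken from the DG papers.

```python
import numpy as np

def variation(f, cv, i, xi_a, xi_b, j, xj):
    """Eq. (4): f(cv | xi_a, xj) - f(cv | xi_b, xj)."""
    a = cv.copy(); a[i], a[j] = xi_a, xj
    b = cv.copy(); b[i], b[j] = xi_b, xj
    return f(a) - f(b)

def criterion1_separable(f, i, j, lb, ub, eps=1e-9):
    """One-shot check of Eq. (3) with threshold eps (4 FEs).

    A True result is evidence of additive separability, not a proof,
    because Criterion 1 is stated for all sample points.
    """
    cv = lb.copy()                      # assumed context vector
    d1 = variation(f, cv, i, lb[i], ub[i], j, lb[j])
    d2 = variation(f, cv, i, lb[i], ub[i], j, 0.5 * (lb[j] + ub[j]))
    return abs(d1 - d2) <= eps

def sphere(x):                          # additively separable
    return float(np.sum(x ** 2))

def coupled(x):                         # hypothetical: x0 and x1 interact
    return float((x[0] - x[1]) ** 2 + x[2] ** 2)

lb, ub = np.full(3, -1.0), np.full(3, 3.0)
print(criterion1_separable(sphere, 0, 1, lb, ub))    # True
print(criterion1_separable(coupled, 0, 1, lb, ub))   # False
print(criterion1_separable(coupled, 0, 2, lb, ub))   # True
```

Applying such a check to every pair of variables is what makes the pairwise variants quadratic in the number of variables.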
Despite their effectiveness, these methods require $O(n^2)$ FEs for an $n$-dimensional problem. To reduce the FE consumption, some other DG variants detect separability from the perspective of variable subsets instead of variable individuals [4, 12, 17, 36, 41, 47]. The motivation is that if two variable subsets $X_i$ and $X_j$ are separable, then each variable in $X_i$ is separable with the ones in $X_j$, so pairwise separability detection can be avoided. The earliest variable-subset-based DG algorithm is fast interdependency identification (FII) [12]. It first excludes separable variables by investigating the separability between each variable and all the other ones and then further decomposes the remaining nonseparable variables. Different from FII, RDG [41] involves a recursive process, during which it equally divides the variable subset interacting with the current target variable and successively detects the separability between each resulting subset and the target variable. This process continues until all the variables interacting with the target variable are captured. By this means, RDG reduces the FE requirement to $O(n \log n)$. The efficient variable interdependency identification and decomposition method (EVIID) [17] further accelerates RDG by reusing some sampled solutions and pre-sorting variables according to the expected numbers of their nonseparable variables. Recently, the topology-based DG algorithm [47] integrated topological information into the decomposition process and remarkably improved decomposition efficiency.

As for the decomposition threshold $\varepsilon$, GDG [24] sets it to a value proportional to the minimum fitness of several randomly sampled solutions. Compared with setting $\varepsilon$ to a fixed value, this method provides certain adaptability and achieves higher decomposition accuracy. However, it still requires users to specify a proportion factor.
To alleviate this issue, DG2 [31] first estimates the greatest lower bound and the least upper bound of the computational roundoff error and then configures $\varepsilon$ according to the estimation results. Instead of directly setting a value for $\varepsilon$, the study in [6] normalizes the indicator in DG and adaptively generates a threshold value for the indicator by analyzing its distribution.

Due to the additive feature of Criterion 1, DG and its variants only apply to additively separable problems. For generally separable ones, Chen et al. [5] proposed the well-known VIL algorithm. Its separability detection criterion can be detailed as follows:

Criterion 2: For an $n$-dimensional problem $f(\mathbf{x})$, $x_i$ and $x_j$ are separable if for $\forall \mathbf{cv} \in [\mathbf{lb}, \mathbf{ub}] \subset \mathbb{R}^n$, $\forall x_i', x_i'' \in [\mathbf{lb}(i), \mathbf{ub}(i)]$ and $\forall x_j', x_j'' \in [\mathbf{lb}(j), \mathbf{ub}(j)]$ ($x_i' \neq x_i''$ and $x_j' \neq x_j''$), the following condition holds:

$$\Delta(x_i', x_i'' \mid x_j') \cdot \Delta(x_i', x_i'' \mid x_j'') \ge 0. \qquad (5)$$

This criterion endows VIL with an ability to identify general separability among variables. However, for a pair of variables, it requires VIL to check whether Eq. (5) holds many times, resulting in extremely high FE consumption. Even so, it was indicated that VIL may still omit some interdependencies [19]. Moreover, our study presented in Section 3.1 will reveal that VIL may also erroneously identify separable variables as nonseparable ones. To improve decomposition efficiency, some VIL variants extend Criterion 2 by conducting separability detection from the perspective of variable subsets [9, 10]. For example, the fast variable interdependency searching (FVIS) algorithm [10] employs a recursive process like the one in RDG to perform decomposition and remarkably reduces the FE consumption. Nevertheless, it still cannot ensure a desirable decomposition accuracy.

Recently, Sun et al. [40] developed another decomposition method oriented to generally separable problems, called maximum entropic epistasis (MEE). It employs the maximal information coefficient to measure whether the partial derivative of a variable has a relationship with another variable and identifies the separability between them accordingly. MEE adapts itself well to general separability but is conducted in a pairwise fashion and requires many evaluated samples to calculate the maximal information coefficient, which is unaffordable in the context of LSGO.
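The monotonicity check behind Criterion 2 can be sketched in the same style. The sketch reads Eq. (5) as "the two fitness variations must not have opposite signs", which is an assumption of this illustration; the random sampling scheme, the sample count, and the toy functions `product` and `coupled` are likewise hypothetical and do not reproduce the VIL implementation.

```python
import numpy as np

def variation(f, cv, i, xi_a, xi_b, j, xj):
    """f(cv | xi_a, xj) - f(cv | xi_b, xj), as in Eq. (4)."""
    a = cv.copy(); a[i], a[j] = xi_a, xj
    b = cv.copy(); b[i], b[j] = xi_b, xj
    return f(a) - f(b)

def criterion2_separable(f, i, j, lb, ub, samples=200, seed=0):
    """Repeat the sign test on random samples (4 FEs per sample).

    Returns False as soon as perturbing x_j flips the ordering of two
    solutions that differ only in x_i; otherwise returns True.
    """
    rng = np.random.default_rng(seed)
    for _ in range(samples):
        cv = rng.uniform(lb, ub)
        xi_a, xi_b = rng.uniform(lb[i], ub[i], size=2)
        xj_a, xj_b = rng.uniform(lb[j], ub[j], size=2)
        d1 = variation(f, cv, i, xi_a, xi_b, j, xj_a)
        d2 = variation(f, cv, i, xi_a, xi_b, j, xj_b)
        if d1 * d2 < 0:
            return False
    return True

def product(x):   # hypothetical: generally but not additively separable
    return float((x[0] ** 2 + 1.0) * (x[1] ** 2 + 1.0))

def coupled(x):   # hypothetical: x0 and x1 interact
    return float((x[0] - x[1]) ** 2)

lb, ub = np.full(2, -1.0), np.full(2, 3.0)
print(criterion2_separable(product, 0, 1, lb, ub))   # True
print(criterion2_separable(coupled, 0, 1, lb, ub))   # False
```

For `product`, the two variations always share the factor $x_i'^2 - x_i''^2$, so their product can never be negative, whereas the magnitudes differ and an additive test such as Eq. (3) would reject it. The repeated sampling also shows where the high FE cost of this kind of criterion comes from.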
Previous studies show that a proper separability detection criterion is the basis for accurate decomposition, and that reducing both the number of separability detections and the number of FEs consumed in a single detection is the key to lowering the FE requirement of a decomposition algorithm. Moreover, research on decomposing generally separable LSGO problems is still at an early stage. Given this situation, this study proposes the SVG algorithm. It can efficiently decompose generally separable LSGO problems and is not limited to additively separable ones.
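To make concrete what a decomposition is used for, the CC framework of Algorithm 1 can be sketched as below. This is a minimal sketch under stated assumptions: the grouping is taken as given, sub-problems are visited round-robin, and a simple Gaussian-perturbation hill climber stands in for the DE or particle swarm sub-optimizers mentioned in the text. The function name and all parameters are hypothetical.

```python
import numpy as np

def cooperative_coevolution(f, groups, lb, ub, cycles=50, trials=20, seed=0):
    """Round-robin CC with a context vector (sketch of Algorithm 1)."""
    rng = np.random.default_rng(seed)
    cv = rng.uniform(lb, ub)          # context vector: one complete solution
    best = f(cv)
    for _ in range(cycles):
        for g in groups:              # pick the sub-problem to optimize
            for _ in range(trials):
                cand = cv.copy()
                # Only the variables of group g are changed; the rest of
                # the solution is borrowed from the context vector.
                step = rng.normal(0.0, 0.1 * (ub[g] - lb[g]))
                cand[g] = np.clip(cv[g] + step, lb[g], ub[g])
                fc = f(cand)          # evaluated with the full model
                if fc < best:
                    cv, best = cand, fc
    return cv, best

def sphere(x):
    return float(np.sum(x ** 2))

lb, ub = np.full(6, -5.0), np.full(6, 5.0)
groups = [[0, 1], [2, 3], [4, 5]]
cv, best = cooperative_coevolution(sphere, groups, lb, ub)
print(best)
```

Every candidate sub-solution costs one evaluation of the full high-dimensional model, which is why the FEs spent on decomposition compete directly with those available for optimization.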

3. Surrogate-Assisted Variable Grouping

To attain an accurate decomposition for a generally separable problem, a commensurate separability detection criterion is indispensable. Unfortunately, the currently predominant Criterion 1 and Criterion 2 show some apparent limitations, which can be intuitively illustrated by Fig. 1. Figures 1(a)-(c) present the fitness landscapes of three two-dimensional functions $f(x_1, x_2)$ w.r.t. $x_1$ before and after perturbing $x_2$. All three functions are separable, and the first one is additively separable. However, Criterion 1 only takes effect on Fig. 1(a) and fails to identify the separability shown in Figs. 1(b)-(c). Take the case in Fig. 1(b) as an example. When $x_2$ is fixed to $x_2'$, the fitness variation caused by perturbing $x_1$ from 0 to 1 does not equal the corresponding variation when $x_2$ is fixed to $x_2''$, although the two fitness variations share the same sign. Criterion 2 can successfully identify the separability shown in Fig. 1(b) but fails to tackle the case in Fig. 1(c), where the two fitness variations have different signs. Therefore, a more reliable separability detection criterion is required.

Fig. 1. Three examples to show the limitations of Criterion 1 and Criterion 2. Panels (a), (b), and (c) each plot one of the three functions $f(x_1, x_2)$.

According to Definition 1, a target variable $x_t$ is generally separable with an undetected one $x_u$ if its global optimum is independent of the latter. A direct approach to detecting this hallmark is to check whether the two global optima of $x_t$ obtained before and after perturbing $x_u$ equal each other. However, this requires locating the global optimum of $x_t$ twice in a single detection and thus consumes excessive FEs. To avoid this situation, a simple way is to check whether the previous optimum of $x_t$ still holds after perturbing $x_u$. This basic idea can be illustrated by Fig. 2, where each curve represents the fitness landscape w.r.t. $x_t$ around its global optimum. Let $x_t^*$ denote the previous global optimum of $x_t$. Then, as shown in Fig. 2(a), if $x_t$ and $x_u$ are generally separable, $x_t^*$ keeps its optimality after perturbing $x_u$, which means that the fitness value of $x_t^*$ is still no larger than those of $x_t^* - \delta$ and $x_t^* + \delta$, with $\delta$ being a small positive number. Otherwise, the global optimum of $x_t$ will move, which means that the fitness value of $x_t^*$ must be larger than that of $x_t^* - \delta$ (as shown in Fig. 2(b)) or $x_t^* + \delta$ (as shown in Fig. 2(c)). By summarizing the cases shown in Figs. 2(a)-(c), we can get a simple yet efficient separability detection criterion as follows:

Criterion 3: For an $n$-dimensional function $f(\mathbf{x})$ and $\forall \mathbf{cv} \in [\mathbf{lb}, \mathbf{ub}] \subset \mathbb{R}^n$, suppose the current global optimum of a variable $x_t$ is $x_t^*$. Then $x_t$ is generally separable with another variable $x_u$ if the following condition holds:

$$f(\mathbf{cv} \mid x_t^*, x_u') \le \min\{ f(\mathbf{cv} \mid x_t^* - \delta, x_u'),\ f(\mathbf{cv} \mid x_t^* + \delta, x_u') \}, \qquad (6)$$

where $x_u'$ is a value of $x_u$ satisfying $x_u' \in [\mathbf{lb}(u), \mathbf{ub}(u)]$ and $x_u' \neq \mathbf{cv}(u)$. Otherwise, the two variables are generally nonseparable.

Fig. 2. The basic idea of Criterion 3. (a) The global optimum stays unchanged; (b) the global optimum moves left; (c) the global optimum moves right.

This criterion has some advantages over Criterion 1 and Criterion 2. It extends Criterion 1 by applying to non-additive separability. It can also avoid the false detection encountered by Criterion 2. Some further remarks on Criterion 3 are in order. Firstly, as with Criterion 1 and Criterion 2, the undetected variable $x_u$ can be replaced with a variable subset $X_u$. If the optimum of $x_t$ is independent of $X_u$, then $x_t$ can be judged separable with each variable in $X_u$. Otherwise, there must be one or more variables in $X_u$ interacting with $x_t$. Secondly, this criterion requires locating $x_t^*$ in advance. This is a seemingly challenging problem, considering the FEs required by an optimization process. Fortunately, although it is expensive, this optimization problem involves only a single variable and can be efficiently solved by the surrogate model technique [8, 14, 15, 39]. Moreover, if $x_t$ is a separable variable, the located $x_t^*$ can be directly taken as its final optimum. Therefore, we need not tackle it again in the optimization process of CC. Finally, $\delta$ specifies a small neighborhood of $x_t^*$. We may set a value for it according to the step size when searching for $x_t^*$. The details will be described in the next subsection.

Algorithm 2: $(isSep, FEsUsed) = \mathrm{DetectSep}(x_t, X_u, x_t^*, \delta, \mathbf{cv}, \mathbf{cv}')$
1. Perform initializations: $FEsUsed \leftarrow 0$, $isSep \leftarrow false$, and $\mathbf{s} \leftarrow \mathbf{cv}$;
2. for each $x_u$ in $X_u$
3.     Set $\mathbf{s}(u)$ to $\mathbf{cv}'(u)$;
4. Set $\mathbf{s}_1$, $\mathbf{s}_2$, and $\mathbf{s}_3$: $\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3 \leftarrow \mathbf{s}$ and $\mathbf{s}_1(t) \leftarrow x_t^* - \delta$, $\mathbf{s}_2(t) \leftarrow x_t^*$, $\mathbf{s}_3(t) \leftarrow x_t^* + \delta$;
5. if $f(\mathbf{s}_2) \le \min\{f(\mathbf{s}_1), f(\mathbf{s}_3)\}$ then
6.     $isSep \leftarrow true$;
7. Update $FEsUsed$: $FEsUsed \leftarrow FEsUsed + 3$;
8. return $isSep$ and $FEsUsed$.

Algorithm 2 presents the pseudocode of applying Criterion 3 to detecting the separability between $x_t$ and $X_u$, where $\mathbf{cv}'$ is employed to specify the value of $X_u$ after perturbation. For simplicity, we set $\mathbf{cv}$ and $\mathbf{cv}'$ to $\mathbf{lb}$ and $(\mathbf{lb} + \mathbf{ub})/2$, respectively. As described above, we need to seek the global optimum of a variable before applying

Criterion 3. An EA can accomplish this task but generally requires many FEs. To reduce the FE requirement, this study employs the well-established surrogate model technique to tackle this issue [8, 14, 15, 39]. Its main idea is to first construct a surrogate model for the original problem using some really evaluated solutions, and then to evaluate candidate solutions efficiently and approximately with the constructed model. As a consequence, many real FEs can be avoided. Up to now, several types of classic surrogate models [14, 15], including polynomial regression (PR), Gaussian process, and radial basis function, have been developed. Considering the one-dimensional and possibly multimodal nature of the problem to be optimized, this study designs a two-layer polynomial regression (TLPR) scheme. As its name implies, TLPR involves a bilayer surrogate structure. The first layer is a global PR model. It aims to approximate the global profile of the fitness landscape by smoothing out the local optima. With its help, a promising trust region covering the global optimum of the problem is expected to be obtained. Then in the second layer, TLPR first divides the obtained trust region into several segments and then constructs a local PR model for each one. As each local model is only responsible for a small segment, it can achieve very high accuracy. By comparing the optima provided by all the local PR models, an estimate of the global optimum can be obtained.

The basic PR model for a one-dimensional problem can be formulated as follows:

$$PR(x) = p_r x^r + p_{r-1} x^{r-1} + \cdots + p_1 x + p_0, \qquad (7)$$

where $p_r, p_{r-1}, \ldots, p_0$ are the coefficients and can be approximated by applying the least-squares training method based on at least $r + 1$ samples. The degree $r$ determines the expression preference of the built model. A model of a low degree (e.g., two) can smooth out the local optima of the problem and thus is good at capturing its global profile. By contrast, a model of a high degree (e.g., five) can describe fitness features in detail but may overfit. Motivated by these properties, TLPR employs the two-degree and five-degree PR models in the first and second layers, respectively. Note that the global optima of these two kinds of PR models can be directly deduced from their analytical expressions.

Algorithm 3 describes the pseudocode of TLPR. Lines 1-4 are about the first layer, where a two-degree PR model is constructed based on 100 uniformly sampled solutions. As our preliminary study suggested, such a configuration can provide enough solutions around the global optimum and thus approximate the global profile of the fitness landscape well. After locating the optimum of the two-degree PR model in line 3, Algorithm 3 takes it as the center of the trust region and specifies the region as 10% of the feasible one in line 4. Lines 5-9 are about the second layer. To divide the trust region as finely as possible, Algorithm 3 takes every six sampled solutions, the minimum number required for training a five-degree PR model, as a unit to construct the model in lines 7-8. For the preliminary global optimum $x_t^*$ chosen in line 9, line 10 further improves it with a local search procedure based on the original simulation model. The classic BFGS method is employed here for its impressive performance [27]. As $x_t^*$ is near the real global optimum, BFGS generally converges within several iterations. Its final step size is also output and can be naturally employed to specify the small neighborhood $\delta$ for Criterion 3.

Algorithm 3: $(x_t^*, \delta, FEsUsed) = \mathrm{TLPR}(x_t, \mathbf{lb}, \mathbf{ub}, \mathbf{cv})$
1. Uniformly sample 100 solutions in $[\mathbf{lb}(t), \mathbf{ub}(t)]$;
2. Evaluate the sampled solutions based on $\mathbf{cv}$ and set $FEsUsed$ to 100;
3. Construct a two-degree PR model and locate its optimum $x_t^{*\prime}$;
4. Define the trust region as 10% of $[\mathbf{lb}(t), \mathbf{ub}(t)]$ centered on $x_t^{*\prime}$;
5. Uniformly re-sample another 100 solutions in the trust region;
6. Evaluate the sampled solutions and update $FEsUsed$: $FEsUsed \leftarrow FEsUsed + 100$;
7. for each six sampled solutions
8.     Construct a five-degree PR model and locate the candidate optimum;
9. Pick out the best optimum $x_t^*$ among all the candidates;
10. Conduct BFGS to further improve $x_t^*$;
11. Set $\delta$ to the last step size of BFGS and update $FEsUsed$;
12. return $x_t^*$, $\delta$, and $FEsUsed$.

The key to problem decomposition is how to efficiently extract, for the current target variable $x_t$, the concrete variables interacting with it from the undetected variable set $X_u$. A direct way is to check the separability between $x_t$ and each individual variable in $X_u$ one by one. However, this requires a considerable number of FEs. To overcome this inefficiency, we develop a dynamic-binary-tree-based variable grouping (DBTG) procedure. DBTG works as follows: it first sets $X_u$ to be the root node and detects its separability with $x_t$. If they interact with each other, DBTG equally divides $X_u$ into two child nodes and successively detects their separability with $x_t$. A node will not be further divided if it only involves a single variable or is separable with $x_t$. DBTG shares some similarities with the recursive process in RDG [41], which can also be described by a binary tree. Besides adopting a different separability detection criterion, another improvement of DBTG is that it reduces the number of separability detections by reusing some historical detection information. To get a better understanding of this mechanism, let us consider the following example:

Fig. 3. An example illustrating the working process of DBTG.
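The detection rule of Criterion 3 and the queue-based tree search of DBTG can be sketched together as follows. This is a minimal sketch, not the authors' implementation: the function names, the five-variable test function `toy`, its bounds, and the values of `x_star` and `delta` are hypothetical choices made for illustration (in SVG the last two would come from TLPR).

```python
from collections import deque
import numpy as np

def detect_sep(f, t, subset, x_star, delta, cv, cv_pert):
    """Criterion 3: does x_star stay optimal for x_t after perturbing subset?"""
    s = cv.copy()
    s[subset] = cv_pert[subset]               # perturb the undetected subset
    vals = []
    for xt in (x_star - delta, x_star, x_star + delta):
        s[t] = xt
        vals.append(f(s))                     # three FEs per detection
    return bool(vals[1] <= min(vals[0], vals[2])), 3

def dbtg(f, t, undetected, x_star, delta, cv, cv_pert):
    """Return the variables interacting with x_t and the FEs consumed."""
    interacting, fes = [], 0
    is_sep, used = detect_sep(f, t, undetected, x_star, delta, cv, cv_pert)
    fes += used
    nodes = deque() if is_sep else deque([list(undetected)])
    while nodes:
        node = nodes.popleft()
        if len(node) == 1:
            interacting.extend(node)
            continue
        left, right = node[:len(node) // 2], node[len(node) // 2:]
        left_sep, used = detect_sep(f, t, left, x_star, delta, cv, cv_pert)
        fes += used
        right_sep = False                     # deduced when left is separable
        if not left_sep:
            nodes.append(left)
            right_sep, used = detect_sep(f, t, right, x_star, delta, cv, cv_pert)
            fes += used
        if not right_sep:
            nodes.append(right)
    return sorted(interacting), fes

def toy(x):   # hypothetical: the optimum of x0 depends on x4 only
    return float((x[0] - 0.5 * x[4] - 1.0) ** 2 + x[1] ** 2 + x[2] ** 2 + x[3] ** 2)

lb, ub = np.full(5, -1.0), np.full(5, 3.0)
cv, cv_pert = lb.copy(), 0.5 * (lb + ub)
# With cv = lb, the optimum of x0 is 0.5 * (-1) + 1 = 0.5.
print(dbtg(toy, 0, [1, 2, 3, 4], x_star=0.5, delta=1e-3, cv=cv, cv_pert=cv_pert))
# ([4], 9)
```

On this toy problem the search needs three detections (nine FEs), because the nonseparability of a second child node is deduced whenever its sibling turns out to be separable; detecting every node explicitly would take five.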

Example: For the problem ( ) ( ) ( 1) f x x x x x x x , Fig. 3 presents the process of DBTG when capturing the variables interacting with the target variable. After identifying the nonseparability between the target variable and the set of the other four variables, DBTG equally divides the latter into two child nodes and then detects the separability between the first two-variable child node and the target variable. Since they are separable, it can be deduced that the second two-variable child node is nonseparable with the target variable. The reason behind this deduction is that a variable subset nonseparable with the target variable must contain at least one variable interacting with it. Thus DBTG directly divides the second child node into two single-variable nodes. Similarly, after identifying the separability between the target variable and the first single-variable node, DBTG judges the remaining variable to be nonseparable with the target variable without further detection. This means that, compared with the existing recursive decomposition process, DBTG avoids two unnecessary detections.

Algorithm 4 presents the pseudocode of DBTG. A queue treeNodes is set to dynamically store the nodes interacting with $x_t$. For each node popped out from treeNodes, if it involves a single variable, Algorithm 4 directly adds it in lines 8-9 to $X_t'$, which stores all the variables interacting with $x_t$. Otherwise, it halves this node into two child nodes in line 11. After detecting the separability between the first child node and $x_t$ in lines 12-13, DBTG further detects or deduces the separability between the other child node and $x_t$ in lines 16-19, and pushes each of the two nodes into treeNodes if it is nonseparable with $x_t$. The process terminates only when treeNodes becomes empty.

Algorithm 4: $(X_t', FEsUsed) = \mathrm{DBTG}(x_t, X_u, x_t^*,  , \mathbf{cv}, \mathbf{cv}')$
1. Perform initializations: $X_t' = \emptyset$, $FEsUsed = 0$, and $treeNodes = \emptyset$;
2. Detect the separability between $x_t$ and $X_u$: $(isSep, FEsUsed') = \mathrm{DetectSep}(x_t, X_u, x_t^*,  , \mathbf{cv}, \mathbf{cv}')$;
3. Update FEsUsed: $FEsUsed = FEsUsed + FEsUsed'$;
4. if $isSep == false$ then
5.     Push $X_u$ into the rear of treeNodes;
6. while $|treeNodes| > 0$
7.     Pop out the first node of treeNodes and denote it as $X_c$;
8.     if $|X_c| == 1$ then
9.         Update $X_t'$: $X_t' = X_t' \cup X_c$;
10.    else
11.        Equally divide $X_c$: $X_c = X_c^1 \cup X_c^2$ and set $isSep_2 = false$;
12.        Detect the separability between $x_t$ and $X_c^1$: $(isSep_1, FEsUsed') = \mathrm{DetectSep}(x_t, X_c^1, x_t^*,  , \mathbf{cv}, \mathbf{cv}')$;
13.        Update FEsUsed: $FEsUsed = FEsUsed + FEsUsed'$;
14.        if $isSep_1 == false$ then
15.            Push $X_c^1$ into the rear of treeNodes;
16.            Detect the separability between $x_t$ and $X_c^2$: $(isSep_2, FEsUsed') = \mathrm{DetectSep}(x_t, X_c^2, x_t^*,  , \mathbf{cv}, \mathbf{cv}')$;
17.            Update FEsUsed: $FEsUsed = FEsUsed + FEsUsed'$;
18.        if $isSep_2 == false$ then
19.            Push $X_c^2$ into the rear of treeNodes;
20. return $X_t'$ and FEsUsed.

Taking DBTG as a key component,

Algorithm 5 presents the pseudocode of the whole SVG algorithm. After performing initializations, it starts an iteration by randomly selecting a variable from the undetected variable subset $X_u$ as the target variable $x_t$ in line 4. Then it successively locates the global optimum of $x_t$ using the proposed TLPR scheme in line 5 and extracts the concrete variables interacting with $x_t$ by performing DBTG in line 9. If $x_t$ is separable, it is stored into seps in lines 8 and 12; otherwise, $x_t$ together with its partners is stored into nonseps in line 14. SVG iteratively conducts the above process until all the variables have been processed. Note that once the global optimum of a variable is obtained, it is employed to update the corresponding component in $\mathbf{cv}$, which is then transferred to the optimization process of CC and thus provides a promising initial solution. Since the global optima of separable variables have already been obtained, only nonseparable variables need to be considered in the optimization process.

Algorithm 5: $(seps, nonseps, \mathbf{cv}, FEsUsed) = \mathrm{SVG}(\mathbf{lb}, \mathbf{ub})$
1. Perform initializations: $seps = \emptyset$, $nonseps = \emptyset$, $\mathbf{cv} = \mathbf{lb}$, $\mathbf{cv}' = (\mathbf{lb} + \mathbf{ub})/2$, and $FEsUsed = 0$;
2. Assign all decision variables to $X_u$;
3. while $|X_u| \geq 1$
4.     Initialize $x_t$ with a random variable in $X_u$ and delete it from $X_u$;
5.     Locate the global optimum of $x_t$: $(x_t^*,  , FEsUsed') = \mathrm{TLPR}(x_t, \mathbf{lb}, \mathbf{ub}, \mathbf{cv})$;
6.     Update $\mathbf{cv}$ and FEsUsed: $\mathbf{cv}(t) = x_t^*$ and $FEsUsed = FEsUsed + FEsUsed'$;
7.     if $|X_u| == 0$ then
8.         Update seps: $seps = seps \cup \{x_t\}$; break;
9.     Extract the variables interacting with $x_t$: $(X_t', FEsUsed') = \mathrm{DBTG}(x_t, X_u, x_t^*,  , \mathbf{cv}, \mathbf{cv}')$;
10.    Update FEsUsed: $FEsUsed = FEsUsed + FEsUsed'$;
11.    if $|X_t'| == 0$ then
12.        Update seps: $seps = seps \cup \{x_t\}$;
13.    else
14.        Update nonseps and $X_u$: $nonseps = nonseps \cup \{x_t \cup X_t'\}$ and $X_u = X_u \setminus X_t'$;
15. return seps, nonseps, $\mathbf{cv}$, and FEsUsed.
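The control flow of Algorithms 4 and 5 can be sketched in Python as below. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the surrogate-based steps (TLPR and DetectSep) are replaced by a hypothetical oracle `detect_sep` built from a known list of directly interacting variable pairs, so FEs are not modeled and only the number of separability detections is counted. The names `make_oracle`, `dbtg`, and `svg_grouping` are introduced here for illustration.

```python
from collections import deque
import random
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical stand-in for DetectSep: returns True when `target` is
# separable from every variable in `subset`.
SepOracle = Callable[[int, Sequence[int]], bool]


def make_oracle(interacting_pairs: Iterable[Tuple[int, int]]) -> SepOracle:
    """Build a toy oracle from a known list of directly interacting pairs."""
    edges = {frozenset(pair) for pair in interacting_pairs}

    def detect_sep(target: int, subset: Sequence[int]) -> bool:
        return all(frozenset((target, v)) not in edges for v in subset)

    return detect_sep


def dbtg(target: int, undetected: Sequence[int],
         detect_sep: SepOracle) -> Tuple[List[int], int]:
    """Return (variables interacting with target, number of detections)."""
    interacting: List[int] = []
    detections = 1
    if detect_sep(target, undetected):          # root node is separable
        return interacting, detections
    nodes = deque([list(undetected)])           # queue of nonseparable nodes
    while nodes:
        node = nodes.popleft()
        if len(node) == 1:                      # single variable: record it
            interacting.extend(node)
            continue
        half = len(node) // 2
        first, second = node[:half], node[half:]
        detections += 1
        if detect_sep(target, first):
            # The parent is nonseparable and the first child is separable,
            # so the second child is deduced to be nonseparable.
            nodes.append(second)
        else:
            nodes.append(first)
            detections += 1
            if not detect_sep(target, second):
                nodes.append(second)
    return interacting, detections


def svg_grouping(n: int, detect_sep: SepOracle,
                 rng: random.Random) -> Tuple[List[int], List[List[int]], int]:
    """Outer loop of Algorithm 5 without the TLPR step (oracle-based)."""
    undetected = list(range(n))
    seps: List[int] = []
    nonseps: List[List[int]] = []
    detections = 0
    while undetected:
        target = undetected.pop(rng.randrange(len(undetected)))
        if not undetected:                      # last remaining variable
            seps.append(target)
            break
        partners, used = dbtg(target, undetected, detect_sep)
        detections += used
        if not partners:
            seps.append(target)
        else:
            nonseps.append(sorted([target] + partners))
            undetected = [v for v in undetected if v not in partners]
    return seps, nonseps, detections


if __name__ == "__main__":
    # Target 0 interacts only with variable 4 among [1, 2, 3, 4].
    print(dbtg(0, [1, 2, 3, 4], make_oracle([(0, 4)])))  # ([4], 3)
```

In this toy case DBTG needs three detections (the root, the first half, and the first variable of the second half), whereas detecting every node of the same binary tree explicitly would take five, which matches the two saved detections described in the example above.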

3. Experimental studies

In this section, we first introduce the new benchmark suite designed for generally separable LSGO problems, and then sequentially evaluate the performance of SVG in terms of decomposition accuracy and efficiency, capability in enhancing the optimization performance of CC, and scalability by comparing it with six state-of-the-art decomposition methods on the new benchmark suite.

To date, several LSGO benchmark suites have been developed and have played important roles in evaluating the performance of LSGO algorithms [20, 29, 44]. Nevertheless, the separable functions in these suites are generally limited to additively separable ones, which cannot completely reflect the features of separable problems in practice and may mislead the development of decomposition algorithms. For example, the widely used CEC'2010 benchmark suite involves 18 fully or partially separable functions, 15 of which are additively separable [44]. To this end, we design a more general benchmark suite in this study. The new benchmark suite adopts a function generation method similar to that of the CEC'2010 suite [44], but introduces two more generally separable functions, i.e., the Exponential function and the Ridge function, as basic functions for generating new benchmark functions. All the basic functions involved are listed as follows:

1) The $n$-dimensional additively separable Elliptic function: $f_{\mathrm{elli}}(\mathbf{x}) = \sum_{i=1}^{n} (10^6)^{\frac{i-1}{n-1}} x_i^2$, $\mathbf{x} \in [-100, 100]^n$;

2) The $n$-dimensional additively separable Rastrigin's function: $f_{\mathrm{rast}}(\mathbf{x}) = \sum_{i=1}^{n} [x_i^2 - 10\cos(2\pi x_i) + 10]$, $\mathbf{x} \in [-5, 5]^n$;

3) The $n$-dimensional generally separable Exponential function: 1( ) 200 200 ( ) nexpo ii if exp xn n x , $\mathbf{x} \in [-32, 32]^n$;

4) The $n$-dimensional generally separable Ackley function: n nackl i ii i f x x en n x , $\mathbf{x} \in [-32, 32]^n$;

5) The $n$-dimensional generally separable Ridge function: ( ) nridg ii f n x x , $\mathbf{x} \in [-100, 100]^n$;

6) The $n$-dimensional nonseparable Schwefel function: $f_{\mathrm{schw}}(\mathbf{x}) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2$, $\mathbf{x} \in [-100, 100]^n$.

Table 1 presents the details of the 21 generated benchmark functions, which can be divided into five categories, including fully separable functions, partially separable ones, and nonseparable ones. The symbol $\mathbf{o}$ denotes the global optimum satisfying $f(\mathbf{o}) = 0$, $\mathbf{z}$ represents the translation of a candidate solution $\mathbf{x}$ from $\mathbf{o}$, the subscript Rot denotes the coordinate rotation operator that makes the corresponding variables nonseparable, and $m$ specifies the size of a nonseparable variable subcomponent. The problem dimension $n$ and the subcomponent size $m$ are set to 1000 and 50, respectively. Overall, nine functions, including $f_3$-$f_5$, $f_8$-$f_{10}$, and $f_{13}$-$f_{15}$, are generally separable but not additively separable.

Table 1.

Details of the 21 new benchmark functions.

Fully separable functions, with $\mathbf{z} = \mathbf{x} - \mathbf{o}$:
$f_1(\mathbf{x}) = f_{\mathrm{elli}}(\mathbf{z})$
$f_2(\mathbf{x}) = f_{\mathrm{rast}}(\mathbf{z})$
$f_3(\mathbf{x}) = f_{\mathrm{expo}}(\mathbf{z})$
$f_4(\mathbf{x}) = f_{\mathrm{ackl}}(\mathbf{z})$
$f_5(\mathbf{x}) = f_{\mathrm{ridg}}(\mathbf{z})$

Partially separable functions with a single $m$-dimensional subcomponent:
$f_6(\mathbf{x}) = f_{\mathrm{elli,Rot}}(\mathbf{z}(1:m)) + f_{\mathrm{elli}}(\mathbf{z}(m+1:n))$
$f_7(\mathbf{x}) = f_{\mathrm{rast,Rot}}(\mathbf{z}(1:m)) + f_{\mathrm{rast}}(\mathbf{z}(m+1:n))$
$f_8(\mathbf{x}) = f_{\mathrm{expo,Rot}}(\mathbf{z}(1:m)) + f_{\mathrm{expo}}(\mathbf{z}(m+1:n))$
$f_9(\mathbf{x}) = f_{\mathrm{ackl,Rot}}(\mathbf{z}(1:m)) + f_{\mathrm{ackl}}(\mathbf{z}(m+1:n))$
$f_{10}(\mathbf{x}) = f_{\mathrm{schw}}(\mathbf{z}(1:m)) + f_{\mathrm{ridg}}(\mathbf{z}(m+1:n))$

Partially separable functions with $n/(2m)$ $m$-dimensional subcomponents:
$f_{11}(\mathbf{x}) = \sum_{i=1}^{n/(2m)} f_{\mathrm{elli,Rot}}(\mathbf{z}((i-1)m+1:im)) + f_{\mathrm{elli}}(\mathbf{z}(n/2+1:n))$
$f_{12}(\mathbf{x}) = \sum_{i=1}^{n/(2m)} f_{\mathrm{rast,Rot}}(\mathbf{z}((i-1)m+1:im)) + f_{\mathrm{rast}}(\mathbf{z}(n/2+1:n))$
$f_{13}(\mathbf{x}) = \sum_{i=1}^{n/(2m)} f_{\mathrm{expo,Rot}}(\mathbf{z}((i-1)m+1:im)) + f_{\mathrm{expo}}(\mathbf{z}(n/2+1:n))$
$f_{14}(\mathbf{x}) = \sum_{i=1}^{n/(2m)} f_{\mathrm{ackl,Rot}}(\mathbf{z}((i-1)m+1:im)) + f_{\mathrm{ackl}}(\mathbf{z}(n/2+1:n))$
$f_{15}(\mathbf{x}) = \sum_{i=1}^{n/(2m)} f_{\mathrm{schw}}(\mathbf{z}((i-1)m+1:im)) + f_{\mathrm{ridg}}(\mathbf{z}(n/2+1:n))$

Partially separable functions with $n/m$ $m$-dimensional subcomponents:
$f_{16}(\mathbf{x}) = \sum_{i=1}^{n/m} f_{\mathrm{elli,Rot}}(\mathbf{z}((i-1)m+1:im))$
$f_{17}(\mathbf{x}) = \sum_{i=1}^{n/m} f_{\mathrm{rast,Rot}}(\mathbf{z}((i-1)m+1:im))$
$f_{18}(\mathbf{x}) = \sum_{i=1}^{n/m} f_{\mathrm{expo,Rot}}(\mathbf{z}((i-1)m+1:im))$
$f_{19}(\mathbf{x}) = \sum_{i=1}^{n/m} f_{\mathrm{ackl,Rot}}(\mathbf{z}((i-1)m+1:im))$
$f_{20}(\mathbf{x}) = \sum_{i=1}^{n/m} f_{\mathrm{schw}}(\mathbf{z}((i-1)m+1:im))$

Nonseparable function:
$f_{21}(\mathbf{x}) = f_{\mathrm{schw}}(\mathbf{z})$

.2 Investigation of decomposition performance

To reveal the decomposition performance of SVG, we experimentally compared it with six state-of-the-art decomposition algorithms, including DG, DG2, RDG, EVIID, VIL, and FVIS, whose main ideas have been briefly introduced in Section 2.2. For a fair comparison, the parameters involved in each competitor were set according to the suggestions in the corresponding original paper. The decomposition efficiency and accuracy of each algorithm were quantified by the number of consumed FEs and an indicator named normalized mutual information (NMI), respectively. The latter is defined as follows [7]:

Definition 3. For an $n$-dimensional problem $f(\mathbf{x})$, suppose $D = \{X_1, \ldots, X_k\}$ and $D' = \{X'_1, \ldots, X'_{k'}\}$ are the ideal decomposition and the one generated by a decomposition algorithm, respectively. The NMI indicator between $D$ and $D'$ can be formulated as

$$\mathrm{NMI}(D, D') = \frac{-2 \sum_{i=1}^{k} \sum_{j=1}^{k'} M_{ij} \log_2 \left( \frac{n M_{ij}}{|X_i| |X'_j|} \right)}{\sum_{i=1}^{k} |X_i| \log_2 \left( \frac{|X_i|}{n} \right) + \sum_{j=1}^{k'} |X'_j| \log_2 \left( \frac{|X'_j|}{n} \right)}, \quad (8)$$

where $M$ denotes the confusion matrix with each element $M_{ij}$ representing the number of common variables in $X_i$ and $X'_j$. This metric is derived from mutual information, and its value lies in the range $[0, 1]$. NMI quantifies the consistency between $D$ and $D'$: the more consistent they are, the larger $\mathrm{NMI}(D, D')$ is. In particular, if $D'$ is the same as $D$, then $\mathrm{NMI}(D, D') = 1$.

Table 2 lists the decomposition results of SVG and its six competitors on the developed benchmark suite. For a more in-depth analysis of the decomposition accuracy of each algorithm, we separately calculated its NMI values on the separable and nonseparable variables, which are denoted by  and  in Table 2, respectively. For SVG, the entry dis shows the Euclidean distance between the subcomponent of the final $\mathbf{cv}$ w.r.t. the identified separable variables and the corresponding real optima, and thus quantifies the performance of the developed TLPR scheme. The bottom row of Table 2 averages the results of each algorithm over all the functions. According to

Table 2, the following comparative analyses can be made:

1) Decomposition accuracy: For separable variables, SVG achieves 100% decomposition accuracy (  ) on 12 of the 15 functions containing separable variables and nearly 100% accuracy on the other three functions, i.e., $f_4$, $f_9$, and $f_{14}$. These three functions are all variants of the Ackley function, where each decision variable involves many local optima. The imperfection of SVG on them arises because the TLPR scheme occasionally mistakes local optima of some target variables for the global ones, leading to improper decompositions. As discussed above, DG and its variants achieve satisfactory accuracy on additively separable functions but lose their competitiveness on most non-additively separable ones. For example, DG2, RDG, and EVIID all perform poorly on f - f and f - f . However, it is somewhat counterintuitive that DG achieves 100%  on f and f , RDG achieves 100%  on f , and EVIID achieves 100%  on f - f . A closer inspection reveals that their success on these non-additively separable functions can be attributed to large threshold values, which unintentionally make Criterion 1 correctly identify general separability. As for VIL and FVIS, they correctly identify most separable variables, but also make some mistakes on the variants of the Ackley function.

Table 2. Decomposition results of the seven decomposition algorithms on 1000-dimensional benchmark functions.

Func. DG DG2 RDG EVIID VIL FVIS SVG   FEsUsed   FEsUsed   FEsUsed   FEsUsed   FEsUsed   FEsUsed   FEsUsed dis f        f         f         f         f         f f

100 100 9.07e+05 100 100 5.01e+05 100 100 4.23e+03 100 100 3.08e+03 99.62 0.00 1.80e+06 100 0.00 5.39e+04 100 100 2.02e+05 1.62e  f

100 100 9.06e+05 2.85 100 5.01e+05 8.08 100 9.12e+03 100 100 3.77e+04 100 0.00 6.60e+04 100 0.00 1.43e+05 100 100 1.93e+05 1.55e  f

100 100 9.08e+05 15.30 100 5.01e+05 100 100 5.08e+04 100 100 5.97e+03 97.57 0.00 1.80e+06 100 100 5.25e+04 99.69 100 1.91e+05 1.26e  f  f

100 100 2.70e+05 100 100 5.01e+05 100 100 1.39e+04 100 100 7.66e+03 98.17 38.62 1.80e+06 100 60.35 1.40e+06 100 100 1.23e+05 0.00e+00 f

100 100 2.71e+05 100 100 5.01e+05 100 100 1.43e+04 100 100 8.74e+03 99.18 54.40 1.80e+06 100 97.70 1.25e+05 100 100 1.21e+05 1.14e  f  f

100 99.79 2.85e+05 0.00 100 5.01e+05 0.00 100 1.35e+04 0.00 100 6.97e+03 98.05 51.15 1.81e+06 100 100 1.09e+05 98.48 100 1.10e+05 5.43e  f  f 

100 2.10e+04 

100 5.01e+05 

100 2.09e+04 

100 1.25e+04   

100 2.69e+04  f 

100 2.10e+04 

100 5.01e+05 

100 2.08e+04 

100 1.23e+04   

100 2.53e+04  f  

100 5.01e+05 

100 2.08e+04 

100 1.26e+04   

100 2.55e+04  f  

100 5.01e+05 

100 2.07e+04 

100 1.25e+04   

100 2.49e+04  f 

100 2.10e+04 

100 5.01e+05 

100 2.07e+04 

100 1.36e+04   

100 3.01e+04  f 

100 2.00e+03 

100 5.01e+05 

100 6.00e+03 

100 4.01e+03 

100 1.00e+05  

100 6.41e+03  Avg. 75.7 99.81 3.74e+05 43.27 100 5.01e+05 47.25 100 1.33e+04 60.0 100 1.19e+04 99.44 48.03 1.31e+06 100 54.38 6.21e+05 99.85 100 1.30e+05 4.71e-10
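The NMI indicator of Definition 3, which underlies the accuracy columns of Table 2, can be computed as in the following sketch. It assumes that both decompositions are given as lists of disjoint variable-index sets covering the same $n$ variables, that terms with $M_{ij} = 0$ contribute zero, and that two single-group decompositions are assigned the value 1; the last two conventions and the function name `nmi` are assumptions made here, not details taken from the paper.

```python
import math
from typing import List, Set


def nmi(ideal: List[Set[int]], found: List[Set[int]]) -> float:
    """NMI between an ideal decomposition and a generated one (Eq. (8))."""
    n = sum(len(group) for group in ideal)
    numerator = 0.0
    for x_i in ideal:
        for x_j in found:
            m_ij = len(x_i & x_j)            # entry of the confusion matrix
            if m_ij > 0:                     # 0 * log(0) is treated as 0
                numerator += m_ij * math.log2(n * m_ij / (len(x_i) * len(x_j)))
    denominator = (
        sum(len(g) * math.log2(len(g) / n) for g in ideal)
        + sum(len(g) * math.log2(len(g) / n) for g in found)
    )
    if denominator == 0.0:
        # Both decompositions consist of one group holding all variables.
        return 1.0
    return -2.0 * numerator / denominator


if __name__ == "__main__":
    ideal = [{0, 1}, {2, 3}]
    print(nmi(ideal, [{0, 1}, {2, 3}]))      # identical grouping
    print(nmi(ideal, [{0, 2}, {1, 3}]))      # grouping that shares no structure
```

For identical groupings the numerator and denominator coincide, so the indicator equals 1, which agrees with the property stated after Eq. (8).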

For nonseparable variables, SVG achieves 100% decomposition accuracy (  ) on all the 16 functions containing nonseparable variables. DG and its variants also perform well on most of these functions. The reason is that if two variables are generally nonseparable, they must also be additively nonseparable. As for VIL and FVIS, despite their competitiveness on separable variables, they achieve very low decomposition accuracy on nonseparable variables. This is because, for a pair of nonseparable variables, solutions sampled without care can hardly satisfy Criterion 2, which inevitably causes some interdependencies to be missed. To sum up, each of the six state-of-the-art algorithms shows superiority only on separable variables or only on nonseparable ones, while SVG performs consistently well on both types of variables. This merit mainly stems from two factors: 1) the developed Criterion 3 can correctly identify general separability beyond additive separability; 2) the algorithmic component TLPR can efficiently provide the global optima of variables required by Criterion 3. For the detailed decomposition results of SVG, readers can refer to

Table S1 in the supplementary.

2) Decomposition efficiency: SVG shows distinctly different FE consumptions on different function categories. On the functions with separable variables ($f_1$-$f_{15}$), the number of FEs it consumes varies in the range $[1 \times 10^5, 2 \times 10^5]$. These results are better than those of DG2 and VIL but worse than those of RDG and EVIID, whose FE consumption is an order of magnitude lower on average. The mediocre efficiency of SVG on these functions derives from the fact that it needs to frequently invoke TLPR to locate the global optima of separable variables. Fortunately, this cost is not in vain: the located global optima can be directly transferred to the optimization process of CC. The entry dis in Table 2 demonstrates that these optima are very close to the real ones. On the one hand, this verifies the effectiveness of TLPR. On the other hand, it means that separable variables need not be tackled in the optimization process of CC, which saves many FEs. We will further reveal this merit in the subsequent experiment. As for the functions without separable variables ($f_{16}$-$f_{21}$), SVG achieves competitive decomposition efficiency. Its FE consumption is much lower than those of DG2, FVIS, and VIL, and is similar to those of DG and RDG. SVG shows some inferiority to EVIID, an efficient decomposition algorithm developed most recently. Nevertheless, the numbers of FEs they consume still share the same order of magnitude.

To investigate the capability of SVG in enhancing the optimization performance of CC, we incorporated it into the DECC framework [50] and compared its final optimization results with those obtained with the above six competitors. DECC is a classic CC framework. It takes an improved DE algorithm called SaNSDE [51] as the sub-problem optimizer and optimizes all the sub-problems in a round-robin fashion. During the experiment, the parameters of DECC were set strictly following the original paper. According to the guidelines in [44], the maximum number of allowed FEs was set to  , covering the FEs required by both the decomposition and optimization processes. On each benchmark function, we independently ran each DECC algorithm 25 times and assessed its performance in terms of the median, mean, and standard deviation of the obtained optima. For statistical analysis, we first used the Kruskal-Wallis nonparametric one-way ANOVA test at a confidence level of 0.95 to determine whether at least one method shows distinct optimization performance, and then conducted a series of pairwise two-tailed Wilcoxon rank-sum tests at a significance level of 0.05 to compare the optimization results. Table 3 presents the final optimization results of the seven DECC algorithms, where the entries highlighted in bold represent the best optimization results. According to Table 3, we can make the following observations:

On the fully separable functions $f_1$-$f_5$, DECC-SVG shows excellent optimization performance. It obtains the optimal or a near-optimal solution for each function and significantly outperforms its six competitors. It is worth mentioning that for $f_1$-$f_3$ and $f_5$, DECC-SVG directly inherits the optimal solutions located by SVG and does not consume any FEs in the optimization process. As for $f_4$, although SVG misjudges a few variables to be nonseparable, the resulting sub-problem can be effortlessly solved by the optimizer SaNSDE owing to its low dimension (see Table S1 in the supplementary).

DECC-SVG also shows prominent superiority on the partially separable functions $f_6$-$f_{15}$ involving separable variables. For most of these benchmark functions, DECC-SVG beats each of its six competitors by multiple orders of magnitude in terms of both median and mean. In particular, it generates near-optimal solutions for f , f , f , and f , while the solutions obtained by each of the other six DECC algorithms are far from the optima. As the seven DECC algorithms share the same CC framework and optimizer, the success of DECC-SVG mainly stems from the high decomposition accuracy and efficiency of SVG, which enable DECC-SVG to tackle nonseparable variables with sufficient FEs.

As for the partially separable functions $f_{16}$-$f_{20}$ that involve no separable variables and the nonseparable function $f_{21}$, DECC-SVG also achieves competitive results in comparison with its six competitors. It shows almost the same optimization performance as DECC-DG, DECC-RDG, and DECC-EVIID. Such a result is readily understandable since the decomposition performance of SVG, DG, RDG, and EVIID on these functions differs only slightly.

Table 3. Optimization results of the seven DECC algorithms on 1000-dimensional benchmark functions when  FEs are allowed.

Func. Stats DECC-DG DECC-DG2 DECC-RDG DECC-EVIID DECC-VIL DECC-FVIS DECC-SVG f Median 3.48e+06 2.09e+06 9.24e+05 1.13e+06 1.51e+06 1.66e+06  Mean 1.22e+07 4.90e+06 1.78e+06 3.78e+06 3.33e+06 1.73e+06

Std 2.30e+07 8.48e+06 2.77e+06 8.75e+06 5.74e+06 1.23e+06  f Median 7.54e+03 7.22e+03 6.73e+03 6.75e+03 6.84e+03 6.81e+03  Mean 7.57e+03 7.28e+03 6.75e+03 6.81e+03 6.84e+03 6.85e+03

Std 2.23e+02 3.11e+02 3.22e+02 3.88e+02 1.97e+02 2.86e+02  f Median 1.66e+00 1.01e+00 9.35e-01 6.23e-01 8.67e-01 8.41e-01  Mean 2.14e+00 1.09e+00 1.08e+00 8.53e-01 9.29e-01 9.60e-01

Std 1.30e+00 6.14e-01 8.49e-01 7.17e-01 5.59e-01 7.11e-01  f Median 1.11e+01 1.08e+01 1.06e+01 1.04e+01 2.04e+01 1.06e+01

Mean 1.12e+01 1.09e+01 1.06e+01 1.04e+01 2.04e+01 1.07e+01

Std 5.99e-01 8.36e-01 5.75e-01 6.24e-01 6.99e-02 7.94e-01 f Median 1.86e+04 2.49e+04 1.32e+04 1.48e+04 1.80e+04 1.73e+04  Mean 2.31e+04 3.59e+04 1.74e+04 2.27e+04 2.28e+04 2.01e+04

Std 1.64e+04 2.85e+04 1.44e+04 2.24e+04 1.89e+04 1.22e+04  f Median 2.06e+11 2.90e+10 2.47e+10 2.36e+10 1.89e+13 1.02e+13

Mean 2.16e+11 3.66e+10 3.00e+10 3.11e+10 2.02e+13 1.17e+13

Std 1.14e+11 2.47e+10 2.44e+10 2.46e+10 8.56e+12 4.95e+12 f Median

Mean

Std f Median 5.47e+00 6.00e+00 4.16e+00 3.11e+00 4.38e+01 6.39e+01

Mean 5.51e+00 6.24e+00 4.22e+00 3.01e+00 4.38e+01 6.52e+01

Std 9.60e-01 1.26e+00 1.07e+00 9.72e-01 3.67e+00 6.72e+00 f Median

Mean

Std f Median 2.01e+05 8.17e+04 1.28e+05 7.10e+04 2.17e+05 4.32e+09

Mean 1.99e+05 8.67e+04 1.20e+05 7.70e+04 2.19e+05 7.70e+09

Std 3.25e+04 2.47e+04 2.82e+04 3.13e+04 3.07e+04 6.31e+09 f Median 1.13e+07 1.51e+07 8.20e+06 7.94e+06 7.72e+09 1.11e+10

Mean 1.47e+07 1.64e+07 9.16e+06 1.22e+07 7.56e+09 1.08e+10

Std 1.19e+07 8.02e+06 3.30e+06 1.01e+07 9.20e+08 9.97e+08 f Median 5.71e+03 5.84e+03 5.61e+03 5.59e+03 1.37e+04 6.19e+03

Mean 5.69e+03 5.82e+03 5.63e+03 5.56e+03 1.36e+04 6.18e+03

Std 1.49e+02 1.80e+02 1.42e+02 1.59e+02 3.57e+02 1.71e+02 f Median 5.33e+00 6.68e+00 4.85e+00 5.66e+00 8.31e+02 7.76e+02

Mean 5.58e+00 6.78e+00 4.98e+00 5.63e+00 8.32e+02 7.80e+02

Std 1.27e+00 1.66e+00 1.59e+00 1.64e+00 2.12e+01 3.56e+01 f Median 2.93e+01 3.03e+01 3.02e+01 2.87e+01 2.29e+02 2.95e+01

Mean 2.92e+01 3.01e+01 2.97e+01 2.87e+01 2.29e+02 2.93e+01

Std 2.60e+00 2.49e+00 2.74e+00 2.30e+00 4.50e-01 2.72e+00 f Median 3.34e+04 4.47e+04 3.22e+04 3.50e+04 3.96e+05 1.26e+06

Mean 3.49e+04 4.38e+04 3.32e+04 3.49e+04 3.94e+05 1.28e+06

Std 8.90e+03 1.07e+04 9.10e+03 1.07e+04 3.73e+04 8.11e+04 f Median

Mean

Std f Median

Mean

Std f Median 2.12e-01 3.47e-01

Mean 2.14e-01 3.47e-01

Std 3.48e-02 6.95e-02 f Median f Median

Mean

Std f Median

Mean

Std

No. of the Best

7 3 8 8 1 2

The bottom row of Table 3 summarizes the number of functions on which each DECC algorithm achieves the best solution. It can be seen that DECC-SVG performs best on 20 of the 21 benchmark functions and has a clear edge over the other six algorithms. DECC-RDG and DECC-EVIID both exhibit the best optimization performance on 8 of the 21 functions and rank second. They mainly win their scores on f - f , where RDG and EVIID achieve excellent decomposition accuracy and efficiency. Compared with these two algorithms, DECC-DG loses its dominance on one function, i.e., f , where DG produces an improper decomposition. Due to the high FE requirement of DG2, DECC-DG2 shows unsatisfactory performance on most functions and only obtains competitive solutions for f , f , and f . As for DECC-VIL and DECC-FVIS, they perform poorly on most benchmark functions. Their failure on partially separable and nonseparable functions stems from the low capability of VIL and FVIS in identifying nonseparable variables. For fully separable functions, DECC-VIL and DECC-FVIS still show mediocre performance and generate almost the same optimization results as all the other algorithms except DECC-SVG. Such a result is somewhat counterintuitive since VIL and FVIS correctly identify most of the separable variables while DG and its variants misjudge some of them to be nonseparable. The reason is that how to reasonably group separable variables to achieve efficient optimization is still an open issue, and the DECC framework simply treats all the separable variables as a whole. SVG neatly avoids this difficulty by optimizing each separable variable with the TLPR scheme.

The above results indicate the superiority of SVG in enhancing the optimization performance of DECC when  FEs are allowed. Nevertheless, the available FEs for an LSGO problem are usually limited in practice. To further reveal the capability of SVG, we reduced the maximum number of allowed FEs to  and  , and repeated the optimization experiment. Based on the detailed results (see Table S2 and Table S3 in the supplementary), we ranked the seven DECC algorithms according to the two-tailed Wilcoxon rank-sum test on each benchmark function. Fig. 4 shows the final rankings with two radar charts, from which it can be seen that DECC-SVG still significantly outperforms its six competitors. It is ranked first on 20 and 19 of the 21 functions when  and  FEs are allowed, respectively.

Fig. 4. Rankings of the seven DECC algorithms according to the two-tailed Wilcoxon rank-sum test on each 1000-dimensional benchmark function when two different maximum numbers of FEs are allowed; panels (a) and (b) correspond to the two values of allowedFEs.

.4 Scalability studies

This experiment aims to study the scalability of SVG. To this end, we scaled the originally designed benchmark functions up to 2000 dimensions by resetting $n$ and $m$ to 2000 and 100, respectively, and re-conducted the above two sets of experiments on the new benchmark functions. Note that the maximum number of allowed FEs in this experiment was also doubled to  . Table 4 provides the decomposition results of DG, DG2, RDG, EVIID, VIL, FVIS, and SVG. It can be observed that each of them achieves decomposition accuracy similar to that on the 1000-dimensional functions. In particular, SVG correctly decomposes all the functions except for misjudging very few separable variables in f , f , and f , and shows a visible edge over its six competitors. For the detailed grouping results, readers can refer to Table S4 in the supplementary. As for the decomposition efficiency, the seven algorithms increase their FE consumptions to different degrees. By comparing the bottom rows of Tables 2 and 4, it can be seen that the numbers of FEs consumed by DG, DG2, and FVIS increase by an order of magnitude, while those consumed by RDG, EVIID, VIL, and SVG keep the same order of magnitude. A closer observation indicates that SVG merely doubles its number of FEs and thus demonstrates good scalability in problem decomposition. Table 5 reports the final optimization results of the seven DECC algorithms. It can be seen that DECC-SVG obtains the best solutions for 18 of the 21 functions and significantly outperforms DECC-DG, DECC-DG2, DECC-RDG, DECC-EVIID, DECC-VIL, and DECC-FVIS, which obtain the best solutions for 5, 2, 6, 8, 0, and 2 functions, respectively. For functions on which DECC-SVG shows superiority, it either achieves near-optimal solutions or outperforms most of its competitors by several orders of magnitude in terms of both median and mean. For functions on which DECC-SVG shows some inferiority, its results still share the same order of magnitude with those of the corresponding winners. Therefore, SVG also dramatically enhances the optimization performance of DECC on 2000-dimensional problems.
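The scalability comparison above can be checked with a few lines of arithmetic on the average FEsUsed values in the bottom rows of Tables 2 and 4. The sketch below only copies those averages from the two tables; the dictionary and function names are illustrative, and "order of magnitude" is interpreted here as the decimal exponent of the average, which is an assumption about how the comparison is meant.

```python
import math

# Average FEsUsed from the "Avg." rows of Table 2 (n = 1000) and Table 4 (n = 2000).
AVG_FES_1000 = {"DG": 3.74e5, "DG2": 5.01e5, "RDG": 1.33e4, "EVIID": 1.19e4,
                "VIL": 1.31e6, "FVIS": 6.21e5, "SVG": 1.30e5}
AVG_FES_2000 = {"DG": 1.50e6, "DG2": 2.00e6, "RDG": 6.02e4, "EVIID": 1.64e4,
                "VIL": 2.78e6, "FVIS": 3.49e6, "SVG": 2.57e5}


def growth_ratio(name: str) -> float:
    """Factor by which the average FE consumption grows from 1000-D to 2000-D."""
    return AVG_FES_2000[name] / AVG_FES_1000[name]


def exponent_increases(name: str) -> bool:
    """True when the decimal exponent of the average FE count goes up."""
    before = math.floor(math.log10(AVG_FES_1000[name]))
    after = math.floor(math.log10(AVG_FES_2000[name]))
    return after > before


if __name__ == "__main__":
    for name in AVG_FES_1000:
        print(f"{name}: x{growth_ratio(name):.2f}, "
              f"exponent increases: {exponent_increases(name)}")
```

With these values the growth factor of SVG is about 1.98, i.e., its FE consumption roughly doubles when the dimension doubles, while only DG, DG2, and FVIS move to a higher decimal exponent.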

Table 4.

Decomposition results of the seven decomposition algorithms on the 2000-dimensional benchmark functions.

Func. DG DG2 RDG EVIID VIL FVIS SVG   FEsUsed   FEsUsed   FEsUsed   FEsUsed   FEsUsed   FEsUsed   FEsUsed dis f        f        f        f        f        f f

100 100 3.62e+06 100 100 2.00e+06 100 100 8.36e+03 100 100 6.13e+03 97.18 0.00 3.61e+06 100 0.00 9.70e+04 100 100 4.01e+05 2.37e-14 f

100 100 3.62e+06 9.73 100 2.00e+06 23.65 100 1.19e+05 100 100 3.49e+04 99.32 0.00 3.60e+06 100 0.00 3.71e+05 100 100 3.83e+05 7.85e-10 f

100 100 3.62e+06 35.70 100 2.00e+06 100 100 5.92e+04 100 100 9.87e+03 97.63 0.00 3.60e+06 100 100 1.01e+05 99.95 100 3.84e+05 1.00e+00 f f

100 100 1.05e+06 100 100 2.00e+06 100 100 2.81e+04 100 100 1.51e+04 99.14 49.32 3.60e+06 100 53.44 7.14e+06 100 100 2.41e+05 0.00e+00 f

100 100 1.05e+06 100 100 2.00e+06 100 100 2.81e+04 100 100 1.69e+04 99.14 49.65 3.60e+06 100 97.10 2.55e+05 100 100 2.37e+05 1.67e-14 f

100 87.75 1.13e+06 0.00 100 2.00e+06 0.00 100 2.66e+04 0.00 100 1.69e+04 99.70 51.64 3.60e+06 100 50.37 2.28e+06 100 100 2.25e+05 1.94e-05 f

100 99.88 1.05e+06 0.00 100 2.00e+06 0.00 100 2.66e+04 0.00 100 1.42e+04 98.57 47.24 3.60e+06 100 100 2.21e+05 99.56 100 2.24e+05 1.52e-10 f f 

100 4.20e+04 

100 2.00e+06 

100 4.18e+04 

100 2.52e+04   

100 4.80e+04  f 

100 4.20e+04 

100 2.00e+06 

100 4.19e+04 

100 2.48e+04   

100 4.63e+04  f  

100 2.00e+06 

100 4.15e+04 

100 2.53e+04   

100 4.58e+04  f  

100 2.00e+06 

100 4.18e+04 

100 2.48e+04  

100 2.77e+05 

100 4.58e+04  f 

100 4.20e+04 

100 2.00e+06 

100 4.18e+04 

100 2.79e+04   

100 5.12e+04  f 

100 4.00e+03 

100 2.00e+06 

100 1.20e+04 

100 8.00e+03 

100 1.95e+05  

100 1.24e+04  Avg. 76.57 98.68 1.50e+06 47.39 100 2.00e+06 54.91 100 6.02e+04 60 100 1.64e+04 99.06 41.54 2.78e+06 100 51.83 3.49e+06 99.96 100 2.57e+05 6.66e-02

Table 5.

Optimization results of the seven resulting DECC algorithms on the 2000-dimensional benchmark functions.

Func. Stats. DECC-DG DECC-DG2 DECC-RDG DECC-EVIID DECC-VIL DECC-FVIS DECC-SVG f Median 5.07e+08 1.45e+08 4.24e+07 5.06e+07 4.74e+07 5.23e+07  Mean 5.22e+08 1.90e+08 5.89e+07 6.32e+07 6.18e+07 8.39e+07

Std 1.63e+08 1.41e+08 5.18e+07 4.14e+07 3.80e+07 1.08e+08  f Median 2.18e+04 1.91e+04 1.76e+04 1.79e+04 1.77e+04 1.77e+04  Mean 2.16e+04 1.90e+04 1.75e+04 1.78e+04 1.77e+04 1.76e+04

Std 5.93e+02 1.06e+03 9.34e+02 3.55e+02 5.39e+02 9.75e+02  f Median 9.78e-01 5.38e-01 3.34e-01 3.07e-01 3.28e-01 3.34e-01  Mean 9.61e-01 5.26e-01 3.35e-01 3.06e-01 3.31e-01 3.32e-01

Std 8.28e-02 6.81e-02 8.30e-02 7.63e-02 7.20e-02 7.37e-02  f Median 1.33e+01 1.26e+01 1.22e+01 1.23e+01 2.07e+01 1.22e+01

Mean 1.33e+01 1.27e+01 1.22e+01 1.23e+01 2.07e+01 1.22e+01

Std 2.55e-01 3.56e-01 3.12e-01 2.73e-01 2.58e-02 3.11e-01 f Median 7.30e+05 5.43e+05 3.24e+05 3.33e+05 3.57e+05 3.25e+05  Mean 7.30e+05 5.50e+05 3.08e+05 3.30e+05 3.57e+05 3.24e+05

Std 7.50e+04 8.45e+04 7.13e+04 4.99e+04 6.05e+04 7.56e+04  f Median 1.48e+12 5.12e+11 3.25e+11 3.48e+11 5.22e+14 2.43e+13

Mean 1.53e+12 5.77e+11 3.59e+11 3.79e+11 5.36e+14 2.48e+13

Std 3.96e+11 2.97e+11 1.48e+11 1.22e+11 1.27e+14 6.80e+12 f Median 3.21e+08 3.05e+08 2.96e+08 f Median 2.11e+01 5.38e+00 3.39e+00 7.04e-01 2.13e+06 3.85e+01

Mean 2.55e+01 8.53e+00 4.25e+00 6.99e-01 2.10e+06 5.09e+01

Std 2.15e+01 8.97e+00 3.34e+00 8.14e-02 2.79e+05 3.85e+01 f Median

Mean

Std f Median 1.05e+06 2.65e+05 6.42e+05 5.88e+05 1.48e+09 3.97e+06

Mean 7.29e+06 2.75e+05 6.55e+05 6.05e+05 2.26e+09 1.42e+07

Std 3.12e+07 5.84e+04 7.72e+04 8.53e+04 2.19e+09 5.13e+07 f Median 1.71e+08 2.72e+08 1.29e+08 1.18e+08 3.06e+10 9.63e+11

Mean 2.00e+08 3.29e+08 1.43e+08 1.35e+08 3.08e+10 9.70e+11

Std 9.15e+07 2.07e+08 6.08e+07 4.76e+07 2.68e+09 2.15e+10 f Median 1.52e+04 1.58e+04 1.44e+04 1.45e+04 2.95e+04 1.64e+04

Mean 1.51e+04 1.58e+04 1.44e+04 1.45e+04 2.96e+04 1.64e+04

Std 2.48e+02 2.41e+02 3.04e+02 3.63e+02 4.82e+02 2.34e+02 f Median 3.31e+00 1.53e+00 1.12e+00 1.04e+00 9.93e+01 8.32e+01

Mean 3.27e+00 1.57e+00 1.13e+00 1.03e+00 9.92e+01 8.38e+01

Std 1.94e-01 1.29e-01 1.43e-01 9.60e-02 1.83e+00 2.10e+00 f Median 5.40e+01 5.43e+01 5.35e+01 5.29e+01 2.29e+02 5.34e+01

Mean 5.38e+01 5.45e+01 5.34e+01 5.33e+01 2.29e+02 5.36e+01

Std 3.58e+00 2.57e+00 2.73e+00 2.30e+00 2.64e-01 2.83e+00 f Median 2.98e+05 3.90e+05 2.86e+05 2.86e+05 1.63e+06 2.62e+08

Mean 3.03e+05 3.86e+05 2.91e+05 2.89e+05 1.64e+06 3.28e+08

Std 2.88e+04 3.41e+04 3.85e+04 3.14e+04 8.12e+04 1.05e+08 f Median

Mean

Std f Median f Median

Mean

Std f Median

Mean

Std f Median

Mean

Std f Median

Mean

Std

No. of the Best

5 2 6 8 0 2 18

. Conclusion

In this paper, we present a novel decomposition algorithm named SVG for CC. This algorithm can efficiently decompose general LSGO problems with high accuracy, which benefits from the newly designed separability detection criterion. By checking whether the global optimum of a variable depends on other variables, this criterion is consistent with the definition of general separability and extends SVG beyond additive separability. To ensure decomposition efficiency, SVG employs a two-layer polynomial regression scheme and a dynamic-binary-tree-based variable grouping procedure. The former locates the global optimum of a variable required by the separability detection criterion with a small number of FEs, and the latter systematically reuses historical separability detection information and thus reduces the number of detections. Extensive experimental results on a newly designed benchmark suite for generally separable LSGO problems reveal that, compared with six state-of-the-art decomposition algorithms, SVG offers impressive accuracy, efficiency, scalability, and capability in enhancing the optimization performance of CC. It is worth mentioning that this study does not yet take indirect interdependency into account, since how to group indirectly nonseparable variables, which may result in high-dimensional sub-problems, is still an open issue [42]. We will improve SVG to deal with this special case in our future work. Besides, we will further verify the performance of SVG by applying it to real-world LSGO problems.

CRediT authorship contribution statement

An Chen: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft, Writing - review & editing.

Zhigang Ren: Conceptualization, Methodology, Visualization, Investigation, Writing - review & editing.

Muyi Wang: Software, Validation, Visualization, Investigation, Writing - review & editing.

Yongsheng Liang: Software, Investigation, Writing - review & editing.

Hanqing Liu: Writing - review & editing.

Wenhao Du: Visualization, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant number 61873199), the Natural Science Basic Research Plan in Shaanxi Province of China (grant number 2020JM-059), and the Fundamental Research Funds for the Central Universities (grant number xzy022020057).

References

[1] F. V. D. Bergh, A. P. Engelbrecht, A cooperative approach to particle swarm optimization, IEEE Trans. Evol. Comput. 8 (3) (2004) 225-239, doi: https://doi.org/10.1109/TEVC.2004.826069.

[2] F. Caraffini, F. Neri, G. Iacca, Large scale problems in practice: the effect of dimensionality on the interaction among variables, in: Proceedings of the 20th European Conference on the Applications of Evolutionary Computation, 2017, pp. 636-652, doi: https://doi.org/10.1007/978-3-319-55849-3_41.

[3] W. Chen, Y. Jia, F. Zhao, X. Luo, X. Jia, J. Zhang, A cooperative co-evolutionary approach to large-scale multisource water distribution network optimization, IEEE Trans. Evol. Comput. 23 (5) (2019) 842-857, doi: https://doi.org/10.1109/TEVC.2019.2893447.

[4] A. Chen, Z. Ren, Y. Yang, Y. Liang, B. Pang, A historical interdependency based differential grouping algorithm for large scale global optimization, in: Proceedings of the Genetic and Evolutionary Computation Conference Companion, 2018, pp. 1711-1715, doi: https://doi.org/10.1145/3205651.3208278.

[5] W. Chen, T. Weise, Z. Yang, K. Tang, Large-scale global optimization using cooperative coevolution with variable interaction learning, in: Proceedings of the Conference on Parallel Problem Solving from Nature, 2010, pp. 300-309, doi: https://doi.org/10.1007/978-3-642-15871-1_31.

[6] A. Chen, Y. Zhang, Z. Ren, Y. Yang, Y. Liang, B. Pang, A global information based adaptive threshold for grouping large scale optimization problems, in: Proceedings of the Genetic and Evolutionary Computation Conference Companion, 2018, pp. 833-840, doi: https://doi.org/10.1145/3205455.3205641.

[7] L. Danon, A. Díaz-Guilera, J. Duch, A. Arenas, Comparing community structure identification, J. Stat. Mech.: Theory Exp. 2005 (09) P09008, doi: https://doi.org/10.1088/1742-5468/2005/09/P09008.

[8] I. D. Falco, A. D. Cioppa, G. A. Trunfio, Investigating surrogate-assisted cooperative coevolution for large-scale global optimization, Inf. Sci. 482 (2019) 1-26, doi: https://doi.org/10.1016/j.ins.2019.01.009.

[9] H. Ge, L. Sun, G. Tan, Z. Chen, C. L. P. Chen, Cooperative hierarchical PSO with two stage variable interaction reconstruction for large scale optimization, IEEE Trans. Cyber. 47 (9) (2017) 2809-2823, doi: https://doi.org/10.1109/TCYB.2017.2685944.

[10] H. Ge, L. Sun, X. Yang, S. Yoshida, Y. Liang, Cooperative differential evolution with fast variable interdependence learning and cross-cluster mutation, Appl. Soft Comput. 36 (2015) 300-314, doi: https://doi.org/10.1016/j.asoc.2015.07.016.

[11] D. Guo, Z. Ren, Y. Liang, A. Chen, Scaling up radial basis function for high-dimensional expensive optimization using random projection, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2020, pp. 1-8, doi: https://doi.org/10.1109/CEC48606.2020.9185699.

[12] X. Hu, F. He, W. Chen, J. Zhang, Cooperation coevolution with fast interdependency identification for large scale optimization, Inf. Sci. 381 (2017) 142-160, doi: https://doi.org/10.1016/j.ins.2016.11.013.

[13] J. Jian, Z. Zhan, J. Zhang, Large-scale evolutionary optimization: A survey and experimental comparative study, Int. J. Mach. Learn. Cyber. 11 (2020) 729-745, doi: https://doi.org/10.1007/s13042-019-01030-4.

[14] Y. Jin, Surrogate-assisted evolutionary computation: Recent advances and future challenges, Swarm Evol. Comput. 1 (2) (2011) 61-70, doi: https://doi.org/10.1016/j.swevo.2011.05.001.

[15] Y. Jin, H. Wang, T. Chugh, D. Guo, K. Miettinen, Data-driven evolutionary optimization: An overview and case studies, IEEE Trans. Evol. Comput. 23 (3) (2019) 442-458, doi: https://doi.org/10.1109/TEVC.2018.2869001.

[16] A. Kabán, J. Bootkrajang, R. J. Durrant, Toward large-scale continuous EDA: A random matrix theory perspective, Evol. Comput. 24 (2) (2016) 255-291, doi: https://doi.org/10.1162/EVCO_a_00150.

[17] K. S. Kim, Y. S. Choi, An efficient variable interdependency-identification and decomposition by minimizing redundant computations for large-scale global optimization, Inf. Sci. 513 (2020) 289-323, doi: https://doi.org/10.1016/j.ins.2019.10.049.

[18] A. LaTorre, S. Muelas, J. Peña, Large scale global optimization: Experimental results with MOS-based hybrid algorithms, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2013, pp. 2742-2749, doi: https://doi.org/10.1109/CEC.2013.6557901.

[19] L. Li, L. Jiao, R. Stolkin, F. Liu, Mixed second order partial derivatives decomposition method for large scale optimization, Appl. Soft Comput. 61 (2017) 1013-1021, doi: https://doi.org/10.1016/j.asoc.2017.08.025.

[20] X. Li, K. Tang, M. N. Omidvar, Z. Yang, K. Qin, Benchmark functions for the CEC’2013 special session and competition on large-scale global optimization, Tech. rep., RMIT University, Melbourne, Australia, 2013.

[21] X. Li, X. Yao, Cooperatively coevolving particle swarms for large scale optimization, IEEE Trans. Evol. Comput. 16 (2) (2012) 210-224, doi: https://doi.org/10.1109/TEVC.2011.2112662.

[22] X. Ma, X. Li, Q. Zhang, K. Tang, Z. Liang, W. Xie, Z. Zhu, A survey on cooperative co-evolutionary algorithms, IEEE Trans. Evol. Comput. 23 (3) (2019) 421-441, doi: https://doi.org/10.1109/TEVC.2018.2868770.

[23] S. Mahdavi, M. E. Shiri, S. Rahnamayan, Metaheuristics in large-scale global continues optimization: A survey, Inf. Sci. 295 (2015) 407-428, doi: https://doi.org/10.1016/j.ins.2014.10.042.

[24] Y. Mei, M. N. Omidvar, X. Li, X. Yao, A competitive divide-and-conquer algorithm for unconstrained large-scale black-box optimization, ACM Trans. Math. Soft. 42 (2) (2016) 13, doi: https://doi.org/10.1145/2791291.

[25] D. Molina, A. LaTorre, F. Herrera, SHADE with iterative local search for large-scale global optimization, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2018, pp. 1-8, doi: https://doi.org/10.1109/CEC.2018.8477755.

[26] D. Molina, M. Lozano, F. Herrera, MA-SW-Chains: Memetic algorithm based on local search chains for large scale continuous global optimization, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2010, pp. 1-8, doi: https://doi.org/10.1109/CEC.2010.5586034.

[27] J. Nocedal, S. J. Wright, Numerical optimization (2nd edition), Springer, New York, 2006, doi: https://doi.org/10.1007/978-0-387-40065-5.

[28] M. N. Omidvar, X. Li, Y. Mei, X. Yao, Cooperative co-evolution with differential grouping for large scale optimization, IEEE Trans. Evol. Comput. 18 (3) (2014) 378-393, doi: https://doi.org/10.1109/TEVC.2013.2281543.

[29] M. N. Omidvar, X. Li, K. Tang, Designing benchmark problems for large-scale continuous optimization, Inf. Sci. 316 (2015) 419-436, doi: https://doi.org/10.1016/j.ins.2014.12.062.

[30] M. N. Omidvar, X. Li, X. Yao, Cooperative co-evolution with delta grouping for large scale non-separable function optimization, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2010, pp. 1-8, doi: https://doi.org/10.1109/CEC.2010.5585979.

[31] M. N. Omidvar, M. Yang, Y. Mei, X. Li, X. Yao, DG2: A faster and more accurate differential grouping for large-scale black-box optimization, IEEE Trans. Evol. Comput. 21 (6) (2017) 929-942, doi: https://doi.org/10.1109/TEVC.2017.2694221.

[32] L. Panait, Theoretical convergence guarantees for cooperative coevolutionary algorithms, Evol. Comput. 18 (4) (2010) 581-615, doi: https://doi.org/10.1162/EVCO_a_00004.

[33] M. A. Potter, K. A. D. Jong, A cooperative coevolutionary approach to function optimization, in: Proceedings of the Third Conference on Parallel Problem Solving from Nature, 1994, pp. 249-257, doi: https://doi.org/10.1007/3-540-58484-6_269.

[34] M. A. Potter, K. A. D. Jong, Cooperative coevolution: An architecture for evolving coadapted subcomponents, Evol. Comput. 8 (1) (2000), doi: https://doi.org/10.1162/106365600568086.

[35] T. Ray, X. Yao, A cooperative coevolutionary algorithm with correlation based adaptive variable partitioning, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2009, pp. 983-989, doi: https://doi.org/10.1109/CEC.2009.4983052.

[36] Z. Ren, A. Chen, L. Wang, Y. Liang, B. Pang, An efficient vector-growth decomposition algorithm for cooperative coevolution in solving large scale problems, in: Proceedings of the Genetic and Evolutionary Computation Conference Companion, 2017, pp. 41-42, doi: https://doi.org/10.1145/3067695.3082048.

[37] Z. Ren, A. Chen, M. Wang, Y. Yang, Y. Liang, K. Shang, Bi-hierarchical cooperative coevolution for large scale global optimization, IEEE Access 8 (2020) 41913-41928, doi: https://doi.org/10.1109/ACCESS.2020.2976488.

[38] Z. Ren, Y. Liang, A. Zhang, Y. Yang, Z. Feng, L. Wang, Boosting cooperative coevolution for large scale optimization with a fine-grained computation resource allocation strategy, IEEE Trans. Cyber. 49 (12) (2019) 4180-4193, doi: https://doi.org/10.1109/TCYB.2018.2859635.

[39] Z. Ren, B. Pang, M. Wang, Z. Feng, Y. Liang, A. Chen, Y. Zhang, Surrogate model assisted cooperative coevolution for large scale optimization, Appl. Intel. 49 (2) (2019), doi: https://doi.org/10.1007/s10489-018-1279-y.

[40] Y. Sun, M. Kirley, S. K. Halgamuge, Quantifying variable interactions in continuous optimization problems, IEEE Trans. Evol. Comput. 21 (2) (2017) 249-264, doi: https://doi.org/10.1109/TEVC.2016.2599164.

[41] Y. Sun, M. Kirley, S. K. Halgamuge, A recursive decomposition method for large scale continuous optimization, IEEE Trans. Evol. Comput. 22 (5) (2018) 647-661, doi: https://doi.org/10.1109/TEVC.2017.2778089.

[42] Y. Sun, X. Li, A. Ernst, M. N. Omidvar, Decomposition for large-scale optimization problems with overlapping components, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2019, pp. 326-333, doi: https://doi.org/10.1109/CEC.2019.8790204.

[43] L. Sun, S. Yoshida, X. Cheng, Y. Liang, A cooperative particle swarm optimizer with statistical variable interdependence learning, Inf. Sci. 186 (1) (2012) 20-39, doi: https://doi.org/10.1016/j.ins.2011.09.033.

[44] K. Tang, X. Li, P. N. Suganthan, Z. Yang, T. Weise, Benchmark functions for the CEC’2010 special session and competition on large-scale global optimization, Tech. rep., USTC, Hefei, China, 2010.

[45] T. Weise, R. Chiong, K. Tang, Evolutionary optimization: Pitfalls and booby traps, J. Computer Sci. Technol. 27 (2012) 907-936, doi: https://doi.org/10.1007/s11390-012-1274-4.

[46] Q. Xu, M. L. Sanyang, A. Kabán, Large scale continuous EDA using mutual information, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2016, pp. 3718-3725, doi: https://doi.org/10.1109/CEC.2016.7744260.

[47] X. Xue, K. Zhang, R. Li, L. Zhang, C. Yao, J. Wang, J. Yao, A topology-based single-pool decomposition framework for large-scale global optimization, Appl. Soft Comput. 92 (2020) 106295, doi: https://doi.org/10.1016/j.asoc.2020.106295.

[48] T. Yang, A. A. Asanjan, M. Faridzad, N. Hayatbini, X. Gao, S. Sorooshian, An enhanced artificial neural network with a shuffled complex evolutionary global optimization with principal component analysis, Inf. Sci. 418-419 (2017) 302-316, doi: https://doi.org/10.1016/j.ins.2017.08.003.

[49] Q. Yang, W. Chen, J. Zhang, Evolution consistency based decomposition for cooperative coevolution, IEEE Access 6 (2018) 51084-51097, doi: https://doi.org/10.1109/ACCESS.2018.2869334.

[50] Z. Yang, K. Tang, X. Yao, Large scale evolutionary optimization using cooperative coevolution, Inf. Sci. 178 (15) (2008) 2985-2999, doi: https://doi.org/10.1016/j.ins.2008.02.017.

[51] Z. Yang, K. Tang, X. Yao, Self-adaptive differential evolution with neighborhood search, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2008, pp. 1110-1116, doi: https://doi.org/10.1109/CEC.2008.4630935.

[52] Z. Yang, K. Tang, X. Yao, Multilevel cooperative coevolution for large scale optimization, in: Proceedings of the IEEE Congress on Evolutionary Computation, 2008, pp. 1663-1670, doi: https://doi.org/10.1109/CEC.2008.4631014.

[53]