Epistocracy Algorithm: A Novel Hyper-heuristic Optimization Strategy for Solving Complex Optimization Problems
Seyed Ziae Mousavi Mojab, Seyedmohammad Shams, Hamid Soltanian-Zadeh, Farshad Fotouhi
Dept. of Computer Science, Wayne State University, Detroit, MI 48202, USA
Dept. of Radiology, Henry Ford Health System, Detroit, MI 48202, USA
Abstract.
This paper proposes a novel evolutionary algorithm called Epistocracy which incorporates human socio-political behavior and intelligence to solve complex optimization problems. The inspiration for the Epistocracy algorithm originates from a political regime where educated people have more voting power than the uneducated or less educated. The algorithm is a self-adaptive, multi-population optimizer in which the evolution process takes place in parallel for many populations led by a council of leaders. To avoid stagnation in poor local optima and to prevent premature convergence, the algorithm employs multiple mechanisms such as dynamic and adaptive leadership based on gravitational force, dynamic population allocation and diversification, variance-based step-size determination, and regression-based leadership adjustment. The algorithm uses a stratified sampling method called Latin Hypercube Sampling (LHS) to distribute the initial population more evenly for exploration of the search space and exploitation of the accumulated knowledge. To investigate the performance and evaluate the reliability of the algorithm, we have used a set of multimodal benchmark functions, and then applied the algorithm to the MNIST dataset to further verify the accuracy, scalability, and robustness of the algorithm. Experimental results show that the Epistocracy algorithm outperforms the tested state-of-the-art evolutionary and swarm intelligence algorithms in terms of performance, precision, and convergence.
Keywords:
Epistocracy Algorithm, Evolutionary Computation, Metaheuristic Optimization Algorithm, Multi-dimensional Search, Swarm Intelligence.

Introduction
Evolutionary computation (EC) is a subfield of artificial intelligence that encompasses methods mimicking mechanisms of biological evolution to solve various optimization problems. An optimization problem essentially requires finding a set of parameters x* = (x_1, ..., x_n) ∈ M of the current system such that a certain quantity f : M → ℝ is maximized (or minimized), i.e., ∀x ∈ M : f(x) ≤ f(x*). Over the past few decades, many state-of-the-art evolutionary algorithms such as the Genetic Algorithm (GA) and Evolutionary Strategies (ES) have been proposed for applications where a well-defined or closed-form solution does not exist [1]. The Genetic Algorithm was developed by John Holland in the early 1970s [2]-[4], mimicking the Darwinian theory of survival of the fittest, while Evolutionary Strategies were founded by Rechenberg and Schwefel in 1965 [5]-[7], based on the hypothesis that small mutations occur more commonly than large ones. Both Genetic Algorithms and Evolutionary Strategies rely on the concept of a population representing potential solutions to the optimization problem, which iteratively undergo genetic operators to improve their fitness scores. While Genetic Algorithms use a binary string of digits to represent solutions and use both mutation and recombination as genetic operators, Evolutionary Strategies use a fixed-length real-valued vector for representation and mutation as the primary search operator. In evolutionary algorithms, the recombination operator performs an information exchange, and the mutation operator generates variations of the solutions and increases the diversity of the population. The selection operator, however, enables better individuals to survive and reproduce. Another subset of nature-inspired algorithms is Swarm Intelligence (SI), which is based on the collective behavior of a decentralized, self-organizing network of agents such as bird flocks or honeybees.
In SI algorithms, multiple agents can locally interact and exchange heuristic information, which leads to the emergence of a global behavior of adaptive search and optimization. Particle Swarm Optimization (PSO) is an example of swarm intelligence proposed by Eberhart and Kennedy in 1995 [8]. This algorithm is inspired by the social behavior of bird flocking and fish schooling. Similar to GA, PSO is initialized with a population of random candidate solutions that are improved iteratively over time; however, unlike GA, it has no evolution operators such as recombination and mutation. Although PSO is a powerful and effective optimization technique, it still suffers from stagnation and premature convergence [9], [10]. Several solutions, including inertia weight and time-varying coefficients, have been proposed to eliminate these problems [11], [12]. The Artificial Bee Colony (ABC) is another popular swarm-intelligence-based algorithm, inspired by the foraging behavior of honeybees. ABC consists of three groups of bees: employed bees, onlookers, and scouts, each with a different role in the optimization process. ABC is simple, easy to implement, and highly flexible [13]. This algorithm was first proposed by Dervis Karaboga in 2005 [14] to optimize numerical problems. Since then, many variants of ABC have been introduced to increase population diversity and avoid premature convergence [15], [16]. The Cuckoo Search Algorithm (CSA) is one of the latest swarm-intelligence-based algorithms, developed by Yang and Deb in 2009 [17]. This algorithm is inspired by the natural behavior of cuckoos, which lay their eggs in other birds' nests for breeding. Compared to other approaches, Cuckoo Search requires fewer parameters to be fine-tuned. In 2018, Mareli et al.
[18] developed three new Cuckoo Search algorithms using linearly, exponentially, and power increasing switching parameters to maintain an optimal balance between local and global exploration and increase the efficiency of the CS algorithm. In 2019, Li et al. [19] proposed a new variant of CSA called the I-PKL-CS algorithm, which employs self-adaptive knowledge-learning strategies to mitigate premature convergence and the poor balance between exploitation and exploration. I-PKL-CS exploits individual and population knowledge learning to improve the quality of solutions and the convergence rate. There exist many real-world applications for EC. In [20], the Genetic Algorithm was used to decrease the dimension of the data and to optimize the weights and biases of the neural network in ECG signal classification. Xi et al. [21] used PSO to improve the performance of their neural network for assessing the hazard of earthquake-induced landslides. Kim et al. [22] used self-adaptive Evolutionary Strategies to optimize the parameters of an autonomous car controller. Prakash et al. [23] employed the Cuckoo Search algorithm to perform job scheduling and resource allocation on the grid. Yeh et al. [24] used ABC to optimize a bee recurrent neural network to generate a novel approximate model for predicting network reliability. The selection of an evolutionary approach can drastically reduce the amount of time needed to find an optimal solution. According to several studies, evolutionary algorithms in general suffer from various problems such as limited searching ability [25]-[27], the curse of dimensionality and poor scalability [28], [29], premature convergence and stagnation [30]-[33], and poor performance, which usually occur in the absence of population diversity and adaptability [34]-[36] and due to unbalanced exploration-exploitation capacities [37], [38].
The work reported in this paper was motivated by the fact that optimization algorithms require new explorative and exploitative capabilities, along with a dynamic resource allocation technique and diversification strategies, to help them converge to the optimal solution at the early stages of the optimization process. There is a need for a new generation of evolutionary algorithms that can avoid entrapment in local optima and prevent premature convergence [39], [30]. To find the optimal solution, these algorithms must employ a directed and goal-oriented search rather than a purely random and stochastic one. In this paper, we propose a new hyper-heuristic algorithm based on a political regime called Epistocracy, where educated people have more voting power (weight) than the uneducated or less educated. The Epistocracy algorithm splits the population into Governors and Citizens based on the performance of the individuals. The Citizens are assigned to Governors based on the degree of similarity and the exercise of free will. Once a Citizen is assigned a Governor, they move toward their Governor in an attempt to mimic some of the traits that made their Governor successful. Governors also try to improve themselves and lead their populations to collaboratively search for the optimal solution. The Epistocracy algorithm is a self-adaptive, multi-population optimizer in which the evolution process takes place in parallel for many populations led by a council of leaders. To avoid entrapment in poor local optima and to prevent premature convergence, the algorithm employs multiple mechanisms such as dynamic and adaptive leadership based on gravitational force, dynamic population allocation and diversification, variance-based step-size determination, and regression-based leadership adjustment.
The algorithm uses a stratified sampling method called Latin Hypercube Sampling (LHS) to distribute the initial population more evenly for exploration of the search space and exploitation of the accumulated knowledge. The rest of the paper is organized as follows. Section 2 describes the overall structure of the Epistocracy algorithm in detail. Experimental results and comparative studies on benchmark test functions, along with Convolutional Neural Network (CNN) parameter optimization, are presented in Section 3. Finally, conclusions and directions for future research are presented in Section 4.

Epistocracy Algorithm
Overview
The term Epistocracy is derived from the Greek word epistêmê, meaning knowledge, knowing, and understanding. John Stuart Mill (1806-1873), the British philosopher and political economist, proposed in his book "Mill on Bentham and Coleridge" to give more votes to the better educated [40]. Jason Brennan believes that more competent or knowledgeable citizens must have slightly more political power than less competent citizens [41]. The criticism, in fact, is that democracy eliminates the epistemic dimension of collective decision-making: while democracy is concerned with the input aspect of the decision-making process, Epistocracy is concerned with the output. The Epistocracy algorithm is a multi-population optimization algorithm that seeks to minimize the time taken to find an optimal value for the problem being solved. As an adaptive, hyper-heuristic algorithm, Epistocracy employs problem-related knowledge and globally aggregated statistics to automatically adjust itself during each run and to search through a space of meta-heuristics for the optimal solution. Epistocracy incorporates human socio-political behavior and intelligence to improve performance and convergence speed and to reduce the probability of getting trapped in local optima compared to other meta-heuristic algorithms.
Fig. 1.
Flow diagram of the Epistocracy algorithm.

As illustrated in Fig. 1, the Epistocracy algorithm consists of two primary components: Governors and Citizens. Citizens are individual solutions that are randomly and uniformly created. In each generation, all individuals are evaluated with a pre-defined fitness function. The top-performing individuals (Governors) are then selected through the Select() function to lead the population. Governors are, in fact, a network of cooperative leaders who influence and evolve the generation of the new population via the Lead() function. While Governors continuously improve themselves, Citizens can vote for Governors and affect their position in the government. Information is systematically propagated among Citizens and Governors. Fig. 2 shows the flowchart of the proposed algorithm.
Fig. 2.
Flowchart of the Epistocracy algorithm with all steps involved, from population generation until outputting the optimal solution.

2.2 Generating the Initial Population
The Epistocracy algorithm starts the optimization process by generating a population of random solutions using a stratified sampling method called Latin Hypercube Sampling (LHS), originally proposed by McKay in 1979 [42]. Each individual solution has a set of genes or attributes, known as a chromosome, which are defined using their corresponding upper and lower bounds in the search space. In this algorithm, the set of attributes represents the initial position and level of political knowledge of each individual in the society.
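As a concrete illustration, the stratified initialization can be sketched as follows. This is a minimal pure-Python sketch (function and parameter names are ours, not from the paper): each dimension's range is split into as many equal strata as there are individuals, and exactly one sample is drawn per stratum.

```python
import random

def latin_hypercube(pop_size, bounds, seed=None):
    """Generate an initial population with Latin Hypercube Sampling.

    `bounds` is a list of (lower, upper) pairs, one per dimension.
    For each dimension, the range is divided into `pop_size` equal
    strata, the strata are shuffled, and one uniform sample is drawn
    inside each stratum, so every stratum is covered exactly once.
    """
    rng = random.Random(seed)
    dims = len(bounds)
    samples = [[0.0] * dims for _ in range(pop_size)]
    for d, (lo, hi) in enumerate(bounds):
        strata = list(range(pop_size))
        rng.shuffle(strata)              # decouple strata across dimensions
        width = (hi - lo) / pop_size
        for i in range(pop_size):
            samples[i][d] = lo + (strata[i] + rng.random()) * width
    return samples

population = latin_hypercube(10, [(-5.0, 5.0), (0.0, 1.0)], seed=42)
```

Compared to plain uniform sampling, this guarantees that no region of any single dimension is left unsampled in the initial population.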
Performance Evaluation
The performance of an individual in the population is evaluated using a pre-defined fitness function. Given the individual's current chromosome, the actual performance is calculated and stored as the individual's "actual performance." The previous actual performance is also recorded for future reference. Individual solutions are then ranked based on their actual performance (fitness score). The adjusted performance is calculated from the actual performance of each individual solution; the calculation steps are explained in detail in the following sections.

Population Separation
Different people demonstrate different understandings of patterns of change and achieve different levels of success and standing in the social hierarchy. The algorithm separates the Governors from the Citizens based on the level of success each individual achieves. The top performers in the population are considered "Governors" and the rest are considered "Citizens."
Governor Assignment
Before evolving each individual and moving them around, about five percent of the top-performing individuals in each generation are selected as a set of governors to lead the population and help them improve their performance. Transcending traditional societies, in Epistocracy governments have no obvious borders, and individuals can follow or vote for any governor anywhere, expressing the idea of a "Global Village." In the Epistocracy algorithm, each individual is assigned to a governor based on their phenotypic characteristics and the degree of influence and impact of the governor on the citizen. To that end, the gravitational force (1) is used to calculate the magnitude of attraction between each citizen and every governor. A governor with a larger gravitational force has a higher probability of attracting a citizen and forming a larger territory. However, some citizens may act as rebels, resist the orders of the governing authorities, and follow different governors.
F = G × (m1 × m2) / r²   (1)

In the above equation of the gravitational force, G is a constant, and m1 and m2 are the adjusted performances of the governor and the citizen, respectively. All performances are normalized using the following formula:

P_norm_i = [(P_i − min(P_governors)) / (max(P_governors) − min(P_governors))]⁻¹   (2)

In (2), P_i is the individual's actual performance, and P_governors is the list of governors' performances. The Euclidean distance (3) is used to calculate the distance r between a governor and a citizen:

dist(S_g, S_c) = ‖S_g − S_c‖ = √( Σ_{i=1}^{n} (x_gi − x_ci)² ),  n = number of variables   (3)

To imitate the rebelliousness of citizens, a roulette wheel over the governors' calculated gravitational forces is used to give citizens the freedom of selecting other governors with even greater dissimilarity (distance). This helps the algorithm explore the interspace between the governors by moving a citizen across governments. The selection probability is defined by the following equation:

p_j = G(S_j) / Σ_{i=1}^{n} G(S_i)   (4)

In (4), n is the number of governors and G(S_j) is the gravitational force of solution S_j. In the next generation, if the assigned governor is overthrown or resigns due to poor performance or the votes of his own population, the surviving citizen will choose a new governor from the updated list of governors. If a governor performs poorly, he will eventually be demoted, may lose all of his population, and may be removed from the current list of governors. This happens when a population's total performance over a certain period of time (an iteration) is very small compared to the other populations. In this case, the governor loses his popularity regardless of his own performance at the time of being selected. In fact, a governor's popularity rests on his credibility and competence, and on his performance in leading his population and improving their lives.
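A minimal sketch of how Eqs. (1), (3), and (4) could be combined to assign a citizen to a governor. The function names, the value of G, and the zero-distance guard are our own choices (the paper instead mutates a citizen that coincides with its governor):

```python
import math
import random

def euclidean(a, b):
    # Eq. (3): Euclidean distance between two solution vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gravity(m_gov, m_cit, r, G=1.0):
    # Eq. (1): attraction between a governor and a citizen.
    return G * (m_gov * m_cit) / (r ** 2)

def assign_governor(citizen, cit_perf, governors, gov_perfs, rng=random):
    """Roulette-wheel selection over gravitational forces, Eq. (4).

    A larger force gives a higher selection probability, but any
    governor can still be chosen, modeling the citizens' free will.
    """
    forces = [gravity(gp, cit_perf, euclidean(g, citizen) or 1e-12)
              for g, gp in zip(governors, gov_perfs)]
    pick = rng.random() * sum(forces)
    acc = 0.0
    for idx, f in enumerate(forces):
        acc += f
        if acc >= pick:
            return idx
    return len(forces) - 1
```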
By adjusting the actual performance of the governor, the governor's rank in the governors list will change. Consequently, this governor will have a lower chance of being selected by new citizens who do not yet have a governor.

Leading the Population
In the next step, the Epistocracy algorithm allows governors to lead their own populations. Each citizen takes a step of variable length (5) toward his governor to improve his performance and become similar to, or even better than, his governor. The step size is proportional to the distance between the governor and the citizen, and inversely proportional to the self-improvement of the citizen under the rule of the governor. The following formula is used to calculate the next step of each citizen:

S_i = (I_avg / I_min) × σ × d_(i,g) × τ   (5)

where S_i is the individual's new step size, and I_avg is the average improvement of the governor's sub-population (7). I_min is the minimum improvement in the population, σ is the variance of the sub-population, d_(i,g) is the Euclidean distance between the individual and his designated governor, and τ is the rate of change, equal to 0.1. The self-improvement is calculated as follows:

I_i = P_old_i − P_actual_i   (6)

The self-improvement is the difference between the old and the current actual performance of the citizen. The average improvement is then calculated by:

I_avg = (1/n) × Σ_{i=1}^{n} I_i   (7)

In (7), n is the size of the governor's sub-population. The average improvement is an important factor in step-size determination. To avoid missing any minima or maxima, a smaller step is taken when a larger improvement is achieved, and a larger step is taken when a smaller improvement is obtained. The population variance is given by the following formula:

σ = (1/n) × Σ_{i=1}^{n} (x_i − μ)²   (8)

To reflect the diversity of the society, if a citizen, by taking a new step, becomes exactly identical to his governor or to another citizen in the same population, the citizen is mutated to save system resources. This also helps the algorithm avoid division by zero when calculating the gravitational force, which occurs when the distance between a citizen and his governor becomes zero.
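The citizen movement of Eqs. (5)-(7) can be sketched as follows. This is a simplified reading (names and the eps guard are ours): the step length is computed from Eq. (5) and applied along the unit vector from the citizen to its governor.

```python
import math

def move_toward(citizen, governor, i_avg, i_min, variance, tau=0.1, eps=1e-12):
    """Move a citizen one variable-length step toward its governor.

    Step length follows Eq. (5): proportional to the citizen-governor
    distance d and the sub-population variance, inversely proportional
    to the minimum improvement; tau is the rate of change (0.1).
    """
    d = math.sqrt(sum((g - c) ** 2 for c, g in zip(citizen, governor)))
    if d == 0:
        # The paper mutates a citizen that coincides with its governor.
        return list(citizen)
    step = (i_avg / (i_min + eps)) * variance * d * tau
    # Advance by `step` along the unit vector pointing at the governor.
    return [c + step * (g - c) / d for c, g in zip(citizen, governor)]
```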
Improving Governors
Similar to the citizens, governors also improve themselves by taking a step in a direction that hopefully increases their performance. To that end, the variance of the governors' population is calculated, helping governors converge toward the location with the highest possibility of containing the optimal solution. The next step size of a governor is calculated like that of a citizen; however, instead of the distance between the governor and a citizen, the governor's previous step is used, according to the following formula:

S_g = (I_avg / I_g) × σ × S_prev_g   (9)

where the previous step S_prev is initialized as:

S_prev_init = (upper_limit − lower_limit) × space_resolution   (10)

In (10), upper_limit and lower_limit are the boundaries of the search space, and the space resolution is initially set to 0.001. Since the governor is in charge of leading his population and pushing them in the right direction, the algorithm lets the governor take a step only if that step improves his overall performance; otherwise, the governor stays in his previous place without making any movement.
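The governor's conditional move can be sketched as follows. This is a simplified, per-coordinate reading of Eqs. (9)-(10) for a minimization problem; the names, the eps guard, and applying the same scalar step to every coordinate are our own assumptions.

```python
def governor_step(position, prev_step, i_avg, i_g, variance, fitness, eps=1e-12):
    """Tentative governor move, kept only if it helps (minimization).

    Step length follows Eq. (9); the move is accepted only when the new
    actual performance is not worse than the old one (dp <= 0).
    """
    step = (i_avg / (i_g + eps)) * variance * prev_step
    candidate = [x + step for x in position]
    if fitness(candidate) - fitness(position) <= 0:   # dp <= 0
        return candidate, step
    return list(position), 0.0                        # stay put, no movement
```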
Since the variance of all governors is used to compute the new step, the step size for the same governor may differ in subsequent iterations, which may help the governor improve and consequently contribute positively to the improvement of his population. The following piecewise function (11) shows the conditional step that is taken by each governor, provided that the step improves the governor's actual performance:

S_g = { (I_avg / I_g) × σ × S_prev_g   if Δp ≤ 0
      { 0                              otherwise        (11)

where Δp = P_actual_new − P_actual_old. This formula is designed for a minimization problem.

Governor's Performance Adjustment
When a population performs well or poorly under the leadership of a governor, the algorithm adjusts the governor's actual performance to allocate the right amount of resources (individuals) to the governor. For example, if a population is performing well, this increases the trust of the population in the governor; in this case, generally more individuals will follow the governor to help him accomplish the task of finding the optimal solution. If a governor is performing poorly, the governor's actual performance is lowered accordingly, and eventually some individuals will leave the governor and follow another governor to improve their quality of life. In other words, as in Epistocratic societies, when a population is under-performing, this eventually affects the popularity and credibility of the governor. Those people who initially voted for that governor will shift away and try to choose another governor. In each iteration, the population votes on the performance of the governor; however, these votes have different weights. The Epistocracy algorithm computes the average improvement of each population, giving higher weights to individuals who are closer to the governor (and more educated) and lower weights to citizens who are farther away (and less educated), before using the following formula:

I_avg = (1 / Σ_{i=1}^{n} w_i) × Σ_{i=1}^{n} w_i × I_i   (12)

In (12), n is the size of the sub-population and I_i is the individual's self-improvement given by:

I_i = P_prev_actual_i − P_actual_i   (13)

In (13), P_i is the individual's performance. The weight of an individual's vote, w_i, is calculated as follows:

w_i = −log( d_(i,g) / Σ_{i=0}^{n} d_(i,g) ) × P_actual_g / (P_actual_g − P_actual_i) + ε   (14)

where d_(i,g) is the Euclidean distance between the individual and their governor, Σ_{i=0}^{n} d_(i,g) is the total distance between a governor and every individual in their sub-population, P is the performance, and ε is a very small positive number.
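The weighted voting of Eqs. (12)-(14) can be sketched as follows. This reflects our reading of Eq. (14) from the garbled source; the names and the treatment of the P_actual_g = P_actual_i singularity are assumptions.

```python
import math

def vote_weights(dists, perfs, gov_perf, eps=1e-9):
    """Eq. (14): weight of each citizen's vote.

    Citizens closer to the governor get higher weights via the negative
    log of their normalized distance. Note: the performance ratio is
    singular when a citizen's performance equals the governor's.
    """
    total_d = sum(dists)
    return [-math.log(d / total_d) * gov_perf / (gov_perf - p) + eps
            for d, p in zip(dists, perfs)]

def weighted_avg_improvement(weights, improvements):
    # Eq. (12): weighted mean of the citizens' self-improvements.
    return sum(w * i for w, i in zip(weights, improvements)) / sum(weights)
```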
In (14), the log scale is used to mitigate the impact of extreme changes in the distance calculation. In the next step, a linear regression is used to compute the adjusted performance of each governor based on his population's average improvement and votes. Given I_avg and the actual performance of each governor, we calculate the predicted performance P_predicted as follows:

P_predicted = b0 + b1 × I_avg + e_i   (15)

where e_i is the residual error whose distribution is N(0, σ²), and b1 and b0 are calculated as follows:

b1 = Σ (I_avg − mean(I_avg)) × (P_actual − mean(P_actual)) / Σ (I_avg − mean(I_avg))²   (16)

b0 = mean(P_actual) − b1 × mean(I_avg)   (17)

The adjusted performance, P_adjusted, is calculated using the following formula:

P_adjusted = P_actual + [c × Δp]   (18)

where

Δp = P_actual − P_predicted   (19)

c = (1/n) × n_j / Σ_{i=1}^{n} n_i   (20)

In (20), n is the number of governors and n_j is the population size of the j-th governor.

Genetic Operators: Recombination and Mutation
Finally, the genetic operators are used to generate offspring from the current population. To maintain genetic diversity, recombination and then mutation are applied to the existing solutions. Selection, crossover, and mutation are the three operators used in the Epistocracy algorithm. The crossover operator uses the tournament method to choose the best parents among the sampled candidates. The parents' chromosomes are split at a randomly picked point between 1 and the chromosome size minus 1, and the chromosome of each of the two new offspring is made of the genes from opposite sides of the split point of each of the two parents. A percentage of the new individuals are then mutated. Once a chromosome is selected for mutation, one of its genes is selected at random and replaced with a random number between the upper and lower bounds. The chromosome is then validated to ensure all genes are within the bounds; any gene outside the bounds is set to the closest bound.

Experimental Results and Analysis
Evaluation of Epistocracy Algorithm Using Benchmark Functions
To test the performance of our proposed algorithm, we have used several multimodal benchmark functions (i.e., Eggholder, Rastrigin, Schaffer-4, CrossInTray, Griewank) with a large number of local optima from the global optimization literature. We have also used five state-of-the-art evolutionary algorithms to compare the consistency and reliability of the Epistocracy algorithm on these benchmark functions. In order to make a fair comparison between Epistocracy and the other state-of-the-art algorithms, we selected a set of optimization problems and tested each algorithm with a population size of 100, for 100 runs with 100 iterations per run. The results of the comparison among Epistocracy, Genetic Algorithm, Evolutionary Strategies, Artificial Bee Colony, Cuckoo Search, and Particle Swarm Optimization on the different functions are given in Table 1, where "Mean" indicates the average fitness obtained from 100 runs and "Std." is the standard deviation. "Min" and "Max" are the best and worst fitness values found throughout the 100 runs, respectively. As shown in Table 1, the Epistocracy algorithm demonstrates higher reliability and consistency than the other algorithms due to lower variation and dispersion in the outcome of the objective function.

Table 1.
Comparison of the benchmark functions.
Eggholder 2D (Min / Max / Mean / Std.):
  Epistocracy: -959.6407 / -957.7592 / -959.5399 / 0.4198
  GA:  -959.6407 / -894.4704 / -938.0387 / 16.6918
  ES:  -959.6407 / -893.6453 / -946.7461 / 22.8390
  ABC: -959.6407 / -951.0668 / -958.8248 / 1.9929
  CSA: -959.6407 / -753.0372 / -917.5035 / 48.1905
  PSO: -959.6407 / -786.5260 / -927.3717 / 49.8634

Rastrigin 5D (Min / Max / Mean / Std.)

Schaffer-4 2D (Min / Max / Mean / Std.)

CrossInTray 2D (Min / Max / Mean / Std.):
  Epistocracy: -2.0626 / -2.0626 / -2.0626 / 9.1125E-16
  GA:  -2.0626 / -2.0620 / -2.0625 / 0.0002
  ES:  -2.0626 / -2.0623 / -2.0625 / 7.5410E-05
  ABC: -2.0626 / -2.0626 / -2.0626 / 8.7118E-09
  CSA: -2.0626 / -2.0626 / -2.0626 / 4.2811E-08
  PSO: -2.0626 / -2.0626 / -2.0626 / 9.1125E-16

Griewank 2D (Min / Max / Mean / Std.)

Griewank 5D (Min / Max / Mean / Std.)
As illustrated in Fig. 3, the absence of outliers and the smaller standard deviation, represented by a tighter boxplot, are the most significant advantages of the Epistocracy algorithm. According to the test results for Rastrigin 5D, Epistocracy obtained the smallest standard deviation and produced a better mean than the other algorithms. This indicates that the Epistocracy algorithm can effectively avoid being trapped in local minima in a complex, multimodal environment, and that it is scalable with a clear advantage over the other evolutionary algorithms tested on this problem. For Schaffer-4 2D, the Epistocracy algorithm shows higher reliability than the other algorithms by producing results within a narrower range, as depicted in its corresponding boxplot. With CrossInTray 2D, the Epistocracy algorithm either performs better than algorithms such as GA and ES or performs the same as PSO; overall, however, it has better consistency and reliability than PSO and similar algorithms. Among all algorithms, for Griewank 2D, the Epistocracy algorithm produced the narrowest range of optimal solutions, represented by its tiny boxplot. For Griewank 5D, the Epistocracy algorithm again shows a better result than the other algorithms, reconfirming its reliability and consistency in environments with different characteristics.
Fig. 3.
Box and Whisker Plot of Fitness Scores for different benchmark functions.
These preliminary results also show that the Epistocracy algorithm is more reliable on functions that contain multiple minima, and more robust with respect to their existence. For large-scale search spaces, the Epistocracy algorithm performs more efficiently than the other algorithms. In terms of convergence, the Epistocracy algorithm showed a decent rate of convergence compared to the other algorithms.

3.2 Evaluation of the Epistocracy Algorithm Using the MNIST Dataset
To further evaluate the performance of our method, we tasked the Epistocracy algorithm with finding the optimal set of hyper-parameters to build the best CNN model for MNIST handwritten digit recognition. The MNIST dataset is a set of handwritten images of the digits 0-9. It contains size-normalized, gray-scale examples of digits written by 500 writers, centered in a 28x28 image and associated with a label from 10 classes. MNIST consists of a training set of 60,000 examples and a test set of 10,000 examples, and was constructed from NIST's (the US National Institute of Standards and Technology) Special Database 3 and Special Database 1, which contain binary images. Each feature vector (row in the feature matrix) consists of 784 pixel intensities flattened from the original 28x28 pixel images. The end goal is to classify the handwritten digits based on a 28x28 black-and-white image. The MNIST dataset is commonly used for training classification algorithms and for benchmarking purposes.
Optimization of Hyper-parameters.
The problem of finding the optimal value for a hyper-parameter λ is called hyper-parameter optimization. The main technique for finding such a value is to choose values λ_i from a trial set {λ_1, λ_2, ..., λ_n}, to evaluate the response function Ψ(λ) for each one, and to return the λ_i that worked best as λ̂. The optimization of hyper-parameters can be expressed as follows:

λ̂ = argmin_{λ ∈ Λ} E_{x ~ G_x} [ L(x; A_λ(X_train)) ] ≡ argmin_{λ ∈ Λ} Ψ(λ)   (21)

In the above formula, λ is the hyper-parameter that should be selected such that the generalization error (loss function) E_{x ~ G_x}[L(x; A_λ(X_train))] is minimized. A is the learning algorithm that maps the training dataset X_train, drawn from the natural distribution G_x, to the function f, i.e., f = A_λ(X_train). Hyper-parameter optimization can thus be denoted as the minimization of the response function Ψ(λ) over λ ∈ Λ, where Λ is the search space.

MNIST CNNs as a Proof of Concept.
Epistocracy as a multivariate optimization algorithm can be adapted for use in the automated discovery of CNN architectures, however, its effectiveness in doing so would be difficult to test. A regular multivariate optimization problem might have a known minimum or maximum while the accuracy of a CNN does not. In addition, the exact answer of most problems can be obtained through mathematical proof or exhaustive search. A full exhaustive search, however, is both time-consuming and computationally expensive, and there is no way to know what the best possible architecture of a model is.
The solution to this problem is to create a finite set of architectures and task Epistocracy with finding the best architecture in that set. For this purpose, a set of 480 unique models was generated, using all possible values of the hyper-parameters shown in Table 2. Every permutation of these hyper-parameters was used to create a distinct model. Additional hyper-parameters were not used since each model takes a considerable amount of time to train and test; adding more options would make the time needed to create all permutations of models unreasonable, and we must know the accuracy of all permutations in order to use MNIST as a proof of concept.
Table 2. Hyper-parameters and values used in an exhaustive search.

Hyper-parameter    Values Used
Filter Number      12, 16, 20, 24, 28, 32
Filter Size        3, 4, 5, 6, 7
Neuron Size        50, 100, 150, 200
Dropout Rate       0.1, 0.2, 0.3, 0.4
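The grid of Table 2 can be enumerated directly; the following minimal sketch (variable and function names are illustrative, not from the paper's code) confirms that every permutation of the four hyper-parameters yields 6 × 5 × 4 × 4 = 480 distinct configurations:

```python
from itertools import product

# Hyper-parameter grid from Table 2.
GRID = {
    "filter_number": [12, 16, 20, 24, 28, 32],
    "filter_size": [3, 4, 5, 6, 7],
    "neuron_size": [50, 100, 150, 200],
    "dropout_rate": [0.1, 0.2, 0.3, 0.4],
}

def all_configs(grid):
    """Yield every permutation of the grid as a hyper-parameter dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(all_configs(GRID))
print(len(configs))  # 6 * 5 * 4 * 4 = 480
```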
Creating CNNs for MNIST.
The first step to making the 480 different architectures is to create every possible set of hyper-parameters. Each set is a unique set of hyper-parameters for a single model. Given a set of hyper-parameters, a 16-layer equivalent CNN model is created in Keras using Googleβs sample code to train an βMNISTβ handwritten digit recognition model.
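To illustrate how one hyper-parameter set maps to a concrete model, the sketch below builds a layer specification in the spirit of the classic Keras MNIST example (conv → conv → pool → dropout → flatten → dense → dropout → softmax). It is a simplified stand-in, not the paper's exact 16-layer stack, and all names are illustrative:

```python
def build_layer_spec(filter_number, filter_size, neuron_size, dropout_rate):
    """Describe a small MNIST-style CNN as a list of layer specs.

    Mirrors the shape of the classic Keras MNIST sample; the paper's
    actual 16-layer model is not reproduced here.
    """
    return [
        {"layer": "Conv2D", "filters": filter_number, "kernel": filter_size, "activation": "relu"},
        {"layer": "Conv2D", "filters": filter_number, "kernel": filter_size, "activation": "relu"},
        {"layer": "MaxPooling2D", "pool": 2},
        {"layer": "Dropout", "rate": dropout_rate},
        {"layer": "Flatten"},
        {"layer": "Dense", "units": neuron_size, "activation": "relu"},
        {"layer": "Dropout", "rate": dropout_rate},
        {"layer": "Dense", "units": 10, "activation": "softmax"},  # 10 MNIST classes
    ]

# Spec for the best combination of hyper-parameters found later (Table 3).
spec = build_layer_spec(28, 6, 50, 0.3)
```

A real builder would translate each spec entry into the corresponding Keras layer before compiling the model.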
Table 3. Best combination of hyper-parameters.

Hyper-parameter    Value
Filter Number      28
Filter Size        6
Dropout Rate       0.3
Neuron Size        50
Once all 16 layers are created, the model is compiled with the Adam optimizer and the categorical cross-entropy loss function. Each model is then trained on all 60,000 images in the MNIST training dataset and tested on the MNIST test set of 10,000 images. After evaluating all 480 architectures, the best combination of hyper-parameters was identified (see Table 3); the accuracy of this model was 99.51%. Epistocracy was then used to search for the same optimal set of hyper-parameters shown in Table 3. Epistocracy found the best answer in 33% of the runs, and in those runs the best answer was found, on average, around iteration 6. The mean accuracy of the best Governor was 99.48%. The configuration of the Epistocracy algorithm was as follows:
• Council rate: 10%
• Crossover rate: 50%
• Mutation rate: 20%
• Tournament size: 5
• Population size: 20

To evaluate the performance and robustness of the Epistocracy algorithm, we compared it with two state-of-the-art algorithms: Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA). These algorithms were tasked with finding the best model's hyper-parameters in the same way as Epistocracy. The population size, number of iterations, and number of runs used for each algorithm are given below:
• Population size: 20
• Iterations: 100
• Runs: 100
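Since all three optimizers search the same discrete grid, one common device (an assumption here, not necessarily the encoding used in the paper) is to represent each candidate as a vector of indices into the value lists and decode it before training a model:

```python
import random

# Value lists from Table 2; a candidate is one index per hyper-parameter.
VALUES = {
    "filter_number": [12, 16, 20, 24, 28, 32],
    "filter_size": [3, 4, 5, 6, 7],
    "neuron_size": [50, 100, 150, 200],
    "dropout_rate": [0.1, 0.2, 0.3, 0.4],
}
KEYS = list(VALUES)

def random_candidate(rng):
    """Draw a random index vector, one index per hyper-parameter."""
    return [rng.randrange(len(VALUES[k])) for k in KEYS]

def decode(candidate):
    """Map an index vector back to concrete hyper-parameter values."""
    return {k: VALUES[k][i] for k, i in zip(KEYS, candidate)}

rng = random.Random(0)
population = [random_candidate(rng) for _ in range(20)]  # population size 20
```

With this shared representation, GA crossover/mutation and PSO position updates (rounded and clipped to valid index ranges) can operate on the same candidate vectors.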
Table 4. Comparison of GA, PSO, and Epistocracy algorithms using the MNIST dataset.

                                                         GA       PSO      Epistocracy
Mean accuracy of the best-performing individual          99.48%   99.48%   99.48%
Percentage of runs in which the best answer was found    26%      28%      33%
Average iteration at which the best answer was found     23       7        6
Particle Swarm Optimization.
To compare how our algorithm performs against other algorithms, we used Particle Swarm Optimization (see Table 4). The configuration of PSO was the same as that of Epistocracy. The PSO mean accuracy was 99.48%, and the best accuracy was found around the 7th iteration. The best accuracy was found 28 times out of 100 runs.
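For reference, the sketch below shows a minimal textbook PSO with the canonical velocity update on a continuous toy objective; the parameter values (inertia w, acceleration coefficients c1 and c2) are illustrative and not the exact configuration used in these experiments:

```python
import random

def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer (textbook form, for illustration only)."""
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]                 # personal best positions
    pbest_val = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Canonical update: inertia + cognitive pull + social pull.
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            val = f(X[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = X[i][:], val
    return gbest, gbest_val

# Sanity check on the sphere function, whose minimum is 0 at the origin.
best, best_val = pso(lambda x: sum(v * v for v in x), dim=2)
```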
Genetic Algorithm.
The other evolutionary algorithm tested was a genetic algorithm (see Table 4), a standard GA using crossover and mutation. This implementation found the best answer 26 times out of 100 runs. The mean accuracy of the best individual was 99.48%, and the best accuracy was found around the 23rd iteration.

Fig. 4. Performance comparison of GA, PSO, and Epistocracy.
Fig. 4 shows that the Epistocracy algorithm starts with higher accuracy than PSO and GA. After around 20 iterations it converges asymptotically to the same fitness score as PSO, and after about 92 iterations it surpasses PSO with a higher accuracy. The figure also shows that Epistocracy converges to the optimal solution faster than GA. Although PSO exhibits a faster intermediate convergence rate, it eventually falls behind Epistocracy. Overall, the Epistocracy algorithm shows better performance than the other algorithms.

Conclusion and Future Work
Evolutionary algorithms in general suffer from problems such as premature convergence and stagnation, which are closely related to population diversity, the curse of dimensionality, limited scalability, and a random, limited search ability that usually arises in the absence of guided change and from unbalanced exploration-exploitation capacities. This paper proposes a new multi-population evolutionary algorithm called Epistocracy, based on socio-political evolution. In Epistocracy, there are two classes of population: governors and citizens. Citizens liberally follow governors to improve their performance through exploration and exploitation of the search space. Governors, in turn, attempt to lead their populations effectively to help the algorithm converge to the optimal solution in the early stages. Governors can be promoted or demoted based on their population's performance and votes, and individuals with better performance cast votes of greater weight.

The Epistocracy algorithm was tested using several benchmark functions. The experimental results show that it achieves superior results compared to other evolutionary and swarm-intelligence algorithms. Our proposed method is less likely to be trapped in local optima than methods such as GA, PSO, ES, ABC, and CSA, and in some cases reaches the optimal solution faster than existing algorithms. The Epistocracy algorithm uses the ideas of rebels, dynamic resource management, gravitational force, and population variance to conduct an efficient explorative and exploitative search.

For future work, a number of research directions can be envisioned. First, the exploration-exploitation strategies can be enhanced to achieve a better convergence rate. Second, a multi-objective version of the algorithm can be implemented. Third, a more comprehensive, high-dimensional test set can be utilized, and the results compared with more evolutionary and swarm-intelligence algorithms. Finally, the Epistocracy algorithm can be adapted for the discovery of optimal Convolutional Neural Network architectures and their hyper-parameters.