Neuroevolution in Deep Learning: The Role of Neutrality
Edgar Galván
Received: date / Accepted: date
Abstract
A variety of methods have been applied to the architectural configuration and learning or training of artificial deep neural networks (DNNs). These methods play a crucial role in the success or failure of DNNs for most problems and applications. Evolutionary Algorithms (EAs) are gaining momentum as a computationally feasible method for the automated optimisation of DNNs. Neuroevolution is a term which describes these processes of automated configuration and training of DNNs using EAs. However, the automatic design and/or training of these modern neural networks through evolutionary algorithms is computationally expensive. Kimura's neutral theory of molecular evolution states that the majority of evolutionary changes at the molecular level are the result of random fixation of selectively neutral mutations. A mutation from one gene to another is neutral if it does not affect the phenotype. This work discusses how neutrality, given certain conditions, can help to speed up the training/design of deep neural networks.
Keywords
Neutrality · Deep Neural Networks · Neuroevolution · Evolutionary Algorithms
Edgar Galván
Naturally Inspired Computation Research Group, Department of Computer Science, National University of Ireland Maynooth, Ireland
E-mail: [email protected]

Deep learning algorithms, commonly referred to as Deep Neural Networks [18,19,26], are inspired by deep hierarchical structures of human perception as well as production systems [7]. These algorithms have achieved expert human-level performance in multiple areas, including computer vision problems [49] and games [43], to mention a few examples. The design of deep neural network (DNN) architectures (along with the optimisation of their hyperparameters), as well as their training, plays a crucial part in their success or failure [28]. Neural architecture search is a reality: a great variety of methods have been proposed over recent years, including Monte Carlo-based simulations [33], random search [2] and random search with weight prediction [5], hill-climbing [9], grid search [55], Bayesian optimisation [3,21], gradient-based methods [27,52], and mutual information [50,51,54]. However, two methods started gaining momentum thanks to their impressive results: reinforcement learning (RL) methods [48] and evolution-based methods [1,8], sometimes referred to as neuroevolution in the context of neural architecture search [7], with the latter coming to dominate the area due to better performance, e.g., in terms of accuracy, as well as being reported to require less computational time to find competitive solutions [39,47] compared to reinforcement learning methods.
There has been an increased interest in the correct design (and, to a lesser degree, training) of deep neural networks by means of Evolutionary Algorithms, as extensively discussed in our recent work summarising over 100 recent papers in the area of neuroevolution in deep neural networks [7]. Figure 1 shows a visual representation of the research trends followed in neuroevolution in deep neural networks. This is the result of using keywords appearing in the titles and abstracts of around 100 papers published in the last 5 years. We computed a similarity metric between these keywords and each paper. These similarities induce corresponding graph structures on the paper and key term 'spaces'. Each paper/term corresponds to a node, and edges arise naturally whenever there is a similarity between nodes. Details on how to generate this graph are given in [35].

2.1 Evolving Deep Neural Networks' Architectures with Evolutionary Algorithms

The use of evolution-based methods in designing DNNs is already a reality, as discussed in [7]. Different Evolutionary Algorithms with different representations have been used, ranging from landmark evolutionary methods, including Genetic Algorithms [20], Genetic Programming [25] and Evolution Strategies [4,41], up to hybrids combining, for example, the use of Genetic Algorithms and Grammatical Evolution [42]. In a short period of time, we have observed both ingenious representations and interesting approaches achieving extraordinary results against human-expert configured networks [39]. We have also seen state-of-the-art approaches ranging, in some cases, from employing hundreds of computers [40] to using just a few GPUs [47].
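To make the flavour of these representations concrete, the following sketch shows a minimal direct encoding of a network architecture as a variable-length list of layer genes, together with a mutation operator. The layer types, parameter ranges, length bounds and mutation rate are our own illustrative assumptions, not taken from any of the papers cited above.

```python
import random

# Hypothetical direct encoding: a genotype is a variable-length list of
# layer "genes". All choices below are illustrative assumptions.
LAYER_TYPES = ["conv", "pool", "dense"]

def random_gene(rng):
    kind = rng.choice(LAYER_TYPES)
    if kind == "conv":
        return ("conv", rng.choice([16, 32, 64]), rng.choice([3, 5]))
    if kind == "pool":
        return ("pool", rng.choice(["max", "avg"]))
    return ("dense", rng.choice([64, 128, 256]))

def random_genotype(rng, min_len=3, max_len=8):
    return [random_gene(rng) for _ in range(rng.randint(min_len, max_len))]

def mutate(genotype, rng, rate=0.2):
    """Point-mutate genes, occasionally inserting or deleting a layer."""
    child = [g if rng.random() > rate else random_gene(rng) for g in genotype]
    if rng.random() < rate and len(child) < 8:
        child.insert(rng.randrange(len(child) + 1), random_gene(rng))
    if rng.random() < rate and len(child) > 3:
        del child[rng.randrange(len(child))]
    return child

rng = random.Random(42)
parent = random_genotype(rng)
child = mutate(parent, rng)
print(parent)
print(child)
```

An evolutionary loop would repeatedly mutate such genotypes, decode each into a trainable network, and use validation accuracy as fitness; the expensive step is precisely that decoding-and-training evaluation.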
Most neuroevolution studies have focused their attention on designing deep Convolutional Neural Networks (CNNs). Other networks have also been considered, including Autoencoders, Restricted Boltzmann Machines, Recurrent Neural Networks and Long Short-Term Memory networks, although there are just a few neuroevolution works considering the use of these types of networks.

Fig. 1 Bird's-eye view analysis of the research conducted in the area of neuroevolution in DNNs. The rest-length for repulsive forces between nodes was set to 13.

Our recent article [7] summarises, in a series of informative tables, the EA representation used, the representation of individuals, the genetic operators used, and the EA parameters. These tables also outline the computational resources used in each study by attempting to report the number of GPUs used. A calculation of the GPU days per run is approximated as in Sun et al. [46]. We indicate the benchmark datasets used in the experimental analysis. Finally, the tables indicate whether the neural network architecture has been evolved automatically or by using a semi-automated approach, whilst also indicating the target DNN architecture. Not every selected paper reports the same information: some papers omit details about computational resources, while others omit information about the number of runs. A very interesting output from this summary is that there are numerous differences between the approaches used across the papers listed. Crossover is omitted from several studies, mostly due to the encoding adopted by various researchers. Population size and selection strategies for the EAs change between studies. While our recent article [7] clearly demonstrates that MNIST and CIFAR are the most popular benchmark datasets, we can see many examples of studies using benchmark datasets from specific application domains.

2.2 Training Deep Neural Networks Through Evolutionary Algorithms

In the early years of neuroevolution, it was thought that evolution-based methods might exceed the capabilities of backpropagation [53]. As Artificial Neural Networks in general, and Deep Neural Networks (DNNs) in particular, increasingly adopted the use of stochastic gradient descent and backpropagation, the idea of using Evolutionary Algorithms (EAs) for training DNNs instead was almost abandoned by the DNN research community. EAs are a “genuinely different paradigm for specifying a search problem” [32] and provide exciting opportunities for learning in DNNs. When comparing neuroevolutionary approaches to other approaches such as gradient descent, authors such as Khadka et al. [22] urge caution: a generation in neuroevolution is not readily comparable to a gradient descent epoch.
Despite the fact that it has been argued that EAs can compete with gradient-based search in small problems, as well as when using NNs with non-differentiable activation functions [29], the encouraging results achieved in the 1990s [17,31,38] have inspired some researchers to carry out research in training DNNs. This includes the work conducted by David and Greental [6] and Fernando et al. [10], both of which use deep autoencoders, and Pawełczyk et al. [34] and Such et al. [44], who use deep Convolutional Neural Networks. An informative summary of the works carried out on the training of DNNs using Evolutionary Algorithms can be seen in our recent article [7].
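As a toy illustration of gradient-free weight training, the sketch below uses a mutation-only (1+4) evolution strategy to train a tiny fixed-topology 2-2-1 network on XOR. The architecture, mutation step size and evaluation budget are arbitrary choices for illustration; none of the cited works uses this exact setup, and scaling such a scheme to DNN-sized weight vectors is precisely the open challenge discussed here.

```python
import math
import random

# Toy illustration only: a mutation-only (1+4) evolution strategy
# training a fixed 2-2-1 network on XOR. All settings are assumptions.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(w, x):
    """w holds 9 weights: two tanh hidden units (3 each), one sigmoid output (3)."""
    h0 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h1 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return 1.0 / (1.0 + math.exp(-(w[6] * h0 + w[7] * h1 + w[8])))

def loss(w):
    """Sum of squared errors over the four XOR patterns (lower is better)."""
    return sum((forward(w, x) - y) ** 2 for x, y in XOR)

rng = random.Random(0)
parent = [rng.gauss(0, 1) for _ in range(9)]
best = loss(parent)
for _ in range(3000):                    # (1+4)-ES: keep the elite parent
    for _ in range(4):
        child = [wi + rng.gauss(0, 0.3) for wi in parent]
        child_loss = loss(child)
        if child_loss < best:            # elitist replacement
            parent, best = child, child_loss
print("final loss:", round(best, 4))
```

Note that each fitness evaluation here is trivially cheap; in a DNN the analogous evaluation is a full forward pass over a dataset, which is what makes skipping unnecessary evaluations attractive.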
Kimura's neutral theory of molecular evolution [23,24] states that the majority of evolutionary changes at the molecular level are the result of random fixation of selectively neutral mutations. A mutation from one gene to another is neutral if it does not affect the phenotype. Thus, most mutations that take place in natural evolution are neither advantageous nor disadvantageous for the survival of individuals. It is then reasonable to extrapolate that, if this is how evolution has managed to produce the amazing complexity and adaptations seen in nature, then neutrality should also aid EAs. However, the question of whether neutrality helps or hinders the search in EAs is ill-posed and cannot be answered in general. One can only answer this question within the context of a specific class of problems, (neutral) representations and set of operators [11,12,13,14,15,16,36,37].

We are not aware of any works on neutrality in neuroevolution in DNNs. In our recent in-depth review article on neuroevolution in deep neural networks [7], we have seen that numerous studies used selection and mutation only to drive evolution in automatically finding a suitable deep neural network architecture or in training a neural network. Interestingly, many researchers have reported highly encouraging results when using these two genetic operators, including the works conducted by Real et al. [39,40] using GAs and hundreds of GPUs, as well as the work carried out by Suganuma et al. [45] employing Cartesian Genetic Programming and using only a few GPUs. If neutrality is beneficial, taking into consideration specific classes of problems, representations and genetic operators, this can also have an immediately positive impact on the time needed to test the configuration of DNNs, because the evaluation of neutral EA candidate solutions will not be necessary. There are some interesting encodings adopted by researchers, including Suganuma's work [45] (see Fig. 2), that allow the measurement of the level of neutrality present in evolutionary search and can potentially indicate whether its presence is beneficial or not in certain problems and DNNs.

Fig. 2 helps to illustrate how neutrality can explicitly be promoted (or impeded) in evolutionary algorithms. The genotypic representation of a cartesian genetic programming [30] individual encoding a CNN architecture is shown in Fig. 2 (a). This is then decoded to a phenotypic representation, Fig. 2 (b); worth noting is how gene number 5 in the genotype is not expressed in the phenotype. Thus, any mutation taking place in gene 5 will not affect the phenotype, which defines the CNN architecture depicted in Fig. 2 (c).

3.1 Does neutrality help or hinder the search of an Evolutionary Algorithm?

This question has been debated at considerable length in the literature without really reaching any form of consensus on its answer. The reasons for this situation include the lack of a single definition of neutrality, the multiple ways in which one can add neutrality to a representation, the focus on pure performance when evaluating the effects of neutrality without attention to the changes in the behaviour of the search operators and in the features of the fitness landscape, and, finally, the variability in the choice of problems, algorithms and representations for benchmarking purposes. Also, very often studies consider problems and representations that are quite complex, and results represent the composition of multiple effects.
Fig. 2 (a) Genetic representation of a cartesian GP individual encoding a CNN architecture. (b) The phenotypic representation. (c) CNN architecture defined by (a). Gene No. 5, coloured with a black background in the genotype (a), is not expressed in the phenotype. The summation node in (c), with light yellow background, performs max pooling on the LHS of the input (Node No. 3) to get the same input tensor sizes. Redrawn from Suganuma et al. [45].
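The inactive-gene mechanism of Fig. 2 can be sketched in a few lines: decoding walks backwards from the output node, and any gene outside the active path can mutate without changing the phenotype. The node numbering and layer labels below mirror Fig. 2, but the dictionary encoding and helper functions are our own simplification of a Cartesian-GP-style genotype, not the exact representation of Suganuma et al. [45].

```python
# node id -> (function label, ids of input nodes); node 0 is the input.
genotype = {
    1: ("conv(32,3)", [0]),
    2: ("pool(max)", [1]),
    3: ("conv(64,5)", [2]),
    4: ("conv(64,3)", [2]),
    5: ("pool(max)", [2]),   # not reachable from the output: inactive
    6: ("sum", [3, 4]),
}
OUTPUT = 6

def active_nodes(genotype, output):
    """Walk backwards from the output to collect the expressed genes."""
    active, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node == 0 or node in active:
            continue
        active.add(node)
        stack.extend(genotype[node][1])
    return active

def is_neutral(genotype, output, mutated_node):
    """A mutation is (structurally) neutral iff it hits an inactive gene."""
    return mutated_node not in active_nodes(genotype, output)

print(sorted(active_nodes(genotype, OUTPUT)))   # [1, 2, 3, 4, 6]: gene 5 absent
print(is_neutral(genotype, OUTPUT, 5))          # True
print(is_neutral(genotype, OUTPUT, 3))          # False
```

Counting how often `is_neutral` returns `True` during a run gives exactly the kind of neutrality measurement discussed above, and a neutral mutant need not be re-trained, since its phenotype (the CNN) is unchanged.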
We believe that one of the first steps in seeing whether neutrality helps or hinders evolution in the configuration (or training) of a deep neural network is to adopt a very simple representation, such as a binary representation, using mutation and selection as the genetic operators to guide evolution. Choosing the type of problem is a more difficult endeavour when trying to carry out this research. The reason is that much of the empirical scientific work conducted in the area of neuroevolution in deep neural networks is incredibly varied, as summarised in our recent article [7]: CNNs and computer vision datasets have been the focus of attention of the research community, and no general conclusions have been drawn in the area. However, these two can also represent good areas to be studied, given the numerous results reported in a variety of studies, helping us to use them as a basis for our research.
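A minimal version of the proposed setup might look as follows: a (1+1) EA on a binary genotype, with neutrality introduced explicitly through a redundant many-to-one genotype-phenotype map (here, majority vote over bit triplets), and OneMax standing in as a cheap proxy for the expensive DNN-configuration fitness. The redundancy scheme, genotype length and budget are illustrative assumptions only.

```python
import random

def decode(genotype):
    """Redundant map: each phenotype bit is the majority of a 3-bit block."""
    return [int(sum(genotype[i:i + 3]) >= 2) for i in range(0, len(genotype), 3)]

def fitness(phenotype):
    """OneMax stand-in for an expensive DNN-configuration evaluation."""
    return sum(phenotype)

rng = random.Random(1)
genotype = [rng.randint(0, 1) for _ in range(30)]   # 10 phenotype bits
neutral = 0
for _ in range(500):                                # (1+1) EA, one-bit mutation
    child = list(genotype)
    child[rng.randrange(len(child))] ^= 1
    if decode(child) == decode(genotype):
        neutral += 1        # neutral move: phenotype (and fitness) unchanged,
        genotype = child    # so no re-evaluation would be needed
    elif fitness(decode(child)) >= fitness(decode(genotype)):
        genotype = child    # accept equal-or-better phenotypes
print("fitness:", fitness(decode(genotype)), "neutral mutations:", neutral)
```

Replacing `fitness` with the accuracy of a trained network turns this sketch into the kind of experiment proposed above, and the `neutral` counter quantifies how many costly evaluations the neutral encoding would have saved.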
References
1. T. Bäck. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford, UK, 1996.
2. J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13:281–305, 2012.
3. J. Bergstra, D. Yamins, and D. D. Cox. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ICML'13, pages I-115–I-123. JMLR.org, 2013.
4. H.-G. Beyer and H.-P. Schwefel. Evolution strategies – a comprehensive introduction. Natural Computing: An International Journal, 1(1):3–52, May 2002.
5. A. Brock, T. Lim, J. M. Ritchie, and N. Weston. SMASH: one-shot model architecture search through hypernetworks. CoRR, abs/1708.05344, 2017.
6. O. E. David and I. Greental. Genetic algorithms for evolving deep neural networks. In Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO Comp '14, pages 1451–1452, New York, NY, USA, 2014. Association for Computing Machinery.
7. Edgar Galván and Peter Mooney. Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges. arXiv preprint arXiv:2006.05415, 2020.
8. A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing. Springer Verlag, 2003.
9. T. Elsken, J.-H. Metzen, and F. Hutter. Simple and efficient architecture search for convolutional neural networks, 2017.
10. C. Fernando, D. Banarse, M. Reynolds, F. Besse, D. Pfau, M. Jaderberg, M. Lanctot, and D. Wierstra. Convolution by evolution: Differentiable pattern producing networks. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO '16, pages 109–116, New York, NY, USA, 2016. Association for Computing Machinery.
11. E. Galván-López. An analysis of the effects of neutrality on problem hardness for evolutionary algorithms. PhD thesis, University of Essex, Colchester, UK, 2009.
12. E. Galván-López, S. Dignum, and R. Poli. The effects of constant neutrality on performance and problem hardness in GP. In M. O'Neill, L. Vanneschi, S. M. Gustafson, A. Esparcia-Alcázar, I. D. Falco, A. D. Cioppa, and E. Tarantino, editors, Genetic Programming, 11th European Conference, EuroGP 2008, Naples, Italy, March 26-28, 2008. Proceedings, volume 4971 of Lecture Notes in Computer Science, pages 312–324. Springer, 2008.
13. E. Galván-López and R. Poli. An empirical investigation of how and why neutrality affects evolutionary search. In M. Cattolico, editor, Genetic and Evolutionary Computation Conference, GECCO 2006, Proceedings, Seattle, Washington, USA, July 8-12, 2006, pages 1149–1156. ACM, 2006.
14. E. Galván-López and R. Poli. Some steps towards understanding how neutrality affects evolutionary search. In T. P. Runarsson, H. Beyer, E. K. Burke, J. J. Merelo Guervós, L. D. Whitley, and X. Yao, editors, Parallel Problem Solving from Nature - PPSN IX, 9th International Conference, Reykjavik, Iceland, September 9-13, 2006, Proceedings, volume 4193, pages 778–787. Springer, 2006.
15. E. Galván-López and R. Poli. An empirical investigation of how degree neutrality affects GP search. In A. H. Aguirre, R. M. Borja, and C. A. R. García, editors, MICAI 2009: Advances in Artificial Intelligence, 8th Mexican International Conference on Artificial Intelligence, Guanajuato, Mexico, November 9-13, 2009. Proceedings, volume 5845 of Lecture Notes in Computer Science, pages 728–739. Springer, 2009.
16. E. Galván-López, R. Poli, A. Kattan, M. O'Neill, and A. Brabazon. Neutrality in evolutionary algorithms... what do we know? Evolving Systems, 2(3):145–163, 2011.
17. C. Goerick and T. Rodemann. Evolution strategies: An alternative to gradient based learning.
18. I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
19. G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Comput., 18(7):1527–1554, July 2006.
20. J. H. Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge, MA, USA, 1992.
21. K. Kandasamy, W. Neiswanger, J. Schneider, B. Póczos, and E. P. Xing. Neural architecture search with bayesian optimisation and optimal transport. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 2020–2029, 2018.
22. S. Khadka, J. J. Chung, and K. Tumer. Neuroevolution of a modular memory-augmented neural network for deep memory problems. Evol. Comput., 27(4):639–664, 2019.
23. M. Kimura. Evolutionary rate at the molecular level. Nature, 217:624–626, 1968.
24. M. Kimura. The Neutral Theory of Molecular Evolution. Cambridge University Press, 1983.
25. J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
26. Y. LeCun, Y. Bengio, and G. E. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
27. H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search, 2018.
28. W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi. A survey of deep neural network architectures and their applications. Neurocomputing, 234:11–26, 2017.
29. M. Mandischer. A comparison of evolution strategies and backpropagation for neural network training. Neurocomputing, 42(1):87–117, 2002. Evolutionary neural systems.
30. J. F. Miller. Cartesian Genetic Programming, pages 17–34. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
31. D. J. Montana and L. Davis. Training feedforward neural networks using genetic algorithms. In Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI'89, pages 762–767, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc.
32. G. Morse and K. O. Stanley. Simple evolutionary optimization can rival stochastic gradient descent in neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO '16, pages 477–484, New York, NY, USA, 2016. Association for Computing Machinery.
33. R. Negrinho and G. Gordon. DeepArchitect: Automatically designing and training deep architectures, 2017.
34. K. Pawełczyk, M. Kawulok, and J. Nalepa. Genetically-trained deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO '18, pages 63–64, New York, NY, USA, 2018. Association for Computing Machinery.
35. R. Poli. Analysis of the publications on the applications of particle swarm optimisation. J. Artif. Evol. App., 2008:4:1–4:10, Jan. 2008.
36. R. Poli and E. Galván-López. On the effects of bit-wise neutrality on fitness distance correlation, phenotypic mutation rates and problem hardness. In C. R. Stephens, M. Toussaint, D. Whitley, and P. F. Stadler, editors, Foundations of Genetic Algorithms, pages 138–164, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
37. R. Poli and E. Galván-López. The effects of constant and bit-wise neutrality on problem hardness, fitness distance correlation and phenotypic mutation rates. IEEE Trans. Evolutionary Computation, 16(2):279–300, 2012.
38. V. W. Porto, D. B. Fogel, and L. J. Fogel. Alternative neural network training methods. IEEE Expert: Intelligent Systems and Their Applications, 10(3):16–22, June 1995.
39. E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 4780–4789. AAAI Press, 2019.
40. E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin. Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 2902–2911. JMLR.org, 2017.
41. I. Rechenberg. Evolutionsstrategien. In B. Schneider and U. Ranft, editors, Simulationsmethoden in der Medizin und Biologie, pages 83–114, Berlin, Heidelberg, 1978. Springer Berlin Heidelberg.
42. C. Ryan, J. Collins, and M. O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In W. Banzhaf, R. Poli, M. Schoenauer, and T. C. Fogarty, editors, Genetic Programming, pages 83–96, Berlin, Heidelberg, 1998. Springer Berlin Heidelberg.
43. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, Jan. 2016.
44. F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. ArXiv, abs/1712.06567, 2017.
45. M. Suganuma, S. Shirakawa, and T. Nagao. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '17, pages 497–504, New York, NY, USA, 2017. Association for Computing Machinery.
46. Y. Sun, B. Xue, M. Zhang, and G. G. Yen. Completely automated CNN architecture design based on blocks. IEEE Transactions on Neural Networks and Learning Systems, 31(4):1242–1254, 2020.
47. Y. Sun, B. Xue, M. Zhang, and G. G. Yen. Evolving deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation, 24(2):394–407, 2020.
48. R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA, 2018.
49. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1–9. IEEE Computer Society, 2015.
50. N. I. Tapia and P. A. Estevez. On the information plane of autoencoders, Jul 2020.
51. N. Tishby and N. Zaslavsky. Deep learning and the information bottleneck principle. CoRR, abs/1503.02406, 2015.
52. S. Xie, H. Zheng, C. Liu, and L. Lin. SNAS: Stochastic neural architecture search, 2018.
53. X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423–1447, 1999.
54. S. Yu and J. C. Principe. Understanding autoencoders with information theoretic concepts, 2019.
55. S. Zagoruyko and N. Komodakis. Wide residual networks. In R. C. Wilson, E. R. Hancock, and W. A. P. Smith, editors,