Neuroevolution in Deep Learning: The Role of Neutrality
Edgar Galván
Received: date / Accepted: date
Abstract
A variety of methods have been applied to the architectural configuration and learning or training of artificial deep neural networks (DNNs). These methods play a crucial role in the success or failure of DNNs for most problems and applications. Evolutionary Algorithms (EAs) are gaining momentum as a computationally feasible method for the automated optimisation of DNNs. Neuroevolution is a term which describes these processes of automated configuration and training of DNNs using EAs. However, the automatic design and/or training of these modern neural networks through evolutionary algorithms is computationally expensive. Kimura's neutral theory of molecular evolution states that the majority of evolutionary changes at the molecular level are the result of random fixation of selectively neutral mutations. A mutation from one gene to another is neutral if it does not affect the phenotype. This work discusses how neutrality, given certain conditions, can help to speed up the training/design of deep neural networks.
Keywords
Neutrality · Deep Neural Networks · Neuroevolution · Evolutionary Algorithms
Edgar Galván
Naturally Inspired Computation Research Group, Department of Computer Science, National University of Ireland Maynooth, Ireland
E-mail: [email protected]

Deep learning algorithms, commonly referred to as Deep Neural Networks [18,19,26], are inspired by deep hierarchical structures of human perception as well as production systems [7]. These algorithms have achieved expert human-level performance in multiple areas, including computer vision problems [49] and games [43], to mention a few examples. The design of deep neural network (DNN) architectures (along with the optimisation of their hyperparameters), as well as their training, plays a crucial part in their success or failure [28]. Neural architecture search is a reality: a great variety of methods have been proposed over recent years, including Monte Carlo-based simulations [33], random search [2] and random search with weight prediction [5], hill-climbing [9], grid search [55], Bayesian optimisation [3,21], gradient-based methods [27,52], and mutual information [50,51,54]. However, two methods started gaining momentum thanks to their impressive results: reinforcement learning (RL) methods [48] and evolution-based methods [1,8], sometimes referred to as neuroevolution in the context of neural architecture search [7], with the latter coming to dominate the area due to better performance, e.g., in terms of accuracy, as well as being reported to require less computational time to find competitive solutions [39,47] compared to reinforcement learning methods.
There has been an increased interest in the correct design (and, to a lesser degree, training) of deep neural networks by means of Evolutionary Algorithms, as extensively discussed in our recent work summarising over 100 recent papers in the area of neuroevolution in deep neural networks [7]. Figure 1 shows a visual representation of the research trends followed in neuroevolution in deep neural networks. This is the result of using keywords appearing in the titles and abstracts of around 100 papers published in the last 5 years. We computed a similarity metric between these keywords and each paper. These similarities induce corresponding graph structures on the paper and key term 'spaces'. Each paper/term corresponds to a node, and edges arise naturally whenever there is a similarity between nodes. Details on how to generate this graph are given in [35].

2.1 Evolving Deep Neural Networks' Architectures with Evolutionary Algorithms

The use of evolution-based methods in designing DNNs is already a reality, as discussed in [7]. Different Evolutionary Algorithms with different representations have been used, ranging from landmark evolutionary methods, including Genetic Algorithms [20], Genetic Programming [25] and Evolution Strategies [4,41], up to hybrids combining, for example, the use of Genetic Algorithms and Grammatical Evolution [42]. In a short period of time, we have observed both ingenious representations and interesting approaches achieving extraordinary results against human-expert configured networks [39]. We have also seen state-of-the-art approaches ranging, in some cases, from employing hundreds of computers [40] to using just a few GPUs [47].
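To make the flavour of these representations concrete, the following sketch shows a minimal direct encoding of a network architecture as a variable-length list of layer genes, together with a mutation operator. The layer types, parameter ranges, length bounds and mutation rate are our own illustrative assumptions, not taken from any of the papers cited above.

```python
import random

# Hypothetical direct encoding: a genotype is a variable-length list of
# layer "genes". All choices below are illustrative assumptions.
LAYER_TYPES = ["conv", "pool", "dense"]

def random_gene(rng):
    kind = rng.choice(LAYER_TYPES)
    if kind == "conv":
        return ("conv", rng.choice([16, 32, 64]), rng.choice([3, 5]))
    if kind == "pool":
        return ("pool", rng.choice(["max", "avg"]))
    return ("dense", rng.choice([64, 128, 256]))

def random_genotype(rng, min_len=3, max_len=8):
    return [random_gene(rng) for _ in range(rng.randint(min_len, max_len))]

def mutate(genotype, rng, rate=0.2):
    """Point-mutate genes, occasionally inserting or deleting a layer."""
    child = [g if rng.random() > rate else random_gene(rng) for g in genotype]
    if rng.random() < rate and len(child) < 8:
        child.insert(rng.randrange(len(child) + 1), random_gene(rng))
    if rng.random() < rate and len(child) > 3:
        del child[rng.randrange(len(child))]
    return child

rng = random.Random(42)
parent = random_genotype(rng)
child = mutate(parent, rng)
print(parent)
print(child)
```

An evolutionary loop would repeatedly mutate such genotypes, decode each into a trainable network, and use validation accuracy as fitness; the expensive step is precisely that decoding-and-training evaluation.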
Most neuroevolution studies have focused their attention on designing deep Convolutional Neural Networks (CNNs). Other networks have also been considered, including Autoencoders, Restricted Boltzmann Machines, Recurrent Neural Networks and Long Short-Term Memory networks, although there are just a few neuroevolution works considering the use of these types of networks.

Fig. 1 Bird's-eye view analysis of the research conducted in the area of neuroevolution in DNNs. The rest-length for repulsive forces between nodes was set to 13.

Our recent article [7] summarises, in a series of informative tables, the EA representation used, the representation of individuals, the genetic operators used, and the EA parameters. These tables also outline the computational resources used in each study by attempting to report the number of GPUs used. A calculation of the GPU days per run is approximated as in Sun et al. [46]. We indicate the benchmark datasets used in the experimental analysis. Finally, the tables indicate whether the neural network architecture has been evolved automatically or by using a semi-automated approach, whilst also indicating the target DNN architecture. Not every selected paper reports the same information: some papers omit details about computational resources, while others omit information about the number of runs. A very interesting output from this summary is that there are numerous differences between the approaches used across the papers listed. Crossover is omitted from several studies, mostly due to the encoding adopted by various researchers. Population size and selection strategies for the EAs change between studies. While our recent article [7] clearly demonstrates that MNIST and CIFAR are the most popular benchmark datasets, we can see many examples of studies using benchmark datasets from specific application domains.

2.2 Training Deep Neural Networks Through Evolutionary Algorithms

In the early years of neuroevolution, it was thought that evolution-based methods might exceed the capabilities of backpropagation [53]. As Artificial Neural Networks in general, and Deep Neural Networks (DNNs) in particular, increasingly adopted the use of stochastic gradient descent and backpropagation, the idea of using Evolutionary Algorithms (EAs) for training DNNs instead was almost abandoned by the DNN research community. EAs are a “genuinely different paradigm for specifying a search problem” [32] and provide exciting opportunities for learning in DNNs. When comparing neuroevolutionary approaches to other approaches such as gradient descent, authors such as Khadka et al. [22] urge caution: a generation in neuroevolution is not readily comparable to a gradient descent epoch.
Despite the fact that it has been argued that EAs can compete with gradient-based search in small problems, as well as when using NNs with non-differentiable activation functions [29], the encouraging results achieved in the 1990s [17,31,38] have inspired some researchers to carry out research in training DNNs. This includes the work conducted by David and Greental [6] and Fernando et al. [10], both of which use deep autoencoders, and Pawełczyk et al. [34] and Such et al. [44], who use deep Convolutional Neural Networks. An informative summary of the works carried out on the training of DNNs using Evolutionary Algorithms can be seen in our recent article [7].
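As a toy illustration of gradient-free weight training, the sketch below uses a mutation-only (1+4) evolution strategy to train a tiny fixed-topology 2-2-1 network on XOR. The architecture, mutation step size and evaluation budget are arbitrary choices for illustration; none of the cited works uses this exact setup, and scaling such a scheme to DNN-sized weight vectors is precisely the open challenge discussed here.

```python
import math
import random

# Toy illustration only: a mutation-only (1+4) evolution strategy
# training a fixed 2-2-1 network on XOR. All settings are assumptions.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(w, x):
    """w holds 9 weights: two tanh hidden units (3 each), one sigmoid output (3)."""
    h0 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h1 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return 1.0 / (1.0 + math.exp(-(w[6] * h0 + w[7] * h1 + w[8])))

def loss(w):
    """Sum of squared errors over the four XOR patterns (lower is better)."""
    return sum((forward(w, x) - y) ** 2 for x, y in XOR)

rng = random.Random(0)
parent = [rng.gauss(0, 1) for _ in range(9)]
best = loss(parent)
for _ in range(3000):                    # (1+4)-ES: keep the elite parent
    for _ in range(4):
        child = [wi + rng.gauss(0, 0.3) for wi in parent]
        child_loss = loss(child)
        if child_loss < best:            # elitist replacement
            parent, best = child, child_loss
print("final loss:", round(best, 4))
```

Note that each fitness evaluation here is trivially cheap; in a DNN the analogous evaluation is a full forward pass over a dataset, which is what makes skipping unnecessary evaluations attractive.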
Kimura's neutral theory of molecular evolution [23,24] states that the majority of evolutionary changes at the molecular level are the result of random fixation of selectively neutral mutations. A mutation from one gene to another is neutral if it does not affect the phenotype. Thus, most mutations that take place in natural evolution are neither advantageous nor disadvantageous for the survival of individuals. It is then reasonable to extrapolate that, if this is how evolution has managed to produce the amazing complexity and adaptations seen in nature, then neutrality should also aid EAs. However, the question of whether neutrality helps or hinders the search in EAs is ill-posed and cannot be answered in general. One can only answer this question within the context of a specific class of problems, (neutral) representations and set of operators [11,12,13,14,15,16,36,37].

We are not aware of any works on neutrality in neuroevolution in DNNs. In our recent in-depth review article on neuroevolution in deep neural networks [7], we have seen that numerous studies used selection and mutation only to drive evolution in automatically finding a suitable deep neural network architecture or in training a neural network. Interestingly, many researchers have reported highly encouraging results when using these two genetic operators, including the works conducted by Real et al. [39,40] using GAs and hundreds of GPUs, as well as the work carried out by Suganuma et al. [45] employing Cartesian Genetic Programming and using only a few GPUs. If neutrality is beneficial, taking into consideration specific classes of problems, representations and genetic operators, this can also have an immediately positive impact on the time needed to test the configuration of DNNs, because the evaluation of neutral EA candidate solutions will not be necessary. There are some interesting encodings adopted by researchers, including Suganuma's work [45] (see Fig. 2), that allow the measurement of the level of neutrality present in evolutionary search and can potentially indicate whether its presence is beneficial or not in certain problems and DNNs.

Fig. 2 helps to illustrate how neutrality can explicitly be promoted (or impeded) in evolutionary algorithms. The genotypic representation of a cartesian genetic programming [30] individual encoding a CNN architecture is shown in Fig. 2 (a). This is then decoded to a phenotypic representation, Fig. 2 (b); worth noting is how gene number 5 in the genotype is not expressed in the phenotype. Thus, any mutation taking place in gene 5 will not affect the phenotype, which defines the CNN architecture depicted in Fig. 2 (c).

3.1 Does neutrality help or hinder the search of an Evolutionary Algorithm?

This question has been debated at considerable length in the literature without really reaching any form of consensus on its answer. The reasons for this situation include the lack of a single definition of neutrality, the multiple ways in which one can add neutrality to a representation, the focus on pure performance when evaluating the effects of neutrality without attention to the changes in the behaviour of the search operators and in the features of the fitness landscape, and, finally, the variability in the choice of problems, algorithms and representations for benchmarking purposes. Also, very often studies consider problems and representations that are quite complex, and results represent the composition of multiple effects.
Fig. 2 (a) Genetic representation of a cartesian GP individual encoding a CNN architecture. (b) The phenotypic representation. (c) CNN architecture defined by (a). Gene No. 5, coloured with a black background in the genotype (a), is not expressed in the phenotype. The summation node in (c), with light yellow background, performs max pooling on the LHS of the input (Node No. 3) to get the same input tensor sizes. Redrawn from Suganuma et al. [45].
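The inactive-gene mechanism of Fig. 2 can be sketched in a few lines: decoding walks backwards from the output node, and any gene outside the active path can mutate without changing the phenotype. The node numbering and layer labels below mirror Fig. 2, but the dictionary encoding and helper functions are our own simplification of a Cartesian-GP-style genotype, not the exact representation of Suganuma et al. [45].

```python
# node id -> (function label, ids of input nodes); node 0 is the input.
genotype = {
    1: ("conv(32,3)", [0]),
    2: ("pool(max)", [1]),
    3: ("conv(64,5)", [2]),
    4: ("conv(64,3)", [2]),
    5: ("pool(max)", [2]),   # not reachable from the output: inactive
    6: ("sum", [3, 4]),
}
OUTPUT = 6

def active_nodes(genotype, output):
    """Walk backwards from the output to collect the expressed genes."""
    active, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node == 0 or node in active:
            continue
        active.add(node)
        stack.extend(genotype[node][1])
    return active

def is_neutral(genotype, output, mutated_node):
    """A mutation is (structurally) neutral iff it hits an inactive gene."""
    return mutated_node not in active_nodes(genotype, output)

print(sorted(active_nodes(genotype, OUTPUT)))   # [1, 2, 3, 4, 6]: gene 5 absent
print(is_neutral(genotype, OUTPUT, 5))          # True
print(is_neutral(genotype, OUTPUT, 3))          # False
```

Counting how often `is_neutral` returns `True` during a run gives exactly the kind of neutrality measurement discussed above, and a neutral mutant need not be re-trained, since its phenotype (the CNN) is unchanged.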
We believe that one of the first steps in seeing whether neutrality helps or hinders evolution in the configuration (or training) of a deep neural network is to adopt a very simple representation, such as a binary representation, using mutation and selection as the genetic operators to guide evolution. Choosing the type of problem is a more difficult endeavour when trying to carry out this research. The reason is that much of the empirical scientific work conducted in the area of neuroevolution in deep neural networks is incredibly varied, as summarised in our recent article [7]: CNNs and computer vision datasets have been the focus of attention of the research community, and no general conclusions have been drawn in the area. However, these two can also represent good areas to be studied, given the numerous results reported in a variety of studies, helping us to use them as a basis for our research.
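A minimal version of the proposed setup might look as follows: a (1+1) EA on a binary genotype, with neutrality introduced explicitly through a redundant many-to-one genotype-phenotype map (here, majority vote over bit triplets), and OneMax standing in as a cheap proxy for the expensive DNN-configuration fitness. The redundancy scheme, genotype length and budget are illustrative assumptions only.

```python
import random

def decode(genotype):
    """Redundant map: each phenotype bit is the majority of a 3-bit block."""
    return [int(sum(genotype[i:i + 3]) >= 2) for i in range(0, len(genotype), 3)]

def fitness(phenotype):
    """OneMax stand-in for an expensive DNN-configuration evaluation."""
    return sum(phenotype)

rng = random.Random(1)
genotype = [rng.randint(0, 1) for _ in range(30)]   # 10 phenotype bits
neutral = 0
for _ in range(500):                                # (1+1) EA, one-bit mutation
    child = list(genotype)
    child[rng.randrange(len(child))] ^= 1
    if decode(child) == decode(genotype):
        neutral += 1        # neutral move: phenotype (and fitness) unchanged,
        genotype = child    # so no re-evaluation would be needed
    elif fitness(decode(child)) >= fitness(decode(genotype)):
        genotype = child    # accept equal-or-better phenotypes
print("fitness:", fitness(decode(genotype)), "neutral mutations:", neutral)
```

Replacing `fitness` with the accuracy of a trained network turns this sketch into the kind of experiment proposed above, and the `neutral` counter quantifies how many costly evaluations the neutral encoding would have saved.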
References
1. T. Bäck. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford, UK, 1996.
2. J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13:281–305, 2012.
3. J. Bergstra, D. Yamins, and D. D. Cox. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ICML'13, pages I-115–I-123. JMLR.org, 2013.
4. H.-G. Beyer and H.-P. Schwefel. Evolution strategies – a comprehensive introduction. Natural Computing: An International Journal, 1(1):3–52, May 2002.
5. A. Brock, T. Lim, J. M. Ritchie, and N. Weston. SMASH: one-shot model architecture search through hypernetworks. CoRR, abs/1708.05344, 2017.
6. O. E. David and I. Greental. Genetic algorithms for evolving deep neural networks. In Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO Comp '14, pages 1451–1452, New York, NY, USA, 2014. Association for Computing Machinery.
7. Edgar Galván and Peter Mooney. Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges. arXiv preprint arXiv:2006.05415, 2020.
8. A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing. Springer Verlag, 2003.
9. T. Elsken, J.-H. Metzen, and F. Hutter. Simple and efficient architecture search for convolutional neural networks, 2017.
10. C. Fernando, D. Banarse, M. Reynolds, F. Besse, D. Pfau, M. Jaderberg, M. Lanctot, and D. Wierstra. Convolution by evolution: Differentiable pattern producing networks. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO '16, pages 109–116, New York, NY, USA, 2016. Association for Computing Machinery.
11. E. Galván-López. An analysis of the effects of neutrality on problem hardness for evolutionary algorithms. PhD thesis, University of Essex, Colchester, UK, 2009.
12. E. Galván-López, S. Dignum, and R. Poli. The effects of constant neutrality on performance and problem hardness in GP. In M. O'Neill, L. Vanneschi, S. M. Gustafson, A. Esparcia-Alcázar, I. D. Falco, A. D. Cioppa, and E. Tarantino, editors, Genetic Programming, 11th European Conference, EuroGP 2008, Naples, Italy, March 26-28, 2008. Proceedings, volume 4971 of Lecture Notes in Computer Science, pages 312–324. Springer, 2008.
13. E. Galván-López and R. Poli. An empirical investigation of how and why neutrality affects evolutionary search. In M. Cattolico, editor, Genetic and Evolutionary Computation Conference, GECCO 2006, Proceedings, Seattle, Washington, USA, July 8-12, 2006, pages 1149–1156. ACM, 2006.
14. E. Galván-López and R. Poli. Some steps towards understanding how neutrality affects evolutionary search. In T. P. Runarsson, H. Beyer, E. K. Burke, J. J. Merelo Guervós, L. D. Whitley, and X. Yao, editors, Parallel Problem Solving from Nature - PPSN IX, 9th International Conference, Reykjavik, Iceland, September 9-13, 2006, Proceedings, volume 4193, pages 778–787. Springer, 2006.
15. E. Galván-López and R. Poli. An empirical investigation of how degree neutrality affects GP search. In A. H. Aguirre, R. M. Borja, and C. A. R. García, editors, MICAI 2009: Advances in Artificial Intelligence, 8th Mexican International Conference on Artificial Intelligence, Guanajuato, Mexico, November 9-13, 2009. Proceedings, volume 5845 of Lecture Notes in Computer Science, pages 728–739. Springer, 2009.
16. E. Galván-López, R. Poli, A. Kattan, M. O'Neill, and A. Brabazon. Neutrality in evolutionary algorithms... what do we know? Evolving Systems, 2(3):145–163, 2011.
17. C. Goerick and T. Rodemann. Evolution strategies: An alternative to gradient based learning.
18. I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
19. G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Comput., 18(7):1527–1554, July 2006.
20. J. H. Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge, MA, USA, 1992.
21. K. Kandasamy, W. Neiswanger, J. Schneider, B. Póczos, and E. P. Xing. Neural architecture search with bayesian optimisation and optimal transport. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 2020–2029, 2018.
22. S. Khadka, J. J. Chung, and K. Tumer. Neuroevolution of a modular memory-augmented neural network for deep memory problems. Evol. Comput., 27(4):639–664, 2019.
23. M. Kimura. Evolutionary rate at the molecular level. Nature, 217:624–626, 1968.
24. M. Kimura. The Neutral Theory of Molecular Evolution. Cambridge University Press, 1983.
25. J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
26. Y. LeCun, Y. Bengio, and G. E. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
27. H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search, 2018.
28. W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi. A survey of deep neural network architectures and their applications. Neurocomputing, 234:11–26, 2017.
29. M. Mandischer. A comparison of evolution strategies and backpropagation for neural network training. Neurocomputing, 42(1):87–117, 2002. Evolutionary neural systems.
30. J. F. Miller. Cartesian Genetic Programming, pages 17–34. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
31. D. J. Montana and L. Davis. Training feedforward neural networks using genetic algorithms. In Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI'89, pages 762–767, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc.
32. G. Morse and K. O. Stanley. Simple evolutionary optimization can rival stochastic gradient descent in neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO '16, pages 477–484, New York, NY, USA, 2016. Association for Computing Machinery.
33. R. Negrinho and G. Gordon. DeepArchitect: Automatically designing and training deep architectures, 2017.
34. K. Pawełczyk, M. Kawulok, and J. Nalepa. Genetically-trained deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO '18, pages 63–64, New York, NY, USA, 2018. Association for Computing Machinery.
35. R. Poli. Analysis of the publications on the applications of particle swarm optimisation. J. Artif. Evol. App., 2008:4:1–4:10, Jan. 2008.
36. R. Poli and E. Galván-López. On the effects of bit-wise neutrality on fitness distance correlation, phenotypic mutation rates and problem hardness. In C. R. Stephens, M. Toussaint, D. Whitley, and P. F. Stadler, editors, Foundations of Genetic Algorithms, pages 138–164, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
37. R. Poli and E. Galván-López. The effects of constant and bit-wise neutrality on problem hardness, fitness distance correlation and phenotypic mutation rates. IEEE Trans. Evolutionary Computation, 16(2):279–300, 2012.
38. V. W. Porto, D. B. Fogel, and L. J. Fogel. Alternative neural network training methods. IEEE Expert: Intelligent Systems and Their Applications, 10(3):16–22, June 1995.
39. E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 4780–4789. AAAI Press, 2019.
40. E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin. Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 2902–2911. JMLR.org, 2017.
41. I. Rechenberg. Evolutionsstrategien. In B. Schneider and U. Ranft, editors, Simulationsmethoden in der Medizin und Biologie, pages 83–114, Berlin, Heidelberg, 1978. Springer Berlin Heidelberg.
42. C. Ryan, J. Collins, and M. O'Neill. Grammatical evolution: Evolving programs for an arbitrary language. In W. Banzhaf, R. Poli, M. Schoenauer, and T. C. Fogarty, editors, Genetic Programming, pages 83–96, Berlin, Heidelberg, 1998. Springer Berlin Heidelberg.
43. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, Jan. 2016.
44. F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. ArXiv, abs/1712.06567, 2017.
45. M. Suganuma, S. Shirakawa, and T. Nagao. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '17, pages 497–504, New York, NY, USA, 2017. Association for Computing Machinery.
46. Y. Sun, B. Xue, M. Zhang, and G. G. Yen. Completely automated CNN architecture design based on blocks. IEEE Transactions on Neural Networks and Learning Systems, 31(4):1242–1254, 2020.
47. Y. Sun, B. Xue, M. Zhang, and G. G. Yen. Evolving deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation, 24(2):394–407, 2020.
48. R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA, 2018.
49. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1–9. IEEE Computer Society, 2015.
50. N. I. Tapia and P. A. Estevez. On the information plane of autoencoders, Jul 2020.
51. N. Tishby and N. Zaslavsky. Deep learning and the information bottleneck principle. CoRR, abs/1503.02406, 2015.
52. S. Xie, H. Zheng, C. Liu, and L. Lin. SNAS: Stochastic neural architecture search, 2018.
53. X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423–1447, 1999.
54. S. Yu and J. C. Principe. Understanding autoencoders with information theoretic concepts, 2019.
55. S. Zagoruyko and N. Komodakis. Wide residual networks. In R. C. Wilson, E. R. Hancock, and W. A. P. Smith, editors,