A Hybrid Method for Training Convolutional Neural Networks
Vasco Lopes
Department of Computer Science, Universidade da Beira Interior
Covilhã, Portugal
[email protected]
Paulo Fazendeiro
Department of Computer Science, Universidade da Beira Interior, Instituto de Telecomunicações
Covilhã, Portugal
[email protected]
Abstract—Artificial Intelligence algorithms have been steadily increasing in popularity and usage. Deep learning allows neural networks to be trained using huge datasets and also removes the need for human-extracted features, as it automates the feature learning process. At the heart of training deep neural networks, such as Convolutional Neural Networks, we find backpropagation, which, by computing the gradient of the loss function with respect to the weights of the network for a given input, allows the weights of the network to be adjusted so that the network performs better in the given task. In this paper, we propose a hybrid method that uses both backpropagation and evolutionary strategies to train Convolutional Neural Networks, where the evolutionary strategies are used to help avoid local minima and to fine-tune the weights, so that the network achieves higher accuracy results. We show that the proposed hybrid method is capable of improving upon regular training in the task of image classification on CIFAR-10, where a VGG16 model was used and the final test results increased by 0.61%, on average, when compared to using only backpropagation.
Index Terms—Deep Learning, Convolutional Neural Network, Weight Training, Hybrid Method, Back-Propagation
I. INTRODUCTION
In the recent past, we have witnessed how Artificial Intelligence (AI) has been re-shaping the world, mainly because of deep learning, which removed the need for extensive human hand-crafted features [16]. Deep learning not only provided a way to automate the process of extracting features but, more importantly, it paved the way for democratizing AI, since it allowed people with less knowledge about those technologies to take advantage of them. With the exponential growth of deep learning and the way it can aid developers in reaching new heights, new algorithms to solve both new and existing problems have surfaced, such as Generative Adversarial Networks [7], which have proved to be excellent generators of faces [12], objects [33], styles [39], text [36], attributes [9], among others [34]. Moreover, many traditional AI algorithms have been used in a "deep" way, which is the case of Convolutional Neural Networks (CNNs) [17]. Due to recent advances in hardware, enabling deep networks with dozens of hidden layers to be trained in useful time, CNNs became one of the most popular algorithms used to solve problems related to computer vision [27]. They have been successfully used in image classification tasks [8], [10], [15], [32], object detection [23], biometrics [22], [38], medical image analysis [28], and text analysis [5], [13], among many other applications. The reason for CNNs to be widely used is mainly the good results they achieve in virtually all the tasks they have been applied to.

Even though CNNs achieve good results and remove the need for human-extracted features, they still have setbacks, the most prominent ones being the need for a great deal of data and getting stuck in local minima while training.
Both situations occur because of the training procedure, which is usually conducted using the backpropagation algorithm [25]. Backpropagation is one of the fundamental foundations of neural networks, allowing neural networks to be trained by applying the chain rule in two steps: a forward pass to determine the output of the network, and a backward pass to adjust the network's parameters according to the gradient descent of the error function. However, it is common that this backward pass, searching for the global minimum of the error in a multi-dimensional space, gets stuck in local minima. In this paper, we propose a hybrid method that uses both backpropagation and an evolutionary algorithm to train CNNs, in order to tackle the problem of converging to and getting stuck in local minima. The proposed hybrid training method works in two steps: 1) performing regular training using backpropagation for n epochs; 2) continuing the training procedure using backpropagation, whilst periodically pausing to apply the evolutionary algorithm to the weights of the last layer, in order to avoid local minima. In short, the proposed method works by training using backpropagation until an initial criterion is met; afterwards, the training procedure using backpropagation and the evolutionary algorithm is resumed until a convenient stopping criterion is met. Notice that, in step 1, n can be selected in various ways: until the validation loss ceases to decrease, fixed in advance (e.g., 15 epochs), among others. In our experiments, we focused on defining n depending on the validation set. The reasons to only evolve the last-layer weights are: 1) time constraints, because determining the fitness of every individual in this setting is time-costly; 2) in some transfer learning [20] setups, where the original task is closely related to the new one,
only the last layer requires fine-tuning, which served as an inspiration.

In short, the contributions of this work can be summarized as a proposal to improve upon traditional CNN training by combining regular training with backpropagation and evolutionary strategies. More importantly, this proposal can in fact aid the training procedure in avoiding local minima, and it can be used both as a normal training procedure or to fine-tune a previously trained network. We openly release the code for this proposal, which was developed using PyTorch [21].

The remainder of this work is organized as follows. Section II presents the related work. Section III explains the proposed method in detail, and how the evolutionary strategy is conducted. Section IV presents the setup of the experiments and the dataset used, and discusses the results obtained. Finally, Section V provides a conclusion.

II. RELATED WORK
Evolutionary algorithms have been extensively used to solve parameter optimization problems [1], [2]. Tim Salimans et al. [26] show that evolution strategies can be an alternative to reinforcement learning in tasks such as humanoid locomotion and Atari games. With focus on solving the same problems, Felipe Petroski Such et al. [31] use evolution strategies to evolve the weights of a deep neural network. This method shares similarities with the one proposed in this paper, but whereas the focus of our work is to complement regular training, they evolve the weights of the entire network, starting from random weights and using a single mutation, which requires significant computational power. In the topic of neuroevolution, evolutionary strategies have been extensively used to evolve networks [6], [11], [30]. Recently, evolution strategies have also been used to evolve deeper neural networks [18], [19], which is the case of RENAS, which uses evolutionary algorithms and reinforcement learning to perform neural architecture search [4].

CNN architectures that are known to do well in specific tasks and can easily be transferred to other problems while still performing well are game-changers; they are used in many fields and problems, and often improve the results of the previous state-of-the-art in such tasks, which is the case of image classification, among others [8], [10], [15], [17], [29], [32], [35], [37]. Usually these gains are the direct result of topology improvements and other mechanisms for efficient training, which is the case of skip connections [8]. The training procedure of all of the aforementioned networks was performed using backpropagation, which is the default standard for neural network training.

III. PROPOSED METHOD
Inspired by transfer learning [3], we propose a hybrid method to complement backpropagation [24] in the task of adjusting weights during CNN training. (Code available at: https://github.com/VascoLopes/HybridCNNTrain)

Fig. 1: Diagram of the training procedure. First, regular training is performed for n epochs. After this, regular training and the evolutionary algorithm carry out the training until the limit of epochs is reached, with the evolutionary algorithm taking place every y-th epoch.

The proposed method works by complementing the traditional training with an evolutionary algorithm that evolves the weights of the fully connected output layer of a CNN. By doing so, the hybrid method assists the regular training in avoiding local minima and, as it only evolves the weights of the last layer, it does so in useful time. The proposed method works by performing regular training (using only backpropagation) for a predefined number of epochs, n; after those epochs are completed, the method continues to train until convergence, but every y epochs the evolutionary strategy takes place and evolves the weights of the last layer, in order to guide the training out of possible local minima. The reason to allow regular training alone for the first n epochs is to let the network reach a point where its loss is low more quickly than by using both methods throughout the entire training.

The hybrid method is then composed of two steps: 1) regular training for n epochs; 2) regular training until convergence, whilst having an evolutionary algorithm taking place every y epochs, as represented in the diagram presented in Figure 1. The evolutionary algorithm developed is based on three components: the first component is the initial population, which is created using a heuristic initialization, whilst the other two components are focused on evolving the weights, namely the crossovers and mutations, which replace the population by a new one at the end of a generation.
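As a minimal sketch of this two-step schedule (the helper functions `train_one_epoch` and `evolve_last_layer` are hypothetical stand-ins for one backpropagation epoch and for the evolutionary step, respectively; they are not part of the released code):

```python
def hybrid_train(train_one_epoch, evolve_last_layer, weights,
                 n=50, y=10, max_epochs=100):
    """Hybrid schedule: plain backpropagation for the first n epochs,
    then an evolutionary step on the last layer every y epochs until
    max_epochs is reached."""
    for epoch in range(1, max_epochs + 1):
        weights = train_one_epoch(weights)        # regular backprop epoch
        if epoch >= n and (epoch - n) % y == 0:   # epochs n, n+y, n+2y, ...
            weights = evolve_last_layer(weights)  # ES on the output layer only
    return weights
```

With n = 50, y = 10 and a limit of 100 epochs, the evolutionary step would run at epochs 50, 60, 70, 80, 90 and 100.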
Note that elitism is also employed, whereby the best individual is kept unchanged into the next generation. This both forces the algorithm to find weights that are close to the best individuals and, at the same time, ensures that in the worst possible case, where all generated weights are worse than the initial ones, the performance of the CNN is not jeopardized: if all are worse, elitism keeps the initial weights throughout the evolution. In the next sections, we explain the three components in detail.

A. Initial Population
To generate the initial population, we used a heuristic initialization, meaning that instead of randomly generating the initial population, we generate it based on a heuristic for the problem. The heuristic used is simply to mutate the weights coming from the regular training: we generate i − 1 individuals based on the initial weights, and the i-th element is the initial weights without changes (use of elitism). To generate the initial population, we then perform the following mutation i − 1 times, once for each new individual:

W = W_i + R · mag,  R ∼ N(0, 1)    (1)

where W_i represents the weights obtained from the regular training, R is a 1D array of pseudo-random numbers drawn from a normal distribution, mag is a real-valued magnitude factor that smooths the mutations, and W is the resulting array of weights that represents the new individual.

B. Evolution

With the initial population of i individuals generated, the evolution takes place for e generations. As aforementioned, each generation is composed of i individuals, where the i-th is the individual with the highest fitness from the previous generation. Moreover, one can think of the initial population as generation 0, and of the i-th element of that generation as the original, unchanged weights. To generate the individuals that compose the next generation, we implemented three types of mutation and one type of crossover, which are explained in detail in the next sections.
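The heuristic initialization of Section III-A can be sketched as follows (a NumPy-based illustration; the function name and `rng` parameter are ours, not from the released code):

```python
import numpy as np

def initial_population(w_init, i=100, mag=1.0, rng=None):
    """Heuristic initialization: i-1 perturbed copies of the trained
    weights via Eq. (1), W = W_i + R*mag with R ~ N(0, 1); the i-th
    individual is the unchanged original (elitism)."""
    rng = np.random.default_rng() if rng is None else rng
    population = [w_init + rng.standard_normal(w_init.shape) * mag
                  for _ in range(i - 1)]
    population.append(w_init.copy())   # elite: original weights kept intact
    return population
```

Keeping the unperturbed weights as the last individual guarantees that the starting point of the evolution is never lost.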
1) Crossover:
To perform crossover in order to generate new individuals, we implemented a block-wise crossover, where m individuals are selected to generate one offspring, each contributing a block of its own weights. To select the m individuals, we force that only the top 30% can be chosen as parents, and from that ordered set we randomly select the m parents. To generate a new individual, we first need to calculate the size of the blocks that each parent will contribute:

k = l div m    (2)

where l is the total size of the weights (length of the array), m is the number of parents, and k is the result of the integer division of l by m, representing the size of each block. With this, a new individual A is generated by concatenating the blocks of all parents:

A = [a_1, ..., a_m]    (3)

where a_i represents the block contributed by the i-th parent. Each block a_i, from the first parent until the (m − 1)-th parent, can be represented as:

a_i = P_i[(i − 1) · k : i · k],  i = 1, ..., m − 1    (4)

where P is the array of parents. The final block, a_m, is defined differently when there is more than one parent, to accommodate lengths that are not evenly divisible:

a_m = P_m[(m − 1) · k : l]    (5)
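Equations (2) to (5) translate directly into array slicing; the sketch below assumes the m parents have already been drawn from the top 30% (parent selection is omitted here for brevity):

```python
import numpy as np

def block_crossover(parents):
    """Block-wise crossover, Eqs. (2)-(5): each of the m parents
    contributes one contiguous block of size k = l div m; the last
    parent's block extends to the end of the array, absorbing the
    remainder when l is not divisible by m."""
    m = len(parents)
    l = parents[0].size
    k = l // m                                  # Eq. (2)
    blocks = [parents[i][i * k:(i + 1) * k]     # Eq. (4), 0-indexed
              for i in range(m - 1)]
    blocks.append(parents[-1][(m - 1) * k:l])   # Eq. (5)
    return np.concatenate(blocks)               # Eq. (3)
```

For example, with three parents of length 10, k = 3, so the offspring takes indices 0–2 from the first parent, 3–5 from the second, and 6–9 from the third.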
2) Mutation:
For weight mutation, which can be applied after crossover to the new individual or, if crossover has not taken place, to one individual in the top 30%, we implemented three methods. The first is exactly the one described in Section III-A, Equation (1), where all the weights in the weight array suffer a small mutation.

The second is a block mutation, where the weight array of the individual being mutated is split into b equal blocks and one random block suffers the mutation:

B = B + R · mag,  R ∼ U(−B, B)    (6)

where B represents the weights in the block, mag is a real-valued magnitude factor that smooths the mutations, and R is a 1D array of pseudo-random numbers drawn from a uniform distribution, where each value is bounded by the value at the same position in the B array.

The last mutation implemented is a per-value mutation, where, for each element in the weights array, a pseudo-random number is generated and, if it is lower than a probability P_v, the element suffers a mutation:

v = v + r · mag,  r ∼ U(−v, v)    (7)

where v represents the element at a given position of the weights array, r is a pseudo-random number drawn from a uniform distribution, and mag is a real-valued magnitude factor that smooths the mutation.

Hence, the mutation algorithm is composed of the three aforementioned mutation protocols. It is important to note that only one mutation can occur in a given individual, but, as mentioned before, a mutation can take place in an individual newly generated by crossover.

IV. EXPERIMENTS
In order to evaluate the performance of the proposed method, we conducted an experiment using VGG16 [29], a well-known CNN. In the following sections, we explain the experimental setup, the dataset, and the results obtained.
A. Experimental setup
For this work, we used a computer with an NVIDIA GeForce GTX 1080 Ti, 16 GB of RAM, a 500 GB SSD and an AMD Ryzen 7 2700 processor. Regarding the number of epochs for the training without the use of evolution, we set n to 50 and y to 10, meaning that every 10 epochs, starting at epoch 50, the evolution of the weights takes place. We also set a maximum of 100 epochs for the entire training. Regarding the evolution, we set the number of individuals per generation, i, to 100 and the number of generations, g, to 5. Moreover, fixed probabilities were set for performing crossover and for performing mutations, as well as for choosing among the three mutation types (per value, per block and entire array). Regarding the magnitude values, in Equation (1) we used mag = 1, and in Equations (6) and (7) a smaller value.

We performed the experiment three times with the aforementioned setup and, to have a baseline, we also performed, three times, a simple regular training using only backpropagation for the entirety of the 100 epochs. Note that we used the same seeds for the hybrid method and the regular method, meaning that three seeds were used, one for each experiment, each experiment comprising a run of the hybrid method and a run of the regular method.

B. CIFAR-10
CIFAR-10 is a dataset composed of 10 classes with a total of 60,000 RGB images with a size of 32 by 32 pixels [14]. It was created by labelling a subset of the 80 Million Tiny Images dataset. The official dataset is split into five training batches of 10 thousand images each and a batch of 10 thousand images for testing. The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. As the dataset was manually labelled, the authors took the precaution of ensuring that the classes are mutually exclusive, meaning that there is no overlap between any two classes. Note that the classes are balanced, each one having 6,000 images, and the evaluation metric used is accuracy. For the experiments conducted, we divided the data into training, validation and test sets, with stratified and balanced splits.

C. Results and Discussion
In Table I, we show the accuracy and standard deviation at each epoch where the evolution could take place (epoch 50 and every 10 epochs after), for both the hybrid method and the regular training. For the hybrid method, we also report the performance at the end of each generation. From this, we can see that the hybrid method is capable of further improving the weights obtained by regular training, meaning that, every time the evolution takes place, it achieves higher performance than using only the regular method for training. The performance of the CNN on the validation set and on the test set after completing the training process, and the time cost of training, for both the hybrid method and the regular method (baseline), can be seen in Table II. There, it is possible to see that the hybrid method is capable of complementing backpropagation and achieving higher performance than the regular method on both the validation and test sets. However, this improvement in accuracy comes at a higher time cost.

TABLE I: Mean validation accuracy and standard deviation at every 10 epochs after epoch 50, for both the regular training alone and for the proposed hybrid method, where the accuracy is shown for every generation, the fifth generation marking the end of the epoch.

TABLE II: Mean accuracy and standard deviation on the validation and test sets, and the time spent training and testing, for both the Hybrid Method and the Regular Method (baseline).

V. CONCLUSION

Deep learning has extended the use of Artificial Intelligence to virtually every field of research, due to its capability of learning in an automated way, without requiring extensive human-extracted features. To train CNNs, backpropagation is the go-to algorithm: it uses gradients to adjust the weights, resulting in trained CNNs with higher performances than randomly generated weights. Many proposals fine-tune the weights to achieve even better results, which is the case of transfer learning, where a CNN is trained on a given task and the weights are then re-used to train a CNN with the same topology (or a close one, as the last layer is normally changed to match the classification neurons) on other tasks. This inspired us to create a hybrid method that uses both the traditional, regular training with backpropagation and evolutionary strategies to fine-tune the weights of the last fully connected layer. The proposed method is capable of further improving the results of a CNN and can be used to fine-tune CNNs. This work shows that there is potential in using hybrid methods, since in our experiments the hybrid method was capable of achieving better results than the regular method alone (only backpropagation). The downside is that evolutionary strategies often require great computational power, as they usually perform unguided searches throughout the search space and require many individuals to find good candidates. However, this presents research opportunities in the field of evolving only a subset of the weights, which can be further extended to new problems and to evolutionary mechanisms that are known to improve the performance of such systems. Therefore, this method can be used in two ways: 1) as presented here, where a CNN is trained from scratch using the hybrid method; or 2) to fine-tune a neural network that is already trained, finding weights that achieve better performances.

REFERENCES

[1] Thomas Back.
Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, 1996.
[2] Thomas Back, Frank Hoffmeister, and Hans-Paul Schwefel. A survey of evolution strategies. In Proceedings of the Fourth International Conference on Genetic Algorithms, volume 2. Morgan Kaufmann Publishers, San Mateo, CA, 1991.
[3] Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, pages 17–36, 2012.
[4] Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, and Xinggang Wang. RENAS: Reinforced evolutionary neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4787–4796, 2019.
[5] Alexis Conneau, Holger Schwenk, Loïc Barrault, and Yann Lecun. Very deep convolutional networks for text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 1107–1116, Valencia, Spain, April 2017. Association for Computational Linguistics.
[6] Dario Floreano, Peter Dürr, and Claudio Mattiussi. Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1(1):47–62, 2008.
[7] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[9] Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen. AttGAN: Facial attribute editing by only changing what you want. IEEE Transactions on Image Processing, 28(11):5464–5478, November 2019.
[10] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
[11] C. Igel. Neuroevolution for reinforcement learning using evolution strategies. In The 2003 Congress on Evolutionary Computation (CEC '03), volume 4, pages 2588–2595, December 2003.
[12] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, 2018.
[13] Yoon Kim. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, Doha, Qatar, October 2014. Association for Computational Linguistics.
[14] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[16] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[17] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[18] Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu. Hierarchical representations for efficient architecture search. arXiv preprint arXiv:1711.00436, 2017.
[19] Zhichao Lu, Ian Whalen, Vishnu Boddeti, Yashesh Dhebar, Kalyanmoy Deb, Erik Goodman, and Wolfgang Banzhaf. NSGA-Net: a multi-objective genetic algorithm for neural architecture search. 2018.
[20] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2009.
[21] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035, 2019.
[22] H. Proença and J. C. Neves. Deep-PRWIS: Periocular recognition without the iris and sclera using deep learning frameworks. IEEE Transactions on Information Forensics and Security, 13(4):888–896, April 2018.
[23] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[24] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science, 1985.
[25] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[26] Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864, 2017.
[27] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
[28] Dinggang Shen, Guorong Wu, and Heung-Il Suk. Deep learning in medical image analysis. Annual Review of Biomedical Engineering, 19:221–248, 2017.
[29] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[30] Kenneth O. Stanley and Risto Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2):99–127, 2002.
[31] Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv preprint arXiv:1712.06567, 2017.
[32] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[33] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. Generating videos with scene dynamics. In Advances in Neural Information Processing Systems 29, pages 613–621. Curran Associates, Inc., 2016.
[34] Zhengwei Wang, Qi She, and Tomas E. Ward. Generative adversarial networks: A survey and taxonomy. arXiv preprint arXiv:1906.01529, 2019.
[35] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1492–1500, 2017.
[36] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[37] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
[38] Luiz A. Zanlorensi, Rayson Laroca, Eduardo Luz, Alceu S. Britto Jr, Luiz S. Oliveira, and David Menotti. Ocular recognition databases and competitions: A survey. arXiv preprint arXiv:1911.09646, 2019.
[39] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, 2017.