Evolving GAN Formulations for Higher Quality Image Synthesis
Santiago Gonzalez†, Apple Inc., Cupertino, [email protected]
Mohak Kant, Cognizant Technology Solutions, San Francisco, [email protected]
Risto Miikkulainen, University of Texas at Austin, Austin, Texas, and Cognizant Technology Solutions, San Francisco, [email protected]
ABSTRACT
Generative Adversarial Networks (GANs) have extended deep learning to complex generation and translation tasks across different data modalities. However, GANs are notoriously difficult to train: mode collapse and other instabilities in the training process often degrade the quality of the generated results, such as images. This paper presents a new technique called TaylorGAN for improving GANs by discovering customized loss functions for each of its two networks. The loss functions are parameterized as Taylor expansions and optimized through multiobjective evolution. On an image-to-image translation benchmark task, this approach qualitatively improves generated image quality and quantitatively improves two independent GAN performance metrics. It therefore forms a promising approach for applying GANs to more challenging tasks in the future.
CCS CONCEPTS
• Computing methodologies → Machine learning algorithms; Distributed algorithms; Computer vision; Genetic algorithms; Neural networks

KEYWORDS
Loss Functions, Metalearning, Evolutionary Strategies
1 INTRODUCTION

Generative Adversarial Networks (GANs) have recently emerged as a promising technique for building models that generate new samples according to a distribution within a dataset. In GANs, two separate networks, a generator and a discriminator, are trained in tandem in an adversarial fashion: The generator attempts to synthesize samples that the discriminator believes are real, while the discriminator attempts to differentiate between samples from the generator and samples from a ground-truth dataset. However, GANs are challenging to train. Training often suffers from instabilities that can lead to low-quality and potentially low-variety generated samples. These difficulties have led many researchers to try formulating better GANs, primarily by designing new generator and discriminator loss functions by hand.

Neuroevolution may offer a solution to this problem. It has recently been extended from optimizing network weights and topologies to designing deep learning architectures [19, 31, 39]. Advances in this field, known as evolutionary metalearning, have resulted in designs that outperform those that are manually tuned.

† Work done while at the University of Texas at Austin and Cognizant Technology Solutions.
One particular family of techniques, loss-function metalearning, has allowed neural networks to be trained more quickly, with higher accuracy, and with better robustness [6, 7]. Perhaps loss-function metalearning can be adapted to improve GANs as well?

In this paper, such a technique is developed to evolve entirely new GAN formulations that outperform the standard Wasserstein loss. Leveraging the TaylorGLO loss-function parameterization approach [8], separate loss functions are constructed for the two GAN networks. A genetic algorithm is then used to optimize their parameters against two non-differentiable objectives. A composite transformation of these objectives [36] is further used to enhance the multiobjective search.

This TaylorGAN approach is evaluated experimentally on an image-to-image translation benchmark task where the goal is to generate photorealistic building images from building segment maps. The CMP Facade dataset [41] is used as the training data and the pix2pix-HD conditional GAN [45] as the generative model. The approach is found both to qualitatively enhance generated image quality and to quantitatively improve the two metrics. The evaluation thus demonstrates how evolution can improve a leading conditional GAN design by replacing manually designed loss functions with those optimized by a multiobjective genetic algorithm.

The next section reviews key literature on GANs, motivating the evolution of their loss functions. Section 3 describes the TaylorGAN technique, focusing on how the TaylorGLO loss-function parameterization is leveraged, and Section 4 details the experimental configuration and evaluation methodologies. In Section 5, TaylorGAN's efficacy is evaluated on the benchmark task. Section 6 places these findings in the general context of the GAN literature and describes potential avenues for future work.
2 BACKGROUND

Generative Adversarial Networks (GANs) [9] are a type of generative model consisting of a pair of networks, a generator and a discriminator, that are trained in tandem. GANs are a modern successor to Variational Autoencoders (VAEs) [15] and Boltzmann Machines [13], including Restricted Boltzmann Machines [38] and Deep Boltzmann Machines [35].

The following subsections review prominent GAN methods. Key GAN formulations, and the relationships between them, are described. Consistent notation (shown in Table 1) is used, consolidating the extensive variety of notation in the field.

Table 1: GAN Notation

Symbol         Description
𝐺(𝒛, 𝜃_𝐺)      Generator function
𝐷(𝒙, 𝜃_𝐷)      Discriminator function
P_data         Probability distribution of the original data
P_𝑧            Latent vector noise distribution
P_𝑔            Probability distribution of 𝐺(𝒛)
𝒙              Data, where 𝒙 ∼ P_data
𝒙̃              Generated data
𝒛              Latent vector, where 𝒛 ∼ P_𝑧
𝒄              Condition vector
𝜆              Various types of weights / hyperparameters
A GAN's generator and discriminator are set to compete with each other in a minimax game, attempting to reach a Nash equilibrium [12, 27]. Throughout the training process, the generator aims to transform samples from a prior noise distribution into data, such as an image, that tricks the discriminator into thinking it has been sampled from the real data's distribution. Simultaneously, the discriminator aims to determine whether a given sample came from the real data's distribution or was generated from noise.

Unfortunately, GANs are difficult to train, frequently exhibiting instabilities such as mode collapse, where the modes of the target data distribution are not fully represented by the generator [2, 10, 14, 21–23, 30]. GANs that operate on image data often suffer from visual artifacts and blurring of generated images [14, 28]. Additionally, datasets with low variability have been found to degrade GAN performance [22].

GANs are also difficult to evaluate quantitatively, typically relying on metrics that attempt to embody vague notions of quality. Popular GAN image scoring metrics, for example, have been found to have many pitfalls, including cases where two samples of clearly disparate quality may have similar values [3].
Using the notation described in Table 1, the original minimax GAN formulation from [9] can be defined as

  min_{𝜃_𝐺} max_{𝜃_𝐷}  E_{𝒙∼P_data}[log 𝐷(𝒙)] + E_{𝒛∼P_𝑧}[log(1 − 𝐷(𝐺(𝒛)))].   (1)

This formulation can be broken down into two separate loss functions, one each for the discriminator and generator:

  L_𝐷 = −(1/𝑛) Σ_{i=1}^{𝑛} [log 𝐷(𝒙_i) + log(1 − 𝐷(𝐺(𝒛_i)))], and   (2)

  L_𝐺 = (1/𝑛) Σ_{i=1}^{𝑛} log(1 − 𝐷(𝐺(𝒛_i))).   (3)

The discriminator's loss function is equivalent to a sigmoid cross-entropy loss when the discriminator is thought of as a binary classifier. Goodfellow et al. [9] proved that training a GAN with this formulation is equivalent to minimizing the Jensen-Shannon divergence between P_𝑔 and P_data, i.e., a symmetric divergence metric based on the Kullback-Leibler divergence.

In the above formulation, the generator's loss saturates quickly, since the discriminator learns to reject the novice generator's samples early on in training. To resolve this problem, Goodfellow et al. provided a second "non-saturating" formulation with the same fixed-point dynamics but stronger gradients for the generator early on:

  max_{𝜃_𝐷}  E_{𝒙∼P_data}[log 𝐷(𝒙)] + E_{𝒛∼P_𝑧}[log(1 − 𝐷(𝐺(𝒛)))],   (4)

  max_{𝜃_𝐺}  E_{𝒛∼P_𝑧}[log 𝐷(𝐺(𝒛))].   (5)

Each GAN training step consists of training the discriminator for 𝑘 steps while training the generator for only one step. This difference in steps helps prevent the discriminator from learning too quickly and overpowering the generator.

Alternatively, Unrolled GANs [23] aimed to prevent the discriminator from overpowering the generator by using a discriminator that has been unrolled for a certain number of steps in the generator's loss, thus allowing the generator to train against a more optimal discriminator. More recent GAN work instead uses a two time-scale update rule (TTUR) [12], where the two networks are trained under different learning rates for one step each.
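The minimax and non-saturating losses above (Equations 2–5) can be written down directly; the following is a minimal NumPy sketch, not the paper's implementation, assuming the discriminator outputs probabilities in (0, 1):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Eq. 2: sigmoid cross-entropy over real and generated batches.
    d_real = D(x_i), d_fake = D(G(z_i)), both arrays of probabilities."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def minimax_generator_loss(d_fake):
    """Eq. 3: mean log(1 - D(G(z))); saturates once the discriminator
    confidently rejects the generator's samples."""
    return np.mean(np.log(1.0 - d_fake))

def non_saturating_generator_loss(d_fake):
    """Eq. 5 as a minimization: -log D(G(z)); stronger gradients early on."""
    return -np.mean(np.log(d_fake))
```

For a generated sample the discriminator scores at 0.5, the minimax loss is log 0.5 while the non-saturating loss is log 2; as D(G(z)) approaches 0, the former flattens while the latter grows without bound, which is the gradient behavior the text describes.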
This approach has proven to converge more reliably to desirable solutions.

Unfortunately, with both minimax and non-saturating GANs, the generator gradients vanish for samples that are on the correct side of the decision boundary but far from the true data distribution [21, 22]. The Wasserstein GAN, described next, is designed to solve this problem.

The Wasserstein GAN (WGAN) [2] is arguably one of the most impactful developments in the GAN literature since the original formulation by Goodfellow et al. [9]. WGANs minimize the Wasserstein-1 distance between P_𝑔 and P_data, rather than the Jensen-Shannon divergence, in an attempt to avoid vanishing-gradient and mode-collapse issues. In the context of GANs, the Wasserstein-1 distance can be defined as

  𝑊(P_𝑔, P_data) = inf_{𝛾∈Π(P_𝑔, P_data)} E_{(𝒖,𝒗)∼𝛾}[‖𝒖 − 𝒗‖],   (6)

where 𝛾(𝒖, 𝒗) represents the amount of mass that needs to move from 𝒖 to 𝒗 for P_𝑔 to become P_data. This formulation with the infimum is intractable, but the Kantorovich-Rubinstein duality [43] with a supremum makes the Wasserstein-1 distance tractable while imposing a 1-Lipschitz smoothness constraint:

  𝑊(P_𝑔, P_data) = sup_{‖𝑓‖_L ≤ 1}  E_{𝒖∼P_𝑔}[𝑓(𝒖)] − E_{𝒖∼P_data}[𝑓(𝒖)],   (7)

which translates to the training objective

  min_{𝜃_𝐺} max_{𝜃_𝐷∈Θ_𝐷}  E_{𝒙∼P_data}[𝐷(𝒙)] − E_{𝒛∼P_𝑧}[𝐷(𝐺(𝒛))],   (8)

where Θ_𝐷 is the set of all parameters for which 𝐷 is a 1-Lipschitz function.

WGANs are an excellent example of how generator and discriminator loss functions can profoundly impact the quality of generated samples and the prevalence of mode collapse. However, the WGAN has a 1-Lipschitz constraint that needs to be maintained throughout training for the formulation to work.
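The objective in Equation 8 translates into simple per-network losses for the critic and generator; a minimal NumPy sketch (Lipschitz enforcement omitted), where the scores are unbounded critic outputs rather than probabilities:

```python
import numpy as np

def wgan_critic_loss(d_real, d_fake):
    """Negation of Eq. 8's inner objective so it can be minimized:
    the critic pushes real scores up and fake scores down."""
    return -(np.mean(d_real) - np.mean(d_fake))

def wgan_generator_loss(d_fake):
    """The generator ascends the critic's score on generated samples."""
    return -np.mean(d_fake)
```

Because these losses are linear in the critic's output, their gradients do not saturate the way the log-based losses do, which is the vanishing-gradient argument made above.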
WGANs enforce the constraint via gradient clipping, at the cost of requiring an optimizer that does not use momentum, i.e., RMSProp [40] rather than Adam [16]. To resolve the issues caused by gradient clipping, a subsequent formulation, WGAN-GP [10], added a gradient penalty regularization term to the discriminator loss:

  GP = 𝜆 E_{𝒙̂∼P_𝒙̂}[(‖∇_𝒙̂ 𝐷(𝒙̂)‖ − 1)²],   (9)

where P_𝒙̂ samples uniformly along lines between P_data and P_𝑔. The gradient penalty enforces a soft Lipschitz smoothness constraint, leading to a more stationary loss surface than when gradient clipping is used, which in turn makes it possible to use momentum-based optimizers. The gradient penalty term has even been used successfully in non-Wasserstein GANs [5, 22]. However, gradient penalties can increase memory and compute costs [22].

Another attempt to solve the issue of vanishing gradients is the Least-Squares GAN (LSGAN) [21]. It defines the training objective as

  min_{𝜃_𝐷}  E_{𝒙∼P_data}[(𝐷(𝒙) − 𝑏)²] + E_{𝒛∼P_𝑧}[(𝐷(𝐺(𝒛)) − 𝑎)²],   (10)

  min_{𝜃_𝐺}  E_{𝒛∼P_𝑧}[(𝐷(𝐺(𝒛)) − 𝑐)²],   (11)

where 𝑎 is the label for generated data, 𝑏 is the label for real data, and 𝑐 is the label that 𝐺 wants to trick 𝐷 into believing for generated data. In practice, typically 𝑎 = 0, 𝑏 = 1, 𝑐 =
1. However, subsequently, 𝑎 = −1, 𝑏 = 1, 𝑐 = 0 were shown to minimize the Pearson 𝜒² divergence [29] between P_data + P_𝑔 and 2P_𝑔. Generated data quality can oscillate throughout the training process [22], indicating a disparity between data quality and loss.

Traditional GANs learn to generate data from a latent space, i.e., an embedded representation of the training data that the generator constructs. Typically, the elements of a latent space have no immediately intuitive meaning [4, 17]. Thus, GANs can generate novel data, but there is no way to steer the generation process toward particular types of data. For example, a GAN that generates images of human faces cannot be explicitly told to generate a face with a particular hair color or of a specific gender. While techniques have been developed to analyze this latent space [18, 44], or to build more interpretable latent spaces during the training process [4], they do not necessarily translate a human's prior intuition correctly or make use of labels when they are available. To tackle this problem, Conditional GANs, first proposed as future work in [9] and subsequently developed by Mirza and Osindero [25], allow directly targetable features (i.e., conditions) to be an integral part of the generator's input.

The conditioned training objective for a minimax GAN can be defined, without loss of generality, as

  min_{𝜃_𝐺} max_{𝜃_𝐷}  E_{𝒙∼P_data}[log 𝐷(𝒙 ⊕ 𝒄)] + E_{𝒛∼P_𝑧}[log(1 − 𝐷(𝐺(𝒛 ⊕ 𝒄)))],   (12)

where 𝒛 ⊕ 𝒄 is basic concatenation of vectors. During training, the condition vector 𝒄 arises from the sampling process that produces each 𝒙. This same framework can be used to design conditioned variants of other GAN formulations.

Conditional GANs have enjoyed great success as a result of their flexibility, even in the face of large, complex condition vectors, which may even be whole images.
They enable new applications for GANs, including repairing software vulnerabilities (framed as sequence-to-sequence translation) [11], integrated circuit mask design [1], generating images from text [32], and image-to-image translation [14], which is used as the target setting for this paper. Notably, conditional GANs can increase the quality of generated samples for labeled datasets even when conditioned generation is not needed [42]. Conditional GANs are therefore used as the platform for the TaylorGAN technique described in the next section.

The GAN formulations described above all have one property in common: The generator and discriminator loss functions have been arduously derived by hand. A GAN's performance and stability are greatly impacted by the choice of loss functions. Different regularization terms, such as the aforementioned gradient penalty, can also affect a GAN's training. These elements of the GAN are typically designed to minimize a specific divergence. However, a GAN does not need to decrease a divergence at every step in order to reach the Nash equilibrium [5]. In this situation, an automatic loss-function optimization system may find novel GAN loss functions with more desirable properties. Such a system is presented in Section 3 and evaluated on conditional GANs in Section 5.
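The conditioning operator ⊕ in Equation 12 is plain vector concatenation; a minimal sketch with hypothetical dimensions (a 100-dim latent and a 12-class one-hot condition, chosen only for illustration):

```python
import numpy as np

def concat(a, b):
    """Vector concatenation: the ⊕ operator in Eq. 12."""
    return np.concatenate([a, b], axis=-1)

z = np.zeros(100)          # latent vector z ~ P_z (all zeros for the sketch)
c = np.eye(12)[3]          # one-hot condition vector for class index 3
gen_input = concat(z, c)   # generator sees z ⊕ c
x = np.zeros(256)          # a (flattened) real or generated sample
disc_input = concat(x, c)  # discriminator sees x ⊕ c
```

In image-to-image settings like the one used in this paper, 𝒄 is an entire annotation image rather than a one-hot vector, and the concatenation happens along the channel dimension instead, but the principle is the same.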
3 THE TAYLORGAN TECHNIQUE

As GANs have grown in popularity, the difficulties involved in training them have become increasingly evident. The loss functions used to train a GAN's generator and discriminator constitute the core of how GANs are formulated. Thus, optimizing these loss functions jointly can result in better GANs. This section presents an extension of TaylorGLO that evolves loss functions for GANs. Images generated in this way improve both visually and quantitatively, as the experiments in Section 5 show.

The TaylorGLO parameterization represents a loss function as a modified third-degree Taylor polynomial. Such a parameterization has many desirable properties, such as smoothness and continuity, that make it amenable to evolution [8]. In TaylorGAN, there are three functions that need to be optimized jointly (using the notation described in Table 1):

(1) the real component of the discriminator's loss, a function of 𝐷(𝒙), the discriminator's output for a real sample from the dataset;
(2) the synthetic (fake) component of the discriminator's loss, a function of 𝐷(𝐺(𝒛)), the discriminator's output for a generated sample, where 𝒛 is drawn from the latent distribution; and
(3) the generator's loss, a function of 𝐷(𝐺(𝒛)).

The discriminator's full loss is simply the sum of components (1) and (2). Table 2 shows how existing GAN formulations can be broken down into this tripartite loss.

Table 2: Interpretation of existing GAN formulations. These three components are all that is needed to define the discriminator's and generator's loss functions (sans regularization terms).
Thus, TaylorGAN can discover and optimize new GAN formulations by jointly evolving three separate functions.

Formulation                 Loss_𝐷 (real)       Loss_𝐷 (fake)           Loss_𝐺 (fake)
                            E_{𝒙∼P_data}        E_{𝒛∼P_𝑧}               E_{𝒛∼P_𝑧}
GAN (minimax) [9]           −log 𝐷(𝒙)           −log(1 − 𝐷(𝐺(𝒛)))       log(1 − 𝐷(𝐺(𝒛)))
GAN (non-saturating) [9]    −log 𝐷(𝒙)           −log(1 − 𝐷(𝐺(𝒛)))       −log 𝐷(𝐺(𝒛))
WGAN [2]                    −𝐷(𝒙)               𝐷(𝐺(𝒛))                 −𝐷(𝐺(𝒛))
LSGAN [21]                  (𝐷(𝒙) − 1)²         (𝐷(𝐺(𝒛)))²              (𝐷(𝐺(𝒛)) − 1)²

These three functions can be evolved jointly. GAN loss functions have a single input, i.e., 𝐷(𝒙) or 𝐷(𝐺(𝒛)). Thus, a set of three third-order TaylorGLO loss functions for GANs requires only 12 parameters to be optimized, making the technique quite efficient.

Fitness for each set of three functions requires a different interpretation than in regular TaylorGLO. Since GANs cannot be thought of as having an accuracy, a different metric needs to be used. The choice of fitness metric depends on the type of problem and target application. In the uncommon case where the training data's sampling distribution is known, the clear choice is the divergence between that distribution and the distribution of samples from the generator. This approach will be used in the experiments below.

Reliable metrics of visual quality are difficult to define. Individual image-quality metrics can be exploited by adversarially constructed, lesser-quality images [3]. For this reason, TaylorGAN utilizes a combination of two or more metrics and multiobjective optimization over them. Good solutions are usually located near the middle of the resulting Pareto front, and they can be found effectively through an objective transformation technique called Composite Objectives [36]. In this technique, evolution is performed against a weighted sum of metrics. Individual metrics are scaled such that their ranges of typical values match. Thus, if one metric improves, overall fitness will only increase if there is not a comparable regression along another metric.
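The tripartite decomposition can be sketched concretely. The following is a simplified reading of the parameterization, assuming each of the three functions is a cubic polynomial of its single input with a center and three coefficients (four parameters each, hence 3 × 4 = 12); the exact TaylorGLO parameterization in [8] may differ in details:

```python
import numpy as np

def taylor_loss(y, params):
    """Third-degree Taylor expansion of a one-input loss about center a:
    c1*(y - a) + c2*(y - a)**2 + c3*(y - a)**3, averaged over a batch."""
    a, c1, c2, c3 = params
    d = y - a
    return np.mean(c1 * d + c2 * d**2 + c3 * d**3)

def gan_losses(d_real, d_fake, genome):
    """Tripartite loss: the discriminator's loss is the sum of its real and
    fake components; the generator has its own fake component.
    genome holds the full 12-parameter candidate."""
    p_real, p_fake, p_gen = genome[:4], genome[4:8], genome[8:12]
    loss_d = taylor_loss(d_real, p_real) + taylor_loss(d_fake, p_fake)
    loss_g = taylor_loss(d_fake, p_gen)
    return loss_d, loss_g
```

As a sanity check, the genome [0, −1, 0, 0, 0, 1, 0, 0, 0, −1, 0, 0] recovers the WGAN row of Table 2: Loss_𝐷(real) = −𝐷(𝒙), Loss_𝐷(fake) = 𝐷(𝐺(𝒛)), and Loss_𝐺 = −𝐷(𝐺(𝒛)).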
4 EXPERIMENTAL SETUP

The technique was integrated into the LEAF evolutionary AutoML framework [20]. TaylorGAN parameters were evolved by the LEAF genetic algorithm as if they were hyperparameters. The implementation of CoDeepNEAT [24] for neural architecture search in LEAF was not used.

The technique was evaluated on the CMP Facade dataset [41] with a pix2pix-HD model [45]. The dataset consists of only 606 perspective-corrected 256 ×
256 pixel images of building facades. Each image has a corresponding annotation image that segments facades into twelve different components, such as windows and doors. The objective is for the model to take an arbitrary annotation image as input and generate a photorealistic facade as output. The dataset was split into a training set with 80% of the images, and validation and testing sets, each with a disjoint 10% of the images.

Two metrics were used to evaluate loss-function candidates: (1) the structural similarity index measure (SSIM) [46] between generated and ground-truth images, and (2) perceptual distance, implemented as the 𝐿₂ distance between VGG-16 [37] ImageNet [34] embeddings for generated and ground-truth images. During evolution, a composite objective [36] of these two metrics was used to evaluate candidates. The metrics were normalized (i.e., SSIM was multiplied by 17 and perceptual distance by −
1) to have a similar impact on evolution.

The target GAN model, pix2pix-HD, is a refinement of the seminal pix2pix model [14]. Both models generate images conditioned upon an input image; thus, they are trained with paired images. The baseline was trained with the Wasserstein loss [2] and spectral normalization [26] to enforce the Lipschitz constraint on the discriminator. The pix2pix-HD model is also trained with additive perceptual-distance and discriminator-feature losses. Both additive losses are multiplied by ten in the baseline. Models were trained for 60 epochs.

When running experiments, each of the twelve TaylorGAN parameters was evolved within [−, ]. The learning rate and the weights for both additive losses were also evolved, since the baseline values, which are optimal for the Wasserstein loss, may not necessarily be optimal for TaylorGAN loss functions.

5 RESULTS

TaylorGAN found a set of loss functions that outperformed the original Wasserstein loss with spectral normalization. After 49 generations of evolution, it discovered the loss functions

  L_{𝐷 real} = .(𝐷(𝒙) − .) + .(𝐷(𝒙) − .)² + .(𝐷(𝒙) − .)³,   (13)

  L_{𝐷 fake} = .(𝐷(𝐺(𝒛)) − .) + .(𝐷(𝐺(𝒛)) − .)² + .(𝐷(𝐺(𝒛)) − .)³,   (14)

  L_{𝐺 fake} = .(𝐷(𝐺(𝒛)) − .) + .(𝐷(𝐺(𝒛)) − .)² + .(𝐷(𝐺(𝒛)) − .)³.   (15)

A learning rate of 0.… was also discovered.
Figure 1: Five random samples from the CMP Facade test dataset, comparing Wasserstein and TaylorGAN loss functions. The loss functions are used to train pix2pix-HD models that take architectural element annotations (top row) and generate corresponding photorealistic images similar to the ground truth (second row). Images from the model trained with TaylorGAN (bottom row) have a higher quality than the baseline (third row). TaylorGAN images have more realistic coloration, better separation of the buildings from the sky, and finer details than the baseline.

Qualitatively, TaylorGAN images more closely match the ground-truth images' typical coloration. Note that color information is not included in the model's input, so per-sample color matching is not possible. Additionally, TaylorGAN images tend to have higher-quality fine-grained details. For example, facade textures are unnaturally smooth and clean in the baseline, almost appearing to be made of plastic.

Quantitatively, the TaylorGAN model also outperforms the Wasserstein baseline. Across ten Wasserstein baseline runs, the average test-set SSIM was 9.… .

6 DISCUSSION AND FUTURE WORK

The results in this paper show that evolving GAN formulations is a promising direction for research. On the CMP Facade benchmark dataset, TaylorGAN discovered powerful loss functions: with them, GANs generated images that were qualitatively and quantitatively better than those produced by GANs with a Wasserstein loss. This unique application showcases the power and flexibility of evolutionary loss-function metalearning and suggests that it may provide a crucial ingredient in making GANs more reliable and scalable to harder problems.

At first glance, optimizing GAN loss functions appears difficult because it is hard to quantify a GAN's performance: performance can be improved on an individual metric without increasing the quality of generated images.
Multiobjective evolution, via composite objectives, is thus a key technique that allows evolution to work on GAN formulations. By optimizing against multiple metrics, each with its own biases, no individual metric's bias will deleteriously affect the path evolution takes.

There are several avenues of future work with TaylorGAN. First, it can naturally be applied to different datasets and different types of GANs. While image-to-image translation is an important GAN domain, there are many others that can benefit from optimization, such as image super-resolution and unconditioned image generation. Since TaylorGAN customizes loss functions for a given task, dataset, and architecture, unique sets of loss functions could be discovered for each of them.

Second, there is a wide space of metrics, such as Delta E perceptual color distance [33], that quantify different aspects of image quality. They can be used to evaluate GANs in more detail and thus guide multiobjective evolution more precisely, potentially resulting in more effective and creative solutions.
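The composite objective described above, using the normalization weights reported in Section 4 (SSIM scaled by 17, perceptual distance by −1), reduces the two metrics to one scalar that evolution maximizes; a minimal sketch:

```python
def composite_fitness(ssim, perceptual_distance):
    """Composite objective: scale each metric so typical values have
    similar magnitude, then sum. SSIM (higher is better) is weighted by 17;
    perceptual distance (lower is better) by -1. With matched scales, one
    metric cannot raise fitness while another comparably regresses."""
    return 17.0 * ssim - 1.0 * perceptual_distance
```

For example, a candidate that gains 0.1 SSIM but also gains 1.7 in perceptual distance has unchanged fitness, which is exactly the trade-off behavior the composite transformation is meant to enforce.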
7 CONCLUSION

While GANs provide fascinating opportunities for generating realistic content, they are difficult to train and evaluate. This paper proposes an evolutionary metalearning technique, TaylorGAN, to optimize a crucial part of their design automatically. By evolving loss functions customized to the task, dataset, and architecture, GANs can be made more stable and can generate qualitatively and quantitatively better results. TaylorGAN may therefore serve as a crucial stepping stone toward scaling up GANs to a wider variety of harder problems.
REFERENCES
[1] Alawieh, M. B., Lin, Y., Zhang, Z., Li, M., Huang, Q., and Pan, D. Z. 2019. GAN-SRAF: Sub-Resolution Assist Feature Generation Using Conditional Generative Adversarial Networks. In Proceedings of the 56th Annual Design Automation Conference (DAC). ACM, 149.
[2] Arjovsky, M., Chintala, S., and Bottou, L. 2017. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), Vol. 70. PMLR, Sydney, Australia, 214–223. http://proceedings.mlr.press/v70/arjovsky17a.html
[3] Borji, A. 2019. Pros and cons of GAN evaluation measures. Computer Vision and Image Understanding 179 (2019), 41–65.
[4] Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., and Abbeel, P. 2016. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. In Advances in Neural Information Processing Systems 29. Curran Associates, Inc., 2172–2180.
[5] Fedus, W., Rosca, M., Lakshminarayanan, B., Dai, A. M., Mohamed, S., and Goodfellow, I. 2018. Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step. In Proceedings of the Sixth International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=ByQpn1ZA-
[6] Gonzalez, S. and Miikkulainen, R. 2020. Evolving Loss Functions With Multivariate Taylor Polynomial Parameterizations. arXiv preprint arXiv:2002.00059 (2020).
[7] Gonzalez, S. and Miikkulainen, R. 2020. Improved Training Speed, Accuracy, and Data Utilization Through Loss Function Optimization. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC).
[8] Gonzalez, S. and Miikkulainen, R. 2020. Optimizing Loss Functions Through Multivariate Taylor Polynomial Parameterization. (2020).
[9] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
[10] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. 2017. Improved Training of Wasserstein GANs. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 5767–5777. http://papers.nips.cc/paper/7159-improved-training-of-wasserstein-gans.pdf
[11] Harer, J., Ozdemir, O., Lazovich, T., Reale, C., Russell, R., Kim, L., and Chin, P. 2018. Learning to Repair Software Vulnerabilities with Generative Adversarial Networks. In Advances in Neural Information Processing Systems 31. Curran Associates, Inc., 7933–7943.
[12] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 6626–6637. https://proceedings.neurips.cc/paper/2017/file/8a1d694707eb0fefe65871369074926d-Paper.pdf
[13] Hinton, G. E. and Sejnowski, T. J. 1983. Optimal perceptual inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 448–453.
[14] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. 2016. Image-to-Image Translation with Conditional Adversarial Networks. arXiv (2016).
[15] Kingma, D. and Welling, M. 2014. Auto-Encoding Variational Bayes. In Proceedings of the Second International Conference on Learning Representations (ICLR).
[16] Kingma, D. P. and Ba, J. 2015. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2015).
[17] Larsen, A. B. L., Sønderby, S. K., Larochelle, H., and Winther, O. 2015. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300 (2015).
[18] Li, M., Xi, R., Chen, B., Hou, M., Liu, D., and Guo, L. 2019. Generate Desired Images from Trained Generative Adversarial Networks. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
[19] Liang, J., Meyerson, E., Hodjat, B., Fink, D., Mutch, K., and Miikkulainen, R. 2019. Evolutionary Neural AutoML for Deep Learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). http://nn.cs.utexas.edu/?liang:gecco19
[20] Liang, J., Meyerson, E., Hodjat, B., Fink, D., Mutch, K., and Miikkulainen, R. 2019. Evolutionary neural AutoML for deep learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). 401–409.
[21] Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., and Smolley, S. P. 2017. Least Squares Generative Adversarial Networks. In The IEEE International Conference on Computer Vision (ICCV).
[22] Mao, X., Li, Q., Xie, H., Lau, R. Y. K., Wang, Z., and Smolley, S. P. 2018. On the effectiveness of least squares generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).
[23] Metz, L., Poole, B., Pfau, D., and Sohl-Dickstein, J. 2016. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163 (2016).
[24] Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Shahrzad, H., Navruzyan, A., Duffy, N., et al. 2019. Evolving deep neural networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing. Elsevier, 293–312.
[25] Mirza, M. and Osindero, S. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
[26] Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. 2018. Spectral Normalization for Generative Adversarial Networks. In Proceedings of the Sixth International Conference on Learning Representations (ICLR).
[27] Nash, J. 1951. Non-cooperative games. Annals of Mathematics (1951), 286–295.
[28] Odena, A., Dumoulin, V., and Olah, C. 2016. Deconvolution and Checkerboard Artifacts. Distill (2016). https://doi.org/10.23915/distill.00003
[29] Pearson, K. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50, 302 (1900), 157–175.
[30] Radford, A., Metz, L., and Chintala, S. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015).
[31] Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. 2019. Regularized Evolution for Image Classifier Architecture Search. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI.
[32] Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. 2016. Generative Adversarial Text to Image Synthesis. In Proceedings of the 33rd International Conference on Machine Learning (ICML), Vol. 48. PMLR, New York, New York, USA, 1060–1069. http://proceedings.mlr.press/v48/reed16.html
[33] Robertson, A. R. 1990. Historical development of CIE recommended color difference equations. Color Research & Application 15, 3 (1990), 167–170.
[34] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 3 (2015), 211–252.
[35] Salakhutdinov, R. and Hinton, G. 2009. Deep Boltzmann Machines. In Artificial Intelligence and Statistics. 448–455.
[36] Shahrzad, H., Fink, D., and Miikkulainen, R. 2018. Enhanced optimization with composite objectives and novelty selection. In Artificial Life Conference Proceedings. MIT Press, 616–622.
[37] Simonyan, K. and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[38] Smolensky, P. 1986. Information processing in dynamical systems: Foundations of harmony theory. Technical Report. University of Colorado at Boulder, Department of Computer Science.
[39] Stanley, K. O., Clune, J., Lehman, J., and Miikkulainen, R. 2019. Designing Neural Networks through Evolutionary Algorithms. Nature Machine Intelligence 1 (2019), 24–35.
[40] Tieleman, T. and Hinton, G. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4, 2 (2012), 26–31.
[41] Tyleček, R. and Šára, R. 2013. Spatial Pattern Templates for Recognition of Objects with Regular Structure. In Proceedings of the German Conference on Pattern Recognition (GCPR). Springer, Saarbrucken, Germany, 364–374.
[42] van den Oord, A., Kalchbrenner, N., Espeholt, L., Kavukcuoglu, K., Vinyals, O., and Graves, A. 2016. Conditional Image Generation with PixelCNN Decoders. In Advances in Neural Information Processing Systems 29. Curran Associates, Inc., 4790–4798. http://papers.nips.cc/paper/6527-conditional-image-generation-with-pixelcnn-decoders.pdf
[43] Villani, C. 2009. The Wasserstein distances. In Optimal Transport. Springer, 93–111.
[44] Volz, V., Schrum, J., Liu, J., Lucas, S. M., Smith, A., and Risi, S. 2018. Evolving Mario levels in the latent space of a deep convolutional generative adversarial network. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). ACM, 221–228.
[45] Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., and Catanzaro, B. 2018. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 8798–8807.
[46] Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.