Regularized Generative Adversarial Network
Gabriele Di Cerbo, Ali Hirsa, Ahmad Shayaan

Abstract.
We propose a framework for generating samples from a probability distribution that differs from the probability distribution of the training set. We use an adversarial process that simultaneously trains three networks: a generator and two discriminators. We refer to this new model as the regularized generative adversarial network (RegGAN). We evaluate RegGAN on a synthetic dataset composed of gray scale images, and we further show that it can be used to learn some pre-specified notions in topology (basic topological properties). The work is motivated by practical problems encountered while using generative methods in the art world.
Contents
1. Introduction
2. Related work
3. Dataset and topology
4. Generative Adversarial Networks
5. RegGAN
6. Empirical validation
7. Conclusion
8. Future work
9. RegGAN in art
References

1. Introduction
In recent years, adversarial models have proven themselves to be extremely valuable in learning and generating samples from a given probability distribution [AML11]. What is interesting about generative adversarial networks (GANs) [GAN14] is that they are capable of mimicking any non-parametric distribution. On the other hand, it is fairly common that we are interested in generating samples from a probability distribution that differs from that of the training set. We propose a method that allows us to use generative models to generate samples from a probability distribution even though we do not have samples of it in the training dataset. The key idea is to use a pre-trained network to drive the loss function in the learning process of a GAN.

Our main contributions are:
• We propose and evaluate a new architecture (RegGAN) for a generative adversarial network which is able to generate samples from a target distribution that does not appear in the training set.
• We show that these methods can be used as a data augmentation technique to improve the performance of one of the discriminators.
• We discuss how to use convolutional neural networks (CNNs) to learn discontinuous functions and use them in the loss function of a GAN, avoiding differentiability issues.
• We show that our model is able to learn basic topological properties of two dimensional sets.

At the end of this paper we briefly discuss our initial motivation for developing these techniques. It all started as a collaboration with a paper cutting artist, with the goal of producing a generative model able to reproduce his style. We will not touch on the artistic implications of our work, we reserve that for another paper, but we will briefly explain the problems we encountered and show some of the work done with the artist.

(This work was inspired by an artist named Marco Gallotta, marcogallotta.net. Marco introduced us to his paper cutting art, shared images he has created, and was eager to know how AI techniques can be used to create new images based on his work. We are very grateful for his time and effort in introducing us to his work and in closely working with us to assess the progress of our work.)
2. Related work
There are different ways one can try to control the output of a GAN. One of the very first works on this problem is the so-called Conditional GAN from the paper [MO], where the authors introduced the use of labels in the training set. The generation of images can be conditioned on a class label, allowing the generator to produce images of a certain label only. In order to do this, one needs to slightly change the architecture of the GAN.

Another class of models relevant to our project is Importance Weighted GANs, introduced in [DESCSW]. Here the output of a GAN is controlled by changing the loss function: the authors introduce differential weights in the loss function to drive different aspects of the generated images.

Our work should be thought of as a combination of the two approaches above. We use weights in the loss function of our architecture, but the weights are given by the labels of a CNN.

3. Dataset and topology
It is known that deep neural networks are data hungry. To avoid any issue with lack of training data, we use a synthetic dataset composed of 10k gray scale images, so that we can generate enough samples for training. The images are generated by drawing a random number of pure black circles with a Gaussian blur of random intensity. This produces blob-like pictures, of which some samples are shown in Figure 1.
Figure 1. Samples of blobs from the dataset

For a given picture, we define its number of connected components to be the number of connected components of the region in two dimensional space produced by the non-white pixels, with the topology defined by the Euclidean distance; see [Mun] for a good treatment of these notions in topology. For the purpose of our application, we are interested in generating images in the same style as the dataset but with only one connected component. On the other hand, our dataset has been generated in such a way that the images have a number of connected components between 8 and 20, as shown in Figure 1.
3.1. Score function.
The number of connected components is a useful topological invariant of a region, but it is not a very flexible one. For this reason, we define a function that measures how far a region is from being connected. Since images are presented as a collection of gray scale pixels, or equivalently a square matrix with entries between 0 and 1, the function below depends on the choice of a threshold α.

Let M be an n × n matrix with entries 0 ≤ a_ij ≤ 1, and fix 0 < α < 1. Let M̄ be the matrix with entries ā_ij defined by

ā_ij = 1 if a_ij ≥ α,  ā_ij = 0 if a_ij < α.

Let M_o be the largest connected component of M̄. Here we define a connected component to be the matrix composed of all entries with value 1 that share some common side with at least one other element of the same component. The largest connected component is the one that contains the largest number of 1's. Note that there could be more than one largest connected component, but they all share the same area. If we represent pixels as squares of fixed side length in the Euclidean plane, M_o corresponds to the largest connected component of the region defined by the pixels with value 1 under the Euclidean topology.

For a given n × n matrix M = (a_ij) we define ‖M‖ = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij. For a matrix with entries 0 or 1, ‖M‖ computes the area of the region defined by the pixels with value 1.

We are now ready to define the score function s : R^{n×n} → R as

s(M) = ‖M_o‖ / ‖M̄‖.

Note that 0 ≤ s(M) ≤ 1, and s(M) = 1 if and only if M̄ has a unique connected component. The above definition depends on a choice of α; for the rest of this paper we will assume that α = 0.6. That value was chosen by trial and error: we settled on a value that worked reasonably well for our dataset.

One of the main technical problems encountered in this paper is the fact that s is not a continuous function. It is easier to imagine the behavior of s acting on regions of the plane. If our region is composed of two disconnected disks of equal area, then s has value 0.5; as the disks move toward each other, s keeps the constant value 0.5 until the disks touch, at which point s jumps to the value 1.

3.2. Learning discontinuous functions.
Since s is not a differentiable function, it cannot be used in combination with gradient descent while training the model. To overcome this problem we use a convolutional neural network (CNN) [Neocog80], [TDNN89], [ConvNets89] to learn the score function. A CNN will not perform well if we just try to learn the function s as it is. The main idea here is to bin together images in the dataset with similar score values. More precisely, we create 11 labels corresponding to the values obtained by applying .round() to 10s(M). For example, as we are working with torch tensors, .round() returns 0 for all values between 0 and 0.499, and 1 for all values between 0.5 and 1.499. This turns learning s into a classification problem, on which CNNs are known to perform well.
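To make the construction concrete, here is a minimal sketch of the score function s and the label binning described above, in plain Python (the function names are ours; the paper's actual implementation works on torch tensors, and we assume α = 0.6 and 4-connectivity as in the text):

```python
from collections import deque

def binarize(M, alpha=0.6):
    # M-bar: entries >= alpha become 1, everything else 0
    return [[1 if a >= alpha else 0 for a in row] for row in M]

def component_sizes(B):
    # sizes of the 4-connected components of the 1-entries of B (BFS flood fill)
    n, m = len(B), len(B[0])
    seen = [[False] * m for _ in range(n)]
    sizes = []
    for i in range(n):
        for j in range(m):
            if B[i][j] == 1 and not seen[i][j]:
                seen[i][j] = True
                q, size = deque([(i, j)]), 0
                while q:
                    x, y = q.popleft()
                    size += 1
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        u, v = x + dx, y + dy
                        if 0 <= u < n and 0 <= v < m and B[u][v] == 1 and not seen[u][v]:
                            seen[u][v] = True
                            q.append((u, v))
                sizes.append(size)
    return sizes

def score(M, alpha=0.6):
    # s(M) = ||M_o|| / ||M-bar||: largest component area over total area
    sizes = component_sizes(binarize(M, alpha))
    return max(sizes) / sum(sizes) if sizes else 0.0

def label(M, alpha=0.6):
    # one of the 11 classes 0..10, obtained by rounding 10 * s(M)
    return round(10 * score(M, alpha))
```

For a connected image score returns 1.0 (label 10); for two equal disconnected blobs it returns 0.5 (label 5), matching the disk example above.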
4. Generative Adversarial Networks
Given training data ∼ p_data(x), we wish to generate new samples from the distribution p_data(x). Not knowing p_data(x), the goal is to find a p_model(x) we can sample from. In generative models we learn a p_model(x) which is similar to p_data(x). This turns out to be a maximum likelihood problem:

θ* = argmax_θ E_{x∼p_data}[log p_model(x | θ)]

The work in generative models can be categorized as (a) explicit density and (b) implicit density. In the explicit density case we assume some parametric form for the density and utilize Markov techniques to track the distribution, or to update it as more data is processed; MCMC techniques are an example [MCMC83], [MCMC03]. In the implicit density case, it is not possible to construct a parametric form; we assume some non-parametric form and then try to learn it.

GANs are designed to avoid using Markov chains, because of the high computational cost of Markov chains. Another advantage, relative to Boltzmann machines [BM07], is that the generator function has much fewer restrictions (there are only a few probability distributions that admit Markov chain sampling). Goodfellow et al. (2014) introduced GANs in a paper titled Generative Adversarial Networks [GAN14]. They are deep neural networks that contain two networks competing with one another, which is where the name comes from, used in unsupervised machine learning.

A GAN is a framework for estimating generative models through an adversarial process in which two models are trained simultaneously: a generative model that captures the data distribution, and a discriminative model that estimates the probability that a sample came from the training data rather than from the generator.

Training in a GAN is set up as a min-max game (in the sense of game theory) with value function V(G, D):

min_G max_D V(G, D) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]

This means the generator G tries ever harder to fool the discriminator D, while the discriminator D becomes more and more cautious about being fooled by the generator G.

What makes GANs very interesting and appealing is that they can learn to copy and imitate any distribution of data. At first, GANs were used to improve images and to make high-quality pictures and anime characters; more recently they have been taught to create things amazingly similar to our surroundings. However, a vanilla GAN is simplistic and not able to learn high-dimensional distributions, especially in computer vision.
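The value function above can be estimated empirically on mini-batches. The sketch below (our own toy code, with D and G as plain callables returning probabilities and samples respectively) computes such an estimate; at the theoretical equilibrium, where D outputs 1/2 everywhere, it recovers the known value V = −log 4:

```python
import math

def value_estimate(D, G, real_batch, noise_batch):
    # empirical estimate of
    #   V(G, D) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
    real_term = sum(math.log(D(x)) for x in real_batch) / len(real_batch)
    fake_term = sum(math.log(1.0 - D(G(z))) for z in noise_batch) / len(noise_batch)
    return real_term + fake_term

# at the optimum of the game D outputs 1/2 everywhere, so
# V(G, D) = log(1/2) + log(1/2) = -log 4
v = value_estimate(lambda x: 0.5, lambda z: z, [0.0, 1.0], [0.3, 0.7])
```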
Figure 2. GAN architecture

In training GANs there are a number of common failure modes, from vanishing gradients to mode collapse, which make training problematic; see [SGZCRC] and [GANMC20]. These common issues are areas of active research. We will address mode collapse later.
4.1. Deep Convolutional Generative Adversarial Networks.
Vanilla GANs are not capable of capturing the complexity of images, and it is natural to introduce convolutional networks into GANs; that is what is done in DCGAN [DCGAN16]. DCGANs bridge the gap between the success of CNNs for supervised learning and unsupervised learning in a GAN. The authors introduce a class of GANs, called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. (Evolving from GAN to DCGAN should be seen as the evolution from feedforward neural networks to CNNs.)

In our work, we first applied and trained a DCGAN on our sample blob images to generate images with many connected components. The DCGAN architecture is shown in Figure 3.

Figure 3. DCGAN architecture

This network was able to generate images; however, it did not capture the connected components in the images. Results from training are shown in Figure 4.
Figure 4. Images generated by DCGAN

4.2. Weighted Deep Convolutional Generative Adversarial Networks.
To improve the performance, we tried to add a penalty function to the loss function. This approach is not new and has been extensively studied in the literature; see for example [DESCSW]. In general, if one is interested in any regularization, one way is to add an explicit penalty function to the original loss function as follows:

L̃(Θ) = L(Θ) − λ × score(Θ)

where the score function measures certain characteristics of the object under consideration, for example the ratio of the biggest connected component to the entire area in an image. In learning, the explicit penalty alone does not work: the score function has to be incorporated into the learning process. However, as explained earlier, the score function is not differentiable, which is a major problem here. Moreover, one needs to find a reasonable weight for the score function in the loss function: if we give it too much weight, the model will not be able to learn, and the best it can do is to generate entirely black images to maximize the score.

We tried to use a weighted deep convolutional generative adversarial network (WDCGAN) to generate the images. WDCGANs are an extension of DCGAN in which both the discriminator and the generator are conditioned on some extra information by introducing weights in the loss function. WDCGANs have been successfully used to generate medical data [RCGAN18]. The high level architecture of the WDCGAN is shown in Figure 5.

Figure 5. Weighted DCGAN architecture

Conditioning the model on the extra information gives the model an initial starting point for generating data. It also gives the user more control over what type of data the model should generate. Note that we do not condition the generator or the discriminator on the number of connected components, as there are no images with only one connected component in the dataset. On the other hand, we hoped that weighting the loss function with the score function would provide the model with enough information to generate images with the desired structure.

However, we empirically found that this is not the case: the model fails to sufficiently leverage the extra information provided and capture the structure of the images. The images generated by WDCGAN are shown in Figure 6, and it can be seen there that the model is not able to use the extra information we supplied. A key point in the training of weighted GANs is the use of differentiable weights, which ultimately is the main issue in our case. To avoid that issue, we add a second discriminator to learn the score function and include it in the learning process, in order to generate images with a large single connected component.
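To see why the weight λ in L̃(Θ) = L(Θ) − λ × score(Θ) is delicate, consider a toy calculation (the numbers below are hypothetical, chosen only for illustration). When λ is too large, a degenerate all-black output, which trivially has score 1, achieves a lower penalized loss than a realistic output:

```python
def penalized_loss(base_loss, score_value, lam):
    # explicit regularization: L~(theta) = L(theta) - lam * score(theta)
    return base_loss - lam * score_value

# hypothetical numbers: a realistic image fits the data well but has a
# modest score; an all-black image fits poorly but has score 1
realistic = penalized_loss(base_loss=0.2, score_value=0.3, lam=5.0)
all_black = penalized_loss(base_loss=2.0, score_value=1.0, lam=5.0)
# with lam = 5.0 the degenerate all-black output attains the lower loss;
# with lam = 0.5 the realistic output wins instead
```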
Figure 6. Images generated by WDCGAN

5. RegGAN
5.1. Model architecture.
The RegGAN architecture consists of two discriminators and a single generator. The second discriminator, a classifier, is used to simulate the score function that we designed. The first discriminator is used to differentiate between the images generated by the network and the ones from the dataset. The dataset is composed of images of size 64 × 64, which determines the number of convolutional layers of the networks. The architecture is shown in Figure 7.
5.2. Loss function.
The loss function in RegGAN is given by

min_G max_{D_1,D_2} V(G, D_1, D_2) = E_x[log D_1(x)] + E_z[log(1 − D_1(G(z)))] + E_z[log(1 − D_2(G(z)))]   (1)

The number of publications on deep learning applications is enormous. The initial aim of this study was to construct a network that can mimic an artist's patterns with connected components, and we naturally thought to call it ArtGAN, but recognized the name was taken [ArtGAN17]. Our architecture consists of two discriminators, so it would have been natural to call it D2GAN or DDGAN, but those two names are taken as well [D2GAN17]. We thought of YAGAN (Yet Another GAN), inspired by YACC (Yet Another Compiler-Compiler) [YACC75], but the name would not reflect the nature of the proposed architecture. In our design, the second discriminator implicitly plays the role of a regularizer; for that reason we name it RegGAN, for regularized GAN.
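As with the standard GAN value function, the three-term objective (1) can be estimated on mini-batches. A sketch in the same toy style (D1, D2 and G are plain callables here; in RegGAN itself they are the discriminator, the score classifier and the generator networks):

```python
import math

def reggan_value(D1, D2, G, real_batch, noise_batch):
    # empirical estimate of
    #   V(G, D1, D2) = E_x[log D1(x)]
    #                + E_z[log(1 - D1(G(z)))]
    #                + E_z[log(1 - D2(G(z)))]
    real_term = sum(math.log(D1(x)) for x in real_batch) / len(real_batch)
    fake_term = sum(math.log(1.0 - D1(G(z))) for z in noise_batch) / len(noise_batch)
    reg_term = sum(math.log(1.0 - D2(G(z))) for z in noise_batch) / len(noise_batch)
    return real_term + fake_term + reg_term
```

With both D1 and D2 outputting 1/2 everywhere, the estimate evaluates to 3 · log(1/2).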
Figure 7. RegGAN architecture
5.3. Classifier.
This network is composed of 4 convolutional layers, 2 max pool layers and 3 linear layers. We pre-trained it on the dataset as a classifier of the images, where the labels are assigned by the score function s as explained before. We use cross entropy loss to train it. Around 15k iterations we get close to 80% accuracy.

We pre-train this network to learn the score function, so that the second discriminator has a good starting point for the actual training of the network. During pre-training we feed the images from the dataset to the network, and the outputs from the network are then compared to the actual scores given by the score function.

Once the discriminator has converged close enough to the score function, we freeze the weights of the model. Note that at this point the classifier has learnt a differentiable approximation of the score function. After saving the trained network, we load it for training the generator. We do so because we want to use the second discriminator as a proxy for the score function. For other applications, where the penalty function should evolve with the data, the weights of the discriminator can evolve with the training of the generator.
5.4. Discriminator.
This network is composed of 5 convolutional layers. We trained it against the generator using BCEWithLogitsLoss [BCEWLL], which combines a sigmoid layer with a criterion that measures the binary cross entropy, in a single class. In various experiments, it proved to be more numerically stable than binary cross entropy (BCE).
5.5. Generator.
Similarly to the discriminator, the generator is composed of 5 convolutional layers. We train the generator in two steps during each epoch: first we train it against the discriminator in the usual way we train a DCGAN, then we train it again against the classifier. We train the generator to maximize the value of the classifier on the generated images. This pushes the score function of the generated images to converge to 1, which forces the production of only images with a single connected component, or at least a very large connected component compared to the others.

We feed noise to the generator and get images as outputs. These images are then fed into both discriminators, to compute the score and to compare them to the images of the actual dataset.

There are two ways in which we back-propagate. In the first, we freeze the weights of the second discriminator and the gradient is only propagated through the generator and the first discriminator. In the second method, we pass the gradient through the second discriminator as well. As far as the quality of the generated images is concerned, we did not see major advantages to the second method, so the results presented here follow the first back-propagation method, as it is faster. On the other hand, the second method has the advantage that it can be used to improve the accuracy of the classifier, as the generated images are new data points for the score function.

A sample of the images generated by the network can be seen in Figure 8, and the iteration results from the training of the classifier in RegGAN are shown in Figure 9.

Let us briefly address mode collapse in RegGAN. While mode collapse is usually not a big issue if the discriminator learns the mode and the generator keeps changing it, it is a problem for statistical analysis when the generator learns to produce a single image over the vast majority of the training process. We noticed that using different learning rates when back-propagating for the discriminator and the classifier during the training of the generator easily solves the problem of mode collapse in RegGAN.
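The first back-propagation mode, in which the classifier's weights are frozen and only the generator moves, can be illustrated with a one-parameter toy model (entirely our own construction, not the paper's network): a frozen differentiable "classifier" C, a generator G(z) = θ + z, and gradient ascent on C(G(z)) with respect to θ alone. The generated sample is pushed toward high classifier output while C itself never changes:

```python
import math

def C(x):
    # frozen differentiable stand-in for the score classifier:
    # a sigmoid that saturates at 1 for large x
    return 1.0 / (1.0 + math.exp(-x))

def dC_dx(x):
    # derivative of the sigmoid, back-propagated into the generator only
    c = C(x)
    return c * (1.0 - c)

def generator_step(theta, z=0.0, lr=0.5):
    # G(z) = theta + z; one ascent step on C(G(z)) w.r.t. theta,
    # with C's own parameters left untouched ("frozen")
    x = theta + z
    return theta + lr * dC_dx(x)

theta = -2.0            # initial generator parameter: low classifier value
for _ in range(300):
    theta = generator_step(theta)
# after training, C(G(0)) is close to 1
```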
Figure 8. Sample images generated by RegGAN
Figure 9. Training loss for the classifier in RegGAN

6. Empirical validation
All of the experiments presented in this section are based on the synthetic dataset described above. We compare the performance of RegGAN against a DCGAN trained on the same dataset with the same number of iterations. The DCGAN is used as a baseline to show that our architecture succeeds in generating images with a very high score and a low number of connected components.

During training we keep track of the mean of the scores of the batches of images generated by both the DCGAN and RegGAN. As expected, the DCGAN learns quite closely the distribution of the score function in the dataset. We recall that the score function is uniformly distributed between 0 and 1 on the dataset. In particular, we find that the score function during the training of the DCGAN has no particular trend. In Figure 10, we plot the mean of the score function on batches generated by the DCGAN over the last 5000 iterations.
Figure 10. Score function on images generated by DCGAN
In Figure 11, we illustrate some of the images generated by DCGAN. In that figure we highlight the largest connected component in black, and the other connected components are drawn with different shades of gray.

Figure 11. Images generated by DCGAN

The images are visually quite similar to the images in the dataset. Figure 12 shows some of the images in the dataset, again with the largest connected component highlighted. We can easily tell that the number of connected components of the generated images is quite high and, most importantly, that there are many connected components of large area, as indicated by the values of the score function.
On the other hand, RegGAN is able to produce images visually similar to the original dataset but with much higher values of the score function. As before, we keep track of the mean of the scores of generated images during the training of RegGAN. In Figure 13, we plot their values during the entire training process.

Figure 12. Sample of images in the dataset

Figure 13. Score function during training of RegGAN

In the best case scenario, the score function would converge to 0.95, as it is the lowest possible value in the last label of the CNN that we use to compute the score. Even though it is not neatly converging to that value, we believe that with more fine-tuning we can achieve better convergence. On the other hand, this already tells us that the architecture introduced in this paper is able to generate images with high score values. Moreover, the images generated by RegGAN still resemble the images in the dataset, as shown in Figure 14.

Note that the generated images in Figure 14 do have more than one connected component. On the other hand, there is a dominating connected component, in pure black, and the others have very small size: their area is negligible compared to the area of the largest connected component.
Figure 14. Sample of images generated by RegGAN

7. Conclusion
For this study, we created a synthetic dataset to train our network. We generated collections of blobs, ranging between 11 and 18 in number in every image. We attempted to use generative adversarial networks to generate images with a given number of connected components. We tried various architectures, regularization schemes, and training paradigms to achieve this task. We proposed a new GAN architecture, called RegGAN, with an extra discriminator playing the role of a regularizer. RegGAN seems to capture the topology of the blob images, something other GAN-type networks failed to do.
8. Future work
For future work, one can apply RegGAN to three-dimensional (3D) images. Topology in 3D is more challenging, and it should be interesting to see how RegGAN performs. Another application would be in simulating time series of financial data: the score function introduced in RegGAN can play the role of volatility persistence in financial time series. RegGAN can also be used in music composition, for generating various pieces from the same musical notes. In generating musical notes, the dynamics and rhythm of a piece are essential, and we have to make sure the generated notes follow certain dynamics. This can be set as a score function, and RegGAN can be applied to ensure the produced musical notes follow the specified dynamics.

Another application of our methods we intend to explore is the use of non-differentiable techniques of data augmentation to better train a GAN. As we show in this paper, we can use non-differentiable weights in the loss function, and in the same way we could use non-differentiable data augmentation techniques during the training process, in a fashion similar to [ZLLZH].

9. RegGAN in art
Our original motivation was to develop a generative model tailored around an artist. In particular, we wanted to train a GAN only on art pieces produced by a single artist, which do not amount to a reasonably sized dataset. In order to be able to train the model, we developed many data augmentation techniques, which in some cases modified the images considerably. The main artistic craft of the artist in this collaboration is paper cutting, and the GAN had the goal of learning and generating patterns inspired by his work. As the generated patterns will later be cut from paper, we need the patterns to be connected when considered as black and white images. On the other hand, some of the data augmentation techniques transformed the original images, which were connected, into new patterns with many connected components. Due to the lack of data, it is much better not to disregard images with many components, or parts of them. This motivated us to develop the architecture presented in this paper.

In a future work, we will describe in detail the data augmentation techniques developed for this project and their consequences for the artistic end product. Some of the art works obtained in this collaboration are shown in Figure 15.
Figure 15. Two images produced using RegGAN
References

[YACC75] S. C. Johnson. Yacc: Yet Another Compiler-Compiler, 1975.
[Neocog80] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), pp. 193-202, April 1980.
[TDNN89] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. J. Lang. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3), March 1989.
[ConvNets89] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.
[MCMC83] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741, 1984.
[MCMC03] C. Andrieu, N. De Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50:5-43, 2003.
[BM07] G. E. Hinton. Boltzmann machine. Scholarpedia, 2(5):1668, 2007.
[AML11] L. Huang, A. D. Joseph, B. A. Nelson, B. I. P. Rubinstein, and J. D. Tygar. Adversarial machine learning. Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43-58, October 2011.
[GAN14] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. NIPS, 2014.
[DCGAN16] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. ICLR, 2016.
[ArtGAN17] W. R. Tan, C. S. Chan, H. E. Aguirre, and K. Tanaka. ArtGAN: Artwork synthesis with conditional categorical GANs. https://arxiv.org/pdf/1702.03410.pdf, 2017.
[D2GAN17] T. D. Nguyen, T. Le, H. Vu, and D. Phung. Dual discriminator generative adversarial nets. https://arxiv.org/pdf/1709.03831.pdf, 2017.
[RCGAN18] S. Hyland, C. Esteban, and G. Rätsch. Real-valued (medical) time series generation with recurrent conditional GANs. https://arxiv.org/abs/1706.02633, February 2018.
[GANMC20] R. Durall, A. Chatzimichailidis, P. Labus, and J. Keuper. Combating mode collapse in GAN training: An empirical analysis using Hessian eigenvalues. https://arxiv.org/abs/2012.09673
[BCEWLL] https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
[DESCSW] M. Diesendruck, E. R. Elenberg, R. Sen, G. W. Cole, S. Shakkottai, and S. A. Williamson. Importance weighted generative networks. arXiv preprint arXiv:1806.02512, 2018.
[Mun] J. Munkres. Topology, a First Course. Pearson, 2000.
[ZLLZH] S. Zhao, Z. Liu, J. Lin, J.-Y. Zhu, and S. Han. Differentiable augmentation for data-efficient GAN training. CoRR, abs/2006.10738, 2020.
[MO] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[SGZCRC] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. Advances in Neural Information Processing Systems, pp. 2226-2234, 2016.
Department of Mathematics, Princeton University, Princeton NJ 08540, USA
Email address: [email protected]

Industrial Engineering and Operations Research & Data Science Institute, Columbia University, New York NY 10027, USA
Email address: [email protected]

Industrial Engineering and Operations Research, Columbia University, New York NY 10027, USA
Email address: