Graph Generative Adversarial Networks for Sparse Data Generation in High Energy Physics
Raghav Kansal, Javier Duarte
University of California San Diego, La Jolla, CA 92093, USA

Breno Orzari, Thiago Tomei
Universidade Estadual Paulista, São Paulo/SP - CEP 01049-010, Brazil

Maurizio Pierini, Mary Touranakou∗
European Organization for Nuclear Research (CERN), CH-1211 Geneva 23, Switzerland

Jean-Roch Vlimant
California Institute of Technology, Pasadena, CA 91125, USA

Dimitrios Gunopulos
National and Kapodistrian University of Athens, Athens 15772, Greece

∗ Also at National and Kapodistrian University of Athens, Athens, Greece.
Abstract
We develop a graph generative adversarial network to generate sparse data sets like those produced at the CERN Large Hadron Collider (LHC). We demonstrate this approach by training on and generating sparse representations of MNIST handwritten digit images and jets of particles in proton-proton collisions like those at the LHC. We find the model successfully generates sparse MNIST digits and particle jet data. We quantify agreement between real and generated data with a graph-based Fréchet Inception distance, and the particle and jet feature-level 1-Wasserstein distance for the MNIST and jet datasets, respectively.
At the CERN Large Hadron Collider (LHC), large simulated data samples are generated using Monte Carlo (MC) methods in order to translate the predictions of the standard model (SM), or beyond-the-SM theories, into observable detector signatures. These samples, numbering in the billions of events, are needed in order to accurately assess the predicted yields and their associated uncertainties. In order to achieve the highest level of accuracy possible, GEANT4-based simulation [1] is used to model the interaction of particles traversing the detector material. However, this approach comes at a high computational cost. At the LHC, such simulation workflows account for a large fraction of the total computing resources of the experiments, and with the planned high-luminosity upgrade, the expanded need for MC simulation may become unsustainable [2].

To accelerate simulation workflows, alternative methods based on generative deep learning models have been studied, including generative adversarial networks (GANs) [3–5] and variational autoencoders (VAEs) [6]. Applications include generating particle shower patterns in calorimeters [7–12], particle jets [13–15], event-level kinematic quantities [16–19], pileup collisions [20], and cosmic ray showers [21].

While these studies have proven to be effective for specific high energy physics (HEP) simulation tasks, it can be both challenging and inefficient to generalize such linear and convolutional neural network architectures to a full, low-level description of collision events due to the sparsity, complexity, and irregular underlying geometry (e.g. Ref. [22]) of HEP detector data. In this paper, we investigate a graph-based GAN to inherently account for data sparsity and any irregular geometry in the model architecture. As noted in Ref. [23], while graph networks have been successfully applied to classification and reconstruction tasks in HEP, they have yet to be explored for generative tasks, and this paper presents innovative work in this direction.

As a proxy for an LHC dataset, we first consider two sparse versions of the MNIST handwritten digit dataset [24]: one sparsified by hand and the other the so-called superpixels dataset [25]. Then, we apply the same strategy to a simulated dataset of jets produced in proton-proton collisions like those occurring at the LHC [26]. We note that while, for convenience, we train on simulated data, for real applications this model could be trained on experimental data.

Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver, Canada.

Figure 1: A message-passing neural network architecture.
Our first dataset is a sparse graph representation of the MNIST dataset. From each image, we select the 100 highest intensity pixels as the nodes of a fully connected graph, with their feature vectors consisting of the $x$, $y$ coordinates and intensities. This is directly analogous to selecting the coordinates and momenta of the highest momentum particles in a jet or highest energy hits in a detector. The second dataset, known as the MNIST superpixels dataset [25], was created by converting each MNIST image into 75 superpixels, corresponding to the nodes of a graph. The centers and intensities of the superpixels comprise the hidden features of the nodes. Edge features for both datasets are chosen to be the Euclidean distance between the connected nodes.

Finally, the third dataset [26–28] consists of simulated particle jets with transverse momenta $p_\mathrm{T}^\mathrm{jet} \approx 1$ TeV, originating from W and Z bosons, light quarks, top quarks, and gluons produced in $\sqrt{s} = 13$ TeV proton-proton collisions in an LHC-like detector. For our application, we only consider gluon jets and limit the number of constituents to the 30 highest $p_\mathrm{T}$ particles per jet (with zero-padding if there are fewer than 30). For each particle, the following three features resulted in the best performance: the relative transverse momentum $p_\mathrm{T}^\mathrm{rel} = p_\mathrm{T}^\mathrm{particle}/p_\mathrm{T}^\mathrm{jet}$ and the relative coordinates $\eta^\mathrm{rel} = \eta^\mathrm{particle} - \eta^\mathrm{jet}$ and $\phi^\mathrm{rel} = \phi^\mathrm{particle} - \phi^\mathrm{jet} \pmod{2\pi}$. We represent each jet as a fully connected graph with the particles as the nodes. A single edge feature is taken to be the $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ between the connected particles. For evaluation we additionally consider the relative jet mass $m^\mathrm{jet}/p_\mathrm{T}^\mathrm{jet}$.

For both the generator and discriminator we use a message-passing neural network (MPNN) architecture [29]. For a graph $G^t = (V^t, E^t)$ after $t$ iterations of message passing ($t = 0$ corresponds to the original input graph), with $V^t$ a set of $N$ nodes, each with its own feature vector $h_v^t$, and $E^t$ a set of edges, each with its own feature vector $e_{vw}^t$, we define one additional iteration of message passing as

$$m_v^{t+1} = \sum_{w \in \mathcal{N}_v} f_e^{t+1}(h_v^t, h_w^t, e_{vw}^t), \qquad (1)$$
$$h_v^{t+1} = f_n^{t+1}(h_v^t, m_v^{t+1}), \qquad (2)$$

where $m_v^{t+1}$ is the aggregated message vector sent to node $v$, $h_v^{t+1}$ is the updated hidden state of node $v$, $f_e^{t+1}$ and $f_n^{t+1}$ are arbitrary functions which are in general unique to each iteration $t$, and $\mathcal{N}_v$ is the set of nodes in the neighborhood of node $v$. The functions $f_e^t$ and $f_n^t$ are implemented in our case as independent multilayer perceptrons (MLPs).

The generator receives as input a graph $G$ containing a set of $N$ nodes initialized with feature vectors $h_v$ randomly sampled from a latent normal distribution, and then goes through $T_g$ message passing iterations to output the final graph $G_g^{T_g}$ with new node features. The discriminator receives as input either a real or generated graph $G$ and goes through $T_d$ message passing iterations to produce a final graph $G_d^{T_d}$, with a single binary feature for each node classifying it as real or fake. This feature is then averaged over all nodes, with a 50% cutoff for the final discriminator output.

We note that a limitation of this architecture is that a particular model can only generate a fixed number of nodes and a constant graph topology. To overcome this, we select a maximum number of nodes to produce per dataset and use zero-padding when necessary. We leave exploring the generation of variable-size, dynamic graph topologies to future work.
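To make the sparse MNIST conversion described above concrete, the following is a minimal sketch, assuming a 28×28 uint8 MNIST image; the function name, the [-1, 1] feature scaling, and the dense distance matrix are illustrative choices rather than a prescription from our released code.

```python
import numpy as np

def image_to_graph(image: np.ndarray, num_nodes: int = 100):
    """Convert a 28x28 MNIST image into a fully connected graph whose
    nodes are the `num_nodes` highest-intensity pixels.

    Returns node features (x, y, intensity), scaled to [-1, 1], and a
    Euclidean-distance edge feature for every pair of nodes.
    """
    flat = image.flatten().astype(np.float32)
    top = np.argsort(flat)[-num_nodes:]            # indices of brightest pixels
    ys, xs = np.divmod(top, image.shape[1])        # pixel (row, column) coordinates
    # Node features: coordinates and intensity, scaled to [-1, 1].
    nodes = np.stack(
        [xs / 13.5 - 1, ys / 13.5 - 1, flat[top] / 127.5 - 1], axis=1
    )
    # Single edge feature: Euclidean distance between connected nodes.
    dxy = nodes[:, None, :2] - nodes[None, :, :2]
    edges = np.linalg.norm(dxy, axis=-1)           # shape (num_nodes, num_nodes)
    return nodes, edges
```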
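A minimal PyTorch sketch of one message-passing iteration of Eqs. (1)–(2) on a fully connected graph is given below; the two-layer MLPs and hidden sizes are illustrative placeholders, not the tuned settings of Table 1.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One iteration of Eqs. (1)-(2) on a fully connected graph.

    h: (batch, N, node_feat) node features.
    e: (batch, N, N, edge_feat) edge features.
    f_e and f_n are independent MLPs, as in the text.
    """
    def __init__(self, node_feat: int, edge_feat: int, msg_feat: int = 128):
        super().__init__()
        self.f_e = nn.Sequential(
            nn.Linear(2 * node_feat + edge_feat, msg_feat),
            nn.LeakyReLU(0.2),
            nn.Linear(msg_feat, msg_feat),
        )
        self.f_n = nn.Sequential(
            nn.Linear(node_feat + msg_feat, msg_feat),
            nn.LeakyReLU(0.2),
            nn.Linear(msg_feat, node_feat),
        )

    def forward(self, h: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
        batch, N, _ = h.shape
        hv = h.unsqueeze(2).expand(-1, -1, N, -1)  # h_v broadcast over neighbors w
        hw = h.unsqueeze(1).expand(-1, N, -1, -1)  # h_w broadcast over nodes v
        # Eq. (1): sum messages over all neighbors (fully connected graph).
        m = self.f_e(torch.cat([hv, hw, e], dim=-1)).sum(dim=2)
        # Eq. (2): update each node from its state and aggregated message.
        return self.f_n(torch.cat([h, m], dim=-1))
```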
A separate optimization is performed for every task to choose the hyperparameters $T_g$, $T_d$, the hidden node feature size $|h_v^t|$, and the number of layers and neurons in each layer of each $f_e^t$ and $f_n^t$ network. A different model is optimized for each MNIST digit, in analogy with the HEP use case, in which different generator settings are chosen for generating different physics processes. The variety of architectures we experimented with, of which an MPNN for both the generator and discriminator was the most successful, is discussed in Appendix A.

We use the least squares loss function [30] and the RMSProp optimizer, with separate learning rates for the discriminator and the generator [31], except for the superpixel digits ‘2’, ‘4’, and ‘9’, where a common learning rate for both had better performance. We use LeakyReLU activations (with negative slope coefficient 0.2) for all intermediate layers, and tanh and sigmoid activations for the final outputs of the generator and discriminator, respectively. We attempted discriminator regularization to alleviate mode collapse via dropout [32], batch normalization [33], a gradient penalty [34], spectral normalization [35], adaptive competitive gradient descent [36], and data augmentation of real and generated graphs before the discriminator [37–39]. Apart from dropout, none of these demonstrated significant improvement with respect to mode dropping or graph quality.

For model evaluation and optimization, as well as a quantitative benchmark on these datasets for comparison, we propose a metric inspired by the Fréchet Inception distance (FID) [31] for the MNIST datasets, and the 1-Wasserstein distance ($W_1$) for the jets dataset, as in Refs. [13, 40]. The two metrics differ for reasons explained below.

Traditionally, the FID metric is used on image datasets, using the pre-trained Inception-v3 image classifier network. It compares the statistics of the outputs of a single layer of this network between generated and real samples, and has been shown to be a consistent measure of similarity between generated and real samples in terms of both quality and diversity. To adapt FID to graph datasets, we use the MoNet model of Ref. [25] as the pre-trained classifier, which can be found at Ref. [41], and calculate what we call the graph Fréchet distance (GFD):

$$\mathrm{GFD} = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right), \qquad (3)$$

where $\mu_r$ ($\mu_g$) is the vector of means of the activation function outputs of the first fully connected layer in the pre-trained MoNet model for real (generated) images, and $\Sigma_r$ ($\Sigma_g$) is the corresponding covariance matrix.

To evaluate the performance on the jet dataset, we calculate $W_1$ directly between the distributions of the three particle-level features and the jet $m/p_\mathrm{T}$ in the real and generated samples. Unlike for MNIST, these quantities correspond to meaningful physical observables, hence measuring $W_1$ between their distributions is a more desirable metric than the GFD. We use bootstrapping to calculate a baseline $W_1$ between samples within the real dataset alone, taking $P$ pairs of random sets of $N$ jets and calculating $W_1$ between the distributions of each pair. For three combinations of $P$ and $N$, the means and standard deviations are shown in Table 2. The $W_1$ values between the real and generated distributions are similarly calculated by generating $P$ sets of $N$ jets each and comparing them to $P$ random sets of $N$ real jets.
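For illustration, one training step under this setup might look like the following sketch, assuming `gen` and `disc` modules with the interfaces described above; the batch handling, latent dimension, and module interfaces are placeholders, not our exact training code.

```python
import torch

def lsgan_step(gen, disc, real_graphs, opt_g, opt_d, latent_dim=32):
    """One least-squares GAN update [30] for the graph GAN.

    `gen` maps latent node features to a generated graph; `disc` maps a
    graph to a per-graph score in (0, 1) (the node-averaged sigmoid
    output described in the text). Both optimizers are RMSProp, with
    separate learning rates for generator and discriminator.
    """
    batch, n_nodes = real_graphs.shape[0], real_graphs.shape[1]

    # Discriminator: push real scores toward 1 and generated toward 0.
    z = torch.randn(batch, n_nodes, latent_dim)
    fake = gen(z).detach()
    d_loss = ((disc(real_graphs) - 1) ** 2).mean() + (disc(fake) ** 2).mean()
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: push scores of generated graphs toward 1.
    fake = gen(torch.randn(batch, n_nodes, latent_dim))
    g_loss = ((disc(fake) - 1) ** 2).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```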
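The two metrics can be computed along the lines of the following sketch, assuming the classifier activations have already been extracted as arrays; the helper names and the use of SciPy are our own illustrative choices.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import wasserstein_distance

def gfd(act_real: np.ndarray, act_gen: np.ndarray) -> float:
    """Graph Fréchet distance, Eq. (3). The inputs are (num_samples,
    num_features) activations of the first fully connected layer of the
    pre-trained MoNet classifier; extracting them is left to the caller."""
    mu_r, mu_g = act_real.mean(axis=0), act_gen.mean(axis=0)
    sigma_r = np.cov(act_real, rowvar=False)
    sigma_g = np.cov(act_gen, rowvar=False)
    covmean = sqrtm(sigma_r @ sigma_g).real  # drop tiny imaginary parts
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(sigma_r + sigma_g - 2 * covmean))

def w1_baseline(feature: np.ndarray, n_jets: int, n_pairs: int):
    """Bootstrapped baseline W1 within the real data: compare `n_pairs`
    random pairs of size-`n_jets` samples of one feature (e.g. eta_rel)
    and return the mean and standard deviation of the distances."""
    rng = np.random.default_rng()
    dists = []
    for _ in range(n_pairs):
        a = rng.choice(feature, size=n_jets, replace=False)
        b = rng.choice(feature, size=n_jets, replace=False)
        dists.append(wasserstein_distance(a, b))
    return np.mean(dists), np.std(dists)
```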
For the MNIST-derived datasets, we optimized the hyperparameters of our model using our GFD metric. A sample of the hyperparameter settings we tested, with their corresponding GFD scores for all 10 digits, can be seen in Appendix B. Based on this optimization, the final hyperparameters chosen for the three datasets are listed in Table 1. Fig. 2 (left) shows a comparison between real and generated digits for the sparse MNIST dataset. The generator is able to reproduce all 10 digits successfully with high accuracy and little evidence of mode dropping. Similarly, Fig. 2 (right) compares real and generated digits for the MNIST superpixels dataset. Again, we can see that the model successfully reproduces the real samples, though there is some evidence of mode dropping, particularly for the more complex digits and rarer modes. We leave exploring this issue further to future work. The averages of our best GFD scores across all 10 digits are 0.52 and 0.30 for the sparse MNIST and superpixels datasets, respectively.

Table 1: Optimized hyperparameters for each dataset.
Dataset        Digits          T_g  T_d  f_e neurons (In, 1, 2, Out)   f_n neurons (In, 1, 2, Out)   |h_v^t|
Sparse MNIST   2, 3, 4, 5, 7    1    1   65, 64, —, 128                160, 256, 256, 32             32
Sparse MNIST   0, 1, 6, 8, 9    1    1   65, 96, 160, 192              224, 256, 256, 32             32
Superpixels    All              2    2   65, 64, —, 128                160, 256, 256, 32             32
Jet            —                2    2   65, 96, 160, 192              224, 256, 256, 32             32
Figure 2: Samples from our sparse MNIST dataset (far left) compared to samples from our graph GAN (center left). Samples from the MNIST superpixels dataset (center right) compared to samples from our graph GAN (far right).

Our results on the jet dataset using our message-passing architecture show excellent agreement, both qualitatively and quantitatively using $W_1$. Example generated and real distributions of particle $\eta^\mathrm{rel}$, $\phi^\mathrm{rel}$, and $p_\mathrm{T}^\mathrm{rel}$ and jet $m/p_\mathrm{T}$ are shown in Fig. 3 for 100,000 jets. The mean $W_1$ values between real and generated jet distributions are presented in Table 2.
Figure 3: Distributions of the particle $\eta^\mathrm{rel}$, $\phi^\mathrm{rel}$, and $p_\mathrm{T}^\mathrm{rel}$, and jet $m/p_\mathrm{T}$ for 100,000 real and generated jets.

Table 2: Mean $W_1$ values and standard deviations between particle-level distributions of $\eta^\mathrm{rel}$, $\phi^\mathrm{rel}$, and $p_\mathrm{T}^\mathrm{rel}$, and the jet-level distribution of $m/p_\mathrm{T}$, derived from comparing randomly selected sets of $N$ real jets and from comparing sets of $N$ real and $N$ generated jets. This comparison is repeated $P$ times to derive a mean and standard deviation.
For samples of 100 jets, the mean $W_1$ values (between real and generated jet samples) agree with the expected values (between real jet samples) within one standard deviation, but this is not the case when the jet sample size is increased to 1,000 or 10,000. Thus, while the generator has sufficient fidelity for smaller sample sizes, there is room for improvement for larger ones. We also note that there is little evidence of mode collapse with this dataset, because the entire distributions, including rarer data samples in the tails, are reproduced with high accuracy.

We have presented a novel architecture for generating graphs using a generative adversarial network based on a message-passing neural network, which we successfully apply to two MNIST-derived graph datasets as well as an LHC jet dataset. This architecture works efficiently with sparse data and inherently adapts to any underlying geometry. We find the model generates realistic MNIST graph data, albeit with some evidence of mode dropping, which we quantify with our graph Fréchet distance. For the jet dataset, we measure the quality of the generator using a metric based on the 1-Wasserstein distance and find high accuracy for smaller sample sizes. The application of our model to a high energy physics dataset demonstrates its flexibility, and indicates this approach may be readily used for fast simulation of a variety of scientific datasets, including sensor-level data in high granularity calorimeters.
Broader Impacts
Physics experiments needing to generate large simulated datasets may benefit from this work. If this type of algorithm is used by experiments to produce such datasets, it may reduce the computational cost of running the experiment. At the same time, if the fidelity of the algorithm is not as high as desired, it may result in suboptimal or inaccurate scientific results. Other beneficiaries of this work may include any group with a need to generate graph-based datasets following some realistic patterns.
Acknowledgments and Disclosure of Funding
This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 772369). R. K. was partially supported by an IRIS-HEP fellowship through the U.S. National Science Foundation (NSF) under Cooperative Agreement OAC-1836650. J. D. is supported by the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics Early Career Research program under Award No. DE-SC0021187. B. O. was partially supported by grants
Appendices

A Architecture Experiments
We experimented with multiple GAN architectures for producing graphs. These included standard MLP and CNN generators and discriminators, which were predictably unsuccessful because these architectures are not permutation invariant. However, despite this limitation, a CNN classifier achieved high accuracy on our sparse MNIST dataset.

A better architecture we attempted used a gated recurrent unit (GRU) based recurrent neural network (RNN) as our generator together with a CNN discriminator, because of the latter’s success as a classifier. The RNN received as input a random sample from our latent space and iteratively output each node’s features in sequence. While there was evidence of this model learning some graph structure in Fig. 4 (left), it was not able to reproduce digits. Nonetheless, this model has the desirable ability to produce graphs with arbitrary numbers of nodes, and future research could explore this further.

The MPNN generator was a clear improvement and was able to successfully reproduce graphs from our dataset. We tested a CNN discriminator initially, and the GAN produced high-quality outputs, as seen in Fig. 4 (right). However, training was difficult and inconsistent, and there was clear evidence of mode collapse. Replacing the CNN with an MPNN improved both these aspects.

Figure 4: Samples from an RNN generator with a CNN discriminator (left), which exhibit some structure but no coherent digits. Samples from an MPNN generator with a CNN discriminator (right), which show a successful output but display mode collapse.

With the MPNN, we experimented with an additional step in the discriminator after the message-passing iterations to produce the final classification output,

$$f_{n_d}\left(\sum_{v \in V_d^{T_d}} h_v^{T_d}\right), \qquad (4)$$

where $f_{n_d}$ is implemented as an MLP. This allows the discriminator to take a holistic look at the graph instead of classifying on a per-node basis. However, we found empirically that this addition marginally decreased performance, so it was not used in the final architecture.

For edge features, we tested the absolute Euclidean distance and the vector displacement in Cartesian and polar coordinates between node positions, as well as the difference in node intensities. The Euclidean distance alone was the most effective. We also investigated fully connected graphs versus edge connections only within a local neighborhood as in Ref. [25], and the performance was comparable.
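A minimal sketch of this graph-level readout of Eq. (4) is given below, assuming node states of shape (batch, N, features); the hidden size and sigmoid output are illustrative choices.

```python
import torch
import torch.nn as nn

class GraphReadout(nn.Module):
    """Alternative discriminator output of Eq. (4): sum-pool the final
    node states and classify the whole graph with an MLP f_{n_d}."""
    def __init__(self, node_feat: int, hidden: int = 128):
        super().__init__()
        self.f_nd = nn.Sequential(
            nn.Linear(node_feat, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, N, node_feat) node states after T_d message passes.
        # Summing over nodes makes the readout permutation invariant.
        return self.f_nd(h.sum(dim=1))
```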
B Hyperparameter Optimization

A characteristic sample of the hyperparameter combinations we tested for the sparse MNIST dataset digit ‘3’ can be seen in Table 3. A similar sample for the superpixels MNIST dataset digit ‘3’ can be seen in Table 4. Based on this hyperparameter optimization, the final settings for the MNIST datasets were chosen as shown in Table 1. For the jet dataset, we chose one of the hyperparameter settings optimized for the MNIST datasets and found it to be effective.

Table 3: Sample of hyperparameter combinations for the sparse MNIST dataset with their respective GFD scores for digit ‘3’. The selected combination is in bold.

T_g  T_d  f_e neurons (In, 1, 2, Out)   f_n neurons (In, 1, 2, 3, Out)   |h_v^t|  GFD
1    1    65, 64, —, 128                160, 256, 256, —, 32             32       2.48
1    2    65, 64, —, 128                160, 256, 256, —, 32             32       1.93
2    1    65, 64, —, 128                160, 256, 256, —, 32             32       1.18

References
[1] GEANT4 Collaboration, “GEANT4: A simulation toolkit”, Nucl. Instrum. Methods Phys. Res. A 506 (2003) 250, doi:10.1016/S0168-9002(03)01368-8.
[2] HEP Software Foundation, “A roadmap for HEP software and computing R&D for the 2020s”, Comput. Softw. Big Sci. 3 (2019) 7, doi:10.1007/s41781-018-0018-8, arXiv:1712.06982.
[3] I. J. Goodfellow et al., “Generative adversarial nets”, in Advances in Neural Information Processing Systems 27, Z. Ghahramani et al., eds., p. 2672. Curran Associates, Inc., 2014. arXiv:1406.2661.
[4] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN”, 2017. arXiv:1701.07875.
[5] I. Gulrajani et al., “Improved training of Wasserstein GANs”, in Advances in Neural Information Processing Systems 30, I. Guyon et al., eds., p. 5767. Curran Associates, Inc., 2017. arXiv:1704.00028.
[6] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes”, 2013. arXiv:1312.6114.
[7] M. Paganini, L. de Oliveira, and B. Nachman, “Accelerating science with generative adversarial networks: An application to 3D particle showers in multilayer calorimeters”, Phys. Rev. Lett. 120 (2018) 042003, doi:10.1103/PhysRevLett.120.042003, arXiv:1705.02355.
[8] M. Paganini, L. de Oliveira, and B. Nachman, “CaloGAN: Simulating 3D high energy particle showers in multilayer electromagnetic calorimeters with generative adversarial networks”, Phys. Rev. D 97 (2018) 014021, doi:10.1103/PhysRevD.97.014021, arXiv:1712.10321.
[9] M. Erdmann, J. Glombitza, and T. Quast, “Precise simulation of electromagnetic calorimeter showers using a Wasserstein generative adversarial network”, Comput. Softw. Big Sci. 3 (2019) 4, doi:10.1007/s41781-018-0019-7, arXiv:1807.01954.
[10] D. Salamani et al., “Deep generative models for fast shower simulation in ATLAS”, in 14th IEEE International Conference on e-Science, p. 348. 2018. doi:10.1109/eScience.2018.00091.
[11] D. Belayneh et al., “Calorimetry with deep learning: particle simulation and reconstruction for collider physics”, Eur. Phys. J. C 80 (2020) 688, doi:10.1140/epjc/s10052-020-8251-9, arXiv:1912.06794.
[12] ATLAS Collaboration, “Fast simulation of the ATLAS calorimeter system with generative adversarial networks”, Technical Report ATL-SOFT-PUB-2020-006, 2020.
[13] L. de Oliveira, M. Paganini, and B. Nachman, “Learning particle physics by example: Location-aware generative adversarial networks for physics synthesis”, Comput. Softw. Big Sci. 1 (2017) 4, doi:10.1007/s41781-017-0004-6, arXiv:1701.05927.
[14] P. Musella and F. Pandolfi, “Fast and accurate simulation of particle detectors using generative adversarial networks”, Comput. Softw. Big Sci. 2 (2018) 8, doi:10.1007/s41781-018-0015-y, arXiv:1805.00850.
[15] S. Carrazza and F. A. Dreyer, “Lund jet images from generative and cycle-consistent adversarial networks”, Eur. Phys. J. C 79 (2019) 979, doi:10.1140/epjc/s10052-019-7501-1, arXiv:1909.01359.
[16] S. Otten et al., “Event generation and statistical sampling for physics with deep generative models and a density information buffer”, 2019. arXiv:1901.00875.
[17] B. Hashemi et al., “LHC analysis-specific datasets with generative adversarial networks”, 2019. arXiv:1901.05282.
[18] R. Di Sipio, M. Faucci Giannelli, S. Ketabchi Haghighat, and S. Palazzo, “DijetGAN: A generative-adversarial network approach for the simulation of QCD dijet events at the LHC”, J. High Energy Phys. 08 (2019) 110, doi:10.1007/JHEP08(2019)110, arXiv:1903.02433.
[19] A. Butter, T. Plehn, and R. Winterhalder, “How to GAN LHC events”, SciPost Phys. 7 (2019) 075, doi:10.21468/SciPostPhys.7.6.075, arXiv:1907.03764.
[20] J. Arjona Martínez et al., “Particle generative adversarial networks for full-event simulation at the LHC and their application to pileup description”, J. Phys. Conf. Ser. 1525 (2020) 012081, doi:10.1088/1742-6596/1525/1/012081, arXiv:1912.02748.
[21] M. Erdmann, L. Geiger, J. Glombitza, and D. Schmidt, “Generating and refining particle detector simulations using the Wasserstein distance in adversarial networks”, Comput. Softw. Big Sci. 2 (2018) 4, doi:10.1007/s41781-018-0008-x, arXiv:1802.03325.
[22] CMS Collaboration, A. Martelli, “The CMS HGCAL detector for HL-LHC upgrade”, 2017. arXiv:1708.08234.
[23] J. Shlomi, P. Battaglia, and J.-R. Vlimant, “Graph neural networks in particle physics”, Mach. Learn.: Sci. Technol. 2 (2021) 021001, doi:10.1088/2632-2153/abbf9a.
[24] Y. LeCun and C. Cortes, “MNIST handwritten digit database”, 2010. http://yann.lecun.com/exdb/mnist/.
[25] F. Monti et al., “Geometric deep learning on graphs and manifolds using mixture model CNNs”, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p. 5425. IEEE, New York, NY, 2017. arXiv:1611.08402. doi:10.1109/CVPR.2017.576.
[26] M. Pierini, J. M. Duarte, N. Tran, and M. Freytsis, “hls4ml LHC jet dataset (30 particles)”, 2020. doi:10.5281/zenodo.3601436.
[27] J. Duarte et al., “Fast inference of deep neural networks in FPGAs for particle physics”, J. Instrum. 13 (2018) P07027, doi:10.1088/1748-0221/13/07/P07027, arXiv:1804.06913.
[28] E. Coleman et al., “The importance of calorimetry for highly-boosted jet substructure”, J. Instrum. 13 (2018) T01003, doi:10.1088/1748-0221/13/01/T01003, arXiv:1709.08705.
[29] J. Gilmer et al., “Neural message passing for quantum chemistry”, in Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh, eds., volume 70, p. 1263. PMLR, 2017. arXiv:1704.01212.
[30] X. Mao et al., “Multi-class generative adversarial networks with the L2 loss function”, 2016. arXiv:1611.04076.
[31] M. Heusel et al., “GANs trained by a two time-scale update rule converge to a local Nash equilibrium”, in Advances in Neural Information Processing Systems 30, I. Guyon et al., eds., p. 6626. Curran Associates, Inc., 2017. arXiv:1706.08500.
[32] N. Srivastava et al., “Dropout: A simple way to prevent neural networks from overfitting”, J. Mach. Learn. Res. 15 (2014) 1929.
[33] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift”, in Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei, eds., volume 37, p. 448. PMLR, 2015. arXiv:1502.03167.
[34] I. Gulrajani et al., “Improved training of Wasserstein GANs”, in Advances in Neural Information Processing Systems 30, I. Guyon et al., eds., p. 5767. Curran Associates, Inc., 2017. arXiv:1704.00028.
[35] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks”, 2018. arXiv:1802.05957.
[36] F. Schäfer, H. Zheng, and A. Anandkumar, “Implicit competitive regularization in GANs”, 2019. arXiv:1910.05852.
[37] T. Karras et al., “Training generative adversarial networks with limited data”, 2020. arXiv:2006.06676.
[38] N.-T. Tran et al., “On data augmentation for GAN training”, 2020. arXiv:2006.05338.
[39] Z. Zhao et al., “Image augmentations for GAN training”, 2020. arXiv:2006.02595.
[40] Y. Lu, J. Collado, D. Whiteson, and P. Baldi, “SARM: Sparse autoregressive model for scalable generation of sparse images in particle physics”, 2020. arXiv:2009.14017.
[41] R. Kansal, “rkansal47/graph-gan: v0.1.0”, 2020. doi:10.5281/zenodo.4299011.