Lund jet images from generative and cycle-consistent adversarial networks
Stefano Carrazza and Frédéric A. Dreyer

TIF Lab, Dipartimento di Fisica, Università degli Studi di Milano and INFN Milan, Via Celoria 16, 20133, Milano, Italy
Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU
Abstract.
We introduce a generative model to simulate radiation patterns within a jet using the Lund jet plane. We show that using an appropriate neural network architecture with a stochastic generation of images, it is possible to construct a generative model which retrieves the underlying two-dimensional distribution to within a few percent. We compare our model with several alternative state-of-the-art generative techniques. Finally, we show how a mapping can be created between different categories of jets, and use this method to retroactively change simulation settings or the underlying process on an existing sample. These results provide a framework for significantly reducing simulation times through fast inference of the neural network, as well as for data augmentation of physical measurements.
One of the most common objects emerging from hadron collisions at particle colliders such as the Large Hadron Collider (LHC) are jets. These are loosely interpreted as collimated bunches of energetic particles arising from the interactions of quarks and gluons, the fundamental constituents of the proton [1, 2]. In practice, jets are usually defined through a sequential recombination algorithm mapping final-state particle momenta to jet momenta, with a free parameter R defining the radius up to which separate particles are clustered into a single jet [3-5].

Because of the high energies involved in the collisions at the LHC, heavy particles such as vector bosons or top quarks are frequently produced with very large transverse momenta. In this boosted regime, the decay products of these objects can become so collimated that they are reconstructed as a single jet. An active field of research is therefore dedicated to the theoretical understanding of radiation patterns within jets, notably to distinguish their physical origins and to remove radiation unassociated with the hard process [6-26]. Furthermore, measurements of jet properties provide a unique opportunity for accurate comparisons between theoretical predictions and data, and can be used to tune simulation tools [27] or extract physical constants [28].

In recent years, there has also been considerable interest in applications of generative adversarial networks (GAN) [29] and variational autoencoders (VAE) [30] to particle physics, where such generative models can be used to significantly reduce the computing resources required to simulate realistic LHC data [31-40]. In this paper, we introduce a generative model to create new samples of the substructure of a jet from existing data. We use the Lund jet plane [22], shown in figure 1, as a visual representation of the clustering history of a jet. This provides an efficient encoding of a jet's radiation patterns and can be directly measured experimentally [41]. The Lund jet image is used to train a Least Squares GAN (LSGAN) [42] to reproduce simulated data to within a few percent accuracy. We compare a range of alternative generative methods, and show good agreement between the original jets generated with Pythia v8.223 [43], using fast detector simulation with Delphes v3.4.1 particle flow [44], and the samples provided by the different models [45]. Finally, we show how a cycle-consistent adversarial network (CycleGAN) [46] can be used to create mappings between different categories of jets. We apply this framework to retroactively change the parameters of the parton shower on an event, to add non-perturbative effects to an existing parton-level sample, and to transform quark and gluon jets into a boosted W sample.

Fig. 1. Average Lund jet plane density for QCD jets simulated with Pythia v8.223 and Delphes v3.4.1.

These methods provide a systematic tool for data augmentation, as well as for reductions of simulation time and storage space by several orders of magnitude, e.g. through fast inference of the neural network with hardware architectures such as GPUs and field-programmable gate arrays (FPGAs) [47]. The code frameworks and data used in this work are available as open-source and published material in [48-50].

In this article we construct a generative model, which we call gLund, to create new samples of radiation patterns of jets. We first introduce the basis used to describe a jet as an image, then construct a generative model which can be trained on these objects.
To describe the radiation patterns of a jet, we use the primary Lund plane representation [22], which can be projected onto a two-dimensional image that serves as input to a neural network.

The Lund jet plane is constructed by reclustering a jet's constituents with the Cambridge/Aachen (C/A) algorithm [4, 51]. This algorithm sequentially recombines the pair of particles that has the minimal Δ_ij² = (y_i − y_j)² + (φ_i − φ_j)² value, where y_i and φ_i are the rapidity and azimuth of particle i.

This clustering sequence can be used to construct an n × n pixel image describing the radiation patterns of the initial jet. We iterate in reverse through the clustering sequence, labelling the momenta of the two branches of a declustering as p_a and p_b, ordered in transverse momentum such that p_{t,a} > p_{t,b}. This procedure follows the harder branch a, and at each step we activate the pixel on the image corresponding to the coordinates (ln Δ_ab, ln k_t), where k_t = p_{t,b} Δ_ab is the transverse momentum of particle b relative to a.

The codes are available at https://github.com/JetsGame/gLund and https://github.com/JetsGame/CycleJet.
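To make the construction above concrete, a minimal sketch of the mapping from a primary declustering sequence to a Lund image is given below. This is an illustration rather than the gLund implementation: the `declusterings` input (a list of (Δ_ab, k_t) pairs obtained by walking the harder branch of the C/A tree) and the axis ranges are assumptions made here for definiteness.

```python
import numpy as np

def lund_image(declusterings, npix=24, ln_delta_rng=(-8.0, 0.0), ln_kt_rng=(-3.0, 7.0)):
    """Fill an npix x npix binary Lund image from the primary declusterings
    of a jet, given as (Delta_ab, kt) pairs along the harder branch."""
    img = np.zeros((npix, npix))
    for delta, kt in declusterings:
        # fractional position of (ln Delta_ab, ln kt) within the assumed axis ranges
        x = (np.log(delta) - ln_delta_rng[0]) / (ln_delta_rng[1] - ln_delta_rng[0])
        y = (np.log(kt) - ln_kt_rng[0]) / (ln_kt_rng[1] - ln_kt_rng[0])
        if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:
            img[int(y * npix), int(x * npix)] = 1.0   # activate the pixel
    return img
```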
Fig. 2. Sample input images after averaging with n_avg = 1, 5, 10 and 20.

The data sample used in this article consists of 500k jets, generated using the dijet process in Pythia v8.223. Jets are clustered using the anti-k_t algorithm [5, 52] with radius R = 1.0, and are required to pass a selection cut, with transverse momentum p_t > 500 GeV and rapidity |y| < 2.5. Unless specified otherwise, results use the Delphes v3.4.1 fast detector simulation, with the delphes_card_CMS_NoFastJet.tcl card to simulate both detector effects and particle flow reconstruction.

The simulated jets are then converted to Lund images with 24 × 24 pixels each, using the procedure described in section 2.1. A pixel is set to one if there is a corresponding (ln Δ_ab, ln k_t) primary declustering, otherwise it is left at zero. For simplicity, we consider only whether a pixel is on or off, instead of counting the number of hits as in [22]. While these two definitions are equivalent only for large image resolutions, this limitation can easily be overcome, e.g. by considering a separate channel for each activation level.

The full samples used in this article can be accessed online [50].
Generative adversarial networks [53] are one of the most successful unsupervised learning methods. They are constructed from a generator G and a discriminator D, which compete against each other through a value function V(G, D).

In practice, we found improved performance when using a Least Squares Generative Adversarial Network (LSGAN) [42], a specific class of GAN which uses a least squares loss function for the discriminator, with objective functions defined as

\min_D V(D) = \tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\rm data}(x)}\big[(D(x) - b)^2\big] + \tfrac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)) - a)^2\big],   (1)

\min_G V(G) = \tfrac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)) - c)^2\big],   (2)

where p_z(z) is a prior on the input noise variables, and a, b and c are the labels for the fake, real and presumed fake data respectively. Thus D is trained to maximise the probability of correctly distinguishing the training examples from the samples produced by G, following equation (1), while the generator is trained to minimise equation (2). The generator's distribution p_g optimises equation (2) when p_g = p_data, so that the generator learns how to generate new samples from z. The main advantage of the LSGAN over the original GAN framework is a more stable training process, due to the absence of vanishing gradients. In addition, we include a minibatch discrimination layer [54] to avoid collapse of the generator.

The LSGAN is trained on the full sample of QCD Lund jet images. To overcome the limitations of GANs arising from the sparse and discrete nature of Lund images, we use a probabilistic interpretation of the Lund images to train the model. To this end, we first re-sample our initial data set into batches of n_avg images and create a new set of 500k images, each consisting of the average of n_avg initial input images, as shown in figure 2. These images can be reinterpreted as physical events through a random sampling, where the pixel value is interpreted as the probability that the pixel is activated. The n_avg value is a parameter of the model: a large value leads to increased variance in the generated images compared to the reference sample, while for too low a value the model performs poorly due to the sparsity and discreteness of the data. A further data preprocessing step before training the LSGAN consists in rescaling the pixel intensities to the [−1, 1] range and masking entries outside of the kinematic limit of the Lund plane. The images are then whitened using zero-phase component analysis (ZCA) whitening [55].
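The averaging and re-sampling procedure can be sketched as follows (a minimal NumPy illustration of our reading of the text above, not the released code; `images` is assumed to be an array of binary Lund images of shape (N, 24, 24)):

```python
import numpy as np

rng = np.random.default_rng(0)

def average_images(images, n_avg, n_out):
    """Create n_out 'probabilistic' Lund images, each the average of
    n_avg randomly chosen binary input images."""
    idx = rng.integers(len(images), size=(n_out, n_avg))
    return images[idx].mean(axis=1)

def sample_event(prob_image):
    """Reinterpret an averaged image as a physical event: each pixel value
    is treated as the probability that the pixel is activated."""
    return (rng.random(prob_image.shape) < prob_image).astype(float)
```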
The optimal choice of hyperparameters, both for the LSGAN model architecture and for the image preprocessing, is determined using the distributed asynchronous hyperparameter optimisation library hyperopt [56]. The performance of each setup is evaluated by a loss function which compares the reference preprocessed Lund images to the artificial images generated by the LSGAN model. We define the loss function as

\mathcal{L}_h = I + 5 \cdot S,   (3)

where I is the norm of the difference between the average images of the two samples, and S is the absolute difference in structural similarity [57] values between 5000 random pairs of reference samples, and of reference and generated samples.

We perform 1000 iterations and select the one for which the loss L_h is minimal. In figure 3 we show some of the results obtained with the hyperopt library through the Tree-structured Parzen Estimator (TPE) algorithm. The LSGAN is constructed from a generator and a discriminator. The generator consists of three dense layers with 512, 1024 and 2048 units respectively, using LeakyReLU [58] activation functions and batch normalisation layers, as well as a final layer matching the output dimension and using a hyperbolic tangent activation function. The discriminator is constructed from two dense layers with 768 and 384 units using a LeakyReLU activation function, followed by another 24-dimensional dense layer connected to a minibatch discrimination layer, with a final fully connected layer with one-dimensional output. The best parameters for this model are listed in table 1. The loss of the generator and discriminator networks of the LSGAN is shown in figure 4 as a function of training epochs.

In figure 5, the first two images illustrate an example of an input image before and after preprocessing, while the last two images represent the raw output from the LSGAN model and the corresponding sampled Lund image. A selection of preprocessed input images and of images generated with the LSGAN model is shown in figure 6.

The final averaged results for the Lund jet plane density are shown in figure 7 for the reference sample (left), the data set generated by the gLund model (centre) and the ratio between these two samples (right). We observe good agreement between the reference and the artificial sample generated by the gLund model. The model is able to reproduce the underlying distribution to within 3-5% accuracy in the bulk region of the Lund image. Larger discrepancies are visible at the boundaries of the Lund image and are due to the vanishing pixel intensities. In practice this model provides a new approach to reduce Monte Carlo simulation time for jet substructure applications, as well as a framework for data augmentation.

Let us now quantify the quality of the model described in section 2.3 more concretely. As alternatives, we consider a variational autoencoder (VAE) [30, 59, 60] and a Wasserstein GAN [45, 61]. A VAE is a latent variable model, with a probabilistic encoder q_φ(z|x) and a probabilistic decoder p_θ(x|z) to map a representation from a prior distribution p_θ(z). The algorithm learns the marginal likelihood of the data in this generative process, which corresponds to maximising

\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \beta\, D_{\rm KL}\big(q_\phi(z|x)\,\|\,p(z)\big),   (4)

where β is an adjustable hyperparameter controlling the disentanglement of the latent representation z. In our implementation we set β = 1, which corresponds to the original VAE framework.
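For reference, the objective of equation (4) can be written in code roughly as follows (a generic TensorFlow sketch assuming a Gaussian encoder that outputs z_mean and z_log_var, a unit-Gaussian prior and a mean-squared-error reconstruction term; it is not the implementation used in this paper):

```python
import tensorflow as tf

def beta_vae_loss(x, x_rec, z_mean, z_log_var, beta=1.0):
    """Negative of equation (4): reconstruction error plus beta times the
    KL divergence between q_phi(z|x) and the unit-Gaussian prior p(z)."""
    rec = tf.reduce_mean(tf.reduce_sum(tf.square(x - x_rec), axis=[1, 2]))
    kl = -0.5 * tf.reduce_mean(
        tf.reduce_sum(1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
    return rec + beta * kl
```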
Fig. 3. Hyperparameter scan results obtained with the hyperopt library. The first row shows the scan over image- and optimiser-related parameters, while the second row corresponds to the final architecture scan.
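A scan of this kind can be set up with hyperopt's TPE algorithm roughly as follows (a schematic sketch only; the candidate values follow the grids shown in figure 3, while the `train_lsgan` helper, `selection_loss` scoring of equation (3) and `reference_images` array are placeholders introduced here for illustration):

```python
from hyperopt import fmin, tpe, hp, STATUS_OK

space = {
    "n_avg":   hp.choice("n_avg", [2, 4, 8, 10, 16, 20, 32]),
    "units_d": hp.choice("units_d", [128, 256, 384, 512, 768]),
    "units_g": hp.choice("units_g", [256, 384, 512, 768, 1024]),
}

def objective(params):
    # train a candidate LSGAN and score it with the loss of equation (3)
    model = train_lsgan(**params)                      # placeholder training routine
    loss = selection_loss(reference_images,            # L_h = I + 5*S
                          model.generate(len(reference_images)))
    return {"loss": loss, "status": STATUS_OK}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=1000)
```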
Table 1. Final parameters for the gLund model.

Parameters      Value
Architecture    LSGAN
D units         384
G units         512
α_D
α_G
n_avg
Decay
β
Optimiser       Adagrad
Fig. 4.
Loss of the LSGAN discriminator and generator throughout the training stage.
During the training of the VAE, we use KL cost annealing [62] to avoid a collapse of the VAE output to the prior distribution. This problem is caused by the large value of the KL divergence term in the early stages of training, and is mitigated by adding a variable weight w_KL to the KL term in the cost function, expressed as

w_{\rm KL}(n_{\rm step}) = \min\big(1,\, r \cdot f^{\,n_{\rm step}}\big),   (5)

where r and f are the KL annealing rate and factor given in table 3.

Finally, we also consider a Wasserstein GAN with gradient penalty (WGAN-GP). WGANs [45] use the Wasserstein distance to construct the value function, but can suffer from undesirable behaviour due to the critic weight clipping. This can be mitigated through a gradient penalty, where the norm of the gradient of the critic with respect to its input is penalised [61].
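The gradient penalty of ref. [61] can be sketched as follows (a generic TensorFlow sketch of the WGAN-GP penalty term rather than the code used here; `critic` is assumed to be a Keras model mapping a batch of 24 × 24 Lund images to scalar scores, and λ_gp = 10 is the conventional choice):

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Penalise deviations of the critic's gradient norm from unity,
    evaluated on random interpolations between real and generated images."""
    batch = tf.shape(real)[0]
    eps = tf.random.uniform([batch, 1, 1], 0.0, 1.0)      # one weight per image
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp, training=True)
    grads = tape.gradient(scores, interp)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]) + 1e-12)
    return lambda_gp * tf.reduce_mean(tf.square(norms - 1.0))
```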
Fig. 5.
Left two figures: sample input images before and after preprocessing. Right two: sample generated by the LSGAN and the corresponding Lund image.
Fig. 6.
A random selection of preprocessed input images (left), and of images generated with the LSGAN model (right). Axes and colour schemes are identical to figure 5.
We determine the best hyperparameters for both of these models through a hyperopt parameter sweep, which is summarised in Appendix A. To train these models using Lund images, we then use the same preprocessing steps described in section 2.3. To compare our three models, we consider two slices of fixed k_t or Δ_ab size, cutting along the Lund jet plane horizontally or vertically respectively.

In figure 8, we show the k_t slice, with the reference sample in red. The lower panel gives the ratio of the different models to the reference Pythia 8 curve, showing very good performance for the LSGAN and WGAN-GP models, which are able to reproduce the data within a few percent. The VAE model also qualitatively reproduces the main features of the underlying distribution; however, we were unable to improve the accuracy of the generated sample beyond the 20% level without running into the issue of posterior collapse. The same observations can be made in figure 9, which shows the Lund plane density as a function of k_t, for a fixed slice in Δ_ab.

In figure 10a we show the distribution of the number of activated pixels per image for the reference sample generated with Pythia 8 and for the artificial images produced by the LSGAN, WGAN-GP and VAE models. All models except the VAE provide a good description of the reference distribution.

We also use the Lund image to reconstruct the soft-drop multiplicity [63]. To this end, for a simpler correspondence between this observable and the Lund image, we retrained the generative models using ln(zΔ) as y-axis. The soft-drop multiplicity can then be extracted from the final image, and is shown in figure 10b for each model using z_cut = 0.007 and β = −1. The dashed lines indicate the true reference distribution, as evaluated directly on the declustering sequence, which differs slightly from the reconstructed curve due to the finite pixel and image size.

Finally, in figure 10c, we show the reconstructed mass of the groomed jet using the modified Mass Drop Tagger [17] with z_cut = 0.1, where we approximate the mass as

\rho = \frac{m^2}{R^2 p_t^2} \simeq \max_i \Big[ z^{(i)} \big(\Delta^{(i)}\big)^2 \Big].   (6)

The dotted line shows the true mass distribution, evaluated with the left-hand side of equation (6) on the groomed jet. As in previous comparisons, we observe very good agreement of the LSGAN and WGAN-GP models with the reference sample.

We note that while the WGAN-GP model is able to accurately reproduce the distributions of the training data, as discussed in Appendix A, the individual images themselves can differ quite notably from their real counterparts. For this reason, our preferred model in this paper is the LSGAN-based one.
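To illustrate how the soft-drop multiplicity and the groomed-mass proxy of equation (6) are read off a Lund image, a schematic sketch is given below (the bin-centre arrays, the image orientation and the use of R = 1 are assumptions made here for illustration; the actual analysis code may differ):

```python
import numpy as np

def lund_observables(img, ln_delta, ln_z, zcut=0.007, beta=-1.0):
    """Reconstruct the soft-drop multiplicity and the groomed-mass proxy
    rho ~ max_i z_i Delta_i^2 from a binary Lund image binned in
    (ln Delta_ab, ln z), with bin centres ln_delta and ln_z (R = 1)."""
    dd, zz = np.meshgrid(np.exp(ln_delta), np.exp(ln_z))
    active = img > 0.5
    # soft-drop condition z > zcut * Delta^beta, counted over active pixels
    n_sd = int(np.sum(active & (zz > zcut * dd ** beta)))
    # single-emission approximation to the groomed jet mass, equation (6)
    rho = float(np.max(np.where(active, zz * dd ** 2, 0.0)))
    return n_sd, rho
```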
Fig. 7. Average Lund jet plane density for (a) the reference sample and (b) a data set generated by the gLund model; (c) shows the ratio between these two densities.
Fig. 8. Slice of the Lund plane along Δ_ab, for a fixed slice in k_t.

Fig. 9. Slice of the Lund plane along k_t, for a fixed slice in Δ_ab.

Fig. 10. Distribution of (a) the number of activated pixels per image, (b) the reconstructed soft-drop multiplicity for z_cut = 0.007, β = −1 and θ_cut = 0, and (c) the jet mass after applying the modified Mass Drop Tagger with z_cut = 0.1.

We now use the CycleGAN technique to create mappings between different types of jet. As examples, we will consider a mapping from parton-level to detector-level images, and a mapping from QCD images generated through Pythia 8's dijet process to hadronically decaying W jets obtained from WW scattering.

The cycle obtained for a CycleGAN trained on parton- and detector-level images is shown in figure 11, where an initial parton-level Lund image is transformed to a detector-level one, before being reverted again. The sampled image is shown in the bottom row.

Fig. 11. Top: transition from parton-level to Delphes-level and back using CycleJet. Bottom: corresponding sampled event.

A CycleGAN learns mapping functions between two domains X and Y, using as input training samples from both domains. It creates an unpaired image-to-image translation by learning both a mapping G : X → Y and an inverse mapping F : Y → X which observes a forward cycle consistency x ∈ X → G(x) → F(G(x)) ≈ x, as well as a backward cycle consistency y ∈ Y → F(y) → G(F(y)) ≈ y. This behaviour is achieved through the implementation of a cycle consistency loss

\mathcal{L}_{\rm cyc}(G, F) = \mathbb{E}_{x \sim p_{\rm data}(x)}\big[\|F(G(x)) - x\|_1\big] + \mathbb{E}_{y \sim p_{\rm data}(y)}\big[\|G(F(y)) - y\|_1\big].   (7)

Additionally, the full objective also includes adversarial losses for both mapping functions. For the mapping function G : X → Y and its corresponding discriminator D_Y, the objective is expressed as

\mathcal{L}_{\rm GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\rm data}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{\rm data}(x)}\big[\log(1 - D_Y(G(x)))\big],   (8)

such that G is incentivised to generate images G(x) that resemble images from Y, while the discriminator D_Y attempts to distinguish between translated and original samples. Thus, CycleGAN aims to find the arguments solving

G^*, F^* = \arg\min_{G,F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y),   (9)

where \mathcal{L} is the full objective, given by

\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\rm GAN}(G, D_Y, X, Y) + \mathcal{L}_{\rm GAN}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{\rm cyc}(G, F).   (10)

Here λ is a parameter controlling the importance of the cycle consistency loss.

We implemented a CycleGAN framework, labelled CycleJet, that can be used to create mappings between two domains of Lund images. By training a network on parton- and detector-level images, this method can thus be used to retroactively add non-perturbative and detector effects to existing parton-level samples. Similarly, one can train a model using images generated through two different underlying processes, allowing for a mapping e.g. from QCD jets to W- or top-initiated jets. CycleJet can also be used for similar practical purposes as DCTR [64], albeit it is of course limited to the Lund image representation.
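In code, the losses of equations (7) and (8) take roughly the following form (a generic TensorFlow sketch of the CycleGAN objectives rather than the CycleJet implementation; G, F and D_Y stand for any Keras models of the two mappings and one discriminator):

```python
import tensorflow as tf

def cycle_consistency_loss(G, F, x, y):
    """Equation (7): images must survive the round trips X->Y->X and Y->X->Y."""
    forward  = tf.reduce_mean(tf.abs(F(G(x)) - x))    # ||F(G(x)) - x||_1
    backward = tf.reduce_mean(tf.abs(G(F(y)) - y))    # ||G(F(y)) - y||_1
    return forward + backward

def adversarial_loss(G, D_Y, x, y):
    """Equation (8): D_Y separates real samples y from translated ones G(x)."""
    real = tf.reduce_mean(tf.math.log(D_Y(y) + 1e-12))
    fake = tf.reduce_mean(tf.math.log(1.0 - D_Y(G(x)) + 1e-12))
    return real + fake
```

The full objective of equation (10) then combines the two adversarial terms with λ times the cycle-consistency term.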
Following the pipeline presented in section 2.4, we perform 1000 iterations of the hyperparameter scan using the hyperopt library and the loss function

\mathcal{L}_h = \|R_A - P_{B \to A}\| + \|R_B - P_{A \to B}\|,   (11)

where the A and B indices refer to the desired input and output samples respectively, so that R_A and R_B are the average reference images before the CycleGAN transformation, while P_{B→A} and P_{A→B} correspond to the average images after the transformation. Furthermore, for this model we noticed better results when preprocessing the pixel intensities with a standardisation procedure, removing the mean and scaling to unit variance, instead of the simpler rescaling to the [−1, 1] range used in section 2.

The CycleJet model consists of two generators and two discriminators. The generators consist of a down-sampling module with three two-dimensional convolutional layers with 32, 64 and 128 filters respectively, with LeakyReLU activation functions and instance normalisation [65], followed by an up-sampling module with two two-dimensional convolutional layers with 64 and 32 filters. The last layer is a two-dimensional convolution with one filter and a hyperbolic tangent activation function. The discriminators consist of three two-dimensional convolutional layers with 32, 64 and 128 filters and LeakyReLU activation. The first convolutional layer additionally has an instance normalisation layer, and the final layer is a two-dimensional convolutional layer with one filter. The best parameters for the CycleJet model are shown in table 2.

Table 2. Final parameters for the CycleJet model.

Parameters          Value
D filters           32
G filters           32
λ_cycle             10
λ_identity factor   0.2
Epochs              3
Batch size          128
ZCA                 Yes
n_avg
Decay
β

In the first row of figure 12 we show results for an initial average parton-level sample before (left) and after (right) applying the parton-to-detector mapping encoded by the CycleJet model, while in the second row of the same figure we perform the inverse operation by taking as input the average of the Delphes-level sample before (left) and after (right) applying the CycleJet detector-to-parton mapping. This example clearly shows the possibility of adding non-perturbative and detector effects to a parton-level simulation with good accuracy. Similarly to the previous example, in figure 13 we present the mapping between QCD and W jets and vice versa. Also in this case, the overall quality of the mapping is reasonable and provides an interesting and successful test case for process remapping.

Fig. 12. Top: average of the parton-level sample before (left) and after (right) applying the parton-to-detector mapping. Bottom: average of the Delphes-level sample before (left) and after (right) applying the detector-to-parton mapping.

For both examples we observe a good level of agreement for the respective mappings, highlighting the possibility of using such an approach to save CPU time when applying full detector simulations and non-perturbative effects to parton-level events. It is also possible to train the CycleJet model on Monte Carlo data and apply the corresponding mapping to real data.

We have conducted a careful study of generative models applied to jet substructure. First, we trained an LSGAN model to generate new artificial samples of detector-level Lund jet images. With this, we observed agreement to within a few percent accuracy in the bulk of the phase space with respect to the reference data. This new approach provides an efficient method for fast simulation of jet radiation patterns without requiring the long runtime of full Monte Carlo event generators.
Another advantage is the possibility of applying this method to real collider data to generate accurate physical samples, as well as avoiding the need for large storage space by generating realistic samples on the fly.

Fig. 13. Top: average of the QCD sample before (left) and after (right) applying the QCD-to-W mapping. Bottom: average of the W sample before (left) and after (right) applying the W-to-QCD mapping.

Secondly, a CycleGAN model was constructed to map different jet configurations, allowing for the conversion of existing events. This procedure can be used to change Monte Carlo parameters such as the underlying process or the shower parameters. As examples, we showed how to convert an existing sample of QCD jets into W jets and vice versa, and how to add non-perturbative and detector effects to a parton-level simulation. As for the LSGAN, this method can be used to save CPU time by including full detector simulations and non-perturbative effects in parton-level events. Additionally, one could use CycleJet to transform real data using mappings trained on Monte Carlo samples, or apply them to samples generated through gLund.

To achieve the results presented in this paper we have implemented a rather involved preprocessing step, which notably required combining and resampling multiple images. This procedure was necessary to achieve accurate distributions, but comes with the drawback of losing information on correlations between emissions at wide angular and transverse momentum separation. Therefore, it is difficult to evaluate or improve the formal logarithmic accuracy of the generated samples. This limitation could be circumvented with an end-to-end GAN architecture more suited to sparse images. We leave a more detailed study of this for future work. The full code and the pretrained models presented in this paper are available in [48, 49].

Acknowledgments

We thank Sydney Otten for discussions on β-VAEs. We also acknowledge the NVIDIA Corporation for the donation of a Titan Xp GPU used for this research. F.D. is supported by the Science and Technology Facilities Council (STFC) under grant ST/P000770/1. S.C. is supported by the European Research Council under the European Union's Horizon 2020 research and innovation Programme (grant agreement number 740006).

A VAE and WGAN-GP models

In this appendix we present the final parameters as well as generated event samples for the VAE and WGAN-GP models used in section 2.5. These models are obtained after applying the hyperopt procedure described in section 2.4.
The VAE encoder consists of a dense layer with 384 units and a ReLU activation function, connected to a latent space with 1000 dimensions. The decoder consists of a dense layer with 384 units with ReLU activation, followed by an output layer which matches the shape of the images and has a hyperbolic tangent activation function. The reconstruction loss function used during training is taken to be the mean squared error. The best parameters for the VAE model obtained after the hyperopt procedure are shown in table 3. In figure 14 we show a random selection of preprocessed images generated through the VAE. From a qualitative point of view the images appear realistic on an event-by-event comparison; however, as highlighted in section 2.5, the VAE model does not reproduce the underlying distribution accurately.

Table 3. Final parameters for the VAE model.

Parameters                  Value
Intermediate dimension      384
KL annealing rate           0.25
KL annealing factor         1.05
Minibatch discriminator     No
Epochs                      50
Batch size                  32
Latent dimension            1000
ZCA                         Yes
n_avg
Decay
β

Fig. 14. A random selection of preprocessed input images (left), and of images generated with the VAE model (right). Axes and colour schemes are the same as in figure 5.

Finally, the WGAN-GP consists of a generator and a discriminator. The generator architecture contains a dense layer with 1152 units and a ReLU activation function, followed by three sequential two-dimensional convolutional layers with a kernel size of 4 and respectively 32, 16 and 1 filters. Between these layers we apply batch normalisation and a ReLU activation function, while the final layer has a hyperbolic tangent activation function. The discriminator is composed of five two-dimensional convolutional layers with a kernel size of 3 and respectively 16, 32, 64, 128 and 128 filters. We apply batch normalisation to the last three layers, and all of them use a LeakyReLU activation function with a dropout layer. In table 4 we provide the best parameters of the WGAN-GP model, again obtained through the hyperopt scan procedure. In figure 15 we show a random selection of preprocessed images generated through the WGAN-GP. Due to the convolutional filters of this model, the preprocessing differs slightly from the description in section 2.3, as we do not remove pixels outside the kinematic range, resulting in images with non-zero background pixels. While the distributions presented in section 2.5 are in good agreement with the data, it is clear that for this WGAN-GP model the individual images look different from the input data.

Table 4. Final parameters for the WGAN-GP model.

Parameters                  Value
D units                     16
G units                     4
α
D momentum                  0.7
G momentum                  0.7
Minibatch discriminator     No
Epochs                      300
Batch size                  32
Latent dimension            800
ZCA                         Yes
n_avg
Decay
β
ρ

Fig. 15. A random selection of preprocessed input images (left), and of images generated with the WGAN-GP model (right). Axes and colour schemes are the same as in figure 5.
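A minimal Keras sketch of the VAE architecture just described is given below (our own illustration of the stated layer structure, with a standard reparametrisation step added for completeness; it is not the released code):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 1000

# Encoder: a 384-unit dense layer (ReLU) feeding the latent mean and log-variance
inp = layers.Input(shape=(24, 24))
h = layers.Dense(384, activation="relu")(layers.Flatten()(inp))
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)

# Standard reparametrisation trick: z = mean + sigma * epsilon
def sample_z(args):
    mean, log_var = args
    eps = tf.random.normal(tf.shape(mean))
    return mean + tf.exp(0.5 * log_var) * eps

z = layers.Lambda(sample_z)([z_mean, z_log_var])

# Decoder: a 384-unit dense layer (ReLU) and a tanh output matching the image shape
d = layers.Dense(384, activation="relu")(z)
out = layers.Reshape((24, 24))(layers.Dense(24 * 24, activation="tanh")(d))

# Trained with a mean-squared-error reconstruction term plus the
# annealed KL divergence of equations (4) and (5)
vae = Model(inp, out)
```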
References

1. G.F. Sterman, S. Weinberg, Phys. Rev. Lett., 1436 (1977)
2. G.P. Salam, Eur. Phys. J. C67, 637 (2010)
3. S.D. Ellis, D.E. Soper, Phys. Rev. D48, 3160 (1993), hep-ph/9305266
4. Y.L. Dokshitzer, G.D. Leder, S. Moretti, B.R. Webber, JHEP, 001 (1997), hep-ph/9707323
5. M. Cacciari, G.P. Salam, G. Soyez, JHEP, 063 (2008)
6. J. Thaler, L.T. Wang, JHEP, 092 (2008)
7. D.E. Kaplan, K. Rehermann, M.D. Schwartz, B. Tweedie, Phys. Rev. Lett., 142001 (2008)
8. S.D. Ellis, C.K. Vermilion, J.R. Walsh, Phys. Rev. D80, 051501 (2009)
9. S.D. Ellis, C.K. Vermilion, J.R. Walsh, Phys. Rev. D81, 094023 (2010)
10. T. Plehn, G.P. Salam, M. Spannowsky, Phys. Rev. Lett., 111801 (2010)
11. J. Thaler, K. Van Tilburg, JHEP, 015 (2011)
12. A.J. Larkoski, G.P. Salam, J. Thaler, JHEP, 108 (2013)
13. Y.T. Chien, Phys. Rev. D90, 054008 (2014)
14. M. Cacciari, G.P. Salam, G. Soyez, Eur. Phys. J. C75, 59 (2015)
15. A.J. Larkoski, I. Moult, D. Neill, JHEP, 009 (2014)
16. I. Moult, L. Necib, J. Thaler, JHEP, 153 (2016)
17. M. Dasgupta, A. Fregoso, S. Marzani, G.P. Salam, JHEP, 029 (2013)
18. A.J. Larkoski, S. Marzani, G. Soyez, J. Thaler, JHEP, 146 (2014)
19. P.T. Komiske, E.M. Metodiev, B. Nachman, M.D. Schwartz, JHEP, 051 (2017)
20. P.T. Komiske, E.M. Metodiev, J. Thaler, JHEP, 013 (2018)
21. F.A. Dreyer, L. Necib, G. Soyez, J. Thaler, JHEP, 093 (2018)
22. F.A. Dreyer, G.P. Salam, G. Soyez, JHEP, 064 (2018)
23. A. Butter et al., SciPost Phys., 014 (2019)
24. S. Carrazza, F.A. Dreyer (2019)
25. P. Berta, L. Masetti, D.W. Miller, M. Spousta (2019)
26. E.A. Moreno, O. Cerri, J.M. Duarte, H.B. Newman, T.Q. Nguyen, A. Periwal, M. Pierini, A. Serikova, M. Spiropulu, J.R. Vlimant (2019)
27. N. Fischer, S. Gieseke, S. Plätzer, P. Skands, Eur. Phys. J. C74, 2831 (2014)
28. Les Houches 2017: Physics at TeV Colliders Standard Model Working Group Report (2018)
29. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems (2014), pp. 2672-2680
30. D.P. Kingma, M. Welling, Auto-Encoding Variational Bayes (2014), http://arxiv.org/abs/1312.6114
31. L. de Oliveira, M. Paganini, B. Nachman, Comput. Softw. Big Sci., 4 (2017)
32. M. Paganini, L. de Oliveira, B. Nachman, Phys. Rev. Lett., 042003 (2018)
33. M. Paganini, L. de Oliveira, B. Nachman, Phys. Rev. D97, 014021 (2018)
34. S. Otten, S. Caron, W. de Swart, M. van Beekveld, L. Hendriks, C. van Leeuwen, D. Podareanu, R. Ruiz de Austri, R. Verheyen (2019)
35. P. Musella, F. Pandolfi, Comput. Softw. Big Sci., 8 (2018)
36. K. Datta, D. Kar, D. Roy (2018)
37. O. Cerri, T.Q. Nguyen, M. Pierini, M. Spiropulu, J.R. Vlimant, JHEP, 036 (2019)
38. R. Di Sipio, M. Faucci Giannelli, S. Ketabchi Haghighat, S. Palazzo (2019)
39. A. Butter, T. Plehn, R. Winterhalder (2019)
40. (2019)
41. ATLAS collaboration (2019)
42. X. Mao, Q. Li, H. Xie, R.Y.K. Lau, Z. Wang, CoRR abs/1611.04076 (2016)
43. T. Sjöstrand, S. Ask, J.R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C.O. Rasmussen, P.Z. Skands, Comput. Phys. Commun., 159 (2015)
44. J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, M. Selvaggi (DELPHES 3), JHEP, 057 (2014)
45. M. Arjovsky, S. Chintala, L. Bottou, Wasserstein Generative Adversarial Networks, in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (2017), pp. 214-223, http://proceedings.mlr.press/v70/arjovsky17a.html
46. J.Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, in Computer Vision (ICCV), 2017 IEEE International Conference on (2017)
47. J. Duarte et al., JINST, P07027 (2018)
48. F. Dreyer, S. Carrazza, Jetsgame/glund v1.0.0 (2019), https://doi.org/10.5281/zenodo.3384920
49. F. Dreyer, S. Carrazza, Jetsgame/cyclejet v1.0.0 (2019), https://doi.org/10.5281/zenodo.3384918
50. S. Carrazza, F.A. Dreyer, JetsGame/data v1.0.0 (2019), https://doi.org/10.5281/zenodo.2602514
51. M. Wobisch, T. Wengler, Hadronization corrections to jet cross-sections in deep inelastic scattering, in Monte Carlo generators for HERA physics. Proceedings, Workshop, Hamburg, Germany, 1998-1999 (1998), pp. 270-279, hep-ph/9907280
52. M. Cacciari, G.P. Salam, G. Soyez, Eur. Phys. J. C72, 1896 (2012)
53. I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (MIT Press, Cambridge, MA, USA, 2014), NIPS'14, pp. 2672-2680, http://dl.acm.org/citation.cfm?id=2969033.2969125
54. T. Salimans, I.J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, CoRR abs/1606.03498 (2016)
55. A.J. Bell, T.J. Sejnowski, Vision Research, 3327 (1997)
56. J. Bergstra, D. Yamins, D.D. Cox, Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, in Proceedings of the 30th International Conference on Machine Learning - Volume 28 (JMLR.org, 2013), ICML'13, pp. I-115-I-123
57. Zhou Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, IEEE Transactions on Image Processing, 600 (2004)
58. A.L. Maas, A.Y. Hannun, A.Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in ICML Workshop on Deep Learning for Audio, Speech and Language Processing (2013)
59. I. Higgins, L. Matthey, X. Glorot, A. Pal, B. Uria, C. Blundell, S. Mohamed, A. Lerchner, CoRR abs/1606.05579 (2016)
60. C.P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, A. Lerchner, CoRR abs/1804.03599 (2018)
61. I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A.C. Courville, CoRR abs/1704.00028 (2017)
62. S.R. Bowman, L. Vilnis, O. Vinyals, A.M. Dai, R. Józefowicz, S. Bengio, CoRR abs/1511.06349 (2015)
63. C. Frye, A.J. Larkoski, J. Thaler, K. Zhou, JHEP, 083 (2017)
64. A. Andreassen, B. Nachman (2019)
65. D. Ulyanov, A. Vedaldi, V.S. Lempitsky, CoRR abs/1607.08022 (2016)