Lund jet images from generative and cycle-consistent adversarial networks
Stefano Carrazza and Frédéric A. Dreyer

TIF Lab, Dipartimento di Fisica, Università degli Studi di Milano and INFN Milan, Via Celoria 16, 20133, Milano, Italy
Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU
Abstract.
We introduce a generative model to simulate radiation patterns within a jet using the Lund jet plane. We show that using an appropriate neural network architecture with a stochastic generation of images, it is possible to construct a generative model which retrieves the underlying two-dimensional distribution to within a few percent. We compare our model with several alternative state-of-the-art generative techniques. Finally, we show how a mapping can be created between different categories of jets, and use this method to retroactively change simulation settings or the underlying process on an existing sample. These results provide a framework for significantly reducing simulation times through fast inference of the neural network, as well as for data augmentation of physical measurements.
One of the most common objects emerging from hadron collisions at particle colliders such as the Large Hadron Collider (LHC) are jets. These are loosely interpreted as collimated bunches of energetic particles arising from the interactions of quarks and gluons, the fundamental constituents of the proton [1, 2]. In practice, jets are usually defined through a sequential recombination algorithm mapping final-state particle momenta to jet momenta, with a free parameter R defining the radius up to which separate particles are clustered into a single jet [3-5].

Because of the high energies involved in the collisions at the LHC, heavy particles such as vector bosons or top quarks are frequently produced with very large transverse momenta. In this boosted regime, the decay products of these objects can become so collimated that they are reconstructed as a single jet. An active field of research is therefore dedicated to the theoretical understanding of radiation patterns within jets, notably to distinguish their physical origins and to remove radiation unassociated with the hard process [6-26]. Furthermore, measurements of jet properties provide a unique opportunity for accurate comparisons between theoretical predictions and data, and can be used to tune simulation tools [27] or extract physical constants [28].

In recent years, there has also been considerable interest in applications of generative adversarial networks (GAN) [29] and variational autoencoders (VAE) [30] to particle physics, where such generative models can be used to significantly reduce the computing resources required to simulate realistic LHC data [31-40]. In this paper, we introduce a generative model to create new samples of the substructure of a jet from existing data. We use the Lund jet plane [22], shown in figure 1, as a visual representation of the clustering history of a jet. This provides an efficient encoding of a jet's radiation patterns and can be directly measured experimentally [41]. The Lund jet image is used to train a Least Squares GAN (LSGAN) [42] to reproduce simulated data to within a few percent accuracy. We compare a range of alternative generative methods, and show good agreement between the original jets generated with Pythia v8.223 [43], using fast detector simulation with Delphes v3.4.1 particle flow [44], and the samples provided by the different models [45]. Finally, we show how a cycle-consistent adversarial network (CycleGAN) [46] can be used to create mappings between different categories of jets. We apply this framework to retroactively change the parameters of the parton shower on an event, to add non-perturbative effects to an existing parton-level sample, and to transform quark and gluon jets into a boosted W sample.

Fig. 1. Average Lund jet plane density for QCD jets simulated with Pythia v8.223 and Delphes v3.4.1.

These methods provide a systematic tool for data augmentation, as well as for reductions of simulation time and storage space by several orders of magnitude, e.g. through fast inference of the neural network with hardware architectures such as GPUs and field-programmable gate arrays (FPGAs) [47]. The code frameworks and data used in this work are available as open-source and published material in [48-50].

In this article we construct a generative model, which we call gLund, to create new samples of radiation patterns of jets. We first introduce the basis used to describe a jet as an image, then construct a generative model which can be trained on these objects.
To describe the radiation patterns of a jet, we use the primary Lund plane representation [22], which can be projected onto a two-dimensional image that serves as input to a neural network.

The Lund jet plane is constructed by reclustering a jet's constituents with the Cambridge/Aachen (C/A) algorithm [4, 51]. This algorithm sequentially recombines the pair of particles that has the minimal Δ_ij² = (y_i − y_j)² + (φ_i − φ_j)² value, where y_i and φ_i are the rapidity and azimuth of particle i.

This clustering sequence can be used to construct an n × n pixel image describing the radiation patterns of the initial jet. We iterate in reverse through the clustering sequence, labelling the momenta of the two branches of a declustering as p_a and p_b, ordered in transverse momentum such that p_{t,a} > p_{t,b}. This procedure follows the harder branch a, and at each step we activate the pixel on the image corresponding to the coordinates (ln Δ_ab, ln k_t), where k_t = p_{t,b} Δ_ab is the transverse momentum of particle b relative to a.

The codes are available at https://github.com/JetsGame/gLund and https://github.com/JetsGame/CycleJet.
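To make the construction above concrete, a minimal sketch of the mapping from a primary declustering sequence to a Lund image is given below. This is an illustration rather than the gLund implementation: the `declusterings` input (a list of (Δ_ab, k_t) pairs obtained by walking the harder branch of the C/A tree) and the axis ranges are assumptions made here for definiteness.

```python
import numpy as np

def lund_image(declusterings, npix=24, ln_delta_rng=(-8.0, 0.0), ln_kt_rng=(-3.0, 7.0)):
    """Fill an npix x npix binary Lund image from the primary declusterings
    of a jet, given as (Delta_ab, kt) pairs along the harder branch."""
    img = np.zeros((npix, npix))
    for delta, kt in declusterings:
        # fractional position of (ln Delta_ab, ln kt) within the assumed axis ranges
        x = (np.log(delta) - ln_delta_rng[0]) / (ln_delta_rng[1] - ln_delta_rng[0])
        y = (np.log(kt) - ln_kt_rng[0]) / (ln_kt_rng[1] - ln_kt_rng[0])
        if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:
            img[int(y * npix), int(x * npix)] = 1.0   # activate the pixel
    return img
```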
Fig. 2. Sample input images after averaging with n_avg = 1, 5, 10 and 20.

The data sample used in this article consists of 500k jets, generated using the dijet process in Pythia v8.223. Jets are clustered using the anti-k_t algorithm [5, 52] with radius R = 1.0, and are required to pass a selection cut, with transverse momentum p_t > 500 GeV and rapidity |y| < 2.5. Unless specified otherwise, results use the Delphes v3.4.1 fast detector simulation, with the delphes_card_CMS_NoFastJet.tcl card to simulate both detector effects and particle flow reconstruction.

The simulated jets are then converted to Lund images with 24 × 24 pixels each, using the procedure described in section 2.1. A pixel is set to one if there is a corresponding (ln Δ_ab, ln k_t) primary declustering, otherwise it is left at zero. For simplicity, we consider only whether a pixel is on or off, instead of counting the number of hits as in [22]. While these two definitions are equivalent only for large image resolutions, this limitation can easily be overcome, e.g. by considering a separate channel for each activation level.

The full samples used in this article can be accessed online [50].
Generative adversarial networks [53] are one of the most successful unsupervised learning methods. They are constructed from a generator G and a discriminator D, which compete against each other through a value function V(G, D).

In practice, we found improved performance when using a Least Squares Generative Adversarial Network (LSGAN) [42], a specific class of GAN which uses a least squares loss function for the discriminator, with objective functions defined as

\min_D V(D) = \tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\rm data}(x)}\big[(D(x) - b)^2\big] + \tfrac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)) - a)^2\big],   (1)

\min_G V(G) = \tfrac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)) - c)^2\big],   (2)

where p_z(z) is a prior on the input noise variables, and a, b and c are the labels for the fake, real and presumed fake data respectively. Thus D is trained to maximise the probability of correctly distinguishing the training examples from the samples produced by G, following equation (1), while the generator is trained to minimise equation (2). The generator's distribution p_g optimises equation (2) when p_g = p_data, so that the generator learns how to generate new samples from z. The main advantage of the LSGAN over the original GAN framework is a more stable training process, due to the absence of vanishing gradients. In addition, we include a minibatch discrimination layer [54] to avoid collapse of the generator.

The LSGAN is trained on the full sample of QCD Lund jet images. To overcome the limitations of GANs arising from the sparse and discrete nature of Lund images, we use a probabilistic interpretation of the Lund images to train the model. To this end, we first re-sample our initial data set into batches of n_avg images and create a new set of 500k images, each consisting of the average of n_avg initial input images, as shown in figure 2. These images can be reinterpreted as physical events through a random sampling, where the pixel value is interpreted as the probability that the pixel is activated. The n_avg value is a parameter of the model: a large value leads to increased variance in the generated images compared to the reference sample, while for too low a value the model performs poorly due to the sparsity and discreteness of the data. A further data preprocessing step before training the LSGAN consists in rescaling the pixel intensities to the [−1, 1] range and masking entries outside of the kinematic limit of the Lund plane. The images are then whitened using zero-phase component analysis (ZCA) whitening [55].
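The averaging and re-sampling procedure can be sketched as follows (a minimal NumPy illustration of our reading of the text above, not the released code; `images` is assumed to be an array of binary Lund images of shape (N, 24, 24)):

```python
import numpy as np

rng = np.random.default_rng(0)

def average_images(images, n_avg, n_out):
    """Create n_out 'probabilistic' Lund images, each the average of
    n_avg randomly chosen binary input images."""
    idx = rng.integers(len(images), size=(n_out, n_avg))
    return images[idx].mean(axis=1)

def sample_event(prob_image):
    """Reinterpret an averaged image as a physical event: each pixel value
    is treated as the probability that the pixel is activated."""
    return (rng.random(prob_image.shape) < prob_image).astype(float)
```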
The optimal choice of hyperparameters, both for the LSGAN model architecture and for the image preprocessing, is determined using the distributed asynchronous hyperparameter optimisation library hyperopt [56]. The performance of each setup is evaluated by a loss function which compares the reference preprocessed Lund images to the artificial images generated by the LSGAN model. We define the loss function as

\mathcal{L}_h = I + 5 \cdot S,   (3)

where I is the norm of the difference between the average images of the two samples, and S is the absolute difference in structural similarity [57] values between 5000 random pairs of reference samples, and of reference and generated samples.

We perform 1000 iterations and select the one for which the loss L_h is minimal. In figure 3 we show some of the results obtained with the hyperopt library through the Tree-structured Parzen Estimator (TPE) algorithm. The LSGAN is constructed from a generator and a discriminator. The generator consists of three dense layers with 512, 1024 and 2048 units respectively, using LeakyReLU [58] activation functions and batch normalisation layers, as well as a final layer matching the output dimension and using a hyperbolic tangent activation function. The discriminator is constructed from two dense layers with 768 and 384 units using a LeakyReLU activation function, followed by another 24-dimensional dense layer connected to a minibatch discrimination layer, with a final fully connected layer with one-dimensional output. The best parameters for this model are listed in table 1. The loss of the generator and discriminator networks of the LSGAN is shown in figure 4 as a function of training epochs.

In figure 5, the first two images illustrate an example of an input image before and after preprocessing, while the last two images represent the raw output from the LSGAN model and the corresponding sampled Lund image. A selection of preprocessed input images and of images generated with the LSGAN model is shown in figure 6.

The final averaged results for the Lund jet plane density are shown in figure 7 for the reference sample (left), the data set generated by the gLund model (centre) and the ratio between these two samples (right). We observe good agreement between the reference and the artificial sample generated by the gLund model. The model is able to reproduce the underlying distribution to within 3-5% accuracy in the bulk region of the Lund image. Larger discrepancies are visible at the boundaries of the Lund image and are due to the vanishing pixel intensities. In practice this model provides a new approach to reduce Monte Carlo simulation time for jet substructure applications, as well as a framework for data augmentation.

Let us now quantify the quality of the model described in section 2.3 more concretely. As alternatives, we consider a variational autoencoder (VAE) [30, 59, 60] and a Wasserstein GAN [45, 61]. A VAE is a latent variable model, with a probabilistic encoder q_φ(z|x) and a probabilistic decoder p_θ(x|z) to map a representation from a prior distribution p_θ(z). The algorithm learns the marginal likelihood of the data in this generative process, which corresponds to maximising

\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \beta\, D_{\rm KL}\big(q_\phi(z|x)\,\|\,p(z)\big),   (4)

where β is an adjustable hyperparameter controlling the disentanglement of the latent representation z. In our implementation we set β = 1, which corresponds to the original VAE framework.
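For reference, the objective of equation (4) can be written in code roughly as follows (a generic TensorFlow sketch assuming a Gaussian encoder that outputs z_mean and z_log_var, a unit-Gaussian prior and a mean-squared-error reconstruction term; it is not the implementation used in this paper):

```python
import tensorflow as tf

def beta_vae_loss(x, x_rec, z_mean, z_log_var, beta=1.0):
    """Negative of equation (4): reconstruction error plus beta times the
    KL divergence between q_phi(z|x) and the unit-Gaussian prior p(z)."""
    rec = tf.reduce_mean(tf.reduce_sum(tf.square(x - x_rec), axis=[1, 2]))
    kl = -0.5 * tf.reduce_mean(
        tf.reduce_sum(1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
    return rec + beta * kl
```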
Fig. 3. Hyperparameter scan results obtained with the hyperopt library. The first row shows the scan over image- and optimiser-related parameters, while the second row corresponds to the final architecture scan.
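A scan of this kind can be set up with hyperopt's TPE algorithm roughly as follows (a schematic sketch only; the candidate values follow the grids shown in figure 3, while the `train_lsgan` helper, `selection_loss` scoring of equation (3) and `reference_images` array are placeholders introduced here for illustration):

```python
from hyperopt import fmin, tpe, hp, STATUS_OK

space = {
    "n_avg":   hp.choice("n_avg", [2, 4, 8, 10, 16, 20, 32]),
    "units_d": hp.choice("units_d", [128, 256, 384, 512, 768]),
    "units_g": hp.choice("units_g", [256, 384, 512, 768, 1024]),
}

def objective(params):
    # train a candidate LSGAN and score it with the loss of equation (3)
    model = train_lsgan(**params)                      # placeholder training routine
    loss = selection_loss(reference_images,            # L_h = I + 5*S
                          model.generate(len(reference_images)))
    return {"loss": loss, "status": STATUS_OK}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=1000)
```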
Table 1. Final parameters for the gLund model.

Parameters      Value
Architecture    LSGAN
D units         384
G units         512
α_D
α_G
n_avg
Decay
β
Optimiser       Adagrad
Fig. 4.
Loss of the LSGAN discriminator and generator throughout the training stage.
During the training of the VAE, we use KL cost annealing [62] to avoid a collapse of the VAE output to the prior distribution. This problem is caused by the large value of the KL divergence term in the early stages of training, and is mitigated by adding a variable weight w_KL to the KL term in the cost function, expressed as

w_{\rm KL}(n_{\rm step}) = \min\big(1,\, r \cdot f^{\,n_{\rm step}}\big),   (5)

where r and f are the KL annealing rate and factor given in table 3.

Finally, we also consider a Wasserstein GAN with gradient penalty (WGAN-GP). WGANs [45] use the Wasserstein distance to construct the value function, but can suffer from undesirable behaviour due to the critic weight clipping. This can be mitigated through a gradient penalty, where the norm of the gradient of the critic with respect to its input is penalised [61].
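The gradient penalty of ref. [61] can be sketched as follows (a generic TensorFlow sketch of the WGAN-GP penalty term rather than the code used here; `critic` is assumed to be a Keras model mapping a batch of 24 × 24 Lund images to scalar scores, and λ_gp = 10 is the conventional choice):

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Penalise deviations of the critic's gradient norm from unity,
    evaluated on random interpolations between real and generated images."""
    batch = tf.shape(real)[0]
    eps = tf.random.uniform([batch, 1, 1], 0.0, 1.0)      # one weight per image
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp, training=True)
    grads = tape.gradient(scores, interp)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]) + 1e-12)
    return lambda_gp * tf.reduce_mean(tf.square(norms - 1.0))
```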
Fig. 5.
Left two figures: sample input images before and after preprocessing. Right two: sample generated by the LSGAN and the corresponding Lund image.
Fig. 6.
A random selection of preprocessed input images (left), and of images generated with the LSGAN model (right). Axes and colour schemes are identical to figure 5.
We determine the best hyperparameters for both of these models through a hyperopt parameter sweep, which is summarised in Appendix A. To train these models using Lund images, we then use the same preprocessing steps described in section 2.3. To compare our three models, we consider two slices of fixed k_t or Δ_ab size, cutting along the Lund jet plane horizontally or vertically respectively.

In figure 8, we show the k_t slice, with the reference sample in red. The lower panel gives the ratio of the different models to the reference Pythia 8 curve, showing very good performance for the LSGAN and WGAN-GP models, which are able to reproduce the data within a few percent. The VAE model also qualitatively reproduces the main features of the underlying distribution; however, we were unable to improve the accuracy of the generated sample beyond the 20% level without running into the issue of posterior collapse. The same observations can be made in figure 9, which shows the Lund plane density as a function of k_t, for a fixed slice in Δ_ab.

In figure 10a we show the distribution of the number of activated pixels per image for the reference sample generated with Pythia 8 and for the artificial images produced by the LSGAN, WGAN-GP and VAE models. All models except the VAE provide a good description of the reference distribution.

We also use the Lund image to reconstruct the soft-drop multiplicity [63]. To this end, for a simpler correspondence between this observable and the Lund image, we retrained the generative models using ln(zΔ) as y-axis. The soft-drop multiplicity can then be extracted from the final image, and is shown in figure 10b for each model using z_cut = 0.007 and β = −1. The dashed lines indicate the true reference distribution, as evaluated directly on the declustering sequence, which differs slightly from the reconstructed curve due to the finite pixel and image size.

Finally, in figure 10c, we show the reconstructed mass of the groomed jet using the modified Mass Drop Tagger [17] with z_cut = 0.1, where we approximate the mass as

\rho = \frac{m^2}{R^2 p_t^2} \simeq \max_i \Big[ z^{(i)} \big(\Delta^{(i)}\big)^2 \Big].   (6)

The dotted line shows the true mass distribution, evaluated with the left-hand side of equation (6) on the groomed jet. As in previous comparisons, we observe very good agreement of the LSGAN and WGAN-GP models with the reference sample.

We note that while the WGAN-GP model is able to accurately reproduce the distributions of the training data, as discussed in Appendix A, the individual images themselves can differ quite notably from their real counterparts. For this reason, our preferred model in this paper is the LSGAN-based one.
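To illustrate how the soft-drop multiplicity and the groomed-mass proxy of equation (6) are read off a Lund image, a schematic sketch is given below (the bin-centre arrays, the image orientation and the use of R = 1 are assumptions made here for illustration; the actual analysis code may differ):

```python
import numpy as np

def lund_observables(img, ln_delta, ln_z, zcut=0.007, beta=-1.0):
    """Reconstruct the soft-drop multiplicity and the groomed-mass proxy
    rho ~ max_i z_i Delta_i^2 from a binary Lund image binned in
    (ln Delta_ab, ln z), with bin centres ln_delta and ln_z (R = 1)."""
    dd, zz = np.meshgrid(np.exp(ln_delta), np.exp(ln_z))
    active = img > 0.5
    # soft-drop condition z > zcut * Delta^beta, counted over active pixels
    n_sd = int(np.sum(active & (zz > zcut * dd ** beta)))
    # single-emission approximation to the groomed jet mass, equation (6)
    rho = float(np.max(np.where(active, zz * dd ** 2, 0.0)))
    return n_sd, rho
```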
Fig. 7. Average Lund jet plane density for (a) the reference sample and (b) a data set generated by the gLund model; (c) shows the ratio between these two densities.
Fig. 8. Slice of the Lund plane along Δ_ab, for a fixed slice in k_t.

Fig. 9. Slice of the Lund plane along k_t, for a fixed slice in Δ_ab.

Fig. 10. Distribution of (a) the number of activated pixels per image, (b) the reconstructed soft-drop multiplicity for z_cut = 0.007, β = −1 and θ_cut = 0, and (c) the jet mass after applying the modified Mass Drop Tagger with z_cut = 0.1.

We now use the CycleGAN technique to create mappings between different types of jet. As examples, we will consider a mapping from parton-level to detector-level images, and a mapping from QCD images generated through Pythia 8's dijet process to hadronically decaying W jets obtained from WW scattering.

The cycle obtained for a CycleGAN trained on parton- and detector-level images is shown in figure 11, where an initial parton-level Lund image is transformed to a detector-level one, before being reverted again. The sampled image is shown in the bottom row.

Fig. 11. Top: transition from parton-level to Delphes-level and back using CycleJet. Bottom: corresponding sampled event.

A CycleGAN learns mapping functions between two domains X and Y, using as input training samples from both domains. It creates an unpaired image-to-image translation by learning both a mapping G : X → Y and an inverse mapping F : Y → X which observes a forward cycle consistency x ∈ X → G(x) → F(G(x)) ≈ x, as well as a backward cycle consistency y ∈ Y → F(y) → G(F(y)) ≈ y. This behaviour is achieved through the implementation of a cycle consistency loss

\mathcal{L}_{\rm cyc}(G, F) = \mathbb{E}_{x \sim p_{\rm data}(x)}\big[\|F(G(x)) - x\|_1\big] + \mathbb{E}_{y \sim p_{\rm data}(y)}\big[\|G(F(y)) - y\|_1\big].   (7)

Additionally, the full objective also includes adversarial losses for both mapping functions. For the mapping function G : X → Y and its corresponding discriminator D_Y, the objective is expressed as

\mathcal{L}_{\rm GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\rm data}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{\rm data}(x)}\big[\log(1 - D_Y(G(x)))\big],   (8)

such that G is incentivised to generate images G(x) that resemble images from Y, while the discriminator D_Y attempts to distinguish between translated and original samples. Thus, CycleGAN aims to find the arguments solving

G^*, F^* = \arg\min_{G,F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y),   (9)

where \mathcal{L} is the full objective, given by

\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\rm GAN}(G, D_Y, X, Y) + \mathcal{L}_{\rm GAN}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{\rm cyc}(G, F).   (10)

Here λ is a parameter controlling the importance of the cycle consistency loss.

We implemented a CycleGAN framework, labelled CycleJet, that can be used to create mappings between two domains of Lund images. By training a network on parton- and detector-level images, this method can thus be used to retroactively add non-perturbative and detector effects to existing parton-level samples. Similarly, one can train a model using images generated through two different underlying processes, allowing for a mapping e.g. from QCD jets to W- or top-initiated jets. CycleJet can also be used for similar practical purposes as DCTR [64], albeit it is of course limited to the Lund image representation.
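In code, the losses of equations (7) and (8) take roughly the following form (a generic TensorFlow sketch of the CycleGAN objectives rather than the CycleJet implementation; G, F and D_Y stand for any Keras models of the two mappings and one discriminator):

```python
import tensorflow as tf

def cycle_consistency_loss(G, F, x, y):
    """Equation (7): images must survive the round trips X->Y->X and Y->X->Y."""
    forward  = tf.reduce_mean(tf.abs(F(G(x)) - x))    # ||F(G(x)) - x||_1
    backward = tf.reduce_mean(tf.abs(G(F(y)) - y))    # ||G(F(y)) - y||_1
    return forward + backward

def adversarial_loss(G, D_Y, x, y):
    """Equation (8): D_Y separates real samples y from translated ones G(x)."""
    real = tf.reduce_mean(tf.math.log(D_Y(y) + 1e-12))
    fake = tf.reduce_mean(tf.math.log(1.0 - D_Y(G(x)) + 1e-12))
    return real + fake
```

The full objective of equation (10) then combines the two adversarial terms with λ times the cycle-consistency term.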
Following the pipeline presented in section 2.4, we perform 1000 iterations of the hyperparameter scan using the hyperopt library and the loss function

\mathcal{L}_h = \|R_A - P_{B \to A}\| + \|R_B - P_{A \to B}\|,   (11)

where the A and B indices refer to the desired input and output samples respectively, so that R_A and R_B are the average reference images before the CycleGAN transformation, while P_{B→A} and P_{A→B} correspond to the average images after the transformation. Furthermore, for this model we noticed better results when preprocessing the pixel intensities with a standardisation procedure, removing the mean and scaling to unit variance, instead of the simpler rescaling to the [−1, 1] range used in section 2.

The CycleJet model consists of two generators and two discriminators. The generators consist of a down-sampling module with three two-dimensional convolutional layers with 32, 64 and 128 filters respectively, with LeakyReLU activation functions and instance normalisation [65], followed by an up-sampling module with two two-dimensional convolutional layers with 64 and 32 filters. The last layer is a two-dimensional convolution with one filter and a hyperbolic tangent activation function. The discriminators consist of three two-dimensional convolutional layers with 32, 64 and 128 filters and LeakyReLU activation. The first convolutional layer additionally has an instance normalisation layer, and the final layer is a two-dimensional convolutional layer with one filter. The best parameters for the CycleJet model are shown in table 2.

Table 2. Final parameters for the CycleJet model.

Parameters          Value
D filters           32
G filters           32
λ_cycle             10
λ_identity factor   0.2
Epochs              3
Batch size          128
ZCA                 Yes
n_avg
Decay
β

In the first row of figure 12 we show results for an initial average parton-level sample before (left) and after (right) applying the parton-to-detector mapping encoded by the CycleJet model, while in the second row of the same figure we perform the inverse operation by taking as input the average of the Delphes-level sample before (left) and after (right) applying the CycleJet detector-to-parton mapping. This example clearly shows the possibility of adding non-perturbative and detector effects to a parton-level simulation with good accuracy. Similarly to the previous example, in figure 13 we present the mapping between QCD and W jets and vice versa. Also in this case, the overall quality of the mapping is reasonable and provides an interesting and successful test case for process remapping.

Fig. 12. Top: average of the parton-level sample before (left) and after (right) applying the parton-to-detector mapping. Bottom: average of the Delphes-level sample before (left) and after (right) applying the detector-to-parton mapping.

For both examples we observe a good level of agreement for the respective mappings, highlighting the possibility of using such an approach to save CPU time when applying full detector simulations and non-perturbative effects to parton-level events. It is also possible to train the CycleJet model on Monte Carlo data and apply the corresponding mapping to real data.

We have conducted a careful study of generative models applied to jet substructure. First, we trained an LSGAN model to generate new artificial samples of detector-level Lund jet images. With this, we observed agreement to within a few percent accuracy in the bulk of the phase space with respect to the reference data. This new approach provides an efficient method for fast simulation of jet radiation patterns without requiring the long runtime of full Monte Carlo event generators.
Another advantage is the possibility of applying this method to real collider data to generate accurate physical samples, as well as avoiding the need for large storage space by generating realistic samples on the fly.

Fig. 13. Top: average of the QCD sample before (left) and after (right) applying the QCD-to-W mapping. Bottom: average of the W sample before (left) and after (right) applying the W-to-QCD mapping.

Secondly, a CycleGAN model was constructed to map different jet configurations, allowing for the conversion of existing events. This procedure can be used to change Monte Carlo parameters such as the underlying process or the shower parameters. As examples, we showed how to convert an existing sample of QCD jets into W jets and vice versa, and how to add non-perturbative and detector effects to a parton-level simulation. As for the LSGAN, this method can be used to save CPU time by including full detector simulations and non-perturbative effects in parton-level events. Additionally, one could use CycleJet to transform real data using mappings trained on Monte Carlo samples, or apply them to samples generated through gLund.

To achieve the results presented in this paper we have implemented a rather involved preprocessing step, which notably required combining and resampling multiple images. This procedure was necessary to achieve accurate distributions, but comes with the drawback of losing information on correlations between emissions at wide angular and transverse momentum separation. Therefore, it is difficult to evaluate or improve the formal logarithmic accuracy of the generated samples. This limitation could be circumvented with an end-to-end GAN architecture more suited to sparse images. We leave a more detailed study of this for future work. The full code and the pretrained models presented in this paper are available in [48, 49].

Acknowledgments

We thank Sydney Otten for discussions on β-VAEs. We also acknowledge the NVIDIA Corporation for the donation of a Titan Xp GPU used for this research. F.D. is supported by the Science and Technology Facilities Council (STFC) under grant ST/P000770/1. S.C. is supported by the European Research Council under the European Union's Horizon 2020 research and innovation Programme (grant agreement number 740006).

A VAE and WGAN-GP models

In this appendix we present the final parameters as well as generated event samples for the VAE and WGAN-GP models used in section 2.5. These models are obtained after applying the hyperopt procedure described in section 2.4.
The VAE encoder consists of a dense layer with 384 units and a ReLU activation function, connected to a latent space with 1000 dimensions. The decoder consists of a dense layer with 384 units with ReLU activation, followed by an output layer which matches the shape of the images and has a hyperbolic tangent activation function. The reconstruction loss function used during training is taken to be the mean squared error. The best parameters for the VAE model obtained after the hyperopt procedure are shown in table 3. In figure 14 we show a random selection of preprocessed images generated through the VAE. From a qualitative point of view the images appear realistic on an event-by-event comparison; however, as highlighted in section 2.5, the VAE model does not reproduce the underlying distribution accurately.

Table 3. Final parameters for the VAE model.

Parameters                  Value
Intermediate dimension      384
KL annealing rate           0.25
KL annealing factor         1.05
Minibatch discriminator     No
Epochs                      50
Batch size                  32
Latent dimension            1000
ZCA                         Yes
n_avg
Decay
β

Fig. 14. A random selection of preprocessed input images (left), and of images generated with the VAE model (right). Axes and colour schemes are the same as in figure 5.

Finally, the WGAN-GP consists of a generator and a discriminator. The generator architecture contains a dense layer with 1152 units and a ReLU activation function, followed by three sequential two-dimensional convolutional layers with a kernel size of 4 and respectively 32, 16 and 1 filters. Between these layers we apply batch normalisation and a ReLU activation function, while the final layer has a hyperbolic tangent activation function. The discriminator is composed of five two-dimensional convolutional layers with a kernel size of 3 and respectively 16, 32, 64, 128 and 128 filters. We apply batch normalisation to the last three layers, and all of them use a LeakyReLU activation function with a dropout layer. In table 4 we provide the best parameters of the WGAN-GP model, again obtained through the hyperopt scan procedure. In figure 15 we show a random selection of preprocessed images generated through the WGAN-GP. Due to the convolutional filters of this model, the preprocessing differs slightly from the description in section 2.3, as we do not remove pixels outside the kinematic range, resulting in images with non-zero background pixels. While the distributions presented in section 2.5 are in good agreement with the data, it is clear that for this WGAN-GP model the individual images look different from the input data.

Table 4. Final parameters for the WGAN-GP model.

Parameters                  Value
D units                     16
G units                     4
α
D momentum                  0.7
G momentum                  0.7
Minibatch discriminator     No
Epochs                      300
Batch size                  32
Latent dimension            800
ZCA                         Yes
n_avg
Decay
β
ρ

Fig. 15. A random selection of preprocessed input images (left), and of images generated with the WGAN-GP model (right). Axes and colour schemes are the same as in figure 5.
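A minimal Keras sketch of the VAE architecture just described is given below (our own illustration of the stated layer structure, with a standard reparametrisation step added for completeness; it is not the released code):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 1000

# Encoder: a 384-unit dense layer (ReLU) feeding the latent mean and log-variance
inp = layers.Input(shape=(24, 24))
h = layers.Dense(384, activation="relu")(layers.Flatten()(inp))
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)

# Standard reparametrisation trick: z = mean + sigma * epsilon
def sample_z(args):
    mean, log_var = args
    eps = tf.random.normal(tf.shape(mean))
    return mean + tf.exp(0.5 * log_var) * eps

z = layers.Lambda(sample_z)([z_mean, z_log_var])

# Decoder: a 384-unit dense layer (ReLU) and a tanh output matching the image shape
d = layers.Dense(384, activation="relu")(z)
out = layers.Reshape((24, 24))(layers.Dense(24 * 24, activation="tanh")(d))

# Trained with a mean-squared-error reconstruction term plus the
# annealed KL divergence of equations (4) and (5)
vae = Model(inp, out)
```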
References

1. G.F. Sterman, S. Weinberg, Phys. Rev. Lett., 1436 (1977)
2. G.P. Salam, Eur. Phys. J. C67, 637 (2010)
3. S.D. Ellis, D.E. Soper, Phys. Rev. D48, 3160 (1993), hep-ph/9305266
4. Y.L. Dokshitzer, G.D. Leder, S. Moretti, B.R. Webber, JHEP, 001 (1997), hep-ph/9707323
5. M. Cacciari, G.P. Salam, G. Soyez, JHEP, 063 (2008)
6. J. Thaler, L.T. Wang, JHEP, 092 (2008)
7. D.E. Kaplan, K. Rehermann, M.D. Schwartz, B. Tweedie, Phys. Rev. Lett., 142001 (2008)
8. S.D. Ellis, C.K. Vermilion, J.R. Walsh, Phys. Rev. D80, 051501 (2009)
9. S.D. Ellis, C.K. Vermilion, J.R. Walsh, Phys. Rev. D81, 094023 (2010)
10. T. Plehn, G.P. Salam, M. Spannowsky, Phys. Rev. Lett., 111801 (2010)
11. J. Thaler, K. Van Tilburg, JHEP, 015 (2011)
12. A.J. Larkoski, G.P. Salam, J. Thaler, JHEP, 108 (2013)
13. Y.T. Chien, Phys. Rev. D90, 054008 (2014)
14. M. Cacciari, G.P. Salam, G. Soyez, Eur. Phys. J. C75, 59 (2015)
15. A.J. Larkoski, I. Moult, D. Neill, JHEP, 009 (2014)
16. I. Moult, L. Necib, J. Thaler, JHEP, 153 (2016)
17. M. Dasgupta, A. Fregoso, S. Marzani, G.P. Salam, JHEP, 029 (2013)
18. A.J. Larkoski, S. Marzani, G. Soyez, J. Thaler, JHEP, 146 (2014)
19. P.T. Komiske, E.M. Metodiev, B. Nachman, M.D. Schwartz, JHEP, 051 (2017)
20. P.T. Komiske, E.M. Metodiev, J. Thaler, JHEP, 013 (2018)
21. F.A. Dreyer, L. Necib, G. Soyez, J. Thaler, JHEP, 093 (2018)
22. F.A. Dreyer, G.P. Salam, G. Soyez, JHEP, 064 (2018)
23. A. Butter et al., SciPost Phys., 014 (2019)
24. S. Carrazza, F.A. Dreyer (2019)
25. P. Berta, L. Masetti, D.W. Miller, M. Spousta (2019)
26. E.A. Moreno, O. Cerri, J.M. Duarte, H.B. Newman, T.Q. Nguyen, A. Periwal, M. Pierini, A. Serikova, M. Spiropulu, J.R. Vlimant (2019)
27. N. Fischer, S. Gieseke, S. Plätzer, P. Skands, Eur. Phys. J. C74, 2831 (2014)
28. Les Houches 2017: Physics at TeV Colliders Standard Model Working Group Report (2018)
29. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems (2014), pp. 2672-2680
30. D.P. Kingma, M. Welling, Auto-Encoding Variational Bayes (2014), http://arxiv.org/abs/1312.6114
31. L. de Oliveira, M. Paganini, B. Nachman, Comput. Softw. Big Sci., 4 (2017)
32. M. Paganini, L. de Oliveira, B. Nachman, Phys. Rev. Lett., 042003 (2018)
33. M. Paganini, L. de Oliveira, B. Nachman, Phys. Rev. D97, 014021 (2018)
34. S. Otten, S. Caron, W. de Swart, M. van Beekveld, L. Hendriks, C. van Leeuwen, D. Podareanu, R. Ruiz de Austri, R. Verheyen (2019)
35. P. Musella, F. Pandolfi, Comput. Softw. Big Sci., 8 (2018)
36. K. Datta, D. Kar, D. Roy (2018)
37. O. Cerri, T.Q. Nguyen, M. Pierini, M. Spiropulu, J.R. Vlimant, JHEP, 036 (2019)
38. R. Di Sipio, M. Faucci Giannelli, S. Ketabchi Haghighat, S. Palazzo (2019)
39. A. Butter, T. Plehn, R. Winterhalder (2019)
40. (2019)
41. ATLAS collaboration (2019)
42. X. Mao, Q. Li, H. Xie, R.Y.K. Lau, Z. Wang, CoRR abs/1611.04076 (2016)
43. T. Sjöstrand, S. Ask, J.R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C.O. Rasmussen, P.Z. Skands, Comput. Phys. Commun., 159 (2015)
44. J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, M. Selvaggi (DELPHES 3), JHEP, 057 (2014)
45. M. Arjovsky, S. Chintala, L. Bottou, Wasserstein Generative Adversarial Networks, in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (2017), pp. 214-223, http://proceedings.mlr.press/v70/arjovsky17a.html
46. J.Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, in Computer Vision (ICCV), 2017 IEEE International Conference on (2017)
47. J. Duarte et al., JINST, P07027 (2018)
48. F. Dreyer, S. Carrazza, Jetsgame/glund v1.0.0 (2019), https://doi.org/10.5281/zenodo.3384920
49. F. Dreyer, S. Carrazza, Jetsgame/cyclejet v1.0.0 (2019), https://doi.org/10.5281/zenodo.3384918
50. S. Carrazza, F.A. Dreyer, JetsGame/data v1.0.0 (2019), https://doi.org/10.5281/zenodo.2602514
51. M. Wobisch, T. Wengler, Hadronization corrections to jet cross-sections in deep inelastic scattering, in Monte Carlo generators for HERA physics. Proceedings, Workshop, Hamburg, Germany, 1998-1999 (1998), pp. 270-279, hep-ph/9907280
52. M. Cacciari, G.P. Salam, G. Soyez, Eur. Phys. J. C72, 1896 (2012)
53. I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (MIT Press, Cambridge, MA, USA, 2014), NIPS'14, pp. 2672-2680, http://dl.acm.org/citation.cfm?id=2969033.2969125
54. T. Salimans, I.J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, CoRR abs/1606.03498 (2016)
55. A.J. Bell, T.J. Sejnowski, Vision Research, 3327 (1997)
56. J. Bergstra, D. Yamins, D.D. Cox, Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, in Proceedings of the 30th International Conference on Machine Learning - Volume 28 (JMLR.org, 2013), ICML'13, pp. I-115-I-123
57. Zhou Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, IEEE Transactions on Image Processing, 600 (2004)
58. A.L. Maas, A.Y. Hannun, A.Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in ICML Workshop on Deep Learning for Audio, Speech and Language Processing (2013)
59. I. Higgins, L. Matthey, X. Glorot, A. Pal, B. Uria, C. Blundell, S. Mohamed, A. Lerchner, CoRR abs/1606.05579 (2016)
60. C.P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, A. Lerchner, CoRR abs/1804.03599 (2018)
61. I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A.C. Courville, CoRR abs/1704.00028 (2017)
62. S.R. Bowman, L. Vilnis, O. Vinyals, A.M. Dai, R. Józefowicz, S. Bengio, CoRR abs/1511.06349 (2015)
63. C. Frye, A.J. Larkoski, J. Thaler, K. Zhou, JHEP, 083 (2017)
64. A. Andreassen, B. Nachman (2019)
65. D. Ulyanov, A. Vedaldi, V.S. Lempitsky, CoRR abs/1607.08022 (2016)