How to GAN Higher Jet Resolution
Pierre Baldi, Lukas Blecher, Anja Butter, Julian Collado, Jessica N. Howard, Fabian Keilbach, Tilman Plehn, Gregor Kasieczka, Daniel Whiteson
SciPost Physics Submission
Department of Computer Science, University of California, Irvine, USA
Institut für Theoretische Physik, Universität Heidelberg, Germany
Department of Physics and Astronomy, University of California, Irvine, USA
Institut für Experimentalphysik, Universität Hamburg, Germany
[email protected]

February 4, 2021
Abstract
QCD-jets at the LHC are described by simple physics principles. We show how super-resolution generative networks can learn the underlying structures and use them to improve the resolution of jet images. We test this approach on massless QCD-jets and on fat top-jets and find that the network reproduces their main features even without training on pure samples. In addition, we show how a slim network architecture can be constructed once we have control of the full network performance.

Contents
1 Introduction
2 Super-resolution GAN for Jets
3 Up-sampling jets

1 Introduction

Recent innovations in machine learning (ML) have provided boosts to many areas of particle physics. Ideas developed by the machine learning community to solve tasks unrelated to physics often have potential for applications within analysis of data in particle physics, even beyond improvements to analysis of high-dimensional data and speed improvements of first-principle simulations. One such recent development is the ability to enhance the resolution of images [1, 2], by learning context-dependent general rules that can be applied to specific observations to generate estimates of higher-resolution versions of the observed images. Hadronic jets produced in collisions at the Large Hadron Collider (LHC) are obvious candidates for testing many ML-methods, as they are measured in large numbers, they come with a simple theoretical description, their complexity is balanced by their local detector patterns, and they are an integral part of almost every LHC analysis. In this paper, we apply super-resolution methods to LHC jets for the first time, generating images of jets at significantly higher resolution than the original observations.

The idea of using ML methods for exploring jets has a rich history.
Early jet classification studies date to the early 1990s [3, 4], and work has recently gained momentum through applications of deep learning tools to low-level jet observables organized as calorimeter images [5–10]. This approach can also be applied to the theoretically and experimentally well-defined task of top-quark tagging [11, 12]. An alternative approach to organizing calorimeter deposits as pixelated images is to prepare a list of the 4-momenta of subjet constituents [13–16], including recurrent neural networks inspired by language recognition [17, 18] or point clouds [19–23]. These various approaches have been compared in detail [24], revealing that their expected performance in tagging hadronically-decaying top quarks is relatively independent of the motivation and the architecture of the network. Open questions include attempts to gain theoretical understanding of the network's learned strategy [25–28], the stability with respect to detector effects [29, 30], the treatment of uncertainties [31, 32], the extension to a wide range of inputs [20], and anomaly detection [22, 33–35].

The first of these open questions inspires us to search for ways to apply machine learning to improve experimental jet measurements, by combining the basic rules of jet physics with the specific information of an observed jet. Independent of the nature of a given jet, its physics is described by relatively few ingredients, most notably collinear and soft QCD splittings, which can be measured at the LHC [36]. These basic principles can allow a super-resolution algorithm [1, 2] to accurately estimate the higher-resolution information that led to the observed results. Super-resolution algorithms are widely used in image applications [37, 38], including those which use convolutional neural networks (CNNs) [39]. They can be combined with generative networks [40, 41], which can describe jets [42–47] and LHC events [48–51] and have the potential to increase the speed of LHC event generators significantly [52–56].
Such super-resolution GANs [57, 58] have already been applied to cosmological simulations [59, 60]. A simple super-resolution task in jet physics is to improve the resolution of a calorimeter image, using general QCD patterns [61]. It raises the question of whether an up-sampled jet image can include more information than the original, low-resolution image. Naively, it seems that the answer must be no, based on the same reasoning that motivates the argument that a generative network cannot produce more information than exists in its statistically limited training data set. However, this argument fails to account for the implicit knowledge embedded in the architecture of the network, which can contribute information in the same manner as a functional fit [62]. A super-resolution network applied to LHC jets combines the information from the low-resolution image with QCD knowledge extracted from the training data, for instance the underlying theoretical principles of soft and collinear splittings combined with mass-drop patterns. While we will not attempt to quantify the added information (such an answer will depend on individual applications), we will show that super-resolution networks can enhance calorimeter images, and that training on QCD-jets vs top-quark jets indicates that model uncertainties for this application are small.

Our detailed study follows similar ideas as Ref. [61] on the way to wider applications of super-resolution networks in particle physics. For example, such networks can automatically test the consistency of a data set when applied to different layers of a calorimeter. With an appropriate conditioning, they can become elements of a tagging algorithm. Up-sampling from calorimeter to tracker resolution can provide consistency tests between charged and neutral aspects of an event and can be turned into a new way of identifying and removing pile-up.
This is especially promising, as both sides of the up-sampling are present in data and thereby allow training from data only.
2 Super-resolution GAN for Jets

Jet images
The task for our super-resolution networks is to generate a high-resolution (HR), super-resolved (SR) version of a given low-resolution (LR) image. While the task is ill-posed in a deterministic sense, as many distinct HR images can correspond to a single LR image, it is well-defined in a statistical sense.

Our data set consists of jet images from ttbar events and QCD di-jets generated with Pythia [63] for a center-of-mass energy of √s = 14 TeV, with Delphes [64] used to model the ATLAS detector response, and with clustering and jet-finding done with FastJet [65]. The fat anti-kT jets [66] have a radius R = 0.8 and are required to fulfill

    p_T,j = 550 ... 650 GeV   and   |η_j| < 2,   (1)

to have access to decent experimental resolution. The jet images are defined by pixel-wise p_T, with of the order of 50 active pixels. This means that, for instance, images with 160 × 160 pixels have a sparsity of 99.8%. For the training of super-resolution models, we provide paired LR/HR jet images, which are generated by down-sampling the HR image. We use sum pooling on the jet constituents as an approximation to reduced detector resolution before we perform jet finding [67]. After jet finding, we select the hardest jet in each of the HR and LR images as a candidate pair, rejecting the pair if either jet has fewer than 15 constituents. To ensure that the selected HR-clustered and LR-clustered jets correspond to the same hard parton, we require the angular distance between the two to be ΔR = √(Δη² + Δφ²) < 0.1.
This procedure defines paired HR and LR jet images, where the LR jet image contains no information from the HR image. We apply this procedure to create LR-HR image pairs with down-scaling factors of 2, 4, and 8, removing events that fail the requirement for any particular resolution from all samples, which ensures that all jet samples contain the same set of events.

There are multiple ways of normalizing jet images to be better suited for machine learning. Such transformations do not retain the absolute momentum, which may not be a problem for classification, but for our purposes this information is needed. In Fig. 1, we show typical energy distributions after re-scaling the pixel entries with a power p. Clearly, some kind of re-scaling is helpful to enhance the otherwise extremely peaked spectrum. On the other hand, we know that the low-energy radiation is largely noise, which means that choosing p too small is not helpful for the network to learn the leading patterns. We find that a moderately reduced power p works well, compared to p = 1.

Figure 1: Distribution of energy deposition when pixel entries are raised by several different powers E → E^p.

Network architecture
In our jet image study, we use a variant of the enhanced super-resolution GAN (ESRGAN) [58], illustrated in Fig. 2. To begin with, the generator converts a LR image into a SR image using a deep residual fully convolutional network. Its main element is the dense residual block (DRB) [68], built out of consecutive convolutional layers with (3 × 3) kernels and LeakyReLU activations with slope α = 0.2. The particularity of the DRB is that a layer receives the input of all other layers in addition to the output of the previous layer. This structure fuses all the feature maps inside the block. Three DRBs form a residual-in-residual dense block (RRDB) [58], connected via residual connections.

All convolutions in the generator preserve the spatial dimensions of the input image. Following Fig. 2, the up-sampling can be done by pixel-shuffle layers [69] or transposed convolutions. Our generator up-samples by a factor of two in up to three consecutive steps and works best if we alternate between pixel-shuffle and transposed convolutions. In the HR feature space, there are two additional convolutional layers, one of which simply scales the output by a fixed value.

The discriminator network is a relatively simple feed-forward convolutional network with LeakyReLU activations, as proposed for the SRGAN [57]. It uses blocks consisting of two convolutional layers with (3 × 3) kernels. To stabilize the adversarial training, we reset all discriminator weights after a fixed number of batches.

Figure 2: Architecture of the generator network modified from the ESRGAN (upper) and the discriminator network modified from the SRGAN (lower).
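The block structure can be sketched in PyTorch along the following lines; the layer counts, channel numbers, and names are illustrative choices of ours, not the exact configuration used in the paper:

```python
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """DRB sketch: each conv layer sees the block input plus the outputs
    of all previous layers (feature fusion); the block output is the
    residual, scaled by beta, added back to the input."""
    def __init__(self, channels=64, growth=32, n_layers=5, beta=0.2):
        super().__init__()
        self.beta = beta
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1)
            for i in range(n_layers - 1)
        )
        self.final = nn.Conv2d(channels + (n_layers - 1) * growth,
                               channels, 3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        features = [x]
        for conv in self.convs:
            features.append(self.act(conv(torch.cat(features, dim=1))))
        return x + self.beta * self.final(torch.cat(features, dim=1))

class RRDB(nn.Module):
    """Residual-in-residual dense block: three DRBs plus an outer residual."""
    def __init__(self, channels=64, beta=0.2):
        super().__init__()
        self.blocks = nn.Sequential(*[DenseResidualBlock(channels)
                                      for _ in range(3)])
        self.beta = beta

    def forward(self, x):
        return x + self.beta * self.blocks(x)

def upsample_stage(channels, use_pixel_shuffle):
    """One factor-two up-sampling step: either pixel shuffle or a
    transposed convolution, which the generator alternates between."""
    if use_pixel_shuffle:
        return nn.Sequential(nn.Conv2d(channels, 4 * channels, 3, padding=1),
                             nn.PixelShuffle(2))
    return nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
```

Both up-sampling variants double the spatial size while keeping the channel count, so they can be chained freely for total factors of 2, 4, or 8.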
Loss function
The SRGAN and ESRGAN include a set of additional functionalities, such as a perceptual loss, which can potentially improve the quality of the output. This loss combines the adversarial loss from the discriminator with a content loss that compares feature maps of a pre-trained image classification network. The adversarial loss for a relativistic GAN trained on true events (T) to generate new events (G) is

    L_adv = − ⟨log D_G⟩_G − ⟨log(1 − D_T)⟩_T    with    D_T = σ(C_T − ⟨C⟩_G),   D_G = σ(C_G − ⟨C⟩_T),   (2)

where σ is a sigmoid classifier function and C is the unactivated discriminator output. Compared to a standard adversarial loss, we have an additional term because D_T depends on the generated data G.

Table 1: Sets of hyperparameters used for the networks described in Fig. 2. Two sets are presented, one which optimizes performance and a second which performs slightly worse; β is the residual scale factor.

             β   rescaling   λ_reg   λ_std   λ_pow   λ_HR   λ_LR   λ_adv   λ_patch   reset interval
    optimal  10  15  0.1  0.3  0.001  0.2  1  1  0.1  0.01  0.1  20k
    medium   15  15  0.1  0.3  0.001  1.2  1  1  0.1  0.05  0.1  20k

The original content loss is not needed for our purpose. Because our SR images should resemble the ground truth, we add an L1 loss between the SR and HR images. Our choice of L1 over L2 prevents blurring,

    L_HR = L1(SR, HR).   (3)

In return, because the LR image should correspond to the HR-jet, we define a loss term that compares the model input with the down-sampled model output pixel by pixel,

    L_LR = L1( pool(SR), LR ).   (4)

When we up-sample the LR-jet image by a factor f, we need to distribute each LR pixel energy over f × f SR pixels. These f × f pixels define a patch, and we encourage the network to spread the LR pixel energy such that the number of active pixels corresponds to the HR truth.
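In plain numpy, the relativistic adversarial loss of Eq.(2) and the two L1 terms of Eqs.(3) and (4) read as follows (a sketch with our own naming; in training, the toy arrays are replaced by the network outputs):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relativistic_losses(c_true, c_gen):
    """Eq.(2): unactivated discriminator outputs C are compared to the
    mean output on the other class; the discriminator loss is the
    label-switched version of the generator loss."""
    d_true = sigmoid(c_true - c_gen.mean())
    d_gen = sigmoid(c_gen - c_true.mean())
    gen_loss = -(np.log(d_gen).mean() + np.log(1.0 - d_true).mean())
    disc_loss = -(np.log(d_true).mean() + np.log(1.0 - d_gen).mean())
    return gen_loss, disc_loss

def sum_pool(image, f):
    n = image.shape[0]
    return image.reshape(n // f, f, n // f, f).sum(axis=(1, 3))

def l1(a, b):
    return np.abs(a - b).mean()

def reconstruction_losses(sr, hr, lr, f):
    """Eq.(3) compares SR with the HR truth; Eq.(4) pools the SR image
    back down and compares it with the LR input."""
    return l1(sr, hr), l1(sum_pool(sr, f), lr)
```

For identical discriminator outputs on both classes, each relativistic loss reduces to 2 log 2, the usual GAN saddle-point value.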
This requirement defines the additional loss term

    L_patch = L1( patch(SR), patch(HR) ).   (5)

The combined generator loss over the standard and re-weighted jet images is then

    L_G = Σ_{s ∈ {std, pow}} λ_s ( λ_HR L_HR + λ_LR L_LR + λ_adv L_adv + λ_patch L_patch ).   (6)

The GAN discriminator D measures how close the generated data set G is to the true or training data T. In a relativistic average GAN [72], the discriminator is given by the probability of a generated event being more realistic than the average true event, and vice versa. It corresponds to the adversarial generator loss in Eq.(2), but with switched labels,

    L_D = − ⟨log(1 − D_G)⟩_G − ⟨log D_T⟩_T.   (7)

To this expression we add a gradient penalty for stabilization,

    L_reg = ⟨ ( ‖∇_{X'} C(X')‖ − 1 )² ⟩,   (8)

where X' is a randomly weighted average between real and generated samples, X' = εX_T + (1 − ε)X_G, and C(X') is the unactivated discriminator output.

All hyperparameters are listed in Tab. 1. We use Adam [73] for the optimization, with momentum parameters β_1 and β_2 = 0.9, and a constant learning rate λ.

Training
The starting point of our training, illustrated in Fig. 3, is the HR truth image, from which the LR image is derived. All jet images are also raised to the power p. The largest up-scaling factor we consider is f = 2³ = 8. In that case, we divide the LR image by the total factor f and feed it into the RRDB generator. Its output is divided by the factor 1/f and gives the SR image raised to the power p. This intermediate result is saved for the computation of L_HR. For the SR output image we only need to take the p-th root. This SR image is sum-pooled back to its LR version LR_gen to compute the different generator loss terms. Based on this set of LR, HR, and SR images, with and without a p-scaling, we compute a set of L1 loss contributions to the generator loss, as well as the discriminator losses from the HR-SR comparison.

Figure 3: Training process for jet images. The generator and discriminator networks are shown in Fig. 2.

3 Up-sampling jets

We benchmark the performance of the super-resolution algorithm for both QCD jets and top-quark jets. QCD jets, which at the LHC arise from massless partons, exist in large samples and are well described by collinear and soft splittings. As an alternative, we use jets from top-quark decays, which are significantly different, but can be isolated experimentally from semi-leptonic top-quark pair production and are well described theoretically via perturbative QCD.

We start with a set of HR-jet images with
160 × 160 pixels. We down-sample each of these images to a corresponding LR image by a linear factor 1/f = 1/2, 1/4, or 1/8. The alternating up-sampling layers of the generator can help learning intricate, non-local patterns, which would be missed by a global pixel shuffle. In the following, we first train and test a network on QCD-jets, then on top-jets. To estimate the model uncertainties, we apply networks trained on one class to the other class.

To evaluate the quality of the information in our image-based results in a physics context, we calculate an established set of jet observables [29, 75–77],

    m_jet = [ (Σ_i p_i^μ)² ]^{1/2}
    w_pf = Σ_i p_T,i ΔR_{i,jet} / Σ_i p_T,i
    C_0.2 = Σ_{i,j} p_T,i p_T,j (ΔR_{i,j})^{0.2} / (Σ_i p_T,i)²
    τ_N = Σ_k p_T,k min(ΔR_{1,k}, ..., ΔR_{N,k}) / (Σ_k p_T,k R).   (9)

The jet mass is the most relevant difference between pure QCD jets and top-decay jets. The girth w_pf essentially describes the geometric extension of the hard pixels, while C_0.2 is the leading pixel-to-pixel correlation. The subjettiness ratios τ_2/τ_1 and τ_3/τ_2 can distinguish between 2-prong and 3-prong decay jets.

In an initial test, we train and test our super-resolution network on the sample of QCD jets, which are characterized by a few central pixels which carry most of the jet energy. In this case, it is important to include down-sampled kinematic distributions in the evaluation, to disentangle the central patterns.

In Fig. 4 we compare the HR and SR images as well as the true LR image with their generated LR_gen counterpart. In addition to average SR and LR images, we show the energy spectra for the leading four pixels. This reveals how the LR image resolution reaches its limits, because the leading pixel carries most of the information. The sub-leading pixels are often harder for the HR image, because the up-sampling often splits the hardest LR pixel.
From the 7th leading pixel and beyond, we see an increasing number of empty pixels, and above the 10th pixel the QCD jet largely features soft noise. This transition is the weak spot of the SR network. While it learns the underlying principles of QCD splittings for the hard pixels and the noise patterns for the soft pixels, the mixed range around the 7th to 10th pixels shows sizeable deviations. We also show the average (f × f)-patches for the SR and the HR images to confirm that the spreading of the hard pixels works at the 20% level.

Again in Fig. 4 we see that the jet mass peaks around the expected 50 GeV, for the LR and for the HR-jet alike. Still, the agreement between LR and LR_gen on the one hand and between HR and SR on the other is better than the agreement between the LR and HR images. A similar picture emerges for the p_T-weighted distance to the jet axis, the girth w_pf, which essentially describes the extension of the hard pixels. The pixel-to-pixel correlation C_0.2 also shows little deviation between HR and SR on the one hand and LR and LR_gen on the other. Finally, we see how the subjettiness ratios τ_2/τ_1 and τ_3/τ_2 increase for the HR/SR images, because the splitting of hard central pixels into two hard and collinear, now resolved pixels increases the IR-safe subjet count. The ratio τ_3/τ_2 turns out to be one of the hardest of the HR-patterns to learn, with the effect that the SR version leads to slightly smaller values. This implies that the SR network does not generate quite enough splittings. Such a feature could of course be improved, but any optimization has to be balanced with
Figure 4: Demonstration of the performance of a network trained on QCD-jets and applied to QCD-jets. Top left are averages of the HR and SR images, followed by distributions of the square-root of the energy of the leading pixel, sub-leading pixel, etc. Also shown are average (f × f)-patches for the SR and the HR images, and distributions of high-level jet observables; see text for definitions. The zero-bin in energy collects jets with too few entries.

the ability of the network to also describe jets with more than just collinear splittings, as we will see in the next case.
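For reference, the observables of Eq.(9) can be evaluated directly on a pixelated image. The sketch below (our own naming; pixels are treated as massless particles, and φ periodicity is ignored for simplicity) computes the jet mass, the girth, and C_0.2:

```python
import numpy as np

def jet_observables(image, eta, phi):
    """Jet mass, girth w_pf, and two-point correlator C_0.2 of Eq.(9)
    from a pixelated image; `image` holds pixel pT, `eta`/`phi` hold
    the pixel coordinates (all arrays of the same shape)."""
    pt = image.ravel()
    eta, phi = eta.ravel(), phi.ravel()

    # jet mass: sum the massless pixel four-vectors
    e = (pt * np.cosh(eta)).sum()
    px = (pt * np.cos(phi)).sum()
    py = (pt * np.sin(phi)).sum()
    pz = (pt * np.sinh(eta)).sum()
    m_jet = np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

    # girth: pT-weighted distance to the pT-weighted jet axis
    eta_j = (pt * eta).sum() / pt.sum()
    phi_j = (pt * phi).sum() / pt.sum()
    w_pf = (pt * np.hypot(eta - eta_j, phi - phi_j)).sum() / pt.sum()

    # C_0.2: pT-weighted pixel-to-pixel correlation
    dr = np.hypot(eta[:, None] - eta[None, :], phi[:, None] - phi[None, :])
    c02 = (pt[:, None] * pt[None, :] * dr**0.2).sum() / pt.sum()**2
    return m_jet, w_pf, c02
```

The τ_N ratios additionally require candidate subjet axes, for instance from re-clustering the pixels, and are omitted from this sketch.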
Figure 5: Demonstration of the performance of a network trained on top-quark jets and applied to top-quark jets. Top left are averages of the HR and SR images, followed by distributions of the square-root of the energy of the leading pixel, sub-leading pixel, etc. Also shown are average (f × f)-patches for the SR and the HR images, and distributions of high-level jet observables; see text for definitions. The zero-bin in energy collects jets with too few entries.

The physics of top-quark and QCD jets is very different. While for QCD-jets collinear and, to some degree, soft splittings describe the entire object, top-quark jets include the two electroweak decay steps. Comparing the top-quark jets shown in Fig. 5 with the QCD jets in Fig. 4, we see this difference already from the jet images: the top-quark jets are much wider and their energy is distributed among more pixels. From an SR point of view, this simplifies the task, because the network can work with more LR-structures. Technically, the adversarial loss becomes more important, and we can indeed balance the performance on top-quark jets vs QCD jets using λ_adv.

Looking at the ordered constituents, the additional mass-drop structure is learned by the networks extremely well. The leading four constituents typically cover the three hard decay sub-jets, and they are described even better than in the QCD case. Starting with the 4th constituent, the relative position of the LR and HR peaks changes towards a more QCD-like structure, so the network starts splitting one hard LR-constituent into hard HR-constituents. This is consistent with the top-quark jet consisting of three well-separated patterns, where the QCD jets only show this pattern for one leading constituent.
We also see that up to the 15th constituent, the massive top-quark jet shows comparably distinctive patterns and only few empty pixels.

For the high-level observables, we first see that the SR network shifts the jet mass peak by about 10 GeV and does well on the girth w_pf, aided by the fact that the jet resolution has hardly any effect on the jet size. As for QCD-jets, C_0.2 is no challenge for the up-sampling. Unlike for QCD-jets, τ_3/τ_2 is as stable as τ_2/τ_1, because it is completely governed by the hard and geometrically well-separated decays.

While our up-sampling network will work on one pair of LR-HR jets, with an up-scaling factor of eight, it is interesting to see what happens with these jet observables when we change the jet resolution more continuously. In Fig. 6 we see that the three different down-scaling steps indeed interpolate smoothly between the full HR and LR jets. While the maximum in the number of active pixels shifts almost linearly, the jet mass is altogether not affected much. The p_T-weighted girth is only affected for the collimated QCD jets, similar to the subjettiness ratio τ_2/τ_1. In contrast, the ratio τ_3/τ_2 indicates that we start losing the prong-multiplicity information also for top-quark jets.

The ultimate goal for jet super-resolution is to learn jet structures in general, such that SR images can be used to improve multi-jet analyses. In practice, a network could then be trained on some kind of representative jet sample. In our case, the QCD jets and top-quark jets are extremely different, and we further amplify this effect by training the models on one sample and applying them to the other. This gives an example of a large model dependence and allows us to understand the behavior by comparing with the correctly assigned data sets.

In Fig. 7, we show the results from the network trained on QCD jets, now applied to LR top-quark jets.
Interestingly, the network generates all the correct patterns for the ordered top-quark jet constituents, albeit with slightly reduced precision, for instance for the 15th constituent. Similarly, the patches still do not include unwanted visible patterns, but are slightly more noisy.

Finally, in Fig. 8 we show the results from the network trained on top-quark jets, but applied to LR QCD-jets. In a detailed comparison with Fig. 4, we see that the network does not generate the more challenging QCD patterns out of the narrow central pixel set. It starts to fail already for the first and second constituents, and works slightly better for the 7th
Figure 6: Distribution of the number of active pixels and the high-level observables m_jet, τ_2/τ_1, τ_3/τ_2, w_pf, and C_0.2 for images down-scaled by factors 2, 4, and 8. Results are shown for QCD jets (top rows) and top-quark jets (bottom rows).

constituent in the transition region, before correctly reproducing the soft noise patterns. In the distributions of high-level observables, the problem is most evident in τ_3/τ_2. Here the training on the top-quark sample pushes the SR QCD-image towards larger values, or higher jet multiplicities. This reflects the broader structure of the training sample with its generally larger values of τ_3/τ_2.

The flexibility of deep networks often comes at a cost of complexity. This complexity, in the form of a large number of layers and nodes, means a large number of parameters must be optimized during training. This hyper-flexibility can lead to undesirable side-effects that ultimately hurt its utility, especially when it comes to systematic studies. A network with fewer parameters which achieves the same performance will be more efficient to train, faster to evaluate, less prone to over-fitting, and more likely to generalize. For these reasons, we aim to determine the minimal necessary complexity of our GANs by systematically reducing the number of layers until performance is impacted.
Most of our network complexity resides in the core of the super-resolution GANs, which comprises the residual-in-residual dense blocks (RRDBs), each of which includes 15 convolutional layers. In this section we experiment with a smaller number of blocks, but the same network architecture. In Fig. 9 we compare pixel energy distributions for SR images generated by the reduced-complexity network to those generated by the network described earlier. In the first panels we see that for top-quark jets even a single-block network is able to extract
Figure 7: Demonstration of the performance of a network trained on QCD jets and applied to top-quark jets. Top left are averages of the HR and SR images, followed by distributions of the square-root of the energy of the leading pixel, sub-leading pixel, etc. Also shown are average (f × f)-patches for the SR and the HR images, and distributions of high-level jet observables; see text for definitions. The zero-bin in energy collects jets with too few entries.

the truth features very well. The remaining challenge is to properly describe the softer pixels, just as we see for the full network in Fig. 5. In the second set of panels in Fig. 9 we show the corresponding result for a network trained on and applied to QCD jets. As expected, the network task is much more challenging because of the smaller number of available LR-pixels and the much more focussed structure of QCD jets. Similar to the full network results shown in Fig. 4, the slim network does not push the energy for the softer pixels to the full truth
Figure 8: Demonstration of the performance of a network trained on top-quark jets and appliedto QCD jets. Top left are averages of the HR and SR images, followed by distributions of thesquare-root of the energy of leading pixels, sub-leading, etc. Also shown are average ( f × f )-patches for the SR and the HR images, and distributions of high-level jet observables, see textfor definitions. The zero-bin in energy collects jets with too few entries.14 ciPost Physics Submission Figure 9: Demonstration of the performance of a reduced complexity (1 RRDB block) networkcompared to a more complex network (10 RRDB blocks), for networks trained on and appliedtop-quark jets (upper) and QCD jets (lower). Shown are distributions of the square-rootof the pixel energies for the true high resolution image (HR) and super resolution imagesgenerated by the reduced and standard complexity network.values, but gets stuck at a slightly softer spectrum.To illustrate the super-resolution network performance we compute the first Wassersteindistance between the true HR images and the SR images. In Fig. 10 we show this Wasserstein15 ciPost Physics Submission
Figure 10: Dependence of the performance of super-resolution networks on the number of internal RRDB blocks (see Fig. 2). Performance is measured via the one-dimensional Wasserstein distance between the distributions of quantities over true high-resolution images and over super-resolution images. Quantities examined are the energies of the leading pixel, sub-leading pixel, etc. Left (right) shows results for networks trained on top-quark (QCD) jets and applied to top-quark (QCD) jets.

distance as a function of the number of RRDBs for top-quark jets (left) and QCD jets (right). The global scale of the Wasserstein distance values reflects the fact that top-quark jets are better described by all networks, regardless of the number of RRDBs. As a matter of fact, here the performance improvement from more RRDBs is almost completely covered by the fluctuations from different network initializations and runs. In contrast, the more challenging QCD jets show a significant improvement with increased network complexity. Interestingly, for both top-quark and QCD jets, the performance improvement is not visibly related to, for instance, hard vs soft pixels. We also emphasize that the larger network complexity required by QCD jets is in contrast to the complexity of the actual jets. While the top-quark jets combine massive decay and QCD splitting patterns, the physics principles behind the QCD jets are much simpler, so the required complexity of the super-resolution network is not driven by the complexity of the underlying objects, but by the effect of the reduced resolution.
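The one-dimensional Wasserstein distance used here has a simple closed form for two equal-size samples: it is the mean absolute difference of the sorted values. A sketch (our own helper names), applied per pixel rank over an ensemble of jet images:

```python
import numpy as np

def wasserstein_1d(a, b):
    """First Wasserstein distance between two equal-size 1D samples:
    the mean absolute difference of the sorted samples."""
    a, b = np.sort(np.asarray(a)), np.sort(np.asarray(b))
    assert a.shape == b.shape
    return np.abs(a - b).mean()

def pixel_rank_distance(hr_images, sr_images, k):
    """Compare the distributions of the k-th hardest pixel energy
    (on a square-root scale, as in Fig. 10) over an ensemble of
    HR truth images and generated SR images."""
    e_hr = [np.sort(img.ravel())[::-1][k] for img in hr_images]
    e_sr = [np.sort(img.ravel())[::-1][k] for img in sr_images]
    return wasserstein_1d(np.sqrt(e_hr), np.sqrt(e_sr))
```

Because the sorted-sample form needs equal sample sizes, ensembles of different size would instead be compared via their empirical quantile functions.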
Jet physics in terms of low-level observables, aided by deep networks, opens many new opportunities for jet measurements at the LHC. For jet classification, or jet tagging, deep networks typically outperform established high-level approaches.

In this paper, we propose a new application of deep learning to jet physics: jet super-resolution, which aims to overcome the limitations of detector resolution and allow for deeper analysis of jet data from ATLAS and CMS. Super-resolution networks can provide additional information, and hence improved resolution, by encoding our knowledge about jet physics in a generative network.

Our results demonstrate that a super-resolution network can indeed reproduce high-resolution jet images of top-quark jets and QCD jets when trained on these samples. We illustrated the performance of the super-resolution networks using images, low-level observables, and high-level observables. The more challenging test of the generality of the network is evaluated by applying a network trained on one sample to jets from the other sample. We confirmed that our super-resolution network exhibits the necessary model independence to be applied to different kinds of jets. This will allow us to train jet super-resolution networks on mixed samples and avoid complications, for instance, with the poorly defined separation of quark and gluon jets in a QCD sample.

While the main focus of our study was to show that the technique of image super-resolution works reliably on LHC jets, we already showed that it can be used to enhance jet measurements in regions with poor calorimeter performance. Additionally, we showed that the necessary complexity of the network depends on the source of the jets. Interestingly, equivalent performance on top-quark jets can be achieved with far fewer parameters than for QCD jets, despite the former having greater complexity in the underlying physics mechanisms.
Such knowledge is helpful in efficiently allocating computational resources when analyzing experimental jet data.
Acknowledgments
We would like to thank Monica Dunford and Hans-Christian Schultz-Coulon for the experimental encouragement. The research of AB is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant 396021762 – TRR 257 Particle Physics Phenomenology after the Higgs Discovery. DW is supported by the Department of Energy, Office of Science. JNH acknowledges support by the National Science Foundation under grants DGE-1633631 and DGE-1839285. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.