A Data-driven Event Generator for Hadron Colliders using Wasserstein Generative Adversarial Network
Suyong Choi and Jae Hoon Lim∗
Department of Physics, Korea University, Seoul 02841, Republic of Korea
Abstract
Highly reliable Monte-Carlo event generators and detector simulation programs are important for precision measurements in high energy physics. Huge amounts of computing resources are required to produce a sufficient number of simulated events. Moreover, simulation parameters have to be fine-tuned to reproduce the conditions of high energy particle interactions, which is not trivial in some phase spaces of physics interest. In this paper, we suggest a new method based on the Wasserstein Generative Adversarial Network (WGAN) that can learn the probability distribution of the real data. Our method is capable of event generation at a very short computing time compared to traditional MC generators. The trained WGAN is able to reproduce the shape of the real data with high fidelity.
PACS numbers: 29.85.-c, 07.05.Mh
Keywords: HEP data, Event generation, Deep learning, GAN, WGAN

∗ Electronic address: [email protected]

I. INTRODUCTION

There have been many important scientific discoveries at the Large Hadron Collider (LHC). With the increasing integrated luminosity at the LHC, extremely rare and complicated physics processes can be studied, such as the production of multiple massive quarks and vector bosons. For precise measurements of the physics involving these particles, we need a very good understanding of the background processes. Due to the large theoretical uncertainties, it is recommended to estimate backgrounds from the real data. In addition, MC simulations of the backgrounds are statistically limited in the phase space of interest.

The Generative Adversarial Network (GAN) [1] is a Deep Learning technique which can generate new fake image data. The idea has been applied to detector simulation in High Energy Physics (HEP), and CaloGAN [2, 3] is one of the first applications. The idea of generating derived event variables has also been explored to study Z → ℓ⁺ℓ⁻ or two-jet events [4, 5].

In this paper, we apply a GAN to produce kinematic event variables of HEP data. We explore the Wasserstein GAN (WGAN), which is known to give improved results over the traditional GAN [6–8].

II. THE GENERATIVE ADVERSARIAL NETWORK
A GAN is composed of two neural networks, a "generator" and a "discriminator". The generator network, G, is supposed to create a sample of data from a set of variables called latent variables. By choosing a point in the latent variable space, it produces a data sample which is similar to the real data. In other words, it can be considered as a multi-dimensional mapping between the latent space and the event data space. The discriminator network D distinguishes between the generated "fake" data and the "real" data that is used for training. The two networks compete during the training: G produces fake data as similar to the real data as possible, and D tries to discriminate the generated data from the real data. After successful training, any random point in the latent space can be mapped to the event data by the G network.

The training strategy of the GAN is to minimize a loss function for each of D and G. The loss function of the discriminator network L_D is defined by taking the average of the discriminator outputs over the fake data and the real data. The loss function of the generator network L_G is defined as the negative of the average of the discriminator outputs over the fake data:

    L_D = ⟨D⟩_fake − ⟨D⟩_real    (1)
    L_G = −⟨D⟩_fake    (2)

When L_D is minimized in a step, the discriminator network weights are updated so that the discriminator output for the real data tends to 1 while for the fake data it approaches 0. On the other hand, when L_G is being minimized, the discriminator network is not modified, but the generator network weights are updated.

Special treatments are needed to overcome the issues of the traditional GAN. First, it is known that the GAN tends to produce smoother distributions compared to the real data [4, 6, 9]. Second, the landscape of the loss function of a GAN is well known to sit at a saddle point, drift, or show sudden jumps; therefore it has stability problems.
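As a concrete illustration, the losses of Eqs. (1) and (2), together with the gradient penalty discussed in the next paragraph, can be sketched in a few lines of NumPy. This is a minimal sketch under simplifying assumptions, not the implementation used in this work: the critic outputs are toy numbers, and the penalty term uses a linear critic D(x) = w·x purely so that its gradient is analytic.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic_losses(d_real, d_fake):
    """WGAN losses of Eqs. (1)-(2): batch averages of the
    discriminator (critic) output over real and fake events."""
    loss_d = np.mean(d_fake) - np.mean(d_real)
    loss_g = -np.mean(d_fake)
    return loss_d, loss_g

def gradient_penalty(w, lam=10.0):
    """Penalty lam * (||grad D|| - 1)^2 evaluated at an interpolated
    point between real and fake data.  For the toy linear critic
    D(x) = w . x the gradient is w everywhere, so the norm is exact."""
    grad_norm = np.linalg.norm(w)
    return lam * (grad_norm - 1.0) ** 2

# Toy critic outputs: real events score near 1, fakes near 0.
d_real = rng.normal(1.0, 0.1, size=256)
d_fake = rng.normal(0.0, 0.1, size=256)
loss_d, loss_g = critic_losses(d_real, d_fake)
```

Minimizing loss_d drives ⟨D⟩_real up and ⟨D⟩_fake down, while minimizing loss_g (with the critic frozen) pushes the generator toward outputs the critic scores highly.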
In the WGAN, a penalty term related to ⟨∇D⟩_P is added to L_D to penalize the gradients, where P is a point between the real data and the fake data in the event data space. The updates to the network weights and biases thereby become more gradual [6, 7]. This assures convergence of the network parameters and stability of the training.

III. APPLICATION OF WGAN IN pp → bb̄γγ PRODUCTION
HEP data can be considered as a set of numbers. For example, HEP data are kinematic variables of final-state physics objects such as hadronic jets, leptons, or the missing transverse momentum. If the WGAN is applied to this case, a point in the latent space is mapped to an event containing particle momentum components (p_x, p_y, p_z).

In this study, we focus on the pp → bb̄γγ process, which is one of the backgrounds in searches for double Higgs boson production at the LHC. These events would allow us to probe the self-coupling of the Higgs boson and reconstruct the Higgs potential. One of the important search modes of double Higgs production proceeds through pp → HH → bb̄ + γγ.

As an illustration of the WGAN method, we try to mimic Monte Carlo simulated data, a proxy for the real data. We generated 1 million events of non-resonant production of pp → bb̄γγ at √s = 14 TeV at the leading order with Madgraph 5 [10].
Pythia 8 was used to simulate the subsequent particle shower and hadronization process [11]. A fast parametrized detector simulation and reconstruction were performed with the Delphes 3 software package using the default CMS [12, 13] detector settings without pileup (multiple simultaneous pp interactions) effects. Generated events are required to satisfy the selection criteria:

• two photons with transverse momentum p_T > 10 GeV and pseudorapidity |η| < ,
• at least two hadronic jets reconstructed with the anti-k_T algorithm [14] with an angular distance of 0.4 (∆R = √(∆η² + ∆φ²), where φ is the azimuthal angle), with p_T > 20 GeV and |η| < ,
• at least two tagged b-jets.

After the event selection, about 70,000 events remained and are used for the training. For the implementation of the WGAN, we use TensorFlow 1.8.0 [16] with
Keras 2.2.0 [17]. We build a modified WGAN composed of one generator network and 10 discriminator networks. To construct the generator network and the discriminator networks, we apply the multilayer perceptron model. Each network has four hidden layers with node sizes of 2048, 2048, 1024, and 256, with the Rectified Linear Unit as the activation function of the neurons. Between the layers, we set a dropout rate of 0.05. The generator network is configured to accept input from a 22-dimensional latent space. For the discriminator network, we choose the softmax function as the activation function of the output layer.

The WGAN was trained adversarially on about 70,000 events with a batch size of 256 for 100,000 epochs. Momentum components (p_x, p_y, p_z) of the two leading photons and two leading b-jets are used as inputs to the WGAN. Particle momenta are scaled down for the stability of the training. Training is done with Adam at a learning rate of 5 × 10⁻ . Training takes 20 hours on an NVIDIA Tesla P100 graphics processing unit, and event generation with the trained WGAN takes just a few seconds for 10 million events.

Events generated by the WGAN were not uniform in azimuth or symmetric in pseudorapidity. These symptoms seemed to stem from using a single discriminator in the adversarial training. Interestingly, when we create many discriminators, we observe that the distributions recover the expected symmetry in the angular distributions. By employing different discriminators, the generator should learn to cover the whole phase space. We therefore employ multiple discriminator networks to train the generator [15], and each discriminator network D_i is trained to minimize its own L_D^i separately.

FIG. 1: Training history of ⟨D⟩, L_D, L_G, and the probability distance.

The probability distance is defined as the total variation distance between the discriminator output distribution for input real events and that for generated fake events. As shown in Fig.
1, the probability distance is minimized to 0.05, and ⟨D⟩, L_D, and L_G show adversarial training in progress over 100,000 epochs.

The distributions of the momentum components (p_x, p_y, p_z) of the leading photon are shown in Fig. 2, and the distributions of p_T, η, the invariant mass of the two leading photons (γ₁, γ₂), and ∆R(γ₁, γ₂) are shown in Fig. 3. In each plot, the black line represents the targeted events and the area filled with red represents the events generated by the WGAN. We observe a discrepancy between the distributions of the generated events and those of the targeted events. The discrepancies stem from the characteristics of the WGAN, which has a tendency to make distributions smoother than those of its inputs.

FIG. 2: The distributions of the three momentum components (p_x, p_y, p_z) of the leading photon. Black lines represent the target events (real data), red distributions represent the generated events (fake data before reweighting), and blue distributions represent the reweighted events (fake data after reweighting).

To recover from the discrepancies, we derive weighting factors by investigating the difference between the generated events and the input data. We used 24 variables: the three momentum components of the two leading photons and b-jets, the p_T and η of the two leading photons and b-jets, the invariant masses of the two leading photons and of the two leading b-jets, ∆R(γ₁, γ₂), and ∆R(b-jet₁, b-jet₂). We use a Gradient Boosted Decision Tree (GBDT) [18] to derive the weighting factor of each generated event. 40 trees with a maximum depth of 3 and a minimum number of events in a leaf of 200 are used for the GBDT. In Figs. 2 and 3, the reweighted distributions (blue) show better agreement with the input data.

Figure 4 shows the invariant mass and ∆R distributions of the two leading photons in the high mass region (700 < M < ).
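The reweighting described above estimates, for each generated event, the ratio of the real-data density to the generated density. As a simplified, hedged illustration, the same idea can be sketched in one dimension with a histogram density ratio standing in for the 24-variable GBDT; all names here are illustrative and the Gaussian samples are toy data only.

```python
import numpy as np

rng = np.random.default_rng(1)

def histogram_weights(real, fake, bins=30):
    """Per-event weights for the fake sample: the binwise ratio of the
    real-data density to the generated density (a crude stand-in for
    the classifier-based GBDT reweighting described in the text)."""
    edges = np.linspace(min(real.min(), fake.min()),
                        max(real.max(), fake.max()), bins + 1)
    h_real, _ = np.histogram(real, bins=edges, density=True)
    h_fake, _ = np.histogram(fake, bins=edges, density=True)
    # Ratio of densities; bins with no fake events get weight 1 (unused).
    ratio = np.divide(h_real, h_fake, out=np.ones_like(h_real),
                      where=h_fake > 0)
    idx = np.clip(np.digitize(fake, edges) - 1, 0, bins - 1)
    return ratio[idx]

# Toy example: the "generated" sample is slightly too smooth (wider).
real = rng.normal(0.0, 1.0, 100_000)
fake = rng.normal(0.0, 1.2, 100_000)
w = histogram_weights(real, fake)
# The weighted variance of the fake sample moves toward the real one.
var_after = np.average(fake**2, weights=w)
```

A classifier-based variant (such as the GBDT used here) generalizes this to many variables by learning the density ratio from the classifier output instead of explicit histograms.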
The green dashed lines represent the training events, which have 4 times lower statistics than the targeted distributions, and the black solid lines represent the targeted events. In this region, only 33 events of input data were used for the training, but the generated output shows good agreement with the targeted distribution. This shows that the WGAN was able to reproduce the probability distributions in this region even with limited statistics. It also indicates that the WGAN-based algorithm can be considered as an alternative, data-driven event generator.

FIG. 3: The distributions of p_T and η of the leading photon, the invariant mass of the two photons, and ∆R(γ₁, γ₂). Black lines represent the target events (real data), red distributions represent the generated events (fake data before reweighting), and blue distributions represent the reweighted events (fake data after reweighting).

IV. CONCLUSIONS
We developed a WGAN for reproducing pp → bb̄γγ events, one of the important background samples for double Higgs boson studies at the LHC. The trained WGAN can generate events at a very short computing time compared to traditional MC generators and reproduces the shape of the real data with high fidelity even with limited statistics. We expect the WGAN to be a fast and faithful data-driven method for processes that are difficult to simulate in high energy physics.

FIG. 4: The distributions of the invariant mass of the two photons and ∆R(γ₁, γ₂). The green dashed lines represent the training events (real data), the black lines represent the target events (real data), red distributions represent the generated events (fake data before reweighting), and blue distributions represent the reweighted events (fake data after reweighting).

Acknowledgments
This work is supported by the National Research Foundation of Korea (NRF) under Contract Nos. NRF-2018R1A2B6005043 and NRF-2020R1A2C3009918, and the BK21 FOUR program at Korea University, Initiative for science frontiers on upcoming challenges.

[1] I. Goodfellow et al., Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014), pp. 2672–2680.
[2] M. Paganini, L. Oliveira and B. Nachman, Phys. Rev. D, 014021 (2018).
[3] M. Paganini, L. Oliveira and B. Nachman, Phys. Rev. Lett., 042003 (2018).
[4] B. Hashemi, N. Amin, K. Datta, D. Olivito and M. Pierini, arXiv:1901.05282 [hep-ex].
[5] R. D. Sipio, M. F. Giannelli, S. K. Haghighat and S. Palazzo, J. High Energy Phys., 110 (2019).
[6] M. Arjovsky, S. Chintala and L. Bottou, ICML'17: Proceedings of the 34th International Conference on Machine Learning (ICML 2017), pp. 214–223.
[7] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin and A. Courville, NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), pp. 5769–5779.
[8] M. Erdmann, L. Geiger, J. Glombitza and D. Schmidt, Computing and Software for Big Science, 4 (2018).
[9] N. Rahaman et al., Proceedings of the 36th International Conference on Machine Learning (ICML 2019), pp. 5301–5310.
[10] J. Alwall, M. Herquet, F. Maltoni, O. Mattelaer and T. Stelzer, J. High Energy Phys., 128 (2011).
[11] T. Sjöstrand et al., Comput. Phys. Commun., pp. 159–177 (2015).
[12] J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens and M. Selvaggi, J. High Energy Phys., 057 (2014).
[13] The CMS Collaboration, J. Instrum., 8004 (2008).
[14] M. Cacciari, G. P. Salam and G. Soyez, J. High Energy Phys., 063 (2008).
[15] I. Durugkar, I. Gemp and S. Mahadevan, conference paper at the International Conference on Learning Representations (ICLR 2017).
[16] M. Abadi et al. (2015).
[17] Keras, https://keras.io (2015).
[18] L. Mason, J. Baxter, P. Bartlett and M. Frean, Advances in Neural Information Processing Systems 12 (MIT Press, 1999), pp. 512–518.

A. NORMALIZED DISTRIBUTIONS OF MOMENTUM COMPONENTS

FIG. 5: The distributions of the three momentum components (p_x, p_y, p_z) of the subleading photon.
FIG. 6: The distributions of the three momentum components (p_x, p_y, p_z) of the leading b-jet.
FIG. 7: The distributions of the three momentum components (p_x, p_y, p_z) of the subleading b-jet.

B. NORMALIZED DISTRIBUTIONS OF ADDITIONAL VARIABLES

FIG. 8: The distributions of p_T and η of the subleading photon.
FIG. 9: The distributions of p_T and η of the leading b-jet.
FIG. 10: The distributions of p_T and η of the subleading b-jet.
FIG. 11: The distributions of the invariant mass of the two b-jets and ∆R(b-jet₁, b-jet₂).

C. NORMALIZED DISTRIBUTIONS OF THE CONTROL REGION

FIG. 12: The distributions of the invariant mass of the two photons and ∆R(γ₁, γ₂) with the condition of [100, 150] GeV.