Explainable machine learning of the underlying physics of high-energy particle collisions
EExplainable machine learning of the underlying physics of high-energyparticle collisions
Yue Shi Lai, ∗ Duff Neill, † Mateusz P(cid:32)losko´n, ‡ and Felix Ringer § Nuclear Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA Theoretical Division, MS B283, Los Alamos National Laboratory, Los Alamos, NM 87545, USA (Dated: December 15, 2020)We present an implementation of an explainable and physics-aware machine learning model ca-pable of inferring the underlying physics of high-energy particle collisions using the informationencoded in the energy-momentum four-vectors of the final state particles. We demonstrate theproof-of-concept of our White Box AI approach using a Generative Adversarial Network (GAN)which learns from a DGLAP-based parton shower Monte Carlo event generator. We show, for thefirst time, that our approach leads to a network that is able to learn not only the final distributionof particles, but also the underlying parton branching mechanism, i.e. the Altarelli-Parisi splittingfunction, the ordering variable of the shower, and the scaling behavior. While the current work isfocused on perturbative physics of the parton shower, we foresee a broad range of applications of ourframework to areas that are currently difficult to address from first principles in QCD. Examplesinclude nonperturbative and collective effects, factorization breaking and the modification of theparton shower in heavy-ion, and electron-nucleus collisions.
Introduction.
In recent years machine learning tech-niques have lead to range of new developments in nu-clear and high-energy physics [1–29]. For example, inRefs. [1–5] jet tagging techniques were developed whichoften outperform traditional techniques. In Refs. [6–11] Generative Adversarial Networks (
GAN s) [30, 31],a form of unsupervised machine learning, were used tosimulate event distributions in high-energy particle col-lisions. There have also been efforts to infer physics in-formation from data. In Ref. [32] a probabilistic modelwas introduced based on jet clustering and in Ref. [33]a convolutional autoencoder within a shower was usedwhich qualitatively reproduces jet observables. See alsoRefs. [34–36] for recent work on physics-aware learning.The underlying physics information of high-energy par-ticle collisions is encoded in hard-scattering processes,the subsequent parton shower and the hadronizationmechanism. These steps are modeled by general pur-pose parton showers used in Monte Carlo event genera-tors which play an important role in our understanding ofhigh-energy collider experiments [37–39]. Starting withhighly energetic quarks or gluons which are produced inhard-scattering events, parton showers simulate the par-ton branching processes that occur during the evolutionfrom the hard scale to the infrared which is followed bythe hadronization step. While the general concept ofparton showers is well established, important questionsabout the perturbative accuracy [40–46], nonperturba-tive effects [47–50] and the modification in the nuclearenvironment [51–65], remain a challenge.In this work, we propose an explainable or White BoxAI approach [66, 67] to learn the underlying physics ofhigh-energy particle collisions. As a proof of concept,we present results of a
GAN trained on the final out-put of a parton shower, which not only reproduces thefinal distribution of particles but also learns the under-
FIG. 1. Parton splitting process i → jk with longitudinal mo-mentum fraction z , relative splitting angle of the two daughterpartons θ and azimuthal angle φ . lying showering mechanism using the complete event in-formation. GAN s consist of two competing neural net-works, the generator and discriminator. The design ofour generator network allows to not only describe thefinal distribution of particles of the shower but the dif-ferent layers also give access to the underlying physicsencoded in the parton branching processes. More specif-ically, we demonstrate that the network can learn theAltarelli-Parisi splitting function P i → jk ( z ), the splittingangles of individual branching processes and the depen-dence of the shower on the energy scale Q , see Fig. 1.This is achieved by separating the GAN into two com-ponents such that it can learn both self-similar/fractalaspects of the shower like the Altarelli-Parisi splittingfunction as well as Monte Carlo time dependent variablessuch as the splitting angle. We use a network architec-ture that is sufficiently general, and as a result, capableof incorporating nonperturbative physics in the future.In order to use the complete information of each event,we use data representation which is directly given by thefour-vectors of the final state particles. To avoid sensitiv-ity to the unphysical ordering of the list of four-vectorsduring the training process, we use sets to represent the a r X i v : . [ h e p - ph ] D ec FIG. 2. Schematic illustrations of the generator network: a ) Parallelized data structure of the random splitting trees executedon the GPU. b ) Flow diagram of the i th splitting process ( n → n + 1 partons) of a randomly chosen parton with momentum p k .The time dependent and independent networks are shown which take as input random numbers (RND) as well as Q, θ i − in thetime dependent case. The output of the two neural networks is passed through a softmax to the module M which determinesthe four-vectors of the two daughter partons from the variables of the 1 → p k . data. In particular, in our work, the necessary permu-tation invariance is achieved by using so-called deep setswhich were developed in Refs. [68–70].With the framework introduced in this work, we canaccess the underlying physics mechanisms effectively de-parting from the typical black-box paradigm for neu-ral networks. Moreover, we expect that eventually the GAN can be trained directly on experimental data (i.e.measured four-vectors of detected particles). Generally,
GAN s are ideally suited for such applications due totheir generalizability and robustness when exposed to im-perfect data sets. We expect that our approach will beparticularly relevant for studies of heavy-ion collisions atRHIC and the LHC as well as electron-nucleus collisionsat the future Electron-Ion Collider [71]. In heavy-ion col-lisions, the presence of quark-gluon plasma (QGP) [72–80] leads to modifications of highly energetic jets as com-pared to the proton-proton baseline. These phenomenaare typically referred to as jet quenching . Significant the-oretical [51–59, 61–63, 65] and experimental [81–85] ef-forts have been made to better understand the physics ofthis process. Using the novel techniques proposed in thiswork, we will eventually be able to analyze the propertiesof the medium modified parton shower using, for the firsttime, the complete event information.
The parton shower.
The parton shower we use fortraining the
GAN is designed to solve the
DGLAP evo-lution equations, see Refs. [50, 86]. In addition, we setup the full event kinematics in spherical coordinates suchthat we can use the final distribution of partons gener-ated by the shower as input to the adversarial trainingprocess. We start with a highly energetic parton whichoriginates from a hard-scattering event at the scale Q .The parton shower cascade is obtained through recursive1 → DGLAP evolution equations. There are three variables that de-scribe a
DGLAP splitting process i → jk as illustratedin Fig. 1. First, the large light cone momentum fraction z of the daughter partons relative to the parent is de- termined by sampling from the Altarelli-Parisi splittingfunctions. Second, the orientation of the two daughterpartons, the azimuthal angle φ , is obtained by samplingfrom a flat distribution in the range [ − π, π ]. Third, thesplitting angle θ which is the relative opening angle ofthe two daughter partons, is determined as follows: First,sample a Monte Carlo time step ∆ t from the no-emissionSudakov factorexp (cid:34) − ∆ t (cid:88) i = q, ¯ q,g − (cid:15) (cid:90) (cid:15) d z P i ( z ) (cid:35) , (1)where the P i denote the final state summed Altarelli-Parisi splitting functions for (anti-)quarks and gluons.Then advance the shower time t → t + ∆ t and solve forthe splitting angle θ in t ( Q, θ ) = Q tan( θ/ (cid:90) Q tan( π/ d t (cid:48) t (cid:48) α s ( t (cid:48) ) π . (2)We evolve the shower from the hard scale Q down to thehadronization scale which we choose as 1 GeV. We notethat the DGLAP shower described here has two cutoffparameters. First, the angular cutoff on the splittingangle θ which is introduced by the hadronization scaleand which determines the end of the shower. Second,we introduce the cutoff (cid:15) on the momentum fraction z ,see Eq. (1). For our numerical results we choose (cid:15) = 0 . (cid:15) < z < − (cid:15) , and emittedpartons that violate these bounds are not evolved furtherin the shower. From the parent direction and the vari-ables ( z, θ, φ ) of a given 1 → , ˜Φ). The relevant kinematic relations are summarizedin the supplemental material. After the shower termi-nates, we record the final momentum fractions Z of thepartons relative to the initial momentum scale Q as well d N / d Z PS Q =300GeV × Q =500GeV × Q =700GeV × Q =300GeV × Q =500GeV × Q =700GeV × Z GAN / PS Q =300GeV Q =500GeV Q =700GeV d N / d GAN Q =300GeV × Q =500GeV × Q =700GeV × Q =300GeV × Q =500GeV × Q =700GeV × GAN / PS Q =300GeV Q =500GeV Q =700GeV d N / d GAN Q =300GeV × Q =500GeV × Q =700GeV × Q =300GeV × Q =500GeV × Q =700GeV × GAN / PS Q =300GeV Q =500GeV Q =700GeV FIG. 3. Comparison of the parton shower and
GAN in terms of the final distribution of particles. The three panels show themomentum fraction Z , the polar angle Θ and the azimuthal angle Φ (from left to right) for Q = 300 , ,
700 GeV. as their corresponding spherical coordinates (Θ , Φ) . To-gether with the on-shell condition they fully specify theexclusive final state distribution of all particles which areproduced by the shower. We note that the variables z, φ are independent of the shower time t (self-similar or frac-tal variables), whereas the splitting angle θ is determinedfrom the ordering variable of the shower and it also de-pends on the scale Q . Therefore, we treat θ differentlyfrom the other two variables in the generator network, asdiscussed below. The shower described here provides anideal test ground to explore the use of explainable ma-chine learning that aims to extract the structure of theparton shower, and thus the underlying physics, from thefinal distribution of particles in the event. We leave theinvestigation of other shower algorithms and nonpertur-bative effects for future work. Data representation and setup of the GAN.
To avoidany loss of information, we choose to train the
GAN di-rectly on sets which contain the event-by-event particlefour-vectors produced by the shower. The required per-mutation invariance is built into the discriminator net-work by using so-called deep sets which were developedin Refs. [68–70]. Several equivariant layers are followedby a permutation invariant layer which ensures that thediscriminator network is insensitive to the ordering of theinput. Since the number of particles that are producedper event fluctuates, the sets of four-vectors have variablelength. Deep sets are ideally suited to handle input withdifferent lengths. To accommodate the variable length ofthe training data we allow the deep sets to contain up Note that we use the variables ( z, θ, φ ) to describe an individual1 → , ˜Φ) are the spher-ical coordinates of partons at intermediate stages of the showerand ( Z, Θ , Φ) denote the final distributions of the momentumfraction and angles of the partons after the shower terminates. to 200 four-vectors which is sufficient for the energy Q that we consider here. We note that it is also possible totrain the network on a set of observables where Infrared-Collinear safety is built in directly [22, 87]. We plan toexplore the impact of different data representations infuture work which will be particularly relevant once weinclude nonperturbative effects in the shower.The generator network mimics the structure of a par-ton shower. It sequentially produces partons and learnsto map n to n + 1 partons. To simplify the trainingprocess, the generator is separated into a Monte Carlotime-dependent and time-independent part. The time-independent part is designed to learn the Altarelli-Parisisplitting function P i → jk and the azimuthal angle φ whichare the same for every branching process and indepen-dent of Q . Whereas the other part of the network de-pends on the Monte Carlo time t and on the energy Q ,i.e. it changes at every step of the shower and producesemissions which are ordered in the splitting angle θ , seeEq. (2). Both parts of the generator consist of neuralnetworks with 5 hidden layers and 50 neurons, which isillustrated schematically in Fig. 2. We use the exponen-tial linear unit (ELU) [88] as the activation function, toavoid step functions in the resulting z and θ distributions.We note that the two shower cutoffs discussed above arealso explicitly included in the generator network. How-ever, in general, we expect that the cutoffs can be chosenas trainable parameters as well.Using the shower setup described above, we generatetraining data for different energies in the range of Q =200–800 GeV. As a proof of concept, we study a puregluon shower where the gluon that splits is chosen atrandom. The training process of the GAN is a modifiedversion of the original
GAN approach. More details aregiven in the supplemental material.
Numerical results.
We first verify that the
GAN canreproduce the final distribution of particles and we then N s p lit d P / d z P gg Q <800 GeV
GAN split 1GAN split 2GAN split 3GAN split 4 PS split 1PS split 2PS split 3PS split 4 z GAN / PS Split 1Split 2 Split 3Split 4 N s p lit d P / d Q <800 GeVGAN split 1GAN split 2GAN split 3GAN split 4 PS split 1PS split 2PS split 3PS split 4 GAN / PS Split 1Split 2 Split 3Split 4 N s p lit d P / d Split 1GAN Q =300 GeVGAN Q =500 GeVGAN Q =700 GeVPS Q =300 GeVPS Q =500 GeVPS Q =700 GeV GAN / PS Q =300 GeV Q =500 GeV Q =700 GeV
FIG. 4. Comparison of the momentum fraction z , i.e. the Altarelli-Parisi splitting function P g → gg ( z ) (left) and the relativesplitting angle θ (middle) of the first four splittings from the parton shower and the GAN for Q = 200 −
800 GeV. In addition,we show the θ distribution for three different values of Q for the first splitting (right). consider the underlying physics by sampling from the dif-ferent layers of the network. To quantify the agreementbetween the shower and the GAN , we consider threekinematic variables ( Z, Θ , Φ) which characterize the fi-nal distribution of particles. The result of the
GAN andthe parton shower is shown in the three panels of Fig. 3,where 3 . × events from the GAN after 700 trainingepochs is compared to 3 . × parton shower events. Weobserve very good agreement for all three distributions.The good agreement over several orders of magnitude ishighly nontrivial even without considering the underlyingphysics. As expected for a DGLAP shower, the distri-bution of the parton momentum fractions rises steeplytoward small- Z (left panel). The distribution of the po-lar angle Θ peaks in the direction of the initial partonand Φ is flat which is consistent with the flat samplingof φ for each individual splitting.Having confirmed that the GAN can reproduce the fi-nal output of the parton shower, we are now going toanalyze the individual splitting processes to verify thatthe network has also correctly learned the underlyingphysics. The ability of the
GAN to extract informationabout parton branching mechanism is the main noveltyof our work. By sampling from different layers of thenetwork, we study the distribution of the variables ( z, θ )that characterize the individual splitting processes. Asrepresentative examples, we show the results for the firstfour splittings in the left and middle panel of Fig. 4. Thedistribution of the momentum fraction z is shown in theleft panel for the g → gg splitting process. We observevery good agreement with the Altarelli-Parisi splittingfunction P g → gg for all four splittings. In particular, wenote that the splitting function diverges for z →
1. In-stead, the final Z -distribution (left panel in Fig. 3) fallsoff steeply toward Z → GAN has in fact learned the underlyingphysics mechanism. Next we consider the Monte Carlotime-dependent θ distribution which is shown in the mid-dle panel of Fig. 4. We observe that it is correctly repro-duced by the GAN besides small fluctuations in the tail.The distributions peak at small values of θ . As expectedfor the ordering variable of the shower, the distributionsbecome more narrow for splittings that occur at laterMonte Carlo time. Here, θ is the only variable that de-pends on the scale Q . We investigate its Q dependenceby considering the first splitting of the shower which isshown in the right panel of Fig. 4. Even though the GAN is optimized to reproduce only the Q -integrated distribu-tion, the Q -dependence of the shower is nevertheless welldescribed by the network. We attribute the remainingnumerical differences to the finite number of neurons incombination with the activation function and their abil-ity to approximate a steep multi-differential distribution.This can be mitigated by extending the size of the neuralnetwork. Lastly, we find that the distribution of the az-imuthal angle φ (not shown) also agrees with the partonshower result and we thus conclude that the GAN hasin fact accurately learned the underlying physics of theparton shower.
Conclusions.
In this letter we proposed an explain-able machine learning - a White Box AI - frameworkwhich successfully learns the underlying physics of a par-ton shower - a hallmark of modeling high-energy par-ticle collisions. As a proof of concept, we demonstratedthat Generative Adversarial Networks (
GAN s) using thefull event information are capable of learning the partoncascade as described by a parton shower implementing
DGLAP evolution equations. As input to the adversarialtraining process we used deep sets which yield a permuta-tion invariant representation of the training data of vari-able length. We found that not only the final distributionof partons in the event can be described by the networkbut also the physics of individual splittings processes arecorrectly learned by the
GAN . We consider our work asa starting point of a long-term effort with the goal toeventually train networks directly on experimental datadesigned for extracting the underlying physics using fullevent information registered in the detectors. We notethat the precision of our approach in falsifying theoreticalmodeling is limited by the systematic experimental biaseswhich we plan to explore in subsequent publications. Weexpect our results to be particularly relevant for futurestudies of nonperturbative physics, collective effects, andthe modification of the vacuum parton shower in heavy-ion collisions or electron-nucleus collisions at the futureElectron-Ion Collider.
Acknowledgements.
We would like to thank BarbaraJacak, James Mulligan, Stefan Prestel, Nobuo Sato andFeng Yuan for helpful discussions. YSL, MP and FRare supported by the U.S. Department of Energy underContract No. DE-AC02-05CH11231 and the LDRD Pro-gram of Lawrence Berkeley National Laboratory. DN issupported by the U.S. Department of Energy under Con-tract No. DE-AC52-06NA25396 at LANL and throughthe LANL/LDRD Program. ∗ [email protected] † duff[email protected] ‡ [email protected] § [email protected][1] L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, andA. Schwartzman, “Jet-images — deep learning edition,” JHEP (2016) 069, arXiv:1511.05190 [hep-ph] .[2] P. T. Komiske, E. M. Metodiev, and M. D. Schwartz,“Deep learning in color: towards automatedquark/gluon jet discrimination,” JHEP (2017) 110, arXiv:1612.01551 [hep-ph] .[3] G. Kasieczka, T. Plehn, M. Russell, and T. Schell,“Deep-learning Top Taggers or The End of QCD?,” JHEP (2017) 006, arXiv:1701.08784 [hep-ph] .[4] E. M. Metodiev, B. Nachman, and J. Thaler,“Classification without labels: Learning from mixedsamples in high energy physics,” JHEP (2017) 174, arXiv:1708.02949 [hep-ph] .[5] C. Englert, P. Galler, P. Harris, and M. Spannowsky,“Machine Learning Uncertainties with AdversarialNeural Networks,” Eur. Phys. J. C no. 1, (2019) 4, arXiv:1807.08763 [hep-ph] .[6] B. Hashemi, N. Amin, K. Datta, D. Olivito, andM. Pierini, “LHC analysis-specific datasets withGenerative Adversarial Networks,” arXiv:1901.05282[hep-ex] .[7] S. Otten, S. Caron, W. de Swart, M. van Beekveld,L. Hendriks, C. van Leeuwen, D. Podareanu, R. Ruiz deAustri, and R. Verheyen, “Event Generation andStatistical Sampling for Physics with Deep GenerativeModels and a Density Information Buffer,” arXiv:1901.00875 [hep-ph] .[8] A. Butter, T. Plehn, and R. Winterhalder, “How toGAN LHC Events,” SciPost Phys. no. 6, (2019) 075, arXiv:1907.03764 [hep-ph] .[9] R. Di Sipio, M. Faucci Giannelli,S. Ketabchi Haghighat, and S. Palazzo, “DijetGAN: AGenerative-Adversarial Network Approach for theSimulation of QCD Dijet Events at the LHC,” JHEP (2019) 110, arXiv:1903.02433 [hep-ex] .[10] S. Farrell, W. Bhimji, T. Kurth, M. Mustafa, D. Bard,Z. Lukic, B. Nachman, and H. Patton, “NextGeneration Generative Neural Networks for HEP,” EPJWeb Conf. (2019) 09005.[11] Y. Alanazi, N. Sato, T. Liu, W. Melnitchouk, M. P.Kuchera, E. Pritchard, M. Robertson, R. Strauss,L. Velasco, and Y. Li, “Simulation of electron-protonscattering events by a Feature-Augmented andTransformed Generative Adversarial Network(FAT-GAN),” arXiv:2001.11103 [hep-ph] .[12] L.-G. Pang, K. Zhou, N. Su, H. Petersen, H. St¨ocker,and X.-N. Wang, “An equation-of-state-meter ofquantum chromodynamics transition from deeplearning,”
Nature Commun. no. 1, (2018) 210, arXiv:1612.04262 [hep-ph] .[13] P. T. Komiske, E. M. Metodiev, B. Nachman, andM. D. Schwartz, “Pileup Mitigation with MachineLearning (PUMML),” JHEP (2017) 051, arXiv:1707.08600 [hep-ph] .[14] NNPDF
Collaboration, R. D. Ball et al. , “Partondistributions from high-precision collider data,”
Eur.Phys. J. C no. 10, (2017) 663, arXiv:1706.00428[hep-ph] .[15] M. Paganini, L. de Oliveira, and B. Nachman,“Accelerating Science with Generative AdversarialNetworks: An Application to 3D Particle Showers inMultilayer Calorimeters,” Phys. Rev. Lett. no. 4,(2018) 042003, arXiv:1705.02355 [hep-ex] .[16] K. Datta and A. J. Larkoski, “Novel Jet Observablesfrom Machine Learning,”
JHEP (2018) 086, arXiv:1710.01305 [hep-ph] .[17] A. J. Larkoski, I. Moult, and B. Nachman, “JetSubstructure at the Large Hadron Collider: A Reviewof Recent Advances in Theory and Machine Learning,” Phys. Rept. (2020) 1–63, arXiv:1709.04464[hep-ph] .[18] Y.-T. Chien and R. Kunnawalkam Elayavalli, “Probingheavy ion collisions using quark and gluon jetsubstructure,” arXiv:1803.03589 [hep-ph] .[19] J. H. Collins, K. Howe, and B. Nachman, “AnomalyDetection for Resonant New Physics with MachineLearning,”
Phys. Rev. Lett. no. 24, (2018) 241803, arXiv:1805.02664 [hep-ph] .[20] K. Zhou, G. Endr˝odi, L.-G. Pang, and H. St¨ocker,“Regressive and generative neural networks for scalarfield theory,”
Phys. Rev. D no. 1, (2019) 011501, arXiv:1810.12879 [hep-lat] .[21] Y. S. Lai, “Automated Discovery of Jet SubstructureAnalyses,” arXiv:1810.00835 [nucl-th] .[22] P. T. Komiske, E. M. Metodiev, and J. Thaler, “EnergyFlow Networks: Deep Sets for Particle Jets,”
JHEP (2019) 121, arXiv:1810.05165 [hep-ph] .[23] Y.-L. Du, K. Zhou, J. Steinheimer, L.-G. Pang,A. Motornenko, H.-S. Zong, X.-N. Wang, andH. St¨ocker, “Identifying the nature of the QCD transition in relativistic collision of heavy nuclei withdeep learning,” Eur. Phys. J. C no. 6, (2020) 516, arXiv:1910.11530 [hep-ph] .[24] L.-G. Pang, K. Zhou, and X.-N. Wang, “Interpretabledeep learning for nuclear deformation in heavy ioncollisions,” arXiv:1906.06429 [nucl-th] .[25] A. Andreassen, P. T. Komiske, E. M. Metodiev,B. Nachman, and J. Thaler, “OmniFold: A Method toSimultaneously Unfold All Observables,” Phys. Rev.Lett. no. 18, (2020) 182001, arXiv:1911.09107[hep-ph] .[26] S. Carrazza and F. A. Dreyer, “Jet grooming throughreinforcement learning,”
Phys. Rev. D no. 1, (2019)014014, arXiv:1903.09644 [hep-ph] .[27] G. Kasieczka, S. Marzani, G. Soyez, and G. Stagnitto,“Towards Machine Learning Analytics for JetSubstructure,” arXiv:2007.04319 [hep-ph] .[28] L. Li, Y.-Y. Li, T. Liu, and S.-J. Xu, “Learning Physicsat Future e − e + Colliders with Machine,” arXiv:2004.15013 [hep-ph] .[29] G. Kanwar, M. S. Albergo, D. Boyda, K. Cranmer,D. C. Hackett, S. Racani`ere, D. J. Rezende, and P. E.Shanahan, “Equivariant flow-based sampling for latticegauge theory,”
Phys. Rev. Lett. no. 12, (2020)121601, arXiv:2003.06413 [hep-lat] .[30] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu,D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio,“Generative adversarial nets,” in
Proceedings ofNIPS’14 , pp. 2672–2680. Cambridge, MA, USA, 2014. http://dl.acm.org/citation.cfm?id=2969033.2969125 .[31] A. Radford, L. Metz, and S. Chintala, “Unsupervisedrepresentation learning with deep convolutionalgenerative adversarial networks,”
CoRR abs/1511.06434 (2016) .[32] A. Andreassen, I. Feige, C. Frye, and M. D. Schwartz,“JUNIPR: a Framework for Unsupervised MachineLearning in Particle Physics,”
Eur. Phys. J. C no. 2,(2019) 102, arXiv:1804.09720 [hep-ph] .[33] J. Monk, “Deep Learning as a Parton Shower,” JHEP (2018) 021, arXiv:1807.03685 [hep-ph] .[34] A. Bogatskiy, B. Anderson, J. T. Offermann, M. Roussi,D. W. Miller, and R. Kondor, “Lorentz GroupEquivariant Neural Network for Particle Physics,” arXiv:2006.04780 [hep-ph] .[35] A. J. Larkoski and T. Melia, “Covariantizing PhaseSpace,” arXiv:2008.06508 [hep-ph] .[36] T. Faucett, J. Thaler, and D. Whiteson, “MappingMachine-Learned Physics into a Human-ReadableSpace,” arXiv:2010.11998 [hep-ph] .[37] T. Sjostrand, S. Mrenna, and P. Z. Skands, “A BriefIntroduction to PYTHIA 8.1,” Comput. Phys.Commun. (2008) 852–867, arXiv:0710.3820[hep-ph] .[38] M. Bahr et al. , “Herwig++ Physics and Manual,”
Eur.Phys. J. C (2008) 639–707, arXiv:0803.0883[hep-ph] .[39] T. Gleisberg, S. Hoeche, F. Krauss, M. Schonherr,S. Schumann, F. Siegert, and J. Winter, “Eventgeneration with SHERPA 1.1,” JHEP (2009) 007, arXiv:0811.4622 [hep-ph] .[40] Z. Nagy and D. E. Soper, “Parton shower evolutionwith subleading color,” JHEP (2012) 044, arXiv:1202.4496 [hep-ph] . [41] S. H¨oche and S. Prestel, “The midpoint between dipoleand parton showers,” Eur. Phys. J. C no. 9, (2015)461, arXiv:1506.05057 [hep-ph] .[42] S. Alioli, C. W. Bauer, C. Berggren, F. J. Tackmann,and J. R. Walsh, “Drell-Yan production atNNLL’+NNLO matched to parton showers,” Phys. Rev.D no. 9, (2015) 094020, arXiv:1508.01475[hep-ph] .[43] M. Dasgupta, F. A. Dreyer, K. Hamilton, P. F. Monni,and G. P. Salam, “Logarithmic accuracy of partonshowers: a fixed-order study,” JHEP (2018) 033, arXiv:1805.09327 [hep-ph] . [Erratum: JHEP 03, 083(2020)].[44] G. Bewick, S. Ferrario Ravasio, P. Richardson, andM. H. Seymour, “Logarithmic accuracy ofangular-ordered parton showers,” JHEP (2020) 019, arXiv:1904.11866 [hep-ph] .[45] M. Dasgupta, F. A. Dreyer, K. Hamilton, P. F. Monni,G. P. Salam, and G. Soyez, “Parton showers beyondleading logarithmic accuracy,” Phys. Rev. Lett. no. 5, (2020) 052002, arXiv:2002.11114 [hep-ph] .[46] J. R. Forshaw, J. Holguin, and S. Pl¨atzer, “Building aconsistent parton shower,” arXiv:2003.06400[hep-ph] .[47] B. Andersson, G. Gustafson, G. Ingelman, andT. Sjostrand, “Parton Fragmentation and StringDynamics,”
Phys. Rept. (1983) 31–145.[48] G. Marchesini and B. Webber, “Monte Carlo Simulationof General Hard Processes with Coherent QCDRadiation,” Nucl. Phys. B (1988) 461–526.[49] A. Metz and A. Vossen, “Parton FragmentationFunctions,”
Prog. Part. Nucl. Phys. (2016) 136–202, arXiv:1607.02521 [hep-ex] .[50] D. Neill, F. Ringer, and N. Sato, “Calculating theenergy loss of leading jets,” in . 8,2020. arXiv:2008.09532 [hep-ph] .[51] M. Gyulassy and X.-n. Wang, “Multiple collisions andinduced gluon Bremsstrahlung in QCD,” Nucl. Phys. B (1994) 583–614, arXiv:nucl-th/9306003 .[52] R. Baier, Y. L. Dokshitzer, A. H. Mueller, S. Peigne,and D. Schiff, “Radiative energy loss and p(T)broadening of high-energy partons in nuclei,”
Nucl.Phys. B (1997) 265–282, arXiv:hep-ph/9608322 .[53] B. Zakharov, “Fully quantum treatment of theLandau-Pomeranchuk-Migdal effect in QED and QCD,”
JETP Lett. (1996) 952–957, arXiv:hep-ph/9607440 .[54] M. Gyulassy, P. Levai, and I. Vitev, “Reaction operatorapproach to nonAbelian energy loss,” Nucl. Phys. B (2001) 371–419, arXiv:nucl-th/0006010 .[55] X.-N. Wang and X.-f. Guo, “Multiple parton scatteringin nuclei: Parton energy loss,”
Nucl. Phys. A (2001) 788–832, arXiv:hep-ph/0102230 .[56] P. B. Arnold, G. D. Moore, and L. G. Yaffe, “Photonand gluon emission in relativistic plasmas,”
JHEP (2002) 030, arXiv:hep-ph/0204343 .[57] J.-w. Qiu and I. Vitev, “Coherent QCD multiplescattering in proton-nucleus collisions,” Phys. Lett. B (2006) 507–511, arXiv:hep-ph/0405068 .[58] H. Liu, K. Rajagopal, and U. A. Wiedemann,“Calculating the jet quenching parameter fromAdS/CFT,”
Phys. Rev. Lett. (2006) 182301, arXiv:hep-ph/0605178 . [59] N. Armesto et al. , “Comparison of Jet QuenchingFormalisms for a Quark-Gluon Plasma ’Brick’,” Phys.Rev. C (2012) 064904, arXiv:1106.1106 [hep-ph] .[60] Y. Mehtar-Tani, J. G. Milhano, and K. Tywoniuk, “Jetphysics in heavy-ion collisions,” Int. J. Mod. Phys. A (2013) 1340013, arXiv:1302.2579 [hep-ph] .[61] JET
Collaboration, K. M. Burke et al. , “Extracting thejet transport coefficient from jet quenching inhigh-energy heavy-ion collisions,”
Phys. Rev. C no. 1, (2014) 014909, arXiv:1312.5003 [nucl-th] .[62] J.-W. Qiu, F. Ringer, N. Sato, and P. Zurita,“Factorization of jet cross sections in heavy-ioncollisions,” Phys. Rev. Lett. no. 25, (2019) 252301, arXiv:1903.01993 [hep-ph] .[63] J. Putschke et al. , “The JETSCAPE framework,” arXiv:1903.07706 [nucl-th] .[64] P. Caucal, E. Iancu, and G. Soyez, “Deciphering the z g distribution in ultrarelativistic heavy ion collisions,” JHEP (2019) 273, arXiv:1907.04866 [hep-ph] .[65] V. Vaidya and X. Yao, “Transverse MomentumBroadening of a Jet in Quark-Gluon Plasma: An OpenQuantum System EFT,” arXiv:2004.11403 [hep-ph] .[66] F. de Avila Belbute-Peres, K. Smith, K. Allen,J. Tenenbaum, and J. Z. Kolter, “End-to-enddifferentiable physics for learning and control,” in Advances in Neural Information Processing Systems ,S. Bengio, H. Wallach, H. Larochelle, K. Grauman,N. Cesa-Bianchi, and R. Garnett, eds., vol. 31,pp. 7178–7189. Curran Associates, Inc., 2018. https://proceedings.neurips.cc/paper/2018/file/842424a1d0595b76ec4fa03c46e8d755-Paper.pdf .[67] A. Koul, S. Greydanus, and A. Fern, “Learning finitestate representations of recurrent policy networks,”
CoRR abs/1811.12530 (2018) , arXiv:1811.12530 . http://arxiv.org/abs/1811.12530 .[68] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. P´oczos,R. Salakhutdinov, and A. J. Smola, “Deep sets,” CoRR abs/1703.06114 (2017) , arXiv:1703.06114 . http://arxiv.org/abs/1703.06114 .[69] E. Wagstaff, F. B. Fuchs, M. Engelcke, I. Posner, andM. A. Osborne, “On the limitations of representingfunctions on sets,” CoRR abs/1901.09006 (2019) , arXiv:1901.09006 . http://arxiv.org/abs/1901.09006 .[70] B. Bloem-Reddy and Y. Teh, “Probabilistic symmetriesand invariant neural networks,” Journal of MachineLearning Research no. 90, (2020) 1–61. http://jmlr.org/papers/v21/19-322.html .[71] A. Accardi et al. , “Electron Ion Collider: The NextQCD Frontier: Understanding the glue that binds usall,” Eur. Phys. J. A no. 9, (2016) 268, arXiv:1212.1701 [nucl-ex] .[72] J. D. Bjorken, “Highly relativistic nucleus-nucleuscollisions: The central rapidity region,” Phys. Rev. D (Jan, 1983) 140–151.[73] BRAHMS
Collaboration, I. Arsene et al. , “Quarkgluon plasma and color glass condensate at RHIC? ThePerspective from the BRAHMS experiment,”
Nucl.Phys. A (2005) 1–27, arXiv:nucl-ex/0410020 .[74]
PHENIX
Collaboration, K. Adcox et al. , “Formationof dense partonic matter in relativistic nucleus-nucleuscollisions at RHIC: Experimental evaluation by thePHENIX collaboration,”
Nucl. Phys. A (2005)184–283, arXiv:nucl-ex/0410003 . [75]
PHOBOS
Collaboration, B. Back et al. , “ThePHOBOS perspective on discoveries at RHIC,”
Nucl.Phys. A (2005) 28–101, arXiv:nucl-ex/0410022 .[76]
STAR
Collaboration, J. Adams et al. , “Experimentaland theoretical challenges in the search for the quarkgluon plasma: The STAR Collaboration’s criticalassessment of the evidence from RHIC collisions,”
Nucl.Phys. A (2005) 102–183, arXiv:nucl-ex/0501009 .[77] B. V. Jacak and B. Muller, “The exploration of hotnuclear matter,”
Science (2012) 310–314.[78] B. M¨uller, J. Schukraft, and B. Wys(cid:32)louch, “FirstResults from Pb+Pb Collisions at the LHC,”
Annu.Rev. Nucl. Part. S. no. 1, (2012) 361–386.[79] P. Braun-Munzinger, V. Koch, T. Sch¨afer, andJ. Stachel, “Properties of hot and dense matter fromrelativistic heavy ion collisions,” Phys. Rept. (2016)76–126.[80] W. Busza, K. Rajagopal, and W. van der Schee, “HeavyIon Collisions: The Big Picture, and the Big Questions,”
Ann. Rev. Nucl. Part. Sci. (2018) 339–376.[81] PHENIX
Collaboration, A. Adare et al. , “HeavyQuark Production in p + p and Energy Loss and Flowof Heavy Quarks in Au+Au Collisions at √ s NN = 200GeV,” Phys. Rev. C (2011) 044905, arXiv:1005.1627 [nucl-ex] .[82] CMS
Collaboration, A. M. Sirunyan et al. ,“Measurement of prompt and nonprompt charmoniumsuppression in PbPb collisions at 5.02 TeV,”
Eur. Phys.J. C no. 6, (2018) 509, arXiv:1712.08959[nucl-ex] .[83] STAR
Collaboration, L. Adamczyk et al. ,“Measurements of jet quenching with semi-inclusivehadron+jet distributions in Au+Au collisions at √ s NN = 200 GeV,” Phys. Rev. C no. 2, (2017) 024905, arXiv:1702.01108 [nucl-ex] .[84] ALICE
Collaboration, S. Acharya et al. ,“Measurements of inclusive jet spectra in pp and centralPb-Pb collisions at √ s NN = 5.02 TeV,” Phys. Rev. C no. 3, (2020) 034911, arXiv:1909.09718[nucl-ex] .[85]
ATLAS
Collaboration, M. Aaboud et al. ,“Measurement of the nuclear modification factor forinclusive jets in Pb+Pb collisions at √ s NN = 5 .
02 TeVwith the ATLAS detector,”
Phys. Lett. B (2019)108–128, arXiv:1805.05635 [nucl-ex] .[86] M. Dasgupta, F. Dreyer, G. P. Salam, and G. Soyez,“Small-radius jets to all orders in QCD,”
JHEP (2015) 039, arXiv:1411.5182 [hep-ph] .[87] M. J. Dolan and A. Ore, “Equivariant Energy FlowNetworks for Jet Tagging,” arXiv:2012.00964[hep-ph] .[88] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fastand accurate deep network learning by exponentiallinear units (elus),” CoRR abs/1511.07289 (2016) , arXiv:1511.07289 .[89] D. R. Cox, “The regression analysis of binarysequences,”
Journal of the Royal Statistical Society:Series B (Methodological) no. 2, (1958) 215–232.[90] D. P. Kingma and J. Ba, “Adam: A method forstochastic optimization,” CoRR abs/1412.6980 (2015), arXiv:1412.6980 . Supplemental material
We first discuss the splitting kinematics of the
DGLAP parton branching process. In particular, we focus on settingup the full event kinematics in spherical coordinates. We then present more details of the
GAN setup.
The angles of the daughter partons relative to the parent direction
The shower is designed to conserve the momentum in the plane orthogonal to the direction of the parent partonthat splits, and also conserve the energy or the light-cone momentum components parallel to the parent direction,the two being equivalent up to power-corrections in the small splitting angle limit. Formally this does not conservethe total global transverse momentum relative to the initiating parton of the cascade, on the order of 5 ∼
10% of thetotal energy of the jet, and necessarily builds up a total non-zero invariant mass of the final state, but does preservethe angular structure of the shower, and the distribution of energy implied by the
DGLAP evolution equations.More sophisticated momentum conservation schemes exist, preserving more of the structure of the distribution ofpartons in phase-space. This is necessary for the resummation of logarithms beyond leading logarithmic order, butsuch complications are unnecessary for our proof-of-concept.We consider the
DGLAP → z and 1 − z of the two daughter partons, their relative opening angle θ and their orientation in azimuth φ . In order to determine the spherical coordinates of the two daughter partons, westart by calculating their angle with respect to the parent direction, which we denote by θ , p . The angles θ , p areillustrated in Fig. 5, and we have θ = θ p + θ p . The two angles can be determined from the relative splitting angle θ which is related to the Monte Carlo time and the momentum fraction z . We consider the splitting of a parent partonwith momentum l µ (in the − z direction) to two daughter partons with momentum q µ and l µ − q µ . Both partons afterthe splitting are on-shell q = ( l − q ) = 0. Using light cone coordinates, we have | (cid:126)q | = q = 12 ( q − + q + ) = 12 ( zl − + (1 − z ) l + ) ≈ zl − . (3)where we used q + = l + l − ( l − − q − ) , (4)which follows from ( l − q ) = 0. In addition, we have l = l + l − and q − = zl − . The approximation in Eq. (3) holdsfor l + (cid:28) l − . Similarly, we find | (cid:126)l − (cid:126)q | = l − q = 12 ( l + + l − − ( q + + q − )) = 12 ((1 − z ) l − + zl + ) ≈
12 (1 − z ) l − . (5)In order to write the angle θ p of the daughter parton with momentum q µ in terms of the splitting angle θ and themomentum fraction z , we consider cos θ p = (cid:126)q · (cid:126)l | (cid:126)q || (cid:126)l | . (6)We rewrite the expression in terms of the momenta q µ and l µ − q µ ascos θ p = (cid:126)q · ( (cid:126)l − (cid:126)q ) + | (cid:126)q | | (cid:126)q || ( (cid:126)l − (cid:126)q ) + (cid:126)q | = | (cid:126)q || (cid:126)l − (cid:126)q | cos θ + | (cid:126)q | | (cid:126)q | (cid:113) | (cid:126)l − (cid:126)q | + | (cid:126)q | + 2 | (cid:126)l − (cid:126)q || (cid:126)q | cos θ = (1 − z ) cos θ p + z (cid:112) (1 − z ) + z + 2 z (1 − z ) cos θ . (7) FIG. 5. Illustration of the
DGLAP → θ of the two daughterpartons and their angles relative to the parent direction θ , p . The last line is obtained by inserting the expressions for | (cid:126)q | and | (cid:126)l − (cid:126)q | which were obtained above. We thus find thefollowing expression for the angle between the parent direction and the daughter parton with momentum q µ : θ p = arccos (cid:18) z + (1 − z ) cos θ (cid:112) − z (1 − z )(1 − cos θ ) (cid:19) . (8)Then the angle of the other daughter parton is given as θ p = θ − θ p . The direction of the two daughter partons in absolute spherical coordinates
Given the direction of the parent parton in absolute spherical coordinates ( ˜Θ p , ˜Φ p ) and the kinematics of the 1 → θ ip derived above), we can now determine the spherical coordinatesof the two daughter partons ( ˜Θ di , ˜Φ di ), i = 1 ,
2. We start with the vector pointing in the direction of the parentparton. In spherical coordinates, we have (cid:126)r p = sin ˜Θ p cos ˜Φ p sin ˜Θ p sin ˜Φ p cos ˜Θ p . (9)In order to generate the random distribution in azimuth, we construct a random vector (cid:126)r r which is then orthonormal-ized to get a basis vector in the plane transverse to the parent direction. We use flat sampling for each component (cid:126)r ir in the range of [1 , − (cid:126)r A = 1 N (( (cid:126)r p · (cid:126)r r ) (cid:126)r p − (cid:126)r r ) , (10)where the normalization factor N is given by N = (cid:32)(cid:88) i (cid:0) ( (cid:126)r p · (cid:126)r r ) (cid:126)r ip − (cid:126)r ir (cid:1)(cid:33) / . (11)By construction, we thus have (cid:126)r A · (cid:126)r p = 0 , (cid:126)r A = 1 . (12)We can then construct a second basis vector (cid:126)r B by calculating the cross product (cid:126)r B = (cid:126)r A × (cid:126)r p , (13)which is normalized and orthogonal to both (cid:126)r A and (cid:126)r p . We write the vectors (cid:126)r di of the two daughter partons i = 1 , ∼ cos θ , p . The second vector is in the transverse plane relative to the parent directionand parametrized in terms of (cid:126)r A,B and a random variable φ chosen in the range of [0 , π ] (flat sampling). The0 FIG. 6. Illustration of the vectors and angles relevant to determine the direction of the two daughter partons in absolutespherical coordinates. magnitude of that second vector is given by sin θ , p . For the two daughters i = 1 ,
2, the resulting vector can bewritten as (cid:126)r di = cos( θ ip ) (cid:126)r p ± sin( θ ip )(cos( φ ) (cid:126)r A + sin( φ ) (cid:126)r B ) . (14)See Fig. 6 for an illustration of the vectors and angles relevant for setting up the full splitting kinematics of the twodaughter partons. We can then write the polar and azimuthal angle of the two daughter partons as˜Θ di = arccos( r zdi ) , ˜Φ di = π + arctan (cid:18) r ydi r xdi (cid:19) . (15) More details of the GAN setup
For practical purposes we split the generator network into a time dependent and a time independent part. Bothparts consist of five hidden layers with 50 neurons and an exponential linear unit (ELU) [88] activation to avoiddiscontinuous steps in the generated distributions. The time dependent network generates the next splitting angle θ (cid:48) i taking as input the previous angle θ (cid:48) i − , the initial scale Q , and a uniformly distributed [0 ,
1) random number. The timeindependent network generates the variables z (cid:48) i , φ taking as input uniformly distributed [0 ,
1) random numbers. Wealso take the momentum of the parent parton as input to the time independent network. Through the training process,the GAN learns that this information is not necessary to generate the variables z (cid:48) , φ . To avoid vanishing gradients,the immediate output neuron of the time independent neural network generates a transformed z (cid:48) i = − log(1 /z i − z i that is bounded by (0 , θ (cid:48) i from the output neuron of the time dependentnetwork is converted to θ i which is bounded by (0 , π/ Q , and the current θ i throughout the showerprocess. A random parton (pure gluon shower) is selected for the splitting process by double indexing: We firstsort the current list of showered partons in descending values of Z i , note its indexing order, and count the number N of partons when their momenta are above the cutoff (cid:15) and are therefore able to split. A random parton is thenchosen from the first N partons, and, using the order of the sorted index, it is mapped to the event record. Since theprocessing is de facto executed in parallel, we calculate the splitting of a parton even if N = 0 which is then reversedafterwards. The highest number of branching processes occur for Q = 800 GeV. In this case, our implementation onNvidia Titan RTX reaches an execution time of 95 ± µ s/event.The discriminator network consists of a sequence of two deep sets networks [68–70]. The first deep sets networktakes the list of partons from the shower as input, and produces the per-event activation. The second deep setsnetwork uses the output of the first network, the per-event activation, as input and produces the statistical activationfor the entire batch. We augment observables derived from the deep sets with the 2nd to 5th moment of the wholebatch parton momenta, in order to have a fall-back in the first training epochs, until the deep sets are fully trained.The deep sets and moments are combined by a shallow network with one layer of 20 hidden neurons.1We employ a modified training process compared to the original GAN which we summarize here. We use the binarycross entropy as the loss function [89] which is given by L = − E x [log D ( x )] − E c [log(1 − D ( G ( c )))] , (16)where E is the expectation value, D the discriminator, and G the generator. The conditional vector c contains boththe initial parton and a sufficient amount of random numbers for the full shower. The Adam optimizer [90] is used forboth the discriminator and generator, where the exponential decay rate for the first and second moment are chosen as β = 0 . β = 0 . λ = 5 × − for the discriminator, and λ = 5 × − for the generator.To guard against generator training steps that may inadvertently deteriorate the generator, the discriminator D istrained in each epoch until D ( x ) > . x , and D ( G ) < D ( x ) where G are the partonsgenerated by the GAN . After each generator training step, its finite step size result is tested and reverted in case itresulted in reduced D ( G ) scores.The time-dependent and independent networks of the generator are first pre-trained to be in the vicinity of thephysical value. We observe that at the beginning of the training, the untrained discriminator allows the generatorto deviate further from the pre-trained values. After ∼
500 epochs the discriminator is sufficiently trained to correctthe generator, and closure with the parton shower occurs after ∼
700 epochs. This mostly concerns the θ variablewhereas zz