Generative Models for Fast Calorimeter Simulation: the LHCb case
Viktoria Chekalina, Elena Orlova, Fedor Ratnikov, Dmitry Ulyanov, Andrey Ustyuzhanin, Egor Zakharov
NRU Higher School of Economics, Moscow, Russia; Yandex School of Data Analysis, Moscow, Russia; Skolkovo Institute of Science and Technology, Moscow, Russia
Abstract.
Simulation is one of the key components in high energy physics. Historically it relies on Monte Carlo methods, which require a tremendous amount of computation resources. These methods may have difficulties meeting the expected High Luminosity Large Hadron Collider (HL-LHC) needs, so the experiments are in urgent need of new fast simulation techniques. We introduce a new deep learning framework based on Generative Adversarial Networks which can be faster than traditional simulation methods by five orders of magnitude with reasonable simulation accuracy. This approach will allow physicists to produce a sufficient amount of simulated data needed by the next HL-LHC experiments using limited computing resources.

Introduction

Simulation plays an important role in particle and nuclear physics. It is widely used in detector design and in comparisons between experimental data and theoretical models. Traditionally, simulation relies on Monte Carlo methods and requires significant computational resources. In particular, such methods do not scale to meet the growing demands resulting from the large quantities of data expected during High Luminosity Large Hadron Collider (HL-LHC) runs. The detailed simulation of particle collisions and interactions as captured by detectors at the LHC using the well-known simulation software Geant4 annually requires billions of CPU hours, constituting more than half of the LHC experiments' computing resources [1, 2]. More specifically, the detailed simulation of particle showers in calorimeters is the most computationally demanding step.

A line of simulation methods that exploit the idea of reusing previously calculated or measured physical quantities has been developed to reduce the computation time [3, 4]. These approaches suffer from being specific to an individual experiment and, despite being faster than the full simulation, they are not fast enough or lack accuracy. Thus, the particle physics community is in need of new, faster simulation methods to model experiments.

One of the possible approaches to simulating the calorimeter response is to use deep learning techniques. In particular, a recent work [5] provided evidence that Generative Adversarial Networks can be used to efficiently simulate particle showers. While an over 100x speed-up over Geant4 is achieved, the setup was quite simple, as the input particles were parametrized by energy only. However, even in this simplified approach, there are significant differences in distributions between generated and original parameters.

In this work we build a model upon Wasserstein Generative Adversarial Networks and show its superior performance over the approach of [5]. We also evaluate our model in a more complex scenario, where a particle is described by 5 parameters: the 3d momentum (px, py, pz) and the 2d coordinate (x, y). Our method for high-fidelity fast simulation of particle showers in the specific LHCb calorimeter aims to replace the existing Monte Carlo based methods and achieve a significant speed-up factor.

Generative Adversarial Networks

Generative models are of great interest in deep learning. With these models, one can approximate a very complex distribution defined as a set of samples.
For example, such models can be utilized to generate a face image of a non-existing person or to continue a video sequence given several initial frames. In this section, we give a brief overview of the most popular generative model in computer vision, Generative Adversarial Networks (GANs), of its strengths and weaknesses, and of different modifications that alleviate the latter. We then review and analyse current approaches for applying GANs to the simulation of calorimeters in high energy physics.

Generative Adversarial Networks (GANs) were originally presented by I. Goodfellow et al. in 2014 [6] and quickly became a state-of-the-art technique in areas such as image generation [7], with a huge number of extensions [8-10].

In the GAN framework, the aim is to learn a mapping G, usually called the generator, to warp an easy-to-draw distribution p(z) (e.g. p(z) = N(0, I)) into a target distribution p_data(x), to facilitate sampling from p_data(x). When G is learned, G = G*, sampling from the target distribution p_data(x) is done by first drawing a sample from the distribution p(z) and then feeding the sample into the generator: G*(z) ~ p_data, where z ~ p(z). For such a sampling procedure, the time needed to draw a sample from p_data(x) is approximately equal to the time needed to evaluate the function G at a point.

The generator is learned using feedback from an external classifier (usually called the discriminator), which tries to find discrepancies between the target distribution p_data(x) and the fake distribution p_G(x) defined by samples from the generator, G(z) ~ p_G(x), z ~ p(z). More formally, the generator G and the discriminator D play the following zero-sum game:

    min_G max_D  E_{x ~ p_data(x)}[log D(x)] + E_{x ~ p_G(x)}[log(1 - D(x))],    (1)

where D(G(z)) is the output of the discriminator specifying the probability of its input coming from the target distribution p_data.

In practice, the mappings G and D are parametrized by deep neural networks, and the objective Eq. (1) is optimized using alternating gradient descent. For a fixed generator, the discriminator minimizes the binary cross-entropy of a binary classification problem (samples from p_data versus samples from p_G). For a fixed discriminator, the generator is updated to make its samples be misclassified by the discriminator, thus moving the fake distribution closer to the target distribution.

For a fixed generator, the optimal value of the inner optimization can be written analytically:

    max_D  E_{x ~ p_data(x)}[log D(x)] + E_{x ~ p_G(x)}[log(1 - D(x))] = JS(p_data || p_G),    (2)

where JS is the Jensen-Shannon divergence (up to additive and multiplicative constants). In fact, for a fixed generator (hence a fixed fake distribution), the discriminator computes the divergence between the target distribution p_data and the fake distribution p_G. Once the divergence is computed, the generator aims to update the fake distribution to make it lower: min_G JS(p_data || p_G). While the Jensen-Shannon divergence naturally arises from the original game Eq. (1), any divergence or distance D can be used instead: min_G D(p_data || p_G).

A recent work [11] proposed to use the Wasserstein distance instead of the Jensen-Shannon divergence, proving its better behavior:

    W(p_data, p_G) = max_{f in F}  E_{x ~ p_data(x)}[f(x)] - E_{x ~ p_G(x)}[f(x)],    (3)

where F is the set of 1-Lipschitz functions. Using the Wasserstein distance instead of the Jensen-Shannon divergence in the GAN objective leads to the Wasserstein GAN (WGAN) objective:

    min_G max_{f in F}  E_{x ~ p_data(x)}[f(x)] - E_{x ~ p_G(x)}[f(x)].    (4)

It is highly non-trivial to search over the set of 1-Lipschitz functions, and several ways have been proposed to enforce this constraint [11, 12]. In Ref. [12], it is proved that the set of optimal functions for Eq. (4) contains a function whose gradient norm equals one at any point. In practice, this result motivates an additional loss, added to the objective Eq. (4) with a weight lambda, while the hard constraint that f belongs to the set F is removed and f is searched over all possible functions:

    min_G max_f  E_{x ~ p_data(x)}[f(x)] - E_{x ~ p_G(x)}[f(x)] + lambda E_{x~ ~ p_G}[(||grad_x~ f(x~)|| - 1)^2].    (5)

WGAN can be easily adapted to model a conditional distribution p_data(x | y). The generator is modified to take the condition y along with the sample z, so the fake distribution is now defined as G(z, y) ~ p_G(x | y), z ~ p(z), and the game is

    min_G max_f  E_{y ~ p(y)}[ E_{x ~ p_data(x|y)}[f(x)] - E_{x ~ p_G(x|y)}[f(x)] + lambda E_{x~ ~ p_G(x|y)}[(||grad_x~ f(x~)|| - 1)^2] ].    (6)

A systematic study of the application of deep learning to the simulation of calorimeters for particle physics was carried out by Paganini et al. in 2017 [5] and resulted in the CaloGAN package.
The authors aim to speed up particle simulation in a 3-layer heterogeneous calorimeter using the GAN framework and achieve a speedup of over 100x. They used an existing state-of-the-art but slow simulation engine, Geant4, to create a training dataset. They simulated positrons, photons and charged pions with various energies sampled from a flat distribution between 1 GeV and 100 GeV. All incident particles in this study have an initial momentum perpendicular to the face of the calorimeter. The shower in the first layer is represented as a 3 x 96 pixel image, the middle layer as a 12 x 12 pixel image, and the last layer as a 12 x 6 pixel image.

Dataset

In this work, we focus on electron interactions inside an electromagnetic calorimeter inspired by the LHCb detector at the CERN LHC [15]. The calorimeter in this study uses the "shashlik" technology of alternating scintillating tiles and lead plates. The prototype consists of 5 x 5 modules of 12 x 12 cm, with a cell granularity of 6 x 6 cells per module, so the calorimeter response is represented as 30 x 30 images Y together with the corresponding parameters (px, py, pz, x, y) of the original particle. An example of such an image is presented in the top row of Fig. 3.

The training data set is created as follows. The calorimeter prototype structure described above is modelled in Geant4 as a sequence of sensitive and insensitive volumes. Particles are generated using a particle gun. Particle energies are distributed dropping as 1/E in the energy range between 1 and 100 GeV. Particle positions are generated uniformly in a 1 x 1 cm square, and particle directions vary in the XZ plane and within 10 degrees in the YZ plane. Geant4 is then used to simulate the particle interaction with the calorimeter using the full set of corresponding physics processes. Information about every event therefore includes the original particle parameters accompanied by the 30 x 30 matrix of energies deposited in the scintillators of every cell tower, Y.
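The 1/E energy spectrum of the particle gun described above can be drawn by inverse-transform sampling: integrating p(E) proportional to 1/E over [1, 100] GeV gives a log-uniform law. A sketch (the function and argument names are ours, not the paper's):

```python
import numpy as np

def sample_energies(n, e_min=1.0, e_max=100.0, rng=None):
    """Draw n energies (GeV) from p(E) ~ 1/E on [e_min, e_max].
    Inverse CDF: E = e_min * (e_max / e_min) ** u with u ~ U(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)
    return e_min * (e_max / e_min) ** u
```

The median of this law is the geometric mean of the range, 10 GeV, so most generated events are low-energy, as in the training sample.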
Electrons are used as test particles. The produced training dataset contains 50 000 events, and another 10 000 events are used as a test data sample.

Model

Our idea is to treat the simulation as a black box and replace the traditional Monte Carlo simulation with a method based on Generative Adversarial Networks. As WGANs with gradient penalty are considered to be the state-of-the-art technique for image generation, we implement a tool based on this approach. For it to be useful in realistic physics applications, such a system needs to be able to accept requests for the generation of showers originating from incoming particle parameters such as the 3d momentum and 2d coordinate. We introduce an auxiliary task of reconstructing these parameters px, py, pz and x, y from a shower image.

We need to generate a specific calorimeter response for a particle with given parameters, which means that the model is required to be conditional. First, we describe the generator and discriminator architectures. The generator maps its input, a 512-dimensional noise vector concatenated with the particle parameters (px, py, pz, x, y), to a 30 x 30 image y^ using deconvolutional layers (in fact, an upsampling procedure followed by convolutions), arranged as follows. After concatenating the noise vector and the parameters, a fully connected layer with reshaping produces a 256 x 4 x 4 tensor, which is successively upsampled to 128 x 8 x 8, 64 x 16 x 16 and 32 x 32 x 32, with ReLU activation functions.
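A minimal PyTorch sketch of this generator; the text specifies only the tensor sizes, so the kernel sizes, the final single-channel convolution and the crop offsets are our assumptions:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """FC + reshape to 256x4x4, three (upsample 2x, conv, BN, ReLU)
    blocks to 32x32x32, then a 1-channel conv and a crop to 30x30."""
    def __init__(self, noise_dim=512, cond_dim=5):
        super().__init__()
        self.fc = nn.Linear(noise_dim + cond_dim, 256 * 4 * 4)
        def block(cin, cout):
            return nn.Sequential(nn.Upsample(scale_factor=2),
                                 nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU())
        self.net = nn.Sequential(block(256, 128),   # -> 128x8x8
                                 block(128, 64),    # -> 64x16x16
                                 block(64, 32),     # -> 32x32x32
                                 nn.Conv2d(32, 1, 3, padding=1))
    def forward(self, z, cond):
        # concatenate noise and particle parameters, then FC + reshape
        x = self.fc(torch.cat([z, cond], dim=1)).view(-1, 256, 4, 4)
        x = self.net(x)
        return x[:, :, 1:31, 1:31]  # crop the 32x32 output to 30x30
```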
After this procedure, we crop the last output to obtain an image of the desired size, 30 x 30.

The discriminator takes a real image Y or a generated image y^ (produced by G) and returns a score D(Y) or D(y^), as described in [11]. The discriminator architecture is simply the reversed generator architecture (i.e. the layer sizes go in the opposite order): it takes a 30 x 30 matrix as input, from which we obtain layers of size 32 x 32 x 32, 64 x 16 x 16 and 128 x 8 x 8, followed by reshaping, which leads to 256 x 4 x 4, and, after applying the LeakyReLU activation function, we get the final score. The model scheme is presented in Fig. 1.

Figure 1: Model architecture. The generator consists of "upsampling 2x + convolution + batch normalization + ReLU" blocks; the discriminator uses strided convolutions with LeakyReLU activations. The pre-trained (fixed) regressor for the particle parameter prediction makes our model conditional: building the information from the regressor into the discriminator gradient teaches G to produce a specific calorimeter response.

How to train a WGAN with gradient penalty in a conditional manner is described in the following section.
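The gradient-penalty term of Eq. (5) at the core of this training can be written in a few lines. A sketch in PyTorch; following Ref. [12], the penalty is evaluated at random interpolates between real and generated images, and all names here are illustrative:

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP term of Eq. (5): penalises deviation of the critic's
    gradient norm from 1, per sample, averaged over the batch."""
    eps = torch.rand(real.size(0), 1, 1, 1)            # per-sample mixing
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    score = critic(x_hat)
    grad, = torch.autograd.grad(score.sum(), x_hat, create_graph=True)
    norm = grad.flatten(start_dim=1).norm(dim=1)       # ||grad f(x~)||
    return lam * ((norm - 1.0) ** 2).mean()
```

During training this term is added to the critic loss, E_{x ~ p_G}[f(x)] - E_{x ~ p_data}[f(x)], before the alternating update of the generator.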
Due to the nature of the WGAN loss, conditioning on a continuous value is a non-trivial task. To overcome this issue we suggest embedding a pre-trained regressor in our model. We train a neural network to predict the particle parameters from the calorimeter response. Its architecture is the same as that of the discriminator, but it is trained with the perceptual loss described in [16], because it was seen to work better than the standard MSE. By building the information from the pre-trained regressor into the discriminator gradient, we obtain a conditional model, because we train the generator and the discriminator together. As a result, the discriminator makes the generator produce a specific calorimeter response.

The matrices in our dataset are quite sparse, because almost all the information is located in the central cells (see Fig. 2). To make the optimization easier, we apply a Box-Cox transformation. This mapping smooths the data, which makes the optimization process more stable. Results obtained with the described model are presented in the following section.
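The Box-Cox preprocessing mentioned above can be applied cell-wise to the energy matrices; since most cells are exactly zero, a small offset is needed before the power transform. A sketch with illustrative lambda and offset values (the paper does not state its choices):

```python
import numpy as np

def boxcox(x, lam=0.2, eps=1e-6):
    """Cell-wise Box-Cox transform: (x**lam - 1)/lam, log(x) at lam = 0.
    eps shifts the zero-valued sparse cells into the valid domain x > 0."""
    x = np.asarray(x, dtype=np.float64) + eps
    return np.log(x) if lam == 0.0 else (x ** lam - 1.0) / lam

def inv_boxcox(y, lam=0.2, eps=1e-6):
    """Inverse transform, to map generated images back to energies."""
    y = np.asarray(y, dtype=np.float64)
    x = np.exp(y) if lam == 0.0 else (lam * y + 1.0) ** (1.0 / lam)
    return x - eps
```

The transform compresses the dynamic range between the few bright central cells and the near-empty periphery, which is what stabilises the optimization.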
Figure 2: Energy deposition in different cells of the 30 x 30 setup for Geant4-simulated events, averaged over all events in the used dataset.

Figure 3: Showers generated with Geant4 (first row) and showers simulated with our model (second row) for different sets of input parameters. Color represents log(E/MeV) for every cell.
Results

We start by comparing original clusters, produced by the full Geant4 simulation, with clusters generated by the trained model for the same parameters of the incident particles: the same energy, the same direction, and the same position on the calorimeter face. The corresponding images for four arbitrary parameter sets are presented in Fig. 3. These images demonstrate a very good visual similarity between the simulated and generated clusters.

We then continue with a quantitative evaluation of the proposed simulation method. While generic evaluation methods for generative models exist, here we base our evaluation on physics-driven similarity metrics. These metrics are designed using domain knowledge and the recommendations of physicists on the evaluation of simulation procedures. For this presentation, we selected a few cluster properties which essentially drive the cluster properties used in the reconstruction of calorimeter objects and in the following physics analysis. If the initial particle direction is not perpendicular to the calorimeter face, the produced cluster is elongated in that direction. Therefore, we consider separately the cluster width in the direction of the initial particle and in the transverse direction.
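The cluster characteristics compared in Fig. 4 can all be computed from a single 30 x 30 energy matrix. A sketch of such physics-driven summaries (the cell size and the sparsity threshold are illustrative assumptions, not the paper's values):

```python
import numpy as np

def cluster_summaries(img, cell_cm=2.0, thr=0.5):
    """Width, centre of mass, sparsity and asymmetry of one cluster."""
    e = img.sum()
    ys, xs = np.indices(img.shape)
    cx = (xs * img).sum() / e                  # centre of mass (cell units)
    cy = (ys * img).sum() / e
    width_x = np.sqrt((((xs - cx) ** 2) * img).sum() / e) * cell_cm
    width_y = np.sqrt((((ys - cy) ** 2) * img).sum() / e) * cell_cm
    sparsity = (img > thr).mean()              # fraction of cells above thr
    half = img.shape[1] // 2                   # left-right energy asymmetry
    asym = (img[:, half:].sum() - img[:, :half].sum()) / e
    return {"width_x": width_x, "width_y": width_y,
            "sparsity": sparsity, "asym": asym, "com": (cx, cy)}
```

The Delta X of Fig. 4(c) is then the centre-of-mass coordinate (in cm) minus the true particle impact coordinate.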
Spatial resolution, which is the distance between the centre of mass of the cluster and the projection of the initial track to the shower-max depth, is another important characteristic affecting the physics properties of the cluster. Cluster sparsity, which is the fraction of cells with energies above some threshold, reflects the marginal low-energy properties of the generated clusters. Finally, the longitudinal and transverse asymmetries, which are the differences in energies between the forward-backward and left-right sides of the cluster, characterise coherent energy variations. A comparison of these characteristics is presented in Fig. 4.

The primary cluster characteristics demonstrate good agreement with the fully simulated data. However, secondary characteristics driven by long-range correlations between different cluster contributions might be significantly improved.

As for model performance, we trained our model for 3000 epochs, which takes about 70 hours on an NVIDIA Tesla K80 GPU. The sampling rate is 0.07 ms per sample on GPU and 4.9 ms per sample on CPU.

Figure 4: Quality evaluation of the generated images in terms of the described physical characteristics: (a) the transverse width of real and generated clusters; (b) the longitudinal width; (c) Delta X between the cluster centre of mass and the true particle coordinate; (d) the sparsity of real and generated clusters; (e) the transverse asymmetry; (f) the longitudinal asymmetry.

Conclusion and outlook
The research proves that Generative Adversarial Networks are a good candidate for the fast simulation of the high-granularity detectors typically studied for the next generation of accelerators. We have successfully generated images of shower energy deposition conditioned on the particle parameters, such as the momentum and the coordinates, using modern generative deep neural network techniques, namely a Wasserstein GAN with gradient penalty.

Future work will be focused on improving the reproduction of second-order cluster characteristics, such as variations and long-range correlations between different cells.

The research leading to these results has received funding from the Russian Science Foundation under agreement No 19-71-30020.

References

[1] C. Bozzi, Tech. rep., CERN-LHCb-PUB-2015-004 (2014)
[2] J. Flynn, Tech. rep., CERN-RRB-2015-117 (2015)
[3] G. Grindhammer, S. Peters, arXiv preprint hep-ex/0001020 (2000)
[5] M. Paganini, L. de Oliveira, B. Nachman, arXiv preprint arXiv:1705.02355 (2017)
[6] I. Goodfellow et al., Generative adversarial nets, in Advances in Neural Information Processing Systems (2014), pp. 2672-2680
[7] A. Radford, L. Metz, S. Chintala, arXiv preprint arXiv:1511.06434 (2015)
[8] P. Isola, J. Zhu, T. Zhou, A.A. Efros, arXiv preprint arXiv:1611.07004 (2016)
[9] J.Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, in Computer Vision (ICCV), 2017 IEEE International Conference on (2017)
[10] T.C. Wang, M.Y. Liu, J.Y. Zhu, G. Liu, A. Tao, J. Kautz, B. Catanzaro, arXiv preprint arXiv:1808.06601 (2018)
[11] M. Arjovsky, S. Chintala, L. Bottou, arXiv preprint arXiv:1701.07875 (2017)
[12] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A.C. Courville, Improved training of Wasserstein GANs, in Advances in Neural Information Processing Systems (2017), pp. 5769-5779
[13] Y. Taigman, M. Yang, M. Ranzato, L. Wolf, DeepFace: Closing the gap to human-level performance in face verification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 1701-1708
[14] L. de Oliveira, M. Paganini, B. Nachman, arXiv preprint arXiv:1701.05927 (2017)
[15] A.A. Alves Jr. et al. (LHCb collaboration), JINST 3, S08005 (2008)
[16] J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution, in European Conference on Computer Vision (2016)