Investigating a Deep Learning Method to Analyze Images from Multiple Gamma-ray Telescopes
Aryeh Brill, Qi Feng, T. Brian Humensky, Bryan Kim, Daniel Nieto, Tjark Miener
Aryeh Brill
Department of Physics, Columbia University
New York, New York
[email protected]

Qi Feng
Department of Physics and Astronomy, Barnard College
New York, New York

T. Brian Humensky
Department of Physics, Columbia University
New York, New York

Bryan Kim
Department of Physics and Astronomy, UCLA
Los Angeles, California

Daniel Nieto
Instituto de Física de Partículas y del Cosmos, Universidad Complutense de Madrid
Madrid, Spain

Tjark Miener
Instituto de Física de Partículas y del Cosmos, Universidad Complutense de Madrid
Madrid, Spain
Abstract—Imaging atmospheric Cherenkov telescope (IACT) arrays record images from air showers initiated by gamma rays entering the atmosphere, allowing astrophysical sources to be observed at very high energies. To maximize IACT sensitivity, gamma-ray showers must be efficiently distinguished from the dominant background of cosmic-ray showers using images from multiple telescopes. A combination of convolutional neural networks (CNNs) with a recurrent neural network (RNN) has been proposed to perform this task. Using CTLearn, an open-source Python package for analyzing IACT data with deep learning, together with simulated data from the upcoming Cherenkov Telescope Array (CTA), we implement a CNN-RNN network and find no evidence that sorting telescope images by total amplitude improves background rejection performance.
Index Terms—astrophysics, deep learning, convolutional neural networks, recurrent neural networks
I. MOTIVATION
Very-high-energy (VHE; from about 20 GeV to 300 TeV) gamma rays provide a critical probe of the Universe's most extreme environments, offering the opportunity to study exotic astrophysics and fundamental physics at high energies and cosmological distances. Gamma rays in this energy range can be indirectly detected on the ground using arrays of imaging atmospheric Cherenkov telescopes (IACTs), which detect the Cherenkov light emitted from air showers produced by VHE gamma rays when they are absorbed by the atmosphere.

A wide variety of scientific studies can be performed with VHE gamma rays [1]. VHE gamma rays are observed from supernova remnants and pulsar wind nebulae in the Milky Way and supermassive black holes in distant galaxies, providing insight into the nature of these sources, such as how and where in these sources particles are accelerated to relativistic energies. Astrophysicists also search for VHE gamma-ray emission from dark-matter-dominated objects such as dwarf galaxies, looking for gamma rays hypothesized to be produced by dark matter annihilation or decay. In addition, IACTs play a key role in multimessenger astronomy, regularly searching for VHE emission produced by gamma-ray bursts and by the sources of gravitational wave events, and having recently detected TeV gamma-ray emission from a flaring blazar coincident with a highly energetic neutrino detected by the IceCube Neutrino Observatory [2].

Measurements with IACTs enable these scientific studies by extracting information about VHE particles from the air showers they produce in the atmosphere. In a conventional IACT analysis, images from multiple telescopes are parameterized and stereoscopically combined to extract the spatial, temporal, and calorimetric information of the originating VHE particle.

II. GAMMA-RAY IMAGE ANALYSIS
The sensitivity of IACTs depends strongly on efficiently rejecting the background of much more numerous cosmic-ray showers, which resemble those produced by gamma rays but tend to have a more complex morphology. Using the information contained in the shapes of the shower images is therefore critical to maximizing IACT sensitivity. Supervised learning algorithms, like random forests and boosted decision trees, have been shown to effectively classify IACT events based on event-level parameters constructed using images from multiple telescopes (e.g. [4]).

Deep learning techniques, such as convolutional neural networks (CNNs), may be used to improve on these methods because they do not require the images to be parameterized and may therefore access features of these images that would be washed out by the parameterization [5]. A deep learning approach that combines CNNs with a recurrent neural network (RNN) has been shown to improve background rejection performance using data from the H.E.S.S. IACT array [6]. In previous work, the input images to such a network have been sorted by total amplitude. In this study, we apply a similar model to simulated data from the Cherenkov Telescope Array (CTA) [7], the next-generation observatory for gamma-ray astronomy, to determine the effect of this sorting procedure on classification performance.

Fig. 1. Left (a): An example IACT image from a CTA FlashCam camera simulation, illustrating the hexagonally spaced grid of pixels typical of many IACT cameras. Right (b): The same image mapped to a square matrix of pixels by rebinning, which preserves the image's overall amplitude. Both images are from [3].

III. CTLEARN
We implement our neural network model using CTLearn [8], an open-source Python package for using deep learning to analyze pixel-wise camera data from arrays of IACTs. CTLearn provides an application-specific framework for configuring and training machine learning models with TensorFlow and applying the trained models to generate predictions on a test set [9]. CTLearn v0.3.0 was used for training the models used in this work.

Through the associated DL1-Data-Handler package [10], CTLearn can load and preprocess IACT data from any major current- or next-generation IACT. In particular, because many IACT cameras have pixels arranged in a hexagonal layout, posing a challenge for convolutional neural networks that conventionally require as input a rectangular matrix of input pixels, DL1-Data-Handler provides a number of methods to map hexagonally spaced pixels to a square grid. In this work, the rebinning method was chosen (Fig. 1b), which is one of several mapping methods that provide comparably good performance [3].

https://github.com/ctlearn-project/ctlearn
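The rebinning idea can be illustrated with a short sketch. This is not DL1-Data-Handler's implementation; as a simplification, each camera pixel's full charge is assigned to the square bin containing its center (rather than being split by geometric overlap), which still preserves the image's total amplitude. All names here are ours, not the package's API.

```python
import numpy as np

def rebin_to_square(pix_x, pix_y, charge, n_bins=8):
    """Map an irregular (e.g. hexagonal) pixel layout to a square
    matrix: deposit each pixel's charge into the square bin that
    contains its center, preserving the total image amplitude."""
    image, _, _ = np.histogram2d(
        pix_x, pix_y, bins=n_bins,
        range=[[pix_x.min(), pix_x.max()], [pix_y.min(), pix_y.max()]],
        weights=charge,
    )
    return image

# Toy camera: random pixel positions and charges
rng = np.random.default_rng(0)
pix_x, pix_y = rng.uniform(-1, 1, (2, 100))
charge = rng.uniform(0, 10, 100)
square = rebin_to_square(pix_x, pix_y, charge)
assert np.isclose(square.sum(), charge.sum())  # amplitude preserved
```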
Fig. 2. Diagram of the CNN-RNN particle classification model implemented in CTLearn, from [9]. The model uses a CNN block (labeled as a deep convolutional network or DCN) to derive a vector representation of each image in an event. The vectors are combined using a Long Short-Term Memory network (LSTM), a type of recurrent neural network (RNN).
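The combination stage shown in Fig. 2 can be sketched in plain numpy. This is only an illustration of the mechanism, not CTLearn's TensorFlow implementation: the per-image CNN embeddings are stood in for by random vectors, and the LSTM weights are untrained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_aggregate(image_vectors, W, U, b, n_hidden):
    """Combine per-image CNN embeddings into one event-level vector
    with a single LSTM layer; the final hidden state summarizes the
    whole (variable-length) sequence of telescope images."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in image_vectors:          # one embedding per telescope image
        z = W @ x + U @ h + b        # all four gate pre-activations at once
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)                   # update hidden state
    return h

n_in, n_hidden = 16, 8
rng = np.random.default_rng(1)
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

# Events with different numbers of triggered telescopes still yield
# a fixed-size summary vector for the dense classification layers.
event_a = rng.normal(size=(3, n_in))   # 3 telescope images
event_b = rng.normal(size=(7, n_in))   # 7 telescope images
assert lstm_aggregate(event_a, W, U, b, n_hidden).shape == (n_hidden,)
assert lstm_aggregate(event_b, W, U, b, n_hidden).shape == (n_hidden,)
```

Because the same CNN weights are applied to every image and the LSTM absorbs any sequence length, the combined model handles the varying telescope multiplicity from event to event.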
IV. CNN-RNN PARTICLE CLASSIFICATION MODEL
A challenge of using deep learning methods with IACT data is combining images from multiple telescopes providing different views of an air shower event. Each event triggers multiple telescopes, and the number of triggered telescopes may vary from event to event.

One approach to deal with this challenge is to break the problem into two stages. First, each image is processed into a vector representation by a CNN, using the same weight parameters for each image. The vectors are then combined by a recurrent neural network (RNN), a type of neural network that takes as input a sequence of vectors and, by maintaining an internal state, produces an output vector that depends not only on the most recent input but on all preceding inputs in the sequence. This vector is then fed into a set of densely connected layers that produce the final prediction. Connecting these networks allows a single model trained end-to-end to classify events consisting of images from multiple telescopes.

For this work, the built-in CNN-RNN model of CTLearn was used, which implements an architecture similar to the CRNN network presented in [6]. More details on the model and the default hyperparameter settings that were used can be found in [9]. The RNN in this model is specifically a Long Short-Term Memory (LSTM) network.

Recurrent neural networks are capable of processing sequential data in which the ordering of inputs may affect their interpretation. Therefore, having a meaningful ordering of telescope images in a CNN-RNN network may improve performance. In previous work using a CNN-RNN network for classifying Cherenkov air showers as produced by a gamma ray or a cosmic-ray proton, the telescope images were ordered by total image amplitude, or size.
As size can be considered to be a proxy for proximity to the shower center, sorting on this parameter may provide an ordering given the absence of temporal information [6].

To understand the effect of this ordering on performance, we trained two CNN-RNN networks as described above to classify IACT images as produced by a gamma ray or a cosmic-ray proton, changing only the ordering of the input images. As a control, in one network the images were ordered by telescope ID number, an arbitrary but consistent ordering, while in the other the images were ordered by size. The networks were trained using a sample of 250,000 simulated events from 25 FlashCam telescopes [11], part of a proposed CTA array in Paranal, Chile. Ten percent of the events in the sample were reserved as a validation set, which was not used for training.
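The preprocessing difference between the two networks amounts to how each event's list of telescope images is ordered before padding to a fixed sequence length. The following numpy sketch illustrates the size ordering; the function and variable names are ours, not CTLearn's API.

```python
import numpy as np

def order_and_pad(images, max_tels, by_size=True):
    """Order a variable-length list of telescope images for the RNN,
    either by total image amplitude ("size", brightest first) or by
    their given (telescope-ID) order, then zero-pad the sequence to
    a fixed number of telescope slots."""
    if by_size:
        sizes = [img.sum() for img in images]
        images = [images[i] for i in np.argsort(sizes)[::-1]]
    h, w = images[0].shape
    padded = np.zeros((max_tels, h, w))
    padded[:len(images)] = np.stack(images)
    return padded

# Toy event: three 8x8 telescope images with different total amplitudes
rng = np.random.default_rng(2)
event = [rng.uniform(0, 1, (8, 8)) * s for s in (5.0, 20.0, 1.0)]
seq = order_and_pad(event, max_tels=5)
sizes = seq.reshape(5, -1).sum(axis=1)
assert np.all(np.diff(sizes[:3]) <= 0)  # brightest image first
assert np.allclose(sizes[3:], 0)        # padding slots are empty
```

Passing `by_size=False` keeps the given order, corresponding to the telescope-ID control ordering.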
Fig. 3. Validation accuracy of the CNN-RNN model with images ordered by ID (dark blue) and total brightness (light blue) as a function of number of training steps (batches of 16 events). The models reach respective accuracies of 80.6% and 80.2%.

Fig. 4. Validation AUC with images ordered by ID (dark blue) and total brightness (light blue) as a function of number of training steps (batches of 16 events). AUC is the numerically integrated area under the receiver operating characteristic curve, measuring sensitivity and specificity. The models reach respective AUCs of 0.899 and 0.894.
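The AUC values quoted in Fig. 4 are areas under the ROC curve. As an illustration of how such a value is numerically integrated, one can sweep the decision threshold down through the classifier scores and trapezoid-integrate true positive rate against false positive rate (a minimal sketch assuming no tied scores; names are ours):

```python
import numpy as np

def roc_auc(labels, scores):
    """Numerically integrate the area under the receiver operating
    characteristic (ROC) curve: sweeping the decision threshold down
    through the sorted scores traces out (false positive rate,
    true positive rate) points, whose area is trapezoid-integrated."""
    order = np.argsort(scores)[::-1]                    # descending score
    y = np.asarray(labels)[order]
    tpr = np.r_[0.0, np.cumsum(y) / y.sum()]            # sensitivity
    fpr = np.r_[0.0, np.cumsum(1 - y) / (1 - y).sum()]  # 1 - specificity
    return float(np.sum(0.5 * (tpr[1:] + tpr[:-1]) * np.diff(fpr)))

labels = [0, 0, 1, 1]            # 1 = gamma ray, 0 = cosmic-ray proton
scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical classifier gamma scores
print(roc_auc(labels, scores))   # → 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect gamma/proton separation, so the ~0.9 values in Fig. 4 indicate strong but imperfect background rejection.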
V. RESULTS AND DISCUSSION
The results of this experiment are shown in Fig. 3 and Fig. 4. The validation metrics of the two models were approximately the same, with those of the control model being slightly higher. The control model attained validation accuracy and AUC of 80.6% and 0.899, while the model with images sorted by size reached 80.2% and 0.894. We therefore find no evidence that sorting images by size improves gamma-proton classification performance with a CNN-RNN model.

This finding leaves open the possibility that a different ordering of telescope images could result in improved performance. In particular, an ordering which provides sufficient information about the telescopes' positions on the ground could help a CNN-RNN to perform stereoscopic reconstruction of Cherenkov air showers. While ordering by size as a proxy for distance to the shower center should provide some relative position information, it is possible this information is too incomplete to be useful to the network.

In addition to performing background rejection, deep learning algorithms could be used to determine the arrival direction and energy of the particles initiating Cherenkov air showers [12], tasks for which stereoscopic reconstruction is particularly important. Ensuring that telescope position information is effectively provided to CNN-RNN networks may therefore improve their performance not only on background rejection but also on additional tasks critical for IACT image analysis.

REFERENCES
[1] CTA Consortium, Ed., Science with the Cherenkov Telescope Array. World Scientific, 2019.
[2] IceCube Collaboration et al., "Multimessenger observations of a flaring blazar coincident with high-energy neutrino IceCube-170922A," Science, vol. 361, no. 6398, 2018.
[3] D. Nieto, A. Brill, Q. Feng, M. Jacquemont, B. Kim, T. Miener, and T. Vuillaume, "Studying deep convolutional neural networks with hexagonal lattices for imaging atmospheric Cherenkov telescope event reconstruction," in Proceedings of 36th International Cosmic Ray Conference, 2019. [Online]. Available: https://pos.sissa.it/358/753/
[4] M. Krause, E. Pueschel, and G. Maier, "Improved γ/hadron separation for the detection of faint γ-ray sources using boosted decision trees," Astroparticle Physics, vol. 89, pp. 1–9, 2017.
[5] D. Nieto, A. Brill, B. Kim, and T. B. Humensky, "Exploring deep learning as an event classification method for the Cherenkov Telescope Array," in Proceedings of 35th International Cosmic Ray Conference, 2017. [Online]. Available: https://pos.sissa.it/301/809/
[6] I. Shilon, M. Kraus, M. Büchele, K. Egberts, T. Fischer, T. Holch, T. Lohse, U. Schwanke, C. Steppa, and S. Funk, "Application of deep learning methods to analysis of imaging atmospheric Cherenkov telescopes data," Astroparticle Physics, vol. 105, pp. 44–53, 2019.
[7] B. Acharya et al., "Introducing the CTA concept," Astroparticle Physics, vol. 43, pp. 3–18, 2013.
[8] A. Brill, B. Kim, D. Nieto, T. Miener, and Q. Feng, "CTLearn: Deep learning for imaging atmospheric Cherenkov telescopes event reconstruction," Jul. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3345947
[9] D. Nieto, A. Brill, Q. Feng, T. B. Humensky, B. Kim, T. Miener, and R. Mukherjee, "CTLearn: Deep Learning for Gamma-ray Astronomy," in Proceedings of 36th International Cosmic Ray Conference, 2019. [Online]. Available: https://pos.sissa.it/358/752/
[10] B. Kim, A. Brill, T. Miener, D. Nieto, and Q. Feng, "DL1-Data-Handler: DL1 HDF5 writer, reader, and processor for IACT data," Jul. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3336561
[11] A. Gadola et al., "FlashCam: A novel Cherenkov telescope camera with continuous signal digitization," Journal of Instrumentation, vol. 10, no. 1, 2015.
[12] S. Mangano, C. Delgado, M. I. Bernardos, M. Lallena, and J. J. Rodríguez Vázquez, "Extracting Gamma-Ray Information from Images with Convolutional Neural Network Methods on Simulated Cherenkov Telescope Array Data," in