Jet Substructure Classification in High-Energy Physics with Deep Neural Networks
Pierre Baldi, Kevin Bauer, Clara Eng, Peter Sadowski, Daniel Whiteson
Department of Computer Science, University of California, Irvine, CA 92697
Department of Physics and Astronomy, University of California, Irvine, CA 92697
Department of Chemical Engineering, University of California Berkeley, Berkeley, CA 94720

(Dated: April 1, 2016)

At the extreme energies of the Large Hadron Collider, massive particles can be produced at such high velocities that their hadronic decays are collimated and the resulting jets overlap. Deducing whether the substructure of an observed jet is due to a low-mass single particle or due to multiple decay objects of a massive particle is an important problem in the analysis of collider data. Traditional approaches have relied on expert features designed to detect energy deposition patterns in the calorimeter, but the complexity of the data makes this task an excellent candidate for the application of machine learning tools. The data collected by the detector can be treated as a two-dimensional image, lending itself to the natural application of image classification techniques. In this work, we apply deep neural networks with a mixture of locally-connected and fully-connected nodes. Our experiments demonstrate that without the aid of expert features, such networks match or modestly outperform the current state-of-the-art approach for discriminating between jets from single hadronic particles and overlapping jets from pairs of collimated hadronic particles, and that such performance gains persist in the presence of pileup interactions.
INTRODUCTION
Collisions at the LHC occur at such high energies that even massive particles are produced at large enough velocities that their decay products become collimated. In the case of a hadronic decay of a boosted W boson (W → qq′), the two jets produced from these two quarks then overlap in the detector, creating a single merged jet. The substructure of the jet's energy deposition can distinguish between jets which are due to a single hadronic particle or due to the decay of a massive object into multiple hadronic particles; this classification is known as jet "tagging" and is critical for understanding the nature of the particles produced in the collision [1].

This classification task has been the topic of intense research activity [2–5]. The difficult nature of the problem has led physicists to reduce the dimensionality of the problem by designing expert features [6–15] which incorporate their domain knowledge. In the current state-of-the-art applications, jets are either classified based on one of these features alone or by combining multiple designed features with shallow machine learning classifiers such as boosted decision trees (BDTs). It is possible, however, that these designed expert features do not capture all of the available information [16–18], as the data are very high-dimensional and, despite extensive theoretical progress in the microphysics of jet formation [19–21] and the existence of effective simulation tools [22, 23], there exists no complete analytical model for classification directly from theoretical principles, though see Ref. [24]. Therefore, approaches that use the higher-dimensional but lower-level detector information to learn this classification function may outperform those which rely on fewer high-level expert-designed features.

Measurements of the emanating particles can be projected onto a cylindrical detector and then unwrapped and considered as two-dimensional images, enabling the natural application of computer vision techniques. Recent work demonstrates encouraging results with shallow classification models trained on jet images [25–27]. Deep networks have shown additional promise in particle-level studies [28]. However, deep learning has not yet been applied to more realistic scenarios which include simulation of the detector response and resolution and, most importantly, the effect of unrelated simultaneous pp interactions, known as pileup, which contribute significant energy depositions unrelated to the particles of interest.

In this paper, we perform jet classification on images built from simulated detector response using deep neural network models with a combination of locally-connected and fully-connected layers. Our results demonstrate that deep networks can distinguish between detector clusters due to single or multiple jets without using domain knowledge, matching or exceeding the performance of shallow classifiers used to combine many expert features.

THEORY
A typical application of jet classifiers is to discriminate single jets produced in quark or gluon fragmentation from two overlapping jets produced when a high-velocity W boson decays to a collimated pair of quarks. The goal is then to learn the classification function, or equivalently, the likelihood ratio:

\[
\frac{P_{W \to qq'}(\text{jet})}{P_{q/g}(\text{jet})}.
\]

In practice, there are two significant obstacles to calculating and applying this ratio.

First, while theoretical understanding of the processes involved has made significant progress, a formulation of this likelihood ratio from fundamental QCD principles is not yet available. However, there do exist effective models which have been successfully incorporated into widely used tools capable of generating simulated samples. Such samples can then be used to deduce the likelihood ratio, but the task is very difficult due to its high dimensionality. Expert features with solid theoretical grounding exist to reduce the dimensionality of this problem, but it is unlikely that they capture all of the information, as the theoretical understanding is not complete and the concepts which motivate them do not include the detector effects or the impact of pileup interactions. The goal of this paper is to attempt to capture as much of the information as possible and learn the classification function from simulated samples which include these effects, without making the simplifying theoretical assumptions necessary to construct expert features.

Second, the effective models used in simulation tools do not provide a perfectly accurate description of observed collider data. A classification function learned from simulated samples is limited by the validity of those samples. While deep networks may provide a powerful method of deducing the classification function, expert features which encapsulate theoretical understanding of the process of jet formation are valuable in assessing the success and failure of these models. In this paper, we use expert features as a benchmark to measure the performance of learning tools which access only the higher-dimensional lower-level data. We expect that deep networks may provide additional classification power in concert with the insight offered by expert features, and perhaps motivate the development of modifications to such features rather than blindly replacing them.
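Although this ratio has no closed form, it is exactly what a probabilistic classifier targets: by the Neyman-Pearson lemma the likelihood ratio is the optimal test statistic, and a classifier trained with cross-entropy loss on a balanced sample (as in the training setup described below) approximates the posterior probability, which is a monotonic function of the ratio. Stated explicitly, under the balanced-prior assumption,

\[
s(\text{jet}) \approx P(W \to qq' \mid \text{jet}) = \frac{P_{W \to qq'}(\text{jet})}{P_{W \to qq'}(\text{jet}) + P_{q/g}(\text{jet})}
\quad \Longrightarrow \quad
\frac{P_{W \to qq'}(\text{jet})}{P_{q/g}(\text{jet})} = \frac{s(\text{jet})}{1 - s(\text{jet})},
\]

so a threshold on the classifier output is equivalent to a threshold on the likelihood ratio.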
DATA

Training samples for both classes were produced using realistic simulation tools widely used in particle physics. Samples of boosted W → qq′ were generated with a center-of-mass energy of √s = 14 TeV using the diboson production and decay process pp → W⁺W⁻ → qqqq, leading to two pairs of quarks; each pair of quarks is collimated and leads to a single jet. Samples of jets originating from single quarks and gluons were generated using the pp → qq, qg, gg process. In both cases, jets are generated in the range $p_T \in [300, 400]$ GeV. Collisions were generated with madgraph5 [29] v2.2.3, showering and hadronization were simulated with pythia [22] v6.426, and the response of the detectors was simulated with delphes [30] v3.2.0. The jet images are characterized by the energies deposited at different points on the approximately cylindrical calorimeter surface.

The classification of jets as due to W → qq′ or single quarks and gluons is sensitive to the presence of additional in-time pp interactions, referred to as pile-up events. We overlay such interactions in the simulation chain, with an average number of interactions per event of $\langle\mu\rangle = 50$, as an estimate of future ATLAS Run 2 data with the LHC delivering collisions at a 25 ns bunch crossing interval. The impact of pile-up events on jet reconstruction can be mitigated using several techniques. After reconstructing jets with the anti-$k_T$ [31] clustering algorithm using distance parameter R = 1.2, we apply a jet-trimming algorithm [32] which is designed to remove pileup while preserving the two-pronged jet substructure characteristic of boson decay. Jet trimming re-clusters the jet constituents using the $k_T$ [33] algorithm into subjets of radius 0.2 and discards subjets with $p_T$ less than 3% of the original jet. Then the final trimmed jet is built using the remaining subjets. Trimmed jets with 300 GeV < $p_T$ < 400 GeV are selected, in order to ensure the minimum W boson velocity needed for collimated decays. In principle, the machine learning algorithms may be able to classify jets without such filtering; we leave this for future studies.

To compare our approach to the current state of the art, we calculate six high-level jet variables commonly used in the literature; calculations are performed using FastJet [34] v3.1.2. First, the invariant mass of the trimmed jet is calculated. Then, the trimmed jet's constituents are used to calculate the other substructure variables: the N-subjettiness ratio [9, 35] $\tau_{21}^{\beta=1}$, and the energy correlation functions [10, 36] $C_2^{\beta=1}$, $C_2^{\beta=2}$, $D_2^{\beta=1}$, and $D_2^{\beta=2}$. A comprehensive summary of these six jet substructure variables can be found in Ref. [2]. Figure 1 shows the distributions of the variables for the two classes of jets, both with and without pileup conditions.

FIG. 1: Distributions in simulated samples of high-level jet substructure variables widely used to discriminate between jets due to collimated decays of massive objects (W → qq) and jets due to individual quarks or gluons (QCD). Two cases are shown: with and without the presence of additional in-time pp interactions, included at the level of an average of 50 such interactions per collision.

In this paper, we investigate the power of classification of the jets directly from the lower-level but higher-dimensional calorimeter data, without the dimensional reduction provided by the variables above. The strategy follows that of well-established image classification tools by treating the distribution of energy in the calorimeter as an image. The images were preprocessed as in previous work by centering and rotating into a canonical orientation. The origin of the coordinate axis was set at the center of energy of each jet, then the image was rotated so that the principal axis θ is in the same direction for each jet, where θ is defined as

\[
\tan(\theta) = \sum_i \phi_i \, \frac{E_i}{R_i} \Bigg/ \sum_i \eta_i \, \frac{E_i}{R_i}, \qquad (1)
\]
\[
R_i = \sqrt{\eta_i^2 + \phi_i^2}. \qquad (2)
\]

Images are then reflected so that the maximum energy value is always in the top half of the image. The jet energy deposits were centered and cropped to within a 3.2 × 3.2 region, which was binned into a 32 × 32 image, approximating the resolution of the calorimeter cells. When two calorimeter cells were detected within the same pixel, their energies were summed. Example individual jet images from each class are shown in Figure 2, and averages over many jets are shown in Figure 3.
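The preprocessing just described is simple to express in code. The following is a minimal sketch, assuming each jet is given as arrays of constituent (η, φ, E) values; the function name and the image choices (3.2-wide window, 32 pixels) mirror the text, but this is an illustration rather than the implementation used in the paper.

```python
import numpy as np

def jet_image(eta, phi, energy, n_pix=32, width=3.2):
    """Build a preprocessed jet image from constituent (eta, phi, E) arrays.

    Steps follow the text: translate to the energy-weighted center,
    rotate so the principal axis (Eqs. 1-2) points the same way for every jet,
    reflect so the maximum-energy deposit lies in the top half,
    then bin into an n_pix x n_pix grid of summed energies.
    """
    eta, phi, energy = map(np.asarray, (eta, phi, energy))

    # 1) Center on the energy-weighted centroid.
    eta = eta - np.average(eta, weights=energy)
    phi = phi - np.average(phi, weights=energy)

    # 2) Rotate so the principal axis theta (Eqs. 1-2) is the same for every jet.
    r = np.sqrt(eta**2 + phi**2) + 1e-12          # guard against division by zero at the center
    theta = np.arctan2(np.sum(phi * energy / r), np.sum(eta * energy / r))
    c, s = np.cos(-theta), np.sin(-theta)
    eta, phi = c * eta - s * phi, s * eta + c * phi

    # 3) Reflect so the hottest constituent is in the top half of the image.
    if phi[np.argmax(energy)] < 0:
        phi = -phi

    # 4) Bin into pixels, summing energies that fall in the same cell.
    image, _, _ = np.histogram2d(
        eta, phi, bins=n_pix,
        range=[[-width / 2, width / 2], [-width / 2, width / 2]],
        weights=energy,
    )
    return image
```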
TRAINING
Deep neural networks were trained on the jet images and compared to the standard approach of BDTs trained on expert-designed variables that capture domain knowledge [2]. All classifiers were trained on a balanced training data set of 10 million examples, with 500 thousand of these used as a validation set. The best hyperparameters for each method were selected using the Spearmint Bayesian optimization algorithm [37] to optimize over the supports specified in Tables I and II. The best models were then tested on a separate test set of 5 million examples.

FIG. 2: Typical jet images from class 1 (single QCD jet from q or g) on the left, and class 2 (two overlapping jets from W → qq′) on the right, after preprocessing as described in the text.

FIG. 3: Average of 100,000 jet images from class 1 (single QCD jet from q or g) on the left, and class 2 (two overlapping jets from W → qq′) on the right, after preprocessing.

Neural networks consisted of hidden layers of tanh units and a logistic output unit with cross-entropy loss. Weight updates were made using the ADAM optimizer [38] ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) with mini-batches of size 100. Weights were initialized from a normal distribution with the standard deviation suggested by Ref. [39]. The learning rate was initialized to 0.…
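As a concrete illustration of the kind of architecture described (locally-connected layers followed by fully-connected tanh layers, a logistic output, cross-entropy loss, and the ADAM optimizer), the sketch below uses the Keras library that appears in the reference list [40]. The layer counts and widths are taken from the no-pileup optimum in Table I for illustration, while the locally-connected filter count and kernel size are assumptions of ours; this is a schematic reconstruction, not the authors' code, and it assumes a Keras version that still provides LocallyConnected2D.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_jet_tagger(n_pix=32, units=425, n_local=4, n_dense=4):
    """Schematic locally- plus fully-connected tagger for 32x32 jet images."""
    model = keras.Sequential([keras.Input(shape=(n_pix, n_pix, 1))])

    # Locally-connected layers: convolution-like receptive fields,
    # but with unshared weights at each image location.
    for _ in range(n_local):
        model.add(layers.LocallyConnected2D(8, kernel_size=4, activation="tanh"))

    model.add(layers.Flatten())

    # Fully-connected tanh layers.
    for _ in range(n_dense):
        model.add(layers.Dense(units, activation="tanh"))

    # Logistic output with cross-entropy loss, trained with ADAM.
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC()])
    return model

model = build_jet_tagger()
# model.fit(train_images, train_labels, batch_size=100, validation_data=(val_images, val_labels))
```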
RESULTS

Deep networks with locally-connected layers showed the best performance. For example, the best network with 5 hidden layers has two locally-connected layers followed by three fully-connected layers of 300 units each; this architecture performs better than a network of five fully-connected layers of 500 units each.

Final results are shown in Table III. The metric used is the Area Under the Curve (AUC), calculated in signal efficiency versus background efficiency, where a larger AUC indicates better performance. In Fig. 4, the signal efficiency is shown versus background rejection, the inverse of background efficiency. In the case without pile-up, as studied in Ref. [28], the deep network modestly outperforms the physics domain variables, demonstrating first that successful classification can be performed without expert-designed features and that there is some loss of information in the dimensional reduction such features provide. See the discussion below, however, for comments on the continued importance of expert features.

Our results also demonstrate for the first time that such performance holds up under the more difficult and realistic conditions of many pileup interactions; indeed, the gap between the deep network and the expert variables in this case is more pronounced. This is likely due to the fact that the physics-inspired variables rest on arguments motivated by idealized pictures.

TABLE I: Hyperparameter support for Bayesian optimization of deep neural network architectures. For the no-pileup case, networks with a single hidden layer were allowed to have up to 1000 units per layer, in order to remove the possibility of the deep networks performing better simply because they had more tunable parameters.

                                   Range            Optimum
Hyperparameter                     Min    Max       No pileup    Pileup
Hidden units per layer             100    500       425          500
Fully-connected layers             1      5         4            5
Locally-connected layers           0      5         4            3

TABLE II: Hyperparameter support for BDTs trained on 6 high-level features, and the best combinations in 110 and 140 experiments, respectively, for the no-pileup and pileup tasks. Minimum leaf percent was constrained to be one fourth of the minimum split percent in all cases.

                                   Range               Optimum
Hyperparameter                     Min       Max       No pileup    Pileup
Tree depth                         15        75        49           49
Learning rate                      0.01      1.00      0.07         0.07
Minimum split percent              0.0001    0.1000    0.0021       0.0021
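For readers who want a concrete picture of the BDT baseline, the sketch below shows how a gradient-boosted tree classifier over the six expert features could be configured with the Table II optima, using scikit-learn, which appears in the reference list [43]. The mapping of "minimum split percent" to a fractional `min_samples_split` is our assumption, and the original work may have used a different BDT implementation; this is an illustration, not the authors' exact setup.

```python
from sklearn.ensemble import GradientBoostingClassifier

# X: array of shape (n_jets, 6) holding the six expert features
# (trimmed mass, tau_21, C2(beta=1), C2(beta=2), D2(beta=1), D2(beta=2));
# y: 1 for W->qq' jets, 0 for single q/g jets.
bdt = GradientBoostingClassifier(
    max_depth=49,                 # tree-depth optimum from Table II
    learning_rate=0.07,           # learning-rate optimum from Table II
    min_samples_split=0.0021,     # "minimum split percent", expressed as a fraction
    min_samples_leaf=0.0021 / 4,  # leaf percent fixed to one fourth of the split percent
    n_estimators=100,             # number of boosting stages (not specified in the text)
)
# bdt.fit(X_train, y_train)
# scores = bdt.predict_proba(X_test)[:, 1]
```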
TABLE III: Performance results for BDT and deep networks. Shown for each method are both the signal efficiency at background rejection of 10, as well as the Area Under the Curve (AUC), the integral of the background efficiency versus signal efficiency. For the neural networks, we report the mean and standard deviation of three networks trained with different random initializations.

Technique                     Signal efficiency at bg. rejection = 10    AUC

No pileup
BDT on derived features       86.5%                                      95.…
DNN on images                 … (0.04%)                                  … (0.02%)

With pileup
BDT on derived features       81.5%                                      93.…
DNN on images                 … (0.02%)                                  … (0.01%)
FIG. 4: Signal efficiency versus background rejection (inverse of efficiency) for deep networks trained on the images and boosted decision trees trained on the expert features, both with (bottom) and without pile-up (top). Typical choices of signal efficiency in real applications are in the 0.5-0.7 range. Also shown are the performance of jet mass individually as well as two expert variables in conjunction with a mass window.
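The AUC and background-rejection figures of merit quoted above are standard ROC quantities; a minimal sketch of how they can be computed from classifier scores with scikit-learn [43] follows (the function and variable names here are illustrative, not from the original analysis).

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def tagging_metrics(y_true, scores, target_rejection=10.0):
    """AUC and the signal efficiency at a fixed background rejection.

    y_true: 1 for W->qq' jets, 0 for q/g jets; scores: classifier outputs.
    """
    # ROC: background efficiency (false positive rate) vs. signal efficiency (true positive rate).
    bkg_eff, sig_eff, _ = roc_curve(y_true, scores)
    area = auc(bkg_eff, sig_eff)

    # Background rejection is the inverse of background efficiency;
    # report the best signal efficiency where the rejection reaches the target.
    with np.errstate(divide="ignore"):
        rejection = 1.0 / bkg_eff
    eff_at_target = np.max(sig_eff[rejection >= target_rejection])
    return area, eff_at_target
```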
INTERPRETATION
Current typical use in experimental analysis is the combination of the jet mass feature with $\tau_{21}$ or one of the energy correlation variables. Our results show that even a straightforward BDT combination of all six of the high-level variables provides a large boost in comparison. In probing the power of deep learning, we then use as our benchmark this combination of the variables provided by the BDT.

The deep network has clearly managed to match or slightly exceed the performance of a combination of the state-of-the-art expert variables. Physicists working on the underlying theoretical questions may naturally be curious as to whether the deep network has learned a novel strategy for classification which could inform their studies, or rediscovered and further optimized the existing features.

While one cannot probe the motivation of the ML algorithm, it is possible to compare distributions of events categorized as signal-like by the different algorithms in order to understand how the classification is being accomplished. To compare distributions between different algorithms, we study simulated events with equivalent background rejection; see Figs. 5 and 6 for a comparison of the selected regions in the expert features for the two classifiers. The BDT preferentially selects events with values of the features close to the characteristic signal values and away from background-dominated values. The DNN, which has a modestly higher efficiency for the equivalent rejection, selects events near the same signal values, but in some cases can be seen to retain a slightly higher fraction of jets away from the signal-dominated region. The likely explanation is that the DNN has discovered the same signal-rich region identified by the expert features, but has in addition found avenues to optimize the performance and carve into the background-dominated region. Note that DNNs can also be trained to be independent of mass, by providing a range of mass in training, or by training a network explicitly parameterized [44, 45] in mass.
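To make the comparison procedure concrete: working points with equivalent background rejection can be obtained by choosing, for each classifier, the score threshold at which the background efficiency equals the inverse of the desired rejection, and then comparing the expert-feature distributions of the jets each classifier keeps. The sketch below is an assumed reconstruction of such a comparison, with illustrative names; it is not the authors' code.

```python
import numpy as np

def threshold_at_rejection(bkg_scores, rejection):
    """Score cut at which the background efficiency equals 1/rejection."""
    # Keeping only the top (1/rejection) fraction of background gives the desired rejection.
    return np.quantile(bkg_scores, 1.0 - 1.0 / rejection)

def selected_feature_values(scores, feature, threshold):
    """Expert-feature values (e.g. jet mass, tau_21) of the jets the classifier keeps."""
    return feature[scores >= threshold]

# Example: compare BDT- and DNN-selected jet-mass distributions at rejection = 20.
# cut_bdt = threshold_at_rejection(bdt_scores_bkg, 20.0)
# cut_dnn = threshold_at_rejection(dnn_scores_bkg, 20.0)
# mass_bdt = selected_feature_values(bdt_scores, jet_mass, cut_bdt)
# mass_dnn = selected_feature_values(dnn_scores, jet_mass, cut_dnn)
```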
DISCUSSION

The signal from massive W → qq jets is typically obscured by a background from the copiously produced low-mass jets due to quarks or gluons. Highly efficient classification is critical, and even a small relative improvement in the classification accuracy can lead to a significant boost in the power of the collected data to make statistically significant discoveries. Operating the collider is very expensive, so particle physicists need tools that allow them to make the most of a fixed-size dataset. However, improving classifier performance becomes increasingly difficult as the accuracy of the classifier increases.

Physicists have spent significant time and effort designing features for jet-tagging classification tasks. These designed features are theoretically well motivated, but as their derivation is based on a somewhat idealized description of the task (without detector or pileup effects), they cannot capture the totality of the information contained in the jet image. We report the first studies of the application of deep learning tools to the jet substructure problem to include simulation of detector and pileup effects.

Our experiments support two conclusions. First, machine learning methods, particularly deep learning, can automatically extract the knowledge necessary for classification, in principle eliminating the exclusive reliance on expert features.
FIG. 5: Distributions in simulated samples without pileup of high-level jet substructure variables for pure signal (W → qq) and pure background (QCD) events. To explore the decision surface of the ML algorithms, also shown are background events with various levels of rejection for deep networks trained on the images and boosted decision trees trained on the expert features. Both algorithms preferentially select jets with values near the peak signal values. Note, however, that while the BDT has been supplied with these features as an input, the DNN has learned this on its own.

The slight improvement in classification power offered by the deep network compared to the combination of expert features is likely due to the fact that the network has succeeded in discovering small optimizations of the expert features in order to account for the detector and pileup effects present in the simulated samples. This marks another demonstration of the power of deep networks to identify important features in high-dimensional problems. In practice, while deep network classification can boost jet tagging performance, expert features offer powerful insight [24] into the validity of the simulation models used to train these networks. We do not claim that these results make expert features obsolete. However, they suggest that deep networks can provide similar performance on a variety of related problems where the theoretical tools are not as mature. For example, current tools do not always include information from tracking detectors, nor do they offer performance parameterized [44, 45] in the mass of the decaying heavy state.
FIG. 6: Distributions in simulated samples with pileup of high-level jet substructure variables for pure signal (W → qq) and pure background (QCD) events. To explore the decision surface of the ML algorithms, also shown are background events with various levels of rejection for deep networks trained on the images and boosted decision trees trained on the expert features. Both algorithms preferentially select jets with values near the peak signal values. Note, however, that while the BDT has been supplied with these features as an input, the DNN has learned this on its own.

Second, we conclude that the current set of expert features, when used in combination (via a BDT or other shallow multivariate approach), appears to capture nearly all of the relevant information in the high-dimensional low-level features described by the jet image. The power of the networks described here is limited by the accuracy of the simulation models, and expert features may be more robust to variation among the several existing simulation models [46]. In experimental applications, this reliance on simulation can be mitigated by using training samples from real collision data, where the labels are derived using orthogonal information.

Data in high energy physics can often be formulated as images. Thus, these results, reported on the representative classification task of single q or g jets versus massive jets from W → qq′, are very likely to apply to a broader set of similar tasks, such as classifying jets with three constituents, as in the case of top quark decay t → Wb → qq′b, or massive jets from other particles such as Higgs boson decays to bottom quark pairs. Note that in more realistic datasets, calorimeter information often contains depth information as well, such that the images are three-dimensional instead of two; however, this does not represent a difficult extrapolation for the machine learning algorithms. While the fundamental classification problems are very similar from a machine learning standpoint, the literature of expert features is somewhat less mature, further underlining the potential utility of the reported deep learning methods in these areas.

Future directions of research include studies of the robustness of such networks to systematic uncertainties in the input features and to changes in the hadronization and showering model used in the simulated events.

Datasets used in this paper containing millions of simulated collisions can be found in the UCI Machine Learning Repository [47].

ACKNOWLEDGEMENTS
We thank Jesse Thaler, James Ferrando, Sal Rappoccio, Sam Meehan, Chase Shimmin, Daniel Guest, Kyle Cranmer and Andrew Larkoski for useful comments and helpful discussion. We thank Yuzo Kanomata for computing support. We also wish to acknowledge a hardware grant from NVIDIA and NSF grant IIS-1321053 to PB.

[1] Jonathan M. Butterworth, Adam R. Davison, Mathieu Rubin, and Gavin P. Salam. Jet substructure as a new Higgs search channel at the LHC. Phys. Rev. Lett., 100:242001, 2008.
[2] D. Adams, A. Arce, L. Asquith, M. Backovic, T. Barillari, et al. Towards an Understanding of the Correlations in Jet Substructure. 2015.
[3] A. Abdesselam et al. Boosted objects: A Probe of beyond the Standard Model physics. Eur. Phys. J., C71:1661, 2011.
[4] A. Altheimer et al. Jet Substructure at the Tevatron and LHC: New results, new tools, new benchmarks. J. Phys., G39:063001, 2012.
[5] A. Altheimer et al. Boosted objects and jet substructure at the LHC. Report of BOOST2012, held at IFIC Valencia, 23rd-27th of July 2012. Eur. Phys. J., C74(3):2792, 2014.
[6] Tilman Plehn, Michael Spannowsky, Michihisa Takeuchi, and Dirk Zerwas. Stop Reconstruction with Tagged Tops. JHEP, 1010:078, 2010.
[7] David E. Kaplan, Keith Rehermann, Matthew D. Schwartz, and Brock Tweedie. Top Tagging: A Method for Identifying Boosted Hadronically Decaying Top Quarks. Phys. Rev. Lett., 101:142001, 2008.
[8] Andrew J. Larkoski, Simone Marzani, Gregory Soyez, and Jesse Thaler. Soft Drop. JHEP, 1405:146, 2014.
[9] Jesse Thaler and Ken Van Tilburg. Identifying Boosted Objects with N-subjettiness. JHEP, 1103:015, 2011.
[10] Andrew J. Larkoski, Gavin P. Salam, and Jesse Thaler. Energy Correlation Functions for Jet Substructure. JHEP, 1306:108, 2013.
[11] D. Krohn, J. Thaler, and L.-T. Wang. Jet Trimming. JHEP, 1002:084, 2010.
[12] S. D. Ellis, C. K. Vermilion, and J. R. Walsh. Recombination Algorithms and Jet Substructure: Pruning as a Tool for Heavy Particle Searches. Phys. Rev., D81:094023, 2010.
[13] M. Dasgupta, A. Fregoso, S. Marzani, and G. P. Salam. Towards an understanding of jet substructure. JHEP, 1309:029, 2013.
[14] Mrinal Dasgupta, Alessandro Fregoso, Simone Marzani, and Alexander Powling. Jet substructure with analytical methods. Eur. Phys. J., C73(11):2623, 2013.
[15] Mrinal Dasgupta, Alexander Powling, and Andrzej Siodmok. On jet substructure methods for signal jets. JHEP, 08:079, 2015.
[16] Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature Communications, 5, July 2014.
[17] Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Enhanced Higgs boson to τ+τ− search with deep learning. Phys. Rev. Lett., 114:111801, Mar 2015.
[18] Peter Sadowski, Julian Collado, Daniel Whiteson, and Pierre Baldi. Deep Learning, Dark Knowledge, and Dark Matter. pages 81–87, 2014.
[19] Davison E. Soper and Michael Spannowsky. Finding physics signals with shower deconstruction. Phys. Rev., D84:074002, 2011.
[20] Davison E. Soper and Michael Spannowsky. Finding top quarks with shower deconstruction. Phys. Rev., D87:054012, 2013.
[21] Iain W. Stewart, Frank J. Tackmann, and Wouter J. Waalewijn. Dissecting Soft Radiation with Factorization. Phys. Rev. Lett., 114(9):092001, 2015.
[22] T. Sjostrand et al. PYTHIA 6.4 physics and manual. JHEP, 05:026, 2006.
[23] M. Bahr et al. Herwig++ Physics and Manual. Eur. Phys. J., C58:639–707, 2008.
[24] Andrew J. Larkoski, Ian Moult, and Duff Neill. Analytic Boosted Boson Discrimination. 2015.
[25] Josh Cogan, Michael Kagan, Emanuel Strauss, and Ariel Schwarztman. Jet-images: computer vision inspired techniques for jet tagging. Journal of High Energy Physics, 2015(2):1–16, February 2015.
[26] Leandro G. Almeida, Mihailo Backovic, Mathieu Cliche, Seung J. Lee, and Maxim Perelstein. Playing Tag with ANN: Boosted Top Identification with Pattern Recognition. arXiv:1501.05968 [hep-ex, hep-ph], January 2015.
[27] Performance of Boosted W Boson Identification with the ATLAS Detector. Tech. Rep. ATL-PHYS-PUB-2014-004, CERN, Geneva, Mar. 2014.
[28] Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, and Ariel Schwartzman. Jet-Images – Deep Learning Edition. 2015.
[29] Johan Alwall et al. MadGraph 5: Going Beyond. JHEP, 06:128, 2011.
[30] J. de Favereau et al. DELPHES 3: A modular framework for fast simulation of a generic collider experiment. JHEP, 02:057, 2014.
[31] Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. The anti-k_t jet clustering algorithm. JHEP, 04:063, 2008.
[32] David Krohn, Jesse Thaler, and Lian-Tao Wang. Jet Trimming. JHEP, 02:084, 2010.
[33] Stephen D. Ellis and Davison E. Soper. Successive combination jet algorithm for hadron collisions. Phys. Rev., D48:3160–3166, 1993.
[34] Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. FastJet User Manual. Eur. Phys. J., C72:1896, 2012.
[35] Jesse Thaler and Ken Van Tilburg. Maximizing Boosted Top Identification by Minimizing N-subjettiness. JHEP, 02:093, 2012.
[36] Andrew J. Larkoski, Ian Moult, and Duff Neill. Power Counting to Better Jet Observables. JHEP, 12:009, 2014.
[37] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2951–2959. Curran Associates, Inc., 2012.
[38] Diederik Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs], December 2014.
[39] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv:1502.01852 [cs], February 2015.
[40] François Chollet. Keras. GitHub, 2015.
[41] James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), Austin, TX, June 2010. Oral presentation.
[42] Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, and Yoshua Bengio. Theano: new features and speed improvements. arXiv:1211.5590 [cs], November 2012.
[43] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[44] Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, and Daniel Whiteson. Parameterized Machine Learning for High-Energy Physics. 2016.
[45] Kyle Cranmer. Approximating Likelihood Ratios with Calibrated Discriminative Classifiers. 2015.
[46] James Dolen, Philip Harris, Simone Marzani, Salvatore Rappoccio, and Nhan Tran. Thinking outside the ROCs: Designing Decorrelated Taggers (DDT) for jet substructure. 2016.
[47] Pierre Baldi, Kevin Bauer, Clara Eng, Peter Sadowski, and Daniel Whiteson. Data for jet substructure for high-energy physics. UCI Machine Learning Repository, 2016.