Toward particle-level filtering of individual collision events at the Large Hadron Collider and beyond
Federico Colecchia∗
Brunel University, London, Uxbridge, UB8 3PH, United Kingdom
∗ Email: [email protected]
Low-energy strong interactions are a major source of background at hadron colliders, and methods of subtracting the associated energy flow are well established in the field. Traditional approaches treat the contamination as diffuse, and estimate background energy levels either by averaging over large data sets or by restricting to given kinematic regions inside individual collision events. On the other hand, more recent techniques take into account the discrete nature of background, most notably by exploiting the presence of substructure inside hard jets, i.e. inside collections of particles originating from scattered hard quarks and gluons. However, none of the existing methods subtract background at the level of individual particles inside events. We illustrate the use of an algorithm that can enable particle-by-particle background discrimination at the Large Hadron Collider, and we envisage this as the basis for a novel event filtering procedure upstream of the official jet reconstruction pipelines. Our hope is that this new technique will improve physics analysis when used in combination with state-of-the-art algorithms in high-luminosity hadron collider environments.
Strong interactions described by Quantum Chromodynamics (QCD) play a major role at hadron collider experiments such as those at the Large Hadron Collider (LHC) at CERN, where the highest-energy proton beams available worldwide are collided. The higher event multiplicities and background rates as compared to previous experiments have an impact on physics analysis, and place even stronger requirements on background subtraction than in the past.

In particular, the energy flow associated with soft, i.e. low-energy, QCD interactions is an important background at the LHC. Pileup, i.e. particles originating from proton-proton collisions other than the one of interest that nonetheless contribute to the same triggered event, is an issue for a number of LHC analyses, and its impact will become more and more relevant as the instantaneous luminosity of the accelerator is increased. The high pileup rates that are foreseen at upgraded LHC scenarios can significantly affect searches for new heavy particles in final states containing missing transverse momentum, /p_T, i.e. the observed momentum imbalance inside an event measured on a plane perpendicular to the beam direction, as well as the analysis of channels containing hard jets, i.e. collections of particles originating from scattered hard quarks and gluons. In fact, jet energy correction has a direct impact on the quality of the reconstructed jet objects that are ultimately used for analysis (see e.g. [1, 2]).

In addition to pileup, when a hard parton scattering from a proton-proton collision takes place, additional particles are also produced by the Underlying Event, i.e. by interactions between the proton beam remnants and by multiple parton interactions. Moreover, particles can generate additional energy flow in the form of initial-state radiation prior to the hard scattering. All of these effects can have a notable impact on physics analysis and are carefully taken into account at the experiments during reconstruction and calibration.
Methods of subtracting soft QCD background associated with pileup and the Underlying Event at hadron colliders are well established. Traditional techniques treat the contamination as diffuse, and estimate a background momentum contribution that is then subtracted from the total momentum of the hard jets of interest. Some of these methods rely on high-statistics samples, e.g. Minimum Bias, dijet, and Drell-Yan data (see e.g. [2]). Given the way these methods work, the estimated background energy contribution is typically averaged over many events, and event-to-event background fluctuations are therefore neglected.

Following the introduction of the notion of jet area [3], which provides a measure of the susceptibility of reconstructed jets to soft QCD energy flow, the focus has shifted toward event-by-event estimation of the background momentum density. With jet area-based methods, the quantity that is subtracted from the total momentum of the hard jets is proportional to an event-level estimate of the background momentum density as well as to the area of the jet of interest (see the sketch at the end of this section). This takes into account possible event-to-event variations of the soft QCD energy flow.

However, since the estimated momentum density is typically averaged over all pileup jets in the event, this approach still neglects background fluctuations inside individual events. Nonetheless, the amount of soft QCD contamination can be different in different jets due to the quantum nature of the underlying physics. Subtracting background in a kinematics-dependent way can partially address this issue, although jet area-based methods were ultimately not developed to describe such effects.

A more recent approach exploits the presence of substructure inside jets. Jet grooming techniques are being used at the LHC to reject soft QCD contamination inside jets [4, 5, 6]. As opposed to treating the low-energy background as diffuse, these methods exploit the presence of substructure that is often associated with the hierarchical composition of the jets. Such methods have proven particularly useful in combination with jet-vertex association techniques that map individual track jets to putative primary interaction vertices.
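To make the jet-area prescription above concrete, the following is a schematic Python sketch of the basic correction. It is an illustration, not code from any experiment's pipeline: the helper names (median_rho, corrected_pt) are invented for this note, and the background momentum density ρ is taken as the median of p_T/area over patches in the event, the estimator commonly associated with [3].

```python
# Schematic jet-area pileup correction (cf. [3]); helper names are invented
# for this note and do not correspond to any experiment's software.

def median_rho(patch_pts, patch_areas):
    """Estimate the event-level background momentum density rho as the
    median of pt/area over the patches (or jets) in the event."""
    ratios = sorted(pt / area for pt, area in zip(patch_pts, patch_areas))
    mid = len(ratios) // 2
    if len(ratios) % 2:
        return ratios[mid]
    return 0.5 * (ratios[mid - 1] + ratios[mid])

def corrected_pt(jet_pt, jet_area, rho):
    """Subtract the diffuse soft QCD contribution: pt_corr = pt - rho * A."""
    return max(jet_pt - rho * jet_area, 0.0)

# Example: a 60 GeV jet of area 0.5 in an event with rho = 20 GeV per unit
# area would be corrected to 60 - 20 * 0.5 = 50 GeV.
```

Since the subtracted quantity depends only on ρ and on the jet area, the correction is the same for all jets of equal area in a given event, which is precisely why jet-to-jet background fluctuations are not captured.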
Despite the wealth of techniques available and the effectiveness they have so far demonstrated, none of the existing methods use information at the finest-grained level, i.e. at the level of individual particles inside events. We elaborate on the possibility of using our algorithm [7, 8] to implement a novel event filtering procedure that rejects soft QCD background from individual LHC events particle by particle. We suggest that individual particles inside events can be mapped to a signal hard scattering as opposed to soft QCD background on a probabilistic basis, thereby taking into account the effect of fluctuations on the shapes of particle-level probability density functions (PDFs). This can be particularly useful with reference to neutral particles, which in general cannot easily be associated with the primary interaction vertex.
THE ALGORITHM

We recently presented a Markov Chain Monte Carlo algorithm that makes it possible to assign individual particles inside events a probability of originating from a hard parton scattering as opposed to soft QCD interactions. We showed results on Monte Carlo data sets comprising a total number of particles in the range ∼ … ÷ …, corresponding to gg → tt̄ at √s = 14 TeV superimposed with Minimum Bias. Our algorithm makes it possible to estimate the effect of fluctuations on the shapes of signal and background PDFs at the particle level. Here we discuss the possibility of using it to implement a particle-level filtering procedure for individual LHC events upstream of the official jet reconstruction pipelines.
The algorithm processes a given collection of particles that is assumed to be a mixture comprising particles originating from a signal hard scattering as well as particles associated with soft QCD background. It samples iteratively from a Bayesian posterior PDF that encodes information as to which particles are more likely to originate from either process, discriminating based on signal and background particle-level kinematics. In particular, with reference to individual particle pseudorapidity η, i.e. the kinematic variable related to the particle polar angle θ in the laboratory frame by η = −log(tan(θ/2)), the distribution of particles originating from a hard quark or gluon scattering is typically more biased toward zero, i.e. the corresponding particles are more "central" in the detector as compared to particles associated with soft QCD interactions.

The statistical model is a convex combination of particle-level PDFs corresponding to the hard scattering and to soft QCD background: α_0 f_0(η, p_T) + α_1 f_1(η, p_T). The quantity α_0 (α_1) is the fraction of background (signal) particles, and f_0 (f_1) is the background (signal) PDF. In practice, in the study described in [7], most of the discrimination power comes from the η distributions. The PDFs f_j are estimated by regularising η histograms based on spline interpolation of the bin contents. This provides the statistical model with the flexibility required to describe generic deviations of the PDF shapes from those of the corresponding control sample templates due to fluctuations. The symbol ϕ_j will be used throughout to refer to such estimates of the PDFs f_j. The pseudocode is given below, v^(t) referring to the value of variable v at iteration t.

1. Initialization: set α^(0) = {α_j^(0)}_j, j = 0, 1, and obtain estimates ϕ_j^(0) of the subpopulation PDFs f_j by regularising the corresponding distributions from a high-statistics control sample.

2. Iteration t:

   (a) Generate the "allocation variables" z_ij^(t), for all particles i = 1, ..., N and j = 0, 1, based on the conditional probabilities

       P(z_ij^(t) = 1 | α_j^(t−1), ϕ_j^(0), x_i) = α_j^(t−1) ϕ_j^(0)(x_i) / (α_0^(t−1) ϕ_0^(0)(x_i) + α_1^(t−1) ϕ_1^(0)(x_i)).

       The quantity z_ij^(t) equals 1 when observation i is mapped to distribution j at iteration t, and 0 otherwise.

   (b) Map individual particles to signal or background based on z_ij^(t), and set α_0^(t) (α_1^(t)) to the fraction of particles mapped to background (signal) at iteration t.
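For concreteness, the following is a minimal, self-contained Python sketch of this sampling scheme. It is written for illustration and is not the implementation used in [7]: the toy control samples, the binning, the number of iterations, the burn-in length, and the helper names (spline_template, run_sampler) are all assumptions, and the templates ϕ_j^(0) are kept fixed across iterations, as in the pseudocode above.

```python
# Illustrative sketch of the allocation sampler; not the published code of [7].
# Toy control samples, binning, and burn-in length below are assumptions.

import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)

def spline_template(control_sample, bins=40, lo=-5.0, hi=5.0):
    """Regularise a control-sample eta histogram by spline interpolation
    of the bin contents, returning a callable estimate phi_j of the PDF f_j."""
    counts, edges = np.histogram(control_sample, bins=bins,
                                 range=(lo, hi), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    spline = CubicSpline(centres, counts)
    # Clip to keep the density estimate strictly positive.
    return lambda x: np.clip(spline(x), 1e-12, None)

# Toy control samples: signal eta peaks at zero ("central"), background is broad.
phi_0 = spline_template(rng.uniform(-5.0, 5.0, 100_000))  # j = 0: background
phi_1 = spline_template(rng.normal(0.0, 1.5, 100_000))    # j = 1: signal

def run_sampler(eta, n_iter=500, burn_in=100):
    """Iterate steps (a) and (b); return the final mixture weights and the
    per-particle signal probabilities averaged over the post-burn-in chain."""
    alpha = np.array([0.5, 0.5])          # alpha^(0): initial mixture weights
    signal_count = np.zeros(len(eta))
    kept = 0
    for t in range(n_iter):
        # (a) conditional probability that particle i comes from the signal
        w0 = alpha[0] * phi_0(eta)
        w1 = alpha[1] * phi_1(eta)
        z = rng.random(len(eta)) < w1 / (w0 + w1)  # allocation variables z_i1
        # (b) set the mixture weights to the mapped fractions
        alpha = np.array([1.0 - z.mean(), z.mean()])
        if t >= burn_in:
            signal_count += z
            kept += 1
    return alpha, signal_count / kept

# Example: a toy "event" with 200 background and 100 signal particles.
eta_event = np.concatenate([rng.uniform(-5.0, 5.0, 200),
                            rng.normal(0.0, 1.5, 100)])
alpha_hat, p_signal = run_sampler(eta_event)
```

Averaging the allocation variables over the post-burn-in iterations yields a per-particle signal probability; under the same assumptions, estimated PDF shapes could then be recovered by regularising the η distributions of the particles mapped to each class.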
As described in [7], the algorithm was inspired by the Gibbs sampler [9], and its development was influenced by a number of statistical techniques including Expectation Maximisation [10], Multiple Imputation [11], and Data Augmentation [12].

As anticipated, a remarkable feature of this method is the possibility of estimating the effect of fluctuations on the shapes of the PDFs that correspond to particles originating from a signal hard parton scattering as opposed to low-energy QCD background. The shapes of the particle-level signal and background PDFs in a given data set can in fact differ notably from those of the corresponding templates obtained from high-statistics control samples, where the effect of fluctuations on the PDF shapes is normally averaged out. As expected, the deviation of the actual PDF shapes from the shapes of the corresponding control sample templates in general becomes more and more notable as the number of particles in the input data set is reduced.

The algorithm estimates the shapes of the PDFs corresponding to particles associated with a signal hard parton scattering as opposed to soft QCD background. This is done by iteratively mapping particles to signal or background, using the data to refine initial conditions obtained from the control samples. The effect of fluctuations on the PDF shapes is encoded in the stationary distribution of the Markov Chain, the existence and uniqueness of which are discussed in [7, 8].

Figure 1 (a) displays the true η distribution of particles from Monte Carlo gg → tt̄ normalised to unit area (points), superimposed with the PDF template obtained from a high-statistics control sample (curve) [7]. The figure shows how the true distribution deviates from the control sample template due to the presence of fluctuations in the data. Figure 1 (c) shows the same true distribution (points) superimposed with the PDF estimated using the algorithm (curve). The agreement with the true distribution is remarkably improved, corresponding to χ²/ndof = 0.… as opposed to χ²/ndof = 38.… from figure 1 (a). The corresponding ratios are given in figures 1 (b) and (d).

Since the data set used in [7] comprises a number of particles that is in line with typical LHC event multiplicities, it makes sense to use those results to illustrate the anticipated performance of the algorithm on individual LHC events. Figure 1 (e) shows the difference between the signal η PDF estimated by the algorithm and the control sample template, normalised to the latter. The vertical band corresponds to a hypothetical jet around η = −….

Figure 1: (a) Monte Carlo true signal particle η PDF (points), superimposed with the corresponding control sample template (curve) [7]. The plot highlights the effect of fluctuations on the PDF shape, χ²/ndof = 38.…. (b) Ratio between control sample and true PDF corresponding to (a). (c) The same true distribution (points) superimposed with the PDF estimated by the algorithm (curve), χ²/ndof = 0.…. (d) Ratio between estimated and true PDF corresponding to (c). (e) Relative difference between estimated and control sample PDF. For the sake of illustration, the vertical band represents a hypothetical jet around η = −….
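The practical consequence of such shape differences for jet-level quantities, discussed next, can be made concrete with a short sketch: summing the per-particle signal probabilities returned by run_sampler above over a jet's η window gives the expected number of signal particles inside that jet. The window edges and the function name are illustrative assumptions.

```python
# Hypothetical illustration: expected signal content of a jet, obtained by
# summing per-particle signal probabilities over the jet's eta window.

import numpy as np

def expected_signal_in_window(eta, p_signal, eta_lo, eta_hi):
    """Sum P(signal) over the particles falling inside [eta_lo, eta_hi]."""
    in_jet = (eta >= eta_lo) & (eta <= eta_hi)
    return float(np.sum(p_signal[in_jet]))

# e.g. expected_signal_in_window(eta_event, p_signal, -2.5, -1.5), with
# eta_event and p_signal taken from the sampler sketch above.
```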
Given that the relative difference between the actual PDF and the control sample template can be as high as 20% in that interval of η, if one were to map individual particles inside such a hypothetical jet to signal or background using the control sample PDF, i.e. neglecting the effect of fluctuations, the number of signal particles inside the jet would be underestimated by as much as 20%. For this reason, this technique can also be used to obtain precise estimates of the fraction of soft QCD particles inside individual jets, thereby taking into account the effect of fluctuations at the particle level.

Finally, with regard to execution time, the algorithm processed the Monte Carlo data sets used in [7] in ∼ … s on a 2 GHz Intel processor with 1 GB of RAM, without any optimisation. We consider such performance reasonable for offline use.

Our hope is that this new approach will complement existing techniques for the subtraction of low-energy QCD background at hadron colliders. In fact, since it is based on a different principle and works in a different way as compared to state-of-the-art techniques, we expect it to further improve physics analysis in high-luminosity hadron collider environments when used in combination with existing methods. We anticipate that particle-level event filtering will provide a more significant contribution as pileup rates and average event multiplicities increase, e.g. to improve jet mass and /p_T resolution, depending on the analysis. We also expect the ability to reject soft QCD contamination particle by particle, thereby taking into account the effect of fluctuations on the PDF shapes inside individual events, to further improve background subtraction inside fat jets from decays of possible new heavy particles.

CONCLUSIONS

We have discussed the potential of our algorithm [7] to implement a novel particle-by-particle filtering procedure for individual events at hadron collider experiments, which we envisage as a possible new data processing stage upstream of the existing jet reconstruction pipelines. One central aspect is the possibility of mapping individual particles to a hard scattering as opposed to low-energy QCD background, thereby taking into account the effect of particle-level fluctuations on the shapes of signal and background PDFs inside individual events.

We have shown that, if one is to map individual particles inside events to a hard parton scattering as opposed to soft QCD interactions, using PDFs obtained from independent high-statistics control samples does not take into account the effect of fluctuations, and can lead to a shift in the estimated number of signal particles as high as 20%. On the other hand, the particle-level PDFs estimated using our algorithm were found to be in remarkable agreement with the true distributions on the Monte Carlo data sets analysed in [7]. This method can therefore also produce precise estimates of the fraction of soft QCD particles inside individual jets.

Our hope is that this approach will improve the resolution of jet observables in high-luminosity environments when used in combination with state-of-the-art techniques such as jet grooming algorithms, e.g. with reference to the mass of fat jets from boosted decays of possible new heavy particles.
More generally, it is our opinion that particle-by-particle filtering of individual events based on high-precision particle-level PDFs has the potential to become a useful ingredient of physics analysis at future high-luminosity hadron collider experiments.

ACKNOWLEDGEMENTS

The author wishes to thank the High Energy Physics Group at Brunel University for a stimulating environment, and particularly Prof. Akram Khan and Prof. Peter Hobson. Particular gratitude also goes to the High Energy Physics Group at University College London, especially to Prof. Jonathan Butterworth for his valuable comments. The author also wishes to thank Prof. Trevor Sweeting at the UCL Department of Statistical Science, as well as Dr. Alexandros Beskos at the same department, for fruitful discussions. Finally, particular gratitude goes to Prof. Carsten Peterson and to Prof. Leif Lönnblad at the Department of Theoretical Physics, Lund University.
REFERENCES

[1] Nayak A K (for the CMS Collaboration) 2013 CMS CR-2013/019
[2] The CMS Collaboration 2010 CMS PAS JME-10-003
[3] Cacciari M and Salam G P 2008 Phys Lett B 659:119-26
[4] Butterworth J M, Davison A R, Rubin M and Salam G P 2008 Phys Rev Lett 100:242001
[5] Krohn D, Thaler J and Wang L T 2010 J High Energy Phys 02:084
[6] Ellis S D, Vermilion C K and Walsh J R 2010 Phys Rev D 81:094023
[7] Colecchia F 2013 J Phys: Conf Ser …
[8] Colecchia F … J Phys: Conf Ser …
[9] Geman S and Geman D 1984 IEEE T. Pattern Anal. 6:721-41
[10] Dempster A P, Laird N M and Rubin D B 1977 J Roy Statist Soc Ser B 39(1):1-38
[11] Rubin D B 1987 Multiple Imputation for Nonresponse in Surveys (New York: J Wiley & Sons)
[12] Tanner M A and Wong W H 1987 J Amer Statistical Assoc 82:528-40