Wavelet Analysis: Event De-noising, Shower Evolution and Jet Substructure Without Jets
WWavelet Analysis: Event De-noising, Shower Evolution and Jet SubstructureWithout Jets
J. W. Monk
Niels Bohr Institute, University of Copenhagen, Denmark
Wavelet decomposition is a method that has been applied to signal processing in a widerange of subjects. The decomposition isolates small scale features of a signal from large scalefeatures, while also maintaining information about where in the signal those features occur.Wavelets obey particular scaling relations and are especially suited to the analysis of systemsthat are self-similar and scale invariant. They are therefore a natural tool for the study ofhadron collisions.This paper introduces the use of wavelets for de-noising (removal of soft activity), studyingthe scaling behaviour of a shower, and recognising jets according to this behaviour. This isdemonstrated by processing a sample of boosted W boson Monte Carlo events together withtheir QCD background. The method is quite general and can be used as a pre-processingstep in conjunction with any jet-finder or other event-shape algorithm. The result in thissimple example is a significant enhancement in both the size and shape of the W boson masspeak, together with an improved separation of the background distribution.
I. INTRODUCTION
The ideas presented here grew out of attempts to identify diffractive processes at the LargeHadron Collider (LHC) using Fourier transforms [1]. Intense QCD activity at the LHC masks thefragile rapidity gap signature of such processes. A Fourier analysis decomposes a proton collisionevent into separate contributions to the energy flow according to their angular scale. One drawbackto Fourier transforms is that they do not give any information on where in a signal a particularcontribution occurs, and there are edge effects due to the inhermetic nature of detectors at hadroncolliders.The short-time Fourier transform (STFT) is a modification of the Fourier transform in which thesinusoidal basis functions are modified by multiplying by a Gaussian of fixed width. This Gaussianwindow localises the contribution from each basis function and addresses some of the drawbacks ofthe basic Fourier transform.However, the STFT is not optimal because of the fixed width of the windowing Gaussian; highfrequency, small scale components may in principle be localised to a much smaller region that low a r X i v : . [ h e p - ph ] M a y frequency, large scale structures, yet the STFT uses the same width for both small- and large-scalecontributions. The mathematical tools for wavelet analysis were developed in order to overcomethe limitations of the STFT.Wavelets [2, 3] have been something of a hot topic within mathematics, with a wide rangeof applications and well developed sophisticated technologies. Wavelets have had an impact in arange of diverse subjects, including geophysics, cosmology and astrophysics, speech recognition,communication and signal compression, cardiology, image processing, and analysis of financial andother systems that exhibit fractal behaviour. Wavelet-like analysis is also employed by biologicalsystems, including human hearing and vision. When a physicist visually identifies a pattern ofcalorimeter activity on an event display as a jet, they are using wavelet-like analysis to do so.Despite this deep utility and relevance to the natural world, wavelets have rarely been used withinthe field of particle physics. Theoretical studies of scaling relations in soft QCD were made in the1990s [4], long before the advent of modern jet analysis techniques at the LHC. More recently,wavelets were proposed as a method for uncovering faint event shapes caused by jet quenchingin heavy ion collisions [5], which is not dissimilar to my original plan to study diffraction usingharmonic analysis. There has also been a recent study on wavelets and boosted bosons [6] that iscoincident with the work outlined here. Although that study ostensibly shares the same topic, thepresent work is unrelated in either form or origin, and takes a different approach.On applying the wavelet transform, a signal is decomposed into a set of wavelet basis functions,which in the case of the discrete version of the transform are usually orthogonal. This decompositionis formally expressed in equation 1 P ( φ ) = (cid:88) m,n C mn ψ (cid:18) ( m − M ) φS − n (cid:19) (1)where ψ ( x ) is the wavelet basis function and S is an arbitrary scale choice, typically the resolutionlimit of the signal, that is, S = Φ / M , where Φ is the physical length of the signal and M is thenumber of samples present in the discrete signal. The index n of the wavelet coefficient C mn inequation 3 is used as a translational parameter in the wavelet basis function ψ . The index m scalesthe basis function; the wavelet basis functions at two different scale indices k and m are related by ψ ( k, φ ) = (cid:114) k m ψ (cid:18) m, k m φ (cid:19) (2)in other words, the basis function at scale k is simply a re-scaling of the basis function at scale m . This scaling relation is lacking from the STFT and is the key feature that makes the wavelettransform an exciting tool for analysing hadron collisions. Note also that, as with the choice ofscale in any renormalisable system, the choice of S is in principle arbitrary, though in practice itwill be limited by the resolution of any measuring apparatus. Wavelets are therefore a natural toolfor analysing scale invariant and self similar systems. C mn = (cid:115) ( m − M ) S (cid:90) ¯ ψ (cid:18) ( m − M ) φS − n (cid:19) P ( φ ) dφ (3)The coefficients of the discrete wavelet transform, C mn , are given by equation 3 , which isevaluated on a discrete dyadic grid, so that incrementing the scale index m by one halves thephysical scale. The translational index n has integer values between and m , meaning that fullyhalf of the wavelet coefficients C mn are used to encode the highest frequency components of thesignal. Applying a threshold to the wavelet coefficients is therefore a common method of removinghigh frequency noise and compressing a signal. The dyadic grid of scales and locations reflectsthe uncertainty principle. At high frequency (large m , small scale), there are a large numberof coefficients closely spaced in φ , giving good positional information, but with poor frequencyinformation due to the large gap between frequencies. On the other hand, at low frequenciesthere are a small number of coefficients that are widely spaced in φ , which gives poor positionalinformation, but with good frequency resolution.The discrete wavelet decomposition is an iterative procedure that can be understood as thecombination of a low-pass and high-pass filter. The full signal is initially convoluted with the set ofsmallest scale wavelet basis functions, giving the highest frequency information in the signal. Thisis the high pass filter. The signal is then re-scaled by convoluting with the wavelet scaling function,which has the effect of removing fine detail from the signal. This second step is the low pass filter.The high pass filter is then run again by convoluting the now smoothed signal with the next smallestscale wavelet basis functions, and so on until only the average of the signal remains. As a usefulpedagogical exercise, an example decomposition of a sequence of eight numbers is performed byhand in Appendix A using the Harr wavelet.It is illuminating to compare this iterative procedure with the basic principle by which a MonteCarlo shower algorithm works, which is itself an expression of the renormalisation group [7]. Ashower algorithm uses a fixed-order splitting kernel, which is then evaluated in a sequence orderedin some evolution parameter. Depending on the shower Monte Carlo in question, the evolutionparameter can include transverse momentum, virtuality or emission angle. Each scale in the evo-lution parameter results in a different emission pattern probability, and the emission at one scaleprovides the input for the splitting kernel at the next scale in the sequence. The emission patternat one scale is nothing more than a re-scaling of the pattern at another scale. If the shower splittingkernel were to satisfy the requirements for a wavelet basis function, then the shower evolution isexactly the same as the reconstruction of a signal from its wavelet components. It is an interestingand tantalising question as to whether QCD could be usefully expressed in terms of such a waveletbasis. Even in the absence of such a fully optimised basis, there are a sufficiently large number ofknown wavelet functions that some of them must surely provide a useful framework in which tostudy QCD.The hope then is that wavelets can provide a method to break down a hadron collision intocontributions arising from small angles, large angles and everything in between. Doing so allowsone to isolate, and remove if desired, contributions arising from non-perturbative physics, to studythe shower evolution and extract observables using only emissions arising at particular scales, andto search for characteristic scales present in the event, which could indicate the presence of knownor unknown physical processes. II. APPLICATION OF A WAVELET TRANSFORM TO HADRON COLLISIONS
The aim of this technique is to perform the wavelet transform of the radiation pattern in anevent, which will provide information on the scales at which emissions are produced. The directionof an emission is described by two coordinates, φ , the azimuthal angle, and y the rapidity. Point-likeparticle emissions are not suitable for input to the discrete wavelet transform, therefore before itcan be analysed, the collision event must be rasterised , that is, turned into a two-dimensional arrayof numbers. A two-dimensional histogram is produced in y − φ for each individual event. Thecontents of each bin of this histogram is the sum of the transverse momenta of all the particles thatfall in that bin for a given event. This two-dimensional histogram is effectively an image of theradiation emission pattern in the event, with each bin an image pixel, and the sequence of pixelswill is used as the input to the discrete wavelet transform. The size of the bins or pixels determinesthe limit of the angular resolution available in the wavelet transform.Rasterising the event in this way acts as a convenient prophylactic against infra-red divergences.While a formal proof of infra-red safety is notoriously difficult [8], intuitively, any result obtainedfrom the rasterised representation cannot be sensitive to soft or collinear emissions because suchemissions do not change the transverse momentum sums of any of the image pixels. The rasterisedevent is also rather similar to the structure of calorimeters used in modern particle physics detectorexperiments. The size of the image grid in φ is π , and to ensure an almost equal coverage inrapidity, it is convenient to take a rapidity range of | y | ≤ . , giving nearly square pixels. Symmetrybetween the φ and y coverage is not a requirement of the transform, but it makes the results easierto interpret because the wavelet scaling factor in the φ direction will have the same meaning asthe scaling factor in the y direction. For the same reason, it is also convenient to have an equalnumber of histogram bins, N , along the φ and y axes, with N being radix two a requirement ofthe discrete wavelet transform. In this initial study the choice N = 128 was used, which givesan angular resolution of approximately ∆ R = 0 . . This is probably somewhat smaller than thebest angular resolution currently available from hadronic calorimeters, but this choice allows thetechnique to be studied under a best case scenario. Advanced experimental techniques that combinecharged particle tracking and fine-grained calorimetry mean such an angular resolution limit is notunfeasible with current detectors.Having constructed the rasterised event representation, it can then be subjected to a wavelettransformation. Several freely available computer libraries exist for carrying out wavelet transfor-mations, although they are often specialised to tasks such as image manipulation or geophysics.The GNU Scientific Library [9] (GSL) includes libraries for performing wavelet transformation, andwas used in this example because it is relatively simple and general, although it has the downsideof not providing a large choice of basis wavelet functions. Future studies may benefit from thedevelopment of a wavelet library designed around the needs of collider physics, and could providea wider range of wavelet bases, possibly including newly designed bases for QCD.The wavelet transform returns a set of N × N coefficients. Each coefficient can be identified witha wavelet level m , which specifies the scale of the coefficient, and a translation index, n . Since thetransformation is of a two-dimensional input signal, each coefficient has two indices for the scaleand two indices for the translation, which specify the scale and translation along both the y and φ axes. If the y scale and translation indices are m, n , and the φ scale and translation indices are j, k , then each pair of m − j values specifies a N × K sub-array of coefficients covering all y and φ translations, with N = 2 m and K = 2 j . Each of these sub-arrays encodes the contribution to theemission pattern at a particular angular scale.Having thus transformed to the wavelet basis and separated the contributions to the eventfrom different angular scales, various selection criteria may be imposed. For example, coefficientscorresponding to small (or, conversely, large) angular scale may be removed. Given the expectedcorrelation between the wavelet scale and the shower evolution parameter, such a filter would isolatethe soft or hard contributions to the radiation pattern. Alternatively, any wavelet coefficient withmagnitude below a given threshold may be removed, which is a standard technique for de-noisinga signal, and has the effect of removing small contributions, regardless of their angular size. Thiswould have the effect of removing pile-up and underlying event contributions from the event.Once the desired selection and analysis is performed in the wavelet domain, the modified coeffi-cients should be subjected to the inverse wavelet transform, which returns a now-modified versionof the input rasterised event. This N × N array is not amenable to normal analysis techniquessuch as jet finding, which requires particle four-vectors. However, the per-pixel ratio of the out-put rasterised event to the input rasterised event can easily be computed. It is then possible todetermine in which pixel of this ratio each particle in the event lies, and multiply the particle mo-mentum by that corresponding ratio. The result is a list of particles whose momenta have beenmodified by the analysis and selection in the wavelet space, and which can be subjected to jet-finding or any other analysis techniques (including event shapes) in the usual way. This process ofrasterisation → analysis → de-rasterisation is easy to understand visually, and is sketched in figure1. Note that the wavelet analysis can occasionally lead to negative values in the ratios; it doesnot makes sense to multiply a particle’s momentum by a negative number, since doing so wouldreverse its direction of travel. In this situation, the particle can either be removed entirely, or canbe treated as a “ghost” particle, whose contribution can subsequently be removed from jets or othercomposite objects after jet-finding.Apart from the aforementioned intrinsic suitability of wavelets for analysing QCD, this techniqueoffers several advantages over standard jet substructure [10, 11] techniques: • The wavelet is applied to, and harvests information from, the entire event; whereas jet sub-structure operates only on jet constituents, a wavelet analysis can also use large-scale cor-relations that extend beyond the jet under consideration, including soft emissions at wideangles that would not normally survive jet-selection requirements. • The wavelet is completely independent of the jet algorithm, indeed it does not require a jetalgorithm at all. Some substructure techniques only work with certain jet algorithms, butthe wavelet analysis can be used as a pre-processing step with any jet algorithm. Waveletscan also be used with other event shape observables. • The angular scale selected in a wavelet analysis is not dependent on a jet radius parameter, R . This means it is possible to isolate, for example, the wide-angle contribution to a small- R Input particle vectorsInput rasterised event Rasterised event after wavelet analysis Ratio of analysed rasterised event to input rasterised eventOutput particle vectors after scaling momenta by above ratio
FIG. 1: Sequence showing the wavelet analysis of a hadron collision. Starting with an initial set of inputparticle vectors, the event is first rasterised to give a two dimensional histogram (left-hand image). Standardcomputational tools for discrete wavelet analysis can be applied to this rasterised representation of the event,resulting in an updated version of the rasterised event (middle image). The ratio of the updated rasterisedevent to the initial rasterised event is calculated, and used to modify the input particles’ momenta (right-hand image). jet or, conversely, the small angle contribution to a large- R jet. Standard jet filtering relieson constructing small- R jets within larger- R jets, and the angular size of the contributionsare the same as the jet size. • The wavelet analysis does not remove particles. Traditional substructure techniques removeentire particles or sub-jets that are deemed to be soft in origin, but the nature of interferencein QCD means that no single particle should be labelled as originating from either hardor soft interactions. The wavelet analysis estimates a wide- and small-angle contributionto each particle, and modifies the particle accordingly in order to isolate or remove thosecontributions. This approach follows the spirit of quantum mechanics.
III. WAVELET DE-NOISING OF BOOSTED W-BOSONS
A sample of two million Monte Carlo events containing W bosons was generated using thePythia 8.183 event generator [12] with tune AU2 [13] using the CTEQ6L1 [14] parton distributionfunction. A second sample of ten million QCD events was generated to provide a background.Events were generated weighted in transverse momentum ( p T ), with a minimum partonic p T of180 GeV. Pythia is a leading order event generator, and as such was not expected to describe high p T boson production, but this analysis was mainly concerned with the behaviour of QCD showers,for which a leading order shower Monte Carlo was fast and convenient.The samples were analysed using the Rivet framework [15], with Cambridge-Aachen jets [16] ofradius R = 1 . constructed using the FastJet library [17]. In the case of the sample of W-bosons, thejets were required to be matched to the W-boson in the event record with an angular separation nogreater than ∆ R < . . The closest such matched jet was taken if there were multiple candidates.The masses of all such jets with p T greater than GeV and rapidity y ≤ are shown in thedashed teal curves of figure 2.Events were then subjected to the wavelet decomposition, using the Daubechies d4 wavelet anda N × N grid size of × , as described in section II. The Daubechies family of wavelets arebetter at encoding high frequency information than the simplest Harr wavelet.The resulting wavelet coefficients were filtered by setting any coefficient, C jkmn , to zero if | C jkmn | < GeV. The noise threshold of GeV was chosen as it is approximately the scale belowwhich QCD becomes non-peturbative. The performance of the de-noising algorithm is not stronglydependent on the value of the noise threshold, which could in any case be optimised by studyingthe behaviour of soft inelastic (“minimum bias”) collisions in the wavelet domain.After inverting the wavelet transform and obtaining the ratio of the modified to input event,the Cambridge Aachen jet algorithm was run again, this time on the modified set of particles. Thesame jet selection criteria of p T ≥ GeV and | y | ≤ were used, and the masses of the resultingwavelet de-noised jets are shown as the orange curves of figure 2.It is immediately clear from figure 2 that the shape of the signal is very much improved bythe simple wavelet de-noising. Prior to de-noising, the signal jets have significant contributionsfrom QCD effects such as underlying event and initial and final state showers. This gives a rather [GeV] J M ] - [ pb G e V J d M s d W jetsAfter wavelet de-noising a) [GeV] J M ] - [ pb G e V J d M s d QCD jetsAfter wavelet de-noising b)FIG. 2: Mass of R = 1 . Cambridge Aachen jets with p T ≥
300 GeV and | y | ≤ after de-noising there be a jet in the event whose mass, M J , satisfies | M J −
80 GeV | <
15 GeV and, inthe case of the sample of W events, that jet be matched to the W-boson. As before, the (de-noised)jets were also required to satisfy p T > GeV and | y | <
2. The distribution of the magnitudesof the wavelet coefficients of such events is shown in figure 3. The reason why the de-noising isnot particularly sensitive to the noise threshold of 1 GeV is immediately clear; the coefficients aresharply peaked near zero, and most of the activity that is removed by de-noising lies well belowthe threshold. Figure 3 also shows that, compared to QCD events, W-boson events have a slightlylarger number of significant wavelet coefficients. The number of significant wavelet coefficients isan indication of the information content of the event. Indeed, the removal of the least significantwavelet coefficients is an effective (lossy) data compression algorithm used in a range of applications,including image and sound compression. The larger information content of the W events indicatesthey posses more structure than the QCD events.The rms of the wavelet coefficients all having the same angular scale provides a measure of theamount of activity occurring at that scale. The coefficients that are removed during de-noisingwere grouped together according to their angular scales (there are two such scales for φ and y ), andthe rms was calculated at each scale. Figure 4 shows the rms of the coefficients that are removedduring de-noising as a function of the angular scales in both rapidity and azimuth. The rms ofthe wide-angle coefficients that are removed is around twice that of the small-angle coefficients.Figure 4 also shows that slightly more activity is removed from QCD events compared to W-bosonevents, and that there is a plateau in the activity removed between scale index zero and scale index2, which corresponds to angular size between approximately π and π/ . There is also a small1 ] - [ G e V | m n d | C d N N -4 -3 -2 -1 QCDW | [GeV] mn |C W / Q CD ] - [ G e V | m n d | C d N N -4 -3 -2 -1 QCDW | [GeV] mn |C W / Q CD ] - [ G e V | m n d | C d N N -4 -3 -2 -1 QCDW | [GeV] mn |C W / Q CD QCDW
FIG. 3: Spectrum showing the magnitude of the wavelet coefficients in both QCD events (dashed teal) andW events (solid orange). The coefficients are sharply peaked at zero, indicating that most of the activity inan event is encoded in a small number of coefficients. The bottom panel shows the ratio of coefficients inW jet events to that in QCD jet events. The shaded band between 0 and 1 GeV indicates the region thatis removed during de-noising. difference in shape between the removed activity as a function of azimuthal scale and the removedactivity as a function of rapidity. At the largest scales, the activity is flatter in azimuth comparedto rapidity. This small difference in the behaviour as a function of rapidity compared to azimuthpossibly indicates the presence of colour connection effects between the beam direction and emittedpartons.2 m y ~ 6.4 / 2 D y m R M S o f c o e ff i c i e n t [ G e V ] W jetsQCD jets a) m / 2 p ~ 2 fD f m R M S o f c o e ff i c i e n t [ G e V ] W jetsQCD jets b)FIG. 4: Angular scales of the activity removed from events by de-noising. The x-axis displays the waveletscale index ( m ), which indicates the angular scale of the contributions. The y-axis shows the rms of thecoefficients removed at that scale, which is an indication of the amount of activity removed. The top plot(a) shows the rms as a function of the rapidity scale, and the bottom plot (b) shows the rms as a functionof the azimuthal scale. IV. WAVELET BASED JET RECOGNITION
The different behaviour of W and QCD events in the wavelet domain reflects different eventstructures, and creates the possibility that the two types of event may be separated by waveletanalysis. The contribution of a given wavelet coefficient to a jet (whether de-noised or not) may bedetermined by rasterising the event as in figure 1, setting the wavelet coefficient(s) of interest to zeroand then modifying the jet constituents according to the ratio of modified-to-input rasterised event.The difference between the unmodified jet constituents and the jet constituents after modificationwas considered to be the contribution to the jet from the wavelet coefficient(s) under consideration.It was therefore possible to obtain a measure of how a jet evolves with angular scale. Examplesof such jet-evolution profiles are shown in figure 5 for jet mass and jet p T . Only jets satisfying | M J −
80 GeV | < were used, and in the case of the W sample, jets were required to bematched to the W-boson as described in section III. The y-axis in figure 5 shows the fraction of ajet’s mass or p T that is contributed by keeping only those wavelet coefficients with both rapidity andazimuthal levels below the value stated on the x-axis. The profile was produced by initially includingall but the very smallest scale contributions, in this case, those coefficients whose azimuthal andrapidity levels are both smaller than 7. The difference between the jet mass obtained by includingonly those coefficients and the full jet mass is the contribution to the jet mass that occurs at waveletlevel 7. Similarly, the difference between the jet mass using only coefficients with levels smaller than6 and the jet mass using coefficients with levels smaller than 7 is the contribution to the jet massoccurring at level 6. The procedure was repeated to give the jet mass occurring at all levels downto level 0, which corresponds to the largest angular scale. The jet mass contributions at each levelwere then divided by the full jet mass to give a mass fraction. The same procedure was repeatedwith jet p T instead of mass.Figure 5 shows a clear difference between QCD- and W-jets in the evolution of both the jet massand jet p T . Further, there is a difference in the evolution of the p T and the mass. In both cases, theW-jets have larger contributions coming from small angular-scales (large wavelet level). At waveletlevels below around 5, which corresponds to angular scales larger than around 0.2, the QCD-jetsshow larger contributions than the W-jets. The jet-mass evolution shows a larger contribution toboth QCD- and W-jet masses at wavelet level zero, compared to jet p T , which has a negligiblecontribution at level zero. Level zero is the widest angular contribution, covering the full angularrange. The contribution to jet p T grows more linearly with wavelet level than the contribution tojet-mass, although in the case of QCD-jets there is a flattening of the curve between levels 6 and 7.4 ) m p
2R ~ D m ( > J / M m M D < W jets (de-noised)QCD jets (de-noised) a) ) m p
2R ~ D m ( > T J / p T p D < W jets (de-noised)QCD jets (de-noised) b)FIG. 5: Jet evolution profiles for jet mass (a, top plot) and jet p T (b, lower plot). Jets have wavelet de-noisingapplied. The y-axis indicates the average contribution to a jet mass or p T arising at given angular scale,which is indicated by the wavelet level on the x-axis. Jet evolution profiles are calculated by sequentiallyremoving wavelet contributions, starting at the smallest scale (the largest wavelet level). m = 5 the W-jets have a more sharply peaked mass fraction distribution, with the peakoccurring at a lower mass fraction compared to QCD-jets. The mass fraction distribution at level is rather similar between QCD- and W-jets, and at higher wavelet levels the W-jets favour largemass fractions compared to QCD-jets. The general trend is for W-jets to show more small-scalestructure compared to QCD-jets, which tend to show somewhat more large-angle activity. This fitsthe expectation for QCD-jets to be more diffuse and lack such a hard-core compared to W-jets.The mass fraction plots of figure 6 form a set of probability density functions (PDFs) that wereused to classify the jets. The TMVA package [18] was used to convert the eight mass fraction plotsinto two sets of eight PDFs, P W ( m, M m ) and P QCD ( m, M m ) for the W- and QCD-jets respectively,where m is the wavelet level and M m is the mass fraction at that level. The likelihood that a jetwas either a W-jet or a QCD-jet was then determined by calculating its eight mass fractions at theeight wavelet levels, evaluating the PDFs and taking their product, as in equation 4. L W = (cid:89) m =0 P W ( m, M m ) L QCD = (cid:89) m =0 P QCD ( m, M m ) (4)The likelihood ratio, L R , is a measure of whether a jet is more likely to be a W-jet than aQCD-jet, and is given in equation 5 L R = L W L W + L QCD (5)The L R distribution for both de-noised W-jets and de-noised QCD-jets is shown in figure 7, withjets again required to have masses in the W mass window of | M J −
80 GeV | <
15 GeV , and in thecase of the sample of W-jets, be matched to a W-boson. The QCD- and W-jets are well separatedby the likelihood ratio, with W-jets strongly favouring L R values close to 1, while QCD-jets favourvalues close to 0. Selecting jets according to their L R value is therefore a good way of rejectingQCD background jets while keeping most signal W-jets.Figure 8 shows the de-noised jet mass distribution for QCD- and W-jets, together with theircombination, when selecting jets that satisfy L R > . and L R > . . The process of wavelet de-noising followed by jet recognition hugely increases the signal to background ratio of W-jets to QCD-jets; the differential QCD-jet cross section in the region of the W-mass is around 55 pb − GeV − J = M(m) /M M d M d N N m=0W jetsQCD jets J = M(m) /M M -0.05 0 0.05 0.1 d M d N N m=1W jetsQCD jets J = M(m) /M M d M d N N m=2W jetsQCD jets J = M(m) /M M -0.1 0 0.1 0.2 0.3 d M d N N m=3W jetsQCD jets J = M(m) /M M d M d N N m=4W jetsQCD jets J = M(m) /M M d M d N N m=5W jetsQCD jets J = M(m) /M M d M d N N m=6W jetsQCD jets J = M(m) /M M d M d N N m=7W jetsQCD jets FIG. 6: Fractions of the jet-mass occurring at the different wavelet levels for W- and QCD-jets. Eachsub-plot corresponds to a single bin of the profile in figure 5a ) QCD + L W /(L W = L R L Likelihood Ratio R d L d N N QCD Jets (de-noised)W Jets (de-noised)
FIG. 7: Likelihood ratio for W-jets (orange) and QCD-jets (teal dashed), obtained by using the eight plotsof figure 6 as probability density functions. (figure 2b), and is reduced to around 1.8 pb − GeV − by wavelet analysis with L R > . , a reductionof 30:1. The signal W-jet mass distribution is on the other hand enhanced, although the L R selectionrequirement does reduce the signal peak in figure 8b compared to figure 2a. Selecting jets with evenlarger values of L R could further improve the signal to background ratio, though at the expense ofthe total number of signal events.The shape of the QCD background distribution in figure 8 clearly depends on the value of the L R selection criterion. Changing the jet selection to require larger values of L R enhances the bumpin the QCD M J distribution at between 20 to 30 GeV. This change in the shape of the distributionoccurs because there are different contributions to the QCD-jets from Feynman diagrams involvingboth quarks and gluons. While it is a fallacy to describe a single jet as either a quark or a gluon,the two (at a minimum) distinct populations of jets are a reflection of this underlying physics. Thatwavelet analysis is sensitive to these details of QCD production is in itself very interesting, and mayopen up new tests of QCD. Further, a more complete study may be able to separate these differentQCD-jet populations, and by doing so further enhance the separation of signal W-jets.8 [GeV] J,de-noised M ] - [ pb G e V J d M s d > 0.6 R L QCD + W JetsQCD JetsW Jets a) [GeV] J,de-noised M ] - [ pb G e V J d M s d > 0.9 R L QCD + W JetsQCD JetsW Jets b)FIG. 8: Mass distribution of de-noised jets after selection by wavelet analysis. A loose likelihood selectionof L R > . that preserves most signal jets is shown in a), while a tight selection of L R > . that improvesthe signal to background ratio at the expense of the total number of signal jets is shown in b). V. EFFECT OF MULTIPLE OVERLAID COLLISIONS
Coping with the effects of multiple overlaid hadron collisions (known as “pile up”) would seem tobe an obvious application for performing analysis in the frequency domain. Pile up adds a carpetof radiation throughout the detector, which adds to the momentum and mass of the jets, as wellas creating jets that would not otherwise be present. The overlaid pile up events are usually softerthan the hard collision of interest, with few or no high p T jets of their own. Some approaches to pileup removal include jet area subtraction, which subtracts from jets a momentum contribution basedon the average detector activity multiplied by the jet area, and jet substructure pruning techniques,which remove small radius low p T jets from large radius high p T jets.These two approaches to pile up removal represent the far extremes of the frequency domain.Jet area subtraction removes very wide angle contributions, arising from the average activity acrossthe entire detector. Jet pruning removes small localised contributions to large jets. An appropriatewavelet analysis on the other hand offers the opportunity to remove both the large scale contribu-tions from pile up, the small scale fluctuations that occur on top of the local background, and allangular scales in between.In order to perform studies with pile up, a further sample of Monte Carlo events was generated.Pythia 8 [12] tune A2 [13] with the CTEQ6L1 PDF [14] was used to generate a sample of 44 millionsoft inelastic events (including diffractive processes) at a centre of mass collision energy of 8 TeV.These soft QCD events were combined with the sample of W bosons using the PileMC package[19], which overlays a random number of pile up collisions on top of a signal sample. The numberof pile up collisions was distributed according to the Poisson distribution with a mean of 20.De-noising O (20) overlaid collisions differs from de-noising a single collision because the individ-ual collisions are not correlated. The noise threshold, t ( m y , m φ ) at wavelet levels m y and m φ wastherefore chosen according to equation 6 t ( m y , m φ ) = (cid:18) N pileup × . m y × m φ (cid:19) GeV (6)where N pileup is the number of overlaid pile up collisions in the event being analysed. Equation6 was motivated as follows: each soft inelastic collision contributes around 1 GeV of activity perunit of rapidity, or N pileup × . GeV in total within | y | < N pileup events. Each wavelet basisfunction at wavelet levels m y and m φ is localised to an area of approximately π/ m φ × . / m y ,which is a fraction / (2 m y × m φ ) of the total detector area available. A rough estimate of the0average pile up contribution to the wavelet coefficient at levels m y and m φ is therefore 1 GeV perunit of rapidity, multiplied by the number of pile up collisions and the fraction of the total area towhich the wavelet basis function is localised. To this is added a 1 GeV pedestal, which is the noisethreshold from section III, and accounts for soft contributions present in the hard signal collision.While this choice of threshold has not been fully optimised, it demonstrates an important featureof the wavelet analysis; the threshold for very wide-angle contributions can be made much higherthan that for small angle contributions. The motivation for this is that the wide-angle coefficientsreceive contributions from all of the overlaid collisions together, whereas the small-angle coefficientsare less likely to receive contributions from multiple collisions, but simply represent local fluctuationsarising from a single collision. The power of the wavelet analysis allows these different contributions,the detector-wide average and the local fluctuations, to be extracted and treated differently. An indepth study of soft physics in the wavelet domain would allow for a better understanding of themost appropriate threshold as a function of angular scale.Figure 9 shows the jet mass for the signal sample of W bosons after pile up has been overlaid.Prior to any wavelet analysis, the peak in the distribution is extremely broad, and has shiftedto around 170 GeV. After wavelet processing to remove those wavelet contributions below thethreshold in equation 6, there is a clear peak at the W-boson mass of 80 GeV. Compared to thesample without pile up, this peak is in fact higher, but the pile up has distorted the shape of thepeak, with a notable tail to higher jet mass. Applying the likelihood-based jet selection does notaffect either the shape or the height of the signal peak very much.The bias visible in the mass distribution may reflect a residual bias in the jet p T distribution.The jet p T distribution for de-noised W-jets with and without pile up are shown in figure 10. Thejets with pile up have a harder p T spectrum, which explains the greater height of the peak in thejet mass distribution with pile up. The jet-mass will also be correlated with jet p T , so the bias tohigher p T caused by pile up will also favour higher jet masses.Despite the remaining distortion in the shape of the mass distribution, wavelet analysis alone has successfully retrieved a clear mass peak from a very broad mass distribution that reflects pile upconditions consistent with the first data taking runs of the Large Hadron Collider (LHC). Furtheroptimisation of the wavelet threshold, a better understanding of the p T bias and the combineduse of other pile up techniques (such as using vertex information from charged particle trackingdetectors) offer plenty of scope for improving the behaviour in the presence of pile up beyond thisinitial study.1 (with pile up) [GeV] J M ] - [ G e V - p er pb J d M d N De-noised W jets > 0.9 R De-noised W jets, LW jets
FIG. 9: Effect of wavelet de-noising and selection on W-jet mass in the presence of pile up. The dashed tealcurve shows the matched jet mass distribution when an average of twenty soft inelastic collisions is overlaid.The solid orange curve shows W-jet masses from the same events after wavelet de-noising to eliminate pileup. The dotted purple curve shows the masses of those jets that pass the wavelet jet selection with alikelihood L R > . . VI. COMMENTS
A large number of jet substructure techniques now exist, many of which have aspects thatare similar to the wavelet analysis outlined in this note. The jet energy flow approach of [20],which predates mainstream jet substructure, is somewhat similar in that it convolutes the eventenergy flow with a smoothing function of variable radius. Like the present wavelet analysis, thatapproach aims to avoid assigning an individual particle to a unique jet as a proxy for a hardparton. However, despite introducing the concept of viewing an event over variable angular scales,that simpler smearing does not include the key ability to invert the event convolution, nor does ituse orthogonal basis functions that are well organised on a dyadic grid with a clear scaling rule.The two point correlation approach of [21] has some similarity with the Haar wavelet, which2 [GeV] T Jet p
350 400 450 500 ] - [ G e V - p er pb T dpd N De-noised W jets + pile upDe-noised W jets
FIG. 10: The effect of pile up on the de-noised jet p T distribution. The dashed teal curve shows the predictednumber of jets when pile up is present, the solid orange shows the number in the absence of pile up. is a simple step function. That approach also builds a top tagger by measuring the mass fractionas a function of correlation length. Again, it is not as general as the wavelet analysis, nor does itinclude key concepts like de-noising.Shower deconstruction, described in [22], also shares some similarities with wavelet analysis.In particular, the recognition of jets through their radiation patterns, and the ordered “showerhistory” concept is similar to the idea of using wavelets to probe the scale dependence of theshower. However, where shower deconstruction requires specific signal and background models,wavelet analysis uses self-similarity and the different scale dependence of the signal compared tothe background. If the wavelet basis functions were transposed into a toy shower model and used totrain shower decomposition, then the outcome would likely be similar to wavelet analysis, with thedifference that wavelet analysis would be much simpler to implement. How similar a wavelet showermodel is to a real shower is likely to determine the different performance of the two approaches.Significantly, wavelet analysis is not a jet substructure technique. The ideas presented here3are quite general and explain how one could perform a wavelet analysis of hadron collisions, notwhat one might do with that analysis. The key ideas are that of rasterisation, followed by waveletdecomposition and analysis, before re-composing the event with the modified wavelet components.The concepts of de-noising, probing events over different angular scales and shower evolution arealso introduced. Jet substructure is a timely, popular and appropriate topic for the application ofwavelet analysis, but the concept of using an event’s angular scale dependence could equally alsobe used to search for new physics or understand QCD better. Wavelet analysis is a hitherto largelyignored technique that could open provide an important new tool for studying hadron collisions. Acknowledgements
Thanks to Troels Petersen, who enthusiastically supported this idea and gave invaluable feed-back. Thanks also to Mario Campanelli, with whom the initial ideas for a harmonic analysis ofQCD were developed, and Frank Krauss, who subsequently suggested that wavelets might make agood basis in which to perform such a study. I also had several useful discussions with Lily Asquith,whose “LHCSound” project caused several conversations on the relevance of harmonic analysis toparticle physics. This work was funded by a grant from the Lundbeck foundation. [1] J. Monk and M. Campanelli, PoS ACAT (2010) 059.[2] I. Daubechies, “Orthonormal bases of compactly supported wavelets,” Communications on Pure andApplied Mathematics (1998) 909.[3] G. Battle, “Wavelets & Renormalization,” World Scientific Pub Co Inc, 1999, ISBN 9789810226244.[4] M. Greiner, J. Giesemann, P. Lipa and P. Carruthers, Z. Phys. C (1996) 305 [hep-ph/9503235].[5] I. M. Dremin, G. K. .Eyyubova, V. L. Korotkikh and L. I. Sarycheva, Indian J. Phys. (2011) 39[arXiv:0711.1657 [nucl-ex]].[6] V. Rentala, W. Shepherd and T. M. P. Tait, arXiv:1404.1929 [hep-ph].[7] K. G. Wilson, “The renormalization group and critical phenomena,” Rev. Mod. Phys. (1983) 583.[8] G. P. Salam and G. Soyez, JHEP (2008) 242001[arXiv:0802.2470 [hep-ph]].[11] S. D. Ellis, C. K. Vermilion and J. R. Walsh, Phys. Rev. D (2010) 094023 [arXiv:0912.0033 [hep-ph]]. [12] T. Sjostrand, S. Mrenna and P. Z. Skands, Comput. Phys. Commun. (2008) 852 [arXiv:0710.3820[hep-ph]].[13] [ATLAS Collaboration], ATL-PHYS-PUB-2012-003.[14] J. Pumplin, D. R. Stump, J. Huston, H. L. Lai, P. M. Nadolsky and W. K. Tung, JHEP (2002)012 [hep-ph/0201195].[15] A. Buckley, J. Butterworth, L. Lonnblad, D. Grellscheid, H. Hoeth, J. Monk, H. Schulz and F. Siegert,Comput. Phys. Commun. (2013) 2803 [arXiv:1003.0694 [hep-ph]].[16] Y. L. Dokshitzer, G. D. Leder, S. Moretti and B. R. Webber, JHEP (1997) 001 [hep-ph/9707323].[17] M. Cacciari, G. P. Salam and G. Soyez, Eur. Phys. J. C (2012) 1896 [arXiv:1111.6097 [hep-ph]].[18] A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. von Toerne, and H. Voss, “TMVA: Toolkit forMultivariate Data Analysis,” PoS A CAT 040 (2007) [physics/0703039].[19] C. Roehr et al. http://pilemc.hepforge.org[20] C. F. Berger, E. L. Berger, P. C. Bhat, J. M. Butterworth, S. D. Ellis, B. Flaugher, W. T. Giele andW. Kilgore et al. , eConf C (2001) P512 [hep-ph/0202207].[21] M. Jankowiak and A. J. Larkoski, JHEP (2011) 057 [arXiv:1104.1646 [hep-ph]].[22] D. E. Soper and M. Spannowsky, Phys. Rev. D (2011) 074002 [arXiv:1102.3480 [hep-ph]]. Appendix A: Example Using the Haar Wavelet
The Harr wavelet is the simplest basis wavelet, consisting of a step function that calculates thedifference between adjacent values in a sequence of numbers that forms a signal. While the Harrwas not used for the wavelet analysis of hadron collisions, it is simple enough that it can be usedfor decomposition by hand, and as such forms a useful introduction to wavelets.The re-scaling of the signal from one level to the next is performed by averaging adjacent valuesin the signal sequence. The input signal must have a length that is radix two. The decompositionprocess is applied iteratively at each level of the decomposition, namely the high frequency partof the signal (the differences between adjacent values in the sequence) is separated from the lowfrequency part of the signal (the average of adjacent values). The next level of the decompositionapplies the same decomposition to the low frequency part of the previous level, the number oflevels being limited by the length of the input sequence. Most wavelet basis functions are morecomplicated than the Harr wavelet, and the re-scaling between levels is also more complicated thana simple average between adjacent values, but this basic iterative principle is still used. Figure 11illustrates the iterative decomposition of the signal.5 -505 φ -505 φ -505 φ Signal Level 3 Level 2