Improving Higgs plus Jets analyses through Fox--Wolfram Moments
Catherine Bernaciak, Bruce Mellado, Tilman Plehn, Peter Schichtel, Xifeng Ruan
Institut für Theoretische Physik, Universität Heidelberg, Germany
School of Physics, University of the Witwatersrand, Johannesburg, South Africa
It is well known that understanding the structure of jet radiation can significantly improve Higgs analyses. Using Fox–Wolfram moments we systematically study the geometric patterns of additional jets in weak boson fusion Higgs production with a decay to photons. First, we find a significant improvement with respect to the standard analysis based on an analysis of the tagging jet correlations. In addition, we show that replacing a jet veto by a Fox–Wolfram moment analysis of the extra jet radiation almost doubles the signal-to-background ratio. Finally, we show that this improvement can also be achieved based on a modified definition of the Fox–Wolfram moments which avoids introducing a new physical scale below the factorization scale. This modification can reduce the impact of theory uncertainties on the Higgs rate and couplings measurements.
Contents
I. Introduction
II. Setting the Stage
III. Tagging jet correlation
IV. Replacing a jet veto
V. Avoiding new scales
VI. Outlook
References

I. INTRODUCTION
After the recent Higgs discovery by ATLAS and CMS [1–3], the careful and systematic study of Higgs properties is becoming a key research program at the LHC and a future linear collider [4]. The theoretical implications of the first fundamental scalar particle raise many open questions, including the actual generation of a vacuum expectation value, the stability of its physical mass, or the link between the Higgs potential at the weak scale and high–scale structures [5]. In the language of quantum field theory we need to construct the weak–scale Higgs Lagrangian including the operator basis and the corresponding couplings [6].

At the LHC the weak boson fusion production channel (WBF) [7–11] plays an important role in answering some of these questions, in particular once the LHC runs closer to its design energy. It allows us to directly probe the unitarization of WW → WW scattering and carries information on tree–level Higgs couplings with negligible impact of perturbative extensions of the Standard Model. Experimentally, two forward tagging jets are highly effective in reducing QCD backgrounds [12], which means that Higgs analyses in weak boson fusion typically benefit from a signal–to–background ratio around unity.

As an analysis tool utilizing the unique QCD structure of weak boson fusion we rely on a central jet veto [13–18]. It is based on the fact that we can generate large logarithms and increase central jet radiation in QCD backgrounds while leaving the jet activity in the signal at a low level. This shift from staircase scaling of jets (with constant ratios between successive exclusive jet bins) in signal and background to staircase scaling in the signal and Poisson scaling in the background can be derived from first–principles QCD [13]. The resulting jet veto survival probabilities for the QCD backgrounds can be measured in data. Their calculation from QCD is plagued with significant theory uncertainties which in turn will soon dominate the extraction of the Higgs couplings at the LHC [6]. In addition, a jet veto always removes a wealth of kinematic information carried by these jets, so the question arises whether the information from the jets recoiling against the Higgs can be used more efficiently.

To answer the question of how much information is encoded in the jet activity of Higgs candidate events we need to systematically study multi-jet kinematics. For example in flavor physics Fox–Wolfram moments (FWM) are an established tool to analyze such geometric patterns [19], but they have hardly been employed by the ATLAS and CMS collaborations. By construction, they are particularly well suited to study the geometry of tagging jets in weak boson fusion [20]. Depending on the specific construction of their weights the moments can also be sensitive measures of the additional jet activity in an event. Ideally, they will enhance a central jet veto defined on a fixed phase space region to some kind of weighted jet veto over phase space regions based on the kinematics of the hard process. Moreover, by choosing different weights the moments can be adjusted such that they avoid introducing a fixed scale below the factorization scale of the hard process. At the expense of the background rejection efficiency they can be tuned to introduce smaller theory uncertainties. This will allow the ATLAS and CMS experiments to optimize their Higgs analyses including theory uncertainties and significantly improve the case for a luminosity upgrade based on Higgs couplings measurements.

In this paper we will attempt to answer three questions based on the weak boson fusion analysis with a Higgs decay to photons. This includes a study of the signal process, the Higgs background from gluon fusion, and the continuum production of a photon pair with jets:

1. In Section III we will apply Fox–Wolfram moments to the kinematics of the two tagging jets only. Based on a multivariate analysis we will estimate how much these additional observables can improve the current ATLAS results at 8 TeV collider energy.
2. In Section IV we will compare the performance of a set of Fox–Wolfram moments with a specific (unit) weight to the usual central jet veto for the 13 TeV run. Moreover, a multivariate analysis of Fox–Wolfram moments allows us to define a ROC curve with a free choice of operating points.
3. In Section V we will introduce a new weight in the Fox–Wolfram moments. It avoids introducing a physical momentum scale for the jet veto which lies below the factorization scale.

Obviously, our conclusions are immediately applicable to ongoing and future LHC analyses. Fox–Wolfram moments have been tested in a few ATLAS and CMS analyses, so it should be a simple task to also include them in Higgs analyses.
II. SETTING THE STAGE
The analysis presented in this paper will give an estimate of the impact which Fox–Wolfram moments computed from jets can have on current and future LHC Higgs analyses. Fox–Wolfram moments are one way to systematically evaluate angular correlations between jets in terms of spherical harmonics. While such approaches are standard for example in cosmology, they are largely missing in LHC physics. We will summarize their main features below. For a more detailed account of the WBF-specific properties we refer to an earlier paper [20].

To allow for significant correlations between different moments we employ multivariate methods. Our analysis will largely be based on boosted decision trees (BDTs), which we will also briefly introduce below. Part of the analysis we cross–check with a neural net to make sure our findings are independent of the MVA method used.
A. Fox–Wolfram moments
Most analyses of QCD jets at the LHC are based on an ad-hoc selection of angular correlation variables, which have been shown to separate signals from backgrounds. For analyses where each one–dimensional or two–dimensional distribution is carefully understood in terms of the underlying physics and then tuned to the best cut value, this approach is natural and appropriate. For multivariate analyses, where events are classified in terms of a more generic set of kinematic observables, the choice of observables should be more systematic.

For angular correlations, we know how to generally describe underlying objects, in our case jets, in terms of spherical harmonics. Obviously, Fox–Wolfram moments do not have to be based on jets. They are closely related to event shapes [22], and for example at LEP they were based on calorimeter information. At the LHC, particle flow objects or topoclusters might eventually turn out more useful. In this analysis we use jets to avoid additional experimental or theoretical complications, for example due to pile-up or underlying event.

Fox–Wolfram moments are constructed by summing jet–jet correlations over all 2ℓ+1 directions, including an unspecified weight function W^x_i [19]

    H^x_\ell = \frac{4\pi}{2\ell+1} \sum_{m=-\ell}^{\ell} \left| \sum_{i=1}^{N} W^x_i \, Y^m_\ell(\Omega_i) \right|^2 .    (1)

The index i sums over all final state jets defined by appropriate acceptance and selection criteria. The general coordinates of the spherical harmonics Y^m_ℓ(θ, φ) we replace by a reference angle Ω. The moments can be rewritten as

    H^x_\ell = \sum_{i,j=1}^{N} W^x_{ij} \, P_\ell(\cos\Omega_{ij})    with    W^x_{ij} = W^x_i W^x_j .    (2)

The angle Ω_{ij} is the total angle between two jets. The weight function W^x_{ij} can be chosen freely. In Sections III and IV we will use transverse–momentum and unit weights [20]:

    W^T_{ij} = \frac{p_{T,i} \, p_{T,j}}{\left( \sum p_{T,i} \right)^2}        W^U_{ij} = \frac{1}{N^2} .    (3)

The advantage of the transverse–momentum weight is that soft and collinear jets with their limited amount of information about the hard process are automatically suppressed. The resulting analysis becomes stable with respect to the parton shower and QCD jet radiation. For tagging jets without an actual collinear divergence the transverse momentum weight should be appropriate.

Whenever we are interested in the color structure of the event, this jet radiation will carry the crucial information. For studies of central jet radiation we therefore expect the unit weight to be the most promising.

In analogy to a jet veto, Fox–Wolfram moments with unit weight introduce an energy or momentum scale, above which we include jets in the moments. Because of the unit weight there does not exist a smooth transition regime; requiring any Fox–Wolfram moment of such additional jets to be different from zero corresponds to a step function in counting the number of jets. Because the new momentum scale usually resides below the factorization scale of the hard event, fixed–order precision predictions are not applicable, and a dedicated resummation is a theoretical challenge [13–18]. In Section V we introduce the matched weight

    W^M_{ij} = \frac{(p_{T,i} - p_T^{\min}) \, (p_{T,j} - p_T^{\min})}{\left( \sum p_{T,i} - p_T^{\min} \right)^2}    (4)

in order to reduce the theoretical uncertainty in comparing measured cross sections to QCD predictions. This new weight avoids introducing a new hard scale and will be less dominated by the momentum scale p_{T,j}^{min} = 20 GeV, above which jets contribute to the Fox–Wolfram moments.
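To make Eqs.(2) to (4) concrete, a minimal Python sketch of the moments for a list of jets might look as follows. It is an illustration and not the code used for this analysis: the function names, the representation of a jet as a tuple (p_T, y, φ), the treatment of jets as massless, and the reading of the normalization in Eq.(4) as a sum over (p_{T,i} − p_T^{min}) are all assumptions.

```python
import math


def legendre(ell, x):
    """Legendre polynomial P_ell(x) from the standard three-term recursion."""
    if ell == 0:
        return 1.0
    p_prev, p_curr = 1.0, x
    for n in range(1, ell):
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return p_curr


def unit_vector(y, phi):
    """Direction of a massless jet with rapidity y and azimuthal angle phi."""
    return (math.cos(phi) / math.cosh(y),
            math.sin(phi) / math.cosh(y),
            math.tanh(y))


def jet_weights(pts, scheme, pt_min=20.0):
    """Single-jet weights W_i, normalized to sum to one, so W_ij = W_i * W_j."""
    if scheme == "unit":        # Eq.(3): W_i = 1/N
        raw = [1.0] * len(pts)
    elif scheme == "pt":        # Eq.(3): W_i = pT_i / sum(pT)
        raw = list(pts)
    elif scheme == "matched":   # Eq.(4), assumed normalization: sum of (pT_i - pT_min)
        raw = [pt - pt_min for pt in pts]
    else:
        raise ValueError("unknown weight scheme: " + scheme)
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("weights are undefined for this jet list")
    return [r / total for r in raw]


def fox_wolfram(jets, ell, scheme="unit", pt_min=20.0):
    """H_ell of Eq.(2) for jets given as (pT, y, phi) tuples."""
    weights = jet_weights([pt for pt, _, _ in jets], scheme, pt_min)
    directions = [unit_vector(y, phi) for _, y, phi in jets]
    moment = 0.0
    for w_i, d_i in zip(weights, directions):
        for w_j, d_j in zip(weights, directions):
            cos_omega = sum(a * b for a, b in zip(d_i, d_j))
            cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
            moment += w_i * w_j * legendre(ell, cos_omega)
    return moment
```

For two back-to-back jets of equal transverse momentum this sketch returns zero for the odd moments and one for the even moments, independent of the weight, which is a convenient sanity check before applying it to realistic events.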
B. Event generation
While the description of the tagging jets in weak boson fusion is straightforward, the continuum background with its QCD jet activity is more tricky. Moreover, the correct description of the QCD activity in the Higgs signal requires a careful treatment of the color structure of the hard process. Throughout this analysis we use Sherpa [24] with CKKW merging [25]. For the weak boson fusion signal we generate samples with up to three hard jets, including the tagging jets. Gluon fusion Higgs production we simulate with up to three hard jets. For the QCD background we include di–photon production plus up to two hard jets. For jet clustering we rely on the anti-k_T algorithm as in Fastjet [26] with R = 0.[...]. The general photon cuts are

    p_{T,\gamma} > 14 GeV        R_{\gamma j} > [...]        m_{\gamma\gamma} > 80 GeV .    (5)

After those cuts we are left with a weak–boson–fusion signal cross section times branching ratio of 5.2 fb at 8 TeV collider energy and 9.24 fb at 13 TeV collider energy. To allow for an efficient generation of background events we do not require a mass window for the two photons in the background generation. Later in the analysis we add an m_{γγ} window of ±10 GeV around the Higgs mass. For a proper Higgs analysis we should require an m_{γγ} window of 1-2 GeV around the measured Higgs mass. However, with this condition the event generation for the background becomes highly inefficient. Because our analysis does not intend to predict the actual signal and background cross sections and instead focuses on the improvement over the established experimental analysis [27], the loose cuts of Eq.(5) allow for a much more efficient event generation and will not affect our conclusions.

C. Boosted decision trees
Any multivariate analysis is based on some kind of mapping of a set of observables onto a single–valued quantity, the classifier response. Based on this classifier response we define a classification rule to separate signal and background events. Training the multivariate analysis on a set of simulated events aims to determine the best classification rule for a given signal and background. The optimal classification rule has to be determined by some measure, for example the signal efficiency, the statistical significance, or the signal–to–background ratio. Independent of this optimization, we can quantify the performance of any classification rule in terms of the signal efficiency and the background mis-identification probability. In this two–dimensional plane we can describe cuts on the same response parameter as a receiver operating characteristic (ROC) curve. Given such a ROC curve we are free to choose one or more operating points. In line with the ATLAS di-photon analysis we use a fixed 40% signal efficiency ε_S after acceptance with a variable background rejection 1 − ε_B as the standard working point. In Section IV, we quote the main results of our BDT analysis for the best possible significance S/√(S+B) given the set of kinematic observables and Fox–Wolfram moments.

Decision tree algorithms, as they are utilized in high energy physics applications, are based on a set of kinematic variables, intended to separate signal and background events. In the first step they choose the 'root node' variable, i.e. the variable with the best separation between signal and background. There exist several types of separation which we can choose from in TMVA [23]. We use the cross entropy

    C_E = - \frac{S}{S+B} \log \frac{S}{S+B} - \frac{B}{S+B} \log \frac{B}{S+B} ,    (6)

where S and B are the numbers of signal and background events in a particular subset of events. This measure is the closest to the original definition of information entropy [28].
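As an illustration of Eq.(6), the cross entropy of a node and the gain from splitting it can be sketched in Python as below. The function names are hypothetical, and the definition of the gain as the parent entropy minus the event-weighted entropies of the two daughter nodes is an assumption about how such a separation measure is typically used, not a statement about the exact TMVA internals.

```python
import math


def cross_entropy(n_sig, n_bkg):
    """Cross entropy of Eq.(6) for a node with n_sig signal and n_bkg background events."""
    total = n_sig + n_bkg
    if total <= 0:
        raise ValueError("node must contain events")
    entropy = 0.0
    for n in (n_sig, n_bkg):
        fraction = n / total
        if fraction > 0.0:  # the limit p*log(p) -> 0 for p -> 0
            entropy -= fraction * math.log(fraction)
    return entropy


def separation_gain(parent, left, right):
    """Assumed gain of a split: parent entropy minus weighted daughter entropies.

    Each argument is a (n_sig, n_bkg) pair.
    """
    n_parent = sum(parent)
    gain = cross_entropy(*parent)
    for node in (left, right):
        gain -= sum(node) / n_parent * cross_entropy(*node)
    return gain
```

A pure node has vanishing cross entropy, while an equal mixture of signal and background reaches the maximum value log 2; a cut which separates such a mixture perfectly therefore gains log 2.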
After choosing the root node, the subsequent nodes are ordered by their separation at some threshold value.

For the complete decision tree the events are classified as signal–like or background–like by some measure. In the training set we know how good the tree is at classifying the events. Our training set includes 100000 events for each signal and background channel. In the next step the algorithm corrects for mistakes through a reweighting procedure, builds another decision tree, tests its performance, and repeats for some user–defined number of iterations. For this 'boosting' procedure we mainly use the adaptive boost algorithm implemented in TMVA [23]. The final classification rule for signal versus background events we then apply to an independent event sample, again including 100000 events per signal and background process. To prevent over–training we limit our forest to 400 trees, and the individual trees to three layers.

Because correlations between the different Fox–Wolfram moments are a key issue of our systematic approach to kinematic input variables, we carefully test two different boosting algorithms (adaptive and gradient boost [23]) as well as different multivariate analysis methods.
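The reweighting step of the boosting procedure described above can be sketched in a few lines of Python. This is the textbook adaptive-boost update and not a transcription of the TMVA code: the function name, the exponent beta, and the choice to keep the total event weight fixed are assumptions made for illustration.

```python
import math


def adaboost_reweight(weights, misclassified, beta=1.0):
    """One adaptive-boost step: enhance the weights of misclassified events.

    weights       -- current event weights
    misclassified -- booleans, True where the last tree got the event wrong
    beta          -- assumed exponent controlling the strength of the boost
    Returns the new event weights and the log of the boost factor, which can
    serve as the weight of the tree in the final vote.
    """
    total = sum(weights)
    error = sum(w for w, miss in zip(weights, misclassified) if miss) / total
    if not 0.0 < error < 0.5:
        raise ValueError("weighted error rate must lie strictly between 0 and 0.5")
    boost = ((1.0 - error) / error) ** beta
    boosted = [w * boost if miss else w for w, miss in zip(weights, misclassified)]
    scale = total / sum(boosted)  # keep the sum of weights constant
    return [w * scale for w in boosted], math.log(boost)
```

For beta = 1 the misclassified events carry half of the total weight after the update, so the next tree is forced to concentrate on the events the previous tree got wrong.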
Per se, boosted decision trees are not particularly well suited for studying strongly correlated variables. The reason is that trees are built out of the individual variables. Two strongly correlated variables are best mapped through individual fine binnings in each of them, so a careful mapping of correlations will eventually lead to statistical limitations and a possible training on statistical fluctuations. Therefore, we compare BDT results to results using a multi–layer perceptron (MLP) neural network whenever an independent test appears sensible. We utilize an MLP neural network with a single hidden layer containing N + 5 neurons, where N is the number of training variables.

III. TAGGING JET CORRELATION
In this first analysis we are going to use Fox–Wolfram moments to systematically test the completeness of the tagging jet correlations included by ATLAS. Because we directly refer to the current ATLAS result we use a collider energy of 8 TeV for the most recent LHC run. The two p_T-ordered tagging jets have to fulfill either of the two conditions

    p_{T,j} > 25 GeV    for    |y_j| < [...]
    p_{T,j} > 30 GeV    for    2.[...] \le |y_j| < [...] .    (7)

These two tagging jets must also pass

    |\Delta y_{j_1 j_2}| \ge [...]        m_{j_1 j_2} > 150 GeV .    (8)

These cuts correspond to the variables used in the multivariate di-photon Higgs analysis by ATLAS [27],

    \{ m_{j_1 j_2}, \; y_{j_1}, \; y_{j_2}, \; \Delta y_{j_1 j_2} \}    (ATLAS default) .    (9)

The angular correlations between the tagging jets in weak–boson–fusion Higgs production are known to reflect the tensor structure of the WWH vertex [21]. In this application the collinearity of the two tagging jets plays an important role, with the effect that the azimuthal angle between the tagging jets is a more sensitive probe than the opening angle between them. For the Fox–Wolfram moments this means that the definition in terms of the opening angle Ω_{ij} is not optimally suited. For the tagging jet analysis we therefore replace the opening angle in the Legendre polynomials by the azimuthal angle Δφ_{ij} between the two tagging jets,

    H^{x,\phi}_\ell = \sum_{i,j=1}^{2} W^x_{ij} \, P_\ell(\cos\Delta\phi_{ij}) .    (10)

For a systematic study of the usefulness of the tagging jet correlations we perform a multivariate analysis of the Fox–Wolfram moments introduced in Section II A. Because the moments are based on spherical harmonics they form a basis and include all available information, given the weight W^x_{ij} we use in their definition.

We show some sample BDT and MLP results based on the azimuthal moments in Table I. The full set of moments for each weight function by definition includes all available information for the corresponding weights. First, we see that including a large set of Fox–Wolfram moments gives a significant improvement over the current ATLAS set of observables, defined in Eq.(9). Both multivariate analyses using the first four moments with unit weight as well as with transverse–momentum weight reduce the remaining fraction of background events by a factor of two.
From the TMVA output we have checked that these eight moments dominate the distinctive power of the analysis.

Obviously, the next question is which of the Fox–Wolfram moments contribute most to this improvement. From the earlier analysis [20] we know that lower moments will dominate in the tagging jet analysis, and that only odd moments can distinguish between forward–backward and forward–forward tagging jets. Individually, we find that the six best individual moments are (in order) H^{U,φ}_{[...]}, H^{T,φ}_{[...]}, H^{U,φ}_{[...]}, H^{T,φ}_{[...]}, H^{U,φ}_{[...]}, and H^{T,φ}_{[...]}.∗ The moments with unit weight are slightly more powerful than those with transverse–momentum weight. The most striking feature is that for the tagging jets the higher moments play hardly any role in improving the analysis.

                              BDT                             MLP
    ε_S = 0.4                 1−ε_B   S/√(S+B)   S/B          1−ε_B   S/√(S+B)   S/B
    ATLAS default Eq.(9)      0.887   1.50       0.76         0.888   1.50       0.78
    [Further rows: sets of azimuthal moments H^{T,φ}_ℓ and H^{U,φ}_ℓ, and the azimuthal angle Δφ with and without the weight W^T; their moment indices and numerical entries are not recoverable.]

Table I: [...] The value for S/√(S+B) we compute for an integrated luminosity of 30 fb^{-1}. All sets of variables subsequent to the first row contain the default variables as well.

As a matter of fact, the single moment H^{U,φ}_1 is, within uncertainties due to the training procedure, almost as powerful as the set of the first 20 moments, both with unit and transverse–momentum weight. Given that the corresponding Legendre polynomial is P_1(cos Δφ_{ij}) = cos Δφ_{ij} we can further simplify the analysis by separating the transverse–momentum weight from the azimuthal angle. Compared to the ATLAS default variables, adding the azimuthal angle between the tagging jets, Δφ_{j1j2}, almost doubles the signal–to–background ratio. Systematically including the Fox–Wolfram moments increases the signal–to–background ratio by an additional 8%. This result persists between the two multivariate methods and we conclude that our improvement is truly due to the nature of the moments and not to some advantageous choice of methods and/or parameters for our multivariate analyses.

Following the tagging jet analysis in this section we extend the default set of tagging jet variables of Eq.(9) for the remainder of this paper to include

    \{ m_{j_1 j_2}, \; y_{j_1}, \; y_{j_2}, \; \Delta y_{j_1 j_2}, \; \Delta\phi_{j_1 j_2} \}    (WBF default) .    (11)

It could be argued that adding the azimuthal angle to the list of kinematic variables employed in the background rejection will make the analysis result less applicable to modified Higgs–like signal hypotheses.
Indeed, the azimuthal angle between the tagging jets is the key observable in the spin-0 CP analysis of the Higgs resonance [21]. On the other hand, the same is true for the rapidity difference Δy when it comes to spin-2 alternatives [21].

IV. REPLACING A JET VETO
The key physics question we will answer in this Section is to what degree we can use information on additional (central) jet radiation to enhance the tagging jet analysis described in the previous Section III. Because a detailed analysis of the jet activity has not been performed in the recent LHC runs, we assume a collider energy of 13 TeV in this section. The physics of the additional jets can be easily described: for the signal events the emission of additional central jets is suppressed by the color structure of the process. This means that the number of jets in weak boson fusion will in general follow the staircase pattern predicted for inclusive processes at the LHC [13]. In contrast, gluon–fusion Higgs production or di-photon production will show this staircase pattern only in the absence of tagging jet cuts. Once we require two hard jets with a large invariant mass we induce large logarithms, which lead to a Poisson pattern in the number of jets [13]. The key feature of this Poisson distribution is a significantly enhanced probability of radiating a central jet.

∗ Given that TMVA gives an ordered list of the most relevant observables, it is not clear to one of the authors (TP) why this very interesting information is never shown in experimental publications.

Throughout our analysis we require two tagging jets with the generic acceptance cuts

    p_{T,j} > 20 GeV        |y_j| < [...]    (12)
    |\Delta y_{j_1 j_2}| > [...]        m_{j_1 j_2} > 150 GeV .    (13)

Correspondingly, we generate signal and background events using Sherpa [24] with CKKW [25] jet merging with two or three hard jets from the matrix element. Throughout this Section we assume a collider energy of 13 TeV. In addition to the general photon cuts of Eq.(5) we require m_{γγ} = 126 ± 10 GeV. The cuts of Eq.(12) lead to cross sections of 6.5 fb for the weak–boson–fusion signal, 4.5 fb for gluon–fusion Higgs production, and 2050 fb for the continuum background. As mentioned above, the signal–to–background ratio can be improved through additional cuts, such as tightening the m_{γγ} requirement. However, this makes it harder to reliably simulate the background. In the following we will assume that additional cuts on the Higgs decay products are orthogonal to the additional jet kinematics.

Because the selection criterion of the two tagging jets has a significant impact on the amount of Poisson enhancement of the additional jet production we use two selection criteria for the tagging jets:

1. p_T-selection: of all jets fulfilling Eqs.(12) and (13) the two hardest are the tagging jets. The mild cuts of Eq.(13) leave 3.36 fb for the signal, 1.04 fb for gluon–fusion Higgs production, and 509 fb for the continuum background.
2. Δy-selection: of all jets fulfilling Eqs.(12) and (13) the two most forward and backward are the tagging jets, maximizing Δy_{j1j2}. After Eq.(13) the remaining rates are 3.78 fb for the signal, 1.71 fb for gluon–fusion Higgs production, and 736.2 fb for the non-Higgs background.

While the p_T-selection is standard in most weak–boson–fusion analyses, it will turn out that the Δy-selection is more efficient in generating a large Poisson enhancement for central jet emission in the background processes. On the other hand, in particular for the 13 TeV run we have to see if pile-up makes one of the two selections appear experimentally superior.

The standard approach to including the additional jet activity in the weak–boson–fusion Higgs analysis is a central jet veto [12, 15]. To generate a sufficiently strong Poisson pattern in the number of jets we demand

    |\Delta y_{j_1 j_2}| > [...]        y_{j_1} \cdot y_{j_2} < [...]        m_{j_1 j_2} > 600 GeV .    (14)

In Table II we show the cut flow of the signal and background rates for each step in Eq.(14). Finally, we include a central jet veto which does not allow for jets above p_T = 20 GeV in between the two tagging jets. While the two tagging jet selections show significant differences in the intermediate steps, after the veto the numbers of signal and background events are comparable. The survival rates for the central jet veto are in agreement with the literature [7, 15].

Table II
                              Δy-selection                 p_T-selection
                              WBF     GF      γγ           WBF     GF      γγ
    generated [fb]            6.5     4.5     2050         6.5     4.5     2050
    [Further rows: Δy_{j1j2} > [...], y_{j1} y_{j2} < [...], m_{j1j2} > 600 GeV, and one more step; the multiplicative factors in these rows are not recoverable.]

In the first three rows of Table III we show different statistical measures after the acceptance cuts of Eqs.(12) and (13), the veto–level cuts of Eq.(14), and after the central jet veto. The background is composed of gluon–fusion Higgs production and continuum di-photon production. We again see that the significance S/√(S+B) and the signal–to–background ratio are comparable for the Δy-selection and the p_T-selection of the tagging jets. However, this is only true after the jet veto. After only the hard cuts of Eq.(14) the p_T-selection is significantly more promising. As alluded to above, the jet veto benefits from the stronger Poisson enhancement from the Δy-selection, leaving the final results essentially identical.

                                                Δy-selection                        p_T-selection
                                                ε_S     1−ε_B   S/√(S+B)  S/B       ε_S     1−ε_B   S/√(S+B)  S/B
    acceptance cuts Eqs.(12) and (13)           1       0       0.76      0.005     1       0       0.81      0.007
    veto–level cuts Eq.(14)                     0.402   0.854   0.80      0.014     0.405   0.996   1.01      0.026
    jet veto                                    0.302   0.967   1.24      0.047     0.369   0.945   1.26      0.045
    BDT: WBF default with Eq.(13)               0.400   0.862   0.79      0.014     0.400   0.904   1.04      0.027
                                                0.634   0.674   0.84      0.010     0.414   0.897   1.04      0.027
    BDT: WBF default plus FWM with Eq.(13)      0.400   0.952   1.34      0.041     0.400   0.944   1.35      0.047
                                                0.232   0.986   1.42      0.083     0.302   0.972   1.43      0.071

Table III: S/B and S/√(S+B) compared to classical cut and jet veto strategy for the Δy and p_T-selections of the tagging jets. The value for S/√(S+B) we compute for an integrated luminosity of 30 fb^{-1}. The BDT analysis includes a set of Fox–Wolfram moments with unit weight, Eq.(15). We quote two working points at 40% signal efficiency and optimized for S/√(S+B).

In the next step, we use the default WBF observables of Eq.(11) and optimize them in a multivariate BDT analysis as described in Section II C. The corresponding ROC curve we show in Figure 1. As in Table III the efficiencies are defined with respect to the full set of acceptance cuts from Eqs.(12) and (13). In the table we quote two points from this curve. First, we show the usual working point with a signal efficiency of 40%. Second, we show the working point with the best result for S/√(S+B). Optimizing for the best result of S/B does not give a well defined solution. As expected, the ROC curve indicates working points for the entire range of signal efficiencies ε_S = 0 ... [...]

[...] W^x_{ij}. Unlike for the tagging jet kinematics we now do not constrain our system to the transverse plane, which means we use the original definition of the moments in Eq.(2) with the opening angle Ω_{ij}. On the other hand, we already know what the benefit of including the moments of the tagging jets is: according to Section III most of the information is included once we add the azimuthal angle between the tagging jets, Δφ_{j1j2}, to the standard set of observables given in Eq.(11). Therefore, we limit the analysis of the additional jet activity to all jet–jet correlations with the exception of the two tagging jets.
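The statistical measures in the first rows of Table III follow directly from the cross sections quoted for the two tagging jet selections and the assumed luminosity of 30 fb^{-1}. A minimal Python cross-check is sketched below; the function and variable names are hypothetical, and combining the gluon–fusion and continuum rates into a single background number is our reading of the statement that the background is composed of these two processes.

```python
import math


def statistical_measures(sig_fb, bkg_fb, eff_s=1.0, eff_b=1.0, lumi_fb=30.0):
    """Return (S/sqrt(S+B), S/B) for cross sections in fb and a luminosity in fb^-1.

    eff_s and eff_b are the signal and background efficiencies relative to the
    acceptance cuts; eff_b equals one minus the background rejection.
    """
    n_sig = sig_fb * eff_s * lumi_fb
    n_bkg = bkg_fb * eff_b * lumi_fb
    return n_sig / math.sqrt(n_sig + n_bkg), n_sig / n_bkg


# Rates after the acceptance cuts as quoted in the text:
# signal, and gluon-fusion Higgs plus continuum di-photon background.
RATES_FB = {
    "delta_y": (3.78, 1.71 + 736.2),
    "p_T": (3.36, 1.04 + 509.0),
}

for name, (sig, bkg) in RATES_FB.items():
    significance, s_over_b = statistical_measures(sig, bkg)
    print(f"{name}: S/sqrt(S+B) = {significance:.2f}, S/B = {s_over_b:.3f}")
```

With these inputs the sketch gives 0.76 and 0.005 for the Δy-selection and 0.81 and 0.007 for the p_T-selection, matching the acceptance-cut row of Table III. Passing eff_s = 0.302 and eff_b = 1 − 0.967 for the Δy-selection likewise gives 1.24 and 0.047, the values quoted for the jet veto.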
Moreover, we can expect the unit weight to give the best sensitivity to the relatively soft additional jet activity, so we use

    H^U_\ell = \frac{1}{N^2} \sum_{(i,j) \neq (1,2)} P_\ell(\cos\Omega_{ij}) .    (15)

For both of the tagging jet selections we only include jets which fall between the two tagging jets, in complete analogy to a central jet veto. For exactly two tagging jets and no additional jet radiation this implies H^U_ℓ = 0 for all values of ℓ.

In Table III we show the result of a combined BDT analysis of the observables of Eq.(11) and the set of Fox–Wolfram moments. Again, we quote two operating points, one of them for a fixed signal efficiency of 40% and one optimized for the best value of S/√(S+B). In addition, we show results for both the Δy-selection and the p_T-selection of the tagging jets. A generic problem for any BDT analysis is that for limited statistics of the training sample it can only include a limited number of observables. On the other hand, the BDT first determines the most powerful observables, so we only include the five best Fox–Wolfram moments in our analysis. We have checked that adding more moments will not improve the result beyond numerical accuracy. For the Δy-selection the five leading moments with unit weight are H^U_{[...]}, H^U_{[...]}, H^U_{[...]}, H^U_{[...]}, and H^U_{[...]}. For the p_T-selection the most powerful moments are H^U_{[...]}, H^U_{[...]}, H^U_{[...]}, H^U_{[...]}, and H^U_{[...]}. However, for the p_T-selection the most powerful variable in the BDT is Δy_{j1j2}. For the Δy-selection this observable is maximized by construction.

The ROC curves in Figure 1 show a clear improvement of the complete multivariate analysis including the Fox–Wolfram moments as compared to the kinematic variables of Eq.(11) only. For a fixed moderate signal efficiency of 40% adding information on the jets decreases the probability of a background mis-identification by a factor of 2.9 for the Δy-selection and a factor of 1.7 for the p_T-selection. The improvement relative to the jet veto we show in the right panel, zooming into typical signal efficiencies around 35% relative to the acceptance cuts of Eq.(13). For the jet veto working point of the Δy-selection with fixed signal efficiency of 30.2% we see that the background misidentification is reduced by 30%. For the p_T-selection with fixed signal efficiency of 36.9% we find an improvement by 20%.

[Figure 1, two panels with axes ε_S and 1 − ε_B. Left panel: WBF default and WBF default plus FWMs, each for the Δy-selection and the p_T-selection. Right panel: FWMs for the Δy-selection and the p_T-selection, together with the jet veto working points.]

Figure 1: ROC curve for Δy- (black) and p_T-selection (red) of the tagging jets. Left: We compare the WBF default observables (dashed) of Eq.(11) to an additional set of Fox–Wolfram moments (solid). Right: We show how using Fox–Wolfram moments compares to a central jet veto.

V. AVOIDING NEW SCALES
The unit weights in the definition of the Fox–Wolfram moments used in the previous Section IV share a disadvantage with a jet veto when it comes to predicting them from theory: they introduce an additional physical momentum scale in the process which is below the hard scale of the Higgs production process. Collinear factorization as the basis of defining the parton densities in perturbative field theory does not allow for such additional scales. All measurements which are to be compared to fixed–order perturbative QCD predictions have to be jet–inclusive for transverse momenta below the factorization scale. If we introduce an additional scale, we introduce a possibly large logarithm which needs to be resummed [16–18].

Introducing a weight which smoothly interpolates between the jet counting scale $p_{Tj}^{\min} = 20$ GeV and the hard scale of the process according to Eq.(4) should alleviate this tension. This suggests repeating the analysis of Section IV with the Fox–Wolfram moments

$$ H^M_\ell = \sum_{(i,j) \neq (1,2)} \frac{(p_{Ti} - p_T^{\min}) \, (p_{Tj} - p_T^{\min})}{\left( \sum p_{Ti} - p_T^{\min} \right)^2} \; P_\ell(\cos \Omega_{ij}) . \tag{16} $$

                                            $\Delta y$-selection                        $p_T$-selection
                                            $\epsilon_S$  $1-\epsilon_B$  $S/\sqrt{S+B}$  $S/B$     $\epsilon_S$  $1-\epsilon_B$  $S/\sqrt{S+B}$  $S/B$
jet veto, Eq.(12) to (14)                   0.302   0.967   1.24   0.047     0.369   0.945   1.26   0.045
BDT: WBF default plus unit–weight FWM       0.400   0.952   1.34   0.041     0.400   0.944   1.35   0.047
                                            0.232   0.986   1.42   0.083     0.302   0.972   1.43   0.071
BDT: WBF default plus matched–weight FWM    0.400   0.949   1.32   0.040     0.400   0.942   1.32   0.045
                                            0.240   0.985   1.43   0.081     0.256   0.979   1.40   0.082

Table IV: $S/B$ and $S/\sqrt{S+B}$ compared to the jet veto strategy for the $\Delta y$- and $p_T$-selections of the tagging jets. The value for $S/\sqrt{S+B}$ we compute for an integrated luminosity of 30 fb$^{-1}$. Extending Table III, the BDT analysis now includes a set of Fox–Wolfram moments with matched weight, Eq.(16). As BDT results we quote the working point at 40% signal efficiency and the best point for $S/\sqrt{S+B}$.

[Figure 2 plot. Axes: $\epsilon_S$ and $1-\epsilon_B$. Legend entries, first panel: matched weight and unit weight, each for the $\Delta y$-selection and the $p_T$-selection. Legend entries, second panel: FWMs for the $\Delta y$-selection, FWMs for the $p_T$-selection, jet veto.]
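To make the matched weight of Eq.(16) concrete, the following minimal sketch evaluates $H^M_\ell$ for a single event. It is an illustration under stated assumptions, not the analysis code behind Table IV. The names `cos_omega` and `matched_fwm` are hypothetical, as is the representation of jets as $(p_T, \eta, \phi)$ tuples with the two tagging jets listed first. The sketch also assumes that $\Omega_{ij}$ is the opening angle between massless jet directions, that diagonal terms are kept, that the tagging-jet pair is dropped in both orderings, and that the normalization sums the weights of all jets.

```python
import numpy as np
from numpy.polynomial.legendre import legval

PT_MIN = 20.0  # GeV, jet counting scale quoted in the text


def cos_omega(eta_i, phi_i, eta_j, phi_j):
    """Cosine of the opening angle between two jet directions (assumed massless)."""
    theta_i = 2.0 * np.arctan(np.exp(-eta_i))
    theta_j = 2.0 * np.arctan(np.exp(-eta_j))
    c = (np.cos(theta_i) * np.cos(theta_j)
         + np.sin(theta_i) * np.sin(theta_j) * np.cos(phi_i - phi_j))
    return np.clip(c, -1.0, 1.0)  # guard against rounding just outside [-1, 1]


def matched_fwm(jets, ell, pt_min=PT_MIN):
    """Matched-weight moment of order ell, following the structure of Eq.(16).

    jets: list of (pT, eta, phi); indices 0 and 1 are taken to be the tagging jets.
    Assumption: the pair of tagging jets is excluded in both orderings.
    """
    weights = np.array([pt - pt_min for pt, _, _ in jets])
    norm = weights.sum() ** 2
    if norm == 0.0:
        return 0.0  # all jets sit exactly at the counting scale
    coeffs = np.zeros(ell + 1)
    coeffs[ell] = 1.0  # selects the Legendre polynomial P_ell in legval
    total = 0.0
    for i, (_, eta_i, phi_i) in enumerate(jets):
        for j, (_, eta_j, phi_j) in enumerate(jets):
            if {i, j} == {0, 1}:
                continue  # skip the tagging-jet correlation
            x = cos_omega(eta_i, phi_i, eta_j, phi_j)
            total += weights[i] * weights[j] * legval(x, coeffs)
    return total / norm


if __name__ == "__main__":
    # Toy event (made-up numbers): two tagging jets plus one central jet.
    toy = [(60.0, 2.0, 0.0), (50.0, -2.0, 3.0), (30.0, 0.1, 1.0)]
    for ell in range(5):
        print(ell, matched_fwm(toy, ell))
```

In this sketch a jet exactly at the counting scale carries zero weight, so adding or removing it leaves every moment unchanged. The moments therefore vary continuously as a jet crosses $p_T^{\min}$, in contrast to a unit weight, where the jet enters at full strength.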
Figure 2: ROC curve for $\Delta y$- (black) and $p_T$-selection (red) of the tagging jets. Left: We compare the WBF default observables (dashed) of Eq.(11) to an additional set of Fox–Wolfram moments (solid). Right: We show how using Fox–Wolfram moments compares to a central jet veto.

While we cannot offer an estimate of the improvement in the perturbative QCD treatment, it is clear that the matched weights are less sensitive to large collinear logarithms generated by the violation of collinear factorization.

In Table IV we extend the original Table III, including the same BDT analysis now based on matched Fox–Wolfram moments. For the standard working point with 40% signal efficiency we see that the background rejection from the matched moments is essentially identical to the unit weight moments. The main difference is the order of the most relevant set of moments, which now is $H^M$, $H^M$, $H^M$, $H^M$, $H^M$ for the $\Delta y$-selection and $H^M$, $H^M$, $H^M$, $H^M$, $H^M$ for the $p_T$-selection. Similarly, the working point optimized for $S/\sqrt{S+B}$ is only slightly shifted. In Figure 2 we compare the ROC curves for the jet radiation study based on the two Fox–Wolfram moment weights. For signal efficiencies between 25% and 40% the unit weight is slightly superior, but most likely this slight advantage will be compensated once we include theory uncertainties from QCD predictions.

VI. OUTLOOK
Weak boson fusion analyses of Higgs production at the LHC are key ingredients to Higgs couplings and Higgs property analyses in the upcoming LHC run. They allow for an efficient background rejection based on two tagging jets and an additional central jet veto. The question is how we can make optimal use of the jet properties, for example to improve the signal–to–background ratio or the signal significance. In our detailed analysis we come to three conclusions:

1. For the two tagging jets we rely on a set of low-$\ell$ moments with a transverse momentum weight and azimuthal angle separation. Most of the improvement as compared to the standard ATLAS analysis can be traced back to the azimuthal angle between the tagging jets, which is missing there. In addition, the signal–to–background ratio can be increased by 8% by including a set of Fox–Wolfram moments.

2. The additional jets can be studied using a wide range of moments with a unit weight and full angular separation. This approach should be compared to a jet veto and delivers a significantly better performance. The tagging jet selection with maximum rapidity distance is better suited to distinguish the signal from the continuum background than the transverse momentum selection. For both cases we computed a full ROC curve, allowing for optimized working points depending on the details of the analysis.

3. To reduce theory uncertainties from QCD predictions we can introduce a softer, matched weight in the Fox–Wolfram moments. It turns out that the analysis of jet radiation is almost as promising as for the unit weights, but with a much improved theoretical behavior.

We conclude that tagging jet criteria as well as the jet veto as analysis tools for Higgs analyses in weak boson fusion can be improved by a systematic study of the multi–jet system based on Fox–Wolfram moments. The improvement is significant, both for the $\Delta y$-selection and the $p_T$-selection of the tagging jets.
The Fox–Wolfram moment analysis can be adapted to individual analyses by choosing appropriate working points in the corresponding ROC curves.

Acknowledgments
CB would like to thank the ATLAS collaboration at CERN, where part of this work was performed. We would like to thank Erik Gerwick and Steffen Schumann for help with the event generation. CB acknowledges support by BMBF under project number 05H12VHE. PS acknowledges support by the IMPRS for