Improving Higgs plus Jets analyses through Fox–Wolfram Moments

Abstract

It is well known that understanding the structure of jet radiation can significantly improve Higgs analyses. Using Fox–Wolfram moments we systematically study the geometric patterns of additional jets in weak boson fusion Higgs production with a decay to photons. First, we find a significant improvement with respect to the standard analysis based on an analysis of the tagging jet correlations. In addition, we show that replacing a jet veto by a Fox–Wolfram moment analysis of the extra jet radiation almost doubles the signal-to-background ratio. Finally, we show that this improvement can also be achieved based on a modified definition of the Fox–Wolfram moments which avoids introducing a new physical scale below the factorization scale. This modification can reduce the impact of theory uncertainties on the Higgs rate and couplings measurements.



Catherine Bernaciak, Bruce Mellado, Tilman Plehn, Peter Schichtel, and Xifeng Ruan
Institut für Theoretische Physik, Universität Heidelberg, Germany
School of Physics, University of the Witwatersrand, Johannesburg, South Africa


Contents

I. Introduction
II. Setting the Stage
III. Tagging jet correlation
IV. Replacing a jet veto
V. Avoiding new scales
VI. Outlook
References

I. INTRODUCTION

After the recent Higgs discovery by ATLAS and CMS [1–3], the careful and systematic study of Higgs properties is becoming a key research program at the LHC and a future linear collider [4]. The first fundamental scalar particle raises many open theoretical questions, including the actual generation of a vacuum expectation value, the stability of its physical mass, and the link between the Higgs potential at the weak scale and high–scale structures [5]. In the language of quantum field theory we need to construct the weak–scale Higgs Lagrangian including the operator basis and the corresponding couplings [6].

At the LHC the weak boson fusion production channel (WBF) [7–11] plays an important role in answering some of these questions, in particular once the LHC runs closer to its design energy. It allows us to directly probe the unitarization of $WW \to WW$ scattering and carries information on tree–level Higgs couplings with negligible impact of perturbative extensions of the Standard Model. Experimentally, two forward tagging jets are highly effective in reducing QCD backgrounds [12], which means that Higgs analyses in weak boson fusion typically benefit from a signal–to–background ratio around unity.

As an analysis tool utilizing the unique QCD structure of weak boson fusion we rely on a central jet veto [13–18]. It is based on the fact that we can generate large logarithms and increase central jet radiation in QCD backgrounds while leaving the jet activity in the signal at a low level. This shift from staircase scaling of jets (with constant ratios between successive exclusive jet bins) in signal and background to staircase scaling in the signal and Poisson scaling in the background can be derived from first–principles QCD [13]. The resulting jet veto survival probabilities for the QCD backgrounds can be measured in data. Their calculation from QCD is plagued with significant theory uncertainties which in turn will soon dominate the extraction of the Higgs couplings at the LHC [6]. In addition, a jet veto always removes a wealth of kinematic information carried by these jets, so the question arises whether the information from the jets recoiling against the Higgs can be used more efficiently.

To answer the question of how much information is encoded in the jet activity of Higgs candidate events we need to systematically study multi-jet kinematics. In flavor physics, for example, Fox–Wolfram moments (FWM) are an established tool to analyze such geometric patterns [19], but they have hardly been employed by the ATLAS and CMS collaborations. By construction, they are particularly well suited to study the geometry of tagging jets in weak boson fusion [20]. Depending on the specific construction of their weights the moments can also be sensitive measures of the additional jet activity in an event.

Ideally, they will enhance a central jet veto defined on a fixed phase space region to some kind of weighted jet veto over phase space regions based on the kinematics of the hard process. Moreover, by choosing different weights the moments can be adjusted such that they avoid introducing a fixed scale below the factorization scale of the hard process. At the expense of the background rejection efficiency they can be tuned to introduce smaller theory uncertainties. This will allow the ATLAS and CMS experiments to optimize their Higgs analyses including theory uncertainties and significantly improve the case for a luminosity upgrade based on Higgs couplings measurements.

In this paper we will attempt to answer three questions based on the weak boson fusion analysis with a Higgs decay to photons. This includes a study of the signal process, the Higgs background from gluon fusion, and the continuum production of a photon pair with jets:

1. In Section III we will apply Fox–Wolfram moments to the kinematics of the two tagging jets only. Based on a multivariate analysis we will estimate how much these additional observables can improve the current ATLAS results at 8 TeV collider energy.
2. In Section IV we will compare the performance of a set of Fox–Wolfram moments with a specific (unit) weight to the usual central jet veto for the 13 TeV run. Moreover, a multivariate analysis of Fox–Wolfram moments allows us to define a ROC curve with a free choice of operating points.
3. In Section V we will introduce a new weight in the Fox–Wolfram moments. It avoids introducing a physical momentum scale for the jet veto which lies below the factorization scale.

Our conclusions are immediately applicable to ongoing and future LHC analyses. Fox–Wolfram moments have been tested in a few ATLAS and CMS analyses, so it should be a simple task to also include them in Higgs analyses.

II. SETTING THE STAGE

The analysis presented in this paper will give an estimate of the impact which Fox–Wolfram moments computed from jets can have on current and future LHC Higgs analyses. Fox–Wolfram moments are one way to systematically evaluate angular correlations between jets in terms of spherical harmonics. While such approaches are standard for example in cosmology, they are largely missing in LHC physics. We will summarize their main features below. For a more detailed account of the WBF-specific properties we refer to an earlier paper [20].

To allow for significant correlations between different moments we employ multivariate methods. Our analysis will largely be based on boosted decision trees (BDTs), which we will also briefly introduce below. Part of the analysis we cross–check with a neural net to make sure our findings are independent of the MVA method used.

A. Fox–Wolfram moments

Most analyses of QCD jets at the LHC are based on an ad-hoc selection of angular correlation variables, which have been shown to separate signals from backgrounds. For analyses where each one–dimensional or two–dimensional distribution is carefully understood in terms of the underlying physics and then tuned to the best cut value, this approach is natural and appropriate. For multivariate analyses, where events are classified in terms of a more generic set of kinematic observables, the choice of observables should be more systematic.

For angular correlations, we know how to generally describe underlying objects, in our case jets, in terms of spherical harmonics. Obviously, Fox–Wolfram moments do not have to be based on jets. They are closely related to event shapes [22], and for example at LEP they were based on calorimeter information. At the LHC, particle flow objects or topoclusters might eventually turn out more useful. In this analysis we use jets to avoid additional experimental or theoretical complications, for example due to pile-up or underlying event.

Fox–Wolfram moments are constructed by summing jet–jet correlations over all $2\ell + 1$ directions, including an unspecified weight function $W^x_i$ [19]

$$H^x_\ell = \frac{4\pi}{2\ell + 1} \sum_{m=-\ell}^{\ell} \left| \sum_{i=1}^{N} W^x_i \, Y^m_\ell(\Omega_i) \right|^2 . \tag{1}$$

The index $i$ sums over all final state jets defined by appropriate acceptance and selection criteria. The general coordinates of the spherical harmonics $Y^m_\ell(\theta, \phi)$ we replace by a reference angle $\Omega$. The moments can be rewritten as

$$H^x_\ell = \sum_{i,j=1}^{N} W^x_{ij} \, P_\ell(\cos \Omega_{ij}) \qquad \text{with} \qquad W^x_{ij} = W^x_i W^x_j . \tag{2}$$

The angle $\Omega_{ij}$ is the total angle between two jets. The weight function $W^x_{ij}$ can be chosen freely. In Sections III and IV we will use transverse–momentum and unit weights [20]:

$$W^T_{ij} = \frac{p_{Ti} \, p_{Tj}}{\left( \sum p_{Ti} \right)^2} \qquad\qquad W^U_{ij} = \frac{1}{N^2} . \tag{3}$$

The advantage of the transverse–momentum weight is that soft and collinear jets with their limited amount of information about the hard process are automatically suppressed. The resulting analysis becomes stable with respect to the parton shower and QCD jet radiation. For tagging jets without an actual collinear divergence the transverse momentum weight should be appropriate.

Whenever we are interested in the color structure of the event, this jet radiation will carry the crucial information. For studies of central jet radiation we therefore expect the unit weight to be the most promising.

In analogy to a jet veto, Fox–Wolfram moments with unit weight introduce an energy or momentum scale, above which we include jets in the moments. Because of the unit weight there does not exist a smooth transition regime; requiring any Fox–Wolfram moment of such additional jets to be different from zero corresponds to a step function in counting the number of jets. Because the new momentum scale usually resides below the factorization scale of the hard event, fixed–order precision predictions are not applicable, and a dedicated resummation is a theoretical challenge [13–18]. In Section V we introduce the matched weight

$$W^M_{ij} = \frac{\left( p_{Ti} - p_T^{\min} \right) \left( p_{Tj} - p_T^{\min} \right)}{\left( \sum p_{Ti} - p_T^{\min} \right)^2} \tag{4}$$

in order to reduce the theoretical uncertainty in comparing measured cross sections to QCD predictions. This new weight avoids introducing a new hard scale and will be less dominated by the momentum scale $p_{Tj}^{\min} = 20$ GeV, above which jets contribute to the Fox–Wolfram moments.
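The definitions in Eqs.(2) to (4) can be illustrated with a short numerical sketch. The following Python code is not taken from the analysis; it is a minimal illustration under stated assumptions: jets are given as hypothetical `(pT, eta, phi)` tuples and treated as massless, the single-jet unit weight is taken to be $1/N$, and the matched weight is normalized to the sum of the shifted transverse momenta. The function names are illustrative only.

```python
import math

def legendre(ell, x):
    """Legendre polynomial P_ell(x), evaluated with the Bonnet recurrence."""
    if ell == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, ell):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def cos_opening_angle(jet_a, jet_b):
    """Cosine of the total angle between two massless jets (pT, eta, phi)."""
    _, eta_a, phi_a = jet_a
    _, eta_b, phi_b = jet_b
    num = math.cos(phi_a - phi_b) + math.sinh(eta_a) * math.sinh(eta_b)
    cos = num / (math.cosh(eta_a) * math.cosh(eta_b))
    return max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]

def single_jet_weights(jets, kind, pt_min=20.0):
    """Weights W_i with W_ij = W_i * W_j; 'T', 'U' or 'M' as in Eqs.(3), (4)."""
    if kind == "U":  # unit weight (assumed normalization 1/N per jet)
        return [1.0 / len(jets)] * len(jets)
    shift = pt_min if kind == "M" else 0.0  # matched weight subtracts pt_min
    if kind not in ("T", "M"):
        raise ValueError("kind must be 'T', 'U' or 'M'")
    shifted = [pt - shift for pt, _, _ in jets]
    total = sum(shifted)
    if total <= 0.0:
        raise ValueError("weights are undefined for this jet configuration")
    return [s / total for s in shifted]

def fox_wolfram(jets, ell, kind="T", pt_min=20.0):
    """Fox-Wolfram moment H_ell as the double sum over jet pairs, Eq.(2)."""
    w = single_jet_weights(jets, kind, pt_min)
    return sum(
        w[i] * w[j] * legendre(ell, cos_opening_angle(jets[i], jets[j]))
        for i in range(len(jets))
        for j in range(len(jets))
    )

if __name__ == "__main__":
    # Hypothetical event: two forward tagging jets plus one soft central jet.
    event = [(80.0, 2.5, 0.3), (60.0, -2.8, 2.9), (25.0, 0.1, 1.2)]
    for kind in ("T", "U", "M"):
        print(kind, [round(fox_wolfram(event, ell, kind), 3) for ell in range(5)])
```

With these normalizations $H_0 = 1$ for every weight. In the matched case a jet just above $p_T^{\min}$ contributes almost nothing, which is the smooth turn-on that the unit weight lacks.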

B. Event generation

While the description of the tagging jets in weak boson fusion is straightforward, the continuum background with its QCD jet activity is more tricky. Moreover, the correct description of the QCD activity in the Higgs signal requires a careful treatment of the color structure of the hard process. Throughout this analysis we use Sherpa [24] with Ckkw merging [25]. For the weak boson fusion signal we generate samples including up to three hard jets, including the tagging jets. Gluon fusion Higgs production we simulate with up to three hard jets. For the QCD background we include di–photon production plus up to two hard jets. For jet clustering we rely on the anti-$k_T$ algorithm as in Fastjet [26] with $R = 0.[\ldots]$

$$p_{T\gamma} > 14 \text{ GeV} \qquad R_{\gamma j} > [\ldots] \qquad m_{\gamma\gamma} > 80 \text{ GeV} . \tag{5}$$

After those cuts we are left with a weak–boson–fusion signal cross section times branching ratio of 5.2 fb at 8 TeV collider energy and 9.24 fb at 13 TeV collider energy. To allow for an efficient generation of background events we do not require a mass window for the two photons in the background generation. Later in the analysis we add an $m_{\gamma\gamma}$ window of $\pm 10$ GeV around the Higgs mass. For a proper Higgs analysis we should require an $m_{\gamma\gamma}$ window of 1-2 GeV around the measured Higgs mass. However, with this condition the event generation for the background becomes highly inefficient. Because our analysis does not intend to predict the actual signal and background cross sections and instead focuses on the improvement over the established experimental analysis [27], the loose cuts of Eq.(5) allow for a much more efficient event generation and will not affect our conclusions.

C. Boosted decision trees

Any multivariate analysis is based on some kind of mapping of a set of observables onto a single–valued quantity, the classifier response. Based on this classifier response we define a classification rule to separate signal and background events. Training the multivariate analysis on a set of simulated events aims to determine the best classification rule for a given signal and background. The optimal classification rule has to be determined by some measure, for example the signal efficiency, the statistical significance, or the signal–to–background ratio. Independent of this optimization, we can quantify the performance of any classification rule in terms of the signal efficiency and the background mis-identification probability. In this two–dimensional plane we can describe cuts on the same response parameter as a receiver operating characteristics (ROC) curve. Given such a ROC curve we are free to choose one or more operating points. In line with the ATLAS di-photon analysis we use a fixed 40% signal efficiency $\epsilon_S$ after acceptance with a variable background rejection $1 - \epsilon_B$ as the standard working point. In Section IV, we quote the main results of our BDT analysis for the best possible significance $S/\sqrt{S+B}$ given the set of kinematic observables and Fox–Wolfram moments.

Decision tree algorithms, as they are utilized in high energy physics applications, are based on a set of kinematic variables, intended to separate signal and background events. In the first step they choose the 'root node' variable, i.e. the variable with the best separation between signal and background. There exist several types of separation which we can choose from in Tmva [23]. We use the cross entropy

$$C_E = - \frac{S}{S+B} \log \frac{S}{S+B} - \frac{B}{S+B} \log \frac{B}{S+B} , \tag{6}$$

where $S$ and $B$ are the numbers of signal and background events in a particular subset of events. This measure is the closest to the original definition of information entropy [28]. After choosing the root node, the subsequent nodes are ordered by their separation at some threshold value.

For the complete decision tree the events are classified as signal–like or background–like by some measure. In the training set we know how good the tree is at classifying the events. Our training set includes 100000 events for each signal and background channel. In the next step the algorithm corrects for mistakes through a reweighting procedure, builds another decision tree, tests its performance, and repeats for some user–defined number of iterations. For this 'boosting' procedure we mainly use the adaptive boost algorithm implemented in Tmva [23]. The final classification rule for signal versus background events we then apply to an independent event sample, again including 100000 events per signal and background process. To prevent over–training we limit our forest to 400 trees, and the individual trees to three layers.

Because correlations between the different Fox–Wolfram moments are a key issue of our systematic approach to kinematic input variables, we carefully test two different boosting algorithms (adaptive and gradient boost [23]) as well as different multivariate analysis methods.
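To make the separation measure of Eq.(6) concrete, the sketch below evaluates the cross entropy of a node and scans a single variable for the cut that minimizes the event–weighted cross entropy of the two daughter nodes. This is a simplified illustration in plain Python, not the Tmva implementation; the function names and the brute-force scan over all observed values are assumptions made for this example.

```python
import math

def cross_entropy(n_sig, n_bkg):
    """Cross entropy C_E of Eq.(6) for a node with n_sig and n_bkg events."""
    total = n_sig + n_bkg
    if total <= 0:
        raise ValueError("node is empty")
    result = 0.0
    for n in (n_sig, n_bkg):
        frac = n / total
        if frac > 0.0:  # the limit x log x -> 0 for x -> 0
            result -= frac * math.log(frac)
    return result

def best_threshold(signal_values, background_values):
    """Scan cut values on one variable; return (cut, weighted daughter entropy).

    Events with value < cut go to the left node, all others to the right.
    """
    n_total = len(signal_values) + len(background_values)
    best = None
    for cut in sorted(set(signal_values) | set(background_values)):
        s_left = sum(1 for v in signal_values if v < cut)
        b_left = sum(1 for v in background_values if v < cut)
        s_right = len(signal_values) - s_left
        b_right = len(background_values) - b_left
        n_left, n_right = s_left + b_left, s_right + b_right
        if n_left == 0 or n_right == 0:
            continue  # not a real split
        impurity = (
            n_left * cross_entropy(s_left, b_left)
            + n_right * cross_entropy(s_right, b_right)
        ) / n_total
        if best is None or impurity < best[1]:
            best = (cut, impurity)
    return best
```

A pure node has $C_E = 0$, while an even mixture of signal and background gives the maximum $C_E = \log 2$. The variable whose best cut yields the lowest daughter entropy would play the role of the root node in this simplified picture.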

Per se, boosted decision trees are not particularly well suited for studying strongly correlated variables. The reason is that trees are built out of the individual variables. Two strongly correlated variables are best mapped through individual fine binnings in each of them, so a careful mapping of correlations will eventually lead to statistical limitations and a possible training on statistical fluctuations. Therefore, we compare BDT results to results using a multi–layer perceptron (MLP) neural network whenever an independent test appears sensible. We utilize an MLP neural network with a single hidden layer containing $N + 5$ neurons, where $N$ is the number of training variables.

III. TAGGING JET CORRELATION

In this first analysis we are going to use Fox–Wolfram moments to systematically test the completeness of the tagging jet correlations included by ATLAS. Because we directly refer to the current ATLAS result we use a collider energy of 8 TeV for the most recent LHC run. The two $p_T$-ordered tagging jets have to fulfill either of the two conditions

$$p_{Tj} > 25 \text{ GeV for } |y_j| < [\ldots] \qquad\qquad p_{Tj} > 30 \text{ GeV for } 2.[\ldots] \le |y_j| < [\ldots] . \tag{7}$$

These two tagging jets must also pass

$$|\Delta y_{j_1 j_2}| \ge [\ldots] \qquad\qquad m_{j_1 j_2} > 150 \text{ GeV} . \tag{8}$$

These cuts correspond to the variables used in the multivariate di-photon Higgs analysis by ATLAS [27],

$$\{ m_{j_1 j_2}, \; y_{j_1}, \; y_{j_2}, \; \Delta y_{j_1 j_2} \} \qquad \text{(ATLAS default)} . \tag{9}$$

The angular correlations between the tagging jets in weak–boson–fusion Higgs production are known to reflect the tensor structure of the $WWH$ vertex [21]. In this application the collinearity of the two tagging jets plays an important role, with the effect that the azimuthal angle between the tagging jets is a more sensitive probe than the opening angle between them. For the Fox–Wolfram moments this means that the definition in terms of the opening angle $\Omega_{ij}$ is not optimally suited. For the tagging jet analysis we therefore replace the opening angle in the Legendre polynomials by the azimuthal angle $\Delta\phi_{ij}$ between the two tagging jets,

$$H^{x,\phi}_\ell = \sum_{i,j=1}^{2} W^x_{ij} \, P_\ell(\cos \Delta\phi_{ij}) . \tag{10}$$

For a systematic study of the usefulness of the tagging jet correlations we perform a multivariate analysis of the Fox–Wolfram moments introduced in Section II A. Because the moments are based on spherical harmonics they form a basis and include all available information, given the weight $W^x_{ij}$ we use in their definition.

We show some sample BDT and MLP results based on the azimuthal moments in Table I. The full set of moments for each weight function by definition includes all available information for the corresponding weights. First, we see that including a large set of Fox–Wolfram moments gives a significant improvement over the current ATLAS set of observables, defined in Eq.(9). Both multivariate analyses using the first four moments with unit weight as well as with transverse–momentum weight reduce the remaining fraction of background events by a factor of two.
From the Tmva output we have checked that these eight moments dominate the distinctive power of the analysis.

The next question is which of the Fox–Wolfram moments contribute most to this improvement. From the earlier analysis [20] we know that lower moments will dominate in the tagging jet analysis, and that only odd moments can distinguish between forward–backward and forward–forward tagging jets. Individually, we find that the six best individual moments are (in order) $H^{U,\phi}_{[\ldots]}$, $H^{T,\phi}_{[\ldots]}$, $H^{U,\phi}_{[\ldots]}$, $H^{T,\phi}_{[\ldots]}$, $H^{U,\phi}_{[\ldots]}$, and $H^{T,\phi}_{[\ldots]}$.∗ The moments with unit weight are slightly more powerful than those with transverse–momentum weight. The most striking feature is that for the tagging jets the higher moments play hardly any role in improving the analysis.

| $\epsilon_S = 0.[\ldots]$ | BDT: $1-\epsilon_B$ | BDT: $S/\sqrt{S+B}$ | BDT: $S/B$ | MLP: $1-\epsilon_B$ | MLP: $S/\sqrt{S+B}$ | MLP: $S/B$ |
|---|---|---|---|---|---|---|
| ATLAS default Eq.(9) | 0.887 | 1.50 | 0.76 | 0.888 | 1.50 | 0.78 |
| $H^{T,\phi}_{[\ldots]} \to H^{T,\phi}_{[\ldots]}$, $H^{U,\phi}_{[\ldots]} \to H^{U,\phi}_{[\ldots]}$ | […] | […] | […] | […] | […] | […] |
| $H^{T,\phi}_{[\ldots]}$, $H^{T,\phi}_{[\ldots]}$, $H^{U,\phi}_{[\ldots]}$, $H^{U,\phi}_{[\ldots]}$ | […] | […] | […] | […] | […] | […] |
| $H^{T,\phi}_{[\ldots]}$, $H^{T,\phi}_{[\ldots]}$, $H^{U,\phi}_{[\ldots]}$, $H^{U,\phi}_{[\ldots]}$ | […] | […] | […] | […] | […] | […] |
| $H^{T,\phi}_{[\ldots]}$, $H^{U,\phi}_{[\ldots]}$ | […] | […] | […] | […] | […] | […] |
| $H^{T,\phi}_{[\ldots]}$ | […] | […] | […] | […] | […] | […] |
| $H^{U,\phi}_{[\ldots]}$ | […] | […] | […] | […] | […] | […] |
| $[\ldots]\phi$, $W^T [\ldots]\phi$ | […] | […] | […] | […] | […] | […] |

Table I: […] $S/\sqrt{S+B}$ we compute for an integrated luminosity of 30 fb$^{-1}$. All sets of variables subsequent to the first row contain the default variables as well.

As a matter of fact, the single moment $H^{U,\phi}_1$ is, within uncertainties due to the training procedure, almost as powerful as the set of the first 20 moments, both with unit and transverse–momentum weight. Given that the corresponding Legendre polynomial is $P_1(\cos \Delta\phi_{ij}) = \cos \Delta\phi_{ij}$ we can further simplify the analysis by separating the transverse–momentum weight from the azimuthal angle. Compared to the ATLAS default variables, adding the azimuthal angle between the tagging jets, $\Delta\phi_{ij}$, almost doubles the signal–to–background ratio. Systematically including the Fox–Wolfram moments increases the signal–to–background ratio by an additional 8%. This result persists between the two multivariate methods and we conclude that our improvement is truly due to the nature of the moments and not to some advantageous choice of methods and/or parameters for our multivariate analyses.

Following the tagging jet analysis in this section we extend the default set of tagging jet cuts Eq.(9) for the remainder of this paper to include

$$\{ m_{j_1 j_2}, \; y_{j_1}, \; y_{j_2}, \; \Delta y_{j_1 j_2}, \; \Delta\phi_{j_1 j_2} \} \qquad \text{(WBF default)} . \tag{11}$$

It could be argued that adding the azimuthal angle to the list of kinematic variables employed in the background rejection will make the analysis result less applicable to modified Higgs–like signal hypotheses. Indeed, the azimuthal angle between the tagging jets is the key observable in the spin-0 CP analysis of the Higgs resonance [21]. On the other hand, the same is true for the rapidity difference $\Delta y$ when it comes to spin-2 alternatives [21].

IV. REPLACING A JET VETO

The key physics question we will answer in this Section is to what degree we can use information on additional (central) jet radiation to enhance the tagging jet analysis described in the previous Section III. Because a detailed analysis of the jet activity has not been performed in the recent LHC runs, we assume a collider energy of 13 TeV in this section. The physics of the additional jets can be easily described: for the signal events the emission of additional central jets is suppressed by the color structure of the process. This means that the number of jets in weak boson fusion will in general follow the staircase pattern predicted for inclusive processes at the LHC [13]. In contrast, gluon–fusion Higgs production or di-photon production will show this staircase pattern only in the absence of tagging jet cuts. Once we require two hard jets with a large invariant mass we induce large logarithms, which leads to a Poisson pattern in the number of jets [13]. The key feature of this Poisson distribution is a significantly enhanced probability of radiating a central jet.

∗ Given that Tmva gives an ordered list of the most relevant observables, it is not clear to one of the authors (TP) why this very interesting information is never shown in experimental publications.

| | WBF ($\Delta y$-selection) | GF ($\Delta y$-selection) | $\gamma\gamma$ ($\Delta y$-selection) | WBF ($p_T$-selection) | GF ($p_T$-selection) | $\gamma\gamma$ ($p_T$-selection) |
|---|---|---|---|---|---|---|
| generated [fb] | 6.5 | 4.5 | 2050 | 6.5 | 4.5 | 2050 |
| $\Delta y_{j_1 j_2} > [\ldots]$ | × […] | × […] | × […] | × […] | × […] | × […] |
| $y_{j_1} y_{j_2} < [\ldots]$ | × […] | × […] | × […] | × […] | × […] | × […] |
| $m_{j_1 j_2} > 600$ GeV | × […] | × […] | × […] | × […] | × […] | × […] |
| […] | × […] | × […] | × […] | × […] | × […] | × […] |

Table II: Cut flow of the signal and background rates for the $\Delta y$-selection and the $p_T$-selection of the tagging jets.

Throughout our analysis we require two tagging jets with the generic acceptance cuts

$$p_{Tj} > 20 \text{ GeV} \qquad\qquad |y_j| < [\ldots] \tag{12}$$

$$|\Delta y_{j_1 j_2}| > [\ldots] \qquad\qquad m_{j_1 j_2} > 150 \text{ GeV} . \tag{13}$$

Correspondingly, we generate signal and background events using Sherpa [24] with CKKW [25] jet merging with two or three hard jets from the matrix element. Throughout this Section we assume a collider energy of 13 TeV. In addition to the general photon cuts of Eq.(5) we require $m_{\gamma\gamma} = 126 \pm 10$ GeV. The cuts of Eq.(12) lead to cross sections of 6.5 fb for the weak–boson–fusion signal, 4.5 fb for gluon–fusion Higgs production, and 2050 fb for the continuum background. As mentioned above, the signal–to–background ratio can be improved through additional cuts, such as tightening the $m_{\gamma\gamma}$ requirement. However, this makes it harder to reliably simulate the background. In the following we will assume that additional cuts on the Higgs decay products are orthogonal to the additional jet kinematics.

Because the selection criterion of the two tagging jets has a significant impact on the amount of Poisson enhancement of the additional jet production we use two selection criteria for the tagging jets:

1. $p_T$-selection: of all jets fulfilling Eqs.(12) and (13) the two hardest are the tagging jets. The mild cuts of Eq.(13) leave 3.36 fb for the signal, 1.04 fb for gluon–fusion Higgs production, and 509 fb for the continuum background.
2. $\Delta y$-selection: of all jets fulfilling Eqs.(12) and (13) the two most forward and backward are the tagging jets, maximizing $\Delta y_{j_1 j_2}$. After Eq.(13) the remaining rates are 3.78 fb for the signal, 1.71 fb for gluon–fusion Higgs production, and 736.2 fb for the non-Higgs background.

While the $p_T$-selection is standard in most weak–boson–fusion analyses, it will turn out that the $\Delta y$-selection is more efficient in generating a large Poisson enhancement for central jet emission in the background processes. On the other hand, in particular for the 13 TeV run we have to see if pile-up makes one of the two selections appear experimentally superior.

The standard approach to including the additional jet activity in the weak–boson–fusion Higgs analysis is a central jet veto [12, 15]. To generate a sufficiently strong Poisson pattern in the number of jets we demand

$$|\Delta y_{j_1 j_2}| > [\ldots] \qquad y_{j_1} \cdot y_{j_2} < [\ldots] \qquad m_{j_1 j_2} > 600 \text{ GeV} . \tag{14}$$

In Table II we show the cut flow of the signal and background rates for each step in Eq.(14). Finally, we include a central jet veto which does not allow for jets above $p_T = 20$ GeV in between the two tagging jets. While the two tagging jet selections show significant differences in the intermediate steps, after the veto the numbers of signal and background events are comparable. The survival rates for the central jet veto are in agreement with the literature [7, 15].

In the first three rows of Table III we show different statistical measures after the acceptance cuts of Eqs.(12) and (13), the veto–level cuts of Eq.(14), and after the central jet veto. The background is composed of gluon–fusion Higgs production and continuum di-photon production. We again see that the significance $S/\sqrt{S+B}$ and the signal–to–background ratio are comparable for the $\Delta y$-selection and the $p_T$-selection of the tagging jets. However, this is only true after the jet veto. After only the hard cuts of Eq.(14) the $p_T$-selection is significantly more promising. As alluded to above, the jet veto benefits from the stronger Poisson enhancement from the $\Delta y$-selection, leaving the final results essentially identical.

| | $\Delta y$: $\epsilon_S$ | $\Delta y$: $1-\epsilon_B$ | $\Delta y$: $S/\sqrt{S+B}$ | $\Delta y$: $S/B$ | $p_T$: $\epsilon_S$ | $p_T$: $1-\epsilon_B$ | $p_T$: $S/\sqrt{S+B}$ | $p_T$: $S/B$ |
|---|---|---|---|---|---|---|---|---|
| acceptance cuts Eqs.(12) and (13) | 1 | 0 | 0.76 | 0.005 | 1 | 0 | 0.81 | 0.007 |
| veto–level cuts Eq.(14) | 0.402 | 0.854 | 0.80 | 0.014 | 0.405 | 0.996 | 1.01 | 0.026 |
| jet veto | 0.302 | 0.967 | 1.24 | 0.047 | 0.369 | 0.945 | 1.26 | 0.045 |
| BDT: WBF default with Eq.(13), 40% signal efficiency | 0.400 | 0.862 | 0.79 | 0.014 | 0.400 | 0.904 | 1.04 | 0.027 |
| BDT: WBF default with Eq.(13), best $S/\sqrt{S+B}$ | 0.634 | 0.674 | 0.84 | 0.010 | 0.414 | 0.897 | 1.04 | 0.027 |
| BDT: WBF default plus FWM with Eq.(13), 40% signal efficiency | 0.400 | 0.952 | 1.34 | 0.041 | 0.400 | 0.944 | 1.35 | 0.047 |
| BDT: WBF default plus FWM with Eq.(13), best $S/\sqrt{S+B}$ | 0.232 | 0.986 | 1.42 | 0.083 | 0.302 | 0.972 | 1.43 | 0.071 |

Table III: $S/B$ and $S/\sqrt{S+B}$ compared to classical cut and jet veto strategy for the $\Delta y$- and $p_T$-selections of the tagging jets. The value for $S/\sqrt{S+B}$ we compute for an integrated luminosity of 30 fb$^{-1}$. The BDT analysis includes a set of Fox–Wolfram moments with unit weight, Eq.(15). We quote two working points at 40% signal efficiency and optimized for $S/\sqrt{S+B}$.

In the next step, we use the default WBF observables of Eq.(11) and optimize them in a multivariate BDT analysis as described in Section II C. The corresponding ROC curve we show in Figure 1. As in Table III the efficiencies are defined with respect to the full set of acceptance cuts from Eqs.(12) and (13). In the table we quote two points from this curve. First, we show the usual working point with a signal efficiency of 40%. Second, we show the working point with the best result for $S/\sqrt{S+B}$. Optimizing for the best result of $S/B$ does not give a well defined solution. As expected, the ROC curve indicates working points for the entire range of signal efficiencies $\epsilon_S = 0 \ldots$ […] $W^x_{ij}$. Unlike for the tagging jet kinematics we now do not constrain our system to the transverse plane, which means we use the original definition of the moments in Eq.(2) with the opening angle $\Omega_{ij}$. On the other hand, we already know what the benefit of including the moments of the tagging jets is: according to Section III most of the information is included once we add the azimuthal angle between the tagging jets, $\Delta\phi_{j_1 j_2}$, to the standard set of observables given in Eq.(11). Therefore, we limit the analysis of the additional jet activity to all jet–jet correlations with the exception of the two tagging jets.
Moreover, we can expect the unit weight to give the best sensitivity to the relatively soft additional jet activity, so we use

$$H^U_\ell = \frac{1}{N^2} \sum_{(i,j) \ne (1,2)} P_\ell(\cos \Omega_{ij}) . \tag{15}$$

For both of the tagging jet selections we only include jets which fall between the two tagging jets, in complete analogy to a central jet veto. For exactly two tagging jets and no additional jet radiation this implies $H^U_\ell = 0$ for all values of $\ell$.

In Table III we show the result of a combined BDT analysis of the observables of Eq.(11) and the set of Fox–Wolfram moments. Again, we quote two operating points, one of them for a fixed signal efficiency of 40% and one optimized for the best value of $S/\sqrt{S+B}$. In addition, we show results for both the $\Delta y$-selection and the $p_T$-selection of the tagging jets. A generic problem for any BDT analysis is that for limited statistics of the training sample it can only include a limited number of observables. On the other hand, the BDT first determines the most powerful observables, so we only include the five best Fox–Wolfram moments in our analysis. We have checked that adding more moments will not improve the result beyond numerical accuracy. For the $\Delta y$-selection the five leading moments with unit weight are $H^U_{[\ldots]}$, $H^U_{[\ldots]}$, $H^U_{[\ldots]}$, $H^U_{[\ldots]}$, and $H^U_{[\ldots]}$. For the $p_T$-selection the most powerful moments are $H^U_{[\ldots]}$, $H^U_{[\ldots]}$, $H^U_{[\ldots]}$, $H^U_{[\ldots]}$, and $H^U_{[\ldots]}$. However, for the $p_T$-selection the most powerful variable in the BDT is $\Delta y_{j_1 j_2}$. For the $\Delta y$-selection this observable is maximized by construction.

The ROC curves in Figure 1 show a clear improvement of the complete multivariate analysis including the Fox–Wolfram moments as compared to the kinematic variables of Eq.(11) only. For a fixed moderate signal efficiency of 40% adding information on the jets decreases the probability of a background mis-identification by a factor of 2.9 for the $\Delta y$-selection and a factor of 1.7 for the $p_T$-selection. The improvement relative to the jet veto we show in the right panel, zooming into typical signal efficiencies around 35% relative to the acceptance cuts of Eq.(13). For the jet veto working point of the $\Delta y$-selection with fixed signal efficiency of 30.2% we see that the background misidentification is reduced by 30%. For the $p_T$-selection with fixed signal efficiency of 36.9% we find an improvement by 20%.

[Figure 1, two panels with axes $\epsilon_S$ and $1 - \epsilon_B$. Left panel curves: WBF default and plus FWMs, each for the $\Delta y$-selection and the $p_T$-selection. Right panel: FWMs for the $\Delta y$-selection and the $p_T$-selection, together with the two jet veto points.]

Figure 1: ROC curve for $\Delta y$- (black) and $p_T$-selection (red) of the tagging jets. Left: We compare the WBF default observables (dashed) of Eq.(11) to an additional set of Fox–Wolfram moments (solid). Right: We show how using Fox–Wolfram moments compares to a central jet veto.

V. AVOIDING NEW SCALES

The unit weights in the definition of the Fox–Wolfram moments used in the previous Section IV share a disadvantage with a jet veto when it comes to predicting them from theory: they introduce an additional physical momentum scale in the process which is below the hard scale of the Higgs production process. Collinear factorization as the basis of defining the parton densities in perturbative field theory does not allow for such additional scales. All measurements which are to be compared to fixed–order perturbative QCD predictions have to be jet–inclusive for transverse momenta below the factorization scale. If we introduce an additional scale this implies that we introduce a possibly large logarithm which needs to be resummed [16–18].

Introducing a weight which smoothly interpolates between the jet counting scale p_{Tj}^min = 20 GeV and the hard scale of the process according to Eq.(4) should alleviate this tension, suggesting to repeat the same analysis as shown in Section IV with the Fox–Wolfram moments

H^M_\ell = \sum_{(i,j) \neq (1,2)} \frac{ (p_{Ti} - p_T^{\min}) \, (p_{Tj} - p_T^{\min}) }{ \left( \sum p_{Ti} - p_T^{\min} \right)^2 } \; P_\ell(\cos \Omega_{ij}) .    (16)

                                            ∆y-selection                        p_T-selection
                                            ε_S    1−ε_B  S/√(S+B)  S/B         ε_S    1−ε_B  S/√(S+B)  S/B
jet veto, Eq.(12) to (14)                   0.302  0.967  1.24      0.047       0.369  0.945  1.26      0.045
BDT: WBF default plus unit–weight FWM       0.400  0.952  1.34      0.041       0.400  0.944  1.35      0.047
                                            0.232  0.986  1.42      0.083       0.302  0.972  1.43      0.071
BDT: WBF default plus matched–weight FWM    0.400  0.949  1.32      0.040       0.400  0.942  1.32      0.045
                                            0.240  0.985  1.43      0.081       0.256  0.979  1.40      0.082

Table IV: S/B and S/√(S+B) compared to the jet veto strategy for the ∆y- and p_T-selections of the tagging jets. The value for S/√(S+B) we compute for an integrated luminosity of 30 fb^-1. Extending Table III, the BDT analysis now includes a set of Fox–Wolfram moments with matched weight, Eq.(16). As BDT results we quote the working point at 40% signal efficiency and the best point for S/√(S+B).

[Figure: two ROC panels with axes ε_S and 1−ε_B. Legend entries: ∆y-selection matched weight, ∆y-selection unit weight, p_T-selection matched weight, p_T-selection unit weight; ∆y-selection FWMs, p_T-selection FWMs, jet veto.]
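As a quick cross-check of Table IV, the signal–to–background ratio at two working points can be compared using only the quoted efficiencies, since S/B scales like ε_S/ε_B when both efficiencies refer to the same acceptance cuts. The following Python sketch makes that assumption explicit; the function name sb_gain is ours and purely illustrative.

```python
def sb_gain(eff_s_ref, rej_b_ref, eff_s_new, rej_b_new):
    """Ratio of S/B between a new and a reference working point.

    Assumes S/B is proportional to eps_S / eps_B, i.e. that both
    working points are defined relative to the same acceptance cuts.
    The arguments are eps_S and the background rejection 1 - eps_B.
    """
    eps_b_ref = 1.0 - rej_b_ref
    eps_b_new = 1.0 - rej_b_new
    return (eff_s_new / eps_b_new) / (eff_s_ref / eps_b_ref)


# Table IV, Delta-y selection: jet veto versus the best S/sqrt(S+B)
# point of the BDT with unit-weight Fox-Wolfram moments.
gain_from_eff = sb_gain(0.302, 0.967, 0.232, 0.986)

# The same comparison taken directly from the tabulated S/B column.
gain_from_table = 0.083 / 0.047

print(f"gain from efficiencies: {gain_from_eff:.2f}")
print(f"gain from S/B column:   {gain_from_table:.2f}")
```

For the ∆y-selection the efficiencies give a gain of about 1.8 over the jet veto, in line with the ratio of the tabulated S/B values within the rounding of the three-digit entries.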

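To make the matched weight of Eq.(16) concrete, the following Python sketch evaluates H^M_ℓ for a list of jets. It is a minimal illustration under stated assumptions, not the analysis code: jets are given as (p_T, η, φ) and treated as massless, the two tagging jets are taken to be the first two entries, the excluded pair (1,2) is read as both orderings of the tagging-jet pair, and the normalization is the squared sum of the shifted transverse momenta. All function names are hypothetical.

```python
import math

P_T_MIN = 20.0  # GeV, jet counting scale quoted in the text


def legendre(ell, x):
    """Legendre polynomial P_ell(x) via Bonnet's recurrence."""
    if ell == 0:
        return 1.0
    p_prev, p_curr = 1.0, x
    for n in range(1, ell):
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return p_curr


def unit_vector(eta, phi):
    """Direction of a massless jet with pseudorapidity eta and azimuth phi."""
    return (math.cos(phi) / math.cosh(eta),
            math.sin(phi) / math.cosh(eta),
            math.tanh(eta))


def matched_moment(jets, ell, pt_min=P_T_MIN):
    """Matched-weight moment for jets given as (pT, eta, phi) tuples.

    Assumption: jets[0] and jets[1] are the tagging jets, and only the
    cross terms between these two are dropped from the double sum.
    """
    shifted = [pt - pt_min for pt, _, _ in jets]
    norm = sum(shifted) ** 2
    if norm == 0.0:
        return 0.0  # all jets sit at the threshold, no weight left
    dirs = [unit_vector(eta, phi) for _, eta, phi in jets]
    total = 0.0
    for i in range(len(jets)):
        for j in range(len(jets)):
            if {i, j} == {0, 1}:
                continue  # skip the tagging-jet pair (1,2) and (2,1)
            cos_omega = sum(a * b for a, b in zip(dirs[i], dirs[j]))
            cos_omega = max(-1.0, min(1.0, cos_omega))  # guard rounding
            total += shifted[i] * shifted[j] * legendre(ell, cos_omega)
    return total / norm


# Illustrative event: two forward tagging jets plus one central jet.
jets = [(60.0, 2.0, 0.0), (40.0, -2.5, 3.0), (30.0, 0.2, 1.0)]
for ell in range(1, 5):
    print(ell, matched_moment(jets, ell))
```

A jet sitting exactly at p_T^min = 20 GeV carries zero weight, so adding it leaves every moment unchanged. This is the smooth behavior at the jet counting scale that the matched weight is meant to provide.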
Figure 2: ROC curve for ∆y- (black) and p_T-selection (red) of the tagging jets. Left: We compare the WBF default observables (dashed) of Eq.(11) to an additional set of Fox–Wolfram moments (solid). Right: We show how using Fox–Wolfram moments compares to a central jet veto.

While we cannot offer an estimate of the improvement in the perturbative QCD treatment, it is clear that the matched weights are less sensitive to large collinear logarithms generated by the violation of collinear factorization.

In Table IV we extend the original Table III, including the same BDT analysis now based on matched Fox–Wolfram moments. For the standard working point with 40% signal efficiency we see that the background rejection from the matched moments is essentially identical to the unit weight moments. The main difference is the order of the most relevant set of moments, which now is H^M, H^M, H^M, H^M, H^M for the ∆y-selection and H^M, H^M, H^M, H^M, H^M for the p_T-selection. Similarly, the working point optimized for S/√(S+B) is only slightly shifted. In Figure 2 we compare the ROC curves for the jet radiation study based on the two Fox–Wolfram moment weights. For signal efficiencies between 25% and 40% the unit weight is slightly superior, but most likely this slight advantage will be compensated once we include theory uncertainties from QCD predictions.

VI. OUTLOOK

Weak boson fusion analyses of Higgs production at the LHC are key ingredients to Higgs couplings and Higgs property analyses in the upcoming LHC run. They allow for an efficient background rejection based on two tagging jets and an additional central jet veto. The question is how we can make optimal use of the jet properties, for example to improve the signal–to–background ratio or the signal significance. In our detailed analysis we come to three conclusions:

1. For the two tagging jets we rely on a set of low-ℓ moments with a transverse momentum weight and azimuthal angle separation. Most of the improvement as compared to the standard ATLAS analysis can be traced back to the missing azimuthal angle between the tagging jets. In addition, the signal–to–background ratio can be increased by 8% by including a set of Fox–Wolfram moments.

2. The additional jets can be studied using a wide range of moments with a unit weight and full angular separation. This approach should be compared to a jet veto and delivers a significantly better performance. The tagging jet selection with maximum rapidity distance is better suited to distinguish the signal from the continuum background than the transverse momentum selection. For both cases we computed a full ROC curve, allowing for optimized working points depending on the details of the analysis.

3. To reduce theory uncertainties from QCD predictions we can introduce a softer, matched weight in the Fox–Wolfram moments. It turns out that the analysis of jet radiation is almost as promising as for the unit weights, but with a much improved theoretical behavior.

We conclude that tagging jet criteria as well as the jet veto as analysis tools for Higgs analyses in weak boson fusion can be improved by a systematic study of the multi–jet system based on Fox–Wolfram moments. The improvement is significant, both for the ∆y-selection and the p_T-selection of the tagging jets. The Fox–Wolfram moment analysis can be adapted to individual analyses by choosing appropriate working points in the corresponding ROC curves.

Acknowledgments

CB would like to thank the ATLAS collaboration at CERN, where part of this work was performed. We would like to thank Erik Gerwick and Steffen Schumann for help with the event generation. CB acknowledges support by BMBF under project number 05H12VHE. PS acknowledges support by the IMPRS for