HEPLike: an open source framework for experimental likelihood evaluation
Jihyun Bhom∗, Marcin Chrzaszcz∗
Henryk Niewodniczanski Institute of Nuclear Physics Polish Academy of Sciences, Krakow, Poland
Abstract
We present a computer framework to store and evaluate likelihoods coming from High Energy Physics experiments. Thanks to its flexibility it can be interfaced with existing fitting codes, and it makes the interpretation of the experimental results uniform among users. The code is provided with a large open database containing the experimental measurements. It is aimed at users who perform phenomenological studies, global fits or experimental averages.
Keywords: experimental high energy physics, likelihoods
PROGRAM SUMMARY
Program Title: HEPLike
Licensing provisions: GPLv3
Programming language: C++
Nature of problem: Provide a uniform way to store, share and evaluate experimental likelihoods in a statistically proper manner. The code can easily be interfaced with existing global fitting codes. In addition, a large database with the measurements is published. The program targets users who perform phenomenological studies, global fits or measurement averages in their scientific work. HEPLike was created for the FlavBit project [1], which was used to perform several analyses [2, 3]; here we present an updated version, which can be used in standalone mode.
Solution method: C++ code that evaluates the statistical properties of the measurements without user intervention. A large open database is provided as well. The measurements are stored in YAML files, allowing for easy readability and extension.

∗ Corresponding author.
E-mail address: [email protected], [email protected]

Preprint submitted to Computer Physics Communications, March 10, 2020
References
[1] arXiv:1705.07933
[2] arXiv:1705.07935
[3] arXiv:1705.07917

1. Introduction

In High Energy Physics (HEP) the experimental measurements are performed by several collaborations, which measure many different observables. The experimental results are presented in various ways: some as simple as a measurement with a Gaussian error, some more complicated, such as multiple correlated measurements with asymmetric errors; in some cases even a full likelihood function is published. To make things more complicated, multiple representations of the same measurement are sometimes published. All of this makes it hard to directly use and compare the various results. It also leaves room for misinterpretation of the results by the theorists who use these inputs in their studies: it happens that asymmetric errors are symmetrized, or that instead of the full likelihood only the central value with an approximated asymmetric error is used. The High Energy Physics Likelihoods (
HEPLike) is a computer program that allows one to store and share the likelihoods of various measured quantities. The published code can be useful for users performing phenomenological studies based on experimental results, for global fitting collaborations, and for experimental averages. Thanks to its structure it is easy to interface with existing codes. It simplifies the users' work: instead of looking up the appropriate measurement and coding up their own likelihood, they can download the database of measurements and choose the one they need. Furthermore, it shifts the burden of constructing the proper likelihood functions back to the experimentalists, who performed the measurement in the first place and are clearly the most appropriate people to handle this task. The computer code described in this paper is written in
C++, making it useful for the majority of fitting programs available on the market [1, 2, 3, 4, 5, 6]. The library can be used in both χ² and likelihood fits. Moreover, it contains a statistical module with useful functions for statistical analysis. Besides the computer code, a database with the likelihoods is published. The measurements are stored in YAML files, making them easy to read for both machines and humans. This database can be easily extended by adding new
YAML files when new measurements become available. With the software we provide useful utilities which allow one to perform searches inside the database, create
BiBtex files containing the publications that have been used in the fit, etc. The paper is organized as follows: in Sec. 2 the construction of the likelihood functions is presented; Sec. 3 explains the detailed code implementation and data storage, while Sec. 4 describes how to install and use the
HEPLike software.
2. Likelihood constructions
In this section we will present how likelihoods in
HEPLike are stored and constructed. Each measurement is stored in a separate
YAML file. There are several ways in which collaborations publish their results, depending on the measurement itself:
• Upper limits,
• Single measurement with symmetric uncertainty,
• Single measurement with asymmetric uncertainty,
• Multiple measurements with symmetric uncertainties,
• Multiple measurements with asymmetric uncertainties,
• One-dimensional likelihood function,
• n-dimensional likelihood function.
In addition, there is growing interest in the community that the experimental collaborations publish not only the results of an analysis but also the dataset that has been used to obtain them. For this future occasion we have also implemented a way in which such data can be used directly in fits. Each of these cases has a different module of HEPLike that is designed to evaluate the likelihood function. In this section we present the statistical treatment of the above cases and the modules that are responsible for their evaluation. Each of the
YAML files is required to have the following information (here as an example we use the R_{K^{*}} measurement [7]):

BibCite: Aaij:2017vbb
BibEntry: '@article{Aaij:2017vbb,
  author = "Aaij, R. and others",
  title = "{Test of lepton universality with $B^{0}\rightarrow K^{*0}\ell^{+}\ell^{-}$ decays}",
  collaboration = "LHCb",
  journal = "JHEP",
  volume = "08",
  year = "2017",
  pages = "055",
  doi = "10.1007/JHEP08(2017)055",
  eprint = "1705.05802",
  archivePrefix = "arXiv",
  primaryClass = "hep-ex",
  reportNumber = "LHCB-PAPER-2017-013, CERN-EP-2017-100"
  }'
DOI: 10.1007/JHEP08(2017)055
Process: R_{Kstar^{*}}
...
HLAuthor: Gal Anonim
HLEmail: [email protected]
HLType: HL_ProfLikelihood

The above information is used for bookkeeping. For instance, the entries
BibCite and
BibEntry correspond to the information used to generate a
BiBtex citation file with the measurements that have been used in the studies. The
DOI corresponds to the digital object identifier of the publication. The
Decay defines the process that has been studied. It can also be replaced by the
Process entry. The
Name is a unique name of this measurement type. If the measurement gets updated with more data or by another collaboration, the
Name entry in the new
YAML file should be the same as in the old one.
Source entry corresponds to the source of the measurement; this can be either HEPData or the collaboration itself. The
SubmissionYear (PublicationYear) refers to the year of appearance (publication) of the result. The
Arxiv stores the arXiv number, while the
Collaborations entry stores the information about which experimental collaboration performed the measurement. Finally, the
Kinematics entry stores additional information about the kinematic region that has been measured. The
HLAuthor and
HLEmail encode the information about the
YAML file author and their email, in case the user needs further information about the encoded measurement. Last but not least, the entry
HLType contains the information about which
HEPLike object should be used to read the file. Reading of this content in the
YAML is implemented in the
HL_Data class. All other classes that construct the likelihood functions inherit its capabilities from this class. Please note that if some information is missing in the
YAML file the program will omit reading this entry. The only exception is the
FileName, which is mandatory. If a user wants to be notified by the program that some information is missing, the
HL_debug_yaml variable has to be set to true (the default value is false).

In cases where a measurement did not observe a significant excess of signal candidates, the collaborations usually report an upper limit on the measured quantity. Commonly 90% or 95% upper limits are quoted. Experiments use various statistical approaches to compute these limits: the CLs method [8], Feldman–Cousins [9], or some variation of Bayesian methods [10]. Publication of only an upper limit does not provide enough information to use the result in global fits. However, nowadays experiments publish, besides the aforementioned upper limits, full p-value scans; examples of such scans are shown in Fig. 1. The plots are usually available in digital format, which allows the information to be extracted and used in computer programs. In HEPLike the class
HL_Limit is responsible for handling this type of measurement. It reads the
YAML file that contains the standard information about the measurement (see Sec. 2 for details). The additional information, the observed CLs/p-value scan, is stored in the YAML file in the following way:

Cls:
 - [0.0, 1.0]
 - [1.0e-10, 0.977091694706]
 - [2.0e-10, 0.954375824297]
 - [3.0e-10, 0.93200355343]
 - [4.0e-10, 0.910630700546]
 - [5.0e-10, 0.889382721809]

Please note that besides this information the general information from Sec. 2 should be included.

Figure 1: Example of p-value scans for the B → τ−τ+ [11] (left) and D → eµ [8] (right). Please note that the CLs value can be interpreted as a p-value, as explained in [12]. The black line corresponds to the observed CLs/p-value.

The
Cls can be replaced in the
YAML file by a p-value entry, as they correspond to the same information. The first number in each array is the value of the tested hypothesis (for example a branching fraction), while the second is the corresponding CLs/p-value. These values are then interpreted using a χ² distribution with one degree of freedom:

pdf(x) = 1/(2^(1/2) Γ(1/2)) · x^(−1/2) e^(−x/2), (1)

which has the cumulative distribution function:

cdf(x) = γ(1/2, x/2)/Γ(1/2), (2)

where Γ(x) and γ(k, x) correspond to the gamma and lower incomplete gamma functions. By inverting cdf(x) one can obtain the χ² value:

χ² = cdf⁻¹(1 − p), (3)

where p corresponds to the p-value of a given hypothesis x. This χ² can then be translated to a log-likelihood via Wilks' theorem [13]:

−log(L) = χ²/2, (4)

where L is the likelihood. The user can choose whether to obtain the χ², the likelihood or the log-likelihood value of a given hypothesis.

The simplest case of a published experimental result is a single value with a symmetric uncertainty. This is for example a typical result of a PDG or HFLAV average [14, 15]. The measurement is coded in the
YAML file as:

Observables:
 - ["Br_A2BCZ", 0.1, 0.05, 0.01]

The first element of the array, "Br_A2BCZ", is the observable name. The first number is the measured central value, while the second and third numbers are the statistical and systematic uncertainties, respectively. In cases where only one uncertainty is available, the third number should be omitted; it will automatically be set to 0 in the software. We have decided to keep the plural Observables to stay uniform with the cases where more observables are measured. The module responsible for reading this
YAML file is called
HL_Gaussian; it calculates the χ² for a hypothesis x as:

χ² = (x_obs − x)² / (σ²_stat + σ²_syst), (5)

where x_obs corresponds to the measured central value in the YAML file, and σ_stat and σ_syst are the statistical and systematic uncertainties, respectively. This can again be translated to the likelihood and log-likelihood values using Eq. 4.

A simple extension of the Gaussian uncertainty is the case where an asymmetric uncertainty is reported. This type of measurement, although less frequent, appears in the literature. The publication in this case reports the central value and two uncertainties, σ⁺ and σ⁻, which correspond to the right-hand (for values larger than the measured central value) and left-hand (for values smaller than the measured central value) uncertainty. In HEPLike we have created a
HL_BifurGaussian class, which reads the following entry in the
YAML file:

Observables:
 - ["Br_A2BCZ", 0.1, 0.05, -...]

where the third and fourth numbers correspond to the statistical σ⁺ and σ⁻ uncertainties, while the fifth and sixth correspond to the systematic σ⁺ and σ⁻ uncertainties. It is important to keep the minus sign before the left-side uncertainties; the code will report an error in case of a missing sign. In some cases the systematic uncertainty is reported to be symmetric; in such a case the last number can be omitted in the YAML entry. In the literature there exists a number of ways to interpret asymmetric uncertainties [16]. We have chosen the most commonly used one, the so-called bifurcated Gaussian:

χ² = (x_obs − x)²/σ₊²  if x ≥ x_obs,
χ² = (x_obs − x)²/σ₋²  if x < x_obs, (6)

where σ± is the sum in quadrature of the statistical and systematic uncertainties for the right/left case. Once the χ² is calculated it can be translated to the log-likelihood using Eq. 4.

Nowadays the most common results are simultaneous measurements of several quantities that are correlated with each other, for instance cross-section measurements in different kinematic bins, or measurements of the angular coefficients in heavy meson decays. In
HEPLike the class responsible for handling these cases is called
HL_nDimGaussian. It reads the following information from the
YAML file:

Observables:
 - ["BR1", 0.1, 0.02]
 - ["BR2", 0.2, 0.01, 0.01]
 - ["BR3", 0.4, 0.04]
Correlation:
 - ["BR1", "BR2", "BR3"]
 - [1., 0.2, 0.]
 - [0.2, 1., 0.]
 - [0., 0., 1.]

The information in the "Observables" entry is exactly the same as in the HL_Gaussian class. Please note that, similarly to the previous class, the systematic uncertainty is not mandatory; in case it is not provided, the code will treat it as 0. The next entry in the
YAML file is the "Correlation", which encodes the correlation matrix. Its first row contains the names of the variables; it is important to keep the same order of variables as in the "Observables" entry. The
HL_nDimGaussian class evaluates the χ² in the following way:

χ² = Vᵀ Cov⁻¹ V, (7)

where V is a column matrix whose entries are the differences between the measured and the tested values of the observables, and Cov is a square matrix constructed from the correlation matrix (Corr): Cov_ij = Corr_ij σᵢ σⱼ. Often a user does not want to use the full set of measured quantities but just a subset of them; in this case the function Restrict(vector<string>) can be used to select the observables of interest. Measurements with n-dimensional asymmetric uncertainties are handled by the HL_nDimBifurGaussian class. The
YAML file encoding such a measurement will contain the following entries:

Observables:
 - ["BR1", 0.1, +0.02, -...]
 - ["BR2", 0.2, +0.01, -...]
 - ["BR3", 0.3, +0.04, -...]
Correlation:
 - ["BR1", "BR2", "BR3"]
 - [1., 0.1, 0.2]
 - [0.1, 1., 0.1]
 - [0.2, 0.1, 1.]

The meaning of the "Observables" entry is the same as in the previous class (cf. Sec. 2.3), and the "Correlation" entry encodes the same information as in the HL_nDimGaussian class (cf. Sec. 2.4). The rules about the minus sign and a symmetric systematic uncertainty are the same as in the case of the
HL_BifurGaussian class (cf. Sec. 2.3). The difference arises when one evaluates the χ²: the Cov matrix is constructed depending on whether the σ⁺ or σ⁻ uncertainty is relevant:

Cov_ij = Corr_ij σᵢ⁺ σⱼ⁺  if x_i ≥ x_i,obs and x_j ≥ x_j,obs,
Cov_ij = Corr_ij σᵢ⁺ σⱼ⁻  if x_i ≥ x_i,obs and x_j < x_j,obs,
Cov_ij = Corr_ij σᵢ⁻ σⱼ⁺  if x_i < x_i,obs and x_j ≥ x_j,obs,
Cov_ij = Corr_ij σᵢ⁻ σⱼ⁻  if x_i < x_i,obs and x_j < x_j,obs. (8)

The obtained Cov matrix is then used to calculate the χ² using Eq. 7; the rest follows the same procedure as described in Sec. 2.4.

The best way a result can be published is by providing the (log-)likelihood function, and this type of result is more and more common in the literature. The simplest case is a one-dimensional likelihood scan, which can be presented in the form of a figure; examples are shown in Fig. 2.
Figure 2: Examples of published one-dimensional likelihoods from the lepton universality tests in B → K∗ℓℓ [7] (left) and B → Kℓℓ [17] (right).

The biggest advantage of publishing the results in this form is its completeness: the (log-)likelihood curve contains all the information about the non-Gaussian effects and incorporates the systematic uncertainties. The technical problem is how to publish such information. Usually the plots are published in the pdf or png formats, which makes them hard to use. Since the experiments mostly use the ROOT [18] framework, the plots are often also saved in the C format, which contains the points in the form of arrays. This makes the points accessible; however, it is not easy to automate retrieving the data from a C file. The best solution is provided by the HEPData portal [19], which allows one to download the data in a preferred format. In
HEPLike we have chosen to use the
ROOT format by default, in which the data points are saved in the form of a
TGraph object, which is also the way experimentalists like to store this information. In the
YAML file we specify the path of the
ROOT file in the following way:

ROOTData: data/HEPData-ins1599846-v1-Table_1.root
TGraphPath: "Table 1/Graph1D_y1"

The
ROOTData encodes the location of the
ROOT file, while the
TGraphPath encodes the location of the
TGraph object in that
ROOT file. In
HEPLike the class
HL_ProfLikelihood is responsible for reading and encoding this likelihood. The value of the log-likelihood can then be translated again into the χ² with Eq. 4.

The natural extension of the one-dimensional likelihood is an n-dimensional likelihood, where n ≥
2. Currently, experimental collaborations publish only 2-dimensional likelihood functions (cf. Fig. 3).
Figure 3: Examples of published two-dimensional likelihoods: the B(B0 → µµ) vs B(Bs → µµ) likelihood [20] (left) and the σ(ttZ) vs σ(ttW) likelihood [21] (right).

Such likelihoods can be stored in the form of TH2D or TH3D histograms, and we have chosen this way to store this information. The corresponding entry in the
YAML file looks as follows:

ROOTData: data/LHCb/RD/Bs2mumu_5fb/histB2mumu.root
TH2Path: "h_2DScan"

Similarly to the one-dimensional likelihood (Sec. 2.6), the
ROOTData encodes the location of the
ROOT file, while the
TH2Path (TH3Path) encodes the location of the
TH2D (TH3D) object. In the long run the community will have to address the question of how to publish higher-dimensional likelihoods, and this module (
HL_nDimLikelihood) will have to be extended for such use cases.
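The text does not specify how the stored likelihood is evaluated between the bin centres of the TH2D. As a purely illustrative sketch (the GridLogL struct, its field names and the bilinear scheme below are our own assumptions, not code from the HEPLike package), a two-dimensional log-likelihood tabulated on a regular grid can be evaluated at an arbitrary point with bilinear interpolation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for a 2-D log-likelihood tabulated on a regular
// grid, e.g. the bin centres of a ROOT TH2D (illustrative only).
struct GridLogL {
    double x0, y0;   // coordinates of the first grid node
    double dx, dy;   // grid spacing in x and y
    std::vector<std::vector<double>> logl;  // logl[ix][iy] at the nodes

    // Bilinear interpolation between the four nodes surrounding (x, y).
    double eval(double x, double y) const {
        int nx = static_cast<int>(logl.size());
        int ny = static_cast<int>(logl[0].size());
        int ix = static_cast<int>((x - x0) / dx);
        int iy = static_cast<int>((y - y0) / dy);
        // clamp to the last full cell so the corner lookups stay in range
        if (ix < 0) ix = 0;
        if (ix > nx - 2) ix = nx - 2;
        if (iy < 0) iy = 0;
        if (iy > ny - 2) iy = ny - 2;
        double tx = (x - (x0 + ix * dx)) / dx;  // fractional cell position
        double ty = (y - (y0 + iy * dy)) / dy;
        return (1 - tx) * (1 - ty) * logl[ix][iy]
             + tx       * (1 - ty) * logl[ix + 1][iy]
             + (1 - tx) * ty       * logl[ix][iy + 1]
             + tx       * ty       * logl[ix + 1][iy + 1];
    }
};
```

Since bilinear interpolation reproduces any function that is linear in x and y exactly, a toy grid with logl = x + y gives a quick consistency check of the scheme.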
It is possible that in the future the experimental collaborations will make public not only the results but also the datasets. The procedure and the form in which the data should be published are not yet decided, and there is an ongoing debate whether the published data should correspond to the raw detector data, to the final selected points used in the analysis, or to something in between. Clearly, publishing raw data is problematic, as people outside the collaboration do not have the necessary knowledge about the calibration and efficiency-correction procedures or the data-taking conditions. The most useful approach is to let the experimentalists perform the full selection and all the necessary efficiency corrections, and publish the final dataset that has been used in the analysis. This would allow the theory community to use the dataset directly in their fits without knowing the technicalities of the experimental data analysis. For this case in
HEPLike we have implemented such a class
HL_ExpPoints. The data are stored in a
TTree structure located in the
ROOT file. The YAML file encodes this information in the following form:

ROOTData: data/toy/data.root
TTreePath: t
Observables:
 - [x]
 - [y]
 - [z]
Weight: w

where the ROOTData entry points to the
ROOT file and the
TTreePath stores the information of the
TTree location inside the
ROOT file. It is assumed that the experiments will provide all the corrections in the form of event-by-event weights. The name of the weight branch inside the
TTree is encoded in the
Weight entry. In general the data points are elements of an Rⁿ vector space, whose coordinates are stored in the Observables entry. The only thing the user needs to provide to the
HL_ExpPoints object is a pointer to the function to be fitted, which should have the form double (*fun)(vector<double> point, vector<double> par).
HL_ExpPoints will then evaluate the likelihood:

L(ω) = ∏ f(x|ω)^w(x), (9)

with the product running over the whole dataset. In the above, x corresponds to an n-dimensional point, ω denotes the parameters that are being fitted (par), and f denotes the fitting function (fun). HEPLike does not provide a minimizer or a scanner tool, as that is not the purpose of this type of software; it has to be interfaced with a proper scanner tool, for example [1]. Again, the user can decide whether to perform a χ² or a log-likelihood fit. The biggest advantage of such a format is its compatibility with the experimental analysis: the experimentalists can in principle also publish the function that they used to fit the data, so that a theorist can reproduce the experimental result and continue from where the experimentalists finished.
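The evaluation of Eq. 9 can be sketched in a few lines. The snippet below is only an illustration under our own assumptions (in particular the exact fit-function signature, which is truncated in the text, and the names weighted_loglik and gauss are hypothetical); it is not the HL_ExpPoints implementation itself:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Weighted log-likelihood, log L(omega) = sum_i w_i * log f(x_i | omega),
// i.e. the logarithm of Eq. 9 with the product over all data points.
double weighted_loglik(double (*fun)(const std::vector<double>& point,
                                     const std::vector<double>& par),
                       const std::vector<std::vector<double>>& points,
                       const std::vector<double>& weights,
                       const std::vector<double>& par) {
    double logl = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i)
        logl += weights[i] * std::log(fun(points[i], par));
    return logl;
}

// Hypothetical fit function: a one-dimensional unit-width Gaussian density
// with mean par[0], standing in for the user-supplied fun.
double gauss(const std::vector<double>& x, const std::vector<double>& par) {
    const double pi = std::acos(-1.0);
    const double d = x[0] - par[0];
    return std::exp(-0.5 * d * d) / std::sqrt(2.0 * pi);
}
```

A minimizer or scanner, e.g. the one of [1], would then vary par to maximize this quantity (or, equivalently, minimize −2 log L).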
3. Code implementation
In this section we discuss the implementation of the code used to create the likelihoods discussed in Sec. 2. The code is built from several classes:
• HL_Data: base class from which the other classes inherit their base functionality.
• HL_Limit: class that handles upper-limit measurements.
• HL_Gaussian: class that handles measurements with a Gaussian uncertainty.
• HL_BifurGaussian: class that handles measurements with an asymmetric uncertainty.
• HL_nDimGaussian: class that handles measurements with n-dimensional Gaussian uncertainties.
• HL_nDimBifurGaussian: class that handles measurements with n-dimensional asymmetric uncertainties.
• HL_ProfLikelihood: class that handles measurements with a one-dimensional likelihood function.
• HL_nDimLikelihood: class that handles measurements with a 2(3)-dimensional likelihood function.
• HL_ExpPoints: class that allows one to perform fits to experimental datasets.
In Tab. 1 we present the functionality of these classes. In addition, the hierarchy of the class inheritance is presented in Fig. 4.

Table 1: Functions available in the
HEPLike software.
Function | Description
HL_Data() | Constructor of the HL_Data class.
HL_Data(string) | Constructor of the HL_Data class; the argument is the path to the YAML file encoding the measurement.
HL_Limit() | Constructor of the HL_Limit class.
HL_Limit(string) | Constructor of the HL_Limit class; the argument is the path to the YAML file encoding the measurement.
HL_Gaussian() | Constructor of the HL_Gaussian class.
HL_Gaussian(string) | Constructor of the HL_Gaussian class; the argument is the path to the YAML file encoding the measurement.
HL_BifurGaussian() | Constructor of the HL_BifurGaussian class.
HL_BifurGaussian(string) | Constructor of the HL_BifurGaussian class; the argument is the path to the YAML file encoding the measurement.
HL_nDimGaussian() | Constructor of the HL_nDimGaussian class.
HL_nDimGaussian(string) | Constructor of the HL_nDimGaussian class; the argument is the path to the YAML file encoding the measurement.
HL_nDimBifurGaussian() | Constructor of the HL_nDimBifurGaussian class.
HL_nDimBifurGaussian(string) | Constructor of the HL_nDimBifurGaussian class; the argument is the path to the YAML file encoding the measurement.
HL_ProfLikelihood() | Constructor of the HL_ProfLikelihood class.
HL_ProfLikelihood(string) | Constructor of the HL_ProfLikelihood class; the argument is the path to the YAML file encoding the measurement.
HL_nDimLikelihood() | Constructor of the HL_nDimLikelihood class.
HL_nDimLikelihood(string) | Constructor of the HL_nDimLikelihood class; the argument is the path to the YAML file encoding the measurement.
HL_ExpPoints() | Constructor of the HL_ExpPoints class.
HL_ExpPoints(string) | Constructor of the HL_ExpPoints class; the argument is the path to the YAML file encoding the measurement.
read_standard() | Function that reads the general information about the measurement from the YAML file.
set_debug_yaml(bool) | Function that enables debugging of the YAML file. By default the debugging is switched off; it can be switched on by passing a true bool argument. Debugging prints a message when a given piece of information is missing in the YAML file.
Read() | Function reading the YAML file.
GetChi2(double) | Function that returns the χ² value for a given point (passed to the function as a double). Available in all classes besides HL_Data.
GetLogLikelihood(double) | Function that returns the log-likelihood value for a given point (passed to the function as a double). Available in all classes besides HL_Data.
GetLikelihood(double) | Function that returns the likelihood value for a given point (passed to the function as a double). Available in all classes besides HL_Data.
GetCLs(double) | Function that returns the CLs or p-value for a given point (passed to the function as a double). Member of the HL_Limit class.
Restrict(vector<string>) | Function that restricts the set of observables read from the YAML file. Member of the HL_nDimGaussian, HL_nDimBifurGaussian and HL_nDimLikelihood classes.
InitData() | Function of the HL_ExpPoints class that reads the data from the TTree object into memory.
Profile() | Function of the HL_nDimLikelihood class that creates the profile log-likelihood projections.
SetFun() | Function of the HL_ExpPoints class that sets the pointer to the function to be fitted.

Figure 4: Diagram of class inheritance of the
HEPLike package.
4. Installation and usage
In this chapter we present the requirements and the installation procedure for the
HEPLike package. The software is distributed via the GitHub site: https://github.com/mchrzasz/HEPLike. In order to compile
HEPLike the following packages (with the minimal versions) need to be installed:
• git
• cmake, 2.8
• yaml-cpp, 1.58.0
• gsl, 2.1
• Boost, 1.58.0
• ROOT, 6.08
The compilation is done in the following way:

cd <installation dir>
git clone https://github.com/mchrzasz/HEPLike.git
cd HEPLike
mkdir build
cd build
cmake ..
make

In the above, make can be replaced with make -jN, where N is the number of threads the user wants to use for the compilation. Please note that in the case of a non-standard installation of some packages one might have to provide cmake with the proper path to the library. After a successful compilation, the libHEPLike.a and libHEPLike.so libraries will be created in the build directory. The
HEPLike package is provided with a set of examples:
• Br_example.cc: example program showing the usage of the
HL_Gaussian class.
• BrBifurGaussian_example.cc: example program showing the usage of the
HL_BifurGaussian class.
• Data_Fit_example.cc: example program showing the usage of the
HL_ExpPoints class.
• Limit_example.cc: example program showing the usage of the
HL_Limit class.
• Ndim_BifurGaussian_example.cc: example program showing the usage of the
HL_nDimBifurGaussian class.
• Ndim_Gaussian.cc: example program showing the usage of the
HL_nDimGaussian class.
• Ndim_Likelihood_example.cc: example program showing the usage of the
HL_nDimLikelihood class.
• ProfLikelihood_example.cc: example program showing the usage of the
HL_ProfLikelihood class.
To compile them, a proper variable has to be set during the cmake stage:

cd build
cmake -DEXECUTABLE=TRUE ..
make

After the compilation, the build directory will contain the executables of these examples. The
HEPLike package also comes with test procedures for each of the classes. To perform the tests, the user has to run the command:

ctest

or, equivalently:

make test

If the
HEPLike was installed successfully, the output will look as follows:

Test project /storage/github/HEPLike/build
    Start 1: HL_Test_YAML
1/7 Test ...

5. Available measurements
The
YAML files that contain the stored measurements are located in a second, independent repository. The reason for this separation is that the
YAML files are expected to be updated more frequently than the code itself. It is expected that users and experiments will contribute to this repository; this model ensures that the repository will contain the most up-to-date measurements. The repository can be found at: https://github.com/mchrzasz/HEPLikeData. The repository should be downloaded or cloned:

cd <some new dir>
git clone https://github.com/mchrzasz/HEPLikeData.git

Since the repository contains only
YAML files, there is no need for any compilation. The repository contains a directory data, where all the
YAML files are kept. It should be linked by a symbolic link to the
HEPLike package. Inside data the measurements are grouped by experiment (e.g. LHCb, ATLAS, CMS). Inside each experiment directory the measurements are grouped according to the type of measurement in the collaboration, for example: RD, Semileptonic, Charmless, Exotica, etc. The names of the
YAML files should correspond to the publication report number, for example:
CERN-EP-2018-331.yaml. If a single publication produced more independent measurements, the user might code them in independent files and add further information at the end of the file name, for example:
CERN-PH-EP-2015-314_q2_0.1_0.98.yaml. Currently we are publishing the measurements that have been used by us in other projects [22, 23, 24]. The list of
YAML files with their content is presented in Tab. 2.

Table 2: YAML measurement files provided with the
HEPLike software.
File | Description
CERN-EP-2017-100.yaml | YAML file encoding the measurement of the branching fractions of the B0 → µµ and Bs → µµ decays [20].
PH-EP-2015-314_q2_0.1_0.98.yaml, PH-EP-2015-314_q2_1.1_2.5.yaml, PH-EP-2015-314_q2_2.5_4.0.yaml, PH-EP-2015-314_q2_4.0_6.0.yaml, PH-EP-2015-314_q2_6.0_8.0.yaml, PH-EP-2015-314_q2_11.0_12.5.yaml, PH-EP-2015-314_q2_15.0_19.yaml | YAML files encoding the measurements of the angular coefficients of the B → K∗µµ decay in different q² regions [25].
CERN-EP-2016-141_q2_0.1_0.98.yaml, CERN-EP-2016-141_q2_1.1_2.5.yaml, CERN-EP-2016-141_q2_2.5_4.0.yaml, CERN-EP-2016-141_q2_4.0_6.0.yaml, CERN-EP-2016-141_q2_6.0_8.0.yaml, CERN-EP-2016-141_q2_11.0_12.5.yaml, CERN-EP-2016-141_q2_15.0_19.yaml | YAML files encoding the measurements of the branching fraction of the B → K∗µµ decay in different q² regions [26].
CERN-EP-2016-215_q2_0.1_0.98.yaml, CERN-EP-2016-215_q2_1.1_2.5.yaml, CERN-EP-2016-215_q2_2.5_4.yaml, CERN-EP-2016-215_q2_4_6.yaml, CERN-EP-2016-215_q2_6_8.yaml | YAML files encoding the measurements of the branching fraction of the B → Kπµµ decay in different q² regions [27].
CERN-PH-EP-2015-145_0.1_2.yaml, CERN-PH-EP-2015-145_1_6.yaml, CERN-PH-EP-2015-145_2_5.yaml, CERN-PH-EP-2015-145_5_8.yaml, CERN-PH-EP-2015-145_11_12.5.yaml, CERN-PH-EP-2015-145_15_19.yaml | YAML files encoding the measurements of the branching fraction of the B → φµµ decay in different q² regions [27].
CERN-EP-2019-043.yaml | YAML file encoding the measurement of R_K [28].
CERN-EP-2017-100_q2_0.045_1.1.yaml, CERN-EP-2017-100_q2_1.1_6.yaml | YAML files encoding the measurement of R_K∗ [7].
b2sgamma.yaml | YAML file encoding the HFLAV average of b → sγ [15].
RD_RDstar.yaml | YAML file encoding the HFLAV averages of R(D) and R(D∗) [15].
File Description
HFLAV 2016 157.yamlHFLAV 2016 160.yamlHFLAV 2016 161.yamlHFLAV 2016 162.yamlHFLAV 2016 164.yamlHFLAV 2016 165.yamlHFLAV 2016 166.yamlHFLAV 2016 167.yamlHFLAV 2016 168.yamlHFLAV 2016 169.yamlHFLAV 2016 170.yamlHFLAV 2016 171.yamlHFLAV 2016 176.yamlHFLAV 2016 177.yamlHFLAV 2016 178.yamlHFLAV 2016 179.yamlHFLAV 2016 180.yamlHFLAV 2016 181.yamlHFLAV 2016 182.yamlHFLAV 2016 183.yamlHFLAV 2016 211.yamlHFLAV 2016 212.yaml YAML files encoding the upperlimits of τ Lepton FlavourViolation decays [27].As already mentioned the measurements are constantly growing and thereis expected that the community will contribute to develop this repository.When a new
YAML file is written, it should be checked for completeness before being merged into the repository. This can be done with the Test_YAML.cc program, which is used in the following way:

cd HEPLike
./build/Test_YAML <PATH_TO_YAML>

If an entry is missing, the user is notified by a printout. The HEPLikeData repository also contains a template YAML file (data/template.yaml), which can be used as a starting point for new measurement YAML files.

As already mentioned, we provide useful utilities for the encoded measurements. The first is the ability to create a
BibTeX file for the measurements that have been used. The user should store the BibTeX keys or the YAML file names in a text file, for instance:

Aaij:2017vbb
b2mumu.yaml

To prepare the BibTeX file, the user should run the make_citations.py script located in the utils directory:

cd utils
python make_citations.py list.txt

After this command a new file, references.bib, is created, which contains the full BibTeX entries and can be used directly when preparing a publication.

Another useful feature of
HEPLike is the ability to search the measurement database for relevant measurements. The script providing this functionality is also located in the utils directory. Currently the database can be searched by the year of publication, the arXiv number, the author of the YAML file, or the unique name of the measurement. The syntax for running a search is the following:

python lookup.py --Arxiv 1705.05802
Found files:
../data/examples/RKstar_lowq2.yaml

To see all available search options, the user can run the script with the help option: python lookup.py -h.
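The essence of such a lookup is small enough to sketch. The snippet below is an illustration, not the actual lookup.py: it assumes, purely for the example, that each measurement file stores its preprint number under a top-level Arxiv: key, and it matches that key with plain string handling rather than a full YAML parser.

```python
import os

def find_measurements(root, arxiv):
    """Return the paths of YAML files under root whose top-level
    'Arxiv:' field equals the requested preprint number.

    A minimal stand-in for a database lookup: it walks the directory
    tree and scans each .yaml file line by line for the key.
    """
    matches = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".yaml"):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as handle:
                for line in handle:
                    key, _, value = line.partition(":")
                    if key.strip() == "Arxiv" and value.strip() == arxiv:
                        matches.append(path)
                        break
    return matches
```

Because the metadata lives in plain YAML files, any such query reduces to a directory walk plus a key comparison, which is why the database can be searched without compiling anything.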
5. Summary
We have presented a computer program
HEPLike, which enables the construction and evaluation of experimental likelihoods. The software is designed to handle the interpretation of a wide range of published results. It also allows one to perform direct fits to data, once these are provided by the experimental collaborations. The program can easily be interfaced with other computer programs and is aimed at helping users who perform fits to experimental results in their scientific work. It is especially useful for large fitting collaborations, which until now had to implement the experimental measurements on their own. The measurements themselves are stored in YAML files in a separate repository. This allows for easy extension of the database without the need for recompilation. Furthermore, users and experimental collaborations can share their encoded measurements with the community.

Acknowledgments
This work is partly supported by the CERN FCC Design Study Program. The research of M. Chrzaszcz is funded by the Polish National Agency for Academic Exchange under the Bekker program. M. Chrzaszcz is also grateful to the Foundation for Polish Science (FNP) for its support. We would like to thank Mike Williams, Patrick Koppenburg, Pat Scott, Danny van Dyk and Maria Moreno Llacer for invaluable comments about our manuscript.

References

[1] P. Athron, et al., GAMBIT: The Global and Modular Beyond-the-Standard-Model Inference Tool, Eur. Phys. J. C77 (11) (2017) 784, [Addendum: Eur. Phys. J. C78, no. 2, 98 (2018)]. arXiv:1705.07908, doi:10.1140/epjc/s10052-017-5513-2, 10.1140/epjc/s10052-017-5321-8.

[2] J. C. Costa, et al., Likelihood Analysis of the Sub-GUT MSSM in Light of LHC 13-TeV Data, Eur. Phys. J. C78 (2) (2018) 158. arXiv:1711.00458, doi:10.1140/epjc/s10052-018-5633-3.

[3] P. Bechtle, K. Desch, P. Wienemann, Fittino, a program for determining MSSM parameters from collider observables using an iterative method, Comput. Phys. Commun. 174 (2006) 47–70. arXiv:hep-ph/0412012, doi:10.1016/j.cpc.2005.09.002.

[4] F. Mahmoudi, New constraints on supersymmetric models from b → s gamma, JHEP 12 (2007) 026. arXiv:0710.3791, doi:10.1088/1126-6708/2007/12/026.

[5] T. Feldmann, D. Van Dyk, K. K. Vos, Revisiting B → ππℓν at large dipion masses, JHEP 10 (2018) 030. arXiv:1807.01924, doi:10.1007/JHEP10(2018)030.

[6] J. Kumar, D. London, R. Watanabe, Combined Explanations of the b → sµ+µ− and b → cτ−ν̄ Anomalies: a General Model Analysis, Phys. Rev. D99 (1) (2019) 015007. arXiv:1806.07403, doi:10.1103/PhysRevD.99.015007.

[7] R. Aaij, et al., Test of lepton universality with B0 → K∗0ℓ+ℓ− decays, JHEP 08 (2017) 055. arXiv:1705.05802, doi:10.1007/JHEP08(2017)055.

[8] R. Aaij, et al., Search for the lepton-flavour violating decay D0 → e±µ∓, Phys. Lett. B754 (2016) 167–175. arXiv:1512.00322, doi:10.1016/j.physletb.2016.01.029.

[9] G. J. Feldman, R. D.
Cousins, A Unified approach to the classical statistical analysis of small signals, Phys. Rev. D57 (1998) 3873–3889. arXiv:physics/9711021, doi:10.1103/PhysRevD.57.3873.

[10] C. Rover, C. Messenger, R. Prix, Bayesian versus frequentist upper limits, in: Proceedings, PHYSTAT 2011 Workshop on Statistical Issues Related to Discovery Claims in Search Experiments and Unfolding, CERN, Geneva, Switzerland, 17-20 January 2011, CERN, Geneva, 2011, pp. 158–163. arXiv:1103.2987, doi:10.5170/CERN-2011-006.158.

[11] R. Aaij, et al., Search for the decays B0s → τ+τ− and B0 → τ+τ−, Phys. Rev. Lett. 118 (25) (2017) 251802. arXiv:1703.02508, doi:10.1103/PhysRevLett.118.251802.

[12] A. L. Read, Modified frequentist analysis of search results (the CLs method), CERN-OPEN-2000-205. URL https://cds.cern.ch/record/451614

[13] S. S. Wilks, The large-sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Statist. 9 (1) (1938) 60–62. doi:10.1214/aoms/1177732360.

[14] M. Tanabashi, et al., Review of particle physics, Phys. Rev. D 98 (2018) 030001. doi:10.1103/PhysRevD.98.030001.

[15] Y. Amhis, et al., Averages of b-hadron, c-hadron, and τ-lepton properties as of summer 2016, Eur. Phys. J. C77 (12) (2017) 895. arXiv:1612.07233, doi:10.1140/epjc/s10052-017-5058-4.

[16] R. Barlow, Asymmetric systematic errors. arXiv:physics/0306138.

[17] R. Aaij, et al., Test of lepton universality using B+ → K+ℓ+ℓ− decays, Phys. Rev. Lett. 113 (2014) 151601. arXiv:1406.6482, doi:10.1103/PhysRevLett.113.151601.

[18] I. Antcheva, et al., ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization, Comput. Phys. Commun. 180 (2009) 2499–2512. arXiv:1508.07749, doi:10.1016/j.cpc.2009.08.005.

[19] E. Maguire, L. Heinrich, G. Watt, HEPData: a repository for high energy physics data, J. Phys. Conf. Ser.
898 (10) (2017) 102006. arXiv:1704.05473, doi:10.1088/1742-6596/898/10/102006.

[20] R. Aaij, et al., Measurement of the B0s → µ+µ− branching fraction and effective lifetime and search for B0 → µ+µ− decays, Phys. Rev. Lett. 118 (19) (2017) 191801. arXiv:1703.05747, doi:10.1103/PhysRevLett.118.191801.

[21] M. Aaboud, et al., Measurement of the tt̄Z and tt̄W cross sections in proton-proton collisions at √s = 13 TeV with the ATLAS detector. arXiv:1901.03584.

[22] F. U. Bernlochner, et al., FlavBit: A GAMBIT module for computing flavour observables and likelihoods, Eur. Phys. J. C77 (11) (2017) 786. arXiv:1705.07933, doi:10.1140/epjc/s10052-017-5157-2.

[23] P. Athron, et al., Global fits of GUT-scale SUSY models with GAMBIT, Eur. Phys. J. C77 (12) (2017) 824. arXiv:1705.07935, doi:10.1140/epjc/s10052-017-5167-0.

[24] P. Athron, et al., A global fit of the MSSM with GAMBIT, Eur. Phys. J. C77 (12) (2017) 879. arXiv:1705.07917, doi:10.1140/epjc/s10052-017-5196-8.

[25] R. Aaij, et al., Angular analysis of the B0 → K∗0µ+µ− decay using 3 fb−1 of integrated luminosity, JHEP 02 (2016) 104. arXiv:1512.04442, doi:10.1007/JHEP02(2016)104.

[26] R. Aaij, et al., Measurements of the S-wave fraction in B0 → K+π−µ+µ− decays and the B0 → K∗(892)0µ+µ− differential branching fraction, JHEP 11 (2016) 047, [Erratum: JHEP 04, 142 (2017)]. arXiv:1606.04731, doi:10.1007/JHEP11(2016)047, 10.1007/JHEP04(2017)142.

[27] R. Aaij, et al., Differential branching fraction and angular moments analysis of the decay B0 → K+π−µ+µ− in the K∗0,2(1430) region, JHEP 12 (2016) 065. arXiv:1609.04736, doi:10.1007/JHEP12(2016)065.

[28] R. Aaij, et al., Search for lepton-universality violation in B+ → K+ℓ+ℓ− decays. arXiv:1903.09252.