HEPLike: an open source framework for experimental likelihood evaluation
Jihyun Bhom∗, Marcin Chrzaszcz∗
Henryk Niewodniczanski Institute of Nuclear Physics Polish Academy of Sciences, Krakow, Poland
Abstract
We present a computer framework to store and evaluate likelihoods coming from High Energy Physics experiments. Thanks to its flexibility it can be interfaced with existing fitting codes, and it makes the interpretation of the experimental results uniform among users. The code is provided with a large open database containing the experimental measurements. It is aimed at users who perform phenomenological studies, global fits or experimental averages.
Keywords: experimental high energy physics, likelihoods
PROGRAM SUMMARY
Program Title: HEPLike
Licensing provisions: GPLv3
Programming language: C++
Nature of problem: Provide a uniform way to store, share and evaluate experimental likelihoods in a statistically proper manner. The code can easily be interfaced with existing global fitting codes. In addition, a large database with the measurements is published. The program targets users who perform phenomenological studies, global fits or measurement averages in their scientific work. HEPLike was created for the FlavBit project [1], which was used to perform several analyses [2, 3]; here we present an updated version, which can be used in standalone mode.
Solution method: C++ code that evaluates the statistical properties of the measurements without user intervention. A large open database is provided as well. The measurements are stored in YAML files, allowing for easy readability and extension.

∗ Corresponding author.
E-mail address: [email protected], [email protected]

Preprint submitted to Computer Physics Communications, March 10, 2020
References
[1] arXiv:1705.07933
[2] arXiv:1705.07935
[3] arXiv:1705.07917

1. Introduction

In High Energy Physics (HEP) the experimental measurements are performed by several collaborations, which measure many different observables. The experimental results are presented in various ways: some as simple as a measurement with a Gaussian error, some more complicated, such as multiple correlated measurements with asymmetric errors; in some cases even a full likelihood function is published. To make things more complicated, multiple representations of the same measurement are sometimes published. All of this makes it hard to directly use and compare the various results. It also leaves room for misinterpretation of the results by the theorists who use these inputs in their studies: it happens that asymmetric errors are symmetrized, or that instead of the full likelihood only the central value with an approximated asymmetric error is used. The High Energy Physics Likelihoods (
HEPLike) is a computer program that allows one to store and share the likelihoods of various measured quantities. The published code can be useful for users performing phenomenological studies based on experimental results, for global fitting collaborations, and for experimental averages. Thanks to its structure it is easy to interface with existing codes. It simplifies the users' work: instead of looking up the appropriate measurement and coding up their own likelihood, they can download the database of measurements and choose the one they need. Furthermore, it shifts the burden of constructing the proper likelihood functions back to the experimentalists, who performed the measurement in the first place and are clearly the most appropriate people to handle this task. The computer code described in this paper is written in
C++, making it useful for the majority of fitting programs available on the market [1, 2, 3, 4, 5, 6]. The library can be used in both χ² and likelihood fits. Moreover, it contains a statistical module with useful functions for statistical analysis. Besides the computer code, a database with the likelihoods is published. The measurements are stored in YAML files, making them easy to read for both machines and humans. This database can be easily extended by adding new
YAML files when new measurements become available. With the software we provide useful utilities which allow one to perform searches inside the database, create
BiBtex files containing the publications that have been used in the fit, etc. The paper is organized as follows: in Sec. 2 the construction of the likelihood functions is presented; Sec. 3 explains the detailed code implementation and data storage, while Sec. 4 describes how to install and use the
HEPLike software.
2. Likelihood constructions
In this section we will present how likelihoods in
HEPLike are stored and constructed. Each measurement is stored in a separate
YAML file. There are several ways in which collaborations publish their results, depending on the measurement itself:
• Upper limits,
• Single measurement with symmetric uncertainty,
• Single measurement with asymmetric uncertainty,
• Multiple measurements with symmetric uncertainties,
• Multiple measurements with asymmetric uncertainties,
• One-dimensional likelihood function,
• n-dimensional likelihood function.
In addition, there is growing interest in the community that the experimental collaborations publish not only the results of an analysis but also the dataset that has been used to obtain them. For this future occasion we have also implemented a way in which such data can be used directly in fits. Each of these cases has a different module of HEPLike that is designed to evaluate the likelihood function. In this section we present the statistical treatment of the above cases and the modules that are responsible for their evaluation. Each of the
YAML files is required to have the following information (here as an example we use the R_{K^{*}} measurement [7]):

BibCite: Aaij:2017vbb
BibEntry: '@article{Aaij:2017vbb,
  author = "Aaij, R. and others",
  title = "{Test of lepton universality with $B^{0}\rightarrow K^{*0}\ell^{+}\ell^{-}$ decays}",
  collaboration = "LHCb",
  journal = "JHEP",
  volume = "08",
  year = "2017",
  pages = "055",
  doi = "10.1007/JHEP08(2017)055",
  eprint = "1705.05802",
  archivePrefix = "arXiv",
  primaryClass = "hep-ex",
  reportNumber = "LHCB-PAPER-2017-013, CERN-EP-2017-100"
  }'
DOI: 10.1007/JHEP08(2017)055
Process: R_{Kstar^{*}}
...
HLAuthor: Gal Anonim
HLEmail: [email protected]
HLType: HL_ProfLikelihood

The above information is used for bookkeeping. For instance, the entries
BibCite and
BibEntry correspond to the information used to generate a
BiBtex citation file with the measurements that have been used in the studies. The
DOI corresponds to the digital object identifier of the publication. The
Decay defines the process that has been studied. It can also be replaced by the
Process entry. The
Name is a unique name of this measurement type. If the measurement gets updated with more data or by another collaboration, the
Name entry in the new
YAML file should be the same as in the old one.
Source entry corresponds to the source of the measurement; this can be either HEPData or the collaboration itself. The
SubmissionYear (PublicationYear) refers to the year of appearance (publication) of the result. The
Arxiv stores the arXiv number, while the
Collaborations entry stores the information about which experimental collaboration performed the measurement. Finally, the
Kinematics entry stores additional information about the kinematic region that has been measured. The
HLAuthor and
HLEmail encode the information about the
YAML file author and their email, in case the user needs further information about the encoded measurement. Last but not least, the entry
HLType contains the information about which
HEPLike object should be used to read the file. Reading of this content in the
YAML is implemented in the
HL_Data class. All other classes that construct the likelihood functions inherit its capabilities from this class. Please note that if some information is missing in the
YAML file the program will omit reading this entry. The only exception is the
FileName, which is mandatory. If a user wants to be notified by the program that some information is missing, the
HL_debug_yaml variable has to be set to true (the default value is false).

In cases where a measurement did not observe a significant excess of signal candidates, the collaborations usually report an upper limit on the measured quantity. Commonly 90% or 95% upper limits are quoted. Experiments use various statistical approaches to compute these limits: the CLs method [8], Feldman–Cousins [9], or some variation of Bayesian methods [10]. Publication of only an upper limit does not provide enough information to use the result in global fits. However, nowadays experiments publish, besides the aforementioned upper limits, full p-value scans; examples of such scans are shown in Fig. 1. The plots are usually available in digital format, which allows the information to be extracted and used in computer programs. In HEPLike the class
HL_Limit is responsible for handling this type of measurement. It reads the
YAML file that contains the standard information about the measurement (see Sec. 2 for details). The additional information, the observed CLs/p-value scan, is stored in the YAML file in the following way:

Cls:
 - [0.0, 1.0]
 - [1.0e-10, 0.977091694706]
 - [2.0e-10, 0.954375824297]
 - [3.0e-10, 0.93200355343]
 - [4.0e-10, 0.910630700546]
 - [5.0e-10, 0.889382721809]

Please note that besides this information the general information from Sec. 2 should be included.

Figure 1: Example of p-value scans for the B → τ−τ+ [11] (left) and D → eµ [8] (right). Please note that the CLs value can be interpreted as a p-value, as explained in [12]. The black line corresponds to the observed CLs/p-value.

The
Cls can be replaced in the
YAML file by a p-value entry, as they correspond to the same information. The first number in each array is the value of the tested hypothesis (for example a branching fraction), while the second is the corresponding CLs/p-value. These values are then interpreted using a χ² distribution with one degree of freedom:

pdf(x) = 1/(2^(1/2) Γ(1/2)) · x^(−1/2) e^(−x/2), (1)

which has the cumulative distribution function:

cdf(x) = γ(1/2, x/2)/Γ(1/2), (2)

where Γ(x) and γ(k, x) correspond to the gamma and lower incomplete gamma functions. By inverting cdf(x) one can obtain the χ² value:

χ² = cdf⁻¹(1 − p), (3)

where p corresponds to the p-value of a given hypothesis x. This χ² can then be translated to a log-likelihood via Wilks' theorem [13]:

−log(L) = χ²/2, (4)

where L is the likelihood. The user can choose whether to obtain the χ², the likelihood or the log-likelihood value of a given hypothesis.

The simplest case of a published experimental result is a single value with a symmetric uncertainty. This is for example a typical result of a PDG or HFLAV average [14, 15]. The measurement is coded in the
YAML file as:

Observables:
 - ["Br_A2BCZ", 0.1, 0.05, 0.01]

The first element of the array, "Br_A2BCZ", is the observable name. The first number is the measured central value, while the second and third numbers are the statistical and systematic uncertainties, respectively. In cases where only one uncertainty is available, the third number should be omitted; it will automatically be set to 0 in the software. We have decided to keep the plural Observables to stay uniform with the cases where more observables are measured. The module responsible for reading this
YAML file is called
HL_Gaussian; it calculates the χ² for a hypothesis x as:

χ² = (x_obs − x)² / (σ²_stat + σ²_syst), (5)

where x_obs corresponds to the measured central value in the YAML file, and σ_stat and σ_syst are the statistical and systematic uncertainties, respectively. This can again be translated to the likelihood and log-likelihood values using Eq. 4.

A simple extension of the Gaussian uncertainty is the case where an asymmetric uncertainty is reported. This type of measurement, although less frequent, appears in the literature. The publication in this case reports the central value and two uncertainties, σ⁺ and σ⁻, which correspond to the right-hand (for values larger than the measured central value) and left-hand (for values smaller than the measured central value) uncertainty. In HEPLike we have created a
HL_BifurGaussian class, which reads the following entry in the
YAML file:

Observables:
 - ["Br_A2BCZ", 0.1, 0.05, -...]

where the third and fourth numbers correspond to the statistical σ⁺ and σ⁻ uncertainties, while the fifth and sixth correspond to the systematic σ⁺ and σ⁻ uncertainties. It is important to keep the minus sign before the left-side uncertainties; the code will report an error in case of a missing sign. In some cases the systematic uncertainty is reported to be symmetric; in such a case the last number can be omitted in the YAML entry. In the literature there exists a number of ways to interpret asymmetric uncertainties [16]. We have chosen the most commonly used one, the so-called bifurcated Gaussian:

χ² = (x_obs − x)²/σ₊²  if x ≥ x_obs,
χ² = (x_obs − x)²/σ₋²  if x < x_obs, (6)

where σ± is the sum in quadrature of the statistical and systematic uncertainties for the right/left case. Once the χ² is calculated it can be translated to the log-likelihood using Eq. 4.

Nowadays the most common results are simultaneous measurements of several quantities that are correlated with each other, for instance cross-section measurements in different kinematic bins, or measurements of the angular coefficients in heavy meson decays. In
HEPLike the class responsible for handling these cases is called
HL_nDimGaussian. It reads the following information from the
YAML file:

Observables:
 - ["BR1", 0.1, 0.02]
 - ["BR2", 0.2, 0.01, 0.01]
 - ["BR3", 0.4, 0.04]
Correlation:
 - ["BR1", "BR2", "BR3"]
 - [1., 0.2, 0.]
 - [0.2, 1., 0.]
 - [0., 0., 1.]

The information in the "Observables" entry is exactly the same as in the HL_Gaussian class. Please note that, similarly to the previous class, the systematic uncertainty is not mandatory; in case it is not provided, the code will treat it as 0. The next entry in the
YAML file is the "Correlation", which encodes the correlation matrix. Its first row contains the names of the variables; it is important to keep the same order of variables as in the "Observables" entry. The
HL_nDimGaussian class evaluates the χ² in the following way:

χ² = Vᵀ Cov⁻¹ V, (7)

where V is a column matrix whose entries are the differences between the measured and the tested values of the observables, and Cov is a square matrix constructed from the correlation matrix (Corr): Cov_ij = Corr_ij σᵢ σⱼ. Often a user does not want to use the full set of measured quantities but just a subset of them; in this case the function Restrict(vector<string>) can be used to select the observables of interest. Measurements with n-dimensional asymmetric uncertainties are handled by the HL_nDimBifurGaussian class. The
YAML file encoding such a measurement will contain the following entries:

Observables:
 - ["BR1", 0.1, +0.02, -...]
 - ["BR2", 0.2, +0.01, -...]
 - ["BR3", 0.3, +0.04, -...]
Correlation:
 - ["BR1", "BR2", "BR3"]
 - [1., 0.1, 0.2]
 - [0.1, 1., 0.1]
 - [0.2, 0.1, 1.]

The meaning of the "Observables" entry is the same as in the previous class (cf. Sec. 2.3), and the "Correlation" entry encodes the same information as in the HL_nDimGaussian class (cf. Sec. 2.4). The rules about the minus sign and a symmetric systematic uncertainty are the same as in the case of the
HL_BifurGaussian class (cf. Sec. 2.3). The difference arises when one evaluates the χ²: the Cov matrix is constructed depending on whether the σ⁺ or σ⁻ uncertainty is relevant:

Cov_ij = Corr_ij σᵢ⁺ σⱼ⁺  if x_i ≥ x_i,obs and x_j ≥ x_j,obs,
Cov_ij = Corr_ij σᵢ⁺ σⱼ⁻  if x_i ≥ x_i,obs and x_j < x_j,obs,
Cov_ij = Corr_ij σᵢ⁻ σⱼ⁺  if x_i < x_i,obs and x_j ≥ x_j,obs,
Cov_ij = Corr_ij σᵢ⁻ σⱼ⁻  if x_i < x_i,obs and x_j < x_j,obs. (8)

The obtained Cov matrix is then used to calculate the χ² using Eq. 7; the rest follows the same procedure as described in Sec. 2.4.

The best way a result can be published is by providing the (log-)likelihood function, and this type of result is more and more common in the literature. The simplest case is a one-dimensional likelihood scan, which can be presented in the form of a figure; examples are shown in Fig. 2.
Figure 2: Examples of published one-dimensional likelihoods from the lepton universality tests in B → K∗ℓℓ [7] (left) and B → Kℓℓ [17] (right).

The biggest advantage of publishing the results in this form is its completeness: the (log-)likelihood curve contains all the information about the non-Gaussian effects and incorporates the systematic uncertainties. The technical problem is how to publish such information. Usually the plots are published in the pdf or png formats, which makes them hard to use. Since the experiments mostly use the ROOT [18] framework, the plots are often also saved in the C format, which contains the points in the form of arrays. This makes the points accessible; however, it is not easy to automate retrieving the data from a C file. The best solution is provided by the HEPData portal [19], which allows one to download the data in a preferred format. In
HEPLike we have chosen to use the
ROOT format by default, in which the data points are saved in the form of a
TGraph object, which is also the way experimentalists like to store this information. In the
YAML file we specify the path of the
ROOT file in the following way:

ROOTData: data/HEPData-ins1599846-v1-Table_1.root
TGraphPath: "Table 1/Graph1D_y1"

The
ROOTData encodes the location of the
ROOT file, while the
TGraphPath encodes the location of the
TGraph object in that
ROOT file. In
HEPLike the class
HL_ProfLikelihood is responsible for reading and encoding this likelihood. The value of the log-likelihood can then be translated again into the χ² with Eq. 4.

The natural extension of the one-dimensional likelihood is an n-dimensional likelihood, where n ≥
2. Currently, experimental collaborations publish only 2-dimensional likelihood functions (cf. Fig. 3).
Figure 3: Examples of published two-dimensional likelihoods: the B(B0 → µµ) vs B(Bs → µµ) likelihood [20] (left) and the σ(ttZ) vs σ(ttW) likelihood [21] (right).

Such likelihoods can be stored in the form of TH2D or TH3D histograms, and we have chosen this way to store this information. The corresponding entry in the
YAML file looks as follows:

ROOTData: data/LHCb/RD/Bs2mumu_5fb/histB2mumu.root
TH2Path: "h_2DScan"

Similarly to the one-dimensional likelihood (Sec. 2.6), the
ROOTData encodes the location of the
ROOT file, while the
TH2Path (TH3Path) encodes the location of the
TH2D (TH3D) object. In the long run the community will have to address the question of how to publish higher-dimensional likelihoods, and this module (
HL_nDimLikelihood) will have to be extended for such use cases.
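The text does not specify how the stored likelihood is evaluated between the bin centres of the TH2D. As a purely illustrative sketch (the GridLogL struct, its field names and the bilinear scheme below are our own assumptions, not code from the HEPLike package), a two-dimensional log-likelihood tabulated on a regular grid can be evaluated at an arbitrary point with bilinear interpolation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for a 2-D log-likelihood tabulated on a regular
// grid, e.g. the bin centres of a ROOT TH2D (illustrative only).
struct GridLogL {
    double x0, y0;   // coordinates of the first grid node
    double dx, dy;   // grid spacing in x and y
    std::vector<std::vector<double>> logl;  // logl[ix][iy] at the nodes

    // Bilinear interpolation between the four nodes surrounding (x, y).
    double eval(double x, double y) const {
        int nx = static_cast<int>(logl.size());
        int ny = static_cast<int>(logl[0].size());
        int ix = static_cast<int>((x - x0) / dx);
        int iy = static_cast<int>((y - y0) / dy);
        // clamp to the last full cell so the corner lookups stay in range
        if (ix < 0) ix = 0;
        if (ix > nx - 2) ix = nx - 2;
        if (iy < 0) iy = 0;
        if (iy > ny - 2) iy = ny - 2;
        double tx = (x - (x0 + ix * dx)) / dx;  // fractional cell position
        double ty = (y - (y0 + iy * dy)) / dy;
        return (1 - tx) * (1 - ty) * logl[ix][iy]
             + tx       * (1 - ty) * logl[ix + 1][iy]
             + (1 - tx) * ty       * logl[ix][iy + 1]
             + tx       * ty       * logl[ix + 1][iy + 1];
    }
};
```

Since bilinear interpolation reproduces any function that is linear in x and y exactly, a toy grid with logl = x + y gives a quick consistency check of the scheme.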
It is possible that in the future the experimental collaborations will make public not only the results but also the datasets. The procedure and the form in which the data should be published are not yet decided, and there is an ongoing debate whether the published data should correspond to the raw detector data, to the final selected points used in the analysis, or to something in between. Clearly, publishing raw data is problematic, as people outside the collaboration do not have the necessary knowledge about the calibration and efficiency-correction procedures or the data-taking conditions. The most useful approach is to let the experimentalists perform the full selection and all the necessary efficiency corrections, and publish the final dataset that has been used in the analysis. This would allow the theory community to use the dataset directly in their fits without knowing the technicalities of the experimental data analysis. For this case in
HEPLike we have implemented such a class
HL_ExpPoints. The data are stored in a
TTree structure located in the
ROOT file. The YAML file encodes this information in the following form:

ROOTData: data/toy/data.root
TTreePath: t
Observables:
 - [x]
 - [y]
 - [z]
Weight: w

where the ROOTData entry points to the
ROOT file and the
TTreePath stores the information of the
TTree location inside the
ROOT file. It is assumed that the experiments will provide all the corrections in the form of event-by-event weights. The name of the weight branch inside the
TTree is encoded in the
Weight entry. In general the data points are elements of an Rⁿ vector space, whose coordinates are stored in the Observables entry. The only thing the user needs to provide to the
HL_ExpPoints object is a pointer to the function to be fitted, which should have the form double (*fun)(vector<double> point, vector<double> par).
HL_ExpPoints will then evaluate the likelihood:

L(ω) = ∏ f(x|ω)^w(x), (9)

with the product running over the whole dataset. In the above, x corresponds to an n-dimensional point, ω denotes the parameters that are being fitted (par), and f denotes the fitting function (fun). HEPLike does not provide a minimizer or a scanner tool, as that is not the purpose of this type of software; it has to be interfaced with a proper scanner tool, for example [1]. Again, the user can decide whether to perform a χ² or a log-likelihood fit. The biggest advantage of such a format is its compatibility with the experimental analysis: the experimentalists can in principle also publish the function that they used to fit the data, so that a theorist can reproduce the experimental result and continue from where the experimentalists finished.
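The evaluation of Eq. 9 can be sketched in a few lines. The snippet below is only an illustration under our own assumptions (in particular the exact fit-function signature, which is truncated in the text, and the names weighted_loglik and gauss are hypothetical); it is not the HL_ExpPoints implementation itself:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Weighted log-likelihood, log L(omega) = sum_i w_i * log f(x_i | omega),
// i.e. the logarithm of Eq. 9 with the product over all data points.
double weighted_loglik(double (*fun)(const std::vector<double>& point,
                                     const std::vector<double>& par),
                       const std::vector<std::vector<double>>& points,
                       const std::vector<double>& weights,
                       const std::vector<double>& par) {
    double logl = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i)
        logl += weights[i] * std::log(fun(points[i], par));
    return logl;
}

// Hypothetical fit function: a one-dimensional unit-width Gaussian density
// with mean par[0], standing in for the user-supplied fun.
double gauss(const std::vector<double>& x, const std::vector<double>& par) {
    const double pi = std::acos(-1.0);
    const double d = x[0] - par[0];
    return std::exp(-0.5 * d * d) / std::sqrt(2.0 * pi);
}
```

A minimizer or scanner, e.g. the one of [1], would then vary par to maximize this quantity (or, equivalently, minimize −2 log L).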
3. Code implementation
In this section we discuss the implementation of the code used to create the likelihoods discussed in Sec. 2. The code is built from several classes:
• HL_Data: base class from which the other classes inherit their base functionality.
• HL_Limit: class that handles upper-limit measurements.
• HL_Gaussian: class that handles measurements with a Gaussian uncertainty.
• HL_BifurGaussian: class that handles measurements with an asymmetric uncertainty.
• HL_nDimGaussian: class that handles measurements with n-dimensional Gaussian uncertainties.
• HL_nDimBifurGaussian: class that handles measurements with n-dimensional asymmetric uncertainties.
• HL_ProfLikelihood: class that handles measurements with a one-dimensional likelihood function.
• HL_nDimLikelihood: class that handles measurements with a 2(3)-dimensional likelihood function.
• HL_ExpPoints: class that allows one to perform fits to experimental datasets.
In Tab. 1 we present the functionality of these classes. In addition, the hierarchy of the class inheritance is presented in Fig. 4.

Table 1: Functions available in the
HEPLike software.
Function | Description
HL_Data() | Constructor of the HL_Data class.
HL_Data(string) | Constructor of the HL_Data class; the argument is the path to the YAML file encoding the measurement.
HL_Limit() | Constructor of the HL_Limit class.
HL_Limit(string) | Constructor of the HL_Limit class; the argument is the path to the YAML file encoding the measurement.
HL_Gaussian() | Constructor of the HL_Gaussian class.
HL_Gaussian(string) | Constructor of the HL_Gaussian class; the argument is the path to the YAML file encoding the measurement.
HL_BifurGaussian() | Constructor of the HL_BifurGaussian class.
HL_BifurGaussian(string) | Constructor of the HL_BifurGaussian class; the argument is the path to the YAML file encoding the measurement.
HL_nDimGaussian() | Constructor of the HL_nDimGaussian class.
HL_nDimGaussian(string) | Constructor of the HL_nDimGaussian class; the argument is the path to the YAML file encoding the measurement.
HL_nDimBifurGaussian() | Constructor of the HL_nDimBifurGaussian class.
HL_nDimBifurGaussian(string) | Constructor of the HL_nDimBifurGaussian class; the argument is the path to the YAML file encoding the measurement.
HL_ProfLikelihood() | Constructor of the HL_ProfLikelihood class.
HL_ProfLikelihood(string) | Constructor of the HL_ProfLikelihood class; the argument is the path to the YAML file encoding the measurement.
HL_nDimLikelihood() | Constructor of the HL_nDimLikelihood class.
HL_nDimLikelihood(string) | Constructor of the HL_nDimLikelihood class; the argument is the path to the YAML file encoding the measurement.
HL_ExpPoints() | Constructor of the HL_ExpPoints class.
HL_ExpPoints(string) | Constructor of the HL_ExpPoints class; the argument is the path to the YAML file encoding the measurement.
read_standard() | Function that reads the general information about the measurement from the YAML file.
set_debug_yaml(bool) | Function that enables debugging of the YAML file. By default the debugging is switched off; it can be switched on by passing a true bool argument. Debugging prints a message when a given piece of information is missing in the YAML file.
Read() | Function reading the YAML file.
GetChi2(double) | Function that returns the χ² value for a given point (passed to the function as a double). Available in all classes besides HL_Data.
GetLogLikelihood(double) | Function that returns the log-likelihood value for a given point (passed to the function as a double). Available in all classes besides HL_Data.
GetLikelihood(double) | Function that returns the likelihood value for a given point (passed to the function as a double). Available in all classes besides HL_Data.
GetCLs(double) | Function that returns the CLs or p-value for a given point (passed to the function as a double). Member of the HL_Limit class.
Restrict(vector<string>) | Function that restricts the set of observables read from the YAML file. Member of the HL_nDimGaussian, HL_nDimBifurGaussian and HL_nDimLikelihood classes.
InitData() | Function of the HL_ExpPoints class that reads the data from the TTree object into memory.
Profile() | Function of the HL_nDimLikelihood class that creates the profile log-likelihood projections.
SetFun() | Function of the HL_ExpPoints class that sets the pointer to the function to be fitted.

Figure 4: Diagram of class inheritance of the
HEPLike package.
4. Installation and usage
In this chapter we present the requirements and the installation procedure for the
HEPLike package. The software is distributed via the GitHub site: https://github.com/mchrzasz/HEPLike. In order to compile
HEPLike the following packages (with the minimal versions) need to be installed:
• git
• cmake, 2.8
• yaml-cpp, 1.58.0
• gsl, 2.1
• Boost, 1.58.0
• ROOT, 6.08
The compilation is done in the following way:

cd <installation dir>
git clone https://github.com/mchrzasz/HEPLike.git
cd HEPLike
mkdir build
cd build
cmake ..
make

In the above, make can be replaced with make -jN, where N is the number of threads the user wants to use for the compilation. Please note that in the case of a non-standard installation of some packages one might have to provide cmake with the proper path to the library. After a successful compilation, the libHEPLike.a and libHEPLike.so libraries will be created in the build directory. The
HEPLike package is provided with a set of examples:
• Br_example.cc: example program showing the usage of the
HL_Gaussian class.
• BrBifurGaussian_example.cc: example program showing the usage of the
HL_BifurGaussian class.
• Data_Fit_example.cc: example program showing the usage of the
HL_ExpPoints class.
• Limit_example.cc: example program showing the usage of the
HL_Limit class.
• Ndim_BifurGaussian_example.cc: example program showing the usage of the
HL_nDimBifurGaussian class.
• Ndim_Gaussian.cc: example program showing the usage of the
HL_nDimGaussian class.
• Ndim_Likelihood_example.cc: example program showing the usage of the
HL_nDimLikelihood class.
• ProfLikelihood_example.cc: example program showing the usage of the
HL_ProfLikelihood class.
To compile them, a proper variable has to be set during the cmake stage:

cd build
cmake -DEXECUTABLE=TRUE ..
make

After the compilation, the build directory will contain the executables of these examples. The
HEPLike package also comes with test procedures for each of the classes. To perform the tests, the user has to run the command:

ctest

or, equivalently:

make test

If the
HEPLike was installed successfully, the output will look as follows:

Test project /storage/github/HEPLike/build
    Start 1: HL_Test_YAML
1/7 Test ...

5. Available measurements
The
YAML files that contain the stored measurements are located in a second, independent repository. The reason for this separation is that the
YAML files are expected to be updated more frequently than the code itself. It is expected that users and experiments will contribute to this repository; this model ensures that the repository will contain the most up-to-date measurements. The repository can be found at: https://github.com/mchrzasz/HEPLikeData. The repository should be downloaded or cloned:

cd <some new dir>
git clone https://github.com/mchrzasz/HEPLikeData.git

Since the repository contains only
YAML files, there is no need for any compilation. The repository contains a directory data, where all the
YAML files are kept. It should be linked by a symbolic link to the
HEPLike package. Inside data the measurements are grouped by experiment (e.g. LHCb, ATLAS, CMS). Inside each experiment directory the measurements are grouped according to the type of measurement in the collaboration, for example: RD, Semileptonic, Charmless, Exotica, etc. The names of the
YAML files should correspond to the publication report number, for example:
CERN-EP-2018-331.yaml. If a single publication produced more independent measurements, the user might code them in independent files and add further information at the end of the file name, for example:
CERN-PH-EP-2015-314_q2_0.1_0.98.yaml. Currently we are publishing the measurements that have been used by us in other projects [22, 23, 24]. The list of
YAML files with their content is presented in Tab. 2.

Table 2: YAML measurement files provided with the
HEPLike software.
File | Description
CERN-EP-2017-100.yaml | YAML file encoding the measurement of the branching fractions of the B0 → µµ and Bs → µµ decays [20].
PH-EP-2015-314_q2_0.1_0.98.yaml, PH-EP-2015-314_q2_1.1_2.5.yaml, PH-EP-2015-314_q2_2.5_4.0.yaml, PH-EP-2015-314_q2_4.0_6.0.yaml, PH-EP-2015-314_q2_6.0_8.0.yaml, PH-EP-2015-314_q2_11.0_12.5.yaml, PH-EP-2015-314_q2_15.0_19.yaml | YAML files encoding the measurements of the angular coefficients of the B → K∗µµ decay in different q² regions [25].
CERN-EP-2016-141_q2_0.1_0.98.yaml, CERN-EP-2016-141_q2_1.1_2.5.yaml, CERN-EP-2016-141_q2_2.5_4.0.yaml, CERN-EP-2016-141_q2_4.0_6.0.yaml, CERN-EP-2016-141_q2_6.0_8.0.yaml, CERN-EP-2016-141_q2_11.0_12.5.yaml, CERN-EP-2016-141_q2_15.0_19.yaml | YAML files encoding the measurements of the branching fraction of the B → K∗µµ decay in different q² regions [26].
CERN-EP-2016-215_q2_0.1_0.98.yaml, CERN-EP-2016-215_q2_1.1_2.5.yaml, CERN-EP-2016-215_q2_2.5_4.yaml, CERN-EP-2016-215_q2_4_6.yaml, CERN-EP-2016-215_q2_6_8.yaml | YAML files encoding the measurements of the branching fraction of the B → Kπµµ decay in different q² regions [27].
CERN-PH-EP-2015-145_0.1_2.yaml, CERN-PH-EP-2015-145_1_6.yaml, CERN-PH-EP-2015-145_2_5.yaml, CERN-PH-EP-2015-145_5_8.yaml, CERN-PH-EP-2015-145_11_12.5.yaml, CERN-PH-EP-2015-145_15_19.yaml | YAML files encoding the measurements of the branching fraction of the B → φµµ decay in different q² regions [27].
CERN-EP-2019-043.yaml | YAML file encoding the measurement of R_K [28].
CERN-EP-2017-100_q2_0.045_1.1.yaml, CERN-EP-2017-100_q2_1.1_6.yaml | YAML files encoding the measurement of R_K∗ [7].
b2sgamma.yaml | YAML file encoding the HFLAV average of b → sγ [15].
RD_RDstar.yaml | YAML file encoding the HFLAV averages of R(D) and R(D∗) [15].
File Description
HFLAV 2016 157.yamlHFLAV 2016 160.yamlHFLAV 2016 161.yamlHFLAV 2016 162.yamlHFLAV 2016 164.yamlHFLAV 2016 165.yamlHFLAV 2016 166.yamlHFLAV 2016 167.yamlHFLAV 2016 168.yamlHFLAV 2016 169.yamlHFLAV 2016 170.yamlHFLAV 2016 171.yamlHFLAV 2016 176.yamlHFLAV 2016 177.yamlHFLAV 2016 178.yamlHFLAV 2016 179.yamlHFLAV 2016 180.yamlHFLAV 2016 181.yamlHFLAV 2016 182.yamlHFLAV 2016 183.yamlHFLAV 2016 211.yamlHFLAV 2016 212.yaml YAML files encoding the upperlimits of τ Lepton FlavourViolation decays [27].As already mentioned the measurements are constantly growing and thereis expected that the community will contribute to develop this repository.When a new
YAML file is written, it should be checked for completeness before being merged into the repository. This can be done with the Test_YAML.cc program, which is used in the following way:

cd HEPLike
./build/Test_YAML <PATH_TO_YAML>

If an entry is missing, the user is notified by a printout. The HEPLikeData repository also contains a template YAML file (data/template.yaml), which can be used as a starting point for new measurement YAML files.

As already mentioned, we provide useful utilities for the encoded measurements. The first is the ability to create a
BibTeX file for the measurements that have been used. The user should store the BibTeX keys or the YAML file names in a text file, for instance:

Aaij:2017vbb
b2mumu.yaml

To prepare the BibTeX file, the user should run the make_citations.py script located in the utils directory:

cd utils
python make_citations.py list.txt

After this command a new file, references.bib, is created, which contains the full BibTeX entries and can be used directly when preparing a publication.

Another useful feature of
HEPLike is the ability to search the measurement database for relevant measurements. The script providing this functionality is also located in the utils directory. Currently the database can be searched by the year of publication, the arXiv number, the author of the YAML file, or the unique name of the measurement. The syntax for running a search is the following:

python lookup.py --Arxiv 1705.05802
Found files:
../data/examples/RKstar_lowq2.yaml

To see all available search options, the user can run the script with the help option: python lookup.py -h.
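The essence of such a lookup is small enough to sketch. The snippet below is an illustration, not the actual lookup.py: it assumes, purely for the example, that each measurement file stores its preprint number under a top-level Arxiv: key, and it matches that key with plain string handling rather than a full YAML parser.

```python
import os

def find_measurements(root, arxiv):
    """Return the paths of YAML files under root whose top-level
    'Arxiv:' field equals the requested preprint number.

    A minimal stand-in for a database lookup: it walks the directory
    tree and scans each .yaml file line by line for the key.
    """
    matches = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".yaml"):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as handle:
                for line in handle:
                    key, _, value = line.partition(":")
                    if key.strip() == "Arxiv" and value.strip() == arxiv:
                        matches.append(path)
                        break
    return matches
```

Because the metadata lives in plain YAML files, any such query reduces to a directory walk plus a key comparison, which is why the database can be searched without compiling anything.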
5. Summary
We have presented a computer program
HEPLike, which enables the construction and evaluation of experimental likelihoods. The software is designed to handle the interpretation of a wide range of published results. It also allows one to perform direct fits to data, once these are provided by the experimental collaborations. The program can easily be interfaced with other computer programs and is aimed at helping users who perform fits to experimental results in their scientific work. It is especially useful for large fitting collaborations, which until now had to implement the experimental measurements on their own. The measurements themselves are stored in YAML files in a separate repository. This allows for easy extension of the database without the need for recompilation. Furthermore, users and experimental collaborations can share their encoded measurements with the community.

Acknowledgments
This work is partly supported by the CERN FCC Design Study Program. The research of M. Chrzaszcz is funded by the Polish National Agency for Academic Exchange under the Bekker program. M. Chrzaszcz is also grateful to the Foundation for Polish Science (FNP) for its support. We would like to thank Mike Williams, Patrick Koppenburg, Pat Scott, Danny van Dyk and Maria Moreno Llacer for invaluable comments about our manuscript.

References

[1] P. Athron, et al., GAMBIT: The Global and Modular Beyond-the-Standard-Model Inference Tool, Eur. Phys. J. C77 (11) (2017) 784, [Addendum: Eur. Phys. J. C78, no. 2, 98 (2018)]. arXiv:1705.07908, doi:10.1140/epjc/s10052-017-5513-2, 10.1140/epjc/s10052-017-5321-8.

[2] J. C. Costa, et al., Likelihood Analysis of the Sub-GUT MSSM in Light of LHC 13-TeV Data, Eur. Phys. J. C78 (2) (2018) 158. arXiv:1711.00458, doi:10.1140/epjc/s10052-018-5633-3.

[3] P. Bechtle, K. Desch, P. Wienemann, Fittino, a program for determining MSSM parameters from collider observables using an iterative method, Comput. Phys. Commun. 174 (2006) 47–70. arXiv:hep-ph/0412012, doi:10.1016/j.cpc.2005.09.002.

[4] F. Mahmoudi, New constraints on supersymmetric models from b → s gamma, JHEP 12 (2007) 026. arXiv:0710.3791, doi:10.1088/1126-6708/2007/12/026.

[5] T. Feldmann, D. Van Dyk, K. K. Vos, Revisiting B → ππℓν at large dipion masses, JHEP 10 (2018) 030. arXiv:1807.01924, doi:10.1007/JHEP10(2018)030.

[6] J. Kumar, D. London, R. Watanabe, Combined Explanations of the b → sµ+µ− and b → cτ−ν̄ Anomalies: a General Model Analysis, Phys. Rev. D99 (1) (2019) 015007. arXiv:1806.07403, doi:10.1103/PhysRevD.99.015007.

[7] R. Aaij, et al., Test of lepton universality with B0 → K∗0ℓ+ℓ− decays, JHEP 08 (2017) 055. arXiv:1705.05802, doi:10.1007/JHEP08(2017)055.

[8] R. Aaij, et al., Search for the lepton-flavour violating decay D0 → e±µ∓, Phys. Lett. B754 (2016) 167–175. arXiv:1512.00322, doi:10.1016/j.physletb.2016.01.029.

[9] G. J. Feldman, R. D.
Cousins, A Unified approach to the classical statistical analysis of small signals, Phys. Rev. D57 (1998) 3873–3889. arXiv:physics/9711021, doi:10.1103/PhysRevD.57.3873.

[10] C. Rover, C. Messenger, R. Prix, Bayesian versus frequentist upper limits, in: Proceedings, PHYSTAT 2011 Workshop on Statistical Issues Related to Discovery Claims in Search Experiments and Unfolding, CERN, Geneva, Switzerland, 17-20 January 2011, CERN, Geneva, 2011, pp. 158–163. arXiv:1103.2987, doi:10.5170/CERN-2011-006.158.

[11] R. Aaij, et al., Search for the decays B0s → τ+τ− and B0 → τ+τ−, Phys. Rev. Lett. 118 (25) (2017) 251802. arXiv:1703.02508, doi:10.1103/PhysRevLett.118.251802.

[12] A. L. Read, Modified frequentist analysis of search results (the CLs method), CERN-OPEN-2000-205. URL https://cds.cern.ch/record/451614

[13] S. S. Wilks, The large-sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Statist. 9 (1) (1938) 60–62. doi:10.1214/aoms/1177732360.

[14] M. Tanabashi, et al., Review of particle physics, Phys. Rev. D 98 (2018) 030001. doi:10.1103/PhysRevD.98.030001.

[15] Y. Amhis, et al., Averages of b-hadron, c-hadron, and τ-lepton properties as of summer 2016, Eur. Phys. J. C77 (12) (2017) 895. arXiv:1612.07233, doi:10.1140/epjc/s10052-017-5058-4.

[16] R. Barlow, Asymmetric systematic errors. arXiv:physics/0306138.

[17] R. Aaij, et al., Test of lepton universality using B+ → K+ℓ+ℓ− decays, Phys. Rev. Lett. 113 (2014) 151601. arXiv:1406.6482, doi:10.1103/PhysRevLett.113.151601.

[18] I. Antcheva, et al., ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization, Comput. Phys. Commun. 180 (2009) 2499–2512. arXiv:1508.07749, doi:10.1016/j.cpc.2009.08.005.

[19] E. Maguire, L. Heinrich, G. Watt, HEPData: a repository for high energy physics data, J. Phys. Conf. Ser.
898 (10) (2017) 102006. arXiv:1704.05473, doi:10.1088/1742-6596/898/10/102006.

[20] R. Aaij, et al., Measurement of the B0s → µ+µ− branching fraction and effective lifetime and search for B0 → µ+µ− decays, Phys. Rev. Lett. 118 (19) (2017) 191801. arXiv:1703.05747, doi:10.1103/PhysRevLett.118.191801.

[21] M. Aaboud, et al., Measurement of the tt̄Z and tt̄W cross sections in proton-proton collisions at √s = 13 TeV with the ATLAS detector. arXiv:1901.03584.

[22] F. U. Bernlochner, et al., FlavBit: A GAMBIT module for computing flavour observables and likelihoods, Eur. Phys. J. C77 (11) (2017) 786. arXiv:1705.07933, doi:10.1140/epjc/s10052-017-5157-2.

[23] P. Athron, et al., Global fits of GUT-scale SUSY models with GAMBIT, Eur. Phys. J. C77 (12) (2017) 824. arXiv:1705.07935, doi:10.1140/epjc/s10052-017-5167-0.

[24] P. Athron, et al., A global fit of the MSSM with GAMBIT, Eur. Phys. J. C77 (12) (2017) 879. arXiv:1705.07917, doi:10.1140/epjc/s10052-017-5196-8.

[25] R. Aaij, et al., Angular analysis of the B0 → K∗0µ+µ− decay using 3 fb−1 of integrated luminosity, JHEP 02 (2016) 104. arXiv:1512.04442, doi:10.1007/JHEP02(2016)104.

[26] R. Aaij, et al., Measurements of the S-wave fraction in B0 → K+π−µ+µ− decays and the B0 → K∗(892)0µ+µ− differential branching fraction, JHEP 11 (2016) 047, [Erratum: JHEP 04, 142 (2017)]. arXiv:1606.04731, doi:10.1007/JHEP11(2016)047, 10.1007/JHEP04(2017)142.

[27] R. Aaij, et al., Differential branching fraction and angular moments analysis of the decay B0 → K+π−µ+µ− in the K∗0,2(1430) region, JHEP 12 (2016) 065. arXiv:1609.04736, doi:10.1007/JHEP12(2016)065.

[28] R. Aaij, et al., Search for lepton-universality violation in B+ → K+ℓ+ℓ− decays. arXiv:1903.09252.