CutLang V2: towards a unified Analysis Description Language

Abstract

We will present the latest developments in CutLang, the runtime interpreter of a recently-developed analysis description language (ADL) for collider data analysis. ADL is a domain-specific, declarative language that describes the contents of an analysis in a standard and unambiguous way, independent of any computing framework. In ADL, analyses are written in human-readable plain text files, separating object, variable and event selection definitions in blocks, with a syntax that includes mathematical and logical operations, comparison and optimisation operators, reducers, four-vector algebra and commonly used functions. Adopting ADLs would bring numerous benefits to the LHC experimental and phenomenological communities, ranging from analysis preservation beyond the lifetimes of experiments or analysis software to facilitating the abstraction, design, visualization, validation, combination, reproduction, interpretation and overall communication of the analysis contents. Since their initial release, ADL and CutLang have been used for implementing and running numerous LHC analyses. In this process, the original syntax from CutLang v1 has been modified for better ADL compatibility, and the interpreter has been adapted to work with that syntax, resulting in the current release v2. Furthermore, CutLang has been enhanced to handle object combinatorics, to include tables and weights, to save events at any analysis stage, and to benefit from multi-core/multi-CPU hardware, among other improvements. In this contribution, these and other enhancements are discussed in detail. In addition, real-life examples from LHC analyses are presented together with a user manual.


B. Gokturk, A. M. Toon, A. Paul, B. Orgen, N. Ravel, J. Setpal, G. Unel, and S. Sekmen

Bogazici University, Department of Physics, Istanbul, Turkey
Saint Joseph University of Beirut, Dept. of Computer Software Engineering, Beirut, Lebanon
The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
University of Ankatso, Department of Physics, Antananarivo, Madagascar
R.N. Podar School, Mumbai, India
University of California at Irvine, Department of Physics and Astronomy, Irvine, USA
Kyungpook National University, Department of Physics, Daegu, South Korea

January 28, 2021


Contents

6 Multi-threaded runs
7 Code maintenance and continuous integration
8 Analysis examples
9 Conclusions
A User Manual
  A.1 Blocks and keywords
  A.2 Predefined physics objects
  A.3 Predefined functions
    A.3.1 PDGID of particles
  A.4 Mathematical operators and functions
  A.5 Comparison, range and logical operators
    A.5.1 Logical operations
    A.5.2 Ternary operator
  A.6 $\chi^2$ minimization
  A.7 Definitions
  A.8 Tables
  A.9 Manipulating objects
    A.9.1 Defining new objects
    A.9.2 Sorting objects
    A.9.3 Object combinatorics
    A.9.4 Looping over a subset of the object collection
    A.9.5 Minimum and maximum of object attributes
    A.9.6 Summing object attributes
    A.9.7 Object constituents
    A.9.8 Daughter particles
    A.9.9 Hit and miss method
  A.10 Manipulating Events
    A.10.1 Selecting or rejecting events
    A.10.2 Weighing events
    A.10.3 Saving events
  A.11 Bins, counts and histograms
    A.11.1 Bins
    A.11.2 Counts
    A.11.3 Histograms
  A.12 Structure of a complete ADL file
    A.12.1 Initialization and information section
    A.12.2 Regions and algorithms
B The CutLang framework
  B.1 Installation and compilation
  B.2 External user functions

High energy collider physics data analyses are nowadays performed using complex software frameworks that integrate a diverse set of operations, from data access to event selection and from histogramming to statistical analysis. Mastering these frameworks requires a high level of knowledge of general-purpose languages and software architecture. Such requirements erect a barrier between the data and the physicist who may simply wish to try an analysis idea. Moreover, having the physics information (e.g. object definitions, event selections, background estimation methods, etc.) scattered throughout different components of the framework code makes implementing and working with different physics ideas less straightforward and efficient.

These difficulties could be addressed by considering a domain-specific language capable of describing the analysis flow in a standard and unambiguous way. Various efforts have been ongoing to design such languages for high energy collider data analysis. One of these efforts led to the development of the Analysis Description Language (ADL), a declarative language that can express the mathematical and logical algorithm of a physics analysis in a human-readable and standalone way, independent of any computing framework. Being declarative, ADL expresses the analysis logic without explicitly coding the control flow, and is designed to describe what needs to be done, but not how to do it. This consequently leads to a tidier and more efficient expression and eliminates programming errors.

ADL originated from the merging of two parallel efforts. It was formed by combining the best ideas from CutLang [1, 2], an effort to build an interpreted language directly executable on events, and LHADA (Les Houches Analysis Description Accord), initially designed by a group of experimentalists and phenomenologists to systematically document and run the content of LHC physics analyses [3, 4, 5]. In its current state, ADL is capable of describing many standard operations in LHC analyses. However, it is being continuously improved and generalized to address an even wider range of analysis operations.

ADL is designed as a language that can be executed on data and used in real-life data analyses. An analysis written in ADL could be executed by any computing framework that is capable of parsing and interpreting ADL, hence satisfying framework independence. Currently, two approaches have been studied to realize this purpose. One is the transpiler approach, where ADL is first converted into a general-purpose language, which is in turn compiled into code executable on events. A transpiler called adl2tnm, converting ADL to C++ code, is currently under development [4]. Earlier prototype transpilers converting LHADA into code snippets that could be integrated within the CheckMate [6, 7, 8] and Rivet [9, 10] frameworks were also studied. The other approach is that of runtime interpretation. Here ADL is directly executed on events without being intermediately converted into code requiring compilation. This approach was used for developing CutLang [1, 2].

In this paper, we focus on CutLang and present in detail its current state, denoted as CutLang v2, which was achieved after many improvements on the early prototype CutLang v1 introduced in [1]. Hereafter, CutLang v2 will be referred to as CutLang for brevity. The main text emphasizes the novelties that led to ADL and improved CutLang. We start with an overview of ADL in Section 2, then proceed with describing technicalities of runtime interpretation with CutLang in Section 3. We next present the ADL file structure and analysis components that can be expressed by ADL, focusing on the new developments and recently added functionalities, in Section 4. This is followed by Section 5 describing analysis output, again focusing on new additions; Section 6 explaining the newly added multi-threaded run functionality; Section 7 on CutLang code maintenance and recently incorporated continuous integration; Section 8 detailing studies on analysis implementation; and conclusions in Section 9. The full description of the current language syntax is given in the form of a user manual in Appendix A, followed by a note on the CutLang framework and external user functions in Appendix B.

In ADL, the description of the analysis flow is done in a plain, easy-to-read text file, using syntax rules that include standard mathematical and logical operations and 4-vector algebra. In this ADL file, object, variable and event selection definitions are clearly separated into blocks with a keyword value/expression structure, where keywords specify analysis concepts and operations. The syntax includes mathematical and logical operations, comparison and optimization operators, reducers, 4-vector algebra and HEP-specific functions (e.g. dφ, dR). However, an analysis may contain variables with complex algorithms that are non-trivial to express with the ADL syntax (e.g. $M_{T2}$ [11], aplanarity) or non-analytic variables (e.g. efficiency tables, machine learning discriminators). Such variables are encapsulated in self-contained, standalone functions which accompany the ADL file. Variables defined by these functions are referred to from within the ADL file. As a generic rule, all keywords, operators and function names are case-insensitive.

The language content, syntax rules, and working examples of self-contained functions will be presented in the coming sections, after a technical introduction of the CutLang interpreter.

An interpreted analysis system makes adding new event selection criteria, changing the execution order or cancelling analysis steps more practical. Therefore CutLang was designed to function as a runtime interpreter and bypass the inherent inefficiency of the modify-compile-run cycle. Avoiding the integration of the analysis description in the framework code also brings the huge advantage of being able to run many alternative analysis ideas in parallel, without having to make any code changes, hence making the analysis design phase more flexible compared to the conventional compiled framework approach.

The CutLang runtime interpreter is written in C++, around function pointer trees representing different operations such as event selection or histogramming. Therefore processing an event with a cutflow table becomes equivalent to traversing multiple expression trees of arbitrary complexity, such as the one shown in Figure 1. Here physics objects are given as arguments.

Figure 1: An expression tree example: the program traverses the tree from right to left, evaluating the encountered functions from bottom to top.

Handling of the Lorentz vector operations, pseudo-random number generation, input-output file and histogram manipulations are all based on classes of the ROOT data analysis framework [12]. The actual parsing of the ADL text relies on automatically generated dictionaries and grammar based on traditional Unix tools, namely Lex and Yacc [13]. The ADL file is split into tokens by Lex, and the hierarchical structure of the algorithm is found by Yacc. Consequently, CutLang can be compiled and operated in any modern Unix-like environment. The interpreter needs to be compiled only once, during the installation or when optional external functions for complex variables are added. Once the work environment is set up, the remainder is mostly a think-edit-run-observe cycle.

The CutLang framework is able to work with multiple input data types, each implemented as a plug-in. For example, ATLAS and CMS open data [14] and internal ntuple formats including CMS NanoAOD [15], Delphes [16] and LHCO event formats are recognized and can be directly used. Other input file types can also be easily added, since all particle types and event properties are handled through an internal abstraction layer. The only requirement on the input files is to use the ROOT file format. The same format is used for the output file, which also contains the ADL definitions and selection algorithms of each region in text format, in a separate directory inside that file. One other point to raise is that not all input file types contain the same amount of information. Therefore CutLang provides the possibility of accessing any such input-data-type-specific information through external user functions. The practical details of the CutLang framework can be found in Appendix B.
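
The idea of processing an event by traversing a tree of functions can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: CutLang itself is written in C++, and the `Node` class, the dictionary-based event layout and the example criterion below are hypothetical and do not reproduce the interpreter's internal classes or its traversal order.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Node:
    """One node of an expression tree (hypothetical stand-in for a function-pointer node)."""
    func: Callable[..., Any]
    children: List["Node"] = field(default_factory=list)

    def evaluate(self, event: dict) -> Any:
        # A leaf reads its value from the event; an inner node first
        # evaluates its children and then applies its own function.
        if not self.children:
            return self.func(event)
        return self.func(*(child.evaluate(event) for child in self.children))


def constant(value: float) -> Node:
    """Leaf that ignores the event and returns a fixed number."""
    return Node(lambda event: value)


def attribute(collection: str, index: int, name: str) -> Node:
    """Leaf that reads one attribute of one object in a collection."""
    return Node(lambda event: event[collection][index][name])


# Tree for the illustrative criterion: pt(jets[0]) + pt(jets[1]) > 100
cut = Node(operator.gt, [
    Node(operator.add, [attribute("jets", 0, "pt"), attribute("jets", 1, "pt")]),
    constant(100.0),
])

event = {"jets": [{"pt": 80.0}, {"pt": 45.0}]}
print(cut.evaluate(event))  # True, since 80 + 45 > 100
```

Because the tree is built once and then evaluated per event, changing a selection only means building a different tree, which is the property that lets a runtime interpreter avoid the modify-compile-run cycle.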

We will now explain in detail which analysis components and physics algorithms can be described by ADL and processed with CutLang. We will prioritize highlighting the many novelties added and improvements made since the original versions, CutLang v1 and LHADA. The description here concentrates on the concepts and content that can be expressed and processed by ADL and on the functionalities of CutLang v2, rather than attempting to give a full layout of syntax rules, which is provided independently in the user manual in Appendix A.

As a runtime interpreter, CutLang processes events in a well-defined order. It executes the commands in the ADL file from top to bottom. Therefore, the ADL files are required to describe the analysis flow in a certain order. Some dedicated execution commands are also used within the ADL file, in order to facilitate the runtime interpretation. Throughout the ADL file, mass, energy and momentum are all written in giga-electronvolts (GeV) and angles in radians. User comments and explanations should be preceded by a hash (#). The ADL file is organized into the following sections:

initializations: This section contains commands that are related to analysis initialization and setup, for which the relevant keywords are summarized in Table 1. The keywords and values are separated by an equal sign. The TRGm and TRGe keywords in the table refer to the lepton (muon or electron) triggers. Their use is described in Appendix A.2; it is worth noting at this point that Monte Carlo (MC) simulation weights are not taken into account when the trigger value is set to data.

countformats: This section is used for setting up the recording of already existing event counts and errors, e.g., from an experimental paper publication. It is therefore not directly relevant for event processing, but rather for studying the interplay between the results of the current analysis and its published experimental counterpart. More generally, it is used to express any set of pre-existing counts of various signals, backgrounds, and data (together with their errors) of an analysis.

definitions1: This section is used for defining aliases for objects and variables, in order to render them more easily referable and readable in the rest of the analysis description. For example, it can introduce shortcuts like Zhreco for a hadronically reconstructed Z boson, or values like mH, i.e., the mass of a reconstructed Higgs boson. These definitions can only be based on the predefined keywords and objects.

objects: This section can be used to define new objects based on predefined physics objects and shorthand notations declared in definitions1.

definitions2: This section is allocated for further alias or shorthand definitions. Definitions here can be based on objects in the previous section and predefined particles.

event categorization: This section is used for defining event selection regions and the criteria in each region. Running with CutLang requires having at least one selection region with at least one command, which may include either a selection criterion or a special instruction to include MC weight factors or to fill histograms.

We next describe the detailed contents and usage of these sections.

Table 1: Initialization keywords and their possible values

Keyword      Explanation
SkipHistos   Skip (=1) or display (=0) the histograms in the final efficiency table
SkipEffs     Skip (=1) or display (=0) the final efficiency table
TRGm         Lepton trigger (muon); see Appendix A.2
TRGe         Lepton trigger (electron); see Appendix A.2
RandomSeed   Random number generator seed, an integer
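
The conventions stated for the initialization section (keyword and value separated by an equal sign, comments introduced by a hash, case-insensitive keywords) can be sketched in a few lines of Python. This is a hedged illustration only: CutLang parses ADL with Lex and Yacc, and the function `parse_initializations`, the lower-cased keyword set and the sample values below are assumptions made for this example.

```python
# Keywords from Table 1, lower-cased because ADL keywords are case-insensitive.
KNOWN_KEYWORDS = {"skiphistos", "skipeffs", "trgm", "trge", "randomseed"}


def parse_initializations(text):
    """Return a dict of initialization settings from 'keyword = value' lines."""
    settings = {}
    for raw_line in text.splitlines():
        # Everything after a hash is a user comment and is ignored.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"expected 'keyword = value', got: {raw_line!r}")
        key = key.strip().lower()
        if key not in KNOWN_KEYWORDS:
            raise ValueError(f"unknown initialization keyword: {key!r}")
        value = value.strip()
        # Integer values are converted; anything else is kept as text.
        settings[key] = int(value) if value.lstrip("-").isdigit() else value
    return settings


# Hypothetical initialization section, written only for this sketch.
example = """
# analysis set-up
SkipHistos = 1
skipeffs   = 0      # same keyword family, different capitalisation
RandomSeed = 4242
"""

print(parse_initializations(example))
```

Normalising the keyword case at parse time keeps the rest of such a tool free of case handling; a real grammar-based parser would apply the same rule in its tokenizer.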

Generally, the starting point in an analysis algorithm is defining and selecting the collections of objects, such as jets, b jets, electrons, muons, taus, missing transverse energy, etc., that will be used in the next steps of the analysis. Usually, the input events contain very generic and loose collections of objects, which need to be further refined for analysis needs. CutLang is capable of performing a large variety of operations on objects, including deriving new objects via selection, combining objects to reconstruct new objects, and accessing the daughters and constituents of objects. Once an object is defined, it is also possible to find objects with a minimum or maximum of a given attribute within the object's collection, or to sort the collection according to an attribute. In the ADL notation, object collection definitions are clearly separated from the other analysis tasks. Here the term object is used interchangeably with object collection. Each object is defined within an individual object block uniquely identified with the object's name. These blocks start with the name(s) of the input object collection(s) and then list the different types of operations.

CutLang automatically retrieves all standard object collections from the input event file without the need for any explicit user statements within the ADL file. It can read events with different formats, such as Delphes fast simulation [16] output, CMS NanoAOD [15], and ATLAS or CMS open data [14] ntuples, and recognize the object collections in these. One property unique to CutLang is that it is designed to map input object collections to common, standard object names with a standard set of attributes, as described in Appendix A.2 and A.3. For example, the AK4jets collection in CMS NanoAOD and the JET collection in Delphes are both mapped to Jet. This approach allows the same ADL file to be processed on different input event formats, and has proven very useful in several simple practical applications. However, we also recognize that this approach only works when different input collections have matching properties, e.g. when Delphes electrons and CMS electrons have the same identification criteria, which can be mapped to the same identification attribute, or when a Delphes jet and an ATLAS jet use the same b-tagging algorithm, which can be mapped to the same b-tagging attribute. Therefore, other interpreters of ADL may choose to use input collection and attribute names as they are, in order to be more unambiguous. Allowing different approaches with advantages for different use cases, while still adhering to the principle of clarity, is a significant aspect of ADL.

The most common object operation is to take the input object collection and filter a subset by applying a set of selection criteria on object attributes. This can be done very straightforwardly in ADL by listing each selection criterion on consecutive lines. The objects in the input collection satisfying the criteria can be either selected or rejected using the select or reject keywords. Comparison operators such as =, !=, >, <, >=, <= can directly be used for expressing the criteria. Logical operators AND, OR and NOT can be used for expressing composite or inverted criteria. A complete list of these operators can be found in Appendix A.5.

It is also possible to filter an object collection based on other object collections, such as in the cases of object cleaning or matching. For example, one can reject jets overlapping with photons, or select boosted W jets matching generator-level W bosons. Such operations involve intrinsic loops, which are readily handled by CutLang. Functions such as δφ or the angular distance δR can be readily used when comparing objects. Given an initial object collection, one can consecutively derive several objects.
For example, jets can be filtered to obtain cleanjets, while cleanjets can be further filtered to obtain verycleanjets. One can also use the same initial collection to define different collections, such as taking muons and imposing different criteria to obtain loosemuons and tightmuons.

Another very common operation is to combine objects to reconstruct new objects, such as combining 2 leptons to form a Z boson. Sometimes, the reconstruction could be very straightforward, as in requesting to reconstruct only a single Z boson per event. However, in other cases, one might have to reconstruct as many Z bosons as possible. In each case, reconstructed candidates might undergo filtering or selection of a single optimal candidate among all candidates. Combination operations are very diverse, and finding a completely generic expression for them is non-trivial. In v1, CutLang could reconstruct an explicitly defined number of objects per event. It could find the object satisfying given criteria by performing optimization operations. In v2, CutLang has been generalized to reconstruct any number of objects, by taking into account the combinatorics. Selection criteria can also be imposed on both the input and reconstructed objects. Technical information on how to perform combinations is provided in Appendix A.9.3.

Another common situation is when objects in a collection are individually associated to other collections. Examples include mothers or daughters of generator-level particles, subjets or constituents of jets, and associated tracks of leptons or jets. As a first step, CutLang was adapted in v2 to work with jet constituents using the syntax described in Appendix A.9.7. Another example of association is daughters of generator truth-level particles. If an analysis is performed directly on generator-level particles, or if a study is required on truth-level particles, information such as PDGID codes or decay chains becomes relevant.
CutLang is now capable of accessing the PDGID and the decay products of a particle (referred to as "daughters" in HEP), with the syntax described in Appendix A.3.1 and A.9.8. CutLang provides both the number of daughters and a modifier to refer to the daughters. Work is in progress for finding a generalizable syntax for object association expressions.

Members of object collections can be directly accessed via their indices. Being declarative, ADL syntax does not include explicit statements for looping over object collections, and CutLang is capable of interpreting this implicit looping. For example, when filtering a jet collection, one might apply a cleaning criterion which requires no electron to be in the proximity of the jet defined by a radius. Applying this criterion requires looping over electrons; however, it suffices to write the electron object's name in order for CutLang to interpret implicit looping based on the context. In other cases, it might be necessary to access only a subset of the collection, such as when imposing a selection on the δφ between the first 3 jets with the highest $p_T$ and the missing transverse momentum. ADL and CutLang were updated to allow such operations. The Python slice notation has been adapted for expressing subset ranges in object collections, as described in Appendix A.9.4.

Input or defined object collections are by default sorted by CutLang in order of decreasing transverse momentum $p_T$. ADL can express sorting object collections according to any feature, in ascending or descending order, and CutLang is capable of performing such sorting operations. Moreover, so-called "reducers" can be applied for extracting values from existing object collections. One case is the capability to extract the maximum or minimum value of a given attribute in an object collection. For example, CutLang can give the maximum $p_T$ possessed by a jet in a jet collection, or the minimum value of isolation possessed by an electron in an electron collection.
Another case is the summation operation, where one can sum the values of a given attribute over the whole collection. The most common use case here is the summation of object $p_T$ values to obtain event variables such as the hadronic transverse energy $H_T$. Sorting and reducers are recent additions to ADL and CutLang, and the details of their implementation and usage are given in Appendix A.9.2, A.9.5 and A.9.6 and in the examples referred to in Section 8.

4.3 Object or event variables

An object variable is a quantity defined once per object, such as a jet's transverse momentum $p_T$ or an electron's relative isolation. An event variable is a quantity defined once per event, such as the missing transverse energy $E_T^{\mathrm{miss}}$, the number of electrons selected using the tight criteria, the $p_T$ of the highest-$p_T$ jet, or the transverse mass calculated using the highest-$p_T$ lepton and $E_T^{\mathrm{miss}}$. Object and event variables used in object definitions or event categorization in an analysis are not always fully provided in the input event data. These quantities therefore need to be computed during the analysis using the existing inputs. ADL is designed to allow the definition of such new variables in two ways. Simple variables that can be described analytically using a single-line formula can be expressed within the ADL file using mathematical operations. A classic example is the definition of the transverse mass obtained from a visible object, such as a lepton, and the missing transverse energy. To enable writing these simple formulas, CutLang is capable of parsing and processing operators such as +, -, *, / and ^. CutLang has also incorporated a series of internal functions to express other operations, such as abs(), sqrt(), sin(), cos(), tan() and log(). Reducer operators used for reducing collections to a single value, e.g. size(), sum(), min() and max(), are also available for computing quantities.
For example, the hadronic transverse momentum $H_T$ can be computed from all jets in an event using the sum() reducer as sum(pT(jets)).

However, in many cases, variables are defined by complex algorithms that are non-trivial to express. Examples such as the angular separation dR, aplanarity, the stransverse mass $M_{T2}$ [11], razor variables [17], etc. either cannot be easily written using the available operators or require multiple steps of calculation. Some of these algorithms, like angular separation and razor variables, were predefined as internal functions in CutLang, and more, like $H_T$ and $M_{T2}$, were added recently. A list of existing variables can be found in Appendix A.3. Other algorithms can be easily incorporated by the user following the recently generalized recipe in Appendix B. Another class of sophisticated variables includes quantities defined from numerical functions, such as object or trigger efficiencies used to compute object or event weights, provided in tables or histograms, or discriminators/efficiencies computed via machine learning models. All these variables are incorporated by being defined in independent, self-encapsulated functions outside the ADL file and referred to within the ADL file. These external user functions should be seen as a natural extension of the language. The ultimate aim is to provide these functions in a well-defined and straightforwardly extendable database.

The expressions for variables, whether they are built directly using the available mathematical operators or indirectly via internal or user functions, can be written openly in the place of usage, e.g. in the line where a selection is applied on the variable. Alternatively, if the variable is used multiple times in an analysis, e.g. in different selection regions, it can be defined once using the define keyword, which allows an alias name to be assigned to the variable. Currently, defining aliases using the define keyword is only possible for event variables in CutLang, but not for object variables.
In CutLang, the define expressions are placed exclusively at the end of the object blocks and before the beginning of the event selection.

In a typical collider analysis, events are categorized based on different sets of selection criteria applied on event variables into a multitude of signal regions enhancing the presence of the signal of interest, or control or validation regions used for estimating backgrounds. These regions can be derived from each other, and can be correlated or uncorrelated depending on the case. ADL organizes event categorization by defining each selection region in an independent region block and labels each region with a unique name. The region blocks mainly consist of a list of selection criteria. As in the case of objects, each criterion is stated in a line starting with a select or a reject keyword, which selects or rejects the events satisfying the criterion, respectively. Comparison operators, logical operators and the ternary operator, the syntax for which is described in Appendix A.5, are used for expressing the criteria. Another operation that can be performed within the context of event classification is $\chi^2$ optimization for reconstructed quantities, whose syntax is described in Appendix A.6. An example would be finding, among several top quark candidates, the candidate with mass closest to the top quark mass, and using the optimal candidate's properties for further selection.

ADL and CutLang allow deriving selection regions from each other, e.g. deriving multiple signal regions from a baseline selection region. This is done by simply referring to the baseline region by name in the new region's block, without repeating the whole selection every time.

In many analyses, especially those targeting searches for new physics, events in given search regions are partitioned into many bins based on one or more variables, e.g. $H_T$, $E_T^{\mathrm{miss}}$ or some invariant mass. Data counts and background estimates in these bins constitute the result of the analysis.
With the increased data, recent LHC analyses, especially inclusive searches for new physics, may contain hundreds of bins. Treating each [...] bin keyword. Bins in a region, by definition, are to be non-overlapping. The CutLang interpreter and framework operate based on this principle, and skip an event once it is classified into a bin. This property distinguishes bins from regions, as different regions can be overlapping, and a given event is evaluated for all regions, independent of whether it is selected or not by the preceding regions. Bins can be described in two ways: when the binning is done using only a single variable, all bins can be defined in a single line, by specifying the variable name and the bin intervals. When bins are defined based on multiple variables, this way of description can become ambiguous, and a more explicit description, where each bin is defined in one dedicated line, can be used. The usage and syntax of the bin keyword is described in Section A.11.1. In case multiple regions would have the same binning (e.g. a signal region and several control regions from which the background is estimated), currently, the binning definitions must be specified separately in each region. We are searching for a more practical way of expression which would avoid the repetition while keeping with the human readability principle.

Footnote: The region block was called algo in the original CutLang syntax. Even though algo is still valid in CutLang, we generally refer to the block as region, as the latter is a more domain-specific word.
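The difference between overlapping regions and non-overlapping bins described above can be illustrated with a minimal Python sketch. This is not CutLang code: the names classify_bin and evaluate_event, the dictionary-based event, and the choice of an open-ended last bin are all assumptions made for illustration only.

```python
from bisect import bisect_right

def classify_bin(value, edges):
    """Return the index of the single bin containing value, or None.

    Bins are [edges[i], edges[i+1]); the last bin is taken to be
    open-ended (an assumption of this sketch).
    """
    if value < edges[0]:
        return None
    return bisect_right(edges, value) - 1

def evaluate_event(event, regions):
    """Evaluate one event against every region.

    regions maps a region name to (selection, variable, bin_edges).
    Every region is tested, whatever the outcome of the previous ones,
    while inside a region the event lands in at most one bin.
    """
    result = {}
    for name, (selection, variable, edges) in regions.items():
        if not selection(event):
            continue  # event rejected by this region only
        result[name] = classify_bin(event[variable], edges)
    return result

# Two hypothetical, overlapping regions sharing the same MET binning.
regions = {
    "SR": (lambda e: e["njets"] >= 2, "MET", [200, 400, 600]),
    "CR": (lambda e: e["njets"] >= 1, "MET", [200, 400, 600]),
}
print(evaluate_event({"njets": 3, "MET": 250}, regions))
```

An event with three jets is selected by both regions, but within each region it is counted in one bin only.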

In an analysis, events, especially simulated events, are usually weighted in order to match the real data luminosity or to correct for detector effects. CutLang has recently been adapted to incorporate the capability of applying event weights. Event weights can be applied within the region blocks via the weight keyword, as described in Appendix A.10.2. A particular event selected by two different regions can receive different weights. Event weights can be either constant numbers or functions of variables. These functions may include analytical or numerical internal or user functions. Weights based on numerical functions, such as efficiencies (e.g. trigger efficiencies), can also be applied from tables written within the ADL file, as described in Appendix A.8. The systematic way for expressing efficiencies in tables and applying them to objects and events was incorporated recently in ADL and CutLang.
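As a rough illustration of how a constant weight and a table-based weight combine, the following Python sketch computes a weighted yield. It is not the CutLang implementation: the function names, the table layout as (low, high, value) rows, the efficiency values, and the choice of returning 1.0 outside the table range are assumptions of this sketch.

```python
def table_lookup(table, x):
    """Return the table value of the interval [low, high) containing x."""
    for low, high, value in table:
        if low <= x < high:
            return value
    return 1.0  # assumption: no correction outside the tabulated range

def event_weight(event, lumi_weight, trigger_table):
    """Constant luminosity weight times a variable-dependent efficiency."""
    return lumi_weight * table_lookup(trigger_table, event["HT"])

# Hypothetical trigger efficiency versus H_T (GeV).
trigger_table = [(0.0, 300.0, 0.80), (300.0, 600.0, 0.95)]

events = [{"HT": 250.0}, {"HT": 450.0}, {"HT": 900.0}]
weighted_yield = sum(event_weight(e, 0.5, trigger_table) for e in events)
print(weighted_yield)
```

Since the weight is evaluated per region, the same event can enter two regions with two different weight expressions.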

As mentioned above, applying efficiencies to events and objects, such as trigger efficiencies or object reconstruction, identification and isolation efficiencies, is a common part of many analyses. Section 4.5 described how to apply the effect of event efficiencies as event weights. There is, however, another approach, which involves emulating the effects of efficiencies. This approach involves randomly accepting events or objects having a certain property, such that the total selected percentage reflects that of the efficiency. For example, if the overall reconstruction and identification efficiency for an electron with 20 < p_T < 40 GeV and |η| within a given range is 0.6, an electron in this p_T and |η| range is allowed to pass the selection only with a 0.6 probability. The decision for selection is made by sampling a uniform random number between 0 and 1, and accepting the event or object if the uniform random number is smaller than the efficiency value. Usually, the uncertainty on the efficiency is also taken into account when making the pass/fail decision. This is called the hit-and-miss method.

Emulating efficiencies using the hit-and-miss method is regularly used in parametrized fast simulation frameworks. It is also becoming increasingly relevant to incorporate this functionality in the analysis step, especially for the benefit of phenomenological studies targeting interpretation or testing new analysis ideas. These studies generally use events produced by fast simulation or even at truth level, instead of real collision data events or MC events produced by full detector simulation as used in experimental analyses. Experimental analyses use complicated object identification criteria, which cannot be implemented by fast simulation. Moreover, it is common to see different analyses working with different identification methods for a given object (e.g. cut-based identification versus multivariate analysis-based identification for electrons), as different methods may perform better for different physics cases. Consequently, working with different phenomenology analyses, each using different identification criteria, requires implementing all these criteria in the simulation step, which is highly impractical. Therefore, it is helpful for the infrastructure handling the analysis step to have the capability to emulate these effects using efficiencies.

Emulating efficiencies with uncertainties was recently incorporated in CutLang. The hit-and-miss method is applied via the internal function applyHM. In the current implementation, the efficiency values and errors versus object properties are input via table blocks in the ADL file.
This will be generalized to reading efficiencies from other formats, e.g. input histograms or numerical external functions, in the near future.

The applyHM function uses a uniform distribution to decide if the central value was hit (below the value) or missed (above the value). The central value itself is recalculated in case the table contains errors. The new value is recalculated each time based on a double Gaussian function with positive and negative widths, which are the errors of the associated bin in the efficiency table:

dg(x) \equiv \sqrt{\frac{2}{\pi \, \epsilon_u \, \epsilon_d}} \left[ e^{-\frac{(x-\mu)^2}{2 \epsilon_d^2}} \times \theta(\mu) + e^{-\frac{(x-\mu)^2}{2 \epsilon_u^2}} \right]    (1)

where \mu is the central value of the relevant bin from the efficiency table, \epsilon_u and \epsilon_d are the errors in the same bin, and \theta is the unit step function. The applyHM function can be used both in the object blocks, for defining derived object collections, and in the region blocks, to apply efficiencies on a particular object, e.g. to check whether the jet with the highest p_T is a b-tagged jet or not. Syntax for the applyHM function can be found in Appendix A.9.9.

As described in the introduction, the main scope of ADL is the description of the physics content and algorithmic flow of an analysis. The language content presented up to this point serves this purpose. However, further auxiliary functionalities are required for practicality while running the analysis on events. One such functionality is histogramming. Since the start of its design, CutLang has been capable of filling one-dimensional histograms of event variables. Recently, the capability of drawing two-dimensional histograms has been added. The syntax for histogramming can be found in Appendix A.11.3. Histogramming is currently only available for event variables. It will be added for object properties in the near future.
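Returning to the hit-and-miss emulation with asymmetric errors described for applyHM, the idea can be sketched in Python as follows. This is an illustrative sketch, not the CutLang implementation: the function names are hypothetical, the efficiency is drawn from a two-sided Gaussian by first choosing the side with probability proportional to its width, and the sampled value is clipped to [0, 1], all of which are assumptions of this sketch.

```python
import random

def sample_central_value(mu, err_up, err_down, rng):
    """Draw an efficiency from a two-sided Gaussian centred on mu.

    The lower side has width err_down, the upper side width err_up.
    """
    total = err_up + err_down
    if total == 0.0:
        return mu  # table without errors: keep the central value
    if rng.random() < err_down / total:
        value = mu - abs(rng.gauss(0.0, err_down))
    else:
        value = mu + abs(rng.gauss(0.0, err_up))
    return min(1.0, max(0.0, value))  # assumption: clip to a valid probability

def apply_hit_and_miss(mu, err_up, err_down, rng):
    """Return True (hit) if a uniform number falls below the sampled efficiency."""
    efficiency = sample_central_value(mu, err_up, err_down, rng)
    return rng.random() < efficiency

rng = random.Random(12345)
n_trials = 20000
n_hits = sum(apply_hit_and_miss(0.6, 0.05, 0.05, rng) for _ in range(n_trials))
print(n_hits / n_trials)  # close to the central efficiency of 0.6
```

With symmetric errors the accepted fraction stays close to the central efficiency; asymmetric errors shift it towards the wider side.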

The main priority of the ongoing developments is to establish the principles of ADL as a language. Here, we refer to a language as a set of instructions to implement algorithms that produce various kinds of output through abstractions for defining and manipulating data structures or controlling the flow of execution. It is, however, important to recognize that a language can be expressed using alternative vocabulary or syntax. Here, vocabulary is the set of words with a particular meaning in the language, such as block or keyword names, and syntax is the set of rules that defines the combinations of symbols that are considered to be a correctly structured expression of the language. Our experience on the way from CutLang v1 and LHADA to ADL showed that there might not always be a single best syntax for expressing a given content. Alternative syntax options may be more favorable in different use cases, due to practicality or simply due to different tastes of the users. Recognizing this, we recently opted to host multiple syntactic alternatives in ADL and CutLang for several cases. The most obvious case is the syntax for the expression of object attributes, as described in Appendix A.2. A more minor example is the name of the event classification block keyword, i.e. both region and algo are valid. Another is the specification of the input object collection in an object block, where the take keyword, the using keyword or a colon ":" are all valid. CutLang was recently updated to be able to parse and interpret the different alternatives in such cases. We expect that these differences will naturally converge and unify as the user base and the set of implemented analysis examples expand.

CutLang as an analysis framework is designed to output information and data that would be used for further analysis. The main output obtained after running an analysis in CutLang is provided in a ROOT file. The file, first of all, includes a copy of the ADL file content in order to document the provenance of the analysis. It also includes histograms with all the event counts and uncertainties obtained from the analysis and all histograms defined by the user. CutLang is also capable of skimming and saving events, using the auxiliary save keyword, in its internal format LVL0, as described in Appendix A.10.3. In case event saving is specified in the ADL file, the ROOT file also stores the saved events.

The output ROOT file includes a directory for each event categorization region, i.e. each region block. These directories contain all user-defined histograms specified in the ADL file. The prototype version of CutLang also had a basic cutflow histogram listing the number of events surviving each step of the selection in the given region. The cutflows, including the statistical errors on counts, are also given as text output. In the current version, the cutflow histograms are improved to include the selection criteria as bin labels. Moreover, in case binning is used in a region, a bincounts histogram is also added, where each histogram bin shows the event counts and errors in each selection bin, and the histogram bin labels show the bin definition. The cutflow and bincounts histograms can be directly used in the subsequent statistical analysis of the results.

Incorporation of existing counts

In some cases, event counts and uncertainties from external sources need to be systematically accessible in order to be processed together with the counts and uncertainties obtained from running the analysis via CutLang. One example is phenomenological interpretation studies, where the analysis is only run on signal samples, while the experimental results, consisting of data counts and background estimates, are usually taken from the experimental publication. Having the data counts and background estimates directly available in a format compatible with the signal counts is necessary for subsequent statistical analysis. Moreover, for this particular case, it is also highly desirable to have this information documented directly within the ADL file. Another example is validation studies, when either multiple teams in an experimental group are synchronizing their cutflows, or a reimplemented analysis for a phenomenological interpretation study is validated against a cutflow provided by the original experimental publication. Similarly, having the validation counts and uncertainties in the same format would make comparison very practical.

Recently, a syntax was developed in ADL for systematically storing external counts and uncertainties within the ADL file. The physics process for which the information is given and the format of the information are provided within the countsformat block using the process keyword, while the values are given in the relevant region blocks, right after the definition of the relevant selection criteria, using the counts keyword. The syntax is detailed in Appendix A.11.2. When an ADL file including external counts and errors is run with CutLang, the counts and errors are converted into cutflow and bincounts histograms with a format similar to those hosting the CutLang output. The histograms are placed under the relevant region directories, and the physics process is included in the histogram names.

CutLang has recently been enhanced with the capability of multi-threaded execution of an analysis, in order to utilize the available resources optimally and therefore obtain results faster. Adding -j n to the command that starts the analysis results in using n cores. The value of n must be an integer that does not exceed the total number of cores on the processor; a special value selects one less than the total number of cores, to maximize performance for demanding analyses while leaving part of the resources to the operating system. CutLang can be run using 2 cores as:

./CLA.sh [inputrootfile] [inputeventformat] -i [adlfilename] -j 2

Figure 2 shows the run time dependence on multi-threading. The mean and standard deviation of these results are given in Table 2. The computer used during the test has an Intel(R) Core(TM) i5-8300H with 4 cores and 8 threads, and runs Ubuntu 18.04.4 LTS. The number of events analyzed was limited to 3 million due to memory restrictions.

Table 2: Data points given in Figure 2.
Threads | Mean no. of events/sec | Std. dev.
1 | 3063.4 | 14.5
2 | 5853.5 | 18.5
4 | 10223.3 | 22.3
6 | 11028.0 | 29.6
8 | 11272.0 | 119.6

As can be seen from the results, the number of events processed per second scales roughly with the number of cores used, up to 4 parallel processes. Simultaneous processing efficiency, the resource demand of background processes and the recombination of results obtained in parallel all contribute to the decline in performance of multi-threaded runs. Because the processor has only 4 physical cores with 2 logical cores each, runs that use more than 4 threads showed minimal improvement.

In another performance test, run times of analyses with 1, 2, 4 and 8 threads for varying numbers of events are given in Table 3. To simplify, a normalized version of Table 3 is also provided in Table 4, where the runtime of the analysis that used 1 core is taken as the norm. Looking at these tables, it can be seen that as the analyses become more demanding, the benefit from higher levels of multi-threading increases.
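The scaling statement above can be checked directly from the Table 2 values. The short Python sketch below computes the speed-up relative to a single thread and the corresponding parallel efficiency (speed-up divided by the number of threads); the function name scaling is introduced here for illustration only.

```python
# Mean events per second from Table 2, keyed by number of threads.
THROUGHPUT = {1: 3063.4, 2: 5853.5, 4: 10223.3, 6: 11028.0, 8: 11272.0}

def scaling(throughput):
    """Return {threads: (speedup, efficiency)} relative to one thread."""
    base = throughput[1]
    return {
        n: (rate / base, rate / base / n)
        for n, rate in throughput.items()
    }

for n, (speedup, efficiency) in scaling(THROUGHPUT).items():
    print(f"{n} threads: speed-up {speedup:.2f}, efficiency {efficiency:.2f}")
```

The speed-up is about 1.9 for 2 threads and 3.3 for 4 threads, and grows only to about 3.6 and 3.7 for 6 and 8 threads, in line with the 4 physical cores of the test machine.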

Figure 2: Events processed per second when the analysis is divided into 1, 2, 4, 6 and 8 threads for varying number of events. Error bars are multiplied by 10 to make them visible.

Table 3: Variation of run times with changing number of threads. Columns: process time [s] for 1, 2, 4 and 8 cores used. Rows: number of processed events.

The CutLang source code is public and resides in the popular software development platform GitHub [18]:

https://github.com/unelg/CutLang

CutLang uses GitHub functionalities for parallel code development across multiple developers. This development platform, apart from a wiki page for documentation and the possibility for error reporting, also offers a continuous integration setup, which includes a series of tasks that can be initiated at a specific time or by a trigger such as a commit to the main branch. The continuous integration setup was recently incorporated to automatically validate the code. The setup compiles the CutLang source code from scratch, and runs the resulting executable over a set of example ADL files from the package on a multitude of input data files and formats. By comparing the output from the examples to a carefully selected reference output, coding errors can be automatically detected and reported by email. The total compilation and execution time is greatly reduced by using a pre-compiled version of ROOT and by pre-installing the necessary event files onto a Docker [19] image integrated into a recent Linux (Ubuntu) virtual computer made available by the development platform.

ADL and CutLang are continuously being used for implementing a diverse set of LHC analyses and running these on events. The analyses implemented are being collected in the following GitHub repository [20]:

https://github.com/ADL4HEP/ADLLHCanalyses

Table 4: Runtimes as percentages of single core runtime (normalized process time).
Processed events | 1 core | 2 cores | 4 cores | 8 cores
10^[...] | 100 | 98.7 | 101 | 149
10^[...] | 100 | 57.2 | 38.6 | 45.7
10^[...] | 100 | 50.7 | 29.8 | 32.0
2.[...] × 10^[...] | 100 | 51.9 | 29.4 | 27.0
4.[...] × 10^[...] | 100 | 51.3 | 29.6 | 26.6

The main focus so far has been to implement analyses designed for new physics searches, in particular supersymmetry searches. These supersymmetry analyses are intended to be directly used to create model efficiency maps to be used by the reinterpretation framework SModelS [21, 22, 23]. The results obtained by running some of the implemented analyses have also been validated within dedicated exercises performed during the Les Houches PhysTeV workshops, in comparison to other analysis frameworks [5]. The available analysis spectrum is currently being extended to cover Higgs and other SM analyses. Furthermore, studies are ongoing to improve the functionalities of ADL and CutLang for use in searches or interpretation studies with long-lived particles, which involve highly non-conventional objects and signatures. More recently, analysis examples for CMS Open Data [14] and a sensitivity study case for the High Luminosity LHC and the Future Circular Collider were also added [24]. In addition, ADL and CutLang were used as main tools in an analysis school for undergraduate students, which took place in Istanbul in February 2020, and several analyses were implemented by the participating students [25]. ADL and CutLang were also used to prepare hands-on exercises for data analysis at the 26th Vietnam School of Physics (VSOP) in December 2020 [26]. The VSOP exercises, involving running CutLang and further analysis of the resulting histograms with ROOT, were also adapted for direct use via Jupyter notebooks, and are documented in detail in [27]. The experience in both schools confirmed ADL and CutLang as highly intuitive tools for introducing high energy physics data analysis to undergraduate and masters students with nearly no experience in analysis.

Implementing analyses with a variety of physics content led to incorporating a wider range of object and selection operations and helped to make the ADL syntax more generic and inclusive. Syntax for generalizing object combinations, numerical efficiency applications, the hit-and-miss method, bins and counts, and many others were introduced as a result of these studies. Consequently, the scope and functionality of the CutLang interpreter and framework was also enhanced. Many internal and external functions were added to CutLang to address direct requirements of the various implemented analyses. Running different analyses on events also made it possible to thoroughly test the capacity of CutLang in performing complete, realistic analysis tasks.

We presented the recent developments in CutLang, leading towards a more complete analysis description language and a more robust runtime interpreter. The original syntax of the earlier CutLang prototype version and its event processing methods have been modified after a multitude of discussions with other scientists in the field interested in decoupling the physics analysis algorithms from the computational details, and after implementing many HEP analyses. Modifications include significant enhancement of object definition and event classification expressions, addition of more functions for calculating event variables, incorporation of tables for applying efficiencies, adoption of a system for including external counts, and more. Although these modifications broke the strict backward compatibility with the earlier version of the language, we believe they should be considered as improvements, as they will lead to a cleaner, more robust and more widely accepted analysis description language. The improved syntax processing relies on formal lexical and grammar definition tools widely available in all Unix-like operating systems.

One direct result of the syntax modifications originating from community-wide discussions is that, in the presented version, there is more than one way of expressing the same idea in CutLang. We believe this is a desirable property: after all, in human languages (which we try to imitate) as well, the same idea can be expressed in multiple ways. To give an example, rejecting events with a property smaller than a certain threshold amounts to accepting events with that property greater than the same threshold. Such a property should not be considered as a source of potential confusion and error, but as a richness of the language.

CutLang still follows the approach of runtime interpretation. We strongly believe that direct interpretation of the human readable commands and algorithms, although slower in execution as compared to a compiled binary, leads to faster and less error-prone algorithm development. The possible event processing speed issues can be cured by parallel processing of independent events and regions. The interpreted and human readable nature of CutLang and ADL has a potential area of growth and development: with the advance of machine learning hardware and software tools, the dream of being able to perform an LHC-type analysis just by talking to the computer in one's native tongue might not be too far-fetched.

Finally, as with any language, CutLang/ADL grows with the people that use it to solve new problems. With every analysis requiring a new functionality, the list of already-solved problems grows. We hope that such an internal library, together with the script-assisted addition of external user functions, will allow the analysts of the future to spend less time on previously solved problems and to focus their energy on innovating solutions to the analysis problems of the post-LHC era colliders.

References

[1] S. Sekmen and G. Unel, CutLang: A Particle Physics Analysis Description Language and Runtime Interpreter, Comput. Phys. Commun. (2018) 215–236, [arXiv:1801.05727].
[2] G. Unel, S. Sekmen, and A. M. Toon, CutLang: a cut-based HEP analysis description language and runtime interpreter, (2019) [arXiv:1909.10621].
[3] G. Brooijmans et al., Les Houches 2015: Physics at TeV colliders - new physics working group report, (2016) [arXiv:1605.02684].
[4] G. Brooijmans et al., Les Houches 2017: Physics at TeV Colliders New Physics Working Group Report, (2018) [arXiv:1803.10379].
[5] G. Brooijmans et al., Les Houches 2019 Physics at TeV Colliders: New Physics Working Group Report, (2020) [arXiv:2002.12220].
[6] M. Drees, H. K. Dreiner, J. S. Kim, D. Schmeier, and J. Tattersall, CheckMATE: Confronting your favourite new physics model with LHC data, Computer Physics Communications (2015) 227–265.
[7] J. S. Kim, D. Schmeier, J. Tattersall, and K. Rolbiecki, A framework to create customised LHC analyses within CheckMATE, Computer Physics Communications (2015) 535–562.
[8] J. Tattersall, D. Dercks, et al., CheckMATE: Checkmating new physics at the LHC, in Proceedings of the 38th International Conference on High Energy Physics (ICHEP2016), 3-10 August 2016, Chicago (2016) 120.
[9] B. Waugh, H. Jung, et al., HZTool and Rivet: Toolkit and Framework for the Comparison of Simulated Final States and Data at Colliders, (2006) [hep-ph/0605034].
[10] A. Buckley, J. Butterworth, et al., Rivet user manual, Computer Physics Communications (2013) 2803–2819.
[11] A. Barr, C. Lester, and P. Stephens, m(T2): The Truth behind the glamour, J. Phys. G (2003) 2343–2363, [hep-ph/0304226].
[12] R. Brun and F. Rademakers, ROOT - An Object Oriented Data Analysis Framework, Nucl. Inst. and Meth. in Phys. Res. A (1997) 81–86.
[13] "Lex and Yacc page." http://dinosaur.compilertools.net.
[14] "CERN Open Data Portal." http://opendata.cern.ch.
[15] A. Rizzi, The Evolution of Analysis Models for HL-LHC, EPJ Web Conf. (2020) 11001.
[16] J. de Favereau, C. Delaere, et al., DELPHES 3: a modular framework for fast simulation of a generic collider experiment, Journal of High Energy Physics (2014).
[17] C. Rogan, Kinematical variables towards new dynamics at the LHC, arXiv:1006.2727.
[18] "CutLang GitHub repository." https://github.com/unelg/CutLang.
[19] "Docker web page."
[20] "ADL LHC analyses repository." https://github.com/ADL4HEP/ADLLHCanalyses.
[21] S. Kraml, S. Kulkarni, et al., SModelS: a tool for interpreting simplified-model results from the LHC and its application to supersymmetry, Eur. Phys. J. C (2014) 2868, [arXiv:1312.4175].
[22] F. Ambrogi, S. Kraml, et al., SModelS v1.1 user manual: Improving simplified model constraints with efficiency maps, Comput. Phys. Commun. (2018) 72–98, [arXiv:1701.06586].
[23] F. Ambrogi et al., SModelS v1.2: long-lived particles, combination of signal regions, and other novelties, Comput. Phys. Commun. (2020) 106848, [arXiv:1811.10624].
[24] A. Paul, S. Sekmen, and G. Unel, Down type iso-singlet quarks at the HL-LHC and FCC-hh, arXiv:2006.10149.
[25] A. Adiguzel, O. Cakir, et al., Evaluating Analysis Description Language Concept as a First Introduction to Analysis in Particle Physics, arXiv:2008.12034.
[26] "26th Vietnam School of Physics: Particles and Dark Matter, 29 Nov 2020 - 11 Dec 2020, Quy Nhon." https://indico.in2p3.fr/event/19437/overview.
[27] "VSOP hands-on exercises." https://github.com/unelg/CutLang/wiki/VSOP26HandsOnEx.
[28] "PDG Particle Identification Numbers." https://pdg.lbl.gov/2013/pdgid/PDGIdentifiers.html.
[29] Particle Data Group, P. A. Zyla, et al., Review of Particle Physics, Progress of Theoretical and Experimental Physics (2020) 083C01, [https://academic.oup.com/ptep/article-pdf/2020/8/083C01/33653179/ptaa104.pdf].
[30] M. Matsumoto and T. Nishimura, Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Trans. Model. Comput. Simul. (1998) 3–30.

A User Manual

All information about ADL and CutLang, including publications, talks and twikis with syntax rules, can be accessed through the following portal:

https://cern.ch/adl

The code for CutLang is hosted in the GitHub repository

https://github.com/unelg/CutLang

which provides up-to-date instructions on how to install, compile and run CutLang.

A.1 Blocks and keywords

An ADL file consists of blocks based on a keyword value/expression structure. The blocks allow a clear separation of analysis components. A typical block looks as follows:

blockkeyword blockname

Table 5 lists the available blocks, their purposes and associated keywords, and Table 6 lists the keywords. The details on their applications are given in the following sections.

Table 5: Blocks in ADL and CutLang
Block | Purpose | Related keywords
object / obj | Object definition block. Produces an object type from an input object type by applying selections. | take, select, reject
region / algo | Event categorization. | select, reject, weight, bin, sort, counts, histo, save
info | Contains analysis information such as the experiment, center-of-mass energy, luminosity, publication details, etc. |
table | Generic block for tabular information, such as efficiency values versus variable ranges. | tabletype, nvars, errors
countformat | Expresses the processes for which external counts are included and the format of counts. | process

A.2 Predeﬁned physics objects

Basic physics objects and their properties currently available in CutLang are defined in Table 7. The predefined particles are initially sorted by decreasing transverse momentum and their indices start at zero. With the current implementation, all the predefined particle names and commonly used function names have become case-insensitive. For the particle index, both Python-type and LaTeX-type notations are accepted; the former with square brackets, and the latter with an underscore character. An example for electrons is given below:

Ele_0 = ELE_0 = Ele[0] = ele[0] = electron_0 = electron[0] .
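The equivalence of the notations in the example above can be made concrete with a small Python sketch that maps each spelling to the same (name, index) pair. This is only an illustration of the rule, not CutLang's parser: the function normalize, the regular expression and the alias table (which contains only the long name used in the example) are assumptions of this sketch.

```python
import re

# Hypothetical alias table; only the name appearing in the example is listed.
ALIASES = {"electron": "ELE"}

# NAME[i] (Python-type) or NAME_i (LaTeX-type) notation.
_PATTERN = re.compile(r"^([A-Za-z]+)(?:\[(\d+)\]|_(\d+))$")

def normalize(text):
    """Map a particle expression to a canonical (NAME, index) pair."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an indexed particle name: {text!r}")
    base = match.group(1)
    digits = match.group(2) if match.group(2) is not None else match.group(3)
    # Names are case-insensitive: compare in lower case, emit upper case.
    name = ALIASES.get(base.lower(), base.upper())
    return name, int(digits)

spellings = ["Ele_0", "ELE_0", "Ele[0]", "ele[0]", "electron_0", "electron[0]"]
print({normalize(s) for s in spellings})
```

All six spellings collapse to a single canonical pair, which is what makes the notations interchangeable within an ADL file.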

Sometimes it is necessary to refer to the whole object set or just to some of its members. The CutLang notation for these cases is to write the name of the set without any indices for the former (i.e. ELE) and to use the colon notation for the latter (i.e. ELE[0:2] = ELE_0:2).

In CutLang, there are two object types that merit special attention: the lepton and the neutrino types. The LEP keyword refers to a generic lepton, and at runtime it is reduced to an electron or to a muon depending on the choice as explained in Table 1. This helps the physicist avoid writing two algorithm sections, one for electron-based and the other for muon-based analyses. The second object type is related to the taming of the neutrino escaping from the detector. At LHC energies and beyond, for which CutLang is intended, the W bosons are generally produced with a sufficient boost such that, in the leptonic decays, the pseudorapidity of the charged lepton is not very different from that of the chargeless one. Therefore this particular physics object benefits from this approximation to define a massless and chargeless particle with transverse momentum and azimuthal angle (φ) values extracted from the missing transverse energy (MET) measurements. The pseudorapidity, however, is taken to be equal to that of the charged lepton with the same particle index.

Table 6: Keywords in ADL and CutLang
Keyword | Purpose | Related block
define | Define variables, constants | -
select | Select objects or events based on criteria that follow the keyword. | object, region
reject | Reject objects or events based on criteria that follow the keyword. | object, region
take / using / : | Define the mother object type | object
sort | Sort an object in an ascending or descending order with respect to a property. | region
weight | Weight events | region
histo | Fill histograms | region
process | Specify process and the format for which external counts are given | countformat
counts | Give external counts | region
tabletype | Specifies type of the table | table
nvars | Number of variables in a table | table
errors | Type of errors indicated in a table | table
title, experiment, id, publication, sqrtS, lumi, arXiv, hepdata, doi | Provide information about the analysis (see Table 16) | info

Table 7: Basic physics object nomenclature in CutLang
Name | Keyword | First object | Second object | (j+1)th object
Electron | ELE | ELE[0], ELE_0 | ELE[1], ELE_1 | ELE_j
Muon | MUO | MUO[0], MUO_0 | MUO[1], MUO_1 | MUO_j
Tau | TAU | TAU[0], TAU_0 | TAU[1], TAU_1 | TAU_j
Lepton | LEP | LEP[0], LEP_0 | LEP[1], LEP_1 | LEP_j
Photon | PHO | PHO[0], PHO_0 | PHO[1], PHO_1 | PHO_j
Jet | JET | JET[0], JET_0 | JET[1], JET_1 | JET_j
Fat Jet | FJET | FJET[0], FJET_0 | FJET[1], FJET_1 | FJET_j
b-tagged Jet | BJET | BJET[0], BJET_0 | BJET[1], BJET_1 | BJET_j
light Jet | QGJET | QGJET[0], QGJET_0 | QGJET[1], QGJET_1 | QGJET_j
Neutrino | NUMET | NUMET[0], NUMET_0 | NUMET[1], NUMET_1 | NUMET_j
MET | METLV | METLV[0], METLV_0 | - | -
generator particle | GEN | GEN[0], GEN_0 | GEN[1], GEN_1 | GEN_j
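The construction of the neutrino object (NUMET) described in this section, i.e. a massless particle taking its transverse momentum and φ from the MET and its pseudorapidity from the charged lepton, can be written out as a short Python sketch. The function name and the tuple layout (E, px, py, pz) are assumptions of this sketch, not CutLang internals.

```python
import math

def neutrino_from_met(met_pt, met_phi, lepton_eta):
    """Build a massless four-vector (E, px, py, pz).

    pT and phi come from the missing transverse energy, eta is
    borrowed from the charged lepton with the same index.
    """
    px = met_pt * math.cos(met_phi)
    py = met_pt * math.sin(met_phi)
    pz = met_pt * math.sinh(lepton_eta)
    energy = met_pt * math.cosh(lepton_eta)  # massless: E = |p|
    return energy, px, py, pz

def mass_squared(four_vector):
    """Invariant mass squared, E^2 - |p|^2."""
    energy, px, py, pz = four_vector
    return energy**2 - (px**2 + py**2 + pz**2)

nu = neutrino_from_met(met_pt=40.0, met_phi=1.2, lepton_eta=0.7)
print(nu, mass_squared(nu))
```

Because cosh²η − sinh²η = 1, the invariant mass vanishes up to rounding, as required for the massless approximation.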

A.3 Predeﬁned functions

Functions in CutLang can be used for accessing object attributes, or for computing new variables from objector event quantities. Functions for accessing object attributes can be directly related to Lorentz vectors suchas mass, momentum, rapidity etc, or be related to other variables found in some commonly used ntuples. Inboth cases, both the function syntax with parentheses and the attribute syntax with curly braces can be used.Functions used for computing new quantities can use object attributes or other already calculated quantities orconstants. The currently available object attribute functions in CutLang are listed in Table 8. Note that someof the attributes listed here are only valid for certain input types, e.g. for CMS NanoAOD, but not for others,e.g. for Delphes. The functions used for computing new quantities are listed in Table 9.One should note that in CutLang adding particles could be achieved by either writing these one after theother separated by space(s), or by using a + sign. Both notations are equally valid. Additionally, one shoulduse a comma as the separation character for the functions requiring multiple arguments.16he internal functions, such as angular distance or transverse momentum are also case-insensitive in CutLang, though they are written in this manuscript with a certain syntax (ﬁrst letter upper case) for clarity in reading.The functions requiring multiple arguments should use comma character for argument separation. Externalfunctions can also be downloaded and added to CutLang library. The instructions for this operation is describedin appendix B. Table 8: Functions and syntax for object attributes in CutLang .

Meaning                                   Syntax 1            Syntax 2
------------------------------------------------------------------------------
Lorentz vector-related attributes
Mass of                                   m( )                { }m
Charge of                                 q( )                { }q
Phi of                                    Phi( )              { }Phi
Eta of                                    Eta( )              { }Eta
Absolute value of Eta of                  AbsEta( )           { }AbsEta
Rapidity of                               Rep( )              { }Rep
Pt of                                     Pt( )               { }Pt
Pz of                                     Pz( )               { }Pz
Energy of                                 E( )                { }E
Momentum of                               P( )                { }P

Other attributes
PDGID of a particle                       PDGID( )            { }PDGID
Charge of a particle                      btagDeepB( )        { }btagDeepB
is the jet b tagged?                      bTag( )             { }bTag
Soft Drop mass of a jet                   msoftdrop( )        { }msoftdrop
N-subjettiness variable 1                 tau1( )             { }tau1
N-subjettiness variable 2                 tau2( )             { }tau2
N-subjettiness variable 3                 tau3( )             { }tau3
Leptonic diTau invariant mass             fMTauTau( )         { }fMTauTau
transverse impact parameter               dxy( )              { }dxy
longitudinal impact parameter             dz( )               { }dz
lepton identification variable            softId( )           { }softId
relative isolation for leptons            miniPFRelIsoAll( )  { }miniPFRelIsoAll
MVA based tau ID                          dMVAnewDM2017v2( )  { }dMVAnewDM2017v2
σ_iηiη for photons                        sieie( )            { }sieie
isolation variable                        reliso( )           { }reliso
isolation variable                        relisoall( )        { }relisoall
isolation variable                        pfreliso03all( )    { }pfreliso03all
Tau decay mode id                         iddecaymode( )      { }iddecaymode
Tight ID and isolation flag               idisotight( )       { }idisotight
Tight anti ele ID for taus                idantieletight( )   { }idantieletight
Tight anti mu ID for taus                 idantimutight( )    { }idantimutight
Tight ID for muons                        tightid( )          { }tightid
PU ID for jets                            puid( )             { }puid
Index of matched genparticle to a lepton  genpartidx( )       { }genpartidx
Tau decay mode                            decaymode( )        { }decaymode
Tau isolation                             tauiso( )           { }tauiso
Muon soft ID                              softId( )           { }softId

A.3.1 PDGID of particles

Each type of particle recognized in particle physics is assigned a unique code by the Particle Data Group (PDG) in order to facilitate the interface between event generators, detector simulators, and analysis packages. These codes are known as PDGID (or PDG ID), and this method is called the MC particle numbering scheme [28]. The numbering includes elementary particles such as electrons, neutrinos, Z bosons etc., composite particles (mesons, baryons etc.) and atomic nuclei. Hypothetical particles beyond the Standard Model also have PDGIDs. Particles have a positive PDGID whereas antiparticles have a negative one. The list of PDGIDs of some particles is given in Table 10.

Table 9: Functions and syntax for computing new quantities in CutLang.

Meaning                                   Syntax 1            Syntax 2
------------------------------------------------------------------------------
Angular distance between                  dR( )               { }dR
Phi difference between                    dPhi( )             { }dPhi
Eta difference between                    dEta( )             { }dEta
Missing transverse energy in the event    MET                 -
sum of jet transverse momenta             HT( )               -
partitioning objects into 2 megajets      fmegajets( )        { }fmegajets
Razor variable MR                         fMR( )              { }fMR
Razor variable MTR                        fMTR( )             { }fMTR
partitioning objects into 2 hemispheres   fhemisphere( )      { }fhemisphere
transverse mass MT2                       fMT2( )             { }fMT2

Table 10: PDGID of some elementary particles [29]

Quarks      Leptons     Bosons
------------------------------
d   1       e−  11      γ   22
u   2       µ−  13      Z   23
s   3       τ−  15      W+  24

As an example:

    select PDGID( LEP[0] ) == -11

This command selects positrons. (The positron is the antiparticle of the electron, therefore it has a negative PDGID.)

A.4 Mathematical operators and functions

Mathematical functions available in CutLang are listed in Table 11. Trigonometric and logarithmic functions are implemented with their usual meanings. The Heaviside step function, or unit step function, hstep, which was also added recently, is a discontinuous function named after Oliver Heaviside, whose value is zero for negative arguments and one for positive arguments. The reducer functions for minimization and maximization, min and max, which were added recently, are discussed in Appendix A.9.5. The reducer function size / count returns the number of elements of a given set, such as the number of electrons.

Table 11: Mathematical and logical operators

Meaning                  Operator                  Meaning                  Operator
-------------------------------------------------------------------------------------
number of                Size( ) Count() NumOf()   absolute value           abs()
tangent                  tan()                     hyperbolic tangent       tanh()
sine                     sin()                     hyperbolic sine          sinh()
cosine                   cos()                     hyperbolic cosine        cosh()
natural exponential      exp()                     natural logarithm        log()
square root              sqrt()                    Heaviside step function  hstep()
as close as possible     ~=                        usual meaning            + - / *
as far away as possible  ~!                        to the power             ^

A.5 Comparison, range and logical operators

CutLang understands the basic mathematical comparison expressions and logical operations. C/C++ operator notations and their Fortran counterparts are recognized and correctly interpreted. Additionally, square brackets are used to define inclusive or exclusive ranges. The available comparison, range and logical operators can be found in Table 12.

Table 12: Comparison, range and logical operators in CutLang

Keywords                Explanation
-------------------------------------------
> >= == <= <            usual meaning
GT GE EQ LE LT          usual meaning
!= NE                   not equal
[ ]                     in the interval
] [                     not in the interval
NOT                     logical not
AND and &&              logical and
OR or ||                logical or
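The interval operators of Table 12 can be pictured with a small Python sketch. This is only an illustration of the intended semantics, not CutLang code: the helper names in_interval and not_in_interval are hypothetical, and treating the bounds of [ ] as inclusive is an assumption, since the text does not spell out the edge behaviour.

```python
def in_interval(value, low, high):
    """Mimic `x [] low high`: true when low <= x <= high (inclusive bounds assumed)."""
    return low <= value <= high


def not_in_interval(value, low, high):
    """Mimic `x ][ low high`: the logical negation of in_interval."""
    return not in_interval(value, low, high)


# Dilepton invariant masses in GeV (made-up numbers, for illustration only).
masses = [45.0, 85.2, 91.0, 120.5]
inside = [m for m in masses if in_interval(m, 80, 100)]
outside = [m for m in masses if not_in_interval(m, 80, 100)]
print(inside)
print(outside)
```

With these definitions every value falls in exactly one of the two lists, which is the property the ] [ operator relies on.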

A.5.1 Logical operations

The use of Boolean operators (AND, OR, NOT) makes it easy to write event selection criteria. In CutLang, the logical AND and logical OR operators had already been used to combine multiple event selection criteria. The newly implemented logical NOT greatly simplifies the way selection criteria are written in the analysis code. The simplest example illustrating the syntax is:

    select NOT Size(ELE) > 4

This command selects events which do NOT have more than 4 electrons. However, the advantage of the NOT operator becomes more apparent when negating more complex selections. Event selection criteria can be combined using the logical AND, OR and NOT. For example:

    select (NOT condition1) AND (condition2 OR condition3)

Now let us look at another example:

    select Size(ELE) == 2
    select NOT ( {ELE[0] ELE[1]}q == 0 AND {ELE[0] ELE[1]}m [] 80 100 )

The criteria ( {ELE[0] ELE[1]}q == 0 AND {ELE[0] ELE[1]}m [] 80 100 ) can be used for defining Z bosons. As we have set NOT, we veto events with a Z boson while looking for other dilepton signatures. Without the NOT command, this selection would not be so straightforward, and would require a more complicated expression.
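To make the last remark concrete, the Python sketch below writes the Z veto twice: once with a negation, and once expanded by hand using De Morgan's law. The function names are hypothetical and the inclusive mass window is an assumption; the point is only that the hand-expanded form is longer and easier to get wrong.

```python
def is_z_candidate(charge_sum, mass, low=80.0, high=100.0):
    """Opposite-charge pair with invariant mass inside the window (inclusive, assumed)."""
    return charge_sum == 0 and low <= mass <= high


def passes_veto_with_not(charge_sum, mass):
    """Counterpart of `select NOT ( ... )`: simply negate the Z definition."""
    return not is_z_candidate(charge_sum, mass)


def passes_veto_expanded(charge_sum, mass, low=80.0, high=100.0):
    """Same veto without a negation: each sub-condition must be inverted by hand."""
    return charge_sum != 0 or mass < low or mass > high


# Made-up (charge sum, mass) pairs covering every branch of the logic.
samples = [(0, 91.0), (0, 60.0), (0, 130.0), (2, 91.0), (-2, 60.0)]
for charge_sum, mass in samples:
    print(charge_sum, mass, passes_veto_with_not(charge_sum, mass))
```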

A.5.2 Ternary operator

Conditional selection criteria, including nested statements, can be applied using a syntax similar to that of C++:

    condition ? true-case : false-case

The following example illustrates a use case: if the number of muonsVeto particles equals 1, then the MTm quantity should be less than 100; otherwise, the MTe quantity should be less than 100:

    Size(muonsVeto) == 1 ? MTm < 100 : MTe < 100
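For readers more familiar with Python, the same decision can be sketched with a conditional expression. The function and argument names are hypothetical stand-ins for Size(muonsVeto), MTm and MTe; this is an illustration of the logic, not of the interpreter.

```python
def passes_mt_selection(n_veto_muons, mt_muon, mt_electron, limit=100.0):
    """Apply the muon MT cut when exactly one veto muon is present, else the electron MT cut."""
    return mt_muon < limit if n_veto_muons == 1 else mt_electron < limit


print(passes_mt_selection(1, 80.0, 150.0))
print(passes_mt_selection(0, 80.0, 150.0))
```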

A.6 χ² minimization

In an analysis with a multitude of objects of the same type, the analyst may need to search for the best combination defined by some criterion. A typical example, used in fully hadronic tt̄ reconstruction, would be to find the jet combination that yields the best W boson mass, or to find the two charged leptons that result in the best Z boson mass. The need for such a search can be expressed in CutLang using two special comparison operators: ~= and ~! . The former means "as close as possible to", whereas the latter means "as far as possible from". These two operators can be used to express χ² minimization kinds of operations. The indices of the particles in such a search are to be given as negative. For example, the statement "find two leptons with a combined invariant mass as close as possible to 90.1 GeV" can be expressed in CutLang notation as {LEP_-1 LEP_-1}m ~= 90.1 . In this case, CutLang finds the best pair of particles satisfying the condition, and stores it per event for possible later use. However, the analyzer should not use negative indices directly inside the region block. It is a much better practice, which improves readability, to define a new object such as define ZLepRec = LEP[-1] LEP[-1] . This definition can be used when defining histograms or other selection criteria, such as when selecting the charge of the found lepton pair. If another particle of the same type (e.g. another lepton) is to be found, it is necessary to use a different but still negative index value.

A.7 Definitions

ADL and CutLang allow assigning alias names to constants (e.g. Z boson mass) or variables (e.g. angular variables between objects, mass of the Z boson reconstructed from two leptons, etc.). The syntax and examples are given in Table 13.
Note that the keyword define can also be shortened as def.

Table 13: Simple definitions

Keyword   Argument 1  Symbol    Argument 2    Example
------------------------------------------------------------------------
define    name        : or =    value         define mZprime = 500
define    name        : or =    function      define mTop1 : m(Top1)
define    name        : or =    particle(s)   define Zreco : ELE[0] ELE[1]

A.8 Tables

The present version of CutLang incorporates tables to implement various HEP-related quantities, such as efficiencies, acceptances or trigger turn-on curves. Currently only one- and two-dimensional tables can be used. These tables should have a name and a table type, specified by the tabletype keyword, where the latter defines what information is hosted by the table. Currently, only efficiency tables are recognized; therefore the table type information only serves as documentation and is not used by the interpreter. However, as other uses for tables are developed, the table type will become more relevant. Tables must also specify the number of variables (1 or 2) using the nvars keyword, as well as the availability of errors on the central value (true or false) using the errors keyword. These should be followed by the table data, using the notation

    value [lower-error upper-error] lower-limit1 upper-limit1 [lower-limit2 upper-limit2]

Once defined in the definitions section, the table can be referred to and used in object and event selection. An example table is shown below:

    table tightmuoneff
    tabletype efficiency
    nvars 2
    errors true
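The way such a two-variable table could be consulted is sketched below in Python. This is an illustration under stated assumptions, not the interpreter's implementation: the row layout follows the notation above with errors present, all numerical values are made up, the lookup function is hypothetical, and treating each range as lower-inclusive and upper-exclusive is an assumption.

```python
# Each row: (value, lower_error, upper_error, low1, high1, low2, high2).
# The numbers are invented for illustration; var1 could be pT and var2 |eta|.
ROWS = [
    (0.90, 0.02, 0.02, 10.0, 20.0, 0.0, 1.2),
    (0.95, 0.01, 0.01, 20.0, 50.0, 0.0, 1.2),
    (0.85, 0.03, 0.03, 10.0, 20.0, 1.2, 2.4),
    (0.92, 0.02, 0.02, 20.0, 50.0, 1.2, 2.4),
]


def lookup(rows, var1, var2, default=0.0):
    """Return (value, lower_error, upper_error) of the first row whose ranges contain the point."""
    for value, err_low, err_up, low1, high1, low2, high2 in rows:
        if low1 <= var1 < high1 and low2 <= var2 < high2:
            return value, err_low, err_up
    # Outside every range: fall back to a default value with no errors (assumed behaviour).
    return default, 0.0, 0.0


print(lookup(ROWS, 25.0, 0.5))
print(lookup(ROWS, 5.0, 0.5))
```

A linear scan is enough for the handful of rows a typical efficiency table contains; a real implementation may well organize the data differently.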

A.9 Manipulating objects

A.9.1 Defining new objects

New objects can be declared using a simple syntax:

    object new_object_name : base_object_name

where the object keyword can also be shortened as obj, and instead of the symbol : , the keywords using and take can be used. The base object name can be a base object class, or a previously defined new object type, as in the case of defining b-tagged jets from already defined high transverse momentum jets. These are usually called derived objects. An example defining a derived new electron type based on predefined electrons would be written as:

    obj goodEle : ELE

Both : and = can be used interchangeably. A derived object with selection criteria is written as:

    object AK4jets
      take JET
      select {JET_}Pt > 30
      select {JET_}AbsEta < 2.4

It is also possible to create a new object by forming a group out of multiple base or derived objects, for example, to create a lepton object from electrons and muons. This is achieved using the Union function, as shown below. This particular case of new object creation does not use any selection.

    object leps : Union( MUO , ELE, TAU)

A.9.2 Sorting objects

By default, objects are sorted according to their transverse momentum, pT, in descending order. For example, ELE[0] denotes the electron having the highest transverse momentum. In some cases, objects may need to be sorted according to some other property, such as energy, pseudorapidity etc. In the current version, this can be done as:

    sort {ELE_}E ascend

This command sorts electrons according to their energy in ascending order, i.e. ELE[0] will have the least energy. Sorting can also be done in descending order by using descend.

A.9.3 Object combinatorics

Let us assume that we have an event with 5 jets, and we would like to reconstruct the hadronic Z bosons. What are the combinations? Numbering the jets from 1 to 5, some possibilities are given in Table 14, in the left columns. It is obvious that not all possibilities are listed, and finally only one possibility can be true: after all, a jet cannot be used to reconstruct two different Z bosons. On top of this, other requirements might be applied to further restrict the possible Z candidates. For example, there might be a pseudorapidity range limit on each candidate, the transverse momentum of the jets forming the Z boson could be limited, the angular separation between the hadronic Z candidate and the first constituent jet might be limited, and finally, the invariant mass of the Z candidate might be requested to be in a certain range. After all these restrictions, the same initial set might be reduced to the combinations listed in the right side columns of Table 14, where the candidates that did not pass the requirements are shown as struck out.

Table 14: Combining two jets to reconstruct a hadronic Z boson

Possibility ID   Zhadronic   Zhadronic      Possibility ID   Zhadronic   Zhadronic
-----------------------------------------------------------------------------------
1                12          34             1                12          34
2                12          35             2                12          35
3                12          45             3                12          45
4                13          24             4                13          24
5                13          25             5                13          25
6                13          45             6                13          45
...              ...         ...            ...              ...         ...

This combination example can be written in CutLang as:

    object hZs : COMB( jets[-1] jets[-2] ) alias ahz

In order to activate this new object and eliminate the combinations that do not satisfy the requirements, one has to put a selection command into the running algorithm (or region); this could be, for example, to have at least two hadronic Z candidates per event:

    algo testCombinations
      select Size(jets) >= 2
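The size of the combinatorial problem can be checked with a short Python sketch that enumerates all two-jet candidates and all ways of choosing two candidates that share no jet. It only illustrates the counting behind Table 14; the function names are hypothetical and nothing here reflects how COMB is implemented internally.

```python
from itertools import combinations


def pair_candidates(n_jets):
    """All unordered pairs of jet numbers (1-based), i.e. every possible Z candidate."""
    return list(combinations(range(1, n_jets + 1), 2))


def disjoint_pairings(n_jets):
    """All ways to pick two Z candidates that do not share a jet."""
    pairs = pair_candidates(n_jets)
    return [(a, b) for a, b in combinations(pairs, 2) if not set(a) & set(b)]


pairings = disjoint_pairings(5)
print(len(pair_candidates(5)), len(pairings))
for first, second in pairings[:6]:
    print(first, second)
```

The first six pairings produced this way correspond to rows 1 to 6 of Table 14, read as (jet, jet) tuples.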

As indicated by the right side of Table 14, there are still multiple possibilities, such as rows 2, 4 and 5. To further reduce these by removing the overlapping candidates and leaving a single valid one, some sort of ideal condition should be specified. This can be achieved using the previously discussed χ² minimization. As an example case, let us require the masses of both candidates to be as close as possible to the known Z mass. Now, the final algorithm is given as:

    object hZs : COMB( jets[-1] jets[-2] ) alias ahz

A.9.4 Looping over a subset of the object collection

By default, CutLang loops over all objects in a given collection. However, sometimes it is necessary to loop only over a subset, such as looping only through the first 3 jets. ADL and CutLang allow specifying the subset, e.g. as jets[0:3].

A.9.5 Minimum and maximum of object attributes

Looping over objects can be used for selecting the minimum or maximum of a function based on any object attribute. An explanatory example could be to apply a selection based on the minimum value of the angular separation between each of the three most energetic jets and the most energetic electron. In CutLang, this criterion can be expressed as:

    select Min( dR(JET[0:2], ELE[0]) ) > 0.9
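What such a reducer computes can be sketched in Python as follows. The sketch assumes the usual definition of the angular distance, ΔR = sqrt(Δη² + Δφ²) with Δφ wrapped into (-π, π], and assumes that the jets are already ordered so that the first three entries are the ones of interest; the dictionary layout and all numbers are made up for illustration.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(obj1, obj2):
    """Angular distance sqrt(d_eta^2 + d_phi^2) between two objects."""
    return math.hypot(obj1["eta"] - obj2["eta"], delta_phi(obj1["phi"], obj2["phi"]))


def min_delta_r(jets, electron, n_leading=3):
    """Minimum angular distance between the electron and the first n_leading jets."""
    return min(delta_r(jet, electron) for jet in jets[:n_leading])


jets = [
    {"eta": 0.0, "phi": 0.0},
    {"eta": 1.0, "phi": 3.0},
    {"eta": 2.0, "phi": -3.0},
    {"eta": 0.0, "phi": 0.9},  # fourth jet: close to the electron but not considered
]
electron = {"eta": 0.0, "phi": 1.0}
print(min_delta_r(jets, electron) > 0.9)
```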

A.9.6 Summing object attributes

CutLang allows looping over an attribute to calculate the sum of its values. A typical example would be the sum of transverse momenta of a set of jets. Although this frequently used function is predefined and available as HT, it could also be written as:

    select Sum( pT(JET) ) >= 20

A.9.7 Object constituents

Sometimes, the analysis might necessitate a selection based on jet constituents. CutLang allows the modifier word constituents only in the case of jets (or any other jet-like objects, such as the large-radius FatJets) to refer to these. An example for defining a new jet object based on criteria on the constituents would be:

    object goodJet using JET
      select q(JET constituents) == 0

Here the first criterion removes all the charged constituents of each jet, and eventually the jet itself if it has no more constituents left, whereas the second criterion imposes an upper limit of 40 GeV on the sum of the transverse momenta of the remaining constituents of each jet. All other functions available in CutLang would work in the same way.

A.9.8 Daughter particles

While defining a new particle based on MC truth information, it is sometimes necessary to access the daughters of a given particle. CutLang is capable of accessing the daughters of an MC truth particle. In the following example, the first selection criterion filters the particles that decayed into two or more daughters, while the second criterion is used to select only the daughters with electric charge.

    object DVcandidates take GEN
      select daughters( GEN ) > 1

A.9.9 Hit and miss method

The ApplyHM function can be used to define new objects which pass or fail the efficiency test in that particular region of the parameter space. In CutLang, the random number generation is achieved via the TRandom3 [30] function in the ROOT libraries. The time cost of a call to this function is reported to be about 5 ns on an Intel i7 CPU running at 2.6 GHz.

An example for electrons recorded by an imaginary detector whose electron detection efficiency is described by a table called myDet can be written as:

    object myElectron
      take ELE
      select applyHM( myDet({ELE}pT , {ELE}Eta) == 1)

The analysis algorithm can make use of this newly defined object, myElectron, to apply selection criteria, such as the available number of electrons per event etc.
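The hit-and-miss idea itself fits in a few lines of Python. In the sketch below, Python's random module stands in for TRandom3, my_det is a hypothetical efficiency map with invented values replacing the myDet table, and apply_hm is an assumed name for the test; it is meant as an illustration of the technique rather than a description of the CutLang internals.

```python
import random


def apply_hm(efficiency, rng):
    """Hit-and-miss test: return 1 (pass) with probability `efficiency`, else 0 (fail)."""
    return 1 if rng.random() < efficiency else 0


def my_det(pt, eta):
    """Hypothetical efficiency map standing in for the myDet table (made-up values)."""
    if pt < 10.0 or abs(eta) > 2.5:
        return 0.0
    return 0.90 if pt < 50.0 else 0.95


rng = random.Random(2021)  # fixed seed so that the sketch is reproducible
electrons = [
    {"pt": 5.0, "eta": 0.1},
    {"pt": 30.0, "eta": -1.2},
    {"pt": 80.0, "eta": 2.0},
    {"pt": 60.0, "eta": 3.0},
]
# Keep only the electrons that pass the random efficiency test.
my_electrons = [e for e in electrons if apply_hm(my_det(e["pt"], e["eta"]), rng) == 1]
print(len(my_electrons))
```

Over many events, the fraction of kept objects in a given region of parameter space approaches the tabulated efficiency, which is what makes the method usable for emulating detector effects.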

A.10 Manipulating Events

A.10.1 Selecting or rejecting events

The conditions based on which an event can be selected or rejected are written in the region / algo blocks. They start with the select or reject keywords, and are expressed in the form of functions applied on particles, complemented by a comparison operator and a limit value. An example for select would be

    select Size(goodEle) >= 2

The synonyms cut and cmd can be used interchangeably in place of the select keyword. The keyword reject is equivalent to select not, thus rejecting the events that match the given criteria, as in the example below:

    reject {ELE[0] ELE[1]}q == 0 AND {ELE[0] ELE[1]}m [] 80 100

There are also some special keywords that require further discussion. These are shown in Table 15. Select ALL accepts all events; for example, it can be used for event counting purposes. The next two are scale factors mostly used in ATLAS-related analyses. For other input file formats these scale factors are automatically set to unity.

Table 15: Special conditions in CutLang

Keyword   Example       Explanation
----------------------------------------------------------------------
ALL       select ALL    accept all events
LEPsf     cmd LEPsf     apply leptonic scale factor to MC events
bTagSF    cmd bTagSF    apply b-jet tagging scale factor to MC events

A.10.2 Weighting events

Many analyses require events to be weighted for cross section and luminosity, for trigger efficiencies, or with various scale factors. CutLang has a mechanism for applying constant event weights or event weights from functions, for which examples are shown below:

    weight trigEff 0.95
    weight ef2Weight myWeight({ELE_0}pT, {ELE_0}Eta)

The first command sets the weight of the selected events to 0.95, i.e., if the number of selected events is 1000 in the beginning, it will now be counted as 950. The second command is a slightly more complicated example, as it uses a table which defines the event weight according to two parameters: pT and η. The event weight is thus obtained from that table according to the attributes of the electron with the highest transverse momentum.

A.10.3 Saving events

In CutLang, it is possible to save the currently surviving events at any stage of the running algorithm. The events are saved into a ROOT [12] file using the command save followed by the user-given file name without the .root extension, which is added automatically. It is possible to save multiple times in a single algorithm (region) or in multiple algorithms. The events in the output file are saved in the native format of CutLang, known as the lvl0 file. An example could be:

    save preselects

A.11 Bins, counts and histograms

A.11.1 Bins

In analyses dealing with multiple bins for signal and/or background regions, CutLang provides a simple way for defining the selections for those bins. The binning of the results should happen as the very last stage of a selection, using the keyword bin. Either the variable and the bin boundaries are explicitly listed, or multiple bins are assigned to any variable or function using CutLang syntax. These two methods are not mutually exclusive and can define overlapping regions. Note that the former defines two implicit bins: anything below the first value and anything above the last value are also recorded separately. Results from binning are both printed (depending on the switches in the initialization section of the ADL file) and recorded as a histogram in the output ROOT file. The example below illustrates the use of the bin definition in an analysis algorithm:

    bin MET 250 300 500 750 1000
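The role of the two implicit bins can be seen in the following Python sketch, which assigns made-up MET values to the bins defined by the boundaries above. The function name is hypothetical, and counting a value that sits exactly on a boundary in the upper bin is an assumption about the edge convention.

```python
from bisect import bisect_right


def bin_index(value, boundaries):
    """Index 0 is the implicit bin below the first boundary; len(boundaries) is the
    implicit bin at or above the last one. A value on a boundary goes to the upper bin."""
    return bisect_right(boundaries, value)


MET_BOUNDARIES = [250, 300, 500, 750, 1000]
counts = [0] * (len(MET_BOUNDARIES) + 1)
for met in [120.0, 260.0, 300.0, 499.9, 800.0, 1500.0]:  # invented MET values in GeV
    counts[bin_index(met, MET_BOUNDARIES)] += 1
print(counts)
```

Five boundaries therefore give six bins in total: four explicit ones and the two implicit ones at the ends.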

A.11.2 Counts

It is possible to register various signal, background or data counts of a region together with their associated errors. The method to achieve this task is to start the ADL file with the definitions of various count formats. Below are two such examples where, for each format type, multiple processes with different names can also be defined.

    countsformat results
      process est, "Total estimated BG", stat, syst
      process obs, "Observed data"
    countsformat bgests
      process lostlep, "Lost lepton background", stat, syst
      process zinv, "Z --> vv background", stat, syst
      process qcd, "QCD background", stat, syst

A study described in an ADL file might use data counts or a background estimate or all of these for a statistical analysis. Therefore, the appropriate region has to contain the associated event counts and error information using the correct syntax. It should be consistent with the previous definitions starting with the keyword counts. Here the counts of each process should be separated by a comma, and the errors can be specified either as symmetrical, denoted with the +- sign, or asymmetrical, denoted with separate + and - signs. An example conforming to the above definitions is given below.

    counts results 230.0 + 16.0 - 10.0 + 10.0 - 12.0 , 224.0
    counts bgests 105.0 +16.0 - 10.0 +-1.0 , 123.0 +-2.0 +-12.0 , 2.3 +-0.5 +-1.4

Once the analysis run is complete, the user finds in the output file a histogram for each of the defined processes with the name defined in the format commands. These histograms can be recalled and used later during the statistical analysis stage.

Figure 3: An example output from ROOT's TBrowser GUI showing histograms booked and filled by CutLang.

A.11.3 Histograms

CutLang allows defining 1D and 2D histograms for any event variable. The syntax for defining histograms closely follows the notation in ROOT. Any histogram should have a name, like h1mReco, and a list of parameters separated by commas: the explanation of the histogram contents given in quotation marks, e.g. "Z candidate mass (GeV)"; the number of bins and the lower and upper limits given as numbers; and finally the quantity to histogram in ADL notation, e.g. {ELE_0 ELE_1}m. A similar syntax is also used for 2D histograms. The example below shows definitions of 1D and 2D histograms:

    region Wtopmass
      select ...
      select ...
      hmW, "W mass (GeV)", 70, 50, 150, mW
      hmTop, "Top mass (GeV)", 70, 0, 700, mTop
      hmTopmW, "Top and W mass correlation (GeV)", 50, 50, 150, 70, 0, 700, mW, mTop

Apart from the user-defined histograms, CutLang by default automatically fills and saves a cutflow efficiency histogram for each analysis region. If binning exists, CutLang also saves a histogram with bin counts. Figure 3 shows a snapshot of the ROOT TBrowser, with the histograms in an output file listed and one of the histograms displayed.

A.12 Structure of a complete ADL file

To be run with CutLang, an ADL file should follow a definite structure, as described in Section 4.1. In this structure, most sections are optional and one is compulsory. The order of the sections is: initialization, count format, definitions, new objects, more definitions using new objects, yet newer objects, and event categorization commands. In this list, only the event categorization commands are mandatory. The ADL file structure allows multiple concurrent commands to be executed. The details of the first and the last sections are covered next.

A.12.1 Initialization and information section

Some of the possible settings in the initialization section have already been discussed in Table 1. It is also possible to include, in this section, some information describing the work that is being done. The allowed keywords and their meanings are explained in the table below.

Table 16: Information keywords of CutLang

Keyword       Type      Explanation
----------------------------------------------------------------------
info          ID        a name defining the work
experiment    ID        a name defining the experiment
id            string    any string defining the work
title         string    any string for the paper title
publication   string    any string, the publication information
sqrtS         number    a real number, the collider energy (GeV)
lumi          number    a real number, collected data (fb-1)
arXiv         string    any string containing the arxiv information
hepdata       string    any string containing the hepdata information
doi           string    any string containing the doi information

A.12.2 Regions and algorithms

CutLang can execute multiple sets of commands in the event categorization section of the ADL file, meaning that the analyst can test multiple methods on the same events independently of each other during design, or work with multiple signal and control regions. The set of commands to be executed for each independent method is called either an algorithm or a region; therefore the keyword to be used is algo, algorithm or region, followed by a user-selected name, such as:

    region preselection

Moreover, it is possible to define one (1) layer of dependency such that a region can be marked as dependent on another. In this case, the independent region's commands are executed first and the results are saved in a memory cache; later, the dependent region's commands are executed based on that cache. A typical case would be to create multiple signal regions based on a common preselection, as illustrated below. Note that the name of the independent region is used directly in the dependent region's list of commands, without any preceding keywords.

    region preselection
      select ....
    region SRA
      preselection
      select ....
    region SRB
      preselection
      select ....
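The caching behaviour described above can be mimicked with a short Python sketch. It is a simplified model under stated assumptions, not the CutLang implementation: events are plain dictionaries with invented contents, each cut is a Python function, run_regions is a hypothetical name, and the sketch does not enforce the single-layer dependency limit.

```python
def run_regions(events, regions):
    """Evaluate regions with a cache so that a parent region is computed only once.

    `regions` maps a region name to (parent_name_or_None, list_of_cuts), where each
    cut is a function taking an event and returning True or False. The result maps
    each region name to the indices of the events passing it.
    """
    cache = {}

    def passing(name):
        if name not in cache:
            parent, cuts = regions[name]
            # Start from the cached result of the parent, or from all events.
            base = passing(parent) if parent is not None else range(len(events))
            cache[name] = [i for i in base if all(cut(events[i]) for cut in cuts)]
        return cache[name]

    return {name: passing(name) for name in regions}


# Made-up events and cuts, purely for illustration.
events = [
    {"njets": 3, "met": 250.0},
    {"njets": 1, "met": 400.0},
    {"njets": 2, "met": 120.0},
    {"njets": 4, "met": 90.0},
]
regions = {
    "preselection": (None, [lambda ev: ev["njets"] >= 2]),
    "SRA": ("preselection", [lambda ev: ev["met"] > 200.0]),
    "SRB": ("preselection", [lambda ev: ev["met"] <= 200.0]),
}
results = run_regions(events, regions)
print(results)
```

Both signal regions reuse the stored preselection result, so the preselection cuts are evaluated once per event regardless of how many dependent regions exist.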

B The CutLang framework

B.1 Installation and compilation

The code for the CutLang framework can be found at https://github.com/unelg/CutLang. The ROOT library from CERN should be pre-installed. After downloading the source code, the make command should be executed in the CLA subdirectory to compile the whole program. Analyses in CutLang are run in the runs subdirectory using the script CLA.sh or CLA.py. This subdirectory contains several example files that demonstrate various aspects of ADL and CutLang. When running an analysis with these scripts, the input ROOT file type can be one of: LHCO FCC LVL0 DELPHES ATLASVLL ATLMIN ATLASOD CMSOD CMSNANO. The -i or --inifile option is used for specifying the ADL file.

B.2 External user functions

The addition of new, so-called external user functions to the existing set of internal functions is partially automated. The Python helper script insertExternalFunction.py in the scripts directory was developed to accomplish this task. It accepts the name of the header file containing the new function as an argument. The automation currently works with a template-based setup, and therefore only with certain types of functions. Currently the following input and return types can be used for building an external function into CutLang:

• receives a vector of TLorentzVectors and an int, returns a vector of TLorentzVector;
• receives a vector of TLorentzVectors, returns a double;
• receives a vector of TLorentzVectors and a TVector2, returns a double;
• receives a vector of TLorentzVectors and a TLorentzVector, returns a double;
• receives 3 TLorentzVectors, returns a double.

The external function must be defined in a header file before running the script. The script is run with the following command:

    python insertExternalFunction.py -ext abc

where abc is the name of the header file without the .h extension.