CutLang V2: towards a unified Analysis Description Language
B. Gokturk, A. M. Toon, A. Paul, B. Orgen, N. Ravel, J. Setpal, G. Unel, S. Sekmen
Bogazici University, Department of Physics, Istanbul, Turkey
Saint Joseph University of Beirut, Dept. of Computer Software Engineering, Beirut, Lebanon
The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
University of Ankatso, Department of Physics, Antananarivo, Madagascar
R.N. Podar School, Mumbai, India
University of California at Irvine, Department of Physics and Astronomy, Irvine, USA
Kyungpook National University, Department of Physics, Daegu, South Korea
January 28, 2021
Abstract
We will present the latest developments in CutLang, the runtime interpreter of a recently-developed analysis description language (ADL) for collider data analysis. ADL is a domain-specific, declarative language that describes the contents of an analysis in a standard and unambiguous way, independent of any computing framework. In ADL, analyses are written in human-readable plain text files, separating object, variable and event selection definitions in blocks, with a syntax that includes mathematical and logical operations, comparison and optimisation operators, reducers, four-vector algebra and commonly used functions. Adopting ADLs would bring numerous benefits to the LHC experimental and phenomenological communities, ranging from analysis preservation beyond the lifetimes of experiments or analysis software to facilitating the abstraction, design, visualization, validation, combination, reproduction, interpretation and overall communication of the analysis contents. Since their initial release, ADL and CutLang have been used for implementing and running numerous LHC analyses. In this process, the original syntax from CutLang v1 has been modified for better ADL compatibility, and the interpreter has been adapted to work with that syntax, resulting in the current release v2. Furthermore, CutLang has been enhanced to handle object combinatorics, to include tables and weights, to save events at any analysis stage, and to benefit from multi-core/multi-CPU hardware, among other improvements. In this contribution, these and other enhancements are discussed in detail. In addition, real-life examples from LHC analyses are presented.
Contents
6 Multi-threaded runs
7 Code maintenance and continuous integration
8 Analysis examples
9 Conclusions
A User Manual
  A.1 Blocks and keywords
  A.2 Predefined physics objects
  A.3 Predefined functions
    A.3.1 PDGID of particles
  A.4 Mathematical operators and functions
  A.5 Comparison, range and logical operators
    A.5.1 Logical operations
    A.5.2 Ternary operator
  A.6 χ² minimization
  A.7 Definitions
  A.8 Tables
  A.9 Manipulating objects
    A.9.1 Defining new objects
    A.9.2 Sorting objects
    A.9.3 Object combinatorics
    A.9.4 Looping over a subset of the object collection
    A.9.5 Minimum and maximum of object attributes
    A.9.6 Summing object attributes
    A.9.7 Object constituents
    A.9.8 Daughter particles
    A.9.9 Hit and miss method
  A.10 Manipulating Events
    A.10.1 Selecting or rejecting events
    A.10.2 Weighing events
    A.10.3 Saving events
  A.11 Bins, counts and histograms
    A.11.1 Bins
    A.11.2 Counts
    A.11.3 Histograms
  A.12 Structure of a complete ADL file
    A.12.1 Initialization and information section
    A.12.2 Regions and algorithms
B The CutLang framework
  B.1 Installation and compilation
  B.2 External user functions
High energy collider physics data analyses nowadays are performed using complex software frameworks that integrate a diverse set of operations from data access to event selection, from histogramming to statistical analysis. Mastering these frameworks requires a high-level knowledge of general purpose languages and software architecture. Such requirements erect a barrier between data and the physicist who may simply wish to try an analysis idea. Moreover, having the physics information (e.g. object definitions, event selections, background estimation methods, etc.) scattered throughout different components of the framework code also makes implementing and working with different physics ideas less straightforward and efficient.

These difficulties could be addressed by considering a domain specific language capable of describing the analysis flow in a standard and unambiguous way. Various efforts have been ongoing to design such languages for high energy collider data analysis. One of these efforts led to the development of Analysis Description Language (ADL), a declarative language that can express the mathematical and logical algorithm of a physics analysis in a human-readable and standalone way, independent of any computing frameworks. Being declarative, ADL expresses the analysis logic without explicitly coding the control flow, and is designed to describe what needs to be done, but not how to do it. This consequently leads to a tidier and more efficient expression and eliminates programming errors.

ADL originated from the merging of two parallel efforts. It was formed by combining the best ideas from CutLang [1, 2], an effort to build an interpreted language directly executable on events, and
LHADA (Les Houches Analysis Description Accord), initially designed by a group of experimentalists and phenomenologists to systematically document and run the content of LHC physics analyses [3, 4, 5]. In its current state, ADL is capable of describing many standard operations in LHC analyses. However, it is being continuously improved and generalized to address an even wider range of analysis operations.

ADL is designed as a language that can be executed on data and used in real life data analyses. An analysis written with ADL could be executed by any computing framework that is capable of parsing and interpreting ADL, hence satisfying the framework independence. Currently, two approaches have been studied to realize this purpose. One is the transpiler approach, where ADL is first converted into a general purpose language, which is in turn compiled into code executable on events. A transpiler called adl2tnm converting ADL to C++ code is currently under development [4]. Earlier prototype transpilers converting
LHADA into code snippets that could be integrated within CheckMate [6, 7, 8] and Rivet [9, 10] frameworks were also studied. The other approach is that of runtime interpretation. Here ADL is directly executed on events without being intermediately converted into a code requiring compilation. This approach was used for developing CutLang [1, 2].

In this paper, we focus on CutLang and present in detail its current state denoted as CutLang v2, which was achieved after many improvements on the early prototype CutLang v1 introduced in [1]. Hereafter, CutLang v2 will be referred to as CutLang for brevity. The main text emphasizes the novelties that led to ADL and improved CutLang. We start with an overview of ADL in Section 2, then proceed with describing technicalities of runtime interpretation with CutLang in Section 3. We next present the ADL file structure and analysis components that can be expressed by ADL, focusing on the new developments and recently added functionalities in Section 4. This is followed by Section 5 describing analysis output, again focusing on new additions, Section 6, explaining the newly-added multi-threaded run functionality, Section 7 on CutLang code maintenance and recently incorporated continuous integration, Section 8 detailing studies on analyses implementation, and conclusions in Section 9. The full description of the current language syntax is given in the form of a user manual in Appendix A, followed by a note on the CutLang framework and external user functions in Appendix B.
In ADL, the description of the analysis flow is done in a plain, easy-to-read text file, using syntax rules that include standard mathematical and logical operations and 4-vector algebra. In this ADL file, object, variable and event selection definitions are clearly separated into blocks with a keyword value/expression structure, where keywords specify analysis concepts and operations. Syntax includes mathematical and logical operations, comparison and optimization operators, reducers, 4-vector algebra and HEP-specific functions (e.g. dφ, dR). However, an analysis may contain variables with complex algorithms non-trivial to express with the ADL syntax (e.g. $M_{T2}$ [11], aplanarity) or non-analytic variables (e.g. efficiency tables, machine learning discriminators). Such variables are encapsulated in self-contained, standalone functions which accompany the ADL file. Variables defined by these functions are referred to from within the ADL file. As a generic rule, all keywords, operators and function names are case-insensitive.

The language content, syntax rules, and working examples of self-contained functions will be presented in the coming sections, after a technical introduction of the CutLang interpreter.

An interpreted analysis system makes adding new event selection criteria, changing the execution order or cancelling analysis steps more practical. Therefore CutLang was designed to function as a runtime interpreter and bypass the inherent inefficiency of the modify-compile-run cycle. Avoiding the integration of the analysis description in the framework code also brings the huge advantage of being able to run many alternative analysis ideas in parallel, without having to make any code changes, hence making the analysis design phase more flexible compared to the conventional compiled framework approach.

The CutLang runtime interpreter is written in C++, around function pointer trees representing different operations such as event selection or histogramming.
Therefore processing an event with a cutflow table becomes equivalent to traversing multiple expression trees with arbitrary complexities, such as the one shown in Figure 1. Here physics objects are given as arguments.

Figure 1: An expression tree example: the program traverses the tree from right to left evaluating the encountered functions from bottom to top.

Handling of the Lorentz vector operations, pseudo-random number generation, input-output file and histogram manipulations are all based on classes of the ROOT data analysis framework [12]. The actual parsing of the ADL text relies on automatically generated dictionaries and grammar based on traditional Unix tools, namely, Lex and Yacc [13]. The ADL file is split into tokens by Lex, and the hierarchical structure of the algorithm is found by Yacc. Consequently, CutLang can be compiled and operated in any modern Unix-like environment. The interpreter needs to be compiled only during installation or when optional external functions for complex variables are added. Once the work environment is set up, the remainder is mostly a think-edit-run-observe cycle.

The CutLang framework is able to work with multiple input data types, each implemented as a plug-in. For example, ATLAS and CMS open data [14] and internal ntuple formats including CMS NanoAOD [15], Delphes [16] and LHCO event formats are recognized and can be directly used. Other input file types can also be easily added since all particle types and event properties are worked through an internal abstraction layer. The only requirement on the input files is that they use the ROOT file format. The same format is used for the output file, which also stores, in a separate directory and in text format, the ADL definitions and selection algorithms of each region.
One other point to raise is that not all input file types contain the same amount of information. Therefore CutLang provides the possibility of accessing any such input data type specific information through external user functions. The practical details of the CutLang framework can be found in Appendix B.
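The idea of evaluating a selection criterion by traversing a tree of functions can be illustrated with a minimal Python sketch. This is not the CutLang C++ implementation: the `Node` class, the `leaf` and `const` helpers, the event dictionary keys and the example criterion are all hypothetical, chosen only to show how a criterion becomes a tree whose inner nodes are functions and whose leaves read values from the event.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Node:
    """One node of an expression tree (hypothetical structure)."""
    func: Callable[..., float]
    children: Sequence["Node"] = ()

    def evaluate(self, event: dict) -> float:
        # A leaf reads directly from the event; an inner node first
        # evaluates its children, then applies its own function.
        if not self.children:
            return self.func(event)
        args = [child.evaluate(event) for child in self.children]
        return self.func(*args)


def leaf(key: str) -> Node:
    """Leaf returning the event quantity stored under `key`."""
    return Node(lambda event: event[key])


def const(value: float) -> Node:
    """Leaf returning a constant."""
    return Node(lambda event: value)


# Tree for the illustrative criterion sqrt(jet_pt^2 + met^2) > 100
criterion = Node(
    lambda lhs, rhs: float(lhs > rhs),
    [
        Node(math.sqrt, [
            Node(lambda a, b: a + b, [
                Node(lambda x: x * x, [leaf("jet_pt")]),
                Node(lambda x: x * x, [leaf("met")]),
            ]),
        ]),
        const(100.0),
    ],
)

passed = bool(criterion.evaluate({"jet_pt": 90.0, "met": 60.0}))
```

Because the tree is built once and then evaluated per event, changing the analysis only changes the tree, not the compiled code, which is the property the interpreter approach relies on.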
We will now explain in detail which analysis components and physics algorithms can be described by ADL and processed with CutLang. We will prioritize highlighting the many novelties added and improvements that took place since the original versions CutLang v1 and LHADA. The descriptions here concentrate on the concepts and content that can be expressed and processed by ADL and on the functionalities of CutLang v2, rather than attempting to give a full layout of syntax rules, which is independently provided in the user manual in Appendix A.
As a runtime interpreter, CutLang processes events in a well-defined order. It executes the commands in the ADL file from top to bottom. Therefore, the ADL files are required to describe the analysis flow in a certain order. Some dedicated execution commands are also used within the ADL file, in order to facilitate the runtime interpretation. Throughout the ADL file, the mass, energy and momentum are all written in giga-electronvolts (GeV) and angles in radians. User comments and explanations should be preceded by a hash (#) character. The ADL file is organized into the following sections:

initializations:
This section contains commands that are related to analysis initialization and setup, for which the relevant keywords are summarized in Table 1. The keywords and values are separated by an equal sign. The TRGe and TRGm keywords in the table refer to the lepton (electron or muon) triggers. Their utilization is described in Appendix A.2; it is worth noting at this point that Monte Carlo (MC) simulation weights are not taken into account when the trigger value is set to data.

countformats:
This section is used for setting up the recording of already existing event counts and errors, e.g., from an experimental paper publication. It is therefore not directly relevant for event processing, but rather for studying the interplay between the results of the current analysis and its published experimental counterpart. More generally, it is used to express any set of pre-existing counts of various signals, backgrounds, and data (together with their errors) of an analysis.

definitions1:
This section is used for defining aliases for objects and variables, in order to render them more easily referable and readable in the rest of the analysis description. For example, it can introduce shortcuts like Zhreco for a hadronically reconstructed Z boson, or values like mH, i.e., mass of a reconstructed Higgs boson. These definitions can only be based on the predefined keywords and objects.

objects:
This section can be used to define new objects based on predefined physics objects and shorthand notations declared in definitions1.

definitions2:
This section is allocated for further alias or shorthand definitions. Definitions here can be based on objects in the previous section and predefined particles.

event categorization:
This section is used for defining event selection regions and criteria in each region. Running with CutLang requires having at least one selection region with at least one command, which may include either a selection criterion or a special instruction to include MC weight factors or to fill histograms.

We next describe the detailed contents and usage of these sections.

Table 1: Initialization keywords and their possible values

Keyword      Explanation
SkipHistos   Skip (=1) or display (=0) the histograms in the final efficiency table
SkipEffs     Skip (=1) or display (=0) the final efficiency table
TRGm         Muon trigger (see Appendix A.2)
TRGe         Electron trigger (see Appendix A.2)
RandomSeed   Random number generator seed, an integer
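The conventions stated so far (keyword and value separated by an equal sign, comments introduced by a hash, case-insensitive keywords) can be illustrated with a short Python sketch. The function `parse_initializations` and the numerical values in the example text are hypothetical; the sketch does not reproduce the Lex/Yacc grammar used by CutLang, it only shows how such lines could be read.

```python
def parse_initializations(text):
    """Read 'keyword = value' lines; '#' starts a comment (illustrative only)."""
    settings = {}
    for raw_line in text.splitlines():
        # Drop everything after the first hash, then surrounding blanks.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Keywords are case-insensitive, so store them in lower case.
        settings[key.lower()] = value
    return settings


example = """
# initialization keywords with illustrative values
SkipHistos = 0
skipeffs   = 1
RandomSeed = 42   # an integer seed
"""

settings = parse_initializations(example)
```

With this input, `SkipHistos` and `skipeffs` end up as the keys `skiphistos` and `skipeffs`, reflecting the case-insensitivity rule.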
Generally, the starting point in an analysis algorithm is defining and selecting the collections of objects, such as jets, b jets, electrons, muons, taus, missing transverse energy, etc. that will be used in the next steps of the analysis. Usually, the input events contain very generic and loose collections for objects, which need to be further refined for analysis needs. CutLang is capable of performing a large variety of operations on objects, including deriving new objects via selection, combining objects to reconstruct new objects, accessing the daughters and constituents of objects. Once an object is defined, it is also possible to find objects with a minimum and maximum of a given attribute within the object's collection, or sort the collection according to an attribute. In the ADL notation, object collection definitions are clearly separated from the other analysis tasks. Here the term object is used interchangeably with object collection. Each object is defined within an individual object block uniquely identified with the object's name. These blocks, starting with the input object collection(s)'s name(s), list different types of operations afterwards.

CutLang automatically retrieves all standard object collections from the input event file without the need for any explicit user statements within the ADL file. It can read events with different formats, such as Delphes fast simulation [16] output, CMS NanoAOD [15], ATLAS or CMS open data [14] ntuples and recognize the object collections in these. One property unique to CutLang is that it is designed to map input object collections to common, standard object names with a standard set of attributes, as described in Appendix A.2 and A.3. For example, the AK4jets collection in CMS NanoAOD and the
JET collection in Delphes are both mapped to
Jet. This approach allows the same ADL file to be processed on different input event formats, and has proven very useful in several simple practical applications. However, we also recognize that this approach only works when different input collections have matching properties, e.g. when Delphes electrons and CMS electrons have the same identification criteria which can be mapped to the same identification attribute, or a Delphes jet and an ATLAS jet use the same b-tagging algorithm that can be mapped to the same b-tagging attribute. Therefore, other interpreters of ADL may choose to use input collection and attribute names as they are, in order to be less ambiguous. Allowing different approaches, each with advantages for different use cases, while still adhering to the principle of clarity is a significant aspect of ADL.

The most common object operation is to take the input object collection and filter a subset by applying a set of selection criteria on object attributes. This can be done very straightforwardly in ADL by listing each selection criterion in consecutive lines. The objects in the input collection satisfying the criteria can be either selected or rejected using the select or reject keywords. Comparison operators such as =, !=, >, <, >=, <= can directly be used for expressing the criteria. Logical operators AND, OR and NOT can be used for expressing composite or inverted criteria. A complete list of these operators can be found in Appendix A.5.

It is also possible to filter an object collection based on other object collections, such as in the cases of object cleaning or matching. For example, one can reject jets overlapping with photons, or select boosted W jets matching generator level W bosons. Such operations involve intrinsic loops, which are readily handled by CutLang. Functions such as δφ or angular distance δR can be readily used when comparing objects. Given an initial object collection, one can consecutively derive several objects.
For example, jets can be filtered to obtain cleanjets, while cleanjets can be further filtered to obtain verycleanjets. One can also use the same initial collection to define different collections, such as taking muons and imposing different criteria to obtain loosemuons and tightmuons.

Another very common operation is to combine objects to reconstruct new objects, such as combining 2 leptons to form a Z boson. Sometimes, the reconstruction could be very straightforward, as in requesting to reconstruct only a single Z boson per event. However, in other cases, one might have to reconstruct as many Z bosons as possible. In each case, reconstructed candidates might undergo filtering or selection of a single most optimal candidate among all candidates. Combination operations are very diverse, and finding a completely generic expression for them is non-trivial. In its v1, CutLang could reconstruct an explicitly defined number of objects per event. It could find the object satisfying given criteria by performing optimization operations. In v2, CutLang has been generalized to reconstruct any number of objects, by taking into account the combinatorics. Selection criteria can also be imposed on both the input and reconstructed objects. Technical information on how to perform combinations is provided in Appendix A.9.3.

Another common situation is when objects in a collection are individually associated to other collections. Examples include mothers or daughters of generator level particles, subjets or constituents of jets, associated tracks of leptons or jets. As a first step, CutLang was adapted in v2 to work with jet constituents using the syntax described in Appendix A.9.7. Another example of association is daughters of generator truth level particles. If an analysis is performed directly on generator level particles, or if a study is required on truth level particles, information such as PDGID codes or decay chains becomes relevant.
CutLang is now capable of accessing the PDGID and the decay products of a particle (referred to as "daughters" in HEP), with the syntax described in Appendix A.3.1 and A.9.8. CutLang provides both the number of daughters and a modifier to refer to the daughters. Work is in progress for finding a generalizable syntax for object association expressions.

Members of object collections can be directly accessed via their indices. Being declarative, ADL syntax does not include explicit statements for looping over object collections, and CutLang is capable of interpreting this implicit looping. For example, when filtering a jet collection, one might apply a cleaning criterion which requires no electron to be in the proximity of the jet defined by a radius. Applying this criterion requires looping over electrons; however, it suffices to write the electron object's name in order for CutLang to interpret implicit looping based on the context. In other cases, it might be necessary to access only a subset of the collection, such as when imposing a selection on the δφ between the first 3 jets with highest $p_T$ and the missing transverse momentum. ADL and CutLang were updated to allow such operations. The Python slice notation has been adapted for expressing subset ranges in object collections, as described in Appendix A.9.4.

Input or defined object collections are by default sorted by CutLang in the order of decreasing transverse momentum $p_T$. ADL can express sorting object collections according to any feature, in ascending or descending order, and CutLang is capable of performing such sorting operations. Moreover, so-called "reducers" can be applied for extracting values from existing object collections. One case is the capability to extract the maximum or minimum value of a given attribute in an object collection. For example, CutLang can give the maximum $p_T$ possessed by a jet in a jet collection, or the minimum value of isolation possessed by an electron in an electron collection.
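The object operations discussed up to this point (filtering with an implicit loop over a second collection, slicing, sorting and the maximum reducer) can be mirrored in plain Python. The sketch below is illustrative only and is not ADL syntax: the dictionary-based objects, the `delta_r` helper and the thresholds (40 GeV, 0.4) are assumptions made for the example.

```python
import math


def delta_r(a, b):
    """Angular distance between two objects with 'eta' and 'phi' entries."""
    # math.remainder wraps the azimuthal difference into [-pi, pi].
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)


jets = [
    {"pt": 45.0, "eta": -1.2, "phi": 2.0},
    {"pt": 120.0, "eta": 0.5, "phi": 0.1},
    {"pt": 30.0, "eta": 0.0, "phi": 1.0},
    {"pt": 80.0, "eta": 1.0, "phi": -2.5},
]
electrons = [{"pt": 40.0, "eta": -1.1, "phi": 2.1}]

# Default ordering: decreasing transverse momentum.
jets.sort(key=lambda j: j["pt"], reverse=True)

# Filtering with cleaning: the loop over electrons is explicit here,
# whereas in a declarative description it would be implied by context.
clean_jets = [
    j for j in jets
    if j["pt"] > 40.0 and all(delta_r(j, e) >= 0.4 for e in electrons)
]

# Subset access with slice notation: up to the three leading jets.
leading_jets = clean_jets[:3]

# Reducer: maximum pT in the collection.
max_pt = max(j["pt"] for j in clean_jets)

# Sorting by another attribute, here |eta| in descending order.
by_abs_eta = sorted(clean_jets, key=lambda j: abs(j["eta"]), reverse=True)
```

In this example the 45 GeV jet is removed by the cleaning criterion because it lies within 0.4 of the electron, and the 30 GeV jet fails the transverse momentum requirement.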
Another case is the summation operation, where one can sum over the values of a given attribute over the whole collection. The most common use case here is the summation of object $p_T$ values to obtain event variables such as the hadronic transverse energy $H_T$. Sorting and reducers are recent additions to ADL and CutLang and the details on their implementation and usage are given in Appendix A.9.2, A.9.5, A.9.6 and in the examples referred to in Section 8.

4.3 Object or event variables

An object variable is a quantity defined once per object, such as a jet's transverse momentum $p_T$ or an electron's relative isolation. An event variable is a quantity defined once per event, such as missing transverse energy $E_T^{miss}$, number of electrons selected using the tight criteria, $p_T$ of the highest $p_T$ jet, transverse mass calculated using the highest $p_T$ lepton and $E_T^{miss}$. Object and event variables used in object definitions or event categorization in an analysis are not always fully provided in the input event data. These quantities therefore need to be computed during the analysis using the existing inputs. ADL is designed to allow definition of such new variables in two ways. Simple variables that could be described analytically using a single line formula can be expressed within the ADL file using mathematical operations. A classic example would be that of the definition of transverse mass obtained from a visible object, such as a lepton, and the missing transverse energy. To enable writing these simple formulas, CutLang is capable of parsing and processing operators such as +, -, *, /, ^. CutLang has also incorporated a series of internal functions to express other operations such as abs(), sqrt(), sin(), cos(), tan() and log(). Reducer operators used for reducing collections to a single value, e.g. size(), sum(), min(), max() are also available for computing quantities.
For example, the hadronic transverse momentum $H_T$ can be computed from all jets in an event using the sum() reducer as sum(pT(jets)).

However, in many cases, variables are defined by complex algorithms non-trivial to express. Examples such as angular separation dR, aplanarity, stransverse mass $M_{T2}$ [11], razor variables [17], etc. either cannot be easily written using the available operators or require multiple steps of calculation. Some of these algorithms, like angular separation and razor variables, were predefined as internal functions in CutLang, and more, like $H_T$ and $M_{T2}$, were added recently. A list of existing variables can be found in Appendix A.3. Other algorithms can be easily incorporated by the user following the recently generalized recipe in Appendix B. Another class of sophisticated variables includes quantities defined from numerical functions, such as object or trigger efficiencies used to compute object or event weights, provided in tables or histograms, or discriminators/efficiencies computed via machine learning models. All these variables are incorporated by being defined in independent, self-encapsulated functions outside the ADL file and referring to them within the ADL file. These external user functions should be seen as a natural extension of the language. The ultimate aim is to provide these functions in a well-defined and straightforwardly extendable database.

The expressions for variables, whether they are built directly using the available mathematical operators or indirectly via internal or user functions, can be written openly in the place of usage, e.g. in the line when a selection is applied on the variable. Alternatively, if the variable is used multiple times in an analysis, e.g. in different selection regions, it can be defined once, using the define keyword, which assigns an alias name to the variable. Currently, defining aliases using the define keyword is only possible for event variables in CutLang, but not for object variables.
In CutLang, the define expressions are placed exclusively between the end of the object blocks and the beginning of the event selection.

In a typical collider analysis, events are categorized based on different sets of selection criteria applied on event variables into a multitude of signal regions enhancing the presence of the signal of interest, or control or validation regions used for estimating backgrounds. These regions can be derived from each other, and can be correlated or uncorrelated depending on the case. ADL organizes event categorization by defining each selection region in an independent region block and labels each region with a unique name. The region blocks mainly consist of a list of selection criteria. As in the case for objects, each criterion is stated in a line starting with a select or a reject keyword, which selects or rejects the events satisfying the criterion, respectively. Comparison operators, logical operators and the ternary operator, whose syntax is described in Appendix A.5, are used for expressing the criteria. Another operation that can be performed within the context of event classification is $\chi^2$ optimization for reconstructed quantities, whose syntax is described in Appendix A.6. An example would be finding, among several top quark candidates, the candidate with mass closest to the top quark mass, and using the optimal candidate's properties for further selection.

ADL and CutLang allow deriving selection regions from each other, e.g. deriving multiple signal regions from a baseline selection region. This is done by simply referring to the baseline region by name in the new region's block, and not repeating the whole selection every time.

In many analyses, especially those targeting searches for new physics, events in given search regions are partitioned into many bins based on one or more variables, e.g. $H_T$, $E_T^{miss}$ or some invariant mass. Data counts and background estimates in these bins constitute the result of the analysis.
With the increased data, recent LHC analyses, especially inclusive searches for new physics, may contain hundreds of bins. Treating each bin as a separate region would be impractical; bins are therefore described using the bin keyword. Bins in a region, by definition, are to be non-overlapping. The CutLang interpreter and framework operate based on this principle, and skip an event once it is classified into a bin. This property distinguishes bins from regions, as different regions can be overlapping, and a given event is evaluated for all regions, independent of whether it is selected or not by the preceding regions. Bins can be described in two ways: when the binning is done using only a single variable, all bins can be defined in a single line, by specifying the variable name and the bin intervals. When bins are defined based on multiple variables, this way of description can become ambiguous, and a more explicit description, where each bin is defined in one dedicated line, can be used. The usage and syntax of the bin keyword is described in Appendix A.11.1. In case multiple regions would have the same binning (e.g. a signal region and several control regions from which the background is estimated), currently, the binning definitions must be specified separately in each region. We are searching for a more practical way of expression which would avoid the repetition, while keeping with the human readability principle.

Footnote: The region block was called algo in the original CutLang syntax. Even though algo is still valid in CutLang, we generally refer to the block as region, as the latter is a more domain specific word.
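The difference between regions (possibly overlapping, each event tested against all of them) and bins (non-overlapping, each event assigned to at most one) can be made concrete with a small Python sketch. All names, event fields, selection thresholds and bin edges below are hypothetical and serve only as an illustration of the logic, not of ADL syntax.

```python
import bisect


def passes_baseline(event):
    return event["njets"] >= 2 and event["met"] > 200.0


def passes_signal(event):
    # Derived region: refer to the baseline, then add one more criterion.
    return passes_baseline(event) and event["nleptons"] == 0


# Single-variable binning in HT: [300, 500), [500, 800), [800, inf)
HT_EDGES = [300.0, 500.0, 800.0]


def ht_bin(event):
    """Index of the unique HT bin for this event, or None below the first edge."""
    index = bisect.bisect_right(HT_EDGES, event["ht"]) - 1
    return index if index >= 0 else None


events = [
    {"njets": 3, "met": 250.0, "nleptons": 0, "ht": 650.0},
    {"njets": 2, "met": 300.0, "nleptons": 1, "ht": 900.0},
    {"njets": 4, "met": 220.0, "nleptons": 0, "ht": 310.0},
    {"njets": 1, "met": 400.0, "nleptons": 0, "ht": 1000.0},
]

counts = {"baseline": 0, "signal": 0}
signal_bins = [0] * len(HT_EDGES)

for event in events:
    # Regions may overlap, so every event is tested against every region.
    if passes_baseline(event):
        counts["baseline"] += 1
    if passes_signal(event):
        counts["signal"] += 1
        # Bins do not overlap: the event lands in at most one of them.
        index = ht_bin(event)
        if index is not None:
            signal_bins[index] += 1
```

Here three events enter the baseline region, two of them also enter the derived signal region, and those two populate the first and second $H_T$ bins respectively.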
In an analysis, events, especially simulated events, are usually weighted in order to match the real data luminosity or to correct for detector effects. CutLang has been recently adapted to incorporate the capability of applying event weights. Event weights can be applied within the region blocks via the weight keyword, as described in Appendix A.10.2. A particular event selected by two different regions can receive different weights. Event weights can be either constant numbers or functions of variables. These functions may include analytical or numerical internal or user functions. Weights based on numerical functions, such as efficiencies (e.g. trigger efficiencies), can also be applied from tables written within the ADL file, as described in Appendix A.8. The systematic way for expressing efficiencies in tables and applying them to objects and events was incorporated recently in ADL and CutLang.
As mentioned above, applying efficiencies to events and objects, such as trigger efficiencies or object reconstruction, identification and isolation efficiencies, is a common part of many analyses. Section 4.5 described how to apply the effect of event efficiencies as event weights. There is, however, another approach, which involves emulating the effects of efficiencies. This approach involves randomly accepting events or objects having a certain property, such that the total selected percentage reflects that of the efficiency. For example, if the overall reconstruction and identification efficiency for an electron with 20 < p_T < 40 GeV and in a given |η| range is 0.6, an electron in that p_T and |η| range is allowed to pass the selection only with a 0.6 probability. The decision for selection is made by sampling a uniform random number between 0 and 1, and accepting the event or object if the uniform random number is less than the efficiency value. Usually, the uncertainty on the efficiency is also taken into account when making the pass/fail decision. This is called the hit-and-miss method.

Emulating efficiencies using the hit-and-miss method is regularly used in parametrized fast simulation frameworks. It is also becoming increasingly relevant to incorporate this functionality in the analysis step, especially for the benefit of phenomenological studies targeting interpretation or testing new analysis ideas. These studies generally use events produced by fast simulation or even at truth level, instead of real collision data events or MC events produced by full detector simulation as used in experimental analyses. Experimental analyses use complicated object identification criteria, which cannot be implemented by fast simulation. Moreover, it is common to see different analyses working with different identification methods for a given object (e.g. cut-based identification versus multivariate analysis-based identification for electrons), as different methods may perform better for different physics cases. Consequently, working with different phenomenology analyses, each using different identification criteria, requires implementing all these criteria in the simulation step, which is highly impractical. Therefore, it is helpful for the infrastructure handling the analysis step to have the capability to emulate using efficiencies.

Emulating efficiencies with uncertainties was recently incorporated in CutLang. The hit-and-miss method is applied via the internal function applyHM. In the current implementation, the efficiency values and errors versus object properties are input via table blocks in the ADL file.
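The hit-and-miss decision can be sketched in a few lines of Python. This is a minimal illustration and not the applyHM implementation: the table contents, the function names and the way the efficiency is smeared within its errors (a two-sided Gaussian draw) are all assumptions made for the example.

```python
import random

# Hypothetical efficiency table: (pt_min, pt_max, efficiency, err_down, err_up).
EFF_TABLE = [
    (20.0, 40.0, 0.60, 0.05, 0.03),
    (40.0, 100.0, 0.80, 0.04, 0.02),
]


def lookup(pt, table=EFF_TABLE):
    """Return (efficiency, err_down, err_up) for the bin containing pt, or None."""
    for pt_min, pt_max, eff, err_down, err_up in table:
        if pt_min <= pt < pt_max:
            return eff, err_down, err_up
    return None


def smear(eff, err_down, err_up, rng):
    """Shift eff up or down by a half-Gaussian draw (assumed smearing scheme)."""
    total = err_down + err_up
    if total == 0.0:
        return eff
    shift = abs(rng.gauss(0.0, 1.0))
    if rng.random() < err_up / total:
        value = eff + shift * err_up
    else:
        value = eff - shift * err_down
    return min(1.0, max(0.0, value))  # keep the result a valid probability


def hit_and_miss(pt, rng, with_errors=False):
    """Accept the object if a uniform random number falls below the efficiency."""
    entry = lookup(pt)
    if entry is None:
        return False  # outside the table range: rejected here by assumption
    eff, err_down, err_up = entry
    if with_errors:
        eff = smear(eff, err_down, err_up, rng)
    return rng.random() < eff


rng = random.Random(12345)
n_trials = 100_000
n_passed = sum(hit_and_miss(30.0, rng) for _ in range(n_trials))
print(n_passed / n_trials)  # accepted fraction, statistically close to 0.60
```

With many trials, the accepted fraction approaches the tabulated efficiency, which is the property the method relies on.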
This will be generalized to reading efficiencies from other formats, e.g. input histograms or numerical external functions, in the near future.

The applyHM function uses a uniform distribution to decide if the central value was hit (below the value) or missed (above the value). In case the table contains errors, the central value itself is recalculated each time, based on a double Gaussian function whose positive and negative widths are the errors of the associated bin in the efficiency table:

\[ dg(x) \equiv \sqrt{\frac{2}{\pi \ast \epsilon_u \ast \epsilon_d}} \ast \left[ e^{-\frac{(x-\mu)^2}{2 \ast \epsilon_d^2}} \times \theta(\mu) + e^{-\frac{(x-\mu)^2}{2 \ast \epsilon_u^2}} \right] \qquad (1) \]

where µ is the central value of the relevant bin from the efficiency table, \epsilon_u and \epsilon_d are the errors in the same bin, and θ is the unit step function. The applyHM function can be used in the object blocks for defining derived object collections. It can also be used in the region blocks to apply efficiencies on a particular object, e.g. to check whether the jet with the highest p_T is a b-tagged jet or not. Syntax for the applyHM function can be found in Appendix A.9.9.

As described in the introduction, the main scope of ADL is the description of the physics content and algorithmic flow of an analysis. The language content presented up to this point serves this purpose. However, further auxiliary functionalities are required for practicality while running the analysis on events. One such functionality is histogramming. Since the start of its design, CutLang has been capable of filling one-dimensional histograms of event variables. Recently, the capability of drawing two-dimensional histograms has been added. The syntax for histogramming can be found in Appendix A.11.3. Histogramming is currently only available for event variables. It will be added for object properties in the near future.
The main priority of the ongoing developments is to establish the principles of ADL as a language. Here, we refer to a language as a set of instructions to implement algorithms that produce various kinds of output through abstractions for defining and manipulating data structures or controlling the flow of execution. It is, however, important to recognize that a language can be expressed using alternative vocabulary or syntax. Here, vocabulary is the words with a particular meaning in the language, such as block or keyword names, and syntax is the set of rules that defines the combinations of symbols that are considered to be a correctly structured expression of the language. Our experience on the way from CutLang v1 and LHADA to ADL showed that there might not always be a single best syntax for expressing a given content. Alternative syntax options may be more favorable in different use cases, due to practicality or simply due to different tastes of the users. Recognizing this, we recently opted to host multiple syntactic alternatives in ADL and CutLang for several cases. The most obvious case is the syntax for expression of object attributes, as described in Appendix A.2. A more minor example is the name for the event classification block keyword, i.e. both region and algo are valid. Another is in specifying the input object collection in an object block, where the take keyword, the using keyword or a colon ":" are all valid. CutLang was recently updated to be able to parse and interpret different alternatives in such cases. We assume that these differences will naturally converge and unify as the user base and the implemented analysis examples expand.
CutLang as an analysis framework is designed to output information and data that would be used for further analysis. The main output obtained after running an analysis in CutLang is provided in a ROOT file. The file, first of all, includes a copy of the ADL file content in order to document the provenance of the analysis. It also includes histograms with all the event counts and uncertainties obtained from the analysis and all histograms defined by the user. CutLang is also capable of skimming and saving events using the auxiliary save keyword in its internal format LVL0, as described in Appendix A.10.3. In case event saving is specified in the ADL file, the ROOT file also stores the saved events.

The output ROOT file includes a directory for each event categorization region, i.e. each region block. These directories contain all user-defined histograms specified in the ADL file. The prototype version of CutLang also had a basic cutflow histogram listing the number of events surviving each step of the selection in the given region. The cutflows, including the statistical errors on counts, are also given as text output. In the current version, the cutflow histograms are improved to include the selection criteria as bin labels. Moreover, in case binning is used in a region, a bincounts histogram is also added, where each histogram bin shows the event counts and errors in each selection bin, and the histogram bin labels show the bin definition. The cutflow and bincounts histograms can be directly used in the subsequent statistical analysis of the results.

Incorporation of existing counts
In some cases, event counts and uncertainties from external sources need to be systematically accessible in order to be processed together with the counts and uncertainties obtained from running the analysis via CutLang. One example is phenomenological interpretation studies, where the analysis is only run through signal samples, while the experimental results, consisting of data counts and background estimates, are usually taken from the experimental publication. Having the data counts and background estimates directly available in a format compatible with the signal counts is necessary for subsequent statistical analysis. Moreover, for this particular case, it is also highly desirable to have this information documented directly within the ADL file. Another example is validation studies, when either multiple teams in an experimental group are synchronizing their cutflows, or a reimplemented analysis for a phenomenological interpretation study is validated against a cutflow provided by the original experimental publication. Similarly, having the validation counts and uncertainties in the same format would make comparison very practical.

Recently, a syntax was developed in ADL for systematically storing external counts and uncertainties within the ADL file. The physics process for which the information is given and the format of the information are provided within the countsformat block using the process keyword, while the values are given in the relevant region blocks, right after the definition of the relevant selection criteria, using the counts keyword. The syntax is detailed in Appendix A.11.2. When an ADL file including external counts and errors is run with CutLang, the counts and errors are converted into cutflow and bincounts histograms with a similar format to those hosting the CutLang output. The histograms are placed under the relevant region directories, and the physics process is included in the histogram names.
CutLang has been recently enhanced with the capability of multi-threaded execution of an analysis, to optimally utilize the available resources and therefore get faster results. Adding -j n to the command that starts the analysis results in using n cores. The value of n must be an integer that does not exceed the total number of cores on the processor; a special value of n selects one less than the total number of cores, to maximize performance for demanding analyses while leaving part of the resources to the operating system. CutLang can be run using 2 cores as:

./CLA.sh [inputrootfile] [inputeventformat] -i [adlfilename] -j 2

Figure 2 shows the run time dependence on multi-threading. The mean and standard deviation of these results are given in Table 2. The computer used during the test has an Intel(R) Core(TM) i5-8300H with 4 cores and 8 threads, and runs Ubuntu 18.04.4 LTS. The number of events analyzed was limited to 3 million due to memory restrictions.

Table 2: Data points given in Figure 2.
Threads   Mean no. of Events/sec   Std.Dev.
1         3063.4                   14.5
2         5853.5                   18.5
4         10223.3                  22.3
6         11028.0                  29.6
8         11272.0                  119.6

As can be seen from the results, the number of events processed per second scales roughly with the number of cores used, up to 4 parallel processes. Simultaneous processing efficiency, resource demand of background processes and recombination of results that are obtained in parallel contribute to the decline in performance of multi-threaded runs. Because the processor has only 4 physical cores with 2 logical cores each, runs that use more than 4 threads showed minimal improvement.

In another performance test, run times of 1, 2, 4 and 8 threaded analyses for varying numbers of events are given in Table 3. To simplify, a normalized version of Table 3 is also provided in Table 4, where the runtime of the analysis that used 1 core is taken to be the norm.
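The scaling seen in Table 2 can be quantified as a speedup relative to the single-thread rate and as a parallel efficiency (speedup divided by the number of threads). The short Python sketch below uses only the mean rates from Table 2; the variable names are chosen for illustration.

```python
# Mean events per second for each thread count, taken from Table 2.
rates = {1: 3063.4, 2: 5853.5, 4: 10223.3, 6: 11028.0, 8: 11272.0}

base = rates[1]
# Speedup: how many times faster than the single-thread run.
speedup = {n: rate / base for n, rate in rates.items()}
# Parallel efficiency: fraction of the ideal n-fold speedup actually achieved.
efficiency = {n: s / n for n, s in speedup.items()}

for n in sorted(rates):
    print(f"{n} threads: speedup {speedup[n]:.2f}, efficiency {efficiency[n]:.2f}")
```

The speedup is about 1.9 for 2 threads and about 3.3 for 4 threads, after which it flattens, consistent with the processor having 4 physical cores.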
Looking at these tables, it can be seen that as the analyses get more demanding, higher levels of multi-threading perform increasingly better.

Figure 2: Events processed per second when analysis is divided into 1, 2, 4, 6 and 8 threads for varying number of events. Error bars are multiplied by 10 to make them visible. (Plot title: Multi-threaded Run Performance; vertical axis: Events per Second.)

Table 3: Variation of run times with changing number of threads. Columns: processed events; process time [s] for 1, 2, 4 and 8 cores used.

The CutLang source code is public and resides in the popular software development platform GitHub [18]:

https://github.com/unelg/CutLang

CutLang uses GitHub functionalities for parallel code development across multiple developers. This development platform, apart from a wiki page for documentation and the possibility of error reporting, also offers a continuous integration setup, which includes a series of tasks that can be initiated at a specific time or by a trigger such as a commit to the main branch. The continuous integration setup was recently incorporated to automatically validate the code. The setup compiles the CutLang source code from scratch, and runs the resulting executable over a set of example ADL files from the package on a multitude of input data files and formats. By comparing the output from the examples to a carefully selected reference output, coding errors can be automatically detected and reported by email. The total compilation and execution time is greatly reduced by using a pre-compiled version of ROOT and by pre-installing the necessary event files onto a Docker [19] image integrated into a recent Linux (Ubuntu) virtual computer made available by the development platform.
ADL and CutLang are continuously being used for implementing a diverse set of LHC analyses and running these on events. The analyses implemented are being collected in the following GitHub repository [20]:

https://github.com/ADL4HEP/ADLLHCanalyses

Table 4: Runtimes as percentages of single core runtime.
                    Normalized Process Time
Processed Events    1      2      4      8
10^…                100    98.7   101    149
10^…                100    57.2   38.6   45.7
10^…                100    50.7   29.8   32.0
2.… × 10^…          100    51.9   29.4   27.0
4.… × 10^…          100    51.3   29.6   26.6

The main focus so far has been to implement analyses designed for new physics searches, in particular supersymmetry searches. These supersymmetry analyses are intended to be directly used to create model efficiency maps to be used by the reinterpretation framework SModelS [21, 22, 23]. The results obtained by running some of the implemented analyses have also been validated within dedicated exercises performed during the Les Houches PhysTeV workshops, in comparison to other analysis frameworks [5]. The available analysis spectrum is currently being extended to cover Higgs and other SM analyses. Furthermore, studies are ongoing to improve the functionalities of ADL and CutLang for use in searches or interpretation studies with long-lived particles, which involve highly non-conventional objects and signatures. More recently, analysis examples for CMS Open Data [14] and a sensitivity study case for the High Luminosity LHC and the Future Circular Collider were also added [24]. In addition, ADL and CutLang were used as main tools in an analysis school for undergraduate students which took place in Istanbul in February 2020, and several analyses were implemented by the participating students [25]. ADL and CutLang were also used to prepare hands-on exercises for data analysis at the 26th Vietnam School of Physics (VSOP) in December 2020 [26]. The VSOP exercises, which involve running CutLang and further analysis of the resulting histograms with ROOT, were also adapted for direct use via Jupyter notebooks, and are documented in detail in [27]. The experience in both schools confirmed ADL and CutLang as highly intuitive tools for introducing high energy physics data analysis to undergraduate and masters students with nearly no experience in analysis.

Implementing analyses with a variety of physics content led to incorporating a wider range of object and selection operations and helped to make the ADL syntax more generic and inclusive.
Syntax for generalizing object combinations, numerical efficiency applications, the hit-and-miss method, bins and counts, and many others, were introduced as a result of these studies. Consequently, the scope and functionality of the CutLang interpreter and framework were also enhanced. Many internal and external functions were added to CutLang to address direct requirements of the various implemented analyses. Running different analyses on events also allowed us to thoroughly test the capacity of CutLang in performing complete, realistic analysis tasks.
We presented the recent developments in CutLang, leading towards a more complete analysis description language and a more robust runtime interpreter. The original syntax of the earlier CutLang prototype version and its event processing methods have been modified after a multitude of discussions with other scientists in the field interested in decoupling the physics analysis algorithms from the computational details, and after implementing many HEP analyses. Modifications include significant enhancement of object definition and event classification expressions, addition of more functions for calculating event variables, incorporation of tables for applying efficiencies, adoption of a system for including external counts, and more. Although these modifications broke the strict backward compatibility with the earlier version of the language, we believe they should be considered as improvements, as they will lead to a cleaner, more robust and more widely accepted analysis description language. The improved syntax processing relies on formal lexical and grammar definition tools widely available in all Unix-like operating systems.

One direct result of the syntax modifications originating from community-wide discussions is that, in the presented version, there is more than one way of expressing the same idea in CutLang. We believe this is a desirable property: after all, in human languages (which we try to imitate) as well, the same idea can be expressed in multiple ways. To give an example, rejecting events with a property smaller than a certain threshold amounts to accepting events with that property greater than the same threshold. Such a property should not be considered as a source of potential confusion and error, but as a richness of the language.

CutLang still follows the approach of runtime interpretation.
We strongly believe that direct interpretation of the human readable commands and algorithms, although slower in execution as compared to a compiled binary, leads to faster and less error-prone algorithm development. The possible event processing speed issues can be cured by parallel processing of independent events and regions. The interpreted and human readable nature of CutLang and ADL has a potential area of growth and development: with the advance of machine learning hardware and software tools, the dream of being able to perform an LHC-type analysis just by talking to the computer in one's native tongue might not be too far-fetched.

Finally, as any language, CutLang/ADL grows with the people that use it to solve new problems. With every analysis requiring a new functionality, the list of already-solved problems grows. We hope that such an internal library, together with the script-assisted addition of external user functions, will allow the analysts of the future to spend less time on previously solved problems and to focus their energy on innovating solutions to the analysis problems of the post-LHC era colliders.
References

[1] S. Sekmen and G. Unel, CutLang: A Particle Physics Analysis Description Language and Runtime Interpreter, Comput. Phys. Commun. (2018) 215–236, [arXiv:1801.05727].
[2] G. Unel, S. Sekmen, and A. M. Toon, CutLang: a cut-based HEP analysis description language and runtime interpreter, (2019) [arXiv:1909.10621].
[3] G. Brooijmans et al., Les Houches 2015: Physics at TeV colliders - new physics working group report, (2016) [arXiv:1605.02684].
[4] G. Brooijmans et al., Les Houches 2017: Physics at TeV Colliders New Physics Working Group Report, (2018) [arXiv:1803.10379].
[5] G. Brooijmans et al., Les Houches 2019 Physics at TeV Colliders: New Physics Working Group Report, (2020) [arXiv:2002.12220].
[6] M. Drees, H. K. Dreiner, J. S. Kim, D. Schmeier, and J. Tattersall, CheckMATE: Confronting your favourite new physics model with LHC data, Computer Physics Communications (2015) 227–265.
[7] J. S. Kim, D. Schmeier, J. Tattersall, and K. Rolbiecki, A framework to create customised LHC analyses within CheckMATE, Computer Physics Communications (2015) 535–562.
[8] J. Tattersall, D. Dercks, et al., CheckMATE: Checkmating new physics at the LHC, in Proceedings of the 38th International Conference on High Energy Physics (ICHEP2016), 3-10 August 2016, Chicago (2016) 120.
[9] B. Waugh, H. Jung, et al., HZTool and Rivet: Toolkit and Framework for the Comparison of Simulated Final States and Data at Colliders, (2006) [hep-ph/0605034].
[10] A. Buckley, J. Butterworth, et al., Rivet user manual, Computer Physics Communications (2013) 2803–2819.
[11] A. Barr, C. Lester, and P. Stephens, m(T2): The Truth behind the glamour, J. Phys. G (2003) 2343–2363, [hep-ph/0304226].
[12] R. Brun and F. Rademakers, ROOT - An Object Oriented Data Analysis Framework, Nucl. Inst. and Meth. in Phys. Res. A (1997) 81–86.
[13] "Lex and Yacc page." http://dinosaur.compilertools.net.
[14] "CERN Open Data Portal." http://opendata.cern.ch.
[15] A. Rizzi, The Evolution of Analysis Models for HL-LHC, EPJ Web Conf. (2020) 11001.
[16] J. de Favereau, C. Delaere, et al., DELPHES 3: a modular framework for fast simulation of a generic collider experiment, Journal of High Energy Physics (2014).
[17] C. Rogan, Kinematical variables towards new dynamics at the LHC, arXiv:1006.2727.
[18] "CutLang GitHub repository." https://github.com/unelg/CutLang.
[19] "Docker web page."
[20] "ADL LHC analyses repository." https://github.com/ADL4HEP/ADLLHCanalyses.
[21] S. Kraml, S. Kulkarni, et al., SModelS: a tool for interpreting simplified-model results from the LHC and its application to supersymmetry, Eur. Phys. J. C (2014) 2868, [arXiv:1312.4175].
[22] F. Ambrogi, S. Kraml, et al., SModelS v1.1 user manual: Improving simplified model constraints with efficiency maps, Comput. Phys. Commun. (2018) 72–98, [arXiv:1701.06586].
[23] F. Ambrogi et al., SModelS v1.2: long-lived particles, combination of signal regions, and other novelties, Comput. Phys. Commun. (2020) 106848, [arXiv:1811.10624].
[24] A. Paul, S. Sekmen, and G. Unel, Down type iso-singlet quarks at the HL-LHC and FCC-hh, arXiv:2006.10149.
[25] A. Adiguzel, O. Cakir, et al., Evaluating Analysis Description Language Concept as a First Introduction to Analysis in Particle Physics, arXiv:2008.12034.
[26] "26th Vietnam School of Physics: Particles and Dark Matter, 29 Nov 2020 - 11 Dec 2020, Quy Nhon." https://indico.in2p3.fr/event/19437/overview.
[27] "VSOP hands-on exercises." https://github.com/unelg/CutLang/wiki/VSOP26HandsOnEx.
[28] "PDG Particle Identification Numbers." https://pdg.lbl.gov/2013/pdgid/PDGIdentifiers.html.
[29] Particle Data Group, P. A. Zyla, et al., Review of Particle Physics, Progress of Theoretical and Experimental Physics (2020) 083C01, [https://academic.oup.com/ptep/article-pdf/2020/8/083C01/33653179/ptaa104.pdf].
[30] M. Matsumoto and T. Nishimura, Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Trans. Model. Comput. Simul. (1998) 3–30.

A User Manual
All information about ADL and CutLang, including publications, talks and twikis with syntax rules, can be accessed through the following portal:

https://cern.ch/adl

The code for CutLang is hosted in the GitHub repository

https://github.com/unelg/CutLang

which provides up-to-date instructions on how to install, compile and run CutLang.
A.1 Blocks and keywords
An ADL file consists of blocks based on a keyword value/expression structure. The blocks allow a clear separation of analysis components. A typical block looks as follows:

blockkeyword blockname

Table 5 lists the available blocks, their purposes and associated keywords, and Table 6 lists the keywords. The details on their applications are given in the following sections.

Table 5: Blocks in ADL and CutLang
Block | Purpose | Related keywords
object / obj | Object definition block. Produces an object type from an input object type by applying selections. | take, select, reject
region / algo | Event categorization. | select, reject, weight, bin, sort, counts, histo, save
info | Contains analysis information such as the experiment, center-of-mass energy, luminosity, publication details, etc. |
table | Generic block for tabular information, such as efficiency values versus variable ranges | tabletype, nvars, errors
countformat | Expresses the processes for which external counts are included and the format of counts | process
A.2 Predefined physics objects
Basic physics objects and their properties currently available in CutLang are defined in Table 7. The predefined particles are initially sorted in order of decreasing transverse momentum, and their indices start at zero. With the current implementation, all the predefined particle names and commonly used function names have become case-insensitive. For the particle indices, both Python-type and LaTeX-type notations are accepted: the former with square brackets, and the latter with an underscore character. An example for electrons is given below:

Ele_0 = ELE_0 = Ele[0] = ele[0] = electron_0 = electron[0]
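The equivalence of these notations can be illustrated with a small Python sketch that maps each spelling to a single canonical (name, index) pair. This is not how the CutLang parser is implemented (it relies on Lex and Yacc grammar definitions); the `normalize` function and the alias table below are hypothetical and cover only the spellings shown in the example.

```python
import re

# Only the alias that appears in the example above; not a complete list.
ALIASES = {"electron": "ele"}

# A name followed by either _<index> or [<index>].
_PATTERN = re.compile(r"^([a-z]+)(?:_(\d+)|\[(\d+)\])$")


def normalize(reference):
    """Map an object reference such as 'Ele[0]' to a canonical ('ELE', 0) pair."""
    match = _PATTERN.match(reference.strip().lower())  # names are case-insensitive
    if match is None:
        raise ValueError(f"unrecognized object reference: {reference!r}")
    base, index_underscore, index_bracket = match.groups()
    base = ALIASES.get(base, base)
    index = int(index_underscore if index_underscore is not None else index_bracket)
    return base.upper(), index


for text in ["Ele_0", "ELE_0", "Ele[0]", "ele[0]", "electron_0", "electron[0]"]:
    print(text, "->", normalize(text))
```

All six spellings resolve to the same canonical pair, which is the behaviour the equality chain above expresses.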
Sometimes it is necessary to refer to the whole object set or just to some of its members. The CutLang notation for these cases is to write the name of the set without any indices for the former (i.e. ELE) and to use the colon notation for the latter (i.e. ELE[0:2] = ELE_0:2).

In CutLang, there are two object types that merit special attention: the lepton and the neutrino types. The LEP keyword refers to a generic lepton, and at runtime it is reduced to an electron or to a muon depending on the choice, as explained in Table 1. This helps the physicist avoid writing two algorithm sections, one for electron-based and another for muon-based analyses. The second object type is related to the taming of the neutrino escaping from the detector. At LHC energies and beyond, for which CutLang is intended, the W bosons are generally produced with a sufficient boost such that, in the leptonic decays, the pseudorapidity of the charged lepton is not very different from that of the chargeless one. Therefore this particular physics object benefits from this approximation to define a massless and chargeless particle with transverse momentum and azimuthal angle (φ) values extracted from the missing transverse energy (MET) measurements. The pseudorapidity, however, is taken equal to that of the charged lepton with the same particle index.

Table 6: Keywords in ADL and CutLang
Keyword | Purpose | Related block
define | Define variables, constants | -
select | Select objects or events based on criteria that follow the keyword. | object, region
reject | Reject objects or events based on criteria that follow the keyword. | object, region
take / using / : | Define the mother object type | object
sort | Sort an object in an ascending or descending order wrt a property. | region
weight | Weight events | region
histo | Fill histograms | region
process | Specify process and the format for which external counts are given | countformat
counts | Give external counts | region
tabletype | Specifies type of the table | table
nvars | Number of variables in a table | table
errors | Type of errors indicated in a table | table
title, experiment, id, publication, sqrtS, lumi, arXiv, hepdata, doi | Provide information about the analysis (see Table 16) | info

Table 7: Basic physics object nomenclature in CutLang
Name | Keyword | First object | Second object | (j+1)th object
Electron | ELE | ELE[0], ELE_0 | ELE[1], ELE_1 | ELE_j
Muon | MUO | MUO[0], MUO_0 | MUO[1], MUO_1 | MUO_j
Tau | TAU | TAU[0], TAU_0 | TAU[1], TAU_1 | TAU_j
Lepton | LEP | LEP[0], LEP_0 | LEP[1], LEP_1 | LEP_j
Photon | PHO | PHO[0], PHO_0 | PHO[1], PHO_1 | PHO_j
Jet | JET | JET[0], JET_0 | JET[1], JET_1 | JET_j
Fat Jet | FJET | FJET[0], FJET_0 | FJET[1], FJET_1 | FJET_j
b-tagged Jet | BJET | BJET[0], BJET_0 | BJET[1], BJET_1 | BJET_j
light Jet | QGJET | QGJET[0], QGJET_0 | QGJET[1], QGJET_1 | QGJET_j
Neutrino | NUMET | NUMET[0], NUMET_0 | NUMET[1], NUMET_1 | NUMET_j
MET | METLV | METLV[0], METLV_0 | - | -
generator particle | GEN | GEN[0], GEN_0 | GEN[1], GEN_1 | GEN_j
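The neutrino approximation described above can be written out explicitly: the transverse momentum and azimuthal angle come from the MET, the pseudorapidity from the charged lepton, and the mass is zero. The Python sketch below shows the resulting four-vector components; the function name and arguments are illustrative and do not correspond to CutLang internals.

```python
import math


def neutrino_from_met(met, met_phi, lepton_eta):
    """Build a massless four-vector (E, px, py, pz) for the neutrino candidate.

    met, met_phi : magnitude and azimuthal angle of the missing transverse energy
    lepton_eta   : pseudorapidity of the charged lepton with the same index
    """
    px = met * math.cos(met_phi)
    py = met * math.sin(met_phi)
    pz = met * math.sinh(lepton_eta)       # pz = pT * sinh(eta)
    energy = met * math.cosh(lepton_eta)   # massless particle: E = |p| = pT * cosh(eta)
    return energy, px, py, pz


energy, px, py, pz = neutrino_from_met(met=50.0, met_phi=0.3, lepton_eta=1.2)
print(energy, px, py, pz)
```

Because the energy equals the momentum magnitude, the invariant mass of the constructed object vanishes, as required for the massless particle described in the text.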
A.3 Predefined functions
Functions in CutLang can be used for accessing object attributes, or for computing new variables from object or event quantities. Functions for accessing object attributes can be directly related to Lorentz vectors, such as mass, momentum, rapidity etc., or be related to other variables found in some commonly used ntuples. In both cases, both the function syntax with parentheses and the attribute syntax with curly braces can be used. Functions used for computing new quantities can use object attributes or other already calculated quantities or constants. The currently available object attribute functions in CutLang are listed in Table 8. Note that some of the attributes listed here are only valid for certain input types, e.g. for CMS NanoAOD, but not for others, e.g. for Delphes. The functions used for computing new quantities are listed in Table 9.

One should note that in CutLang, adding particles can be achieved either by writing them one after the other separated by space(s), or by using a + sign. Both notations are equally valid. Additionally, one should use a comma as the separation character for the functions requiring multiple arguments.

The internal functions, such as angular distance or transverse momentum, are also case-insensitive in CutLang, though they are written in this manuscript with a certain syntax (first letter upper case) for clarity in reading. External functions can also be downloaded and added to the CutLang library. The instructions for this operation are described in Appendix B.

Table 8: Functions and syntax for object attributes in CutLang.
Meaning | Syntax 1 | Syntax 2
Lorentz vector-related attributes
Mass of | m( ) | { }m
Charge of | q( ) | { }q
Phi of | Phi( ) | { }Phi
Eta of | Eta( ) | { }Eta
Absolute value of Eta of | AbsEta( ) | { }AbsEta
Rapidity of | Rep( ) | { }Rep
Pt of | Pt( ) | { }Pt
Pz of | Pz( ) | { }Pz
Energy of | E( ) | { }E
Momentum of | P( ) | { }P
Other attributes
PDGID of a particle | PDGID( ) | { }PDGID
DeepB b-tagging discriminator of a jet | btagDeepB( ) | { }btagDeepB
Is the jet b-tagged? | bTag( ) | { }bTag
Soft Drop mass of a jet | msoftdrop( ) | { }msoftdrop
N-subjettiness variable 1 | tau1( ) | { }tau1
N-subjettiness variable 2 | tau2( ) | { }tau2
N-subjettiness variable 3 | tau3( ) | { }tau3
Leptonic diTau invariant mass | fMTauTau( ) | { }fMTauTau
Transverse impact parameter | dxy( ) | { }dxy
Longitudinal impact parameter | dz( ) | { }dz
Lepton identification variable | softId( ) | { }softId
Relative isolation for leptons | miniPFRelIsoAll( ) | { }miniPFRelIsoAll
MVA based tau ID | dMVAnewDM2017v2( ) | { }dMVAnewDM2017v2
σ_iηiη for photons | sieie( ) | { }sieie
Isolation variable | reliso( ) | { }reliso
Isolation variable | relisoall( ) | { }relisoall
Isolation variable | pfreliso03all( ) | { }pfreliso03all
Tau decay mode id | iddecaymode( ) | { }iddecaymode
Tight ID and isolation flag | idisotight( ) | { }idisotight
Tight anti ele ID for taus | idantieletight( ) | { }idantieletight
Tight anti mu ID for taus | idantimutight( ) | { }idantimutight
Tight ID for muons | tightid( ) | { }tightid
PU ID for jets | puid( ) | { }puid
Index of matched genparticle to a lepton | genpartidx( ) | { }genpartidx
Tau decay mode | decaymode( ) | { }decaymode
Tau isolation | tauiso( ) | { }tauiso
Muon soft ID | softId( ) | { }softId
Table 9: Functions and syntax for computing new quantities in CutLang.

Meaning                                        Syntax 1               Syntax 2
Angular distance between                       dR( )                  { }dR
Phi difference between                         dPhi( )                { }dPhi
Eta difference between                         dEta( )                { }dEta
Missing transverse energy in the event         MET                    -
sum of jet transverse momenta                  HT( )                  -
partitioning objects into 2 megajets           fmegajets( )           { }fmegajets
Razor variable MR                              fMR( )                 { }fMR
Razor variable MTR                             fMTR( )                { }fMTR
partitioning objects into 2 hemispheres        fhemisphere( )         { }fhemisphere
transverse mass MT2                            fMT2( )                { }fMT2

A.3.1 PDGID of particles

Each type of particle recognized in particle physics is assigned a unique code by the Particle Data Group (PDG) in order to facilitate the interface between event generators, detector simulators, and analysis packages. These codes are known as PDGID (or PDG ID), and this method is called the MC particle numbering scheme [28]. The numbering includes elementary particles such as electrons, neutrinos, Z bosons etc., composite particles (mesons, baryons etc.) and atomic nuclei. Hypothetical particles beyond the Standard Model also have PDGIDs. Particles have a positive PDGID, whereas antiparticles have a negative one. The PDGIDs of some particles are listed in Table 10.

Table 10: PDGID of some elementary particles [29]

Quarks        Leptons        Bosons
d    1        e−    11       γ     22
u    2        µ−    13       Z     23
s    3        τ−    15       W+    24

An example:

    select PDGID(LEP[0]) == -11

This command selects events in which the leading lepton is a positron. (The positron is the antiparticle of the electron, therefore it has a negative PDGID.)
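As a hedged illustration of the sign convention described above, the short Python sketch below maps a few PDGIDs to particle labels. The dictionary `PDG_NAMES` and the helper names are hypothetical and are not part of CutLang; only the numerical codes are taken from the PDG numbering scheme.

```python
# Hypothetical lookup table for illustration; the codes follow the PDG numbering scheme.
PDG_NAMES = {
    1: "d", 2: "u", 3: "s",
    11: "e-", 13: "mu-", 15: "tau-",
    22: "gamma", 23: "Z", 24: "W+",
}


def is_antiparticle(pdgid):
    """Antiparticles carry the negated code of the corresponding particle."""
    return pdgid < 0


def describe(pdgid):
    """Return a readable label, prefixing antiparticles with 'anti-'."""
    name = PDG_NAMES[abs(pdgid)]
    return "anti-" + name if is_antiparticle(pdgid) else name


def is_positron(pdgid):
    # Same comparison as `PDGID(LEP[0]) == -11` in the text.
    return pdgid == -11
```

In this sketch a positron is labelled `anti-e-`, which simply reflects that its code is the negated electron code.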
A.4 Mathematical operators and functions
Mathematical functions available in CutLang are listed in Table 11. Trigonometric and logarithmic functions are implemented with their usual meanings. The Heaviside step function (or unit step function) hstep, which was also added recently, is a discontinuous function, named after Oliver Heaviside, whose value is zero for negative arguments and one for positive arguments. The reducer functions for minimization and maximization, min and max, which were added recently, are discussed in Appendix A.9.5. The reducer function size / count returns the number of elements of a given set, such as the number of electrons.

Table 11: Mathematical and logical operators

Meaning                    Operator                     Meaning                    Operator
number of                  Size( ) Count() NumOf()      absolute value             abs()
tangent                    tan()                        hyperbolic tangent         tanh()
sine                       sin()                        hyperbolic sine            sinh()
cosine                     cos()                        hyperbolic cosine          cosh()
natural exponential        exp()                        natural logarithm          log()
square root                sqrt()                       Heaviside step function    hstep()
as close as possible       ~=                           usual meaning              + - / *
as far away as possible    ~!                           to the power               ^

A.5 Comparison, range and logical operators
CutLang understands the basic mathematical comparison expressions and logical operations. C/C++ operator notations and their Fortran counterparts are recognized and correctly interpreted. Additionally, square brackets are used to define inclusive or exclusive ranges. The available comparison, range and logical operators can be found in Table 12.

Table 12: Comparison, range and logical operators in CutLang

Keywords              Explanation
> >= == <= <          usual meaning
GT GE EQ LE LT        usual meaning
!= NE                 not equal
[ ]                   in the interval
] [                   not in the interval
NOT                   logical not
AND and &&            logical and
OR or ||              logical or
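The semantics of the operators in Table 12 can be sketched in Python as follows. This is only an illustration and not the CutLang implementation: the names `COMPARATORS`, `compare`, `in_interval` and `outside_interval` are hypothetical, and treating the interval ends as included is an assumption of this sketch.

```python
import operator

# C/C++ style symbols and their Fortran-style counterparts map to the same test.
COMPARATORS = {
    ">": operator.gt, "GT": operator.gt,
    ">=": operator.ge, "GE": operator.ge,
    "==": operator.eq, "EQ": operator.eq,
    "<=": operator.le, "LE": operator.le,
    "<": operator.lt, "LT": operator.lt,
    "!=": operator.ne, "NE": operator.ne,
}


def compare(lhs, op, rhs):
    """Evaluate `lhs op rhs` for any operator keyword listed above."""
    return COMPARATORS[op](lhs, rhs)


def in_interval(value, low, high):
    """`value [] low high`: true inside the interval (ends included by assumption)."""
    return low <= value <= high


def outside_interval(value, low, high):
    """`value ][ low high`: the logical complement of in_interval."""
    return not in_interval(value, low, high)
```

With these helpers, `compare(3, "GE", 2)` and `compare(3, ">=", 2)` give the same answer, mirroring the equivalence of the two notations.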
A.5.1 Logical operations
The use of Boolean operators (AND, OR, NOT) makes it easy to write event selection criteria. In CutLang, the logical AND and logical OR operators had already been used to combine multiple event selection criteria. The newly implemented logical NOT greatly simplifies writing event selection criteria in the analysis code. The simplest example illustrating the syntax is:

    select NOT Size(ELE) > 4

This command selects events which do NOT have more than 4 electrons. However, the advantage of the NOT operator becomes more apparent when negating more complex selections. The event selection criteria can be combined using the logical AND, OR and NOT. For example:

    select (NOT condition1) AND (condition2 OR condition3)

Now let us look at another example:

    select Size(ELE) == 2
    select NOT ( {ELE[0] ELE[1]}q == 0 AND {ELE[0] ELE[1]}m [] 80 100 )

The criterion ( {ELE[0] ELE[1]}q == 0 AND {ELE[0] ELE[1]}m [] 80 100 ) can be used for defining Z bosons. As we have set NOT, we veto events with a Z boson while looking for other dilepton signatures. Without the NOT operator, this selection would not be so straightforward, and would require a more complicated expression.
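The logic of this Z veto can be sketched in Python as below. This is a minimal illustration under stated assumptions, not CutLang code: the `Electron` class, the Cartesian four-vector layout and the function names are hypothetical, and the interval ends 80 and 100 are treated as included.

```python
import math
from dataclasses import dataclass


@dataclass
class Electron:
    e: float   # energy
    px: float
    py: float
    pz: float
    q: int     # electric charge


def pair_mass(a, b):
    """Invariant mass of the summed four-vectors of a and b."""
    e = a.e + b.e
    px, py, pz = a.px + b.px, a.py + b.py, a.pz + b.pz
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def is_z_candidate(a, b):
    # {ELE[0] ELE[1]}q == 0 AND {ELE[0] ELE[1]}m [] 80 100
    return (a.q + b.q == 0) and (80 <= pair_mass(a, b) <= 100)


def passes_z_veto(electrons):
    # select Size(ELE) == 2, then select NOT (Z-like pair)
    if len(electrons) != 2:
        return False
    return not is_z_candidate(electrons[0], electrons[1])
```

Written without NOT, the same veto would have to be spelled out as "total charge non-zero OR mass outside the interval", which is the more complicated expression the text alludes to.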
A.5.2 Ternary operator
Conditional selection criteria, including nested statements, are available using a syntax similar to that of C++:

    condition ? true-case : false-case

The following example illustrates a use case: if the number of muonsVeto particles is equal to 1, then the MTm quantity should be less than 100; otherwise the MTe quantity should be less than 100:

    Size(muonsVeto) == 1 ? MTm < 100 : MTe < 100
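For readers more familiar with Python, the same decision can be written with a conditional expression. This is only a sketch of the logic; the function and argument names are hypothetical stand-ins for the quantities in the example above.

```python
def passes_mt_cut(n_veto_muons, mt_m, mt_e):
    """Mirror of `Size(muonsVeto) == 1 ? MTm < 100 : MTe < 100`."""
    # Only one of the two comparisons decides the outcome, depending on the condition.
    return (mt_m < 100) if n_veto_muons == 1 else (mt_e < 100)
```

The value of the branch that is not chosen plays no role in the result, e.g. `passes_mt_cut(1, 50, 500)` passes even though the second quantity is large.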
A.6 χ² minimization

In an analysis with a multitude of objects of the same type, the analyst could search for the best combination defined by some criterion. A typical example, used in fully hadronic tt̄ reconstruction, would be to find the jet combination that yields the best W boson mass, or to find the two charged leptons that result in the best Z boson mass. The need for such a search can be expressed in CutLang using two special comparison operators: ~= and ~!. The former is used in the sense of “as close as possible to”, whereas the latter means “as far as possible from”. These two operators can be used to express χ² minimization kinds of operations. The indices of the particles in such a search are to be given as negative. For example, the statement “find two leptons with a combined invariant mass as close as possible to 90.1 GeV” can be expressed in CutLang notation as {LEP_-1 LEP_-1}m ~= 90.1. In this case, CutLang finds the best pair of particles satisfying the condition, and stores it per event for possible later use. However, the analyzer should not use negative indices directly inside the region block. It is a much better practice, which improves readability, to define a new object such as define ZLepRec = LEP[-1] LEP[-1]. This definition can be used when defining histograms or other selection criteria, such as when selecting the charge of the found lepton pair, etc. If another particle of the same type (e.g. another lepton) is to be found, it is necessary to use a different but still negative index value.

A.7 Definitions

ADL and CutLang allow alias names to be assigned to constants (e.g. Z boson mass) or variables (e.g. angular variables between objects, mass of the Z boson reconstructed from two leptons, etc.). The syntax and examples are given in Table 13. Note that the keyword define can also be shortened as def.

Table 13: Simple definitions

Keyword    argument 1    symbol    argument 2     Example
define     name          : / =     value          define mZprime = 500
define     name          : / =     function       define mTop1 : m(Top1)
define     name          : / =     particle(s)    define Zreco : ELE[0] ELE[1]

A.8 Tables
The present version of CutLang incorporates tables to implement various HEP-related quantities, such as efficiencies, acceptances or trigger turn-on curves. Currently only one- and two-dimensional tables can be used. These tables should have a name and a table type, specified by the tabletype keyword, where the latter defines what information is hosted by the table. Currently, only efficiency tables are recognized, therefore the table type information only serves as documentation and is not used by the interpreter. However, as other uses for tables are developed, the table type will become more relevant. Tables must also specify the number of variables (1 or 2) using the nvars keyword, as well as the availability of errors on the central value (true or false) using the errors keyword. These should be followed by the table data, using the notation

    value [lower-error upper-error] lower-limit1 upper-limit1 [lower-limit2 upper-limit2]

Once defined in the definitions section, the table can be referred to and used in object and event selection. An example table is shown below:

    table tightmuoneff
    tabletype efficiency
    nvars 2
    errors true
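To make the row notation more concrete, the sketch below shows how a two-variable table with errors could be looked up in Python. It is an illustration only: the numerical values are invented, the names `TIGHT_MUON_EFF` and `lookup` are hypothetical, and both the half-open ranges and the default value returned outside the table are assumptions rather than documented CutLang behaviour.

```python
# Each row: value, lower-error, upper-error, lower-limit1, upper-limit1, lower-limit2, upper-limit2
# The numbers are invented for illustration and do not come from any real measurement.
TIGHT_MUON_EFF = [
    (0.90, 0.02, 0.03, 10.0, 50.0, 0.0, 1.2),
    (0.80, 0.03, 0.03, 10.0, 50.0, 1.2, 2.4),
    (0.95, 0.01, 0.01, 50.0, 1000.0, 0.0, 1.2),
    (0.85, 0.02, 0.02, 50.0, 1000.0, 1.2, 2.4),
]


def lookup(rows, x, y, default=0.0):
    """Return the central value of the first row whose two ranges contain (x, y)."""
    for value, _err_low, _err_up, low1, up1, low2, up2 in rows:
        # Assumption of this sketch: lower limits are included, upper limits are not.
        if low1 <= x < up1 and low2 <= y < up2:
            return value
    return default
```

For instance, `lookup(TIGHT_MUON_EFF, 30.0, 0.5)` picks the first row, while a point outside all ranges falls back to the default.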
A.9 Manipulating objects
A.9.1 Defining new objects
New objects can be declared using a simple syntax:

    object new_object_name : base_object_name

where the object keyword can also be shortened as obj, and instead of the symbol :, the keywords using and take can be used. The base object name can be a base object class, or a previously defined new object type, as in the case of defining b-tagged jets from already defined high transverse momentum jets. These are usually called derived objects. An example defining a derived new electron type based on predefined electrons would be written as:

    obj goodEle : ELE

Both : and = can be used interchangeably. A derived jet object with selection criteria can be written as:

    object AK4jets
      take JET
      select {JET_}Pt > 30
      select {JET_}AbsEta < 2.4

It is also possible to create a new object by forming a group out of multiple base or derived objects, for example, to create a lepton object from electrons and muons. This is achieved using the Union function, as shown below. This particular case of new object creation does not use any selection.

    object leps : Union( MUO , ELE, TAU)
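Conceptually, a derived object is a filtered copy of its base collection and a union is a concatenation of collections. The Python sketch below illustrates this idea under stated assumptions: the `Particle` class and the helpers `derive` and `union` are hypothetical, and the union simply keeps the objects in the order the collections are given, which may differ from what the interpreter does.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    pt: float
    eta: float


def derive(base, *cuts):
    """Keep the objects of `base` that satisfy every selection in `cuts`."""
    return [p for p in base if all(cut(p) for cut in cuts)]


def union(*collections):
    """Group several collections into one, without applying any selection."""
    return [p for collection in collections for p in collection]


def make_ak4_jets(jets):
    # Mirrors: select {JET_}Pt > 30 and select {JET_}AbsEta < 2.4
    return derive(jets, lambda j: j.pt > 30, lambda j: abs(j.eta) < 2.4)
```

A derived collection can itself serve as the base of a further `derive` call, which corresponds to building new objects from previously defined ones.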
A.9.2 Sorting objects
By default, objects are sorted according to their transverse momentum, pT, in descending order. For example, ELE[0] denotes the electron having the highest transverse momentum. In some cases, objects may need to be sorted according to some other property, such as energy, pseudorapidity etc. In the current version, this can be done as:

    sort {ELE_}E ascend

This command sorts electrons according to their energy in ascending order, i.e. ELE[0] will have the lowest energy. Sorting can also be done in descending order by using descend.

A.9.3 Object combinatorics
Let us assume that we have an event with 5 jets, and we would like to reconstruct the hadronic Z bosons. What are the combinations? Numbering the jets from 1 to 5, some possibilities are given in Table 14, in the left columns. It is obvious that not all possibilities are listed, and finally only one possibility can be true: after all, a jet cannot be used to reconstruct two different Z bosons. On top of this, other requirements might be applied to further restrict the possible Z candidates. For example, there might be a pseudorapidity range limit on each candidate, the transverse momentum of the jets forming the Z boson could be limited, the angular separation between the hadronic Z candidate and the first constituent jet might be limited, and finally, the invariant mass of the Z candidate might be requested to be in a certain range. After all these restrictions, the same initial set might be reduced to the combinations listed in the right side columns of Table 14, where the candidates that did not pass the requirements are shown as stroked out.

Table 14: Combining two jets to reconstruct a hadronic Z boson

possibility ID   Zhadronic   Zhadronic        possibility ID   Zhadronic   Zhadronic
1                12          34               1                12          34
2                12          35               2                12          35
3                12          45               3                12          45
4                13          24               4                13          24
5                13          25               5                13          25
6                13          45               6                13          45
...              ...         ...              ...              ...         ...

This combination example can be written in CutLang as:

    object hZs : COMB( jets[-1] jets[-2] ) alias ahz

In order to activate this new object, and eliminate the combinations that do not satisfy the requirements, one has to put a selection command into the running algorithm (or region); this could be, for example, to have at least two hadronic Z candidates per event:

    algo testCombinations
    select Size(jets) >= 2
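The enumeration behind Table 14 can be reproduced with a few lines of Python. This sketch only illustrates the combinatorics and is not how CutLang is implemented; the function name is hypothetical.

```python
from itertools import combinations


def disjoint_pair_combinations(indices):
    """List every way of forming two Z candidates from two non-overlapping jet pairs."""
    pairs = list(combinations(indices, 2))
    result = []
    for first, second in combinations(pairs, 2):
        # A jet cannot be used to reconstruct two different Z bosons.
        if set(first).isdisjoint(second):
            result.append((first, second))
    return result
```

For five jets numbered 1 to 5 this gives 15 possibilities, and the first six follow the ordering of the rows listed in Table 14, starting with jets (1, 2) paired with jets (3, 4).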
As indicated by the right side of Table 14, there are still multiple possibilities, such as rows 2, 4 and 5. To further reduce these by removing the overlapping candidates and leaving a single valid one, some sort of ideal condition should be specified. This can be achieved using the previously discussed χ² minimization. As an example case, let us require the masses of both candidates to be as close as possible to the known Z mass. Now, the final algorithm is given as:

    object hZs : COMB( jets[-1] jets[-2] ) alias ahz

A.9.4 Looping over a subset of the object collection

By default, CutLang loops over all objects in a given collection. However, sometimes it is necessary to loop only over a subset, such as looping only through the first 3 jets. ADL and CutLang allow the subset to be specified, e.g. as jets[0:3].

A.9.5 Minimum and maximum of object attributes
Looping over objects can be used for selecting the minimum or maximum of a function based on any object attribute. An explanatory example could be to apply a selection based on the minimum value of the angular separation between each of the three most energetic jets and the most energetic electron. In CutLang, this criterion can be expressed as:

    select Min( dR(JET[0:2], ELE[0]) ) > 0.9
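A minimal Python sketch of this reducer is given below, assuming the usual definition of the angular distance, ΔR = sqrt(Δη² + Δφ²), with Δφ wrapped into (-π, π]. The function names and the representation of objects as (eta, phi) tuples are hypothetical, and the jets are assumed to be already ordered so that the first three are the ones of interest.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    d = phi1 - phi2
    while d > math.pi:
        d -= 2.0 * math.pi
    while d <= -math.pi:
        d += 2.0 * math.pi
    return d


def delta_r(a, b):
    """Angular distance between two objects given as (eta, phi) tuples."""
    return math.hypot(a[0] - b[0], delta_phi(a[1], b[1]))


def min_dr(jets, electron, n_leading=3):
    """Smallest dR between the first n_leading jets and the electron."""
    # Raises ValueError for an empty jet list; a real analysis must decide how to treat that case.
    return min(delta_r(jet, electron) for jet in jets[:n_leading])
```

The selection in the text then corresponds to requiring `min_dr(jets, electron) > 0.9`.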
A.9.6 Summing object attributes
CutLang allows looping over an attribute to calculate the sum of its values. A typical example would be the sum of transverse momenta of a set of jets. Although this frequently used function is predefined and available as HT, it could also be written as:

    select Sum( pT(JET) ) >= 20
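The same reduction can be sketched in Python as a sum over an attribute of every object in a collection. The helper names and the dictionary-based jet representation are hypothetical and only serve the illustration.

```python
def sum_attribute(objects, attribute):
    """Sum the values of one attribute over all objects in a collection."""
    return sum(obj[attribute] for obj in objects)


def passes_sum_pt(jets, threshold=20.0):
    # Mirrors: select Sum( pT(JET) ) >= 20
    return sum_attribute(jets, "pt") >= threshold
```

An empty collection sums to zero in this sketch, so such an event would fail any positive threshold.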
A.9.7 Object constituents
Sometimes, the analysis might necessitate a selection based on jet constituents. CutLang allows the modifier word constituents only in the case of jets (or any other jet-like objects, such as the large-radius FatJets) to refer to these. An example for defining a new jet object based on criteria on the constituents would be:

    object goodJet using JET
    select q(JET constituents) == 0

Here the first criterion removes all the charged constituents of each jet, and eventually the jet itself if it has no constituents left, whereas the second criterion imposes an upper limit of 40 GeV on the sum of the transverse momenta of the remaining constituents of each jet. All other functions available in CutLang would work in the same way.
A.9.8 Daughter particles
While defining a new particle based on MC truth information, it is sometimes necessary to access the daughters of a given particle. CutLang is capable of accessing the daughters of an MC truth particle. In the following example, the first selection criterion filters the particles that decayed into two or more daughters, while the second criterion is used to select only the daughters with electric charge.

    object DVcandidates take GEN
    select daughters( GEN ) > 1
A.9.9 Hit and miss method
The ApplyHM function can be used to define new objects which pass or fail the efficiency test in that particular region of the parameter space. In CutLang, the random number generation is achieved via the TRandom3 [30] function in the ROOT libraries. The time cost of a call to this function is reported to be about 5 ns on an Intel i7 CPU running at 2.6 GHz.

An example for electrons recorded by an imaginary detector whose electron detection efficiency is described by a table called myDet can be written as:

    object myElectron
    take ELE
    select applyHM( myDet({ELE}pT , {ELE}Eta) == 1)

The analysis algorithm can make use of this newly defined object, myElectron, to apply selection criteria, such as the available number of electrons per event etc.
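The hit and miss method itself can be sketched in a few lines of Python: an object is kept when a uniform random number falls below its efficiency. This is an illustration under stated assumptions and not the CutLang implementation: Python's `random.Random` stands in for TRandom3, and the names `apply_hm`, `select_detected` and `efficiency_of` are hypothetical.

```python
import random


def apply_hm(efficiency, rng):
    """Return 1 (hit) if a uniform number in [0, 1) is below the efficiency, else 0 (miss)."""
    return 1 if rng.random() < efficiency else 0


def select_detected(electrons, efficiency_of, rng):
    """Keep the electrons that pass the hit and miss test.

    `efficiency_of` is a hypothetical callable returning the efficiency for one
    electron, e.g. a lookup in a table indexed by pT and eta.
    """
    return [ele for ele in electrons if apply_hm(efficiency_of(ele), rng) == 1]
```

Passing an explicitly seeded generator, e.g. `random.Random(12345)`, makes the outcome reproducible, which is convenient when validating an analysis.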
A.10 Manipulating Events
A.10.1 Selecting or rejecting events
The conditions based on which an event can be selected or rejected are written in the region / algo blocks. They start with the select or reject keywords, and are expressed in the form of functions applied on particles, complemented by a comparison operator and a limit value. An example for select would be

    select Size(goodEle) >= 2

The synonyms cut and cmd can be used interchangeably in place of the select keyword. The keyword reject is equivalent to select not, thus rejecting the events that match the given criteria, as in the example below:

    reject {ELE[0] ELE[1]}q == 0 AND {ELE[0] ELE[1]}m [] 80 100
There are also some special keywords that require further discussion. These are shown in Table 15. Select ALL accepts all events; for example, it can be used for event counting purposes. The next two are scale factors mostly used in ATLAS-related analyses. For other input file formats these scale factors are automatically set to unity.

Table 15: Special conditions in CutLang

Keywords   Example       Explanation
ALL        select ALL    accept all events
LEPsf      cmd LEPsf     apply leptonic scale factor to MC events
bTagSF     cmd bTagSF    apply b-jet tagging scale factor to MC events

A.10.2 Weighting events
Many analyses require events to be weighted for cross section and luminosity, for trigger efficiencies, or with various scale factors. CutLang has a mechanism for applying constant event weights or event weights from functions, for which examples are shown below:

    weight trigEff 0.95
    weight ef2Weight myWeight({ELE_0}pT, {ELE_0}Eta)

The first command sets the weight of the selected events to 0.95, i.e., if the number of selected events is 1000 in the beginning, it will now be counted as 950. The second command is a slightly more complicated example, as it uses a table which defines the event weight according to two parameters: pT and η. The event weight is thus obtained from that table according to the attributes of the electron with the highest transverse momentum.

A.10.3 Saving events
In CutLang, it is possible to save the currently surviving events at any stage of the running algorithm. The events are saved into a ROOT [12] file using the command save followed by the user-given file name without the .root extension, which is automatically added. It is possible to save multiple times in a single algorithm (region) or in multiple algorithms. The events in the output file are saved in the native format of CutLang, known as the lvl0 file. An example could be:

    Save preselects
A.11 Bins, counts and histograms
A.11.1 Bins
In analyses dealing with multiple bins for signal and/or background regions, CutLang provides a simple way for defining the selections for those bins. The binning of the results should happen as the very last stage of a selection by using the keyword bin. Either the variable and the bin boundaries should be explicitly listed, or multiple bins can be assigned to any variable or function using CutLang syntax. These two methods are not mutually exclusive and can define overlapping regions. It is to be noted that for the former, one defines two implicit bins: anything below the first value, and anything above the last value are also recorded separately. Results from binning are both printed (depending on the switches in the initialization section of the ADL file) and recorded as a histogram in the output ROOT file. The example below illustrates the utilization of the bin definition in an analysis algorithm:

    bin MET 250 300 500 750 1000
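The bookkeeping implied by an explicit list of boundaries, including the two implicit bins, can be sketched in Python with the standard bisect module. The function names are hypothetical, and the treatment of a value lying exactly on a boundary (assigned here to the bin above it) is an assumption of this sketch.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Index 0 is the implicit bin below the first edge; len(edges) is the one above the last."""
    return bisect_right(edges, value)


def count_bins(values, edges):
    """Count how many values fall into each of the len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bin_index(value, edges)] += 1
    return counts
```

With the boundaries 250 300 500 750 1000 this yields six bins in total: four explicit ones plus the two implicit bins below 250 and from 1000 upwards.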
A.11.2 Counts
It is possible to register various signal, background or data counts of a region together with their associated errors. The method to achieve this task is to start the ADL file with the definitions of various count formats. Below are two such examples where, for each format type, multiple processes with different names can also be defined.

    countsformat results
      process est, "Total estimated BG", stat, syst
      process obs, "Observed data"

    countsformat bgests
      process lostlep, "Lost lepton background", stat, syst
      process zinv, "Z --> vv background", stat, syst
      process qcd, "QCD background", stat, syst
A study described in an ADL file might use data counts or a background estimate or all of these for a statistical analysis. Therefore, the appropriate region has to contain the associated event counts and error information using the correct syntax. It should be consistent with the previous definitions, starting with the keyword counts. Here the counts of each process should be separated by a comma, and the errors can be specified either as symmetrical, denoted with the +- sign, or as asymmetrical, denoted with separate + and - signs. An example conforming to the above definitions is given below.

    counts results 230.0 + 16.0 - 10.0 + 10.0 - 12.0 , 224.0
    counts bgests 105.0 +16.0 - 10.0 +-1.0 , 123.0 +-2.0 +-12.0 , 2.3 +-0.5 +-1.4

Once the analysis run is complete, the user finds in the output file a histogram for each of the defined processes, with the name defined in the format commands. These histograms can be recalled and used later during the statistical analysis stage.

Figure 3: An example output from ROOT's TBrowser GUI showing histograms booked and filled by CutLang.
A.11.3 Histograms
CutLang allows defining 1D and 2D histograms for any event variable. The syntax for defining histograms closely follows the notation in ROOT. Any histogram should have a name, like h1mReco, and a list of parameters separated by commas. The explanation of the histogram contents should be given in quotation marks, e.g. "Z candidate mass (GeV)"; followed by the number of bins and the lower and upper limits as numbers; and finally the quantity to histogram in ADL notation, e.g. {ELE_0 ELE_1}m. A similar syntax is also used for 2D histograms. The example below shows definitions of 1D and 2D histograms:

    region Wtopmass
      select ...
      select ...
      hmW, "W mass (GeV)", 70, 50, 150, mW
      hmTop, "Top mass (GeV)", 70, 0, 700, mTop
      hmTopmW, "Top and W mass correlation (GeV)", 50, 50, 150, 70, 0, 700, mW, mTop
Apart from the user-defined histograms, CutLang by default automatically fills and saves a cutflow efficiency histogram for each analysis region. In case binning exists, CutLang also saves a histogram with bin counts. Figure 3 shows a snapshot of the ROOT TBrowser, with the histograms in an output file listed, and one of the histograms displayed.
A.12 Structure of a complete ADL file
To be run with CutLang, an ADL file should follow a definite structure order as described in Section 4.1. In this structure, there are mostly optional sections and one compulsory section. The structure order consists of initialization, count format, definitions, new objects, more definitions using new objects, yet newer objects, and event categorization commands. In this list only the event categorization commands are mandatory. The ADL file structure allows multiple concurrent commands to be executed. The details of the first and the last sections are covered next.
A.12.1 Initialization and information section
Some of the possible settings in the initialization section have already been discussed in Table 1. It is also possible to include, in this section, some information defining the work that is being done. The allowed keywords and their meanings are explained in Table 16.

Table 16: Information keywords of CutLang

Keywords       Type      Explanation
info           ID        a name defining the work
experiment     ID        a name defining the experiment
id             string    any string defining the work
title          string    any string for the paper title
publication    string    any string, the publication information
sqrtS          number    a real number, the collider energy (GeV)
lumi           number    a real number, collected data (fb-1)
arXiv          string    any string containing the arxiv information
hepdata        string    any string containing the hepdata information
doi            string    any string containing the doi information
A.12.2 Regions and algorithms
CutLang can execute multiple commands in the event categorization section of the ADL file, meaning that the analyst can test multiple methods on the same events independently of each other during design, or work with multiple signal and control regions. The set of commands to be executed for each independent method is called either an algorithm or a region, therefore the keyword to be used is algo or algorithm or region followed by a user-selected name, such as:

    region preselection

Moreover, it is possible to define one (1) layer of dependency such that a region can be marked as dependent on another. In this case, the independent region's commands are executed first and the results are saved in a memory cache, and later the dependent region's commands are executed based on that cache. A typical case would be to create multiple signal regions based on a common preselection. This example is illustrated below. Note that the name of the independent region has been used in the dependent region's list of commands directly, without any preceding keywords.

    region preselection
      select ....

    region SRA
      preselection
      select ....

    region SRB
      preselection
      select ....
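The caching behaviour described above can be illustrated with the following Python sketch. It is a simplified model and not the interpreter's actual data flow: the function `run_regions`, the dictionary layout and the event representation are all hypothetical.

```python
def run_regions(events, regions):
    """Evaluate regions with at most one layer of dependency.

    `regions` maps a region name to a (parent, cuts) pair, where parent is the
    name of an independent region or None, and cuts is a list of predicates.
    """
    cache = {}
    # Independent regions run first and their surviving events are cached.
    for name, (parent, cuts) in regions.items():
        if parent is None:
            cache[name] = [ev for ev in events if all(cut(ev) for cut in cuts)]
    # Dependent regions start from the cached result of their parent.
    for name, (parent, cuts) in regions.items():
        if parent is not None:
            if regions[parent][0] is not None:
                raise ValueError("only one layer of dependency is supported")
            cache[name] = [ev for ev in cache[parent] if all(cut(ev) for cut in cuts)]
    return cache
```

In this model the preselection is evaluated once, and each signal region only applies its own additional cuts to the cached events.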
B The CutLang framework
B.1 Installation and compilation

The code for the CutLang framework can be found at https://github.com/unelg/CutLang

The ROOT library from CERN should be pre-installed. After downloading the source code, the make command should be executed in the CLA subdirectory to compile the whole program. Analyses in CutLang are run in the runs subdirectory using the script CLA.sh or CLA.py. This subdirectory contains several example files that demonstrate various aspects of ADL and CutLang. When running an analysis, the input ROOT file type can be one of: LHCO FCC LVL0 DELPHES ATLASVLL ATLMIN ATLASOD CMSOD CMSNANO. The -i or --inifile option is used for specifying the ADL file.

B.2 External user functions
The addition of new, so-called external user functions to the existing set of internal functions is partially automated. The Python helper script insertExternalFunction.py in the scripts directory was developed to accomplish this task. It accepts the name of the header file containing the new function as an argument. The automation currently works with a template-based setup, and therefore only with certain types of functions. Currently the following input and return types can be used for building an external function into CutLang:

• receives a vector of TLorentzVectors and an int, returns a vector of TLorentzVector;
• receives a vector of TLorentzVectors, returns a double;
• receives a vector of TLorentzVectors and a TVector2, returns a double;
• receives a vector of TLorentzVectors and a TLorentzVector, returns a double;
• receives 3 TLorentzVectors, returns a double.

The external function must be defined in a header file before running the script. The script is run with the following command:

    python insertExternalFunction.py -ext abc

where abc is the name of the header file without the .h extension.