zfit: scalable pythonic fitting
Jonas Eschle, Albert Puig Navarro, Rafael Silva Coutinho, Nicola Serra
Physik-Institut, Universität Zürich, Zürich (Switzerland)
Abstract
Statistical modeling is a key element in many scientific fields and especially in High-Energy Physics (HEP) analysis. The standard framework to perform this task in HEP is the C++ ROOT/RooFit toolkit, whose Python bindings are only loosely integrated into the scientific Python ecosystem. In this paper, zfit, a new alternative to RooFit written in pure Python, is presented. Most of all, zfit provides a well-defined high-level API and workflow for advanced model building and fitting, together with an implementation on top of TensorFlow, allowing a transparent usage of CPUs and GPUs. It is designed to be extendable in a very simple fashion, allowing the usage of cutting-edge developments from the scientific Python ecosystem in a transparent way. The main features of zfit are introduced, and its extension to data analysis, especially in the context of HEP experiments, is discussed.
Keywords:
Model fitting, data analysis, statistical inference, Python
1. Introduction
Data collected by experiments such as the Large Hadron Collider (LHC) at CERN or ultra-energetic cosmic ray detectors are analysed in terms of physics-motivated mathematical models. Typically, these models have a parametric form, and the values of their parameters are the quantities of interest in a given analysis. Once a model has been defined, a fitting procedure that minimizes the disagreement between the data and the model is applied to find the optimal values of these parameters, a crucial step in almost every analysis.

Model fitting as done in High-Energy Physics (HEP) has some peculiarities and differs from other fields. Models are often multidimensional, built by composing complicated, custom shapes, and have a non-trivial normalization. In addition, two further requirements need to be fulfilled: on one hand, reasonable scaling with data size and model complexity, motivated by the large data samples being collected at the LHC; and, on the other hand, the capability not only to find optimal values for the parameters, but also to calculate their uncertainties with enough flexibility, taking all correlations into account, to allow for advanced statistical treatment, as uncertainties of measurements are as important as their central values.

In recent years, HEP analysis has been moving more and more towards the usage of Python over the more traditional C++. The moving force behind this paradigm shift is the existence of an extremely rich scientific computing Python ecosystem [1, 2], which provides a collection of efficient, easy-to-use tools for data analysis that are shared across a very large community, including scientists from various fields and industry, especially in data science, and on which one can build new applications.

The de facto standard for model fitting in HEP is RooFit [3], which is integrated in the ROOT toolkit [4] and provides a very powerful object-oriented architecture, as well as plotting capabilities and an advanced statistics module. It is written in C++ and can be accessed in Python through the ROOT Python bindings. These bindings come with their own set of non-negligible problems: complex memory management that easily leads to memory leaks, difficult integration with the scientific computing Python stack, lack of extendability from Python, and a complex installation procedure due to the need to install the full ROOT toolkit.

General-purpose Python-based libraries such as SciPy [5], lmfit [6] or TensorFlow Probability [7] do not fulfill all HEP fitting requirements, and therefore cannot be used as-is.

Email addresses: [email protected] (Jonas Eschle), [email protected] (Albert Puig Navarro), [email protected] (Rafael Silva Coutinho), [email protected] (Nicola Serra)

Preprint submitted to SoftwareX May 21, 2020
Python-based HEP-specific fitting packages, such as probfit [8], TensorProb [9] or TensorFlowAnalysis [10], have tried to address these issues and can be considered as proofs-of-concept that HEP-centric model fitting is possible in Python; none of them, however, offers the same fitting functionalities and ease of use as RooFit does while still integrating well with the scientific computing Python ecosystem.

The goal of zfit is to fill this important gap in the HEP software ecosystem and provide a well-defined API and workflow for model fitting together with a Python-based package that is fast, scalable and extensible.
2. Software description
The structure of a generic model fitting workflow is illustrated in Fig. 1. Its basic goal is to maximize the agreement of a set of data points to a probability distribution described by a set of parameters. The metric of the goodness (or, more precisely, the disagreement) of the predicted model with respect to the experimental data is referred to as the loss function. One of the most used loss functions in HEP is based on the likelihood function, in particular the negative log-likelihood, which is minimized in order to estimate the parameters of the model. Once the optimal values of the model parameters have been determined, several statistical techniques can be used to calculate their uncertainties.

Figure 1: Schematic zfit workflow view of the individual structural blocks.

The structure defined by model - data - loss - minimize can also be used to describe a typical Deep Learning workflow, even more clearly by replacing model by "Neural Network" and minimize by "Training". This similarity has profound implications in the design of the zfit implementation, as it allows the use of an industry-standard Deep Learning tool such as TensorFlow [11] (TF) as its computational backend, through the low-level API that consists of mathematical functions. The immediate benefits for zfit in terms of performance arise from key features of TF such as the capacity to run on heterogeneous architectures, mathematical optimizations such as automatic differentiation (required by the most used minimization algorithms in HEP), and operation caching. This puts the performance of zfit in the same order of magnitude as the existing standard, as shown in Appendix A.

zfit does not intend to be a large, monolithic package and therefore has a clear, restricted scope: model fitting and sampling with a well-defined and consistent API which is mostly independent of the implementation.
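The model - data - loss - minimize structure described above can be sketched with plain NumPy and SciPy. This is an illustrative sketch only, not the zfit API; all numerical values are made up:

```python
import numpy as np
from scipy import optimize, stats

# Sketch of the generic workflow: model (Gaussian), data (toy sample),
# loss (negative log-likelihood), minimize (SciPy optimizer).
rng = np.random.default_rng(42)
data = rng.normal(loc=5.2, scale=0.3, size=10_000)   # "data" step

def nll(params):                                      # "loss" step
    mu, sigma = params
    # "model" step: Gaussian pdf evaluated on the data
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))

# "minimize" step: bounds keep sigma strictly positive
result = optimize.minimize(nll, x0=[5.0, 0.5],
                           bounds=[(4.5, 6.0), (1e-3, 2.0)])
mu_hat, sigma_hat = result.x
```

The estimates mu_hat and sigma_hat end up close to the generating values (5.2 and 0.3); their uncertainties would come from the curvature of the loss at the minimum.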
Tasks such as data processing, plotting and statistical analysis are left for other packages to tackle.

The abstraction of the fitting process in the steps shown in Fig. 1 is one of the main features of the architecture of zfit: by developing each step as a separate "brick" of the library, it becomes possible to tightly integrate with the scientific computing Python ecosystem, and to use the available tools to improve performance, reduce code duplication and add extra functionality. Each of these bricks (model, data, loss, minimizer and fit result) provides override hooks for all its core functionality, allowing to completely customize or replace any aspect of their behavior. As hinted at above, these objects deal internally with TF. Since using TF in the correct way adds another layer of difficulty, most of these additional complications are hidden from the user, and the library offers a similar user experience as the one found in other model fitting libraries, independent of the actual backend.

Model fitting handles multiple dimensions. There are observables, the dependent variables, over which a model is usually normalized, and parameters, which are adjusted during the minimization of the loss. In contrast to other frameworks such as RooFit, zfit makes a strong distinction between observables and parameters; this removes any ambiguity in terms of normalization and allows zfit to stay closer to Deep Learning and to the optimized way TF works. The observables and corresponding limits in which the model fit is performed are handled by the Space object. Space objects can be single- or multi-dimensional, and, thanks to the use of observables, can be combined using the Python arithmetic operators to tackle arbitrarily complex use cases.

Models are defined inside a given Space and implement a shape function that returns a value given some input data.
Custom models are easily implemented using their functional form and by default receive standard attributes such as numerical integration and sampling. These generic functionalities can also be extended with analytical integration and specific sampling if provided by the user. Models can be combined to build more complex models (including increasing the number of dimensions) while keeping automatic integration, observable handling and event sampling. The normalization is done over a range specified in the obs, or further customized by overriding the _normalization method manually. Models are parametrized by Parameter objects, which can be used in the implementation of the shape function like any other scalar. Independent parameters are the only type of objects that are left free in the fit. Composed parameters can be built by combining parameters through mathematical operators and can be used like independent parameters.

The loading, ordering and processing of data in zfit is handled uniformly, no matter their source, by the Data object. While only the most important formats are currently implemented (ROOT trees, NumPy arrays and Pandas DataFrames, as well as TF tensors), the simple API of the Data object makes it easy to add additional data handling capabilities.

Data and Models are combined to build a Loss function, which can either be pre-defined, such as the negative log-likelihood, or completely arbitrary. The computation of the value of the loss for a given configuration of the model is efficiently implemented in zfit thanks to TF. It is also possible to add constraints to the loss in order to modify its behavior. Constraints are additional terms to the likelihood that often represent some external information on a parameter. While the most used constraints in HEP, such as Gaussian constraints, are directly implemented in zfit, arbitrary custom functions can also be used.

Minimization is performed through stateless Minimizer objects. Since the loss calculation is usually the most intensive part of the minimization problem, in practice it is possible to wrap any existing Python minimizer to be used by zfit. Several minimizers are included in zfit, such as the SciPy optimizers, the most-used minimizer in HEP, called Minuit [12], or the TF implementation of the Adam optimizer [13].

The results of the minimization process are returned in a FitResult object, which stores all the information about the fit parameters and the minimization process, as well as the used instances of the Minimizer and the Loss. This includes the function minimum and the covariance matrix. While approximate uncertainties of the fit parameters can be calculated directly inside zfit using the Hessian matrix or a simple profiling method, the calculation of precise parameter uncertainties and confidence intervals is left to specialized packages, such as those in Refs. [14, 15], which do not need any object other than the FitResult.
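The effect of a Gaussian constraint on a loss can be sketched with plain NumPy/SciPy (again illustrative only, not the zfit API, with made-up numbers): a hypothetical external measurement mu_ext ± sigma_ext adds the term ((mu − mu_ext)/sigma_ext)²/2 to the negative log-likelihood, pulling the fitted value towards it:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
data = rng.normal(5.2, 0.3, size=1_000)
mu_ext, sigma_ext = 5.0, 0.01   # hypothetical external measurement

def nll(params, constrained):
    mu, sigma = params
    loss = -np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))
    if constrained:
        # Gaussian constraint: extra -log of a normal term in the likelihood
        loss += 0.5 * ((mu - mu_ext) / sigma_ext) ** 2
    return loss

bounds = [(4.0, 6.0), (1e-3, 2.0)]
free = optimize.minimize(nll, x0=[5.0, 0.5], args=(False,), bounds=bounds)
constrained = optimize.minimize(nll, x0=[5.0, 0.5], args=(True,), bounds=bounds)
```

The constrained minimum lands between the data-only estimate and mu_ext, with weights given by the respective uncertainties.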
3. Illustrative examples
The architecture and features of zfit are better visualized with a typical HEP example: a fit to the invariant mass of multiple particles including both signal and background events, extended to multiple dimensions, and introducing a control channel.
Let’s assume the domain of interest of the mass observable, in this case the B meson mass, is the range [4.5, 6.0] GeV. In zfit this is expressed with a Space defining our domain:

    obs = zfit.Space(obs="mass", limits=(4.5, 6.0))
The best interpretation of the observable at this stage is that it defines the name and range of the observable axis.

Data can be loaded, for example, from a NumPy array, where the obs of the Space are matched by order with the columns of the array:

    data = zfit.Data.from_numpy(array=numpy_array_data, obs=obs)

with numpy_array_data being a NumPy array holding our data. The signal component of the fit will be modelled using a Gaussian (normal) distribution, which has two free parameters, its mean and its width:

    mu = zfit.Parameter("mu", 5.2, 4.5, 6.0)
    sigma = zfit.Parameter("sigma", 20, 1, 40)

where the arguments for the parameters are: a unique name, an initial value, and a lower and an upper bound. The bounds are not mandatory; the parameter will be floating in both cases. These can be used to instantiate the Model as:

    signal = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)

While a selection of common shapes is provided in the pdf module for convenience, the philosophy of zfit is to provide a clear API with powerful mechanisms to facilitate that the users and the community can implement their own. Making use of another built-in model, the background is parametrized as an exponential:

    bkg = zfit.pdf.Exponential(-0.001, obs=obs)
In order to build the sum of these two models, an additional parameter frac is used to describe the fraction of each species:

    frac_sig = zfit.Parameter("signal fraction", 0.5, 0, 1)
    model = zfit.pdf.SumPDF(pdfs=[signal, bkg], fracs=frac_sig)

With the model and the data built, a loss function can be constructed. In this case, the goal is to perform an unbinned maximum likelihood fit, so the corresponding Loss object is initialized as

    nll = zfit.loss.UnbinnedNLL(model=model, data=data)
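What a sum with a single fraction computes can be checked numerically: the density frac * signal(x) + (1 - frac) * background(x) stays normalized over the fit range when both components are. A plain NumPy sketch (illustrative shapes with a hypothetical signal width, not zfit objects):

```python
import numpy as np
from scipy import stats

lo, hi = 4.5, 6.0                      # fit range, as in the Space above
x = np.linspace(lo, hi, 100_001)
dx = x[1] - x[0]

def signal_pdf(x):
    # Gaussian renormalized to the fit range (width 0.3 is made up)
    g = stats.norm(loc=5.2, scale=0.3)
    return g.pdf(x) / (g.cdf(hi) - g.cdf(lo))

def background_pdf(x, lam=-0.001):
    # exponential exp(lam * x), normalized on [lo, hi]
    norm = (np.exp(lam * hi) - np.exp(lam * lo)) / lam
    return np.exp(lam * x) / norm

frac = 0.5
total = frac * signal_pdf(x) + (1 - frac) * background_pdf(x)
# trapezoidal integral of the summed density over the fit range
integral = np.sum((total[:-1] + total[1:]) / 2) * dx
```

The integral evaluates to 1 up to numerical precision, independently of the value of frac.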
Finally, the minimization process is executed, first instantiating a Minimizer object, in this case using the Minuit minimizer, and then running the minimization algorithm:

    minimizer = zfit.minimize.MinuitMinimizer()
    result = minimizer.minimize(nll)
The outcome of this minimization is stored in a FitResult object, which provides access to all details of the convergence process and to the best values estimated for the free parameters, together with a nice printout of the parameters and uncertainties. For example, one could obtain the fitted value of the signal mean as

    if result.converged:
        mu_value = result.params[mu]["value"]

Notice that mu, the parameter object, and not "mu", its name, is used as the key. The params object stores additional information, such as estimated uncertainties if they have been calculated. Following a similar approach to the reference implementation of RooFit, the fraction of the last species in a SumPDF corresponds to 1 − Σ frac_i.

3.2. Advanced usage

The next step is to extend the model to take into account extra dimensions of the problem, in this case the angular distributions of the decay products of the B meson. To do so, the signal distribution is modelled with a custom model, which is created by inheriting from one of the base classes provided by zfit:

    from zfit import z

    class AngularSignalPDF(zfit.pdf.ZPDF):
        _PARAMS = ["FH", "AFB"]
        _N_OBS = 1

        def _unnormalized_pdf(self, x):
            costhetal = z.unstack_x(x)
            FH = self.params["FH"]
            AFB = self.params["AFB"]
            pdf = (3 / 4 * (1 - FH) * (1 - z.square(costhetal))
                   + 0.5 * FH + AFB * costhetal)
            return pdf
Instead of BasePDF, we use here an even simpler class, ZPDF, which only requires specifying the naming of the parametrization via the _PARAMS attribute and allows, optionally, defining the dimensionality through _N_OBS. The latter also allows creating n-dimensional Models, as x will be a list of length _N_OBS containing the data. Internally, the class handles the management of the free parameters, which can be accessed by their parametrization name through the params attribute. It also provides a simple way to access the input data with z.unstack_x. The module z provides some TF functions wrapped for better handling of dtypes, as well as additional methods, and can be mixed with pure TF functions.

    fh = zfit.Parameter("FH", 1.)
    afb = zfit.Parameter("AFB", 0.5)
    obs_costhetal = zfit.Space("thetal", limits=(-1, 1))
    angular = AngularSignalPDF(obs=obs_costhetal, FH=fh, AFB=afb)

To create a two-dimensional model, one way is to multiply the models describing the different dimensions. Given that their observables are different, the new model will automatically be defined in the combined space of the two:

    combined_model = angular * model

By adding a control channel that shares parameters with the signal model, an improved determination of mu can be obtained. To do so, a second Gaussian is created to model this control mode:

    obs_control = zfit.Space("control", limits=(5.2, 5.6))
    sigma_control = zfit.Parameter("sigma control channel", 2, 0, 5)
    gauss_control = zfit.pdf.Gauss(obs=obs_control, mu=mu, sigma=sigma_control)

A simultaneous Loss can be built either by adding two independent Loss functions or by providing both models and datasets in a list as

    nll = zfit.loss.UnbinnedNLL([combined_model, gauss_control],
                                [data, data_control])

where data_control has been created analogously to data. This simultaneous Loss can be minimized as shown before.
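Incidentally, the custom angular shape above is already normalized on cos θ_l ∈ [−1, 1] for any FH and AFB, since ∫ 3/4 (1−FH)(1−c²) dc = 1−FH, ∫ FH/2 dc = FH and ∫ AFB·c dc = 0. A quick NumPy check of this (pure Python, independent of zfit):

```python
import numpy as np

def angular_pdf(c, FH, AFB):
    # same functional form as _unnormalized_pdf above
    return 3 / 4 * (1 - FH) * (1 - c**2) + 0.5 * FH + AFB * c

c = np.linspace(-1.0, 1.0, 200_001)
dc = c[1] - c[0]
for FH, AFB in [(1.0, 0.5), (0.2, -0.1), (0.0, 0.0)]:
    vals = angular_pdf(c, FH, AFB)
    integral = np.sum((vals[:-1] + vals[1:]) / 2) * dc  # trapezoid rule
    assert abs(integral - 1.0) < 1e-9
```

The automatic numerical normalization that zfit applies by default therefore leaves this particular shape essentially unchanged.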
4. Impact and conclusions
The zfit package is a crucial piece required to complete the transition of HEP analysis to a stack fully based on the scientific computing Python ecosystem. The importance of this paradigm change should not be underestimated: given the limited availability of resources in HEP, especially in areas such as software development, the possibility of integrating physics analysis within a well-developed and well-supported software ecosystem, allowing to focus mainly on the HEP-specific parts of the analysis software, will have long-term repercussions on the quality and usability of HEP analysis software.

The package itself is not only an implementation of a versatile and powerful model fitting library, but also specifies a carefully designed and well-discussed API for the most crucial components and the workflow. With the aim of becoming a standard of its kind, other packages, such as high-level statistics and plotting tools, can be built against the standardized API instead of a concrete implementation, strongly reducing package dependencies. In addition, other packages can focus on the specific implementation of a certain task, such as building a specialized Loss, Model or Minimizer, without the need to also implement the other parts of the workflow themselves.

Finally, fits in HEP analysis often involve large datasets and complicated models, and require far more computing power than a single core. Scalability in terms of size and complexity of the data and models, and the possibility to run on large computing infrastructures, including accelerators such as GPUs, is therefore a necessity. TensorFlow, the graph-based computing backend of zfit, was built for the sole purpose of efficient large-scale computing and takes care of the non-trivial task of optimal workload parallelization.
This not only yields state-of-the-art performance, but also enables the HEP community to keep up with the latest algorithm implementations and low-level computing instructions at virtually no cost.

In summary, a new high-level statistical modeling and fitting library purely designed within, and for, the Python ecosystem is presented. Future plans for zfit include full support for binned models, further extension of baseline HEP features, as well as standardized, human-readable serialization of objects.

Acknowledgements
We are grateful to Anton Poluektov, Chris Burr and Igor Babuschkin for demonstrating the potential of unbinned model fitting within the context of TensorFlow, which inspired this work. We also thank the Zurich LHCb group, Matthieu Marinangeli, Josh Bendavid, Lukas Heinrich and the HSF community, especially the Scikit-HEP project members, for useful discussions. A. Puig, R. Silva Coutinho, J. Eschle and N. Serra gratefully acknowledge the support of the Swiss National Science Foundation (SNF) under contracts 168169, 174182 and 182622.

Appendices

Appendix A. Performance
A series of studies is performed to evaluate the execution time of zfit in comparison to the conventional RooFit library. The performance depends on two aspects: the complexity of the fit (given by the number of free parameters) and the data sample size. In this study, the sum of 9 Gaussian functions with a common mean and width parameter, but with different central values, is used as the baseline model.

Ensembles of Monte Carlo pseudoexperiments are generated and fitted with different setup configurations:

i. Nominal zfit implementation on CPU, using an analytic gradient provided by TF. In this case, an initial run is done to remove the graph compilation time. While not significant, this provides a more realistic estimation.

ii. Alternatively, the analytic gradient provided by TF is disabled, denoted by the addition "nograd". Instead, the Minuit minimizer calculates a numerical approximation of the gradient internally.

iii. The default CPU zfit setting is replaced by the GPU implementation.

iv. A standard RooFit implementation using the Python bindings with PyROOT (version 6.16.00). The parallelization is enabled when invoking the fitTo method and equals the number of cores available.

For each setup, twenty toys are generated with sample sizes from 128 to 8 million events (except for the GPU, where it goes only up to 4 million) and the average fitting time is used as reference point. Note that for simplicity the Minuit algorithm is used in all cases.

Figure A.2 compares the performance of the different fit scenarios, which can be summarized as follows:

- As the number of events increases, the execution time of RooFit monotonically increases.

- There is no advantage in using parallelization for very few calculations, since the overhead of splitting and collecting the results is dominant. With an increasing number of events this becomes negligible, and thus the execution time of zfit increases much more slowly than that of RooFit.
This also comes from the fact that more events mean a more stable loss shape, so the minimizer used in zfit performs better. The fitting performance is examined using a 12-core Intel i7 8850H at 2.60 GHz with 32 GB RAM, and a Nvidia P1000 with 4 GB RAM. The GPU used in these tests has a relatively small memory; performing larger-than-memory computations and multi-GPU support are still work in progress. It is worth mentioning that the RooFit parallelization strategy is rather simple and has very little overhead; several improvements in this direction are ongoing. This, however, hints at an important point of using TF in zfit, namely that this kind of work is factored out and superior performance can come for free.

Figure A.2: Averaged time per fit for a sum of Gaussians with shared mean and width as a function of the number of generated events.

- The GPU is highly efficient in computing thousands of events in parallel. For only a few data points, though, the overhead of moving data back and forth dominates, resulting in a worse performance in this regime.

As a conclusion, the speed on a multicore CPU system is comparable between zfit and RooFit. For a large number of events, zfit typically outperforms RooFit, with a similar response for both GPU and a multicore system.

Required Metadata

Current code version

C1 Current code version: 0.3.6
C2 Permanent link to code/repository used for this code version: https://github.com/zfit/zfit
C3 Legal Code License: BSD-3
C4 Code versioning system used: Git
C5 Software code languages, tools, and services used: Python, TensorFlow
C6 Compilation requirements, operating environments & dependencies: Linux and MacOS supported (Windows functionality unknown); Python 3.6+; dependencies with version bounds as specified in requirements.txt: tensorflow, tensorflow-probability, scipy, uproot, pandas, numpy, iminuit, typing, colorlog, texttable, ordered-set
C7 If available, link to developer documentation/manual: https://zfit.readthedocs.io/en/0.3.6/
C8 Support email for questions: [email protected]

Table A.1: Code metadata.

Current executable software version

S1 Current software version: 0.3.6
S2 Permanent link to executables of this version: https://github.com/zfit/zfit/releases/tag/0.3.6
- Dependencies (as specified in requirements.txt): tensorflow, tensorflow-probability, scipy, uproot, pandas, numpy, iminuit, typing, colorlog, texttable, ordered-set
S6 If available, link to user manual (if formally published, include a reference to the publication in the reference list): https://zfit.readthedocs.io/en/0.3.6/
S7 Support email for questions: [email protected]
Table A.2: Software metadata.

References

[1] T. E. Oliphant, Python for Scientific Computing, Comput. Sci. Eng. 9 (3) (2007) 10–20. doi:10.1109/MCSE.2007.58.

[2] K. J. Millman, M. Aivazis, Python for Scientists and Engineers, Comput. Sci. Eng. 13 (2) (2011) 9–12. doi:10.1109/MCSE.2011.36.

[3] W. Verkerke, D. P. Kirkby, The RooFit toolkit for data modeling, eConf C0303241 (2003) MOLT007. arXiv:physics/0306116.

[4] R. Brun, F. Rademakers, ROOT: An object oriented data analysis framework, Nucl. Instrum. Meth. A389 (1997) 81–86. doi:10.1016/S0168-9002(97)00048-X.

[5] E. Jones, T. Oliphant, P. Peterson, et al., SciPy: Open source scientific tools for Python, [Online; accessed 11/07/2019] (2001–).

[6] M. Newville, R. Otten, A. Nelson, A. Ingargiola, T. Stensitzki, D. Allan, A. Fox, F. Carter, Michał, D. Pustakhod, Y. Ram, Glenn, C. Deil, Stuermer, A. Beelen, O. Frost, N. Zobrist, G. Pasquevich, A. L. R. Hansen, T. Spillane, S. Caldwell, A. Polloreno, andrewhannum, J. Zimmermann, J. Borreguero, J. Fraine, deep42-thought, B. F. Maier, B. Gamari, A. Almarza, lmfit/lmfit-py (Dec. 2019). doi:10.5281/zenodo.598352.

[7] J. V. Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, D. Moore, B. Patton, et al., TensorFlow Distributions (Nov. 2017). arXiv:1711.10604.

[8] P. Ongmongkolkul, C. Deil, C. hsiang Cheng, A. Pearce, E. Rodrigues, H. Schreiner, M. Marinangeli, L. Geiger, H. Dembinski, scikit-hep/probfit (Nov. 2018). doi:10.5281/zenodo.1477852.

[9] I. Babuschkin, C. Burr, S. Neubert, K. Brügge, TensorProb (Apr. 2019). URL https://tensorprob.github.io/tensorprob/

[10] A. Poluektov, A. Mauri, A. Merli, A. Mathad, M. Martinelli, A. Morris, Using TensorFlow for amplitude fits - the TensorFlowAnalysis package (Jul. 2018). doi:10.5281/zenodo.1415412.

[11] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, et al., TensorFlow: Large-scale machine learning on heterogeneous systems, software available from tensorflow.org (2015).

[12] F. James, M. Roos, Minuit: A System for Function Minimization and Analysis of the Parameter Errors and Correlations, Comput. Phys. Commun. 10 (1975) 343–367. doi:10.1016/0010-4655(75)90039-9.

[13] D. P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, in: 3rd International Conference on Learning Representations, San Diego, 2015. arXiv:1412.6980.

[14] M. Marinangeli, lauztat (Jun. 2019). doi:10.5281/zenodo.3249383.

[15] M. Marinangeli, B. Pollack, E. Rodrigues, J. Eschle, scikit-hep/hepstats (Nov. 2019). doi:10.5281/zenodo.3519200.