Novel approaches to estimation of acceptance and background distributions in multidimensional amplitude analyses

Abhijit Mathad, Daniel O'Hanlon, Anton Poluektov

Physik-Institut, Universität Zürich, Zürich, Switzerland
Department of Physics, University of Warwick, Coventry, United Kingdom
INFN Sezione di Bologna, Bologna, Italy
Aix Marseille Univ, CNRS/IN2P3, CPPM, Marseille, France
February 6, 2019
Abstract
Amplitude analyses are a powerful technique to study heavy hadron decays. A significant complication in these analyses is the treatment of instrumental effects, such as background and selection efficiency variations, in the multidimensional kinematic phase space. This paper reviews conventional methods to estimate efficiency and background distributions and outlines the method of density estimation using Gaussian processes and artificial neural networks. Such techniques see widespread use elsewhere, but have not gained popularity in use for amplitude analyses. Finally, novel applications of these models are proposed, to estimate background density in the signal region from the sidebands in multiple dimensions, and a more general method for model-assisted density estimation using artificial neural networks.

∗ Corresponding author. E-mail address: [email protected]

1 Introduction
Amplitude analysis of hadron decays is a powerful technique employed in many flavour physics studies, such as measurements of CP violation, searches for effects beyond the Standard Model, spectroscopic studies of excited hadrons, and searches for previously unobserved hadronic states. In this kind of analysis, multidimensional kinematic distributions of the decay products of a parent particle are studied to reveal the dynamical structure of the decay amplitude [1]. In addition to the decay dynamics, the kinematic distributions are in general affected by non-uniform acceptance, or detection efficiency, and background density, which need to be accounted for in the fit.

In this paper we briefly review the conventional approaches employed by previous analyses, recall a few already known but rarely used methods, and, finally, propose new techniques to model non-uniform acceptance and background distributions. The proposed techniques not only offer more accurate descriptions of these distributions, but also provide improved avenues to control the systematic uncertainties arising from such approaches.

A simple yet typical example of an amplitude analysis is the study of the two-dimensional distribution of a three-body decay of a scalar meson into three scalar mesons: the Dalitz-plot analysis [1, 2]. This is the simplest case where the decay has internal degrees of freedom, yet the amplitude is a function of only two kinematic variables. In more complicated cases, such as decays involving non-spin-zero states or decays with more than three particles in the final state, one is required to analyse kinematic distributions in more than two dimensions. Here we focus on the simple two-dimensional case; however, most of the approaches presented here can be generalised to cases with higher dimensionality.

We avoid any quantitative comparisons of the performance of the illustrated techniques.
The optimal technique for each individual analysis depends on many factors, such as the size of the data sample, the dimensionality of the kinematic phase space, the requirements on statistical and systematic uncertainties, the complexity of the amplitude model, the signal selection procedure, and the structure of the background contributions.

The structure of the paper is as follows: the formalism of multidimensional maximum-likelihood fits is recalled in Section 2, where non-parametric methods to deal with background and acceptance are presented. The samples used to illustrate the background and acceptance parameterisation techniques are described in Section 3. Conventional techniques to parametrise the acceptance distribution are illustrated in Section 4. Further, two rarely used but efficient approaches are presented: a technique using Gaussian processes (Section 5) and density estimation with artificial neural networks (Section 6). Finally, two novel approaches are proposed: a technique for inter- or extrapolation of the background density from the sidebands using one of the techniques above (Section 7), and a model-assisted parameterisation of the background or acceptance density using neural networks (Section 8).

2 Multidimensional maximum-likelihood fit

A typical amplitude analysis in flavour physics deals with the distribution of kinematic variables z that characterise the multibody decay. The goal is to determine the unknown parameters p that characterise the amplitude of the decay A(z|p). Given a set of decay candidates z_i (1 ≤ i ≤ N) obtained in an experiment, an unbinned maximum-likelihood fit is performed to infer the model parameters p.
The negative logarithm of the likelihood minimised in the fit is of the form

    -2\ln\mathcal{L} = -2\sum_{i=1}^{N} \ln F(z_i \,|\, p),    (1)

where N is the size of the sample being fitted and F(z|p) is the normalised probability density of the decay. It depends on the model A(z|p), a normalisation term I, the acceptance ε(z), and the normalised probability density function of the background events, B(z):

    F(z \,|\, p) = \frac{\epsilon(z)\,|\mathcal{A}(z \,|\, p)|^2}{I} + B(z).    (2)

Another instrumental effect that needs to be taken into account in the fit, particularly if the amplitude contains narrow resonant states, is the finite resolution of the kinematic variables z. This effect is beyond the scope of this paper and is not considered.

The contribution of background events is typically obtained by analysing the distribution of selection variables, m. In the simplest case, m is a single variable taken to be the combined invariant mass of the final-state particles, which typically peaks at the mass of the parent particle for signal events and is distributed more uniformly for the background. However, other parameters of the event can also be included in the background selection. Alternatively, instead of treating the background contribution explicitly as in Eq. (2), one can assign to each candidate z_i in the data sample a weight w_i, such that the background contribution is statistically subtracted. This procedure is discussed in more detail in Section 2.2.

While the amplitude A(z|p) is driven by the model of the decay dynamics and is the primary object under study, the experimental effects of background and non-uniform acceptance exist as "nuisance" objects that nevertheless have to be modelled accurately for correct interpretation of the results. Their description, especially in the case of a multidimensional kinematic phase space, often presents a major difficulty in an analysis.
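As a concrete illustration of Eqs. (1) and (2), the sketch below evaluates the fit PDF for a hypothetical one-dimensional "phase space" z ∈ [0, 1]. The amplitude, acceptance and background shapes are invented placeholders (not the models used in this paper), the signal fraction is made explicit, and the normalisation integral I is estimated by Monte-Carlo integration:

```python
import numpy as np

def amplitude_sq(z, p):
    """|A(z|p)|^2 for a toy amplitude: a single Breit-Wigner-like peak."""
    mass, width = p
    return 1.0 / ((z - mass) ** 2 + (width / 2) ** 2)

def acceptance(z):
    """Toy acceptance eps(z), dropping towards the phase-space edges."""
    return z * (1.0 - z)

def background(z):
    """Normalised toy background density B(z): uniform on [0, 1]."""
    return np.ones_like(z)

def pdf(z, p, f_bkg, norm_points=10000):
    """F(z|p) of Eq. (2), with an explicit background fraction f_bkg.
    The normalisation I is estimated on a uniform sample (volume V = 1)."""
    y = np.random.default_rng(1).uniform(0.0, 1.0, norm_points)
    i_norm = np.mean(acceptance(y) * amplitude_sq(y, p))
    signal = acceptance(z) * amplitude_sq(z, p) / i_norm
    return (1.0 - f_bkg) * signal + f_bkg * background(z)

def nll(data, p, f_bkg):
    """Negative log-likelihood of Eq. (1)."""
    return -2.0 * np.sum(np.log(pdf(data, p, f_bkg)))
```

In a real analysis the amplitude would be the coherent sum of resonant contributions; the structure of the likelihood is unchanged.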
Below we review several conventional techniques employed in amplitude analyses to deal with the effects of background and non-uniform acceptance.

2.1 Treatment of acceptance

Non-uniform acceptance is usually handled either explicitly, using a parametric or non-parametric model of the decay density as in Eq. (2), or in an implicit way, by including its effect in the normalisation term of the likelihood. In the latter case, the scattered data from simulation can be used directly, and no functional representation of the acceptance is required.

To demonstrate the implicit approach, consider Eq. (1) minimised in the unbinned fit (the background contribution has been omitted for simplicity):

    -2\ln\mathcal{L} = -2\sum_{i=1}^{N} \ln\left(\frac{|\mathcal{A}(z_i|p)|^2\,\epsilon(z_i)}{I}\right)
                     = -2\sum_{i=1}^{N} \ln|\mathcal{A}(z_i|p)|^2 - 2\sum_{i=1}^{N} \ln\epsilon(z_i)
                       + 2N\ln\left(\frac{V}{M}\right) + 2N\ln\left(\sum_{j=1}^{M} |\mathcal{A}(y_j|p)|^2\,\epsilon(y_j)\right).    (3)

Here the normalisation term I is calculated by taking the mean of the values of the density function on a uniformly distributed sample y_j (1 ≤ j ≤ M) in a volume V. The constant terms that do not depend on the parameters of interest, p, can be omitted, which leads to the following expression:

    -2\ln\mathcal{L} = -2\sum_{i=1}^{N} \ln|\mathcal{A}(z_i|p)|^2 + 2N\ln\left(\sum_{j=1}^{M} |\mathcal{A}(y_j|p)|^2\,\epsilon(y_j)\right).    (4)

The second term in the above equation can be seen as the sum of |A|² calculated over the sample y_j distributed uniformly over the decay phase space, where each event j enters with the weight ε(y_j). The weight of each event can also be interpreted as the probability of that event passing the detector acceptance. Such an interpretation hints at a way to prepare the normalisation sample: one has to generate the decays uniformly in the decay phase space, and then simulate the reconstruction and selection of the events.
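The "implicit acceptance" normalisation of Eq. (4) can be sketched as follows. The toy amplitude and the accept-reject filter standing in for the detector simulation are hypothetical placeholders; the point is that the retained events carry the acceptance weight implicitly, so no parameterisation of ε(z) appears anywhere:

```python
import numpy as np

rng = np.random.default_rng(7)

def amplitude_sq(z, p):
    """Toy |A(z|p)|^2 (not the paper's model): peak position and width."""
    mass, width = p
    return 1.0 / ((z - mass) ** 2 + (width / 2) ** 2)

def accept(z):
    """Hypothetical acceptance: keep each event with probability 4 z (1 - z)."""
    return rng.uniform(size=z.shape) < 4.0 * z * (1.0 - z)

# Normalisation sample: generated uniformly in phase space, then passed
# through the simulated selection.  Only retained events are kept.
y = rng.uniform(0.0, 1.0, 200000)
y_acc = y[accept(y)]

def nll(data, p):
    """-2 ln L of Eq. (4), up to p-independent constants: the
    normalisation sum runs over the accepted simulated events only."""
    norm = np.mean(amplitude_sq(y_acc, p))
    return (-2.0 * np.sum(np.log(amplitude_sq(data, p)))
            + 2.0 * len(data) * np.log(norm))
```

Minimising this function over p recovers the amplitude parameters without any explicit acceptance model, at the price of requiring a large accepted simulation sample.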
The retained events then serve as a normalisation sample for the likelihood, and no further corrections to the acceptance are required. In a real analysis, however, additional weighting of the normalisation sample may be needed to account for imperfections in the simulated sample.

Since no explicit parameterisation of the acceptance distribution is needed, this approach is often used in amplitude analyses with more degrees of freedom than the two-dimensional Dalitz plot, such as Λb → J/ψpK− [3], B+ → J/ψφK+ [4], or D → K−π+π+π− [5] decays, where the amplitudes are described in a five- or six-dimensional phase space. This method is, however, statistically sub-optimal, since it does not exploit the fact that the acceptance distribution can be assumed to be at least locally smooth. As a result, these analyses usually require simulated samples several times larger than the real data samples.

Explicit modelling of the acceptance function is more typically used in two-dimensional Dalitz-plot analyses. The techniques often used are two-dimensional polynomials with the parameters obtained from fitting a simulated sample [6], two-dimensional histograms smoothed with cubic splines [7], and kernel density estimation [8, 9]. Analyses in more than two dimensions do sometimes also use an explicit acceptance parameterisation, albeit with assumptions on the factorisation of the acceptance in some variables [3] to reduce the dimensionality.

2.2 Treatment of background contributions

As in the case of acceptance, backgrounds can be treated in amplitude analyses either in an explicit or an implicit fashion. The implicit inclusion of the background in the likelihood fit can be performed using the sPlot technique [10, 11], where each event is assigned a weight that depends on the value of the selection variables m.
These weights are positive in the signal-dominated regions of the selection variables and negative in the background-dominated regions, such that the contribution of the background events can be statistically subtracted from the likelihood.

This procedure does not require an explicit parameterisation of the background density; however, it suffers from some drawbacks. Firstly, it assumes that the amplitude fit variables are uncorrelated with the selection variables which, as will be demonstrated in Section 7, is an assumption that in general is not well motivated. The presence of correlations will thus introduce a bias in the fit results, especially if the background level is large. Secondly, since the procedure does not make any assumptions on the functional form of the background density, it results in larger statistical uncertainties on the results than when a reasonable functional form is assumed.

3 Illustrative samples

For the purposes of illustration, the D+s → K+π−π+ decay is considered in this paper. As this is a three-body decay with scalar particles in both the initial and final states, its dynamics are fully characterised by two kinematic variables. This decay is also convenient because all the final-state particles are charged tracks, which makes the selection easy to implement in a simplified Monte-Carlo simulation framework. There are no identical particles in the final state, avoiding any need to deal with Bose symmetrisation of the kinematic phase space.
Finally, being a singly Cabibbo-suppressed decay, it is interesting from a physics point of view, since it can exhibit significant CP violation.

Here we choose to parametrise the phase space in terms of the two square Dalitz-plot variables [12], defined as

    m' = \frac{1}{\pi}\arccos\left(2\,\frac{m(K^+\pi^-) - m^{\rm min}_{K\pi}}{m^{\rm max}_{K\pi} - m^{\rm min}_{K\pi}} - 1\right), \qquad
    \theta' = \frac{1}{\pi}\,\theta(K^+\pi^-),    (5)

where m(K+π−) is the invariant mass of the K+π− combination, m^min_Kπ = M_K + M_π and m^max_Kπ = M_{D_s} − M_π are the minimum and maximum values of m(K+π−), M_K and M_π are the masses of the K and π mesons, respectively [13], and θ(K+π−) is the helicity angle of the K+π− combination. While the square Dalitz plot was designed for analyses of B mesons, in order to give more weight to the interference regions between different two-body combinations that predominantly occur in the corners of the conventional Dalitz plot, here it is used purely to avoid complications related to the curved boundaries of the conventional Dalitz plot.

Although the three-body Dalitz-plot analysis is the simplest case for the techniques considered here, they scale well with the dimensionality of the kinematic phase space and can be applied to more complicated amplitude analyses. Similarly, the exact definition of the kinematic phase space (conventional or square Dalitz plots, or various representations of four-body kinematics) is not a limitation for the techniques under study.

3.1 Acceptance for D+s → K+π−π+

The sample of D+s → K+π−π+ decays is simulated using a simplified Monte-Carlo technique, where only the kinematic properties of the initial and final-state particles are considered.
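The mapping of Eq. (5) can be sketched directly. The particle masses below are rounded PDG values in GeV, used for illustration only:

```python
import numpy as np

# Rounded masses in GeV (natural units), for illustration only.
M_DS, M_K, M_PI = 1.968, 0.494, 0.140

def square_dalitz(m_kpi, theta_kpi):
    """Map the K+pi- invariant mass and helicity angle to (m', theta')
    following Eq. (5); both outputs lie in [0, 1]."""
    m_min = M_K + M_PI
    m_max = M_DS - M_PI
    mprime = np.arccos(2.0 * (m_kpi - m_min) / (m_max - m_min) - 1.0) / np.pi
    thetaprime = theta_kpi / np.pi
    return mprime, thetaprime
```

With this convention, m′ = 1 at the Kπ threshold and m′ = 0 at the kinematic limit, so the curved Dalitz-plot boundary maps onto the unit square.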
The production of D+s mesons and the reconstruction of the decay products are inspired by the conditions of the LHCb experiment [14]; however, the numerical values of the parameters used in the simulation are largely arbitrary and differ from those at LHCb.

For the initial D+s mesons, the transverse momentum p_T (the component of momentum perpendicular to the ẑ-axis, which in the case of a proton-proton collider corresponds to the direction of the beams) is generated according to an exponential distribution. The angle θ between the direction of the D+s momentum and the ẑ-axis is generated such that the pseudorapidity, η = −ln tan(θ/2), is distributed uniformly. Since the D+s decay is spherically symmetric, the momenta of the final-state products are generated such that they are uniform in the square Dalitz-plot variables m′ and θ′. These momenta in the D+s rest frame are then boosted to the laboratory frame.

To simulate the selection of D+s candidates in the experiment, only the candidates that satisfy certain kinematic criteria are retained: thresholds are applied to the total momentum p and the transverse momentum p_T of each track, to the p_T of at least one of the final-state tracks, and to the p_T of the D+s candidate. This results in a high-statistics sample of selected events, of which a smaller subsample of selected candidates is used in the examples presented elsewhere in this paper to estimate this distribution. (Natural units with c = ℏ = 1 are used in this paper.)

Figure 1: Relative efficiency variation over the square Dalitz-plot variables for the D+s → K+π−π+ sample, obtained from the high-statistics Monte-Carlo simulation. The distribution is normalised such that the average efficiency equals 1.

3.2 Combinatorial background density for D+s → K+π−π+

The simulated combinatorial background contribution to the D+s → K+π−π+ decays contains purely random combinations of three tracks, as well as combinations of ρ(770) → π+π− or K∗(892) → K+π− with a K+ or a π+, respectively. The kaon and pion tracks, as well as the K∗ and ρ resonances, are generated uniformly in pseudorapidity, η, and with an exponential p_T distribution. The fractions of ρ and K∗ in the combinations before applying selection cuts are 20% and 10%, respectively. The invariant masses of the ρ and K∗ resonances are generated according to a relativistic Breit-Wigner distribution, with masses and widths equal to their world-average values [13], and their decay products are generated assuming that the resonances are unpolarised (i.e. they are isotropic in the resonance rest frame and are uncorrelated with the third track).

In a second stage, the three charged tracks are combined to form the D+s candidate. Only tracks above a minimum p_T are used, and a kinematic fit constrains the invariant mass of the combination to the known D+s mass, M_{D_s} = 1.97 GeV [13]. The three-dimensional distribution of the invariant mass m_D ≡ m(K+π−π+) of the three particles before the kinematic fit, together with the square Dalitz-plot variables m′ and θ′ after the kinematic fit, is used to extract the density of the background events in the signal region.

The distributions of each variable, with the definitions of the signal and sideband regions, are shown in Fig. 2. To clearly show the features of the background density, the two-dimensional projections in Fig. 2(a,b,c) are obtained with the high-statistics Monte-Carlo sample of events satisfying the selection requirements. The examples of background density estimation in this paper, as well as the one-dimensional projections in Fig. 2(d,e,f), use the smaller sample. The following examples assume that the background density is estimated using only the sideband regions of the m_D distribution, defined as the regions below and above a window around M_{D_s} that defines the signal region. One can see the K∗ and ρ resonances shifted in the sidebands with respect to the signal region. In this particular case, this is purely explained by the kinematic fit procedure applied to the background sample. In a real analysis, it can also be caused by a dependence of the background production properties on the selection variable, m_D.

Figure 2: Simulated combinatorial background to the D+s → K+π−π+ decay. Two-dimensional (a) θ′ vs. m′, (b) m′ vs. m_D, and (c) θ′ vs. m_D projections; one-dimensional projection onto (d) the m_D variable with the definition of the signal and sideband regions; and projections of the signal and sideband regions onto the (e) m′ and (f) θ′ variables. The two-dimensional distributions are normalised such that the average density is equal to one.

4 Conventional techniques

In this section, we present estimates of the acceptance variation over the amplitude fit variables using conventional methods based on Legendre polynomials and cubic splines. These conventional methods use histograms to estimate the local density of events before and after a selection requirement, and then interpolate the efficiency value between the bin centres of the histogram.

Numerous analyses use multidimensional orthogonal Legendre polynomials [6, 15–17] to parameterise the variation of the acceptance. In d dimensions these consist of the product of d polynomials of order n_d, where n_d can differ for each dimension, that describe the form of the acceptance. As such, the number of free parameters in this method is O(n_1 × n_2 × ⋯ × n_d), increasing with power-law growth in the number of dimensions. The coefficients of these polynomials are then extracted in a maximum-likelihood fit. As the optimal order of each of these polynomials is a priori unknown, cross-validation and/or regularisation is often used to prevent overfitting.

Another commonly used method is interpolation between the histogram bin centres with cubic splines [7, 18, 19]. Here, the scale of the variation over the phase space is determined by the initial histogram bin size, and therefore, to avoid overfitting, the size and location of the bins are optimised before the spline function is calculated.
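The binned efficiency estimate that both conventional methods start from can be sketched in a few lines; the two-dimensional samples here are synthetic placeholders standing in for the events before and after selection:

```python
import numpy as np

def binned_efficiency(before, after, bins=10, rng=(0.0, 1.0)):
    """Per-bin efficiency from two 2-D samples of shape (n, 2):
    histogram the events before and after selection and take the ratio."""
    h_before, xe, ye = np.histogram2d(before[:, 0], before[:, 1],
                                      bins=bins, range=[rng, rng])
    h_after, _, _ = np.histogram2d(after[:, 0], after[:, 1],
                                   bins=bins, range=[rng, rng])
    # Empty bins are assigned zero efficiency rather than dividing by zero.
    with np.errstate(invalid="ignore", divide="ignore"):
        eff = np.where(h_before > 0, h_after / h_before, 0.0)
    return eff, xe, ye
```

The polynomial or spline model is then fitted to (or interpolated through) the resulting bin values.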
Unlike the Legendre polynomials, the spline functions have no free parameters other than the values at the bin centres. They are therefore particularly susceptible to overfitting, as they simply "connect" the points.

In Figure 3, we show the results of these two conventional approaches [20, 21]. For the Legendre polynomial fit, additive L1 and L2 regularisation terms were included in the likelihood to reduce overfitting and improve the stability of the likelihood fit [22]:

    -2\ln\mathcal{L}_{\rm reg} = -2\ln\mathcal{L} + L_1 + L_2,    (6)

with

    L_1 = \lambda_1 \sum_{i,j} |c_{ij}|    (7)

and

    L_2 = \lambda_2 \sum_{i,j} c_{ij}^2,    (8)

where c_ij are the coefficients of the polynomials and λ1 and λ2 are regularisation parameters. A cross-validation procedure was performed to determine the optimal degree of the Legendre polynomials (set to be equal in each dimension for simplicity), as well as the magnitude of each of the L1 and L2 terms. This resulted in a polynomial degree of n = 8 in each dimension, an L1 regularisation parameter λ1 = 0.01, and an L2 regularisation parameter λ2 = 0.1. The reduced χ² for this fit is calculated using an independent test set, with the number of effective degrees of freedom calculated using bootstrap resampling; the resulting χ² value is 2517.

Figure 3: Estimate of the acceptance variation using the (top) Legendre polynomial and (bottom) cubic spline models, shown (left to right) in the two-dimensional (m′, θ′) plane and as m′ and θ′ projections.

For the cubic spline parameterisation, a 10 × 10 binning in (m′, θ′) is used, chosen ad hoc such that the bin size is similar to the size of the smallest structures in these variables. The reduced χ² is calculated using the number of bins in the χ² test (50 × 50) minus the number of points fitted by the spline (100), and a value of χ²/nDoF = 2460/2400 = 1.03 is obtained.
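A simplified version of the regularised Legendre fit can be sketched as follows. For brevity, a binned least-squares fit with only the L2 (ridge) penalty of Eq. (8) is shown, since it has a closed-form solution; the paper's unbinned likelihood fit and the L1 term of Eq. (7) would require an iterative minimiser:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_design(x, y, degree):
    """Design matrix of products P_i(x) P_j(y) for inputs scaled to [-1, 1]."""
    cols = []
    for i in range(degree + 1):
        for j in range(degree + 1):
            ci = np.zeros(degree + 1); ci[i] = 1.0
            cj = np.zeros(degree + 1); cj[j] = 1.0
            cols.append(legendre.legval(x, ci) * legendre.legval(y, cj))
    return np.stack(cols, axis=1)

def fit_acceptance(x, y, z, degree=4, lam2=0.1):
    """Ridge-regularised least squares for the coefficients c_ij."""
    a = legendre_design(x, y, degree)
    n = a.shape[1]
    return np.linalg.solve(a.T @ a + lam2 * np.eye(n), a.T @ z)

def evaluate(x, y, coeffs, degree=4):
    """Evaluate the fitted 2-D Legendre expansion."""
    return legendre_design(x, y, degree) @ coeffs
```

The degree and λ2 play the same roles as in the text and would be tuned by cross-validation.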
5 Gaussian processes

A Gaussian process [23] is a statistical model which assumes that each point in space is associated with a normally distributed random variable. This implies that any linear combination of model estimates also follows a normal distribution, yielding a closed-form expression for the model at an arbitrary point in space. Fortunately, to avoid having to fit the parameters of an infinite number of normal distributions, Gaussian processes are completely determined by a parameterisation of the covariance matrix with a covariance function. As such, it is possible to obtain model estimates in a large number of dimensions with relatively few parameters, which can then be used to extrapolate the behaviour of the model with reliable estimates of the uncertainty. Furthermore, these parameters can be robustly extracted directly from the data, which gives Gaussian processes an advantage over other "non-parametric" models, such as kernel density estimates or piece-wise spline interpolation. A pedagogical introduction to Gaussian processes can be found in Ref. [24], and other applications in high-energy physics can be found in Refs. [25, 26].

Given some input data vector x of length n (where the elements of x can themselves be vectors of arbitrary dimension), the output, y, of a Gaussian process is defined as

    y \sim \mathcal{N}(0, \Sigma(x; \Theta)),    (9)

where N is a multivariate normal distribution with zero mean and covariance matrix Σ(x; Θ), given by

    \Sigma(x;\Theta) = \begin{pmatrix}
      k(x_1, x_1; \Theta) & k(x_1, x_2; \Theta) & \cdots & k(x_1, x_n; \Theta) \\
      k(x_2, x_1; \Theta) & k(x_2, x_2; \Theta) & \cdots & k(x_2, x_n; \Theta) \\
      \vdots & \vdots & \ddots & \vdots \\
      k(x_n, x_1; \Theta) & k(x_n, x_2; \Theta) & \cdots & k(x_n, x_n; \Theta)
    \end{pmatrix}.    (10)

Here k(x_i, x_j; Θ) is a covariance function, with hyperparameter vector Θ, defined between any two input points x_i, x_j that are elements of x.
For a new input point x∗, the conditional probability of predicting an output equal to the true unknown value y∗, given the previously observed outputs y, follows a normal distribution,

    P(y_* \,|\, y) \sim \mathcal{N}\left(\Sigma_* \Sigma^{-1} y,\; \Sigma_{**} - \Sigma_* \Sigma^{-1} \Sigma_*^T\right),    (11)

where Σ∗ = [k(x∗, x_1; Θ), k(x∗, x_2; Θ), ..., k(x∗, x_n; Θ)] and Σ∗∗ = k(x∗, x∗; Θ). The best estimate of the true value y∗ is equal to the mean of the above probability distribution,

    \hat{y}_* = \Sigma_* \Sigma^{-1} y,    (12)

and the uncertainty is the square root of the variance,

    {\rm Var}(y_*) = \Sigma_{**} - \Sigma_* \Sigma^{-1} \Sigma_*^T.    (13)

The negative logarithm of the likelihood for this construction is given by

    -2\ln p(y \,|\, x, \Theta) = y^T \Sigma^{-1} y + \ln|\Sigma| + n \ln 2\pi.    (14)

The hyperparameters Θ of the covariance function can then be inferred by minimising the negative logarithm of the likelihood (14), or obtained via marginalisation using suitable priors and Markov-chain Monte-Carlo.

One such covariance function is the Matérn function,

    k_\nu(d) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\sqrt{2\nu}\,\frac{d}{\rho}\right)^{\nu} K_\nu\!\left(\sqrt{2\nu}\,\frac{d}{\rho}\right),    (15)

where d is the distance between two points, Γ is the gamma function, K_ν is the modified Bessel function of the second kind, ρ and ν are non-negative parameters, and σ controls the absolute magnitude of the covariance. The Matérn function is defined in terms of the distance between two points, rather than the location of each point, and therefore describes a stationary distribution. For half-integer values of ν, it can be expressed as the product of an exponential function and a polynomial of order p = ν − 1/2. The parameter ν can be thought of as controlling the smoothness of the function, and when ν → ∞ the Matérn function converges to the squared-exponential covariance function,

    \lim_{\nu\to\infty} k_\nu(d) = \sigma^2 \exp\left(-\frac{d^2}{2\rho^2}\right).    (16)

Here we use the Matérn function with a fixed half-integer value of ν throughout, as this provides a good balance between replicating and smoothing the observed structures; however, the parameter values, and the best model choice in general, depend on the data in question. These are ideally selected using cross-validation or a similar procedure, and can be considered as a source of systematic uncertainty on the final distribution.

In reality, one would also want to describe distributions that are non-stationary (i.e., where the mean in Eq. (9) is non-zero). However, due to the linearity of the model, this amounts to a simple subtraction of the mean function from the observed data, and therefore there is no loss of generality in this description. Specific assumptions on the mean distribution are discussed for the applications described in Sections 5.1 and 7.

5.1 Acceptance parameterisation

The dataset described in Section 3 parameterises the acceptance in terms of the square Dalitz-plot variables m′ and θ′. As the Gaussian process does not estimate the density directly (although there are modifications that would permit this [27]), the density in (m′, θ′) space is first estimated by a uniformly binned histogram with 50 bins on each axis. The locations of the centres of these bins are then the input points x defined above, where x_i = [m′_i, θ′_i]. As these events were generated uniformly in (m′, θ′) space, the acceptance probability in each bin is simply proportional to the bin content, which is the output, y, of the Gaussian process. Therefore, for each input x_i = [m′_i, θ′_i] there is an associated output y_i.
As the acceptance is positive everywhere, the resulting Gaussian process does not represent a stationary process, and as such a mean function constant in (m′, θ′) is added to the Gaussian process to account for this offset.

An advantage of the Gaussian process is that it is relatively robust to statistical fluctuations due to low sample sizes, as the uncertainty at each point is estimated directly. As such, the aforementioned binning can be considerably finer than in other methods, provided that the assumption of normally distributed uncertainties holds (processes where the likelihood is replaced with a Poisson distribution can also be used, but these are less computationally tractable than the Gaussian case, as Poisson distributions are not closed under linear combination).

Here, Gaussian processes are implemented using the GPy package [28], and the resulting acceptance model as a function of m′ and θ′ can be seen in Figure 4. The parameters of this model are the overall scale of the Matérn function, σ, the overall scale of the constant mean function, the characteristic length scales over which points covary in each dimension, ρ_m′ and ρ_θ′, and a term describing the additive Gaussian noise at each point, ε. The fit was performed using a maximum-likelihood approach, and the resulting parameter values can be found in Table 1.

Figure 4: Result of the density estimation of the simulated sample of D+s → K+π−π+ decays using the Gaussian process method, (a) in the two square Dalitz-plot variables m′ and θ′, and projections of the two-dimensional distribution onto the (b) m′ and (c) θ′ variables.

Table 1: Parameters of the Gaussian process fit to the simulated D+s → K+π−π+ decays, with a Matérn kernel and a constant mean function.

    Model parameter          Value
    σ (Matérn scale)         648 events
    Mean function scale      547 events
    ρ_m′                     …
    ρ_θ′                     …
    ε                        …

The χ² per number of degrees of freedom of this model with respect to the data is evaluated using an independent dataset of simulated D+s → K+π−π+ decays. Here, the effective number of degrees of freedom is calculated approximately as the number of bins in the χ² test (50 × 50), and a value of χ²/nDoF = 2471/2500 = 0.99 is obtained. This indicates that the model reproduces the underlying distribution very well, where the smallness of the χ² value is mostly due to the fact that this reproduction can be achieved with a small number of parameters.

As the only parameters that scale with dimensionality are the characteristic length scales of the Matérn function, the increase in the number of parameters is linear in the dimensionality. Furthermore, the time complexity of Gaussian processes is also linear in the dimensionality, making them very efficient in high dimensions compared to other parameterisations. Unfortunately, however, the complexity is cubic in the input data size, due to the dependence on a matrix inversion, and therefore Gaussian processes do not scale well to large data sizes. Nevertheless, methods exist to mitigate this, such as the use of binned data in the strategies described here, or the selection of a small number of "pseudo" inputs [29].

6 Density estimation with artificial neural networks

Multivariate techniques such as artificial neural networks (ANNs) or boosted decision trees provide an alternative approach to parametrise a multidimensional probability density from scattered data [30, 31]. The approach involving ANNs exploits a property of neural networks whereby a "feed-forward" network (in which the layers of neurons are arranged in a non-cyclical structure) with smooth activation functions can approximate any continuous function. Here, the parameters of the ANN are treated as free parameters in a maximum-likelihood fit to the unbinned data, performed by treating the negative logarithm of the likelihood as a custom loss function. Since this technique does not require binning the input data, and since ANNs have in general successfully shown their ability for multivariate generalisation, we expect that the density estimation approach using ANNs can become useful for multidimensional amplitude analyses.
This approach is demonstrated below for the parameterisation of the two-dimensional acceptance of the D+s → K+π−π+ decay described in Section 3.

The output of the n-th neuron of the (l+1)-th layer in the ANN is given by

    a_{n,l+1} = f\left(\sum_m w_{nm,l}\, a_{m,l} + b_{n,l}\right),    (17)

where w_{nm,l} is the matrix of weights, b_{n,l} is the vector of biases for the l-th layer, and f(x) is a non-linear activation function. For the estimated density to be smooth, it is convenient to use a smooth differentiable activation function such as the sigmoid function,

    f(x) = \frac{1}{1 + e^{-x}}.    (18)

In this structure, the first layer of neurons is the input layer and accepts the kinematic variables z as inputs, while the output neuron returns a single scalar, the density estimate P(z).

Density estimation can be performed by treating the weights and biases q ≡ {w_{nm,l}, b_{n,l}} of the ANN as free parameters and minimising the negative logarithm of the likelihood,

    -2\ln\mathcal{L} = -2\sum_{i=1}^{N} \ln P(z_i \,|\, q) + 2N \ln\left(\sum_{j=1}^{M} P(y_j \,|\, q)\right),    (19)

where z_i (i = 1 ... N) are the data points and y_j (j = 1 ... M) is a uniformly distributed sample used for normalisation. The function (19) is used as the loss function to train the ANN given the training sample z_i.

As in many applications of machine-learning techniques, special care needs to be taken to avoid overfitting, where the model configuration or parameters become too specialised to the training dataset and therefore fail to generalise properly. In the case of density estimation with an ANN, overfitting manifests itself as unphysical rapid oscillations in the PDF around the training data points.
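The forward pass of Eq. (17) and the loss of Eq. (19) can be sketched in plain numpy. The layer sizes and data are placeholders, the positive output is obtained here with a softplus transform (an assumption, not specified in the text), and in practice the gradients would be supplied by a framework such as TensorFlow:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(sizes, rng):
    """Random weights w_{nm,l} and zero biases b_{n,l} for each layer."""
    return [(rng.normal(0.0, 1.0, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def density(z, params):
    """Forward pass of Eq. (17); a softplus output keeps P(z) positive."""
    a = z
    for w, b in params[:-1]:
        a = sigmoid(a @ w + b)
    w, b = params[-1]
    return np.log1p(np.exp(a @ w + b))[:, 0]

def loss(data, norm_sample, params, lam=1.0):
    """-2 ln L of Eq. (19), plus a simple L2 penalty on the weights."""
    nll = (-2.0 * np.sum(np.log(density(data, params) + 1e-12))
           + 2.0 * len(data) * np.log(np.sum(density(norm_sample, params))))
    l2 = lam * sum(np.sum(w ** 2) for w, _ in params)
    return nll + l2
```

Minimising this loss over the weights and biases, with gradients from automatic differentiation, yields the unbinned density estimate.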
Regularisation techniques, where the likelihood is explicitly penalised to promote smoothness or sparsity, are thus essential to control overfitting. It was found that regularisation which penalises large neuron weights (and therefore those that result in large gradients of the likelihood function) works well in the typical cases where the density is expected to be smooth. Specifically, an $L_2$ regularisation term of the form
$$L_2 = \lambda \sum_{n,m,l} w_{nm,l}^2 \qquad (20)$$
is added to the loss function (19), where $\lambda$ is the regularisation parameter that ultimately controls the smoothness of the PDF.

The density estimate of the $D_s^+ \to K^+\pi^-\pi^+$ decay acceptance is performed with an ANN consisting of four hidden layers with the number of neurons, from the first to the fourth layer, equalling 32, 64, 32, and 8. The regularisation parameter $\lambda$ is chosen to be equal to 1. Normalisation is performed with $5 \times$ events distributed uniformly over the space of inputs, the square Dalitz plot. The likelihood minimisation is performed using the TensorFlow framework [32] and the Adam optimiser [33] with a learning rate of $10^{-}$. The resulting estimate of the density after 10 000 training epochs (passes through the data) is shown in Fig. 5.

As highlighted in Section 2.2, the conventional methods that use either sideband distributions or the sPlot technique to determine the combinatorial background contribution in general introduce systematic biases, since they ignore correlations between the amplitude-fit variables and the selection variables (such as the combined invariant mass of the final-state particles, $m_D$). The bias can become more pronounced if only one of the sidebands can be used to estimate the background (e.g., due to the presence of specific peaking backgrounds in the other sideband, as often happens in $B$ meson decays).
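Returning to Eqs. (17)-(20), the ingredients of the ANN density fit can be sketched in plain numpy. This is a toy illustration only: the network topology, sample sizes, and the accept-if-better random search below are assumptions made for brevity, whereas the fits described in the text use TensorFlow with gradient-based minimisation.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [2, 16, 8, 1]  # illustrative topology, smaller than the 32-64-32-8 network in the text

def init_params():
    # One (weight matrix, bias vector) pair per layer, cf. Eq. (17).
    return [(rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # activation of Eq. (18)

def density(z, params):
    a = z
    for w, b in params:
        a = sigmoid(a @ w + b)  # Eq. (17) applied layer by layer
    return a[:, 0] + 1e-6       # small offset keeps the logarithm finite

def loss(params, z_data, y_norm, lam=1.0):
    # Negative log-likelihood of Eq. (19), with a Monte-Carlo estimate of
    # the normalisation integral, plus the L2 penalty of Eq. (20).
    nll = (-np.sum(np.log(density(z_data, params)))
           + len(z_data) * np.log(np.sum(density(y_norm, params))))
    return nll + lam * sum(np.sum(w ** 2) for w, _ in params)

# Toy training data on the unit square and a uniform normalisation sample.
z = rng.beta(4, 2, size=(300, 2))
y = rng.uniform(size=(1000, 2))

# Crude accept-if-better random search, only to show that Eq. (19) works as
# a loss function; real fits would use automatic differentiation and Adam.
params = init_params()
start = best = loss(params, z, y)
for _ in range(300):
    trial = [(w + rng.normal(scale=0.02, size=w.shape),
              b + rng.normal(scale=0.02, size=b.shape)) for w, b in params]
    trial_loss = loss(trial, z, y)
    if trial_loss < best:
        params, best = trial, trial_loss
```

The $L_2$ term enters the loss exactly as in Eq. (20), so the same `lam` knob plays the role of the smoothness-controlling $\lambda$ of the text.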
The sPlot procedure also introduces additional statistical uncertainty compared to the parametric approach, due to the lack of any assumptions on the behaviour of the background.

To overcome such issues, one can add the selection variables to the background parameterisation. For example, in the case of a Dalitz-plot analysis, a 3D fit can be performed
Figure 5: Result of the density estimation of the simulated sample of $D_s^+ \to K^+\pi^-\pi^+$ decays using an ANN (a) in the two-dimensional square Dalitz-plot variables $m'$ and $\theta'$, and projections of this distribution onto the (b) $m'$ and (c) $\theta'$ variables.

to obtain the probability density function $P(m', \theta', m_D)$, which can then be used to extrapolate the desired combinatorial PDF $B(m', \theta')$ into the signal region. We present in this section two such approaches, one using a Gaussian process [34] and another using an ANN [35], to extrapolate the combinatorial background PDF from both the upper and lower sidebands of $m_D$ into the signal region. To illustrate the performance of both of these approaches, the simulated combinatorial background of the $D_s^+ \to K^+\pi^-\pi^+$ decay is used, as described in Section 3.

In the Gaussian process method, the fit variables $(m', \theta')$ and the selection variable $m_D$, taken from the sideband sample, are first binned to obtain a local estimate of the density, and the parameters of the covariance function are then inferred by fitting the model using the locations of the bin centres and their respective yields. As mentioned in Section 5, the model is fairly robust to variations in the choice of the location and size of these bins, provided that they capture sufficient variation in the input variables.

Here, the $D_s^+ \to K^+\pi^-\pi^+$ lower, $m_D \in [1.\;, 1.92]$ GeV, and upper, $m_D \in [2.\;, \;]$ GeV, $D_s^+$ sidebands described in Section 3.2 are each separated into three bins of 0.05 GeV, for a total of six bins in $m_D$. In each of these bins, the square Dalitz plot is separated into $20 \times 20$ uniform bins (with bin size $0.05 \times 0.05$), resulting in $6 \times 20 \times 20 = 2400$ inputs to the Gaussian process.

The Gaussian process uses the Matérn kernel, defined in Section 5, with $\nu = $, along with a constant mean function in $(m', \theta')$ and a linear mean function in $m_D$. Results of the estimation of the background density in the sideband regions of the $m_D$ variable are presented in Fig. 6. It can be seen that the Gaussian process model estimates well the variation of the resonance structure in $(m', \theta')$ as a function of $m_D$ due to the kinematic constraints, permitting accurate estimation of this background structure in the unobserved signal region. The corresponding kernel parameters extracted from the data are given in Table 2.

The distribution of the background in the signal region is obtained by querying the Gaussian process at $m_D = 1.97$ GeV, the results of which can be seen in Fig. 7.

Table 2: Parameters of the Gaussian process fit to the simulated $D_s^+ \to K^+\pi^-\pi^+$ background, with a Matérn kernel, constant mean function in $(m', \theta')$, and linear mean function in $m_D$.

Model parameter | Value
$\sigma$ | 279 events
$\sigma_{m_D}$ | …
$\sigma$ | …
$\rho_{m_D}$ | .27 GeV
$\rho_{m'}$ | …
$\rho_{\theta'}$ | …
$\epsilon$ | …

In the $m'$ distribution (Fig. 7(b)), the bias of the distribution is clearly smaller than that obtained from the simple projections of the distribution in the sidebands.

If there are no narrow structures in the background, one can consider a background PDF that is positive definite, reasonably smooth in $m_D$, and sufficiently generic in the square Dalitz-plot variables, such as
$$P(m', \theta', m_D) = \big| P_1(m', \theta') + e^{-\alpha m_D} P_2(m', \theta') \big|, \qquad (21)$$
where $P_{1,2}(m', \theta')$ are functions modelled with ANNs. One can then perform an unbinned fit of $P(m', \theta', m_D)$ to the sideband data, with regularisation to avoid overtraining, and with the weights and biases $q_{1,2}$ of the ANN functions $P_{1,2}$, and $\alpha$, as the free parameters. The background in the signal region can then be extrapolated using the trained model as $B(m', \theta') = P(m', \theta', m_D = M_{D_s^+})$.

In the presence of narrow structures in the amplitude that vary as a function of $m_D$ (such as the resonant $K^*$ and $\rho$ contributions in the $D_s^+ \to K^+\pi^-\pi^+$ sample, see Fig. 2), the approximation of Eq. (21) may not work well. As ANN density estimation can be performed in multiple dimensions, one can estimate the background density in the selection variable $m_D$ in addition to that in the Dalitz-plot variables $m'$ and $\theta'$. Therefore, as an alternative to a PDF of the form (21), a full three-dimensional ANN can be used to parametrise the background as a function of the square Dalitz-plot variables and $m_D$, where additional regularisation has to be applied to ensure continuity as a function of the selection variable $m_D$.
The latter can be done by adding an extra penalty term to the likelihood which penalises configurations in which the neurons of the input layer have large weights corresponding to the $m_D$ variable.

Figure 6: Results of the estimation of the combinatorial background density in the sideband regions using a Gaussian process. Two-dimensional (first row) $m_D$ vs. $m'$, (second row) $\theta'$ vs. $m'$, and (third row) $m_D$ vs. $\theta'$ projections of the (left) simulated background sample and (right) density predictions from the fit model. (Bottom row) One-dimensional projections onto (from left to right) $m'$, $\theta'$ and $m_D$.

Figure 7: Results of the interpolation of the combinatorial background density in the signal region using a Gaussian process: (a) two-dimensional density and (b,c) its projections.

The fit in the sideband regions to the simulated combinatorial background of the $D_s^+ \to K^+\pi^-\pi^+$ decay using the ANN is shown in Fig. 8. For the neurons in the first layer that take the $m_D$ dimension as input, the regularisation parameter $\lambda$ is set equal to 10, while for the other neurons it is equal to 1, as in Section 6. In Fig. 9, the combinatorial background density predicted in the signal region using the ANN approach is shown, where the ANN is trained for 10 000 epochs. As in the case of the Gaussian process technique, a certain bias can be observed due to the smearing of the narrow structure in the $m'$ distribution. Fine tuning of the regularisation parameters could possibly improve the agreement in this region. In the following section, however, we propose a more general way to control the features of density models that aims to solve this problem.

In the training of the ANN, it is often difficult to replicate specific features of the resulting density, especially in the case of limited training data. The only handles on generic ANN training are the topology of the network and the generic regularisation terms, and careful tuning of these is needed to obtain a reasonable description of the density. The procedure of parameterising the background or acceptance using only the input data, without any external knowledge of the processes that govern the features of the distributions, is not optimal. In general it is known, for instance, that the acceptance function should be relatively smooth, with a falloff at the boundaries of the phase space due to kinematic selection requirements, or that the combinatorial background should contain contributions from certain two-body resonances. The implication of this is that the behaviour is constrained much more than conventional parameterisation techniques assume, and ideally an efficient procedure should take this prior information into account.
For example, one can introduce a simplified model of these processes, and extract only the parameters of this simplified model from the unbinned data samples. In the case of background and efficiency distributions, even a simple analytical model would be difficult to express. Instead, here we propose a technique to perform nonparametric estimation of these distributions, using the formalism described in Section 6, under the assumption that the complex observed behaviour explicitly depends only on a few underlying parameters that are sufficient to describe the efficiency or background behaviour in the region of interest. The values of these latent parameters can then be inferred from the observed data, or from detailed simulation in the case of a description of the acceptance, in order to parameterise the observations.

Figure 8: Results of the estimation of the combinatorial background density in the sideband regions using an ANN. Two-dimensional (first row) $m_D$ vs. $m'$, (second row) $\theta'$ vs. $m'$, and (third row) $m_D$ vs. $\theta'$ projections of the (left) simulated background sample and (right) density predictions from the fit model. (Bottom row) One-dimensional projections onto (from left to right) $m'$, $\theta'$ and $m_D$.

Figure 9: Results of the interpolation of the combinatorial background density in the signal region using an ANN: (a) two-dimensional density and (b,c) its projections.

This way, the features of the resulting density are controlled by the simplified model, which leverages prior information on the correlations between these variables and the parameters of interest. This results in more stable training of the ANN, as only data for the simplified parameters are required, rather than resource-intensive detailed simulation, and therefore larger sample sizes can be generated. Crucially, reliance on a few latent parameters also results in an ad hoc regularisation effect, and as such the density obtained via this procedure is less sensitive to statistical fluctuations when obtaining the values of the simplified parameters. A similar technique, utilising generative adversarial networks, has recently been independently proposed [36].
In the initial stage, an estimate of the joint probability distribution $P \equiv P(z, \Theta)$ is constructed, in terms of the parameters of interest, $z$, in which the background or efficiency description is required, and the latent parameters on which the background or efficiency depend, $\Theta$. The variables $z$ could comprise the (square) Dalitz-plot variables, as in the examples in this paper, but could also be anything else that is required to be parameterised in a physics analysis, such as the invariant mass of the reconstructed particle, or its decay time. The parameters $\Theta$ are those that directly control physical constraints on the system, and influence $z$ via their correlations. These parameters necessarily vary between analyses, but they would likely include effective cut values on the final-state particle momenta, parameters that describe the shape of these momentum distributions, or fractions of potential background contributions.

An estimate of the joint probability distribution is parameterised using an ANN, obtained via the probability density estimation technique described in Section 6. The ANN is trained using a sample of simulated data, $s_{\mathrm{train}} = \{z_{\mathrm{train}}, \Theta_{\mathrm{train}}\}$, that encapsulates the dependencies between the parameters of interest, $z$, and the latent variables $\Theta$. This data is required to span the space of possible real parameter values; however, an accurate description of any specific configuration of $\Theta$ or $z$ is not required (that is, there is no requirement for the set of input data points in this initial construction to overlap with the set of eventual evaluation points, due to the model smoothing).

Secondly, an estimate of the specific values of the latent parameters, $\Theta_{\mathrm{pred}}$, that correspond to the background data or detailed simulation, $z_{\mathrm{data}}$, is obtained.
This is done by fixing the weights of the ANN and performing a maximum-likelihood fit for these values, treating the ANN output as the probability of the latent parameters conditioned on the known parameters of interest, $P(\Theta \,|\, z)$, such that $\Theta_{\mathrm{pred}} = \arg\max_{\Theta} P(\Theta \,|\, z_{\mathrm{data}})$. Lastly, again using the ANN as a joint probability function, the probability density of the parameters of interest is obtained from the distribution $P(z \,|\, \Theta_{\mathrm{pred}})$. This approach is illustrated below for the estimation of the acceptance and combinatorial background distributions of the $D_s^+ \to K^+\pi^-\pi^+$ decay, described in Section 3.

It is worth noting that, whilst these latent parameters should in principle comprise the set of features on which the parameters of interest depend strongly, this set need not be exhaustive. Provided that the included parameters are at least reasonably correlated with any additional parameters that are not considered, yet influence the parameters of interest, an "effective" value of these parameters can be obtained in the maximum-likelihood fit stage. As such, these latent parameters can differ from those that can be calculated directly from the dataset that the model is evaluated on, in such a way that the efficiency or background distribution can nevertheless be correctly parameterised using them.

To demonstrate the feasibility of the model-assisted approach for the parameterisation of acceptances, the same model as that described in Section 3.1 is used; however, the requirements on the reconstructed $D_s^+$ mesons and their decay products are varied in the generation of the training sample used to construct the ANN, as in step one above.
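The latent-parameter fit and subsequent evaluation steps of this procedure can be sketched with a one-dimensional toy, using an analytic stand-in for the trained ANN joint density. All names, shapes, and parameter values below are hypothetical illustrations, not quantities from the analysis; in a real application, `p_z_given_theta` would be the ANN estimate of Section 6 and the grid search would be a proper likelihood minimisation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Analytic stand-in for the trained joint density: z is one-dimensional and
# theta is a single latent "slope" parameter.
def p_z_given_theta(z, theta):
    # Exponential shape in z with a theta-dependent slope, normalised on [0, 1].
    return np.exp(-theta * z) * theta / (1.0 - np.exp(-theta))

# "Observed" data: a toy sample generated at theta_true by rejection sampling.
theta_true = 2.5
z_data = []
while len(z_data) < 5000:
    z_try = rng.uniform()
    if rng.uniform() < np.exp(-theta_true * z_try):
        z_data.append(z_try)
z_data = np.asarray(z_data)

# Step two: maximum-likelihood estimate of the latent parameter,
# theta_pred = argmax_theta sum_i ln P(z_i | theta), here by grid search.
grid = np.linspace(0.5, 5.0, 200)
logl = [np.sum(np.log(p_z_given_theta(z_data, t))) for t in grid]
theta_pred = grid[int(np.argmax(logl))]

# Step three: evaluate the density of the parameter of interest at the
# fitted latent-parameter value.
z_axis = np.linspace(0.0, 1.0, 101)
density_pred = p_z_given_theta(z_axis, theta_pred)
```

The key point the toy illustrates is that the resulting density is controlled entirely through the small set of latent parameters, which is what provides the ad hoc regularisation discussed above.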
The range of the variations for each of the five parameters of the efficiency model is given in Table 3, where the entire sample consists of 500 000 events that satisfy the selection requirements. Since the efficiency model is relatively simple, generation of the training sample takes only a few minutes, and therefore the sample can be made arbitrarily large. Since the events that do not satisfy the selection requirements are rejected during generation, the initially uniform distributions of the model parameters can become significantly non-uniform for the accepted events in the training sample. To compensate for this effect, the model parameters are sampled from exponential distributions, whose parameters are tuned to ensure that the distribution of accepted events is roughly uniform.

The functional form of the efficiency model, a 7-dimensional probability density function $p(m', \theta', \Theta)$, where $\Theta$ is the vector of parameters listed in Table 3, is obtained via the ANN density estimation procedure described in Section 6. The ANN topology and training parameters are the same as those in Section 6, with the exception that here the $L_2$ regularisation parameter is $\lambda = 0.5$. The projections of the result of the ANN training after 30 000 epochs in the square Dalitz-plot variables, as well as the correlations between the Dalitz-plot variables and the latent model parameters, are given in Appendix A.

An unbinned maximum-likelihood fit is then performed to obtain the effective model parameters for a test sample corresponding to that described in Section 3.1. The results are given in Table 3, and the corresponding density estimate is shown in Fig. 10.
Figure 10: Result of the density estimation of the simulated sample of $D_s^+ \to K^+\pi^-\pi^+$ decays using the model-assisted ANN (a) in the two square Dalitz-plot variables $m'$ and $\theta'$, and the projections of the two-dimensional distribution onto the (b) $m'$ and (c) $\theta'$ variables.

Table 3: Parameters of the $D_s^+ \to K^+\pi^-\pi^+$ efficiency model: the range of parameter variations used at the ANN training stage, the true values used in the generated test sample, and the reconstructed values extracted from the fit of the ANN model to the test sample.

Model parameter | Range | True value | Reconstructed value
Track $p_T$ cut (GeV) | (0.…, 1) | 0.4 | 0.… ± 0.…
$p$ cut (GeV) | (1, 10) | 3.0 | 5.… ± 0.…
$D_s^+$ $p_T$ cut (GeV) | (0, 5) | 2.0 | 2.… ± 0.…
$p_T$ cut (GeV) | (0.…, 3) | 1.0 | 1.… ± 0.…
$p_T$ cut (GeV) | (2.…, 6) | 3.0 | 3.… ± 0.…
Figure 11: Results of the interpolation of the combinatorial background density in the signal region using the model-assisted ANN: (a) the two-dimensional density and (b,c) its projections.
The estimation of the combinatorial background density is performed in a similar way to the acceptance parameterisation, except that the ANN training sample also includes the full range of the selection variable, $m_D$, and the test sample contains only the events in the sidebands (defined in Section 7), to reproduce a more realistic analysis scenario. As in the case of the acceptance parameterisation, the background model from Section 3.2 is used both to generate samples for the initial joint density estimation by the ANN and as the test dataset for the subsequent maximum-likelihood fit. The list of generated model parameters, with their input ranges, is given in Table 4, where the size of the training sample is 500 000 events. Here, the topology of the ANN is unchanged from that described in Section 7, and the penalty factor in the $L_2$ regularisation term is taken to be $\lambda = 2$. The projections of the ANN variables, after 30 000 training epochs, in the $m'$, $\theta'$ and $m_D$ variables, as well as the correlations of these variables with each other and with the model parameters, are shown in Appendix B.

The results of the background density estimation are shown in Fig. 11, and the true and reconstructed values of the model parameters are given in Table 4. The values of the reconstructed parameters are consistent with the true values within their uncertainties, which indicates a good quality of the 11-dimensional ANN parameterisation of the background model.

Techniques have been proposed here to efficiently parameterise the background and acceptance variations that are an essential component of multidimensional fits of hadronic decay amplitudes. Often, treatments of the acceptance variations are sub-optimal: either they do not exploit rudimentary assumptions of local smoothness and require large quantities of computationally expensive simulated data, or a sizeable systematic uncertainty results from the use of an inefficient parameterisation.
For the background description, assumptions often have to be made on the functional form of specific backgrounds, or on the validity of an extrapolation into the signal region. These assumptions

Table 4: Parameters of the $D_s^+ \to K^+\pi^-\pi^+$ background model: the range of parameter variations used at the ANN training stage, the true values used in the generated test sample, and the reconstructed values extracted from the fit of the ANN model to the test sample.

Model parameter | Range | True value | Reconstructed value
Mean $p_T(K)$ (GeV) | (0.…, 1) | 0.3 | 0.… ± 0.…
Mean $p_T(\pi)$ (GeV) | (0.…, 1) | 0.6 | 0.… ± 0.…
Mean $p_T(K^*)$ (GeV) | (0.…, 3) | 2.9 | 2.… ± 0.…
Mean $p_T(\rho)$ (GeV) | (0.…, 3) | 2.0 | 2.… ± 0.…
$K^*$ fraction | (0, 0.3) | 0.1 | 0.… ± 0.…
$\rho$ fraction | (0, 0.3) | 0.2 | 0.… ± 0.…
$p_T$ cut (GeV) | (0.…, 0.5) | 0.3 | 0.… ± 0.…
$p$ cut (GeV) | (1, 4) | 3.0 | 2.… ± 0.…

Acknowledgements
The authors would like to thank their colleagues from the LHCb collaboration (Tim Gershon, Thomas Latham, Mark Whitehead, Sneha Malde, Maurizio Martinelli, and others) for fruitful discussions. This work is supported by the Science and Technology Facilities Council (United Kingdom).

Appendices

A  Training the neural network to parametrise the efficiency model
The results of the ANN training to estimate the 7-dimensional density of the generated efficiency model, as a function of a set of effective model parameters, are shown in Fig. 12. These plots show the projections of the $m'$ and $\theta'$ variables (top row, two leftmost plots), as well as the normalised two-dimensional distributions for the correlations between $m'$ and $\theta'$ (top row, two rightmost plots) and the correlations between $m'$ or $\theta'$ and the model parameters. The plots marked "Data" show the training data distributions; those marked "Fit" show the high-statistics distributions generated from the result of the ANN training. As in the rest of the paper, the two-dimensional distributions are normalised in such a way that the average of the projected distribution equals 1.

B  Training the neural network to parametrise the background model
The results of the ANN training to estimate the 11-dimensional density of the generated combinatorial background model as a function of the model parameters are shown in Figs. 13-15. These plots show the projections of the $m'$, $\theta'$ and $m_D$ variables (Fig. 13, top row), as well as the normalised two-dimensional distributions for the correlations between each pair of the $m'$, $\theta'$ and $m_D$ variables (Fig. 13, second row and the two leftmost plots of the third row) and the correlations between $m'$, $\theta'$ or $m_D$ and the model parameters. The plots marked "Data" show the training data distributions; those marked "Fit" show the high-statistics distributions generated from the result of the ANN training. As in the rest of the paper, the two-dimensional distributions are normalised in such a way that the average of the projected distribution equals 1.

References

[1] R. H. Dalitz, On the analysis of tau-meson data and the nature of the tau-meson, Phil. Mag. (1953) 1068.
[2] J. Back et al., Laura++: A Dalitz plot fitter, Comput. Phys. Commun. (2018) 198, arXiv:1711.09854.
[3] LHCb collaboration, R. Aaij et al., Observation of $J/\psi p$ resonances consistent with pentaquark states in $\Lambda_b^0 \to J/\psi p K^-$ decays, Phys. Rev. Lett. (2015) 072001, arXiv:1507.03414.
[4] LHCb collaboration, R. Aaij et al., Observation of exotic $J/\psi\phi$ structures from amplitude analysis of $B^+ \to J/\psi\phi K^+$ decays, Phys. Rev. Lett. (2016) 022003, arXiv:1606.07895.
Figure 12: Results of the estimation of the simulated acceptance distribution, as a function of the effective model parameters, using an ANN.
Figure 13: Results of the estimation of the simulated combinatorial background density, as a function of effective model parameters, using an ANN.
Figure 14: Results of the estimation of the simulated combinatorial background density, as a function of effective model parameters, using an ANN.
Figure 15: Results of the estimation of the simulated combinatorial background density, as a function of effective model parameters, using an ANN.

[5] LHCb collaboration, R. Aaij et al., Studies of the resonance structure in $D^0 \to K^\mp\pi^\pm\pi^+\pi^-$ decays, Eur. Phys. J. C78 (2018) 443, arXiv:1712.08609.
[6] LHCb collaboration, R. Aaij et al., Dalitz plot analysis of $B^0 \to \bar{D}^0\pi^+\pi^-$ decays, Phys. Rev. D92 (2015) 032002, arXiv:1505.01710.
[7] LHCb collaboration, R. Aaij et al., Dalitz plot analysis of $B_s^0 \to \bar{D}^0 K^-\pi^+$ decays, Phys. Rev. D90 (2014) 072003, arXiv:1407.7712.
[8] A. Poluektov, Kernel density estimation of a multidimensional efficiency profile, JINST (2015), no. 02, P02011, arXiv:1411.5528.
[9] LHCb collaboration, R. Aaij et al., Study of the $D^0 p$ amplitude in $\Lambda_b^0 \to D^0 p\pi^-$ decays, JHEP (2017) 030, arXiv:1701.07873.
[10] M. Pivk and F. R. Le Diberder, sPlot: A statistical tool to unfold data distributions, Nucl. Instrum. Meth. A555 (2005) 356, arXiv:physics/0402083.
[11] Y. Xie, sFit: a method for background subtraction in maximum likelihood fit, arXiv:0905.0724.
[12] BaBar collaboration, B. Aubert et al., An amplitude analysis of the decay $B^\pm \to \pi^\pm\pi^\pm\pi^\mp$, Phys. Rev. D72 (2005) 052002, arXiv:hep-ex/0507025.
[13] Particle Data Group, C. Patrignani et al., Review of particle physics, Chin. Phys. C40 (2016) 100001.
[14] LHCb collaboration, A. A. Alves Jr. et al., The LHCb detector at the LHC, JINST (2008) S08005.
[15] LHCb collaboration, R. Aaij et al., Angular analysis of the $B^0 \to K^{*0}\mu^+\mu^-$ decay using fb$^{-1}$ of integrated luminosity, JHEP (2016) 104, arXiv:1512.04442.
[16] LHCb collaboration, R. Aaij et al., Differential branching fraction and angular analysis of the decay $B^0 \to K^+\pi^-\mu^+\mu^-$ in the $K^*_{0,2}(1430)$ region, JHEP (2016) 065, arXiv:1609.04736.
[17] LHCb collaboration, R. Aaij et al., Angular moments of the decay $\Lambda_b^0 \to \Lambda\mu^+\mu^-$, JHEP (2018) 146, arXiv:1808.00264.
[18] LHCb collaboration, R. Aaij et al., Amplitude analysis of the decay $B^0 \to K_S^0\pi^+\pi^-$ and first observation of CP asymmetry in $B^0 \to K^*(892)^-\pi^+$, Phys. Rev. Lett. (2018) 261801, arXiv:1712.09320.
[19] LHCb collaboration, R. Aaij et al., Evidence for an $\eta_c(1S)\pi^-$ resonance in $B^0 \to \eta_c(1S)K^+\pi^-$ decays, Eur. Phys. J. C78 (2018) 1019, arXiv:1809.07416.
[20] T. Oliphant, NumPy: A guide to NumPy, USA: Trelgol Publishing, 2006.
[21] E. Jones, T. Oliphant, P. Peterson et al., SciPy: Open source scientific tools for Python, 2001.
[22] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2005), no. 2, 301.
[23] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning series, University Press Group Limited, 2006.
[24] S. Roberts et al., Gaussian processes for time-series modelling, Phil. Trans. R. Soc. A (2013), no. 1984, 20110550.
[25] M. Frate et al., Modeling smooth backgrounds and generic localized signals with Gaussian processes, arXiv:1709.05681.
[26] A. Bozson, G. Cowan, and F. Spanò, Unfolding with Gaussian processes, arXiv:1811.01242.
[27] I. Murray, D. MacKay, and R. P. Adams, The Gaussian process density sampler, in Advances in Neural Information Processing Systems, pp. 9-16, 2009.
[28] GPy python package, http://gpy.readthedocs.io/en/deploy/.
[29] E. Snelson and Z. Ghahramani, Sparse Gaussian processes using pseudo-inputs, in Advances in Neural Information Processing Systems, pp. 1257-1264, 2006.
[30] A. Likas, Probability density estimation using artificial neural networks, Computer Physics Communications (2001), no. 2, 167.
[31] B. Viaud, On the potential of multivariate techniques for the determination of multidimensional efficiencies, Eur. Phys. J. Plus (2016), no. 6, 191, arXiv:1602.03758.
[32] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[33] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, CoRR abs/1412.6980 (2014), arXiv:1412.6980.
[34] D. P. O'Hanlon, Studies of CP-violation in charmless three-body b-hadron decays, PhD thesis, University of Warwick, UK, CERN-THESIS-2017-184. Presented 04 Oct 2017.
[35] A. Mathad, Search for CP violation in charmless three-body decays of strange-beauty baryons, PhD thesis, University of Warwick, UK, CERN-THESIS-2018-237. Presented 08 Oct 2018.
[36] S. Alonso-Monsalve and L. H. Whitehead, Image-based model parameter optimisation using Model-Assisted Generative Adversarial Networks, arXiv:1812.00879.
[37] R. Brun and F. Rademakers, ROOT: An object oriented data analysis framework, Nucl. Instrum. Meth. A389 (1997) 81.
[38] TensorFlowAnalysis package, https://gitlab.cern.ch/poluekt/TensorFlowAnalysis.