On the Inversion of High Energy Proton
Mikael Mieskolainen
Department of Physics, Division of Particle Physics and Astrophysics, University of Helsinki, P.O. Box 64, 00014 Helsinki, Finland
E-mail: [email protected]
Abstract:
Inversion of the K-fold stochastic autoconvolution integral equation is an elementary nonlinear problem, yet there are no de facto methods to solve it with finite statistics. To fix this problem, we introduce a novel inverse algorithm based on a combination of minimization of relative entropy, the Fast Fourier Transform and a recursive version of Efron's bootstrap. This gives us power to obtain new perspectives on non-perturbative high energy QCD, such as probing the ab initio principles underlying the approximately negative binomial distributions of observed charged particle final state multiplicities, related to multiparton interactions, the fluctuating structure and profile of the proton, and diffraction. As a proof-of-concept, we apply the algorithm to ALICE proton-proton charged particle multiplicity measurements done at different center-of-mass energies and fiducial pseudorapidity intervals at the LHC, available on HEPData. A strong double peak structure emerges from the inversion, barely visible without it.

Introduction
Differential distributions of final state particles in high energy collisions, such as multiplicity distributions, are notoriously difficult to model precisely from the first principles of QCD, due to the nature of confinement and non-perturbative infrared physics. The topic has been studied since the early days of the Hagedorn fireball [28], Feynman scaling [21] and Koba-Nielsen-Olesen/Polyakov self-similarity (fractal) scaling [33, 43]. Hadron production has been treated as resulting from the break-up of classic strings [7], the Feynman-Field model [22] and the Lund strings of PYTHIA [4, 5], the cluster hadronization of HERWIG [46], and the topological expansion of the cylindrical Pomeron particle production model or 'quark-gluon strings' [31], to name some well known concepts. From the perturbative side, the local QCD parton-hadron duality [8] is perhaps the most appealing, with much support already from LEP data. Basically, it states that hadronization happens at a low virtuality scale, independent of the hard scattering scale. However, the case of pure soft hadron-hadron scattering without any hard scale involved is then very interesting for collectivity or global color coherence reasons. These concepts are in one way or another implemented algorithmically in Monte Carlo event generators, with a varying number of free parameters to be fixed by data, with the goal of factorizing universal, initial state and center-of-mass energy (in)dependent properties.

In proton-nucleus (pA) and nucleus-nucleus (AA) physics a whole new class of phenomenological topics appears, from the space-time evolution of relativistic hydrodynamics and medium expansion to quark-gluon plasma signatures and the separation of the different stages and collective phases of the process. A very interesting topic is the question which of these are due to heavy ion physics and which of them already happen in elementary proton-proton interactions. In order to probe the ab initio physics behind many phenomenological models, and to factorize effects between heavy ions and elementary pp, solid mathematical approaches are crucial as analysis tools.

In this work we introduce a novel inverse algorithm to invert a certain simple stochastic integral equation, the K-fold autoconvolution. By autoconvolution we mean convolving the distribution with itself, and K-fold means repeating the operation recursively K times; this is a nonlinear operation. In addition, we take K to be a random number. The 'direct problem' and closely related problems are well studied in probability theory and implicitly ubiquitous in high energy physics, given the much studied pQCD logarithmic evolution integro-differential resummation schemes such as DGLAP in ln(Q²) and BFKL in ln(1/x), which require the non-perturbative parton densities as input. The 'inverse problem' has been studied much less, considering the variety of methods for standard deconvolution and, more generally, methods to invert approximately linear and nonlinear integral equations with varying kernels. In experimental high energy physics, the unfolding of detector responses is a better known problem [11, 16, 35, 48]. It corresponds to the usual deconvolution in the special case where the detector response matrix is of a convolution kernel type. One reason why inverse methods have not been studied extensively is, perhaps, the Monte Carlo event generators, which are extensively used as the direct models. The demanding inverse problem in that case is instead the multidimensional parameter estimation or tuning of the event generator itself.
The problem of inverting the pileup effects in inclusive soft QCD multiplicity measurements by inverting a K-fold autoconvolution was treated in [39]. However, the subtractive iterative solution there was proposed without a statistical treatment, explicit regularization or uncertainty estimation, and only for a small number of K. We address these issues here by developing an all-order inverse. As far as we know, our approach, which combines statistical modeling, the Fast Fourier Transform (FFT) and Efron's bootstrap, is the first of its kind in high energy physics, and also in other fields. However, we mention that related problems have been studied in a different context in laser optics [23], inverse problems and mathematics [14, 26, 38, 40, 45], and the problem is also closely related to the inverse Born series problem [41].

In Section 2 we introduce the mathematical direct problem and the related formalism together with some simple examples, and in Sections 3 and 4 we go through the corresponding inverse problem, its discretization, regularization and the algorithmic solution. Section 5 is devoted to simulations, and finally in Section 6 we do a proof-of-concept study of LHC-ALICE proton-proton multiplicity data. In Section 7 we discuss further applications and research directions.

Direct problem

Let the underlying continuous or discrete differential distribution be f_X, with normalization ∫ f_X(x) dx = 1, and let the corresponding random variable be X. The distribution can be discrete, for example, in the case of charged particle multiplicity spectrum measurements. The K-fold autoconvolution means that the measurement g_Y is the distribution of a sum of mutually independent identically distributed (i.i.d.) random variables Y = X_1 + X_2 + ... + X_K. If we take K ~ Poisson(µ) and all X_i ~ f_X, this is known as a compound Poisson distribution. The convolution arises here naturally, because the sum of random variables is equivalent to a convolution between the probability distributions of the corresponding random variables. The autoconvolution operation is defined by

  [f_X ⊛ f_X](y) ≡ ∫_{−∞}^{∞} f_X(y − x) f_X(x) dx.   (2.1)

If we take the support of f_X on [0, ∞), because physical observables are usually non-negative, then the lower bound of the autoconvolution integral range is zero. As a remark, in a more general setup, a translation dependent Green's function G(x, y) would be used instead of f(x − y). That would give a Fredholm integral equation instead of a convolution integral, leading us to the 'inverse Schrödinger equation' type of problems, as they are often called in the field of inverse problems.

For now, let us assume that the measurement follows a compound Poisson. However, in our algorithmic formulation we also allow any discrete distribution for K, not just the Poisson distribution. The distribution of the random variable Y is now given, by construction, as an autoconvolution series weighted with Poisson probabilities

  g_Y(y) = P_1 f_X(y) + P_2 [f_X ⊛ f_X](y) + P_3 [[f_X ⊛ f_X] ⊛ f_X](y) + ... = (1/(1 − e^{−µ})) Σ_{n=1}^{∞} (µ^n/n!) e^{−µ} f_X^{⊛n}(y),   (2.2)

where the convolution power ⊛n is defined recursively as f^{⊛n} = f^{⊛(n−1)} ⊛ f and f^{⊛1} = f. We have removed the unobservable case n = 0, which gives Y = 0, and renormalized the remaining Poisson probabilities P_n, n = 1, 2, 3, ..., to sum to one. The zero suppressed mean of the variable K is

  E[K | K > 0] = µ / (1 − exp(−µ)) > µ,   (2.3)

whereas the mean and variance (second central moment) of the compound distribution are given by the useful formulas

  E[Y] = E[K] E[X]   (2.4)
  Var[Y] = E[K] Var[X] + Var[K] (E[X])².   (2.5)

See [27] for all higher order moments and central moments of generic compound distributions.

In the Fourier spectral domain, the characteristic function (CHF) φ_X is defined as

  φ_X(t) = E[e^{itX}] = ∫_R e^{itx} dF_X(x) = ∫_R e^{itx} f_X(x) dx = ∫_0^1 e^{itQ_X(p)} dp,   (2.6)

where F_X(x) and Q_X(p) are the cumulative and the inverse cumulative distribution functions of the random variable X ∈ R. A probability generating function (PGF) G_X corresponding to a discrete probability mass function p ~ X ∈ N is a Laurent series around zero defined as

  G_X(z) = E[z^X] = Σ_{n=−∞}^{∞} p(n) z^n.   (2.7)

For the rest of the work we assume that the characteristic and generating functions are defined in the complex plane within the domain of convergence of the corresponding integral and sum, respectively, and leave the extensive algorithmic treatment of possible singularities for future work.

Using these tools, the well-known characteristic function of the compound Poisson is obtained with

  φ_Y(t) = E[e^{itY}] = E_K[(E[e^{itX}])^K] = E_K[(φ_X(t))^K] = Σ_{n=0}^{∞} φ_X(t)^n P(K = n) = Σ_{n=0}^{∞} φ_X(t)^n (µ^n/n!) e^{−µ} = e^{µ(φ_X(t) − 1)}   (2.8)

and solving similarly for the case K > 0 (n = 1, 2, ...) gives

  φ_g(t) ≡ φ_{Y|K>0}(t) = e^{−µ}(e^{µφ_f(t)} − 1)/(1 − e^{−µ}) = (e^{µφ_f(t)} − 1)/(e^µ − 1),   (2.9)

where we denote the characteristic function of f_X with φ_f(t). A direct attempt to invert this would be to take the inverse Fourier transform, but we would encounter the problem of the multivalued complex logarithm. Defining the complex logarithm in a suitable way, together with a kernel estimator, was proposed for 'decompounding' in [45]. Here we take a different approach, where we avoid altogether using complex logarithms or other multivalued complex operations such as complex roots.

We obtain a practical formal notation for the algorithm by representing the problem with an operator F: Ω_X → Ω_Y, where Ω_X and Ω_Y are the original and the smeared domains,

  F(f) + δg = g.   (2.10)

Our main goal is to invert this nonlinear mapping, taking also into account the statistical fluctuations δg. The nonlinearity comes directly from the fact that the operator F depends on f.

To point out, the existence of the K-fold convolution is guaranteed by the infinite divisibility of a probability distribution, which is characterized by the Lévy-Khintchine theorem. That is, for a given distribution f, there exists for any n the corresponding n-th convolution power. All members of the stable family, for example the Gaussian, Cauchy (non-relativistic Breit-Wigner), Lévy and Landau distributions, as well as the Poisson, negative binomial and Gamma distributions, are infinitely divisible. These families are also closed under convolution.
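As a minimal numerical sketch of this direct map (our own illustration, not code from the paper; the function and parameter names are ours), Eq. (2.9) can be evaluated on a zero-padded FFT grid:

    import numpy as np

    def compound_poisson_direct(f, mu, pad_factor=20):
        # Zero-truncated compound Poisson smearing of a pmf f, Eqs. (2.2)/(2.9).
        # pad_factor extends the support so the circular FFT convolution
        # does not wrap around in the upper tail.
        N = pad_factor * len(f)
        F = np.fft.fft(f, n=N)                           # characteristic function samples
        G = (np.exp(mu * F) - 1.0) / (np.exp(mu) - 1.0)  # the map of Eq. (2.9)
        g = np.fft.ifft(G).real                          # back to the probability domain
        g = np.clip(g, 0.0, None)                        # clip tiny FFT round-off
        return g / g.sum()

    # example: smear a geometric-like pmf with mu = 3
    f = 0.2 * 0.8 ** np.arange(50)
    f /= f.sum()
    g = compound_poisson_direct(f, mu=3.0)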
Examples

An interesting phenomenological fact, already observed in the CERN proton-antiproton UA5 data at √s = 540 GeV [3, 25], is that the charged particle multiplicity distributions tend to follow approximately a negative binomial distribution (NBD), or two of them, as the classic two component models may suggest. The same observation is still approximately valid at the LHC [24]. Mathematically speaking, the NBD follows from a compound Poisson distribution of a number K of logarithmic series i.i.d. variables, f(n; p) = p^n / (−n ln(1 − p)), n ∈ N_+, each with shape parameter p ∈ (0, 1), while K is Poisson distributed with parameter µ ∈ R_+. Thus we obtain the negative binomial distribution on N_+ as the compound distribution with parameters p ∈ (0, 1) and −µ/ln(1 − p) ∈ R_+. This mathematical picture may be useful to keep in mind while thinking about the possible dynamical explanations.

Monte Carlo models reproduce NBD like distributions by multipomeron cuts or multiparton interaction modeling – the low multiplicity shape of the distribution is usually very sensitive to the non-perturbative proton profile or eikonal density driving the number K with impact parameter dependent distributions, and also to the diffraction contribution. Kinematically, a system with cms energy W will span a longitudinal rapidity range Y ∼ ln(W²/W_0²), where W_0 is of the order of the proton mass. In Lund model like scenarios, also the average number of particles scales like ⟨N⟩ ∼ ln(W²/W_0²). The total multiplicity of a multistring system consisting of K strings will behave as ⟨N⟩ ∼ Σ_{i=0}^{K−1} ln(s_{i,i+1}/W_0²), with s_{i,i+1} = (k_i + k_{i+1})² being sub-Mandelstam invariants constructed from the parton 4-momenta {k_i} spanning the strings, and the variance Var[N] behaves proportionally to ⟨N⟩ [6], which is Poisson like behavior. So if the sub-invariant distributions behave typically like falling power laws, and the number K of simultaneous strings is also modeled as Poisson like, with or without impact parameter dependence, one will generate negative binomial like distributions. To point out at this point, the soft transverse momentum dependence with multiplicity is not understood from first principles.

Now let us think instead of the experimentally visible superposition of 'pileup' proton-proton interactions at the LHC for a given Poisson µ. At the LHC, the Poisson µ is measured for low values of µ (ALICE) using the minimum bias trigger occupancy ⟨R⟩ = 1 − P_0(µ) = 1 − e^{−µ} ⇔ µ = −ln(1 − ⟨R⟩), where R ∈ [0, 1] (in practice ⟨R⟩_{∆t,N_B} = (1/∆t ∫_{∆t} f_R(t) dt)/(N_B f_O), where N_B is the number of colliding bunch pairs, f_O the LHC revolution frequency (Hz) and f_R(t) the instantaneous trigger rate (Hz)). For large values of µ, when the occupancy is completely saturated (ATLAS, CMS), the scaling of the track multiplicity or the number of primary vertices is used to determine µ. A combination of both can be useful for intermediate values of µ (LHCb), for example. The pileup is illustrated in Figure 1. Convolution of a negative binomial with a negative binomial is yet again a negative binomial – this is the closure under convolution. It is worth pointing out that the estimation of µ based solely on the measured distribution g(x) itself cannot be done in general. This is because if the distribution is from a family of distributions which is closed under convolutions, then we have no capability of identifying whether the measured distribution is pileup free, or one with a larger scale and position parameter than the expected one, or the autoconvoluted version of one with smaller scale and position parameter.
Thus, µ must be a parameter of the inversion and must be inferred from other measurements or theoretical constructions.

Clearly, our discussion is now very close to the KNO / Polyakov scaling [33], which states that there is one elegant scaling function ψ giving the probability of multiplicity n as a function of the scaling variable n/⟨n(s)⟩,

  P(n; s) = (1/⟨n(s)⟩) ψ(n/⟨n(s)⟩),   (2.11)

where s is the center-of-mass energy squared Mandelstam variable, a Lorentz scalar. However, this scaling is known to be violated at some level since the ISR times, and at the LHC it holds merely in limited regions of the full phase space, in a narrow central rapidity window. In general, when inspecting scaling or its violation one needs to take into account the collision initial state, e.g. pp versus e+e−, the fiducial phase space of the measurement and the contribution of diffraction. At the model level, multiparton or multipomeron exchanges running as a function of energy usually create significant deviations from KNO scaling. Clearly, additional effects arise from soft versus hard scale interactions. See the books [18, 32] for discussion on a variety of perturbative QCD and other scenarios exhibiting scaling.
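The compound Poisson origin of the NBD stated above can be verified with a quick Monte Carlo cross-check (a sketch of ours, using scipy's logser and nbinom distributions; the parameter values are arbitrary):

    import numpy as np
    from scipy import stats

    p, mu, Nev = 0.6, 4.0, 20000
    rng = np.random.default_rng(1)

    # Y = X_1 + ... + X_K with K ~ Poisson(mu), X_i ~ Logarithmic(p)
    K = rng.poisson(mu, size=Nev)
    Y = np.array([stats.logser(p).rvs(size=k, random_state=rng).sum() if k > 0 else 0
                  for k in K])

    # theory: Y ~ NBD with shape r = -mu/ln(1-p); scipy's 'success' probability is 1-p
    r = -mu / np.log1p(-p)
    n = np.arange(60)
    pmf_theory = stats.nbinom.pmf(n, r, 1.0 - p)
    pmf_mc = np.bincount(Y, minlength=60)[:60] / Nev   # agrees within MC fluctuations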
Figure 1: K-fold compound Poisson autoconvolution of the negative binomial distribution with varying µ.

By the central limit theorem, the K-fold convolution will map almost every distribution eventually into a Gaussian distribution when E[K] → ∞ (for example, Cauchy / non-relativistic Breit-Wigner type distributions do not turn into a Gaussian, as the moments of these distributions are undefined). This is clearly visible in Figure 1, where we used the direct mapping of Equation 2.2 with different Poisson µ-values to convolve the negative binomial distribution

  P_NBD(n; ⟨n⟩, k) = [Γ(k + n) / (Γ(k) Γ(n + 1))] [⟨n⟩/(k + ⟨n⟩)]^n [k/(k + ⟨n⟩)]^k   (2.12)

with a fit to the LHC charged particle multiplicity data in proton-proton collisions at √s = 7 TeV for a central pseudorapidity interval, obtained from [24]. The fit parameter central values are ⟨n⟩ ≃ 12 and k ≃ 1, where ⟨n⟩ is the average multiplicity of the NBD and k is related to the variance D² = ⟨n²⟩ − ⟨n⟩² by D²/⟨n⟩² = 1/⟨n⟩ + 1/k.

In an explicit experimental realization, in order to take into account the detector response and its effects on the resolution and efficiency, an unfolding algorithm with uncertainty estimation and other corrections should be processed first – if necessary. The output of this can then be propagated to the algorithm described here. For the rest of the work, we assume this to be the case.
Inverse problem
Solving the K-fold autodeconvolution means here a statistical inverse of Equation 2.10. Unfortunately, measurement noise and modeling errors will usually forbid executing a direct inversion in practice, even if we had a closed form expression to do so. This ill-posedness usually requires either implicit or explicit regularization, which turns the problem into a more well-posed one.

We calculate the convolution by pointwise multiplication in the Fourier domain using the convolution theorem F[f ⊛ g] = F[f] F[g], which works also the other way around, F[f] ⊛ F[g] = F[fg]; this is also often ambiguously called convolution or folding of the distributions f and g. Now, a fixed K ∈ N autoconvolution is given by

  F[f^{⊛K}] = (F[f])^K = F[g],   (3.1)

where the power is applied as z^K = (|z| e^{i(φ+2πn)})^K = |z|^K e^{iKφ} e^{2πinK}, n ∈ Z. Thus, in the spectral domain the stochastic autoconvolution results in a random scaling of amplitudes and rotation of phases of the characteristic function. When K is not an integer, this results in a multi-valued operation, because e^{2πinK} does not map to a full rotation around the complex plane for different values of n. To remind, if we shift the probability density, f(x − x_0), this results in a phase shift in the frequency domain, exp(−iωx_0) F[f].

A formal 'naive' inverse is obtained by taking the complex K-th root in the Fourier domain, and then proceeding with an inverse Fourier transform

  (F[f]^K)^{1/K} = (F[g])^{1/K}   (3.2)
  f = F^{−1}[(F[g])^{1/K}],   (3.3)

analogously to the standard naive Fourier deconvolution. However, the complex root is a multi-valued operation, |z|^{1/K} e^{i(Arg(z)+2πn)/K} with n = 0, 1, ..., K − 1 roots, with suitable branch cuts to be specified algorithmically, in principle. Now we see that the only strictly well posed case is when the characteristic function is real and non-negative ↔ a mirror symmetric (even) probability density around zero, by the properties of the Fourier transform – a case which is clearly too restrictive to be of interest here. Also, this 'solution' does not take into account that K is a random variable, neither does it take into account the finite statistics of g and the mismatches between measurement and modeling, which cause instabilities that need to be regulated. For some related discussion, see for example [12].
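For illustration only, a principal-branch version of this naive inverse (our sketch; the spectral cut-off regularization used later in the simulations is included as an option) could look like:

    import numpy as np

    def naive_root_inverse(g, K, cutoff=None):
        # Formal fixed-K inverse f = IFFT[(FFT[g])^(1/K)], Eq. (3.3).
        # Only the principal complex root is taken, so branch cuts are ignored;
        # without a spectral cutoff this is unstable under counting noise.
        G = np.fft.fft(g)
        F = G ** (1.0 / K)
        if cutoff is not None:
            w = np.abs(np.fft.fftfreq(len(g)))   # normalized frequencies
            F[w > cutoff] = 0.0                  # crude spectral cut-off regularization
        f = np.clip(np.fft.ifft(F).real, 0.0, None)
        return f / f.sum()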
Discretization

In order to solve the finite sample version of the problem, we need a (discrete) representation of the problem. For simplicity, computational efficiency, and due to the high energy physics analysis convention – as a practical format of the measurement – we use histograms as the density estimators of f and g, even if simple kernel (Parzen) estimators and even more advanced density estimation techniques could be used in principle. Also, we consider only the 1D version of the problem; the extension to higher dimensions is formally straightforward.

The histograms are defined in terms of D and KD bins with corresponding fixed bin widths ∆x ≡ ∆y. In the context of the algorithms, histograms are represented with finite non-negative column vectors, which we denote with bolded symbols,

  f[n] = |{f_i ∈ [x_n, x_n + ∆x)}|,  n ∈ Z_{Ω_X} := {0, 1, ..., D − 1}   (3.4)

and similarly for the K-fold autoconvoluted

  g[n] = |{g_i ∈ [y_n, y_n + ∆y)}|,  n ∈ Z_{Ω_Y} := {0, 1, ..., K(D − 1)},   (3.5)

where f_i, g_i ∈ R denote sample elements and x_n, y_n denote the bin lower edges. We extended the domain of the autoconvoluted histogram K-fold, and because K is a random variable, in practice we take some large enough integer which avoids any truncation problems in the upper tail. Only the values {g_i} are directly observable and measured. Now the discrete autoconvolution is defined as

  [f ⊛ f][n] = Σ_{m=0}^{D−1} f[m] f[n − m],   (3.6)

and the K-fold version follows recursively. In the algorithmic implementations, when it is necessary to extend the domain of the vectors due to convolution, we do it explicitly by zero padding.

In principle, the binning can also be non-uniform, such as logarithmic. However, that must be taken explicitly into account in the evaluation of the discrete convolution, which is defined using uniform sampling; one classic solution could be interpolation and resampling schemes with B-splines, for example. Here, however, we use only fixed binning. The bin width ∆x needs to be chosen such that the spectrum resolution criteria and the event count statistics are taken into account – in order not to create a 'noisy spectrum' with low bin counts, or, on the other hand, induce a loss of resolution. Thus, it also has a role as an implicit regulator.
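A one-line numerical check of Eq. (3.6) and the zero padding convention (our sketch): the FFT of the zero-padded vector reproduces the linear, non-circular autoconvolution on the extended support {0, ..., 2(D − 1)}:

    import numpy as np

    f = np.array([0.5, 0.3, 0.2])       # D = 3 bins
    D = len(f)
    Npad = 2 * D - 1                    # support of the 2-fold autoconvolution
    conv_fft = np.fft.ifft(np.fft.fft(f, Npad) ** 2).real
    assert np.allclose(conv_fft, np.convolve(f, f))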
Relative entropy minimization

The measured histogram g is assumed to be bin-by-bin Poisson distributed – a usual assumption, given also the Poisson superposition property Σ_i Poisson(µ_i) ~ Poisson(Σ_i µ_i). The Poisson likelihood with a known constant background histogram b is then written as a product over the histogram bins

  p(g|f) = Π_i [F f + b]_i^{g_i} e^{−[F f + b]_i} / g_i!   (3.7)

and its negative log-likelihood is now our fit quality term

  J(f|g) = −log p(g|f) = Σ_i −g_i log([F f + b]_i) + [F f]_i + b_i + log(g_i!)   (3.8)

and neglecting the terms which do not affect the solution and rearranging gives

  J(f|g) = Σ_i g_i log(g_i / [F f + b]_i) + [F f + b]_i − g_i.   (3.9)

This often used formulation is the minimum relative entropy solution, and is equivalent to minimizing the generalized Kullback-Leibler divergence (relative entropy) [34], known also as the Csiszár I-divergence [15]. The gradient of this functional is

  ∇J(f) = ∂/∂f (−g^T log(F f + b) + 1^T F f) = F^T 1 − F^T (g / (F f + b)) = F^T (1 − g / (F f + b)),   (3.10)

where divisions are understood component wise. If the distributions are normalized to unity, then F^T 1 ≡ 1 holds; here, we use event counts.

The gradient based iterative update, or a fixed point iteration, is now obtained from the steepest descent update f_{k+1} = f_k − γ D_k ∇J(f_k), where D_k is a gradient scaling matrix. If we set D_{f_k} = diag(f_k) and γ = 1, which is well known to correspond in this case to the expectation maximization (EM) [17] formulation of the frequentist maximum likelihood solution under the Poisson likelihood, then we obtain a Richardson-Lucy [37, 44] type multiplicative deconvolution formula, well known in optics and astrophysics,

  u_k = (f_k ⊘ (F^T 1)) ⊙ (F^T (g ⊘ (F f_k + b))),   (3.11)

where ⊙ and ⊘ are the Hadamard component wise vector product and division, respectively. This was also re-invented by D'Agostini [16] in the high energy physics unfolding context. For the rest of the paper, we set the background b = 0. The RL-type update conserves the non-negativity of the solution and the total number of events. Another way to derive the RL type update is to set the gradient of Equation 3.10 to zero, move terms to both sides, multiply by f and do a fixed point iteration.
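One EM / Richardson-Lucy step of Eq. (3.11) is only a few lines of dense linear algebra (a sketch with hypothetical names; F is here the direct operator as a dense matrix):

    import numpy as np

    def em_step(f, g, F, b=0.0):
        # One multiplicative Richardson-Lucy / EM update, Eq. (3.11).
        # Conserves non-negativity; divisions are component wise.
        reproj = F @ f + b                  # current re-projection F f + b
        ones = np.ones(F.shape[0])
        return (f / (F.T @ ones)) * (F.T @ (g / reproj))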
Regularization

The problem of choosing the spectral bandwidth cut-off, or the regularization strength in the non-transformed domain, is one of the most common but also most difficult topics of inverse problems. Motivated by the fact that the distributions of observables are for us usually smooth and non-discontinuous, we use a traditional variational regularization with a Tikhonov ℓ²-norm type regularity functional

  J_R(f) = λ ∫_Ω ‖∇f(x)‖² dx,   (3.12)

where Ω ⊂ R. The minimum of this is given by

  ∇J_R(f) = λ ∇(∫_Ω ‖∇f(x)‖² dx) = 2λ ∇*∇f(x) = −2λ div(∇f(x)) = −2λ ∇²f(x) = 0,   (3.13)

where we applied to the integral Neumann boundary conditions, the Gauss divergence theorem and the standard vector analysis identities, and ∇² is the usual Laplacian. That is, the gradient descent direction is given by the negative Laplacian.

Instead of using the abstract functional gradient, the regularization term is discretized using a finite difference matrix and the corresponding finite Laplacian

  ∇_M ≡
    [ −1   1    0   ···   0 ]
    [  0  −1    1         ⋮ ]
    [  ⋮        ⋱    ⋱      ]
    [  0   ···      −1    1 ],

  ∇²_M ≡ ∇_M^T ∇_M =
    [  2  −1    0   ··· ]
    [ −1   2   −1       ]
    [       ⋱   ⋱   ⋱   ]
    [  ···      −1    2 ].   (3.14)

It is naturally also possible to modify the boundary conditions, which affect these matrices, based on a priori assumed or known properties of the distribution f(x).

The regularization parameter λ depends on the observed event count N, the number of histogram bins D (= problem discretization) and the shape of the distribution of interest. The average pileup µ and the functional smoothness of the distribution under inversion are very important factors affecting the nonlinear direct operator and thus the necessary regularization strength. In the limit N → ∞ and µ → 0, we can asymptotically take λ → 0. There are several semi-heuristic 'data-driven' methods developed for selecting the regularization. Well known principles are the L-curve, generalized cross validation (GCV), Morozov's discrepancy principle and various covariance and residual spectrum analysis methods [42]. A Monte Carlo event generator driven or toy model based analysis together with data driven approaches is also a reasonable choice in high energy physics. We use a simple variational equilibrium approach to choose the regularization parameter; this is described in Section 5.

The full solution, taking together fidelity + regularity, usually means directly minimizing in the direction of the combined gradient of the full cost

  min_f J(f|g) + λ J_R(f)   (3.15)

as implemented for example in [9] in the context of image deconvolution. However, we observed this to give very unstable results in this problem. Thus, we implemented the regularization step through a forward-backward splitting type gradient update, similar in fashion to proximal optimization algorithms. It means that we update the result after the EM-update step as

  f_{k+1} = P_+(u_k − λ ∇²_M u_k),   (3.16)

where P_+(·) is the positive solution projection operator P_+: R^n → R^n_+ such that P_+(x) = y gives y_i = x_i if x_i > 0, else y_i = 0, for all i = 1, ..., n. Separating the regularization as a separate step resulted in much improved behavior. The approach chosen here is the best performing one we found from the vast phase space of optimization algorithms and regularization approaches. One must remember that we are not dealing here with an inverse problem with a fixed kernel, thus this problem has its own peculiarities.
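The splitting step of Eq. (3.16) with the finite-difference Laplacian of Eq. (3.14) can be sketched as follows (our illustration, with a simple tridiagonal Laplacian):

    import numpy as np

    def laplacian(D):
        # Tridiagonal finite second-difference matrix, cf. Eq. (3.14)
        return 2.0 * np.eye(D) - np.eye(D, k=1) - np.eye(D, k=-1)

    def smooth_and_project(u, lam, L):
        # Forward-backward splitting step of Eq. (3.16):
        # one explicit gradient step on the Tikhonov term, then the P_+ projection
        f_next = u - lam * (L @ u)
        return np.clip(f_next, 0.0, None)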
Algorithm

For every iteration, we construct the autoconvolution operator by taking the discrete Fourier transform using the FFT, F = F[f], and then construct the empirical characteristic function of the projected measurement with the nonlinear complex map G = (e^{µF} − 1)/(e^µ − 1) using Equation 2.9, where the complex exponential is the single valued function exp(z) = Σ_{n≥0} z^n/n!, z ∈ C. The autoconvolution 'kernel' in the spectral domain is then obtained by taking a pointwise division H = G ⊘ F, inverse transforming h = F^{−1}[H] and finally representing it as a Toeplitz convolution matrix to obtain F = T[h]. The operator T represents here a simple standard way to construct a Toeplitz matrix. We assume F to be non-zero for all of its components, which might be violated in pathological cases; those are intrinsically non-invertible without extra a priori information.

We remark here that this nonlinear map and division in the spectral domain has a different goal than in the usual Fourier domain linear deconvolution such as in Wiener filters [47], where the noise amplification is regularized in the spectral domain. In Wiener filtering, the division gives a solution to the problem non-recursively. Here, on the other hand, we use the spectral domain as a way to obtain the Toeplitz matrix representation in the probability domain extremely efficiently via the FFT. The full solution is recursive and regularized in the probability domain.

Discretization of the operator F in the case of an arbitrary (non-Poisson) compound distribution is done with a finite number of convolution Toeplitz matrices F^{(n)}, n = 1, ..., K̃, with F^{(1)} = I, up to a numerically suitable finite order K̃. The maximum order is basically limited only by the available computing resources. These Toeplitz matrices are then added together with weights P_n. That is, we evaluate the discretized and truncated version of Equation 2.2. This numerical finite order implementation was cross checked against the exact all order Fourier domain method in the Poisson case, as described above, and they agreed within floating point accuracy.

Now summarizing: the multiplicative algorithm described above corresponds to a non-negative solution under a Poisson bin fluctuation likelihood, with the generalized Kullback-Leibler divergence as the fit criterion. On the other hand, subtractive algorithms are usually encountered as iterative solutions to unconstrained optimization under Gaussian noise, with an ℓ²-minimum norm minimizing the least squares cost J_LS(f) = ‖F f − g‖², with the iterative solution

  f_{k+1} = f_k − γ ∇J_LS = f_k − γ F^T (F f_k − g),   (4.1)

which is known as the Landweber iteration scheme [36]. Non-negativity of the solution should be enforced simply with the projector P_+(·), because it is not a built-in constraint of standard least squares. The γ is a relaxation parameter which controls the convergence, and can be optimized by several different line search methods or by fixing it to a suitable small constant, which gives (much) slower convergence. This Gaussian version of the algorithm could be utilized with large event samples; with low event count histograms, the multiplicative solution was observed to be superior in terms of the stability of the solutions.
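For completeness, a sketch of one Landweber step of Eq. (4.1) with the positivity projection (our illustration; γ must be chosen small enough for convergence):

    import numpy as np

    def landweber_step(f, g, F, gamma=0.5):
        # One subtractive Landweber iteration, Eq. (4.1), for the
        # Gaussian-noise least squares variant, with P_+ enforced afterwards
        f_next = f - gamma * (F.T @ (F @ f - g))
        return np.clip(f_next, 0.0, None)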
Algorithm 1 Kisu: K-fold Inverse of Stochastic Autoconvolution

INPUT: Smeared histogram g with N events, regularization strength λ ∈ R_+, Poisson mean µ ∈ R_+ OR a discrete distribution P_n with Σ_{n=1}^{K̃} P_n = 1, number of iterations R ∈ N_+, background histogram b or otherwise b = 0. If µ is given, use option §I below, else use the input P_n and option §II below.

Initialize f_1 ← Z_{K̃}(g − b) with K̃-fold zero padding Z_{K̃}(·)
for all k = 1, ..., R do
  Step 1. Construction of the direct operator:
    §I Poisson case, 'all-order' map:
      F ← F[f_k], G ← (e^{µF} − 1)/(e^µ − 1), H ← G ⊘ F, h ← F^{−1}[H], F_k ← T[h]
    §II Generic P_n case, evaluated 'order-by-order':
      for all n = 2, ..., K̃ do
        F_k^{(n)} ← T[f_k^{⊛(n−1)}],  f_k^{⊛n} ← f_k^{⊛(n−1)} ⊛ f_k
      end for
      F_k ← Σ_{n=1}^{K̃} P_n F_k^{(n)}, where F_k^{(1)} = I
  Step 2. EM-step + regularization/smoothing with positivity constraint:
      u_k ← (f_k ⊘ (F_k^T 1)) ⊙ (F_k^T (g ⊘ (F_k f_k + b)))
      f_{k+1} ← P_+(u_k − λ ∇²_M u_k)
end for
OUTPUT: Deconvolution estimate f̂ ← Z_{K̃}^{−1}(f_R) with the same support as g.
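Step 1 of the Poisson branch can be sketched numerically as follows (our illustration; scipy.linalg.toeplitz builds the lower-triangular causal convolution matrix, and the division assumes, as stated above, that F has no zero components):

    import numpy as np
    from scipy.linalg import toeplitz

    def build_direct_operator(f, mu):
        # Step 1, option §I of Algorithm 1: spectral-domain construction of the
        # linearized autoconvolution 'kernel' h and its Toeplitz representation.
        # f is assumed already zero-padded to the extended support.
        n = len(f)
        F = np.fft.fft(f)
        G = (np.exp(mu * F) - 1.0) / (np.exp(mu) - 1.0)  # Eq. (2.9) map
        H = G / F                                        # assumes no zeros in F
        h = np.fft.ifft(H).real
        # causal convolution matrix: (F f)[i] = sum_j h[i-j] f[j] for i >= j
        return toeplitz(h, np.r_[h[0], np.zeros(n - 1)])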
Uncertainty estimation

Point estimates are obtained with Algorithm 1, which we call by the name Kisu. The algorithm first solves the direct problem, then takes one EM-step plus regularization of the inverse, and recursively alternates between these two until convergence. The regularization is done explicitly, because it is well known that the EM-type maximum likelihood solution alone will amplify the high frequencies (∼ Poisson noise) when the number of iterations k → ∞. That is, the iteration first fits the low frequency Fourier components and later starts to fit the high frequency noise to the solution. A semi-explicit regularization strategy would correspond to an early stop of the iteration. However, with an explicit regularization scheme as here, the iteration can be allowed to continue till convergence within some numerical threshold.

The iterative bias estimation and uncertainty estimation are done using a numerical Monte Carlo bootstrap, Algorithms 2 and 3. We call these the daughter and the mother bootstrap, respectively. The procedure is started by calling the mother bootstrap, which calls the daughter bootstrap, which calls Kisu. The bootstrap is selected here due to the nonlinear and recursive nature of the problem, which makes the usual linearized Taylor expansion error propagation and analytical (Gaussian) approximations unreliable or semi-impossible.

Algorithm 2 'Daughter bootstrap' – Iterative Bias Correction
INPUT: Smeared histogram g, number of bias correction iterations M ∈ N_+, bootstrap sample size per iteration B ∈ N_+ and Algorithm A1 parameters.

Run A1 using g as input, obtain f̂
Set f̂⋆ ← f̂, ĝ⋆ ← g
for all i = 1, ..., M do
  Set S_i ← ∅
  for all j = 1, ..., B do
    Draw g*_j ~ ĝ⋆ with replacement using toy Monte Carlo
    Run A1 using g*_j as input, obtain f*_j, update S_i ← S_i ∪ {f*_j}
  end for
  Step 1. Iterated bias estimate using the bootstrap sample S_i mean: Bias[f̂] ← Mean[S_i] − f̂⋆
  Step 2. Point estimate with bias correction, and non-negativity re-enforced: f̂⋆ ← P_+(f̂ − Bias[f̂])
  Step 3. Using the direct operator estimation Step 1 of A1, generate: ĝ⋆ ← F⋆(f̂⋆)
end for
OUTPUT: Bias corrected estimate f̂⋆, first estimate f̂ and the sample S_B for the standard bootstrap confidence interval evaluation.

Bootstrap was introduced in the seminal paper by Efron in 1979 [19], and the recursive bias correction was first discussed in 1986 [29]. In the high energy physics unfolding context, bootstrap iterated bias correction and related confidence intervals were utilized only quite recently in [35], and we follow a similar strategy; the underlying inverse problem is, in contrast, nonlinear here. The bootstrap is a relatively easy method to implement, and variants of it have long been used in high energy physics, often dubbed under the large umbrella of 'toy Monte Carlo'. A practical problem is the high computational cost; fortunately, bootstrap sampling is trivially parallelizable, although Algorithm 1 and the iterative bias correction loop of Algorithm 2 are recursive and thus cannot be made fully parallel.

The bias of a parameter estimator is by definition Bias[θ̂] = E[θ̂] − θ. The iterative bootstrap estimate of the bias replaces the unknown expectation with a bootstrap sample mean, and the true parameter value with the current estimate of the parameter; this is described in Algorithm 2. However, the variance of the estimator, var[θ̂] = E[(θ̂ − E[θ̂])²], can grow as a result of the bias correction. That is, thinking in terms of the classic but non-robust mean squared error MSE[θ̂] = E[(θ̂ − θ)²] = var(θ̂) + (Bias[θ̂])², it is clear that a good estimator is a combination of both qualities. The classic goal has usually been to find the minimum variance unbiased estimator (MVUE), which clearly minimizes the MSE among the unbiased estimators. For more information about the iterative bootstrap, we refer the reader to the book by Hall [30].

Algorithm 3 'Mother bootstrap'
INPUT: Smeared histogram g, bootstrap sample size Q ∈ N_+, Algorithm A2 and A1 parameters.

Set S⋆ ← ∅, S0 ← ∅
for all q = 1, ..., Q do
  Draw g*_q ~ g with replacement using toy Monte Carlo
  Run A2 using g*_q as input, obtain f̂*⋆_q and f̂*0_q, update S⋆ ← S⋆ ∪ {f̂*⋆_q}, S0 ← S0 ∪ {f̂*0_q}
end for
OUTPUT: Samples S⋆ and S0 for calculating the confidence intervals, median etc., for both bias corrected (⋆) and uncorrected (0) estimates.

We calculate the uncertainty estimates by ordering the sample obtained from the mother algorithm and calculating the pointwise (vector component wise) 1 − 2α percentile intervals at level α as [f̂*_{⋆/0, α}, f̂*_{⋆/0, 1−α}], where ⋆/0 means we consider both the bias corrected (⋆) and uncorrected (0) estimates and their intervals. As another option, the so-called basic bootstrap intervals would be obtained with [2f̂ − f̂*_{⋆/0, 1−α}, 2f̂ − f̂*_{⋆/0, α}], where f̂ is the estimate obtained by directly running Kisu. Basic bootstrap intervals clearly have a possibility of flipping the intervals upside down, which we also observed in the simulations. One could also go further and estimate the spectrum global uncertainty coverage, not just the point by point local one. For more information, see [20].
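The percentile interval extraction and the 'toy Monte Carlo' resampling can be sketched as follows (our illustration; S is assumed to be a Q × D array of bootstrap replicas stacked row-wise):

    import numpy as np

    def resample_histogram(g, rng):
        # Draw g* ~ g with replacement: multinomial draw with the observed
        # bin fractions and the same total event count
        N = int(g.sum())
        return rng.multinomial(N, g / g.sum()).astype(float)

    def percentile_intervals(S, alpha=0.05):
        # Pointwise bootstrap percentile intervals [f*_alpha, f*_(1-alpha)]
        lo = np.percentile(S, 100.0 * alpha, axis=0)
        hi = np.percentile(S, 100.0 * (1.0 - alpha), axis=0)
        return lo, hi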
Simulations

The first scenario to study is the simplest fixed-K autoconvolution inverse of the negative exponential distribution, without any uncertainty estimation. We illustrate this in Figure 2 and compare with the naive Fourier domain inverse with spectral cut-off regularization; that is, we set the high frequencies above the cut-off Λ_ω to zero, with ω ∈ [0, 1]. The spectral cut-off is tuned to give the best performance for this scenario. We observe nearly perfect reconstruction with Kisu and very large oscillations with the naive Fourier domain algorithm. These oscillations appeared immediately after a small amount of counting fluctuations in the observed spectrum g, which we simply added here as Gaussian noise. For typical iteration trajectories, see Figure 3. We see that the variational regularity cost is kept almost constant, and the error with respect to the ground truth and the re-projection error behave in the same way, which is always called-for behavior.

Figure 2: Simplified fixed-K autoconvolution inverse using the naive Fourier domain inverse with spectral cut-off and the Kisu based inverse. The true distribution is f(x) = 1/α exp(−x/α) with α = 2.

Figure 3: Typical Kisu algorithm iteration trajectories. Red lines denote the re-projection error, blue lines the error with respect to the true distribution (available only in simulation) and black lines the variational regularity (smoothness).

We tried two different 'data-driven' ways of choosing the regularization parameter, see Figure 4. First, the regularization parameter λ was selected as the equilibrium point min(α + β), where the re-projection error is α = χ_g = ||(g − ĝ)/√g||_{ℓ²} and the regularity cost is β = χ_{∇f̂} = ||∇f̂/√f̂||_{ℓ²}, with operations taken element wise. The re-projection is simply defined as ĝ = F̂(f̂), available after the inversion. The values for different λ are obtained via a brute force loop; this was repeated for each simulation scenario separately. The second criterion was to use a Morozov-like discrepancy principle point, where the re-projection error α is approximately the same as the number of histogram bins D. For the rest of the simulations, we use the equilibrium solution. When λ is too large, the solution becomes over-smooth, which results in a loss of high frequency details, and when λ → 0, we start to fit the noise in the high frequencies. In general, slightly larger λ values were always obtained with the equilibrium solution compared to the discrepancy principle.

Figure 4: Data driven equilibrium between the re-projection cost α and the variational regularity cost β. The horizontal line denotes the number of bins D in the measured histogram ∼ number of degrees of freedom.

In more demanding simulations we have a scenario which corresponds approximately to inverting minimum bias pileup at the LHC. We chose fully analytical distributions as the ground truth f and drew samples via von Neumann acceptance-rejection Monte Carlo sampling. The distribution is a negative binomial with the parameters given in Figure 5, and an exponential in Figure 6. The number of events N and the Poisson µ were varied, with the histogram bin size ∆x kept approximately fixed. One must remember that the overall statistics scales also with the µ-value, so there must be an optimal point with respect to the pileup deterioration and the collected number of events.

Convergence of the iterative algorithms was obtained in every scenario, and the fundamental limitations were hit when the smeared distribution g was almost purely Gaussian, which was the case for large enough µ. In that case, the inverted distribution was lacking any fine structure. Frequentist pointwise uncertainty bands (∼95% CL) were obtained from the bootstrap procedures. The coverage is heuristically reasonable when comparing with respect to the truth; however, the nonlinear oscillation is not clearly covered by the bootstrap. A signal to uncertainty ratio, shown in each figure, seems to be a reasonable sanity threshold cut.

Figure 5: Simulations with f(x) = P_NBD(x; ⟨n⟩ ≃ 12, k ≃ 1). Bootstrap based uncertainties are denoted with blue.

Figure 6: Simulations with f(x) = 1/α exp(−x/α) with α = 6. Bootstrap based uncertainties are denoted with blue.

In Figure 5, we observe some oscillation or 'nucleation' of certain eigenmodes in the solution, which is not an unusual phenomenon in inverse problems. We see that the bias correction seems to increase the variance of the estimates slightly, which is visible in bin-by-bin fluctuations. This is expected due to the bias-variance tradeoff. The optimal number of bias iterations should be studied further, because it is a free parameter, unless taken till convergence. In the NBD simulation case, the bias correction seems to have slightly amplified the oscillation via recursion; here, we used three iterations. Thus, it can behave as an additional regulator of the problem. The size of the bootstrap samples for the daughter and mother algorithms should be taken as large as computational resources allow.
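Returning to the data-driven λ selection above, a minimal sketch of the equilibrium scan (our illustration; `invert` is a hypothetical callable returning the estimate f̂ and its re-projection ĝ for a given regularization strength):

    import numpy as np

    def equilibrium_lambda(g, lambdas, invert):
        # Brute force scan: pick lambda at the equilibrium point min(alpha + beta),
        # where alpha is the re-projection error and beta the regularity cost
        costs = []
        for lam in lambdas:
            f_hat, g_hat = invert(g, lam)
            alpha = np.linalg.norm((g - g_hat) / np.sqrt(np.maximum(g, 1.0)))
            beta = np.linalg.norm(np.diff(f_hat) / np.sqrt(np.maximum(f_hat[:-1], 1e-12)))
            costs.append(alpha + beta)
        return lambdas[int(np.argmin(costs))]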
As a practical guideline, extensive simulations should always be used to investigate the problem specific stability, regularization, bias-variance tradeoff and uniqueness.

LHC data

For this proof-of-concept study, we use ALICE proton-proton charged particle multiplicity spectra measured at √s = 0.9 and 7 TeV from HEPData [2]. The publication also includes data at √s = 8 TeV, which we leave out here because of its great similarity with √s = 7 TeV, but we include it in our analysis code available online. Phenomenologically, the average central multiplicity density dN_ch/dη in proton-proton collisions follows a Regge-like power law scaling ∝ s^{α(0)−1}, with an effective Pomeron intercept α(0) slightly above one. The event selection definitions are: the INEL (minimum bias inelastic), which in this case means minimal activity in any of the trigger subdetectors, and the NSD (non-single-diffractive), which is an event rapidity topology based selection. The idea behind the NSD is to suppress qualitatively single diffraction events within a forward system mass range which result in a large enough pseudorapidity gap on the forward or backward pseudorapidity side of the detector. The single diffractive suppression is done simply by requiring event activity on both forward and backward triggers. In addition, the INEL events may have either N_ch ≥ 0 or N_ch ≥ 1 particles (tracklets) required in the central region |η| < 1. We use the N_ch ≥ 0 class, because diffractive events can have N_ch = 0 at central rapidities but still trigger the forward detectors.

We executed the Kisu inversion simply based on the Poisson compounding hypothesis with different µ values and the regularization fixed to a small λ in order not to bias the distribution shapes. Varying the regularization strength results in principle in a systematic uncertainty, but we observed it to affect mainly the small scale structure. The ALICE data uncertainties are a combination of systematic factors with technically unknown distribution coverage, for example soft QCD Monte Carlo modeling affecting the estimated trigger efficiencies and detector unfolding, and statistical counting fluctuations. For the systematic shape variations we did a minimum and maximum global shape shift and evaluated statistical bin-by-bin bootstrap Poisson re-sampling on top of that, with the given event sample sizes from [2]. The uncertainties in Figures 7 and 8 represent the 95% CL values of this procedure. In principle, one could also have done a bin-by-bin bootstrap for the systematic shapes, but because that procedure would neglect all bin-to-bin continuity correlations, it would give way too large high frequency fluctuations, not really reflecting typical systematic spectrum distortion or bias variations.

Figure 7: ALICE INEL (left) and NSD (right) data at √s = 0.9 TeV [2] for three different pseudorapidity intervals and the Kisu inversion results under different µ-hypotheses.
Figure 8: ALICE INEL (left) and NSD (right) data at √s = 7 TeV [2] for three different pseudorapidity intervals and the Kisu inversion results under different µ-hypotheses.
Figure 9: ALICE NSD data parameter µ-scans for two different energies, √s = 0.9 TeV (left) and 7 TeV (right).

As a surprise from the inversion, a strong hidden secondary NBD like peak structure appears under certain µ hypotheses, i.e. for a particular number of simultaneous 'mini-collisions', with the zero-suppressed average given by Equation 2.3. Also interesting is that this peak is not strongly visible in the INEL class but appears only once single diffractive events have been suppressed using the NSD class. It seems that the first NBD like peak is similar at both energies, but the second one runs with energy. To point out, the INEL event class with N_ch ≥ 1 (figures available using our code) gives results slightly similar to the NSD class, as this requirement also effectively suppresses diffraction. Also, in general the secondary peak increases in relative amplitude when the pseudorapidity window is enlarged, and the peak position also shifts. A thorough interpretation of this observation requires further studies. A typical interpretation of negative binomial double peak structures is that there are two separate mechanisms initiating particle production, a soft confining one and a hard point like one. The high multiplicity tail is also interesting, being usually difficult to describe perfectly with many Monte Carlo models. For comparisons, see [2].

In Figure 9, we scan numerically over µ for the minimum of df̂(µ)/dN as the criterion for the steepest dip. The results for µ̂ show the mean and its standard error over different pseudorapidity intervals, the value growing with energy, which is expected in the multiparton interaction picture with increasing particle densities. At the lower energy, there seems to be larger variation in the solutions. We found out that using larger regularization λ values pushes the minimum towards lower values of µ, perhaps simply (over)-smoothing the solutions. Finally, we see in the ratio plots of Figures 7 and 8 that the re-projection ĝ = F̂(f̂) versus the measured g has the largest tension at very low multiplicities, in the domain dominated by diffraction, with qualitatively different behavior than the NBD peaks. The second interesting domain is at intermediate multiplicities, where oscillations are manifest.
Figure 10: Inversion results for f̂ in the (N_ch, µ) plane at √s = 7 TeV for a central pseudorapidity interval. Colors denote values of log[P(N_ch)].

Figure 10 demonstrates the 'topological' differences between the inversion results as a function of the µ parameter. The distribution splits and develops the secondary peak at a certain µ-value in the case of the NSD event class. In the case of the INEL N_ch ≥ 1 class, splitting also happens, and the re-fusion of the two peaks happens faster as a function of the growing µ-value. Whereas in the case of the INEL N_ch ≥ 0 class distribution, no splitting is visible.

Conclusions and prospects
We studied the inversion of a stochastic autoconvolution integral equation and showed that useful inverse solutions can be obtained with a novel algorithm. Special effort was invested in the bottom-up algorithmic construction and in the statistical and systematic uncertainty estimation. A new class of semi-model-free phenomenological interpretations and statistical tests of inclusive distributions in soft QCD is possible with the algorithm. It is possibly also useful in jet substructure studies, when used together with jet algorithms. The algorithm can also be utilized directly as an experimental pileup inversion algorithm in minimum bias studies. There are also possibilities to combine the statistical approach derived here with event-by-event pileup correction algorithms such as [10, 13]. Future directions may include a more detailed analytic treatment in terms of the phenomenology of soft QCD, but also technical development of the algorithm for other types of statistical integral equations. The ultimate goal could be a higher dimensional tomography of the high energy proton, which may be realistic with the help of deep neural network techniques.

Inverting the final state multiplicity spectrum caused by independent multiparton interactions (MPI) is an obvious research target. We did a simple one free parameter inversion using the ALICE multiplicity measurements, which exposed an interesting secondary peak structure, most prominent within the NSD event selection class. We are not aware of a similar study; typically, superposition fits with multiple negative binomial distributions are done, or Monte Carlo model based studies. One could do a further study where different cms energies and rapidity intervals are matched together using the autoconvolution picture in the direct or inverse direction. Speaking in the Regge theory language, an approximately similar concept is the multiple independent Pomeron exchanges with 'AGK cuts' [1]. Given a model of the exchange probabilities of multi-Pomeron exchange with inelastic cuts, such as the inelastic eikonal approximation, in principle we can reconstruct single Pomeron exchange observables from the measured observables without a Monte Carlo model. Deviations from expectations indicate multipomeron (enhanced) interactions, which are nonlinear effects not fitting in the K-fold autoconvolution picture. Thus, the inverse algorithm result can be seen as a statistical null-hypothesis generator. In addition, probing the fuzzy interface between multiplicities driven by semi-hard scale interactions versus soft cut Pomerons could be interesting. Also the nonlinear gluon saturation conjecture at low Bjorken-x and its robust experimental observables are open problems.

Combining the approach here with other mathematical tools, the shadow of diffraction can be studied in a new light. That is, it would be interesting to see if data can directly show evidence for AGK-cut like phenomena – certain specific additive algebraic rules, multiplicity density, rapidity gap topologies. So far, no water-proof observables with a one-to-one mapping with AGK have been measured, as far as we know. We also see that the methods here could possibly give new insight into the 'centrality' determination in heavy ion collisions, when combined together with the Gribov-Glauber model. Inverting the measured transverse energy distribution could lead to a more precise understanding of pure 'binary scaling' (independent nucleon-nucleon collisions) versus strong nuclear effects.
The mathematical description here is, essentially, what is approximately meant by binary scaling.

Notes: The complete code to reproduce all algorithms and figures in this paper is available under the MIT license at github.com/mieskolainen.
Acknowledgments
Risto Orava is acknowledged for discussions on the topic.
References

[1] V. Abramovsky, V. Gribov, and O. Kancheli. Sov. J. Nucl. Phys., 18:308, 1974.
[2] S. Acharya et al. Charged-particle multiplicity distributions over a wide pseudorapidity range in proton-proton collisions at √s = 0.9, 7 and 8 TeV. Eur. Phys. J., C77(12):852, 2017. doi: 10.1140/epjc/s10052-017-5412-6.
[3] G. Alner, K. Alpgård, P. Anderer, R. Ansorge, B. Åsman, K. Böckmann, C. Booth, L. Burow, P. Carlson, J.-L. Chevalley, et al. Multiplicity distributions in different pseudorapidity intervals at a CMS energy of 540 GeV. Physics Letters B, 160(1-3):193–198, 1985.
[4] B. Andersson, G. Gustafson, and C. Peterson. A semiclassical model for quark jet fragmentation. Zeitschrift für Physik C Particles and Fields, 1(1):105–116, 1979.
[5] B. Andersson, G. Gustafson, G. Ingelman, and T. Sjöstrand. Parton fragmentation and string dynamics. Physics Reports, 97(2-3):31–145, 1983.
[6] B. Andersson, P. Dahlqvist, and G. Gustafson. On local parton-hadron duality. Zeitschrift für Physik C Particles and Fields, 44(3):461–466, 1989.
[7] X. Artru and G. Mennessier. String model and multiproduction. Nuclear Physics B, 70(1):93–115, 1974.
[8] Y. I. Azimov, Y. L. Dokshitzer, V. A. Khoze, and S. Troyan. Similarity of parton and hadron spectra in QCD jets. Zeitschrift für Physik C Particles and Fields, 27(1):65–72, 1985.
[9] J. M. Bardsley and C. R. Vogel. A nonnegatively constrained convex programming method for image reconstruction. SIAM Journal on Scientific Computing, 25(4):1326–1343, 2004.
[10] D. Bertolini, P. Harris, M. Low, and N. Tran. Pileup per particle identification. Journal of High Energy Physics, 2014(10):59, 2014.
[11] V. Blobel. An unfolding method for high energy physics experiments. arXiv preprint hep-ex/0208022, 2002.
[12] M. Bøgsted and S. M. Pitts. Decompounding random sums: a nonparametric approach. Annals of the Institute of Statistical Mathematics, 62(5):855–872, 2010.
[13] M. Cacciari, G. P. Salam, and G. Soyez. SoftKiller, a particle-level pileup removal method. The European Physical Journal C, 75(2):59, 2015.
[14] K. Choi and A. D. Lanterman. An iterative deautoconvolution algorithm for nonnegative functions. Inverse Problems, 21(3):981, 2005.
[15] I. Csiszár. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. The Annals of Statistics, pages 2032–2066, 1991.
[16] G. D'Agostini. A multidimensional unfolding method based on Bayes' theorem. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 362(2):487–498, 1995.
[17] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.
[18] Y. Dokshitzer, V. Khoze, A. Mueller, and S. Troyan. Basics of Perturbative QCD. Atlantica Séguier Frontières, 1991.
[19] B. Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7:1–26, 1979.
[20] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. CRC Press, 1994.
[21] R. P. Feynman. Very high-energy collisions of hadrons. Physical Review Letters, 23(24):1415, 1969.
[22] R. D. Field and R. P. Feynman. A parametrization of the properties of quark jets. Nuclear Physics B, 136(1):1–76, 1978.
[23] D. Gerth, B. Hofmann, S. Birkholz, S. Koke, and G. Steinmeyer. Regularization of an autoconvolution problem in ultrashort laser pulse characterization. Inverse Problems in Science and Engineering, 22(2):245–266, 2014.
[24] P. Ghosh. Negative binomial multiplicity distribution in proton-proton collisions in limited pseudorapidity intervals at LHC up to √s = 7 TeV and the clan model. Physical Review D, 85(5):054017, 2012.
[25] A. Giovannini and L. Van Hove. Negative binomial multiplicity distributions in high energy hadron collisions. Zeitschrift für Physik C Particles and Fields, 30(3):391–400, 1986.
[26] R. Gorenflo and B. Hofmann. On autoconvolution and regularization. Inverse Problems, 10(2):353, 1994.
[27] R. W. Grubbström and O. Tang. The moments and central moments of a compound distribution. European Journal of Operational Research, 170(1):106–119, 2006.
[28] R. Hagedorn. Statistical thermodynamics of strong interactions at high energies. Nuovo Cimento, Suppl., 3(CERN-TH-520):147–186, 1965.
[29] P. Hall. On the bootstrap and confidence intervals. The Annals of Statistics, pages 1431–1452, 1986.
[30] P. Hall. The Bootstrap and Edgeworth Expansion. Springer Science & Business Media, 2013.
[31] A. Kaidalov. The quark-gluon structure of the pomeron and the rise of inclusive spectra at high energies. Physics Letters B, 116(6):459–463, 1982.
[32] W. Kittel and E. A. De Wolf. Soft Multihadron Dynamics. World Scientific, 2005.
[33] Z. Koba, H. B. Nielsen, and P. Olesen. Scaling of multiplicity distributions in high energy hadron collisions. Nuclear Physics B, 40:317–334, 1972.
[34] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
[35] M. Kuusela and V. M. Panaretos. Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification. The Annals of Applied Statistics, 9(3):1671–1705, 2015.
[36] L. Landweber. An iteration formula for Fredholm integral equations of the first kind. American Journal of Mathematics, 73(3):615–624, 1951.
[37] L. B. Lucy. An iterative technique for the rectification of observed distributions. The Astronomical Journal, 79:745, 1974.
[38] G. Martin, K. O'Bryant, et al. The supremum of autoconvolutions, with applications to additive number theory. Illinois Journal of Mathematics, 53(1):219–235, 2009.
[39] Z. L. Matthews. Proton-proton collisions at the Large Hadron Collider's ALICE Experiment: diffraction and high multiplicity. PhD thesis, December 2011.
[40] A. Meister. Optimal convergence rates for density estimation from grouped data. Statistics & Probability Letters, 77(11):1091–1097, 2007.
[41] S. Moskow and J. C. Schotland. Numerical studies of the inverse Born series for diffuse waves. Inverse Problems, 25(9):095007, 2009.
[42] J. L. Mueller and S. Siltanen. Linear and Nonlinear Inverse Problems with Practical Applications, volume 10. SIAM, 2012.
[43] A. Polyakov. Hypothesis of self-similarity in strong interactions: 1. Plural generation of e+e- annihilation hadrons. Zh. Eksp. Teor. Fiz., 59:542, 1970.
[44] W. H. Richardson. Bayesian-based iterative method of image restoration. JOSA, 62(1):55–59, 1972.
[45] B. van Es, S. Gugushvili, P. Spreij, et al. A kernel type nonparametric density estimator for decompounding. Bernoulli, 13(3):672–694, 2007.
[46] B. R. Webber. A QCD model for jet fragmentation including soft gluon interference. Nuclear Physics B, 238(3):492–528, 1984.
[47] N. Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time Series, volume 7. MIT Press, Cambridge, MA, 1949.
[48] G. Zech. Analysis of distorted measurements – parameter estimation and unfolding. arXiv preprint arXiv:1607.06910, 2016.