Estimation of heavy tails in optical non-linear processes
Éva Rácz, László Ruppert, and Radim Filip
Department of Optics, Palacky University, 17. listopadu 12, 771 46 Olomouc, Czech Republic
Department of Theoretical Physics, Budapest University of Technology and Economics, Műegyetem rkp. 1., 1111 Budapest, Hungary
Abstract
In optical non-linear processes rogue waves can be observed, which can be mathematically described by heavy-tailed distributions. These distributions are special due to the fact that the probability of registering extremely high intensities is significantly higher than for the exponential distribution, which is most commonly observed in statistical and quantum optics. The current manuscript gives a practical overview of the generic statistics toolkit concerning heavy-tailed distributions and proposes methods to deal with issues specific to non-linear optics. We take a closer look at supercontinuum generation, where rogue waves were already observed. We propose modifications to the Hill estimator to deal with detector saturation as well as corrections introduced by pumping the process by bright squeezed vacuum. The suggested methodology facilitates statistically reliable observation of heavy-tailed distributions in non-linear optics, nanooptics, atomic and solid-state processes, and optomechanics.
Keywords: non-linear optics, statistical optics, heavy-tailed distributions, rogue waves, supercontinuum generation
Heavy-tailed distributions, and power-law (or Pareto) distributions in particular, have been reported from a very broad range of areas, including earthquake intensities [1–3], avalanche sizes [4], solar flares [5], degree distributions of various social and biological networks [6–8], incomes [9, 10], insurance claims [11, 12], the number of citations of scientific publications [13–15], and many more. For financial institutions, the importance of heavy-tailed behavior comes from the fact that a simple Gaussian model severely underestimates the risks associated with different products or investment strategies, which in turn results in considerable losses. This is why the mathematical background of heavy-tailed distributions and their estimation have been most extensively studied in this context ([16–19] and others).

For physicists, in turn, the general fascination with power laws comes from the concept of universality classes in statistical physics, which describe the behavior of various systems close to their critical points, irrespective of the details of the mechanisms of the systems [20]. Inspired by this, and using the increasing availability of good-quality data sets, many of the field have ventured into interdisciplinary waters where power-law behavior has been reported, thereby creating entirely new disciplines like econophysics [21, 22]. In such investigations, the emphasis has been mostly on understanding the emergence of power-law behavior [23–25]. In contrast, the problem of practical estimation methods for the exponent associated with the power law (or indeed the verification of power-law behavior) has received relatively little attention from the community. Notable exceptions are [8, 26], in which the power-law nature of numerous phenomena has been questioned.

Turning to non-linear optics [27, 28], the problem formulation is, however, somewhat different.
Optical experiments producing light with heavy-tail intensity distributions allow high repetition rates and large samples to study unstable non-linear phenomena and their sources [29, 30]. Moreover, unlike in social or financial contexts, experiments can be repeated. For intensive pumps, non-linear systems can produce bright signals with directly measurable intensity fluctuations. In statistical optics [31], coherence theory and quantum optics [32], the probability density function of intensities is a subject of investigation for intensive light beams. Remarkably, it allows a direct observation of macroscopic quantum phenomena [33–37]. Intensity distributions in supercontinuum generation setups [38–40] being heavy-tailed – hence the term rogue waves – does have a theoretical basis. So our aim is not to decide whether the observed distribution is heavy-tailed or not. We rather propose ways to estimate the tail exponent of the distribution in the presence of experimental imperfections, which can help in experimental control and design. The issue of detector saturation, for example, deserves special attention, since for intensity distributions with extremely heavy tails it cannot be solved by a simple re-calibration: there is always going to be some portion of the data that is affected by saturation. Low-intensity statistics, on the other hand, are mostly determined by background noise. Consequently, there is only a limited interval of intensities within which the observations can be used for estimation purposes, and this hugely affects the efficiency of any estimation procedure. If we are unlucky, the intervals affected by the background noise and detector saturation overlap, and the experiment cannot be salvaged. If, however, there is a portion that is useful, we propose ways to gain an initial estimate of the tail exponent (and, by extension, higher quantiles), which helps design further measurement setups.
The procedures presented in the current manuscript can be used to quantify the instabilities resulting in extreme events common in optical non-linear processes [41, 42], and potentially also in non-linear optomechanics [43, 44], four-wave mixing in atomic vapors [45, 46] and wave-mixing processes in superconducting circuits [47–49]. Beyond fundamental interest, extreme events can be used to produce highly non-classical effects distillable to large squeezing and entanglement for quantum technology [50, 51]. In parallel, they also inspire an investigation of the optical Maxwell demon principle [52] for heavy-tailed distributions.

The article is structured as follows: In Section 2, we give a brief overview of the mathematical background and clarify terminology. Section 3 describes a generic estimation toolkit with pointers to more sophisticated methods. Section 4 proposes variations of the generic methods for evaluating numerical data in non-linear optics specifically.
Let us first provide an overview of the terminology and concepts concerning heavy-tailed distributions used throughout this work.

In general, distributions that decay slower than exponential are referred to as heavy-tailed. To give a clear mathematical formulation of this concept, it is useful to introduce the tail function (also referred to as survival function, or complementary distribution function), defined for an arbitrary real-valued random variable $X$ as $\bar F(x) \equiv P(X \ge x)$. In other words, this function describes the probability that the variable reaches or exceeds a threshold $x$. It is related to the more familiar distribution function (DF) $F(x)$ through $\bar F(x) = 1 - F(x)$, and to the probability density function (PDF) $f(x)$ through $\bar F(x) = \int_x^\infty f(u)\,\mathrm{d}u$. In what follows, we will only consider the right tail of distributions, that is, the behavior of the largest values, and for the sake of simplicity, suppose that we are dealing with positive-valued random variables like optical intensities. The whole treatment can be straightforwardly extended to the left tails of distributions.

Using the tail function (TF), a heavy-tailed distribution can be defined as a distribution for which

$$C \equiv \lim_{x\to\infty} \frac{-\ln \bar F(x)}{x} = 0. \qquad (1)$$

There are other equivalent definitions [53]; we prefer this formulation since it is the one which is most in line with the somewhat vague notion of an "L-shaped" distribution used in connection with rogue waves ([54], for example). Note that if the limit $C$ in (1) is a finite positive value, the distribution decays asymptotically at an exponential rate, while if it is infinity, the distribution decays faster than exponential. Definition (1) includes even distributions whose moments of any order are finite, for example the log-normal distribution and the Weibull distribution with a shape parameter lower than one [55].
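As a quick numerical illustration of definition (1), the ratio $-\ln \bar F(x)/x$ can be evaluated at increasing $x$ for an exponential and for a Pareto tail; the sketch below (not part of the original analysis, parameter values are arbitrary) shows the ratio tending to $\lambda > 0$ in the first case and to zero in the second.

```python
import math

# Illustration of definition (1): C = lim_{x -> inf} -ln(TF(x)) / x.
# For an exponential tail TF(x) = exp(-lam*x) the ratio tends to lam > 0;
# for a Pareto tail TF(x) = x**(-alpha) it tends to 0, i.e. a heavy tail.
# The parameter values (lam = 1, alpha = 3) are illustrative choices.

def c_ratio(tail_fn, x):
    return -math.log(tail_fn(x)) / x

exp_tail = lambda x, lam=1.0: math.exp(-lam * x)
pareto_tail = lambda x, alpha=3.0: x ** (-alpha)

for x in (10.0, 100.0, 500.0):
    print(x, c_ratio(exp_tail, x), c_ratio(pareto_tail, x))
```

The exponential ratio stays at $\lambda = 1$ for every $x$, while the Pareto ratio, $\alpha \ln x / x$, shrinks toward zero.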
Pareto-type distributions (or regularly-varying distributions) form a subset of heavy-tailed distributions and are defined as

$$\bar F(x) = x^{-\alpha} \cdot L(x), \qquad (2)$$

with $L(x)$ being a slowly-varying function (for any $t > 0$, $\lim_{x\to\infty} L(tx)/L(x) = 1$; slowly-varying functions include for example the logarithm function and functions that have a finite limit in infinity) and $\alpha > 0$. For the Pareto distribution, $L(x) = \mathrm{const}$, corresponding to exact power-law behavior. The exponent $\alpha$ is referred to as the tail exponent. Note that this exponent describes the decay of the tail function; the exponent corresponding to the decay of the PDF is $\alpha + 1$. The moments $E(X^a)$ are finite for all $a < \alpha$ and infinite for all $a > \alpha$. Whether $E(X^\alpha)$ itself is finite depends on the function $L(x)$.

Figure 1: Categorization of univariate continuous distributions according to the properties of their right tails: speed of decay compared to exponential ($C$), tail exponent ($\alpha$), and behavior of maxima ($\gamma$); including a few examples for each category.

What makes heavy-tailed distributions statistically special is the fact that many traditional procedures based on the mean and the variance are inapplicable if the first and second moments do not exist. It is possible, of course, to compute sample averages, but they are meaningless if the corresponding expected value is not finite for the underlying distribution. That is, as the number of observations is increased, such sample averages do not converge to a finite number. The lack of definite moments limits many evaluations in statistical optics [31], coherence optics and quantum optics [32]. One should not, for example, calculate second-order quantities like correlation [42, 56–58] for $\alpha < 2$. Furthermore, the traditional central limit theorem, which lies at the heart of many applied models, is not applicable, either.

The question what can be said about the largest observations for such distributions arises quite naturally. There are limit laws concerning the maxima of samples and the distribution of threshold exceedances. Very simply stated, if a distribution has the form (2), the distribution of sample maxima tends to an extreme value distribution described by the DF $\exp\left\{-(1+\gamma x)^{-1/\gamma}\right\}$, with $0 < \gamma = \alpha^{-1} < \infty$. The distribution of the exceedances of a sufficiently large threshold $l$ (that is, the random variable $X - l \mid X > l$) tends in turn to a generalized Pareto distribution with DF $1 - (1+\gamma x)^{-1/\gamma}$, $\gamma = \alpha^{-1}$. Both limit laws hold, of course, given proper normalizing constants; for details see [59]. These limit laws can be straightforwardly used to model the behavior of the underlying distribution beyond the largest observed value.

There are heavy-tailed distributions whose tails are heavier than (2) (for example the log-Pareto, constructed by exponentiating a Pareto-distributed variable); these limit laws do not apply to them. There are also heavy-tailed distributions whose tail is lighter than (2); for these the DF of the maxima tends to $e^{-e^{-x}}$, whereas threshold exceedances tend to an exponential distribution; these distributions arise as the natural limit for $\gamma = 0$. Distributions with a light, but infinite tail also belong to the $\gamma = 0$ domain of attraction. The limit distribution of the maxima of random variables with a finite upper bound is also $\exp\left\{-(1+\gamma x)^{-1/\gamma}\right\}$, but with $\gamma < 0$. Figure 1 summarizes this categorization according to tail heaviness.

In what follows, we will deal with the extreme value index $\gamma$ instead of the tail exponent $\alpha = \gamma^{-1}$. This has two practical reasons: firstly, $\gamma$ is the quantity that is used in all mathematical publications, so the quite extensive mathematical theory is based on that quantity; and secondly, in the context of supercontinuum generation, $\gamma$ is the quantity proportional to the mean intensity of the pump, so it is in a way more handy than $\alpha$.

3 Basic tools for investigating power-law tails
The purpose of this section is to show physicists some simple, visual tools for assessing power-law behaviorand also give references on improving the behavior of the estimators. The reason why we have not pickeda single favorite estimator is that in order to have a reliable assessment of power-law behavior, it is betterto use more than one tool and see whether they produce consistent results.
Examples:
For demonstration purposes we will use in the next sections computer-generated samples of different distributions; their properties are summarized in Table 1. We did not use solely regularly-varying distributions (2), because we would also like to show what the tools produce when used with distributions that do not have a power-law tail. The exponential distribution is a standard thin-tailed distribution, compared to which the heavy-tailed property itself is defined. The log-normal distribution is, according to definition (1), heavy-tailed, but it does not have a finite tail exponent. A log-gamma distributed variable can be created by exponentiating a gamma-distributed variable the same way one transforms a normally distributed variable into a log-normal; it is an example of a regularly-varying distribution with $L(x) \neq \mathrm{const}$. The Pareto distribution corresponds to pure power-law behavior, and is used for demonstrating the best-case scenario (it is also the $a = 1$ special case of the log-gamma distribution).

Name | PDF | Heavy-tailed? | $\gamma$
Exponential | $\lambda e^{-\lambda x}$ | N | 0
Log-normal | $\frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left\{-\frac{\ln^2 x}{2\sigma^2}\right\}$ | Y | 0
Log-gamma | $\frac{b^a}{\Gamma(a)} (\ln x)^{a-1} \cdot x^{-b-1}$ | Y | $b^{-1}$
Pareto | $\alpha \cdot x^{-\alpha-1}$ | Y | $\alpha^{-1}$

Table 1: Distributions used for demonstration purposes.
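Samples of this kind are easy to reproduce; the following sketch (not the authors' code; parameter values are illustrative) uses inverse-transform sampling for the Pareto case and exponentiation of Gaussian/gamma variates for the log-normal and log-gamma cases.

```python
import random, math

# Illustrative sampling of the demo distributions of Table 1.
# A Pareto(alpha) variable is obtained from a uniform U as U**(-1/alpha);
# log-normal and log-gamma variables are obtained by exponentiating
# Gaussian and gamma variates, respectively.

def sample_pareto(alpha, n, rng=random):
    return [rng.random() ** (-1.0 / alpha) for _ in range(n)]

def sample_lognormal(sigma, n, rng=random):
    return [math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]

def sample_loggamma(a, b, n, rng=random):
    # rng.gammavariate takes shape and scale; scale = 1/b for rate b
    return [math.exp(rng.gammavariate(a, 1.0 / b)) for _ in range(n)]

random.seed(1)
x = sample_pareto(0.5, 10_000)
# For alpha = 0.5 the mean is infinite: sample averages keep drifting upward
print(sum(x[:100]) / 100, sum(x) / len(x))
```

The last line echoes the point made in Section 2: for $\alpha < 1$ the sample average is meaningless, since it does not converge as the sample grows.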
Preparing the histogram is probably the most wide-spread way to visualize random samples, and is definitely the go-to tool for many physicists. This is why we devote more attention to it in this paper than it deserves from the mathematical point of view. It involves defining a discrete set of bins over the number line and then counting how many observations of the random variable fall in each bin. Given a linear set of bins, the histogram provides an estimate of the PDF of the underlying distribution. If this distribution is, at least approximately, Pareto, then the histogram should be linear on a log-log plot with the absolute value of the slope equal to the tail exponent plus one. One can even perform a least-squares fit to obtain the slope of the line to estimate the exponent. There are, however, three major problems with this approach:

1. Due to the power-law nature, there will be only a few observations (if any) on the right end of a linear set of bins, meaning that exactly for large values, where the power-law property itself should be most pronounced, there will be considerable variance.

2. As taking the logarithm and the expectation are not interchangeable, the expected value of the logarithm of the frequencies is not equal to the logarithm of the PDF. Furthermore, the variance of the frequencies depends on the location. So the basic prerequisites of an ordinary least-squares fit do not hold.

3. The choice of bins has a marked effect on the outcome.

To help with the first problem, one can use logarithmic bins instead of linear ones. However, it is important to be aware of the fact that using logarithmic bins, that is, bin sizes that are linear on a logarithmic scale, is equivalent to preparing a linearly binned histogram of $Y = \ln X$. Consequently, this version of the histogram approximates the probability density function of $Y$, $g(y)$. This change of variables results in $g(\ln x) = x \cdot f(x)$, meaning that the slope of the linearly binned histogram is $\alpha + 1$, while the slope of the logarithmically binned version is $\alpha$ instead, see figure 2(a).

Concerning the second point, if one does not take the logarithm of the frequencies, there is no problem with expectations, and the variances can be calculated explicitly. A consistent version of the histogram approach is preparing the histogram of $Y = \ln X$ and performing a weighted least-squares fit of $A \cdot e^{-By}$. The weights are needed to make the data homoscedastic [67], and can be calculated as $[\hat p_k (1 - \hat p_k)]^{-1}$, with $\hat p_k$ denoting the fraction of observations within the $k$th bin.

Figure 2: (a) Linearly (filled circles) and logarithmically binned (empty circles) histograms for a single sample of Pareto ($\alpha = 3$) data, 100 bins. The dashed line shows the result of the naive estimator, the solid line shows the result of the improved version. For the linearly binned histogram, the analytic formula is $3 \cdot x^{-4}$, for the logarithmically binned $3 \cdot x^{-3}$. (b) Relative root mean squared error (RMSE) of the naive histogram estimator of $\alpha^{-1}$ and the improved version as a function of how many of the largest observations were taken into account, calculated over repeated Pareto-distributed samples. The dotted black line corresponds to the Cramér–Rao bound for the Pareto distribution, $1/\sqrt{k}$; the colored dashed lines show the performance of the naive estimator; and the colored solid lines correspond to the improved estimator. The different colors indicate how many bins were used to construct the histogram (see the legend).
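The naive variant (an ordinary least-squares line on the log-log histogram) can be sketched in a few lines; this is an illustration, not the authors' implementation, and the bin count and sample parameters are arbitrary choices.

```python
import math, random

# Sketch of the naive histogram estimator: logarithmic bins, ordinary
# least-squares line through the non-empty bins on a log-log plot.
# For log-binned data the fitted slope estimates -alpha (not -(alpha+1)).

def loglog_hist_slope(sample, nbins):
    lo, hi = math.log(min(sample)), math.log(max(sample))
    counts = [0] * nbins
    for v in sample:
        k = min(int((math.log(v) - lo) / (hi - lo) * nbins), nbins - 1)
        counts[k] += 1
    width = (hi - lo) / nbins
    pts = [(lo + (i + 0.5) * width, math.log(c))     # (bin center, ln count)
           for i, c in enumerate(counts) if c > 0]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    return (sum((p[0] - mx) * (p[1] - my) for p in pts)
            / sum((p[0] - mx) ** 2 for p in pts))

random.seed(2)
alpha = 3.0
sample = [random.random() ** (-1.0 / alpha) for _ in range(100_000)]
print(loglog_hist_slope(sample, 10))  # roughly -alpha for Pareto data
```

Even in this best-case (pure Pareto) setting the estimate is noisy, for exactly the reasons listed above: the rightmost bins hold only a handful of observations, and taking the logarithm of small counts violates the least-squares assumptions.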
The choice of bins in this setting is especially problematic, since empty bins should be avoided (or, alternatively, discarded when performing the fit).

The third issue cannot be completely eliminated; however, figure 2(b) shows the results of a small-scale simulation experiment based on purely power-law samples (corresponding to the best-case scenario). We have taken $N_{\mathrm{samples}}$ samples, each consisting of $N$ elements. For each sample, for $k = 1, 2, \dots, N$ we have calculated the values of the estimators based on the top $k$ observations and compared them to the known value of the extreme value index $\gamma$ to obtain

$$\mathrm{Rel.\,RMSE}_m(k) \equiv \frac{1}{\gamma}\sqrt{\frac{1}{N_{\mathrm{samples}}}\sum_{i=1}^{N_{\mathrm{samples}}}\left(\hat\gamma_{m,i}(k) - \gamma\right)^2},$$

with $m$ denoting the method used. This means that for a given value of $k$, we prepared 4 histograms (2, 3, 5, and 9 bins), and calculated the estimate using the naive (least-squares linear fit on log-log scale) and the improved version (weighted least-squares exponential fit on lin-log scale) of the estimator as well for either. First of all, the simulation showed how bad the performance of the naive histogram estimator really is. Secondly, and probably a little counter-intuitively, it showed how few bins are required in order to minimize the error of either version of the estimator based on the histogram, compared to how many bins one would use for visualization. Considering, however, that one only needs to estimate a single parameter, increasing the number of points does not necessarily help if these in turn become less accurate. To get an estimate of how many bins should be used for a given number of observations $k$, one can consider the Rényi representation theorem [68]. Using it, it is straightforward to show that the expected difference between the logarithms of the largest and smallest elements of a purely Pareto-distributed sample of size $k$ is $\alpha^{-1}\left[1 + 1/2 + \dots + 1/(k-1)\right] \approx \alpha^{-1}\left[\gamma^* + \ln(k-1)\right]$, with $\gamma^* \approx 0.577$ denoting the Euler–Mascheroni constant. The natural bin width is proportional to $\alpha^{-1}$ (since that is the scale parameter of the logarithmically transformed sample), so the total number of bins should be a linear function of $\ln(k-1)$; for $k = 100$, for example, 3 bins are ideal.

The empirical tail function (ETF) presents a simple alternative to histograms as a visual tool that does not depend on an arbitrary binning procedure. Given a sample of independent, identically distributed (iid) observations, $\{x_1, x_2, \dots, x_N\}$, sorted in descending order $x_{(1)} \ge x_{(2)} \ge \dots \ge x_{(N)}$, the empirical tail function is given as

$$\bar F^*\left(x_{(k)}\right) = \frac{k}{N}, \qquad (3)$$

or equivalently, for an arbitrary threshold $l \in \mathbb{R}$,

$$\bar F^*(l) = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\{x_i \ge l\},$$

with $\mathbb{1}\{\cdot\}$ denoting the indicator function. In other words, one has to check what proportion of the observations exceed the limit $l$; $l = x_{(k)}$ yields the first definition, equation (3). If the sample is power-law distributed (at least for the largest observations), the ETF should be linear on a log-log plot (again, for the largest observations), with the slope equal to $-\alpha$ (see figure 3). Since the TF is the integral of the PDF, the ETF is considerably smoother than the histogram, which makes it easier to detect deviations from power-law behavior visually.

As for a numerical estimate of $\alpha$, it is, of course, possible to perform a least-squares fit on $\left\{\ln x_{(k)}, \ln\frac{k}{N}\right\}$ (and that also yields much better results than the naive histogram approach); however, it is better to do this with the roles reversed, that is, by treating $\ln\frac{k}{N}$ as the independent variable. The latter procedure is referred to as the QQ estimator [69], with QQ standing for quantile-quantile. The essential difference is that in the ETF version, one divides by the sample variance $\langle \ln^2 x_{(k)}\rangle - \langle \ln x_{(k)}\rangle^2$, and in the QQ version by a deterministic number, $\langle \ln^2\frac{k}{N}\rangle - \langle \ln\frac{k}{N}\rangle^2$.
The QQ version is therefore more stable.

In general, a QQ plot is constructed by plotting the sorted observations against the matching quantiles of the theoretical distribution. If the underlying distribution matches the theoretical distribution (up to a linear transformation), then the QQ plot should be linear. In the specific case of a Pareto-distributed variable, one transforms it into an exponentially distributed variable by taking the logarithm and plots the sample as a function of the standard exponential quantiles, that is, the plot consists of the points $\left\{-\ln\frac{k}{N}, \ln x_{(k)}\right\}$ and the slope of the line is $\alpha^{-1}$. In essence, this corresponds to exchanging the two axes of the ETF plot on a log-log scale. The QQ estimator of $\gamma = \alpha^{-1}$ is then calculated as the slope of the LS fit on the QQ plot.

This is a very simple estimator, and it has been shown to be weakly consistent [70]. However, the issues present in the naive histogram estimator are present here, too. Namely, the expected value of $\ln X_{(k)}$ is not $-\alpha^{-1} \cdot \ln\frac{k}{N}$, and the variance of $\ln X_{(k)}$ depends on $k$. What is more, $X_{(i)}$ and $X_{(j)}$ are not independent. The problem has been addressed in [47] for purely Pareto-distributed samples, taking into account both the issue of expectations and the covariance matrix of order statistics. The authors show that the solution of the generalized regression problem is equivalent to the Hill estimator discussed in section 3.3, which should not come as a surprise, since it is the maximum likelihood estimator. The same issue has been addressed in [71], under more general assumptions on the underlying distribution, yielding an improved regression estimator.
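The QQ estimator described above can be sketched as follows (an illustration with arbitrary parameters, assuming plain unweighted least squares on the top $k_{\max}$ order statistics; not the authors' implementation):

```python
import random, math

# Sketch of the QQ estimator of gamma = 1/alpha: the least-squares slope of
# the points (-ln(k/N), ln x_(k)) built from the top k_max order statistics.

def qq_estimator(sample, k_max):
    xs = sorted(sample, reverse=True)          # x_(1) >= x_(2) >= ...
    n = len(sample)
    pts = [(-math.log((k + 1) / n), math.log(xs[k])) for k in range(k_max)]
    m = len(pts)
    mx = sum(p[0] for p in pts) / m
    my = sum(p[1] for p in pts) / m
    return (sum((p[0] - mx) * (p[1] - my) for p in pts)
            / sum((p[0] - mx) ** 2 for p in pts))

random.seed(4)
alpha = 2.0
data = [random.random() ** (-1.0 / alpha) for _ in range(100_000)]
print(qq_estimator(data, 1000))   # close to gamma = 1/alpha = 0.5
```

Note that the denominator here is the deterministic spread of the $-\ln(k/N)$ values, which is exactly what makes the QQ variant more stable than fitting with the axes exchanged.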
Nevertheless, even the basic QQ estimator is a viable, although usually inferior, alternative to the Hill estimator.

Figure 3: Empirical tail functions for 100 samples each, for the distributions: (a) exponential, (b) log-normal ($\sigma = 2.8$), (c) log-gamma ($a = b = 0.5$), and (d) Pareto ($\alpha = 0.5$).

The Hill estimator can be obtained as the conditional maximum likelihood estimator of the reciprocal of the tail exponent $\alpha^{-1}$ for Pareto-distributed data [72]:

$$\widehat{\alpha^{-1}}(k) \equiv \hat\gamma_H(k) = \frac{1}{k}\sum_{i=1}^{k} \ln x_{(i)} - \ln x_{(k+1)}, \qquad (4)$$

with $x_{(i)}$, $i = 1, 2, \dots, N$, denoting the $i$th largest element of the sample. In a more general setting, given a sufficiently large number of observations from a regularly-varying distribution, it converges to the extreme value index of the distribution (the rate of convergence depends on the distribution) [73]. For pure Pareto($\alpha$) data, $E\left(\hat\gamma_H(k)\right) = \alpha^{-1}$, and $\mathrm{RMSE}\left(\hat\gamma_H(k)\right) = \alpha^{-1}/\sqrt{k}$.

If a random variable has a power-law distribution, taking its logarithm transforms it into an exponentially distributed random variable, so another way of interpreting the Hill estimator is the following: it first transforms the power-law sample into an exponential one, and after that it estimates the parameter of this exponential by taking the sample average, using the fact that for $X \sim \mathrm{EXP}(\lambda)$, $E(X \mid X > l) = l + \lambda^{-1}$ for any $l \in \mathbb{R}_+$, which is sometimes referred to as the ageless or memoryless property of the exponential distribution. A third interpretation was given in the context of generalized regression in [47].

The term Hill plot stands for plotting $\hat\gamma_H$ as a function of the tail length $k$. If we know that the data is exactly power-law, it is of course best to set $k = N - 1$, so the Hill plot does not make much sense. For real-life data, however, it is usually only the largest observations for which power-law behavior is a reasonable approximation. Consequently, one would normally look for a plateau in the Hill plot where the estimate is relatively stable, and choose the number of points to take into account based on that. Drees et al. [74] have shown that it is best practice to plot the Hill estimator as a function of $\ln k$ instead of $k$, for which they coined the phrase altHill plot (figure 4).

Figure 4: Hill plots for 100 computer-generated data sets for different distributions. The plot for a single realization is highlighted in black for each distribution. The parameters of the distributions were the following: (a) exponential, (b) log-normal ($\mu = 0$, $\sigma = 2.8$), (c) log-gamma ($a = b = 0.5$), and (d) Pareto ($\alpha = 0.5$). The true values of the extreme value index $\gamma$ are indicated by dashed lines.

Taking a closer look, for a general regularly-varying distribution (2) the presence of the function $L(x)$ complicates the situation in two ways:

• It introduces a non-zero bias which becomes zero only asymptotically. How fast it becomes zero depends on the nature of the corrections to the power-law behavior.

• The optimal tail length $k_{\mathrm{opt}}$ which minimizes the mean squared error of the estimator becomes (often considerably) smaller than the sample size.

See for example figure 4(c), showing the Hill plots for log-gamma distributed samples, which have a logarithmic correction to the power law. Having information about $L(x)$ is therefore very useful: one can modify the Hill estimator to reduce its bias, and also estimate the optimal tail length $k_{\mathrm{opt}}$. The statistics literature contains several approaches to bias reduction as well as to the choice of tail length (see section 4.4 or [59] and references therein), but in general, the more information one has about the higher-order behavior of the distribution the better, as estimating higher-order parameters is of course even more problematic than estimating the extreme value index $\gamma$ itself.

Another issue with the Hill estimator is that it always produces a positive number, regardless of the underlying distribution; see for example figure 4(b), where the true extreme value index $\gamma$ is zero for the log-normal distribution. Generally, the closer the estimate $\hat\gamma$ is to zero, the less confident one should be in the results, even if the histogram seems straight at the end. As a rule of thumb, if one has an estimate $\hat\gamma < 0.3$, it is a good idea to consider alternative models for the data (unless the theoretical background is clear and indicates power-law behavior without reason for doubt).

There are further, still basic, tools, like the mean excess plot [75, 76], which is quite popular in the actuarial field, but these are not as well suited for estimating the value of the tail exponent.
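The Hill estimator (4), together with the plateau-hunting over $k$, can be sketched as follows (illustrative parameters; the logarithmic grid of $k$ values echoes the altHill idea):

```python
import random, math

# Sketch of the Hill estimator (4): gamma_hat(k) is the mean log-spacing of
# the top k order statistics above the (k+1)-th largest observation.

def hill(sample_sorted, k):
    # sample_sorted must be in descending order
    top = sample_sorted[:k]
    return sum(math.log(v) for v in top) / k - math.log(sample_sorted[k])

random.seed(5)
alpha = 2.0
xs = sorted((random.random() ** (-1.0 / alpha) for _ in range(100_000)),
            reverse=True)

# scan a logarithmic grid of tail lengths and look for a plateau
for k in (10, 100, 1000, 10_000):
    print(k, hill(xs, k))
```

For this pure Pareto sample the estimates hover around $\gamma = \alpha^{-1} = 0.5$ for every $k$; for real data one would instead pick the $k$-range where the estimate is stable.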
The previous section showed that in the ideal case, either approach yields results consistent with the true tail exponent. Actual experiments, however, are never that simple: the distortions and noise introduced by the apparatus require further considerations. There are two major effects that cannot be escaped: detector imperfections (limited linear response) and noise added by different experimental elements. We will use a very simplistic model of the experiments done in [41] in order to show how these affect estimation.
Manceau et al. [41] showed that for supercontinuum generation setups, the distribution of measured intensities has very heavy tails. Table 2 summarizes the types of experimental setups used by them from the mathematical point of view.

Source | Experiment | Idealized distribution | Heavy-tailed? | $\gamma$
Thermal | Optical harmonic gen. | $[\mathrm{EXP}(\lambda)]^n$ | Y | 0
Thermal | Supercontinuum gen. | $\sinh^2[\mathrm{EXP}(\lambda)]$ | Y | $2/\lambda$
BSV | Optical harmonic gen. | $[\mathrm{GAMMA}(1/2, \beta)]^n$ | Y | 0
BSV | Supercontinuum gen. | $\sinh^2[\mathrm{GAMMA}(1/2, \beta)]$ | Y | $2/\beta$

Table 2: Experimental setups in [41]; BSV stands for bright squeezed vacuum. In effect we have different non-linear transformations of an exponential random variable for thermal pumping, and a gamma-distributed variable for BSV pumping.

According to Equation (1), each experimental setup produces heavy-tailed observables, even though the pumps are not heavy-tailed (the gamma distribution also decays asymptotically at an exponential rate). Note that for optical harmonic generation, even though the decay is slower than exponential, each moment exists, meaning that there is no theoretical issue with calculating second (or higher) order correlations like $g^{(2)}$ [32], so we are not going to further concern ourselves with that case. For the supercontinuum setup, however, the situation is quite different: depending on the value of $\lambda$ (or $\beta$), which is inversely proportional to the pump's mean intensity, the highest existing moment can be arbitrarily small.

If the models in Table 2 were not only approximations of the reality, one would only need to take the inverse of the non-linear transformation involved in order to obtain a vanilla exponential or gamma sample and quite simply estimate the single free parameter of those distributions. Experimental imperfections do, however, complicate the situation.
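The thermal supercontinuum entry of Table 2 is easy to check numerically (a sketch under the idealized model only; parameter values are illustrative): applying $\sinh^2$ to an exponential variable should yield a tail with $\gamma = 2/\lambda$, which a Hill estimate recovers.

```python
import random, math

# Idealized thermal supercontinuum model of Table 2: X = sinh^2(I) with
# I ~ EXP(lam) has extreme value index gamma = 2/lam. A Hill estimate on
# simulated data should recover this value.

def hill(sample_sorted, k):
    # sample_sorted must be in descending order
    top = sample_sorted[:k]
    return sum(math.log(v) for v in top) / k - math.log(sample_sorted[k])

random.seed(6)
lam = 1.0
xs = sorted((math.sinh(random.expovariate(lam)) ** 2 for _ in range(200_000)),
            reverse=True)
print(hill(xs, 2000))   # should approach gamma = 2 / lam = 2.0
```

This also illustrates the point made above: the observable is heavy-tailed even though the pump itself has an exponentially decaying distribution.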
For illustration, let us consider the following simple model of the supercontinuum-generation process:

$$X = D\left(K \cdot \sinh^2(I + \omega_1)\right), \qquad (5)$$

with $I$ denoting the pump's intensity (whose distribution is either $\mathrm{EXP}(\lambda)$ or $\mathrm{GAMMA}(1/2, \beta)$) and $X$ the detector's output. The detector is modeled through

$$D(x) = \begin{cases} l + \omega_2 & \text{if } x < l, \\ x + \omega_2 & \text{if } x \in [l, L], \\ L + \omega_2 & \text{if } x > L, \end{cases} \qquad (6)$$

with $\omega_1$, $\omega_2$ denoting independent Gaussian noises on the pump's and the detector's side, respectively (these can be aggregates of different types of noise); $K$ is a constant factor. The value of $l$ corresponds to the detector's noise floor, the value of $L$ to its saturation limit.

The effects of the different elements of the model are shown in figure 5(a). The dash-dotted light blue line shows the ideal case, i.e., the TF of $X_1 \equiv K \cdot \sinh^2 I$, which is distorted in the following steps: Adding a Gaussian noise prior to the non-linear transformation ($X_2 \equiv K \cdot \sinh^2[I + \omega_1]$) corresponds asymptotically only to a multiplication by a constant factor $\exp\{\lambda\sigma_1^2\}$ (see the dark blue dash-dotted line). Introducing a lower cutoff $l$ introduces a discontinuity in the TF, which is smeared by the second Gaussian noise $\omega_2$ ($X_3 \equiv \max\{K \cdot \sinh^2[I + \omega_1], l\} + \omega_2$). The upper cutoff at $L$ introduces a sharp fall to zero at $L$, which could be smoothed by taking a more accurate model of the saturation curve. However, for the sake of simplicity, the final step is just $X_4 = \min\{\max\{K \cdot \sinh^2[I + \omega_1], l\}, L\} + \omega_2$. For the specific parameters used in figure 5, detector saturation essentially corresponds to discarding about the top 1% of observations, shown in the figure in light red. Note that out of the parameters of the model, only the pump's mean intensity has an impact on the tail exponent (if there were a multiplicative distortion prior to the non-linear transformation, the situation would be different). Figure 5(b) illustrates how much easier it is to notice that the saturation limit of the detector was reached during the experiment if one uses the ETF instead of the histogram to visualize the data: while there is a very visible cut-off at the saturation limit for the former, the latter shows only a slight increase in the rightmost bin.

Figure 5: (a) Model behavior with thermal pumping. The light blue dash-dotted line shows the TF of $X_1$, the dark blue dash-dotted line shows the TF of $X_2$. The red/light red lines show 100 simulated ETFs with lower cutoff and detector noise ($X_3$). The solid black line shows the TF of the model when an upper cutoff is also included. The lower and upper cutoffs are marked by black dotted lines. Parameters: $I \sim \mathrm{EXP}(\lambda = 1)$, $K = 200$, $\omega_1 \sim N(0, \sigma_1 = 1)$, with the noise floor $l$, detector noise $\sigma_2$ and saturation limit $L$ as in the simulations. (b) Histogram and ETF for a single sample with the given parameters.
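The model (5)–(6) can be simulated directly; the sketch below uses illustrative parameter values (not the exact settings of figure 5) and reports the fraction of observations that hit the saturation limit.

```python
import random, math

# Sketch of the simulation model (5)-(6): thermal pump I ~ EXP(lam), pre- and
# post-detection Gaussian noises omega1 and omega2, noise floor l and
# saturation limit L. All parameter values below are illustrative.

def detector_output(lam, K, sigma1, sigma2, l, L, rng=random):
    I = rng.expovariate(lam)
    x = K * math.sinh(I + rng.gauss(0.0, sigma1)) ** 2   # ideal signal
    return min(max(x, l), L) + rng.gauss(0.0, sigma2)    # detector D(x)

random.seed(7)
sample = [detector_output(lam=1.0, K=200.0, sigma1=0.1, sigma2=10.0,
                          l=1e2, L=2e7) for _ in range(10_000)]
print(sum(v >= 2e7 for v in sample) / len(sample))  # saturated fraction
```

Runs of this kind make it easy to see the behavior described above: a small but non-negligible fraction of observations always piles up at the saturation limit, and the low-intensity end is dominated by the noise floor.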
Note that in this model there is a non-zero probability of negative observations, which is why the ETF does not go to one as x goes to zero. In the following sections, we will examine how the different estimation tools perform for the different versions of our model.

Using figure 5(a), we have gained insight into what the individual parameters of our model do. Now we look at the performance of the three basic estimators discussed in section 3, and at how they are affected by the model parameters.

Figure 6 shows the root mean square error of the different estimators for the thermal case, calculated using 10 simulated samples of size 10. Clearly, in this case the convergence to the power law is quick, since the ideal RMSE (black dashed line) is reached easily by the Hill estimator, with the other two estimators performing only a little worse (the minimum is about 4% instead of 3%). This is easily understood when examining the asymptotic form of the TF (which has a closed analytic form for σ₂ = 0),

F̄_SCG,thermal(x) = [e^(−λσ₁²) · 4x/K]^(−λ/2) · (1 − λK/(4x) + O(x⁻²)),   (7)

which means that the convergence to the asymptotic behavior is also power-law. The divergence in the bias is caused by the fact that observations for this model can get arbitrarily close to zero (they can even be negative), so the term −ln x_(k+1) in (4) will dominate in the Hill estimator as k is increased. This divergence also introduces meaningless minima in the RMSE curve at the value of k where the bias curves cross zero before diverging.

Figure 6: Comparison of estimators, SCG setup, thermal source, without detector saturation. (a) Relative RMSE. The black dashed line shows the relative RMSE for the exact Pareto case, 1/√k. (b) Relative bias. (Parameters: λ = σ₁ = 1, l = σ₂ = 10, L = ∞, K = 200.)

Figure 7 shows that varying the mean intensity of the source (λ⁻¹) and the noise floor (l) changes the situation compared to figure 6. The best achievable error is essentially determined by how many observations are unaffected by experimental imperfections. If the mean intensity is increased (figure 7(a)), more observations are above the noise floor of the detector, which ultimately means that one has to discard fewer observations when estimating the tail index. Note that the blue line in figure 7(a) corresponds to considerable pre-transformation noise: the typical scale of the signal (λ⁻¹ = 1/2) is exceeded by the scale of the noise (σ₁ = 1), resulting in an oddly shaped profile. The same is true for the red line in figure 7(b): even though the noise floor is low (l = 10), it is exceeded by the scale of the additive noise ω₂ (σ₂ = 10).

Figure 7: Relative RMSE of the Hill estimator as a function of the number of points taken into account: (a) for different values of the parameter λ of the thermal pump; (b) for different values of the noise floor parameter l of the detector.

If the process is pumped by bright squeezed vacuum, the problem becomes technically more involved. As figure 8 shows, the best achievable RMSE is significantly worse (10–20% instead of 3–4%), owing to the considerable bias of the estimators in this case. This is because the correction to power-law behavior decays as a power of ln x, and not of x. It is therefore helpful if, after taking the logarithm of the observations, one uses the conditional maximum likelihood estimator for GAMMA(1/2, β) instead of the Hill estimator, which, as discussed before, is a conditional MLE for EXP(λ). That is, one should maximize the function

ln L(α, β) = (α − 1)/k · Σ_{i=1..k} ln ln x_(i) − β/k · Σ_{i=1..k} ln x_(i) + α ln β − ln Γ(α, β ln x_(k+1)),   (8)

with Γ(s, z) denoting the upper incomplete gamma function. It is straightforward to show that, asymptotically, the value of β⁻¹ that maximizes (8) is equal to the Hill estimator, and that the corrections are O(1/ln x_(k+1)). The maximization can be done numerically using α = 1/2, β = [γ̂_H]⁻¹ as the starting point. As the dashed lines in figure 8 show, the estimator defined in (8) is indeed more efficient than the simple Hill estimator. The improvement is not as significant as one might hope, since the actual transformation applied to the input signal was K · sinh²(·) and not exp(·).

Figure 8: Comparison of estimators, BSV source, without detector saturation. (a) Relative RMSE. (b) Relative bias. (Parameters: β = σ₁ = 1, l = σ₂ = 10, L = ∞, K = 200.)
Thus, in theory, to get better results, one should change ln x_(i) to asinh(√(x_(i)/K′)) in (8), with K′ = K · exp{σ₁²β}; however, this is problematic, since K′ depends on the unknown values of β (≡ 1/γ), σ₁, and K.

If the observations during an experiment are distorted by detector saturation, one can try to recalibrate the equipment so that all observations fall within the linear range of the detector. However, this is not necessarily trivial to do if the process is indeed heavy-tailed; and even if it is not a problem, it is still good practice to try to evaluate the original data set instead of throwing it away. This, however, requires modifying the basic estimators introduced in section 3, which were constructed supposing that the largest observations are (close to being) Pareto distributed. Figure 9(a) shows that, as expected, the basic approach fails if there is detector saturation.

The three basic techniques presented in section 3 are easily modified to cope with discarding the largest observations, which are affected by detector saturation. For the least-squares approaches on either the histogram or the QQ plot, one quite straightforwardly has to omit observations above a certain threshold, but otherwise the optimization is exactly the same as without the upper cutoff. For generalizing the Hill estimator, one has to take advantage of the Rényi representation theorem, according to which the scaled spacings of the order statistics of an exponential sample are themselves exponentially distributed: that is, if Z_i, i = 1, …, N is an i.i.d. EXP(λ) sample, then the spacings S_(i) ≡ i · (Z_(i) − Z_(i+1)), i = 1, …, N − 1, are again i.i.d. EXP(λ), with Z_(i) denoting the i-th largest observation in the sample.
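The Rényi representation is straightforward to verify numerically; the following sketch (with an illustrative rate λ = 1.5) checks that the scaled spacings again behave like an EXP(λ) sample:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, N = 1.5, 200_000

# Order statistics of an i.i.d. EXP(lam) sample, largest first.
Z = np.sort(rng.exponential(1.0 / lam, N))[::-1]

# Scaled spacings S_(i) = i * (Z_(i) - Z_(i+1)), i = 1, ..., N-1.
i = np.arange(1, N)
S = i * (Z[:-1] - Z[1:])

# By the Renyi representation, S should again be an i.i.d. EXP(lam)
# sample: its mean should be close to 1/lam, its variance to 1/lam**2.
print(S.mean(), S.var())
```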
Knowing that if X is Pareto, then ln X is exponential, discarding the j largest observations results in the following estimator:

γ̂(k, j) := 1/(k−j) · Σ_{i=j+1..k} i(ln x_(i) − ln x_(i+1)) = j/(k−j) · ln(x_(j+1)/x_(k+1)) + 1/(k−j) · Σ_{i=j+1..k} ln(x_(i)/x_(k+1)).   (9)

Note that setting j = 0 yields the standard Hill estimator, as expected. As figure 9(b) shows, our suggested generalization indeed significantly improves the performance. When choosing the value of j, one has to, of course, discard the values affected by detector saturation. However, discarding too many values increases the RMSE, which in the best-case scenario is proportional to 1/√(k−j). If, in order to reduce the bias, one chooses to only keep the measurements within a shorter interval [x′_LO, x′_HI] instead of [x_LO, x_HI] (x_LO < x′_LO < x′_HI < x_HI), the number of measurements has to be multiplied by a factor of [F(x_LO) − F(x_HI)]/[F(x′_LO) − F(x′_HI)]. This ensures that, on average, the same number of observations falls in [x′_LO, x′_HI] during the second experiment as in [x_LO, x_HI] during the first one. The values of F(·) can be substituted by their empirical counterparts from the first experiment. This way one can check whether the new estimate γ̂′ gained from the second experiment is consistent with the old one. The figures for the QQ estimator are quite similar, whereas the histogram approach is not much improved by discarding the data affected by saturation.

Figure 9: (a) Hill estimator performance for different values of detector saturation. Dashed line: 1/√k. (b) The generalized version (9) compensating for the saturation L = 10 and varying values of j, the number of discarded observations. Dashed line: 1/√(k−j).

Figures 6-9 show the best attainable performance of the basic estimators.
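The generalized estimator (9) translates directly into code; a minimal sketch (assuming numpy) on a synthetic exact Pareto sample with an illustrative tail index γ = 0.5:

```python
import numpy as np

def generalized_hill(x, k, j):
    """Estimator (9): Hill-type tail-index estimator that discards the j
    largest observations (e.g. those distorted by detector saturation)
    and uses the order statistics x_(j+1), ..., x_(k+1)."""
    xs = np.sort(x)[::-1]            # x_(1) >= x_(2) >= ...
    ref = xs[k]                      # x_(k+1) with 0-based indexing
    return (j * np.log(xs[j] / ref)
            + np.sum(np.log(xs[j:k] / ref))) / (k - j)

# Sanity check on an exact Pareto sample with tail index gamma = 0.5
# (illustrative parameters, not the paper's simulation settings):
rng = np.random.default_rng(3)
x = rng.pareto(2.0, 100_000) + 1.0   # P(X > x) = x**(-2), gamma = 1/2

print(generalized_hill(x, k=2_000, j=0))    # reduces to the plain Hill estimator
print(generalized_hill(x, k=2_000, j=50))   # top 50 observations discarded
```

For an exact Pareto sample, discarding the top j observations leaves the estimator essentially unbiased while only mildly inflating its variance, in line with the 1/√(k−j) scaling noted above.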
The problem in practice is how to decide how many points to take into account when calculating the estimators (k), especially if there is a large number of samples and evaluating each one is not feasible without automation. If the higher-order behavior is known, or can be estimated with reasonable accuracy, the optimal tail length (the one minimizing the RMSE) can be estimated as well. This is the approach followed by most of the literature; however, due to the difficulty of estimating higher-order behavior, the resulting algorithms can be quite involved (requiring the tuning of nuisance parameters) and often do not outperform simpler, heuristic approaches.

[Table 3; columns: Method, Based on, Tail choice, Cost, and the empirical RMSE of γ for the Pareto, Log-gamma, and Model distributions.]
Table 3: Empirical RMSE of different estimators applied to 100 samples of size 10, for different types of distributions, all of which had a positive finite extreme value index (γ ≡ α⁻¹). The "Cost" column refers to CPU time and complexity. The column "Model" refers to the toy model with a thermal pump (λ = 2/γ, σ₁ = 0, l = σ₂ = 10, L = ∞). The best three values are indicated in different shades of green for each distribution.

We have implemented the estimation procedures shown in Table 3. We proposed procedures 1-3, in which a heuristic path-stability approach was used to choose the tail length, mainly based on [81] and tweaked using [74] and [26] (for details, see Appendix A). This approach was included because it essentially emulates how one would choose a tail length from the Hill plot, and it also works with an arbitrary estimator. Table 3, of course, is not an exhaustive study of these procedures, especially since procedures 6-10 involve more than one nuisance parameter, which could have been used to fine-tune them. We did not do that; we used the default parameter values suggested in the sources to see how they perform "out of the box". Based on our tests, the procedure introduced by Guillou et al. [77] provided the most reliable results and had the further advantage of simplicity and speed. Our suggested path-stability approach, which mimics visual evaluation, also works reasonably well combined with the Hill estimator.

In this work, we first gave a brief overview of a basic toolkit that may be used to estimate the tail exponent associated with a heavy-tailed sample. We devoted more attention to histograms than is justified from the mathematical point of view, as they are the default tool for many physicists, and discussed how to use them more efficiently.
Subsequently, we discussed two simple alternatives, the QQ estimator and the Hill estimator, through examples.

We then addressed the challenges specific to intensity measurements in supercontinuum generation experiments. In order to do that, we introduced a model for the observable intensity distribution. Firstly, if the source in the experiment is bright squeezed vacuum, the estimation becomes considerably less efficient than in the thermal case. We suggest using a modified version of the Hill estimator, defined in (8), for BSV; if the pump is thermal, the plain Hill estimator (4) is sufficient. Next, one should check whether the observations were affected by detector saturation. This is easily done by preparing the empirical tail function of the sample. If they were, one has to discard the affected observations and apply the estimator introduced in (9). Finally, we also included a comparison of procedures which can choose how many observations to take into account in an automated fashion.

These results can be directly extended to investigate heavy-tailed distributions caused by three-wave and four-wave mixing in non-linear optics [41], atomic physics [82], superconducting circuits [48], non-linear optomechanics [43, 44], and electromechanics [83].
Acknowledgments

É.R. and L.R. acknowledge the support of project 19-22950Y of the Czech Science Foundation. R.F. acknowledges the projects LTAUSA19099 and CZ.02.1.01/0.0/0.0/16 026/0008460 of MEYS CR.
References

[1] B. Gutenberg and C. F. Richter, "Frequency of earthquakes in California," Bulletin of the Seismological Society of America, vol. 34, pp. 185–188, Oct 1944.
[2] K. Christensen, L. Danon, T. Scanlon, and P. Bak, "Unified scaling law for earthquakes," Proceedings of the National Academy of Sciences, vol. 99, no. suppl 1, pp. 2509–2513, 2002.
[3] M. G. Newberry and V. M. Savage, "Self-similar processes follow a power law in discrete logarithmic space," Phys. Rev. Lett., vol. 122, p. 158303, Apr 2019.
[4] K. W. Birkeland and C. C. Landry, "Power-laws and snow avalanches," Geophysical Research Letters, vol. 29, no. 11, pp. 49-1–49-3, 2002.
[5] E. T. Lu and R. J. Hamilton, "Avalanches and the distribution of solar flares," The Astrophysical Journal, vol. 380, pp. L89–L92, 1991.
[6] R. Pastor-Satorras, A. Vázquez, and A. Vespignani, "Dynamical and correlation properties of the Internet," Phys. Rev. Lett., vol. 87, p. 258701, Nov 2001.
[7] M. P. H. Stumpf and P. J. Ingram, "Probability models for degree distributions of protein interaction networks," Europhysics Letters (EPL), vol. 71, pp. 152–158, Jul 2005.
[8] M. Newman, "Power laws, Pareto distributions and Zipf's law," Contemporary Physics, vol. 46, no. 5, pp. 323–351, 2005.
[9] V. Pareto, Cours d'économie politique professé à l'Université de Lausanne. Lausanne; Paris: F. Rouge; Pichon, 1896.
[10] V. M. Yakovenko and J. B. Rosser, "Colloquium: Statistical mechanics of money, wealth, and income," Rev. Mod. Phys., vol. 81, pp. 1703–1725, Dec 2009.
[11] D. C. Shpilberg, "The probability distribution of fire loss amount," The Journal of Risk and Insurance, vol. 44, no. 1, pp. 103–115, 1977.
[12] H. Rootzén and N. Tajvidi, "Extreme value statistics and wind storm losses: A case study," Scand. Actuarial J., vol. 1, pp. 70–94, 1995.
[13] D. J. de Solla Price, "Networks of scientific papers," Science, vol. 149, no. 3683, pp. 510–515, 1965.
[14] S. Redner, "How popular is your paper? An empirical study of the citation distribution," Eur. Phys. J. B, vol. 4, pp. 131–134, 1998.
[15] M. Golosovsky, "Power-law citation distributions are not scale-free," Phys. Rev. E, vol. 96, p. 032306, Sep 2017.
[16] B. Mandelbrot, "The Pareto-Lévy law and the distribution of income," International Economic Review, vol. 1, no. 2, pp. 79–106, 1960.
[17] B. Mandelbrot, "New methods in statistical economics," The Journal of Political Economy, vol. 71, no. 5, 1963.
[18] S. Rachev, ed., Handbook of Heavy Tailed Distributions in Finance. No. 9780444508966 in Elsevier Monographs, Elsevier, 2003.
[19] P. Embrechts, S. I. Resnick, and G. Samorodnitsky, "Extreme value theory as a risk management tool,"
North American Actuarial Journal, vol. 3, no. 2, pp. 30–41, 1999.
[20] K. G. Wilson, "The renormalization group and critical phenomena," Reviews of Modern Physics, vol. 55, no. 3, p. 583, 1983.
[21] "Econophysics: An emergent science," 1997.
[22] C. Schinckus, "1996–2016: Two decades of econophysics: Between methodological diversification and conceptual coherence," The European Physical Journal Special Topics, vol. 225, pp. 3299–3311, Dec 2016.
[23] P. Bak and K. Chen, "Self-organized criticality," Scientific American, vol. 264, no. 1, pp. 46–53, 1991.
[24] R. Albert and A.-L. Barabási, "Statistical mechanics of complex networks," Rev. Mod. Phys., vol. 74, pp. 47–97, Jan 2002.
[25] J. D. Farmer and F. Lillo, "On the origin of power-law tails in price fluctuations," Quantitative Finance, vol. 4, no. 1, pp. 7–11, 2004.
[26] A. Clauset, C. R. Shalizi, and M. E. J. Newman, "Power-law distributions in empirical data," SIAM Review, vol. 51, no. 4, pp. 661–703, 2009.
[27] R. W. Boyd, Nonlinear Optics, Third Edition. USA: Academic Press, Inc., 3rd ed., 2008.
[28] C. Fabre, G. Grynberg, and A. Aspect, Introduction to Quantum Optics: From the Semi-classical Approach to Quantized Light. 2010.
[29] P. Barthelemy, J. Bertolotti, and D. S. Wiersma, "A Lévy flight for light," Nature, vol. 453, pp. 495–498, May 2008.
[30] N. Mercadier, W. Guerin, M. Chevrollier, and R. Kaiser, "Lévy flights of photons in hot atomic vapours," Nature Physics, vol. 5, pp. 602–605, Aug 2009.
[31] J. Goodman, G. Skrockij, and A. Kokin, Statistical Optics. A Wiley-Interscience publication, Wiley, 1985.
[32] L. Mandel and E. Wolf, Optical Coherence and Quantum Optics. Cambridge University Press, 1995.
[33] T. Iskhakov, M. V. Chekhova, and G. Leuchs, "Generation and direct detection of broadband mesoscopic polarization-squeezed vacuum," Phys. Rev. Lett., vol. 102, p. 183602, May 2009.
[34] T. S. Iskhakov, M. V. Chekhova, G. O. Rytikov, and G. Leuchs, "Macroscopic pure state of light free of polarization noise," Phys. Rev. Lett., vol. 106, p. 113602, Mar 2011.
[35] T. S. Iskhakov, I. N. Agafonov, M. V. Chekhova, and G. Leuchs, "Polarization-entangled light pulses of 10 photons," Phys. Rev. Lett., vol. 109, p. 150502, Oct 2012.
[36] T. S. Iskhakov, V. C. Usenko, U. L. Andersen, R. Filip, M. V. Chekhova, and G. Leuchs, "Heralded source of bright multi-mode mesoscopic sub-Poissonian light," Opt. Lett., vol. 41, pp. 2149–2152, May 2016.
[37] T. S. Iskhakov, V. C. Usenko, R. Filip, M. V. Chekhova, and G. Leuchs, "Low-noise macroscopic twin beams," Phys. Rev. A, vol. 93, p. 043849, Apr 2016.
[38] D. R. Solli, C. Ropers, P. Koonath, and B. Jalali, "Optical rogue waves," Nature, pp. 1054–1057, Dec 2007.
[39] V. Ruban, Y. Kodama, M. Ruderman, J. Dudley, R. Grimshaw, P. V. E. McClintock, M. Onorato, C. Kharif, E. Pelinovsky, T. Soomere, G. Lindgren, N. Akhmediev, A. Slunyaev, D. Solli, C. Ropers, B. Jalali, F. Dias, and A. Osborne, "Rogue waves – towards a unifying concept? Discussions and debates," The European Physical Journal Special Topics, vol. 185, pp. 5–15, Jul 2010.
[40] N. Akhmediev, B. Kibler, F. Baronio, M. Belić, W.-P. Zhong, Y. Zhang, W. Chang, J. M. Soto-Crespo, P. Vouzas, P. Grelu, C. Lecaplain, K. Hammani, S. Rica, A. Picozzi, M. Tlidi, K. Panajotov, A. Mussot, A. Bendahmane, P. Szriftgiser, G. Genty, J. Dudley, A. Kudlinski, A. Demircan, U. Morgner, S. Amiraranashvili, C. Bree, G. Steinmeyer, C. Masoller, N. G. R. Broderick, A. F. J. Runge, M. Erkintalo, S. Residori, U. Bortolozzo, F. T. Arecchi, S. Wabnitz, C. G. Tiofack, S. Coulibaly, and M. Taki, "Roadmap on optical rogue waves and extreme events," Journal of Optics, vol. 18, p. 063001, Apr 2016.
[41] M. Manceau, K. Y. Spasibko, G. Leuchs, R. Filip, and M. V. Chekhova, "Indefinite-mean Pareto photon distribution from amplified quantum noise,"
Phys. Rev. Lett., vol. 123, p. 123606, Sep 2019.
[42] K. Y. Spasibko, D. A. Kopylov, V. L. Krutyanskiy, T. V. Murzina, G. Leuchs, and M. V. Chekhova, "Multiphoton effects enhanced due to ultrafast photon-number fluctuations," Phys. Rev. Lett., vol. 119, p. 223603, Nov 2017.
[43] G. A. Brawley, M. R. Vanner, P. E. Larsen, S. Schmid, A. Boisen, and W. P. Bowen, "Nonlinear optomechanical measurement of mechanical motion," Nature Communications, vol. 7, p. 10988, Mar 2016.
[44] M. Šiler, L. Ornigotti, O. Brzobohatý, P. Jákl, A. Ryabov, V. Holubec, P. Zemánek, and R. Filip, "Diffusing up the hill: Dynamics and equipartition in highly unstable systems," Phys. Rev. Lett., vol. 121, p. 230601, Dec 2018.
[45] C. F. McCormick, V. Boyer, E. Arimondo, and P. D. Lett, "Strong relative intensity squeezing by four-wave mixing in rubidium vapor," Opt. Lett., vol. 32, pp. 178–180, Jan 2007.
[46] A. Montaña Guerrero, P. Nussenzveig, M. Martinelli, A. M. Marino, and H. M. Florez, "Quantum noise correlations of an optical parametric oscillator based on a nondegenerate four wave mixing process in hot alkali atoms," Phys. Rev. Lett., vol. 125, p. 083601, Aug 2020.
[47] I. B. Aban and M. M. Meerschaert, "Generalized least-squares estimators for the thickness of heavy tails," Journal of Statistical Planning and Inference, vol. 119, no. 2, pp. 341–352, 2004.
[48] V. Sivak, N. Frattini, V. Joshi, A. Lingenfelter, S. Shankar, and M. Devoret, "Kerr-free three-wave mixing in superconducting quantum circuits," Phys. Rev. Applied, vol. 11, p. 054060, May 2019.
[49] S. Mundhada, A. Grimm, J. Venkatraman, Z. Minev, S. Touzard, N. Frattini, V. Sivak, K. Sliwa, P. Reinhold, S. Shankar, M. Mirrahimi, and M. Devoret, "Experimental implementation of a Raman-assisted eight-wave mixing process," Phys. Rev. Applied, vol. 12, p. 054051, Nov 2019.
[50] J. Heersink, C. Marquardt, R. Dong, R. Filip, S. Lorenz, G. Leuchs, and U. L. Andersen, "Distillation of squeezing from non-Gaussian quantum states," Phys. Rev. Lett., vol. 96, p. 253601, Jun 2006.
[51] R. Dong, M. Lassen, J. Heersink, C. Marquardt, R. Filip, G. Leuchs, and U. L. Andersen, "Experimental entanglement distillation of mesoscopic quantum states," Nature Physics, vol. 4, pp. 919–923, Dec 2008.
[52] M. D. Vidrighin, O. Dahlsten, M. Barbieri, M. S. Kim, V. Vedral, and I. A. Walmsley, "Photonic Maxwell's demon," Phys. Rev. Lett., vol. 116, p. 050401, Feb 2016.
[53] S. Foss, D. Korshunov, and S. Zachary, An introduction to heavy-tailed and subexponential distributions. Springer Series in Operations Research and Financial Engineering, Springer, 2nd ed., 2013.
[54] S. Boscolo and C. Finot, eds., Shaping light in nonlinear optical fibers. Wiley, 2017.
[55] G. Bohm and G. Zech, Introduction to statistics and data analysis for physicists. DESY, 2010.
[56] F. Boitier, A. Godard, N. Dubreuil, P. Delaye, C. Fabre, and E. Rosencher, "Photon extrabunching in ultrabright twin beams measured by two-photon counting in a semiconductor," Nature Communications, vol. 2, p. 425, Aug 2011.
[57] Y. Zhou, F.-l. Li, B. Bai, H. Chen, J. Liu, Z. Xu, and H. Zheng, "Superbunching pseudothermal light," Phys. Rev. A, vol. 95, p. 053809, May 2017.
[58] L. Zhang, Y. Lu, D. Zhou, H. Zhang, L. Li, and G. Zhang, "Superbunching effect of classical light with a digitally designed spatially phase-correlated wave front," Phys. Rev. A, vol. 99, p. 063827, Jun 2019.
[59] J. Beirlant, Y. Goegebeur, and J. Segers, Statistics of extremes: theory and applications. Wiley Series in Probability and Statistics, Chichester: Wiley, 2006.
[60] U. Cormann and R.-D. Reiss, "Generalizing the Pareto to the log-Pareto model and statistical inference," Extremes, vol. 12, pp. 93–105, Mar 2009.
[61] P. W. Milonni, J. H. Carter, C. G. Peterson, and R. J. Hughes, "Effects of propagation through atmospheric turbulence on photon statistics,"
Journal of Optics B: Quantum and Semiclassical Optics, vol. 6, pp. S742–S745, Jul 2004.
[62] E. T. Jaynes, "Information theory and statistical mechanics," Phys. Rev., vol. 106, pp. 620–630, May 1957.
[63] N. Dmitruk, A. Goncharenko, and E. Venger, Optics of Small Particles and Composite Media. 2009.
[64] O. Svelto, Principles of Lasers. Springer US, 5th ed., 2010, p. 45.
[65] S. Lepri, S. Cavalieri, G.-L. Oppo, and D. S. Wiersma, "Statistical regimes of random laser fluctuations," Phys. Rev. A, vol. 75, p. 063820, Jun 2007.
[66] M. E. Ghitany, E. Gómez-Déniz, and S. Nadarajah, "A new generalization of the Pareto distribution and its application to insurance data," J. Risk Financial Manag., vol. 11, no. 1, 2018.
[67] A. Aitken, "On least squares and linear combination of observations," vol. 55, pp. 42–48, 1934.
[68] A. Rényi, "On the theory of order statistics," Acta Mathematica Academiae Scientiarum Hungarica, vol. 4, pp. 191–231, Sep 1953.
[69] M. Kratz and S. I. Resnick, "The qq-estimator and heavy tails," Communications in Statistics. Stochastic Models, vol. 12, no. 4, pp. 699–724, 1996.
[70] S. Resnick, T. Mikosch, and S. Robinson, Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. No. 10 in Heavy-tail phenomena: probabilistic and statistical modeling, Springer, 2007.
[71] J. Beirlant, G. Dierckx, Y. Goegebeur, and G. Matthys, "Tail index estimation and an exponential regression model," Extremes, vol. 2, pp. 177–200, 1999.
[72] B. M. Hill, "A simple general approach to inference about the tail of a distribution," Ann. Statist., vol. 3, pp. 1163–1174, Sep 1975.
[73] J. Beirlant, G. Dierckx, and A. Guillou, "Estimation of the extreme-value index and generalized quantile plots," Bernoulli, vol. 11, pp. 949–970, Dec 2005.
[74] H. Drees, L. de Haan, and S. Resnick, "How to make a Hill plot," The Annals of Statistics, vol. 28, no. 1, pp. 254–274, 2000.
[75] S. Coles, An introduction to statistical modeling of extreme values. Springer Series in Statistics, London: Springer-Verlag, 2001.
[76] S. Ghosh and S. Resnick, "A discussion on mean excess plots," Stochastic Processes and their Applications, vol. 120, no. 8, pp. 1492–1517, 2010.
[77] A. Guillou and P. Hall, "A diagnostic for selecting the threshold in extreme value analysis," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 63, no. 2, pp. 293–305, 2001.
[78] J. Danielsson, L. de Haan, L. Peng, and C. de Vries, "Using a bootstrap method to choose the sample fraction in tail index estimation," J. Multivar. Anal., vol. 76, pp. 226–248, Feb 2001.
[79] H. Drees and E. Kaufmann, "Selecting the optimal sample fraction in univariate extreme value estimation," Stochastic Processes and their Applications, vol. 75, no. 2, pp. 149–172, 1998.
[80] F. Caeiro and M. Gomes, Threshold Selection in Extreme Value Analysis: Methods and Applications, pp. 69–86. 2015.
[81] M. M. Neves, M. I. Gomes, F. Figueiredo, and D. P. Gomes, "Modeling extreme events: Sample fraction adaptive choice in parameter estimation," Journal of Statistical Theory and Practice, vol. 9, no. 1, pp. 184–199, 2015.
[82] V. Boyer, A. M. Marino, R. C. Pooser, and P. D. Lett, "Entangled images from four-wave mixing," Science, vol. 321, no. 5888, pp. 544–547, 2008.
[83] M. J. Seitner, M. Abdi, A. Ridolfo, M. J. Hartmann, and E. M. Weig, "Parametric oscillation, frequency mixing, and injection locking of strongly coupled nanomechanical resonator modes," Phys. Rev. Lett., vol. 118, p. 254301, Jun 2017.
Appendix

A Path stability approach to choosing tail length
Procedures 1-3 in Table 3 use the following algorithm, adapted from [81], to select the tail length:

1. Calculate the value of the estimator for all applicable values of the tail length: γ̂(k) for k = k_min, …, k_max.

2. For all k_min ≤ k ≤ k_max, perform a statistical test to see whether a power law with the estimated parameters can be rejected or not. Let K denote the leftmost uninterrupted interval of values of k which does not include any rejected tail lengths (see the dashed portion in Figure 10 showing the rejected values). This addition was inspired by [26], although the KS statistic itself is used there in a much different manner.

3. For all k ∈ K, round the estimated value γ̂(k) to n decimal places. Choosing n = 1 is mostly appropriate; the important thing is that the rounded values should not be constant within K.
Figure 10: Heuristic tail length selection algorithm. The thick black line corresponds to K, the green interval to I. The sample was created by applying sinh²(·) to an exponential sample and adding a Gaussian noise. The true value of γ for this sample is one.

4. Identify the longest interval I within which the rounded estimates are constant. The original method suggests defining length as the number of elements (k₂ − k₁ + 1); we suggest, following Drees et al. [74], defining length on the log scale (log(k₂/k₁)).

5. Within I, calculate the estimates rounded to n + 2 digits, and identify the mode of these values, denoted by m.

6. The estimate of the tail length is then gained by choosing the shortest tail k̂ within I for which γ̂(k̂) rounded to n + 2 decimal places equals m. The final estimate of γ is then γ̂(k̂).
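The selection heuristic above can be sketched in code as follows. This is a simplified illustration, assuming numpy: the KS-based rejection of step 2 is omitted, and the value of k_min, the rounding level, and the Pareto test sample are all assumptions:

```python
import numpy as np

def hill_path(x):
    """Hill estimates for every tail length k = 1..N-1;
    element k-1 of the returned array is the Hill estimate with k points."""
    y = np.log(np.sort(x)[::-1])          # log order statistics, largest first
    k = np.arange(1, len(x))
    return np.cumsum(y)[:-1] / k - y[1:]

def path_stability_choice(gammas, k_min=50, n=1):
    """Steps 3-6 of the appendix heuristic: find the longest log-scale run
    of constant rounded estimates, then pick the shortest tail length
    reproducing the mode of the (n+2)-digit estimates within that run."""
    ks = np.arange(k_min, len(gammas) + 1)
    vals = gammas[ks - 1]                 # estimate for tail length k
    rounded = np.round(vals, n)
    runs, start = [], 0
    for i in range(1, len(ks) + 1):       # collect maximal constant runs
        if i == len(ks) or rounded[i] != rounded[start]:
            runs.append((np.log(ks[i - 1] / ks[start]), start, i))
            start = i
    _, lo, hi = max(runs)                 # longest run on the log scale
    fine = np.round(vals[lo:hi], n + 2)
    uniq, counts = np.unique(fine, return_counts=True)
    m = uniq[np.argmax(counts)]
    pick = lo + int(np.argmax(fine == m))  # shortest tail matching the mode
    return ks[pick], vals[pick]

# Illustrative run on an exact Pareto sample with gamma = 1:
rng = np.random.default_rng(4)
x = rng.pareto(1.0, 100_000) + 1.0
k_hat, gamma_hat = path_stability_choice(hill_path(x))
print(k_hat, gamma_hat)
```

For an exact Pareto sample, the Hill path is flat, so the heuristic settles on a long, stable run and returns an estimate close to the true index.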