[PDF] Accuracy and precision of the estimation of the number of missing levels in chaotic spectra using long-range correlations

Abstract

We study the accuracy and precision for estimating the fraction of observed levels φ in quantum chaotic spectra through long-range correlations. We focus on the main statistics where theoretical formulas for the fraction of missing levels have been derived, the Δ 3 of Dyson and Mehta and the power spectrum of the δ n statistic. We use Monte Carlo simulations of the spectra from the diagonalization of Gaussian Orthogonal Ensemble matrices with a definite number of levels randomly taken out to fit the formulas and calculate the distribution of the estimators for different sizes of the spectrum and values of φ . A proper averaging of the power spectrum of the δ n statistic needs to be performed for avoiding systematic errors in the estimation. Once the proper averaging is made the estimation of the fraction of observed levels has quite good accuracy for the two methods even for the lowest dimensions we consider d=100 . However, the precision is generally better for the estimation using the power spectrum of the δ n as compared to the estimation using the Δ 3 statistic. This difference is clearly bigger for larger dimensions. Our results show that a careful analysis of the value of the fit in view of the ensemble distribution of the estimations is mandatory for understanding its actual significance and give a realistic error interval.

Full PDF

AAccuracy and precision of the estimation of the number of missing levels in chaotic spectra usinglong-range correlations

I. Casal, ∗ L. Mu˜noz, † and R. A. Molina ‡ Grupo de F´ısica Nuclear, Departamento de Estructura de la Materia,F´ısica T´ermica y Electr´onica, Facultad de Ciencias F´ısicas,Universidad Complutense de Madrid and IPARCOS, CEI Moncloa, Madrid, 28040, Spain Instituto de Estructura de la Materia, IEM-CSIC, Serrano 123, Madrid, E-28006, Spain

We study the accuracy and precision for estimating the fraction of observed levels ϕ in quantum chaoticspectra through long-range correlations. We focus on the main statistics where theoretical formulas for thefraction of missing levels have been derived, the ∆ of Dyson and Mehta and the power spectrum of the δ n statistic. We use Monte Carlo simulations of the spectra from the diagonalization of Gaussian OrthogonalEnsemble matrices with a deﬁnite number of levels randomly taken out to ﬁt the formulas and calculate thedistribution of the estimators for different sizes of the spectrum and values of ϕ . A proper averaging of thepower spectrum of the δ n statistic needs to be performed for avoiding systematic errors in the estimation. Oncethe proper averaging is made the estimation of the fraction of observed levels has quite good accuracy for thetwo methods even for the lowest dimensions we consider d = 100 . However, the precision is generally betterfor the estimation using the power spectrum of the δ n as compared to the estimation using the ∆ statistic. Thisdifference is clearly bigger for larger dimensions. Our results show that a careful analysis of the value of theﬁt in view of the ensemble distribution of the estimations is mandatory for understanding its actual signiﬁcanceand give a realistic error interval. I. INTRODUCTION

Statistical analysis of spectra has become a very useful toolin Physics. It already started in the 50-60’s of the past centuryin the ﬁeld of Nuclear Physics when Wigner, Dyson, Gaudin,Mehta and others developed and applied the Random MatrixTheory (RMT) to nuclear spectra [1–12]. One of the mainresults arrived in 1982 when Haq, Pandey and Bohigas an-alyzed the spectral ﬂuctuations of an ensemble of 1407 ex-perimentally identiﬁed neutron and proton J π = 1 / + res-onances just above the one-nucleon emission threshold (theNuclear Data Ensemble, NDE) [13]. They showed an impres-sive agreement with the prediction of the Gaussian Orthogo-nal Ensemble (GOE) of RMT with very high statistical sig-niﬁcance. The results are certainly striking considering thematrix models from RMT are constructed with random num-bers and the prediction is parameter-free. Two years later, Bo-higas, Giannoni and Schmit applied RMT to study the spec-tral ﬂuctuations of a Sinai quantum billiard, whose classicalanalogue shows chaotic dynamics, obtaining also a very goodagreement with the GOE [14]. In view of this result, they es-tablished as a conjecture that the spectral ﬂuctuations of time-reversal-invariant systems whose classical analogs are chaoticare the same as those predicted by GOE. Known from then asthe BGS conjecture, it has been supported by many other re-sults in different physical systems along the years, and it wasﬁnally proved semiclassically by Heusler et al. in 2007 [15].On the other hand, for quantum systems with an integrableclassical analogue Berry and Tabor showed in 1977 that theirspectral ﬂuctuations are identical to those of a sequence of un-correlated random numbers and follow Poisson statistics [16]. ∗ [email protected] † [email protected] ‡ [email protected] Once the classiﬁcation depending on the underlying dy-namics of the classical system is made, the way is paved toextend it to all quantum systems with or without a classicalanalogue. Thus, a quantum system is said to be chaotic whenits spectral ﬂuctuations are well described by RMT and it issaid to be regular when they follow Poisson statistics. Spectralﬂuctuations of quantum systems are universal as they do notdepend on the particular properties of each system but only onthe more general properties of the dynamics, whether it is reg-ular or chaotic. However, this universality can only be arguedfor the two extremes, systems intermediate between chaos andregularity lack this universal characterization.To measure spectral ﬂuctuations in RMT a certain statis-tic is deﬁned which can be calculated from the sequence oflevels of a quantum spectrum. The statistics are usually di-vided on whether they measure short range correlations be-tween neighboring levels like the nearest-neighbor spacingdistribution P ( s ) and the distribution of ratios between neigh-boring spacings [11, 17] or long-range correlations betweendistant levels like the ∆ of Dyson and Mehta [10] or thepower spectrum of the δ n [18]. In order to compare experi-mental spectra with RMT results, complete sequences of lev-els without mixed symmetries are needed. In actual experi-ments missing levels are sometimes unavoidable, for example,if the wave function has a node in the location of the antennain a microwave billiard experiment the level cannot be mea-sured. There are then a number of works that have worked outhow the correlations in the spectral ﬂuctuations depend on thefraction of observed levels for chaotic quantum systems [19–21]. In general, as the fraction of missing levels increases thecorrelations in the spectra diminish and the spectral statisticsbecome closer to the Poisson case. Assuming the chaoticityof the quantum system, the fraction of missing levels in thespectral sequence can be estimated from the spectral statis-tics. This way of estimating the number of missing levels inexperimental spectra has been succesfully applied for differ- a r X i v : . [ phy s i c s . d a t a - a n ] N ov ent systems [24, 25]. However, very few attempts have beenmade to address the quality of the different estimators and tomake a comparison between them. This is specially relevantas systems intermediate between regularity and chaos as wellas systems with a superposition of sequences with differentsymmetries also present correlations between the RMT resultand Poisson. It is therefore necessary to establish how to cor-rectly use each statistic for each kind of transition in order toavoid misleading conclusions on the origin of the intermediatebehavior of a certain system.The purpose of this work is to contribute to the systematiza-tion of the correct use of the available statistics to estimate thefraction of missing levels in spectra. In particular, we focuson the ∆ and the P δk statistics, which measure long-rangecorrelations between levels. For both, there exist theoreticalformulas for the transition as a function of one parameter, thefraction of observed levels, ( ϕ ∈ [0 , ). However, the the-oretical ensemble standard deviation has not been calculatedbefore. The square root of the ensemble standard deviation of ϕ is a measure of the error in the estimation of ϕ from a ﬁtto the theoretical formulas and should be as small as possible[22]. It will depend on the particular statistic used but alsoon the value of ϕ and the size of the analyzed level sequence.This error is very different from the simple numerical error ofa non-linear squares ﬁt and is the relevant one in RMT studies,as we show in this analysis. In this work we resort to a numer-ical calculation of the full distribution of ϕ . This will allow usto study the accuracy, related to the correctness of the theoret-ical formulas, and the precision, related to ensemble standarddeviation, of the estimation of the fraction observed levels, de-pending on its value, the size of the spectrum and the methodof calculation.The paper is organized as follows: In Sec. II we describethe tools we use for the Montecarlo simulations of RMT spec-tra with missing levels and the statistics we use for the analysisof long-range correlations and the estimation of the fraction ofobserved levels. In Sec. III we show the main results of thepaper, i.e. the distribution of the estimations of the fraction ofobserved levels depending on the different methods used. Thetables and ﬁgures of this section summarize the results for theprecision and accuracy of the ﬁt of the fraction of observedlevels depending on the size of the level sequence. In Sec. IVwe summarize the main conclusions of the work. II. STATISTICAL ANALYSIS OF RMT SPECTRA WITHMISSING LEVELS

For analyzing the accuracy and precision of the estimationof the fraction of missing levels we need to construct ensem-bles of chaotic spectra with different fractions of observed lev-els ϕ . We describe below the procedure we have used focusingon GOE spectra.The GOE is an ensemble of matrices deﬁned in RMT to rep-resent hamiltonian matrices with time-reversal symmetry andspin rotation symmetry, that is, the vast majority of physicalhamiltonians. Thus, it is the most used ensemble to comparewith experimental results and perform numerical and theoret- ical studies. The GOE is composed of real symmetric ma-trices invariant under orthogonal transformations. The ma-trix elements are random numbers from Gaussian distributionsand together with the GUE (Gaussian Unitary Ensemble, forhamiltonians without time-reversal symmetry) and the GSE(Gaussian Simplectic Ensemble, for hamiltonians with time-reversal symmetry but no spin rotational symmetry) they con-stitute the three classical ensembles of RMT.First of all, we generate a sample of matrices belonging tothe GOE and diagonalize each of them to obtain the spectrum.We then randomly eliminate the necessary amount of levelsto obtain a spectrum with a determined fraction of observedlevels ϕ .After that, the ﬁrst necessary step prior to the statisticalanalysis is the so-called unfolding , that is, to separate the spec-tral ﬂuctuations, which are the object of study, from the secu-lar behavior of the level density. To perform the unfolding oneneeds to assume that the density of states g ( E ) can be sepa-rated into a smooth part g ( E ) and a ﬂuctuating part (cid:101) g ( E ) , g ( E ) = g ( E ) + (cid:101) g ( E ) (1)The standard procedure by which g ( E ) is removed consistsin mapping the actual energy levels { E i } i =1 ,...,d into new di-mensionless levels { ε i } i =1 ,...,d whose mean level density isconstant. Here d stands for the size of the spectrum. This canbe done by means of the following transformation: ε i = N ( E i ) , i = 1 , . . . , d, (2)where N ( E ) is the smooth part of the accumulated level den-sity N ( E ) = (cid:82) E −∞ dE (cid:48) g ( E (cid:48) ) , which gives the number of lev-els up to energy E . The transformed level density ρ ( ε ) in thenew energy variable ε is such that ρ ( ε ) = 1 , as required.A correct unfolding is crucial in order to properly analyzethe spectral ﬂuctuations. Ideally the smooth part of the leveldensity, that is, the mean level density should be perfectlyknown. For physical systems this is not always the case. Inmany relevant cases only a very general ﬁt, like a ﬁt to a poly-nomial function of the energy, can be done. A polynomial ﬁtis a very illustrative example to understand the consequencesof the possible mistakes which can be made. If the degree ofthe polynomial is too low the ﬁt cannot accurately describe g ( E ) completely and if it is too high the ﬁt can be too goodin the sense that it would also partially include the ﬂuctua-tions and not only the smooth part. Both type of errors lead tomisleading conclusions in the spectral analysis [26].In this work we deal with spectra from the GOE, whoselevel density is perfectly known and, moreover, the changein the level density when there are missing levels is straight-forward. The mean level density of the GOE is known asWigner’s semicircle law and is given by: ρ ( E ) =  Aπ (cid:113) dA − E | E | < (cid:113) dA | E | > (cid:113) dA (3)where A is a constant. As the level density represents thenumber of energy levels per unit energy, we only need to mul-tiply by a factor ϕ to have the level density of a spectrum witha fraction ϕ of observed levels.Thus, the mean accumulated level density we use here forthe unfolding is: N ( E ) = ϕ (cid:32) Aπ (cid:34) E (cid:114) dA − E + 2 dA arcsin (cid:32) E (cid:112) d/A (cid:33)(cid:35) + d (cid:33) (4)Once we have an ensemble of unfolded spectra with a cer-tain fraction ϕ of observed levels, we can proceed to calculatethe two statistics we focus on: ∆ and P δk .The ∆ is one of the most widely used long-range statisticsand was deﬁned originally by Dyson and Mehta [9, 10]. It isdeﬁned in an interval [ ε, ε + L ] of the unfolded spectrum as ∆ ( L ) = (cid:42) min A,B L (cid:90) ε + Lε dε (cid:48) [ N ( ε (cid:48) ) − Aε (cid:48) − B ] (cid:43) , (5)where the angle brackets denote the spectral average over thevalues of ε , the location of the window of L levels within thespectrum. The value of ∆ is independent of the position of ε and that is why it makes sense to calculate an average overintervals of length L along the spectrum.The ∆ is a measure of the spectral rigidity or of how “or-dered” is the spectrum in the following sense: The nearer toan equally spaced spectrum the smaller the value of ∆ is.For an exactly equally spaced spectrum it has a constant value ∆ = 1 / . The opposite is a regular system with Poissonstatistics where there are no correlations between levels andit grows linearly in this case, ∆ = L/ . GOE spectra aremore rigid and the growth with L is logarithmic. The resultcannot be computed analytically but it is possible to obtain anasymptotic expression valid for large L : ∆ ( L ) = 1 π (cid:20) log(2 πL ) + γ − (cid:21) −

18 + O ( L − ) (6)This equation is sufﬁcient for most purposes as the differ-ence between this asymptotic formula and the numerical resultis negligible for L (cid:38) . The bar over ∆ denotes average overthe ensemble, as these are the theoretical predictions whichcan be supplied by RMT. For the practical calculation of ∆ we have implemented the prescription described in Ref. [19].It must be noted that in order to have a meaningful spectral av-erage it is important to calculate ∆ ( L ) through independentintervals of length L in the spectrum. The number of such in-tervals limits the maximum value of L for which the ∆ ( L ) can be calculated [27]. In this work we have chosen to limitthe calculation to L = d/ . in order to average over at leastthree fully independent intervals.The P δk statistic is the power spectrum of the δ n statisticwhich is deﬁned in terms of the unfolded energy levels as δ n = ε n +1 − ε − n, n = 1 , . . . , d − (7)and it represents the deviation of the excitation energy of the ( n + 1) th unfolded level from its mean value n . Moreover, if we appropriately shift the ground state of the system, we canwrite δ n = − (cid:101) N ( ε n +1 ) , (8)that is, the accumulated level density ﬂuctuations at ε = ε n +1 .The δ n statistic was regarded in [18] from a new point ofview, considering its formal similarity with a discrete timeseries, resulting in a new long-range statistic to characterizequantum chaos. One of the most common numerical tech-niques used in time series analysis is the calculation of thepower spectrum, the square modulus of the Fourier transform: P δk ≡ | ˆ δ k | = 1 N (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) N (cid:88) n =1 δ n exp (cid:18) − i πnkN (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (9)The three classical ensembles of RMT (GOE, GUE andGSE) showed P δk ∝ /k , while on the other hand, Pois-son spectra showed P δk ∝ /k . In view of these results itwas conjectured that chaotic quantum systems are character-ized by /f noise, whereas integrable ones exhibit /f noise[18]. The analogy with time series also provides a consistentinterpretation of spectral rigidity in terms of antipersistence.In a time series, antipersistence means that an increasing ordecreasing trend in the past increases the probability of theopposite trend in the future. Fluctuations in an unfolded spec-trum are the deviations from an equally spaced spectrum withspacings between levels equal to 1. Thus, if the signal δ n ,viewed as a time series, is very antipersistent, the correspond-ing energy spectrum is very rigid. This is the case of RMTand chaotic spectra.The P δk statistic has been used to study numerical spec-tra from theoretical models like RMT ensembles [28–31], thenuclear shell-model [18], quantum billiards [32], the quarticoscillator and the kicked top [30] to study the chaotic dy-namics of quantum models or the order-chaos transition inmixed systems. Microwave billiards [24, 33], microwave net-works [34, 35] or molecules [25] are some of the experimentalspectra recently analyzed using P δk to address issues such aschaoticity, particular non-universal features in the spectra ormissing levels.For our analysis we use the following expressions of ∆ ( L ) [19, 20] and P δk [21] for a GOE spectrum with only a fraction ϕ of observed levels: ∆ ( L ; ϕ ) = ϕ π (cid:20) log (cid:18) πLϕ (cid:19) + γ − − π (cid:21) + (1 − ϕ ) L (10) P δk ( ϕ ) = N ϕ π (cid:34) K ( ϕ kN ) − k + K (cid:0) ϕ N − kN (cid:1) − N − k ) (cid:35) + 14 sin (cid:0) πkN (cid:1) − ϕ (11)The formula for P δk ( ϕ ) uses the so-called form factor ap-proximation that only takes into account correlations between ∆ L PoissonGOE ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . FIG. 1. ∆ ( L ) vs. L for different values of the number of missinglevels. The curves are the ensemble average of ensembles of 1500matrices of dimension d = 1500 where a fraction of levels have beenrandomly taken out leaving a fraction ϕ of observed levels. Resultsare compared with the theoretical formula (10). − P δ k log( k ) PoissonGOE ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . ϕ = 0 . FIG. 2. Value of P δk vs. k in logarithmic scale for the same data asFig. 1 and comparison with the theoretical formula (11). two levels. Analytical results for the GUE have recently ob-tained expressions beyond the form factor approximation thattake into account correlations to all orders [36–38]. Unfortu-nately, similar analytical formulas for the GOE have not beenobtained yet. The differences are evident only at large valuesof k close to the limit d/ . For our purposes, the formulascoming from the form factor approximation are good enoughfor estimating the fraction of observed levels ϕ as we showbelow.The quality of these formulas for describing ensemble av-erages of chaotic spectra with missing levels has been shown before [19–21]. However, in order to be self-contained, weshow in Figs. 1 and 2 the curves for several values of ϕ tocompare the behavior of the two statistics as ϕ varies fromthe GOE prediction ( ϕ = 1 ) to the Poisson case ( ϕ = 0 ) andcompare with the results of averages over the correspondingensembles of 1500 members each for dimension d = 1500 asa check of the correctness of the theoretical predictions.Similar formulas have been obtained for another widelyused statistic to study long-range correlations, the Σ [20].The ∆ is an integrated version of the Σ and, thus, muchsmoother. Although, the quality of the formulas for the en-semble average of ∆ and Σ is very similar, the variancewithin the ensemble is generally much higher for the Σ thanfor the ∆ . The latter is, thus, better suited for estimating thenumber of missing levels and has been used in past and re-cent works with this aim [23, 24, 35, 39, 40], and thus ourchoice for this paper. For a recent discussion of this issue in adifferent context see Ref. [23].Due to its nature P δk is a highly ﬂuctuating quantity froma value of k to the next for individual spectra. Moreover, inits natural logarithmic representation, the number of points ismuch larger for large values of log k . This is not an importantcaveat when comparing with the GOE results or the Poissonresults. However, when ﬁtting for estimating ϕ a direct leastsquares ﬁt of the results is usually biased to give much moreimportance to large k values as we have many more pointsin that region. This is partially compensated by the fact thatdeviations from GOE results tend to be larger in the low k region but normally the bias is still there. So, when we can-not perform an ensemble average and to avoid this problem(and also for representation purposes) an average of P δk overdifferent values of k is performed. Of course, there are differ-ent ways to perform this average in the literature which, up tonow, have not been given a particular importance [21, 25]. Wehave explored this question in more detail and found that thisprocess of averaging may induce systematic errors in the es-timation of ϕ . Our ﬁndings are summarized in the AppendixA. Our ﬁnal conclusion is that the optimal way of averagingis done in intervals of k (and not log k as was done in someprevious works) with a number of intervals n int that dependson the dimension d and the value of ϕ . However, this choiceis not very critical and can be estimated with a simple formula n int = 0 . ∗ d . that gives a number that can be safely usedfor all values of ϕ which is important as in practical cases ϕ is,in principle, unknown. The results using P δk that we presentin the following section are performed with this procedure. III. RESULTS

In this section we present the results of the estimation of thefraction ϕ of observed levels in individual spectra from ﬁts tothe equations (10) and (11) for ∆ and P δk , and the compari-son of their efﬁciency in terms of accuracy and precision. Weremind the reader that by accuracy we mean the closeness ofthe estimated value to the real value and by precision we referto the statistical variability of the estimation. From now onwe will call ϕ ∆ the estimation of ϕ obtained from the ﬁt tothe expression (10) for ∆ , ϕ P the estimation of ϕ obtainedfrom the ﬁt to the expression (11) for P δk , and ϕ e the value of ϕ initially set for the ensemble, to distinguish between themwhen necessary.We have generated ensembles of 1500 GOE spectra for di-mensions d = 100 to and values of ϕ e from 0.3 to 1.Then for each ensemble we have proceeded as follows: i) cal-culate ∆ and P δk for each individual spectrum ii) perform ﬁtsto Eqs. (10) and (11) to obtain the estimations of ϕ , and iii)calculate the mean and standard deviation of the 1500 estima-tions of ϕ from the ensemble for each statistic. Finally weanalyze the precision in terms of the standard deviations andthe accuracy by comparing with the exact value of ϕ e whichhas been set for each ensemble. The actual distribution of theresults of the estimation is different from a Gaussian, in partdue to the fact that the value of ϕ is bounded between and .Nevertheless, the value of the standard deviation still is a goodmeasure of the precision of the method used. However, for afull understanding of the meaning of the result an analysis ofthe full distribution of the estimated values of ϕ is needed.Some examples will be provided after discussing the resultsfor the mean and standard deviation. A. Results for spectra of dimension d = 2500 In Table I we show the results of the mean µ and standarddeviation σ of the estimations of ϕ for each statistic in thisensemble. For the ﬁts with the ∆ ( L ) we use values of L upto L = d/ . . We have checked that the results do not dependvery much on the chosen limit as long as there are enoughpoints to make a proper ﬁt. We choose a number of intervals inthe k -axis of for the ﬁts with the P δk . It can be seen that theaccuracy is quite good for the ensemble of estimations and forboth statistics. The mean values are very close to the values of ϕ e for the corresponding ensemble, always within one sigma.The precision, as measured by σ/µ , lowers as the fraction ofobserved levels decrease, as one would expect. However, inthis quantity there are signiﬁcant differences between ∆ and P δk . While in the former case the precision goes from . for ϕ e = 0 . to about for ϕ e = 0 . , in the later case theprecision goes from to . We can then conclude thatfor this large dimensions a ﬁt to the formula for the P δk givesbetter results for the estimation of experimental spectra.In Figs. 3 and 4 we show the distribution of estimations ϕ P and ϕ ∆ for the ensemble with ϕ e = 0 . and ϕ e = 0 . .Together with the histograms we have represented Gaussiancurves with the corresponding parameters µ and σ obtainedfor the ensemble from Table I. The distributions have a rea-sonably Gaussian shape and the standard deviations can beused for estimating the dispersion of values of ϕ , although thefull distribution is needed for a correct interpretation of theﬁtting results. This is particularly evident in the case of thedistribution of the results for the ﬁtting with ∆ . The compar-ison of the normalized distribution of the estimation for thetwo different statistics clearly supports the choice of the ﬁtsof P δk for the estimation of ϕ for this dimension. This differ-ence diminishes for lower dimensions, as we will see in the TABLE I. Results of the average and standard deviation of the ﬁtsfor ϕ ( µ and σ ) for a GOE ensemble of matrices of dimension d = 2500 . The ﬁts using P δk were obtained with intervals in the k axis while the ﬁts using ∆ were obtained with the results up to amaximum value of L of d/ . . P δk ∆ ϕ µ σ µ σ next section with d = 200 . . . . . µ = 0 . σ = 0 . N o r m a li ze d D i s tr i bu t i o n ϕ . . . . . . . . . µ = 0 . σ = 0 . N o r m a li ze dd i s tr i bu t i o n ϕ FIG. 3. Distribution of the estimations ϕ ∆ for an ensemble of 1500GOE matrices of dimension d = 2500 with ϕ e = 0 . (left) and ϕ e = 0 . (right). Notice that the axis are different between theﬁgures. The solid red line is the best ﬁt to a Gaussian distributionwith the parameters shown in the ﬁgure. B. Ensembles of GOE spectra of dimension d = 200 In Table II we show the results of the mean µ and standarddeviation σ of the estimations of ϕ for each statistic in thisensemble, that is, the same as in Table I for d = 2500 . Bycomparing both we can observe the same general trends: themean values µ ± σ are compatible with the values of ϕ e for thecorresponding ensemble, though the precision ( σ/µ ) is loweras the fraction of observed levels decrease. Here as well thereare no signiﬁcant differences between the estimations with ∆ and with P δk dealing with accuracy. There are still differencesdealing with precision and better results for the estimationswith P δk are observed in the case of smaller ϕ although thedifferences are smaller than in the case of d = 2500 . Anotherdifference of Table II with respect to Table I is the generalincrease in the standard deviations, as expected when the di-mension decreases. For a graphical comparison we show inFigs. 5 and 6 the distribution of estimations ϕ P and ϕ ∆ for . . . . µ = 0 . σ = 0 . N o r m a li ze dd i s tr i bu t i o n ϕ . . . . µ = 0 . σ = 0 . N o r m a li ze dd i s tr i bu t i o n ϕ FIG. 4. Distribution of the estimation of ϕ P for the same ensemblesas the previous ﬁgure. The solid red line is the best ﬁt to a Gaussiandistribution with the parameters shown in the ﬁgure. The x-axis arethe same as in Fig. 3 but not the y-axis.TABLE II. Results of the average and standard deviation of the ﬁtsfor ϕ ( µ and σ ) for a GOE ensemble of matrices of dimension d = 200 . The ﬁts using P δk were obtained with intervals in the k axis while the ﬁts using ∆ were obtained with the results up to amaximum value of L of d/ . . P δk ∆ ϕ µ σ µ σ the ensemble with ϕ e = 0 . and ϕ e = 0 . , that is, the same asin Figs. 3 and 4 for d = 2500 . It can be seen as the distribu-tions are more extended in this case and the shape differs morefrom Gaussian, specially in the case of the ﬁtting with ∆ .

369 0 0 . . . . µ = 0 . σ = 0 . N o r m a li ze d D i s tr i bu t i o n ϕ . . . .

54 0 0 . . . . µ = 0 . σ = 0 . N o r m a li ze dd i s tr i bu t i o n ϕ FIG. 5. Distribution of the estimations ϕ ∆ for an ensemble of 1500GOE matrices of dimension d = 200 with ϕ e = 0 . (left) and ϕ e =0 . (right). Notice that the axis are different between the ﬁgures.The solid red line is the best ﬁt to a Gaussian distribution with theparameters shown in the ﬁgure. Thus, in view of Table II, the main conclusion of the com-parison between the estimations ϕ P and ϕ ∆ changes for this

39 0 0 . . . . µ = 0 . σ = 0 . N o r m a li ze dd i s tr i bu t i o n ϕ . . . . .

55 0 0 . . . . µ = 0 . σ = 0 . N o r m a li ze dd i s tr i bu t i o n ϕ FIG. 6. Distribution of the estimation of ϕ P for the same ensemblesas the previous ﬁgure. The solid red line is the best ﬁt to a Gaussiandistribution with the parameters shown in the ﬁgure. The x-axis arethe same as in Fig. 5 but not the y-axis. lower dimension: accuracy and precision are similar when us-ing ∆ or P δk for larger values of ϕ although P δk is still muchbetter for smaller values. In Appendix B we also show resultsfor an intermediate value d = 1000 conﬁrming this trend.Now, in the next section we would like to give some guid-ance on how to proceed to obtain the best range estimation of ϕ for a given single spectrum. C. A single spectrum: How to proceed

First, a completion of the results presented in the previoussections for a high and a low dimension is in order. We haveperform analyses of ensembles of several dimensions and wefound no signiﬁcant differences in accuracy between P δk and ∆ but we found differences in precision. In Fig. 7 we showthe evolution of the standard deviation of the ensemble distri-butions of ϕ values for P δk and ∆ for ϕ e = 0 . , . and 0.9.It can be seen that, except for the lowest dimensions, the pre-cision of the ﬁt with P δk is better and the difference is bigger asthe fraction of observed levels decrease. We remind that goodprecision means that the probability to obtain a value of ϕ forthe single spectrum of study (a single member of the ensem-ble) belongs to a narrow interval around the average value.Now, when performing a ﬁt with any of the two statistics,we can give the ﬁrst conservative range estimation by check-ing to which of the ensembles our ﬁtted value of ϕ could be-long in the tables I ( d = 2500 ), V ( d = 1000 ) and II ( d = 200 )for a dimension similar to that of the spectrum of study (ormaking our own by a GOE Monte Carlo calculation like wedescribed earlier). But the best range estimation of ϕ shouldbe obtained by calculating the ensembles of spectra of the ex-act dimension for numerically obtaining the full joint distri-bution of the ﬁtted ϕ and the actual ϕ e . From that numericalcalculation we should obtain the conditional probability dis-tribution given our result for the ﬁtted ϕ , and the best rangeestimation from the standard deviation of this distribution.So, although both statistics give meaningful results, we rec-ommend to rely on P δk specially for higher dimensions. More-over, we also believe calculating P δk is more ”friendly” to thenon-practitioner. Although, when we use P δk , we have to takeinto account that we have to choose the optimal number ofintervals n int to perform the average in intervals of k , spe-cially when ensemble averaging is not possible because of thesmall size of the spectrum. As explained in the Appendix A,we only have to use a simple rule to obtain safe results, that is,performing the average dividing in n int intervals the range [ k ] ,with n int = 0 . ∗ d . . Though the optimal n int in principledepends both on the dimension and the value of ϕ , a furtherreﬁnement of n int taking this into account would not producesuch signiﬁcant improvement to make worth the calculationof new ensembles, as the choice of n int is robust against am-ple variations, as explained in the Appendix. The formula for n int can be safely used when ϕ is completely unknown, andwhen we have some hint about its value then the exponent canbe chosen nearer to 0.6 for lower values of ϕ and nearer to 0.7for higher values of ϕ . . . . . . . σ d ϕ = 0 . ϕ = 0 . ϕ = 0 . FIG. 7. Value of the standard deviation of the ensemble distributionsof the estimations of ϕ versus the dimension of the spectrum. Circlescorrespond to ϕ e = 0 . , squares to ϕ e = 0 . and diamonds to ϕ e = 0 . . Empty symbols correspond to the estimations with ∆ and ﬁlled symbols to the estimations with P δk IV. CONCLUSIONS

In order to use RMT for estimating properties of chaoticexperimental spectra it is imperative to understand the preci-sion and accuracy of the statistics and the ﬁtting procedure ofchoice as a function of the size of the spectra and other prop-erties of the system. We have done a systematic analysis ofthe methods for estimating the fraction of observed levels ϕ in chaotic spectra based on the analysis of long-range spectralcorrelations. In particular, we have studied the methods usingthe power spectrum of the δ n statistic P δk and the ∆ of Dysonand Mehta. To analyze the methods we have prepared ensem-bles of GOE spectra taken out randomly a deﬁnite number oflevels so ϕ is known. Both statistics give reasonable results, the accuracy of theestimations being similar and quite good even for relativelyshort sequences with many missing levels. Accuracy and pre-cision are better, as expected, for larger sequences and largervalues of ϕ . Our method of choice, however, would be to use P δk as its precision is better, being the dispersion of the resultssmaller, specially for larger sequences. We remind that betterprecision for the result of the ensemble (lower standard devi-ations from the correct ϕ ) implies less error in the estimationof the value of ϕ from the ﬁt to a single spectrum of interest,as the value of ϕ for any member of the ensemble would havemore probability to lie in a narrow range of values around thecorrect one.Moreover, in our opinion P δk is more friendly and easier tointerpret to scientists without experience in statistical analysisof spectra. It is based on a simple Fourier transform instead ofbeing a particular deﬁnition in this ﬁeld like the more complex ∆ . The caveat of using P δk is that, when ensemble averagesare not possible because of the small size of the spectrum,some extra averaging is needed for an unbiased estimation, butwe have found a very simple rule which guarantees the safestresults, that is, performing the average inside n int intervals ofthe range [ k ] , with n int given by formula A2.We have concentrated on the case where missing levels aremissing randomly, for example, if the physical reason behindthe missed observation of the levels is related to the nodal dis-tribution of the wave functions. Similar Monte Carlo calcu-lations can be used for making estimations of the number ofmissing levels when there are some correlations between theunobserved levels, for example, if the levels may be missedbecause near levels are not resolved experimentally as recentwork have shown [24].We hope this work will contribute to the systematization ofthe use of long-range correlations for estimating the numberof missing levels in experimental chaotic spectra, for quantumor wave systems. Without this type of analysis the statisticalsigniﬁcance of any estimation based on the ﬁt to a theoreticalformula is unknown. Appendix A: Method of calculation of P δk The calculation of P δk is straightforward, being a discreteFourier transform of the function δ n deﬁned by Eq. (7). Asseen in Fig. 2 when calculating P δk averaged over an ensembleone obtains sets of points which present a smooth behavior.When calculating P δk for a single spectrum the result is not sosmooth, so there is the need to smoothen it in some way to beable to see the trend more clearly, specially in cases like theone we approach in this work when we have to ﬁt the formulato a set of points in order to estimate the fraction of observedlevels ϕ .When the spectrum of interest is reasonably large it is usualto divide it in several sequences and then perform averages ex-actly as for an ensemble of spectra. When this is not possiblethere is still another usual method which consists in dividingthe x axis, that is, the range of frequencies k , into several inter-vals with a ﬁxed length and represent only one point with the TABLE III. Ensemble of 1500 GOE matrices of dimension d = 200 .Comparison of results for µ and σ with intervals averaged on log k (the best result in this case) and intervals averaged on k . P δk average on log k P δk average on kϕ µ σ µ σ mean value for each interval. This can be done before or aftertaking the logarithm of the frequency, that is, we can dividethe range [ k ] or the range [log k ] . In this Appendix we presenta comparison of both methods and, in view of the analysis andconclusions, we have selected the optimal method which weuse for the results presented in the paper.When possible, the optimal average to smooth the resultsis a two-fold one, that is, divide the spectrum in several se-quences and perform a ﬁrst average and then divide the re-sult in several intervals and perform the second average asdescribed. Here we analyze only the second method as thespectra of interest are typically small and we use spectra ofdimension d = 200 . In Table III we show for comparison theﬁtting results for averaging in the range [ k ] and in the range [log k ] , choosing the optimal value for the number of intervalsin both cases. It can clearly be seen accuracy an precision aremuch better when averaging in the range [ k ] . We have alsoseen these trends in the rest of the cases of different dimen-sions analyzed, that is, an overestimation of the value of ϕ and an increase in the values of σ when averaging in the range [log k ] . Only for ϕ e = 0 . the results are better for aver-aging in the range of [log k ] , although in this case the resultsare actually very far from Gaussians and the ﬁt in many casesresults in ϕ = 1 .We have also analyzed the ﬁtting results depending on thenumber of intervals n int and the dimension. For each dimen-sion and each value of ϕ e of the ensembles generated we haveperform the calculations to obtain the distributions of valuesof ϕ using different numbers of intervals and calculating thesystematic error | µ − ϕ e | in each case. We call the optimal n int the one which minimizes this error. We have seen thatusing a large number of intervals or a small number of inter-vals produces systematic deviations in the ﬁtting of ϕ as theweight of the large and low k sections in the least squares pro-cedures varies. Except for ϕ < . there is a wide optimalnumber of intervals which is plotted in Fig. 8 for different di-mensions. The shaded area represents the number of intervalswhere there is less than 0.05 of systematic error in the averageestimation of ϕ . In Fig. 8 it can also be seen that the depen-dence of n int on the value of ϕ is neither very critical, being . . . . . . . n i n t ϕd = 2500 d = 1000 d = 200 FIG. 8. Optimal number of intervals for ﬁtting ϕ using P δk versus ϕ for different values of the dimension. The transparent shaded areasencompass all the values of n int for which the systematic error in theaverage estimation of ϕ is less than 0.05. That of d = 2500 reachesvalues above 400 for the highest values of ϕ , but the y axis has beenset up to 150 to see the three cases more clearly. also a wide optimal range of n int common to all the ϕ values.For example, for d = 200 we have chosen n int = 18 as theoptimal number of intervals common to all the values of ϕ .But in order to better see this dependence we have also repre-sented n int versus the dimension for three different values of ϕ in Fig. 9. We have found a simple functional form whichdescribes the dependence very well: n int = a ( ϕ ) ∗ d b ( ϕ ) (A1)where the parameters a and b in principle depend on the valueof ϕ , but again in this ﬁgure we can clearly see that this de-pendence is not very critical as the choice is robust againstample variations: inside the shaded area the error in the aver-age estimation of ϕ is less than 0.05. Thus, if ϕ is completelyunknown one could safely choose the expression: n int = 0 . ∗ d . (A2)represented by the black solid line in the ﬁgure. It can be seenthat we would have obtained good results with this choice if ϕ would have been completely unknown as the black solid lineis inside all the shaded areas. The ﬁtting parameters for thethree values of ϕ are shown in Table IV. When we have somehint about the estimation of ϕ then we would choose a valuefor the parameter b nearer to 0.6 for lower ϕ or nearer to 0.7for higher ϕ .To summarize this systematic study on the average to beperformed to obtain the best estimation of ϕ from the ﬁt to n i n t dϕ = 0 . ϕ = 0 . ϕ = 0 . . d . FIG. 9. Optimal number of intervals for ﬁtting ϕ using P δk versus thedimension for ϕ = 0 . , . and 0.9. The black solid line correspondsto the expression A2 and the dashed curves correspond to the ﬁt toequation A1. The transparent shaded areas encompass all the valuesof n int for which the systematic error in the average estimation of ϕ is less than 0.05. For ϕ = 0 . it reaches values above 300 for thehighest values of d , but the y axis limit has been set up to 100 to seethe three cases more clearly. ϕ a b a and b in Eq. A1 for several valuesof ϕ P δk , we found essentially two rules: 1) Divide in a number ofintervals the range [ k ] (and not the range [log k ] ) 2) Choosethe number of intervals n int according to formula A2. Appendix B: Spectra of dimension d = 1000 We show in this section a table with the results for inter-mediate dimension d = 1000 , Table V, in agreement with thetrend explained in the main text. TABLE V. Results of the average and standard deviation of the ﬁtsfor ϕ ( µ and σ ) for a GOE ensemble of matrices of dimension d = 1000 . The ﬁts using P δk were obtained with intervals in the k axis while the ﬁts using ∆ were obtained with the results up to amaximum value of L of d/ . . P δk ∆ ϕ µ σ µ σ ACKNOWLEDGMENTS

We acknowledge ﬁnancial support from Projects No.PGC2018-094180-B-I00 (MCIU/AEI/FEDER, EU) andNo. RTI2018-098868-B-I00 (MCIU/AEI/FEDER, EU) andCAM/FEDER Project No. S2018/TCS-4342 (QUITEMAD-CM). This research has been also supported by CSICResearch Platform on Quantum Technologies PTI-001. [1] E. Wigner, Characteristic vectors of bordered matrices with in-ﬁnite dimensions. Ann. of Math. , 548 (1955).[2] E. Wigner, Characteristic vectors of bordered matrices of inﬁ-nite dimensions II. Ann. of Math. , 203 (1957).[3] E. Wigner, On the distribution of the roots of certain symmetricmatrices. Ann. of Math. , 325 (1958).[4] E. Wigner, Random Matrices in Physics. SIAM Reviews , 1(1967).[5] M.L. Mehta, On the statistical properties of the level-spacingsin nuclear spectra. Nuclear Phys. , 395 (1960).[6] M. Gaudin, Sur la loi limite de l’espacement des valeurs propresd’une matrice aleatoire. Nuclear Phys. , 447 (1961).[7] F.J. Dyson, Statistical Theory of the energy levels of complexsystems, I, II, & III. J. Math. Phys. , 140 (1962).[8] F.J. Dyson, The threefold way. Algebraic structure of symmetrygroups and ensembles in quantum mechanics. J. Math. Phys. ,1199 (1962). [9] F.J. Dyson and M.L. Mehta, Statistical theory of the energy lev-els of complex systems. IV. J. Math. Phys. , 701 (1963).[10] M.L. Mehta and F.J. Dyson, Statistical theory of the energy lev-els of complex systems. V. J. Math. Phys. , 713 (1963).[11] J. Flores, J.B. French, P.A. Mello, A. Pandey, and S.S.M. Wong,Random Matrix Physics: Spectrum and Strength Fluctuations.Rev. Mod. Phys. , 3 (1981).[12] T. Guhr, A. M¨uller-Groeling, and H.A. Weidenm¨uller, Ran-dom Matrix Theories in Quantum Physics: Common Concepts.Phys.Rep. , 189 (1998).[13] R. U. Haq, A. Pandey, and O. Bohigas, Fluctuation Propertiesof Nuclear Energy Levels: Do Theory and Experiment Agree?Phys. Rev. Lett. , 1086.[14] O. Bohigas, M.J. Giannoni, and C. Schmit, Characterization ofChaotic Quantum Spectra and Universality of Level FluctuationLaws. Phys. Rev. Lett. , 1 (1984).[15] S. M¨uller, S. Heusler, P. Braun, F. Haake, A. Altland, Semiclas-sical Foundation of Universality in Quantum Chaos. Phys. Rev. Lett. , 014103 (2004).[16] M.V. Berry and M. Tabor, Level Clustering in the Regular Spec-tra. Proc. R. Soc. A , 375 (1977).[17] V. Oganesyan and D. A. Huse, Localization of interactingfermions at high temperature, Phys. Rev. B , 155111 (2007).[18] A. Rela˜no, J. M. G. G´omez, R. A. Molina, and J. Retamosa,Quantum Chaos and /f Noise. Phys. Rev. Lett. , 244102(2002).[19] D. Mulhall, Maximum likelihood method to correct for missedlevels based on the ∆ ( L ) statistic. Phys. Rev. C , 054321(2011).[20] O. Bohigas and M.P. Pato, Missing levels in correlated spectra.Phys. Lett. B , 171 (2004).[21] R.A. Molina, J. Retamosa, L. Mu˜noz, A. Rela˜no, and E. Faleiro,Power spectrum of nuclear spectra with missing levels andmixed symmetries. Phys. Lett. B , 25 (2007).[22] M.L. Mehta, Random matrices , (Academic Press, Amsterdam,2014).[23] P. Naubereit, D. Studer, A.V. Viatkina, A. Buchleitner, B. Dietz,V.V. Flambaum, and K. Wendt, Intrinsic quantum chaos andspectral ﬂuctuations within the protactinium atom, Phys. Rev.A , 022506 (2018).[24] M. Ławnizak, M. Białous, V. Yunko, S. Bauch, and L. Sirko,Missing-level statistics and analysis of the power spectrum oflevel ﬂuctuations of three-dimensional chaotic microwave cav-ities. Phys. Rev. E , 012206 (2018).[25] J. Mur-Petit and R. A. Molina, Spectral statistics of molecularresonances in erbium isotopes: How chaotic are they? Phys.Rev. E, , 042906 (2015).[26] J. M. G. G´omez, R. A. Molina, A. Rela˜no, and J. Retamosa,Misleading signatures of quantum chaos. Phys. Rev. E ,036209 (2002).[27] O. Bohigas and M. J. Giannoni, Level density ﬂuctuations andrandom matrix theory. Ann. Phys. (N.Y.) , 393 (1975).[28] E. Faleiro, J. M. G. G´omez, R. A. Molina, L. Mu˜noz, A. Rela˜no,and J. Retamosa, Theoretical Derivation of /f Noise in Quan-tum Chaos. Phys. Rev. Lett. , 244101 (2004).[29] A. Rela˜no, R.A. Molina, and J. Retamosa, /f noise in the two-body random ensemble. Phys. Rev. E , 017201 (2004). [30] M. S. Santhanam and J. N. Bandyopadhyay, Spectral Fluctu-ations and /f Noise in the Order-Chaos Transition Regime.Phys. Rev. Lett. , 114101 (2005).[31] C. Male, G. Le Caer, and R. Delannay, /f α noise in the ﬂuc-tuations of the spectra of tridiagonal random matrices from the β -Hermite ensemble. Phys. Rev. E , 042101 (2007).[32] J. M. G. G´omez, A. Rela˜no, J. Retamosa, E. Faleiro, L. Salas-nick, M. Vraniˇcar, and M. Robnik, /f α Noise in SpectralFluctuations of Quantum Systems. Phys. Rev. Lett. , 084101(2005).[33] M. Białous, B. Dietz, and L. Sirko, Experimental investigationof the elastic enhancement factor in a microwave cavity emu-lating a chaotic scattering system with varying openness. Phys.Rev. E , 012210 (2019).[34] B. Dietz, V. Yunko, M. Białous, S. Bauch, M. Ławnizak, andL. Sirko, Nonuniversality in the spectral properties of time-reversal-invariant microwave networks and quantum graphs.Phys. Rev. E , 052202 (2017).[35] M. Białous, V. Yunko, S. Bauch, M. Ławnizak, B. Dietz, and L.Sirko, Power Spectrum Analysis and Missing Level Statisticsof Microwave Graphs with Violated Time Reversal Invariance.Phys. Rev. Lett. , 144101 (2016).[36] R. Riser, V.A. Ozipov, and E. Kanzipier, Power Spectrumof Long Eigenlevel Sequences in Quantum Chaotic Systems.Phys. Rev. Lett. , 204101 (2017).[37] R. Riser, V.A. Ozipov, and E. Kanzipier, Nonperturbative the-ory of power spectrum in complex systems. Ann. Phys. ,168065 (2020).[38] A. Corps and A. Rela˜no, Stringent test on power spectrum ofquantum integrable and chaotic systems, arXiv:1908.09285.[39] P. D. Georgopulos and H. S. Camarda, Use of the Dyson-Mehta ∆ statistic as a test of missing levels. Phys. Rev. C , 420(1981).[40] D. Biswas, S. Sinha, and S. V. Lawande, Spurious spectral ﬂuc-tuations due to missing levels. Phys. Rev. A46