[PDF] Annual modulations from secular variations: not relaxing DAMA?

Abstract

In a recent paper [arXiv:2002.00459], Buttazzo et al. show how the annually modulated rate of the DAMA experiments can be possibly interpreted as an artefact due to the interplay between a time-dependent background and the method to account for it. In this work, we compare this hypothesis against the sinusoidal dark matter signal as proposed by the DAMA collaboration. We produce in a Bayesian approach a quantitative comparison of how much the experimental observations are in support of each hypothesis. Our conclusions are that the odds against the hypothesis of a time varying background being responsible for the annual modulation are decreased by a Bayes factor larger than 10^8 after considering the public available data of the DAMA/NaI and DAMA/LIBRA experiments. In this work we also elaborate on general aspects of the analysis procedure. Indeed, in order to optimise the background subtraction procedure, the DAMA collaboration only considers data-taking cycles with a duration of roughly one year. We argue that any data-taking cycle is informative, and we propose a procedure to include this effect, as well as the possibility to include a slowly varying component for the background.

Full PDF

PPrepared for submission to JCAP

Annual modulations from secularvariations: not relaxing DAMA?

Andrea Messina a,b

Marco Nardecchia a,b

Stefano Piacentini a,b, a Sapienza, Università di Roma, Italy b INFN, Sezione di Roma, ItalyE-mail: [email protected], [email protected],[email protected]

Abstract.

In a recent paper [arXiv:2002.00459], Buttazzo et al. show how the annuallymodulated rate of the DAMA experiments can be possibly interpreted as an artefact dueto the interplay between a time-dependent background and the method to account forit. In this work, we compare this hypothesis against the sinusoidal dark matter signal asproposed by the DAMA collaboration. We produce in a Bayesian approach a quantitativecomparison of how much the experimental observations are in support of each hypothesis.Our conclusions are that the odds against the hypothesis of a time varying backgroundbeing responsible for the annual modulation are decreased by a Bayes factor larger than after considering the public available data of the DAMA/NaI and DAMA/LIBRA ex-periments. In this work we also elaborate on general aspects of the analysis procedure.Indeed, in order to optimise the background subtraction procedure, the DAMA collabora-tion only considers data-taking cycles with a duration of roughly one year. We argue thatany data-taking cycle is informative, and we propose a procedure to include this eﬀect, aswell as the possibility to include a slowly varying component for the background. Keywords: dark matter detectors, dark matter experiments

ArXiv ePrint: Corresponding author. a r X i v : . [ h e p - e x ] M a r ontents JAGS and rjags

JAGS

MCMC simulation 20

B.1 Prior choices 21B.2 Posterior results 23

The results of the DAMA/NaI and DAMA/LIBRA experiments are interpreted by theDAMA collaboration as a strong evidence of the presence of Dark Matter (DM) particlesin the galactic halo [1–6]. This conclusions derive from the compatibility of the annuallymodulated rate with a sinusoidal signal, characterized by a phase and a period in agreementwith those expected from the interaction of DM particles.In a recent paper [7] it has been discussed the possibility that, due to the analysisprocedure followed by the DAMA collaboration, the observed annual modulation could bereproduced by a slowly varying time-dependent background. This possibility has been usedto interpret the modulation of residuals of the single-hit scintillation rate as a function oftime published by the DAMA collaboration [2, 4, 6].The argument proposed in ref. [7] goes as follows: since the residual rate is computedby subtracting the average total rate in every cycle of data-taking, if the background rate isnot constant over time this procedure can generate a time modulated rate. More precisely,in the hypothesis of the actual presence of a DM signal, the total rate r ( t ) is given by: r ( t ) = r ( t ) + A cos (cid:18) πtT − φ (cid:19) , (1.1)where r ( t ) is the time-dependent background, A is the amplitude of the oscillating DMsignal, T is the period of the oscillation, which is equal to 1 year, and φ is a phase such thatthe peak of the oscillation is around the nd June, as expected in DM galactic halo models.– 1 –f r is constant over time, then one can choose a time window of width ∆ equal to anymultiple of T and average the rate in that window to isolate the background contribution.Under such an hypothesis, the residuals obtained by subtracting window by window thisaverage are: S ( t ) ≡ r ( t ) − (cid:104) r ( t ) (cid:105) ∆ = A cos (cid:18) πtT − φ (cid:19) (1.2)and then the signal is isolated from the background. This is what has been done by theDAMA collaboration, with the time windows ∆ chosen to be of the order of 1 year.On the other hand, as it has been pointed out, if the background is slowly varyingover time, for example linearly, the choice of using time windows of the order of the pe-riod of the wanted signal can produce residuals with a modulation of the same period.The possibility of a time-varying background is supported by the fact that in undergrounddetectors, especially in the keV energy range, the features of the background are not com-pletely modeled. In particular in ref. [7] explanations due to out-of-equilibrium physical orinstrumental eﬀects are considered. The authors of ref. [7] conclude that the available datacould not safely exclude the extreme possibility that the modulation in the residual rate isproduced by a slowly varying background only. Such a conclusion is based on the debat-able argument that the DAMA residuals could be ﬁtted by an linearly growing backgroundwith a χ / d.o.f (cid:39) .In ref. [8] a Bayesian comparison between the cosine and the null hypothesis onthe DAMA (2-6) keVee energy window dataset has been performed, leading to a decisiveevidence against the null hypothesis. The aim of this paper is to perform a thoroughcomparison of the two hypotheses of a cosine modulation and a slowly varying background,and give a quantitative conclusion on their performance by studying the likelihood ratio,the model complexity, and the posterior probabilities. We highlight that the odds in favorof the cosine modulation hypothesis are decreased by a Bayes factor of the order of with respect to pure linearly varying background hypothesis, when considering the (2-6) keVee energy window of all the three experimental phases of DAMA. For the sake ofcompleteness, this conclusion stands out also considering only the likelihood ratio of thehypotheses. However, we second the proposal of ref. [7] to encourage the experimentalcollaboration to add a slowly varying background component in the ﬁtting procedure. Anovel observation we point out in this work is that there is no conceptual obstruction touse data-taking cycles with generic time intervals (i.e. with a duration diﬀerent than 1year).This paper is organized as follows: in Section 2 we brieﬂy describe the experimentaldata considered in our study and we motivate the possibility to use generic time intervals.In Section 3 we explain the details of our analysis, performed using the Bayesian approach.In Section 4 we show our results and in particular we give a quantitative comparisonbetween the various models in terms of Bayesian factors, likelihood ratios and Ockham’sfactors. Finally, conclusions are given in Section 5. The DAMA and DAMA/LIBRA experiments use ultra-radiopure NaI(TI) scintillatingcrystals as active target, coupled with photomultipliers to measure the amount of de-posited energy. The single-hit scintillation events rate is used to look for a possible signalof DM interactions with matter over the large backgrounds due to natural radioactivity ofthe detector and surrounding environment. The event rate, expressed in in cpd/kg/keVee(where ’ee’ stands for electron equivalent), has only been published [2, 4, 6] after the sub-traction of its time-average over each cycle (of roughly one-year duration) following the– 2 –rocedure outlined in the previous section. Each cycle starts every year roughly aroundthe beginning of September.The residual rates are available in diﬀerent energy windows: for the DAMA/NaI andthe DAMA/LIBRA Phase I experiments, the energy windows are (2-4), (2-5), and (2-6)keVee; for the DAMA/LIBRA Phase II the energy windows are (1-3), (1-6), and (2-6)keVee. The (2-6) keVee energy window is the one with the smallest uncertainties, and it iscommon to the three phases. For this reason, to give a quantitative and fair comparison ofthe diﬀerent hypotheses in the three stages we decided to study only the residuals in the(2-6) KeVee energy window.In principle, our analysis could be easily extended to other energy bins, although webelieve our conclusions would not change signiﬁcantly.

The algorithm used by the DAMA collaboration to extract the time-dependent residualrate from the data has a two-fold objective: account for any constant or slowly varyingcomponent while keeping the sinusoidal time structure of an hypothetical DM signal. Theserequirements put constraints on the optimal time intervals of the data-taking cycle.The variance of the sinusoidal signal is given by the quantity ( α − β ) , with the timeaverages α = (cid:104) cos π/T ( t − t ) (cid:105) and β = (cid:104) cos 2 π/T ( t − t ) (cid:105) . For ∆ = T , the quantity ( α − β ) is . (i.e. β = 0 ). In ref. [6] the value of 0.5 is taken as a ﬁgure of merit to asseswhether the detectors were operational evenly throughout the period of the modulation ofone year. To the best of our understanding a value of ( α − β ) much diﬀerent than 0.5implies that either the average of the signal is not vanishing ( β (cid:54) = 0 ), or the data-takingperiod is not close to one year, or both.The DAMA collaboration used the argument that ( α − β ) (cid:54) = 0 . to remove the ﬁrstcycle of data-taking in table 1 of ref. [6] from the the analysis. t i ( days ) Δ i ( da ys ) β ( t i , Δ i ) - - - - Figure 1 . Left : Contour plot of the function β given in eq. (2.1); the points represent thedata-taking cycles given in table 1 of ref. [6]. In particular, the point number 1 corresponds to theﬁrst and discarded data-taking cycle. Right : Fit of a sinusoidal model (red) to the DAMA/LI-BRA Phase II residual rate in the (2-6) keVee energy window obtained taking into account thesubtraction-bias eﬀect. The dotted red line is the corresponding sinusoidal modulation without thesubtraction. The grey are is the region discarded as explained in ref. [6]: since, assuming the sinu-soidal signal, the eﬀect is perfectly quantiﬁable, in principle also this region could be used for theresiduals analysis. – 3 –he procedure to extract the residual rate might induce a subtraction of the signal ifthe time interval is not chosen carefully. In particular for a data-taking starting in t i andextending to t i + ∆ i , the average of the signal is a function of t i and ∆ i : β ( t i , ∆ i ) = 1∆ i (cid:90) t i +∆ i t i cos (cid:18) πT ( t (cid:48) − t ) (cid:19) dt (cid:48) . (2.1)The contour plot of β ( t i , ∆ i ) as a function of t i and ∆ i is given in ﬁg. 1. Independentlyof t i , if ∆ i is equal to T , as expected, β ( t i , ∆ i = T ) = 0 . This is the approximate condi-tion used by the DAMA collaboration to include a data-taking cycles in the modulationanalysis. However, as it is shown in ﬁg. 1, there are other combination of t i and ∆ i thatresult in β ( t i , ∆ i ) = 0 ; of particular relevance there is the one with t i and t i + ∆ i chosensymmetrically with respect to the time when the cosine is zero. Figure 1 also reports thevalues of β ( t i , ∆ i ) for the 7 data-taking cycles reported in ref. [6]; we note that for theﬁrst and the second cycle the relative contribution of the signal average is of the order of10%. In particular, we computed the eﬀect of such a subtraction for the ﬁrst and discardedcycle and represented it graphically in the plot on the right side of ﬁg. 1. The plot showsthe expected behaviour in the discarded region (shaded area): the dotted line correspondsto a sinusoidal model with no subtraction (as generally used), and the continuous line isthe corrected sinusoidal model. The diﬀerence between the dotted and continuous linescorresponds to the value of β in the two considered cycles. Although this eﬀect is generallysmall in the cycles selected by the DAMA collaboration, we suggest to account for it in theﬁtting procedure and use the entire dataset without being forced to drop any data-takingcycle, even if completely asymmetrical. It would be suﬃcient to compute the integral ineq. (2.1) and include it in eq. (1.2) potentially as a function of A , T , and t .Finally, we stress that if the background is not constant as a function of time it willhave an impact on the value of the average of the rate in the relative data-taking cycle.In particular the linear term of expansion of the background as a function of time willproduce a sawtooth-shape in the residual rate. The possibility to associate a probability value to any uncertain quantity is at the basis ofthe Bayesian approach to data analysis. This may apply to the outcome of a measurementbefore it is carried out, to a parameter of interest within a given model, as well as tothe model itself. Models and outcomes of measurements are then related by the rules ofprobability theory. We can compute, at least in principle, the probability of any speciﬁcpropositions given some state of information. This is particularly useful in the context ofmodel comparison: given two or more models, we can compute the probability that theactual experimental observations are in support of each of them, and then rank the modelsaccording to decreasing probability.In the following, we summarize how models can be compared in the Bayesian ap-proach, while in Appendix A we give all the details. Let’s suppose to have a dataset D = { x i } composed of n measurements x i , a set of models or hypotheses H i where eachhypothesis could in general depend on a vector of parameters (cid:126)θ i . Within a model, weassume to know the prior probability π ( H i ) , the likelihood function L ( H i , (cid:126)θ i ; D ) , and theprior probability density function (pdf) of all the model’s parameters π ( (cid:126)θ i | H i ) .The posterior odds ratio is given as: p ( H A | D ) p ( H B | D ) = BF AB × π ( H A ) π ( H B ) , with BF AB = L ( H A ; D ) L ( H B ; D ) ; (3.1)– 4 –here L ( H A ; D ) is the so called marginal likelihood deﬁned as: L ( H A ; D ) = (cid:90) L ( (cid:126)θ A , H A ; D ) π ( (cid:126)θ A | H A ) d(cid:126)θ A . (3.2)The Bayes factor BF AB encapsulates all the new information associated to the measure-ments, although, for parametric models, it could critically depend on the choice of thepriors π ( (cid:126)θ i | H i ) . In any real situation, the integral of eq. (3.2) is not solvable analyticallyand can only be estimated numerically. Nevertheless, in the case where the informationcontent of the measurements is such that the priors can be considered suﬃciently vaguein the range of the parameters space where the likelihood is sizable, we can consider theprior as a constant and use the Laplace approximation to estimate the integral. This is avery useful approximation to get sense of the diﬀerent contributions to the Bayes factorbecause it can be factorised as: BF AB = LR AB × OF AB . (3.3)where we deﬁned ( LR AB ) as the ratio of the likelihoods evaluated at their maximum, and OF AB , the so called Ockham’s factor, estimated by working out the integral in eq. (3.2)as shown in more details in Appendix A. In this factorization LR AB depends solely onthe measurements, while OF AB also on the priors. The Ockham’s factor is a measure ofthe complexity of the two models that depends on the number and the a-priori variabilityof the model’s parameters. The message behind this factorization is that even though amodel with more parameters can be more ﬂexible and thus better ﬁt the data producing anhigher likelihood, there is a price to pay for having a more complex model. In other words:even if the likelihood ratio pushes towards a more complex model which often better ﬁtsthe data, on the other hand the Ockham’s factor penalises it.Finally, two other useful criteria to quantify the ability of a model to describe theobservations are the Bayesian Information Criterion (BIC) [9] and the Akaike InformationCriterion (AIC) [10], a review of which is given for example in ref.[11]. Given a dataset ofsize n , and a model A , with a number k A of parameters (cid:126)θ A whose value that maximise thelikelihood is ˆ (cid:126)θ A , the BIC is deﬁned as: BIC A = k A ln ( n ) − (cid:16) L (ˆ (cid:126)θ A , H A ; D ) (cid:17) ; (3.4)and the AIC is deﬁned as: AIC A = 2 k A − (cid:16) L (ˆ (cid:126)θ A , H A ; D ) (cid:17) ; (3.5)According to these criteria, the smaller the value of the BIC (AIC) is, the better thedescription of the observations.When comparing two models, the ∆ BIC AB and ∆ AIC AB can be deﬁned as: ∆ BIC AB = BIC A − BIC B = ln ( n ) ( k A − k B ) − LR AB ) . (3.6) ∆ AIC AB = AIC A − AIC B = 2 ( k A − k B ) − LR AB ) . (3.7) Note that for normal likelihoods the LR AB can be expressed in terms of χ A ( B ) = (cid:80) i (cid:20) ( x i − µ A ( B ) ( x i )) σ i (cid:21) as: LR AB = exp (cid:104) − χ A − χ B (cid:105) , where µ A ( B ) ( x i ) represents the true value of the measurement x i in the model A ( B ) . This means that the LR AB of the two models evaluated at their best ﬁt values is simply given by exp [ − ∆ ˆ χ AB / . In a frequentistic approach to model comparison, the ∆ ˆ χ AB is often taken as test-statisticand its probability distribution used to compute p-values. For nested models, where the more general modelcontains the simpler plus k additional parameters, the Wilks’ theorem says that pdf (∆ ˆ χ AB ) is itself a χ distribution with k degrees of freedom. For non-nested models one would have to estimate pdf (∆ ˆ χ AB ) ,possibly by sampling it with pseudo-experiments. – 5 –he previous equations show how the ∆ BIC AB and ∆ AIC AB are closely related to the LR AB , plus a penalty term that penalises the model which is more complex.For the interpretation of the odds ratio or Bayes factor we refer to the criterion basedon Jeﬀreys scale [12]: a value of > represents strong evidence in favor of model A, anda value of > represents decisive evidence. Similar evaluation can be done based on the ∆ BIC and the ∆ AIC . We decided to consider, in our analysis, three diﬀerent models: • COS model: this is the pure-cosine model, where the only free parameter is theamplitude A of the modulation: S COS ( t ) = A cos (cid:18) πT ( t − t ) (cid:19) , (3.8)where the period T is assumed to be ﬁxed to 1 year and t = 152 . d such that thepeak of the modulation is on the nd June. • SAW model: this is the pure-sawtooth model, where the only free parameter is theslope B of the linearly varying sawtooth: S SAW ( t ) = B ( t − t i ) with t i − ∆ i < t < t i + ∆ i , (3.9)where ∆ i is the width of the time window to which t belongs, and t i is the center ofthis time window. • C+S model: this is the cosine plus sawtooth model, where this time the free param-eters to be considered are two, A and B : S C + S ( t ) = S COS ( t ) + S SAW ( t ) , (3.10)where S COS and S SAW are deﬁned in the eq. (3.8) and (3.9), respectively.What we would like to stress is that the models have been chosen in such a way thatthe COS and the SAW model have the same number of parameter, in order to minimisethe impact of the Ockham’s factor (and thus of the priors of the parameters) in the ﬁnalBayes factor. Indeed, we expect that, if the C+S model doesn’t give a strong improvementin the likelihoods with respect to the other two models, the Ockham’s factor will drive theBayes factor in favour of the simpler models.

JAGS and rjags

The computation of the posterior probabilities, even for models relatively simple as thosedescribed in the previous section, is often only possible by Monte Carlo integration. Themost common way to solve problems of this kind is by sampling the unnormalised poste-rior distribution by a Markov Chain Monte Carlo (MCMC). For our study we used thegeneral analysis framework R [14] and the MCMC algorithm called Gibbs Sampler as im-plemented in

JAGS [15] and interfaced with R in the package rjags [16]. The details of theimplementation are given in Appendix B.We assume, as it is done by the DAMA collaboration [1], that the measurements atdiﬀerent times are independent and follow a normal likelihood in each time bin t i with– 6 –nown standard deviation σ i given by the experimental uncertainty. The total likelihoodis then given by: L ( { µ i } , { σ i } ; { D i } ) = n (cid:89) i = i √ πσ i exp (cid:20) − ( y i − µ i ) σ i (cid:21) , (3.11)where the true parameters { µ i } are given as a function of { t i } by implementing one of thethree models outlined above, for example for the COS model: µ i = A cos (cid:18) πT ( t i − t ) (cid:19) (3.12)with the only unknown parameter represented by the amplitude A . For each unknownparameter in the model we have to give a prior distribution probability. In the followingwe used ﬂat priors for all parameters.The implementation in JAGS of the model outlined above and the discussion on theprior’s choice is described in Appendix B. The Monte Carlo simulation gives the unnor-malised posterior probability of the parameters of interest sampled using the Gibbs algo-rithm. The results reported in this work are obtained with a single Markov chain with · steps. Finally, to estimate the marginal likelihood and the Bayes factor we used the bridgesampling package [17]. This package uses the same Markov chain used to samplethe posterior probability for the integration of the marginal likelihood. In the following, toget a feeling of the contribution of the priors we also report the value of the maximum ofthe log-likelihood (best ﬁt χ ); when comparing two models, we computed analytically thelikelihood ratio, and the Ockham’s factor as described in Appendix A using eq. (A.10). We used a toy simulation to qualify the performance of the model selection criteria outlinedabove on the speciﬁc problem discussed in this work. We generated a few cycles of pseudo-data with the same time structure of those of DAMA, and residual rates sampled fromindependent normal pdfs with true values given by a mixture of COS and SAW modelsand known σ . We choose a value of A for the COS model, and B for the SAW modelsimilar to the amplitude of the DAMA/LIBRA phase II residuals, and a σ comparablewith their uncertainties. We ﬁtted these pseudo-datasets with the three hypotheses: COS,SAW and C+S, and checked the sensitivity of the model comparison criteria to indicatethe correct model used in the generation. The source code for the simulated analysis ispublicly available on GitHub [13].Figure 2 shows the pseudo-datasets with the underlying true model on the left-handside and the best ﬁt for each of the three models on the right-hand side. Table 1 reportsfor the three pseudo-datasets the values of the model comparison criteria. We can see thatin all cases the underlying model is correctly selected for each of the criterion of modelcomparison. The largest scores occur when we compare COS and SAW models, but sinceC+S is an extension of both COS and SAW, in some cases that include C+S we may havea relatively small BF (of the order of 10), mostly due to OF contribution, namely thediﬀerence in complexity between the two compared models.

As already said at the beginning of the Section 3, we decided to analyse the residualrate in the (2-6) keVee energy window. Since the precision of the measurements and thesize of the modulation are diﬀerent in the various phases, as a ﬁrst step we analysed the– 7 –OS generated data BF [dB] LR [dB] ∆ BIC ∆ AICCOS vs SAW . . − . − . COS vs C+S . − . − . − . SAW vs C+S − . − . . . C+S generated data BF [dB] LR [dB] ∆ BIC ∆ AICCOS vs SAW . . − . − . COS vs C+S − . − . .

30 7 . SAW vs C+S − . − . . . SAW generated data BF [dB] LR [dB] ∆ BIC ∆ AICCOS vs SAW − . − . . . COS vs C+S − . − . . . SAW vs C+S . . − . − . Table 1 . Comparison between the various models in the three pseudo-datasets in terms of Bayesfactor (BF), likelihood ratio (LR), and diﬀerence of Bayesian Information Criterion ( ∆ BIC) andAkaike Information Criterion ( ∆ AIC).

DAMA phase BF C,S [dB] LR C,S [dB] OF C,S [dB] ∆ BIC ∆ AICDAMA/NaI − . − . . .

34 8 . LIBRA Phase I . . . − . − . LIBRA Phase II . . . − . − . All . . . − . − . Table 2 . Bayes factor (BF), likelihood ratio (LR), Ockham’s factor (OF), Bayesian InformationCriterion (BIC) diﬀerence and Akaike Information Criterion (AIC) diﬀerence between the COSand the SAW model for the three stages of DAMA experiments. For the ﬁrst three rows, the ∆ BICand the ∆ AIC values are equal because the two models have the same number of parameters. Thelast row refers to the comparison between COS and a SAW model with 3 diﬀerent slopes as detailedin Section 4.3. data individually on each of the three datasets, testing only the two hypotheses of pure-cosine (COS) and pure-sawtooth (SAW) contribution. We tested on the most informativedataset also the third model (C+S), and then we performed the analysis on the wholedataset. Finally, we evaluated the performance of the cosine model promoting the periodand the phase to free parameters.

In table 2 we show the results of the comparison between the COS and the SAW modelsin terms of Bayes factor, likelihood ratio, Ockham’s factor, Bayesian Information Criteriondiﬀerence and Akaike Information Criterion diﬀerence in every experimental stage. Sincesome of these quantities can eventually be very large or very small, BF, LR and OF aregiven in decibels . This table also shows how the contribution of the Ockham’s factor,which is the component of the Bayes factor critically dependent on the choice of the priorsof the parameters, is marginal. As explicitly shown in Appendix B, for each of the threedatasets we chose a ﬂat prior on B as well as on A . However, for all the three datasetswe tested diﬀerent possible priors (uniform, normal and gamma distributions), but in allcases the Ockham’s factor contribution is always under control, namely it changes of at Given a quantity x , the decibel d is deﬁned as d = 10 log ( x ) . – 8 – igure 2 . Left:

Simulated data (points) and true underlying model (red line) for the COS, C+S,and SAW models respectively.

Right: best ﬁt for the COS (red solid line), SAW (green dottedline), and C+S (blue dashed line) models respectively. the most 4-5 units in decibels with respect to our ﬁnal choice of the prior. The results forthe diﬀerent stages can be summarized as follows: • For the DAMA/NaI dataset, the Bayes factor, which is − . in normal units,disfavours the pure-cosine model in favour of the pure-sawtooth one. However this isthe less informative dataset, namely it is the one with relative uncertainties on eachpoint larger than in the other phases. The ﬁt is shown in ﬁg. 3. • For the DAMA/LIBRA Phase I dataset, the Bayes factor is of the order of . ,and this means that already at this stage, where the relative uncertainties are smallerthan in the previous case, the cosine model starts to win on the pure-sawtooth model.The ﬁt is shown in ﬁg. 4. • For the DAMA/LIBRA Phase II dataset, the Bayes factor is . . This meansthat in the region in where the data give the maximum of the information, in thesense that we explained before, the pure-cosine model is greatly preferred to thepure-sawtooth model. In ﬁg. 5 the result of the ﬁt is presented.– 9 –AMA Fit to COS model Fit to SAW modelphase A [cpd/kg/keVee] χ / dof B [cpd/kg/keVee/yr] χ / dof DAMA/NaI . ± . . /

36 0 . ± . . / LIBRA I . ± . . /

49 0 . ± . . / LIBRA II . ± . . /

51 0 . ± . . / Table 3 . Results of the ﬁt for the cosine amplitude A of the COS model and the sawtoothcoeﬃcient B of the SAW model obtained on the DAMA residuals in the (2-6) keVee energy windowduring the diﬀerent phases, together with the corresponding χ / d.o.f. The details of the three ﬁts are described in table 3, in which all the reported valuesare obtained computing the expected value and the standard deviation on the posteriorprobability of each parameter.

Figure 3 . Fit to the DAMA/NaI residual rate in the (2-6) keVee energy window. The regionsin which the blue line is constant (and equal to zero) are the regions between two non-contiguousdata-taking cycles.

Figure 4 . Fit to the DAMA/LIBRA Phase I residual rate in the (2-6) keVee energy window.The regions in which the blue line is constant (and equal to zero) are the regions between twonon-contiguous data-taking cycles.

Since the third dataset is the most informative one, as already shown in ﬁg. 5, we decided totest not only the COS and the SAW model, but also the C+S model, as deﬁned in Section– 10 – igure 5 . Fit to the DAMA/LIBRA Phase II residual rate in the (2-6) keVee energy window.The regions in which the blue line is constant (and equal to zero) are the regions between twonon-contiguous data-taking cycles. B obtained in the C+S model is consistent withzero, B = ( − . ± . cpd/kg/keVee/yr, and the value of A is consistent with thevalue of A obtained in the COS model. Indeed, looking at the ﬁg. 5, the green (S+C ﬁt)and the red (COS ﬁt) lines are very close: in fact the χ /dof of the C+S ﬁt is very similarto that of the COS ﬁt, or, in other words, the LR of the models is very close to one (itis − . = 0 . in normal units). Therefore the Bayes factor in the COS vs C+S case isdriven by the Ockham’s factor, and since the number of parameters is diﬀerent in the twomodels, the COS model wins against the C+S model. The C+S model is an extension ofthe COS model but, even if it’s more complex, it doesn’t improve the goodness of ﬁt, andthe price to pay for the greater complexity is reﬂected in the Bayes factor. Since in this casethe value of BF depends critically on the priors, we tried to use diﬀerent possible priors forthe parameters as before, but in all cases the BF favoured the COS model. On the otherhand, for the SAW and C+S models, although the Ockham’s factor pushes in the directionof the simpler SAW model, the contribution of the LR drives the BF in favour of the C+Smodel. The ﬁnal message of this analysis is that the most informative dataset availableto us can be better represented by models that contain a dominant cosine component, asthe COS and C+S models, with respect to the SAW model. In addition, the sawtoothcomponent of the C+S model, quantiﬁed by the B parameter, is compatible with zero. Since adding the sawtooth component to the cosine modulation in the most informativedataset didn’t produce any change with respect to the pure-cosine model, the next step ofour analysis was comparing the SAW and COS model on all the available data. In principlein the various phases the size of the sawtooth variation could be diﬀerent, thus we allowedto have three diﬀerent B parameters. The results of the global ﬁt is shown in ﬁg. 6 andthe details can be summarized as follows: • For the COS ﬁt: A = (0 . ± . cpd/kg/keVee χ /dof = 116 . / (4.1)– 11 –it results A [cpd/kg/keVee] B [cpd/kg/keVee/yr] χ / dof C+S ﬁt . ± . − . ± . . / Model comparison BF [dB] LR [dB] OF [dB] ∆ BIC ∆ AICCOS vs SAW . . . − . − . COS vs C+S . − . . − . − . SAW vs C+S − . − . . . . Table 4 . Top : Results of the ﬁt of the cosine amplitude A and the sawtooth coeﬃcient B of the C+S model obtained on the DAMA residuals in the (2-6) keVee energy window during theDAMA/LIBRA Phase II, together with the corresponding χ / d.o.f. Bottom : Comparison betweenthe various models in the DAMA/LIBRA Phase II dataset in terms of Bayes factor (BF), likelihoodratio (LR), Ockham’s factor (OF) and diﬀerence of Bayesian Information Criterion ( ∆ BIC) andAkaike Information Criterion ( ∆ AIC). For all these four metrics (BF, LR, ∆ BIC, ∆ AIC) theSAW model is largely disfavoured.

Figure 6 . Fit to the DAMA residual rate in all the three stages in the (2-6) keVee energy window. • For the SAW ﬁt:  B NaI = (0 . ± . cpd/kg/keVee/yr B LIBRAI = (0 . ± . cpd/kg/keVee/yr χ /dof = 145 . / B LIBRAII = (0 . ± . cpd/kg/keVee/yr (4.2)As a cross check, for the SAW model in each of the phases we obtain for the parameters B the same results of table 3. The comparison between COS and SAW model gives BF = 88 . dB ,LR = 64 . dB ,OF = 24 . dB , ∆ BIC = − . , ∆ AIC = − . . (4.3)In this case the Ockham’s factor is still quite marginal with respect to the LR, and theﬁnal Bayes factor, which is . in normal units, pushes again strongly towards the COSmodel. – 12 – .3.1 Extraction of phase and period of the cosine model Finally, we decided to ﬁt all the available data with a pure-cosine model in which the phaseand the period of the modulation are not ﬁxed, that is to say that t and T deﬁned ineq. (3.8) are now treated as parameters. In this case, choosing uniform priors for the twonew parameters, we obtain:  A = (0 . ± . cpd/kg/keVee t = (0 . ± . yr χ /dof = 112 . / T = (1 . ± . yr (4.4)These results are compatible with a period of 1 year and a t = 152 . d = 0 . yr . Thecomparison between cosine and SAW model this time gives BF = 62 . dB ,LR = 71 . dB ,OF = − . dB , ∆ BIC = − . , ∆ AIC = − . . (4.5)Therefore, again, even if now the SAW model enters the game with the same number ofparameters of the cosine model (in fact now the OF slightly favours the SAW model), theBF, which is . in normal units, still pushes strongly towards the cosine model. All our studies indicate that the time modulation of the DAMA residual rate in the (2 − keVee energy window cannot be described by an artefact due to the algorithm to subtracta slowly varying background.Nevertheless, we believe that such an algorithm could potentially have an impact onthe extraction of the parameter of interest of the signal. In particular the deﬁnition of thetime window used to average the rate can introduce the following problems in the residualrate: • a non zero contribution due to the possible presence of a signal as discussed in Sec. 2.1; • a sawtooth time modulation due to the presence of a slowly varying background; sucha modulation can enhance or reduce the amplitude of a sinusoidal signal, as well asaﬀect its phase.For this reason we suggest to use a model that takes these eﬀects into account.Let’s consider a data-taking cycle that starts at t ∗ and extends for a period of time ∆ , the true value µ i of the observed rate in the time bin t i is: µ i = A cos (cid:18) πT ( t i − t ) (cid:19) − A ∆ (cid:90) t ∗ +∆ t ∗ cos (cid:18) πT ( t (cid:48) − t ) (cid:19) dt (cid:48) + B (cid:18) t i − ∆2 (cid:19) (4.6)where A is the amplitude, T the period, and (2 π t /T ) the phase of the sinusoidal signal,while B is the slope of a linearly varying background. For a diﬀerent time dependenceof the background a linear model is however the ﬁrst order contribution in a time powerseries.For the sake of completeness, we deployed such a model to ﬁt the three phases ofthe experiment in the (2-6) keVee energy window. Like we did in Section 4.3, since in the– 13 –arious phases the size of the sawtooth variation could be diﬀerent, we allowed to havethree diﬀerent B parameters. We performed a ﬁrst ﬁt (F1) keeping both T and t ﬁxedto the values expected for a DM signal, as well as a second one (F2) allowing them tovary. To better understand the diﬀerences with respect to the COS model ﬁt, we decidedto show in the ﬁg. 7 and ﬁg. 8 the marginal posteriors pdfs of all the ﬁt parameters for F1and F2, respectively. F1 F2A [cpd/kg/keVee] . ± . . ± . B1 [cpd/kg/keVee/yr] . ± . . ± . B2 [cpd/kg/keVee/yr] . ± . . ± . B3 [cpd/kg/keVee/yr] − . ± . − . ± . Table 5 . Results of the F1 and F2 ﬁt in terms of bet-ﬁt values of the parameters.

As shown in table 5, except for B which is positive, the other two are compatible with zeroin both cases. The value of A extracted in this way is slightly smaller with respect to theprevious cases because of the expected anti-correlation with the Bi parameters. The smallcorrelation between the Bi parameters, assumed to be a-priori independent, is an inducedeﬀect of their common anti-correlation with the A parameter. For the F2 ﬁt, the valuesof t and T are consistent with those expected for a DM signal. Finally, we remark thateven though we included into the model the correction for a signal “double-counting” asdescribed in Section 2.1, this eﬀect is not relevant with the current choice of time intervalsfor the experimental cycles. – 14 – igure 7 . Marginal posterior pdfs for the free parameters of the proposed model (Sec. 4.4) at ﬁxed T and t for the ﬁt of the whole (2-6) keVee DAMA dataset. The 4 plots on the diagonal of theﬁgure are the uni-dimensional pdfs of each single free parameter obtained by marginalising on theall the others. The 10 bi-dimensional pdfs in the bottom-left corner of the ﬁgure give the marginalpdfs of each pair of parameters obtained by marginalising on the other. The plots show also thecredible regions at . , . , . probability as black, green, and blue contours respectively. Thecorrelation coeﬃcients are given in the upper-right corner of the ﬁgure. – 15 – igure 8 . Marginal posterior pdfs for the free parameters of the proposed model (Sec. 4.4)for the ﬁt of the whole (2-6) keVee DAMA dataset. The 6 plots on the diagonal of the ﬁgureare the uni-dimensional pdfs of each single free parameter obtained by marginalising on the allthe others. The 15 bi-dimensional pdfs in the bottom-left corner of the ﬁgure give the marginalpdfs of each pair of parameters obtained by marginalising on the other. The plots show also thecredible regions at . , . , . probability as black, green, and blue contours respectively. Thecorrelation coeﬃcients are given in the upper-right corner of the ﬁgure. – 16 – Conclusions

In this paper we elaborated on the proposal of ref. [7] to interpret the DAMA modulationas a possible eﬀect of a slow varying time dependent background. We performed a Bayesiananalysis of the DAMA residual rate in the (2-6) keVee energy range with the aim of givinga quantitative comparison of the two possible hypotheses: the cosine modulation claimedby the DAMA collaboration and the sawtooth model proposed in ref. [7]. The source codeused in this work has been made publicly available on

GitHub [13].For the diﬀerent models considered we estimated the posterior probabilities, andfor each pair of models we computed both the Bayes factor, factorised also in terms oflikelihood ratio and Ockham’s factor, clearly separating the contributions coming from thedata and the prior choice, the Bayesian Information Criterion and the Akaike InformationCriterion.The conclusion of this work can be summarised as follows. With an exception forthe DAMA/NaI dataset, which is however the less informative one, in all the comparisonswe performed in Section 4 the sawtooth hypothesis is always disfavoured with respect tothe cosine modulation hypothesis. This is quantiﬁable in the Bayes factor which is in allcases between and . To be more speciﬁc, in the case of using the entire dataset andcomparing the cosine with just the amplitude as a free parameter in the ﬁt against thesawtooth with three independent and free to vary slope parameters for the three phasesof data-taking, we obtain a Bayes factor of . in favour of the cosine model. Theeﬀect of the priors choice has been extensively tested and can be reasonably quantiﬁedin a contribution at the most of the order to the total BF. Finally, we used the fullcosine model with free period and phase and still found a BF of order . For this case weobtained a value of the period and phase compatible with those reported by the DAMAcollaboration.Finally, we point out that the background subtraction algorithm used by DAMA canintroduce a bias in the parameters of the ﬁtted signal, produced by a residual contributionfrom the expected signal as well as a contribution from a slowly varying background.Although the time intervals are chosen in a such a way that this bias is small, we showedthat these eﬀects can be safely taken into account in the analysis as shown in Section 4.4. Acknowledgments

We thank Dario Buttazzo, Fabio Cappella, and Paolo Panci for useful discussions.– 17 –

Bayesian model comparison

In the following we shall suppose to have a dataset D = { x i } composed of n measurements x i , a set of models or hypotheses H i where each hypothesis could in general depend on avector of parameters (cid:126)θ i . Within a model, we assume to know the likelihood function: L ( H i , (cid:126)θ i ; D ) = p ( D | H i , (cid:126)θ i , I ) , (A.1)the expression on the right-hand-side of the previous equation is the probability densityfunction (pdf) to observe some speciﬁc data D given the model H i , its parameters (cid:126)θ i , andall the relevant information I before the measurement is carried out. The likelihood, whichis numerically identical to p ( D | H i , (cid:126)θ i , I ) , has to be intended as a function of the model andits parameters at ﬁxed (observed) data. We shall assign a prior probability π ( H i | I ) to eachmodel, as well as a prior probability π ( (cid:126)θ i | H i , I ) to its parameters, both conditioned to thestate of information I . Priors are intended to encapsulate the probability that the model isvalid before data are seen. For the parameters, the priors deﬁne their range of variabilityby setting the probability to ﬁnd them in any given interval of the allowed space. In thefollowing, all probabilities are conditional to the state of information I even if, to keep thenotation simple, we often don’t write it explicitly.Suppose we want to compare two diﬀerent models H i and H j , which for the momentwe assume don’t depend on any parameter. To compare these models we can compute theratio of the posterior probabilities: O ij ≡ p ( H i | D, I ) p ( H j | D, I ) , (A.2)this ratio gives the odds in favour of one model over the other. To express the odds asa function of the likelihoods and priors, we shall apply the Bayes theorem; that is, forexample for the numerator of eq. (A.2), p ( H i | D, I ) = p ( D | H i , I ) π ( H i | I ) π ( D | I ) . (A.3)The denominator π ( D | I ) is the prior probability of data and it can be more easily inter-preted if expressed in terms of the likelihoods by imposing that (cid:80) i p ( H i | D, I ) = 1 . Todo this we need to have a complete class of hypotheses, and this is not often the case.However, if we are only interested in the odds ratio we can neglect this term as it cancelsout in the odds. The ratio between the probabilities of the two models becomes: O ij ≡ p ( D | H i , I ) p ( D | H j , I ) × π ( H i | I ) π ( H j | I ) . (A.4)The data dependent part is the so called “Bayes factor” (BF), and it is deﬁned as: BF ij ≡ p ( D | H i , I ) p ( D | H j , I ) . (A.5)In the hypothesis in which, due to our a priori ignorance, all the model’s priors are equal,considering the odds ratio is equivalent to considering the Bayes factor.The case of a model H i ( (cid:126)θ i ) dependent on a vector of parameters (cid:126)θ i is more complicatedbecause we have to compute the marginal likelihood of the model, which corresponds tothe likelihood of the model averaged over all possible values of the parameters. For this– 18 –eason, it is often called average or marginalised likelihood. The marginal likelihood iscomputed by marginalising over the model’s parameters: p ( D | H i , I ) = (cid:90) p ( D, (cid:126)θ i | H i , I ) d(cid:126)θ i = (cid:90) p ( D | H i , (cid:126)θ i , I ) p ( (cid:126)θ i | H i , I ) d(cid:126)θ i , = (cid:90) L ( (cid:126)θ i , H i ; D ) π ( (cid:126)θ i | H i ) d(cid:126)θ i , (A.6)where we applied the multiplication law of probabilities, and, in the last equality, expressedthe integrand in terms of likelihood and prior probabilities of the parameters. In any realsituation, this integral is not solvable analytically and can only be estimated numerically.Nevertheless, in the case where the information content of our measurement is such thatthe priors can be considered suﬃciently vague in the range of the parameters space wherethe likelihood is sizable, we can consider the prior as a constant and use the Laplaceapproximation to estimate the integral. This is a very useful approximation to get senseof the diﬀerent contributions to the marginal likelihood. For simplicity, let’s assume themodel depends just on a single parameter θ i , the best ﬁt to the data occurs for θ = ˆ θ (inthe following we will use ˆ x to refer to the best ﬁt value of the quantity x ), the likelihoodis reasonably normal, and the prior is ﬂat over a range ∆ θ i . Under this assumptions theintegral can be approximated as: p ( D | H i , I ) = (cid:90) L ( θ i , H i ; D ) π ( θ i | H i ) dθ i , (cid:39) θ i √ π σ ( ˆ θ i ) L ( ˆ θ i , H i ; D ) . (A.7)The last expression makes it clear that the marginal likelihood corresponds to the likelihoodevaluated at the best ﬁt value of θ i weighted by a volume factor corresponding to theparameter’s uncertainty. In this oversimpliﬁed example, the proportional factor betweenthe global and the best ﬁt likelihood is given by the ratio of two quantities, namely ∆ θ i and σ ( θ i ) , which control respectively the possible range of θ i before and after the data are seen.This factor is often referred to as the Ockham’s factor as it naturally penalises a model bya quantity proportional to the necessary complexity of the model to get a good ﬁt withrespect to the a priori model complexity; the ideal case where the model is a good one andthe a priori variability of the parameter is just enough to ﬁt the data, the Ockham’s factoris of order one. The Ockham’s factor enters in the BF through the marginal likelihood, andthus when we compare two models we automatically account for the potential diﬀerencein their complexity.In order to visualise that, let’s make a useful example. Let’s consider two model: • Model A : this models has only one parameter, let’s call it θ A , and the prior over thisparameter is a uniform distribution in the range [ θ A , θ A ] of width ∆ θ A , and thus thepdf is: p ( θ A | H A , I ) = 1∆ θ A (A.8) • Model B : this models has two parameters, θ B and θ B , that are uncorrelated andthe prior over this parameters is a uniform distribution of width ∆ θ B and ∆ θ B ,respectively: p ( θ B , θ B | H B , I ) = p ( θ B | H B , I ) p ( θ B | H B , I ) = 1∆ θ B θ B (A.9)– 19 –ssuming normal likelihoods, and using the eq. (A.7), the BF can be estimated as: BF AB (cid:39) p ( D | ˆ θ A , H A , I ) p ( D | ˆ θ B , ˆ θ B , H B , I ) × √ πσ (ˆ θ A )∆ θ A (cid:30) (cid:32) (2 π ) σ (ˆ θ B ) σ (ˆ θ B )∆ θ B ∆ θ B (cid:33) = LR AB × OF AB , (A.10)where we deﬁned the likelihood ratio ( LR AB ) and the Ockham’s factor ( OF AB ) as LR AB = p ( D | ˆ θ A , H A , I ) p ( D | ˆ θ B , ˆ θ B , H j , I ) ,OF AB = √ πσ (ˆ θ A )∆ θ A (cid:30) (cid:32) (2 π ) σ (ˆ θ B ) σ (ˆ θ B )∆ θ B ∆ θ B (cid:33) (A.11)The message behind eq. (A.10) is that even though a model with more parameterscan be more ﬂexible and thus better ﬁt the data producing an higher likelihood, one hasto pay the price of having a more complex model. In other words: even if the likelihoodratio pushes towards a more complex model which often better ﬁts the data, on the otherhand the Ockham’s factor penalises it.It is important to stress that for parametric models the Bayes factor critically dependson the choice of priors, that, in this case, is represented by the choice of the widths ∆ θ i .For models with the same number of parameters, especially if the parameters have thesame physical meaning, the Ockham’s factor should be of order one, and the BF shouldfairly be insensitive to priors’ choice.For models with a diﬀerent number of parameters one has to pay particular attentionto avoid introducing biases due to an ‘unreasonable choice of priors’. For this reason, inthe following we will start by comparing models with just one parameter and then moveto more complex and realistic scenarios exploring the sensitivity of our conclusions to theprior’s choice. B Details of the

JAGS

MCMC simulation

The full hierarchical probability model described in Section (3.2) is implemented in

JAGS .To deﬁne the model one uses the

BUGS language [18–20], which, for example for the COSmodel, looks as follows: model {for ( i in 1: n ) {y [ i ] ~ dnorm ( mu [ i ] , 1. / ( sd [ i ]^2))mu [ i ] <- A * cos (2. * pi * ( t [ i ] - t0 ) / T )} where the for loop is over the entries of the data n -tuple, dnorm() stands for the normaldistribution density function whose parameters are the true amplitudes mu[i] as given bythe functional form of the model, and the standard deviations sd[i] given as τ i = 1 /σ i .The last line deﬁnes the prior pdf for the parameter of interest A , which, in this case isgiven by a uniform distribution in the range [0 , maxA ] . This model will be ﬁt to time seriesdata described by an n -tuple { D i } = { t i , y i , σ i } of length n , containing for each time bin t i the measurement y i and the symmetric uncertainty σ i associated to it.– 20 – igure 9 . Posterior pdf (black lines) and priors pdf (red lines) for the free parameters A of theCOS model (left) and B of the SAW model (right) for the ﬁt of the DAMA/LIBRA Phase II data. Figure 10 . Likelihood of the DAMA/LIBRA Phase II data for the COS model (left) and theSAW model (right).

B.1 Prior choices

The priors used in our analysis are the following: • for all the cases in which there’s a cosine modulation the amplitude A has a uniformprior: π ( A ) = (cid:40) A if ≤ A ≤ ∆ A otherwise (B.1)with ∆ A = 0 . cpd/kg/keV ee . One could in general choose a wider prior, but withthe upper limit set by the results of the other DM experiments. • for all the cases in which there’s a sawtooth contribution, the slope B has a uniform– 21 – igure 11 . Posterior pdf (black lines) and priors pdf (red lines) for the free parameter A of theCOS model for the ﬁt of the DAMA/LIBRA Phase II data: the priors in these cases are gammadistributions with a shape α = 1 and a scale θ = 100 (left) or θ = 5000 (right). As expected, theposterior is quite diﬀerent from the ﬂat prior case of ﬁg. 9 only in the most extreme case (right). prior: π ( B ) = (cid:40) B if − ∆ B ≤ B ≤ ∆ B otherwise (B.2)with ∆ B = 0 . cpd/kg/keV ee/yr . In this case we allowed B to have negative values inorder to take into account a possible slowly decreasing sawtooth. As in the previouscase, one could in general choose a wider prior, but with the upper limit on thebackground set by the knowledge of the DAMA experimental apparatus. • for the case in which we let the period and the phase of the cosine free to vary, thetwo parameters T and t have a uniform prior: π ( T ) = (cid:40) T if ˜ T − ∆ T ≤ T ≤ ˜ T + ∆ T otherwiseπ ( t ) = (cid:40) t if ˜ t − ∆ t ≤ t ≤ ˜ t + ∆ t otherwise (B.3)where: ˜ T = 1 yr ˜ t = 152 . d = 0 . yr ∆ T = ˜ T / . yr ∆ t = ˜ t / . yr (B.4)In this case, since we would like to study the eﬀect of letting these two parametersfree to vary, we chose quite large priors around the expected values for a DM signal.Since in all cases, the likelihoods are quite narrow with respect to the priors wechoose, we were able to easily estimate the OF using the Laplace approximation as shownin Appendix A. – 22 –n all this cases these priors could be diﬀerent in principle, and we checked, by chang-ing both the width of the priors and their functional form, that the conclusions of our anal-ysis wouldn’t change with other reasonable choices. As an example, in ﬁg. 9 we show theposterior and the prior for the parameters of the COS and SAW models in the DAMA/LI-BRA Phase II experiment; for the sake of completeness, in ﬁg. 10 we show the likelihoodsof the two models in the same dataset. In ﬁg. 11, instead, we show, always as an example,how the posterior of the COS parameter in the DAMA/LIBRA Phase II dataset wouldchange with two diﬀerent choices of priors (a gamma distrubution in both cases): in onecase, the prior is signiﬁcantly diﬀerent from zero in the region where the likelihood has themaximum, and thus it does not aﬀect signiﬁcantly the posterior pdf which is comparablewith that of ﬁg. 9 (ﬁg. 11: left-hand side); in the other case, the prior pdf is so muchconcentrated in zero that the posterior changes signiﬁcantly (ﬁg. 11: right-hand side). B.2 Posterior results

The results of our ﬁts, given in terms of posterior pdfs, are showed in ﬁg. 12, 13, 14, 15, 16,17, 18. In all cases, the MCMC chains reach equilibrium after few steps showing a stabletrace. The numerical error on the results sampled with O (10 ) steps is negligible. As onecan see from all the ﬁgures listed here, all the posteriors are reasonably normal, as couldbe expected from the assumption of normal likelihoods and uniform priors. In particular: • ﬁg. 15 shows the expected anti-correlation between A and B of the C+S model and,in addition, indicates that B is consistent with zero. • ﬁg. 17 shows, as expected, that the three Bi slopes of the sawtooth for the SAWmodel are not correlated when ﬁtted on the three stages of the experiment. • ﬁg. 18 shows the posterior distributions of the parameters A , T and t of the cosinemodel when we let the period and the phase free to vary. As expected T and t arestrongly anti-correlated, while we don’t see any remarkable correlation between theother parameters.Finally, in ﬁg. 8 we show the result of a complete ﬁt on all the (2-6) keVee dataset,obtained using a C+S model with three diﬀerent slopes in the three diﬀerent stages andtreating the period and the phase of the cosine modulation as free parameters, and, inﬁg. 7, we display the result of the same ﬁt, but ﬁxing the period and the phase of thecosine modulation to their expected values ( T = 1 yr and t = 152 . d ).– 23 – igure 12 . Posterior pdfs for the free parameters of the two models for the ﬁt of the DAMA/NaIdata. On the left hand side, the parameter B for the SAW model, and, on the right hand side, theparameter A for the COS model. Figure 13 . Posterior pdfs for the free parameters of the two models for the ﬁt of the DAMA/LI-BRA I data. On the left hand side, the parameter B for the SAW model, and, on the right handside, the parameter A for the COS model. – 24 – igure 14 . Posterior pdfs for the free parameters of the two models for the ﬁt of the DAMA/LI-BRA II data. On the left hand side, the parameter B for the SAW model, and, on the right handside, the parameter A for the COS model. – 25 – igure 15 . Posterior pdfs for the free parameters of the C+S model for the ﬁt of the DAMA/LI-BRA II data. The ﬁgure shows in the lower-left corner the combined pdf reporting also the credibleregions at . , . , . probability as black, green, and blue contours respectively. The upper-leftand lower-right plots are the marginal pdfs for the parameter A and B respectively. The correlationcoeﬃcient is given in the upper-right corner of the ﬁgure. – 26 – igure 16 . Posterior pdf for the free parameter of the COS model for the ﬁt of the whole (2-6)keVee DAMA dataset. – 27 – igure 17 . Marginal posterior pdfs for the free parameters of the SAW model with three diﬀerentslopes for the ﬁt of the whole (2-6) keVee DAMA dataset. The 3 plots on the diagonal of theﬁgure are the uni-dimensional pdfs of each single free parameter obtained by marginalising on theall the others. The 3 bi-dimensional pdfs in the bottom-left corner of the ﬁgure give the marginalpdfs of each pair of parameters obtained by marginalising on the other. The plots show also thecredible regions at . , . , . probability as black, green, and blue contours respectively. Thecorrelation coeﬃcients are given in the upper-right corner of the ﬁgure. – 28 – igure 18 . Marginal posterior pdfs for the free parameters of the cosine model for the ﬁt ofthe whole (2-6) keVee DAMA dataset. The 3 plots on the diagonal of the ﬁgure are the uni-dimensional pdfs of each single free parameter obtained by marginalising on the all the others.The 3 bi-dimensional pdfs in the bottom-left corner of the ﬁgure give the marginal pdfs of eachpair of parameters obtained by marginalising on the other. The plots show also the credible re-gions at . , . , . probability as black, green, and blue contours respectively. The correlationcoeﬃcients are given in the upper-right corner of the ﬁgure. – 29 – eferences [1] R. Bernabei et al., Riv. Nuovo Cim. , 1-73 (2003), and refs. therein,[arXiv:astro-ph/0307403].[2] R. Bernabei et al., Phys. Lett. B480

D13 , 2127 (2005), [arXiv:astro-ph/0501412].[4] R. Bernabei et al., Phys. J.

C56 , 333 (2008), arXiv:0804.2741 [astro-ph].[5] R. Bernabei et al., Eur. Phys. J.

C73 , 2648 (2013), arXiv:1308.5109 [astro-ph.GA].[6] R. Bernabei et al., Nucl. Phys. At. Energy (2018) 307, arXiv:1805.10486 [hep-ex].[7] D. Buttazzo, P. Panci, N. Rossi and A. Strumia, arXiv:2002.00459 [hep-ex].[8] A. Krishak, A. Dantuluri and S. Desai, JCAP (2020) no.02, 007, arXiv:1906.05726[astro-ph.CO].[9] Akaike H., IEEE T. Automat. Contr. (1974) 716.[10] Schwarz G., Ann. Statist. (1978) 461.[11] A. R. Liddle, Mon. Not. Roy. Astron. Soc. (2007) L74, astro-ph/0701113.[12] H. Jeﬀreys, Theory of probability, 3rd ed, Oxford Classics series (reprinted 1998) (OxfordUniversity Press, Oxford, UK, 1961).[13] GitHub repository for the source code of this work https://github.com/piacent/bayes_analysis [14] R Core Team (2018), R Foundation for Statistical Computing, Vienna, Austria. , R Core Team (2018).[15] M. Plummer, Proceedings of the 3rd International Workshop on Distributed StatisticalComputing (DSC 2003), March 20-22, Vienna, Austria. ISSN 1609-395X, http://mcmc-jags.sourceforge.net/ .[16] M. Plummer, R package version 4-10, https://cran.r-project.org/web/packages/rjags/index.html .[17] Q. F. Gronau et al. , https://github.com/quentingronau/bridgesampling .[18] W.R. Gilks, A. Thomas , D.J. Spiegelhalter, The Statistician , 169-178 (1994).[19] D. Lunn, D. Spiegelhalter, A. Thomas, N. Best, Statistics in Medicine, (2009) 3049-67.[20] The BUGS Project, ..