Multi-messenger Astronomy: a Bayesian approach
G. Torralba Elipe*, R. A. Vazquez and E. Zas
Departamento de Física de Partículas & Instituto Galego de Física de Altas Enerxías, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Spain
E-mail: [email protected], [email protected] and [email protected]

After the discovery of gravitational waves and the observation of neutrinos of cosmic origin, we have entered a new and exciting era where cosmic rays, neutrinos, photons and gravitational waves will be used simultaneously to study the highest energy phenomena in the Universe. Here we present a fully Bayesian approach to the challenge of combining and comparing the wealth of measurements from existing and upcoming experimental facilities. We discuss the procedure from a theoretical point of view and, using simulations, we also demonstrate the feasibility of the method by incorporating the use of information provided by different theoretical models and different experimental measurements.

*Speaker.

© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). http://pos.sissa.it/
1. Introduction
We are entering a new era for astroparticle physics. Many experiments are operating simultaneously, measuring different observables in a wide range of energies: the Pierre Auger Observatory [1], Telescope Array [2], HAWC [3], IceCube [4], MAGIC [5], ANTARES [8]. New experiments such as the Cherenkov Telescope Array [7] or KM3NeT [9] will be developed, and we are going to have an unprecedented number of events to perform analyses that could answer the questions about high-energy cosmic rays, neutrinos and photons posed more than a century ago: What are the cosmic rays? Where are they coming from? How are they accelerated? There is no doubt that to answer these questions the experiments must share their results, and that the answer will arrive by combining all the measurements.

In this work we present a brief review of Bayesian inference in Sec. 2, explaining parameter estimation and hypothesis testing. The practice of these methods is shown using toy simulations in Sec. 3. First we consider that two experiments analyse different data without taking into account the results of the other experiment. After that, we consider that the experiments share their results and modify the prior information in their analyses. Finally, the conclusions are reported in Sec. 4.
2. Review of Bayesian statistical inference
The well known Bayes' theorem is a consequence of the law of conditional probability,
$$P(A|B,I) = \frac{P(A,B|I)}{P(B|I)}\,, \qquad (2.1)$$
and the law of total probability,
$$P(B|I) = \sum_{i=1}^{n} P(B|C_i,I)\,P(C_i|I)\,. \qquad (2.2)$$
Here $A$ and $B$ are two events of the sample space $S$ (the space of all possible outcomes of an experiment), the set $\{C_i\}_{i=1}^{n}$ is a partition of $S$, and $I$ is any prior information that we have before the analysis (see Sec. 2.1). Equation 2.1 can be rewritten as
$$P(A,B|I) = P(A|B,I)\,P(B|I)\,, \qquad (2.3)$$
which is understood as: assuming the information $I$ (which includes the prescription of probabilities), the probability of the events $A$ and $B$ is the product of the probability of $A$ given $B$ (the probability of $A$ if $B$ occurs) and the probability that $B$ occurs. On the other hand, Eq. 2.2 is read as: assuming $I$, the probability of $B$ is given by the sum over all possibilities of obtaining $B$. Notice that since $\{C_i\}_{i=1}^{n}$ is a partition of $S$, $B$ can be decomposed as the disjoint union $B = \bigcup_{i=1}^{n} (B \cap C_i)$, which is what makes Eq. 2.2 hold. Finally, Bayes' theorem is expressed as
$$P(A_j|B,I) = \frac{P(B|A_j,I)\,P(A_j|I)}{P(B|I)} = \frac{P(B|A_j,I)\,P(A_j|I)}{\sum_{i=1}^{n} P(B|A_i,I)\,P(A_i|I)}\,, \qquad (2.4)$$
understood as: the probability of the event $A_j$ given $B$ and assuming $I$ is the product of the probability of obtaining $B$ given $A_j$ and the probability of obtaining $A_j$, normalised to all possibilities of obtaining $B$.
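As a minimal numerical illustration of Eqs. 2.2 and 2.4, the following Python sketch updates two hypothetical hypotheses H1 and H2; all numbers are invented for illustration and are not taken from the analysis below.

```python
# Minimal illustration of Bayes' theorem (Eq. 2.4) for two toy hypotheses.
priors = {"H1": 0.5, "H2": 0.5}       # P(A_j | I): equal prior probabilities
likelihoods = {"H1": 0.8, "H2": 0.3}  # P(B | A_j, I): probability of the data B

# P(B | I): law of total probability (Eq. 2.2) over the partition {H1, H2}
evidence = sum(likelihoods[h] * priors[h] for h in priors)

# P(A_j | B, I): posterior probability of each hypothesis (Eq. 2.4)
posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}
print(posteriors)  # {'H1': 0.727..., 'H2': 0.272...}
```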
2.1 Parameter estimation

Let $D = \{x_i\}_{i=1}^{n}$ be $n$ realizations of a random variable $X$, i.e., $n$ results of experiments consisting in measuring the variable $X$. Let $\theta$ be a parameter of interest. Notice that there are no restrictions on the dimensions of $X$ and $\theta$. Bayesian inference consists of allocating probabilities to the possible values of $\theta$ according to the observed data set $D$ by solving the equation
$$\pi(\theta|D,I) = \frac{f(D|\theta,I)\,\pi(\theta|I)}{f(D|I)} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}\,, \qquad (2.5)$$
which is expressed in terms of probability density functions. We now describe each term appearing in Eq. 2.5.

Likelihood function: $f(D|\theta,I)$

The likelihood function $f(D|\theta,I)$ is the conditional probability distribution of $D$ given the unknown parameter $\theta$, and it is usually denoted as $L(\theta|D)$. This function describes how the data set $D$ is distributed assuming a given value of $\theta$. The likelihood function expresses all the information obtainable from the data, satisfying the likelihood principle: all the information about $\theta$ that can be obtained from an experiment is contained in the likelihood function for $\theta$ given $X$. Two likelihood functions for $\theta$ (from the same or different experiments) contain the same information about $\theta$ if they are proportional to one another, see [10] and [11]. In [11] it is also shown that the likelihood principle follows from the assumption of two other principles: the principle of sufficiency and the principle of conditionality. These principles can be described informally as asserting the "irrelevance of observations independent of a sufficient statistic" (sufficiency) and the "irrelevance of experiments not actually performed" (conditionality).

The prior: $\pi(\theta|I)$

It describes all the information that we have about the parameter of interest before performing the experiment. A prior distribution can be built using information about past experiments, using theoretical knowledge, or expressing our total ignorance about the problem. When we do not have information about the parameter of interest, one should follow the Laplace criterion, paraphrased as: "in the absence of any further information (prior information) all possible results should be considered equally probable". This kind of prior is the so called "flat prior".
The posterior: $\pi(\theta|D,I)$

This function describes our knowledge about the parameter $\theta$ after the analysis of the experimental results. One can then read Eq. 2.5 as an update of the prior knowledge of $\theta$, described by the prior, through the experiment, described by the likelihood. For each event $x_i \in D$ of the data set, our knowledge about $\theta$ changes. Once the posterior distribution is known there are two standard estimators for the true value of $\theta$: the mean of the posterior and the mode (the so called Maximum A Posteriori, MAP).
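The grid-based sketch below illustrates Eq. 2.5 for an assumed toy problem, a unit-width Gaussian of unknown mean with a flat prior; the model, ranges and seed are our assumptions, not the paper's. It returns the two standard estimators just mentioned, the posterior mean and the MAP, plus a 90% central interval anticipating Eq. 2.7 below.

```python
import numpy as np

# Grid evaluation of Eq. 2.5 for a toy model: unit-width Gaussian, unknown mean.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.2, scale=1.0, size=50)   # D = {x_i}, true theta = 1.2

theta = np.linspace(-5.0, 5.0, 2001)             # grid of possible theta values
logL = np.array([np.sum(-0.5 * (data - t) ** 2) for t in theta])  # log L(theta|D)
prior = np.ones_like(theta) / 10.0               # flat prior density on [-5, 5]

post = np.exp(logL - logL.max()) * prior         # unnormalised posterior (Eq. 2.5)
post /= np.trapz(post, theta)                    # divide by the evidence

mean_est = np.trapz(theta * post, theta)         # posterior mean estimator
map_est = theta[np.argmax(post)]                 # maximum a posteriori (MAP)

cdf = np.cumsum(post); cdf /= cdf[-1]            # posterior CDF on the grid
ci90 = (theta[np.searchsorted(cdf, 0.05)],       # 90% central credible interval
        theta[np.searchsorted(cdf, 0.95)])
print(mean_est, map_est, ci90)
```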
The evidence: $f(D|I)$
Also denoted as $Z$, it acts as a normalization constant in the parameter inference but plays an important role in the Bayesian model selection explained in Sec. 2.3. The evidence is given by
$$Z = \int f(D|\theta,I)\,\pi(\theta|I)\,d\theta\,. \qquad (2.6)$$

2.2 Confidence intervals

The confidence intervals or credible sets (here denoted as C.I.) are easy to calculate in the Bayesian approach. Once the posterior distribution is known, we want to find between which values $[\theta_{\rm low}, \theta_{\rm up}]$ the actual value of the parameter has been estimated. Usually this question is answered with an associated probability $q$, typically 0.68, 0.9 or 0.95. The limits of the range are given by solving the equation
$$q = P(\theta_{\rm low} \le \theta \le \theta_{\rm up}) = \int_{\theta_{\rm low}}^{\theta_{\rm up}} \pi(\theta|D,I)\,d\theta\,. \qquad (2.7)$$
When the inferred value of $\theta$ is equal or close to one of the limits of the possible values of $\theta$, one talks about upper or lower limits depending on whether $\theta \approx \theta_{\rm min}$ or $\theta \approx \theta_{\rm max}$.

2.3 Bayesian model selection

Consider now two hypotheses $I_1$ and $I_2$ that we want to contrast, and suppose we perform an experiment which gives us the data set $D = \{x_i\}_{i=1}^{n}$. We consider that the likelihood functions are different for the different hypotheses: for $I_1$ we have $L(\theta|D) = f(D|\theta)$ and for $I_2$ we have $L(\omega|D) = f(D|\omega)$, where $\theta$ and $\omega$ could in principle have different dimensions ($\theta$ could be, for instance, the shape of an exponential distribution and $\omega$ could be the mean and the variance of a normal distribution). The posterior distributions are given by
$$\pi(\theta|D,I_1) = \frac{L(\theta|D)\,\pi(\theta|I_1)}{Z_1} \qquad (2.8)$$
for the first hypothesis and
$$\pi(\omega|D,I_2) = \frac{L(\omega|D)\,\pi(\omega|I_2)}{Z_2} \qquad (2.9)$$
for the second hypothesis. $Z_1$ and $Z_2$ are the normalization factors of their respective equations:
$$Z_1 = \int f(D|\theta)\,\pi(\theta|I_1)\,d\theta = P(D|I_1)\,, \qquad (2.10)$$
which gives the probability of the data set $D$ given the hypothesis $I_1$ (once $P(D|I_k)$ has been normalized to all the hypotheses). In the same way, $Z_2$ is the probability of $D$ given the hypothesis $I_2$. The evidences therefore have a statistical meaning. Since we can calculate $P(D|I_1)$ and $P(D|I_2)$, we can also calculate $P(I_1|D)$ and $P(I_2|D)$ using Bayes' theorem, obtaining the probability of a given hypothesis given the data set $D$, independently of the parameters $\theta$ and $\omega$:
$$P(I_m|D) = \frac{P(D|I_m)\,P(I_m)}{P(D)} = \frac{Z_m\,P(I_m)}{\sum_{l=1}^{M} Z_l\,P(I_l)}\,, \qquad (2.11)$$
where here $M = 2$ and $m = 1, 2$; Eq. 2.11 is written in its generalized form for $M$ possible hypotheses.
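The following sketch illustrates Eqs. 2.10 and 2.11 for an assumed pair of hypotheses, a Gaussian of unknown mean versus a zero-mean Gaussian of unknown width; the data, priors and parameter ranges are illustrative assumptions. The evidences are computed by numerical integration and converted into hypothesis probabilities.

```python
import numpy as np

# Toy model comparison: I1 = unknown mean, unit width; I2 = zero mean, unknown width.
rng = np.random.default_rng(1)
data = rng.normal(loc=0.8, scale=1.0, size=40)

def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

# Z1 = \int f(D|theta) pi(theta|I1) dtheta  (Eq. 2.10), computed in log space
theta = np.linspace(-3.0, 3.0, 1201)
logL1 = np.array([log_gauss(data, t, 1.0).sum() for t in theta])
s1 = logL1.max()
Z1 = np.trapz(np.exp(logL1 - s1), theta) * (1.0 / 6.0) * np.exp(s1)  # flat prior 1/6

omega = np.linspace(0.1, 3.0, 1161)
logL2 = np.array([log_gauss(data, 0.0, w).sum() for w in omega])
s2 = logL2.max()
Z2 = np.trapz(np.exp(logL2 - s2), omega) * (1.0 / 2.9) * np.exp(s2)  # flat prior 1/2.9

# P(I_m|D) with equal prior probabilities P(I1) = P(I2) = 0.5  (Eq. 2.11)
p1 = Z1 * 0.5 / (Z1 * 0.5 + Z2 * 0.5)
print(p1, 1.0 - p1)
print(Z1 / Z2)  # ratio of evidences: the Bayes factor discussed just below
```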
Once more, the prior probabilities $P(I_1)$ and $P(I_2)$ must be chosen before the analysis. In this way we obtain a probability mass function in which the variables are the different hypotheses. To compare which of the hypotheses is preferred by the data, the ratio between the posterior probabilities is computed:
$$\frac{P(I_1|D)}{P(I_2|D)} = \frac{Z_1}{Z_2}\,\frac{P(I_1)}{P(I_2)}\,. \qquad (2.12)$$
This ratio is called the "posterior odds" and the ratio $P(I_1)/P(I_2)$ is called the "prior odds". The ratio of the evidences, $Z_1/Z_2$, is called the Bayes factor of the hypothesis $I_1$ over $I_2$ ($B_{1,2}$) and represents the gain in probability of $I_1$ over $I_2$ after the data analysis:
$$\text{posterior odds}(I_1, I_2) = B_{1,2} \times \text{prior odds}(I_1, I_2)\,. \qquad (2.13)$$

2.4 Predictive distributions

Suppose that an observer wants to prepare an experiment to infer a certain parameter $\theta$ which can take values in the space $\Theta$ with prior probabilities $\pi(\theta|I)$. The distribution of the random variable $X$ is given by the likelihood function $f(x|\theta,I)$. The data distribution before the experiment is
$$f(\tilde{x}|I) = \int_{\Theta} f(\tilde{x}|\theta,I)\,\pi(\theta|I)\,d\theta\,, \qquad (2.14)$$
where $\tilde{x}$ denotes unobserved data. $f(\tilde{x}|I)$ is called the prior predictive distribution. After the experiment has been built and the data $D$ analysed, the knowledge about $\theta$ has changed: $\pi(\theta|I) \to \pi(\theta|D,I)$. The expected data distribution has also changed:
$$f(\tilde{x}|I) \to f(\tilde{x}|D,I) = \int_{\Theta} f(\tilde{x}|\theta,I)\,\pi(\theta|D,I)\,d\theta\,, \qquad (2.15)$$
where $f(\tilde{x}|D,I)$ is called the posterior predictive distribution. This distribution can be compared with the observed data distribution to get a feeling of how well the estimation of $\theta$ fits the measured data, or used to make predictions for future experiments.
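A sketch of the prior and posterior predictive distributions of Eqs. 2.14 and 2.15, again for the assumed Gaussian-mean toy model of the previous sketches (all numbers are illustrative assumptions):

```python
import numpy as np

# Predictive distributions for the toy Gaussian-mean model.
rng = np.random.default_rng(2)
data = rng.normal(loc=1.2, scale=1.0, size=50)

theta = np.linspace(-5.0, 5.0, 2001)
prior = np.full_like(theta, 1.0 / 10.0)               # flat prior on [-5, 5]
logL = np.array([np.sum(-0.5 * (data - t) ** 2) for t in theta])
post = np.exp(logL - logL.max()) * prior
post /= np.trapz(post, theta)                         # posterior pi(theta|D, I)

def gauss(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

x_new = np.linspace(-6.0, 6.0, 601)                   # unobserved data ~x
dens = gauss(x_new[:, None], theta[None, :])          # f(~x|theta) on a grid

# f(~x|I)   = \int f(~x|theta) pi(theta|I) dtheta      (Eq. 2.14)
prior_pred = np.trapz(dens * prior[None, :], theta, axis=1)
# f(~x|D,I) = \int f(~x|theta) pi(theta|D,I) dtheta    (Eq. 2.15)
post_pred = np.trapz(dens * post[None, :], theta, axis=1)
```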
3. Simulations
Let $E_X$ and $E_Y$ be two experiments measuring different observables $X$ and $Y$. The experiments are interested in measuring the fraction of a certain distribution (signal) present in their data. As an example, $X$ can be the proton fraction of cosmic rays at ultra-high energies while $Y$ can be the astrophysical photon or neutrino fraction at energies in the PeV region. Let $M_1$ and $M_2$ be two models predicting different signals in both $X$ and $Y$, and predicting different relations between the signals, as illustrated in Fig. 1. In our example, each model predicts a relation of the form $\alpha_y \propto \alpha_x(1-\alpha_x)$ with a different model-dependent coefficient, shown in the right panel of Fig. 1.
Figure 1: Signal and background distributions (continuous and dashed lines) predicted by $M_1$ (blue) and $M_2$ (orange) for the two experiments: $E_X$ in the left panel and $E_Y$ in the center. The fraction of the signal in $E_Y$ as a function of the signal in $E_X$ is shown in the right panel for the two models.

The signal and background are normal distributions (denoted by $g_s$ and $g_{bg}$ respectively) for the two models, with model-dependent means and widths; for $M_2$, for instance, $g_s(x)$ and $g_{bg}(x)$ have means $\mu = 0.27$ and $\mu = 0.18$, while $g_s(y)$ and $g_{bg}(y)$ have means $\mu = 0.37$ and $\mu = 0.17$. The data have been simulated according to $M_1$, with true fractions $\alpha_x^{\rm true}$ and $\alpha_y^{\rm true}$ indicated in Figs. 2 and 3; $E_X$ measures 300 events and $E_Y$ measures 200 events. For these simulations the $y$ sample has $\langle y \rangle = 0.27$ and $\sigma_y = 0.07$.
The likelihood function for the model $M_i$ ($i = 1, 2$) and variable $z$ ($z = x, y$) is given by
$$f(z|\alpha_z, M_i) = \alpha_z\,g_s^i(z) + (1 - \alpha_z)\,g_{bg}^i(z)\,. \qquad (3.2)$$
Now we perform two analyses: one where each experiment analyses the data without any external information (Sec. 3.1), and another where the experiments use the information obtained from the other (Sec. 3.2).

3.1 Independent analyses

In this approach the experiments have no prior information, but they are interested in the fraction of the signal, so the fraction of signal plus the fraction of background must be one. For this reason, each experiment chooses a uniform distribution between 0 and 1 as its prior.
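The analysis of this section can be sketched as follows. The signal and background means follow the $M_2$ values quoted above, while the widths, true fraction and seed are our assumptions.

```python
import numpy as np
from scipy.stats import norm

# Toy version of Sec. 3.1: simulate the mixture of Eq. 3.2 and infer the signal
# fraction alpha with a flat prior on [0, 1].
rng = np.random.default_rng(3)
g_s, g_bg = norm(0.27, 0.05), norm(0.18, 0.05)       # assumed widths (0.05)
alpha_true, n_events = 0.3, 300                      # assumed true fraction

is_signal = rng.random(n_events) < alpha_true        # draw the mixture components
data = np.where(is_signal,
                g_s.rvs(n_events, random_state=rng),
                g_bg.rvs(n_events, random_state=rng))

alpha = np.linspace(0.0, 1.0, 1001)                  # grid for the fraction
# log-likelihood: sum_i log[alpha*g_s(x_i) + (1-alpha)*g_bg(x_i)]  (Eq. 3.2)
logL = np.array([np.log(a * g_s.pdf(data) + (1 - a) * g_bg.pdf(data)).sum()
                 for a in alpha])
post = np.exp(logL - logL.max())                     # flat prior: posterior ∝ L
post /= np.trapz(post, alpha)

print(np.trapz(alpha * post, alpha))                 # posterior mean <alpha>
```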
Figure 2: Posterior probability distributions of $E_X$ (a) and $E_Y$ (b) for the fraction of the interesting signal, assuming the different models: $M_1$ (blue) and $M_2$ (orange). The panels also indicate the flat prior, the true fractions, the 90% C.I. and the evidence ratios $Z_1/Z_2 = 9.83$ (a) and $Z_1/Z_2 = 1.89$ (b).

In Fig. 2 the posterior probability distributions of the signals for each experiment, under the assumption of the different models, are displayed. Numerically, $E_X$ obtains $\langle \alpha_x \rangle = 0.33$ and a 90% C.I. assuming $M_1$; the posterior mean and C.I. assuming $M_2$ are shown in Fig. 2(a). $E_Y$ obtains $\langle \alpha_y \rangle = 0.38$ assuming $M_1$ and $\langle \alpha_y \rangle = 0.62$ assuming $M_2$, with the corresponding 90% C.I. Since each experiment assumes $P(M_1) = P(M_2) = 0.5$, $E_X$ arrives at the conclusion that $M_1$ is almost ten times more probable than $M_2$, while the resolution to discriminate between the models in $E_Y$ is smaller: for this experiment $P(M_1|D)/P(M_2|D) \approx 1.9$.

3.2 Combined analysis

When one experiment has analysed some data its prior knowledge changes, and this change can be used by the same experiment to analyse new data, or by another experiment. In this example we show how the result of each experiment is used by the other. Assuming the results of $E_X$ in the previous section, $E_Y$ can modify the prior of $\alpha_y$ for each theoretical model or scenario. In the same way, $E_X$ can do the same in light of the analysis done by $E_Y$. These new priors are shown together with the new results in Fig. 3.
Figure 3: Prior probabilities for the signals given the independent analyses in panel (a): the prior for $\alpha_x$ given the results of $E_Y$ assuming $M_1$ ($M_2$) is shown as a continuous blue (orange) line; the prior for $\alpha_y$ given the results of $E_X$ assuming $M_1$ ($M_2$) is shown as a dashed blue (orange) line. They are compared with the uniform prior taken in the independent analysis (black dashed line). The posterior distributions for each experiment are shown in panels (b) and (c) for $E_X$ and $E_Y$ respectively, together with the new evidence ratios $Z_1/Z_2 = 11.8$ and $Z_1/Z_2 = 3.17$.
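A possible implementation of this prior update is sketched below. The relation $h(\alpha_x)$ and the stand-in posterior for $\alpha_x$ are hypothetical placeholders, not the paper's actual model relations; the point is the mechanics of propagating one experiment's posterior into the other's prior.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

# Build a non-flat prior for alpha_y from E_X's posterior for alpha_x,
# propagated through an assumed model relation alpha_y = h(alpha_x).
def h(alpha_x):
    return alpha_x * (1.0 - alpha_x) / 0.6       # hypothetical relation

alpha = np.linspace(0.0, 1.0, 1001)

# Stand-in for E_X's posterior (e.g. the grid posterior of the previous sketch)
post_x = norm(0.33, 0.05).pdf(alpha)
post_x /= np.trapz(post_x, alpha)

# Monte Carlo propagation: sample alpha_x from its posterior, map to alpha_y,
# and smooth the samples into a prior density pi(alpha_y | result of E_X)
rng = np.random.default_rng(4)
samples_x = rng.choice(alpha, size=20000, p=post_x / post_x.sum())
prior_y = gaussian_kde(h(samples_x))(alpha)
prior_y /= np.trapz(prior_y, alpha)

# E_Y's new posterior is then proportional to L(alpha_y|D_Y) * prior_y; the
# same construction, run in the other direction, updates E_X's prior from E_Y.
```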
One can observe that by including the results of one experiment in the other, the results change. Now $E_X$ obtains posterior odds of $P(M_1|D)/P(M_2|D) = 11.8 \times 1.89 \approx 22$ in favour of $M_1$. When $E_Y$ analyses its data taking into account the results of $E_X$, the posterior odds also increase, being now $P(M_1|D)/P(M_2|D) = 3.17 \times 9.83 \approx 31$. Therefore both experiments have reasons to believe that the true model is $M_1$, and the joint results are $\langle \alpha_x \rangle = 0.36$ and $\langle \alpha_y \rangle = 0.31$, with the corresponding 90% C.I., $M_1$ being at least 22 times more probable than $M_2$.

Finally, the posterior predictive distributions obtained from the results of the combined analysis are shown in Fig. 4. Even though the data can be well described by the two models, the combined Bayesian analysis allows us to distinguish numerically between them.
Figure 4: Posterior predictive distributions for $E_X$ (a) and $E_Y$ (b) compared with the observed data (histograms normalised as $N_{\rm events}/(N_{\rm tot}\,\Delta b)$).
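The comparison of Fig. 4 can be sketched as follows, reusing the assumed mixture of the Sec. 3.1 sketch: the predictive density of Eq. 2.15, with the likelihood of Eq. 3.2, is averaged over the posterior of $\alpha$ and set against the binned, normalised data.

```python
import numpy as np
from scipy.stats import norm

# Posterior-predictive check for the toy mixture model (illustrative numbers).
rng = np.random.default_rng(5)
g_s, g_bg = norm(0.27, 0.05), norm(0.18, 0.05)      # assumed model densities
sig = rng.random(300) < 0.3                         # assumed true fraction 0.3
data = np.where(sig, g_s.rvs(300, random_state=rng),
                g_bg.rvs(300, random_state=rng))

alpha = np.linspace(0.0, 1.0, 1001)
logL = np.array([np.log(a * g_s.pdf(data) + (1 - a) * g_bg.pdf(data)).sum()
                 for a in alpha])
post = np.exp(logL - logL.max())
post /= np.trapz(post, alpha)                       # posterior pi(alpha|D)

# f(~x|D, M) = \int f(~x|alpha, M) pi(alpha|D, M) dalpha  (Eq. 2.15 with Eq. 3.2)
x_new = np.linspace(0.0, 0.5, 201)
mix = (alpha[None, :] * g_s.pdf(x_new)[:, None]
       + (1 - alpha[None, :]) * g_bg.pdf(x_new)[:, None])
post_pred = np.trapz(mix * post[None, :], alpha, axis=1)

# Binned data, normalised as N_events / (N_tot * Delta_b), for the comparison
counts, edges = np.histogram(data, bins=25, density=True)
```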
4. Conclusions
A Bayesian approach for the combination of different measurements and detectors has been presented and tested using simulations. With these methods, the estimation of the parameters of interest and the discrimination among different theoretical models or scenarios can be improved using past or present results from different experiments. In this work we have shown how combining the information obtained with two different detectors can improve the parameter estimation, reduce the uncertainty, and distinguish between theoretical models that could explain the same data.
References

[10] J. O. Berger and R. L. Wolpert, The Likelihood Principle, IMS Lecture Notes–Monograph Series, Institute of Mathematical Statistics (1988).

[11] A. Birnbaum, On the foundations of statistical inference, Journal of the American Statistical Association (1962).