Event Weighted Tests for Detecting Periodicity in Photon Arrival Times
EEvent Weighted Tests for Detecting Periodicity in Photon ArrivalTimes
Peter Bickel ∗ , Bas Kleijn † , & John Rice ∗ November 12, 2018
Abstract
This paper treats the problem of detecting periodicity in a sequence of photon arrival times,which occurs, for example, in attempting to detect gamma-ray pulsars. A particular focus is onhow auxiliary information, typically source intensity, background intensity, and incidence anglesand energies associated with each photon arrival should be used to maximize the detectionpower. We construct a class of likelihood-based tests, score tests, which give rise to eventweighting in a principled and natural way, and derive expressions quantifying the power of thetests. These results can be used to compare the efficacies of different weight functions, includingcuts in energy and incidence angle. The test is targeted toward a template for the periodiclightcurve, and we quantify how deviation from that template affects the power of detection.
From a sequence of photon arrival times 0 ≤ t < t < · · · t N < T , we wish to test the hypothesisthat some of the photons come from a periodic source (for example, a gamma-ray pulsar) versus thenull hypothesis that they come from a background plus a source that does not vary in time. Thebackground emission rate is assumed to be constant in time. Associated with each event is auxiliaryinformation, such as the incidence angle and the measured energy; we denote these variables by z . Ignoring this information is clearly wasteful, and in fact it would typically be used, at leastin the form of cuts in energy and incidence angle. The value of z associated with an event (anarrival) provides information about the relative likelihood that photon was from the source or thebackground, and it seems intuitively that the event should be correspondingly weighted in somemanner. (Note that cuts corresponds to weights that are zero or one.) A main thrust of this paperis to derive in a principled way how this information can best be used to enhance detection power.We derive expressions which quantify the efficiency of any weighting function and the form of theoptimal function.Unless the periodic light curve is known, there is no universally optimal test, since a test thatis most powerful against one light curve will not be most powerful against another. This statementalso applies to tests that attempt to adapt to the form of the lightcurve. Any test implicitly orexplicitly commits to a finite dimensional class of targets. Generally, the light curve of the source ∗ Department of Statistics, University of California, Berkeley † Korteweg-de Vries Institute for Mathematics, Faculty of Science, University of Amsterdam a r X i v : . [ s t a t . M E ] J un s unknown, so we consider testing against a template, a probability density function ν ( t ) on [0 , ν ( t ) = 1 + η (cid:88) n (cid:54) =0 α n e πint (1)for η ≥
0. If η = 0, the source intensity is constant in time. Defining ν τ ( t ) = ν ( t + τ ) ν τ ( t ) = 1 + η (cid:88) n (cid:54) =0 α n e πint +2 πinτ , (2)We model the arrival times as the superposition of independent background and source processes,a Poisson process with rate function λ ( t | θ, τ, µ, f ) = µc ( t )[(1 − θ ) + θν τ ( φ ( t ))] , ≤ θ ≤ c ( t ) denotes the sensitivity of the instrument at time t . Here θ is the proportion of flux fromsource; the phase function is φ ( t ) = f t , or if drift is taken into consideration, φ ( t ) = f t + ˙ f t / H : η = 0 versus the alternative K : η >
0. That is, we are concerned with a situation in whichthe presence of a source is not in doubt, but its periodicity is in question. Testing whether there isany source at all corresponds to testing H : θ = 0 against K : θ > t j , and the associated variables, z j , in a principled way, by appropriatelyweighting the arrival times. In Section 3, we show how the detection power of the test dependson the weights. Expressions derived there allow comparison of power when ideal weights are usedand using approximate weights, such as simple cuts. We will also see the price paid for mismatchbetween the template and the actual light curve and for mismatch of the specified frequency andthe actual frequency. Section 4 contains some illustrative examples. Some technical details aredeferred to an Appendix. Let f B ( z ) denote the probability density function of z for a background event and f S ( z ) the densityfunction for a source event. We base a test on the likelihood function, assuming that the z j areindependent of the arrival times: L = µ N N (cid:89) j =1 c ( t j )[(1 − θ ) f B ( z j ) + θf S ( z j ) ν τ ( φ ( t j ))] exp (cid:16) − µ (cid:90) T c ( t )[(1 − θ ) + θν τ ( φ ( t ))] dt (cid:17) (4)A score test (Lehman and Romano, 2006) of H versus K is formed by evaluating the derivative ofthe log likelihood at η = 0: S ( τ ) = n (cid:88) j =1 (cid:18) θf S ( z j )(1 − θ ) f B ( z j ) + θf S ( z j ) ( ν τ ( φ ( t j )) − (cid:19) − µθ (cid:90) T c ( t )[ ν τ ( φ ( t )) − dt (5)2f φ ( T ) (cid:29) c ( t ) varies slowly and is nonzero over a substantial fraction of [0 , T ], the secondterm is neglible. We will make this assumption throughout.Let w j = θf S ( z j )(1 − θ ) f B ( z j ) + θf S ( z j ) (6)This is the probability that photon j is from the source, given z j . For a very weak source (small θ ), an approximation to (6) gives w j ∝ f S ( z j ) /f B ( z j ). If z = ( E, ϕ ), energy and incidence angle,we can write f B ( z ) = f B ( E ) f B ( ϕ | E ) (7) f S ( z ) = f S ( E ) f S ( ϕ | E ) (8) w ( z ) = w ( E ) w ( ϕ | E ) (9)The optimal weight function is then w opt( E, ϕ ) = θf S ( E ) f S ( ϕ | E ) θf S ( E ) f S ( ϕ | E ) + (1 − θ ) f B ( E ) f B ( ϕ | E ) (10)For a weak source (small θ ), we have the approximation w opt ∝ f S ( E ) f S ( ϕ | E ) f B ( E ) f B ( ϕ | E ) (11)The function f S ( ϕ | E ) is the point spread function of incidence angle at energy E . The backgroundwould normally be assumed to be spatially uniform, from which f B ( ϕ | E ) would follow. The optimalweight function also depends on the ratio of the energy spectra of source to background, whichpotentially provides valuable information, but might be unknown in practice. In the latter caseone could use a weight function, w ( E, ϕ ) = θf S ( ϕ | E ) θf S ( ϕ | E ) + (1 − θ ) f B ( ϕ | E ) (12)or for small θ w ( E, ϕ ) = f S ( ϕ | E ) f B ( ϕ | E ) (13)The test statistic (5) depends on the data through N (cid:88) j =1 w j ( ν τ ( φ ( t j ) −
1) = N (cid:88) j =1 w j (cid:88) n (cid:54) =0 α n e πinφ ( t j )+2 πinτ (14)= (cid:88) n (cid:54) =0 α n A n e πinτ (15)where A n = (cid:80) j w j exp(2 πinφ ( t j )). To eliminate the dependence of the test statistic on the phase, τ , we use (cid:82) | S ( τ ) | dτ . By Parseval’s theorem (cid:90) (cid:12)(cid:12)(cid:12) (cid:88) n (cid:54) =0 α n A n e πinτ (cid:12)(cid:12)(cid:12) dτ = (cid:88) n (cid:54) =0 | α n | | A n | (16)3nless the source is weak, the statistic depends upon θ , which may be approximately known fromother analyses, or may be estimated by maximum likelihood under the null hypothesis. In thelatter case, the log likelihood is (cid:96) ( θ ) = N log µ + N (cid:88) j =1 log c ( t j ) + N (cid:88) j =1 log[(1 − θ ) f B ( z j ) + θf S ( z j )] − µ (cid:90) T c ( t ) dt. (17)The log likelihood depends on θ only through the third term, which can be easily maximizednumerically, if f B ( z ) and f S ( z ) are known. The final test statistic either uses the value of θ known a priori or the maximum likelihood estimate: Q T = 1 T (cid:88) n (cid:54) =0 | α n | | A n | (18)The score test is an attractive alternative to a generalized likelihood ratio test. To compute thelikelihood ratio test, the likelihood (4) would have to be maximized both under H and K , and thelatter would entail estimating the parameters θ , η and τ .Beran (1969) showed that this test, in an unweighted form, was locally most powerful invariantfor testing uniformity of a distribution on the circle. In the case | α n | = 0 , n > w j = 1, thisis Rayleigh’s test (Rayleigh, 1919). If | α n | = 1 , n ≤ m and | α n | = 0 , n > m and w j = 1, this isthe Z m test of Buccheri et al. (1983). De Jager et al. (1989) proposed the H -test, which chooses m adaptively.We now consider the distribution of Q T when there is no periodicity ( η = 0). Let β = (cid:82) w ( z ) f B ( z ) dz and ζ = (cid:82) w ( z ) f S ( z ) dz be the expected values of the weight of background andsource events and let β = (cid:82) w ( z ) f B ( z ) dz and ζ = (cid:82) w ( z ) f S ( z ) dz . The average value of a weightis E ( W ) = (1 − θ ) β + θζ and E ( W ) = (1 − θ ) β + θζ . Let µ = µT − (cid:82) T c ( t ) dt . In the Appendixwe argue that E H Q T (cid:39) [(1 − θ ) β + θζ ] µ (cid:88) n (cid:54) =0 | α n | (19) V ar H ( Q T ) (cid:39) [(1 − θ ) β + θζ ] µ (cid:88) n (cid:54) =0 | α n | (20)Also 2 | A n | / ( µ T [(1 − θ ) β + θζ ]) has approximately a chi-square distribution with two degrees offreedom. The A n are approximately independent so that Q T approximately has the distribution ofa weighted sum of independent chi-square random variables. The scaling of the chi-square randomvariables can be estimated as follows: observe that since µ T is the expected number of events in[0 , T ], (cid:80) w j (cid:39) µ T E ( W ). Thus µ T [(1 − θ ) β + θζ ] (cid:39) (cid:88) j w j (21)4 Power
We next consider properties of the test statistic Q T when there is a periodic source, i.e. η >
0. Letthe pulse shape of the source be γ ( t ) = (cid:88) n (cid:54) =0 γ n e πint (22)As an indication of the detection power of the test, we can use the signal to noise ratio. Let E H ( Q T ) and E K ( Q T ) respectively denote the expected values of the test statistic Q T when thereis and is not a periodic component, and let σ H denote the standard deviation of Q T under the nullhypothesis of no periodic component. If the phase function φ ( t ) is correctly identified (e.g. if f and ˙ f are correctly specified) we show in the Appendix that E K ( Q T ) − E H ( Q T ) σ H (cid:39) θ T µ E ( w ) (cid:80) n (cid:54) =0 | γ n | | α n | [ (cid:80) n (cid:54) =0 | α n | ] / (23)Here the efficiency of the weighting function enters as E ( w ) = ζ (1 − θ ) β + θζ = [ E ( W | source)] E ( W ) (24)This expression holds for any weight function. Since a weight function need only be defined upto a constant of proportionality, the denominator provides a normalization. The optimal weightfunction is that given by the score test, (6), in which case it follows from a short calculation that E ( w opt) = 1 θ (cid:90) θf S ( z )(1 − θ ) f B ( z ) + θf S ( z ) f S ( z ) dz (25)which is the ratio of the average probability of a source event given z to the marginal probability ofa source event. The efficacy of weighting depends in this way on the degree to which z discriminatesbetween background and source, or on how correlated it is with the optimal weight function, sinceafter some algebra, E ( w ) = [ E ( W W opt)] E ( W ) (26)where the expectations are taken with respect to the marginal density of Z , (1 − θ ) f B ( z ) + θf S ( z ).In the case of no weighting, w ( z ) = 1, E = 1.From (23), the detection threshold for a weak signal is θ of the order T − / . The expressionalso quantifies how the power depends upon the match of the template {| α n | } to the source profile {| γ n | } . Maximal power is achieved when | α n | ∝ | γ n | . So for detection of periodicity of a givensource, the best detection-statistic has the same spectrum as the source in question. Because thelatter is unknown, a template could be based on known sources (see Section 4 for an example).This result assumes that φ ( t ) is very accurately specified. In the case φ ( t ) = f t + ˙ f t /
2, andapproximate values are used, f = f +∆ /T and ˙ f = ˙ f +∆ /T , ∆ <
1, the sum in the numerator of(23) becomes (cid:80) n (cid:54) =0 | γ n | | α n | (1 − O (( n ∆) )). Thus, accurate specification is especially importantfor higher harmonics to contribute to the power. This depends on the rate of decay of γ n and onthat of α n , which for practical reasons would be zero for sufficiently large n .5 .20.30.40.50.60.7 0 0.2 0.4 0.6 0.8 1 Pulsar
VelaGemingaCrab I n t en s i t y Proportion of phase
Light Curves ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Pulsar
VelaGemingaCrab S i z e Component
Normalized Fourier Coefficients
Figure 1: Left panel: smoothed light curves, computed from single EGRET viewing periods, forthe Crab, Geminga, and Vela pulsars. Right panel: normalized coefficients | A n | for each pulsar. To illustrate the effect of the template choice, {| α n | } , we phased photon arrival times from Crab,Geminga, and Vela for single EGRET viewing periods. We calculated the corresponding coefficients, | A n | (with no weighting). For pedagogical illustration, we normalized them and regard them asthe coefficients | γ n | (22) of the sources. These coefficients are plotted in Figure 1. It is interestingthat in all cases the coefficient | γ | is largest.The template {| α n | } will be most powerful for a particular source if | α n | ∝ | γ n | , which isof course not possible in practice. To illustrate the effects of suboptimal templates, we evaluatedpercent efficiency for sequences | α n | = 1, n ≤ m and | α n | = 0, n > m , for m = 1 , , . . . ,
10 (the Z m test). (By “efficiency” we mean the percentage of the signal to noise ratio (23) that is attainedrelative to that attained by the optimal template, | α n | ∝ | γ n | .) The results are displayed inTable 1. As would be expected from Figure 1, the efficiency increases initially with m , and thendecreases. Considerable gains in power would result from using two to five harmonics, since thesignal to noise ratios increase by factors of two to three. For example, one would expect that a8.9 σ result using the first three coefficients for Crab would only be a 2.8 σ result using the Rayleightest. We also experimented with using the average of the three sources as a template, cutting offafter five and ten terms. Those results are shown in Table 2. (The first five average coefficients6re 0.35, 0.77, 0.43, 0.17, 0.26.) Very little is gained in going from five to ten non-zero coefficients,and the computational savings would be substantial, since we would need | n ∆ | < k/T would have to be oversampled by a factor of at least ten and all ten harmonics would have to becalculated. Table 1: Relative efficiencies for m = 1 , , . . . , Consider a source which emits photons at rate α and a background whose rate is ρ per unit area andsuppose that photons are collected in a disc of radius R (rather than a spherical cap, for simplicity).Then µ = πR ρ + α (27) θ = απR ρ + α (28) f B ( ϕ | E ) = 2 ϕR , ≤ ϕ ≤ R (29)Let β = 2 πρ/α , a measure of the strength of the background relative to the source. Then thedenominator of (24) is E ( W ) = απR ρ + α (cid:104) (cid:90) w ( E ) f S ( E ) (cid:90) w ( ϕ | E ) f S ( ϕ | E ) dϕdE + β (cid:90) w ( E ) f B ( E ) (cid:90) ϕw ( ϕ | E ) dϕdE (cid:105) (30)The factor α/ ( πR ρ + α ) when combined with the factor θ µ in (23) is proportional to α . Theoptimal weight function is then w opt( E, ϕ ) = f S ( E ) f S ( ϕ | E ) f S ( E ) f S ( ϕ | E ) + βϕf B ( E ) (31)7hich depends on the energy spectra through their ratio.If the psf is bivariate circular Gaussian with standard deviation σ ( E ), then ϕ , the distance tothe origin, has the probability density function f S ( ϕ | E ) = ϕσ ( E ) exp( − ϕ σ ( E ) ) (32)(This assumes that σ ( E ) (cid:28) R , otherwise the density truncated at R has to be normalized to haveunit area.) Then the optimal weight function is w opt( E, ϕ ) = f S ( E ) f S ( E ) + βσ ( E ) exp( ϕ / σ ( E )) f B ( E ) (33)If photons are not differentially weighted according to the ratio of the energy spectra, one has theweight function w ( E, ϕ ) = 11 + βσ ( E ) exp( ϕ / σ ( E )) (34)The decay of this weight function depends on the parameter ξ = βσ ( E ). If this parameter isvery large (weak source/strong background/large σ ), w ( ϕ ) ∝ exp( − ( ϕ / σ ( E ))). Numerical ex-ploration shows that there is little difference among the functions for ξ ≥
1, but if ξ = 0 . ξ = 0 .
01, the weights decay substantially more slowly. For example, if ξ = 1 a 2 σ incidence angleis given weight (relative to that of a photon that is directly on source) of about 0.1 and a 3 σ angleis given approximately zero weight. In comparison, for ξ = 0 .
01 a 2 σ angle receives weight about0.9, a 3 σ angle receives weight about 0.5 and a 5 σ angle receives weight approximately 0. We have presented a class of tests that depend on two features: a template for the form of theperiodic light curve and a function that differentially weights arrival times. We have suggested usinga template constructed as the average of those of known sources, but one could choose the templateadaptively, for example by considering the maximum of the test statistic over the light curves fromthe known sources. The power of such a test would be more difficult to analyze explicitly, as wouldbe the power of the H -test. From general theory we know that any test will not be uniformly mostpowerful, but will perform better in certain “directions” than others. Janssen (2000) shows thatin testing for uniformity any test can achieve high power for at most a finite dimensional family ofalternatives. This can be seen quite clearly in expressions we have developed to quantify the powerof the test (23).Ideally, the weight given to a photon arrival should be proportional to the probability that thephoton came from the source, given its measured energy, incidence angle, and any other availableinformation. The optimal weight function can only be approximated in practice. It depends on theratio of the energy spectra of the source and background, which may not be accurately known fora faint source. It depends, through f S ( ϕ | E ), on the source location, which may also be subject touncertainty. The efficiency of any weight function, w ( z ), has the conceptually simple form (24).The score test was derived under some assumptions that may not strictly hold in practice. Weassume that the photon arrival process is Poisson, which does not take into account instrumentdead time following the arrival of a photon. We also assume that the distribution of z does not8epend on the arrival time. This does not take into account possible dependence between energyand the phase of the source (see Fierro et al. 1998) Nonetheless, the form of the statistic Q T issuch that it is sensitive to periodic sources, even when the assumptions upon which it was deriveddo not strictly hold.The score test was derived to discriminate between a periodic source and background whichis not time varying. From the nature of the construction, it is clear that a similar test could bederived to take into account a background intensity which varies in time in a known way, perhapsfor example a known nearby pulsar. This research was sponsored by the National Science Foundation, Award Number 0507254, andby a VIGRE grant from the National Science Foundation, Award Number 0130526. We thankCharlotte Wickham and Jeremy Shen for computational assistance and Seth Digel, Patrick Nolan,and Tom Loredo for very helpful conversations.
Here we sketch arguments supporting the assertions about the distribution of the test statis-tic Q T under the null and alternative hypotheses. We assume that φ ( T ) (cid:29) c ( t )varies slowly and is nonzero over a substantial fraction of [0 , T ]. In particular we assume that (cid:12)(cid:12)(cid:12) (cid:82) T exp(2 πinφ ( t )) c ( t ) dt (cid:12)(cid:12)(cid:12) is negligible compared to (cid:82) T c ( t ) dt , which is true, for example, if c ( t ) isconstant. Let W ( t ) = (cid:80) Nj =1 w j δ ( t − t j ). Under the null, all events are background and E √ T A n = E √ T (cid:90) T e πinφ ( t ) dW ( t ) (35)= (1 − θ ) β + θζ √ T (cid:90) T e πinφ ( t ) λ ( t ) dt (36)= (1 − θ ) β + θζ √ T µ (cid:90) T e πinφ ( t ) c ( t ) dt (37) (cid:39) φ ( t ) and c ( t ). Similarly, the real andimaginary parts of T − / A n are approximately uncorrelated. To calculate E | A n | we use E [ dW ( t ) dW ( s )] = λ ( t )[(1 − θ ) β + θζ ] δ ( s − t ) dsdt + λ ( s ) λ ( t )[ θ ζ + 2 θ (1 − θ ) ζ β + (1 − θ ) β ] dsdt (39)9hen E | A n | = (cid:90) T (cid:90) T e πinφ ( t ) e − πinφ ( s ) E [ dW ( s ) dW ( t )] (40)= [(1 − θ ) β + θζ ] (cid:90) T λ ( t ) dt + [ θζ + (1 − θ ) β ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) T e πinφ ( t ) λ ( t ) dt (cid:12)(cid:12)(cid:12)(cid:12) (41)The first term is dominant. The limiting chi-squared approximation follows from a central limittheorem argument about the distribution of the linear statistic T − / A n . To evaluate E K Q T we need to calculate E [ dW ( t ) dW ( s )]. First, for s = t , the event is either withprobability θ from source or with probability (1 − θ ) from background. Thus E [ dW ( s ) dW ( t )] = µc ( t )[ θζ γ ( φ ( t )) + (1 − θ ) β ] dt, s = t (42)For s (cid:54) = t there are three possibilities: both events are from source, both are from background, orone is from source and one is from background. E [ dW ( t ) dW ( s )] = µ c ( s ) c ( t )[ θ ζ γ ( φ ( t )) γ ( φ ( s )) + (1 − θ ) β + θ (1 − θ ) ζ β ( γ ( φ ( s )) + γ ( φ ( t ))] dsdt, s (cid:54) = t (43)We initially assume that the phase φ ( t ) is properly specified, i.e. that f and ˙ f are identified. E | A n | contains contributions of all the terms in (42) and (43). Some analysis shows that theleading order comes from the first term in (43), leading to µ θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) e − πinφ ( t ) c ( t ) γ ( φ ( t )) dt (cid:12)(cid:12)(cid:12)(cid:12) = µ θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:88) k (cid:90) γ k c ( t ) e − πinφ ( t ) e πikφ ( t ) dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (44) (cid:39) µ θ | γ n | (cid:104) (cid:90) T c ( t ) dt (cid:105) (45)Thus, under the alternative E K Q T (cid:39) µ θ ζ [ (cid:82) T c ( t ) dt ] T (cid:88) n (cid:54) =0 | γ n | | α n | (46)The approximation for frequency misspecification, ∆ (cid:54) = 0, follows from Taylor series expansions,noting that the first derivatives vanish at ∆ = 0, since that point is a maximum. References [1] R. Beran (1969). Asymptotic theory of a class of tests for uniformity of a circular distribution.
Annals of Statistics , 40: 1196-1206[2] P. Bickel, B. Kleijn, and J. Rice (2007). On detecting periodicity in astronomical point processes.
Challenges in Modern Astronomy IV. ASP Conference Series, G , J. Babu, and E. Feigelson(eds.), Volume 371. 103] R. Buccheri, K. Bennet, G. Bignami, J. Bloeman, V. Boriakoff, P. Caraveo, W. Hermsen,G. Kanbach, R. Manchester, J. Masnou, H. Mayer-Hasselwander, M. Ozel, J. Paul, B. Sacco,L. Scarsi, and A. Strong (1983). Search for pulsed γ -ray emission from radio pulsars in theCOS-B data. Astronomy and Astrophysics , 128:245.[4] O. C. De Jager. J. Swanepol, and B. Raubenheimer (1989). A powerful test for weak periodsignals with unknown light curve shape in sparse data.
Astronomy and Astrophysics , 221:180–190.[5] J. M. Fierro, P. F. Michelson, P.L. Nolan, and D. J. Thompson (1998). Phase-resolved stud-ies of the high-energy gamma-ray emission from the Crab, Geminga, and Vela pulsars.
TheAstrophysical Journal , 494: 734-746.[6] A. Jannsen (2000) Global power functions of goodness-of-fit tests.
Annals of Statistics , 28:239-253.[7] E. Lehmann and J. Romano (2006).
Testing Statistical Hypotheses . Springer.[8] Lord Rayleigh (1919). On the problem of random vibration and flights in one, two, and threedimensions.