A Gaussian Mixture Model for Nulling Pulsars
David L. Kaplan, Joseph K. Swiggum, Travis D. J. Fichtenbauer, Michele Vallisneri
DDraft version January 30, 2018
Typeset using L A TEX twocolumn style in AASTeX62
A Gaussian Mixture Model for Nulling Pulsars
D. L. Kaplan, J. K. Swiggum, T. D. J. Fichtenbauer, and M. Vallisneri
2, 3 Center for Gravitation, Cosmology and Astrophysics, Department of Physics, University of Wisconsin–Milwaukee, P.O. Box 413,Milwaukee, WI 53201, USA Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, USA TAPIR, California Institute of Technology, Pasadena, CA 91125, USA (Accepted January 27, 2018)
Submitted to ApJABSTRACTThe phenomenon of pulsar nulling – where pulsars occasionally turn off for one or more pulses –provides insight into pulsar-emission mechanisms and the processes by which pulsars turn off whenthey cross the “death line.” However, while ever more pulsars are found that exhibit nulling behavior,the statistical techniques used to measure nulling are biased, with limited utility and precision. In thispaper we introduce an improved algorithm, based on Gaussian mixture models, for measuring pulsarnulling behavior. We demonstrate this algorithm on a number of pulsars observed as part of a largersample of nulling pulsars, and show that it performs considerably better than existing techniques,yielding better precision and no bias. We further validate our algorithm on simulated data. Ouralgorithm is widely applicable to a large number of pulsars even if they do not show obvious nulls.Moreover, it can be used to derive nulling probabilities of nulling for individual pulses, which can beused for in-depth studies.
Keywords: methods — data analysis, methods — statistical, pulsars — general INTRODUCTIONPulsar surveys to date have found ∼ , most of which move across the P – ˙ P diagram and turn off in ∼ yr when they passthe “death line” (see, e.g., Lorimer & Kramer 2012).Just how and why pulsars turn off is still a subject ofintense scrutiny, with ongoing observational and the-oretical investigations (see references below). Nullingpulsars (Backer 1970) – pulsars whose radio emissionceases temporarily for one or more rotations – offer aninvaluable laboratory to study pulsar emission mecha-nisms and magnetospheres. The meaning of null dura-tions, the intervals between them, and the underlyingmechanism have been matters of debate ever since thebehavior was recognized.The first comprehensive study of nulling pulsars(Ritchings 1976) suggested that as a pulsar ages, the Corresponding author: D. L. [email protected] time interval between regular bursts of pulsed emissionincreases, eventually leading to “death” when the inter-val between bursts is much longer than the duration ofthe bursts themselves. A later study of 72 pulsars founda stronger correlation between null fraction (NF; thefraction of time that a pulsar spends in a null state) andspin period, but still argued that nulling could be in-dicative of a faltering emission mechanism (Biggs 1992).Wang et al. (2007) argued on the basis of a smallersample of 45 nulling pulsars that nulling behavior wasmore related to a large characteristic age than otherparameters, but the analysis was not quantitative.However, while data-collection capabilities, processingtechniques, and theories explaining pulsar nulling (e.g.,Gajjar et al. 2012) have become more sophisticated, weare still in a regime where increasing the sample sizeof nullers along with the precision of the nulling anal-ysis can have a significant impact on our understand-ing. We are working to increase the sample size throughdetailed followup observations of sources found in theGreen Bank North Celestial Cap survey (GBNCC; Sto-vall et al. 2014; Lynch et al. 2018; Kawash et al. 2018). a r X i v : . [ a s t r o - ph . I M ] J a n Kaplan et al.
At the same time, the classification of pulsars into thosethat do or do not null is largely based on qualitative ex-amination by eye, and computation of the nulling frac-tion suffers from significant limitations that may biasthe results.Here we present a robust technique for determiningnulling fractions using Gaussian mixture models (see,e.g., Ivezi´c et al. 2014). As we demonstrate, the ro-bustness of this technique allows us to infer quantitativelimits on nulling fractions even for pulsars that have noobvious nulls, so the technique can be used for more so-phisticated population analyses. First, we describe thesample data that we used to test our technique ( § § § § §
4. All of our source code is available at10.5281/zenodo.1155855. Note that we believe a similaralgorithm was applied to nulling pulsars by Arjunwad-kar et al. (2014), but we have been unable to find detailsof its implementation or results. SAMPLE DATATo test our method, we gathered timing data froma recent study of new pulsar discoveries in the GBNCCsurvey (Lynch et al. 2018). Data were collected with the100-m Robert C. Byrd Green Bank Telescope (GBT),observing at a center frequency of 800 MHz. We digi-tized a 200 MHz bandwidth using the Green Bank Ul-timate Pulsar Processing Instrument (GUPPI; DuPlainet al. 2008) in incoherent search mode, with 2,048 fre-quency channels and sampling every 40.96 µ s. Pulsarsincluded in this study (PSRs J0323+6742, J0054+6946,and J0137 + 6349) were observed monthly for 3-minuteexposures between February 2013 and January 2014.Exposure times for PSR J0054 + 6946 were lengthenedto 6 minutes starting in November 2013 through May2014.For each pulsar, data were folded modulo the pulseperiod using DSPSR and RFI was zapped interactivelyusing the pazi routine from
PSRCHIVE . A crucial stepin nulling studies is the choice of on-pulse and off-pulse windows, defined as fixed phase intervals (within foldedprofiles) that are assumed to contain most and none,respectively, of the pulsar emission. Loosely speaking,the intensities observed in the off-pulse window are at-tributed to background noise setting the threshold foron-pulse–window intensities to be classified as nullingor emitting. We set the width and phase of on-pulsewindows manually by inspecting folded profiles, choos-ing off-pulse windows of the same size at pulse phasescontaining no visible emission, often about 0.5 rotations A m p li t ud e P u l s e N u m b e r O FF O N NULLEMIT 0 . . . Figure 1.
Amplitude versus pulse phase and pulse numberfor a subset of the data on PSR J0323 + 6742. The top panelshows the average pulse profile, with the on- and off-pulsephase windows delineated by the vertical lines. The bottom-left panel shows the amplitude for individual pulses, with thephase windows indicated. The right panel shows the proba-bility of nulling (Eqn. 5). We highlight regions with signifi-cant nulling behavior (20 successive pulses with minimum aposteriori null probability greater than 62%) and relativelysteady emission (10 pulses with maximum a posteriori nullprobability ∼ from the on-pulse window, but careful not to includeany interpulse (if present).We fitted and removed a 6th-order polynomial to flat-ten the baseline of each single pulse profile and center theoff-pulse noise on zero intensity (similar to Lynch et al.2013; Rosen et al. 2013). Afterwards, we assembled ourdataset of on and off intensities for each individual pulseby integrating over each phase bin and concatenatingthe results from the separate 3-min/6-min exposures.See Figure 1 for an illustration of the pulse intensitiesmeasured in on- and off-pulse windows, along with somesignificant nulling behavior. A GAUSSIAN MIXTURE MODEL FORNULLING PULSARSThe standard algorithm for fitting the nulling frac-tion NF (Ritchings 1976) is to first construct his-tograms of the integrated intensities I ON ≡ { I ON k } and I OFF ≡ { I OFF k } measured in the on-pulse andoff-pulse windows at each observation k . We denotethe histograms as ON n ( I ON ) and OFF n ( I OFF ), wherethe index n identifies histogram bins. Then, for aseries of trial values of NF one computes the differ- aussian Mixtures for Nulling Pulsars n ( I ON , I OFF ) = ON n ( I ON ) − NF × OFF n ( I OFF ). The best-fit ˆNF is the value that mini-mizes the sum of ∆ n ( I ON , I OFF ) over negative intensi-ties, | (cid:80) I n < ∆ n ( I ON , I OFF ) | , where the nulling is pre-sumed to dominate. This method has a number ofdrawbacks: first, it requires construction of histogramsso there is an arbitrary choice of binning. Second, itassumes that the pulses with I ON k < Fitting Algorithm
In our proposed method, we define the likelihood ofthe on-pulse dataset I ON as a one-dimensional Gaussianmixture model, p ( I ON |{ w j , µ j , σ j } ) = N (cid:89) k M (cid:88) j =1 w j N ( I ON k | µ j , σ j ) , (1)where µ j and σ j are the means and standard deviationsof M normal distributions, and w j are their weights inthe mixture, which must satisfy (cid:80) w j = 1. Thereforethere are 3 M − M = 2 corresponds to the standard descrip-tion of nulling with two modes; we order results so thatthe j = 1 component has the lowest mean and describesthe nulls, so NF = w . Note that Eq. (1) implies thatthe error in the measurement of the intensities is neg-ligible compared to the intrinsic scatter in the mixturecomponents, described by the σ j . The parameters that maximize this likelihood canbe found using the expectation-maximization algorithm(Dempster et al. 1977), implemented as
GaussianMixture in scikit-learn (Pedregosa et al. 2011). Using thisimplementation, we find reasonable results for pulsarswith significant separations between nulling and emit-ting pulses. Such separation may be a signal-to-noiseeffect, but scatter in pulse intensities could also resultfrom intrinsic variability in the pulsars themselves or in-terstellar scintillation (Rickett 1990; Jenet & Gil 2003).However, as we show below the results are not ideal forother pulsars.We can do better by using the off-pulse dataset I OFF to constrain the null-component parameters µ and σ ,by way of the off-pulse likelihood p ( I OFF | µ , σ ) = (cid:89) k N ( I OFF k | µ , σ ) . (2) If the measurement error is not negligible, the σ j are effectivelyredefined to include it, under the assumptions that the error issimilar for every observation. Indeed, we may think of this step as providing a priordistribution for µ and σ , which is then used in Eqn. 1.We then explore the { w j , µ j , σ j } parameter space usingMarkov Chain Monte Carlo (MCMC) techniques, specif-ically the affine-invariant population-MCMC algorithm emcee described by Foreman-Mackey et al. (2013). Thedetails of our implementation are as follows: • We initialize 40 emcee “walkers” around the best-fit region for w j , µ j , σ j , as determined by expecta-tion maximization run on I ON k . • For simplicity, we set the µ and σ priors as nor-mal distributions centered on k (cid:80) k I OFF k and onthe inner-quartile range of the I OFF , respectively,with widths determined following Ahn & Fessler(2003). • Prior distributions for µ j and σ j (with j >
1) aretaken to be flat. The prior distribution for theweights w j is a Dirichlet distribution, but sincethe sum of the weights is 1 it is effectively flat. • We run the walkers through 50 iterations toachieve “burn in.” • Last, we run the walkers for 500 iterations to ob-tain the final population, representative of the µ j , σ j , and w j joint posterior.We experimented with increasing the number of walk-ers and iterations and found that the values above gavesufficiently reliable results for data-sets with a few thou-sand pulses and M <
5, but they can be increased asneeded to achieve reliable posterior distributions.3.2.
Fit Results
Representative results from this algorithm are shownin Figure 2, where we plot p ( I ON |{ ˆ w j , ˆ µ j , ˆ σ j } ) (solidline) and p ( I OFF | ˆ µ , ˆ σ ) (dashed) on top of the I ON and I OFF histograms (with appropriate normalizations);here { ˆ w j , ˆ µ j , ˆ σ j } are the a posteriori joint maxima of theGaussian mixture parameters. Figure 3 shows the pos-terior densities of the Gaussian means and variances.The data appear to be fit well, with the null compo-nent ending up close to the off-pulse fit results; we ob-serve no significant pathologies in the MCMC chains. Toevaluate goodness of fit quantitatively, we perform theKolmogorov–Smirnov test (KS test; Chakravarti et al.1967), and find a statistic value of 0.015, correspondingto a p -value of 0.7—no evidence to reject the hypothesisthat the data were sampled from the best-fit distribu-tion.For this pulsar we find that the emitting and the nullcomponents are sufficiently separate that all of the al-gorithms outlined above would give similar results. For Kaplan et al. − −
10 0 10 20 30 40 50 60
Pulse Intensity N u m b e r o f P u l s e s ComponentsScaled Null ComponentSumONOFFON-OFF*NF
Figure 2.
Distribution of pulse intensities for PSR J0323 +6742. The blue and orange histograms are the raw intensi-ties for on- and off-pulse windows, respectively. The dashedcurves are the maximum a posteriori individual componentsfrom the Gaussian mixture model for M = 2, with the solidcurve their sum, as determined by our MCMC algorithm.The dotted curve is the component for the nulls scaled by1 / NF: it matches the off-pulse intensities well. Finally, thedashed green histogram is the data that would be used to im-plement the Ritchings (1976) algorithm, although it is plot-ted for our best-fit value of NF. Using our MCMC algorithmwe find ˆNF = 50 ± instance, we find that only about 2% of the pulses fromthe emitting component would have intensities less than0, which would only bias the Ritchings (1976) results bya small amount.If the problem is well behaved, we can select theoptimal number of components M by maximizing theBayesian information criterion (BIC; Schwarz 1978) orthe Akaike information criterion (AIC; Akaike 1974).We illustrate this in Figure 4, which shows both. Wefind strong evidence that nulling behavior is present( M > M = 2 comparedto M >
2. The BIC corresponds to approximating theBayes ratios between models as e − ∆ BIC / , where ∆ BIC isthe difference in BIC (likewise for AIC). In this approx-imation, the implied Bayes ratio for M = 2 vs. M = 3 is O = 175 (AIC) or O = 92 ,
000 (BIC), showing that M = 2 is indeed preferred.We show another example in Figure 5, where for M = 2 we find on-pulse maximum a posteriori parame-ters ˆ µ = 12 . σ = 11 .
3, with ˆNF = 41%. In thiscase there is much less separation in intensities betweennulls and pulses, with about 13% of the pulses having
I <
0. Nonetheless our method gives a robust fit. Fig-ure 5 appears to show slight deviations from Gaussian . . . . . µ E m i t . . . . . σ N u ll . . . . σ E m i t − . . . . µ Null . . . . N F . . . . . µ Emit . . . . . σ Null . . . . σ Emit .
32 0 .
40 0 .
48 0 . NF Figure 3.
Posterior probability densities for the Gaussian-distribution parameters of the null and emitting components,as derived in our MCMC algorithm. The solid lines show the µ and σ modes as inferred from the off-pulse data. Thecontours are 1-, 2-, and 3- σ joint confidence contours. Number of components M I n f o r m a t i o n c r i t e r i o n − , AICBIC
Figure 4.
Model-comparison evidence for different numberof Gaussian-mixture components in the PSR J0323 + 6742data. We show the Bayesian information criterion (BIC; or-ange squares) and the Akaike information criterion (AIC;blue circles) as a function of the number of components M .The non-nulling hypothesis ( M = 1) is rejected at high con-fidence, and M = 2 is preferred by both criteria. aussian Mixtures for Nulling Pulsars − −
10 0 10 20 30 40 50 60
Pulse Intensity N u m b e r o f P u l s e s ComponentsScaled Null ComponentSumONOFFON-OFF*NF
Figure 5.
Distribution of pulse intensities for PSR J0054 +6946 (see Fig. 2 for details). Here the nulling and emittingcomponents are much less separated than in Figure 3. Usingour MCMC algorithm we find ˆNF = 40 . ± . distributions, which may be handled with more com-plex models that (for example) incorporate asymmetricdistributions due to scintillation. Indeed, the KS test re-jects the assumption that the data were drawn from thebest-fit distribution at the 5 × − level; nevertheless,since our goal is primarily to quantify the bimodality ofthe emitted pulses rather that the exact intensity distri-bution, we believe the nulling results themselves to berobust.Finally, in Figure 6 we show an example where thepulsar shows no obvious nulling. Our algorithm findsˆNF = 0 . ± . P value of 2 × − ), but the overallrobustness of our determination is evident. In contrastto Ritchings (1976) which can give determinations ofnon-zero nulling fractions even for pulsars that do notappear to null, our algorithm behaves well. Therefore wecan use it for all pulsars that are sufficiently bright re-gardless of whether nulling is evident, and derive morerobust determinations of whether or not weak nullingbehavior is present.3.3. Simulation Results
We validate our algorithm by simulating pulsar datafor a range of parameters, drawing random intensitiesaccording to Eqs. (2) and (1) for the off- and on-pulsewindows, respectively. We base our synthetic datasetson the PSR J0323 + 6742 data analyzed above: we sim-ulate 2,000 pulses with µ = 0, σ = 5, σ = 10, and − −
10 0 10 20 30 40 50 60
Pulse Intensity N u m b e r o f P u l s e s ComponentsScaled Null ComponentSumONOFFON-OFF*NF
Figure 6.
Distribution of pulse intensities for PSR J0137 +6349 (see Fig. 2 for details). Here there is no obvious evidencefor nulling. Using our MCMC algorithm we find ˆNF = 0 . ± . nulling fraction NF = 0 .
5. We vary µ between 5 (hardto distinguish from the nulls) and 30 (easily distinguish-able), and we repeat the test for 30 trials for each valueof µ .We plot the median and standard deviation of theestimated NF in Figure 7, using the Ritchings (1976)method, the expectation-maximization method, and ourBayesian algorithm (in which case we report the max-imum a posteriori NF). All three algorithms agree forhigh pulse intensities, µ (cid:38) µ , as expected. Thisis because the model has a significant fraction of non-nulled pulses with intensities less than 0, ranging from30% (for µ = 5) to 0.1% (for µ = 30). Specifically, weexpect a fraction 12 −
12 erf (cid:32) µ √ σ (cid:33) (3)of the emitted pulses to have intensities <
0, whereerf( x ) is the error function of x . This then leads toa biased estimated NF,NF + (1 − NF) (cid:34) − erf (cid:32) µ √ σ (cid:33)(cid:35) (4)which we have plotted in Figure 7, where they agreewith our simulated results. The EM results using GaussianMixture are also biased at low pulse intensi-ties. By contrast, our Bayesian algorithm performs well,with consistent uncertainties and no obvious bias acrossthe µ range. Kaplan et al.
Mean of pulses µ Emit . . . . . . N u lli n g f r a c t i o n MCMCEMRichtings
Figure 7.
Comparison on NF estimates for simulated data,as derived using the Ritchings (1976) algorithm (green di-amonds), our expectation-maximization algorithm (orangesquares, labeled as “EM”), and our Bayesian algorithm fit(blue circles). The horizontal line marks the true nullingfraction of 0 .
5. Each simulated dataset consisted of 2,000pulses, with µ = 0, σ = 5, σ = 10; simulations were re-peated 30 times for each value of µ (hence the vertical errorbars). The solid green curve shows our analytical expecta-tion for the bias of the Ritchings (1976) algorithm (see maintext). 4. DISCUSSION AND CONCLUSIONSWe have outlined and demonstrated an improvedmethod to determine the nulling fraction of a pulsar.The method performs well in the limit of weak nulling,so it can be applied to a large number of pulsars withoutevident strong nulls. Unlike the traditional Ritchings(1976) algorithm, our method is unbiased, and it can beapplied to pulsars with more than two emission modes,as long as those are reflected in the pulse intensities.However, it does require specification of the functionalform of the intensity distributions for the nulling andemitting components: here we assume sums of Gaus-sians, although exponentials appropriate for 100% mod-ulation by interstellar scintillation (e.g., Rickett 1990),or intermediate distributions are also possible. In thosecases the AIC/BIC values can be used to quantitativelycompare how well alternative distributions fit the data. An additional benefit to this analysis is that we candetermine explicitly the probability that any individualpulse belongs to a given class. This is sometimes calledthe “responsibility” (Hastie et al. 2009), and is given by: p ( j | I ON k ) = w j N ( µ j , σ j ) (cid:80) Mj (cid:48) =1 w j (cid:48) N ( µ j (cid:48) , σ j (cid:48) ) (5)for class j . An example of this is shown in Figure 1,where we can determine the nulling probability as p ( j =1 | I ON k ). This probability can be computed for the maxi-mum a posteriori { ˆ w j , ˆ µ j , ˆ σ j } , or it can be marginalizedover their distributions. Individual-pulse nulling prob-abilities can be used in robust multi-wavelength stud-ies, to establish whether the X-ray properties of thepulses received during nulls differ from the others (e.g.,Hermsen et al. 2013). We can also look for temporal pat-terns in the nulling properties, like the length of nulls(Fig. 1) or the time between nulls (e.g., Wang et al. 2007)using quantitative probability thresholds, and we couldexamine the probability that adjacent pulses transitionbetween nulling and emitting behavior. Finally, we canfit for more than two components and identify modechanging quantitatively, in addition to nulling. All ofthese topics will be explored in future papers. Acknowledgments.
We thank S. McSweeney for help-ful comments. We thank an anonymous referee and theAAS journals statistics editor for their suggestions. TheGreen Bank Observatory is a facility of the National Sci-ence Foundation operated under cooperative agreementby Associated Universities, Inc. Support was providedby the NANOGrav NSF Physics Frontiers Center awardnumber 1430284. MV acknowledges support from theJPL RTD program. Portions of this research were car-ried out at the Jet Propulsion Laboratory, CaliforniaInstitute of Technology, under a contract with the Na-tional Aeronautics and Space Administration.
Facility:
GBT
Software:
Astropy(AstropyCollaborationetal.2013),DSPSR (http://dspsr.sourceforge.net/index.shtml), em-cee(Foreman-Mackeyetal.2013),PSRCHIVE(Hotanetal.2004), scikit-learn (Pedregosa et al. 2011) .REFERENCES
Ahn, S., & Fessler, J. A. 2003, Standard errors of mean,variance, and standard deviation estimators, Tech. Rep.Technical Report 413, Communications and SignalProcessing Lab., Univ. of Michigan Akaike, H. 1974, IEEE Transactions on Automatic Control,19, 716Arjunwadkar, M., Rajwade, K., & Gupta, Y. 2014, inAstronomical Society of India Conference Series, Vol. 13,79–81 aussian Mixtures for Nulling Pulsars7