Parameter estimation of spinning binary inspirals using Markov-chain Monte Carlo
Marc van der Sluys, Vivien Raymond, Ilya Mandel, Christian Roever, Nelson Christensen, Vicky Kalogera, Renate Meyer, Alberto Vecchio
aa r X i v : . [ g r- q c ] M a y Parameter estimation of spinning binary inspiralsusing Markov-chain Monte Carlo
Marc van der Sluys , Vivien Raymond , , Ilya Mandel ,Christian R¨over , , Nelson Christensen , Vicky Kalogera ,Renate Meyer , Alberto Vecchio , Physics & Astronomy, Northwestern University, Evanston IL, USA Universit´e Louis Pasteur - Strasbourg I, Strasbourg, France Statistics, University of Auckland, New Zealand Max-Planck-Institut f¨ur Gravitationsphysik, Hannover, Germany Physics & Astronomy, Carleton College, Northfield MN, USA Physics & Astronomy, U. of Birmingham, Edgbaston, Birmingham, UKE-mail: [email protected]
Abstract.
We present a Markov-chain Monte-Carlo (MCMC) technique tostudy the source parameters of gravitational-wave signals from the inspiralsof stellar-mass compact binaries detected with ground-based gravitational-wavedetectors such as LIGO and Virgo, for the case where spin is present in the moremassive compact object in the binary. We discuss aspects of the MCMC algorithmthat allow us to sample the parameter space in an efficient way. We show sampleruns that illustrate the possibilities of our MCMC code and the difficulties thatwe encounter.PACS numbers: 02.50.-r, 02.70.Uu, 04.30.Tv, 04.80.Nn, 95.85.Sz
Submitted to:
Classical and Quantum Gravity
1. Introduction
Inspirals of stellar-mass compact binaries induced by gravitational radiation are amongthe most promising gravitational-wave sources for ground-based laser interferometers,such as LIGO [1; 2] and Virgo [3]. If such a binary contains a black hole, it is believedto be spinning moderately [4]. A spinning black hole causes the binary orbit to precess,introducing phase and amplitude modulations in the gravitational-wave signal. Thisshould be taken into account in the analysis of the signal. The accuracy with whichthe binary parameters can be estimated is of significant astrophysical interest.We developed a code which implements a Markov-chain Monte-Carlo (MCMC)technique [5] to compute the posterior probability-density functions (PDFs) of thesource parameters. This code is a modification of an earlier parameter-estimation codefor analysis on binaries without spin [6; 7]. In addition to including post-Newtoniangravitational waveforms with a single spinning object [8], we have also implementeda number of improvements designed to make the parameter-space exploration moreefficient, such as parallel tempering. arameter estimation of spinning binaries using MCMC
2. Implementation of MCMC
The code we use to estimate the parameters of a binary inspiral with a spinningmember is based on an earlier code that was used by some of us for the case whereno spin is present [7]. In this section, we describe some of the features that weretaken from the earlier non-spinning MCMC code, as well as some features that wereintroduced in the present code for use on inspirals with a single spinning object.
For this study, we inject simulated waveforms with parameters of our choice intoa stretch of simulated Gaussian, stationary noise at the designed sensitivity levelfor the detectors [9]. The resulting data is windowed, Fourier transformed, andsubsequently examined by the MCMC analysis. The details of the data handling,Fourier transformation, windowing and PSD estimation can be found in [7].A stretch of 256 seconds of simulated noise data is used to estimate the power-spectral density (PSD) of the noise S n ( f ). Noise data is read in from files in theLIGO/Virgo frame format [10] in the time domain. Use of the frame-file format alsoallows us to read in real interferometer data and thereby test the analysis on simulatedhardware-injection signals [11] or analyse any candidate events which arise. In thatcase, noise estimates would be based on data taken close to the time of the signal,without including the signal. We have tested our MCMC code on LIGO S5 playgrounddata and find that the results are qualitatively similar to the results on Gaussian noise.The signal-to-noise ratio (SNR) of an injected model signal with parameters ~λ asdetected by a single detector i is computed as follows: ρ i ( ~λ ) = vuuut f high X f = f low (cid:12)(cid:12)(cid:12) ˜ m ( ~λ, f ) (cid:12)(cid:12)(cid:12) S n ( f ) ∆ f , (1)where ˜ m ( ~λ, f ) is the frequency-domain model waveform, S n ( f ) is the noise PSD, thesum is computed over the frequency bins between f low and f high and ∆ f is the widthof each frequency bin. The total SNR for a network of N detectors is then given by: ρ tot ( ~λ ) = vuut N X i =1 (cid:16) ρ i ( ~λ ) (cid:17) . (2) For this phase of our project, we use a simplified waveform that takes into accountpost-Newtonian (PN) expansions up to the 1.5 PN order in phase and is restrictedto the Newtonian order in amplitude. The waveform includes the simple-precessionprescription [8]. This choice of waveform template allows us to investigate the first-order effects of spin (spin-orbit coupling), as long as either only one binary member arameter estimation of spinning binaries using MCMC M ⊙ spinning black hole and a 1 . M ⊙ non-spinningneutron star.The waveform for an inspiral with one spinning object is described by twelveparameters. The parameters are: the chirp mass M c , symmetric mass ratio η ,spin magnitude a spin ≡ S/M , the constant angle between spin and orbital angularmomentum θ SL , the luminosity distance d L and sky position R.A., Dec., the time,orbital phase and precession phase at coalescence t c , φ c , α c , and two angles thatdefine the direction of the total angular-momentum ~J of the binary: θ J and φ J .Each waveform template is computed in the time domain, and then windowedand Fourier transformed. The calculation of the likelihood, which measures how wella model waveform matches the data, is carried out in the frequency domain.In this initial study, we focus on waveforms for our fiducial binary ( M c ≈ . M ⊙ , η ≈ .
11) at a distance of d L = 13 . We follow a Bayesian approach to infer the posterior probability-density functions(PDFs) of the twelve parameters that describe our waveform. The PDF of a parametervector ~λ given an observed data set d follows from Bayes’ theorem: p ( ~λ | d ) = p ( ~λ ) p ( d | ~λ ) p ( d ) ∝ p ( ~λ ) L ( d | ~λ ) , (3)where p ( ~λ ) is the prior distribution of the parameters, and L ( d | ~λ ) is the likelihood function. We calculate the likelihood for a model waveform ˜ m ( ~λ, f ) with parameters ~λ and data set ˜ d ( f ) as measured by a detector i in the usual way: L i ( d | ~λ ) ∝ exp − Z ∞ (cid:12)(cid:12)(cid:12) ˜ d ( f ) − ˜ m ( ~λ, f ) (cid:12)(cid:12)(cid:12) S n ( f ) df . (4)The tildes indicate that both d and m are expressed in the frequency domain. Sincewe will be considering the ratio of likelihoods, we do not need to take into account thenormalisation factor, and it is sufficient to compute the proportionality in Eq. 4.Assuming that the noise of different interferometers is independent, the expressionof the PDF given data from a coherent network of N interferometers generalises to: p ( ~λ | d ) ∝ p ( ~λ ) N Y i =1 L i ( d | ~λ ) . (5) We use a prior distribution that is uniform in log( d L ), cos( θ SL ), sin(Dec), sin( θ J ), (thesine is used for parameters defined in the domain [ − π , π ], the cosine for θ SL ∈ [0 , π ])and in the original scales of the remaining parameters. The allowed ranges for theseparameters are between 1 and 6 M ⊙ for M c , between 0 and 0.25 for η , in the range t c ± d L , between 0 and 1 for a spin , between -1 and 1 for theangles of which we use the sine or cosine as MCMC parameter, and between 0 and 2 π for all other angles. arameter estimation of spinning binaries using MCMC The Markov chain is created as follows. If in the current iteration i , the chain hasthe location in parameter space (set of waveform parameters, or state) ~λ i , we proposea random jump ∆ ~λ i to the new location ~λ i +1 = ~λ i + ∆ ~λ i . Since the jump proposalis random, the next state of the chain should depend only on the current state, thusgiving the chain its Markovian property.We then compute the likelihood for the new state as given by Eq. 5 and determinewhether to accept it by comparing the acceptance probability (the left-hand side inEq. 6) to a random number r drawn from a uniform distribution between 0 and 1: p ( ~λ i +1 ) p ( ~λ i ) L ( d | ~λ i +1 ) L ( d | ~λ i ) > r. (6)The jump to state ~λ i +1 is accepted if Eq. 6 is fulfilled. Otherwise the jump is rejected,the chain keeps the old parameter set ~λ i +1 = ~λ i and a new iteration is started bydrawing a new random jump proposal ∆ ~λ i +1 to a different state ~λ i +2 . Equation 6shows that a new state is always accepted when it improves the product of the prior andthe likelihood, and that a larger decrease in this product means a smaller probabilityof acceptance.We use an adaptive scheme for the proposed jump size [13]. The size of thejump proposal for the parameter λ j (the j -th element of the vector ~λ ) is drawn froma Gaussian distribution with width σ j jump . Thus, these widths form a vector ~σ jump with the same number of elements as ~λ . The adaptation of the jump size consists ofincreasing σ j jump when a jump proposal in the parameter λ j is accepted and decreasingit when a proposal is rejected. In a typical run, the increase of σ j jump is a factor of ∼ ∼
2, which results in the target acceptance ratio of about25%.
The default method for choosing a jump proposal isto draw the jump size independently in the different dimensions of the parameterspace. This implies that adaptation is done per parameter as well. We make theseupdates in two categories. The first category contains per-parameter updates, wherethe likelihood is calculated after proposing a jump in one parameter only, thus decidingwhether to accept the jump for each parameter separately. The second categoryinvolves proposing a jump in all parameters at once before calculating the likelihoodonly once. This is typically done in 10% of the uncorrelated update proposals. Forboth categories of uncorrelated update proposals the same vector ~σ jump is used. There can exist strong correlations between parameters,in which case uncorrelated updates can be very inefficient. We implemented a methodto calculate the correlations between the parameters of a block of n corr iterations(typically n corr ≈ ). We then draw the subsequent n corr jump proposals from amultivariate normal distribution that is given by the Cholesky decomposition of thiscovariance matrix.We recompute the covariance matrix and its Cholesky decomposition at the endof each block of n corr iterations, and decide whether to use the new matrix or notby checking how the diagonal elements of the matrix have changed. We find that ifwe accept each proposed matrix update (for which the covariance matrix is positive arameter estimation of spinning binaries using MCMC ∼
50% of the diagonal elements have decreased in value.The correlated update proposals are always block updates of all twelve parametersat once, hence there is a separate σ jump , corr for these updates. In a typical MCMCrun, 70–90% of the update proposals are done in a correlated way, and 10–30% in anuncorrelated way. The problem that arises when using MCMC for parameter estimation, and especiallyto find the (unknown) modes of the PDFs, is that the chains should typically be broadenough to sample the whole allowed parameter ranges, while also being able to probethe region of maximum likelihood in a detailed way. These two demands are almostmutually exclusive, but the technique known as parallel tempering offers a solution.Parallel tempering consists of several parallel chains that each have a different ‘temperature’ . In addition to the default Markov chain with T = 1, parallel chainsof higher temperature are computed. Hotter chains are more likely to accept a jumpthat decreases the likelihood, by adjusting Eq. 6 to accept a jump when p ( ~λ i +1 ) p ( ~λ i ) L ( d | ~λ i +1 ) L ( d | ~λ i ) ! T > r, (7)where T ≥ T m < T n whenever: (cid:18) L n L m (cid:19) Tm − Tn > r. (8)Since the likelihood that is needed to determine whether to swap the parameter setswas already calculated, this decision comes almost for free, and we make it for everypair of chains at every iteration. Output of parallel chains with different temperaturesfor a sample run is shown in Fig. 2 in Sect. 3.3. The temperature ladder is determined bysetting the lower temperature to T = 1. This is the only chain that is saved and usedto create the PDFs. One also has to choose a maximum temperature T max , whichis typically the lowest temperature that allows the chain to scatter over the wholeallowed parameter ranges quickly. In our test runs, we find that we need to increasethe value of T max when injecting a signal with a higher SNR. The last quantity tochoose is the number of parallel chains N ch in the temperature ladder. This will be acompromise between high computation speed (low N ch ) and high swap efficiency forthe chains by having small differences between adjacent temperatures (high N ch ). Thetemperatures are then chosen equidistantly in log( T ). Our typical setup is N ch ≈ T max ≈ −
50 for SNRs between 10 and 20. arameter estimation of spinning binaries using MCMC The obvious drawback of parallel tempering is thatone has to calculate a handful of chains, instead of just one. In order to reduce thenumber of chains in the temperature ladder, we have tested our simulations with sinusoidal temperatures, for all chains with
T >
1. In order to do this, we set up ourtemperature ladder as before, but now sinusoidally oscillate the temperature of eachchain m = 1 with an amplitude ∆ T m . We find that the swapping is efficient when wechoose ∆ T m = T m − T m − for each chain m >
1, so that the minimum temperature ofeach chain is equal to the mean temperature of the next cooler chain. Furthermore,we make sure that adjacent chains are in antiphase, so that there is an optimal overlapat the extrema. In this setup, we can use N ch ≈ − T max ≈ −
30 for SNRsbetween 10 and 20, thus reducing the computational cost of the MCMC runs withparallel tempering. We suggest that the period of the temperature variation shouldnot be too close to n corr (see Sect. 2.5.2) and that a too short period may endangerthe Markovian nature of the chain. Hence we chose a period that is ∼ × n corr .
3. MCMC simulations
In a typical MCMC run, we simulate a data set by injecting a signal into detectornoise, as described in Sect. 2.1. This data can be either Gaussian, stationary noisethat is simulated at the designed sensitivity of the detectors, or real LIGO or Virgodata. The test simulations described in this study were all done using synthetic noise.For our simulations, we inject the signal with a given set of parameters into thenoise, appropriately projecting it onto one or more of the following gravitational-wavedetectors: the 4-km LIGO detectors at Hanford, WA (H1), Livingston, LA (L1) inthe USA, or the 3-km Virgo detector near Pisa, Italy. This way, we can do coherentparameter estimation with such a network of detectors, as described in Sect. 2.3.Test simulations that we carry out at the moment focus on a fiducial binarythat consists of a 10 M ⊙ spinning black hole and a 1 . M ⊙ non-spinning neutronstar at a distance of typically 15-20 Mpc. The purpose of these test simulationsis twofold. Firstly, we want to establish to which accuracy the source parameterscan be measured. Secondly, our code must be able to find these source parameterswhen started from arbitrary values. These are two different goals that require slightlydifferent simulations to test in an efficient way. For a fixed network SNR (see Eq. 2), the accuracy with which the parameters can bedetermined depends mainly on three factors: the number of detectors in the network,the sky position and orientation of the binary (these quantities determine how thenetwork SNR is distributed over the different interferometers) and the magnitude andthe direction of the spin of the black hole. We are therefore carrying out a systematicstudy in which we vary these parameters in order to map their influence on theaccuracy of the parameter estimation. Because of the large number of parameters thatis varied, and the timescale of 1-2 weeks that is needed for the chains to accumulate asufficient number of iterations, this is a lengthy process. To speed up these simulations,we usually start the Markov chains from the true source parameter values which wereused for the software injection. We must be careful, however: in addition to parameter-estimation uncertainties caused by noise, which are quickly measured by seeding thechains with the true parameters, there can also be uncertainties due to degeneracies arameter estimation of spinning binaries using MCMC parallel chains each with a different temperature.Although these serial chains start from the same (true) values, they use a different seedfor the random-number generator and therefore produce different and independentMarkov chains.
The second purpose of the simulations with our MCMC code is to find the true sourceparameters in the case where these are unknown. In order to test the ability of ourMCMC code to do so, we carry out a semi-blind analysis, in which the chains arestarted from offset ( i.e. , non-true) parameter values. For the chirp mass and the timeof coalescence, these starting values are drawn from a Gaussian distribution that iscentred on the value of the injected parameter, with a standard deviation of about0 . M ⊙ and 30 ms respectively. The other ten parameters are drawn randomly fromthe allowed ranges (see e.g. Fig. 1). By selecting the starting values for the chainsthis way, we model the information that will be available after a detection trigger isanalysed by the LIGO-Virgo data-analysis pipeline.When starting chains from offset values, we typically use up to ten serial chains.The reason for using a larger number of chains is that chains may get stuck at a localmaximum in likelihood. In the case of our semi-blind analysis, it is easy to recognisesuch chains, since we can see whether the chains have found the likelihood of theinjected signal. In the case of a real analysis, we need to be sure that the chains havefound the highest likelihood present. One way to do this is by starting multiple serialchains from different positions in parameter space and requiring that they find thesame highest value for the likelihood. Figure 1 shows a run where six chains werestarted from offset values. After about 4 . × iterations, the four black chains haveall found the same likelihood and parameter values, whereas the two grey chains areexploring different parts of parameter space. The latter two chains are clearly at alower value of the likelihood and have therefore found local maxima in parameter space.If the run were continued, the last two chains should eventually find the parametervalues of the injected signal. Figure 2 shows how the parallel chains with different temperatures cooperate in thecase of parallel tempering. The figure shows the likelihood and Markov chains forfour different parallel chains. The chains with
T > . × and 1 . × , may be explained that way. In thefirst jump, the chain moves to the correct value for the chirp mass, in the second jump, arameter estimation of spinning binaries using MCMC Figure 1.
Likelihood and Markov chains for the parameter estimation on asimulated injection. The MCMC run consists of six serial chains plotted in twoshades of grey. All of the four black chains find the parameters of the injectedsignal after ∼ . × iterations, while the two grey chains have found other,local maxima. The figure shows the difference between the logarithm of thelikelihood log L for a waveform with the current parameter values and log L forthe null waveform (upper-left panel), and the chain projections for the chirp mass M c (upper-right), symmetric mass ratio η (lower-left) and spin magnitude a spin (lower-right). All horizontal axes show the iteration number in the chains. Thesix dotted lines in each panel indicate the starting value of each chain of thecorresponding colour. The dashed black lines are the parameter values of theinjected signal, and the corresponding value for the likelihood. One out of ∼ calculated iterations is plotted. the ‘true’ values for the mass ratio and spin magnitude are found. In most MCMCruns, we start 5–10 of these sets of parallel chains, but only save the output for the T = 1 chains.
4. Conclusions and future work
We have developed a Markov-chain Monte-Carlo (MCMC) algorithm that we useto estimate the twelve physical parameters of the gravitational waves emittedduring a spinning compact-binary inspiral that can be detected with ground-basedgravitational-wave observatories like LIGO and Virgo. In Section 2, we discuss manyof the implemented features that are needed to run this code efficiently. In Section 3we show examples of MCMC simulations carried out with our code.We are constantly working on improving the efficiency with which the Markovchains explore the parameter space. In particular, a more efficient sampler speeds up arameter estimation of spinning binaries using MCMC Figure 2.
Likelihood and Markov chains for chains of different temperature ina sample run with parallel tempering. The panels contain the same informationas those in Fig. 1, and the dashed and dotted lines have the same meaning as inthat figure. The chains with
T > T = 1 . T mean ≈ . T mean ≈ .
3) and black squares ( T mean ≈ . ∼ the search for the true modes of the PDFs when the Markov chains are started fromoffset ( i.e. non-true) initial parameter values, as they would be in the case of a realdetection.When the structure of the likelihood function in the parameter space is very rich,and the SNR is high, there are many sharply peaked local maxima of the likelihood.In this case, the MCMC algorithm is likely to get stuck on some of these local maxima,as random jumps are unlikely to lead from one local maximum to another. Coherentchanges in parameters based on an improved analytical understanding of the waveformmay allow us to efficiently traverse the peaks of this complicated parameter space. Weare currently working on gaining a sufficient understanding of the harmonic structureof the waveform, which would allow us to implement such coherent jumps.A part of our effort is directed at understanding degeneracies that exist betweenparameters, especially the sky position and orientation of the binary. For example, wefind cases where the PDF for the sky position is more-or-less uniform over the wholesky, cases where there are multiple regions in the sky where the binary could be, andcases where one unique sky position is resolved. We need to understand better howthese degeneracies depend on the number of detectors in the network and on the exact EFERENCES e.g.
LIGO S5 playground data) and we plan to explorethe possibility of doing follow-up on candidate events that come out of the LIGOdetection pipeline [15; 16]. We have detailed plans to include this MCMC code as afinal stage in the LIGO pipeline, in order to provide a post-processing tool that canbe used after a detection.
Acknowledgements
This work is partially supported by a Packard Foundation Fellowship, a NASA BEFSgrant (NNG06GH87G), and a NSF Gravitational Physics grant (PHY-0653321) toVK; NSF Gravitational Physics grant PHY-0553422 to NC; Royal Society of NewZealand Marsden Fund grant UOA-204 to RM and CR; UK Science and TechnologyFacilities Council grant to AV. Computations were performed on the Fugu computercluster funded by NSF MRI grant PHY-0619274 to VK.
References [1] Abramovici A, Althouse W E and Drever R W P et al
Science
CQG S653–S660[3] Acernese F et al
Classical and Quantum Gravity S381–S388[4] Belczynski K, Taam R E, Rantsiou E and van der Sluys M 2007 (
Preprint astro-ph/0703131 )[5] Gilks W R, Richardson S and Spiegelhalter D J 1996
Markov chain Monte Carloin practice (Chapman & Hall/CRC)[6] R¨over C, Meyer R and Christensen N 2007
Physics Review D http://hdl.handle.net/2292/2356 [8] Apostolatos T A, Cutler C, Sussman G and Thorne K 1994 PRD CQG S409–S415[10] Frame library (Fr) http://lappweb.in2p3.fr/virgo/FrameL [11] Brown D A and the LIGO Scientific Collaboration 2004
CQG et al Bernoulli PRD PRD Preprint0712.2050