Performance of Particle Swarm Optimization on the fully-coherent all-sky search for gravitational waves from compact binary coalescences
PPerformance of Particle Swarm Optimization on the fully-coherent all-sky search forgravitational waves from compact binary coalescences
Thilina S. Weerathunga
Dept. of Physics and Astronomy, University of Texas San Antonio, One UTSA Circle, San Antonio, TX 78249
Soumya D. Mohanty
Dept. of Physics and Astronomy, University of Texas Rio Grande Valley,One West University Blvd., Brownsville, Texas 78520
Fully-coherent all-sky search for gravitational wave (GW) signals from the coalescence of compactobject binaries is a computationally expensive task. Approximations, such as semi-coherent coinci-dence searches, are currently used to circumvent the computational barrier with a concomitant lossin sensitivity. We explore the effectiveness of Particle Swarm Optimization (PSO) in addressing thisproblem. Our results, using a simulated network of detectors with initial LIGO design sensitivitiesand a realistic signal strength, show that PSO can successfully deliver a fully-coherent all-sky searchwith < /
10 the number of likelihood evaluations needed for a grid-based search.
I. INTRODUCTION
The detection of gravitational waves, announced in2016 by the LIGO-Virgo Scientific Collaboration[1], haslaunched the new era of gravitational wave (GW) astron-omy. The detected signals, known as GW150914 [2, 3]and GW151226 [4], are best matched by Compact BinaryCoalescence (CBC) sources consisting of the inspiral andmerger of black holes. The signals were detected by thetwo aLIGO [5] detectors, which are currently undergo-ing commissioning to reach their design sensitivity. Overthe next few years, aLIGO will be joined by a worldwidenetwork of comparable sensitivity detectors, namely, ad-vanced Virgo [6], KAGRA [7] and LIGO-India [8]. Com-bining the data from this network of geographically dis-tributed second generation detectors will lead to a betteroverall sensitivity for CBC signals along with better lo-calization on the sky [9]. Prompt localization will enablethe study of such events using multiple messengers ofinformation [10].Given that theoretically computed waveforms for CBCsignals are sufficiently reliable over a broad parameterrange [11], it is natural to use the Generalized Likeli-hood Ratio Test (GLRT) and Maximum-Likelihood Es-timation (MLE) [12] for the detection and estimation, re-spectively, of such signals. However, both of these meth-ods involve a computationally expensive non-linear andnon-convex numerical optimization problem. Applied tothe data from a network of detectors, the MLE/GLRTapproach – called a fully-coherent search – requires the lo-calization of the global maximum of the likelihood over anat least nine dimensional parameter space [13], where thecomputation of the likelihood at each point requires cor-relations between pairs of time series involving ∼ O (10 )(for initial LIGO) to O (10 ) (for aLIGO) samples. Abrute force grid-based search for the global maximum isestimated to require ∼ × likelihood evaluations overthe low component mass range of 1 to 3 M (cid:12) in the caseof initial LIGO [14], with the number becoming substan-tially higher in the case of aLIGO. (The cost of a grid- based search is dominated by the exploration of the lowmass range due to longer signal durations [14].)The computational bottleneck in the fully-coherentsearch for CBC signals has restricted the scope of its ap-plicability so far. Fully-coherent search has either beenused only for targeted sky locations [15] or has been ap-proximated by semi-coherent all-sky searches [16]. Semi-coherent searches reduce the computational burden bydownselecting the number of data segments to analyze ina fully-coherent step. The downselection is based on re-quirements such as the simultaneous crossing of detectionthresholds [17] in at least two single-detector (incoherent)searches and closeness of the estimated signal parame-ters. As shown in [18], a semi-coherent search trades-offa significant amount of sensitivity for the reduced com-putational cost, with the detection volume being ∼ O (5 × ) iterations (likelihood eval-uations) in order to converge reliably. This is again toohigh a computational cost to allow coverage of all dataand MCMC methods have been used only as a parameterestimation step following a candidate detection.Fast methods for producing rapid estimates of sky lo-cation of a CBC source have been developed [21–23]. Inthese methods, the subset of so-called intrinsic signal pa-rameters, which are responsible for the bulk of the costof a grid search, are simply replaced by the values es- a r X i v : . [ g r- q c ] M a y timated in the computationally cheaper single detectorsearches. Under this approximation, the cost of evaluat-ing the likelihood over the remaining parameters can bereduced significantly. Other studies [24, 25] have shownthat it is possible to find alternative representations ofCBC signals that speed up the computation of the like-lihood function.Particle Swarm Optimization (PSO)[26] is a global op-timization method that has proven to be effective acrossa wide range of application areas [27] including astron-omy, such as CMBR analysis [28] and gravitational lens-ing [29]. In the context of GW data analysis, PSO wasfirst used in [30] for the single detector CBC search prob-lem where it was shown to be effective in locating theglobal maximum of the likelihood despite its ruggedness.In Pulsar Timing Array (PTA) based GW detection,PSO was used successfully in optimizing an extremelyrugged likelihood function over a 12-dimensional searchspace [31]. Other successful applications of PSO in GWdata analysis are growing at a steady rate [32].In this paper, we explore the effectiveness of PSO ina fully-coherent all-sky CBC search. For the purpose oftesting PSO, we take a four-detector network, each hav-ing the initial LIGO design noise Power Spectral Density(PSD) [33], and use the 2-PN binary inspiral waveform.We investigate the effectiveness of the GLRT and MLEas found by PSO for detection and parameter estimationrespectively.The rest of the paper is organized as follows. Sec IIdescribes the data and signal models used in the paper.Sec III presents the objective function to be optimizedin a fully-coherent all-sky search. The PSO algorithm isdescribed in Sec IV. The simulation setup and results aredescribed in Sec V. Our conclusions are presented in SecVI. II. DATA AND SIGNAL MODELS
In the following, the Fourier transform of a function, a ( t ), of time is denoted by (cid:101) a ( f ). The strain time seriesrecorded by the I th GW detector in a network of N de-tectors is x I ( t ) = h I ( t ) + n I ( t ); (1)where h I ( t ) is the detector response to the incident GWand n I ( t ) denotes detector noise. We will assume that n I ( t ) is a realization of a zero-mean, stationary Gaussianstochastic process, E [ n I ( t )] = 0; (2) E [˜ n I ( f ) (˜ n I ( f (cid:48) )) ∗ ] = 12 S n ( f ) δ ( f − f (cid:48) ) (3)with S n ( f ) denoting the one-sided noise power spectraldensity (PSD). It does not carry a detector index in thispaper because we assume identical PSD for all the detec-tors. We use the Earth Centered Earth Fixed Frame(ECEF) [34] to define the geometry of the GW detectornetwork and sources. In terms of the TT-gauge polariza-tion waveforms h + ( t ) and h × ( t ), the strain response ofthe I th detector is given by, h I ( t ) = F I + ( α, δ, ψ ) h + ( t − ∆ I )+ F I × ( α, δ, ψ ) h × ( t − ∆ I ) . (4)Here, α , δ are the azimuthal and polar angles that definethe unit vector pointing to the source, (cid:98) n = (cos α cos δ, sin α cos δ, sin δ ) , (5)with the direction of propagation of the GW plane wavebeing − (cid:98) n . The polarization angle ψ gives the orientationof the wave frame axes orthogonal to (cid:98) n with respect tothe fiducial basis, e Rx = (sin α, − cos α, , (6) e Ry = ( − cos α sin δ, − sin α sin δ, cos δ ) , (7)in the same plane. For CBC sources, the minor and majoraxes of the ellipse formed by the projection of the orbitof the binary on the sky provide the preferred orientationfor the wave frame axes. For generic sources, ψ can beset to zero. ∆ I is the time delay between the arrival ofthe signal at the ECEF origin and the detector,∆ I = r I . (cid:98) nc , (8)where r I is the position vector of the detector in theECEF. F I + and F I × are called the antenna pattern functionsand they are given by, (cid:18) F I + F I × (cid:19) = (cid:18) cos 2 ψ sin 2 ψ − sin 2 ψ cos 2 ψ (cid:19) (cid:18) U I + U I × (cid:19) . (9)Here, U I + and U I × depend on ( α, δ ) [35] and are definedby, U I + = ↔ (cid:15) + : ↔ d I ; U I × = ↔ (cid:15) × : ↔ d I ; (10)where ↔ a denotes a tensor and : denotes the contrac-tion operation on tensors. The polarization basis tensors ↔ (cid:15) + , × are given by, ↔ (cid:15) + = e Rx ⊗ e Rx − e Ry ⊗ e Ry , (11) ↔ (cid:15) × = e Rx ⊗ e Ry + e Ry ⊗ e Rx , (12)while ↔ d I is the detector tensor ↔ d I = 12 ( (cid:98) u I ⊗ (cid:98) u I − (cid:98) v I ⊗ (cid:98) v I ) , (13)where ( (cid:98) u I , (cid:98) v I ) are unit vectors along the arms of thedetector. Hanford Livingston Virgo Kagra u x -0.2239 -0.9546 -0.7005 -0.4300 u y u z v x -0.9140 0.2977 -0.0538 0.6821 v y v z -0.4049 -0.8205 0.2408 0.7292 x -2.1614e+6 -7.4276e+4 4.5464e+6 -3.7769e+6 y -3.8347e+6 -5.4963e+6 8.4299e+5 3.4839e+6 z TABLE I: Location and orientation vectors for thedetectors used in this paper. The vector componentsare specified in the ECEF. The components ( u x , u y , u z )and ( v x , v y , v z ) are for the unit vectors (cid:98) u I and (cid:98) v I alongthe detector arms. The components ( x, y, z ) of theposition vector r I are in meters. The data in this tablewas obtained from [39, 40].The relation between the responses of all the detec-tors in a network, with each appropriately time-shiftedto compensate for the delay ∆ I , to the incoming GWstrain signal can be expressed in the following compactform. h ( t ) h ( t )... h N ( t ) = F (cid:18) h + ( t ) h × ( t ) (cid:19) , (14)where the I th row of F , called the antenna pattern ma-trix , contains ( F I + , F I × ). It is know that F can becomerank-deficient for certain parts of the sky, leading to anill-posed inverse problem that can have a significant ef-fect [36–38] on parameter estimation errors. The rank-deficiency of F is quantified in terms of its condition num-ber.In this paper, we use a four-detector network consistingof the two LIGO detectors at Hanford (H) and Livingston(L), Virgo (V) and Kagra (K). We assume the initialLIGO design PSD [33] for the noise in each detector.The orientations and locations of the detectors, providedin Table I, match their real-world values. Our choice of the initial LIGO PSD allows data real-izations to be considerably shorter in length than whatis needed for the PSDs of advanced detectors. This re-duces computational costs, which is appropriate for a firstinvestigation of PSO in the context of a fully coherentsearch. A. Restricted 2-PN signal
The signal model used in this paper is the restricted2-PN waveform from a circularized binary consisting ofnon-spinning compact objects. The phase of the signalis calculated up to order ( v/c ) in the post-Newtonianexpansion but the amplitude modulation is calculatedonly at the lowest (Newtonian) order [41].The GW polarization waveforms can be expressed con-veniently in the Fourier domain using the stationaryphase approximation [11], (cid:101) h + ( f ) = A f r (1 + cos ι )2 f − / exp[ − i Ψ( f )] (15) (cid:101) h × ( f ) = A f r cos ιf − / exp[ − i (Ψ( f ) + π/ f )is given by,Ψ( f ) = 2 πf t c − φ c − π/ (cid:88) j =0 α j (cid:18) ff ∗ (cid:19) ( − j ) / . (17)The functional form of α j is given in Appendix A.They only depend on the component masses m and m through the chirp times τ and τ . . t c is the time atwhich the end of the inspiral signal arrives at the ECEForigin. The amplitude A f depends on the componentmasses m and m . The coalescence phase of the signalis given by φ c . It is possible to absorb r , φ c , ψ and ι ina new set of parameters A k , ( k = 1 , . . . , h I ( t ) = (cid:88) k =1 A k h kI ( t ) , (18)with, A = 1 r (cid:18) β cos φ c cos 2 ψ − cos ι sin φ c sin 2 ψ (cid:19) , A = − r (cid:18) β sin φ c cos 2 ψ + cos ι cos φ c sin 2 ψ (cid:19) , (19) A = 1 r (cid:18) β cos φ c sin 2 ψ + cos ι sin φ c cos 2 ψ (cid:19) , A = − r (cid:18) β sin φ c sin 2 ψ − cos ι cos φ c cos 2 ψ (cid:19) , (20)where β = (1 + cos ι ) /
2. Here, the waveforms h kI , k = 1,...,4, are defined as h I ( t ) = U I + h c ( t − ∆ I ) , h I ( t ) = U I × h c ( t − ∆ I ) ,h I ( t ) = U I + h s ( t − ∆ I ) , h I ( t ) = U I × h s ( t − ∆ I ) , with, (cid:101) h c ( f ) = A f f − / exp[ − i Ψ( f ) | φ c =0 ] , (21) (cid:101) h s ( f ) = − i (cid:101) h c ( f ) . (22) III. FULLY-COHERENT ALL-SKY SEARCH
Under our assumption of Gaussian, stationary noise,the log-likelihood Ratio (LLR) [42] for the I th detectoris given by, ln λ I = (cid:104) x I | h I (cid:105) − (cid:104) h I | h I (cid:105) , (23)where, (cid:104) p | q (cid:105) = 4 Re (cid:90) ∞ df (cid:101) p ( f ) (cid:101) q ∗ ( f ) S n ( f ) . (24)If we assume the noise in different detectors to be statis-tically independent, the log-likelihood for an N detectornetwork is given by,ln λ (N) = N (cid:88) I =1 (cid:20) (cid:104) x I | h I (cid:105) − (cid:104) h I | h I (cid:105) (cid:21) . (25)Substituting from Eqs. 18, 20, and 21 we get,ln λ ( N ) = A T H −
12 A T MA (26)where A = ( A , A , A , A ) T , H = ( H , H , H , H ) T ,and H a = N (cid:80) I =1 (cid:104) x I | h Ia (cid:105) for a = 1 , . . . ,
4. The matrix M isgiven by M = A B
B C
A B
B C (27)where, A = N (cid:88) i =1 U I + U I + (cid:104) h Ic | h Ic (cid:105) (28) B = N (cid:88) i =1 U I + U I × (cid:104) h Ic | h Ic (cid:105)C = N (cid:88) i =1 U I × U I × (cid:104) h Ic | h Ic (cid:105) It follows that, for a given data realization, the log-likelihood is a function of the parameters A and θ = { τ , τ . , α, δ, t c } .The GLRT statistic is the global maximum of the LLRover the parameters A and θ . Following [16], it is denotedby the equivalent statistic ρ coh , ρ = 2 max A ,θ ln λ ( N ) , (29) called the coherent search statistic . The MLE, (cid:98) A and (cid:98) θ ,of A and θ respectively are the maximizers.The Maximization of the log-likelihood can be carriedout as, ρ = max θ γ ( θ ) , (30) γ ( θ ) = 2 max A ln λ ( N ) . (31)The inner maximization over A can be performed ana-lytically, giving (cid:98) A = M − H , (32)as the solution. Then, γ ( θ ) = H T M − H . (33)The outer maximization over θ must be carried out nu-merically. For fixed { τ , τ . , α, δ } , the maximization over t c , can be carried out very efficiently using the FastFourier Transform (FFT) since (cid:104) p | q ( t − t c ) (cid:105) is a corre-lation operation [16]. We callΓ (Θ) = max t c γ ( θ ) ⇒ ρ = max Θ Γ (Θ) , (34)with Θ = { τ , τ . , α, δ } , the coherent fitness function. Its maximization over Θ is the main challenge in the im-plementation of a fully coherent search. In this paper,we use PSO, described next, to carry out this task.
IV. PARTICLE SWARM OPTIMIZATION
PSO is an optimization method derived from a simpli-fied mathematical model of the swarming behavior ob-served in nature across many species. It uses a fixednumber of samples (called particles ) of the function tobe optimized (called the fitness function ). The particlesare iteratively moved in the search space following a setof dynamical equations.
A. PSO algorithm
To provide a rigorous description of PSO, we adopt thefollowing notation in this section. • f ( x ): the scalar fitness function to be optimized,with x = ( x , x , . . . , x d ) ∈ R d . In our case, x = Θ, f ( x ) is the coherent fitness function Γ (Θ) (c.f.,Eq. 34) and d = 4. • S ⊂ R d : the search space defined by the hypercube a i ≤ x i ≤ b i , i = 1 , , . . . , d . • N p : the number of particles in the swarm. • x i [ k ]: the position of the i th particle at the k th iteration. • p i [ k ]: the best location found by the i th particleover all iterations up to and including the k th . f ( p i [ k ]) = max j ≤ k f ( x i [ j ]) . (35) • p g [ k ]: the best location found by the swarm overall iterations up to and including the k th . f ( p g [ k ]) = max ≤ j ≤ N p f ( x j [ k ]) (36)The PSO dynamical equations are as follows. v i [ k + 1] = w [ k ] v i [ k ] + c r ( p i [ k ] − x i [ k ]) + c r ( p g [ k ] − x i [ k ]) . (37) x i [ k + 1] = x i [ k ] + z i [ k + 1] , (38) z ji [ k ] = v ji [ k ] , − v j max ≤ v ji [ k ] ≤ v j max − v j max , v ji [ k ] < − v j max v j max v ji [ k ] > v j max , (39)Here, v i [ k ] is called the “velocity” of the i th particle, w [ k ]is a deterministic function known as the inertia weight(see below), c and c are constants, and r i is a diagonalmatrix with independent, identically distributed randomvariables having a uniform distribution over [0 , cognitive and social terms respectively.The iterations are initialized at k = 1 by indepen-dently drawing (i) x ji [1] from a uniform distribution over[ a j , b j ], and (ii) v ji [1] from a uniform distribution over[ − v j max , v j max ].The behavior of particles must be prescribed when theyexit the search space. We adopt the standard “let themfly” boundary condition under which a particle outsidethe search space is assigned a fitness values of −∞ . Sinceboth p i [ k ] and p g [ k ] are always within the search space,such a particle is eventually dragged back into the searchspace by the cognitive and social terms. The stoppingcriterion adopted is simply that a fixed number, N iter , ofiterations are completed. B. Assessing convergence
The conditions for a stochastic optimizer to convergeto the global optimum [43] require that (a) every measur-able subset of the search space be visited at least once,and (ii) the fitness value at any iteration is equal or betterthan the value at the previous one. There are no stochas-tic optimization methods, including PSO, that pass theseconditions in a finite number of iterations. Hence, con-vergence to the global maximum is not guaranteed. Notethat this does not mean that PSO cannot converge to theglobal maximum. It simply means that we can never besure if it has found the global maximum or not. One canonly talk about the probability of convergence in a givenrun of PSO. A simple way to increase this probability to near certainty is to do a sufficient number of independentruns of PSO. If the probability of convergence in a singlerun is P conv , then the probability of not converging inany of N runs independent runs is (1 − P conv ) N runs .Another issue with using a stochastic optimizationmethod like PSO is that it is not directly possible toverify that the end result of PSO is actually close to theglobal maximum or not. This is because verification canonly be done using a grid search and, for the problem ofinterest, a grid search may simply be infeasible. (The-oretical studies of PSO use benchmark fitness functionsfor which the location of the global optimum is knownby construction.) However, as pointed out in [30], thereexists an indirect way to check if PSO is performing sat-isfactorily in the case of likelihood maximization withdata containing a signal injected with known parameters.Since any parameter estimation method incurs an esti-mation error caused by the shift of the global maximumaway from the true signal parameters, the global maxi-mum of the fitness must be higher than the fitness valueat the true signal parameters, the latter being known forsimulated data. Thus, we can find out if PSO is doing asatisfactory job or not by checking that it yields a fitnessvalue greater than the one at the true signal parameters. C. PSO tuning
Stochastic optimizers such as PSO need to trade offwide-ranging exploration of the search space against ex-ploitation of a good candidate location. These two phasesare in conflict with each other, requiring a proper bal-ance in the relative time spent in each phase. In gen-eral, more exploration leads to higher computational costwhile making it too short leads to premature convergenceto a local maximum. Fig. 1 shows the global best fitnessvalue evolution for the coherent network analysis prob-lem. One can see how PSO initially converges rapidlyduring the exploration phase and then slows down whileit searches for the best value in a small region during theexploitation phase.In the version of PSO described here, the main pa-rameter controlling the transition from exploration to ex-ploitation is the inertia weight. In this paper, the inertiaweight is chosen to decay linearly from a value w max at k = 1 to w min at k = N iter .An attractive feature of PSO is the apparent robust-ness of its parameter values across a wide range of op-timization problems [44]. This greatly reduces the ef-fort needed to tune the algorithm for satisfactory per-formance. We find that, in the optimization problemconsidered here, fairly standard settings [45] for the PSOparameters work well. The values used for these param-eters are N p = 40, c = c = 2 . w max = 0 .
9, and w min = 0 . N iter . To perform the tuning, we examinedthe evolution of the the global best fitness f ( p g [ k ]) as aFIG. 1: Evolution of the mean and standard deviationof the coherent search statistic for a single datarealization. The blue line is the average over 225independent PSO runs, while the red curves show the1 σ standard deviation. The data realization contains asignal with a coherent network SNR (defined in Eq. 40)of 9 . k . The tuning process starts by picking an N iter value that is sufficiently deep in the exploitationphase based on a curve such as Fig. 1. We then do 12independent runs of PSO with this value of N iter and findthe fraction of runs in which the final global best fitnessexceeds the fitness at the true signal parameters. Thisgives an estimate of P conv , the probability of successfulconvergence. We increase N iter until P conv (cid:39) .
3, whichgives a probability of failure in 12 independent PSO runsof 0 . P conv is estimated using only12 trials, it is not very accurate. The actual probabilityof successful convergence is discussed in Sec. V. Based onthis tuning procedure, we set N iter = 500.It is important to note that the PSO algorithm pre-sented here is considered to be one of the most basicamong the general class of algorithms that have been pro-posed under the PSO meta-heuristic [46]. An importantvariant, for example, is the use of a neighborhood bestlocation [47] instead of the global best p g [ k ]. Anothervariant [27] applies a constriction factor to the equationfor v i [ k ] instead of clamping its components to the inter-val [ − v max , v max ]. We did not find it necessary to explorethese other variants because the basic version of PSO ap-pears to do a satisfactory job. V. RESULTS
We test the performance of PSO using simulated re-alizations of data for the HLVK network described inSec. II. For each data realization, N runs = 12 indepen-dent PSO runs are carried out. The result for each datarealization corresponds to the output from the run thatachieves the best final value of the coherent search statis-tic ρ coh . The independent PSO runs are executed in par-allel, and the choice of N runs = 12 arises from the 12processing cores per node in the computing cluster that TABLE II: Source sky locations used in thesimulations. The first column lists the label assigned tothe location. The second and third columns list theazimuthal and polar angles of the source location in theECEF. The condition number of the antenna patternmatrix at each chosen source location is listed in thelast column. Label α (deg) δ (deg) log (Condition number)L1 80.79 -29.22 0.9711L2 -128.34 -33.80 0.6079L3 -81.93 13.75 0.6008L4 32.09 -53.86 3.2436L5 150.11 -60.16 0.0066L6 -122.61 41.25 1.6159 was used for the analysis.The simulated signals are normalized to have a speci-fied coherent network SNR (SNR coh ), defined as,SNR coh = γ ( θ true ) = (cid:104) H T M − H | ˆ A ,θ true (cid:105) / , (40)where θ true denotes the values of { τ , τ . , α, δ, t c } associ-ated with the signal to be normalized. SNR coh is relatedto the optimal network signal to noise ratio (SNR opt ) by,SNR opt (cid:39) √ coh where SNR opt is defined as,SNR opt = E [ln λ ( N ) | H ] − E [ln λ ( N ) | H ] (cid:2) E [(ln λ ( N ) − E [ln λ ( N ) | H ]) | H ] (cid:3) / (41)= (cid:34) N (cid:88) I =1 (cid:104) h I | h I (cid:105) (cid:35) / . (42)Here, E [ X | A ] denotes the conditional expectation of arandom variable X given condition A . H and H corre-spond to the cases where a signal is, respectively, absentor present in the data. Normalization using SNR opt as-sumes the best-case scenario where all the signal param-eters, including A , are known a priori , while normal-ization with SNR coh relaxes this unrealistic assumptionsomehwhat.We pick several combinations of binary componentmasses and source sky locations to generate data re-alizations containing signals. We label these combina-tions using the scheme M a L b , where a ∈ { , } and b ∈ { , , , , , } . M1 and M2 refer to the pair of binarycomponent masses (1 . M (cid:12) , . M (cid:12) ) and (4 . M (cid:12) , . M (cid:12) )respectively. L b refers to the source sky location, forwhich the values used are listed in Table II.For each set, M a L b , of parameters, 120 data realiza-tions are generated. In all cases, the signals are normal-ized to have SNR coh = 9 . opt = 12 . ι = 0. As such, ψ gets absorbed into the coales-cence phase parameter which is set to φ c = π/ F . Since the antenna pattern functions depend on thesky location of a source, so does the condition number.The sky locations chosen in Table II sample the conditionnumber across a range of values, corresponding to a wellconditioned (low) to poorly conditioned (high) inverseproblem.For each realization, the data is generated directly inthe Fourier domain with a frequency spacing of 0.0156 Hzbetween consecutive bins and a maximum (Nyquist) fre-quency of 1024 Hz. In the time domain, this correspondsto a data segment duration of 64 sec sampled at 2048 Hz.The chirp times corresponding to the two sets of massesused in the simulation are ( τ , τ . ) = (24 . , . . , . , . τ , (ii)[0 , . τ . , (iii) [ − , α ) and (iii)[ − ,
90] degrees for declina-tion ( δ ). A. Detection performance
It is important to note that the value of the coher-ent search statistic, ρ coh , found by PSO need not be theactual value, namely, the true global maximum of thelog-likelihood ratio. Hence, it is important to verify that ρ coh as found by PSO performs well in terms of detection.Fig. 2 shows the distribution of ρ coh found by PSO fordata realizations corresponding to (i) the null hypothesis( H ): signal absent, and (ii) the alternative hypothesis( H ): signal present. For the latter, we have combinedthe ρ coh values for the 12 source parameter values, M a L b ,used in the generation of data realizations containing sig-nals. For the number of data realizations used in oursimulations, there is no overlap between the distributionof ρ coh for H and H . This suggests that ρ coh found byPSO performs quite well as a detection statistic and thatit is an acceptable surrogate for the true coherent searchstatistic.Fig. 3 shows a scatterplot of ρ coh found by PSO againstthe coherent fitness function at the true signal location,Γ(Θ true ), where Θ true denotes the known parameters ofthe signal injected into the data realization. As discussed in Sec. IV B, ρ coh ≥ Γ(Θ true ) indicates that PSO haslikely found the global maximum. We find that this con-dition is satisfied in 93 .
4% of the total number of datarealizations when the number of independent PSO runsis set to N runs = 12. Quantifying the departure fromthis condition in terms of q = (1 − ρ coh / Γ(Θ true )) × . . .
7% of all trials satisfy q ≤ ≤ ≤
10% respectively. When the number ofPSO runs is set to N runs = 24 for the realizations thatshowed a departure from the above condition, the vastmajority ended up satisfying the condition again, lead-ing to 99 . . q ≤ ≤ ≤
10% respectively.While a violation of the condition ρ coh ≥ Γ( θ true )means a failure to locate the global maximum, the dropin ρ coh found by PSO from its true value may be smallenough that the detection threshold is still crossed. Thiswould result in the detection of a signal, although it maynot provide a good estimate of its parameters. The lossin detection probability can be estimated in terms of thefraction of realizations in which ρ coh found by PSO fellbelow a given detection threshold while Γ(Θ true ) stayedabove it. Since the true ρ coh is always greater thanΓ(Θ true ), the latter condition implies that a detectionwould have resulted if the true ρ coh were available. For adetection threshold η , we find that the fractional loss indetection probability, given by the number of realizationswhere ρ coh ≤ η and Γ(Θ true ) ≥ η relative to the numberwhere Γ(Θ true ) ≥ η , is between (cid:39) .
9% to (cid:39) .
5% for8 . ≤ η ≤ . B. Estimation performance
The performance of PSO in estimating the sky loca-tion of a source is shown in Fig. 4, with zoomed in viewsshown in Figs. 5 and 6. It is evident from Fig. 4 thatthe distribution of sky localization error is strongly in-fluenced by the condition number of the antenna patternmatrix (c.f. Eq. 14) at the true location. A source loca-tion with a low condition number tends to have a smallpositional error.We use the area 16 σ α σ δ cos δ of the box of side length4 σ α and 4 σ δ , where σ α and σ δ are the standard deviationsin estimates of α and δ respectively, as a simple measureof position error. For the sources M1L5 and M2L5 thathave the lowest condition number, the position errors are10 .
43 and 7 .
32 deg respectively. For the highest condi-tion number location (L4), we get 152 .
45 and 140 .
39 deg for M1 and M2 respectively. The highest position erroroccurs not at the highest condition number but at L6,which is the second highest. However, our simple mea-sure of position error is wholly inapplicable to these ex-treme locations because the error is distributed along astretched out region. A proper estimation of errors forextreme condition numbers requires a much larger num-ber of data realizations in order to map out the errorregion with a sufficient density of sample points. ThisFIG. 2: Distribution of coherent search statistic, ρ coh , values found by PSO. Left:
Noise-only case obtained from1024 independent data realizations.
Right:
Signal plus noise for SNR coh = 9.
For the latter, we have pooled together the values of ρ coh for all the 12 source parameter value sets used in the simulations.This leads to 1440 independent trial values of ρ coh for the histogram on the right. is postponed to future work pending ongoing work onincreasing the computational efficiency of our codes.Table III summarizes the marginal distribution of er-rors up to the second moment for all the signal param-eters constituting the PSO search space. No clear trendemerges for the dependence on condition number of theerrors in the chirp time parameters τ and τ . . It islikely that resolving a dependence, if any, requires a sig-nificantly larger number of data realizations. For com-pleteness, the marginal distributions are shown in Figs. 7and 8. Table IV lists the sample correlation coefficientsbetween pairs of parameters. The sample correlation co-efficient between τ and τ . is ≥ . α and δ , and chirp timeparameters is negligible. While this result, which strictlyholds only for asymptotically large SNRs, is borne outby our simulation for the majority of cases, there aresome sky locations for both the M1 and M2 sets wherethis does not hold. For example, for M1, there are twolocations, L6 and L2, where the sample correlation co-efficients are − .
452 and 0 .
416 respectively, while it islow (absolute value (cid:46) .
24) elsewhere. Thus, the Fisherinformation may not be a good predictor of covariancesbetween parameters for all source locations.
C. Computational cost
Obtaining the coherent fitness value for each PSO par-ticle is the computationally most expensive step. Thecalculation of a single fitness value requires, (i) the gen-eration of two template waveforms (Eq. 21, Eq. 22) inthe Fourier domain, (ii) taking a sample-wise product ofthe data with each of the template waveforms (Eq. 24),and (iii) taking the inverse FFT of each such productsequence [16]. The computational cost of each fitnessevaluation in the PSO based approach is identical tothose of other stochastic optimization algorithms, suchas MCMC, that have been used for fully-coherent all-skysearch.Among the above operations, the generation of tem-plate waveforms is the computationally most expensivestep [50]. In situations where a grid-based search for theglobal maximum is computationally feasible, templatewaveforms can be pre-computed and stored in advance,thus saving the cost of generating waveforms on-the-fly.Stochastic search algorithms do not use pre-computedwaveforms and, hence, must contend with this extra cost.Several schemes [51–54] have been constructed to speedup waveform generation but we have not implementedany of these in our code so far. Besides this, our code iswritten entirely in Matlab and suffers a penalty in speedas a result. Thus, the wall-clock execution time of ourcode is not the correct metric to use for judging the com-putational savings brought about by PSO. The only use-ful metric for comparing PSO with other methods is theTABLE III: Sample mean and standard deviation (SD) of signal parameter estimates. The estimates of the binarycomponent masses, m and m , are derived from those of the chirp time parameters τ and τ . using the relationsgiven in A. τ (s) τ . (s) α (deg) δ (deg) m ( M (cid:12) ) m ( M (cid:12) )Source Mean SD Mean SD Mean SD Mean SD Mean SD Mean SDM1L1 24.848 0.017 0.866 0.018 80.78 1.13 -28.90 1.79 1.491 0.138 1.299 0.089M1L2 24.846 0.048 0.863 0.036 -128.61 2.57 -34.40 3.68 1.502 0.161 1.282 0.099M1L3 24.848 0.023 0.863 0.022 -82.03 0.64 13.55 1.59 1.471 0.151 1.311 0.089M1L4 24.849 0.024 0.866 0.025 31.73 2.09 -51.02 7.73 1.506 0.174 1.284 0.104M1L5 24.849 0.018 0.866 0.018 150.09 1.56 -60.10 0.84 1.499 0.145 1.292 0.092M1L6 24.849 0.022 0.865 0.023 -120.20 10.96 42.37 8.95 1.498 0.157 1.291 0.098M2L1 9.750 0.023 0.728 0.025 80.61 2.25 -29.06 3.26 4.594 0.266 1.406 0.077M2L2 9.747 0.023 0.725 0.022 -128.41 1.47 -34.17 2.09 4.559 0.247 1.415 0.080M2L3 9.748 0.016 0.726 0.018 -82.02 0.65 13.52 1.79 4.570 0.189 1.410 0.052M2L4 9.752 0.015 0.731 0.016 31.83 2.15 -51.74 6.92 4.623 0.164 1.396 0.042M2L5 9.750 0.023 0.728 0.021 150.08 1.46 -60.11 0.63 4.590 0.239 1.407 0.088M2L6 9.748 0.025 0.726 0.022 -122.14 5.27 41.26 7.85 4.563 0.272 1.415 0.103 FIG. 3: Comparsion of the coherent search statistic ρ coh found by PSO with the coherent fitness valueΓ(Θ true ) at the true signal parameters, Θ true . Each dotcorresponds to one data realization, from a total of 1440realizations across all the source parameters used.Dashed lines show the 3%,5%, and 10% drop from thecoherent fitness value. Black dots indicate datarealizations for which ρ coh < Γ(Θ true ) with N runs = 12indpendent PSO runs, but recovered to ρ coh ≥ Γ(Θ true )when N runs = 24. The total number of points below thediagonal is 95. TABLE IV: Sample pair-wise correlation coefficients ofparameters. The parameter pairs are listed in theheadings of the columns. Source ( τ , τ . ) ( α, δ ) ( τ , α ) ( τ , δ ) ( τ . , α ) ( τ . , δ )M1L1 0.933 -0.618 0.152 -0.111 0.165 -0.096M1L2 0.955 0.725 0.223 0.435 0.105 0.310M1L3 0.959 0.174 -0.231 -0.179 -0.263 -0.163M1L4 0.972 -0.791 -0.151 0.020 -0.144 0.051M1L5 0.951 -0.426 -0.206 0.120 -0.229 0.150M1L6 0.926 0.806 -0.452 -0.304 -0.345 -0.253M2L1 0.970 -0.891 0.054 -0.025 0.045 -0.028M2L2 0.947 -0.003 0.416 0.171 0.259 0.222M2L3 0.935 0.305 -0.067 -0.034 -0.070 -0.011M2L4 0.928 -0.794 0.011 0.081 0.058 0.031M2L5 0.956 -0.147 0.093 -0.122 0.024 -0.095M2L6 0.959 0.821 0.018 0.056 0.015 0.067 total number of fitness evaluations. Given the same codefor evaluating the fitness function, algorithms that get byusing a smaller number of evaluations will automaticallyhave a better execution speed.Since the termination criteria used for PSO in thispaper is simply the number of iterations, there is anupper limit to the total number of fitness evaluationsof 40(number of particles) × × . × . However, the ac-tual number of fitness evaluations is generally lower be-cause of the boundary condition used, which allows par-ticles to escape the search space. Particles outside thesearch space do not evaluate their fitness until they arepulled back in. Thus, the number of fitness evaluations0FIG. 4: Estimated sky locations for the mass sets M1 ( m = 1 . M (cid:12) , m = 1 . M (cid:12) ) and M2( m = 4 . M (cid:12) , m = 1 . M (cid:12) ). The true sky locations are indicated with yellow dots and listed in Table II. Estimatedsky locations associated with a particular true location have the same color. Open circles with dots correspond tothe data realizations where the coherent search statistic found by PSO failed to exceed the coherent fitness at thetrue signal parameter, ρ < Γ( θ true ), even for N runs = 24 independent PSO runs. The background gray-scale image inboth panels is identical and shows the condition number sky map for the HLVK network corresponding to ψ = π/ Min Max Mean SDSignal Absent ( H ) 163908 230808 210920 10758Signal Present ( H ) 193620 232716 221875 5373 can fluctuate across data realizations. Table V lists astatistical summary of the number of fitness evaluationsobtained across all data realizations and all sources.We see that the mean number of fitness evaluations isslightly lower in the case where a signal is absent. Thus,particles have an enhanced tendency to exit the searchspace boundary when a signal is absent. This may bea result of the fact that the contrast between the val-ues at the local maxima of the fitness function is lesspronounced in this case. Since, most of the data from aGW detector consists of only noise, the fitness evaluationcount for the signal absent case is more representative ofthe computational cost that will be incurred in practice. VI. CONCLUSION
This paper presents a study of a PSO based approachto solving the computational challenge, stemming fromthe necessity to carry out a high dimensional numericaloptimization task, in a fully-coherent all-sky search forCBC signals.At an astrophysically realistic signal strength (e.g., theSNR opt used here matches SNR opt = 13 for GW151226),we find that the best fitness value returned by PSO canapproximate the GLRT quite effectively, suffering ≤ . < /
10 the num-ber of likelihood evaluations needed for the grid-based orBayesian searches. It is important to emphasize here thatwe are not altering the standard representation of CBCwaveforms or approximating the likelihood function inany way. Any alternative scheme for likelihood or wave-form calculation can be substituted without affecting thePSO algorithm itself.A comparsion of our parameter estimation results withtheoretical bounds derived from a Fisher informationanalysis is not meaningful at the SNR value consid-ered in this paper. This is because several studies haveshown [55, 56] that these bounds are reached only at sig-nificantly higher SNR opt values.A direct comparison with existing parameter estima-tion results from Bayesian approaches is difficult, sincethe definition of error in a Bayesian analysis differs from1FIG. 5: Estimated sky locations (red dots) associated with the set M1 ( m = 1 . M (cid:12) , m = 1 . M (cid:12) ) of sources. Ineach panel, the origin is centered at the true location of the source. The axes show the deviation of the estimatedvalues of α and δ from their true values. Each panel also shows the contour levels of the bivariate probabilitydensity function, estimated using Kernel Density Estimation (KDE), [58] that enclose 68% and 95% of the points.In these figures, the view has been zoomed in to show only the estimated locations that fall within or around theouter contour.the Frequentist one. Error in a Bayesian analysis is ameasure of the spread of the posterior probability dis-tribution. The latter can be obtained even for a singledata realization. The Frequentist error is a measure ofthe spread of the point estimates over an ensemble ofdata realizations. Nonetheless, pending a future apples-to-apples comparison of Bayesian and Frequentist errorson identical data realizations, we find that the best caseerror of ∼
10 deg in sky position from PSO is near theexpected ballpark at the value of SNR opt used here. Forexample, in [57] the sky location error for a signal withSNR opt = 29 . , m = 1 . M (cid:12) , m = 2 . M (cid:12) , and the HLVnetwork is found to be 3 deg .Although our results have been obtained for the idealcase of Gaussian, stationary noise, the computationalcost will not change significantly for real detector noise.Recall that we are using PSO for only locating the globalmaximum of a fitness function. As long as the natureof this fitness function, in terms of the density of localpeaks and the contrast in their values, does not changedrastically, PSO will have the same performance. Thisis already evident when the computational cost of PSOis compared for the signal present and absent cases. Weexpect the change in the nature of the fitness functionbetween these two cases to be far more significant thanthat between ideal and real detector noise. We have run PSO on a rectangular search region con-sisting of independent upper and lower bounds on eachparameter. This implies that the search range for eachchirp time includes unphysical values. However, our re-sults demonstrate that the probability of the global max-imum straying into the unphysical region at the value ofSNR opt used is negligible. This need not be true whenthe data contains pure noise, and this may affect detec-tion performance by increasing the false alarm probabil-ity somewhat. A rigorous study of this issue is postponedto future work.Our results show that PSO offers a promising approachto realize a constantly on, fully-coherent all-sky CBCsearch. Future investigations should address the follow-ing outstanding issues. (i) A determination of wall-clocktime savings after incorporating state-of-the-art wave-form generation and likelihood evaluation techniques. (ii)Reducing the instances of failure in locating the globalmaximum by trying out well-studied variants of PSO.For example, in [31], the neighborhood best, rather thanglobal best, variant of PSO was found to perform sig-nificantly better. (iii) Extension of the analysis to thecomputationally more demanding case of aLIGO noisepower spectral density.2FIG. 6: Estimated sky locations (red dots) associated with the set M2 ( m = 4 . M (cid:12) , m = 1 . M (cid:12) ) of sources. Ineach panel, the origin is centered at the true location of the source. The axes show the deviation of the estimatedvalues of α and δ from their true values. Each panel also shows the contour levels of the bivariate probabilitydensity function, estimated using Kernel Density Estimation (KDE), [58] that enclose 68% and 95% of the points.In these figures, the view has been zoomed in to show only the estimated locations that fall within or around theouter contour.FIG. 7: Histograms of estimated chirp times, τ and τ . , for all locations and mass set M1 (1 . M (cid:12) and 1 . M (cid:12) ).The true values of the chirp times are shown by the red line in each plot. For each source sky location, the τ and τ . distributions are adjacent and on the same row, with the τ . distribution always to the right of the τ one.3FIG. 8: Histograms of estimated chirp times, τ and τ . , for all locations and mass set M2 (4 . M (cid:12) and 1 . M (cid:12) ).The true values of the chirp times are shown by the red lines. For each source sky location, the τ and τ . distributions are adjacent and on the same row, with the τ . distribution always to the right of the τ one. Acknowledgements
We thank Y. Wang for sharing the KDE estimationcodes for sky localization studies. Most of the computa-tion was done on “Thumper”, a computer cluster fundedby NSF MRI award CNS-1040430. T.S.W. acknowledgeshelp from R. Jackson, manager of Thumper, in using thecluster. We thank M. Zanolin, E. Schlegel, J. Romanoand S. Mukherjee for helpful discussions. The contri-bution of S.D.M. to this paper was supported by NSFawards PHY-1505861 and HRD-0734800.
Appendix A: Functional forms of phase parameters
Let M, µ and η denote the total mass, the reducedmass and the symmetric mass ratio of the compact binarysystem respectively. Let f ∗ be the lower cutoff frequencyof the detector. Then for m > m , the chirp times aregiven by, τ = 5256 π f − ∗ ( GMc πf ∗ ) − / η − , (A1) τ = 5192 π f − ∗ ( GMc πf ∗ ) − η − (cid:0) η (cid:1) , (A2) τ . = 18 f − ∗ ( GMc πf ∗ ) − / η − , (A3) τ = 5128 π f − ∗ ( GMc πf ∗ ) − / η − (cid:0) η + 617144 η (cid:1) , (A4) where M = ( m + m ) , µ = m m M , and η = µ M . (A5)All the four chirp time parameters are functions of m and m , implying that only two of them are independent.We chose τ and τ . as independent parameters to char-acterize the signal. Estimated values of τ and τ . wereused to derive values for M and µ using the followingequations. µ = 116 f ∗ (cid:18) π τ τ . (cid:19) / (cid:18) Gc (cid:19) − . (A6) M = 532 f ∗ (cid:18) τ . π τ (cid:19)(cid:18) Gc (cid:19) − . (A7)The parameters α i , i = 0 , , . . . , f ) (Eq. 17) are given by, α = 2 πf ∗ τ , α = 0 , α = 2 πf ∗ τ , (A8) α = − πf ∗ τ . , α = 2 πf ∗ τ . [1] B.P. Abbott et al. (LIGO Scientific Collaboration, VirgoCollaboration), Phys. Rev. Lett. , 061102 (2016).[2] B.P. Abbott et al. (LIGO Scientific Collaboration, VirgoCollaboration), Phys. Rev. Lett. , 241102 (2016).[3] B.P. Abbott et al. (LIGO Scientific Collaboration, VirgoCollaboration), Phys. Rev. D. , 122003 (2016).[4] B.P. Abbott et al. (LIGO Scientific Collaboration, VirgoCollaboration), Phys. Rev. Lett. , 241103 (2016).[5] B. Abbott et al. , Classical Quantum Gravity , 074001(2015).[6] F. Acernese et al. , Classical Quantum Gravity , 024001(2015).[7] K. Somiya, Classical Quantum Gravity , 124007(2012).[8] C. S. Unnikrishnan, Int. J. Mod. Phys. D , 1341010(2013).[9] S. Fairhurst, New J. Phys. , 069602 (2011).[10] B. P. Abbott et al. (LIGO Scientific Collaboration, VirgoCollaboration), arXiv:1602.08492v3.[11] B. S. Sathyaprakash and S. V. Dhurandhar, Phys. Rev.D , 3819 (1991).[12] S. M. Kay, Fundamentals of Statistical Signal Process-ing, (Prentice Hall, 1993), Vol. 1 & 2.[13] A. Pai, S. Dhurandhar and S. Bose, Phys. Rev. D ,042004 (2001).[14] B. J. Owen and B. S. Sathyaprakash,Phys. Rev. D ,022002 (1999).[15] I. W. Harry and S.Fairhurst, Phys. Rev. D , 084002(2011).[16] S. Bose, T. Dayanga, S. Ghosh and D.Talukder, ClassicalQuantum Gravity , 134009 (2011).[17] R. Lynch, S. Vitale, R. Essick, E. Katsavounidis, F. Robi-net, arXiv:1511.05955v2.[18] D. Macleod, I. W. Harry and S. Fairhurst, Phys. Rev. D , 064004 (2016).[19] W. R. Gilks, S. Richardson and D. J. SpiegelhalterMarkov Chain Monte Carlo in Practice (Chapman andHall/CRC Press, 1996).[20] M. van der Sluys, V. Raymond, I. Mandel, C. Rver, N.Christensen, V. Kalogera, R. Meyer and A. Vecchio, Clas-sical Quantum Gravity , 184011 (2008).[21] L. P. Singer and, L. R. Price, Phys. Rev. D , 024013(2016).[22] N. J. Cornish, arXiv:1606.00953.[23] C. Pankow, P. Brady, E. Ochsner and R. OShaughnessy,Phys. Rev. D , 023002 (2015).[24] K. Cannon, A. Chapman, C. Hanna, D. Keppel, A. C.Searle and A. J. Weinstein, Phys. Rev. D , 044025(2010).[25] S. E. Field, C. R. Galley, F. Herrmann, J. S. Hesthaven,E. Ochsner and M. Tiglio, Phys. Rev. Lett. , 221102(2011).[26] J.Kennedy and, R.C. Eberhart, (IEEE Int. Conf. Proc.Neural Networks, 1995), p.1942 - 1948.[27] Y. Shi, Particle swarm optimization: developments, ap-plications and resources, (IEEE Proc. Evolutionary Com-putation, 2001), Vol. 1[28] J.Prasad and T. Souradeep, Phys. Rev. D , 123008(2012).[29] A. Rogers and J. D. Fiege, Astrophys. J. , 2 (2011).[30] Y. Wang and S. D. Mohanty, Phys. Rev. D , 063002 (2010).[31] Y. Wang, S. D. Mohanty and F. A. Jenet, Astrophys. J. , 1 (2014).[32] Y. Bouffanais and E. K. Porter, Phys. Rev. D , 064020(2016).[33] A. Lazzarini, R. Weiss, LIGO Technical Docu-ment, LIGO-E950018, https://dcc.ligo.org/LIGO-E950018/public.[34] A. Leick, GPS Satellite Surveying, (Wiley, 2004).[35] P. Jaranowski, K. D. Kokkotas, A. Krolak and G. Tsegas,Classical and Quantum Gravity , 1279 (1996).[36] S. Klimenko, S. Mohanty, M. Rakhmanov, and G. Mit-selmakher, Phys. Rev. D , 122002 (2005).[37] S.D. Mohanty, M. Rakhmanov, S. Klimenko and G. Mit-selmakher, Classical and Quantum Gravity , 4799(2006).[38] M Rakhmanov, Classical and Quantum Gravity, , 3515 (1995).[42] W.Helstrom, Elements of Signal Detection and Estima-tion, (Prentice Hall, 1995).[43] F. Solis and R. Wets, Mathematics of Operations Re-search,
19 (1981).[44] D. Bratton and J. Kennedy, (IEEE Proc. Swarm Intelli-gence Symposium, 2007).[45] R. Poli, J. Kennedy, and T. Blackwell, Swarm Intelli-gence, , 33-57 (2007).[46] A. P. Engelbrecht, Fundamentals of computationalswarm intelligence (Wiley, 2005).[47] J. Kennedy and R. Mendes, (IEEE Transactions on Sys-tems, Man and Cybernetics,2006).[48] B. S. Sathyaprakash, Phys. Rev. D , R7111 (1994).[49] L. P. Singer and L. R. Price, Phys. Rev. D , 024013(2016).[50] A. Buonanno, B. R. Iyer, E. Ochsner, Y. Pan and B. S.Sathyaprakash, Phys. Rev. D , 084043 (2009).[51] R. J. E. Smith, K.Cannon, C. Hanna, D. Keppel and I.Mandel, Phys. Rev. D , 122002 (2013).[52] S. E. Field, C. R. Galley, J. S. Hesthaven, J. Kaye, andM. Tiglio Phys. Rev. X , 031006 (2014).[53] M. P¨urrer Classical Quantum Gravity , 195010 (2014).[54] J. Blackman, S. E. Field, C. R. Galley, B. Szilagyi, M.A. Scheel, M. Tiglio, and D. A. Hemberger, Phys. Rev.Lett. , 121102 (2015).[55] R. Balasubramanian, B. S. Sathyaprakash and S. V. Dhu-randhar, Phys. Rev. D , 3033-3055 (1996).[56] M. Vallisneri, Phys. Rev. D , 042001 (2008).[57] C. Rover, R. Meyer and N. Christensen, Phys. Rev. D , 062004 (2007).[58] V. A. Epanechnikov, Theory Probab. Appl. ,153158 (1967).[59] L. Blanchet, B.R. Iyer, C.M. Will, and A.G. Wiseman,Classical Quantum Gravity13