Detecting Gravitational Waves in Data with Non-Gaussian Noise
Barak Zackay, Tejaswi Venumadhav, Javier Roulet, Liang Dai, Matias Zaldarriaga
School of Natural Sciences, Institute for Advanced Study, 1 Einstein Drive, Princeton, NJ 08540, USA
Department of Physics, Princeton University, Princeton, NJ 08540, USA
(Dated: August 16, 2019)

Searches for gravitational waves crucially depend on exact signal processing of noisy strain data from gravitational wave detectors, which are known to exhibit significant non-Gaussian behavior. In this paper, we study two distinct non-Gaussian effects in the LIGO/Virgo data which reduce the sensitivity of searches: first, variations in the noise power spectral density (PSD) on timescales of more than a few seconds; and second, loud and abrupt transient 'glitches' of terrestrial or instrumental origin. We derive a simple procedure to correct, at first order, the effect of the variation in the PSD on the search background. Given the knowledge of the existence of localized glitches in particular segments of data, we also develop a method to insulate statistical inference from these glitches, so as to cleanly excise them without affecting the search background in neighboring seconds. We show the importance of applying these methods on the publicly available LIGO data, and measure an increase in the detection volume of at least 15% from the PSD-drift correction alone, due to the improved background distribution.
I. INTRODUCTION
The first detection of Gravitational Waves (GWs) from merging compact binaries in LIGO data opened up the opportunity to study a new family of astrophysical sources. Multiple sources have been detected in the first and second observing runs of Advanced LIGO and Virgo (O1 and O2) [1-9]. State-of-the-art searches [10-12] suffer from reduced sensitivity due to non-Gaussian effects in the strain data [13], which results in fewer detected sources. It is of paramount importance to understand and correct for these non-Gaussian effects in order to maximize the yield of the existing data.

We distinguish two types of non-Gaussian behavior, both of which contribute to the tail distribution of triggers obtained by matched filtering with a template bank. The first type is due to the changing power spectral density (PSD) of the noise that constitutes the background for the GW search. As we show, changes in the PSD on short timescales (even as short as 10 s) are important to capture in order to suppress over-production of candidate signals with high signal-to-noise ratio (SNR). The second type of non-Gaussian behavior is abrupt noise transients ('glitches' on sub-second timescales) that are caused by either environmental disturbance or instrumental malfunction. The origin of many of these glitches is not yet understood [13].

In this paper, we treat the problem of detecting putative GW events in the presence of these systematic effects. Firstly, we propose a simple practical solution that corrects the effect of a varying PSD on short timescales, to first order in the change in the PSD. Secondly, we present a method to isolate the detection statistic for compact binaries from abrupt noise transients at given times and of given lengths.

∗ [email protected]

Section II is devoted to the effect of slow variations in the PSD. We begin by reviewing matched filtering, and the process of PSD estimation, in stationary Gaussian noise.
We demonstrate the magnitude and timescale of the PSD variations in the LIGO data, and derive how these variations cause a loss in sensitivity for searches. We then propose a practical and simple way to cancel the first-order loss in sensitivity due to the mis-estimation of the local PSD. Essentially, this solution is to divide all computed matched filtering overlaps by their locally estimated standard deviation σ_z(t), given by:

\sigma_z^2(t) = \frac{1}{N_a} \sum_{t' = t - N_a \Delta t/2}^{t + N_a \Delta t/2} |z(t')|^2,  (1)

where z(t) are the matched filtering overlaps, defined in Section II A, and N_a is the number of scores used to estimate the average. Finally, we use GW triggers on the entirety of the publicly available O2 LIGO data [14, 15] to explicitly demonstrate the dramatic effects of PSD changes on the tail distribution of matched filtering scores, and quantify the sensitivity gain when we apply this simple correction to the overlaps.

In Section III we explain how to null the effect of identified abrupt noise transients on matched filtering scores of templates for long GW signals. We achieve this by replacing (or 'inpainting') the bad segments of strain data with values (that we solve for), such that the inverse-PSD-filtered (i.e., twice-whitened, or 'blued') data is zero at all bad times. This guarantees that the offending data has zero influence on any computation of the likelihood, and preserves optimal sensitivity for any real GW event in the surrounding good data.

The methods described in this paper have been implemented in the search pipeline described in Ref. [12], together with other improvements in candidate ranking [9], signal consistency checks [16] and template bank subdivision [17].

II. VARIATIONS IN THE NOISE PSD
In this section we demonstrate that the noise PSD of the LIGO data exhibits slow drifts over timescales of more than a few seconds, and discuss the loss in sensitivity due to this drift. We present a simple way to mitigate this problem and quantify the gain in search volume from the proposed correction.
A. PSD estimation and matched filtering
We begin by defining and briefly reviewing the essential features of matched filtering for signals within data with stationary Gaussian random noise. The statistical properties of the noise are completely described by its autocorrelation function, C_n(τ), defined by

\langle d(t)\, d(t+\tau) \rangle = C_n(\tau).  (2)

The stationary nature manifests in the fact that C_n is only a function of the lag τ. Additionally, C_n(τ) decays to zero for large values of the lag. The two-sided Power Spectral Density (PSD) is defined as the continuous Fourier transform of the autocorrelation function, i.e.,

S_{n,2}(f) = \int_{-\infty}^{\infty} \mathrm{d}\tau\, C_n(\tau)\, e^{-2\pi i f \tau}.  (3)

The data is real-valued, and hence C_n(τ) and S_{n,2}(f) are real-valued and even functions of their arguments. It is conventional to define the one-sided PSD as S_n(f) = S_{n,2}(f) + S_{n,2}(-f) = 2 S_{n,2}(f).

We are interested in searching for a signal, say h(t), within the data. In our use-case, we can take h(t) to have compact support when restricted to the sensitive band of the detectors. We work with data that is sampled at an interval Δt, chosen such that the Nyquist frequency f_s/2 = 1/(2Δt) is high enough to encompass the sensitive band. Our convention for the DFT is

\tilde d(f_m) = \sum_{n=0}^{N-1} d(n\Delta t)\, e^{-2\pi i n \Delta t f_m}, \quad \text{where } f_m = \frac{m}{N \Delta t}, \; -N/2 \le m \le N/2,

with N being even. Let us consider a segment of data, of length N, that is long enough to contain the putative signal. In the absence of the signal, the Discrete Fourier Transform (DFT) of the segment satisfies

\langle \tilde d(f_m) [\tilde d(f_{m'})]^* \rangle = \frac{1}{2}\, e^{i\pi (f_{m'} - f_m)\Delta t} \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f)\, W_N(f; f_m)\, W_N(f; f_{m'}),  (4)

where the window function W_N(f; f_m) is defined in Eq. (A6) (see Appendix A 1 for a derivation).
When viewed as a function of frequency f, W_N(f; f_m) exhibits a series of peaks with height N and width ∼1/(NΔt), separated by the sampling frequency. If we assume that the data was properly bandpassed before sampling (to prevent aliasing), we can restrict to the frequency interval between -f_s/2 and f_s/2. If the PSD behaves smoothly on frequency scales of ∼1/(NΔt), W_N(f; f_m) behaves like a delta-function selecting the frequency f = -f_m in the integrand, and we have

\langle \tilde d(f_m) [\tilde d(f_{m'})]^* \rangle \approx \frac{N}{2\Delta t}\, S_n(f_m)\, \delta_{m,m'}.  (5)

It is convenient to define a whitened data stream, d_w, according to:

\tilde d_w(f_m) = \left[\frac{2\Delta t}{S_n(f_m)}\right]^{1/2} \tilde d(f_m),  (6)

that satisfies

\langle \tilde d_w(f_m) [\tilde d_w(f_{m'})]^* \rangle = N\, \delta_{m,m'}.  (7)

The whitened data is the result of convolving the raw data stream with a whitening filter. We define the convolution of the signal and the whitening filter as the 'whitened signal', h_w:

\tilde h_w(f_m) = \left[\frac{2\Delta t}{S_n(f_m)}\right]^{1/2} \tilde h(f_m).  (8)

Under the assumption of stationary Gaussian noise, the matched filtering score, defined as

z = (d_w \circledast \overleftarrow{h}_w)(0) = \frac{1}{N} \sum_m [\tilde h_w(f_m)]^*\, \tilde d_w(f_m),  (9)

is the optimal detection statistic for the signal h(t) in the data (in the first equality, the symbol \circledast represents convolution and the arrow signifies time-reversal). If the data d and signal h are real-valued, so is the score z. In the absence of a signal, the variance of the score is

\langle z^2 \rangle = (h_w \circledast \overleftarrow{h}_w)(0) = \frac{1}{N} \sum_m \left| \tilde h_w(f_m) \right|^2,  (10)

which also equals its expectation value in the presence of the signal. Hence the quantity in Eq. (10) is the squared signal-to-noise ratio (SNR²). We can search for signals with different arrival times by enumerating over shifts in h(t) in Eq. (9) using an implementation of the Fast Fourier Transform (FFT). In practice, we also search for signals with constant phase-offsets, in which case the template h(t) is complex-valued, and the detection statistic is the absolute value of Eq. (9).
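As a concrete illustration of Eqs. (6)-(11), the following self-contained sketch simulates colored stationary noise, whitens the data and a template, and checks that the matched-filter score has unit variance. The toy PSD shape, the sine-Gaussian "template", and all names are our own illustrative stand-ins, not the pipeline's:

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt = 1 << 16, 1.0 / 1024                  # samples, sampling interval
f = np.fft.fftfreq(N, dt)

# Toy one-sided PSD with a low-frequency bump (stand-in for a detector PSD).
Sn = 2 * dt * (1.0 + 4.0 / (1.0 + (f / 50.0) ** 2))

# Stationary Gaussian noise with that PSD, consistent with Eq. (5).
d_f = np.fft.fft(rng.standard_normal(N)) * np.sqrt(Sn / (2 * dt))

# Toy "template": a short sine-Gaussian burst (stand-in for a CBC waveform).
t = np.arange(N) * dt
h = np.exp(-0.5 * ((t - 8.0) / 0.01) ** 2) * np.sin(2 * np.pi * 150.0 * t)

# Whiten data and template, Eqs. (6) and (8): multiply by sqrt(2*dt / S_n).
w = np.sqrt(2 * dt / Sn)
dw_f = w * d_f
hw_f = w * np.fft.fft(h)

# Normalize the template so <z^2> = (1/N) sum |h_w|^2 = 1 (Eq. 10).
hw_f /= np.sqrt(np.sum(np.abs(hw_f) ** 2) / N)

# Matched-filter time series, Eq. (11); numpy's ifft carries the 1/N factor.
z = np.fft.ifft(np.conj(hw_f) * dw_f).real
print(np.var(z))                             # ~ 1 when the PSD used is correct
```

Running the filter as a single inverse FFT is what lets the search enumerate all arrival times at once, as noted above.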
For the sake of simplicity, we assume that the template is real; all the results of this paper hold for complex templates as well.

In deriving Equation (9) and in proving its optimality, it is assumed that the noise PSD, S_n(f), is known, while in practice we have to measure it from the data itself. A standard way to do so is the Welch method, which divides the data into many (ideally overlapping) segments, applies a window function followed by a DFT to each segment, and calculates S_n(f) as the average of the power spectra of all the segments.

The frequency resolution and the precision of the measured PSD are important criteria to decide the number and duration of segments to use with the Welch method. The frequency resolution depends on the length of the segments, through the window functions W_N(f; f_m) defined in Eq. (A6). The Advanced LIGO noise contains several sharp spectral lines, at which the PSD is several orders of magnitude higher than the 'floor'. If the chosen segments are too short, the lines are broadened (through convolution with W_N(f; f_m)) and bleed into surrounding frequency bins: the effect is to reduce the SNR, and move us away from optimality. The stochastic error of the PSD measurement depends on the number of segments that are averaged over. As shown in the next section, the signal recovery efficiency in the presence of stochastic PSD errors is roughly 1 - \mathcal{O}(N_{\rm seg}^{-1}), where N_{\rm seg} is the number of segments used when estimating a PSD using the Welch method. Together, these parameters define a minimal duration over which the PSD needs to be estimated to achieve a target recovery efficiency.
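The Welch procedure and its resolution-versus-precision tradeoff can be sketched as follows (a minimal numpy implementation for illustration only; the function name, segment lengths, and white-noise input are our choices, not the pipeline's):

```python
import numpy as np

def welch_psd(x, nperseg, dt):
    """Average periodograms of Hann-windowed, half-overlapping segments;
    returns the one-sided PSD estimate and the number of segments used."""
    win = np.hanning(nperseg)
    step = nperseg // 2
    segs = [x[i:i + nperseg] for i in range(0, len(x) - nperseg + 1, step)]
    pxx = np.mean([np.abs(np.fft.rfft(win * s)) ** 2 for s in segs], axis=0)
    return 2 * dt * pxx / np.sum(win ** 2), len(segs)

rng = np.random.default_rng(1)
dt = 1.0 / 1024
x = rng.standard_normal(64 * 1024)        # 64 s of unit-variance white noise

# Long segments: fine frequency resolution but few averages (noisy PSD).
# Short segments: many averages (precise PSD) but coarse resolution.
psd_long, n_long = welch_psd(x, nperseg=16 * 1024, dt=dt)
psd_short, n_short = welch_psd(x, nperseg=1024, dt=dt)

# The true one-sided PSD of this noise is flat: S_n(f) = 2*dt.  The scatter
# of the estimate around that value shrinks roughly as N_seg^{-1/2}.
for n_seg, psd in [(n_long, psd_long), (n_short, psd_short)]:
    print(n_seg, np.std(psd / (2 * dt)))
```

For real detector data, the segment length must also stay long enough that the sharp spectral lines discussed above are not broadened away.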
In order to bound the sensitivity loss to less than a few percent for LIGO data, the duration of the PSD measurement needs to be roughly 10 s.

The discussion so far has been theoretical; we would like to test our assumptions, and whether the above procedure achieves the required bounds on the sensitivity. The most direct check is to verify the statistics of the matched filtering scores, since they determine our sensitivity. In what follows, we normalize the template h so that the variance \langle z^2 \rangle as given by Eq. (10) equals unity. We consider the matched filtering scores for a given template h(t) as a time-series, i.e.,

z(t = n\Delta t) = (d_w \circledast \overleftarrow{h}_w)(t) = \frac{1}{N} \sum_m [\tilde h_w(f_m)]^*\, \tilde d_w(f_m)\, e^{2\pi i f_m n \Delta t}.  (11)

The simplest check is whether the actual variance of the scores (as estimated from the time-series) is consistent with unity. Figure 1 shows the histogram of the estimated variance of the scores z for a single template over the O2 data. We estimated the variance by averaging the power z² within rolling windows of length ∼15 s, which should achieve 2% stochastic error on the variance. Instead, we see that the standard deviation of the variance distribution is much larger, of order 10%.
FIG. 1: Histogram of variance values of overlaps computed on the entirety of O2 data. The measurement error on these values is 2%. The measured deviations from unity are much bigger (about 10%) and therefore produce a big difference in significance determination if unaccounted for.
The derivation of the variance in Eq. (10) depends on the noise being described by the PSD S_n(f), i.e., the whitened data satisfying Eq. (7). The failure in Fig. 1 suggests that the PSD we estimated did not whiten the data perfectly. It is well known that the behavior of the detector varies with time (a drastic example of this is the scaling of the noise curve with the varying level of human activity in the vicinity of the detector). Due to these phenomena, the noise characteristics can change over timescales that are shorter than the \mathcal{O}(10^2) seconds we use to measure the PSD.

We can view the variance of this series (or instantaneous power, z²) itself as a time-series, described by its own PSD, S_{z²}. In the stationary case (when Eq. (5) holds), S_{z²} equals

S_{z^2}(f_m \neq 0) = 4\Delta t\, \widetilde{\left|h_w \circledast \overleftarrow{h}_w\right|^2}(f_m),  (12)

where h_w \circledast \overleftarrow{h}_w is the autocorrelation function of the whitened waveform (see Appendix A 2 for a derivation). The autocorrelation function of a typical waveform has a width of order a few ms, and hence at frequencies smaller than 1 Hz we expect the PSD of the power, S_{z²}, to be flat.

Figure 2 shows the PSD of the variance, S_{z²}, of the matched filtering scores, computed using a heavy binary black hole template (we estimate the local variance by convolving the z² series with a rolling rectangular window of 1 s duration). The dashed curves are the estimated S_{z²} for scores measured on stationary Gaussian noise generated using the fiducial PSDs for the Livingston and Hanford detectors, respectively: they are flat as a function of frequency, in line with the prediction of Eq. (12). The solid curves show the empirically measured variance PSDs, S_{z²}, for L1 and H1 data, averaged over the entirety
FIG. 2: It is necessary to track the drifting PSD on timescales of seconds. Solid lines show the empirically measured power spectrum S_{z²} of a time-series composed of the measured variance of the overlaps in every second (note that the frequencies are much lower than the frequency content of the waveform itself). For reference, dashed lines show the same on artificially generated stationary Gaussian noise. The variance time-series has a red-noise power spectrum.

of O2. We omitted from the average any region flagged as invalid either by the LIGO and Virgo Collaboration, or by our pipeline (see [12]), keeping only the contiguous segments. In the rest of this section, we describe how non-stationary noise can lead to such red-noise spectra, and what the measured curve tells us about the departure from the stationary case.

In the non-stationary case, we start by generalizing Eqs. (2) and (3):

\langle d(T - \tau/2)\, d(T + \tau/2) \rangle = C_n(\tau) + \delta C_n(\tau; T),  (13)

and

S_{n,2}(f; T) = S_{n,2}(f)\, [1 + \epsilon(f; T)], \quad \text{where}  (14)

\epsilon(f; T) = \frac{1}{S_{n,2}(f)} \int_{-\infty}^{\infty} \mathrm{d}\tau\, \delta C_n(\tau; T)\, e^{-2\pi i f \tau}.  (15)

In the above equations, ε(f; T) is the fractional change in the noise PSD at frequency f, which we take to vary on timescales T ≫ τ (the frequency f is conjugate to the short timescale τ).

If we whiten the data using the PSD estimated assuming stationary noise (i.e., S_n(f)) over a duration that is longer than the timescales over which the PSD varies, the equivalent of Eq. (7) is:

\langle \tilde d_w(f_m) [\tilde d_w(f_{m'})]^* \rangle \approx N\, \delta_{m,m'} + \tilde\epsilon\left(\bar f; \Delta f\right),  (16)

where \bar f = (f_m + f_{m'})/2 and \Delta f = f_m - f_{m'} (see Appendix A 1 for a derivation). In the second term, \tilde\epsilon(\bar f; \Delta f) is the DFT, evaluated at Δf, of \epsilon(\bar f, T = n\Delta t) sampled at a rate of f_s = 1/Δt. Equation (16) was derived under the assumption that the noise PSD, S_n(f), is smooth on frequency scales of Δf, and as such is inaccurate in the immediate vicinity of spectral lines in S_n(f). The non-stationary term, \tilde\epsilon, correlates Fourier modes of the data, \tilde d(f_m), with different frequencies. Intuitively, if we analyze segments that are shorter than the slow timescale, T, and whiten them using the instantaneous PSD, correlations between Fourier modes are diagonal in terms of frequency f_m (as in Eq. (7)). Over longer timescales, the frequencies of the (slow) PSD variations beat against the frequencies of the (fast) Fourier modes and lead to the second term in Eq. (16).

We are interested in the effect of the non-stationary part of the noise PSD, ε, on the PSD of the power in the matched filtering overlaps, S_{z²}. For this, it is useful to view the values of ε(f_m, T) themselves as being drawn from a set of random time-series (one for each 'fast' frequency f_m). The simplest model for this series is that all the ε(f_m, T) vary in step with each other, and with the same amplitude (it is a straightforward generalization to model more complicated behavior). In this case, there is a single PSD that describes the variations:

\langle \tilde\epsilon(f_m, f_a)\, [\tilde\epsilon(f_{m'}, f_b)]^* \rangle = \frac{N}{2\Delta t}\, S_\epsilon(f_a)\, \delta_{a,b}.  (17)

We can obtain a simple form for the corrected version of Eq. (12) under additional approximations: (a) variations in the noise PSD (described by S_ε) have support at much lower frequencies than the whitened signal \tilde h_w does, and (b) the time-domain whitened waveform is much shorter than the timescales over which ε(f; T) varies.
In this case, we have

S_{z^2}(f_m \neq 0) \approx 4\Delta t\, \widetilde{\left|h_w \circledast \overleftarrow{h}_w\right|^2}(f_m) \left[1 + \int_{-\infty}^{\infty} \mathrm{d}f\, S_\epsilon(f)\right] + \left|\tilde h_w(f_m)\right|^4 S_\epsilon(f_m).  (18)

Appendix A 2 presents a detailed derivation of this equation. The first term in Eq. (18) is flat with frequency at low frequencies (similarly to Eq. (12)). The second term is proportional to the power spectrum, S_ε, of the non-stationary part of the noise PSD, ε. The behavior of the solid curves in Fig. 2 suggests that the PSD, S_n(f; T), itself varies on timescales T larger than a few seconds, and that these variations have a red spectrum.

B. The loss of sensitivity due to a wrong PSD
Suppose that instead of the true PSD, S_c(f), we use a wrong one due to PSD misestimation:

S_w(f) = S_c(f)\, (1 + \epsilon(f)).  (19)

In this section, we compute the bias in the recovered SNR of a putative event due to the wrong PSD. We will show that:

1. The first-order effect of using a wrong PSD in computing Eq. (9) is misestimation of its standard deviation.

2. If the standard deviation of the overlaps is corrected, then the SNR loss will be of order ε²(f).

Let us consider the following statistical model for the strain data:

d(f) = \alpha\, h(f) + n(f),  (20)

where h(f) is a compact binary coalescence template waveform with an amplitude normalization α, and n(f) is the noise, which obeys the true PSD S_c(f). For simple notation, we define

I(f) \equiv |h(f)|^2 / S_c(f),  (21)

which is normalized according to

\sum_f I(f) = 1.  (22)

We now compute the overlap using the wrong PSD:

z_w = \sum_f \frac{h^*(f)\, d(f)}{S_w(f)}.  (23)

In the absence of a signal (α = 0), this overlap has a variance λ_w:

\lambda_w \equiv \langle |z_w|^2 \rangle = \sum_f \frac{h^*(f)\, h(f)}{S_w^2(f)}\, S_c(f) = \sum_f \frac{I(f)}{(1 + \epsilon(f))^2}.  (24)

We note that this is the true variance of z_w, which is not easily computable as it depends on the unknown S_c. We will however measure it directly from data. The naive way to calculate the variance involves using the wrong PSD S_w:

\lambda_w^{\rm computed} \equiv \sum_f \frac{h^*(f)\, h(f)}{S_w(f)} = \sum_f \frac{I(f)}{1 + \epsilon(f)}.  (25)

In the presence of a signal, the expected response is given by:

\alpha_w \equiv \langle z_w \rangle_h = \alpha \sum_f \frac{|h(f)|^2}{S_w(f)} = \alpha \sum_f \frac{I(f)}{1 + \epsilon(f)}.  (26)

Thus, if we use z_w as our statistic, the significance squared is

\rho_w^2 = \frac{\langle z_w \rangle_h^2}{\langle |z_w|^2 \rangle} = \frac{\alpha_w^2}{\lambda_w}.  (27)

If we were to take for granted that the variance of z_w was \lambda_w^{\rm computed}, we would have estimated:

\rho_{w,{\rm computed}}^2 = \frac{\alpha_w^2}{\lambda_w^{\rm computed}}.  (28)

If the true PSD is known, the optimal overlap is given by:

z_c = \sum_f \frac{h^*(f)\, d(f)}{S_c(f)},  (29)

which maximizes the SNR. In the presence of a signal:

\alpha_c \equiv \langle z_c \rangle_h = \alpha \sum_f \frac{|h(f)|^2}{S_c(f)} = \alpha \sum_f I(f) = \alpha.  (30)

Absent the signal, z_c has a variance

\lambda_c \equiv \langle |z_c|^2 \rangle = \sum_f \frac{|h(f)|^2}{S_c^2(f)}\, S_c(f) = \sum_f I(f) = 1.  (31)

The optimal ρ_c² is then:

\rho_c^2 = \frac{\langle z_c \rangle_h^2}{\langle |z_c|^2 \rangle} = \frac{\alpha_c^2}{\lambda_c} = \alpha^2.  (32)

Therefore, the relative information loss between using the true PSD S_c(f) and the wrong PSD S_w(f) is given by:

\frac{\rho_c^2}{\rho_w^2} = \frac{\alpha_c^2\, \lambda_w}{\alpha_w^2\, \lambda_c} = \frac{\sum_f I(f)/(1+\epsilon(f))^2}{\left[\sum_f I(f)/(1+\epsilon(f))\right]^2}.  (33)

If ε(f) were frequency independent, the ratio would be one, and ρ_w would be as optimal as ρ_c. More generally, ρ_c > ρ_w for any frequency-dependent ε(f), so there is always a sensitivity loss. We now Taylor expand Eq. (33) and keep terms up to second order in ε(f):

\frac{\rho_c^2}{\rho_w^2} = \frac{1 - 2\sum_f I(f)\,\epsilon(f) + 3\sum_f I(f)\,\epsilon^2(f)}{\left[1 - \sum_f I(f)\,\epsilon(f) + \sum_f I(f)\,\epsilon^2(f)\right]^2} = 1 + \sum_f I(f)\,\epsilon^2(f) - \Big(\sum_f I(f)\,\epsilon(f)\Big)^2 \geq 1.  (34)

Thus, the leading correction to ρ² is quadratic in ε(f). This is understandable from the fact that the overlap z_c is constructed to maximize the SNR, and thus the loss of sensitivity due to an error in PSD estimation must be a quadratic function of that error. The relative error in PSD estimation in each frequency band of S_c(f) scales as N_{\rm seg}^{-1/2}, where N_{\rm seg} is the number of segments used for measuring the PSD using the Welch method.

A key point is that computing ρ_w² requires λ_w, which cannot be computed without knowledge of S_c. Nevertheless, λ_w can be directly measured from the data. We now compute the fractional error in the standard deviation estimation:

\frac{\lambda_w}{\lambda_w^{\rm computed}} = \frac{\sum_f I(f)/(1+\epsilon(f))^2}{\sum_f I(f)/(1+\epsilon(f))} = 1 - \sum_f I(f)\,\epsilon(f) + \mathcal{O}\left[\epsilon^2(f)\right].  (35)

At leading order, the error scales linearly with ε(f). Thus if one uses \lambda_w^{\rm computed} to compute the SNR of an event, one makes a linear mistake. We can think of \sum_f [I(f)/(1+\epsilon(f))] as a random variable which adds \mathcal{O}[\epsilon(f)] noise to the estimate of the trigger SNRs. As we will describe in the next sections, this creates a tail in the distribution of triggers which results in substantial sensitivity loss. In the next sections we will discuss how to measure λ_w. As a remedy to PSD misestimation, this mitigates the sensitivity loss from \mathcal{O}[\epsilon(f)] \sim
10% to \mathcal{O}[\epsilon^2(f)] \sim 1\%.

C. PSD drift correction
We propose a practical method to mitigate the sensitivity loss due to fast variations in the PSD (to second order in the amplitude of the drift) by tracking the time-dependent variance of the calculated overlaps at few-percent accuracy. Since this is a single number to be estimated from the data (contrary to a full PSD, which consists of thousands of numbers), we can estimate it using very short segments of data. For a 2% relative error in the variance, we will need roughly 10 s of data. In Appendix B we estimate the precision with which this variance can be determined depending on the length of the data segments used. The variance around any given time t is empirically computed in the time domain via

\lambda_w(t) \equiv \langle |z_w(t)|^2 \rangle \approx \frac{1}{N_a} \sum_{t' = t - N_a\Delta t/2}^{t + N_a\Delta t/2} |z_w(t')|^2  (36)

\propto \sum_{t' = t - N_a\Delta t/2}^{t + N_a\Delta t/2} \left| (h_w \circledast d_w)(t') \right|^2 = \sum_f \left| \frac{h^*(f)\, d(f)}{S_w(f)} \right|^2 = \sum_f \frac{|h(f)|^2\, |d(f)|^2}{S_w^2(f)},

where N_a is the number of scores used in the average, and h_w and d_w are the template and data whitened with the wrong PSD S_w, respectively. The third equality is due to Parseval's theorem. The normalization factor in the above formula depends on the choice of N_a and can be precisely worked out.

The approximate equality in Eq. (36) is due to the finite number of samples used, which results in an uncertainty proportional to N_a^{-1/2}. Exact equality also fails because the PSD may vary within the averaging timescale N_a Δt. We refer the reader to Appendix B for a full derivation.

It is noteworthy that λ_w(t) is independent of the phase of h(f), as long as the waveform has its support within N_a Δt.
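Equation (36) amounts to a moving average of |z_w|². A minimal sketch follows, including an outlier clip so that loud transients do not bias the average; the clip threshold, the robust median-based scale, and all names are our illustrative choices:

```python
import numpy as np

def lambda_w(z, n_avg, clip=4.0):
    """Moving-average variance of the overlaps, cf. Eq. (36).  Samples whose
    |z|^2 exceeds (clip * sigma)^2, with sigma estimated robustly from the
    median, are excluded so that outliers do not bias the estimate."""
    p = np.abs(z) ** 2
    good = p < clip ** 2 * np.median(p) / 0.455   # median of chi^2_1 ~ 0.455
    kern = np.ones(n_avg)
    num = np.convolve(np.where(good, p, 0.0), kern, mode="same")
    den = np.convolve(good.astype(float), kern, mode="same")
    return num / np.maximum(den, 1.0)

rng = np.random.default_rng(2)
n = 1 << 16
# Overlaps whose variance drifts slowly, mimicking a PSD drift.
lam_true = 1.0 + 0.3 * np.sin(2 * np.pi * np.arange(n) / 8192)
z = np.sqrt(lam_true) * rng.standard_normal(n)

lam = lambda_w(z, n_avg=2048)
z_corr = z / np.sqrt(lam)                 # drift-corrected overlaps
print(np.var(z_corr))                     # ~ 1: the drift is divided out
```

The averaging length n_avg plays the role of N_a and embodies the tradeoff discussed below: shorter windows track faster drifts but carry larger sampling noise.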
More generally, a correction that is computed for a particular waveform h_1(f) = A(f) e^{i\psi_1(f)} is valid for any other waveform h_2(f) = A(f) e^{i\psi_2(f)} that shares the same amplitude profile and whose time-domain support, after dechirping,

h_{\rm dechirped}(f) = h_2(f)\, e^{-i\psi_1(f)},  (37)

is shorter than N_a Δt.

In choosing the timescale over which the moving average for λ_w(t) is computed, there are two mutually competing factors. One is that we prefer the timescale to be as short as possible, in order to capture as much of the temporal variation in the PSD as possible. The other is that a smaller averaging timescale leads to a larger sampling uncertainty in the measurement of λ_w(t). We adopt a compromise in which the timescale is chosen to yield a 2% sampling uncertainty on λ_w(t). We estimate that for this value, the error due to not capturing faster variability is comparable, although it varies slightly with time and from one interferometer to another. Another important issue is to ensure that the correction is not influenced by loud and abrupt noise transients or real GW events. We achieve this by computing Eq. (36) using a safe mean, namely we only average over overlaps that do not exceed a fixed threshold of about 4σ.

In Figs. 3 and 4 we show the dramatic impact on the trigger distribution of applying the PSD drift correction. In the figures we compare the histogram of the PSD-drift-corrected |z_w|²/λ_w statistic and that of the usual |z_w|²/λ_w^computed. The relative importance of this correction and of signal consistency vetoes depends on the waveform duration. In Fig. 3 we show the trigger distribution for our binary black hole bank BBH (0,0) [17], which contains relatively long waveforms in the chirp-mass range 3-5 M☉; here the PSD drift correction is the most important correction. In Fig. 4 we use a bank with shorter waveforms, and it can be seen that the PSD drift correction is secondary in importance to signal consistency vetoes.
Still, once those have been applied, the PSD drift correction substantially improves the sensitivity.

We quantify the volume improvement by comparing the incoherent double-detector trigger distributions from the entirety of O2 in banks BBH (0,0) and BBH (3,0) before and after the PSD drift correction. We further restrict to seconds with λ_w/λ_w^computed below a fixed cutoff, as large correction values often indicate big disturbances in the spectrogram, and the associated seconds are often rejectable by other tests. We also restricted the results to places where both detectors had a comparable response (20 < (z²/λ)_{H,L} < 40).
FIG. 3: Cumulative trigger rates in bank BBH (0,0) (binary black holes whose chirp mass falls in the range 3-5 M☉ [17]), using the entirety of the O2 bulk data. The initial trigger distribution (blue) is the distribution of the maximum overlap obtained every second, computed after removal of bad data, as described in Ref. [12]. In green is shown the final cumulative trigger distribution, after correcting for PSD drift and applying signal consistency vetoes. In black we show the cumulative trigger distribution when PSD drift is unaccounted for. For reference, we show in orange the trigger distribution when the signal consistency vetoes are not applied. While real events are in general left untouched by the PSD drift correction (see Fig. 1), the background distribution undergoes a dramatic change: if left unaccounted for, the fluctuations in places with variance misestimation dominate the tail of the trigger distribution. This effect becomes more severe at higher trigger significance.

H and L refer to the LIGO detectors at Hanford and Livingston, respectively. We determine a 15% volume increase in both banks due to correcting the PSD drift effect. This is
FIG. 4: Cumulative trigger rates in bank BBH (3,0) (binary black holes whose chirp mass falls in the range 20-40 M☉ and whose effective spin does not have a very negative value [17]), using the entirety of the O2 bulk data. This bank contains most of the detected BBH events in O1 and O2. As can be seen, signal consistency checks for triggers are much more important in this domain, as the waveforms in general have a very short duration in band. But by pushing back the background distribution, the PSD drift correction still mitigates a substantial amount of sensitivity loss.

estimated using

\frac{V_{\rm corr}}{V_{\rm uncorr}} = \left[ \frac{ \left. \left( \frac{z^2}{\lambda} \right)_H + \left( \frac{z^2}{\lambda} \right)_L \right|_{\rm FAR = 1/O2} }{ \left. \left( \frac{z^2}{\lambda^{\rm computed}} \right)_H + \left( \frac{z^2}{\lambda^{\rm computed}} \right)_L \right|_{\rm FAR = 1/O2} } \right]^{-3/2} \approx 1.15,  (38)

where the terms are evaluated at the threshold where the background distribution produces a false alarm rate of 1 trigger per O2 run. We expect this volume increase to be the same for BNSs as well as for NSBHs. This is a conservative lower bound, as many real events do not have a comparable detector response, especially in the latest observing runs, where the sensitivity greatly differs between detectors. In the case where the detection practically hinges on one detector alone, the volume increase is 50% for Livingston and 150% for Hanford, as can be seen in Figs. 3 and 4. Since for a substantial fraction of the time in the second and third observing runs there is great asymmetry between the sensitivities of the detectors, as well as due to natural geometric considerations (and noise fluctuations), the volume contribution from this regime can be substantial.

III. HOLE FILLING

A. Signal processing rationale
In almost all data segments of ≳ 100 s, abrupt disturbances are prevalent. Such disturbances have diverse (and often unknown) physical or instrumental origins, and mathematical models that can be used to accurately characterize them are lacking. The timescale of a disturbance ranges from a few milliseconds to a few seconds. During searches for signals from compact binary coalescence, these disturbances induce candidate triggers that populate the tail of the distribution. Their presence dilutes the significance of genuine astrophysical triggers and degrades the search sensitivity. Removing these bad segments of data is not a trivial task, as simply zeroing them out (usually done using a "gate" that smoothly zeroes the data) might result in leakage of excess power into the tens of seconds around the gate, which is often more harmful to the search effort than the disturbance itself. This is because spectral lines in the PSD whose inverse widths are longer than the duration of the gate leak power into neighboring seconds and frequency bins. See for example the middle panels of Figs. 5 and 6.

While very often these disturbance-induced triggers are easily dismissed, sometimes careful treatment is necessary, as the time-domain footprint of real GW events (especially for BNS events, whose waveforms last very long in band) may fortuitously overlap a disturbance. In that case, accurately determining the significance and estimating the merger parameters can be a complicated task. For example, GW170817 was in coincidence with a large disturbance in the Livingston detector (see Fig. 6). The LIGO and Virgo Collaboration coherently removed this glitch in their analysis of GW170817 [6, 18], but most glitches lack an exact mathematical model. We therefore foresee that the analysis of future BNS detections, especially as the sensitivity at low frequencies improves, will necessitate a treatment independent of any exact glitch model.
In this section, we derive a method to exactly remove bad data segments, ensuring that the significance and inferred parameters of the event are not influenced by the offending segment of the data.
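The leakage mechanism described above can be reproduced in a small toy example (all parameters here — sampling rate, line frequency and strength, gate length — are illustrative choices, not pipeline values): colored Gaussian noise with a single narrow spectral line is gated with a smooth window and then whitened, and the whitened variance in the neighboring second is strongly inflated, because the line's inverse width far exceeds the gate duration.

```python
import numpy as np

rng = np.random.default_rng(6)
fs, n = 1024, 1 << 15                          # 32 s of toy data

# Colored Gaussian noise with one narrow spectral line at 60 Hz whose
# inverse width (~20 s) is much longer than the gate we will apply.
f = np.fft.rfftfreq(n, 1 / fs)
psd = 1.0 + 3.6e8 * np.exp(-((f - 60.0) / 0.05) ** 2)
white = rng.standard_normal(n)
noise = np.fft.irfft(np.fft.rfft(white) * np.sqrt(psd), n)

def whiten(x):
    return np.fft.irfft(np.fft.rfft(x) / np.sqrt(psd), n)

# Gate: smoothly zero out 0.25 s around the middle with an inverted Hann
# (Tukey-like) window.
gated = noise.copy()
mid, half = n // 2, int(0.125 * fs)
gated[mid - half:mid + half] *= 1 - 0.5 * (1 + np.cos(np.linspace(-np.pi, np.pi, 2 * half)))

w_orig, w_gated = whiten(noise), whiten(gated)

# One second *outside* the gate: whitening the gated data leaks line power
# into neighboring seconds, while the ungated whitened data stays unit variance.
near = slice(mid + half, mid + half + fs)
assert abs(w_orig[near].std() - 1.0) < 0.1
assert w_gated[near].std() > 1.5
```

The precise amount of leakage depends on the line strength and gate shape, but the qualitative effect — excess whitened power for seconds around the gate — is generic whenever the gate is shorter than the inverse line width.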
[Figure: four stacked panels of whitened and "blued" strain versus t − t_glitch (s): the data with the glitch, the windowed data, the inpainted data, and the inpainting window.]

FIG. 5: Effect of masking and inpainting glitches. Top panel: A segment of whitened strain data (in units of the noise standard deviation) that contains a glitch. The orange curve tracks the standard deviation σ calculated from a running window of 100 samples, and is typically close to unity, as expected for whitened data. Second panel: Gating the glitch with an upside-down Tukey window (green) and then whitening generates artifacts in the whitened data, even outside the Tukey window. For example, σ stays above 1.1 for approximately 2 s to each side of the glitch. Third panel: The inpainted whitened data has unit variance outside the hole (shaded). Bottom panel: After inpainting, the "blued" strain is identically zero inside the hole, so overlaps with templates do not depend on the waveform information from inside the hole. This figure was previewed in the pipeline description paper, Ref. [12].
B. Derivation
When computing overlaps, one often assumes that the noise covariance is diagonal in the frequency domain and writes:
\[
z_w = \sum_f \frac{h^*(f)\, d(f)}{S_n(f)}. \tag{39}
\]
In practice, the data contain bad seconds that we need to mask out. Let us consider a data series of N_d samples and denote it as a vector with components d_i. We will denote the N_d × N_d covariance matrix of the noise as C_{ij}, which is diagonal in the frequency domain. Adopting the notation of linear algebra, the overlap can be cast into the matrix form:
\[
z_w = C^{-1}_{ij} h^*_i d_j = h^\dagger C^{-1} d, \tag{40}
\]
where h_i are the components of the template waveform, and similarly for d_i.

[Figure: three spectrograms (frequency in Hz versus time) of the Livingston data around GW170817: the original data, the inverse-Tukey windowed data, and the glitch-inpainted data.]

FIG. 6: Demonstration of inpainting in the segment of data containing GW170817. Upper panel: whitened spectrogram of the original Livingston strain a few seconds around the merger of GW170817. Middle panel: glitch gated away with an inverse Tukey window, with a timescale of ∼1 s and α = 5/6. Note that the leakage from the spectral lines (around 500 Hz) strongly affects the data 5 s to either side. A narrower window would leave a more prominent noise leakage. Bottom panel: 160 ms inpainted by the hole-filling method presented in this work.

In the presence of loud disturbances, we want the computed overlap to be independent of the behavior of the template waveform within the bad seconds. Let us assume that there is a list of samples in the time domain of length N_h to be masked. We denote by u^{(α)} a list of N_h vectors such that each of these vectors is zero everywhere except at one of the samples to be masked. We can define the matrix A of size N_d × N_h:
\[
A_{i,\alpha} = u^{(\alpha)}_i, \tag{41}
\]
and the matrix M of size N_h × N_h as:
\[
M_{\alpha\beta} = C^{-1}_{ij} A_{i,\alpha} A_{j,\beta} = C^{-1}_{ij} u^{(\alpha)}_i u^{(\beta)}_j = A^T C^{-1} A. \tag{42}
\]
We are using the convention that Greek indices run over 1, …, N_h and Roman indices run over 1, …, N_d. We will now define the inpainting filter:
\[
F = 1 - A M^{-1} A^T C^{-1} \tag{43}
\]
and compute the scores with the inpainted data:
\[
\tilde z_w = h^\dagger C^{-1} F d. \tag{44}
\]
The presence of the hole changes the normalization of the template, so we renormalize \tilde z and compute:
\[
z_w = \left( \frac{h^\dagger C^{-1} h}{h^\dagger C^{-1} F h} \right)^{1/2} \tilde z_w. \tag{45}
\]
Note that this normalization is time and template dependent, which makes it computationally intractable to compute exactly everywhere. In the next subsection, we show how to approximate it using fast Fourier transforms (FFTs), via the stationary phase approximation.

The inpainting filter F has several desirable properties. First, the score does not depend on the values of the template inside the hole, because:
\[
u^{(\alpha)T} C^{-1} F d = 0, \tag{46}
\]
for any d. The inpainting filter also satisfies:
\[
F^2 = F, \tag{47}
\]
\[
F^T C^{-1} F = C^{-1} F = F^T C^{-1}. \tag{48}
\]
Note that the filter F accomplishes its task by filling the hole with appropriate values, while values outside the holes are left untouched. The term A M^{-1} A^T C^{-1} first weights the data by the inverse of the covariance.
It then takes the values inside the holes (by multiplying by A^T) and forms linear combinations of them (by multiplying by M^{-1}). These linear combinations are then put back into the hole by multiplying by A; the values outside of the holes are left untouched.

This hole-filling procedure ensures that the scores do not depend on what the template does inside the hole. The only requirement is that the covariance matrix used to compute the scores (C^{-1}) is the same as the one used to build the matrix M.

We can obtain the same inpainting filter by considering a related problem. Suppose that the strain data d is the sum of a Gaussian random field h with covariance matrix C plus some additional source of noise n with covariance matrix N, d = h + n. Then the probability of h given the data d, P(h | d), is:
\[
P(h \mid d) \propto e^{-\frac{1}{2}(d - h)^\dagger N^{-1} (d - h)}\, e^{-\frac{1}{2} h^\dagger C^{-1} h}. \tag{49}
\]
In the limit where the additional noise is zero outside the holes and infinite inside, the maximum of this function is given by h = F d.

Another equivalent formulation is that one is trying to find the minimum of
\[
\chi^2 = \frac{1}{2} h^\dagger C^{-1} h, \tag{50}
\]
with h = d + \sum_\alpha a_\alpha u^{(\alpha)} for any a_\alpha. That is, find an h that equals the data outside the holes but can take any value inside. This is achieved by taking h = F d.

C. Practical issues with inpainting
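The algebraic properties above are easy to verify numerically. The sketch below (NumPy; the AR(1)-style Toeplitz covariance is an arbitrary illustrative stand-in for C, not a detector noise model) builds F from Eqs. (41)–(43) with dense linear algebra and checks Eqs. (46)–(48), as well as the fact that samples outside the hole are untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
N_d = 256
hole = np.arange(100, 108)                     # indices of the masked samples

# Illustrative AR(1)-style Toeplitz covariance standing in for C;
# any symmetric positive-definite matrix works here.
c = 1.5 * 0.8 ** np.arange(N_d)
C = np.array([[c[abs(i - j)] for j in range(N_d)] for i in range(N_d)])
Cinv = np.linalg.inv(C)

# A picks out the masked samples (Eq. 41); M = A^T C^{-1} A (Eq. 42).
A = np.zeros((N_d, hole.size))
A[hole, np.arange(hole.size)] = 1.0
M = A.T @ Cinv @ A

# The inpainting filter of Eq. (43): F = 1 - A M^{-1} A^T C^{-1}.
F = np.eye(N_d) - A @ np.linalg.solve(M, A.T @ Cinv)

d = rng.standard_normal(N_d)
d_filled = F @ d

outside = np.setdiff1d(np.arange(N_d), hole)
assert np.allclose(d_filled[outside], d[outside])   # untouched outside the hole
assert np.allclose(A.T @ Cinv @ d_filled, 0.0)      # Eq. (46): scores blind to the hole
assert np.allclose(F @ F, F)                        # Eq. (47): idempotent
assert np.allclose(F.T @ Cinv @ F, Cinv @ F)        # Eq. (48)
```

The dense inverse is only for illustration; in practice one applies C^{-1} with FFTs and solves the small N_h × N_h system involving M, rather than forming F explicitly.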
Correcting the variance
In order to assess the significance of overlaps computed on hole-filled data, we need to renormalize them using
\[
\lambda_{\rm hole} \equiv \frac{h^\dagger C^{-1} F h}{h^\dagger C^{-1} h}. \tag{51}
\]
Under the stationary phase approximation, this coefficient can be understood as the fraction of SNR² that remains after the segment with bad data has been removed, i.e.,
\[
\lambda_{\rm hole}(t_0, h) \approx \frac{\sum_{t \notin \rm hole} |h_w(t - t_0)|^2}{\sum_t |h_w(t)|^2}, \tag{52}
\]
where t_0 denotes the merger time and h_w is the whitened waveform. Computing this for all times is then done via:
\[
\lambda_{\rm hole}(t_0, h) \approx \frac{\left( |\overleftarrow{h_w}|^2 \circledast \mathrm{valid} \right)(t_0)}{\sum_t |h_w(t)|^2}, \tag{53}
\]
where "valid" is one wherever the data is valid and zero otherwise, and \overleftarrow{h_w} denotes the time-reversed whitened waveform. The convolution is computed via FFT using the convolution theorem.

In principle, the hole correction depends on the merger phase, but under the stationary phase approximation, together with the approximation that the waveform completes many cycles inside the hole, this correction is the same for waveforms with different phases, and can be computed in the same way for complex waveforms. The stationary phase approximation does not apply to short waveforms (say, waveforms that are shorter than a few seconds); for these we do not compute the correction, and simply ignore overlaps that are too close to a hole. For short waveforms the correction is also not relevant, as the fraction of data removed this way is smaller than a few percent of the total coincident time.

Edge holes
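A minimal sketch of Eq. (53) (with a random stand-in for the whitened waveform and illustrative sizes): the hole correction for every possible merger time is obtained from a single FFT-based circular cross-correlation of the validity mask with the squared whitened waveform.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lh = 4096, 512
h_w = rng.standard_normal(lh)                  # stand-in whitened waveform
h_w /= np.sqrt(np.sum(h_w ** 2))               # normalize so sum |h_w|^2 = 1

valid = np.ones(n)
valid[2000:2160] = 0.0                         # a 160-sample hole

# lambda_hole(t0) = sum_t valid[t] |h_w[t - t0]|^2, evaluated for all t0
# at once via the convolution theorem (circular cross-correlation).
p = np.zeros(n)
p[:lh] = h_w ** 2
lam = np.fft.irfft(np.fft.rfft(valid) * np.conj(np.fft.rfft(p)), n)

assert abs(lam[500] - 1.0) < 1e-8              # waveform far from the hole: full SNR^2
assert lam[2050] < 0.95                        # waveform overlapping the hole: reduced
```

Because the template here is unit-normalized, the denominator of Eq. (53) is 1 and `lam` is directly the retained SNR² fraction for each candidate merger time.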
When a data stream begins after a long break, or a long period of bad data, it is desirable to be able to detect events that are only partially inside the valid data. In order to do this optimally, we treat the edges of the data as holes, and fill them using the same formulae.
Reducing the computational complexity of hole filling
Since computing F d requires inverting a matrix of size N_h × N_h, which costs O(N_h³) operations, it is important to find efficient ways of computing it. Since the whitening filter is of finite extent (fixed in our pipeline to 64 s), holes that are more than 64 s apart can be filled independently.

Hole filling of a contiguous segment can be done faster via a Toeplitz linear equation solver. This works by solving Eq. (46) for the vector F d, using the fact that C^{-1} is Toeplitz, and that the only modified values are inside the hole. Segments of data with inseparable holes that have more than 10 s of total hole duration are assumed to be all bad, and are filled as contiguous segments using the Toeplitz solver. When the size of the hole is bigger than the size of the blueing filter (i.e., the whitening filter applied twice), we simply treat it as an edge, and fill only the duration of the blueing filter.

IV. SUMMARY
We have presented methods to deal with two kinds of non-Gaussianity in the strain data: fast changes of the PSD, and stretches of bad data containing abrupt noise transients.

We proposed computing a running estimate of the variance of the overlaps. We call this correction the PSD drift correction, and we have shown that it correctly accounts for the first-order errors incurred by using a wrong PSD when calculating the overlaps. We have shown that the error in estimating the overlap variance is of order 2% on 15 s of data, a segment that is short enough that the error due to even faster PSD variations is of similar magnitude. Although the PSD drift effect in principle has complex frequency structure, the correction can be done via a simple, robust running-mean estimate of the local variance of the overlaps.

We have shown that applying this correction dramatically reduces the background in single-detector overlap distributions, and because of this the search volume is increased by at least 15%. This improvement is more substantial if the detection significance mainly hinges on one detector, as often happens due to the different antenna patterns of the detectors or their respective sensitivities. We point out that finding the physical source of the PSD change, or better tracking of the fast changes, could further improve the search sensitivity by 2–

ACKNOWLEDGMENTS
Appendix A: Statistical properties of non-stationary noise

1. Covariance between Fourier components
In this section, we derive the correction to the definition of the discrete PSD, i.e., Eq. (5), in the presence of slow variations in the properties of the noise. The starting point are Eqs. (13) and (14), which define the behavior of the noise over short and long timescales.

As in Section II A, we work with DFT coefficients defined over a segment of data of length N. We are interested in the correlation between the DFT coefficients, which we expand as
\[
\left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle = \sum_{a,b} \langle d(a\Delta t)\, d(b\Delta t) \rangle\, e^{2\pi i (f_{m'} b - f_m a)\Delta t}. \tag{A1}
\]
Let us first work through the stationary case, where the expectation value on the right-hand side equals
\[
\left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle = \sum_{a,b} C_n[(b - a)\Delta t]\, e^{2\pi i (f_{m'} b - f_m a)\Delta t}. \tag{A2}
\]
Substituting the inverse of Eq. (3), we have
\[
\left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle = \sum_{a,b} \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f)\, e^{2\pi i f (b - a)\Delta t}\, e^{2\pi i (f_{m'} b - f_m a)\Delta t} \tag{A3}
\]
\[
= \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f) \sum_{b=0}^{N-1} e^{2\pi i (f + f_{m'}) b \Delta t} \sum_{a=0}^{N-1} e^{-2\pi i (f + f_m) a \Delta t}. \tag{A4}
\]
In the above equations, the indices a and b run from 0 to N − 1; shifting them to run from −N/2 to N/2 − 1 symmetrizes the sums and yields
\[
\left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle = e^{\pi i (f_{m'} - f_m)\Delta t} \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f)\, W_N(f; f_m)\, W_N(f; f_{m'}), \tag{A5}
\]
where W_N(f; f_m) is the following window function:
\[
W_N(f; f_m) = \frac{\sin[\pi (f + f_m) N \Delta t]}{\sin[\pi (f + f_m) \Delta t]}. \tag{A6}
\]
As a function of frequency f, W_N(f; f_m) has a series of peaks of amplitude N and alternating signs (for even N), and width ∼ 1/(N Δt), that pick out frequencies where the sines in the numerator and denominator vanish. The peak frequencies are separated by the sampling frequency, so if we bandpassed the data before sampling, we can restrict the analysis to the primary frequency interval between −f_s/2 and f_s/2 (where f_s = 1/Δt is the sampling frequency). If the PSD behaves smoothly on frequency scales of ∼ 1/(N Δt), W_N(f; f_m) behaves like a delta function selecting the appropriate frequency in the integrand, and we have
\[
\left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle \approx \frac{N}{\Delta t} S_{n,2}(f_m)\, \delta_{m,m'} = \frac{N}{2\Delta t} S_n(f_m)\, \delta_{m,m'}. \tag{A7}
\]
Applying the whitening filter, we obtain
\[
\left\langle \tilde d_w(f_m) \left[ \tilde d_w(f_{m'}) \right]^* \right\rangle = N \delta_{m,m'}. \tag{A8}
\]
Next, we include the non-stationary terms (the extra terms in Eqs. (13) and (14)) when simplifying Eq. (A1). They lead to an extra term, which we evaluate as follows:
\[
\delta \left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle = \sum_{a,b} \delta C_n\!\left[ (b - a)\Delta t;\, \frac{a + b}{2}\Delta t \right] e^{2\pi i (f_{m'} b - f_m a)\Delta t} \tag{A9}
\]
\[
= \sum_{a,b} \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f)\, \epsilon\!\left[ f;\, \frac{a + b}{2}\Delta t \right] e^{2\pi i f (b - a)\Delta t}\, e^{2\pi i (f_{m'} b - f_m a)\Delta t}, \tag{A10}
\]
where in the second equation we used the inverse of Eq. (15). It is convenient to define the variables A and B:
\[
A = \frac{a + b}{2} \in \left\{ -\frac{N}{2},\, -\frac{N}{2} + \frac{1}{2},\, \ldots,\, \frac{N}{2} - \frac{1}{2} \right\}, \quad \text{and} \tag{A11}
\]
\[
B = b - a \in \{ -L_A,\, -L_A + 2,\, \ldots,\, L_A \}, \quad \text{where} \tag{A12}
\]
\[
L_A = \begin{cases} N - 1 + 2A, & A \leq 0, \\ N - 1 - 2A, & A > 0. \end{cases} \tag{A13}
\]
We recast Eq. (A10) in terms of these variables and simplify as follows:
\[
\delta \left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle = \sum_A \sum_{B = -L_A}^{L_A} \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f)\, \epsilon(f; A\Delta t)\, e^{2\pi i f B \Delta t}\, e^{2\pi i [f_{m'}(A + B/2) - f_m (A - B/2)]\Delta t} \tag{A14}
\]
\[
= \sum_A \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f)\, \epsilon(f; A\Delta t)\, e^{2\pi i (f_{m'} - f_m) A \Delta t} \sum_{B = -L_A}^{L_A} e^{2\pi i [f + (f_{m'} + f_m)/2] B \Delta t} \tag{A15}
\]
\[
= \sum_A \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f)\, \epsilon(f; A\Delta t)\, e^{2\pi i (f_{m'} - f_m) A \Delta t}\, W_{L_A + 1}(2f; f_m + f_{m'}). \tag{A16}
\]
As earlier, we restrict to the primary frequency interval between −f_s/2 and f_s/2, within which the window function picks out two peaks, at f = −\bar f and f = ±f_s/2 − \bar f (here \bar f = (f_m + f_{m'})/2, and in the second expression we pick the sign that brings the right-hand side into the primary interval). The peaks have the same (opposite) signs when L_A is even (odd). The peaks are smeared out when L_A is of order unity, which happens only near the edges of the interval. We neglect these edge effects and replace the window function with the appropriate delta functions at its peaks:
\[
\delta \left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle \approx \sum_A \int_{-\infty}^{\infty} \mathrm{d}f\, S_{n,2}(f)\, \epsilon(f; A\Delta t)\, \frac{e^{2\pi i (f_{m'} - f_m) A \Delta t}}{2\Delta t} \times
\begin{cases}
\left[ \delta\!\left( f + \bar f \right) - \delta\!\left( f + \bar f \mp \tfrac{f_s}{2} \right) \right], & L_A \text{ odd}, \\[4pt]
\left[ \delta\!\left( f + \bar f \right) + \delta\!\left( f + \bar f \mp \tfrac{f_s}{2} \right) \right], & L_A \text{ even}.
\end{cases} \tag{A17}
\]
Let us denote the frequency offset f_m − f_{m'} by Δf. The contributions of the second term inside the square brackets to the sum approximately cancel (since the parts with alternating signs are approximately equal), and hence we can simplify the sum to
\[
\delta \left\langle \tilde d(f_m) \left[ \tilde d(f_{m'}) \right]^* \right\rangle \approx \frac{1}{2\Delta t} \sum_A S_{n,2}\!\left( -\bar f \right) \epsilon\!\left( -\bar f; A\Delta t \right) e^{-2\pi i \Delta f A \Delta t} \tag{A18}
\]
\[
\approx \frac{S_n\!\left( \bar f \right)}{2\Delta t}\, \tilde\epsilon\!\left( \bar f; \Delta f \right), \tag{A19}
\]
where \tilde\epsilon(\bar f; Δf) is the DFT, evaluated at the frequency offset Δf, of \epsilon(\bar f, T = nΔt) sampled at a rate of f_s = 1/Δt. Finally, we combine Eqs. (A7) and (A19) and apply the whitening filter to obtain the generalized version of Eq. (7):
\[
\left\langle \tilde d_w(f_m) \left[ \tilde d_w(f_{m'}) \right]^* \right\rangle \approx N \delta_{m,m'} + \frac{S_n\!\left( \bar f \right)}{\left[ S_n(f_m)\, S_n(f_{m'}) \right]^{1/2}}\, \tilde\epsilon\!\left( \bar f; \Delta f \right). \tag{A20}
\]
We assume that (a) the changes to the PSD S_n(f) occur on long timescales, i.e., Δf ≪ \bar f, and (b) S_n(f) is smooth on frequency scales ∼ Δf. Then, we can write
\[
\left\langle \tilde d_w(f_m) \left[ \tilde d_w(f_{m'}) \right]^* \right\rangle \approx N \delta_{m,m'} + \tilde\epsilon\!\left( \bar f; \Delta f \right). \tag{A21}
\]
In our application, the first assumption above holds very well, while the second assumption fails only in the immediate vicinity of spectral lines.
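The structure of Eq. (A21) — stationary noise gives delta-correlated DFT coefficients, while a slow variance modulation ε couples pairs of bins separated by the modulation frequency — can be checked with a small Monte Carlo (toy white noise with a single sinusoidal variance modulation; all sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials = 512, 4000
T = np.arange(N)
eps = 0.5 * np.cos(2 * np.pi * 3 * T / N)      # slow variance modulation, 3 cycles

corr_stat = corr_mod = 0.0
for _ in range(trials):
    d = rng.standard_normal(N)                 # stationary white noise
    dm = d * np.sqrt(1.0 + eps)                # non-stationary: modulated variance
    D, Dm = np.fft.fft(d), np.fft.fft(dm)
    corr_stat += (D[50] * np.conj(D[53])).real / trials
    corr_mod += (Dm[50] * np.conj(Dm[53])).real / trials

# Bins 50 and 53 differ by the modulation frequency (3 bins): the stationary
# noise shows no correlation, while the modulated noise acquires the extra
# off-diagonal term of Eq. (A21) (here with expectation N/4).
assert abs(corr_stat) < 0.15 * N
assert corr_mod > 0.15 * N
```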
2. Power spectrum of the variance of the matched filtering scores
In this section, we consider the variance of the matched filtering scores z(t) (as defined in Eq. (11)) as a time series, and derive its PSD. We empirically define the variance as the output of the estimator:
\[
\langle z^2 \rangle (t = n\Delta t) = \sum_{n'} z^2[t = (n + n')\Delta t]\, w(-n'\Delta t), \tag{A22}
\]
where Σ_n w(nΔt) = 1. The window function w effectively manifests as a multiplicative low-pass filter in the formulae in the rest of this section, and does not change any details below its cutoff frequency. For ease of presentation, we will omit the window function and directly use the power, z², in place of the variance ⟨z²⟩.

The DFT of the power is
\[
\widetilde{z^2}(f_m) = \frac{1}{N} \sum_p \left[ \tilde h_w(f_p) \right]^* \tilde d_w(f_p)\, \tilde h_w(f_p - f_m) \left[ \tilde d_w(f_p - f_m) \right]^*. \tag{A23}
\]
In the above equation, the convention is that frequencies outside the primary interval of (−f_s/2, f_s/2] are replaced by the values in the interval that they alias to. We view the power as a random time series, with its own autocorrelation function. The PSD of this random series is defined by:
\[
\left\langle \widetilde{z^2}(f_m) \left[ \widetilde{z^2}(f_{m'}) \right]^* \right\rangle = \frac{N}{2\Delta t} S_{z^2}(f_m)\, \delta_{m,m'}. \tag{A24}
\]
We substitute Eq. (A23) and obtain:
\[
\left\langle \widetilde{z^2}(f_m) \left[ \widetilde{z^2}(f_{m'}) \right]^* \right\rangle = \frac{1}{N^2} \sum_{p,q} \left[ \tilde h_w(f_p) \right]^* \tilde h_w(f_p - f_m)\, \tilde h_w(f_q) \left[ \tilde h_w(f_q - f_{m'}) \right]^* \times \left\langle \tilde d_w(f_p) \left[ \tilde d_w(f_p - f_m) \right]^* \left[ \tilde d_w(f_q) \right]^* \tilde d_w(f_q - f_{m'}) \right\rangle. \tag{A25}
\]
We first work through the case with stationary noise. We use Wick's theorem to simplify the expectation value on the right-hand side as the sum of pairwise products, which we evaluate using Eq. (A8):
\[
\left\langle \widetilde{z^2}(f_m) \left[ \widetilde{z^2}(f_{m'}) \right]^* \right\rangle = \sum_{p,q} \left[ \tilde h_w(f_p) \right]^* \tilde h_w(f_p - f_m)\, \tilde h_w(f_q) \left[ \tilde h_w(f_q - f_{m'}) \right]^* \left[ \delta_{m,0}\delta_{m',0} + \delta_{p,q}\delta_{m,m'} + \delta_{p+q,m}\delta_{m,m'} \right]. \tag{A26}
\]
(Footnote: Note that the sum in Eq. (A18) is on a finer grid, sampled at frequency 2/Δt. Equation (A19) holds assuming that ε is bandlimited below f_s = 1/Δt, which is true in our case.)

We read off the PSD of the power, S_{z²}, by substituting Eq. (A24) and simplifying:
\[
S_{z^2}(f_m) = \frac{2\Delta t}{N} \sum_{p,q} \left[ \tilde h_w(f_p) \right]^* \tilde h_w(f_p - f_m)\, \tilde h_w(f_q) \left[ \tilde h_w(f_q - f_m) \right]^* \left[ \delta_{m,0} + \delta_{p,q} + \delta_{p+q,m} \right] \tag{A27}
\]
\[
= 2 N \Delta t\, \delta_{m,0} + \frac{4\Delta t}{N} \sum_{p,q} \left| \tilde h_w(f_p) \right|^2 \left| \tilde h_w(f_q) \right|^2 \delta_{p+q,m} \tag{A28}
\]
\[
= 2 N \Delta t\, \delta_{m,0} + 4\Delta t\, \widetilde{\left| h_w \circledast \overleftarrow{h_w} \right|^2}(f_m). \tag{A29}
\]
In the final equation, h_w ⊛ \overleftarrow{h_w} is the autocorrelation function of the whitened waveform. To simplify the first term on the right-hand side, we used the fact that the SNR in Eq. (10) is normalized to unity.

In the non-stationary case, the PSD of the power, S_{z²}, also depends on the varying part of the noise PSD, ε(f, T). Given a number of frequency bins, f_m (conjugate to the 'fast' timescales), we consider the ε(f_m, T) as a set of random time series, varying over long timescales T. The most general form of the correlations between these time series is
\[
\left\langle \tilde\epsilon(f_m, f_a) \left[ \tilde\epsilon(f_{m'}, f_b) \right]^* \right\rangle = \frac{N}{2\Delta t} S_{\epsilon,(m,m')}(f_a)\, \delta_{a,b}. \tag{A30}
\]
In the above equation, \tilde\epsilon(f_m, f_a) represents the DFT of ε(f_m, T = nΔt) sampled at a rate f_s = 1/Δt. If we perform the Singular Value Decomposition (SVD) of the matrix S_{ε,(m,m')}(f_a), the basis vectors represent frequency components (with weights given by the coefficients) that vary together. We assume the simplest possible form of the correlations in Eq. (A30):
\[
\left\langle \tilde\epsilon(f_m, f_a) \left[ \tilde\epsilon(f_{m'}, f_b) \right]^* \right\rangle = \frac{N}{2\Delta t} S_\epsilon(f_a)\, \delta_{a,b}. \tag{A31}
\]
This implies that all the ε(f_m, T) vary in step with each other, and with the same amplitude.

We return to Eq. (A25) to evaluate the PSD of the power, S_{z²}. We first fix a particular realization of ε(f_m, T), compute S_{z²} using the disconnected terms in Eq. (A21), and additionally average over the realizations of ε using Eq. (A31) (we neglect the connected part of the expectation value in Eq. (A21)). In addition to the terms in Eq. (A29), the following terms show up:
\[
\delta S_{z^2}(f_m) = \frac{1}{N^2} \sum_{p,q} \left[ \tilde h_w(f_p) \right]^* \tilde h_w(f_p - f_m)\, \tilde h_w(f_q) \left[ \tilde h_w(f_q - f_m) \right]^* \left[ S_\epsilon(f_m) + S_\epsilon(f_p - f_q) + S_\epsilon(f_p + f_q - f_m) \right]
\]
\[
= \left| \widetilde{h_w^2}(f_m) \right|^2 S_\epsilon(f_m) + \frac{1}{N^2} \sum_{p,q} \left[ \tilde h_w(f_p) \right]^* \tilde h_w(f_p - f_m)\, \tilde h_w(f_q) \left[ \tilde h_w(f_q - f_m) \right]^* \left[ S_\epsilon(f_p - f_q) + S_\epsilon(f_p + f_q - f_m) \right]. \tag{A32}
\]
We can see by a change of variables that both terms in the summation on the right-hand side of Eq. (A32) are equal. Carrying this through,
\[
\delta S_{z^2}(f_m) = \left| \widetilde{h_w^2}(f_m) \right|^2 S_\epsilon(f_m) + \frac{2}{N^2} \sum_{q,\, \Delta q \equiv (p - q)} \tilde h_w(f_q) \left[ \tilde h_w(f_q + f_{\Delta q}) \right]^* \left[ \tilde h_w(f_q - f_m) \right]^* \tilde h_w(f_q + f_{\Delta q} - f_m)\, S_\epsilon(f_{\Delta q})
\]
\[
= \left| \widetilde{h_w^2}(f_m) \right|^2 S_\epsilon(f_m) + \frac{2}{N^2} \sum_{\Delta q} \widetilde{\left| h_w \circledast \overleftarrow{h_w e^{-2\pi i f_{\Delta q} t}} \right|^2}(f_m)\, S_\epsilon(f_{\Delta q}). \tag{A33}
\]
In the second term on the right-hand side, we multiplied the signal by an oscillatory function of time (e^{−2πi f_{Δq} t}) before reversing and convolving it with itself.

We can simplify the form of Eq. (A33) using the following approximations: (a) variations in the noise PSD (ε̃) have support at much lower frequencies than the whitened signal \tilde h_w does, and (b) the time-domain whitened waveform is much shorter than the timescales over which ε(f; T) varies. The first assumption holds very well, since ε(f; T) varies on timescales of tens of seconds (i.e., ε̃ has power only at very low frequencies), while the whitened signal has support at f ≳ 20 Hz; the second assumption does not hold for signals that are longer than a few tens of seconds (such as signals from merging binary neutron stars).

Under the second approximation above, the factor of e^{−2πi f_{Δq} t} in Eq. (A33) acts like a constant phase multiplying the signal, which makes no difference. We can pull it out, and interpret the summation as a Riemann sum, to obtain
\[
\delta S_{z^2}(f_m) \approx \left| \widetilde{h_w^2}(f_m) \right|^2 S_\epsilon(f_m) + 2\Delta t\, \widetilde{\left| h_w \circledast \overleftarrow{h_w} \right|^2}(f_m) \int_{-\infty}^{\infty} \mathrm{d}f\, S_\epsilon(f). \tag{A34}
\]
We combine Eqs. (A29) and (A34) to get our final simplified form of the PSD of the variations in the power of the matched filtering scores:
\[
S_{z^2}(f_m) \approx 2 N \Delta t\, \delta_{m,0} + 2\Delta t\, \widetilde{\left| h_w \circledast \overleftarrow{h_w} \right|^2}(f_m) \left[ 2 + \int_{-\infty}^{\infty} \mathrm{d}f\, S_\epsilon(f) \right] + \left| \widetilde{h_w^2}(f_m) \right|^2 S_\epsilon(f_m). \tag{A35}
\]

Appendix B: Computation of the measurement error in the PSD drift
In this section, we work with whitened waveforms and data, h_w(nΔt) and d_w(nΔt) respectively, that are functions of the discrete index n, and were whitened using some fiducial filter. The autocorrelation function of perfectly white data is
\[
\langle d_w(t)\, d_w(t + n\Delta t) \rangle = \delta_{n,0}, \tag{B1}
\]
where the lowercase δ stands for the Kronecker delta. We define the time series of matched filtering overlaps as
\[
z(n\Delta t) = \sum_m h_w(m\Delta t)\, d_w[(n - m)\Delta t], \tag{B2}
\]
which is equivalent to Eq. (11). We now consider the statistics of these overlaps.

Let us first assume the whitening was successful, i.e., Eq. (B1) applies. In this case, the autocorrelation function of the score z(t) equals that of the waveform, i.e.,
\[
\langle z(t)\, z(t + n\Delta t) \rangle = \left( h_w \circledast \overleftarrow{h_w} \right)(n\Delta t). \tag{B3}
\]
In particular, if we normalize the whitened waveform to satisfy Σ_n |h_w(nΔt)|² = 1, the autocorrelation function equals unity at zero lag, i.e., the score has unit variance. For typical waveforms, the autocorrelation function has a very small time-domain width (for the waveform of GW150914, the time-domain half-width, defined as the lag at which the autocorrelation function drops to 0.5, is of order milliseconds).

If the whitening was not successful, the scores z(t) are still normally distributed, but with a different variance. Formally, the new autocorrelation function is
\[
V(n\Delta t) = \langle z(t)\, z(t + n\Delta t) \rangle = \sum_{a,b} h_w(a\Delta t)\, h_w(b\Delta t)\, \left\langle d^*_w(t - a\Delta t)\, d_w[t + (n - b)\Delta t] \right\rangle. \tag{B4}
\]
We can write the convolutions in Eq. (B4) in the Fourier domain:
\[
V(n\Delta t) = \frac{1}{N_a^2} \sum_{p,q} \left[ \tilde h_w(f_p) \right]^* \tilde h_w(f_q) \left\langle \left[ \tilde d_w(f_p) \right]^* \tilde d_w(f_q) \right\rangle e^{2\pi i (f_q - f_p) t}\, e^{2\pi i f_q n \Delta t}, \tag{B5}
\]
where N_a is the number of samples used to compute the PSD drift correction. Instead of the general approach we followed in Section II, we use a segment of data that is shorter than the typical timescales over which the PSD varies, but assume that we used the wrong local PSD. In this case, the incorrectly whitened data, d_w, is locally described by a stationary and real-valued Gaussian random variable:
\[
\left\langle \left[ \tilde d_w(f_p) \right]^* \tilde d_w(f_q) \right\rangle = \frac{N_a}{2\Delta t} S_{d_w}(f_p)\, \delta_{p,q}, \tag{B6}
\]
where S_{d_w} is the two-sided power spectral density of the process. Substituting in Eq. (B5) gives us
\[
V(n\Delta t) = \frac{1}{2 N_a \Delta t} \sum_p \left| \tilde h_w(f_p) \right|^2 S_{d_w}(f_p)\, e^{2\pi i f_p n \Delta t}. \tag{B7}
\]
In the case where d_w is white noise, S_{d_w} = 2Δt, and using Parseval's identity, the zero-lag autocorrelation function of Eq. (B7) equals the normalization factor Σ_n |h_w(n)|² = 1.

In practice, we use the estimator
\[
\mathcal{V} = \frac{1}{N_a^2} \sum_p \left| \tilde h_w(f_p) \right|^2 \left| \tilde d_w(f_p) \right|^2 = \frac{1}{N_a} \sum_n z^2(n\Delta t). \tag{B8}
\]
The variance of this estimator is
\[
\sigma^2_{\mathcal{V}} = \langle \mathcal{V}^2 \rangle - \langle \mathcal{V} \rangle^2 = \frac{1}{N_a^4} \sum_{p,q} \left| \tilde h_w(f_p) \right|^2 \left| \tilde h_w(f_q) \right|^2 \left[ \left\langle \left[ \tilde d_w(f_p) \right]^* \tilde d_w(f_p) \left[ \tilde d_w(f_q) \right]^* \tilde d_w(f_q) \right\rangle - \left\langle \left[ \tilde d_w(f_p) \right]^* \tilde d_w(f_p) \right\rangle \left\langle \left[ \tilde d_w(f_q) \right]^* \tilde d_w(f_q) \right\rangle \right]
\]
\[
= \frac{1}{N_a^4} \sum_{p,q} \left| \tilde h_w(f_p) \right|^2 \left| \tilde h_w(f_q) \right|^2 \left[ \left\langle \left[ \tilde d_w(f_p) \right]^* \left[ \tilde d_w(f_q) \right]^* \right\rangle \left\langle \tilde d_w(f_p)\, \tilde d_w(f_q) \right\rangle + \left\langle \left[ \tilde d_w(f_p) \right]^* \tilde d_w(f_q) \right\rangle \left\langle \tilde d_w(f_p) \left[ \tilde d_w(f_q) \right]^* \right\rangle \right]
\]
\[
= \frac{1}{N_a^4} \sum_{p,q} \left| \tilde h_w(f_p) \right|^2 \left| \tilde h_w(f_q) \right|^2 \left( \frac{N_a}{2\Delta t} \right)^2 S^2_{d_w}(f_p) \left[ \delta_{p,-q} + \delta_{p,q} \right]
\]
\[
= \frac{1}{2 (N_a \Delta t)^2} \sum_p \left| \tilde h_w(f_p) \right|^4 S^2_{d_w}(f_p). \tag{B9}
\]
We used Wick's theorem in going from the second to the third line, and the real-valued nature of the template and the data in the subsequent lines. If the data d_w were white, as earlier, we have S_{d_w} = 2Δt.
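The estimator of Eq. (B8) can be sanity-checked with a short Monte Carlo on perfectly white data (toy sizes; the random template is a placeholder for a whitened waveform): it is unbiased, and its per-segment scatter shrinks as ∼1/√N_a, which is the scaling behind the sample-count estimate below.

```python
import numpy as np

rng = np.random.default_rng(7)
Na, lh, trials = 2048, 128, 500

h_w = rng.standard_normal(lh)
h_w /= np.sqrt(np.sum(h_w ** 2))               # unit-norm whitened template

V = np.empty(trials)
for i in range(trials):
    d_w = rng.standard_normal(Na)              # perfectly whitened data
    z = np.convolve(d_w, h_w[::-1], mode="valid")   # overlaps, unit variance
    V[i] = np.mean(z ** 2)                     # the estimator of Eq. (B8)

# Unbiased (<V> = 1), with per-segment scatter of order 1/sqrt(Na)
# (the exact prefactor depends on the template autocorrelation, Eq. B10).
assert abs(V.mean() - 1.0) < 0.02
assert 0.01 < V.std() < 0.2
```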
Substituting this, we get
\[
\sigma^2_{\mathcal{V}} \Big|_{\text{white } d_w} = \frac{2}{N_a^2} \sum_p \left| \tilde h_w(f_p) \right|^4 = \frac{4}{N_a^2} \sum_{p > 0} \left| \tilde h_w(f_p) \right|^4. \tag{B10}
\]
In the above formula, note that the FFT of the whitened template is performed on an interval of length N_a. In practice, we want to solve for N_a given a particular tolerance requirement. Suppose we have the whitened template on a frequency grid conjugate to a fiducial window of length N. From the connection between the DFT and the continuous Fourier transform, we can scale the sum in Eq. (B10) according to the size of the grid:
\[
\sigma^2_{\mathcal{V}} \Big|_{\text{white } d_w} \approx \frac{4}{N_a N} \sum_{p_N > 0} \left| \tilde h_w\!\left( f_{p_N} \right) \right|^4. \tag{B11}
\]
The formula above suggests that to measure λ_w² = ⟨𝒱⟩ = V(0) to two percent relative error, we need O(15 000) samples, i.e., ∼15 s for a typical waveform.

[1] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. Rev. Lett. 116, 061102 (2016).
[2] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. Rev. Lett. 116, 241103 (2016).
[3] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. Rev. X 6, 041015 (2016).
[4] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. Rev. Lett. 118, 221101 (2017).
[5] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), The Astrophysical Journal 851, L35 (2017).
[6] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. Rev. Lett. 119, 161101 (2017).
[7] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), (2018), arXiv:1811.12907 [astro-ph.HE].
[8] B. Zackay, T. Venumadhav, L. Dai, J. Roulet, and M. Zaldarriaga, arXiv e-prints (2019), arXiv:1902.10331 [astro-ph.HE].
[9] T. Venumadhav, B. Zackay, J. Roulet, L. Dai, and M. Zaldarriaga, (2019), arXiv:1904.07214 [astro-ph.HE].
[10] S. Sachdev, S. Caudill, H. Fong, R. K. L. Lo, C. Messick, D. Mukherjee, R. Magee, L. Tsukada, K. Blackburn, P. Brady, P. Brockill, K. Cannon, S. J. Chamberlin, D. Chatterjee, J. D. E. Creighton, P. Godwin, A. Gupta, C. Hanna, S. Kapadia, R. N. Lang, T. G. F. Li, D. Meacher, A. Pace, S. Privitera, L. Sadeghian, L. Wade, M. Wade, A. Weinstein, and S. L. Xiao, (2019), arXiv:1901.08580 [gr-qc].
[11] S. A. Usman, A. H. Nitz, I. W. Harry, C. M. Biwer, D. A. Brown, M. Cabero, C. D. Capano, T. Dal Canton, T. Dent, S. Fairhurst, M. S. Kehl, D. Keppel, B. Krishnan, A. Lenon, A. Lundgren, A. B. Nielsen, L. P. Pekowsky, H. P. Pfeiffer, P. R. Saulson, M. West, and J. L. Willis, Classical and Quantum Gravity 33, 215004 (2016), arXiv:1508.02357 [gr-qc].
[12] T. Venumadhav, B. Zackay, J. Roulet, L. Dai, and M. Zaldarriaga, Phys. Rev. D 100, 023011 (2019).
[13] B. P. Abbott et al., Classical and Quantum Gravity 35, 065010 (2018), arXiv:1710.02185 [gr-qc].
[14] "Gravitational Wave Open Science Center (GWOSC)," (accessed 2019-02-27).
[15] M. Vallisneri, J. Kanner, R. Williams, A. Weinstein, and B. Stephens, in Journal of Physics Conference Series, Vol. 610 (2015) p. 012021, arXiv:1410.4839 [gr-qc].
[16] T. Venumadhav et al., in preparation.
[17] J. Roulet, L. Dai, T. Venumadhav, B. Zackay, and M. Zaldarriaga, Phys. Rev. D 99, 123022 (2019).
[18] N. J. Cornish and T. B. Littenberg, Classical and Quantum Gravity 32