[PDF] A LOFAR RFI detection pipeline and its first results

Abstract

Radio astronomy is entering a new era with new and future radio observatories such as the Low Frequency Array and the Square Kilometer Array. We describe in detail an automated flagging pipeline and evaluate its performance. With only a fraction of the computational cost of correlation and its use of the previously introduced SumThreshold method, it is found to be both fast and unrivalled in its high accuracy. The LOFAR radio environment is analysed with the help of this pipeline. The high time and spectral resolution of LOFAR have resulted in an observatory where only a few percent of the data is lost due to RFI.

Full PDF

AA LOFAR RFI detection pipeline and its ﬁrst results

A.R. Offringa ∗ Kapteyn Astronomical Institute, University of Groningen, The NetherlandsE-mail: [email protected]

A.G. de Bruyn,

Kapteyn Astronomical Institute and ASTRON, The NetherlandsE-mail: [email protected]

S. Zaroubi,

Kapteyn Astronomical Institute, University of Groningen, The NetherlandsE-mail: [email protected]

M. Biehl

Institute for Mathematics and Computing Science, University of Groningen, The NetherlandsE-mail: [email protected]

Radio astronomy is entering a new era with new and future radio observatories such as the LowFrequency Array and the Square Kilometer Array. We describe in detail an automated ﬂaggingpipeline and evaluate its performance. With only a fraction of the computational cost of correla-tion and its use of the previously introduced SumThreshold method, it is found to be both fast andunrivalled in its high accuracy. The LOFAR radio environment is analysed with the help of thispipeline. The high time and spectral resolution of LOFAR have resulted in an observatory whereonly a few percent of the data is lost due to RFI.

RFI mitigation workshop - RFI2010,March 29-31, 2010Groningen, the Netherlands ∗ Speaker. c (cid:13) Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike Licence. http://pos.sissa.it/ a r X i v : . [ a s t r o - ph . I M ] J u l LOFAR RFI pipeline

A.R. Offringa

1. Introduction

Now that radio telescopes of the next generation, such as the Low Frequency Array (LOFAR),the Murchison Wideﬁeld Array (MWA) and the Square Kilometer Array (SKA), are coming intooperation, the dawn of software-driven telescopes producing terabyte sized data sets has begun.Because of that, data reduction in radio astronomy is entering a new era in which more emphasisis put on automated data processing and pipelining the various steps in the data reduction. Oneimportant step in the reduction process is dealing with radio frequency interference (RFI). Althoughtrivial techniques performed by the data reducing scientist, such as manual baseline, time andfrequency selection, have been sufﬁcient albeit tedious for the last decades, more sophisticatedautomated ﬂagging procedures are required for the next generation telescopes.RFI mitigation can be performed pre- as well as post-correlation. Pre- and post-correlationtechniques are mostly complimentary: they ﬁnd or remove different kinds of RFI. Hence, the im-plementation of one does not make the other obsolete. The pre-correlation mitigation stage candetect RFI bursts of a sub-integration time changing nature with minimal loss of data, though hasto be executed on-line , implying the requirement of real-time execution on small domains, in par-ticular on a small time interval with respect to the entire observation. Examples of pre-correlationmethods are based on thresholding [15, 9, 2, 10]; spatial ﬁltering with eigenvalue decomposition[9, 13, 5]; and adaptive cancellation with a reference antenna [3].To deal with RFI, the post-correlation phase is the ﬁnal resort. Demonstrated techniques in-clude the use of an independent RFI reference signal to subtract RFI [4]; an approach using singularvalue decomposition [11, 12]; and fringe ﬁtting [1]. Since RFI comes in many forms [6, 8] not allcontaminated samples can be recovered, despite the numerous existing techniques. Therefore, ﬂag-ging remains an important ﬁnal step [11, 17, 16].The situation for RFI ﬂagging strategies in modern observatories such as the Westerbork Syn-thesized Radio Telescope (WSRT), LOFAR and the Giant Metrewave Radio Telescope (GMRT)has changed. On one hand, the high time and frequency resolutions make accurate ﬂagging of con-taminated samples available, resulting in smaller loss of data. On the other hand, radio quiet zonesare harder to achieve, and all of the above mentioned telescopes are situated in populated areas.Moreover, sensitivity requirements for telescopes are growing. For example, one of the LOFARkey science projects is the LOFAR Epoch of Reionization (EoR) project [7, 14], a very ambitiousproject with high demands on sensitivity and noise behaviour.Automated ﬂaggers can be compared on accuracy, i.e., the true/false-positive ratio; the speedof the algorithm; robustness; and technical requirements that it imposes. Constructing a ﬂaggerthat performs good on all aspects is challenging. In this article, we will describe a novel ﬂaggingstrategy that has been designed to take this challenge, and has been implemented in the LOFARobservatory pipeline.

2. Method

In this section we will explain the ﬂagger step by step. An overview of the ﬂow of executionis given in Figure 1. 2

LOFAR RFI pipeline

A.R. Offringa

Mark obvious badchannels/time stepsCalculateAmplitudesSumThresholdChangeresolution 2D surface fitContinue iterating? (+increase sensitivity)

SumThreshold1D Density-baseddilationFlag badchannels/time stepsInput: one baselineTime-frequency dataXX, XY, YX, YY or Stokes I (or one after each other)

FlagsFlags outputYes

Figure 1:

Overview of the RFI ﬂag-ging strategy

The ﬂagger is executed on the amplitude information ofone polarisation of a single sub-band of a baseline. In LO-FAR’s common operation, a sub-band consists of 256 chan-nels of 0.8 kHz resolution. The full band has 248 sub-bands.LOFAR can observe in two bands: the 10-80 MHz low bandand the 110-240 MHz high band, which are observed by phys-ically different antennae.If speed is essential, the algorithm can be executed onceon the Stokes-I values. Otherwise, if accuracy is more impor-tant than speed, the algorithm can be executed on the individ-ual XX and YY or LL and RR polarisations, or on all polar-isations individually. We do see some RFI that manifests inonly one of the polarisations, or rotates through the polarisa-tions, and some advantage is therefore seen when ﬂagging allpolarisations individually.

A part of the algorithm is iterated a few times, depictedin Figure 1 by the “Continue iterating” block. This is nec-essary for ﬁnding low-level RFI, as will be explained in thethresholding paragraph, §2.3. Iterations, however, are costly in term of speed, and should be keptto a minimum. To do so, the ﬁt should converge quickly. We do this by entirely ignoring channelsand time steps in the ﬁrst surface ﬁt that superﬁcially look bad, yet might only have been partiallyuncontaminated. The extra information that might have been added if the uncontaminated part ofthe channel or time step was added does not change the ﬁt much, and therefore is not slowing downthe convergence.It was determined that performing the ﬁt two times is enough for a stable, accurate ﬁt. Thisis true for all data that was tested, in special for both WSRT and LOFAR data, and for both cleanbands and strongly contaminated bands.

SumThreshold method

The

SumThreshold method detects series of samples with higher values than expected.Consider the orthogonal slices ρ d ( x ) and Ω d ( x ) through the ﬁt-subtracted visibilities R ( t , ν ) andthe ﬂag mask W ( t , ν ) respectively, where W ( t , ν ) evaluates to zero if the corresponding value inthe mask at time t and frequency ν has been ﬂagged and one otherwise. The function parameter x isscaled to have a unity step size in time or frequency, for respectively a slice through the frequencydirection when d = d =

2. The

SumThreshold method can now bedeﬁned by the following decision rule: Ω n + d ( x ) =  Ω nd ( x ) = ∨ ∃ i ∈ { . . . M n − } : M n − ∑ j = (cid:12)(cid:12) ρ d (cid:12) Ω nd ( x + i − j )) (cid:12)(cid:12) > χ MnMn − ∑ j = Ω nd ( x + i − j ) LOFAR RFI pipeline

A.R. Offringa with parameters M = { M , M , . . . , M max } , the set of combination lengths and χ M N , the corre-sponding threshold levels for a speciﬁc combination length M N ∈ M . In the previous study [11],the SumThreshold was introduced and was shown to produce the highest accuracy of currentpost-correlation RFI detection algorithms. We refer to this study for detailed information about the

SumThreshold method.

SumThreshold is performed in each iteration once, before the surface ﬁt, in order to ignoreRFI when ﬁtting. It is performed one last time when the surface ﬁt is expected to have beenconverged, to establish the actual ﬂags. To increase the stability of the strategy, the sensitivity ofthe

SumThreshold method starts low, i.e., it ﬁnds only the strongest RFI, and is exponentiallyincreased each time it is executed.

After

SumThreshold has found the contaminated samples, we observe especially after theﬁrst iteration, that some channels and time scans have not been fully ﬂagged, even though theyare fully contaminated. As explained in §2.2, this might slow down convergence, which is whya second step was implemented in order to completely, hence inaccurately but quickly, ﬂag thesechannels and time steps before ﬁtting.In order to detect problematic channels and time steps, the values are compared based ontheir root mean square (RMS) values. The RMS series are Gaussian smoothed and if the differenceexceeds 3.5 times the standard deviation of the sequence of differences, they are ﬂagged completely.Optionally, this selection can be executed again as the last step in the algorithm.

A surface ﬁt is executed to subtract fringes caused by strong sources, in order to increaseaccuracy. Several ﬁtting strategies have been tested, and all sliding window methods show similarlygood results in terms of accuracy. In non-sliding window approaches such as the dimensional-independent polynomial ﬁt described in [17], we have observed instability near the borders of theﬁxed windows.A Gaussian kernel was found to produce the best average between speed, accuracy and stabil-ity. The accuracy is not signiﬁcantly different from other sliding window ﬁtting strategies, such asa dimensional-independent polynomial ﬁt applied on a sliding window.Since the ﬁt is, relative to the other operations, a time-consuming operation, the input time-frequency matrix is rescaled before ﬁtting. The time dimension is three times reduced beforeﬁtting, and the ﬁtted Gaussians are interpolated to restore the original scale. No signiﬁcant changein accuracy was observed, which underlines that the quality of the ﬁt is, up to some point, not acrucial aspect of accurate detection.

It may be desirable to ﬂag samples that are up to a few channels away from strong, continuousRFI. Thresholding does not ﬂag these samples, if they are not signiﬁcantly different in amplitude.Likewise, it may be desirable to ﬂag more of a partially ﬂagged channel, because a continuoustransmitter might be recorded at different amplitudes, either because of different propagation of the4

LOFAR RFI pipeline

A.R. Offringa (a) Time-frequency plot without the dilation (b) Time-frequency plot after dilation

Figure 2:

The result of a density-based dilation with η = . : the ﬂags in panel (a) are established by the SumThreshold method and dilated based on the ﬂag density. The result is shown in panel (b). Noticeabledifferences are the small gaps in orthogonal lines that have been ﬁlled by the dilation, such as the area withinthe red ellipse. While this diagram displays over 6000 time steps, the algorithm also ﬁlls many invisible smallholes: its behaviour is scale invariant. signal, because of the transmitter moving in respect with the beam or because of a transmitter’sintrinsically changing strength, and this might cause the received RFI not to trigger the thresholdin some samples. To overcome this problem, we enlarge the ﬂag mask after the apparent RFI hasbeen ﬂagged by the iterative procedure.A typical approach in this problem is to perform a morphological dilation operation on theﬂag mask. For example, a dilation with a square mask of size N × N would enlarge each ﬂag toa square of N × N . Every sample, that has an orthogonal distance smaller than N samples froma ﬂagged sample, would be ﬂagged in this case. Although this technique is advantageous for itssimplicity and establishment in the ﬁeld of mathematical morphology, using this technique for thedescribed purpose has the disadvantage of being inaccurate: it will typically ﬂag too many sampleswhen only a few samples are ﬂagged in some area, while too few samples will be ﬂagged when achannel or time step is almost completely ﬂagged.To correct for these problems, we introduce a dilation operation which mask size is related tothe one dimensional ﬂag density: the dilation mask is larger in dense areas and smaller in sparseareas, in respect to either the one dimensional time domain or frequency domain.Consider an orthogonal slice Ω d ( x ) through the ﬂag mask as deﬁned in §2.3. The followingdensity-based decision rule is introduced: Ω (cid:48) d ( x ) =  ∃ Y ≤ x , ∃ Y > x : Y − ∑ y = Y Ω d ( y ) ≤ η ( Y − Y ) η ∈ [ , ] is the density ratio threshold. In words, this rule ﬂags the samples that are inany constructable area [ Y ; Y (cid:105) with an unﬂagged sample ratio less or equal than η . Speciﬁcally, Ω (cid:48) ( x ) = x if η =

1, while Ω (cid:48) ( x ) = Ω ( x ) for η =

0. Furthermore, since any element x with Ω ( x ) = Ω ( x ) = = ⇒ Ω (cid:48) ( x ) = O (cid:0) n (cid:1) operations for n samples in the orthogonal slice Ω d ( x ) , by putting extra constraints on Y and Y , an O ( n log n ) implementation is possible without much loss of its accuracy. Figure 2 showsthe result of such a dilation on actual data. 5 LOFAR RFI pipeline

A.R. Offringa

3. Computational requirements

Table 1 shows an estimate of the required ﬂoating point operations per sample for each indi-vidual step. The total number of operations required is on the order of 300 ﬂoating point operationsper sample. In a typical full LOFAR observation, the correlator will output 4 polarisations ×

256 channels/sub-band ×

248 sub-bands × baselines × ≈ . ∼ Table 1:

Computational requirements of the RFI pipeline

Step F/smp Count Total F/smp Calculating amplitudes 4 1 4Time/frequency selection 2 3 6

SumThreshold

20 3 60Change resolution 4 2 8Surface ﬁt 50 2 100Density-based dilation 100 Although this is only a small fractionof the required computations for correla-tion, some simpliﬁcations can be madeto lower the computational requirements.Techniques to improve the computationalperformance include: ﬂagging on Stokes-I values; using a larger resizing factor be-fore ﬁtting; using a smaller window size;and determining the cross-correlation ﬂag masks using autocorrelations.The LOFAR ﬂagging pipeline will be run on an off-line computing cluster. The ﬂaggingpipeline is parallellized by running each sub-band on a different computational node, and the ﬂag-ging of the individual sub-bands is executed by a multi-threaded implementation. Concluding fromthe interpolation of the performance of the current implementation of the pipeline, which achievesprocessing 27 stations in a quarter of the observing time with its most computational expensiveﬂagging strategy, real-time performance can be realised in a full 50 station LOFAR.

4. Input/output requirements

Processing baseline by baseline in a pipeline has implications for the software architecture ofthe observatory: since baselines are correlated simultaneously, the observed visibilities have to bewritten to disk before running the RFI pipeline, which is inefﬁcient. After ﬁnishing observing, theﬂagging pipeline can read the data in the required order. However, ﬂagging is normally followedby tasks such as calibration and source subtraction. These tasks expect time-sorted data, therebyrequiring a second read of the data in its previously observed order. Since the architecture ofLOFAR allows this ﬂow of processing, and because of the advantages of baseline by baselineﬂagging in terms of accuracy and computational speed, the input/output-overhead caused by thisdeﬁciency is ignorable. This however might become a serious issue in even larger telescopes suchas the SKA.

5. Flagging results

The implementation of the algorithm was tested on several LOFAR observations. Currently, Floating point operations per sample These are actually integer operations, since this step uses the masks. The software implementation of the presented RFI pipeline has been made publicly available and can be down-loaded from the following location: LOFAR RFI pipeline

A.R. Offringa (a) Time-frequency plot before ﬂagging (b) Time-frequency plot after ﬂagging V i s i b ili t y Time XXXYYXYY (c) Amplitude plot before ﬂagging V i s i b ili t y Time XXXYYXYY (d) Amplitude plot after ﬂagging V i s i b ili t y Frequency (MHz) BeforeAfter (e) Power spectrum

Figure 3:

Flagging results of the 6 hour LOFAR observation L2010_07096 of April 24, 2010. All plots showthe same randomly chosen sub-band around 156 MHz for a 1.5-km baseline (CS302_HBA1 × CS005_HBA0)with three second integration time. The ﬂagging pipeline was run with its default settings, and 1.8% of thedata is ﬂagged. As can be seen from panels (a), (c) and (e), this sub-band contains relatively many interferingtransmitters, yet all of them are relatively weak. The panels (b), (d) and (e) show the cleaned band afterﬂagging.

27 of the approximately 50 total LOFAR stations are ready. Flagging a single sub-band of a 6 hourobservation with the 27 stations takes 90 minutes on a single cluster node. This implies real-timeﬂagging speed for the full 50 station LOFAR that will produce four times more data. All RFI thatcan be found by visual inspection is typically ﬂagged, thereby outperforming simpler methods suchas a median absolute threshold ﬁlter in both accuracy and speed. An example result can be foundin Figure 3.The pipeline is, by the use of the

SumThreshold method, very accurate. In some cases, thealgorithms ﬁnds RFI which is invisible by eye on full scale time-frequency diagrams, but becomesonly apparent when zooming in on the data and integrating certain cuts of the data cube. In the bandshown in ﬁgure 3 an interferer is visible at approximately 156.03 MHz. Although it is visible as asmall bump in the time integrated spectrum in Figure 3(e), it is not apparent in the time frequencyplot of Figure 3(a). Nevertheless, the algorithm ﬁnds the samples that are contaminated by theinterferer, and the particular bump at 156.03 MHz in Figure 3(e) is ﬂattened.On the other hand, if an interferer has a smooth time-frequency proﬁle, it will be mistakenfor astronomical data and will not be ﬂagged. In these situations it might help to subtract a roughmodel for the celestial signal and increase the ﬂagger’s sensitivity.7

LOFAR RFI pipeline

A.R. Offringa

6. LOFAR RFI environment

LOFAR breaks the tradition of building telescopes in sparsely populated areas, with its corebeing installed in the North-East of the Netherlands. Although the core is in a nature reserve,and therefore in a sparser populated part of the Netherlands, all the stations are relatively close tofarms, roads and some nearby municipalities. Now that LOFAR is half-way ready and performingrepresentable observations, we can start to evaluate the dynamic radio environment.The ﬁrst results of RFI mitigation show several promising characteristics of the LOFAR site.First of all, hardly any broadband RFI is observed. If observed, it is typically caused by electricalfences, lightning, power cables, hardware in situ, cars and trains. It can be concluded that the siteis sufﬁciently remote and hardware on site is sufﬁciently shielded to prevent these interferences.Only one of the stations is close to an electrical fence that surrounds a farming meadow, causingbroadband spikes every two or three seconds. The ﬂagging pipeline ﬂags 40% of the data in thisstation, and this station is therefore currently not useful. Options include negotiation with thefarmer to switch off the electrical fence during observations or implementing an RFI nulling methodin the station that nulls spikes on a high time resolution.A second class of interferers are constant transmitters at a ﬁxed frequency, such as FM radio.The FM range lies between the physically separated low and high bands. Transmitters in this rangeare therefore effectively blocked by the bandpass ﬁlters. Other constant sources that do transmitwithin the observing frequency often occupy only one or a few 0.8-kHz channels, which, after thepipeline has ﬂagged these transmitters, cause only a minimal amount of data loss. While many ofthe sub-bands of 256-channels are completely clean of such constant transmitters, others have afew of such transmitters, such as the one shown in Figure 3.A third class of interferers are transient sources with variable frequency. These occur mostlyat random and their exact origin is often unknown. Some of these can be caused by moving objects,such as meteors or airplanes, that reﬂect a distant signal for a short period.Considering all classes of interferers, typical observations with representable stations showonly a few percent of data loss due to interference.

7. Conclusion and discussion

Radio astronomy is entering a new era with futuristic observatories such as LOFAR and theSKA. In this article we have presented a ﬂagging technique that has shown the ability to operateaccurately and efﬁciently on the LOFAR observations. Therefore, this technique is also a goodbasis for future observatories.Because the computational costs of the RFI pipeline are only a fraction of the correlation costs,efﬁciently ordering the data before presentation to an RFI algorithm is the largest challenge, ratherthan optimising the computational costs. The pipeline also stipulates the importance of ﬂexibility inan observatories’ architecture, which adds freedom to design decisions. The LOFAR architecturealso allows more sophisticated variations that include RFI mitigation at station level and differentpipelines based on the observation mode. With the example of a complicated pipeline as describedin this paper, it can be concluded that other algorithms such as transient detection and other patternrecognition techniques can be implemented in a similar manner in the pipeline.8

LOFAR RFI pipeline

A.R. Offringa

Both the software and hardware of LOFAR are still under construction. The ﬁrst observationsof LOFAR nevertheless show very good prospects for the telescope, with only a few percent lostdata due to interferers and, highly important, neither broadband nor in situ interference is com-monly seen. The next step in RFI mitigation is to produce and analyse images on a maximumdynamic range, in order to analyse the effects of possible weak RFI that is undetectable in post-correlated time frequency domains. Prevention of new transmitters remains very important, andestablishment of a radio-quiet zone, especially around the core, is recommended.In order to improve data quality further, pre-correlation techniques might be added at stationlevel or during correlation. An interesting improvement to the robustness of a correlator mightbe to execute the

SumThreshold method prior to correlation. Considering the accuracy gainof the

SumThreshold compared to normal thresholding, and considering the correspondence ofRFI on small and large timescales, implementing this pre-correlation method on the highest timeresolution data might improve blanking accuracy further.

Acknowledgments

We thank the LOFAR collaboration for providing the data on which the ﬂagger was developedand tested.

References [1] R. Athreya. A new approach to mitigation of radio frequency interference in interferometric data. AJ ,696:885–890, April 2009.[2] W. A. Baan, P. A. Fridman, and R. P. Millenaar. Radio Frequency Interference Mitigation at theWesterbork Synthesis Radio Telescope: Algorithms, Test Observations, and System Implementation. AJ , 128:933–949, August 2004.[3] C. Barnbaum and R. F. Bradley. A new approach to interference excision in radio astronomy:Real-time adaptive cancellation. AJ , 115:2598–2614, November 1998.[4] F.H. Briggs, J. F. Bell, and M. J. Kesteven. Removing radio interference from contaminatedastronomical spectra using an independent reference signal and closure relations. AJ , 120:3351–3361,December 2000.[5] Steven W. Ellingson and Grant A. Hampson. A subspace-tracking approach to interference nulling forphased array-based radio telescopes. IEEE Trans. on Antennas & Propagation , 50:25–30, January2002.[6] P. A. Fridman and W. A. Baan. RFI mitigation methods in radio astronomy.

A&A , 378:327–344,October 2001.[7] V. Jeli´c, S. Zaroubi, P. Labropoulos, et al. Foreground simulations for the LOFAR-epoch ofreionization experiment.

MNRAS , 389:1319–1335, September 2008.[8] John J. Lemmon. Wideband model of man-made hf noise and interference.

Radio Science ,32:525–539, March 1997.[9] Amir Leshem, Alle-Jan van der Veen, and Albert-Jan Boonstra. Multichannel interference mitigationtechniques in radio astronomy.

ApJS , 131:355–373, November 2000. LOFAR RFI pipeline

A.R. Offringa[10] N. Niamsuwan, J. T. Johnson, and S. W. Ellingson. Examination of a simple pulse-blanking techniquefor radio frequency interference mitigation.

Radio Science , 40, June 2005.[11] A. R. Offringa, A. G. de Bruyn, M. Biehl, S. Zaroubi, G. Bernardi, and V.N. Pandey. Post-correlationradio frequency interference classiﬁcation methods.

MNRAS , 405:155–167, 2010.[12] Ue-Li Pen, Tzu-Ching Chang, Jeffrey B. Peterson, Jayanta Roy, Yashwant Gupta, Christopher M.Hirata, Julia Odegova, and Kris Sigurdson. The GMRT EoR Experiment: Limits on Polarized SkyBrightness at 150 MHz.

MNRAS , 399:181–194, 2009.[13] Bart Smolders and Grant Hampson. Deterministic RF nulling in phased arrays for the next generationof radio telescopes.

IEEE Antennas & Propagation magazine , 44:13–22, August 2002.[14] R. M. Thomas, S. Zaroubi, B. Ciardi, et al. Fast large-scale reionization simulations.

MNRAS ,393:32–48, February 2009.[15] R. Weber, C. Faye, F. Biraud, and J. Dansou. Spectral detector for interference time blanking.

A&AS ,126:161–167, 1997.[16] B. Winkel, J. Kerp, and P.M.W. Kalberla. Data reduction strategy of the Effelsberg-Bonn HI Survey(EBHIS).

Proc. Panoramic Radio Astronomy: Wide-ﬁeld 1-2 GHz research on galaxy evolution,PoS(PRA2009)043 , June 2009.[17] B. Winkel, J. Kerp, and S. Stanko. RFI detection by automated feature extraction and statisticalanalysis. AN , 88:789–801, 2006., 88:789–801, 2006.