A LOFAR RFI detection pipeline and its first results
AA LOFAR RFI detection pipeline and its first results
A.R. Offringa ∗ Kapteyn Astronomical Institute, University of Groningen, The NetherlandsE-mail: [email protected]
A.G. de Bruyn,
Kapteyn Astronomical Institute and ASTRON, The NetherlandsE-mail: [email protected]
S. Zaroubi,
Kapteyn Astronomical Institute, University of Groningen, The NetherlandsE-mail: [email protected]
M. Biehl
Institute for Mathematics and Computing Science, University of Groningen, The NetherlandsE-mail: [email protected]
Radio astronomy is entering a new era with new and future radio observatories such as the LowFrequency Array and the Square Kilometer Array. We describe in detail an automated flaggingpipeline and evaluate its performance. With only a fraction of the computational cost of correla-tion and its use of the previously introduced SumThreshold method, it is found to be both fast andunrivalled in its high accuracy. The LOFAR radio environment is analysed with the help of thispipeline. The high time and spectral resolution of LOFAR have resulted in an observatory whereonly a few percent of the data is lost due to RFI.
RFI mitigation workshop - RFI2010,March 29-31, 2010Groningen, the Netherlands ∗ Speaker. c (cid:13) Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike Licence. http://pos.sissa.it/ a r X i v : . [ a s t r o - ph . I M ] J u l LOFAR RFI pipeline
A.R. Offringa
1. Introduction
Now that radio telescopes of the next generation, such as the Low Frequency Array (LOFAR),the Murchison Widefield Array (MWA) and the Square Kilometer Array (SKA), are coming intooperation, the dawn of software-driven telescopes producing terabyte sized data sets has begun.Because of that, data reduction in radio astronomy is entering a new era in which more emphasisis put on automated data processing and pipelining the various steps in the data reduction. Oneimportant step in the reduction process is dealing with radio frequency interference (RFI). Althoughtrivial techniques performed by the data reducing scientist, such as manual baseline, time andfrequency selection, have been sufficient albeit tedious for the last decades, more sophisticatedautomated flagging procedures are required for the next generation telescopes.RFI mitigation can be performed pre- as well as post-correlation. Pre- and post-correlationtechniques are mostly complimentary: they find or remove different kinds of RFI. Hence, the im-plementation of one does not make the other obsolete. The pre-correlation mitigation stage candetect RFI bursts of a sub-integration time changing nature with minimal loss of data, though hasto be executed on-line , implying the requirement of real-time execution on small domains, in par-ticular on a small time interval with respect to the entire observation. Examples of pre-correlationmethods are based on thresholding [15, 9, 2, 10]; spatial filtering with eigenvalue decomposition[9, 13, 5]; and adaptive cancellation with a reference antenna [3].To deal with RFI, the post-correlation phase is the final resort. Demonstrated techniques in-clude the use of an independent RFI reference signal to subtract RFI [4]; an approach using singularvalue decomposition [11, 12]; and fringe fitting [1]. Since RFI comes in many forms [6, 8] not allcontaminated samples can be recovered, despite the numerous existing techniques. Therefore, flag-ging remains an important final step [11, 17, 16].The situation for RFI flagging strategies in modern observatories such as the Westerbork Syn-thesized Radio Telescope (WSRT), LOFAR and the Giant Metrewave Radio Telescope (GMRT)has changed. On one hand, the high time and frequency resolutions make accurate flagging of con-taminated samples available, resulting in smaller loss of data. On the other hand, radio quiet zonesare harder to achieve, and all of the above mentioned telescopes are situated in populated areas.Moreover, sensitivity requirements for telescopes are growing. For example, one of the LOFARkey science projects is the LOFAR Epoch of Reionization (EoR) project [7, 14], a very ambitiousproject with high demands on sensitivity and noise behaviour.Automated flaggers can be compared on accuracy, i.e., the true/false-positive ratio; the speedof the algorithm; robustness; and technical requirements that it imposes. Constructing a flaggerthat performs good on all aspects is challenging. In this article, we will describe a novel flaggingstrategy that has been designed to take this challenge, and has been implemented in the LOFARobservatory pipeline.
2. Method
In this section we will explain the flagger step by step. An overview of the flow of executionis given in Figure 1. 2
LOFAR RFI pipeline
A.R. Offringa
Mark obvious badchannels/time stepsCalculateAmplitudesSumThresholdChangeresolution 2D surface fitContinue iterating? (+increase sensitivity)
SumThreshold1D Density-baseddilationFlag badchannels/time stepsInput: one baselineTime-frequency dataXX, XY, YX, YY or Stokes I (or one after each other)
FlagsFlags outputYes
Figure 1:
Overview of the RFI flag-ging strategy
The flagger is executed on the amplitude information ofone polarisation of a single sub-band of a baseline. In LO-FAR’s common operation, a sub-band consists of 256 chan-nels of 0.8 kHz resolution. The full band has 248 sub-bands.LOFAR can observe in two bands: the 10-80 MHz low bandand the 110-240 MHz high band, which are observed by phys-ically different antennae.If speed is essential, the algorithm can be executed onceon the Stokes-I values. Otherwise, if accuracy is more impor-tant than speed, the algorithm can be executed on the individ-ual XX and YY or LL and RR polarisations, or on all polar-isations individually. We do see some RFI that manifests inonly one of the polarisations, or rotates through the polarisa-tions, and some advantage is therefore seen when flagging allpolarisations individually.
A part of the algorithm is iterated a few times, depictedin Figure 1 by the “Continue iterating” block. This is nec-essary for finding low-level RFI, as will be explained in thethresholding paragraph, §2.3. Iterations, however, are costly in term of speed, and should be keptto a minimum. To do so, the fit should converge quickly. We do this by entirely ignoring channelsand time steps in the first surface fit that superficially look bad, yet might only have been partiallyuncontaminated. The extra information that might have been added if the uncontaminated part ofthe channel or time step was added does not change the fit much, and therefore is not slowing downthe convergence.It was determined that performing the fit two times is enough for a stable, accurate fit. Thisis true for all data that was tested, in special for both WSRT and LOFAR data, and for both cleanbands and strongly contaminated bands.
SumThreshold method
The
SumThreshold method detects series of samples with higher values than expected.Consider the orthogonal slices ρ d ( x ) and Ω d ( x ) through the fit-subtracted visibilities R ( t , ν ) andthe flag mask W ( t , ν ) respectively, where W ( t , ν ) evaluates to zero if the corresponding value inthe mask at time t and frequency ν has been flagged and one otherwise. The function parameter x isscaled to have a unity step size in time or frequency, for respectively a slice through the frequencydirection when d = d =
2. The
SumThreshold method can now bedefined by the following decision rule: Ω n + d ( x ) = Ω nd ( x ) = ∨ ∃ i ∈ { . . . M n − } : M n − ∑ j = (cid:12)(cid:12) ρ d (cid:12) Ω nd ( x + i − j )) (cid:12)(cid:12) > χ MnMn − ∑ j = Ω nd ( x + i − j ) LOFAR RFI pipeline
A.R. Offringa with parameters M = { M , M , . . . , M max } , the set of combination lengths and χ M N , the corre-sponding threshold levels for a specific combination length M N ∈ M . In the previous study [11],the SumThreshold was introduced and was shown to produce the highest accuracy of currentpost-correlation RFI detection algorithms. We refer to this study for detailed information about the
SumThreshold method.
SumThreshold is performed in each iteration once, before the surface fit, in order to ignoreRFI when fitting. It is performed one last time when the surface fit is expected to have beenconverged, to establish the actual flags. To increase the stability of the strategy, the sensitivity ofthe
SumThreshold method starts low, i.e., it finds only the strongest RFI, and is exponentiallyincreased each time it is executed.
After
SumThreshold has found the contaminated samples, we observe especially after thefirst iteration, that some channels and time scans have not been fully flagged, even though theyare fully contaminated. As explained in §2.2, this might slow down convergence, which is whya second step was implemented in order to completely, hence inaccurately but quickly, flag thesechannels and time steps before fitting.In order to detect problematic channels and time steps, the values are compared based ontheir root mean square (RMS) values. The RMS series are Gaussian smoothed and if the differenceexceeds 3.5 times the standard deviation of the sequence of differences, they are flagged completely.Optionally, this selection can be executed again as the last step in the algorithm.
A surface fit is executed to subtract fringes caused by strong sources, in order to increaseaccuracy. Several fitting strategies have been tested, and all sliding window methods show similarlygood results in terms of accuracy. In non-sliding window approaches such as the dimensional-independent polynomial fit described in [17], we have observed instability near the borders of thefixed windows.A Gaussian kernel was found to produce the best average between speed, accuracy and stabil-ity. The accuracy is not significantly different from other sliding window fitting strategies, such asa dimensional-independent polynomial fit applied on a sliding window.Since the fit is, relative to the other operations, a time-consuming operation, the input time-frequency matrix is rescaled before fitting. The time dimension is three times reduced beforefitting, and the fitted Gaussians are interpolated to restore the original scale. No significant changein accuracy was observed, which underlines that the quality of the fit is, up to some point, not acrucial aspect of accurate detection.
It may be desirable to flag samples that are up to a few channels away from strong, continuousRFI. Thresholding does not flag these samples, if they are not significantly different in amplitude.Likewise, it may be desirable to flag more of a partially flagged channel, because a continuoustransmitter might be recorded at different amplitudes, either because of different propagation of the4
LOFAR RFI pipeline
A.R. Offringa (a) Time-frequency plot without the dilation (b) Time-frequency plot after dilation
Figure 2:
The result of a density-based dilation with η = . : the flags in panel (a) are established by the SumThreshold method and dilated based on the flag density. The result is shown in panel (b). Noticeabledifferences are the small gaps in orthogonal lines that have been filled by the dilation, such as the area withinthe red ellipse. While this diagram displays over 6000 time steps, the algorithm also fills many invisible smallholes: its behaviour is scale invariant. signal, because of the transmitter moving in respect with the beam or because of a transmitter’sintrinsically changing strength, and this might cause the received RFI not to trigger the thresholdin some samples. To overcome this problem, we enlarge the flag mask after the apparent RFI hasbeen flagged by the iterative procedure.A typical approach in this problem is to perform a morphological dilation operation on theflag mask. For example, a dilation with a square mask of size N × N would enlarge each flag toa square of N × N . Every sample, that has an orthogonal distance smaller than N samples froma flagged sample, would be flagged in this case. Although this technique is advantageous for itssimplicity and establishment in the field of mathematical morphology, using this technique for thedescribed purpose has the disadvantage of being inaccurate: it will typically flag too many sampleswhen only a few samples are flagged in some area, while too few samples will be flagged when achannel or time step is almost completely flagged.To correct for these problems, we introduce a dilation operation which mask size is related tothe one dimensional flag density: the dilation mask is larger in dense areas and smaller in sparseareas, in respect to either the one dimensional time domain or frequency domain.Consider an orthogonal slice Ω d ( x ) through the flag mask as defined in §2.3. The followingdensity-based decision rule is introduced: Ω (cid:48) d ( x ) = ∃ Y ≤ x , ∃ Y > x : Y − ∑ y = Y Ω d ( y ) ≤ η ( Y − Y ) η ∈ [ , ] is the density ratio threshold. In words, this rule flags the samples that are inany constructable area [ Y ; Y (cid:105) with an unflagged sample ratio less or equal than η . Specifically, Ω (cid:48) ( x ) = x if η =
1, while Ω (cid:48) ( x ) = Ω ( x ) for η =
0. Furthermore, since any element x with Ω ( x ) = Ω ( x ) = = ⇒ Ω (cid:48) ( x ) = O (cid:0) n (cid:1) operations for n samples in the orthogonal slice Ω d ( x ) , by putting extra constraints on Y and Y , an O ( n log n ) implementation is possible without much loss of its accuracy. Figure 2 showsthe result of such a dilation on actual data. 5 LOFAR RFI pipeline
A.R. Offringa
3. Computational requirements
Table 1 shows an estimate of the required floating point operations per sample for each indi-vidual step. The total number of operations required is on the order of 300 floating point operationsper sample. In a typical full LOFAR observation, the correlator will output 4 polarisations ×
256 channels/sub-band ×
248 sub-bands × baselines × ≈ . ∼ Table 1:
Computational requirements of the RFI pipeline
Step F/smp Count Total F/smp Calculating amplitudes 4 1 4Time/frequency selection 2 3 6
SumThreshold
20 3 60Change resolution 4 2 8Surface fit 50 2 100Density-based dilation 100 Although this is only a small fractionof the required computations for correla-tion, some simplifications can be madeto lower the computational requirements.Techniques to improve the computationalperformance include: flagging on Stokes-I values; using a larger resizing factor be-fore fitting; using a smaller window size;and determining the cross-correlation flag masks using autocorrelations.The LOFAR flagging pipeline will be run on an off-line computing cluster. The flaggingpipeline is parallellized by running each sub-band on a different computational node, and the flag-ging of the individual sub-bands is executed by a multi-threaded implementation. Concluding fromthe interpolation of the performance of the current implementation of the pipeline, which achievesprocessing 27 stations in a quarter of the observing time with its most computational expensiveflagging strategy, real-time performance can be realised in a full 50 station LOFAR.
4. Input/output requirements
Processing baseline by baseline in a pipeline has implications for the software architecture ofthe observatory: since baselines are correlated simultaneously, the observed visibilities have to bewritten to disk before running the RFI pipeline, which is inefficient. After finishing observing, theflagging pipeline can read the data in the required order. However, flagging is normally followedby tasks such as calibration and source subtraction. These tasks expect time-sorted data, therebyrequiring a second read of the data in its previously observed order. Since the architecture ofLOFAR allows this flow of processing, and because of the advantages of baseline by baselineflagging in terms of accuracy and computational speed, the input/output-overhead caused by thisdeficiency is ignorable. This however might become a serious issue in even larger telescopes suchas the SKA.
5. Flagging results
The implementation of the algorithm was tested on several LOFAR observations. Currently, Floating point operations per sample These are actually integer operations, since this step uses the masks. The software implementation of the presented RFI pipeline has been made publicly available and can be down-loaded from the following location: LOFAR RFI pipeline
A.R. Offringa (a) Time-frequency plot before flagging (b) Time-frequency plot after flagging V i s i b ili t y Time XXXYYXYY (c) Amplitude plot before flagging V i s i b ili t y Time XXXYYXYY (d) Amplitude plot after flagging V i s i b ili t y Frequency (MHz) BeforeAfter (e) Power spectrum
Figure 3:
Flagging results of the 6 hour LOFAR observation L2010_07096 of April 24, 2010. All plots showthe same randomly chosen sub-band around 156 MHz for a 1.5-km baseline (CS302_HBA1 × CS005_HBA0)with three second integration time. The flagging pipeline was run with its default settings, and 1.8% of thedata is flagged. As can be seen from panels (a), (c) and (e), this sub-band contains relatively many interferingtransmitters, yet all of them are relatively weak. The panels (b), (d) and (e) show the cleaned band afterflagging.
27 of the approximately 50 total LOFAR stations are ready. Flagging a single sub-band of a 6 hourobservation with the 27 stations takes 90 minutes on a single cluster node. This implies real-timeflagging speed for the full 50 station LOFAR that will produce four times more data. All RFI thatcan be found by visual inspection is typically flagged, thereby outperforming simpler methods suchas a median absolute threshold filter in both accuracy and speed. An example result can be foundin Figure 3.The pipeline is, by the use of the
SumThreshold method, very accurate. In some cases, thealgorithms finds RFI which is invisible by eye on full scale time-frequency diagrams, but becomesonly apparent when zooming in on the data and integrating certain cuts of the data cube. In the bandshown in figure 3 an interferer is visible at approximately 156.03 MHz. Although it is visible as asmall bump in the time integrated spectrum in Figure 3(e), it is not apparent in the time frequencyplot of Figure 3(a). Nevertheless, the algorithm finds the samples that are contaminated by theinterferer, and the particular bump at 156.03 MHz in Figure 3(e) is flattened.On the other hand, if an interferer has a smooth time-frequency profile, it will be mistakenfor astronomical data and will not be flagged. In these situations it might help to subtract a roughmodel for the celestial signal and increase the flagger’s sensitivity.7
LOFAR RFI pipeline
A.R. Offringa
6. LOFAR RFI environment
LOFAR breaks the tradition of building telescopes in sparsely populated areas, with its corebeing installed in the North-East of the Netherlands. Although the core is in a nature reserve,and therefore in a sparser populated part of the Netherlands, all the stations are relatively close tofarms, roads and some nearby municipalities. Now that LOFAR is half-way ready and performingrepresentable observations, we can start to evaluate the dynamic radio environment.The first results of RFI mitigation show several promising characteristics of the LOFAR site.First of all, hardly any broadband RFI is observed. If observed, it is typically caused by electricalfences, lightning, power cables, hardware in situ, cars and trains. It can be concluded that the siteis sufficiently remote and hardware on site is sufficiently shielded to prevent these interferences.Only one of the stations is close to an electrical fence that surrounds a farming meadow, causingbroadband spikes every two or three seconds. The flagging pipeline flags 40% of the data in thisstation, and this station is therefore currently not useful. Options include negotiation with thefarmer to switch off the electrical fence during observations or implementing an RFI nulling methodin the station that nulls spikes on a high time resolution.A second class of interferers are constant transmitters at a fixed frequency, such as FM radio.The FM range lies between the physically separated low and high bands. Transmitters in this rangeare therefore effectively blocked by the bandpass filters. Other constant sources that do transmitwithin the observing frequency often occupy only one or a few 0.8-kHz channels, which, after thepipeline has flagged these transmitters, cause only a minimal amount of data loss. While many ofthe sub-bands of 256-channels are completely clean of such constant transmitters, others have afew of such transmitters, such as the one shown in Figure 3.A third class of interferers are transient sources with variable frequency. These occur mostlyat random and their exact origin is often unknown. Some of these can be caused by moving objects,such as meteors or airplanes, that reflect a distant signal for a short period.Considering all classes of interferers, typical observations with representable stations showonly a few percent of data loss due to interference.
7. Conclusion and discussion
Radio astronomy is entering a new era with futuristic observatories such as LOFAR and theSKA. In this article we have presented a flagging technique that has shown the ability to operateaccurately and efficiently on the LOFAR observations. Therefore, this technique is also a goodbasis for future observatories.Because the computational costs of the RFI pipeline are only a fraction of the correlation costs,efficiently ordering the data before presentation to an RFI algorithm is the largest challenge, ratherthan optimising the computational costs. The pipeline also stipulates the importance of flexibility inan observatories’ architecture, which adds freedom to design decisions. The LOFAR architecturealso allows more sophisticated variations that include RFI mitigation at station level and differentpipelines based on the observation mode. With the example of a complicated pipeline as describedin this paper, it can be concluded that other algorithms such as transient detection and other patternrecognition techniques can be implemented in a similar manner in the pipeline.8
LOFAR RFI pipeline
A.R. Offringa
Both the software and hardware of LOFAR are still under construction. The first observationsof LOFAR nevertheless show very good prospects for the telescope, with only a few percent lostdata due to interferers and, highly important, neither broadband nor in situ interference is com-monly seen. The next step in RFI mitigation is to produce and analyse images on a maximumdynamic range, in order to analyse the effects of possible weak RFI that is undetectable in post-correlated time frequency domains. Prevention of new transmitters remains very important, andestablishment of a radio-quiet zone, especially around the core, is recommended.In order to improve data quality further, pre-correlation techniques might be added at stationlevel or during correlation. An interesting improvement to the robustness of a correlator mightbe to execute the
SumThreshold method prior to correlation. Considering the accuracy gainof the
SumThreshold compared to normal thresholding, and considering the correspondence ofRFI on small and large timescales, implementing this pre-correlation method on the highest timeresolution data might improve blanking accuracy further.
Acknowledgments
We thank the LOFAR collaboration for providing the data on which the flagger was developedand tested.
References [1] R. Athreya. A new approach to mitigation of radio frequency interference in interferometric data. AJ ,696:885–890, April 2009.[2] W. A. Baan, P. A. Fridman, and R. P. Millenaar. Radio Frequency Interference Mitigation at theWesterbork Synthesis Radio Telescope: Algorithms, Test Observations, and System Implementation. AJ , 128:933–949, August 2004.[3] C. Barnbaum and R. F. Bradley. A new approach to interference excision in radio astronomy:Real-time adaptive cancellation. AJ , 115:2598–2614, November 1998.[4] F.H. Briggs, J. F. Bell, and M. J. Kesteven. Removing radio interference from contaminatedastronomical spectra using an independent reference signal and closure relations. AJ , 120:3351–3361,December 2000.[5] Steven W. Ellingson and Grant A. Hampson. A subspace-tracking approach to interference nulling forphased array-based radio telescopes. IEEE Trans. on Antennas & Propagation , 50:25–30, January2002.[6] P. A. Fridman and W. A. Baan. RFI mitigation methods in radio astronomy.
A&A , 378:327–344,October 2001.[7] V. Jeli´c, S. Zaroubi, P. Labropoulos, et al. Foreground simulations for the LOFAR-epoch ofreionization experiment.
MNRAS , 389:1319–1335, September 2008.[8] John J. Lemmon. Wideband model of man-made hf noise and interference.
Radio Science ,32:525–539, March 1997.[9] Amir Leshem, Alle-Jan van der Veen, and Albert-Jan Boonstra. Multichannel interference mitigationtechniques in radio astronomy.
ApJS , 131:355–373, November 2000. LOFAR RFI pipeline
A.R. Offringa[10] N. Niamsuwan, J. T. Johnson, and S. W. Ellingson. Examination of a simple pulse-blanking techniquefor radio frequency interference mitigation.
Radio Science , 40, June 2005.[11] A. R. Offringa, A. G. de Bruyn, M. Biehl, S. Zaroubi, G. Bernardi, and V.N. Pandey. Post-correlationradio frequency interference classification methods.
MNRAS , 405:155–167, 2010.[12] Ue-Li Pen, Tzu-Ching Chang, Jeffrey B. Peterson, Jayanta Roy, Yashwant Gupta, Christopher M.Hirata, Julia Odegova, and Kris Sigurdson. The GMRT EoR Experiment: Limits on Polarized SkyBrightness at 150 MHz.
MNRAS , 399:181–194, 2009.[13] Bart Smolders and Grant Hampson. Deterministic RF nulling in phased arrays for the next generationof radio telescopes.
IEEE Antennas & Propagation magazine , 44:13–22, August 2002.[14] R. M. Thomas, S. Zaroubi, B. Ciardi, et al. Fast large-scale reionization simulations.
MNRAS ,393:32–48, February 2009.[15] R. Weber, C. Faye, F. Biraud, and J. Dansou. Spectral detector for interference time blanking.
A&AS ,126:161–167, 1997.[16] B. Winkel, J. Kerp, and P.M.W. Kalberla. Data reduction strategy of the Effelsberg-Bonn HI Survey(EBHIS).
Proc. Panoramic Radio Astronomy: Wide-field 1-2 GHz research on galaxy evolution,PoS(PRA2009)043 , June 2009.[17] B. Winkel, J. Kerp, and S. Stanko. RFI detection by automated feature extraction and statisticalanalysis. AN , 88:789–801, 2006., 88:789–801, 2006.