A practical preconditioner for wide-field continuum imaging of radio interferometric data
Hertzog L. Bester, Audrey Repetti, Simon Perkins, Oleg M. Smirnov, Jonathan S. Kenyon
aa r X i v : . [ a s t r o - ph . I M ] J a n A practical preconditioner for wide-field continuum imaging ofradio interferometric data
Hertzog L. Bester, , Audrey Repetti, Simon Perkins, , Oleg M. Smirnov, , and Jonathan S. Kenyon , South African Radio Astronomy Observatory, Cape Town, Western Cape,South Africa; [email protected] Rhodes University, Makhanda (Grahamstown), Eastern Cape, South Africa Institute of Sensors, Signals and Systems, Heriot-Watt University, Edinburgh,United Kingdom
Abstract.
The celebrated CLEAN algorithm has been the cornerstone of deconvo-lution algorithms in radio interferometry almost since its conception in the 1970s. Forall its faults, CLEAN is remarkably fast, robust to calibration artefacts and in its abilityto model point sources. We demonstrate how the same assumptions that a ff ord CLEANits speed can be used to accelerate more sophisticated deconvolution algorithms.
1. Introduction
There is by now a wealth of research dedicated to improving interferometric imagingtechniques (eg. Thouvenin et al. (2020); Arras et al. (2020a)). Despite the fact thatmany algorithms are capable of superior imaging performance compared to CLEAN,they have not been widely adopted by the radio astronomy community. One of themain reasons for this is increased computational complexity. In particular, comparedto CLEAN, most competitor algorithms require many more applications of the fullmeasurement operator which is often the most expensive aspect of the imaging problem.In what follows we demonstrate how, by approximating the Hessian of the datafidelity term (i.e. negative log-likelihood) as a convolution with the point spread func-tion, it is possible to develop an e ff ective preconditioner for a proximal gradient basedimaging algorithm. The resulting algorithm, dubbed preconditioned forward-backwardclean ( pfb-clean ), often requires very few applications of the full measurement operatorand is therefore particularly suited to imaging radio interferometric data in the regimewhere the data size is much larger than that of the image.
2. Methodology
The interferometric imaging problem amounts to finding an estimate of an unknowndiscretised image of the sky x from incomplete measurements V . The correspondingill-posed inverse problem is given by V = Rx + ǫ, ǫ ∼ N (0 , Σ ) , (1)1 Bester et al.where R is the linear measurement operator and ǫ is a realisation of Gaussian noisewith (assumed diagonal) covariance matrix Σ . During continuum imaging, it is typi-cal to assume that flux varies slowly as a function of frequency and that it is su ffi cientto reconstruct the image at a much lower frequency resolution than that of the data.Thus (for an ideal unpolarised interferometer) R is a map R : R n b × n p → C n ν × n d with n b ≪ n ν where n ν is the number of frequency channels, n d the number of rows perchannel, n b the number of imaging bands and n p the number of image pixels per band.This map can be implemented e ffi ciently as a 2D non-uniform fast Fourier transformper imaging band with a correction that accounts for wide field e ff ects (see eg. Arraset al. (2020b)). We use the dask wrappers of the wgridder Arras et al. (2020b) in codex-africanus (https: // github.com / ska-sa / codex-africanus) to implement the mea-surement operator throughout and use dask-ms (https: // github.com / ska-sa / dask-ms) asthe data access layer (see Perkins et al. (2021)).Since the noise is Gaussian, the data fidelity term and its gradient are given by f ( x ) =
12 ( V − Rx ) † Σ − ( V − Rx ) (2) ∇ x f = − R † Σ − ( V − Rx ) = R † Σ − Rx − I d , (3)where the dirty image I d = R † Σ − V is the data projected into image space which, since n b × n p ≪ n ν × n d , results in a significant dimensionality reduction. Unfortunately,the presence of the Hessian R † Σ − R means that gradient computations are inevitablyexpensive. However, by using the Fourier convolution theorem and the approximateFourier nature of R , we can approximate the gradient as ∇ x f ≈ I ps f ⋆ x − I d , (4)where I ps f ⋆ denotes convolution with the point spread function (PSF) of the instru-ment. It is not, in general, su ffi cient to use (4) to approximate the gradient because itneglects wide-field e ff ects when the array is not coplanar.The interferometric imaging problem amounts to solving problems of the formminimise x f ( x ) + r ( x ) , (5)where r ( x ) is some regularising function. One popular choice for solving (5) when r ( x )is not smooth is the forward-backward proximal gradient algorithm. Such algorithmsusually require many gradient evaluations. Thus, we propose using a preconditionedvariant of the forward-backward algorithm (see Repetti & Wiaux (2020)) that can dras-tically reduce the required number of gradient evaluations. The algorithm alternatesbetween gradient (forward) and backward steps given, respectively, by x k + = x k − γ U − ∇ x f ( x k ) , (6) x k + = prox U γ r ( x k + ) = argmin x r ( x ) + γ ( x − x k + ) † U ( x − x k + ) . (7)The choice of U is not arbitrary but subject to some mild constraints (see Repetti &Wiaux (2020) for full details). The similarity of (6) to the standard Gauss-Newtonupdate rule suggests using the Hessian of f ( · ) as the preconditioner. Intuitively, sincethe inverse Hessian approximates the covariance in the update, the requirement in (7)fb-clean 3would then simply mean that we have to keep track of this approximate covariance dur-ing the backward step. However, simply using U = R † Σ − R is i) not possible becauseit is not invertible and ii) not practical because it would require too many applicationsof the measurement operator. Fortunately, a valid preconditioner can be obtained byadding a small multiple of the identity to the approximate Hessian i.e. U = I ps f ⋆ + σ Id , (8)where Id is the identity and σ > U . We can thenexploit e ffi cient matrix vector products of the operator U and use the conjugate gradientalgorithm to compute (6). Similarly, the primal dual algorithm of Condat (2013) canbe used to solve (7) for a large class of regularisers. Importantly, the preconditioningstrategy makes it possible to use large step sizes (typically γ .
1) so that very few exactgradient evaluations are required.
Figure 1. Comparison of Cygnus A reconstructed with wsclean (left) and pfb-clean (right). The top panel shows the science data products while the bottom panelshows the corresponding residual images.
3. Results and discussion
Figure 1 shows a Stokes I multi-frequency synthesis image of Cygnus A reconstructedfrom S band VLA data in the A, B, C and D configurations Sebokolodi et al. (2020)using a regulariser of the form r ( x ) = λ k Ψ † x k , + ι [0 , + ∞ ] ( x ) , (9)where k · k , is the l , norm, ι [0 , + ∞ ] ( · ) is an indicator function used to enforce positivityof the image, λ sets the strength of the l , prior and Ψ is a dictionary containing the Bester et al.Dirac and first six Daubechies wavelets. We compare our result to that obtained with themulti-scale CLEAN (ms-clean) algorithm (with default multi-scale settings and usingauto-masking) in wsclean O ff ringa & Smirnov (2017). For both applications, we cre-ate 1728 × Ψ dictionary is more expressive than the oneutilised by ms-clean, pfb-clean is more sensitive to calibration artefacts. These can bepartially mitigated by dialling up the value of λ but this comes at the price of not beingable to capture some of the finer morphological features. Also note that, although pfb-clean produces worse residuals, these are naturally weighted residuals correspondingto the actual science data product which respects the positivity of flux, unlike the resid-uals produced by ms-clean. Finally, in this example, ms-clean requires eight gradientevaluations to reach the final threshold of 500 µ Jy / beam and we use the same numberof forward-backward steps for pfb-clean. The computational cost of pfb-clean is stillsignificantly higher because of the sub-iterative nature of solving (6) and (7). How-ever, the main computational cost during these steps are image sized FFT’s and waveletdecompositions which can be e ffi ciently parallelised. There is also the possibility ofsub-dividing the image into facets to further reduce the computational cost. Acknowledgments.
We thank Philipp Arras, Ming Jiang, Tim Molteno, MartinReinecke and Yves Wiaux for useful discussions and Richard A. Perley for providingCygnus A observations with the VLA. The research of OS is supported by the SouthAfrican Research Chairs Initiative of the Department of Science and Technology andNational Research Foundation. This work was supported in part by the Swiss-SouthAfrica Joint Research Program (IZLSZ2170863 / References
Arras, P., Bester, H. L., Perley, R. A., Leike, R., Smirnov, O., Westermann, R., & Enßlin,T. A. 2020a, Comparison of classical and bayesian imaging in radio interferometry.
Arras, P., Reinecke, M., Westermann, R., & Enßlin, T. A. 2020b, E ffi cient wide-field radiointerferometry response. Condat, L. 2013, Journal of Optimization Theory and Applications, online first, to appear. URL https://hal.archives-ouvertes.fr/hal-00609728 O ff ringa, A. R., & Smirnov, O. 2017, Monthly Notices of the Royal Astronomical Society, 471,301–316. URL http://dx.doi.org/10.1093/mnras/stx1547 Perkins, S. J., et al. 2021, in ADASS XXX, edited by J.-E. Ruiz, & F. Pierfederici (San Fran-cisco: ASP), vol. TBD of ASP Conf. Ser., 999 TBDRepetti, A., & Wiaux, Y. 2020, Variable metric forward-backward algorithm for compositeminimization problems.
Sebokolodi, M. L., Perley, R., Eilek, J., Carilli, C., Smirnov, O., Laing, R., Greisen, E., & Wise,M. 2020, A wideband polarization study of cygnus a with the jvla. i: The observationsand data.