[PDF] A D-term Modeling Code (DMC) for simultaneous calibration and full-Stokes imaging of very long baseline interferometric data

Abstract

In this paper we present DMC, a model and associated tool for polarimetric imaging of very long baseline interferometry datasets that simultaneously reconstructs the full-Stokes emission structure along with the station-based gain and leakage calibration terms. DMC formulates the imaging problem in terms of posterior exploration, which is achieved using Hamiltonian Monte Carlo sampling. The resulting posterior distribution provides a natural quantification of uncertainty in both the image structure and in the data calibration. We run DMC on both synthetic and real datasets, the results of which demonstrate its ability to accurately recover both the image structure and calibration quantities as well as to assess their corresponding uncertainties. The framework underpinning DMC is flexible, and its specific implementation is under continued development.

Full PDF

DDraft version February 8, 2021

Typeset using L A TEX twocolumn style in AASTeX63

A D-term Modeling Code (DMC) for simultaneous calibration and full-Stokes imaging of very longbaseline interferometric data

Dominic W. Pesce

1, 2 Center for Astrophysics | Harvard & Smithsonian, 60 Garden Street, Cambridge, MA 02138, USA Black Hole Initiative at Harvard University, 20 Garden Street, Cambridge, MA 02138, USA

ABSTRACTIn this paper we present DMC, a model and associated tool for polarimetric imaging of very longbaseline interferometry datasets that simultaneously reconstructs the full-Stokes emission structurealong with the station-based gain and leakage calibration terms. DMC formulates the imaging problemin terms of posterior exploration, which is achieved using Hamiltonian Monte Carlo sampling. Theresulting posterior distribution provides a natural quantiﬁcation of uncertainty in both the imagestructure and in the data calibration. We run DMC on both synthetic and real datasets, the results ofwhich demonstrate its ability to accurately recover both the image structure and calibration quantitiesas well as to assess their corresponding uncertainties. The framework underpinning DMC is ﬂexible,and its speciﬁc implementation is under continued development.

Keywords: keywords INTRODUCTIONInterferometric observations in radio astronomy na-tively access the so-called “visibility domain,” with eachvisibility determined by the complex correlation be-tween the electric ﬁelds incident at a pair of telescopes(Thompson et al. 2017, hereafter TMS). These visibili-ties provide information about the Fourier transform ofthe incident ﬂux distribution via the van Cittert-Zerniketheorem, and radio interferometric imaging – i.e., theprocess by which the visibility measurements are trans-lated into a sky-plane image – presents an example ofan ill-posed inverse problem. The combination of sparseFourier-plane sampling and uncertain calibration, bothof which are exacerbated for very long baseline inter-ferometric (VLBI) observations, prevents a direct inver-sion of the visibilities to produce a unique image. In-stead, images must be “reconstructed” with the aid ofadditional assumptions about the image structure (e.g.,ﬂux positivity, source sparsity) to overcome this non-uniqueness.A variety of algorithms exist for reconstructing im-ages in radio interferometry, and these algorithms canbe broadly classiﬁed into the two categories estab-

Corresponding author: Dominic W. [email protected] lished in Event Horizon Telescope Collaboration et al.(2019a). “Inverse modeling” schemes, exempliﬁed bythe CLEAN algorithm and its variants (H¨ogbom 1974;Clark 1980; Schwab 1984), operate directly with the in-verse Fourier transform of the visibility measurementsand seek to iteratively deconvolve the eﬀects of the ﬁ-nite sampling from the reconstructed image. “Forwardmodeling” schemes, such as the maximum entropy (e.g.,Nityananda & Narayan 1982; Cornwell & Evans 1985)and regularized maximum likelihood (e.g., Chael et al.2016; Akiyama et al. 2017) methods, instead parameter-ize the image structure (typically using a grid of pixels)and Fourier transform it to predict the values of the vis-ibility measurements. The image parameters are thenvaried so as to optimize some objective function, typi-cally consisting of a data comparison term (e.g., a χ metric) along with one or more regularization terms.From the perspective of computational speed, theCLEAN approach has historically been a clear favoritefor VLBI imaging. The forward modeling schemes,though generally more computationally taxing, beneﬁtfrom the ability to enforce various nonlinear constraints(such as ﬂux positivity) on the image and to ﬁt di-rectly to non-visibility data products (such as closurequantities; e.g., Chael et al. 2018). Typical implemen-tations of both classes of algorithm, however, share amixed relationship with data calibration whereby im-age reconstruction steps are iterated with interleaving a r X i v : . [ a s t r o - ph . I M ] F e b “self-calibration” steps that attempt to solve for station-based calibration terms (e.g., Readhead et al. 1980).Furthermore, both the inverse and forward modelingclasses of image reconstruction algorithm classically lacka natural quantiﬁcation of uncertainty in the image.Recent developments have yielded a new class of im-age reconstruction algorithms, based on posterior explo-ration or parameterization techniques, that aim to over-come the aforementioned shortcomings (e.g., Cai et al.2018a,b; Arras et al. 2019; Broderick et al. 2020a). Fromthe perspective of statistical integrity, an image recon-struction algorithm should solve simultaneously for theensemble of both sky-plane emission structures and req-uisite calibration terms that are permissible, given theuncertainties in the data and any sources of prior knowl-edge about the parameters. In this paper we presentsuch an algorithm in terms of a model for simultaneouscalibration and full-Stokes imaging of VLBI data, alongwith an implementation of this model within a genericposterior exploration framework. An implementation ofan analogous model within Themis (Broderick et al.2020b) is presented in a separate paper, Broderick et al.(2021).This paper is organized as follows. In Section 2 weprovide the a detailed description of the model, specifyour likelihood construction, and describe its software im-plementation. In Section 3 we demonstrate the resultsof ﬁtting this model to both synthetic and real VLBIdatasets. We summarize and conclude in Section 4. MODEL SPECIFICATIONSFor the compact sources and small ﬁelds of view typ-ically considered in VLBI, the observed visibilities arerelated to the Fourier transform of the sky brightnessdistribution by the van Cittert-Zernike theorem (TMS),˜ I ( u, v ) = (cid:90) (cid:90) I ( x, y ) e − πi ( ux + vy ) dxdy, (1)where we use a tilde (˜) to denote a transformed quan-tity. In this section and throughout the paper, unlessotherwise speciﬁed, we consider observations made withonly a single frequency channel.We have developed a new publicly available D-termModeling Code (DMC) that implements the polarimet-ric image model detailed in this section. DMC is imple-mented in Python, and it ﬁts the model using a Bayesianformalism in which the posterior distribution P ( Θ ) of https://github.com/dpesce/eht-dmc the parameter vector Θ is related to the likelihood L ( Θ )and prior π ( Θ ) via Bayes’ Theorem, P ( Θ ) ∝ L ( Θ ) π ( Θ ) . (2)Model parameters and their associated priors are aggre-gated in Table 1. DMC uses the eht-imaging library(Chael et al. 2016, 2018) for internal organization andmanipulation of VLBI data, and it uses the PyMC3 li-brary (Salvatier et al. 2016) for sampling. Because ituses a Markov chain Monte Carlo (MCMC) sampler,the output of running DMC on a VLBI dataset is anensemble of images that are drawn from the posteriordistribution of the model. From this ensemble it is pos-sible to compute various useful statistics (e.g., means,variances), which we demonstrate in Section 3.2.1. Image model

We model the image as a Cartesian grid of N pix pixels,with the grid axes aligned with the equatorial coordinateaxes. Each pixel has a location ( x j , y j ) and a Stokes Iintensity I j . These intensities are constrained to sum toa total ﬂux density F , F = N pix (cid:88) j =1 I j , (3)with F speciﬁable but by default sampled from a nor-mal prior truncated at zero to ensure positivity. Theconstrained sum in Equation 3 is imposed via a Dirich-let prior on the pixel intensity values, I F ∼ Dir( N pix , a ) , (4)where I = (cid:0) I , I , . . . , I N pix (cid:1) is the vector of pixel inten-sities. The concentration parameter vector a is speciﬁ-able but defaults to a = ≡ (1 , , . . . , N pix − I j to the other Stokes parameters by the inequality I j ≥ Q j + U j + V j ≡ p j I j , (5) Table 1.

Model parameters and priorsParameter Description Default prior F image-integrated Stokes I ﬂux density N ( ˘ F , [0 . F ] ) I j /F fraction of Stokes I ﬂux density contained in pixel j Dir( N pix , ) p j polarization fraction in pixel j U (0 , α j azimuthal angle Poincar´e coordinate in pixel j U per ( − π, π ) β j polar angle Poincar´e coordinate in pixel j cos( β j ) ∼ U ( − , µ as δ (Σ − ˘Σ) x overall image centroid shift along the Right Ascension (RA) axis, in µ as δ ( x − ˘ x ) y overall image centroid shift along the Declination (Dec) axis, in µ as δ ( y − ˘ y ) g R,a righthand gain amplitude for station a N (1 , ˘ σ R,a ) θ R,a righthand gain phase for station a U per ( − π, π ) g L,a lefthand gain amplitude for station a N (1 , ˘ σ L,a ) θ L,a lefthand gain phase for station a U per ( − π, π ) d R,a righthand leakage amplitude for station a U (0 , δ R,a righthand leakage phase for station a U per ( − π, π ) d L,a lefthand leakage amplitude for station a U (0 , δ L,a lefthand leakage phase for station a U per ( − π, π ) f fractional systematic uncertainty U (0 , N x number of image pixels along the RA axis . . .N y number of image pixels along the Dec axis . . . FOV x ﬁeld of view along the RA axis, in µ as . . . FOV y ﬁeld of view along the Dec axis, in µ as . . . Note —A list of the model parameters and their corresponding prior distributions. The top portion of the table lists theparameters associated with the image, the middle portion lists parameters associated with the calibration, and the bottomportion lists the hyperparameters. We use a breve (˘) to denote user-speciﬁed quantities. We use a number of diﬀerent priorclasses: δ ( x − a ) denotes a Dirac delta prior over x such that it takes on the ﬁxed value a , U ( a, b ) denotes a uniform prioron the range [ a, b ]; U per ( a, b ) denotes a periodic (or “wrapped”) uniform prior on the range [ a, b ]; N ( µ, σ ) denotes a normal(Gaussian) distribution with mean µ and variance σ ; N ( µ, σ ) denotes a normal distribution (with mean µ and variance σ ) that has a lower-bound truncation at zero; N c ( µ, σ ) denotes a circularly-symmetric complex normal distribution with(complex) mean µ and variance σ along both the real and imaginary directions; Dir( N, a ) denotes a Dirichlet distribution in N dimensions with concentration parameter vector a . where we have introduced the polarization fraction p j ≤

1. This spherical relationship lends itself naturally to aPoincar´e parameterization in terms of angular variables,  I j Q j U j V j  = I j  p j cos( α j ) sin( β j ) p j sin( α j ) sin( β j ) p j cos( β j )  , (6)where − π ≤ α j ≤ π is an azimuthal angle and 0 ≤ β j ≤ π is a polar angle. The angle α j determines theorientation of the polarization ellipse, and it is relatedto the usual electric vector position angle (EVPA) χ j by χ j = 12 tan − (cid:18) U j Q j (cid:19) = α j . (7) The angle β j determines the degree of circular polar-ization, with purely linear polarization having β j = π/ β j = 0 or β j = π .We sample p j from a unit uniform distribution, andwe sample the angular variables uniformly on the unitsphere. Our polarized image model thus consists of thefour quantities ( I j , p j , α j , β j ) for every pixel, which to-gether with the total ﬂux F amount to 4 N pix modelparameters.From the parameters ( I j , p j , α j , β j ), we determine theStokes parameters in each pixel using Equation 6. Wethen compute the Fourier transforms of these Stokes pa-rameters via  ˜ I k ˜ Q k ˜ U k ˜ V k  = S k N pix (cid:88) j =1 A jk  I j Q j U j V j  , (8)where A jk = exp ( − πi [ u k ( x j − x ) + v k ( y j − y )]) (9)are elements of the discrete Fourier transform matrix,( u k , v k ) are the Fourier-plane coordinates for visibilitymeasurement k in units of the observing wavelength,( x , y ) are the image-plane coordinates of the imageorigin (or “phase center”), and S k = exp (cid:20) − π Σ ( u k + v k )4 ln(2) (cid:21) (10)is a circularly symmetric Gaussian smoothing kernelwith full width at half maximum (FWHM) Σ in theimage plane that serves to maintain image continuity.2.2. Corruption model

For an array observing with circularly polarized feeds,the measured quantities are parallel- and cross-hand cor-relation products; we denote the parallel-hand visibil-ities as RR ≡ (cid:104) E R, E ∗ R, (cid:105) and LL ≡ (cid:104) E L, E ∗ L, (cid:105) ,and we denote the cross-hand visibilities as RL ≡(cid:104) E R, E ∗ L, (cid:105) and LR ≡ (cid:104) E L, E ∗ R, (cid:105) . The measured vis-ibilities on a baseline ab are related to the Stokes visi-bilities on that same baseline by  RR ab LL ab RL ab LR ab  =  ˜ I ab + ˜ V ab ˜ I ab − ˜ V ab ˜ Q ab + i ˜ U ab ˜ Q ab − i ˜ U ab  . (11)In real interferometric observations, the measured vis-ibilities are corrupted by a combination of a priori un-known signal propagation eﬀects. Following the radiointerferometer measurement equation (RIME) formal-ism developed by Hamaker et al. (1996) – and in par-ticular the 2 × V ab ≡ (cid:32) RR ab RL ab LR ab LL ab (cid:33) . (12)Within the RIME formalism, the (complex) Jones ma-trix J a captures all linear transformations undergone bythe incident astrophysical signal at a station a , such thatˆ V ab = J a V ab J † b , (13)where a dagger ( † ) denotes a conjugate transpose anda hat (ˆ) denotes an observed quantity.DMC incorporates a minimal but standard (see, e.g.,TMS) threefold decomposition of J a , J a = G a D a F a , (14)where G a = (cid:32) G R,a G L,a (cid:33) (15)contains the station gain terms, D a = (cid:32) D R,a D L,a (cid:33) (16)contains the polarimetric leakage terms, and F a = (cid:32) e − iφ a e iφ a (cid:33) (17)applies the feed rotation angle, φ a . Note that thisdecomposition incorporates only station-based corrup-tions, and it does not account for direction-dependenteﬀects (e.g., Smirnov 2011) or for other baseline-basedcorruptions. The feed rotation angle φ a depends on thestation mount properties and on the source parallacticand elevation angles as a function of time, but for mostradio interferometers it is well-known a priori; we thusassume the φ a to be given and therefore do not incorpo-rate them as model parameters. The station gain andleakage terms, however, are typically imperfectly cali-brated and so we retain both as model parameters.We parameterize the complex station gains using am-plitude and phase, G R,a = g R,a e iθ R,a , (18a) This particular (sky-intrinsic) coherency matrix is also referredto in the literature as the “brightness matrix,” and its mea-sured counterpart has been referred to as the “visibility matrix”(Smirnov 2011). G L,a = g L,a e iθ L,a . (18b)Our priors on the gain amplitudes are normal with alower-bound truncation at zero, and we impose peri-odic uniform priors on the gain phases with the range( − π, π ). For each observation we select a single “ref-erence station” for which both the right and left gainphases are ﬁxed to be zero (i.e., θ R = θ L ). We per-mit all gains other than those of the reference station tobe independent across stations and across timestamps.We note that in real-world arrays, the station gain am-plitudes are not expected to ﬂuctuate wildly from onetimestamp to the next. In this sense, DMC aims toprovide a conservative treatment of the gain beharior;i.e., DMC permits – though it does not impose – largegain amplitude ﬂuctuations between timestamps, suchas may occur when a telescope is re-pointed.We use an analogous parameterization for the complexleakage terms, D R,a = d R,a e iδ R,a , (19a) D L,a = d L,a e iδ L,a . (19b)We impose unit uniform priors on the leakage ampli-tudes and periodic uniform priors on the leakage phaseswith the range ( − π, π ). We assume that the leakageterms are constant in time (see, e.g., Conway & Kro-nberg 1969; Roberts et al. 1994) and so assign only asingle D a for every station.We note that the default priors described in this sec-tion for the gain and leakage terms may be overriddenwhen running DMC to incorporate any a priori knowl-edge of the station properties.2.3. Likelihood construction

The thermal noise in any single visibility measurementdepends on various factors, including the collecting areaof the telescopes constituting the baseline and the av-eraging time and bandwidth of the observation, but forthe vast majority of sources of interest this thermalnoise is normally distributed and statistically indepen-dent across diﬀerent baselines (TMS). In the absence A periodic uniform distribution is one that is uniformly dis-tributed on the unit circle. Note that if the reference station does not actually have a zero-valued phase diﬀerence between its right- and left-hand gains,this treatment will result in an overall image EVPA rotation thatmust be absolutely calibrated (Brown et al. 1989). The “self-noise” of Kulkarni (1989) introduces statistical depen-dence in the noise measured across multiple baselines, but thiscontribution only becomes relevant for extremely bright sources. of any other corruptions, a measured visibility is drawnfrom a circularly symmetric complex Normal distribu-tion, e.g., ˆ RR k ∼ N c (cid:0) RR k , σ ,RR,k (cid:1) , (20)where RR k is the “true” visibility value on baseline k and σ ,RR,k is the thermal variance in the correspond-ing visibility measurement. Note that the presence ofgain corruptions does introduce covariance between vis-ibility measurements, but by explicitly modeling thesegains we account for this covariance and ensure thatthe remaining diﬀerences between modeled and observedvisibilities will be independently distributed (Blackburnet al. 2020).Though we explicitly incorporate a number of knowncorrupting eﬀects into the model (see Section 2.2), wealso permit an additional multiplicative systematic noisecomponent,  σ RR,k σ LL,k σ RL,k σ LR,k  =  σ ,RR,k + f I k σ ,LL,k + f I k σ ,RL,k + f I k σ ,LR,k + f I k  , (21)where f is sampled from a unit uniform distribution.This systematic component aims to account for any un-calibrated non-closing errors that cannot be describedby thermal noise or leakage. We then construct the like-lihood of a particular set of model visibilities given thevisibility measurements using L RR = (cid:89) k πσ RR,k exp  (cid:16) ˆ RR k − RR k (cid:17) (cid:16) RR k − ˆ RR k (cid:17) ∗ σ RR,k  , (22a) L LL = (cid:89) k πσ LL,k exp  (cid:16) ˆ LL k − LL k (cid:17) (cid:16) LL k − ˆ LL k (cid:17) ∗ σ LL,k  , (22b) L RL = (cid:89) k πσ RL,k exp  (cid:16) ˆ RL k − RL k (cid:17) (cid:16) RL k − ˆ RL k (cid:17) ∗ σ RL,k  , (22c) L LR = (cid:89) k πσ LR,k exp  (cid:16) ˆ LR k − LR k (cid:17) (cid:16) LR k − ˆ LR k (cid:17) ∗ σ LR,k  , (22d) where the products are taken over all visibility measure-ments. The ﬁnal likelihood expression is then simplythe product of the individual visibility likelihoods, L = L RR L LL L RL L LR . (23) − − − − − I n t e n s i t y ( J y p i x − ) I II IIIpixel on the edge of the imagepixel on the ringpixel in the center of the ring

Figure 1.

Example set of parameter traces from a DMC ﬁt to the EHT-like synthetic dataset described in Section 3.1, showingthe Stokes I intensity of a pixel located towards the edge of the image (in blue), a pixel situated on the ring (in green), and a pixellocated in the center of the ring (in red). The shaded regions highlight the diﬀerent tuning windows described in Section 2.4; thedark gray shading indicates a “fast” tuning window (one example is labeled as “I”), the light gray shading indicates a “slow”tuning window (one example is labeled as “II”), and the unshaded region indicates the main sampling phase (labeled as “III”).

Sampler and tuning

DMC uses the Hamiltonian Monte Carlo (HMC; Du-ane et al. 1987) No U-Turn Sampler (NUTS; Hoﬀman &Gelman 2011) implemented within the PyMC3 Pythonpackage (Salvatier et al. 2016) to explore the posteriorspace. HMC is an MCMC method whose output prod-uct is an ensemble of samples from the posterior distri-bution. Detailed descriptions of the HMC method canbe found in, e.g., Neal (2011) and Betancourt (2017).PyMC3 is a probabilistic programming tool that lever-ages Theano (Bergstra et al. 2010; Bastien et al. 2012)to automatically diﬀerentiate the posterior density whencomputing model gradients.As an HMC sampler, PyMC3 exploits model gradientinformation to eﬃciently explore the high-dimensionalposterior space presented by the polarized image model.The number of tunable hyperparameters is minimizedthrough the use of NUTS, but there remain two key hy-perparameters that need to be adaptively tuned duringsampling itself: a “step size” hyperparameter that setsthe discretization interval for trajectory integrations,and a “mass matrix” hyperparameter (actually a col-lection of hyperparameters, the elements of the matrix)that determine the Gaussian distribution from which themomentum parameters are sampled. PyMC3 nativelyadapts the step size hyperparameter during sampling(see Hoﬀman & Gelman 2011), but its default function-ality only adapts the diagonal elements of the mass ma-trix. Strong correlations in the posterior distributioncan therefore lead to decreased sampling eﬃciency.To mitigate this potential deﬁciency, we divide thesampling period into multiple tuning windows duringwhich both the step size and (dense) mass matrix are adaptively determined. We have designed these win-dows to mimic the “warmup epochs” used in the Stanpackage (Carpenter et al. 2017). An initial “fast” win-dow is used to tune the step size parameter, after whicha series of increasingly heavily-sampled “slow” windowsare used to estimate the mass matrix using the parame-ter covariances measured from the set of posterior sam-ples in the previous window. Each slow window is pre-ceded by a brief fast window to permit the step sizeto adapt to the new mass matrix. A ﬁnal fast windowfollows the last slow window, after which the main sam-pling phase proceeds using the tuned values for bothhyperparameters. This tuning procedure is illustratedin Figure 1.In practice, we ﬁnd that tuning of both the step sizeand the mass matrix is essential for polarized imagingusing DMC. If, e.g., the mass matrix tuning is restrictedto only the diagonal elements, parameter autocorrela-tion times are liable to increase by several orders of mag-nitude and the sampler will eﬀectively stall. DEMONSTRATIONSIn this section we demonstrate the imaging capabili-ties of DMC on both synthetic and real data.3.1.

Synthetic data construction

We ﬁrst run DMC on a synthetic dataset constructedto have properties similar to the 2017 observations of theM87 black hole with the EHT (Event Horizon TelescopeCollaboration et al. 2019b,c,d,a,e,f). The baseline cover-age and signal-to-noise ratio distribution for this datasetis shown in Figure 2, and the input source model Stokesimages are shown in the left panels of Figure 3. The − . − . − . . . . . u (G λ ) − . − . − . . . . . v ( G λ ) . . . . . . . l og ( S / N ) Figure 2. ( u, v )-coverage for the EHT-like syntheticdataset, with points colored by the base-10 logarithm of theirStokes I signal-to-noise ratio. visibility data are generated in a circular polarizationbasis, corresponding to the state of the real EHT dataafter fringe-ﬁtting has been performed (Event HorizonTelescope Collaboration et al. 2019d).The input source structure is a circular crescent witha diameter of 40 µ as and a Gaussian FWHM of 10 µ as,oriented such that the brightest region of the crescent islocated towards the North. The image-integrated ﬂuxdensity is 1.0 Jy, and it is polarized at the 10% level inlinear polarization and at the 2% level in circular polar-ization. We construct the linear polarization structureto have an approximately threefold azimuthal symmetry(see the left panel of Figure 6), corresponding to nonzero β − and β modes in the decomposition developed byPalumbo et al. (2020, hereafter PWP).In addition to the thermal noise, we add gain andleakage corruptions to each of the seven stations in thesynthetic dataset. The gain amplitudes are permitted tovary at the 10% level, while the gain phases are uncon-strained (i.e., they are sampled uniformly on the unitcircle); each station has independent gains, but thesegains are drawn from the same distribution and thusshare the same magnitude of ﬂuctuations. The samegain corruptions are used for both right- and left-handpolarization. Each station has an associated complexstation leakage in both right- and left-hand polarizationthat is at the level of 0–10%. The station gain ampli-tudes and phases are independently generated for eachstation at each observing timestamp, while the complexright- and left-hand leakage terms are held constant foreach station across the synthetic observation. 3.2. Imaging synthetic data

We use DMC to image the synthetic dataset describedin the previous section, setting FOV x = FOV y = 60 µ as, N x = N y = 30, ˘ x = ˘ y = 0 µ as, and Σ = 6 µ as. The ini-tial total ﬂux estimate ˘ F is set equal to the visibility am-plitude on the shortest baseline, and the gain amplitudeprior standard deviations are ﬁxed to ˘ σ R = ˘ σ L = 0 . DMC is able to ﬁnd good ﬁts to this dataset, withreduced- χ values near unity and an additional sys-tematic noise parameter consistent with zero (i.e., themodel is able to describe the data to within thermal er-rors without requiring an additional source of systematicuncertainty). The image-integrated linear polarizationfraction is recovered as 9 . ± .

03% (compared to aninput value of 10%), while the image-integrated circularpolarization fraction is found to be 1 . ± .

0% (comparedto an input value of 2%).In Figure 3, we show maps of both the mean and thestandard deviation of the image ensemble output fromDMC – where each image in the ensemble correspondsto a single sample from the posterior distribution – andwe compare these against the input images of all fourStokes parameters. Because DMC produces an estimateof the posterior distribution for every image pixel, arbi-trarily complicated single- or multi-point statistics maybe computed that self-consistently capture the uncer-tainties and correlations between pixels. For instance,image-domain feature extraction techniques – such asthose employed in Event Horizon Telescope Collabo-ration et al. (2019a,f) to measure the properties (e.g.,diameter, width, orientation) of the ring-like structureseen in M87 – can be applied to the ensemble of imagesin the DMC posterior to yield corresponding distribu-tions for any derived values. We note that for this synthetic dataset, all of the stations areeﬀectively calibrated in absolute EVPA. For real EHT data onlythe ALMA station has this property, which it inherits from appli-cation of the

PolConvert procedure (Mart´ı-Vidal et al. 2016) thatgets applied after correlation to convert the mixed-basis polar-ization products on ALMA baselines to the circular polarizationproducts used by the rest of the array (Event Horizon TelescopeCollaboration et al. 2019d; Goddi et al. 2019).

Stokes I input mean standard deviationStokes QStokes U − − x ( µ as) − − y ( µ a s ) Stokes V . . . . . . . × − . . . . . . . × − . . . . . I n t e n s i t y ( J y µ a s − ) × − − − − × − − − − × − . . . . . . . I n t e n s i t y ( J y µ a s − ) × − − − × − − − × − I n t e n s i t y ( J y µ a s − ) × − − − − × − − − − × − I n t e n s i t y ( J y µ a s − ) × − Figure 3.

DMC image reconstructions of the polarized EHT-like synthetic dataset described in Section 3.1. Each row showsthe results for a diﬀerent Stokes parameter. The same ﬁeld-of-view is used for all plots and is explicitly labeled in the bottom-left panel. The leftmost column shows the ground-truth input images, the middle column shows the mean posterior imagesfor each of the Stokes parameters, and the rightmost column shows the standard deviations of the image posteriors. Becausethese datasets included arbitrary gain phase corruptions, absolute position information is not uniquely recovered and so thereconstructed images have been shifted to the location that maximizes the normalized cross-correlation between the ground-truth Stokes I image and the mean of the Stokes I image posterior. The 6 µ as FWHM of the Gaussian smoothing kernel (i.e., Σfrom Equation 10) is shown in the bottom right-hand corner of the top middle panel. Note that this kernel does not representa typical “restoring beam” that gets applied after imaging has been completed; rather, it is a convolving function that getsself-consistently applied during the imaging process itself (see Section 2.1). Table 2.

Station leakages for synthetic dataStation Input D R Posterior D R ± σ Input D L Posterior D L ± σ ALMA 8 . − . i (7 . − . i ) ± (0 .

14 + 0 . i ) 2 . . i (2 .

11 + 7 . i ) ± (0 .

11 + 0 . i )APEX − . . i ( − .

93 + 7 . i ) ± (0 .

11 + 0 . i ) − . . i ( − .

09 + 3 . i ) ± (0 .

12 + 0 . i )SMT 4 . − . i (4 . − . i ) ± (0 .

16 + 0 . i ) 6 . . i (6 .

14 + 5 . i ) ± (0 .

19 + 0 . i )JCMT − . . i ( − .

34 + 5 . i ) ± (0 .

35 + 0 . i ) − . − . i ( − . − . i ) ± (0 .

33 + 0 . i )LMT − . . i ( − .

07 + 3 . i ) ± (0 .

22 + 0 . i ) 6 . − . i (6 . − . i ) ± (0 .

23 + 0 . i )IRAM 30m 2 . . i (1 .

75 + 2 . i ) ± (0 .

43 + 0 . i ) − . − . i ( − . − . i ) ± (0 .

43 + 0 . i )SMA − . . i ( − .

23 + 1 . i ) ± (0 .

34 + 0 . i ) − . . i ( − .

02 + 1 . i ) ± (0 .

29 + 0 . i ) Note —A list of the leakage values derived from the DMC ﬁt to the EHT-like synthetic dataset, compared against the inputvalues for all stations. The second and fourth columns list the the input values, while the third and ﬁfth columns list theposterior values for the right- and left-hand leakage terms, respectively. The leakage values in the third and ﬁfth columns arequoted as posterior means, with the posterior standard deviation in both the real and imaginary components following the ± symbol. All leakage values are quoted as percentages. .

280 0 .

285 0 .

290 0 .

295 0 . | β | .

085 0 .

090 0 .

095 0 .

100 0 . | β − |

87 88 89 90 91 92 93arg( β ) (degrees) − − − − − − − − β − ) (degrees) Figure 4.

Posterior distributions for the β (in blue) and β − (in orange) values corresponding to the DMC reconstruc-tion of the EHT-like synthetic dataset. The top two panelsshow the amplitudes of both quantities, while the bottomtwo panels show their phases. In all panels, the value de-rived from the input image is shown as a vertical line. We demonstrate an example of this latter capabilityin Figure 4, which shows the derived posteriors for thePWP β and β − parameters. In this paper, we adopt β m = 1 F (cid:90) (cid:90) ( Q + iU ) e − imϕ rdrdϕ (24) as our working deﬁnition for β m , with F the total im-age ﬂux density and the integral taken over the entirerange of image-domain spatial coordinates ( r, ϕ ). ThePWP β m decomposition permits a quantitative charac-terization of the polarization structure on a ring, andfor the input synthetic dataset only β and β − modesare nonzero. As shown in Figure 4, we ﬁnd that theDMC ﬁts return posteriors on these two modes that areconsistent with the input values.Furthermore, because DMC simultaneously modelsboth the image structure and the station-based corrup-tions, we are also able to investigate the posterior be-havior of the latter. Of most interest for polarimetricreconstructions are the leakage terms, which we list inTable 2. Figure 5 shows the leakage posterior distribu-tions for each station in the array. We see that DMC isable to recover the correct leakages for all stations andto determine which stations are most well-constrained.We also compare the DMC results with those of Themis (Broderick et al. 2020b), an independent codecapable of producing simultaneous image and leakageposteriors for VLBI datasets (Broderick et al. 2021). InFigure 5 and Figure 6, the DMC posteriors are comparedagainst those from

Themis for the station leakages andimage structure, respectively. Although the two soft-wares employ distinct model speciﬁcations and explo-ration schemes, we ﬁnd that the posterior reconstruc-tions from both codes nevertheless show good agree-ment. The

Themis ﬁt typically shows only small ( (cid:46) σ )shifts in posterior mean relative to the DMC ﬁt, and thesystematically wider posterior widths from the DMC ﬁtstem from the more permissive station gain priors (DMCpermits the right- and left-hand station gains to varyindependently, while Themis enforces equality betweenboth hands). 3.3.

Imaging real data

We also apply DMC to a real dataset, for which we usea polarimetric VLBI observation of the blazar OJ 287obtained from the Boston University Blazar ResearchGroup (Jorstad & Marscher 2016; Jorstad et al. 2017).A fully calibrated version of this dataset is publicly avail-able , but we wish to demonstrate DMC’s ability to per-form calibration itself alongside image reconstruction.Furthermore, applying DMC to a pre-calibrated datasetmay violate our assumption that the leakage terms areconstant in time (see Appendix A). In this paper wethus use an earlier version of the OJ 287 dataset forwhich only the a priori amplitude and phase calibra- ALMA APEX SMT JCMT-10.0 -5.0 0.0 5.0 10.0Real part (%)-10.0-5.00.05.010.0 I m ag i n a r y p a r t( % ) LMT IRAM 30m SMA DMC D R DMC D L Themis

Figure 5.

Leakage posteriors for the DMC ﬁt to the EHT-like synthetic dataset described in Section 3.1; each panel shows theresult for an individual station, and ground-truth input values are marked using solid vertical and horizontal lines. The sameaxis ranges are used for all panels and are explicitly labeled in the bottom-left panel. Blue contours show the right-hand leakageposteriors, and red contours show the left-hand leakage posteriors. In all panels, the plotted contours enclose 50%, 90%, and99% of the posterior probability. For comparison, we have overplotted the leakage posteriors from

Themis as black ellipses,with the sizes of the ellipses along each axis corresponding to the posterior standard deviation. − − x ( µ as) − − y ( µ a s ) input DMC Themis . . . . . . . . . P o l a r i ze d i n t e n s i t y ( J y µ a s − ) × − Figure 6.

A comparison of the linear polarization structure in the EHT-like dataset as recovered using DMC (center) and

Themis (right), with the input image shown in the leftmost panel. The Stokes I structure is marked using blue contours, withthe outermost contour levels starting at 5% of the peak intensity and increasing inwards by factors of 2. The backgroundcolormap indicates the linearly polarized intensity, and the overlaid tick marks show the EVPA direction. All three panels sharea common ﬁeld of view (explicitly labeled in the leftmost panel), and all three images have been convolved with the EHT beam(circular Gaussian of width 18 µ as; Event Horizon Telescope Collaboration et al. 2019a) shown in the lower right-hand cornerof the left panel. − . − . . . . u (G λ ) − . − . . . . v ( G λ ) . . . . . . l og ( S / N ) Figure 7.

Same as Figure 2, but showing the ( u, v )-coveragefor the OJ 287 dataset used in Section 3.3.

Stokes I mean standard deviationStokes Q . . − . − . x (mas) − . . . . y ( m a s ) Stokes U . . . . . I n t e n s i t y ( J y m a s − ) − . − . . . . I n t e n s i t y ( J y m a s − ) × − − − − I n t e n s i t y ( J y m a s − ) × − Figure 8.

Similar to Figure 3, but for the DMC ﬁt to theOJ 287 dataset (for which we have no ground-truth image).No signiﬁcant Stokes V signal is detected in this dataset, sowe show here only the Stokes I, Q, and U maps. The 0.2 masFWHM of the Gaussian smoothing kernel is shown in thelower right-hand corner of the top left panel, and we noteagain that this kernel does not represent a “restoring beam”but rather a convolving function that is self-consistently ap-plied during imaging. tions have been applied (S. Jorstad 2020, private com-munication), but for which the full set of gain and leak-age corruptions have not yet been removed. The ob-servation was carried out with the Very Long BaselineArray (VLBA) on January 3, 2020 at an observing fre-quency of 43 GHz, and the data reduction procedure isdescribed in Jorstad et al. (2005). The original datasetis split into four separate frequency bands, but we com-bine bands during imaging to produce a single imageand set of leakage terms. To reduce data volume, we co-herently average the visibilities on a per-scan basis priorto imaging; the ( u, v )-coverage of the resulting datasetis shown in Figure 7.We run DMC on the OJ 287 dataset using FOV x =FOV y = 2 mas, N x = N y = 20, ˘ x = ˘ y = 0 mas, andΣ = 0 . F to be equal to the visibilityamplitude on the shortest baseline. The gain amplitudeprior standard deviations are ﬁxed to ˘ σ R = ˘ σ L = 0 . χ value near unity without requiring any ad-ditional systematic noise; i.e., f from Equation 21 has a90% conﬁdence upper limit of f < .

2% and is consistentwith being zero. Figure 8 shows our reconstructed im-ages for each Stokes parameter, and Figure 9 shows thederived station leakages and compares them against theresults from LPCAL. The LPCAL leakages were deter-mined separately for each of the four frequency bands inthe original dataset, and are the result of averaging leak-age solutions across 15 diﬀerent observational targets(including OJ 287). We ﬁnd that the leakage posteriorsrecovered by DMC– which are determined by solving foronly a single leakage term for all four frequency band atonce – behave similarly to the LPCAL values for all sta-tions and are largely consistent with the average of theLPCAL values across bands.Figure 10 shows a comparison between the DMC andCLEAN image reconstructions for the OJ 287 dataset,after restoring both with the CLEAN beam. We seegood agreement between both the Stokes I and linearlypolarized image structures across most of the image,with noticeable deviations manifesting only at the (cid:46) SUMMARYIn this paper we have presented DMC, a Python-basedsoftware package for performing polarimetric imaging ofVLBI data. DMC simultaneously reconstructs the full-3

BR FD HN KP LA-6 -4 -2 0 2 4 6Real part (%)-6-4-20246 I m ag i n a r y p a r t( % ) MK NL OV PT SC

Figure 9.

Similar to Figure 5, but for the DMC ﬁt to the OJ 287 dataset; as in Figure 5, the blue contours correspondto right-hand leakages and the red contours to left-hand leakages. The overplotted blue and red crosses indicate the rightand left leakage solutions, respectively, derived from the same dataset using CLEAN and LPCAL (S. Jorstad 2020, privatecommunication). There are four crosses associated with each leakage term because the original CLEAN analysis imaged eachof the four frequency bands separately, while for the current DMC analysis we image all bands simultaneously and solve for asingle set of leakages. − . − . − . . . . x (mas) − . − . − . . . . . y ( m a s ) CLEAN DMC . . . . . . P o l a r i ze d i n t e n s i t y ( J y m a s − ) Figure 10.

Analogous to Figure 6, but showing a comparison of the linear polarization structure in the OJ 287 dataset asrecovered using CLEAN (left) and DMC (right). The outermost Stokes I contour levels start at 0.5% of the peak intensity, andthe contoured value increases inwards by factors of 2. Both images have been convolved with the CLEAN beam (dimensions0 . × .

16 mas, position angle − ◦ East of North) shown in the lower right-hand corner of the left panel.

Stokes image structure and solves for the station-basedgain and leakage calibration terms within a probabilisticformalism. The output of DMC is a sample of parametervalues drawn from the posterior distribution, such thatinstead of a single image and associated set of calibra-tion quantities one obtains an ensemble of images andsets of calibration quantities. DMC explores this poste-rior space using the NUTS HMC sampler implemented within PyMC3. Using this posterior distribution it ispossible to determine not only a best-ﬁt value but alsoa self-consistent measure of uncertainty in any of themodeled parameters, including the sky-plane emissionstructure in all four Stokes parameters as well as theaforementioned calibration quantities. DMC thereforeprovides a tool for combining the information acrossdiﬀerent regions of an image (e.g., for averaging low-4ﬂux regions) or across multiple images (e.g., for spectralindex or Faraday rotation analyses) in a manner thatproperly accounts for the associated uncertainties andcorrelations.We have demonstrated the eﬀectiveness of DMC onboth synthetic and real data. The results from runningDMC on an EHT-like synthetic dataset show that DMCis capable of accurately recovering both the image struc-ture and the ground-truth calibration quantities. Ourresults are compatible with those of

Themis , which isalso capable of exploring the full posterior space andreturns distributions that are consistent with DMC’s.Similarly, we ﬁnd that running DMC on a real VLBAdataset recovers an image structure and leakage termsthat agree well with those derived from an independentanalysis using CLEAN and LPCAL.DMC is built on a highly ﬂexible modeling frame-work, and it continues to be developed. Future DMCcapabilities may include more sophisticated priors onboth the image and calibration terms, a more physically-motivated calibration model, hybrid imaging (i.e., simul-taneous imaging and modeling), compound sampling op-tions, and time-, frequency-, and scale-dependent imag-ing. ACKNOWLEDGMENTSI thank the anonymous referee for a constructive re-port that improved the quality of this paper. I amgrateful to Lindy Blackburn, Avery Broderick, AndrewChael, Shep Doeleman, Michael Johnson, Iv´an Mart´ı-Vidal, and Daniel Palumbo for helpful discussions andcomments. I’d like to further thank Andrew Chaelfor his guidance on generating synthetic data usingthe eht-imaging library, Avery Broderick for provid-ing

Themis

Software: eht-imaging (Chael et al. 2016, 2018), ehtplot , PyMC3 (Salvatier et al. 2016), Themis (Brod-erick et al. 2020b, 2021) https://github.com/liamedeiros/ehtplot Akiyama, K., Ikeda, S., Pleau, M., et al. 2017, AJ, 153, 159,doi: 10.3847/1538-3881/aa6302Arras, P., Frank, P., Leike, R., Westermann, R., & Enßlin,T. A. 2019, A&A, 627, A134,doi: 10.1051/0004-6361/201935555Bastien, F., Lamblin, P., Pascanu, R., et al. 2012, CoRRBergstra, J., Breuleux, O., Bastien, F., et al. 2010,Proceedings of the Python for Scientiﬁc ComputingConference (SciPy), Vol. 4Betancourt, M. 2017, arXiv e-prints, arXiv:1701.02434.https://arxiv.org/abs/1701.02434Blackburn, L., Pesce, D. W., Johnson, M. D., et al. 2020,ApJ, 894, 31, doi: 10.3847/1538-4357/ab8469Broderick, A. E., Pesce, D. W., & Tiede, P. 2021Broderick, A. E., Pesce, D. W., Tiede, P., Pu, H.-Y., &Gold, R. 2020a, ApJ, 898, 9,doi: 10.3847/1538-4357/ab9c1fBroderick, A. E., Gold, R., Karami, M., et al. 2020b, ApJ,897, 139, doi: 10.3847/1538-4357/ab91a4Brown, L. F., Roberts, D. H., & Wardle, J. F. C. 1989, AJ,97, 1522, doi: 10.1086/115091Cai, X., Pereyra, M., & McEwen, J. D. 2018a, MNRAS,480, 4154, doi: 10.1093/mnras/sty2004—. 2018b, MNRAS, 480, 4170, doi: 10.1093/mnras/sty2015Carpenter, B., Gelman, A., Hoﬀman, M., et al. 2017,Journal of Statistical Software, Articles, 76, 1,doi: 10.18637/jss.v076.i01Chael, A. A., Johnson, M. D., Bouman, K. L., et al. 2018,ApJ, 857, 23, doi: 10.3847/1538-4357/aab6a8Chael, A. A., Johnson, M. D., Narayan, R., et al. 2016,ApJ, 829, 11, doi: 10.3847/0004-637X/829/1/11Clark, B. G. 1980, A&A, 89, 377Conway, R. G., & Kronberg, P. P. 1969, MNRAS, 142, 11,doi: 10.1093/mnras/142.1.11Cornwell, T. J., & Evans, K. F. 1985, A&A, 143, 77Duane, S., Kennedy, A., Pendleton, B. J., & Roweth, D.1987, Physics Letters B, 195, 216 ,doi: https://doi.org/10.1016/0370-2693(87)91197-XEvent Horizon Telescope Collaboration, Akiyama, K.,Alberdi, A., et al. 2019a, ApJL, 875, L4,doi: 10.3847/2041-8213/ab0e85—. 2019b, ApJL, 875, L1, doi: 10.3847/2041-8213/ab0ec7 —. 2019c, ApJL, 875, L2, doi: 10.3847/2041-8213/ab0c96—. 2019d, ApJL, 875, L3, doi: 10.3847/2041-8213/ab0c57—. 2019e, ApJL, 875, L5, doi: 10.3847/2041-8213/ab0f43—. 2019f, ApJL, 875, L6, doi: 10.3847/2041-8213/ab1141Goddi, C., Mart´ı-Vidal, I., Messias, H., et al. 2019, PASP,131, 075003, doi: 10.1088/1538-3873/ab136aHamaker, J. P. 2000, A&AS, 143, 515,doi: 10.1051/aas:2000337Hamaker, J. P., Bregman, J. D., & Sault, R. J. 1996,A&AS, 117, 137Hoﬀman, M., & Gelman, A. 2011, Journal of MachineLearning Research, 15H¨ogbom, J. A. 1974, A&AS, 15, 417Jorstad, S., & Marscher, A. 2016, Galaxies, 4, 47,doi: 10.3390/galaxies4040047Jorstad, S. G., Marscher, A. P., Lister, M. L., et al. 2005,AJ, 130, 1418, doi: 10.1086/444593Jorstad, S. G., Marscher, A. P., Morozova, D. A., et al.2017, ApJ, 846, 98, doi: 10.3847/1538-4357/aa8407Kulkarni, S. R. 1989, AJ, 98, 1112, doi: 10.1086/115202Mart´ı-Vidal, I., Roy, A., Conway, J., & Zensus, A. J. 2016,A&A, 587, A143, doi: 10.1051/0004-6361/201526063Neal, R. M. 2011, MCMC Using Hamiltonian Dynamics(CRC Press), 113–162, doi: 10.1201/b10905-7Nityananda, R., & Narayan, R. 1982, Journal ofAstrophysics and Astronomy, 3, 419,doi: 10.1007/BF02714884Palumbo, D. C. M., Wong, G. N., & Prather, B. S. 2020,ApJ, 894, 156, doi: 10.3847/1538-4357/ab86acReadhead, A. C. S., Walker, R. C., Pearson, T. J., & Cohen,M. H. 1980, Nature, 285, 137, doi: 10.1038/285137a0Roberts, D. H., Wardle, J. F. C., & Brown, L. F. 1994,ApJ, 427, 718, doi: 10.1086/174180Salvatier, J., Wiecki, T. V., & Fonnesbeck, C. 2016, PeerJComputer Science, 2, e55, doi: 10.7717/peerj-cs.55Schwab, F. R. 1984, AJ, 89, 1076, doi: 10.1086/113605Smirnov, O. M. 2011, A&A, 527, A106,doi: 10.1051/0004-6361/201016082Thompson, A. R., Moran, J. M., & Swenson, George W., J.2017, Interferometry and Synthesis in Radio Astronomy,3rd Edition (Springer), doi: 10.1007/978-3-319-44431-4 A. RESIDUAL JONES MATRICESIn this section we explore the notion of a “residual” Jones matrix; i.e., what is the form of the Jones matrix forthe remaining calibrations after an imperfect calibration has been applied? Consider a dataset that has had such anattempted (but imperfect) calibration applied to it. The true Jones matrix J looks like (see Equation 14) J = (cid:32) G R G L (cid:33) (cid:32) D R D L (cid:33) (cid:32) e − iφ e iφ (cid:33) , (A1)but the applied inverse Jones matrix ˆ J has some imperfections in it. We can assume that these imperfections manifestin the station gain and leakage terms, but not in the ﬁeld rotation angles. ˆ J may thus be written asˆ J = (cid:32) G R (1 + (cid:15) R ) 00 G L (1 + (cid:15) L ) (cid:33) (cid:32) D R (1 + δ R ) D L (1 + δ L ) 1 (cid:33) (cid:32) e − iφ e iφ (cid:33) . (A2)The act of applying the imperfect calibration corresponds to attempting to “invert” J using ˆ J − , which yields a residualJones matrix, R ≡ ˆ J − J = (cid:32) Γ R

00 Γ L (cid:33) (cid:32) R ∆ L (cid:33) (cid:32) e − iφ e iφ (cid:33) . (A3)Here, we have deﬁned the residual gains,Γ R ≡ e iφ (cid:18) rD R D L (1 + δ R ) − D R (1 + δ R ) D L (1 + δ L ) −

1] (1 + (cid:15) R ) (cid:19) , (A4a)Γ L ≡ e − iφ (cid:18) D R D L (1 + δ L ) − r [ D R (1 + δ R ) D L (1 + δ L ) −

1] (1 + (cid:15) R ) (cid:19) ; (A4b)the residual leakages, ∆ R ≡ D R (cid:18) δ R − rD R D L (1 + δ R ) − r (cid:19) , (A5a)∆ L ≡ D L (cid:18) r (1 + δ L ) − rD R D L (1 + δ L ) − (cid:19) ; (A5b)and the fractional gain residuals, r ≡ (cid:15) L (cid:15) R . (A6)We see that while the residual Jones matrix R can be cast in the form of the original Jones matrix, the residual gainand leakage terms may no longer adhere to the same set of assumptions that apply to the true gains and leakages.In particular, the residual leakages depend on the fractional gain residuals, rr