ALMA High-frequency Long-baseline Campaign in 2017: A Comparison of the Band-to-band and In-band Phase Calibration Techniques and Phase-calibrator Separation Angles

Abstract

The Atacama Large millimeter/submillimeter Array (ALMA) obtains spatial resolutions of 15 to 5 milli-arcseconds (mas) at 275-950 GHz (0.87-0.32 mm) with 16 km baselines. Calibration at higher frequencies is challenging as ALMA sensitivity and quasar density decrease. The Band-to-Band (B2B) technique observes a detectable quasar at a lower frequency that is closer to the target, compared to one at the target high frequency. Calibration involves a nearly constant instrumental phase offset between the frequencies and the conversion of the temporal phases to the target frequency. The instrumental offsets are solved with a differential-gain-calibration (DGC) sequence, consisting of alternating low- and high-frequency scans of a strong quasar. Here we compare B2B and in-band phase referencing for high frequencies (> 289 GHz) using 2-15 km baselines and calibrator separation angles between ∼0.68° and ∼11.65°. The analysis shows that: (1) DGC for B2B produces a coherence loss < 7% for DGC phase RMS residuals < 30°. (2) B2B images using close calibrators (< 1.67°) are superior to in-band images using distant ones (> 2.42°). (3) For more distant calibrators, B2B is preferred if it provides a calibrator ∼2° closer than the best in-band calibrator. (4) Decreasing image coherence and poorer image quality occur with increasing phase calibrator separation angle because of uncertainties in the antenna positions and sub-optimal phase referencing. (5) To achieve > 70% coherence for long-baseline (16 km) band 7 (289 GHz) observations, calibrators should be within ∼4° of the target.


Draft version September 15, 2020

Typeset using LaTeX twocolumn style in AASTeX62

ALMA High-frequency Long-baseline Campaign in 2017: A Comparison of the Band-to-band and In-band Phase Calibration Techniques and Phase-calibrator Separation Angles

Luke T. Maud, Yoshiharu Asaki, Edward B. Fomalont, William R. F. Dent, Akihiko Hirota, Satoki Matsushita, Neil M. Phillips, John M. Carpenter, Satoko Takahashi, Eric Villard, Tsuyoshi Sawada, Stuartt Corder

ESO Headquarters, Karl-Schwarzschild-Str. 2, 85748 Garching, Germany
Allegro, Leiden Observatory, Leiden University, PO Box 9513, 2300 RA Leiden, The Netherlands
Joint ALMA Observatory, Alonso de Córdova 3107, Vitacura, Santiago, 763 0355, Chile
National Astronomical Observatory of Japan, Alonso de Córdova 3788, Office 61B, Vitacura, Santiago, Chile
Department of Astronomical Science, School of Physical Sciences, The Graduate University for Advanced Studies (SOKENDAI), 2-21-1 Osawa, Mitaka, Tokyo 181-8588, Japan
National Radio Astronomy Observatory, Edgemont Rd., Charlottesville, VA 22903, USA
Institute of Astronomy and Astrophysics, Academia Sinica, 11F of Astronomy-Mathematics Building, AS/NTU, No.1, Sec. 4, Roosevelt Rd, Taipei 10617, Taiwan, R.O.C.

Submitted to ApJS

Corresponding author: Luke T. Maud

Keywords:

Long baseline interferometry (932), Submillimeter astronomy (1647), Phase error (1220)

1. INTRODUCTION

ALMA is currently the only submillimeter interferometer that provides access to frequencies between 420-950 GHz, wavelengths 0.3- [...] < 200 pc away, such as protoplanetary discs, or sub-pc scales for extragalactic sources within 40 Mpc. The initial ALMA long-baseline observations were originally showcased in ALMA Partnership et al. (2015a,b,c) and have been offered as an observing mode for band 3, 4 and 6 since 2015, and in band 5 shortly thereafter. Starting in ALMA Cycle 7, band 7 long-baselines have been offered for use, where resolutions can reach ∼15 mas. However, observations at frequencies higher than 450 GHz have not been offered yet on baselines longer than 5 km. These observations pose a significant challenge because standard calibration techniques are more difficult to employ.

Observations in the sub-mm regime suffer from absorption, in that a signal from an astronomical source is attenuated, mostly by water vapor, in the Earth's atmosphere. These signals cannot be recovered and observations must be limited to conditions with low precipitable water vapor (PWV) content to maximize transmission. ALMA typically limits band 9 and band 10 observations to when the precipitable water vapor is less than ∼ [...] ℓ, in µm) directly relate to the root-mean-squared (RMS) of the phase fluctuations ($\sigma_\phi$, in radians) for a given observing frequency ($\nu_{\rm obs}$ in Hz) by:

$$\sigma_\phi = \frac{2\pi\,\ell\,\nu_{\rm obs}}{c} \quad (\mathrm{radians}), \qquad (1)$$

where $c$ is the speed of light (in µm s$^{-1}$). Thus for particular atmospheric conditions, causing path length variations, the phase fluctuations will increase with increasing observing frequency (this is generally true, although dispersion can occur near atmospheric lines). The estimated coherence for the visibilities ($V = V_0 e^{i\phi}$, where $V_0$ are the true visibilities and $\phi$ describes the phase fluctuations caused by the atmosphere, Thompson et al. 2017) can be calculated using:

$$\langle V \rangle = \langle V_0 \rangle \langle e^{i\phi} \rangle = \langle V_0 \rangle\, e^{-\sigma_\phi^2/2}, \qquad (2)$$

assuming Gaussian random phase fluctuations, $\phi$, with an RMS of $\sigma_\phi$ (in radians) about a mean phase of zero (Thompson et al. 2017). We use the term 'estimated coherence' throughout this paper to denote coherence values estimated from Equation 2 after inputting a measured phase RMS fluctuation and $V_0 = 1$. Anomalous path length changes on many baselines can cause a significant loss in the signal due to decoherence and also cause blurring and smearing in the target image (e.g. Carilli & Holdaway 1999).

Atmospheric fluctuations are thought to be described by Kolmogorov turbulence theory (Coulman 1990) where path length variations are a function of baseline length. During previous ALMA long-baseline campaigns, Matsushita et al. (2017) indicated that the phase RMS $\sigma_\phi$ increases as $b^{[...]}$ for baselines, $b$, < [...] $b^{[...]}$ for $b$ > [...]

420 GHz), the WVRs will not provide a significant improvement of the phases as the water vapor fluctuations are no longer dominant.

Phase referenced observations are the standard for most interferometric observations. A point source calibrator, typically a quasar, close-by on the sky is observed interspersed with the observations of the astronomical source to form a phase referencing cycle. The phase referencing cycle, from calibrator scan to target scan and back to the calibrator scan, is repeated with a finite cycle time. During data calibration the phase solutions are interpolated and transferred from the calibrator scans to the source scans, correcting any phase variability occurring on timescales longer than the referencing cycle time. In order to counteract rapid atmospheric variations, fast-switching, with notably reduced cycle times, can be used. Pre-ALMA-era investigations found that cycle times as low as 80 s lead to better calibration and thus better imaging (e.g. Holdaway & Owen 1995; Carilli & Holdaway 1997; Morita et al. 2000; Lal et al. 2007). These authors also noted that even faster cycling might be required for high frequency observations. For ALMA, the antennas can change pointing by a few degrees in only 2-3 s such that fast-switching phase referencing cycle times can be as short as ∼20 s without significant overheads (Asaki et al. 2014, 2016).

For completeness, we note there are special cases when self-calibration can be undertaken. If the science target is a sufficiently strong source and has a high surface brightness it can itself be used to calibrate the phase delays on the timescales of those variations (for more details see Pearson & Readhead 1984, Cornwell & Fomalont 1999, and Brogan et al. 2018). Self-calibration requires an initial model of the source brightness distribution which usually remains unknown prior to the observations. Unless a point-like initial starting model can be used, phase referencing needs to be good enough to provide the starting image for self-calibration.

For most high-frequency long-baseline observations, we need relatively short cycle times and a close calibrator to track the phase variations. The requirement of a close calibrator is compounded by path length errors caused by antenna position uncertainties. Hunter et al. (2016) present a fit to ALMA long-baseline antenna position measurements indicating uncertainties of up to ∼ [...] α ∼ -0.7) and are notably weaker above 420 GHz, while atmospheric transmission decreases and reduces observing sensitivity. Weaker calibrators require more integration time to achieve enough signal-to-noise to provide a single calibration solution. This entirely conflicts with the necessity of short cycle times to track the variable atmosphere as there would be little time left to observe a target source. A more subtle point is that long on-calibrator times should be avoided and cannot be averaged as the signal itself would suffer from decorrelation due to the fluctuations that we are trying to correct. Alternatively, one could search for bright enough calibrators with increasing distance from a science target. Studies by Asaki et al. (1998, 2016), however, indicated that using distant calibrators does not always provide the expected levels of phase correction (see also Section 4.1).

One foreseeable way to calibrate high-frequency long-baseline observations is to observe the calibrators at a lower frequency where they are naturally much stronger. Consequently, the likelihood of finding a calibrator close to a given scientific target source improves (Asaki et al. 2020a), which would minimize any phase errors caused by large calibrator separation angles. Because phase variations scale linearly with frequency (Equation 1), phase solutions can be scaled by a multiplicative factor from a calibrator observed at a low frequency to a scientific target observed at a higher frequency. Transferring phase solutions from lower to higher frequencies has been tried, somewhat successfully, by other arrays: the Nobeyama millimeter array (NMA) used paired antennas, one set observing a satellite at 19.45 GHz, while the others observed a celestial target at 146.81 GHz (Asaki et al. 1998); the Combined Array for Research in Millimeter-wave Astronomy (CARMA) also used a paired-antenna-combination-system (C-PACS) in which their long-baseline (2 km) antennas were paired with adjacent antennas that observed at a wavelength of 1 cm (Pérez et al. 2010a,b; Zauderer et al. 2016); the Sub-Millimeter-Array (SMA) tested a dual-frequency mode in which two frequencies were observed simultaneously, and the phase solutions derived from the low-frequency observations were applied to the high-frequency data (Hunter et al. 2005). The SMA, before ALMA, was the only interferometric instrument to attempt frequencies as high as 650 GHz, as ALMA can now observe. The SMA observations were very difficult even with the dual frequency capability as the combination of atmospheric conditions, lower sensitivities (a smaller array with smaller antennas when compared with ALMA) and instabilities meant that very few successful studies were undertaken. In the dual-frequency observations by Chen et al. (2007) low- to high-frequency phase transfer was not used, rather calibration relied on solutions transferred from a high-frequency maser source ∼ [...]° away.

Currently, the KVN (Korean-Very-Long-Baseline-Interferometry-Network) can simultaneously observe at 22, 43, 86 and 129 GHz and is the only instrument to regularly employ a technique termed frequency-phase-transfer (FPT - Rioja et al. 2015), which is the process of calibrating all sources at a given high frequency with the lower-frequency solutions. Note, these are VLBI observations of point-like sources at cm-wavelengths such that the FPT is valid. An additional technique, source-frequency-phase-referencing (SFPR - Dodson & Rioja 2009; Rioja & Dodson 2011) can also be employed.
This uses a combination of FPT with simultaneous multi-band phase referencing, allowing a scaling of lower- to higher-frequency phase solutions to correct tropospheric phase variations while the phase referencing corrects slower ionospheric errors.

For ALMA the technique analogous to FPT is called band-to-band (B2B) phase referencing and was already envisaged as a calibration method for ALMA combined with fast switching (e.g. Holdaway et al. 2004). The use case of B2B would be that a phase calibrator at a lower frequency would generally be found closer to a selected science target compared to using standard in-band phase referencing. We investigate this scenario in detail in this paper. (Different ALMA frequency receiver ranges are divided into bands.)
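The relations used so far in this Introduction (Equation 1, Equation 2, and the linear frequency scaling that underlies B2B) can be sketched numerically. The Python below is only a minimal illustration and is not part of the reduction scripts used in this work: the function names, the constant name and the example numbers are hypothetical, the true visibility amplitude is taken as 1, and any phase wraps are assumed to have been resolved before scaling.

```python
import math

# Speed of light in micrometres per second, matching the units of Equation 1.
C_UM_PER_S = 2.99792458e14


def phase_rms_rad(path_rms_um, freq_hz):
    """Equation 1: phase RMS (radians) from path-length RMS (micrometres)."""
    return 2.0 * math.pi * path_rms_um * freq_hz / C_UM_PER_S


def estimated_coherence(sigma_phi_rad):
    """Equation 2 with unit true visibility: exp(-sigma_phi^2 / 2)."""
    return math.exp(-(sigma_phi_rad ** 2) / 2.0)


def scale_phase_deg(phase_low_deg, freq_low_hz, freq_high_hz):
    """Scale an unwrapped low-frequency phase to the high frequency
    using the multiplicative ratio of the two frequencies."""
    return phase_low_deg * freq_high_hz / freq_low_hz


if __name__ == "__main__":
    # Hypothetical example: 40 micrometres of path-length RMS seen at 289 GHz.
    sigma = phase_rms_rad(40.0, 289e9)
    print("phase RMS (deg):", math.degrees(sigma))
    print("estimated coherence:", estimated_coherence(sigma))
    # Hypothetical 10 deg phase at 100 GHz transferred to 300 GHz.
    print("scaled phase (deg):", scale_phase_deg(10.0, 100e9, 300e9))
```

The same path-length RMS gives a larger phase RMS, and therefore a lower estimated coherence, as the observing frequency increases, which is why calibration becomes harder in the higher bands.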

In this paper, as part of the series on the HF-LBC-2017 (see also Asaki et al. 2020a,b; Maud et al. 2020), we make comparisons of the B2B and the standard in-band phase referencing techniques. These comparisons are part of the stage 3 tests enumerated in Asaki et al. (2020a). For stage 3 there are three main goals that can be investigated with our observations, of which the first two are detailed in this work whereas goal (3) is detailed in Maud et al. (2020):

1. To make a comparison of the image quality obtained from B2B phase referencing with that of in-band calibration using the same phase calibrator and to categorise any detrimental effects due to differential-gain-calibration (DGC).
2. To determine the deterioration of the standard in-band phase referencing with increasing calibrator distance from the target, and to contrast with B2B observations using closer calibrators.
3. To determine the image improvement as a function of phase referencing cycle time from ∼120 to 24 s.

In Section 2 we describe in detail the observational tests, the data reduction and the methodology used for the analysis. Section 3 details the results of the first two main goals listed above and the comparisons between in-band and B2B techniques. In Section 4 we specifically discuss the effects of calibrator separation angle and the choice of observing technique, B2B or in-band. Finally, we summarize our findings in Section 5.

2. OBSERVATIONS, REDUCTION AND METHODOLOGY

We undertook our experiments in the latter half of Cycle 4 and the start of Cycle 5 (2017 June to October) as part of the 2017 high-frequency and long-baseline campaign HF-LBC-2017 (Asaki et al. 2020a). During this period of time we conducted 50 full length observations, of which 44 are useable for analysis (one band 10 test and four band 9 tests failed due to non-detections as a result of high image noise, while one band 7 test could not be calibrated due to missing calibrator scans). Spreading the tests over a period of a few months allowed a reasonable coverage of maximal baseline lengths ranging from ∼ [...] to ∼15 km. Observations were taken using a range of band-to-band pairs (i.e. frequency pairs) B7-3, B8-4, B9-4, B9-6 and even one B10-7 observation (where the first number denotes the target frequency band and the latter the calibrator frequency band). The frequency pairs are constrained by the ALMA hardware (see Asaki et al. [...] < [...]°), such that we can investigate any loss of image quality since the DGC potentially adds extra phase errors for B2B. For the second goal, to illustrate the real benefit of B2B calibration when the best in-band calibrator is much farther away, we use close calibrators for B2B (< [...]°) and more distant ones for in-band (2-11°). The phase referencing cycle time (i.e. measured from the mid-point of the phase calibrator scan to the mid-point of the subsequent phase calibrator scan) was of the order 24 s both for the in-band and B2B phase referencing blocks. This constitutes fast switching when compared to the standard ALMA Cycle 7 long-baseline cycle time of 72 s. The integration time used was 1.15 s, while each on-source scan comprised eight integrations and thus was 9.2 s in length (i.e. from the calibrator mid-point, 4.6 s calibrator + 2-3 s slew/freq. change + 9.2 s target + 2-3 s slew/freq. change + 4.6 s calibrator ∼ 24 s cycle time).

As noted, the antennas can slew between sources and stabilize in 2-3 seconds. Specifically for the B2B blocks, short cycle times are possible only by changing frequency using the harmonic frequency switching mode (see also Asaki et al. 2020a). Briefly, we use a fixed frequency for the photonic Local Oscillator (LO) that is tuned once at the start of the observations. Each receiver multiplies this LO value using an auxiliary oscillator in each antenna to achieve the first LO frequency (LO1) and set the frequency ranges. The harmonic frequency switching mode permits a change in frequency in 2-3 seconds. In comparison, any normal frequency change at ALMA where both the LO and LO1 are reset requires a 20 s overhead to re-lock the frequencies and thus would not have allowed the fast switching B2B operation. All observations were performed in the continuum TDM (Time Domain Mode) for the correlator setup. In this mode there are four spectral windows (SPWs) per frequency, configured with 64 channels. The bandwidth of each SPW is 2 GHz, with a usable width of 1.875 GHz, providing an aggregate effective bandwidth of 7.5 GHz. In the case of the band 9 observations, due to the specific testing setup, only two of the four SPWs are independent, providing an effective bandwidth of 3.75 GHz. The central frequencies for the observations are listed in Table 2. The high frequencies are those of the target and calibrator for in-band, while the low frequencies are those for the calibrator only when using B2B mode.

2.1. Differential Gain Calibration

DGC is the only way to determine the instrumental phase difference between the high frequency and low frequency bands. It is accomplished by observing a bright quasar while switching quickly between the two bands. DGC is detailed more thoroughly in Asaki et al. (2020a) and Asaki et al. (2020b), although we provide a short overview for context. The phase difference between the DGC source observed at two frequencies is a delay term dominated by atmospheric variations and an instrumental offset. Because of the changing atmosphere, fast switching between frequencies is required such that a transfer of the low frequency phase solutions can be used to calibrate the high frequency atmospheric variations that one assumes have not changed significantly over the switching time. Under this assumption, the remaining delay is considered to be entirely instrumental and is thus the characteristic band-offset, hereafter the DGC solution. It is imperative that the DGC solution is not contaminated by any atmospheric phase variations which could be detrimental to B2B phase calibration. We explore this in more detail in Section 3.1.1.

In these tests we use fast frequency switching with a 24 s cycle time where ∼ [...] high- and low-frequencies respectively, with around 3 s to switch between frequencies. All DGC source fluxes at the high frequency are > [...] < [...]° in all cases. The DGC source is only used to find the instrumental offset, akin to how a bright source within some tens of degrees of a target is used to solve for the instrumental bandpass response in general operations, and thus the separation is not critical (Asaki et al. 2020a). Two DGC solutions are found for the start and end DGC blocks and applied using a linear interpolation to correct for any slow drifts (Asaki et al. 2020b).

Figure 1. Illustrative schematic of the full observing sequence. The six blocks making up one full observation can be understood by separating the first DGC observation, the first in-band referencing, the first B2B referencing, the second in-band block, the second B2B block, and the end DGC block. The figure is illustrative as it is not accurate in terms of observing time or number of scans. In reality all scan lengths were fixed at 9 s for all targets regardless of observing at the high or low frequency, and there were more than 10 scans per block. The individual scans are identified in the legend at the bottom. Calibration scans refer to system-noise temperature (Tsys) and pointing scans.

2.2. Data Reduction and Processing

With a forward look to commissioning and implementation of B2B, a standardized script was developed based on the ALMA quality assessment procedures using CASA (McMullin et al. 2007). Each of the tests was conducted in the same sequence such that the script automatically identified all required SPWs and scans to use for each source at each frequency and proceeded in a step-wise manner. Although the process of calibration was therefore almost entirely automated, it did not preclude checking the data and solutions during the reduction steps.

Automatic flags are typically generated in standard ALMA science observations when any system-based issues occur. However, due to the nature of these tests with fast cycle times and frequency switching (in the B2B case), some of the typical online flags were not stored, and thus manual checking and flagging of all datasets were undertaken before calibration, imaging and analysis. The data reduction follows a relatively similar process to standard ALMA data reduction except for added stages to address the frequency switching for the B2B datasets.

2.2.1. In-band reduction

In short, WVR, system-noise temperature (Tsys) and antenna position corrections are applied followed by the aforementioned manually set flags. The DGC source high-frequency scans are used to calibrate the bandpass response and also to provide a single flux amplitude calibration, i.e. there is no secondary temporal amplitude scaling bootstrapped from the phase calibrator gains, so as to provide a fair comparison with the B2B blocks. Normal phase interpolation is conducted as the phase calibrator and target source are observed at the same frequency (i.e. standard phase referencing). Note that all high frequency SPWs are combined when obtaining the phase calibrator solution to maximize the signal-to-noise ratio (S/N). Ideally each SPW used for in-band calibration should be calibrated individually because any combination of SPWs averages the residual delays, which are then considered as phase solutions at an average frequency. Phase solution errors (in radians) follow 1/(S/N) and are non-negligible for low S/N values, and so in some observations the combination is absolutely required because the phase calibrator was partially flagged, or was weaker than expected (e.g. at band 9).

Table 1. Overview and parameters of the 44 analyzed in-band - B2B observations conducted as part of the HF-LBC-2017.

| Name | Date | Time (UTC) | DGC source name | DGC flux (Jy) | α | No. ants | PWV (mm) | Expected phase RMS (µm) | Expected phase RMS (deg) | Maximum baseline (m) |
|---|---|---|---|---|---|---|---|---|---|---|
| Band 7 - 3 | | | | | | | | | | |
| J2228-170827-B73-1deg | 2017-08-27 | 04:48:24 | J2253+1608 | 5.89 | -0.705 | 48 | 0.62 | 35.4 | 10.9 | 4540 |
| J2228-170829-B73-1deg | 2017-08-29 | 07:33:25 | J2253+1608 | 5.89 | -0.705 | 48 | 1.45 | 42.0 | 14.1 | 4734 |
| J2228-170829-B73-3deg | 2017-08-29 | 02:07:47 | J2253+1608 | 5.89 | -0.705 | 47 | 1.33 | 55.5 | 18.6 | 5049 |
| J2228-170829-B73-6deg | 2017-08-29 | 02:57:20 | J2253+1608 | 5.89 | -0.705 | 47 | 1.28 | 41.6 | 13.9 | 4874 |
| J0449-170829-B73-2deg | 2017-08-29 | 08:24:35 | J0522- [...] | | | | | | | |
| [...] | | | | | | | | | | |
| Band 8 - 4 | | | | | | | | | | |
| J1709-170717-B84-2deg | 2017-07-17 | 00:49:17 | J1924- [...] | | | | | | | |
| [...] | | | | | | | | | | |
| Band 9 - 4 | | | | | | | | | | |
| J2228-170717-B94-1deg | 2017-07-17 | 06:58:47 | J2253+1608 | 3.37 | -0.716 | 29 | 0.35 | 40.56 | 33.2 | 2343 |
| Band 9 - 6 | | | | | | | | | | |
| J2228-170725-B96-1deg | 2017-07-25 | 05:59:58 | J2253+1608 | 3.26 | -0.718 | 33 | 0.27 | 19.3 | 15.8 | 2943 |
| J2228-170725-B96-6deg | 2017-07-25 | 06:46:10 | J2253+1608 | 3.26 | -0.718 | 33 | 0.27 | 16.1 | 13.2 | 2851 |
| J0449-170725-B96-5deg | [...] | | | | | | | | | |

Notes: The tests are ordered into band pairs, where the first band is that of the target and the following that of the calibrator for the B2B blocks only. Tests are identified by the target, the observing date (YYMMDD), the B2B frequency pair and the related in-band calibrator separation angle in the naming scheme. Baseline length is the maximal projected value and is rounded to the nearest meter. The flux and spectral index (α) of the DGC source and the phase RMS given in degrees relate to the target observing frequency. The expected phase RMS is that measured on the DGC over ∼30 s and combines all baselines > [...]. a: [...]. b: Uses DGC block from J2228- [...]. c: Only has/uses one DGC block. d: Only has one in-band block. e: [...]. f: All DGC blocks failed, used one from J0633- [...]. g: Only has one B2B block. h: One in-band block flagged as source > [...]° elevation. i: Last DGC block flagged.

Table 2. Central frequencies for the SPWs.

| SPW | High Frequency (GHz) | Low Frequency (GHz), B2B (calibrator only) |
|---|---|---|
| Band 7 (B2B pair Band 3) | [...] | [...] |
| Band 8 (B2B pair Band 4) | [...] | [...] |
| Band 9 (B2B pair Band 4) | [...] | [...] |
| Band 9 (B2B pair Band 6) | [...] | [...] |

2.2.2. B2B reduction

For B2B calibration, the Tsys and WVR corrections are applied to both the high and low frequencies, while antenna position corrections and flags are applied as per the in-band reduction. The DGC source is used first to calibrate the low- and high-frequency bandpass response and to provide a single flux amplitude calibration only at the high frequency. No flux calibration is required at the low frequency as we use only the phase solutions. Subsequently we solve for the DGC solution as outlined above. The only remaining correction to make now is that for the atmospheric fluctuations. Here, phase calibration transfers the phase solutions from the low frequency phase calibrator to the high frequency target, scaled by the multiplicative ratio between frequencies, $\nu_h/\nu_l$ (similar to FPT, where $\nu_h$ and $\nu_l$ are the high and low frequencies respectively). For the harmonic-switching setup employed, the ratio is an integer value, although it is not a requirement in general for ALMA using normal frequency switching. The interpolation and scaling is handled entirely by the 'linearPD' interpolation option in the applycal task within CASA, which fully solves for any phase ambiguities before scaling (G. Moellenbrock - private communication). We note that the calibrator low-frequency SPWs are averaged together in all tests.

2.2.3. Self-calibration

All target sources are point source quasars and so we also perform self-calibration on the raw data. Again, we combine all SPWs in this process to boost the S/N. Self-calibration is undertaken in order to directly compare an ideal calibration, free of any residual phase errors, against both the in-band and B2B phase referencing techniques. Generally for self-calibration we use the integration timescale of ∼ [...]

2.3. Measuring the Phase stability

Our observational strategy groups in-band and B2B blocks together as part of one observation sequence to ensure the direct comparability of the in-band and B2B phase calibration techniques as they are taken under the same observing conditions. However, to allow the comparison between different observations taken on different days and under different conditions we must consider the phase stability.

The spatial-structure-function (SSF) $D_\phi(b)$ is the dispersion of the atmospheric phase as a function of baseline length $b$ measured over the entire time interval of a given observation ($t_{\rm obs}$, see also Wright 1996; Carilli & Holdaway 1999; Matsushita et al. 2017) and is [...] $\sigma_\phi = \sqrt{D_\phi(b)}$. Here we use the same metric but impose a specific timescale average:

$$\sigma_\phi(b,t) = \sqrt{D_\phi(b,t)} = \left\langle \left(\phi(x+b) - \phi(x)\right)^2 \right\rangle_t^{1/2} \quad (\mathrm{deg}), \qquad (3)$$

where $(\phi(x+b) - \phi(x))$ is the atmospheric phase difference between the two antennas, and the angle brackets represent an ensemble average over the timescale $t$, our chosen specific time range for the averaging period ($t = t_{\rm obs}$ results in the classic SSF). Crucially, we make all phase RMS assessments after application of the WVR solutions. This is because the WVR solutions are applied on the integration timescale and correct the phase fluctuations for all sources.

We refer to our first phase RMS measure as the expected phase RMS. This is established using a time interval, $t$, close to the cycle time ($t_{\rm cyc}$) but on non-phase referenced data. This provides a representative value of the phase variations that will likely remain in the data after ideal phase referencing, because phase referencing only corrects fluctuations longer than the cycle time, and hence variations ≲ $t_{\rm cyc}$ remain largely unchanged. We calculate the expected phase RMS using only the low frequency scans of the DGC source as these provide the highest S/N. Because our observing sequence has fixed and repetitive scan lengths, if we exclude the high frequency scan in-between two consecutive low-frequency scans we measure the expected phase RMS over ∼30 s, which is the closest match to the phase referencing cycle time of ∼24 s. The final expected phase RMS is the average value from all paired low frequency scans, but scaled to the high frequency at which the target is observed. This is also used to derive the expected coherence via Equation 2. The expected phase RMS values are indicated in Table 1. (For these phase RMS calculations we must account for a non-zero mean phase as we use only WVR corrected data and must exclude phase offsets and only include the short-term phase variations over the selected scans. We therefore use a standard deviation calculation for each of the DGC scan pairs rather than a true RMS statistic.)

Second, we measure the residual phase RMS for the DGC source. The residual phase RMS is measured after B2B phase referencing has occurred, using the full time duration of the DGC source observations (block 1 and 6 combined, $t = t_{\rm obs}$), and only the high frequency data. The residual phase RMS provides the true value of phase fluctuations remaining in the data. We detail the effect of the DGC source phase residuals in Section 3.1.1.

Finally, we measure the residual phase RMS of the target sources pre- and post-calibration for the respective in-band and B2B target data. These calculations also use the entire duration of the observations on each target (blocks 2 and 4 for in-band, and blocks 3 and 5 for B2B data). The pre- and post-calibration values, combined with the expected phase RMS, allow us to investigate how effective phase referencing is and contrast that with what was expected. This is discussed further in Section 4.

2.4. Imaging

All of the targets are imaged in an automatic fashionusing the clean command in casa . The beam pixelsizes are chosen to be ﬁve times smaller than the synthe-sized beam (band 9 uses seven time smaller) and we usesquare maps with sizes of 512 ×

512 or 1024 × clean ing isundertaken within a 15 pixel radius circular region inthe center of the map using a ﬁxed number of 50 iter-ations, which is suﬃcient for the central point sources.The peak ﬂux density and integrated ﬂux are measuredwithin the same circular aperture to parameterize thesource, while the map noise is taken within an annu-lus between radii of 15 pixels out to 250 or 500 pixelsdepending on the image size 512 or 1024 respectively.The sources are also ﬁtted with a 2D Gaussian in theimage plane within the central region of the map. Allself-calibrated images are measured in the same manneras those made with phase referencing.2.5. Image Assessment Criteria: image coherence,ﬁdelity, dynamic range, defect

We define four measures to assess the results: image coherence, fidelity, dynamic range, and defect. The concept of coherence was introduced in Section 1 as the estimated coherence of the visibility data, calculable from the expected phase RMS of the data. Here we define the image coherence, which is measured directly by comparing the peak flux densities of the target from phase referenced images using in-band and B2B calibration with the respective self-calibrated images (Asaki et al. 2020b). In a number of VLBI studies this is often referred to as the fractional (peak) flux recovered, or peak-ratio (e.g. Dodson & Rioja 2009; Rioja et al. 2011, 2015; Martí-Vidal et al. 2010b). To provide a measure of image fidelity, the accuracy of the reconstructed sky brightness distribution, we simply use the measure of peak flux density divided by the integrated flux. Given the targets are point sources these should be equal for an ideal calibration (peak flux density in Jy/beam and integrated

Maud et al.

Table 3. Parameters of the in-band - B2B observations where the same nearby phase calibrator was used for both phase referencing techniques.

Name   Target   Calibrator   Sep. (deg)   Peak (mJy/beam): In-band, B2B   Flux (mJy): In-band, B2B   Noise (mJy/beam): In-band, B2B

Band 7 - 3

J2228-170827-B73-1deg J2228 − − − − − − − − − − a J0633 − − − − − − a J0633 − − Band 8 - 4

J1709-170717-B84-1deg J1709 − − − − a J0633 − − a J2228 − − − − Band 9 - 4

J2228-170717-B94-1deg J2228 − − Band 9 - 6

J2228-170725-B96-1deg J2228 − −

Notes: The peak flux density, integrated flux, and noise levels are given for the target source after calibration. The band 9 flux accuracy is limited by thermal noise.
a B2B noisier than in-band by > 30 %.

flux in Jy). Any deviation from this equality indicates a spreading of the flux in the image and a poorer representation of the point source target. The dynamic range is simply the image peak divided by the image noise level. Finally, we judge whether there are significant defects that shift the source central position or alter the structure by using the Gaussian fits. Strictly speaking these parameters are somewhat correlated, in that poor phase calibration will result in a low image coherence and will cause image defects, leading to a low image fidelity and thus overall poor image quality.

3. RESULTS AND COMPARISONS

The following subsections address goals (1) and (2) outlined at the end of Section 1. Where relevant we introduce the assessment criteria and measures of the expected and residual phase RMS to make comparisons. We also divide the observations by maximal baseline length, separated into short-baseline - < > − − − . Predominantly, the longer baseline observations, >

3.1. Comparison of in-band and B2B calibration with the same calibrator
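As an illustration of the aperture measurements of Section 2.4 and the assessment measures of Section 2.5, the following Python sketch computes the peak flux density, integrated flux and map noise of a centered point source, and from these the image coherence, fidelity and dynamic range. It is a minimal sketch and not the pipeline used for this work: the function names, the nested-list image representation, the use of an RMS for the annulus noise, and the `beam_area_pix` parameter (the synthesized beam area in pixels, needed to convert a sum of Jy/beam pixels to Jy) are all assumptions made for the example.

```python
import math


def aperture_stats(image, beam_area_pix, r_in=15, r_out=250):
    """Measure a centered point source in a square map.

    image         : list of rows of pixel values in Jy/beam (hypothetical input format)
    beam_area_pix : synthesized-beam area in pixels (assumed to be known)
    r_in          : aperture radius in pixels (15 in the text)
    r_out         : outer annulus radius in pixels (250 or 500 in the text)
    """
    ny, nx = len(image), len(image[0])
    cy, cx = ny // 2, nx // 2
    inside, annulus = [], []
    for j, row in enumerate(image):
        for i, value in enumerate(row):
            r = math.hypot(i - cx, j - cy)
            if r <= r_in:
                inside.append(value)
            elif r <= r_out:
                annulus.append(value)
    peak = max(inside)                     # Jy/beam
    flux = sum(inside) / beam_area_pix     # Jy
    # Assumption: the map noise is the RMS of the annulus pixels.
    noise = math.sqrt(sum(v * v for v in annulus) / len(annulus))
    return peak, flux, noise


def assessment(peak, flux, noise, selfcal_peak):
    """Image coherence, fidelity and dynamic range as defined in Section 2.5."""
    return {
        "coherence": peak / selfcal_peak,  # phase-referenced peak / self-calibrated peak
        "fidelity": peak / flux,           # equals 1 for an ideal point source
        "dynamic_range": peak / noise,
    }
```

For an ideally calibrated point source the fidelity returned here approaches one, while any spreading of the flux lowers both the fidelity and the image coherence.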

There are a total of 16 observations divided into different frequency pairings: B7-3 (9), B8-4 (5), B9-4 (1) and B9-6 (1). Table 3 lists these datasets along with the peak flux densities, integrated flux values and image noise. The calibrators used in these observations can be considered almost ideal as they are extremely close to the target; they were selected to be ∼ ◦ away. The maximal target-to-calibrator separation is 1.67°, while nine of the datasets target J2228 − − ◦ .

In Figure 2 the comparisons of the image peak flux density (left), integrated flux (center) and the image map noise (right) of the targets calibrated with the B2B and in-band techniques from each dataset are presented.

Footnote: https://almascience.eso.org/documents-and-tools/cycle5/alma-technical-handbook/view

∼ 60 % increase for J0633-170930-B73-1deg. This observation has a failed in-band block and DGC block due to a hardware instability in the telescope system, although the B2B data did record without obvious errors. Such an extreme outlier does not appear to be the norm. We note that when using only one DGC block any linear change of the DGC solution cannot be corrected. We also cannot rule out effects of system instabilities. After self-calibration the target image noise values are more consistent between the in-band and B2B data.

Figure 3 shows the comparisons of the image assessment criteria (Section 2.5): the image coherence (left), fidelity (center) and the dynamic range (right). We find that the B2B image coherence values are typically within 6 % (bands 7 and 8) of the in-band values, which again are generally better. For fidelity the B2B images are within 3 % (band 7) and 8 % (band 8) of the in-band values. Still, this is indicative of the B2B technique closely matching the standard in-band phase referencing calibration. The dynamic range panel mirrors the image noise panel from Figure 2, where B2B images with worse noise are those with a lower dynamic range. Table 4 presents all the plotted parameters. None of the fitted target positions show discrepancies larger than one-third of their respective synthesized beams for the band 7 and band 8 data. Good fits are not achieved for band 9 as they are limited by thermal noise.

3.1.1. Determining the effect of DGC

The only critical difference between in-band and B2B phase referencing is the extra DGC step to correct the instrumental band offset. We can surmise that any inaccuracies in the DGC solution are responsible for discrepancies when comparing in-band and B2B images. Because the instrumental offset can only be found by DGC we have no ideal value to compare with. However, we can investigate whether uncorrected atmospheric variations negatively impact the DGC solution. Figure 4 plots the difference between the in-band and B2B image coherence, as a percentage, against the residual phase RMS of the DGC high-frequency data after B2B phase referencing. There are nine band 7 and 8 datasets with higher in-band image coherence values (excluding J0633-170930-B73-1deg, which has significantly higher B2B image noise). The black dashed line is a linear fit to the logarithm of the residual phase RMS against the coherence difference of those datasets. The trend implies that any remaining atmospheric fluctuations during DGC have a negative impact on the DGC solution, and thus marginally degrade the final B2B target source images. The degradation follows: 17 . ( $\sigma_{\phi,\mathrm{DGC}}$ ) − . 4, and is small, only ∼ ∼ $\sigma_{\phi,\mathrm{DGC}}$ = 20 and 30° respectively. At a residual phase RMS $\sigma_{\phi,\mathrm{DGC}}$ = 12° the coherence difference approaches zero. Except for the aforementioned outlier J0633-170930-B73-1deg there are no datasets with a residual phase RMS < ◦ , at which level the expected coherence is > 98 %. One could surmise that in-band and B2B images are equal to within an image coherence of 1-2 % at such low residual phase RMS levels, and that any differences likely relate to uncertainties in the phase solutions in the calibration irrespective of the technique used. Provided that the residual phase RMS for the DGC source is minimized, using fast frequency switching, the DGC stage should not be significantly detrimental to the final target image quality for B2B calibration (see also Section 3.2.1).

3.1.2. Close calibrators for phase referencing

In an attempt to isolate and investigate the effect of the small calibrator-to-target separation angles ( < ◦ for these close calibrator data), we compare the target image coherence with the DGC source image coherence, and these with the respective expected coherence values (Figure 5). Here the DGC source acts as the ideal phase referencing case because there is no position change for the phase referencing, only the temporal phase transfer using B2B. In calculating the expected coherence, we correct the expected phase RMS measured on the DGC sources to account for the different elevation of the target sources, $\sigma_\phi(\theta_{\rm tar}) = \sigma_\phi(\theta_{\rm dgc})\,\sin(\theta_{\rm dgc})/\sin(\theta_{\rm tar})$, where $\theta_{\rm tar}$ and $\theta_{\rm dgc}$ are the elevations of the target and DGC sources respectively (Holdaway 1997; Butler 1997). The maximum elevation difference between the DGC sources and the targets is 29°, although over half the datasets have elevation differences < ◦ . The elevation corrections generally change the expected phase RMS by < 10 %, well within the assigned 20 % uncertainties of the expected RMS phase calculation itself. The target image coherence is the average value from both B2B and in-band images, which are nearly equal.

Table 4. Image coherence, image fidelity, dynamic range, DGC source image coherence and expected coherence parameters of the in-band - B2B observations where the same nearby phase calibrator was used for phase referencing.

Name                    Image Coherence    Fidelity           Dyn. Range         DGC image   Expected
                        In-band   B2B      In-band   B2B      In-band   B2B      Coherence   Coherence

Band 7 - 3
J2228-170827-B73-1deg   0.97      0.97     0.97      0.97     571.97    465.74   0.98        0.98
J2228-170829-B73-1deg   0.94      0.95     0.98      0.98     408.21    390.42   0.98        0.97
J0449-170829-B73-2deg   0.90      0.90     0.95      0.94     325.34    323.53   0.98        0.97
J2228-170830-B73-1deg   0.95      0.93     0.98      0.98     381.92    330.98   0.96        0.96
J2228-170917-B73-1deg   0.92      0.90     0.95      0.96     272.38    267.35   0.94        0.94
J0633-170917-B73-1deg   0.60      0.52     0.77      0.70     176.73    117.04   0.61        0.67
J2228-170926-B73-1deg   0.95      0.96     0.95      0.97     403.18    352.20   0.98        0.98
J0449-170928-B73-2deg   0.84      0.90     0.90      0.93     239.66    234.61   0.97        0.96
J0633-170930-B73-1deg   0.97      0.92     0.97      0.93     427.81    250.09   0.99        0.99

Band 8 - 4
J1709-170717-B84-1deg 0.72 a a a a a a

Band 9 - 4
J2228-170717-B94-1deg 0.73 b b

Band 9 - 6
J2228-170725-B96-1deg 0.91 b b

Notes: Fidelity for the band 9 data is above 1.0 as the weak detection is thermal noise dominated.
a Indicates that the image coherence was calculated against the self-calibrated image that used the scan length solution interval (9 s) to achieve a self-calibration signal-to-noise &
b Indicates that the image coherence was calculated against the expected band 9 source flux after extrapolation from the self-calibrated band 7 and band 8 images.

The left panel of Figure 5 shows that the target image coherence values are comparable with the DGC source image coherence values, on average within 5 % and 11 % for bands 7 and 8 respectively. Separated into mid- and long-baselines for band 7, the differences are 4 % and 6 % on average. As one might expect, the DGC image coherence values are marginally better, and thus the small reduction in target image coherence is a result of the small target to phase calibrator separation angle. Roughly one or two percent could be attributed to the detrimental effect of DGC on the B2B images, as the target image coherence used is the average of the in-band and B2B ones. The center and right panels of Figure 5 compare the target and DGC source image coherence, respectively, with the expected coherence calculated using the expected phase RMS. These coherence measures are very similar and on average for the target images are within 6 % excluding images differing by >

20 %. The band 9 data are consistent with these findings considering the larger uncertainties. The DGC source image coherence and expected coherence values are almost equal and are presented in the last two columns of Table 4.

The fundamental result here is that, independent of the phase calibration technique (in-band or B2B phase referencing), the target images have comparable parameters and the DGC stage does not have a significant detrimental effect for B2B observations. The comparability of the target images with the DGC source images indicates that small phase calibrator separation angles ( < . ◦ ) can recover to within 7 % (5 % when excluding the two images with differences > 15 % at band 8) of the maximal coherence possible for bands 7 and 8 combined. Furthermore, provided that the phase RMS is measured over a timescale similar to the proposed cycle time, using a strong point source, one can already establish a representative value for the expected phase RMS achievable after phase referencing. This in turn translates to an expected image coherence, which can be evaluated before the observations have even taken place, of course under the premise that a close-by calibrator would be used. Figure 6 presents images of the target source J2228 − − ◦ .

Figure 2. Comparisons of the image peak flux densities, integrated fluxes and map noise from the in-band and B2B calibrated datasets that use the same phase calibrator. The color relates to the target frequency: blue is band 7, purple is band 8 and yellow is band 9. The symbols represent the maximal baseline lengths: circles are for maximal baselines between 2.0 km and 3.7 km, squares are for 3.7 km to 8.5 km, while triangles are for baselines > ±

3.2. Investigation of phase calibrator separation angles

In the following sub-sections we present a number of results to investigate the effect of phase calibrator separation angle, building upon the results presented in Section 3.1.

3.2.1. Comparison of In-band using distant calibrators vs. B2B using close calibrators

The underlying philosophy of the B2B technique is that more optimal calibrator choices are available. Using a lower frequency provides a higher chance of phase calibrators being stronger and closer to a given target source (see Asaki et al. 2020a). The remaining test observations were arranged such that the B2B phase calibrator is separated by at most 1.67° from the target, whereas the in-band calibrator can range from 2.42° to 11.65° away. In each observation the target source in the B2B and in-band blocks is the same but a different calibrator is used. Table 5 lists these observations along with the peak flux density, integrated flux values and image noise. For the B2B blocks there are B7-3 (12), B8-4 (10) and B9-6 (6) frequency pairs.

Following the same analysis as Section 3.1, in Figure 7 (left) we now find that the peak flux density values of the B2B calibrated images noticeably exceed those using in-band calibration (except one band 9 dataset). Whereas the in-band images were marginally better than the B2B images when using the same phase calibrator, here the complete opposite trend occurs. The central panel shows that the integrated fluxes are generally comparable between the B2B and in-band images, while the right panel mirrors the left panel, in that the majority of in-band images have higher image map noise values compared to the B2B images. There are five in-band observations with an image noise > 30 % higher than the B2B ones. The reduction of peak flux density and the increase in noise for the in-band images, while still somewhat recovering a similar integrated flux as the B2B images, suggests a loss of coherence due to poorer phase correction. Even though B2B images can suffer from a few percent degradation due to the DGC solution (see Section 3.1.1), the effect of using distant calibrators is much more detrimental. Figure 8 provides a visual representation of this by showing the images of J2228-0753 taken on 2017 September 26 in band 7 using the longest baselines. The left panel shows the B2B images that always use a calibrator only 0.68° from the target, while the in-band images on the right, top to bottom, use calibrators 0.68, 3.04, and 6.02° away respectively (see also Figure 10 in Asaki et al. 2020a). The B2B images remain largely unchanged during the

three observations, while the in-band images gradually degrade; the poor phase correction evidently blurs the target images (see also Carilli & Holdaway 1999).

Figure 3. Comparisons of the image coherence (as a fraction, 1.0 = 100 %), fidelity and dynamic range from the B2B and in-band calibrated data that use the same phase calibrator. Analogously to Figure 2, image coherence and fidelity values are in good agreement, indicating B2B calibration can result in images as good as in-band calibration. The dynamic range is worse for a few B2B datasets, caused by the underlying increase in map noise. Colors (bands) and symbols (maximal baseline length) are the same as Figure 2. Uncertainties are propagated from those in the peak flux density and integrated flux (three times the image map noise). Band 9 data are not plotted in the central panel with values >

Figure 9 presents the image coherence, image fidelity and dynamic range comparisons for the B2B images using a close calibrator against the in-band images using a more distant calibrator. The negative effect of using more distant phase calibrators for the in-band observations is clear: in almost all cases the B2B images have better image coherence and image fidelity than the comparison in-band images. Interestingly, a number of the lower image coherence datasets ( < 60 %) show that the B2B values are often a factor of two better than the comparative in-band values. This suggests that poorer atmospheric stability also decorrelates the calibrator scans, potentially introducing phase errors that, when combined with larger phase calibrator separation angles looking through a different line-of-sight, have a compound effect on the image quality.

In Figure 10 we plot the image coherence of the in-band (left) and B2B (center) observations separately against the DGC source image coherence. The left panel highlights that the in-band data, with more distant phase calibrators, have notably reduced image coherence values compared to that achievable in an ideal case. The majority of in-band image coherence values are over 15 % worse than the DGC source image coherence. The B2B images (center panel) using the close calibrators have image coherence values predominantly within ∼ 10 % of the DGC image coherence, as already shown for close calibrators in Figure 5 (left). The right panel shows the DGC source image coherence compared with the expected coherence, again indicating that the estimated coherence is a good proxy for that expected in an ideal phase referencing case. Table 6 lists the parameters as plotted.

Finally, in Figure 11 we plot the quality parameters of image position uncertainty and excess source size with respect to the synthesized beam of each dataset. The left and right panel pairs separate the in-band and B2B datasets. The spread of centroid fit positions is more variable for the in-band calibrated data, which use more distant phase calibrators. A few in-band images exceed a discrepancy of more than 1/3 of the synthesized beam (dashed circle). Two of these are long-baseline band 7 observations. In contrast, none of the B2B images exceed this limit. The discrepancies in fitted source size excess are also seen for the in-band images, many of which have measurable blurring or spreading of the emission > 15 %, while some surpass 1/3 of the synthesized beam size. In general the B2B image excess sizes are < 10 %. We note that J0633-170917-B73-4deg is the only dataset with a size excess > >

3.2.2. Calibrator separation angle dependence

Table 5. Parameters of the B2B and in-band observations where a more distant phase calibrator was used for the in-band phase referencing blocks compared to the B2B blocks.

Name   Target   B2B Calibrator   Sep. (deg)   In-band Calibrator   Sep. (deg)   Peak (mJy/beam): In-band, B2B   Flux (mJy): In-band, B2B   Noise (mJy/beam): In-band, B2B

Band 7 - 3
J2228-170829-B73-3deg J2228 − − − − − − − − − − − −

Band 8 - 4
J1709-170717-B84-2deg J1709 − − − − − − − − − − − −

Band 9 - 6
J2228-170725-B96-6deg J2228 − − − − − − − − − − − − a J2228 − − −

Notes: The calibrator names and separation angles are listed for both the B2B and in-band blocks. The peak flux density, integrated flux, and image noise levels are given for the target source after phase referencing calibration using both techniques. There are no B9-4 datasets as the sources were too weak to be imaged.
a The target could not be imaged with in-band calibration.

Figure 12 presents a number of in-band image parameters as a function of calibrator separation angle. The larger symbol sizes highlight images where the expected coherence should be > 87 % after calibration. The top-left panel shows the ratio of the in-band image peak flux densities to the paired B2B image peak values, which all use close calibrators and should be assumed as the best achievable images when phase referencing. The bottom-left panel shows the in-band image coherence, i.e. the ratio of the image peak flux compared with that from the self-calibrated ideal in-band image. Both left panels are similar, and point to a general trend of in-band image peak flux degradation with separation angle. Moreover, the in-band coherence is much lower than expected ( <

87 %) for a number of the low expected phase RMS datasets. The trend is perhaps most compelling for band 8 and band 9 datasets, which probe a greater range of separation angles. The band 7 long-baseline points (blue triangles) suggest a steeper underlying slope when compared with the relatively constant value for mid-baselines (blue squares), although there are few datasets over a narrow separation angle range. We further investigate differences with baseline length and frequency in Section 3.2.3.

The central-top panel shows the ratio of in-band image noise to that achieved in the B2B images. The lighter symbols indicate data where either in-band or B2B blocks were missing, which would cause a notable noise change. By eye, the band 7 longer baselines (blue triangles) appear to increase sharply, while shorter baseline band 8 data increase gradually with separation angle (purple circles). In general, the noise is worse for in-band images compared to B2B images, although there is notable scatter and no clear ensemble trend with separation angle. The bottom-central panel shows the ratio of the in-band dynamic range to the B2B image dynamic range. A clear decreasing trend with increasing separation angle is apparent, with all frequency bands closely clustered. The main driver of this trend is likely the peak flux density. The top-right panel shows the image fidelity. Given that the in-band and B2B image integrated fluxes are generally in agreement, the shallow trend is also likely driven by the change in image peak flux density. Overall, increasing calibrator-to-target separation angles will reduce the recovered image peak flux densities.

The bottom-right panel of Figure 12 shows the magnitude of the image position offset, $(x_{\rm off}^2 + y_{\rm off}^2)^{1/2}$, against calibrator separation angle. Here $x_{\rm off}$ and $y_{\rm off}$ are the respective x and y position offsets of the peak flux density in the image from the central position, expressed as a fraction of the synthesized beam. A number of in-band calibrated images have a central position offset exceeding 1/3 of the beam (dotted line). The majority of the data hint at a trend of increasing position offset with increasing separation angle. The dashed line shows the fit for the position offsets after excluding those > ± ∼

Table 6. Image coherence, image fidelity, dynamic range, DGC source image coherence and expected coherence parameters of the B2B - in-band observations where different phase calibrators were used.

Name                    Image Coherence    Fidelity           Dyn. Range         DGC image   Expected
                        In-band   B2B      In-band   B2B      In-band   B2B      Coherence   Coherence

Band 7 - 3
J2228-170829-B73-3deg   0.79      0.87     0.87      0.99     249.94    332.24   0.95        0.95
J2228-170829-B73-6deg   0.78      0.90     0.83      1.00     273.57    365.91   0.96        0.97
J0449-170829-B73-3deg   0.92      0.96     0.94      0.96     406.85    462.27   0.98        0.97
J2228-170830-B73-3deg   0.89      0.94     0.92      0.99     268.12    351.27   0.97        0.96
J0449-170830-B73-5deg   0.79      0.90     0.85      0.96     169.24    359.87   0.98        0.97
J0449-170830-B73-7deg   0.85      0.94     0.94      0.98     247.86    556.51   0.98        0.98
J0633-170917-B73-4deg   0.09      0.29     0.22      0.64     21.36     64.68    0.30        0.29
J2228-170926-B73-3deg   0.80      0.91     0.83      0.92     216.58    303.5    0.98        0.98
J2228-170926-B73-6deg   0.65      0.95     0.69      0.97     104.15    386.68   0.98        0.98
J0449-170929-B73-5deg   0.31      0.50     0.50      0.84     48.32     93.53    0.91        0.87
J0633-170930-B73-4deg   0.83      0.90     0.83      0.93     164.96    199.97   0.99        0.99
J0633-171001-B73-9deg   0.04      0.34     0.30      0.81     9.20      78.22    0.52        0.37

Band 8 - 4
J1709-170717-B84-2deg 0.52 a a a a a a a a a a

Band 9 - 6
J2228-170725-B96-6deg 0.53 b b b b b b b b b b c - 0.90 b - 1.49 1.70 4.81 0.87 0.88

a Indicates that the image coherence was calculated against the self-calibrated image that used the scan length solution interval (9 s).
b Indicates that the image coherence was calculated against the expected band 9 source flux after extrapolation from the self-calibrated band 7 and band 8 images.
c The target could not be imaged with in-band calibration.

3.2.3. Dependence on Frequency and Baseline length

Figure 4.

Difference in image coherence (as a percentage) for in-band and B2B calibrated data using the same close calibrators as a function of residual phase RMS of the DGC source after B2B phase referencing. Colors (bands) and symbols (maximal baseline length) are the same as Figure 2. Positive values indicate a better in-band image coherence while negative values indicate that the B2B image coherence is higher. Band 9 data are not included considering the large uncertainties in coherence. Uncertainties are propagated from those of the peak flux density (three times the image map noise). There is a trend of increasing coherence difference as a function of residual phase RMS. The black dashed line is a linear fit to the logarithm of the residual phase against the positive coherence difference values. J0633-170930-B73-1deg, where the B2B data was notably noisier, is excluded from the fit - triangle at x=7.7°. If the residual phase RMS can be minimized then DGC has only a small effect on B2B calibrated images. However, if observing conditions are marginal and the phase RMS remained high, in-band calibration would be preferable provided that a close calibrator was available.

In Figure 13 we plot the corrected in-band image coherence as a function of separation angle separated into the short- < > < ◦ ) if they were ideally calibrated. The corrected in-band image coherence values are the measured image coherence values (Tables 6 and 5) corrected by the expected coherence loss (1.0 - expected coherence, where expected coherence is derived from the expected phase RMS measured on the DGC over ∼ 30 s and scaled to the target elevation). For the low expected phase RMS data the corrections are 3 %, 7 % and 6 % on average for bands 7, 8 and 9 respectively. The underlying assumption is that phase referencing should have corrected the phase fluctuations down to the expected phase RMS level, and thus the images should all have achieved the expected coherence. Our correction should therefore shift all datasets up to a corrected image coherence value of one. This, however, is not the case. We see clear trends of decreasing coherence with separation angle, meaning there are additional phase errors remaining, corrupting the image beyond the expected level and causing coherence losses.

We fit each frequency and baseline length independently using only the low expected phase RMS data. The fits are listed in Table 7. The y-intercept values are close to one, as we would expect. At zero separation there should be ideal phase transfer as there is no position change. Any remaining phase variations are due to the temporal phase referencing, which our correction to the coherence, based on phase RMS, accounts for. This ties with the previous findings that the DGC expected coherence and image coherence are almost equal when no position change is made during phase referencing (Section 3.1.2, Figure 5-right). We note that the band 8 y-intercept is 0.92, skewed by two lower coherence datasets at separation angles of 1.3 and 2.4°. Excluding these, the y-intercept is increased to 0.99.

A potentially interesting result is that the fitted gradients for band 9 short-baselines (-0.052 ± ± ± ± < ◦ phase RMS; the image coherence degradation is worse for longer baselines at the same observing frequency; the image coherence degradation is worse for higher frequencies using similar array configurations. We discuss these points further in Section 4. If using these fits to estimate the final image coherence, any estimate must account for the coherence loss due to the expected phase RMS level achievable, i.e. the y-intercept would not be one: it should be the value of

Figure 5. Comparisons of the average target image coherence with the DGC source image coherence (left) and expected coherence (center), and (right) the DGC source image coherence with the expected coherence. Coherence values are presented as a fraction, 1.0 = 100 %. The DGC source image coherence represents the best value achievable using phase referencing but without any source position change, while the expected value is calculated from the expected phase RMS of the DGC source measured over ∼ 30 s, via Equation 2. In the center panel the expected phase RMS is scaled to account for the elevation difference for the target source before computing the expected coherence. The similarity between the target and DGC source image coherence suggests that phase referencing with small calibrator separation angles generally has little detrimental effect on target images. The right panel indicates that the coherence calculated from the expected phase RMS matches well with that found in the images of the DGC sources. Colors (bands) and symbols (maximal baseline length) are the same as Figure 2. Note that the band 9 target image coherence values are calculated against the expected band 9 flux extrapolated from the band 7 and band 8 self-calibrated data. Uncertainties for the image coherence values are propagated from those in the peak flux density of three times the image map noise, while that of the expected coherence is from a 20 % variability in the expected phase RMS.

the expected coherence given an expected phase RMS (see Section 4.2).
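The estimate described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' procedure: Equation 2 is assumed here to be the standard relation exp(-σ²/2) with σ in radians (consistent with the quoted 87 % coherence for a 30° phase RMS), the frequency scaling assumes the phase scales linearly with observing frequency, and all function and parameter names are hypothetical. The elevation correction is the sin(θ_dgc)/sin(θ_tar) scaling of Section 3.1.2, and the slope is a fitted gradient in coherence per degree of separation, such as the values of -0.022 and -0.054 quoted in this section for the shorter and longer baselines in units of wavelength.

```python
import math


def scale_phase_rms(rms_low_deg, freq_low_ghz, freq_high_ghz):
    """Scale a phase RMS measured at the low frequency to the target frequency.

    Assumption: phase fluctuations scale linearly with observing frequency.
    """
    return rms_low_deg * freq_high_ghz / freq_low_ghz


def elevation_correct(rms_deg, elev_dgc_deg, elev_tar_deg):
    """Correct the DGC-source phase RMS to the target elevation (Section 3.1.2)."""
    return (rms_deg * math.sin(math.radians(elev_dgc_deg))
            / math.sin(math.radians(elev_tar_deg)))


def expected_coherence(rms_deg):
    """Expected coherence for a phase RMS in degrees.

    Assumption: Equation 2 has the form exp(-sigma^2 / 2), sigma in radians.
    """
    sigma = math.radians(rms_deg)
    return math.exp(-sigma ** 2 / 2.0)


def estimated_image_coherence(rms_deg, separation_deg, slope_per_deg):
    """Linear-fit estimate with the y-intercept set to the expected coherence."""
    return expected_coherence(rms_deg) + slope_per_deg * separation_deg


if __name__ == "__main__":
    # Illustrative inputs only; none of these values are taken from a dataset.
    rms = scale_phase_rms(10.0, 100.0, 300.0)
    rms = elevation_correct(rms, elev_dgc_deg=50.0, elev_tar_deg=45.0)
    print(round(estimated_image_coherence(rms, 4.0, -0.054), 3))
```

Replacing the unit intercept with the expected coherence keeps the two loss terms separate: the temporal loss from the achievable phase RMS, and the additional loss that grows with calibrator separation angle.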

Table 7. Linear fit parameters for the in-band corrected image coherence against in-band calibrator separation angle.

Baseline length (km)   Offset   Slope
Band 7 ± > ± Band 8 < ± Band 9 < ±

Plotted differently, in Figure 14 we split the data into two groups based on the baseline length in units of wavelength (to remove any frequency dependence). The top and bottom panels plot baselines shorter and longer than 5 × λ , ∼ tions as above, we find gradients of -0.022 ± < × λ and -0.054 ± > × λ (see also Table 8), clearly highlighting the baseline length dichotomy. The y-intercept values are again close to one, although the shorter baselines are still skewed by the two lower-coherence images, while the longer baselines are affected by the data spread as the fit combines band 7 and scattered band 9 values. The black solid line indicates the fit to a set of ideal observations with antenna position uncertainties added; we discuss this in Section 4.1.

Table 8. Linear fit parameters for the corrected in-band coherence against in-band calibrator separation angle separated by visibility baseline lengths.

Baseline length ( λ )   Offset   Slope
Short < × ± Long > × ±

4. DISCUSSION

One of the fundamental questions for ALMA is: which phase calibration technique should be used for high fre-

Figure 6.

Images of the target source J2228 − − ◦ in all cases. Allobservations have an expected phase RMS < ◦ and should all achieve a coherence >

87 %. The coherence (%) and dynamicrange values are listed to the top right of each panel. Note the band 9 images are only weak detections and that the in-bandbeam is larger than that for B2B as some automatic ﬂags occurred due to the weak phase calibrator. Overall the B2B andin-band images are comparable. We note the band 9 B2B is marginally worse but only used one DGC block. The color rangesare scaled to 50, 40 and 20 mJy for panels top to bottom, while the contour levels are at 2.5, 5, 10, 20, 30, 40 and 50 mJy/beamin all plots. Maud et al.

Figure 7.

Comparisons of the peak ﬂux densities, integrated ﬂuxes and map noise from the in-band and B2B calibrateddatasets that use diﬀerent phase calibrators. The B2B calibrators are always within 1.67 ◦ while the in-band ones range from2.42 to 11.65 ◦ . Most strikingly, compared to Figure 2 almost all the B2B image values are better than the in-band ones, i.e.have a higher peak ﬂux density. This clearly indicates a worse calibration for the in-band block due to using a more distantphase calibrator, while the B2B blocks use a close one, i.e. B2B with close calibrators will result in better images. For band 9images the noise values are oﬀ the scale. Color (bands) and symbols (baseline length) are the same as Figure 2. The dashed lineis that of equality, whereas the dotted lines either side denote discrepancies of 5 and 10 %. Uncertainties in peak ﬂux densityand ﬂux are plotted as three times the image noise. comparison of calibration techniques Figure 8.

Images of the target source J2228-0753 in band 7 using the longest baselines taken on 2017 September 26 in three consecutive observations. Top: B2B and in-band have the same calibrator at 0.68° separation. Middle: B2B keeps the 0.68° calibrator but in-band uses a more distant calibrator at 3.04°. Bottom: B2B uses the 0.68° calibrator while in-band now uses a distant calibrator at 6.02°. The color ranges are all scaled to 50 mJy, while the contour levels are at 0.5, 1.0, 2.5, 5.0 and 10.0 mJy/beam in all plots to highlight the weakest emission and any image defects. Note 0.5 mJy/beam corresponds to ∼ σ for all images except the in-band 6.02° image, which has a higher noise value of 0.29 mJy/beam. The B2B images with the same calibrator remain largely unchanged from top to bottom in the three observations, while the in-band images gradually degrade in terms of both image coherence and dynamic range as a result of poor phase correction with increasing calibrator-to-target separation angle. The beam sizes are of the order 35 × 25 mas.

Figure 9.

Comparisons of the image coherence (as a fraction), fidelity and dynamic range from the in-band and B2B calibrated data that use different phase calibrators. The image coherence and fidelity values indicate that the B2B calibration is superior. The dynamic range is worse for in-band datasets, caused by the underlying larger map noise values. Colors (bands) and symbols (maximal baseline length) are as Figure 2. We note that some datasets have fidelity values >

Figure 10.

Comparisons of the in-band (left) and B2B (center) target image coherence with the DGC source image coherence, and the DGC source image coherence against the expected coherence (right). Coherence values are plotted as fractions. The DGC source image coherence represents the ideal case using phase referencing but without any source position change, while the expected value is calculated from the expected phase RMS. The in-band image coherence values are always notably less than the DGC source values, as a result of using distant calibrators. Typically the B2B observations nearly achieve the ideal image coherence values, i.e. phase referencing with small calibrator separation angles generally has little detrimental effect on target images. Colors (bands) and symbols (maximal baseline length) are as Figure 2. The band 9 data target image coherence values are calculated against the expected band 9 flux extrapolated from the band 7 and band 8 self-calibrated data. Uncertainties in image coherence are propagated from those in the peak flux density and integrated flux (three times the image map noise), while that for the expected coherences is related to a 20 % variability of the expected phase RMS.

Figure 11.

Image fit offsets and source size excess as a fraction of the image synthesized beam for in-band images (left) and B2B images (right). The in-band images calibrated using more distant phase calibrators have more noticeable offsets in the centroid source position compared to the B2B data calibrated with close calibrators. The in-band image fits also indicate a number of target source sizes are >

Figure 12.

In-band image parameters as a function of calibrator to target separation angle. Colors (bands) and symbols (maximal baseline length) are the same as Figure 2. Top-to-bottom, left-to-right shows the ratio of the in-band peak flux density to the corresponding B2B target value, image coherence, the ratio of the in-band image map noise and ratio of the image dynamic range both to the corresponding B2B values respectively, the image fidelity, and the in-band image centroid position offset. The larger symbol sizes indicate datasets where the expected phase RMS should be < 30°, which translates to an expected coherence of 87 % (dotted line in bottom-left panel). In the top-middle panel the lighter symbols represent observations where either in-band or B2B blocks were entirely flagged. In the bottom-right panel the dotted line marks 1/3 of a beam, while the dashed line is the fit of position offset with separation angle. Excluding values >

Figure 13.

Corrected in-band image coherence (shown as a fraction, 1.0 = 100 %) as a function of in-band phase calibrator separation angle for short ( < − > < ◦ ). These plots show a clear coherence degradation as a function of separation angle. The gradients, fitted individually for the different frequencies and baseline lengths, are annotated on the panels. The larger of the two symbol sizes highlights those datasets where the expected phase RMS should be < 30° and thus should have achieved a coherence of 87 %. The dashed lines are fits using only these low expected phase RMS data. Colors (bands) and symbols (maximal baseline length) follow Figure 2. Errors are propagated from: an uncertainty in the peak flux density that is three times the image noise; and a variation of 20 % in the phase RMS. In some cases these are smaller than the symbol.

Figure 14.

Corrected in-band image coherence as a function of in-band phase calibrator separation angle as Figure 13 but with the datasets separated by visibility baseline length in units of wavelength. The top and bottom panels separate the maximal baselines that are shorter or longer than 5 × λ ( ∼

quency observations? We can now begin to answer this based on the presented results.

4.1. Degradation with separation angle

Results presented throughout Section 3.2, specifically in Section 3.2.3 and most clearly visualized in Figures 13 and 14, strongly demonstrate that a small separation is required between the phase calibrator and the target source to minimize decoherence, assuming self-calibration is not possible. As a further example, Figure 15 shows images of various point-source targets calibrated using standard in-band phase referencing where the calibrator to target separation angles are > ◦ . Top, middle and bottom panels show bands 7, 8 and 9 images respectively. The visibility baseline lengths are > × λ for all except the left-middle panel, which shows a slightly shorter baseline band 8 observation (max baseline ∼ × λ ). The expected phase RMS values of these observations are lower than 30° and thus the expected image coherence should be > 87 %, except the right-middle panel, which shows ∼ ◦ and should have achieved a coherence of at least 78 %. The expected coherence levels are not met for any images. The reduced coherence and target structure defects seen in the images are solely due to the use of distant phase calibrators. We note that the lowest resolution (i.e. shortest baseline) observation (left-middle) is structurally the least degraded, as one also expects from the previous trends.

Considering the evidence provided, we know that shorter baselines and lower frequencies are less susceptible to degradation caused by distant calibrators, as expected. In contrast, longer baselines and higher frequencies (independently and to a greater effect when combined) are most affected. Also, short-baseline ( ∼ >

when converted to phase (see also Section 1). Using only the vertical-direction baseline dependent error (0.198 mm/km, Hunter et al. 2016) we estimate the path length uncertainties ℓ (µm) via ∆ρ · (s_t − s_c). Here ∆ρ is the baseline position uncertainty and (s_t − s_c) is the calibrator to target separation angle (in radians). For baseline lengths of 15, 10 and 5 km, |∆ρ| = 2.97, 1.98 and 0.99 mm respectively, and thus for a calibrator to target angle of 5°, ℓ ∼ µm. During a relatively short observation the projected baselines and the on-sky target-to-phase calibrator separation angle will remain roughly constant. Therefore, via Equation 1, at band 7, 8 and 9 for the long-, mid- and short-baselines almost constant phase offsets of 90, 84 and 68°, respectively, would be imparted to the target during phase referencing. These values are reasonably similar, illustrating our expected result tying low-frequency longer baselines to high-frequency shorter baselines.
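As a rough check of the single-baseline estimate above, the following sketch converts a baseline position uncertainty and a separation angle into a path length error and a phase offset. It assumes that Equation 1 has the usual form φ = 360° ν ℓ / c, which is not reproduced in this section, and it uses the band 7 frequency of 289 GHz quoted in the abstract. The function names are illustrative only.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in m/s


def path_length_error_m(baseline_km, separation_deg, err_mm_per_km=0.198):
    """Path length error l = |delta_rho| * separation angle (radians), in metres.

    err_mm_per_km is the vertical baseline-dependent error quoted in the text.
    """
    delta_rho_m = err_mm_per_km * baseline_km * 1e-3
    return delta_rho_m * math.radians(separation_deg)


def phase_offset_deg(path_m, freq_ghz):
    """Phase offset in degrees, ASSUMING phi = 360 * nu * l / c."""
    return 360.0 * freq_ghz * 1e9 * path_m / C_M_PER_S


# Path length errors for the three baseline lengths at a 5 degree separation.
for b_km in (15, 10, 5):
    l_m = path_length_error_m(b_km, 5.0)
    print(f"{b_km} km baseline: path error {l_m * 1e6:.0f} um")

# Band 7 (289 GHz) phase offset on the 15 km baseline.
band7_phase = phase_offset_deg(path_length_error_m(15, 5.0), 289.0)
print(f"band 7, 15 km: {band7_phase:.0f} deg")
```

Under these assumptions the 15 km, 289 GHz case gives a path error near 260 µm and a phase offset close to the 90° quoted above. The band 8 and band 9 cases need the observing frequencies of those datasets, which are not stated in this section.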
The expected coherence would range from 30-50 % for phase RMS values of this magnitude (phase RMS = phase offset for data with a zero degree mean phase) before we begin to consider the effects of atmospheric variations that remain after phase referencing. This simplistic treatment using a single maximal baseline length underestimates the expected coherence when compared with the data, because many shorter baselines in a real array would not suffer such a large corruption.

To provide a more realistic case we use the self-calibrated data of the DGC source from the long-baseline J0449-170928-B37-2deg dataset and corrupt it with incorrect antenna positions generated using gencal in CASA for a range of calibrator separation angles, from 1 to 12° (in 1° steps). We provide the antenna position uncertainties as an input, calculated with the same method as above, for all baselines made with the reference antenna but in all directions: east, north and vertical. After refitting the data presented in Hunter et al. (2016) we use the uncertainties as a function of baseline length, b, of: east (0.140 mm + b km × 0.071 mm/km); north (0.110 mm + b km × 0.054 mm/km); and vertical (0.220 mm + b km × 0.198 mm/km). The baseline-length-independent position errors are now accounted for. We randomly select a final position uncertainty for each direction from a Gaussian distribution centered on the absolute uncertainty as above, per baseline, with standard deviations in the east, north and vertical directions of 0.086, 0.076 and 0.129 mm for baselines shorter than 2.5 km, and 0.121, 0.118 and 0.296 mm for baselines longer than 2.5 km. These values were found using the data from Hunter et al. (2016) after removing the baseline-dependent fit to find the spread. The signs of the position uncertainties are randomly assigned for each antenna.
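The sampling of antenna position errors described above could be sketched as follows. This is only an illustration of the random draw, not the pipeline used for the paper: it does not call gencal or touch any visibility data, the function names are hypothetical, and two details are assumptions. The sign is drawn independently for each direction, and a baseline of exactly 2.5 km is treated as "long".

```python
import random

# (constant term in mm, slope in mm/km) per direction, as quoted in the text.
LINEAR_TERMS = {
    "east": (0.140, 0.071),
    "north": (0.110, 0.054),
    "vertical": (0.220, 0.198),
}
# Gaussian standard deviations in mm for baselines shorter / longer than 2.5 km.
SPREAD_SHORT = {"east": 0.086, "north": 0.076, "vertical": 0.129}
SPREAD_LONG = {"east": 0.121, "north": 0.118, "vertical": 0.296}


def mean_uncertainty_mm(direction, baseline_km):
    """Absolute position uncertainty from the linear fit for one direction."""
    const, slope = LINEAR_TERMS[direction]
    return const + slope * baseline_km


def draw_position_error_mm(baseline_km, rng):
    """Draw one signed position error per direction for a single antenna."""
    spread = SPREAD_SHORT if baseline_km < 2.5 else SPREAD_LONG
    errors = {}
    for direction in LINEAR_TERMS:
        magnitude = rng.gauss(mean_uncertainty_mm(direction, baseline_km),
                              spread[direction])
        sign = rng.choice((-1.0, 1.0))  # ASSUMPTION: independent sign per direction
        errors[direction] = sign * magnitude
    return errors


rng = random.Random(2017)  # fixed seed so the sketch is repeatable
print(draw_position_error_mm(15.0, rng))
```

The resulting offsets would then be supplied to whichever tool applies antenna position corrections; that step is deliberately left out here.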
A Gaussian phase variation of 20° is added thereafter to each baseline, representative of that expected to remain in well calibrated observations.

The solid black line in the lower panel of Figure 14 shows the fit to the corrected image coherence of the modeled observations corrupted by antenna position uncertainties. We apply the same correction as used for the observed data, adding the expected coherence loss due to atmospheric phase variations to the measured image coherence. The correction is 6 % for all model images as we corrupted the data with only a 20° phase RMS. As expected, for zero separation angle there are no antenna position uncertainties and the correction applied for the atmospheric phase RMS sets the image coherence to one. The degradation of the antenna-position corrupted models follows a curved fit (corrected coherence = -0.00135 θ² - 0.01338 θ + 1.00638, where θ is the separation angle in degrees). The fit has a noticeably shallower decline of corrected image coherence when compared with the long visibility baseline data ( > × λ ). At separation angles of 2, 4 and 6°, the antenna position errors amount to image coherence losses of ∼ 3, 7 and 12 %, respectively. The corrected image coherence values, with only antenna position corruptions, are 13, 20 and 36 % higher than the fit to the observations. Antenna position uncertainties therefore cannot fully account for the level of image decoherence. Considering a case where antenna position uncertainties were negligible, assuming we can naively add the decoherence they cause to the observational data, then a re-fit to the observations with antenna position uncertainties removed yields a gradient of -0.026 ± ◦ and 5.5 ◦ were not significantly improved after phase referencing, even when using fast switching, which should have corrected any rapid atmospheric fluctuations. Similarly, poor phase correction was illustrated at millimeter wavelengths in paired antenna tests by Asaki et al. (1998) using the NMA with baselines < ◦ during the observations. With the paired antenna system the target and satellite were continuously monitored. The difference of the phase-time series for the target and satellite

Figure 15.

Images of various target sources calibrated using standard in-band phase referencing but where phase calibrators have an angular separation > ◦ . Top-left to bottom-right are images of the targets J2228-0753, J0449-4350, J0633-2223, J2228-0753, J0449-4350 and J2228-0753. The top, middle and bottom panels show bands 7, 8 and 9 respectively. The images are all scaled to the respective peak flux values, and the contours are plotted at the 5, 10, 30, 50, 70 and 90 % levels. The panel titles identify the observation and the calibrator separation angle. The beams are shown as black ellipses to the bottom left, while the coherence (%) and dynamic range are shown at the top right. The expected phase RMS for all observations was < 30° and the images should have achieved image coherence values > 87 %, except the right-middle panel band 8 image which was ∼ ◦ corresponding to an image coherence of 78 %. All images have a coherence lower than expected and some have considerable structural defects. The white cross marks the center of the image. We note that J2228-0753 is not imaged in band 9 (bottom-right) and no parameters are shown.

indicated that effective phase correction occurred only when the satellite was within ∼ ◦ of the target, and that only if it was within ∼ ◦ was the path length after correction below 100 µm. This path length was, at the time of the study, the requirement for future observatories to successfully undertake sub-mm observations. Beyond a 10-15° angular separation there was no correction to the target phases. Interestingly, Asaki et al. (1998) were able to better phase calibrate their observations via the introduction of a time-lag into the phase-time series as a function of increasing separation angle, which compensated for the differing target and calibrator lines-of-sight as the atmosphere advected over the array. It is also worth noting a study of calibrator separation angles using VLBI data at 8.4 and 15.0 GHz frequencies by Martí-Vidal et al. (2010b). In short, these authors also find a clear decrease of image coherence (they call this peak-ratio) with separation angle. Their results also tie with their previous modeling of a snapshot turbulent atmosphere, which included both ionospheric and tropospheric effects, the former important at such low frequencies (Martí-Vidal et al. 2010a). Unlike the ALMA observations we present, those of Martí-Vidal et al. (2010b) do not further deteriorate past separation angles of ∼ ◦ , where it is believed that the atmospheric turbulence saturates at these lower observing frequencies. For high-frequencies, our findings combined with those of Asaki et al.
(1998, 2016) imply that close calibrators are required to correctly track and correct the fluctuations near to the line-of-sight of a target.

In Figure 16 we indicate the residual phase RMS of the target sources pre- and post-phase calibration from all band 7 and 8 observations that should have achieved an expected phase RMS < 30°. The symbol sizes are indicative of the target-to-calibrator separation angles, with dark and light symbols representing in-band and B2B calibration. The B2B data all use close calibrators. We remove any nonzero phase offsets from the phase RMS to exclude the effect of antenna position uncertainties that manifest as almost constant phase offsets in time, and thus the residual phase RMS values plotted are assumed to be purely due to uncorrected atmospheric fluctuations. Even though our observations were taken in a variety of stability conditions, indicated by the large spread of pre-calibration phase RMS values, the targets using nearby calibrators are generally corrected to, or below, the expected phase RMS level (30°). The post-calibration residual phase RMS values for in-band data using distant calibrators are generally worse than the expected phase RMS level. For some in-band data with distant calibrators that do meet

Figure 16.

Pre- and post-calibration residual phase RMS of the target sources from all band 7 and 8 observations that were expected to achieve a low phase RMS ( < 30°). Colors (bands) and symbols (maximal baseline length) follow Figure 2, while symbol size is indicative of phase calibrator separation angle and dark and light colors are for in-band and B2B observations respectively. The diagonal dashed line marks regions of no correction (left) and sub-optimal correction (right), while the horizontal one marks the line of expected maximal phase RMS (30°). Most B2B datasets with close calibrators have post-calibration residual phase RMS values that meet the expected phase RMS, whereas many in-band datasets fall in the region of sub-optimal phase correction due to using distant calibrators. Some in-band data with distant calibrators do meet the expected phase RMS level; however, these already had very low pre-calibration phase RMS values (e.g. large blue squares near ∼ ◦ pre-calibration phase RMS).

the expected level, they already had low pre-calibration phase RMS values and are actually corrected very little by phase referencing (e.g. larger blue squares near a pre-calibration phase RMS of ∼ ◦ ). In-band data with distant calibrators therefore indicate sub-optimal calibration.

Crucially, when using distant phase calibrators, in addition to antenna position uncertainties, uncorrected atmospheric phase variations still remain in data that should have been corrected to a much lower phase RMS had a close calibrator been used.

4.2. Recommended maximal separation angles

If a maximal calibrator separation angle is not employed images could be degraded to a level where they

∼ 70 % are associated with an increase of structural defects when the calibrator separation angles exceed ∼ ◦ . With this as a reference point and following the empirical fits to Figure 14 provided in Table 7 we parameterize the recommended maximal phase calibrator to target separation angle required to achieve a minimal 70 % image coherence. As noted in Section 3.2.3, the final image coherence has two components: the expected coherence derived from the expected phase RMS; and the additional coherence loss due to separation angle. To reiterate, the former parameterizes the highest achievable coherence tied to the lowest expected phase RMS after phase referencing with an ideal zero degree separation angle. The latter encompasses antenna position uncertainties and any sub-optimal phase correction of the temporal atmospheric fluctuations due to the finite calibrator-to-target separation angle. Following the divide used throughout this work, we assume that atmospheric fluctuations can be corrected to an expected phase RMS of 30°, and thus the maximal image coherence should be 87 %, a loss of only 13 %. In evaluating the separation angle required to achieve the final image coherence of 70 %, the expected coherence loss is subtracted from the y-intercept of the fits presented in Table 7. The maximal separation angles using the shallowest slope limit and maximal baseline length of each group are presented in Table 9. We note that these limits need not be as strict for better observing conditions where the expected phase RMS would be < 30° as the coherence loss would be < 13 %, or for sources that can be self-calibrated. However, when considering self-calibration, image structural defects would still need to be minimized for targets with complex extended structure. We also fit a planar function to these maximal recommended separation angles in order to fill the remaining baseline and frequency ranges that were not covered in our test observations. We caution that the planar fit uses only four data points. For band 9 long-baselines the value is unphysical (negative), which may point to a real difficulty in achieving a high enough image coherence or possibly that the linear fit used to parameterize the image coherence degradation with separation angle is too simplistic.

Asaki et al. (2020a) presented a table listing the mean separation angles of suitable calibrators for high-frequency observations. Those authors indicated that for bands 9 and 10 in-band calibrators would be > ◦ away while B2B calibrators are almost a factor of two closer at ∼ ◦ . Certainly B2B appears to be the only foreseeable way to calibrate high-frequency observations. The B2B average separation angle of available calibrators is larger than our requirement for band 8 long-baselines in Table 9 and suggests higher frequency observations may not achieve a minimal 70 % coherence level. In the present long-baseline observing regime, where ALMA uses 72 s cycle times with ∼ 18 s spent on the calibrator, successful high-frequency observations may only be conducted if the science target is, by chance, within 1-2° of a strong quasar. Alternatively, if the atmosphere was stable enough, longer cycle-times with longer on-calibrator times could be used to attain suitably high S/N solutions for weaker, but close, calibrators. Following from the discussion in Section 4.1, it is apparent that other remedial action could be taken. The effect of antenna position uncertainties could be minimized by performing a short observation (at a lower frequency) targeting a number of quasars within a few degrees of the target in order to calculate the vertical (zenith) path length errors. A better correction of these at the time of observing would minimize the effect of the vertical uncertainties in the antenna positions and could increase the calibrator separation angle tolerance to match the B2B calibrator availability. It is worth noting that long time-interval self-calibration could also provide some correction for uncertain antenna positions, which are manifest as almost constant phase offsets. However, a detailed study with high frequency long-baseline observations is required to understand the divide between sub-optimal phase referencing and antenna position errors as a function of calibrator separation angle, and whether long term self-calibration is feasible.

Table 9.

Maximum recommended calibrator separation angle to achieve an image coherence of 70 %.

Band (Frequency) b< b =3.7-8.5 km b =8.5-16.0 km
7 (279 GHz) 10.5 a a a a - b

Notes: The separation is estimated using the lower-limit gradient in Table 7 for various baseline lengths, b, assuming an expected phase RMS of 30°, attributable to a 13 % coherence loss.
a Estimated using a planar fit.
b Unphysical value.
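The procedure behind these limits can be illustrated with a short sketch. It assumes the standard relation between phase RMS σ (in radians) and coherence, exp(−σ²/2), which reproduces the 87 % quoted for 30°, and it solves the linear fit for the angle at which the coherence falls to the target value. The intercept and slope used in the example are hypothetical placeholders, since Table 7 is not reproduced in this section.

```python
import math


def expected_coherence(phase_rms_deg):
    """Coherence for a given phase RMS, ASSUMING exp(-sigma^2 / 2), sigma in radians."""
    sigma = math.radians(phase_rms_deg)
    return math.exp(-0.5 * sigma * sigma)


def max_separation_deg(intercept, slope_per_deg, phase_rms_deg=30.0, target=0.70):
    """Separation angle at which the fitted image coherence drops to `target`.

    The expected coherence loss is subtracted from the fit intercept, as
    described in the text, and the linear fit is then solved for the angle.
    """
    if slope_per_deg >= 0:
        raise ValueError("slope must be negative: coherence falls with separation")
    loss = 1.0 - expected_coherence(phase_rms_deg)
    return (target - (intercept - loss)) / slope_per_deg


# HYPOTHETICAL fit values for illustration only (not taken from Table 7).
print(round(max_separation_deg(intercept=1.0, slope_per_deg=-0.05), 2))
```

A negative return value means the target coherence cannot be reached even at zero separation for the assumed phase RMS, which is one way an unphysical table entry could arise.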

4.3. Choice of technique

Given that we have now established suitable maximal limits for the calibrator to target separation angle, we must ask: when does B2B deliver higher image coherence than in-band when calibrators are found for both? To make this assessment we compare the difference between in-band and B2B image coherence values for the paired observation as a function of the difference in the phase calibrator separation angles. We subtract the B2B image coherence from the in-band values. In cases where the same phase calibrators were used for B2B and in-band, the difference in separation angle will be zero degrees, while if the in-band images have a higher coherence, the coherence difference will be positive. The in-band data differenced with the B2B data are plotted as filled symbols in Figure 17. Many of the test observations were taken consecutively on the same day, of which the in-band blocks often observe the same target source in the same band but with a different phase calibrator. We have 28 additional unique pairings of in-band only observations from which we plot the image coherence of the observation after differencing with that using the closer calibrator for the various pairings. These in-band data are plotted as open symbols in Figure 17.
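The differencing described above, followed by a straight-line fit of coherence difference against separation angle difference, might be sketched as below. The paired values are invented for illustration, and an unweighted least-squares fit is assumed; the fitting method actually used for the paper is not stated in this section.

```python
def coherence_differences(pairs):
    """Return (separation difference, coherence difference) per paired observation.

    Each pair is (inband_sep_deg, inband_coherence, b2b_sep_deg, b2b_coherence);
    the B2B values are subtracted from the in-band values.
    """
    return [(i_sep - b_sep, i_coh - b_coh)
            for i_sep, i_coh, b_sep, b_coh in pairs]


def linear_fit(points):
    """Unweighted least-squares line y = m * x + k through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct separation differences")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# HYPOTHETICAL paired observations (not data from the campaign).
pairs = [
    (1.0, 0.85, 1.0, 0.85),  # same calibrator for both techniques
    (2.0, 0.80, 1.0, 0.85),  # in-band calibrator 1 degree further away
    (4.0, 0.70, 1.0, 0.85),  # in-band calibrator 3 degrees further away
]
m, k = linear_fit(coherence_differences(pairs))
print(f"gradient m = {m:.3f} per degree, intercept k = {k:.3f}")
```

A negative gradient in such a fit corresponds to the in-band image losing coherence relative to B2B as its calibrator moves further from the target.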

Table 10.

Linear fit parameters from the image coherence difference against calibrator separation angle difference.

Baseline length (km) Offset Slope

Band 7 ± > ± Band 8 < ± Band 9 < ±

Figure 17 can be viewed as directly indicating a further degradation of in-band observations when specifically compared to B2B observations, independent of the absolute calibrator separation angle. In principle if in-band and B2B observations used the same calibrator, be it at an absolute separation angle of 2° or 5°, both would be plotted at a separation angle difference of 0° and thus the coherence difference would be around zero. Evidently there is a notable image coherence degradation as a function of phase calibrator separation difference. These trends can be used to decide if the in-band or B2B technique should be preferred in returning the highest coherence images. Simply, using a linear fit (see Table 10) we can evaluate m∆θ + k, where k is the y-axis intercept, m is the gradient and ∆θ is the angular difference (in degrees) between in-band and B2B calibrators. Values > k , are > m ∆ θ + k < 0, the B2B technique would be preferred to provide a calibration yielding the highest coherence image. For the band 7 long-baseline data investigated this would already occur when a B2B calibrator was only ∼ ◦ closer to the target compared to the available in-band one.

Interestingly, the y-axis intercept values for the band 7 mid- and long-baselines groups are almost zero, suggesting that the B2B images are not noticeably degraded due to the added DGC step in comparison with the band 8 observations. This however is a side effect of the data at hand. The band 7 and band 8 datasets with low expected phase RMS have average residual DGC source phase RMS values of ∼ ◦ and ∼ ◦ , respectively. With reference to Section 3.1.1 the DGC detrimental effect on the band 7 data is < ∼

4.4. Defining an operational metric

From an operations perspective we have to take a pragmatic approach, and not just consider which technique provides the highest coherence image judged from the linear trends presented in Section 4.3. Time overheads for the B2B mode must be accounted for, and likewise the detrimental effect of the DGC considering the permissible observing conditions. The operational metric can be broken into three parts: inaccuracies in the DGC solution, time overheads, and which technique provides the best image coherence for the phase calibrators available.

Firstly, following the previous scenario we consider the case where the expected phase RMS remaining after phase referencing is 30° for the DGC source. Therefore, we could expect that the detrimental effect of DGC would reduce the coherence of a B2B image by up to 7 %, see also Section 3.1.1. Indeed, if the expected phase RMS could be reduced further, by observing in better conditions, the detriment to the B2B target image will become almost negligible.

Figure 17.

Image coherence difference resulting from the subtraction of the B2B image coherence from the in-band paired observations (filled) and from in-band observations paired with other in-band observations on the same day but using a different calibrator (open) as a function of phase calibrator separation angle difference, for short- ( < − > ∼ 30 s interval, are < ◦ . The dashed lines are fits using only low expected phase RMS data for the different frequencies and baseline lengths, and using only in-band data differenced with B2B data. Table 10 presents the fit results. Colors (bands) and symbols (maximal baseline length) are the same as in Figure 2. Band 9 observations with much poorer B2B images are off the scale of the top panel and are excluded in the fitting. Clearly using a closer calibrator will result in a better image coherence.

Secondly, observations at ALMA are broken up into execution blocks, EBs. Each EB lasts a pre-defined amount of time, limited by the maximal target source integration time of 50 min, and is a self-contained dataset with all required calibrators. In order to achieve the scientific goal of any project the EB is repeated until the sensitivity requirement is met after calibration and imaging. For B2B projects a DGC source would be added to the EB. If observations follow a similar regime to the presented tests the DGC source would be observed for ∼ σ image ∝ . / √ ∆ t obs , the B2B image noise level would increase to 1 . / √ . > 30 % worse than B2B images, the average noise increase for in-band data is 8 %. Overall, balancing the time overhead of B2B as a noise increase with the in-band noise increase due to using distant calibrators, there is only a 4 % deficit for B2B. We note that in reality the B2B observations will be extended in duration to ensure that the noise requirements are met, and that the above calculation is an illustrative way to account for the cost, in terms of image parameters, of the extra time.

The third and final term was already found in Section 4.3. The gradient for the fits in Figure 17 indicates the extra image degradation per degree of calibrator separation angle difference. We note that the gradient is the only negative term of the operational metric. As a reminder, the y-axis intercept is not used from those fits, as it is biased by the low residual phase RMS of the DGC source of those data. We address this by instead using the first term of the metric described above.

The three terms of the operational metric, in order, can now be combined as shown in Equation 4:

$$0.11 + m\,\Delta\theta = \begin{cases} \mathrm{B2B} & \text{if } < 0 \\ \text{in-band} & \text{if } > 0 \end{cases} \qquad (4)$$

where m is the gradient of the fits from Table 10, dependent on baseline length and frequency, and ∆θ is the angular difference (in degrees) between in-band and B2B calibrators. Following the same rationale as Section 4.3, B2B would be applied in operations based on the evaluation of Equation 4 being < 0, otherwise a standard in-band observation should be used under the premise that the maximal recommended calibrator separation angle is not exceeded. To provide two clear cases relevant for the upcoming ALMA band 7 long-baseline observations: if an in-band calibrator was found only beyond the maximal separation angle limit of 3.8° then B2B would be the default technique used; whereas if an in-band calibrator was found at the maximal limit from the target, B2B would only be employed if the calibrator is 2.1° closer (i.e. < 1.7° from the target).

5. SUMMARY

An investigation was made using 44 analyzed datasets taken as part of the ALMA high-frequency long-baseline campaign 2017 (HF-LBC-2017) in order to make a direct comparison between standard in-band phase referencing and band-to-band (B2B) phase referencing techniques. The B2B technique is a method to calibrate high frequency observations using phase solutions from a calibrator observed at a lower frequency, as the quasar calibrator is likely to be brighter and also closer to the target. Calibration involves correction of the instrumental phase offset between the frequencies and the conversion of the calibrator temporal phases to the frequency of the target. A differential-gain-calibration (DGC) sequence, consisting of alternating low and high frequency scans of a strong quasar, is used to calibrate the instrumental offset. The test observations included both B2B and in-band phase referencing blocks which observed the same target sources that were calibrated with either the same, or different phase calibrators. A range of maximal baseline lengths were investigated from 2 km out to ∼

15 km. Successful high frequency observations were made in bands 7, 8 and 9, which were paired with band 3, 4 and 4 and 6 respectively for the B2B mode (B7-3, B8-4, B9-4, B9-6). The work presented also examined the detrimental effects of increasing target to phase calibrator separation angles on calibration and imaging.

In 16 observations the same nearby phase calibrator was chosen for the in-band and B2B blocks (separation angles < ◦ ). Comparing target image parameters such as peak flux density and image coherence we find the B2B calibration technique can produce images with similar properties, within 5-10 %, of the standard in-band phase calibration, the latter providing typically better images. The DGC step required for the B2B technique is responsible for the few percent difference between the B2B and in-band images. Provided that the phase residuals are minimized ( < ◦ ) for the DGC source then the detriment to B2B images is < < < ◦ .

For the remaining 28 datasets the phase calibrators for the B2B block were selected to be nearby ( < 1.67°) whereas the in-band calibrators were chosen to be at larger separation angles, between 2.42 and 11.65°. These observations were designed to test the specific use case of B2B in which low frequency calibrators are much more likely to be found closer to a given target compared to finding a suitably strong calibrator at the in-band high frequency. Comparing the B2B and in-band images we find that the peak flux density, noise level and image coherence parameters are superior for the B2B images. Most in-band images have coherence values that are > 15 % worse than the B2B ones.

Examination of the in-band image coherence shows a linear decrease with increasing calibrator separation angles. The image coherence values are lower than the expected image coherence values, which are calculated using the expected phase RMS measured over the cycle time as a proxy. Corruption of an ideal self-calibrated long-baseline observation with antenna position uncertainty effects can only partially account for the decoherence. The remaining coherence loss is attributed to

> < × λ , we find that shorter baselines have a fractional image coherence degradation of − . ± − . ± > > 87 %. Notable image deformation and structural changes also occur and could affect a scientific interpretation. For band 7 long-baseline observations, assuming conditions that could achieve an expected ∼ 30° phase RMS corresponding to a ∼ 87 % coherence, a phase calibrator must be within ∼ ◦ to actually achieve a final image coherence > 70 %. Propagation to higher frequencies suggests that calibrators may need to be closer than 2° to a target, something that may only be possible to achieve using B2B observations.

Finally, we present a metric that could be used to judge whether the B2B mode should be used in operations depending on calibrator availability. The B2B mode should always be used if there is no suitably close in-band calibrator that would provide a final target image coherence >