Spurious source generation in mapping from noisy phase-self-calibrated data

Abstract

Phase self-calibration (or selfcal) is an algorithm often used in the calibration of interferometric observations in astronomy. Although a powerful tool, this algorithm presents strong limitations when applied to data with a low signal-to-noise ratio. We analyze the artifacts that the phase selfcal algorithm produces when applied to extremely noisy data. We show how the phase selfcal may generate a spurious source in the sky from a distribution of completely random visibilities. This spurious source is indistinguishable from a real one. We numerically and analytically compute the relationship between the maximum spurious flux density generated by selfcal from noise and the particulars of the interferometric observations. Finally, we present two simple tests that can be applied to interferometric data for checking whether a source detection is real or whether the source is an artifact of the phase self-calibration algorithm.


Astronomy & Astrophysics manuscript no. 2007-8690, © ESO 2018

October 28, 2018


I. Martí-Vidal and J. M. Marcaide

Dpt. Astronomia i Astrofísica, Universitat de València, C/ Dr. Moliner 50, 46100 Burjassot (Valencia), Spain

e-mail: [email protected]

Accepted on 12 December 2007


Key words. Techniques: interferometric – Methods: data analysis – Techniques: image processing

1. Introduction

Phase self-calibration (or selfcal) is an algorithm often used in the calibration of radio astronomical data. It was introduced by Readhead & Wilkinson (1978) and Cotton (1979), and it has been essential for the success of Very Long Baseline Interferometry (VLBI) imaging. Also, the antenna-based calibrations obtained from the Global Fringe Fitting algorithm (Schwab & Cotton 1983) are equivalent to a phase self-calibration. Phase selfcal will also be widely used with future interferometric instruments, such as the Atacama Large Millimeter Array (ALMA) or the Square Kilometre Array (SKA), now under construction or planned. Optical interferometric observations (like those of the Very Large Telescope Interferometer, VLTI) will also eventually benefit from some form of selfcal, although closure phases and amplitudes are measured in optical interferometry in a very different way than in radio. Thus, the statistical analysis presented here may need substantial changes to rigorously describe the probability of false detections by optical interferometers.

Given that part of the interferometric observations obtained from all those instruments may come from very faint sources, it is important to take into account the undesired and uncontrollable effects that the instrumentation and/or the calibration and analysis algorithms applied to the data could introduce in the interferometric observations. A deep study of all our analysis tools and their effects on noisy data is essential for discerning the reliability of detections of very faint sources. Some discoveries made by pushing the interferometric instruments to their sensitivity limits could turn out to be the result of artifacts produced by the analysis tools.

The main limitations of the phase self-calibration algorithm have been analyzed in many publications (e.g., Linfield 1986, Wilkinson et al. 1988). It is well known that an unwise use of selfcal can lead to imperfect images, even to the generation of spurious source components, elimination of real components, and deformation of the structure of extended sources. In this paper, we focus on the effects that phase self-calibration produces when applied to pure noise. We show that selfcal can generate a spurious source from pure noise, with a relatively high flux density compared to the rms of the visibility amplitudes. We analytically and numerically study how the recoverable flux density of such a spurious source depends on the details of the observations (the sensitivity of the interferometer, the number of antennas, and the averaging time of the selfcal solutions). Finally, we study the effects of phase self-calibration applied to the visibilities resulting from observations of real faint sources, instead of pure noise. We present two simple tests that can be applied to real data in order to check whether a given faint source is real or not, and apply these tests to real data, corresponding to VLBI observations of the radio supernova SN 2004et (Martí-Vidal et al. 2007).

Send offprint requests to: I. Martí-Vidal

2. Basics of phase self-calibration

The basics of phase self-calibration can be found in many publications (e.g., Readhead & Wilkinson 1978, Schwab 1980). Here, we explain the essence of this algorithm in a few lines. Let us suppose that we have made an interferometric observation using a set of $N$ antennas. We obtain one visibility, $V_{ij}$, for each baseline, that is, for each pair of antennas $(i, j)$. Let us call $\phi_{ij}$ the phase of the visibility $V_{ij}$. Any atmospheric or instrumental effect on the optical path of the signals arriving at antennas $i$ and $j$ affects the phase $\phi_{ij}$ as follows:

$$\phi_{ij} = \phi^{\mathrm{str}}_{ij} + \phi^{\mathrm{atm}}_{i} - \phi^{\mathrm{atm}}_{j}, \tag{1}$$

where $\phi^{\mathrm{atm}}_{l}$ represents all the undesired (i.e., atmospheric and instrumental) contributions to the optical path of the signal received by antenna $l$, and $\phi^{\mathrm{str}}_{ij}$ is the contribution to the phase that comes purely from the structure of the observed source, that is, the so-called source structure phase. It can be easily shown that the quantities known as closure phases, defined as (Jennison 1958, Rogers et al. 1974)

$$C_{ijk} = \phi_{ij} + \phi_{jk} - \phi_{ik}, \tag{2}$$

are independent of $\phi^{\mathrm{atm}}_{l}$: substituting Eq. 1 into Eq. 2, the term of each antenna appears once with each sign and cancels. That is, the closure phases $C_{ijk}$ are defined only by the structure of the observed source. Thus, they are independent of any atmospheric or instrumental contribution that may affect the signals received by the antennas of the interferometer. The phase self-calibration algorithm takes advantage of the closure phases to estimate the undesired antenna-dependent contributions $\phi^{\mathrm{atm}}_{l}$. In short, phase selfcal finds the set of antenna-dependent quantities $\phi^{\mathrm{gain}}_{l}$ (called phase gains) that generates the set of phases

$$\phi^{\mathrm{self}}_{ij} = \phi_{ij} - \phi^{\mathrm{gain}}_{i} + \phi^{\mathrm{gain}}_{j}, \tag{3}$$

where $\phi^{\mathrm{self}}_{ij}$ are the phases that best represent the source structure given by the closure phases $C_{ijk}$. In the ideal case, $\phi^{\mathrm{gain}}_{l} = \phi^{\mathrm{atm}}_{l}$ and $\phi^{\mathrm{self}}_{ij} = \phi^{\mathrm{str}}_{ij}$.

The process from which the values $\phi^{\mathrm{gain}}_{l}$ (for $l = i, j$) are obtained is called hybrid mapping, and its explanation can be found in many publications (e.g., Cornwell & Wilkinson 1981). Here, suffice it to say that the phase gains of the antennas are obtained from a least-squares fit of the raw visibilities to the source model obtained from the mapping.

Hybrid mapping is an iterative process in which the structure of the source model is refined step by step. Often, the model used in the first iteration of hybrid mapping is a point source located at the center of the map (the flux density of this point source does not affect the phase calibration). The successive steps of hybrid mapping and selfcal correct this point source model until the structure that best represents all the closure phases is obtained.
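The cancellation of the antenna-based terms can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the triangle combination $\phi_{ij} + \phi_{jk} - \phi_{ik}$ for the closure phase, random illustrative structure phases, and a hypothetical helper name `closure_phase`.

```python
import numpy as np

def closure_phase(phi, i, j, k):
    """Closure phase phi_ij + phi_jk - phi_ik, wrapped to (-pi, pi]."""
    c = phi[i, j] + phi[j, k] - phi[i, k]
    return np.angle(np.exp(1j * c))

rng = np.random.default_rng(0)
n_ant = 5

# Illustrative source-structure phases (antisymmetric: phi_ji = -phi_ij).
upper = np.triu(rng.uniform(-np.pi, np.pi, (n_ant, n_ant)), k=1)
phi_str = upper - upper.T

# Random antenna-based (atmospheric and instrumental) phase errors.
phi_atm = rng.uniform(-np.pi, np.pi, n_ant)

# Eq. 1: observed phase = structure phase + phi_atm_i - phi_atm_j.
phi_obs = phi_str + phi_atm[:, None] - phi_atm[None, :]

# The closure phase is the same before and after corrupting the phases.
c_str = closure_phase(phi_str, 0, 1, 2)
c_obs = closure_phase(phi_obs, 0, 1, 2)
print(c_str, c_obs)

# Eq. 3 with ideal gains (phi_gain = phi_atm) restores the structure phases.
phi_self = phi_obs - phi_atm[:, None] + phi_atm[None, :]
print(np.allclose(phi_self, phi_str))
```

The two printed closure phases agree to rounding error for any choice of `phi_atm`, which is why selfcal can use them as a constraint on the source structure.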

3. Probability distribution of the visibilities due to pure noise

When an interferometer observes a source with a flux density well below the sensitivity limit of its baselines (that is, when the interferometric data contain only noise), both the real and imaginary parts of the resulting visibilities follow Gaussian distributions centered at the origin. The amplitudes and phases of the visibilities follow non-Gaussian distributions. It can be shown that, for each baseline, the probability distribution of the phases is uniform between $-\pi$ and $\pi$, whereas the distribution of the amplitudes is given by

$$g(A) = \frac{A}{\sigma_{ij}^{2}} \exp\left(-\frac{A^{2}}{2\sigma_{ij}^{2}}\right), \tag{4}$$

where $g(A)$ is the probability density of the amplitudes and $\sigma_{ij}$ is the width of the Gaussian distributions of the real and imaginary parts of the visibilities of the baseline $(i, j)$. The width $\sigma_{ij}$ is related to the thermal noise of the baseline $(i, j)$. We assume, for simplicity, that all the baselines of the interferometer have the same value of $\sigma$. It can be shown that the rms of the visibility amplitudes of a pure noise signal is $\rho = \sqrt{2}\,\sigma$ and that the mean amplitude, $\langle A \rangle$, is $\sqrt{\pi/2}\,\sigma$, different from zero.
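These relations are easy to verify by drawing random visibilities. The sketch below is an illustration under stated assumptions (unit $\sigma$, an arbitrary sample size, and the hypothetical function name `noise_amplitude_stats`); it is not the procedure used to produce the paper's synthetic data.

```python
import numpy as np

def noise_amplitude_stats(sigma, n_vis, seed=0):
    """Draw pure-noise visibilities; return (amplitude rms, mean amplitude, phases)."""
    rng = np.random.default_rng(seed)
    # Real and imaginary parts: independent zero-mean Gaussians of width sigma.
    vis = rng.normal(0.0, sigma, n_vis) + 1j * rng.normal(0.0, sigma, n_vis)
    amp = np.abs(vis)
    return np.sqrt(np.mean(amp**2)), amp.mean(), np.angle(vis)

rms, mean_amp, phase = noise_amplitude_stats(sigma=1.0, n_vis=200_000)

# Both ratios should be close to 1: rho = sqrt(2) sigma, <A> = sqrt(pi/2) sigma.
print(rms / np.sqrt(2.0), mean_amp / np.sqrt(np.pi / 2.0))

# Uniform phases: the mean cosine should be close to 0.
print(np.cos(phase).mean())
```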

4. Probability of generating a spurious source from pure noise

Given that the real part of a visibility with phase in the range $(-\pi/2, \pi/2)$ is positive, all the visibilities with phases in that range bring a positive mean flux density to the map. We call a phase in the range $(-\pi/2, \pi/2)$ a phase close to zero, and a phase outside that range a phase distant from zero.

The distribution of closure phases is uniform between $-\pi$ and $\pi$, as is the distribution of phases. This means that there is a subset of closure phases that by chance are close to zero and are thus compatible with a point source. However, there are also closure phases distant from zero, which are totally incompatible with a compact source. If self-calibration is not applied to the data, the uniform distribution of phases (and closure phases) results in a noisy map with no source defined in it. But if a single iteration of phase self-calibration is applied, there is a selection process of the closure phases in the calibration, which may generate a spurious source with a flux density comparable to the rms of the amplitudes (which can be much higher than the rms of the image), as we show below.

For each scan, the effects of the least-squares fit described in Sect. 2 can be understood in the following way: selfcal searches for the visibilities corresponding to the antennas most commonly appearing in the closure phases that are close to zero (which usually correspond to phases that can be modelled with a point source). Then, selfcal minimizes the phases of such visibilities by calibrating those antennas, leaving all the other visibilities with their phases dispersed between $-\pi$ and $\pi$. In other words, the phases of the visibilities with large closure phases increase the value of the $\chi^2$ at the minimum, but the position of that minimum depends only on the visibilities with phases that can be modelled with a point source. That is, all the visibilities capable of producing a compact source are calibrated and their phases concentrate around zero. All the other visibilities tend to have their phases uniformly distributed between $-\pi$ and $\pi$, thus generating, after the Fourier inversion, a null mean flux density in the map. Thus, selfcal always produces a positive bias in the mean flux density of the map, given that selfcal only acts, effectively, on the phases that can be brought close to zero, because their corresponding closure phases are close to zero.

One might think that it is very difficult for an antenna to be involved in a large number of closure phases close to zero, given that the distribution of phases is uniform. One might think that, on average, a given antenna is involved in as many closure phases close to zero as closure phases distant from zero. In that case, it would be impossible for selfcal to select which antenna should be calibrated, given that all the antennas participating in each scan would have the same chances of being calibrated. But this is only true on average. In the distributions of all the interferometric observations, there are statistical fluctuations, which are always used by selfcal for the generation of a spurious point source. For a given antenna $i$, the probability of finding $n$ closure phases (in which that antenna is involved) close to zero is

$$P_i(n) = \frac{1}{2^{N'}} \frac{N'!}{n!\,(N' - n)!}, \tag{5}$$

where $N' = (N-1)(N-2)/2$ is the number of closure phases in which antenna $i$ is involved. (Notice that even though not all closure phases are independent, the closure phases with one common antenna are.) Thus, there is a finite probability of finding an antenna that appears in more than $N'/2$ […] the antenna $i$ appears will be minimized with success, generating a positive mean flux density in the map. Actually, even the cases in which $n < N'/2$ […] $i$ appears will tend to gather around $-\pi$ and $\pi$, meaning that there are other antennas that will appear in a large number of closure phases close to zero (i.e., the antennas belonging to the closure phases distant from zero in which antenna $i$ appears).

In short, there are always statistical fluctuations in the pure-noise distribution of phases that can be used by selfcal to produce (by a selection process of the antennas most commonly involved in the closure phases close to zero) a point source with a spurious flux density.

It must be said that Global Fringe Fitting, when applied to noisy data, can also generate a spurious source from pure noise, given that this algorithm also finds antenna-based calibrations for adjusting the interferometric fringes of all the baselines in each scan. The spurious source is generated as long as the minimum SNR of the fringes to be considered in the fit is set to a small value (lower than 2 or 3). In those cases, the fringe fitting would work on correlation peaks (fringes) produced, in many cases, by spurious noise fluctuations. Then, for the same reasons given above, there would be a relatively high probability of generating a spurious point source from pure noise.
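The selection effect described in this section can be reproduced with a toy experiment. The sketch below is not the difmap procedure used in the paper. It assumes a simplified, amplitude-weighted point-source phase fit solved by coordinate ascent, one independent solution per integration, and takes the mean real part of the visibilities as the flux density at the map center. All function names are hypothetical.

```python
import numpy as np

def noise_scan(n_ant, sigma, rng):
    """One integration of pure-noise visibilities as a Hermitian matrix (zero diagonal)."""
    shape = (n_ant, n_ant)
    noise = rng.normal(0.0, sigma, shape) + 1j * rng.normal(0.0, sigma, shape)
    upper = np.triu(noise, k=1)
    return upper + upper.conj().T

def phase_selfcal_point(vis, n_sweeps=30):
    """Fit antenna phase gains to a centered point source and apply them (Eq. 3).

    Coordinate ascent on sum_{i<j} Re[V_ij exp(-i (g_i - g_j))]; this is an
    assumed stand-in for the least-squares phase fit, not the paper's code.
    """
    n_ant = vis.shape[0]
    gains = np.ones(n_ant, dtype=complex)   # exp(i * phase gain) of each antenna
    for _ in range(n_sweeps):
        for i in range(n_ant):
            s = vis[i] @ gains              # sum_j V_ij exp(i g_j)
            if abs(s) > 0.0:
                gains[i] = s / abs(s)       # best phase for antenna i, others fixed
    gains = gains / gains[0]                # reference antenna: null phase gain
    return vis * np.conj(gains)[:, None] * gains[None, :]

def mean_flux(vis):
    """Mean real part over baselines i < j (flux density at the map center)."""
    return vis[np.triu_indices(vis.shape[0], k=1)].real.mean()

def spurious_flux(n_ant, n_scans=200, sigma=1.0, seed=1):
    """Mean flux before and after selfcal, in units of the amplitude rms rho."""
    rng = np.random.default_rng(seed)
    before, after = [], []
    for _ in range(n_scans):
        vis = noise_scan(n_ant, sigma, rng)
        before.append(mean_flux(vis))
        after.append(mean_flux(phase_selfcal_point(vis)))
    rho = np.sqrt(2.0) * sigma
    return np.mean(before) / rho, np.mean(after) / rho

if __name__ == "__main__":
    for n in (3, 10):
        print(n, spurious_flux(n))
```

Before calibration the mean flux density is compatible with zero. After this toy selfcal it is a sizeable positive fraction of $\rho$, and it is larger for the smaller array. The exact fractions depend on the fitting and imaging method, so they should not be expected to match the values obtained with hybrid mapping and CLEAN.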

5. Dependence of the flux density of the spurious source on the characteristics of the observations

In this section we consider how the flux density of the spurious source generated by selfcal depends on the parameters defining a set of observations. The parameters that we consider are the sensitivity of the array (for simplicity, we assume the same sensitivity for all the baselines of the interferometric array), the number of antennas of the interferometer, and the averaging time of the selfcal solutions. For the case of the Global Fringe Fitting algorithm, the averaging time of the solutions is equal to the duration of the scans.

We generated a set of synthetic interferometric data with the program fake of the Caltech Package (Pearson 1991). We generated 6 hours of observations using a set of 20 antennas. All these antennas had the same diameter (25 m) and the same system temperature (60 K). The correlator integration time was set to 2 seconds. The source model used by fake consisted of a single point source with a flux density of 1 nJy (of course, completely undetectable by the interferometer). The data generated this way thus contain 6 hours of pure-noise observations made with 20 identical antennas under identical conditions. The mean of all the visibility amplitudes is 106 mJy.

We used the program difmap (Shepherd et al. 1995) for hybrid mapping. We applied the natural weighting scheme to the visibilities, for sensitivity optimization, and applied an initial selfcal using a centered point source. The hybrid mapping steps were repeated until the $\chi^2$ of the selfcal fit converged. Then, we deconvolved the resulting point source using the CLEAN algorithm to see how much flux density was generated by selfcal. This process was repeated for different numbers of antennas and for different averaging times of the selfcal solutions. The spurious point source flux densities obtained in all these cases are shown in Fig. 1 as filled circles.

Footnote: Even though a SNR of 2−[…] cutoffs of ∼[…] cutoffs may be applied with very small delay/rate windows, say, after a phase-reference pre-calibration.

When the interferometer observes only noise, the sensitivity of the baselines defines the value of the standard deviation, $\sigma$, of the Gaussian distributions of the real and imaginary parts of the visibilities. As the sensitivity increases, the thermal noise of the baselines decreases, and so does $\sigma$. Given that selfcal deals only with phases and leaves the amplitudes unaltered, the recoverable flux density of the spurious source depends linearly on the rms of the amplitudes, which in turn depends linearly on $\sigma$. Thus, an increase in the sensitivity of the interferometer decreases the amount of spurious flux density recoverable from the data using selfcal. The constant factor in the ratio between the flux density recovered and the rms of the visibility amplitudes depends on the method used to estimate the flux density (i.e., different deconvolution algorithms or model fitting to the visibilities). In our case, from the numerical study described in the previous section, we determine it to be 0.[…] ± […].

[…] effect on the spurious source flux density. If we have one observation every $t_0$ seconds (usually, $t_0$ is 2 seconds) and find only one solution of selfcal every $t_1$ seconds, with $t_1 > t_0$, then the spurious source flux density decreases. Finding a single solution of selfcal every $t_1$ seconds is equivalent to averaging all the observations in bins of $t_1$ seconds and, afterwards, self-calibrating the resulting visibilities. When we average the visibilities in blocks of $t_1$ seconds, we are averaging separately the real and imaginary parts of the visibilities, which follow Gaussian distributions. The effect of this average is that the standard deviations of the resulting distributions decrease by a factor $\sqrt{t_1/t_0}$, because the mean of $t_1/t_0$ independent Gaussian samples of width $\sigma$ has a width $\sigma/\sqrt{t_1/t_0}$.

The dependence of the spurious source flux density on the number of antennas is more difficult to find. There are many possible combinations of phases and baselines that help selfcal to generate a spurious source, and each of these combinations has a different weight in the final spurious flux density. We can use a simplified phenomenological model to find the recovered flux density as a function of the number of antennas. In principle, the recovered flux density depends directly on how well the point model fits the data. A good indicator of the adjustability of the data, with the phases randomly distributed, is, for each scan, the number of visibility phases divided by the number of phase gains to fit, that is, the number of phases per parameter. When the number of phases per parameter increases, a single parameter must account for the minimization of more phases, and the quality of the fit decreases. In our case, the number of parameters is equal to $N - 1$, given that one antenna (the reference antenna) has a null phase gain by definition. Thus, the number of phases per gain to be fitted is

$$\frac{\mathrm{phases}}{\mathrm{gains}} = \frac{N(N-1)/2}{N-1} = \frac{N}{2}. \tag{6}$$

Fig. 1. Flux densities of a spurious point source, $F_{sp}$, in units of amplitude rms, $\rho$, recovered from hybrid mapping using pure-noise synthetic visibilities, as a function of the number of antennas ($N$) and the averaging time of the selfcal solutions ($t_1$). The lines correspond to our model (Eq. 7) and the dots to the numerical simulations. (Curves are labelled t = 2 s, t = 6 s, and t = 30 s.)

[…] spurious source flux density is fitted very well using the simple model $F_{sp} \propto (N/2)^{\gamma}$, with $\gamma = -\text{[…]} \pm \text{[…]}$. […]

$$F_{sp} = \text{[…]}\,\rho\,\sqrt{\frac{t_0}{t_1}}\left(\frac{N}{2}\right)^{-\text{[…]}}, \tag{7}$$

where $F_{sp}$ is the spurious source flux density that can be generated by selfcal, $\rho$ is the root-mean-square (rms) of the visibility amplitudes, $t_0$ is the averaging time used in the correlator (typically, $t_0 = 2$ s), $t_1$ is the averaging time of the selfcal solutions, and $N$ is the number of antennas of the interferometer. We note that the duration of the whole set of observations does not affect $F_{sp}$, since this flux density depends on the ratio between the number of phases close to zero and the number of phases distant from zero, but not on the total number of visibilities used in the Fourier inversion. This model is shown in Fig. 1.

Equation 7 gives an estimate of the contribution of the artifacts of selfcal to the flux density of a source obtained by calibrating the antennas with the hybrid mapping technique. For high-SNR data, this contribution to the total flux density of the sources is negligibly small. However, when the flux density of a source is comparable to the rms of the visibility amplitudes, care must be exercised with the use of selfcal or the Global Fringe Fitting algorithm.

We note that in the worst situation for the use of selfcal (i.e., 3 antennas and $t_1 = t_0$) the amount of spurious source flux density is as large as 76% of the rms of the visibility amplitudes. For 10 antennas (the case of the VLBA) the recoverable flux density decreases to 46% of the rms (and can be lower if we set $t_1 > t_0$). For interferometers with a large number of antennas, the amount of spurious source flux density is, of course, smaller. For example, if we extrapolate the results shown in Fig. 1 to 50 antennas (the case of ALMA), the spurious source flux density generated by selfcal would be 24% of the rms of the visibility amplitudes, using $t_1 = t_0$.

All these results assume the same sensitivity for all the baselines. In real cases, each baseline has its own sensitivity, with the longest baselines noisier than the shortest ones. The use of data from all the baselines in the fit can worsen the situation. A good alternative for avoiding the spurious source generated by selfcal or, at least, for making its flux density smaller is to flag or downweight the longest baselines in the fit and/or to increase the statistical weight of the data coming from the most sensitive antennas of the array. Nevertheless, even doing so, the statistical fluctuations of the closure phases will always tend to produce, after the use of selfcal, a spurious source with a considerably large flux density.

A better way to calibrate faint-source data is the phase-reference technique (e.g., Beasley & Conway 1995). With this technique, scans of a strong (reference) source are introduced between the scans of the faint (target) source. The antenna gains are determined from the observations of the strong source and then interpolated to the scans of the faint (target) source. This technique is rather insensitive to the artifacts of selfcal, and the probability of generating a fake signal from noise is practically zero. This is so because the calibration of the target source comes from the analysis of data coming from another source (the reference source). Therefore, the noise in the data of the faint source does not affect the antenna calibrations. However, it is common to use the antenna gains determined from the phase reference as an a priori calibration, then performing a Global Fringe Fitting on the target source data using small search windows (based on the calibration from the reference source data) or applying self-calibration to the target visibilities in order to improve the dynamic range of the final image. In some cases, this might be malpractice, because the probability of generating a spurious source flux density from noise reappears in full strength, wasting all the benefits of the phase referencing.
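The scaling of this section can be explored with a short calculation. The sketch below assumes the functional form $F_{sp}/\rho = c\,\sqrt{t_0/t_1}\,(N/2)^{\gamma}$. The constants `c` and `gamma` are not taken from the paper's fit; they are calibrated here from the two fractions quoted in the text for $t_1 = t_0$ (76% for 3 antennas, 46% for 10 antennas), so they are an assumption of this example. The function names are hypothetical.

```python
import math
import numpy as np

def fit_power_law(n_a, f_a, n_b, f_b):
    """Solve f = c * (N/2)**gamma from two (N, f) points; returns (c, gamma)."""
    gamma = math.log(f_b / f_a) / math.log(n_b / n_a)
    c = f_a / (n_a / 2.0) ** gamma
    return c, gamma

def spurious_fraction(n_ant, t0, t1, c, gamma):
    """F_sp / rho for n_ant antennas, correlator time t0, and selfcal time t1."""
    return c * math.sqrt(t0 / t1) * (n_ant / 2.0) ** gamma

# Calibrate on the quoted 3-antenna and 10-antenna fractions (t1 = t0).
c, gamma = fit_power_law(3, 0.76, 10, 0.46)

# Extrapolation to 50 antennas with t1 = t0, and 10 antennas with t1 = 30 s.
frac_50 = spurious_fraction(50, 2.0, 2.0, c, gamma)
frac_10_avg = spurious_fraction(10, 2.0, 30.0, c, gamma)
print(frac_50, frac_10_avg)

# Averaging n = t1/t0 Gaussian samples reduces their width by sqrt(n).
rng = np.random.default_rng(0)
n = 15                                   # t1 = 30 s bins of t0 = 2 s samples
binned = rng.normal(0.0, 1.0, (20_000, n)).mean(axis=1)
width_ratio = binned.std() * math.sqrt(n)
print(width_ratio)                       # should be close to 1
```

With these calibrated constants, the 50-antenna extrapolation is consistent with the 24% quoted in the text, which suggests that the three quoted fractions follow the same power law in $N/2$.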

6. Tests of the reliability of a source detection from noisy data

In this section we present two simple tests that can be performed on real data in order to check the reliability of a possible source detection, or to check if part of the flux density of a detected source may come from artifacts of selfcal. These tests are only meaningful when they are applied to extremely noisy data. However, the application of these tests to high-SNR data can still give us an idea of the possible contribution of the artifacts of selfcal to the flux density recovered from a source.

We assume, in all our discussion, that the detected source is compact enough to be considered point-like, without any important loss of precision. As in the previous sections, to simplify the expressions we also assume that all the baselines of the interferometer have the same sensitivity.

The dependence of the spurious source flux density on the averaging time of the selfcal solutions is determined by the dependence of the standard deviation $\sigma$ on the averaging time. That dependence translates into Eq. 7. However, the flux density of a real source in the data is independent of the averaging time of the selfcal solutions. We can use this condition to estimate the flux density of a (possibly real) source detected under critical circumstances. If we apply phase self-calibration to a real data set for different averaging times, $t_1$, of the selfcal solutions, the flux densities recovered after each self-calibration, $F_{self}$, are given by

$$F_{self} = F_{sp} + F_{real} = \frac{K}{\sqrt{t_1}} + F_{real}, \tag{8}$$

where $K$ (related to the specifics of the interferometer) and $F_{real}$ (an estimate of the flux density of the real source) are parameters to be fitted.

In the case that the signal of a real source is included in the visibilities, the probability distributions of the real and imaginary parts are still Gaussian, but the mean value of the real part of the visibilities (if such visibilities are well calibrated) will be equal to the flux density of the real source. Then, it can be easily shown that the probability distribution of the resulting visibility amplitudes and phases is

$$g(A, \phi) = \frac{A}{\sigma^{2}} \exp\left(-\frac{A^{2} + F^{2}}{2\sigma^{2}}\right) \exp\left(\frac{A F \cos\phi}{\sigma^{2}}\right), \tag{9}$$

where $g(A, \phi)$ is the probability density of amplitudes (variable $A$) and phases (variable $\phi$), $F$ is the flux density of the real source, and $\sigma$ is the width of the distributions of the real and imaginary parts of the visibilities.

Equation 9, written up to the constant factor $1/(2\pi)$ that normalizes it over $\phi$, turns into Eq. 4 for $F = 0$. When $F$ is different from zero, the distribution of phases is not uniform. As $F$ increases, the phases gather around zero in a Gaussian-like manner. The distribution of amplitudes also changes, increasing the ratio between the rms of the visibility amplitudes and $\sigma$.

How could the information provided by Eq. 9 be used to check the reliability of a possible source detection? The closure phases are robust quantities that can be used for this purpose. The closure phases are sensitive to $F$ and are not affected by the phase self-calibration. The distribution of closure phases tends to gather around zero if there is signal of a real source in the data (and particularly so if the source has no structure) and is uniform if there is only noise in the data. From the definition of closure phase (see Eq. 2) we conclude that, if the visibility phases are well calibrated, the probability distribution of closure phases is equal to

$$c(\beta) = \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} p(\phi_1)\, p(\phi_2)\, p(\beta - \phi_1 - \phi_2)\, d\phi_1\, d\phi_2, \tag{10}$$

where $c(\beta)$ is the probability density of the closure phase, $\beta$, and

$$p(\phi) = \int_{0}^{\infty} g(A, \phi)\, dA. \tag{11}$$

Let us elaborate on this: for the case of a perfect calibration, the phases of data from a given baseline $(i, j)$ are independent of the phases from any other baseline, given that all the contributions to $\phi_{ij}$ come from noise, which is uncorrelated between the different baselines. Thus, the probability distribution function of any linear combination of visibility phases (as is the case of the closure phase) is obtained from the product of the probability distributions of the visibility phases. However, for the case of a non-perfect calibration, in which selfcal has introduced a spurious source flux density in the data, the distributions of the phases from the different baselines will no longer be independent and Eq. 10 will not quite apply. In such cases, the phase distributions, $p(\phi)$, will be more peaked around zero, but correlations will appear among the phases of the different baselines. Therefore, the probability distribution function of a linear combination of phases cannot be obtained from a simple product of $p(\phi)$. Nonetheless, the correlations between baselines that selfcal introduces in the data keep the distribution of closure phases, $c(\beta)$, unaltered. That is, even after a selfcal iteration has generated a spurious source in the map, the distribution of closure phases is still equal to $c(\beta)$, as computed from the distribution of the phases, $p(\phi)$, corresponding to perfectly calibrated data.

This invariance property of the closure phase distribution, $c(\beta)$, can be used to check the reliability of a possible source detection. In Fig. 2, the theoretical flux density of a real source (in units of the rms of the amplitudes, $\rho$) is shown as a function of the mean absolute value of the closure phases and as a function of the mean cosine of the closure phases. Both quantities are directly related to the deviation of the closure phase distribution from a uniform distribution. If the closure phases are uniformly distributed between $-\pi$ and $\pi$, the average of their absolute values will be equal to $\pi/2$ […]

At this stage, it must be said that, even though the self-calibration does not affect the closure phases, a time average of a set of self-calibrated visibilities will change the closure phase distribution. That is, if selfcal generates a spurious source from noise and the visibilities are averaged in time bins of $t_1$ seconds, with $t_1 > t_0$, the closure phase distribution changes and the resulting closure phases concentrate around zero, creating the effect of a source that is completely indistinguishable from a real one using any test.

For very small values of the flux density $F$, this test is not as good as the first one. As we can see in Fig. 2, the closure phases do not dramatically change their distribution for flux densities in the range between 0 and ∼15% of the rms of the visibility amplitudes. For tentative detections under critical circumstances in that range, this test could lead us to the wrong conclusion about the reliability of a source detection. Looking at it from a different viewpoint, Fig. 2 provides an interesting lesson: for a given dataset, there can exist a real faint source (appearing in a map with a dynamic range of 6 or more) even if the distribution of closure phases is uniform, that is, even if the closure phase distribution is noise-like. Thus, a conclusion on the reliability of a source detection based only on the closure phase distribution being extremely noisy is not definitive.

Another thing worth noticing is that this test assumes the same sensitivity for all the antennas and a source compact enough to generate closure phases close to zero even for the longest baselines (which may not be the case, especially for sources with a very low flux density per unit beam). These assumptions limit the use of this test. However, it could still be applied by restricting its use to the shortest baselines with similar antennas. For an array with a large antenna, […]

Fig. 2. Theoretical dependence of the flux density of a real observed source (in units of the rms of the visibility amplitudes), as a function of "$\pi/2$ […] (Axis labels in the figure: $\pi/2 - \langle |C| \rangle$, $\langle \cos(C) \rangle$, and $F/\rho$.)

[…] different rms in different baselines.

We must also note that in cases of high SNR, the rms of the visibility amplitudes is no longer related only to the thermal noise of the baselines (the flux density of the source affects the value of the rms), and the fraction $F/\rho$ shown in Fig. 2 should be corrected accordingly. For cases of high SNR, the quantity that replaces $\rho$ in the fraction $F/\rho$ shown in Fig. 2 is $\sqrt{\rho^{2} - F^{2}}$.

Real data do not obey the simplifying assumptions that we have used in the earlier sections. The baselines of a real interferometer have different sensitivities, which also vary in time. Thus, in order to check the reliability of a source detection from real data we must search for a subset of observations in which the sensitivity of the antennas is approximately constant. Moreover, we must only work with the subset of most sensitive antennas of the interferometer. If there is one antenna in our interferometer that is clearly more sensitive than the others, we should compute only the average of the closure phases in which this more sensitive antenna appears, in order to ensure the possible signature of the source in the closure phase distribution. In what follows, we apply our reliability criteria to real data corresponding to the radio supernova SN 2004et (Martí-Vidal et al. 2007).

We observed this supernova on 20 February 2005. From all data reported in Martí-Vidal et al. (2007), we have chosen the following subset of antennas: Brewster, Fort Davis, Green Bank, Hancock, Kitt Peak, and Owens Valley. We have only computed the closure phases in which the antenna Green Bank appears, and we have used data only from 14 hr to 20 hr (UT).
These choices are based on the quality of the data for our purposes (i.e., the stability of the antenna sensitivities, which we assume to be proportional to the system temperatures registered for each station).

First test: we self-calibrated the SN 2004et data using different averaging times, ranging from 2 to 120 seconds (roughly, the duration of one scan). The fit of the flux densities recovered from the SN 2004et data as a function of the averaging time of the selfcal solutions, equation 8, results in a value of F_real = . ± . 13 mJy. This value is clearly higher than zero, indicating that there is a real signal in the data. This value is also close to the flux density of SN 2004et reported by Martí-Vidal et al., 0 . ± . 03 mJy, recovered from phase-referenced data with a deconvolution using CLEAN.

Second test: even though we know that it is not quite appropriate, we have also performed this test on the SN 2004et data. The flux density of SN 2004et is too low compared to the rms of the visibility amplitudes (∼20 mJy) to obtain a good result with the test of the closure phase distribution. The average value of the cosines of all the closure phases considered in these observations is equal to 0 . ± . . ± . . ± . 08 times the rms of the visibility amplitudes, ρ, used in our computations, which, as said above, is ∼20 mJy. Hence, the estimated flux density of the (real) source in the data is 3 . ± . [...] +614 to the scans of the supernova. These authors did not subsequently refine this calibration by applying Global Fringe Fitting to the supernova data. This procedure assured a reliable detection of the supernova. These authors did not apply any other calibration (selfcal) to the phase-referenced supernova data, in order to avoid any possible artifact introduced by the use of selfcal.
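
The behaviour of the closure phase test described above can be illustrated with a small numerical sketch. It is not the code used in this work. It assumes a point source at the phase center, a single triangle of antennas with equal sensitivity on all three baselines, and Gaussian noise; the function names (simulate_closure_phases, closure_statistics, noise_rms_from_amplitudes) are hypothetical. It computes the two statistics plotted in Fig. 2, <cos(C)> and π/2 − <|C|>, and applies the high-SNR correction √(ρ² − F²) to recover the noise-only rms from the rms of the visibility amplitudes.

```python
import numpy as np


def simulate_closure_phases(flux, noise_rms, n_samples, seed=0):
    """Simulate closure phases on one antenna triangle.

    Assumptions (illustrative only): point source of flux density `flux`
    at the phase center, and complex Gaussian noise whose amplitude has
    rms `noise_rms` on each of the three baselines.
    """
    rng = np.random.default_rng(seed)
    sigma = noise_rms / np.sqrt(2.0)  # rms per real/imaginary component
    shape = (3, n_samples)
    noise = rng.normal(0.0, sigma, shape) + 1j * rng.normal(0.0, sigma, shape)
    vis = flux + noise
    # Closure phase: phi_12 + phi_23 - phi_13 (phase of the bispectrum).
    closure = np.angle(vis[0] * vis[1] * np.conj(vis[2]))
    return closure, vis


def closure_statistics(closure):
    """Return (<cos(C)>, pi/2 - <|C|>); both vanish for uniform phases."""
    mean_cos = float(np.mean(np.cos(closure)))
    half_pi_minus_abs = float(np.pi / 2 - np.mean(np.abs(closure)))
    return mean_cos, half_pi_minus_abs


def noise_rms_from_amplitudes(vis, flux):
    """High-SNR correction: sqrt(rho^2 - F^2), rho = rms of |visibility|."""
    rho = np.sqrt(np.mean(np.abs(vis) ** 2))
    return float(np.sqrt(rho ** 2 - flux ** 2))


if __name__ == "__main__":
    for flux in (0.0, 0.2, 1.0):
        closure, vis = simulate_closure_phases(flux, 1.0, 200_000)
        mean_cos, stat = closure_statistics(closure)
        print(flux, mean_cos, stat, noise_rms_from_amplitudes(vis, flux))
```

In this sketch both statistics are compatible with zero for pure noise and grow with F/ρ. For faint sources they stay close to zero, which is why a noise-like closure phase distribution does not by itself rule out a real source.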

7. Conclusions

We have analyzed the consequences of the phase self-calibration algorithm when it is applied to extremely noisy data. We have studied how this algorithm and the statistical fluctuations of the visibility phases can create a spurious source from pure noise. The flux density of the spurious source can be a considerable fraction of the rms of the visibility amplitudes. The application of other antenna-based calibration algorithms (like the Global Fringe Fitting) to noisy data can have consequences similar to those of selfcal if the SNR cutoff of the gain solutions is set to small values.

We have considered numerical and analytic studies to show how the flux density of a spurious source created by selfcal depends on the number of antennas, the sensitivity of the array, and the averaging time of the selfcal solutions. We have also presented two simple tests that can be applied to real data in order to check whether the detection of a faint source could be the result of applying an antenna-based calibration algorithm to noisy data. These tests basically relate the averaging time of the selfcal solutions and the characteristics of the closure phase distribution to the flux density of a compact source possibly present in the data. To show a practical case, we have applied these tests to a set of real VLBI observations of supernova SN 2004et and found good agreement between the flux density recovered by CLEAN from the (phase-referenced) visibilities of this supernova (Martí-Vidal et al. 2007) and the flux density estimate provided by our reliability tests.

Acknowledgements.