Spurious source generation in mapping from noisy phase-self-calibrated data
Astronomy & Astrophysics manuscript no. 2007-8690 © ESO 2018
October 28, 2018
I. Martí-Vidal and J. M. Marcaide
Dpt. Astronomia i Astrofísica, Universitat de València, C/ Dr. Moliner 50, 46100 Burjassot (Valencia), Spain
e-mail:
Accepted on 12 December 2007
ABSTRACT
Phase self-calibration (or selfcal) is an algorithm often used in the calibration of interferometric observations in astronomy. Although a powerful tool, this algorithm presents strong limitations when applied to data with a low signal-to-noise ratio. We analyze the artifacts that the phase selfcal algorithm produces when applied to extremely noisy data. We show how the phase selfcal may generate a spurious source in the sky from a distribution of completely random visibilities. This spurious source is indistinguishable from a real one. We numerically and analytically compute the relationship between the maximum spurious flux density generated by selfcal from noise and the particulars of the interferometric observations. Finally, we present two simple tests that can be applied to interferometric data for checking whether a source detection is real or whether the source is an artifact of the phase self-calibration algorithm.
Key words.
Techniques: interferometric – Methods: data analysis – Techniques: image processing
1. Introduction
Phase self-calibration (or selfcal) is an algorithm often used in the calibration of radio astronomical data. It was introduced by Readhead & Wilkinson (1978) and Cotton (1979), and it has been essential for the success of Very Long Baseline Interferometry (VLBI) imaging. Also, the antenna-based calibrations obtained from the Global Fringe Fitting algorithm (Schwab & Cotton 1983) are equivalent to a phase self-calibration. The phase selfcal will also be widely used with future interferometric instruments, such as the Atacama Large Millimeter Array (ALMA) or the Square Kilometre Array (SKA), now under construction or planned. Optical interferometric observations (like those of the Very Large Telescope Interferometer, VLTI) will also eventually benefit from some form of selfcal, although closure phases and amplitudes are measured in optical interferometry in a very different way than in radio. Thus, the statistical analysis presented here may need substantial changes to rigorously describe the probability of false detections by optical interferometers.

Given that part of the interferometric observations obtained from all those instruments may come from very faint sources, it is important to take into account the undesired and uncontrollable effects that the instrumentation and/or the calibration and analysis algorithms applied to the data could introduce in the interferometric observations. A deep study of all our analysis tools and their effects on noisy data is essential for discerning the reliability of detections of very faint sources. Some discoveries made by pushing the interferometric instruments to their sensitivity limits could turn out to be the result of artifacts produced by the analysis tools.

The main limitations of the phase self-calibration algorithm have been analyzed in many publications (e.g., Linfield 1986, Wilkinson et al. 1988). It is well known that an unwise use of selfcal can lead to imperfect images, even to the generation of spurious source components, elimination of real components, and deformation of the structure of extended sources. In this paper, we focus on the effects that phase self-calibration produces when applied to pure noise. We show that selfcal can generate a spurious source from pure noise, with a relatively high flux density compared to the rms of the visibility amplitudes. We analytically and numerically study how the recoverable flux density of such a spurious source depends on the details of the observations (the sensitivity of the interferometer, the number of antennas, and the averaging time of the selfcal solutions). Finally, we study the effects of phase self-calibration applied to the visibilities resulting from observations of real faint sources, instead of pure noise. We present two simple tests that can be applied to real data in order to check whether a given faint source is real or not, and apply these tests to real data, corresponding to VLBI observations of the radio supernova SN 2004et (Martí-Vidal et al. 2007).

Send offprint requests to: I. Martí-Vidal
2. Basics of phase self-calibration
The basics of phase self-calibration can be found in many publications (e.g., Readhead & Wilkinson 1978, Schwab 1980). Here, we explain the essence of this algorithm in a few lines. Let us suppose that we have made an interferometric observation using a set of $N$ antennas. We obtain one visibility, $V_{ij}$, for each baseline, that is, for each pair of antennas $(i, j)$. Let us call $\phi_{ij}$ the phase of the visibility $V_{ij}$. Any atmospheric or instrumental effect on the optical path of the signals arriving at antennas $i$ and $j$ affects the phase $\phi_{ij}$ in the following way:

$\phi_{ij} = \phi^{\rm str}_{ij} + \phi^{\rm atm}_{i} - \phi^{\rm atm}_{j}$, (1)

where $\phi^{\rm atm}_{l}$ represents all the undesired (i.e., atmospheric and instrumental) contributions to the optical path of the signal received by the antenna $l$, and $\phi^{\rm str}_{ij}$ is the contribution to the phase that comes purely from the structure of the observed source, that is, the so-called source structure phase. It can be easily shown that the quantities known as closure phases, defined as (Jennison 1958, Rogers et al. 1974)

$C_{ijk} = \phi_{ij} + \phi_{jk} - \phi_{ik}$ (2)

are independent of $\phi^{\rm atm}_{l}$: each antenna term of Eq. 1 enters the sum once with each sign and cancels. That is, the closure phases $C_{ijk}$ are only defined by the structure of the observed source. Thus, they are independent of any atmospheric or instrumental contribution that may affect the signals received by the antennas of the interferometer. The phase self-calibration algorithm takes advantage of the closure phases to estimate the undesired antenna-dependent contributions $\phi^{\rm atm}_{l}$. In short, the phase selfcal finds which set of antenna-dependent quantities $\phi^{\rm gain}_{l}$ (called phase gains) generates the set of phases

$\phi^{\rm self}_{ij} = \phi_{ij} - \phi^{\rm gain}_{i} + \phi^{\rm gain}_{j}$, (3)

where $\phi^{\rm self}_{ij}$ are the phases that best represent the source structure given by the closure phases $C_{ijk}$. In the ideal case, $\phi^{\rm gain}_{l} = \phi^{\rm atm}_{l}$ and $\phi^{\rm self}_{ij} = \phi^{\rm str}_{ij}$.
The process from which the values $\phi^{\rm gain}_{l}$ (for $l = i, j$) are obtained is called hybrid mapping, and its explanation can be found in many publications (e.g., Cornwell & Wilkinson 1981). Here, suffice it to say that the phase gains of the antennas are obtained from a least-squares fit of the raw visibilities to the source model obtained from the mapping.

Hybrid mapping is an iterative process in which the structure of the source model is refined step by step. Often, the model used in the first iteration of hybrid mapping is a point source located at the center of the map (obviously, the flux density of this point source will not affect the phase calibration). The successive steps of hybrid mapping and selfcal correct this point source model until the structure that best represents all the closure phases is obtained.
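The cancellation of the antenna terms in the closure phase can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes four antennas, randomly drawn structure phases and antenna errors, and the combination $\phi_{ij} + \phi_{jk} - \phi_{ik}$, for which the antenna terms of Eq. 1 cancel. The helper names `wrap` and `closure_phase` are hypothetical.

```python
import math
import random

def wrap(phase):
    """Wrap a phase into the interval [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))

def closure_phase(phi, i, j, k):
    """Closure phase of the triangle (i, j, k): phi_ij + phi_jk - phi_ik."""
    return wrap(phi[(i, j)] + phi[(j, k)] - phi[(i, k)])

random.seed(1)
n_ant = 4  # assumed number of antennas for this illustration
pairs = [(i, j) for i in range(n_ant) for j in range(i + 1, n_ant)]

# Hypothetical source-structure phases (one per baseline) and antenna errors.
phi_str = {pair: random.uniform(-math.pi, math.pi) for pair in pairs}
phi_atm = [random.uniform(-math.pi, math.pi) for _ in range(n_ant)]

# Observed phases following Eq. 1: phi_ij = phi_str_ij + phi_atm_i - phi_atm_j.
phi_obs = {(i, j): phi_str[(i, j)] + phi_atm[i] - phi_atm[j] for (i, j) in pairs}

triangles = [(i, j, k) for i in range(n_ant)
             for j in range(i + 1, n_ant) for k in range(j + 1, n_ant)]

# Largest difference between observed and structure-only closure phases.
max_diff = max(abs(wrap(closure_phase(phi_obs, *t) - closure_phase(phi_str, *t)))
               for t in triangles)
print(max_diff)
```

The printed difference is at the level of floating-point rounding, whatever antenna errors are drawn, which is the property that selfcal relies on.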
3. Probability distribution of the visibilities due to pure noise
When an interferometer observes a given source with a flux density well below the sensitivity limit of its baselines (that is, when the interferometric data contain only noise), both the real and imaginary parts of the resulting visibilities follow Gaussian distributions, centered at the origin. The amplitudes and phases of the visibilities follow distributions different from Gaussian. It can be shown that, for each baseline, the probability distribution of the phases is uniform between $-\pi$ and $\pi$, whereas the distribution of the amplitudes is given by:

$g(A) = \frac{A}{\sigma_{ij}^2} \exp\left(-\frac{A^2}{2\sigma_{ij}^2}\right)$ (4)

where $g(A)$ is the probability density of the amplitudes and $\sigma_{ij}$ is the width of the Gaussian distributions of the real and imaginary parts of the visibilities of the baseline $(i, j)$. The width $\sigma_{ij}$ is related to the thermal noise of the baseline $(i, j)$. We assume, for simplicity, that all the baselines of the interferometer have the same value of $\sigma$. It can be shown that the rms of the visibility amplitudes of a pure noise signal is $\rho = \sqrt{2}\,\sigma$ and that the mean amplitude, $\langle A \rangle$, is $\sqrt{\pi/2}\,\sigma$, different from zero.
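These statistics are easy to verify with a short Monte Carlo experiment. The sketch below is only an illustration under assumed values ($\sigma = 1$ in arbitrary units, $2 \times 10^5$ simulated visibilities, a fixed random seed); it draws Gaussian real and imaginary parts and measures the rms and mean of the amplitudes and the fraction of phases in $(-\pi/2, \pi/2)$.

```python
import math
import random

random.seed(42)
sigma = 1.0      # assumed width of the real and imaginary parts (arbitrary units)
n_vis = 200_000  # assumed number of simulated pure-noise visibilities

amps, phases = [], []
for _ in range(n_vis):
    re = random.gauss(0.0, sigma)
    im = random.gauss(0.0, sigma)
    amps.append(math.hypot(re, im))
    phases.append(math.atan2(im, re))

# rms and mean of the amplitudes, to compare with sqrt(2)*sigma and sqrt(pi/2)*sigma.
rms_amp = math.sqrt(sum(a * a for a in amps) / n_vis)
mean_amp = sum(amps) / n_vis

# Fraction of phases in (-pi/2, pi/2); a uniform phase distribution gives 1/2.
frac_close = sum(1 for p in phases if abs(p) < math.pi / 2) / n_vis

print(rms_amp, math.sqrt(2.0) * sigma)
print(mean_amp, math.sqrt(math.pi / 2.0) * sigma)
print(frac_close)
```

The non-zero mean amplitude is what allows a phase-only calibration to build up flux density: if the phases are pushed towards zero, the amplitudes no longer average out.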
4. Probability of generating a spurious source from pure noise
Given that the real part of a visibility with phase in the range $(-\pi/2, \pi/2)$ is positive, all the visibilities with phases in that range bring a positive mean flux density to the map. We call a phase in the range $(-\pi/2, \pi/2)$ a phase close to zero, and a phase outside that range a phase distant from zero.

The distribution of closure phases is uniform between $-\pi$ and $\pi$, as is also the case for the distribution of phases. This means that there is a subset of closure phases that by chance are close to zero, being compatible with a point source. However, there are also closure phases distant from zero, which are totally incompatible with a compact source. If self-calibration is not applied to the data, then the uniform distribution of phases (and closure phases) will result in a noisy map with no source defined in it. But if a single iteration of phase self-calibration is applied, there is a selection process of the closure phases in the calibration, which may generate a spurious source with a flux density comparable to the rms of the amplitudes (which can be much higher than the rms of the image), as we show below.

For each scan, the effects of the least-squares fit described in Sect. 2 can be understood in the following way: selfcal searches for the visibilities corresponding to the antennas most commonly appearing in the closure phases that are close to zero (which usually correspond to phases that can be modelled with a point source). Then, selfcal minimizes the phases of such visibilities by calibrating those antennas, leaving all the other visibilities with the phases dispersed between $-\pi$ and $\pi$. In other words, the phases of the visibilities with large closure phases contribute to increase the value of the $\chi^2$ at the minimum, but the position of such minimum only depends on the visibilities with phases that can be modelled with a point source. That is, all the visibilities capable of producing a compact source are calibrated and their phases concentrate around zero. All the other visibilities tend to have their phases uniformly distributed between $-\pi$ and $\pi$, thus generating, after the Fourier inversion, a null mean flux density in the map.
Thus, selfcal always produces a positive bias in the mean flux density of the map, given that selfcal only acts, effectively, on the phases that can be brought close to zero, because their corresponding closure phases are close to zero.

One might think that it is very difficult for an antenna to be involved in a large number of closure phases close to zero, given that the distribution of phases is uniform. One might think that, on average, a given antenna is involved in as many closure phases close to zero as closure phases distant from zero. In such a case, it would be impossible for selfcal to select which antenna should be calibrated, given that all the antennas participating in each scan would have the same chances of being calibrated. But this is only true on average. In the distributions of all the interferometric observations, there are statistical fluctuations, which are always used by selfcal for the generation of a spurious point source. For a given antenna $i$, the probability of finding $n$ closure phases (in which that antenna is involved) close to zero is:

$P_i(n) = \frac{1}{2^{N'}} \frac{N'!}{n!\,(N'-n)!}$ (5)

where $N' = (N-1)(N-2)/2$ is the number of closure phases in which antenna $i$ is involved. (Notice that even though not all closure phases are independent, the closure phases with one common antenna are.) Thus, there is a finite probability of finding an antenna that appears in more than $N'/2$ […] the antenna $i$ appears will be minimized with success, generating a positive mean flux density in the map.
Actually, even the cases in which $n < N'/2$ […] $i$ appears will tend to gather around $-\pi$ and $\pi$, meaning that there are other antennas that will appear in a large number of closure phases close to zero (i.e., the antennas belonging to the closure phases distant from zero in which antenna $i$ appears).

In short, there are always statistical fluctuations in the pure noise distribution of phases that can be used by selfcal to produce (by a selection process of the antennas most commonly involved in the closure phases close to zero) a point source with a spurious flux density.

It must be said that Global Fringe Fitting, when applied to noisy data, can also generate a spurious source from pure noise, given that this algorithm also finds antenna-based calibrations for adjusting the interferometric fringes of all the baselines in each scan. The spurious source is generated as long as the minimum SNR of the fringes to be considered in the fit is set to a small value (lower than 2 or 3). In those cases, the Fringe Fitting would work on correlation peaks (fringes) produced, in many cases, by spurious noise fluctuations. Then, for the same reasons given above, there would be a relatively high probability of generating a spurious point source from pure noise.
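The argument of this section can be illustrated with a small numerical experiment. The sketch below is an assumption-laden toy, not the procedure used with difmap in Sect. 5: visibilities have unit amplitude and uniformly random phases, the array has 5 antennas, and the phase gains are found by a simple coordinate-ascent solver that maximizes $\sum \cos(\phi_{ij} - \phi^{\rm gain}_i + \phi^{\rm gain}_j)$ for a centered point-source model. The function names are hypothetical. It also evaluates Eq. 5 for the same array.

```python
import cmath
import math
import random

def prob_close(n, n_ant):
    """Eq. 5: probability that n of the N' closure phases of one antenna are close to zero."""
    n_prime = (n_ant - 1) * (n_ant - 2) // 2
    return math.comb(n_prime, n) / 2 ** n_prime

def phase_selfcal(vis, n_ant, n_sweeps=50):
    """Fit unit-modulus antenna gains to a centered point source (coordinate ascent)."""
    gains = [complex(1.0, 0.0)] * n_ant
    for _ in range(n_sweeps):
        for i in range(n_ant):
            # z = sum_j V_ij g_j, using V_ji = conj(V_ij) for the stored baselines.
            z = 0j
            for j in range(n_ant):
                if j == i:
                    continue
                v_ij = vis[(i, j)] if i < j else vis[(j, i)].conjugate()
                z += v_ij * gains[j]
            if abs(z) > 0.0:
                gains[i] = z / abs(z)  # best phase gain for antenna i, others fixed
    return gains

def mean_cosine(n_ant, n_scans, rng):
    """Mean cosine of the visibility phases before and after selfcal, on pure noise."""
    before = after = 0.0
    for _ in range(n_scans):
        vis = {(i, j): cmath.exp(1j * rng.uniform(-math.pi, math.pi))
               for i in range(n_ant) for j in range(i + 1, n_ant)}
        gains = phase_selfcal(vis, n_ant)
        for (i, j), v in vis.items():
            before += v.real
            # Eq. 3: corrected phase is phi_ij - gain_i + gain_j.
            after += (v * gains[i].conjugate() * gains[j]).real
    total = n_scans * n_ant * (n_ant - 1) // 2
    return before / total, after / total

rng = random.Random(0)
before, after = mean_cosine(n_ant=5, n_scans=300, rng=rng)
p_majority = sum(prob_close(n, 5) for n in range(4, 7))  # n > N'/2 with N' = 6
print(before, after, p_majority)
```

Before calibration the mean cosine is compatible with zero; after calibration it is clearly positive, although the input contains no source at all. Gains are solved independently for each scan, which corresponds to the shortest possible averaging time of the solutions.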
5. Dependence of the flux density of the spurious source on the characteristics of the observations
In this section we consider how the flux density of the spurious source generated by selfcal depends on the parameters defining a set of observations. The parameters that we consider are the sensitivity of the array (for simplicity, we assume the same sensitivity for all the baselines of the interferometric array), the number of antennas of the interferometer, and the averaging time of the selfcal solutions. For the case of the Global Fringe Fitting algorithm, the averaging time of the solutions is equal to the duration of the scans.
We generated a set of synthetic interferometric data with the program fake of the Caltech Package (Pearson 1991). We generated 6 hours of observations using a set of 20 antennas. All these antennas had the same diameters (25 m) and the same system temperatures (60 K). The correlator integration time was set to 2 seconds. The source model used by fake consisted of a single point source with a flux density of 1 nJy (of course, completely undetectable by the interferometer). The data generated this way thus contain 6 hours of pure noise observations made at 20 identical antennas under identical conditions. The mean of all the visibility amplitudes is 106 mJy.

We used the program difmap (Shepherd et al. 1995) for hybrid mapping. We applied the natural weighting scheme to the visibilities, for sensitivity optimization, and applied an initial selfcal using a centered point source. The hybrid mapping steps were repeated until the $\chi^2$ of the selfcal fit converged. Then, we deconvolved the resulting point source using the CLEAN algorithm to see how much flux density was generated by selfcal. This process was repeated for different numbers of antennas and for different averaging times of the selfcal solutions. The spurious point source flux densities obtained in all these cases are shown in Fig. 1 as filled circles.

[Footnote: Even though a SNR of 2−3 […] cutoffs of ∼ […] cutoffs may be applied with very small delay/rate windows, say, after a phase-reference pre-calibration.]

When the interferometer observes only noise, the sensitivity of the baselines defines the value of the standard deviation, $\sigma$, of the Gaussian distributions of the real and imaginary parts of the visibilities. As the sensitivity increases, the thermal noise of the baselines decreases, decreasing also the value of $\sigma$.
Given that selfcal has to do only with phases and leaves the amplitudes unaltered, the recoverable flux density of the spurious source depends linearly on the rms of the amplitudes, which in turn also depends linearly on the value of $\sigma$. Thus, an increase in the sensitivity of the interferometer decreases the amount of spurious flux density recoverable from the data using selfcal. The constant factor in the ratio between the flux density recovered and the rms of the visibility amplitudes depends on the method used to estimate the flux density (i.e., different deconvolution algorithms or model fitting to the visibilities). In our case, from the numerical study described in the previous section, we determine it to be 0.[…] ± […].

[…] effect on the spurious source flux density. If we have one observation every $t_0$ seconds (usually, $t_0$ is 2 seconds) and find only one solution of selfcal every $t$ seconds, with $t > t_0$, then the spurious source flux density decreases. Finding a single solution of selfcal every $t$ seconds is equivalent to averaging all the observations in bins of $t$ seconds and, afterwards, self-calibrating the resulting visibilities. When we average the visibilities in blocks of $t$ seconds, we are averaging separately the real and imaginary parts of the visibilities, which follow Gaussian distributions. The effect of this average is that the standard deviations of the resulting distributions decrease by a factor $\sqrt{t/t_0}$, because of the Central Limit Theorem.

The dependence of the spurious source flux density on the number of antennas is more difficult to find. There are lots of possible combinations of phases and baselines that help selfcal to generate a spurious source, and each one of these combinations has a different weight in the final spurious flux density. 
We can use a simplified phenomenological model to find the recovered flux density as a function of the number of antennas. In principle, the recovered flux density depends directly on how well the point model fits the data. A good indicator of the adjustability of the data, with the phases randomly distributed, is, for each scan, the number of visibility phases divided by the number of phase gains to fit, that is, the number of phases per parameter. When the number of phases per parameter increases, one single parameter must account for the minimization of more phases, and the quality of the fit decreases. In our case, the number of parameters is equal to $N - 1$, given that one antenna (the reference antenna) has a null phase gain by definition. Thus, the number of phases per gain to be fitted is:

$\frac{\rm phases}{\rm gains} = \frac{N(N-1)/2}{N-1} = \frac{N}{2}$ (6)

[…] spurious source flux density is fitted very well using the simple model $F_{\rm sp} \propto (N/2)^{\gamma}$, with $\gamma = -$[…] ± […]. […]

$F_{\rm sp} = [\ldots]\;\rho\,\sqrt{\frac{t_0}{t}}\,\left(\frac{N}{2}\right)^{-[\ldots]}$ (7)

where $F_{\rm sp}$ is the spurious source flux density that can be generated by selfcal, $\rho$ is the root-mean-square (rms) of the visibility amplitudes, $t_0$ is the averaging time used in the correlator (typically, $t_0 = 2$ s), $t$ is the averaging time of the selfcal solutions, and $N$ is the number of antennas of the interferometer. We note that the duration of the whole set of observations does not affect $F_{\rm sp}$, since this flux density depends on the ratio between the number of phases close to zero and the number of phases distant from zero, but does not depend on the total amount of visibilities used in the Fourier inversion. This model is shown in Fig. 1.

Fig. 1. Flux densities of a spurious point source, $F_{\rm sp}$, in units of amplitude rms, $\rho$, recovered from hybrid mapping using pure noise synthetic visibilities, as a function of the number of antennas ($N$) and the averaging time of the selfcal solutions ($t$; curves for $t$ = 2 s, 6 s, and 30 s). The lines correspond to our model (Eq. 7) and the dots to the numerical simulations.

Equation 7 gives an estimate of the contribution of the artifacts of selfcal to the flux density of a source obtained by calibrating the antennas with the hybrid mapping technique. For cases of high SNR data, such contribution to the total flux density of the sources is negligibly small. However, when the flux density of a source is comparable to the rms of the visibility amplitudes, care must be exercised with the use of selfcal or the Global Fringe Fitting algorithm.

We note that in the worst situation for the use of selfcal (i.e., 3 antennas and $t = t_0$) the amount of spurious source flux density is as large as 76% of the rms of the visibility amplitudes. 
For 10 antennas (the case of the VLBA) the recoverable flux density decreases to 46% of the rms (and can be lower if we set $t > t_0$). For interferometers with a large number of antennas, the amount of spurious source flux density is, of course, smaller. For example, if we extrapolate the results shown in Fig. 1 to 50 antennas (the case of ALMA), the spurious source flux density generated by selfcal would be 24% of the rms of the visibility amplitudes, using $t = t_0$.

All these results assume the same sensitivity for all the baselines. In real cases, each baseline has its own sensitivity, with the longest baselines noisier than the shortest ones. The use of data from all the baselines in the fit can worsen the situation. A good alternative for avoiding the spurious source generated by selfcal or, at least, for making its flux density smaller is to flag or downweight the longest baselines in the fit and/or to increase the statistical weight of the data coming from the most sensitive antennas of the array. Nevertheless, even doing so, the statistical fluctuations of the closure phases will always tend to make, after the use of selfcal, a spurious source with a considerably large flux density.

A better way to calibrate faint source data is to use the phase-reference technique (e.g., Beasley & Conway 1995). When using this technique, scans of a strong (reference) source are introduced between the scans of the faint (target) source. The antenna gains are determined from the observations of the strong source and then interpolated to the scans of the faint (target) source. This technique is rather insensitive to the artifacts of selfcal and the probability of generating a fake signal from noise is practically zero. This is so because the calibration of the target source comes from the analysis of data coming from another source (the reference source). 
Therefore, the noise in the data of the faint source does not affect the antenna calibrations. However, it is common to use the antenna gains determined from the phase reference as an a priori calibration, and then to perform a Global Fringe Fitting on the target source data using small search windows (based on the calibration from the reference source data) or to apply self-calibration to the target visibilities in order to improve the dynamic range of the final image. In some cases, this might be malpractice, because the probability of generating a spurious source flux density from noise appears anew with full strength, wasting all the benefits of the phase referencing.
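The percentages quoted in this section can be cross-checked against the functional form of Eq. 7. In the sketch below, the coefficient and the exponent are not taken from Eq. 7; they are assumptions solved from the two values quoted in the text for $t = t_0$ (76% of the rms for 3 antennas and 46% for 10 antennas). The function names are hypothetical. The resulting power law is then extrapolated to 50 antennas and to a longer averaging time.

```python
import math

def calibrate_power_law(n1, f1, n2, f2):
    """Solve f = c * (N/2)**gamma from two (N, f) points taken at t = t0."""
    gamma = math.log(f2 / f1) / math.log((n2 / 2) / (n1 / 2))
    c = f1 / (n1 / 2) ** gamma
    return c, gamma

def spurious_fraction(n_ant, t_avg, t0, c, gamma):
    """F_sp / rho following the functional form of Eq. 7."""
    return c * math.sqrt(t0 / t_avg) * (n_ant / 2) ** gamma

# Assumption: constants inferred from the quoted 76% (N = 3) and 46% (N = 10).
c, gamma = calibrate_power_law(3, 0.76, 10, 0.46)

# Extrapolation to 50 antennas with t = t0, to compare with the quoted 24%.
frac_alma = spurious_fraction(50, t_avg=2.0, t0=2.0, c=c, gamma=gamma)

# Same 10-antenna array, but one selfcal solution every 30 s instead of every 2 s.
frac_30s = spurious_fraction(10, t_avg=30.0, t0=2.0, c=c, gamma=gamma)

print(c, gamma, frac_alma, frac_30s)
```

With these inferred constants the extrapolation to 50 antennas lands within one percentage point of the 24% quoted above, and increasing the solution interval from 2 s to 30 s lowers the 10-antenna value by a factor $\sqrt{15}$.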
6. Tests of the reliability of a source detection from noisy data
In this section we present two simple tests that can be performed on real data in order to check the reliability of a possible source detection, or to check if part of the flux density of a detected source may come from artifacts of selfcal. These tests are only meaningful when they are applied to extremely noisy data. However, the application of these tests to high SNR data can still give us an idea of the possible contribution of the artifacts of selfcal to the flux density recovered from a source.

We assume, in all our discussion, that the detected source is compact enough to be considered point-like, without any important loss of precision. As in the previous sections, to simplify the expressions we also assume that all the baselines of the interferometer have the same sensitivity.
The dependence of the spurious source flux density on the averaging time of the selfcal solutions is determined by the dependence of the standard deviation $\sigma$ on the averaging time. That dependence translates into equation 7. However, the flux density of a real source in the data is independent of the averaging time of the selfcal solutions. We can use this condition to estimate the flux density of a (possibly real) source detected under critical circumstances. If we apply phase self-calibrations to a real data set for different averaging times, $t$, of the selfcal solutions, the flux densities recovered after each self-calibration, $F_{\rm self}$, are given by the formula:

$F_{\rm self} = F_{\rm sp} + F_{\rm real} = \frac{K}{\sqrt{t}} + F_{\rm real}$ (8)

where $K$ (related to the specifics of the interferometer) and $F_{\rm real}$ (an estimate of the flux density of the real source) are parameters to be fitted.

In the case that the signal of a real source is included in the visibilities, the probability distributions of the real and imaginary parts are still Gaussian, but the mean value of the real part of the visibilities (if such visibilities are well calibrated) will be equal to the flux density of the real source. Then, it can be easily shown that the probability distribution of the resulting visibility amplitudes and phases is:

$g(A, \phi) = \frac{A}{\sigma^2} \exp\left(-\frac{A^2 + F^2}{2\sigma^2}\right) \exp\left(\frac{A F \cos\phi}{\sigma^2}\right)$ (9)

where $g(A, \phi)$ is the probability density of amplitudes (variable $A$) and phases (variable $\phi$), $F$ is the flux density of the real source, and $\sigma$ is the width of the distributions of the real and imaginary parts of the visibilities.

Equation 9 turns into equation 4 for $F = 0$. When $F$ is different from zero, the distribution of phases is not uniform. As $F$ increases, the phases gather around zero in a Gaussian-like manner. The distribution of amplitudes also changes, increasing the ratio between the rms of the visibility amplitudes and $\sigma$.

How could the information provided by equation 9 be used to check the reliability of a possible source detection? The closure phases are robust quantities that can be used for this purpose: they are sensitive to $F$ and are not affected by the phase self-calibration. The distribution of closure phases tends to gather around zero if there is signal of a real source in the data (and particularly so if the source has no structure) and is uniform if there is only noise in the data. From the definition of closure phase (see equation 2) we conclude that, if the visibility phases are well calibrated, the probability distribution of closure phases is equal to:

$c(\beta) = \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} p(\phi_1)\, p(\phi_2)\, p(\beta - \phi_1 - \phi_2)\, d\phi_1\, d\phi_2$ (10)

where $c(\beta)$ is the probability density of the closure phase, $\beta$, and

$p(\phi) = \int_{0}^{\infty} g(A, \phi)\, dA$ (11)

Let us elaborate on this: for the case of a perfect calibration, the phases of data from a given baseline $(i, j)$ are independent of the phases from any other baseline, given that all the contributions to $\phi_{ij}$ come from noise, which is uncorrelated between the different baselines. Thus, the probability distribution function of any linear combination of visibility phases (as is the case of the closure phase) is obtained by integrating the product of the probability distributions of the visibility phases, as in equation 10. However, for the case of a non-perfect calibration, in which selfcal has introduced a spurious source flux density in the data, the distributions of the phases from the different baselines will no longer be independent and equation 10 will not quite apply. 
In such cases, the phase distributions, $p(\phi)$, will be more peaked around zero, but correlations will appear among the phases of the different baselines. Therefore, the probability distribution function of a linear combination of phases will not be equal to a simple product of $p(\phi)$. Nonetheless, the correlations between baselines that selfcal introduces in the data keep the distribution of closure phases, $c(\beta)$, unaltered. That is, even after a selfcal iteration has generated a spurious source in the map, the distribution of closure phases is still equal to $c(\beta)$, as computed from the distribution of the phases, $p(\phi)$, corresponding to perfectly calibrated data.

This invariance property of the closure phase distribution, $c(\beta)$, can be used to check the reliability of a possible source detection. In Fig. 2, the theoretical flux density of a real source (in units of the rms of the amplitudes, $\rho$) is shown as a function of the mean absolute value of the closure phases and as a function of the mean cosine of the closure phases. Both quantities are directly related to the deviation of the closure phase distribution from a uniform distribution. If the closure phases are uniformly distributed between $-\pi$ and $\pi$, the average of their absolute values will be equal to $\pi/2$ […]

At this stage, it must be said that, even though the self-calibration does not affect the closure phases, a time average of a set of self-calibrated visibilities will change the closure phase distribution. That is, if selfcal generates a spurious source from noise and the visibilities are averaged in time bins of $t$ seconds, with $t > t_0$, the closure phase distribution changes and the resulting closure phases concentrate around zero, creating the effect of a source that is completely indistinguishable from a real one using any test.

For very small values of the flux density $F$, this test is not as good as the first one. As we can see in Fig. 2, the closure phases do not dramatically change their distribution for flux densities in the range between 0 and ∼15% of the rms of the visibility amplitudes. For tentative detections under critical circumstances in that range, this test could lead us to the wrong conclusion about the reliability of a source detection. Looking at it from a different viewpoint, Fig. 2 provides an interesting lesson: for a given dataset, there can exist a real faint source (appearing in a map with a dynamic range of 6 or more) even if the distribution of closure phases is uniform, that is, even if the closure phase distribution is noise-like. Thus, a conclusion on the reliability of a source detection based only on the closure phase distribution being extremely noisy is not definitive.

Another thing worth noticing is that this test assumes the same sensitivity for all the antennas and a source compact enough to generate closure phases close to zero even for the longest baselines (which may not be the case, especially for sources with a very low flux density per unit beam). These assumptions impose limitations on the use of this test. However, it could still be applied by restricting its use to the shortest baselines with similar antennas. For an array with a large antenna, […] different rms in different baselines.

Fig. 2. Theoretical dependence of the flux density of a real observed source (in units of the rms of the visibility amplitudes), as a function of "$\pi/2$ […] (Axis labels: $\pi/2 - \langle |C| \rangle$, $\langle \cos(C) \rangle$, and $F/\rho$.)

We must also note that in cases of high SNR, the rms of the visibility amplitudes is no longer related only to the thermal noise of the baselines (the flux density of the source affects the value of the rms), and the fraction $F/\rho$ shown in Fig. 2 should be corrected accordingly. For cases of high SNR, the quantity that will substitute $\rho$ in the fraction $F/\rho$ shown in Fig. 2 is $\sqrt{\rho^2 - F^2}$.

Real data do not obey the simplifying assumptions that we have used in the earlier sections. The baselines of a real interferometer have different sensitivities, which also vary in time. Thus, in order to check the reliability of a source detection from real data we must search for a subset of observations in which the sensitivity of the antennas is approximately constant. Moreover, we must only work with the subset of most sensitive antennas of the interferometer. If there is one antenna in our interferometer that is clearly more sensitive than the others, we should compute only the average of the closure phases in which this more sensitive antenna appears, in order to ensure the possible signature of the source in the closure phase distribution. In what follows, we apply our reliability criteria to real data corresponding to the radio supernova SN 2004et (Martí-Vidal et al. 2007).

We observed this supernova on 20 February 2005. From all data reported in Martí-Vidal et al. (2007), we have chosen the following subset of antennas: Brewster, Fort Davis, Green Bank, Hancock, Kitt Peak, and Owens Valley. We have only computed the closure phases in which the antenna Green Bank appears, and we have used data only from 14 hr to 20 hr (UT). 
These choices are based on the quality of the data for our purposes (i.e., the stability of the antenna sensitivities, which we assume proportional to the system temperatures registered for each station).
First test: we self-calibrated the SN 2004et data using different averaging times, ranging from 2 to 120 seconds (roughly the duration of one scan). The fit of the flux densities recovered from the SN 2004et data as a function of the averaging time of the selfcal solutions (equation 8) results in a value of F_real = . ± .
13 mJy. This value is clearly higher than zero, indicating that there is a real signal in the data. This value is also close to the flux density of SN 2004et reported by Martí-Vidal et al., 0 . ± .
03 mJy, recovered from phase-referenced data with a deconvolution using CLEAN.
Second test: even though we know that it is not quite appropriate, we have also performed this test on the SN 2004et data. The flux density of SN 2004et is too low compared to the rms of the visibility amplitudes ( ∼
20 mJy) to obtain a good result with the test of the closure phase distribution. The average value of the cosines of all the closure phases considered in these observations is equal to 0 . ± . . ± . . ± .
08 times the rms of the visibility amplitudes, ρ, used in our computations, which, as said above, is ∼
20 mJy. Hence, the estimated flux density of the (real) source in the data is 3 . ± . +
614 to the scans of the supernova. These authors did not subsequently refine this calibration by applying Global Fringe Fitting to the supernova data. This procedure ensured a reliable detection of the supernova. These authors did not apply any other calibration (selfcal) to the phase-referenced supernova data, in order to avoid any possible artifact introduced by the use of selfcal.
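As an illustration of the second test, the following Python sketch builds the closure triangles that contain one reference antenna (Green Bank in the text above) and computes the two closure-phase statistics that label Fig. 2, <cos(C)> and π/2 − <|C|>. The function names, the dictionary layout of the phases, and the simulated uniform phases are assumptions made for this example; it is not the code used for the analysis reported here. For noise-like (uniform) closure phases, both statistics are expected to be close to zero.

```python
from itertools import combinations

import numpy as np


def triangles_with(antennas, reference):
    """Return every antenna triangle that contains the reference antenna."""
    others = [a for a in antennas if a != reference]
    return [(reference, a, b) for a, b in combinations(others, 2)]


def random_phases(antennas, n_times, rng):
    """Simulated pure-noise visibility phases (assumption: uniform in [-pi, pi)).

    phase[(a, b)] is an array over time; phase[(b, a)] is its negative.
    """
    phase = {}
    for a, b in combinations(antennas, 2):
        p = rng.uniform(-np.pi, np.pi, n_times)
        phase[(a, b)] = p
        phase[(b, a)] = -p
    return phase


def closure_phase(phase, triangle):
    """Closure phase of triangle (i, j, k) in radians, wrapped to (-pi, pi]."""
    i, j, k = triangle
    total = phase[(i, j)] + phase[(j, k)] + phase[(k, i)]
    return np.angle(np.exp(1j * total))


def closure_statistics(closures):
    """Return <cos(C)>, its naive standard error, and pi/2 - <|C|>."""
    c = np.concatenate(closures)
    cos_c = np.cos(c)
    mean_cos = cos_c.mean()
    # Naive error: treats every closure phase as an independent sample.
    err_cos = cos_c.std(ddof=1) / np.sqrt(c.size)
    return mean_cos, err_cos, np.pi / 2 - np.abs(c).mean()


# Antenna subset named in the text; the phases below are simulated, not real data.
antennas = ["Brewster", "Fort Davis", "Green Bank",
            "Hancock", "Kitt Peak", "Owens Valley"]
triangles = triangles_with(antennas, "Green Bank")

rng = np.random.default_rng(0)
phases = random_phases(antennas, n_times=5000, rng=rng)
closures = [closure_phase(phases, t) for t in triangles]
mean_cos, err_cos, shifted_abs = closure_statistics(closures)
print(f"{len(triangles)} triangles, <cos C> = {mean_cos:.4f} +/- {err_cos:.4f}, "
      f"pi/2 - <|C|> = {shifted_abs:.4f}")
```

The uncertainty returned here is a naive standard error. When a source is present, closure phases from triangles that share baselines are not independent, so this value may underestimate the true uncertainty.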
7. Conclusions
We have analyzed the consequences of the phase self-calibration algorithm when it is applied to extremely noisy data. We have studied how this algorithm and the statistical fluctuations of the visibility phases can create a spurious source from pure noise. The flux density of the spurious source can be a considerable fraction of the rms of the visibility amplitudes. The application of other antenna-based calibration algorithms (like the Global Fringe Fitting) to noisy data can have similar consequences to those of selfcal if the SNR cutoff of the gain solutions is set to small values.
We have used numerical and analytic studies to show how the flux density of a spurious source created by selfcal depends on the number of antennas, the sensitivity of the array, and the averaging time of the selfcal solutions. We have also presented two simple tests that can be applied to real data in order to check whether the detection of a faint source could be the result of applying an antenna-based calibration algorithm to noisy data. These tests relate the averaging time of the selfcal solutions and the characteristics of the closure phase distribution to the flux density of a compact source possibly present in the data. To show a practical case, we have applied these tests to a set of real VLBI observations of supernova SN 2004et and found good agreement between the flux density recovered by CLEAN from the (phase-referenced) visibilities of this supernova (Martí-Vidal et al. 2007) and the flux density estimate provided by our reliability tests.
Acknowledgements.