A classifier for spurious astrometric solutions in Gaia EDR3
Jan Rybizki, Gregory Green, Hans-Walter Rix, Markus Demleitner, Eleonora Zari, Andrzej Udalski, Richard L. Smart, Andy Gould
MNRAS 000, 1–12 (2020) Preprint 29 January 2021 Compiled using MNRAS LaTeX style file v3.0
Jan Rybizki★, Gregory M. Green, Hans-Walter Rix, Markus Demleitner, Eleonora Zari, Andrzej Udalski, Richard L. Smart, and Andy Gould
Max Planck Institute for Astronomy, Königstuhl 17, D-69117 Heidelberg, Germany
Astronomisches Rechen-Institut, Zentrum für Astronomie der Universität Heidelberg, Mönchhofstrasse 12-14, D-69120 Heidelberg, Germany
Astronomical Observatory, University of Warsaw, Al. Ujazdowskie 4, 00-478 Warszawa, Poland
INAF - Osservatorio Astrofisico di Torino, via Osservatorio 20, 10025 Pino Torinese (TO), Italy
Department of Astronomy, Ohio State University, 4055 McPherson Laboratory, 140 West 18th Avenue, Columbus, Ohio 43210, USA
Accepted XXX. Received YYY; in original form ZZZ
ABSTRACT
The Gaia mission is delivering exquisite astrometric data for 1.47 billion sources, which are revolutionizing many fields in astronomy. For a small fraction of these sources the astrometric solutions are poor, and the reported values and uncertainties may not apply. For many analyses it is important to recognize and excise these spurious results, commonly done by means of quality flags in the Gaia catalog. Here we devise and apply a path to separating 'good' from 'bad' astrometric solutions that is an order-of-magnitude cleaner than any single flag: we achieve a purity of 99.7% and a completeness of 97.6% as validated on our test data. We devise an extensive sample of manifestly bad astrometric solutions: sources whose inferred parallax is negative at ≥ 4.5σ; and a corresponding sample of presumably good solutions: the sources in HEALPix patches of the sky that do not contain extremely negative parallaxes. We then train a neural net that uses 14 pertinent Gaia catalog entries to discriminate these two samples, captured in a single 'astrometric fidelity' parameter. An extensive and diverse set of verification tests shows that our approach to assessing astrometric fidelity also works very cleanly in the regime where no negative parallaxes are involved; its main limitations are in the very low S/N regime. Our astrometric fidelities for all of EDR3 can be queried via the Virtual Observatory. In the spirit of open science, we make our code and training/validation data public, so that our results can be easily reproduced.
Key words: Galaxy: stellar content, Galaxy: kinematics and dynamics, software: public release, space vehicles: instruments, virtual observatory tools
Parallax measurements contain information about the distance of astrophysical objects, and are critical to anchoring the cosmic distance ladder. At the same time, kinematic measurements – proper motions and radial velocities – provide phase-space information that is key to understanding Milky Way dynamics and external galaxies. The 1.47 billion astrometric measurements reported in Gaia Early Data Release 3 (Gaia Collaboration et al. 2020a, “EDR3”) constitute the largest astrometric dataset ever produced.

While this astrometric catalog is of extremely high quality (Lindegren et al. 2020b), a significant fraction of astrometric solutions are spurious (Fabricius et al. 2020). Spurious astrometric solutions are a distinct issue from negative parallaxes, which are an expected outcome of the normally distributed parallax measurement (Bailer-Jones 2015; Luri et al. 2018). Spurious solutions have a biased parallax value with an incompatible reported parallax uncertainty, due to specific failure modes (Fabricius et al. 2020). This can be a particular concern when looking at sparsely populated portions of the color-magnitude diagram, or at extreme objects, such as the nearest or fastest-moving stars (i.e., those with the largest parallaxes or proper motions, respectively). For example, naively selecting all objects with measured parallaxes greater than 10 mas (corresponding to a distance of less than 100 pc) yields a catalog in which an estimated 50% of the parallax measurements are spurious.

★ E-mail: [email protected]

Gaia EDR3 provides a number of astrometric quality parameters that can be used to exclude such spurious solutions. The “Gaia Catalogue of Nearby Stars” (Gaia Collaboration et al. 2020b, “GCNS”) uses a combination of these parameters to filter out spurious sources, obtaining a highly complete and pure subset of Gaia EDR3 sources lying within 100 pc. In this paper, we use a similar approach to extend this work to the entire Gaia EDR3 catalog.

Gaia EDR3 provides 1.47 billion astrometric measurements, each consisting of a two-dimensional position on the sky, a two-dimensional proper motion and a parallax (in addition, a subset of 7.2 million sources also has radial velocity measurements). There are many possible sources of excess noise in these astrometric measurements. Some error modes, such as unmodeled acceleration caused by an unresolved binary companion, typically introduce small residuals into the astrometric solution, which will usually be accounted for in the parallax uncertainty estimate (Lindegren et al. 2020b). However, other error modes, such as incorrect epoch cross-matches with background or spurious sources, and close source pairs that might be partially resolved (Fabricius et al. 2020), can introduce very large residuals, scattered around the true parallax (Gaia Collaboration et al. 2020b), which are unaccounted for in the reported parallax uncertainty. Spurious astrometric solutions mainly occur in very dense parts of the sky (Fabricius et al. 2020). It is this latter class of “catastrophic” errors in the astrometric solutions (leading to errors in excess of the stated uncertainties) that we attempt to detect.

One can try to mitigate these spurious astrometric solutions with cuts on ruwe, visibility_periods_used and G magnitude. These cuts are known to exclude many valid sources and also to bias the sky coverage. In the GCNS, the approach was to use many astrometric quality indicators, and only those, to train a random forest on a good and a bad training sample. Since only sources with an observed parallax greater than 8 mas were considered, this operated in an extremely high parallax SNR regime. For the “bad” examples, sources with parallax < −8 mas were used, exploiting the fact that spurious astrometric solutions can be expected to scatter randomly around the true parallax.
For the “good” examples, sources in low-density regions of the sky were used that were crossmatched to 2MASS and showed absolute magnitudes in the Gaia and 2MASS bands consistent with the main stellar populations.

When trying to classify the spurious parallax solutions for the whole Gaia EDR3 catalog, we also need to make informed decisions for sources with low parallax SNR, as they constitute 85% of the sources (for SNR < […]

This preprint is a work in progress. We have created a classifier that we believe cleanly identifies sources with spurious parallaxes, and which performs significantly better than the simple cuts advocated in the existing literature. However, we are open to suggestions for improvement – in the classifier itself, the creation of the training datasets, and ideas for validating performance. We want to allow the community to use this classifier, and welcome feedback. We are open to offering co-authorship in the final journal-submitted version of this work for any significant contribution.

All of the work in this paper, including the training of the classifier and the various validation tests, can be redone using a Python notebook and data that we have made available. As we update the classifier, we will continue to keep the initial version of the astrometric fidelities (v1) in the corresponding Virtual Observatory (VO) table, but will add additional columns with updated probabilities. This will allow astronomers to redo their analysis with upcoming classifiers, and to compare results across different versions of the classifier.

The notebook can be found at https://colab.research.google.com/drive/1d4KCXiCyFzLF1RzTzRGRAnVS0Uc8x3RU?usp=sharing, while the necessary data is stored at https://keeper.mpdl.mpg.de/d/21d3582c0df94e19921d/

Figure 1. Density distribution of the sources identified as bad for our training sample by their < −4.5σ (negative) parallaxes, shown using an Aitoff projection in Galactic coordinates (orange to black). In contrast, the regions of the sky from which we drew the good training sample, where such strongly negative parallaxes are absent, are shown in blue.

In all of the following we neglect the zero-point parallax offset (Lindegren et al. 2020a). We name the training sets for spurious and valid astrometric solutions "bad" and "good", respectively. We construct our bad training sample by selecting sources with parallax_over_error < −4.5. We use the following query:
SELECT *
FROM gaiaedr3.gaia_source
WHERE parallax_over_error < -4.5
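As a sanity check on how extreme this cut is, the number of sources that pure Gaussian noise would scatter below −4.5σ can be estimated from the lower tail of a unit normal distribution. The following is a minimal sketch using only the numbers quoted in the text (1.47 billion sources, 4.18 million selected); the function and variable names are illustrative and not taken from the authors' notebook.

```python
import math

def gaussian_lower_tail(z):
    """Probability that a unit-normal variable falls below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

N_SOURCES = 1.47e9   # sources with measured parallaxes (from the text)
N_BAD = 4.18e6       # sources returned by the -4.5 sigma query (from the text)
THRESHOLD = -4.5     # cut on parallax_over_error

# Pessimistic case: every source has zero true parallax and Gaussian errors.
expected_outliers = N_SOURCES * gaussian_lower_tail(THRESHOLD)

# Upper bound on the fraction of the bad sample that is actually good.
contamination = expected_outliers / N_BAD

print(f"expected Gaussian outliers: {expected_outliers:.0f}")
print(f"contamination upper bound: {contamination:.2%}")
```

Under these assumptions the expectation comes out at roughly five thousand sources, i.e. a contamination of order 0.1 per cent of the selected sample.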
This returns 4.18 million sources. If all of the 1.47 billion sources in Gaia EDR3 with measured parallaxes had a true parallax of zero, and all of the measurement errors were Gaussian, then we would expect approximately 5000 stars – nearly three orders of magnitude fewer – to satisfy the above cut. Since, in reality, sources have positive parallaxes, the discrepancy is even larger. Thus, even with the most pessimistic assumptions, the contamination rate of our bad training sample by sources with good astrometric solutions is ∼0.1% (roughly 5000 out of 4.18 million).

Fig. 1 shows the distribution of the bad sources over the sky. Dense areas such as the bulge, disc and the Magellanic Clouds are apparent, but scanning-law patterns are also visible, with regions of the sky that are scanned most often (notably the two rings along ecliptic latitude ≈ ±[…]°) having higher densities of spurious astrometric solutions. We conjecture that this is due to the many scans along a similar scanning angle, which increases the probability of spurious detections occurring at the same place and therefore reduces the probability of their being filtered out in the downstream processing (Torra et al. 2020), for example due to visibility_periods_used < […]

In order to construct the good training set, we select all sources in regions of the sky that do not contain any sources with significantly negative parallaxes (in the above −4.5σ sense). This means that our good and bad training examples come from disjoint regions of the sky, as can be seen in Fig. 1. The good sample does not come from the Galactic plane (i.e., |b| > […]° for all good sources). In detail, we separately query each HEALPix level-6 pixel that contains no sources with parallax_over_error < −4.5. In all, 4197 out of 49152 pixels meet this condition. Our query for a single pixel is as follows:
SELECT *
FROM (
    SELECT dr3.*, tmass.tmass_oid,
           FLOOR(dr3.source_id/140737488355328) AS hpx6
    FROM gaiaedr3.gaia_source AS dr3
    JOIN gaiadr2.tmass_best_neighbour AS tmass
    USING (source_id)
    WHERE source_id BETWEEN 0 AND 562949953421311
) AS subquery
-- Query only first HEALpix of level 6
JOIN gaiadr1.tmass_original_valid AS tm
USING (tmass_oid)
-- only sources with a crossmatch to 2MASS are queried
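The per-pixel source_id ranges used in such a query follow from the way Gaia encodes the nested HEALPix index in source_id: the level-12 index is source_id divided by 2^35, so the level-6 index is source_id divided by 2^35 · 4^6 = 140737488355328, the divisor that appears in the query. The sketch below computes the range for any pixel; the function names are illustrative and not part of the authors' code.

```python
def level_divisor(level):
    """Integer by which source_id is divided to get the nested HEALPix index."""
    return 2**35 * 4**(12 - level)

def n_pixels(level):
    """Number of HEALPix pixels on the full sky at the given level."""
    return 12 * 4**level

def healpix_of(source_id, level=6):
    """Nested HEALPix index of a source at the given level."""
    return source_id // level_divisor(level)

def source_id_range(pixel, level=6):
    """Inclusive (lower, upper) source_id bounds for one HEALPix pixel."""
    d = level_divisor(level)
    return pixel * d, (pixel + 1) * d - 1

# Bounds for the first level-6 pixel, usable in a BETWEEN clause.
lo, hi = source_id_range(0)
print(lo, hi)
```

Under this convention, the upper bound printed in the query above (562949953421311 = 2^49 − 1) falls in level-6 pixel 3, whereas the range for pixel 0 alone would end at 140737488355327; whether the wider range is intended is not clear from the text.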
We obtain a total of 5.24 million sources from the 4197 pixels we query in this manner. The requirement that the source is also visible in 2MASS (Skrutskie et al. 2006) ensures that we do not include spurious sources. It does lower the fraction of faint blue objects (e.g. white dwarfs), but this cut does not seem to propagate into our predictions, as we do not classify using photometric indicators.

We split our good training set into two subsets: a high-SNR subset with parallax_over_error (SNR) > 4.5, and a low-SNR subset with −4.5 < SNR < 4.5. In order to further purify our good training set, we require G − RP < […]. The cut on G − RP requires RP photometry, which only excludes 10k sources. The 40k sources removed by this cut are unphysically red, as can be seen in the upper panel of Fig. 2. This extremely red color sometimes coincides with nearby sources and/or a high phot_bp_rp_excess_factor. After these photometric cuts, our good training set contains 5.18 million sources.

As is apparent from the lower panel of Fig. 2, the high-SNR subsample of the good training set (in blue) resembles a low-extinction color-absolute magnitude diagram (CAMD), with many of the known subpopulations clearly distinguishable and almost no unphysical features. The small overdensity of sources between the main sequence (MS) and the white dwarf (WD) sequence is mostly due to erroneously high BP values from faint sources (Riello et al. 2020). It is also important to note that the cut in G − RP does not exclude the reddest asymptotic giant branch (AGB) stars in BP − RP. As expected, the absolute magnitudes of the low-SNR subsample scatter to much brighter values, due to their poorly constrained distance moduli. The low-SNR sample consists mainly of sources with apparent G between 15 and 20 mag.

In constructing the good sample, we do not cut on any astrometric flags that we use later as potential features in the classifier. Of course, our good and bad training sets probe quite different regimes, with the good training set coming mostly from low-extinction regions and sparse fields. Nevertheless, we hope (and later verify) that in the space of astrometric parameters and quality flags, our training sets cover the relevant feature space and will allow our classifier to have discriminative power over the entire sky, as was the case for the GCNS. However, we acknowledge that it would be desirable to add valid astrometric solutions from the Galactic plane to the good training sample.

Figure 2. Color-absolute magnitude diagram (CAMD) for different versions of the good training sample, showing a naive estimate of the absolute G-band magnitude as a function of two different colors, G − RP in the top panel and BP − RP in the bottom panel. The green points result from the initial query, before sources excessively red in G − RP were removed. The gray and blue points show the low-SNR and high-SNR subsamples, defined by 3 < parallax_over_error (SNR) < 4.5 and parallax_over_error (SNR) > 4.5, respectively. The plotting order is changed such that either the high-SNR subset in blue (bottom) or the low-SNR subset in grey (top) is fully visible.
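The "naive estimate" of the absolute magnitude plotted in the CAMDs of Fig. 2 is presumably the standard inverse-parallax form, M_G = G + 5 log10(ϖ / mas) − 10, with no extinction correction. The text does not spell out the formula, so the sketch below is an assumption in that respect, and the function name is illustrative.

```python
import math

def naive_absolute_magnitude(g_mag, parallax_mas):
    """Naive absolute magnitude from apparent magnitude and parallax in mas.

    Assumes distance = 1 / parallax and no extinction. Returns None for
    non-positive parallaxes, where the logarithm is undefined.
    """
    if parallax_mas <= 0.0:
        return None
    return g_mag + 5.0 * math.log10(parallax_mas) - 10.0

# A G = 10 mag source at 10 mas (100 pc) has a distance modulus of 5 mag.
print(naive_absolute_magnitude(10.0, 10.0))
```

Because the estimate depends on the logarithm of the measured parallax, it becomes unreliable at low SNR, which is why the low-SNR subsample scatters towards brighter absolute magnitudes in Fig. 2.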
We use the exact same features as in the GCNS (Gaia Collaboration et al. 2020b), marked in grey in their Table A.1. For completeness, we list them here in descending order of importance according to the Gini metric reported by the GCNS: parallax_error, parallax_over_error, astrometric_sigma5d_max, pmra_error, pmdec_error, astrometric_excess_noise, ipd_gof_harmonic_amplitude, ruwe, visibility_periods_used, pmdec, pmra, ipd_frac_odd_win, ipd_frac_multi_peak and astrometric_gof_al. For parallax_over_error, which we abbreviate as SNR, both the GCNS and we use the absolute value as a feature.

We train two different classifiers, intended for use in the regimes |SNR| < 4.5 and |SNR| > 4.5. We will refer to these classifiers as the “low-SNR” and “high-SNR” classifiers, respectively. The most important difference between these two classifiers is that the high-SNR classifier uses |SNR| as a feature, while the low-SNR classifier does not. Recall that our bad training set does not include sources with |SNR| < 4.5. If we were to allow the low-SNR classifier to take |SNR| into account, it would learn that there are no bad sources with |SNR| < 4.5, which is simply an artifact of our method of identifying training data.

To train the low-SNR classifier, we use only training data with |SNR| > […] good training examples). Excluding low-SNR training data while training the low-SNR classifier may seem counterintuitive, but our goal is to prevent the imbalance in coverage of SNR-space in the good and bad training sets from impacting our classifications in the low-SNR regime. In this regime, we end up with 3,964,264 good and 4,180,244 bad training examples.

To train the high-SNR classifier, we use the entire good and bad sets. In this regime, we end up with 5,184,555 good and 4,180,244 bad training examples.

In contrast to the work on GCNS, where spurious sources were identified using a random forest, we employ a feed-forward neural network (NN) here. Our NN model consists of 4 hidden layers, each with 64 neurons and a Rectified Linear Unit (ReLU) activation. The final layer has a single neuron with a sigmoid activation, and represents the probability that a source belongs to the good class.

We use the binary cross-entropy loss function (e.g. Goodfellow et al. 2016), which is closely related to the Kullback-Leibler divergence and which measures how much additional information would be needed to correct the classifier's prediction. Given input features $\vec{x}$, the classifier outputs a probability $P(\vec{x})$ that the source belongs to the good class. Denote the true class (the label) by $y \in \{0, 1\}$ (where $y = 1$ denotes “good”). The binary cross-entropy is then given by

$$H = -(1 - y) \ln\left[1 - P(\vec{x})\right] - y \ln P(\vec{x}) . \qquad (1)$$

As the binary cross-entropy is a measure of missing information, it can be expressed in units of bits or nats.

We implement our model in TensorFlow 2 (Abadi et al. 2016) and Keras (Chollet et al. 2015). We train for 100 epochs with an Adam optimizer (Kingma & Ba 2014), using a learning rate of $10^{-[…]}$ in the first 50 epochs, and a learning rate of $10^{-[…]}$ in the final 50 epochs.
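To make the loss in equation (1) concrete, the following minimal sketch evaluates the binary cross-entropy for individual predictions and averages it over a sample. It uses only the Python standard library rather than the TensorFlow/Keras implementation the authors use; the clipping constant `eps` and the function names are assumptions added here to keep the logarithms finite.

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    """Equation (1) in nats: y is the label (1 = good), p is P(good)."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0); eps is an assumption
    return -(1 - y) * math.log(1.0 - p) - y * math.log(p)

def mean_binary_cross_entropy(labels, probabilities):
    """Average loss over a sample, as reported for a test dataset."""
    losses = [binary_cross_entropy(y, p) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)

def nats_to_bits(h_nats):
    """Convert an entropy from nats to bits."""
    return h_nats / math.log(2.0)

# A maximally uncertain prediction costs ln(2) nats, i.e. exactly one bit.
h = binary_cross_entropy(1, 0.5)
print(h, nats_to_bits(h))
```

A confident and correct prediction (for example P = 0.99 for a good source) contributes about 0.01 nats, while a confident and wrong one is penalised heavily, which is why a small mean loss implies both high purity and high completeness.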
During training, we apply a dropout rate of 0.1 after each hidden layer in order to prevent over-fitting. The features are shuffled and normalised (to zero mean and unit variance) prior to the training, and we set 20% of the data aside for validation.

We train our high- and low-SNR classifiers separately. We assess our final performance by applying the low-SNR classifier to our low-SNR test dataset, and our high-SNR classifier to our high-SNR test dataset. On this combined test dataset, we achieve a loss of 0.0405 nats of entropy, with a purity of 99.7% and a completeness of 97.6%.

Figure 3. Histogram of the predicted classifier probabilities (of belonging to the good class) for sources in the test dataset, split by training label. As the good class contains both low- and high-SNR training data, we additionally show the classifier probabilities for the high-SNR good sources. The x-axis is the probability output by the classifier that a given source is good, which we term the “astrometric fidelity”.

We compare the astrometric fidelity predicted by our neural network to analogous quantities obtained using simpler classifiers. First, we evaluate how cleanly simple cuts on ruwe and astrometric_excess_noise separate good and bad sources in the high-SNR test dataset. Fig. 4 shows the binary cross-entropy, purity and completeness of the cut as a function of the threshold value for each feature. For ruwe, we achieve a minimum binary cross-entropy of 1.61 nats using a cut of ruwe < […].12, corresponding to a purity of 89.5% and a completeness of 89.0%. For astrometric_excess_noise, we achieve a minimum binary cross-entropy of 1.01 nats for a cut of astrometric_excess_noise < […]

Figure 4. Performance of simple cuts on ruwe (top panel) and astrometric_excess_noise (bottom panel) in differentiating good and bad astrometric solutions. In each panel, we show how binary cross-entropy, purity and completeness depend on the threshold chosen for the cut. Our neural network predicting astrometric fidelity achieves an order of magnitude less contamination than the optimal choices for these cuts (at minimal cross-entropy).

[…] linear combination of features. This model assigns a probability

$$P(\mathrm{good} \mid \vec{x}) = \left[1 + e^{-(\vec{w} \cdot \vec{x} + b)}\right]^{-1} \qquad (2)$$

of belonging to the good class to each source, where $\vec{x}$ is a vector containing the features, $\vec{w}$ is a vector containing a weight for each feature, and $b$, the bias, is a scalar. We use the Adam optimizer to find the weights and bias that minimize the binary cross-entropy of the predictions. On the high-SNR test dataset, we obtain a binary cross-entropy of 0.0960 nats, a purity of 96.4% and a completeness of 97.0%. This is better than what we achieve with simple cuts, but still represents more than three times the binary cross-entropy we obtain with the full neural network model.

The full neural network is not significantly more difficult to implement than these simpler classifiers, and it achieves a far more complete and pure separation of the validation data. For these reasons, we strongly favor use of the full neural network classification over simpler alternatives.

We divide the validation of the two models into the regimes |SNR| ≥ 4.5 and |SNR| < 4.5.

We first apply our classifier to all sources in the “Gaia Catalogue of Nearby Stars” (GCNS). All 1.2 million sources with parallax > […] |SNR| ≥ 4.5 […] |SNR| < 4.5. Taking the GCNS classifications to be correct, our low-SNR model has lower performance than our high-SNR model. However, bad parallax determinations are fundamentally more difficult both to identify and to define in this regime, as the reported measurements are compatible with a very wide range of true parallaxes. Note also that we did not train our classifier on the GCNS sample, so it is not surprising that our low-SNR classifier performs worse on the GCNS dataset than on our own test dataset.
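The purity and completeness figures quoted for the simple cuts and for the classifiers can be computed from any labelled test set. The sketch below assumes the conventional definitions, which the text does not spell out: purity is the fraction of sources selected as good that are truly good, and completeness is the fraction of truly good sources that are selected. The toy values, the threshold and the function name are hypothetical and serve only to illustrate the bookkeeping.

```python
def purity_completeness(values, labels, threshold):
    """Score a cut of the form `value < threshold` means 'good'.

    values: feature values (e.g. ruwe); labels: True for good sources.
    """
    tp = fp = fn = 0
    for value, is_good in zip(values, labels):
        selected = value < threshold
        if selected and is_good:
            tp += 1          # good source kept
        elif selected and not is_good:
            fp += 1          # bad source kept (contamination)
        elif not selected and is_good:
            fn += 1          # good source lost
    purity = tp / (tp + fp) if (tp + fp) else float("nan")
    completeness = tp / (tp + fn) if (tp + fn) else float("nan")
    return purity, completeness

# Hypothetical toy sample: four good and four bad sources.
toy_ruwe = [0.95, 1.02, 1.08, 1.30, 1.05, 2.10, 3.50, 1.60]
toy_labels = [True, True, True, True, False, False, False, False]
purity, completeness = purity_completeness(toy_ruwe, toy_labels, threshold=1.2)
print(purity, completeness)
```

Scanning the threshold over a grid and recording both numbers reproduces the kind of trade-off curves shown in Fig. 4; for a probabilistic classifier the same function applies with the inequality reversed on the astrometric fidelity.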
Open and globular clusters, and the “prior” information on the distance of their likely member stars, offer a great opportunity to validate our parallax classifier. We begin with a catalog of 162,484 sources assigned to 121 clusters, coming from Bailer-Jones et al. (2018). This catalog was compiled using a method similar to that used in Gaia Collaboration et al. (2018). For each individual cluster, we calculate the variance-weighted mean parallax of its member sources, as well as the corresponding uncertainty in the mean parallax:

$$\langle\hat{\varpi}\rangle = \frac{\sum_i \hat{\varpi}_i / \sigma_i^2}{\sum_i 1 / \sigma_i^2}, \qquad \sigma_{\langle\hat{\varpi}\rangle} = \left(\sum_i \frac{1}{\sigma_i^2}\right)^{-1/2} . \qquad (3)$$

We then select the 41 clusters for which $\sigma_{\langle\hat{\varpi}\rangle} / \langle\hat{\varpi}\rangle < 0.2$. For each member source of these clusters, we calculate the parallax residual and its uncertainty:

$$\Delta\hat{\varpi} \equiv \hat{\varpi} - \langle\hat{\varpi}\rangle, \qquad \sigma_{\Delta\hat{\varpi}} = \left(\sigma_{\varpi}^2 + \sigma_{\langle\hat{\varpi}\rangle}^2\right)^{1/2} . \qquad (4)$$

The distribution of these parallax residuals (divided by the corresponding uncertainties) is shown in Fig. 5. Our classifier labels approximately 1% of sources in these clusters as bad. The standardized residuals of sources classified as good roughly follow the expected unit normal distribution, while the distribution of standardized residuals of the sources classified as bad is shifted negative and has much longer tails.
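The cluster test in equations (3) and (4) can be sketched in a few lines. The example below uses only the standard library and a hypothetical four-star cluster; the function names are illustrative, and the code follows the equations as written, without any correction for the fact that each star also contributes to the cluster mean.

```python
import math

def weighted_mean_parallax(parallaxes, errors):
    """Equation (3): inverse-variance weighted mean and its uncertainty."""
    weights = [1.0 / e**2 for e in errors]
    weight_sum = sum(weights)
    mean = sum(w * p for w, p in zip(weights, parallaxes)) / weight_sum
    sigma_mean = weight_sum ** -0.5
    return mean, sigma_mean

def standardized_residuals(parallaxes, errors):
    """Equation (4): residuals divided by their combined uncertainty."""
    mean, sigma_mean = weighted_mean_parallax(parallaxes, errors)
    return [
        (p - mean) / math.sqrt(e**2 + sigma_mean**2)
        for p, e in zip(parallaxes, errors)
    ]

# Hypothetical cluster members: parallaxes and uncertainties in mas.
plx = [1.0, 1.2, 0.8, 1.0]
err = [0.1, 0.1, 0.1, 0.1]

mean, sigma_mean = weighted_mean_parallax(plx, err)
usable = sigma_mean / mean < 0.2   # the 20 per cent selection on clusters
residuals = standardized_residuals(plx, err)
print(mean, sigma_mean, usable, residuals)
```

For well-behaved astrometry the standardized residuals should scatter like a unit normal, which is the comparison made in Fig. 5.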
Figure 5.
Validation of our astrometric fidelity prediction using open and globular clusters. The figure shows histograms of the standardized parallax residuals for good and bad sources. The true parallaxes are estimated using the variance-weighted mean of the parallaxes in each cluster. We restrict this comparison to clusters with distances determined to 20% or better. The good sources closely follow the expected unit normal distribution, in marked contrast to the standardized residuals of the bad sources.
The Fourth Phase of the Optical Gravitational Lensing Experiment (OGLE-IV, Udalski et al. 2015) began observing the bulge of the Milky Way in 2010. Here, we validate our classifier using sources with proper-motion measurements from both OGLE-IV (OGLE Uranus astrometry project, Udalski et al. 2021, in preparation) and Gaia EDR3. Our assumption is that objects with spurious parallax determinations in Gaia EDR3 are more likely to have spurious proper-motion determinations. This should be reflected in the proper-motion residuals between Gaia EDR3 and OGLE-IV, with sources classified as bad in Gaia EDR3 having systematically higher χ² values in this comparison. We begin with a catalog of OGLE-IV sources with proper-motion measurements, lying in a 0.15 deg × […].15 deg box centered on (α_J2000, δ_J2000) = ([…].761 deg, −[…].698 deg). Using a matching radius of 0.[…]″, we obtain 14125 matching Gaia EDR3 sources with measured proper motions. Our classifier labels 2288 of these sources good.

We calculate the proper-motion residuals, $\Delta\vec{\mu} \equiv \vec{\mu}_{\rm Gaia} - \vec{\mu}_{\rm OGLE}$, as well as the covariance matrix of the residuals, $C_{\Delta\vec{\mu}} = C_{\mu,{\rm Gaia}} + C_{\mu,{\rm OGLE}}$. We then calculate $\chi^2 = \Delta\vec{\mu}^{T} C_{\Delta\vec{\mu}}^{-1} \Delta\vec{\mu}$ for each source. If the uncertainties are well estimated and the residuals follow a Gaussian distribution, then the χ² values that we obtain should follow a χ² distribution with two degrees of freedom. However, we find that the resulting χ² values are significantly larger, on average, than expected, both for sources labeled good and bad, indicating that Gaia EDR3 and/or OGLE-IV proper-motion uncertainties are underestimated in the Galactic Bulge. One could attempt to address this problem by inflating the uncertainties by a constant factor or by introducing a systematic error floor. However, these different methods of “correcting” the proper-motion uncertainties impact the distributions of χ² values obtained for the good and bad sources differently, as the good sources tend to have smaller estimated proper-motion uncertainties than the bad sources. In order to avoid these difficulties, we restrict our comparison to sources in a relatively small range of estimated proper-motion uncertainties, for which we assume the true proper-motion uncertainties to be similar. In particular, we select sources with 0.[…] < $|C_{\Delta\vec{\mu}}|^{[…]}$ < […], obtaining 1192 sources labeled good and 1978 sources labeled bad. The resulting distributions of χ² values are displayed in Fig. 6. We find that sources labeled good by our classifier tend to have significantly lower χ² values than those labeled bad, with the median χ² per degree of freedom (dof) for the good subsample being 4.4, and the median χ²/dof of the bad subsample being 7.4.

Figure 6. Validation of our astrometric fidelity classification through proper-motion comparison between OGLE and Gaia EDR3. Shown is the distribution of χ²/dof, based on a comparison of Gaia EDR3 and OGLE-IV proper motions in the Galactic Bulge, for sources labeled good and bad by our classifier. For ideal data, the median χ²/dof would be ∼0.69. We find that proper-motion uncertainties are underestimated for Gaia EDR3 and/or OGLE-IV, leading to larger χ²/dof values. However, for sources labeled good by our classifier, Gaia EDR3 and OGLE-IV proper motions match significantly better, as indicated by the lower median χ²/dof values (4.4 for the good subsample, vs. 7.4 for the bad subsample).

In the direction of the Large Magellanic Cloud (LMC), the vast majority of sources should be at a distance of ∼50 kpc, corresponding to a parallax of 0.02 mas. This affords us another opportunity to validate our classifications, as almost all stars labeled good in this region of the sky should have reported parallaxes consistent with 0.02 mas. We expect the bad sources to have larger residuals than reported, and to scatter equally to positive and negative parallaxes, leading to a widened distribution of reported parallaxes centered on 0.02 mas.

We query a 0.25 deg cone in Gaia EDR3, centered on Galactic coordinates (ℓ, b) = ([…].47 deg, −[…].88 deg), obtaining 252,115 sources, which we then run through our classifier. In this densely crowded region of the sky, only 11.2% of all sources are classified as good, while only 0.7% of high-SNR sources are classified as good. In order to model the small number of Milky Way foreground stars in this field, we compare to a control field of the same apparent size with the same Galactic latitude, and longitude reflected around ℓ = […] good.

Fig. 7 shows the parallax distribution of good and bad sources with small reported errors (parallax_error < […].2) in our LMC field. The parallax distribution of good sources is consistent with a distant population of stars with well-measured errors, plus a small foreground population of Milky Way stars at larger parallax (matching the control field). The bad sources are consistent with a distant population of stars with significantly underestimated parallax errors. Our classifier is thus clearly identifying sources with excess parallax residuals and, even in this dense field, is still cleanly identifying foreground stars.

Figure 7. Validation of our astrometric fidelity classification through the parallax distribution of good and bad sources with small parallax uncertainties ($\sigma_{\hat{\varpi}}$ < […]). The parallax distribution of good sources is consistent with a large population of distant ($\hat{\varpi} \approx 0$) sources (approximated by a normal distribution with zero mean and a standard deviation of 0.2 mas), along with an expected population of foreground stars (matching the distribution of parallaxes in a control field). In contrast, the bad parallaxes are consistent with a distant population of stars with parallax uncertainties that are underestimated by ∼[…]
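For the OGLE-IV comparison earlier in this section, the per-source statistic $\chi^2 = \Delta\vec{\mu}^{T} C_{\Delta\vec{\mu}}^{-1} \Delta\vec{\mu}$ reduces to a closed form for a 2×2 covariance matrix. The sketch below is a standard-library illustration with hypothetical input values; the function name and the tuple layout of the covariance matrices are assumptions, not the authors' implementation.

```python
import math

def chi2_proper_motion(delta_mu, cov_gaia, cov_ogle):
    """Chi-squared of a 2D proper-motion residual.

    delta_mu: (d_pmra, d_pmdec); cov_*: symmetric 2x2 matrices given as
    ((var_ra, cov), (cov, var_dec)), all in consistent units.
    """
    a = cov_gaia[0][0] + cov_ogle[0][0]
    b = cov_gaia[0][1] + cov_ogle[0][1]
    c = cov_gaia[1][1] + cov_ogle[1][1]
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("combined covariance matrix is not positive definite")
    dx, dy = delta_mu
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det.
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det

# Median of a chi-squared distribution with 2 dof is 2 ln 2, so per dof:
IDEAL_MEDIAN_CHI2_PER_DOF = math.log(2.0)

# Hypothetical source: 1 mas/yr offset in RA, uncorrelated equal errors.
cov = ((0.5, 0.0), (0.0, 0.5))
chi2 = chi2_proper_motion((1.0, 0.0), cov, cov)
print(chi2, chi2 / 2.0, IDEAL_MEDIAN_CHI2_PER_DOF)
```

The constant ln 2 ≈ 0.69 is the ideal median χ²/dof quoted in the caption of Fig. 6, against which the measured medians of 4.4 and 7.4 can be compared.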
The catalog of O-, B-, and A-type (OBA) stars devised by Zari et al. (2021, subm.) offers another opportunity to test our classifier with an ensemble of sources at low Galactic latitudes. Zari et al. (in preparation) select stars brighter than $G = 16$ mag, with Gaia EDR3 and 2MASS colors consistent with (reddened) OBA-type stars. Zari et al. do not apply any condition on the parallax error, as the sample was designed to be inclusive for spectroscopic follow-up. We run our classifier on the resulting catalog, which consists of ∼[…] sources. Fig. 8 shows the distribution of good (left, ∼75% of the initial sample) and bad (right) sources in the Galactic plane ($|b| <$ […]°). The distribution of sources with good astrometric solutions shows known regions of young stars and traces the spiral arm structure of the Milky Way disk, as discussed in Zari et al. (cf. their Fig. 11). The distribution of sources with bad astrometric solutions shows a ring-like feature between 2 and 3 kpc, which is physically implausible and hence presumably spurious. This is expected, as the parallax distribution of all sources in the OBA catalog peaks at around 0.3 mas (∼[…]

As a final approach to validation, we visually inspect the projected sky distribution and CAMDs of Gaia EDR3 sources classified as good and bad in narrow bins of (catalog-reported) parallax. We refer to the parallax bins by their corresponding nominal distances. The 100 pc sample consists of the 1.2 million sources with $\hat{\varpi} >$ […], […] the 3 kpc sample ($0.333\,\mathrm{mas} < \hat{\varpi} < 0.334\,\mathrm{mas}$) contains 1.3 million sources, the 10 kpc sample ($0.[…] < \hat{\varpi} < 0.101$ mas) contains 1.2 million sources, and the 30 kpc sample ($0.[…] < \hat{\varpi} < 0.034$ mas) contains 1.4 million sources.
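The naming of the parallax bins by nominal distance follows from the simple inversion of the catalog parallax. The sketch below shows that mapping and a slice-membership test; the function names are hypothetical, and the inversion is only a label for the bin, not a distance estimate for an individual source.

```python
def nominal_distance_pc(parallax_mas):
    """Nominal distance in pc from a parallax in mas (d = 1000 / parallax)."""
    if parallax_mas <= 0:
        raise ValueError("nominal distance is only defined for positive parallax")
    return 1000.0 / parallax_mas


def in_parallax_slice(parallax_mas, lower_mas, upper_mas):
    """True if the parallax lies strictly inside the slice (lower, upper)."""
    return lower_mas < parallax_mas < upper_mas


# The slice 0.333 mas < parallax < 0.334 mas is centred on 0.3335 mas,
# which corresponds to a nominal distance of about 3 kpc.
centre_mas = 0.5 * (0.333 + 0.334)
distance_3kpc_slice = nominal_distance_pc(centre_mas)
print(f"{distance_3kpc_slice:.1f} pc")
print(in_parallax_slice(0.3337, 0.333, 0.334))
```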
Here we only look at sources in each parallax bin that have |SNR| ≥ 4.5. Fig. 9 shows the sky distribution of good sources in four parallax slices (from top to bottom, at nominal distances of 0.3, 1, 3 and 10 kpc). In the closer distance slices, distinct overdensities that correspond to open clusters are visible. At 3 kpc, the Milky Way's overall structure becomes clearly visible, with star forming regions standing out. Not many sources at very large distances have high SNR, and highly extincted regions of the Galactic plane have no sources (and are therefore colored gray).

When looking at the sources classified as bad in Fig. 10, the bulge and disk dominate in all parallax bins. Even nominally nearby sources with high SNR are concentrated in the region of the sky corresponding to the bulge and disk. Interestingly, even a cut for reasonably high SNR does not remove spurious astrometric solutions, as can be seen by the large number of bad sources.

Fig. 11 shows the CAMD of high-SNR good solutions in each parallax bin. The stellar locus is well populated in each parallax bin. The unphysically large number of seemingly pre-main sequence stars (redder and brighter than the main sequence) is due to photometric excess in the RP photometry of sources in dense regions of the sky (Riello et al. 2020). Another feature that is apparent in these CAMDs is that the red clump becomes increasingly elongated with distance, due to the greater range of dust columns probed at larger distances.

For the high-SNR sources that are classified as bad, we see in Fig. 12 a floor of sources near the Gaia magnitude limit for the respective parallax slices. For the 3 kpc bin there might be an indication of AGB sources wrongly classified as bad, though the observational conditions for these extreme objects might astrometrically resemble those of a source with a bad solution.

Now we inspect the result of the low-SNR model on the low-SNR validation data (|SNR| < 4.5). For the good sources in Fig. 13, we see a similarity to the high-SNR sample, though the bulge region is missing and scanning law patterns are visible. While the structures overlap more across different distance bins (due to the lower parallax SNR), similar structures are visible at 3 and 10 kpc as in the high-SNR sample (Fig. 9). Reassuringly, the bulge is most prominent at 10 kpc, while at 30 kpc, the Magellanic clouds are more prominent.
Figure 8. Validation of our astrometric fidelity classification through the astrophysical plausibility of the X-Y distribution of young stars in the Galactic plane. Shown is the distribution of good (left) and bad (right) OBA star sources in the Galactic plane, with the Sun located at $(X, Y) = ($[…]$,$ […]$)$, and the Galactic center at $($[…]$,$ […]$)$. We have divided the plane into pixels 100 pc on a side. The color bar shows the number of sources per bin. The dashed circles have radii ranging from 1 to 5 kpc, in steps of 1 kpc. The good sources show concentrations at many known locations of young stars, and show spiral-arm-like morphology. The bad sources show a ringlike structure, exactly centered on the Sun and at the (seeming) distance of the most common parallax; clearly, a far too Ptolemaic distribution to be "real".

Fig. 14 shows the sky distribution of sources classified as bad in the low-SNR sample. Bad sources strongly outnumber the good sources in the 1 kpc slice. In every distance slice, the sky distribution of the bad sources essentially traces the highest density parts of the sky. At 100 pc the scanning law is still visible, but the sparsity is mainly due to most sources having a high SNR in this distance bin.

For the low-SNR sample we only focus on the 10 kpc bin when looking at the CAMD. For the good sources in the upper panel of Fig. 15 we can see massive main sequence stars and turn-off stars, as well as the red clump and maybe sdB stars at the very blue end. For the bad sources in the lower panel, a weak signal of all those populations seems to be present, and again some AGB stars might have been wrongly predicted as bad. Overall, most of the physical structure can be seen in the good sample, making us confident that even at low SNR our astrometric fidelity classifier is useful.
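The CAMDs inspected above place each source by color and absolute magnitude, and the absolute magnitude depends directly on the catalog parallax, which is why spurious parallaxes move sources off the physical sequences. The following is a minimal sketch of that conversion, assuming the standard distance-modulus relation with no extinction correction; the function name is hypothetical, and `phot_g_mean_mag` refers to the Gaia catalog column of that name.

```python
import numpy as np


def absolute_g_magnitude(phot_g_mean_mag, parallax_mas):
    """Absolute G magnitude from apparent G and parallax in mas.

    Uses M_G = G + 5 * log10(parallax / mas) - 10, which is the distance
    modulus with d = 1000 pc / (parallax / mas). Sources with non-positive
    parallax have no defined nominal distance and are returned as NaN.
    """
    g = np.atleast_1d(np.asarray(phot_g_mean_mag, dtype=float))
    plx = np.atleast_1d(np.asarray(parallax_mas, dtype=float))
    g, plx = np.broadcast_arrays(g, plx)
    abs_mag = np.full(g.shape, np.nan)
    positive = plx > 0
    abs_mag[positive] = g[positive] + 5.0 * np.log10(plx[positive]) - 10.0
    return abs_mag


# Three sources with the same apparent magnitude at nominal distances of
# 1 kpc and 100 pc, plus one with a negative (unphysical) parallax.
example_abs_mag = absolute_g_magnitude([15.0, 15.0, 15.0], [1.0, 10.0, -0.5])
print(example_abs_mag)
```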
Our catalog is hosted at the German Astrophysical Virtual Observatory (GAVO; https://dc.zah.uni-heidelberg.de/), in the table gedr3spur.main (https://dc.zah.uni-heidelberg.de/browse/gedr3spur/q). The simplest way to access the astrometric fidelities for a sample of stars is to crossmatch directly via a Table Access Protocol (TAP) upload join in TOPCAT (Taylor 2005). If the local table has Gaia EDR3 source_ids one can simply query:

    SELECT src.*
    FROM gedr3spur.main AS src
    JOIN TAP_UPLOAD.t1 AS target
    -- TAP_UPLOAD.tX needs to be the table number in TOPCAT
    USING (source_id)

Because there is a 100 MB upload limit, one can increase the number of sources queried at a time by hiding all columns except source_id. GAVO hosts a light version of Gaia EDR3, containing only the most commonly used columns. One can directly query this light version of Gaia EDR3 and simultaneously crossmatch to our astrometric fidelities. For example,
    SELECT COUNT(*) AS ct,
           ROUND(parallax/parallax_error, 2) AS bin
    FROM gaia.edr3lite -- only contains the most commonly used columns
    JOIN gedr3spur.main USING (source_id)
    WHERE fidelity_v1 >= 0.5
    GROUP BY bin

returns a histogram of the parallax distribution for the 730 million good sources. Requiring fidelity_v1 < 0.5 instead returns the corresponding histogram for the bad sources. Fig. 16 shows the distributions returned by these two queries. As expected, the bad solutions dominate the negative parallax regime. Interestingly, this is also the case for positive parallaxes in the range $0.[…] < \hat{\varpi}/\mathrm{mas} <$ […]. The parallax distribution of the bad sources peaks at $\hat{\varpi} = 0.19$ mas and is almost symmetrically distributed around this point, with an excess of 35 million sources in the positive wing. This might be partly due to misclassification of sources with good astrometric solutions, but could also be due in part to spurious solutions scattering around the true parallax value (Gaia Collaboration et al. 2020b).

The parallax distribution of the good sources peaks at $\hat{\varpi} = 0.26$ mas. For comparison, we have also plotted the distribution of observed parallaxes from 1.33 billion GeDR3mock sources (Rybizki et al. 2020), which peaks at $\hat{\varpi} = 0.16$ mas. For the mock parallaxes, we have adopted the EDR3 corrected parallax_error from the ADQL query in Gaia Collaboration et al. (2020b) and imposed the magnitude limits from table maglim_6 of GeDR3mock: https://dc.zah.uni-heidelberg.de/browse/gedr3mock/q. The bad astrometric solutions are usually real sources that have anomalously large parallax errors. One can therefore imagine how the excess GeDR3mock sources (compared to the good EDR3 sources) could be randomly scattered to produce the distribution of the bad sources, painting a consistent picture.

Fig. 17 shows the SNR distribution for the good and bad sources in EDR3. We see a jump in the number of sources at the |SNR| = 4.5 transition between our high- and low-SNR classifiers. At SNR = 4.5, the number of bad sources increases by almost 50 %, i.e. the high-SNR classifier seems to have a higher purity. Fabricius et al. (2020) estimate the total contamination of the sources with SNR > […] ∼[…] < -5). Our classifier finds 12.2 million bad sources in this regime, which constitute 6 % of the 192 million sources with SNR > […]

Figure 9. Distribution of good solutions for |SNR| ≥ 4.5 […]

Figure 10. Distribution of bad solutions for |SNR| ≥ 4.5.

Figure 11. CAMD for good solutions for |SNR| ≥ 4.5.

Figure 12. CAMD for bad solutions for |SNR| ≥ 4.5 […] bad sources overwhelmingly have spurious parallaxes.

Figure 13. Sky distribution of good solutions for |SNR| < 4.5.

Figure 14. Sky distribution of bad solutions for |SNR| < 4.5.

Figure 15. CAMD of good (top panel) and bad (bottom panel) solutions for |SNR| < 4.5.

Figure 16. Parallax distribution for good and bad sources in Gaia EDR3 and for the mock observed parallaxes of GeDR3mock.

Figure 17. SNR (parallax_over_error) distribution for good and bad sources in Gaia EDR3. The |SNR| = 4.5 transition, where the classifiers change, is shown by dashed grey lines.

This is a list of ideas that could be applied to improve results and might enter a future version of the classifier.
• correcting for parallax zero point
• use wide binaries for validation, use wide binary sample which have statistically sound parallax uncertainty for training
• make mock test (using gedr3mock) with photometric cleaning for sample generation
• Sources in the good sample HEALpix without a 2MASS crossmatch could be still used if a crossmatch to Pan-STARRS1 (Chambers et al. 2016) is found.
• clean classified samples from sources that are obviously wrong and feed them into training
• use different photometric cuts
• use more restrictive cuts when acquiring the good training sample, e.g. only HEALpix with no sources of SNR < −[…]
• take into account the error coming from true parallax over error, vs measured parallax over error
• cut the samples at different SNR levels
• Optional features which might have high predictive power:
– matched_transits_removed
– astrometric_params_solved (or maybe train a model for 5p and 6p solutions separately)
– phot_proc_mode (available for slightly fewer sources than those with an astrometric solution, but more than RP, BP)
– it might also help to add in astrometric_excess_noise / astrometric_excess_noise_sig as an estimate of the uncertainty of the excess source noise
– distance to nearest neighbour in the catalogue
– distance to nearest bright neighbour (e.g. $G <$ […]) due to their spurious source creation (Fabricius et al. 2016)
– phot_bp_rp_excess_factor (only available for sources with both colors)

We have extended the classification of valid and spurious astrometric solutions from the Gaia Catalogue of nearby stars (Gaia Collaboration et al. 2020b) to all 1.47 billion sources in Gaia EDR3 with astrometry. Our training sample of spurious sources is obtained by taking all sources with parallax_over_error < -4.5. Our training sample of good astrometric solutions is obtained by taking all sources with a 2MASS crossmatch in parts of the sky where no sources with parallax_over_error < -3.5 exist. We train two neural network models, one for high parallax SNR and one for low (divided at |SNR| = 4.5), which take astrometric quality parameters from EDR3 as inputs.

Our validation shows that we outperform not only simple cuts but also logistic models that take into account a linear combination of our features. The parallaxes of our good sources are normally distributed with respect to clusters and also the LMC. Sources classified as good also have significantly lower $\chi^2$ when comparing to OGLE proper motions. Sources classified as bad usually occur in high-density regions, e.g. in the bulge and disc region and in the Magellanic clouds.
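The two decision rules summarized above, namely which of the two models applies to a source and how a fidelity value is turned into a good or bad label, can be written down compactly. This is a minimal sketch under stated assumptions: the function names and label strings are hypothetical, the split uses |SNR| ≥ 4.5 for the high-SNR model, and the 0.5 threshold is the one used in the example query on fidelity_v1. It does not reproduce the neural networks themselves.

```python
import numpy as np

SNR_SPLIT = 4.5           # |parallax_over_error| dividing the two models
FIDELITY_THRESHOLD = 0.5  # fidelity_v1 >= 0.5 is treated as "good"


def choose_model(parallax_over_error):
    """Return which model applies to each source, based on |SNR|."""
    snr = np.abs(np.asarray(parallax_over_error, dtype=float))
    return np.where(snr >= SNR_SPLIT, "high_snr", "low_snr")


def label_sources(fidelity):
    """Turn astrometric fidelities into 'good' / 'bad' labels."""
    fidelity = np.asarray(fidelity, dtype=float)
    return np.where(fidelity >= FIDELITY_THRESHOLD, "good", "bad")


# Hypothetical example values, not taken from the catalog.
models = choose_model([10.0, -6.0, 4.5, 0.3, -4.49])
labels = label_sources([0.99, 0.5, 0.49, 0.01])
print(models.tolist())
print(labels.tolist())
```

Because the split is on the absolute value, a source with a strongly negative parallax_over_error is routed to the high-SNR model, consistent with the |SNR| notation used throughout.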
ACKNOWLEDGEMENTS
The authors would like to thank Douglas P. Finkbeiner and Joshua S. Speagle for helpful discussions and suggestions.

This work has made use of data from the European Space Agency (ESA) mission Gaia, processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.

This research or product makes use of public auxiliary data provided by ESA/Gaia/DPAC as obtained from the publicly accessible ESA Gaia SFTP.

This work was funded by the DLR (German space agency) via grant 50 QG 1403.

GG acknowledges funding from the Alexander von Humboldt Foundation, through the Sofja Kovalevskaja Award.

The OGLE project has received funding from the National Science Centre, Poland, grant MAESTRO 2014/14/A/ST9/00121 to AU.

JR will not travel anywhere by aeroplane for the purpose of promoting this paper.

Software: TOPCAT (Taylor 2005), HEALpix (Górski et al. 2005).
Data availability
The data underlying this article are available in the article and in its online supplementary material.
REFERENCES
Abadi M., et al., 2016, TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (arXiv:1603.04467)
Bailer-Jones C. A. L., 2015, PASP, 127, 994
Bailer-Jones C. A. L., Rybizki J., Fouesneau M., Mantelet G., Andrae R., 2018, AJ, 156, 58
Chambers K. C., et al., 2016, arXiv e-prints, p. arXiv:1612.05560
Chollet F., et al., 2015, Keras, https://keras.io
Fabricius C., et al., 2016, A&A, 595, A3
Fabricius C., et al., 2020, arXiv e-prints, p. arXiv:2012.06242
Gaia Collaboration et al., 2018, A&A, 616, A10
Gaia Collaboration, Brown A. G. A., Vallenari A., Prusti T., de Bruijne J. H. J., Babusiaux C., Biermann M., 2020a, arXiv e-prints, p. arXiv:2012.01533
Gaia Collaboration et al., 2020b, arXiv e-prints, p. arXiv:2012.02061
Goodfellow I., Bengio Y., Courville A., 2016, Deep Learning. MIT Press
Górski K. M., Hivon E., Banday A. J., Wandelt B. D., Hansen F. K., Reinecke M., Bartelmann M., 2005, ApJ, 622, 759
Kingma D. P., Ba J., 2014, arXiv e-prints, p. arXiv:1412.6980
Lindegren L., et al., 2020a, arXiv e-prints, p. arXiv:2012.01742
Lindegren L., et al., 2020b, arXiv e-prints, p. arXiv:2012.03380
Luri X., et al., 2018, A&A, 616, A9
Riello M., et al., 2020, arXiv e-prints, p. arXiv:2012.01916
Rybizki J., et al., 2020, PASP, 132, 074501
Skrutskie M. F., et al., 2006, AJ, 131, 1163
Taylor M. B., 2005, in Shopbell P., Britton M., Ebert R., eds, Astronomical Society of the Pacific Conference Series Vol. 347, Astronomical Data Analysis Software and Systems XIV. p. 29
Torra F., et al., 2020, arXiv e-prints, p. arXiv:2012.06420
Udalski A., Szymański M. K., Szymański G., 2015, Acta Astron., 65, 1

This paper has been typeset from a TEX/LATEX file prepared by the author.