Bayesian Cross-Matching of High Proper Motion Stars in Gaia DR2 and Photometric Metallicities for ∼1.7 million K and M Dwarfs
Draft version February 23, 2021
Typeset using LaTeX preprint style in AASTeX63
Ilija Medan, Sébastien Lépine, and Zachary Hartman
Department of Physics and Astronomy, Georgia State University, Atlanta, GA 30302, USA
Lowell Observatory, 1400 W Mars Hill Road, Flagstaff, AZ 86001
(Received Sep. 22, 2020; Accepted Feb. 17, 2021)
Submitted to The Astronomical Journal

ABSTRACT
We present a Bayesian method to cross-match 5,827,988 high proper motion Gaia sources (µ > mas yr⁻¹) to various photometric surveys: 2MASS, AllWISE, GALEX, RAVE, SDSS and Pan-STARRS. To efficiently associate these objects across catalogs, we develop a technique that compares the multidimensional distribution of all sources in the vicinity of each Gaia star to a reference distribution of random field stars obtained by extracting all sources in a region on the sky displaced 2′. This offset preserves the local field stellar density and magnitude distribution, allowing us to characterize the frequency of chance alignments. The resulting catalog with Bayesian probabilities > 95% has a marginally higher match rate than current internal Gaia DR2 matches for most catalogs. However, a significant improvement is found with Pan-STARRS, where ∼ ∼ Gaia
DR2. Using these results, we train a Gaussian Process Regressor to calibrate two photometric metallicity relationships. For dwarfs of 3500 < T_eff < σ precision of 0.12 dex and few systematic errors. We then indirectly infer the metallicity of 4,018 stars with 2850 < T_eff < σ precision of 0.21 dex and significant systematic errors. Additional work is needed to better remove unresolved binaries from this sample to reduce these systematic errors.

Keywords: catalog, astrometry, stars: abundances

1. INTRODUCTION
Over the past couple of decades, the stellar astronomical community has created multiple sky surveys that chart the positions and measure the brightness of stars in the local Milky Way in different wavelength regimes, down to different brightness limits and for varying regions of the sky. By combining these various catalogs, astronomers are able to derive a wealth of knowledge about the stellar properties and kinematics for hundreds of millions of stars. The recent
Gaia Data Release 2 (Gaia Collaboration et al. 2018b) is arguably the most important catalog to be released in recent years, as it not only provides accurate photometry for an unprecedented number of stars, but it also provides the most accurate position and motion measurements to date.

With the number of very large stellar catalogs on the rise, developing methods to accurately match stars between catalogs becomes very important. Distant, bright stars, which make up most of the stars in traditional catalogs, are typically straightforward to match as their brightness makes them prominent compared with fainter field stars in their vicinity, and because their position does not vary much over time. Of particular interest, however, are the nearby, low-mass stars, which are typically faint on the sky, and have large parallaxes and proper motions, now readily available from Gaia. Due to their faintness, they are more easily confused with background stars in the field, and because of their significant change in position over time, these stars become much more difficult to match correctly between catalogs.

In recent years this problem has been addressed through multiple methods, some statistical in nature. Budavári & Szalay (2008) created a Bayesian framework that mostly relies on the angular separation between objects in order to get a probability of a match. This method can be extended to include other parameters (such as photometry) by assuming these distributions to be independent from the probability distributions for the angular separation between catalog objects. This framework however does not set up procedures for dealing with catalogs that have significantly different epochs of observation, when large proper motions are involved.
Additionally, the assumption of independence between distributions in angular separation and brightness may not hold, as we will show in this work.

Many large catalogs are also attempting to address this issue by including their own cross-matches against popular external catalogs. Gaia includes such attempted cross-matches using a custom, multi-filter algorithm created to find the best matches between Gaia and several external catalogs based on the Gaia astrometry (Marrese et al. 2019). While this does expand on the Budavári & Szalay (2008) approach by considering astrometric motions, the method does not explicitly consider the photometry of the sources in either of the catalogs during the match.

This study aims to address both of the components missing in the above two methods by combining astrometry and photometry to create a Bayesian framework to match high proper motion (µ > mas yr⁻¹) stars in Gaia with existing catalogs that span the wavelength regime from the UV to the infrared.

Complete and accurate photometry for stars across a larger portion of the EM spectrum is very important for those interested in studying the physical properties of stars. A catalog of high proper motion stars, in particular, focuses on low-mass field stars in the solar vicinity, whose magnitudes and colors are less affected by interstellar reddening and potentially can be used to estimate their masses and chemical compositions. This is important as low mass stars are the most abundant stars in the Galaxy (Bochanski et al. 2010) and their long main sequence lifetimes mean they have the potential to trace the entirety of the Galactic star-formation history, and provide clues to understand the structure and evolution of the Milky Way.

Additionally, low mass stars are of great interest for the exoplanet community, as these sources are the best targets for detecting terrestrial planets in habitable zones, due to such planets having shorter orbital periods, creating deeper transits and producing larger Doppler shifts in their host stars (Trifonov et al. 2018). With missions like the Transiting Exoplanet Survey Satellite (TESS; Ricker et al.
2015), knowledge of the chemistry of low mass stars allows for constraints to be made on the physical properties of the orbiting planets and their formation history.

To get an approximation of physical properties with minimal work for a large number of stars, photometry has been used in place of spectroscopy by using calibrated relationships between various photometric bands and physical stellar properties. This concept has been used to successfully calibrate relationships for hotter dwarfs of spectral types F, G and K to estimate effective temperature (e.g. González Hernández & Bonifacio 2009; Casagrande et al. 2010) and metallicity (e.g. Ramírez & Meléndez 2005). Calibrations of K and M dwarfs have proved more difficult as the spectra needed for such a calibration require high signal-to-noise to accurately measure equivalent widths of atomic lines. The use of large telescopes has however allowed for such measurements for limited subsets of very nearby low-mass stars (e.g. Woolf & Wallerstein 2005, 2006), which have provided tentative photometric calibrations for stellar parameters of low mass dwarfs (e.g. Bonfils et al. 2005; Casagrande et al. 2008; Dittmann et al. 2016).

These relationships however rely on photometric band measurements (e.g. Johnson–Cousins bandpasses) that are not common in the most extensive modern astronomical surveys, which means they are not applicable to the largest datasets now available. To update these calibrations, Schmidt et al. (2016) combined SDSS and AllWISE photometry with the derived stellar parameters of 3,834 stars from APOGEE spectra to calibrate new relationships for K and early M dwarf metallicities down to temperatures of 3500 K with a precision of ∼ Gaia , 2MASS and AllWISE photometry to calibrate a color-metallicity relationship with a precision of ∼ ∼ T_eff < ± . 12 dex for stars of T_eff > ± . 28 dex for stars of T_eff <

2. DATA SETS
2.1. High Proper Motion Gaia Subset
Gaia sources that have larger proper motions can be the most challenging to match accurately due to their large apparent motions. We extract from Gaia DR2 a subset of 5,827,988 stars with recorded proper motions µ > mas yr⁻¹; this will be our primary catalog for the subsequent cross-match. We emphasize that no “cleaning” has been done to this subset, such that all Gaia sources with proper motions listed with µ > mas yr⁻¹ are included regardless of source magnitude or uncertainty in the astrometry and/or photometry. All coordinate entries in Gaia are recorded at an epoch of J = 2015.5 using the measured proper motions.

2.2. External Catalogs
2.2.1. 2MASS
The Two Micron All Sky Survey (2MASS) was conducted between June 1997 and February 2001 and collected photometry of stars for the entirety of the sky in three infrared bands: J (1.25 µm), H (1.65 µm), and K_s (2.16 µm) (Skrutskie et al. 2006). The final point source catalog includes 470,992,970 individual sources where, with a 10σ point-source detection level, objects are detected down to a limit of 15.8, 15.1, and 14.3 mag for the J, H, and K_s bands, respectively.

2.2.2. AllWISE
The Wide-field Infrared Survey Explorer (WISE) operated between January 2010 and July 2010 and collected photometry for the entirety of the sky in four infrared bands: W1, W2, W3, and W4 (Wright et al. 2010). For this study, we have opted to use the updated version of the catalog, AllWISE, which has enhanced sensitivity and accuracy compared with earlier WISE data releases (Cutri et al. 2014). This release consists of 747,634,026 individual sources. Readers should be aware that the AllWISE release results in saturation issues for photometric measurements of W1 < < . For the high proper motion sample used in this study, such stars make up a small portion of the overall sample ( < .

2.2.3. GALEX DR5
The GALEX All-Sky Imaging Survey in its fifth data release has collected photometry for sources in 21,435 square degrees of the sky between May 2003 and February 2012 in two bands: FUV (1344-1786 Å) and NUV (1771-2831 Å) (Bianchi et al. 2011). This data release consists of 65.3 million objects where objects are detected down to FUV = 19 . NUV = 20 . σ detection level.

2.2.4. RAVE DR5
The Radial Velocity Experiment (RAVE) is a magnitude limited (9 < I < 12) radial velocity survey of bright stars randomly selected in the southern hemisphere (Kunder et al. 2017) that operated between 2003 and 2013. Certain extinction limits were imposed during the target selection in order to bias the survey towards giants. The fifth data release includes radial velocities for 457,588 unique stars. In addition, this data release includes stellar parameters for these unique objects.

https://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec2 1.html

2.2.5. SDSS DR12
The twelfth release of the Sloan Digital Sky Survey (SDSS DR12; Alam et al. 2015) consists of photometric and spectroscopic data from SDSS-III (Eisenstein et al. 2011), collected from August 2008 to July 2014, along with all previous SDSS campaigns beginning in 2000. The total unique area covered by this phase of the survey was 14,555 square degrees, with observations from 469,053,874 primary sources (i.e. sources that have been deemed as non-duplicate observations). For this study, we are mainly interested in the photometric part of the survey, which consists of measurements in five optical bands: u (3543 Å), g (4770 Å), r (6231 Å), i (7625 Å) and z (9134 Å).

2.2.6. Pan-STARRS
The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS; Chambers et al. 2016) is imaging 30,000 square degrees of the sky in five optical bands: g_P (4776 Å), r_P (6130 Å), i_P (7485 Å), z_P (8658 Å) and y_P (9603 Å). The first data release from the survey consists of 1,919,106,885 sources down to a 5σ detection, which corresponds to limiting magnitudes of 23.3, 23.2, 23.1, 22.3, and 21.4 mag for the g_P, r_P, i_P, z_P, and y_P bands, respectively.

3. CATALOG CROSS-MATCH
3.1. Initial Catalog Query
To reduce the number of sources to be included in the detailed search, the full external catalogs are initially queried at the coordinates around the sources in the Gaia subset. As these sources of interest are (allegedly) high proper motion stars, their motion must be taken into account during the initial query due to the significant epoch difference between catalogs. With a Gaia epoch of 2015.5, the location of a Gaia source, G, at the mean epoch of an external catalog, J_mean, is given by:

$$\alpha_{G,\mathrm{mean}} = \alpha_{G,2015.5} + 2.7778\times10^{-7}\,(J_{\mathrm{mean}} - 2015.5)\,\mu_{G,\alpha}/\cos(\delta) \tag{1}$$

$$\delta_{G,\mathrm{mean}} = \delta_{G,2015.5} + 2.7778\times10^{-7}\,(J_{\mathrm{mean}} - 2015.5)\,\mu_{G,\delta} \tag{2}$$

where α and δ are measured in degrees, and µ in mas yr⁻¹; the factor 2.7778 × 10⁻⁷ converts milliarcseconds to degrees. The mean epochs for 2MASS, AllWISE, GALEX DR5, RAVE DR5, SDSS DR12 and Pan-STARRS were assumed to be 2000.16, 2010.80, 2004.87, 1999.57, 2004.54 and 2014.23, respectively. All sources within 30 arcseconds of the mean location of a given Gaia source at the epoch of each survey are retrieved and comprise the initial external catalogs for our cross-match.

3.2. Initial Cross-Match
Starting with the initial external catalogs outlined in the section above, a more detailed cross-match is conducted. For each Gaia source, all sources from the external catalogs with |x_G − x_c| < 0.01, |y_G − y_c| < 0.01 and |z_G − z_c| < 0.01 are first selected, where:

$$x = \cos(\alpha)\cos(\delta) \tag{3}$$

$$y = \sin(\alpha)\cos(\delta) \tag{4}$$

$$z = \sin(\delta) \tag{5}$$

and where (x_c, y_c, z_c) are based on the coordinates listed in the external catalog and (x_G, y_G, z_G) are based on the location of the Gaia sources at the mean epoch of the external catalog (as described by eqs. 1 and 2). This coordinate system is used to provide more accurate separations near the poles, where other approximations can have large errors.

Following this initial cut, we extrapolate the positions of all sources in the external catalogs assuming they have the same proper motion as the Gaia sources of interest:

$$\alpha_{c,2015.5} = \alpha_{c,J} + 2.7778\times10^{-7}\,(2015.5 - J)\,\mu_{G,\alpha}/\cos(\delta) \tag{6}$$

$$\delta_{c,2015.5} = \delta_{c,J} + 2.7778\times10^{-7}\,(2015.5 - J)\,\mu_{G,\delta} \tag{7}$$

All sources with extrapolated locations less than 15 arcseconds from the Gaia source location are then included in the final cut of the cross-match. GALEX DR5 does not specifically list observed epochs for its sources, so the mean epoch for this survey (see above) is assigned to all sources. Also, while AllWISE does provide epochs for individual bands, the mean epoch for this survey (see above) is also assigned to these sources. This is done for two reasons. First, these epochs are for individual bands, and do not describe the epoch for the coordinates listed in the catalog. Second, even if there are differences between the mean epoch for this survey and the epochs listed for individual bands, due to the very short baseline of the AllWISE measurements ( ∼ Gaia source for the remainder of the analysis.

In the next step, we account for the possibility that the epochs of the external catalogs might be inaccurate or unreliable, and for each source we extrapolate a set of positions using epochs ranging from 1950 to 2050 in increments of 1/12 year. The epoch that minimizes the angular separation between the external source and the Gaia source is then recorded as the optimal epoch for that cross-match.

As a check, we examine the distribution of optimal epochs for all the cross-matches from each external catalog; these are shown in Figure 1. We use these distributions to verify the effective mean epoch of each of the external catalogs. Sources with best epochs equal to 1950 or 2050 (the edges of the epoch grid) are excluded as they are unlikely to be correct matches. The widths of the optimal epoch distributions are generally consistent with the astrometric precision of each external catalog, where catalogs like GALEX DR5 demonstrate a much lower angular resolution than a catalog like Pan-STARRS. Additionally, differences between the means of the optimal and listed epoch distributions are indicative of systematic errors in sky coordinates. Whereas catalogs like 2MASS and SDSS DR12 demonstrate relatively small offsets, catalogs like RAVE DR5 and Pan-STARRS show large systematics. Differences in the RAVE DR5 epochs are simply due to the positions of the sources being listed at their 2000.0 epoch and not at the epoch of the spectroscopic observations, which is the value listed in the RAVE catalog. The Pan-STARRS difference, on the other hand, may indicate larger errors in the catalog itself.

Finally, the angular separations of the stars that pass the final cut in the initial search at these mean epochs are calculated and will be used in the subsequent Bayesian analysis.
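The extrapolation of eqs. 6 and 7 and the optimal-epoch grid search described above can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used for the catalog: the function names (`unit_vector`, `separation_arcsec`, `propagate`, `optimal_epoch`) are hypothetical, and computing the separation from the chord between the unit vectors of eqs. 3-5 is an assumption made here for numerical stability at small angles.

```python
import math

MAS_TO_DEG = 1.0 / 3.6e6  # 1 mas expressed in degrees (about 2.7778e-7)

def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector (x, y, z) as in eqs. 3-5."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(ra) * math.cos(dec),
            math.sin(ra) * math.cos(dec),
            math.sin(dec))

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation from the chord length between two unit vectors."""
    chord = math.dist(unit_vector(ra1, dec1), unit_vector(ra2, dec2))
    return math.degrees(2.0 * math.asin(min(1.0, chord / 2.0))) * 3600.0

def propagate(ra_deg, dec_deg, pmra, pmdec, epoch_from, epoch_to):
    """Linear motion in degrees; pmra and pmdec are in mas/yr, with
    pmra already including the cos(dec) factor (hence the division)."""
    dt = epoch_to - epoch_from
    ra = ra_deg + MAS_TO_DEG * dt * pmra / math.cos(math.radians(dec_deg))
    dec = dec_deg + MAS_TO_DEG * dt * pmdec
    return ra, dec

def optimal_epoch(gaia_ra, gaia_dec, pmra, pmdec, cat_ra, cat_dec,
                  start=1950.0, stop=2050.0, step=1.0 / 12.0):
    """Try every trial epoch on the grid for the external source and keep
    the one whose position extrapolated to 2015.5 lies closest to Gaia."""
    best_epoch, best_sep = None, None
    for i in range(round((stop - start) / step) + 1):
        epoch = start + i * step
        ra, dec = propagate(cat_ra, cat_dec, pmra, pmdec, epoch, 2015.5)
        sep = separation_arcsec(gaia_ra, gaia_dec, ra, dec)
        if best_sep is None or sep < best_sep:
            best_epoch, best_sep = epoch, sep
    return best_epoch, best_sep
```

A match whose returned epoch sits at either edge of the grid (1950 or 2050) would be discarded, following the exclusion rule stated above.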
Figure 1. Distributions for optimal epochs (red histograms) as compared to the listed epochs (blue histograms) for matches with minimized angular separations between the external source and the Gaia source. For external catalogs that do not list individual epochs for each source (AllWISE and GALEX DR5), the mean epoch assumed for the survey is shown as a blue dotted line.

3.3. Displaced Sample
All sources from an external catalog passing the final angular separation cut are considered possible matches to a specific Gaia source. However, at least some of these matches are bound to be random field stars instead of true counterparts. In order to evaluate the likelihood that a source is a random field star, we create a “displaced” sample of sources from each external catalog, which is meant to represent a sample of purely random field stars.

This displaced sample is created by repeating the procedure in Sections 3.1 and 3.2, but by first displacing all the Gaia declinations by ±

3.4. Bayesian Analysis
Generally, for some vector of observations, $\vec{x}$, the probability of some hypothesis H_1 given that $\vec{x}$ is true is given by:

$$P(H_1|\vec{x}) = \frac{P(\vec{x}|H_1)\,P(H_1)}{P(\vec{x})} \tag{8}$$

Given an alternative hypothesis to H_1, H_2, where P(H_1) + P(H_2) = 1, the above expression can be rewritten as:

$$P(H_1|\vec{x}) = \frac{P(\vec{x}|H_1)\,P(H_1)}{P(\vec{x}|H_1)\,P(H_1) + P(\vec{x}|H_2)\,P(H_2)} \tag{9}$$

For a cross-match, the initial hypothesis, H_1, is that the external catalog entry is a match to the Gaia source being considered, or associated with the Gaia source, while the alternate hypothesis, H_2, is that the external catalog entry is a random field star unrelated to the Gaia source. We include the caveat of “associated with the Gaia source” in the statement for the hypothesis H_1 to account for any image or pipeline artifact associated with the Gaia source or a possible binary star companion to the Gaia source, as these would not be present in the displaced sample. We do expect that these artifacts/companions will be lower probability, secondary matches in the final result. All Bayesian probabilities in general will be a function of the mean angular separation of the external source, but also of the magnitude of the external source, the G magnitude of the Gaia source, the Galactic latitude of the Gaia source being matched, and possibly other parameters.

In order to increase the dimensionality of the problem, while keeping the probability distributions reasonably simple (e.g. two-dimensional), we will build statistics for different subsets defined by various cuts in e.g. the G magnitude and Galactic latitude of the Gaia source, and reduce the dimensions of the probability distributions to two factors: angular separation, and difference between the external catalog magnitude and the
Gaia source G magnitude, or “quasi-color”. The quasi-color parameter will act as a spectral type check when paired with examining probability distributions in finite bins of G magnitude, as we would expect the true matches of similar spectral type to center around a determined value in this quasi-color space. For this study, the following bins in Gaia magnitude G and Galactic latitude b are used: G < 10, 10 ≤ G < 12.5, 12.5 ≤ G < 15, 15 ≤ G < 17.5, 17.5 ≤ G < 20 and G ≥ 20, and |b| < 19.5, 19.5 ≤ |b| < 41.8 and |b| ≥ 41.8. The latitude cuts are chosen such that they cover regions of approximately equal solid angle on the sky (sin|b| = 1/3 and 2/3 give |b| ≈ 19.5° and 41.8°). Due to low number statistics, however, these cuts were not applied to the RAVE DR5 sample.

Based on these inputs, the probability of the possible match being a random field star, P($\vec{x}$|H_2), can be directly inferred from the results of the cross-match with the displaced sample. This is done by creating a two-dimensional frequency distribution with the displaced sample results for each of the surveys using the same cuts in G and b above.

Frequency distributions are also determined for stars in the true sample, but, as mentioned, these distributions are contaminated by the presence of random field stars. In order to account for the components of the distribution due to these random field stars, the frequency distributions of the displaced samples are used. Under ideal circumstances, the number of random field stars should be the same in both samples, and simple subtraction of the two distributions should suffice in inferring the component of the distribution for the true cross-match that is equivalent to P($\vec{x}$|H_1).

This is not necessarily the case for our distributions though. When performing the search for the displaced sample, all pointings are randomized and are most likely not falling in the immediate vicinity of a bright source. This is however not the case for the true cross-match, since many of the Gaia sources are matched with bright sources in these external catalogs. In these cases, the PSFs of these bright sources outshine some of these random field stars and leave them missing from the catalog and from our inferred statistical distribution. This “blind spot” effect can be seen in Figure 2, which shows an SDSS image made using the SDSS DR12 Image Tool. In this image, the PSF effectively extends nearly 10 arcseconds from the center of the object.
This effect is present in all external catalogs and has an obvious effect on the frequency distributions for the true cross-match. An example of the resulting effect is shown in Figure 3, where, when comparing the two distributions, a large gap in dim stars at short angular separations is observed. In Figure 3, the median and 68th percentile of W − G as a function of θ are shown, where quantiles are found using COBS (Ng & Maechler 2015). These quantiles demonstrate that only at very large angular separations does the distribution of the true sample begin to match that of the displaced sample due to this “outshining” effect.

For distributions in bins of fainter G magnitude however, the component due to random field stars in the true distribution becomes more apparent at smaller angular separations as compared to distributions for bright sources (Figure 4). This is seen clearly when comparing the quantiles of the two distributions in Figure 4. It is at these larger angular separations that the displaced distribution is used to model the random field star component. For the brighter bins though, due to the “outshining” effect discussed above and random fluctuations in the number of random field stars, this distribution at larger angular separations does not always have the same amplitude.

To determine the correct scaling factor and properly subtract the distribution of field stars represented by the displaced sample, we examine the negative part of the residuals. Figure 5 shows the integral of the negative part of the residuals of the inferred frequency distribution for the hypothesis H_1 after subtracting the frequency distribution of the displaced sample multiplied by a scaling factor. The curve always shows two segments with differing slopes. This change in slope occurs at the point where the scaling factor over-estimates the number of stars in the displaced sample that may have been masked by the PSF of the source in the true sample, causing an over-subtraction to occur. The appropriate scale factor for each distribution, then, is found at the scaling factor where this slope change occurs. We find that this value is usually around 1.0, where deviations from 1 . S = 1 . H for a distribution based on some magnitude, m, some cut in Gaia G and some cut in Galactic latitude, b, is given by:

$$F(\vec{x}|H_1)_{m,G,b} = \begin{cases} f = F(\vec{x}|H_1 \cup H_2)_{m,G,b} - S_{m,G,b} \times F(\vec{x}|H_2)_{m,G,b}, & \text{if } f > 0 \\ 0, & \text{if } f \le 0 \end{cases} \tag{10}$$

where S is the scale factor determined from above. The conditional probabilities for a distribution of m − G vs. θ based on some magnitude, m, some cut in Gaia G and some cut in Galactic latitude, b, are then given by the likelihoods:

$$P(\vec{x}|H_1)_{m,G,b} = \frac{F(\vec{x}|H_1)_{m,G,b}}{\sum_{\vec{x}=(\theta,\,m-G)} F(\vec{x}|H_1)_{m,G,b}} \tag{11}$$

$$P(\vec{x}|H_2)_{m,G,b} = \frac{F(\vec{x}|H_2)_{m,G,b}}{\sum_{\vec{x}=(\theta,\,m-G)} F(\vec{x}|H_2)_{m,G,b}} \tag{12}$$

Finally, the prior probabilities are based on the expected number of matches and random field stars:

$$P(H_1)_{m,G,b} = \frac{\sum_{\vec{x}=(\theta,\,m-G)} F(\vec{x}|H_1)_{m,G,b}}{\sum_{\vec{x}=(\theta,\,m-G)} F(\vec{x}|H_1 \cup H_2)_{m,G,b}} \tag{13}$$

$$P(H_2)_{m,G,b} = 1 - P(H_1)_{m,G,b} \tag{14}$$

Using the definitions laid out in this section, we can now determine a discrete 2D Bayesian probability distribution based on eq. 9 for all combinations of our G magnitude and Galactic latitude bins, and for every external catalog that we match. An example of the various cross-match distributions discussed above is shown in Figure 6, where the labels above each panel match the functional definitions outlined in eq. 9 and eqs. 11-14. In the bottom right panel of Figure 6, some artifacts are present, such as the line around (θ, i − G) ≈ (14 ,

https://skyserver.sdss.org/dr12/en/tools/chart/listinfo.aspx

Figure 2. SDSS image of the source SDSS J000000.21+091332.1 (centered on the crosshairs), which is matched with Gaia source 2747168838957855104. The orange circles in the image indicate other photometric objects in the SDSS DR12 catalog.

Figure 3. Figure showing the 2D distributions of W − G color versus angular separation for the AllWISE initial cross-match (left panel) and for the displaced sample (right panel), in both cases for Gaia sources with 12.5 < G ≤ 15 and |b| ≥ 41.8. The colormap of the distribution shows the number density of sources in a bin, on a logarithmic scale; both panels have the same colormap range. The median and 68th percentile of W − G as a function of θ are shown as the red solid and red dashed lines, respectively, where quantiles are found using COBS (Ng & Maechler 2015). At small angular separations, the distribution in the left panel is dominated by the counterparts and artifacts/companions in the external catalog, while at larger angular separations the distribution is due to random field stars.

3.5. Distribution Smoothing and Modeling
One of the issues with the above procedure is the statistical noise present in these distributions. To remedy this, while keeping relatively small bin sizes for the frequency distributions, a percentile smoothing filter was applied and the priors of the discrete and smoothed distributions are then found to be comparable.
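Whether the frequency grids are smoothed or not, they enter eq. 9 in the same way. The sketch below shows how eqs. 10-14 combine into the posterior of eq. 9 for one (magnitude, G, b) subset. It is a hedged illustration on small nested lists: the function name `posterior_grid` and the tiny example grids are hypothetical, and the percentile smoothing step itself is not reproduced because its parameters are not given here.

```python
def posterior_grid(f_true, f_disp, scale=1.0):
    """Return (P(H1|x) per bin, prior P(H1)) from two 2D frequency grids.

    f_true : counts per (theta, m - G) bin for the true cross-match,
             i.e. F(x | H1 u H2)
    f_disp : counts per bin for the displaced sample, i.e. F(x | H2)
    scale  : scale factor S applied to the displaced sample
    """
    # eq. 10: subtract the scaled field-star component, clipping at zero
    f_match = [[max(t - scale * d, 0.0) for t, d in zip(row_t, row_d)]
               for row_t, row_d in zip(f_true, f_disp)]
    n_match = sum(map(sum, f_match))
    n_field = sum(map(sum, f_disp))
    n_total = sum(map(sum, f_true))
    if n_match == 0 or n_field == 0 or n_total == 0:
        raise ValueError("empty distribution: posterior is undefined")

    prior_match = n_match / n_total   # eq. 13
    prior_field = 1.0 - prior_match   # eq. 14

    posterior = []
    for row_m, row_d in zip(f_match, f_disp):
        out = []
        for m, d in zip(row_m, row_d):
            like_match = m / n_match  # eq. 11
            like_field = d / n_field  # eq. 12
            num = like_match * prior_match
            den = num + like_field * prior_field
            out.append(num / den if den > 0 else 0.0)  # eq. 9
        posterior.append(out)
    return posterior, prior_match
```

For example, with `f_true = [[100, 10], [5, 20]]` and `f_disp = [[0, 10], [5, 10]]`, the bin holding 100 true-sample counts and no displaced counts receives a posterior of 1, while the bin with 20 true-sample and 10 displaced counts receives 0.5.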
Figure 4. Figure showing the 2D distributions of i − G color versus angular separation for the Pan-STARRS initial cross-match (left panel) and for the displaced sample (right panel), in both cases for Gaia sources with 17.5 < G ≤ 20 and 19.5 ≤ |b| < 41.8. The colormap of the distribution shows the number density of sources in a bin, on a logarithmic scale; both panels have the same colormap range. The median and 68th percentile of i − G as a function of θ are shown as the red solid and red dashed lines, respectively, where quantiles are found using COBS (Ng & Maechler 2015). At small angular separations, the distribution in the left panel is dominated by the counterparts and artifacts/companions in the external catalog, while at larger angular separations the distribution is due to random field stars.

Another issue with the procedure is that in low occurrence regimes, the under-sampling of the displaced sample distribution can cause artificially high Bayesian probabilities. Fortunately, while the true cross-match distribution is too complex to model in most cases, the displaced sample distribution is fairly easily modeled. This is because the placement of field stars is inherently random, and the angular separation and magnitude differences can be considered independent variables, and their distributions modeled separately.

Representative models of these one-dimensional, independent distributions are shown in Figure 7. As can be seen from the left panel, the angular separations form a distribution that increases linearly at higher separations:

$$F(x = \theta) = mx + b \tag{15}$$

The quasi-color distribution (right panel), on the other hand, can be modeled as a sum of normal distributions with varying weights:

$$F(y = m - G) = \sum_{i}^{n} W_i\, e^{-\frac{(y-\mu_i)^2}{2\sigma_i^2}} \tag{16}$$

The number of components is varied from n = 2 to n = 10, and the ideal number of components is based on the function that minimizes the Bayesian information criterion (BIC):

$$\mathrm{BIC} = n \ln\left[\frac{\sum \left(F(x)_{\mathrm{measured}} - F(x)_{\mathrm{model}}\right)^2}{n}\right] + k \ln(n) \tag{17}$$

Figure 5. Plot of the integral of the negative part of the residuals between the true and displaced distribution functions (see eq. 10), represented as black dots. The example above is for a cross-match of our Gaia subset with the Pan-STARRS catalog, with 12.5 ≤ G < 15 and 19.5 ≤ |b| < 41.8. The black dashed line shows the intersect of the two model line segments (red lines), which is determined to be the ideal scaling factor for the displaced sample in this region.
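The scale-factor determination illustrated in Figure 5 can be sketched as below. This is one possible reading of the procedure, not the authors' code: the helper names are hypothetical, and the way the two line segments are fitted (ordinary least squares on every possible split of the curve, keeping the split with the smallest total squared error) is an assumption, since only the use of two model line segments and their intersect is stated.

```python
def negative_residual_integral(f_true, f_disp, scale):
    """Summed magnitude of the negative residuals of
    F_true - scale * F_disp over all bins of a 2D frequency grid."""
    total = 0.0
    for row_t, row_d in zip(f_true, f_disp):
        for t, d in zip(row_t, row_d):
            total += max(scale * d - t, 0.0)
    return total

def fit_line(xs, ys):
    """Least-squares line; returns (slope, intercept, squared error)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def slope_break(scales, integrals):
    """Fit two line segments to the curve and return the scale factor at
    which they intersect (assumes the two fitted slopes differ)."""
    best = None
    # keep at least two points in each segment
    for i in range(2, len(scales) - 1):
        left = fit_line(scales[:i], integrals[:i])
        right = fit_line(scales[i:], integrals[i:])
        sse = left[2] + right[2]
        if best is None or sse < best[0]:
            best = (sse, left, right)
    _, (m1, b1, _), (m2, b2, _) = best
    return (b2 - b1) / (m1 - m2)
```

In use, `negative_residual_integral` would be evaluated on a grid of trial scale factors for one (magnitude, G, b) subset, and `slope_break` applied to the resulting curve to read off S.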
where n is the number of data points used for the model and k is the number of components.

After fitting the above equations to the 1D distributions for each of the displaced samples, the 2D distribution of field stars from the displaced sample can generally be modeled by:

$$F(\vec{x} \mid H) = F(x = \theta,\; y = m - G) = C\,(mx + b) \sum_{i}^{n} W_i \, e^{-\frac{(y - \mu_i)^2}{2\sigma_i^2}} \qquad (18)$$

where the scaling constant C is added to ensure the scale of the modeled distribution matches that of the observed distribution. Best-fitting parameters are evaluated separately for each distribution; one example is shown in the right panel of Figure 8, where it is compared to its associated observed distribution on the left. Note that this modelling procedure is completed for 18 different distributions per photometric band in each survey, to account for all combinations of Gaia G and Galactic latitude bins described in Section 3.4, so the parameters in eq. 18 will be different for each of these distributions.

3.6. High Probability Matches

Figure 6. Cross-match frequency distributions discussed in Section 3.4 for Pan-STARRS counterparts of Gaia with 17 . < G ≤ 20 and 19 . ≤ | b | < . 8. The description of individual panels is as follows: upper left is the frequency distribution for the “true” sample, upper right is the frequency distribution for the “displaced” sample, middle left is the smoothed frequency distribution for the “true” sample, middle right is the modeled frequency distribution for the “displaced” sample, lower left is the difference between the smoothed “true” frequency distribution and the modeled “displaced” frequency distribution using the correct scaling factor, and lower right is the resulting Bayesian probability distribution. All plots, except for the distribution for $P(H \mid \vec{x})$, show the number of sources per bin on a logarithmic scale, and they all share the same colormap range (indicated by the colorbar on the far right). The median and 68th percentile of i − G as a function of θ are shown on these plots as the red solid and red dashed lines, respectively, where quantiles are found using COBS (Ng & Maechler 2015). The distribution for $P(H \mid \vec{x})$ is shown for a range of 0 to 1 and the black solid line overlaid shows the 99% line.

Figure 7. 1D distributions (blue bins) of angular separation (left panel) and i − G (right panel) for cross-matched pairs in the Pan-STARRS displaced sample with 12 . ≤ G < 15 and 19 . ≤ | b | < . 8. The best models based on eqs. 15 and 16 are shown as the red dashed lines.
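As an illustration of the two-dimensional field-star model of eq. 18 (the modeled distribution shown in Figure 8), the following is a minimal sketch that evaluates a linear term in angular separation multiplied by a sum of weighted Gaussians in magnitude difference. The function name, argument names and the numerical parameter values are assumptions made for this example only; they are not taken from the code released with this work, and the fitted parameters differ for every magnitude and latitude bin.

```python
import math


def field_star_model(theta, mag_diff, C, m, b, weights, mus, sigmas):
    """Evaluate an eq. 18 style model at one point.

    theta    : angular separation (x in eq. 18)
    mag_diff : catalog magnitude minus Gaia G (y in eq. 18)
    C        : overall scaling constant
    m, b     : slope and intercept of the linear term in theta
    weights, mus, sigmas : parameters of the Gaussian components in y
    """
    linear_term = m * theta + b
    gaussian_sum = sum(
        w * math.exp(-((mag_diff - mu) ** 2) / (2.0 * sigma ** 2))
        for w, mu, sigma in zip(weights, mus, sigmas)
    )
    return C * linear_term * gaussian_sum


# Hypothetical parameter values, chosen only to exercise the function.
example_params = dict(
    C=2.0, m=1.5, b=0.5,
    weights=[0.7, 0.3], mus=[0.0, 1.0], sigmas=[0.5, 1.0],
)
example_density = field_star_model(theta=3.0, mag_diff=0.0, **example_params)
```

With a positive slope the modeled number of chance alignments grows with angular separation, which is the behaviour expected for a uniform field population.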
Figure 8. 2D distributions of i − G versus angular separation for the Pan-STARRS displaced sample with 12 . ≤ G < 15 and 19 . ≤ | b | < . 8. The colormap of the distribution shows the number of sources in a bin on a logarithmic scale and both panels have the same colormap range. The left panel shows the observed displaced distribution while the right panel shows the modeled distribution. Additionally, the median and 68th percentile of i − G as a function of θ are shown as the red solid and red dashed lines, respectively, where quantiles are found using COBS (Ng & Maechler 2015).

In order to determine the best matches from the external catalogs to the
Gaia sources, we calculate the total probability, defined as the product of the probabilities for each magnitude in a given catalog when there is a valid magnitude measurement for the source. For SDSS DR12 sources that are not part of the primary catalog (i.e. those flagged as duplicates in SDSS DR12), we force their probability to be 0%. Note that this implies that for catalogs with a larger number of photometric measurements, the individual probabilities must be higher to reach the 95% threshold. For example, for a catalog like 2MASS that has three photometric measurements, each individual magnitude must have a probability > 98.30% on average to reach the 95% threshold, while a catalog like Pan-STARRS that has five photometric measurements would require each individual magnitude to have a probability > 98.98% on average. This small difference should not cause a large difference in the final catalog of high probability matches, as a large number of the stars with high probability have probabilities > 99% (see black line in lower right panel of Figure 6). Best matches are then defined as those possible sources with a total probability > 95%; when more than one possible source has a total probability > 95% for a particular Gaia source, the catalog entry with the largest total probability is designated as the best match. Table 1 shows the percentage of matches to Gaia sources that meet this criterion. Low match percentages for catalogs like SDSS DR12 and Pan-STARRS can be attributed to the difference in sky coverage, while low match probabilities in the other catalogs can be attributed to the limiting magnitudes of the external catalogs.

Table 1. Number of sources matched with a total probability > 95%.

The catalog of “best matches”, i.e. where the
Gaia sources have a counterpart with total probability > Gaia sources. The code needed to reproduce these results, or perform additional cross-matches, can also be found on our GitHub repository (https://github.com/imedan/bayes_match).

Table 2. Best matches found in this study with a Bayesian probability greater than 95% for all cross matches in this study.

Column Number   Label   Units   Column Description
1   GaiaDR2 ID   ···   Gaia DR2 identifier
2   Gmag GaiaDR2   mag   Gaia DR2 G-band magnitude
3   BPmag GaiaDR2   mag   Gaia DR2 BP-band magnitude
4   RPmag GaiaDR2   mag   Gaia DR2 RP-band magnitude
5   RAICRS GaiaDR2   deg   Right ascension at Epoch = 2015.5 from Gaia DR2
6   DEICRS GaiaDR2   deg   Declination at Epoch = 2015.5 from Gaia DR2
7   PLX GaiaDR2   mas   Parallax measurement from Gaia DR2
8   PLXerr GaiaDR2   mas   Parallax measurement error from Gaia DR2
9   PMRA GaiaDR2   mas/yr   Proper motion in right ascension direction (PMRA*cosDE) from Gaia DR2
10   PMDE GaiaDR2   mas/yr   Proper motion in declination direction from Gaia DR2
11   SDSSDR12 ID   ···   SDSS DR12 identifier
12   umag SDSSDR12   mag   u-band magnitude on AB scale from SDSS DR12
13   umagerr SDSSDR12   mag   u-band magnitude error on AB scale from SDSS DR12
14   gmag SDSSDR12   mag   g-band magnitude on AB scale from SDSS DR12
15   gmagerr SDSSDR12   mag   g-band magnitude error on AB scale from SDSS DR12
16   rmag SDSSDR12   mag   r-band magnitude on AB scale from SDSS DR12
17   rmagerr SDSSDR12   mag   r-band magnitude error on AB scale from SDSS DR12
18   imag SDSSDR12   mag   i-band magnitude on AB scale from SDSS DR12
19   imagerr SDSSDR12   mag   i-band magnitude error on AB scale from SDSS DR12
20   zmag SDSSDR12   mag   z-band magnitude on AB scale from SDSS DR12
21   zmagerr SDSSDR12   mag   z-band magnitude error on AB scale from SDSS DR12
22   RAICRS SDSSDR12   deg   Right ascension from SDSS DR12 (ICRS)
23   DEICRS SDSSDR12   deg   Declination from SDSS DR12 (ICRS)
24   BayesProb SDSSDR12   ···   Total Bayesian probability of SDSS DR12 object match to Gaia DR2 object
25   AngSep SDSSDR12   arcsec   Angular separation between SDSS DR12 object and Gaia DR2 object at mean epoch of SDSS DR12
26   2MASS ID   ···   2MASS identifier
27   Jmag 2MASS   mag   J-band magnitude from 2MASS
28   Jmagerr 2MASS   mag   J-band magnitude error from 2MASS
29   Hmag 2MASS   mag   H-band magnitude from 2MASS
30   Hmagerr 2MASS   mag   H-band magnitude error from 2MASS
31   Ksmag 2MASS   mag   Ks-band magnitude from 2MASS
32   Ksmagerr 2MASS   mag   Ks-band magnitude error from 2MASS
33   RAJ2000 2MASS   deg   Right ascension from 2MASS (J2000)
34   DEJ2000 2MASS   deg   Declination from 2MASS (J2000)
35   BayesProb 2MASS   ···   Total Bayesian probability of 2MASS object match to Gaia DR2 object
36   AngSep 2MASS   arcsec   Angular separation between 2MASS object and Gaia DR2 object at mean epoch of 2MASS
37   AllWISE ID   ···   AllWISE identifier
38   W1mag AllWISE   mag   W1-band magnitude from AllWISE
39   W1magerr AllWISE   mag   W1-band magnitude error from AllWISE
40   W2mag AllWISE   mag   W2-band magnitude from AllWISE
41   W2magerr AllWISE   mag   W2-band magnitude error from AllWISE
42   W3mag AllWISE   mag   W3-band magnitude from AllWISE
43   W3magerr AllWISE   mag   W3-band magnitude error from AllWISE
44   W4mag AllWISE   mag   W4-band magnitude from AllWISE
45   W4magerr AllWISE   mag   W4-band magnitude error from AllWISE
46   RAJ2000 AllWISE   deg   Right ascension from AllWISE (J2000)
47   DEJ2000 AllWISE   deg   Declination from AllWISE (J2000)
48   BayesProb AllWISE   ···   Total Bayesian probability of AllWISE object match to Gaia DR2 object
49   AngSep AllWISE   arcsec   Angular separation between AllWISE object and Gaia DR2 object at mean epoch of AllWISE
50   GALEX DR ID   ···   GALEX DR identifier
51   FUVmag GALEX DR   mag   FUV-band magnitude from GALEX DR
52   FUVmagerr GALEX DR   mag   FUV-band magnitude error from GALEX DR
53   NUVmag GALEX DR   mag   NUV-band magnitude from GALEX DR
54   NUVmagerr GALEX DR   mag   NUV-band magnitude error from GALEX DR
55   RAJ2000 GALEX DR   deg   Right ascension from GALEX DR (J2000)
56   DEJ2000 GALEX DR   deg   Declination from GALEX DR (J2000)
57   BayesProb GALEX DR   ···   Total Bayesian probability of GALEX DR object match to Gaia DR2 object
58   AngSep GALEX DR   arcsec   Angular separation between GALEX DR object and Gaia DR2 object at mean epoch of GALEX DR
59   PS ID   ···   Pan-STARRS identifier
60   gmag PS   mag   g-band magnitude on AB scale from Pan-STARRS
61   gmagerr PS   mag   g-band magnitude error on AB scale from Pan-STARRS
62   rmag PS   mag   r-band magnitude on AB scale from Pan-STARRS
63   rmagerr PS   mag   r-band magnitude error on AB scale from Pan-STARRS
64   imag PS   mag   i-band magnitude on AB scale from Pan-STARRS
65   imagerr PS   mag   i-band magnitude error on AB scale from Pan-STARRS
66   zmag PS   mag   z-band magnitude on AB scale from Pan-STARRS
67   zmagerr PS   mag   z-band magnitude error on AB scale from Pan-STARRS
68   ymag PS   mag   y-band magnitude on AB scale from Pan-STARRS
69   ymagerr PS   mag   y-band magnitude error on AB scale from Pan-STARRS
70   RAJ2000 PS   deg   Right ascension from Pan-STARRS (J2000)
71   DEJ2000 PS   deg   Declination from Pan-STARRS (J2000)
72   BayesProb PS   ···   Total Bayesian probability of Pan-STARRS object match to Gaia DR2 object
73   AngSep PS   arcsec   Angular separation between Pan-STARRS object and Gaia DR2 object at mean epoch of Pan-STARRS
74   RAVEDR5 ID   ···   RAVE DR5 identifier
75   BTmag RAVEDR5   mag   Tycho-2 BT magnitude listed in RAVE DR5
76   BTmagerr RAVEDR5   mag   Tycho-2 BT magnitude error listed in RAVE DR5
77   VTmag RAVEDR5   mag   Tycho-2 VT magnitude listed in RAVE DR5
78   VTmagerr RAVEDR5   mag   Tycho-2 VT magnitude error listed in RAVE DR5
79   RAJ2000 RAVEDR5   deg   Right ascension from RAVE DR5 (J2000)
80   DEJ2000 RAVEDR5   deg   Declination from RAVE DR5 (J2000)
81   BayesProb RAVEDR5   ···   Total Bayesian probability of RAVE DR5 object match to Gaia DR2 object
82   AngSep RAVEDR5   arcsec   Angular separation between RAVE DR5 object and Gaia DR2 object at mean epoch of RAVE DR5

Note: This table is published in its entirety in the machine-readable format in the electronic version of this manuscript. The description of its columns is shown here for guidance regarding its form and content.

Table 2 notes, in the order in which they appear:

Gaia G magnitude: This is computed from the G-band mean flux applying the magnitude zero-point in the Vega scale. No error is provided for this quantity as the error distribution is only symmetric in flux space. This converts to an asymmetric error distribution in magnitude space which cannot be represented by a single error value (Gaia Collaboration et al. b).

Gaia BP magnitude: Mean magnitude in the integrated BP band. This is computed from the BP-band mean flux applying the magnitude zero-point in the Vega scale. No error is provided for this quantity as the error distribution is only symmetric in flux space. This converts to an asymmetric error distribution in magnitude space which cannot be represented by a single error value (Gaia Collaboration et al. b).

Gaia RP magnitude: Mean magnitude in the integrated RP band. This is computed from the RP-band mean flux applying the magnitude zero-point in the Vega scale. No error is provided for this quantity as the error distribution is only symmetric in flux space. This converts to an asymmetric error distribution in magnitude space which cannot be represented by a single error value (Gaia Collaboration et al. b).

Gaia PMRA: Proper motion in right ascension of the source in ICRS at the reference epoch. This is the tangent plane projection of the proper motion vector in the direction of increasing right ascension (Gaia Collaboration et al. b).

Gaia PMDE: Proper motion in declination of the source at the reference epoch. This is the tangent plane projection of the proper motion vector in the direction of increasing declination (Gaia Collaboration et al. b).

2MASS ID: Sexagesimal, equatorial position-based source name in the form hhmmssss+ddmmsss[ABC...] (Skrutskie et al. ( )).

2MASS magnitudes: This is the selected "default" magnitude for each band, [JHK]. If the source is not detected in the band, this is the 95% confidence upper limit derived from a 4" radius aperture measurement taken at the position of the source on the Atlas Image. The origin of the default magnitude is given by the first character of the Rflg value (Rflg) in the original catalog. This column is null if the source is nominally detected in the band, but no useful brightness estimate could be made (Rflg = "9") (Skrutskie et al. ( )).

2MASS magnitude errors: Combined, or total photometric uncertainty [JHK]msigcom for the default magnitude in that band. The combined uncertainty is derived from the following relation: e[JHK]mag = sqrt([JHK]cmsig^2 + [JHK]zperr^2 + fferr^2 + [rnormrms^2]), where cmsig = corrected band photometric uncertainty, zperr = nightly photometric zero point uncertainty = . mag, fferr = flat-fielding residual error = . mags, rnormrms = R normalization uncertainty = . mags (applied only for sources with Rflg = "1"). This column is null if the default magnitude is a 95% confidence upper limit (i.e. the source is not detected, or inconsistently deblended in the band) (Skrutskie et al. ( )).

GALEX coordinates: Position of the NUV source, or FUV if seen in FUV only (Bianchi et al. ).

Pan-STARRS coordinates: Coordinates from single epoch detections (weighted mean) in equinox J2000 at the mean epoch given by the mean epoch from the object (Chambers et al. ).

Table 3. All of the possible matches for the cross match between Gaia and 2MASS. For more detailed descriptions of the columns in this table, look at the corresponding columns in Table 2.
Columns: Gaia ID; G [mag]; α_Gaia [deg]; δ_Gaia [deg]; 2MASS ID; J, σ_J, H, σ_H, K, σ_K [mag]; α_2MASS, δ_2MASS [deg]; Bayes Prob.; Δθ [arcsec].
Note: This table is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.

Table 4. All of the possible matches for the cross match between Gaia and AllWISE. For more detailed descriptions of the columns in this table, look at the corresponding columns in Table 2.
Columns: Gaia ID; G [mag]; α_Gaia [deg]; δ_Gaia [deg]; AllWISE ID; W1, σ_W1, W2, σ_W2, W3, σ_W3, W4, σ_W4 [mag]; α_AllWISE, δ_AllWISE [deg]; Bayes Prob.; Δθ [arcsec].
Note: This table is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.

Table 5. All of the possible matches for the cross match between Gaia and GALEX DR. For more detailed descriptions of the columns in this table, look at the corresponding columns in Table 2.
Columns: Gaia ID; G [mag]; α_Gaia [deg]; δ_Gaia [deg]; GALEX DR ID; FUV, σ_FUV, NUV, σ_NUV [mag]; α_GALEX, δ_GALEX [deg]; Bayes Prob.; Δθ [arcsec].
Note: This table is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.

Table 6. All of the possible matches for the cross match between Gaia and Pan-STARRS. For more detailed descriptions of the columns in this table, look at the corresponding columns in Table 2.
Columns: Gaia ID; G [mag]; α_Gaia [deg]; δ_Gaia [deg]; Pan-STARRS ID; g, σ_g, r, σ_r, i, σ_i, z, σ_z, y, σ_y [mag]; α_PS, δ_PS [deg]; Bayes Prob.; Δθ [arcsec].
Note: This table is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.

Table 7. All of the possible matches for the cross match between Gaia and SDSS DR12. For more detailed descriptions of the columns in this table, look at the corresponding columns in Table 2.
Columns: Gaia ID; G [mag]; α_Gaia [deg]; δ_Gaia [deg]; SDSS DR12 ID; u, σ_u, g, σ_g, r, σ_r, i, σ_i, z, σ_z [mag]; α_SDSS, δ_SDSS [deg]; Bayes Prob.; Δθ [arcsec].
Note: This table is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.

Table 8. All of the possible matches for the cross match between Gaia and RAVE DR5. For more detailed descriptions of the columns in this table, look at the corresponding columns in Table 2.
Columns: Gaia ID; G [mag]; α_Gaia [deg]; δ_Gaia [deg]; RAVE DR5 ID; BT, σ_BT, VT, σ_VT [mag]; α_RAVE, δ_RAVE [deg]; Bayes Prob.; Δθ [arcsec].
Note: This table is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.
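The selection rule of Section 3.6 can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repository: the function names, the tuple layout of `candidates`, and the choice to return a probability of zero when no band has a valid measurement are all assumptions made for this example. It multiplies the per-band probabilities, forces non-primary (duplicate) entries to zero, keeps the highest-probability candidate above the 95% threshold, and shows why catalogs with more bands need higher per-band probabilities on average.

```python
import math

THRESHOLD = 0.95  # total-probability cut used for "best matches"


def total_probability(band_probs, is_primary=True):
    """Product of per-band probabilities, skipping bands with no valid magnitude (None)."""
    if not is_primary:
        # Entries flagged as duplicates are forced to a probability of zero.
        return 0.0
    valid = [p for p in band_probs if p is not None]
    if not valid:
        # Assumption for this sketch: no usable photometry means no match.
        return 0.0
    return math.prod(valid)


def per_band_requirement(n_bands, threshold=THRESHOLD):
    """Average per-band probability needed for n_bands to reach the threshold."""
    return threshold ** (1.0 / n_bands)


def best_match(candidates, threshold=THRESHOLD):
    """Pick the candidate with the largest total probability above the threshold.

    candidates: list of (catalog_id, band_probs, is_primary) tuples.
    Returns (catalog_id, total_probability) or None if nothing passes.
    """
    scored = [(total_probability(probs, primary), cid)
              for cid, probs, primary in candidates]
    passing = [item for item in scored if item[0] > threshold]
    if not passing:
        return None
    prob, cid = max(passing)
    return cid, prob
```

For three bands `per_band_requirement(3)` is about 0.9830 and for five bands `per_band_requirement(5)` is about 0.9898, which reproduces the 98.30% and 98.98% averages quoted for 2MASS and Pan-STARRS.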
PHOTOMETRIC METALLICITY DETERMINATION

With the cross-matched data discussed above, we can now assemble a catalog of stars that includes not only a large variety of photometric bands for all objects, but also spectra for a subset of ∼

4.1. K and Early M Dwarf Relationship
Thanks to the SDSS cross match discussed in the previous section, our database comprises spectroscopic data for ∼ > T eff > Gaia photometry and spectral type from Pecaut & Mamajek (2013). To complement the APOGEE spectra at both the lower mass end and the lower metallicity end, we also use the catalog from Hejazi et al. (2020), who derived stellar parameters from low-resolution spectra of 1544 high proper motion M dwarfs and subdwarfs, all of which are included in our database. Additionally, we focus on stars that have grizy photometry from Pan-STARRS, as it has been shown that low-mass stars show large variations in their optical colors due to molecular band opacities, and that some of these variations are tied to differences in chemical composition (Lépine et al. 2007). We also require that the stars have counterparts in 2MASS and AllWISE, as it has also been shown that optical colors become degenerate in relation to metallicity at the very low mass end (e.g. Schmidt et al. 2016). Due to saturation issues with the Pan-STARRS photometric bands, we only include sources with g > . E ( B − V ) > . 05. Values of A λ are calculated for Pan-STARRS, assuming R_V = 3.1, using the results from Schlafly & Finkbeiner (2011), and A λ values for 2MASS and AllWISE using the results from Davenport et al. (2014). These values of A λ are used to correct all photometric measurements for extinction. This yields a training sample of 6197 stars with APOGEE spectra, plus 173 stars from Hejazi et al. (2020) that meet all criteria and have listed magnitudes in all bands being considered for the calibration (i.e. g, r, i, z, y, J, H, K s, W1 and W2).

To determine a relationship between photometry and metallicity we use, in an iterative manner, a Gaussian Process Regressor implemented in scikit-learn (Buitinck et al. 2013) with an RBF kernel and a white-noise kernel proportional to the average error of the derived abundances. In the first iteration, the sample includes all of the K and early M dwarfs with APOGEE spectra that have g > . 5. To determine the optimal set of colors and absolute magnitudes for the metallicity prediction, we start out by performing the regression on a training subset consisting of 60% of the overall sample, determined by splitting the sample into five metallicity bins and selecting 60% from each bin to guarantee proportional representation in all color combinations and absolute magnitudes for the chosen photometry. The regression is then evaluated using the mean squared error of the regression on the testing subset consisting of the remaining 40% of the overall sample. Next, to determine the color or absolute magnitude that least affects the regression, each color and absolute magnitude is individually removed in turn, the regression is performed again, and the new mean squared error is compared to the initial value obtained when all colors and absolute magnitudes are in use. The color or absolute magnitude that produces the smallest change in the mean squared error is then omitted for the remaining iterations, as it has the least effect on the overall regression. This process is repeated until one color or absolute magnitude remains. The optimal set of colors and absolute magnitudes that provides the best metallicity prediction is determined to be the set of N colors and magnitudes that minimizes the mean squared error for the entire process.

With this first iteration completed, we perform additional cleaning of our sample by attempting to remove unresolved binaries, as their derived metallicities in spectroscopic surveys will be inaccurate due to their properties being derived from synthetic spectra of single stars. As unresolved binaries appear over-luminous on an HR diagram compared with stars of similar metallicity, we use the best related color and absolute magnitude combination from the first iteration to clean the sample in the following manner. We plot subsets of stars from each of 13 metallicity bins of 0.1 dex in width, defined in the range −0.8 < [M/H] < 0.5. To each bin we fit a single-star main sequence using a fourth degree polynomial, where the fit is performed iteratively by removing stars with d M,i > d M + 1 . σ d M (i.e. stars significantly above the single-star main sequence on an HR diagram) and d M,i < d M − σ d M (i.e. stars significantly below the single-star main sequence on an HR diagram). Here d M,i is the distance in absolute magnitude from the fitted relationship of the i th star in the sample, defined as d M,i = M fit − M i; d M is the mean distance in absolute magnitude from the fitted relationship for all stars in each step; and σ d M is the standard deviation in absolute magnitude around the fitted relationship for all stars in each step. Iterations continue until no more stars are removed. Due to low number statistics, we are however unable to fit color-magnitude relationships for stars with M/H ≤ −0.8, so we exclude objects with d M > d M using the color-magnitude polynomial relationship for stars with − . < M/H ≤ − . − . < M/H ≤ − . N inputs that provided the minimum mean squared error for the testing subset during the entire process. For the cleaned sample, the optimal set in this case consists of M g , g − y , y − W J − W W − W 2. The results of the regressor for the training subset (the subset used to train the regressor) and the testing subset (the subset used to evaluate the regressor) are shown in Figure 10. A comparison of predicted and observed metallicity values (Figure 10, second row) suggests that the regressor is capable of predicting the metallicities from the input colors and absolute magnitudes to a precision of ∼ σ scatter. Additionally, the residual error in predicted vs. measured metallicity (Figure 10, bottom row) demonstrates that this scatter is roughly constant over the metallicity range under consideration and that systematic offsets are only apparent at the very metal poor (and sparse) end of the distribution.

The actual regressor can be loaded via a Python script; the file with the trained scikit-learn regressor, along with example code to load and use the regressor to predict metallicities, is hosted on our GitHub repository (https://github.com/imedan/gpr_metallicity_relationship).

Table 9. Polynomial fits to the single star main sequence for thirteen bins of metallicity for K and early M dwarfs. Coefficients of the polynomial fits describe the relationship $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$, where y = M K and x = g − W1. The last two columns of the table give the mean offset and standard deviation, in absolute magnitude, from the relationship; these are used to identify stars deemed over- or under-luminous.
Columns: M/H Range; a_0; a_1; a_2; a_3; a_4; d M; σ d M.

Figure 9. HR diagrams of M K vs. g − W g > . 5. The left panel is the original subset, the middle panel is the subset after unresolved binaries have been removed, using the method described in Section 4.1, and the right panel shows the polynomial fits from Table 9 overlaid on the subset after the removal of unresolved binaries, where the color of the line corresponds to the bin of the polynomial fit.

4.2. M Dwarf Relationship
Next we attempt to determine a photometry-metallicity relationship for M dwarfs cooler than 3500K, in the same manner as for the K and early M dwarfs (see Section 4.1). The main challengeis to identify a training set for these cooler stars. As discussed above, the stellar parameters forlater-type M dwarfs determined from APOGEE are not as reliable as for the K dwarfs. However,since we now have a reliable photometric relationship for the K and early M dwarfs, we can exploitthe wide binaries in our high proper motion sample and infer the metallicities of the cooler M dwarfsecondaries by proxy of their K dwarf primary. To do this, we use the SUPERWIDE catalog of widebinaries from Hartman & L´epine (2020), which lists ∼ Gaia through a Bayesian method. Similar to above, the photometry of the stars in eachbinary is corrected for extinction prior to the analysis. From this catalog, we photometrically selectall pairs with a K and early M dwarf primary that also have an M dwarf secondary cooler than3500 K, again based on the relationship between
Gaia photometry and spectral type from Pecaut& Mamajek (2013). Metallicities for the primaries are determined using the final regressor fromthe Section 4.1 above, and all unresolved binaries for the primary sources are removed using thepolynomial relationships and cuts determined for the single star main sequence found in the previoussection. This results in a set of 4,859 primary stars, used as proxy for the metallicity values of theircooler M dwarf secondaries. Additionally, we complement this sample with 364 cool, single-star Mdwarfs ( < M r , r − W i − K . The results of the regressor for the training subset (the subset used to train theregressor) and the testing subset (the subset used to evaluate the regressor) are shown in Figure 12. Acomparison of the predicted and “observed” metallicity values (Figure 12, top row), suggest that the1 Figure 10.
Results of the Gaussian Process Regressor on the single-star data set for K and early M dwarfsusing the following combination of colors and absolute magnitudes: M g , g − y , y − W J − W W − W Table 10.
Polynomial fits to the single star main sequence for thirteen bins of metallicity for M dwarfs. Coefficients of the polynomial fits describe the relationship $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$, where y = M W and x = g − W1. The last two columns of the table give the mean offset and standard deviation, in absolute magnitude, from the relationship; these are used to identify stars deemed over- or under-luminous.
M/H Range    a0    a1    a2    a3    a4    dM    σ dM
Figure 11.
HR diagrams of M W vs. g − W
regressor is generally capable of predicting metallicities from the input colors and absolute magnitudes with some accuracy. However, the regressor struggles with more metal-poor stars: the scatter in the predicted vs. “observed” metallicities (Figure 12, middle row) shows a 1σ spread of 0.21 dex for the testing subset, though this does not account for the compounded error from basing this regressor on the results from
Table 11.
Number of high proper motion sources matched with a total probability > · · · †
† The match done by Marrese et al. (2019) was to SDSS DR9 rather than DR12.
the K and early M dwarf regressor. Finally, the residual error in predicted vs. “observed” metallicity (Figure 12, bottom row) demonstrates that this scatter is not uniform over the full metallicity range, and that there are systematic offsets, with metal-poor stars having over-estimated predicted values, and metal-rich stars having under-estimated predicted values.
DISCUSSION
5.1. Cross-Match Efficiency
The main motivation of this study was to improve the cross-match to external catalogs for Gaia stars with the largest proper motions. Our overall matching rates to the high proper motion subset are listed in Table 11. We compare our percentage of high probability matches to that of the cross-match by Marrese et al. (2019). By extracting our high proper motion targets with external catalog matches, we found that the Best-Neighbor catalog from Marrese et al. (2019) for the same subset yields the match rates listed in Table 11. The comparison shows that our method, which combines astrometry and photometry, performs better for most catalogs, with the only exception being RAVE DR5.
Another way to examine the improvement in the match is to look at the fraction of Gaia sources matched as a function of proper motion for a small, uncrowded region of the sky where all external catalogs should have 100% sky coverage when magnitude limits are not considered. Figure 13 shows this for the four external catalogs with the largest number of matched sources. For external catalogs that are not as deep as Gaia (2MASS and AllWISE), we expect that the fraction matched should decrease with decreasing proper motions, because fainter sources beyond the magnitude limit of the external catalogs are more likely to have small proper motions. This pattern is indeed apparent in the 2MASS and AllWISE recovery rates both from Marrese et al. (2019) and from the present study (see Figure 13). For external catalogs deeper than Gaia (SDSS and Pan-STARRS), we expect that nearly 100% of the Gaia sources should be matched in uncrowded fields. We find this to be the case for our cross-match results, but the Marrese et al. (2019) recovery rates tell a different story, as it is the high proper motion stars that have a lower match rate. In particular, the match to Pan-STARRS from Marrese et al. (2019) is proper motion dependent in a way that suggests significant issues in
Figure 12. Results of the Gaussian Process Regressor on the single star data set for M dwarfs using the following combination of colors and absolute magnitudes: M r , r − W i − K . The first column shows the result for the training subset (the subset used to train the regressor) and the second column the testing subset (the subset used to evaluate the regressor). The top row compares the predicted metallicity from the regressor to the “observed” metallicity, i.e., values inferred either from their brighter primaries using our K/early-M regressor or from the spectroscopic analysis of Hejazi et al. (2020); the middle row shows the distribution of the differences between the predicted and APOGEE metallicity; and the bottom row shows the residual error in the predicted versus the “observed” metallicity.
Figure 13. Recovered fraction of Gaia stars within 3° of α, δ = 200°, ° that were matched to 2MASS, SDSS, Pan-STARRS and AllWISE by this study (blue bins) compared with the recovered fraction in the match by Marrese et al. (2019) (orange bins). For all external catalogs, this region of sky is 100% covered by the survey, and thus any deviations from a nearly 100% match rate are either due to differences in magnitude limit or some systematic errors in the survey positions.
the Pan-STARRS pipeline in regards to the mean positioning or epoch determination, which then results in poor match rates for high proper motion stars unless these systematic errors are accounted for, as we have done in the present study. We note that these issues also seem present in Marrese et al. (2019) for stars with µ < mas yr⁻¹, demonstrating that additional improvements could be made using our method. Performing these matches is outside the scope of the current study, but volume limited cross-matches are planned for future work.
This large improvement for matching high proper motion stars with Pan-STARRS, when taking into account the footprint of the survey (i.e. δ > − °), results in a match rate of 99.8% for our method, as compared to 20.8% in the same region for the match by Marrese et al. (2019). To demonstrate that our additional matches are genuine, Figure 15 shows the HR diagram using Pan-STARRS photometry combined with matched Gaia parallaxes for three groups: “Same Matches” consists of
Gaia stars that are matched to the same Pan-STARRS source by us and by Marrese et al. (2019); “Different Matches” consists of Gaia stars that are matched to different Pan-STARRS sources by us and by Marrese et al. (2019); and “Mutually Exclusive Matches” consists of Gaia stars for which one method finds a match while the other does not. This figure shows that the large increase in the match rate is in fact real, as we are getting clean main-sequence, giant, and white dwarf loci for the stars in the mutually exclusive group.
In addition, a look at the “Different Matches” group shows two HR diagrams that look relatively similar in both studies (notably at the faint end), despite matching with different Pan-STARRS sources. We attribute this to the existence of a significant number of duplicates in the Pan-STARRS catalog. This appears to be especially common for high proper motion objects, likely because the Pan-STARRS pipeline is not perfectly able to match detections of fast moving objects at different epochs. The HR diagram from our study, however, does show a “cleaner” main sequence (with less scatter) at the bright end, which suggests that our study generally finds the Pan-STARRS duplicate entry with the more reliable photometry. Part of this success comes from our more efficient accounting of epoch differences, but also from our Bayesian probability method having a preference for matches with smaller magnitude differences to the Gaia sources, as demonstrated when comparing the 99% lines in the Bayesian probability distributions in Figures 6 and 14.
While we see the largest improvement in the Pan-STARRS cross-match, we also see modest, but significant, improvements in the cross-match to other catalogs, like SDSS. Figure 16 compares standard [g − r, r − i] color-color diagrams for our match and the match by Marrese et al. (2019), where the layout is the same as in Figure 15. Two notable trends are apparent. First, we do find that for some matches that differ between the studies (“Different Matches” group) our cross-match again produces a cleaner subset with better photometry, i.e., more consistent with the expected main-sequence locus, which suggests that our match is more accurate.
Second, we can see in the “Mutually Exclusive Matches” group that our matches show a denser main-sequence locus, and also show a clear “white dwarf-M dwarf binary bridge”, which is a well-known arc that connects the white dwarf and M dwarf loci on the color-color diagrams (e.g. Augusteijn et al. 2008; Liu et al. 2012). In contrast, the “Mutually Exclusive Matches” from Marrese et al. (2019) show more stars with poor or inconsistent photometry, notably stars in a diagonal locus redder in r − i than the expected M dwarf locus, and with inconsistent colors for a nearby main-sequence star.
Overall, this demonstrates that our cross-match method not only produces a more complete catalog of counterparts, but also provides cleaner photometry. Our catalog of cross-matches is also “probabilistic”, and users can lower the probability threshold in order to get a more complete catalog at the expense of having a higher rate of false matches.
5.2. Photometric Metallicity Relationships
5.2.1. K and Early M Dwarf Relationship
Using machine learning with a Gaussian Process Regressor, we show that it is possible to predictthe metallicity of stars in a testing subset of K and early M dwarfs to within ± Figure 14.
Probability distributions as discussed in Section 3.4 for Pan-STARRS where 12 . < G ≤ . ≤ | b | < .
8. Top left panel: frequency distribution for the “true” sample. Top right: frequencydistribution for the “displaced sample”. Center left: smoothed frequency distribution for the “true” sample.Center right: modeled frequency distribution for the “displaced” sample. Bottom left: difference betweenthe smoothed “true” frequency distribution and the modeled “displaced” frequency distribution using thecorrect scaling factor. Bottom right: resulting Bayesian probability distribution. All plots, except for thedistribution for P ( H | (cid:126)x ), show the number of sources in a bin on a logarithmic scale and they all sharethe same colormap range (indicated by the colorbar on the far right). The median and 68th percentile of i − G as a function of θ are shown on these plots as the red solid and red dashed lines, respectively, wherequantiles are found using COBS (Ng & Maechler 2015). The distribution for P ( H | (cid:126)x ) is shown for a rangeof 0 to 1 and the overlaid, black solid line shows the 99% line. Figure 15.
HR diagrams combining Pan-STARRS photometry with Gaia parallaxes based on the cross-matches from this study (first column) and the cross-matches from Marrese et al. (2019) (second column). The third column shows the fractional difference between the HR diagrams in the first two columns. The top row shows the “Same Matches”, defined as when both studies find the same match to a Pan-STARRS source (both panels identical by definition); the center row displays the “Different Matches”, defined as when both studies find a match to a Gaia source but determine the best match to be a different Pan-STARRS source; and finally the bottom row shows the “Mutually Exclusive Matches”, defined as when one study finds a match to a Gaia source when the other study finds no suitable match.
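The three comparison groups defined in the caption above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_matches` and the dictionary layout (Gaia source ID mapped to the matched external-catalog ID, with `None` or a missing key meaning no match) are hypothetical and are not taken from the code used in this study.

```python
def classify_matches(ours, theirs):
    """Split Gaia sources into the three comparison groups.

    ours, theirs: dict mapping a Gaia source ID to the ID of the matched
    external-catalog source (hypothetical layout; None or a missing key
    means that study found no match).
    """
    same, different, exclusive = set(), set(), set()
    for gaia_id in ours.keys() | theirs.keys():
        a = ours.get(gaia_id)
        b = theirs.get(gaia_id)
        if a is None and b is None:
            # Neither study found a counterpart: not part of any group.
            continue
        if a is not None and b is not None:
            # Both studies matched this Gaia source.
            (same if a == b else different).add(gaia_id)
        else:
            # Only one of the two studies matched this Gaia source.
            exclusive.add(gaia_id)
    return same, different, exclusive
```

Keying on the Gaia source ID keeps the comparison independent of the order in which either catalog lists its matches.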
Schmidt et al. (2016) tested over − < M/H < . − < M/H < .
5. The relationship from Schmidt et al. (2016) also covered a smaller temperaturerange of 3550 < T eff < − . 5) and temperatures (3500 < T eff < Figure 16. Standard [ g − r, r − i ] color-color diagram of the SDSS DR12 sources matched in this study (firstcolumn) and of the SDSS DR9 match done by Marrese et al. (2019) (second column). The third columnshows the fractional difference between the color-color diagrams in the first two columns, where all colormaps are centered on zero, which is represented as white in the color map. Three subsets are represented:“Same Matches”, defined as when both matches find the same match to a Pan-STARRS source (top row;identical for both studies by definition); “Different Matches”, defined as when both matches find a matchto a Gaia source but we determine the best match to be a different Pan-STARRS source (center row); and“Mutually Exclusive Matches”, defined as when one match found a match to a Gaia source when the othermatch did not (bottom row). from our regressor when it is trained and evaluated on the K and early M dwarf data set before theunresolved binaries are removed. Not only is the scatter greater when the binaries are left in (Figure17, first and second row), but the residuals between the photometric metallicity estimates and themeasured metallicity values show the same systematic errors as in previous studies (Figure 17, thirdrow). This demonstrates the importance of removing such contaminates from the training samplebefore extracting a relationship between photometry and metallicity. Another important distinctionbetween our study and Schmidt et al. (2016) is that our regressor uses Pan-STARRS for the opticalmagnitudes, while Schmidt et al. (2016) uses those from SDSS. Due to the much larger sky coverageof Pan-STARRS (including large swaths of the Galactic plane overlooked by SDSS), our regressorcan be used for a much larger number of stars than that of Schmidt et al. 
(2016).
Davenport & Dorn-Wallenstein (2019) also used a machine learning technique, with a k-nearest neighbors regressor, in order to estimate photometric metallicities. They have provided their code (https://github.com/jradavenport/ingot) and data set used for training, and we have used these to predict metallicities for their training sample. We have also included M G as part of the inputs for their regressor, which they note improves the final results. The results from the Davenport & Dorn-Wallenstein (2019) regressor are shown in Figure 18 for comparison. The scatter in the distribution is found to be comparable to that from our own regressor, although the tails of our scatter distribution are much lower. The systematic trend of the metallicities of metal-poor stars being over-estimated, and the metallicities of metal-rich stars being under-estimated, is however stronger in the Davenport & Dorn-Wallenstein (2019) regressor results, suggesting that the sample used for training includes contaminants in the form of unresolved binaries, which are skewing the results.
Figure 17. Results of the Gaussian Process Regressor on the training set of K and early M dwarfs but with no preliminary removal of suspected binaries (compare with Figure 10), and again using the following combination of colors and absolute magnitudes: M g , g − y , y − W J − W W − W 2. The first column shows the result for the training subset (the subset used to train the regressor) and the second column the testing subset (the subset used to evaluate the regressor). The top row compares the predicted metallicity from the regressor to the actual value from APOGEE, the middle row shows the distribution of the differences between the predicted and APOGEE metallicity, and the bottom row shows the residual error in metallicity versus the APOGEE metallicity. Compared with the training subset cleaned of suspected binaries (Figure 10), this “unclean” subset yields a regressor which introduces significantly larger systematic and random errors, thus demonstrating the need to use a “cleaned” training subset.
Figure 18. Results of the k-nearest neighbors regressor from Davenport & Dorn-Wallenstein (2019). The left panel compares the predicted metallicity from the regressor to the actual value from APOGEE, the middle panel shows the distribution of the differences between the predicted and APOGEE metallicity for Davenport & Dorn-Wallenstein (2019) (blue bins) and this study (orange bins), and the right panel shows the residual error in metallicity versus the APOGEE metallicity.
5.2.2. M Dwarf Relationship
While our removal of unresolved binaries in the K and early M subset appears to be successful in reducing systematic errors, we cannot currently achieve the same results for our M dwarf calibration. Our current M dwarf regressor does show significant systematic errors consistent with binary star contamination (Figure 17), which suggests that our attempted removal of unresolved binaries is less effective than in the K and early M subset. This result is affirmed by the polynomial fits in the right panel of Figure 11, where it is clear the fits did not converge and effectively remove all of the unresolved binaries. This result is to be expected if we consider that the main sequence is significantly wider for M dwarfs, thus making the removal of unresolved binaries more challenging with our polynomial method. Binary removal is especially challenging for metal-poor stars, which have minimal representation in the training subset in any case.
To further illustrate the challenge of training a regressor for metal-poor M dwarfs, we applied both of the trained regressors to the entirety of the cross-matched high proper motion sample in this study. This resulted in 716,651 K and early M dwarfs, and 1,030,761 M dwarfs that have the required photometry, have a parallax error < − . < M/H < .
Table 12.
List of predicted metallicities for 716,651 K and early M dwarfs using the Gaussian Process Regressor outlined in Section 4.1. The regressor uses the following data inputs, M g , g − y , y − W J − W W − W 2, and stars are included only if they pass the single star polynomial cuts outlined in Table 9.
Gaia ID               α Gaia [deg]    δ Gaia [deg]      M/H [dex]   σ M/H [dex]
2738617902668633344   0.00059265319     1.41569388809     0.182       0.055
2420684907087327104   0.00064439500   -14.44069806218     0.175       0.057
384458916557103744    0.00098644795    42.92964411582    -0.100       0.063
2422936569461883648   0.00149210844    -9.59710946245    -0.320       0.057
2422910181182822784   0.00181091004    -9.91118104462     0.138       0.054
2448241275523065856   0.00190792593    -2.76638530879    -1.609       0.156
2335002057582474752   0.00194199280   -25.60630079213     0.302       0.056
2334659529646655744   0.00206867958   -26.54384648213    -0.268       0.065
2341416475275983872   0.00212583048   -21.25877753606     0.424       0.057
2414490258576137984   0.00254114360   -17.08225464560    -0.129       0.063
Note. This table is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.
estimated metallicity values are listed in Tables 12 and 13. Elimination of suspected binaries using our polynomial cut method removed 23% of the K dwarfs and 10% of the M dwarfs. HR diagrams of these samples are presented in Figure 19, and show a clear deficit of stars at the metal-poor (blue) edge of the main-sequence locus. Figure 20 shows that the distribution of metal-poor K and early-M stars is very spread out compared with the distribution of metal-rich objects, which is more local. The difference comes from the metal-poor stars coming from the local thick disk and halo population, which has a much lower density near the Sun. Figure 21 shows that this is a problem for locating metal-poor M dwarfs, because the Gaia catalog identifies them to a significantly shorter distance range, biasing the sample in favor of more metal-rich M dwarfs.
In the Gaussian Process Regression method, standard deviations of predicted values can be computed, and these are represented as the color scale in the right panels of Figure 19. We find that for the highly sampled metallicity regime (i.e. − . < M/H < . σ spread we found in the testing subset for each regressor. However, the errors become much larger for lower metallicity regimes and regions of the HR diagram that are not as well sampled in the training subset. This is a good indicator that while there are limitations with both of these regressors, the predicted errors at least allow us to determine the range over which the method yields the best results.
Knowing that stellar kinematics is locally correlated with chemical abundances, we validate our predicted metallicity values by examining the range of metallicity estimates for various subsets of K and M dwarfs selected by tangential velocity (Figure 22). For the K and early M dwarfs (Figure 22, left panels), we observe the trend noted in previous studies (e.g. Nordström et al. 2004; Ivezić et al. 2008; Bensby et al. 2014; Grieves et al. 2018) showing a steep negative metallicity gradient as a function of tangential velocity. The trend is however not observed for our subset of M dwarfs (Figure 22, right panels), which shows few metal-poor stars, even at large tangential velocities. We suggest that this again is due to the inherent bias in the initial magnitude-limited, high proper motion Gaia subset, which is efficient at selecting more distant, high velocity K dwarfs of the thick disk and halo, but does a poor job at identifying their M dwarf equivalents because of the magnitude limit. In order to improve our metallicity calibration for the M dwarfs, we would need to include a larger sample of nearby, metal-poor M dwarfs in our training subset, which likely will depend on the use of deeper surveys like SDSS, Pan-STARRS, and the upcoming LSST (Ivezić et al. 2019).
Table 13. List of predicted metallicities for 1,030,761 M dwarfs using the Gaussian Process Regressor outlined in Section 4.2. The regressor uses the following data inputs, M r , r − W i − K , and stars are included only if they pass the single star polynomial cuts outlined in Table 10.
Gaia ID               α Gaia [deg]    δ Gaia [deg]      M/H [dex]   σ M/H [dex]
429379051804727296    0.00021864741    60.89771179398     0.051       0.133
384318827606840448    0.00047931984    42.19465632646     0.308       0.140
2772104109112208256   0.00051294158    15.56517501763     0.039       0.133
2846731811580119040   0.00115069265    21.30261306611     0.329       0.148
2441058204714858240   0.00121147731    -8.90793754062    -0.230       0.138
386559121204819328    0.00193074605    45.04225289803     0.173       0.156
2313055461894623232   0.00204668844   -33.55516167237    -0.008       0.181
2421214317639640064   0.00231957574   -12.53175806608    -0.155       0.133
2334666126716440064   0.00256127213   -26.36534090966    -0.071       0.133
2443194001755562624   0.00271533083    -5.07308824110     0.281       0.138
Note. This table is published in its entirety in the machine-readable format. A portion is shown here for guidance regarding its form and content.
5.2.3. Photometric Metallicities of Nearby Clusters: Hyades and Praesepe
As a final check on our calibrations, we verify the average metallicity of stars in our high proper motion sample (from Section 5.2.2) that are members of nearby open clusters.
Specifically, we considerstars that are members of the Hyades and Praesepe open clusters. We use the open cluster membersfrom Gaia Collaboration et al. (2018a), where members were chosen using an iterative process thatdetermines the cluster velocity and location centroid, as detailed in Gaia Collaboration et al. (2017).We use the Gaia IDs from Gaia Collaboration et al. (2018a) to identify members in our high propermotion sample. The distributions of photometric metallicities from both calibrations are shown inFigure 23. The number of cluster members displayed in Figure 23 is lower than the count fromGaia Collaboration et al. (2018a), especially for the K and early M dwarfs, because we only estimatemetallicities for stars with g P S < . Figure 19. HR diagrams for the 716,651 K and early M dwarfs (top row), and for the 1,030,761 M dwarfs(bottom) for which we estimated metallicities using our Gaussian Process regressor, with the metallicityvalues shown as a color scale in the panels on the left, and the accuracy of the estimate shown as a colorscale in the panels on the right. These represent all stars with the required photometry in the cross matchedhigh proper motion sample, that have a parallax error < − . < M/H < . 5) shownon respective HR diagrams. The panels on the right show the 95% confidence interval for the metallicitymeasurements shown in the left column. A Gaussian fit to the K/early-M metallicity distribution of Hyades members shows an averagevalue of M/H = 0 . 120 and a dispersion of σ M/H = 0 . M/H = 0 . σ M/H = 0 . M/H = 0 . 130 with a dispersion σ M/H = 0 . M/H = 0 . 185 with a dispersion σ M/H = 0 . Figure 20. Height above the Galactic plane (z) versus Galactocentric radius for the 716,651 K and early Mdwarfs with metallicity values estimated from our Gaussian Process Regressor (see Figure 19). Each panelshows a subset in a defined metallicity range, with more metal poor stars near the top and more metal richstars near the bottom. 
Metal-poor stars show a more dispersed distribution consistent with old disk andhalo membership, which metal-rich stars show a more pronounced concentration near the Galactic midplane. [ M/H ] ≈ − . M/H RMS = 0 . ± . 406 for the K and early MDwarfs, and M/H RMS = 0 . ± . 068 for the M dwarfs, as compared to M/H RMS = 0 . ± . M/H RMS = 0 . ± . M/H RMS = 0 . ± . 073 for the M dwarfs, as compared to M/H RMS = 0 . ± . 08 from Netopil et al. (2016). It seems that, within the uncertainties, ourphotometric calibrations agree with the spectroscopic estimates from Netopil et al. (2016).From this test alone, we cannot however rule out systematic errors in our calibrations. This isbecause the metallicity probed by these two clusters is in a region in metallicity where the resultsfrom our calibration indicate either no systematic errors (K and early M dwarfs, Figure 10) orconsiderably lessened systematic errors (M Dwarfs, Figure 12), as compared to more metal-poor andmetal-rich regimes. A test of the metal-poor and metal-rich regimes may prove difficult due to thelimited number of nearby stars clusters, and may have to rely on more direct spectroscopic metallicity6 Figure 21. Height above the Galactic plane (z) versus Galactocentric radius for the 1,030,761 M withmetallicity values estimated from our Gaussian Process Regressor (see Figure 19). Each panel shows asubset in a defined metallicity range, with more metal poor stars near the top and more metal rich starsnear the bottom. Compared with the K and early M dwarf distribution shown in Figure 20, the M dwarfdistribution appears significantly more distance limited, which is due to the Gaia magnitude limit. Thisprevents the inclusion of larger numbers of metal-poor objects. measurements from individual stars. 
With this in mind, we remain cautious about the application of our regressors for metal-poor systems, but with the results in this section do feel confident about the metallicity estimates provided for M dwarfs in the local disk population of the Milky Way.
CONCLUSIONS
In this study we sought to develop a Bayesian framework for cross-matching high proper motion stars in Gaia to photometric external catalogs of various size and wavelength coverage. This was accomplished by comparing two dimensional distributions that take into account photometry and astrometry of a “true” initial catalog query and a “displaced” catalog query, used to infer the distribution of random field stars. Using these distributions, we are able to find match probabilities for all stars within 15″ of our Gaia sources at the mean epoch of the external survey. These sources and their Bayesian match probabilities are included in an accompanying electronic table.
When comparing the counterparts with the highest match probabilities ( > Gaia cross-match presented in Marrese et al. (2019), we find that our method produces a better match rate, with the largest improvement achieved for Pan-STARRS. Additionally, we show that the photometry produced by our method creates a “cleaner” catalog overall, with fewer stars having clearly incorrect colors due to mismatches.
Figure 22. Distribution of estimated photometric metallicity values for different cuts of tangential velocity (top row) and photometric metallicity as a function of tangential velocity (bottom row) for the 716,651 K and early M dwarfs (left column) and 1,030,761 M dwarfs (right column). The red data points and error bars represent the average and standard deviation, respectively, of the predicted metallicity in bins of 50 km/s of tangential velocity. The apparent lack of metal-poor M dwarfs is due to selection effects from the magnitude limit of the Gaia catalog.
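The binned statistics described in the Figure 22 caption can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: the function names and argument layout are hypothetical, and the only physics assumed is the standard conversion v_tan [km/s] = 4.74047 µ/ϖ, with the proper motion µ in mas/yr and the parallax ϖ in mas. The default bin width of 50 km/s follows the caption.

```python
import numpy as np

# Standard conversion factor: km/s per (mas/yr) / (mas).
K_VTAN = 4.74047


def tangential_velocity(pm_mas_yr, parallax_mas):
    """Tangential velocity in km/s from proper motion and parallax."""
    pm = np.asarray(pm_mas_yr, dtype=float)
    plx = np.asarray(parallax_mas, dtype=float)
    return K_VTAN * pm / plx


def binned_mean_std(vtan, metallicity, bin_width=50.0):
    """Mean and standard deviation of metallicity per velocity bin.

    Returns a list of (bin_lower_edge, mean, std) tuples for every
    bin that contains at least one star.
    """
    vtan = np.asarray(vtan, dtype=float)
    metallicity = np.asarray(metallicity, dtype=float)
    # Integer index of the velocity bin each star falls into.
    idx = np.floor(vtan / bin_width).astype(int)
    stats = []
    for i in np.unique(idx):
        sel = idx == i
        stats.append((float(i * bin_width),
                      float(metallicity[sel].mean()),
                      float(metallicity[sel].std())))
    return stats
```

The standard deviation here is the population value (NumPy's default `ddof=0`); whether the figure uses the sample or population form is not stated in the text.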
With this more complete cross matched catalog of high proper motion sources, we identify a trainingsubset of 4,378 K and early M dwarfs with metallicity values measured from SDSS-APOGEE spectra,and recover their optical and infrared photometry from Pan-STARRS, 2MASS, and AllWISE. We usethis training subset to derive a Gaussian Process Regressor that yields metallicity estimates basedon the optical and infrared photometry. Our regressor applies to K and early M dwarfs in the colorrange 0 . < BP − RP < . 39 mag. In addition, we apply the results of the first regressor on a set of3,689 common proper motion binaries and infer metallicity for the later-type M dwarf companions inthese systems. We use this as a training subset to create a second Gaussian Process Regressor, whichwe use to estimate metallicities in later type M dwarfs, in the color range 2 . < BP − RP < Figure 23. Distributions of estimated photometric metallicities for the subset of K and early M dwarfs(top row), and for the subset of later-type M dwarfs (bottom row) for members of the Hyades (left column)and Praesepe (right column), where membership of these open clusters is from Gaia Collaboration et al.(2018a). The solid black line is a fitted Gaussian distribution to the histogram. Gaussian fits are shownwith the black line, and the measured mean and dispersion values are noted with the black dots and theirerrobars. For the K and early M dwarf members of the Hyades, the one member at [ M/H ] ≈ − . M/H RMS = 0 . ± . 039 for the K and early M Dwarfs, and M/H RMS = 0 . ± . 068 for theM dwarfs, as compared to M/H RMS = 0 . ± . 05 from Netopil et al. (2016). The average and RMS errorsfor Praesepe are M/H RMS = 0 . ± . 064 for the K and early M Dwarfs, and M/H RMS = 0 . ± . M/H RMS = 0 . ± . 08 from Netopil et al. (2016). 
9M dwarf regressor, we find larger random errors for the testing subset along with systematic offsets,which we attribute to the contaminating presence of unresolved binaries in our training subset.When applying these regressors to the full catalog of high proper motion stars, we find that themetallicity estimates are consistent with that we would expect from the stellar populations probedby our sample based on their location (galactic height) and kinematics (old/young population).Additionally, when we examine the metallicity distributions of stars in our sample that are membersof the Hyades and Praesepe open clusters, we find average metallicity values that are consistent withvalues measured with high resolution spectroscopy of the more massive cluster members. While we donot notice any systematic errors for these clusters, we are not yet able to test the relationship for themore critical metal-poor metallicity regime, due to limitations with the high proper motion/low-masssample which restricts the census to very local populations. The validation of the relationship in themetal-poor regime might require direct metallicity measurements through spectroscopic observations.Additionally, for the M dwarf regressor specifically, we note that systematic errors are still presentto some degree throughout the metallicity range probed, which is indicative of unresolved binaries inour training sample. Future work will need to be done in order to better remove unresolved binariesfrom the training subset in order to improve the regression and mitigate these issues.ACKNOWLEDGMENTSMr. Medan gratefully acknowledges support from a Georgia State University Second Century Ini-tiative (2CI) Fellowship.This work has made extensive use of an astronomical web-querying package in Python, astroquery (Ginsburg et al. 2019), in the querying of external catalogs.This work has made use of data from the European Space Agency (ESA) mission Gaia Alam, S., Albareti, F. D., Allende Prieto, C.,et al. 
2015, ApJS, 219, 12
Allard, F., Hauschildt, P. H., Alexander, D. R., & Starrfield, S. 1997, ARA&A, 35, 137
Augusteijn, T., Greimel, R., van den Besselaar, E. J. M., Groot, P. J., & Morales-Rueda, L. 2008, A&A, 486, 843
Bensby, T., Feltzing, S., & Oey, M. 2014, A&A, 562, A71
Bianchi, L., Herald, J., Efremova, B., et al. 2011, Ap&SS, 335, 161
Bochanski, J. J., Hawley, S. L., Covey, K. R., et al. 2010, AJ, 139, 2679
Bonfils, X., Delfosse, X., Udry, S., et al. 2005, A&A, 442, 635
Budavári, T., & Szalay, A. S. 2008, ApJ, 679, 301
Buitinck, L., Louppe, G., Blondel, M., et al. 2013, in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 108–122
Casagrande, L., Flynn, C., & Bessell, M. 2008, MNRAS, 389, 585
Casagrande, L., Ramírez, I., Meléndez, J., Bessell, M., & Asplund, M. 2010, A&A, 512, A54
Chambers, K. C., Magnier, E. A., Metcalfe, N., et al. 2016, arXiv e-prints, arXiv:1612.05560
Cutri, R. M., & et al. 2014, VizieR Online Data Catalog, II/328
Davenport, J. R., & Dorn-Wallenstein, T. Z. 2019, Research Notes of the American Astronomical Society, 3, 54
Davenport, J. R., Ivezić, Ž., Becker, A. C., et al. 2014, MNRAS, 440, 3430
Dittmann, J. A., Irwin, J. M., Charbonneau, D., & Newton, E. R. 2016, ApJ, 818, 153
Eisenstein, D. J., Weinberg, D. H., Agol, E., et al. 2011, AJ, 142, 72
Gaia Collaboration, van Leeuwen, F., Vallenari, A., et al. 2017, A&A, 601, A19
Gaia Collaboration, Babusiaux, C., van Leeuwen, F., et al. 2018a, A&A, 616, A10
Gaia Collaboration, Brown, A. G. A., Vallenari, A., et al. 2018b, A&A, 616, A1
Ginsburg, A., Sipőcz, B. M., Brasseur, C. E., et al. 2019, AJ, 157, 98
González Hernández, J. I., & Bonifacio, P. 2009, A&A, 497, 497
Green, G. M., Schlafly, E., Zucker, C., Speagle, J. S., & Finkbeiner, D. 2019, ApJ, 887, 93
Grieves, N., Ge, J., Thomas, N., et al. 2018, MNRAS, 481, 3244
Hartman, Z. D., & Lépine, S. 2020, ApJS, 247, 66
Hejazi, N., Lépine, S., Homeier, D., Rich, R. M., & Shara, M. M. 2020, AJ, 159, 30
Holtzman, J. A., Hasselquist, S., Shetrone, M., et al. 2018, AJ, 156, 125
Ivezić, Ž., Sesar, B., Jurić, M., et al. 2008, ApJ, 684, 287
Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111
Kunder, A., Kordopatis, G., Steinmetz, M., et al. 2017, AJ, 153, 75
Lépine, S., & Bongiorno, B. 2007, AJ, 133, 889
Lépine, S., Rich, R. M., & Shara, M. M. 2007, ApJ, 669, 1235
Lindgren, S., Heiter, U., & Seifahrt, A. 2016, A&A, 586, A100
Liu, C., Li, L., Zhang, F., et al. 2012, MNRAS, 424, 1841
Marrese, P. M., Marinoni, S., Fabrizio, M., & Altavilla, G. 2019, A&A, 621, A144
Netopil, M., Paunzen, E., Heiter, U., & Soubiran, C. 2016, A&A, 585, A150
Newton, E. R., Charbonneau, D., Irwin, J., et al. 2014, AJ, 147, 20
Ng, P. T., & Maechler, M. 2015, COBS: COnstrained B-Splines, ascl:1505.010
Nordström, B., Mayor, M., Andersen, J., et al. 2004, A&A, 418, 989
Pecaut, M. J., & Mamajek, E. E. 2013, ApJS, 208, 9
Ramírez, I., & Meléndez, J. 2005, ApJ, 626, 465
Ricker, G. R., Winn, J. N., Vanderspek, R., et al. 2015, Journal of Astronomical Telescopes, Instruments, and Systems, 1, 014003
Rojas-Ayala, B., Covey, K. R., Muirhead, P. S., & Lloyd, J. P. 2010, ApJL, 720, L113
Schlafly, E. F., & Finkbeiner, D. P. 2011, \\