
Submitted to ApJ

Preprint typeset using LaTeX style emulateapj v. 11/10/09

CROSS-IDENTIFICATION OF STARS WITH UNKNOWN PROPER MOTIONS

Gyöngyi Kerekes, Tamás Budavári, István Csabai, Andrew J. Connolly, Alexander S. Szalay

Draft version October 26, 2018

ABSTRACT

The cross-identification of sources in separate catalogs is one of the most basic tasks in observational astronomy. It is, however, surprisingly difficult and generally ill-defined. Recently Budavári & Szalay (2008) formulated the problem in the realm of probability theory and laid down the statistical foundations of an extensible methodology. In this paper, we apply their Bayesian approach to stars with detectable proper motion and show how to associate their observations. Using a sample of stars in the Sloan Digital Sky Survey, we study models that allow for an unknown proper motion per object and demonstrate the improvements over the analytic static model. Our models and conclusions are directly applicable to upcoming surveys such as PanSTARRS, the Dark Energy Survey, SkyMapper, and the LSST, whose data sets will contain hundreds of millions of stars observed multiple times over several years.

Subject headings: astrometry; catalogs; stars: statistics; methods: statistical

Eötvös Loránd University, Department of Physics of Complex Systems, Pázmány P. sétány 1/A, Budapest, 1117, Hungary
Department of Physics and Astronomy, The Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
Institute for Advanced Study, Collegium Budapest, Szentháromság u. 2, H-1014 Budapest, Hungary
Department of Astronomy, University of Washington, 3910 15th Avenue NE, Seattle, WA 98195, USA

1. INTRODUCTION

At the heart of many astronomical studies today is the basic step of catalog merging: combining measurements from different time intervals, wavelengths, and potentially separate instruments and telescopes. Scientific analyses exploit these multicolor cross-matches to understand the temporal and photometric nature of the underlying objects. In doing so they rely implicitly on the quality of the associations; thus the cross-identification of sources is arguably one of the most important steps in measuring the properties of celestial objects.

In general, cross-matching catalogs is a difficult problem that cannot really be separated from the scientific question at hand. The case of stellar observations illustrates this. Stars that move between observations, due to their proper motions, are difficult to merge into multicolor sources (even within a single survey). Yet without the multicolor information it might not be possible to classify the source as a star in the first place. With a new generation of surveys that will take large quantities of multicolor photometry covering the Galactic Plane over a period of several years (e.g., the Panoramic Survey Telescope & Rapid Response System, PanSTARRS, and the Large Synoptic Survey Telescope, LSST), addressing these issues is clearly becoming a serious concern.

In the recent work of Budavári & Szalay (2008), a general probabilistic formalism was introduced that is extendable to arbitrarily complex models. The strength of Bayesian hypothesis testing is that it clearly separates the contributions of different types of measurements, e.g., the position on the sky or the colors of the sources, yet naturally combines them into a coherent method. It is a generic framework that provides the prescription for the calculations, which can be refined with increasingly sophisticated modeling.

In this paper, we go beyond the simple case of stationary objects and study the cross-identification of point sources that move on the sky. In particular, we focus on stars that can be significantly offset between the epochs of observation. Although we generally have only loose constraints on their proper motions, this prior knowledge is enough to revise our static models and work out the Bayesian evidence of the matches. In Section 2 we introduce a class of models that allow for changes in position over time. Section 3 deals with the a priori constraints on the proper motions of the stars and their empirical ensemble statistics. In Section 4 we show the improvements over the static model on actual observations of stars, and Section 5 concludes our study. Throughout this paper, we adopt the convention of using the capital P for probabilities and the lower-case p for probability densities.

2. PROPER MOTION

Conceptually, modeling the position of moving sources is straightforward. The description combines the motion and the uncertainty of the astrometric measurements. The first question to answer is where on the sky one should expect to see an object of a certain proper motion, if it had been at some known position at a given time. Next, we calculate the evidence that given detections are truly observations of the same object.

2.1. Multi-epoch Models

The positional accuracy is characterized by a probability density function (hereafter PDF) on the celestial sphere. In a given model M, the function p(x | r, M) tells us where to expect the detections x of an object that is at its true location r. Throughout this paper, we use 3-dimensional unit vectors for positions on the sky, e.g., the aforementioned x and r. Usually the PDF is a very sharp peak and is assumed to be a normal distribution with some angular accuracy σ. The correct generalization to directional measurements is the Fisher (1953) distribution,

$$F(\mathbf{x} \mid \mathbf{r}, w) = \frac{w\,\delta(|\mathbf{x}| - 1)}{4\pi \sinh w}\, \exp(w\, \mathbf{r} \cdot \mathbf{x}) \tag{1}$$

whose shape parameter w is essentially 1/σ² in the limit of large concentration; see details in Budavári & Szalay (2008).

The added complication comes from the fact that some objects are not stationary. If a given star is at location r now and has proper motion µ, then a time Δt later it would be at some other position r′,

$$\mathbf{r}' = \mathbf{r}'(\Delta t;\, \mathbf{r}, \boldsymbol{\mu}) \tag{2}$$

that is offset by a small displacement along a great circle. By substituting this position into our astrometric model, we create a new model M′ with two added parameters, the proper motion µ and the time difference Δt:

$$p(\mathbf{x} \mid \Delta t, \mathbf{r}, \boldsymbol{\mu}, M') = F\big(\mathbf{x} \mid \mathbf{r}'(\Delta t;\, \mathbf{r}, \boldsymbol{\mu}),\, w\big) \tag{3}$$

Naturally, nothing in this is specific to the chosen characterization of the astrometry; one can use any appropriate PDF in place of the Fisher distribution.

2.2. The Bayes Factor

At the heart of probabilistic cross-identification is the Bayes factor used for hypothesis testing. The question we are asking is whether our data D, a set of detected sources in separate catalogs with positions $\{\mathbf{x}_i\}$, truly come from the same object. For every catalog, we know its epoch and its astrometry, characterized by a known PDF $p_i$. Let H denote the hypothesis that all measured positions are observations of the same object, and let K denote its complement, i.e., that any one or more of the detections might belong to a separate object. By definition, the Bayes factor is the ratio of the likelihoods of the two hypotheses we wish to compare,

$$B(H, K \mid D) = \frac{p(D \mid H)}{p(D \mid K)} \tag{4}$$

which are calculated as integrals over their entire parameter spaces.

If we assume that there is a single object behind the observations, we can integrate over its unknown proper motion and position to calculate

$$p(D \mid H) = \int d\mathbf{r} \int d\boldsymbol{\mu}\; p(\mathbf{r}, \boldsymbol{\mu} \mid H) \prod_{i=1}^{n} p_i(\mathbf{x}_i \mid \Delta t_i, \mathbf{r}, \boldsymbol{\mu}, H) \tag{5}$$

where the likelihood of H given the data is written as the product of the independent components, and p(r, µ | H) is the prior on the parameters, which is the subject of the following section. The actual calculation of this likelihood depends on the prior and might only be accessible via numerical methods.

The complementary hypothesis is more complicated in the sense that the model has a set of independent objects with parameters $\{\mathbf{r}_i, \boldsymbol{\mu}_i\}$; however, the result of the calculation turns out to be much simpler. Here the integral separates into the product

$$p(D \mid K) = \prod_{i=1}^{n} \int d\mathbf{r}_i \int d\boldsymbol{\mu}_i\; p(\mathbf{r}_i, \boldsymbol{\mu}_i \mid K)\, p_i(\mathbf{x}_i \mid \Delta t_i, \mathbf{r}_i, \boldsymbol{\mu}_i, K) \tag{6}$$

For each integral, we can select a reference time such that $\Delta t_i = 0$; hence the effect of the proper motion drops out, and we arrive at the same result as in the stationary case discussed by Budavári & Szalay (2008).
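The ingredients of Equations (1), (3) and (5) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' code: positions are unit 3-vectors; the proper motion is given as east/north components in mas/yr (with the cos δ factor already applied to the east component) and turned into a tangent-plane vector; the displaced position uses the normalized small-step form that appears later as Equation (13); and all function names are hypothetical. The Fisher log-density uses 4π sinh w = 2π e^w (1 − e^{−2w}) and 1 − r·x = |x − r|²/2, so that concentrations as large as w ≈ 1/σ² ~ 10^12 neither overflow nor lose precision.

```python
import numpy as np

MAS = np.pi / (180.0 * 3600.0 * 1000.0)  # radians per milliarcsecond


def radec_to_vec(ra_deg, dec_deg):
    """Unit 3-vector for a sky position given in degrees."""
    a, d = np.radians(ra_deg), np.radians(dec_deg)
    return np.array([np.cos(d) * np.cos(a), np.cos(d) * np.sin(a), np.sin(d)])


def tangent_mu(r, mu_east_mas, mu_north_mas):
    """Proper-motion vector (rad/yr) in the tangent plane at unit vector r.

    Assumption: mu_east_mas already includes the cos(dec) factor.
    Undefined exactly at the celestial poles (rho = 0).
    """
    x, y, z = r
    rho = np.hypot(x, y)                                  # cos(dec)
    east = np.array([-y, x, 0.0]) / rho                   # (-sin a, cos a, 0)
    north = np.array([-z * x, -z * y, rho * rho]) / rho   # (-sin d cos a, -sin d sin a, cos d)
    return (mu_east_mas * east + mu_north_mas * north) * MAS


def displace(r, mu, dt):
    """Position dt years later: r' = (r + mu dt) / |r + mu dt| (small-step form)."""
    v = r + mu * dt
    return v / np.linalg.norm(v)


def fisher_logpdf(x, r, w):
    """log F(x | r, w) of Equation (1) for unit vectors x of shape (3,) or (n, 3)."""
    d = np.asarray(x) - r
    half_chord2 = 0.5 * np.sum(d * d, axis=-1)            # equals 1 - r.x on the sphere
    return (np.log(w) - np.log(2.0 * np.pi) - np.log1p(-np.exp(-2.0 * w))
            - w * half_chord2)


def log_likelihood_H(r, mu, detections, epochs, weights):
    """log of prod_i p_i(x_i | dt_i, r, mu, H): the integrand of Eq. (5) without the prior.

    epochs are the dt_i in years, relative to the epoch at which the star is at r.
    """
    return sum(fisher_logpdf(x, displace(r, mu, dt), w)
               for x, dt, w in zip(detections, epochs, weights))


# Example: a star at (10 deg, 0.5 deg) moving 300 mas/yr east and 100 mas/yr south
r0 = radec_to_vec(10.0, 0.5)
mu0 = tangent_mu(r0, 300.0, -100.0)
r_two_years_later = displace(r0, mu0, 2.0)
```

A full evaluation of Equation (5) would multiply this likelihood by the prior and integrate over r and µ, for instance with the Monte Carlo approach of Section 4.2.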
3. PRIOR DETERMINATION

The proper motion really only shows up in the numerator of the Bayes factor for assessing the quality of the association. The model is well defined, but the integration domain is set by the joint prior, which is yet to be determined. In general, the prior p(r, µ | H) can be very complicated because of its dependence on the properties of the star. Simply put, brighter sources are likely to be closer and hence to have larger proper motions. More complicated is the effect of color, which (along with magnitude) is a proxy for placing the star in different stellar populations with different dynamics. In this paper, we do not discuss these effects; they will be the topic of future work. We also note that the prior can be a function of the time difference, Δt, to account for cases where the star travels far between observations to a new location with a different source density. However, we expect this to be a small effect, because the typical speeds of stars and the usual time differences between observations today yield small displacements on the sky.

Using the basic properties of conditional densities, we can write the prior as the product

$$p(\mathbf{r}, \boldsymbol{\mu} \mid H) = p(\mathbf{r} \mid H)\, p(\boldsymbol{\mu} \mid \mathbf{r}, H) \tag{7}$$

where the first term is the prior on the position, e.g., the all-sky prior written with Dirac's δ symbol as

$$p(\mathbf{r} \mid H) = \frac{1}{4\pi}\, \delta(|\mathbf{r}| - 1) \tag{8}$$

and the more complicated second term describes the possible proper motions as a function of location and, optionally, other properties. The simplest possible model after the stationary case is to assume a uniform prior on µ up to some limit $\mu_{\max}$, independent of the location, i.e.,

$$p(\boldsymbol{\mu} \mid \mathbf{r}, H) = \begin{cases} \dfrac{1}{\pi \mu_{\max}^2} & \text{if } |\boldsymbol{\mu}| < \mu_{\max} \\[4pt] 0 & \text{otherwise} \end{cases} \tag{9}$$

where any dependence of µ on the location r is assumed to be negligible.

3.1. Ensemble Statistics

To derive a more realistic prior p(µ | r, H), we choose to study the ensemble statistics of stars instead of approaching the problem with an analytic model. While the latter would have the advantage of providing a function at arbitrary resolution, the formulas are difficult to derive, and analytic approximations might miss subtle but relevant details of the relation.

We study the properties of stars in the Sloan Digital Sky Survey catalog archive, which also contains accurate proper motion measurements from the recalibrated United States Naval Observatory B1.0 Catalog (USNO-B; Munn et al. 2004). For this analysis, we pick stars from the Stripe 82 data set, where multiple observations are available over a 300 square degree strip covering a narrow range in declination between ±1.25°. These repeated observations were taken between June and December each year from 1998 to 2005 (Adelman-McCarthy et al. 2008). After rejecting saturated and faint sources (u must be in the range 15–23.5 and g, r, i, z in the range 14.5–24), the number of stars is around 100,000. This sample size does not allow for a high-resolution determination of the prior; hence we analyze additional simulated data.

Fig. 1. Illustration of the first term of the prior (10) in the footprint of SDSS Stripe 82. The $\mu_\delta$ axis has an asinh() scale. As one moves toward the Galactic Plane (upwards and downwards in the figure), the probability distribution gets narrower, since the velocity dispersion of stars decreases.

To extend the number of stars used in constructing the prior, we use the current state-of-the-art Besançon models (Robin et al. 2003), which match the SDSS distributions well. Assuming four different stellar populations in the Milky Way, and using the Poisson equation and the collisionless Boltzmann equation with a set of observational parameters (i.e., fitting parameters to the dynamical rotation curve), these models compute the number of stars of a given age, type, effective temperature and absolute magnitude at any place in the Galaxy. The model has been used successfully to predict kinematics and in comparisons with observational data (e.g., Bienaymé et al. 1992; Chareton et al. 1993; Ojha et al. 1999; Rapaport et al. 2001; Soubiran et al. 2003). A total of 740,000 stars were generated from the Besançon models using large-field equatorial coordinates; thus the prior is dominated by model data rather than by SDSS measurements. The proper motion distributions of the SDSS data and of the model are very consistent, with the model yielding somewhat wider distributions than the observations.

In preparation for binning the data, we separate the dependence of the prior on the different proper motion components, $\boldsymbol{\mu} = (\mu_\alpha, \mu_\delta)$, and, omitting the explicit hypothesis, write

$$p(\mu_\alpha, \mu_\delta \mid \mathbf{r}) = p(\mu_\delta \mid \mathbf{r})\, p(\mu_\alpha \mid \mu_\delta, \mathbf{r}) \tag{10}$$

Since the Stripe 82 data set in SDSS contains sources only in a narrow declination range between −1.25° and +1.25°, we can safely neglect the dependence on declination in Equation (10) for the purpose of this study; thus

$$p(\mu_\alpha, \mu_\delta \mid \mathbf{r}) \approx p(\mu_\delta \mid \alpha)\, p(\mu_\alpha \mid \mu_\delta, \alpha) \tag{11}$$

We establish these relations one by one, starting with the former, using the basic property of conditional densities:

$$p(\mu_\delta \mid \alpha) = \frac{p(\mu_\delta, \alpha)}{\int p(\mu'_\delta, \alpha)\, d\mu'_\delta} \tag{12}$$

To achieve better signal-to-noise behavior across the entire parameter space, we do not use a uniform grid but vary the bin sizes so that they follow asinh() in both parameters. In this way we obtain higher-resolution bins where more data are available (around the peak close to 0) and wider bins in the tails. We further improve the quality of the empirical prior by removing high-frequency noise with a convolution filter whose characteristic width is approximately one pixel at any location. The integral in the denominator is evaluated by counting the stars in the appropriate bins, using the bin widths to weight the counts.

Figure 1 shows the prior, using the aforementioned nonlinear scale for $\mu_\delta$, as a function of the position α. The distribution is centered on approximately −[…] mas/yr, and the location of the mode is practically independent of R.A.
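The binned estimate of Equation (12) can be sketched as follows. This is only an illustration of the idea with choices of our own: bins equally spaced in asinh(µ_δ/s) with a hypothetical softening scale s, plain uniform bins in α (the text applies asinh() spacing to both parameters), and a 1-2-1 kernel standing in for the unspecified one-pixel convolution filter. Each α row is normalized by its bin-width-weighted sum, the discrete analogue of the denominator in Equation (12); all function names are hypothetical.

```python
import numpy as np


def asinh_edges(lo, hi, n_bins, scale):
    """Bin edges equally spaced in asinh(value / scale): narrow near 0, wide in the tails."""
    u = np.linspace(np.arcsinh(lo / scale), np.arcsinh(hi / scale), n_bins + 1)
    return scale * np.sinh(u)


def smooth_1px(a):
    """Separable 1-2-1 smoothing (about one pixel wide) along every axis, edge-padded."""
    kernel = (0.25, 0.5, 0.25)
    for axis in range(a.ndim):
        pad = [(1, 1) if ax == axis else (0, 0) for ax in range(a.ndim)]
        p = np.pad(a, pad, mode="edge")
        n = a.shape[axis]
        a = sum(k * np.take(p, np.arange(i, i + n), axis=axis)
                for i, k in enumerate(kernel))
    return a


def conditional_mu_dec_prior(ra, mu_dec, ra_edges, mu_edges):
    """Tabulated p(mu_dec | alpha): one normalized density per R.A. bin (rows)."""
    counts, _, _ = np.histogram2d(ra, mu_dec, bins=[ra_edges, mu_edges])
    widths = np.diff(mu_edges)
    # Density per unit mu_dec; the common alpha bin width cancels in Eq. (12)
    density = smooth_1px(counts / widths)
    norm = (density * widths).sum(axis=1, keepdims=True)
    # Rows without any stars stay at zero instead of being divided by zero
    return np.divide(density, norm, out=np.zeros_like(density), where=norm > 0)
```

The second term of Equation (11) would follow the same pattern with an extra µ_α axis, conditioning on both µ_δ and α.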
As one approaches the direction of the Galactic Plane (α = 60° and α = 305° are the nearest regions in this stripe), the distribution in Figure 1 becomes sharper. This is to be expected, as the velocity dispersion of stars decreases; if we included a broader range in right ascension, the PDF would get even narrower.

The second term of Equation (11) is constructed similarly, using the same adaptive binning and smoothing but in higher dimensions. It is difficult to visualize a 4-dimensional PDF; hence, in Figure 2, we plot slices of the prior at various α values. Both axes are shown in the transformed scale. The values α = 57.[…]° and α = 307.[…]° represent the two edges of Stripe 82, which are the parts closest to the Galactic Plane. The same effect can be seen in these panels as in Figure 1: looking out of the Plane, the PDF becomes more dispersed. The boxy (square) shape of the contour lines arises from the asinh() transformation; in a linear system, the contours would appear more circular.

4. DISCUSSION

4.1. Sample Stars

As mentioned earlier, Stripe 82 was observed repeatedly from 1998 to 2005, between June and December of each year. Thus we can obtain multi-epoch observations to test our method. We choose a range of stars with different proper motions observed at different epochs. To be sure that the observed stars are the same in each epoch, we select bright stars (magnitude r < […]

Fig. 2. Illustration of the second term of the prior (10). The function is 4-dimensional; the 6 different panels represent slices of the 4-dimensional PDF in the R.A. direction.

Fig. 3. Effects of the time difference and the proper motion limit on the Bayes factor in the case of the uniform prior. In the left panel, we consider a mock observation pair with 0.54″ separation and vary the time interval. The right panel shows the Bayes factor of a 360 mas/yr star with Δt = 2 years as a function of the proper motion limit; see Equation (9). The error bars represent the uncertainty of the numerical integration.

Fig. 4. Illustration of the weight of evidence (upper panels) and posterior probability (lower panels) as a function of proper motion in the case of 2 observations with variable time differences (increasing from left to right). The proper motion is shown on a logarithmic scale. Open circles represent the static model; triangles and crosses correspond to the constant proper motion prior and the empirical prior, respectively. These values are also presented in Table 1.


Fig. 5. The Bayes factor and posterior probability as a function of the proper motion in the case of 2-, 3- and 4-way associations, increasing from left to right; see text. These values are also presented in Table 2.

[…] small time intervals, while the largest time interval between the epochs is approximately 6.5 years. We divide this time interval into 3 approximately equal parts and thus get 4 observations of each star, with similar time intervals between them. According to Equations (5) and (6), only the R.A. and Dec. coordinates are used for calculating the Bayes factors; the USNO proper motion measurements in the following figures and tables are shown only for reference. We randomly select a dozen stars for the following tests.

4.2. Numerical Integration

We calculate the integrals of the Bayes factor numerically. Our Monte Carlo implementation generates independent random positions $\{\mathbf{r}_n\}$ (3-D unit vectors) and two random components of the velocity that yield the vector $\boldsymbol{\mu}_n$ in the tangent plane of each $\mathbf{r}_n$, i.e.,

$$\mathbf{r}'_n = \frac{\mathbf{r}_n + \boldsymbol{\mu}_n \Delta t}{|\mathbf{r}_n + \boldsymbol{\mu}_n \Delta t|} \tag{13}$$

In theory, one has to integrate the position over the whole celestial sphere and the proper motion out to infinity, but in practice the integrands always drop sharply; hence one can easily bound all relevant parameters to reduce the computational cost and to justify the above approximation to the motion. For more efficient implementations, one can use more sophisticated Markov chain Monte Carlo (MCMC) methods.

The uncertainty estimates include two separate sources of error. The numerical imprecision is controlled by the number of generated random parameters and can be estimated during the integration. In our calculations, this error term is kept low and contributes at the $10^{-\ldots}$ level to the weight of evidence.

Another source of error comes from the uncertainty in the position measurements. While this is small for the SDSS detections, σ = 0.[…] σ deviations.

4.3. The Uniform Proper Motion Prior

After the static case, the simplest model is a uniform prior, as introduced in Equation (9). This analytic formula may at first appear not to favor any particular proper motion, yet the Bayes factor has some non-trivial scaling properties that are worth considering.

As the displacement of the source is the product of the time difference and the proper motion, associations at the same angular separation but with different time intervals will indeed have different Bayes factors. For a longer Δt, only smaller proper motions contribute to the integral in Equation (5), shrinking the integration domain. This yields a scaling by a factor of $\Delta t^{-2}$, as seen in the left panel of Figure 3.
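For two detections, the uniform-prior evidence can be checked with a short Monte Carlo sketch. It departs from the implementation described above in clearly labeled ways: it assumes a flat sky and Gaussian errors with equal σ for both detections, integrates the true position r analytically (which leaves a 2-D Gaussian in d − µΔt with variance 2σ² per axis), and samples only µ from the disk prior of Equation (9). Under the same assumptions, when the separation and the errors are small compared with µ_max Δt, the integral collapses to B ≈ 4/(µ_max Δt)², which makes the Δt^{−2} and µ_max^{−2} scalings explicit. For µ_max = 600 mas/yr this gives log10 B ≈ 11.07, 10.37 and 10.05 at Δt = 2, 4.5 and 6.5 yr, which appears consistent with the uniform-prior weights listed in Table 1. The function names, sample size and the σ = 0.1″ used in the example are hypothetical.

```python
import numpy as np

ARCSEC = np.pi / (180.0 * 3600.0)  # radians per arcsecond


def log10_bayes_uniform_mc(sep_arcsec, dt_yr, sigma_arcsec, mu_max_mas,
                           n=400_000, seed=0):
    """Monte Carlo log10 B(H, K | D) for two detections under the prior of Eq. (9).

    Flat-sky sketch: the r-integral is done analytically, so
    p(D|H) = (1/4pi) * E_mu[ N2(d - mu*dt; 2 sigma^2) ] and p(D|K) = (1/4pi)^2.
    """
    rng = np.random.default_rng(seed)
    mu_max = mu_max_mas * 1e-3 * ARCSEC                # rad / yr
    # Uniform draws inside the disk |mu| < mu_max (sqrt gives uniform area density)
    rad = mu_max * np.sqrt(rng.uniform(size=n))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    mu = np.column_stack([rad * np.cos(phi), rad * np.sin(phi)])
    d = np.array([sep_arcsec * ARCSEC, 0.0])           # separation vector, rad
    s2 = 2.0 * (sigma_arcsec * ARCSEC) ** 2            # variance of x2 - x1 per axis
    resid2 = np.sum((d - mu * dt_yr) ** 2, axis=1)
    gauss = np.exp(-resid2 / (2.0 * s2)) / (2.0 * np.pi * s2)
    return float(np.log10(4.0 * np.pi * gauss.mean()))


def log10_bayes_uniform_approx(dt_yr, mu_max_mas):
    """Closed form B ~ 4 / (mu_max dt)^2, valid when sep and sigma << mu_max * dt."""
    reach = mu_max_mas * 1e-3 * ARCSEC * dt_yr         # radians
    return float(np.log10(4.0 / reach ** 2))


if __name__ == "__main__":
    # The 0.54 arcsec mock pair of Figure 3 (left), at several time baselines
    for dt in (2.0, 4.5, 6.5):
        mc = log10_bayes_uniform_mc(0.54, dt, 0.1, 600.0)
        approx = log10_bayes_uniform_approx(dt, 600.0)
        print(f"dt = {dt} yr: Monte Carlo {mc:.2f}, closed form {approx:.2f}")
```

The closed form breaks down once the separation approaches µ_max Δt, where the Gaussian is clipped by the edge of the prior disk; this is the regime in which a limit that is too small rejects fast stars.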
With the uniform prior, associations are therefore assigned lower quality when their detections are farther apart in time, even if their angular separations are identical.

Another interesting aspect is the choice of the limiting value $\mu_{\max}$. Our choice of 600 mas/yr is admittedly somewhat arbitrary and was made to cover the stars in our sample. Decreasing it would quickly lower the Bayes factors of faster-moving stars, so that they would clearly not be associated. Increasing the limit makes the value of the constant prior drop, which in turn lowers the quality of the associations. To illustrate this effect, we compute the Bayes factors for a star with µ = 360 mas/yr and Δt = 2 years as a function of $\mu_{\max}$ (see the right panel of Figure 3). As the prior is proportional to $\mu_{\max}^{-2}$, the curve follows the same trend.

4.4. The Time Difference

First we analyze the quality of the associations as a function of the time difference between the observations. The top panels of Figure 4 show the logarithm of the Bayes factor, a.k.a. the weight of evidence, for all stars in our test sample as a function of their proper motion. Open circles represent the results from the static model, which can be obtained analytically as in Budavári & Szalay (2008), and crosses show the new measurements from the numerical integration of the improved model using the empirical prior introduced in this study. Triangles show the values for a simple model with a constant proper motion prior and $\mu_{\max}$ = 600 mas/yr. If we correct for small relative differences in the time intervals between the epochs, taking 2, 4.5 and 6.5 years as the respective references, this prior yields practically constant weights of evidence. For reference, the W = 0 threshold is plotted as the dashed horizontal line; this is the theoretical dividing line above which the observations support the hypothesis of the match. All panels contain the same objects, but the calculations are based on different detections that are farther apart in time as we go from left to right. We see immediately that, as the time difference increases, the models provide increasingly different results: the static model starts rejecting stars with larger proper motions much sooner than the models that accommodate the possibility of the sources moving.

While the Bayes factor is the only objective measure of the quality of the match, its interpretation is admittedly less obvious to the uninitiated than that of a probability value, whose meaning one has a good sense of.
From the Bayes factor, we can calculate the posterior P(H | D), given a prior P(H), via

$$P(H \mid D) = \left[1 + \frac{1 - P(H)}{B(H, K \mid D)\, P(H)}\right]^{-1} \tag{14}$$

Assuming a constant prior over the sky with the value P(H) = 1/N, where $N = 10^{\ldots}$ is the estimated total number of stars on the sky as computed from the average density in SDSS, we can plot the matching probabilities for comparison. Note that large posterior probabilities are not sensitive to small modulations in the density; the posterior scales linearly with the prior only for small values of the Bayes factor.

The bottom panels of Figure 4 use the same symbols as the top ones to illustrate the posteriors derived with the above constant prior. The difference between the models is possibly even more striking here: while the left panel shows very similar estimates from the different models, with time the separations grow large enough to quickly zero out the static-model probabilities for stars with proper motions larger than 100 mas/yr, whereas the new models keep the probabilities significantly larger. Table 1 contains the measurements for all stars as a function of the time difference. The first column of the table is the identifier of the star, the ObjID in SDSS Data Release 6. The reference proper motion values are taken from the ProperMotions table of the SDSS Catalog Science Archive, which combines the SDSS and the recalibrated USNO-B astrometry for a precise and reliable determination.

We see two important features of the associations using the new proper motion priors. The constant prior yields lower and lower Bayes factors as the elapsed time increases, and the probability drops regardless of the proper motion.
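Equation (14) is easy to evaluate directly from the weight of evidence. The sketch below (hypothetical function name) assumes W = log10 B and works with log-odds rather than B itself, so that very negative weights such as the −139.78 in Table 1 do not cause overflow. The prior value used in the example lines is purely illustrative.

```python
import math


def posterior_from_weight(w, prior):
    """P(H|D) from Equation (14), given the weight of evidence W = log10 B and P(H).

    Works with z = log10[(1 - P(H)) / P(H)] - W, the log10 posterior odds
    against H, so that P(H|D) = 1 / (1 + 10**z).
    """
    z = math.log10(1.0 - prior) - math.log10(prior) - w
    if z > 300.0:  # P(H|D) < 1e-300: report zero instead of overflowing
        return 0.0
    return 1.0 / (1.0 + 10.0 ** z)


# Illustrative prior only: P(H) = 1/N with a hypothetical N = 1e9
print(posterior_from_weight(12.59, 1e-9))    # close to 1
print(posterior_from_weight(-139.78, 1e-9))  # essentially 0
```

Setting W = 0 returns the prior unchanged, which is why the W = 0 line separates support for and against the match but does not by itself imply a high posterior.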
Even when the separations are small enough for the static model to recover the object perfectly, the constant proper motion prior yields a lower, 90% probability. The empirical prior always outperforms the static model, but in this 2-epoch case the probabilities of the fast stars fall below those of the constant prior and below any reasonable probability threshold.

4.5. Three and Four Epochs

Next we turn our attention to the potential improvements from including additional epochs in the data sets. For this comparison, we keep the first and last observations in time; hence the baseline is the same for all cases. To these two observations we add measurements whose epochs lie between them in time. These 2-, 3- and 4-way associations are shown in the left, middle and right panels of Figure 5, respectively. Adding new detections clearly improves the proper motion models significantly: the reasonably good associations of only two detections are promoted to essentially certain matches by including intermediate detections. In contrast, the static model continues to reject the associations of all high proper motion stars. One of the stars with µ ∼ 100 mas/yr actually gets a high probability even in the static model, because over only a few years between the epochs its angular separation is small enough to recover the star.

Table 2 shows the measurements as a function of the number of epochs used in the calculations. The empirical prior of the improved model assigns 100% probability to all stars when all 4 epochs are considered, and even the 3-epoch computations come close to that, with the µ = 300 mas/yr star getting a lower 97%. The exception is the fastest star, at µ = 555 mas/yr, whose probability with the empirical prior is essentially 0 in all panels. The reason is that this star is one of the highest proper motion stars in Stripe 82, and even with the generated model stars included in the prior, we have very few (roughly 40) high proper motion stars.

It is worth reiterating the reason for and the consequence of these results. Associations of more than two detections benefit dramatically more from the proper motion prior, because two points can always be connected with a straight line, unlike three or more. In other words, the prior probability of two detections lying on a great circle is 100%, but for three or more it is small; hence such combinations get boosted by the alignment.

Having seen the convincingly large probabilities for the 3-way cases, and assuming the same maximum time difference between observations, one can conclude that, for the time intervals we consider here, survey strategies with 3 epochs are superior to those with only 2, but adding more would not noticeably improve our ability to correctly cross-identify the detections.
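The geometric argument can be made concrete with a least-squares fit of a uniform-motion track x(t) = x0 + µt to positions in a flat tangent plane. This is a hypothetical illustration (function name included), not part of the Bayesian calculation: with two epochs, the four parameters (x0, µ) always reproduce the data exactly, so alignment carries no information, whereas with three epochs a detection that is off the track leaves a nonzero χ².

```python
import numpy as np


def linear_motion_chi2(t_yr, xy_arcsec, sigma_arcsec):
    """Chi-square of the best-fit track x(t) = x0 + mu * t in the tangent plane.

    t_yr         : (n,) observation epochs in years
    xy_arcsec    : (n, 2) tangent-plane positions in arcsec
    sigma_arcsec : per-coordinate astrometric error, assumed equal for all points
    """
    t = np.asarray(t_yr, dtype=float)
    xy = np.asarray(xy_arcsec, dtype=float)
    design = np.column_stack([np.ones_like(t), t])        # columns: [1, t]
    coef, *_ = np.linalg.lstsq(design, xy, rcond=None)     # rows: x0, mu
    resid = xy - design @ coef
    return float(np.sum(resid ** 2) / sigma_arcsec ** 2)


# Two epochs: a perfect fit whatever the positions are (zero up to rounding)
print(linear_motion_chi2([0.0, 6.5], [[0.0, 0.0], [2.3, -1.1]], 0.1))
# Three epochs, middle detection 1 arcsec off the line joining the others: about 66.7
print(linear_motion_chi2([0.0, 3.0, 6.0], [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]], 0.1))
```

In the Bayesian treatment the same effect shows up through the likelihood product of Equation (5): only proper motions consistent with all detections contribute, and for three or more detections that set is small unless the detections are aligned.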
5. CONCLUSIONS

We presented an improved model for the probabilistic cross-identification of stars, which accommodates the possibility of moving objects via a proper motion prior. Using the Bayesian approach of Budavári & Szalay (2008), we performed hypothesis testing with the new models on a sample of SDSS DR6 stars with known proper motions and compared the results to the static case. In accord with our expectations, we found that moving stars would be missed by association algorithms that neglect to model the motion, whereas an empirical proper motion prior assigns larger observational evidence and higher probabilities to the matches. We studied how the quality of these cross-identifications depends on the separation in time (and space) and on the use of multi-epoch observations. The SDSS Stripe 82 sample provided a good test set, with 2–4 detections at different times a few years apart. The tests assumed a maximum proper motion of 600 mas/yr. We found that, even though the 2-epoch data sets benefit significantly from the proper motion model, the 3-epoch observations essentially recover the right associations even for fast-moving stars, and the 4-epoch cases yield 100% probabilities. We also conclude that the empirical prior surpasses the static model over the whole range of proper motions, while the uniform prior performs better only for high proper motion stars.

Since the analytically computable static case is still a good model for most celestial sources, it is best to carry out the cross-identification in multiple steps: first finding associations using the static model, and then applying the more computationally intensive proper motion variant only to the remaining sources. While it might be tempting to simply increase the positional errors to discover the associations of moving sources, that procedure would be far from optimal.
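The two-step strategy can be sketched as a tiny pipeline. Everything here is a hypothetical illustration under our own assumptions, not the authors' implementation. The static pass uses the small-angle Gaussian form of the two-detection Bayes factor, B = 2/(σ1² + σ2²) exp[−ψ²/(2(σ1² + σ2²))], which follows from Equation (4) for a stationary source when Equation (1) is approximated by a flat-sky Gaussian. The second pass uses the crude closed form B ≈ 4/(µ_max Δt)² for the uniform prior of Equation (9), treated as zero once the separation exceeds µ_max Δt. The threshold w_min on W = log10 B, the σ = 0.1″ and the example pairs are free, illustrative choices.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # radians per arcsecond


def log10_b_static(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Static two-detection weight of evidence, small-angle Gaussian approximation."""
    s2 = (sigma1_arcsec ** 2 + sigma2_arcsec ** 2) * ARCSEC ** 2
    psi = sep_arcsec * ARCSEC
    return math.log10(2.0 / s2) - psi ** 2 / (2.0 * s2) / math.log(10.0)


def log10_b_uniform(sep_arcsec, dt_yr, mu_max_mas):
    """Crude uniform-prior weight: log10[4 / (mu_max dt)^2] if reachable, else -inf."""
    reach_arcsec = mu_max_mas * 1e-3 * dt_yr
    if sep_arcsec >= reach_arcsec:
        return -math.inf
    return math.log10(4.0 / (reach_arcsec * ARCSEC) ** 2)


def two_pass_match(pairs, sigma_arcsec, mu_max_mas, w_min):
    """pairs: iterable of (pair_id, sep_arcsec, dt_yr); returns {pair_id: (model, W)}."""
    accepted, remainder = {}, []
    for pid, sep, dt in pairs:                 # pass 1: cheap analytic static model
        w = log10_b_static(sep, sigma_arcsec, sigma_arcsec)
        if w > w_min:
            accepted[pid] = ("static", w)
        else:
            remainder.append((pid, sep, dt))
    for pid, sep, dt in remainder:             # pass 2: proper-motion model, leftovers only
        w = log10_b_uniform(sep, dt, mu_max_mas)
        if w > w_min:
            accepted[pid] = ("proper motion", w)
    return accepted


matches = two_pass_match([("a", 0.05, 2.0), ("b", 1.0, 2.0), ("c", 5.0, 2.0)],
                         sigma_arcsec=0.1, mu_max_mas=600.0, w_min=9.0)
print(matches)  # "a" via the static pass, "b" via the proper-motion pass, "c" unmatched
```

With σ = 0.1″ the static weight at zero separation comes out near 12.6, in the same range as the largest static weights in Table 1. Only pair "b" pays for the second pass, which is the point of running the cheaper model first.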
The dominant effect of inflating the positional errors is that the Bayes factor would drop more slowly with separation; since the angular distance is essentially divided by the uncertainty, a ten times larger σ would practically yield associations out to ten times larger distances, most of them incidental. The improvement of our novel approach over such naive workarounds comes from using the true uncertainties and from the high sensitivity of the algorithm to sources moving along a great circle, as allowed by the proper motion model.

The authors would like to acknowledge the use of the online tools of the Besançon collaboration to obtain simulated stars for this study, and thank Rosemary Wyse for her invaluable insights and help with the stellar model of the Galaxy. T.B. acknowledges support from the Gordon and Betty Moore Foundation via GBMF 554. G.K. and I.C. acknowledge support from NKTH:Polanyi, KCKHA005 and OTKA-MB08A-80177. A.C. acknowledges partial support from NSF award AST-0709394.

REFERENCES

Adelman-McCarthy, J. K., et al. 2008, ApJS, 175, 297
Bienaymé, O., Mohan, V., Crézé, M., Considère, S., & Robin, A. C. 1992, A&A, 253, 389
Budavári, T., & Szalay, A. S. 2008, ApJ, 679, 301
Chareton, M., Considère, S., & Bienaymé, O. 1993, A&AS, 102, 649
Fisher, R. 1953, Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, 217, 295
Munn, J. A., et al. 2004, AJ, 127, 3034
Ojha, D. K., Bienaymé, O., Mohan, V., & Robin, A. C. 1999, A&A, 351, 945
Rapaport, M., Le Campion, J.-F., Soubiran, C., Daigne, G., Périé, J.-P., Bosq, F., Colin, J., Desbats, J.-M., Ducourant, C., Mazurier, J.-M., Montignac, G., Ralite, N., Réquième, Y., & Viateau, B. 2001, A&A, 376, 325
Robin, A. C., Reylé, C., Derrière, S., & Picaud, S. 2003, A&A, 409, 523
Soubiran, C., Bienaymé, O., & Siebert, A. 2003, A&A, 398, 141
York, D. G., et al. 2000, AJ, 120, 1579

TABLE 1
Weights of evidence and posterior probabilities in the static, uniform-prior and proper-motion models as a function of the elapsed time between two observations

ObjID               µ         Δt      Weight                    Probability
                    [mas/yr]  [yr]    static  uniform  motion   static  uniform  motion
587731173305614418  13        1.38    12.59   11.08    12.59    1.00    0.99     1.00
                              3.20    12.47   10.37    12.49    1.00    0.96     1.00
                              4.46    12.56   10.01    12.56    1.00    0.91     1.00
587730847429427304  18.6      2.01    12.59   11.08    12.59    1.00    0.99     1.00
                              3.95    12.48   10.37    12.47    1.00    0.96     1.00
                              5.16    12.63   10.00    12.58    1.00    0.91     1.00
587731173305876571  19        1.38    12.60   11.08    12.60    1.00    0.99     1.00
                              3.20    12.58   10.37    12.58    1.00    0.96     1.00
                              4.46    12.56   10.01    12.56    1.00    0.91     1.00
587731186187763779  40        2.01    12.55   11.09    12.56    1.00    0.99     1.00
                              4.03    12.46   10.36    12.49    1.00    0.96     1.00
                              6.13    11.95   10.02    12.14    1.00    0.91     1.00
588015509268725910  98        2.02    11.93   11.08    11.90    1.00    0.99     1.00
                              5.15    9.76    10.41    10.18    0.87    0.96     0.94
                              7.98    6.33    10.08    9.35     0.00    0.92     0.72
588015509271805995  143       2.02    11.32   11.08    11.38    1.00    0.99     1.00
                              5.07    6.88    10.38    9.50     0.01    0.96     0.78
                              7.18    2.07    10.12    9.13     0.00    0.93     0.61
588015509286813878  163       2.17    11.64   11.10    11.70    1.00    0.99     1.00
                              5.01    5.50    10.38    9.13     0.00    0.96     0.61
                              7.18    -1.52   10.07    8.88     0.00    0.92     0.46
588015509268201645  196       2.02    11.44   11.07    11.56    1.00    0.99     1.00
                              6.11    -1.24   10.36    8.68     0.00    0.96     0.35
                              7.98    -11.97  10.12    8.33     0.00    0.93     0.19
588015509273378938  255       2.04    9.41    11.05    9.82     0.75    0.99     0.88
                              5.08    -7.07   10.39    7.98     0.00    0.96     0.10
                              7.20    -24.16  10.06    7.65     0.00    0.92     0.05
588015509279342731  257       2.02    10.02   11.08    10.08    0.92    0.99     0.93
                              5.17    -4.89   10.32    7.82     0.00    0.95     0.07
                              7.18    -24.11  10.01    7.51     0.00    0.91     0.04
587730847426740272  300       1.94    9.43    11.04    9.63     0.75    0.99     0.83
                              4.01    -2.49   10.37    7.35     0.00    0.96     0.02
                              6.13    -22.84  10.04    6.81     0.00    0.91     0.01
588015509283930154  555       2.19    0.40    11.09    4.12     0.00    0.99     0.00
                              4.11    -36.83  10.33    -3.05    0.00    0.96     0.00
                              7.18    -139.78 9.99     -21.80   0.00    0.91     0.00

TABLE 2
Weights of evidence and posterior probabilities in the static, uniform-prior and proper-motion models for the 2-, 3- and 4-way associations

ObjID    µ    N_obs