The Implementation of Binned Kernel Density Estimation to Determine Open Clusters' Proper Motions: Validation of the Method

Abstract

Stellar membership determination of an open cluster is an important step before further analysis. Basically, there are two classes of membership determination methods: parametric and non-parametric. In this study, an alternative non-parametric method based on Binned Kernel Density Estimation that accounts for measurement errors (simply called BKDE-e) is proposed. This method is applied to proper motions data to determine cluster membership kinematically and to estimate the average proper motions of the cluster. Monte Carlo simulations show that the average proper motions determination using the proposed method is statistically more accurate than the ordinary Kernel Density Estimator (KDE). By including measurement errors in the calculation, the mode location of the resulting density estimate is less sensitive to non-physical or stochastic fluctuations than in ordinary KDE, which excludes measurement errors. For the typical mean measurement error of 7 mas/yr, BKDE-e suppresses the potential for miscalculation by a factor of two compared to KDE. With a median accuracy of about 93%, the BKDE-e method has accuracy comparable to the parametric method (modified Sanders algorithm). Application to real data from the Fourth USNO CCD Astrograph Catalog (UCAC4), especially to NGC 2682, is also performed. The mode of the member stars distribution on the Vector Point Diagram is located at μ_α cos δ = −9.94 ± 0.85 mas/yr and μ_δ = −4.92 ± 0.88 mas/yr. Although the BKDE-e performance does not overtake the parametric approach, it offers a new view of membership analysis, expandable to astrometric and photometric data or even to binary cluster search.


arXiv [astro-ph.GA]

R. Priyatikanto • M. I. Arifyanto

Astronomy Department, Institut Teknologi Bandung, Indonesia
Space Science Centre, National Institute for Aeronautics and Space, Bandung, Indonesia
Astronomy Research Group, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Indonesia

Keywords proper motions; (Galaxy:) open clusters and associations: general; methods: numerical

∼wilton/) where the average proper motions of 55% of these open clusters have been determined by several researchers (e.g. Dias et al. 2002; Kharchenko et al. 2005). These determinations, of course, need to be done after careful membership determination, since field stars obscure these Galactic open clusters.

The special characteristics of open clusters as spatial, kinematic, and photometric agglomerations of stars are the foundation of membership determination, which basically distinguishes cluster members from contaminating field stars. Several methods have been developed to evaluate the membership probability of any star inside the sampling radius of an open cluster. One of the oldest parametric methods was introduced by Vasilevskis et al. (1958) and developed by several authors (Sanders 1971; Cabrera-Caño and Alfaro 1985; Zhao and He 1990; Zhao et al. 2006; Krone-Martins et al. 2010), employing proper motions data. This method assumes that the proper motions distributions of the member and non-member populations follow bivariate normal distribution functions (Vasilevskis et al. 1958). Two normal distribution functions with different parameters are fitted to the data iteratively. The latest development of this method includes measurement errors in the calculation and claims that the algorithm may provide the intrinsic velocity dispersion for mass calculation (Zhao and He 1990; Zhao et al. 2006). Both in-depth cluster studies (e.g. Wiramihadja et al. 2009) and batch analyses (Dias et al. 2002, 2006) employ this method for membership determination.

The basic assumptions of the parametric method face problems when dealing with high-uncertainty data, non-Gaussian distributions of field stars' kinematics, or overlapping distributions of member and non-member stars (Cabrera-Caño and Alfaro 1990).
These conditions drive the emergence of non-parametric methods in membership probability calculation (Cabrera-Caño and Alfaro 1990; Galadi-Enriquez et al. 1998). Within the framework of this method, the kinematic distributions of both populations are constructed using density estimates such as Kernel Density Estimation (KDE; Silverman 1986). The non-parametric nature of this approach provides flexibility that enables researchers to find multimodality, as in the overlapping clusters NGC 1750 and NGC 1758 (Galadi-Enriquez et al. 1998).

However, density estimation, as the heart of the non-parametric approach, requires greater computational effort than the iteration process of parametric methods. In this case, binning (namely binned KDE) gives an alternative and faster way to estimate the density distribution without reducing the accuracy (Wand 1994). This scheme reduces the computational work from $O(n^2)$ or $O(n\,n_k)$ (for direct or ordinary binned KDE) to $O(n + n_k)$, where $n$ and $n_k$ represent the number of evaluated data points and employed bins. The lower computational effort is essential when dealing with the overwhelming survey data coming in the future, e.g. the rise of kinematic data from HST and the forthcoming GAIA survey. A faster algorithm is also an advantage in handling high-dimensional membership analysis, e.g. simultaneous membership determination using astrometric, kinematic and photometric data. Besides, binned KDE provides a chance to account for measurement errors during density estimation.

In this study, the performance of kernel-based density estimation for membership determination is explored. A new scheme of BKDE that evaluates measurement errors through a numerical technique is proposed. We name it BKDE-e.
Monte Carlo simulations are drawn to test the method and compare it to the parametric method (modified Sanders algorithm, Zhao and He 1990) and the basic KDE method.

This paper is organized as follows: In section 2 the fundamental concepts of KDE, BKDE, and the proposed BKDE-e that accounts for errors are explained. Then, Monte Carlo simulations for validation are presented in section 3. The result of these simulations is discussed in section 4, accompanied by the application to the real kinematic data for NGC 2682 obtained from the US Naval Observatory CCD Astrograph Catalog (Zacharias et al. 2013). At last, the conclusions are given in section 6.

The density distribution of data points can be estimated in various ways. One of the well-known density estimators is kernel based, which can be applied to univariate or multivariate data (Silverman 1986). Let $x = x_1, \ldots, x_n$ and $y = y_1, \ldots, y_n$ be two-dimensional data. The density distribution of these data points with respect to $(x, y)$ can be estimated using the following equation:

$$\hat{f}(x, y) = \frac{1}{n h_x h_y} \sum_{i=1}^{n} \sum_{j=1}^{n} K\left(\frac{x - x_i}{h_x}\right) K\left(\frac{y - y_j}{h_y}\right), \tag{1}$$

where $K(x')$ is the kernel function, while $h_x$ and $h_y$ represent the kernel width or smoothing parameter.

There are several kernel functions that can be employed for evaluating Equation 1, one of which is the Gaussian kernel. The choice among the most widely used kernel functions is not as crucial as the choice of kernel width, $h$. This parameter determines the smoothing level and, of course, the accuracy of the estimate. To minimize errors, Silverman (1986, p 45) proposed a simple rule of thumb to determine the optimal value of $h$, especially for normally distributed data:

$$h_{\mathrm{opt}} = 1.06\,\sigma\,n^{-0.2}, \tag{2}$$

where $\sigma$ is the standard deviation of the data. This rule works well if the data are normally distributed, but it may oversmooth if the data are multimodal.
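As an illustration of Equations 1 and 2, the following minimal Python sketch evaluates a two-dimensional Gaussian KDE on a grid. It is not the authors' code: the function names are hypothetical, and it assumes the usual product-kernel form in which each data point $(x_i, y_i)$ contributes one term to the sum.

```python
import numpy as np

def silverman_width(values):
    """Rule-of-thumb kernel width, h = 1.06 * sigma * n^(-1/5)."""
    values = np.asarray(values, dtype=float)
    return 1.06 * values.std(ddof=1) * values.size ** (-0.2)

def kde2d(x, y, gx, gy, hx=None, hy=None):
    """Gaussian product-kernel density estimate on the grid gx x gy.

    Assumption: one kernel term per data point (x_i, y_i).
    Returns an array of shape (len(gx), len(gy)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hx = silverman_width(x) if hx is None else hx
    hy = silverman_width(y) if hy is None else hy
    norm = 1.0 / np.sqrt(2.0 * np.pi)
    # Kernel values for every (grid point, data point) pair.
    kx = norm * np.exp(-0.5 * ((gx[:, None] - x[None, :]) / hx) ** 2)
    ky = norm * np.exp(-0.5 * ((gy[:, None] - y[None, :]) / hy) ** 2)
    # Sum over data points of K_x * K_y, then normalise.
    return kx @ ky.T / (x.size * hx * hy)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)
y = rng.normal(0.0, 1.0, 500)
grid = np.linspace(-6.0, 6.0, 121)
density = kde2d(x, y, grid, grid)
```

The direct evaluation costs one kernel call per pair of grid point and data point, which is the cost that the binned scheme is meant to reduce.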
For the case of an equal mixture of standard normal distributions ($\sigma = 1$) with means separated by two or more, Silverman (1986) showed that the rule oversmooths by a factor of two. Fortunately, Solar neighbour open clusters have relatively low proper motions (less than ∼ σ and Silverman's rule becomes appropriate.

Fig. 1 Illustration of the linear binning process of a one-dimensional data point $x \pm \sigma$ contained by $H$-sized bins. Without error calculation (a), data point $x$ only gives its weight to two neighbouring knots ($x_{k,b}$ and $x_{k,c}$), proportional to its closeness. For example, $x$ gives its partial weight to knot $x_{k,b}$ proportional to the dark-grey area ($w_{k,b}$). With error calculation (b), that data point gives weight to knots $x_{k,a}, \ldots, x_{k,d}$ that confine the whole probability distribution of $x \pm \sigma$.

$(x_k, y_k)$ are generated as the representation of relevant bins. Each data point has its weight with respect to neighbouring knots. Then the kernel density estimate of each knot is multiplied by the binning count ($c_k$) that represents the total weights ($w_k$) from surrounding data points (see Figure 1). Wand (1994) stated that linear binning, which assigns weight proportionally to the closeness of any data point and nearby knot, provides a better count than the simple counting rule. Adoption of this scheme may improve ordinary binned KDE as used by Balaguer-Núñez et al. (2007).

The density estimate for each knot can be evaluated using the following equation:

$$\hat{f}(x_k, y_k) = \frac{1}{n h_x h_y} \sum_{i=1}^{n_{k,x}} \sum_{j=1}^{n_{k,y}} K\left(\frac{x_k - x_{k,i}}{h_x}\right) K\left(\frac{y_k - y_{k,j}}{h_y}\right) c_{k,ij}. \tag{3}$$

The computational effort for linear binning is $O(n)$, while the kernel evaluation is $O(n_k)$, such that the total computational effort for the BKDE scheme is $O(n + n_k)$, which is lower than ordinary KDE evaluation. However, the binning process may eliminate some information and generate bias between BKDE and KDE. To minimize this effect, Wand (1994) gave a complex polynomial rule to determine the appropriate number of grid points $n_k$, for example $n_k \geq 32$ for 10,000 normally distributed data, which is appropriate to open cluster kinematic data (see Table 1 of Wand 1994). Using this rule, the achieved relative mean integrated error,

$$\mathrm{RMISE} = \frac{\text{binning error}}{\text{total expected error}} \approx \; . \tag{4}$$

2.2 Dealing with uncertainty

Let $x = x_1, \ldots, x_n$ be one-dimensional data with uncertainty of $\sigma = \sigma_1, \ldots, \sigma_n$. Each data point with its uncertainty can be interpreted as a normal probability density function (PDF) centered at $x_i$ and having deviation $\sigma_i$. Instead of giving its weight only to nearby knots, each data point spreads its weight along the PDF, which may include more surrounding knots as illustrated in Figure 1. For this case, the total weights from $x_i$ to knots $x_{k,b}$ and $x_{k,c}$ can be calculated using the following equations:

$$w_{k,b} = \int_{x_b}^{x_c} f(x')\,dx' - \frac{1}{H} \int_{x_b}^{x_c} (x' - x_b)\,f(x')\,dx', \tag{5}$$

$$w_{k,c} = \frac{1}{H} \int_{x_b}^{x_c} (x' - x_b)\,f(x')\,dx', \tag{6}$$

where $f(x') = f(x' \mid x, \sigma)$ expresses the normal PDF of $x_i$, while $H$ is the bin size. This formulation is also applied to the other relevant knots, e.g. $x_{k,a}$ and $x_{k,d}$ in Figure 1. Then, the binning count is merely the total weight obtained by each knot:

$$c_{k,i} = \sum_{j=1}^{n} w_{k,i}(x_j). \tag{7}$$

This 1D binning scheme can be expanded to multidimensional cases, where Equation 5 and Equation 6 can be evaluated through the following two approaches:

• Analytic approach: integration of the Gaussian PDF can be done analytically for each binning segment (definite integral). But if the uncertainty $\sigma$ is much greater than the bin size $H$, analytic evaluation of these equations may take more effort.

• Numerical integration: in this scheme, each data point $x_i$ generates $m$ random dummy points with normal distribution. Then, every dummy point gives $1/m$ weight to its neighbouring knots. This scheme needs extra computational effort of $O(nm)$ for binning purposes, $m$ times larger than the ordinary binning scheme.
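The numerical-integration scheme can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the knot grid and the choice of `m` are hypothetical, and dummy points that fall outside the grid are simply dropped.

```python
import numpy as np

def linear_bin(x, knots, weights=None):
    """Linear binning of 1-D points onto equally spaced knots.

    Each point shares its weight between its two neighbouring knots,
    in proportion to its closeness to each of them.
    """
    x = np.asarray(x, dtype=float)
    weights = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    n_knots = len(knots)
    H = knots[1] - knots[0]                      # bin size
    pos = (x - knots[0]) / H                     # fractional knot index
    left = np.clip(np.floor(pos).astype(int), 0, n_knots - 2)
    frac = pos - left                            # 0 at left knot, 1 at right knot
    inside = (pos >= 0) & (pos <= n_knots - 1)   # assumption: drop off-grid points
    counts = np.zeros(n_knots)
    np.add.at(counts, left[inside], weights[inside] * (1.0 - frac[inside]))
    np.add.at(counts, left[inside] + 1, weights[inside] * frac[inside])
    return counts

def bin_with_errors(x, sigma, knots, m=200, rng=None):
    """Binning counts where each point is spread over its Gaussian error.

    Every data point generates m dummy points drawn from N(x_i, sigma_i);
    each dummy point carries weight 1/m and is linearly binned.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    dummy = rng.normal(x[:, None], sigma[:, None], size=(x.size, m)).ravel()
    return linear_bin(dummy, knots, np.full(dummy.size, 1.0 / m))

knots = np.linspace(-10.0, 10.0, 41)             # H = 0.5
plain = linear_bin([0.1], knots)                 # weight on two knots only
spread = bin_with_errors([0.1], [2.0], knots, m=2000, rng=1)
```

Without errors the single point contributes to two knots only; with errors its unit weight is distributed over all knots covered by its PDF, at the cost of `m` times more binning work.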
Besides, employing random numbers in the numerical scheme means stochastic variation of the obtained solution. The emerging error follows Poisson statistics and decreases as $1/\sqrt{m}$.

Among these two approaches, we use numerical integration for the following analysis, as it is considerably more practical.

To examine the behaviour and performance of the proposed method, 1000 Monte Carlo simulations are drawn. In these simulations, synthetic data that consist of positions, magnitudes and proper motions with errors are created to mimic the real data. Both member and non-member populations are created.

3.1 Synthetic Data

The spatial distribution of member stars in the observational plane is assumed to follow a Gaussian distribution centered at (0, 0) with deviation of σ = 0 . r = 1 . On the other hand, non-member stars are distributed uniformly within a square of 5 units around the cluster center. The adopted unit is arbitrary since it does not have physical meaning for further analysis. Positional data are used to select in-field stars, which are considered as the main sample ( r < . 50) and the control sample of out-field stars (1 . ≤ r ≤ . annulus and dannulus in aperture photometry analysis. The innermost/sampling radius is chosen to encompass member stars as completely as possible and simultaneously minimize the contamination from field stars (Sánchez et al. 2010). Inner and outer radii for out-field selection are carefully chosen such that the out-field ring has a similar area to the in-field and covers field stars that are kinematically similar to the field stars inside the in-field area.

Fig. 2 Characteristics of the generated synthetic data for Monte Carlo simulations. Upper-left: positional data; upper-right: proper motions data; lower-left: luminosity function; lower-right: proper motions error as function of magnitude.

The magnitude distribution of field stars follows a power law distribution function as observed in the USNO CCD Astrograph Catalog (Zacharias et al. 2013). This distribution can also be described using a gamma distribution function (e.g. Krishnamoorthy 2006) which starts from magnitude m ≈ 16, peaks at m ≈ 14 and spans up to the saturation limit of m ≈ 6. The prominence of star clusters that emerge from crowded field stars implies that member stars follow the same distribution, but with a brighter mean. The result of this scheme is presented in Figure 2.

The proper motions distribution of the field stars is basically governed by the intrinsic motions of different stellar populations (e.g. thin and thick disk stars), differential rotation of the Galaxy and also solar motion. Thus, the distribution depends on the direction of the observational field. On the other hand, the distribution of member stars depends on the tangential motion of the cluster itself. Since the intrinsic dispersion of stars inside an open cluster is relatively small ∼ $\mu_{x,f}, \mu_{y,f}) = (0, 0)$ with dispersion of σ = 8 mas/year. This value is comparable to the parametric solutions of $\sigma_f$ obtained by Krone-Martins et al. (2010) from the analysis of nine clusters with PM2000 proper motions data (Ducourant et al. 2006). On the other hand, the proper motions of all member stars are identical, equal to the cluster's proper motions.

To obtain a more realistic model, measurement errors are assigned to every proper motion datum. The magnitude of the proper motions errors grows exponentially as the magnitude increases (i.e. dimmer stars have larger errors). The upper limit of these errors is 10 mas/year, while the minimum value is specified by the input parameter $a_\epsilon = [0, 6]$, which will be explained later. The lower-right plot of Figure 2 shows error profiles with different error parameters.

3.2 Free Parameters

During the simulations, there are three varied parameters governing the 1000 different synthetic clusters to be analyzed. The first parameter is the number ratio of cluster members and field stars ($N_c/N_f$), which ranges from 0.1 to 2.0, equivalently $N_c \approx [60, 1200]$ for $N_f = 600$. This parameter determines how scarce the cluster members are. The second parameter defines the centroid position of the member stars' kinematics. Cabrera-Caño and Alfaro (1990) stated that the distance between the two populations' centroids influences the accuracy of proper motions determination. The third parameter is the proper motions error, which strongly depends on the apparent magnitude of the stars. In this simulation, the errors grow exponentially as the magnitude increases. Equation 8 defines the proper motions errors as a function of magnitude,

$$\epsilon_\mu = a_\epsilon + (10 - a_\epsilon) \exp\left(m - \,\right), \tag{8}$$

where $a_\epsilon = [0, 6]$ characterizes the whole function. This function is inspired by the exponential profile of the measurement errors in UCAC4 data, which come from several catalogues with slightly different accuracies (Zacharias et al. 2013). To simulate the measurement error, every proper motion datum is replaced by a random number taken from a Gaussian distribution with mean equal to the original proper motion and standard deviation equal to the proper motion error.
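The error assignment described above can be sketched as follows. This is only an illustration under loudly stated assumptions: the offset `m_faint` and the scale `scale` inside the exponential are placeholders chosen so that the error reaches 10 mas/yr at the faint end, not values taken from the paper; magnitudes are drawn uniformly for brevity instead of from a gamma distribution; and the cluster proper motion is an arbitrary example value.

```python
import numpy as np

def pm_error(mag, a_eps, m_faint=16.0, scale=2.0, eps_max=10.0):
    """Proper motion error (mas/yr) growing exponentially with magnitude.

    Follows the form a_eps + (eps_max - a_eps) * exp(...).
    m_faint and scale are illustrative assumptions only.
    """
    mag = np.asarray(mag, dtype=float)
    return a_eps + (eps_max - a_eps) * np.exp((mag - m_faint) / scale)

def perturb(pm_true, eps, rng):
    """Replace each proper motion by a Gaussian draw centred on its true
    value with standard deviation equal to its measurement error."""
    return rng.normal(loc=pm_true, scale=eps)

rng = np.random.default_rng(42)
n_field, n_cluster = 600, 300                      # N_c / N_f = 0.5
mag = rng.uniform(6.0, 16.0, n_field + n_cluster)  # assumption: uniform magnitudes
eps = pm_error(mag, a_eps=2.0)

# Field stars: Gaussian centred at (0, 0) with 8 mas/yr dispersion.
pm_field = rng.normal(0.0, 8.0, size=(n_field, 2))
# Member stars: identical proper motions (hypothetical cluster value).
pm_cluster = np.tile([-8.0, -4.0], (n_cluster, 1))

pm_true = np.vstack([pm_field, pm_cluster])
pm_obs = perturb(pm_true, eps[:, None], rng)       # same error in both axes
```

With `a_eps = 2.0` the errors run from about 2 mas/yr for the brightest stars to 10 mas/yr at the faint limit, so the observed member stars form a blurred clump rather than a single point.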

For each simulated model, we perform the standard parametric method as described by Zhao and He (1990) and non-parametric (both KDE and BKDE-e) kinematic analysis to determine both the membership probability of any star and the cluster's proper motion. The outputs of this process, namely membership probability and distribution center, are analysed as follows.

4.1 Membership Probability

As mentioned by previous authors (Cabrera-Caño and Alfaro 1990; Galadi-Enriquez et al. 1998; Balaguer-Núñez et al. 2007), the non-parametric membership probability of any star can be calculated using the following equation:

$$P_i = \begin{cases} \dfrac{\hat{f}_{c+f,i} - \hat{f}_{f,i}}{\hat{f}_{c+f,i}}, & \text{if } \hat{f}_{c+f,i} > \hat{f}_{\lim} \\ 0, & \text{if } \hat{f}_{c+f,i} \leq \hat{f}_{\lim} \end{cases} \tag{9}$$

where $\hat{f}_{c+f}$ and $\hat{f}_f$ define the density estimates of in-field and out-field stars, while $\hat{f}_{\lim}$ sets the bottom limit to exclude insignificant density estimates at the tail of the distribution. In this study, ˆ f lim ≈ − that corresponds to 2 . σ normal distribution is used. This exclusion is necessary to avoid overestimation of membership probability due to density fluctuation at the edge of the distribution, e.g. P ≈ f f ≈ f lim ) that automatically yields zero probability at the edge of the distribution. The second treatment is to use more out-field stars in order to statistically improve the out-field distribution profile ($\hat{f}_f$). This procedure comes as an alternative treatment, as a very low $\hat{f}_f$ can be driven by under-sampled data. The implementation of those treatments has not been investigated deeply in this study, since the overestimation does not really influence the proper motions determination of any cluster.
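Equation 9 translates into a few lines of array code. The sketch below is illustrative only: the function name is hypothetical, the density values are made-up inputs, and clipping negative values (where the out-field density exceeds the in-field density) to zero is an added assumption, not a step stated in the text.

```python
import numpy as np

def membership_probability(f_cf, f_f, f_lim):
    """Non-parametric membership probability per star.

    f_cf : in-field (cluster + field) density at each star
    f_f  : out-field (field only) density at each star
    f_lim: density floor below which the probability is set to zero
    """
    f_cf = np.asarray(f_cf, dtype=float)
    f_f = np.asarray(f_f, dtype=float)
    prob = np.zeros_like(f_cf)
    ok = f_cf > f_lim
    prob[ok] = (f_cf[ok] - f_f[ok]) / f_cf[ok]
    # Assumption: negative values are clipped to zero.
    return np.clip(prob, 0.0, 1.0)

p = membership_probability(
    f_cf=[1.0, 0.5, 1e-6, 0.2],
    f_f=[0.25, 0.5, 0.0, 0.4],
    f_lim=1e-4,
)
```

The third star illustrates the role of the floor: without it, a vanishing field density at the edge of the distribution would yield a probability close to one.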


Fig. 3 The three upper figures show the membership probability of the simulated stars as a function of proper motion in the x direction, derived by the three mentioned methods. Non-parametric probability may show more than one peak (more obvious in the KDE analysis). As a consequence, the non-parametric probabilities of several stars are higher than their parametric probabilities (lower figures). Dotted lines in those figures mark the 2σ limit of the determined probability.

$$\mathrm{CDF}_j = \sum_{i=1}^{j} P_{\mathrm{sort},i}, \tag{10}$$

where $P_{\mathrm{sort},i}$ represents the membership probability, sorted increasingly. By assuming a Gaussian probability function, 1σ members have CDF ≥ . σ have CDF ≥ . σ have CDF ≥ . (two-tailed Gaussian test). From this category, a list of cluster members can be composed, for example by assigning 2σ members as the probable cluster members. As shown in the lower panels of Figure 3, non-parametric methods (both KDE and BKDE-e) often assign higher probability than the parametric results. The difference is that in KDE, the data with low parametric probability tend to deviate more. In the BKDE-e analysis, there are more 2σ members but false assignments of less probable members are suppressed. BKDE-e assigns the probability limit more loosely than the parametric method, as it accounts for an asymmetric distribution function or even the existence of multi-modality or blended distributions.

4.3 Average Proper Motion

Besides the assignment of membership probability, kinematic analysis gives the centroids of the proper motions distributions of cluster stars and field stars. The centroid of cluster stars corresponds to the average proper motions or the bulk motion of the cluster.

Fig. 4 Dimensionless accuracy parameter of proper motions determination ($\Delta\mu$) as a function of three free parameters: centroid separation (left), $N_c/N_f$ ratio (middle) and average measurement error (right). The lines represent the median value of the accuracy parameter for each abscissa bin (analogous to a moving average). Red, blue, and black colors correspond to BKDE-e, KDE and parametric analysis respectively. Solid and dashed lines correspond to the mode and weighted mean procedures for non-parametric analysis.

The locations of the centroid, in both the $\mu_x$ and $\mu_y$ directions, are the main parameters of the parametric equations (Zhao and He 1990; Krone-Martins et al. 2010). In this way, the cluster's proper motions are obtained as the algorithm converges. On the other hand, the non-parametric method yields the density estimate $\phi_c = \phi_{c+f} - \phi_f$, where the mode location of $\phi_c$ can be considered as the average proper motion. But the determination of the mode itself is a tricky process. Balaguer-Núñez et al. (2007) used the bin location with the highest $\phi_c$ as the indicator of the cluster's bulk motion, with the size of the bin as its uncertainty.

In this study, the following procedures are employed:

• Based on the established $\hat{f}_c$, data points with $\hat{f}_c \geq 0.9 \times \hat{f}_{c,\max}$ are assigned as the elite data used for the next steps. The probability limit of 90% is chosen based on the assumption that the attained elite data represent the mode of the distribution.

• The mode or the average proper motion is the average value of the elite data points:

$$\mu = \frac{1}{n_{\mathrm{elite}}} \sum_{i=1}^{n_{\mathrm{elite}}} \mu_{\mathrm{elite},i}. \tag{11}$$

• The standard deviation of the elite data points is considered as the proper motion's uncertainty:

$$\sigma_\mu^2 = \frac{1}{n_{\mathrm{elite}} - 1} \sum_{i=1}^{n_{\mathrm{elite}}} \left(\mu_{\mathrm{elite},i} - \mu\right)^2. \tag{12}$$

The usage of the mode as the indicator of the cluster's bulk motion in non-parametric methods has positive and negative features. In this procedure, skewness or a wide wing of the distribution will not affect the final result. But the subtraction in Equation 9 may slightly deform the shape of $\hat{f}_c$ and shift the location of the mode.

Table 1 Statistics of the cluster proper motions determination accuracy parameter ($\Delta\mu$) for each method, summarized by its mean and median.

Method          ∆µ Mean   ∆µ Med   Accuracy Mean   Accuracy Med
Parametric      0.120     0.056    88%             94%
Mode: KDE       0.204     0.093    80%             91%
Mode: BKDE-e

Another alternative procedure to determine the proper motions is the probability-weighted average:

$$\mu = \frac{\sum_{i=1}^{n} \mu_i P_i}{\sum_{i=1}^{n} P_i}, \tag{13}$$

with membership probability $P_i$. This procedure is very sensitive to the membership probability determination. Therefore, only 2σ members are included in the calculation.

4.4 Accuracy of the Proper Motions Determination

To evaluate and compare the accuracy of the three methods, an accuracy parameter $\Delta\mu$ is defined as follows:

$$\Delta\mu = \frac{|\mu - \mu_0|}{D_\mu}, \tag{14}$$

where $\mu$ is the mean proper motion from parametric and non-parametric analysis, using the mode and weighted mean procedures, while $\mu_0$ represents the true value of the proper motions and $D_\mu$ represents the distance between the cluster and field star distributions in the VPD (centroid separation). This parameter is equal to 0 for accurate determination or ≳ µ from 1000 simulated models are shown in Figure 4.

Table 2 Proper motion of NGC 2682 obtained by several authors. Among these authors, Bellini et al. (2010) used CFHT and LBT observations, Krone-Martins et al. (2010) used PM2000 (Ducourant et al. 2006), Balaguer-Núñez et al. (2007) used UCAC2 (Zacharias et al. 2004), while the rest directly used the Tycho-2 catalog (Høg et al. 2000).

µ_α cos δ        µ_δ              Reference                         Data Source
−9.94 ± 0.85     −4.92 ± 0.88     this study                        UCAC4
                                  Krone-Martins et al. (2010)       PM2000
                                  Frinchaboy and Majewski (2008)    Tycho-2
                                  Kharchenko et al. (2005)          Tycho-2
                                  Dias et al. (2002)                Tycho-2

$\Delta\mu$ versus $D_\mu$: As expected, the accuracy of proper motions determination increases as the centroids of the two populations become more separated. The relative error $\Delta\mu$ decreases as $D_\mu$ increases. It is shown that the weighted mean solutions of non-parametric analysis are not better than the mode solution of BKDE-e. This latter solution is as good as the parametric solution.

$\Delta\mu$ versus $N_c/N_f$: The variation of $\Delta\mu$ as a function of $N_c/N_f$ obtained from the simulation is in line with Cabrera-Caño and Alfaro (1990), in that the accuracy increases as the ratio increases. The cluster's centroid can be identified more easily as the cluster becomes more populous. Similar behaviour of the mode solution of BKDE-e, which is superior to the KDE and weighted mean solutions, is also observed. The lower accuracy of the weighted mean solutions comes from overestimated membership probability as discussed before.

$\Delta\mu$ versus $\langle\epsilon\rangle$: The accuracy of proper motions determination declines monotonically as the measurement errors grow. This is trivial, but the inferiority of the KDE mode solution becomes clearer, as shown in Figure 4. At ⟨ε⟩ ∼ e serves better result compared to parametric approach, even in large measurement errors.

Median and mean values of $\Delta\mu$ for each method are also listed in Table 1. Based on those distributions, it can be summarized that parametric method has the highest accuracy (on average ∼ e , with average accuracy ∼ ∼ µ . This result shows that the BKDE-e method is eligible to be used in future works.

It is of interest to apply BKDE-e to real cluster data. As an early comparison, we choose NGC 2682 (M 67) as the subject cluster.
This cluster is a candidate birthplace of the Sun according to the similarities in metallicity and the predicted position 4.5 Gyr ago, though controversy still exists (see Brown et al. (2010) and also Pichardo et al. (2012)). Accurate determination of the actual cluster's motion in space provides an input parameter to trace back the cluster's position in the past. This will lead to a conclusion on whether the historical position of the Sun coincided with NGC 2682.

To determine the proper motion of NGC 2682, we use proper motions from the fourth US Naval Observatory CCD Astrograph Catalog (UCAC4), which includes more than 105 million stars with measured proper motions up to R-band magnitude of ∼ 16 (Zacharias et al. 2013). The typical error in proper motion ranges from 1 to 10 mas/yr, but we only use stars with error less than 7 mas/yr for further analysis. Stars within tidal radius of 0 . ′ (Kharchenko et al. 2005) around the cluster's center are considered as in-field stars while the stars within ring area 0 . ′ < r < . ′ are out-field stars. For the case of the concentrated cluster NGC 2682, an inappropriate choice of in-field and out-field radii does not significantly affect the obtained cluster proper motion, but overestimation of the radii could increase the rate of false membership assignment, as alerted by Sánchez et al. (2010). On the other hand, underestimation of the out-field radius leads to a deviated density estimate of the field stars' kinematics. This problem is not significant for rich or concentrated clusters, but it is for diffuse clusters, where it can be overcome by employing slightly larger in-field and out-field radii. In this way, false member assignments need to be pruned by photometric analysis, e.g. through the CMD.

Fig. 5 Vector point (left) and color-magnitude (right) diagrams of star cluster NGC 2682 (M 67). The membership probability of each star is denoted by a circle whose size is proportional to the probability. Dashed lines in the VPD mark the proper motion of the cluster obtained using BKDE-e.

Figure 5 depicts the membership analysis using BKDE-e. In this case, the cluster's members are relatively easy to separate from the field stars. Based on the mode of the cluster's density estimate ($\hat{f}_c$), we obtain $\mu_\alpha \cos\delta = -9.94 \pm 0.85$ mas/yr and $\mu_\delta = -4.92 \pm 0.88$ mas/yr. These results are comparable to the recent studies summarized in Table 2. However, a non-systematic deviation of the obtained proper motions, less than 0.1 mas/yr, is observed when the in-field and out-field radii are varied between 0.7 and 1.3 times the tidal radius.

The result of the decontamination process is shown in the color-magnitude diagram, where the giant branch and blue-straggler populations are clearly identified as cluster members. However, an asymmetric non-parametric membership probability is observed in the vector point diagram, where stars located away from the centre of the field stars distribution are considered as members with systematically higher membership probability.

Beyond the case of NGC 2682, UCAC4 provides a large number of proper motions for cluster kinematic analysis. This is the subject of study in the near future.
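The mode determination used to obtain the cluster proper motion from $\hat{f}_c$ (elite data selection, then Equations 11 and 12) can be sketched as below. The function name and the toy input values are hypothetical; the 0.9 threshold corresponds to the 90% limit described in Section 4.3, and the uncertainty is taken as the sample standard deviation of the elite points.

```python
import numpy as np

def elite_mode(pm, f_c, frac=0.9):
    """Mode of the cluster density estimate from 'elite' data points.

    pm   : (n, 2) array of proper motions (mas/yr)
    f_c  : cluster density estimate evaluated at each data point
    frac : elite threshold as a fraction of the maximum density
    Returns (mode, uncertainty, number of elite points).
    """
    pm = np.asarray(pm, dtype=float)
    f_c = np.asarray(f_c, dtype=float)
    elite = pm[f_c >= frac * f_c.max()]
    mode = elite.mean(axis=0)
    if len(elite) > 1:
        sigma = elite.std(axis=0, ddof=1)   # 1 / (n_elite - 1) normalisation
    else:
        sigma = np.full(pm.shape[1], np.nan)
    return mode, sigma, len(elite)

# Toy example: three points near the density peak, two far away.
pm = [[0.0, 0.0], [1.0, 1.0], [-9.9, -4.9], [-10.1, -5.1], [-10.0, -5.0]]
f_c = [0.10, 0.20, 0.95, 0.92, 1.00]
mode, sigma, n_elite = elite_mode(pm, f_c)
```

Because only points near the density peak enter the average, a skewed or wide-winged distribution does not pull the result, which is the property that motivates using the mode rather than a weighted mean.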

In the present study, we employ the non-parametric Binned Kernel Density Estimator for open cluster membership analysis using proper motions data. Some modifications are implemented in this method in order to account for the effect of measurement errors through a numerical approach (BKDE-e). A thousand Monte Carlo models are simulated to evaluate the behaviour of BKDE-e compared to the basic KDE and the parametric approach. As a result, proper motions determination from BKDE-e analysis using the mode searching algorithm is better than the KDE solutions. BKDE-e solutions are as accurate as parametric solutions. The simulation also shows that the accuracy of proper motions determination decreases as member stars become far outnumbered by field stars, though a steeper decline is observed in the accuracy versus centroid distance plot. These conclusions quantitatively support the previous statements of Cabrera-Caño and Alfaro (1990).

By including measurement errors in the calculation, the mode location of the resulting density estimate is less sensitive to non-physical or stochastic fluctuations than in ordinary KDE, which excludes measurement errors. For the typical mean measurement error of 7 mas/yr, BKDE-e suppresses the potential for miscalculation by a factor of two compared to KDE. With a median accuracy of about 93%, the BKDE-e based method has accuracy comparable to the parametric method.

Additionally, BKDE-e is also applied to the real proper motions data gathered from UCAC4 around the field of NGC 2682. The non-parametric analysis yields cluster proper motions of $\mu_\alpha \cos\delta = -9.94 \pm 0.85$ mas/yr and $\mu_\delta = -4.92 \pm 0.88$ mas/yr, which are comparable to other studies.

Although the BKDE-e method does not overtake the parametric approach, it provides flexibility in that it can be applied to astrometric and photometric data (Cabrera-Caño and Alfaro 1990; Galadi-Enriquez et al.

Acknowledgements

The authors are grateful to the anonymous referee for his/her constructive comments. R. Priyatikanto is also indebted to Tim Pembina Olimpiade Astronomi Indonesia (TPOA) for its continuous support.

The final publication is available at Springer via http://dx.doi.org/10.1007/s10509-014-2137-y

References