Unveiling the Hierarchical Structure of Open Star Clusters: the Perseus Double Cluster
DD raft version J uly
24, 2020Typeset using L A TEX twocolumn style in AASTeX62
Unveiling the Hierarchical Structure of Open Star Clusters: the Perseus Double Cluster H eng Y u ( 余 恒 ), Z heng -Y i S hao ( 邵 正 义 ), A ntonaldo D iaferio , and L u L i ( 李 璐 ) Department of Astronomy, Beijing Normal University, Beijing, 100875, China. Shanghai Astronomical Observatory, Chinese Academy of Sciences, 80 Nandan Road, Shanghai 200030, People’s Republic of China. Key Lab for Astrophysics, 100 Guilin Road, Shanghai, 200234, People’s Republic of China Dipartimento di Fisica, Universit`a di Torino, Via P. Giuria 1, I-10125 Torino, Italy Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Torino, Via P. Giuria 1, I-10125 Torino, Italy University of the Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China (Revised July 24, 2020)
ABSTRACTWe introduce a new kinematic method to investigate the structure of open star clusters. We adopt a hierarchicalclustering algorithm that uses the celestial coordinates and the proper motions of the stars in the field of view ofthe cluster to estimate a proxy of the pairwise binding energy of the stars and arrange them in a binary tree. Thecluster substructures and their members are identified by trimming the tree at two thresholds, according to the σ -plateau method. Testing the algorithm on 100 mock catalogs shows that, on average, the membership of theidentified clusters is (91 . ± . . ± . h Per and χ Per: the distributions of the proper motions and the color-magnitudediagrams of the members of Sub1-1 and Sub1-2 are fully consistent with those of h Per and χ Per reported inthe literature. These results suggest that our hierarchical clustering algorithm can be a powerful tool to unveilthe complex kinematic information of star clusters.
Keywords: open clusters and associations: individual (NGC869, NGC884), stars: kinematics and dynamics,methods: data analysis INTRODUCTIONAn open cluster is a group of stars which formed withinthe same giant molecular cloud and were roughly born at thesame time. An open cluster generally is loosely gravitation-ally bound: some of the stars can leave the group after theirbirth, while others can be tidally removed by close encoun-ters with gas clouds and other star systems. Open clustersare unique laboratories of stellar evolution, and also impor-tant probes of the structure and evolution of the Galactic disk.The identification of the cluster members is the first cru-cial step for their investigation. Open clusters do not usu-ally show a large density contrast on the sky, unlike glob-ular clusters; therefore, sophisticated algorithms are neces-sary to identify their star members. The method of the max-imum likelihood pioneered by Vasilevskis et al. (1958) and
Corresponding author: Heng Yu, Zheng-Yi [email protected], [email protected]
Sanders (1971), based on bivariate Gaussian distributions ofthe proper motions of the cluster and field stars, has beenrevised and updated by several more recent studies (e.g.,Zhao & He 1990; Kozhurina-Platais et al. 1995; Deacon &Hambly 2004; Kharchenko et al. 2004; Dias et al. 2006;Krone-Martins et al. 2010; Sarro et al. 2014; Sampedro & Al-faro 2016, to mention a few), and remains the most adoptedmethod.To avoid the bias introduced by a fixed density distribution,some non-parametric approaches have also been developed(e.g., Cabrera-Cano & Alfaro 1990; Balaguer-N´u˜nez et al.2004; Javakhishvili et al. 2006; Nambiar et al. 2019). Inaddition, clustering methods have recently been introducedin the field. Schmeja (2011) explores four clustering algo-rithms applied to the two-dimensional distribution of the starson the sky, thus ignoring any kinematic information: starcounts, nearest-neighbor density, Voronoi tessellation, andthe minimum spanning tree. Schmeja (2011) concludes thatthe nearest-neighbor density is the most reliable method. a r X i v : . [ a s t r o - ph . GA ] J u l The Gaia mission provides unprecedented astrometric,photometric and spectroscopic data of stars and star clusters(Gaia Collaboration et al. 2016, 2018a,b), that are ideal totest standard and new clustering algorithms.Krone-Martins & Moitinho (2014) combined the principalcomponent analysis and the k -means clustering to design theUPMASK method, that has been applied by Cantat-Gaudinet al. (2018) to the data of the Gaia data release 2 (DR2), thusincluding the stellar kinematic information. Cantat-Gaudinet al. (2018) identify the members of 1229 star clusters, in-cluding h Per and χ Per. An additional clustering algorithm,DBSCAN, based on the local density of points in some pa-rameter space (Ester et al. 1996), was applied by Gao (2014)to the stars in the field of NGC188 with known proper mo-tions and radial velocities. Castro-Ginard et al. (2018) alsoinvestigate the optimal parameters of DBSCAN on simulateddata.Some of the methods mentioned above do not actually dis-tinguish between members and non-members of the cluster,but rather assign a membership probability to each star inthe field. Here, we propose a hierarchical clustering method,a new kinematic method based on a simple physical quan-tity: a proxy for the pairwise gravitational binding energy.This method arranges the stars in the field of view in a bi-nary tree; by trimming this tree according to the σ -plateaumethod (Diaferio 1999; Serra et al. 2011), the method sepa-rates the star distribution into structures with unambiguouslyidentified star members.Hierarchical clustering algorithms are well-known in com-puter science and statistics. They separate a system intosubgroups based on the measure of an adopted similarity ormetric (see, e.g., Everitt et al. 2011, for a detailed descrip-tion). A hierarchical clustering algorithm was adopted byMaterne (1978) and Serna & Gerbal (1996) to investigategroups and clusters of galaxies. Diaferio (1999) and Serraet al. (2011) improved over the original algorithm by intro-ducing the σ − plateau criterion to identify both the galax-ies that are members of a galaxy cluster (Serra & Diaferio2013) and the cluster substructures (Yu et al. 2015, 2016; Liuet al. 2018). In principle, this method can also be appropri-ate for other systems held together by gravity, like star clus-ters. Here, we show that this is indeed the case and apply themethod to the Perseus double cluster.The Perseus double cluster is a bright and rich open clus-ter, located at the distance of 2344 + − pc from the Sun (Diaset al. 2002; Currie et al. 2010; Gaia Collaboration et al.2018c), with a relatively young age of about ∼ . −
14 Myr(Keller et al. 2001; Currie et al. 2010). Based on di ff erentdata sets and methods (Uribe et al. 2002; Currie et al. 2010;Kharchenko et al. 2013; Gaia Collaboration et al. 2018c;Cantat-Gaudin et al. 2018), many of the Perseus propertieshave been extensively investigated, including the stellar mass function (Slesnick et al. 2002), the mass segregation (Bragg& Kenyon 2005), the substructures and its surrounding stel-lar halo (Currie et al. 2010; Zhong et al. 2019), and the ex-tended main-sequence turno ff (Li et al. 2019). The two maincomponents of the Perseus cluster, h Per (NGC869) and χ Per(NGC884), have similar photometric and spectroscopic prop-erties. It follows that separating their members with methodsbased on photometric data alone is not a trivial task. The twoPerseus components are clearly separated on the sky, so anusual and simple strategy is to consider a star as a member ofone of the two components if it is located within one of twoareas of the sky chosen a priori .The hierarchical clustering method we propose here iden-tifies the substructures in the field of view of the cluster with-out assuming the position and size of the substructures in ad-vance. The precise measurement of the
Gaia
DR2 data of thePerseus cluster provides an ideal test of our method.We describe the method in Sect. 2 and test it on mockcatalogs in Sect. 3. We apply the method on the data in thefield of view of Perseus in Sect. 4, and we present our resultsin Sect. 5. We conclude in Sect. 6. METHODOur hierarchical clustering algorithm arranges all the ob-jects in the field of view in a binary tree according to a proxyof the pairwise gravitational binding energies of the objects:pairs of objects with increasingly absolute value of the bind-ing energy will appear at increasingly deeper levels of thetree. By trimming the binary tree at appropriate thresholds,we can associate the tree branches to well-defined kinematicstructures. We apply this procedure to the star cluster.2.1.
The binary tree
The pairwise binding energy E i j of any pair of stars i and j combines their gravitational potential energy and their ki-netic energy: E i j = − Gm i m j r i j + m i m j m i + m j ) v i j , (1)where m i and m j are the star masses and r i j and v i j are thepairwise relative distance and velocity, respectively, with G the gravitational constant. In principle, we could estimate E i j by knowing the mass of the two stars and their six phase-space coordinates.Here, we intend to apply the algorithm to the field of viewof Perseus. Therefore, for the three spatial coordinates, weconsider the two celestial coordinates alone and ignore thedistance of the star from the observer. In fact, the averageuncertainty on the star parallax is ∼ . ∼
500 pc for the distance to a star withinthe Perseus cluster; it follows that the uncertainty on the stardistance is ∼
10 times larger than the cluster size ∼
41 pc.We thus assume that all the stars are at the same distance,corresponding to the distance r = .
34 kpc of the star cluster.As for the velocity components, in
Gaia
DR2 the typicaluncertainty on the proper motion is ∼ . − for a starof brightness G =
17 mag (Gaia Collaboration et al. 2018b),which corresponds to a velocity of ∼ − at the Perseusdistance ∼ G =
17 mag. In fact, an uncertaintyas small as ∼ − on the radial velocity can only beobtained for much brighter stars, with G =
12 mag (GaiaCollaboration et al. 2018b). We therefore ignore the stellarradial velocities.In conclusion, we consider only four out of the six phase-space coordinates, due to the large uncertainties of the twoneglected coordinates. In Sect. 3, we perform a test thatshows that these four coordinates are indeed su ffi cient to pro-vide an appropriate proxy of the binding energy.We thus estimate the pairwise binding energy of each starpair as E i j = − G m i m j r θ i j p + m i m j m i + m j ( ∆ µ x + ∆ µ y )2 r , (2)where r = θ i j is the pairwise projected angular separation of eachstar pair, ∆ µ x and ∆ µ y are the pairwise di ff erences of the twoorthogonal components of the proper motion, µ x = µ RA cos δ and µ y = µ DEC , with µ RA and µ DEC the proper motions alongthe celestial coordinates right ascension α and declination δ . Assigning the masses m i and m j is a crucial issue that de-termines the correct balance between the gravitational and ki-netic energy contributions. The simplest approach to assignthe mass to each star could rely on the mass-luminosity rela-tion. Unfortunately, this method is prone to be dominated bythe uncertainty on the distance of the stars and their derived Neglecting the component along the line of sight of both the positionand the velocity of each star clearly overweights the gravitational potentialenergy over the kinetic energy, because we underestimate both the three-dimensional separation and the three-dimensional relative velocity. Thise ff ect is easy to quantify in spherically symmetric systems: for a three-dimensional pairwise separation r ij and an angle χ between r ij and the lineof sight, we have the projected pairwise separation ˆ r ij = r ij sin χ ; by aver-aging over all the possible lines of sight, we find (cid:104) / ˆ r ij (cid:105) = / r ij (cid:104) / sin χ (cid:105) = π/ (2 r ij ). Similarly, for the pairwise kinetic energy per unit mass v ij , wefind the average projected pairwise kinetic energy per unit mass (cid:104) ˆ v ij (cid:105) = v ij (cid:104) sin χ (cid:105) = (2 / v ij . Therefore, we overestimate the absolute value of thegravitational potential energy by a factor π/ / luminosity: foreground bright stars erroneously associated tothe cluster can generate spurious gravitational potential wellsand background faint stars can erroneously be associated tocluster substructures. We thus set m i = m j = m for any i and j to avoid unnecessary complications deriving from these un-certainties, and set m = (cid:12) ; this mass appears to be a rea-sonable value according to the recent mass function of openclusters (Bastian et al. 2010).To control the balance between the two energy contribu-tions in eq. (2), we introduce the parameter p , that is au-tomatically determined by our algorithm, as we illustrate inSect. 2.3 below. As mentioned above, v i j is a ff ected by alarge uncertainty which is comparable to the velocity disper-sion of the star cluster, and overestimating v i j , and thus thekinetic energy contribution, is thus more likely than under-estimating it, compared to a situation where more accuratemeasures of v i j were available. To restore the correct balancebetween the gravitational and kinetic energy contributions,it appear thus reasonable to underweight the kinetic energycontribution by artificially amplifying the gravitational en-ergy; we obtain this result by introducing the factor p , whichwill always be larger than 1. The pairwise binding energy E i j will not correspond to its actual value, but we are more likelyto preserve the correct relative weight of the two energy con-tributions.For a catalog of N stars, the binary tree is built as follows(see Diaferio 1999; Serra et al. 2011; Yu et al. 2015 for fur-ther details):i. initially, each star α is an individual group G α , and wethus have N groups G i , with i = { , . . . , N } ;ii. we estimate the binding energy between two groups G α and G β as E αβ = min { E i j } , where E i j is the pairwisebinding energy between the star i ∈ G α and the star j ∈ G β ;iii. the two groups with the smallest binding energy arereplaced with a single group and the total number ofgroups is decreased by one;iv. we repeat the procedure from step (ii) until only onegroup is left.At the end of the procedure, all the stars in the field of vieware arranged in a binary tree. As an example, Fig. 1 showsthe binary tree obtained by applying the algorithm to the 172stars brighter than G =
10 mag in the field of view of Perseus.For this illustrative example, we set p = dendrogram . Each segment shown in the figure is a treebranch that links two nodes. The two nodes hanging beloweach node, which is the parent node, are called children. Theroot at the top of the dendrogram, which is not shown in Fig.1, is the parent of all the nodes. The nodes without children,at the bottom of the dendrogram, are the leaves. Each star is aleaf at the bottom of the dendrogram, and each group of starsis identified by each node at each level of the dendrogram.The ordinate of each node is its binding energy, whereas itsabscissa value is set to properly display the dendrogram.2.2. Trimming the tree
The identification of the stellar structures in the tree re-quires the definition of a threshold to trim the branches of thetree. To set this threshold, we consider that a gravitationallybound cluster can be roughly approximated by an isothermalsphere; therefore, di ff erent subsamples of the cluster mem-bers should approximately return the same estimate of thevelocity dispersion of the cluster (Diaferio 1999; Serra et al.2011).We can exploit this feature when the cluster we are inter-ested in is the richest system in the field of view. In this case,the cluster corresponds to the main branch of the tree, namelythe set of nodes, at each level of the tree, from which thelargest number of leaves hangs. In Fig. 1, the main branch ishighlighted by the thick black line. We walk along the mainbranch and compute the velocity dispersion σ of the leaveshanging from each node on the main branch at each level ofthe tree. Figure 2 shows σ on the main branch of the bi-nary tree of Fig. 1 as a function of the main branch nodes.The node identification numbers on the horizontal axis aresorted from the root on the left to the leaves on the right ofthe panel. This figure shows that when walking from the rootto the leaves, the velocity dispersion drops rapidly, reachesa long plateau, and decreases again. This latter drop is notactually obvious for the small sample we show in Fig. 2, butit appears more clear in richer structures, as for the structureshown in Fig. 11 below. We call this plateau the σ plateau.To locate the σ plateau, we consider the distribution of σ ,as shown in the right panel of Fig. 2. We fit this distributionwith the Gaussian mixture model (GMM, Pedregosa et al.2011) G = N G (cid:88) i w i g ( σ i , δ i ) = N G (cid:88) i w i δ i √ π e − ( x − σ i ) / δ i , (3)where the sum is over the N G Gaussians g ( σ i , δ i ). N G is setaccording to the Bayesian Information Criterion (BIC; Ivezi´cet al. 2014), that selects the model with the smallest BIC tominimize the number of free parameters required to fit thedata. We limit the search of N G in the range 1 −
10. For eachGaussian component, the fit returns its mean σ i , its standarddeviation δ i and its weight w i . The largest w i identifies theprincipal Gaussian component, and thus its parameters σ and δ . The principal Gaussian component captures the σ plateau. We identify the σ plateau from σ and δ : the nodescorresponding to σ + δ and σ − δ define the extension of the plateau; these nodes are highlighted by the two verti-cal dashed lines in Fig. 2. The plateau generally is neitherexactly flat nor monotonically decreasing; we thus identifythe nodes associated with the largest and the smallest veloc-ity dispersions within the plateau extension and call them thekey nodes. The key nodes are highlighted with the red sym-bols in Fig. 2. Here, the key nodes coincide with the plateauextension, but this might not always be the case, as it hap-pens for the case shown in Fig. 11. We adopt the bindingenergies of these two key nodes as the thresholds for trim-ming the tree: the leftmost and rightmost key nodes identifythe cluster and its substructures, respectively.2.3. Setting the parameter p
The value of p is identified by requiring an appropriate bal-ance between the gravitational and the kinetic energy contri-butions in eq. (2): if p is too small, the gravitational energy isunderestimated, a substantial number of real members are notassociated to the cluster and the extension of the σ plateau isunderestimated; if p is too large, the gravitational energy isoverestimated and a substantial number of interlopers, erro-neously associated to the cluster, blur the appearence of the σ plateau.We identify the proper value of p by exploiting its e ff ecton the weight w of the principal Gaussian component men-tioned in Sect. 2.2: the largest possible w identifies theoptimal p . We adopt the three-point equal-interval searchscheme (Ravindran et al. 2006). We start with three valuesof p = { p , p , p } = { , , } . For each of them, we de-rive the binding energies and build the binary tree. We thusderive the corresponding value of w for each p i . If w in-creases monotonically with p i , we repeat the procedure forthe fourth value of p , p = p − p =
60. We repeatedly doso until w decreases.We thus find the maximum w corresponding to the value p M of our set { p , p , . . . , p M − , p M , p M + } , with p i + = p i − p i − , for i ≥
2. The values p M − and p M + define thelength L = p M + − p M − . We now derive w for p left = p M − + L / p right = p M − + L / w among the five values corresponding to the setof p , { p M − , p left , p M , p right , p M + } . We thus identify the newset of three consecutive values of p , { p a , p b , p c } , where thelargest w is associated to p b . This set, which has now length L (cid:48) = L /
2, is taken as the new searching range and we againderive w for p (cid:48) left = p a + L (cid:48) / p (cid:48) right = p a + L (cid:48) /
4. Weiterate this procedure until the length of the searching range L is smaller than 3.Figure 3 shows a flow chart of our algorithm. The core ofthe procedure is the classical hierarchical clustering method,that we implemented based on the python module scikit-learn (Pedregosa et al. 2011). The crucial modifications webrought to this module are the adopted similarity and the Figure 1.
Dendrogram of the 172 stars brighter than G =
10 mag in the field of view of Perseus. The stars are the leaves of the tree at the bottomof the dendrogram. The vertical coordinate of each node is its binding energy. The black path highlights the main branch. The horizontal blackline shows the lower threshold; the upper threshold is closer to the root and is outside the plot. The upper threshold identifies the main structurewhose leaves are highlighted by the blue branches from which they hang; the lower threshold identifies two substructures whose leaves arehighlighted by the red and green branches from which they hang. trimming criterion: for the former, we replace the originaldimensionless distance with the proxy E i j (eq. 2) of the grav-itational binding energy; for the latter, the criterion is basedon the σ plateau method described in the previous section,that identifies both the cluster and its substructures. The mosttime-consuming section of the algorithm is the calculation ofthe pairwise binding energy and thus the computational costof the algorithm is proportional to N , with N the number ofstars in the field. MOCK CATALOGSWe test our method on a set of mock catalogs. We builda synthetic cluster of 3000 members whose number densitydistribution on the sky follows the spherical King’s model (King 1962): ρ ( r ) = ρ c (cid:112) + r / r − (cid:113) + r / r . (4)We set the tidal radius r t to infinity and derive the other pa-rameters of the model to roughly mimic the Perseus clusterat its distance ∼ . ρ c = . − returns ∼ ∼ r c = r c ∼ Figure 2.
Velocity dispersion of the leaves hanging from the nodeson the main branch of the dendrogram shown in Fig. 1 as a func-tion of the node identification number. The root of the binary tree ison the left; the leaves are on the right. The σ plateau is shown bythe horizontal solid line; the vertical dashed lines show the plateauextension. The red square and triangle indicate the key nodes ofthe plateau. The distribution of the velocity dispersions on the mainbranch nodes is shown in the right panel. Its multi-Gaussian fit re-turns three components shown by the red, orange, and green curves. Gaussian distributions with means and standard deviations µ x = − . ± .
18 mas yr − and µ y = − . ± .
18 mas yr − ,as estimated from the GAIA DR2 data (Li et al. 2019). Theparallaxes of the stars are sampled from a Gaussian distribu-tion with mean and standard deviation 0 . ± .
06 mas. Thestandard deviation is the instrumental error of GAIA (GaiaCollaboration et al. 2018b), because the error associated tothe size of the cluster is negligible. The line-of-sight veloci-ties are sampled from a Gaussian distribution with mean andstandard deviation − . ± . − , according to the recentmeasurement of NGC884 (Gaia Collaboration et al. 2018c).As background stars, we select the 29446 stars brighterthan G =
18 mag from the circular region of radius 48 ar-cmin located one degree south of NGC869, where no clus-ter exists. We keep all the properties of these backgroundstars unaltered, namely celestial coordinates, proper motionsand parallaxes. For ∼
97% of these stars, the radial velocitymeasures are unavailable. To the stars with missing radialvelocity we associate a radial velocity by sampling the Gaus-sian distribution of the available measures, whose mean andstandard deviation are − ±
33 km s − .We now assign the celestial coordinates to the stars of themock cluster so that we have the cluster at the center of thisfield of background stars. We randomly select 7000 starsfrom the background stars and create a catalog of 10000 stars,so that 30% of the catalog stars are cluster members.From this mock catalog, we extract 20 subsamples with N randomly selected stars. We adopt 5 values of N = N stars and initial p calculate the pairwisebinding energiesfind the minimumbinding energy E αβ replace the groups α and β with a single group: N=N-1N > w ( p ) and thecorresponding σ plateauwas the maximum of thefunction w ( p ) found?trim the tree andidentify the structures update p yesno noyes Figure 3.
Flow chart of the procedure for the identification of thecluster structures. , , , , p when the range L , described in the previous section, dropsbelow 10 rather than 3.The blue dots in Fig. 4 show the completeness of the cat-alogs, namely the fraction of the members of the identifiedmain structure that actually are real members, as a functionof the sample size N . The blue dots of Fig. 5 shows theinterloper fractions, namely the fraction of the members ofthe identified main structure that actually are background orforeground stars, as a function of N . The completeness andthe interloper fractions appear basically independent of thesample size N , for the range we investigate here. The aver-age completeness and interloper fractions are (91 . ± . . ± .
500 1000 1500 2000 2500 3000 3500 4000Sample size N0.8250.8500.8750.9000.9250.9500.9751.000 C o m p l e t e n e ss Figure 4.
Completeness of the main structure in the 100 mockcatalogs as a function of the size N of the sample. The blue (red)dots show the completeness from the analysis with four (six) phase-space coordinates alone (eqs. 2 and 6, respectively). The curves andshaded areas show the mean and standard deviation of each sampleat fixed N .
500 1000 1500 2000 2500 3000 3500 4000Sample size N0.0000.0250.0500.0750.1000.1250.1500.175 I n t e r l o p e r f r a c t i o n Figure 5.
Same as Fig. 4 for the fraction of interlopers.
In the estimation of the binding energy we only include thetwo celestial coordinates and the two proper motion compo-nents. To quantify the systematic error caused by ignoringthe distance to the star and its radial velocity, we also com-pute the completeness and the interloper fraction when weadopt the full expression for the binding energy E i j = − G m i m j r i + r j − r i r j cos θ i j p (5) + m i m j m i + m j ( µ xi r i − µ x j r j ) + ( µ yi r i − µ y j r j ) + ( v i − v j ) , with obvious meaning of the symbols. In this expression westill need to include the parameter p , because we continueto assume m i = m j = M (cid:12) . As in the previous tests, tolimit the computational e ff ort, we stop the procedure for the identification of p when the searching range L drops below10.With the six phase-space coordinates, the mean complete-ness increases to (96 . ± . . ± . σ from the six-coordinate values. The bias on the inter-loper fractions is relatively larger, but still within 3 σ fromthe six-coordinate values. We thus conclude that limiting ouralgorithm to four phase-space coordinates does return a bi-ased completeness and a biased interloper fraction comparedto an approach where all the six coordinates were known;however, these biases are expected to be within the randomfluctuations. DATA OF THE PERSEUS CLUSTERWe consider the data of the Perseus cluster from the
Gaia
DR2 catalog (Gaia Collaboration et al. 2016, 2018a) whichis publicly available on the EAS Gaia archive . We considerthe 32,672 stars brighter than G =
18 mag within the circularregion of radius 48 arcmins centered on the Perseus cluster: α =
2h 20m 45.3s (35.1889 deg), δ = ◦ (cid:48) . (cid:48)(cid:48) (57.1307deg). Our sample excludes the stars in the outer halo ofPerseus, which is more extended than its central region, asdiscussed in Zhong et al. (2019).We compare our analysis with the most recent identifica-tion of the members of Perseus performed by Cantat-Gaudinet al. (2018). They adopt the UPMASK method (Krone-Martins & Moitinho 2014) and use the celestial coordinates,proper motions and parallaxes provided by the Gaia
DR2sample. Cantat-Gaudin et al. (2018) set the centers and theradii of two circular regions, and to each star within these re-gions they assign a membership probability according to theuncertainties on the proper motion and parallax of the star.The two circular regions are shown in Fig. 6: they have a di-ameter of 18 arcmins, and they do not overlap. The coloredsymbols in Fig. 6 show the members with membership prob-ability P > .
5. We also consider the stars with membershipprobability P >
0. The basic properties of these star samplesare listed in Table 1.Figure 7 shows the color-magnitude diagram of the sys-tem, from the accurate photometric data of the
Gaia mission,based on the broad band G magnitude, the blue G BP (330 -680 nm) and red G RP (640 - 1000 nm) colors.Figure 8 shows the Perseus members in the plane of thecomponents of the star proper motions. The distributionsof these proper motions have mean and standard deviationsin the two directions: µ RA = ( − . ± . − , https: // gea.esac.esa.int / archive / Table 1.
The basic properties of the Perseus double cluster according to Cantat-Gaudin et al. (2018)Name FoV a N N . D d µ RA e µ DEC (mag) P > P > . / yr) (mas / yr)Field 1.6 32672 - - - -NGC869 0.3 1422 720 2344 + − -0.685 ± ± + − -0.614 ± ± a Diameter of the field of view (FoV). b Number of stars brighter than G =
18 mag in the FoV. For NGC869 and NGC884, it is the number of stars with membership probability P > c Number of stars brighter than G =
18 mag in the FoV with membership probability P > . d Distance from the Sun according to Currie et al. (2010). e Mean proper motion with one standard deviation of the stars with membership probability P > . Figure 6.
Distribution of the members of the Perseus double clusteraccording to Cantat-Gaudin et al. (2018) in the azimuthal equidis-tant projection. The grey dots show our sample of 32,672 starsbrighter than G =
18 mag. Cantat-Gaudin et al. (2018) assign amembership probability P to the stars within two circular regionsassociated to NGC869 and NGC884 with predetermined centers andradii. The stars with membership probability P > P > . µ DEC = ( − . ± . − for NGC869, and µ RA = ( − . ± . − , µ DEC = ( − . ± . − for NGC884.The standard deviations of these proper motion distribu-tions coincide with the typical uncertainty on the proper mo-tion ∼ . − of G =
17 mag stars of the
Gaia
DR2sample, which, at the distance of the Perseus cluster, corre-sponds to an uncertainty of ∼ . − on the velocity.These standard deviations are slightly larger than the esti-mated velocity dispersion ∼ . − of the Perseus doublecluster (Bragg & Kenyon 2005). However, if we only con-sider stars with membership probability P > .
5, the stan-dard deviations reduce by almost a factor of ∼
2: the meanand standard deviation of the proper motions become µ RA = ( − . ± . − , µ DEC = ( − . ± . − for NGC869, and µ RA = ( − . ± . − , Figure 7.
Color-magnitude diagram of the members of the Perseuscluster according to the membership of Cantat-Gaudin et al. (2018).Colors and symbols are as in Fig. 6. The members of NGC884 andNGC869 have indistinguishable sequences. µ DEC = ( − . ± . − for NGC884, as listed inTable 1. Figure 8.
Proper motions of the Perseus members according to themembership of Cantat-Gaudin et al. (2018). Colors and symbols areas in Fig. 6.
Figure 9.
The weight w of the principal Gaussian component ofthe multi-Gaussian fitting procedure as a function of p . The largest w corresponds to p =
10. In addition to the values probed by theprocedure described in Sect. 2.3, the figure also shows w for p = p = THE HIERARCHICAL STRUCTURE OF PERSEUSWe apply the method described in Sect. 2 to the positionsand proper motions of the set of stars in the Perseus field ofview described in the previous section. Figure 9 shows thatthe algorithm sets to p =
10 the optimal value of the parame-ter p that identifies the σ plateau. The hierarchical clusteringmethod builds, according to the proxy of the pairwise bindingenergy, the binary tree shown in Fig. 10.The velocity dispersions of the nodes on the main branchare shown in Fig. 11. The σ plateau is at 2.26 ± − ;its location and extension are shown by the horizontal solidline and the two vertical dashed lines. The two key nodes ofthe σ plateau, indicated by the red symbols, correspond tothe thresholds at binding energies E = − . s − M (cid:12) and E = − . s − M (cid:12) , respectively. With these twothresholds, we identify the members of the Perseus clusterand its substructures. The main cluster Sub1
With the upper threshold E = -0.0006 km s − M (cid:12) , the bi-nary tree returns only one structure with more than 100 mem-bers: it contains 4542 members. We call this structure Sub1.Figure 12 shows the distribution of its members on the sky.Its extended shape supports the conclusion of Zhong et al.(2019) that the physical scale of the double cluster is muchlarger than the size of its core.We compare the members of Sub1 with the members ofNGC869 and NGC884 identified by Cantat-Gaudin et al. The catalog of the members of Perseus and its substructures de-rived with our procedure is publicly available at http: // paperdata.china-vo.org / yuheng / paper / NGC0869 mag18 r0.8.mem.zip. (2018): 1137 (912) members of Sub1 are stars of NGC869(NGC884), whose membership probability P computed byCantat-Gaudin et al. (2018) is larger than 0. Figure 13 showsthe distributions of P of these stars. The Sub1 memberstend to have large P , whereas a large fraction of stars thatCantat-Gaudin et al. (2018) associate to small P are not iden-tified as members of Sub1. Specifically, for NGC869, Sub1contains 97.8% of the Cantat-Gaudin et al. (2018) stars with P > .
5, namely 704 stars out of 720; similarly, for NGC884,Sub1 contains 98.6% of the P > . P ≤ .
5, Sub1 contains 61.7% (433 stars out of 702)and 69.9% (436 stars out of 624) of the Cantat-Gaudin et al.(2018) stars, for NGC869 and NGC884, respectively.Figure 14 shows the distributions of the magnitudes of thestars with P >
0, according to Cantat-Gaudin et al. (2018),in the circular regions of NGC884 or NGC869, and the dis-tributions of the magnitudes of the subsets of these stars thatare also members of Sub1. We recover most stars of Cantat-Gaudin et al. (2018) with P > G =
16 mag:Sub1 includes 782 out of the 902 (86.7%) bright members ofNGC869 and 603 out of the 674 (89.5%) bright members ofNGC884 of Cantat-Gaudin et al. (2018).Figure 15 shows the distribution of the members of Sub1 inthe plane of the components of the star proper motions. TheSub1 members are concentrated within a roughly circular re-gion of radius ∼ . − . The distribution has mean andstandard deviation in the two directions: µ RA = ( − . ± . − and µ DEC = ( − . ± . − .These standard deviations are in between the standard de-viations ∼ . − .
24 mas yr − of NGC884 and NGC869,estimated with the stars of Cantat-Gaudin et al. (2018) with P >
0, and the standard deviations ∼ . − .
14 mas yr − ,estimated with the stars with P > .
5, as reported in Sect. 4.Figure 16 shows the distribution of the parallaxes of theSub1 members. Similarly to Cantat-Gaudin et al. (2018), weadded the zero-point o ff set 0.029 mas to each star parallax tocorrect for the systematic errors of the Gaia
DR2 . The meanand standard deviations (cid:36) = . ± .
074 mas match thevalues of the two clusters based on the stars with membershipprobability P > . (cid:36) = . ± .
042 mas for NGC869 and (cid:36) = . ± .
039 masfor NGC 884.Finally, Fig. 17 shows that the color-magnitude diagramof the Sub1 members is qualitatively comparable with thecolor-magnitude diagrams of the Cantat-Gaudin et al. (2018) According to Zinn et al. (2019), this o ff set derives from the degen-eracy in the astrometric solution between the global parallax shift and theterm describing the periodic variation of the spacecraft’s basic angle withthe spacecraft spin period. -0.01500.0150.03 B i n d i n g E n e r g y ( k m s M ) Figure 10.
Dendrogram of the Perseus double cluster with all the 32,672 stars in the field of view. The horizontal black lines are the twotrimming thresholds. The colored branches are the structures identified by the thresholds. The color code of the two structures is kept the samethroughout the paper. stars with membership probability P > The substructures Sub1-1 and Sub1-2
A relevant advantage of the hierarchical clustering methodis that, once the stars are arranged in the binary tree, di ff erentstructures in the field of view can be immediately identifiedby the proper trimming threshold.Figure 10 shows that the binary tree splits into two separatestructures at the binding energy E = − . s − M (cid:12) ,shown by the lower horizontal line. This lower threshold is set by the rightmost key node of the σ plateau shown in Fig.11. Adopting this threshold removes most of the members ofSub1 in the cluster outskirts and focuses on the deepest re-gion of the gravitational potential well of the Perseus doublecluster.The two substructures with more than 100 members iden-tified by this lower threshold, Sub1-1 and Sub1-2, are sub-structures of Sub1. Their basic properties are listed in Ta-ble 2. We also list the properties of Sub1-0, the system ofstars that are members of Sub1, but are not members of ei-ther Sub1-1 or Sub1-2.Figure 18 shows the distribution of the members of Sub1-1 and Sub1-2 on the sky. Sub1-1 and Sub1-2 overlap withNGC869 and NGC884, respectively. Compared to NGC869and NGC884, Sub1-1 and Sub1-2 contain fewer membersthan the stars with membership probability P > Figure 11.
Velocity dispersions of the leaves hanging from thenodes on the main branch of the dendrogram shown in Fig. 10 as afunction of the node identification number. The root of the binarytree is on the left; the leaves are on the right. The σ plateau isshown by the horizontal solid line, while its extension by the twovertical dashed lines. The red square and the red triangle are thetwo key nodes. The distribution of the velocity dispersions is on theright panel. Its multi-Gaussian fitting components are shown by thecolored solid lines. Figure 12.
The blue dots show the distribution on the sky of themembers of Sub1 identified by trimming the binary tree shown inFig. 10 with the threshold corresponding to the left key node ofthe σ plateau shown in Fig. 11. The grey dots show the remainingstars in the field of view. The two circles indicate the regions set byCantat-Gaudin et al. (2018). and NGC884, respectively. However, 89.5% of the Sub1-1 members, 622 out of 695 stars, are Cantat-Gaudin et al.(2018) members of NGC869 with P >
0. For Sub1-2 andNGC884, this fraction is 74.2%, namely 444 out of 598 stars.Similarly to Sub1, Sub1-1 and Sub1-2 tend to have mem-bers with large P : Table 3 shows that 71.4% of the P > Figure 13.
The open histograms show the distributions of the mem-bership probability P , according to Cantat-Gaudin et al. (2018), ofthe stars with P > P ofthe subset of these stars that are also members of Sub1. The upperand lower panel shows the distributions for NGC884 and NGC869,respectively. Figure 14.
Same as Fig. 13 for the distribution of the star magni-tudes. members of NGC869 have P > .
5, namely 444 out of 622stars; this percentage is 67.3% for NGC884, namely 299out of 444 stars. Table 3 shows that Sub1-2, that overlapswith NGC884, also contains 17 stars of the P > Figure 15.
The blue dots show the distribution of the proper mo-tions of the members of Sub1 in the plane of the proper motioncomponents. The grey dots show the remaining stars in the field ofview. The axis ranges are the same as in Fig. 8.
Figure 16.
Distributions of the star parallaxes. The blue solid his-togram is the distribution of the members of Sub1. The black solidhistogram is the combined distribution of the stars of NGC869 andNGC884 with membership probability P > tions of the stars of Cantat-Gaudin et al. (2018) with P > . σ RA , σ DEC ) = (0 . , . − , whereas NGC869has ( σ RA , σ DEC ) = (0 . , . − . Similarly,Sub1-2 has ( σ RA , σ DEC ) = (0 . , . − , whereasNGC884 has ( σ RA , σ DEC ) = (0 . , . − . There-fore, unlike the velocity dispersion of Sub1 reported inSect. 5.1, the velocity dispersions of Sub1-1 and Sub1-2are smaller than the velocity dispersions of NGC869 and Figure 17.
The blue dots show the distribution of the membersof Sub1 in the color-magnitude diagram. The grey dots show theremaining stars in the field of view. The axis ranges are the same asin Fig. 7.
Table 2.
Properties of the Perseus double cluster according to thebinary tree structureID N mema µ RAb (mas / yr) µ DEC (mas / yr)Sub1 4542 − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . a Number of the members of the binary tree structures. b Mean and standard deviation of the proper motions of the members of thebinary tree structures.
Table 3.
Membership probability of the Cantat-Gaudin et al.(2018) members that are also members of the binary tree structuresNGC869 NGC884 P > P > . P > P > . a a Number of members according to Cantat-Gaudin et al. (2018).
NGC884, indicating that our hierarchical algorithm identi-fies members with more similar proper motions than Cantat-Gaudin et al. (2018).Figure 20 shows the color-magnitude relations of the mem-bers of Sub1-1 and Sub1-2: they are similar to each otherand qualitatively similar to the color-magnitude relations ofNGC869 and NGC884 shown in Fig. 7. These similar-ities support the conclusion of Slesnick et al. (2002) that3
Figure 18.
Distribution of the members of Sub1-1 (cyan dots,mostly on the right) and Sub1-2 (magenta dots, mostly on the left)on the sky. The grey dots show the entire star sample. The twocircles indicate the regions set by Cantat-Gaudin et al. (2018).
NGC869 and NGC884, and thus Sub1-1 and Sub1-2, havethe same epoch of star formation. The scatter along the color-magnitude ridge line of Sub1-1 is ∼ .
183 mag and is compa-rable to the scatter ∼ .
184 mag of the stars of NGC869 withmembership probability P > .
5. For NGC884, the starswith P > . ∼ .
179 mag, which is ∼ ∼ .
155 mag. This re-sult shows that our algorithm identifies members with moresimilar photometric properties than the approach of Cantat-Gaudin et al. (2018).All these results indicate that Sub1-1 and Sub1-2 coincidewith the traditional clusters NGC869 and NGC884. Our re-sults also confirm that the approach of Cantat-Gaudin et al.(2018), who, unlike our algorithm, predetermined the centersand sizes of NGC869 and NGC884, is legitimate for Perseus,because, although Sub1-1 and Sub1-2 are very close to eachother, they remain largely distinct on the sky.The star members of Sub1 which do not belong to eitherSub 1-1 or Sub 1-2 are stars of the halo of the cluster. We la-bel this halo as Sub1-0. The distribution of these stars on thesky, shown in Fig. 21, shows that they are almost uniformlydistributed over the field.The di ff erence between the spatial distribution of the mem-bers of the binary tree structures and the remaining stars inthe field is further illustrated in Fig. 22, that shows the radialprofiles of the star number densities of the di ff erent struc-tures. The black line in the top is the profile of the field stars,namely all the stars in our catalog that are not members ofSub1. This profile is roughly constant and it drops in thevery center for a statistical fluctuation caused by the smallarea. The centers of the two components Sub1-1 and Sub1-2are at ∼
12 arcminutes from the center of Perseus, and deter-
Figure 19.
Distributions of the proper motions of the members ofSub1-1 and Sub1-2. Colors are as in Fig. 18.
Figure 20.
Distributions of the members of Sub1-1 and Sub1-2 inthe color-magnitude diagram. Colors are as in Fig. 18. mine the peak of the Sub1 profile shown in blue. The profilesof Sub1-1 and Sub1-2 , the cyan and magenta profiles re-spectively, display a similar peak in correspondence of thepeak of Sub1. The halo stars Sub 1-0, whose profile is in red,have a relatively flat distribution, similarly to the field starswhose profile is in black. These halo stars account for thewide spread both in the proper motion diagram shown in Fig.15 and in the color-magnitude diagram shown in Fig. 17. CONCLUSIONWe propose a hierarchical clustering algorithm to identifythe members and the substructures of open star clusters. Thealgorithm requires at least the celestial coordinates and theproper motions of the stars in the field of view of the cluster.4
Figure 21.
Distribution of the halo members of the Perseus doublecluster on the sky. The grey dots show our entire star sample. Thecrimson symbols show the members of the cluster halo Sub1-0. Thetwo circles indicate the regions set by Cantat-Gaudin et al. (2018). nu m b e r d e n s i t y ( a r c m i n ) fieldSub1Sub1-0Sub1-1Sub1-2 Figure 22.
Radial profiles of the star number density for the dif-ferent structures identified by the binary tree. The number densityis estimated in circular rings centered on the center of the field( α, δ ) = (35 . , . − σ standard deviations of Poissonvariates. We test our algorithm on star mock catalogs and apply it tothe Perseus cluster.Our hierarchical clustering algorithm is based on thesingle-linkage method, where we adopt a proxy of the pair-wise binding energy as the distance metric to arrange the starsin the field of view in a binary tree. We use the σ -plateau method to trim the binary tree and associate its branches tothe main cluster and its substructures; the leaves of thesebranches are the star members of the structures. Our algo-rithm relies neither on the photometric properties of the starsnor on any assumption on the shape, size, evolutionary ordynamical state of the cluster. The algorithm is thus ideal toinvestigate unrelaxed irregular systems like open clusters.When applied to the Gaia
DR2 data in the field of view ofthe Perseus cluster, the proxy for the pairwise binding energyassociates the same mass m = (cid:12) to all the stars in the fieldof view and uses only four out of the six phase-space coordi-nates of each star: the celestial coordinates and the two com-ponents of the proper motion. We ignore the radial distancesand the radial velocities of the stars, because their uncertain-ties are much larger than the size and the velocity dispersionof the cluster.We test the algorithm with this proxy of the binding en-ergy on 100 mock catalogs mimicking the Perseus field ofview. The algorithm correctly identifies the cluster: it returnsa completeness of (91 . ± . . ± . . ± . . ± . h Per) and NGC884 ( χ Per), respectively. In fact, their mem-bers share the same photometric and kinematic properties ofthe members of NGC869 and NGC884 identified within tworegions on the sky set a priori by Cantat-Gaudin et al. (2018).Compared to this latter analysis, the velocity dispersions ofthe members of Sub1-1 and Sub1-2 are 5% to 23% smaller,depending on the proper motion component. Similarly, thescatter around the color-magnitude relation is comparable forSub1-1 and NGC869, whereas the scatter is ∼
15% smallerin Sub1-2 compared with NGC884.These results suggest that our algorithm identifies mem-bers that have more homogeneous kinematic and photomet-ric properties than the procedure adopted by Cantat-Gaudinet al. (2018), despite the fact that our algorithm is only basedon the star proper motions and ignore the photometric andspectroscopic properties of the stars.Our hierarchical clustering algorithm is an e ffi cient toolthat can be easily applied to other data sets. With the high-accuracy data coming from future spectroscopic and astro-metric surveys (e.g. The Theia Collaboration et al. 2017;Malbet et al. 2019), that are expected to increase the accuracy5reached by, e.g., SEGUE (Yanny et al. 2009), RAVE (Kor-dopatis et al. 2013), APOGEE (Majewski et al. 2017), Gaia(Gaia Collaboration et al. 2018b) and LAMOST (Cui et al.2012), the proxy for the binding energy can be improved by(i) including all the six phase-space coordinates and (ii) as-signing the proper mass to each star. These enhancementswill further increase the ability of our algorithm to unveil thecomplex inner structure of open clusters and understandingtheir formation and evolution. Software: astropy (Astropy Collaboration et al. 2013),scikit-learn (Pedregosa et al. 2011)We dedicate this article to the 60th anniversary of the De-partment of Astronomy of Beijing Normal University, the2nd astronomy programme in the modern history of China.We sincerely thank the anonymous referee whose very con-structive comments largely improved the quality of the pa- per. This work was supported by the Bureau of Interna-tional Cooperation, Chinese Academy of Sciences under thegrant GJHZ1864. SZ acknowledges the National Key R&DProgram of China No. 2019YFA0405501. AD and HYalso acknowledge partial support from the INFN grant In-Dark and from the Italian Ministry of Education, Universityand Research (MIUR) under the
Departments of Excellence grant L.232 / Gaia (https: // / gaia), processed by the Gaia
Data Processingand Analysis Consortium (DPAC, https: // / web / gaia / dpac / consortium). Funding for the DPAC hasbeen provided by national institutions, in particular the in-stitutions participating in the Gaia
Multilateral Agreement.This research has made use of NASA’s Astrophysics DataSystem Bibliographic Services.REFERENCES
Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al. 2013,A&A, 558, A33, doi: 10.1051 / / / / annurev-astro-082708-101642Bragg, A. E., & Kenyon, S. J. 2005, AJ, 130, 134,doi: 10.1086 / / / / / / / / / / / / / / / j.1365-8711.1999.02864.xDias, W. S., Alessi, B. S., Moitinho, A., & L´epine, J. R. D. 2002,A&A, 389, 871, doi: 10.1051 / / // dl.acm.org / citation.cfm?id = / / / / / / / / / / / / // books.google.com / books?id = / / / asna.200410256 Kharchenko, N. V., Piskunov, A. E., Schilbach, E., R¨oser, S., &Scholz, R. D. 2013, A&A, 558, A53,doi: 10.1051 / / / / / / / / / / / / / / ab15d2Liu, A., Yu, H., Diaferio, A., et al. 2018, The AstrophysicalJournal, 863Majewski, S. R., Schiavon, R. P., Frinchaboy, P. M., et al. 2017,AJ, 154, 94, doi: 10.3847 / / aa784dMalbet, F., Abbas, U., Alves, J., et al. 2019, arXiv e-prints,arXiv:1910.08028. https: // arxiv.org / abs / / mnras / sty2851Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal ofMachine Learning Research, 12, 2825Ravindran, A., Ragsdell, K., & Reklaitis, G. 2006, EngineeringOptimization: Methods and Applications (Wiley).https: // books.google.com / books?id = Hf4eAQAAIAAJSampedro, L., & Alfaro, E. J. 2016, MNRAS, 457, 3949,doi: 10.1093 / mnras / stw243 Sanders, W. L. 1971, A&A, 14, 226Sarro, L. M., Bouy, H., Berihuete, A., et al. 2014, A&A, 563, A45,doi: 10.1051 / / / asna.201011484Serna, A., & Gerbal, D. 1996, Astronomy and Astrophysics, 309,65Serra, A. L., & Diaferio, A. 2013, ApJ, 768, 116,doi: 10.1088 / / / / / j.1365-2966.2010.17946.xSlesnick, C. L., Hillenbrand, L. A., & Massey, P. 2002, ApJ, 576,880, doi: 10.1086 / // arxiv.org / abs / / / / / / / / / / / / / / / / / / //