A LAMOST BHB Catalog and Kinematics Therein I: Catalog and Halo Properties
Draft version February 19, 2021
Preprint typeset using LaTeX style emulateapj v. 12/16/11
John J. Vickers†,1,2, Zhao-Yu Li†,1,2, Martin C. Smith, Juntai Shen†,1,2,3
ABSTRACT

We collect a sample of stars observed both in LAMOST and Gaia which have colors implying a temperature hotter than 7000 K. We train a machine learning algorithm on LAMOST spectroscopic data which have been tagged with stellar classifications and metallicities, and use this machine to construct a catalog of Blue Horizontal Branch stars (BHBs) with metallicity information. Another machine is trained using Gaia parallaxes to predict absolute magnitudes for these stars. The final catalog of 13,693 BHBs is thought to be about 86% pure, with $\sigma_{\rm [Fe/H]} \sim$ [...] $\sigma_G \sim$ [...] ($\beta \sim 0.70$) at all radii than our metal-poor BHB stars ($\beta \sim$ [...]

INTRODUCTION
Blue horizontal branch stars are reliable “standard candle” stars commonly used for studying the Galactic halo (e.g. Greenstein & Sargent 1974, Beers et al. 1992, Yanny et al. 2000). They are helium-burning giants which have evolved off the red giant branch (and so are thought to be old, Hoyle & Schwarzschild 1955), with absolute magnitudes near 0 in a variety of optical bands. The anatomy of the horizontal branch begins at the cool end with the red horizontal branch (near the red clump), progresses blueward through the RR Lyrae gap, into the blue horizontal branch, and then falls off in magnitude down the extreme horizontal branch (see the review of Catelan 2009).

Owing to their low surface gravities, BHBs exhibit narrow spectral lines compared to main sequence stars and may be selected on that basis (Pier 1983, Flynn et al. 1994, Clewley et al. 2002). This variation in spectral line shape, near the absorption line series' limits, also causes a net change in continuum flux, which may be exploited to select them based on filter colors (for example: Bell et al. 2010, Deason et al. 2011, Vickers et al. 2012).

While photometric studies of BHB stars allow observations extending to great distances (the Magellanic Clouds, Belokurov & Koposov 2016, or the halo to hundreds of kpc, Deason et al. 2018, Fukushima et al. 2018, Fukushima et al. 2019, Nie et al. 2015, Thomas et al. 2018, for example), they suffer worse contamination ($\sim$ [...] Spectroscopy also unlocks the sixth phase-space coordinate through radial velocity measurements.

Unfortunately, BHB stars' hot temperatures (> [...] investigate the ancient history of the Milky Way.

1 Department of Astronomy, School of Physics and Astronomy, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China
2 Key Laboratory for Particle Astrophysics and Cosmology (MOE) / Shanghai Key Laboratory for Particle Physics and Cosmology, Shanghai 200240, China
3 Key Laboratory for Research in Galaxies and Cosmology, Shanghai Astronomical Observatory, Chinese Academy of Sciences, 80 Nandan Road, Shanghai 200030, People's Republic of China

[Footnote] “Purity,” also called “precision,” refers to the fraction of true positives among all positive classifications. “Contamination” refers to the fraction of false-positive misclassifications among all positive classifications (one minus the purity). “Completeness,” also called “recall,” refers to the fraction of all true members of a class which receive a “true positive” classification (that is, the proportion of true candidates which are successfully recovered).

BHBs have long been used to study the Milky Way halo. Their bright absolute magnitudes allow them to probe great distances, and their old ages are ideal for studying the canonical in-situ halo. While the halo formation scenario has grown more complicated in recent years, from the monolithic collapse of Eggen et al. (1962) to the present understanding of hierarchical formation (Searle & Zinn 1978, Helmi et al. 1999, Newberg et al. 2002), BHBs remain useful, as the accreted satellites often possess ancient BHB populations of their own.

This paper proceeds as follows: in Section 2 we describe our data to be classified and reduced, and go through how we assign classifications, metallicities, and absolute magnitudes to our data; in Section 2.6 we describe some “sanity checks” to double check our reduced data (such as comparing pipeline values to globular cluster values); in Section 3 we describe our coordinate systems; in Section 4 we look over the chemistry, kinematics, and ages of our halo BHBs; we discuss our findings in Section 5 and conclude in Section 6.
Abbreviations and Notation
For this paper, we will use the following abbreviations: “BHB” is a Blue Horizontal Branch star; “MSA” is a Main Sequence A-type star; “XGB” indicates an eXtreme Gradient Boosted (XGBoost) random forest algorithm or a value which has been predicted by one; $\varpi$ indicates Gaia parallax or a value calculated from it; the subscript “0” indicates a photometric value which has been corrected using the full column extinction of Schlegel et al. (1998) with the filter coefficients from Casagrande & VandenBerg (2018); an upper-case G will represent an absolute magnitude in the Gaia g band; “SNR” stands for signal-to-noise ratio.

PREDICTING STELLAR TYPES, METALLICITIES, AND ABSOLUTE MAGNITUDES WITH XGBOOST
Data
Our BHB catalog will be constructed from data collected by the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST; Luo et al. 2015). This telescope, located at Xinglong Station in Hebei Province, collects 4000 spectra per exposure at a resolution of R $\sim$ [...]

Foundation
The fundamental spectral difference between BHB stars and the main contaminants in this spectral color regime (main sequence A stars and blue stragglers, quasars, and white dwarfs) is the presence of hydrogen absorption lines (which are instead emission lines in quasar spectra) which are narrow due to the low surface gravity of giant BHBs compared to the other stellar contaminants. See, for example, Vickers et al. (2012) Figures 7 and 8, and Yanny et al. (2000) Figure 8.

[Footnote] For this paper, a blue straggler star will be considered an MSA star, as they are similar in observational qualities.

Previous works frequently relied on fitting line profiles to discriminate BHBs from different species of stars by their surface gravity (e.g. Clewley et al. 2002, Sirko et al. 2004, Xue et al. 2008). However, with modern machine learning tools becoming readily available and easily accessible on personal computers, we opt to classify our BHB stars using a data-driven algorithm.

The benefit of using a machine learning framework to classify BHB stars lies in the fact that we can readily use information from the entire spectrum, whereas prior line-fitting methods rely on carefully selecting the wavelength ranges of individual lines. The absorption features follow profiles which merge into the continuum, or even into each other, as is the case in stars with exceptionally strong surface gravities, such as white dwarfs. Utilizing the entire spectrum avoids problems associated with mis-selecting profile extents and shape profiles, and allows more spectral information, such as non-targeted, incidental lines, to be utilized.

The difficulty of using a supervised machine learning algorithm is that we require a training set which is regarded as “true” for the machine to learn from. The supervised algorithm will consider a training data set which is in the same data format as the data to be classified (i.e. LAMOST spectra) and which has tags for the classes (i.e. “BHB,” “not-BHB”), and construct a mathematical equation through which the data may be passed to yield a numerical value. This numerical value may be a regressed value (as is the case when fitting a continuous metric, such as metallicity), or a 1-0 class value (as we are doing now for BHB, not-BHB).

This need for a training set means that a subset of the data must be processed and classified by hand before the bulk of the data may be classified by machine; it must be “supervised” by an intelligent actor. In this way, modern data-driven pipelines rest very directly on the shoulders of past researchers who have painstakingly reduced subsets of these data by hand.

Training Data
The Century Survey of Brown et al. (2008) is a catalog of 2414 color-selected BHB candidates over 10% of the sky from the Two-Micron All Sky Survey (Skrutskie et al. 2006) in 12.5 ≤ J ≤ [...]

[Footnote] Sometimes the profile is fit with a Voigt profile, which is a combination of a Gaussian core and Lorentzian wings, representative of the differing effects on spectral line shape of abundance and gravity.

[Footnote] In some cases, techniques such as those based on decision trees instead construct a series of yes-no questions.

[...] BHB stars. LAMOST has observed 833 of these 10,000 candidates, and 380 of the 2558 confirmed BHB stars (at SNR$_{g,r,i}$ > [...] $\lambda/\Delta\lambda \sim$ [...] Å, which is slightly more than one grid-point per measurement at the blue end of the spectra and more dense per measurement at the red end. Our grid extends from 3850 to 8900 Å, which is the maximal extent covered by all spectra in our data set (at rest wavelengths). Note that this data set is pre-cut to include only data with radial velocities less than 1000 km s⁻¹. Our training sample, using data from Xue et al. (2008) and the Century Survey (Brown et al. 2008), contains 1,406 stars with BHBs making up 42%. For a full breakdown, see Table 1.
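The rest-frame shift and resampling onto a common wavelength grid described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this work: the function names, the 1 Å grid step, the non-relativistic Doppler shift, and the plain linear interpolation are all assumptions made for the example. Only the 3850 to 8900 Å extent is taken from the text.

```python
from bisect import bisect_right

C_KM_S = 299792.458  # speed of light in km/s

def to_rest_frame(wavelengths, radial_velocity_km_s):
    """Shift observed wavelengths to the rest frame (non-relativistic Doppler)."""
    factor = 1.0 + radial_velocity_km_s / C_KM_S
    return [w / factor for w in wavelengths]

def make_grid(start=3850.0, stop=8900.0, step=1.0):
    """Common wavelength grid in Angstroms; the 1 A step is an assumed value."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]

def resample(wavelengths, fluxes, grid):
    """Linearly interpolate a spectrum onto `grid`.

    `wavelengths` must be strictly increasing and must cover the whole grid.
    """
    if grid[0] < wavelengths[0] or grid[-1] > wavelengths[-1]:
        raise ValueError("spectrum does not cover the requested grid")
    out = []
    for g in grid:
        # Index of the first sample strictly greater than g, clamped so that
        # the pair (lo, hi) always brackets g.
        hi = min(max(bisect_right(wavelengths, g), 1), len(wavelengths) - 1)
        lo = hi - 1
        t = (g - wavelengths[lo]) / (wavelengths[hi] - wavelengths[lo])
        out.append(fluxes[lo] + t * (fluxes[hi] - fluxes[lo]))
    return out

# Synthetic example: flux linear in observed wavelength, star receding at 300 km/s.
obs_wave = [3800.0 + 2.0 * i for i in range(2600)]  # 3800 to 8998 A
obs_flux = [0.001 * w for w in obs_wave]
rest_wave = to_rest_frame(obs_wave, 300.0)
grid = make_grid()
features = resample(rest_wave, obs_flux, grid)      # one row of the X array
```

Every spectrum resampled this way has the same length, so the rows can be stacked directly into the array of X values that a classifier or regressor expects.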
The Machine
With our training data and data to be classified in hand, we return to the machine learning problem. We utilize the eXtreme Gradient Boosting method (XGB, Chen & Guestrin 2016). This modern machine learning method is an optimized version of “gradient boosting” (Friedman 2001), which was found to be a generalized version of the popular “adaptive boosting” (Freund 2009) method, which belongs to the “random forest” (Ho 1995) family of algorithms. A random forest is a series of weak decision trees. These trees each calculate a value Y based on input characteristics X for a data set by asking a series of yes or no questions; they are kept weak by only allowing each tree to ask a limited number of questions and only showing each tree a random subset of the data instead of the whole set. The trees then vote as a forest on a value of Y for each data array X. An “adaptive boosted” forest is planted one tree at a time instead of randomly all at once. Each new tree is chosen, using weighting, to correct classifications or regressions that the existing forest has failed at, so the forest tends to grow more accurate with each new tree. “Gradient boosting” is a more generalized version of this selective tree planting method, and “eXtreme Gradient Boosting” is a computer-optimized implementation of this generalized method.

[Footnote] Large redshifts artificially compress our rest spectra to be smaller than the grid we use. This cut probably removes spectra of blue items at large distances, such as quasars.

[Footnote] https://xgboost.readthedocs.io/

[Footnote] A problem common to decision trees, and a risk inherent in all supervised machine learning techniques, is so-called “overtraining,” whereby the machine memorizes the training data exactly. This provides excellent precision and recall on the training data, but poor metrics for unseen testing data. This is why the individual components must be kept “weak”; the resulting conglomerate, however, will be “strong.”

The machine requires an array of X values (the spectroscopic fluxes, shifted and upsampled as described above) and a list of classes or values (for example: $Y_{\rm Class}$ = BHB or not-BHB, or $Y_{\rm [Fe/H]}$ = -1.2 or -0.7).

We create three machines: the first machine classifies stars as BHB or not-BHB, using the training data (1,406 stars, 593 of which are BHBs); the second machine regresses metallicities using the same catalogs, but uses only BHBs with estimated metallicities (that is, a training set of 552 BHBs, extending from [Fe/H] of -3.0 to -0.3; the 41 BHB difference is from those which do not have metallicities assigned in our training data); the third machine regresses absolute magnitudes using the parallax $\varpi$ for a training set of 697 BHBs (extending from G of -1.2 to 2.5; this training set is constructed from LAMOST-observed stars identified as BHBs by the prior classification machine which also have good parallax measurements, and is unrelated to the training sets constructed from the Century Survey or the Xue et al. 2008 sample).

Note on Predicting Absolute Magnitudes
One of the desirable characteristics of BHB stars is their bright and stable absolute magnitudes. These absolute magnitudes are sometimes treated as a constant, and sometimes treated as a function of color (see, for example, Deason et al. 2011). We use the XGB algorithm to predict a value, which should offer more flexibility than a flat value or a color-magnitude fit.

Absolute magnitudes are not given in our training catalogs. To predict absolute magnitudes, we will need a training set of BHB stars with “known” absolute magnitudes. We can easily calculate this for BHB stars which have good parallax measurements and correct apparent magnitudes. To ensure correct parallaxes, we use stars with 10% parallax errors or less. To ensure we have the correct apparent magnitudes, we correct our data for extinction using the maps of Schlegel et al. (1998) and only use data which are 750 pc or more outside of the plane. This should ensure that the bulk of the dust column from the reddening map is between us and the star and that we are not over-correcting our data. This training set of 697 BHBs is used to predict the absolute magnitudes of the rest of the BHBs in our sample.
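The “known” absolute magnitudes used for training follow from the standard distance modulus relation, $M = m - A - (5\log_{10} d - 5)$ with $d = 1000/\varpi$ pc for $\varpi$ in mas. The sketch below is illustrative only: the function names and the example star are hypothetical, extinction is passed in as a plain number rather than looked up in a dust map, and the height-from-plane cut is left out.

```python
import math

def absolute_magnitude(apparent_mag, parallax_mas, extinction_mag=0.0):
    """M = m - A - (5 log10(d / pc) - 5), with d = 1000 / parallax[mas]."""
    if parallax_mas <= 0.0:
        raise ValueError("parallax must be positive to invert into a distance")
    distance_pc = 1000.0 / parallax_mas
    distance_modulus = 5.0 * math.log10(distance_pc) - 5.0
    return apparent_mag - extinction_mag - distance_modulus

def has_good_parallax(parallax_mas, parallax_error_mas, max_fractional_error=0.10):
    """Keep only stars whose fractional parallax error is at most 10%."""
    return parallax_mas > 0.0 and parallax_error_mas / parallax_mas <= max_fractional_error

def magnitude_shift(parallax_mas, parallax_offset_mas):
    """Change in absolute magnitude when an additive offset is applied to the parallax."""
    return 5.0 * math.log10((parallax_mas + parallax_offset_mas) / parallax_mas)

# Hypothetical star: dereddened magnitude 12, parallax 0.4 +/- 0.03 mas.
if has_good_parallax(0.4, 0.03):
    print(round(absolute_magnitude(12.0, 0.4), 2))  # 0.01
```

Because the absolute magnitude depends on the logarithm of the parallax, a small additive change in parallax matters most for the most distant stars: `magnitude_shift(0.4, 0.05)` is about 0.26 mag, while the same offset applied to a 2 mas parallax shifts the result by only about 0.05 mag.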
Note on the Parallax Offset
Note that we have not used a parallax offset (for an inclusive overview of the field, see Zinn et al. 2019 and references therein). The reasoning for this is that a parallax offset of 0.054 mas (a reasonably central value from Schönrich et al. 2019 when compared to various studies shown in Figure 1 of Zinn et al. 2019), for example, shifts the XGB estimated absolute magnitudes of our BHBs by approximately 0.3 magnitudes. Consider, for illustration, a BHB observed with a magnitude of 12 and a parallax of 0.4 mas, which would have an absolute magnitude of 0.01; increasing the parallax to 0.45 mas would shift the absolute magnitude by about 0.26 to G ≈ 0.27. This shift causes our distance comparison to globular clusters, as seen in Section 2.6.2 and Figure 2, to be systematically offset, with distance moduli too small (distances closer than their globular clusters').

We suggest that our sample of objects used to train our machine (having colors b - r < [...] This is sensible, as increasing the parallax will raise the estimated absolute magnitudes, and therefore lower the estimated distances to the stars in our sample. The proper motions will then imply lesser tangential motions, leading to lower velocity dispersions.

[Footnote] The 41 BHBs with missing metallicity values all occur in the catalog of Xue et al. (2008). In their work, they classified BHB stars based on a line-shape method which they implemented themselves on the SDSS spectra, and the reported metallicities were collected from a separate calculation, the WBG pipeline in SDSS. The WBG, or “Wilhelm, Beers, and Gray” pipeline, is optimized for the treatment of hot stars, such as BHBs, and the details may be found in Wilhelm et al. (1999). It is unclear why the SDSS pipeline was unable to assign a metallicity to these stars while Xue et al. (2008) were able to calculate their line parameters for classification purposes. We note that the stars with missing values seem to be preferentially bluer and dimmer than the bulk of the Xue et al. (2008) sample.

TABLE 1

Survey (Total / <10% $\varpi$ error)   Type             N_Full       N_$\varpi$
Century (573/175)                      BHB              213 (37%)    18 (10%)
                                       MSA              154 (27%)    68 (39%)
                                       BHB/MSA          47 (8%)      4 (2%)
                                       subdwarf         17 (3%)      12 (7%)
                                       DA white dwarf   8 (1%)       8 (5%)
                                       None             134 (23%)    65 (37%)
Xue (833/180)                          BHB              380 (46%)    103 (57%)
                                       MSA/BS           453 (54%)    77 (43%)

Note. The makeup of our training sets in the full sample and in the 10% parallax error sample.

TABLE 2

           Precision   Recall
BHB        85.8%       86.2%
Not BHB    90.0%       89.8%

Note. Precision and recall from the confusion matrix of our XGB classification. Precision (also called purity) indicates the number of true positives as a percentage of all positive classifications, so 85.8% of all objects classified as a “BHB” are true BHBs. Recall (also called completeness) indicates the fraction of the true members of a class which are correctly classified, so 86.2% of all BHBs in our data are classified as BHBs. This table is based on 100 training-testing splits of the data.
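For readers less familiar with these terms, the sketch below shows how precision (purity), contamination, and recall (completeness) follow from the counts in a confusion matrix. The counts are hypothetical, chosen only so that the results land near the BHB row of Table 2; they are not taken from this work.

```python
def classification_metrics(true_pos, false_pos, false_neg):
    """Precision, contamination and recall for the positive ("BHB") class."""
    predicted_pos = true_pos + false_pos   # everything the machine called a BHB
    actual_pos = true_pos + false_neg      # everything that really is a BHB
    precision = true_pos / predicted_pos
    recall = true_pos / actual_pos
    return {
        "precision": precision,            # purity
        "contamination": 1.0 - precision,  # false positives among positives
        "recall": recall,                  # completeness
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(true_pos=511, false_pos=85, false_neg=82)
for name, value in metrics.items():
    print(f"{name}: {100.0 * value:.1f}%")
```

Optimizing for precision, as is done for the classifier here, accepts a somewhat lower recall in exchange for a cleaner sample.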
Optimizing the Machine and Reported Errors
The performance of the machine can be optimized by changing meta-parameters such as the learning rate and the maximal tree depth. To find the optimal parameters, we split our training data into a training set and a validation set. The machine learns from the training set (75% of the data in each of the three machines), and then its performance with various meta-parameters is analyzed on the validation set (25% of the data; this split simulates the machine performance on new, unseen data by hiding a portion of the data from the machine while it learns, to minimize “data leakage”). We then cycle through various “splits” of the training data. This is a technique called “cross-validation.” We perform a random grid search of meta-parameters to minimize the mean squared error (for metallicity and absolute magnitude) or to maximize precision (purity, i.e. minimizing the number of false-positive contaminants, not maximizing the number of true positives) for the classification problem.

After this optimization of meta-parameters, we use the entire training set to train the XGB machine before predicting values for the complete observational data set.

The machine self-reports a classification purity (precision) of 85.8% for BHB stars (14.2% contamination), and successfully finds (recalls) 86.2% of all the BHBs for which there are spectra. We have constructed a confusion matrix based on the results of 100 train-test splits of our training data in Table 2.

The metallicities are estimated with mean squared error of $\sim$ [...] $\sigma_{\rm [Fe/H]} \sim$ [...] bins where training data is abundant, from -2.5 ≤ [Fe/H] ≤ -1.5, and less precise where training data is sparse.

We note that our training data are extremely sparse for high metallicities. Only 25 have metallicities ≥ -1 dex. This will cause difficulty for the machine when it comes to predicting metallicities higher than -1 dex.

Absolute magnitudes have precisions around $\sigma_G \sim$ [...]

TABLE 3

[Fe/H]          N     $\sigma_{\rm [Fe/H]}$
-3.0 to -2.5    43    0.77
-2.5 to -2.0    169   0.31
-2.0 to -1.5    232   0.23
-1.5 to -1.0    83    0.45
-1.0 to -0.5    24    0.82

Note. The precision of the metallicity estimate as a function of metallicity. To construct this, we trained the XGB machine with 100 training-testing splits of the total training data and evaluated the precision of the metallicity estimate of the machine on those splits. The precision seems to worsen for the highest and lowest metallicity BHBs, perhaps because of the lower number of objects in those training sets. One BHB from the training set is missing from this table; it has a metallicity of -0.31.

Verifying the Regressions
Through cross-validation, we have obtained estimates for the mean squared error of the metallicity and absolute magnitude estimates, as well as purity estimates for the BHB classification. Here we attempt to verify these in a few independent ways.
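To make the cross-validation estimates referred to above more concrete, the sketch below tunes a toy gradient-boosted regressor by random search over repeated 75/25 train-validation splits. It is a stand-in written with the Python standard library only: the boosted decision stumps are not XGBoost, and the synthetic data, parameter ranges, numbers of splits and candidates, and all function names are assumptions made for the example.

```python
import random

def fit_stump(x, residuals):
    """Least-squares decision stump: one threshold, one value on each side."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            threshold = 0.5 * (x[order[k - 1]] + x[order[k]])
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def boost(x, y, n_rounds, learning_rate):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_val, right_val = fit_stump(x, residuals)
        stumps.append((threshold, left_val, right_val))
        pred = [pi + learning_rate * (left_val if xi <= threshold else right_val)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps

def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if xi <= t else r for t, l, r in stumps)

def validation_mse(x, y, params, n_splits, rng):
    """Mean validation MSE over repeated random 75/25 splits."""
    indices = list(range(len(x)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        cut = int(0.75 * len(indices))
        train, valid = indices[:cut], indices[cut:]
        model = boost([x[i] for i in train], [y[i] for i in train], **params)
        errors = [(predict(model, x[i]) - y[i]) ** 2 for i in valid]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / len(scores)

# Synthetic one-feature regression problem (purely illustrative).
rng = random.Random(0)
x = [rng.uniform(-3.0, 0.0) for _ in range(80)]
y = [xi ** 2 + rng.gauss(0.0, 0.1) for xi in x]

# Random search: draw candidate meta-parameters, keep the best validation score.
candidates = [{"n_rounds": rng.randint(5, 30),
               "learning_rate": rng.uniform(0.05, 0.5)} for _ in range(6)]
scored = [(validation_mse(x, y, p, 4, rng), p) for p in candidates]
best_score, best_params = min(scored, key=lambda item: item[0])

# Retrain on the full training set with the selected meta-parameters.
final_model = boost(x, y, **best_params)
```

The validation score is computed only on points hidden from the model during fitting, which is what lets it stand in for performance on unseen data. The final retraining on all available data mirrors the last step described in the text.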
Color-Magnitude Diagram
A simple way to look at contamination would be to construct a color-magnitude diagram of classified BHB stars. A pure sample should follow the horizontal branch, and contamination should appear off of that branch, most likely at redder and dimmer regions, where the most numerous contaminants (MSA stars) reside.

In Figure 1 we show the color-magnitude diagram for stars estimated to be BHBs by our classifier. The stars are selected only to have $\varpi$ errors of 10% or less and also to be outside of the plane ($z_{\rm XGB} > 750$ pc, so that the color and magnitude values are reliable after extinction correction assuming the full columns of Schlegel et al. 1998). We believe that this sample should reasonably reflect the full sample in proportions of BHB and non-BHB stars by referencing Table 1 and noting that the proportion of BHBs in the 10% parallax error sample is 34%, instead of 42% in the entire sample. This figure shows a strong concentration of presumptive BHBs, with some contamination extending redward and fainter (presumably MSA stars, which include blue stragglers here) and some contamination brighter and blueward (this could be quasars, extreme horizontal branch stars, or possibly true BHB stars which have been over-corrected for extinction, moving them to too-bright and too-blue sections of the diagram). Traditionally, main sequence A stars are the most prominent contaminants in this type of classification problem.

The selection of height from the plane has been used instead of latitude in an attempt to improve the accuracy of the dereddening procedure (by guaranteeing that the full column density of the maps of Schlegel et al. 1998 is meaningful for each star) while preserving as many stars as possible. However, despite these efforts, areas of low latitude could still be confused by high extinction values, particularly if the extinction maps have poor angular resolution of the dust near our stars. In the sample detailed above: the lowest latitude BHB candidate is at 8.9°; 97% of the candidates are above a latitude of 15°; 88% are above 20°; and 68% are above 30°. There may be some candidates which experience extinction confusion, but the large majority should be in low-extinction directions and outside of the plane with reliable colors and magnitudes.
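The difference between cutting on height from the plane and cutting on latitude can be illustrated with a short sketch. It assumes the simple geometry $|z| = d\,|\sin b|$, ignoring the Sun's small offset from the plane; the function names and the four example stars are hypothetical.

```python
import math

def height_from_plane_pc(distance_pc, galactic_latitude_deg):
    """|z| = d |sin b|; the Sun's small height above the plane is ignored."""
    return abs(distance_pc * math.sin(math.radians(galactic_latitude_deg)))

def min_distance_for_height_pc(galactic_latitude_deg, min_height_pc=750.0):
    """Smallest distance at which a star at latitude b passes the height cut."""
    return min_height_pc / abs(math.sin(math.radians(galactic_latitude_deg)))

# Hypothetical stars: (distance in pc, Galactic latitude in degrees).
stars = [(5000.0, 8.9), (1000.0, 60.0), (1000.0, 25.0), (3000.0, 12.0)]

height_selected = [s for s in stars if height_from_plane_pc(*s) > 750.0]
latitude_selected = [s for s in stars if abs(s[1]) > 30.0]

print(len(height_selected), len(latitude_selected))  # 2 1
```

A distant star at low latitude can still sit well outside the plane: at b = 8.9°, the height cut is passed by anything beyond roughly 4.8 kpc. A fixed latitude cut would discard such stars regardless of distance.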
Changing our procedure to include a cut such that all stars are more than 30° from the plane does not change the results much; the anisotropy, for example, remains 0.62 for the low metallicity BHBs, and increases to 0.74 from 0.70 for the high metallicity BHBs.

These stars are generally found in the apparent G magnitude range of 10 to 14, with the median near 12.8. Our main concern is that some of these objects have been corrected for the full column density extinction of Schlegel et al. (1998), but in fact experience a different level of extinction. Since these reddening maps are calculated based on extragalactic sources, we expect that the reddening values may over-correct (under-correction should not be a problem), shifting the objects to too-blue and too-bright in the extinction-corrected colors and magnitudes of Figure 1. We would expect this type of error to manifest as a diagonal cloud with a negative slope in the figure, as main-sequence stars (which should truly be in the bottom-right corner) are dragged toward the center, and BHB stars (which should truly be in the center) are dragged toward the top-left corner. While some of this may be occurring, such a pattern is not prominent in this data subset. There also appears to be no dependence on Galactic latitude, supporting our choice of using distance from the plane as a selector.

Note that Figure 1 has heavier contamination when we include stars closer to the plane and of lower probability of being a BHB star than in our shown plot, which has cuts at 750 pc and 75%, respectively. Including lower probability stars increases contamination in the dimmer-and-redder direction, most likely from MSA stars. Including stars closer to the plane populates a plume of high probability BHB “contaminants” which are bluer and brighter than the BHB locus.
These could be true BHB stars which are extinction-corrected by more than they are intrinsically dimmed by dust.

To get a rough estimate of the contamination levels, we draw a polygon around what we presume to be “true” BHBs by eye. 23 out of 450, or 5% of the plotted stars, lie outside this polygon. This is lower than the machine-reported 14% contamination, although there may also be contamination inside of this selection box. If we construct the same figure with cuts of 50% $P_{\rm BHB}$ and z > 500 pc, this changes to 10% lying outside of the polygon.

This color-magnitude diagram is indicative of our data in this current work, but not of the entire catalog. The interested party is encouraged to take note of this. We discuss the contamination profile inside the plane more thoroughly in a companion paper.

[Footnote] Including lower probability BHB stars ($P_{\rm BHB}$ between 0.5 and 0.75) increases main sequence contamination to the red and dim corner of the plot. Including lower z height data creates a plume of contamination along the reddening vector to the brighter, bluer corner, which could be true BHBs incorrectly dereddened. This figure should be simple to reproduce with the provided catalog crossmatched to Gaia, so the interested reader is encouraged to examine how these effects change depending on their own cuts.

Fig. 1. [Axes: dereddened Gaia color (bp-rp) versus absolute magnitude G; color scale: $P_{\rm BHB}$.] The absolute magnitude of BHB classified stars calculated from parallax as a function of dereddened Gaia color. The stars in this figure are most likely BHBs from their spectra ($P_{\rm BHB} \geq 0.75$) [...] z > 750 pc. Most likely they are all BHBs outside of the plane, so they have full column reddening and a relation between the intrinsic color and magnitude may be constructed (the dashed line is a quadratic fit to the data inside the polygon, which is drawn by eye). This figure may be compared to Figure 4 in Deason et al. (2011), which shows a similar BHB color-magnitude distribution, with a similar brightness around 0.6 in SDSS g (our sample of 13,693 BHBs has a mean abs(G) of $\sim$ [...]

Globular Clusters

Globular clusters have well-defined distances and metallicities in the literature. By cross-matching our data on the sky within 1 tidal radius of the globular clusters in the Harris Catalog (Harris 1996; 2010 revision), we find 20 stars classified as BHBs in six clusters. We show the distance modulus and abundance values of these six clusters and 20 stars in Table 4 and plot them in Figure 2.

With regard to the apparent visual outlier in NGC 5024, we find that the three stars lying visually on the horizontal branch have proper motions of ($\mu_{\rm R.A.}$, $\mu_{\rm Dec.}$) = (-0.09, -1.64), (-0.25, -1.9), (-0.14, -1.26), and the outlier has a proper motion of (-0.31, -1.42). This proper motion seems consistent with cluster membership (as does the radial velocity). It could possibly be a true BHB member of the cluster with a highly uncertain photometric magnitude, although that seems unlikely as its flux error is well below 1%. It is also not likely to be an RR Lyrae (which would be located in a similar place on the color-magnitude diagram), as it is observed two magnitudes from the horizontal branch when RR Lyrae typically only have pulsations on the order of one magnitude. This star is removed from the subsequent analysis of the [Fe/H] and abs(G) errors, although we note that including it actually improves the expected precision of the [Fe/H] calculation.

We compare the estimated [Fe/H] values of the member BHBs from the XGB regression with the literature values of their clusters' metallicities from the Harris Catalog. The error (assuming that the catalog values are exact and true) for our 19 stars (20, minus one visual outlier) is $\sigma_{\rm [Fe/H]} \sim 0.31$. This is slightly more precise than the machine estimated value of $\sigma_{\rm [Fe/H]} \sim 0.39$. Perhaps this is because the metallicities of these globular clusters lie between -2.5 and -1.5, where we predict the [Fe/H] errors to be lower (see Table 3).

For each star in these globular clusters, we may also assign a distance modulus value, $\mathrm{mod}_{\rm XGB} = g - G_{\rm XGB}$, which can be compared to the distance moduli of the globular clusters, $\mathrm{mod}_{\rm G.C.}$.

If we assume that the errors on the observed magnitude, $g$, and the distance modulus of the globular cluster, $\mathrm{mod}_{\rm G.C.}$, are small compared to the error on $G_{\rm XGB}$, then:

$$\sigma(\mathrm{mod}_{\rm XGB} - \mathrm{mod}_{\rm G.C.}) \sim \sigma_{G_{\rm XGB}} \sim 0.21.$$

This value of $\sigma_G \sim 0.21$ is substantially lower than the error estimated by the machine of $\sigma_G \sim 0.41$. When the outlier discussed previously is included, $\sigma_G \sim$ [...] G measurements). These 9 objects also only span -2.0 ≤ [Fe/H] ≤ -1.5, which would exacerbate our noted problems with predicting high metallicity values for our data. We abandon this line of inquiry and accept the covariance.

[Footnote] The obvious visual outlier in NGC 5024 is omitted from the right-hand panel as its distance modulus is an extreme outlier. The star is presented in Table 4.

This covariance could be an effect of how temperature and metallicity affect spectral line shape. Larger brightnesses tend to characterize lower surface gravity stars, which have narrower spectral lines. Larger abundances tend to make lines deeper. A deep-and-wide line (high abundance, low brightness) could have a similar overall shape to a shallow-and-narrow line (low abundance, high brightness) in a normalized spectrum. Confusion between these two could cause the machines to mistakenly lower the abundance of a given spectrum while also increasing the brightness, leading to the covariance we see.
This could possibly be avoided with higher resolution spectra, as these two effects actually affect the line shape in subtly different ways, particularly in the wings of the profile.

Fig. 2. [Panels: NGC 6205, NGC 5466, NGC 5272, NGC 5053, NGC 5024, NGC 4147, each showing BP-RP versus G; right panel: distance modulus versus [Fe/H].] Color magnitude diagrams of six globular clusters from the Harris (1996) catalog which have BHB stars within 1 tidal radius of their center in our data. The right panel shows the difference between the individual stars' [Fe/H] and distance moduli in relation to the catalog values for their respective clusters. There is an apparent covariance in XGB predicted G magnitudes and [Fe/H] values, where stars which are too metal poor are predicted to be too bright, and stars which are too metal rich are predicted to be too dim. The [Fe/H] and G values are predicted by independent machines.

TABLE 4

Globular    Glob. [Fe/H]   BHB [Fe/H]   Glob. D. Mod.   BHB D. Mod.
NGC 6205    -1.53          -1.89        14.26           14.24
NGC 5466    -1.98          -2.03        16.02           15.92
                           -2.18                        16.09
                           -1.9                         15.88
NGC 5272    -1.5           -1.96        15.04           15.18
NGC 5053    -2.27          -1.75        16.2            15.99
                           -1.95                        15.94
                           -1.92                        16.16
                           -2.35                        16.49
                           -2.33                        16.34
                           -1.86                        16.3
NGC 5024    -2.1           -1.96*       16.26           14.27*
                           -1.89                        16.38
                           -1.71                        16.15
                           -1.52                        16.0
NGC 4147    -1.8           -1.77        16.43           16.15
                           -2.21                        16.97
                           -1.89                        16.26
                           -2.07                        16.73
                           -1.82                        16.45

Note. BHBs which reside within 1 tidal radius of the globular clusters shown in Figure 2. The asterisks indicate an item omitted from the right panel of Figure 2 because its distance modulus is a large outlier ($\sim$ [...]
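As a check on the numbers quoted in this section, the scatter of the BHB values about their cluster values can be recomputed from the rounded entries printed in Table 4. The sketch below assumes that the quoted $\sigma$ values are root-mean-square differences about the catalog values, which is an assumption about the estimator rather than something stated in the text. The flagged NGC 5024 outlier is left out.

```python
import math

# cluster: (cluster [Fe/H], cluster distance modulus,
#           [(BHB [Fe/H], BHB distance modulus), ...]), values from Table 4
TABLE_4 = {
    "NGC 6205": (-1.53, 14.26, [(-1.89, 14.24)]),
    "NGC 5466": (-1.98, 16.02, [(-2.03, 15.92), (-2.18, 16.09), (-1.90, 15.88)]),
    "NGC 5272": (-1.50, 15.04, [(-1.96, 15.18)]),
    "NGC 5053": (-2.27, 16.20, [(-1.75, 15.99), (-1.95, 15.94), (-1.92, 16.16),
                                (-2.35, 16.49), (-2.33, 16.34), (-1.86, 16.30)]),
    # The asterisked outlier (-1.96, 14.27) is deliberately excluded here.
    "NGC 5024": (-2.10, 16.26, [(-1.89, 16.38), (-1.71, 16.15), (-1.52, 16.00)]),
    "NGC 4147": (-1.80, 16.43, [(-1.77, 16.15), (-2.21, 16.97), (-1.89, 16.26),
                                (-2.07, 16.73), (-1.82, 16.45)]),
}

def rms(values):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

feh_residuals, mod_residuals = [], []
for cluster_feh, cluster_mod, stars in TABLE_4.values():
    for star_feh, star_mod in stars:
        feh_residuals.append(star_feh - cluster_feh)
        mod_residuals.append(star_mod - cluster_mod)

sigma_feh = rms(feh_residuals)
sigma_mod = rms(mod_residuals)
print(len(feh_residuals), round(sigma_feh, 2), round(sigma_mod, 2))  # 19 0.31 0.21
```

Under that assumption, the 19 retained stars reproduce the values of about 0.31 in [Fe/H] and 0.21 in distance modulus given in the text.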
Duplicates
The LAMOST survey has observed numerous stars in multiple observations; some of these duplicate observations have been classified by the XGB machine as BHB stars. Since the spectra are independent of each other, they provide some insight into the internal errors of the classifications and regressions. In Figure 3 we show the distribution of magnitudes and metallicities for stars observed two or more times. From this figure, we see that the median $\sigma_{\rm [Fe/H]}$ is about 0.07 and the median $\sigma_G$ is about 0.06.

[Footnote] These values are much smaller than the dispersions discussed previously; that is, 100 spectra of similar metallicity or brightness stars would have a larger spread in [Fe/H] and abs(G) than 100 spectra of the same star. This may imply that the fitting surface is not smoothly varying, which is a side effect of random forest algorithms, or that some third parameter such as surface gravity or temperature works to inflate the errors.

Note that duplicates are removed in the analysis section of the paper (preserving the observation with the highest signal-to-noise ratio) but are not removed from the catalog.

In our sample of 13,693 BHB spectra, we estimate that there are 11,046 unique stars. There are 1,884 objects with multiple observations; the largest number of repeated observations is one star with 16 individual spectra, and the median is two observations per star.

Fig. 3. [Axes: [Fe/H] and abs(G) scatter.] The $\sigma_{\rm [Fe/H]}$ and $\sigma_G$ values for stars (blue points and hex map) observed two or more times (with different observing conditions) by LAMOST. The contours indicate the smoothed 25%, 50%, and 75% confidence intervals and the cross indicates the median $\sigma_{\rm [Fe/H]}$ and $\sigma_G$ (0.07 and 0.06, respectively).

COORDINATES AND VELOCITIES
Kinematic phase information is calculated in the usual way from observed coordinates, proper motions, radial velocities, and the distances derived from the XGB predicted absolute magnitudes and photometry which has been extinction corrected using the full column dust maps of Schlegel et al. (1998).

Our coordinates are all right handed with the Sun at ( X, Y, Z ) = ( − . , , V R , V φ , V Z ) = (0 , − , 0) km s⁻¹. V_R is positive away from the Galactic center. The solar motion is ( U, V, W ) = (13 . , . , . 24) km s⁻¹ with respect to the local standard of rest. These positions and velocities are all adopted from Schönrich & Aumer (2017); Schönrich (2012).

The final assignment of these coordinates is based on the mean value of 100 realizations of the observed star, considering the observational errors as well as the Gaia correlations on the relevant parameters.

Final Cuts for Halo BHBs
Coverage of our sample is shown in Figure 4. The catalog covers a large area of sky unexplored by prior BHB surveys. However, for the rest of this paper, we will only consider stars which are observed at more than 3 kpc from the plane, to select halo BHBs. In a companion paper we will more thoroughly analyze the BHB population closer to, and inside of, the plane.

We use the full column reddening of Schlegel et al. (1998) because we will later cut the data to have a distance of more than 3 kpc from the plane, to select halo stars. When using these data at lower altitudes, a different procedure is necessary.

Fig. 4.— Sky coverage of our BHB sample. It can be seen that our BHB sample covers a good amount of the plane which is not covered in other large BHB catalogs. Note that this is the full catalog; the subset analyzed here is restricted to more than 3 kpc away from the plane.

From our initial Gaia-LAMOST crossmatch of 155,549 objects, 14,101 are classified as BHBs. From the catalog we remove objects for which the machine has predicted values outside of the input parameter range for G (from 2.5 to -1.2) and [Fe/H] (from -3 to -0.3), which leaves 13,693 BHBs in our final catalog.

We make the following additional cuts to the catalog to prepare the scientific sample for the remainder of this paper: duplicates removed within 7”, e µ R.A. , e µ Dec. < < BHB > z > σ velocity outliers iteratively culled.

This leaves 2,692 high confidence BHBs in our halo sample with good kinematics, which form the data for the rest of the current analysis.

ANALYSIS
Anisotropy
Anisotropy is an observable measurement of the degree to which a population is orbiting tangentially versus radially in the galaxy:

$$\beta = 1 - \frac{\sigma_\theta^2 + \sigma_\phi^2}{2\sigma_r^2} \qquad (1)$$

Populations which are radially biased have β > 0, and tangentially biased populations have β < β ∼ .
62) and the high metallicity stars being more radial velocity dispersion dominated ( β ∼ . β ∼ β ∼ β ∼ β values for metal rich stars, is consistent with our results, although our population of BHB stars seems to be generally more tangential velocity dispersion dominated than their population of K-giants.

Wegg et al. (2019) recently looked at the anisotropy and velocity ellipsoids of RR Lyrae stars out to a spherical Galactocentric radius of about 20 kpc. Their sample is similar in both age and extent to our current BHB population. They found that the velocity profiles of RR Lyrae become highly radial beyond 5 kpc with β ∼ of about 50 km s⁻¹ and β down to 0.25. Our sample is less radial velocity dispersion dominated than their sample, and does not probe as low in Galactocentric radius as theirs does. It is worth noting that their study probed latitudes down to 10°, which would bring their observational window closer to the disk than the one considered in this work.

Analyses of cosmological simulations generally find the β parameter to rise with galactic radius. At their centers, simulated galaxies are nearly isotropic. Moving outward, the halos of galaxies tend to become more radially biased, with β rising, at first quickly and then more slowly, to a value ≥

Ages and Metallicities
Nearly thirty years ago, Preston et al. (1991) noted that BHB stars exhibited a color gradient in the halo, growing redder with increasing radius. They posited that such a color gradient could be the result of an age gradient in the halo, with redder BHB stars being, on average, younger.

That work has been revisited recently using Sloan Digital Sky Survey spectroscopy (Santucci et al. 2015) and photometry (Carollo et al. 2016, see also Whitten et al. 2019). These studies similarly find a suggestion of a reddening of BHB color with increasing radius, with the bluest (oldest) stars being more centrally concentrated.

We plot the colors, as well as the metallicities, of our BHB sample in Figure 6. This figure shows that, as we move outward in the halo, the BHB population grows redder (perhaps younger) and metallicity remains relatively flat around -1.9 dex. This abundance value is similar to what is expected for the halo, if a bit high; Xue et al. (2008) found a value of around -2 dex in their sample.

DISCUSSION
In our halo sample, we have noted two trends:

• Our metal rich BHB stars move on more radial orbits than our metal poor stars at all radii.

• As we move outward in the halo, the BHBs grow redder (possibly younger) while metallicity remains relatively flat.

Our sample is slightly retrograde in the area studied ( z > − .

Regarding the color gradient, Preston et al. (1991) noted that it could also be the result of more distant BHB stars having smaller core masses, but found no reasonable explanation for why that may be the case; see also “the second parameter phenomenon” of BHB morphology, explored in detail in Dotter et al. (2010).

When speaking of the halo, there are a few main paradigms. The first is the idea of an inner-outer halo: two major components which overlap but trade dominance as radius increases, leading to a sort of “break” where an outer halo becomes the more dominant population. The inner halo is thought to be slightly prograde and more metal rich, while the outer halo is thought to be slightly retrograde and more metal poor. This idea has been pursued vigorously, with mounting evidence, by the research group of Carollo et al. (2007). This finding has been supplemented with evidence from other teams revealing broken halo density profiles in various tracers (e.g. BHBs, Deason et al. 2011 and Deason et al. 2018; RR Lyrae, Sesar et al. 2011), the presence of two major components in the chemistry (Nissen & Schuster 2011, and subsequent papers in that series), a broken age profile in the halo (Whitten et al. 2019), and differences in the prevalence of exotic stellar populations (Carollo et al. 2012).

A more recent development, unveiled primarily through the exquisite proper motion data from Gaia, is couched in terms of merger history. The discovery papers of Belokurov et al. (2018) and Belokurov et al. (2020) detail two major halo components dubbed “The Sausage” and “The Splash,” respectively. The Sausage is a highly radial population of stars thought to have arisen from a merger 8-11 Gyr ago with a 10 M⊙ galaxy, and the Splash is thought to be debris from the Milky Way’s proto-disk at the time of this merger. The Splash is slightly younger, by perhaps a Gyr, and more metal rich ([Fe/H] > -0.7, while the Sausage is found around [Fe/H] between -1.7 and -0.7). The Splash stars generally have low angular momentum and some have retrograde orbits (see also Amarante et al. 2020).

The “Gaia Enceladus” theory from Helmi et al. (2018) combines the retrograde halo with the eccentric Sausage into a single event. Other groups (e.g. Myeong et al. 2019) propose that the Sausage arose from a single merger while the retrograde component arose from a separate event called “Sequoia.” Sequoia stars are thought to be slightly more metal poor than Sausage stars.

Comparing the metal-rich to metal-poor components at different radii, we find that our metal rich BHBs seem to be more radial velocity dispersion dominated (having higher β), which could imply that they contain a significant portion of “Sausage” members. We briefly inspect the V_R-V_φ velocity plane in Figure 7, and find that a portion of our high metallicity stars reside in an extended V_R feature around zero rotation, which is consistent with the “Sausage” kinematic feature.

We briefly check for the presence of the Sequoia / Enceladus feature by investigating our most metal poor BHBs; however, we do not see evidence of these less enriched BHBs having a surplus of counter-rotating members, so we do not think a substantial number of stars from those features are present in our sample.

At this point, we return to our other finding that our BHB stars possibly grow younger with increasing radius, while the metallicity gradient remains flat.

Fig. 5.—
Top Left: anisotropy is a measure of how radial the orbits of a population are. The metal rich stars in our sample are slightly more radial than our metal poor stars. Top Right: total velocity dispersion of the BHB sample falls with spherical radius, the metal rich BHBs usually being kinematically cooler than the metal poor BHBs. Bottom: Individual components of the velocity dispersions, used to derive the upper panels. The panels include the findings of Bird et al. (2020) for their “all SDSS halo BHB” sample, and the agreement seems good. We refer the reader to that paper for a more detailed analysis of their BHB and K-giant data, including metallicity dependence and a more thorough removal of substructure than we have implemented here. We have used the “extreme deconvolution” method of Bovy et al. (2011) to calculate the velocity dispersions.

Fig. 6.—
Left: The colors of our BHB stars as a function of distance from the plane and radius (cylindrical in the bottom frame, spherical in the upper frame). Color may be a proxy for age in BHBs, with redder BHBs coming from younger populations. This figure shows a clear trend for more distant BHB stars to be redder, and therefore to possibly represent a younger population. Right: Similar to the left-hand frames, but with metallicity instead of color.
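For reference, the anisotropy values plotted in Figure 5 follow from equation (1). The snippet below is only a minimal sketch of that formula, assuming spherical velocity components are already available: the function name `anisotropy` and the synthetic velocities are illustrative and are not taken from the catalog. It uses plain sample variances, which ignore measurement errors, whereas the dispersions in this paper are derived with the extreme deconvolution method of Bovy et al. (2011).

```python
import numpy as np


def anisotropy(v_r, v_theta, v_phi):
    """Return beta = 1 - (sigma_theta^2 + sigma_phi^2) / (2 sigma_r^2).

    Uses unbiased sample variances; no deconvolution of measurement errors.
    """
    sigma_r2 = np.var(v_r, ddof=1)
    sigma_theta2 = np.var(v_theta, ddof=1)
    sigma_phi2 = np.var(v_phi, ddof=1)
    return 1.0 - (sigma_theta2 + sigma_phi2) / (2.0 * sigma_r2)


# Synthetic (made-up) velocities in km/s, for illustration only.
rng = np.random.default_rng(0)
n = 5000

# Equal dispersions in all three components: beta should be close to 0.
beta_iso = anisotropy(rng.normal(0, 100, n),
                      rng.normal(0, 100, n),
                      rng.normal(0, 100, n))

# Radial dispersion larger than tangential: beta should be positive,
# close to 1 - (80**2 + 80**2) / (2 * 150**2), i.e. about 0.72.
beta_rad = anisotropy(rng.normal(0, 150, n),
                      rng.normal(0, 80, n),
                      rng.normal(0, 80, n))

print(f"isotropic sample: beta = {beta_iso:.2f}")
print(f"radial sample:    beta = {beta_rad:.2f}")
```

With real data, noisy velocities inflate each sample variance, so a simple estimate like this one can be biased; that is one reason to prefer an error-deconvolved dispersion estimate.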
Fig. 7.—
The V_R-V_φ plane of our BHBs with metallicities > -1.8 and Z > R , as expected for the canonical halo population. However, the V_φ distribution is not well fit by a Gaussian, with a strong peak near zero and a heavy tail extending toward prograde rotation. This peak near zero is indicative of the Sausage-like feature in Belokurov et al. (2018), their Figure 2. The heavy tail extending toward prograde rotation could be related to the thick disk-halo interface.

When looking at the globular cluster population of the Milky Way, in-situ globular clusters generally populate an area around 12-14 Gyr of age with metallicities mainly between -0.5 and -1.5, and a slight tail extending to lower metallicities. Accreted globular clusters follow a track which moves from age = 14 Gyr and [Fe/H] = -2.5 dex to age = 10 Gyr and [Fe/H] = -1.25 dex (that is, a track which is more metal poor for a given age, or younger at a given metallicity, than the in-situ clusters; see Forbes 2020).

The Sagittarius associated globular clusters, for example, have ages up to four Gyr younger than the in-situ Milky Way clusters at similar enrichment levels, and the Sagittarius stream is prominent at distances up to 80 kpc in the halo. Alternatively, the Gaia Enceladus associated globular clusters have metallicities 1 dex lower than in-situ globular clusters at similar ages.

It has also been suggested in simulations that minor mergers, with mass ratios of 1:50 or 1:100 relative to the Milky Way, are expected to deposit more debris in the outer regions of the Galaxy compared to major mergers (Karademir et al. 2019).

In ΛCDM simulations of Milky Way type galaxies, it is often found that mass buildup is dominated by early accretions (Bullock & Johnston 2005). Since these early accretions are generally larger, they experience more dynamical friction and sink toward the center; this leads to the oldest stars being most common in the very central portions of the galaxy (Tumlinson 2010). If there are several accretions, the halo metallicity gradient can be flattened, and if there are few, then a steeper gradient is expected (Cooper et al. 2010).

It is also expected that systems in more quiescent regions may continue star formation longer than those which are quenched through merging processes and chaotic environments (Balogh et al. 1999). Therefore smaller accretions at later times could possibly host younger populations.

In our data, we see a relatively flat metallicity gradient, which would imply that the Milky Way has experienced several mergers rather than a few. We also see a color gradient, which could be interpreted as an age gradient, with younger stars in the external regions. We suspect that these younger, peripheral stars come from minor mergers with smaller satellite systems (shifting them to the external regions of the halo) which have less intense star formation owing to their lower masses (shifting them to lower metallicities for a given age), and have been accreted at later times than the older halo stars from earlier major mergers (allowing their star formation to continue longer outside of the quenching effects of accretion, and so, to host younger stars).

In short, we see evidence in the anisotropy parameter for a portion of our data to be associated with the Gaia Sausage event (having a higher β and therefore being more radial). We do not see indications of our data belonging to the Splash or Sequoia/Enceladus, as we see no retrograde component beyond what is expected from the canonical halo at low metallicities. We suggest that we see evidence for the outer regions of our footprint being predominantly accreted stars from smaller accretion events at later epochs.

CONCLUSIONS
We have implemented a machine learning algorithm to classify LAMOST spectra with blue Gaia colors as BHB or not-BHB objects. This classification is approximately 86% pure. Please note when using this catalog that this is more complicated for lower machine-probability stars and stars closer to the plane, although, even in the worst case, the catalog should be at least ∼ 60% pure. We will discuss this in more detail in the companion paper. We similarly (with a second machine) predict metallicities to about 0.35 dex, although we note that our sample does not contain many [Fe/H] values above -1 dex. This is probably a limitation of our training data, and it is better to consider the derived metallicities as relative values, instead of absolute values. We predict absolute magnitudes to ∼ and reddening is largely unknown inside the plane, with the exception of three-dimensional reddening maps which remain difficult to construct precisely. Outside the plane, extinction may be regarded as a known quantity thanks to the invaluable maps of Schlegel et al. (1998). Spectroscopic identification, as we have performed here, does not suffer from this extinction confusion, though. This catalog therefore presents interesting and novel opportunities to study this species of star in a hitherto unexamined regime.

We have briefly investigated the halo properties of our BHB stars. We find that our metal rich BHBs are on more radial orbits at all Galactocentric radii. We interpret this as showing that the metal rich population in our sample is populated by Gaia Sausage stars in addition to halo stars, while the metal poor stars do not obviously belong to any of the other named major merger components.

We find that as we move outward in the halo, from about 5-20 kpc, the BHBs grow redder (which could mean younger) and that the metallicity gradient remains mostly flat.

If we suppose that the inner region is populated mostly by large accretion events, and the outer region by smaller ones, then we would expect a decreasing metallicity gradient and a flat age gradient if the mergers occurred at the same time and star formation in the systems was subsequently quenched simultaneously (since larger systems have higher star formation rates and should intuitively be more enriched).

If, instead, the larger, centrally concentrated accretions occur before the smaller accretions, their star formation will be quenched at earlier epochs, so we would perhaps see a decreasing age gradient in the stellar populations as we move outward in the halo to the areas populated by smaller, more recent mergers. In this situation, we may not expect a decreasing metallicity gradient with radius; since the outer regions are quenched at later epochs, they may have had more time to enrich and their abundance levels may “catch up” to those of the larger systems at the center.

In this paper, we have investigated only BHB stars in our catalog which reside far from the plane, in the halo, omitting more than half of our data. Dealing with the data in the plane requires a more nuanced and rigorous investigation of effects from differential reddening (since our distances here are photometrically derived) and larger contaminating populations. A second paper has been prepared which deals specifically with these data inside the plane.

The catalog may be downloaded from https://zenodo.org/record/4547803

ACKNOWLEDGEMENTS
We thank the referee for their thoughtful comments which helped improve the clarity of this paper.

We thank Corrado Boeche for his work in normalizing the LAMOST spectra so that we could easily utilize them in this work.

We thank Iulia Simion for helpful discussions and suggestions regarding Gaia kinematics.

We thank the developers and maintainers of the following software libraries which were used in this work: Topcat (Taylor 2005), NumPy (van der Walt et al. 2011), SciPy (Jones et al. 2001), AstroPy (Astropy Collaboration et al. 2013), matplotlib (Hunter 2007), scikit-learn (Pedregosa 2011), IPython (Perez & Granger 2007), XGBoost (Chen & Guestrin 2016) and Python.

The research presented here is partially supported by the National Key R&D Program of China under grant No. 2018YFA0404501; by the National Natural Science Foundation of China under grant Nos. 12025302, 11773052, 11761131016; by the “111” Project of the Ministry of Education under grant No. B20019. This work made use of the Gravity Supercomputer at the Department of Astronomy, Shanghai Jiao Tong University, and the facilities of the Center for High Performance Computing at Shanghai Astronomical Observatory.

JJV gratefully acknowledges the support of the Chinese Academy of Sciences President’s International Fellowship Initiative. M.C.S. acknowledges financial support from the CAS One Hundred Talent Fund and from NSFC grants 11673083 and 11333003.

Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

This work has made use of data from the European Space Agency (ESA) mission Gaia ( ), processed by the Gaia Data Processing and Analysis Consortium (DPAC, ). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.
Particularly ultraviolet wavelengths such as the U band, which is frequently used for BHB identification.

Despite outstanding efforts from, for example, Drimmel et al. (2003), Marshall et al. (2006), Sale et al. (2014), Green et al. (2019).

REFERENCES

Amarante, J. A. S., Smith, M. C., & Boeche, C. 2020, MNRAS, 492, 3816. doi:10.1093/mnras/staa077
Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al. 2013, A&A, 558, A33
Balogh, M. L., Morris, S. L., Yee, H. K. C., et al. 1999, ApJ, 527, 54
Beers, T. C., Preston, G. W., Shectman, S. A., et al. 1992, AJ, 103, 267
Bell, E. F., Xue, X. X., Rix, H.-W., et al. 2010, AJ, 140, 1850
Belokurov, V., & Koposov, S. E. 2016, MNRAS, 456, 602
Belokurov, V., Erkal, D., Evans, N. W., et al. 2018, MNRAS, 478, 611
Belokurov, V., Sanders, J. L., Fattahi, A., et al. 2020, MNRAS, 494, 3880
Bird, S. A., Xue, X.-X., Liu, C., et al. 2019, AJ, 157, 104
Bird, S. A., Xue, X.-X., Liu, C., et al. 2020, arXiv:2005.05980
Boeche, C., Smith, M. C., Grebel, E. K., et al. 2018, AJ, 155, 181