Mapping Luminous Hot Stars in the Galaxy

Abstract

[Abridged] Luminous hot stars dominate the stellar energy input to the interstellar medium throughout cosmological time, they are laboratories to test theories of stellar evolution and multiplicity, and they serve as luminous tracers of star formation in the Milky Way and other galaxies. Massive stars occupy well-defined loci in colour-colour and colour-magnitude spaces, enabling selection based on the combination of Gaia EDR3 astrometry and photometry and 2MASS photometry, even in the presence of substantive dust extinction. In this paper we devise an all-sky sample of such luminous OBA-type stars, designed to be quite complete rather than very pure, to serve as targets for spectroscopic follow-up with the SDSS-V survey. We estimate "astro-kinematic" distances by combining parallaxes and proper motions with a model for the expected velocity and density distribution of young stars; we show that this adds useful constraints on the stars' distances, and hence luminosities. With these distances we map the spatial distribution of a more stringently selected sub-sample across the Galactic disc, and find it to be highly structured, with distinct over- and under-densities. The most evident over-densities can be associated with the presumed spiral arms of the Milky Way, in particular the Sagittarius-Carina and Scutum-Centaurus arms. Yet, the spatial picture of the Milky Way's young disc structure emerging in this study is complex, and suggests that most young stars in our Galaxy ($t_{\rm age} < t_{\rm dyn}$) are not neatly organised into distinct spiral arms. The combination of the comprehensive spectroscopy to come from SDSS-V (yielding velocities, ages, etc.) with future Gaia data releases will be crucial to reveal the dynamical nature of the spiral arms themselves.


Astronomy & Astrophysics manuscript no. aanda, © ESO 2021, February 18, 2021

Mapping Luminous Hot Stars in the Galaxy

E. Zari, H.-W. Rix, N. Frankel, M. Xiang, E. Poggio, R. Drimmel, and A. Tkachenko

Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117 Heidelberg, Germany
Osservatorio Astrofisico di Torino, Istituto Nazionale di Astrofisica (INAF), I-10025 Pino Torinese, Italy
Université Côte d'Azur, Observatoire de la Côte d'Azur, CNRS, Laboratoire Lagrange, France
Institute of Astronomy, KU Leuven, Celestijnenlaan 200D, 3001 Leuven, Belgium

Received xxx; accepted yyy

ABSTRACT

Luminous hot stars ($M_{K_s} \lesssim$ […], $T_{\rm eff} \gtrsim$ […]) dominate the stellar energy input to the interstellar medium throughout cosmological time, they are laboratories to test theories of stellar evolution and multiplicity, and they serve as luminous tracers of star formation in the Milky Way and other galaxies. Massive stars occupy well-defined loci in colour-colour and colour-magnitude spaces, enabling selection based on the combination of Gaia EDR3 astrometry and photometry and 2MASS photometry, even in the presence of substantive dust extinction. In this paper we devise an all-sky sample of such luminous OBA-type stars, designed to be quite complete rather than very pure, to serve as targets for spectroscopic follow-up with the SDSS-V survey. To estimate the purity and completeness of our catalogue, we derive stellar parameters for the stars in common with LAMOST DR6 and we compare the sample to other O and B-type star catalogues. We estimate "astro-kinematic" distances by combining parallaxes and proper motions with a model for the expected velocity and density distribution of young stars; we show that this adds useful constraints on the stars' distances, and hence luminosities. With these distances we map the spatial distribution of a more stringently selected sub-sample across the Galactic disc, and find it to be highly structured, with distinct over- and under-densities. The most evident over-densities can be associated with the presumed spiral arms of the Milky Way, in particular the Sagittarius-Carina and Scutum-Centaurus arms. Yet, the spatial picture of the Milky Way's young disc structure emerging in this study is complex, and suggests that most young stars in our Galaxy ($t_{\rm age} < t_{\rm dyn}$) are not neatly organised into distinct spiral arms. The combination of the comprehensive spectroscopy to come from SDSS-V (yielding velocities, ages, etc.) with future Gaia data releases will be crucial to reveal the dynamical nature of the spiral arms themselves.

Key words.

Stars: early-type; Galaxy: disc; Galaxy: structure.

1. Introduction

Luminous and hot stars are massive, and hence rare and short-lived. However, they play decisive roles across different fields of astrophysics.

They dominate the interaction between stars and the interstellar gas and dust (ISM) in their host galaxies, by heating or ionising those components; those stars also interact with the ISM through powerful stellar winds and eventually supernova explosions (Mac Low & Klessen 2004; Hopkins et al. 2014). Luminous and hot stars must indeed have played an important role in galaxy evolution throughout cosmic time, via their intense winds, ultraviolet radiation fields, chemical processing, and explosions (Haiman & Loeb 1997; Douglas et al. 2010; Bouwens et al. 2011).

Luminous and hot stars are inevitably young, and therefore they can serve as tracers of recent massive star formation. Although they make up an insignificant fraction of the overall stellar mass, they contribute a major portion of the light of the disc and they can thus probe the spiral structure and young disc kinematics of our and other galaxies (Xu et al. 2018; Chen et al. 2019; Dobbs & Baba 2014; Kendall et al. 2011, 2015).

They are, directly or through their remnants, decisive drivers of the chemical evolution for many elements in the periodic table and they serve as crucial laboratories for stellar evolution in this arguably least well-understood regime of stellar physics.

Massive stars are born predominantly as members of binary and multiple systems (Sana et al. 2012, 2014; Kobulnicky et al. 2014; Moe & Di Stefano 2017). As a consequence, most of them are expected to undergo strong binary interaction, which drastically alters their evolution (Podsiadlowski et al. 1992; Van Bever & Vanbeveren 2000; O'Shaughnessy et al. 2008; de Mink et al. 2013; Langer et al. 2020).

Finally, massive (binary) stars are the only channel to yield binaries that involve black holes and neutron stars in the disc of the Milky Way.
Therefore, understanding massive star binaries, also as they evolve away from their zero-age main sequence, is indispensable for understanding the distribution of gravitational wave events.

For all the above reasons, a multi-epoch, spectroscopic census of luminous and hot stars across the Galaxy, providing spatial and dynamic information together with estimates for masses, ages, metallicity, multiplicity, and other spectroscopic information is needed. The SDSS-V survey (Kollmeier et al. 2017; Bowen & Vaughan 1973; Gunn et al. 2006; Smee et al. 2013; Wilson et al. 2019; Kollmeier, J., Rix, H.-W., et al., in prep) will provide such a comprehensive, multi-epoch spectroscopic program on hot and massive stars, which, however, must be based on a clear and quantitative selection function to enable rigorous subsequent population studies.

There is no universal precise 'definition' across different sub-communities within astrophysics of when a star is 'luminous', 'hot', and 'massive'. 'Hot' may either mean that its $T_{\rm eff}$ is sufficiently high that the spectrum is dominated by H and He lines, rather than metal lines (i.e. OBA stars); or it may mean that $T_{\rm eff}$ is sufficiently high that the star produces significant amounts of ionising radiation (O and early B stars). 'Massive' may either mean massive enough to 'go supernova' or 'to have a convective core'.

For the current context we choose an operative definition of luminous and hot as $T_{\rm eff} \gtrsim$ […] $M_K \lesssim$ […] affected by O and early B stars; but it is important to remember that even the most massive stars are not ionisingly hot during all of their evolution. Therefore we aim in our sample definition for a sample that is hot enough to eliminate all the luminous red giant branch (RGB) and asymptotic giant branch (AGB) stars, but includes most evolved phases of massive stars.
Stars with $T_{\rm eff} \geq$ […] efficient discrimination of 10,000 K from 20,000 K by means of photometry only would be nearly impossible in the presence of dust.

In light of the science goals above, in this paper we present the selection of the target sample of massive stars of the SDSS-V survey. We describe an approach to use kinematics to improve distance estimates. Finally we use a "clean" sub-set of the target sample to study the structure of the Milky Way disc as traced by young, massive stars.

The selection of our target sample is based on the combination of Gaia EDR3 (Gaia Collaboration et al. 2016, 2020a) astrometry and photometry and 2MASS (Skrutskie et al. 2006) photometry. Similar samples of stars in the Upper Main Sequence (UMS) were already selected by Poggio et al. (2018) and Romero-Gómez et al. (2019) to characterise the structure and the kinematic properties of the Galactic warp. Poggio et al. (2018) used a combination of Gaia DR2 and 2MASS colours to select stars of spectral type earlier than B3, and obtained a sample of 599 494 stars. Romero-Gómez et al. (2019) used only Gaia DR2 photometry to select stars brighter than $M_{G,0} =$ […]

2. Target sample

In this Section we present the criteria that we applied to select the target sample for the SDSS-V spectroscopic survey, and we describe its characteristics in terms of sky distribution, magnitude and luminosity distribution, variability, purity, and completeness.

To select our sample, we use Gaia EDR3 photometry (Riello et al. 2020) and astrometry (Lindegren et al. 2020b) combined with 2MASS photometry. The cross-match between Gaia and 2MASS is not yet provided in the Gaia archive (https://archives.esac.esa.int/gaia; Marrese et al., 2021, in prep). For this reason we first performed a cross-match with Gaia DR2, then a cross-match with 2MASS. The ADQL query for this cross-match is provided in Appendix A.

We restrict our query to the sources with $G < 16$ mag. The motivation for this condition is twofold. On the one hand, it is an SDSS-V technical requirement: at $G = 16$ mag we can obtain a S/N = 75 with 15 minute exposures with the BOSS spectrograph (Smee et al. 2013). On the other hand, for $G > 16$ mag the Gaia EDR3 parallax precision rapidly deteriorates (see e.g. Table 3 in Gaia Collaboration et al. 2020a), thus a) estimating absolute magnitudes (which are crucial to our selection) becomes non-trivial and b) inferring distances strongly depends on the choice of the prior (see Appendix B and Section 3).

We define a proxy for the absolute magnitude of a star in the $K_s$ band, $\tilde{M}_{K_s}$, and we require $\tilde{M}_{K_s} < 0$. This translates to the following parallax condition (with $\varpi$ in mas):

$\varpi < 10^{(10 - K_s - 0.0)/5}$, (1)

and aims at selecting a reasonably-sized sample of bright stars. The value $M_{K_s} =$ […] $M_{K_s} <$ […] $M_{K_s} =$ […] $M_{K_s} =$ […] difficult to model.

At this point our sample, shown in the $(J - K_s)$ vs. $G_{BP} - G_{RP}$ colour-magnitude diagram in Fig. 2, consists of luminous stars, which could be OBA, RGB, or AGB stars. OBA stars fall in the region highlighted by the black ellipse. The stars in the horizontal stripes are stars with large 2MASS photometric errors or 2MASS photometric quality flags (ph_flag) different from "AAA".

To select OBA stars we use simple photometric cuts, roughly following the procedure outlined in Poggio et al. (2018) (see in particular their Fig. 1). The first step of our selection consists in selecting sources with:

$(J - K_s)_0 <$ […] and $(J - K_s)_0 > -$[…]. (2)

The first condition excludes most of the red sources, the second condition removes sources with unnaturally blue colours. We obtained the $(J - K_s)_0$ colour by applying the following equation:

$(J - K_s)_0 = (J - K_s) -$ […] $(G - K_s)$, (3)

which we derived by noticing that, for the spectroscopically confirmed O and B-type stars selected by Liu et al. (2019), the $(J - K_s)$ and $(G - K_s)$ colours are linearly correlated (see Fig. 1, left), and by assuming that the difference between the $J$ and $K_s$ magnitudes for such stars is close to zero (see e.g. Pecaut & Mamajek 2013).

Fig. 1. 2D histogram of $J - K_s$ vs. $G - K_s$ (left) and $(J - K_s)_0$ vs. $G - K_s$ (right) for the stars in the LAMOST sample by Liu et al. (2019) (see main text for more details on the sample).

Fig. 2. $(J - K_s)$ vs. $G_{BP} - G_{RP}$ colour-magnitude diagram. OBA stars roughly fall in the region highlighted by the orange ellipse. This representation shows that hot stars stand out qualitatively in colour-colour space.

Fig. 3 shows the distribution of sources in the $G - K_s$ vs. $J - H$ colour-colour diagram. O and B-type stars lie on a sequence in the $J - H$ vs. $G - K_s$ colour-colour diagram as a consequence of interstellar reddening and are clearly separated from redder turn-off stars and giants. We therefore select stars located in the region defined by the equations:

$J - H <$ […] $(G - K_s) + 0.05$ and $J - H >$ […] $(G - K_s) - 0.15$ (4)

(solid grey lines in Fig. 3). Finally, we select stars in the $G$ vs. $G - K_s$ colour-magnitude diagram, where giants still contaminating our sample can be easily separated from OBA stars (see Fig. 4). In particular we select stars satisfying:

$G >$ […] $(G - K_s) + 3$ (5)

(solid grey line in Fig. 4). The catalogue consists of 988 202 entries.

Fig. 3. $J - H$ vs. $G - K_s$ colour-colour diagram of the sources with $(J - K_s)_0 <$ […] and $(J - K_s)_0 > -$[…].

Fig. 5 shows the sky distribution of the sample selected in the previous Sections in Galactic coordinates ($l$, $b$). The stars in our catalogue are distributed in our Galaxy (mainly on the Galactic plane) and in the Large (LMC, $l, b \sim$ […]°, −[…]°) and the Small (SMC, $l, b \sim$ […]°, −[…]°) Magellanic Clouds. We stress that massive stars in the Magellanic Clouds are included in the target sample of the SDSS-V survey. In this paper however we focus on the Milky Way disc, thus in the following sections we restrict our sample to stars with $|b| <$ […]°. Massive star forming regions can be recognised as over-densities in the source distribution.

Dust features are visible as 'gaps' in the star distribution. For example, the Aquila rift, the Pipe Nebula and the Ophiuchus clouds can be easily identified towards the Galactic centre ($l, b =$ […] $l, b \sim$ […]°, […]° the $\omega$ Centauri globular cluster is visible, while the stars towards the Galactic centre delineate the shape of the bulge. We further study the purity and completeness of our sample in Sec. 2.1.2.

Fig. 4. $G - K_s$ vs. $G$ colour-magnitude diagram of the sources selected after applying the criteria in the $(J - K_s)$ vs. $G - K_s$ and $G - K_s$ vs. $J - H$ colour-magnitude diagrams (see text). The solid orange line has equation $G =$ […] $(G - K_s) + 3$, and aims at excluding the few cool giants that may still contaminate the sample.

Fig. 6 shows the distribution in apparent (left) and absolute magnitude (right) of the sources. The orange histograms show the distribution of all the sources in our sample. The grey histograms show the distribution of the sources selected in Section 3. In the right-hand panel, absolute magnitudes are computed by using the inversion of parallaxes as the distance (orange histogram) or astro-kinematic distances (grey histogram, see Section 3). The grey distribution appears shifted towards larger $M_G$ values. This is due to the fact that, for stars with small parallaxes (and large parallax errors), our astro-kinematic distances are shorter than those obtained by inverting the parallax.

Fig. 7 (left) shows the variability distribution of the sources in the $G$ band vs. the ratio of the variability in the $G_{BP}$ and $G_{RP}$ bands. Following roughly Belokurov et al. (2017), Deason et al. (2017) and Iorio et al. (2018), the variability in a given filter $x$ is defined as:

$\mathrm{AMP}_x = \sqrt{\texttt{phot\_x\_n\_obs}} \,/\, \texttt{phot\_x\_mean\_flux\_over\_error}$, (6)

where phot_x_n_obs is the number of observations contributing to the $x$-band photometry, and

phot_x_mean_flux_over_error = phot_x_mean_flux / phot_x_mean_flux_error. (7)

While most of our sources are not variable, some cluster in two regions: RR Lyrae stars have $G$-band variability ranging from 5 to 40% (Belokurov et al. 2017) and $G_{BP}/G_{RP}$ variability around 1.6; eclipsing binaries have $G$ variability up to 30% and roughly equal $G_{BP}$ and $G_{RP}$ variability. Fig. 7 (right) shows the same as Fig. 7 (left) but for the sources selected in Section 3 (see below) that have astro-kinematic distances compatible with those estimated by Bailer-Jones et al. (2018).
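The variability amplitude of Eqs. 6 and 7 can be sketched in a few lines. This is only an illustration: the function names and the example row are made up, while the dictionary keys follow the Gaia archive column names that Eq. 6 refers to.

```python
import math


def variability_amplitude(n_obs, mean_flux_over_error):
    """AMP_x of Eq. 6: sqrt(number of observations) / (mean flux over its error)."""
    return math.sqrt(n_obs) / mean_flux_over_error


def amplitudes(row):
    """Return AMP in G, BP and RP plus the BP/RP ratio for one catalogue row.

    `row` is assumed to be a dict keyed by Gaia column names.
    """
    amp = {
        band: variability_amplitude(
            row[f"phot_{band}_n_obs"], row[f"phot_{band}_mean_flux_over_error"]
        )
        for band in ("g", "bp", "rp")
    }
    amp["bp_over_rp"] = amp["bp"] / amp["rp"]
    return amp


# Hypothetical source, for illustration only (not taken from the catalogue).
example_row = {
    "phot_g_n_obs": 100, "phot_g_mean_flux_over_error": 50.0,
    "phot_bp_n_obs": 16, "phot_bp_mean_flux_over_error": 20.0,
    "phot_rp_n_obs": 16, "phot_rp_mean_flux_over_error": 40.0,
}
example_amp = amplitudes(example_row)
```

A larger AMP value means that the mean flux is less well determined than expected for the number of observations, which is why it acts as a variability proxy.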
Selecting sources with compatible distances removes RR Lyrae stars, while retaining eclipsing binaries: this is expected as RR Lyrae stars do not follow the same kinematics as the OBA stars that we are interested in.

To study the purity level of our sample we cross-matched it with LAMOST DR6 (http://dr6.lamost.org/), and find 36 617 sources in common. We derived effective temperatures ($T_{\rm eff}$) and surface gravity ($\log g$) from the LAMOST spectra by fitting ab initio model spectra generated with the 1D-LTE ATLAS12 model atmosphere (Xiang et al., in prep). Fig. 8 shows the distribution of sources in the $\log T_{\rm eff}$ vs. $M_{K_s}$ plane. Around 45% of the sources are hotter than 9 700 K (orange solid line in Fig. 8, left, corresponding to the temperature of an A0V-type star) and around 20% are hotter than 14 000 K (roughly the temperature of a B7V-type star). We assume that we can extrapolate the same numbers to the entire sample. The PARSEC isochrones ($A_V =$ […] Gaia EDR3, 2MASS and further checking for duplicated sources (1" cross-match radius), and to 9 083 stars by applying Eq. 1. The cross-match between our catalogue and that of Liu et al. (2019) gives 8212 sources. The stars that are not included in the cross-match are those whose infrared colours are not consistent with our selection. The sample of Sota et al. (2014) and Maíz Apellániz et al. (2016) (the Galactic O-star spectroscopic survey, GOSSS) consists of 590 stars, which reduce to 580 after cross-matching with Gaia EDR3 and 2MASS. Our catalogue contains 503 ($\sim$ […] $J - H >$ […] $(G - K_s) + 0.05$ (see Eq. 4), that is 2MASS-Gaia EDR3 colours consistent with being giants.
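The parallax condition of Eq. 1 follows from the distance modulus. A minimal sketch, assuming parallaxes in mas and using function names that are ours rather than the paper's, shows that the two forms of the cut agree:

```python
import math


def proxy_abs_mag_ks(ks_mag, parallax_mas):
    """Proxy absolute K_s magnitude: K_s + 5 log10(parallax / mas) - 10.

    Only defined for positive parallaxes.
    """
    return ks_mag + 5.0 * math.log10(parallax_mas) - 10.0


def passes_parallax_cut(ks_mag, parallax_mas):
    """Eq. 1 written as a parallax inequality, equivalent to proxy magnitude < 0."""
    return parallax_mas < 10.0 ** ((10.0 - ks_mag) / 5.0)


# Two made-up stars with K_s = 8 mag: one at 1 mas (about 1 kpc), one at 5 mas.
far_star = (8.0, 1.0)
near_star = (8.0, 5.0)
far_selected = passes_parallax_cut(*far_star)    # proxy magnitude is -2
near_selected = passes_parallax_cut(*near_star)  # proxy magnitude is about +1.5
```

The inequality form stays well defined for zero or negative measured parallaxes, where the logarithm in the proxy magnitude is not.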

3. Distance estimates

To study the 3D space distribution of our sources, precise distance estimates are needed. We estimate distances by using a model designed to reproduce the properties of our data-set in terms of spatial and luminosity distribution, and the additional information that stars belonging to our sample should follow Galactic rotation, with a small, typical velocity dispersion. By making this assumption we neglect non-circular streaming motions. We can then predict the true proper motions ($\mu'_l$, $\mu'_b$) of the stars in our sample and compare them with the observed ones, thus adding a further constraint to the distance estimation and deriving 'astro-kinematic' distances. The probability density function (pdf) for a star in our sample to be at a certain distance $d_{\rm kin}$ is given by:

$p(d \,|\, \mathbf{o}, m_{K_s}, \Theta_{KM}, \Theta_{SM}, \Theta_{CMD}) \propto p(\mathbf{o} \,|\, \Theta_{KM}) \, p(d, m_{K_s} \,|\, l, b, \Theta_{SM}, \Theta_{CMD})$, (8)

where
– $\mathbf{o} = (\varpi - \varpi_0, \mu_{l*}, \mu_b)$ is the array of the astrometric observables, i.e. parallax $\varpi$ (with $\varpi_0$ the parallax zero-point) and proper motion components in $l$ and $b$, $\mu_{l*}$ and $\mu_b$;
– $m_{K_s}$ is the apparent magnitude in the $K_s$ band;
– $\Theta_{KM}$ represents our kinematic model;
– $\Theta_{SM}$ represents our model for the distribution of stars in the Galaxy;
– $\Theta_{CMD}$ accounts for the observational effects due to our selection function on the spatial distribution of our sample.

Fig. 5. On-sky density distribution of the candidate OBA stars selected in Section 2, in Galactic coordinates. Most of the sources are located in the Galactic plane ($b = 0°$) and in the Large and Small Magellanic Clouds.

Fig. 6. Left: $G$ magnitude distribution of all the OBA stars selected in Section 2.1 (orange histogram) and of the sources selected in Section 3, with astro-kinematic distances comparable with those estimated by Bailer-Jones et al. (2018) (grey histogram). Right: absolute magnitude distribution of all the OBA stars selected in Section 2.1, computed by using their parallax (orange histogram), and of the sources selected in Section 3, computed by using their astro-kinematic distances (grey histogram).

The method and all the terms in Eq. 8 are described in detail in Appendix B. We adopt as the astro-kinematic distance estimate $d_{\rm kin}$ the mode of the pdf of Eq. 8, and we use the 16th and 84th percentiles to estimate distance errors. We also tried using the median of the pdf as a distance estimate, but this did not cause significant differences in the maps presented in Section 5. Figure 9 (left) shows the comparison between our astro-kinematic distances and the photo-geometric distances estimated by Bailer-Jones et al. (2020). More than 85% of the sources have astro-kinematic distances consistent within 1σ with the photo-geometric distances from Bailer-Jones et al. (2020). We comment on the sources with inconsistent distances in Section 4.2.
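How a mode and the 16th and 84th percentiles can be read off a tabulated pdf is sketched below. This is not the implementation of Appendix B: the grid, the toy Gaussian pdf and the function name `summarise_pdf` are assumptions made for illustration, and the pdf of Eq. 8 would have to be evaluated separately.

```python
import math


def summarise_pdf(distances, pdf):
    """Return (mode, p16, p84) of a pdf tabulated on an increasing distance grid."""
    # Mode: grid point where the tabulated pdf is highest.
    mode = distances[max(range(len(pdf)), key=pdf.__getitem__)]

    # Cumulative distribution from the trapezoidal rule, normalised to 1.
    cdf = [0.0]
    for i in range(1, len(distances)):
        step = distances[i] - distances[i - 1]
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * step)
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    def percentile(q):
        # Linear interpolation of the inverse CDF at probability q.
        for i in range(1, len(cdf)):
            if cdf[i] >= q:
                span = cdf[i] - cdf[i - 1]
                frac = 0.0 if span == 0 else (q - cdf[i - 1]) / span
                return distances[i - 1] + frac * (distances[i] - distances[i - 1])
        return distances[-1]

    return mode, percentile(0.16), percentile(0.84)


# Toy pdf: Gaussian shape centred at 2 kpc with 0.2 kpc width (made-up numbers).
grid = [0.01 * i for i in range(1, 501)]
toy_pdf = [math.exp(-0.5 * ((d - 2.0) / 0.2) ** 2) for d in grid]
d_kin, d_lo, d_hi = summarise_pdf(grid, toy_pdf)
```

For a skewed pdf the mode and the median differ, so the interval between the two percentiles is in general not symmetric around the adopted distance.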

4. Filtered sample

In this Section we define a sub-set of the target sample (Section 2), which we use to map the structure of the young Milky Way disc. This "filtered" sample is obtained by cleaning the target sample of sources with likely spurious astrometric solutions (Sec. 4.1) and by removing sources with kinematic properties not consistent with the model that we used to estimate astro-kinematic distances (Sec. 4.2).


Fig. 7. $G$-band variability vs. $G_{RP}/G_{BP}$ variability of all the sources selected in Section 2.1 (left) and of the sources selected in Section 4 (right). Eclipsing binaries have the same level of variability in $G_{BP}$ and $G_{RP}$ and $G$ variability lower than 0.3. RR Lyrae stars have $G_{BP}/G_{RP}$ variability $\sim$ […] $G$-band variability lower than 0.4.

Fig. 8. Left: Distribution of the sources with LAMOST DR6 spectra in the $\log T_{\rm eff}$ vs. $M_{K_s}$ plane (gray). The orange solid line indicates $T_{\rm eff} =$ […]

Fig. 9. Comparison between the photo-geometric distances estimated by Bailer-Jones et al. (2020) and our astro-kinematic distances (see Sec. 3), after removing spurious sources (left), and after removing sources with vertical velocity not consistent with our kinematic model and with $M_{K_s} <$ […] $y = x$.

Fig. 10. Astrometric fidelity distribution for the sources in our target sample. Low astrometric fidelity values ($< 0.5$) correspond to bad astrometric solutions, high astrometric fidelity values ($> 0.5$) correspond to good astrometric solutions.

To remove spurious sources, we use a method developed by Rybizki et al. (2021), based on Gaia Collaboration et al. (2020b). Spurious sources have poor astrometric solutions that can be due to the inconsistent matching of the observations to different physical sources. This is more likely to occur in regions of high source surface density (for example, in the Galactic plane) or for close binary systems (either real or due to perspective effects). To identify poor astrometric solutions in their solar neighbourhood catalogue, Gaia Collaboration et al. (2020b) constructed a random-forest classifier that assigns a probability to each source of having a good (or bad) astrometric solution, based on astrometric quantities and quality indicators. The classification probability is $\sim$ […] $\sim$ […] Gaia EDR3 catalogue. The distribution (in logarithmic scale) of "astrometric fidelities" for the sources in our target sample is shown in Fig. 10. We select sources with astrometric fidelities $>$ […] affect the space distribution of our filtered sample.

Our astro-kinematic distance estimates are not accurate for sources that do not follow the disc kinematics assumed by our model. Such sources can be grouped in two categories:

1. stars whose kinematics is influenced by the bar. They are located mostly towards the Galactic Centre, and have astro-kinematic distances larger than Bailer-Jones photo-geometric distances. These stars could in principle be young. Their astro-kinematic distance estimates could however be wrong due to the kinematic effects of the bar itself, which have not been included in our model (see Section 3 and Appendix B). In the following Sections we focus on the in-plane distribution of stars within $d_{\rm kin} \lesssim$ […]/bulge regions.
2. stars with heated vertical kinematics. These stars have Bailer-Jones photo-geometric distances $d_{BJ} \gtrsim$ […] $d_{\rm kin} \sim$ […] $v_z$ (see Appendix C). We fit the vertical velocity distribution with a Gaussian Mixture model with two components, the first (component A) with mean vertical velocity $\langle v_z \rangle_A = -$[…] km s$^{-1}$ and velocity dispersion $\sigma_{v_z, A} =$ […] km s$^{-1}$, which has properties comparable with the young population that we are interested in studying, and the second (component B), with $\langle v_z \rangle_B =$ […] km s$^{-1}$ and velocity dispersion $\sigma_{v_z, B} =$ […] km s$^{-1}$, which instead better reproduces the properties of an old population. We estimate the probability for each star to belong to either component A or B, and we select those stars with larger probability of belonging to component A.

Figure 9 (right) shows the comparison between the astro-kinematic distances and Bailer-Jones photo-geometric distances after cleaning the target sample as explained above. At this point, around 95% of the sources have astro-kinematic distances consistent with Bailer-Jones photo-geometric distances within 1σ.

Finally, by using our astro-kinematic distances, we estimate the absolute magnitude of each star in the $K_s$ band, $M_{K_s}$, and we select stars with $M_{K_s} <$ […]
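The membership step of the two-component mixture can be illustrated as follows. This sketch assumes the mixture has already been fitted; the weights, means and dispersions below are placeholders chosen for illustration and are not the fitted values of the paper.

```python
import math


def normal_pdf(v, mean, sigma):
    """Gaussian probability density at velocity v."""
    return math.exp(-0.5 * ((v - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def prob_component_a(v_z, comp_a, comp_b):
    """Posterior probability that a star with vertical velocity v_z belongs to A.

    Each component is a (weight, mean, sigma) tuple in km/s.
    """
    like_a = comp_a[0] * normal_pdf(v_z, comp_a[1], comp_a[2])
    like_b = comp_b[0] * normal_pdf(v_z, comp_b[1], comp_b[2])
    return like_a / (like_a + like_b)


# Placeholder parameters (NOT the paper's values): a cold and a hot component.
component_a = (0.5, 0.0, 5.0)
component_b = (0.5, 0.0, 30.0)

p_slow = prob_component_a(0.0, component_a, component_b)   # kinematically cold star
p_fast = prob_component_a(60.0, component_a, component_b)  # kinematically hot star
keep_slow = p_slow > 0.5
keep_fast = p_fast > 0.5
```

Keeping stars with probability above 0.5 is the two-component equivalent of selecting the stars with the larger probability of belonging to component A.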

5. 3D space distribution

To create the 3D density maps, we follow the method outlined in Zari et al. (2018). We compute Galactic Cartesian coordinates, $x$, $y$, and $z$, for all the sources by using our newly estimated astro-kinematic distances and we define a box $V =$ […] × […] × […] $v =$ […] × […] × […] $D(x, y, z)$ by smoothing the distribution by means of a 3D Gaussian filter, using a technique similar to that used by Bouy & Alves (2015). The Gaussian width (equal on the three axes) is $w =$ […] $\sigma$. The choice of a certain $w$ value is arbitrary. A high $w$ value produces a smooth, less detailed map, while a low $w$ value results in a noisy map.

Fig. 11 shows the projection on the Galactic plane of the filtered sample selected in Section 4. A number of features are visible, showing different structures at different scales and distances. At the centre of the map ($x, y \sim$ […] $\sim$ B7V. Thus, even if some late-type B stars are present within 200−300 pc from the Sun, they are excluded from our selection. Third, selection effects due to the fact that a) nearby bright sources are not included in Gaia EDR3 and b) some of those that are included might have poor 2MASS photometry, and thus are excluded by the colour selection in Eq. 2.

The small dense clumps can be associated with well-studied OB associations; for example Cygnus, Carina, Cassiopeia, and Vela are visible. The distribution of the lower density contours traces the Milky Way young disc structure, in particular the spiral arm location (see Sec. 5.2 and 6). The density distribution presents numerous low-density gaps. These could be due to the fact that our view might be obscured by interstellar dust towards certain lines of sight; however, some gaps are located in regions of relatively low extinction (see also Section 5.3). For example the Perseus gap ($x, y \sim -$[…] $x, y \sim$ […], $-$[…] $G = 16$ mag (see Sec. 2).

The radial features are caused by 'shadow cones' produced by foreground extinction, and by distance uncertainties. As shown in Fig. B.3, distance uncertainties are lower than 10% for stars within 5 kpc, and therefore are only partially responsible for the elongation.
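The construction of a smoothed density $D(x, y, z)$ can be sketched as counting stars in voxels and then applying a Gaussian filter along each axis in turn. The box, voxel size and filter width below are arbitrary stand-ins, and the function names are ours; this is a toy version of the procedure, not the code used for the maps.

```python
import math


def gaussian_kernel(width_vox, truncate=3.0):
    """Normalised 1D Gaussian weights, truncated at `truncate` widths."""
    radius = int(truncate * width_vox + 0.5)
    weights = [math.exp(-0.5 * (k / width_vox) ** 2) for k in range(-radius, radius + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def smooth_density(points, origin, voxel, shape, width_vox):
    """Count points in voxels, then smooth the counts along x, y and z in turn."""
    grid = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    for point in points:
        idx = [int((c - o) // voxel) for c, o in zip(point, origin)]
        if all(0 <= i < n for i, n in zip(idx, shape)):
            grid[idx[0]][idx[1]][idx[2]] += 1.0

    kernel = gaussian_kernel(width_vox)
    radius = len(kernel) // 2
    for axis in range(3):  # a 3D Gaussian filter is separable into three 1D passes
        out = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
        for i in range(shape[0]):
            for j in range(shape[1]):
                for k in range(shape[2]):
                    total = 0.0
                    for offset, weight in enumerate(kernel, start=-radius):
                        src = [i, j, k]
                        src[axis] += offset
                        if 0 <= src[axis] < shape[axis]:
                            total += weight * grid[src[0]][src[1]][src[2]]
                    out[i][j][k] = total
        grid = out
    return grid


# One made-up star at the centre of an 11 x 11 x 11 voxel box.
density = smooth_density([(0.0, 0.0, 0.0)], origin=(-5.5, -5.5, -5.5),
                         voxel=1.0, shape=(11, 11, 11), width_vox=1.0)
```

Increasing `width_vox` spreads each star over more voxels, which gives the smoother but less detailed map described in the text.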

As mentioned in the Introduction, early-type star samples have been used to study the properties of the Galactic warp (see Poggio et al. 2018; Romero-Gómez et al. 2019, and references therein). Although in this paper we do not focus specifically on the warp, we studied the median height of our filtered sample with respect to the plane of the Galaxy and the median vertical velocity distribution, and compared our results to those by Poggio et al. (2018) and Romero-Gómez et al. (2019) to further validate our selection.

Fig. 12 shows a map of the median height $z$ of our sample on a spatially uniform grid, with bins of 100 pc width. We consider only bins that have at least 10 stars. The radial features in Fig. 12 are likely an artefact due to uneven sampling above or below the Galactic plane due to foreground extinction. Such features make the interpretation of Fig. 12 uncertain and might prevent us from seeing the warp shape, especially for $X > -$[…] $X < -$[…] $v_z$) distribution of our filtered sample on the Galactic plane (see Appendix C for more details on how the vertical velocity was computed). For distances larger than 4 kpc from the Sun, the vertical velocities have positive values that seem to peak at $Y \sim$ […] $v_z$ are likely related to the perturbations that the disc is experiencing (e.g. interaction with satellite galaxies).

By combining Gaia EDR3 and VPHAS data with literature catalogues, Chen et al. (2019) obtained a sample of 14 880 OB stars, earlier than B3V, which they used to describe the morphology of the spiral structure of the Milky Way. The method used by Chen et al. to select O and B stars differs substantially from ours; however, a qualitative comparison between their Fig. 4 and our maps shows substantially the same gross structures. This increases our confidence that both maps are revealing the same features in the Galactic O and B star distribution. The larger number of sources in our catalogue enables us to reveal more details in the source distribution. Interestingly, neither map shows a clear spiral structure. This will be further discussed in Sec. 6.

Fig. 3 in Romero-Gómez et al. (2019) shows the distribution on the Galactic plane of their upper main sequence sources. Fig. 11 shows essentially the same over-densities, although to a higher level of detail. This is mainly due to two facts: a) to create their maps, Romero-Gómez et al. (2019) used a larger bin size than in our present work, and b) their sample likely includes more later-type stars than ours due to their selection criteria: this causes the distribution of sources to be smoother.

Fig. 3 (panel A) in Poggio et al. (2018) also shows the distribution of upper main sequence sources on the Galactic plane. Similarly to Romero-Gómez et al. (2019), the pixel size of Poggio et al. (2018) is larger than ours. As mentioned above, this creates a smoother map, where it is not possible to identify small-scale over-densities. The comparison between large scale structures shows however many similarities.

Dust is also a tracer of Galactic structure. Fig. 14 shows the projection of the 3D dust distribution from Lallement et al. (2019) on the Galactic plane, over-plotted on top of our density map (see Fig. 11). The units of the density distribution are arbitrary. The dust density distribution shows discrepancies with respect to the star distribution. For example, the two elongated structures at the centre of the map are not prominent in the OB star distribution, and the dust features in the first and fourth Galactic quadrants are offset with respect to the star density distribution, and seem to have a different separation and relative inclination. Such offsets are expected, as newly formed stars will drift from their birthplaces as they follow Galactic rotation while the spiral arms move with a given pattern speed.

Fig. 11. Surface density of the stars selected in Section 3 projected on the Galactic Plane. The Sun is in (0,0), the x-axis is directed towards the Galactic centre, and the y-axis towards Galactic rotation. The z-axis is perpendicular to the plane. The density is displayed in arbitrary units. The dashed circles have radii from 1 to 6 kpc, in steps of 1 kpc. The labels follow the nomenclature proposed by K. Jardine (http://gruze.org/galaxymap/map_2020/). The gaps in the density distribution mentioned in Section 5 are indicated by the grey circles.

Classical Cepheids are young ($< 400$ Myr) variable stars, whose distances can be estimated thanks to their period-luminosity relation. Fig. 15 shows the Cepheids identified by Skowron et al. (2019), over-plotted on the density distribution of O and B stars (same as in Figs. 11 and 14). The colour-bar represents the ages determined by Skowron et al. (2019) (in million years). The Cepheid distribution traces reasonably well the density enhancements corresponding to the Sagittarius-Carina arm (see Sec. 5.5), while the correspondence with the other density enhancements is not as tight. Although the age distribution of the Cepheids and our filtered sample is similar, the selection criteria are different: this makes a more direct comparison difficult.

The spiral structure of the Milky Way can also be traced by water and methanol masers associated with high-mass star forming regions (HMSFRs). By measuring parallaxes and proper motions of such HMSFRs, Reid et al. (2019) provided estimates of the fundamental Galactic parameters, among which the pitch angle and the arm width. Fig. 16 shows the masers and the location of the arms from the fit from Reid et al. (2019), on top of the OB stars density map shown in Fig. 11. Similarly to the Cepheids distribution, there is a good agreement between the position of the masers and the location of the most prominent over-densities, which makes us confident of our distance estimates. Unfortunately there are no available maser data tracing the spiral structure in the 4th Galactic quadrant ($X > 0$, $Y < 0$).
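The binned median-height map described for Fig. 12 can be sketched as below. The bin width and the minimum number of stars are left as parameters, since the text and the figure caption quote different values; the function name and the toy stars are assumptions made for illustration.

```python
import math
from statistics import median


def median_height_map(stars, bin_width, min_stars=10):
    """Median z per (x, y) bin, keeping only bins with at least `min_stars` stars.

    `stars` is an iterable of (x, y, z) tuples in the same length unit as `bin_width`.
    Returns a dict mapping integer bin indices (ix, iy) to the median height.
    """
    bins = {}
    for x, y, z in stars:
        # floor() keeps the binning uniform across negative coordinates.
        key = (math.floor(x / bin_width), math.floor(y / bin_width))
        bins.setdefault(key, []).append(z)
    return {key: median(zs) for key, zs in bins.items() if len(zs) >= min_stars}


# Toy data in pc: ten stars in one bin, three in a neighbouring bin.
toy_stars = [(50.0, 50.0, float(i)) for i in range(10)]
toy_stars += [(-50.0, 50.0, 100.0)] * 3
height_map = median_height_map(toy_stars, bin_width=100.0, min_stars=10)
```

The same binning applied to vertical velocities instead of heights would give a map analogous to the median $v_z$ distribution discussed above.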

6. Discussion

Fig. 12. Median height for the filtered sample projected on the Galactic plane. We divided the XY plane into bins of 200 pc width, and we only show the ones containing more than 10 stars. The dashed circles have radii from 1 to 9 kpc, in steps of 1 kpc. The radial features are likely due to uneven sampling of sources in the Galactic plane caused by foreground extinction.

Fig. 13. Median vertical velocity $v_z$ distribution on the Galactic plane. We divided the XY plane into bins of 200 pc width, and we only show the ones containing more than 10 stars. The dashed circles have radii from 1 to 9 kpc, in steps of 1 kpc. The Galactic warp is visible for R > [...]

We have devised a large and systematically selected sample of massive, young stars and we studied its properties, focusing on the spatial distribution of our sources on the Galactic plane. In this Section we discuss our findings in the context of the spiral arm structure of the Milky Way.

Traditionally, the four main spiral arms of the Milky Way are considered to be Perseus, Sagittarius, Scutum and Norma, with an additional Outer arm, which might be the outer part of one of the inner arms, and the Local or Orion arm, which is shorter and may be a spur or a bridge between two arms (see for instance Churchwell et al. 2009). In the Milky Way, the term "spiral arm" has been used very broadly, for HI, molecular gas (and masers), dust and young stars, while external galaxies show that these tracers have quite distinct morphologies. This has led to some inconsistencies in defining the spiral arms in our Galaxy. For example, the structure parameters derived by Reid et al. (2019) (see Fig. 16 and Sec. 5.5) do not agree with those presented in Chen et al. (2019) (see Sec. 5.2). Further, certain features might be interpreted either as spiral arms or as spurs connecting different arms: this is for example the case for the dust complex labelled as "Lower Sagittarius-Carina" by Lallement et al. (2019) (see Sec. 5.3 and Fig. 14) and "Lower Sagittarius-Carina Spur" by Chen et al. (2020). Moreover, by studying the correlation between the location of young clusters and molecular clouds in NGC 7793 and M51 (respectively a flocculent and a grand-design spiral galaxy), Grasha et al. (2018) and Grasha et al. (2019) found that the star clusters that are associated with a giant molecular cloud (i.e., located within its footprint) are young, with a median age of 2 Myr in NGC 7793 and of 4 Myr in M51. Older clusters are mostly un-associated with any molecular cloud. Thus, assuming the same spiral arm morphology for different tracers (such as early-type stars and dust) might be inappropriate.

The over-density towards the inner Galaxy (visible for positive x values, i.e. in the 1st and 4th Galactic quadrants) in Fig.
11 is associated with the Sagittarius-Carina and Scutum-Centaurus arms, and is much more prominent than the others, containing numerous high-mass star forming regions. In contrast, the distribution of massive stars associated with the Perseus arm peaks mainly in the 2nd Galactic quadrant (towards Cassiopeia). This is consistent with the findings of Reid et al. (2019), and would point to the conclusion that the Perseus arm as traced by O and B-type stars is not a dominant arm, and might be dispersing into the field. This suggests that a recent, large-scale episode of (massive) star formation occurred in Sagittarius-Carina and Scutum-Centaurus, while in the other arms (Perseus and Local) stars formed earlier, except in a few isolated massive associations, such as Cygnus and Cassiopeia. To confirm this, we selected stars brighter than $M_{K_s} = -$[...] and $M_{K_s} = -$[...] (respectively the $K_s$ absolute magnitudes of B2V-type and B1V-type stars), and we evaluated their density on the plane following the procedure described in Sec. 5. The density maps obtained with these samples are shown in Fig. 17, where the map of Fig. 11 is also shown for comparison. The density contours in Fig. 17 (left) trace the same dense structures as in Fig. 11 (note that the contour levels are different), while the low-density contours substantially disappear. This is even more evident in Fig. 17 (right). This is expected, as intrinsically brighter stars are on average younger and thus have had less time to disperse into the field.

Drimmel (2000) and Drimmel & Spergel (2001) studied the spiral arm features of the Milky Way in the far-infrared and near-infrared and concluded that the near-infrared features were consistent with a two-armed spiral model, while the other two arms were traced only by the far-infrared emission and thus could perhaps be present only in gas and young stars. More recently, Xu et al. (2018) and Chen et al.
(2019) noted that the spiral arm structure traced by O and early B-type stars may have many sub-structures in addition to the four major arms. Thus, they concluded that the Milky Way might not be a pure grand design spiral galaxy with two or four well-defined, dominant arms. This might suggest that the Milky Way could exhibit characteristics of both flocculent and grand design arms, with grand design seen in the infrared (old stellar populations) and more flocculent (or multi-arm) features seen in the optical (young stars). As mentioned above, these characteristics have been observed in spiral galaxies, for instance by Kendall et al. (2011, 2015, and references therein), who concluded that galaxies that exhibit a grand design structure in the optical also exhibit such structure in the NIR, while there are optically flocculent galaxies which do exhibit NIR grand design structure. As pointed out by Dobbs & Baba (2014), Elmegreen et al. (1999) and Elmegreen et al. (2011) noted however that most flocculent galaxies do not exhibit grand-design structure and those that do have very weak spiral arms. Elmegreen et al. (2003) further suggested that both grand-design and flocculent spirals (as seen in the old stars) exhibit a similar structure in the gas and young stars (independent of the underlying old stellar population), which is driven by turbulence in the disc. The co-existence of different spiral arm patterns in the Milky Way disc might have different explanations. For example, Martos et al. (2004) concluded that secondary arms could be resonance-related features in the quasi-stationary density wave picture, while Steiman-Cameron et al. (2010) and Drimmel (2000) proposed that the traditional four spiral arms may represent the dynamical response of a gas disc to a two-arm spiral perturbation in the mass distribution.

Fig. 14. Same as Fig. 11, with dust density contours from Lallement et al. (2019) over-plotted. Both densities are displayed in arbitrary units. The dashed circles have radii from 1 to 5 kpc, in steps of 1 kpc.

As mentioned before, the density distribution presented in Fig. 11 exhibits a prominent density enhancement roughly corresponding to the Sagittarius-Carina and Scutum-Centaurus arms. This alone does not allow us to draw firm conclusions on the nature (or the number) of the spiral arms of the Milky Way. On the one hand, star formation occurs in a clumpy and patchy fashion: this might explain the observed density distribution without necessarily assuming a flocculent structure. On the other hand, the map shown in Fig. 15 might suggest a different picture, in which the Milky Way, as traced by young stars (≲

300 Myr), does not show a set of discrete and distinct spiral arms.

Fig. 15. Cepheids from Skowron et al. (2019), colour-coded by age (Myr). The density map is the same as in Fig. 11. The dashed circles have radii from 1 to 6 kpc, in steps of 1 kpc.

The spiral arm structure of the Milky Way can also be investigated by using stellar kinematics (see e.g. Antoja et al. 2016). Of course, a spiral mass density perturbation can have a very different morphology from spiral-like density enhancements in molecular gas or young stars. Eilers et al. (2020) studied the kinematics of red giant stars and showed a spiral feature in their radial velocities that they interpret as arising from a perturbation to the potential of the Milky Way caused by a two-armed logarithmic spiral. Their model predicts the locations of the spiral perturbation in the Milky Way, one roughly co-spatial with the Orion (or Local) arm, and the other roughly corresponding to the Outer arm. The location of the perturbation predicted by Eilers et al. (2020) does not closely follow the over-densities of the map in Fig. 11. The comparison between the radial velocities of the O and B-type star sample and the giant sample will be crucial to make progress in our understanding of the mechanism giving rise to the spiral signature presented by Eilers et al. (2020). The data of the SDSS-V survey will make such a comparison possible, and will also likely allow us to construct a surface density map of the giant sample which could be directly compared with the map presented in this work. The complete kinematic information that we will obtain by combining Gaia proper motions with SDSS-V radial velocities will also allow us to better separate stars belonging to different spiral arms, and to study their internal motions. Indeed, as already mentioned in Section 5, vertical velocities are not related to the perturbation in the disc kinematics induced by the spiral arms.

7. Conclusions

In this study we have analysed the 3D space distribution of a sample of hot and luminous OBA stars, focusing in particular on their configuration in the Galactic disc.

Our target selection is based on the combination of Gaia EDR3 astrometry and photometry and 2MASS photometry, and is aimed at providing a well-defined selection function. We describe the properties of the target sample in terms of distribution in the sky, brightness, and variability. We estimate the purity and contamination of the catalogue by comparing with existing O and B-type star catalogues, such as those by Liu et al. (2019), Sota et al. (2014) and Maíz Apellániz et al. (2016), and by deriving stellar parameters ($T_{\rm eff}$ and $\log g$) for stars in common with LAMOST DR6.

By assuming that young massive stars are on near-circular orbits with a small velocity dispersion, we compute astro-kinematic distances, and compare them to those derived by Bailer-Jones et al. (2020). We use these distances to study the distribution in the Galactic disc of a sub-set of the target sample, which we obtain by filtering out sources with spurious astrometric solutions and kinematic properties inconsistent with our model.

We find that the distribution of sources in the plane of the Milky Way is highly structured, characterised by over- and under-densities. Some of the density enhancements correspond to massive star forming regions, such as Carina, Cygnus, and Cassiopeia. With these associations, the inner arms (Sagittarius-Carina and Scutum-Centaurus) are strikingly more prominent in OBA stars than any outer arms, for which we find little evidence.

The distribution of O and B-type stars from previous catalogues, classical Cepheids, dust, and high-mass star forming regions shows similarities with our density distribution. However, the picture of the spiral arm structure of the Galaxy that we obtain in this study is complex, and it may suggest that young stars show little tendency to be neatly organised in distinct spiral features.

To assess different spiral arm models, it would be necessary to extend our map beyond the volume currently studied and, perhaps more importantly, to complement it with better Gaia data and spectroscopic information.

The target sample that we have devised is indeed optimised for spectroscopic follow-up with the SDSS-V survey. The information that we will be able to obtain with SDSS-V will be crucial for studying the kinematics and dynamics of the spiral arms, and thus the nature of the spiral arms themselves. Finally, the sample that we have devised will further allow us to study the properties of massive stars, for example in terms of multiplicity, internal structure, and (binary) evolution, in a statistical fashion.

Fig. 16. The masers (dots) and the fit to the spiral arms (solid lines) from Reid et al. (2019) are over-plotted on top of our density map (same as Fig. 11). The maser distances were computed by naively inverting the parallax. The dashed circles have radii from 1 to 6 kpc, in steps of 1 kpc. The shaded regions correspond to the arm width. Different colours correspond to different spiral arms: Outer, green; Perseus, grey; Local, pink; Sagittarius-Carina, purple; Scutum-Centaurus, orange.

Acknowledgements.

We thank the referee for their comments, which improved the quality of this manuscript. We would also like to thank: J. Rybizki and G. Green for making the catalogue of spurious sources available in advance of publication; A. Gould for discussions that prompted us to use kinematics for distance estimations; M. Sormani for discussions on the kinematics of the MW bar and bulge. This work has made use of data from the European Space Agency (ESA) mission Gaia, processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. This publication makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center / [...] Office (BELSPO) through PRODEX grant PLATO. This research made use of TOPCAT (Taylor 2005), Astropy (Astropy Collaboration et al. 2013, 2018), matplotlib (Hunter 2007), numpy (Harris et al. 2020), scipy (Virtanen et al. 2020), and scikit-learn (Pedregosa et al. 2011). This work would not have been possible without the countless hours put in by members of the open-source community all around the world.

Fig. 17. Left: same as Fig. 11. Centre: density of stars with $M_{K_s} < -$[...]. Right: density of stars with $M_{K_s} < -$[...]. The contour levels are different.

References

Antoja, T., Roca-Fàbrega, S., de Bruijne, J., & Prusti, T. 2016, A&A, 589, A13
Astraatmadja, T. L. & Bailer-Jones, C. A. L. 2016, ApJ, 832, 137
Astropy Collaboration, Price-Whelan, A. M., Sipőcz, B. M., et al. 2018, AJ, 156, 123
Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al. 2013, A&A, 558, A33
Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Demleitner, M., & Andrae, R. 2020, arXiv e-prints, arXiv:2012.05220
Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Mantelet, G., & Andrae, R. 2018, AJ, 156, 58
Belokurov, V., Erkal, D., Deason, A. J., et al. 2017, MNRAS, 466, 4711
Bouwens, R. J., Illingworth, G. D., Oesch, P. A., et al. 2011, ApJ, 737, 90
Bouy, H. & Alves, J. 2015, A&A, 584, A26
Bowen, I. S. & Vaughan, A. H., J. 1973, Appl. Opt., 12, 1430
Bressan, A., Marigo, P., Girardi, L., et al. 2012, MNRAS, 427, 127
Cantat-Gaudin, T., Anders, F., Castro-Ginard, A., et al. 2020, arXiv e-prints, arXiv:2004.07274
Chen, B. Q., Huang, Y., Hou, L. G., et al. 2019, MNRAS, 487, 1400
Chen, B. Q., Li, G. X., Yuan, H. B., et al. 2020, MNRAS, 493, 351
Churchwell, E., Babler, B. L., Meade, M. R., et al. 2009, PASP, 121, 213
de Mink, S. E., Langer, N., Izzard, R. G., Sana, H., & de Koter, A. 2013, ApJ, 764, 166
Deason, A. J., Belokurov, V., Erkal, D., Koposov, S. E., & Mackey, D. 2017, MNRAS, 467, 2636
Dobbs, C. & Baba, J. 2014, PASA, 31, e035
Douglas, L. S., Bremer, M. N., Lehnert, M. D., Stanway, E. R., & Milvang-Jensen, B. 2010, MNRAS, 409, 1155
Drimmel, R. 2000, A&A, 358, L13
Drimmel, R., Smart, R. L., & Lattanzi, M. G. 2000, A&A, 354, 67
Drimmel, R. & Spergel, D. N. 2001, ApJ, 556, 181
Eilers, A.-C., Hogg, D. W., Rix, H.-W., et al. 2020, arXiv e-prints, arXiv:2003.01132
Eilers, A.-C., Hogg, D. W., Rix, H.-W., & Ness, M. K. 2019, ApJ, 871, 120
Elmegreen, B. G., Elmegreen, D. M., & Leitner, S. N. 2003, ApJ, 590, 271
Elmegreen, D. M., Chromey, F. R., Bissell, B. A., & Corrado, K. 1999, AJ, 118, 2618
Elmegreen, D. M., Elmegreen, B. G., Yau, A., et al. 2011, ApJ, 737, 32
Frankel, N., Rix, H.-W., Ting, Y.-S., Ness, M., & Hogg, D. W. 2018, ApJ, 865, 96
Frankel, N., Sanders, J., Rix, H.-W., Ting, Y.-S., & Ness, M. 2019, ApJ, 884, 99
Gaia Collaboration, Brown, A. G. A., Vallenari, A., et al. 2020a, arXiv e-prints, arXiv:2012.01533
Gaia Collaboration, Prusti, T., de Bruijne, J. H. J., et al. 2016, A&A, 595, A1
Gaia Collaboration, Smart, R. L., Sarro, L. M., et al. 2020b, arXiv e-prints, arXiv:2012.02061
Grasha, K., Calzetti, D., Adamo, A., et al. 2019, MNRAS, 483, 4707
Grasha, K., Calzetti, D., Bittle, L., et al. 2018, MNRAS, 481, 1016
Gravity Collaboration, Abuter, R., Amorim, A., et al. 2018, A&A, 615, L15
Gunn, J. E., Siegmund, W. A., Mannery, E. J., et al. 2006, AJ, 131, 2332
Haiman, Z. & Loeb, A. 1997, ApJ, 483, 21
Harris, C. R., Millman, K. J., van der Walt, S. J., et al. 2020, Nature, 585, 357
Hopkins, P. F., Kereš, D., Oñorbe, J., et al. 2014, MNRAS, 445, 581
Hunter, J. D. 2007, Computing in Science Engineering, 9, 90
Iorio, G., Belokurov, V., Erkal, D., et al. 2018, MNRAS, 474, 2142
Kendall, S., Clarke, C., & Kennicutt, R. C. 2015, MNRAS, 446, 4155
Kendall, S., Kennicutt, R. C., & Clarke, C. 2011, MNRAS, 414, 538
Kobulnicky, H. A., Kiminki, D. C., Lundquist, M. J., et al. 2014, ApJS, 213, 34
Kollmeier, J. A., Zasowski, G., Rix, H.-W., et al. 2017, arXiv e-prints, arXiv:1711.03234
Lallement, R., Babusiaux, C., Vergely, J. L., et al. 2019, A&A, 625, A135
Langer, N., Schürmann, C., Stoll, K., et al. 2020, A&A, 638, A39
Lindegren, L., Bastian, U., Biermann, M., et al. 2020a, arXiv e-prints, arXiv:2012.01742
Lindegren, L., Klioner, S. A., Hernández, J., et al. 2020b, arXiv e-prints, arXiv:2012.03380
Liu, Z., Cui, W., Liu, C., et al. 2019, ApJS, 241, 32
Mac Low, M.-M. & Klessen, R. S. 2004, Reviews of Modern Physics, 76, 125
Maíz Apellániz, J., Sota, A., Arias, J. I., et al. 2016, ApJS, 224, 4
Martos, M., Hernandez, X., Yáñez, M., Moreno, E., & Pichardo, B. 2004, MNRAS, 350, L47
Moe, M. & Di Stefano, R. 2017, ApJS, 230, 15
O’Shaughnessy, R., Kim, C., Kalogera, V., & Belczynski, K. 2008, ApJ, 672, 479
Pecaut, M. J. & Mamajek, E. E. 2013, ApJS, 208, 9
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
Podsiadlowski, P., Joss, P. C., & Hsu, J. J. L. 1992, ApJ, 391, 246
Poggio, E., Drimmel, R., Andrae, R., et al. 2020, Nature Astronomy, 4, 590
Poggio, E., Drimmel, R., Lattanzi, M. G., et al. 2018, MNRAS, 481, L21
Reid, M. J., Menten, K. M., Brunthaler, A., et al. 2019, ApJ, 885, 131
Reid, M. J., Menten, K. M., Brunthaler, A., et al. 2014, ApJ, 783, 130
Riello, M., De Angeli, F., Evans, D. W., et al. 2020, arXiv e-prints, arXiv:2012.01916
Romero-Gómez, M., Mateu, C., Aguilar, L., Figueras, F., & Castro-Ginard, A. 2019, A&A, 627, A150
Rybizki, J., Green, G., Rix, H.-W., et al. 2021, arXiv e-prints, arXiv:2101.11641
Sana, H., de Mink, S. E., de Koter, A., et al. 2012, Science, 337, 444
Sana, H., Le Bouquin, J. B., Lacour, S., et al. 2014, ApJS, 215, 15
Schönrich, R., Binney, J., & Dehnen, W. 2010, MNRAS, 403, 1829
Skowron, D. M., Skowron, J., Mróz, P., et al. 2019, Science, 365, 478
Skrutskie, M. F., Cutri, R. M., Stiening, R., et al. 2006, AJ, 131, 1163
Smee, S. A., Gunn, J. E., Uomoto, A., et al. 2013, AJ, 146, 32
Sota, A., Maíz Apellániz, J., Morrell, N. I., et al. 2014, ApJS, 211, 10
Steiman-Cameron, T. Y., Wolfire, M., & Hollenbach, D. 2010, ApJ, 722, 1460
Tang, J., Bressan, A., Rosenfield, P., et al. 2014, MNRAS, 445, 4287
Taylor, M. B. 2005, in Astronomical Society of the Pacific Conference Series, Vol. 347, Astronomical Data Analysis Software and Systems XIV, ed. P. Shopbell, M. Britton, & R. Ebert, 29
Van Bever, J. & Vanbeveren, D. 2000, A&A, 358, 462
Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nature Methods, 17, 261
Wilson, J. C., Hearty, F. R., Skrutskie, M. F., et al. 2019, PASP, 131, 055001
Xu, Y., Hou, L.-G., & Wu, Y.-W. 2018, Research in Astronomy and Astrophysics, 18, 146
Zari, E., Hashemi, H., Brown, A. G. A., Jardine, K., & de Zeeuw, P. T. 2018, A&A, 620, A172


Appendix A: Query of the Gaia archive

Here we provide an example query for cross-matching 2MASS and Gaia EDR3 in the Gaia archive by using the cross-match with Gaia DR2.

SELECT edr3.*, xdr2.*, tm.*
FROM gaiaedr3.gaia_source AS edr3
-- link EDR3 sources to their DR2 counterparts
INNER JOIN gaiaedr3.dr2_neighbourhood AS xdr2
    ON edr3.source_id = xdr2.dr3_source_id
-- link DR2 sources to their best 2MASS neighbour
INNER JOIN gaiadr2.tmass_best_neighbour AS xtm
    ON xdr2.dr2_source_id = xtm.source_id
INNER JOIN gaiadr1.tmass_original_valid AS tm
    ON tm.tmass_oid = xtm.tmass_oid
WHERE xtm.angular_distance < 1.
    AND xdr2.angular_distance < 100.
    AND edr3.phot_g_mean_mag < 16.
    -- with the parallax in mas, this is M_Ks < 0 for d = 1/parallax
    AND edr3.parallax < power(10, (10 - tm.ks_m) / 5)

Appendix B: Astro-kinematic distances

In this Section we describe all the terms of Eq. 8, which we report here for convenience:

$$p(d \mid \mathbf{o}, m_{K_s}, \Theta_{\rm KM}, \Theta_{\rm SM}, \Theta_{\rm CMD}) \propto p(\mathbf{o} \mid \Theta_{\rm KM})\, p(d, m_{K_s} \mid l, b, \Theta_{\rm SM}, \Theta_{\rm CMD}). \tag{B.1}$$

The prior $p(d, m_{K_s} \mid l, b, \Theta_{\rm SM}, \Theta_{\rm CMD})$ can be written as:

$$p(d, m_{K_s} \mid l, b, \Theta_{\rm SM}, \Theta_{\rm CMD}) \propto p(d \mid l, b, \Theta_{\rm SM})\, f(d, m_{K_s} \mid \Theta_{\rm CMD}). \tag{B.2}$$

The term $p(d \mid l, b, \Theta_{\rm SM})$ represents the probability density function of observing a star in the direction $(l, b)$ at distance $d$ from the Sun according to our assumed model for the spatial distribution of stars in the Galaxy. The term $f(d, m_{K_s} \mid \Theta_{\rm CMD})$ specifies the fraction of stars that can be observed at a distance $d$ and magnitude $m_{K_s}$ given the distribution of stars in colour-magnitude space and our selection criteria. In the following Sections, we describe the different components of Eq. B.1.

Figure B.1 shows example pdfs for four randomly selected stars. The different components of the pdfs are represented with thin coloured lines, and the total pdf, $P(d \mid ...)$, is represented with a black solid line. Both stars in the top row have parallaxes with $\sigma_\varpi/\varpi <$ [...] $P(d \mid \mu_{l*}, \Theta_{\rm KM})$, which represents the probability of observing a star at distance $d$ given the observed proper motion along Galactic longitude and the kinematic model $\Theta_{\rm KM}$. $P(d \mid \mu_{l*}, \Theta_{\rm KM})$ shows bi-modality, with a primary maximum at $\sim$ [...] pdf shows traces of such bi-modality. In the right panel of the top row, instead, the parallax component dominates the distance determination. The star in the left panel of the bottom row has $\sigma_\varpi/\varpi <$ [...] $P(d \mid \mu_{l*}, \Theta_{\rm KM})$ and $P(d \mid \mu_b, \Theta_{\rm KM})$ peak at closer distances than the parallax term, and thus put a strong constraint on the final distance estimate. Finally, the star in the right panel of the bottom row has $\sigma_\varpi/\varpi \approx$ [...] pdf is however still quite narrow, as a result of the combination of the different terms.

Appendix B.1: Kinematic model

We assume that stars in our sample follow the rotation curve determined by Eilers et al. (2019),

$$v_c(R) = 229\ {\rm km\,s^{-1}} - 1.7\ {\rm km\,s^{-1}\,kpc^{-1}}\,(R - R_\odot), \tag{B.3}$$

with $R$ the Galactocentric radius and $R_\odot = 8.122$ kpc (Gravity Collaboration et al. 2018). The intrinsic velocity dispersions at the Sun's location in the components of the velocity in Galactic coordinates $U$, $V$, $W$ are $\sigma_U =$ [...] km/s, $\sigma_V =$ [...] km/s, and $\sigma_W =$ [...] km/s (Robin et al. 2003). These values are assumed to be constant over the disc.

In Cartesian Galactic coordinates, the pdf of the space velocity for one star in our sample is:

$$p(\mathbf{v} \mid \Theta_{\rm KM}) = \frac{1}{(2\pi)^{3/2} |S|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{v} - \mathbf{u})' S^{-1} (\mathbf{v} - \mathbf{u})\right), \tag{B.4}$$

the prime signifying the transpose of the vector. The quantity $\mathbf{u}$ takes into account Galactic rotation and Solar motion, and it can be written as:

$$\mathbf{u} = \mathbf{v}_c(\mathbf{x}_{\rm star}) - \mathbf{v}_c(\mathbf{x}_\odot) - \mathbf{v}_\odot, \tag{B.5}$$

where

$$\mathbf{v}_c(\mathbf{x}_{\rm star}) = v_c(r_{\rm star}) \cdot \left(\hat{\mathbf{x}}_{\rm star} \times \mathbf{e}_Z\right) \tag{B.6}$$

is the circular velocity in Galactocentric Cartesian coordinates of a star at a Galactocentric distance $r_{\rm star}$ and $\mathbf{x}_{\rm star} = [x_{\rm star}, y_{\rm star}, z_{\rm star}]$. The unit vectors $\hat{\mathbf{x}}_{\rm star}$ and $\mathbf{e}_Z$ are defined as $\hat{\mathbf{x}}_{\rm star} = \mathbf{x}_{\rm star}/|\mathbf{x}_{\rm star}|$ and $\mathbf{e}_Z = [0, 0, 1]$. At the position of the Sun, $\mathbf{x}_\odot$,

$$\mathbf{v}_c(\mathbf{x}_\odot) = v_c(r_\odot)\, \mathbf{e}_Y, \tag{B.7}$$

where $\mathbf{e}_Y = [0, 1, 0]$ is the unit vector in the Y direction in Galactic coordinates. We assumed the Sun's position with respect to the Galactic centre to be $\mathbf{x}_\odot = [-8.122, 0, [...]]$ kpc and its velocity to be $\mathbf{v}_\odot = [11.1, 12.24, 7.25]$ km/s (Schönrich et al. 2010). We write the velocity dispersion matrix $S$ as:

$$S = \begin{pmatrix} \sigma_U^2 & 0 & 0 \\ 0 & \sigma_V^2 & 0 \\ 0 & 0 & \sigma_W^2 \end{pmatrix}. \tag{B.8}$$

For each star the astrometric observables are the parallax $\varpi$, and the proper motion components in $l$ and $b$, $\mu_{l*} = \mu_l \cos b$ and $\mu_b$ respectively. These are collected in the array:

$$\mathbf{o} = \begin{pmatrix} \varpi - \varpi_0 \\ \mu_{l*} \\ \mu_b \end{pmatrix}, \tag{B.9}$$

where $\varpi_0$ is the systematic zero point offset of Gaia EDR3 parallaxes. There is not a unique $\varpi_0$ for all the Gaia EDR3 stars. Following Bailer-Jones et al. (2020), we applied the parallax zero-point correction derived by Lindegren et al. (2020a).

By assuming Gaussian errors, the pdf for the observables $\mathbf{o}$ given the true values $\tilde{\mathbf{o}}$ is then:

$$p(\mathbf{o} \mid \tilde{\mathbf{o}}) = \frac{1}{(2\pi)^{3/2} |C|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{o} - \tilde{\mathbf{o}})' C^{-1} (\mathbf{o} - \tilde{\mathbf{o}})\right). \tag{B.10}$$

The true values are defined as:

$$\tilde{\mathbf{o}} = \begin{pmatrix} 1/d \\ \mathbf{p}' \cdot \mathbf{v} / (d A) \\ \mathbf{q}' \cdot \mathbf{v} / (d A) \end{pmatrix}, \tag{B.11}$$

where $d$ is the true distance to the star and $A = 4.74047$ km yr s$^{-1}$. The two vectors $\mathbf{p}$ and $\mathbf{q}$ are the two components of the normal triad in longitude and latitude, and, as above, the prime signifies their transpose. The elements of the astrometric covariance matrix $C$ are:

$$C = \begin{pmatrix} \sigma_\varpi^2 & \rho_{\varpi\mu_{l*}} \sigma_\varpi \sigma_{\mu_{l*}} & \rho_{\varpi\mu_b} \sigma_\varpi \sigma_{\mu_b} \\ \rho_{\varpi\mu_{l*}} \sigma_\varpi \sigma_{\mu_{l*}} & \sigma_{\mu_{l*}}^2 & \rho_{\mu_{l*}\mu_b} \sigma_{\mu_{l*}} \sigma_{\mu_b} \\ \rho_{\varpi\mu_b} \sigma_\varpi \sigma_{\mu_b} & \rho_{\mu_{l*}\mu_b} \sigma_{\mu_{l*}} \sigma_{\mu_b} & \sigma_{\mu_b}^2 \end{pmatrix} \tag{B.12}$$

and are provided in the Gaia archive.

The joint pdf of the observables with the velocity is therefore:

$$p(\mathbf{o}, \mathbf{v} \mid d) = p(\mathbf{o} \mid \tilde{\mathbf{o}}(\mathbf{v}, d))\, p(\mathbf{v} \mid \Theta_{\rm KM}). \tag{B.13}$$

The pdf for the observables is obtained by marginalising over the velocity:

$$p(\mathbf{o} \mid \Theta_{\rm KM}) = \int_{-\infty}^{+\infty} d\mathbf{v}\; p(\mathbf{o} \mid \tilde{\mathbf{o}}(\mathbf{v}, d))\, p(\mathbf{v} \mid \Theta_{\rm KM}). \tag{B.14}$$

The integral can be solved analytically. Since the product of two normal pdfs is normal and the marginal density of a normal pdf is also normal, we can write:

$$p(\mathbf{o} \mid \Theta_{\rm KM}) = \frac{1}{(2\pi)^{3/2} |D|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{o} - \mathbf{c})' D^{-1} (\mathbf{o} - \mathbf{c})\right), \tag{B.15}$$

where

$$\mathbf{c} = \begin{pmatrix} 1/d \\ \mathbf{p}' \cdot \mathbf{u} / (d A) \\ \mathbf{q}' \cdot \mathbf{u} / (d A) \end{pmatrix} \tag{B.16}$$

and

$$D = C + \begin{pmatrix} 0 & 0 & 0 \\ 0 & \mathbf{p}' S \mathbf{p} & \mathbf{p}' S \mathbf{q} \\ 0 & \mathbf{q}' S \mathbf{p} & \mathbf{q}' S \mathbf{q} \end{pmatrix}. \tag{B.17}$$

Fig. B.1. Posterior pdfs for four stars in our sample (their Gaia EDR3 source_id is displayed on top of each panel). The thick black solid line represents $p(d \mid \mathbf{o}, m_{K_s}, \Theta_{\rm KM}, \Theta_{\rm SM}, \Theta_{\rm CMD})$. The coloured thin lines illustrate the different components of the pdfs. The correlation terms in the covariance matrix of Eq. B.12 are neglected here for illustrative purposes. The distributions are scaled so that they peak at unity.

Appendix B.2: Structural model

The probability of a star to be at a true distance $d$ is proportional to the stellar density $\rho$ predicted by our spatial model $\Theta_{\rm SM}$, so that we can write:

$$p(d \mid l, b, \Theta_{\rm SM}) \propto d^2 \rho(d \mid l, b, \Theta_{\rm SM}), \tag{B.18}$$

where the Jacobian term $d^2$ takes volume effects into account. For convenience we write our model in Galactocentric coordinates $(R, \phi, z)$, where the stellar density is modelled as an exponential disc:

$$\rho(R, \phi, z) = \rho_0 \exp\left(-\frac{R - R_\odot}{L}\right) \times \exp\left(-\frac{|z|}{h_z}\right). \tag{B.19}$$

$R$ and $z$ are the Galactocentric radius and height above the plane. The quantity $h_z = 0.15$ kpc is the disc scale height (cfr. Poggio et al. 2020), and $L =$ [...]
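The geometric ingredients of Appendix B.1 and B.2 can be sketched together along a single line of sight. The snippet below is a minimal illustration, not the code used for the paper: it evaluates the mean observables $\mathbf{c}$ of Eq. B.16 and the unnormalised structural prior of Eq. B.18 for one trial distance. The function names are hypothetical, the Sun is placed exactly in the Galactic plane (an assumption of this sketch), and the radial scale length is left as an argument because its value is not given above.

```python
import numpy as np

R_SUN = 8.122                           # kpc, Galactocentric radius of the Sun
A = 4.74047                             # km/s per (mas/yr * kpc)
V_SUN = np.array([11.1, 12.24, 7.25])   # km/s, Solar motion (U, V, W)
H_Z = 0.15                              # kpc, disc scale height


def rotation_curve(R):
    """Circular speed in km/s at Galactocentric radius R in kpc (Eq. B.3)."""
    return 229.0 - 1.7 * (R - R_SUN)


def circular_velocity(x):
    """Velocity vector of a circular orbit at position x (Eq. B.6).

    The height of x is ignored when building the unit vector, which is a
    simplification made in this sketch.
    """
    R = np.hypot(x[0], x[1])
    x_hat = np.array([x[0], x[1], 0.0]) / R
    e_z = np.array([0.0, 0.0, 1.0])
    return rotation_curve(R) * np.cross(x_hat, e_z)


def sightline_terms(l_deg, b_deg, d, scale_length):
    """Return (c, prior): Eq. B.16 and the unnormalised Eq. B.18 at distance d.

    d and scale_length are in kpc; c is in (mas, mas/yr, mas/yr).
    """
    l, b = np.radians(l_deg), np.radians(b_deg)
    x_sun = np.array([-R_SUN, 0.0, 0.0])   # assumption: Sun in the plane
    # Line-of-sight unit vector and the two tangential vectors p and q.
    los = np.array([np.cos(b) * np.cos(l), np.cos(b) * np.sin(l), np.sin(b)])
    p = np.array([-np.sin(l), np.cos(l), 0.0])
    q = np.array([-np.sin(b) * np.cos(l), -np.sin(b) * np.sin(l), np.cos(b)])
    x_star = x_sun + d * los
    # Mean heliocentric velocity of a star on a circular orbit (Eq. B.5).
    u = circular_velocity(x_star) - circular_velocity(x_sun) - V_SUN
    c = np.array([1.0 / d, p @ u / (d * A), q @ u / (d * A)])
    # Exponential disc (Eq. B.19) with rho_0 = 1, times the volume term d^2.
    R = np.hypot(x_star[0], x_star[1])
    rho = np.exp(-(R - R_SUN) / scale_length) * np.exp(-abs(x_star[2]) / H_Z)
    return c, d**2 * rho
```

Evaluating `sightline_terms` on a grid of distances gives the two distance-dependent pieces that enter Eq. B.1; a call such as `sightline_terms(0.0, 0.0, 1.0, scale_length=3.0)` uses a placeholder scale length chosen only for illustration.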

Fig. B.2. The effective luminosity function $f(d, m_{K_s} \mid \Theta_{\rm CMD})$ used to determine our prior, normalised to unity.

Appendix B.3: Luminosity function

We assume that the Milky Way has a universal luminosity function $\phi(M_{K_s}) = \phi(M_{K_s}(m_{K_s}, d))$, where $M_{K_s}$ is the absolute magnitude in the 2MASS $K_s$ band, and $M_{K_s} = m_{K_s} -$ [...]$(d) +$ [...]. The $K_s$ band is used to be able to neglect extinction, at least as a first approximation. Following the procedure outlined by Poggio et al. (2020), we construct the distribution of stars in colour-magnitude space $\phi(m_{K_s}, d, c)$ by assuming a constant star formation rate, the two-part power law Kroupa initial mass function corrected for unresolved binaries, and solar metallicity. The term $c$ indicates a colour (for instance, $G - K_s$). The distribution of stars in colour-magnitude space is then modified by setting to zero the parts of the distribution where $T_{\rm eff} <$ [...] effectively changes the term $S(c)$ of Eq. B.20 into a selection approximately based on effective temperature, $S(T_{\rm eff})$. We can thus write the effective luminosity function $f(d, m_{K_s} \mid \Theta_{\rm CMD})$, shown in Fig. B.2, as:

$$f(d, m_{K_s} \mid \Theta_{\rm CMD}) = \int \phi(m_{K_s}, d, c)\, S(c)\, dc, \tag{B.20}$$

where $S(c) = 1$ where $T_{\rm eff} >$ [...] and $S(c) = 0$ otherwise.

Appendix B.4: Distance uncertainties

Figure B.3 shows the fractional distance uncertainty for stars in the filtered sample (see Section 4). For distances smaller than $\sim$ [...] $\sigma_\varpi/\varpi < 0.1$, and the inverse parallax is a good distance estimate (see also Bailer-Jones et al. 2020). For distances larger than $\sim$ [...] $d_{\rm kin} <$ [...] $d_{\rm kin} < 10$ kpc.
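As an illustration of how a fractional uncertainty can be read off a single star's distance pdf, the sketch below computes percentiles of a pdf tabulated on a distance grid. It assumes that the uncertainty is summarised as half of the 16th to 84th percentile range divided by the median, which is one common convention and not necessarily the exact definition used for Fig. B.3. The function names are hypothetical.

```python
import numpy as np


def pdf_percentiles(d_grid, pdf, quantiles=(0.16, 0.50, 0.84)):
    """Distances at which the cumulative pdf reaches the given quantiles."""
    d_grid = np.asarray(d_grid, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    # Trapezoidal cumulative integral, starting from zero at d_grid[0].
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(d_grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]  # the pdf does not need to be normalised beforehand
    return np.interp(quantiles, cdf, d_grid)


def fractional_uncertainty(d_grid, pdf):
    """Half of the 16th-84th percentile range, divided by the median."""
    d16, d50, d84 = pdf_percentiles(d_grid, pdf)
    return 0.5 * (d84 - d16) / d50
```

For a narrow, nearly Gaussian pdf this quantity is close to the ratio of the standard deviation to the mean distance.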

Appendix B.5: Distance validation

To validate our distances, we identified the stars in our sample that belong to the three clusters IC 4756, NGC 2112, and King 21, respectively at 469 pc, 1090 pc, and 3005 pc, as determined by Cantat-Gaudin et al. (2020). Fig. B.4 shows the distance pdfs of such stars. As expected, the pdfs for stars at larger distances are broader than for those at closer distances. The distance estimates for all the cluster members are compatible within 2σ with the average distance value. The dispersion around the median distance is between 10 and 20 pc for IC 4756, around 65 pc for NGC 2112, and around 300 pc for King 21. This indicates that distance uncertainties are within 10% for $d \sim$ [...]

Fig. B.3. Fractional distance uncertainty $d_{{\rm kin},[...]} - d_{{\rm kin},[...]} / d_{\rm kin}$ for stars in the clean sample. The dashed light grey lines correspond to the 16th and 84th percentiles, the solid grey line corresponds to the median fractional error.

Fig. B.4. Single star distance pdfs for the members of the three clusters IC 4756 (black), NGC 2112 (blue), and King 21 (orange). The grey vertical lines represent the cluster distances (respectively 469 pc, 1090 pc, and 3005 pc) derived by Cantat-Gaudin et al. (2020).

Appendix C: Vertical velocity

As described in Drimmel et al. (2000), the vertical velocity $v_z$ of a star is computed by:

$$v_z = 4.74047\, \mu_b\, d / \cos(b) + W_\odot + v_r \sin(b), \tag{C.1}$$

where $b$ is Galactic latitude, $\mu_b$ is the proper motion along $b$, $d$ is the distance estimate, $W_\odot$ is the Sun's velocity along Z, and $v_r$ is the line-of-sight velocity. The majority of stars in our sample lack line-of-sight velocities, thus it is not possible to calculate their vertical velocity directly. However, for stars at low Galactic latitudes ($|b| <$ [...]$^\circ$), the term $v_r \sin(b)$ can be approximately computed by assigning to each star the velocity it would have if it followed exactly the Galactic rotation curve:

$$v_r \sin(b) \approx (S - S_\odot) \tan(b), \tag{C.2}$$

where $S_\odot = U_\odot \cos(l) + V_\odot \sin(l)$, with $l$ the Galactic longitude of a star, and $U_\odot$ and $V_\odot$ the components of the Sun's motion along X and Y respectively; $S = (v_\phi R_\odot / R - v_{\rm LSR}) \sin(l)$, with $R_\odot$ the distance of the Sun from the Galactic centre, $R$ a star's Galactocentric radius, $v_\phi$ the azimuthal velocity, and $v_{\rm LSR}$ the velocity of the local standard of rest. We assumed $(U, V, W)_\odot = (11.1, 12.24, 7.25)$ km/s from Schönrich et al. (2010), and the rotation curve derived by Eilers et al. (2019), reported in Eq. B.3.
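Equations C.1 and C.2 can be combined into a short function. The following is a minimal sketch under stated assumptions rather than the authors' implementation: the function name is hypothetical, the star is assumed to follow the rotation curve of Eq. B.3 exactly, and the in-plane term is grouped as $S = (v_\phi R_\odot/R - v_{\rm LSR}) \sin(l)$, the usual circular-rotation expression, with $v_{\rm LSR}$ taken as the circular speed at the Sun.

```python
import math

A = 4.74047                               # km/s per (mas/yr * kpc)
U_SUN, V_SUN, W_SUN = 11.1, 12.24, 7.25   # km/s, Solar motion
R_SUN = 8.122                             # kpc


def rotation_curve(R):
    """Circular speed in km/s at Galactocentric radius R in kpc (Eq. B.3)."""
    return 229.0 - 1.7 * (R - R_SUN)


def vertical_velocity(l_deg, b_deg, d, mu_b):
    """Vertical velocity in km/s (Eq. C.1 with the approximation of Eq. C.2).

    l_deg, b_deg: Galactic longitude and latitude in degrees
    d: distance in kpc; mu_b: proper motion in latitude in mas/yr
    """
    l, b = math.radians(l_deg), math.radians(b_deg)
    # Galactocentric cylindrical radius of the star, Sun at x = -R_SUN.
    x = -R_SUN + d * math.cos(b) * math.cos(l)
    y = d * math.cos(b) * math.sin(l)
    R = math.hypot(x, y)
    # In-plane line-of-sight terms for the star and for the Sun.
    s_star = (rotation_curve(R) * R_SUN / R - rotation_curve(R_SUN)) * math.sin(l)
    s_sun = U_SUN * math.cos(l) + V_SUN * math.sin(l)
    los_term = (s_star - s_sun) * math.tan(b)   # stands in for v_r sin(b)
    return A * mu_b * d / math.cos(b) + W_SUN + los_term
```

At $b = 0$ the rotation term vanishes and the result reduces to the proper-motion term plus $W_\odot$; the approximation is only meant for stars close to the Galactic plane, as stated above.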