[PDF] Substructure at High Speed II: The Local Escape Velocity and Milky Way Mass with Gaia DR2

Abstract

Measuring the escape velocity of the Milky Way is critical in obtaining the mass of the Milky Way, understanding the dark matter velocity distribution, and building the dark matter density profile. In Necib & Lin (2021), we introduced a strategy to robustly measure the escape velocity. Our approach takes into account the presence of kinematic substructures by modeling the tail of the stellar distribution with multiple components, including the stellar halo and the debris flow called the Gaia Sausage (Enceladus). In doing so, we can test the robustness of the escape velocity measurement for different definitions of the "tail" of the velocity distribution, and the consistency of the data with different underlying models. In this paper, we apply this method to the second data release of Gaia and find that a model with at least two components is preferred. Based on a fit with three bound components to account for the disk, relaxed halo, and the Gaia Sausage, we find the escape velocity of the Milky Way at the solar position to be v esc = 484.6 +17.8 −7.4 km/s. Assuming a Navarro-Frenck-White dark matter profile, and taken in conjunction with a recent measurement of the circular velocity at the solar position of v c =230±10 km/s, we find a Milky Way concentration of c 200 = 13.8 +6.0 −4.3 and a mass of M 200 = 7.0 +1.9 −1.2 × 10 11 M ⊙ , which is considerably lighter than previous measurements.

Full PDF

DDraft version February 5, 2021

Typeset using L A TEX twocolumn style in AASTeX63

Substructure at High Speed II: The Local Escape Velocity and Milky Way Mass with

Gaia

DR2

Lina Necib

1, 2, 3 and Tongyan Lin Walter Burke Institute for Theoretical Physics, California Institute of Technology, Pasadena, CA 91125, USA Center for Cosmology, Department of Physics and Astronomy, University of California, Irvine, CA 92697, USA Observatories of the Carnegie Institution for Science, 813 Santa Barbara St., Pasadena, CA 91101, USA Department of Physics, University of California San Diego, La Jolla, CA 92093, USA

ABSTRACTMeasuring the escape velocity of the Milky Way is critical in obtaining the mass of the MilkyWay, understanding the dark matter velocity distribution, and building the dark matter densityproﬁle. In Necib & Lin (2021), we introduced a strategy to robustly measure the escape velocity.Our approach takes into account the presence of kinematic substructures by modeling the tail of thestellar distribution with multiple components, including the stellar halo and the debris ﬂow called the

Gaia

Sausage (Enceladus). In doing so, we can test the robustness of the escape velocity measurementfor diﬀerent deﬁnitions of the “tail” of the velocity distribution, and the consistency of the data withdiﬀerent underlying models. In this paper, we apply this method to the second data release of

Gaia and ﬁnd that a model with at least two components is preferred. Based on a ﬁt with three boundcomponents to account for the disk, relaxed halo, and the

Gaia

Sausage, we ﬁnd the escape velocity ofthe Milky Way at the solar position to be v esc = 484 . +17 . − . km/s. Assuming a Navarro-Frenck-Whitedark matter proﬁle, and taken in conjunction with a recent measurement of the circular velocity atthe solar position of v c = 230 ±

10 km/s, we ﬁnd a Milky Way concentration of c = 13 . +6 . − . and amass of M = 7 . +1 . − . × M (cid:12) , which is considerably lighter than previous measurements. INTRODUCTIONSince the initial discovery of dark matter (DM)(Zwicky 1933), estimating the total mass and densityproﬁle of the Milky Way has been of crucial importance,providing a window into estimating the mass of the un-seen DM. Various methods have been suggested to tacklethis question, from modeling the density distributionsof the diﬀerent Galactic components (Caldwell & Os-triker 1981), to the study of the fastest moving stars(Alexander 1982), to ﬁtting the local escape velocity ofthe stars as a way to constrain the local gravitationalpotential (Leonard & Tremaine 1990), to more complexmethods that involve using large stream structures suchas the Sagittarius stream to constrain the Milky Waypotential at large distances (Gibbons et al. 2014; Dier-ickx & Loeb 2017).In recent years, a number of new phase-space struc-tures have been discovered, which speaks to the successof hierarchical structure formation (White & Rees 1978)and also suggests the need to re-examine methods to ex-tract the Milky Way mass. One of the many discoveriespioneered by

Gaia was the

Gaia

Sausage or

Gaia

Ence-ladus (Belokurov et al. 2018; Helmi et al. 2018), whichwe will refer to as the Sausage in the remainder of thispaper. The Sausage is the remnant of a merger that oc- curred 6 to 10 billion years ago between a galaxy with astellar mass of ∼ − M (cid:12) and the Milky Way (Myeonget al. 2018; Deason et al. 2018; Lancaster et al. 2018).This substructure is kinematically distinct from the stel-lar halo, with stars on extremely radial orbits (Deasonet al. 2018), and shifts the peak of the stellar speed dis-tribution to lower values compared to the Standard HaloModel (Necib et al. 2019a).In Necib & Lin (2021), we introduced a method toaccount for the presence of kinematic substructures inmeasurements of the escape velocity. Our work buildson the approach of Leonard & Tremaine (1990), whichmodeled the tail of the stellar speed distribution as f ( v ) ∝ ( v esc − v ) k , v min < v < v esc (1)where v esc is the escape velocity, k is the slope, and v min is an arbitrary speed above which we deﬁne the “tail”of the distribution. Many papers have used this for-mulation to infer the local escape velocity by ﬁtting forthe parameters v esc and k with various datasets and as-sumptions (Smith et al. 2007; Piﬄ et al. 2014; Monariet al. 2018; Deason et al. 2019; Koppelman & Helmi2020). These studies have found large correlations be-tween v esc and k , and subsequently large errors on theescape velocity measurements. In order to reduce the er- a r X i v : . [ a s t r o - ph . GA ] F e b ror bars associated with v esc , works such as Leonard &Tremaine (1990); Smith et al. (2007); Piﬄ et al. (2014);Monari et al. (2018); Deason et al. (2019) argue for im-posing prior ranges on k based on simulations. However,Grand et al. (2019) and Necib & Lin (2021) showed thatthis approach can lead to underestimates of the escapevelocity if the prior range on k is too low, or overesti-mates of v esc if the prior range on k is too high comparedto the Milky Way.In Necib & Lin (2021), hereafter Paper I, we devel-oped a strategy for measuring v esc that does not rely onartiﬁcial prior ranges, and that accounts for the pres-ence of substructure by including multiple bound com-ponents following Eq. (1). In particular, the choice of arather low v min = 300 km/s is standard in the literaturein order to increase statistics, but it is not clear thatthe speed distribution for v > v min can be described byonly one power-law component. For instance, in simu-lated halos with major mergers, the speed distributioncan deviate signiﬁcantly from the power-law form due tosubstructure (Grand et al. 2019). Since it is known thatthe Sausage contributes a large fraction of non-disk starsin the solar neighborhood (Necib et al. 2019a,b), thereis strong motivation to include multiple components inmodeling the speed distribution.To test the idea that v esc measurements might be bi-ased by kinematic substructures, in Paper I we gener-ated mock data that contains two components followingEq. (1), with a common v esc but diﬀerent k . We at-tributed these to a “relaxed” stellar halo component,which has a larger slope k ∼ k S = 1. These slopes arebased on the analytic arguments of Deason et al. (2019)for tracer populations with diﬀerent levels of velocityanisotropy. When v min is low, we found that a single-component ﬁt to the mock data tends to overestimate v esc and also give larger error bars on v esc , but as v min isincreased the single-component ﬁt will converge to thecorrect value. A key conclusion of Paper I is that a ro-bust inference of v esc requires testing ﬁt dependence on v min and on the number of components.In this work, we use the methods developed in Paper Ito estimate the escape velocity of the Milky Way at thesolar position. We perform the analysis on the seconddata release of Gaia ( Gaia

DR2) subset that includesline-of-sight velocity measurements and passes the qual-ity cuts of Sec. 2, as well as the subset of this sample withstars on retrograde motion. We test for the dependenceof v esc on both v min and number of components in orderto ensure that the model selection is self-consistent andthat results are robust to changes in data selection. Thispaper is organized as follows: we discuss the datasets used in Sec. 2, present the analysis and discuss best ﬁtresults in Sec. 3, and study implications for the mass ofthe Milky Way in Sec. 4. DATAWe use

Gaia

DR2 and focus on the subset of starswith radial velocity measurements (Gaia Collaborationet al. 2016, 2018) such that we can reconstruct the stellarspeed. This subset is already a rather local sample ofstars, but in order to restrict to a local measurementof the escape velocity, we implement a Galactocentricdistance cut of r GC ∈ [7 . , .

0] kpc.We deﬁne the distance measurement as the inverse ofthe measured parallax (1 /(cid:36) ). In order to make surethis assumption is valid, we also implement a cut on theparallax error, such that the fractional parallax error foreach star is less than 10%. We use Bovy (2011) to an-alytically transform the proper motions and the radialvelocities of

Gaia into Galactocentric Cartesian coor-dinates, using the Local Standard of Rest (

U, V, W ) =(11 . , . , .

25) km/s (Sch¨onrich et al. 2010). Weassume the position of the Sun in these coordinates is( − . , , .

02) kpc (Bennett & Bovy 2019).Of the stars passing the cuts above, the majority havefractional errors on the measured speed of (∆ v ) /v < v ) /v ≤

5% on the data sampleto ensure the data is of comparable quality as the mockstested in Paper I. For the entire stellar sample where thefractional speed error is ≤ v >

400 km/s.In previous works such as Monari et al. (2018); Dea-son et al. (2019), only retrograde stars were modeled inorder to avoid contamination from the disk. However,retrograde data also yields lower statistics, with only 442stars above 300 km/s and 66 above 400 km/s, makingit more diﬃcult to draw strong conclusions from sucha small sample. Instead, with the analysis pipeline ofPaper I, we can easily account for possible disk contam-ination by introducing an additional component in themodel. We will therefore consider two sets of analyses:with the

Gaia

DR2 subset as discussed above, whereone of the components in our ﬁt may be attributed tothe disk, and with the subset of retrograde stars only.As we will see below, in ﬁtting only the retrograde starswe are not able to obtain a robust estimate of v esc .

300 400 500 600 700 800 | ~v | [km/s] − − − − − − f ( | ~ v | ) [ k m / s ] − N =1862 v min =300 km/s DataHaloOutliersSum

300 400 500 600 700 800 | ~v | [km/s] − − − − − − f ( | ~ v | ) [ k m / s ] − N =1862 v min =300 km/s DataHaloOutliersSausageSum

300 400 500 600 700 800 | ~v | [km/s] − − − − − − f ( | ~ v | ) [ k m / s ] − N =1862 v min =300 km/s DataDiskOutliersHaloSausageSum

Figure 1.

Best ﬁt results obtained from the full

Gaia data with one (left), two (middle), and three (right) bound components.The bound components are labeled with decreasing k as Disk, Halo, and Sausage for convenience and in line with our expectationfor how diﬀerent components might behave, but the speciﬁc assignment cannot be determined from the ﬁt. The solid lines arethe best ﬁt distributions, while the shaded regions are the 68% containment regions obtained from the posteriors of the modelparameters. The single-component model does not give a good ﬁt to the data, while the results of the two- and three-componentﬁts are largely consistent with each other and ﬁt the data well.3. RESULTSWe ﬁrst brieﬂy summarize the analysis pipeline, whilethe complete details can be found in Paper I. We modelthe stellar speed distribution above v min with one, two,or three bound components, as well as an outlier com-ponent to account for unbound stars. Each of the boundcomponents has a common v esc while the outlier distri-bution can extend above v esc . Explicitly, the likelihoodfunction for a single star is L = f p out ( v obs | σ out )+(1 − f ) (cid:88) i f i p i ( v obs | v esc , k i ) , (2)where v obs is the observed speed, f i the fractional contri-bution of the i -th component to the bound stars, and p i is the distribution of the i -th bound component. To ob-tain p i , we consider Eq. (1) with v esc and k i and convolveit with the measurement error for that star. The boundcomponents are ordered such that k i > k i +1 when thereare multiple components. To describe unbound out-lier stars, we assume p out is a Gaussian function witha dispersion of σ out and that f is the outlier fraction.Although the three component ﬁt was not discussed inPaper I, it is a natural extension to include another com-ponent with the same v esc but a third slope k and anassociated fraction f . For the total likelihood, we sumover all stars in the dataset.We used the Markov Chain Monte Carlo emcee (Foreman-Mackey et al. 2013) to ﬁnd the best ﬁt pa-rameters, using 200 walkers, 500 steps for the burninstage, and 2000 steps for each run. We assume linearpriors in k i and f i and log priors in f, σ out . We take atheory prior that is uniform in 1 /v esc in order to favorlower v esc ; note that for our main results, we ﬁnd thatthe v esc posteriors are sharply peaked, such that thisprior has a negligible eﬀect on the results. 3.1. All Data

We begin with the entire dataset satisfying the qual-ity cuts of Sec. 2 and with v min = 300 km/s. Becauseof the possibility of disk contamination, we consider upto three bound components. We will refer to the com-ponents in order of decreasing k as the Disk, Halo, andSausage for convenience, but note that in this analysiswe cannot empirically determine a physical origin foreach of the bound components.Fig. 1 shows the best ﬁt distributions obtained withone, two, or three bound components. First, we see thatthe single component ﬁt (left panel) is not suﬃcient todescribe the data. This is due to the fact that the ﬁtfor the bound component is anchored at the low end | (cid:126)v | ∼

300 km/s, where there is a steep slope or high k . This leads to an underestimate of the number ofstars at v ∼

400 km/s, which is partially compensatedfor by increasing the outlier fraction, ultimately leadingto an overestimate of the number of stars at | (cid:126)v | > v esc towards larger values. In-deed, for this ﬁt we found v esc = 735 . +50 . − . km/s.Meanwhile, we see that the two and three compo-nent models do provide good ﬁts to the data and con-sistent results. The two function ﬁt yields v esc =519 . +17 . − . km/s and the three function ﬁt yields v esc =484 . +17 . − . km/s, which are consistent within two stan-dard deviations. In both ﬁts, we expect that the ﬁrstcomponent, or the highest k component, will correspondto any disk contamination which drops steeply in v . In-deed, we ﬁnd that the result for k = k pushes up againstour default prior of k i ∈ [0 . ,

15] on the upper end; in-creasing the prior range to k i ∈ [0 . ,

300 325 350 375 400 v min [km/s]4005006007008009001000 v e s c [ k m / s ] N =1862 N =769 N =425 N =250 N =161 Gaia Data

Err=5.0%

Single FunctionTwo Functions Three Functions

Figure 2.

Posteriors in the escape velocity for diﬀerent values of v min , from ﬁtting the dataset satisfying quality cuts andfractional speed error ≤ v min for v min <

375 km/s,suggesting the need for an additional component in the ﬁt. The posterior distributions for the two-component ﬁt (shaded red)are much more stable with v min . For v min = 300 km/s only, we show the posterior from a three-component ﬁt (shaded greenand dashed). The labels N indicate the number of stars in each sample. sulting slope for this disk component is k = 18 . +1 . − . and k = 18 . +0 . − . for the two and three function ﬁts,respectively. While it is expected that the disk distri-bution drops sharply for | (cid:126)v | (cid:38)

300 km/s, it might beconcerning to have a ﬁt pushing against the prior here.Therefore, we next turn to the dependence of our resultson v min . Studying the ﬁt dependence with increasing v min will provide additional tests of robustness as wellas further eliminate the disk contamination.Our analysis is repeated for v min ∈ [300 , , , , v esc posterior which drifts with v min suggests thatthe model is missing some important component. InFig. 2, we show the posterior distribution of the es-cape velocity for the single and two component ﬁts fordiﬀerent values of v min . As v min increases, v esc in thesingle-component ﬁt drifts towards lower values, con-verging towards similar results for v min = 375 km/s and v min = 400 km/s. This indicates that additional com-ponents are needed to describe the data below | (cid:126)v | ∼ v min ≥

375 km/s. Indeed, the posterior distributionsfor the two-component ﬁt are much more stable with v min and consistent with the single component ﬁt for v min ≥

375 km/s. However, for v min = 300 km/s the2-component posterior is peaked at higher v esc values,suggesting that a third component might be present.Thus, we also show the posterior for the 3-component

300 320 340 360 380 400 v min [km/s] − − − − − A I C Gaia Data

Err=5.0%

AIC - AIC AIC - AIC AIC - AIC Figure 3.

We use the ∆AIC to compare goodness-of-ﬁt be-tween models with diﬀerent numbers of bound components.The plot shows results using the full

Gaia dataset, and thesubscript indicates number of components in the ﬁt. When∆AIC is negative, the ﬁrst model is preferred. ﬁt and v min = 300 km/s. This result is consistentwith the posteriors for single and two-component ﬁtsat larger v min . (We do not show the posteriors for the3-component ﬁt and v min ≥

325 km/s because thoseare largely unconstrained, indicating that we have toomany degenerate parameters.)In order to compare goodness-of-ﬁt with diﬀerentnumbers of bound components, we calculate the AIC(Akaike 1974) as discussed in Paper I. The AIC is de-ﬁned as AIC = 2 s − L ) where s is the numberof model parameters and log( ˆ L ) is the maximum loglikelihood. To compare models, we compute the diﬀer-ence ∆AIC ab = AIC a − AIC b between ﬁts with a or b bound components, where the model with the lowestAIC is preferred. We show the resulting ∆AIC in Fig. 3,where we compare the model with one, two, and threecomponents. For v min = 300 km/s, where we expecta large disk contamination, we ﬁnd that the two andthree component models are strongly preferred over asingle function case. This is visible from the ﬁrst panelof Fig. 1, where a single bound component clearly doesnot describe the data as well. As expected however, thesingle function is preferred for v min ≥

375 km/s. Thisis because at high | (cid:126)v | the distribution is dominated bya single distribution, and extra model parameters arepenalized. It is interesting to see that the three functionﬁt is marginally better than the two function ﬁt onlyfor v min = 300 km/s, beyond which it leads to an over-ﬁt. This implies that the data is determined by threecomponents for v min = 300 km/s, but that the disk islargely subdominant past v min = 325 km/s.From this discussion, we see a consistent story emerge:there is a sharply peaked component of stars near | (cid:126)v | ∼

300 km/s, along with two other bound components thatdominate at higher speeds. A single component domi-nates the tail for | (cid:126)v | >

375 km/s. These could already beseen in the three-component ﬁt shown in Fig. 1, but isfurther supported by studying ﬁt dependence with v min and the ∆AIC. We conclude that a single-componentmodel is not a self-consistent description of the data forlow v min . Our analysis allows us to identify the “tails”where a single or two-component ﬁt is actually robust.In Sec. 3.3, we will summarize the key results for v esc and also compare with previous studies.It is important to note that our framework only showsthat this data is better modeled by multiple componentswith power law distributions. It cannot show that eachcomponent has a distinct physical origin. Doing so re-quires further studies. Nevertheless, the behavior of thediﬀerent components appears to be consistent with con-tributions from the Disk, relaxed Halo component, andSausage, and here we summarize the ﬁt results for thosecomponents. Additional results can be found in App. B.As mentioned in Paper I, we do not have a priori in-formation about which component is the Halo and whichis the Sausage; the labels of Fig. 1 are based on the as-sumption that the Sausage has the lowest slope k . Work-ing with that assumption, we ﬁnd that the posterior onthe slope of the Sausage is k S = 0 . +0 . − . from the threefunction ﬁt at v min = 300 km/s. This is similar to the

300 400 500 600 700 800 | ~v | [km/s] − − − − − − f ( | ~ v | ) [ k m / s ] − N =442 v min =300 km/s DataHaloOutliersSausageSum

Figure 4.

Two function ﬁt only to the retrograde stars sat-isfying quality cuts and with fractional speed errors cappedat 5%. The bound components and outlier population aremuch more poorly constrained than the ﬁts to all stars, andwe do not obtain a robust v esc result. range of slopes in simulations with Sausage-like mergers, k S ∈ [1 , . . +0 . − . and 0 . +0 . − . , respectively. Thisleads to the Sausage component being 0 . +0 . − . of thesenon-disk stars for | (cid:126)v | >

300 km/s. Using the valuesof the best ﬁt slopes, we ﬁnd that the Sausage compo-nent rises to 0 . +0 . − . of non-disk stars for | (cid:126)v | > v min = 350 km/s, f S = 0 . +0 . − . . These fractional contributions are im-portant in understanding the composition of the DM dis-tribution and impacts predicted signals in DM searches,which was studied in Necib et al. (2018, 2019c).3.2. Retrograde Data

We next perform the same analysis with retrogradestars only. Previous studies have applied this cut onthe data to avoid disk contamination. Similarly, herewe consider a ﬁt with two bound components since wedo not expect any disk contribution for v min = 300km/s. We show the resulting ﬁt in Fig. 4, where wesee that the 68% containment region (shaded bands) forall the components is much larger compared to the ﬁtto all stars in Fig. 1. It is expected that the uncer-tainties on the components should be larger given thatthe statistics is about 2.5 times lower, but the spreadin the bands here reﬂects more than that. Focusing onthe tail of the bound components, we see that there issigniﬁcantly more spread in v esc and in fact the ﬁt gives v esc = 710 . +216 . − . km/s. Dataset Functions v min v esc M , tot [km/s] [km/s] [10 M (cid:12) ]All data 3 300 484 . +17 . − . . +1 . − . All data 2 350 498 . +63 . − . . +2 . − . All data 1 375 515 . +113 . − . . +3 . − . Table 1.

Best ﬁt values of the escape velocity at the solarposition, as well as the total mass of the Milky Way obtainedthrough the assumptions of Sec. 4.

The reason for the much larger uncertainty in v esc hereis due to the diﬃculty in constraining the outlier pop-ulation. In particular, with the dataset in Fig. 4, it isdiﬃcult to distinguish whether the stars with speeds of500-600 km/s are likely to come from the outlier com-ponent or the bound components. We can further un-derstand this outlier confusion by looking at the escapevelocity posterior, which is shown in Fig. 5. Here wesee that the posterior for the 2-component ﬁt to retro-grade stars is actually double-peaked. One peak is at low v esc , at values roughly consistent with ﬁts to all stars.Meanwhile, the peak at high v esc is correlated with alower outlier fraction, where those high speed stars near | (cid:126)v | ∼

600 km/s can also partly be modeled with thebound components.As a result, taking v esc = 710 . +216 . − . km/s for ret-rograde stars is not a robust result. In App. C, weshow the ﬁt to retrograde stars for diﬀerent v min andsee similar results with non-convergent ﬁt results anddouble-peaked posteriors. This is expected since theoutlier confusion cannot be removed with larger v min .In fact, the larger number of high-speed stars in the fulldata sample helps pin down the outlier component andthus also v esc of the bound components. With limitedstatistics in the retrograde sample, modeling the tail ofa distribution is particularly sensitive to shot noise inthe outliers. Furthermore, applying a cut for retrogrademotion might be shaping the sample such that it is notkinematically complete. The possible presence of addi-tional kinematic structures was studied in Paper I by theintroduction of small peaked structures, and we foundthat it could strongly bias the results. Given these fac-tors, we will consider the ﬁt to all of the data as ourmain result.3.3. Comparison with previous results

The main results for v esc are summarized in Fig. 5and Table 1. For these ﬁts, we selected the lowest v min that was consistent with the number of bound compo-nents in the ﬁt. This was determined by the lowest v min ∈ [300 , , , ,

400 600 800 1000 v esc [km/s]0 . . . . a . u . v min = 300 km/s2 Func, v min = 350 km/s1 Func, v min = 375 km/sRetrograde, v min = 300 km/s Figure 5.

Posteriors for v esc from ﬁts to all data with 1, 2, or3 bound components, and from a ﬁt to retrograde data with2 bound components. Full data posterior distributions havebeen normalized to unity, while the retrograde data has beenmultiplied by a factor of 6 to clarify the non-convergence ofthe posterior. For each of ﬁts to all data, we selected thelowest v min valid for that number of components. For theretrograde data, we do not obtain a convergent result. see that the results for 1, 2, or 3 bound components arealso all consistent with each other.Previous studies all used a single function of the formof Eq. (1). To deal with a large degeneracy in v esc and k ,they typically impose an artiﬁcial prior on the slope k .Piﬄ et al. (2014); Monari et al. (2018) adopt a prior of k ∈ [2 . , . k ∈ [1 , .

5] given that cos-mological simulations with merger events similar to the

Gaia

Sausage had slopes in that range. These priorsstrongly aﬀect the measured escape velocity, giving riseto nonconvergent results as can be seen in the cornerplot of Deason et al. (2019).As discussed in Paper I, the use of a single functionwhere the ﬁt requires two leads to an averaging of theslopes k . For example, in v min = 350 km/s where thetwo function ﬁt suﬃces, we ﬁnd two components with k = 12 . +4 . − . and k S = 1 . +1 . − . , while a single func-tion ﬁt yields k = 8 . +4 . − . . One can see in this casethat although k S is indeed within the range suggestedby Deason et al. (2019), using a single function ﬁt anda limited prior in k can artiﬁcially inﬂate the escapevelocity value.Given this, it is not surprising that our results for v esc are lower than previous studies. Deason et al. (2019)ﬁnds v esc = 528 +24 − km/s with a prior range of k ∈ [1 , . v esc = 580 ± k ∈ [2 . , . k , they ﬁnd v esc = 497 +53 − km/s where the quoted errors are the99% conﬁdence intervals. Using a larger sample of starswith only tangential velocities, they report a lower limitof v esc = 497 +40 − km/s at the solar radius, again givingthe 99% conﬁdence interval. MASS OF THE MILKY WAYThe escape velocity at a certain radius is related tothe gravitational potential Φ. Therefore, measurementof the escape velocity can translate to a measurement ofthe mass of the Milky Way, once proﬁles of the diﬀer-ent mass components are assumed. For an isolated halo v esc ( r ) = (cid:112) | Φ( r ) | , but in practice we must select a lim-iting radius beyond which stars can become unbound. Amore realistic assumption is that stars are bound withina few times r , where r is the radius at which thegalaxy’s mass is 200 times the critical mass of the uni-verse. For ease of comparison, we will adopt the samedeﬁnition as Deason et al. (2019) by taking the limitingradius as 2 r , with v esc ( r (cid:12) ) = (cid:112) | Φ( r (cid:12) ) − Φ(2 r ) | .To recover the mass of the Milky Way, we make thefollowing assumptions on the diﬀerent baryonic and DMcomponents, similarly to Deason et al. (2019), whichassumes model I of Pouliasis et al. (2017): • The bulge is modeled as a Plummer proﬁle (Plum-mer 1911) of mass M bulge = 1 . × M (cid:12) anda scale radius b = 0 . • The thin disk is modeled as a Miyamoto-Nagai proﬁle (Miyamoto & Nagai 1975) of mass M thin disk = 3 . × M (cid:12) , a scale radius r thin disk = 5 . z thin disk = 0 .

25 kpc. • The thick disk is modeled as Miyamoto-Nagai pro-ﬁle of mass M thick disk = 3 . × M (cid:12) , a scaleradius r thick disk = 2 . z thick disk = 0 . • The Dark Matter proﬁle is modeled as an Navarro-Frenk-White (NFW) (Navarro et al. 1996) proﬁleof mass M and concentration parameter c ,which we will ﬁt for. We take the Hubble con-stant H = 70 km s − Mpc − , the matter abun-dance Ω M = 0 . . . . . . ( M ) [ M (cid:12) ]51015202530 c v esc ( r (cid:12) ) v circ ( r (cid:12) ) Figure 6.

Mass and concentration of the DM halo, con-strained by the circular velocity measurement (blue) fromEilers et al. (2019), and by the escape velocity (red) studiedin this work. For v esc we take the posterior from the three-function ﬁt and v min = 300 km/s. The contours are 68% and95% CL, and the black lines show the combined constraints. To translate the posterior distribution of v esc ,marginalized over all other parameters, into a posteriorin the enclosed mass-concentration M − c space,we use galpy (Bovy 2015) to compute the escape ve-locity of the summed potentials assumed above in a gridof M and c . We then plot the PDF of each point in M and c using the interpolated version of the es-cape velocity posterior distribution, and show in Fig. 6the 68% and 95% containment regions based on v esc .Similarly, we overlay constraints based on the circularvelocity v circ = 230 ±

10 km/s from Eilers et al. (2019).This is important as the escape velocity gives informa-tion on the mass of the Milky Way at large distances,while the circular velocity constrains it within the solarcircle.We now compare our ﬁndings to the literature. Wedeﬁne the total mass of the Milky Way as the mass con-tained within r of the DM halo, in addition to thebaryons, including the bulge, thin, and thick disk as de-scribed above. In Fig. 7, we show the results for the total mass based on the four analyses summarized in Sec. 3.3.For analyses with higher v min , the errors on the escapevelocity increases, as explicitly shown in Table 1. Thisleads to an increase in the error estimate of the mass ofthe Milky Way.For reference, we also show the masses found byMonari et al. (2018); Deason et al. (2019); Koppelman& Helmi (2020). We note there is a small diﬀerence inconvention, where Monari et al. (2018) and Koppelman& Helmi (2020) deﬁne the escape velocity cutoﬀ as 3 r and not 2 r as we do, although we do not expect more v e s c [ k m / s ] F un c , v m i n = k m / s F un c , v m i n = k m / s F un c , v m i n = k m / s R e t r og r a d e , v m i n = k m / s M o n a r i + ( ) D e a s o n + ( ) K o pp e l m a n + ( ) . . . . . . . . l og ( M , t o t ) [ M (cid:12) ] F un c , v m i n = k m / s F un c , v m i n = k m / s F un c , v m i n = k m / s R e t r og r a d e , v m i n = k m / s M o n a r i + ( ) D e a s o n + ( ) K o pp e l m a n + ( ) Figure 7.

Best ﬁt values of the escape velocity (top) andtotal mass (bottom) of the Milky Way for the diﬀerent anal-yses discussed in Sec. 3. The analyses of the full data is themost robust, and tends to be lower than previous measure-ments; it is within a single standard deviation of Deason et al.(2019) and Koppelman & Helmi (2020) and lower than mea-surements by Monari et al. (2018). The result based on theretrograde data is shown for completeness, but that analysisdid not give a robust v esc (see Fig. 4). We show the escapevelocity measurement of Koppelman & Helmi (2020) wherewe divide their errors by 3, as they quote 99% conﬁdenceintervals while we quote 68%. For the mass measurement,Koppelman & Helmi (2020) multiply the escape velocity es-timate by 10%, motivated by the ﬁndings of (Grand et al.2019) for potential biases. The mass plotted above is the onecorrected with this factor. than a few percent diﬀerence due to this eﬀect. We ﬁndthat our analysis consistently shows a lower mass of theMilky Way than previous studies, although it is within asingle standard deviation of the result from Deason et al.(2019). Note that we include the result from the retro-grade data for completeness, and to show that even thenon-convergent result for v esc still translates to a some-what consistent total mass.As can be seen in Fig. 6, the circular velocity mea-surement plays an important role in constraining themass of the Milky Way. For example, the current un-certainties on both the circular and escape velocities in- troduce a ∼

20% error on the mass estimate. Drop-ping the error on the circular velocity from 10 km/s to5 km/s would reduce this error down to ∼ M , tot = 6 . +1 . − . × M (cid:12) , or a ∼

10% diﬀerencefrom the value we quote above. A better understandingof the baryonic model of the Milky Way as well as amore accurate measurement of the circular velocity aretherefore important in improving the total Milky Waymass estimates. Similarly, using a contracted NFW pro-ﬁle due to the presence of baryons (Schaller et al. 2015)would also change the total mass estimate substantially;Piﬄ et al. (2014) used both the regular and contractedNFW proﬁles and found a diﬀerence of ∼ CONCLUSIONSIn this paper, we applied a new analysis pipeline forconstraining v esc that accounts for the presence of kine-matic substructure in the stellar speed distribution. Ourwork is motivated by the discovery of the Gaia

Sausage-Enceladus, as well as by the need to improve the ro-bustness of v esc ﬁts, which can be very sensitive to thedeﬁnition of the “tail” of the velocity distribution. Weintroduce a forward model that allows for multiple powerlaw components in the tail of the stellar velocity distri-bution, and showed that repeating the analysis for v min and number of components allows us to robustly deter-mine v esc . Details of this pipeline and examples withmock datasets can be found in Paper I.We found that at least two bound components is pre-ferred in ﬁts to the Gaia data for stars with speeds v >

325 km/s. These components are suggestive of aSausage-Enceladus component and a relaxed stellar halocomponent, but more study is needed to understand ifthe components truly have diﬀerent physical origin or ifa multi-component model simply provides a more ﬂex-ible ﬁtting framework. With this multi-component ﬁt,our result for v esc is lower than previous measurementsin the literature. On the other hand, previous worksassumed a single power law component for the tail, andwe have shown that assuming a single component wherethe data prefers more can lead to systematically higher v esc values.Using our results for v esc , we determined the totalmass of the Milky Way, ﬁnding a value of the concen-tration of c = 13 . +6 . − . and a total mass of M =7 . +1 . − . × M (cid:12) . Our result for the total mass is lowerthan those of previous studies relying on the escape ve-locity to obtain the mass of the Galaxy. However, theyare more consistent with methods based on the distri-bution function of globular clusters (e.g. Eadie & Juri´c(2019)) and matching satellites of the Milky Way withtheir simulation counterparts (e.g. Patel et al. (2018)).Wang et al. (2020) provides a review of all these meth-ods, which shows a large scatter in the mass estimatesranging from ∼ (0 . − × M (cid:12) . Our method providesthe most robust measurement relying on the escape ve-locity at the location of the Sun.Along with a better understanding of the baryoniccomponents of the Milky Way, the dark matter proﬁle,and the local circular velocity, other eﬀects are impor-tant to evaluate in order to improve Milky Way massestimates. In particular, many of the existing estimatesassume a relaxed equilibrated halo between the locationof the Sun and the edge of the Galaxy. The presence ofsatellites, streams, and the evidence of the active mergerhistory of the Milky Way would suggest otherwise. It istherefore crucial to pair our pipeline, applied at diﬀer-ent distances from the Galactic center, with a betterunderstanding of the halo at larger radii. A combinedapproach will help build a complete and coherent pictureof the Milky Way potential, and constrain the shape ofthe dark matter halo. ACKNOWLEDGEMENTSWe are grateful to I. Moult for early discussions andcollaboration on the project, and to M. Lisanti for help-ful feedback. We would also like to thank L. Anderson,A. Bonaca, G. Collin, A. Deason, P. Hopkins, A. Ji, andJ. Johnson for helpful conversations.This work was performed in part at Aspen Center forPhysics, which is supported by National Science Founda-tion grant PHY-1607611. This research used resourcesof the National Energy Research Scientiﬁc ComputingCenter (NERSC), a U.S. Department of Energy Oﬃce ofScience User Facility operated under Contract No. DE-AC02-05CH11231. LN is supported by the DOE underAward Number DESC0011632, the Sherman Fairchildfellowship, the University of California Presidential fel-lowship, and the fellowship of theoretical astrophysics atCarnegie Observatories. TL is supported by an AlfredP. Sloan Research Fellowship and Department of Energy(DOE) grant DE-SC0019195.This work has made use of data from the Euro-pean Space Agency (ESA) mission Gaia

Gaia

Multilateral Agreement.REFERENCES

Ade, P., et al. 2016, Astron. Astrophys., 594, A13Akaike, H. 1974, IEEE Transactions on Automatic Control,19, 716Alexander, J. B. 1982, MNRAS, 201, 579Belokurov, V., Erkal, D., Evans, N. W., Koposov, S. E., &Deason, A. J. 2018, MNRAS, 478, 611Bennett, M., & Bovy, J. 2019, MNRAS, 482, 1417Bovy, J. 2011, PhD thesis, copyright - Database copyrightProQuest LLC; ProQuest does not claim copyright in theindividual underlying works; Last updated - 2016-06-04Bovy, J. 2015, The Astrophysical Journal SupplementSeries, 216, 29Caldwell, J. A. R., & Ostriker, J. P. 1981, ApJ, 251, 61Deason, A. J., Belokurov, V., Koposov, S. E., & Lancaster,L. 2018, ApJ, 862, L1Deason, A. J., Fattahi, A., Belokurov, V., et al. 2019,MNRAS, 485, 3514Dierickx, M. I. P., & Loeb, A. 2017, ApJ, 847, 42Eadie, G., & Juri´c, M. 2019, ApJ, 875, 159Eilers, A.-C., Hogg, D. W., Rix, H.-W., & Ness, M. K.2019, ApJ, 871, 120 Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman,J. 2013, PASP, 125, 306Gaia Collaboration, Brown, A. G. A., Vallenari, A., et al.2018, ArXiv e-prints, arXiv:1804.09365Gaia Collaboration, Prusti, T., de Bruijne, J. H. J., et al.2016, A&A, 595, A1Gibbons, S. L. J., Belokurov, V., & Evans, N. W. 2014,MNRAS, 445, 3788Grand, R. J. J., Deason, A. J., White, S. D. M., et al. 2019,MNRAS, 487, L72Helmi, A., Babusiaux, C., Koppelman, H. H., et al. 2018,Nature, 563, 85Koppelman, H. H., & Helmi, A. 2020, arXiv e-prints,arXiv:2006.16283Lancaster, L., Koposov, S. E., Belokurov, V., Evans, N. W.,& Deason, A. J. 2018, ArXiv e-prints, arXiv:1807.04290Leonard, P. J. T., & Tremaine, S. 1990, ApJ, 353, 486Miyamoto, M., & Nagai, R. 1975, PASJ, 27, 533Monari, G., Famaey, B., Carrillo, I., et al. 2018, A&A, 616,L9 Myeong, G. C., Evans, N. W., Belokurov, V., Sand ers,J. L., & Koposov, S. E. 2018, ApJ, 863, L28Navarro, J. F., Frenk, C. S., & White, S. D. M. 1996, ApJ,462, 563Necib, L., & Lin, T. 2021Necib, L., Lisanti, M., & Belokurov, V. 2019a, ApJ, 874, 3Necib, L., Lisanti, M., Garrison-Kimmel, S., et al. 2018,arXiv e-prints, arXiv:1810.12301Necib, L., Ostdiek, B., Lisanti, M., et al. 2019b, arXive-prints, arXiv:1907.07681—. 2019c, arXiv e-prints, arXiv:1907.07190Patel, E., Besla, G., Mandel, K., & Sohn, S. T. 2018, ApJ,857, 78Piﬄ, T., Scannapieco, C., Binney, J., et al. 2014, A&A,562, doi:10.1051/0004-6361/201322531Plummer, H. C. 1911, MNRAS, 71, 460Pouliasis, E., Di Matteo, P., & Haywood, M. 2017, A&A,598, A66 Schaller, M., Frenk, C. S., Bower, R. G., et al. 2015,Monthly Notices of the Royal Astronomical Society, 451,1247Sch¨onrich, R., Binney, J., & Dehnen, W. 2010, MNRAS,403, 1829Smith, M. C., Ruchti, G. R., Helmi, A., et al. 2007,MNRAS, 379, 755Wang, W., Han, J., Cautun, M., Li, Z., & Ishigaki, M. N.2020, Science China Physics, Mechanics, and Astronomy,63, 109801White, S. D. M., & Rees, M. J. 1978, Mon. Not. Roy.Astron. Soc., 183, 341Xue, X. X., Rix, H. W., Zhao, G., et al. 2008, ApJ, 684,1143Zwicky, F. 1933, Helvetica Physica Acta, 6, 110 A. DATA SELECTIONFor the data sample in the main text, we apply Galactocentric distance cuts as well as parallax error cuts. For theremaining stars, we show the distribution of the errors on the measured speeds in Fig. S1. The cut placed in the maintext of ∆ v/v < .

05 only removes a small fraction of the entire data sample, while the typical errors in the retrogradesample are larger. We account for the error in individual stars in the likelihood, as discussed in Necib & Lin (2021). ∆ v [km/s] N S t a r s All StarsRetrograde Stars .

00 0 .

05 0 . (∆ v ) /v N S t a r s Figure S1.

The top panel shows the distribution of speed errors in the entire

Gaia sample satisfying distance and parallaxerror cuts, while the bottom panel shows the distribution of fractional speed error. B. CORNER PLOTS FOR ANALYSES IN MAIN TEXTB.1.

Corner plots for v min = 300 km/s v esc = 735 . +50 . − . k k = 18 . +1 . − . − . − . − . − . − . l og ( f ) log( f ) = − . +0 . − .

500 600 700 800 900 v esc . . . . . l og ( σ o u t ) k − . − . − . − . − . log( f ) .

25 5 .

40 5 .

55 5 .

70 5 . log( σ out ) log( σ out ) = 5 . +0 . − . Figure S2.

Corner plot for the single component ﬁt of the full

Gaia data, with v min = 300 km/s, and a cap of 5% on themeasured fractional error of the speeds. The contours correspond to 68% and 95% containment. v esc = 519 . +17 . − . . . . . k k = 18 . +1 . − . − . − . − . − . l og ( f ) log( f ) = − . +0 . − . . . . . . l og ( σ o u t ) log( σ out ) = 5 . +0 . − . . . . . k Sub s k Subs = 2 . +0 . − .

480 520 560 600 v esc . . . . . f Sub s . . . . k − . − . − . − . log( f ) . . . . . log( σ out ) . . . . k Subs .

32 0 .

40 0 .

48 0 .

56 0 . f Subs f Subs = 0 . +0 . − . Figure S3.

Corner plot for the two component ﬁt of the full

Gaia data, with v min = 300 km/s, and a cap of 5% on themeasured fractional error of the speeds. The contours correspond to 68% and 95% containment. v esc = 484 . +17 . − . k k = 18 . +0 . − . − . − . − . − . − . l og ( f ) log( f ) = − . +0 . − . . . . . l og ( σ o u t ) log( σ out ) = 5 . +0 . − . . . . . . k Sub s k Subs = 0 . +0 . − . . . . . f Sub s f Subs = 0 . +0 . − . k k = 4 . +1 . − .

440 480 520 560 v esc . . . . f

12 14 16 18 k − . − . − . − . − . log( f ) . . . . log( σ out ) . . . . . k Subs .

15 0 .

30 0 .

45 0 . f Subs k .

15 0 .

30 0 .

45 0 . f f = 0 . +0 . − . Figure S4.

Corner plot for the three component ﬁt of the full

Gaia data, with v min = 300 km/s, and a cap of 5% on themeasured fractional error of the speeds.The contours correspond to 68% and 95% containment. Corner plot for v min = 350 km/s and 2-component ﬁt v esc = 498 . +63 . − . k k = 12 . +4 . − . − . − . − . − . − . l og ( f ) log( f ) = − . +0 . − . . . . . . l og ( σ o u t ) log( σ out ) = 5 . +0 . − . k Sub s k Subs = 1 . +1 . − .

450 600 750 900 v esc . . . . f Sub s k − . − . − . − . − . log( f ) . . . . . log( σ out ) k Subs . . . . f Subs f Subs = 0 . +0 . − . Figure S5.

Corner plot for the two component ﬁt of the full

Gaia data, with v min = 350 km/s, and a cap of 5% on themeasured fractional error of the speeds. The contours correspond to 68% and 95% containment. Corner plot for v min = 375 km/s and 1-component ﬁt v esc = 515 . +113 . − . k k = 2 . +3 . − . − . − . − . − . − . l og ( f ) log( f ) = − . +0 . − .

450 600 750 900 v esc . . . . l og ( σ o u t ) k − . − . − . − . − . log( f ) .

75 6 .

00 6 .

25 6 . log( σ out ) log( σ out ) = 5 . +0 . − . Figure S6.

Corner plot for the single component ﬁt of the full

Gaia data, with v min = 375 km/s, and a cap of 5% on themeasured fractional error of the speeds. The contours correspond to 68% and 95% containment. C. ANALYSIS WITH RETROGRADE DATAIn the main text, we found robust and self-consistent results for v esc when ﬁtting the Gaia dataset (satisfying somequality cuts). We are not able to ﬁnd robust results when ﬁtting retrograde stars only. In Fig. S7, we see that the v esc posteriors are non-convergent in all cases, and there is no clear trend in the results with v min . As described in themain text and can be seen in Fig. 4, there are far fewer retrograde stars with speeds above 500 km/s, and the outlierpopulation is not as well constrained, leading to outlier confusion and the double-peaked posteriors.

300 325 350 375 4004006008001000 v e s c [ k m / s ] N =442 N =285 N =172 N =95 N =66 Gaia Retrograde Data

Err=5.0%

Single Function Two Functions

300 325 350 375 400 v min [km/s]051015 k Figure S7.

From ﬁtting the retrograde data only. Posteriors in escape velocity and k for diﬀerent values of v minmin