An Improved and Physically-Motivated Scheme for Matching Galaxies with Dark Matter Halos

Abstract

The simplest scheme for predicting real galaxy properties after performing a dark matter simulation is to rank order the real systems by stellar mass and the simulated systems by halo mass and then simply assume monotonicity - that the more massive halos host the more massive galaxies. This has had some success, but we study here whether a better motivated and more accurate matching scheme is easily constructed by looking carefully at how well one could predict the simulated IllustrisTNG galaxy sample from its dark matter computations. We find that using the dark matter rotation curve peak velocity, v_{max}, for normal galaxies reduces the error of the prediction by 30% (18% for central galaxies and 60% for satellite systems) - following expectations from Faber-Jackson and the physics of monolithic collapse. For massive systems with halo mass > 10^{12.5} M_{\odot}, hierarchical merger driven formation is the better model and dark matter halo mass remains the best single metric. Using a new single variable that combines these effects, \phi = v_{max}/v_{max,12.7} + M_{peak}/(10^{12.7} M_{\odot}), allows further improvement and reduces the error, as compared to ranking by dark matter mass at z=0, by another 6% relative to v_{max} ranking. Two-parameter fits, including environmental effects, produce only minimal further impact.


Draft version March 1, 2021
Typeset using LaTeX twocolumn style in AASTeX63

Stephanie Tonnesen and Jeremiah P. Ostriker

Flatiron Institute, CCA, 162 5th Avenue, New York, NY 10010 USA
Princeton University Observatory, Ivy Lane, Princeton, NJ 08544 USA
Columbia Astrophysics Laboratory, 538 W 120th St, New York, NY 10027 USA
(Dated: January 2021)

1. INTRODUCTION

There has been a great deal of progress in recent years in our understanding and simulations of galaxy formation. The initial conditions seem to be well specified by LCDM cosmological models and their variants, and hydrodynamic codes now have the capacity and resolution to include many of the physical processes relevant to galaxy formation and evolution.
These include dark matter gravitational collapse, gas cooling, star formation and the mechanical and radiative feedback from stars and central, massive black holes (BHs). There is no widely accepted mechanism for the formation of these massive BHs, but, assuming their formation as seed BHs, their evolution and effects on the surrounding galaxies are now reasonably well modeled. Recent summaries of the status of existing work on galaxy formation and evolution are presented in Somerville and Davé (2015) and Naab and Ostriker (2017). For more detailed treatments one can consult the EAGLE (Schaye et al. 2015; Crain et al. 2015), FIRE (Hopkins et al. 2018a,b), IllustrisTNG (Vogelsberger et al. 2014; Genel et al. 2014; Weinberger et al. 2017; Pillepich et al. 2018), MUFASA (Davé et al. 2016), NIHAO (Wang et al. 2015; Blank et al. 2019) and other simulations.

But there are virtues to considering simpler treatments that do not need to rely on sub-grid modeling and which are easily adaptable to analyzing large data sets. Of these, the simplest, perhaps, is the abundance matching scheme, which is based on the fact that, in all variants of the LCDM modeling, galaxies live in more massive dark matter (DM) halos, quasi-spherical lumps of dark matter, which grow via gravitational instabilities from very low amplitude (10^{-[...]}), gaussian perturbations imparted at very early times in a roughly power law distribution by unknown processes (thought to be related to inflation) (e.g. Vale and Ostriker 2004, 2006, 2008).

There is good agreement on how to compute the formation of these dark matter halos, with several computational codes now able to make moderately high-resolution cosmic scale volumes containing accurate distributions of DM halos having well defined properties.
Early analyses by Navarro et al. (1997) showed that these could be represented to a reasonable approximation by three numbers, a mass, a virial radius and a core radius, with the ratio of the latter two numbers represented as the concentration. To zeroth order observed galaxies can be represented by their stellar masses, or alternatively by their stellar luminosities.

Thus, the simplest possible scheme for populating a volume in the universe with galaxies would be to populate it first with DM halos, and then make a rank ordered list of these with the most massive first. Next, one could take the same volume from the real universe and rank order the observed galaxies by mass (or luminosity), and then simply assume that the more massive halos hosted more massive galaxies, putting in each halo the corresponding galaxy.

This almost ridiculously simple scheme was pursued by Vale & Ostriker in three papers (Vale and Ostriker 2004, 2006, 2008; see also Kravtsov et al. 2004; Tasitsiomi et al. 2004). Using this scheme (or a variation thereof), one can take different variants of the LCDM model, compute the halo distribution, populate the halos with galaxies using the simple abundance matching scheme and then compare to observations. By construction, the luminosity functions must come out to be correct, but correlation functions (Conroy et al. 2006; Marín et al. 2008; Guo et al. 2010; Trujillo-Gomez et al. 2011), pair counts (Berrier et al. 2006), magnitude gap statistics (Hearin et al. 2013; Ostriker et al. 2019), and galaxy-galaxy lensing (Hearin and Watson 2013) can be usefully compared to observations.

Two immediate questions arise. First, is there a better zeroth order scheme than ranking by mass and matching? Several authors have considered this question.
For example, while the DM in subhalos is quickly stripped once they are accreted onto halos, stripping of the more centralized galaxy starts later (Nagai and Kravtsov 2005), and in fact their optical sizes are observed to grow with cosmic time (cf. van Dokkum et al. 2010), presumably due to the accretion of smaller satellite systems. Thus, Conroy et al. (2006) improved subhalo abundance matching by matching galaxies to halos at the time at which they are accreted onto a central halo. Even earlier, Kravtsov et al. (2004) proposed using the maximum circular velocity of (sub)halos, v_{max}, which is more stable than halo mass to stripping. Reddick et al. (2013) tested abundance matching models using several halo properties, and found that only v_{peak} (defined as the peak value of v_{max} over the history of the halo) or a combination of v_{max} for central galaxies and v_{peak} for satellite galaxies is able to reproduce observations of galaxy clustering. Indeed, Zentner et al. (2014) argue that abundance matching using v_{max} matches several observed galaxy statistics (Conroy et al. 2006; Hearin et al. 2013; Hearin and Watson 2013; Reddick et al. 2013) because halo mass alone does not determine the halo velocity profile.

Xu and Zheng (2018) confirmed that for the central galaxies in the original Illustris simulations, M_{*} is more tightly correlated with v_{peak} than with halo mass. They also find that at fixed v_{peak}, the correlation between M_{*} and other halo properties is removed. In He (2020), the author uses subhalo abundance matching and finds that v_{peak} correlates best with the stellar mass at the epoch of v_{peak} in both central and satellite galaxies in EAGLE, Illustris, and IllustrisTNG. Stellar mass stripping of satellite galaxies results in increased scatter in the z = 0 M_{*} to v_{peak} relation. Chaves-Montero et al.
(2016) find that v_{relax}, defined as the maximum of the circular velocity of a dark matter structure while it fulfils a relaxation criterion, as evaluated along its entire history, correlates most strongly with M_{*}.

Second, do there exist any first order refinements of the mass-matching scheme that could be implemented, which would be easy to apply and would significantly increase its accuracy? In fact, Lehmann et al. (2017) point out that while ranking by v_{max} is similar to ranking by halo mass, at fixed halo mass more concentrated halos have higher v_{max} (Klypin et al. 2011). They therefore use a parameterization from Mao et al. (2015) that includes both halo mass and concentration:

V_{\alpha} \equiv V_{vir} (V_{max}/V_{vir})^{\alpha}    (1)

where V_{max} is the maximal circular velocity of the halo and

V_{vir} \equiv (G M_{vir}/R_{vir})^{1/2}    (2)

They find that an \alpha ∼ [...] different environmental densities for SDSS observations and a subhalo abundance matching model applied to the Bolshoi-Planck simulation and find that the model predictions agree well with observations.

In this paper we will try to both physically motivate and improve the simplest matching scheme. We will use IllustrisTNG to determine whether complicating the most simple form of subhalo abundance matching, i.e. matching stellar mass to a single halo property, reduces scatter in the assignment of galaxies to halos. We first verify the halo property that produces the least scatter in the relation, v_{max}, considering the central and satellite populations separately and selecting different mass ranges even in this initial step. We then provide a physical motivation for the parameters used in the optimal matching scheme. Then, similarly to Martizzi et al. (2020), we calculate how the scatter is reduced when we fold in a second halo feature.
However, unlike these recent works, we test a wide range of possible parameters and possible combinations thereof.

In Section 2 we describe our sample of galaxies and our methodology for testing and evaluating procedures for using matching techniques to predict galaxy properties given dark matter simulations. In Section 3 we try out the simplest one parameter schemes and show that the physically motivated focus on peak velocity dispersion is best for normal galaxies but that total halo mass remains best for first brightest, massive, central systems, as again would be expected from physical arguments. Section 4 broadens the treatment to include multiple variables, including environment, and then in Sections 5 & 6 we present an overall discussion of the results and our conclusions.

2. METHODS

2.1. IllustrisTNG

The IllustrisTNG100 simulation (public data release: Nelson et al. 2019) is part of a suite of cosmological simulations run using the AREPO moving mesh code (Springel 2010). TNG100 has a volume of 110.7^{3} Mpc^{3} and a mass resolution of 7.[...] × 10^{[...]} M_{\odot} and 1.[...] × 10^{[...]} M_{\odot} for dark matter and baryons, respectively. The TNG suite implements upgraded subgrid models compared to the Illustris simulation (Vogelsberger et al. 2014; Genel et al. 2014); specifically, a modified black hole accretion and feedback model (Weinberger et al. 2017) and updated galactic winds (Pillepich et al. 2018). TNG also includes magnetohydrodynamics (Pakmor et al. 2011).

2.2. Galaxy Selection

We use galaxy populations from the IllustrisTNG100 simulation described above. We consider galaxies at z = 0 with dark matter mass ≥ 10^{[...]} M_{\odot}/h in the dark-matter only run (DMO) that are matched in the full hydrodynamical simulation with galaxies whose stellar mass is greater than 10^{[...]} M_{\odot}/h.

In detail, we first selected all galaxies in the DMO simulation with a dark matter mass greater than 10^{[...]} M_{\odot}/h. We then used the publicly available matching data to find the corresponding galaxy in the full hydro simulation. We use all galaxies identified with masses above 5 × 10^{[...]} M_{\odot}/h, which clearly includes galaxies that are underresolved in the simulation. However, at our minimum dark matter halo mass, the lowest stellar mass of any galaxy in our sample is 7 × 10^{[...]} M_{\odot}/h. In order to include only well-resolved galaxies, which are more likely to be in an observational sample, our final analysis only includes galaxies with stellar masses above 10^{[...]} M_{\odot}/h.

2.3. Galaxy Environmental Measures
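The environmental measure used in this subsection amounts to summing halo masses inside a sphere around each halo. A minimal sketch is given below, assuming Cartesian positions in Mpc and, optionally, a periodic box. The function name `environment_mass`, the `box_size` argument and the brute-force distance matrix are illustrative choices and not the authors' code. Following the note to Table 2, each halo's own mass is included in its sum.

```python
import numpy as np

def environment_mass(pos, mass, radius, box_size=None):
    """Total mass of all halos within `radius` of each halo, itself included.

    pos      : (N, 3) halo positions, same length unit as `radius`
    mass     : (N,) halo masses (already cut at the chosen mass threshold)
    box_size : side length of a periodic box, or None for no wrapping
               (periodic wrapping is an assumption of this sketch)
    """
    pos = np.asarray(pos, dtype=float)
    mass = np.asarray(mass, dtype=float)
    # Pairwise separation vectors, shape (N, N, 3). This is O(N^2) in memory.
    delta = pos[:, None, :] - pos[None, :, :]
    if box_size is not None:
        # Minimum-image convention for a periodic simulation volume.
        delta -= box_size * np.round(delta / box_size)
    dist = np.sqrt((delta ** 2).sum(axis=-1))
    within = (dist <= radius).astype(float)  # diagonal is 1: self is counted
    return within @ mass

# Toy example (not simulation data): three halos on a line in a 10 Mpc box.
toy_pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [9.5, 0.0, 0.0]]
toy_mass = [1.0, 2.0, 4.0]
env_periodic = environment_mass(toy_pos, toy_mass, radius=2.0, box_size=10.0)
env_open = environment_mass(toy_pos, toy_mass, radius=2.0)
```

For a sample of order ten thousand halos the full distance matrix becomes large, so a spatial tree or a chunked loop would be the more economical choice in practice.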

We use nearby galaxies to measure the local environment. We include all galaxies with dark matter masses above 10^{[...]} M_{\odot}/h within 1 Mpc, 2 Mpc, 5 Mpc, 8 Mpc or 15 Mpc of each galaxy. In order to have a more physical measure of the local mass density, we summed the total mass of all these galaxies.

We also were able to separate galaxies into satellites or centrals using the GroupFirstSub identifier in the IllustrisTNG DMO simulation. This allowed us to perform our fits for the entire sample and for satellites and centrals separately, and, as we shall see, the two categories are significantly different in their properties. This gives us three samples: "all", "centrals" and "satellites". The fourth sample is labeled as "mix", which is the combined sample in which satellites and centrals are fit separately.

2.4. Concentration

We use three measures of the concentration. First, from Bose et al. (2019) we use:

c_{v} \equiv V_{max} / (H_{0} R_{max})    (3)

where V_{max} is the maximum velocity of the simulated rotation curve and R_{max} is the radius at which V_{c} is maximal. Bose et al. (2019) show that this is equivalent to the concentration calculated using all the particles in a halo and assuming an NFW profile (see also Moliné et al. 2017).

Second, we use the ratio c_{h} \equiv v_{max}/V_{halfmass}, where V_{halfmass} is the circular velocity at the half mass radius of the dark matter halo, calculated as \sqrt{G M_{halfmass}/R_{halfmass}}. Finally, we use the ratio c_{R} \equiv R_{max}/R_{halfmass}.

2.5. Percent Error

We define the error as:

Error \equiv \frac{\sum_{N} |\log_{10}(M_{true}/M_{prediction})|}{N}    (4)

so that for small errors our definition is equivalent to 0.43 times the average fractional error (since \log_{10}(1+\epsilon) \approx \epsilon/\ln 10 \approx 0.43\,\epsilon for small \epsilon).

3. RANK ORDERING

In this section we discuss using rank ordering to match halos and galaxies. Specifically, we first present a straightforward theoretical scaffold for selecting the halo property best suited for rank ordering. Then, using IllustrisTNG, we confirm our derivation.

3.1. A Simple Theoretical Basis for Selecting the Ordering Halo Property

We can begin with an assumption that stellar mass is related to the baryonic mass scaled to the dark matter mass, corrected by the fraction of matter that cools and forms stars in the center of the halo:

M_{*} \propto M_{peak} \frac{\Omega_{b}}{\Omega_{d}} \frac{t_{form}}{t_{cool,form}}    (5)

Here t_{form} is the formation time of the halo and t_{cool,form} is the time required for the baryons to cool and condense into a galaxy. This is simply putting in the form of an equation the classical idea of "monolithic collapse" first proposed by Eggen et al. (1962).

We can relate the mean density of the galaxy to its mass and radius:

\rho_{max} \equiv \frac{3 M_{max}}{4 \pi r_{max}^{3}}    (6)

in which \rho_{max}, M_{max}, and r_{max} are the density, mass, and radius at which the circular velocity reaches v_{max}, where v_{max}^{2} = G M_{max}/r_{max}.

We can relate t_{form} to the halo density assuming standard gravitational collapse (Gunn & Gott 1972):

G \langle\rho\rangle \equiv t_{form}^{-2}    (7)

We can also relate t_{cool,form} to density using energy conservation and the standard cooling equations:

\frac{k T_{max}}{m} \equiv \frac{G M_{max}}{r_{max}}    (8)

The above equation defines T_{max}. We subsequently can define t_{cool,form} as:

\Lambda(T_{max}) \rho_{max}^{2} \equiv \frac{\rho_{max} k T_{max}}{t_{cool,form}}    (9)

Thus, t_{cool,form} \propto \rho_{max}^{-1} f^{-1} where f \equiv \Lambda(T_{max})/T_{max}, and t_{form} \propto \rho_{max}^{-1/2}. If we use these relations in Equation 5, we find that

M_{*} \propto M_{peak} \rho_{max}^{1/2} f \propto (M_{max}/r_{max})^{3/2} f \propto v_{max}^{3} f    (10)

Because for Bremsstrahlung cooling f \propto T_{max}^{1/2} \propto v_{max}, we complete the simplified derivation of the well-established Faber-Jackson relation (Faber and Jackson 1976):

M_{*} \propto v_{max}^{4}    (11)

We highlight that this derivation is based on the assumption of spherical collapse of the halo and pure radiative cooling. We do not consider any complicating processes that we know affect galaxies in the universe, such as mergers or feedback from star formation or AGN.
In fact, we might expect this relation between M_{*} and v_{max} to break down more often for higher mass galaxies, as they have been found to have later growth times where these assumptions clearly break down (Behroozi et al. 2013). For first brightest systems, sitting in massive halos from which they can accrete satellites, one would expect M_{peak} to be more relevant, and, as we have noted, both observations (e.g. van Dokkum et al. 2010) and LCDM theory argue that hierarchical accretion is the dominant process for first brightest galaxies.

Indeed, we can go a step farther and ask the basis for and the value of the transition mass above which "normal" growth of the stellar component from a cooling collapse becomes difficult. This was addressed in a paper by Rees and Ostriker (1977) (eqn 20) in an elementary discussion of the maximum mass of cosmic gas that can cool and collapse in a dynamical time. They did not include the important effects of dark matter in their treatment and obtained a mass of [(\hbar c / G m_{p}^{2})^{2} (e^{2}/\hbar c)^{5} (m_{p}/m_{e})^{1/2}] m_{p} ∼ 10^{[...]} M_{\odot} in baryons. Had the effects of dark matter been included, the value of the baryonic transition mass would have been reduced somewhat, but the corresponding dark matter mass would have approximated 10^{[...]} M_{\odot}. In fact, the mass function of galaxies in the standard Press-Schechter parameterization declines exponentially above a certain critical mass, the stellar mass being roughly 10^{[...]} M_{\odot} and the corresponding halo mass being roughly 10^{[...]} M_{\odot}. Consequently, we have both an observational and a physical basis for expecting that galaxies above some critical mass will grow primarily by accreting satellites and cannot be formed easily by a monolithic collapse. Thus, while matching based on v_{max} will be best for normal systems, we can expect that, for first brightest galaxies in massive clusters, M_{peak} should be the best metric.

3.2. Rank Ordering in IllustrisTNG
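Rank ordering and the error measure of Equation 4 are both short enough to write out. The sketch below is illustrative only: `rank_order_predict` and `log_error` are hypothetical names, base-10 logarithms are assumed, and the toy numbers are not simulation data. The prediction simply hands the k-th largest stellar mass to the halo with the k-th largest value of the chosen halo property.

```python
import numpy as np

def rank_order_predict(halo_prop, stellar_mass):
    """Predicted stellar mass for each halo under strict monotonic matching."""
    halo_prop = np.asarray(halo_prop, dtype=float)
    stellar_mass = np.asarray(stellar_mass, dtype=float)
    order = np.argsort(halo_prop)              # halo indices, ascending property
    predicted = np.empty_like(stellar_mass)
    predicted[order] = np.sort(stellar_mass)   # ascending masses to ascending halos
    return predicted

def log_error(m_true, m_pred):
    """Equation 4: mean absolute difference of log10 stellar mass."""
    m_true = np.asarray(m_true, dtype=float)
    m_pred = np.asarray(m_pred, dtype=float)
    return np.mean(np.abs(np.log10(m_true / m_pred)))

# Toy example: the halo with the largest property (index 0) actually hosts
# the smallest galaxy, so the match is deliberately poor.
toy_prop = [3.0, 1.0, 2.0]
toy_mstar = [10.0, 1000.0, 100.0]
toy_pred = rank_order_predict(toy_prop, toy_mstar)
toy_err = log_error(toy_mstar, toy_pred)
```

Swapping `toy_prop` between different halo properties and comparing the resulting `log_error` values is the comparison that the rest of this subsection carries out on the simulated sample.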

Here, we test these theoretical predictions using the IllustrisTNG simulation. As described in Section 2.2, we use a sample of galaxies with dark matter mass greater than 10^{[...]} M_{\odot}/h in the DMO simulation and stellar masses above 10^{[...]} M_{\odot}/h in the full hydrodynamical simulation.

We rank-ordered our selected galaxies by total mass and the stellar mass separately for each of our samples: "all" (11927), "satellites" (2337), and "centrals" (9590). We have used three simple proxies for dark matter halo mass in our ranking schemes: the current dark matter mass, M_{DM}, the peak dark matter mass, M_{peak}, and the current v_{max}. These are shown in order from the top to bottom panels in Figure 1. We show the total mass proxy and stellar mass for each of our galaxies as orange "o". The blue lines show the predicted stellar mass using the rank-ordering method for each of our samples.

We see that the scatter decreases as we move from using M_{DM} to M_{peak}, although the M_{*} \propto M_{halo}^{3/[...]} relation for higher masses holds for both variables. We also highlight that the rank-order line for satellite galaxies is much closer to that for centrals when we use M_{peak} than when we use M_{DM}. The scatter continues to decrease when we use v_{max} as our dark matter halo mass proxy, particularly at lower masses (lower v_{max}). To guide the eye we have overplotted simple power-law relations between M_{*} and v_{max}.

The results shown in Figure 1 are quantified using the percent error as described in Section 2.5 (eqn 4), with the results shown in Table 1. We see that while using the current M_{DM} in the matching scheme is reasonably accurate for central galaxies, it is much less accurate for satellite systems. Therefore, we also consider the peak mass of the halo, M_{peak}. Using M_{peak} should correct for mass loss from satellite galaxies due to tidal stripping. Because dark matter is distributed
Because dark matter is distributedto a larger radius than the stars in a galaxy, it will be morestrongly stripped.Therefore, while we expect M peak to be very similar toM DM for central galaxies, it can vary by a considerableamount for satellites. Indeed, we see in Table 1 that the im-provement for centrals is very small when using M peak ratherthan M DM , but it is dramatic for satellite galaxies. We alsoconsider v max , as tidal stripping is found to have little e ﬀ ecton this property, likely because the maximum rotational ve-locity is reached at relatively low radii. Using v max for ourvariable we ﬁnd that the error for central galaxies has im-proved by more than 15%, although the correlation between v max and stellar mass for satellites is somewhat weaker thanthe correlation between M peak and stellar mass. However, be-cause most of our galaxies are centrals, v max remains the bestsingle variable for rank-ordering our galaxy sample.We stress that, because of the shape of the mass function,any relation between stellar mass and halo mass will be dom-inated by the lowest-mass galaxies. Therefore, we also con-sider separately only galaxies whose mass in the DMO sim-ulation is greater than 10 M (cid:12) / h in order to remove the bulkof low mass galaxies while still retaining a sample with ∼ DM is reasonably accurate for central galaxies,but much less so for satellite systems. Again, we ﬁnd a largeimprovement in the ranking scheme for satellites galaxies us-ing M peak .However, unlike in the full sample, v max is the worst rank-ing variable for central galaxies with halo masses above 10 M (cid:12) / h . This agrees well with our theoretical argument that atlarge masses M peak will be the best ranking variable due tomerging.With this empirical support for the trends predicted in ourmodel, we also develop a straightforward variable, using the Figure 1.

The stellar mass of galaxies versus possible variablesto use for ranking. Blue lines show the predicted stellar mass us-ing the rank-ordering method for each of our samples.

Top panel:

Ranking using the current dark matter halo mass has the most scat-ter.

Middle panel:

Using M peak for ranking reduces scatter, and therank ordering predictions for satellite and central galaxies is muchcloser.

Bottom panel:

Ranking using v max reduces scatter dramat-ically, particularly for lower mass (lower v max ) halos, as quantiﬁedin Table 1. physical intuition from above, that v max will be the best rank-ing variable for low mass galaxies and M peak will be the bestranking variable for high mass galaxies (Section 3.1). Forthis variable we normalize both v max and M peak to their val-ues at a “pivot mass" of M peak = . M (cid:12) . We call thesevariables v norm ≡ v max / v max , . and m norm ≡ M peak / . .We then rank order our galaxies using the parameter basedon these normalized values: φ ≡ v norm + m norm (12)Using this parameter, low mass galaxies depend morestrongly on v max , while high mass galaxies depend on M peak .Both the exact value of the pivot mass and the powers of v norm and m norm were selected to minimize error while ﬂeshing outour theoretical sca ﬀ old.As shown in Table 1, using this parameter φ gives someimprovement on the ﬁt to the central galaxies in our sample,and dramatically reduces the error for the satellite galaxies.Using this variable for the mix of all galaxies reduces the er-ror by a substantial 33% when compared with rank orderingby M DM . USING SECONDARY VARIABLES TO IMPROVERANK ORDERINGWe now attempt to minimize the scatter in the φ - M ∗ re-lation using other features of dark matter halos. These fea-tures are listed in Table 2. We have roughly grouped the haloproperties into those related to the halo mass (M DM , M peak , v disp ≡ dark matter velocity dispersion, and v max ), size (r max ≡ v max radius and r DM ≡ dark matter half mass radius), shape(concentration using the three methods described in Section2.4), formation time (the lookback time to M peak , to whenthe halo reaches 50% of its z = = Method of Correction
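The procedure described in this subsection can be sketched in a few functions: form the ranking variable φ, take a running median of a secondary feature along φ, and fit a polynomial to the stellar mass residual as a function of the offset from that median (Equation 13). This is a hedged sketch under stated assumptions, not the authors' code. All function names are hypothetical, φ is taken in the linear form given in the abstract, `v_max_pivot` (the value of v_{max} at the pivot mass) is left as an input because the text does not say how it is measured, and base-10 logarithms are assumed.

```python
import numpy as np

def phi_variable(v_max, m_peak, v_max_pivot, m_pivot=10 ** 12.7):
    """Equation 12 in the linear form quoted in the abstract."""
    return np.asarray(v_max) / v_max_pivot + np.asarray(m_peak) / m_pivot

def rolling_median_residual(phi, feature, window=50):
    """Offset of log10(feature) from its running median along phi.

    Returns the indices of the halos that are kept and their offsets.
    The first and last window/2 halos (in phi order) are dropped.
    """
    order = np.argsort(phi)
    logf = np.log10(np.asarray(feature, dtype=float))[order]
    half = window // 2
    keep = np.arange(half, len(logf) - half)
    median = np.array([np.median(logf[i - half:i + half]) for i in keep])
    return order[keep], logf[keep] - median

def fit_correction(delta_logf, log_ratio, degree=2):
    """Polynomial fit of log10(M_true / M_rank) against the feature offset.

    For degree=2 the returned coefficients are (beta, alpha, gamma) in the
    notation of Equation 13, because numpy lists the highest power first.
    """
    return np.polyfit(delta_logf, log_ratio, degree)

def apply_correction(log_m_rank, delta_logf, coeffs):
    """Corrected prediction: log10(M_rank) plus the fitted polynomial."""
    return np.asarray(log_m_rank) + np.polyval(coeffs, delta_logf)
```

With a window of 50 the running median drops 25 halos at each end of the φ-ordered list, so a sample of N halos yields N - 50 offsets.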

We first plot our feature as a function of our best single variable φ, and find the running median of the feature using a window size of 50 galaxies. We have tested using other window sizes (25 and 100 galaxies) and find similar results. The top panel of Figure 2 shows this plot using the environmental density M_{DM,r<2Mpc} variable. Clearly there is a trend of increasing environmental density as a function of φ, and it differs for satellites and centrals.

Because we use the rolling median, we need to remove the first and last 25 values, so we are left with an "all" sample of 11877, a "centrals" sample of 9540, and a "satellites" sample of 2287 galaxies. Removing these galaxies has little impact on the percent errors using the rank ordering method for each sample (a change of less than 1%).

We then plot M_{true}/M_{rank} as a function of \Delta\log(feature), which is the difference between the \log(feature) for each dark matter halo and the \log(feature_{rollingmedian}) found at each φ. In the bottom panel of Figure 2 we show how M_{true}/M_{rank} is related to the scatter in M_{DM,r<2Mpc}. This relation is fit using first, second and third order polynomial fits. Finally, we correct our prediction for the stellar mass using our chosen fit as below (the second order polynomial fit shown tends to give the best results):

\log(M_{*,pred}) = \log(M_{*,rank}) + \alpha \Delta\log(feature) + \beta \Delta\log(feature)^{2} + \gamma    (13)

Finally, we calculate the percent error of the new prediction. This value for each feature and galaxy population is shown in Table 2.

Figure 2. The top and bottom panels show the first and second steps used to include a secondary halo feature to reduce the scatter in rank ordering halos (Section 4.1). Here we use the M_{DM,r<2Mpc} environmental measure, written as M2Mpc. Top: First we plot this variable as a function of φ, our rank-ordering variable. Here we show the total sample ("all") as well as the centrals and satellites ("sats") separately. The points are color-coded as centrals and satellites. Bottom: Using the scatter from a rolling median, we can find that the ratio of the true stellar mass of the galaxy to the rank-ordered assigned mass has a dependence on M_{DM,r<2Mpc}. The points are not color-coded for satellites and centrals, as for the "all" fit we use all of the galaxies in the sample. We can then correct our stellar mass using this dependence. Notice that the rolling median for the total sample is similar to that for the separated satellite and central samples.

Table 1. [...] M_{DM} (at z = 0), M_{peak}, v_{max} (at z = 0), and \phi \equiv v_{norm} + m_{norm} (eqn 12) for the dark matter mass for the different samples (a fit to all galaxies, only centrals, only satellites, and mix of all galaxies fitting the centrals and satellites separately). We use a galaxy sample with dark matter mass in the DMO simulation greater than 10^{[...]} M_{\odot}/h that is matched to any galaxy in the hydrodynamical run with stellar mass greater than 10^{[...]} M_{\odot}/h. The final column shows the percent improvement of ranking by the selected variable compared to M_{DM} (at z = 0) on the "mix" sample. We see that ranking using the single variable φ reduces the error 34% compared to matching by M_{DM} (at z = 0).

Galaxy Sample | All | Centrals | Satellites | Mix | % Improvement
Number of galaxies (M_{DM} > 10^{[...]} M_{\odot}) | 11927 | 9590 | 2337 | 11927 | 11927
Rank ordering using M_{DM} | [...] | [...] | [...] | [...] | [...]
Rank ordering using M_{peak} | [...] | [...] | [...] | [...] | [...]
Rank ordering using v_{max} | [...] | [...] | [...] | [...] | [...]
Rank ordering using \phi \equiv v_{norm} + m_{norm} | [...] | [...] | [...] | [...] | [...]

Table 2. [...] φ ranking method plus the listed corrections. We use the galaxy sample with dark matter mass greater than 10^{[...]} M_{\odot}/h that is matched to any galaxy in the hydro run with stellar mass greater than 10^{[...]} M_{\odot}/h. Note that M_{DM,r<XMpc} is the mass of all halos within that radius from the DMO simulation with M_{DM} > 10^{[...]} M_{\odot}, including the mass of the halo from which the measurement originates. All halo properties are measured using the DMO simulation. Here the final column shows the percent improvement on the "mix" sample of using the correction variable in addition to rank-ordering by φ in comparison to only rank-ordering by φ.

Galaxy Sample | All | Centrals | Satellites | Mix | % Improvement
Number of galaxies (M_{DM} > 10^{[...]} M_{\odot}) | 11927 | 9590 | 2337 | 11927 | 11927
φ + v_{disp} | [...] | [...] | [...] | [...] | [...]
φ + v_{max} | [...] | [...] | [...] | [...] | [...]
φ + M_{DM} | [...] | [...] | [...] | [...] | [...]
φ + M_{peak} | [...] | [...] | [...] | [...] | [...]
φ + r_{max} | [...] | [...] | [...] | [...] | [...]
φ + r_{DM} | [...] | [...] | [...] | [...] | [...]
φ + c_{v} | [...] | [...] | [...] | [...] | [...]
φ + c_{h} | [...] | [...] | [...] | [...] | [...]
φ + c_{R} | [...] | [...] | [...] | [...] | [...]
φ + t_{peak} | [...] | [...] | [...] | [...] | [...]
φ + t_{[...]} | [...] | [...] | [...] | [...] | [...]
φ + t_{[...]} | [...] | [...] | [...] | [...] | [...]
φ + M_{DM,r<[...]Mpc} | [...] | [...] | [...] | [...] | [...]
φ + M_{DM,r<[...]Mpc} | [...] | [...] | [...] | [...] | [...]
φ + M_{DM,r<[...]Mpc} | [...] | [...] | [...] | [...] | [...]
φ + M_{DM,r<[...]Mpc} | [...] | [...] | [...] | [...] | [...]
φ + M_{DM,r<[...]Mpc} | [...] | [...] | [...] | [...] | [...]
φ + M_{DM,r<[...]Mpc} + t_{[...]} | [...] | [...] | [...] | [...] | [...]

4.2. Results

All of our quantitative results are shown in Table 2. The most glaring result is that most corrections do not result in a large improvement of the percent error from ranking using φ. Using random resampling of 70% of our data sets ("all", "central", and "satellites") 60 times, we find a distribution of errors with means matching the values listed for φ of the complete sample in Table 1, and standard deviations of 0.001, 0.001, 0.002, and 0.0009 for "all", "centrals", "satellites" and "mix" samples, respectively. With this in mind we can look more closely at the improvement when adding a second feature to our matching scheme.

In some more detail, it is not surprising that none of the halo features describing halo mass improve the fit to central galaxies. These are well-fit by our φ variable. However, interestingly, the error is reduced for the satellite sample when we include an M_{DM} correction. This may be because we have largely ignored satellite galaxy evolution by choosing v_{max} and M_{peak} as the components of φ. Including M_{DM} may start to include the later evolution of these galaxies.

We find universally small improvement when considering our variables describing halo size (r_{max} and r_{DM}) and shape (concentration).

Interestingly, there is some improvement in the error when folding formation time into the stellar mass estimate. For example, t_{[...]} is the halo feature that results in the smallest percent errors across all of our samples: "all" galaxies, "centrals", "satellites", and the "mix" sample.

Finally, using environment to correct for the stellar mass also has a small impact on the overall error. Despite this, we note that including the mass from galaxies within 2-5 Mpc seems to produce a slightly better correction than smaller or larger environment windows.

4.2.1. Correcting Using A Combination of Environment and Formation Time
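A combined correction of the kind described in this subsection (Equation 14) can be fitted with `scipy.optimize.curve_fit`, which the text names as the fitting routine. The sketch below is an assumption-laden illustration rather than the authors' script: `combined_correction` and `fit_combined` are hypothetical names, the inputs are the offsets of the two features from their running medians, the fitted quantity is log10(M_true/M_rank), and the synthetic arrays stand in for simulation data.

```python
import numpy as np
from scipy.optimize import curve_fit

def combined_correction(x, a_m, b_m, a_t, b_t, gamma):
    """Right-hand side of Equation 14 minus log10(M_rank).

    x is a (2, N) array: row 0 holds the environment offsets,
    row 1 the formation-time offsets.
    """
    d_env, d_t = x
    return a_m * d_env + b_m * d_env ** 2 + a_t * d_t + b_t * d_t ** 2 + gamma

def fit_combined(d_env, d_t, log_ratio):
    """Least-squares estimate of (alpha_M, beta_M, alpha_t, beta_t, gamma)."""
    x = np.vstack([d_env, d_t])
    popt, _pcov = curve_fit(combined_correction, x, log_ratio, p0=np.zeros(5))
    return popt

# Synthetic check (not simulation data): recover known coefficients.
rng = np.random.default_rng(0)
d_env = rng.normal(size=200)
d_t = rng.normal(size=200)
true_params = np.array([0.10, -0.05, 0.20, 0.03, 0.01])
log_ratio = combined_correction(np.vstack([d_env, d_t]), *true_params)
fitted = fit_combined(d_env, d_t, log_ratio)
```

Because the model is linear in its five coefficients, an ordinary linear least-squares solve would give the same answer; `curve_fit` is used here only to mirror the tool mentioned in the text.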

Finally, we use our fits for each of our strongest individual corrections, t_{[...]} and M_{DM,r<[...]Mpc}, to create a combined correction on the rank-ordering technique.

\log(M_{*,pred}) = \log(M_{*,rank}) + (\alpha_{M} \Delta\log(M_{DM,r<[...]Mpc}) + \beta_{M} \Delta\log(M_{DM,r<[...]Mpc})^{2}) + (\alpha_{t} \Delta\log(t_{[...]}) + \beta_{t} \Delta\log(t_{[...]})^{2}) + \gamma    (14)

We use the curve_fit function in scipy to perform a least-squares fit to the above equation, and find that we can reduce the error using both M_{DM,r<[...]Mpc} and t_{[...]} as shown in the final line of Table 2.

4.3. Verifying our Results

Here we use two methods to verify our results on the improvement using multiple halo features to determine stellar mass.

4.3.1. Random Forest Regression

Now that we have gained insight into the level of improvement that can be gained by using more than one feature of dark matter halos in the abundance matching technique, we turn to machine learning to provide an independent check of our modeling and ranking scheme.

Using Random Forest Regression (RFR) allows us to rank the features according to their effect on the model output, and has the additional benefit of expanding the space of available models beyond polynomial fitting. For this work we use scikit-learn (Pedregosa et al. 2011).

First, we are able to reproduce the percent error on the entire sample using only our defined φ feature (0.111), and in a two-feature setting where we add a central/satellite galaxy label (0.105). We check the rest of our ranking parameters from Table 1 and verify that φ produces the best ranking variable to match DM halos to galaxies. Also, we confirm that our φ variable produces lower error values than the combination of v_{max} and M_{peak}.

We also use our four selected halo features that we found produced the best match between the DMO and hydrodynamical simulations: φ, M_{DM,r<[...]Mpc}, t_{[...]}, and the central/satellite label. Using an optimized RF regressor, the expected test set error is 0.098 with a standard deviation of 0.0013, quite similar to the percent error we find ranking the satellites and centrals separately using φ and applying our analytic correction using M_{DM,r<[...]Mpc} and t_{[...]}. This is reassuring because it shows that our results are only very mildly dependent on the modeling assumptions.

We can also include all the features and use a parameter optimization technique to find the minimum possible error of a Random Forest Regression. We find a minimum error of 0.092 using eight randomly selected features, creating 100 trees (n_estimators) with a maximum depth of 14 branches (max_depth). However, there are more than 30 combinations within one standard deviation (0.0015), including one using only 4 features.
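As a rough illustration of this kind of check, the sketch below trains a scikit-learn `RandomForestRegressor` with the tree count and depth quoted above and scores it with the Equation 4 style error on a held-out 20%. Everything else is an assumption made for the example: the helper name `rfr_log_error`, the four stand-in features, the 80/20 split, and above all the synthetic data, which is generated on the spot and is not simulation output.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def rfr_log_error(features, log_mstar, seed=0):
    """Fit a random forest to log10(M*) and return (test error, importances)."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, log_mstar, test_size=0.2, random_state=seed)
    model = RandomForestRegressor(
        n_estimators=100,   # number of trees, as quoted in the text
        max_depth=14,       # maximum tree depth, as quoted in the text
        random_state=seed)
    model.fit(X_train, y_train)
    # Targets are already log10 masses, so this is the Equation 4 error.
    error = np.mean(np.abs(y_test - model.predict(X_test)))
    return error, model.feature_importances_

# Synthetic stand-in data: log10(M*) depends on the first feature only.
rng = np.random.default_rng(0)
n = 500
phi = rng.uniform(0.2, 3.0, n)
env_offset = rng.normal(size=n)
t_offset = rng.normal(size=n)
is_central = rng.integers(0, 2, n)
features = np.column_stack([phi, env_offset, t_offset, is_central])
log_mstar = 9.0 + phi + 0.05 * rng.normal(size=n)

error, importances = rfr_log_error(features, log_mstar)
```

The `feature_importances_` array is one way to rank features by their effect on the model output; in this toy case the first column should dominate by construction.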
From the number of near-optimal combinations we can conclude that there may be many similarly relevant predictors in our feature list. This supports our analytic reasoning that several of our halo features are reasonable proxies for halo mass, and we have already noted that our other halo features can be separated into only a few types of variables (halo size, concentration, formation time, and environment).

Indeed, if we optimize the RFR including one feature of each type we can reach an error of 0.095 (φ, M_{DM, r<Mpc}, t, r_{DM}, and c_v). This is within two standard deviations of the error obtained with the four halo parameters we use in our analytic model, and so does not indicate a dramatic improvement.

Comparing our results to the errors found using the RFR machine learning technique gives us assurance that our analytic method for including extra halo features is reasonable, and that our conclusions are not strongly model-dependent. While continuing to add features can reduce the error on the matching scheme, we do not find other clear DM halo features that dramatically improve upon our analytic method.

4.3.2. Cross-Validation

In order to obtain another view on whether increasing the number of halo features improves our estimate of stellar mass we can use cross-validation. This can be used to determine how meaningful our derived improvements are when used to predict the stellar mass of galaxies. Cross-validation is specifically designed to trade off over- and under-fitting to give the highest prediction accuracy. For this, we randomly select 80% of our sample as our training set, on which we perform the fitting processes as described. We use the remaining 20% as our test set to determine if the percent error on the stellar mass prediction improves when including more features. Specifically, we select 80% of our total sample for the "all" fits, and then 80% of the central and satellite samples, in order to determine the "central", "satellite", and "mix" fits.

We performed this cross-validation routine ten times using ten different random subsets of the data, and universally find improvement in both the training and test sets when using M_{DM, r<Mpc}, t, or their combination. In Table 3, we list the median percent error values for the ten sets of training and test samples. We can conclude that we have not yet overfit using these halo features, and our improvement in predicting stellar masses from halo masses is real, albeit small.

[Table 3: median percent error for the training and test samples. Columns: All, Centrals, Satellites, Mix. Rows: φ; φ + M_{DM, r<Mpc}; φ + t; φ + M_{DM, r<Mpc} + t.]

DISCUSSION

What have we learned from this exercise? The zeroth order conclusion is that a matching scheme based on the maximum velocity in a dark matter halo is a good single predictor of the final stellar mass for normal galaxies, whether they are central galaxies or satellites. The typical error in the prediction (in the IllustrisTNG100 simulations) is 11.6 percent in log(M_*) compared to 19.8 percent using M_{DM}, and the dependence of stellar mass on v_{max} is unsurprisingly log(M_*) ∼ (3.8 ± …) log(v_{max}) (using bootstrap resampling with 70% of the dataset).
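
A bootstrap estimate of a slope such as the one quoted above can be sketched as follows. This is a minimal illustration under stated assumptions: each resample draws 70% of the sample with replacement, the fit is an ordinary least-squares line in log space, and the data are synthetic with an input slope of 3.8. The function name `bootstrap_slope` and all parameter values are hypothetical, and the exact resampling procedure used in this work may differ.

```python
import numpy as np

def bootstrap_slope(log_vmax, log_mstar, frac=0.7, n_boot=200, seed=0):
    """Mean and scatter of the fitted slope over bootstrap subsamples."""
    rng = np.random.default_rng(seed)
    n = len(log_vmax)
    k = int(frac * n)  # size of each resample (70% of the sample by default)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        # Assumption: resample indices with replacement.
        idx = rng.choice(n, size=k, replace=True)
        # Degree-1 polyfit returns (slope, intercept).
        slope, _intercept = np.polyfit(log_vmax[idx], log_mstar[idx], 1)
        slopes[i] = slope
    return slopes.mean(), slopes.std(ddof=1)

# Synthetic stand-in data with a known input slope of 3.8.
rng = np.random.default_rng(1)
log_vmax = rng.normal(2.2, 0.2, 2000)
log_mstar = 3.8 * log_vmax + 1.5 + rng.normal(0.0, 0.15, 2000)

slope_mean, slope_std = bootstrap_slope(log_vmax, log_mstar)
print(f"slope = {slope_mean:.2f} +/- {slope_std:.2f}")
```

The scatter of the fitted slopes across resamples serves as the quoted uncertainty on the slope.
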
This scaling is just what one would have expected from the simplest physical argument that estimates the amount of gas that can be turned into stars in the standard Gunn and Gott (1972) collapse of a dark matter halo.

But, for high mass systems comparable to the first brightest galaxies in clusters, living in halos more massive than 10^{12.5} M_{\odot}, the accretion of satellite systems will significantly increase the stellar mass and the most relevant halo parameter is simply the peak dark matter mass, M_{peak}. Using a single variable, φ (eqn 12), which incorporates both features reduces the error to 10.5% when satellites and centrals are ranked separately.

These prescriptions should be easy to implement and can replace the simplest, halo mass based initial matching schemes when estimating the expected galaxy stellar masses given a dark matter simulation.

If one wants to go farther and improve the best zeroth order scheme by first order corrections then we have found that a roughly 6% improvement is possible. Interestingly, the environmental measures that we considered did not lead to significant improvement even in satellite galaxies, and the best single variable for improvement was t, the time at which a halo reached 85% of its peak mass.

However, an almost mindless combination of the two variables (v_{max}, M_{peak}) worked best. We further found that a simple linear combination based on these two variables enables predictions to a typical accuracy of 10.5 percent error in log(M_*).

TESTS

All of these results are based on simulated data and it is important to test them in the real world. We have been able to think of two tests that might be applied to help determine whether the proposed matching scheme provides a significant improvement over the simplest matching scheme.
First one constructs a standard LCDM, dark matter only simulation and, using a standard halo finding algorithm, makes a catalog of dark matter halos, labeling each of them with the final dark matter mass, M_{DM}, the peak dark matter mass over the history of the halo, M_{peak}, and the current halo maximum circular velocity, v_{max}. Then, to test the classic matching scheme (as has been done before; Conroy et al. 2006), one takes a representative volume and rank orders the halos by M_{DM}, takes catalog values for a comparable volume (from, say, the Sloan Digital Sky Survey), rank orders the observed galaxies by (for example) g or r magnitudes, and then identifies each DM halo with the real galaxy of matching rank. This gives one an artificial catalog of galaxies each tagged with a position, a velocity and a g or r optical magnitude.

Then one would "observe" this synthesized catalog and construct two spatial distribution functions, a galaxy-galaxy spatial correlation function (Conroy et al. 2006; Hearin et al. 2013; Hearin and Watson 2013; Reddick et al. 2013) and a void distribution function (e.g. Walsh and Tinker 2019). These could then be compared to the known galaxy-galaxy spatial correlation functions and the known void distribution functions, with both one parameter functions specified as a function of magnitude. The magnitude distribution itself is of course correct by construction. Then comparing, say, the autocorrelation length as a function of galaxy magnitude between the real and synthesized data sets allows one to determine the fractional error as a function of galaxy magnitude.

Then one would go back to the original DM halo catalog and, using (M_{peak}, v_{max}), construct for each halo the value of φ = (v_{max}/v_{max,12.7}) + (M_{peak}/M_{12.7}) (equation 13), where (v_{max,12.7}, M_{12.7}) are the values of (v_{max}, M_{peak}) for the average halo of mass 10^{12.7} M_{\odot}.
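
The construction of φ and the rank-order assignment described above can be sketched in a few lines. This is only an illustration: the function names `phi` and `rank_match`, the reference velocity `v_ref = 250.0` km/s, and the four-halo toy catalog are hypothetical placeholders, and in practice the reference velocity would be measured from the halo catalog itself. The sketch ranks galaxies by luminosity (larger is brighter); for magnitudes the galaxy ordering would be reversed.

```python
import numpy as np

def phi(vmax, mpeak, v_ref, m_ref=10**12.7):
    """phi = v_max / v_ref + M_peak / m_ref for each halo."""
    return np.asarray(vmax) / v_ref + np.asarray(mpeak) / m_ref

def rank_match(halo_score, galaxy_value):
    """Give the k-th largest galaxy value to the k-th largest halo score."""
    halo_score = np.asarray(halo_score, dtype=float)
    order = np.argsort(halo_score)[::-1]          # halo indices, best first
    sorted_values = np.sort(galaxy_value)[::-1]   # galaxy values, largest first
    assigned = np.empty(len(halo_score), dtype=float)
    assigned[order] = sorted_values
    return assigned

# Toy halo catalog (hypothetical values): v_max in km/s, M_peak in solar masses.
vmax = np.array([150.0, 300.0, 90.0, 220.0])
mpeak = np.array([1e12, 8e12, 2e11, 3e12])

# Toy observed galaxy luminosities in arbitrary units.
luminosity = np.array([1.0, 5.0, 3.0, 0.5])

# v_ref is a placeholder for v_max of the average 10^12.7 solar mass halo.
halo_phi = phi(vmax, mpeak, v_ref=250.0)
assigned = rank_match(halo_phi, luminosity)
print(assigned)
```

Calling `rank_match` with M_{DM} in place of `halo_phi` would give the classic matching scheme for the same catalog, so the two synthetic catalogs can be compared directly.
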
Now, with each halo tagged with its value of φ, one can rank order the synthetic sample by φ and attach visual magnitudes to each galaxy by the same method as was done using M_{DM}. Now one has a new catalog to observe with respect to spatial distribution metrics and can again find the fractional error in, for example, the spatial autocorrelation length as a function of visual brightness by comparing to real observed data.

This procedure would give us a quantitative estimate as to how well the matching scheme was working compared to both reality and the previous simpler matching scheme which has had considerable success. And, unlike the exercises in this paper, the tests would not be dependent on the accuracy of our current galaxy formation algorithms, which, while well tested, suffer from the "confirmation bias" inevitable when uncertain modelling parameters are adjusted to fit observations. We look forward to pursuing these independent tests in future work.

CONCLUSIONS

In this paper we have examined schemes to populate a synthesized dark matter only set of cosmological simulations with galaxies to see if we could devise a simple and accurate scheme. We took as our starting point a matching scheme (Vale and Ostriker 2004, 2006, 2008) which, while almost naively simple, has had some success. In that scheme, one rank orders DM halos by final mass and rank orders real galaxies in a similar cosmic volume by luminosity and attaches to the kth ranked halo the kth ranked galaxy. Table 1 presents the results of our one parameter efforts, which we compared to the computed luminosities in the IllustrisTNG simulated galaxy catalog. Table 2 summarizes our results with two parameter fits where we used combinations of velocity dispersion, mass and environmental density.
We did not find that adding an environmental variable produced a significant improvement over simpler schemes, nor did we find that any of the two parameter fits that we investigated were statistically significantly superior to the one parameter fits. What we did discover was that a new single variable, φ (cf. equation 12), which combines information from both mass and velocity variables, provides a quite significant improvement over the basic ranking scheme using final dark matter mass, the error being reduced by about 33 percent. In our examination of the physical basis for the success of this new variable we examined simple arguments starting with the nearly half century old paper by Rees and Ostriker (1977).

There is a critical mass for galaxies: the mass above which the gas cannot cool by normal radiative processes in roughly a free fall time. That mass corresponds roughly to 10^{…} M_{\odot} which we designate as M_1. Below this mass there is a simple analytic argument that asks if a gaseous object can cool in its own free fall time and (assuming bremsstrahlung cooling) we noted that this condition is equivalent to the Faber-Jackson relation, i.e. that M_* ∼ v_{max}^4. For masses above M_1 growth only occurs by accretion of satellites and that is proportional to M. So, we designed a metric, φ, which is dominated by velocity for low mass objects and dominated by mass for high mass objects more massive than M_1. This single variable, based on the physical motivation given above, seems to provide a matching scheme superior to others which we have tested. We did try other combinations of (M_{peak}, v_{max}) and found none superior to the simple variable, φ, that we had tested.
So our bottom line is that the variable, φ (eq. 12), is the best single variable to use in predicting the stellar mass of galaxies, given their halo properties.

ACKNOWLEDGMENTS

ST would like to thank Claire Kopenhafer and Tjitske Starkenburg for their help and scripts in reading in and analyzing TNG outputs, Viviana Acquaviva for her Machine Learning class and comments on the draft, and Dan Foreman-Mackey for discussions and comments on cross-validation. ST gratefully acknowledges support from the Center for Computational Astrophysics at the Flatiron Institute, which is supported by the Simons Foundation. The data used in this work were hosted on facilities supported by the Scientific Computing Core at the Flatiron Institute, a division of the Simons Foundation.

REFERENCES

Behroozi, P. S., Wechsler, R. H., and Conroy, C. (2013). On the Lack of Evolution in Galaxy Star Formation Efficiency. ApJL, 762(2):L31.
Berrier, J. C., Bullock, J. S., Barton, E. J., Guenther, H. D., Zentner, A. R., and Wechsler, R. H. (2006). Close Galaxy Counts as a Probe of Hierarchical Structure Formation. ApJ, 652(1):56–70.
Blank, M., Macciò, A. V., Dutton, A. A., and Obreja, A. (2019). NIHAO - XXII. Introducing black hole formation, accretion, and feedback into the NIHAO simulation suite. MNRAS, 487(4):5476–5489.
Bose, S., Eisenstein, D. J., Hernquist, L., Pillepich, A., Nelson, D., Marinacci, F., Springel, V., and Vogelsberger, M. (2019). Revealing the galaxy-halo connection in IllustrisTNG. MNRAS, 490(4):5693–5711.
Chaves-Montero, J., Angulo, R. E., Schaye, J., Schaller, M., Crain, R. A., Furlong, M., and Theuns, T. (2016). Subhalo abundance matching and assembly bias in the EAGLE simulation. MNRAS, 460(3):3100–3118.
Conroy, C., Wechsler, R. H., and Kravtsov, A. V. (2006). Modeling Luminosity-dependent Galaxy Clustering through Cosmic Time. ApJ, 647(1):201–214.
Crain, R. A., Schaye, J., Bower, R. G., Furlong, M., Schaller, M., Theuns, T., Dalla Vecchia, C., Frenk, C. S., McCarthy, I. G., Helly, J. C., Jenkins, A., Rosas-Guevara, Y. M., White, S. D. M., and Trayford, J. W. (2015). The EAGLE simulations of galaxy formation: calibration of subgrid physics and model variations. MNRAS, 450(2):1937–1961.
Davé, R., Thompson, R., and Hopkins, P. F. (2016). MUFASA: galaxy formation simulations with meshless hydrodynamics. MNRAS, 462(3):3265–3284.
Dragomir, R., Rodríguez-Puebla, A., Primack, J. R., and Lee, C. T. (2018). Does the galaxy-halo connection vary with environment? MNRAS, 476(1):741–758.
Eggen, O. J., Lynden-Bell, D., and Sandage, A. R. (1962). Evidence from the motions of old stars that the Galaxy collapsed. ApJ, 136:748.
Faber, S. M. and Jackson, R. E. (1976). Velocity dispersions and mass-to-light ratios for elliptical galaxies. ApJ, 204:668–683.
Genel, S., Vogelsberger, M., Springel, V., Sijacki, D., Nelson, D., Snyder, G., Rodriguez-Gomez, V., Torrey, P., and Hernquist, L. (2014). Introducing the Illustris project: the evolution of galaxy populations across cosmic time. MNRAS, 445(1):175–200.
Gunn, J. E. and Gott, J. Richard, III (1972). On the Infall of Matter Into Clusters of Galaxies and Some Effects on Their Evolution. ApJ, 176:1.
Guo, Q., White, S., Li, C., and Boylan-Kolchin, M. (2010). How do galaxies populate dark matter haloes? MNRAS, 404(3):1111–1120.
He, J.-h. (2020). Modelling the tightest relation between galaxy properties and dark matter halo properties from hydrodynamical simulations of galaxy formation. MNRAS, 493(3):4453–4462.
Hearin, A. P. and Watson, D. F. (2013). The dark side of galaxy colour. MNRAS, 435(2):1313–1324.
Hearin, A. P., Zentner, A. R., Berlind, A. A., and Newman, J. A. (2013). SHAM beyond clustering: new tests of galaxy-halo abundance matching with galaxy groups. MNRAS, 433(1):659–680.
Hopkins, P. F., Wetzel, A., Kereš, D., Faucher-Giguère, C.-A., Quataert, E., Boylan-Kolchin, M., Murray, N., Hayward, C. C., and El-Badry, K. (2018a). How to model supernovae in simulations of star and galaxy formation. MNRAS, 477(2):1578–1603.
Hopkins, P. F., Wetzel, A., Kereš, D., Faucher-Giguère, C.-A., Quataert, E., Boylan-Kolchin, M., Murray, N., Hayward, C. C., Garrison-Kimmel, S., Hummels, C., Feldmann, R., Torrey, P., Ma, X., Anglés-Alcázar, D., Su, K.-Y., Orr, M., Schmitz, D., Escala, I., Sanderson, R., Grudić, M. Y., Hafen, Z., Kim, J.-H., Fitts, A., Bullock, J. S., Wheeler, C., Chan, T. K., Elbert, O. D., and Narayanan, D. (2018b). FIRE-2 simulations: physics versus numerics in galaxy formation. MNRAS, 480(1):800–863.
Klypin, A. A., Trujillo-Gomez, S., and Primack, J. (2011). Dark Matter Halos in the Standard Cosmological Model: Results from the Bolshoi Simulation. ApJ, 740(2):102.
Kravtsov, A. V., Berlind, A. A., Wechsler, R. H., Klypin, A. A., Gottlöber, S., Allgood, B. o., and Primack, J. R. (2004). The Dark Side of the Halo Occupation Distribution. ApJ, 609(1):35–49.
Lehmann, B. V., Mao, Y.-Y., Becker, M. R., Skillman, S. W., and Wechsler, R. H. (2017). The Concentration Dependence of the Galaxy-Halo Connection: Modeling Assembly Bias with Abundance Matching. ApJ, 834(1):37.
Mao, Y.-Y., Williamson, M., and Wechsler, R. H. (2015). The Dependence of Subhalo Abundance on Halo Concentration. ApJ, 810(1):21.
Marín, F. A., Wechsler, R. H., Frieman, J. A., and Nichol, R. C. (2008). Modeling the Galaxy Three-Point Correlation Function. ApJ, 672(2):849–860.
Martizzi, D., Vogelsberger, M., Torrey, P., Pillepich, A., Hansen, S. H., Marinacci, F., and Hernquist, L. (2020). Baryons in the Cosmic Web of IllustrisTNG - II. The connection among galaxies, haloes, their formation time, and their location in the Cosmic Web. MNRAS, 491(4):5747–5758.
Matthee, J., Schaye, J., Crain, R. A., Schaller, M., Bower, R., and Theuns, T. (2017). The origin of scatter in the stellar mass-halo mass relation of central galaxies in the EAGLE simulation. MNRAS, 465(2):2381–2396.
Moliné, Á., Sánchez-Conde, M. A., Palomares-Ruiz, S., and Prada, F. (2017). Characterization of subhalo structural properties and implications for dark matter annihilation signals. MNRAS, 466(4):4974–4990.
Naab, T. and Ostriker, J. P. (2017). Theoretical Challenges in Galaxy Formation. ARA&A, 55(1):59–109.
Nagai, D. and Kravtsov, A. V. (2005). The Radial Distribution of Galaxies in Λ Cold Dark Matter Clusters. ApJ, 618(2):557–568.
Navarro, J. F., Frenk, C. S., and White, S. D. M. (1997). A Universal Density Profile from Hierarchical Clustering. ApJ, 490(2):493–508.
Nelson, D., Springel, V., Pillepich, A., Rodriguez-Gomez, V., Torrey, P., Genel, S., Vogelsberger, M., Pakmor, R., Marinacci, F., Weinberger, R., Kelley, L., Lovell, M., Diemer, B., and Hernquist, L. (2019). The IllustrisTNG simulations: public data release. Computational Astrophysics and Cosmology, 6(1):2.
Ostriker, J. P., Choi, E., Chow, A., and Guha, K. (2019). Mind the Gap: Is the Too Big to Fail Problem Resolved? ApJ, 885(1):97.
Pakmor, R., Bauer, A., and Springel, V. (2011). Magnetohydrodynamics on an unstructured moving grid. MNRAS, 418(2):1392–1401.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85):2825–2830.
Pillepich, A., Springel, V., Nelson, D., Genel, S., Naiman, J., Pakmor, R., Hernquist, L., Torrey, P., Vogelsberger, M., Weinberger, R., and Marinacci, F. (2018). Simulating galaxy formation with the IllustrisTNG model. MNRAS, 473(3):4077–4106.
Reddick, R. M., Wechsler, R. H., Tinker, J. L., and Behroozi, P. S. (2013). The Connection between Galaxies and Dark Matter Structures in the Local Universe. ApJ, 771(1):30.
Rees, M. J. and Ostriker, J. P. (1977). Cooling, dynamics and fragmentation of massive gas clouds: clues to the masses and radii of galaxies and clusters. MNRAS, 179:541–559.
Schaye, J., Crain, R. A., Bower, R. G., Furlong, M., Schaller, M., Theuns, T., Dalla Vecchia, C., Frenk, C. S., McCarthy, I. G., Helly, J. C., Jenkins, A., Rosas-Guevara, Y. M., White, S. D. M., Baes, M., Booth, C. M., Camps, P., Navarro, J. F., Qu, Y., Rahmati, A., Sawala, T., Thomas, P. A., and Trayford, J. (2015). The EAGLE project: simulating the evolution and assembly of galaxies and their environments. MNRAS, 446(1):521–554.
Somerville, R. S. and Davé, R. (2015). Physical Models of Galaxy Formation in a Cosmological Framework. ARA&A, 53:51–113.
Springel, V. (2010). E pur si muove: Galilean-invariant cosmological hydrodynamical simulations on a moving mesh. MNRAS, 401(2):791–851.
Tasitsiomi, A., Kravtsov, A. V., Wechsler, R. H., and Primack, J. R. (2004). Modeling Galaxy-Mass Correlations in Dissipationless Simulations. ApJ, 614(2):533–546.
Trujillo-Gomez, S., Klypin, A., Primack, J., and Romanowsky, A. J. (2011). Galaxies in ΛCDM with Halo Abundance Matching: Luminosity-Velocity Relation, Baryonic Mass-Velocity Relation, Velocity Function, and Clustering. ApJ, 742(1):16.
Vale, A. and Ostriker, J. P. (2004). Linking halo mass to galaxy luminosity. MNRAS, 353(1):189–200.
Vale, A. and Ostriker, J. P. (2006). The non-parametric model for linking galaxy luminosity with halo/subhalo mass. MNRAS, 371(3):1173–1187.
Vale, A. and Ostriker, J. P. (2008). A non-parametric model for linking galaxy luminosity with halo/subhalo mass: are brightest cluster galaxies special? MNRAS, 383(1):355–368.
van Dokkum, P. G., Whitaker, K. E., Brammer, G., Franx, M., Kriek, M., Labbé, I., Marchesini, D., Quadri, R., Bezanson, R., Illingworth, G. D., Muzzin, A., Rudnick, G., Tal, T., and Wake, D. (2010). The Growth of Massive Galaxies Since z = 2. ApJ, 709(2):1018–1041.
Vogelsberger, M., Genel, S., Springel, V., Torrey, P., Sijacki, D., Xu, D., Snyder, G., Nelson, D., and Hernquist, L. (2014). Introducing the Illustris Project: simulating the coevolution of dark and visible matter in the Universe. MNRAS, 444(2):1518–1547.
Walsh, K. and Tinker, J. (2019). Probing Galaxy assembly bias in BOSS galaxies using void probabilities. MNRAS, 488(1):470–479.
Wang, L., Dutton, A. A., Stinson, G. S., Macciò, A. V., Penzo, C., Kang, X., Keller, B. W., and Wadsley, J. (2015). NIHAO project - I. Reproducing the inefficiency of galaxy formation across cosmic time with a large sample of cosmological hydrodynamical simulations. MNRAS, 454(1):83–94.
Weinberger, R., Springel, V., Hernquist, L., Pillepich, A., Marinacci, F., Pakmor, R., Nelson, D., Genel, S., Vogelsberger, M., Naiman, J., and Torrey, P. (2017). Simulating galaxy formation with black hole driven thermal and kinetic feedback. MNRAS, 465(3):3291–3308.
Xu, X. and Zheng, Z. (2018). Dependence of halo bias and kinematics on assembly variables. MNRAS, 479(2):1579–1594.
Zentner, A. R., Hearin, A. P., and van den Bosch, F. C. (2014). Galaxy assembly bias: a significant source of systematic error in the galaxy-halo relationship.