A flexible subhalo abundance matching model for galaxy clustering in redshift space

Abstract

We develop an extension of subhalo abundance matching (SHAM) capable of accurately reproducing the real and redshift-space clustering of galaxies in a state-of-the-art hydrodynamical simulation. Our method uses a low-resolution gravity-only simulation and includes orphan and tidal disruption prescriptions for satellite galaxies, and a flexible amount of galaxy assembly bias. Furthermore, it includes recipes for star formation rate (SFR) based on the dark matter accretion rate. We test the accuracy of our model against catalogues of stellar-mass- and SFR-selected galaxies in the TNG300 hydrodynamic simulation. By fitting a small number of free parameters, our extended SHAM reproduces the projected correlation function and redshift-space multipoles for number densities $10^{-3}$ to $10^{-2}\,h^{3}\,{\rm Mpc}^{-3}$, at $z=1$ and $z=0$, and for scales $r \in [0.3, 20]\,h^{-1}\,{\rm Mpc}$. Simultaneously, the SHAM results also retrieve the correct halo occupation distribution, the level of galaxy assembly bias, and higher-order statistics present in the TNG300 galaxy catalogues. As an application, we show that our model simultaneously fits the projected correlation function of the SDSS in 3 disjoint stellar mass bins, with an accuracy similar to that of TNG300 galaxies. This SHAM extension can be used to get accurate clustering predictions even when using low- and moderate-resolution simulations.


MNRAS, 1–13 (2020). Preprint 15 December 2020. Compiled using MNRAS LaTeX style file v3.0.


S. Contreras,★ R. E. Angulo & M. Zennaro
Donostia International Physics Center (DIPC), Manuel Lardizabal Ibilbidea, 4, 20018 Donostia, Gipuzkoa, Spain.
IKERBASQUE, Basque Foundation for Science, 48013, Bilbao, Spain.

Accepted XXX. Received YYY; in original form ZZZ


Key words: cosmology: theory - galaxies: evolution - galaxies: formation - galaxies: haloes - galaxies: statistics - large-scale structure of universe

With a new generation of surveys soon to start collecting data, a new generation of theoretical predictions for galaxy clustering is needed. These “mock” galaxies are essential to prepare for these observations, estimate sources of uncertainty, and exploit them scientifically. Due to the large volume and statistical power of upcoming surveys, this new generation of mocks will need to be accurate and computationally efficient.

In the ΛCDM paradigm, galaxies form from the gas that is captured in the gravitational potential well of dark matter haloes (White & Rees 1978). As this gas cools, it falls to the centre of haloes and settles into a disk, to then cool and fragment into stars. While the evolution of a halo is somewhat simple, growing in mass through accretion and mergers determined mostly by gravitational interactions, the evolution of galaxies is much more complex, with a large number of additional astrophysical processes involved. Hydrodynamic simulations are probably the most realistic method to model these processes; however, they can take up to hundreds of millions of CPU hours to execute, as they require high spatial and mass resolution. This implies that most of these simulations have only reached volumes of the order of $100\,h^{-1}\,{\rm Mpc}$ a side (e.g. EAGLE, Schaye et al. 2015; Illustris, Vogelsberger et al. 2014; HorizonAGN, Dubois et al. 2014; Simba, Davé et al. 2019).

★ E-mail: [email protected]

Other techniques which simplify the modelling of some astrophysical processes, like semi-analytical models of galaxy formation (SAMs; Henriques et al. 2015; Stevens et al. 2016; Lacey et al. 2016; Croton et al. 2016; Lagos et al. 2018) or semi-empirical models such as EMERGE (Moster et al. 2018) or The Universe Machine (Behroozi et al. 2019), can be run in post-processing over Gpc-scale gravity-only simulations in hundreds of CPU hours (for a single run, not counting calibration). While much faster than hydrodynamic simulations, this is still too slow when thousands or even millions of mocks are needed to, e.g., scan high-dimensional parameter spaces or compute covariance matrices. An additional limitation is that these approaches require underlying simulations with considerably high force and mass resolution.

Although the evolution of galaxies and haloes is different, we expect a strong correlation between their properties, which motivates “empirical models” as an alternative to create fast and computationally efficient mocks. These models “paint” galaxies in haloes or subhaloes using theoretically motivated relations between the galaxies and their host (sub)haloes. These methods can be run in a few minutes over large cosmological volumes, and have proved to successfully reproduce several galaxy observables, such as the galaxy two-point correlation function. One example of an empirical model is the SubHalo Abundance Matching technique (SHAM; Conroy et al. 2006; Reddick et al. 2013; Chaves-Montero et al. 2016; Lehmann et al. 2017; Dragomir et al. 2018). This technique is based on the idea that the most massive subhaloes should host the most massive/luminous galaxies. The algorithm matches, with some scatter, the selected subhalo property to the expected stellar mass/luminosity function.
While simple, the model is capable of reproducing the galaxy clustering in complex galaxy formation models, such as hydrodynamic simulations (e.g. Chaves-Montero et al. 2016).

The main limitation of SHAM is that it assumes a direct relation between the fate of a dark matter satellite and the putative galaxy it might host. This assumption does not hold in detail, as the presence of a galaxy would modify, e.g., dynamical friction timescales and/or the resilience of a subhalo against tidal disruption. Furthermore, in relatively low-resolution simulations, a subhalo might be stripped below the resolution limit of the simulation, but the hosted galaxy is still expected to survive as a satellite. Another shortcoming of SHAM is that it is limited to properties expected to correlate strongly with dark matter mass, so it cannot make predictions for star formation rates, which would be relevant for upcoming galaxy surveys targeting emission line galaxies.

In this paper, we develop an extension to the standard subhalo abundance matching technique that, with only 3 free parameters (5 if also including assembly bias) and a low-resolution dark-matter-only (DMO) simulation, can reproduce the clustering of stellar-mass-selected galaxies in real and redshift space with high precision. This extension includes: the use of orphan satellite subhaloes/galaxies, tracking structures below the resolution level of the simulation; a disruption mechanism to account for satellite galaxies that should have been disrupted to form part of the intracluster medium, or whose stellar mass has decreased so much that they should no longer be part of the galaxy selection; and a flexible amount of galaxy assembly bias to account for any possible difference between the correlation with large-scale environment in SHAM and in our target sample.

Along with these improvements, we develop a method capable of predicting the clustering of SFR-selected galaxies.
This method assumes that the SFR of each galaxy is proportional to the dark matter accretion rate of its host halo, modulated by an efficiency that depends on the mass of its host halo. This approach is similar to that employed by the semi-empirical model EMERGE, but adapted to use properties readily available in SHAM instead of the full merger tree.

To test the performance of our SHAM extension, we use the Illustris TNG300 magneto-hydrodynamic simulation (Nelson et al. 2018), to our knowledge the largest publicly available cosmological hydrodynamic simulation today ($L = 205\,h^{-1}\,{\rm Mpc}$). For the stellar-mass-selected mocks, we find that our method is capable of reproducing almost perfectly the real and redshift-space clustering, as well as the galaxy assembly bias level, halo occupation distribution (HOD), and higher-order statistics as quantified by the k-nearest neighbour cumulative distribution functions (kNN-CDF). Similarly, for the SFR-selected sample, we find that we are also able to reproduce the same galaxy statistics, albeit with somewhat less accuracy, particularly for the level of assembly bias. The differences in the galaxy clustering between the star-forming mock and the TNG300 are still much smaller than those found among different galaxy formation models (Contreras et al. 2013). We expect this new generation of mocks to be useful for the development and understanding of current and future galaxy surveys.

The outline of this work is as follows. In section 2 we present the simulations used in this work and introduce the SHAM technique. Our modifications to SHAM regarding the treatment of satellite galaxies are discussed in section 3 and those regarding galaxy assembly bias in section 4. In section 5 we present our star-formation-rate modelling. We compare the 2-point clustering of our mock galaxies with that of the TNG300 in section 6 and with observations in section 7. To finalise, we present our conclusions in section 8.

Unless otherwise stated, the units in this paper are $h^{-1}\,{\rm M}_\odot$ for masses, $h^{-1}\,{\rm Mpc}$ for distances, km/s for velocities and ${\rm M}_\odot$/yr for star formation rates.
To test the accuracy of our mocks, we will compare their clustering against that of galaxies in the TNG300 magneto-hydrodynamic simulation, the largest of “The Next Generation” Illustris simulation suite and, to our knowledge, the largest-volume high-resolution hydrodynamic simulation publicly available (Nelson et al. 2018; Springel et al. 2018; Marinacci et al. 2018; Pillepich et al. 2018; Naiman et al. 2018).

The TNG300 simulation follows a periodic box of $205\,h^{-1}\,{\rm Mpc}$ ($\sim 300$ Mpc) a side. It used $2500^3$ dark matter particles and gas cells, implying a baryonic mass resolution of $7.[\ldots]\times10^{[\ldots]}\,h^{-1}\,{\rm M}_\odot$ and of $3.[\ldots]\times10^{[\ldots]}\,h^{-1}\,{\rm M}_\odot$ for dark matter. The simulations were carried out using the AREPO code (Springel 2010), adopting cosmological parameters consistent with recent analyses (Planck Collaboration et al. 2016): $\Omega_{\rm dm} = 0.3089$, $\Omega_{\rm b} = 0.0486$, $\sigma_8 = 0.8159$, $n_{\rm s} = 0.9667$ and $h = 0.6774$. While the main results of this paper are only shown at $z = 0$, we tested that they are also valid at $z = 1$.

We built galaxy catalogues by selecting the most massive or most star-forming galaxies such that their abundance equals a number density of $n = 0.01$, $0.00316$ and $0.001\,h^{3}\,{\rm Mpc}^{-3}$, equivalent to minimum stellar masses of $8.[\ldots]\times10^{[\ldots]}$, $[\ldots]\times10^{[\ldots]}$ and $6.[\ldots]\times10^{[\ldots]}\,h^{-1}\,{\rm M}_\odot$ and minimum SFRs of 0.468, 1.493 and 3.033 ${\rm M}_\odot$/yr, respectively. We define the stellar mass of a galaxy as the sum of the mass of all particles within the stellar half mass radius. We also tested defining the stellar mass of a galaxy as the sum of the masses of all particles of a subhalo, finding similar results. We define the SFR of a galaxy as the sum of the individual star formation rates of all gas cells in its subhalo.

We note that, although the TNG300 simulation has been shown to agree with many observables (Springel et al. 2018), it is not the aim of this work to create mocks that only resemble this simulation, but to create a flexible model that can describe a broader range of feasible galaxy formation physics and their correlation with cosmological parameters. We chose a hydrodynamic simulation as a benchmark since it models baryons and dark matter jointly, unlike other approaches that can reproduce galaxy clustering (such as HODs, semi-empirical models or semi-analytical models). This adds an extra challenge for mocks created in DMO simulations, allowing us to perform a stringent test of the performance of our SHAM galaxies.

In addition to the TNG300, we also use the TNG300-1-Dark, TNG300-2-Dark & TNG300-3-Dark simulations. These are DMO simulations employing the same initial white noise field as the TNG300 run. The TNG300-1-Dark has the same number of dark matter particles as the TNG300 ($2500^3$), whereas the TNG300-2-Dark and TNG300-3-Dark have factors of $2^3$ and $4^3$ fewer particles (i.e. $1250^3$ and $625^3$ particles, respectively). For some comparisons, we will use the cross-match of TNG galaxies and subhaloes in the TNG300-1-Dark, available on the Illustris TNG website.

Figure 1. The stellar mass of the galaxies of the TNG300 simulation as a function of $V_{\rm peak}$ of the matched subhaloes in the TNG300-1-Dark simulation. The shaded regions represent 1, 2 & 3$\sigma$ of the distribution, and the dashed red and blue lines show the median of the central and satellite galaxies, respectively. As a reference, the three horizontal dotted lines mark the three number densities used in this work, as labelled.

The main dark matter simulation we will employ is analogous to the TNG300-3-Dark (same volume and initial conditions as the TNG300, but a factor of $4^3$ fewer particles), but with additional properties computed for its subhaloes, as we describe below. We refer to this simulation as the TNG300-3-mimic, and highlight that using a simulation with the same initial conditions as the TNG300 allows more accurate comparisons, as it reduces any difference between the models caused by cosmic variance.

The simulation was carried out with an updated version of L-Gadget3 (Angulo et al. 2012), a lean version of GADGET (Springel et al. 2005) used to run the Millennium XXL simulation and the Bacco Simulations (Angulo et al. 2020). This version of the code allows an on-the-fly identification of haloes and subhaloes using a Friends-of-Friends algorithm (FOF; Davis et al. 1985) and an extended version of SUBFIND (Springel et al. 2001). Our updated version of SUBFIND can better identify substructures by considering the information of their past history, while also measuring properties that are non-local in time, such as the peak halo mass ($M_{\rm peak}$), peak maximum circular velocity ($V_{\rm peak}$), infall subhalo mass ($M_{\rm infall}$), and mass accretion rate, among others.

We note that we employ the same numerical accuracy parameters as the Bacco simulations (Angulo et al. 2020). As a validation test, we have compared the $z =$ […]

In its most basic incarnation, SHAM assumes a one-to-one mapping between the mass of subhaloes and the stellar mass or luminosity of hypothetically hosted galaxies. More recent implementations of SHAM add scatter to this mapping and employ other subhalo properties, such as $M_{\rm infall}$, $M_{\rm peak}$, $V_{\rm max}$ or $V_{\rm peak}$, but it is still assumed that the position and fate of galaxies are determined by the evolution of dark matter subhaloes. This, however, cannot be true in detail, since subhaloes might be artificially disrupted in $N$-body simulations (because of numerical artifacts and/or limited mass resolution). More generally, the presence of a galaxy can modify the subhalo density profile, affecting its orbit, tidal stripping, and eventual disruption. This puts a limit on the accuracy of SHAM, which we illustrate next.

Fig. 1 compares the stellar mass of TNG300 galaxies with the value of $V_{\rm peak}$ (the peak value of the maximum circular velocity of a halo throughout its life) of the corresponding cross-matched subhalo in the TNG300-1-Dark. Note that Chaves-Montero et al. (2016) showed that in the EAGLE simulation (Schaye et al. 2015) $V_{\rm peak}$ is one of the properties that best predicts the stellar mass. Indeed, we can see that there is a strong correlation between these two properties in TNG300, with the mean relation being almost identical for central and satellite galaxies, validating a key SHAM assumption. Chaves-Montero et al.
(2016) also found that SHAM using $V_{\rm peak}$ was capable of reproducing the clustering of stellar-mass-selected galaxies in EAGLE with very high accuracy. However, these authors used a very high resolution dark-matter simulation (which would imply that artificial disruption of subhaloes was negligible), and the accuracy on small scales varied depending on the assumptions regarding star formation in satellites and their tidal disruption.

Note that, although not shown here, we found a similar scatter between $M_{\rm peak}$ and stellar mass, but with a significantly larger difference between central and satellite galaxies, which further motivates the use of $V_{\rm peak}$ instead of $M_{\rm peak}$ (see also the discussion in Campbell et al. 2018).

We illustrate the impact of mass resolution for SHAM in Fig. 2. We show the projected correlation function, $w_{\rm p}(r_{\rm p})$, and the monopole, quadrupole, and hexadecapole of the redshift-space correlation function ($\xi_{\ell=0}(r)$, $\xi_{\ell=2}(r)$, $\xi_{\ell=4}(r)$, respectively) for stellar-mass-selected galaxies at $z = 0$. Solid lines show the results for TNG galaxies, whereas coloured lines show the SHAM predictions as computed using the TNG300-1-Dark, TNG300-2-Dark & TNG300-3-Dark simulations. Note that we display results for three different number densities.

We see that SHAM in the TNG300-1-Dark agrees remarkably well with the TNG measurements for the densest galaxy sample, but it overpredicts the clustering for lower number densities. Conversely, SHAM in the lowest resolution simulation, TNG300-3-Dark, agrees very well with the sparsest sample but heavily underpredicts the clustering of the densest sample. Note that these problems are relatively confined to small scales in real space, but can affect even large scales in the case of the quadrupole and hexadecapole.

All this is simply a consequence of the fact that, in standard SHAM, the abundance of satellites depends on the numerical resolution of the parent dark matter simulation, which might result in an excess or lack of satellite galaxies. For instance, galaxies in the TNG300 can be affected by stripping of the stellar mass, while subhaloes never decrease their $V_{\rm peak}$, producing an overabundance of satellite galaxies (Smith et al. 2016). Overall, this highlights an important limitation of SHAM and motivates our development of an extension to SHAM that addresses these problems.

Figure 2. The projected correlation function ($w_{\rm p}(r_{\rm p})$, leftmost panel), the monopole ($\xi_{\ell=0}(r)$, second panel), quadrupole ($\xi_{\ell=2}(r)$, third panel) and hexadecapole ($\xi_{\ell=4}(r)$, rightmost panel) of the galaxies of the TNG300 simulation (black solid line) and the subhaloes of the TNG300-1-Dark, TNG300-2-Dark & TNG300-3-Dark dark-matter-only simulations (cyan, magenta and yellow dashed lines, respectively). The galaxy/subhalo samples have number densities of $n = 0.01$, $0.00316$ & $0.001\,h^{3}\,{\rm Mpc}^{-3}$, selected by stellar mass (for the TNG300) and $V_{\rm peak}$ (for the dark-matter-only simulations). For visualisation purposes, the lines for the two lowest number densities have been displaced along the y-axis: by 0.7 and 1.4 in $w_{\rm p}(r_{\rm p})$ and $\xi_{\ell=0}(r)$, and by 20 and 40 in $\xi_{\ell=2}(r)$ and $\xi_{\ell=4}(r)$, with the lowest number density at the top.

We use the subhalo abundance matching technique (SHAM) with $V_{\rm peak}$ as the starting point of our mocks. Firstly, we match the value of $V_{\rm peak}$ of the dark-matter-only simulation to the stellar mass function of the TNG300 simulation. Since most of our analysis is done using cuts in number density, the results are mostly independent of the stellar mass function used. We then i) add a free parameter controlling the scatter ($\sigma_{\log M}$) between the stellar mass of galaxies and their $V_{\rm peak}$, ii) explicitly model the level of assembly bias (following the procedure presented in Contreras et al. 2020a), and iii) regulate the disruption of satellites and the star formation rate with recipes inspired by semi-analytic galaxy formation models and by the semi-empirical model EMERGE (Moster et al. 2018). In the next sections we will discuss our prescriptions and show the impact that the free parameters of our model have on the predicted redshift-space clustering of galaxies.
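The first two steps above (rank-matching $V_{\rm peak}$ to a stellar mass function, then adding scatter) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of a sorted sample of target stellar masses in place of an analytic stellar mass function, and the choice to add Gaussian scatter in log mass and then re-rank (so that the target mass distribution is preserved exactly) are all assumptions made for illustration.

```python
import numpy as np


def sham_assign(v_peak, target_log_mstar, sigma_logm, rng):
    """Rank-match V_peak to a target sample of log stellar masses.

    Hypothetical helper: scatter is applied by perturbing the no-scatter
    masses with Gaussian noise of width sigma_logm (in dex) and re-ranking,
    which keeps the target mass distribution unchanged.
    """
    v_peak = np.asarray(v_peak, dtype=float)
    n = v_peak.size
    target = np.sort(np.asarray(target_log_mstar, dtype=float))[::-1]
    if target.size < n:
        raise ValueError("need at least one target mass per subhalo")
    target = target[:n]  # the n most massive target galaxies

    # Step 1: no scatter, largest V_peak gets the largest stellar mass.
    order = np.argsort(-v_peak)
    log_m = np.empty(n)
    log_m[order] = target

    # Step 2: perturb, then re-assign the same target masses by noisy rank.
    noisy = log_m + rng.normal(0.0, sigma_logm, size=n)
    noisy_order = np.argsort(-noisy)
    log_m_scattered = np.empty(n)
    log_m_scattered[noisy_order] = target
    return log_m_scattered


def select_by_number_density(values, number_density, box_size):
    """Indices of the N = n * L^3 objects with the largest `values`."""
    n_gal = int(round(number_density * box_size**3))
    return np.argsort(-np.asarray(values, dtype=float))[:n_gal]
```

Because the sample is cut at fixed number density, only the ranking of the assigned masses matters, which is consistent with the statement above that the results are mostly independent of the stellar mass function used.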

In this section, we introduce two SHAM modifications related to the treatment of satellite galaxies. In section 3.1 we describe how we follow subhaloes after they can no longer be identified by our simulations (also known as “orphan subhaloes”), and in section 3.2 we explain our implementation to account for tidal disruption and stripping of stars.

Due to the finite mass resolution of dark matter simulations, subhaloes can no longer be identified after they have lost a given fraction of their mass due to tidal stripping. To improve convergence among simulations with different resolutions, it is necessary to include “orphan” subhaloes/galaxies. These are structures with a known progenitor which we expect to still exist in the halo (i.e. that should not have merged with the central structure or been completely destroyed via tidal forces).

Orphan substructures are normally used in semi-analytical models of galaxy formation (SAMs) to reproduce galaxy clustering (De Lucia & Blaizot 2007; Guo et al. 2011). While some SHAM prescriptions have included orphans in the past (e.g. Moster et al. 2013), most do not include this feature, because it adds an additional level of complexity to the model and to the underlying dark matter simulation. Although orphans are uncommon in SHAM, Guo & White (2014) found that omitting an orphan prescription causes some differences in the predicted galaxy clustering, even when using high-resolution simulations.

In this work, we include a flexible amount of orphan galaxies by tracking the most bound particle of subhaloes with no known descendant. We assume a subhalo to be merged with its central structure only when the time since accretion exceeds the dynamical friction timescale of the subhalo. To compute the dynamical friction time, we use a modified version of the expression given by Binney & Tremaine (1987, Eq. 7.26):

$$ t_{\rm dyn} = \frac{1.17\, t_{\rm merger}\, d_{\rm host}^{2}\, V_{\rm host}\, \left(M_{\rm host} / 10^{[\ldots]}\,h^{-1}\,{\rm M}_\odot\right)^{[\ldots]}}{G\, \ln\left(M_{\rm host}/M_{\rm sub} + 1\right) M_{\rm sub}}, \qquad (1) $$

where $t_{\rm merger}$ is a free parameter that effectively regulates the number of orphan galaxies; $d_{\rm host}$ is the distance of the subhalo to the centre of its host halo; $V_{\rm host}$ is the virial velocity of the host halo; $M_{\rm host}$ is the virial mass of the host halo; and $M_{\rm sub}$ is the subhalo mass. The only variation from the original equation (besides the free parameter) is the term $\left(M_{\rm host} / 10^{[\ldots]}\,h^{-1}\,{\rm M}_\odot\right)^{[\ldots]}$, which we find helpful to improve our predictions for the number of satellite galaxies in high-mass haloes.

One of the most basic assumptions in SHAM is that the relation between stellar mass and the subhalo property used ($V_{\rm peak}$ in our case) is constant through time and the same for centrals and satellites. While the $V_{\rm peak}$-stellar mass relation is indeed very similar for centrals and satellites (cf. Fig. 1), it is not identical. Smith et al. (2016) showed that satellite galaxies can decrease their stellar mass to the point of disappearing into the intra-cluster medium. This effect should remove some satellite galaxies from a given sample (most of them located near the centre of a halo).

To account for this effect, we follow Moster et al. (2018) and assume that all galaxies in subhaloes whose current mass is below a fraction $f_{\rm s}$ of the maximum subhalo mass reached during their evolution, $M_{\rm peak}$, are disrupted:

$$ M_{\rm sub} < f_{\rm s}\, M_{\rm peak}. \qquad (2) $$

Chaves-Montero et al. (2016) found that a SHAM run in a dark matter simulation with the same resolution as EAGLE (a $67.7\,h^{-1}\,{\rm Mpc}$ high-resolution hydrodynamic simulation) has stronger clustering than the EAGLE galaxies when selected by stellar mass. Similar results were found for the TNG300 by Contreras et al. (2020a) when building SHAM mocks using $V_{\rm peak}$. This is because $V_{\rm peak}$ never decreases, whereas galaxies can lose stellar mass via stripping. By including a disruption parameter, the SHAM can better reproduce the galaxy clustering even in high-resolution simulations (see appendix A for more details).

Figure 3. The impact of the free parameters in our extended SHAM for stellar-mass-selected galaxies with a number density of $n = 0.01\,h^{3}\,{\rm Mpc}^{-3}$ at $z = 0$. The first column shows the projected correlation function $w_{\rm p}(r_{\rm p})$, whereas the second, third and fourth columns respectively display the monopole $\xi_{\ell=0}(r)$, quadrupole $\xi_{\ell=2}(r)$, and hexadecapole $\xi_{\ell=4}(r)$ of the redshift-space correlation function. In each row (SHAM scatter, orphans, tidal disruption) we systematically vary a different SHAM parameter, $\sigma_{\log M}$, $t_{\rm merger}$, and $f_{\rm s}$, while keeping all the other parameters fixed. Red to blue colours indicate changes from low to high parameter values, as indicated by the legend.

Figure 4. Similar to Fig. 3 but for the SFR parameters: the slopes of the star formation efficiency ($\beta$ & $\gamma$), the halo mass of the peak of the star formation efficiency ($M_1$), and the parameters that control the abundance of satellite galaxies, $\tau$, and its dependence on host halo mass, $\tau_{\rm s}$.

We now explore the impact that these recipes and free parameters have on the predicted redshift-space clustering of SHAM galaxies. Fig. 3 shows the multipoles and the projected correlation function of galaxies at $z = 0$ with a number density of $n = 0.01\,h^{3}\,{\rm Mpc}^{-3}$ in our TNG300-3-mimic simulation. Coloured lines in each row show SHAM predictions varying either $\sigma_{\log M}$, the scatter between stellar mass and $V_{\rm peak}$; the merger timescale, $t_{\rm merger}$; or tidal disruption, $f_{\rm s}$; while keeping all other parameters fixed.
For comparison, we also include the results for TNG galaxies as black lines.

Firstly, we see that the $\sigma_{\log M}$ parameter has a small but systematic effect on the clustering. As its value increases, smaller subhaloes will be included in the selected sample, which will progressively reduce the mean halo mass and large-scale bias. Note that the scatter we empirically measure from Fig. 1 is $\sigma_{\log M} \sim$ […].

Secondly, we consider $t_{\rm merger}$, which has a significantly stronger impact. Large values of $t_{\rm merger}$ essentially imply that every subhalo ever accreted will still be present in the simulation; this is thus an estimate for a simulation with infinite mass resolution. Consequently, there will be a large population of satellite galaxies in high-mass haloes, increasing their contribution and the clustering amplitude. In contrast, lower values of $t_{\rm merger}$ imply that orphan subhaloes will be destroyed immediately, thus lowering the clustering and approaching the case of no orphan subhaloes. Note that this parameter brackets the two limiting cases of no artificial disruption in $N$-body simulations and of every destroyed subhalo being lost for numerical reasons. Of all the parameters introduced in this model, $t_{\rm merger}$ is the only one that can increase the galaxy clustering at small scales, and it is key to reproducing the galaxy clustering in low-resolution simulations.

Finally, as expected, larger (smaller) values of the tidal parameter $f_{\rm s}$ mean fewer (more) satellite galaxies and thus a lower (higher) clustering. Whereas this effect appears negligible compared to the effect of orphan subhaloes and $t_{\rm merger}$, this parameter becomes more relevant when dealing with high-resolution simulations.

We remind the reader that all the changes shown above are for a fixed number density, meaning that the effect on the galaxy clustering is due not only to the galaxies we directly remove or add, but also to those that enter or exit the sample to keep the total number of galaxies constant.
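The two satellite criteria of this section, the orphan merging time of equation (1) and the disruption threshold of equation (2), can be sketched as below. This is a hedged illustration rather than the authors' code. The pivot mass and exponent of the extra host-mass term are not legible in this copy of equation (1), so they are left as required arguments (`m_pivot`, `slope`) and any values passed to them are placeholders. The function names and the unit conventions ($h^{-1}\,{\rm Mpc}$, km/s, $h^{-1}\,{\rm M}_\odot$, result in $h^{-1}$ Gyr) are likewise assumptions.

```python
import numpy as np

# Gravitational constant in Mpc (km/s)^2 / Msun; the factors of h cancel
# when distances are in Mpc/h and masses in Msun/h.
G_MPC_KMS2_PER_MSUN = 4.30091e-9
# One Mpc/(km/s) expressed in Gyr.
GYR_PER_MPC_PER_KMS = 977.79


def dynamical_friction_time(t_merger, d_host, v_host, m_host, m_sub,
                            *, m_pivot, slope):
    """Equation (1): modified Binney & Tremaine friction time.

    m_pivot and slope parametrise the extra (M_host / m_pivot)**slope
    term; their values must be supplied by the caller (placeholders here).
    """
    coulomb_log = np.log(m_host / m_sub + 1.0)
    mass_term = (m_host / m_pivot) ** slope
    t = (1.17 * t_merger * d_host**2 * v_host * mass_term
         / (G_MPC_KMS2_PER_MSUN * coulomb_log * m_sub))
    return t * GYR_PER_MPC_PER_KMS


def orphan_has_merged(time_since_accretion, t_dyn):
    """An orphan merges once the time since accretion exceeds t_dyn."""
    return np.asarray(time_since_accretion) > np.asarray(t_dyn)


def is_disrupted(m_sub, m_peak, f_s):
    """Equation (2): disrupted when M_sub < f_s * M_peak."""
    return np.asarray(m_sub) < f_s * np.asarray(m_peak)
```

Since $t_{\rm merger}$ enters equation (1) as a multiplicative factor, raising it lengthens every orphan's lifetime by the same factor, which is why it effectively sets the number of surviving orphans.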
The concept of halo assembly bias (also known as secondary bias, Mao et al. 2018) was introduced by Sheth & Tormen (2004) and Gao et al. (2005) and is defined as the difference in the clustering of haloes of the same mass but of different secondary properties (e.g. halo age, concentration, spin). Halo assembly bias can also propagate to galaxy clustering when haloes of the same mass but different secondary properties have a different halo occupation (an effect known as occupancy variation; Zehavi et al. 2018; Artale et al. 2018). This difference in the galaxy clustering is known as galaxy assembly bias (Croton et al. 2007) and can cause differences in the correlation function of up to 20% for stellar-mass-selected samples in SAMs and hydrodynamic simulations, depending on the redshift and number density of the galaxy sample (Contreras et al. 2019, 2020a).

Chaves-Montero et al. (2016) found that the galaxy assembly bias signal predicted by SHAM can be up to 50% lower than that measured in EAGLE. In Contreras et al. (2020a) we extended this work and found that hydrodynamic simulations, semi-analytical models, and SHAM predict different levels of galaxy assembly bias, and that its evolution with redshift and number density is also different. Since it is unclear whether the level of assembly bias in SHAM is a robust prediction, we include an extension so that any amount of galaxy assembly bias signal can be added.

While the standard SHAM assumes that the scatter between the stellar mass and $V_{\rm peak}$ (or any other subhalo property used) is completely random (i.e. it does not depend on any secondary property of the subhalo), our method adds a correlation with the individual large-scale bias of subhaloes (measured with the method developed by Paranjape et al. 2018).
This is done for central andsatellite galaxies independently, forcing conservation of the satellitefraction at all number densities (see section 4 of Contreras et al.2020a for more details).To assign the level of galaxy assembly bias, the method usestwo free parameters (one for central galaxies, A c , and one for satel-lite galaxies, A s ) that control the correlation between stellar massand individual bias of each subhalo at a given constant 𝑉 peak (onefor central galaxies, A c , and one for satellite galaxies, A s ).These free parameters can have a value between 1 (perfect cor-relation between stellar mass and bias-per-object, which will meana maximum assembly bias signal), and -1 (perfect anticorrelationbetween stellar mass and bias-per-object, which will mean a min-imum assembly bias signal), with 0 being a uncorrelated signal,meaning the sample will have the same assembly bias signal as thestandard SHAM. Although not shown here, we kindly refer to Fig.8 of Contreras et al. (2020a) where we showed that the diﬀerencein the galaxy clustering on large scales for a SHAM with maximumand minimum assembly bias is around a factor three. Most of the empirical models available to create mocks are focusedon galaxy samples selected by stellar mass-like properties (e.g.HOD, SHAM, SCAM), and comparatively much less eﬀort hasbeen made to model star-forming galaxy samples. This is becausemost of these models relate a given (sub)halo property to a galaxy


property, and while these relationships are quite simple for stellar-mass selected samples (i.e. the stellar mass is a good proxy for how massive a (sub)halo is), they are much more complicated for SFRs. More sophisticated models, such as semi-analytical models of galaxy formation, do not agree well on how classic (sub)halo properties correlate with SFR-like galaxy properties (Contreras et al. 2013, 2015).

To account for this non-trivial relation between SFR-like galaxy properties and (sub)halo properties, empirical models have become more complicated when trying to reproduce these galaxies. Whereas a standard HOD (Zheng et al. 2005) can reproduce the galaxy clustering of a stellar-mass selected sample with reasonable accuracy using only 5 free parameters, SFR-selected galaxy samples need HODs with 8 to 9 parameters for a reasonable agreement on the galaxy clustering (Gonzalez-Perez et al. 2020; Avila et al. 2020). Other empirical methods, like age matching or conditional abundance matching (Hearin & Watson 2013; Hearin et al. 2014; Kulier & Ostriker 2015, Favole et al. in prep), have been able to reproduce the galaxy clustering using the additional correlation between secondary halo properties, such as halo age, and SFR, but with some limitations.

In this section, we implement a different approach to create star-forming mock galaxies. Inspired by the fact that newly accreted mass provides gas to fuel the formation of new stars in galaxies, and by the modelling carried out by EMERGE, for central galaxies we assume that the SFR is proportional to the dark matter mass accretion rate multiplied by an efficiency function. We assume that the star formation efficiency is maximal at a halo mass $M_1$ (typically $\sim$ [...] $h^{-1}\,M_\odot$, close to the mass at which AGN feedback starts operating) and then decays as a power law for both more and less massive host haloes.
The SFR of the central galaxies can be expressed as:

$$ {\rm SFR}_{\rm centrals} \propto \frac{\dot{M}_{\rm h}}{(M_{\rm h}/M_1)^{\beta} + (M_{\rm h}/M_1)^{-\gamma}} \qquad (3) $$

where $\dot{M}_{\rm h}$ is the dark matter mass accretion rate of the halo and $M_{\rm h}$ its mass.

For satellite galaxies, we assume the star formation rate is constant until they quench, with the same value they had when their subhalo mass was equal to its peak mass. For this, we stored the value of $\dot{M}_{\rm h}$ when $M_{\rm sh} = M_{\rm peak}$ while running the $N$-body simulations. We consider a given satellite galaxy quenched when $t_{\rm infall}$, the time since accretion by a larger halo, is larger than a given quenching timescale:

$$ t_{\rm infall} > t_{\rm quench} \equiv \tau_0 \, t_{\rm dyn} \times \left( \frac{M_{\rm host\,halo}}{h^{-1}\,M_\odot} \right)^{\tau_{\rm s}}, \qquad (4) $$

with $t_{\rm dyn}$ the dynamical time of the halo ($0.1/H$), $M_{\rm host\,halo}$ the host halo mass of the subhalo, and $\tau_0$ & $\tau_{\rm s}$ free parameters.

This last equation is slightly different from that used by EMERGE. There, they assume a direct relation with the stellar mass of galaxies, while we assume a relation with their host halo mass. We find that, for our specific implementation, this better describes the SFR of TNG galaxies. Also, not using the stellar mass in the SFR calculation gives us more freedom when creating mocks with only SFR and not stellar mass.

To avoid including more free parameters in the mock, we omit any normalisation parameter. This means that the SFR values obtained cannot be compared with other values in the literature, but a galaxy sample with a fixed number density should include the same galaxy population.

We show the impact of varying each of the SFR parameters in Fig. 4. Similar to Fig. 3, it shows the multipoles and the projected correlation function of galaxies at $z$ = [...] and a number density of [...] $h^{3}\,{\rm Mpc}^{-3}$ in our TNG300-mimic simulation. Coloured lines in each row show the effect of varying the slopes of the star formation efficiency ($\beta$ & $\gamma$), the halo mass of the peak of the star formation efficiency ($M_1$), the parameter that controls the abundance of satellite galaxies ($\tau_0$), and its dependence on host halo mass ($\tau_{\rm s}$).

The dependence of the galaxy clustering on $\beta$ and $\gamma$ is weak, especially for $\beta$.
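The SFR recipe of Eqs. 3 and 4 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the code used in this work: the function names, the default normalisation mass `m_norm`, and the choice of setting the SFR of quenched satellites to exactly zero are all hypothetical. As in the text, the SFR is left unnormalised, so only its rank ordering is meaningful.

```python
import numpy as np

def sfr_centrals(mdot_h, m_h, m1, beta, gamma):
    """Unnormalised SFR of centrals: accretion rate times a double
    power-law efficiency in halo mass (the form of Eq. 3)."""
    x = np.asarray(m_h, dtype=float) / m1
    efficiency = 1.0 / (x**beta + x**(-gamma))
    return np.asarray(mdot_h, dtype=float) * efficiency

def quenching_time(m_host, tau0, tau_s, t_dyn, m_norm=1.0e12):
    """Quenching timescale with the form of Eq. 4.
    m_norm is an ASSUMED normalisation mass (same units as m_host);
    t_dyn must be in the same time units as t_infall."""
    return tau0 * t_dyn * (np.asarray(m_host, dtype=float) / m_norm)**tau_s

def sfr_satellites(sfr_at_peak, t_infall, m_host, tau0, tau_s, t_dyn,
                   m_norm=1.0e12):
    """Satellites keep the SFR they had at peak subhalo mass until
    t_infall exceeds the quenching time; quenched objects are given
    SFR = 0 here (an assumption made for this sketch)."""
    t_q = quenching_time(m_host, tau0, tau_s, t_dyn, m_norm)
    quenched = np.asarray(t_infall, dtype=float) > t_q
    return np.where(quenched, 0.0, np.asarray(sfr_at_peak, dtype=float))
```

A fixed number density sample would then be built by ranking all galaxies by this SFR and keeping the top objects, which is why the missing normalisation does not affect the selection.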
Lower values of $\beta$ and higher values of $\gamma$ mean selecting galaxies in more massive haloes. Since these haloes are more biased, the clustering of these galaxies should also be higher, as shown in the plot. The same happens with $M_1$, where a higher value of $M_1$ means a more clustered galaxy sample, consistent with selecting more massive haloes. The dependence of the galaxy clustering on $\tau_0$ and $\tau_{\rm s}$ is strong, with higher values of these parameters giving a larger galaxy clustering. This makes sense, since higher values of $\tau_0$ mean more satellite galaxies, which have a strong effect on the clustering of galaxies, especially at small scales, and higher values of $\tau_{\rm s}$ put these galaxies in more massive haloes (which have more bias).

In the next section, we show the galaxy clustering of the stellar mass- and SFR-selected mock galaxy samples.

In this section, we compute the clustering predictions of our mocks and compare them to those in the TNG300. Since the simulation used to build the mocks has the same initial conditions and volume as the TNG300, cosmic variance should contribute little to the differences in clustering.

To find the best set of values for the free parameters in our extended SHAM, we minimise the chi-squared ($\chi^2$) between the predicted projected correlation function ($w_p$), monopole ($\xi_{\ell=0}$), quadrupole ($\xi_{\ell=2}$), and hexadecapole ($\xi_{\ell=4}$) of the two-point correlation function and those measured in the TNG300. We employ these statistics from 0.3 to 20 $h^{-1}\,$Mpc. We compute uncertainties using 1000 jackknife samples, following the procedure of Zehavi et al. (2002) & Norberg et al. (2009). For simplicity, we assume no correlation between different scales (i.e. we only use the diagonal elements of the covariance matrix).

To minimise $\chi^2$, we use a particle swarm optimisation (also known as PSO, Kennedy & Eberhart 1995). In this technique, a group of particles (known as the swarm) move with random velocities inside the hyperspace we would like to explore.
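As a concrete illustration of this fitting machinery, the sketch below combines a diagonal-covariance $\chi^2$ with a textbook particle swarm optimiser. It is a minimal example, not the implementation used in this work: the function names and the inertia and acceleration coefficients (`inertia`, `c_local`, `c_global`) are assumptions chosen for illustration, while the defaults of 16 particles and 400 steps follow the setup quoted in the text.

```python
import numpy as np

def chi2_diagonal(model, data, sigma):
    """Chi-squared using only the diagonal of the covariance matrix."""
    model, data, sigma = (np.asarray(a, dtype=float) for a in (model, data, sigma))
    return float(np.sum(((model - data) / sigma) ** 2))

def pso_minimise(cost, lower, upper, n_particles=16, n_steps=400,
                 inertia=0.7, c_local=1.5, c_global=1.5, seed=0):
    """Minimal particle swarm optimiser inside the box [lower, upper].
    The coefficient values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    ndim = lower.size
    # Random initial positions and velocities inside the parameter box.
    pos = rng.uniform(lower, upper, size=(n_particles, ndim))
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, ndim)) * (upper - lower)
    best_pos = pos.copy()                           # best position of each particle
    best_cost = np.array([cost(p) for p in pos])
    g = int(np.argmin(best_cost))
    swarm_pos, swarm_cost = best_pos[g].copy(), best_cost[g]  # best of the swarm
    for _ in range(n_steps):
        r1 = rng.random((n_particles, ndim))
        r2 = rng.random((n_particles, ndim))
        # Keep part of the old velocity and pull towards local and global bests.
        vel = (inertia * vel
               + c_local * r1 * (best_pos - pos)
               + c_global * r2 * (swarm_pos - pos))
        pos = np.clip(pos + vel, lower, upper)
        cur = np.array([cost(p) for p in pos])
        improved = cur < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = cur[improved]
        g = int(np.argmin(best_cost))
        if best_cost[g] < swarm_cost:
            swarm_pos, swarm_cost = best_pos[g].copy(), best_cost[g]
    return swarm_pos, float(swarm_cost)
```

In a real fit, `cost` would build the mock for a given parameter vector, measure $w_p$ and the multipoles, and return `chi2_diagonal` against the target measurements with the jackknife errors as `sigma`. A larger `inertia` favours exploration, while larger `c_local` and `c_global` favour exploitation.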
In each step, we compute the value of $\chi^2$ for the parameter combination of each particle. The velocities of the particles are then updated so that they point towards the location where each particle found its own minimum $\chi^2$ and towards the position with the minimum $\chi^2$ found by the swarm as a whole. The fraction of the velocity that is retained after each step and the strength of the pull towards the best local and global positions are the parameters that control how much "exploration" or "exploitation" the PSO will have. We chose a more "explorative" setup, using 16 particles with 400 steps each. The PSO typically converges in less than half of these steps.

In Fig. 5 we show the clustering prediction of the TNG300 (black solid line) and our extended SHAM (red dashed lines) for three number densities, $n = 0.01$, $0.00316$ & $0.001\,h^{3}\,{\rm Mpc}^{-3}$, based on the stellar mass of the galaxies. The shaded black region represents the 1$\sigma$ uncertainty estimated from the jackknife samples. For comparison, we show the clustering prediction of a standard SHAM with a scatter equal to that measured for TNG galaxies ($\sigma_{\log M}$ = [...]) [...] $h^{-1}\,$Mpc, the minimum scale used in the fitting of the SHAM parameters (indicated as a vertical dotted line in the figure), suggesting the robustness of our model. We also notice that the clustering agrees better for the densest sample. Originally, we expected the opposite, since at these number densities the clustering depends more on the additional parameters we include (such as the fraction of orphans). We assume this is because at these scales we have better statistics at larger number densities, making it easier to reproduce the general clustering of the sample.

Figure 5. The projected correlation function ($w_p(r_p)$, leftmost panel), the monopole ($\xi_{\ell=0}$, second panel), the quadrupole ($\xi_{\ell=2}$, third panel) and the hexadecapole ($\xi_{\ell=4}$, rightmost panel) for the TNG300 (black solid line), the basic SHAM (green dashed line) and our extended SHAM (red dashed line) for number densities of $n = 0.01$, $0.00316$ & $0.001\,h^{3}\,{\rm Mpc}^{-3}$. As in Fig. 2, the lowest number densities have been displaced to higher values for visualisation purposes, with the lowest number density at the top. The fit is done for scales above 0.3 $h^{-1}\,$Mpc (vertical dotted line).

Table 1. The best-fitting parameters of the extended SHAM for galaxies selected by stellar mass from the TNG300 (top part), the SDSS (middle part) and for galaxies selected by SFR from the TNG300 (bottom part). The fit is done for three different number densities (as labelled) for the TNG300 and for all samples simultaneously for the SDSS. Columns for the stellar mass selections: $n / h^{3}\,{\rm Mpc}^{-3}$, $\sigma_{\log M}$, $t_{\rm merger}$, $f_{\rm s}$, $A_{\rm c}$, $A_{\rm s}$. Columns for the SFR selection: $n / h^{3}\,{\rm Mpc}^{-3}$, $\beta$, $\gamma$, $\log(M_1)$, $\tau_0$, $\tau_{\rm s}$.

To study the precision of the mock in more detail, we show in Fig. 6 the halo occupation distribution (HOD), the galaxy assembly bias (GAB), and the k-nearest neighbour Cumulative Distribution Functions (kNN-CDF) for the TNG300, our extended SHAM, and the standard SHAM for the same three galaxy samples as in the previous figure. We emphasise that our model was not fitted to reproduce these statistics, and that its performance there is a direct reflection of its robustness.

We find that our extended SHAM model reproduces almost perfectly the mean number of galaxies ($\langle N \rangle$) that populate haloes of mass $M$ in the TNG300, for all number densities. Small differences found at the transition from zero to one galaxy per halo are expected, since this transition is poorly constrained by galaxy clustering (see Zehavi et al. 2011 on the constraints on $\sigma_{\log M}$, which controls this transition).
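For reference, the mean halo occupation $\langle N \rangle$ discussed here can be measured from a halo catalogue with a few lines of Python. This is a generic sketch rather than the measurement code used in this work; the function and argument names are hypothetical, and it assumes a catalogue that already lists the number of selected galaxies hosted by each halo.

```python
import numpy as np

def mean_occupation(halo_mass, n_gal_per_halo, log_mass_edges):
    """Mean number of galaxies per halo in bins of log10(halo mass).
    Bins that contain no haloes are returned as NaN."""
    log_m = np.log10(np.asarray(halo_mass, dtype=float))
    n_haloes, _ = np.histogram(log_m, bins=log_mass_edges)
    n_gals, _ = np.histogram(log_m, bins=log_mass_edges,
                             weights=np.asarray(n_gal_per_halo, dtype=float))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_haloes > 0, n_gals / n_haloes, np.nan)
```

Haloes that host no selected galaxy must be kept in the catalogue with an occupation of zero; otherwise the mean is biased high near the transition from zero to one galaxy per halo.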
To properly compare the HOD from the TNG300 and the SHAMs from the TNG300-mimic, we match the halo masses of both simulations, following Contreras et al. (2015). Note that, while the $\sigma_{\log M}$ values of the HOD and of the SHAM are related, they are two different concepts.

The magnitude of GAB is estimated by computing the ratio between the clustering of our galaxy sample and that of a sample where the galaxy population was shuffled among haloes of similar masses, following the procedure of Croton et al. (2007). This method eliminates any dependence of the galaxy population on any property of the halo beyond its mass (i.e. any assembly bias component of the galaxy clustering). We remind the reader that, while two of our parameters regulate the amount of assembly bias of the galaxy population, we only fit these parameters to the full galaxy clustering. The amount of assembly bias captured by the model is similar to that of the TNG300, with sub-percent differences. This suggests that the assembly bias implementation in the SHAM of Contreras et al. (2020a) can successfully retrieve the galaxy assembly bias of a galaxy sample. In future work, we plan to use this model to constrain, among other things, the level of assembly bias from observational clustering measurements.

Finally, we test the k-nearest neighbour Cumulative Distribution Functions (kNN-CDF) of the three number densities for $k$ = [...]

In Fig. 7 we show the clustering predictions for SFR-selected samples in our extended SHAM and in the TNG300. We do not include the basic SHAM in our results since it does not make any prediction for samples selected by SFR properties. While not as good as for the stellar mass-selected sample, there is a good agreement with the TNG300, especially for the highest number densities. To be able to properly fit the clustering, we needed to use all scales above 0.1 $h^{-1}\,$Mpc.
The best-fitting parameters found for these number densities are shown in the lower part of Table 1. As for stellar mass, the two highest number densities have similar values for their parameters. The largest differences with respect to the highest number density come from $\beta$ and $\gamma$, parameters that we find have a low impact on the galaxy clustering when fixing the number density of the sample.

A further analysis of the galaxy samples is shown in Fig. 8. As for the stellar mass-selected sample, we show the resulting HOD, GAB, and kNN-CDF. The HOD of the TNG300 and that of our extended SHAM show a similar shape, with some differences at high halo masses. These differences are small compared to those found by Avila et al. (2020) when trying to fit HODs for a fixed bias, and to those found by Contreras et al. (2013) when comparing the HOD in different semi-analytical models. For galaxy assembly bias, the extended SHAM does not agree well with the TNG300. This is expected, since empirical models and different galaxy formation models do not share the same amount, or even the same evolution, of their galaxy assembly bias signal (Chaves-Montero et al. 2016; Contreras et al. 2020a). While for the stellar mass sample we added two additional parameters to explicitly model the level of assembly bias, we have not done so for SFRs. Adding more free parameters to an already complicated model may not be ideal, especially since the differences in the galaxy clustering between the extended SHAM model and the TNG300 are larger than the differences coming from the different galaxy assembly bias level. Finally, the kNN-CDF predictions show good agreement with the hydrodynamic simulation.

In summary, our SFR mocks show clustering properties similar to those of TNG300 galaxies. This agreement is not as good as for the stellar mass-selected sample, since the physics that dominates star formation is much more complicated than that which governs the total stellar mass of galaxies.
Nevertheless, the differences between our extended SHAM model and the TNG300 are small compared to the differences in clustering among different galaxy formation models (Contreras et al. 2013).

In the previous sections, we showed that our extended SHAM model is capable of reproducing the clustering of SFR- and stellar mass-selected galaxies of the TNG300 hydrodynamic simulation with high precision. Now, we aim to test the ability of our model to describe observed clustering data.

We use the projected correlation function of galaxies in various stellar mass bins measured in the SDSS DR7 survey, as provided by Guo et al. (2011) based on the work of Li et al. (2006). We fit our extended SHAM model for three ranges of stellar mass, with $\log(M_{\rm stell}/h^{-2}\,M_\odot)$ between [9.5 - 10], [10 - 10.5], & [10.5 - 11]. Please notice that, unlike in the rest of the paper, the mass unit is $h^{-2}\,M_\odot$. To facilitate the comparison with the TNG300 as well, we use its stellar mass function to build the SHAM.

The resulting projected correlation function is shown as a green dashed line in Fig. 9. The observational data used to fit the extended SHAM model are shown as black circles, whereas the TNG300 is shown for comparison as a solid red line. For the fitting, we assume a constant arbitrary error per point and use all scales above 0.1 $h^{-1}\,$Mpc, the smallest scale at which we tested the extended SHAM model, shown as a vertical dotted line in the figure. All three mass ranges were fitted simultaneously (i.e. the predictions come from a single set of extended SHAM parameters).

We see that our model shows good agreement with the observational data up to scales 10 times lower than the minimum scales used in the fitting, demonstrating the flexibility and robustness of our model. The best-fitting parameters are shown in the middle part of Table 1. Notice that the two assembly bias parameters are positive, meaning that we required a larger amount of assembly bias signal than in the standard SHAM to fit the observations.
While this might point to the existence of assembly bias in the Universe, a more detailed study is needed to properly address this.

Some of the small inconsistencies between our model and the SDSS are found at scales where the TNG300 and the SDSS are inconsistent too, with our model normally agreeing with the TNG300. Since our SHAM and the TNG300 trace the same cosmic structures (we recall that both underlying simulations share the same initial white noise field), one possibility is that the discrepancies originate from cosmic variance. To test for this, we apply our extended SHAM model to simulations with a significantly larger volume than the TNG300 (512 $h^{-1}\,$Mpc) but with a similar resolution ($1536^3$ particles) and cosmology (Planck Collaboration et al. 2020, without massive neutrinos). These simulations were run with opposite initial Fourier phases, using the procedure of Angulo & Pontzen (2016), which suppresses cosmic variance by up to 50 times compared to a random simulation of the same volume.

The SHAM parameters were found using a PSO in our larger simulation, not the TNG300-mimic. The mean of the projected correlation functions, shown as magenta dashed lines in Fig. 9, is nevertheless very similar to that obtained in the TNG300-mimic. This implies that cosmic variance in our simulations cannot explain the disagreement between the SDSS observations and the theoretical predictions. The most logical explanation for these differences is that the different stellar mass functions of the TNG300 (which is the one used in the SHAM models) and of the observations cause a different galaxy selection, or that they are related to the measurement of the observational correlation function. Since we have shown the flexibility of this model by reproducing the observed clustering as a proof of concept, which was the goal of this test, we defer further analysis to future publications.

Figure 6. Top left: The halo occupation distribution (HOD) for galaxies from the TNG300 (blue solid line), a basic SHAM technique (orange dotted line) and our extended SHAM model (green dashed line) for number densities of 0.01, 0.00316, and 0.001 $h^{3}\,{\rm Mpc}^{-3}$ based on their stellar mass, with the highest number density at the right of the plot. Top right: The galaxy assembly bias signal, measured as the ratio of the 2PCF of the selected galaxy sample to that of its shuffled counterpart, following Croton et al. (2007). As in the top left panel, the solid black, dotted green and dashed red lines show the predictions for the TNG300, the basic SHAM and our extended SHAM. Bottom: The upper part shows the k-nearest neighbour Cumulative Distribution Functions (kNN-CDF) for the TNG300 (solid lines), the basic SHAM (dotted lines) and our extended SHAM model (dashed lines) for N = 0, 1, 2 & 3 (red, green, blue and magenta lines, respectively). The lower part of this row shows the ratio of the kNN-CDF of the standard and extended SHAM to that of the TNG300.

Figure 7. Similar to Fig. 5 but for galaxies selected by their SFR instead of stellar mass. We do not include predictions of the basic SHAM since this model does not predict SFR for its galaxies.

Figure 8. Similar to Fig. 6 but for galaxies selected by SFR instead of stellar mass, and without the predictions of the basic SHAM.

In this work, we have developed an extension of the standard subhalo abundance matching technique (SHAM) that allowed us to reproduce the galaxy clustering in real and redshift space using relatively low-resolution simulations.

The main characteristics of this model for stellar-mass selected samples are the following:

• We start from a standard SHAM, which links the stellar mass of a galaxy sample with the peak circular velocity of the subhalo ($V_{\rm peak}$) in a 1-to-1 relation, and then add a scatter to it.

• We add an orphan model to the subhaloes to follow substructures even after they have fallen below the mass resolution of the simulation. The amount of orphans is set with one free parameter, in a physically motivated model.

• Resolved subhaloes that have lost a large fraction of their subhalo mass are considered disrupted.

• The scatter between the stellar mass and $V_{\rm peak}$ is modified to preferentially include subhaloes with a larger or lower large-scale bias (following the procedure of Contreras et al. 2020a). This changes the amplitude of the galaxy assembly bias signal of the sample.

For SFR-selected samples, we adopt the following model:

• For central galaxies, and for a fixed value of halo mass, $M_{\rm h}$, we assume the SFR is proportional to the mass accreted by the halo over the last snapshot ($\dot{M}_{\rm h}$). The dependence of the SFR on the halo mass (described in Eq. 3) is modelled as a double power law with a maximum at $M_{\rm h} = M_1$.

• For the satellite galaxies, we assume their SFR has been constant since their subhalo reached its peak mass ($M_{\rm sh} = M_{\rm peak}$) until the galaxy quenched. The quenching is determined as a function of the time elapsed since the galaxy became a satellite and of the host halo mass of the galaxy.

To test the performance of our model, we fitted the galaxy clustering of stellar mass- and SFR-selected galaxies from the TNG300 magneto-hydrodynamic simulation at three different number densities, $n = 0.01$, $0.00316$ & $0.001\,h^{3}\,{\rm Mpc}^{-3}$, for the projected correlation function ($w_p$) and the monopole ($\xi_{\ell=0}$), quadrupole ($\xi_{\ell=2}$) and hexadecapole ($\xi_{\ell=4}$) of the redshift-space correlation function (Fig. 5 and 7). Our model successfully reproduces the galaxy clustering for all number densities considered, for both stellar mass and SFR selection (with higher precision for the former), and at $z = 0$ and $z = 1$. This was done using a dark matter-only simulation with a resolution 64 times lower than the TNG300.

To further validate our mocks, we compared the halo occupation distribution (HOD), the amount of galaxy assembly bias (GAB), and the k-nearest neighbour Cumulative Distribution Functions (kNN-CDF) of our samples with those in the TNG300 (Fig. 6 & 8). We emphasise that none of these statistics were used in fitting the free parameters of our extended SHAM model. For the stellar mass sample, all the mock predictions agree almost perfectly with those of TNG300 galaxies. For the SFR selection, the kNN-CDF and the HOD show a reasonably good agreement with TNG300 galaxies. The discrepancies found here and in the galaxy clustering are small compared to the differences between different state-of-the-art galaxy formation models (e.g. Contreras et al. 2013) and arise mainly because the SFR is a complex and uncertain quantity to model and thus difficult to parametrise with some basic subhalo information. The amplitude of the galaxy assembly bias of the SFR-selected samples does not agree particularly well with that measured in the TNG300, but the overall differences are much smaller than those seen in the galaxy clustering. To keep the model simple, we decided not to add more parameters in an attempt to improve the agreement.

Finally, to show the flexibility of our model, we reproduced the projected correlation function of a galaxy sample of the SDSS selected by stellar mass. We again found a good agreement with the clustering measurements (Fig. 9), with most of the differences probably coming from the particular stellar mass function used by our model (that of the TNG300).

We conclude that this Subhalo Abundance Matching extended model (SHAMe) has the flexibility to reproduce with high accuracy most stellar mass- or SFR-selected samples, regardless of whether they are based on models or observations.

Figure 9. The projected correlation function, $w_p(r_p)$, for the TNG300 hydrodynamic simulation (red solid line), our extended SHAM implementation run over the TNG300-3 mimic (green dashed line) and on a pair of simulations of 512 $h^{-1}\,$Mpc (labelled 2xL512N1536, using the average over the two resulting clustering measurements), and the measurements in the SDSS (Guo et al. 2011). The right, middle and left panel show the predictions for galaxies with $\log(M_{\rm stell}/h^{-2}\,M_\odot)$ between 9.5 - 10, 10 - 10.5 & 10.5 - 11. Please notice that, unlike in the rest of the paper, the units of mass are $h^{-2}\,M_\odot$. The SHAM was fitted to reproduce the SDSS clustering of the three samples at the same time.
In follow-up work, we plan to use our SHAMe model to populate galaxies in scaled simulations (Angulo et al. 2020; Zennaro et al. 2019), following the procedure of Contreras et al. (2020b), to constrain cosmological information from current and future galaxy surveys.

DATA AVAILABILITY

The data underlying this article will be shared on reasonable requestto the corresponding author.

ACKNOWLEDGEMENTS

We thank Giovanni Aricò, Jonás Chaves Montero, and Marcos Pellejero for useful comments. The authors acknowledge the support of the ERC Starting Grant number 716151 (BACCO). SC acknowledges the support of the "Juan de la Cierva Formación" fellowship (FJCI-2017-33816). The authors also acknowledge the computer resources at MareNostrum and the technical support provided by Barcelona Supercomputing Center (RES-AECT-2019-2-0012 & RES-AECT-2020-3-0014).

REFERENCES

Angulo R. E., Pontzen A., 2016, MNRAS, 462, L1
Angulo R. E., Springel V., White S. D. M., Jenkins A., Baugh C. M., Frenk C. S., 2012, MNRAS, 426, 2046
Angulo R. E., Zennaro M., Contreras S., Aricò G., Pellejero-Ibañez M., Stücker J., 2020, arXiv e-prints, p. arXiv:2004.06245
Artale M. C., Zehavi I., Contreras S., Norberg P., 2018, MNRAS, 480, 3978
Avila S., et al., 2020, MNRAS, 499, 5486
Banerjee A., Abel T., 2020, arXiv e-prints, p. arXiv:2007.13342
Behroozi P., Wechsler R. H., Hearin A. P., Conroy C., 2019, MNRAS, 488, 3143
Binney J., Tremaine S., 1987, Galactic dynamics. Princeton, NJ, Princeton University Press, 1987, 747 p.
Campbell D., van den Bosch F. C., Padmanabhan N., Mao Y.-Y., Zentner A. R., Lange J. U., Jiang F., Villarreal A., 2018, MNRAS, 477, 359
Chaves-Montero J., Angulo R. E., Schaye J., Schaller M., Crain R. A., Furlong M., Theuns T., 2016, MNRAS,
Conroy C., Wechsler R. H., Kravtsov A. V., 2006, ApJ, 647, 201
Contreras S., Baugh C. M., Norberg P., Padilla N., 2013, MNRAS, 432, 2717
Contreras S., Baugh C. M., Norberg P., Padilla N., 2015, MNRAS, 452, 1861
Contreras S., Zehavi I., Padilla N., Baugh C. M., Jiménez E., Lacerna I., 2019, MNRAS, 484, 1133
Contreras S., Angulo R., Zennaro M., 2020a, arXiv e-prints, p. arXiv:2005.03672
Contreras S., Angulo R. E., Zennaro M., Aricò G., Pellejero-Ibañez M., 2020b, MNRAS, 499, 4905
Croton D. J., Gao L., White S. D. M., 2007, MNRAS, 374, 1303
Croton D. J., et al., 2016, ApJS, 222, 22
Davé R., Anglés-Alcázar D., Narayanan D., Li Q., Rafieferantsoa M. H., Appleby S., 2019, MNRAS, 486, 2827
Davis M., Efstathiou G., Frenk C. S., White S. D. M., 1985, ApJ, 292, 371
De Lucia G., Blaizot J., 2007, MNRAS, 375, 2
Dragomir R., Rodríguez-Puebla A., Primack J. R., Lee C. T., 2018, MNRAS, 476, 741
Dubois Y., et al., 2014, MNRAS, 444, 1453
Gao L., Springel V., White S. D. M., 2005, MNRAS, 363, L66
Gonzalez-Perez V., et al., 2020, MNRAS, 498, 1852
Guo Q., White S., 2014, MNRAS, 437, 3228
Guo Q., et al., 2011, MNRAS, 413, 101
Hearin A. P., Watson D. F., 2013, MNRAS, 435, 1313
Hearin A. P., Watson D. F., Becker M. R., Reyes R., Berlind A. A., Zentner A. R., 2014, MNRAS, 444, 729
Henriques B. M. B., White S. D. M., Thomas P. A., Angulo R., Guo Q., Lemson G., Springel V., Overzier R., 2015, MNRAS, 451, 2663
Kennedy J., Eberhart R., 1995. pp 1942–1948,
Kulier A., Ostriker J. P., 2015, MNRAS, 452, 4013
Lacey C. G., et al., 2016, MNRAS, 462, 3854
Lagos C. d. P., Tobar R. J., Robotham A. S. G., Obreschkow D., Mitchell P. D., Power C., Elahi P. J., 2018, MNRAS, 481, 3573
Lehmann B. V., Mao Y.-Y., Becker M. R., Skillman S. W., Wechsler R. H., 2017, ApJ, 834, 37
Li C., Kauffmann G., Jing Y. P., White S. D. M., Börner G., Cheng F. Z., 2006, MNRAS, 368, 21
Mao Y.-Y., Zentner A. R., Wechsler R. H., 2018, MNRAS, 474, 5143
Marinacci F., et al., 2018, MNRAS, 480, 5113
Moster B. P., Naab T., White S. D. M., 2013, MNRAS, 428, 3121
Moster B. P., Naab T., White S. D. M., 2018, MNRAS, 477, 1822
Naiman J. P., et al., 2018, MNRAS, 477, 1206
Nelson D., et al., 2018, MNRAS, 475, 624
Norberg P., Baugh C. M., Gaztanaga E., Croton D. J., 2009, MNRAS, 396, 19
Paranjape A., Hahn O., Sheth R. K., 2018, MNRAS, 476, 3631
Pillepich A., et al., 2018, MNRAS, 475, 648
Planck Collaboration et al., 2016, A&A, 594, A13
Planck Collaboration et al., 2020, A&A, 641, A6
Reddick R. M., Wechsler R. H., Tinker J. L., Behroozi P. S., 2013, ApJ, 771, 30
Schaye J., et al., 2015, MNRAS, 446, 521
Sheth R. K., Tormen G., 2004, MNRAS, 350, 1385
Smith R., Choi H., Lee J., Rhee J., Sanchez-Janssen R., Yi S. K., 2016, ApJ, 833, 109
Springel V., 2010, MNRAS, 401, 791
Springel V., White S. D. M., Tormen G., Kauffmann G., 2001, MNRAS, 328, 726
Springel V., et al., 2005, Nature, 435, 629
Springel V., et al., 2018, MNRAS, 475, 676
Stevens A. R. H., Croton D. J., Mutch S. J., 2016, MNRAS, 461, 859
Vogelsberger M., et al., 2014, Nature, 509, 177
White S. D. M., Rees M. J., 1978, MNRAS, 183, 341
Zehavi I., et al., 2002, ApJ, 571, 172
Zehavi I., et al., 2011, ApJ, 736, 59
Zehavi I., Contreras S., Padilla N., Smith N. J., Baugh C. M., Norberg P., 2018, ApJ, 853, 84
Zennaro M., Angulo R. E., Aricò G., Contreras S., Pellejero-Ibáñez M., 2019, arXiv e-prints, p. arXiv:1905.08696
Zheng Z., et al., 2005, ApJ, 633, 791

APPENDIX A: TIDAL DISRUPTION ON HIGH-RESOLUTION SIMULATIONS

In section 3.2 we show the impact of the stripping parameter $f_{\rm s}$ on the galaxy clustering. While the effect of this parameter on the galaxy clustering is smaller than that of the other parameters implemented here, its impact becomes more important in high-resolution simulations, where the galaxy clustering can be overestimated by the standard SHAM (lowest number density of Fig. 2). This same effect is shown in Fig. A1, where we show the correlation function of a standard SHAM (no orphans or additional galaxy assembly bias) with different values of $f_{\rm s}$ run over the TNG300-1-Dark dark matter-only simulation and compare it with the TNG300 full hydrodynamic run. The scatter of these SHAMs is $\sigma_{\log M}$ = [...] $f_{\rm s}$ [...]

APPENDIX B: PREDICTIONS AT HIGHER REDSHIFT

In this paper, we showed the performance of our extended SHAM model by comparing its predictions to the TNG300 hydrodynamic simulation. While we only showed the predictions at $z = 0$, we also tested its performance at higher redshifts. Fig. B1 and Fig. B2 show the predictions of our extended SHAM model when fitting the clustering of the TNG300 at $z = 1$ for stellar mass- and SFR-selected galaxies. The performance for the stellar mass-selected sample is similar to that at $z = 0$ (Fig. 5), while the predictions for the SFR-selected galaxies look slightly better than at $z = 0$ (Fig. 7), demonstrating the flexibility and robustness of our model.

This paper has been typeset from a TEX/LATEX file prepared by the author.

Figure A1. Similar to the right panel of Fig. 3 but only for the 2PCF ($\xi(r)$), with the SHAMs run over the TNG300-1-Dark simulation.

Figure B1. Similar to Fig. 5, but for galaxies at z=1 instead of z=0.

Figure B2. Similar to Fig. 7, but for galaxies at z=1 instead of z=0.