Accurate and Efficient Halo-based Galaxy Clustering Modelling with Simulations

Abstract

Small- and intermediate-scale galaxy clustering can be used to establish the galaxy-halo connection to study galaxy formation and evolution and to tighten constraints on cosmological parameters. With the increasing precision of galaxy clustering measurements from ongoing and forthcoming large galaxy surveys, accurate models are required to interpret the data and extract relevant information. We introduce a method based on high-resolution N-body simulations to accurately and efficiently model the galaxy two-point correlation functions (2PCFs) in projected and redshift spaces. The basic idea is to tabulate all information of haloes in the simulations necessary for computing the galaxy 2PCFs within the framework of halo occupation distribution or conditional luminosity function. It is equivalent to populating galaxies to dark matter haloes and using the mock 2PCF measurements as the model predictions. Besides the accurate 2PCF calculations, the method is also fast and therefore enables an efficient exploration of the parameter space. As an example of the method, we decompose the redshift-space galaxy 2PCF into different components based on the type of galaxy pairs and show the redshift-space distortion effect in each component. The generalizations and limitations of the method are discussed.


arXiv [astro-ph.CO]. Mon. Not. R. Astron. Soc., 000–000 (0000). Printed August 13, 2018 (MN LaTeX style file v2.2)


Zheng Zheng⋆ and Hong Guo

Department of Physics and Astronomy, University of Utah, 115 South 1400 East, Salt Lake City, UT 84112, USA
Shanghai Astronomical Observatory, Chinese Academy of Sciences, Shanghai 200030, China


Key words: cosmology: observations – cosmology: theory – galaxies: clustering – galaxies: distances and redshifts – galaxies: haloes – galaxies: statistics – large-scale structure of Universe

Over the past two decades, large galaxy redshift surveys, such as the Sloan Digital Sky Survey (SDSS; York et al. 2000), the Two-Degree Field Galaxy Redshift Survey (2dFGRS; Colless 1999), the SDSS-III (Eisenstein et al. 2011), and the WiggleZ Dark Energy Survey (Blake et al. 2011), have enabled us to study in detail the large-scale structure of the universe probed by galaxies. Galaxy clustering has become a powerful tool to study galaxy formation and evolution and to learn about cosmology. An informative way to interpret galaxy clustering is to link galaxies to the underlying dark matter halo population, whose formation and evolution are dominated by gravitational interaction and whose properties are well understood with analytic models and N-body simulations.

The commonly adopted descriptions of the connection between galaxies and dark matter haloes include the halo occupation distribution (HOD; e.g. Jing et al. 1998; Peacock & Smith 2000; Seljak 2000; Scoccimarro et al. 2001; Berlind & Weinberg 2002; Berlind et al. 2003; Zheng et al. 2005) and the conditional luminosity function (CLF; e.g. Yang et al. 2003). The former specifies the probability distribution of the number of galaxies in a given sample as a function of halo mass, together with the spatial and velocity distribution of galaxies inside haloes. The latter specifies the luminosity distribution of galaxies as a function of halo mass. Given a set of HOD or CLF parameters, with the halo population for a given cosmological model, galaxy clustering statistics can be predicted. Such frameworks have been successfully applied to galaxy clustering data to infer the connection between galaxy properties and halo mass (see e.g. van den Bosch et al. 2003; Zehavi et al. 2005, 2011; Zheng et al. 2007, 2009; Guo et al. 2014; Skibba et al. 2015) and to constrain cosmology (e.g. van den Bosch et al. 2003; Tinker et al. 2005; Cacciato et al. 2013; Reid et al. 2014).
In particular, the main clustering statistic used is the two-point correlation function (2PCF) of galaxies, which is the focus of this paper as well.

Halo properties, like their mass function and spatial clustering (bias), can be understood analytically (e.g. Press & Schechter 1974; Mo et al. 1996; Sheth & Tormen 1999), and N-body simulations also enable accurate fitting formulae to be obtained (e.g. Jenkins et al. 2001; Tinker et al. 2008, 2010). Based on these, analytic models of the galaxy 2PCF can be developed. The basic idea is to decompose the 2PCF into contributions from intra-halo and inter-halo galaxy pairs. The intra-halo component, or the one-halo term, represents the highly nonlinear part of the 2PCF. The inter-halo component, or the two-halo term, can be largely modelled by linear theory. Such analytic models have the advantage of being computationally inexpensive, and they can be used to efficiently probe the HOD/CLF and cosmology parameter space. However, as the precision of the 2PCF measurements in large galaxy surveys continues to improve, the requirement on the accuracy of the analytic models becomes more and more demanding. As pointed out in Zheng (2004a), an accurate model of the galaxy 2PCF needs to incorporate the nonlinear growth of the matter power spectrum (e.g. Smith et al. 2003), the halo exclusion effect, and the scale-dependent halo bias. In addition, the non-spherical shape of haloes should also be accounted for (e.g. Tinker et al. 2005; van den Bosch et al. 2013). These are just the factors to be taken into account in computing the real-space or projected 2PCFs. For redshift-space 2PCFs, more factors come into play. An accurate analytical description of the velocity field of dark matter haloes in the nonlinear or weakly nonlinear regime proves to be difficult and complex (e.g. Tinker 2007; Reid & White 2011; Zu & Weinberg 2013).
Therefore, an accurate analytic model of redshift-space 2PCFs on small and intermediate scales is still not within reach.

The above complications faced by analytic models can all be avoided or greatly reduced if the 2PCF calculation is done directly with the outputs of N-body simulations. With the simulation, dark matter haloes can be identified, and their properties (mass, velocity, etc.) can be obtained. For a given set of HOD/CLF parameters, one can populate haloes with galaxies accordingly (e.g. using dark matter particles as tracers) and form a mock galaxy catalog. The 2PCFs measured from the mock catalog are then the model predictions used to model the measurements from observations. Such a method of directly populating simulations has been developed and applied to model galaxy clustering data (e.g. White et al. 2011; Parejko et al. 2013). This simulation-based model is attractive, as more and more large high-resolution N-body simulations emerge. It is also straightforward to implement. Once the mock catalog is produced, measuring the 2PCFs can be made fast (e.g. with a tree code). However, populating haloes with a given set of HOD/CLF parameters is probably the most time-consuming step, as one needs to loop over all haloes of interest. In addition, information on individual haloes and tracer particles is needed, like their positions and velocities. Even with only a subset of all the particles in a high-resolution simulation, the amount of data can still be substantial.

The purpose of this paper is to introduce a method that takes advantage of the simulation-based model while being much more efficient in modelling galaxy clustering. The main idea is to decompose the galaxy 2PCFs and compress the information in the simulation by tabulating relevant clustering-related quantities of dark matter haloes. We also apply a similar idea to extend the commonly used sub-halo abundance matching method (SHAM; e.g. Conroy et al. 2006).

The paper is structured as follows.
In Section 2, we formulate the method, within the HOD/CLF-like framework and within the halo/sub-halo framework. In Section 3, we show an example of modelling redshift-space 2PCFs, which also provides an understanding of the three-dimensional (3D) small- and intermediate-scale galaxy redshift-space 2PCF and its multipoles by decomposing them into the various components. In Section 4, we summarize the method and discuss possible generalizations and limitations.

In our simulation-based method, we divide haloes identified in N-body simulations into narrow bins of a given property, which determines galaxy occupancy. In the commonly used HOD/CLF, the property is the halo mass. In our presentation, we use halo mass as the halo variable, but the method can be generalized to any set of halo properties.

The basic idea of the method is to decompose the galaxy 2PCF into contributions from haloes of different masses, from one-halo and two-halo terms, and from different types of galaxy pairs (e.g. central-central, central-satellite, and satellite-satellite pairs). The decomposition also allows the separation between the halo occupation and halo clustering. The former relies on the specific HOD/CLF parameterization, while the latter can be calculated from the simulation. The method is to tabulate all relevant information about the latter for efficient calculation of galaxy 2PCFs and exploration of the HOD/CLF parameter space.

We first formulate the method in the HOD/CLF framework. We then apply a similar idea to the SHAM case, which provides a more general SHAM method.

Let us start with a given N-body simulation and a given set of HOD/CLF parameters. To populate galaxies into a halo identified in the simulation, we can put one galaxy at the halo ‘centre’ as a central galaxy, according to the probability specified by the HOD/CLF parameters. Halo ‘centre’ should be defined to reflect galaxy formation physics.
For example, a sensible choice is the position of the potential minimum rather than the centre of mass. For satellites, we can choose particles as tracers. In the usually adopted models, it is assumed that satellite galaxies follow dark matter particles inside haloes (e.g. Zheng 2004a; Tinker et al. 2005; van den Bosch et al. 2013), an assumption with a theoretical basis (e.g. Nagai & Kravtsov 2005). One can certainly modify the distribution profile as needed, and below we assume that the distribution of galaxies inside haloes has been specified and that the corresponding tracer particles have been selected for each halo.

We divide haloes in the simulation into $N$ narrow mass bins and denote the mean number density of haloes in the mass bin $\log M_i \pm d\log M_i/2$ as $\bar{n}_i$. The mean number density of galaxies is computed as

$$\bar{n}_g = \sum_i \bar{n}_i \left[\langle N_{\rm cen}(M_i)\rangle + \langle N_{\rm sat}(M_i)\rangle\right], \tag{1}$$

where $N_{\rm cen}(M)$ and $N_{\rm sat}(M)$ are the occupation numbers of central and satellite galaxies in a halo of mass $M$, $\langle\,\rangle$ denotes the average over all haloes of this mass, and $i = 1, ..., N$.

In the halo-based model, the galaxy 2PCF $\xi_{gg}$ is computed as the combination of two terms, $\xi_{gg} = 1 + \xi_{1h} + \xi_{2h}$ (Zheng 2004a), where the one-halo term $\xi_{1h}$ (two-halo term $\xi_{2h}$) comes from contributions of intra-halo (inter-halo) galaxy pairs. Following Berlind & Weinberg (2002), the one-halo term can be computed based on

$$\frac{1}{2}\,\bar{n}_g\,(\bar{n}_g\, d^3\mathbf{r})\left[1 + \xi_{1h}(\mathbf{r})\right] = \sum_i \bar{n}_i\, \langle N_{\rm pair}(M_i)\rangle\, f(\mathbf{r}; M_i)\, d^3\mathbf{r}. \tag{2}$$

The left-hand side (LHS) is the number density of one-halo pairs with separation in the range $\mathbf{r} \pm d\mathbf{r}/2$ from the definition of the 2PCF. The right-hand side (RHS) is the same quantity from counting one-halo pairs in each halo, and the summation is over all the halo mass bins. Here $\langle N_{\rm pair}(M)\rangle$ is the total mean number of galaxy pairs in haloes of mass $M$, and $f(\mathbf{r}; M)$ is the probability distribution of pair separation in haloes of mass $M$, i.e. $f(\mathbf{r}; M)\, d^3\mathbf{r}$ is the probability of finding pairs with separation in the range $\mathbf{r} \pm d\mathbf{r}/2$ in haloes of mass $M$. By further decomposing pairs into central-satellite (cen-sat) and satellite-satellite (sat-sat) pairs, we reach the following expression,

$$1 + \xi_{1h}(\mathbf{r}) = \sum_i \frac{2\bar{n}_i}{\bar{n}_g^2}\, \langle N_{\rm cen}(M_i) N_{\rm sat}(M_i)\rangle\, f_{\rm cs}(\mathbf{r}; M_i) + \sum_i \frac{\bar{n}_i}{\bar{n}_g^2}\, \langle N_{\rm sat}(M_i)[N_{\rm sat}(M_i) - 1]\rangle\, f_{\rm ss}(\mathbf{r}; M_i). \tag{3}$$

The functions $f_{\rm cs}(\mathbf{r}; M)$ and $f_{\rm ss}(\mathbf{r}; M)$ are the probability distributions of one-halo cen-sat and sat-sat galaxy pair separation in haloes of mass $M$. They are normalized such that

$$\int f_{\rm cs}(\mathbf{r}; M)\, d^3\mathbf{r} = 1 \quad {\rm and} \quad \int f_{\rm ss}(\mathbf{r}; M)\, d^3\mathbf{r} = 1. \tag{4}$$

Note that here and in what follows, the 2PCF can be the real-space, projected-space, or redshift-space 2PCF, or it can be the multipoles of the redshift-space 2PCF. The variable $\mathbf{r}$ should be understood as pair separation in the corresponding space. For redshift-space clustering, we discuss how to specify the velocity distribution of galaxies later.

To compute the two-halo term, we add up all possible two-halo galaxy pairs, following the 2PCF decomposition from different pair counts in Zu et al. (2008).
Similar to equation (2), the total number density of two-halo pairs with separation in the range $\mathbf{r} \pm d\mathbf{r}/2$ is

$$n_{\rm pair,2h} = \frac{1}{2}\,\bar{n}_g\,(\bar{n}_g\, d^3\mathbf{r})\left[1 + \xi_{2h}(\mathbf{r})\right], \tag{5}$$

which is composed of two-halo central-central (cen-cen) pairs

$$n_{\rm cc-pair,2h} = \frac{1}{2}\sum_{i \neq j} [\bar{n}_i \langle N_{\rm cen}(M_i)\rangle][\bar{n}_j \langle N_{\rm cen}(M_j)\rangle\, d^3\mathbf{r}] \times [1 + \xi_{\rm hh,cc}(\mathbf{r}; M_i, M_j)], \tag{6}$$

two-halo cen-sat pairs

$$n_{\rm cs-pair,2h} = \sum_{i \neq j} [\bar{n}_i \langle N_{\rm cen}(M_i)\rangle][\bar{n}_j \langle N_{\rm sat}(M_j)\rangle\, d^3\mathbf{r}] \times [1 + \xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)], \tag{7}$$

and two-halo sat-sat pairs

$$n_{\rm ss-pair,2h} = \frac{1}{2}\sum_{i \neq j} [\bar{n}_i \langle N_{\rm sat}(M_i)\rangle][\bar{n}_j \langle N_{\rm sat}(M_j)\rangle\, d^3\mathbf{r}] \times [1 + \xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j)]. \tag{8}$$

In each of equations (6)–(8), the summation is over all halo mass bins (i.e. $i = 1, ..., N$ and $j = 1, ..., N$). The three correlation functions on the RHS have the following meanings:

- $\xi_{\rm hh,cc}(\mathbf{r}; M_i, M_j)$ is the two-point cross-correlation function between the ‘centres’ (positions to put central galaxies) of haloes of masses $M_i$ and $M_j$ (cen-cen);
- $\xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)$ is the two-point cross-correlation function between the ‘centres’ of $M_i$ haloes and the satellite tracer particles in the (extended) $M_j$ haloes (cen-sat);
- $\xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j)$ is the two-point cross-correlation function between satellite tracer particles in the (extended) $M_i$ haloes and those in the (extended) $M_j$ haloes (sat-sat).

With $n_{\rm pair,2h} = n_{\rm cc-pair,2h} + n_{\rm cs-pair,2h} + n_{\rm ss-pair,2h}$, we reach the final expression for the two-halo term,

$$\xi_{2h}(\mathbf{r}) = \sum_{i \neq j} \frac{\bar{n}_i \bar{n}_j}{\bar{n}_g^2} \langle N_{\rm cen}(M_i)\rangle \langle N_{\rm cen}(M_j)\rangle\, \xi_{\rm hh,cc}(\mathbf{r}; M_i, M_j) + \sum_{i \neq j} \frac{2\bar{n}_i \bar{n}_j}{\bar{n}_g^2} \langle N_{\rm cen}(M_i)\rangle \langle N_{\rm sat}(M_j)\rangle\, \xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j) + \sum_{i \neq j} \frac{\bar{n}_i \bar{n}_j}{\bar{n}_g^2} \langle N_{\rm sat}(M_i)\rangle \langle N_{\rm sat}(M_j)\rangle\, \xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j). \tag{9}$$

Equations (1), (3), and (9) lead to the method we propose.
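As a rough illustration of how equations (1), (3), and (9) reduce to weighted sums over tabulated quantities, the following is a minimal NumPy sketch, not the authors' code. The function name, argument names, and array shapes are assumptions made here for illustration. The pair moments $\langle N_{\rm cen} N_{\rm sat}\rangle$ and $\langle N_{\rm sat}(N_{\rm sat} - 1)\rangle$ are passed in explicitly so that no particular occupation statistics are assumed. The two-halo sums run over all pairs of mass bins, which assumes that the $i \neq j$ label marks pairs in different haloes and that same-bin table entries count only such pairs.

```python
import numpy as np


def galaxy_2pcf_from_tables(nbar, ncen, nsat, ncs, nss,
                            f_cs, f_ss, xi_cc, xi_cs, xi_ss):
    """Weighted sums of equations (1), (3), and (9) (illustrative sketch).

    nbar, ncen, nsat : shape (N,), halo number density and mean occupations
    ncs, nss         : shape (N,), <N_cen N_sat> and <N_sat (N_sat - 1)>
    f_cs, f_ss       : shape (N, R), one-halo pair-separation distributions
    xi_cc, xi_cs, xi_ss : shape (N, N, R), halo-halo cross-correlation tables
    Returns (nbar_g, 1 + xi_1h, xi_2h, xi_gg), the last three of shape (R,).
    """
    w_cen = nbar * ncen                      # central number density per bin
    w_sat = nbar * nsat                      # satellite number density per bin
    nbar_g = np.sum(w_cen + w_sat)           # equation (1)

    # Equation (3): one-halo cen-sat and sat-sat contributions.
    one_plus_xi_1h = (2.0 * (nbar * ncs) @ f_cs
                      + (nbar * nss) @ f_ss) / nbar_g**2

    # Equation (9): two-halo cen-cen, cen-sat, and sat-sat contributions.
    # Assumption: all bin pairs (i, j) are summed, including i == j.
    xi_2h = (np.einsum("i,j,ijr->r", w_cen, w_cen, xi_cc)
             + 2.0 * np.einsum("i,j,ijr->r", w_cen, w_sat, xi_cs)
             + np.einsum("i,j,ijr->r", w_sat, w_sat, xi_ss)) / nbar_g**2

    # xi_gg = 1 + xi_1h + xi_2h
    xi_gg = one_plus_xi_1h + xi_2h
    return nbar_g, one_plus_xi_1h, xi_2h, xi_gg
```

Because the halo tables enter only as fixed arrays, changing the occupation amounts to recomputing a few length-$N$ weight vectors and repeating the sums.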
The quantities related to galaxy occupancy are specified by the HOD/CLF parameterization one chooses, while those related to haloes are from the simulation, independent of the HOD/CLF parameterization. We can therefore prepare tables for $\bar{n}_i$, $f_{\rm cs}(\mathbf{r}; M_i)$, $f_{\rm ss}(\mathbf{r}; M_i)$, $\xi_{\rm hh,cc}(\mathbf{r}; M_i, M_j)$, $\xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)$, and $\xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j)$. For a given set of HOD/CLF parameters, the predictions of galaxy 2PCFs can be obtained by performing the weighted summation over the tables. The tables are only prepared once, and we can then change the galaxy occupation as needed to compute galaxy 2PCFs for different galaxy samples and different sets of HOD/CLF parameters, which is much more efficient than populating galaxies into haloes by selecting particles.

Since summation is used to replace integration in the method, we need to choose narrow halo mass bins ($d\log M$ = 0 . is usually sufficient, as shown in Section 3). The $\bar{n}_i$ table represents the halo mass function. To prepare the other tables that depend on pair separation, the bins of pair separation $\mathbf{r}$ are best chosen to match the ones used in the measurements from observational data, which naturally avoids any discrepancy related to the finite bin sizes. For haloes in each mass bin, the $f_{\rm cs}$ and $f_{\rm ss}$ tables can be computed by using either all the particles in the haloes with the specified distribution or a random subset. For $\xi_{\rm hh,cc}$, $\xi_{\rm hh,cs}$, and $\xi_{\rm hh,ss}$, we effectively compute the halo-halo two-point cross-correlation function with different definitions of halo positions. For $\xi_{\rm hh,cc}$, halo positions are defined by our choice of ‘centres’. For $\xi_{\rm hh,cs}(\mathbf{r}; M_i, M_j)$, we choose ‘centres’ for $M_i$ haloes and positions of arbitrary tracer particles in $M_j$ haloes. For $\xi_{\rm hh,ss}(\mathbf{r}; M_i, M_j)$, positions of arbitrary tracer particles in both $M_i$ and $M_j$ haloes are chosen. We can use any number of tracer particles in each halo to do the calculation.
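One way to measure a single halo-halo cross-correlation table entry in a periodic box is sketched below in NumPy. It uses the fact that, with periodic boundaries, the expected cross-pair count for two unclustered populations is known analytically, so no random catalog is needed. The function names, the brute-force pair search, and the spherical-shell real-space binning are choices made here for illustration only; realistic catalog sizes would call for a tree-based pair counter, and redshift-space or projected tables would need the corresponding bin volumes.

```python
import numpy as np


def analytic_random_cross_counts(n_i, n_j, volume, r_edges):
    """Expected cross-pair counts per radial bin for unclustered populations.

    Implements (n_i * V_sim) * (n_j * d^3r) with d^3r a spherical shell volume.
    """
    r_edges = np.asarray(r_edges, dtype=float)
    shell_volume = 4.0 / 3.0 * np.pi * np.diff(r_edges**3)
    return (n_i * volume) * (n_j * shell_volume)


def cross_correlation_periodic(pos_i, pos_j, box_size, r_edges):
    """xi_ij(r) = DD_ij / RR_ij - 1 for two distinct point sets in a periodic cube.

    Brute force, O(N_i * N_j) memory; requires max(r_edges) <= box_size / 2.
    """
    delta = pos_i[:, None, :] - pos_j[None, :, :]
    delta -= box_size * np.round(delta / box_size)   # minimum-image convention
    sep = np.sqrt(np.sum(delta**2, axis=-1)).ravel()
    dd, _ = np.histogram(sep, bins=r_edges)           # measured cross-pair counts
    volume = box_size**3
    rr = analytic_random_cross_counts(len(pos_i) / volume,
                                      len(pos_j) / volume,
                                      volume, r_edges)
    return dd / rr - 1.0
```

Passing halo ‘centres’ for both sets would give a cen-cen entry, while passing tracer-particle positions for one or both sets would give the cen-sat or sat-sat analogue.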
Haloes with positions defined by the tracer particles can be thought of as extended (with positions having a probability distribution). On large scales, $\xi_{\rm hh,cc}$, $\xi_{\rm hh,cs}$, and $\xi_{\rm hh,ss}$ are the same, while on small scales, $\xi_{\rm hh,cs}$ and $\xi_{\rm hh,ss}$ are smoothed versions of $\xi_{\rm hh,cc}$. Note that in analytic models such differences are usually neglected. In computing the three halo-halo correlation functions, we do not need to construct random catalogs to find out the pair counts from a uniform distribution: in the volume $V_{\rm sim}$ of the simulation with periodic boundary conditions, the counts of cross-pairs at separation in the range $\mathbf{r} \pm d\mathbf{r}/2$ between two randomly distributed populations with number densities $\bar{n}_i$ and $\bar{n}_j$ are simply $(\bar{n}_i V_{\rm sim})(\bar{n}_j\, d^3\mathbf{r})$. Making use of this fact can greatly reduce the computational expense in preparing the tables.

For the redshift-space tables, in addition to the halo velocities, one needs to specify the velocity distribution of galaxies inside haloes, which can be different from that of dark matter particles (a.k.a. velocity bias; e.g. Berlind & Weinberg 2002). The difference can be parameterized by central and satellite velocity bias parameters (e.g. Guo et al. 2015a). For a set of central and satellite velocity bias parameters and with a choice of the line-of-sight direction, we can obtain the redshift-space positions of the central galaxy and satellite tracer particles according to halo velocities and central and satellite galaxy velocity distributions inside haloes, and the redshift-space tables can be computed. We suggest preparing tables for different sets of central and satellite velocity bias parameters and interpolating among tables to probe the velocity bias parameter space, as is done in Guo et al. (2015a).

Multipole moments of redshift-space galaxy 2PCFs are usually modelled.
We can derive the corresponding tables by computing the corresponding multipole moments of $f_{\rm cs}$, $f_{\rm ss}$, $\xi_{\rm hh,cc}$, $\xi_{\rm hh,cs}$, and $\xi_{\rm hh,ss}$. In such a case, $\mathbf{r}$ is expressed by $s = |\mathbf{r}|$ and $\mu$, the cosine of the angle between $\mathbf{r}$ and the line-of-sight direction. In the integration (summation) for obtaining the multipoles, the bins of $\mu$ match those used in observational measurements to remove any finite-bin-size effect.

For modelling the projected 2PCF $w_p$, a corresponding set of tables can be obtained by integrating the redshift-space tables over the line-of-sight separation. The integration is done in the same way as in the measurements with data to avoid any finite-bin-size effect, summing over the same line-of-sight bins (with the same bin size) up to the same maximum line-of-sight separation.

The SHAM method uses more information from (high-resolution) simulations, including both distinct haloes and subhaloes identified inside distinct haloes, where the distinct haloes refer to haloes that are not subhaloes of another halo. Distinct haloes are also referred to as haloes, main haloes, or host haloes. Central galaxies are hosted by distinct haloes at the centres, while satellite galaxies are in subhaloes. Before merging into distinct haloes, subhaloes are distinct haloes themselves. The SHAM method generally works in the following way. By adopting one property, subhaloes and distinct haloes can be treated as a unified entity. For distinct haloes, the property is evaluated at the time of interest. For subhaloes, it has become common practice to evaluate the property at the time when subhaloes were still distinct haloes. The properties commonly used include mass ($M_{\rm acc}$) at the time a subhalo was accreted into a host halo, maximum circular velocity $V_{\rm acc}$ at the time of accretion, and peak maximum circular velocity $V_{\rm peak}$ over the history of the subhalo as a distinct halo.
The connection between haloes/subhaloes and galaxies is established by rank ordering haloes/subhaloes according to the given property and galaxies according to one certain property (e.g. luminosity or stellar mass). When normalized to the same survey/simulation volume, the halo/subhalo and galaxy of the same rank are linked. A more general treatment also accounts for the scatter between the halo/subhalo property and the galaxy property. The simple procedure of linking light (galaxies) to matter (haloes/subhaloes) can provide a reasonable interpretation of galaxy clustering trends and enable a study of galaxy evolution (e.g. Conroy et al. 2006; Conroy & Wechsler 2009; Behroozi et al. 2013; Reddick et al. 2013).

We generalize the idea in Section 2.1 to the subhalo case, extending the SHAM model and making it efficient to model galaxy clustering. The model allows the scatter between the halo/subhalo property and the galaxy property to be different for distinct haloes (central galaxies) and subhaloes (satellite galaxies). We use mass as the halo/subhalo property variable here, which can be understood as the mass at accretion ($M_{\rm acc}$). However, it can be replaced by any property one chooses to adopt, e.g. $V_{\rm acc}$ and $V_{\rm peak}$. A halo/subhalo method following a similar spirit of pair decomposition to model the projected galaxy 2PCF and weak lensing signal is presented in Neistein et al. (2011) and Neistein & Khochfar (2012).

Compared to the commonly used SHAM method that connects the whole range of galaxy property and halo/subhalo property, the method presented here works for each individual galaxy sample. To some degree, it is formulated in an HOD/CLF-like form, with distinct haloes and subhaloes as tracers of central and satellite galaxies, respectively. It is no longer limited to abundance matching.
Instead, the method can be used to fit both galaxy abundance and galaxy clustering (2PCFs).

For a given galaxy sample, the scatter between halo/subhalo property and galaxy property means that not all haloes/subhaloes are fully occupied by these galaxies, which can be characterized by the probability of occupancy (or the smaller-than-unity mean occupation number). Denote the mean occupation number of central galaxies in distinct haloes of mass $M_h$ as $p_{\rm cen}(M_h)$ and that of satellite galaxies in subhaloes of mass $M_s$ as $p_{\rm sat}(M_s)$. The same bins of mass are adopted for $M_h$ and $M_s$. In principle we do not need to differentiate $M_s$ and $M_h$, since the subscripts ‘c’ (cen) and ‘s’ (sat) below make the situation self-explanatory. Let the mean number densities of distinct haloes and subhaloes in the mass bin $\log M_i \pm d\log M_i/2$ be $\bar{n}_{h,i}$ and $\bar{n}_{s,i}$, respectively.

For a given sample of galaxies, with a model of $p_{\rm cen}(M)$ and $p_{\rm sat}(M)$, the mean number density of galaxies $\bar{n}_g$ is computed as

$$\bar{n}_g = \sum_i \left[\bar{n}_{h,i}\, p_{\rm cen}(M_i) + \bar{n}_{s,i}\, p_{\rm sat}(M_i)\right]. \tag{10}$$

With a similar decomposition as in equation (9), the galaxy 2PCF can be computed as

$$\xi_{gg}(\mathbf{r}) = \sum_{i,j} \frac{\bar{n}_{h,i} \bar{n}_{h,j}}{\bar{n}_g^2}\, p_{\rm cen}(M_i)\, p_{\rm cen}(M_j)\, \xi_{\rm hh}(\mathbf{r}; M_i, M_j) + \sum_{i,j} \frac{2\bar{n}_{h,i} \bar{n}_{s,j}}{\bar{n}_g^2}\, p_{\rm cen}(M_i)\, p_{\rm sat}(M_j)\, \xi_{\rm hs}(\mathbf{r}; M_i, M_j) + \sum_{i,j} \frac{\bar{n}_{s,i} \bar{n}_{s,j}}{\bar{n}_g^2}\, p_{\rm sat}(M_i)\, p_{\rm sat}(M_j)\, \xi_{\rm ss}(\mathbf{r}; M_i, M_j), \tag{11}$$

which simply states that the total number of galaxy pairs is the sum of cen-cen, cen-sat, and sat-sat pairs. The three correlation functions on the RHS have the following meanings: $\xi_{\rm hh}(\mathbf{r}; M_i, M_j)$ is the two-point cross-correlation function between centres of distinct haloes of masses $M_i$ and $M_j$; $\xi_{\rm hs}(\mathbf{r}; M_i, M_j)$ is the two-point cross-correlation function between centres of $M_i$ distinct haloes and those of $M_j$ subhaloes; $\xi_{\rm ss}(\mathbf{r}; M_i, M_j)$ is the two-point cross-correlation function between centres of subhaloes of masses $M_i$ and $M_j$.
Unlike the particle case in Section 2.1, there are no explicit one-halo and two-halo terms here (though they can be derived), and the $i \neq j$ condition is not imposed in the summation.

The quantities $p_{\rm cen}(M)$ and $p_{\rm sat}(M)$ come from the occupation function model, which is up to our choice of parameterization for the sample of galaxies. In this halo/subhalo-based method, we only need to prepare tables for $\bar{n}_{h,i}$, $\bar{n}_{s,i}$, $\xi_{\rm hh}(\mathbf{r}; M_i, M_j)$, $\xi_{\rm hs}(\mathbf{r}; M_i, M_j)$, and $\xi_{\rm ss}(\mathbf{r}; M_i, M_j)$.

As with the tables using particles (Section 2.1), for the redshift-space 2PCF or multipole moments, tables for different sets of central and satellite velocity bias parameters can be prepared. For each set, haloes and subhaloes are shifted to redshift-space positions for calculation. Tables can also be generated for modelling the projected 2PCF $w_p$. The procedures and bins used in the measurements should be followed so that the model and measurements are made fully consistent.

Figure 1.

Decomposition of the projected galaxy 2PCF $w_p$ and redshift-space 2PCF multipoles $\xi_0$, $\xi_2$, and $\xi_4$ into the various one-halo and two-halo components (one-halo cen-sat, one-halo sat-sat, two-halo cen-cen, two-halo cen-sat, and two-halo sat-sat). The circles are measurements from 100 mock galaxy catalogs constructed by populating galaxies into dark matter haloes in the simulation, according to the set of fiducial HOD parameters. The curves are calculations with the method introduced in this paper. See text for more details.

The method developed here has been successfully applied to model projected and redshift-space 2PCFs of SDSS and SDSS-III galaxies on small to intermediate scales (e.g. Guo et al. 2015a,b,c) and to compare HOD and SHAM models (Guo et al. in prep.). As the method is built on the basis of decomposition of galaxy 2PCFs, here we provide an example to illustrate the different 2PCF components. In particular, we show the components for the redshift-space 3D 2PCF and the manifestation of redshift-space distortions in each component to have a better understanding of the redshift-space 2PCFs within the HOD framework. In addition, we also investigate how redshift-space 2PCFs help with HOD constraints, including the inference of the galaxy velocity distribution inside haloes.

The example adopts HOD parameters for the sample of z ∼ . CMASS galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS; Dawson et al. 2013). With spherical overdensity haloes and halo particles from the z = 0 . output of the MultiDark simulation (MDR1; Prada et al. 2012; Riebe et al. 2013), we create tables for halo properties, including halo number density $\bar{n}$ (i.e. halo mass function), projected 2PCF $w_p$, redshift-space 2PCF monopole $\xi_0$, quadrupole $\xi_2$, and hexadecapole $\xi_4$. We place the central galaxy at the position of the potential minimum of each halo and use halo particles as tracers of satellites.
Each of $w_p$ and $\xi_{0/2/4}$ has five components (one-halo cen-sat, one-halo sat-sat, two-halo cen-cen, two-halo cen-sat, and two-halo sat-sat). To generate the $w_p(r_p)$ tables, we measure $\xi(r_p, r_\pi)$ for each component and for each combination of halo mass bins and sum over the $r_\pi$ direction, where $r_p$ and $r_\pi$ are the pair separations in the directions perpendicular and parallel to the line-of-sight direction (chosen to be one principal direction of the simulation box). To generate the $\xi_{0/2/4}$ tables, we measure $\xi(s, \mu)$ for each component and for each combination of halo mass bins and form the multipoles by integrating over $\mu$, where $s$ is the redshift-space pair separation and $\mu$ the cosine of the angle between pair displacement and the line-of-sight direction. Following the setup in the observational measurements (Guo et al. 2015a), we have 19 bins for $r_p$ and $s$ uniformly spaced in logarithmic space, 50 linearly spaced bins in $r_\pi$, and 20 linearly spaced bins in $\mu$. For halo mass bins, we use $d\log M$ = 0 . . We construct tables for 5 bins of central velocity bias parameter $\alpha_c$ and 8 bins of satellite velocity bias parameter $\alpha_s$, respectively. The total size of the final set of tables is about 10GB. That is, the information in the high-resolution simulation output relevant for modelling projected and redshift-space 2PCFs of galaxies has been tremendously compressed, making the modelling tractable even with a desktop computer.

Figure 2. Decomposition of the 3D redshift-space 2PCF $\xi(r_p, r_\pi)$ into the various one-halo and two-halo components (one-halo cen-sat, one-halo sat-sat, two-halo cen-cen, two-halo cen-sat, and two-halo sat-sat). The plot is based on the average measurements from 100 mock galaxy catalogs constructed by populating galaxies into dark matter haloes in the simulation, according to the set of fiducial HOD parameters. The color scale shows $\xi(r_p, r_\pi)$ in logarithmic scale. See text for more details. [Panels labelled total, 1h c-s, 1h s-s, 1h total, 2h c-s, 2h s-s, and 2h total; axes are $r_p$ and $r_\pi$ in $h^{-1}$ Mpc.]

For the HOD, we adopt the common parameterization for a sample of galaxies above a luminosity threshold (Zheng et al. 2005, 2007). The mean occupation function of central galaxies in haloes of mass $M$ is

$$\langle N_{\rm cen}(M)\rangle = \frac{1}{2}\left[1 + {\rm erf}\left(\frac{\log M - \log M_{\rm min}}{\sigma_{\log M}}\right)\right], \tag{12}$$

where erf is the error function. For the mean occupation function of satellite galaxies, we use

$$\langle N_{\rm sat}(M)\rangle = \langle N_{\rm cen}(M)\rangle \left(\frac{M - M_0}{M_1'}\right)^{\alpha}. \tag{13}$$

The number of satellites in haloes of mass $M$ is assumed to follow the Poisson distribution with the above mean. For modelling redshift-space 2PCFs, we have two additional HOD parameters, $\alpha_c$ and $\alpha_s$, for central and satellite velocity bias. Essentially, $\alpha_c$ ($\alpha_s$) is the ratio of the velocity dispersion of central (satellite) galaxies to that of dark matter particles inside haloes (see Guo et al. 2015a). For the fiducial model, we adopt the set of parameters that fit the projected and redshift-space 2PCFs for the CMASS sample in Guo et al. (2015a): $\log M_{\rm min}$ = 13 . , $\sigma_{\log M}$ = 0 . , $\log M_0$ = 13 . , $\log M_1'$ = 14 . , $\alpha$ = 1 . , $\alpha_c$ = 0 . , and $\alpha_s$ = 0 . . Halo masses are in units of $h^{-1} M_\odot$.

With the tables and the fiducial HOD parameters, we follow equations (1), (3), and (9) to compute all the components of $w_p$ and $\xi_{0/2/4}$. As a sanity check, we also measure the components from 100 mock galaxy catalogs. The mock catalogs are generated by populating haloes in the simulation, putting central galaxies at the potential minimum in haloes and drawing random dark matter particles as satellite galaxies, in accordance with the occupation distributions and velocities set by the fiducial HOD parameters. For comparison with the model based on the tables, we decompose the galaxy 2PCF (either $w_p$ or $\xi_{0/2/4}$) measured in the mock catalogs into five components,

$$\xi_{gg}(\mathbf{r}) = \frac{2\bar{n}_{\rm cs-pair}}{\bar{n}_g^2} f_{\rm cs}(\mathbf{r}) + \frac{2\bar{n}_{\rm ss-pair}}{\bar{n}_g^2} f_{\rm ss}(\mathbf{r}) + \frac{\bar{n}_c^2}{\bar{n}_g^2} \xi_{\rm cc}(\mathbf{r}) + \frac{2\bar{n}_c \bar{n}_s}{\bar{n}_g^2} \xi_{\rm cs}(\mathbf{r}) + \frac{\bar{n}_s^2}{\bar{n}_g^2} \xi_{\rm ss}(\mathbf{r}). \tag{14}$$

The first two terms on the RHS are one-halo terms: $\bar{n}_{\rm cs-pair}$ and $\bar{n}_{\rm ss-pair}$ are the mean number densities of one-halo cen-sat pairs and one-halo sat-sat pairs measured in the mock catalogs, and $f_{\rm cs}$ and $f_{\rm ss}$ are the normalized average distributions of one-halo cen-sat and sat-sat pairs in the mock.
The last three terms on the RHS are two-halo terms: $\bar{n}_c$ and $\bar{n}_s$ are the mean number densities of central and satellite galaxies in the mock, and $\xi_{cc}$, $\xi_{cs}$, and $\xi_{ss}$ are the 2PCFs obtained by counting only two-halo cen-cen, cen-sat, and sat-sat pairs (Zu et al. 2008).

Figure 1 shows the decomposition of $w_p$ and $\xi_{0/2/4}$ for the fiducial model. As expected, the calculations from the simulation-based method (curves) agree with the measurements from the mock catalogs (circles), which is reassuring. For the projected 2PCF $w_p$ (top-left panel), the one-halo cen-sat term (red) dominates the small-scale signal. The one-halo sat-sat term (magenta) extends to larger scales, since the maximum sat-sat pair separation in a halo is the diameter of the halo, twice that of the cen-sat pair separation. Owing to the low satellite fraction ( $f_{\rm sat}$ ∼ ) of this sample of galaxies, the contribution of the one-halo sat-sat pairs to $w_p$ is overall small, but noticeable around $h^{-1}$ Mpc, the one-halo to two-halo term transition scales. On large scales, the three two-halo terms have a similar shape, since they essentially follow the halo-halo correlation. The flattening towards small scales is caused by the halo exclusion effect. Compared to the two-halo cen-cen component, the two-halo cen-sat component is smoothed on small scales: as a result of the spatial distribution of satellites inside haloes, each halo contributing the satellite of the cen-sat pair is on average extended instead of a point source (the case for the halo contributing the central galaxy of the pair). The two-halo sat-sat term is even more smoothed, since every halo becomes extended. To see the relative contribution of each term to the large-scale 2PCF, we note that in equation (14), $\xi_{cc} \propto b_c^2$, $\xi_{cs} \propto b_c b_s$, and $\xi_{ss} \propto b_s^2$ on large scales, where $b_c$ and $b_s$ are the large-scale bias factors for central and satellite galaxies, respectively.
Since satellites on average reside in more massive haloes than central galaxies, the value of $b_s$ is higher than that of $b_c$ (roughly by tens of per cent for luminosity-threshold samples). From equation (14), we see that the relative contributions to the large-scale 2PCF from the two-halo cen-cen, cen-sat, and sat-sat terms are $1 : 2 f_n f_b : (f_n f_b)^2$, with $f_n = \bar{n}_s / \bar{n}_c = f_{\rm sat} / (1 - f_{\rm sat})$ the satellite to central galaxy number density ratio and $f_b = b_s / b_c$ the satellite to central galaxy bias ratio. For the sample we consider, the ratios are 1 : 25% : 1.6%. For lower luminosity samples with higher satellite fractions, we expect the contributions from the two-halo cen-sat and sat-sat terms to be substantially higher.

The decomposition of the redshift-space 2PCF monopole $\xi_0$ (top-right panel) and the relative amplitudes of the various terms are similar to the case of $w_p$. The bottom two panels show the cases of the quadrupole $\xi_2$ and the hexadecapole $\xi_4$, and each term is multiplied by a factor s . so that both the small-scale and large-scale signals show up reasonably well. The Fingers-of-God effect (Jackson 1972; Huchra 1988) from one-halo terms causes a positive quadrupole. In the $\xi_2$ panel, we see that the influence of the one-halo terms can extend to about 10 $h^{-1}$ Mpc in the quadrupole. The negative quadrupole on large scales manifests the Kaiser effect (Kaiser 1987; Hamilton 1992) caused by the coherent motion of haloes, falling into overdense regions and streaming out of underdense regions. The two-halo cen-cen term dominates the large-scale quadrupole, but the cen-sat term is also important. Both terms show low positive quadrupole signals toward small scales, caused by the random motion of haloes (and galaxies). The two-halo sat-sat term makes an almost negligible contribution to the quadrupole on all scales. The hexadecapole $\xi_4$ (bottom-right panel) is mostly positive for all components.
The relative contributions from different components are similar to the quadrupole case.

The projected 2PCF and the redshift-space 2PCF multipoles are usually the quantities to model. The 3D redshift-space 2PCF measurements are commonly displayed as contours of $\xi(r_p, r_\pi)$, which make the redshift-space distortion effects on all scales easy to visualize. It is instructive to have the corresponding one-halo and two-halo components to gain a better intuition about the redshift-space distortions. Figure 2 shows such a decomposition measured from the mock catalogs, which can also be calculated using the $\xi(r_p, r_\pi)$ component tables.

The leftmost panel shows the total redshift-space 2PCF of the sample, with the Fingers-of-God and Kaiser effects clearly seen. The Fingers-of-God effect, limited to small transverse separation $r_p$, is mainly contributed by the one-halo terms (two middle panels on the top). The one-halo sat-sat component appears to be more extended than the one-halo cen-sat component in both the transverse and the line-of-sight directions. In the transverse direction, this can be explained by the fact that the largest one-halo sat-sat (cen-sat) pair separation is about the diameter (radius) of the largest haloes. In the line-of-sight direction, the elongation is mainly a result of galaxy motion inside haloes. The relative line-of-sight velocity of sat-sat pairs is higher than that of cen-sat pairs, causing the one-halo sat-sat component to be more extended (a shallower profile as a function of $r_\pi$). The total one-halo term (rightmost panel on the top) is dominated by the cen-sat component at small $r_p$ and by the sat-sat component at slightly larger $r_p$.

The three two-halo components and the total two-halo term are shown in the bottom panels of Figure 2. In each component, the double-hump feature at small $r_p$ reflects the halo-exclusion effect. The effect would lead to a hole at the centre if the real-space 2PCF were plotted here.
The shift in the line-of-sight galaxy positions in redshift space from galaxy peculiar motion partially fills the hole. The two-halo cen-cen component shows an overall Kaiser squashing effect along the line of sight. However, the contours at small $r_p$ are elongated along the line of sight, like the Fingers-of-God effect. This is caused by the random motion of haloes and that of central galaxies with respect to haloes (i.e. a non-zero central velocity bias). The two-halo cen-sat component shows a much stronger line-of-sight elongation, up to a few Mpc in $r_p$. The reason lies in the motion of satellites inside haloes, which causes the average redshift-space distribution of satellites to appear extended along the line of sight in an average halo hosting the satellites of the two-halo cen-sat pairs. The line-of-sight elongation pattern is even stronger in the two-halo sat-sat component: the correlation of elongated haloes (as a result of the redshift-space spatial distribution of satellites inside haloes) completely suppresses the Kaiser effect even on the largest scales shown here ( ∼ $h^{-1}$ Mpc ). The total two-halo term is dominated by the cen-cen component, with a substantial contribution from the cen-sat component. The sat-sat component does not make an important contribution for this sample. As discussed before, we expect the two-halo cen-sat and sat-sat components to become more important for galaxy samples with lower luminosity thresholds and higher satellite fractions.

Overall, for the 3D redshift-space 2PCF $\xi(r_p, r_\pi)$, different components of the one-halo and two-halo terms have different transverse ranges of the line-of-sight elongation. The profile along the line of sight also depends on the type of pairs in consideration, becoming increasingly shallower from cen-cen to cen-sat to sat-sat components. For each component, the streaming model (e.g.
Peebles 1980), which is usually adopted in simple models of redshift-space distortions and is essentially a convolution of the real-space 2PCF with a velocity dispersion kernel, should work well. For the total redshift-space 2PCF, our results indicate that it is hard to use a single velocity dispersion kernel to accurately model the redshift-space distortion effect. The different components are needed if one wishes to develop an accurate analytic model (e.g. Tinker 2007).

Figure 3. Left: Constraints on $\log M_{\rm min}$ and $\sigma_{\log M}$ from the 2PCFs with the fiducial galaxy sample. The model 2PCFs are calculated with the method introduced in this paper. Blue and black contours are for the cases of modelling $w_p$ only and jointly modelling $w_p + \xi_{0/2/4}$, respectively. The 68.3% and 95.4% confidence levels are shown for each case. Right: Constraints on the central and satellite velocity bias parameters ($\alpha_c$ and $\alpha_s$) for the fiducial galaxy sample from jointly modelling $w_p + \xi_{0/2/4}$. The red asterisk in each panel indicates the value from the fiducial model.

Finally, we investigate the constraints on the HOD parameters from projected and redshift-space 2PCFs. The 2PCFs predicted from the fiducial set of HOD parameters are used as the input measurements, and the full covariance matrix from Guo et al. (2015a) measured from the CMASS data is adopted. The model uncertainty caused by the finite volume of the simulation is also accounted for by rescaling the covariance matrix (see Appendix A). We employ a Markov Chain Monte Carlo method to explore the parameter space of the 7 HOD parameters, $M_{\rm min}$, $\sigma_{\log M}$, $M_0$, $M_1'$, $\alpha$, $\alpha_c$, and $\alpha_s$. We first model the projected 2PCF $w_p$ only. The first five parameters, related to the galaxy mean occupation function, can be constrained, while there are virtually no constraints on the velocity bias parameters ($\alpha_c$ and $\alpha_s$) as the line-of-sight information is lost. We then jointly model $w_p$ and the redshift-space 2PCF multipoles $\xi_{0/2/4}$. We find that redshift-space 2PCFs help tighten the constraints mainly in $M_{\rm min}$ and $\sigma_{\log M}$, the two parameters for the mean occupation function of central galaxies. In the left panel of Figure 3, we compare the constraints (marginalized 1σ and 2σ contours) from $w_p$ only (blue) and $w_p + \xi_{0/2/4}$ (black). The constraints on the parameters for the mean occupation function of satellite galaxies are only slightly improved, mainly in M . In general, compared to the $w_p$-only case, redshift-space 2PCFs do not lead to a substantial improvement in the HOD parameters related to the occupation function. The reason may be that the projected 2PCF $w_p$ is not independent of the redshift-space 2PCFs, and that the information content in $\xi_{0/2/4}$ for constraining the occupation-related parameters largely overlaps with that in $w_p$. The correlated information in $w_p$ and $\xi_{0/2/4}$ is embedded in the covariance matrix. Therefore, when jointly modelling $w_p$ and $\xi_{0/2/4}$, it is important to use the full covariance matrix, including the covariances between $w_p$ and $\xi_{0/2/4}$, to avoid double counting the information content and artificially tightening the HOD constraints.

The redshift-space distortions are caused by the peculiar motion of galaxies.
The peculiar motion of haloes is present in the simulation and built into the tables, so modelling redshift-space 2PCFs leads to constraints on the galaxy motion inside haloes, i.e. the central and satellite velocity bias parameters. The right panel of Figure 3 shows that velocity bias parameters can be clearly detected for the fiducial sample. Velocity bias parameters have been constrained from redshift-space clustering for the z ∼ . BOSS CMASS galaxies (Guo et al. 2015a,b; Reid et al. 2014) and z ∼ . SDSS Main galaxies (see Guo et al. 2015c and Guo et al. 2015d for applications of the modelling method based on simulation particles and subhaloes, respectively). More discussion of the velocity bias constraints and their implications can be found in Guo et al. (2015a).
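
As a rough numerical companion to equations (12) to (14), the sketch below evaluates the two mean occupation functions and the large-scale two-halo ratios $1 : 2 f_n f_b : (f_n f_b)^2$ that follow from equation (14). It is only an illustration under stated assumptions: the function names are hypothetical, the satellite occupation is set to zero for $M \le M_0$ (a convention assumed here), and the numbers in the usage lines are placeholders rather than the fiducial values of this paper.

```python
import math


def mean_ncen(log_m, log_m_min, sigma_log_m):
    """Equation (12): mean number of central galaxies in haloes of mass M."""
    return 0.5 * (1.0 + math.erf((log_m - log_m_min) / sigma_log_m))


def mean_nsat(log_m, log_m_min, sigma_log_m, log_m0, log_m1p, alpha):
    """Equation (13): mean number of satellite galaxies in haloes of mass M.

    Assumption of this sketch: the occupation is zero for M <= M_0.
    """
    m, m0, m1p = 10.0 ** log_m, 10.0 ** log_m0, 10.0 ** log_m1p
    if m <= m0:
        return 0.0
    return mean_ncen(log_m, log_m_min, sigma_log_m) * ((m - m0) / m1p) ** alpha


def two_halo_ratios(f_sat, f_b):
    """Large-scale cen-cen : cen-sat : sat-sat ratios implied by equation (14).

    f_sat is the satellite fraction and f_b = b_s / b_c the bias ratio.
    """
    f_n = f_sat / (1.0 - f_sat)  # satellite-to-central number density ratio
    x = f_n * f_b
    return 1.0, 2.0 * x, x * x


# Placeholder parameters (illustrative only, not the paper's fiducial model).
print(mean_ncen(13.0, 13.0, 0.5))                   # half occupation at M = M_min
print(mean_nsat(14.0, 13.0, 0.5, 13.0, 14.0, 1.0))
print(two_halo_ratios(0.1, 1.125))                  # f_n * f_b = 0.125
```

With $f_n f_b = 0.125$, the last call gives ratios of 1 : 0.25 : 0.0156, the same pattern as the 1 : 25% : 1.6% quoted for the sample considered in the text.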

In this paper, we introduce a simulation-based method to accurately and efficiently model galaxy 2PCFs in projected and redshift spaces. The basic idea is to make use of a high-resolution simulation and tabulate all the halo information necessary for galaxy clustering calculation. Then, on top of the tables, galaxy 2PCFs can be computed with the galaxy-halo relation specified by the HOD or CLF model. We also provide a version that applies to and extends the SHAM method. Based on the method, we also study the decomposition of the projected and redshift-space galaxy 2PCFs into different components according to the type of galaxy pairs.

The proposed method is accurate, since it is directly based on high-resolution simulations. Effects like halo exclusion, nonlinear evolution, scale-dependent halo bias, and non-sphericity of haloes, which are difficult to deal with in analytic methods of computing galaxy 2PCFs, are all automatically accounted for in the simulation-based method. The method also breaks the 2PCFs into all the one-halo and two-halo components based on the nature of galaxy pairs and computes each component accurately, which is usually not the case in analytic methods (especially for the two-halo term). When building the tables, the same binning scheme (in pair separation and in angle) and the same integration procedure as used in the observational measurements are adopted, so there is no binning-related issue when comparing the model prediction with the measurements. The method is equivalent to measuring the model galaxy 2PCFs from mock catalogs and is as accurate as what the mean mock catalog can achieve. The mock catalogs are constructed by populating galaxies (using tracer particles) into haloes identified in the simulation, according to the halo occupation specified by the HOD/CLF model. However, the method is more efficient, as it avoids the construction of mock catalogs and the measurement of the 2PCFs from the mocks.
Instead, 'populating galaxies' and 'measuring the 2PCFs' are performed analytically within the HOD/CLF framework. This greatly reduces the computational time and makes it possible to efficiently explore the parameter space when modelling the 2PCF data.

A similar method working in Fourier space can be easily developed to model the galaxy redshift-space power spectrum. The method can also be generalized to other clustering statistics, e.g. the angular 2PCF of galaxies, the two-point cross-correlation function of galaxies, and galaxy-galaxy lensing. Generalizing the method to the three-point correlation function (3PCF) of galaxies is also possible. In principle, there are more components for the 3PCF: cen-sat-sat and sat-sat-sat triplets for the one-halo term; cen-(cen-sat), cen-(sat-sat), sat-(cen-sat), and sat-(sat-sat) triplets for the two-halo term (the pair in the parentheses is in the same halo); and cen-cen-cen, cen-cen-sat, cen-sat-sat, and sat-sat-sat triplets for the three-halo term. More importantly, compared to the 2PCF case, the dimension of each 3PCF component table will increase (e.g. two sides and the angle in between for a triangle configuration and three halo mass indices). To make such a method suitable for 3PCF modelling, further simplification is necessary, e.g. through multipole or Fourier expansion (e.g. Szapudi 2004; Zheng 2004b; Slepian & Eisenstein 2015).

To make use of the high precision of small- to intermediate-scale 2PCF measurements to help constrain cosmological parameters (e.g. Zheng & Weinberg 2007; Reid et al. 2014), a set of tables needs to be prepared based on simulations with different cosmological parameters or by rescaling one simulation to different cosmological models (e.g. Zheng et al. 2002; Tinker et al. 2006; Angulo & White 2010; Reid et al. 2014; Guo et al. 2015c). Even with one cosmological model, there may be situations that need more tables.
For example, in the particle-based model, random particles are selected to trace satellite galaxies by default. However, the difference between the spatial distributions of satellites and dark matter can be an additional parameter to be constrained. For such a purpose, one needs to build different sets of tables using tracer particles of different distributions. In either of the above cases (or any case that needs to extend the tables), the total size of the tables would increase by an order of magnitude. Compared with methods of directly populating simulations, such an increase in table size is still reasonable and manageable.

With one simulation, we do not have the global or ensemble average properties of haloes. That is, the model with one simulation has uncertainty caused by the finite volume effect. One can use multiple simulations with different realizations of the initial conditions to build the average tables, which reduces the model uncertainty. The model uncertainty should be included in modelling data. In Appendix A, we show that this can be done by rescaling the covariance matrix of the measurements based on the ratio of the simulation and survey volumes. For any simulation, the fluctuation modes with wavelengths longer than the box size are missing, so the application of our modelling method should be limited to scales much smaller than the simulation box size. This is particularly true for redshift-space distortion modelling, since the velocity field is more sensitive to large-scale modes than the density field.

In presenting the method, the halo variable used to build the tables is halo mass (or characteristic velocity for the subhalo case). The corresponding HOD/CLF model assumes that the statistical properties of galaxies inside haloes depend only on halo mass, not on halo environment or growth history. Clustering of haloes at fixed mass is found to depend on the assembly history (a.k.a. assembly bias; e.g. Gao et al. 2005; Wechsler et al. 2006; Zhu et al.
2006; Jing et al. 2007). There is room for the galaxy content in haloes of fixed mass to depend on halo formation history, which would affect galaxy clustering and HOD constraints (e.g. Zentner et al. 2014), although no clear evidence is found in hydrodynamic galaxy formation simulations (e.g. Berlind et al. 2003) or galaxy clustering measurements (e.g. Lin et al. 2015). As mentioned in Section 2, the halo variable in our method is not necessarily the halo mass. It can certainly be a set of variables, like halo mass plus a variable characterizing halo formation history (e.g. halo concentration or formation redshift). With tables built in terms of the set of variables, along with an HOD/CLF model depending on these variables, the simulation-based method works in the same way as presented in this paper. However, the efficiency of the method drops sharply when including more halo variables. The limitation is mainly set by the computation of the two-halo terms, where both the table size and the computational time scale as $O(N^2)$, with $N$ the total number of bins in halo properties (e.g. with $N_1$ halo mass bins and $N_2$ halo formation time bins, $N = N_1 N_2$). In practice, we may be barely able to accommodate the case of two halo variables, by choosing bin sizes to minimize the table size and computational cost without sacrificing the accuracy of the method. Before resorting to directly populating the simulations, a possible way of circumventing the limitation is to use some combination of halo variables, reducing the problem to one effective halo variable. Certainly, further investigations are needed to find the appropriate combination(s).

A different approach to modelling galaxy clustering is through an emulator (e.g. Kwan et al. 2015). With this approach, galaxy correlation functions are first obtained with mock catalogs from N-body simulations, spanning a range of HOD parameters. Then the emulator works by interpolation to predict the galaxy correlation function for any given set of HOD parameters.
Compared to the method we propose in this paper, the emulator can be extremely fast, since it only performs interpolations and avoids any calculation at the level of dark matter haloes. In principle, the emulator can be generalized to interpolate among the one-halo and two-halo component contributions to the 2PCFs. However, by construction, the emulator only operates with a certain HOD form and within a certain range of HOD parameters for the interpolation to work and for the accuracy to be under control. The method we propose performs direct calculations with clear physical meanings based on halo properties, and therefore it does not suffer from the above restrictions of an emulator.

With increasingly precise measurements of galaxy clustering from forthcoming large galaxy surveys, such as DESI (Levi et al. 2013) and Euclid (Laureijs et al. 2011), we expect that the accurate and efficient modelling method introduced in this work and its generalizations will have great potential and wide applications.

ACKNOWLEDGMENTS

ZZ is partially supported by NSF grant AST-1208891. HG acknowledges the support of NSFC-11543003 and the 100 Talents Program of the Chinese Academy of Sciences.

References

Angulo R. E., White S. D. M., 2010, MNRAS, 405, 143
Behroozi P. S., Wechsler R. H., Conroy C., 2013, ApJ, 770, 57
Berlind A. A., Weinberg D. H., 2002, ApJ, 575, 587
Berlind A. A. et al., 2003, ApJ, 593, 1
Blake C., Kazin E. A., Beutler F. et al., 2011, MNRAS, 418, 1707
Cacciato M., van den Bosch F. C., More S., Mo H., Yang X., 2013, MNRAS, 430, 767
Colless M., 1999, RSPTA, 357, 105
Conroy C., Wechsler R. H., Kravtsov A. V., 2006, ApJ, 647, 201
Conroy C., Wechsler R. H., 2009, ApJ, 696, 620
Dawson K. S. et al., 2013, AJ, 145, 10
Eisenstein D. J., Weinberg D. H., Agol E. et al., 2011, AJ, 142, 72
Feldman H. A., Kaiser N., Peacock J. A., 1994, ApJ, 426, 23
Gao L., Springel V., White S. D. M., 2005, MNRAS, 363, L66
Guo H. et al., 2015a, MNRAS, 446, 578
Guo H. et al., 2015b, MNRAS, 449, L95
Guo H. et al., 2015c, MNRAS, 453, 4368
Guo H. et al., 2015d, arXiv:1508.07012
Guo H. et al., 2014, MNRAS, 441, 2398
Hamilton A. J. S., 1992, ApJL, 385, L5
Huchra J. P., 1988, in Dickey J. M., ed., Astronomical Society of the Pacific Conference Series Vol. 5, The Minnesota lectures on Clusters of Galaxies and Large-Scale Structure. pp 41–70
Jackson J. C., 1972, MNRAS, 156, 1P
Jenkins A., Frenk C. S., White S. D. M. et al., 2001, MNRAS, 321, 372
Jing Y. P., Mo H. J., Boerner G., 1998, ApJ, 494, 1
Jing Y. P., Suto Y., Mo H. J., 2007, ApJ, 657, 664
Kaiser N., 1987, MNRAS, 227, 1
Kwan J., Heitmann K., Habib S., et al., 2015, ApJ, 810, 35
Laureijs R., Amiaux J., Arduini S. et al., 2011, ArXiv e-prints, arXiv:1110.3193
Levi M., Bebek C., Beers T. et al., 2013, ArXiv e-prints, arXiv:1308.0847
Lin Y.-T., Mandelbaum R., Huang Y.-H., Huang H.-J., Dalal N., Diemer B., Jian H.-Y., Kravtsov A., 2015, ArXiv e-prints, arXiv:1504.07632
Mo H. J., Jing Y. P., White S. D. M., 1996, MNRAS, 282, 1096
Nagai D., Kravtsov A. V., 2005, ApJ, 618, 557
Neistein E., Li C., Khochfar S. et al., 2011, MNRAS, 416, 1486
Neistein E., Khochfar S., 2012, ArXiv e-prints, arXiv:1209.0463
Parejko J. K. et al., 2013, MNRAS, 429, 98
Peacock J. A., Smith R. E., 2000, MNRAS, 318, 1144
Peebles P. J. E., 1980, The large-scale structure of the universe
Prada F., Klypin A. A., Cuesta A. J., Betancort-Rijo J. E., Primack J., 2012, MNRAS, 423, 3018
Press W. H., Schechter P., 1974, ApJ, 187, 425
Reddick R. M., Wechsler R. H., Tinker J. L., Behroozi P. S., 2013, ApJ, 771, 30
Reid B. A., Seo H.-J., Leauthaud A., Tinker J. L., White M., 2014, MNRAS, 444, 476
Reid B. A., White M., 2011, MNRAS, 417, 1913
Riebe K. et al., 2013, AN, 334, 691
Scoccimarro R., Sheth R. K., Hui L., Jain B., 2001, ApJ, 546, 20
Seljak U., 2000, MNRAS, 318, 203
Sheth R. K., Tormen G., 1999, MNRAS, 308, 119
Skibba R. A., Coil A. L., Mendez A. J., et al., 2015, ApJ, 807, 152
Slepian Z., Eisenstein D. J., 2015, MNRAS, 454, 4142
Smith R. E. et al., 2003, MNRAS, 341, 1311
Szapudi I., 2004, ApJL, 605, L89
Tegmark M., 1997, Physical Review Letters, 79, 3806
Tinker J. L., Weinberg D. H., Zheng Z., 2006, MNRAS, 368, 85
Tinker J. L., 2007, MNRAS, 374, 477
Tinker J. L., Weinberg D. H., Zheng Z., Zehavi I., 2005, ApJ, 631, 41
Tinker J., Kravtsov A. V., Klypin A. et al., 2008, ApJ, 688, 709
Tinker J. L., Robertson B. E., Kravtsov A. V. et al., 2010, ApJ, 724, 878
van den Bosch F. C., Yang X. H., Mo H. J., 2003, MNRAS, 340, 771
van den Bosch F. C., Mo H. J., Yang X. H., 2003, MNRAS, 345, 923
van den Bosch F. C., More S., Cacciato M., Mo H., Yang X., 2013, MNRAS, 430, 725
Wechsler R. H., Zentner A. R., Bullock J. S., Kravtsov A. V., Allgood B., 2006, ApJ, 652, 71
White M. et al., 2011, ApJ, 728, 126
Yang X., Mo H. J., van den Bosch F. C., 2003, MNRAS, 339, 1057
York D. G. et al., 2000, AJ, 120, 1579
Zehavi I. et al., 2005, ApJ, 621, 22
Zehavi I. et al., 2011, ApJ, 736, 59
Zentner A. R., Hearin A. P., van den Bosch F. C., 2014, MNRAS, 443, 3044
Zheng Z., Tinker J. L., Weinberg D. H., Berlind A. A., 2002, ApJ, 575, 617
Zheng Z., 2004a, ApJ, 610, 61
Zheng Z., 2004b, ApJ, 614, 527
Zheng Z. et al., 2005, ApJ, 633, 791
Zheng Z., Coil A. L., Zehavi I., 2007, ApJ, 667, 760
Zheng Z., Weinberg D. H., 2007, ApJ, 659, 1
Zheng Z., Zehavi I., Eisenstein D. J., Weinberg D. H., Jing Y. P., 2009, ApJ, 707, 554
Zhu G., Zheng Z., Lin W. P., Jing Y. P., Kang X., Gao L., 2006, ApJL, 639, L5
Zu Y., Weinberg D. H., 2013, MNRAS, 431, 3319
Zu Y., Zheng Z., Zhu G., Jing Y. P., 2008, ApJ, 686, 41

APPENDIX A: COVARIANCE MATRIX WITH MODEL UNCERTAINTY

Let us consider the case in which we use a model built on one simulation in a volume $V_{\rm m}$ ('m' for model) to interpret the observation obtained from a survey volume $V_{\rm o}$ ('o' for observation). What covariance matrix should we use to model the data? The covariance matrix estimated for the observation tells us the covariance in the observational data. However, the model is based on a simulation with a finite volume, and therefore it is not the global model or the model from the ensemble average. The model itself has uncertainty, and the modelling needs to account for this. To derive the effective covariance matrix $C^{\rm eff}$ to be used in the modelling, let us define the $i$-th data point measured in the observational volume $V_{\rm o}$ as $F^{V_{\rm o}}_{{\rm o},i}$, the $i$-th data point from the model with simulation volume $V_{\rm m}$ as $F^{V_{\rm m}}_{{\rm m},i}$, and the global averages (or the ensemble averages) of the observational and model data points as $F_{{\rm o},i}$ and $F_{{\rm m},i}$, respectively. Note that for an accurate model that reflects reality, we have $F_{{\rm m},i} = F_{{\rm o},i}$. That is, the global model reproduces the global average observation.

The effective covariance matrix with model uncertainty included is then

$$C^{\rm eff}_{ij} = \left\langle \left(F^{V_{\rm o}}_{{\rm o},i} - F^{V_{\rm m}}_{{\rm m},i}\right) \left(F^{V_{\rm o}}_{{\rm o},j} - F^{V_{\rm m}}_{{\rm m},j}\right) \right\rangle \qquad ({\rm A1})$$

$$= \left\langle \left[\left(F^{V_{\rm o}}_{{\rm o},i} - F_{{\rm o},i}\right) - \left(F^{V_{\rm m}}_{{\rm m},i} - F_{{\rm m},i}\right)\right] \left[\left(F^{V_{\rm o}}_{{\rm o},j} - F_{{\rm o},j}\right) - \left(F^{V_{\rm m}}_{{\rm m},j} - F_{{\rm m},j}\right)\right] \right\rangle \qquad ({\rm A2})$$

$$= \left\langle \left(F^{V_{\rm o}}_{{\rm o},i} - F_{{\rm o},i}\right) \left(F^{V_{\rm o}}_{{\rm o},j} - F_{{\rm o},j}\right) \right\rangle + \left\langle \left(F^{V_{\rm m}}_{{\rm m},i} - F_{{\rm m},i}\right) \left(F^{V_{\rm m}}_{{\rm m},j} - F_{{\rm m},j}\right) \right\rangle - \left\langle \left(F^{V_{\rm o}}_{{\rm o},i} - F_{{\rm o},i}\right) \left(F^{V_{\rm m}}_{{\rm m},j} - F_{{\rm m},j}\right) \right\rangle - \left\langle \left(F^{V_{\rm m}}_{{\rm m},i} - F_{{\rm m},i}\right) \left(F^{V_{\rm o}}_{{\rm o},j} - F_{{\rm o},j}\right) \right\rangle. \qquad ({\rm A3})$$

The symbol $\langle \rangle$ denotes the global/ensemble average over observations in volumes of $V_{\rm o}$ and over models in volumes of $V_{\rm m}$. From (A1) to (A2), we make use of the above $F_{{\rm m},i} = F_{{\rm o},i}$ relation. In (A3), the first term is the element $C^{V_{\rm o}}_{ij}$ of the covariance matrix for the measurements in volume $V_{\rm o}$, the second term is the element $C^{V_{\rm m}}_{ij}$ of the covariance matrix for the measurements in volume $V_{\rm m}$ (since the model values can be regarded as mock measurements), and both the third and fourth terms are zero (since there is no correlation between observational measurements and mock measurements). We then have

$$C^{\rm eff} = C^{V_{\rm o}} + C^{V_{\rm m}}, \qquad ({\rm A4})$$

and the result is expected and intuitive.

For power spectrum or 2PCF measurements, the covariance matrix element is inversely proportional to the volume (Feldman et al. 1994; Tegmark 1997), so that $C^{V_{\rm m}} = (V_{\rm o}/V_{\rm m})\, C^{V_{\rm o}}$. We can therefore express the effective covariance matrix in equation (A4) in terms of the one estimated for the observation and the relative volume of the simulation and observation,

$$C^{\rm eff} = \left(1 + \frac{V_{\rm o}}{V_{\rm m}}\right) C^{V_{\rm o}}. \qquad ({\rm A5})$$
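
The rescaling that follows from equation (A4) and the inverse-volume scaling of the covariance can be sketched in a few lines. This is a minimal illustration, assuming the covariance matrix is stored as a nested list and that both volumes are given in the same units; the function name `effective_covariance` and the numbers in the usage lines are hypothetical.

```python
def effective_covariance(cov_obs, v_obs, v_model):
    """Return C_eff = (1 + V_o / V_m) * C_obs for a covariance matrix.

    cov_obs : covariance matrix of the observation, as a list of rows
    v_obs   : survey volume V_o
    v_model : simulation volume V_m (same units as v_obs)
    """
    if v_obs <= 0 or v_model <= 0:
        raise ValueError("volumes must be positive")
    # C^{V_m} = (V_o / V_m) C^{V_o}, added to C^{V_o} as in equation (A4).
    factor = 1.0 + v_obs / v_model
    return [[factor * element for element in row] for row in cov_obs]


# Illustrative numbers only: a simulation four times the survey volume
# inflates every covariance element by 25 per cent.
cov = [[4.0, 1.0], [1.0, 9.0]]
print(effective_covariance(cov, v_obs=1.0, v_model=4.0))
```

A simulation much larger than the survey leaves the covariance nearly unchanged, while a simulation of equal volume doubles it, which is why averaging tables over several realizations reduces the model uncertainty.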