Constraints on sigma_8 from galaxy clustering in N-body simulations and semi-analytic models

Abstract

We generate mock galaxy catalogues for a grid of different cosmologies, using rescaled N-body simulations in tandem with a semi-analytic model run using consistent parameters. Because we predict the galaxy bias, rather than fitting it as a nuisance parameter, we obtain an almost pure constraint on sigma_8 by comparing the projected two-point correlation function we obtain to that from the SDSS. A systematic error arises because different semi-analytic modelling assumptions allow us to fit the r-band luminosity function equally well. Combining our estimate of the error from this source with the statistical error, we find sigma_8=0.97 +/- 0.06. We obtain consistent results if we use galaxy samples with a different magnitude threshold, or if we select galaxies by b_J-band rather than r-band luminosity and compare to data from the 2dFGRS. Our estimate for sigma_8 is higher than that obtained for other analyses of galaxy data alone, and we attempt to find the source of this difference. We note that in any case, galaxy clustering data provide a very stringent constraint on galaxy formation models.


arXiv [astro-ph]. Mon. Not. R. Astron. Soc. 000, 000–000 (0000). Printed 9 November 2018 (MN LaTeX style file v2.2)

Geraint Harker, Shaun Cole and Adrian Jenkins
Department of Physics, University of Durham, Science Laboratories, South Road, Durham DH1 3LE
Kapteyn Astronomical Institute, University of Groningen, PO Box 800, 9700AV Groningen, the Netherlands

Key words: cosmology: theory – dark matter – galaxies: haloes – galaxies: formation

For a given set of cosmological parameters in ΛCDM, the clustering of dark matter can be studied very accurately through N-body simulations (e.g., Springel et al. 2005), or for that matter, through analytic models calibrated by simulations (e.g., Smith et al. 2003). The clustering of dark matter is not usually observed directly, however, though weak lensing shear-shear correlations can provide (at present noisy) estimates. Redshift surveys may furnish us with galaxy clustering statistics (see, e.g., Peacock 2003), while weak lensing measurements, for example, normally probe the cross-correlation between galaxies and dark matter (Refregier 2003, and references therein).

Galaxy clustering statistics derive a great deal of their power to constrain cosmological parameters by constraining the scale at which the power spectrum ‘turns over’ on large scales, which complements the high-redshift CMB constraint on this scale rather well. The baryonic features in the correlation function or power spectrum add to the effectiveness of the constraint (Cole et al. 2005; Eisenstein et al. 2005). The scales used in these joint constraints tend to be large scales, where the evolution of clustering is still in the linear regime or where deviations from linearity can be more readily modelled. Moreover, in this regime the galaxy correlation function is expected, in the absence of non-local effects, to have the same shape as the mass correlation function, though offset by a constant factor (see, e.g., Coles 1993).
This offset – the (square of the) bias – depends on the galaxy population under consideration; it depends, for example, on the threshold luminosity of the sample. Because of this uncertainty, when the galaxy correlation function is used to constrain cosmology, information on its overall normalization is not normally used, and the constraints come entirely from its shape.

In this paper we generate synthetic galaxy clustering statistics by painting galaxies from a semi-analytic model onto dark matter distributions given by N-body simulations. We then compare these clustering statistics with those from the SDSS to attempt to constrain cosmology. Our approach offers several potential benefits. Firstly, because we attempt to generate realistic catalogues with full galaxy properties, we can make a prediction for the bias factor of a given galaxy sample and hence use the overall normalization of the correlation function in our cosmological constraints. In particular, we may be able to constrain σ_8, which is not possible for normal techniques employing galaxy clustering. Secondly, because we populate the simulations on a halo-by-halo basis rather than just assuming that galaxies approximately trace mass on large scales, we generate a theoretical prediction for the small-scale, nonlinear clustering. We can therefore attempt to use this information in our cosmological constraints too.

Our constraints are largely independent from those using the CMB, and involve different assumptions (though we consider only flat models, which one could regard as implicitly using CMB results). Because dark energy has an effect on structure formation, and different forms of dark energy might affect it in different ways at late times, it is useful to have an independent, low-redshift constraint on σ_8 that does not rely on a joint analysis with high-redshift data (Doran, Schwindt & Wetterich 2001; Bartelmann, Doran & Wetterich 2006).
A joint analysis would tend to be more model-dependent as one must be able to model what happens in the gap between observed snapshots of the Universe.

In choosing parameters for our simulations, our aim was to generate simulation outputs for a range of different values of Ω_M and σ_8 so that we could examine galaxy clustering as a function of these quantities. Given our focus on constraining σ_8, we opted to generate outputs with σ_8 taking values between 0.65 and 1.05, regularly spaced in steps of 0.05.

Measurements of the abundance of clusters constrain the high-mass end of the halo mass function, and hence constrain a combination of Ω_M and σ_8 (e.g., Eke, Cole & Frenk 1996). This combination is, very approximately, σ_8 Ω_M^{[…]}. To test if we could break this degeneracy, we have generated two grids of models. For Grid 1, the parameters of each model satisfy σ_8 Ω_M^{[…]} = […], while for Grid 2 they satisfy σ_8 Ω_M^{[…]} = […]. Within each grid, σ_8 takes on its full range of values between 0.65 and 1.05. It would be very difficult to distinguish between two cosmologies lying on the same grid using cluster abundances. The two ‘cluster normalization’ curves, with σ_8 Ω_M^{[…]} = const., are shown as the long-dashed and short-dashed lines in Fig. 1. The pairs (Ω_M, σ_8) labelling the cosmologies we analyse are plotted as crosses on these curves.

We extract the mass distribution for these cosmologies from two simulations run using the TREE-PM N-body code GADGET […] particles in a […] h^−1 Mpc box. We have stored the simulation output at several redshifts. Each of these snapshots of the mass distribution is then interpreted as a z = 0.[…] snapshot of a simulation with a different cosmology, to avoid having to run a great number of simulations. We choose z = 0.[…] since this is near the median redshift of the main SDSS and 2dFGRS galaxy samples. The output redshifts are chosen so that once the simulations are relabelled as z = 0.[…] snapshots, the value of σ_8 at z = 0 for each simulation falls onto a regular grid.
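The construction of a cluster-normalized grid can be illustrated with a short Python sketch. The σ_8 grid (0.65 to 1.05 in steps of 0.05) is taken from the text; the exponent of 0.5 and the normalization constant used here are hypothetical placeholders chosen only to make the example run, and should not be read as the values adopted for Grid 1 or Grid 2. The function name `omega_m_on_curve` is likewise introduced purely for illustration.

```python
import numpy as np


def omega_m_on_curve(sigma8, norm, exponent=0.5):
    """Solve sigma8 * omega_m**exponent = norm for omega_m.

    `exponent` and `norm` are illustrative assumptions, not values
    taken from the paper.
    """
    sigma8 = np.asarray(sigma8, dtype=float)
    return (norm / sigma8) ** (1.0 / exponent)


# sigma_8 grid quoted in the text: 0.65 to 1.05 in steps of 0.05 (nine values).
sigma8_grid = np.linspace(0.65, 1.05, 9)

# Hypothetical normalization: the curve through (Omega_M, sigma_8) = (0.3, 0.8).
example_norm = 0.8 * 0.3 ** 0.5

omega_m_grid = omega_m_on_curve(sigma8_grid, example_norm)
for s8, om in zip(sigma8_grid, omega_m_grid):
    print(f"sigma_8 = {s8:.2f}  ->  Omega_M = {om:.3f}")
```

Every (Ω_M, σ_8) pair produced this way lies on the same curve, so along the grid a higher σ_8 is paired with a lower Ω_M.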
Each simulation then gives us snapshots with σ_8 taking values between 0.65 and 1.05, regularly spaced by 0.05. Table 1 gives the value of Ω_M and σ_8 in these relabelled snapshots. We have chosen the simulation parameters such that the first simulation, ‘Run 1’, has Ω_M = 0.[…] at its σ_8 = 0.[…] output, while the second simulation, ‘Run 2’, has Ω_M = 0.[…] at its σ_8 = 0.[…] output. When we perform a further rescaling of Ω_M (see below) it is these central snapshots which remain unchanged. The initial conditions are calculated using a Bardeen et al. (1986) power spectrum with shape parameter Γ = 0.[…] and with primordial spectral index n_s = 1. A smooth power spectrum was most convenient in the light of the rescalings we carry out on the final output, but in fact the Bardeen et al. (1986) power spectrum with Γ = 0.[…] was found to be a good fit to the CMBFAST (Seljak & Zaldarriaga 1996) spectrum with Ω_b = 0.[…] used for the Millennium Simulation (Springel et al. 2005), the parameters of which were chosen to be in agreement with the one-year WMAP results (Spergel et al. 2003).

Figure 1. The position in the (Ω_M, σ_8) plane of the outputs of our two simulations. The line connecting the outputs shows how the values of Ω_M and σ_8 change as the simulation evolves, where we track the instantaneous values of these parameters rather than the values they will have at the final time, which would be the more conventional labelling (and would, of course, not change during the course of the simulation). The solid line corresponds to Run 1 and the dotted line corresponds to Run 2. Low redshift outputs (lower density, more clustered) are in the top left, while high redshift outputs (higher density, less clustered) are in the bottom right. Also shown are the curves described by the cluster normalization condition that σ_8 Ω_M^{[…]} = const. for two different values of this constant. We rescale simulation outputs so that they lie on these curves. The rescaling is shown schematically by the red arrows.

Once we have relabelled the simulation snapshots as z = 0.[…] snapshots, they lie on a curve in (Ω_M, σ_8) space which reflects the way the dark matter density is reduced and the amplitude of clustering is increased as the simulation evolves. These curves are shown as the solid and dotted lines in Fig. 1. We rescale Ω_M in each snapshot, so that instead the snapshots lie on one of the cluster-normalized curves described above. The rescaling is achieved in practice by applying the results of Zheng et al. (2002). If the particle mass is scaled in the obvious way to obtain the desired Ω_M, the particle velocities must also be scaled to compensate, else the haloes no longer satisfy the virial relation between their kinetic and potential energy, and the galaxy populations of haloes are easily distinguished in the different cosmologies via dynamical observables. The rescalings in Ω_M which move a snapshot onto the cluster-normalization curve are shown schematically as red arrows in Fig. 1. Each cluster-normalized grid contains rescaled snapshots from both simulations, and the simulation parameters were chosen so that the rescalings would never have to be too large.
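The virial argument behind the rescaling can be sketched as follows. This is a minimal illustration under a simple assumption, namely that particle positions are held fixed, particle masses scale in proportion to Ω_M, and velocities scale as the square root of the mass ratio so that the ratio of kinetic to potential energy in each halo is preserved. It is not claimed to reproduce the full prescription of Zheng et al. (2002), and the function name `rescale_to_omega_m` is hypothetical.

```python
import numpy as np


def rescale_to_omega_m(masses, velocities, omega_old, omega_new):
    """Rescale particle masses and velocities to a new Omega_M.

    Assumption (illustrative): positions are unchanged, so the potential
    energy scales as mass**2 and the kinetic energy as mass * v**2.
    Scaling v by sqrt(mass ratio) keeps their ratio fixed.
    """
    if omega_old <= 0 or omega_new <= 0:
        raise ValueError("Omega_M values must be positive")
    ratio = omega_new / omega_old
    new_masses = np.asarray(masses, dtype=float) * ratio
    new_velocities = np.asarray(velocities, dtype=float) * np.sqrt(ratio)
    return new_masses, new_velocities


# Usage: quadrupling Omega_M quadruples the masses and doubles the velocities.
m_new, v_new = rescale_to_omega_m([1.0, 2.0], [[3.0, 4.0, 0.0]], 0.25, 1.0)
```

With this choice, v^2 / m is the same before and after the rescaling, which is the sense in which the virial relation is preserved.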
For some of the cosmologies on our grid, we could choose to rescale from either of our simulation runs without having to change Ω_M by a large factor. We have used these cosmologies to test that the results using either simulation run are consistent, and hence that our rescaling works as expected.

The simulation code runs SUBFIND (Springel et al. 2001) on the fly, providing us with a list of friends-of-friends haloes (Davis et al. 1985) of more than 20 particles, and their substructures. We use a linking length of […] times the mean inter-particle separation in the friends-of-friends algorithm to identify the haloes. SUBFIND also allows us to identify the particle in the halo with the least gravitational potential energy, which we use in our galaxy placement scheme.

Table 1. Cosmological parameters of simulation outputs after having been relabelled as z = 0.[…] outputs. We follow the usual convention that these are the parameter values the simulation would have if evolved further to z = 0.

Run 1            Run 2
Ω_M    σ_8       Ω_M    σ_8

The properties of galaxies in our catalogues are generated using the semi-analytic galaxy formation code, GALFORM (Cole et al. 2000; Benson et al. 2002, 2003; Baugh et al. 2005). For the purposes of this paper, we may consider a semi-analytic model as being a means of predicting, given some dark matter halo at a redshift of interest, the galaxy population of that halo. Having that information, we can construct galaxy luminosity functions, correlation functions, etc. that might be considered the results or predictions of the model.

The first step in predicting the galaxy population of a halo is calculating the merger history of the halo. In simulations of sufficiently high resolution and with a sufficiently large number of outputs, this can be extracted from the N-body data. In the case of GALFORM, this has been done recently by Bower et al. (2006) with the Millennium Simulation (Springel et al. 2005); the same simulation has also been used by Croton et al. (2006) and De Lucia et al. (2006) to generate catalogues using a different semi-analytic code. The simulations we describe above, by contrast, do not have sufficient resolution for us to extract reliable merger trees for the haloes of interest. A Monte Carlo scheme based on the work of Lacey & Cole (1993), using the algorithm described by Cole et al. (2000), is therefore employed instead. This generates a merger tree for a halo based only on the halo mass, the cosmology and the initial power spectrum, and does not use other data from the simulation. This scheme does not, therefore, provide galaxy positions; our method of placing galaxies is given, instead, in Section 2.3.

Unfortunately, the statistical properties of merger histories generated by this algorithm are not identical to histories extracted directly from an N-body simulation (Cole et al. 2007). Parkinson, Cole & Helly (2007) and Neistein & Dekel (2007) have devised empirically motivated modifications to the algorithm to allow Monte Carlo trees to fit the simulation data better.
A detailed analysis of the effect of such a modification on semi-analytic galaxy properties is beyond the scope of this paper. We have, though, tested some of our results using the new algorithm of Parkinson et al. (2007), and find that for our purposes the new trees make little difference.

Given the merger history of a halo, the model computes the evolution of the baryonic content of the halo using a variety of analytic prescriptions. Many of the equations governing the physical processes modelled by GALFORM contain parameters which may be adjusted. Some of these (for example, the form factor f_orbit/c which governs the size of merger remnants) have a ‘natural’ value determined by the physics; others (those governing the angular momentum distribution of infalling haloes, say) are derived by comparison to more detailed simulations. The function of allowing these parameters to change, then, is to allow investigation into the magnitude of the effect of different physical processes on the resulting galaxy properties in the model. Other parameters have no natural value, and can only be fixed by requiring that they take values which allow the model to fit observations. Much of the time, if we are able to fit some set of observations satisfactorily by choosing the parameters of the model judiciously, the same set of observations could also be fit reasonably well by some very different choice of parameters. Therefore, within the GALFORM framework, we have different models using different physics which are equally good at matching the observations (though this may not, of course, be the case if we were to choose a different set of observations to constrain the model).

Our aim here is to try to constrain cosmological parameters by comparing clustering statistics from a simulation populated with semi-analytic galaxies to the corresponding measurements in an observational survey.
We would hope that our constraints are insensitive to the precise semi-analytic model used, and we would like to test whether this is the case. Therefore, although we use only one code, GALFORM, we use three different ‘models’, in the sense of different combinations of the physics we attempt to model and the parameters governing that physics. In the remainder of this section of the paper, we discuss the technical differences between the three models before briefly describing how galaxies are placed in the simulations in Section 2.3. A reader uninterested in the details of the models may therefore wish to skip to Section 2.3, or to our results in Section 3. The three models are as follows:

• The fiducial model of Cole et al. (2000). This is successful in matching several sets of observations, including the B- and K-band luminosity functions, galaxy colours and mass-to-light ratios for galaxies of different morphologies, the cold gas mass in galaxies, galaxy disc sizes and the slope and scatter of the I-band Tully-Fisher relation. Unfortunately, though, it assumes a cosmic baryon fraction, Ω_b, of only 0.02. This is inconsistent with recent estimates from Big Bang nucleosynthesis (e.g., O’Meara et al. 2001) and the cosmic microwave background (Spergel et al. 2007). Nevertheless, we feel it is worthwhile to include this model in our analysis as a well recognized and well understood model that has been thoroughly described and studied. In our figures, lines corresponding to output from this model are given the label ‘Cole2000’.

• A model similar to the first, but with Ω_b = 0.04 (twice the value in the first model), which is closer to current estimates. Since there are twice as many baryons as in the first model, if we leave the rest of the parameters unchanged then, as expected, the model is unable to match observations such as the luminosity function. Therefore we introduce a new physical process: thermal conduction in massive haloes (this is analysed in greater detail by Benson et al. 2003). We simply assume that gas is unable to cool if the halo circular velocity, V_circ, satisfies

    V_{\rm circ} > V_{\rm cond} \sqrt{[\ldots]\,z}, \qquad (1)

where V_cond is a parameter we may adjust.
This suppresses the problematic bright end of the luminosity function; the effect is similar, in fact, to more recent and more physically motivated implementations of feedback from active galactic nuclei in GALFORM (Bower et al. 2006). Though it is clearly rather crude, note that our objective here is only to produce a realistic enough galaxy catalogue to compare to observations. We are trying to mimic the effect of whatever physical process suppresses the bright end of the luminosity function, without having to adopt a complicated parametrization that is no better physically motivated than a simpler and more understandable one. The label we give to this model in our figures is ‘C2000hib’ (where ‘hib’ stands for ‘high baryon fraction’).

• Another model with Ω_b = 0.[…], but now incorporating ‘superwinds’. In this model it is postulated that a galaxy’s cold gas is heated strongly enough for it to be expelled completely from the halo, rather than returning to the reservoir of hot gas associated with the halo. In fact, the model is derived from that used by Baugh et al. (2005) to reproduce the abundance of faint galaxies detected at submillimetre wavelengths. This also incorporates the additions and refinements to GALFORM described by Benson et al. (2003). These include a modification to the assumed profile of the halo gas, a more sophisticated treatment of conduction, and a more detailed treatment of galaxy mergers, in particular the effects of tidal stripping and dynamical friction (Benson et al. 2002). They also include a simple model of the effect of reionization on small haloes, where cooling is prevented if V_circ < V_cut and z < z_cut for two parameters V_cut and z_cut. The strength of superwind feedback is parametrized by V_SW, the characteristic velocity of the wind.
The model is denoted ‘Model M’ in our figures.

We wish to run each of these models in many different cosmologies, in order to generate galaxy catalogues in which the N-body component and the semi-analytic component are consistent. Changing cosmology naturally changes the galaxy population predicted by each model, however, so that even if we fix the parameters such that the galaxies match observational constraints in one fiducial cosmology, they are unlikely to match in other cosmologies. We therefore tweak the parameters between different cosmologies to try to match the data. It is not possible to do this for the full range of even the primary constraints described by Cole et al. (2000). We restrict ourselves to a comparison with the […]r-band SDSS luminosity function of Blanton et al. (2003) at z = 0.[…]. Even then, to make the problem tractable and to ensure our three models remain distinct, we restrict the parameters we allow ourselves to vary to:

• V_SW, for Model M only.
• V_cond, for the C2000hib model only.
• V_cut, one of the parameters controlling reionization. Though we experimented with changing this for all the models, all the ones below have either V_cut = 0 (Cole2000 and C2000hib) or V_cut = 60 (Model M). As well as being simpler, this also helps make Model M more distinct from the other two.
• V_hot and α_hot. These are closely linked but we vary them independently. They control the strength of standard (i.e. not superwind) supernova feedback in the following way. The rate of change of the mass of hot gas and of cool gas in a halo are linked with the instantaneous star formation rate, ψ, by

    \dot{M}_{\rm hot} = -\dot{M}_{\rm cool} + \beta\psi \qquad (2)

(Cole et al. 2000, equation 4.7). β is related to the circular velocity of the galaxy disc, V_disc, by

    \beta = (V_{\rm disc}/V_{\rm hot})^{-\alpha_{\rm hot}} \qquad (3)

(Cole et al. 2000, equation 4.15).

Some of the cosmologies below require quite extreme, perhaps unphysical, parameter values.
In some cases, GALFORM is reluctant to run, while in others the fit for some observations is compromised in an attempt to fit the […]r-band luminosity function well. In addition to running each model with tweaked parameters in each cosmology, therefore, we also run each model in each cosmology using the same parameters as the central cosmology of Run 1, in which (Ω_M, σ_8) = (0.[…], 0.[…]).

We are not able to produce a good fit to the luminosity function in a χ^2 sense, even allowing these parameters to vary. This may be a concern when comparing clustering statistics to observational data. Volume-limited galaxy samples, to which we wish to compare our results later, are chosen such that all the galaxies are brighter than some given absolute magnitude limit. If we choose a sample of semi-analytic galaxies with the same magnitude limit, then because our luminosity function is wrong we may not be choosing a sample that necessarily corresponds to the observational one, even within our model. Therefore we instead select semi-analytic galaxy samples with a magnitude limit such that the sample has the same space density as the corresponding observational sample. This means that when we adjust the parameters of the model to match the luminosity function, it is more important for our purposes to match its overall shape rather than its magnitude normalization.

The Υ parameter of Cole et al. (2000) is related to this sort of scaling of the luminosity function. It was introduced to account for brown dwarfs, which absorb some of the mass of gas assumed to be tied up in stars, but without producing light. It is defined by

    \Upsilon = \frac{\text{mass in visible stars} + \text{mass in brown dwarfs}}{\text{mass in visible stars}} \qquad (4)

(Cole et al. 2000, equation 5.2). Clearly, then, we must have Υ > 1 given this physical explanation. The result of including this effect is to scale luminosities by a factor 1/Υ. For each GALFORM model we run, we compare the resulting […]r-band luminosity function with the observational value from the SDSS (in fact, we compare only one point near the characteristic luminosity, L∗). We express the difference between the two in terms of the Υ parameter: the (reciprocal of the) amount by which we would have to scale the luminosity of the semi-analytic galaxies to match the data. Sometimes this requires Υ < 1. Therefore, when we give a value of Υ below, it should be treated only as an indication of the amount by which we would have to scale luminosities so that when we select a galaxy sample by a number density threshold then it would have the same luminosity threshold as the observational sample. Note that we calculate Υ by reference to a specific point on the SDSS […]r-band luminosity function. Its exact value would change if we normalized at a different point (since the model luminosity function is not the same shape as the observational one), or in a different band (since the colour of model galaxies may be incorrect).

Once we have given ourselves the freedom to scale the luminosity function in this way, then, the effect of varying the parameters we allow ourselves to change to try to match the shape of the luminosity function is as follows:

• Increasing V_SW, or decreasing V_cond, tends to steepen the bright-end slope, i.e. give fewer very bright galaxies. V_SW is only non-zero for Model M; V_cond is only finite for the C2000hib model.
• Increasing V_cut reduces the slope at the faint end, reducing the number of the faintest of the galaxies we study.
• Increasing V_hot tends to suppress the overall space density of galaxies. Because of the effect of the other parameters, it is most useful for adjusting the abundance of galaxies of around L∗, or a little fainter.
• Changes in α_hot can be viewed as modulating the effect of changing V_hot. Visually, for typical values of V_hot, increasing α_hot flattens the faint-end slope, typically over a wider range of luminosities than V_cut.

Figure 2. r-band luminosity functions for the three fiducial GALFORM models, compared to the SDSS luminosity function of Blanton et al. (2003) (solid line) and a 2dFGRS r-band luminosity function (dotted line, mostly obscured by the others) generated using the SuperCosmos r_F-band data (Hambly et al. 2001). The value of the scaling parameter Υ, required to normalize the model luminosities for galaxies of a particular space density in this band, is also given in the legend. Errors on the SDSS luminosity function are only given for every tenth point, for clarity.

We usually find that making the bright-end slope steeper and the faint-end slope shallower, as required by the data, requires all parameters to be tweaked to give larger amounts of feedback. This tends to have the overall effect of reducing the predicted luminosities, leading to Υ < 1 as mentioned above. Requiring Υ > 1 would therefore require us to compromise one component or other of the shape in these models. Since we later rescale to match space densities anyway, we opt not to make this compromise. Given two parameter combinations which both match the luminosity function reasonably well and which both give Υ < 1, we use the Υ parameter as a tie-breaker, selecting the combination which gives Υ closer to unity. We have also checked that each model is at least qualitatively consistent with the other primary GALFORM constraints (the Tully-Fisher relation, disc sizes, morphological mix, metallicities and gas fractions – see Cole et al. 2000).

The r-band luminosity function for each of our models in our fiducial cosmology (Ω_M = 0.[…] and σ_8 = 0.[…]) is shown in Fig. 2. Qualitatively, the agreement between the models and the data is reasonable.
The very sharp cutoff at the bright end of the luminosity function in Model M is a generic feature of the model. The lower space density of very faint galaxies in this model is also generic, and comes from the introduction of reionization (non-zero V_cut). At first glance, it appears that the Cole2000 model gives better agreement with the data at the bright end than the C2000hib model, despite the inclusion of a feedback mechanism specifically to solve this problem in the latter. Recall, though, that the Cole2000 model has a lower baryon fraction, and despite this we have to introduce relatively high levels of supernova feedback to match the shape of the luminosity function. We therefore need Υ ∼ […] to recover the correct luminosities, while the C2000hib model needs a much more physically palatable Υ ∼ […]. It may appear that our requirement for Υ ∼ […] is inconsistent with the original Cole et al. (2000) paper, the reference model of which requires Υ = 1.[…]. It is not inconsistent, for a few reasons. Firstly, we use the label ‘Cole2000’ for our model because it uses equivalent code with the same physics governed by the same parameters as the models of Cole et al. (2000). As we have just noted, however, some of the parameters take different values in our fiducial model in order to try to match the shape of the r-band luminosity function. Secondly, while the reference model of Cole et al. (2000) had σ_8 = 0.[…], ours has σ_8 = 0.[…]. Thirdly, their Υ was calculated by reference to the value of the observed b_J-band luminosity function at L∗ (though in fact the same correction also provided a good match to the K-band luminosity function). Ours is calculated by reference to the r-band luminosity function. We match to a point slightly brighter than L∗ (where the exponential cutoff has started to bite more deeply and the galaxies are less abundant; the point at M_r ∼ −[…] can be seen quite easily in Fig. 2 as being where all the lines cross) since we otherwise had problems calculating Υ for some of our models with a very shallow faint-end slope and low galaxy number density.

With the N-body simulations and the semi-analytic catalogues in place, it remains to merge the two to create a synthetic galaxy catalogue, or in other words to populate the simulations with our galaxies. To each halo in the simulation we assign a semi-analytic galaxy population for a random merger tree of the same mass. We then place the central galaxy at the position of the particle with least gravitational potential energy, and place the satellite galaxies on random particles within the halo.

One might worry that given the resolution of our simulations, it is possible for the semi-analytic model to predict that a halo in a simulation contains a bright enough galaxy to enter our sample even though the halo is not resolved with at least 20 particles, which is our normal criterion for considering the halo to be resolved. To take account of these galaxies, we calculate the number of such haloes expected for the simulation volume for the Jenkins et al. (2001) mass function. We then take the galaxy populations predicted by GALFORM for these haloes and place the galaxies on random particles in the simulation which are not in haloes. We do not expect this to have a significant effect on clustering statistics, since almost all galaxies which would be placed in unresolved haloes are very faint, and in any case the halo bias as a function of mass is not strongly varying in this regime (Cole & Kaiser 1989; Mo et al. 1999) so that we do not lose too much accuracy by placing galaxies in haloes of the wrong mass. None the less, we have checked that employing this scheme has only a small effect on our measured correlation functions.
Changing the minimum resolved mass from 20 particles to 50 particles has only a very small effect on the correlation function, as does ignoring the ‘unresolved’ galaxies entirely, even for a conservative mass limit of 50 particles. Moreover, this remains true even for galaxy samples which are rather faint when compared to the magnitude limit of the SDSS samples we will be considering, and which therefore provide a more stringent test.
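The placement of galaxies within a resolved halo, as described above, can be sketched in Python. This is an illustration only: the function name `place_galaxies` and its arguments are hypothetical, and drawing satellite host particles with replacement from all halo particles (so that a satellite may share a particle with another galaxy) is an assumption of this sketch rather than a detail stated in the text.

```python
import numpy as np


def place_galaxies(particle_pos, particle_potential, n_satellites, rng):
    """Return (central_position, satellite_positions) for one halo.

    The central galaxy sits on the particle with the least gravitational
    potential energy; satellites sit on randomly chosen halo particles
    (sampling with replacement is an illustrative assumption).
    """
    particle_pos = np.asarray(particle_pos, dtype=float)
    central_idx = int(np.argmin(particle_potential))
    central = particle_pos[central_idx]
    sat_idx = rng.integers(0, len(particle_pos), size=n_satellites)
    return central, particle_pos[sat_idx]


# Usage with a toy three-particle halo.
rng = np.random.default_rng(0)
pos = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
pot = np.array([-1.0, -5.0, -2.0])
central, satellites = place_galaxies(pos, pot, 4, rng)
```

Galaxies assigned to unresolved haloes would be handled separately, by drawing from particles that are not members of any halo.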

Figure 3. Clustering in our models and in the SDSS. r_p w_p(r_p) is plotted for clarity. The solid, black line with error bars is the SDSS data; the dotted line shows the SDSS flux-limited sample for comparison. The coloured lines are from our three models: short-dashed red for Cole2000, long-dashed green for C2000hib and dot-dashed blue for Model M, as in Fig. 2. The nine different cosmologies form a grid in σ_8 with the cosmological parameters lying on the curve σ Ω . = 0 . . . . This plot shows models for which we allow ourselves to tweak the GALFORM parameters to match the luminosity function.

In their study of the luminosity and colour dependence of the galaxy correlation function using the main galaxy sample of the SDSS, Zehavi et al. (2005) calculated the projected two-point correlation function w_p(r_p) for ten different galaxy samples defined by thresholds in absolute magnitude. We have been provided with these correlation functions, and their covariance matrices calculated by jackknife resampling. Zehavi et al. (2005) also tabulate the space density of each sample, so it is straightforward for us to select corresponding samples of semi-analytic galaxies.

Our cosmological constraints will use samples with a galaxy space density ¯ n g = 0 . h Mpc − , corresponding to galaxies with M r − h < − . in the SDSS. Our semi-analytic catalogues have approximately twice the effective volume of the observational sample, so when calculating how well our models fit the data we use only the covariance matrix of the observational correlation function to compute our errors, neglecting the statistical errors on the simulated function. We use the sample of this space density since it provides a good compromise between volume and space density, giving relatively small errors, and since most of the constraining power then comes from the galaxies of intermediate luminosity which are modelled best by the semi-analytic code. We will, though, briefly discuss the effect of using samples of a different space density or selected in a different waveband.

Figure 4. A comparison between our four sets of models for one particular value of σ_8. The top two panels are for models in which we tweak the semi-analytic parameters to match the r-band luminosity function, while for the bottom two the parameters stay the same as for our fiducial model. The two left-hand panels are for the low-Ω_M cluster normalization curve while the others are for the high-Ω_M case. The colour coding and line styles of the models are the same as for Figs. 2 and 3.

We compare the clustering in our synthetic catalogues and in the SDSS in Fig. 3. We plot the quantity r_p w_p(r_p) since this scales out much of the r_p dependence and makes differences in shape easier to see. Fig. 3 shows our results for a grid of nine cosmologies spaced regularly in σ_8 such that they lie on the curve σ Ω . = 0 . . . (‘Grid 1’). For this plot we show the models for which we allow the semi-analytic parameters to vary so as to provide a good match for the r-band luminosity function. Note that we have three other similar sets of models: one which includes the same cosmologies but in which the semi-analytic parameters are identical in each cosmology, and two more in which the cosmologies lie on the same grid in σ_8 but which have lower Ω_M, such that σ Ω . = 0 . . . (‘Grid 2’). In one low-Ω_M sequence the semi-analytic parameters are allowed to vary, and in the other they are not. A figure similar to Fig. 3 could be made for each of the latter three sets of catalogues, but since the features turn out to be qualitatively similar we do not show such plots here. We do, though, compare the four sets for one particular value of σ_8 in Fig. 4. For each grid of cosmologies and for each choice as to whether to allow the parameters to vary we have catalogues for each of our three models, and therefore in total we have twelve sets of catalogues each of which has nine members lying on a regular grid in σ_8. The key to our numbering of these sets is given in Table 2.

Table 2. The key to the model numbering used in Fig. 5. The first column gives the label we assign to each of our 12 sets of populated simulations (each of which has nine cosmologies, regularly spaced in σ_8). The ‘Grid’ in the second column refers to whether the cosmologies lie on the cluster normalized curve with high Ω_M (Grid 1) or low Ω_M (Grid 2). The third column shows whether we adopt different parameters in different cosmologies, or whether they stay the same, while the fourth gives the GALFORM model in use.

Model no.   Grid   Same/diff. pars.   GALFORM model
1           1      diff.              C2000hib
2           1      diff.              Cole2000
3           1      diff.              M
4           1      same               C2000hib
5           1      same               Cole2000
6           1      same               M
7           2      diff.              C2000hib
8           2      diff.              Cole2000
9           2      diff.              M
10          2      same               C2000hib
11          2      same               Cole2000
12          2      same               M

Examining Fig. 3, it is clear that some of our catalogues fit the data better than others. For the higher σ_8 cosmologies, the shape of the models fits that of the data rather well. The trend between cosmologies is consistent between the three GALFORM models: a higher amplitude of clustering for higher σ_8, as expected. There are differences between the models, however, especially on small scales, which must arise from differences in the details of the halo occupation distribution predicted by the models.

The variation in the predicted correlation function between cosmologies, and the consistent trend between models, supports our hope that comparison to the SDSS correlation function can constrain σ_8. For each set of nine cosmologies, we calculate χ² with respect to the observed correlation function and its covariance matrix, then fit a quadratic through the three points around the minimum to interpolate and estimate the best-fitting σ_8 and its 1σ error. The result of applying this procedure is given in Fig. 5. There, we give an estimate of σ_8 and its errors for each of our twelve different sets of populated simulations, as the black crosses and error bars. The model number referred to on the x-axis is explained in Table 2.

A few comments may be made about Fig. 5. Firstly, some of the sets of models yielded no value of σ_8 for which the simulated correlation function was an acceptable fit to the observed one. The large χ² and Δχ² values then result in a spuriously small error bar. This is the case for models 9, 11 and 12, so the constraints coming from those models should be ignored. Secondly, recall that we ran only two N-body simulations, using only the outputs given in Table 1. In fact, while one was run with a larger value for σ_8 (and therefore a smaller Ω_M for an output with given σ_8), it was started with initial conditions where the different Fourier modes of the density field were given the same phase as in the low-σ_8 simulation. This means that the constraints from the different sets of synthetic galaxy catalogues are not independent, but should be used to give an indication of the systematic error arising from the choice of semi-analytic model parameters and the assumed value of Ω_M (which we do not constrain). Note that we have also plotted two constraints for each model in Fig. 5. On the left, with solid error bars, are the constraints derived just as we have described. We discuss the estimates on the right, with dashed error bars, in Section 3.3.1.
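The fitting procedure described above (χ² against the observed correlation function using only the data covariance matrix, followed by a quadratic through the three grid points around the minimum) can be sketched as follows. The function names are hypothetical, and reading the 1σ error off the parabola at Δχ² = 1 is an assumption: it is the usual convention for a single parameter, but the text does not spell it out.

```python
import numpy as np

def chi2(model, data, cov):
    """chi^2 of a model vector against data, using the data covariance matrix."""
    resid = np.asarray(model) - np.asarray(data)
    return float(resid @ np.linalg.solve(cov, resid))

def quadratic_minimum(sigma8_grid, chi2_grid):
    """Fit a parabola through the three grid points around the chi^2 minimum.

    Returns (best-fitting sigma_8, 1-sigma error, interpolated minimum chi^2).
    Assumption: the 1-sigma error is the half-width at delta chi^2 = 1.
    """
    x = np.asarray(sigma8_grid, dtype=float)
    y = np.asarray(chi2_grid, dtype=float)
    i = int(np.argmin(y))
    i = min(max(i, 1), len(x) - 2)       # keep a neighbour on each side
    a, b, c = np.polyfit(x[i - 1:i + 2], y[i - 1:i + 2], 2)
    if a <= 0:
        raise ValueError("chi^2 is not convex around the grid minimum")
    best = -b / (2.0 * a)
    error = 1.0 / np.sqrt(a)             # a * error**2 = 1
    chi2_min = c - b * b / (4.0 * a)
    return best, error, chi2_min
```

The interpolated minimum χ² is returned alongside the estimate because, as noted above, a spuriously small error bar can result when no cosmology on the grid is an acceptable fit.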

Figure 5. Constraints on σ_8. The x-axis shows the model number, the key to which is given in Table 2. The y-axis shows the 1σ constraint on σ_8 achieved in that particular model. The points with solid error bars are for the unmodified catalogues. The points with dashed error bars show how the constraints change when we make an empirical correction for the effect of halo assembly bias (Croton, Gao & White 2007, and references therein). We do this by modifying the correlation function according to the scale-dependent bias between shuffled and unshuffled GALFORM catalogues in the Millennium Simulation, as described in Section 3.3.1.

The points with dashed error bars in Fig. 5 show a constraint after we attempt to correct our simulated correlation functions for the effect of so-called ‘halo assembly bias’. This is an effect that may arise if one of the assumptions we make when populating our simulation with galaxies is incorrect. We assume that the distribution from which the properties of the galaxy content of a halo are drawn depends only on the mass of the halo. In other words, since the halo merger tree is the basic input to our semi-analytic model, we assume that the distribution from which the properties of a halo’s merger tree are drawn depends only on halo mass. This is explicitly the case for the Monte Carlo merger trees we use here, since it is a result of the underlying extended Press-Schechter theory (Press & Schechter 1974; Bower 1991; Bond et al. 1991; Lacey & Cole 1993). This result was also supported by work on simulations (Lemson & Kauffmann 1999). More recently, however, the advent of larger simulations with better resolution has allowed this result to be challenged. Gao, Springel & White (2005) showed that old haloes of a given mass in the Millennium Simulation were more strongly clustered than young haloes of the same mass, while Harker et al. (2006) demonstrated that halo formation time is a function of halo environment as well as halo mass using an independent set of merger trees in the same simulation. Because halo age is a property of the halo merger tree, this shows that the assumption we have described is violated. In fact, the variation of N-body merger trees with environment has been studied in other large simulations (Maulbetsch et al. 2007). More generally, this environmental dependence formally invalidates the straightforward application of halo models of galaxy clustering (Benson et al. 2000; Seljak 2000; Berlind & Weinberg 2002; Cooray & Sheth 2002). It has stimulated theoretical attempts to explain the departure from the extended Press-Schechter prediction (e.g., Sandvik et al. 2007; Wang, Mo & Jing 2007), and to look for possible effects on the galaxy population both observationally (Yang, Mo & van den Bosch 2006) and in models (Croton et al. 2007; Zhu et al. 2006).

Our correlation functions are corrected using a GALFORM catalogue generated in the Millennium Simulation. The version of GALFORM which generates the catalogue takes as its input the actual N-body merger tree of each halo (Bower et al. 2006). This catalogue therefore incorporates environmentally dependent halo formation. In a similar spirit to Croton et al. (2007), we shuffle this catalogue, assigning to each halo the galaxy population of a random halo of the same mass. This destroys any connection between the environment of a halo of given mass and its galaxy population. We calculate the galaxy correlation function for a range of different magnitude thresholds for both the original and shuffled catalogues. This gives us an estimate of the effect of halo assembly bias, for GALFORM galaxies at least: we note that while our results are qualitatively consistent with those of Croton et al. (2007), the two semi-analytic models do not respond identically. We have calculated the scale-dependent ratio between the correlation functions, and then used this ratio (for a sample of appropriate space density) to correct the correlation functions used for our constraints. This is intended only to give an estimate of the size of the systematic error on our constraints coming from halo assembly bias. As Fig. 5 shows, the error from this source is small in comparison to the statistical errors, for our model at least.
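The shuffling test described above can be sketched as below. This is a minimal illustration under stated assumptions: ‘the same mass’ is approximated by logarithmic mass bins, populations are permuted within each bin (so each population is reused exactly once), and the correction multiplies the model correlation function by the ratio of unshuffled to shuffled clustering. The function names and the bin count are hypothetical.

```python
import numpy as np

def shuffle_populations(halo_mass, n_bins, rng):
    """Return donor[i], the index of the halo whose galaxies halo i receives.

    Assumption: haloes count as having 'the same mass' if they fall in the
    same logarithmic mass bin; donors are permuted within each bin.
    """
    log_m = np.log10(np.asarray(halo_mass, dtype=float))
    edges = np.linspace(log_m.min(), log_m.max(), n_bins + 1)
    bin_id = np.clip(np.digitize(log_m, edges) - 1, 0, n_bins - 1)
    donor = np.arange(len(log_m))
    for b in np.unique(bin_id):
        members = np.flatnonzero(bin_id == b)
        donor[members] = rng.permutation(members)
    return donor

def assembly_bias_correction(xi_model, xi_original, xi_shuffled):
    """Scale a model correlation function by the scale-dependent ratio
    between the original and shuffled catalogues (same radial bins)."""
    ratio = np.asarray(xi_original) / np.asarray(xi_shuffled)
    return np.asarray(xi_model) * ratio
```

Because the shuffle only exchanges populations between haloes of similar mass, the mean occupation as a function of halo mass is preserved, and any change in clustering can be attributed to the environmental dependence.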

We calculate the correlation length of the samples by fitting a power law to ξ(r) for < r/ ( h − Mpc) < ; that is, we parametrize the correlation function as ξ(r) = (r/r_0)^(−γ) where r_0 is the correlation length. We have done this for all our samples of all luminosities, so we are able to plot the correlation length as a function of sample space density (or, equivalently, as a function of sample luminosity threshold) in Fig. 6. The black line in the plots shows the corresponding result from the SDSS. The SDSS data show a steady increase in clustering strength with luminosity (i.e. with decreasing space density) apart from a feature at ¯ n g ≈ . h Mpc − corresponding to the difference between two M max r = − . samples: one has a large, overdense region at z ∼ . excised (see Zehavi et al. 2005 for details) and has lower space density but, as might be expected, weaker clustering than the sample where this region is retained.

For many cosmologies, the Cole2000 model and the updated, higher baryon fraction C2000hib model do a reasonable job of matching the luminosity-dependent clustering in the SDSS, especially for samples of moderate to low space density. Model M, which invokes superwinds (see Section 2.2), does not do so well, predicting very little luminosity dependence. This may be because the feedback effects are so extreme in large haloes that their central galaxies are little brighter (if at all) than those at the centre of less massive, less biased haloes. The other two models tend to have the opposite problem in the brightest samples: they tend to predict too high an amplitude of clustering. This could be due to too tight a relationship between halo mass and galaxy luminosity (perhaps because in reality feedback is more efficient or more stochastic): none of the brightest galaxies is scattered into lower mass, less biased haloes. A generic feature of the GALFORM models seems to be an upturn in the clustering amplitude at high space density. This suggests that too many of the faint galaxies generated by the model reside in high mass haloes. This may be related to the fact that it is hard to produce a luminosity function with a flat enough faint-end slope, the excess of faint galaxies perhaps consisting of satellites in massive haloes.

Figure 6. Correlation length as a function of sample space density for r-band selected galaxies, for the same set of models and with the same colour coding and line styles as Fig. 3.

Clearly, matching the luminosity-dependent clustering of galaxies will continue to be a very stringent test for semi-analytic models. Even if the models were provided with the correct cosmology as an input, matching the clustering would still seem to require that the models predict the correct galaxy population for haloes as a function of luminosity and mass, rather than predicting quantities which implicitly average over a range of halo mass, such as the (unconditional) galaxy luminosity function. Conversely, if the models were able to correctly capture the trends of luminosity dependent clustering, it would give us more confidence that they were predicting realistic galaxy populations on a halo-by-halo basis, and give a firmer foundation for attempts to constrain cosmology with methods involving semi-analytic catalogues. Though we bear this in mind, it seems unrealistic to require a perfect and complete model of galaxy formation before considering the information it can provide us on cosmological parameters.

The models display a minimum in clustering strength at roughly the space density of the sample we use for our constraint. We might therefore expect that using a sample of different space density may yield a lower estimate for σ_8 than the sample we have used above. In fact when we attempt to constrain σ_8 using different samples, we obtain high values of χ² for all cosmologies, even for those which appear by eye to be acceptable fits, or which give reasonable values of χ² using only the diagonal elements of the covariance matrix. We therefore suspect that for these samples, an estimate of σ_8 would be severely affected by noise in the covariance matrix. Using only the diagonal elements for samples with ¯ n g = 0 . or . h Mpc − (having M r − h < − . and − . respectively) would suggest a slightly lower σ_8, in the region 0.85–0.9.

One might worry that the supercluster at z ∼ . (mentioned above) affects our constraints. Zehavi et al. (2005) note, however, that in their analysis it has no effect on samples fainter than their M r − h < − . sample, while removing it has a very small effect on brighter samples, producing a negligible drop in w_p(r_p). If the drop were larger, then using samples without this region removed could bias our estimate of σ_8 upwards. More likely, the supercluster may cause a slight underestimate of the size of our error bars, since the jackknife samples used to calculate the covariance matrix are smaller than the supercluster. This prevents the jackknife method from fully capturing the variance in the density field.

Figure 7. A plot similar to Fig. 3 but for a 2dF sample with M b J − h < − and a b_J-selected GALFORM sample with space density . h Mpc − .

We have chosen to use the (r-band selected) SDSS rather than the (b_J-band selected) 2dFGRS for our main constraint on σ_8, since the prediction for the luminosity of galaxies in bluer bands depends more heavily on recent star formation. It therefore tends to be more model-dependent than the prediction for redder bands, where there is a larger dependence on total stellar mass. None the less, the 2dFGRS provides very valuable data on galaxy clustering, and an accurate galaxy formation model should give constraints on σ_8 which are consistent between the two datasets. In addition, the analysis of satellite fractions in the 2dFGRS by van den Bosch et al. (2005) suggested, if somewhat indirectly, that the 2dF data prefer a relatively low σ_8. In a similar spirit to the analysis we perform here, this constraint on σ_8 came about independently of other datasets. It may therefore be interesting to see whether our relatively high value of σ_8 coming from galaxy clustering data alone is driven by the data (in which case our estimate for σ_8 when using 2dFGRS data should be consistent with theirs) or by other factors. Note, for example, that Pan & Szapudi (2005) quote a high preferred value for σ_8 in their 2dFGRS clustering analysis – albeit concentrating on the three-point function – though their error bar extends to low values, σ = 0 . +0 . − . .

The 2dFGRS clustering data we use are an updated version of the analysis of Norberg et al. (2001, 2002) (Norberg et al., in prep.). This is the same dataset as used by Tinker et al. (2007) in their study of the luminosity dependence of the galaxy pairwise velocity dispersion. We compare the sample with M b J − h < − to corresponding catalogues from our models in Fig. 7. The grid of models used is the same as for Fig. 3, but we select samples using a b_J magnitude threshold so as to match the space density of the 2dF sample. The threshold is chosen so that the space density, . h Mpc − , is similar to that of our main SDSS sample. The model clustering appears to depend more weakly on cosmology than for the r-selected samples, but there is still a clear trend and so we would hope still to be able to use these data to estimate σ_8.

We calculate χ² between the data and the model using a principal component analysis, again ignoring the errors on the model correlation functions as we did for the SDSS. This analysis is performed on the dimensionless projected correlation function, w_p(r_p)/r_p, denoted Ξ(σ)/σ by Norberg et al. (2002). We use only the first six principal components, which account for over 99 per cent of the variance.
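As an illustration of a χ² restricted to the leading principal components, the following sketch diagonalizes a covariance matrix and keeps only the components with the largest variance. It is a generic version of the technique, not the specific pipeline of Norberg et al.: working directly with the covariance matrix (rather than a normalized correlation matrix) and the function names are assumptions.

```python
import numpy as np

def chi2_pca(model, data, cov, n_components):
    """chi^2 using only the n_components leading principal components of cov."""
    eigval, eigvec = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]  # indices of the largest ones
    resid = np.asarray(model) - np.asarray(data)
    proj = eigvec[:, order].T @ resid                # residual in the PCA basis
    return float(np.sum(proj ** 2 / eigval[order]))

def variance_fraction(cov, n_components):
    """Fraction of the total variance carried by the leading components."""
    eigval = np.linalg.eigvalsh(cov)[::-1]
    return float(eigval[:n_components].sum() / eigval.sum())
```

Dropping the components with the smallest eigenvalues removes the terms that divide by the most poorly determined variances, which is where noise in an estimated covariance matrix does the most damage.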
Statistical errors in the estimate of the principal components dominate the contribution to χ² of the less significant components. This illustrates the problems which would arise if we instead used the whole covariance matrix, as highlighted in Section 3.3.2.

The resulting constraints on σ_8 are given in Fig. 8. The model numbering is the same as for Fig. 5, and is given in Table 2. Noting the change in axis scale from Fig. 5, we see that the statistical error on σ_8 from the 2dF sample is comparable to that from the SDSS. While most of the grids of models yield σ_8 estimates similar to those obtained from the SDSS (if perhaps a little lower), there are several model grids which give significantly lower values of σ_8 for the b_J-selected samples than they did for the r-selected ones. In fact, these grids (numbers 1, 4 and 10) all use the C2000hib model. Our results therefore suggest that in this model the blue galaxies are more clustered than in the others, and hence lower dark matter clustering is required to match the observational result. This may be because the feedback excessively reddens isolated galaxies, leaving a larger proportion of the bluer galaxies in more massive, more clustered haloes. In any case it supports the idea that the clustering of model galaxies selected in bluer wavebands may be more dependent on the semi-analytic prescription.

Figure 8. Constraints on σ_8 for the 2dF sample. The model numbering is the same as for Fig. 5 and is given in Table 2.

As we note in Section 3.2, the constraints from the 12 different sets of catalogues in Fig. 5 are not independent, since the underlying N-body simulations in each case were seeded with the same phases. This does, though, mean that we can use the scatter between the catalogues to estimate the systematic error in the constraint arising from our choice of semi-analytic prescription. While we only have three different models (along with variants in which we do not tweak the parameters to match the r-band luminosity function), we can see that they differ quite strongly in the luminosity dependence (Fig. 3) and colour dependence (Fig. 8) of their clustering. They may, then, be representative of the scatter we can expect between GALFORM models that fit the r-band luminosity function and the primary constraints listed in Section 2.2. From the range of ∼ . in the value of the best-fitting σ_8 between sets of catalogues, we estimate a systematic error from this source of ± . . The average size of the statistical error bars among the catalogues for which the best-fitting σ_8 was a good fit in a χ² sense suggests a statistical error of, again, ± . . Adding these errors in quadrature, and taking the mean of the best-fitting values in these catalogues, gives a final figure of σ_8 = 0.97 ± 0.06 for the r-band samples.

Within the scope of the parameter variations we investigated, if we assume that σ = 0 . then none of our models gives us a good fit to the SDSS clustering over the full range of scales. This does not preclude the possibility that models that do not fit the luminosity function, or that include different physics to our particular semi-analytic model, may achieve such a fit.

Our value for σ_8 is clearly at odds with the most striking recent measurement, from the three-year WMAP data. Using those data alone, Spergel et al. (2007) quote σ = 0 . +0 . − . for flat, power-law ΛCDM, and this value is not significantly increased (though the error bars tighten) when the data are analysed jointly with galaxy clustering or supernova data.
There is, though, some tension between the WMAP result and results from weak lensing surveys, which provide rather complementary parameter constraints (Tereno et al. 2005). Spergel et al.’s joint analysis of WMAP and the CFHTLS lensing survey (Hoekstra et al. 2006; Semboloni et al. 2006) pulls their estimate up to σ =0 . +0 . − . , with the lensing data alone favouring even higher values. Benjamin et al. (2007) have combined data from the CFHTLS and other surveys to give σ (Ω M / . . = 0 . ± . . Lyman-α forest data can be used to constrain the power spectrum; Seljak, Makarov, McDonald et al. (2005) quote σ = 0 . ± . (reducing to 0.84 incorporating the new constraints on reionization from the three-year WMAP data). Measurements of cluster abundance have frequently been used to constrain σ_8, but provide a very wide range of estimates because of the difficulty in relating the properties of an observed cluster to its mass (e.g., Rasia et al. 2005). Recent estimates are, though, consistent with the WMAP determination of σ_8 (e.g., Pierpaoli et al. 2003).

The overall picture of the value of σ_8 from other methods is therefore a little confusing, but even the highest recent estimates are only marginally consistent with ours. A possible source of tension between our constraints and those from WMAP is that we have assumed a spectral index n_s = 1, while the best-fitting WMAP σ_8 is quoted for their best-fitting n_s of approximately . . As one can see from the lower right panel of figure 10 of Spergel et al. (2007), the constraints on these two parameters are correlated. Even so, increasing n_s to unity would only correspond to an increase in σ_8 of . , which is not enough to eliminate the discrepancy with our result. A further complication is that the size of n_s is not the only difference in the initial P(k) between the WMAP three-year constraints and our model.
Though ideally one might like to repeat our tests using the power spectrum shape inferred by Spergel et al. (2007), we note that despite some parameter changes from the first-year WMAP constraints, the net change in the power spectrum is small. Furthermore, other datasets that are less dependent on n_s also prefer a lower σ_8 than we do, so even were we to infer a higher σ_8 from the CMB than currently favoured, our problems would not be solved.

As far as using galaxy data alone goes, methods involving higher-order correlations, particularly the three-point correlation function (or its Fourier counterpart the bispectrum) are promising, for example because of their ability to constrain galaxy bias. The addition of dynamical information, for example redshift space distortions or the pairwise velocity dispersion (PVD) can also help constrain cosmological parameters and galaxy bias. An analysis including PVD information in the conditional luminosity function (CLF) approach, by Yang et al. (2004, 2005), suggested relatively low values of σ_8, though this was inferred from their models with high σ_8 since low-σ_8 simulations were not explicitly analysed. Results from halo occupation (HOD) modelling, an approach perhaps more akin to ours – for example the analysis of the cluster mass-to-light ratio by Tinker et al. (2005) – have tended also to favour a low σ_8. Zheng & Weinberg (2007), and references therein, provide a detailed explanation of HOD modelling and its use in constraining parameters using galaxy data alone.

An alternative approach using the 2dFGRS is employed by van den Bosch et al. (2005). They study the abundance and radial distribution of satellite galaxies within the CLF framework, using mock galaxy catalogues produced by a semi-analytic code to calibrate their model. This calibration quantifies the impact of the inevitable imperfections in the halo finder that lead to satellite galaxies being spuriously identified as central galaxies of separate haloes, and vice versa.
It also accounts for incompleteness effects in the 2dFGRS. Their results are consistent with other CLF analyses in suggesting that simultaneously matching the observed cluster mass-to-light ratio and the fraction of satellite galaxies in the 2dFGRS requires a low value of σ_8, lowering the abundance of very massive haloes with a great number of satellites. Again, they do not directly construct mock galaxy redshift surveys for a low σ_8 model, but using the same calibration parameters as for their σ = 0 . model leads them to believe that adopting σ = 0 . provides a better fit to the data.

The parameters of the conditional luminosity functions which van den Bosch et al. (2005) use to fit the 2dFGRS data for low and high σ_8 are tabulated in their paper. We have used these parameters to construct the corresponding mean occupation functions – that is, the mean number of galaxies in a halo of given mass – then used these functions to populate the Millennium Simulation, and outputs of our Run 1 having σ = 0 . and σ = 0 . . This is done as follows: for each halo in the simulation, we look up the mean number of galaxies in a halo of this mass, ⟨N(M)⟩. If ⟨N⟩ ≤ 1 the halo receives either a single, central galaxy (with probability ⟨N⟩) or no galaxies at all (with probability 1 − ⟨N⟩). If ⟨N⟩ > 1 then the halo receives a central galaxy, plus a number of satellite galaxies drawn from a Poisson distribution with mean ⟨N⟩ − 1. The use of these probability distributions follows the work of Kravtsov et al. (2004) and Zheng et al. (2005); in addition, the halo occupation distribution of GALFORM galaxies in our models is consistent with this scheme. Once we know the number of galaxies in a given halo, we then place the galaxies according to the scheme described in Section 2.3.

We find that the mean occupation functions look reasonable, though in some cases they are not quite monotonic, and do not generally exhibit so clean a ‘step function + power law’ form as the GALFORM mean occupation functions. In our approach to populating simulations with the CLF HODs, we do not assign a luminosity to each galaxy, and so we construct a different catalogue for each magnitude threshold we wish to analyse. The space density of these catalogues as a function of magnitude gives us a cumulative luminosity function, and we have checked that this matches the 2dFGRS b_J-band luminosity function for the σ = 0 . catalogue, as it should by construction. For σ = 0 . , the CLF catalogues give too steep a faint-end slope of the luminosity function, so that strictly speaking the CLF is incorrect, though this should not be a concern for galaxies of the luminosity we use for our clustering analysis. We also find that the CLF produces clustering consistent with our b_J-selected GALFORM samples.

The similar clustering in the GALFORM and CLF catalogues raises the question of what drives the difference in preferred σ_8. The similarity between values from our SDSS and 2dF analyses suggests it is not driven entirely by the data. Moreover, it seems at odds with the conclusions of Tinker, Weinberg & Warren (2006), who show that in their HOD model the projected correlation function tightly constrains the satellite fraction. The answer may lie in the fact that their parametrized HOD, and our semi-analytic HOD, are unable to match the form of the mean occupation functions produced by the CLF approach, in which the parametrizations adopted for different parts of the CLF are a few steps removed from HOD parameters.

It would be exciting to conclude that our results support a real difference between low-redshift estimates of σ_8 (e.g. from weak lensing) and estimates using CMB data, and that this indicates something about, say, evolving dark energy. Other analyses find lower values, though, and there are still one or two concerns about our constraints. The mean occupation functions from our high Ω_M, low σ_8 GALFORM runs tend to be more ragged, perhaps indicating a difficulty with our modelling in these cosmologies. While we have checked that galaxy samples selected in a different waveband give similar results, the anomalous C2000hib model gives some cause for concern. Similarly, while samples with a different magnitude threshold appear to give consistent results, the luminosity dependence of clustering differs between models. These samples also have smaller effective volume which seems to give rise to problems in the error analysis.

As we hoped, a large part of our constraint comes from the intermediate-scale clustering for which halo-based models are most necessary, but this is affected by the scheme for placing galaxies within haloes. The largest difference between our high and low σ_8 models (and between our low σ_8 models and the data) is manifested at small scales.
The radial distribution of satellite galaxies within haloes is again different in the GALFORM model of Bower et al. (2006) which uses N-body substructure data, and this is clearly an area of galaxy formation modelling which requires further attention. We have repeated our analysis of the SDSS data using only points with r p > h − Mpc . While the best-fitting value of σ_8 decreases by approximately six per cent to ∼ . , the error bars approximately double in size, illustrating the importance of making use of smaller scales.

While bearing the above caveats in mind, we would like to emphasize the tight constraints available in principle using our technique, and to note that the results presented here are consistent (if marginally) with some other analyses which use only low-redshift data. If the tightness of the constraints seems surprising, consider firstly that in the absence of uncertainty about galaxy bias, the constraints would be extremely tight as the amplitude of the correlation function is very well determined. Secondly, by comparing model galaxies to real galaxies of the same abundance, we factor out most of the dependence on the details of the semi-analytic model. Any change that makes galaxies monotonically brighter or fainter will have no effect on our comparison between models and data. Thirdly, the number of galaxies assigned to each halo in the semi-analytic model is principally determined by the merger history of that halo, and this is well described by the extended Press-Schechter theory. Hence this is not a major uncertainty in the semi-analytic models: unlike purely statistical descriptions as provided by the HOD or CLF, our semi-analytic models do not have the freedom to define arbitrary HODs. These are instead largely determined by the merger trees.

We also remark that other constraints using galaxy data alone, which at first sight seem inconsistent with ours, use different techniques or different data or both.
These inconsistencies arise even though galaxy formation models can reproduce statistics which implicitly average over a range of halo mass – such as the unconditional galaxy luminosity function – reasonably well. This illustrates that matching the luminosity- and colour-dependence of clustering, at small and large scales, is an important and very stringent requirement on galaxy formation models.

We have compared the SDSS projected two-point correlation function at a galaxy space density n̄_g = 0.… h^3 Mpc^-3 to a suite of populated simulations generated using the N-body code GADGET and the semi-analytic code GALFORM. Because we require N-body data in a great number of different cosmologies, we have relabelled and rescaled some simulation outputs using the techniques of Zheng et al. (2002) to avoid the need to run a full simulation for each cosmology in our grid. The galaxy catalogues are self-consistent, GALFORM being run afresh for each cosmology we study.

We have attempted to estimate the systematic error in our value of σ_8 due to the particular choice of semi-analytic model by running three different GALFORM variants in each cosmology. For each of these variants we generate a catalogue in which the GALFORM parameters are adjusted to match the SDSS r-band luminosity function and a catalogue in which the parameters take the same values as they take in the (Ω_M, σ_8) = (0.…, …) cosmology. We obtain the result σ_8 = 0.… ± … (statistical) ± … (systematic). This constraint is impressively tight, given we have attempted to narrow the range of assumptions we require to produce an estimate of σ_8 by using only one well understood, low redshift dataset. By choosing grids of cosmologies which lie on cluster-normalized curves, σ_8 Ω_M^(…) = const., we have shown that the degeneracies inherent in our approach are different to those inherent in cosmic shear measurements, which provide an important low redshift constraint in Ω_M and σ_8. In fact our method gives an almost pure constraint on σ_8. We have shown that in our model, halo assembly bias does not severely affect our constraint on σ_8, though this may not be universally the case for other semi-analytic codes. If it were not the case, we would expect it to bias our estimate of σ_8 high, since failing to account for halo assembly bias tends to lower the amplitude of model correlation functions, requiring an increased σ_8 in the model to compensate.

We obtain similar values for σ_8 if we use samples with a lower or higher galaxy space density, but the error analysis is less secure. We also obtain similar values using a principal component analysis of 2dFGRS data, for a sample of similar space density to that used for our primary constraint. We note, though, that the clustering of galaxies selected in bluer wavebands appears to be more model-dependent, as one might expect.

Our estimate of σ_8 looks high compared to the values obtained by many recent measurements, in particular those from the WMAP experiment. While we note that this tension has some interesting consequences if it persists, we have also pointed out how apparent inconsistencies between our results and other low redshift constraints may arise.
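
The selection of samples at a fixed galaxy space density, on which the constraints above rely, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline used for the analysis: the function name `select_by_space_density`, the lognormal luminosities, the box volume and the target density are all hypothetical. It shows why a change that makes every galaxy monotonically brighter leaves an abundance-selected sample unchanged.

```python
import numpy as np


def select_by_space_density(luminosity, volume, n_target):
    """Return sorted indices of the brightest galaxies so that the
    sample has space density n_target in a box of the given volume.
    (Hypothetical helper, for illustration only.)"""
    n_select = int(round(n_target * volume))
    brightest_first = np.argsort(luminosity)[::-1]
    return np.sort(brightest_first[:n_select])


rng = np.random.default_rng(0)

# Hypothetical mock catalogue: 10,000 luminosities in arbitrary units.
lum = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
volume = 1.0e6      # assumed box volume, (Mpc/h)^3
n_target = 1.0e-3   # assumed target space density, h^3 Mpc^-3

# Sample selected from the original luminosities.
base = select_by_space_density(lum, volume, n_target)

# Apply an arbitrary strictly increasing transformation, standing in
# for a model change that makes every galaxy monotonically brighter.
brighter = select_by_space_density(2.5 * lum**1.2, volume, n_target)

# The same galaxies are selected in both cases.
same_sample = np.array_equal(base, brighter)
print(same_sample)
```

Because the sample is defined by luminosity rank rather than by an absolute luminosity cut, only changes that reorder galaxies can alter which objects enter the sample, and hence its clustering.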
Small and intermediate scales in the correlation functions contribute strongly to χ^2 and hence to our constraint on σ_8, and yet are not as well understood as the large scales. This is clearly an area where further modelling effort is required. Moreover, we will not be completely assured that semi-analytic models capture the phenomenology of the galaxy population sufficiently well for high precision cosmological constraints until they are able to match the observed colour- and luminosity-dependent clustering of galaxies. The models need to be able to reproduce the properties of the observed galaxy population on a halo-by-halo basis, not just the properties averaged spatially or over luminosity. This implies that if cosmological parameters can be tightly constrained by other techniques, measurements of galaxy clustering will continue to provide stringent tests for models of galaxy formation.

ACKNOWLEDGEMENTS

Most of the work for this paper was undertaken while GH was in receipt of a PhD studentship from the Particle Physics and Astronomy Research Council. Thanks to Peder Norberg for providing us with his principal component analysis of the 2dFGRS data, and his help with the use thereof. David Weinberg helped to initiate and plan this project, and provided the SDSS data on which our main analysis is based.

REFERENCES

Bardeen J. M., Bond J. R., Kaiser N., Szalay A. S., 1986, ApJ,