ELUCID. VI: Cosmic variance of galaxy distribution in the local Universe

Abstract

Halo merger trees are constructed from ELUCID, a constrained $N$-body simulation in the Sloan Digital Sky Survey (SDSS) volume. These merger trees are used to populate dark matter halos with galaxies according to an empirical model of galaxy formation. Mock catalogs in the SDSS sky coverage are constructed, which can be used to study the spatial distribution of galaxies in the low-$z$ Universe. These mock catalogs are used to quantify the cosmic variance in the galaxy stellar mass function (GSMF) measured from the SDSS survey. The GSMF estimated from the SDSS magnitude-limited sample can be affected significantly by the presence of the under-dense region at $z<0.03$, so that the low-mass end of the function can be underestimated significantly. Several existing methods designed to deal with the effects of the cosmic variance in the estimate of the GSMF are tested, and none is found to be able to fully account for the cosmic variance. We propose a method based on the conditional stellar mass functions in dark matter halos, which can provide an unbiased estimate of the global GSMF. The application of the method to the SDSS data shows that the GSMF has a significant upturn at $M_* < 10^{9.5}\,h^{-1}M_\odot$, which has been missed in many earlier measurements of the local GSMF.


Draft version September 5, 2018

Preprint typeset using LaTeX style emulateapj v. 12/16/11

ELUCID. VI: COSMIC VARIANCE OF GALAXY DISTRIBUTION IN THE LOCAL UNIVERSE

Yangyao Chen, H.J. Mo, Cheng Li, Huiyuan Wang, Xiaohu Yang, Shuang Zhou, Youcai Zhang

Center for Astrophysics and Physics Department, Tsinghua University, Beijing 100084, China; [email protected]
Department of Astronomy, University of Massachusetts, Amherst MA 01003-9305, USA
Key Laboratory for Research in Galaxies and Cosmology, Department of Astronomy, University of Science and Technology of China, Hefei, Anhui 230026, China
School of Astronomy and Space Science, University of Science and Technology of China, Hefei 230026, China
Department of Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China
IFSA Collaborative Innovation Center, Shanghai Jiao Tong University, Shanghai 200240, China
Shanghai Astronomical Observatory, Shanghai 200030, China

Subject headings: dark matter - large-scale structure of the universe - galaxies: halos - methods: statistical

1. INTRODUCTION

The Universe contains prominent structures up to $\sim 100$ Mpc, only reaching homogeneity on much larger scales (e.g. Peebles 1980; Davis et al. 1985). The properties of galaxies and other objects, which form and evolve in the cosmic web, are expected to be affected by their large-scale environments. Thus, astronomical observations, which are always made in limited volumes in the Universe, can be affected by the cosmic variance (CV) caused by spatial variations of the statistical properties of cosmic objects, such as galaxies, due to the presence of large-scale structure. Because of CV, statistics obtained from a sample that covers a specific volume in the Universe may be different from those expected for the Universe as a whole. Erroneous inferences would then be made if such biased observational data were used to constrain models.

Cosmic variance is a well-known problem (e.g. Somerville et al. 2004; Jha et al. 2007; Driver & Robotham 2010; Moster et al. 2011; Marra et al. 2013; Keenan et al. 2013; Wojtak et al. 2014; Whitbourn & Shanks 2014, 2016), and various attempts have been made to deal with it. One way is to analyze different (sub-)samples, e.g. obtained from the Jackknife sampling of the observational data, and to use the variations among them to have some handle on the CV.
However, this can only provide information about the variance within the total sample itself, but not that of the total sample relative to a fair sample of the Universe. Another way is to use the spatial distribution of bright galaxies (a density-defining population), which can be observed in a large volume, to quantify the CV expected in sub-volumes (e.g. Driver & Robotham 2010), or to re-scale (or correct) the number density of faint galaxies observed in a smaller volume, as was done by Baldry et al. (2012) in their estimate of galaxy stellar mass functions in the GAMA sample. However, this method relies on the assumption that galaxies of different luminosities/masses have similar spatial distributions, which may not be true. The same problem also exists in the maximum likelihood method (e.g. G. Efstathiou 1988), where the galaxy luminosity function is explicitly assumed to be independent of environment. Yet another way is to estimate the CV expected from a given sample using simple, analytic models for the clustering properties of galaxies on large scales. Along this line, Somerville et al. (2004) tested the effects of the CV on different scales, and proposed the use of either the two-point correlation function of galaxies, or the combination of the linear density field with halo bias models (e.g. Mo & White 1996; Sheth et al. 2001), to predict the CV of different surveys. Similarly, Moster et al. (2011) carried out an investigation of the CV expected in observations of the galaxy populations at different redshifts, using the linear density field predicted by the $\Lambda$CDM model combined with a bias model that takes into account the dependence of galaxy distribution on galaxy mass and redshift. Unfortunately, such an approach does not take into account observational selection effects. More importantly, this approach only gives a statistical estimate of the CV but does not measure the deviation of a specific sample from a fair sample. Finally, one can also use a large number of mock galaxy samples, either obtained directly from hydrodynamic simulations, or from $N$-body simulation-based semi-analytic (SAM) and empirical models, to quantify how the sample-to-sample variation of the statistical measure in question depends on sample volume. However, this needs a large set of simulations for each model, analyzed in a way that takes into account the observational selection effects in the data, which in practice is costly and time consuming. Furthermore, like the approach based on galaxy clustering statistics, this approach can only provide a statistical statement of the expected CV, but does not provide a way to correct the variance of a specific sample.

Can one develop a systematic method to study the cosmic variance, and to quantify and correct biases that are present in observational data? The answer is yes, and the key is to use constrained simulations. Indeed, if one can accurately reconstruct the initial conditions for the formation of the structures in which the observed galaxy population resides, one can then carry out simulations with such initial conditions in a sufficiently large box that contains the constrained volume, so that the large box can be used as a fair sample, while the constrained region can be used to model the observational data. By comparing the statistics obtained from the mock samples with those obtained from the whole box, one can quantify and correct the CV in the observational data.

In the past few years, the ELUCID collaboration has embarked on the development of a method to accurately reconstruct the initial conditions responsible for the density field in the low-$z$ Universe (Wang et al. 2014). As demonstrated by various tests (Wang et al. 2014, 2016), the reconstruction method is much more accurate than other methods that have been developed, and works reliably even in highly non-linear regimes.
The initial conditions in a $500\,h^{-1}{\rm Mpc}$ box that contains the main part of the SDSS volume have already been obtained, and a high resolution $N$-body simulation, run with $3072^3$ particles, has been carried out with these initial conditions in the current $\Lambda$CDM cosmology (Wang et al. 2016).

In the present paper, we use the dark matter halo merger trees constructed from the ELUCID simulation to populate simulated halos with model galaxies predicted by the empirical galaxy formation model developed by Lu et al. (2014a, 2015b, hereafter L14, L15). The model galaxies in the constrained volume are then used to construct mock catalogs that contain the same CV as the real SDSS sample. We compare galaxy stellar mass functions (GSMFs) estimated from the mock catalogs with that obtained from the total simulation box to quantify the CV within the SDSS volume. Finally, we propose a method based on the conditional stellar mass or luminosity distribution in dark matter halos to correct for the CV in the observed GSMF. As we will see, the CV can be very severe in the low-mass end of the GSMF obtained from commonly adopted methods, and the low-mass end slope of the true GSMF in the low-$z$ Universe may be significantly steeper than those published in the literature.

The structure of the paper is as follows. In § […]. We define the GSMF as $\Phi(M_*) = {\rm d}N/{\rm d}V/{\rm d}\log M_*$, which is the number of galaxies per unit volume per unit stellar mass in logarithmic space, and define the galaxy luminosity function (GLF) in the $X$-band as $\Phi(M_X) = {\rm d}N/{\rm d}V/{\rm d}(M_X - 5\log h)$, which is the number of galaxies per unit volume per unit magnitude. The magnitude $M_X$ is $k$-corrected to redshift $0.[\ldots]$.

2. MERGER TREES OF DARK MATTER HALOS FROM THE ELUCID SIMULATION

2.1. The simulation

We use the ELUCID simulation carried out by Wang et al. (2016) to model the dark matter halo population, their formation histories, and spatial distribution. This is an $N$-body simulation that uses L-GADGET, a memory-optimized version of GADGET-2 (Springel 2005), to follow the evolution of $3072^3$ dark matter particles (each with a mass of $3.[\ldots] \times 10^{[\ldots]}\,h^{-1}M_\odot$) in a periodic cubic box with a side length of $500\,h^{-1}{\rm Mpc}$ in comoving units. The cosmology used is the one based on WMAP5 (Dunkley et al. 2009; Komatsu et al. 2009): a flat Universe with $\Omega_K = 0$; a matter density parameter $\Omega_{m,0} = 0.[\ldots]$, with $\Omega_{\Lambda,0} = 0.[\ldots]$ and $\Omega_{B,0} = 0.[\ldots]$; a Hubble constant $H_0 = 100\,h\,{\rm km\,s^{-1}\,Mpc^{-1}}$ with $h = 0.[\ldots]$; and an initial power spectrum $P(k) \propto k^n$, with $n = 0.96$ and with the amplitude specified by $\sigma_8 = 0.80$. The simulation is run from redshift $z = 100$ to $z = 0$, with outputs recorded at 100 snapshots between $z = 18.[\ldots]$ and $z = 0$. […] $h^{-1}M_\odot$ can be matched with the simulated halos of similar masses, with a distance error tolerance of $\sim [\ldots]\,h^{-1}{\rm Mpc}$, and massive structures such as the Coma cluster and the Sloan Great Wall can be well reproduced in the reconstruction. Thus, the use of the constrained simulation from ELUCID allows us not only to model accurately the large-scale environments within which observed galaxies reside, but also to recover, at least partially, the formation histories of the massive structures seen in the local Universe.

2.2. The construction of halo merger trees

Halos and their sub-halos with more than 20 particles are identified with the friends-of-friends (FOF) and SUBFIND algorithms (Springel et al. 2005). To be safe, we only use halos identified in the simulation with masses $M_h \ge M_{\rm th} = 10^{[\ldots]}\,h^{-1}M_\odot$. However, this mass resolution is not sufficient to resolve lower mass halos in which star formation may still be significant, particularly at high $z$. In order to trace the star formation histories in halos to high redshifts, we need to reach a halo mass of about $10^{[\ldots]}\,h^{-1}M_\odot$, below which star formation is expected to be unimportant due to photo-ionization heating (e.g. Babul & Rees 1992; Thoul & Weinberg 1996). Here we adopt a Monte Carlo method to extend the merger trees of the simulated halos down to a mass limit of $10^{[\ldots]}\,h^{-1}M_\odot$. Jiang & van den Bosch (2014) have tested the performance of several different methods of generating Monte Carlo halo merger trees, and found that the method of Parkinson et al. (2008, hereafter P08) consistently provides the best match to the halo merger trees obtained from $N$-body simulations. We therefore adopt the P08 method.

We join the P08 Monte Carlo trees to the halo merger trees obtained from the simulation through the following steps:

(i) For each simulated halo merger tree $T$, we eliminate halos that have masses below $M_{\rm th} = 10^{[\ldots]}\,h^{-1}M_\odot$ but have no progenitors more massive than $M_{\rm th}$. The purpose of the second condition is to preserve halos which once had masses larger than $M_{\rm th}$ but have become less massive later due to stripping and/or mass loss.

(ii) For each halo $H$ that is not eliminated in $T$, we generate a Monte Carlo tree $t$ (down to $10^{[\ldots]}\,h^{-1}M_\odot$), rooted from a halo $h$ that has the same mass and the same redshift as $H$, and eliminate all halos more massive than $10^{[\ldots]}\,h^{-1}M_\odot$ in $t$.

(iii) We add $t$ to $H$.
The procedure is repeated for all halos with masses above $10^{[\ldots]}\,h^{-1}M_\odot$ in all trees in the ELUCID simulation, so that all such halos have merger trees extended to $10^{[\ldots]}\,h^{-1}M_\odot$.

(iv) For halos with masses below $10^{[\ldots]}\,h^{-1}M_\odot$ at $z = 0$, their merger trees are entirely generated with the Monte Carlo method. Note that these halos are not identified from the simulation, but can be used to model galaxies in such low-mass halos when needed.

With these steps, we obtain 'repaired' halo merger trees that have a mass resolution of $10^{[\ldots]}\,h^{-1}M_\odot$, with halos more massive than $10^{[\ldots]}\,h^{-1}M_\odot$ sampled entirely by the simulation, and the less massive ones modeled by Monte Carlo trees. Fig. 1 shows the conditional progenitor mass functions of dark matter halos, defined as the fraction of mass in progenitors per logarithmic mass, for merger trees rooted from different masses, and for progenitors at different redshifts. Our results, obtained by combining the simulated trees above the mass resolution $M_{\rm th}$ with the Monte Carlo merger trees generated with the P08 model below the mass limit, are shown by the black solid lines, and compared with the merger trees generated entirely with the P08 model. Overall, the progenitor mass distributions we obtain match well those obtained from the Monte Carlo method, indicating that our merger trees are reliable.

Since galaxies form and evolve in dark matter halos, our 'repaired' halo merger trees from the ELUCID simulation provide the basis to link galaxy properties to dark matter halos, and can be used in combination with halo-based methods of galaxy formation, such as abundance matching, semi-analytic and other empirical models, to populate halos with galaxies. The method can, in principle, be applied to simulated halos with any mass resolution and with any cosmology, to extend halo merger trees to a sufficiently low mass, as long as reliable Monte Carlo trees can be generated. We note that our merger trees do not include high-order sub-halos, i.e. sub-halos in sub-halos. In the next section, we apply the empirical model, developed in L14 and L15, to follow galaxy formation and evolution in dark matter halos, based on our repaired halo merger trees.

3. POPULATING HALOS WITH GALAXIES

In this section, we describe the L14, L15 empirical method, developed by Lu et al. (2014a, 2015b), to populate the halo merger trees described in the previous section with galaxies. Briefly, we assign a central galaxy to each distinct halo and give it an appropriate star formation rate (SFR) according to the empirical model. We then evolve all galaxies in the current snapshot to the next, following the accretion of galaxies by dark matter halos and the mergers of galaxies. The stellar masses for both central and satellite galaxies are obtained by integrating the stellar contents along their histories. Finally, observable quantities, such as luminosity and apparent magnitude, are obtained from a stellar population synthesis model.

3.1. The empirical model of galaxy formation

In the model of L14 and L15, the SFR of a central galaxy is assumed to depend on the halo mass $M_{\rm halo}$ and redshift $z$ as

$$ {\rm SFR}(M_{\rm halo}, z) = \varepsilon\,\frac{f_B M_{\rm halo}}{\tau_0}\,(1+z)^{\kappa}\,(1+X)^{\alpha} \left(\frac{X+R}{X+1}\right)^{\beta} \left(\frac{X}{X+R}\right)^{\gamma} \tag{1} $$

where $\tau_0 = 1/(10 H_0)$, $\kappa = 3/[\ldots]$, $f_B = \Omega_{B,0}/\Omega_{m,0}$ is the cosmic baryon fraction, and $\varepsilon$ and $\beta$ are time-independent model parameters. The parameters $\alpha$ and $\gamma$ are assumed to be time-dependent, given by

$$ \alpha = \alpha_0 (1+z)^{\alpha'} \tag{2} $$

and

$$ \gamma = \begin{cases} \gamma_a & \text{if } z < z_c \\ (\gamma_a - \gamma_b)\left(\dfrac{z+1}{z_c+1}\right)^{\gamma'} + \gamma_b & \text{if } z \ge z_c \end{cases} \tag{3} $$

where $\alpha_0$, $\alpha'$, $\gamma_a$, $\gamma_b$ and $z_c$ are time-independent model parameters. In L14 and L15, both $\alpha$ and $\gamma$ are chosen to be time-dependent to make the model compatible with the observed galaxy stellar mass functions (GSMFs) at different redshifts and the composite conditional luminosity function of cluster galaxies at redshift $z = 0$ (see also Lim et al. 2017b). All the model parameters are determined by fitting the model predictions to a set of observational data (see the original papers for details). Here we adopt the parameters listed in L14 (denoted by 'Model III SMF+CGLF' in that paper), which are based on a cosmology consistent with the WMAP5 cosmology (Dunkley et al. 2009; Komatsu et al. 2009) used here.

Fig. 1. Conditional progenitor mass functions (mean fraction of mass in progenitors per unit logarithmic bin of progenitor mass $M_{\rm prog}/M_{z=0}$) of dark matter halos from different kinds of merger trees, for $z = 0$ halos of different masses (each column) and for progenitors at different redshifts (each row; $z = 4.0$, 2.0, 1.0 and 0.5). Black solid: halo merger trees obtained from the ELUCID simulation repaired by Monte-Carlo-based trees. Blue solid: P08 (Parkinson et al. 2008) Monte Carlo trees generated with the WMAP5 cosmology. Purple dashed: Millennium (Springel 2005) FOF halo merger trees. Green dashed: P08 Monte Carlo trees with the Millennium cosmology. The two vertical solid lines show the 20-particle mass resolution of halos in the ELUCID and Millennium simulations, respectively.

Once a dark matter halo hosting a galaxy is accreted by a bigger halo, the central galaxy in it is assumed to become a satellite galaxy, and thus to experience satellite-specific processes, such as tidal stripping and ram-pressure stripping, which may reduce and quench its star formation. L14 modelled the SFR in satellites as

$$ {\rm SFR}(M_*, z) = {\rm SFR}(t_{\rm accr}) \exp\left[-\frac{t - t_{\rm accr}}{\tau(M_*)}\right] \tag{4} $$

with

$$ \tau(M_*) = \tau_{*,0} \exp(-M_*/M_{*,c}). \tag{5} $$

Here $M_*$ is the current mass of the satellite galaxy, $\tau_{*,0}$ and $M_{*,c}$ are time-independent model parameters, and $t_{\rm accr}$ is the cosmic time at which the host halo of the galaxy is accreted. After accretion, the satellite halo and the galaxies it hosts are expected to experience dynamical friction, which causes them to move towards the inner part of the new host halo.
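The star formation prescriptions in equations (1) to (5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the L14 code: the container `SFRParams` and all function names are hypothetical, the numerical values in the demo are arbitrary placeholders rather than the fitted L14 parameters, and the variable $X$ is assumed here to be the halo mass in units of a characteristic mass `m_c`, since the passage above does not define it.

```python
import math
from dataclasses import dataclass


@dataclass
class SFRParams:
    """Hypothetical parameter container; values are NOT taken from L14."""
    eps: float      # overall efficiency, epsilon in equation (1)
    f_b: float      # cosmic baryon fraction, Omega_B,0 / Omega_m,0
    tau_0: float    # time-scale 1 / (10 H_0), in the time unit of the SFR
    kappa: float    # exponent of the (1 + z) factor
    m_c: float      # ASSUMED characteristic halo mass, so that X = M_halo / m_c
    r: float        # R in equation (1)
    beta: float
    alpha_0: float
    alpha_p: float  # alpha' in equation (2)
    gamma_a: float
    gamma_b: float
    gamma_p: float  # gamma' in equation (3)
    z_c: float


def alpha_of_z(z, p):
    """Equation (2): alpha = alpha_0 (1 + z)^alpha'."""
    return p.alpha_0 * (1.0 + z) ** p.alpha_p


def gamma_of_z(z, p):
    """Equation (3): constant below z_c, power-law evolution at and above it."""
    if z < p.z_c:
        return p.gamma_a
    ratio = (z + 1.0) / (p.z_c + 1.0)
    return (p.gamma_a - p.gamma_b) * ratio ** p.gamma_p + p.gamma_b


def sfr_central(m_halo, z, p):
    """Equation (1): SFR of a central galaxy in a halo of mass m_halo."""
    x = m_halo / p.m_c  # assumption: X is the halo mass in units of m_c
    return (p.eps * p.f_b * m_halo / p.tau_0
            * (1.0 + z) ** p.kappa
            * (1.0 + x) ** alpha_of_z(z, p)
            * ((x + p.r) / (x + 1.0)) ** p.beta
            * (x / (x + p.r)) ** gamma_of_z(z, p))


def sfr_satellite(sfr_at_accretion, t, t_accr, m_star, tau_star_0, m_star_c):
    """Equations (4)-(5): exponential decline of the SFR after accretion."""
    tau = tau_star_0 * math.exp(-m_star / m_star_c)
    return sfr_at_accretion * math.exp(-(t - t_accr) / tau)


if __name__ == "__main__":
    # Placeholder numbers, chosen only to exercise the functions.
    demo = SFRParams(eps=0.1, f_b=0.17, tau_0=1.4e9, kappa=1.5, m_c=1e12,
                     r=0.5, beta=2.0, alpha_0=-3.0, alpha_p=-0.5,
                     gamma_a=2.0, gamma_b=-1.0, gamma_p=-4.0, z_c=2.0)
    print(sfr_central(1e12, 1.0, demo))
    print(sfr_satellite(5.0, t=6.0, t_accr=5.0, m_star=1e10,
                        tau_star_0=4.0, m_star_c=1e11))
```

Equation (3) is continuous at $z = z_c$ by construction, which the sketch preserves: at $z = z_c$ the power-law factor equals one and the second branch reduces to $\gamma_a$.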
The satellites may then merge with the central galaxy located near the center. We follow L14 and use an empirical model to determine the time when the merger occurs:

$$ \Delta t = 0.[\ldots]\,\frac{Y^{[\ldots]}}{\ln(1+Y)}\,\exp(1.[\ldots]\,\eta)\,\frac{r_{\rm halo}}{v_{\rm halo}}, \tag{6} $$

where $Y = M_{\rm cen}/M_{\rm sat}$ is the ratio of mass between the central halo and the satellite halo at the time when the accretion occurs, and $r_{\rm halo}$ and $v_{\rm halo}$ are the virial radius and virial velocity of the central halo (e.g. Boylan-Kolchin et al. 2008). The parameter $\eta$ describes the specific orbital angular momentum, and is assumed to follow a probability distribution $P(\eta) = \eta^{[\ldots]}(1-\eta)^{[\ldots]}$ (e.g. Zentner et al. 2005). After the merger, a fraction $f_{\rm TS}$ of the stellar mass of the satellite is added to the central galaxy, with $f_{\rm TS}$ a model parameter.

Fig. 2. Galaxy stellar mass functions at redshift $z = 0$. Black solid line: model galaxies based on our repaired trees. Purple solid line: from Lu et al. 2014a, based on Monte Carlo halo merger trees. Green dots with error bars: the observational result used by L14 to calibrate the model.

Fig. 3. Conditional stellar mass functions in halos of different mass $M_h$ (as indicated in each panel; $\log M_h\,[h^{-1}M_\odot]$ in [12.0, 12.3], [12.9, 13.2], [13.5, 13.8] and [14.4, 14.7]) at redshift $z = 0$. Black lines: model galaxies based on our repaired trees, for central galaxies (dashed) and satellite galaxies (solid). The error bars indicate the standard deviations among 100 bootstrap resamplings. Purple lines: from Lu et al. 2014a, based on Monte Carlo halo merger trees, for central galaxies (dashed) and satellite galaxies (solid). Green markers with error bars: the observational result of Yang et al. 2008, for central galaxies (circles) and satellite galaxies (triangles).

The ingredients given above can be used to predict the stellar mass and SFR of both central and satellite galaxies. In order to make predictions for galaxy luminosities in different bands, we also need the metallicities of stars. We use the mean metallicity - stellar mass relation given by Gallazzi et al. (2005) to assign metallicities to galaxies according to their masses. A simple stellar population synthesis model, based on Bruzual & Charlot (2003) with a Chabrier initial mass function (Chabrier 2003), is adopted to obtain the mass-to-light ratio of the stars formed, and the mass loss due to stellar evolution.

We note that the L14 model, which is based on Monte Carlo merger trees, does not take into account some special events that exist in numerical simulations. In simulated merger trees, some sub-halos were main halos at some early times, were accreted into other systems as satellites later, and were eventually ejected and became main halos again.
For such cases, we treat the galaxy in the sub-halo as a satellite galaxy even after the sub-halo is ejected. The ejected sub-halos are then treated as new main halos after ejection. This implementation does not make much physical sense, but it best mimics the Monte Carlo merger trees, in which sub-halos are never ejected and all halos at a given time are treated equally, regardless of whether or not they have passed through a bigger halo. Such an implementation is necessary, as the model parameters given by L14 are calibrated using Monte Carlo merger trees.

Fig. 2 shows the galaxy stellar mass function (GSMF) of model galaxies at redshift $z = 0$ as the black solid line, in comparison with the result of L14 (purple solid line). As one can see, the L14 result is well reproduced over a wide range of stellar masses, which demonstrates that our implementation of the L14 model with the ELUCID halo merger trees is reliable, as far as the general GSMF is concerned. For reference, we also include the observational data points (green dots with error bars) that were used in L14 to constrain their model parameters.

As a more demanding test, we compare in Fig. 3 the conditional galaxy stellar mass functions (CGSMFs) in halos of different masses at redshift $z = 0$ obtained from the ELUCID halo merger trees with those given by L14. Here again we see a good agreement between the two. Since the CGSMF gives the average number of galaxies of a given stellar mass in a halo of a given mass, a good match in CGSMFs also implies that the spatial clustering of galaxies as a function of stellar mass is reproduced.

3.2. Galaxy occupation in dark matter halos

To use our model galaxies to construct mock catalogs, we need to assign spatial positions and peculiar velocities to galaxies in each halo in the simulation according to the halo occupation distributions (HODs) obtained from the empirical model described above. Here we adopt a sub-halo abundance matching method that links galaxies in a halo to the sub-halos in it. As shown in Wang et al. (2016), the sub-halo population can be identified reliably from the ELUCID simulation for sub-halos with masses down to $\sim 10^{[\ldots]}\,h^{-1}M_\odot$. The abundance matching goes as follows. For a given halo, we first rank galaxies in descending order of stellar mass and sub-halos in descending order of halo mass. Here the mass of a sub-halo is that at the time when the sub-halo was first accreted into its host. Note that sub-halos both identified directly from the simulation and added using Monte Carlo merger trees (see §2.2) […] $z = 0$ Monte Carlo halo, the galaxies hosted by them are usually too faint to be relevant; they are only included for completeness, but actually are not used in constructing the mock catalog. Finally, the position and velocity of the sub-halo are assigned to the galaxy that has the same rank. For those galaxies that do not have sub-halo counterparts, their positions and velocities are assigned randomly according to the NFW profile. This method can be used to construct volume-limited samples within the entire simulation box down to stellar masses of $\sim 10^{[\ldots]}\,h^{-1}M_\odot$, with full phase space information obtained from the simulated sub-halos. This is sufficient for most of our purposes.

Fig. 4. Spatial distribution of SDSS (left panel) and mock (right panel) galaxies. Selections are made for all galaxies with $r \le [\ldots]$ and redshifts in $[0.[\ldots], 0.12]$ in the Sloan NGC region. Only galaxies in a $4^\circ$ declination slice are plotted.

3.3. The SDSS mock catalog

With full information about the luminosities and phase space coordinates for individual galaxies, it is straightforward to make mock catalogs using galaxies in the constrained volume and applying the same selection criteria as in the observation. For each model galaxy in the simulation box, we assign to it a cosmological redshift, $z_{\rm cos}$, according to its distance to a virtual observer, and the observed redshift, $z_{\rm obs}$, is given by $z_{\rm cos}$ together with its line-of-sight (los) peculiar velocity, $v_{\rm los}$:

$$ z_{\rm obs} = z_{\rm cos} + (1 + z_{\rm cos})\,\frac{v_{\rm los}}{c}, \tag{7} $$

with $c$ the speed of light. Here the location of the virtual observer and the coordinate system are determined by the orientation of the SDSS volume in the simulation box. SDSS apparent magnitudes in $u$, $g$, $r$, $i$, and $z$ are assigned to each galaxy according to its luminosities in the corresponding bands. For our SDSS mock sample, we select all galaxies in the SDSS Northern-Galactic-Cap (NGC) region with redshifts $0.[\ldots] < z < 0.12$ and with magnitude $r \le [\ldots]$. […] $\approx 0.08$, are well reproduced. Thus, the mock catalog can be used to investigate both the properties of the galaxy population in the cosmic web, and the large-scale clustering of galaxies. In particular, since all galaxies above our mass resolution limit, which is about $10^{[\ldots]}\,h^{-1}M_\odot$, are modeled in the entire simulation box, a comparison of the statistical properties between the SDSS mock catalog and the whole simulation box carries information about the CV of the SDSS sample.

4. COSMIC VARIANCE IN GALAXY STELLAR MASS FUNCTIONS

The realistic model catalogs described above have many applications, such as to study the relationships between galaxies and the mass density field, and to investigate the galaxy population in different components of the cosmic web. Here we use them to analyze and quantify the cosmic variance (CV) in the measurements of the galaxy stellar mass function (GSMF) and luminosity function (GLF). We first use model galaxies in the whole simulation box to quantify the CV as a function of sample volume and galaxy mass. We then use the SDSS mock catalog to examine the CV in the SDSS, and to investigate the abilities of different estimates of the GSMF/GLF to account for the CV. We propose and test a new method that can best correct for the CV. Finally, we apply our method to the SDSS catalog to obtain a GLF and GSMF that are free of the CV.

4.1. Cosmic variance as a function of sample volume and galaxy mass

To quantify the eﬀects of CV, we partition the whole500 h − Mpc simulation box into sub-boxes, each witha given size L s , without overlap. For each sub-box i wecalculate the galaxy number density n g,i ( M ∗ ; L s ). Fig. 5shows the GSMFs obtained for 100 sub-boxes with sizes L s = 25, 50, 100, and 250 h − Mpc, respectively. The re-sults of individual sub-boxes are shown by the green lines,while the average and the 2 σ variance (96%) among theGSMFs are shown by the red curve and bars, respec-tively. As expected, the scatter among the sub-boxesLUCID VI: Cosmic variance of galaxy distribution in the local Universe 7 ( M * )[( h M p c ) d e x ] L box = 25 h Mpc

Sub-boxes h Mpc log M * [h M ] ( M * )[( h M p c ) d e x ] h Mpc log M * [h M ] h Mpc

Fig. 5.—

GSMFs at z = 0 in sub-boxes in the 500 h − Mpc boxof ELUCID simulation. For each sub-box size L box ≤ h − Mpc,100 sub-boxes without overlap are randomly chosen in the simu-lation box, while for L box = 250 h − Mpc, all the 8 sub-boxes areused. The GSMFs of individual sub-boxes are shown by the greencurves in each panel. The average over the sub-boxes of the samesize is given by the red line in each panel. Error bars covering 96%(2 σ ) range among diﬀerent sub-boxes are also plotted. decreases as the sub-box size increases. For instance,the scatter for L s = 50 h − Mpc is about ≈ . L s = 250 h − Mpc.Theoretically, the galaxy number density n g is relatedto the mass density ρ m by a stochastic bias relation: δ g = bδ m + (cid:15) , (8)where δ g = ( n g /n g ) − δ m = ( ρ m /ρ m ) −

1, with n g and ρ m being the mean number density of galaxiesand the mean density of mass in the Universe. The co-eﬃcient, b , is the bias parameter, which characterizesthe deterministic part of the bias relation, and (cid:15) is thestochastic part. If the galaxy number density ﬁeld is aPoisson sampling of the mass density ﬁeld, then the vari-ance in the galaxy density can be written as σ t = σ + σ , (9)where σ P = N − / is due to Poisson ﬂuctuation. Assum-ing linear bias, the deterministic part, which we refer toas the cosmic variance (CV), can be written as: σ ( M ∗ ; L s ) = b ( M ∗ ) σ m ( L s ) , (10)where L s is the characteristic size of the sample, and σ m ( L s ) is the rms of the mass ﬂuctuation on the scale of L s .Motivated by this, we model σ CV using the GSMF ob-tained from simulated galaxies. The number density n g,i of all sub-boxes are synthesized to give the mean value, n g ( M ∗ ; L s ), and the variance, σ t ( M ∗ ; L s ). We use n g to estimate the expected Poisson variance, σ , and useequation (9) to estimate σ ( M ∗ ; L s ) by subtracting thePoisson part from the total variance. Equation (10) is C V M * [ h M ]

10 20 30 50 70 L s [h Mpc] C V [ M o c k ] / C V [ M o d e l ] Fig. 6.—

Upper panel: Cosmic variance $\sigma_{\rm CV}$ as a function of the characteristic sample size $L_s$ and stellar mass $M_*$, as indicated in the panel. Solid lines are $\sigma_{\rm CV}$ estimated from the mock sample, while dashed lines are from the fitting formula. Lower panel: Ratio of $\sigma_{\rm CV}$ between the mock sample and the model prediction. The black solid line indicates a ratio of 1.

then used to fit the dependence of CV on stellar mass and the size of the sub-box. We find that the $L_s$-dependence can be well described by

$$\log \sigma_m(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3, \tag{11}$$

where $x = \log(L_s / h^{-1}\,{\rm Mpc})$, and $p_0 = 1.$[…], $p_1 = -$[…], $p_2 = 0.92$, and $p_3 = -$[…]$.25$, while the $M_*$ dependence is described by

$$\log b(y) = q_0 + q_1 y + q_2 y^2 + q_3 y^3, \tag{12}$$

where $y = \log(M_* / h^{-1}M_\odot)$, and $q_0 = -$[…], $q_1 = 5.$[…], $q_2 = -$[…]$.68$, and $q_3 = 0.$[…].

Fig. 6 compares the $\sigma_{\rm CV}$ obtained directly from the simulated galaxy sample with the model prediction as a function of $L_s$ for galaxies of different $M_*$, as represented by different lines. The fitting formulae work well over the range from $10^{[\ldots]}\,h^{-1}M_\odot$ to $10^{[\ldots]}\,h^{-1}M_\odot$ in $M_*$, and from $10\,h^{-1}\,{\rm Mpc}$ to $125\,h^{-1}\,{\rm Mpc}$ in $L_s$.

The above prescription also provides a model for the covariance matrix of the cosmic variance. Consider the covariance matrix, $C$, of the densities between galaxies of masses $M_{*,1}$ and $M_{*,2}$. The bias model described above

[Figure 7 axes: $L_s$ [$h^{-1}$ Mpc]; Covariance($M_{*,1}$, $M_{*,2}$; $L_s$)]

Fig. 7.—

The covariance, ${\rm Cov}(M_{*,1}, M_{*,2}; L_s)$, of the GSMF between two stellar masses, $M_{*,1}$ and $M_{*,2}$, as a function of the characteristic sample size $L_s$. Symbols show results obtained from the mock sample. Different symbols represent different values of one of the two masses ($10^{[\ldots]}$, $10^{[\ldots]}$, and $10^{[\ldots]}\,h^{-1}M_\odot$, from bottom up, scaled by 0.[…], […], and […]). Results for different values of the other mass are re-scaled by 0.[…], […], and […].2, for masses of $10^{[\ldots]}$, $10^{[\ldots]}$, and $10^{[\ldots]}\,h^{-1}M_\odot$, respectively. The solid curves are model predictions.

gives

$$C(M_{*,1}, M_{*,2}; L_s) = b(M_{*,1})\,b(M_{*,2})\,\sigma_m^2(L_s). \tag{13}$$

Fig. 7 compares the measured $C$ with the model predictions as a function of $L_s$ for a number of $(M_{*,1}, M_{*,2})$ pairs. Overall the model matches the measurements well. Some discrepancies can be seen for massive galaxies and small $L_s$, where the model prediction is slightly lower than that measured from the simulation data.

We can compare the CV model calibrated above with that obtained from SDSS data. To this end, we estimate the total variance, the Poisson variance, and the cosmic variance using sub-boxes of given $L_s$ that are fully contained in the SDSS volume, within which the sample is complete for a given $M_*$. In order to estimate the variance among sub-boxes reliably, we only present cases where at least 10 sub-boxes are available. The results, plotted in Fig. 8, show that the SDSS measurements follow the model predictions for $10 < L_s <$ […] $h^{-1}\,{\rm Mpc}$ and $M_* >$ […] $h^{-1}M_\odot$. Note that we did not fit $\sigma_{\rm CV}$ for $M_* > 10^{[\ldots]}\,h^{-1}M_\odot$, but the extrapolation seems to match the SDSS measurements well even for such stellar masses. For $M_* <$ […] $h^{-1}M_\odot$, the variance obtained from the SDSS becomes significantly lower than the model prediction. As we will see below, this deviation arises because the local volume, within which such galaxies can be observed, does not sample the galaxy population fairly.

To summarize, the simple model presented above provides a useful way to estimate the level of CV expected in the measurements of the GSMF. This variance, which is

[Figure 8 axes: $L_s$ [$h^{-1}$ Mpc]; $\sigma_{\rm CV}$ (upper panel); CV[SDSS] / CV[Model] (lower panel)]

Fig. 8.—

The model of cosmic variance compared with SDSS data. Upper panel: Cosmic variance $\sigma_{\rm CV}$ of the GSMF as a function of the characteristic sample size $L_s$, for galaxies of different stellar masses, $M_*$, as shown by different colors. Solid lines are $\sigma_{\rm CV}$ estimated from the SDSS sample, while dashed lines are predictions of the fitting model. Lower panel: The ratio of $\sigma_{\rm CV}$ between the SDSS sample and the model. The black horizontal line indicates a ratio of 1.

produced by the fluctuations of the cosmic density field, should be combined with the Poisson variance from number counting to estimate the total uncertainty in the GSMF. This is particularly important where the galaxy population is observed in a small volume and the cosmic variance is larger than the counting error. In real applications, other types of uncertainties, such as errors in photometry, redshift, and stellar mass estimates, should also be modeled properly along with the CV described here.

4.2. Cosmic variances in the SDSS volume

In this subsection we examine in detail the CV in the SDSS using the mock samples constructed for the SDSS. Here we only consider galaxies and model galaxies in the SDSS Northern Galactic Cap (NGC) (hereafter, SDSS sky coverage) with redshift $0.01 \le z \le 0.12$ (hereafter, SDSS volume). We construct four different types of samples:

(i) SDSS sample: SDSS DR7 observed galaxies in the SDSS volume, with $r$-band magnitude selection $r \le$ […].

[Figure 9 axes: Redshift; Number Density [$h^3\,{\rm Mpc}^{-3}$]. Legend: Simulated Halo; Mock Galaxy; SDSS Galaxy]

Fig. 9.— Galaxy and halo number densities at different redshifts $z$ in the Sloan sky coverage, from $z = 0.01$ to $0.12$. Red lines are for simulated halos (solid: halos with $M_h >$ […] $h^{-1}M_\odot$, offset by $\times$[…].0; dashed: $10^{[\ldots]} \le M_h \le$ […] $h^{-1}M_\odot$). Blue lines are for mock galaxies based on the empirical model (solid: $M_* >$ […] $h^{-1}M_\odot$, offset by $\times$[…].0; dashed: $10^{[\ldots]} \le M_* \le$ […] $h^{-1}M_\odot$, offset by $\times$[…]). […] $M_r < -$[…].5, offset by $\times$[…].0; dashed: $-$[…] $\le M_r \le -$[…].5, offset by $\times$[…].

(ii) SDSS mock sample: model galaxies in the SDSS volume, with $r$-band magnitude selection $r \le$ […].

[…] $n_g$, as a function of redshift $z$, in the SDSS volume-limited mock sample. Model galaxies in a given stellar mass bin are binned in redshift intervals with bin size $\delta z = 0.$[…]. […] $z \approx 0.03$, due to the presence of the large-scale structure known as the CfA Great Wall (Geller & Huchra 1989), and the other around $z \approx 0.075$ due to the presence of the Sloan Great Wall (Gott III et al. 2005). Below $z \approx 0.03$, the number densities show a sharp decline as $z$ decreases, and the effect is stronger for massive galaxies, indicating the presence of a local void (see also, for example, Whitbourn & Shanks 2014, 2016). For comparison, we also show the redshift distribution of SDSS galaxies, obtained by using sub-samples complete to given absolute magnitude limits. We see that the observed distribution follows well that in the mock sample, indicating that our mock sample can be used to study the CV in the SDSS sample. For reference we also plot the number densities of simulated dark matter halos in the SDSS volume versus redshift. Here again we see structures similar to those seen in the galaxy distribution. In particular, there is a marked decline of halo density at $z < 0.03$, and the decline is more prominent for more massive halos.

The presence of the local low-density region shown above can have a strong impact on the statistical properties of the galaxy population derived from the SDSS, especially for faint galaxies, which can be observed only within the local volume in a magnitude-limited sample. Indeed, the measurement of the GSMF, which describes the number density of galaxies as a function of galaxy mass, can be biased if the local low-density region is not properly accounted for.

As an illustration, Fig. 10 shows the GSMFs derived from SDSS magnitude-limited mock samples with different $r$-band magnitude limits, using the standard $V_{\rm max}$ method. For reference, we also plot the GSMF obtained from the SDSS volume-limited mock sample (the thick dashed line), which matches well the 'global' GSMF obtained from the whole $500\,h^{-1}\,{\rm Mpc}$ simulation box. As one can see, the GSMF can be significantly underestimated if the magnitude limit is shallow (corresponding to a low value of the $r$-band magnitude limit, $r_{\rm lim}$). Only a sample as deep as $r_{\rm lim} = 20$ can provide an unbiased estimate of the GSMF down to $M_* \sim$ […] $h^{-1}M_\odot$. For the SDSS limit, $r_{\rm lim} = 17.6$, the measurement starts to deviate from the global GSMF at $M_* \approx$ […] $h^{-1}M_\odot$, and the difference between them reaches a factor of about 5 at around $10^{[\ldots]}\,h^{-1}M_\odot$.

The underestimate of the GSMF at the low-mass end is produced by the presence of the low-density region at $z < 0.03$ in the SDSS volume. To show this more clearly, we define a 'break' mass, $M_{0.03}(r_{\rm lim})$, so that the sample of galaxies with stellar mass $M_* = M_{0.03}$ is complete to $z = 0.03$ for the given magnitude limit, $r_{\rm lim}$. Here we have used the mean mass-to-light ratio, obtained from the mock sample, to convert the stellar mass to an absolute magnitude. As one can see, for each $r_{\rm lim}$, the GSMF obtained from the sample starts to deviate from the global benchmark at $M_{0.03}(r_{\rm lim})$, shown by the vertical line, and is substantially lower at $M_* < M_{0.03}$. All these results demonstrate that the faint end of the GSMF can be underestimated significantly in the SDSS due to the presence of the local low-density region at $z < 0.03$.

4.3. The correction of cosmic variance

4.3.1. Conventional methods

The results described above indicate that CV is a serious issue in the measurements of the GSMF, even for a sample as large as the SDSS. Corrections have to be made in order to obtain an unbiased result that represents the true GSMF in the low-$z$ Universe. In the literature, some estimators other than the standard V-max method have been proposed, such as the maximum likelihood method (e.g. G. Efstathiou 1988; Blanton et al. 2001; Cole 2011; Whitbourn & Shanks 2016), and scaling with bright galaxies (e.g. Baldry et al. 2012). These

[Figure 10 axes: $\log M_*$ [$h^{-1}M_\odot$]; $\Phi_{\rm limited}$ / Benchmark (left panel); $\Phi_{\rm limited}$ [$h^3\,{\rm Mpc}^{-3}\,{\rm dex}^{-1}$] (right panel); curves labeled by $r_{\rm lim}$]

Fig. 10.— The galaxy stellar mass functions (GSMF), $\Phi(M_*)$, estimated using the V-max method from SDSS magnitude-limited mock samples with different magnitude limits, $r_{\rm lim}$, as shown in the left panel. The right panel shows the absolute values of the GSMF, with the benchmark shown by the black dashed line. The black solid line shows the result for $r_{\rm lim} = 17.6$, the magnitude limit of the SDSS survey. The left panel shows the ratio of the GSMF between the magnitude-limited samples and the benchmark. In each panel, the vertical dashed lines indicate the stellar masses corresponding to the break at $z = 0.03$.

[Figure 11 axes: $\log M_*$ [$h^{-1}M_\odot$]; $\Phi(M_*)$ [$h^3\,{\rm Mpc}^{-3}\,{\rm dex}^{-1}$] (upper panel); $\Phi$ / Benchmark (lower panel). Panel title: SDSS Mock. Legend: Benchmark; Vmax; Density Scaling; Maximum Likelihood]

Fig. 11.—

The GSMFs estimated from the SDSS mock catalog with different methods designed to account for the cosmic variance. The upper panel shows the GSMFs and the lower panel shows the ratio of each GSMF to the benchmark. Black solid line: the benchmark GSMF obtained from the SDSS volume-limited mock sample. Black dashed line: the GSMF obtained with the V-max method, with error bars calculated using 100 bootstrap samples. Purple line: the GSMF obtained from the density scaling method of Baldry et al. (2012). Green line: the GSMF obtained from the maximum likelihood method assuming a triple-Schechter function form.

methods were designed, at least partly, to correct for the effects of large-scale structure in the measurements of the GSMF from an observational sample. Here we test their performance using our SDSS mock samples.

In the maximum likelihood method, one starts with an assumed functional form, either parametric or non-parametric, for the GSMF, and then uses a maximum likelihood method to match the model prediction with the data, thereby obtaining the parameters that specify the functional form of the GSMF. In our analysis here, we choose a triple-Schechter function to model the GSMF,

$$\Phi(M_*)\,{\rm d}\log M_* = \sum_{k=1}^{3} \Phi_{*,k} \left(\frac{M_*}{\mu_k}\right)^{\alpha_k+1} e^{-M_*/\mu_k}\,{\rm d}\log M_*, \tag{14}$$

where $\Phi_{*,k}$, $\mu_k$, and $\alpha_k$ are the amplitude, the characteristic mass, and the faint-end slope of the $k$-th Schechter component, respectively. This function is assumed to be defined over the domain $[M_{*,\rm min}, M_{*,\rm max}]$. For a galaxy, '$i$', with stellar mass $M_i$ at redshift $z_i$ in the sample, the probability for it to be observed at this redshift is

$$L_i = \frac{\Phi(M_i)}{\int_{M_{i,\rm min}}^{M_{*,\rm max}} \Phi(M_*)\,{\rm d}\log M_*}. \tag{15}$$

The total likelihood $L$ that the GSMF takes the assumed $\Phi$ is then given by

$$L = \prod_{i=1}^{N} L_i, \tag{16}$$

where $N$ is the number of galaxies in the sample. The model parameters can be adjusted so as to maximize the likelihood $L$.
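As a minimal sketch of how the likelihood in Eqs. (15) and (16) can be evaluated numerically, the Python fragment below computes $-\ln L$ for a single Schechter component rather than the triple-Schechter form used in the text. The function names, the integration grid, and the toy input values are all hypothetical, and the per-galaxy lower limit `logm_min` is assumed to be the stellar mass limit at the galaxy's redshift. It is meant only to illustrate the structure of the calculation, not to reproduce the analysis in this paper.

```python
import numpy as np

def schechter_per_dex(logm, phi_star, log_mu, alpha):
    """One Schechter component per dex, the form of a single term in Eq. (14)."""
    x = 10.0 ** (np.asarray(logm, dtype=float) - log_mu)
    return phi_star * x ** (alpha + 1.0) * np.exp(-x)

def neg_log_likelihood(params, logm, logm_min, logm_max, n_grid=512):
    """Return -ln L for Eqs. (15)-(16), using one Schechter component.

    logm     : log10 stellar mass of each galaxy
    logm_min : log10 of the lower integration limit for each galaxy
               (assumed here to be the mass limit at its redshift)
    logm_max : log10 of the upper end of the mass domain (scalar)
    """
    phi_star, log_mu, alpha = params
    logm = np.asarray(logm, dtype=float)
    logm_min = np.asarray(logm_min, dtype=float)
    # One integration grid per galaxy, spanning [logm_min_i, logm_max].
    t = np.linspace(0.0, 1.0, n_grid)
    width = logm_max - logm_min
    grid = logm_min[:, None] + width[:, None] * t[None, :]
    vals = schechter_per_dex(grid, phi_star, log_mu, alpha)
    # Trapezoidal rule along the mass axis gives the denominator of Eq. (15).
    step = width / (n_grid - 1)
    norm = step * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))
    like = schechter_per_dex(logm, phi_star, log_mu, alpha) / norm
    return -np.sum(np.log(like))

# Toy inputs, made up for illustration only.
logm = np.array([9.2, 9.8, 10.4, 10.9])
logm_min = np.array([8.5, 9.0, 9.5, 10.0])
nll_a = neg_log_likelihood((1.0, 10.7, -1.2), logm, logm_min, 12.0)
nll_b = neg_log_likelihood((5.0, 10.7, -1.2), logm, logm_min, 12.0)
```

In this sketch `nll_a` and `nll_b` differ only in the amplitude `phi_star` and agree to rounding error, because the amplitude cancels in the ratio of Eq. (15). The shape parameters could be passed to a general-purpose minimizer, but the normalization has to be set by other means.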
In our application to the SDSS mock sample, we fit the GSMF obtained from the V-max method with the triple-Schechter function and use the parameters as the initial input of the maximization process. Since the bright end is free of cosmic variance, we fix the three parameters characterizing the Schechter component at the brightest end, leaving the remaining six parameters to be constrained by the maximum likelihood process. As the maximum likelihood method does not provide information about the overall amplitude of $\Phi(M_*)$, the bright end is also used to fix the amplitude of $\Phi(M_*)$. The GSMF estimated in this way from the SDSS mock sample is plotted in Fig. 11 as the green line, in comparison with that estimated by the V-max method (dashed line), and the benchmark GSMF (black line). It is clear that the maximum likelihood method works better than the V-max method, but it still underestimates the GSMF at the low-mass end. The underlying assumption of the maximum likelihood method is that the relative distribution of galaxies with respect to $M_*$ is everywhere the same. This in general is not true, given that galaxy clustering depends on $M_*$. This explains the failure of this method in correcting the CV.

In an attempt to control the cosmic variance in the GAMA survey, Baldry et al. (2012) proposed to use the number density of brighter galaxies estimated in a larger volume to scale the number density of fainter galaxies that are observed only in a smaller volume. This method will be referred to as the "density scaling" method. Our implementation of this method is as follows.

(i) Choose a 'cosmic-variance-free (CVF)' sample, including only bright galaxies that have $z_{\rm max}$ larger than $0.12$. In our SDSS mock sample, this corresponds to selecting galaxies with $M_* >$ […] $\times$ […] $h^{-1}M_\odot$. This sample will be used as the density tracer at different redshifts, to scale the density at the fainter end.

(ii) Compute the cumulative number density of the CVF sample, $n_{\rm CVF}(<z)$, as a function of redshift $z$. In practice, the cumulative number density is calculated in the redshift range $[0.$[…]$, z]$.

(iii) Compute the GSMF, $\Phi_{\rm Vmax}(M_*)$, using the V-max method.

(iv) For each stellar mass bin of $\Phi_{\rm Vmax}(M_*)$, find the largest redshift, $z_{\rm max}(M_*)$, below which galaxies in this bin can be observed in the sample.

(v) Obtain the corrected GSMF, $\Phi_{\rm sc}$, by scaling the V-max estimate with a correction factor:

$$\Phi_{\rm sc}(M_*) = \Phi_{\rm Vmax}(M_*)\,\frac{n_{\rm CVF}(<0.12)}{n_{\rm CVF}[<z_{\rm max}(M_*)]}, \tag{17}$$

where $n_{\rm CVF}(<0.12)$ is the number density of the CVF sample in the full redshift range, $[0.$[…]$,$ […]$]$. […]

Fig. 12.— The conditional galaxy stellar mass functions (CGSMFs) for halos of different masses, $M_h$, as indicated in the figure. Solid lines represent the CGSMFs estimated from the SDSS mock sample. Dotted lines are estimated from the SDSS volume-limited mock sample.

[…] bright and faint galaxies are both related to the underlying density field by a similar bias factor. In general, this assumption is not valid.

4.3.2. Methods based on the joint distribution of galaxies and environment

Since galaxies form and reside in the cosmic density field, the number density of galaxies is expected to depend on the local environment of galaxies. Suppose the local environment is specified by a quantity or a set of quantities $\mathcal{E}$. The joint distribution of galaxy mass and $\mathcal{E}$ obtained from a given sample, '$S$', can be written as

$$\Phi_S(M_*, \mathcal{E}) = \Phi_S(M_*|\mathcal{E})\,P_S(\mathcal{E}), \tag{18}$$

where $\Phi_S(M_*|\mathcal{E})$ is the conditional distribution of galaxy mass in a given environment estimated from sample '$S$', and $P_S(\mathcal{E})$ is the probability distribution function of the environmental quantity given by the sample. If galaxy formation and evolution is a local process so that $\Phi(M_*|\mathcal{E})$ is independent of the galaxy sample, then the CV in the stellar mass function derived from the sample can all be attributed to the difference between $P_S(\mathcal{E})$ and the global distribution function, $P(\mathcal{E})$, expected from a large sample where the distribution of $\mathcal{E}$ is sampled without bias. An unbiased estimate of the GSMF $\Phi(M_*)$ is then

$$\Phi(M_*) = \int \Phi_S(M_*|\mathcal{E})\,P(\mathcal{E})\,{\rm d}\mathcal{E}. \tag{19}$$

Thus, the unbiased GSMF is obtained from the conditional distribution function, $\Phi_S(M_*|\mathcal{E})$, derived from the sample '$S$', and the unbiased distribution $P(\mathcal{E})$ of the environment variable.

The environmental quantity has to be chosen properly so that it can be estimated from observation, while the unbiased distribution function, $P(\mathcal{E})$, can, in principle,

[Figure 13 axes: $\log M_*$ [$h^{-1}M_\odot$]; $\Phi(M_*)$ [$h^3\,{\rm Mpc}^{-3}\,{\rm dex}^{-1}$] (upper panel); $\Phi$ / Benchmark (lower panel). Panel title: SDSS Mock. Legend: Benchmark; Corrected CGSMF; Direct CGSMF; Vmax]

Fig. 13.—

The GSMFs obtained by applying different methods to the SDSS mock sample. The upper panel shows the GSMFs while the lower panel shows the ratio of each GSMF to the benchmark. The black solid line shows the benchmark GSMF directly calculated from the SDSS volume-limited mock sample. The black dashed line is the GSMF obtained by the method based on the CGSMFs described in this paper. The black dotted line shows the GSMF derived by using the V-max method. The green line shows the GSMF obtained by combining the CGSMFs directly calculated from the SDSS mock sample (which is incomplete for faint galaxies in massive groups). The purple solid and dashed lines are $\Phi_1$ and $\Phi_2$, the contributions of halos with masses $M_h \ge 10^{[\ldots]}\,h^{-1}M_\odot$ and $M_h < 10^{[\ldots]}\,h^{-1}M_\odot$, respectively (see § […]).

be obtained from large cosmological simulations. Here we analyze a method which uses the masses of dark matter halos as the environmental quantity. In this case, $\mathcal{E}$ is represented by halo mass, $M_h$, $\Phi_S(M_*|M_h)$ is the conditional galaxy stellar mass function (CGSMF), and $P(\mathcal{E}) = n(M_h)$ is the halo mass function estimated directly from the constrained simulation (Yang et al. 2003). The advantage here is that the unbiased estimates are only needed for the conditional functions, $\Phi(M_*|M_h)$. The disadvantage is that it is model dependent through $n(M_h)$, and that one has to identify galaxy systems to represent dark matter halos.

Fig. 12 shows the conditional stellar mass functions of galaxies in halos of different masses, estimated from the SDSS mock sample, in comparison with the benchmarks obtained from the total SDSS volume-limited sample. As one can see, for a given halo mass, the CGSMF obtained from the SDSS mock sample matches the benchmark well only at the massive end. This happens because of the

[Figure 14 axes: $M_r$; $\Phi(M_r)$ [$h^3\,{\rm Mpc}^{-3}\,{\rm mag}^{-1}$] (upper panel); $\Phi$ / Vmax (lower panel). Legend: SDSS (This work); SDSS (Vmax); Loveday 2012; Whitbourn 2016]

Fig. 14.—

The galaxy luminosity function (GLF) estimated from the SDSS catalog by using our method, in comparison to the results in the literature. The upper panel shows the GLFs, while the lower panel shows the ratio of each GLF to that obtained with the V-max method. The black solid line is the GLF obtained by our method. The gray solid line at the faint end (first two data points) is obtained by linear extrapolation. The black dashed line is the GLF obtained by the V-max method. The gray shaded band indicates the cosmic variance of the SDSS sample expected from Eq. 10. The purple line is from Loveday et al. (2012) for the GAMA survey. The green line is from Whitbourn & Shanks (2016) using the SDSS 'cmodel' magnitude.

absence of massive halos at small distances in the local under-dense region, so that their faint member galaxies are not observed in the magnitude-limited sample. The total GSMF, obtained using Eq. (19), is shown in Fig. 13 by the green line, in comparison to the benchmark of the total GSMF represented by the black solid line, and to the GSMF obtained by the traditional V-max method represented by the black dotted line. Here the benchmark CGSMFs (dotted lines in Fig. 12) are used for halos with $M_h <$ […] $h^{-1}M_\odot$, while the CGSMFs estimated from the magnitude-limited sample (solid lines in Fig. 12) are used for more massive halos. This is to mimic the fact that the total CGSMF for less massive halos can be obtained by other means (see Eq. 21), while the low-stellar-mass end of the CGSMFs for massive halos cannot be obtained directly from the SDSS spectroscopic sample. Here again the stellar mass function at the low-mass end is underestimated, although the method works substantially better than the V-max method. The reason is clear from Fig. 12.
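To make the role of Eq. (19) concrete, the following Python sketch discretizes it with halo mass as the environment variable, so that the total GSMF is a sum of conditional functions weighted by the halo mass function. The function name, the two halo-mass bins, and all numerical values are invented for illustration and are not taken from the ELUCID mock. The example only shows how missing faint members of massive halos lower the total at the low-mass end.

```python
import numpy as np

def total_gsmf(cgsmf, n_halo, dlog_mh):
    """Discretized Eq. (19) with halo mass as the environment variable.

    cgsmf   : shape (n_halo_bins, n_mstar_bins); galaxies per halo per dex in M*
    n_halo  : shape (n_halo_bins,); halo number density per dex in M_h
    dlog_mh : width of the halo-mass bins in dex
    """
    cgsmf = np.asarray(cgsmf, dtype=float)
    n_halo = np.asarray(n_halo, dtype=float)
    return np.sum(cgsmf * n_halo[:, None], axis=0) * dlog_mh

# Toy numbers, invented for illustration. Rows are [low-mass halos,
# massive halos]; columns run from low to high stellar mass.
cgsmf_true = np.array([[2.0, 1.0, 0.0],
                       [30.0, 10.0, 3.0]])
n_halo = np.array([1e-2, 1e-4])

# Mimic a magnitude-limited sample in which the faint members of massive
# halos are not observed: their lowest stellar-mass bin is empty.
cgsmf_obs = cgsmf_true.copy()
cgsmf_obs[1, 0] = 0.0

phi_true = total_gsmf(cgsmf_true, n_halo, dlog_mh=1.0)
phi_obs = total_gsmf(cgsmf_obs, n_halo, dlog_mh=1.0)
```

With these toy inputs `phi_obs` equals `phi_true` in the two higher stellar-mass bins but is lower in the lowest bin, even though the low-mass halos are fully sampled.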
The stellar mass function at the low-mass end is only sampled by low-mass halos because of the absence of massive halos in the nearby volume, while the low-mass end in the benchmark stellar mass

[Figure 15 axes: $\log M_*$ [$h^{-1}M_\odot$]; $\Phi(M_*)$ [$h^3\,{\rm Mpc}^{-3}\,{\rm dex}^{-1}$] (upper panel); $\Phi$ / Vmax (lower panel). Legend: SDSS (This work); SDSS (Vmax); Li & White 2009; Baldry 2012; Bernardi 2013]

Fig. 15.—

The galaxy stellar mass function (GSMF) obtained from the SDSS sample in this paper, in comparison with the results published earlier. The upper panel shows the GSMFs, while the lower panel shows the ratio of each GSMF to that given by the V-max method. The black solid line is the GSMF obtained by our method. The black dashed line is the GSMF obtained by the V-max method for the SDSS sample. The gray shaded band indicates the cosmic variance of the SDSS sample expected from Eq. 10. The gray solid piece at the faint end indicates the slight change when the extrapolation of the GLF is used. Green line: GSMF from Li & White (2009). Blue line: GSMF from the GAMA survey (Baldry et al. 2012). Purple line: GSMF from Bernardi et al. (2013), where the stellar masses of bright galaxies are estimated from Sérsic-Exponential fitting. The red dashed line at the top of the upper panel has a slope $\alpha = -$[…].

function is actually affected by the low-mass ends of the conditional stellar mass functions of massive halos.

These results demonstrate an important point. If the shape of the CGSMF depends significantly on halo mass, then one needs to estimate all the conditional functions reliably down to a given stellar mass limit, in order to get an unbiased estimate of the total stellar mass function down to the same mass limit. The SDSS redshift sample is clearly insufficient to achieve this goal at the low-mass end.

In a recent paper, Lan et al. (2016) showed that the conditional functions of galaxies can be estimated down to $M_r \sim -14$ (corresponding to a stellar mass of about $10^{[\ldots]}\,M_\odot$) for halos with mass $M_h >$ […] $M_\odot$ by cross-correlating galaxy groups (halos) selected from the SDSS spectroscopic sample with SDSS photometric data. Thus, if we can estimate the contribution by halos with lower masses to a similar magnitude, then the total function can be obtained. Here we test the feasibility of such an approach using the SDSS mock sample. First, we obtain the CGSMFs down to a stellar mass of $10^{[\ldots]}\,h^{-1}M_\odot$ for halos with $M_h \ge M_1 = 10^{[\ldots]}\,h^{-1}M_\odot$ directly from the total simulation volume. This step is to mimic the fact that such CGSMFs can be obtained, as in Lan et al. (2016), from observational data. The GSMF contributed by such halos is

$$\Phi_1(M_*) = \int_{M_1}^{\infty} \Phi(M_*|M_h)\,n(M_h)\,{\rm d}M_h. \tag{20}$$

To maximally reduce possible uncertainties introduced by this procedure, we estimate the total CGSMF $\Phi_1$ for $M_h \ge M_1$ directly from a modified V-max method for the high-stellar-mass end. Specifically, each galaxy is assigned a weight, $n_{\rm halo,u}/n_{\rm halo}(V_{\rm max})$, the ratio between the number density of $M_h \ge M_1$ halos in the Universe and that in $V_{\rm max}$. In practice, the weighted V-max has little impact on the results, as the effect of cosmic variance for high-mass galaxies is small. The procedure is included only for maintaining consistency. Eq. (20) is then used only at the low-stellar-mass end where the V-max method fails because of incompleteness. The result for $\Phi_1$ obtained in this way is shown by the purple solid curve in Fig. 13.

To estimate the contribution by halos with $M_h < M_1$ in a way that can be applied to real observation, we first eliminate all galaxies that are contained in halos with $M_h \ge M_1$. For the rest of the galaxies, we estimate the function by a modified version of the V-max method,

$$\Phi_2(M_*) = \sum \frac{1}{V_{\rm max}}\,\frac{1}{1 + b\,\delta(V_{\rm max})}, \tag{21}$$

where $b = 0.$[…], $\delta(V_{\rm max}) = \rho(V_{\rm max})/\rho_u - 1$ is the mass overdensity within $V_{\rm max}$, $\rho_u$ is the universal mass density, and $\rho(V_{\rm max})$ is the mean mass density within $V_{\rm max}$. The function $\Phi_2$ so estimated is shown as the purple dashed curve in Fig. 13. Note that small groups can only be seen in the very local region, so the CGSMF estimated for halos in a small mass bin can be very noisy. Our method intends to avoid this uncertainty by calculating the total CGSMF for all halos less massive than $M_1$. The total GSMF, $\Phi = \Phi_1 + \Phi_2$, is shown by the black dashed line in Fig. 13, which is very close to the benchmark, indicating that our method can indeed take care of the bias produced by the local under-dense region. We have checked that the result depends only weakly on the choice of the value of $M_1$.

TABLE 1
Corrected galaxy luminosity and stellar mass function

Columns: $M_r - 5\log h$; $\log\Phi(M_r)$ [$h^3\,{\rm Mpc}^{-3}\,{\rm mag}^{-1}$]; $\log M_*$ [$h^{-1}M_\odot$]; $\log\Phi(M_*)$ [$h^3\,{\rm Mpc}^{-3}\,{\rm dex}^{-1}$] (without and with extrapolation)

[…]

* Galaxies brighter than $M_r - 5\log h = -15$ are not sufficient to calculate the GSMF down to $M_* = 10^{[\ldots]}, 10^{[\ldots]}\,h^{-1}M_\odot$. Extrapolation of the GLF is used to solve this. The left column of $\Phi(M_*)$ is without extrapolation, while the right column is with extrapolation.

4.4. Applications to observational data

In this subsection, we apply the method described above to the real SDSS sample. We first estimate the galaxy luminosity function (GLF) using the procedure based on the conditional distributions of galaxy luminosity in dark matter halos, as described in §4.3.2. […] $\Phi_1(M_r)$, for faint galaxies with magnitude $M_r - 5\log h > -$[…] […] $10^{[\ldots]}\,h^{-1}M_\odot$ are obtained from Lan et al. (2016), while the CLF for brighter galaxies in these halos is estimated directly from the SDSS sample using the V-max method and the group catalog of Yang et al. (2012) (see also Yang et al. 2007). For halos with masses below $10^{[\ldots]}\,h^{-1}M_\odot$ the CLF, $\Phi_2(M_r)$, is obtained from the SDSS sample using the modified V-max method as described by Eq. (21). The total galaxy luminosity function (GLF) is then obtained by $\Phi(M_r) = \Phi_1(M_r) + \Phi_2(M_r)$. Fig. 14 shows the resulting GLF as the solid black line, in comparison with that obtained from the traditional V-max method. At the faint end, $M_r - 5\log h \approx -15$, the GLF is about twice as high as that given by the V-max method, indicating that cosmic variance can have a large impact on the estimate of the GLF at the faint end. To show this more clearly, we plot the cosmic variance expected from Eq. 10 as the shaded band in Fig. 14, where the stellar mass is obtained from luminosity by using the mean mass-to-light ratio. The expected cosmic variance is quite large at the faint end, indicating that cosmic variance is an important issue in estimating the faint end of the GLF.

The GLF in the local Universe has been estimated by many authors using various samples (e.g. Blanton et al. 2003; Yang et al. 2009; Loveday et al. 2012; Jones et al. 2006; Driver et al. 2012; Whitbourn & Shanks 2016). For comparison, we plot the GLFs obtained by Loveday et al. (2012) from the GAMA survey and by Whitbourn & Shanks (2016), who applied a maximum likelihood method to the SDSS to account for the cosmic variance in their estimates. The result of Whitbourn & Shanks (2016) matches ours over a wide range of luminosity, but still seems to underestimate the GLF at the faint end. The result of Loveday et al. (2012) has a large discrepancy with our result, possibly due to the cosmic variance in the small sky coverage of the GAMA sample used, which is 144 deg$^2$.

Since many of the faint galaxies in the SDSS photometric data do not have reliable stellar mass estimates, conditional galaxy stellar mass functions are not available at the low-mass end. Because of this, we cannot estimate the GSMF down to the low-mass end directly from the data with the method above. As an alternative, we use the $M_r$-$M_*$ relation obtained from the SDSS spectroscopic sample to convert the GLF obtained above to an estimate of the GSMF. We do this through the following steps.

(i) Construct a large volume-limited Monte-Carlo sample of galaxies with absolute magnitude distribution given by the GLF.

(ii) Bin these galaxies according to their absolute magnitudes.

(iii) For each Monte-Carlo galaxy, randomly choose a galaxy in the real SDSS spectroscopic sample in the same absolute magnitude bin, and assign the stellar mass of the real galaxy to the Monte-Carlo galaxy.

(iv) Compute the GSMF of this volume-limited Monte-Carlo sample.

The GSMF obtained directly from the GLF in this way is shown in Fig. 15 by the black solid curve. Since the GLF is estimated only down to $M_r - 5\log h \approx -15$, […] $M_r - 5\log h = -15$ may contribute to these two stellar mass bins. To test this, we extrapolate the faint end of the GLF to $M_r - 5\log h = -$[…]$.2$, which is sufficient to include all galaxies with stellar masses down to $10^{[\ldots]}\,h^{-1}M_\odot$. This extrapolation is shown by the gray extension of the black solid curve in Fig. 14. The GSMF obtained from the extended GLF is shown by the gray line in Fig. 15. As one can see, the extension of the GLF only slightly increases the GSMF at the lowest mass. The GSMF estimated in this way is compared with that estimated with the conventional V-max method. The gray shaded band shows the expected cosmic variance given by Eq. 10 for the SDSS sample. The effect of CV is quite large at the low-stellar-mass end. The difference between our result and that obtained from the V-max method is even larger, indicating again that the local SDSS region is unusually under-dense.

The GSMF in the low-$z$ Universe has been estimated in numerous earlier investigations using different samples and methods (e.g. Li & White 2009; Yang et al. 2009; Baldry et al. 2012; Bernardi et al. 2013; He et al. 2013; D'Souza et al. 2015). Several of the earlier results are plotted in Fig. 15 for comparison. The result of Li & White (2009), who measured the GSMF of the SDSS sample directly from the stellar masses estimated by Blanton & Roweis (2007) with a Chabrier IMF (Chabrier 2003) and corrections for dust, matches our V-max result well, and also misses the steepening of the GSMF at $M_* < 10^{9.5}\,h^{-1}M_\odot$. Our measurement at $M_* > 10^{[\ldots]}\,h^{-1}M_\odot$ is significantly lower than that from Bernardi et al. (2013), because they included the light in the outer parts of massive galaxies that may be missed in the SDSS NYU-VAGC used here (see also He et al. 2013 for a discussion of this effect). Such corrections do not affect the GSMF at the low-mass end. The overall shape of our GSMF is similar to that of Baldry et al. (2012) obtained from the GAMA sample, but the amplitude of their function is about 50% lower. GAMA has a small sky coverage, 144 deg$^2$, although it is deeper, to $r \approx$ […]$.8$. According to our test with mock samples of a similar sky coverage and depth, the cosmic variance in the GSMF estimated from such a sample can be very large. The lower amplitude given by the GAMA sample may be produced by such cosmic variance.

In conclusion, when CV is carefully taken into account, the low-stellar-mass end slope of the GSMF in the low-$z$ Universe, which is about $-$[…] […] $M_* < 10^{9.5}\,h^{-1}M_\odot$ in the GSMF that is missed in many of the earlier measurements. For reference, we list the GLF and GSMF estimated with our method in Table 1.

5. SUMMARY AND DISCUSSION

In this paper, we use the ELUCID simulation, a constrained $N$-body simulation in the Sloan Digital Sky Survey (SDSS) volume, to study the galaxy distribution in the low-$z$ Universe. Our main results can be summarized as follows:

(i) Dark matter halos are selected from different snapshots of the simulation, and halo merger trees are constructed from the simulated halos down to a halo mass of $\sim$ […] $h^{-1}M_\odot$. A method is developed to extend all the simulated halo merger trees to a mass resolution of $10^{[\ldots]}\,h^{-1}M_\odot$, which is needed to model galaxies down to a stellar mass of $10^{[\ldots]}\,h^{-1}M_\odot$.

(ii) The merger trees are used to populate simulated dark matter halos with galaxies according to an empirical model of galaxy formation developed by Lu et al. (2014b, 2015a). The model galaxies follow the real galaxies in the SDSS volume both in spatial distribution and in intrinsic properties. The catalog of the model galaxies, therefore, provides a unique way to study galaxy formation and evolution in the cosmic web in the low-$z$ Universe.

(iii) Mock catalogs in the SDSS sky coverage are constructed, which can be used to investigate the distribution of galaxies as measured from the real SDSS data and its relation to the global distribution expected from a fair sample of galaxies in the low-$z$ Universe.
These mock catalogs can thus be used to quantify the cosmic variances in the statistical properties of the low-z galaxy population estimated from a survey like SDSS.

(iv) As an example, we use the mock catalogs so constructed to quantify the cosmic variance in the galaxy stellar mass function (GSMF). Useful fitting formulae are obtained to describe the cosmic variance and covariance matrix of the GSMF as functions of stellar mass and sample volume.

(v) We find that the GSMF estimated from the SDSS magnitude-limited sample can be affected significantly by the presence of the under-dense region at z < 0.03, so that the low-mass end of the function can be underestimated significantly.

(vi) We test several existing methods that are designed to deal with the effects of the cosmic variance in the estimate of the GSMF, and find that none of them is able to fully account for the cosmic variance effects.

(vii) We propose and test a method based on the conditional stellar mass functions in dark matter halos, which is found to provide an unbiased estimate of the global GSMF.

(viii) We apply the method to the SDSS data and find that the GSMF has a significant upturn at M∗ < 10^9.5 h^−1 M⊙, which is missed in many earlier measurements of the local GSMF.

Our results for the GSMF have important implications for galaxy formation and evolution. The presence of an upturn in the GSMF at M∗ < 10^9.5 h^−1 M⊙ suggests that there is a characteristic mass scale, ∼ . h − M ⊙, corresponding to a halo mass of ∼ h − M ⊙ (e.g. Lim et al. 2017a), below which star formation may be affected by processes that are different from those in galaxies of higher masses. The stellar mass function of galaxies at low z has been widely used to calibrate numerical simulations and semi-analytic models of galaxy formation. The improved estimate of the GSMF presented here will provide more accurate constraints on theoretical models.

The mock catalogs constructed here have other applications. For example, they can be used to analyze the cosmic variance in the measurements of other statistical properties of the galaxy population, such as the correlation functions (Zehavi et al. 2005; Wang et al. 2007) and peculiar velocities (e.g. Jing et al. 1998; Loveday et al. 2018) of galaxies of different luminosities/masses. Because of the presence of local large-scale structures, such as the under-dense region at z < 0.03, the measurements for faint galaxies can be affected. A comparison between the results obtained from the mock sample and those from the benchmark sample can then be used to quantify the effects of cosmic variance. Another application is to HI samples of galaxies. Current HI surveys, such as HIPASS (Meyer et al. 2004) and ALFALFA (Giovanelli et al. 2005), are shallow, typically to z ∼ 0.05, and so the HI-mass functions and correlation functions estimated from these surveys can be affected significantly by the cosmic variance in the nearby Universe (e.g. Guo et al. 2017). The same method as described here can be used to construct mock catalogs for HI galaxies, and to quantify cosmic variances in these measurements. We will come back to some of these problems in the future.

ACKNOWLEDGEMENTS

This work is supported by the National Key R&D Program of China (grant Nos. 2018YFA0404503, 2018YFA0404502), the National Key Basic Research Program of China (grant Nos. 2015CB857002, 2015CB857004), and the National Science Foundation of China (grant Nos. 11233005, 11621303, 11522324, 11421303, 11503065, 11673015, 11733004, 11320101002). HJM acknowledges the support from NSF AST-1517528.

REFERENCES

Babul A., Rees M. J., 1992, MNRAS, 255, 346
Baldry I. K., et al., 2012, MNRAS, 421, 621
Bernardi M., Meert A., Sheth R. K., Vikram V., Huertas-Company M., Mei S., Shankar F., 2013, MNRAS, 436, 697
Blanton M. R., Roweis S., 2007, AJ, 133, 734
Blanton M. R., et al., 2001, AJ, 121, 2358
Blanton M. R., et al., 2003, ApJ, 592, 819
Boylan-Kolchin M., Ma C.-P., Quataert E., 2008, MNRAS, 383, 936