Stochastic bias in multi-dimensional excursion set approaches



arXiv [astro-ph.CO]. Mon. Not. R. Astron. Soc., 1–9 (0000). Printed 29 December 2017 (MN LaTeX style file v2.2)


Emanuele Castorina⋆ & Ravi K. Sheth
SISSA - International School For Advanced Studies, Via Bonomea, 265 34136 Trieste, Italy
The Abdus Salam International Center for Theoretical Physics, Strada Costiera, 11, Trieste 34151, Italy
Center for Particle Cosmology, University of Pennsylvania, 209 S. 33rd St., Philadelphia, PA 19104, USA

29 December 2017

ABSTRACT

We describe a simple fully analytic model of the excursion set approach associated with two Gaussian random walks: the first walk represents the initial overdensity around a protohalo, and the second is a crude way of allowing for other factors which might influence halo formation. This model is richer than that based on a single walk, because it yields a distribution of heights at first crossing. We provide explicit expressions for the unconditional first crossing distribution which is usually used to model the halo mass function, the progenitor distributions from which merger rates are usually estimated, and the conditional distributions from which correlations with environment are usually estimated. These latter exhibit perhaps the simplest form of what is often called nonlocal bias, and which we prefer to call stochastic bias, since the new bias effects arise from 'hidden-variables' other than density, but these may still be defined locally. We provide explicit expressions for these new bias factors. We also provide formulae for the distribution of heights at first crossing in the unconditional and conditional cases. In contrast to the first crossing distribution, these are exact, even for moving barriers, and for walks with correlated steps. The conditional distributions yield predictions for the distribution of halo concentrations at fixed mass and formation redshift. They also exhibit assembly bias like effects, even when the steps in the walks themselves are uncorrelated. Our formulae show that without prior knowledge of the physical origin of the second walk, the naive estimate of the critical density required for halo formation which is based on the statistics of the first crossing distribution will be larger than that based on the statistical distribution of walk heights at first crossing; both will be biased low compared to the value associated with the physics. Finally, we show how the predictions are modified if we add the requirement that halos form around peaks: these depend on whether the peaks constraint is applied to a combination of the overdensity and the other variable, or to the overdensity alone. Our results demonstrate the power of requiring models to reproduce not just halo counts but the distribution of overdensities at fixed protohalo mass as well.

Key words: large-scale structure of Universe

The excursion set approach, pioneered by Epstein (1983) and developed substantially by Bond et al. (1991), Lacey & Cole (1993), Mo & White (1996) and Sheth (1998), yields important insight into various features of hierarchical clustering. Although recent work has highlighted the limitations of this approach (Paranjape & Sheth, 2012), the limitations are primarily of a quantitative rather than qualitative nature.

The approach combines the statistics of the initial density fluctuation field with the physics of spherical or triaxial collapse, to make predictions for the abundance of virialized objects as a function of time. This means that it provides information about merger rates, the high-redshift progenitors of objects of fixed mass at a later time, the tendency for the mass function in dense regions to be top-heavy, and hence how the spatial clustering of these objects depends on their mass.

In the spherical collapse model, the evolution of an object is determined by its own overdensity. This enters in the excursion set approach as follows. One associates a one-dimensional random walk with each position in space; this walk shows how the initial overdensity depends on the smoothing scale over which the density is averaged. The largest scale on which this walk exceeds the critical density required for spherical collapse contains a mass; this is the excursion set estimate of the mass of the object in which this particular position in space will end up. Therefore, in this approach, the technical problem to be solved is that of the first crossing distribution of a barrier whose height may depend on the number of steps taken by the one-dimensional random walk. The statistics of the initial fluctuation field determines the ensemble of walks over which to average.

In triaxial collapse models, the evolution of an object is determined by more than its initial overdensity (Bond & Myers, 1996; Sheth et al., 2001). In the context of such models, it is natural to ask how these extra parameters enter the excursion set approach. It should come as no surprise that each additional variable simply adds an extra walk (Sheth et al., 2001; Chiueh & Lee, 2001; Sheth & Tormen, 2002), but there is no guarantee that these variables are Gaussian distributed. As a result, the technical problem becomes one of first crossing a multi-dimensional barrier by multi-dimensional walks. However, it has recently been realized that this has nontrivial, qualitatively different, consequences for halo bias: in effect, the correlations between these other parameters and the large scale density field introduce what are known as nonlocal bias effects (Sheth et al., 2012). In this respect, the multi-dimensional excursion set approach is considerably richer than the one-dimensional one.

The main goal of this paper is to illustrate a number of these qualitatively new features of the multi-dimensional excursion set approach. Our goal here is not so much to develop a model which reproduces effects seen in simulations, as to develop insight: therefore, the emphasis is on developing a fully analytic model in which it is easy to see the origin of these new effects. It turns out that this model may not be that unrealistic; this is explored further in Achitouv et al. (2013).

Section 2 describes our model and provides expressions for the usual excursion set approach quantities, as well as for the qualitatively new ones. Section 3 describes a number of extensions, including an explicit calculation of how all the predictions are modified if protohalos are identified with peaks in the initial field. We use this to demonstrate how requiring models to reproduce both halo counts as well as overdensities at fixed halo mass provides sharp constraints. A final section summarizes.

Let $\delta$ and $g$ both denote zero-mean Gaussian variables, with variance $\langle\delta^2\rangle \equiv s$ and $\langle g^2\rangle \equiv \beta^2 s$ respectively. When plotted as a function of $s$, these represent walks associated with the overdensity and the second variable which matters for collapse. We will assume that $\delta$ and $g$ are independent: $\langle\delta g\rangle = 0$.

We will use $f(s)$ to denote the distribution of $s$ when
$$\delta \ge \delta_c(s) + g \tag{1}$$
for the first time. We will also be interested in $p(\delta_\times|s)$, the distribution of walk heights at first crossing. The excursion set ansatz assumes that the quantity $f(s)$ is related to the mass fraction in halos having mass $m(s)$ by
$$f(s)\,{\rm d}s = \frac{m}{\bar\rho}\,\frac{{\rm d}n(m)}{{\rm d}m}\,{\rm d}m, \tag{2}$$
where ${\rm d}n/{\rm d}m$ is the comoving number density of halos of mass $m$, and $\bar\rho$ is the comoving background density.

When the inequality (1) is saturated, it defines a line in the $(\delta, g)$ plane. The clearest way to think of this problem is to change variables to ones which run parallel and perpendicular to this line. Therefore, define
$$g_- = \frac{\delta - g}{\sqrt{1+\beta^2}} \quad{\rm and}\quad g_+ = \frac{\beta\delta + g/\beta}{\sqrt{1+\beta^2}}. \tag{3}$$
Notice that $\langle g_-^2\rangle = \langle g_+^2\rangle = s$, and that these variables are independent:
$$\langle g_+ g_-\rangle = \frac{\beta\langle\delta^2\rangle - \langle g^2\rangle/\beta}{1+\beta^2} = 0. \tag{4}$$
In these variables, $g_-$ steps towards or away from the barrier, which has height $\delta_c(s)/\sqrt{1+\beta^2}$, and $g_+$ steps parallel to it.

For what follows, it is useful to note that
$$\delta = \frac{g_- + \beta g_+}{\sqrt{1+\beta^2}} \quad{\rm and}\quad g = \frac{\beta g_+ - \beta^2 g_-}{\sqrt{1+\beta^2}}. \tag{5}$$

The independence of $g_+$ and $g_-$ means that $f(s)$ depends only on $g_-$. Since $g_-$ is just a one-dimensional Gaussian walk, and it must cross a barrier of height $\delta_c(s)/\sqrt{1+\beta^2}$, the first crossing distribution is that for a moving barrier, for which simple approximations are available (Sheth & Tormen, 2002).

For the special case in which $\delta_c$ does not depend on $s$, the first crossing distribution is
$$s f(s) = \frac{\nu f(\nu)}{2} = \frac{\nu\,\exp(-\nu^2/2)}{\sqrt{2\pi}}, \tag{6}$$
where
$$\nu^2 \equiv \frac{\delta_c^2(0)/s}{1+\beta^2} \equiv \nu_\beta^2. \tag{7}$$
Notice that $\beta = 0$ yields the usual one-dimensional solution.

Notice also that the factor $1+\beta^2$ can be viewed in either of two ways. Either it rescales the barrier height (which is how it appeared in the analysis above) or it rescales the variance $s$. Now, the first crossing distribution $f(s)\,{\rm d}s$ is usually equated with the mass fraction in halos of mass $m$ (equation 2). If $\delta_c$ itself is expected to be related to the physics of halo formation, then the rescaling of $\delta_c$ means that one must also understand the physics which led to $\beta \ne 0$ if one wishes to derive the value of $\delta_c$ from halo abundances. Failure to do so will lead to a misestimate of the value of $\delta_c$ which matters for the physics. If we require $\delta_c \approx$ … […] $\beta \approx$ … .

Define $\delta_\times$ to be the value of $\delta$ when $g_- = \delta_c(s)/\sqrt{1+\beta^2}$. Then
$$\delta_\times \equiv \frac{\delta_c(s)/\sqrt{1+\beta^2} + \beta g_+}{\sqrt{1+\beta^2}} = \frac{\delta_c(s)}{1+\beta^2} + \frac{\beta\,g_+}{\sqrt{1+\beta^2}}. \tag{8}$$
Since $g_+$ is just a Gaussian with zero mean and variance $s$ (recall it is independent of $g_-$), the expression above shows that
$$p(\delta_\times|s) = \frac{\exp[-(\delta_\times - \mu_\times)^2/2\Sigma_\times^2]}{\sqrt{2\pi\Sigma_\times^2}} \tag{9}$$
where
$$\mu_\times = \frac{\delta_c(s)}{1+\beta^2} \quad{\rm and}\quad \Sigma_\times^2 = \frac{\beta^2}{1+\beta^2}\,s. \tag{10}$$
The limit $\beta = 0$ yields a delta-function centered on $\delta_c(s)$ as it should.

If we set $\nu_\times \equiv \delta_\times/\sigma$ where $\sigma^2 \equiv s$, and recall from equation (7) that $\nu_\beta \equiv (\delta_c(0)/\sigma)/\sqrt{1+\beta^2}$, then it is useful to think of the distribution above as $p(\nu_\times|\nu_\beta)$, the conditional distribution of $\nu_\times$ given $\nu_\beta$: in this case, the expression above is the standard expression for the conditional Gaussian distribution with correlation parameter $(1+\beta^2)^{-1/2}$.

Note that equation (8), and hence equation (9), are exact even when $\delta_c$ depends on $s$. In this respect, the distribution of $\delta_\times$ at first crossing is much simpler than is the first crossing distribution itself: it always has a Gaussian shape, with the barrier only affecting the mean value of this Gaussian.

It is also worth noting that $\langle\delta_\times|s\rangle = \mu_\times$ is guaranteed to be less than $\delta_c$.
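The Gaussian form of $p(\delta_\times|s)$ in equations (9) and (10) can be checked with a small Monte Carlo. The sketch below is only an illustration, not the code used for the paper's tests: the function names, the step size `ds`, the number of walks and the seed are all hypothetical choices, and a constant barrier with $\delta_c = 1$ and $\beta = 1$ is assumed. Because the walks take discrete steps, they slightly overshoot the barrier at first crossing, so the standardized heights are expected to have mean near (not exactly) zero and variance near one.

```python
import math
import random


def simulate_first_crossings(delta_c=1.0, beta=1.0, ds=0.01, s_max=4.0,
                             n_walks=2000, seed=12345):
    """Monte Carlo of two independent Gaussian walks with uncorrelated steps.

    Returns a list of (s, delta_x) pairs: the scale s at which
    delta >= delta_c + g is first satisfied (inequality 1), and the
    height of the delta walk on that scale.
    """
    rng = random.Random(seed)
    n_steps = int(round(s_max / ds))
    sd_delta = math.sqrt(ds)         # so that <delta^2> = s
    sd_g = beta * math.sqrt(ds)      # so that <g^2> = beta^2 s
    crossings = []
    for _ in range(n_walks):
        delta = 0.0
        g = 0.0
        for step in range(1, n_steps + 1):
            delta += rng.gauss(0.0, sd_delta)
            g += rng.gauss(0.0, sd_g)
            if delta >= delta_c + g:
                crossings.append((step * ds, delta))
                break
    return crossings


def standardized_heights(crossings, delta_c=1.0, beta=1.0):
    """Subtract the mean and divide by the rms of equations (9)-(10)."""
    b2 = beta * beta
    mu = delta_c / (1.0 + b2)
    return [(dx - mu) / math.sqrt(b2 * s / (1.0 + b2)) for s, dx in crossings]


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


if __name__ == "__main__":
    z = standardized_heights(simulate_first_crossings())
    print(len(z), mean_and_variance(z))
```

Standardizing each $\delta_\times$ by the predicted mean and rms for its own crossing scale lets walks that cross on different scales be pooled into a single test of the Gaussian prediction.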
Since the mean $\mu_\times$ lies below $\delta_c$, without prior knowledge of the value of $\beta$, the statistical distribution of $\delta_\times$ will lead to a misestimate of the value of $\delta_c$ which is associated with the physics. In this context, it is useful to think in terms of the distribution of differences from $\delta_c$. If we define $\Delta_{\times-c} \equiv \delta_\times - \delta_c(s)$, then it is Gaussian distributed with mean $-\delta_c(s)\,\beta^2/(1+\beta^2)$ and variance $s\,\beta^2/(1+\beta^2)$. I.e., the mean is $-\delta_c(s)$ times the same factor by which $s$ is rescaled. This provides a simple operational way of determining the value of $\beta$ from a measurement of $p(\Delta_{\times-c}|s)$.

Similarly, define $g_\times$ to be the value of $g$ at first crossing. Then, because $g_\times \equiv \delta_\times - \delta_c$, it has the same distribution as $\delta_\times$, but with a shifted mean. Specifically, $p(g_\times)$ will be Gaussian with mean $-\delta_c(s)\,\beta^2/(1+\beta^2)$ and variance $s\,\beta^2/(1+\beta^2)$.

Symmetry means that the distribution of $S_1$ at which
$$\delta \ge \delta_{c1} + g \tag{11}$$
for the first time, given that inequality (1) was first satisfied on scale $S_0 < S_1$, is given by equation (6) but with $\nu$ replaced by
$$\nu_{10}^2 = \frac{(\delta_{c1} - \delta_{c0})^2}{(S_1 - S_0)(1+\beta^2)}, \tag{12}$$
where the subscripts 1 and 0 label the barrier heights and scales of the progenitor and of the final object, respectively. The limit $\beta = 0$ yields the usual expression for progenitor distributions associated with one-dimensional walks (e.g. Lacey & Cole, 1993).

Halo formation is often identified with the time when at least half the total mass has been assembled in pieces that are each more than $\mu$ times the final mass. For $\mu > 1/2$, there can be only one such piece so the formation time distribution is given by
$$p(\delta_{cf} \ge \delta_{c1}|M, \delta_{c0}) = \int_{\mu M}^{M} {\rm d}m\,\frac{M}{m}\,f(m, \delta_{c1}|M, \delta_{c0}) \tag{13}$$
(Lacey & Cole, 1993). For white noise initial conditions ($s \propto m^{-1}$) and $\mu = 1/2$,
$$p(\omega_f) = 2\,\omega_f\,{\rm erfc}(\omega_f/\sqrt{2}) \tag{14}$$
where $\omega_f = \nu_f$ with $\nu_f$ given by equation (12). Because $\omega_f$ includes a factor involving $1+\beta^2$, the mean formation redshift will be scaled to higher values than when $\beta = 0$. This sort of rescaling yields better agreement with measurements in simulations (Giocoli et al., 2007; Moreno, Giocoli & Sheth, 2008). See Sheth (2011) for the case $\mu < 1/2$.

Similarly, the distribution of $s$ at which inequality (1) is first satisfied, given that $\delta$ has height $\Delta$ on some scale $S < s$, but $G$ (the value of $g$ on scale $S$) is unconstrained (except by the requirement that $\Delta - G < \delta_c$), is also given by equation (6) but with $\nu$ replaced by
$$\nu_{\Delta}^2 = \frac{(\delta_c - \Delta)^2}{s(1+\beta^2) - S}. \tag{15}$$
(The Appendix provides a short derivation.) This can be thought of as subtracting from the variance $s(1+\beta^2)$ the piece which comes from constraining $\delta = \Delta$ on scale $S$, which makes its correspondence to the one-dimensional expression (the $\beta = 0$ limit of this expression) obvious.

Because equation (12) is different from (15), when expressed as a function of $s$ rather than $\nu$, the conditional distribution is different from the progenitor one, whereas they are the same for one-dimensional walks. The difference between the two is largest in the $s \to S$ limit, where the conditional distribution predicts more objects than does the progenitor distribution. Figure 1 illustrates. […] self-similar distribution. Thus, a discrepancy between the progenitor and environmental dependences of clustering provides a simple way to see if stochasticity has played a role in determining halo abundances.

Things are slightly more complicated if $\delta_c$ depends on $s$, of course, but the basic fact that progenitor and conditional distributions with $\delta_{c1} - \delta_{c0} = \delta_c - \Delta$ will no longer be the same is generic.

Figure 1. First crossing distributions for our two-dimensional walks with $\beta = 0.$… […] $S = 0.$… $\delta_c$ (histogram); smooth curve shows our prediction (equation 15 in equation 6). The effective cosmology of this environment has critical density $\delta_c$ …; dashed curve shows the progenitor distribution with this same effective cosmology (equation 12 in equation 6). For one-dimensional walks, the solid curve would be the same as the dashed one.

The distribution of $s$ at which inequality (1) is first satisfied, given that the walk was at $(\Delta, G)$ on scale $S < s$, is also given by equation (6) but with $\nu$ replaced by
$$\nu_{\Delta G}^2 = \frac{(\delta_c - \Delta + G)^2}{(s - S)(1+\beta^2)}. \tag{16}$$
This follows from the fact that the distance from a point $(x_0, y_0)$ to the line $ax + by + c = 0$ is $|ax_0 + by_0 + c|/\sqrt{a^2 + b^2}$. Alternatively, one can view this as the same shift of origin to the $g_-$ walk that is made in the one-dimensional case (e.g. Lacey & Cole, 1993).
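The difference between the progenitor and conditional distributions can be made concrete by evaluating equation (6) with $\nu$ taken from equations (12) and (15) for a matched barrier difference $d \equiv \delta_{c1} - \delta_{c0} = \delta_c - \Delta$. The sketch below is a hedged illustration with hypothetical function names; it assumes a constant barrier, and it assumes that replacing $\nu$ in equation (6) also means replacing the prefactor $s$ by the corresponding effective variance, so that both results are densities per unit $s$.

```python
import math


def first_crossing_density(barrier, variance):
    """Constant-barrier first crossing density per unit variance.

    This is equation (6) written as
    variance * f = nu * exp(-nu^2 / 2) / sqrt(2 pi),
    with nu = barrier / sqrt(variance).
    """
    if variance <= 0.0:
        return 0.0
    nu = barrier / math.sqrt(variance)
    return nu * math.exp(-0.5 * nu * nu) / (math.sqrt(2.0 * math.pi) * variance)


def progenitor_density(s, S, d, beta):
    """Equation (12) in equation (6): nu^2 = d^2 / ((s - S)(1 + beta^2))."""
    norm = 1.0 + beta * beta
    return first_crossing_density(d / math.sqrt(norm), s - S)


def conditional_density(s, S, d, beta):
    """Equation (15) in equation (6): nu^2 = d^2 / (s (1 + beta^2) - S)."""
    norm = 1.0 + beta * beta
    return first_crossing_density(d / math.sqrt(norm), (s * norm - S) / norm)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from Figure 1.
    S, d, beta = 0.5, 1.0, 0.5
    for s in (0.55, 0.75, 1.0, 2.0):
        print(s, progenitor_density(s, S, d, beta),
              conditional_density(s, S, d, beta))
```

Setting `beta = 0.0` makes the two functions agree, which is the one-dimensional limit; for `beta > 0` the conditional density stays finite as `s` approaches `S` while the progenitor density is exponentially suppressed.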
Equation (16) shows that $G$ can affect halo abundances in qualitatively the same way that $\Delta$ can.

In more detail, the halo overdensity is defined by the ratio of the conditional expression to the unconditional one (Mo & White, 1996). In our case, this means that
$$1 + \delta_h(\nu|\Delta, G) = \frac{\nu_{\Delta G}\,f(\nu_{\Delta G})}{\nu f(\nu)}. \tag{17}$$
The peak-background split bias factors are the coefficients in the Taylor series expansion of the expression above, in the limit where $s \gg S$. If we write these as
$$1 + \delta_h \equiv \sum_{i,j} B_{ij}\,\frac{\Delta^i}{i!}\,\frac{G^j}{j!}, \tag{18}$$
then the dependence on $G$ gives rise to what is known as nonlocal bias. Since $G$ may also be determined by local quantities, this is, in general, a misnomer. Since it is really an effect which arises from the dependence of halo counts on the 'hidden' stochastic variable $G$, we think it is more accurate to call this 'stochastic' bias, which may or may not be local.

Recently, Musso et al. (2012) have shown that cross-correlating the halo overdensity field with the $n$th-order Hermite polynomial $H_n(\Delta/\langle\Delta^2\rangle^{1/2})$ is an efficient way of reconstructing the $b_n$ coefficients even when $\langle\Delta^2\rangle^{1/2}$ is not small. In our case, cross-correlating with $H_i(\Delta/\langle\Delta^2\rangle^{1/2})\,H_j(G/\langle G^2\rangle^{1/2})$ yields
$$\delta_c^{i+j} B_{ij} = (-1)^j\,\nu^{i+j-1} H_{i+j+1}(\nu), \tag{19}$$
where $\nu^2 = (\delta_c^2/s)/(1+\beta^2)$. This reduces to the usual expression (Mo & White, 1996; Musso et al., 2012) when $j = 0$:
$$\delta_c^k B_{k0} \equiv \delta_c^k b_k = \nu^{k-1} H_{k+1}(\nu). \tag{20}$$
Since the dependence of equation (16) on $G$ is the same as that on $\Delta$ apart from the sign, cross-correlating with $H_n(G/\langle G^2\rangle^{1/2})$ alone yields
$$B_{0k} \equiv c_k = (-1)^k b_k. \tag{21}$$
In this respect, the stochastic (possibly nonlocal) bias model here is simpler than that in Sheth et al. (2012), where the analogue of $G$ was not Gaussian distributed (so the associated orthogonal polynomials were more complicated).

Assembly bias is the correlation between properties of protohaloes of fixed mass and their environment, such as those first identified by Sheth & Tormen (2004), and studied since by many others.
While it is generally believed that this effect should be absent in excursion set models with uncorrelated steps (White, 1996), we now show that our two-dimensional model does exhibit assembly bias, even though the steps in the walks are uncorrelated. However, we caution that we are not claiming that this model explains assembly bias; simply that assembly bias is part and parcel of the multi-dimensional excursion set approach, even for walks with uncorrelated steps.

The distribution of walk heights at first crossing, given that $\delta = \Delta$ on scale $S$, is
$$p(\delta_\times|s, \Delta, S) = \frac{\exp[-(\delta_\times - \Delta - \mu_\Delta)^2/2\Sigma_\Delta^2]}{\sqrt{2\pi\Sigma_\Delta^2}} \tag{22}$$
where
$$\mu_\Delta = \frac{\delta_c(s) - \Delta}{1+\beta^2} \quad{\rm and}\quad \Sigma_\Delta^2 = (s - S)\,\frac{\beta^2}{1+\beta^2}. \tag{23}$$
This is the conditional analogue of equation (9).

This shows that the variance is smaller than it is for unconditioned walks, but that the difference is negligible when $s \gg S$. The mean is more interesting:
$$\langle\delta_\times|s, \Delta, S\rangle = \Delta + \mu_\Delta = \frac{\delta_c(s) + \beta^2\Delta}{1+\beta^2} \tag{24}$$
is shifted by $\Delta\,\beta^2/(1+\beta^2)$ compared to the unconditional mean. Even more suggestively, this implies that $\langle\Delta_{\times-c}|s, \Delta, S\rangle = [\Delta - \delta_c(s)]\,\beta^2/(1+\beta^2)$. The dependence of this mean on the larger scale $\Delta$ is this model's expression of assembly bias, and is an important way in which the two-walk problem differs from the one-walk problem. When $\beta = 0$ the distribution becomes a delta-function centered on $\delta_c$; since it is therefore independent of $\Delta$, this shows explicitly that the one-dimensional solution shows no assembly bias when the steps in the walk are uncorrelated.

Figure 2. Dependence of walk height at first crossing, $\delta_\times$, on large scale environment. Symbols with error bars show the distribution of $\delta_\times$ for walks which first cross each other on scale $s$, and which had height $\Delta$ on scale $S < s$; smooth dashed curves show equation (22). Black histogram shows the corresponding unconditional distribution for the same value of $s$; smooth solid curve shows the corresponding prediction (equation 9).

Figure 2 illustrates the effect: objects which are surrounded by large scale overdensities tend to have larger $\delta_\times$ than objects of the same mass in large-scale underdensities.
Since they have above average initial overdensities on scale $s$, they will also tend to have above average overdensities at formation (typically, on scale $\sim s$). The result is a correlation, at fixed halo mass, between the density at formation and environment, even though there will not be a correlation between formation time (rather than the overdensity at the formation time) and environment. (In this model, as for the one-dimensional case, any correlation between formation time and larger scale environment can only come from correlations between steps.) Since the density at formation is correlated with halo concentration at virialization (Navarro et al., 1997), our model predicts a correlation between halo concentration and environment at fixed mass.

Our change of variables from $(\delta, g)$ to walks which step parallel and perpendicular to the barrier makes it straightforward to see what should happen when both $\delta$ and $g$ are walks with correlated steps. If the correlations are the result of smoothing with the same filter, then the unconditional distribution $f(s)$ should be replaced with the corresponding expression in Musso & Sheth (2012) (see discussion following equation 32), but the distribution of the walk height at first crossing, $p(\delta_\times|s)$, remains unchanged. This is because, at first crossing, $\delta_\times$ depends only on $g_+$ (by definition), and $g_+$, although it has correlated steps, is independent of $g_-$, so it is not constrained by the fact that $g_- = \delta_c/(1+\beta^2)^{1/2}$.

Following Paranjape et al. (2012) the progenitor distribution should be well-approximated by replacing $\delta_{c1} - \delta_{c0} \to \delta_{c1} - (S_\times/S)\,\delta_{c0}$ and $(s - S)(1+\beta^2) \to [s - (S_\times/S)^2 S](1+\beta^2)$, and the conditional distribution by replacing $\delta_c - \Delta \to \delta_c - (S_\times/S)\,\Delta$ and $s(1+\beta^2) - S \to s(1+\beta^2) - (S_\times/S)^2 S$. Still more accurate expressions follow from making the corresponding replacements in the expressions provided in Musso et al. (2012). Testing these expressions is the subject of work in progress.
Suppose instead that steps in $\delta$ are uncorrelated, whereas steps in $g$ are correlated with those in $\delta$. This may happen, for example, if the critical density for collapse depends on the overdensity on a larger scale, e.g. in the correlated galaxy formation model of Bower et al. (1993) or in theories of modified gravity (Lam & Li, 2012). Then, let
$$\rho \equiv \frac{\langle\delta g\rangle}{\langle\delta^2\rangle^{1/2}\,\langle g^2\rangle^{1/2}}, \tag{25}$$

[…]

This can be traced back to the fact that, when $\beta = 1$, then $\langle g_+ g_-\rangle = 0$; i.e., the walks in $g_-$ and $g_+$ are independent (even though $\delta$ and $g$ are correlated), but they have different variances.

But in general, correlations between the walks lead to a shift in the mean and a rescaling of the variance. However, they do not change the fact that $p(\delta_\times|\sigma)$ is Gaussian. In practice, one should be able to determine whether $\rho \ne 0$ because the three unknowns, $\delta_c$, $\beta$ and $\rho$, can be determined from our expressions for the mean and variance of $p(\delta_\times|\sigma)$ and the required rescaling of $s$ in the first crossing distribution $f(s)$.

Our fundamental assumption, that equation (1) accurately captures the physics of collapse, is, of course, only an idealization. Note, however, that if other variables also mattered, and they were also Gaussian distributed, such that equation (1) becomes
$$\delta \ge \delta_c(s) + \sum_{i=1}^{n-1} g_i, \tag{30}$$
then, because the sum of Gaussians is itself Gaussian, this $n$-dimensional model reduces to the 2-dimensional one we have just solved, with $\beta^2 = \sum_{i=1}^{n-1}\beta_i^2$.

Alternatively, suppose instead that
$$\delta \ge \delta_c + \chi, \tag{31}$$
where $\chi$ follows a non-Gaussian distribution. E.g., Sheth et al. (2012) study a model in which $\delta_c$ is independent of $s$, but $\chi$ is drawn from a chi-squared distribution with five degrees of freedom. However, this distribution has a mean which depends on $s$. If the distribution of $\Delta\chi \equiv \chi - \langle\chi\rangle$ is not too different from a Gaussian, then we can use our 2-dimensional Gaussian model as a reasonable approximation to this one, with $\delta_c(s)$ in equation (1) equal to $\delta_c + \langle\chi\rangle$ and $g$ a zero-mean Gaussian variate having the same variance as $\Delta\chi$. E.g., for the model in Sheth et al. (2012), $\langle\chi\rangle \approx$ … $\sqrt{s}$ and $\langle(\Delta\chi)^2\rangle \approx$ … $s$. I.e., this model should be reasonably well approximated by our two-Gaussian model with $\delta_c(s) = \delta_c + 0.$… $\sqrt{s}$ and $\beta^2 = 0.$… .
The distribution of $\chi_\times$, the value of $\chi$ at first crossing, will be like that of $g_\times$, meaning that it should have mean and variance approximately given by $-\delta_c(s)\,\beta^2/(1+\beta^2)$ and $s\,\beta^2/(1+\beta^2)$. Since the variance of the initial variate $\chi$ was $\beta^2 s$, one should think of $\chi_\times$ as having variance reduced by $(1+\beta^2)^{-1}$. For it to still have approximately the same functional form as $\chi$ itself, it should have mean $0.$… $\sqrt{s/(1+\beta^2)}$, which is smaller than the original value of $0.$… $\sqrt{s}$. For $\beta \ll 1$, we can think of this as a shift in the mean by $-$… $\sqrt{s}\,\beta^2/2$. The actual shift, $-\delta_c(s)\,\beta^2/(1+\beta^2)$, has the same sign, but a different amplitude, indicating that the distribution of $\chi_\times$ will not be quite the same as that of $\chi$ itself.

We end this discussion with a word of caution: although mapping to an effective Gaussian is useful, it may hide interesting physics. For example, the non-Gaussian stochasticity in Sheth et al. (2012) results in a quadrupolar signature for Lagrangian space halo bias; using an effective Gaussian obscures the origin of this angular dependence.

For walks associated with peaks in $\delta - g$, one must simply add a weight which depends on ${\rm d}(\delta - g)/{\rm d}s$ (Musso & Sheth, 2012). The associated first crossing distribution becomes that for excursion set peaks (Paranjape & Sheth, 2012), provided we remember to rescale $\delta_c(s) \to \delta_c(s)/\sqrt{1+\beta^2}$, because the peaks are in $g_-$ rather than in $\delta$. Namely,
$$s f(s) = \frac{\exp(-\nu_\beta^2/2)}{2\gamma\sqrt{2\pi}} \int {\rm d}x\, x\, p(x|\gamma, \nu_\beta)\,\frac{F(x)}{(R_*/R)^3} \tag{32}$$
where $\nu_\beta \equiv (\delta_c/\sigma)/\sqrt{1+\beta^2}$ as before (c.f. equation 7), the parameters $\gamma$ and $R_*$ are defined by equation (4.6a) in Bardeen et al. (1986),
$$p(x|\gamma, \nu_\beta) = \frac{\exp[-(x - \gamma\nu_\beta)^2/2(1-\gamma^2)]}{\sqrt{2\pi(1-\gamma^2)}} \tag{33}$$
is the usual conditional Gaussian (i.e. $\gamma$ is the correlation coefficient between $x$ and $\nu_\beta$), and $F(x)$ is given by equation (A15) of Bardeen et al. (1986). (The Musso-Sheth approximation for the first crossing distribution for all walks with correlated steps has $F(x) = 1$ and $R = R_*$.)

Figure 3. First crossing distribution for all walks when steps are uncorrelated (solid); when steps are correlated because of Gaussian smoothing and the power-spectrum is $P(k) \propto k^{-\ldots}$; when the walks are centered on peaks in $\delta - g$ (equation 32) and on peaks in $\delta$ only (equation 34).

The distribution of $\delta_\times$ is then unchanged from that for all walks (equation 9), because a constraint on the 'velocity' of $g_-$, which is what the peaks constraint boils down to (Musso & Sheth, 2012), means nothing for $g_+$, which is what determines $\delta_\times$. The statistics of walks centered on a randomly chosen particle within a protohalo are known to be different from those centered on the protohalo center of mass; the latter yield larger values of $\delta_\times$ (Sheth et al., 2001; Achitouv et al., 2013; Despali et al., 2013). Therefore, the analysis above indicates that a model which identifies protohalo centers of mass with peaks in $\delta - g$ cannot explain this difference.

If we identify protohalo centers of mass on scale $s$ with positions where $\delta$ first exceeds $\delta_c + g$ and which are peaks in $\delta$ (rather than in $\delta - g$) on that scale, then the first crossing distribution becomes
$$s f(s) = \frac{\exp(-\nu_\beta^2/2)}{2\gamma\,(R_*/R)^3\sqrt{2\pi}} \int {\rm d}x\, x\, p(x|\gamma, \nu_\beta)\, G_0(x, \gamma_\beta x) \tag{34}$$
where we have defined $\gamma_\beta^2 \equiv (1+\beta^2)^{-1}$; $\nu_\beta$, $\gamma$, $R_*$ and $p(x|\gamma, \nu_\beta)$ were defined above, and
$$G_n(x, \gamma_\beta x) \equiv \int {\rm d}y\, p(y|\gamma_\beta, x)\, F(y)\, y^n \tag{35}$$
with $F(y)$ the same quantity that appears in equation (32), i.e., given by equation (A15) of Bardeen et al. (1986). Similarly, a little algebra shows that in this case the distribution of $\delta_\times$ is given by
$$p_{\rm pk}(\delta_\times|s) = A_\times\, p(\delta_\times|s)\,[G_1 - \gamma(\nu_\times - \nu_c)\,G_0], \tag{36}$$
where $p(\delta_\times|s)$ is the distribution for all walks (equation 9), $G_n(\nu_\times, \gamma\nu_\times)$ is given by equation (35), and $A_\times = [\sqrt{1+\beta^2} \int {\rm d}x\, x\, p(x|\gamma, \nu_\beta)\, G_0(x, \gamma_\beta x)]^{-1}$ is a normalization factor which ensures that the integral over all $\delta_\times$ yields unity.

Figure 4. Distribution of walk height at first crossing, $\delta_\times$, for all walks (dotted) and for walks which are also peaks in $\delta$ on scale $s$ (solid); i.e., equations (9) and (36) respectively. The distribution for peaks in $\delta - g$ is also given by the dotted curve.

In the limit $\beta \to 0$, the distribution $p(y|\gamma_\beta, x)$ becomes sharply peaked around its mean value $\gamma_\beta x \to x$, so that $G_0(x, \gamma_\beta x) \to F(x)$. Thus, in this limit, equation (34) reduces to equation (32). Similarly, $p(\delta_\times|s)$ becomes a delta function centered on $\delta_c/(1+\beta^2)$, making $p_{\rm pk}(\delta_\times|s) \to A_\times\, p(\delta_\times|s)\, G_1$. Since $A_\times \to G_1^{-1}$ in this limit, $p_{\rm pk}(\delta_\times|s) \to p(\delta_\times|s)$ as it should.

In general, at large $\nu_\times$, $G_1/G_0 \to \gamma\nu_\times$ making $p_{\rm pk}(\delta_\times|s) \propto p(\delta_\times|s)\, G_0(\nu_\times, \gamma\nu_\times)$; this illustrates that the term in square brackets acts to skew the distribution towards larger $\delta_\times$. Figures 3 and 4 show this explicitly: they compare $s f(s)$ and $p(\delta_\times|s)$ for these two peak models with that for all walks. In practice, we use equations (4.4) and (6.13) of Bardeen et al. (1986) to approximate $G_0$ and $G_1/G_0$, and we assumed Gaussian smoothing of a scale-free power spectrum, i.e. $P(k) \propto k^n$, for which $\gamma^2 = (n+3)/(n+5)$ and $(R_*/R)^2 = 6/(n+5)$. To make the Figures, we set $n = -$… and $\beta = 1$ to highlight the effects of $\beta$.

Figure 3 shows that peaks in $\delta$ and $\delta - g$ do indeed produce different counts (short and long dashed curves, respectively); both are different from the result for all walks (dotted). And Figure 4 shows that the distribution given in equation (36) is indeed shifted to larger values of $\delta_\times$, with the shift depending weakly on the mass scale $\nu_c$. This increase in $\delta_\times$ is qualitatively in the right direction, suggesting that identifying protohalos with peaks in $\delta$ is a better model than one where protohalos are identified with peaks in $\delta - g$.
However, the predicted distribution for peaks differs less from that for all walks than centre-of-mass walks differ from randomly chosen ones in simulations (the shift in the mean is not large enough, the width is not narrow enough, and the shape is not skewed enough).

Before moving on, we note that, in the one-dimensional problem, the peaks motivated approach is attractive because it provides a natural reason why halo counts in simulations do not fall as steeply as $\exp(-\delta_c^2/2s)$ at small $s$. The two-Gaussian model here achieves this by setting $\beta \approx$ . $\beta$ to reproduce the halo counts. Then reproducing the distributions of $p(\delta_\times|s)$ and $p_{\rm pk}(\delta_\times|s)$ provides important self-consistency tests. Since reducing $\beta$ from the value used to make Figure 4 will only make all the curves there more similar to one another, this will exacerbate the discrepancies between model and simulations. Thus, our analysis suggests that neither of the peaks models we have considered here is consistent with measurements.

We described a two-dimensional excursion set model, for Gaussian walks in $\delta$ and $g$, for which almost all quantities associated with first crossing distributions can be computed analytically. We have tested all the analytic expressions we provide in this paper using Monte-Carlo realizations of the two-dimensional stochastic process, finding excellent agreement. Since the analytic arguments are sufficiently simple, we have only included a few plots showing this agreement.

Our predictions include the unconditional first crossing distribution $f(s|z)$ (Section 2.2); the conditional first crossing distribution for redshift $z$, $f(s,z|S,Z)$, by walks which are known to have first crossed one another on scale $S < s$ at redshift $Z < z$ (Section 2.5); and the conditional distribution $f(s,z|S,\Delta)$ for walks which are constrained to have height $\Delta$ on scale $S < s$ (Section 2.6). These are usually used to model halo abundances, progenitor distributions, and the environmental dependence of clustering. In the one-dimensional case, for appropriately chosen pairs of redshift and environment, the progenitor and conditional distributions are the same. For higher-dimensional walks, this is no longer the case: the conditional distributions generically predict more massive objects (Figure 1 and related discussion).

Another new feature of such higher-dimensional models is the fact that there is, generically, a distribution of walk heights at first crossing $p(\delta_\times|s)$ (Section 2.3), and an associated distribution of the other variable $p(g_\times|s)$ (Section 2.4). For the Gaussian walks considered here, these distributions are Gaussian, even when the barrier height depends on the first crossing scale $s$. We argued that $s$-dependence of the mean barrier height, with a Gaussian scatter around the mean, should provide a good approximation even when the walks are not Gaussian (Section 3.3).

We also argued that, because of the variable(s) which are not $\delta$, halo bias in these models will generally be stochastic (sometimes referred to as nonlocal), and the conditional distributions will generically exhibit assembly bias, even when the steps in the walks are uncorrelated. We provided explicit expressions for both the stochastic (Section 2.7) and the assembly bias (Section 2.8 and Figure 2). Although our model predicts no correlation between halo formation times and environment (at fixed halo mass), in agreement with the one-dimensional case, it nevertheless predicts that halos surrounded by overdensities should be denser and more concentrated than halos of the same mass in underdensities.

The lack of correlation between time and environment is a consequence of studying walks with uncorrelated steps.
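Walks with uncorrelated steps are also the easiest case to check by Monte Carlo. The Python sketch below is a minimal illustration, not the code used for the tests reported in this paper. It assumes the crossing condition $\delta \ge \delta_c + \beta g$, with both walks gaining variance `ds` per step; this normalization, the function name `first_crossings`, and all parameter values are assumptions made for the example. Under these assumptions $\delta - \beta g$ is a one-dimensional walk of variance $(1+\beta^2)s$, so the fraction of walks that have crossed by $s_{\rm max}$ should be close to ${\rm erfc}[\delta_c/\sqrt{2(1+\beta^2)s_{\rm max}}]$, and the heights at first crossing should scatter around $\delta_c/(1+\beta^2)$. The finite step size makes walks overshoot the barrier slightly, so agreement is only approximate.

```python
import math
import random


def first_crossings(delta_c=1.686, beta=1.0, ds=0.02, s_max=10.0,
                    n_walks=4000, seed=1):
    """Return a list of (s, delta) at first crossing of delta >= delta_c + beta*g.

    Both walks start at zero and take independent Gaussian steps of
    variance ds (uncorrelated steps). Walks that have not crossed by
    s_max are dropped. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    step = math.sqrt(ds)
    n_steps = int(round(s_max / ds))
    crossings = []
    for _ in range(n_walks):
        delta = 0.0
        g = 0.0
        for i in range(1, n_steps + 1):
            delta += rng.gauss(0.0, step)
            g += rng.gauss(0.0, step)
            if delta >= delta_c + beta * g:
                crossings.append((i * ds, delta))
                break
    return crossings


if __name__ == "__main__":
    n_walks, delta_c, beta, s_max = 4000, 1.686, 1.0, 10.0
    res = first_crossings(delta_c, beta, 0.02, s_max, n_walks)
    frac = len(res) / n_walks
    mean_delta = sum(d for _, d in res) / len(res)
    expected_frac = math.erfc(delta_c / math.sqrt(2 * (1 + beta**2) * s_max))
    print("fraction crossed:", frac, "continuum estimate:", expected_frac)
    print("mean delta at crossing:", mean_delta,
          "continuum estimate:", delta_c / (1 + beta**2))
```

Reducing `ds` shrinks the overshoot at the cost of run time; a binned histogram of the returned `s` values could be compared with the analytic first crossing distribution in the same way.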
We sketched how to generalize our results to include correlations between the steps in each walk (necessary for quantitative comparison with simulations; Section 3.1), and between the walks themselves (as might arise in models where the critical density required for collapse is determined by the overdensity on large scales; Section 3.2). These will introduce additional assembly bias effects, for the same reasons they do so for one-dimensional walks. Although we sketched how to quantify these here, we did not show plots or otherwise quantify these effects for the following reason.

One of the drawbacks of this model, which it shares with the usual one-dimensional walk approach, is that it is explicitly about the statistics of all points in space. However, halos form around special positions in space, and the statistics of this point process (arguably the point process for which the description of the physics is simplest) is very different from that around randomly chosen positions (Sheth et al., 2001; Paranjape & Sheth, 2012; Achitouv et al., 2013). We argued that the simplest case, in which halos form around positions which are peaks in $\delta - g$, cannot explain this difference (Section 3.4). Although a model in which halos form around peaks in $\delta$ fares better (Figures 3 and 4), it fails to adequately model the differences between walks centred on all particles, and those centred on the special subset which are protohalo centers of mass. Work in progress shows how to extend this approach to include a more elaborate model for protohalo centers-of-mass, but we believe our results demonstrate the power of requiring models to reproduce not just halo counts but the distribution of $\delta$ at fixed halo mass as well.

ACKNOWLEDGEMENTS

This work is supported in part by NSF-0908241 and NASA NNX11A125G. RKS thanks the LUTH group at Meudon Observatory for hospitality during the summer of 2012, I. Achitouv for discussions about the conditional crossing distribution, and A. Paranjape for discussions about excursion set peaks.

References

Achitouv I., Rasera Y., Corasaniti P., Sheth R. K., PRL, submitted (arXiv:1212.1166)
Bardeen J. M., Bond J. R., Kaiser N., Szalay A. S., 1986, ApJ, 304, 15
Bond J. R., Myers S. M., 1996, ApJS, 103, 1
Bond J. R., Cole S., Efstathiou G., Kaiser N., 1991, ApJ, 379, 440
Bower R. G., Coles P., Frenk C. S., White S. D. M., 1993, ApJ, 405, 403
Chiueh T., Lee J., ApJ, 555, 33
Despali G., Tormen G., Sheth R. K., 2013, MNRAS, accepted (arXiv:1212.4157)
Epstein R. I., 1983, MNRAS, 205, 207
Giocoli C., Moreno J., Sheth R. K., Tormen G., 2007, MNRAS, 376, 977
Lacey C., Cole S., 1993, MNRAS, 262, 627
Lam T. Y., Li B., 2012, MNRAS, 426, 3260
Mo H. J., White S. D. M., 1996, MNRAS, 282, 347
Moreno J., Giocoli C., Sheth R. K., 2008, MNRAS, 391, 1729
Musso M., Paranjape A., Sheth R. K., 2012, MNRAS, 427, 3145
Musso M., Sheth R. K., 2012, MNRAS, 423, 102
Navarro J., Frenk C. S., White S. D. M., 1997, ApJ, 490, 493
Paranjape A., Lam T.-Y., Sheth R. K., 2012, MNRAS, 420, 1429
Paranjape A., Sheth R. K., 2012, MNRAS, 426, 2789
Paranjape A., Sheth R. K., Desjacques V., 2012, MNRAS, submitted (arXiv:1210.1483)
Press W. H., Schechter P., 1974, ApJ, 187, 425
Sheth R. K., 1998, MNRAS, 300, 1057
Sheth R. K., Tormen G., 1999, MNRAS, 308, 119
Sheth R. K., Tormen G., 2002, MNRAS, 329, 61
Sheth R. K., Tormen G., 2004, MNRAS, 350, 1385
Sheth R. K., Mo H. J., Tormen G., 2001, MNRAS, 323, 1
Sheth R. K., Chan K.-C., Scoccimarro R., 2012, PRD, submitted (arXiv:1207.7117)
Sheth R. K., 2011, Pramana, 77, 169
White S. D. M., 1996, in Schaeffer R. et al., eds., Cosmology and large scale structure, Proc. 60th Les Houches School, ASP Conf. Series, Vol. 176. Elsevier, Amsterdam, p. 349

APPENDIX A: PROOF OF EQUATION (15)

The main complication with respect to the one-dimensional case is that the constraint that the walk passed through $\Delta$ on scale $S$ still allows walks with a range of values of $G$. This range is constrained by the requirement that $\Delta$ and $G$ had not crossed on scales smaller than $S$. At fixed $\Delta$ and $G$, the solution is straightforward, as we show shortly, so the main work is to integrate this solution over the allowed range of $G$.

As before, it is best to work in the $(g_+, g_-)$ plane, in which case the requirement that the walk has height $\Delta$ on scale $S$ means that

$$G_- < \frac{\delta_c(S)}{\sqrt{1+\beta^2}} \quad {\rm and} \quad G_+ = \Delta\sqrt{1+\beta^2} - G_- \tag{A1}$$

(again capital letters indicate values at $S$). The distribution of $s$ at which $\delta_c(s)/\sqrt{1+\beta^2}$ is first crossed, given that the walk started from $(G_+, G_-)$ on scale $S$, is given by equation (6) with

$$\nu^2 = \frac{\left(\delta_c/\sqrt{1+\beta^2} - G_-\right)^2}{s - S}. \tag{A2}$$

Notice that this expression depends only on $G_-$, so we will denote the associated first crossing distribution as $f(s|G_-, S)$.

To get the quantity we are after, $f(s|\Delta, S)$, we must now integrate $f(s|G_-, S)$ over all allowed starting values $(G_+, G_-)$, weighting by the probability of starting at each. I.e.,

$$f(s|\Delta, S) = \frac{1}{A} \int_{-\infty}^{\infty} {\rm d}G_+ \int_{-\infty}^{\delta_c/\sqrt{1+\beta^2}} {\rm d}G_-\, f(s|G_-, S)\, p(G_+|S)\, q(G_-|S)\, \delta_{\rm D}(\delta - \Delta) \tag{A3}$$

where

$$q(G_-|S) = \frac{{\rm e}^{-G_-^2/2S}}{\sqrt{2\pi S}} - \frac{{\rm e}^{-(2\delta_c/\sqrt{1+\beta^2} - G_-)^2/2S}}{\sqrt{2\pi S}} \tag{A4}$$

is the probability that (the one-dimensional) walk $g_-$ has height $G_-$ at $S$ and never crossed $\delta_c/\sqrt{1+\beta^2}$ on some smaller $s < S$ (Bond et al., 1991), $p(G_+|S)$ is a Gaussian with zero mean and variance $S$, and

$$A \equiv \int_{-\infty}^{\infty} {\rm d}G_+ \int_{-\infty}^{\delta_c/\sqrt{1+\beta^2}} {\rm d}G_-\, p(G_+|S)\, q(G_-|S)\, \delta_{\rm D}(\delta - \Delta) \tag{A5}$$

is a normalization constant which ensures that the probabilities integrate to unity. This, and the integral in eq. (A3), can be performed analytically, yielding

$$f(s|\Delta, S) = \frac{(\delta_c - \Delta)(1+\beta^2)}{s - S + \beta^2 s}\, \frac{\exp\left[-(\delta_c - \Delta)^2/2(s - S + \beta^2 s)\right]}{\sqrt{2\pi(s - S + \beta^2 s)}}. \tag{A6}$$

This is equivalent to the change of variables given by equation (15) of the main text.
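As a quick sanity check on the algebra, the closed form can be evaluated directly. The Python sketch below assumes that (A6) reads $f(s|\Delta,S) = (\delta_c-\Delta)(1+\beta^2)\,\sigma^{-2}\exp[-(\delta_c-\Delta)^2/2\sigma^2]/\sqrt{2\pi\sigma^2}$ with $\sigma^2 \equiv s - S + \beta^2 s$. The shorthand $\sigma^2$, the function names `f_cond` and `f_uncond`, and the default $\delta_c = 1.686$ are assumptions introduced for illustration. It checks that setting $\Delta = 0$ and $S = 0$ recovers the standard constant-barrier form $s f(s) = \nu\,{\rm e}^{-\nu^2/2}/\sqrt{2\pi}$ with $\nu^2 = \delta_c^2/(1+\beta^2)s$.

```python
import math


def f_cond(s, Delta, S, delta_c=1.686, beta=1.0):
    """Conditional first crossing distribution f(s | Delta, S), assumed form of (A6)."""
    if s <= S:
        raise ValueError("f(s | Delta, S) is only defined for s > S")
    sigma2 = s - S + beta**2 * s          # effective variance (shorthand, not in the text)
    d = delta_c - Delta                   # distance left to the barrier
    return (d * (1.0 + beta**2) / sigma2
            * math.exp(-d * d / (2.0 * sigma2))
            / math.sqrt(2.0 * math.pi * sigma2))


def f_uncond(s, delta_c=1.686, beta=1.0):
    """Constant-barrier first crossing distribution with nu^2 = delta_c^2 / ((1 + beta^2) s)."""
    nu = delta_c / math.sqrt((1.0 + beta**2) * s)
    return nu * math.exp(-nu * nu / 2.0) / math.sqrt(2.0 * math.pi) / s


if __name__ == "__main__":
    for s in (0.5, 1.0, 2.0, 5.0):
        # With Delta = 0 and S = 0 the two expressions should agree.
        print(s, f_cond(s, 0.0, 0.0), f_uncond(s))
```

The same function can be used to see how the distribution responds to the environment, for instance by comparing `f_cond(s, 0.5, 1.0)` with `f_cond(s, -0.5, 1.0)` over a range of `s`.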