[PDF] Mind the gap: the power of combining photometric surveys with intensity mapping

Abstract

Full PDF

PPrepared for submission to JCAP

Mind the gap: the power ofcombining photometric surveyswith intensity mapping

Chirag Modi, a,b

Martin White, a,c

Emanuele Castorina, d,e

AnžeSlosar e a Berkeley Center for Cosmological Physics, Department of Physics, University of California,Berkeley, CA 94720 b Center for Computational Astrophysics, Flatiron Institute, 162 Fifth Ave., New York, NY10010, USA c Department of Astronomy, University of California, Berkeley, CA 94720 d dDipartimento di Fisica ‘Aldo Pontremoli’, Universita’ degli Studi di Milano, Milan, Italy e Theoretical Physics Department, CERN, 1211 Geneva 23, Switzerland f Department of Physics, Brookhaven National Laboratory, Upton, NY 11973E-mail: cmodi@ﬂatironinstitute.org, [email protected],[email protected], [email protected]

Abstract.

The long wavelength modes lost to bright foregrounds in the interferometric 21-cm surveys can partially be recovered using a forward modeling approach that exploits thenon-linear coupling between small and large scales induced by gravitational evolution. In thiswork, we build upon this approach by considering how adding external galaxy distributiondata can help to ﬁll in these modes. We consider supplementing the 21-cm data at two diﬀerentredshifts with a spectroscopic sample (good radial resolution but low number density) looselymodeled on DESI-ELG at z = 1 and a photometric sample (high number density but poorradial resolution) similar to LSST sample at z = 1 and z = 4 respectively. We ﬁnd thatboth the galaxy samples are able to reconstruct the largest modes better than only using 21-cm data, with the spectroscopic sample performing signiﬁcantly better than the photometricsample despite much lower number density. We demonstrate the synergies between surveys byshowing that the primordial initial density ﬁeld is reconstructed better with the combinationof surveys than using either of them individually. Methodologically, we also explore theimportance of smoothing the density ﬁeld when using bias models to forward model thesetracers for reconstruction. Keywords: cosmological parameters from LSS – 21 cm – galaxy clustering –bias model –forward modeling a r X i v : . [ a s t r o - ph . C O ] F e b ontents The study of large-scale structure in the high-redshift Universe is a promising tool for cos-mology [1]. One means of mapping large-scale structure in the distant Universe is throughthe technique of intensity mapping (IM): performing a low resolution, spectroscopic survey tomeasure integrated ﬂux from unresolved sources on large areas of sky at diﬀerent frequencies.Such surveys capture the largest elements of the cosmic web and map out the distributionof matter in very large cosmological volumes in a fast and eﬃcient manner, with good radialresolution [2, 3]. Since hydrogen is so abundant in the Universe, 21-cm emission from cosmicneutral hydrogen (H i ) oﬀers one tracer to map out the Universe in such a way. With its lowenergy and optical depth there is little chance of line confusions and it provides an eﬃcientway to probe the spatial distribution of neutral hydrogen, and hence the underlying darkmatter from the local Universe to the dark ages [3–8].One issue with IM surveys is that foregrounds render the long-wavelength ﬂuctuationsalong the line of sight unmeasureable, which can adversely aﬀect the science that they cando [9–13]. In ref. [13] we studied one method for reconstructing long-wavelength ﬂuctuations,using the distinct pattern of correlations imprinted by gravitational instability. In this paperwe take another route, more similar to refs. [10, 12], and consider how adding additional datacan help to ﬁll in the modes that are lost to foregrounds in 21-cm observations. To model our galaxy and IM surveys we use an extension of the Hidden Valley simulations[14], a set of particle N-body simulations performed in a h − Mpc box using the

FastPM code [15]. Further details of this simulation and the manner in which Hi is assignedto halos can be found in Appendix A. Anticipating that the Hi would be observed by aninstrument such as the Hydrogen Intensity and Real-time Analysis eXperiment (HIRAX;[16]) or the Packed Ultrawideband Mapping Array (PUMA [8, 17]) we assign the Hi to aregular grid and work with the Fourier transform of the density ﬁeld. To the signalwe add thermal noise and foregrounds as described in detail in refs. [13, 14]. For z = 1 wewill present results for both PUMA and HIRAX thermal noise, while for z = 4 we will haveobservations only from PUMA and so restrict ourselves to that case.– 1 – k redshift survey21-cm IM21-cm wedgephoto-z survey Figure 1 . A schematic view of the modes probed by 21-cm surveys and optical surveys in the k ⊥ − k (cid:107) plane. Photometric redshift surveys (purple) are capable of high angular resolution (i.e. probe to high k ⊥ ) but have limited radial resolution and thus only constrain well low k (cid:107) modes. A spectroscopicredshift survey (yellow) would enable access to the full (cid:126)k plane, but would be prohibitively expensiveif done with high number density. An IM survey of 21-cm emission (green) has high number densityand radial resolution but limited angular resolution and misses modes at low k (cid:107) (and in a regionknown as the foreground wedge). Due to foregrounds and diﬃculties in instrument calibration, we assume the Hi map hasinﬁnite noise for modes with k (cid:107) ≈ and in a region of the k ⊥ − k (cid:107) plane known as the “wedge”(see Fig. 1). For details of how these limitations arise, we refer the reader to refs. [13, 14]and the extensive references to the earlier literature. Our goal here is to ask to what extenta diﬀerent survey, either a photometric or spectroscopic galaxy survey, can “ﬁll in” thesemissing modes. Ideally the galaxy survey can take advantage of the high number densityand excellent radial resolution of the Hi survey while the Hi survey can take advantage ofthe low- k (cid:107) sensitivity of the galaxy survey. We will present results for a ‘pessimistic’ choiceof foreground wedge which removes information for angles less than three times the primaryﬁeld of view ( × FOV). This corresponds to a wedge angle of θ w = 6 ◦ and ◦ at z = 1 and z = 4 respectively [13]. Obviously a better instrument calibration and foreground subtraction,which leads to a smaller wedge, will require less input from the auxilliary data.We will consider two populations of mock galaxies, loosely modeled on samples thatmight be returned by upcoming surveys. At z = 1 we consider a galaxy sample with goodredshift measurements and ¯ n = 10 − h Mpc − . Such a sample is similar to the emission linegalaxy (ELG) sample to be targeted by the Dark Energy Spectroscopic Instrument (DESI)survey [18], and henceforth we will refer it so. For this exploratory calculation, rather thanmodel these galaxies in great detail, we simply choose a mass-limited sample of halos in ourN-body simulation with ¯ n = 10 − h Mpc − . We assume the redshifts of these halos are– 2 –nown precisely. Such a halo sample has a complex, scale-dependent bias and about the rightlevel of shot noise while being very easy to model. We do not expect our results to dependupon the details of this choice.The second sample, appropriate for z ≥ , is a photometric sample of galaxies such aswill be observed by the Vera Rubin Observatory - Legacy Survey of Space and Time (LSST;[19]) and thus we will refer to it as LSST sample. We follow ref. [20] and consider an analogueof Lyman break dropout galaxies (LBGs). This sample has high number density but poorradial resolution, leading to a density ﬁeld with lower noise at low k (cid:107) but high noise for large k (cid:107) . As above we model these galaxies as a mass-limited halo sample. We introduce thephotometric redshift scatter simply by enhancing the noise for this sample as an exponentialin k (cid:107) (see 3.3). We will consider the LSST sample at two redshifts, z = 1 and z = 4 withnumber densities ¯ n = 5 × − h Mpc − and ¯ n = 3 . × − h Mpc − respectively. Our reconstruction method largely follows the steps outlined in Ref. [13]. We reconstructthe initial density ﬁeld by optimizing its posterior, conditioned on the observed data ( Hi andgalaxy density in diﬀerent regions of k -space), assuming Gaussian initial conditions. Evolvingthis initial ﬁeld allows us to reconstruct the observed data on all scales. The initial conditionsare reconstructed in the manner described in refs. [13, 21, 22] (for alternative approaches toreconstruction see also refs. [23–27]).For the forward model [ F ( s ) ] connecting the observed data ( δ ) with the Gaussian initialconditions (ICs; s ) we use a second order Lagrangian bias model coupled to the non-lineardynamics of the simulation. The forward-modeled ﬁnal density ﬁelds (of galaxies and Hi ) areobtained by assigning particles to a grid at their ﬁnal redshift-space positions with a weightthat is a function of the density and shear at their positions in the initial conditions [13].We explore diﬀerent non-linear dynamics to estimate this ﬁnal position and ﬁnd the bestresults when using Zeldovich dynamics to evolve particles from their Lagrangian to Eulerianposition. Our Lagrangian bias model [13, 28–35] connects the matter ﬁeld to the dark matterhalos and includes terms up to quadratic order ( δ L , δ L,R and s L,R ≡ (cid:80) ij s ij the scalar shearﬁeld where s ij = ( ∂ i ∂ j ∂ − − [1 / δ Dij ) δ L ) computed from the ICs of the simulation, evolvedto z = 0 using linear theory. We subtract the zero-lag terms to make these ﬁelds have zeromean.To construct the bias terms of our model, and in particular to estimate the quadraticoperators, we smooth the linear density ﬁeld with a Gaussian kernel with smoothing scale R .The dependence on the smoothing scale R of reconstruction algorithms is an open problemin the ﬁeld [36, 37], since a formal understanding of the renormalization of the bias expansionat the ﬁeld level has not been obtained yet [34, 37]. It should also be kept in mind that, evenwithout any additional smoothing, the size of the FFT grid provides an unavoidable cut oﬀof the power on small scales. We thus explore diﬀerent smoothing scales for reconstructionand empirically motivate our choices. We ﬁnd that for z = 4 , any smoothing leads to lowerreconstruction performance, implying that the optimal smoothing is likely smaller than thegrid resolution. For z = 1 , we ﬁnd that when reconstructing with only Hi data, in which casethe only large scale information is provided by non-linear coupling of gravitational evolution,the large scales are best reconstructed when smoothing the Hi ﬁeld with R = 6 h − Mpc. Since δ L and s are correlated, we actually deﬁne a new ﬁeld g L = δ L − s which does not correlate with δ L on large scales and use this instead of shear ﬁeld. – 3 –owever when combining this with additional data of galaxy ﬁeld, we ﬁnd that no smoothingis required since the information on large scales is dominated by the galaxy ﬁeld. Thus for z = 1 , we smooth Hi data with R = 6 h − Mpc and do not smooth for the galaxy ﬁeld. Weintend to the return to the issue of the smoothing scale in a forthcoming publication.Our modeled tracer ﬁeld is then [13, 35, 38]: δ b HI ( x ) = δ [1] ( x ) + b δ [ δ L ] ( x ) + b δ [ δ L,R ] ( x ) + b g δ [ g L,R ] ( x ) . (3.1)where δ [ W ] ( x ) refers to the ﬁeld generated by weighting the particles with the ‘ W ’ ﬁeld.To ﬁt for the bias parameters, we minimize the mean square model error between thedata and the model ﬁelds which is equivalent to minimizing the error power spectrum of theresiduals, r ( k ) = δ b ( k ) − δ ( k ) , in Fourier space. We do this separately for the Hi and galaxyﬁelds and thus have two sets of bias parameters. We ﬁnd that the smoothing scale aﬀectsthe forward modeling and reconstruction diﬀerently, with the impact being more signiﬁcanton the reconstruction than the forward modeling. Thus ideally one would like to keep boththe bias parameters and the smoothing scale as a free-parameters to be ﬁt at the time ofreconstruction instead of ﬁtting them in advance. We plan to explore this in the future.Once the bias parameters are known, we reconstruct the initial (and ﬁnal density) ﬁeldby maximizing the posterior as a function of the IC amplitudes, s ( k ) , using L-BFGS [39].The negative log-likelihood for the Gaussian prior can be combined with the negative log-likelihood of the data to get the posterior (see also [34, 36]) P = (cid:88) k modes (k)  (cid:88) k , | k |∼ k, k (cid:54)∈ w | δ b HI ( k ) − δ obsHI ( k ) | P err − HI ( k, µ )  + (cid:88) k modes (k)  (cid:88) k , | k |∼ k | δ b g ( k ) − δ obsg ( k ) | P err − g ( k )  + (cid:88) k modes (k)  (cid:88) k , | k |∼ k | s ( k )) | P s ( k )  (3.2)where P s is the prior power spectrum of the initial conditions and the sum is over modesthat are measured by each survey (i.e. modes not in the foreground wedge for Hi and low k (cid:107) modes for the galaxies). For the Hi ﬁeld the error power spectrum, P err − HI , is a combinationof the modeling error and the noise power spectrum. This changes the amplitude of P err − HI ,especially on small scales, and also introduces an angular dependence. We have indicatedthis by the µ dependence in P err − HI . Note the data automatically include shot-noise, sincewe have a single realization of the halo ﬁeld in the simulation.For the galaxies the error power spectrum, P err − g , is due to a combination of shot noiseand modeling error. For the photometric sample we also need to include the smearing of thedensity ﬁeld along the line of sight. We include this by damping the signal in the likelihood In principle one could ﬁt for the bias parameters using summary statistics, such as the power spectrum,as in e.g. refs. [35, 38], though we anticipate that the constraints from the ﬁeld itself would be tighter. https://en.wikipedia.org/wiki/Limited-memory_BFGS – 4 –erm with a Gaussian smoothing kernel: (cid:88) k , | k |∼ k | δ b g ( k ) − δ obsg ( k ) | → (cid:88) k , | k |∼ k | δ b g ( k ) − δ obsg ( k ) | exp (cid:32) − k µ σ (cid:33) . (3.3)Here σ ph is the photometric smoothing scale along the line of sight. It is equal to (cid:39) h − Mpc at z = 1 and (cid:39) h − Mpc at z = 4 . In this section we present the results for our reconstructions. Our primary metrics to gaugethe performance of our model and reconstruction are the cross correlation function, r cc ( k ) ,and the transfer function, T f ( k ) , deﬁned as r cc ( k ) = P XY ( k ) (cid:112) P X ( k ) P Y ( k ) , T f ( k ) = (cid:115) P Y ( k ) P X ( k ) , (4.1)The cross correlation, r cc , measures how faithfully the reconstructed map describes the inputmap, up to rescalings of the output map amplitude. For better visual clarity, we insteadshow the error power spectrum ( − r cc ) with lower values indicating better reconstruction.The transfer function, on the other hand, tells us about the amplitude of the output mapas a function of scale, with r T f = P XY /P X . These metrics will always be deﬁned betweeneither the model or the reconstructed ﬁelds as Y , and the corresponding true ﬁeld as X unlessexplicitly speciﬁed otherwise.We have to consider 2 sets of bias parameters, one for H i and the other for the galaxies.We keep them ﬁxed under the assumption that they have been estimated prior to reconstruc-tion. For the H i ﬁeld at z = 1 , we also smooth the initial density ﬁeld with R = 6 h − Mpc toestimate quadratic terms as described earlier. The dynamics in the forward model is taken tobe the Zeldovich approximation (ZA). All of the data are in redshift space, with the forwardmodel using the ZA velocities to perform the translation. The reconstruction procedure isoutlined in detail in ref. [13] but, brieﬂy, it is done in a series of optimization steps. We beginon a grid and smooth the likelihood term in Eq. 3.2 for both galaxies and Hi on smallscales to ﬁt the large scales ﬁrst. This smoothing is reduced in 5 steps (16, 8, 4, 2, h − Mpc),each with 100 iterations. Note that this smoothing is diﬀerent from the smoothing of thelinear ﬁeld to estimate quadratic components of the bias model discussed earlier and is partof the optimization scheme, not the forward model. The reconstructed ﬁeld is then upsampledto grid and smoothing is reduced in 2 steps (2 and h − Mpc) each with 100 iterations.We begin by showing the data and reconstruction at the level of the ﬁelds in Figure 2 for z = 1 . The ﬁrst two rows show the true H i ﬁeld and the H i data with thermal noise and fore-ground wedge. The next three rows show the H i reconstructed ﬁeld when the reconstructionis done with only H i data, H i and LSST data as well as H i and ELG data. Reconstruction isclosest to the underlying truth when we combine H i with ELG data, but improvement withLSST data is also apparent in X-Z and Y-Z projections.Figure 3 shows the results at the level of the two point function for the following threecases: z = 4 with PUMA and z = 1 with both PUMA and HIRAX. For z = 1 we show resultswith both photometric LSST and spectroscopic DESI-ELG data, while at z = 4 we only havethe photometric galaxy sample. For comparison, we also show the reconstruction with H i data– 5 – r u t h ZY ZX YX D a t a R e c o n - H I o n l y R e c o n - H I + L SS T R e c o n - H I + E L G Figure 2 . Slices perpendicular to the 3 box axes of the true H i ﬁeld; the data with thermal noiseand wedge; and the reconstructed H i ﬁeld with H i data only, H i +LSST and H i +ELG data at z = 1 .We project over h − Mpc slices, transverse directions show the full box ( h − Mpc). The colorscale is the same for all rows and columns except the data row. only. In each case, we see an improvement in the cross-correlation between the reconstructedH i ﬁeld and the true H i ﬁeld as compared to the case when reconstruction is done only with– 6 – .00.20.40.60.81.0 r cc t f z=4.00, PUMA0.00.20.40.60.81.0 r cc HI dataHI + LSST dataHI + DESI data 0.60.81.01.21.41.61.82.0 t f z=1.00, PUMA10 k [ h Mpc ] r cc k [ h Mpc ] t f z=1.00, HIRAX Figure 3 . Cross correlation (left) and transfer function (right) of the reconstructed H i ﬁeld for PUMAnoise level at z = 4 (top) and z = 1 (middle), and with HIRAX noise level at z = 1 (bottom). Weshow results for reconstruction without galaxies (blue), with a spectroscoic DESI-ELG like sample of ¯ n = 10 − h Mpc − (green), and a photometric LSST-like sample (orange) of ¯ n = 5 × − h Mpc − and ¯ n = 3 . × − h Mpc − at z = 1 and z = 4 respectively. H i data. At z = 1 , we ﬁnd that the reconstruction with data from a spectroscopic surveyfar outperforms the reconstruction with a photometric survey, even though the latter has 50times higher number density. At z = 4 , as the photometric smoothing scale decreases forLSST, the reconstruction improves and we can recover the largest scales almost perfectly.– 7 – .00.20.40.60.81.0 r cc = 0.1 z=4.00, PUMA = 0.9 r cc HI dataHI + LSST dataHI + DESI data z=1.00, PUMA10 k [ h Mpc ] r cc k [ h Mpc ] z=1.00, HIRAX Figure 4 . Same as Figure 3 but showing the cross-correlation for the corresponding ﬁelds in wedgesperpendicular to the line of sight (left, µ = [0 , . ) and along the line of sight (right, µ ∈ [0 . , ). Additionally, in every case, we ﬁnd that more power is reconstructed at the largest scales forH i than the original case, as shown in the transfer function.In Figure 4, we show the same results for cross-correlation but in µ bins along andperpendicular to the line of sight µ ∈ [0 − . and µ ∈ [0 . − . . We remind the reader thatour goal was to supplement the large scale modes lost in the foreground wedge, especially thoseperpendicular to the line of sight, with modes in galaxy clustering surveys. Both spectroscopicand photometric surveys probe the perpendicular modes, but the latter loses the line of sight– 8 –nes. Figure 4 clearly shows that the large-scale modes perpendicular to the line of sightare reconstructed very well. Furthermore, when reconstructed with H i data, the LSST ﬁeldalso recovers the modes along the line of sight that are otherwise missing due to photometricuncertainties. The combination of 21-cm data and LSST is therefore greater than the sum ofits parts. A translation of these metrics into performance gains for particular science goalscan be found in §6 of ref. [13]. k [ h Mpc ] r cc HI dataHI + LSST dataHI + DESI dataDESI data 10 k [ h Mpc ] t f Figure 5 . Angle average cross correlation and transfer function for the reconstructed initial ﬁeldwith the true initial ﬁeld at z = 1 for PUMA noise-levels. We show the results when reconstructionis done with only H i data, only DESI ELG-like data, ¯ n = 10 − h Mpc − , the combination of the twoand H i with LSST data. In addition to ﬁlling in the foreground wedge of H i data, we can also use the combinationof the diﬀerent surveys to explore other synergies between the diﬀerent cosmological tracers.For instance we can ask how well our method reconstructs the primordial initial conditionsthat are shared by, and give rise to, both the observed tracer ﬁelds. In Figure 5, we showthe cross-correlation coeﬃcient and transfer function for reconstructed initial conditions fromdata at z = 1 with the true initial conditions. We compare the cases when reconstructionis done only with a single tracer, such as H i data or ELG-like data with the combinationof surveys i.e. H i and ELG or H i and LSST data. On large scales, DESI and LSST datadominate, improving the reconstruction over using only H i ﬁeld. However on small scales,H i allows us to improve over the galaxy ﬁelds and push further into the non-linear regime.The combination of the diﬀerent probes yields higher returns than each tracer consideredindependently. Interferometric 21-cm surveys have the potential to map out the distribution of matter inthe largest cosmological volumes with good radial resolution. However they must contendwith bright foregrounds that, when coupled to instrument imperfections, can lead to a loss oflarge scale modes in the foreground wedge. In this work we build upon the forward modelingapproach of Ref. [13], that reconstructed these long-wavelength ﬂuctuations by exploitingthe non-linear coupling between small and large scales induced by gravitational evolution, byadding external galaxy distribution data that can help to ﬁll in the missing modes. Speciﬁcally,we consider supplementing the 21-cm data from mock HIRAX and PUMA surveys at diﬀerent– 9 –edshifts with a spectroscopic sample (good radial resolution but low number density) and aphotometric sample (high number density but poor radial resolution). The mock galaxies forthese datasets are loosely modeled on DESI-ELG sample at z = 1 and LSST sample at z = 1 and z = 4 respectively.We ﬁnd that the spectroscopic sample reconstructs the modes signiﬁcantly better thanthe photometric sample, despite having much lower number density. Both galaxy samplesare able to reconstruct the largest modes ( k < . h Mpc − ) transverse to the line of sightvery well. However the contribution of an LSST-like photometric sample to scales smallerthan k (cid:39) . h Mpc − is not signiﬁcant, especially for the thermal noise of PUMA survey. Atthe same time, we ﬁnd that 21-cm data also reconstructs the correlations in the LSST dataalong the line of sight on small scales that are otherwise lost due to photometric smoothing.The spectroscopic sample, on the other hand, improves reconstruction across all the scales,especially for a noise-dominated survey like HIRAX where the reconstruction with only 21-cmdata is poor. We also explore the synergies of diﬀerent surveys in reconstructing the initialdensity ﬁeld and ﬁnd that the combination of surveys performs better than using eithersurveys individually.With regards to forward modeling approaches, we ﬁnd that the smoothing scale plays animportant role in quadratic bias model when the data itself lacks any direct information onlarge scales, such as the H i ﬁeld at low redshifts. In this case, using a large smoothing scale tosuppress small scales non-linearities when estimating quadratic ﬁelds improves reconstructionof large scale modes in H i data. Interestingly, this does not seem to be the case at higherredshifts. The appropriate numerical procedure for the implementation of an eﬀective ﬁeldtheory when modeling large-scale structure at the ﬁeld level (that would remove the depen-dence on the smoothing scale) remains an area of active research, and our results show theimportance of understanding this issues at a more fundamental level. We plan on pursuingthese directions in future work. Acknowledgments

M.W. is supported by the U.S. Department of Energy and by NSF grant number 1713791.This research used resources of the National Energy Research Scientiﬁc Computing Center(NERSC), a U.S. Department of Energy Oﬃce of Science User Facility operated under Con-tract No. DE-AC02-05CH11231. This work made extensive use of the NASA AstrophysicsData System and of the astro-ph preprint archive at arXiv.org . A HiddenValley2

In this work we have used a second run of the Hidden Valley simulations, ﬁrst reported inref. [14]. The HiddenValley2 simulation employs the same code, box size, particle loadingand initial conditions but has been run to z = 0 . with outputs at z = 1 . , 1.0 and . . Inaddition to lower redshift outputs we have also updated the model used to populate the darkmatter halos in the simulation with H i , with parameters recalibrated to the wider redshiftrange and updated measurements of the abundance and clustering of H i .We make use of an M HI − M halo relation in order to populate our dark matter onlysimulation with H i , much as we did in our earlier work [14]. We assume the total H i mass in– 10 – z H I × C+15H+19modD 1 2 3 4 5 6 z b H I ( z ) Figure 6 . (Left) The abundance of H i as a function of redshift (points; [43–56]). The dotted lineshows a linear ﬁt, ˜Ω HI = (0 . . z ) × − , from Fig. 14 of ref. [56] while the dashed line showsthe power-law ﬁt, ˜Ω HI = 0 . z ) . , of ref. [54] and the solid black line shows our ﬁducial model.(Right) The H i bias as a function of redshift. The error on the z ≈ point is dominated by theuncertainty in ˜Ω HI . a halo of mass M h is [40, 41] M HI ( M h ) = A ( z ) (cid:18) M h M cut (cid:19) α exp (cid:20) − M cut M h (cid:21) (A.1)This H i mass is split into a central component and a component that moves with the virialvelocity of the halo. Speciﬁcally a fraction f cen of the H i mass is taken to reside at the halocenter and move with the halo center-of-mass velocity. The remaining f sat = 1 − f cen of theH i mass has an additional, Gaussian line-of-sight velocity distribution with dispersion equalto the virial velocity dispersion of the halo. For numerical convenience we implemented thisby dividing this H i into N sat = (cid:98) . M h /M cut ) . (cid:99) equal parts and for each drawing anadditional line-of-sight velocity component from a Gaussian. As we found in refs. [13, 14] thatthe details of how we treated such ﬁngers of god were largely unimportant for our science,we have opted to take the simple modeling approach here. Following Figure 7 of ref. [42], wemodel the fraction of H i in satellites as: f sat = min (cid:20) . , . × (log M h − . . − . (cid:21) ∀ M h > . h − M (cid:12) (A.2)otherwise f sat = 0 .There is unfortunately very little data with which to tune the parameters of Eq. (A.1).The amplitude, A ( z ) , is largely constrained by the abundance of H i , ˜Ω HI ( z ) (Fig. 6). Wehave followed the common convention in absorption line studies and H i intensity mapping andquoted the abundance as a comoving H i density divided by the (physical) z = 0 critical density.However we have used a tilde to distinguish this quantity from the more common usage of Ω as a ratio of (physical or comoving) density at z to critical density at z . The agreementwith the data above z ≈ is quite good. The values of α and M cut are less constrained.Both physical intuition and numerical simulations suggest that there is a minimum halomass ( M cut ∼ − M (cid:12) ) below which neutral hydrogen will not be self-shielded from UV– 11 – k [ h Mpc ]10 ( k )[ m K ] z Figure 7 . The monopole redshift-space clustering of H i at z = 1 from our model, compared to theGBT measurements of refs. [57, 58] at z (cid:39) . (grey bands showing 68% and 95% conﬁdence regions).The upper limits from the Hi auto power are approximately independent in each bin but the lowerlimits are perfectly correlated (see ref. [58]). photons. Above this mass the amount of H i should increase as the halo mass increases, thoughnot necessarily linearly (however, simulations suggest α ≈ at z > ). The characteristic halomass scale ( M cut ) is determined by the clustering of H i . At z (cid:39) an analysis of H i -selectedgalaxies in the Sloan Digital Sky Survey constrains the HOD [59]. At z (cid:39) the large-scale biasis known approximately from cross-correlation with optical galaxies [57, 58, 60, 61], howeverat higher z there are no direct measurements. The clustering of Damped Ly α systems (DLAs)has been measured at z (cid:39) − by ref. [62], and since the DLAs contain the majority of the H i at those redshifts this can be used as a proxy for the H i clustering amplitude. The numbersinferred agree reasonably well with an analysis of the most recent hydrodynamical simulations(see Table 6 of ref. [42]). Based on these considerations we take α = (1 + 2 z ) / (2 + 2 z ) , A ( z ) = 1 . × (1 + z ) − / h − M (cid:12) , M cut = 6 × exp (cid:18) − z (cid:19) h − M (cid:12) . (A.3)Figure 6 shows the abundance and large-scale bias of H i predicted from our model(‘Model D’) compared to observations. Our model has suﬃcient ﬂexibility to ﬁt the availabledata, while also being in broad agreement with the current generation of hydrodynamicsimulations. From Fig. 6 it appears our model overpredicts the clustering at z ≈ , howeverthis is partly due to the way the large-scale bias is estimated in the observations. We take acloser look at the agreement at z (cid:39) . (we use the z = 1 output of our simulation) in Fig. 7.Here we compare the product, ¯ T ∆ ( k ) , predicted by our simulation to the range allowedby the H i auto-correlation and Hi -WiggleZ cross-correlation [57, 58]. While there may besome evidence for more small-scale power in the model than the observations, the level ofagreement is quite good for most of the range and within the errors on the observation for allscales. – 12 – eferences [1] S. Ferraro and M. J. Wilson, Inﬂation and Dark Energy from spectroscopy at z > , BAAS (2019) 72 [ ].[2] E. D. Kovetz, M. P. Viero, A. Lidz, L. Newburgh, M. Rahman, E. Switzer et al., Line-IntensityMapping: 2017 Status Report , ArXiv e-prints (2017) [ ].[3] E. Kovetz, P. C. Breysse, A. Lidz, J. Bock, C. M. Bradford, T.-C. Chang et al.,

Astrophysicsand Cosmology with Line-Intensity Mapping , in

BAAS , vol. 51, p. 101, May, 2019, .[4] S. R. Furlanetto, S. P. Oh and F. H. Briggs,

Cosmology at low frequencies: The 21 cmtransition and the high-redshift Universe , PhysRep (2006) 181 [ astro-ph/0608032 ].[5] F. B. Abdalla, P. Bull, S. Camera, A. Benoit-Lévy, B. Joachimi, D. Kirk et al.,

Cosmologyfrom HI galaxy surveys with the SKA , Advancing Astrophysics with the Square Kilometre Array(AASKA14) (2015) 17 [ ].[6] M. Santos, P. Bull, D. Alonso, S. Camera, P. Ferreira, G. Bernardi et al.,

Cosmology from aSKA HI intensity mapping survey , Advancing Astrophysics with the Square Kilometre Array(AASKA14) (2015) 19 [ ].[7] Cosmic Visions 21 cm Collaboration, R. Ansari, E. J. Arena, K. Bandura, P. Bull, E. Castorinaet al.,

Inﬂation and Early Dark Energy with a Stage II Hydrogen Intensity MappingExperiment , arXiv e-prints (2018) arXiv:1810.09572 [ ].[8] E. Castorina, S. Foreman, D. Karagiannis, A. Liu, K. W. Masui, P. D. Meerburg et al., PackedUltra-wideband Mapping Array (PUMA): Astro2020 RFI Response , arXiv e-prints (2020)arXiv:2002.05072 [ ].[9] H.-J. Seo and C. M. Hirata, The foreground wedge and 21-cm BAO surveys , MNRAS (2016) 3142 [ ].[10] J. D. Cohn, M. White, T.-C. Chang, G. Holder, N. Padmanabhan and O. Doré,

Combininggalaxy and 21-cm surveys , MNRAS (2016) 2068 [ ].[11] A. Obuljen, E. Castorina, F. Villaescusa-Navarro and M. Viel,

High-redshift post-reionizationcosmology with 21cm intensity mapping , JCAP (2018) 004 [ ].[12] S.-F. Chen, E. Castorina, M. White and A. Slosar, Synergies between radio, optical andmicrowave observations at high redshift , JCAP (2019) 023 [ ].[13] C. Modi, M. White, A. Slosar and E. Castorina,

Reconstructing large-scale structure withneutral hydrogen surveys , JCAP (2019) 023 [ ].[14] C. Modi, E. Castorina, Y. Feng and M. White,

Intensity mapping with neutral hydrogen andthe Hidden Valley simulations , JCAP (2019) 024 [ ].[15] Y. Feng, M.-Y. Chu, U. Seljak and P. McDonald,

FASTPM: a new scheme for fast simulationsof dark matter and haloes , MNRAS (2016) 2273 [ ].[16] L. B. Newburgh, K. Bandura, M. A. Bucher, T. C. Chang, H. C. Chiang, J. F. Cliche et al.,

HIRAX: a probe of dark energy and radio transients , in

Proc. SPIE , vol. 9906 of

Society ofPhoto-Optical Instrumentation Engineers (SPIE) Conference Series , p. 99065X, Aug., 2016, , DOI.[17] A. Slosar, Z. Ahmed, D. Alonso, M. A. Amin, E. J. Arena, K. Bandura et al.,

PackedUltra-wideband Mapping Array (PUMA): A Radio Telescope for Cosmology and Transients , in

Bulletin of the AAS , vol. 51, p. 53, Sep, 2019, .[18] DESI Collaboration, A. Aghamousa, J. Aguilar, S. Ahlen, S. Alam, L. E. Allen et al.,

TheDESI Experiment Part I: Science,Targeting, and Survey Design , ArXiv e-prints (2016)[ ]. – 13 –

19] LSST Science Collaboration, P. A. Abell, J. Allison, S. F. Anderson, J. R. Andrew, J. R. P.Angel et al.,

LSST Science Book, Version 2.0 , ArXiv e-prints (2009) [ ].[20] M. J. Wilson and M. White,

Cosmology with dropout selection: straw-man surveys &CMB lensing , JCAP (2019) 015 [ ].[21] U. Seljak, G. Aslanyan, Y. Feng and C. Modi,

Towards optimal extraction of cosmologicalinformation from nonlinear data , Journal of Cosmology and Astro-Particle Physics (2017) 009 [ ].[22] C. Modi, Y. Feng and U. Seljak,

Cosmological reconstruction from galaxy light: neural networkbased light-matter connection , JCAP (2018) 028 [ ].[23] J. Jasche and B. D. Wandelt, Bayesian physical reconstruction of initial conditions fromlarge-scale structure surveys , MNRAS (2013) 894 [ ].[24] H. Wang, H. J. Mo, X. Yang, Y. P. Jing and W. P. Lin,

ELUCID—Exploring the LocalUniverse with the Reconstructed Initial Density Field. I. Hamiltonian Markov Chain MonteCarlo Method with Particle Mesh Dynamics , ApJ (2014) 94 [ ].[25] H.-M. Zhu, Y. Yu, U.-L. Pen, X. Chen and H.-R. Yu,

Nonlinear reconstruction , Phys. Rev. D (2017) 123502 [ ].[26] N. G. Karaçaylı and N. Padmanabhan, Anatomy of cosmic tidal reconstruction , MNRAS (2019) 3864.[27] M. Schmittfull, T. Baldauf and M. Zaldarriaga,

Iterative initial condition reconstruction , PRD (2017) 023505 [ ].[28] T. Matsubara, Resumming cosmological perturbations via the Lagrangian picture: One-loopresults in real space and in redshift space , PRD (2008) 063530 [ ].[29] T. Matsubara, Nonlinear perturbation theory with halo bias and redshift-space distortions viathe Lagrangian picture , PRD (2008) 083519 [ ].[30] J. Carlson, B. Reid and M. White, Convolution Lagrangian perturbation theory for biasedtracers , MNRAS (2013) 1674 [ ].[31] M. White,

The Zel’dovich approximation , MNRAS (2014) 3630 [ ].[32] Z. Vlah, E. Castorina and M. White,

The Gaussian streaming model and convolutionLagrangian eﬀective ﬁeld theory , JCAP (2016) 007 [ ].[33] C. Modi, E. Castorina and U. Seljak, Halo bias in Lagrangian space: estimators and theoreticalpredictions , MNRAS (2017) 3959 [ ].[34] M. Schmittfull, M. Simonović, V. Assassi and M. Zaldarriaga,

Modeling Biased Tracers at theField Level , arXiv e-prints (2018) [ ].[35] C. Modi, S.-F. Chen and M. White, Simulations and symmetries , MNRAS (2020) 5754[ ].[36] F. Schmidt, F. Elsner, J. Jasche, N. M. Nguyen and G. Lavaux,

A rigorous EFT-based forwardmodel for large-scale structure , JCAP (2019) 042 [ ].[37] G. Cabass and F. Schmidt, The Likelihood for LSS: Stochasticity of Bias Coeﬃcients at AllOrders , JCAP (2020) 051 [ ].[38] N. Kokron, J. DeRose, S.-F. Chen, M. White and R. H. Wechsler, The cosmology dependence ofgalaxy clustering and lensing from a hybrid N -body-perturbation theory model , arXiv e-prints (2021) arXiv:2101.11014 [ ].[39] J. Nocedal and S. J. Wright, Numerical Optimization . Springer, New York, NY, USA,second ed., 2006. – 14 –

40] H. Padmanabhan, A. Refregier and A. Amara,

A halo model for cosmological neutral hydrogen :abundances and clustering , MNRAS (2017) 2323 [ ].[41] E. Castorina and F. Villaescusa-Navarro,

On the spatial distribution of neutral hydrogen in theUniverse: bias and shot-noise of the H I power spectrum , MNRAS (2017) 1788[ ].[42] F. Villaescusa-Navarro, S. Genel, E. Castorina, A. Obuljen, D. N. Spergel, L. Hernquist et al.,

Ingredients for 21 cm Intensity Mapping , ApJ (2018) 135 [ ].[43] M. A. Zwaan, M. J. Meyer, L. Staveley-Smith and R. L. Webster,

The HIPASS catalogue: Ω HI and environmental eﬀects on the HI mass function of galaxies , MNRAS (2005) L30[ astro-ph/0502257 ].[44] R. Braun,

Cosmological Evolution of Atomic Gas and Implications for 21 cm H I Absorption , ApJ (2012) 87 [ ].[45] A. M. Martin, E. Papastergis, R. Giovanelli, M. P. Haynes, C. M. Springob and S. Stierwalt,

The Arecibo Legacy Fast ALFA Survey. X. The H I Mass Function and Ω _H I from the 40%ALFALFA Survey , ApJ (2010) 1359 [ ].[46] J. Delhaize, M. J. Meyer, L. Staveley-Smith and B. J. Boyle,

Detection of H I in distantgalaxies using spectral stacking , MNRAS (2013) 1398 [ ].[47] J. Rhee, M. A. Zwaan, F. H. Briggs, J. N. Chengalur, P. Lah, T. Oosterloo et al.,

Neutralatomic hydrogen (H I) gas evolution in ﬁeld galaxies at z ∼ ∼ , MNRAS (2013)2693 [ ].[48] P. Lah, J. N. Chengalur, F. H. Briggs, M. Colless, R. de Propris, M. B. Pracy et al.,

The HIcontent of star-forming galaxies at z = 0.24 , MNRAS (2007) 1357 [ astro-ph/0701668 ].[49] S. M. Rao, D. A. Turnshek and D. B. Nestor,

Damped Ly α Systems at z<1.65: TheExpanded Sloan Digital Sky Survey Hubble Space Telescope Sample , ApJ (2006) 610[ astro-ph/0509469 ].[50] S. M. Rao, D. A. Turnshek, G. M. Sardane and E. M. Monier,

The statistical properties ofneutral gas at z < 1.65 from UV measurements of Damped Lyman Alpha systems , MNRAS (2017) 3428 [ ].[51] P. Noterdaeme, P. Petitjean, W. C. Carithers, I. Paris, A. Font-Ribera, S. Bailey et al.,

VizieROnline Data Catalog: SDSS-III DR9 DLA catalogue (Noterdaeme+, 2012) , VizieR OnlineData Catalog (2012) J/A+A/547/L1.[52] A. Songaila and L. L. Cowie,

The Evolution of Lyman Limit Absorption Systems to RedshiftSix , ApJ (2010) 1448 [ ].[53] T. Zafar, C. Péroux, A. Popping, B. Milliard, J. M. Deharveng and S. Frank,

The ESO UVESadvanced data products quasar sample. II. Cosmological evolution of the neutral gas massdensity , A&A (2013) A141 [ ].[54] N. H. M. Crighton, M. T. Murphy, J. X. Prochaska, G. Worseck, M. Rafelski, G. D. Beckeret al.,

The neutral hydrogen cosmological mass density at z = 5 , MNRAS (2015) 217[ ].[55] S. Bird, R. Garnett and S. Ho,

Statistical properties of damped Lyman-alpha systems fromSloan Digital Sky Survey DR12 , MNRAS (2017) 2111 [ ].[56] W. Hu, L. Hoppmann, L. Staveley-Smith, K. Geréb, T. Oosterloo, R. Morganti et al.,

Anaccurate low-redshift measurement of the cosmic neutral hydrogen density , MNRAS (2019)1619 [ ].[57] K. W. Masui, E. R. Switzer, N. Banavar, K. Bandura, C. Blake, L.-M. Calin et al., – 15 – easurement of 21 cm Brightness Fluctuations at z ∼ . in Cross-correlation , ApJL (2013) L20 [ ].[58] E. R. Switzer, K. W. Masui, K. Bandura, L.-M. Calin, T.-C. Chang, X.-L. Chen et al.,

Determination of z ∼ . neutral hydrogen ﬂuctuations using the 21 cm intensity mappingautocorrelation , MNRAS (2013) L46 [ ].[59] A. Obuljen, D. Alonso, F. Villaescusa-Navarro, I. Yoon and M. Jones,

The H I content of darkmatter haloes at z ≈ , MNRAS (2019) 5124 [ ].[60] C. J. Anderson, N. J. Luciw, Y.-C. Li, C. Y. Kuo, J. Yadav, K. W. Masui et al.,

Low-amplitudeclustering in low-redshift 21-cm intensity maps cross-correlated with 2dF galaxy densities , MNRAS (2018) 3382 [ ].[61] L. Wolz, A. Pourtsidou, K. W. Masui, T.-C. Chang, J. E. Bautista, E.-M. Mueller et al.,

HIconstraints from the cross-correlation of eBOSS galaxies and Green Bank Telescope intensitymaps , arXiv e-prints (2021) arXiv:2102.04946 [ ].[62] I. Pérez-Ràfols, A. Font-Ribera, J. Miralda-Escudé, M. Blomqvist, S. Bird, N. Busca et al., TheSDSS-DR12 large-scale cross-correlation of damped Lyman alpha systems with the Lyman alphaforest , MNRAS (2018) 3019 [ ].].