Recovering the topology of the IGM at z~2

Abstract

We investigate how well the 3D density field of neutral hydrogen in the Intergalactic Medium (IGM) can be reconstructed using the Lyman-alpha absorptions observed along lines of sight to quasars separated by arcmin distances in projection on the sky. We use cosmological hydrodynamical simulations to compare the topologies of different fields: dark matter, gas and neutral hydrogen optical depth and to investigate how well the topology of the IGM can be recovered from the Wiener interpolation method implemented by Pichon et al. (2001). The global statistical and topological properties of the recovered field are analyzed quantitatively through the power-spectrum, the probability distribution function (PDF), the Euler characteristics, its associated critical point counts and the filling factor of underdense regions. The local geometrical properties of the field are analysed using the local skeleton by defining the concept of inter-skeleton distance. At scales larger than ~1.4 <d_LOS>, where <d_LOS> is the mean separation between lines of sight, the reconstruction accurately recovers the topological features of the large scale density distribution of the gas, in particular the filamentary structures. At scales larger than the intrinsic smoothing length of the inversion procedure, the power spectrum of the recovered HI density field matches well that of the original one and the low order moments of the PDF are well recovered as well as the shape of the Euler characteristic. The integral errors on the PDF and the critical point counts are indeed small, less than 20% for <d_LOS>~2.5 arcmin. The small deviations between the reconstruction and the exact solution mainly reflect departures from the log-normal behaviour that are ascribed to highly non-linear objects in overdense regions.


arXiv [astro-ph], Jan. Mon. Not. R. Astron. Soc., 000–000 (0000). Printed 8 November 2018 (MN LaTeX style file v2.2)

Recovering the topology of the IGM at z ∼ 2

S. Caucci, S. Colombi, C. Pichon, E. Rollinde, P. Petitjean, T. Sousbie

Institut d'Astrophysique de Paris & UPMC, 98 bis boulevard Arago, 75014 Paris, France
Centre de Recherche Astrophysique de Lyon, 9 avenue Charles Andre, 69561 Saint Genis Laval, France

ABSTRACT

We investigate how well the 3D density field of neutral hydrogen in the Intergalactic Medium (IGM) can be reconstructed using the Lyman-α absorptions observed along lines of sight to quasars separated by arcmin distances in projection on the sky. We use cosmological hydrodynamical simulations to compare the topologies of different fields: dark matter, gas and neutral hydrogen optical depth and to investigate how well the topology of the IGM can be recovered from the Wiener interpolation method implemented by Pichon et al. (2001). The global statistical and topological properties of the recovered field are analyzed quantitatively through the power-spectrum, the probability distribution function (PDF), the Euler characteristics, its associated critical point counts and the filling factor of underdense regions. The local geometrical properties of the field are analysed using the local skeleton by defining the concept of inter-skeleton distance.

As a consequence of the nearly lognormal nature of the density distribution at the scales under consideration, the tomography is best carried out on the logarithm of the density rather than the density itself. At scales larger than ∼ 1.4 ⟨d_LOS⟩, where ⟨d_LOS⟩ is the mean separation between lines of sight, the reconstruction accurately recovers the topological features of the large scale density distribution of the gas, in particular the filamentary structures: the inter-skeleton distance between the reconstruction and the exact solution is smaller than ⟨d_LOS⟩. At scales larger than the intrinsic smoothing length of the inversion procedure, the power spectrum of the recovered H I density field matches well that of the original one and the low order moments of the PDF are well recovered as well as the shape of the Euler characteristic. The integral errors on the PDF and the critical point counts are indeed small, less than 20% for a mean line of sight separation smaller than ∼ 2.5 arcmin.

Key words: methods: statistical, hydrodynamical simulations – cosmology: large-scale structures of universe, intergalactic medium – quasars: absorption lines

The structure and composition of the intergalactic medium (IGM) have long been studied using the Ly-α forest in QSO absorption spectra (Rauch 1998). The progress made in high resolution Echelle-spectrographs has led to a consistent picture in which the absorption features are related to the distribution of neutral hydrogen through the Lyman transition lines of H I. Hydrogen in the IGM is highly ionized (Gunn & Peterson, 1965). Its photoionization equilibrium in the expanding IGM establishes a tight correlation between neutral and total hydrogen density and numerical simulations have confirmed the existence of this correlation. They have also shown that the gas density traces the fluctuations of the dark matter (DM) density on scales larger than the Jeans length (see for example Cen et al. 1994, Petitjean et al. 1995, Miralda-Escudé et al. 1996, Theuns et al. 1998, Viel, Haehnelt & Springel 2004).

As we will show in the first part of this work, the statistical and topological properties of the IGM and of the dark matter distributions are the same, so that recovering the three-dimensional distribution and inferring the topological properties of the IGM allows us to constrain the properties of the dark matter distribution as well.

Although topological tools have been introduced only relatively recently in cosmological analysis, they have been used extensively to characterize the topology of large scale structures as revealed by the three-dimensional distribution of galaxies in the local universe (see for example Gott et al. (1986), Vogeley et al. (1994), Protogeros & Weinberg (1997), Trac et al. (2002), Park et al. (2005) and Sousbie et al. (2006) for the topological analysis of galaxy surveys). The outcome of such an analysis is a quantitative description of the complex appearance of the distribution of the matter in the universe, with its network of clumps, voids, filaments and sheet-like structures.
The study of the topology using galaxy surveys is attractive because of their large volume and the huge number of objects they contain. However the clustering of highly non linear objects (galaxies, clusters of galaxies or QSOs) is biased compared to the underlying clustering of dark matter fluctuations that we wish to constrain (see Kaiser 1984). This biasing results from a complicated and delicate competition between a variety of processes which are often too complicated to be tractable analytically. Besides, the maximum redshift in surveys is low (in the analysis of the SDSS data made by Park et al. 2005, the maximum redshift is z = 0. ), so that this kind of analysis can be done only in the local Universe, where the fluctuations have already entered the highly non-linear regime.

Given the strong correlation existing between dark matter distribution and the low-density intergalactic medium, one could probe the underlying distribution of matter via the signature produced by diffuse hydrogen in quasar spectra, namely absorption features observed in the Ly-α forest. Indeed, absorption spectra provide a picture complementary to those drawn by galaxy surveys to infer the large scale distribution of the matter in the Universe, since the absorption features produced by the IGM in the Ly-α forest can also be detected at large redshift and since the IGM probes the low density range, whereas the galaxy distribution does not. Eventually, higher density contrasts can be recovered from the analysis of the Ly-α forest if higher order transitions are included in the analysis; for example, the Ly-β transitions should allow us to probe density contrasts up to δ ≈ .

The flux along a single line-of-sight towards a quasar only provides one dimensional information, which can be used to constrain the fluctuation amplitude and the matter density (Nusser & Haehnelt 1999, Rollinde et al. 2001, Zaroubi et al. 2006).
The transverse information, found in pairs of quasars, has been used to study the extension of the absorbing regions (e.g. Petitjean et al. 1998; Crotts & Fang 1998; Young, Impey & Foltz 2001; Aracil et al. 2002) and the geometry of the Universe at z ∼ (Hui, Stebbins & Burles 1999; McDonald & Miralda-Escudé 1999; Rollinde et al. 2003, Coppolani et al. 2006).

Given a set of lines of sight (LOSs) toward a group of QSOs with small angular separation, inversion methods can be used to recover the three-dimensional distribution of low density gas, as demonstrated in Pichon et al. (2001). They showed that the visual characterization of the density field (with its network of filaments, clumps, voids and pancakes) is correctly reproduced if the mean separation between the LOSs satisfies ⟨d_LOS⟩ ≤ Mpc.

In this paper we test quantitatively whether such an inversion can recover the global properties of connectivity of the density field, using topological tools such as the Euler characteristic and the probability distribution function.

The paper is organized as follows. In Section 2, the Euler characteristic is defined as an alternate critical points count and implemented for a Gaussian field. The difference between the topological properties of the dark matter, of the gas and of the observed optical depth is then discussed using outputs of a hydrodynamical simulation (Section 3) and relying on different statistical tools. In Section 4, the ability to reconstruct the global topology of the three dimensional distribution from a simple Wiener interpolation of a discrete group of lines of sight is considered. Finally, Section 5 summarizes the results of this paper and discusses some possible improvements of the method as well as observational constraints from future surveys.

This paper makes use of various statistical tools, namely the PDF, the Euler characteristic, the skeleton and related estimators such as the first cumulants of the PDF (connected moments), critical point counts and the filling factor, to characterize the topology of the large scale density distribution. These tools will also be used to test the efficiency of reconstructing the density field from a grid of QSO sight-lines and in particular the ability to reproduce the connectivity of the large scale structures.

Following Colombi, Pogosyan & Souradeep (2000, hereafter CPS), this section introduces the Euler characteristic, χ̃_+, as an alternate critical point count in an overdense excursion with density contrasts larger than a threshold δ_TH. It is shown how the behavior of χ̃_+ is related to connectivity in the field. The numerical implementation used to measure it is described and tested on Gaussian random realizations.

Let δ(x) be a scalar function defined in a 3D volume V. Given a threshold value δ_TH, consider the excursion set E_+ formed by the points x with δ(x) > δ_TH, as expressed by the following equation:

$$ E_+ \equiv \{ x \mid \delta(x) > \delta_{\rm TH} \}. \qquad (1) $$

The analysis of the geometrical properties of points that belong to the excursion set E_+ as a function of δ_TH gives information about the global topology of the scalar field δ(x) and allows for the characterization of large scale structures.

A simple qualitative link can be established between the distribution of critical points (defined by ∇δ = 0) on the one hand, and connectivity on the other, which are related to local and global properties of the excursion set respectively. If one considers overdense regions, connectivity happens along ridges (filaments) passing through saddle points and connecting local maxima. The same reasoning can be applied to under-dense regions where minima are connected through tunnels (pancakes) via another kind of saddle point.
This idea is in fact supported on rigorous grounds by Morse theory (see Milnor 1963). The Morse theorem establishes the link between the distribution of critical points and the global connectivity of the excursion set, via the Euler characteristic. This quantity represents the integral of the Gaussian curvature over an iso-density surface that marks the boundary of the excursion set (see for example Gott, Melott & Dickinson, 1986). It is usually defined as the following count (see for example Mecke, Buchert & Wagner, 1994, for details):

$$ \tilde{\chi}_+ = \text{connected components} - \text{tunnels} + \text{cavities}. \qquad (2) $$

According to the Morse theorem, it can also be expressed as a linear combination of the number of critical points of different types that are found in the excursion set as a function of δ_TH.

To be more specific, let us consider the critical points of the field. For these points, the Hessian matrix, whose components are given by

$$ H_{i,j} = \frac{\partial^2 \delta}{\partial x_i \, \partial x_j}, \qquad (3) $$

is calculated and its eigenvalues are estimated. According to the number of negative eigenvalues, I, of the Hessian matrix, the local structures of the field can be classified in the following way: a clump, a filament, a pancake and a void correspond to I = 3, 2, 1 and 0 respectively. The Euler characteristic is then given by

$$ \tilde{\chi}_+ = N_{I=3} - N_{I=2} + N_{I=1} - N_{I=0}, \qquad (4) $$

where N_{I=i} is the number of critical points with i negative eigenvalues. With this approach, it is sufficient to determine the number distribution of the four kinds of critical point. However this differential method requires the field under consideration to be sufficiently smooth and non degenerate. To this end, in the subsequent analyses of this paper, the field will be smoothed with a Gaussian window (using standard FFT technique),

$$ W(r) = \frac{1}{(2\pi)^{3/2} L_s^3} \exp\left( -\frac{r^2}{2 L_s^2} \right), \qquad (5) $$

of sufficiently large size L_s compared to the sampling grid pixel size in order to minimize the impact of numerical artefacts coming from the discretization of the field on a grid (see, e.g., CPS for a thorough analysis of measurement issues). In what follows, the smoothing scale used to measure the Euler characteristic is always larger than . × N_pix grid pixels where N_pix = 256 pixels is the box resolution of the simulations. In principle this smoothing scale is large enough to have an unbiased measurement of the Euler characteristic. The prescription of CPS is used to detect and classify critical points. This method involves locally fitting a second order hypersurface on the smoothed density field, while taking into account each point on the grid under consideration and its 26 neighbors.

Figure 1. Mean Euler characteristics, χ_+ (see Eqs. 4 and 6), for a Gaussian random field (GRF, points with small errorbars) as a function of the density threshold, δ/σ, compared to the theoretical prediction (Doroshkevich, 1970, smooth curve). The mean is carried over 5 realizations of a GRF whose power spectrum is given by a power-law with spectral index n = − on a grid, while additional smoothing is performed with a Gaussian window of size pixels.

Figure 2. [Curves: minima, pancakes, filaments, maxima. Axes: fraction above the threshold versus δ/σ.] Evolution of the number of critical points entering the computation of the Euler characteristic for the GRF considered in Fig. 1. The fraction of different types of critical points above the threshold is plotted as a function of δ/σ and each distribution is compared to the analytical prediction. Again, symbols with error bars represent the mean over 5 realizations of the same GRF, while the smooth curves give the analytical prediction (which can be easily derived from Bardeen et al. 1986).
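The smoothing of Eq. (5) and the alternating count of Eq. (4) can be sketched in a few lines. This is a minimal illustration, assuming a periodic cubic grid and a smoothing length given in pixel units; the function names are hypothetical, and the sketch does not implement the CPS prescription (the local second order fit over 26 neighbours), only the Fourier-space window and the bookkeeping that follows once critical points are known.

```python
import numpy as np

def gaussian_smooth(field, L_s):
    """Smooth a periodic cubic grid with the Gaussian window of Eq. (5).

    L_s is in pixel units. The Fourier transform of the normalised
    window is exp(-k^2 L_s^2 / 2), so the mean of the field is preserved.
    """
    n = field.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n)  # wavenumbers in radians per pixel
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    return np.fft.ifftn(np.fft.fftn(field) * np.exp(-0.5 * k2 * L_s**2)).real

def morse_index(hessian):
    """Number I of negative eigenvalues of a symmetric 3x3 Hessian.

    I = 3: clump (maximum), 2: filament, 1: pancake, 0: void (minimum).
    """
    return int(np.sum(np.linalg.eigvalsh(hessian) < 0))

def euler_characteristic(counts):
    """Eq. (4): alternating sum of critical point counts (N_0, N_1, N_2, N_3)."""
    n0, n1, n2, n3 = counts
    return n3 - n2 + n1 - n0

# Small demonstration on white noise (illustrative values only).
rng = np.random.default_rng(0)
raw = rng.standard_normal((16, 16, 16))
smoothed = gaussian_smooth(raw, L_s=2.0)
```

Working in Fourier space makes the periodic boundary conditions of the simulation box automatic, which is presumably why an FFT is the natural choice here.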

For clarity, let us recall here the interpretation of the shape of the Euler characteristic as a function of density threshold (CPS). Let us first study the simple case of a Gaussian random field (GRF). The analytic predictions for a GRF are given, for example, in Doroshkevich (1970) (see also Schmalzing & Buchert 1997). In what follows, a slightly different normalization from Eq. (4) is used: the volume independent quantity

$$ \chi_+ = \tilde{\chi}_+ / N_{\rm tot}, \qquad (6) $$

where N_tot = Σ_i N_{I=i} is the total critical point count in the volume considered, in the limit δ_TH → −∞.

In Figure 1, the numerical estimates of the Euler characteristic are given as a function of the density threshold, expressed as δ/σ, where δ = (ρ − ρ̄)/ρ̄ is the density contrast and σ = rms(δ), from five GRF realizations (points with error bars) whose power spectrum is given by a power-law with spectral index n = − , i.e. P(k) ∝ k^n. The result is compared with the analytic prediction (solid line). The shape of the curve as a function of the density threshold can be understood from Equation (4) and Figure 2, which displays the critical point counts. At very low values of the threshold (δ/σ ≲ − ) the excursion set includes almost all points and, due to the symmetry between high and low-density regions, the number of minima (respectively pancakes) compensates the number of maxima (respectively filaments), so that the Euler characteristic approaches zero. When the value of the threshold is increased, local minima first drop out of E_+, creating cavities and thus increasing the value of χ_+. At δ/σ ≳ − , pancakes start to drop out too and cavities connect together, thus the value of χ_+ decreases, reaching its minimum at δ/σ ≈ . In the range ≲ δ/σ ≲ filaments also drop out, breaking up the ridges to create isolated clusters, thus increasing χ_+ again.
Finally in the region δ/σ ≳ only clumps are found to lie in the excursion set, but they are progressively lost as the threshold increases, explaining the final decrease of the curve.

This simple analysis shows how the features seen in the Euler characteristic are closely related to the network of filaments and pancakes that connect clumps and voids.
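The rise, dip and second rise described above can be checked against the standard analytic shape for a three-dimensional GRF, whose excursion-set Euler characteristic density is proportional to (ν² − 1) exp(−ν²/2) with ν = δ/σ. This is a textbook result quoted here up to an overall constant; it is not taken from the text above and it ignores the N_tot normalization of Eq. (6). The sketch below locates the extrema of that shape numerically.

```python
import numpy as np

def grf_euler_shape(nu):
    """Unnormalised shape (nu^2 - 1) exp(-nu^2 / 2) of the GRF Euler characteristic."""
    nu = np.asarray(nu, dtype=float)
    return (nu**2 - 1.0) * np.exp(-0.5 * nu**2)

nu = np.linspace(-4.0, 4.0, 8001)       # thresholds in units of sigma
chi = grf_euler_shape(nu)

nu_min = nu[np.argmin(chi)]             # location of the central dip
positive = nu > 0
nu_max_pos = nu[positive][np.argmax(chi[positive])]  # high-density peak
```

Under this assumption the curve is symmetric in ν, dips at ν = 0, crosses zero at ν = ±1 and peaks at ν = ±√3, which is the qualitative sequence (cavities, then tunnels, then isolated clumps) described in the text.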

The Lyman-α absorption lines observed in QSO spectra and produced by the H I structures intercepted by the line-of-sight can be used to study the topology of the Universe at high redshift (z ≳ − ). However, the information derived from observations of QSO spectra is more directly related to the H I optical depth, whereas here the aim is to constrain the underlying dark matter density field for which theory makes direct predictions. Hence, one has to rely on simulations in order to calibrate the relation between the density field of the neutral hydrogen and that of the dark matter.

In this section we first present the hydrodynamical simulations used in the present work and we then analyse the shape of the PDF and the Euler characteristic of the three density fields (dark matter, gas and H I) and of the optical depth field, explaining how these curves are related.

We analyse a cosmological hydrodynamical simulation that evolves both dark matter particles and a gaseous component to study the global topology of the intergalactic medium at redshift z = 2. The dynamical evolution and the physical properties of the gas and of the H I component are calculated taking into account the heating and cooling processes and the effect of the ionizing UV background in a standard way. The corresponding Particle-Mesh (PM) code used to perform the simulation is described in detail in Coppolani et al. (2006).

In this run, the standard ΛCDM model is assumed with a set of cosmological parameters given by: Ω_m = 0. and Ω_Λ = 0. , while the assumed baryon density is Ω_b = 0. . The Hubble constant is H_0 = 70 km s⁻¹ Mpc⁻¹ and the amplitude of the fluctuations of the matter density field in a sphere of radius 8 h⁻¹ Mpc is σ_8 = 1. While the other cosmological parameters are roughly in agreement with recent observational constraints, the value of σ_8 is somewhat large compared to the value suggested by WMAP (see Spergel et al., 2007).
However, this should not have any impact on the results derived in this paper.

The simulation involved dark matter particles in a box with periodic boundary conditions of comoving size L_box = 40 Mpc. The gaseous component was also followed on a grid which was used to compute gravitational forces. Although this simulation marginally resolves the Jeans length of the gas, Coppolani et al. (2006) checked with higher resolution runs that numerical convergence was achieved at small scales.

Although grid points were available, this resolution was degraded to a 256³ resolution (using a standard donor cell procedure), in order to make the calculations more tractable. Obviously this additional smoothing makes the effects of subclustering within the Jeans length irrelevant. Therefore the gaseous component should be nearly indistinguishable from the dark matter component.

The main limit in these analyses remains the box size, which is still small and only allows for a fair statistical measure at scales L_s smaller than L_max ∼ L_box/10, i.e. 4 Mpc. Indeed finite volume effects are known to become significant for L_s ≳ L_max for standard statistics such as the probability function (see, e.g., Colombi, Bouchet & Schaeffer 1994) and the Euler number (see, e.g., CPS). For the reconstruction, the typical separation ⟨d_LOS⟩ between lines of sight defines a natural smoothing scale L_s ≃ ⟨d_LOS⟩. Note that, unfortunately, the upper bound of L_s ∼ Mpc corresponds to a lower bound on ⟨d_LOS⟩ in current state of the art observations (Rollinde et al. 2003), but one can expect to lower this limit in future surveys (Theuns & Srianand 2006). Hence the following analyses are performed in the range ≤ L_s ≤ Mpc.
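The scales quoted for the simulation follow from simple arithmetic, worked through below as a small sanity check. It assumes the measurement grid has N_pix = 256 cells per side (the resolution quoted in Section 2) and takes the factor 10 implied by L_box = 40 Mpc and L_max = 4 Mpc; the variable names are illustrative only.

```python
L_BOX_MPC = 40.0   # comoving box size quoted in the text
N_PIX = 256        # grid cells per side (resolution quoted in Section 2)

pixel_mpc = L_BOX_MPC / N_PIX        # comoving size of one grid cell
slice_6_pix_mpc = 6 * pixel_mpc      # thickness of a 6-pixel slice
l_max_mpc = L_BOX_MPC / 10.0         # largest smoothing scale with fair statistics

print(f"pixel size    : {pixel_mpc * 1e3:.2f} kpc")
print(f"6-pixel slice : {slice_6_pix_mpc:.2f} Mpc")
print(f"L_max         : {l_max_mpc:.1f} Mpc")
```

The 6-pixel thickness of about 0.94 Mpc agrees with the slice width quoted in the caption of Figure 7, which supports the assumed grid size.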

In this Section we compare the large-scale distribution and the topological properties of the different density fields (dark matter, total amount of gas and neutral gas) by looking at their probability distribution function (PDF) and their Euler characteristic (χ_+). Our knowledge of the physics of the intergalactic medium is used to perform the analysis and to link the distributions of H I and H. Indeed, the observations give access to the H I optical depth through absorption spectra. We also consider thermal broadening and redshift distortion effects.

H I: IGM equation of state

It is well known that on scales larger than the Jeans length the distribution of the gas follows the distribution of dark matter, so that their statistical and topological properties are expected to be the same at these scales. This is checked by comparing the PDF and the Euler characteristic of the two density fields smoothed using different values of L_s, as shown in Figs. 3 and 4. Note the agreement of the PDFs and of the Euler characteristics of the two fields for all values of the smoothing scale considered, a result which can be expected since the scaling regime probed is largely above the Jeans length of the gas.

The comparison of the distribution of the neutral gas (H I) with that of the total amount of gas and the dark matter calls for a slightly more elaborate approach, given the non-linearity involved in the expression that relates the distribution of the gas to the distribution of H I. In fact, numerical simulations support the idea that a tight correlation exists between neutral and total hydrogen density (Cen et al. 1994, Miralda-Escudé et al. 1996, Theuns et al. 1998, Viel, Haehnelt & Springel 2004). This correlation is expected to follow a power-law of the form

$$ \rho_{\rm gas} \approx A \, (\rho_{\rm HI})^{\alpha}. \qquad (7) $$

We thus introduce here a new density field ρ̃_HI defined as the right-hand-side of Equation (7), so that ρ̃_HI ≡ A (ρ_HI)^α.
In what follows this new density field will be used in order to approximate the density of the gas. However, Equation (7) is not fulfilled in the whole range of ρ_HI values.

To illustrate this, Fig. 5(a) displays the gas density distribution (top), with its network of filaments outlined in the left panel with a contour corresponding to δ = 1, and the temperature distribution in units of K (bottom) for which we have drawn the contour corresponding to (T/ ) = 2 in the left panel. Note that along filaments and at their intersection the gas is hot. This indicates that shock waves propagate along filaments, raising the temperature and ionizing the gas. This is confirmed by Fig. 5(b) which shows the ratio R = ρ̃_HI/ρ_gas measured directly in the grid (top), and after a Gaussian smoothing with a window whose size is L_s = 2. Mpc (bottom). In both cases, the panels on the left show the contours relative to R = 0.7. To complete the picture, let us consider Fig. 6 which shows the scatter between ρ_gas and ρ_HI for different smoothing scales, as indicated in each panel. As expected, the tightness of the correlation is very high in underdense and moderately dense regions, but shock heating on the one hand and the formation of condensed objects on the other produce a significant scatter (where R < ) along densest filaments and at their intersection (in clusters). For the purpose of the reconstruction, some smoothing is required. Unfortunately, smoothing also mixes these regions with the un-shocked part of the intergalactic medium. This is confirmed, in a qualitative way, by the slices shown in Fig. 5(b). More quantitatively, for the fields R shown in Figure 5(b) we have calculated the fraction of the volume occupied by the regions with R < 0.7 (i.e. the volume of the regions enclosed by the contours in the left panels). For the slice shown, this fraction is f(R < 0.7) = 0. and f(R < 0.7) = 0. for the un-smoothed and the smoothed case respectively, while when the whole three dimensional boxes are considered, the fractions of volume occupied by shocked regions are f(R < 0.7) = 0. and f(R < 0.7) = 0. . As a result of such mixing, the tightness of the correlation is weakened, but remains good as shown in Fig. 6. However, the best fit values of the parameters A and α change slightly when the field is smoothed (see Table 1). We fit these values with the low density tail of the PDF (see Figure 3). As expected, the match of the higher density tail worsens with smoothing.

Given that the scaling relation (7) is monotonic, should it apply exactly, the topology of the neutral gas should be exactly the same as that of the total gas/matter distribution. However, given the dispersion of this relation, one expects the Euler characteristic of the ρ̃_HI field to depart from that of ρ_gas for large density contrasts. This is confirmed by Fig. 4: a nearly perfect agreement is found between the gas and H I for δ ≲ , while differences become significant at larger values of the density contrast. Increasing the smoothing length (i.e. going from left to right in Fig. 4) worsens the match, as expected, but this is in part lost in the noise due to finite volume effects. Note that χ_+ measured in H I is, in the δ > regime, more peaked than for the total gas. This agrees with intuition, since galaxies form in filaments: in these highly condensed objects, gas concentrates and cools down. Hence the H I density becomes significant again inside these clumps, but is depleted in their surroundings due to shock heating as can be seen from Fig. 5. The resulting distribution of H I in filaments is therefore expected to be more clumpy than the total gas, i.e. less efficiently connected, resulting in a larger increase of χ_+ for δ > . The estimates made inside filaments are however certainly not free of numerical artefacts since they are limited by the simulation's spatial resolution (following accurately the formation of condensed objects requires much higher spatial resolution than our simulation). Therefore although one can definitely trust the δ ≲ measurements, the results derived for δ > are likely to yield the right qualitative behavior, but are certainly quantitatively biased.

In the next Sections, the δ > disagreement will be ignored and it will be assumed that the scaling relation (7) is always valid, keeping in mind the limitation of such an assumption. Hence, reconstructions will be performed on the optical depth without attempting to directly recover the gas distribution.

Figure 3. [Panels: unsmoothed, L_s = 1.6 Mpc, L_s = 2.2 Mpc, L_s = 3.6 Mpc. Axes: PDF versus δ.] Probability distribution function of density fields at different smoothing scales (from left to right, top to bottom: no smoothing, L_s = 1.6, 2.2 and 3.6 Mpc). The thick solid, thick dashed and thin solid curves correspond to dark matter, gas and H I (rescaled according to Eq. 7), respectively. The dotted curve is a best fit of a lognormal distribution to the thin solid curve, showing that all these PDFs are reasonably close to lognormal, a property that will be useful for the reconstruction. In the unsmoothed case, the gas and H I PDFs match very well for δ ≲ but depart from each other at higher density. The apparent very good match in the unsmoothed case comes from the fact that the un-shocked part of the intergalactic medium totally dominates the part of the PDF which is visible in this panel. The match between H I and gas PDFs decreases with increasing smoothing scale, due to "mixing effects", as explained in the main text. Note finally that the dark matter is not displayed in the unsmoothed panel because the result would be contaminated by the cloud-in-cell interpolation used to compute the density on the grid.

Figure 4. Same as in Fig. 3 but for the measured Euler characteristic as a function of density threshold at different smoothing scales. Again, the thick solid, thick dashed and thin solid curves correspond respectively to dark matter, gas and H I (rescaled according to Eq. 7 with the values (A, α) given in Table 1). While the curves for the dark matter and for the gas superpose exactly at all smoothing scales and all values of the threshold, the H I, even after the scaling is applied, behaves in a different way in the high density region. As explained in the text, this is a consequence of the presence of shocks and condensed objects, whose effect is a change in connectivity properties.

Figure 5. [Panels: (a) gas density and temperature; (b) ratio R = ρ̃_HI/ρ_gas, unsmoothed and smoothed.] Top: Gas density and temperature (in units of K) spatial distributions in a one pixel (≈ . kpc) slice. The intensity of the fields is color-coded with the scale given on the right. The panels on the left give the contours corresponding to δ = 1 for the density and to T/ = 2 for the temperature. Bottom: For the same slice as above we show on the right the spatial distribution of the ratio R = ρ̃_HI/ρ_gas for the unsmoothed field (up) and for the field smoothed with a Gaussian window of size (FWHM) L_s = 2. Mpc (down). The color scale is such that darker regions correspond to low values of R. On the left the contours correspond to R = 0.7.

Figure 6. Scatter plots displaying the relation between the gas density and the H I density at different smoothing scales. The dashed black lines in each panel represent the best fit, following Eq. (7), with the parameters (A, α) given in Table 1. Note that the dispersion increases when the smoothing scale increases, due to the mixing effect discussed in Figure 5.

L_s          A       α
Unsmoothed   1.275   0.63
1.6 Mpc      0.915   0.56879
2.2 Mpc      0.85    0.55209
3.6 Mpc      0.795   0.5389

Table 1. Values of the parameters A and α entering in the scaling relation between gas and H I (see Eq. 7) as a function of the smoothing scale L_s.
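A fit of the scaling relation of Eq. (7) can be sketched as a straight-line fit in log-log space. The example below is only an illustration on synthetic data: it assumes a plain least-squares fit over all points, whereas the text states that the parameters were fitted on the low density tail of the PDF, and it uses the unsmoothed values of Table 1 (A = 1.275, α = 0.63) merely as the input truth for the mock sample. The function name `fit_power_law` is hypothetical.

```python
import numpy as np

def fit_power_law(rho_hi, rho_gas):
    """Least-squares fit of log(rho_gas) = log(A) + alpha * log(rho_HI).

    Returns the pair (A, alpha) of Eq. (7).
    """
    alpha, log_a = np.polyfit(np.log(rho_hi), np.log(rho_gas), 1)
    return np.exp(log_a), alpha

# Synthetic, roughly lognormal H I densities with a small multiplicative scatter.
rng = np.random.default_rng(1)
rho_hi = rng.lognormal(mean=0.0, sigma=1.0, size=5000)
scatter = np.exp(0.02 * rng.standard_normal(rho_hi.size))
rho_gas = 1.275 * rho_hi**0.63 * scatter

A, alpha = fit_power_law(rho_hi, rho_gas)
rho_tilde = A * rho_hi**alpha   # rescaled field, as in the definition below Eq. (7)
```

On real simulation outputs one would presumably restrict the fit to underdense and moderately dense cells, where the text reports the correlation to be tight.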

Figure 7. The effect of redshift distortion on the H I density. The same slice (whose width is 6 pixels, corresponding to 0.94 Mpc) of the H I density contrast is shown without distortion (left panels) and with distortion (right panels, using the infinitely distant observer approximation with a distortion along the x axis) in the case where the fields are not smoothed (top panels) and when the fields are smoothed at L_s = 1. Mpc (bottom panels).

Figure 8. Effect of redshift distortion on the Euler characteristic of H I at different smoothing scales: L_s = 1.6 Mpc, 2.2 Mpc and 3.6 Mpc. The solid black line is for H I without any distortions, the dashed yellow line has been obtained by including only the effect of peculiar velocities, while for the red thin line both redshift distortion and thermal broadening are taken into account.

H I to optical depth: redshift space distortions and thermal broadening

In the above discussion we argued that the main features of dark matter topology, as traced through the Euler characteristic and the probability density function, can be recovered through the topology of the H I for small density contrasts, δ ≲ . However, along a line-of-sight, the optical depth is in fact observed in redshift space, where distortions induced by the peculiar motions operate. Moreover, the profiles of absorption lines are broadened at small scales by the effect of the temperature. Since the thermal broadening is important only at scales of the same order as or smaller than the Jeans length, this second effect should be negligible in the scaling range considered in this paper, since it will be swept out by the smoothing. On the contrary, redshift distortion should a priori not be neglected.

In theory, it is possible to partially correct for redshift distortion effects (see for instance PVRCP). However, the corresponding treatment of the peculiar velocities involves a simultaneous deconvolution of the H I density field with the velocity field on top of the inversion discussed in the next section. This requires not only a prior for the density field but also for its correlation with the peculiar velocity field, and makes the inversion quite involved, which would go beyond the scope of this paper.
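Applying the distortion itself, as opposed to correcting for it, is straightforward in the infinitely distant observer approximation used for Figure 7. The sketch below shifts comoving positions along the x axis by the line-of-sight peculiar velocity and wraps them in the periodic box. It is a generic illustration and not the procedure of the paper: it acts on a list of positions rather than on the gridded H I field, and the conversion factor `a_hubble` (a·H(z), in km/s per comoving Mpc) is an assumption that the caller must supply for the cosmology and redshift at hand.

```python
import numpy as np

def to_redshift_space(x, v_x, a_hubble, box_size):
    """Map comoving positions to redshift space along the x axis.

    x        : comoving positions along the line of sight [Mpc]
    v_x      : peculiar velocities along the same axis [km/s]
    a_hubble : a * H(z) in km/s per comoving Mpc (assumed given)
    box_size : periodic box length [Mpc]
    """
    shifted = np.asarray(x, dtype=float) + np.asarray(v_x, dtype=float) / a_hubble
    return np.mod(shifted, box_size)   # periodic wrap

# Illustrative numbers only; they do not correspond to the simulation.
x = np.array([1.0, 20.0, 39.5])
v = np.array([0.0, 100.0, 200.0])
s = to_redshift_space(x, v, a_hubble=100.0, box_size=40.0)
```

Coherent infall towards overdensities compresses structures along x in this mapping, which is the origin of the enhanced contrast of structures orthogonal to the line of sight.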
In what follows, it is shown that redshift distortions in fact have a small effect on the topology of the overall density distribution at the probed scales; they are therefore neglected in the reconstruction part of this work. Moreover, one of the interesting outcomes of the reconstruction is a prediction of the positions of filaments in the three-dimensional matter distribution. The cross-correlation of such a distribution with, for instance, the observed distribution of galaxies at high redshift can in fact also be performed in redshift space.

Figure 7 displays the H I distribution in real and redshift space, with and without smoothing. The main effect of redshift distortion on H I is an enhancement of large-scale density contrasts orthogonal to the line of sight, due to large-scale motions (the so-called Kaiser effect, e.g. Kaiser 1987): the “voids” (underdense regions) are more pronounced, and the filaments orthogonal to the LOS are more contrasted. There is also a small-scale “finger of god” effect, due to internal motions inside large dark matter haloes, but it is not very pronounced at such a high redshift, and its amplitude is of the same order as that of thermal broadening. Note, however, that non-trivial shell crossings can still occur, e.g. two filaments crossing each other because of peculiar velocities; this effect remains small and is clearly damped out by smoothing, after which only the Kaiser effect remains.

These qualitative arguments are illustrated by Figure 8. The Euler characteristics measured before and after redshift distortion differ only slightly. When redshift distortion is taken into account, for δ ≲ , a shift towards the left is induced (dashed curve) as compared to the non-distorted case (solid curve), while the opposite occurs for δ ≳ (although in the latter case the effect seems to be nearly insignificant). This shift remains quite small, as argued before.
Note as well that thermal broadening (thin curve) is totally negligible.

Finally, one last point should be mentioned. When one considers real absorption spectra, instrumental noise has to be taken into account in the analysis. This noise, combined with the saturation of the flux of the Ly-α absorption lines arising in high-density regions (with δ ≳ ), can complicate the interpretation of the measurements. In this case, some of the information about the intensity of the density field cannot be recovered unless, say, Lyman-β is also available. In this work, however, the main interest lies in reproducing the low-density part of the H I distribution, for which relation (7) holds and for which the topology of the underlying dark matter distribution is theoretically constrained. In this regime the Ly-α lines are not saturated, so a complete treatment of saturation effects in high-density regions is not required for the aim of the present work.

The absorption spectrum towards a quasar gives access to one-dimensional information, i.e. the optical depth along the line of sight (LOS) towards the QSO. However, if a set of LOSs towards a group of quasars is available, the information along each LOS can be interpolated to construct a three-dimensional optical depth field.

In this Section we first briefly outline the inversion technique implemented to recover the optical depth and describe how to set the parameters that enter the inversion procedure. We then check how the reconstruction performs by measuring various statistical quantities, in particular the PDF of the density field and its Euler characteristic. As argued in the previous Section, the focus is on the optical depth: no attempt is made to recover the gas or dark matter distribution directly. Thermal broadening, redshift distortion and the effects of saturation or instrumental noise are neglected. Given these assumptions, studying the optical depth distribution is equivalent to studying the H I density distribution, ρ_HI.
The technique used to interpolate the optical depth field between lines of sight is described and discussed in detail in PVRCP.

Let D be a 1-dimensional array representing the data set (i.e. the values of γ_LOS = ln(ρ_HI) along the LOSs, which we assume to be parallel to each other); we call M the 3-dimensional array of the parameters that need to be estimated by fitting the data (here the values of γ = ln(ρ_HI) in the 3-dimensional volume). Assuming that the noise is uniform and uncorrelated, Wiener interpolation reads (see Eq. 20 of PVRCP)

\[ \mathbf{M} = \mathbf{C}_{MD} \cdot (\mathbf{C}_{DD} + \mathbf{N})^{-1} \cdot \mathbf{D}, \tag{8} \]

where N = n² I is the diagonal noise contribution, C_MD is the mixed parameters-data covariance matrix and C_DD is the data covariance matrix:

\[ \mathbf{C}_{MD} = \mathbf{C}_{\gamma \gamma_{\rm LOS}}, \qquad \mathbf{C}_{DD} = \mathbf{C}_{\gamma_{\rm LOS} \gamma_{\rm LOS}}. \tag{9} \]

Here an ad hoc prior is used and a Gaussian shape is assumed for the covariances. In this case the matrices C_{γ γ_LOS} and C_{γ_LOS γ_LOS} are given by

\[ C(x_1, x_2, \mathbf{x}_{1\perp}, \mathbf{x}_{2\perp}) = \sigma^2 \times \exp\left( -\frac{(x_1 - x_2)^2}{L_x^2} \right) \times \exp\left( -\frac{|\mathbf{x}_{1\perp} - \mathbf{x}_{2\perp}|^2}{L_T^2} \right), \tag{10} \]

where (x_i, x_i⊥) represents the coordinates of the points along and perpendicular to the LOSs respectively, L_x and L_T are the correlation lengths along and perpendicular to the LOSs, and σ quantifies the typical a priori fluctuations in a volume of size L_x × L_T². The meaning and choice of these parameters will be discussed further in § 4.2. The interpolation thus depends on the signal-to-noise ratio, σ/n, and on two typical lengths, L_x and L_T.
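As an illustration of Eqs. (8) to (10), the following is a minimal sketch of Wiener interpolation with a Gaussian prior covariance, written with the Python standard library only. It is not the PVRCP implementation: the function names, the dense Gaussian-elimination solver and the toy data values are assumptions made for this example, and a realistic reconstruction would involve far larger matrices.

```python
import math

def gaussian_cov(p, q, sigma, L_x, L_T):
    """Prior covariance of Eq. (10); p and q are (x, y, z) with x along the LOS."""
    dx2 = (p[0] - q[0]) ** 2
    dperp2 = (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
    return sigma ** 2 * math.exp(-dx2 / L_x ** 2) * math.exp(-dperp2 / L_T ** 2)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w

def wiener_interpolate(data_pts, data_vals, model_pts, sigma, noise, L_x, L_T):
    """Eq. (8): M = C_MD (C_DD + N)^-1 D, with N = noise^2 * identity."""
    n = len(data_pts)
    c_dd_plus_n = [[gaussian_cov(data_pts[i], data_pts[j], sigma, L_x, L_T)
                    + (noise ** 2 if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
    w = solve(c_dd_plus_n, data_vals)          # w = (C_DD + N)^-1 D
    return [sum(gaussian_cov(m, data_pts[j], sigma, L_x, L_T) * w[j]
                for j in range(n)) for m in model_pts]

# Toy usage: two parallel LOSs along x, separated by 2 Mpc in y.
data_pts = [(x, y, 0.0) for y in (0.0, 2.0) for x in (0.0, 0.5, 1.0)]
data_vals = [0.2, 0.4, 0.1, -0.3, -0.1, 0.0]   # made-up gamma_LOS values
print(wiener_interpolate(data_pts, data_vals, [(0.5, 1.0, 0.0)],
                         sigma=1.0, noise=1.0, L_x=0.5, L_T=2.0))
```

With a single data point and σ = n, the estimate at the data location is half the measured value, which shows how the ratio σ/n controls the weight given to the data relative to the prior.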

Each reconstruction is performed on a number N_LOS of LOSs extracted at random from the simulation box. Since the distant observer approximation is implemented, all the LOSs are parallel. For a given value of N_LOS, the mean inter-LOS distance, <d_LOS>, reads

\[ \langle d_{\rm LOS} \rangle \equiv \sqrt{L^2 / N_{\rm LOS}}. \tag{11} \]

This parameter obviously defines a natural scale in the reconstruction: intuitively, one cannot expect to reconstruct details of the distribution at scales ≲ <d_LOS>, at least in the direction orthogonal to the LOSs.

The meaning of the parameters L_T and L_x in Eq. (10) is then quite straightforward. The correlation lengths L_T and L_x stabilize the inversion by ensuring the smoothness of the reconstruction. In order to avoid the formation of fictitious structures, the transverse correlation length must be of the order of the mean separation between the LOSs, L_T ∼ <d_LOS> (we have chosen to take it exactly equal to <d_LOS>), while the choice of the longitudinal correlation length depends on the problem considered. Since redshift distortion is not a concern in the present work, this parameter can be chosen to be of the order of the Jeans length in order to avoid information loss at small scales along the LOSs, here L_x = 0 . Mpc.

From a practical point of view, the variance parameter σ of the correlation matrix fixes the relative contribution of signal to noise in Eq. (8), σ/n. In our reconstruction, only ideal LOSs are considered. Thus, strictly speaking, there is no instrumental noise and there are no saturation effects. However, the inversion of the matrix C_DD + N is numerically unstable when N is set to zero, given the finite sampling and the degeneracy of the matrix, Eq. (10), close to its diagonal, (x_1, x_1⊥) ≃ (x_2, x_2⊥). In practice one has to “tune” the signal-to-noise ratio, σ/n, to obtain the best compromise between numerical stability and “exactness” of the final reconstruction.
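Equation (11) can be checked numerically against the transverse correlation lengths L_T = <d_LOS> listed in Table 2. The short sketch below assumes a box of side L = 40 Mpc; this value is not quoted in this section and is inferred here from the L_T column of Table 2, so it should be read as an assumption.

```python
import math

BOX_SIZE_MPC = 40.0  # assumption: inferred from Table 2, not quoted in this section

def mean_los_separation(n_los, box_size=BOX_SIZE_MPC):
    """Eq. (11): <d_LOS> = sqrt(L^2 / N_LOS), in the units of box_size."""
    return math.sqrt(box_size ** 2 / n_los)

# Values of N_LOS used in the reconstructions of Table 2.
for n_los in (400, 320, 225, 200, 145, 120, 100):
    print(n_los, round(mean_los_separation(n_los), 2))
```

Under this assumption, the printed separations can be compared row by row with the L_T column of Table 2.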
The choice of signal-to-noise ratio made here is ad hoc: (σ/n)² is the estimated variance σ²(L_T, L_x) of the underlying field in a box of size L_T × L_T × L_x. This is equivalent to assuming that, as the noise goes to zero, the inverse of the non-reduced second-order correlation (in the appropriate units), I + C_DD, is used instead of the reduced one, C^{-1}, to perform a stable reconstruction.

In this work we estimate σ(L_T, L_x) directly from the simulation. It is however important to note that σ(L_T, L_x) can in principle be derived from the LOSs alone by measuring the 1-D power-spectrum of ρ_HI. From this 1-D power-spectrum one can indeed infer a 3-D power-spectrum with standard deconvolution methods, and then estimate σ(L_T, L_x) through the appropriate integral over the 3-D power-spectrum.

The measured values of σ(L_T, L_x) are listed in Table 2. They are of order unity: the assumed signal-to-noise ratio is about one in the regime considered here. Hence, in practice, the ad hoc procedure used to perform the inversion would not change significantly if the contribution of actual instrumental noise were included. In that case, however, the presence of saturated regions in the Lyman-α spectra remains a problem.

Finally, note that due to the large size of the matrices, reconstructions can only be performed by partitioning the simulation box into smaller blocks, each containing N_sub³ grid points with N_sub = 32. The reconstruction is performed on each block individually. In order to avoid edge effects, neighbouring patches are overlapped by adding a buffer region in which LOSs still contribute. In this way, the a priori correlations ensure continuity between adjacent patches. The size of the buffer region is chosen to be n_over ≃ L_T (in grid pixel units), which implies a typical residual contamination from edge effects due to the partitioning of less than 2 percent.

Table 2. Parameters used in the reconstructions performed in this paper. The longitudinal correlation length has been fixed to the value L_x = 0 . Mpc for all the reconstructions.

N_LOS   Separation (arcmin)   L_T (Mpc)   σ
400     1.33                  2           1.12
320     1.49                  2.24        1.17
225     1.77                  2.67        1.23
200     1.88                  2.83        1.25
145     2.2                   3.32        1.29
120     2.42                  3.65        1.31
100     2.65                  4           1.34

First note that the inversion is not performed directly on the density but on its logarithm, i.e. the normalized density field is written as ρ_HI = 1 + δ_HI ≡ exp(γ_HI) and the field γ_HI is interpolated using the method described above. This has two advantages: (i) it ensures the positivity of the reconstructed density; (ii) since the density turns out to be roughly log-normal (see for example Bi & Davidsen, 1997; Choudhury, Padmanabhan & Srianand, 2001; Viel et al., 2002a; Zaroubi et al., 2006; Desjacques et al., 2007; and see Coles & Jones, 1991, for the statistical properties of the log-normal distribution), performing the reconstruction on the logarithm of the field is expected to produce more realistic results, as shown in Fig. 3.

However, as a result of the reconstruction, the recovered field, γ_HI,rec, will be smooth over anisotropic volumes of size ∼ L_x × L_T × L_T, which means that, at best, one can identify structures at this level of smoothness in logarithmic space. Although theoretical predictions (gravitational clustering, primordial non-Gaussianities, etc.) do exist for the density field itself, that is for ρ_HI = exp(γ_HI) and its smoothed counterparts, they cannot be applied directly in our case because smoothing and taking the exponential are operations which do not commute, except in the weakly nonlinear regime, δ_HI ≪ 1. In particular, when the results for δ_HI are recovered in linear space by taking the exponential of γ_HI and subsequently smoothing it, an effective bias is introduced, essentially due to rare peaks in the γ_HI,rec field.
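The statement that smoothing and exponentiation do not commute can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a one-dimensional white-noise Gaussian field as a stand-in for γ, and a periodic moving average instead of the Gaussian window used in the text. By Jensen's inequality, smoothing exp(γ) always gives a value at least as large as exponentiating the smoothed γ, and the difference is driven by the rare high peaks.

```python
import math
import random

def box_smooth(values, half_width):
    """Periodic moving average; a simple stand-in for a Gaussian window."""
    n = len(values)
    width = 2 * half_width + 1
    return [sum(values[(i + k) % n] for k in range(-half_width, half_width + 1)) / width
            for i in range(n)]

def mean(values):
    return sum(values) / len(values)

random.seed(0)
gamma = [random.gauss(0.0, 1.0) for _ in range(2000)]   # toy log-density field

# Order 1: smooth in logarithmic space, then exponentiate.
exp_of_smoothed = [math.exp(g) for g in box_smooth(gamma, 5)]
# Order 2: exponentiate, then smooth in linear space.
smoothed_of_exp = box_smooth([math.exp(g) for g in gamma], 5)

print("mean of exp(smooth(gamma)):", mean(exp_of_smoothed))
print("mean of smooth(exp(gamma)):", mean(smoothed_of_exp))
```

The two means differ noticeably for a field with unit variance, and the gap shrinks as the variance of γ is reduced, in line with the weakly nonlinear limit mentioned above.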
The effect of such a nonlinear bias is difficult to control and can in some cases be important, as shown below and studied in more detail in Appendix A.

Note that if a field such as ρ_gas is lognormal, the (inverse of the) transformation (7) leaves the new field, e.g. ρ_HI, lognormal as well.

We now test the quality of the reconstruction using the same statistical tools as in Section 3, namely the PDF of the field and the Euler characteristic. Other statistics are considered, such as the variance and the skewness of the PDF, the power-spectrum of the density field and the filling factor of regions less dense than the minimum of the Euler characteristic. In addition, to have a quantitative estimate of the accuracy in the locus of the filamentary structures, we use the skeleton as introduced by Novikov, Colombi & Doré (2006) and by Sousbie et al. (2007) and define an inter-skeleton distance (ISD).

Following the discussion in Section 4.3, the reconstruction is mainly tested on the field γ_HI and its smoothed counterparts. In Appendix A we provide additional results on the field δ_HI.

Since there are two scales in the inversion (see Section 4.2), the recovered optical depth is an anisotropic smooth field with fewer structures in the direction transverse to the LOSs than in the direction parallel to them. An optimal comparison between the reconstructed and real optical depth would require an approach based on anisotropic smoothing, a level of complexity well beyond the scope of this paper. Instead, to compare the reconstructions to the exact solution, isotropic smoothing with a Gaussian window is used (see Eq. 5). The width of the smoothing window is L_s ≳ <d_LOS> = L_T. The choice of the optimal smoothing scale is constrained by the inter-skeleton distance.

[Figure 9 panel groups: raw fields (upper); smoothed fields in logarithmic space (lower left); smoothed fields in linear space (lower right).]

Figure 9.

Qualitative comparison between the original H I density field in terms of γ = ln(1 + δ) (left column in each group of 4 panels) and the recovered one (right column in each group of 4 panels) in a thin slice (the thickness of the slice is 8 pixels, corresponding to 1.25 Mpc). Higher densities correspond to darker colors. The recovered field has been obtained by inverting a set of N_LOS = 320 random LOSs (mean separation <d_LOS> = 2.24 Mpc) taken through the original (unsmoothed) density field. In each group of panels, the first row corresponds to a slice orthogonal to the LOSs, while the second row corresponds to a slice parallel to the LOSs. Upper group: the raw γ fields for the original box and for the reconstruction. Lower left group: same as the upper group but after smoothing (in logarithmic space) with a Gaussian window of radius L_s = √ <d_LOS> = 3 . Mpc. Lower right group: same as the lower left one, but smoothing is now applied directly to the density field, δ = exp(γ), instead of its logarithm, and the normalization is slightly different (see Eq. A1 of Appendix A).

[Figure 10 panels, one column per reconstruction (N_LOS = 320, 200, 120); rows from top to bottom: P(k) versus |k|, then PDF, Euler characteristic and critical points versus ln(1 + δ_HI).]

Figure 10.

Statistics and topology in logarithmic space, i.e. in terms of γ_HI = ln(1 + δ_HI), for three different reconstructions performed with N_LOS = 320, N_LOS = 200 and N_LOS = 120, from left to right respectively.

First row of panels: the power-spectrum of γ_HI as a function of wavenumber, indicated at the top. In each panel, the thin dashed curve represents the power spectrum of the original field while the thick solid line is the power spectrum of the (unsmoothed) recovered field. The light shaded region corresponds to the scatter between five realizations of Gaussian random fields (GRFs) with the same power spectrum as the reconstruction. The wavenumber k is expressed in units of the inverse of the pixel size multiplied by π, corresponding roughly to k ≃ /L (Mpc). Second row of panels: the probability distribution function as a function of γ_HI = ln(1 + δ_HI) (as indicated at the bottom), after smoothing γ_HI with a Gaussian window of size L_s = √ <d_LOS>. Solid thick and dashed thin lines correspond to the recovered and the original fields, respectively. The light shaded region in each panel represents the scatter derived from the five GRFs, while the big dots correspond to Gaussian profiles with the same mean and variance as the smoothed recovered fields.

Third row of panels: same as the second row, but for the Euler characteristic.

Fourth row of panels: same as the third row, but for the individual critical point counts. In that case, the thick and thin lines correspond to the recovered and original fields, respectively. The solid, dashed, dotted and dot-dashed lines correspond respectively to minima, pancake saddle points, filament saddle points and maxima.

[Figure 11, panel (b): ISD (in units of the smoothing length) versus N_LOS.]

Figure 11.

Left panel: Comparison between the skeletons of the original field (light lines) and its recovered counterpart (darker lines) (the skeletons represented here are the true ones, and not their local approximations, as defined in the main text). The recovered field was obtained by inverting N_LOS = 320 lines of sight, corresponding to a separation <d_LOS> = 2.24 Mpc. Both skeletons are computed on fields smoothed over a scale L_s = 3 . Mpc, in logarithmic space. For clarity, only a Mpc slice is shown, the background contours representing the original smoothed density field (lighter colors corresponding to higher densities).

Right panel: Evolution of the inter-skeleton distance (ISD) between the original and reconstructed fields as a function of the number of lines of sight, N_LOS. The ISD is computed after smoothing over a scale L_s = 3 . Mpc, which is roughly equivalent to the lowest resolution reconstruction sample. The upper (crosses) and lower (squares) curves represent the measured median distance from the reconstructed field skeleton to the original one and vice versa, respectively, while the dotted curve represents their average value.

One of the uncertainties in the reconstruction involves the determination of the mean value of the field, µ_true ≡ <γ_HI> ≡ <ln ρ_HI>, which can in principle be estimated only along the LOSs. To improve the quality of the reconstruction, its average is fixed to µ_true:

\[ \gamma_{\rm HI,rec} = \tilde{\gamma}_{\rm HI,rec} - \langle \tilde{\gamma}_{\rm HI,rec} \rangle + \mu_{\rm true}. \tag{12} \]

In practice, the knowledge of µ_true is expected to be accurate, even though its actual measured value, µ_LOS, is determined along the LOSs. For instance, in the worst case considered in this work, N_LOS = 100, <(µ_LOS − µ_true)²>^{1/2} / |µ_true| ≃ . , where the mean value of the difference between the measured and the real µ has been calculated by averaging over 100 different realizations of 100 LOSs.

Note that the inversion formula, equation (8), can be amended to impose this constraint directly, following equation (11) of PVRCP, by including this information in M.

When the analyses are performed in linear space, the normalization is different, as discussed in Appendix A.

A first qualitative comparison between the original and the recovered fields in logarithmic space can be made by examining Figure 9. The top panels illustrate the anisotropic nature of the reconstruction. Smoothing at a scale L_s ≳ <d_LOS> (for example, in the case of Fig. 9, L_s = √ <d_LOS>) greatly improves the agreement between the reconstruction and the exact solution, and the two fields become almost indistinguishable (bottom left panels). When one examines in detail where the reconstruction fails, one notices that these structures correspond to overdense regions. The fine nature of the web formed by overdense regions (filaments, clusters) makes the reconstruction more difficult for these regions than for the underdense ones, because of the sparse sampling of the transverse structures.

When going to linear space, i.e. taking the exponential of the fields and subsequently smoothing them, the effect caused by the amplification of rare events discussed in §

The good agreement between the original and the recovered fields in logarithmic space is confirmed by the first row of panels in Fig. 10, which shows the power-spectrum, P(k) = <|γ_k|²>, of the raw fields, γ_HI and γ_HI,rec, for three reconstructions (N_LOS = 320, 200, 120, corresponding respectively to <d_LOS> = 2.24, 2.83 and 3.65 Mpc). We also show five realizations of a Gaussian random field (GRF) with the same P(k) as γ_HI,rec, in order to estimate finite volume effects. As expected, the filtering nature of the reconstruction introduces an apodization effect on P(k), visible in Fig. 10: a bending of P(k) is expected to happen roughly for k ≃ k_bend ≡ π/L_T, i.e., k_bend = 0 . , . , . from the upper left to the upper right panel, respectively, in the units chosen in Fig. 10. It is not straightforward to check this property accurately by visual inspection. Indeed, when N_LOS decreases, the small-k part of the reconstructed power-spectrum becomes less well correlated with the true P(k), giving the illusion, for example, that overall the N_LOS = 120 reconstruction does better than the N_LOS = 320 one. Still, for k ≲ k_bend, i.e., L_s ≳ L_T = <d_LOS>, there is good agreement between the reconstruction and the exact solution. However, the measurement of the power-spectrum is neither accurate enough nor informative enough to guarantee that filaments are well reconstructed in detail, as we examine now.

[Figure 12 panels, logarithmic space: RMS (top) and skewness (bottom) versus separation [Mpc]; legend: Original, Recovered, Gaussian.]

Figure 12.

Variance (top panel) and skewness (bottom panel) of γ_HI = ln(1 + δ_HI) for the original (open crosses) and the recovered (filled crosses) fields and for the Gaussian prediction (light dots), as functions of the LOS separation, <d_LOS>. The symbols correspond to measurements performed on the γ fields smoothed with a Gaussian window of size L_s = √ <d_LOS>. As a proxy for the error bars, we used the measurements at smoothing scales L_s = <d_LOS> and L_s = √ <d_LOS>, except for the Gaussian fields, where the dispersion among the 5 realizations is a better choice. For the Gaussian case, the skewness should be exactly zero. The measurements are consistent with that value, despite the large dispersion at the largest scales, due to finite volume effects.

[Figure 13 panel, logarithmic space: filling factor versus separation [Mpc]; legend: Original, Recovered.]

Figure 13.

Same as in Fig. 12 but for the filling factor of underdense regions at the minimum of the Euler characteristic. The Gaussian limit, not shown here, would give a filling factor exactly equal to 0.5.

[Figure 14 panels, logarithmic space: contours of underdense regions; both axes in Mpc.]

Figure 14.

Contours of underdense regions estimated from the minimum of the Euler characteristic in logarithmic space. The thick curves represent the contours for two recovered fields γ_HI,rec obtained by the inversion of N_LOS = 320 and N_LOS = 200 lines of sight (solid and dashed lines respectively). Prior to contour determination, the recovered field was smoothed with a Gaussian window of size L_s = √ <d_LOS>. These contours should be compared with those of the original field, γ_HI, represented with thin lines and smoothed at the same scales (solid and dashed lines respectively). This figure is complemented by the two upper panels of the lower left group shown in Fig. 9, where the same slices for the original and the recovered fields are displayed.

[Figure 15 panels, linear space: contours of underdense regions; both axes in Mpc.]

Figure 15.

Same as Fig. 14, but smoothing is performed in linear space. This figure is complemented by the two upper panels of the lower right group in Fig. 9, where the same slices for the original and the recovered fields are displayed.

The visual inspection of Figure 9 seems to show that the filamentary pattern of the overall three-dimensional distribution is well recovered by the reconstruction in logarithmic space. One can check that assertion more quantitatively on the skeleton (Novikov et al., 2006; Sousbie et al., 2007). This will allow us to define an optimal smoothing scale which will be used in the subsequent analyses. More detailed analyses relying on the skeleton are postponed to another paper.

The actual definition of the skeleton is in fact deeply related to the Euler characteristic since it relies on first principles of Morse theory: basically, the skeleton is composed of the set of field lines (the curves defined by the gradient of the density field) starting from the filament saddle points (I = 2 in the formalism described in §

Although apparently simple, solving this equation is quite difficult; this is why a local approximation, based on a Taylor expansion around the critical points contained in the skeleton, was introduced in Novikov et al. (2006) and extended to 3D by Sousbie et al. (2007): the local skeleton. In this paper we use the implementation of Sousbie et al. (2007).

Note that, as opposed to a global topological estimator such as the Euler characteristic, the skeleton provides a local test of the accuracy of the reconstruction (i.e., one can check whether a precise filament at a given location is recovered or not). Figure 11(a) presents the skeleton (yellow lines) of a Mpc slice extracted from the original H I density field, as well as the skeleton of its reconstructed counterpart (red lines), using 320 LOSs. Both fields are smoothed in logarithmic space using a Gaussian window of scale L_s = 3 . Mpc. This figure confirms that, on large scales, the general shape of the filamentary structures is well preserved, demonstrating the ability of the reconstruction to recover the cosmic web.
Nonetheless, as expected, some discrepancies appear on small scales.

The inter-skeleton distance (ISD) is an estimator which allows a quantitative comparison to be made. A skeleton corresponds to a number of small segments linked together to form the filaments. In order to estimate the average distance between two skeletons, A and B, the distance to the closest segment in B is computed for each segment of A, leading to the PDF of the spatial separation from A to B. The distance from A to B is defined as the median of this PDF. Since this definition of distance is not symmetrical (in the sense that ISD(A,B) and ISD(B,A) will, in general, be different), the mean distance between A and B is defined as the average of these two quantities. Figure 11(b) presents the measurement of the ISD (in units of the smoothing scale) between the skeleton of the original field and its reconstruction, as a function of the number of LOSs used to perform the inversion. In all cases, both fields were smoothed over a scale L_s = 3 . Mpc. What is important to notice here is the sharp transition at N_LOS = 225, corresponding to L_s = L_crit with

    L_crit ≃ . <d_LOS>.    (13)

For a smoothing scale L_s ≲ L_crit, the match between the reconstruction and the exact solution worsens suddenly, while no significant improvement is seen when L_s ≳ L_crit: L_crit represents an “optimal” smoothing scale, the smallest scale at which the reconstruction performs well in terms of filamentary pattern recovery. Note that only the measurements for a particular value of L_s are shown, but Eq. (13) should not change significantly over the scaling range considered in this work.

Although all the subsequent analyses involving smoothing were performed at various scales, namely L_s = <d_LOS>, <d_LOS>, <d_LOS>, thereby increasing the number of LOSs contributing per smoothing volume, we shall, in light of the above findings, mainly concentrate on the results corresponding to L_s = 2 <d_LOS> ≃ L_crit.
The actual conditions for this definition to be valid are discussed in Novikov et al. (2006).
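The inter-skeleton distance described above can be sketched in a few lines. This is an illustrative simplification and not the implementation used in the paper: each skeleton is assumed to be given as a list of segment midpoints, so that segment-to-segment distances reduce to point-to-point distances, and the closest segment is found by brute force. The function names and the toy coordinates are hypothetical.

```python
import math
from statistics import median

def directed_isd(skel_a, skel_b):
    """Median, over the segments of A, of the distance to the closest segment of B."""
    return median(min(math.dist(a, b) for b in skel_b) for a in skel_a)

def inter_skeleton_distance(skel_a, skel_b):
    """Symmetrised ISD: average of the two directed medians."""
    return 0.5 * (directed_isd(skel_a, skel_b) + directed_isd(skel_b, skel_a))

# Toy skeletons given as segment midpoints (made-up coordinates, in Mpc).
skel_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
skel_b = [(0.0, 1.0, 0.0), (10.0, 1.0, 0.0), (20.0, 1.0, 0.0)]

print("A -> B:", directed_isd(skel_a, skel_b))
print("B -> A:", directed_isd(skel_b, skel_a))
print("ISD   :", inter_skeleton_distance(skel_a, skel_b))
```

In this toy case the two directed distances differ because skeleton B extends far beyond skeleton A, which is why the two directions are averaged in the definition.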

The second row of Fig. 10 shows the PDFs of the smoothed counterparts of γ_HI (dashes) and γ_HI,rec (solid), with a window of size L_s = √ <d_LOS>, as argued just above. These measurements are supplemented by Fig. 12, which shows the variance and the skewness of the PDFs of various fields as functions of the separation between the LOSs. Again, the agreement between the solid and dashed curves in the second row of panels of Fig. 10 is quite good and the results do not depend significantly on the value of N_LOS.

From a quantitative point of view, the difference between the recovered and the original curves can be calculated using the following estimator,

\[ {\rm err} = \frac{\sum_i |y^{\rm orig}_i - y^{\rm rec}_i| \, \Delta x_i}{\sum_i |y^{\rm orig}_i| \, \Delta x_i}, \tag{14} \]

where y^orig_i = y^orig(x_i) and y^rec_i = y^rec(x_i) are the values of the curves for the original and the recovered fields respectively, and the curves have been sampled at points x_i. This corresponds to the area between the curves, normalized by the area enclosed by the original one. For the three reconstructions shown, the errors are of the order of err_PDF = 10 − .

These quantitative estimates show that there are still some noticeable differences between the reconstruction and the true field: the shape of the PDF of the reconstructed field, γ_HI,rec, tends to be Gaussian, within the error range provided by the five Gaussian fields. This “Gaussianisation” is expected from both the central limit theorem and the shape of the correlation matrix given by Eq. (10). Note that this statement is not totally consistent with the measurement of the skewness (lower panel of Fig. 12), especially at intermediate separations between the LOSs. However, the skewness is quite sensitive to the upper tail of the PDF, corresponding to rare events in overdense regions: one expects, in that regime, deviations from Gaussianity in the reconstruction because the central limit is not yet reached.

The true field, γ_HI, deviates slightly from a Gaussian, as already shown in Fig. 3. In particular, in the right part of the bell shape of the PDF in Fig. 10, there is a slight disagreement between the dashed and the continuous curves, which corresponds to the weak negative skewness measured in the lower panel of Fig. 12. This disagreement would be even more visible if a logarithmic representation were used on the y axis to display the PDF: the high-density tail of the H I field is far from lognormal. The main contribution to such a tail comes from collapsed objects in clusters and in filaments. As argued in §

The nearly Gaussian nature of the reconstructed γ field can also be confirmed by examining the third row of panels in Fig. 10, which is similar to the second row but displays the Euler characteristic as a function of the density threshold. Deviations from Gaussianity of the true field, γ_HI, are more clearly visible than for the PDF. In particular, in all the panels, the corresponding dashed curve always presents an asymmetry between its two maxima, contrary to what is observed in the Gaussian limit. The reconstruction, γ_HI,rec, being more symmetrical, is clearly closer to the Gaussian limit than the true field. However, as noticed earlier for the skewness of the PDF, one cannot really claim that the reconstruction is fully Gaussian: deviations outside the range allowed by our five Gaussian realizations are noticeable, particularly in the right panel and in general in the overdense right tail (γ ≳ − . ) of χ+. Still, the overall topology of the reconstructed field, although closer to the Gaussian limit, reproduces rather well the topology of the true field, especially in the range − . ≲ γ ≲ − . , which confirms the findings of § γ_HI and γ_HI,rec. Note the increasing contribution of the noise when N_LOS decreases, which makes the agreement between reconstruction and exact solution worse, particularly for large densities, as expected.
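The integrated error of Eq. (14) can be sketched as follows. The text does not specify how the widths Δx_i are defined, so this example assumes that each sample is assigned half the distance between its two neighbours (and half an interval at either end); the function names and the toy curves are illustrative only.

```python
def cell_widths(x):
    """Width assigned to each sample: half the distance between its neighbours."""
    n = len(x)
    return [(x[min(i + 1, n - 1)] - x[max(i - 1, 0)]) / 2.0 for i in range(n)]

def integrated_error(x, y_orig, y_rec):
    """Eq. (14): area between the two curves over the area under the original."""
    dx = cell_widths(x)
    num = sum(abs(a - b) * w for a, b, w in zip(y_orig, y_rec, dx))
    den = sum(abs(a) * w for a, w in zip(y_orig, dx))
    return num / den

# Toy curves: the recovered curve is uniformly 10 percent above the original.
x = [0.0, 1.0, 2.0, 3.0]
y_orig = [1.0, 1.0, 1.0, 1.0]
y_rec = [1.1, 1.1, 1.1, 1.1]
print(integrated_error(x, y_orig, y_rec))
```

The same function applies unchanged to the PDF, to the Euler characteristic and to the individual critical point counts, since each of them is a curve sampled at a set of thresholds x_i.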
From a more quantitative point of view, one can, as for the PDF, compute the integrated errors on the critical point counts (Eq. 14). For the 3 reconstructions we consider here, these errors are of the same order as for the PDF (i.e. less than 20%).

As an additional test, the minimum of the Euler characteristic, γ_min ∼ − , can be used to define a topological boundary between “underdense” and “overdense” regions. Indeed, in the Gaussian limit, this minimum lies exactly at γ_min = <γ>. Defining the filling factor of underdense regions, FF_underdense, as the fraction of space occupied by points verifying γ ≤ γ_min, one expects FF_underdense to be always close to 0.5: at least this is true for any monotonic local transform of a Gaussian field (with no additional smoothing). Even though the reconstruction and the true field do not have exactly the same behavior for χ+, they seem to have very close values of γ_min, which should correspond to good agreement between the measured values of FF_underdense: this is indeed the case, as shown in Fig. 13. Although the measured values of FF_underdense are consistent with the original ones, the connectivity of the underdense (or equivalently, overdense) regions defined in this way is good but not perfect, as illustrated by Fig. 14. In this range of densities, the connectivity of the excursion is controlled equally by filament and pancake saddle points, and their respective counts tend to be slightly underestimated and overestimated, respectively, as illustrated by the bottom panels of Fig. 10. This is however not enough to explain the discrepancies in Fig. 13, and shows the limits of global topological estimators.

At the qualitative level, note finally that the situation becomes worse when one attempts to recover the boundary contour between overdense and underdense regions in linear space, because of the bias mentioned in §

In this paper we have studied the topology of large scale structures as traced by the intergalactic medium (IGM) in a hydrodynamical cosmological simulation. The main goal was to test a reconstruction method (PVRCP) of the three-dimensional large scale matter distribution from multiple lines of sight (LOSs) towards quasars. For this purpose, we relied on a number of global statistical tools: the probability distribution function (PDF) of the density field, the Euler characteristic (χ+) as an alternate critical point count, and related quantities such as the variance and the skewness of the PDF, the individual critical point counts and the filling factor of the underdense regions at the minimum of the Euler characteristic. We also used the skeleton as a local probe of the geometry and the topology of the field. The main results of our investigations can be summarized as follows:

• In the first part of this paper we addressed the problem of relating the topology of the dark matter density field to the topology of the distribution traced by the total amount of gas and the neutral gas (H I).
When one considers the H I density distribution at scales larger than the Jeans length of the gas and takes into account the IGM equation of state relating the neutral and total amount of gas, then the properties of this nearly lognormal distribution are exactly the same as found for the dark matter/total gas in underdense regions (i.e. for density contrasts δ ≲ ). For larger density contrasts, some deviations appear, due to shocks (where H I is depleted) and to the presence in filaments and clusters of highly condensed objects (where H I is very concentrated). Taking these results into account, with the additional assumption that instrumental noise, in particular effects of saturation, can be neglected, we have shown that studying the topological properties of the large scale matter density distribution is equivalent to studying directly those of the optical depth or, in what follows, those of neutral gas, H I.

• In the second part of this work we tested the Wiener interpolation proposed by PVRCP to recover the three-dimensional distribution of H I from a set of multiple LOSs, along which the (one-dimensional) distribution of H I is assumed to be known exactly. This interpolation depends on three parameters: the typical size, L_x, of structures along the LOSs; the typical mean LOS separation, L_T = <d_LOS>; and the expected variance of the fluctuations of the field, which can in principle be indirectly inferred from the LOSs themselves.

Our investigation shows that the reconstruction method can be used to predict quite accurately the patterns in the large scale matter distribution at scales of the order of ∼ . <d_LOS> or larger when one attempts to recover the logarithm of the density field. In particular it allows us to recover the position of filaments in the large scale distribution: we compared the skeleton of the initial and recovered field and measured the distance between these skeletons and found that for smoothing scales larger than ∼ .
<d_LOS>, the inter-skeleton distance remains smaller than <d_LOS>. Furthermore, the global shape of the PDF, of the fraction of critical points and of the Euler characteristic are well reproduced, the integral errors on these quantities varying in the range 10-20%. Discrepancies between the reconstruction and the exact solution are mainly found in overdense regions, where deviations from a lognormal behavior are the most significant.

The good recovery of the statistical properties of the density field in logarithmic space is strongly related to the Gaussian prior on which the inversion method is based. Recall that, since the distribution of the gas density is very close to log-normal, the distribution of its logarithm is well approximated by a Gaussian function. As demonstrated in PVRCP, the Wiener interpolation is just a special case of the maximum likelihood method. It gives, under the hypothesis that the statistical distributions of the data and of the parameters are Gaussian, the optimal reconstruction for a linear model. However, this relies on a proper knowledge of the covariance matrices. Here we assume a simple functional shape for these matrices, given by Equation (10). A better treatment would need an accurate knowledge of the underlying power-spectrum of the logarithm of the density. The interpolation could for instance be improved by using a stronger prior relying on the extension of e.g. the nonlinear ansatz of Hamilton et al. (1991) to logarithmic space.

We noticed that some deviations are present in the original field, compared to the log-normal limit, at the scales we have probed here. This information could be added to the model. This could be achieved by applying an Edgeworth expansion to the logarithm of the field (Juszkiewicz et al.
1995; Colombi 1994), hence by taking into account slight deviations of the likelihood function from a Gaussian distribution to correct our Wiener interpolator (Amendola, 1998).

Even though the best variable for the reconstruction is the logarithm of the density, theoretical predictions are usually performed on the density itself. Therefore it is in practice difficult to compare the properties of the reconstructed density distribution to those predicted by e.g. nonlinear perturbation theory (e.g. Bernardeau et al. 2002) or other models. The problem is that linear space gives more emphasis to rare events in overdense regions. In Appendix A we analyse the corresponding bias on the reconstruction, and find that it is critical for the higher density peaks. As a result, the tomography is in practice much less robust when expressed directly in linear space. However, this is mainly related to the fact that our analyses are performed at scales smaller than or of the order of 4 Mpc, where non-linear effects in the dynamics are still present.

Due to the size of our simulation (L_box = 40 Mpc), in this work we have analysed the properties of connectivity at relatively small scales (L_s = 4 Mpc), where the distribution of matter is close to log-normal. However, one could in principle extend the analyses to larger scales, to probe the linear or quasi-linear regime, where the density distribution is actually close to Gaussian. In that case, the reconstruction should be performed on the density itself rather than on its logarithm, while the above mentioned problems would become irrelevant. In particular, the implementation of the improvements on the Wiener interpolator could for instance be used to test directly if non trivial deviations from Gaussianity are present or not in the data.
If present, they could be ascribed to primordial non-Gaussian features that are produced during the inflationary phase or as a result of topological defects.

The inversion method is based on the hypothesis that a sufficiently strong correlation exists at the scale under consideration. Indeed, various sources of noise can hide such a correlation completely (errors due to the finite cosmological volume probed by a finite number of LOSs, noise in the measurement of the spectra), making the reconstruction irrelevant. To test the strength of the correlation, a large number of quasar pairs spanning the range of separations we want to probe must be observed. It has been recently shown (Coppolani et al. 2006) that at z ≈ for a separation of ∼ arcmin (corresponding to ≈ . Mpc for a flat universe with Ω_m = 0. , Ω_Λ = 0. and H_0 = 70 km s^-1 Mpc^-1), some correlation is observed, suggesting that the inversion method could be applied at such scales. It is thus very important to measure more accurately the transverse correlation function from quasar pairs. Indeed, once this is done, we can include this information as a self-consistent prior in the reconstruction procedure.

Using realistic data about the luminosity function of quasars (Jiang et al. 2006), it is found that for magnitude limits of g ≲ (23, , the number of quasars observed per square degree at z ≳ is n_QSOs = (41, , respectively. For the set of cosmological parameters assumed here, the corresponding mean angular separations are <d_LOS> = (9. , . , . arcmin. Moreover, for g ≳ the number density of Lyman-break galaxies (LBGs) starts to become significant and we can think of using these objects as background sources in combination with QSOs. In particular, it is found that for g ≲ (23, , the number of LBGs per square degree is n_LBGs = (0. , , respectively (Adelberger & Steidel 2000), so that, even at g ≲ , the number of available sources is largely increased. In Table 3 we display the mean separation one can expect as a function of the magnitude limit. One can see that if we are able to observe objects up to a magnitude limit of g ∼ , the density of background sources will be high enough to perform a reconstruction similar to what is described in this paper. A better approach will be to search for peculiar fields in which the density is larger by chance (e.g. Petitjean 1997). The spectral resolution will be a decreasing function of the magnitude. Observational difficulties will include the contamination of the LBG spectrum by absorption lines originating in the interstellar medium of the galaxy and the fact that the mean redshift (z ≈ . ) will be larger than what we have considered in this paper. To reach these faint magnitudes we need to wait for the advent of the Extremely Large Telescopes (Theuns & Srianand 2006).

Table 3. Mean angular separation between the background sources as a function of the magnitude limit (left column).

Magnitude limit g    Separation, QSOs (arcmin)    Separation, QSOs and LBGs (arcmin)
23                   9.4                          9.3
24                   6.8                          4.3
25                   5.2                          1.2

Note that, because of the small size of our simulation, we could not really examine the effects of cosmic variance, except with our Gaussian realizations.

To conclude, the approach developed here is very promising, as the advent of Extremely Large Telescopes will boost this field by allowing the observation of a number of background sources large enough to probe the distribution of the matter with good precision at the scales under consideration. The total amount of observing time will be large, but worthwhile given the results foreseen in this paper.

ACKNOWLEDGEMENTS

We thank D. Pogosyan for providing us with his calculations of critical count numbers predicted in the Gaussian limit, as displayed as smooth curves on Fig. 2. We thank S. Prunet, R. Teyssier and D. Weinberg for stimulating discussions and D. Munro for freely distributing his Yorick programming language and opengl interface (available at http://yorick.sourceforge.net/).

REFERENCES

Adelberger, K. L., Steidel, C. C., 2000, ApJ, 544, 218
Amendola, L., 1998, astro-ph/9810198
Aracil, B., Petitjean, P., Smette, A., Surdej, J., Mücket, J. P., Cristiani, S., 2002, A&A, 391, 1
Bardeen, J. M., Bond, J. R., Kaiser, N., Szalay, A. S., 1986, ApJ, 304, 15
Bernardeau, F., Colombi, S., Gaztañaga, E., Scoccimarro, R., 2002, Physics Reports, 367, 1
Bi, H. G., Davidsen, A. F., 1997, ApJ, 479, 523
Cen, R., Miralda-Escudé, J., Ostriker, J. P., Rauch, M., 1994, ApJ, 437, L9
Choudhury, T. R., Padmanabhan, T., Srianand, R., 2001, MNRAS, 322, 561
Coles, P., Jones, B., 1991, MNRAS, 248, 1
Colombi, S., 1994, ApJ, 435, 536
Colombi, S., Bouchet, F. R., Schaeffer, R., 1994, A&A, 281, 301
Colombi, S., Pogosyan, D., Souradeep, T., 2000, PhRvL, 85, 5515 (CPS)
Coppolani, F., Petitjean, P., Stoehr, F., Rollinde, E., Pichon, C., Colombi, S., Haehnelt, M., Carswell, B., Teyssier, R., 2006, MNRAS, 370, 1804
Croft, R. A. C., Weinberg, D. H., Katz, N., Hernquist, L., 1998, ApJ, 495, 44
Crotts, A. P. J., Fang, Y., 1998, ApJ, 502, 16
Davé et al., 2001, ApJ, 552, 473
Desjacques, V., Nusser, A., Sheth, R. K., 2007, MNRAS, 374, 206
Doroshkevich, A. G., 1970, Astrophysics, 6, 320
Gnedin, N. Y., Hui, L., 1998, MNRAS, 296, 44
Gott III, J. R., Melott, A. L., Dickinson, M., 1986, ApJ, 306, 341
Gott III, J. R., Weinberg, D. H., Melott, A. L., 1987, ApJ, 319, 1
Guimares, R., Petitjean, P., Rollinde, E., de Carvalho, R. R., Djorgovski, S. G., Srianand, R., Aghaee, A., Castro, S., 2007, MNRAS, 377, 657
Gunn, J. E., Peterson, B. A., 1965, ApJ, 142, 1633
Guzzo, L., 2001, astro-ph/0102062
Hamilton, A. J. S., Kumar, P., Lu, E., Matthews, A., 1991, ApJ, 374, L1
Hui, L., Stebbins, A., Burles, S., 1999, ApJ, 511, L5
Jiang, L., Fan, X., Cool, R. J., Eisenstein, D. J., Zehavi, I., Richards, G. T., Scranton, R., Johnston, D., Strauss, M. A., Schneider, D. P., Brinkmann, J., 2006, AJ, 131, 2788
Juszkiewicz, R., Weinberg, D. H., Amsterdamski, P., Chodorowski, M., Bouchet, F., 1995, ApJ, 442, 39
Kaiser, N., 1984, ApJ, 284, L9
Kaiser, N., 1987, MNRAS, 227, 1
McDonald, P., Miralda-Escudé, J., 1999, ApJ, 518, 24
Mecke, K. R., Buchert, T., Wagner, H., 1994, A&A, 288, 697
Milnor, 1963, Morse Theory, p. 29
Miralda-Escudé, J., Cen, R., Ostriker, J. P., Rauch, M., 1996, ApJ, 417, 582
Mucket, J. P., Petitjean, P., Kates, R. E., Riediger, R., 1996, A&A, 308, 17
Nakagami, T., Matsubara, T., Schmalzing, J., Jing, Y., 2004, astro-ph/0408428
Novikov, D., Colombi, S., Doré, O., 2006, MNRAS, 366, 1201
Nusser, A., Haehnelt, M., 1999, MNRAS, 303, 179
Park, C., Choi, Y., Vogeley, M. S., Gott III, J. R., Kim, J., Hikage, C., Matsubara, T., Park, M., Suto, Y., Weinberg, D. H., 2005, ApJ, 633, 11
Peacock, J. A., Dodds, S. J., 1996, MNRAS, 280, L19
Petitjean, P., Mücket, J. P., Kates, R. E., 1995, A&A, 295, L9
Petitjean, P., 1997, euvl.conf, p. 266 (arXiv:astro-ph/9608115)
Petitjean, P., Surdej, J., Smette, A., Shaver, P., Mücket, J., Remy, M., 1998, A&A, 334, L45
Pichon, C., Vergely, J. L., Rollinde, E., Colombi, S., Petitjean, P., 2001, MNRAS, 326, 597 (PVRCP)
Protogeros, Z. A. M., Weinberg, D. H., 1997, ApJ, 489, 457
Rauch, M., 1998, ARA&A, 36, 267
Rollinde, E., Petitjean, P., Pichon, C., 2001, A&A, 376, 28
Rollinde, E., Petitjean, P., Pichon, C., Colombi, S., Aracil, B., D'Odorico, V., Haehnelt, M. G., 2003, MNRAS, 341, 1299
Schmalzing, J., Buchert, T., 1997, ApJ, 482, L1
Sousbie, T., Pichon, C., Courtois, H., Colombi, S., Novikov, D., 2006, submitted to ApJLett. (arXiv:astro-ph/0602628)
Sousbie, T., Pichon, C., Colombi, S., Novikov, D., Pogosyan, D., 2007, submitted to MNRAS (arXiv:astro-ph/0707.3123)
Spergel, D. N., Bean, R., Doré, O., Nolta, M. R., Bennett, C. L., Dunkley, J., Hinshaw, G., Jarosik, N., Komatsu, E., Page, L., Peiris, H. V., Verde, L., Halpern, M., Hill, R. S., Kogut, A., Limon, M., Meyer, S. S., Odegard, N., Tucker, G. S., Weiland, J. L., Wollack, E., Wright, E. L., 2007, ApJS, 170, 377
Theuns, T., Leonard, A., Efstathiou, G., Pearce, F. R., Thomas, P. A., 1998, MNRAS, 301, 478
Theuns, T., Srianand, R., 2006, IAUS, 232, 464 (arXiv:astro-ph/0601637v1)
Trac, H., Mitsouras, D., Hickson, P., Brandenberger, R., 2002, MNRAS, 330, 531
Viel, M., Matarrese, S., Mo, H. J., Haehnelt, M. G., Theuns, T., 2002, MNRAS, 329, 848
Viel, M., Matarrese, S., Mo, H. J., Theuns, T., Haehnelt, M. G., 2002, MNRAS, 336, 685
Viel, M., Haehnelt, M. G., Springel, V., 2004, MNRAS, 354, 684
Viel, M., Haehnelt, M. G., Springel, V., 2006, MNRAS, 367, 1655
Vogeley, M. S., Park, C., Geller, M. J., Huchra, J. P., Gott III, J. R., 1994, ApJ, 420, 525
Young, P. A., Impey, C. D., Foltz, C. B., 2001, ApJ, 549, 76
Zaroubi, S., Viel, M., Nusser, A., Haehnelt, M., Kim, T. S., 2006, MNRAS, 369, 734

Figure A2. Same as Figure 12, but in linear space, as explained in caption of Fig. A1. (Panels: RMS and skewness as a function of separation [Mpc], in linear space; curves: original, recovered, lognormal.)

APPENDIX A: RECOVERED FIELD: ANALYSIS IN LINEAR SPACE

While the reconstruction seems to perform well for γ = ln(ρ) and its smoothed counterparts (except that it is somewhat "Gaussianized", as shown by the measurements in the main text), let us now investigate what happens for the statistical properties of the field itself, ρ = exp(γ). It was noted in that case (§ § ρ ∼ exp(γ), rare events in overdense regions (which are poorly reconstructed) dominate. As a consequence, the reconstruction fails with respect to the mean density: equation (12) is clearly not appropriate anymore to normalize the reconstruction. Instead, the reconstructed density, ρ_HI,rec, is renormalized as follows:

\rho_{\mathrm{HI,rec}} = \langle \rho_{\mathrm{HI}} \rangle \, \frac{\exp(\tilde{\gamma}_{\mathrm{HI,rec}})}{\langle \exp(\tilde{\gamma}_{\mathrm{HI,rec}}) \rangle} ,    (A1)

where <ρ_HI> is the true mean density in the simulation. Note that this density is no longer accurately determined from direct measurements on the LOSs: in the worst case considered in this paper, N_LOS = 100, we indeed find a relative error on the estimate of <ρ_HI> of the order of %. However, the simulation volume is quite small, leading to unrealistically short LOSs. In real observations the determination of the average neutral gas density along LOSs should be much more accurate (Guimares et al. 2007).

Figure A1. Same as in Fig. 10, but in the linear space, i.e. by taking the exponential of the recovered fields and the Gaussian realizations along with normalization A1, with subsequent smoothing with a Gaussian of size L_s = √ <d_LOS> for the last three rows of panels. The big dots on the second row of panels now correspond to a lognormal distribution with same variance and average as the reconstruction. (Rows of panels: P(k), PDF, Euler characteristic, critical points; columns: N_LOS = 320, N_LOS = 200, N_LOS = 120.)

Figure A3. Same as Fig. 13, but in linear space, as explained in caption of Fig. A1. (Axes: filling factor as a function of separation [Mpc], in linear space; curves: original, recovered.)

The choice of the normalization given by Eq. (A1) is natural since it imposes the average density of the reconstructed field to be equal to that of the exact solution. However, because it is still affected by contributions from overdense regions, this normalization is not fully satisfactory, as it does not lead to the appropriate corrections in underdense regions, as can be noted by a careful examination of the 4 lower right panels of Fig. 9.

The contamination by high density peaks affects all statistics, as illustrated by Figs. A1 and A2. This is particularly dramatic for second order statistics (upper row of Fig. A1 and upper panel of Fig. A2). The reconstruction underestimates the normalization of the power-spectrum, and as a result the variance of the PDF, especially when the separation between the LOSs is small: in the latter case, nonlinear features in the density field are given more weight and are poorly captured by the reconstruction. This appears as a shift in the PDF shown in the second row of panels in Fig. A1, worsening with increasing N_LOS. Note however that the agreement between the reconstruction and the exact solution, although poorer than in logarithmic space, improves when N_LOS ≲ . Note also that the smoothed lognormal fields no longer match the reconstruction. In fact, in linear space, it seems that the reconstruction gives a solution intermediate between the exact one and the smoothed lognormal fields, both from the point of view of the power-spectrum and the PDF (and its cumulants) (Fig. A2): it captures more than just the Gaussian features of the logarithm of the real solution, as would have naively followed from the analysis of § χ+).
Note that overall, the reconstruction matches the lognormal behaviour better than the true solution does, especially when N_LOS is large, implying that "lognormalization" dominates, at least from a topological point of view, while nonlinear dynamics implies significant departures from a purely lognormal behavior. This explains again why the quality of the reconstruction decreases when attempting to probe the smallest scales. Note that this does not mean that decreasing the number of LOSs is better: the analysis always looks at the smallest scale recoverable in logarithmic space, ∼ . <d_LOS>. At fixed smoothing scale, a reconstruction with a given number of LOSs does better than a reconstruction with sparser LOS sampling. Still, note that the reconstruction does more than a simple "lognormalization", as it gives an intermediary answer between the expected lognormal behavior from the analysis in logarithmic space and the true solution, at least from the point of view of the PDF and the Euler number. The uncertainties in the measurements due to the emphasis put on rare events are however too large to draw definite conclusions with a small sample of LOSs: the spread between the five lognormal fields is much larger than it was in logarithmic space (and similarly for the PDF).

Let us finally check the global topological properties of the reconstruction by examining the number counts of each kind of critical point individually, as shown in the last row of panels in Figure A1. Notwithstanding all the above points, note that the inversion achieves a fair reconstruction of the distribution of some of the critical points: in the low density regime, it overestimates the local minima count, as expected from visual inspection of the four lower right panels of Fig. 9 and from the PDF: the reconstructed field in underdense regions is overestimated.
In the intermediate density range, the reconstruction overestimates pancake saddle point counts (and to a lesser extent, underestimates filament saddle point and local maxima counts) for N_LOS = 320, while larger separations between LOSs do better. In the overdense regime, where the reconstruction fails more dramatically, and where the amplification of the errors is large, one tends to overestimate (underestimate) filament saddle points (local maxima).

Still, it is interesting to note that the local minimum of the Euler number, ρ_min ∼ . , is comparable for the reconstruction and the exact solution, suggesting that the measured filling factor defined previously will be similar for the reconstruction and the exact solution: according to Figure A3, the filling factor of underdense regions at the minimum of the Euler number does nearly as well as in logarithmic space, but the match between its isocontours is worse than before (compare Figure 14 with Figure 15). Thus, even if the critical point counts and the fraction of underdense regions agree, this does not necessarily imply that the structures, in particular the densest ones, are at the right position.

We did not examine the skeleton in linear space to find the best smoothing scale in that case.
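As a rough illustration of the renormalization in Eq. (A1) and of the filling factor of underdense regions used above, the following minimal sketch applies both to a toy field. It is not the code used for this paper: the function names, the grid size, the random seed and the Gaussian toy log-density are all assumptions made for the example, and the smoothing step applied to the linear-space fields is omitted.

```python
import numpy as np


def renormalize_to_linear(gamma_rec, mean_rho_true):
    """Eq. (A1): rho_rec = <rho_true> * exp(gamma_rec) / <exp(gamma_rec)>."""
    weights = np.exp(gamma_rec)
    return mean_rho_true * weights / weights.mean()


def filling_factor(field, threshold):
    """Fraction of grid points where the field is at or below the threshold."""
    return float(np.mean(field <= threshold))


# Toy stand-in for a reconstructed log-density (hypothetical, NOT simulation data).
rng = np.random.default_rng(0)
gamma_toy = rng.normal(loc=-1.0, scale=0.8, size=(32, 32, 32))

mean_rho_toy = 1.0  # hypothetical true mean density, arbitrary units
rho_toy = renormalize_to_linear(gamma_toy, mean_rho_toy)

# Threshold at the mean of the log field, and the same threshold mapped
# through the exponential and the rescaling of Eq. (A1).
gamma_thr = gamma_toy.mean()
rho_thr = mean_rho_toy * np.exp(gamma_thr) / np.exp(gamma_toy).mean()

ff_log = filling_factor(gamma_toy, gamma_thr)
ff_lin = filling_factor(rho_toy, rho_thr)

print(f"mean of renormalized field:    {rho_toy.mean():.6f}")
print(f"filling factor (log space):    {ff_log:.4f}")
print(f"filling factor (linear space): {ff_lin:.4f}")
```

Because the exponential and the positive rescaling are both monotonic, the two filling factors coincide in this sketch, and for a symmetric Gaussian toy field the fraction below the mean is one half by construction. In the analysis of this appendix the linear-space fields are smoothed after exponentiation, so this invariance need not hold exactly there.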