[PDF] On improving analytical models of cosmic reionization for matching numerical simulation

Abstract

The methods for studying the epoch of cosmic reionization vary from full radiative transfer simulations to purely analytical models. While numerical approaches are computationally expensive and are not suitable for generating many mock catalogs, analytical methods are based on assumptions and approximations. We explore the interconnection between both methods. First, we ask how the analytical framework of excursion set formalism can be used for statistical analysis of numerical simulations and visual representation of the morphology of ionization fronts. Second, we explore the methods of training the analytical model on a given numerical simulation. We present a new code which emerged from this study. Its main application is to match the analytical model with a numerical simulation. Then, it allows one to generate mock reionization catalogs with volumes exceeding the original simulation quickly and computationally inexpensively, meanwhile reproducing large scale statistical properties. These mock catalogs are particularly useful for CMB polarization and 21cm experiments, where large volumes are required to simulate the observed signal.

Full PDF

DDraft version November 9, 2018

Preprint typeset using L A TEX style emulateapj v. 5/2/11

ON IMPROVING ANALYTICAL MODELS OF COSMIC REIONIZATIONFOR MATCHING NUMERICAL SIMULATION

Alexander A. Kaurov Draft version November 9, 2018

ABSTRACTThe methods for studying the epoch of cosmic reionization vary from full radiative transfer simu-lations to purely analytical models. While numerical approaches are computationally expensive andare not suitable for generating many mock catalogs, analytical methods are based on assumptionsand approximations. We explore the interconnection between both methods. First, we ask how theanalytical framework of excursion set formalism can be used for statistical analysis of numerical sim-ulations and visual representation of the morphology of ionization fronts. Second, we explore themethods of training the analytical model on a given numerical simulation. We present a new codewhich emerged from this study. Its main application is to match the analytical model with a numer-ical simulation. Then, it allows one to generate mock reionization catalogs with volumes exceedingthe original simulation quickly and computationally inexpensively, meanwhile reproducing large scalestatistical properties. These mock catalogs are particularly useful for CMB polarization and 21cmexperiments, where large volumes are required to simulate the observed signal. INTRODUCTION

The nature of the epoch of cosmic reionization involvesa large dynamic range of scales and various physical pro-cesses. An accurate treatment of this epoch requiresdetailed bookkeeping of the photon budget. Therefore,good understanding of astrophysics as well as large scalestructure formation and cosmology is a necessity. Indeed,after being emitted by a star, a photon travels throughpartially neutral interstellar medium (ISM) and thenthrough intergalactic medium (IGM) populated with Ly-man limit systems (LLS) before it reaches the ionizationfront. The diversity of these environments makes studyof reionization challenging.The past two decades have seen rapid development ofanalytical (Miralda-Escud´e et al. 2000; Furlanetto et al.2004; Kuhlen & Faucher-Gigu`ere 2012) and numerical(Iliev et al. 2006; McQuinn et al. 2007; Zahn et al. 2007;Croft & Altay 2008; Lee et al. 2008; Shin et al. 2008;Trac et al. 2008; Iliev et al. 2009; Aubert & Teyssier 2010;Friedrich et al. 2011; Ahn et al. 2012; Shapiro et al. 2012;Hutter et al. 2014; Iliev et al. 2014; So et al. 2014; Nor-man et al. 2015) methods of studying the epoch of cosmicreionization. Nevertheless, both of them struggle fromvarious limitations. The growth of computer capabilitiesgave a great boost to the numerical methods (Trac &Gnedin 2011). However, still, simulations are limited bythe dynamic range, i.e. either smaller boxes with highresolution or large boxes with low resolution. Therefore,either the correlations on large scales are neglected or thecrude approximations are adopted for sub-grid models.On the other hand, the analytical models are compu-tationally inexpensive and potentially can probe the vol-umes of any sizes. Also, the framework of these modelsprovides a descriptive view on reionization, and thereforedevelops intuition about the signiﬁcance of each physi-cal process. However, those assumptions, which makethe framework so eﬃcient, also cause some limitations. Department of Astronomy & Astrophysics, The Universityof Chicago, Chicago, IL 60637 USA; [email protected]

Firstly, it limits physical processes which can be includedin the model. Secondly, the approximations cause unpre-dictable accuracy. In particular, one cannot invest morecomputational power to achieve higher accuracy.These limitations are reﬂected in the capabilities ofthese models to be compared with observations. For in-stance, in order to match the galaxy luminosity function(star formation history) and escape fractions one has noother choice but to run a detailed high resolution simula-tion; while for the large scale morphology studies (21cmtomography) large number of realizations in cosmolog-ical volumes are needed, which can be done only withapproximate semi-numerical models.In this study we attempt to take the best from bothmethods. First, in § § §

4. If it is informative enough,then it will be suﬃcient for reproducing the reionizationhistory on a diﬀerent set of initial conditions. In otherwords, we perform a training (or tuning) of an analyt-ical model on a given numerical simulation and discusshow well it can be done. In contrast to Battaglia et al.(2013) where a similar goal was set, we use the excursionset formalism (Press & Schechter 1974) instead of biasapproach. The performance of the excursion set analyti-cal models versus numerical simulation has been carriedout in Zahn et al. (2011), where the authors put thesame physics (the eﬃciency of ionizing photon produc-tion) into numerical and analytical models. We adopt adiﬀerent approach, and try to ﬁnd any analytical modelthat would correspond to a given numerical simulation.The results from this study allow us to make conclu-sions about capabilities of the analytical approach, i.e.what physical processes they generally fail to mimic and a r X i v : . [ a s t r o - ph . C O ] S e p Alexander A. Kaurovfor which tasks it can be used for. Also, we show howthe analytical framework can play a complimentary rolein numerical calculations by providing a tool for visual-ization of spatially complicated reionization history.Additionally, we present a semi-numerical code basedon excursion set formalism which emerged from thisstudy. Its main application is training of an analyti-cal model on a given numerical simulation. Then, thistrained model can be used for rapid generation of mockcatalogs for 21cm experiments.We start by discussing the motivation in § § § § § § § MOTIVATION

The value of any theoretical model is its capability tobe experimentally tested. The new observational facili-ties are able to probe the most interesting for reioniza-tion range of redshifts 5 < z <

20. Here we discuss howthose can be estimated in the theoretical models. First,we list those observations which can be compared versusexisting numerical simulations: • The integrated optical depth of Thomson scattering onfree electrons to the cosmic microwave background pro-vides a glimpse to the history of ionization. The mea-sured value by the Wilkinson Microwave AnisotropyProbe argues in favor of the scenario when reioniza-tion began at z >

15 (Hinshaw et al. 2013; Bromm& Yoshida 2011; Dunlop 2012) with measured opti-cal depth: τ = 0 . ± . τ = 0 . ± . • The end of reionization is probed by the observationof the Gunn-Peterson eﬀect in spectrum of high red-shift QSOs (Fan et al. 2006; Bolton et al. 2011) andgamma gay bursts (Chornock et al. 2013), as well as theLyman- α emission from high redshift galaxies (Starket al. 2010; Pentericci et al. 2011, 2014; Schenker et al.2011, 2014; Treu et al. 2013; Tilvi et al. 2014). • Another observable is luminosity function and itsderivative star formation rate at high redshift. Currentobservations by Hubble Space Telescope Ultra Deepallow to probe the galaxies up to redshift 10. Themodern simulations of reionization including the onewe use (Gnedin 2014) are capable of reproducing theseluminosity functions.In contrast to the observations listed above, these can-not be numerically simulated with the same precision dueto much larger volumes: • The polarization of CMB caused by reionization allowsto put constraints on its duration and redshift (Zahnet al. 2012; Benson et al. 2014). • The most intriguing observation is 21 cm line, whichwill allow to map hydrogen at high redshifts. Those include SKA , LOFAR , MWA Tingay et al. (2013),PAPER (Parsons et al. 2010) and HERA .The data from these types of experiments requireMonte Carlo simulations for proper analysis. However,current computational capabilities do not allow to runmultiple simulations in large (1 Gpc) cosmological boxeswith ﬁne radiation transfer and galaxy formation. There-fore, semi-numerical models are adopted for these tasks. PRELIMINARIES

Analytic models

We leave beyond the scope of this paper the analyti-cal models of homogeneous reionization (i.e. Kuhlen &Faucher-Gigu`ere (2012)) and focus on the two of the mostpopular inhomogeneous models. Those are Miralda-Escud´e et al. (2000) (

MHR00 hereafter) and Furlanettoet al. (2004) (

FZH04 hereafter) and their derivatives.Both models are based on the underlying overdensityﬁeld δ . The key concept is the density scale, i.e. thedensity averaged over some scale R . The density ρ ( R )can be deﬁned as: ρ ( R ) = (cid:90) V R δ d V /V R , (1)where V R is the volume of a sphere of a radius R , centeredat the point of interest. However, in practice smoothingis performed through ﬁltering in the Fourier space.If the Gaussian density ﬁeld is considered, then anotheruseful quantity emerges from the Fourier deﬁnition – thevariance of density at a given scale: σ ( R ) = (cid:10) δ ( R ) (cid:11) = (cid:90) ∞ P ( k ) ˜ W ( kR ) d k, (2)where P ( k ) is the power spectrum and ˜ W R is the Fouriertransform of a spherical ﬁlter: W R ( r ) = Θ(1 − r/R ) . (3)We do not specify what the scalar ﬁeld, δ , is that weuse in this method. It can be the baryon or dark matterdensity, the initial Gaussian ﬁeld or evolved non-linearﬁeld. Also, it can be the halo density ﬁeld or star forma-tion density.In case of the Gaussian ﬁeld the distribution of ρ ( R )in the universe is Normal by deﬁnition for any R , and σ ( R ) calculated with the Equation 2 is its variance.However, for any other non-Gaussian scalar ﬁeld, thedistribution can diverge from Normal. Nevertheless, inorder to have the plots easily readable and consistentwith each other we normalize the ρ ( R ) to zero meanand unit variance. Even though this operation erasesthe shape of distribution function, it essentially aﬀectsonly visual representation of the ﬁgures. Thus, we la-bel those normalized values as “in units of σ ” in all theﬁgures.It is common in the literature to use other variablesinstead of scale R . Among those are the mass, m , of asphere with the radius R , or variance, σ , at scale R .All three values can be derived from each other. In this http://reionization.org/ atching analytical models of cosmic reionization with numerical simulations 3paper we use only R since it is most appropriate for ourpurposes. MHR00 model

The MHR00 model was built to describe the laterstages of reionization when only the small patches of hy-drogen are left neutral. It describes a process of theionization background burning into the dense neutral re-gions, where the recombination and photoionization ratesare comparable. The characteristic overdensities of theregions where this eﬀect takes place were studied in Kau-rov & Gnedin (2015) with the same simulation. The re-sult shows that inside the ionized regions the local den-sity is a proxy for the local ionization fraction.This model uses one speciﬁc scale, R MHR , and deter-mines the ionization fraction of a cell based on its localdensity ρ ( R MHR ) only. This relatively simple model iscapable of describing Lyman- α forest statistics.The model is based on the assumption of quasi-staticionization equilibrium, i.e. when the local recombinationrate is close to the ionization rate: R u x (1 + δ ) = Γ(1 + δ )(1 − x ion ) , (4)where δ and x ion are local gas overdensity and ionizaedfraction, Γ is ionizing background, and R u is the ratio ofthe recombination rate for a homogeneous universe to theHubble constant, which is order of unity at z ≈ x ion = 0 . R MHR radius and therefore does notdepend on the Large-Scale Structure.

FZH04 model

In contrast to the MHR00, the FZH04 model was de-veloped to describe the morphology of the reionizationon the large scales. In order to reproduce the large scaleinhomogeneity it has to take into account overdensity notonly at a single scale but on a range of scales.In the FZH04 model we assign to each point at position r a one-dimensional function, trajectory T ( R ), which cor-responds to the density of the point deﬁned at diﬀerentscales R : T r ( R ) = (cid:82) δ ( x ) W R ( | r − x | )d x σ ( R ) . (5)The physical motivation is that the function T ( R ) canprovide an estimate for the fraction of matter collapsedinto halos (Press & Schechter 1974). Then, assuminga rate of ionizing photons production in halos, one canderive the total number of ionizing photons in the spher-ical regions centered at the point of interest. If for anyof those regions the number of ionizing photons exceedsthe number of hydrogen atoms in the same region, thepoint is considered ionized.Various improvements can be made for this model,which allow to include more sophisticated physics(Furlanetto & Oh 2005; Furlanetto et al. 2006; Alvarez & Abel 2007; Mesinger & Furlanetto 2007; Mesinger et al.2011; Zahn et al. 2011; Alvarez & Abel 2012; Battagliaet al. 2013; Kaurov & Gnedin 2013; Zhou et al. 2013;Kaurov & Gnedin 2014; Sobacchi & Mesinger 2014).However, the main principle remains the same. Allphysics are combined into the main parameter of themodel – the barrier function, B ( R, z ), which is the func-tion of scale and redshift. The model predicts that thepoint is ionized at redshift z if its trajectory T ( R ) inter-sects with B ( R, z ).This model gained popularity because it is physicallyjustiﬁed, based on excursion set framework which iswidely used in structure formation theory, and is com-putationally eﬃcient. It became a starting point forthe development of semi-numerical codes like 21cmFAST(Mesinger et al. 2011).The weak part of this family of models is the transi-tion from physical equations to the shape of B ( R, z ). Atthis step a lot of assumptions are made and it is hard totrace how they aﬀect the accuracy of the model. The bar-rier eﬀectively becomes degenerate because of the largeamount of physical eﬀects contributing to its shape.

Numerical simulation

For this study we use simulations from the CosmicReionization On Computers (CROC) project (Gnedin2014; Gnedin & Kaurov 2014) as a reference numericalsimulation. These simulations include a large variety ofphysical processes which aﬀect the morphology of reion-ization in diﬀerent ways on various scales. For instance,the non-spherically-symmetric escape fraction may aﬀectthe shape of small bubbles at earliest stages of reioniza-tion, while the bias of galaxy distribution regulates themorphology of ionization fronts on large scales.The simulation is tuned to match the available observa-tional constraints. Those include the galaxy luminosityfunctions and full distribution function of Gunn-Petersonabsorption in the spectra of the high redshift quasars.However, in this study we do not need precise tuning ofa simulation. We only need a complexity of the ioniza-tion bubbles and the shapes of neutral patches, in orderto test our methods.With the use of Adaptive Mesh Reﬁnement algorithm,CROC simulations achieve spatial resolution of 125pc insimulation volumes of up to 40 h − Mpc, which allows toconsider IGM and ﬁlaments to be well resolved.In this paper we use a 40 h − Mpc realization (runB40.sf1.uv2.bw10.A from Gnedin (2014)) as our ﬁdu-cial set and two other 40 h − Mpc realizations (runsB40.sf1.uv2.bw10.B,C from Gnedin (2014)) for compari-son. The scalar ﬁelds which we extract from simulationare: the baryon and dark matter density ﬁelds, the ion-ization fractions of Hydrogen. THE FRAMEWORK OF ANALYTICAL MODELS FORANALYSIS OF NUMERICAL RESULTS

In this section we approach a realization of the nu-merical simulation with the framework of the analyticalmodel.

Preparing numerical simulation

For the following analysis we use the down-sampledsnapshots of 40 h − Mpc to 256 uniform grid (the eﬀec-tive resolution is 156 h − kpc). Next, for each pixel we Alexander A. Kaurov h − Mpc . . . . . . . . . Redshift of crossing 10% threshold2 . h − Mpc 40 h − Mpc . . . . . . . . . Redshift of crossing 90% threshold2 . h − Mpc

Fig. 1.—

The moment of reaching 10% (left panel) and 90% (right panel) ionization level in a slice from 40 h − Mpc simulation. h − Mpc − . − . − . − . . . . . . δ . h − Mpc 40 h − Mpc δ . h − Mpc

Fig. 2.—

The overdensity of initial conditions (left panel) and evolved baryon density ﬁeld at redshift 6 (right panel) in the same sliceas in Figure 1. atching analytical models of cosmic reionization with numerical simulations 5calculate the moment of its ionization. We adopt twothresholds 10% and 90% weighted by mass. The vastmajority of the cells cross these thresholds only once,therefore we can assign those values to the grid. In Fig-ure 1 the slices of the redshift of ionization are plotted.In Figure 2 the same slice for the density initial condi-tions (IC) and the evolved baryonic density ﬁeld at z = 6are plotted.Immediately we see that the ionization ﬁeld is muchsmoother for 10% threshold rather than for 90% thresh-old. It motivates us to consider the time delay betweenthe moments when a cell reaches 10% and 90% ionizationthresholds as a separate ﬁeld.In Figure 3 we plot the distribution of those delays.For the vast majority of the cells this delay is small com-pared to the cosmological timescales; however, a fractionof the cells show big delays. The peak-like structure ofthe histogram is an artifact due to the characteristic timebetween the simulation snapshots.In Figure 4 we plot the delay for the same slice. Thecells with long delays are correlated with the overdensecells in the density ﬁeld. This correlation is caused bythe ﬁlaments located in denser cells, which stay rela-tively neutral even when the surroundings are alreadyhighly ionized. Therefore, this eﬀect ﬁts into the MHR00paradigm of the gradual ionization of the overdense re-gions. Random walk in numerical simulation

The trajectory, or random walk, which we described in § r , r , ...r N in comoving units.In this paper we stick with r notation since we workwith diﬀerent scalar ﬁelds, and values of m and σ arenot constant across them. Also not all scalar ﬁelds we useare Gaussian, and those are not described exclusively bythe ﬁrst moment statistics, power spectrum. Therefore, σ is less motivated quantity.We call the barrier a two dimensional function, B ( r i , z j ), which is deﬁned on uniform grid of scales, r i ,and redshifts, z j . The trajectory is one dimensional func-tion associated with each cell x , T x ( r i ), deﬁned at allscales, r i . It corresponds to the density of scalar ﬁeld inthe cell after applying a smoothing ﬁlter of scale r i .The analytic theories based on Furlanetto et al. (2004)use the ﬁrst time crossing as an indication of ionization.The condition of the ﬁrst time ionization of a cell (cid:126)x canformulated as follows: z ion = max { z j | ∃ i. { T x ( r i ) > B ( r i , z j ) } . (6)It summarizes the barrier approach.In order to translate a numerical simulation to the lan-guage of the analytical model we calculate a random walkfor every cell in our simulation box. In order to do thiswe use 8 log-spaced discrete smoothing scales – from asingle cell to the half-box size. This number is foundto be suﬃcient for our purposes, and increasing it doesnot aﬀect the results. We generate a smoothed density T d [Myr] N d N / d T d Fig. 3.—

The distribution of time delays between the momentswhen a cell reaches 10% and 90% ionization thresholds. The spikesat T d ∼ T d ∼

10 are artiﬁcial, and arecaused by the uneven time intervals between the snapshots. ﬁeld with every chosen scale. Then, we normalize the re-sulting scalar ﬁelds to the Normal distribution with zeromean and unity standard deviation. This operation isperformed only to avoid usage of physical units (whichdiﬀer from ﬁeld to ﬁeld) and better visualization of tra-jectories in ﬁgures. Skipping this step does not aﬀectthe ﬁnal results. Finally, we assign to each cell 8 num-bers that correspond to the normalized overdensities for8 smoothing scales.The method described above can work with any scalarﬁeld. In this paper we consider only the IC density ﬁeld(linear) and evolved matter density ﬁeld (non-linear).We leave the star density and halo density ﬁelds beyondthe scope of this paper; however, both of these ﬁelds po-tentially can signiﬁcantly improve the results.For now, let us consider the IC, which is a Gaussianﬁeld, and the redshift of 10% ionization. We want to de-termine whether there is a correlation between the tra-jectories and the redshifts of ionization. We consider thecells that were ionized within some redshift interval andplot the density distribution of their trajectories alongwith the median in Figure 5. The deviation from zerotells us that, indeed, there is a dependence.Next, we repeat this operation for all redshift inter-vals. We consider two density ﬁelds (IC and evolved DMdensity) and two ionization thresholds (10% and 90%).Thus, we have 4 possible combinations. The median tra-jectories for all of them are plotted in Figure 6. Also, forcompleteness, we study delays with the same approachand evolved density ﬁeld (see Figure 7).

Discussion

The median trajectories in Figures 6 and 7 clearly showredshift dependence. In this section we discuss the ob-served patterns and the physical reasoning behind them.First, we have two ﬁgures which relate to the 10% ion-ization threshold with the IC and the evolved density Alexander A. Kaurovﬁelds. Both of them display a gradual decrease of a bar-rier from overdense to underdense, which conﬁrms theinside-out scenario. The physical explanation for this ob-servation is simple: the dense regions host galaxies whichproduce the ionizing radiation, and therefore reionizationstarts from the overdense regions and ends inside the un-derdense – voids.All the trajectories reach zero at the maximumsmoothing scale which is expected box eﬀect; however,the same behavior is expected in the real world, sincecosmic variance is supposed to reach zero at some scale.Since the largest box we have access to is 40 h − Mpc, wecannot distinguish these two eﬀects.On the other end, at the smallest scales, there is also areduced correlation between density and ionization ﬁeld.For the IC density it is caused by the fact that the posi-tion of galaxies is not identically correlated with densitypeaks in the IC. For the evolved density ﬁeld the reason-ing is diﬀerent. The evolved density ﬁeld correlates withgalaxy positions much better; however, it also correlateswith dense ﬁlaments which require more intense ﬂux ofionizing photons to ionize.The phase plots show that the highest correlation(largest amplitudes of the trajectories) is achieved atscales 0 . − h − Mpc. This might be interpreted as therange of scales where the majority of the informationabout the ionization ﬁeld is contained. As mentionedabove, due to the lack of larger boxes we can only specu-late regarding the upper bound of such a range. However,we can be more conﬁdent in the lower bound becausethe resolution of the numerical simulation (0 . h − kpc) ismuch higher than the down-sampled grids (156 h − kpc)that we used for this analysis.Another feature that can be observed is the scale ofthe largest correlation amplitude. It increases from justbelow 1 h − Mpc at redshift 15 to a few h − Mpc scale atredshift 6. It can be associated with the growth of thecharacteristic bubbles size.The most important observation from these phasespace plots is that the median trajectories are distin-guishable and therefore they contain information aboutthe reonization history. It conﬁrms the applicability ofthe excursion set framework for describing the reioniza-tion morphology.Now we consider the ﬁgures related to 90% thresh-old. The main diﬀerence is located at lower redshifts,where the median trajectories are going up again. Thosecorrespond to the regions correlated with the local den-sity which we mentioned in § h − Mpc

Delay [Myr]2 . h − Mpc

Fig. 4.—

The time delay between the moments when cell reaches10% and 90% ionization thresholds in the same slice as in Figures1 and 2. the phase space of delayed ﬁeld in Figure 7 does not haveintersecting trajectories. It means that by splitting 90%ionization threshold ﬁeld into 10% ionization ﬁeld anddelays we can avoid degeneracy in the phase space.Another way of thinking about it is that MHR00 modelis valid not only at late stages of reionization, but at anyredshifts inside the ionized bubbles. While the FZH04model only describes the ionization fronts at low ioniza-tion thresholds (10% in our case). TRAINING THE ANALYTICAL MODEL ON ANUMERICAL SIMULATION In § § § − Smoothing scale [ h − Mpc] − O v e r d e n s i t y i n σ Fig. 5.—

The density of the trajectories distribution. The solidand dashed lines are the median trajectory and 1 σ scatter. about training or ﬁtting is what is the criteria of good-ness. We overview them in § Building a model

As it was shown in § § Barrier estimation In § ∼ z i , z i +1 ] cross thebarrier B ( z i +1 , r ) and do not cross B ( z i , r ). Since it is not the case for a real numerical simulation with radia-tive transfer, because we have at least some noise in thephase plot, we need to come up with a more robust al-gorithm.In Figure 5 the two-dimensional density of trajectoriesis plotted for a given redshift. In fact, the phase plotis three-dimensional, where the third dimension is theredshift. In this space the series of barriers at diﬀerentredshifts is represented by a surface.The simplest algorithm would be the brute-force, i.e.to probe all possible barriers and ﬁnd the one that allowsto achieve the best concordance with the reference simu-lation. This would be the most robust, but not eﬀectivealgorithm. We can optimize it by making an initial guessof a good barrier, and then only brute-force all barriersclose to it.In order to make a good initial guess for the barrier weconsider the gradients along redshift axis. The collectionof points with highest gradients deﬁnes the surface, whichis our best guess. Notice, that for the scale equal to ourbox size all gradients along the redshift axis will be zero,because the trajectories are not distinguishable on thisscale. Consequently, the maximum gradient cannot befound and the barrier is not deﬁned at this scale (seeFigure 8).No matter what is the training algorithm, it shouldrely on some sort of goodness criteria, which we discussin § Goodness criteria

The comparison between the numerical and analyticalmodels had been carried out before in Zahn et al. (2011)and Battaglia et al. (2013). Here we outline possiblecriteria and discuss why some of them are not applicablein our case.

Global ionization fraction , or optical depth averagedover the whole sky, does not trace any information aboutmorphology and inhomogeneity of reionization. There-fore, it can be used only for estimating total number ofemitting photons, neglecting recombination (which in ageneral case can signiﬁcantly depend on the morphol- Alexander A. Kaurov . Smoothing scale [ h − Mpc] − O v e r d e n s i t y i n σ Redshift IC vs. 10% . Smoothing scale [ h − Mpc] − O v e r d e n s i t y i n σ Redshift IC vs. 90% . Smoothing scale [ h − Mpc] − O v e r d e n s i t y i n σ Redshiftgas density vs. 10% . Smoothing scale [ h − Mpc] − O v e r d e n s i t y i n σ Redshiftgas density vs. 90%

Fig. 6.—

The phase space of median trajectories colorcoded with redshift and deﬁned for IC and 10% ionized threshold ionization ﬁeld(top-left panel); IC and 90% (top-right panel); evolved density ﬁeld and 10% (bottom left); evolved density and 90% (bottom right). ogy).

Bubble distribution has a few deﬁnitions. The ﬁrst oneis robust and based on smoothed scalar ﬁelds. It is equiv-alent to the power spectrum. On the other hand, if abubble is deﬁned as an ionized volume, which is not con-nected with other bubbles through ionized patches, thanit becomes much harder to deﬁne it algorithmically. Ifthe bubbles are deﬁned as the largest spherical volumewith some given ionized fraction (Zahn et al. 2007), thenin our case it is not clear how to deﬁne bubbles for thedense semi-neutral ﬁlaments and for the cells right nearthem.

Pixel-by-pixel comparison, which was made inBattaglia et al. (2013), is the most powerful in case ifwe expect perfect match. However, if we expect somedeviations we will face a few uncertainties. First, as wementioned before the redshift of ionization of a cell is not a well deﬁned quantity. It depends on the threshold, andalso the process of ionization cannot be instantaneous.Second, one should ask which areas are more importantto match. For instance, at redshifts below 6 most ofthe observed signal comes from neutral patches whichoccupy small volume; therefore, pixel-by-pixel compari-son has to take it into account weighting pixels by theircontribution to the observed quantities.

Topological quantities , such as Minkowski functionalare used for statistical analysis of reionization and distri-bution of cosmological objects (Friedrich et al. 2011; Mc-Donough & Brandenberger 2013). However, these quan-tities are not directly observed.

Power spectrum is the most common statistics for de-scribing ﬁelds on large scales. Also it is a direct observ-able which will be measured with 21cm experiments. In avanilla FZH04 model, where only excursion set formalismatching analytical models of cosmic reionization with numerical simulations 9 . Smoothing scale [ h − Mpc] − O v e r d e n s i t y i n σ

80 160 240 320 400 480 560 640 T d [Myr] Fig. 7.—

Median trajectories corresponding to the delay betweencrossing 10% and 90% ionization thresholds. . Smoothing scale [ h − Mpc] − O v e r d e n s i t y i n σ . . . . . . . . . Redshift

Fig. 8.—

The barriers that were estimated from the phase space,shown in Figure 5. and the Gaussian IC are used the barrier and the powerspectrum (as well as bubble distribution) are directly in-terconnected and one can be derived analytically fromanother. However, when we modify the model by usingnon-linear density ﬁeld and adding the neutral patchesinside the ionized regions this convenience vanishes andwe have to perform all calculations numerically. Cross correlation coeﬃcient between the neutral hy- The input of smallest structures to the power spectrum hasbeen studied in Watkinson et al. (2015). Also, in Kaurov & Gnedin(2016) the eﬀect of the substructure on the 21 cm power spectrumis studied. drogen in the numerical simulation, D , and analyticalmodel, X , deﬁned as (Zahn et al. 2011): r XD = P XD / (cid:112) P XX P DD . (7)In this study we consider only last two statistics be-cause those are directly related to the observable quanti-ties and do not have uncertainties in the deﬁnition. Theresult of training is shown in the Figure 9 with black lines.The analytical model matches the numerical simulationwithin 10s of percent level, which is consistent with thesimilar study in Zahn et al. (2011). In contrast to thatstudy, we consider smaller scales and semi-neutral ﬁla-ments, and in result we have a diﬀerent shape of r XD asa function of k . The largest deviation from unity reaches0.2 at scales ∼ − h Mpc − ; the deviations at the samescales can be visually distinguished in Figure 9.Beside the realization that has been used for train-ing (realization A), we have two other with the identicalphysics, but diﬀerent initial conditions (B and C). We ap-ply the same analytical model to these two realizationsand compare the power spectrum and cross correlationcoeﬃcients in Figure 9. The model performs equally well.Therefore, we can conclude that we did not over-trainedthe model.Also, in Figure 9, we show the power spectrum of theanalytical model applied to a 160 h − Mpc box, for whichwe do not have a numerical result. It illustrates how theanalytical model can be expanded to the larger volumes.Three stages captured in the Figure 9 represent thebeginning, the middle, and the very end of reionization.The ﬁrst one is characterized by individual bubbles withonly few galaxies within them. The model can repro-duce the power spectrum; however, visually many bub-bles are placed in the incorrect places. It happens be-cause in this study we used only density ﬁelds. Themodel can be signiﬁcantly improved if the halos wouldbe cosidered as well. The second stage with overlap-ping bubbles correponds to the global ionization fractionreaching ∼ § h − Mpc. However, here we showedthat the largest deviations occur on the scales just abovethis range. We interpret it as the limitation of the bar-rier approach. Since the excursion set formalism is onlyan approximation to the numerical simulation, it is onlycapable of capturing the correlations partially. CONCLUSIONS

In this paper we limit ourselves to the approach basedon the excursion set formalism; however, having all ofthe available information from the phase space we couldconstruct any mathematical model, for instance, basedon the likelihood to the given median trajectories. The0 Alexander A. Kaurov h − Mpc0 . . . h − Mpc . . . Ionized fraction 40 h − Mpc . . . Ionized fraction h − Mpc0 . . . h − Mpc . . . Ionized fraction 40 h − Mpc . . . Ionized fraction . k [ h Mpc − ] − − . k / π P k . k [ h Mpc − ] − − . k / π P k . k [ h Mpc − ] − − − . k / π P k . k [ h − Mpc − ] . . . . . r X D . k [ h − Mpc − ] . . . . . r X D . k [ h − Mpc − ] . . . . . r X D Fig. 9.—

The ionization fraction in the numerical simulation (the ﬁrst row of panels) and trained analytical model (the second row)in the same slice as in Figures 1, 2, and 4. The power spectrum of the neutral hydrogen in the numerical simulations (dashed lines) andour model (solid) is shown in the third row of panels. The correlation coeﬃcient (see Equation 7) is shown in the forth row. Black isthe realization A, blue and red are realization B and C. Solid lines correspond to the same analytical model, which is ﬁtted only to therealization A. Green solid line corresponds to the power spectrum calculated with the same analytical model in a 160 h − Mpc box. Thecolumns correspond to three diﬀerent moments in time: early stage of reionization 5% ionized fraction; intermediate stage with 50%ionized fraction and interloping bubbles; and late stage, when only ﬁlaments remain partially neutral. atching analytical models of cosmic reionization with numerical simulations 11advantage of barrier approach is the similarity with ex-isting and widely used analytical models. The excursionset formalism is not the only analytical method that canbe ﬁtted into numerical simulation.We have performed a detailed comparison between thenumerical high resolution simulation and approximateanalytical methods. We studied how the framework ofthe analytical model (excursion set formalism) can helpto statistically describe ionization fronts in the numericalsimulation. We have shown that it is possible and high-lights some physical properties which are usually lost orcannot be easily seen in other statistics like power spec-trum.The statistics we developed is descriptive enough to beable to reproduce reionization history from density ﬁeld.It allows us to build a model which combines the frame-work of excursion set formalism Furlanetto et al. (2004)for describing large scale structure of ionization frontsand threshold approach Miralda-Escud´e et al. (2000) forsmall scale neutral patches – ﬁlaments – inside ionizedbubbles. We perform training of the model into a givennumerical simulation, using the power spectrum as ourmain criteria. The ﬁtted model is capable to repro-duce the power spectrum of other simulation realizations,which were not used during the ﬁtting.Ideally, a theoretical model should be able to describe all of the mentioned observations, as discussed in §

2. Ourapproach allows to build a single model, and compare itversus all available observations.We propose the following work ﬂow. First, one runs asimulation with any physics included. The box might berelatively small and number of realization can be low.Such a numerical simulation allows one to model theThompson optical depth, Lyman- α forest and galaxy lu-minosity function. Then, the semi-analytical model istrained on this numerical simulation using the describedmethod. The resulted model can generate as many mockcatalogs as needed in boxes with sizes exceeding the sizeof the original simulation. These extended boxes are suit-able for generating mock catalogs for 21cm experimentsand to generate polarized CMB signal.We made our code publicly available, further de-tails are available on https://bitbucket.org/kaurov/211mm .This work was supported by the NSF grant AST-1211190. AK is supported by the Fermilab Fellowshipin Theoretical Physics. Fermilab is operated by FermiResearch Alliance, LLC, under Contract No. DE-AC02-07CH11359 with the United States Department of En-ergy..This work was supported by the NSF grant AST-1211190. AK is supported by the Fermilab Fellowshipin Theoretical Physics. Fermilab is operated by FermiResearch Alliance, LLC, under Contract No. DE-AC02-07CH11359 with the United States Department of En-ergy.