[PDF] Reionization constraints using Principal Component Analysis

Abstract

Using a semi-analytical model developed by Choudhury & Ferrara (2005) we study the observational constraints on reionization via a principal component analysis (PCA). Assuming that reionization at z>6 is primarily driven by stellar sources, we decompose the unknown function N_{ion}(z), representing the number of photons in the IGM per baryon in collapsed objects, into its principal components and constrain the latter using the photoionization rate obtained from Ly-alpha forest Gunn-Peterson optical depth, the WMAP7 electron scattering optical depth and the redshift distribution of Lyman-limit systems at z \sim 3.5. The main findings of our analysis are: (i) It is sufficient to model N_{ion}(z) over the redshift range 2<z<14 using 5 parameters to extract the maximum information contained within the data. (ii) All quantities related to reionization can be severely constrained for z<6 because of a large number of data points whereas constraints at z>6 are relatively loose. (iii) The weak constraints on N_{ion}(z) at z>6 do not allow to disentangle different feedback models with present data. There is a clear indication that N_{ion}(z) must increase at z>6, thus ruling out reionization by a single stellar population with non-evolving IMF, and/or star-forming efficiency, and/or photon escape fraction. The data allows for non-monotonic N_{ion}(z) which may contain sharp features around z \sim 7. (iv) The PCA implies that reionization must be 99% completed between 5.8<z<10.3 (95% confidence level) and is expected to be 50% complete at z \approx 9.5-12. With future data sets, like those obtained by Planck, the z>6 constraints will be significantly improved.

Full PDF

aa r X i v : . [ a s t r o - ph . C O ] D ec Mon. Not. R. Astron. Soc. , 000–000 (0000) Printed 22 October 2018 (MN L A TEX style ﬁle v2.2)

Reionization constraints using Principal Component Analysis

Sourav Mitra ⋆ , T. Roy Choudhury † and Andrea Ferrara ‡ Harish-Chandra Research Institute, Chhatnag Road, Jhusi, Allahabad 211019, India Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy

22 October 2018

ABSTRACT

Using a semi-analytical model developed by Choudhury & Ferrara (2005) we study the ob-servational constraints on reionization via a principal component analysis (PCA). Assumingthat reionization at z > is primarily driven by stellar sources, we decompose the unknownfunction N ion ( z ) , representing the number of photons in the IGM per baryon in collapsed ob-jects, into its principal components and constrain the latter using the photoionization rate, Γ PI ,obtained from Ly α forest Gunn-Peterson optical depth, the WMAP7 electron scattering opti-cal depth τ el and the redshift distribution of Lyman-limit systems d N LL / d z at z ∼ . . Themain ﬁndings of our analysis are: (i) It is sufﬁcient to model N ion ( z ) over the redshift range < z < using 5 parameters to extract the maximum information contained within thedata. (ii) All quantities related to reionization can be severely constrained for z < becauseof a large number of data points whereas constraints at z > are relatively loose. (iii) Theweak constraints on N ion ( z ) at z > do not allow to disentangle different feedback modelswith present data. There is a clear indication that N ion ( z ) must increase at z > , thus rul-ing out reionization by a single stellar population with non-evolving IMF, and/or star-formingefﬁciency, and/or photon escape fraction. The data allows for non-monotonic N ion ( z ) whichmay contain sharp features around z ∼ . (iv) The PCA implies that reionization must be99% completed between . < z < . (95% conﬁdence level) and is expected to be 50%complete at z ≈ . − . With future data sets, like those obtained by Planck , the z > constraints will be signiﬁcantly improved. Key words: dark ages, reionization, ﬁrst stars – intergalactic medium – cosmology: theory –large-scale structure of Universe.

The importance of studying hydrogen reionization at high red-shifts lies in the fact that it is tightly coupled to proper-ties of ﬁrst luminous sources and subsequent galaxy formation(for reviews, see, Loeb & Barkana 2001; Barkana & Loeb 2001;Choudhury & Ferrara 2006a; Choudhury 2009). In recent years,studies in reionization have been boosted by (i) the availability ofa wide range of data sets and (ii) the expectation that the volumeof data would increase rapidly over the next few years (for reviews,see Furlanetto, Oh, & Briggs 2006; Fan, Carilli, & Keating 2006).Given such a large amount of data, it is important to develop theo-retical and statistical methods so that maximum information can beextracted.Theoretically, reionization is modelled either semi-analytically or by numerical simulations. Unfortunately, thephysical processes relevant to reionization are so complex thatneither of the two approaches can capture the overall picture ⋆ E-mail: [email protected] † E-mail: [email protected] ‡ E-mail: [email protected] entirely. The simulations are indispensable for understandingdetailed spatial distribution of ionized regions and topology ofreionization. However, if one is interested in the evolution ofglobally-averaged quantities, then semi-analytical models prove tobe very useful in providing insights. The main reason for this isthat these models can probe a wide range of parameter space whichcan be quite large depending on our ignorance of the differentprocesses.At present, our understanding of reionization is that it is pri-marily driven by ultra-violet radiation from stellar sources form-ing within galaxies. The major uncertainty in modelling reioniza-tion is to model the star-formation history and transfer of radia-tion from the galaxies to the intergalactic medium (IGM) whichis usually parameterized through N ion , the number of photonsentering the IGM per baryon in collapsed objects. This param-eter, in principle, has a dependence on z which can arise fromevolution of star-forming efﬁciency, fraction of photons escap-ing from the host halo and chemical and radiative feedback pro-cesses. Note that this parameter remains uncertain even in nu-merical simulations, hence the semi-analytical models can be-come handy in studying a wide range of parameter values andthe corresponding agreement with data sets. In analytical stud- c (cid:13) Mitra, Choudhury & Ferrara ies, N ion ( z ) is either taken to be a piecewise constant func-tion (Wyithe & Loeb 2003; Choudhury & Ferrara 2005) or param-eterized using some known functions (Chiu, Fan, & Ostriker 2003;Pritchard, Loeb, & Wyithe 2010) or modelled using a physically-motivated prescription (Choudhury & Ferrara 2006b). In partic-ular, a model involving metal-free and normal stars withsome prescription for radiative and chemical feedback canmatch a wide range of observations (Choudhury & Ferrara 2006b;Gallerani, Choudhury, & Ferrara 2006) and possibly make predic-tion regarding search for reionization sources by future experiments(Choudhury & Ferrara 2007).However, the fact remains that many of the physical processesinvolved in modelling N ion are still uncertain. Given this, it isworthwhile doing a detailed probe of the parameter space and deter-mine the range of reionization histories that are allowed by the data.In other words, rather than working out the uncertain physics, onecan ask the question as to what are the forms of N ion ( z ) impliedby the data itself. It is expected that in near future, with more datasets becoming available, the allowed range in the forms of N ion ( z ) would be severely constrained, thus telling us exactly how reion-ization occurred. Now, it is obvious that the constraints on N ion ( z ) will not be same for all redshifts, points where there are more andbetter data available, the constraint would be more tight. Similarly,since we deal with a heterogeneous set of data, it is expected thatthe constraints would depend on the nature of data used. It is thusimportant to know which aspects of reionization history can be con-strained by what kind of data sets. A method which is ideally suitedto tackle this problem is to use the principal component analysis(PCA); this is a technique to compute the most meaningful basis tore-express the unknown parameter set and the hope is that this newbasis will reveal hidden detailed statistical structure.In this work, we make a preliminary attempt to constrain N ion ( z ) using PCA and hence estimate the uncertainties in thereionization history. The main objective of the work would be toﬁnd out the widest possible range in reionization histories allowedby the different data sets.Throughout the paper, we assume a ﬂat Universe with cos-mological parameters given by the WMAP7 best-ﬁt values: Ω m =0 . , Ω Λ = 1 − Ω m , Ω b h = 0 . , and h = 0 . . The pa-rameters deﬁning the linear dark matter power spectrum we use are σ = 0 . , n s = 0 . , d n s / d ln k = 0 (Larson et al. 2010). The semi-analytical model used in this work is based onChoudhury & Ferrara (2005) and Choudhury & Ferrara (2006b).Let us ﬁrst summarize the main features of the model alongwiththe modiﬁcations made in this work: • The model accounts for IGM inhomogeneities by adopt-ing a lognormal distribution according to the method outlined inMiralda-Escud´e, Haehnelt, & Rees (2000); reionization is said tobe complete once all the low-density regions (say, with overden-sities ∆ < ∆ crit ∼ ) are ionized. The mean free path of photonsis thus determined essentially by the distribution of high densityregions: λ mfp ( z ) = λ [1 − F V ( z )] / (1)where F V is the volume fraction of ionized regions and λ is a nor- malization parameter. In our earlier works, the value of this param-eter was ﬁxed by comparing with low redshift observations whilein this work, we treat it as a free parameter. We follow the ioniza-tion and thermal histories of neutral, HII and HeIII regions simul-taneously and self-consistently, treating the IGM as a multi-phasemedium. • The model assumes that reionization is driven by stellarsources. The stellar sources can further be divided into two classes,namely, (i) metal-free (i.e. PopIII) stars having a Salpeter IMF inthe mass range − M ⊙ : they dominate the photoionization rateat high redshifts; (ii) PopII stars with sub-solar metallicities alsohaving a Salpeter IMF in the mass range − M ⊙ . • Reionization by UV sources is accompanied by photo-heatingof the gas, which can result in a suppression of star formationin low-mass haloes. We compute such (radiative) feedback self-consistently from the evolution of the thermal properties of theIGM. • Furthermore the chemical feedback including PopIII → PopIItransition is implemented using merger-tree based genetic approachSchneider et al. (2006). Under this approach, it is assumed that ifa given star-forming halo has a progenitor which formed PopIIIstars, then the halo under consideration is enriched and cannot formPopIII stars. In this work, we introduce an analytical formula for thetransition from PopIII to PopII phase using the conditional proba-bility of Press-Schechter mass function (Lacey & Cole 1993). Theprobability that a halo of mass M at z never had a progenitor in themass range [ M min ( z ) , M + M res ] is given by f III ( M, z ) = 2 π tan − (cid:20) σ ( M + M res ) − σ ( M ) σ ( M min ( z )) − σ ( M + M res ) (cid:21) , (2)where M min is the minimum mass of haloes which are able to formstars and M res represents the minimum increase in mass (either byaccretion or by merger) of an object so that it may be identiﬁedas a new halo. The fraction of collapsed haloes which are able toform PopII and PopIII stars at redshift z are given by the followingrelations: f coll , II ( z ) = 1¯ ρ m Z ∞ M min ( z ) d M [1 − f III ( M, z )] M ∂n ( M, z ) ∂M ,f coll , III ( z ) = 1¯ ρ m Z ∞ M min ( z ) d M f

III ( M, z ) M ∂n ( M, z ) ∂M . (3)with f coll , II ( z ) + f coll , III ( z ) = f coll ( z ) . The quantity ¯ ρ m is thecomoving density of dark matter and ∂n/∂M is number densityof collapsed objects per unit comoving volume per unit mass range(Press & Schechter 1974). • Given the collapsed fraction, this model calculates the produc-tion rate of ionizing photons in the IGM as ˙ n ph ( z ) = n b (cid:20) N ion , II d f coll , II d t + N ion , III d f coll , III d t (cid:21) (4)where n b is the total baryonic number density in the IGM and N ion , II ( N ion , III ) is the number of photons from PopII (PopIII) starsentering the IGM per baryon in collapsed objects. The parameter N ion can actually be written as a combination of various other pa-rameters: N ion ≡ ǫ ∗ f esc m p Z ∞ ν HI d ν (cid:20) d N ν d M ∗ (cid:21) ≡ ǫm p Z ∞ ν HI d ν (cid:20) d N ν d M ∗ (cid:21) , (5)where ǫ ∗ denotes the star-forming efﬁciency (fraction of baryonswithin collapsed haloes going into stars), f esc is the fraction ofphotons escaping into the IGM, [d N ν / d M ∗ ] gives the number ofphotons emitted per frequency range per unit mass of stars (which c (cid:13) , 000–000 eionization constraints using PCA depends on the stellar IMF and the corresponding stellar spec-trum) and ǫ ≡ ǫ ∗ f esc . For PopII stars with sub-solar metallici-ties having a Salpeter IMF in the mass range − M ⊙ , we get N ion , II ≈ ǫ II , while for PopIII stars having a Salpeter IMF inthe mass range − M ⊙ , we get N ion , III ≈ ǫ III .In this Section, we take ǫ II , ǫ III (or, equivalently N ion , II , N ion , III ) to be independent of z and M , which im-plies that the star-forming efﬁciencies and the escape fractionsdo not depend on the mass of the star-forming halo and also donot evolve. However, note that the effective N ion (which is theappropriately weighted average of N ion , II and N ion , III ) evolveswith zN ion ( z ) = N ion , II d f coll , II d t + N ion , III d f coll , III d t d f coll , II d t + d f coll , III d t (6)At high redshifts, we expect d f coll , II / d t → , hence N ion ( z ) → N ion , III , and similarly at low redshifts where chemical enrichmentis widespread, we have N ion ( z ) → N ion , II . • We also include the contribution of quasars basedon their observed luminosity function at z < (Hopkins, Richards, & Hernquist 2007); we assume that theyhave negligible effects on IGM at higher redshifts. They aresigniﬁcant sources of photons at z . and are particularlyrelevant for studying helium reionization. • The free parameters for this analysis would be ǫ II , ǫ III (or,equivalently N ion , II , N ion , III ) and λ , the normalization which de-termines the mean free path of photons. • Usually, the model is constrained by comparing with a varietyof observational data, namely, (i) redshift evolution of Lyman-limitabsorption systems (LLS), (ii) IGM Ly α and Ly β optical depths,(iii) electron scattering optical depth, (iv) temperature of the meanintergalactic gas, and (v) cosmic star formation history. However,most of the constraints on the model come from a subset of theabove data sets. In this work, we would like to carry out a detailedlikelihood analysis of the parameters. Hence to keep the analysissimple, the likelihood analysis is done using only three particulardata sets which are discussed as follows:(i) We use estimates for the photoionization rates Γ PI obtained using Ly α forest Gunn-Peterson optical depth ob-servations and a large set of hydrodynamical simulations(Bolton & Haehnelt 2007). The error-bars in these data points takeinto account the uncertainties in the thermal state of the IGM inaddition to the observational errors in the Ly α optical depth. Thedata points have a mild dependence on the cosmological param-eters which has been taken into account in this work. We alsoﬁnd that although the error-bars on Γ PI are highly asymmetric,those on log(Γ PI ) are relatively symmetric; hence we use valuesof log(Γ PI ) and the corresponding errors in our likelihood anal-ysis. The photoionization rate can be obtained in our model from ˙ n ph ( z ) using the relation Γ PI ( z ) = (1 + z ) Z ∞ ν HI d ν λ mfp ( z ; ν ) ˙ n ph ( z ; ν ) σ H ( ν ) (7)where ν the frequency of radiation, ν HI is the threshold frequencyfor photoionization of hydrogen and σ H ( ν ) is the photoionizationcross section of hydrogen.(ii) The second set of observations we have used correspondsto the WMAP7 data on electron scattering optical depth τ el (Larson et al. 2010). The reported value of this quantity dependson the background cosmological model used. In this work, werestrict ourselves to the ﬂat CDM universe with a cosmologicalconstant and use the corresponding constraints on τ el . Also, the τ el constraint is treated as a single data point which should bethought as a simpliﬁcation because CMB polarization observationsare, in principle, sensitive to the shape of the reionization history(Burigana et al. 2008). However, we have checked and found thatthe range of reionization histories considered in this paper wouldhardly make any difference to the currently observed large angu-lar scale polarization anisotropies other than the value of τ el . Thequantity τ el can be obtained from our model given the global reion-ization history, in particular the comoving density of free electrons n e ( z ) : τ el ( z ) = σ T c Z z [ t ]0 d t n e (1 + z ) (8)where σ T is the Thomson scattering cross section.(iii) Finally, we use the redshift distribution of LLS d N LL / d z at z ∼ . (Prochaska, O’Meara, & Worseck 2010). The data pointsare obtained using a large sample of QSO spectra which results inextremely small statistical errors. However, there are various sys-tematic effects arising from effects like the incidence of proximateLLS and uncertainties in the continuum. Usually, these effects con-tribute to about 10–20% uncertainty in the data points. The quantity d N LL / d z can be calculated in our model from the mean free path: d N LL d z = c √ π λ mfp ( z ) H ( z )(1 + z ) (9)Note that inclusion of the Lyman-limit systems in the analysis iscrucial for constraining the parameter λ .The likelihood function used in our calculations is given by L ∝ exp( −L ) (10)where L is the negative of the log-likelihood. It is estimated usingthe relation L = 12 n obs X α =1 (cid:20) G obs α − G th α σ α (cid:21) (11)where G α represents the set of n obs observational data points de-scribed above, i.e., G α = { log(Γ PI ) , τ el , d N LL / d z } and σ α arethe corresponding observational error-bars. We constrain the freeparameters by maximizing the likelihood function. We impose aprior such that reionization should be complete by z = 5 . , oth-erwise it will not match that Ly α and Ly β forest transmitted ﬂuxdata. The results of our likelihood analysis using the reionization modeldescribed above are summarized in Table 1. The evolution of vari-ous quantities for models which are allowed within 95% conﬁdencelimit is shown in Figure 1.The top-left panel of the ﬁgure shows the evolution of the ef-fective N ion as given by equation (6). One can see that the quantityattains a constant value ≈ at z < which is a consequence of We did not include the more recent measurements of d N LL / d z bySongaila & Cowie (2010) because the values are systematically larger thanthe ones quoted in Prochaska, O’Meara, & Worseck (2010) at z ∼ . ; in-clusion of both the data sets would lead to a bad ﬁt for the model. TheSongaila & Cowie (2010) set has a data point at z ∼ which is not presentin other data sets, however the present error-bar on that particular point isrelatively large and hence excluding it does not affect our constraints sig-niﬁcantly.c (cid:13) , 000–000 Mitra, Choudhury & Ferrara mean N i o n ∙ ∙ ∙ ∙ ∙ . Γ P I / - s - ∙∙∙∙∙ d N LL / d z ∙ . . . z τ e l . . . z Q H II ■ ∙ □ △ ○○ - - - z l o g x H I Figure 1.

The marginalized posteriori distribution of various quantities related to reionization history for a model with chemical feedback (Choudhury &Ferrara 2006). The solid lines correspond to the model described by mean values of the parameters while the shaded regions correspond to 2- σ limits. Thepoints with error-bars denote the observational data points. Top-left: the evolution of the effective N ion ( z ) ; Top-middle: the hydrogen photoionization rate Γ PI ( z ) alongwith the constraints from Bolton & Haehnelt (2007); Top-right: the LLS distribution d N LL / d z with data points from Prochaska, O’Meara &Worseck (2010); Bottom-left: the electron scattering optical depth τ el with the WMAP7 constraint (Larson et al. 2010); Bottom-middle: the volume ﬁllingfactor of HII regions Q HII ( z ) ; Bottom-right: the global neutral hydrogen fraction x HI ( z ) with observational limits from QSO absorption lines (Fan et al.2006; ﬁlled square), Ly α emitter luminosity function (Kashikawa et al. 2006; open triangle) and GRB spectrum analysis (Totani et al 2006; open square). Alsoshown are the constraints using dark gap statistics on QSO spectra (Gallerani et al 2008a; open circles) and GRB spectra (Gallerani et al. 2008b; ﬁlled circle).Parameters Mean value 95% conﬁdence limits ǫ II .

003 [0 . , . ǫ III .

020 [0 . , . λ .

310 [2 . , . z ( Q HII = 0 .

5) 9 .

661 [7 . , . z ( Q HII = 0 .

99) 6 .

762 [5 . , . Table 1.

The marginalized posterior probabilities with 95% C.L. errors ofall free parameters (top three parameters) and derived parameters (from thefourth parameter down) for the reionization model with PopII and PopIIIstars. the fact that the photon emissivity at those epochs are purely de-termined by PopII stars. However at higher redshifts, the value of N ion increases with z because of the presence of PopIII stars. It isclear that the data cannot be ﬁtted with PopII stars with constant N ion , II alone, one requires a rise in N ion at higher redshifts. Forthe kind of chemical feedback employed in the model, the rise israther smooth and gradual.The mean values of parameters quoted in Table 1 are simi-lar to the best-ﬁt model described in Choudhury & Ferrara (2006b)and hence the corresponding reionization history is similar to thosedescribed in the same paper. This can be readily veriﬁed from Fig-ure 1 where we see that reionization starts around z ≈ drivenby PopIII stars, and it is 90 per cent complete by z ≈ . . Af-ter a rapid initial phase, the growth of the volume ﬁlled by ionizedregions slows down at z . due to the combined action of chem-ical and radiative feedback, making reionization a considerably ex-tended process completing only at z ≈ . We refer the reader toour earlier papers for a discussion of this model. Our likelihood analysis shows that reionization is 50 (99) % complete betweenredshifts z = z ≈ , essentially rulingout models of very early reionization. The reason for this is thatthe number of photons in the IGM at z = 6 is very low as impliedby the Ly α forest data. In order to take the data point into account,the models typically cannot have too high a emissivity at z ∼ .On the other hand, the constraints on τ el imply that reionizationmust be initiated early enough. Thus the IGM has to go througha gradual reionization phase. As we discussed above, the gradualreionization is maintained by a combined action of radiative andchemical feedback effects.Interestingly, we ﬁnd that a couple of data points for d N LL / d z lie above the 2- σ limits of our analysis. In models where thesepoints agree with the data, the photon mean free path λ mfp , andhence the photoionization rate Γ PI , are relatively smaller. Theselead to larger GP optical depths which then violate the Ly α forestconstraints. This discrepancy can arise either (i) because of someunaccounted systematics present in the data or (ii) from the simpli-fying assumptions made in our models for calculating λ mfp . Theactual reason needs to be investigated further. It is most likely that the star-forming efﬁciencies and escape frac-tions and hence N ion are functions of halo mass and redshift; how-ever since the dependencies are not well understood, they weretaken to be constant for each considered stellar population in the c (cid:13) , 000–000 eionization constraints using PCA previous Section. The question one can ask is that how would theconstraints on reionization histories of the previous Section changewhen the evolution of N ion is taken into account. Ideally one wouldlike to do a rigorous likelihood analysis with N ion varying with z and see the possible ranges of reionization histories consistentwith available data. One possible approach could be to parameter-ize N ion ( z ) using some (known) function and constrain the param-eters of the function (Pritchard, Loeb, & Wyithe 2010). However,it is possible that the reionization constraints thus obtained coulddepend on the nature of the function chosen. In addition, it is notclear as to how many parameters should be used to parameterizethe function.An alternative approach is to assume N ion ( z ) to be completelyarbitrary and decompose it into principal components. These prin-cipal components essentially ﬁlters out components of the modelwhich are most sensitive to the data. Obviously, these compo-nents are the ones which can be constrained most accurately, whilethe others cannot be done so. This principal component analysis(PCA), thus, should give an idea as to which aspects of N ion canbe constrained with available data. This implies that one should geta clear idea about the optimum number of parameters required tomodel N ion to ﬁt the data most accurately.In order to carry out such analysis, we modify the model de-scribed in the previous Section in following respects: • We take N ion to be a function of z . Unlike in the previous Sec-tion, we do not explicitly assume the presence of two population ofstars but rather we include only one stellar population; any changein the characteristics of these stars over time would be accountedfor in the evolution of N ion . • Clearly, the chemical feedback prescription has to abandonedin this model, as there are no two different populations of stars any-more. The chemical feedback is rather taken into account indirectlyby the evolution of N ion . However, we retain radiative feedback inthe model given its weak dependence on the speciﬁc stellar popu-lation properties.In recent years there has been a wide use of this method incosmological data analysis. The ﬁrst set of works were mostlyrelated to CMB data where, e.g., Efstathiou & Bond (1999) andEfstathiou (2002) used principal component analysis of CMBanisotropy measurements to investigate degeneracies among cos-mological parameters. Kadota et al. (2005) applied PCA to studyhow accurately CMB observables can constrain inﬂaton poten-tial in a model-independent manner. Leach (2006) used PCA tech-niques for measuring departures from scale-invariance in the pri-mordial power spectrum of density perturbations using cosmic mi-crowave background (CMB) C l data. Mortonson & Hu (2008) de-veloped a model-independent method to study the effects of reion-ization on the large-scale E-mode polarization for any reioniza-tion history with the help of principal component analysis fol-lowed by the earlier work by Hu & Holder (2003). In the con-text of weak lensing surveys, Munshi & Kilbinger (2006) studiedthe degeneracies between cosmological parameters and measure-ment errors from cosmic shear surveys using PCA. The PCA hasalso been employed as an effective tool in the context of typeIa supernova observations to constrain the equation of state ofdark energy (Huterer & Starkman 2003; Huterer & Cooray 2005;Crittenden, Pogosian, & Zhao 2009; Clarkson & Zunckel 2010). Consider a set of n obs observational data points labeled by G α , α = 1 , , . . . , n obs . Recall that G α can represent com-binations of different data sets, e.g., in our case G α = { log(Γ PI ) , τ el , d N LL / d z } .Now, let us assume that our model contains an unknown func-tion N ion ( z ) , which we wish to constrain through observations. Wecan divide our entire redshift interval [ z min , z max ] into (equal) binsof width ∆ z and represent N ion ( z ) by a set of n bin discrete freeparameters N ion ( z i ) ≡ N i ; i = 1 , , ..., n bin (12)where z i = z min + ( i − z (13)and the bin width is given by ∆ z = z max − z min n bin − . (14)In other words, we have modelled reionization using the value of N ion in each redshift bin. We can also include other free parame-ters apart from N ion ( z i ) in the analysis, like the normalization ofthe mean free path λ , cosmological parameters etc. However, forthe moment let us assume that these parameters are ﬁxed (knownfrom other observations) and concentrate on N ion ( z i ) only. We willaddress the inclusion of other parameters later in this Section.The next step is to assume a ﬁducial model for N ion ( z i ) ,which we denote by N ﬁdion ( z i ) . The ﬁducial model should be cho-sen such that it is close to the “true” model. The departure from theﬁducial model is denoted by δN ion ( z i ) = N ion ( z i ) − N ﬁdion ( z i ) ≡ δN i . (15)We can then construct the n bin × n bin Fisher matrix F ij = n obs X α =1 σ α ∂ G th α ∂N i ∂ G th α ∂N j , (16)where G th α is theoretical value of G α modelled using the N i and σ α is the observational error on G α . The derivatives in the aboverelation are evaluated at the ﬁducial model N i = N ﬁd i . Once the Fisher matrix is constructed, we can determine itseigenvalues and corresponding eigenvectors. The principal valuedecomposition is then given by the eigenvalue equation n bin X j =1 F ij S jk = λ k S ik (17)where λ k are the eigenvalues and the eigenfunctions correspondingto λ k are the k -th column of the matrix S ik , these are the principalcomponents of N i . They can be thought of a function of z i.e., S ik = S k ( z i ) .The eigenvalues λ k are usually ordered such that λ ≥ λ ≥ . . . ≥ λ n bin , i.e., λ corresponds to the largest eigenvalue while λ nbin the smallest. The eigenfunctions are both orthonormal andcomplete and hence we can expand any function of z as linear com-binations of them. In particular we can expand the departure fromthe ﬁducial model as It is worthwhile to mention that any analysis based on the Fisher ma-trix F ij , in principle, depends on the ﬁducial model chosen. The principalcomponent analysis, which essentially involves diagonalizing F ij , is thusdependent on the choice of N ﬁd i too. In this sense, the PCA is not com-pletely model-independent.c (cid:13) , 000–000 Mitra, Choudhury & Ferrara δN i = n bin X k =1 m k S k ( z i ); m k = n bin X i =1 δN ion ( z i ) S k ( z i ) (18)where m k are the expansion coefﬁcients with m k = 0 for the ﬁdu-cial model. We can now describe our model by the coefﬁcients m k rather than the original parameters δN i . The advantage is that, un-like N i , the coefﬁcients m k are uncorrelated with variances givenby the inverse eigenvalue: h m i m j i = 1 λ i δ ij (19)The accuracy with which we can determine δN ion at a particular z i is determined by the Cramer-Rao bound (cid:10) δN ( z i ) (cid:11) ≥ n bin X k =1 S k ( z i ) λ k (20)So, the largest eigenvalues correspond to minimum variance. Theeigenvalues which are smaller would essentially increase the uncer-tainty in determining δN ion ( z i ) . Hence, most of the informationrelevant for the observed data points G α is contained in the ﬁrstfew modes with the largest eigenvalues. One may then attempt toreconstruct the function δN ion ( z i ) using only the ﬁrst M ≤ n bin modes: δN ( M ) i = M X k =1 m k S k ( z i ) . (21)However, in neglecting the last n bin − M terms, one introducesa bias in determining δN ion ( z i ) . One has to then use a carefullychosen M to perform the analysis; the choice usually depends onthe particular problem in hand. We shall discuss our choice of M in the next Section.In realistic situations, there will be other free parameters (apartfrom m k or δN i ) in the model; these could be, e.g., the normal-ization of the mean free path λ , cosmological parameters etc. Letthere be n ext number of extra parameters other than m k ; this meansthat we are now dealing with a total of n tot = n bin + n ext parame-ters. In this case, we can still form the Fisher matrix of n tot × n tot dimensions which can be written as F = (cid:18) F BB T F ′ (cid:19) (22)where F is the n bin × n bin -dimensional Fisher matrix for the δN i , F ′ is the n ext × n ext -dimensional Fisher matrix for the other pa-rameters and B is a n bin × n ext -dimensional matrix containing thecross-terms. One can then invert the above F to obtain the cor-responding Hessian matrix T = F − . Following that, one simplyretains the sub-block T corresponding to δN i whose principal com-ponents will be “orthogonalized” to the effect of the other parame-ters. The resulting “degraded” sub-block will be (Press et al. 1992) ˜F = T − = F − BF ′− B T (23)In this work we keep the cosmological parameters ﬁxed; how-ever we still need to use the above formalism to marginalize over λ . In that case, obviously n ext = 1 . The detailed results of our PCA are presented in this Section.

Figure 2.

The Fisher matrix F ij in the z − z plane. The ﬁrst task is to make an assumption for the ﬁducial model N ﬁdion ( z ) . The model should match the Γ PI and d N LL / d z datapoints at z < and also produce a τ el in the acceptable range.Unfortunately, the simplest model with N ion being constant doesnot have these requirements (recall models with only PopII starswere disfavoured in the previous Section). We have found earlierthat the effective N ion should be higher at early epochs dominatedby PopIII stars and should approach a lower value at z ∼ deter-mined by PopII stars. In this work we take N ﬁdion to be the modelgiven by mean values of the free parameters in Section 2.2.The choice of this N ﬁdion may seem somewhat arbitrary as therecould be many other forms of N ion which may match the dataequally well. We have chosen this to be our ﬁducial model be-cause of the following reasons: (i) it is obtained from a physically-motivated model of star formation which includes both metal-freeand normal stars, (ii) it is characterized by a higher N ion at higherredshifts and hence produces a good match with different observa-tions considered in this work, and (iii) the transition from higher tolower values is smooth (i.e., there is no abrupt transition or sharpfeatures). The ﬁnal conclusions of this work (to be presented laterin the Section) would hold true for any ﬁducial model having thesethree properties (though the actual functional form might be differ-ent). The match with the data for our ﬁducial model is similar toFig. 2 of Choudhury (2009).We have run the reionization models over a redshift range [ z min : z max ] = [0 : 30] , with a bin width of ∆ z = 0 . . Thisgives n bin = 151 . We have checked and found that our main con-clusions are unchanged if we vary the bin width between 0.1–0.5.The Fisher matrix F ij deﬁned in equation (16) is evaluatedat the ﬁducial model and is shown as a shaded plot in the z − z plane in Figure 2. Firstly, the components of the the matrix vanishfor z < because there are no data points considered at these red-shifts. The plot shows different characteristics for F ij at redshiftintervals < z < and z > . For z < , the values of F ij areconsiderably higher because it is determined by the sensitivity of Γ PI and d N LL / d z on N ion and it turns out that Γ PI is extremelysensitive to changes in N ion . One can see a band-like structure inthe information matrix which essentially corresponds to the pres-ence of data points. The regions where data points are sparse (ornon-existent, like between z = 2 and 3), the value of F ij is rel- c (cid:13) , 000–000 eionization constraints using PCA ∙∙∙∙∙ -6 -5 -4 -3 i λ i - Figure 3.

The inverse of eigenvalues which essentially measures the vari-ance on the corresponding coefﬁcient m i . For modes larger than 5, theeigenvalues are extremely small. atively smaller, implying that one cannot constrain N ion from thedata in those redshift bins. On the other hand, the information at z > is determined by the sensitivity of τ el on N ion . Once can seethat F ij → at the highest redshifts considered; this is expectedbecause the collapsed fraction of haloes is negligible at those red-shifts and hence there exist no free electrons to contribute to τ el .The precise redshift range at which F ij become negligible dependson the (measured) value of τ el . For the WMAP7 measurements, weﬁnd that F ij is negligible for z > ; if, e.g., the measured value of τ el were higher, F ij would be non-negligible till relatively higherredshifts. We can thus conclude that it is not possible to constrainany parameters related to star formation at redshifts z > usingthe data sets we have considered in this work.Once we diagonalize the matrix F ij , we obtain its eigenvaluesand the corresponding eigenmodes. The inverse of the eigenvalues,which are essentially the variances of the corresponding modes, areplotted in Figure 3. Since the eigenvalues λ i are sorted in ascend-ing order, the variances are larger for higher modes. For modes i > , the eigenvalues are almost zero and the variances are ex-tremely large. This implies that the errors on N ion would increasedramatically if we include modes i > .The ﬁrst 5 eigenmodes which have the lowest variances areshown in Figure 4. Clearly, all these modes tend to vanish at z > , which is because of F ij being negligible at these redshifts.Also, modes are identically zero at z < because we have not usedany data points at these redshifts. The ﬁrst 4 modes essentially tracethe sensitivity of Γ PI and d N LL / d z at z < on the value of N ion .One can see a number of spikes and troughs in these modes whosepositions correspond to the presence of data points and amplitudescorrespond to the error-bars on these data points (smaller the error,larger the amplitude). The shape of the 5th mode is vary much dif-ferent from the previous four. This mode essentially contains thebehavior of N ion at z > and hence it characterizes the sensitivityof τ el on N ion . Since τ el is obtained by integrating the reioniza-tion history over the whole redshift range, the sensitivity covers awide range of redshifts (which is unlike the sensitivity of Γ PI ). The -101-101-101-1010 10 20 30-101 z Figure 4.

The ﬁrst 5 eigenmodes of the Fisher matrix, i.e., S k ( z ); k =1 , . . . , . sensitivity is maximum around z ≈ , which is determined by thenature of the ﬁducial model. The sensitivity falls at z > becausethere is a reduction in the number of sources and free electrons. In-terestingly the sensitivity falls at z < too which is due to the factthat reionization is mostly complete at these redshifts x e → andhence changing N ion does not change the value of τ el signiﬁcantly.The modes with smaller eigenvalues have large variances andhence introduce huge uncertainties in the determination of N ion .The modes are characterized by sharp features at different redshiftsand they do not contain any signiﬁcant information about the over-all reionization history. The next step in our analysis is to decide on how many modes M touse. In the case where M = n bin , all the eigenmodes are includedin the analysis and no information is thrown away. However, thiswould mean that modes with very small eigenvalues (and hencelarge uncertainties) are included and thus the errors in recoveredquantities would be large. Reducing M is accompanied by a reduc-tion in the error, but an increased chance of getting the recoveredquantities wrong (which is known as bias).It is thus natural to ask what could be the optimum value of M for calculations. The most straightforward way, which is usedoften, is to determine it by trial and error, i.e., more and more termsare added till one gets some kind of convergence in the recoveredquantities (Mortonson & Hu 2008). Let us ﬁrst work out the sim-plistic trial-and-error approach to ﬁx M and as we shall see thatthis would be helpful in understanding recovery of various parame-ters using PCA. We have already discussed that inclusion of modes > implies drastic rise in the errors. Hence, it seems that M ≤ would be a good choice. The question is whether throwing awaysuch a large number of modes ( n bin − M ) would introduce largebiases in the recovered quantities.In order to examine these issues in more detail, let us assumethat the underlying “true” form of N ion is very different from theﬁducial model we have chosen and then try to estimate the errors c (cid:13) , 000–000 Mitra, Choudhury & Ferrara input M= M= M= N i o n . Γ P I / - s - d N LL / d z . . . z τ e l . . . z Q H II - - z l o g x H I Figure 5.

Recovery of various quantities related to reionization when the input underlying model of N ion ( z ) is assumed to be a step function (shown by solidlines). The extent of recovery is shown when the ﬁrst 3 (short-dashed lines), 5 (long-dashed lines), 7 (short-long-dashed lines) PCA modes are included in theanalysis.Parameters Input value Recovered values M = 3 M = 5 M = 7 z ( Q HII = 0 .

5) 9 .

817 9 .

722 10 .

004 9 . z ( Q HII = 0 .

99) 8 .

337 6 .

311 7 .

875 8 . Table 2.

The recovered quantities for a input model where N ion is repre-sented by a step function when only the ﬁrst M PCA modes are included inthe analysis. we make in recovering this underlying model using only the ﬁrstfew modes. In order to put our method to test, it is then natu-ral to assume an underlying model which is noticeably differentfrom the ﬁducial one and study its recovery using only the ﬁrst fewmodes. Recall that the ﬁducial model represents a smoothly varying N ion , so we assume the underlying input model to be one havingan abrupt transition, e.g., a step function: N inpion ( z ) = 10 for z <

7= 40 for z ≥ (24)The parameters in the model are adjusted so that it matches with ob-servations of Γ PI and d N LL / d z at z < and also gives the correctobserved value of τ el . The idea would be to check whether we areable to recover quantities of interest with reasonable accuracy with M = 5 . The model chosen above is similar to the abrupt-transitionmodel considered in Choudhury & Ferrara (2005).The results of our analysis are shown in Figure 5 and in Ta-ble 2. In the ﬁgure, we have plotted, as functions of redshifts, thefour quantities relevant to reionization which we would like to re-cover, namely, N ion (top-left panel), the photoionization rate Γ PI (top-right panel), the volume ﬁlling factor of ionized regions Q HII (bottom-left panel) and the globally averaged neutral hydrogenfraction x HI (bottom-right panel). Different curves represent theinput step model (solid) and the recovered quantities for three val-ues of M = 3 , , (short-dashed, long-dashed, short-long-dashed,respectively). We have not shown results for intermediate values of M (i.e., M = 4 , ) because the difference between successiveplots is too small to be noticed. It is clear from the top left panel thatthe recovered N ion is excellent for z < because the ﬁducial andinput models agree at these redshifts, which is a manifestation ofthe fact that the value of N ion is highly constrained by good qual-ity data points at these redshifts. On the other hand, the recoveryis quite poor for z > . This is because the evolution at z > isonly weakly constrained by τ el . In particular at z > , the modesare essentially zero and hence all models tend to the ﬁducial oneimplying that it is impossible to recover N ion at z > with theﬁrst few modes.The top-middle and top-right panels show the correspondingplots for the photoionization rate Γ PI and the redshift distributionof Lyman-limit systems d N LL / d z respectively. The input modelhas a sharp feature around z ≈ in both the quantities arisingmainly from the abrupt step in N ion . The reionization is complete( Q HII = 1 ) at z ≈ after which the photoionization rate risessharply because of overlap of ionized regions and consequent risein mean free path (which manifests itself as a sharp drop in thenumber of LLS). This rise in Γ PI is suddenly halted at z = 7 where we see a sharp decline because of the corresponding stepdecline in N ion . Following that, Γ PI settles to a smaller value (cor-responding to a smaller value of N ion ) and subsequently shows agradual rise arising again from the rise in mean free path. Interest-ingly, this feature is completely missing in the recovered model for M = 3 (and also for M = 4 , not shown in the ﬁgure). The fea-ture shows up when M is increased to 5, though the exact natureof this feature is not identical to the input one. Increasing M to 7introduces other sharp features at z ∼ which are not present inthe input model. Of course, the recovery at z > is poor as mostof the eigenmodes hardly contain any information at these redshiftsand the recovered models simply follow the ﬁducial model. Hence,the recovery of the photoionization rate and the LLS distribution isprobably not satisfactory overall, however we can recover it with c (cid:13) , 000–000 eionization constraints using PCA Riskerrorbias1 10 100

No. of modes M R i s k , e rr o r , b i a s . . Figure 6.

Dependence of Risk, error and bias as deﬁned in equation (25)on the number of modes M . The blow-up of a region around M = 5 isshown in the inset which shows that there is a clear minimum in the Risk at M = 5 . reasonable accuracy for z < by considering the ﬁrst M = 5 modes.The recovery of τ el is shown in the bottom-left panel. It isclear that the recovery is good for all values of M . In most reion-ization studies, the quantities of main interest are the Q HII and x HI ,which are plotted in the bottom-middle and bottom-right panels re-spectively. One can easily see from both the panels that that theagreement between the M = 3 case and the input model is quitepoor (which is the case for M = 4 as well). In particular, reioniza-tion is complete at z ≈ . for the input model, while it completesonly at z ≈ for M = 3 case (see Table 2). However, the mo-ment M is increased to 5, one has a remarkable match with theinput model, e.g., the difference in Q HII is < . for z < whilethe difference is < for z < . Unfortunately, we cannot re-cover the sharp feature in x HI around z = 7 for the input model(which corresponds to a similar feature in Γ PI , discussed above)for the M = 5 case, however the overall agreement with the inputmodel is still quite good. The agreement is further improved as weincrease M (to 7 in the plot) but that comes at the cost of increas-ing errors. As far as recovering the basic reionization history (i.e.,evolution of Q HII and x HI ) is concerned, M = 5 seems to be theoptimum choice.It is important to point out that the recovery of various quan-tities related to reionization is good (or excellent, in some cases)even when the recovered value of N ion is incorrect. This may seemssurprising as it is the value of N ion that acts as a source for reion-ization. To understand this apparent paradox, note that the recoveryof N ion is poor mostly at z > . At these redshifts the collapsedfraction d f coll / d t is typically small, hence the source emissivity N ion d f coll / d t → at these epochs. Hence even if we change thevalue of N ion , the absolute change in the emissivity is negligibleand hence the reionization process remains relatively unaffected.There is another way of looking at it: the extent of recovery of var-ious quantities at z > is determined by the behaviour of PCAmodes at z > which, in turn, is determined by the data set relatedto τ el . Now τ el is most sensitive to the ionized fraction Q HII ( z ) at z > . Hence, it is not surprising that Q HII ( z ) would be nicelyrecovered at these redshifts. Such arguments can be extended forother quantities too. This also brings out the fact that in order to re-cover N ion ( z ) (and thus star formation, escape fraction and chem-ical feedback) reliably, one requires data points at z > relatedto quantities which are sensitive to N ion , like say, hypothetically,a good constraint on Γ PI at z ∼ can constrain N ion at thoseredshifts.To summarize our results on recovering the input step model,the recovery of all the quantities is excellent for z < . We ﬁndthat recovery of N ion at z > is not satisfactory. The recovery of Γ PI at z < is quite reasonable by considering the ﬁrst M = 5 modes. Fortunately, the recovery of Q HII and x HI turns out to beexcellent for M = 5 . Hence we can use the coefﬁcients m i of these5 best constrained eigenmodes as our model parameters instead of N ion ( z i ) without signiﬁcant loss of information.We should mention that the above analysis depends on thechoice of the input model which is taken to be the step function.In fact, the recovery is better if the input N ion is a smoother func-tion (provided it satisﬁes the observational constraints, of course).In particular, all models which are bracketed by the ﬁducial modeland the step model would end up giving good agreements for Γ PI , d N LL / d z, τ el , Q HII and x HI . Of course, if the input modelshave sharp features at some particular redshift(s) z > , those fea-tures may not be recovered satisfactorily by including only ﬁrst fewterms.A slightly more formal approach is to estimate M by minimiz-ing the quantity Risk, which is deﬁned as (Wasserman et al. 2001) R = n bin X i =1 (cid:16) δN ( M ) i (cid:17) + n bin X i =1 (cid:28)(cid:16) δN ( M ) i (cid:17) (cid:29) (25)The 1st term in the RHS is the bias contribution which arisesfrom neglecting the higher order terms, and the 2nd term is theuncertainty given by Cramer-Rao bound which rises as higher or-der terms (i.e., those corresponding to smaller eigenvalues) are in-cluded: (cid:28)(cid:16) δN ( M ) i (cid:17) (cid:29) ≥ M X k =1 S k ( z i ) λ k (26)However, the calculation of Risk, as deﬁned above, involves as-sumption of an “underlying model”, hence the determination of M using this method would be model-dependent. Let us assume theunderlying model to be the same as equation (24). Then the depen-dence of the Risk on the number of modes M is shown in Figure6. In addition, we also show the plots of bias [ﬁrst term of the rhsin equation (25)] and the error [second term of the rhs in equation(25)] are also shown. It is clear that the value of error is small forlower M which is a direct consequence of small eigenvalues. Theerror shoots up drastically for M > which is what we discussedin the previous Section. On the other hand, the bias is higher forsmall M and decreases gradually as more and more terms in thesummation are included. The Risk, which is the sum of these twoquantities, has a clear minimum at M = 5 (which is more clearfrom the inset in Figure 6). Hence we conclude that M = 5 is theoptimum value to be used.The main conclusion of this Section is that one needs ﬁveparameters to describe the reionization history which can be con-strained with the data considered in this paper. Out of these ﬁve,four parameters are required to describe the emissivity at z < where most of the data points exist; these parameters are the best-constrained ones. The ﬁfth parameter characterizes the evolution c (cid:13) , 000–000 Mitra, Choudhury & Ferrara

Parameters Mean value 95% conﬁdence limits m .

002 [ − . , . m .

007 [ − . , . m − .

003 [ − . , . m .

003 [ − . , . m − .

065 [ − . , . λ .

450 [3 . , . z ( Q HII = 0 .

5) 10 .

349 [9 . , . z ( Q HII = 0 .

99) 8 .

357 [5 . , . Table 3.

The marginalized posterior probabilities with 95% C.L. errors ofall free parameters (top six parameters) and derived parameters (from theseventh parameter down) for the reionization model with principal compo-nent analysis. of N ion at z > and is essentially determined by the WMAP con-straints of τ el . Inclusion of more parameters would lead to overﬁt-ting of the data and hence the constraints on the parameters wouldbe highly uncertain. The constraints on reionization are obtained by performing aMonte-Carlo Markov Chain (MCMC) analysis over the parameterspace of PCA amplitudes { m , . . . , m } and λ . The cosmologicalparameters are kept ﬁxed to the WMAP7 best-ﬁt values. In orderto carry out the analysis, we have developed a code based on thepublicly available COSMOMC (Lewis & Bridle 2002) (which iswidely used for running MCMC on CMB and other cosmologicaldata sets). To get accurate results from MCMC, we ensure that theparameter chains contain enough independent samples over a suf-ﬁciently large volume of parameter space so that the density of thesamples converges to the actual posterior probability distribution.We run a number of separate chains (varying between 5 to 10) untilthe Gelman and Rubin convergence statistics, R , corresponding tothe ratio of the variance of parameters between chains to the vari-ance within each chain, satisﬁes R − < . .The mean values and the 95% conﬁdence limits on our param-eters obtained from our analysis are shown in Table 3. Our ﬁducialmodel m = m = m = m = m = 0 is included withinthe 95% conﬁdence limits of the parameters corresponding to theeigenmode amplitudes, however the mean values show clear depar-tures from the ﬁducial model. This implies that the model charac-terized by the mean values of parameters, loosely mentioned as the“mean model” hereafter, is different from the ﬁducial one.In order to see how different it is, we show the evolution ofvarious quantities related to reionization is shown in Figure 7. Thesolid lines represent the mean model while the shaded region cor-respond to 95% conﬁdence limits. For comparison, we have alsoplotted the ﬁducial model (short-dashed) and the step model (long-dashed) which was introduced in Section 4.2. We ﬁnd that the ﬁdu-cial model is within the 95% conﬁdence limits for the whole red-shift range, while the step model is within the 95% conﬁdence lim-its for z < . Also note that the ﬁducial model is actually nearthe edge of the shaded region, implying that there is a wide rangeof models allowed by the data which are characteristically differentfrom the ﬁducial model. http://cosmologist.info/cosmomc/ The next point to note is that all the quantities are highly con-strained at z < , which is expected as most of the observationalinformation related to reionization exists only at those redshifts.The errors also decrease at z > as there is practically no infor-mation in the PCA modes and hence all models converge towardsthe ﬁducial one. This implies that early stages of reionization are al-most similar independent of the N ion chosen. The most interestinginformation regarding reionization is concentrated within a redshiftrange < z < .It is very clear from the plot of N ion ( z ) (top-left panel) thatsuch quantity must necessarily increase from its constant value at z < . This rules out the possibility of reionization with a singlestellar population having non-evolving IMF and/or star-forming ef-ﬁciency and/or escape fraction. The value of N ion can be almost 40times larger than its value at z < . Also note that N ion need notbe a monotonic function of z . For example, the mean model, whichis constant for z < , shows an increase for z > followed by adecrease at z ≈ . The plot shows a subsequent increase around z ≈ , however one should remember that the information con-tained within eigenmodes are severely limited at these epochs.From the plot of Γ PI ( z ) (top-middle panel), we ﬁnd that themean model is consistent with the observational data at z < ,as expected. The errors corresponding to 95% conﬁdence limits arealso smaller at z < for reasons discussed above. The photoioniza-tion rate for the ﬁducial model shows a smooth rise at z > with apeak around z ≈ , however model described by the mean valuesof the parameters shows a much sharper rise and much prominentpeak. The location of the peak is around z ∼ . . The highestvalue of Γ PI allowed by the data can be as high as − s − (95%conﬁdence level), which is about 100 times the values typically ob-served at z < . The prominent peak-like structure is also presentin the d N LL / d z (top-right panel). Interestingly, the high- Γ PI mod-els predict that d N LL / d z ≈ at z ∼ . , hence any sighting ofLLS at these epochs would put more constraints on the models.The limits on τ el (bottom-left panel) are, as expected, similarto the WMAP7 constraints. We ﬁnd that the mean τ el is slightlyhigher than the best-ﬁt WMAP7 value because a wide range ofmodels with early reionization are allowed by the data.The constraints on the reionization history can be seen fromthe plot of Q HII ( z ) (bottom-middle panel). The growth of Q HII for the ﬁducial model is somewhat gradual. On the other hand, themean model, which is characterized by sharp peak structures in N ion and Γ PI at z > , shows a much faster rise in Q HII at ini-tial stages, though the completion of reionization takes place onlyat z ≈ . The shaded regions show that reionization can be com-plete as early as z ≈ . (95% conﬁdence level). These modelsof early reionization are essentially characterized by high N ion at . < z < (so that enough contribution to τ el is achieved tomatch the WMAP7 constraints) followed by a sharp decrease at z < . so that the emissivity becomes low enough to match thephotoionization rate obtained from Ly α forest data.Similar conclusions can be obtained from the plot of x HI ( z ) (bottom-right panel). In general, the models allowed by the 95%conﬁdence limits are consistent with the available data points(shown by points with error-bars). Models of very early reioniza-tion (i.e., those with high N ion at . < z < ) show sharpdecrease in x HI at z ≈ and it can become as low as − at z ≈ . . However, the neutral fraction has to increase sharplyagain at z < . (corresponding to sharp decrease in N ion ) so as tomatch the Ly α forest constraints. Thus the evolution of x HI is notmonotonic for these models. On the other hand, models with rela-tively smoothly evolving N ion (ones similar to the ﬁducial model) c (cid:13) , 000–000 eionization constraints using PCA mean ﬁducialstep N i o n ∙ ∙ ∙ ∙ ∙ . Γ P I / - s - ∙∙∙∙∙ d N LL / d z ∙ . . . z τ e l . . . z Q H II ■ ∙ □ △ ○○ - - - z l o g x H I Figure 7.

The marginalized posteriori distribution of various quantities related to reionization history obtained from the PCA. The different quantities inthe panels are identical to those in Figure 1. The solid lines correspond to the model described by mean values of the parameters while the shaded regionscorrespond to 2- σ limits. In addition, we show the properties of the ﬁducial model (short-dashed lines) and the step model used in Section 4.2 (long-dashedlines). The points with error-bars denote the observational data points which are identical to those in Figure 1. show gradual decrease in x HI between < z < and it smoothlymatches the Ly α forest data. The evolution of the neutral fractionis thus monotonic in such models with smoothly evolving N ion .If we now go back to the lower portion of Table 3, we ﬁnd thatreionization is 50% complete between redshifts 9.6 – 12.0 (95%conﬁdence level), while it is almost (99%) complete between red-shifts 5.8 – 10.6 (95% conﬁdence level). Note that the lower limiton the redshift of reionization (5.8) is imposed as a prior on theparameters.Thus, the PCA shows that a wide range of reionization his-tories is still allowed by the data. Reionization can be quite earlyor can be gradual and late, depending on the behavior of N ion ( z ) .Hence, if one considers only the data we have used, it is practi-cally impossible to put any sensible constraints on chemical feed-back and/or the evolution of star-forming efﬁciencies and/or es-cape fractions. While this might seem somewhat disappointing atthe moment, one can hope for much better constraints in near fu-ture when the magnitude of data sets are going to rise manifold.In fact, in order to keep the analysis simple, we have not used allthe data sets available. For example, the constraints on the Ly α and Ly β transmitted ﬂuxes now extend beyond z = 6 and pos-sibly could constrain the models much more. However, our nu-merical code takes signiﬁcantly more time while calculating thetransmitted ﬂuxes and also there remain uncertainties in the theo-retical modelling of the IGM at such redshifts (like the distributionof baryonic matter and the scatter in the temperature-density rela-tion); hence we have worked simply with the constraints on Γ PI .Similarly the distribution of LLS at z > could also be importantin ruling out some of the allowed models. At present, there exists adata point at z ≈ which put limits d N LL / d z = 8 . ± . . Onthe other hand, very high emissivity models predict d N LL / d z ≈ at z ∼ . . Hence constraints on LLS distribution at z ∼ . canbe helpful in shrinking the allowed parameter space signiﬁcantly.We should also mention that the constraints obtained through the PCA are widely different from those obtained using the chem-ical feedback model of Section 2 involving PopII and PopIII stars.The model in Section 2 uses a particular prescription for chemi-cal feedback and assumes constant N ion , II , N ion , III , which resultsin an effective N ion ( z ) which is smoothly evolving and monotoni-cally increasing with z . On the other hand, the models allowed bythe PCA do not have any physical constraint regarding how N ion should evolve. It turns out that in absence of any physical motiva-tion, current data does allow for non-monotonic N ion which maycontain sharp features. Hence it is not surprising that the shapes ofthe allowed models are quite different from the chemical feedbackmodels. In this work, we have used a semi-analytical model(Choudhury & Ferrara 2005; Choudhury & Ferrara 2006b) tostudy the observational constraints on reionization. Assumingthat reionization at z > is primarily driven by stellar sources,we have developed a formalism based on principal componentanalysis to model the unknown function N ion ( z ) , the number ofphotons in the IGM per baryon in collapsed objects. We have usedthree different sets of data points, namely, the photoionizationrates Γ PI obtained from Ly α forest Gunn-Peterson optical depth,WMAP7 data on electron scattering optical depth τ el , and theredshift distribution of Lyman-limit systems d N LL / d z at z ∼ . .The main ﬁndings of our analysis are: • The elements of the Fisher information matrix have larger val-ues for z < where most of the data points are. There is hardlyany information at z > , implying that no information on star-formation and/or chemical feedback can be obtained at these red-shifts using the available three data sets. • To model N ion ( z ) over the range < z < it is necessary c (cid:13) , 000–000 Mitra, Choudhury & Ferrara to include 5 modes. Using a larger number of modes improves theagreement but at the cost of increasing errors. • One may not be able to recover the actual form of N ion ( z ) us-ing only these 5 modes, however the recovery of Γ PI and d N LL / d z at z < is quite satisfactory and that of Q HII , x HI is excellent. • It is not possible to match available reionization data with aconstant N ion over the whole redshift range, i.e. N ion must increaseat z > . This is a signature of either of a changing IMF induced bychemical feedback and/or evolution in the star-forming efﬁciencyand/or photon escape fraction of galaxies. The data allows for non-monotonic N ion ( z ) (and consequently of x HI ). In particular, reion-ization histories could show sharp features around z ≈ . • The PCA implies that reionization must be 99% completedbetween . < z < . (95% conﬁdence level) and is expectedto be 50% complete at z ≈ z < are quite tight, one requires additional data points at z > to improve constraints on models of feedback and reionization. Themost obvious addition would, of course, be observation of Gunn-Peterson trough in more QSOs at higher redshifts. In parallel, it isexpected that observations of GRBs and Ly α emitters could con-strain x HI at z > , which again would result in improved con-straints. Finally, observations of large-scale EE polarization signalby future CMB probes, like Planck , would be extremely importantin probing the evolution of N ion at z > . Since the constraintsobtained from the data are still unsatisfactory, there remains am-ple scope for developing physically-motivated theoretical modelswhich can match a wide-variety of available data. This, in turn, re-quires signiﬁcant improvement in our understanding of processeslike chemical feedback and also the evolution of star-forming efﬁ-ciencies and escape fraction. ACKNOWLEDGEMENTS

We would like to thank Dhiraj Kumar Hazra and Atri Bhattacharyafor their help and suggestions regarding numerical computations.Computational work for this study was carried out at the clustercomputing facility in the Harish-Chandra Research Institute . REFERENCES

Barkana R., Loeb A., 2001, Phys. Rep., 349, 125Bolton J. S., Haehnelt M. G., 2007, MNRAS, 382, 325Burigana C., Popa L. A., Salvaterra R., Schneider R., Choudhury T. R.,Ferrara A., 2008, MNRAS, 385, 404Chiu W. A., Fan X., Ostriker J. P., 2003, ApJ, 599, 759Choudhury T. R., 2009, Current Science, 97, 841Choudhury T. R., Ferrara A., 2005, MNRAS, 361, 577Choudhury T. R., Ferrara A., 2006a, in Fabbri R., ed, Cosmic Polarization.Research Signpost, p. 205Choudhury T. R., Ferrara A., 2006b, MNRAS, 371, L55Choudhury T. R., Ferrara A., 2007, MNRAS, 380, L6Clarkson C., Zunckel C., 2010, Physical Review Letters, 104, 211301Crittenden R. G., Pogosian L., Zhao G., 2009, J. Cosmology Astropart.Phys., 12, 25 http://cluster.hri.res.in/index.html Efstathiou G., 2002, MNRAS, 332, 193Efstathiou G., Bond J. R., 1999, MNRAS, 304, 75Fan X., Carilli C. L., Keating B., 2006, ARA&A, 44, 415Fan X. et al., 2006, AJ, 131, 1203Furlanetto S. R., Oh S. P., Briggs F. H., 2006, Phys. Rep., 433, 181Gallerani S., Choudhury T. R., Ferrara A., 2006, MNRAS, 370, 1401Gallerani S., Ferrara A., Fan X., Choudhury T. R., 2008a, MNRAS, 386,359Gallerani S., Salvaterra R., Ferrara A., Choudhury T. R., 2008b, MNRAS,388, L84Hopkins P. F., Richards G. T., Hernquist L., 2007, ApJ, 654, 731Hu W., Holder G. P., 2003, Phys. Rev. D, 68, 023001Huterer D., Cooray A., 2005, Phys. Rev. D, 71, 023506Huterer D., Starkman G., 2003, Physical Review Letters, 90, 031301Kadota K., Dodelson S., Hu W., Stewart E. D., 2005, Phys. Rev. D, 72,023510Kashikawa N. et al., 2006, ApJ, 648, 7Lacey C., Cole S., 1993, MNRAS, 262, 627Larson D. et al., 2010, ArXiv e-prints, 1001.4635Leach S., 2006, MNRAS, 372, 646Lewis A., Bridle S., 2002, Phys. Rev. D, 66, 103511Loeb A., Barkana R., 2001, ARA&A, 39, 19Miralda-Escud´e J., Haehnelt M., Rees M. J., 2000, ApJ, 530, 1Mortonson M. J., Hu W., 2008, ApJ, 672, 737Munshi D., Kilbinger M., 2006, A&A, 452, 63Press W. H., Schechter P., 1974, ApJ, 187, 425Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 1992, Numer-ical recipes in FORTRAN. The art of scientiﬁc computing. Cambridge:University Press, —c1992, 2nd ed.Pritchard J. R., Loeb A., Wyithe J. S. B., 2010, MNRAS, 408, 57Prochaska J. X., O’Meara J. M., Worseck G., 2010, ApJ, 718, 392Schneider R., Salvaterra R., Ferrara A., Ciardi B., 2006, MNRAS, 369, 825Songaila A., Cowie L. L., 2010, ApJ, 721, 1448Totani T., Kawai N., Kosugi G., Aoki K., Yamada T., Iye M., Ohta K., Hat-tori T., 2006, PASJ, 58, 485Wasserman L. et al., 2001, ArXiv Astrophysics e-prints, arXiv:astro-ph/0112050Wyithe J. S. B., Loeb A., 2003, ApJ, 586, 693c (cid:13)000