[PDF] Overcoming real-world obstacles in 21 cm power spectrum estimation: A method demonstration and results from early Murchison Widefield Array data

Abstract

We present techniques for bridging the gap between idealized inverse covariance weighted quadratic estimation of 21 cm power spectra and the real-world challenges presented universally by interferometric observation. By carefully evaluating various estimators and adapting our techniques for large but incomplete data sets, we develop a robust power spectrum estimation framework that preserves the so-called "EoR window" and keeps track of estimator errors and covariances. We apply our method to observations from the 32-tile prototype of the Murchinson Widefield Array to demonstrate the importance of a judicious analysis technique. Lastly, we apply our method to investigate the dependence of the clean EoR window on frequency--especially the frequency dependence of the so-called "wedge" feature--and establish upper limits on the power spectrum from z = 6.2 to z = 11.7. Our lowest limit is Delta(k) < 0.3 Kelvin at 95% confidence at a comoving scale k = 0.046 Mpc^-1 and z = 9.5.

Full PDF

OOvercoming real-world obstacles in 21 cm power spectrum estimation: A methoddemonstration and results from early Murchison Wideﬁeld Array data

Joshua S. Dillon,

1, 2, ∗ Adrian Liu,

1, 2, 3, ∗ Christopher L. Williams,

1, 2

Jacqueline N. Hewitt,

1, 2

MaxTegmark,

1, 2

Edward H. Morgan, Alan M. Levine, Miguel F. Morales, Steven J. Tingay,

5, 6

GianniBernardi, Judd D. Bowman, Frank H. Briggs,

6, 9

Roger C. Cappallo, David Emrich, Daniel A.Mitchell,

6, 11

Divya Oberoi,

10, 12

Thiagaraj Prabu, Randall Wayth,

5, 6 and Rachel L. Webster

6, 11 MIT Kavli Institute, Massachusetts Institute of Technology, Cambridge, MA, USA Dept. of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA Dept. of Astronomy and Berkeley Center for Cosmological Physics, Berkeley, CA, USA Dept. of Physics, University of Washington, Seattle, WA, USA International Centre for Radio Astronomy Research, Curtin University, Perth, WA, Australia ARC Centre of Excellence for All-sky Astrophysics (CAASTRO), Australia Harvard-Smithsonian Center for Astrophysics, Harvard University, Cambridge, MA, USA School of Space and Earth Exploration, Arizona State University, Tempe, AZ, USA Research School of Astronomy and Astrophysics,The Australian National University, Canberra, Australia MIT-Haystack Observatory, Westford, MA, USA School of Physics, The University of Melbourne, Melbourne, Australia National Centre for Radio Astrophysics, Tata Institute of Fundamental Research, Pune, India Raman Research Institute, Bangalore, India (Dated: January 15, 2014)We present techniques for bridging the gap between idealized inverse covariance weightedquadratic estimation of 21 cm power spectra and the real-world challenges presented universallyby interferometric observation. By carefully evaluating various estimators and adapting our tech-niques for large but incomplete data sets, we develop a robust power spectrum estimation frameworkthat preserves the so-called “EoR window” and keeps track of estimator errors and covariances. Weapply our method to observations from the 32-tile prototype of the Murchinson Wideﬁeld Array todemonstrate the importance of a judicious analysis technique. Lastly, we apply our method to inves-tigate the dependence of the clean EoR window on frequency—especially the frequency dependenceof the so-called “wedge” feature—and establish upper limits on the power spectrum from z = 6 . z = 11 .

7. Our lowest limit is ∆( k ) < . k = 0 . − and z = 9 . PACS numbers: 95.75.-z, 95.85.Bh, 98.62.Ra, 98.80.-k, 98.80.Es

I. INTRODUCTION

In recent years, 21 cm tomography has emerged as apromising probe of the Epoch of Reionization (EoR). Asa direct measurement of the three-dimensional distribu-tion of neutral hydrogen at high redshift, the techniquewill allow detailed study of the complex astrophysical in-terplay between the intergalactic medium and the ﬁrstluminous structures of our Universe. This will eventu-ally pave the way towards the use of 21 cm tomographyto constrain cosmological parameters to exquisite preci-sion, thanks to the enormity of the physical space withinits reach (please see, e.g., Furlanetto et al. [1], Moralesand Wyithe [2], Pritchard and Loeb [3], Loeb and Furlan-etto [4] for recent reviews). ∗ Because the contributions of the ﬁrst two authors were essen-tially equal, we determined their order by a Monte Carlo algo-rithm inspired by the one presented in Section II B. They canbe reached for questions or comments at [email protected] and [email protected] . To date, observational eﬀorts have focused on measure-ments of the 21 cm power spectrum. Such a measurementis exceedingly diﬃcult. Sensitivity requirements are ex-treme, requiring thousands of hours of integration andlarge collecting areas [5–9]. Adding to this challenge isthe fact that raw sensitivity is insuﬃcient—what countsis sensitivity to the cosmological signal above expectedcontaminants like galactic synchrotron radiation, whichare three to four orders of magnitude brighter at the rel-evant frequencies [10–13].To deal with these challenges, numerous techniqueshave been proposed and implemented for foregroundmitigation and power spectrum estimation. These in-clude foreground removal via parametric ﬁts [14–17], non-parametric methods [18–20], principal component anal-yses [21–24], ﬁltering [25–27], frequency stacking [28],and quadratic methods [29–31]. In almost all of theseproposals, foregrounds are separated from the cosmolog-ical signal by taking advantage of the diﬀerences in theirspectra. Foregrounds are dominated by continuum pro-cesses and thus have smooth spectra. On the other hand,because the cosmological line-of-sight distance maps tothe observed frequency of the redshifted 21 cm line, the a r X i v : . [ a s t r o - ph . C O ] J a n rapid ﬂuctuations in the brightness temperature distri-bution that are expected from theory will map to a mea-sured cosmological signal with jagged, rapidly ﬂuctuat-ing spectra. When these spectral diﬀerences are con-sidered in conjunction with instrumental characteristics,one can identify an “EoR window”: a region in Fourierspace where power spectrum measurements are expectedto be relatively free from foregrounds [27, 32–36]. Thisis shown schematically in Figure 1, where we have usedearly Murchison Wideﬁeld Array (MWA) data to esti-mate the power spectrum as a function of k ⊥ (Fouriermode perpendicular to the line-of-sight) and k (cid:107) (Fouriermode parallel to the line-of-sight). More details regard-ing this ﬁguew are provided in Section III; for now wesimply wish to draw attention to the existence of a rela-tively contaminant-free region in the middle of the k ⊥ - k (cid:107) plane. This clean region is what we denote the EoR win-dow.The EoR window is generally considered the sweet spotfor an initial detection of the cosmological 21 cm powerspectrum, and constraints are likely to degrade awayfrom the window. At high k ⊥ (i.e., the ﬁnest angularfeatures on the sky), errors increase due to the angularresolution limitations of one’s instrument. For an inter-ferometer, this resolution is roughly set by the length ofthe longest baseline. Conversely, the shortest baselinesdeﬁne the largest modes that are observable by the in-strument. Errors therefore also increase at the lowest k ⊥ where again there are few baselines.A similar limitation deﬁnes the boundary of the EoRwindow at high k (cid:107) . Since the spectral nature of 21 cmmeasurements mean that diﬀerent observed frequenciesmap to diﬀerent redshifts, the highest k (cid:107) modes are in-accessible due to the limited spectral resolution of one’sinstrument. At low k (cid:107) , one probes spectrally smoothmodes—precisely those that are expected to be fore-ground contaminated. Thus there is another boundaryto the EoR window at low k (cid:107) .A ﬁnal delineation of the EoR window is provided bythe region labeled as the “wedge” in Figure 1. The wedgefeature is a result of an interplay between angular andspectral eﬀects. Simulations have shown that the wedgeis the eﬀect of chromaticity in one’s synthesized beam(which is inevitable when an interferometer is used tosurvey the sky). This chromaticity imprints unsmoothspectral features on measured foregrounds, resulting inforeground contamination beyond the lowest k (cid:107) modeseven if the foregrounds themselves are spectrally smooth.Luckily, this sort of additional contamination follows areasonably predictable pattern in the k ⊥ - k (cid:107) plane, and inthe limit of intrinsically smooth foregrounds, the wedgecan be shown to extend no farther than the line k (cid:107) = (cid:20) sin θ ﬁeld D M ( z ) E ( z ) D H (1 + z ) (cid:21) k ⊥ , (1)where D H ≡ c/H , E ( z ) ≡ (cid:112) Ω m (1 + z ) + Ω Λ , D M ( z ) ≡ (cid:82) z dz (cid:48) /E ( z (cid:48) ), θ ﬁeld is angular radius of thethe ﬁeld-of-view, and c , H , Ω m , and Ω Λ have their FIG. 1. The “EoR window,” a region of Fourier space withrelatively low noise and foregrounds, is thought to presentthe best opportunity for measuring the cosmological 21 cmpower spectrum during the Epoch of Reionization. Here weshow an example power spectrum from early MWA data, as afunction of k ⊥ (Fourier components perpendicular to the lineof sight) and k (cid:107) (Fourier components parallel to the line ofsight). More details on how we have calculated and plotted P ( k ⊥ , k (cid:107) ) are found in Section III. We schematically highlightthe instrumental and foreground eﬀects that that delimit theEoR window—the coldest part of this power spectrum. At lowand high k ⊥ , measurements are limited by an instrument’sability to probe the largest and smallest angular scales, re-spectively. Limited spectral resolution causes similar eﬀectsat the highest k (cid:107) . As spectrally smooth sources, foregroundsinhabit primarily the low k (cid:107) regions. Thanks to chromaticinstrumental eﬀects, however, there is a slight encroachmentof foregrounds towards higher k (cid:107) at higher k ⊥ , in what hasbeen colloquially termed the “wedge” feature. usual meanings [32–35]. Intuitively, the foreground-contaminated wedge extends to higher k (cid:107) at higher k ⊥ because the high k ⊥ modes are probed by the longer base-lines of an interferometer array, which have higher fringerates that more eﬀectively imprint spectral structure inthe measured signals. For an alternate but equivalentexplanation in terms of delay modes, please see the illu-minating discussion in Parsons et al. [27].The concept of an EoR window is important in that itprovides relatively strict boundaries that separate fairlyforeground-free regions of Fourier space from heavilyforeground-contaminated ones. It therefore provides onewith the option of practicing foreground avoidance ratherthan foreground subtraction. If it turns out that fore-grounds cannot be modeled well enough to be directlysubtracted with the level of precision required to detectthe cosmological signal, foreground avoidance becomesan important alternative, in that the only way to robustlysuppress foregrounds is to preferentially make measure-ments within the EoR window. Likely, some combina-tion of the two strategies—foreground subtraction andforeground avoidance—will prove useful for the detec-tion of the 21 cm power spectrum. Of course, measure-ments within the EoR window are still contaminated byinstrumental noise, but fortunately the noise integratesdown with further observation time (as long as calibra-tion errors and other instrumental systematics can besuﬃciently minimized). Observationally, it is encourag-ing that the EoR window has now been shown to be freeof foregrounds to better than one part in a hundred inpower [13].As experimental sensitivities increase, however, onemust take care to preserve the cleanliness of the EoRwindow to an even higher dynamic range. There areseveral ways in which our notion of the EoR windowmay be compromised. First, as experiments integratein time and acquire greater sensitivity, we may discoverthat our approximation of spectrally smooth foregroundsis insuﬃciently good for a detection of the (faint) cosmo-logical signal. In other words, foreground sources mayhave small but non-negligible high k (cid:107) components in theirspectra that have thus far gone undetected. This wouldtranslate into a smaller-than-expected EoR window. Inaddition, even intrinsically smooth foregrounds may ap-pear jagged in a real measurement because of instrumen-tal eﬀects such as imperfect calibration. The precise in-terferometer layout may also result in unsmooth artifactsthat arise from combining data from non-redundant base-lines [37]. Finally, suppose that the aforementioned ef-fects are negligible and that the assumption of spectrallysmooth foreground emission continues to hold. The EoRwindow still cannot be taken for granted because non-optimal data analysis techniques may result in unwantedforeground artifacts in the region. For the EoR windowto exist at all, it is essential that power spectra are esti-mated in a rigorous fashion, with well-understood statis-tics.The goal of this paper is to minimize unwanted dataanalysis artifacts by establishing methods for power spec-trum estimation that are both robust and as optimal aspossible. Previous eﬀorts have rarely met both criteria:either the methods are robustly applicable to data withreal-world artifacts but fail to achieve optimized (or evenrigorously computable) error properties, or provide anoptimal framework but ignore real-world complications.In this paper we extend the rigorous framework describedin Liu and Tegmark [29] and Dillon et al. [30] to dealwith real-world eﬀects. The result is a computationally feasible approach to analyzing real data that not onlypreserves the cleanliness of the EoR window, but alsorigorously keeps track of all relevant error statistics.To demonstrate the applicability of our approach, weapply our techniques to early data from the MurchisonWideﬁeld Array (MWA). These data were derived from ∼

22 hours of tracked observations using an early, 32-element prototype array. The results are therefore notdesigned to be cosmologically competitive, but insteadillustrate the rigor that will be required for an eventualdetection of the EoR while also providing new measure-ments on the “wedge” feature that delineates the EoRwindow.This paper is organized as follows. In Section II we dis-cuss various real-world obstacles that must be dealt withwhen analyzing real data, and how one can overcomethem while maintaining statistical rigor. We then applyour methods to MWA data in Section III as a “worked ex-ample”, highlighting the importance of various subtletiesof power spectrum estimation. In Section IV we presentsome results from the data, emphasizing the agreementbetween theoretical expectations and our observationsof the foreground wedge (particularly regarding the fre-quency dependence of the wedge). We also present upperlimits on the cosmological 21 cm power spectrum over thebroad redshift range of z = 6 . z = 11 .

1. Finally, wesummarize our conclusions in Section V.

II. SYSTEMATIC METHODS FOR DEALINGWITH REAL-WORLD OBSTACLES

To understand the gap between an analysis frameworkfor idealized observations and any real-world data set, weenumerate and address six diﬀerent obstacles that ratheruniversally aﬀect real data. Our goal in this section isto meet the challenges presented by these obstacles whilemaintaining as many of the advantages of the optimalframework as possible, which we reiterate in Section II A,especially the ability to minimize and precisely quantifythe uncertainties in the measurements. In the follow-ing sections, we address the problems presented by largedata volumes (Section II B), uncertainties in the proper-ties of contaminants such as foregrounds (Section II C),incomplete uv coverage (Section II D), radio frequency in-terference (RFI) ﬂagging (Section II E), foreground leak-age into the EoR window (Section II F), and binning tospherically averaged power spectra (Section II G). A. A systematic framework for analyzing idealizedobservations

In this section, we brieﬂy review the formalism of Liuand Tegmark [29] for optimal power spectrum estimation,which was adapted for 21 cm tomography from similartechniques used in galaxy survey and cosmic microwavebackground analysis [38–41]. For now, we do not includereal-world eﬀects such as missing data from RFI ﬂagging,and the purpose of later sections is to extend the formal-ism to take into account these complications.In 21 cm tomography, one typically wishes to measureboth the spherically-binned power spectrum P sph ( k ), de-ﬁned by (cid:104) (cid:101) T ∗ ( k ) (cid:101) T ( k (cid:48) ) (cid:105) ≡ (2 π ) P sph ( k ) δ ( k − k (cid:48) ) , (2)and the cylindrically-binned power spectrum P cyl ( k ⊥ , k (cid:107) ), deﬁned by (cid:104) (cid:101) T ∗ ( k ) (cid:101) T ( k (cid:48) ) (cid:105) ≡ (2 π ) P cyl ( k ⊥ , k (cid:107) ) δ ( k − k (cid:48) ) , (3)with (cid:101) T ( k ) signifying the spatial Fourier transform of the21 cm brightness temperature ﬁeld T ( r ), k denoting thespatial wavevector with magnitude k , and components k ⊥ and k (cid:107) as the components perpendicular and paral-lel to the line-of-sight, respectively. The angled brackets (cid:104)· · · (cid:105) represent an ensemble average. The spherical powerspectrum is useful for comparing to theoretical models,since it is obtained by angularly averaging over spheri-cal shells in Fourier space, and thus makes the cosmo-logically relevant assumption of isotropy. The cylindricalpower spectrum is useful for identifying instrumental andforeground eﬀects, which possess a cylindrical symmetryrather than a spherical one. Typically, the cylindricalpower spectrum is produced ﬁrst as a tool for foregroundisolation (i.e., to identify the EoR window), and then sub-sequently binned into a spherical power spectrum. Thissection concerns the estimation of the cylindrical powerspectrum. Optimal binning techniques to go from thecylindrical spectrum to the spherical spectrum are dis-cussed in Section II G.In estimating a power spectrum from data, one mustnecessarily discretize the problem. We make the approx-imation that the power spectra are piecewise constantfunctions, such that we can describe them in terms of avector of bandpowers with components p α , where p α ≡ P cyl ( k α ⊥ , k α (cid:107) ) . (4)It is the bandpowers and their error properties that onewishes to estimate from the data, which come in the formof a data vector x . Intuitively, one can think of the datavector as a list of the 21 cm brightness temperatures mea-sured at various locations in a three-dimensional “datacube”. Rigorously, we deﬁne each element of the datavector (i.e., each voxel of the data cube) as x i ≡ (cid:90) T ( r ) ψ i ( r ) d r , (5)with ψ i ( r ) being the pixelization kernel and T ( r ) as the(continuous) three-dimensional 21 cm brightness temper-ature ﬁeld . In this paper we take the i th pixelization Of course, instrumental noise and foregrounds do not properly kernel ψ i ( r ) to be a boxcar function centered on the i th voxel of the data. To estimate the α th bandpower from the data vector,we ﬁrst form a quadratic estimator of the form q α ≡

12 ( x − m ) t C − C ,α C − ( x − m ) −

12 tr[ C junk C − C ,α C − ] , (6)where m ≡ (cid:104) x (cid:105) is the mean of the data, C ≡ (cid:104) xx t (cid:105) −(cid:104) x (cid:105)(cid:104) x (cid:105) t is its covariance, C junk is the component of thecovariance “junk”/contaminants (to be deﬁned in the fol-lowing section), and C ,α is the derivative of the covari-ance with respect to the α th bandpower. Since we are ap-proximating the power spectrum as piecewise constant,we have C = C junk + (cid:88) α p α C ,α . (7)Combined with Equation (5), this expression can be usedto derive explicit forms for C ,α , which reveals that thematrix essentially Fourier transforms and bins the data[29, 30]. Intuitively, C ,α can be thought of as the re-sponse in the data covariance C to the bandpower p α .Thus, as long as one selects an appropriate form for C ,α ,the formalism of this section can also be used to directlymeasure the spherical power spectrum. However, as wediscussed above, in this paper we choose to ﬁrst estimatethe cylindrical power spectrum as an intermediate diag-nostic step, to quantify and mitigate foregrounds better.Once the q α s have been formed, they need to be nor-malized using a suitable invertible matrix M to form theﬁnal bandpower estimates: (cid:98) p = Mq , (8)where we have grouped the bandpower estimates (cid:98) p α intoa vector (cid:98) p (and similarly grouped the coeﬃcients q α and q ), with the hat ( (cid:98) ) signifying the fact that we haveformed an estimator of the true bandpowers . We shalldiscuss diﬀerent choices of M in Section II F. reside in a cosmological three-dimensional volume: noise is in-troduced in the electronics of the system, whereas foregroundsare “nearby” and only appear in the same location in the datacube as our cosmological signal by virtue of their frequency de-pendence. However, there is a gain in convenience and no lossof generality in assigning a noise and foreground contribution toeach voxel, pretending that those contaminants also live in theobserved cosmological volume. This choice, following [30], is motivated by the fact that thecovariance between each pixel in this basis for both noise andforegrounds can be written in an algorithmically convenient way. Note that q , (cid:98) p , and M live in a diﬀerent vector space than x , C , and C ,α . The former are in a vector space where eachcomponent refers to a diﬀerent bandpower, whereas the latterare in one where diﬀerent components refer to diﬀerent voxels. To understand the uncertainty in our estimates, wecompute several error properties. The ﬁrst is the covari-ance matrix of the ﬁnal measured bandpowers: Σ ≡ (cid:104) (cid:98) p (cid:98) p t (cid:105) − (cid:104) (cid:98) p (cid:105)(cid:104) (cid:98) p (cid:105) t = MFM t , (9)where we have introduced the Fisher matrix F , which hascomponents F αβ = 12 tr[ C − C ,α C − C ,β ] . (10)The Fisher matrix also allows us to relate our estimatedbandpowers (cid:98) p to the true bandpowers p via the windowfunction matrix W : (cid:104) (cid:98) p (cid:105) = Wp , (11)where W can be shown to take the form W = MF . (12)If we choose M such that the rows of W each sum tounity, Equation (11) shows that each bandpower estimatecan be thought of as a weighted average of the truth,with weights given by each row (each window function).Even with this normalization requirement, there are stillmany choices for M . We discuss the various options andtradeoﬀs in Section II F.Whatever the choice of M , our estimator has optimalerror properties in the sense that if (cid:98) p in Equation (11) isused to constrain parameters in some theoretical model,those measured parameters will have the smallest possi-ble error bars given the observed data [38]. Our goal inthe following sections will be to ensure that both thesesmall error bars and our ability to rigorously computethem are preserved in the face of real-world diﬃculties. B. A real-world obstacle: data volume

Perhaps the most glaring diﬃculty presented by theideal technique outlined above is its computational cost.Much of that cost arises from the inversion of the data co-variance matrix C in Equations (6) and (10), in additionto the multiplication of C and matrices of the same size.Both of these operations scale like O ( N ), where N is thenumber of voxels in each data vector. The computationalcost makes taking full advantage of current generationalinterferometric data prohibitive, not to mention upcom-ing observational eﬀorts that expect to produce 10 ormore voxels of data.One would like to retain the information theoretic ad-vantages of the quadratic estimator method and its abil-ity to precisely model errors and window functions, with-out O ( N ) complexity. The solution to this problem,developed and demonstrated in [30], comes from tak-ing advantage of a number of symmetries and approx-imate symmetries of the survey geometry and the co-variance matrix, C , and can accelerate the technique to O ( N log N ). The fast method relies on assembling the data into adata cube with rectilinear voxels amenable to manipula-tion with the Fast Fourier Transform. This is equivalentto the assertion that each voxel represents an equal vol-ume of comoving space, an approximation that relies ontwo restrictions on the data cube geometry. First, therange of frequencies considered must be small enoughthat D c ( z ) (the line-of-sight comoving distance, equal to D M ( z ) above in a spatially ﬂat universe) is linear with ν .Generally, one should limit oneself to analyzing the powerspectrum of redshift ranges short enough that the evo-lution of the power spectrum during reionization can beneglected. This range, suggested by [42] to be ∆ z < ∼ . ν and D c ( z ) better than one part in 10 at the redshiftsof interest to 21 cm cosmology.Second, the assumption of equal volume voxels relieson the ﬂat sky approximation. To achieve this the areasurveyed can be broken into a number of subﬁelds, eacha few degrees on a side, for which the curvature of thesky can be neglected. As long as the angular extent ofthe data cube is smaller than ∼ ◦ , the ﬂat sky approx-imation is correct to a few parts in 10 .By analyzing a rectilinear volume of the universe, allsteps in calculating the band powers q α can be performedquickly by exploiting various symmetries and taking ad-vantage of the Fast Fourier Transform. The model for C can be broken up into a number of independent ma-trices representing signal, noise, and foregrounds. Eachof these models, developed by [29], is well approximatedby a sparse matrix in a convenient combination of realand Fourier spaces [30]. As a result, multiplication ofa vector by C can be performed in O ( N log N ). Dillon et al. [30] showed how that speed-up can be parlayed intoa method for quickly calculating q α using the ConjugateGradient Method. The rapid convergence of the iterativemethod for calculating C − x can be ensured by the ap-plication of a preconditioner which relies on the spectralsmoothness of foregrounds and the fact that they are welldescribed by only a few eigenmodes [22]. Then, by ran-domly simulating many data vectors from the covariance C and calculating q α from each, the Fisher matrix canbe estimated from the fact that F = (cid:104) qq t (cid:105) − (cid:104) q (cid:105)(cid:104) q t (cid:105) , (13)which follows from Equation (9). All of this togetherallows for fast, optimal power spectrum estimation—including error bars and window functions—despite thechallenge presented by an enormous volume of data. C. A real-world obstacle: uncertain contaminantproperties

If one had perfect knowledge of the foreground contam-ination in the data cube, the problem of foreground con-tamination would be trivial; one would simply perform adirect subtraction of the foregrounds from the data vec-tor x . Unfortunately, our knowledge of foregrounds isfar from perfect, particularly at the level of precision re-quired for a direct detection of the cosmological 21 cmsignal. Because of this, the estimator shown in Equation(6) in fact combines several diﬀerent foreground subtrac-tion steps in an attempt to achieve the lowest possiblelevel of foreground contamination:1. A direct subtraction of a foreground model from thedata vector. This is given by x − m . To see this,note that the data vector can be thought of as beingcomprised of the cosmological 21 cm signal x , theforegrounds x fg , and the instrumental noise n . Onthe other hand, the mean data vector m ≡ (cid:104) x (cid:105) = (cid:104) x (cid:105) + (cid:104) x fg (cid:105) + (cid:104) n (cid:105) = (cid:104) x fg (cid:105) . (14)contains only the foreground contribution, becausewe are interested in the ﬂuctuations of the 21 cmsignal, so the cosmological signal has zero mean,as does the instrumental noise (in the absence ofinstrumental systematics). Note that because themean here is the mean in the ensemble averagesense (as opposed to just the spatial mean), m rep-resents a full spatial and spectral model of the fore-grounds.2. Since the foregrounds also appear in the covari-ance matrix, the action of C − is to downweightforeground-contaminated modes, exploiting fore-ground properties such as smooth frequency depen-dence.3. Subtracting the term tr[ C junk C − C ,α C − ] elimi-nates the bias from contaminants.4. Finally, the binning of the cylindrical power spec-trum to the spherical power spectrum providesyet more foreground suppression. Foregrounds aredistributed in select regions on the k ⊥ - k (cid:107) plane(i.e., outside the EoR window) in patterns that donot lie along contours of constant k = (cid:113) k ⊥ + k (cid:107) .Thus, when binning along such contours to producea spherical power spectrum, one can selectivelydownweight parts of the contour with greater fore-ground contamination, which constitutes a form offoreground cleaning. Roughly speaking, this corre-sponds to taking advantage of the fact that fore-grounds have a cylindrical symmetry in Fourierspace, whereas the signal is spherically isotropic[43]. We do note, however, that the formalism weintroduce in Section II G is general enough to useany geometric diﬀerences between foregrounds andsignal.Of these foreground mitigation strategies, the ﬁrst andthird are direct subtractions (in amplitude and power,respectively), whereas the second and the fourth act through weightings. The former group represent opera-tions that are particularly vulnerable to incorrectly mod-eled foregrounds. To see this, recall that the foregroundsare expected to be larger than the cosmological signal bythree or four orders of magnitude [10–13]. Thus, whenperforming direct subtractions, low-level, unaccounted-for inaccuracies in the foreground model can translateinto extremely large biases in the ﬁnal results. In ad-dition, signiﬁcant numerical errors may arise from thesubtraction of two large numbers (the data and the fore-grounds) to obtain a small number (the measured cos-mological signal).Our goal for the rest of the section is to immunizeourselves against biases from direct subtractions. Of thedirect subtraction steps list above, the Step 1 is likely tobe relatively harmless for two reasons. First, it is imme-diately followed by the C − downweighting. The down-weighting mitigates the eﬀects of inaccuracies in model-ing, for the C − tends to gives less weight to preciselythe modes that have the largest foreground amplitudes,and therefore would be the most susceptible to modelingerrors in the ﬁrst place. In addition, the uncertainty inforeground properties in those regions of the k ⊥ - k (cid:107) planeresult in large error bars there, providing a convenientmarker of the untrustworthy parts of the plane, eﬀec-tively demarcating the boundaries of the EoR window.For these two reasons, Step 1 is unlikely to be an issue,at least not inside the EoR window.More worrisome is Step 3, where the power spectrumbias of contaminants is subtracted oﬀ. If we deﬁne “con-taminants” to be “everything but the cosmological 21 cmsignal”, there are two potential sources of bias: fore-grounds and noise. The subtraction of these biases isnot followed by a downweighting analogous to the appli-cation C − in Step 1. Moreover, whereas one could arguethat the foreground bias is likely to be large only outsidethe EoR window, the noise bias will spread throughoutthe k ⊥ - k (cid:107) plane. This noise bias will also be quite large,as current experiments are ﬁrmly in the regime wherethe signal-to-noise is below unity. It would therefore beadvantageous to avoid bias subtractions altogether if pos-sible.To avoid having to subtract foreground bias, we sim-ply redeﬁne what we mean by contaminants/junk. Ifwe modify our mission to be one where we are measur-ing the power spectrum of total sky emission instead ofthe power spectrum of the cosmological 21 cm signal, theforeground contribution to the bias term no longer ex-ists, as foregrounds now count as part of the signal wewish to measure. Of course, nothing has really changed,for we have simply ignored the subtraction of the fore-ground bias by redeﬁning what we mean by “contami-nants”. The method is still optimal for measuring thepower spectrum of the sky emission—though now it willnot provide the absolute best possible limits on the EoRpower spectrum. Within the EoR window, this shouldresult in little degradation of our ﬁnal constraints, forin this region foreground contamination is expected tobe negligible, and the power spectrum of the cosmolog-ical signal should be essentially identical to the powerspectrum of total sky emission. In any case, this is anassumption that can be checked in the ﬁnal results, andrepresents a conservative assumption throughout Fourierspace since foreground power is necessarily positive. Asdetailed low-frequency foreground observations are con-ducted, it may be possible to achieve more sensitivity inforeground contaminated regions by taking advantage ofmore detailed maps and developing more faithful mod-els. This task is left to future power spectrum estimationstudies.In contrast, escaping to the safe conﬁnes of the EoRwindow alone is not suﬃcient to eliminate the instrumen-tal noise portion of the bias term, for the instrumentalnoise bias pervades the entire k ⊥ - k (cid:107) plane. To elimi-nate the noise bias, one can choose to compute not theauto-power spectrum of a single data cube with itself,but instead to compute the cross-power spectrum of twodata cubes that are formed from data from interleaved(i.e., odd and even) time samples. Since the instrumen-tal noise is uncorrelated in time, this has the eﬀect ofautomatically removing the instrumental noise bias .More explicitly, we can form a bandpower estimate ofthe cross-power spectrum by simply computing (cid:98) p cross α = x t E α x , (15)where x and x are the data vectors for the two timeinter-leaved data cubes, and for notational brevity wehave deﬁned E α ≡ (cid:80) β M αβ C − C ,β C − . For nota-tional cleanliness we will omit the − m term in our powerspectrum estimator for this section only, with the under-standing that x signiﬁes the data vector after the best-guess foreground model has already been subtracted. Ina similar fashion, x fg refers to the foreground residuals,post-subtraction.To see that the cross-power spectrum has no noise bias,let us decompose the data vectors x i into the sum of s and n i , the signal and noise components respectively,where the signal component has no index because it doesnot vary in time (note also that following the discussionabove, any true sky emission counts as signal, so that s ≡ x + x fg ). Inserting this decomposition into thepreceding equation and taking the expectation value of The reader may object to this by (correctly) pointing out thatthere exist errors that are correlated in time, with calibration er-rors being a prime example. The result would be a cross-powerspectrum that still retained a bias. However, this does not inval-idate the cross-power spectrum approach, in the following sense.While biases will make our estimates of the power spectrum im-perfect, these estimate will not be incorrect—the ﬁnal (biased)power spectra will still represent perfectly rigorous upper limitson the cosmological power, provided we are conservative abouthow we estimate our error bars. We will discuss how to makesuch conservative error estimates later on in this section and inSection III C. the result gives (cid:104) (cid:98) p cross α (cid:105) = (cid:104) ( s + n ) t E α ( s + n ) (cid:105) = (cid:104) s t E α s (cid:105) + (cid:104) n (cid:105) t E α s + s t E α (cid:104) n (cid:105) + (cid:104) n E α n (cid:105) = (cid:104) s t E α s (cid:105) , (16)where the last equality holds because the instrumen-tal noise has zero mean, i.e. (cid:104) n i (cid:105) = 0, and no cross-correlation between diﬀerent times, i.e. (cid:104) n n (cid:105) = 0. Theresulting estimator depends only on the power spectrumof the signal, and there is no additive bias.Importantly, however, we emphasize that while wehave eliminated noise bias by computing a cross-powerspectrum, we have not eliminated noise variance . Inother words, the instrumental nosie will still contributeto the error bars. To see this, consider the variance inour estimator, which is given by Σ cross αβ = (cid:104) (cid:98) p cross α (cid:98) p cross β (cid:105) − (cid:104) (cid:98) p cross α (cid:105)(cid:104) (cid:98) p cross β (cid:105) = (cid:104) x t E α x x t E β x (cid:105) − (cid:104) x t E α x (cid:105)(cid:104) x t E β x (cid:105) (17)The second term simpliﬁes to (cid:104) (cid:98) p cross α (cid:105)(cid:104) (cid:98) p cross β (cid:105) = (cid:88) ijkl (cid:104) x i x j (cid:105)(cid:104) x k x l (cid:105) E αij E βkl . (18)Similarly, the ﬁrst term is equal to (cid:104) (cid:98) p cross α (cid:98) p cross β (cid:105) = (cid:88) ijkl (cid:104) x i x j x k x l (cid:105) E αij E βkl = (cid:88) ijkl (cid:18) (cid:104) x i x j (cid:105)(cid:104) x k x l (cid:105) + (cid:104) x i x k (cid:105)(cid:104) x j x l (cid:105) + (cid:104) x i x l (cid:105)(cid:104) x j x k (cid:105) (cid:19) E αij E βkl , (19)where in the last equality we assumed Gaussian dis-tributed data to simplify the four-point correlation. Ourbandpower covariance is now Σ cross αβ = (cid:88) ijkl (cid:18) (cid:104) x i x k (cid:105)(cid:104) x j x l (cid:105) + (cid:104) x i x l (cid:105)(cid:104) x j x k (cid:105) (cid:19) E αij E βkl . (20) In principle, x may exhibit departures from Gaussianity, sinceforegrounds are typically not Gaussian-distributed. How-ever, there are several reasons to expect deviations from non-Gaussianity to be unimportant. First, the most ﬂagrantly non-Gaussian foregrounds are typically those that are bright. Whenwe analyze real data in Section III, we alleviate this problem byanalyzing only a relatively clean part of the sky. In addition, re-call that in this section, x represents the data after a best-guessmodel of foregrounds has been subtracted from the original mea-surements. Thus, the crucial probability distribution to consideris not the foregrounds themselves, but rather the deviations fromthe foregrounds, which are likely to be better-approximated bya Gaussian distribution. The ﬁrst term in this expression consists only of auto-correlations, which contain both noise and signal: (cid:104) x x t (cid:105) = (cid:104) ( s + n )( s t + n t ) (cid:105)−(cid:104) s (cid:105)(cid:104) s (cid:105) t = S + N = C , (21)where we have deﬁned C to be the total data covariance(as deﬁned in Section II A), S ≡ (cid:104) ss t (cid:105) − (cid:104) s (cid:105)(cid:104) s (cid:105) t is thesky signal covariance (as per the discussion earlier in thissection), and N ≡ (cid:104) n n t (cid:105) = (cid:104) n n t (cid:105) is the instrumentalnoise covariance. We have assumed that there is no cor-relation between the sky emission and the instrumentalnoise, so that (cid:104) sn t (cid:105) = (cid:104) sn t (cid:105) = 0.The second term in our bandpower covariance consistsonly of cross-correlations, and thus contains no noise co-variance: (cid:104) x x t (cid:105) = (cid:104) ( s + n )( s t + n t ) (cid:105) = S . (22)Putting everything together, we obtain Σ cross αβ = tr (cid:2) CE α CE β (cid:3) + tr (cid:2) SE α SE β (cid:3) . (23)This, then, is the error covariance of our cross power spec-trum estimator. It gives less variance than the expressionfor the auto power spectrum, which in the notation of thissection takes the form Σ auto αβ = 2tr (cid:2) CE α CE β (cid:3) . (24)Despite this diﬀerence between equations 23 and 24, onemay conservatively opt to use the above covariance ma-trix for the auto-power spectrum to estimate error barseven when using Equation (15) to estimate the powerspectrum itself. In fact, it may be prudent to makethis choice because there exists the possibility that thenoise between interleaved time samples may not be trulyuncorrelated, making the true errors closer to those de-scribed by Σ auto . In our worked example with MWA datain Section III, we will conservatively use Equation (24)to estimate the errors of our cross-power spectrum. Thetask of characterizing the noise properties of the instru-ment thoroughly enough to eliminate this assumption isleft to future work on a larger data set.In summary, uncertainties in noise and foregroundproperties make it desirable to avoid trying to extractweak signals by performing subtractions between twolarge numbers (the contamination-dominated data andthe possibly inaccurate contaminant models). Mathe-matically, the greatest concern comes with the subtrac-tion of the noise and foreground biases from power spec-tra estimates. To deal with the residual noise bias, onemay evaluate cross-power spectra between interleaved Note that this assumption has nothing to do with whether ornot the instrument is sky-noise dominated. A sky-noise domi-nated instrument will have instrumental noise whose amplitude depends on the sky temperature, but the actual noise ﬂuctua-tions will still be uncorrelated with the sky signal. time samples rather than auto-power spectra. To dealwith the foreground bias, one can conservatively elect tosimply leave it in when placing upper limits on the cosmo-logical signal, and rely on the robustness of the EoR win-dow to separate out the foregrounds from the cosmolog-ical 21 cm signal. In eﬀect, one can practice foregroundavoidance rather than foreground subtraction, since theformer (if it is suﬃcient for a detection of the cosmo-logical signal) will be more robust than the latter in theface of foreground uncertainties. Finally, as a brute-forcesafeguard, to quantify such uncertainties, one can alwaysvary the foreground model used in power spectrum es-timation, as we do in Section III C when we apply ourmethods to the worked example of MWA data.

D. A real-world obstacle: incomplete uv -coverage While the methods of the previous section allow one toalleviate the eﬀects of foreground modeling uncertainty,it is impossible to avoid the fact that real interferome-ters are imperfect imaging instruments. This is because areal interferometer will inevitably have uv -coverage thatis non-ideal in two ways. First, the coverage is non-uniform, resulting in images that have been convolvedwith non-trivial synthesized beam kernels. Second, the uv -coverage is incomplete, in that certain parts of the uv -plane are not sampled at all. The idealized methodsof Section II A deals with neither problem, and in thissection with augment the formalism to rectify this.Assume for a moment that uv coverage is complete (sothat there are no “holes” in the uv -plane), but not nec-essarily uniform. In such a scenario, one has measuredan unevenly weighted sample of the Fourier modes ofthe sky. The eﬀect of this non-trivial weighting needs tobe accounted for when measuring the power spectrum,since uv coordinates roughly map to k ⊥ . A failure todo so would therefore result in the ﬁnal power spectrumestimate being multiplied by some function of k ⊥ corre-sponding to the uv distribution.Put another way, the uv distribution of an interferom-eter deﬁnes its synthesized beam, the kernel with whichthe true sky has been convolved in the production of ourimage data cube. The equations of Section II A assumethat this convolution has already been undone. Thus, wemust ﬁrst perform this step, which in our notation maybe written as x = B − x (cid:48) , (25)where x (cid:48) represents the convolved data vector, B is theconvolution matrix encoding the eﬀects of the synthe-sized beam, and x is the processed data vector that isfed into Equation (6). Note that this application of B − is meant to undo only the eﬀects of the synthesized beam,not the primary beam.The above method assumes that the matrix B is in-vertible. In practice, this will likely not be the case asparts of the uv plane will be missed by the interferom-eter, resulting in a singular B matrix. In what follows,we will present two diﬀerent ways to deal with this. Theﬁrst is to modify the equations of Section II A so thatthey accept the convolved images (the “dirty maps”) asinput. Since all the statistical information relevant to thepower spectrum are encoded in the covariance matrix, wesimply have to make the replacement C ≡ (cid:104) xx t (cid:105) − (cid:104) x (cid:105)(cid:104) x (cid:105) t −→ (cid:104) x (cid:48) x (cid:48) t (cid:105) − (cid:104) x (cid:48) (cid:105)(cid:104) x (cid:48) (cid:105) t . (26)This amounts to C −→ B (cid:0) (cid:104) xx t (cid:105) − (cid:104) x (cid:105)(cid:104) x (cid:105) t (cid:1) B t = BCB t . (27)Of course, changing the covariance matrix also changes C ,α , and we must propagate this change. Diﬀerentiatingthe preceding equation with respect to the bandpower p α gives the substitution C ,α −→ BC ,α B t . (28)Since C ,α is the response of the data covariance C to thebandpower p α , this is simply a statement of the fact thatif our data consists of dirty maps, the revised C ,α matrixshould encode the response of a dirty map’s data covari-ance to the bandpower. With the substitutions givenby Equations (27) and (28), the rest of the equations ofSection II A can be used unchanged. In the limit of aninvertible B matrix, it is straightforward to show thatthis is equivalent to using Equation (25).The second method for dealing with a singular B ,which was proposed in Ref. [30], is to replace the ill-deﬁned inverse matrix B − with a pseudoinverse givenby Π (cid:0) B + γ UU † (cid:1) − Π , (29)where γ is a non-zero but otherwise arbitrary real num-ber, and Π is a projection matrix given by Π ≡ I − U ( U † U ) − U † . (30)The matrix U speciﬁes which modes on the sky are miss-ing in the data as a result of unobserved pixels on the uv -plane. It is constructed by computing the responses(on the sky) of each unobserved uv pixel individually andstoring each response as a column of U . As an example,in the ﬂat-sky approximation the U matrix would have asinusoid in each column, corresponding to the fringes thatwould have been observed by the interferometer had datanot been missing in a particular uv pixel. If these modeswere present in the covariance model (which might bethe case, for example, if the covariance were constructedby modeling data from a diﬀerent interferometer withdiﬀerent uv coverage), then the inverse covariance C − in our estimator needs to be similarly replaced with thepseudoinverse: Π (cid:0) C + γ UU † (cid:1) − Π . (31) Importantly, the pseudoinverse can be quickly multipliedby a vector using the previously discussed conjugate gra-dient method. Its usage therefore does not sacriﬁce anyof the speedups that were identiﬁed in Section II B fordealing with large data volumes. E. A real-world obstacle: missing data from RFI

In any practical observation, the presence of narrow-band RFI will mean that certain RFI-contaminated fre-quency channels will need to be ﬂagged as outliers andomitted from a ﬁnal power spectrum analysis. The re-sult, once again, is the presence of gaps in the data, onlythis time the missing modes are complete frequency chan-nels. However, the pseudoinverse formalism of the pre-vious section is quite ﬂexible in that modes of any formcan be projected out of the analysis. Thus, to correctlyaccount for RFI-ﬂagged data, one simply uses the pseu-doinverse in exactly the same way as one does to accountfor missing uv data. F. A real-world obstacle: foreground leakage intothe EoR window

As Equation (11) showed, estimates of the power spec-trum are not truly local, in the sense that every band-power estimate (cid:98) p α corresponds to a weighted averageof the true power spectrum, with weights speciﬁed bythe window functions. Liu and Tegmark [29] showedthat these window functions can be quite broad, par-ticularly in regions with high foreground contamination.There is thus the danger that foreground power couldleak into the EoR window. Because the foregrounds areso much brighter than the cosmological signal, even asmall amount of leakage could compromise the cleanli-ness of the EoR window.Fortunately, one can exert some control over the shapeof the window functions by making wise choices regard-ing the form of M in Equation (8), which in turn givesthe window functions via W = MF . As discussed above, M must be chosen such that the rows of W sum to unity.Beyond that requirement, however, an inﬁnite number ofchoices are admissible. One choice would be M = F − ,which gives W = I (i.e., delta function windows). Thiswould certainly minimize the amount of leakage into theEoR window, but it comes at a high price: the result-ing error bars on the power spectrum measurement—the The term “window function” should not be confused with theterm “EoR window”. The former refers to the weights thatspecify the linear combination of the true bandpowers that eachbandpower estimate represents, as per Equation (11). The latterrefers to the region on the k ⊥ - k (cid:107) plane that naturally has verylow levels of foreground contamination, as illustrated in Figure1. Σ from Equation (9)—tend to belarge, reﬂecting the data’s inability to make highly lo-calized measurements in Fourier space when the surveyvolume is ﬁnite.On the other extreme, the error bars predicted by Σ can be shown to be their smallest possible if M is takento be diagonal [44]. However, this gives broader windowfunctions, for it is via the smoothing/binning eﬀect ofthese broad window functions that the small errors canbe achieved. One can also argue that the level of smooth-ing dictated by this approach is excessive, since the re-sulting bandpowers have positively correlated errors. (Tosee this, note that up to a row-dependent normalization,the error covariance matrix takes the form Σ ∼ F . Sinceall elements of a Fisher matrix must necessarily be non-negative, this implies that all cross-covariances of theestimated bandpowers have positively correlated errorsunless F is diagonal, which is rarely the case).As a compromise option, we advise M ∼ F − / (againafter a normalization of each row so that the windowfunctions sum to unity). This choice gives window func-tions that are narrower than those for a diagonal M whilemaintaining reasonably small error bars. In addition, aninspection of Equation (9) reveals that this method givesa diagonal Σ , which means that errors between diﬀerentbandpowers are uncorrelated.In Section III D, we use MWA data to demonstrate thecrucial role that the M ∼ F − / choice plays in preserv-ing the cleanliness of the EoR window. G. A real-world obstacle: ensuring that binningdoesn’t destroy error properties

In previous sections, we have discussed how one canpreserve all the desirable properties of the power spec-trum estimator of Section II A in the face of all the real-world complications presented in Sections II B throughII F. The result is a rigorous yet practical estimator forthe cylindrical power spectrum P cyl ( k ⊥ , k (cid:107) ). We nowturn to the problem of binning the cylindrical powerspectrum into the cosmologically relevant spherical powerspectrum P sph ( k ), with a special emphasis on the preser-vation of the information content of our estimator.Just as with the cylindrical power spectrum, we param-eterize the spherical power spectrum as piecewise con- Of course, there exist other choices that are more elaborate thanthe three considered in this paper. For example, with exquisiteforeground and instrumental modeling, one could imagine ﬁrstdecorrelating to delta-function windows by setting M = F − in an attempt to “perfectly” contain the foregrounds to regionsoutside the EoR window, and then to re-smooth the bandpowerswithin the window to reduce the variance. This is a promisingavenue for future investigation, but for this paper our goal issimply to apply the F − / decorrelator to real data (see SectionIII D) to demonstrate the feasibility of containing foregroundsusing decorrelation techniques. stant, so that all the information is encoded in a vectorof bandpowers p sph , so that: p sph α ≡ P sph ( k α ) . (32)The spherical bandpowers are related to estimates of thecylindrical bandpowers (cid:98) p cyl by the equation (cid:98) p cyl = Ap sph + ε , (33)where A is a matrix of size N cyl × N sph of 1s and 0s thatrelates k ⊥ - k (cid:107) pairs to k bins, with N cyl and N sph equal tothe number of cells in the k ⊥ - k (cid:107) plane and the number ofspherical k bins respectively. The vector ε is a randomvector of errors on (cid:98) p cyl . It has zero mean (assuming thatone has taken the care to avoid additive bias in our esti-mator of the cylindrical bandpowers, as discussed above),but non-zero covariance equal to Σ cyl ≡ (cid:104) εε t (cid:105) , where Σ cyl is given by either Equation (23) or (24), dependingon whether the cylindrical bandpowers were computedusing cross or auto-power spectra. (The methods pre-sented in this section are applicable either way).Our goal is to construct an optimal, unbiased estimatorof p sph from (cid:98) p cyl . This is a solved problem [45], and thebest estimator (cid:98) p sph is given by (cid:98) p sph = [ A t Σ − A ] − A t Σ − (cid:98) p cyl , (34)with the ﬁnal error covariance on the spherical bandpow-ers given by Σ sph αβ ≡ (cid:104) (cid:98) p sph α (cid:98) p sph β (cid:105) − (cid:104) (cid:98) p sph α (cid:105)(cid:104) (cid:98) p sph β (cid:105) = [ A t Σ − A ] − . (35)Since the A matrix has (by construction) a single 1 perrow and zeros everywhere else, an inspection of Equation(35) reveals that a diagonal Σ cyl implies a diagonal Σ sph .In other words, the estimator given by Equation (34) pre-serves the decorrelated nature of the M ∼ F − / versionof the cylindrical power spectrum estimator deﬁned inSection II F. This will not be the case for an arbitraryestimator (such as one that is formed from taking uni-formly weighted Fast Fourier Transforms, then squaringand binning). We also emphasize that if one does notchoose to use decorrelated cylindrical bandpower vectors,Equations (34) and (35) require that one keep full trackof the oﬀ-diagonal terms of Σ − . Without it, a consistentpropagation of errors to the spherical power spectrum isnot possible, and may even lead to a mistakenly claimeddetection of the cosmological signal, as we discuss in Sec-tion III D and in Appendix A.Just as with the cylindrical power spectra, we wouldlike to compute the window functions. The deﬁnition ofthe spherical window functions are exactly analogous tothat provided in Equation (11) for the cylindrical powerspectrum, so that (cid:104) (cid:98) p sph (cid:105) = W sph p sph . (36)Taking the expectation value of Equation (34), we have (cid:104) (cid:98) p sph (cid:105) = [ A t Σ − A ] − A t Σ − (cid:104) (cid:98) p cyl (cid:105) = [ A t Σ − A ] − A t Σ − W cyl Ap sph , (37)1where we have used the deﬁnition of the cylindrical win-dow functions to say that (cid:104) (cid:98) p cyl (cid:105) = Wp cyl , as well as thefact that p cyl = Ap sph (with no error term because weare relating the true cylindrical bandpowers to the truespherical bandpowers). Inspecting this equation, we seethat W sph = [ A t Σ − A ] − A t Σ − W cyl A . (38)Therefore, by measuring the width of the spherical win-dow functions (rows of W sph ), one can place rigoroushorizontal error bars on the ﬁnal spherical power spec-trum estimate. H. Summary of the issues

In the last few sections, we have provided techniquesfor dealing with a number of real-world obstacles. Theseinclude:1. Taking advantage of the ﬂat-sky approximation andthe rectilinearity of data cubes, as well as the con-jugate gradient algorithm for matrix inversion toallow large data sets to be analyzed quickly.2. Using cross-power spectra rather than auto-powerspectra in order to eliminate noise bias.3. Replacing inverses with pseudoinverses to deal withdata that has missing spatial modes (due to incom-plete uv coverage) and missing frequency channels(due to RFI).4. Performing power spectrum decorrelation to avoidthe leakage of foreground power into the EoR win-dow.5. Binning of cylindrical power spectra into sphericalpower spectra in a way that preserves desirable er-ror properties.Crucial to this is the fact that these techniques all oper-ate under a self-consistent framework. This allows faith-ful error propagation that accurately captures how vari-ous real-world eﬀects act together. For example, it wasshown in [30] that properly accounting for pixelization ef-fects in Equation (5) results in low Fisher information athigh k (cid:107) , providing a marker for parts of the k ⊥ - k (cid:107) planethat cannot be well-constrained because of ﬁnite spec-tral resolution. The identiﬁcation of such a region wouldbe trivial if one had spectrally contiguous data, for thenone would simply say that the largest measurable k (cid:107) wasroughly 1 / ∆ L (cid:107) , where ∆ L (cid:107) is the width of a single fre-quency channel mapped into a cosmological line-of-sightdistance. However, such a straightforward analysis nolonger applies when there are RFI gaps in the data atarbitrary locations. In contrast, the uniﬁed frameworkpresented in this paper allows all such complications tobe folded in correctly. III. A WORKED EXAMPLE: EARLY MWADATA

Now that we have bridged the gap between theoreti-cal techniques for analyzing ideal data and the numer-ous challenges presented by real data, we are ready tobring together our methods, specify a covariance model,and estimate power spectra from MWA 32-tile prototype(MWA-32T) data. The data were taken between the 21stand 29th of March 2010, the ﬁrst observing campaignduring which data were taken that were scientiﬁcally use-ful. The observations are described in more detail by [46].Real data aﬀords us two opportunities. In this section, welook at the data to examine and quantify the diﬀerencesbetween power spectrum estimators and the pitfalls as-sociated with choice of estimator. In Section IV, we takeadvantage of everything we have developed to arrive atinteresting new foreground results and a limit on the 21cm brightness temperature power spectrum.

A. Description of observations

All of the data used for this paper were taken onthe MWA-32T system. This system has since been up-graded to a 128-tile instrument (MWA-128T; Tingay et al. [47], Bowman et al. [48]), but in this paper we focusexclusively on MWA-32T data, reserving the MWA-128Tdata for future work.The MWA-32T instrument consisted of 32 phased-array “antenna tiles” which served as the primary col-lecting elements. Each tile contained 16 dual linear-polarization wideband dipole antennas which were com-bined to form a steerable beam with a full width at halfmaximum (FWHM) size of ∼ ◦ at 150 MHz. The arrayhad an approximately circular layout with a maximumbaseline length of ∼

340 m, and a minimum baselinelength of 6.6 m, although the shortest operating baselineduring this observational campaign was 16 m. After dig-itization, ﬁltering, and correlation, the ﬁnal visibilitieshad a 1 second time resolution and 40 kHz spectral res-olution over a 30.72 MHz bandwidth. The instrumentalcapabilities are summarized in Table I.For our worked example, we concentrate on March2010 observations of the MWA “EoR2” ﬁeld. Itis centered located at R . A . (J2000) = 10 h m s ,decl . (J2000) = − ◦ (cid:48) (cid:48)(cid:48) , and is one of two ﬁelds athigh Galactic latitude that have been identiﬁed by theMWA collaboration as candidates for deep integrations,owing to their low brightness temperature in low fre-quency measurements of Galactic emission [10, 49]. Forfurther details about the observational campaign or theEoR2 ﬁeld, please see Williams et al. [46], which wasbased on the same set of observations as the ones usedin this paper.Observations covered three 30.72 MHz wide bands,centered at 123 .

52 MHz, 154 .

24 MHz and 184 .

96 MHz,corresponding to a redshift range of 6 . < z < . Field of View (Primary Beam Width) ∼ ◦ at 150 MHzAngular Resolution ∼ (cid:48) at 150 MHzCollecting Area ∼

690 m towards zenith at 150 MHzPolarization Linear X-YFrequency Range 80 MHz to 300 MHzInstantaneous Bandwidth 30.72 MHzSpectral Resolution 40 kHzTABLE I. MWA-32 Instrument Parameters (the redshift range of the results presented in this workis slightly smaller because of data ﬂagging) for the21 cm signal. The 123 .

52 MHz and 154 .

24 MHz bandswere observed for approximately 5 hours each, and the184 .

96 MHz band was observed for approximately 12hours.These early data from the prototype have provided uswith a set of test data that enabled development of exten-sive analysis methods and software on which the results ofthis paper are based. The early prototype had shortcom-ings (e.g., mismatched cables, receiver ﬁrmware errors,correlator timing errors) that compromised the calibra-tion to some extent, raising the apparent noise level. Ad-ditionally, the instrument was only operating with < ∼ B. Mapmaking

Before the data can be used as a worked example forour power spectrum estimator, however, we must convertthe measured visibilities into a data cube of sky imagesat every frequency in our band. In other words, we mustform the data vector x , deﬁned by Equation (5), whichserves as the input for our power spectrum pipeline.To form the data vector, we performed the followingsteps. First, we performed a reduction procedure simi-lar to that described in Williams et al. [46] for the ini-tial ﬂagging and calibration of the data. Hydra A wasidentiﬁed as the dominant bright source in the ﬁeld, andused for calibration assuming a point source model. TheHydra A source model was then subtracted from the uv data. As this same source model was also used for gainand phase calibration, this can be thought of as a “peel-ing” source removal procedure [50–53] on a single source.Alternatively, in the absence of gridding artifacts, this isequivalent to imaging the point-source model and sub-tracting it from the data as part of the direct foregroundsubtraction step discussed in the ﬁrst step of Section II C [45].The subtracted data were imaged using the CASA task clean without deconvolution to produce “dirty” images.No multi-frequency synthesis was performed, so that thefull 40 kHz spectral resolution of the data would be avail-able. The visibilities were gridded using w-projectionkernels [54] with natural (inverse-variance) weighting toproduce maps at each frequency with a cell size of 3 (cid:48) over a 25 . ◦ ﬁeld of view. The resulting cubes contained ∼

200 million voxels, with 512 elements along each spa-tial dimension and 768 elements in the frequency domain.It is important to note that the pre-ﬂagging performedon the data resulted in the ﬂagging of entire frequencybands (which means that there are gaps in the ﬁnal datacube). Cubes were generated for each 5 minute snapshotimage.The individual snapshot data cubes were combined us-ing the primary beam inverse-variance weighting methoddescribed in Williams et al. [46]. The weighting and pri-mary beams were simulated separately for each 40 kHzfrequency channel in each 5 minute snapshot. The com-bined maps and weights were saved, along with the eﬀec-tive point spread function at the center of the ﬁeld. Twoadditional data cubes were created by averaging alternat-ing 5 minute snapshots (i.e. even numbered snapshotswere averaged into one cube, and odd numbered snap-shots were averaged into the other) so that they weregenerated from independent data, but with essentiallythe same sky and uv coverage properties.A further ﬂux scale calibration of the integratedcubes was performed using three bright point sources:MRC 1002-215, PG 1048-090, and PKS 1028-09 to setthe ﬂux scale on a channel-by-channel basis. A two di-mensional Gaussian ﬁtting procedure was used to ﬁt thepeak ﬂux of each of these sources in each 40 kHz chan-nel of the data cube. Predictions for each source werederived by ﬁtting a power law to source measurementsfrom the 4.85 GHz Parkes-MIT-NRAO survey [55], the408 MHz Molonglo Reference Catalog [56], the 365 MHzTexas Survey [57], the 160 MHz and 80 MHz CulgooraSource List [58] and the 74 MHz VLA Low-frequency SkySurvey [59]. A weighted least-squares ﬁt was then per-formed to calculate and apply a frequency-dependent ﬂuxscaling for the cube to minimize the square deviations ofthe source measurements from the power law models.An additional ﬂagging of spectral channels was per-3formed based on the root-mean-square (RMS) noise ineach spectral channel of the cube. A smooth noise modelwas determined by median ﬁltering the RMS channelnoise as a function of frequency (bins of 16 channels wereused in the ﬁltering). Any channel with 5 σ or largerdeviations from the smoothed noise model was ﬂagged.Upon inspection, these additional ﬂagged channels wereobserved to be primarily located at the edges of the coarsedigital ﬁlterbank channels, which were corrupted due toan error in the receiver ﬁrmware. After this procedure,approximately one third of the spectral channels werefound to have been ﬂagged.Each individual map covered 25 . ◦ × . ◦ at a res-olution of 3 (cid:48) with 768 frequency channels (40 kHz fre-quency resolution). To decrease the computational bur-den of the covariance estimation, each map was subdi-vided into 9 subﬁelds, and the pixels were averaged toa size of 15 (cid:48) . The data cubes were mapped to comov-ing cosmological coordinates using WMAP-7 derived cos-mological parameters, with Ω M = 0 . Λ = 0 . H = 71 km s − Mpc − , and Ω k ≡ x . However,estimating power spectra and error statistics using theformalism of Section II also requires a covariance model,which we construct in the next section. C. Covariance model

We follow [29] and [30] in modeling the covariance ma-trix C as the sum of independent parts attributable tonoise and foregrounds. We leave oﬀ the signal covari-ance because it only contributes to the ﬁnal error bars byaccounting for cosmic variance—a completely negligibleeﬀect in comparison to foreground and noise-induced er-rors. We adopt a conservative model of the extragalacticforegrounds by treating them as a Poisson random ﬁeldof sources with ﬂuxes less than 100 Jy, after the manualremoval of Hydra A. By treating all extragalactic fore-grounds as “unresolved,” we eﬀectively throw out infor-mation about which lines of sight are most contaminatedby bright foregrounds. As [30] showed, future analysescan improve on our limits by including more informationabout the foregrounds. We begin with the parameterizedcovariance model of [29], C unresolved ij = (cid:18) . × − KJy (cid:19) (cid:32)(cid:90) S cut S dndS dS (cid:33) × (cid:18) ν i ν j ν ∗ (cid:19) − − ¯ κ exp (cid:34) σ κ (cid:18) ln (cid:20) ν i ν j ν ∗ (cid:21)(cid:19) (cid:35) × exp (cid:20) ( r ⊥ i − r ⊥ j ) σ ⊥ (cid:21) (cid:18) Ω pix (cid:19) − (39)where ν ∗ = 150 MHz is a reference frequency, ν i is thefrequency of the i th voxel, which has an angular distance of r ⊥ i from the ﬁeld center. The spectral index is ¯ κ =0 .

5, the uncertainty in the spectral index is σ κ = 0 . σ ⊥ = 7 (cid:48) , Ω pix is theangular size of each pixel, the ﬂux cut S cut = 100 Jy, and dn/dS is the diﬀerential source count from [61], dndS =(4000 Jy − sr − ) × (cid:16) S .

880 Jy (cid:17) − . for S > (cid:16) S .

880 Jy (cid:17) − . for S ≤ κ = 0 . σ κ = 0 . σ ⊥ =30 ◦ , and replace the ﬁrst three terms of the covariance inEquation 39 with T = (335 . .Our model for the instrumental noise is adopted from[30], with one key diﬀerence: the overall normalization.For each subband, we let the noise covariance matrixscale by a free multiplicative constant. This is equiva-lent to treating the combination T / ( A t obs ) as a freeparameter. We then ﬁt for that parameter by requiringthe RMS diﬀerence between the two time slices—whichshould be free of sky signal—for the densely sampled in-ner region of uv space and rescaling our noise covariancematrix to match. The spatial structure of the covari-ance was left unchanged. Even though the data is some-what nosier than suggested by a ﬁrst principles calcula-tion assuming ﬁducial values for system temperature andantenna eﬀective area, this empirical renormalization al-lows for an honest account of the errors introduced byinstrumental eﬀects.To verify that our parameterization of the foregroundswas reasonable, we varied these parameters over an or-der of magnitude and found that they had little eﬀect onour ﬁnal power spectrum estimates, except at the lowestvalues of k . There are two reasons for this: ﬁrst, sincewe are only measuring the power spectrum of the sky, weneed not worry about precisely subtracting foregrounds.Second, because the noise in our instrument is still morethan two orders of magnitude from the cosmological sig-nal, in the EoR window our band power measurementswill be noise dominated and agnostic to our foregroundmodel. Future analyses might include a more thoroughtreatment of the foregrounds, especially by utilizing thefull power of the Dillon et al. [30] method to include infor-mation about the positions, ﬂuxes, and spectral indicesof individual point sources.4 D. Evaluating power spectrum estimator choices

With both a data vector x and a covariance matrix C in hand, we can now apply the methods of SectionII to estimate power spectra. In doing so, we deal withreal-world obstacles using all of the techniques that wehave developed. In this section, we show why all this isnecessary.In Section II F we touted the choice of power spectrumestimator (cid:98) p = Mq with M ∼ F − / as a compromisesolution between the choice with the smallest error bars, M ∼ I , and the choice with the narrowest window func-tions, M ∼ F − . In the race to detect the power spec-trum from the EoR, one might be tempted to aggressivelyseek out the smallest possible errors. This could prove adeleterious choice, as we will now show using MWA-32Tdata.First, in Figure 2 we compare cylindrical power spec-tra, (cid:98) p , generated using two diﬀerent estimators of thepower spectrum that we presented in Section II F. Onthe left, we have used M ∼ I , the estimator withthe smallest error bars, and on the right we have used M ∼ F − / , the estimator with decorrelated errors. Inboth cases, we have plotted the absolute value of thepower spectrum estimates (which can be negative be-cause they are cross-power spectra). Because the twoestimates are related to one another by an invertible ma-trix, they contain the same cosmological information. Ina sense, the M ∼ F − / method is the most honest es-timator of the power spectrum because the band powersform a mutually exclusive and collectively exhaustive setof measurements. In other words, they represent all theall the power spectrum information from the data, di-vided into independent pieces.Moreover, just because two sets of estimators have thesame information content does not mean that they areequally useful for distinguishing the cosmological powerspectrum from foreground contamination. In Figure 2,the minimum variance estimator for the power spectrumintroduces considerable foreground contamination intothe EoR window, demarcated by the expected angularextent of the wedge feature (which we introduced in Sec-tion I and will discuss in greater detail in Section IV A).Even highly suspect features at high k ⊥ where uv cover-age is spottiest seem to get smeared across k ⊥ and intothe EoR window. We cannot simply cut out the wedgefrom our cylindrical-to-spherical binning and expect aclean measurement of the power spectrum in the EoRwindow.Looking closely at Figure 2, one might notice that someregions of the EoR window on the lefthand panel still In our comparison of choices for M , we drop the M ∼ F − , δ -function windows choice. In addition to proving the noisiest esti-mator, it suﬀers from strong anti-correlated errors. We adopt theperspective that the important comparison is between the “ob-vious” choice, the minimum variance M ∼ I , and our preferredchoice with decorrelated errors, M ∼ F − / . seem very clean—cleaner perhaps that the same regionsin the righthand panel. To examine that apparent fact,we plot (cid:98) p α instead of | (cid:98) p α | in Figure 3. To make theﬁgure more intelligible, we have plotted colors based onan sinh − color scale with a sharp color division at 0.The sinh − has the advantage of behaving linearly atsmall values of (cid:98) p α and logarithmically at large positiveor negative (cid:98) p α .What emerges is a striking diﬀerence between the twoestimators. For the reasons discussed in Section II C, wehave chosen to estimate the cross power spectrum be-tween two time-interleaved sets of observations. As a re-sult, we expect that instrumental noise should be equallylikely to contribute positive power as it is to contributenegative power. In noise dominated regions of the k ⊥ - k (cid:107) plane, we expect about half of our measurements to bepositive and about half to be negative. That is exactlywhat we see in the EoR window of the M ∼ F − / es-timator. However, the M ∼ I estimator in the lefthandpanel clearly shows positive power throughout the entiresupposed EoR window. Though the magnitude of thatpower is not enormous—often it is well within the verticalerror bars—the overall bias towards positive cross powermeans that sky signal is contaminating the EoR window.This is precisely the problem we were worried about inSection II F and the data have clearly manifested it. This also explains why there appeared to be less powerin the EoR window of the lefthand panel of Figure 2;by taking the absolute value of the weighted average ofpositive and negative quantities, we expect to measurea smaller absolute value of the power. However, as thisﬁgure clearly shows, that weighted average is biased byforeground leakage. And, even though there still appearsto be a region just inside the EoR window that retainspositive band powers consistent with foregrounds, thatsmall amount of leakage can be attributed to ﬁnite sizedwindows functions and to calibration uncertainties. Re-gardless, it does not appear to be an insurmountable lim-itation to the cleanliness of the EoR window; rather, itsuggests that we should be careful in how we demarcatethe EoR window when calculating spherically-averagedpower spectra.In addition to producing a cleaner EoR window, thedecorrelated estimator of the power spectrum yields an-other advantage: narrower window functions. Both theestimator with the minimum variance and estimator with Of course, as we noted in Section II F, the choice of M ∼ F − / is not unique in its ability to mitigate foreground leakage, andother choices certainly warrant future investigation. Picking M ∼ F − / is, however, a good choice for a ﬁrst attempt atdecorrelation, particularly given its various other desirable prop-erties that we have described. The important point here is thatwhile M ∼ F − / may not be necessarily optimal for contain-ing foregrounds within the wedge, our results show that it is areasonable one. In contrast, the “straightforward” approach ofnormalizing the power spectrum with the diagonal choice M ∼ I is clearly ill-advised. FIG. 2. Unless one chooses a power spectrum estimator with decorrelated errors, foregrounds and other instrumental eﬀectscan leak signiﬁcantly into the EoR window. Here we show the absolute value of the cylindrical power spectrum estimate fromthe subband centered on 158 MHz ( z = 8 .

0) and averaged over all 9 ﬁelds. On the left, we have set M ∼ I . On the right, M ∼ F − / . We expect contamination from smooth spectrum foregrounds interacting with the chromatic synthesized beamto occupy the “wedge” portion of Fourier space, deﬁned in Equation (1). Optimistically, the wedge is delimited by the extentof the main lobe of the primary beam; conservatively, we should not see bright foreground contamination beyond the horizon.In the regions where the power spectrum is noise dominated, we expect little structure in the k (cid:107) direction in the EoR windowabove some moderate value of k (cid:107) . In the left panel, we see considerably more k (cid:107) structure in the form of horizontal bands,attributable to foreground contamination and instrumental eﬀects, that has leaked into the putative EoR window. decorrelated errors represent, in aggregate, the weightedaverage of the true, underlying band power spectrum,as we discussed in Section II A. In Figure 4, we showthe improvement that the decorrelated estimator oﬀersover the minimum variance estimator by narrowing thewindow functions considerably. We show ﬁve examplewindow functions from the same subband that we plot in While the choice of M ∼ F − / ensures that the power spec-trum estimator covariance is diagonal (recall, Σ = MFM t while Figure 2, cropped to ﬁt together on one set of axes, eachcentered at their respective peaks. Because the windowfunctions are normalized to sum to 1, the breadth of eachwindow function is reﬂected by the value of the centralpeak. As we expected, the window functions are con-siderably narrower for our decorrelated power spectrumestimator. W = MF ), it does not mean that the window functions are deltafunctions. The oﬀ-diagonal terms of Σ might be zero even if theoﬀ-diagonal terms of W are not. FIG. 3. One advantage of calculating the cross power spectrum of interleaved time-slices of data is that we can easily tell whichregions of Fourier space are noise dominated. Here we reproduce the power spectra from Figure 2 without taking the absolutevalue of P ( k ). By plotting with a discontinuous, sinh − color scale, it is easy to see that the EoR window for our decorrelatedpower spectrum estimate (right panel) has roughly an equal number of positive and negative band power estimates—exactlywhat we would expect from a noise dominated region. By contrast, our power spectrum estimate with correlated errors (leftpanel) shows positive power over almost all of Fourier space, indicating ubiquitous leakage of contaminants into the EoRwindow. Even after binning from cylindrical power spectra tospherical power spectra, the diﬀerence remains quitestark. In Figure 5 we see clearly that choosing a powerspectrum estimator with decorrelated errors also consid-erably improves the window functions in one dimensionas well as two.Lastly, as we mentioned in Section II G, one of advan-tage of our method is that it keeps a full accounting of theerror covariance, Σ . When M is not chosen to make Σ di-agonal, an improper accounting can lead to a suboptimalor simply incorrect propagation of errors. In Appendix Awe work through an example of the consequences of as-suming the independence of errors at various steps in the analysis. This should serve as a warning of the impor-tance of careful analysis; incorrectly assuming a diagonal Σ can lead to unnecessarily wide window functions, anoverestimation of errors, or—worst of all—an underesti-mation of errors that could lead to an unjustiﬁed claimof a detection. IV. EARLY RESULTS

Having developed and demonstrated a technique thatrobustly preserves the EoR window while thoroughly andhonestly keeping track of the errors on and correlations7

FIG. 4. By using an estimator of the 21 cm power spectrum with uncorrelated errors, we signiﬁcantly narrow the windowfunctions that relate the ensemble average of our estimator to the true, underlying power spectrum. Here we show a sampleof ﬁve cropped window functions for the power spectrum estimate in Figure 2, each centered at their maxima, for both anestimator with correlated errors (left panel) and an estimator with uncorrelated errors (right panel). Though the estimatorwith correlated errors produces smaller vertical error bars, it acheives this by “over-smoothing” many band powers together.Narrow window functions let us independently measure many modes of the power spectrum. The band power measured with M ∼ F − / is one of a set of mutually exclusive and collectively exhaustive pieces of information. between our band power estimates, we can now conﬁ-dently generate some interesting preliminary science re-sults. Because these data span the widest redshift rangeto date, we are able to investigate the behavior of thewedge feature over many frequencies. Understanding thebehavior of the EoR window over a large redshift rangeis important, since there is still considerable uncertaintyabout the timing and duration of the EoR. Moreover, itis often argued that a tentative ﬁrst detection of the cos-mological signal will only be convincingly distinguishablefrom residual foregrounds if one can show that the 21 cmbrightness temperature ﬂuctuations peak at some red-shift, since theory predicts that the midpoint of reioniza- tion should be marked by such a peak [7, 62]. It is there-fore essential to characterize the EoR window (and byextension, residual foregrounds) over a broad frequencyrange. We also apply our methods from Section II tocalculate spherically averaged power spectra over our en-tire redshift range, including error bars and window func-tions, thus setting a limit on the 21 cm brightness tem-perature power spectrum during the EoR.8 −1 ) W i ndo w F un c t i on ( U n i t l e ss ) M ∝ I (Correlated Errors) −2 −1 −1 ) W i ndo w F un c t i on ( U n i t l e ss ) M ∝ F −1/2 (Decorrelated Errors) FIG. 5. Even after optimally binning the cylindrical powerspectra from Figure 2 to spherical power spectra, the choiceof a power spectrum estimator with decorrelated errors pro-duces much narrower window functions than the minimumvariance technque. In addition to maintaining a clean EoRwindow, the choice of M ∼ F − / provides the additionalbeneﬁt of allowing power spectrum modes to be measuredmore independently. A. The wedge

In Figure 6, we show all the cylindrical power spectraover the redshift range probed by our current observa-tions. The spectra are sorted into three rows, each of which contain data coming from a single 30 .

72 MHz widefrequency band. All of the spectra were generated usingthe same techniques that were used to generate the ex-ample cylindrical power spectra in Section III D and thuscontain all the desirable statistical properties discussedin Section II. One sees that in every case the foregroundsare mostly conﬁned to the wedge region in the bottomright corner of the k ⊥ - k (cid:107) plane. This builds upon thesingle frequency observations of [13], demonstrating theexistence of the EoR window across a wide range of fre-quencies relevant to EoR observations.Having these measurements also allows us to exam-ine the behavior of the EoR window as a function offrequency. Consider ﬁrst the high k ⊥ regions of the k ⊥ - k (cid:107) plane. The most striking feature here is the wedge.Consistent with being dominated by foreground power,the wedge generally gets brighter with decreasing fre-quency within each wide frequency band, just as fore-ground emission is known to behave. The extent of thewedge is also in line with theoretical expectations. Re-call from Equation (1) that the wider the ﬁeld-of-view,the farther up in k (cid:107) the wedge goes. Since the ﬁeld-of-view is deﬁned by the primary beam, whose extent de-creases with increasing frequency, one expects the wedgeto have the largest area at the lowest frequencies. Thistrend is clearly visible in the cylindrical power spectraof Figure 6, where the wedge extends to the highest k (cid:107) at the highest redshifts. Importantly, the wedge is con-ﬁned to its expected location across the entire range ofthe observations. To see this, note that we have overlaidEquation (1) on the plots, with the dashed line corre-sponding to θ max equal to that of the ﬁrst null of theprimary beam, and the solid line with θ max = π/ k ⊥ , theory suggests that foregrounds will con-taminate a horizontally-oriented region at the bottom ofthe plot. This is clearly seen in the highest frequencyplots. Interestingly, at lower frequencies the increasinginstrumental noise plays more of a role, and the fore-ground contribution is less obvious in comparison (al-though it is still there). While a naive reading of someof these low frequency plots (such as the one for z = 9 . k (cid:107) , such a conclusion would be misguided. As weshall see in Section IV B, these modes are likely dom-inated by foregrounds (and therefore do not integratedown with further integration unlike instrumental noise9 FIG. 6. Examining cylindrically binned power spectra for each subband (each averaged over all nine subﬁelds), reveals sev-eral important trends with frequency of the EoR window and the foregrounds. Each row is a single simultaneously observedfrequency band. Since diﬀerent bands were observed for diﬀerent amounts of time, direct comparisons between rows is chal-lenging. However, several clear trends emerge. For each band, moving to higher redshift (increasing wavelength) shows strongerforegrounds, a larger wedge (in part due to a wider primary beam), and a noisier EoR window (due to a higher system temper-ature). In general the brightest foreground contamination is well demarcated by the wedge line in Equation (1) for the primarybeam (dotted line) and especially by the wedge line for the horizon (solid line). In short, the wedge displays the theoreticallyexpected frequency dependence. dominated modes). Moreover, the error statistics (which self-consistently include foreground errors in our formal-0ism) suggest that low k (cid:107) modes are less useful for con-straining theoretical models, and that the true EoR win-dow does in fact lie at higher k (cid:107) , as suggested by theory.Again, this highlights the importance of estimating powerspectra in a framework that naturally contains a rigorouscalculation of the errors involved. B. Spherical Power Spectrum Limits

Having conﬁrmed that the EoR window behaves asexpected, we will now proceed to place constraints onthe spherical power spectrum. In top panel of Figure 7we show the result of binning the z = 10 . k ) ≡ (cid:114) k π P ( k ) (41)(which simply has units of temperature) rather than P ( k )itself.To quantify the errors in our spherical power spec-trum estimate, we also bin the cylindrical power spec-trum measurement covariances and window functions us-ing the formulae of Section II G. The resulting windowfunctions are shown in the bottom panel, and give anestimate of the horizontal error bars. Thinking of thesewindow functions (which, recall, are normalized to inte-grate to unity) as probability distributions, the horizon-tal error bars shown in the top panel are demarcated bythe 20th and 80th percentiles of the distribution. (Thiscorresponds to the full-width-half-maximum in the eventthat the window functions are Gaussians). The verticalerror bars were obtained by taking the square root ofeach diagonal element of the covariance matrix. Sincethe methods of Section II G carefully preserved the di-agonal nature of the bandpower covariance, each datapoint in Figure 7 represents a statistically independentmeasurement. This would not have been the case had wenot employed the decorrelation technique of Section II F.Immediately obvious from the plot is that there is aqualitative diﬀerence between the data points at low k and those at high k . In particular, the points at low k are detections of the sky power spectrum, whereas thepoints at high k are formal upper limits. This is not tosay, of course, that the cosmological EoR signal has beendetected at low k . Rather, recall from Section II C thatin an attempt to avoid having to make large bias sub-tractions, we elected to compute cross-power spectra oftotal sky emission rather than of the cosmological signal,with the expectation (largely conﬁrmed in Section IV A)that the intrinsic cleanliness of the EoR window would besuﬃcient to ensure a relatively foreground-free measure-ment at high k (cid:107) . Now, our survey volume is such thatwe are sensitive almost exclusively to regions in Fourierspace where k (cid:107) (cid:29) k ⊥ . When binning along contours ofconstant k in the cylindrical Fourier space, we have that k ≡ (cid:113) k ⊥ + k (cid:107) ≈ k (cid:107) , and therefore the low k points ofFigure 7 map to low k (cid:107) . The detections seen at low k thusreside outside the EoR window and are almost certainlydetections of the foreground power spectrum.Despite the fact that the low k modes are foregrounddominated, they still constitute a formal upper limit onthe cosmological power spectrum, since the foregroundpower spectrum is necessarily positive. In fact, our cur-rent, most competitive upper limit resides at the lowest k values. However, this is unlikely to continue to bethe case as more data is taken with the MWA, for tworeasons. First, as foreground-limited measurements, thedata points at low k will not average down with fur-ther integration time. In addition, the error statistics inthe region are not particularly encouraging. The win-dow functions (and therefore the horizontal error bars)are seen to broaden towards lower k , reducing the abil-ity of constraints at those k to place limits on theoret-ical models. (This is most easily seen by recalling thatthe window functions integrate to unity by construction,and thus the increase in their peak values towards higher k implies a broadening of the window functions). Thebroadening of the window functions is an expected conse-quence of foreground subtraction [29] and thus will likelycontinue to limit the usefulness of the low k regime unlessfuture measurements can characterize foreground prop-erties with exquisite precision.In contrast, the points at high k do reside in the EoRwindow. The constraints here are limited by thermalnoise, as we saw in Section III D. Bolstering this view isthe fact that the data here are consistent with zero, asone expects for a noise-dominated cross-power spectrum.The limits here are given by the 2 σ errors predictedby the Equation (35). As mentioned in Section III C,these errors are somewhat larger than what might be pre-dicted by a theoretical sensitivity calculation. However,they are consistent with rough estimates of the errorsobtained from a calculation of root-mean-square valuesfrom the images produced in Section III B. This suggeststhat the larger-than-expected errors are due to noisier-than-expected input maps, and not to any approxima-tions made in the power spectrum estimation techniquespresented in this paper. The data on which these re-sults are based are from the very ﬁrst operation of theprototype array, and we expect better performance inlater data. Encouragingly, we note also that as noise-dominated constraints, the measurements at high k willcontinue to improve with integration time.In Figure 8, we show power spectrum limits across theentire frequency range of the MWA, along with some the-oretical predictions generated using the models in [63].At the lowest redshift, no theory curve is plotted be-cause the model predicts that reionization is complete bythen. This yet again underscores the importance of mak-ing measurements over a broad frequency range—withaccess to a wide range of redshifts, future detections ofthe cosmological signal can be distinguished from residualforegrounds by measuring null signals at redshifts where1 −2 −1 −2 ∆ ( k ) ( K ) k (Mpc −1 ) 10 −2 −1 −1 ) W i ndo w F un c t i on ( U n i t l e ss ) ∆ (k) at z=8.01 σ errors and 20−80% window functions2 σ upper limitsTheoretical ∆ (k) from Barkana (2009) FIG. 7. Our method allows for the estimation of the spherically binned power spectrum in temperature units, ∆( k ), whilekeeping full acount of both vertical error bars and window functions (horizontal error bars) and making an optimal choice in thetradeoﬀ between the two. In the top panel, we have plotted our spherical power spectrum estimates of the subband centered on158 MHz ( z = 8 . σ errors on detections (which are often only barely visible), 2 σ upper limits on non-detections,and horizontal error bars that span the middle three quintiles of the window functions (bottom panel). At low k , the wide errorbars are the expected consequence of foreground contamination [29]. Downward arrows represent measurements consistent withnoise at the 2 σ level. Even though the area under the primary beam wedge has been excised from the 2D-to-1D binning, thedetection of foregrounds at low k , is expected due to the contribution of unresolved foregrounds over a wide range of k ⊥ [30].Our ﬁducial theoretical power spectrum is taken from [63]. reionization is complete.Each redshift bin of Figure 8 exhibits trends that arequalitatively similar to those discussed above for the z = 10 . k detec-2 −2 −2 k (Mpc −1 )z = 11.710 −2 −2 k (Mpc −1 )z = 10.910 −2 −2 k (Mpc −1 )z = 10.210 −2 −2 ∆ ( k ) ( K ) k (Mpc −1 )z = 9.5 10 −2 z = 9.010 −2 z = 8.410 −2 z = 8.010 −2 ∆ ( k ) ( K ) z = 7.6 10 −2 z = 7.210 −2 z = 6.810 −2 z = 6.510 −2 ∆ ( k ) ( K ) z = 6.2 FIG. 8. Taking advantage of our fast yet thorough power spectrum estimation technique, we estimate ∆( k ) for a wide range of k and z , including both vertical and horizontal errors. (For points that represent positive detections of foregrounds or systematiccorrelations, the vertical error bars are often barely visible). Using the visual language of Figure 7, we show here our sphericalpower spectrum limits as a function of both k and z . Each panel is a diﬀerent subband. The many detections can be attibutedto foregrounds (especially at low k ), instrumental eﬀects like those we saw in Figure 2 (especially at medium values of k ), orboth. Our absolute lowest limit on the 21 cm brightness temperature power spectrum, ∆( k ) < . k = 0 .

046 cMpc − and z = 9 . k ) < z = 9 . k = 0 .

134 cMpc − if one discards the lowest k binto immunize oneself against foreground modeling uncertainties). tions and large diﬀerences between neighboring k bins.With as new an instrument as the MWA was at timeof this observation, this issues are understandable. Theexact physical origin of those systematics is beyond thescope of this paper, however they should serve as a re-minder to stay vigilant for them in future datasets from a more battle-tested instrument. However, because wesee no evidence of strong anti-correlations between datacubes, we expect that the extra power introduced by sys-tematics into the EoR window only the eﬀect of worsen-ing the limits we can set.Over all bands, our best limit is ∆( k ) < . z = 9 . k = 0 .

046 cMpc − . However, as re-marked in Section III C, the lowest k bins can be rathersensitive to the covariance model, and if one excludesthose bins, our best limit is ∆( k ) < z = 9 . k = 0 .

134 cMpc − . While our limits may not be quite aslow as other existing limits in the literature [24, 64], theyare the only limits on the EoR power spectrum that spana broad redshift range from z = 6 . z = 11 .

7. More-over, these statistically rigorous limits will likely improvewith newer and more sensitive data from the MWA.

V. CONCLUSIONS

In this paper, we have accomplished three goals. First,we adapted 21 cm power spectrum estimation techniquesfrom Liu and Tegmark [29] and Dillon et al. [30] withreal-world obstacles in mind, so that they could be ap-plied to real data. With early MWA data, our gener-alized formalism was then used to demonstrate the im-portance of employing a statistically rigorous frameworkfor power spectrum estimation, lest one corrupt the nat-urally foreground-free region of Fourier space known asthe EoR window. Finally, we used the MWA data to setlimits on the EoR power spectrum.In confronting real-world obstacles, our desire is topreserve the as much of the statistical rigor in previ-ous matrix-based power spectrum estimation frameworksas possible. To avoid having to perform direct sub-tractions of instrumental noise biases, we advocate com-puting cross-power spectra between statistically identicalsubsets of the data (in the case of the MWA worked ex-ample of this paper, these subsets were formed from oddand even time samples of the data). This has the ef-fect of eliminating noise bias in the power spectrum, al-though instrumental noise continues to contribute to theerror bars. To avoid direct subtractions of foregroundbiases, we simply look preferentially in the EoR win-dow, where foregrounds are expected to be low. Miss-ing data, whether from incomplete uv coverage or RFIﬂagging, can be dealt with using the pseudoinverse for-malism. Doing this allows the eﬀects of missing data tobe self-consistently propagated into error statistics suchas the power spectrum covariance and the window func-tions. In an eﬀort to preserve the cleanliness of the EoRwindow, one should form decorrelated bandpower esti-mates, which have uncorrelated errors and reasonablynarrow window functions. Care must then be taken topreserve these nice properties via an optimal binning ofcylindrical bandpowers into spherical bandpowers.Using early MWA data to demonstrate these tech-niques, we have conﬁrmed theoretical predictions for theexistence of the EoR window and have extended previousobservations done by other groups to a much wider fre-quency range. This allowed us to check predicted trendsof the EoR window as a function of frequency, all of whichare consistent with theory. Crucially, we found that with-out using the decorrelation technology of Section II F, measurements in the EoR window are not in fact instru-mental noise dominated, and contain a systematic biasthat is indicative of foreground leakage from outside theEoR window.The early MWA data has also allowed us to place lim-its on the cosmological EoR power spectrum. Our bestlimit is ∆( k ) < . z = 9 . k = 0 .

046 cMpc (or∆( k ) < z = 9 . k = 0 .

134 cMpc − if one dis-cards the lowest k bin to immunize oneself against fore-ground modeling uncertainties). This may not be com-petitive with other published observations, but general-izes them in an important way: instead of focusing on oneparticular frequency, our limits span a wide range of red-shifts relevant to the EoR, going from z = 6 . z = 11 . ACKNOWLEDGMENTS

This work makes use of the Murchison Radio-astronomy Observatory. We acknowledge the WajarriYamatji people as the traditional owners of the Observa-tory site. Support for the MWA comes from the U.S. Na-tional Science Foundation (grants AST-0457585, PHY-0835713, CAREER-0847753, and AST-0908884), theAustralian Research Council (LIEF grants LE0775621and LE0882938), the U.S. Air Force Oﬃce of Scientic Re-search (grant FA9550-0510247), and the Centre for All-sky Astrophysics (an Australian Research Council Centreof Excellence funded by grant CE110001020). Support isalso provided by the Smithsonian Astrophysical Obser-vatory, the MIT School of Science, the Raman ResearchInstitute, the Australian National University, and theVictoria University of Wellington (via grant MED-E1799from the New Zealand Ministry of Economic Develop-ment and an IBM Shared University Research Grant).The Australian Federal government provides additionalsupport via the National Collaborative Research Infras-tructure Strategy, Education Investment Fund, and theAustralia India Strategic Research Fund, and AstronomyAustralia Limited, under contract to Curtin University.We acknowledge the iVEC Petabyte Data Store, the Ini-tiative in Innovative Computing and the CUDA Centerfor Excellence sponsored by NVIDIA at Harvard Univer-sity, and the International Centre for Radio AstronomyResearch (ICRAR), a Joint Venture of Curtin Universityand The University of Western Australia, funded by theWestern Australian State government. Additionally, theauthors wish to thank Aaron Ewall-Wice, Lu Feng, Abra-4ham Neben, Aaron Parsons, Jonathan Pober, RonaldRemillard, and Richard Shaw for valuable discussions,and Rennan Barkana for both useful discussions and forproviding the theory results shown in Figures 7 and 8.This work is partially supported by NSF grants AST-0821321 and AST-1105835. A.L. acknowledges supportfrom the Berkeley Center for Cosmological Physics.

Appendix A: On the Importance of Modeling theFull Error Covariance

In Section II G, we argued that an inverse covarianceweighted binning scheme for estimating spherical bandpowers produced optimal spherical power spectrum esti-mate. In the case where M is chosen either for the small-est possible error bars or the narrowest possible windowfunctions, the estimator covariance Σ cyl is non-diagonal.Assuming that the matrix is actually diagonal, at one ormore steps in the binning and error propagation, can leaderror bars that are overly conservative or—worse yet—error bars that are insuﬃciently conservative and mightfalsely lead to a claimed detection. In Figure 9, we showthe eﬀects of making a suboptimal choice for binning.If one fully models the covariance matrix Σ cyl , includ-ing oﬀ-diagonal elements, but chooses to generate (cid:98) p sph asan inverse variance (and not inverse covariance) weighted average of cylindrical band powers, neglecting oﬀ diago-nal terms in the weighting, one’s estimators will be noisieras a result (see the solid lines in Figure 9). These are thecorrect errors for the suboptimal choice of estimators.Even worse, if one assumes that Σ cyl is diagonal whenit is not, one is led either to overestimate the error bars, inthe case of M ∼ F − , or underestimate them, as wouldbe the case when M ∼ I . This is because the formercase general exhibits anti-correlated errors while the lat-ter suﬀers from correlated errors. The last scenario isthe most troubling: by aggressively choosing the estima-tor with the smallest vertical error bars ( M ∼ I ) andthen neglecting the correlations between errors, one willunderestimate the error bars and might be lead to falselyclaiming a detection. In this case, the estimator is sub-optimal and the errors are incorrect. Additionally, as weshow in Figure 10, if one were to calculate the the windowfunctions under the assumption that Σ cyl is diagonal, onewould ﬁnd window functions several times boarder thanthey would otherwise be.Thankfully, choosing the cylindrical power spectrumestimator with decorrelated errors avoids the subtle dif-ference between inverse variance and inverse covarianceweighted binning. The M ∼ F − / decorrelated estima-tor preserves the EoR window and allows for easy, op-timal binning of uncontaminated regions into sphericalband power spectrum estimates. [1] S. R. Furlanetto, S. P. Oh, and F. H. Briggs, Phys. Rep., , 181 (2006), arXiv:astro-ph/0608032.[2] M. F. Morales and J. S. B. Wyithe, ARA&A, , 127(2010), arXiv:0910.3010 [astro-ph.CO].[3] J. R. Pritchard and A. Loeb, Reports on Progressin Physics, , 086901 (2012), arXiv:1109.6012 [astro-ph.CO].[4] A. Loeb and S. R. Furlanetto, The First Galaxies InThe Universe (Princeton University Press, Princeton,NJ, 2013).[5] M. F. Morales, ApJ, , 678 (2005), arXiv:astro-ph/0406662.[6] J. D. Bowman, M. F. Morales, and J. N. Hewitt, ApJ, , 20 (2006), arXiv:astro-ph/0507357.[7] A. Lidz, O. Zahn, M. McQuinn, M. Zaldarriaga, andL. Hernquist, ApJ, , 962 (2008), arXiv:0711.4373.[8] G. Harker, S. Zaroubi, G. Bernardi, M. A. Brent-jens, A. G. de Bruyn, B. Ciardi, V. Jeli´c, L. V. E.Koopmans, P. Labropoulos, G. Mellema, A. Oﬀringa,V. N. Pandey, A. H. Pawlik, J. Schaye, R. M.Thomas, and S. Yatawatta, MNRAS, , 2492 (2010),arXiv:1003.0965 [astro-ph.CO].[9] A. Parsons, J. Pober, M. McQuinn, D. Jacobs, andJ. Aguirre, ApJ, , 81 (2012), arXiv:1103.2135 [astro-ph.IM].[10] A. de Oliveira-Costa, M. Tegmark, B. M. Gaensler,J. Jonas, T. L. Landecker, and P. Reich, MNRAS, ,247 (2008), arXiv:0802.1525.[11] V. Jeli´c, S. Zaroubi, P. Labropoulos, R. M. Thomas,G. Bernardi, M. A. Brentjens, A. G. de Bruyn, B. Cia- rdi, G. Harker, L. V. E. Koopmans, V. N. Pandey,J. Schaye, and S. Yatawatta, MNRAS, , 1319 (2008),arXiv:0804.1130.[12] G. Bernardi, A. G. de Bruyn, M. A. Brentjens, B. Ciardi,G. Harker, V. Jeli´c, L. V. E. Koopmans, P. Labropoulos,A. Oﬀringa, V. N. Pandey, J. Schaye, R. M. Thomas,S. Yatawatta, and S. Zaroubi, A&A, , 965 (2009),arXiv:0904.0404 [astro-ph.CO].[13] J. C. Pober, A. R. Parsons, J. E. Aguirre, Z. Ali, R. F.Bradley, C. L. Carilli, D. DeBoer, M. Dexter, N. E.Gugliucci, D. C. Jacobs, D. MacMahon, J. Manley, D. F.Moore, I. I. Stefan, and W. P. Walbrugh, ArXiv e-prints(2013), arXiv:1301.7099 [astro-ph.CO].[14] X. Wang, M. Tegmark, M. G. Santos, and L. Knox, ApJ, , 529 (2006), arXiv:astro-ph/0501081.[15] J. D. Bowman, M. F. Morales, and J. N. Hewitt, ApJ, , 183 (2009), arXiv:0807.3956.[16] A. Liu, M. Tegmark, J. Bowman, J. Hewitt, and M. Zal-darriaga, MNRAS, , 401 (2009), arXiv:0903.4890.[17] A. Liu, M. Tegmark, and M. Zaldarriaga, MNRAS, ,1575 (2009), arXiv:0807.3952.[18] G. Harker, S. Zaroubi, G. Bernardi, M. A. Brentjens,A. G. de Bruyn, B. Ciardi, V. Jeli´c, L. V. E. Koopmans,P. Labropoulos, G. Mellema, A. Oﬀringa, V. N. Pandey,J. Schaye, R. M. Thomas, and S. Yatawatta, MNRAS, , 1138 (2009), arXiv:0903.2760 [astro-ph.CO].[19] E. Chapman, F. B. Abdalla, G. Harker, V. Jeli´c,P. Labropoulos, S. Zaroubi, M. A. Brentjens, A. G. deBruyn, and L. V. E. Koopmans, MNRAS, , 2518(2012), arXiv:1201.2190 [astro-ph.CO]. −2 −1 −20%0 %20 %40 %60 %80 %100% k (Mpc −1 ) R e l a t i v e E rr o r B a r I n c r ea s e Incorrect ErrorEstimationSuboptimalBinningIncorrect ErrorEstimation M ∝ I (Correlated Errors) M ∝ F −1/2 (Decorrelated Errors) M ∝ F −1 ( δ −Function Windows) FIG. 9. Neglecting the fact that the covariance of the power spectrum estimator is, in general, non-diagonal, can lead totwo mistakes that can either unnecessarily enlarge our error bars or, even worse, unjustiﬁably shrink them. In this ﬁgure, weﬁrst show an approximately 10% increase in the vertical error bars on the power spectrum (solid lines) from a suboptimalinverse variance weighted binning scheme, rather than the inverse covariance weighted binning of Equation (34). This problemis obviated by choosing an estimator with decorrelated errors and thus a diagonal covariance matrix. If one simply assumesthat the estimator covariance in Equation (35) is diagonal when it is not (dotted lines), one is led, depending on the choice ofestimator, either to roughly 50% larger error bars than necessary or, worse yet, artiﬁcially small error bars. The last mistake,choosing an estimator with small error bars—despite its wide window functions—and then neglecting the oﬀ-diagonal terms inthe estimator covariance, is potentially the most pernicious since it could lead to a claimed detection in the absence of signal.[20] E. Chapman, F. B. Abdalla, J. Bobin, J.-L. Starck,G. Harker, V. Jeli´c, P. Labropoulos, S. Zaroubi, M. A.Brentjens, A. G. de Bruyn, and L. V. E. Koopmans, MN-RAS, , 165 (2013), arXiv:1209.4769 [astro-ph.CO].[21] G. Paciga, T.-C. Chang, Y. Gupta, R. Nityanada, J. Ode-gova, U.-L. Pen, J. B. Peterson, J. Roy, and K. Sigurd-son, MNRAS, , 1174 (2011), arXiv:1006.1351 [astro-ph.CO].[22] A. Liu and M. Tegmark, MNRAS, , 3491 (2012),arXiv:1106.0007 [astro-ph.CO].[23] K. W. Masui, E. R. Switzer, N. Banavar, K. Bandura,C. Blake, L.-M. Calin, T.-C. Chang, X. Chen, Y.-C. Li,Y.-W. Liao, A. Natarajan, U.-L. Pen, J. B. Peterson,J. R. Shaw, and T. C. Voytek, ApJ, , L20 (2013),arXiv:1208.0331 [astro-ph.CO].[24] G. Paciga, J. Albert, K. Bandura, T.-C. Chang,Y. Gupta, C. Hirata, J. Odegova, U.-L. Pen, J. B. Pe-terson, J. Roy, R. Shaw, K. Sigurdson, and T. Voytek,ArXiv e-prints (2013), arXiv:1301.5906 [astro-ph.CO]. [25] L. Gleser, A. Nusser, and A. J. Benson, MNRAS, ,383 (2008), arXiv:0712.0497.[26] N. Petrovic and S. P. Oh, MNRAS, , 2103 (2011),arXiv:1010.4109 [astro-ph.CO].[27] A. R. Parsons, J. C. Pober, J. E. Aguirre, C. L. Carilli,D. C. Jacobs, and D. F. Moore, ApJ, , 165 (2012),arXiv:1204.4749 [astro-ph.IM].[28] J. Cho, A. Lazarian, and P. T. Timbie, ApJ, , 164(2012), arXiv:1203.5197 [astro-ph.CO].[29] A. Liu and M. Tegmark, Phys. Rev. D, , 103006 (2011),arXiv:1103.0281 [astro-ph.CO].[30] J. S. Dillon, A. Liu, and M. Tegmark, Phys. Rev. D, ,043005 (2013), arXiv:1211.2232 [astro-ph.CO].[31] J. R. Shaw, K. Sigurdson, U.-L. Pen, A. Stebbins, andM. Sitwell, ArXiv e-prints (2013), arXiv:1302.0327 [astro-ph.CO].[32] A. Datta, J. D. Bowman, and C. L. Carilli, ApJ, ,526 (2010), arXiv:1005.4071 [astro-ph.CO].[33] H. Vedantham, N. Udaya Shankar, and R. Subrah- −2 −1

0 %100%200%300%400%500%600% k (Mpc −1 ) R e l a t i v e W i ndo w F un c t i on W i d t h M ∝ I (Minimum Variance)M ∝ F −1/2 (Decorrelated Errors)M ∝ F −1 ( δ −Function Windows) FIG. 10. Just as with the error bars in Figure 9, generatingsuboptimally binned spherical power spectrum estimates byneglecting oﬀ-diagonal terms in the estimator covariance canlead to wider window functions than necessary. We illustratethe eﬀect by comparing the width of the window functions be-tween the 20th and 80th percentiles between the two binningschemes. This is important for the choice of power spectrumestimator with the smallest error bars and widest windowfunctions ( M ∼ I ). In the case where our power spectrumestimator has uncorrelated errors, there are no oﬀ-diagonalterms in the estimator covariance and both binning schemesare identical. In the case of the estimator with δ -function win-dow functions, suboptimal binning does not aﬀect the windowfunctions—though it still aﬀects the vertical errors (see Figure9). manyan, ApJ, , 176 (2012), arXiv:1106.1297 [astro-ph.IM].[34] M. F. Morales, B. Hazelton, I. Sullivan, and A. Beards-ley, ApJ, , 137 (2012), arXiv:1202.3830 [astro-ph.IM].[35] C. M. Trott, R. B. Wayth, and S. J. Tingay, ApJ, ,101 (2012), arXiv:1208.0646 [astro-ph.CO].[36] Thyagarajan et al., submitted (2013).[37] B. J. Hazelton, M. F. Morales, and I. S. Sullivan, ArXive-prints (2013), arXiv:1301.3126 [astro-ph.IM].[38] M. Tegmark, Phys. Rev. D, , 5895 (1997), arXiv:astro-ph/9611174.[39] J. R. Bond, A. H. Jaﬀe, and L. Knox, Phys. Rev. D, ,2117 (1998), arXiv:astro-ph/9708203.[40] M. Tegmark, A. J. S. Hamilton, M. A. Strauss, M. S.Vogeley, and A. S. Szalay, ApJ (1998), arXiv:astro-ph/9708020.[41] M. Tegmark, M. R. Blanton, M. A. Strauss, and et al.,ApJ, , 702 (2004), arXiv:astro-ph/0310725.[42] Y. Mao, M. Tegmark, M. McQuinn, M. Zaldarriaga,and O. Zahn, Phys. Rev. D, , 023529 (2008),arXiv:0802.1710. [43] M. F. Morales and J. Hewitt, ApJ, , 7 (2004),arXiv:astro-ph/0312437.[44] M. Tegmark, A. J. S. Hamilton, and Y. Xu, MNRAS, , 887 (2002), arXiv:astro-ph/0111575.[45] M. Tegmark, ApJ, , L87 (1997), arXiv:astro-ph/9611130.[46] C. L. Williams, J. N. Hewitt, A. M. Levine, A. deOliveira-Costa, J. D. Bowman, F. H. Briggs, B. M.Gaensler, L. L. Hernquist, D. A. Mitchell, M. F. Morales,S. K. Sethi, R. Subrahmanyan, E. M. Sadler, W. Arcus,D. G. Barnes, G. Bernardi, J. D. Bunton, R. C. Cappallo,B. W. Crosse, B. E. Corey, A. Deshpande, L. deSouza,D. Emrich, R. F. Goeke, L. J. Greenhill, B. J. Hazel-ton, D. Herne, D. L. Kaplan, J. C. Kasper, B. B. Kin-caid, R. Koenig, E. Kratzenberg, C. J. Lonsdale, M. J.Lynch, S. R. McWhirter, E. H. Morgan, D. Oberoi,S. M. Ord, J. Pathikulangara, T. Prabu, R. A. Remil-lard, A. E. E. Rogers, D. Anish Roshi, J. E. Salah, R. J.Sault, N. Udaya Shankar, K. S. Srivani, J. B. Stevens,S. J. Tingay, R. B. Wayth, M. Waterson, R. L. Webster,A. R. Whitney, A. J. Williams, and J. S. B. Wyithe,ApJ, , 47 (2012), arXiv:1203.5790 [astro-ph.CO].[47] S. J. Tingay, R. Goeke, J. D. Bowman, D. Emrich,S. M. Ord, D. A. Mitchell, M. F. Morales, T. Booler,B. Crosse, R. B. Wayth, C. J. Lonsdale, S. Tremblay,D. Pallot, T. Colegate, A. Wicenec, N. Kudryavtseva,W. Arcus, D. Barnes, G. Bernardi, F. Briggs, S. Burns,J. D. Bunton, R. J. Cappallo, B. E. Corey, A. Deshpande,L. Desouza, B. M. Gaensler, L. J. Greenhill, P. J. Hall,B. J. Hazelton, D. Herne, J. N. Hewitt, M. Johnston-Hollitt, D. L. Kaplan, J. C. Kasper, B. B. Kincaid,R. Koenig, E. Kratzenberg, M. J. Lynch, B. Mckinley,S. R. Mcwhirter, E. Morgan, D. Oberoi, J. Pathiku-langara, T. Prabu, R. A. Remillard, A. E. E. Rogers,A. Roshi, J. E. Salah, R. J. Sault, N. Udaya-Shankar,F. Schlagenhaufer, K. S. Srivani, J. Stevens, R. Subrah-manyan, M. Waterson, R. L. Webster, A. R. Whitney,A. Williams, C. L. Williams, and J. S. B. Wyithe, PASA, , e007 (2013), arXiv:1206.6945 [astro-ph.IM].[48] J. D. Bowman, I. Cairns, D. L. Kaplan, T. Murphy,D. Oberoi, L. Staveley-Smith, W. Arcus, D. G. Barnes,G. Bernardi, F. H. Briggs, S. Brown, J. D. Bunton, A. J.Burgasser, R. J. Cappallo, S. Chatterjee, B. E. Corey,A. Coster, A. Deshpande, L. deSouza, D. Emrich, P. Er-ickson, R. F. Goeke, B. M. Gaensler, L. J. Greenhill,L. Harvey-Smith, B. J. Hazelton, D. Herne, J. N. He-witt, M. Johnston-Hollitt, J. C. Kasper, B. B. Kincaid,R. Koenig, E. Kratzenberg, C. J. Lonsdale, M. J. Lynch,L. D. Matthews, S. R. McWhirter, D. A. Mitchell, M. F.Morales, E. H. Morgan, S. M. Ord, J. Pathikulangara,T. Prabu, R. A. Remillard, T. Robishaw, A. E. E. Rogers,A. A. Roshi, J. E. Salah, R. J. Sault, N. U. Shankar, K. S.Srivani, J. B. Stevens, R. Subrahmanyan, S. J. Tingay,R. B. Wayth, M. Waterson, R. L. Webster, A. R. Whit-ney, A. J. Williams, C. L. Williams, and J. S. B. Wyithe,PASA, , e031 (2013), arXiv:1212.5151 [astro-ph.IM].[49] C. G. T. Haslam, C. J. Salter, H. Stoﬀel, and W. E.Wilson, A&AS, , 1 (1982).[50] J. E. Noordam, in Society of Photo-Optical Instrumen-tation Engineers (SPIE) Conference Series , Society ofPhoto-Optical Instrumentation Engineers (SPIE) Con-ference Series, Vol. 5489, edited by J. M. Oschmann Jr.(2004) pp. 817–825.[51] S. van der Tol, B. D. Jeﬀs, and A.-J. . van der Veen, IEEE Transactions on Signal Processing, , 4497 (2007).[52] D. A. Mitchell, L. J. Greenhill, R. B. Wayth, R. J. Sault,C. J. Lonsdale, R. J. Cappallo, M. F. Morales, and S. M.Ord, IEEE Journal of Selected Topics in Signal Process-ing, , 707 (2008).[53] H. T. Intema, S. van der Tol, W. D. Cotton, A. S. Cohen,I. M. van Bemmel, and H. J. A. R¨ottgering, A&A, ,1185 (2009), arXiv:0904.3975 [astro-ph.IM].[54] T. J. Cornwell, K. Golap, and S. Bhatnagar, IEEE Jour-nal of Selected Topics in Signal Processing, , 647 (2008).[55] M. R. Griﬃth, A. E. Wright, B. F. Burke, and R. D.Ekers, ApJS, , 347 (1995).[56] M. I. Large, B. Y. Mills, A. G. Little, D. F. Crawford,and J. M. Sutton, MNRAS, , 693 (1981).[57] J. N. Douglas, F. N. Bash, F. A. Bozyan, G. W. Torrence,and C. Wolfe, AJ, , 1945 (1996).[58] O. B. Slee, Australian Journal of Physics, , 143 (1995).[59] A. S. Cohen, W. M. Lane, W. D. Cotton, N. E. Kassim, T. J. W. Lazio, R. A. Perley, J. J. Condon, and W. C.Erickson, AJ, , 1245 (2007), arXiv:0706.1191.[60] E. Komatsu, K. M. Smith, J. Dunkley, C. L. Bennett,B. Gold, G. Hinshaw, N. Jarosik, D. Larson, M. R. Nolta,L. Page, D. N. Spergel, M. Halpern, R. S. Hill, A. Kogut,M. Limon, S. S. Meyer, N. Odegard, G. S. Tucker, J. L.Weiland, E. Wollack, and E. L. Wright, ApJS, , 18(2011), arXiv:1001.4538 [astro-ph.CO].[61] T. Di Matteo, R. Perna, T. Abel, and M. J. Rees, ApJ, , 576 (2002), arXiv:astro-ph/0109241.[62] J. M. Bittner and A. Loeb, J. Cosmology Astropart.Phys., , 038 (2011), arXiv:1006.5460 [astro-ph.CO].[63] R. Barkana, MNRAS,397