MNRAS, 1–15 (2020)    Preprint 18 February 2021    Compiled using MNRAS LaTeX style file v3.0

Recovering the Wedge Modes Lost to 21-cm Foregrounds

Samuel Gagnon-Hartman, Yue Cui, Adrian Liu, Siamak Ravanbakhsh

Department of Physics and McGill Space Institute, McGill University, Montreal, QC, Canada H3A 2T8
Department of Physics and Astronomy, Bishop’s University, 2600 College Street, Sherbrooke J1M 1Z7, Canada
University of Electronic Science and Technology of China, 2006 Xiyuan Ave., West High-tech Zone, Chengdu, Sichuan, China
School of Computer Science, McGill University, 845 Sherbrooke Street, Montreal H3A 0G4, Canada
MILA - Quebec AI Institute, 6666 St Urbain St, Montreal, Quebec H2S 3H1, Canada
Accepted XXX. Received YYY; in original form ZZZ
ABSTRACT
One of the critical challenges facing imaging studies of the 21-cm signal at the Epoch of Reionization (EoR) is the separation of astrophysical foreground contamination. These foregrounds are known to lie in a wedge-shaped region of (𝑘⊥, 𝑘∥) Fourier space. Removing these Fourier modes excises the foregrounds at grave expense to image fidelity, since the cosmological information at these modes is also removed by the wedge filter. However, the 21-cm EoR signal is non-Gaussian, meaning that the lost wedge modes are correlated with the surviving modes by some covariance matrix. We have developed a machine learning-based method which exploits this information to identify ionized regions within a wedge-filtered image. Our method reliably identifies the largest ionized regions and can reconstruct their shape, size, and location within an image. We further demonstrate that our method remains viable when instrumental effects are accounted for, using the Hydrogen Epoch of Reionization Array and the Square Kilometre Array as fiducial instruments. The ability to recover spatial information from wedge-filtered images unlocks the potential for imaging studies using current- and next-generation instruments without relying on detailed models of the astrophysical foregrounds themselves.
Key words: cosmology – machine learning – deep neural network
The highly redshifted 21-cm line is becoming recognized as a promising probe of the high-redshift universe, with the potential to use neutral hydrogen as a tracer to map out volumes extending from redshift 𝑧 ∼ […]. […] Smooth-spectrum foregrounds occupy an anisotropic wedge-shaped region of Fourier space, leaving only a small window of Fourier space where the 21-cm signal may be cleanly observed (e.g. Parsons et al. 2012; Datta et al. 2010; Vedantham et al. 2012; Morales et al. 2012; Trott et al. 2012; Thyagarajan et al. 2013; Hazelton et al. 2013; Liu et al. 2014a,b). This is illustrated in Figure 1, where 𝑘⊥ refers to spatial wavenumbers perpendicular to the line of sight of one’s observations and 𝑘∥ to spatial wavenumbers parallel to the line of sight. Up to some proportionality factors, the former is the Fourier dual to angles on the sky, while the latter is the Fourier dual to frequency, since the observed redshift of the 21-cm emission can be mapped to radial distance.

Figure 1. A schematic depicting the footprint of foreground contamination in cylindrical 𝑘-space. Intrinsic foregrounds uniformly overwhelm low 𝑘∥ modes, while mode-mixing contaminates higher 𝑘∥ modes in a region called the wedge. The region left untouched by foregrounds is referred to as the Epoch of Reionization (EoR) window. 𝜓 designates the “wedge angle”, which characterizes the wedge’s footprint and is the arctangent of the proportionality factor in Equation 1.

A clean measurement of the cosmological 21-cm signal can therefore in principle be made by probing only the regions of Fourier space which do not lie within the wedge. Pober et al. (2014) demonstrate that for statistical quantities like the power spectrum of spatial fluctuations, current instruments such as the Hydrogen Epoch of Reionization Array (HERA; DeBoer et al. 2017) can in principle make high signal-to-noise measurements with this type of “foreground avoidance” strategy. However, this approach has some drawbacks. First, the cosmological signal strength peaks on large scales (or equivalently, on modes with small spatial wavenumber 𝑘), meaning that 𝑘 modes lying within the wedge may have a significantly higher signal-to-noise ratio than those lying in uncontaminated regions. Foreground avoidance therefore results in a potentially significant reduction in the overall signal-to-noise of one’s measurement. Second, while statistical measurements like the power spectrum can leverage the statistical isotropy of our Universe in a foreground avoidance scheme, this is not an avenue that is available to imaging experiments, which need to retain full realization-specific information on individual Fourier modes. Said differently, eliminating Fourier modes within the wedge is equivalent to filtering the data in a rather strange way, where the data is put through a high-pass filter in the spectral direction, with finer angular scales (high 𝑘⊥) being subject to a more aggressive filter. The resulting images therefore become extremely difficult to interpret. This can be seen in Figure 2, where we show the effect of a foreground wedge filter on noiseless example images of 21-cm emission during the EoR. Two effects are immediately apparent. The first is that the map is no longer statistically isotropic. The second is that the locations of ionized bubbles (those with zero 21-cm brightness temperature) around first-generation galaxies are distorted beyond recognition. It is not simply the case that the wedge-filtered maps are slightly blurred versions of the original maps; the morphologies are completely different (Beardsley et al. 2015). To do EoR science using 21-cm images, one should therefore go beyond foreground avoidance and actually perform foreground subtraction.
Numerous techniques have been proposed for this (see Liu & Shaw 2020 for a summary). Many of these techniques involve the explicit modelling of foreground emission or parameterized fits (whether based on preset templates or empirical ones). Thus far, neither technique has demonstrated that the foreground emission in an actual observation can be removed to the thermal noise level of the instruments. (An alternative approach is to forward model the distortions of wedge-filtered images and to make probabilistic statements regarding the true images, as was explored in Beardsley et al. 2015.) Recently, machine learning-based foreground removal ideas have been explored in the literature. For example, Li et al. (2019) trained a convolutional denoising autoencoder to model and remove foreground emission as seen through an instrumental beam pattern, outputting the underlying EoR signal. Makinen et al. (2020) consider hypothetical single-dish 21-cm observations of the post-reionization neutral hydrogen signal and use a U-Net to improve foreground cleaning following a more traditional principal-component-based foreground removal step.

In this paper, we build on both Li et al. (2019) and Makinen et al. (2020) to propose a U-Net-based deep learning algorithm to recover Fourier modes that are ignored or nulled out by a foreground avoidance scheme. In our study, we are asking more of our network than Li et al. (2019) did in theirs, because we are attempting to recover Fourier modes after a more aggressive cut; in Li et al. (2019), only the first few 𝑘∥ modes (corresponding to the “Intrinsic Foregrounds” portion of Figure 1) were excised from the data, whereas we remove the entire foreground wedge and have our network reconstruct the cosmological signal there from the non-excised modes. The initial principal component pre-processing in Makinen et al. (2020) will in principle touch a broad range of Fourier modes. This occurs because systematics tend to proliferate across many Fourier modes (Switzer & Liu 2014). The flip side of this, however, is that the removal of a set of principal components will in general not entirely zero out any Fourier modes. Our work builds on this by considering the recovery of cosmological Fourier modes after a more drastic excision: the relevant Fourier modes are zeroed out completely, and because we are dealing with the instrumentally more complicated case of an interferometer (rather than a single dish), we conservatively excise all Fourier modes within the wedge.

After removing the Fourier modes in the foreground wedge, it is unclear a priori whether there remains enough information to recover the cosmological portion of the excised modes from the rest of the dataset. If the EoR signal were Gaussian-distributed and obeyed stationary statistics, we would immediately know that this is impossible, since the Fourier modes would then be uncorrelated. However, during the EoR there are significant non-Gaussian correlations between Fourier modes (Shimabukuro et al. 2016; Majumdar et al. 2018; Watkinson et al. 2018; Hutter et al. 2019; Gorce & Pritchard 2019). This in principle allows a reconstruction of modes that are lost in the foreground suppression (or subtraction) process, and indeed, this is the idea of proposed tidal reconstruction schemes for post-reionization 21-cm experiments (Zhu et al. 2018; Li et al. 2018; Goksel Karacayli & Padmanabhan 2019). Unfortunately, our relative ignorance of the relevant astrophysics of the EoR makes such a reconstruction difficult to formulate using traditional cosmological techniques. It is for this reason that we turn to a machine-learning-based approach.

In what follows, we will demonstrate with our U-Net that there is in fact enough information to recover reasonable images of the EoR after completely removing Fourier modes in the foreground wedge. We will focus on an imaging application of 21-cm maps: the identification of ionized bubbles during the EoR. We will demonstrate that a machine learning approach enables a reliable identification of the largest ionized bubbles, even with current-generation experiments. The rest of the paper is structured as follows. Section 2 reviews the phenomenology of the wedge and establishes notation. Section 3 describes the data preparation procedure and the architecture of the Convolutional Neural Network (CNN) that we use. Section 4 includes a description of the five trainings run using the network and their results.

Figure 2.
A 21cmFAST 21-cm brightness temperature anisotropy map before and after replacing all Fourier modes outside the EoR window with null vectors. In the left pair of images, the box axes are transverse to the line of sight direction, while in the right pair of images the vertical axis is the line of sight direction. Localized structures are lost in both the transverse and line of sight dimensions.
In this section, we briefly review the foreground wedge. For a more in-depth summary and derivations, see Liu & Shaw (2020) and references therein. For a review of some alternative foreground removal techniques, see Hothi et al. (2020); Cunnington et al. (2020).

At the relevant frequencies, astrophysical foregrounds such as Galactic synchrotron emission overwhelm the EoR signal by 4 to 5 orders of magnitude, making spatial mapping of the EoR signal impossible without some means of foreground avoidance, removal, or subtraction. Since the foreground elements which contaminate the high-redshift 21-cm signal are expected to be spectrally smooth, only the lowest 𝑘∥ modes (i.e., modes along the line of sight or frequency direction) should be intrinsically affected. However, the frequency dependence inherent to an interferometer’s response causes what is referred to as “mode-mixing”, whereby contamination leaks into higher 𝑘∥ modes. This effect is most pronounced for the longer baselines of an interferometer (which probe high 𝑘⊥ angular Fourier modes), since these baselines have finer fringe patterns that dilate or contract more quickly with changing frequency. The proliferation of foregrounds to a broader range of Fourier modes reduces the available Fourier space over which cosmological measurements can be performed.

Fortunately, the physics of mode-mixing predicts that this proliferation is limited to a well-defined wedge-shaped region of 𝑘⊥–𝑘∥ space (Datta et al. 2010; Pober et al. 2014; Dillon et al. 2014; Liu et al. 2014a; Pober 2014), illustrated schematically in Figure 1. Mathematically, the boundary of the wedge is given by

𝑘∥ = 𝑘⊥ [ sin 𝜃 FoV 𝐷 M (𝑧) 𝐸(𝑧) / ( 𝐷 H (1 + 𝑧) ) ] ≡ 𝑘⊥ tan 𝜓,   (1)

where 𝜃 FoV is the angular radius of the field of view, 𝐷 H ≡ 𝑐/𝐻₀, 𝐻₀ is the Hubble constant, 𝐸(𝑧) ≡ √(Ω m (1 + 𝑧)³ + Ω Λ), Ω m is the normalized matter density, Ω Λ is the normalized dark energy density, and 𝐷 M (𝑧) is the transverse comoving distance (Hogg 1999). We have additionally defined 𝜓 to be the angle that the wedge makes with the 𝑘⊥ axis. There is some uncertainty as to precisely what this angle ought to be, because there is a lack of consensus as to what value of 𝜃 FoV should be inserted into the expression. A pessimistic assumption might be to set 𝜃 FoV to 90°. This corresponds to the horizon, which may be a realistic choice since antenna beam patterns do not generally have sharp cutoffs, and even low-level sidelobes can pick up bright foreground emission very far away from zenith. More optimistic forecasts in the literature have assumed 𝜃 FoV < 90°, reflecting the community’s aspiration that some combination of beam control and foreground subtraction may be able to reduce the bleed of foregrounds in Fourier space.

Figure 3. The internal angle of the wedge as a function of redshift for various 𝜃 FoV. This work assumes a wedge angle of 75°, which is consistent with the most pessimistic possible wedge at 𝑧 = […].5, the highest redshift considered in this study.

In Figure 3 we illustrate how 𝜓 scales with 𝜃 FoV and redshift 𝑧. In this paper, we conservatively zero out Fourier modes lying below 𝜓 = 75°. This roughly corresponds to the most pessimistic case of a horizon wedge at 𝑧 = […].5, the highest redshift considered in this study. Such a filter was how Figure 2 was produced, demonstrating that a substantial amount of information is lost. Filtered images like these will be the starting point for our information recovery, and in Section 3 we go into more detail about our data preparation before showcasing our results in Section 4.
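The 𝜓 = 75° cut described above amounts to a simple operation in Fourier space. The sketch below is our own NumPy illustration, not the paper's pipeline; it assumes a cubic box with uniform voxel size and applies only the wedge cut of Equation 1 (the additional excision of the low-𝑘∥ intrinsic-foreground region is omitted):

```python
import numpy as np

def wedge_filter(cube, voxel_mpc, slope):
    """Null all Fourier modes below the wedge line k_par = slope * k_perp.

    cube      : 3D brightness-temperature box; axis 0 is the line of sight.
    voxel_mpc : comoving voxel size in Mpc (assumed equal along all axes).
    slope     : tan(psi) from Equation 1, e.g. np.tan(np.radians(75.0)).
    """
    ft = np.fft.fftn(cube)
    # Angular wavenumbers along one axis of the (assumed cubic) box.
    k = 2.0 * np.pi * np.fft.fftfreq(cube.shape[0], d=voxel_mpc)
    k_par = np.abs(k)[:, None, None]
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k_perp = np.sqrt(kx**2 + ky**2)[None, :, :]
    ft[k_par < slope * k_perp] = 0.0  # excise the wedge region
    return np.real(np.fft.ifftn(ft))
```

Note that modes on the 𝑘⊥ = 0 column are left untouched by this condition, so the intrinsic-foreground modes would need a separate low-𝑘∥ cut in a full pipeline.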
The data used to represent the “clean” 21-cm signal are produced using 21cmFAST, a semi-numerical simulation of the highly redshifted 21-cm signal (Mesinger et al. 2010). We chose 21cmFAST because its relatively quick speed enables the construction of a sufficient amount of data upon which to train a neural network. The relevant outputs from the code are maps of the 21-cm brightness temperature field, evaluated at fixed snapshots in redshift. In other words, we do not consider light cone effects in this paper, although of course a real observation would include such effects (Datta et al. 2014; La Plante et al. 2014). We fix our boxes to have 200 × 200 × 200 voxels, and we fix the astrophysical parameters of the simulation (see, e.g., the fiducial values used in Park et al. 2019). In future work, it will be important to consider other parameter values; for this paper, however, we follow the precedent of Li et al. (2019) and Makinen et al. (2020) and keep the 21cmFAST parameters fixed. Again, our goal is to provide a proof-of-concept study to establish that it is indeed possible to recover the morphology of ionized regions in wedge-filtered images. Our study is therefore highly complementary to that of Bianco et al. (2021), who have trained a network for ionized bubble identification that is robust to a wide range of parameter values and have performed an extensive study of instrumental noise levels, but do not include the effects of foregrounds.

[Figure 4 flowchart omitted: the boxes list the pipeline stages (interpolated + tiled map; convolution filter and simulated noise; intensity maps with wedge and/or noise; wedge recovery; binary intensity map) together with tensor sizes such as (200,200,200), (82,127,127), and (82, 4363, 4363).]

Figure 4. The preprocessing routine used for data preparation. The size of an individual map at each step is given beneath the name of the step. In boxes where two sizes are given, the top corresponds to HERA noise preparation while the bottom corresponds to SKA noise preparation. The network used in this paper was tested on removing the wedge from maps where instrumental noise is present and from maps without instrumental noise. The noise-enriched intensity maps are first convolved with the kernel inherent to the instrument of interest, and then the thermal noise inherent to the instrument is added to the result. The wedge filter used in both pathways Fourier transforms the map, sends all vectors within the wedge region to zero, and then inverse Fourier transforms back to real space to produce the wedge-affected image.

Figure 5. The distribution of training and validation files across neutral fractions. Files are clustered around particular neutral fractions indicative of the redshift from which they were drawn. One validation file was selected from a neutral fraction lower than those represented in the training set.

With the aforementioned settings, we generated a total of 20 random realizations, each with different random seeds for the initial density field of the simulations. By producing simulation boxes at various redshifts between 𝑧 = […] and 𝑧 = […].0, we obtain a total of 57 different boxes. One of the random realizations is evolved down to redshift 𝑧 = […].85 so that its 𝑧 = […].85 realization is used only in validation; other-redshift realizations of this seed are used in training. The motivation here was to test domain transfer across neutral fractions, i.e., to see to what extent a neural network trained in the range 𝑧 = […] to 𝑧 = […] generalizes to neutral fractions outside that range. Our data come in three variants:

(i) Noiseless data.
Each 21-cm brightness temperature cube is first Fourier transformed, and then all Fourier modes outside of the “EoR window” are zeroed out. The result is inverse Fourier transformed to give a final box in configuration space. This represents what a perfect, noiseless instrument might see once foreground-contaminated wedge modes are excised.

(ii) Noisy data. In a parallel dataset, we included instrumental effects. As our fiducial instruments we consider HERA and the Square Kilometre Array (SKA; Koopmans et al. 2015). The motivation here is that HERA represents a current-generation instrument that is not necessarily optimized for imaging, whereas the SKA is a next-generation instrument that is better suited for imaging. For each of these interferometers, we take into account its Fourier-space sampling (i.e., the 𝑢𝑣 distribution in radio astronomy parlance) to convolve the original 21-cm brightness temperature boxes with an appropriate—and non-trivial—point spread function. The 𝑢𝑣 distribution is determined by the antenna layout in the interferometer array. For HERA we assume its full 350-dish configuration, with 320 dishes in a “split hexagon” layout and 30 outrigger dishes (see Dillon & Parsons 2016; DeBoer et al. 2017 for details). For the SKA we assume the fiducial design outlined in the “SKA-TEL-SKO-DD-001” Baseline Design document. We also use a modified version of 21cmSense (Pober et al. 2013, 2014; https://github.com/jpober/21cmSense) to add Gaussian random noise (according to the radiometer equation) to the sampled Fourier modes, thereby producing instrumental noise that has the proper pixel-to-pixel correlations in configuration space. After adding instrumental noise we perform the wedge excision as with the noiseless data (mimicking the sequence that would take place with real observations). A total integration time of 1080 hrs is assumed. The preprocessing pipelines for the noiseless and noise-inclusive trainings are shown in Figure 4, where the numbers in parentheses represent the size of a tensor at each step. One sees from the numbers that in many cases, it was necessary to tile the 21cmFAST boxes in order to match the fact that HERA and the SKA have wide fields of view (relative to the angle subtended by a ∼300 Mpc simulation box at 𝑧 ∼ […]) after applying instrumental and noise effects.

(iii) Null tests. Finally, we consider a set of “Gaussianized” boxes in order to test our hypothesis that it is non-Gaussian correlations between Fourier modes that enable the reconstruction of modes within the foreground wedge. If our guess is correct, an accurate reconstruction of the original images should fail. We Gaussianize our boxes in two ways. One method is to take the Fourier transform of each brightness temperature map and replace the phase of each Fourier coefficient with a phase drawn from a uniform distribution between 0 and 2𝜋 while preserving its amplitude. The second method is to generate a new Gaussian realization of a map given the power spectrum of the original (non-Gaussian) map, followed by an assignment of pixels from the new map to the old map by their ranking in brightness. In other words, the value of the dimmest pixel in the Gaussian realization replaces the value of the dimmest pixel in the original map, the second dimmest pixel replaces the second dimmest pixel in the original, and so on. In this way, a histogram of pixels in our map is Gaussian but we preserve the morphology of having low brightness temperature “bubbles” and higher brightness temperatures elsewhere.

All trainings conducted in this study used a collection of 57 brightness temperature boxes (pre-processed to include instrumental effects in the manner that we have just described). Of these, 51 were used for training our neural networks and 6 were used for validation. The distribution of the redshifts and neutral fractions of the training and validation sets are shown in Figure 5. The inputs to our neural networks are wedge-filtered brightness temperature maps which have been normalized between 0 and 1. During training the outputs are compared to ground truth binarized maps where all non-zero voxels in 21-cm brightness temperature are set to one (i.e. any voxel which is not fully ionized is considered fully neutral by the network).
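The input normalization and target binarization just described can be sketched as follows (a minimal illustration with function and variable names of our own choosing, not code from the paper's pipeline):

```python
import numpy as np

def prepare_pair(filtered_tb, true_tb):
    """Build one (input, target) training pair as described in the text.

    filtered_tb : wedge-filtered brightness-temperature cube (network input).
    true_tb     : clean brightness-temperature cube used to form the target.
    """
    lo, hi = filtered_tb.min(), filtered_tb.max()
    x = (filtered_tb - lo) / (hi - lo)  # normalize the input to [0, 1]
    # Target: 1 for any voxel with nonzero brightness temperature (treated
    # as fully neutral), 0 for fully ionized voxels.
    y = (true_tb != 0).astype(np.float32)
    return x, y
```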
This binarization is performed to simplify the task into a two-class image segmentation problem, where we are simply interested in knowing whether a part of our Universe is neutral or not. In its simplest form, our problem is one of image segmentation: we have an image wherein some regions are ionized and the rest is neutral, but the boundaries between these regions are not obvious after passing through the wedge filter. A desirable wedge-removing network is able to label each pixel within the wedge-affected map as neutral or ionized, which is an image segmentation task. Given this, we select a U-Net architecture for our neural network, given the U-Net’s demonstrated success in image segmentation tasks (Ronneberger et al. 2015; Isensee et al. 2019). Our U-Net draws heavily from the architecture presented in Isensee et al. (2018). In what follows we closely mirror the presentation in that paper, while also highlighting modifications made to the network for this work.

A schematic of our neural network is shown in Figure 6. The network is configured to process large 3D input blocks of 128 × 128 × 128 voxels. The basic U-Net architecture intrinsically recombines different scales throughout the entire network, allowing it to make effective use of the entire input volume. The general U-Net architecture consists of a contextualization pathway (left branch), which encodes increasingly abstract representations of the input as one progresses deeper into the network, followed by a localization pathway (right branch), which recombines the abstract representations with shallower features in order to precisely localize the structures of interest. The vertical depth in the U-Net is referred to as the level, with deeper levels having lower spatial resolution and more channels than shallower levels. The activations in the context pathway are computed by a pre-activation residual block containing two 3 × 3 × 3 convolutions […].
[Figure 6 block diagram omitted: channel counts run 16, 16, 32, 32, 64, 64, 128, 128, 256, 256 down the contextualization pathway and 128, 64, 64, 32, 32, 16, 16 back up the localization pathway, with concatenation (“C”) skip connections between levels of equal depth.]

Figure 6. A block diagram of the U-Net used in this work. […in blue) consists of a 3 × 3 × 3 convolution…]

Instance normalization is used on all contextualization modules instead of batch normalization, since the stochasticity induced by small batch sizes can destabilize batch normalization (Isensee et al. 2018; Ulyanov et al. 2016). Skip connections connect layers of equal depth across the network via concatenation along the channel axis, as per the original U-Net design presented in Ronneberger et al. (2015). The final layer of the network is a so-called “binarization filter”, which maps each voxel in the output to zero or one depending on some threshold. It is not used during training, in order to incentivize the network to produce near-binary outputs. When predictions are generated for post-training testing, the binarization filter is used with a threshold of 0.9. Some level of arbitrariness exists in the determination of the cutoff used in binarizing the prediction and ground truth boxes. We selected 0.9 […]. We train with the Adam optimizer with an initial learning rate of 𝑙𝑟 init = […] and an exponentially decaying learning schedule (𝑙𝑟 init · […]^𝐸, where 𝐸 is the number of epochs elapsed). The network is trained using a differentiable approximation of the binary dice coefficient function, defined as

𝐷 loss ≡ 1 − (2 × |𝑋 ∩ 𝑌| + 𝛼) / (|𝑋| + |𝑌| + 𝛼),   (2)

where 𝑋 and 𝑌 represent the truth and prediction matrices, respectively, and 𝛼 represents a small number used to avoid divide-by-zero errors (in our implementation, 𝛼 = […]).

The binary dice coefficients calculated for training and validation data at the end of every epoch of training are shown in Figure 7. In none of the three models is a point reached in training where the training loss continues to decrease while the validation loss increases. This suggests that our network is not over-fitting. Furthermore, by 100 epochs all validation loss curves have entered a domain of near-flatness, indicating that the network has learned all that it can from the data set. However, in all three models a large divide separates the validation loss from the training loss, possibly indicating that our learning may benefit from a training set of larger size or variation (Anzanello & Fogliatto 2011).
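The binary dice loss of Equation 2 can be written directly in code. The sketch below is a NumPy illustration of the loss value only (the actual training would use a differentiable tensor framework), and the default 𝛼 is an arbitrary placeholder since the paper's exact value did not survive extraction:

```python
import numpy as np

def dice_loss(truth, pred, alpha=1e-7):
    """Binary dice loss: 1 - (2|X ∩ Y| + alpha) / (|X| + |Y| + alpha).

    For soft, near-binary predictions the intersection |X ∩ Y| is taken as
    the elementwise product, which is what makes the loss differentiable
    when implemented in a tensor framework.
    """
    intersection = np.sum(truth * pred)
    return 1.0 - (2.0 * intersection + alpha) / (np.sum(truth) + np.sum(pred) + alpha)
```

A perfect prediction gives a loss of 0, while a completely disjoint prediction gives a loss approaching 1.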
Figure 7. The binary dice loss computed at each epoch of training for each of the three models (noiseless, HERA, and SKA). Solid lines are used for training loss, while dashed lines are used for validation loss. The curves are indicative of learning without significant over-fitting.

Figures 8, 9, and 10 display sample predictions from each validation box in each test. Each figure is arranged into four columns and six rows. The first column in each figure shows a cross-section of the original 21-cm brightness temperature map. In Figures 9 and 10 this temperature map is sampled by HERA’s and the SKA’s Fourier footprints, respectively. Appropriately correlated noise is added, in accordance with the procedure outlined in Section 3.1. The second column of each figure shows a cross-section of the wedge-filtered input to our network. The third column shows the 21-cm temperature field after being passed through a binarization filter; it is this column that represents the ground truth that our algorithm is trying to reproduce. The final column shows the prediction made by the network. Each row shows a sample set from one of the redshifts included in the validation data set. In all figures the arbitrary decision is made to show cross-sections which are perpendicular to the line of sight direction. As we know from Figure 2, slices along the line of sight direction look substantially different and contain unique information. We remind the reader that our network takes in 3D data cubes and outputs 3D data cubes, and thus all of this information is used in the prediction.

Figure 8 displays sample predictions from the noiseless model. Comparing the third and fourth columns, it is clear that the network is capable of reproducing the sizes, shapes, and locations of the largest bubbles in each image. However, it is also evident that many structures present in the ground truth do not appear in the prediction, especially small structures. While the network misses many structures, it does not tend to create structures which are not present in the ground truth. This observation will be expanded upon as we discuss the prediction statistics. The performance of the network does not appear to be significantly better or worse at any redshift.

Figure 9 displays sample predictions from the HERA model. Despite HERA’s low resolution, the network still captures the locations of the major ionized regions, and at all redshifts except 𝑧 = […].85 it is able to reproduce the size and shape of the largest few bubbles. This opens the door to the limited imaging work which can be done using HERA, which was intended as a primarily statistical measurement experiment.

Figure 10 displays sample predictions from the SKA model. Since the SKA is an instrument more optimized for imaging, its predictions are near in fidelity to those in the noiseless case. As with the previous two cases, the network neglects the smallest bubbles in favour of the largest. Similarly to the HERA case, the SKA model performs more poorly at redshift 𝑧 = […].85 than at other redshifts. However, we note that the seemingly poor performance here is in fact a visual artifact of our plotting a transverse slice of the data cubes. Figure 11 shows slices with one transverse and one line-of-sight axis. It is visually apparent that many of the ionized bubble structures are recovered along the line of sight. This suggests that even if the network does not perform quite as well when validated on boxes from redshifts that were not used in training (recall Figure 5), there is still some degree of success when considering the predictions in a three-dimensional volume.

Table 1. The logic scheme used to determine the class of a prediction voxel. These class labels are then used to calculate the accuracy, precision, recall, and intersection-over-union metrics.

Prediction | Ground Truth | Class
Ionized    | Ionized      | True Positive
Ionized    | Neutral      | False Positive
Neutral    | Ionized      | False Negative
Neutral    | Neutral      | True Negative
The network's performance in each test is evaluated on the similarity of the validation prediction data to their corresponding binarized ground truth data. This is judged for the first three models using the accuracy, precision, recall, and intersection-over-union (IoU) statistics. The first three metrics are calculated by classifying each voxel of a prediction box into one of four classes: true positives, true negatives, false positives, and false negatives. The logic scheme used for class assignment is shown in Table 1. These are then distilled into scores by taking the number of voxels in each class for a given box and dividing by the total number of voxels in the box. For example, if a box with a resolution of 128 × 128 × 128 has 1,500,000 “false positive" voxels, then its “false positive" score is 1,500,000/128³ ≈ 0.72. Accuracy is defined as

Accuracy = (TP + TN) / (TP + FP + FN + TN). (3)

Since accuracy accounts for the populations of all four classes, it is easily inflated in situations where one class is overwhelmingly present. For example, if a validation box is 99.9% neutral, and the network improperly identifies the 0.1% region which is ionized, then the accuracy of the prediction will be 99.8% despite the network not properly labelling a single ionized voxel. Therefore, other metrics are necessary in order to capture the full texture of a network's classification biases.

Precision is a measure of how many voxels labelled as ionized by the network are truly ionized. It is defined as

Precision = TP / (TP + FP). (4)

This is useful in situations where the “cost" of a false positive is high. In our study, we want to make sure that our network is predicting ionized regions that actually exist, with an eye towards future studies where ionized regions from 21-cm maps can be used to direct searches for high-redshift galaxies.

Recall is the share of truly ionized voxels which are labelled
Figure 8.
Sample network predictions on the noiseless data suite. The first column shows a transverse cross-section of the true brightness temperature field, while the second column shows the same field after excising Fourier modes lying within the foreground wedge. The third column is a binarized version of the first column and serves as the ground truth for our neural network. The fourth column shows the predicted ionization maps from our network. Visually, it is clear that our network is able to recover ionized bubbles from wedge-filtered 21-cm maps.

Figure 9.
Same as Figure 8 except with the HERA data suite.
Figure 10.
Same as Figure 8 except with the SKA data suite.

Figure 11.
Same as the top row of Figure 10, except with image slices containing one transverse axis and the line-of-sight axis. One sees that while the reconstruction of the 𝑧 = .85 bubbles appears to be poor in Figure 10 when we just consider transverse slices, many line-of-sight structures are predicted correctly, suggesting a reasonable overall 3D reconstruction.
Noiseless:  Neutral Fraction   Accuracy   Precision   Recall   IoU
HERA:       Neutral Fraction   Accuracy   Precision   Recall   IoU
SKA:        Neutral Fraction   Accuracy   Precision   Recall   IoU
Table 2.
The tabulated statistics for the predictions made by the network on each validation data suite. The highest score in each column is underlined, while the lowest is italicized.

as ionized in the prediction. It is defined as

Recall = TP / (TP + FN). (5)

A highly conservative network will have low recall, since it only labels regions which it is highly confident in as positive. Such a network may properly locate the rough location of ionized bubbles, but may not accurately portray their size or morphology by being too conservative about pixels on the edge of the bubbles.

IoU is a measure of the overlap between a prediction and its ground truth, defined as the algebraic intersection between two boxes divided by their union. It is commonly used to evaluate the predictions of image segmentation neural networks (Rezatofighi et al. 2019), and is included in this study for ease of comparison with similar networks. IoU is calculated via

IoU = |𝑃 ∩ 𝑇| / |𝑃 ∪ 𝑇|, (6)

where 𝑃 is the binarized prediction and 𝑇 is the binarized ground truth. Both are 3-dimensional boolean arrays.

These statistics are tabulated for each box in the validation set in Table 2, which contains the results for the noiseless, HERA, and SKA models. The validation boxes are identified by their neutral fractions. What follows is a discussion of the statistics of each model's predictions, and what they may imply about the tendencies of each model.

While none of the statistics for the noiseless test are strongly correlated with the neutral fraction of the box, it is perhaps notable that the network performed best in the recall and IoU statistics on the three boxes with the lowest neutral fractions. This is probably not the result of a bias in training, since the training set is more heavily biased towards large neutral fractions (see Figure 5).
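The four scores of equations (3)–(6) follow directly from the voxel-class counts of Table 1. A minimal numpy sketch (our own illustrative function, assuming both classes occur so no denominator vanishes):

```python
import numpy as np

def segmentation_scores(pred, truth):
    """Accuracy, precision, recall, and IoU (equations 3-6) for a pair of
    boolean ionization maps (True = ionized). Assumes both classes occur,
    so no denominator is zero."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # eq. (3)
    precision = tp / (tp + fp)                   # eq. (4)
    recall = tp / (tp + fn)                      # eq. (5)
    iou = tp / (tp + fp + fn)                    # eq. (6): |P ∩ T| / |P ∪ T|
    return accuracy, precision, recall, iou
```

Note that for boolean masks TP / (TP + FP + FN) is exactly |𝑃 ∩ 𝑇| / |𝑃 ∪ 𝑇|, since the union consists of the true positives plus both error classes.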
Notably, the box with the highest IoU also has the lowest accuracy, indicating that the network is likely to mark ionized voxels as neutral, but unlikely to mark neutral voxels as ionized.

The HERA statistics are oriented with the highest recall and IoU statistics lying on the higher end of the neutral fraction spectrum and the highest precision on the lowest neutral fraction box. It is notable that the HERA boxes have a much lower resolution than the noiseless or SKA boxes, so less small-scale detail exists to be mined in the first place. This could have the effect of suppressing recall on low-neutral-fraction boxes, where the ionized bubbles tend to be smaller.

The SKA statistics are comparable in their distribution to the HERA statistics, save for recall, which does not vary as greatly among neutral fractions as in the HERA case. The precision scores are higher than any on the HERA prediction, but they fall short of the noiseless predictions. Since precision is a measure of the number of true positives out of all pixels labelled positive, this is likely a matter of image resolution. The HERA model does not have the liberty to set a low confidence threshold for labelling each pixel since it has relatively few to work with. Meanwhile, the SKA and
Figure 12.
The normalized cross-power between each prediction and its associated binary mask in the validation sets for the noiseless, HERA, and SKA models. In all cases, normalized cross-power is highest at low 𝑘, indicating that the network is better at recovering large-scale structures than small-scale structures.

noiseless models have high resolution images, and can afford to scrutinize each pixel.

The normalized cross-power spectrum between the prediction and binarized mask was also calculated for each test. To define this cross-power, let us denote the prediction and binarized masks as 𝑓 and 𝑔, which are real-valued data sets. Let the Fourier transforms of 𝑓 and 𝑔 be F and G, and the complex conjugates of these be F* and G*. We define the normalized cross-power of 𝑓 and 𝑔 to be the power spectrum of

N = F* G / √((F* F)(G* G)), (7)

Figure 13.
The normalized cross-power between each prediction and its associated binary mask in the validation sets for both null test models. In both cases, the normalized cross-power is near zero across 𝑘 modes.

which is a complex-valued function of 𝑘. If 𝑓 and 𝑔 are identical, then the cross-spectrum is 1 at all 𝑘, and if they share nothing at all in common, then the cross-spectrum is 0 at all 𝑘. In this way, the cross-spectrum demonstrates the fidelity with which the network recovers different 𝑘-modes of an image.

Figure 12 shows the normalized cross-power spectra for the noiseless, HERA, and SKA model predictions, while Figure 13 shows the normalized cross-power spectra for both null tests. Common among the predictions in Figure 12 is that the normalized cross-power drops off as a function of 𝑘. It does so most slowly for the noiseless suite, and most quickly for the HERA suite, suggesting a relationship between box resolution and prediction fidelity. The relationship between prediction fidelity and spatial frequency scale is a well-documented phenomenon in machine learning, referred to in the literature as spectral bias (for an in-depth discussion, see Rahaman et al. 2019). However, it is unclear to what extent spectral bias places a limit on the performance of the network. It is possible, for example, that our network is not optimally configured and still falls short of the fidelity limit imposed by the U-Net's spectral bias.

The demonstrated drop-off in fidelity at high 𝑘 illustrates that our algorithm is best suited for enabling image-associated science that relies on the identification of ionized bubbles. While it is not

Figure 14.
Sample network prediction on a validation box slice from 𝑧 = .85 in the first null test data suite. The network fails to reproduce any meaningful structures present in the ground truth image. As with Figure 8, the leftmost image shows the 21-cm brightness temperature field, except here the field has been Gaussianized. The second image is the same as the first image after a wedge filter. The third image is a binarized version of the first, and the rightmost image is the (failed) attempt at predicting the third image. This confirms our intuition that the key to reconstructing Fourier modes lost to the wedge is the non-Gaussianity of EoR maps.

Figure 15.
Same as Figure 14, but for the second null test data suite.

appropriate for improving measurements of the power spectrum or other Fourier space statistics, the network demonstrably excels at recovering the locations and sizes of ionized regions.

It is evident in Figure 12 that the normalized cross-power at any given 𝑘 tends to increase with redshift, regardless of the noise type. This is probably an artifact of the training set, which leans strongly towards high-redshift boxes (see Figure 5). The effect may be further exacerbated on the 𝑧 = .85 line, since the networks were not trained on data from that redshift.

Meanwhile, the normalized cross-power spectra for both null test validation suites are very noisy and do not demonstrate any clear trend, besides being noisier at low 𝑘. This indicates that the network is completely unable to reconstruct the signal beneath the wedge when it is “Gaussianized", supporting our hypothesis that the network is exploiting the non-Gaussian coupling between Fourier modes to reconstruct the EoR signal. This is confirmed by a visual inspection of the predicted images, shown in Figures 14 and 15 for the first and second null tests described in Section 3.1, respectively.

We have developed a machine learning-based method to identify ionized bubbles during the Epoch of Reionization. Our method considerably extends the work of Li et al. (2019) and Makinen et al. (2020) and uses a U-Net-based deep learning algorithm to recover Fourier modes that are obscured by foregrounds. The algorithm does not rely on any knowledge of the foregrounds themselves, and enables image reconstruction after all modes lying within the foreground wedge have been completely nulled out. This is possible due to the significant non-Gaussian correlations between Fourier modes (Shimabukuro et al. 2016; Majumdar et al. 2018; Watkinson et al. 2018; Hutter et al. 2019; Gorce & Pritchard 2019).

Our main goal was to assess whether or not enough information exists in a wedge-filtered EoR image to reconstruct the original image within a reasonable margin of error. This paper demonstrates an affirmative answer to this question: the lost wedge modes can indeed be recovered from a wedge-filtered image by exploiting the non-Gaussian nature of the 21-cm EoR signal.
We verify that the U-Net relies on phase correlations in the 21-cm signal by performing two null tests in which the phases are decorrelated from one another. In both null tests, the U-Net fails to reconstruct any meaningful information (see Figures 14 and 15 for sample predictions, and Figure 13 for the Fourier-space recovery fidelity in these null tests).

Additionally, we aimed to show that our methods remain viable when instrumental effects are accounted for, using HERA and the SKA as fiducial instruments. These instruments were selected since HERA is a current-generation instrument not necessarily optimized for imaging, while the SKA is a next-generation instrument more suitable for imaging. We found that the reconstruction fidelity in Fourier space drops off strongly as a function of 𝑘 (shown in Figure 12) and that better mode reconstruction will likely be necessary if one wishes to use our techniques for applications such as power spectrum estimation. However, in the image domain, the largest ionized regions in a wedge-filtered image can be reliably identified, even when the images include instrumental effects from HERA or the SKA (see Figures 8, 9, and 10 for sample predictions in the noiseless, HERA, and SKA cases). This demonstrates the capacity of even current-generation instruments like HERA to perform some limited imaging work, and paves the way for future EoR imaging studies.

In this paper, we have shown that filtering out foreground-contaminated modes within the wedge is not a dealbreaker for imaging studies that seek to locate ionized bubbles during the EoR: the modes can be recovered to a sufficient extent that, in the image domain, the bubbles can be reliably identified. While our proof-of-concept study is an important first step, future work must considerably generalize our approach in order for it to be a practical tool.
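The two ingredients of the null-test analysis can be sketched in a few lines of numpy: “Gaussianizing" a field by randomizing its Fourier phases while keeping the amplitudes (so the power spectrum is preserved, up to the Hermitian-symmetry constraints handled by irfftn, but the non-Gaussian phase coupling is destroyed), and a shell-binned estimator in the spirit of the normalized cross-power of equation (7). Both functions and the binning scheme are our own illustrative choices, not the paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussianize(field):
    # Keep Fourier amplitudes, draw fresh uniform phases: the result has
    # (approximately) the same power spectrum but no non-Gaussian
    # coupling between modes.
    ft = np.fft.rfftn(field)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=ft.shape)
    return np.fft.irfftn(np.abs(ft) * np.exp(1j * phases), s=field.shape)

def normalized_cross_power(f, g, nbins=8):
    # Shell-averaged cross-correlation coefficient in bins of |k|,
    # in the spirit of equation (7); equals 1 in every bin when f == g.
    F, G = np.fft.fftn(f), np.fft.fftn(g)
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in f.shape], indexing="ij")
    kmag = np.sqrt(sum(fr ** 2 for fr in freqs))
    edges = np.linspace(0.0, kmag.max(), nbins + 1)
    which = np.clip(np.digitize(kmag.ravel(), edges) - 1, 0, nbins - 1)
    cross = np.bincount(which, weights=(np.conj(F) * G).real.ravel(),
                        minlength=nbins)
    p_f = np.bincount(which, weights=(np.abs(F) ** 2).ravel(), minlength=nbins)
    p_g = np.bincount(which, weights=(np.abs(G) ** 2).ravel(), minlength=nbins)
    return cross / np.sqrt(p_f * p_g)
```

Passing a prediction and its binarized ground truth through normalized_cross_power yields the kind of 𝑘-dependent fidelity curve shown in Figure 12, while applying gaussianize to a brightness-temperature cube loosely mirrors the preparation of the first null-test data suite.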
For instance, in this paper we kept astrophysical and cosmological parameters fixed, which does not accurately reflect our current state of knowledge in EoR studies. Progress has recently been made in this direction in a complementary study by Bianco et al. (2021), who have also tackled the problem of EoR bubble identification over a wide range of parameter choices and instrumental noise scenarios, but not in the context of foreground filtering. Synthesizing these and other preliminary studies will allow 21-cm machine learning techniques to mature and take EoR imaging studies to the next level, unlocking the potential of 21-cm cosmology to even more dramatically alter our view of Cosmic Dawn than with just statistical studies alone.
ACKNOWLEDGMENTS
The authors are delighted to acknowledge helpful discussions with James Aguirre, Joelle Begin, Youssef Bestavros, Michele Bianco, Razvan Ciuca, Sambit Giri, Brad Greig, Nick Kern, Ilian Iliev, Paul La Plante, Garrelt Mellema, Andrei Mesinger, Damien Pinto, Jonathan Pober, Clovis Vinant-Tang, and Chris Williams. YC was funded by the Mitacs Globalink Research Internship Program. AL and SR are grateful for support from the Natural Sciences and Engineering Research Council of Canada (NSERC) through their Discovery Grants program as well as the Canadian Institute for Advanced Research (CIFAR) via the Azrieli Global Scholars program for AL and the Canada CIFAR AI Chair program for SR. Additionally, AL acknowledges support from the New Frontiers in Research Fund Exploration grant program, a NSERC Discovery Launch Supplement, the Sloan Research Fellowship, and the William Dawson Scholarship at McGill. Computations were made on the supercomputers Cedar (at Simon Fraser University) and Béluga (at École de technologie supérieure) managed by Compute Canada. The operation of these supercomputers is funded by the Canada Foundation for Innovation (CFI).
REFERENCES
Anzanello M. J., Fogliatto F. S., 2011, International Journal of Industrial Ergonomics, 41, 573
Beardsley A. P., Morales M. F., Lidz A., Malloy M., Sutter P. M., 2015, ApJ, 800, 128
Bianco M., Giri S. K., Iliev I. T., Mellema G., 2021, arXiv e-prints, p. arXiv:2102.06713
Bowman J. D., Morales M. F., Hewitt J. N., 2009, ApJ, 695, 183
Carucci I. P., Irfan M. O., Bobin J., 2020, MNRAS, 499, 304
Chapman E., et al., 2012, MNRAS, 423, 2518
Chapman E., et al., 2013, MNRAS, 429, 165
Cunnington S., Irfan M. O., Carucci I. P., Pourtsidou A., Bobin J., 2020, arXiv e-prints, p. arXiv:2010.02907
Datta A., Bowman J. D., Carilli C. L., 2010, ApJ, 724, 526
Datta K. K., Jensen H., Majumdar S., Mellema G., Iliev I. T., Mao Y., Shapiro P. R., Ahn K., 2014, MNRAS, 442, 1491
DeBoer D. R., et al., 2017, PASP, 129, 045001
Dillon J. S., Parsons A. R., 2016, ApJ, 826, 181
Dillon J. S., Liu A., Tegmark M., 2013, Phys. Rev. D, 87
Dillon J. S., et al., 2014, Phys. Rev. D, 89
Furlanetto S. R., Peng Oh S., Briggs F. H., 2006, Physics Reports, 433, 181
Goksel Karacayli N., Padmanabhan N., 2019, arXiv e-prints, p. arXiv:1904.01387
Gorce A., Pritchard J. R., 2019, MNRAS, 489, 1321
Hazelton B. J., Morales M. F., Sullivan I. S., 2013, ApJ, 770, 156
Hogg D. W., 1999, arXiv e-prints, pp astro-ph/9905116
Hothi I., et al., 2020, MNRAS, 500, 2264
Hutter A., Watkinson C. A., Seiler J., Dayal P., Sinha M., Croton D. J., 2019, MNRAS, 492, 653
Isensee F., Kickingereder P., Wick W., Bendszus M., Maier-Hein K. H., 2018, in Crimi A., Bakas S., Kuijf H., Menze B., Reyes M., eds, Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Springer International Publishing, Cham, pp 287–297
Isensee F., Kickingereder P., Wick W., Bendszus M., Maier-Hein K., 2019, in Crimi A., van Walsum T., Bakas S., Keyvan F., Reyes M., Kuijf H., eds, Brainlesion. Lecture Notes in Computer Science, Springer Verlag, pp 234–244, doi:10.1007/978-3-030-11726-9_21
Koopmans L., et al., 2015, in Advancing Astrophysics with the Square Kilometre Array (AASKA14). p. 1 (arXiv:1505.07568)
La Plante P., Battaglia N., Natarajan A., Peterson J. B., Trac H., Cen R., Loeb A., 2014, ApJ, 789, 31
Li D., Zhu H.-M., Pen U.-L., 2018, arXiv e-prints, p. arXiv:1811.05012
Li W., et al., 2019, MNRAS, 485, 2628
Liu A., Shaw J. R., 2020, PASP, 132, 062001
Liu A., Tegmark M., 2011, Phys. Rev. D, 83
Liu A., Tegmark M., Bowman J., Hewitt J., Zaldarriaga M., 2009, MNRAS, 398, 401
Liu A., Parsons A. R., Trott C. M., 2014a, Phys. Rev. D, 90, 023018
Liu A., Parsons A. R., Trott C. M., 2014b, Phys. Rev. D, 90, 023019
Majumdar S., Pritchard J. R., Mondal R., Watkinson C. A., Bharadwaj S., Mellema G., 2018, MNRAS, 476, 4007
Makinen T. L., Lancaster L., Villaescusa-Navarro F., Melchior P., Ho S., Perreault-Levasseur L., Spergel D. N., 2020, arXiv e-prints, p. arXiv:2010.15843
Mesinger A., Furlanetto S., Cen R., 2010, MNRAS, 411, 955
Milletari F., Navab N., Ahmadi S.-A., 2016, V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation (arXiv:1606.04797)
Morales M. F., Wyithe J. S. B., 2010, ARA&A, 48, 127
Morales M. F., Bowman J. D., Hewitt J. N., 2006, ApJ, 648, 767
Morales M. F., Hazelton B., Sullivan I., Beardsley A., 2012, ApJ, 752, 137
Park J., Mesinger A., Greig B., Gillet N., 2019, MNRAS, 484, 933
Parsons A. R., Pober J. C., Aguirre J. E., Carilli C. L., Jacobs D. C., Moore D. F., 2012, ApJ, 756, 165
Pober J. C., 2014, MNRAS, 447, 1705
Pober J. C., et al., 2013, AJ, 145, 65
Pober J. C., et al., 2014, ApJ, 782, 66
Pritchard J. R., Loeb A., 2012, Reports on Progress in Physics, 75, 086901
Rahaman N., Baratin A., Arpit D., Draxler F., Lin M., Hamprecht F., Bengio Y., Courville A., 2019, in Proceedings of Machine Learning Research, Vol. 97. PMLR, Long Beach, California, USA, pp 5301–5310, http://proceedings.mlr.press/v97/rahaman19a.html
Rezatofighi H., Tsoi N., Gwak J., Sadeghian A., Reid I., Savarese S., 2019, Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression (arXiv:1902.09630)
Ronneberger O., Fischer P., Brox T., 2015, in Navab N., Hornegger J., Wells W. M., Frangi A. F., eds, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer International Publishing, Cham, pp 234–241
Santos M. G., Cooray A., Knox L., 2005, ApJ, 625, 575
Shimabukuro H., Yoshiura S., Takahashi K., Yokoyama S., Ichiki K., 2016, MNRAS, 458, 3003
Switzer E. R., Liu A., 2014, ApJ, 793, 102
Thyagarajan N., et al., 2013, ApJ, 776, 6
Tompson J., Goroshin R., Jain A., LeCun Y., Bregler C., 2014, Efficient Object Localization Using Convolutional Networks (arXiv:1411.4280)
Trott C. M., Wayth R. B., Tingay S. J., 2012, ApJ, 757, 101
Ulyanov D., Vedaldi A., Lempitsky V., 2016, Instance Normalization: The Missing Ingredient for Fast Stylization (arXiv:1607.08022)
Vedantham H., Udaya Shankar N., Subrahmanyan R., 2012, ApJ, 745, 176
Wang X., Tegmark M., Santos M. G., Knox L., 2006, ApJ, 650, 529
Watkinson C. A., Giri S. K., Ross H. E., Dixon K. L., Iliev I. T., Mellema G., Pritchard J. R., 2018, MNRAS, 482, 2653
Wolz L., et al., 2017, MNRAS, 464, 4938
Zheng H., et al., 2017, MNRAS, 464, 3486
Zhu H.-M., Pen U.-L., Yu Y., Chen X., 2018, Phys. Rev. D, 98, 043511
de Oliveira-Costa A., Tegmark M., Gaensler B. M., Jonas J., Landecker T. L., Reich P., 2008, MNRAS, 388, 247

This paper has been typeset from a TeX/LaTeX file prepared by the author.