An Unbiased Estimator of the Full-sky CMB Angular Power Spectrum using Neural Networks
Pallav Chanda ∗ and Rajib Saha †
Department of Physics, Indian Institute of Science Education and Research, Bhopal
February 8, 2021
Abstract
Accurate estimation of the Cosmic Microwave Background (CMB) angular power spectrum is enticing due to the prospect for precision cosmology it presents. Galactic foreground emissions, however, contaminate the CMB signal and need to be subtracted reliably in order to lessen systematic errors on the CMB temperature estimates. Typically, bright foregrounds in a region lead to further uncertainty in temperature estimates in that area even after some foreground removal technique is performed, and hence determining the underlying full-sky angular power spectrum poses a challenge. We explore the feasibility of utilizing artificial neural networks to predict the angular power spectrum of the full-sky CMB temperature maps from the observed angular power spectrum of the partial sky, in which CMB temperatures in some bright foreground regions are masked. We present our analysis at large angular scales with two different masks. We produce unbiased predictions of the full-sky angular power spectrum and the underlying theoretical power spectrum using neural networks. Our predictions are also uncorrelated to a large extent. We further show that the multipole-multipole covariances of the predictions of the full-sky spectra made by the ANNs are much smaller than those of the estimates obtained using the method of pseudo-$C_l$.

1 Introduction

The Cosmic Microwave Background (CMB) is considered an important probe of the early universe. Details regarding the cosmological parameters, obtained from the angular power spectrum of the CMB temperature and polarization anisotropies, have proved useful in understanding the mechanism for the formation and the growth of large-scale structure.
Constraints on cosmological parameters have been obtained by satellite-based experiments like COBE [1], WMAP [2], and PLANCK [3], as well as ground-based experiments like ACT [4] and SPT [5]. Upcoming projects like the CCAT-prime [6], ESA CORE [7], and others, with improved sensitivities and specialized equipment, will surely make remarkable improvements in CMB measurements.

The CMB, discovered in 1965 [8], is a nearly uniform and isotropic radiation field exhibiting a virtually perfect black-body spectrum at a temperature of about 2.7 K, with anisotropies at the $\mu$K level. During an observation, the temperature fluctuations are seen projected on the 2D surface of the spherical sky. Therefore, the temperature anisotropies can be expressed by using a spherical harmonic expansion as follows:

$$ T(\theta,\phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} a_{lm} Y_{lm}(\theta,\phi), \qquad (1) $$

where $Y_{lm}(\theta,\phi)$ are the spherical harmonic functions, and $a_{lm}$ are the harmonic modes given by

$$ a_{lm} = \int_{\theta=0}^{\pi} \int_{\phi=0}^{2\pi} T(\theta,\phi)\, Y^{*}_{lm}(\theta,\phi)\, d\Omega. \qquad (2) $$

Noting that one can measure only $(2l+1)$ $m$-modes for a multipole, the angular power spectrum can be written as

$$ \hat{C}_l = \frac{1}{2l+1} \sum_{m=-l}^{l} |a_{lm}|^2. \qquad (3) $$

$\hat{C}_l$ is $\chi^2$-distributed with a mean of $C^{th}_l$, a variance of $2(C^{th}_l)^2/(2l+1)$, and $2l+1$ degrees of freedom, where $C^{th}_l$ represents the theoretical power spectrum. Therefore,

$$ C^{th}_l = \langle \hat{C}_l \rangle = \frac{1}{2l+1} \sum_{m=-l}^{l} \langle |a_{lm}|^2 \rangle, \qquad (4) $$

where $\langle \circ \rangle$ denotes an ensemble average. Since we have only one real sky, our observation of the CMB sky will be described by one such realization of the angular power spectrum drawn from the above-stated distribution.

The statistical information present in a CMB temperature anisotropy map can be encapsulated in its angular power spectrum, $\hat{C}_l$, in the widely accepted structure formation model of inflation-induced curvature perturbations that are Gaussian-distributed.
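The distributional statements in Eqs. (3) and (4) are easy to verify numerically. The following is a minimal numpy sketch; the fiducial value $C^{th}_l = 1$ and the multipole $l = 10$ are arbitrary illustrative choices, not values from this work:

```python
import numpy as np

rng = np.random.default_rng(0)
l, C_th, n_real = 10, 1.0, 200_000

# a_{l0} is real; for m > 0 the a_{lm} are complex, with the variance
# C_th split equally between the real and imaginary parts.
a_l0 = rng.normal(0.0, np.sqrt(C_th), n_real)
a_lm = rng.normal(0.0, np.sqrt(C_th / 2.0), (n_real, l, 2))  # m = 1..l, (Re, Im)

# \hat{C}_l = (2l+1)^{-1} \sum_m |a_{lm}|^2, using a_{l,-m} = (-1)^m a_{lm}^*
C_hat = (a_l0**2 + 2.0 * (a_lm**2).sum(axis=(1, 2))) / (2 * l + 1)

print(C_hat.mean())  # close to C_th = 1
print(C_hat.var())   # close to 2 C_th^2 / (2l+1) = 2/21
```

The sample mean and variance of the simulated $\hat{C}_l$ reproduce the $\chi^2$ moments quoted above.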
Extracting cosmological information from CMB observations is possible only when all non-cosmological signals are reliably subtracted. Recent CMB experiments are equipped with adequate angular resolution and sensitivity to probe CMB anisotropies at large angular scales. Therefore, the primary source of uncertainty in estimates is due to the galactic foreground emissions and not the instrumental noise.

∗ email: [email protected]
† email: [email protected]

The galactic foregrounds are characterized by much higher intensities (temperatures) in the region around the galactic plane compared to other regions of the sky. This results in comparatively higher uncertainties in CMB measurements in those regions when reconstructed using existing methods like COMMANDER (see Eriksen et al. 2004 [10], Eriksen et al. 2008 [11]), ILC (see Saha et al. 2006, 2008 [12, 13], Sudevan et al. 2018 [14]), etc. Bright sources in some sky regions also result in further unreliability of the measurements. Applying a mask on the CMB temperature map, which excludes the components in 'bright' foreground regions, aids data analysis. The angular power spectrum of the partial (masked) sky can then be computed easily. However, recovering an unbiased estimator of the underlying full-sky angular power spectrum is an important problem in cosmology to solve.

Common maximum likelihood methods [15, 16], and Gibbs and Bayesian sampling methods [10, 17], exist for estimating the full-sky CMB temperature anisotropy angular power spectrum, $\hat{C}_l$, from the angular power spectrum of the finite-area cut-sky CMB map, $\hat{\tilde{C}}_l$. However, these methods involve complex computations and are CPU expensive, since they scale as a high power of $l_{max}$, the maximum multipole. The method using Gabor transforms, introduced by Hansen et al.
(2002) [18], is again hindered by slow calculations of the correlation matrix of $\hat{\tilde{C}}_l$ needed for their maximum likelihood analysis.

Furthermore, Peebles (1973) [19], Wandelt et al. (2001) [20] and Hivon et al. (2002) [21] put forward the pseudo-$C_l$ algorithm for estimation of the full-sky angular power spectrum from the masked-sky spectrum, which foregoes the slow calculations by introducing a mode-mode coupling kernel that depends only on the geometry of the mask used. Various extensions of the pseudo-$C_l$ method exist, like those presented by Reinecke et al. (2013) [22], Elsner et al. (2017) [23], and the references therein. Nevertheless, the $\hat{C}_l$ estimates using these methods have large error-bars at the low multipoles, even after using substantial bin-widths, and are also limited in the sense that the unbiased estimator relies on a linear transformational relation that exists between the ensemble averages, $\langle \hat{\tilde{C}}_l \rangle$ and $\langle \hat{C}_l \rangle$, which may not necessarily be the best possible relation between the $\hat{\tilde{C}}_l$ and $\hat{C}_l$ of the individual realizations. Therefore, the requirement for a technique that is efficient, reliable, and enables optimal recovery of the full-sky angular power spectra at all multipoles is well documented.

In this work, we have explored the use of Artificial Neural Networks (ANNs) to predict the full-sky CMB angular power spectrum based on the power spectrum of the partial (or masked) sky as a new alternative method. We are interested in the reconstruction of the large-scale CMB temperature anisotropy power spectrum in this initial article. We produce random realizations of the CMB temperature anisotropy, i.e., CMB maps. We choose suitable temperature masks to apply on the CMB maps so as to get the masked CMB maps. Then, we calculate the angular power spectra of the masked CMB maps as well as the unmasked (full-sky) CMB maps, hereafter represented as $\hat{\tilde{C}}_l$s and $\hat{C}_l$s respectively.
The simulated data is used to construct training and testing sets for the ANNs. The training data plays a role similar to that of priors and is used by the neural network to learn the complicated mapping that exists between the $\hat{\tilde{C}}_l$ and $\hat{C}_l$ of the individual realizations. ANNs are well known as universal function approximators (see Hornik (1991) [24], Pinkus (1999) [25]). Using this novel method enables us to get unbiased predictions of the full-sky spectra without binning at lower multipoles and with a minimal bin width at higher multipoles. We also obtain significantly smaller error-bars on our full-sky $\hat{C}_l$ predictions. The ANN predictions help us acquire an unbiased estimator of the underlying theoretical $C^{th}_l$ as well.

This paper is organized as follows: In §2, we derive the necessary equations that relate the full-sky angular power spectrum to the partial-sky power spectrum. Next, we give a brief review of the concept of Artificial Neural Networks in §3. In §4, we present our strategy for getting an unbiased estimate of the full-sky angular power spectrum from the power spectrum of the partial sky. §5 describes the simulations of the $\hat{C}_l$s and the $\hat{\tilde{C}}_l$s. We list our methodology, the procedure for training the networks, and the binning strategy required to get an unbiased estimate after making the predictions in §6. In §7, we present the results of our analyses on the simulated data. Our conclusions and possible future work on our method are discussed in §8.

2 Relating the partial-sky and full-sky power spectra

The effects of a mask on the full-sky temperature field can be described by a position-dependent weighting using a window function $W$. The effect of a finite window function on the temperature field is given by

$$ \tilde{T}(\theta,\phi) = W(\theta,\phi)\, T(\theta,\phi). \qquad (5) $$

Defining the harmonic space window function, $W^{l'm'}_{lm}$, as

$$ W^{l'm'}_{lm} = \int_{\theta=0}^{\pi} \int_{\phi=0}^{2\pi} Y_{l'm'}(\theta,\phi)\, W(\theta,\phi)\, Y^{*}_{lm}(\theta,\phi)\, d\Omega, \qquad (6) $$

the harmonic modes of the partial (masked) sky are given by

$$ \tilde{a}_{lm} = \sum_{l'm'} W^{l'm'}_{lm}\, a_{l'm'}. \qquad (7) $$

Using these harmonic modes, the angular power spectrum of the masked sky can be calculated as

$$ \hat{\tilde{C}}_l = \frac{1}{2l+1} \sum_{m=-l}^{l} |\tilde{a}_{lm}|^2. \qquad (8) $$

Thus, one can relate the partial-sky power spectra to the full-sky spectra by taking ensemble averages on both sides and then using Eq. 7 and Eq. 4 as follows:

$$ \langle \hat{\tilde{C}}_l \rangle = \frac{1}{2l+1} \sum_{m=-l}^{l} \langle \tilde{a}_{lm} \tilde{a}^{\dagger}_{lm} \rangle = \frac{1}{2l+1} \sum_{l'} \sum_{m m'} W^{l'm'}_{lm} \langle \hat{C}_{l'} \rangle \left( W^{l'm'}_{lm} \right)^{\dagger}. \qquad (9) $$

The above expression can be simplified by using a matrix $M$ to describe the mode-mode coupling, and it becomes

$$ \langle \hat{\tilde{C}}_l \rangle = \sum_{l'} M_{ll'} \langle \hat{C}_{l'} \rangle. \qquad (10) $$

On the large scales that we are working on, the instrumental noise is negligible in magnitude and hence can be ignored. Thus, inverting Eq. 10, the true full-sky power spectra can be represented in terms of the pseudo partial-sky spectra:

$$ \langle \hat{C}_l \rangle = \sum_{l'} M^{-1}_{ll'} \langle \hat{\tilde{C}}_{l'} \rangle. \qquad (11) $$

For a given realization of the CMB sky, we use the convention, throughout the paper, that $\mathbf{c}$ represents the column vector whose elements are the $\hat{C}_l$ ($l = 0, 1, ..., l_{max}$) and $\tilde{\mathbf{c}}$ represents the column vector whose elements are the $\hat{\tilde{C}}_l$ ($l = 0, 1, ..., l_{max}$), where $l_{max}$ depends on the resolution of the temperature maps. If the ensemble averages in Eq. 11 are eliminated, the same relation can be used as an estimator of the full-sky CMB $\hat{C}_l$ using the partial-sky $\hat{\tilde{C}}_l$. This relation is key to the existing pseudo-$C_l$ methods. However, the linear transformational relation holds only on the stated ensemble averages and thus, inherently, excludes the complex functional relationships that may exist between the full-sky and the partial-sky power spectra of individual realizations of the CMB sky. Thus, to get a better estimate, we replace the linear transformation with a functional relation while discarding the ensemble averages. Eq. 11 can then be generalized as follows:

$$ \mathbf{c}_{estimate} = f(\tilde{\mathbf{c}}). \qquad (12) $$

The above equation can be equivalently described by the mapping problem

$$ \tilde{\mathbf{c}} \xrightarrow{\ f\ } \mathbf{c}. \qquad (13) $$

Based on this expression, we explore a strategy using ANNs, discussed in §4, to approximate the full-sky CMB angular power spectrum.
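The coupling kernel $M_{ll'}$ of Eq. (10) depends only on the mask, and in the pseudo-$C_l$ literature (Hivon et al. [21]) it is written in terms of the power spectrum $w_l$ of the mask itself and Wigner 3j symbols. A small sketch using sympy's exact 3j symbols (slow, but adequate at these low multipoles); the full-sky window in the sanity check is illustrative:

```python
import numpy as np
from sympy.physics.wigner import wigner_3j

def coupling_matrix(w_l):
    """Mode-mode coupling kernel M_{ll'} of Eq. (10), built from the
    angular power spectrum w_l of the mask (Hivon et al. [21]). For a
    general mask, w_l should extend to 2*lmax; here we truncate at
    len(w_l) - 1 for simplicity."""
    lmax = len(w_l) - 1
    M = np.zeros((lmax + 1, lmax + 1))
    for l1 in range(lmax + 1):
        for l2 in range(lmax + 1):
            s = sum((2 * l3 + 1) * w_l[l3]
                    * float(wigner_3j(l1, l2, l3, 0, 0, 0)) ** 2
                    for l3 in range(lmax + 1))
            M[l1, l2] = (2 * l2 + 1) / (4.0 * np.pi) * s
    return M

# Sanity check: an unmasked sky has W = 1 everywhere, i.e.
# w_l = 4*pi*delta_{l0}; the kernel then reduces to the identity,
# and Eq. (11) trivially returns the full-sky C_l.
w_l = np.zeros(9)
w_l[0] = 4.0 * np.pi
print(np.allclose(coupling_matrix(w_l), np.eye(9)))  # True
```

Inverting this matrix, as in Eq. (11), is exactly the linear pseudo-$C_l$ step that the ANN mapping of Eq. (13) generalizes.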
3 Artificial Neural Networks

Artificial Neural Networks (ANNs) are computational information processing systems aimed at recognizing underlying relationships in a set of data. Historically, the learning mechanisms of neural networks have drawn inspiration from those of the human brain and the nervous system. However, as the study of neural networks progressed and various architectures that had little connection with the working of the brain were discovered, such comparisons are now debated. Even so, ANNs have found a wide range of applications, ranging from financial predictions and function approximation to computer vision and speech recognition/synthesis. Bishop (1995) [26] gives a detailed description of neural networks.

In our work, we focus on supervised learning with 'dense' (fully-connected) feed-forward ANNs. An ANN consists of an input layer, an output layer and one or more hidden layers (see Fig. 1). Each of the circular units is called a neuron, and the lines connecting the different neurons represent the associated weights. Consider the input features in a particular training example to be represented as a column vector $\mathbf{x}$ whose elements are $x_p$ ($p = 1, ..., n_{in}$, where $n_{in}$ is the number of input features), and the true output values (or ground truths) to be represented as a column vector $\hat{\mathbf{y}}$ whose elements are $\hat{y}_q$ ($q = 1, ..., n_{out}$, where $n_{out}$ is the number of output values). An ANN can then be understood as a mapping from input to output, $\mathbf{x} \xrightarrow{ANN} \mathbf{y}$.

Figure 1: Figure showing a 'dense' ANN with an input layer, an output layer and a single hidden layer. The word 'dense' implies a fully-connected network in which all the neurons in any layer of the network are connected to all the neurons in the subsequent layer.
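A dense feed-forward regression network of this kind can be set up in a few lines with TensorFlow/Keras, the framework used later in this work. The layer sizes here are illustrative placeholders:

```python
import tensorflow as tf

# A 'dense' feed-forward regression network as in Fig. 1. The output
# layer has no activation (identity), as appropriate for real-valued
# regression targets; layer sizes are illustrative.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(48,)),                   # n_in input features
    tf.keras.layers.Dense(64, activation="relu"),  # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),  # hidden layer 2
    tf.keras.layers.Dense(31),                     # n_out outputs, identity activation
])
model.compile(optimizer="adam", loss="mse")        # sum-of-squares cost, ADAM
print(model.count_params())
```

All weights and biases of such a model are trained jointly by gradient-based minimization of the cost, as described next.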
The ANN learns the mapping by training the weights and biases in the network. Each neuron in a layer is connected to all the neurons of the previous layer. We use the convention that superscript $[l]$ represents the $l$-th layer. Thus, any layer $l$ has $n^{[l]}$ neurons, associated with $W^{[l]}$ weights [$w^{[l]}_{ij}$ joining neuron $i$ in layer $l$ and neuron $j$ in layer $(l-1)$] and $b^{[l]}$ biases (or offsets) [$b^{[l]}_i$ of neuron $i$ in layer $l$]. A neuron is comprised of a linear part and an activation part. Let the linear and activation parts of the neurons in the $l$-th hidden layer be represented as vectors $\mathbf{z}^{[l]}$ and $\mathbf{a}^{[l]}$, respectively. Then, the forward propagation step is given by

$$ z^{[l]}_i = \sum_{j=1}^{n^{[l-1]}} w^{[l]}_{ij} a^{[l-1]}_j + b^{[l]}_i, \qquad (14) $$

$$ a^{[l]}_i = g^{[l]}\left( z^{[l]}_i \right), \qquad (15) $$

where $g^{[l]}(z)$ represents the activation function $g$ of the $l$-th layer. The activation functions are chosen to be non-linear functions, e.g. logistic sigmoid, tanh, ReLU, etc., to introduce a non-linearity in the network. By convention, the input layer is referred to as the 0th layer, and $\mathbf{a}^{[0]} = \mathbf{x}$. The hidden layers, along with the activation functions, help the ANN create complex non-linear mappings.

For regression problems in which the output features are real-valued numbers, an identity activation is applied to the output layer. If the output layer is enumerated as $L$, the output layer activations can be computed as

$$ y_q = a^{[L]}_q = z^{[L]}_q = \sum_{j=1}^{n^{[L-1]}} w^{[L]}_{qj} a^{[L-1]}_j + b^{[L]}_q. \qquad (16) $$

This constitutes the network's prediction vector $\mathbf{y}$, which depends on the weights, biases, and activation functions of all the layers in the network, thus creating a complex mapping.

Consider training an ANN using $m$ training examples of input and true output vector pairs $(\mathbf{x}^{(k)}, \hat{\mathbf{y}}^{(k)})$, where we use the convention that superscript $(k)$ represents the $k$-th training example.
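Eqs. (14)-(16) amount to a chain of matrix-vector products. A self-contained numpy sketch, with hypothetical layer sizes and ReLU as the hidden activation:

```python
import numpy as np

def forward(x, params):
    """Forward propagation, Eqs. (14)-(16): z = W a + b in each layer,
    ReLU on hidden layers, identity activation on the output layer L."""
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b                                         # Eq. (14)
        a = np.maximum(z, 0.0) if i < len(params) - 1 else z  # Eqs. (15), (16)
    return a

rng = np.random.default_rng(1)
sizes = [48, 64, 64, 31]  # hypothetical n^{[l]} for each layer
params = [(rng.normal(0.0, 0.1, (n, m)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
y = forward(rng.normal(size=48), params)
print(y.shape)  # (31,)
```

Each (W, b) pair holds the weights and biases of one layer; the final layer is left linear, matching Eq. (16).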
The ANN forward propagates to get the neuron activations for all $m$ training examples and eventually the predicted outputs $\mathbf{y}^{(k)}$ ($k = 1, ..., m$). A training algorithm that minimizes a cost function ($J$) is used to train an ANN. For the real-valued output problem that we currently have at hand, we use the sum-of-squares error function averaged over all training examples as the cost function, i.e.,

$$ J = \frac{1}{m} \sum_{k=1}^{m} \sum_{q=1}^{n_{out}} \left( y^{(k)}_q - \hat{y}^{(k)}_q \right)^2. \qquad (17) $$

A technique called Error Back Propagation [27] helps us estimate the gradients of the cost function w.r.t. the weights and biases, based on all the training examples we consider. Taking a small step from $w$ to $w + \delta w$ and from $b$ to $b + \delta b$ changes the cost by an amount $\delta J$, and we move in the direction in which $J$ decreases. This job of updating the parameters by a minimal amount in every iteration is handled by an Optimization Algorithm. Various types of optimization algorithms exist, like Gradient Descent, Stochastic Gradient Descent (SGD), Momentum, RMSProp [28], Adaptive Moment Estimation (ADAM) [29], etc., each with its own set of pros and cons. It is beyond the scope of this paper to provide a review of these, and we refer the reader to Sun et al. (2019) [30] for the same.

Updating the weights of a neural network by estimating the gradients based on all the training examples during each iteration of network-training may lead the neural network to get stuck in the local minima of the high-dimensional weight space. To tackle this problem, mini-batches of size $b$ ($< m$) are used. The fluctuations in the weight space brought about by mini-batch optimization algorithms like Mini-batch Stochastic Gradient Descent (MSGD), ADAM, etc., which compute the gradients based on a subset of the entire training set during each iteration, allow the neural network to jump to another possible minimum, as shown by Ruder (2016) [31]. In the literature, the terminology 'epoch' is used to denote the number of iterations of the optimization algorithm in which the network 'sees' all the training examples once:

$$ 1 \text{ epoch} = \left\lceil \frac{m}{b} \right\rceil \text{ iterations}, $$

where $b$ represents the number of training examples in a batch (i.e., the batch-size) and $\lceil \circ \rceil$ represents the ceiling function. The training process for the networks used can thus be summed up as follows:

1. Randomly initialize all the weights and initialize all the biases to zero.
2. Shuffle all the examples in the training set and split it into $n_{bat}$ batches, each having $b$ examples (except the last batch, which can have fewer than $b$ examples).
3. (a) Forward propagate to get the neuron activations using the training examples in the considered batch.
   (b) Compute the cost on the current batch ($J_r$) as

$$ J_r = \sum_{k=1}^{b} \sum_{q=1}^{n_{out}} \left( y^{(k)}_q - \hat{y}^{(k)}_q \right)^2, \qquad (18) $$

   where the subscript $r$ denotes the current batch.
   (c) Back propagate errors to get estimates of the gradients in the weight space, $\partial J_r / \partial w$ and $\partial J_r / \partial b$.
   (d) Update the weights and the biases using an optimization algorithm and the gradients.
   (e) Repeat steps 3a to 3d for all batches in the training set (i.e., $\lceil m/b \rceil$ iterations).
   (f) Compute the total cost ($J_{tot}$) as

$$ J_{tot} = \frac{1}{m} \sum_{r=1}^{n_{bat}} J_r. \qquad (19) $$

4. Repeat steps 2 and 3 for $t$ epochs or till $J_{tot}$ is minimized.

The hyper-parameters of the ANN model need to be tuned manually for training it successfully. While training, the performance on a separate validation set is tracked to ensure that the network does not over-fit to the training data. The ANN, with the trained weights and biases, is then used for making predictions on the test set. This way, it is ensured that testing is done on examples that the ANN has never seen before.

4 Strategy for estimating the full-sky power spectrum

In view of the discussions in §2 and §
3, Eq. 13 is essentially the mapping problem that an ANN is designed to solve, i.e.,

$$ \tilde{\mathbf{c}} \xrightarrow{ANN} \mathbf{c}, \qquad (20) $$

when trained on data using $(\mathbf{x}^{(k)}, \hat{\mathbf{y}}^{(k)}) = (\tilde{\mathbf{c}}^{(k)}, \mathbf{c}^{(k)})$, where $k = 1, ..., m$ represents the $m$ training examples. Partial-sky $\hat{\tilde{C}}_l$ at all multipoles are required for estimating the full-sky $\hat{C}_l$ at each multipole. Thus, we use 'dense' feed-forward ANNs to tackle this mapping problem.

A rich training data set, containing several realizations of the partial-sky and the full-sky spectra, is primary to training a good model for predicting the full-sky spectra from the partial-sky spectra. In the following sections, we will discuss how we obtain the simulated training data and build the network.

5 Simulations of the full-sky and partial-sky CMB angular power spectra
The HEALPix [32] software in python (healpy) has been used to simulate and analyze random realizations of the CMB temperature anisotropy. healpy.sphtfunc.synfast can simulate full-sky CMB maps given the theoretical $C^{th}_l$. The WMAP theoretical CMB temperature anisotropy power spectrum is used as the input theoretical $C^{th}_l$. The cosmological parameters used to generate this theoretical power spectrum model, $\{\Omega_b h^2, \Omega_m h^2, h, A, \tau, n_s\}$, were the best-fitting parameters obtained using a standard ΛCDM model with a power-law spectral index [33], where $\Omega_b$ is the baryonic energy density, $\Omega_m$ is the total matter density, $h$ is the Hubble constant (in units of 100 km/s/Mpc), $n_s$ is the scalar spectral index, $\tau$ is the optical depth to the decoupling surface, and $A$ is the parameter used to characterise the amplitude of the initial perturbations.

We work with low resolution HEALPix maps at $N_{side} = 16$, which corresponds to a pixel size of roughly 3.7°. $C^{th}_l$ up to $l_{max} = 32$ are provided as input to synfast for map generation. The simulated maps are also smoothed by a Gaussian beam with FWHM = 540 arcmin.

We present our analysis with two different masks. We use the WMAP 'Kp2 mask' as our first mask. It is provided at $N_{side} = 512$, which is downgraded to $N_{side} = 16$. All the non-integer values that may have arisen due to downgrading are then rounded off to either 0 or 1. We use the temperature mask given alongside the Planck PR3 CMB IQU maps produced by the COMMANDER pipeline as our second mask, hereafter referred to as the 'Planck T-mask' (≈ 12% masked pixels). This mask is available at $N_{side} = 2048$, which is downgraded to $N_{side} = 16$ and processed in the same way as described for the 'Kp2 mask'. We apply these raw (0/1) masks on the simulated CMB maps.

The angular power spectra ($\hat{C}_l$s) of the simulated full-sky maps are computed using healpy.sphtfunc.anafast, where we again set $l_{max} = 32$, as the maps were generated using the same $l_{max}$ on $C^{th}_l$. However, in the case of the partial-sky maps, we compute the angular power spectra ($\hat{\tilde{C}}_l$s) with $l_{max} = 47$, so as to extract as much information as possible from the masked maps, which assists our ANNs in making better predictions. CMB $\hat{C}_l$ are usually plotted as $l(l+1)C_l/2\pi$, following which a lot of the structure at smaller scales is brought out. The power spectra are also divided by an additional factor of $B_l^2 P_l^2$, where $B_l$ represents the beam window function of the Gaussian beam corresponding to FWHM = 540 arcmin and $P_l$ represents the pixel window function for $N_{side} = 16$, so as to account for the effects of smoothing and pixel area, respectively. Fig. 3 depicts the procedure for obtaining the required data for a realization of the CMB sky.

Following this procedure, a large number of random realizations of the CMB sky are simulated, and taking their angular power spectra gives us as many examples of the full-sky CMB $\hat{C}_l$. Then, each of these full-sky maps is masked with the 'Kp2 mask' and the angular power spectra are computed to get the corresponding examples of the partial-sky CMB $\hat{\tilde{C}}_l$. Further, each of the full-sky maps is masked with the 'Planck T-mask' to get a second, corresponding set of examples of the partial-sky CMB $\hat{\tilde{C}}_l$. This is the simulated data that we use to train the neural networks.

Figure 2: The Kp2 mask and the Planck T-mask at $N_{side} = 16$ that we use for our analyses. The masked region of the sky is shown in gray colour and the unmasked region is shown in green colour.
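The downgrading-and-rounding step described above can be emulated without healpy: in NESTED ordering, each low-resolution pixel contains $(512/16)^2 = 1024$ children, so downgrading is a block average. A sketch under the assumption that fractional weights are rounded at 0.5 (the threshold is not stated in the text):

```python
import numpy as np

def downgrade_mask(mask_hi, nside_hi, nside_lo):
    """Average each block of (nside_hi/nside_lo)^2 child pixels (NESTED
    ordering), then round the fractional weights back to a 0/1 mask."""
    ratio = (nside_hi // nside_lo) ** 2
    avg = mask_hi.reshape(-1, ratio).mean(axis=1)
    return np.where(avg >= 0.5, 1.0, 0.0)  # assumed rounding threshold

rng = np.random.default_rng(2)
mask_512 = (rng.random(12 * 512**2) > 0.15).astype(float)  # toy mask, ~15% zeros
mask_16 = downgrade_mask(mask_512, 512, 16)
print(mask_16.shape)            # (3072,) pixels at N_side = 16
print(set(np.unique(mask_16)))  # subset of {0.0, 1.0}
```

The toy high-resolution mask here is random and purely illustrative; the actual Kp2 and Planck T-masks are contiguous sky regions.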
Due to the statistical nature of the problem, the neural network needs to 'see' a large number of training examples so as to map the possible feature variations. Out of the pairs of examples of the full-sky $\hat{C}_l$ and the partial-sky $\hat{\tilde{C}}_l$ obtained using the 'Kp2 mask', the bulk of the $(\tilde{\mathbf{c}}, \mathbf{c})$ pairs are designated as the training set, while separate sets of pairs are kept aside as validation and test sets. A similar train, validation, and test set splitting is done for the $(\tilde{\mathbf{c}}, \mathbf{c})$ pairs obtained using the 'Planck T-mask'.

We train two different neural networks: 1. for estimating the full-sky $\hat{C}_l$ using the partial-sky $\hat{\tilde{C}}_l$ obtained with the 'Kp2 mask', and 2. for estimating the full-sky $\hat{C}_l$ using the partial-sky $\hat{\tilde{C}}_l$ obtained with the 'Planck T-mask'.

https://healpix.sourceforge.io/
https://github.com/healpy/healpy

Figure 3: The procedure for obtaining the required data for a realization of the CMB sky at $N_{side} = 16$. A realization of the CMB sky is obtained given the theoretical power spectrum, $C^{th}_l$. Its angular power spectrum gives the $\hat{C}_l$. The desired mask is then applied on the CMB map. The angular power spectrum of the masked CMB map gives the $\hat{\tilde{C}}_l$. The $\hat{\tilde{C}}_l$ are very large at higher multipoles due to the loss of information caused by masking.

Here forth, we refer to the analyses using the 'Kp2 mask' and the 'Planck T-mask' as cases I and II, respectively. For both ANNs, the input features are $\mathbf{x} = \tilde{\mathbf{c}}$ (generated using the corresponding mask), and the true output values are $\hat{\mathbf{y}} = \mathbf{c}$, as discussed in §
4.

6 Methodology

Considering the discussion in §5, the input features for the ANNs are the partial-sky $\hat{\tilde{C}}_l$ at $l = 0, ..., l_{max} = 47$, and we predict the full-sky spectra in the multipole range $2 \le l \le 32$.

We use the TensorFlow [34] machine-learning framework to set up, train, and analyze the neural networks. Training the networks is an iterative procedure. Mean normalization pre-processing is applied on the input features of the training set for both cases. This transforms all our input training features to have a similar range and allows the optimization algorithm to efficiently explore the weight-space. The mean and scale used to transform the training set are also used to process the validation and test sets, to ensure that the prediction features go through the same pre-processing as the training ones. This helps to accurately estimate the network reliability.

An ANN with two hidden layers, having 64 neurons each with ReLU activation, is found to work best for both cases. The training takes about 2 CPU hours on a personal computer with an Intel® Core™ i5-6200U (@ 2.30 GHz × 4, 8 GB RAM). The schematic of the ANN architecture used in this work is shown in Fig. 4. We use ADAM as our optimization algorithm and use mini-batches while training the ANNs. The performance of the ANNs on the corresponding validation sets is also tracked.

After training the networks and getting the predictions on the test sets, we bin the predicted full-sky spectra from $l = 21$ onward with a bin width of two, to get an unbiased estimate at the larger multipoles. Thus, six bins are obtained from $21 \le l \le 32$. The binned $\hat{C}_l$ is assigned to the central multipole of the corresponding bin.

Figure 4: Representation of the ANN architecture that we use for our analyses. The ANN has an input layer with 48 neurons, an output layer with 31 neurons, and two hidden layers with 64 neurons each and ReLU activation function.

7 Results
In this section, we discuss the results from simulations depicting the use of the two different masks. We mask each of the simulated CMB maps once with the Kp2 mask and once with the Planck T-mask, get the required power spectra, and create the corresponding training, validation, and test sets using the procedure described in §5. The trained ANNs are utilized to predict the full-sky CMB angular power spectra on the corresponding test sets (see §6).

Figure 5: Full-sky $\hat{C}_l$ predictions made by the ANN in case I, i.e., for the analysis with the Kp2 mask. The predictions are in good agreement with the ground truths at lower multipoles. At higher multipoles, the predictions smoothen out compared with the ground truths.

Figure 6: Same as Figure 5 but for case II, i.e., the analysis with the Planck T-mask.

Figure 7: Figure showing the mean and the SEM of the $\hat{C}_l$ differences ($\mathbf{c}_{original} - \mathbf{c}_{predicted}$) over the test set for case I, i.e., when the Kp2 mask is used (top), and for case II, i.e., when the Planck T-mask is used (bottom). The mean is less than $3\sigma$ at all of the multipoles except at $l = 10, 14$ when using the Kp2 mask and at $l = 14$ when using the Planck T-mask, where it is between $3\sigma$ and $4\sigma$.

Figure 8: Figure showing the statistics, mean (top) and standard deviation (bottom), of the predicted full-sky $\hat{C}_l$s and those of the ground truths for the analyses with both masks. The theoretically expected statistics are also shown. The mean of the predictions is in good agreement with $C^{th}_l$. The standard deviation of the predictions is smaller than the square root of the cosmic variance.

Figure 9: Correlation matrices of the predicted full-sky $\hat{C}_l$s for case I, i.e., for the analysis with the Kp2 mask (left), and for case II, i.e., for the analysis with the Planck T-mask (right). The correlations are negligible at lower multipoles. Some correlations are present at higher multipoles. The correlations obtained when using the Planck T-mask are lower than those obtained with the Kp2 mask.

Figure 10: Left panel: Covariance matrix of the predicted $\hat{C}_l$s using the ANN for case I, i.e., when the Kp2 mask is used. The matrix shows that the predictions have some covariance at lower multipoles and in a region around the diagonal at higher multipoles, while the covariances are much lower in the remaining areas. Right panel: Covariance matrix of the estimated $\hat{C}_l$s using the pseudo-$C_l$ method with the Kp2 mask. The matrix has a smooth structure, with the covariances being larger at higher multipoles. Overall, the covariances observed in the pseudo-$C_l$ estimates are larger than those obtained using the ANN predictions.

Figure 11: Same as Figure 10 but for case II, i.e., when the Planck T-mask is used.

Fig. 5 and Fig. 6 show some predictions of the full-sky angular power spectrum made by the ANNs on the test sets, along with the ground truth (original) power spectra, for cases I and II, respectively. At lower multipoles, the $\hat{C}_l$ predictions of the neural networks almost trace the original $\hat{C}_l$.
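The diagnostics used throughout this section (mean and SEM of the prediction errors, and the multipole-multipole correlation and covariance matrices of the predictions) reduce to standard numpy operations. A sketch with hypothetical random stand-ins for the test-set spectra:

```python
import numpy as np

rng = np.random.default_rng(3)
n_test, n_l = 10_000, 31  # hypothetical test-set size; 31 multipoles (l = 2..32)
c_original = rng.normal(1.0, 0.1, (n_test, n_l))                 # stand-in ground truths
c_predicted = c_original + rng.normal(0.0, 0.02, (n_test, n_l))  # stand-in predictions

diff = c_original - c_predicted
mean_diff = diff.mean(axis=0)
sem = diff.std(axis=0, ddof=1) / np.sqrt(n_test)  # standard error of the mean

corr = np.corrcoef(c_predicted, rowvar=False)  # multipole-multipole correlations
cov = np.cov(c_predicted, rowvar=False)        # multipole-multipole covariances
print(corr.shape, cov.shape)  # (31, 31) (31, 31)
```

With rowvar=False, each column (multipole) is treated as a variable and each row as an observation, matching the layout of the test sets described above.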
The mask leads to a greater loss of information at higher multipoles (see Fig. 3), and consequently the ANNs are unable to reconstruct the features or fluctuations in the power spectra efficiently. The ANNs compensate for this information loss, however, by predicting the full-sky Ĉ_l increasingly close to C_l^th at larger multipoles, even though they are provided with no information about C_l^th.

The power spectrum differences (Ĉ_l^original - Ĉ_l^predicted) are obtained for the 10 examples in the corresponding test sets. Fig. 7 plots the mean of these differences along with the standard error of the mean (SEM) for cases I and II, respectively. In both cases, the mean of the Ĉ_l differences is below 3σ at most multipoles; at l = 10 and l = 14 for case II, it lies between 3σ and 4σ. This indicates that the predictions of the full-sky spectra are unbiased.

We compute the overall statistics of the predictions and compare them with those of the ground truths and the theoretical C_l^th in Fig. 8. The mean of the 10 predicted full-sky Ĉ_l traces C_l^th and the mean of the original ground-truth Ĉ_l in both cases. However, the standard deviation of the predictions is smaller than both the square root of the cosmic variance and the standard deviation of the ground truths. The most likely reason is the one discussed earlier: to make up for the loss of information due to masking, and to avoid going arbitrarily wrong, the ANN predictions move progressively closer to the mean C_l^th as the multipole increases. The standard deviations of the predictions also show a small dip caused by the binning of the predicted Ĉ_l from l = 21 onwards.

We further present the correlation matrices of the full-sky Ĉ_l predicted by the ANNs for cases I and II in Fig. 9. Correlations are almost absent at lower multipoles, up to l ≲ 22. The higher multipoles show a maximum correlation of about 24% near the last three bins, with the other correlations remaining small. The Planck T-mask masks ≈ 3% fewer pixels than the Kp2 mask, which leaves somewhat more information in the corresponding partial-sky spectrum C̃_l; the corresponding ANN is able to use this information to make predictions with comparatively smaller correlations.

Finally, the covariance matrices of the predictions made by the ANNs and of the estimates given by the method of pseudo-C_l on the test sets are shown side by side for cases I and II in Fig. 10 and Fig. 11, respectively. The covariance matrices of the two methods are structured differently from one another, with the covariances being significantly smaller for our ANN-based method in both cases I and II.

Conclusions

In this paper, we have demonstrated for the first time that supervised machine learning with Artificial Neural Networks can be employed to estimate the full-sky CMB angular power spectrum Ĉ_l effectively from the partial-sky spectrum C̃_l.

We have considered two different masks in our analysis, the Kp2 mask and the Planck T-mask, and have shown that an ANN with just two hidden layers can be utilized for the purpose. The optimum number of layers and of neurons per layer was found by training various neural network architectures until we obtained the best results for both cases. We have not included detector noise in our simulations, which is a reasonable assumption for temperature and power spectrum analysis over large angular scales at N_side = 16.

Both masks conceal comparable regions of the sky. The predictions made by the ANNs on the corresponding test sets show that the estimates are equally good (see Fig. 5 and Fig. 6). The mean and the SEM of the 10 Ĉ_l differences (Ĉ_l^original - Ĉ_l^predicted) on our test sets for both cases also show that the full-sky Ĉ_l predictions are unbiased (see Fig. 7).
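The unbiasedness check summarized above (the mean of the Ĉ_l differences compared against its standard error at each multipole) can be sketched in a few lines of numpy. The array names and shapes below are illustrative assumptions, not the authors' actual pipeline:

```python
import numpy as np

def bias_check(cl_original, cl_predicted, z_max=3.0):
    """Test whether predicted spectra are unbiased at each multipole.

    cl_original, cl_predicted : arrays of shape (n_examples, n_multipoles),
    one full-sky power spectrum per row (hypothetical test-set layout).
    Returns the mean difference, its standard error (SEM), and a boolean
    mask of multipoles where the mean difference exceeds z_max * SEM.
    """
    diff = cl_original - cl_predicted                    # per-example differences
    mean_diff = diff.mean(axis=0)                        # average over test examples
    sem = diff.std(axis=0, ddof=1) / np.sqrt(len(diff))  # standard error of the mean
    exceeds = np.abs(mean_diff) > z_max * sem            # e.g. beyond 3 sigma
    return mean_diff, sem, exceeds
```

A prediction set is consistent with being unbiased when `exceeds` is False at (almost) all multipoles.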
By comparing the overall statistics of the predictions with those of the ground truths and C_l^th, we have shown that our ANN does not output negative or arbitrarily high full-sky power spectra at the higher multipoles, where the information loss due to the cut sky is significant. This helps us accurately recover the mean C_l^th while obtaining a decent unbiased estimate of Ĉ_l even at larger multipoles. However, the ANN predictions do not fully recover the cosmic variance on the test sets; preserving the cosmic variance of the Ĉ_l predictions will be the subject of a subsequent study of this ANN method.

The correlation matrices of the predicted full-sky spectra in both analyses (see Fig. 9) suggest that the Ĉ_l predictions are mostly uncorrelated, with some residual correlation at the higher multipoles where the partial-sky spectra are strongly correlated owing to the loss of information on the cut sky. We have also compared the covariance matrices of the full-sky spectra predicted by the ANNs with those of the full-sky spectra estimated using the pseudo-C_l method (see Fig. 10 and Fig. 11); the ANNs obtain significantly smaller covariances at all multipoles than the latter method in both cases I and II. However, we restate that the ANNs achieve this at the cost of a smaller variance of the Ĉ_l predictions compared with the cosmic variance.

Nonetheless, our initial analysis has produced encouraging results and demonstrates the capacity of ANNs as a new alternative method for estimating full-sky CMB angular power spectra. Further research is needed to refine the method and improve the predictions; a project focused on a higher-resolution analysis that includes instrumental noise would be particularly interesting. A useful advantage of our method is that it does not require a machine with very high computational power.
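The multipole-multipole covariance and correlation matrices compared above can be estimated directly from a stack of spectra. A minimal numpy sketch, assuming the predicted (or pseudo-C_l) spectra are stored row-wise in a 2-D array (names are illustrative):

```python
import numpy as np

def spectrum_covariance(cl_samples):
    """Multipole-multipole covariance and correlation of power spectra.

    cl_samples : array of shape (n_examples, n_multipoles), one spectrum
    per row (illustrative layout). Returns (covariance, correlation),
    both of shape (n_multipoles, n_multipoles).
    """
    cov = np.cov(cl_samples, rowvar=False)   # covariance across test examples
    sigma = np.sqrt(np.diag(cov))            # per-multipole standard deviations
    corr = cov / np.outer(sigma, sigma)      # normalise to a correlation matrix
    return cov, corr
```

Applying this once to the ANN predictions and once to the pseudo-C_l estimates yields the two matrices being compared; the off-diagonal entries of `corr` quantify the residual multipole coupling.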
Even for a higher-resolution analysis, ANNs will converge reliably fast, since they can compute complex functional mappings between high-dimensional input and output features using just a few layers and neurons in the network together with advanced optimization algorithms.

In this article, we have presented an unbiased estimator of the full-sky Ĉ_l from the partial-sky C̃_l and accurately recovered C_l^th using neural networks. Going forward, we would like to probe the reliability of the full-sky spectrum predictions made by the neural networks in measurements of the cosmological parameters, and the application of this developed ANN framework to cosmological analyses.

Acknowledgements
The authors would like to thank Vipin Sudevan for their help. P.C. would like to thank Nirnay Roy and Prashant Shukla for their help and insightful discussions at various stages of this work. P.C. would also like to acknowledge the support of DST for providing the INSPIRE scholarship. The authors acknowledge the use of the open-source packages HEALPix (https://healpix.sourceforge.io/) and TensorFlow, and thank the respective groups for the same.

References

[1] Charles L. Bennett, Anthony J. Banday, et al. Four-year COBE DMR cosmic microwave background observations: maps and basic results. The Astrophysical Journal Letters, 464(1):L1, 1996.
[2] C. L. Bennett, R. S. Hill, et al. First-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: foreground emission. The Astrophysical Journal Supplement Series, 148(1):97, 2003.
[3] Peter A. R. Ade, N. Aghanim, et al. Planck 2013 results. XVI. Cosmological parameters. Astronomy & Astrophysics, 571:A16, 2014.
[4] Jonathan L. Sievers, Renee A. Hlozek, et al. The Atacama Cosmology Telescope: cosmological parameters from three seasons of data. Journal of Cosmology and Astroparticle Physics, 2013(10):060, 2013.
[5] Z. Hou, C. L. Reichardt, et al. Constraints on cosmology from the cosmic microwave background power spectrum of the 2500 deg^2 SPT-SZ survey. The Astrophysical Journal, 782(2):74, 2014.
[6] G. J. Stacey, M. Aravena, et al. CCAT-prime: science with an ultra-widefield submillimeter observatory on Cerro Chajnantor. In Ground-based and Airborne Telescopes VII, volume 10700, page 107001M. International Society for Optics and Photonics, 2018.
[7] P. de Bernardis, P. A. R. Ade, et al. Exploring cosmic origins with CORE: the instrument. Journal of Cosmology and Astroparticle Physics, 2018(04):015, April 2018.
[8] Arno A. Penzias and Robert Woodrow Wilson. A measurement of excess antenna temperature at 4080 Mc/s. The Astrophysical Journal, 142:419-421, 1965.
[9] D. J. Fixsen, E. S. Cheng, et al. The cosmic microwave background spectrum from the full COBE FIRAS data set. The Astrophysical Journal, 473(2):576, 1996.
[10] H. K. Eriksen, I. J. O'Dwyer, et al. Power spectrum estimation from high-resolution maps by Gibbs sampling. The Astrophysical Journal Supplement Series, 155(2):227-241, December 2004.
[11] H. K. Eriksen, J. B. Jewell, et al. Joint Bayesian component separation and CMB power spectrum estimation. The Astrophysical Journal, 676(1):10-32, March 2008.
[12] Rajib Saha, Pankaj Jain, and Tarun Souradeep. A blind estimation of the angular power spectrum of CMB anisotropy from WMAP. The Astrophysical Journal, 645(2):L89-L92, July 2006.
[13] Rajib Saha, Simon Prunet, et al. CMB anisotropy power spectrum using linear combinations of WMAP maps. Physical Review D, 78:023003, July 2008.
[14] Vipin Sudevan and Rajib Saha. A global ILC approach in pixel space over large angular scales of the sky using CMB covariance matrix. The Astrophysical Journal, 867(1):74, November 2018.
[15] J. R. Bond, A. H. Jaffe, and L. Knox. Estimating the power spectrum of the cosmic microwave background. Physical Review D, 57:2117-2137, February 1998.
[16] Benjamin D. Wandelt and Frode K. Hansen. Fast, exact CMB power spectrum estimation for a certain class of observational strategies. Physical Review D, 67:023001, January 2003.
[17] Justin Alsing, Alan Heavens, et al. Hierarchical cosmic shear power spectrum inference. Monthly Notices of the Royal Astronomical Society, 455(4):4452-4466, December 2015.
[18] Frode K. Hansen, Krzysztof M. Górski, and Eric Hivon. Gabor transforms on the sphere with applications to CMB power spectrum estimation. Monthly Notices of the Royal Astronomical Society, 336(4):1304-1328, 2002.
[19] P. J. E. Peebles. Statistical analysis of catalogs of extragalactic objects. I. Theory. The Astrophysical Journal, 185:413-440, October 1973.
[20] Benjamin D. Wandelt, Eric Hivon, and Krzysztof M. Górski. Cosmic microwave background anisotropy power spectrum statistics for high precision cosmology. Physical Review D, 64:083003, September 2001.
[21] Eric Hivon, Krzysztof M. Górski, et al. MASTER of the cosmic microwave background anisotropy power spectrum: a fast method for statistical analysis of large and complex cosmic microwave background data sets. The Astrophysical Journal, 567(1):2-17, March 2002.
[22] M. Reinecke and D. S. Seljebotn. Libsharp: spherical harmonic transforms revisited. Astronomy & Astrophysics, 554:A112, 2013.
[23] Franz Elsner, Boris Leistedt, and Hiranya V. Peiris. Unbiased methods for removing systematics from galaxy clustering measurements. Monthly Notices of the Royal Astronomical Society, 456(2):2095-2104, December 2015.
[24] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251-257, 1991.
[25] Allan Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143-195, 1999.
[26] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[27] Robert Hecht-Nielsen. Theory of the backpropagation neural network. In Neural Networks for Perception, pages 65-93. Elsevier, 1992.
[28] G. Hinton, N. Srivastava, and K. Swersky. Neural networks for machine learning, lecture 6e: RMSProp, divide the gradient by a running average of its recent magnitude. 2012.
[29] Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[30] S. Sun, Z. Cao, et al. A survey of optimization methods from a machine learning perspective. IEEE Transactions on Cybernetics, 50(8):3668-3681, 2020.
[31] Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
[32] Krzysztof M. Górski, Eric Hivon, et al. HEALPix: a framework for high-resolution discretization and fast analysis of data distributed on the sphere. The Astrophysical Journal, 622(2):759, 2005.
[33] David N. Spergel, Licia Verde, et al. First-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: determination of cosmological parameters. The Astrophysical Journal Supplement Series, 148(1):175, 2003.
[34] Martín Abadi, Ashish Agarwal, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.