Joint Bayesian component separation and CMB power spectrum estimation
H. K. Eriksen, J. B. Jewell, C. Dickinson, A. J. Banday, K. M. Gorski, C. R. Lawrence
aa r X i v : . [ a s t r o - ph ] N ov Draft version October 22, 2018
Preprint typeset using L A TEX style emulateapj v. 08/22/09
JOINT BAYESIAN COMPONENT SEPARATION AND CMB POWER SPECTRUM ESTIMATION
H. K. Eriksen , J. B. Jewell , C. Dickinson , A. J. Banday , K. M. G´orski , C. R. Lawrence (Dated: Received - / Accepted -) Draft version October 22, 2018
ABSTRACTWe describe and implement an exact, flexible, and computationally efficient algorithm for jointcomponent separation and CMB power spectrum estimation, building on a Gibbs sampling framework.Two essential new features are 1) conditional sampling of foreground spectral parameters, and 2) jointsampling of all amplitude-type degrees of freedom (e.g., CMB, foreground pixel amplitudes, and globaltemplate amplitudes) given spectral parameters. Given a parametric model of the foreground signals,we estimate efficiently and accurately the exact joint foreground-CMB posterior distribution, andtherefore all marginal distributions such as the CMB power spectrum or foreground spectral indexposteriors. The main limitation of the current implementation is the requirement of identical beamresponses at all frequencies, which restricts the analysis to the lowest resolution of a given experiment.We outline a future generalization to multi-resolution observations. To verify the method, we analysesimple models and compare the results to analytical predictions. We then analyze a realistic simulationwith properties similar to the 3-yr WMAP data, downgraded to a common resolution of 3 ◦ FWHM.The results from the actual 3-yr WMAP temperature analysis are presented in a companion Letter.
Subject headings: cosmic microwave background — cosmology: observations — methods: numerical INTRODUCTION
Great advances have been made recently both in ex-perimental techniques for studying the cosmic microwavebackground (CMB) and in the measurements themselves.The angular power spectrum of temperature fluctuationshas been characterized over more than three decadesin angular scale (Hinshaw et al. 2007; Kuo et al. 2007;Readhead et al. 2004), and even the E-mode polariza-tion spectrum has now been measured to some preci-sion (Ade et al. 2007; Page et al. 2007; Montroy et al.2006; Sievers et al. 2007). In the coming years, evengreater improvements in sensitivity are expected, withthe Planck nearing completion.As the sensitivity of CMB experiments improves, therequirements on the control and characterization of sys-tematic effects also increase. It is of critical importanceto propagate properly the uncertainties caused by sucheffects through to the CMB power spectrum and cos-mological parameters, in order not to underestimate thefinal uncertainties, and thereby draw incorrect cosmolog-ical conclusions.A prime example of such systematic effects is non-cosmological foregrounds in the form of galactic andextra-galactic emission. With an amplitude rivaling thatof the temperature signal over a significant fraction ofthe sky and completely dominating the polarization sig-nal over most of the sky, the diffuse signal from our owngalaxy must be separated accurately from the CMB sig- email: [email protected] Institute of Theoretical Astrophysics, University of Oslo, P.O.Box 1029 Blindern, N-0315 Oslo, Norway Centre of Mathematics for Applications, University of Oslo,P.O. Box 1053 Blindern, N-0316 Oslo Jet Propulsion Laboratory, California Institute of Technology,Pasadena CA 91109 Max-Planck-Institut f¨ur Astrophysik, Karl-Schwarzschild-Str.1, Postfach 1317, D-85741 Garching bei M¨unchen, Germany Warsaw University Observatory, Aleje Ujazdowskie 4, 00-478Warszawa, Poland nal in order not to bias the cosmological conclusions.Further, the uncertainties in the separation process mustbe propagated through to the errors on the CMB powerspectrum and cosmological parameters.These problems have been discussed extensivelyin the literature, and many different approaches toboth power spectrum analysis and component sepa-ration have been proposed. Two popular classes ofpower spectrum estimation methods are the pseudo- C ℓ estimators (e.g., Wright et al. 1994; Hivon et al.2002; Szapudi et al. 2001) and maximum-likelihoodmethods (e.g., G´orski 1994, 1997; Bond et al. 1998).For a review and comparison of these methods, seeEfstathiou (2004). Examples of component sep-aration methods are the Maximum Entropy Method(Barreiro et al. 2004; Bennett et al. 2003b; Hobson et al.1998; Stolyarov et al. 2002, 2005), the Internal Lin-ear Combination method (Bennett et al. 2003b;Tegmark et al. 2003; Eriksen et al. 2004a), Wiener fil-tering (Bouchet & Gispert 1999; Tegmark & Efstathiou1996), the Independent Component Analysis method(Maino et al. 2002, 2003; Donzelli et al. 2006;Stivoli et al. 2006), and direct likelihood estimation(Brandt et al. 1994; G´orski et al. 1996; Banday et al.1996; Eriksen et al. 2006).The final step in a modern cosmological analysispipeline is typically to estimate a small set of parametersfor some cosmological model, which in practice is doneby mapping out the parameter posteriors (or likelihoods)using an MCMC code (e.g., CosmoMC; Lewis and Bri-dle 2002). To do so, one must establish an expression forthe likelihood L ( C ℓ ) = P ( d | C ℓ ), where C ℓ is a theoreti-cal CMB power spectrum and d are the observed data.It is therefore essential that the methods used in the baseanalysis pipeline (e.g., map making, component separa-tion, power spectrum estimation) allow one to estimatethis function both accurately and efficiently.A particularly appealing framework for this task is theCMB Gibbs sampler, pioneered by Jewell et al. (2004)and Wandelt et al. (2004). While a brute-force CMBlikelihood evaluation code must invert a dense signal-plus-noise covariance matrix, C = S + N , at a computa-tional cost of O ( N ), N pix being the number of pixels inthe data set, the Gibbs sampler only requires the signaland noise covariance matrices separately. Consequently,the algorithmic scaling is dramatically reduced, typicallyto either O ( N / ) or O ( N ) for data with white or cor-related noise, respectively.In addition to being a highly efficient CMB likeli-hood evaluator in its own right, as demonstrated byseveral previous analyses of real data (O’Dwyer et al.2004; Eriksen et al. 2007a,b), the Gibbs sampler also of-fers unique capabilities for propagating systematic uncer-tainties end-to-end. Any effect for which there is a well-defined sampling algorithm, either jointly with or con-ditionally on other quantities, can be propagated seam-lessly through to the final posteriors. One example of thisis beam uncertainties. Given some stochastic descriptionof the beam, for instance a mean harmonic space profileand an associated covariance matrix, one could sample ateach step in the Markov chain one particular realizationfrom this model and use the resulting beam for the nextCMB sampling steps, allowing for a short burn-in period.The CMB uncertainties will then increase appropriately.Similar approaches could be taken for uncertainties ingain calibration and noise estimation.However, rather than simply propagating a particularerror term through the system, one often wants to es-timate the characteristics of the effect directly from thedata. In that case, a parametric model P ( θ | d ), θ be-ing a set of parameters describing the effect, must bepostulated. Then, if it is both statistically and compu-tationally feasible to sample from this distribution, theeffect may be included in the joint analysis, and all jointposteriors will respond appropriately.In this paper, we describe how non-cosmologicalfrequency-dependent foreground signals may be includedin a Gibbs sampler. In this framework the CMB signal isassumed Gaussian and isotropic, while the foregroundsare modeled either in terms of fixed spatial templates(e.g., monopoles, dipoles, low/high-frequency observa-tions) or in terms of a free amplitude and spectral re-sponse function at each pixel. Our current code assumesidentical angular resolution for all frequency bands, butwe outline in § ◦ FWHM, we are able to produce the exact likelihood up to ℓ ∼ REVIEW OF BASIC ALGORITHMS
The algorithm developed in this paper is a essen-tially a hybrid of two previous algorithms, namely theCMB Gibbs sampler developed by Jewell et al. (2004),Wandelt et al. (2004) and Eriksen et al. (2004b), and theforeground MCMC sampler developed by Eriksen et al.(2006). In this section, we review these algorithms, em-phasizing an intuitive and pedagogical introduction tothe underlying ideas. In the next section we present theextensions required to make the hybrid code functional.Note that while we discuss temperature measure-ments only in this paper, the methodology for analyz-ing polarization measurements is completely analogous.For example, see Larson et al. (2007) and Eriksen et al.(2007b) for details on polarized power spectrum analysisthrough Gibbs sampling.
The CMB Gibbs sampler
We first review the Gibbs sampler for CMB tempera-ture measurements.
The CMB posterior
We choose our first data model to read d = s + n , (1)where d are the observed data, s is the CMB sky sig-nal, and n is instrumental noise. Complications such asmulti-frequency observations and beam convolution willbe introduced at a later stage.We assume both the CMB signal and noise to be Gaus-sian random fields with vanishing mean and covariancematrices S and N , respectively. In harmonic space,where s = P ℓ,m a ℓm Y ℓm , the CMB covariance matrixis given by C ℓm,ℓ ′ m ′ = h a ∗ ℓm a ℓ ′ m ′ i = C ℓ δ ℓℓ ′ δ mm ′ , C ℓ be-ing the angular power spectrum. The noise matrix N isleft unspecified for now, but we note that for white noiseit is diagonal in pixel space, N ij = σ i δ ij , for pixels i and j and noise variance σ i .Our goal is to estimate both the sky signal s and thepower spectrum C ℓ , which in a Bayesian analysis meansto compute the posterior distribution P ( s , C ℓ | d ). ByBayes’ theorem, this distribution may be written as P ( s , C ℓ | d ) ∝ P ( d | s , C ℓ ) P ( s , C ℓ ) (2) ∝ P ( d | s , C ℓ ) P ( s | C ℓ ) P ( C ℓ ) , (3)where P ( C ℓ ) is a prior on C ℓ , which we take to be uni-form in the following. Our final power spectrum distri-bution may thus be interpreted as the likelihood, andintegrated directly into existing cosmological parameterMCMC codes. Since we have assumed Gaussianity, thejoint posterior distribution may thus be written as P ( s , C ℓ | d ) ∝ e − ( d − s ) t N − ( d − s ) Y ℓ e − ℓ +12 σℓCℓ C ℓ +12 ℓ P ( C ℓ ) , (4)where σ ℓ ≡ ℓ +1 P ℓm = − ℓ | a ℓm | is the angular powerspectrum of the full-sky CMB signal. Gibbs sampling
In principle, we could map out this distribution over agrid in s and C ℓ , and the task would be done. Unfortu-nately, since the number of grid points required for suchan analysis scales exponentially with the number of freeparameters, this approach is not feasible.A potentially much more efficient approach is to mapout the distribution by sampling. However, direct sam-pling from the joint distribution in Equation 4 is diffi-cult even from an algorithmic point of view alone; weare not aware of any textbook approach for this. Andeven if there were, it would most likely involve inversesof the joint S + N covariance matrix, with a prohibitive O ( N ) scaling, in order to transform to the eigenspaceof the system.This is the situation in which Jewell et al. (2004) andWandelt et al. (2004) proposed a particular Gibbs sam-pling scheme. For a general introduction to the algo-rithm, see, e.g., Gelfand & Smith (1990). In short, thetheory of Gibbs sampling tells us that if we want to sam-ple from the joint density P ( s , C ℓ | d ), we can alternatelysample from the respective conditional densities as fol-lows, s i +1 ← P ( s | C iℓ , d ) (5) C i +1 ℓ ← P ( C ℓ | s i +1 , d ) . (6)Here ← indicates sampling from the distribution on theright-hand side. After some burn-in period, during whichthe samples must be discarded, the joint samples ( s i , C iℓ )will be drawn from the desired density. Thus, the prob-lem is reduced to that of sampling from the two condi-tional densities P ( s | C ℓ , d ) and P ( C ℓ | s , d ). Sampling algorithms for conditional distributions
We now describe the sampling algorithms for eachof these two conditional distributions, starting with P ( C ℓ | s , d ). First, note that P ( C ℓ | s , d ) = P ( C ℓ | s ); if wealready know the CMB sky signal, the data themselvestell us nothing new about the CMB power spectrum.Next, since the sky is assumed Gaussian and isotropic,the distribution reads P ( C ℓ | s ) ∝ e − s tℓ S − ℓ s ℓ p | S ℓ | = e − ℓ +12 σℓCℓ C ℓ +12 ℓ , (7)which, when interpreted as a function of C ℓ , is knownas the inverse Gamma distribution. Fortunately, thereexists a simple textbook sampling algorithm for this dis-tribution (e.g., Eriksen et al. 2004b), and we refer theinterested reader to the previous papers for details.The sky signal algorithm is even simpler from a statis-tical point of view, although more involved to implement.Defining the so-called mean-field map (or Wiener filtereddata) to be ˆ s = ( S − + N − ) − N − d , the conditional skysignal distribution may be written as P ( s | C ℓ , d ) ∝ P ( d | s , C ℓ ) P ( s | C ℓ ) (8) ∝ e − ( d − s ) t N − ( d − s ) e − s t S − s (9) ∝ e − ( s − ˆ s ) t ( S − + N − )( s − ˆ s ) . (10)Thus, P ( s | C ℓ , d ) is a Gaussian distribution with meanequals to ˆ s and a covariance matrix equals to ( S − + N − ) − . Sampling from this Gaussian distribution is straight-forward, but computationally somewhat cumbersome.First, draw two random white noise maps ω and ω withzero mean and unit variance. Then solve the equation (cid:2) S − + N − (cid:3) s = N − d + S − ω + N − ω . (11)for s . Since the white noise maps have zero mean, oneimmediately sees that h s i = ˆ s , while a few more calcula-tions show that h ss t i = ( S − + N − ) − .The problematic part about this sampling step is thesolution of the linear system in Equation 11. Since thisa ∼ × system for current CMB data sets, it can-not be solved by brute force. Instead, one must use amethod called Conjugate Gradients (CG), which onlyrequires multiplication of the coefficient matrix on theleft-hand side, not inversion. For details on these com-putations, together with some ideas on preconditioning,see Eriksen et al. (2004b). Generalization to multi-frequency data
For notational transparency, the discussion in the pre-vious sections was limited to analysis of a single sky map,and did not include the effect of an instrumental beam.We now review the full equations for the general case.See Eriksen et al. (2004b) for full details.Let d ν denote an observed sky map at frequency ν , N ν its noise covariance matrix, and A ν convolution withthe appropriate instrumental beam response. Equation11 then generalizes to " S − + X ν A tν N − ν A ν s = X ν A tν N − ν d ν + S − ω + X ν A tν N − ν ω ν . (12)Note that we now draw one white noise map for eachfrequency band, ω ν . The sampling procedure for P ( C ℓ | s )is unchanged. Computational considerations
Finally, we make two comments regarding numericalstability and computational expense. First, note that theelements of s have a variance equal to the CMB powerspectrum, which goes as C ℓ ∼ ℓ − . To avoid round-offerrors over the large dynamic range in the solution, it isnumerically advantageous to solve first for x = S − s inthe CG search, and then to solve (trivially) for s . Thesystem solved by CG in practice is thus " + S X ν A tν N − ν A ν S x = S X ν A tν N − ν d ν + ω + S X ν A tν N − ν ω ν . (13)Second, solving this equation by CG involves multipli-cation with expression in the brackets on the left-handside, and therefore scales as the most expensive operationin the coefficient matrix. For white noise, N ij = σ i δ ij ,this is the spherical harmonic transform required betweenpixel (for noise covariance matrix multiplication) andharmonic (for beam convolution and signal covariancematrix multiplication) space, with a scaling of O ( N / ).For correlated noise, it is the multiplication with a dense N pix × N pix inverse noise covariance matrix, with a scal-ing of O ( N ). The foreground sampler
The previous section described how to sample fromthe exact CMB posterior P ( s , C ℓ | d ) by Gibbs sampling.In this section, we very briefly review the algorithm forsampling general sky signals presented by Eriksen et al.(2006).First we define a parametric frequency model for thetotal sky signal, S ν ( θ ), θ representing the set of all freeparameters in the model. A simple example would be S ( T cmb , A s , β s ) = T cmb + A s a ( ν ) (cid:16) νν (cid:17) β s , where T cmb isthe CMB temperature, A s is the synchrotron emissionamplitude relative to a reference frequency ν , β s is thesynchrotron spectral index, and a ( ν ) is the conversionfactor between antenna and thermodynamic temperaturefor differential measurements. Note that no constraintsare imposed on the form of the spectral model in general,beyond the fact that it should contain at most N ν − N ν being the number of frequency bands ofthe experiment. In practice, one should also avoid modelsthat contain nearly degenerate parameters.Our goal is now to compute the posterior distribution P ( θ | d ) for each pixel. For this to be computationally fea-sible, we make two assumptions. First, we assume thatthe noise is uncorrelated between pixels, and second, thatthe instrumental beams are identical between frequencybands. If so, the data may be analyzed pixel-by-pixel,and the likelihood for a single pixel simply reads − L ( θ ) = χ ( θ ) = X ν (cid:18) d ν − S ν ( θ ) σ ν (cid:19) . (14)The posterior is as usual given by P ( θ | d ) ∝ L ( θ ) P ( θ ),where P ( θ ) is a prior on θ . Given this likelihood andprior, it is straightforward to sample from P ( θ | d ), forinstance by Metropolis-Hastings MCMC (Eriksen et al.2006), or by inversion sampling as described later in thispaper. JOINT CMB AND FOREGROUND SAMPLING
The main goal of this paper is to merge the two algo-rithms described in §§ P ( s , C ℓ , θ | d ). In this paper we focus on a matchedbeam response experiment, for which A ν = A , whichis sufficient for low-resolution analysis of high-resolutionexperiments such as WMAP and Planck. Data model and priors
We define the joint data model to be d ν = As + M X i =1 a ν,i t i + N X j =1 b j f j ( ν ) f j ++ K X k =1 c k g k ( ν ; θ k ) + n ν . (15) The first term on the right-hand side is the CMB sky sig-nal. The second term is a sum over M spatial templates, t i , each having a free amplitude a ν,i at each frequency,for example monopole and dipole components. The thirdterm is a sum over N spatial templates, f j with a fixedfrequency scaling f j ( ν ) and a single overall amplitude b j , for example the H α template (e.g., Dickinson et al.2003) coupled to a power law spectrum with free-freespectral index of β ff = − .
15. Such spatial templates area way of incorporating constraints on the sky from othermeasurements. Their value depends on the validity ofassumptions about spectra over large frequency ranges(e.g., H α as a proxy for free-free emission at CMB fre-quencies). Nevertheless, CMB experiments with too fewfrequencies to constrain foregrounds adequately on theirown require such templates to provide additional con-straints. Both t i and f j are assumed to be convolved tothe appropriate angular resolution of the experiment.The fourth term, the most important novel feature ofthis paper, is a sum over K foreground components eachgiven by an overall amplitude c k ( p ) and a frequency spec-trum g k ( ν ; θ k ( p )) at each pixel p . The spectral parame-ters θ k ( p ) may or may not be allowed to vary from pixelto pixel. By allowing independent frequency spectra atevery single pixel, the model is very general, and capableof describing virtually any conceivable sky signal.The fifth and last term, n ν , is instrumental noise.In the current implementation of our codes, we allowonly foreground spectra parametrized by a single spectralindex, g ( ν ; β ) = G ( ν )( ν/ν ) β , where G ( ν ) is an arbitrary,but fixed, function of frequency and ν is a referencefrequency. A typical example is synchrotron emission,which may be modelled accurately over a wide frequencyrange by a simple power law (in intensity, flux density,or antenna temperature units) with a spatially varyingspectral index. The CMB is most naturally described interms of thermodynamic units, and we adopt this con-vention in our codes. The corresponding synchrotronmodel is therefore g ( ν ; β ) = a ( ν )( ν/ν ) β , where a ( ν ) isthe antenna-to-thermodynamic conversion factor.As in any Bayesian analysis, we must adopt a set ofpriors for the parameters under consideration. For thispaper, we choose the prior most widely accepted in thestatistical community, namely Jeffreys’ ignorance prior(e.g., Box & Tiao 1992). This prior is given by the squareroot of the Fisher information measure, P θ ∝ p | F | θθ = p | − ∂ ln L /∂ θ | . Its effect is essentially to “normal-ize” the parameter volume relative to the likelihood, andmake the likelihood so-called “data translated”. We willreturn to the effect of this prior in § β described above, it reads P ( β ) ∝ pP ν (( G ( ν ) /σ ν )( ν/ν ) β ln( ν/ν )) . The difference be-tween this and a flat prior will be demonstrated in § Sampling from the joint posterior distribution
Having defined our data model and priors, the goalis now to estimate the joint CMB–foreground posterior P ( s , C ℓ , a ν,i , b j , c k , θ k | d ). This is achieved through thefollowing straightforward generalization of the previousGibbs sampling scheme, { s , a ν,i , b j , c k } i +1 ← P ( s , a ν,i , b j , c k | C iℓ , θ ik , d ) (16) θ i +1 k ← P ( θ k | s i +1 , a i +1 ν,i , b i +1 j , c i +1 k , d ) (17) C i +1 ℓ ← P ( C ℓ | s i +1 ) . (18)Explicitly, all amplitude-type degrees of freedom aresampled jointly with the CMB sky signal using a gen-eralization of Equation 11, while all non-linear spectralparameters are sampled conditionally by inversion sam-pling, as described in § Amplitude sampling
We first describe the algorithm for sampling from theconditional amplitude density, P ( s , a ν,i , b j , c k | C ℓ , θ k , d ) Conditional sampling of amplitudes — In principle, wecould take further advantage of the Gibbs sampling ap-proach, and sample each of s , a ν,i , b j and c k condition-ally, given all other parameters, including the amplitudesnot currently being sampled. This method was brieflydescribed by Eriksen et al. (2004b) for monopole anddipole sampling, and later used for actual analysis byboth O’Dwyer et al. (2004) and Eriksen et al. (2007b).Briefly stated, this approach simply amounts to subtract-ing each of the signals that is conditioned upon from thedata, and using the residual map to sample the remain-ing parameters in place of the full data set. Its mainadvantage is highly modularized, simple and transparentcomputer code.However, for general applications this is a prohibitivelyinefficient sampling algorithm due to poor mixing prop-erties and long Markov chain correlation lengths. Theproblem is due to strong correlations between the vari-ous amplitudes. Consider for instance a model includinga CMB sky signal, monopole and dipole components, anda foreground template. Note that the latter has botha non-zero monopole and dipole and also smaller-scalestructure.The conditional sampling algorithm would then go asfollows: First subtract the current monopole, dipole andforeground components from the data, and sample theCMB sky based on the residual map. The uncertaintiesin this conditional distribution are both cosmic varianceand instrumental noise. Second, subtract the recentlysampled CMB signal and foreground template from thedata, and sample the monopole and dipoles of the resid-ual. The only source of uncertainty in this conditionaldistribution is instrumental noise alone, and the nextsample therefore equals the previous state plus a noisefluctuation. For high signal-to-noise data the instrumen-tal noise uncertainty in a single all-sky number such asthe monopole and dipole amplitude is very small indeed, and the new sample is therefore essentially identical tothe previous. Finally, subtract the CMB signal and themonopole and dipole from the data, and sample the fore-ground template amplitude of the new residual. Again,with high signal-to-noise data the new amplitude is vir-tually identical to the previous.The failure of this approach stems from the fact thatthe main uncertainty in the monopole, dipole and tem-plate amplitudes is not instrumental noise, but ratherCMB cosmic variance coupled from template structures.This component is not explicitly acknowledged in theconditional template sampling algorithms when condi-tioning on the CMB signal, but only implicitly throughthe Gibbs sampling chain. The net result is an extremelylong Markov chain correlation length.The reasons this conditional approach worked well inthe analyses of Eriksen et al. (2004b), O’Dwyer et al.(2004) and Eriksen et al. (2007b) cases were different,and somewhat fortuitous: Only the monopole and dipolecomponents were included in the 1-yr WMAP temper-ature analysis, which couple only weakly to the low- ℓ CMB modes with a relatively small sky-cut. No fore-ground template sampling step as such was included,which would couple strongly to both the monopole,dipole and CMB signals. For the polarization analy-sis of Eriksen et al. (2007b), in which foreground tem-plates were indeed included, a different effect came intoplay, namely the very low signal-to-noise ratio of the 3-yrWMAP polarization data. At this signal-to-noise, evenconditional sampling works well.
Joint sampling of amplitudes — The solution to this prob-lem is to sample all amplitude-type degrees of freedomjointly from P ( s , a ν,i , b j , c k | C ℓ , θ k , d ). This is a four-component Gaussian distribution with mean ˆ x and co-variance matrix A . The required sampling algorithm istherefore fully analogous to that described in § x and A .To keep the notation tractable, we first define a sym-bolic four-element block vector of all amplitude coeffi-cients, x = ( s , a ν,i , b j , c k ) t . The first block of x con-tains the harmonic coefficients of s , the second blockcontains a ν,i for all frequencies and templates, the thirdcontains b j for all templates with a fixed spectrum,and the fourth contains the pixel amplitudes c k for allpixel-by-pixel foreground components. In total, x isan (( ℓ max + 1) + M N band + N + K N pix )-element vec-tor. We also define a corresponding response vector u ν = ( , t i , f j ( ν ) f j , g k ( ν ; θ k )) t , such that the data modelin Equation 15 may be abbreviated to d ν = x · u ν + n ν .With this notation, the joint amplitude distributionreads P ( s , a ν,i , b j , c k | C ℓ , θ k , d ) ∝ P ( d | s , C ℓ ,a ν,i , b j , c k ) P ( s | C ℓ ) ∝ e − P ν ( d ν − x · u ν ) t N − ν ( d ν − x · u ν ) e − s t S − s ∝ e − ( x − ˆ x ) t A − ( x − ˆ x ) . Here we have implicitly defined the symbolic 4 × A − = S − + A t N − A A t N − T A t N − F A t N − GT t N − A T t N − T T t N − F T t N − GF t N − A F t N − T F t N − F F t N − GG t N − A G t N − T G t N − F G t N − G (19)(see Appendix A for explicit definitions of each element inthis matrix) and a corresponding four-element symbolicblock vector for the Wiener-filter mean,ˆ x = A P ν A t N − ν dt tν,j N − ν d P ν f j ( ν ) f tj N − d P ν g k ( ν ; θ k ) N − ν d . (20)The sampling algorithm for this joint distribution isnow fully analogous to the one described in § N band + 1 white noise maps withzero mean and unit variance; 2) form the Wiener filtermean plus random fluctuation right-hand side vector, b = P ν A t N − ν d + C ω + P ν A tν N − ν ω ν t tν,j N − ν d + t tν,j N − ν ω ν P ν f j ( ν ) f tj N − d + P ν f j ( ν ) f tj N − ω ν P ν g k ( ν ; θ k ) N − ν d + P ν g k ( ν ; θ k ) N − ν ω ν ;(21)and 3) solve the set of linear equations, A − x = b . (22)The solution vector x then has the required mean ˆ x andcovariance matrix A . Again, for numerical stability itis useful to multiply both sides of Equation 22 by theblock-diagonal matrix P = diag( C − , , , ), and solvefor Px by CG.To demonstrate the difference in mixing efficiency be-tween conditional and joint amplitude sampling, Figure 1shows two trace plots for a high signal-to-noise simula-tion that included a CMB, a monopole, and a foregroundtemplate component. While the joint sampler instanta-neously moves into the right regime, and subsequentlyefficiently explores the correct distribution, the condi-tional sampler converges only very slowly toward the cor-rect value. The associated long Markov chain correlationlength makes this approach unfeasible for general prob-lems. Preconditioning — The performance of the CG algorithm(see Shewchuk (1994) for an outstanding introduction tothis method) depends sensitively on the condition num-ber of the coefficient matrix A , i.e., the ratio of the largestto the smallest eigenvalue. In fact, the algorithm is notguaranteed to converge at all for poorly conditioned ma-trices, due to increasing round-off errors in cases thatrequire many iterations.The condition number of the regularized A matrix isessentially the largest signal-to-noise ratio of any compo-nent in the system, which in practice means that of theCMB quadrupole or the template amplitudes. For cur-rent and future CMB experiments, such as WMAP andPlanck, the integrated signal-to-noise of these large-scalemodes is very large. It is therefore absolutely essential to Iteration count F o r e g r ound t e m p l a t e a m p lit ud e Input valueConditional sampling algorithmJoint sampling algorithm
Fig. 1.—
Comparison of trace plots generated by the joint (solidcurve) and the conditional (dashed curve) template amplitude sam-pling algorithms for a simulated data set consisting of a CMB skysignal, a monopole component and a synchrotron template compo-nent. The true input template amplitude is shown as a horizontaldotted line. construct an efficient preconditioner, M ≈ A , to decou-ple these modes brute-force, M − A x = M − b , simplyin order to achieve basic convergence.For the 4 × ℓ CMBcomponents we explicitly compute all elements of A upto some ℓ precond ∼ ℓ block is then coupled to the template amplitudesin a symbolic 3 × M = S − + A t N − A A t N − T A t N − FT t N − A T t N − T T t N − FF t N − A F t N − T F t N − F . (23)The elements in this matrix are computed by trans-forming each object individually into spherical harmonicspace, including modes only up to ℓ precond , and then per-forming the sums explicitly. (Note that the seemingly in-tuitive proposition of computing the template elementsin pixel space, as opposed to in harmonic space, is flawed;unless all elements are properly bandwidth limited, anon-positive definite preconditioning matrix will result.)For examples of such computations, see Eriksen et al.(2004b).The second part of our preconditioner regularizes thehigh- ℓ CMB components, and consists of the diago-nal elements A ℓm,ℓ ′ m ′ δ ℓℓ ′ δ mm ′ from ℓ precond + 1 to ℓ max (Eriksen et al. 2004b). The third part of our precon-ditioner covers the single pixel-pixel foreground ampli-tudes, which have low signal-to-noise ratio, and are pre-conditioned with the corresponding diagonal elements of A only, G t N − G .For a typical low-resolution WMAP3 application (fivefrequency channels degraded to N side = 64 and 3 ◦ FWHM resolution and regularized with 2 µ K RMS whitenoise), we find that including only the diagonal elementsin the above matrix can bring the fractional CG residualdown to ∼ − , while the recommended convergence cri-terion for single-precision data is 10 − . Thus, includingthe CMB-template cross-terms in the low- ℓ CMB pre-conditioner in Equation 23 is not just a question of per-formance for the signal-to-noise levels of WMAP; it isrequired in order to converge at all. The total number ofCG iterations is typically .
200 for the same applicationwith the previously described three-level preconditioner.For some further promising ideas on preconditioning forsimilar systems, see Smith et al. (2007).
Imposing linear constraints — A useful addition to theabove formalism is the possibility of imposing linear con-straints on one or more of the parameters. For instance,if it is possible to calibrate the absolute offset of onefrequency band by external information, for instance us-ing knowledge about the instrument itself, it would behighly beneficial to fix the corresponding monopole valueaccordingly. Another constraint may be to exclude tem-plate amplitude combinations with a given frequencyspectrum, in order to disentangle arbitrary offsets at eachfrequency from the absolute zero-level of a given fore-ground component.In the present code, we have implemented an option forimposing linear constraints on the template amplitudes a = { a ν,i } on the form X ν,i q kν,i a ν,i = q k · a = 0 , (24)where q k = { q kν,i } , k = 1 , . . . , N c , are constant orthog-onality vectors, and N c is the number of simultaneouslinear constraints. For example, if we want to obtain asolution with a fixed monopole amplitude at frequency ν , we would set q ν,i = δ ν,ν δ i, .The total dimension of the template amplitude vectorspace is D = M N band , M being the number of free tem-plates at each band. Within this space, the constraintvectors q span an N c -dimensional sub-space V to whichthe CG solution must be orthogonal; a must lie in thecomplement of V , denoted V c .To achieve this, we construct a projection operator P : R D → V c by standard Gram-Schmidt orthogonalization,which is a D × ( D − N c )-dimensional matrix P . To imposethe constraints defined by Equation 24 on the the finalCG solution, Equation 22 is rewritten as P t A − PP t x = P t b , (25)which is solved as before. Corresponding elements in thepreconditioner are similarly modified in order to main-tain computational efficiency. Spectral parameter sampling
With the amplitude sampling equations for P ( s , a ν,i , b j , c k | C ℓ , θ k , d ) in hand, the only missingpiece in the Gibbs sampling scheme defined by Equa-tions 16–18, is a spectral parameter sampler for P ( θ k | s , a ν,i , b j , c k , d ). In the FGFit code presented byEriksen et al. (2006), this task was done by Metropolis-Hastings MCMC, a very general technique that cansample from almost any multi-variate distribution.However, it has two disadvantages. First, hundreds ofMCMC steps may be required to generate two uncor-related samples, making the process quite expensive.Second, and even worse for our application, the chainsmay need to “burn in” at each main Gibbs iteration,because the amplitude parameters have changed sincethe last iteration. Proper monitoring of these issues is difficult for problems with tens of thousands of pixelswith very different signal-to-noise ratios.Therefore, we have replaced the MCMC sampler witha direct sampler, specifically a standard inversion sam-pler, in the present version of our codes. While this al-gorithm is only applicable for univariate problems, it isalso quite possibly the best such sampler, as it drawsfrom the exact distribution, and no computation of ac-ceptance probabilities is needed. The algorithm is the fol-lowing: First, compute the conditional probability den-sity P ( x | θ ), where x is the currently sampled parame-ter and θ denotes the set of all other parameters in themodel. In our application, P ( x | θ ) is the normalized prod-uct of the likelihood L in Equation 14 and any prior wewish to impose. Then compute the corresponding cumu-lative probability distribution, F ( x | θ ) = R x −∞ P ( y | θ ) dy .Next, draw a random number u from the uniform distri-bution U [0 , P ( x | θ ) is givenby F ( x | θ ) = u .For multivariate problems we use a Gibbs samplingscheme to draw from the joint distribution, and sampleeach parameter conditionally. For example, if we want toallow free amplitudes ( c s and c d ) and spectral indices ( β s and β d ) for both synchrotron and thermal dust emission,the full sampling scheme reads { s , c s , c s } i +1 ← P ( s , c s , c s | C iℓ , β i s , β i s , d ) (26) C i +1 ℓ ← P ( C ℓ | s i +1 ) (27) β i +1s ← P ( β s | s i +1 , , c i +1s , c i +1s , β i d , d ) (28) β i +1d ← P ( β d | s i +1 , , c i +1s , c i +1s , β i s , d ) (29)Note that it can be beneficial to iterate the latter twoequations more than once in each main Gibbs loop, in or-der to reduce the correlations between consequtive sam-ples cheaply. Typically, with two moderately correlatedspectral indices we run ∼ MARGINALIZATION, PRIORS AND DEGENERACIES
The algorithm described in § P ( s , C ℓ , a ν,i , b j , c k , θ k | d ). Fromthese multivariate samples we estimate each parameterindividually by marginalizing over all other parametersin the system and reporting, say, the marginal posteriormean and standard deviation.This is straightforward, but there are subtleties andcare is required. Before applying the method to simu-lated data in § µ K)-300-200-1000100 O ff s e t ( µ K ) µ K)-6-4-20 S pe c t r a l i nde x -300 -200 -100 0 100Offset ( µ K)-6-4-20 S pe c t r a l i nde x -300 -200 -100 0 100Offset ( µ K)0.0000.0050.0100.0150.020 M a r g i na l po s t e r i o r µ K)0.0000.0050.0100.015 M a r g i na l po s t e r i o r -6 -4 -2 0Spectral index0.00.10.20.30.40.5 M a r g i na l po s t e r i o r Fig. 2.—
The one- (bottom row) and two-dimensional (top row) marginal posteriors for the single-pixel and four frequency-band dataset described in § − P has dropped by 0.1, 2.3, 6.17 and 11.8, respectively, corresponding to the peak and 1,2, and 3 σ region for a Gaussian distribution. The crosses mark the true values. In the one-dimensional plots, the dashed lines indicate thetrue values, and the dotted lines show the marginal posterior mean. the overall zero-level of a foreground component with afree amplitude at each pixel. The same observations ap-ply to any full-sky template with a free amplitude at eachband (e.g., the three dipoles). For simplicity we discussonly offsets below. For the same reason, we neglect theantenna-to-thermodynamic temperature conversion fac-tor. When explicit formulae are derived, the simplifiedand more readable versions are given in the text; fullexpressions are given in Appendices B and C.It turns out that the degeneracy between unknown off-sets and the foreground zero-level has almost no effecton the CMB component. For the CMB, the relevantquantity is the sum over all foregrounds, not internaldegeneracies among different foregrounds. If one caresonly about separating the CMB from foregrounds, andnot the foregrounds themselves, much of the followingcan be ignored. The offset-amplitude-spectral index degeneracy fora single pixel
Consider a hypothetical experiment that observes asingle pixel at 30, 44, 70, and 100 GHz, with RMS noise10 µ K in each band. Assume that the signal is a straightpower-law parametrized by amplitude A and spectral in-dex β , and that the absolute offset of the detectors isknown perfectly for the three highest frequencies, butnot for the 30 GHz band. The signal model is T ν = m δ ν,ν + A (cid:18) νν (cid:19) β . (30)There are three free parameters in this system, the offset,amplitude, and spectral index, and four measurements. Since the number of constraints exceeds the number ofdegrees of freedom, it should be possible to estimate allthree parameters individually.We simulated one realization of this model, adoptingthe model parameters A = 100 µ K, β = − m =0 µ K, and adding white noise to each band. Our priorsare chosen to be uniform over − µ K ≤ A, m ≤ µ Kand − ≤ β ≤
0. We compute the joint posterior by asimple χ evaluation over a 200 × ×
200 grid, andmarginalize by direct integration.Figure 2 shows the results in terms of one- and two-dimensional marginal posteriors. The true input valuesare marked by thick crosses in the top panels and bydashed lines in the bottom panels. The posterior meansare shown by dotted lines in the bottom panels.This simple example highlights two problems that willrecur in later sections. First, as the top left panelshows, the offset and amplitude are highly degenerateand anti-correlated; one may add an arbitrary offset tothe 30 GHz band and subtract it from the foregroundamplitude, without affecting the final χ . This degen-eracy is a crucial issue for CMB component separation.Many foregrounds have power-law spectra, and differen-tial anisotropy experiments (e.g., WMAP) cannot deter-mine absolute offsets. The monopoles of the WMAPtemperature sky maps were determined a posteriori based on a co-secant fit to a crude plane-parallel galaxymodel (Bennett et al. 2003b; Hinshaw et al. 2007). Thisapproach is prone to severe modelling errors, preciselybecause of this type of degeneracy.The second problem is that integration over ahighly degenerate joint posterior yields complicated andstrongly non-Gaussian marginal posteriors. Obtaining -4 -3 -2 -1 0 1 2 Offset M a r g i n a l d i s t r i bu ti on Jeffereys’ priorUniform prior
Fig. 3.—
Comparison of the marginal offset posterior for a uni-form (dashed line) and Jeffreys’ ignorance (solid line) prior on thespectral index β . See § unbiased point estimates from these posteriors is nottrivial. Clearly, the posterior mean is not an unbiasedestimator. Further, as we will see in the next section,even the posterior maximum is biased in general, unlessspecial care is taken when choosing priors. Uniform vs. Jeffreys’ prior
The strong degeneracies found in the previous examplecan be broken partially by adding more data. Consider afull-sky data set pixelized at HEALPix resolution N pix =16 (3072 independent pixels). Reduce the noise to 1 µ KRMS per pixel. Use the same signal model as before, butwith an offset common to all pixels, T ν ( p ) = m δ ν,ν + A ( p ) (cid:18) νν (cid:19) β ( p ) . (31)We adopt the spatially varying synchrotron model ofGiardino et al. (2002) as a template for the amplitudeand spectral index of the signal componentWe simulated a new data set, and computed themarginal monopole posterior by direct integration. Thisis straightforward because, for a given value of m , theconditional amplitude-spectral index posterior reduces toa product of single-pixel distributions. The integrationtherefore goes over a sum of N pix two-dimensional grids,rather than a single 2 N pix grid.The result is shown as a dashed curve in Figure 3.Two points are noteworthy. First, the marginal distri-bution is nearly Gaussian, in contrast to the stronglynon-Gaussian single-pixel posterior shown in the bottompanel of Figure 2. Thus, the additional data seem tohave broken the degeneracy. Second, however, the distri-bution has a mean and standard deviation of − . ± . σ away from zero! Repeated experimentswith different noise seeds gave similar results.This behaviour is a result of the choice of prior. Weinitially adopted a uniform prior on the offset, the ampli-tudes and the spectral indices, with little thought to whywe should do so. This was a poor choice. Jeffreys (1961)argued that when nothing is known about a particularparameter, one ought to adopt a prior that does not im-plicitly prefer a given value over another, relative to the likelihood. This is not in general the uniform prior.Jeffreys argued that the appropriate ignorance prior isgiven by the square root of the Fisher information mea-sure, P J ( θ ) ∼ p F θθ = s − (cid:28) ∂ ln L ∂ θ (cid:29) , (32)where the angle brackets indicate an ensemble average.This prior ensures that no parameter region is preferredbased on the parametrization of the likelihood alone; itis therefore a proper ignorance prior (e.g., Box & Tiao1992).The log-likelihood corresponding to the model definedin Equation 30 reads − L = X ν d ν − mδ ν,ν − A (cid:16) νν (cid:17) β σ ν . (33)Computing the second derivatives of this expression withrespect to A , m , and β , we find that the appropriateJeffreys’ priors for the three parameters are P J ( A ) ∼ , (34) P J ( m ) ∼ , (35) P J ( β ) ∼ vuutX ν σ ν (cid:18) νν (cid:19) β ln (cid:18) νν (cid:19)! , (36)respectively. In general, the ignorance prior for any linearparameter in a Gaussian model is uniform because thesecond derivative of the likelihood is constant. However,for non-linear parameters greater care is warranted.Figure 4 shows Jeffreys’ prior for the spectral index β ,limited to − ≤ β ≤ − β = − β = −
4. Intuitively, this isnecessary because there is an asymmetry between a steepand a shallow spectrum: A steep spectrum means thatthe signal dies off quickly with frequency, while a shallowspectrum implies that it maintains its strength longer.Thus, there is a larger allowed parameter volume withsteep indices than with shallow, leading to an imbalancein terms of marginal probabilities. This parametrizationeffect is countered by the Jeffreys’ prior.The solid curve in Figure 3 shows the result of us-ing Jeffreys’ prior instead of a uniform prior. Similarbehaviour is observed independent of noise realization.The conclusion is clear: a proper ignorance prior leadsto unbiased estimates, while a naive uniform prior leadsto biased estimates.In addition to this basic ignorance prior, it may be ben-eficial to adopt physical priors, based on knowledge fromother experiments. For example, if one had reason toexpect that the dominant signal in a given data set wereGalactic synchrotron emission, a reasonable prior couldbe β = − . ± .
3, based on low frequency measurements.The physical prior is multiplied by the ignorance prior,taking account of both effects. In the rest of the paper,when we say that a Gaussian prior is adopted for thespectral indices, we mean a product of a Gaussian andJeffreys’ prior.0 -4 -3.5 -3 -2.5 -2
Spectral index J e ff e r e y s ’ i gno r a n ce p r i o r Fig. 4.—
Jeffreys’ ignorance prior for the spectral index β , de-fined by Equation 31. Steep indices are given less weight thanshallow ones to compensate for their smaller overall impact on thelikelihood. Marginalization over high-dimensional anddegenerate posteriors
The previous section shows that given sufficient dataand an appropriate prior, the marginal posterior is a goodestimator of the target parameter. In this section we in-vestigate what happens when the data are not sufficientlystrong to break a degeneracy. We replace the single-channel offset m by a template amplitude b coupled toa fixed free-free template t ff ( p ) and a spectral index of β ff = − . T ν ( p ) = b t ff ( p ) (cid:18) νν (cid:19) − . + A ( p ) (cid:18) νν (cid:19) β ( p ) . (37)Two modifications are made to the simulation. First,the spectral index of the synchrotron component is fixedto β s = −
3, rather than being spatially varying. Sec-ond, a fifth frequency channel is added at 143 GHz.No free-free component is added to the data; the op-timal template amplitude value is zero. The question iswhether these data are sufficient to distinguish betweensynchrotron and free-free emission with similar spectralindices of − − .
15, respectively.The answer is no. Figure 5 shows the marginal tem-plate amplitude posteriors, computed by direct integra-tion as in the previous section. The different curves cor-respond to different Gaussian priors imposed on the syn-chrotron spectral index. All are centered on the truevalue −
3, but with different standard deviations ∆ β s .With the strong prior of ∆ β s = 0 .
01, the amplitude pos-terior is well-centered near the true value of zero. How-ever, when the prior is gradually relaxed, the marginalposterior widens and drifts away from the true value.The marginal posterior is not a useful estimator for thetemplate amplitude in this case.This behaviour is explained by the fact that with 3072independent pixels the contribution of noise to the off-set amplitude is insignificant compared to the uncer-tainty introduced by coupling to the synchrotron com-ponent. Moreover, the amplitude and spectral indexdistributions are similar for the two foreground compo-nents. As a result, the joint distribution becomes long,narrow and curved, like that in the top middle panel -0.2 0 0.2 0.4 0.6 0.8
Free-free template amplitude M a r g i n a li ze d po s t e r i o r True value ∆β = 0.01 ∆β = 0.10 ∆β = 0.20 ∆β = 0.50 ∆β = 1.00 Fig. 5.—
Marginal free-free template amplitude posteriors forvarious priors on the synchrotron spectral index. See § of Figure 2, The marginal one-dimensional posteriors aredominated by the “boomerang wing” orthogonal to theparameter axis. Similarly, the “wing” parallel to the axisis diluted. Given sufficiently strong degeneracies, themarginal distributions no longer contain the maximum-likelihood point within their, say, 3 σ confidence regions.When the prior is made increasingly tight, however, thewings of the distribution are gradually cut off and themarginal distribution homes in on the true value. Thus,the collection of distributions shown in Figure 5 in somesense visualizes the joint posterior.This behaviour may be quantified by means of the co-variance matrix of the Gaussian amplitude part of thesystem, defined in Equation 19. A useful quantity de-scribing this matrix is its condition number, the ratio ofits largest and smallest eigenvalues. For the particularcase discussed above, we find that the condition numberis 4 × , which, although tractable in terms of numer-ical precision for double precision numbers , indicates avery strong degeneracy. The offset vs. amplitude degeneracy for full-skydata
The final example we consider before turning to real-istic simulations is the same as in § all frequency bands, not just one. (Thisis characteristic of real experiments, which do not knowthe absolute zero-point at any frequency.) The model is T ν = m ν + A ( p ) (cid:18) νν (cid:19) β ( p ) . (38)If the spectral index is constant over the sky, β ( p ) = β , The absolute limit on the condition number for reliable matrixinversion is 10 − for single-precision arithmetic and 10 − for dou-ble precision. However, in practice one should stay well below thesevalues, in particular for iterative applications since small numericalerrors may propagate in an uncontrolled manner. T ν = m ν + A ( p ) (cid:18) νν (cid:19) β (39)= ( m ν + δm (cid:18) νν (cid:19) β ) + ( A ( p ) − δm ) (cid:18) νν (cid:19) β (40)= m ′ ν + A ′ ( p ) (cid:18) νν (cid:19) β . (41)One can simply add a constant to the foreground ampli-tude, and subtract a correspondingly scaled value fromeach offset. It is thus impossible to determine individu-ally the absolute zero-level of foreground component andoffsets. To obtain physically relevant results, externalinformation must be imposed.Spatial variations in the spectral index partially re-solve this degeneracy. In the Giardino et al. (2002) syn-chrotron model the spectral index β G varies smoothly onthe sky between − . − .
2. The condition numberof the foreground amplitude-offset covariance matrix is2 × , and the covariance matrix is no longer singu-lar. A modified index model with ten times smaller fluc-tuations but the same mean ( β ( p ) = 0 . h β G i + 0 . β G )increases the condition number by two orders of magni-tude, to 2 × .These strong degeneracies lead to the same quantita-tive behaviour as seen for the marginal free-free tem-plate amplitude posterior in the previous section, mak-ing it very difficult to estimate both all offsets and theforeground amplitude zero-level individually. In practice,external constraints are required. We have implementedtwo approaches for dealing with this degeneracy in ourcode, both based on the projection operator described in § a-priori byexternal information. For instance, if an experimentsomehow measured total power, as opposed to differ-ences alone, detailed knowledge about the instrumentitself could be used for these purposes. The advantageof this approach is that it is exact, assuming the valid-ity of the prior, and the accuracy of all uncertainties ismaintained. It is implemented simply by demanding that m ν = 0, which requires, in terms of the orthogonalityvectors defined in § q ν,i = δ ν,ν δ i, .The second approach is based on the observation thatthe degeneracy between the foreground amplitude andthe offsets seen in Equation 41 leads to a very specificfrequency distribution of offset amplitudes. Specifically, m ′ ν = ( m ν + δm ( ν/ν ) β ), where δm is an arbitrary con-stant, but common to all frequency bands . It is thereforepossible to require that the set of offsets should not havea frequency spectrum that matches the foreground spec-trum.The corresponding constraint on m ν may be derivedfrom χ = X ν,pp ′ m ν − δm (cid:18) νν (cid:19) β ( p ) ! N − ν,pp ′ m ν − δm (cid:18) νν (cid:19) β ( p ′ ) ! . (42) by first taking the derivative with respect to δm , andthen enforcing a vanishing foreground component, δm =0, X ν m ν X p,p ′ N − ν,pp ′ (cid:18) νν (cid:19) β ( p ′ ) = 0 (43)The expression in brackets says that the offsets should beorthogonal to the mean noise-weighted foreground spec-trum.If the total signal model includes more than one sig-nal component with a free amplitude at each pixel, thenthese should be included jointly in the above χ . A par-ticularly important case is that including both a CMBsignal, which has a frequency-independent spectrum, anda proper foreground component. For this case, we have χ = X ν,pp ′ m ν − m − δm (cid:18) νν (cid:19) β ( p ) ! N − ν,pp ′ m ν − m − δm (cid:18) νν (cid:19) β ( p ′ ) ! , (44)where m is the additional degree of freedom introducedby the CMB signal. The equivalent constraint on m ν de-rived from this expression is notationally more involved(see Appendix C for a full derivation and constraints),but may be written as before in terms a set of orthogo-nality vectors q iν .While this orthogonality constraint is effective for es-timating the absolute zero-level of the foreground com-ponent in question, it corresponds to a strong implicitprior that is not likely to be compatible with reality. Ifthere are indeed random offsets at all frequencies, somefractional combination of these offsets will mimic a fore-ground component. In the above approach, this compo-nent is defined to be a foreground signal, rather than anoffset. Further, no mixing between the two componentsis allowed. Thus, the estimated error bars on both theoffsets and foreground zero-level will be underestimated.Recall, however, that this entire discussion concernsthe relative contributions to the foreground zero-leveland the free offsets, not the CMB signal, which relieson the sum of the two components alone. The factthat the estimated error in the foreground zero-levelis under-estimated by a small factor, say, four or five( σ est ∼ . µ K vs. σ true ∼ µ K; see the simulation de-scribed in § ∼ − µ K is detected in allfrequency bands, as well as a significant residual dipole2
Power spectrum, C l l ( l + 1)/2 π (10 µ K ) P o s t e r i o r d i s t r i bu ti on l = 2 l = 5 l = 10 l = 20 Fig. 6.—
Verification of the CMB power spectrum distributionsproduced by Gibbs sampling. Black lines show analytically com-puted slices through the joint likelihood L ( C ℓ ), and red lines showsthe same computed by Gibbs sampling. Vertical lines show the the-oretical input spectrum used to generate the simulation (solid lines)and the realization specific spectrum (dashed lines), respectively. in the V-band data. Summary
The above discussion may be summarized by the fol-lowing observations: • The marginal mean is a good estimator only formildly degenerate and non-Gaussian joint distri-butions. Strongly degenerate models should beavoided, because they are difficult to summarize bysimple statistics, and because it takes a prohibitivenumber of samples to fully explore them. • The uniform prior is a proper ignorance prior forGaussian variables only. In general, Jeffreys’ ruleshould be used in the absence of informative priors. • For experiments with unknown offsets at each fre-quency band, there is a strong degeneracy betweenthese offsets and the overall zero-level of the fore-ground amplitudes. This degeneracy should bebroken by external or internal priors, if marginalposteriors are to be used as estimators. CODE VERIFICATION In § § P ( C ℓ | s ), the amplitude distribution P ( s , a ν,i , b j , c k | C ℓ , θ k , d ), and the spectral parameter dis-tribution P ( θ k | s , a ν,i , b j , c k , d ). In the following threesubsections, we test the output from Commander forthese three conditional distributions against analyticalexpressions, at low resolution, to verify both the generalsampling algorithms and our specific implementation. -4 -2 0 2Spherical harmonic coefficient, a ( µ K) -10.5 -10 -9.5K-band offset ( µ K)508 512 516 520Synchrotron amplitude at K-band ( µ K) 0.99 0.995 1 1.005 1.01Dust template amplitude at W-band
Fig. 7.—
Verification of Gaussian amplitude sampler. Analyticalmarginal posteriors are shown by smooth distributions, and resultsfrom Commander are shown by histograms. The true values areindicated by vertical dashed lines.
The CMB power spectrum sampler
To verify the CMB power spectrum distribution P ( C ℓ | s ), we construct a low-resolution CMB-only sim-ulation as follows. Draw a random CMB realizationfrom a standard ΛCDM power spectrum (Spergel et al.2007), smooth to 10 ◦ FWHM, and pixelize at N side = 16.Add white noise of 1 µ K RMS to each pixel. Imposethe WMAP Kp2 sky cut (Bennett et al. 2003b), withoutpoint sources and downgraded to N pix = 16, on the data.We compute slices through the corresponding like-lihood by considering each ℓ individually, fixing allother multipoles at the input power spectrum, with abrute-force calculation in pixel-space (e.g., Eriksen et al.2007a), and with Commander. The outputs fromthe latter are smoothed through Rao-Blackwellization(Chu et al. 2005) to reduce Monte Carlo errors.Figure 6 shows the results for four multipoles. The the-oretical input spectrum is shown by vertical solid lines,and the true realization spectrum by dashed lines. Com-mander reproduces the CMB power spectrum distribu-tions perfectly. The Gaussian amplitude sampler
To verify the amplitude distribution P ( s , a ν,i , b j , c k | C ℓ , θ k , d ), we construct a simulationat N side = 8 (768 independent pixels, angular resolution20 ◦ FWHM). The CMB realization is the same as inthe previous section, appropriately smoothed. Fivefrequency channels are simulated, corresponding tothe five WMAP channels. In addition to the CMBsky signal, s , we add a synchrotron signal, c ( p ) witha spatially varying spectral index, a dust templatewith an amplitude, b , scaled to unity at W-band,and a a = − µ K monopole to the K-band. (Seeforeground description in Section 6.1 for further detailson this model.) Thus, all four types of amplitudes arerepresented. White noise of 1 µ K RMS is added to eachpixel at each frequency.We fix the CMB power spectrum and synchrotron spec-tral index map, and compute the joint Gaussian ampli-tude distribution both analytically and with Comman-3
100 150 200 250 300Synchrotron amplitude00.0050.010.015 M a r g i n a l d i s t r i bu ti on True valueAnalyticalCommander-4 -3.5 -3 -2.5 -2Spectral index00.511.52 M a r g i n a l d i s t r i bu ti on -40 -30 -20 -10 0 10 20 30Offset for 30GHz band00.010.020.030.04 M a r g i n a l d i s t r i bu ti on Fig. 8.—
Comparison of marginal posteriors for a single pixel,computed both analytically (thick, dashed curve) and with Com-mander (thin line). The true value is indicated by a vertical dottedline. der. The analytical computation is performed by directevaluation of the mean ˆ x and covariance matrix A de-fined by Equations 19 and 20. The marginal variances ofeach parameter are given by the diagonal elements of A .Figure 7 shows the marginal distributions for one pa-rameter of each type. Again, Commander reproduces theexact analytical result perfectly. The spectral index sampler
To verify the spectral index sampler for P ( θ k | s , a ν,i , b j , c k , d ), a single-pixel distribution, we simulate a single pixel. The signal model is identicalto that in § APPLICATION TO SIMULATED 3-YR WMAP DATA
We turn now to a more realistic simulation, with prop-erties corresponding to the 3-yr WMAP data. The sim-ulation has two goals. First, to show that the methodcan handle data with realistic complexities, and that itis applicable to the current WMAP data and (even moreimportantly) the up-coming Planck data. Second, to pro-vide the necessary background for understanding the re-sults from the actual 3-yr WMAP analysis presented byEriksen et al. (2007c).
Simulation, model and priors
We construct the simulation as follows. Draw a CMBsky realization from the best-fit ΛCDM power spectrumpresented by Spergel et al. (2007). Convolve with theeach of the beams of the ten differencing assemblies ofWMAP (Bennett et al. 2003a), pixelized at a HEALPixresolution of N side = 512. Add white noise to each mapwith standard deviation σ ( p ) = σ / p N obs ( p ), where N pix is the number of observations at pixel p (providedon Lambda together with the actual sky maps).Downgrade these ten maps to a common resolutionof 3 ◦ FWHM and N pix = 64, bandlimiting each mapat ℓ max = 150. Create frequency maps by co-addingdifferencing assembly maps at the same frequency (e.g.,Q = (Q1+Q2)/2). Add uniform white noise of 2 µ K RMSto each frequency map to regularize the noise covariancematrix.Figure 9 shows the CMB and noise spectra of the co-added V-band data, both at the native resolution of thefrequency band (dashed lines) and at the common 3 ◦ FWHM resolution (solid lines). The CMB to regulariza-tion noise ratio is unity at ℓ ∼ ℓ max = 150. Both the instrumental and the regular-ization signal-to-noise ratios are &
500 at ℓ ≤
50, andtherefore negligible at these scales compared to cosmicvariance.Instrumental noise averaged over the full sky is largerthan the regularization noise everywhere below ℓ . ℓ . χ in the ecliptic plane. However, since this error term iscorrelated only on very small scales (the beam size of 3 ◦ FWHM), and we understand its origin and benign be-haviour, it does not represent a significant problem forthe analysis. With additional years of WMAP observa-tions, and the addition of the Planck data, this noise http://lambda.gsfc.nasa.gov4
50, andtherefore negligible at these scales compared to cosmicvariance.Instrumental noise averaged over the full sky is largerthan the regularization noise everywhere below ℓ . ℓ . χ in the ecliptic plane. However, since this error term iscorrelated only on very small scales (the beam size of 3 ◦ FWHM), and we understand its origin and benign be-haviour, it does not represent a significant problem forthe analysis. With additional years of WMAP observa-tions, and the addition of the Planck data, this noise http://lambda.gsfc.nasa.gov4 -4 -2 P o w e r s p ec t r u m , C l l ( l + ) / π ( µ K ) CMB -- unconvolvedCMB - 3 degree FWHMInstrumental noise -- unconvolvedInstrumental noise - 3 degree FWHMRegularization noise l S i gn a l - t o - no i s e Fig. 9.—
Top panel: CMB (black lines) and noise power spectra(blue and red lines) for the 3-yr WMAP V-band channel. Solidlines show spectra for the 3 ◦ FWHM data set, and dashed linesshow spectra at the native resolution of the V-band data. Bot-tom panel: CMB signal to noise ratio for instrumental (blue) andregularization (red) noise, respectively. contribution will be further suppressed. Further, we willconsider in the future various approaches for taking thisterm into account, for instance by computing explicitlythe corresponding sparse covariance matrix.Our foreground model has three components, syn-chrotron, free-free, and thermal dust emission. For syn-chrotron emission, where the spectral index is known tovary substantially with position on the sky, we extrapo-late the 408 MHz map (Haslam et al. 1982) using a mapof the spectral index for each pixel. For the latter we usean updated version of the Giardino et al. (2002) spec-tral index map that is based on 408 MHz and WMAP23GHz data, after removing the free-free emission via theWMAP MEM free-freemodel (Bennett et al. 2003b) .The free-free model is defined by the H α template ofFinkbeiner (2003), scaled to 23 GHz assuming an elec-tron temperature of T e = 4000 K and a spatially constantspectral index of β ff = − .
15. The dust model is based onmodel 8 of Finkbeiner et al. (1999), evaluated at 94 GHzand scaled to other frequencies using a single-componentmodified black-body spectrum with T d = 18 . α = 2 .
0. Anomalous dust is ignoredin this analysis.Guided by the results of Eriksen et al. (2007c), we adda common offset to all frequencies of − µ K. No dipolecontributions are added to the simulations.For the power spectrum analysis in § These models were produced as part of the development ofthe Planck Sky Model, under the coordination of Planck WorkingGroup 2. between sky-cut and foreground-induced effects.We adopt the same parametric signal model as thatused by Eriksen et al. (2007c), T ν ( p ) = s ( p ) + m ν + X i =1 m iν [ˆ e i · ˆ n ( p )] ++ b " t ( p ) a ( ν ) (cid:18) νν dust0 (cid:19) . + f ( p ) a ( ν ) (cid:18) νν (cid:19) β ( p ) . (45)The first term is the CMB sky signal. The second andthird terms are the monopole m and three dipole com-ponents m i defined by standard Cartesian basis vectors.The fourth term is a dust tracer, based on the FDS tem-plate coupled to a fixed spectral index of β d = 1 .
7, anda free overall amplitude b . The postulated power-lawspectrum does not match the modified black-body spec-trum used to create the simulation, and modelling errorsare therefore to be expected. The fifth term is a singlelow-frequency foreground component with a free ampli-tude f ( p ) and spectral index β ( p ) at each pixel p . Theantenna-to-thermodynamic differential temperature con-version factor is a ( ν ), as always.In addition to the previously described Jeffreys’ prior,we adopt a prior of β = − ± . β ff = − .
15 is only 2.8 σ awayfrom the prior mean, and it does not take a large free-free amplitude to overcome this. For instance, near thegalactic plane the standard deviation of the marginal in-dex posterior is ∼ .
01, thirty times smaller than theprior width. At high latitudes, on the other hand, thesynchrotron spectral index is for all practical purposesunconstrained. The prior prevents this component frominterfering with the CMB signal in regions where its am-plitude is low.We impose the orthogonality constraint discussed in § ∼
50 s, paral-lelized over five 2.6 GHz AMD Opteron 2218 processors,one for each frequency band. We generate five chainswith 1000 samples each, for a total wall clock time of14 hr. The total computational cost is 350 CPU hours.
Burn-in, correlation lengths and convergence
We begin our examination of the results by plotting theoutput Markov chains as a function of iteration countin Figure 10. Each panel shows the evolution of oneparameter, such as the CMB power spectrum coefficientfor a single multipole or the dust template amplitude.Burn-in is a crucial issue for Markov chain algorithms.The chains were initialized with a random CMB powerspectrum over-dispersed relative to the true distribution,and the spectral indices of the low-frequency foreground5 χ ( )
1 23 C ( µ K ) σ ( µ K ) -3-11 C M B , a , ( µ K ) F G a m p lit ud e ( µ K ) -3.0-2.8-2.6 F G i nd e x -30-20-10 K - b a nd m onopo l e ( µ K ) D u s t a m p lit ud e Fig. 10.—
Trace plots showing the evolution of the Gibbs chainsas a function of iteration count, for each type of parameter. Thetrue input values are indicated by a horizontal dashed line, whereapplicable. The sky map type parameters are thinned by a factorof ten to reduce disk space requirements. component were drawn randomly and uniformly between − −
2. The Gibbs sampler needs some time to con-verge to the equilibrium distribution; as we see in Fig-ure 10, about 200 iterations are required to reach theequilibrium state.The last parameters to equilibrate are the globalmonopole and dust amplitudes, because the uncertaintyin these very high signal-to-noise parameters is verysmall, and only small steps can be made between con-secutive Gibbs samples. Moreover, since these are globalparameters, they couple to all other parameters.The χ trace plot shows an interesting feature. After reaching a minimum χ solution after about 100 itera-tions, the chain stabilizes at a very slightly higher equi-librium value. This is due to the fact that the full dis-tribution consists not only of the sky signal components,but also the CMB power spectrum. Maximizing the totaljoint posterior value is therefore a compromise betweenminimizing the sky signal χ and optimizing the CMBpower spectrum posterior. At iteration number 100, theCMB component is still burning in, whereas the fore-ground amplitude, the single most important parameterin terms of χ , has already reached its equilibrium. TheMarkov chain thus overshoots in χ minimization untilthe CMB power spectrum equilibrates.Correlation length is a second crucial issue for Markovchain algorithms. In general, classic Metropolis-Hastingsalgorithms have a long correlation length because theypropose relatively small modifications at each iterationin order to maintain high acceptance probability. TheGibbs sampler works differently. Because it samples fromexact conditional distributions, large jumps are perfectlyfeasible, at least in the absence of strong conditional cor-relations. (In the present case there are no such strongcorrelations.) The CMB power spectrum and CMB skysignal are only weakly correlated in the high signal-to-noise regime, and the foreground spectral index couplesonly moderately strongly to the foreground amplitude ofthe same pixel, and weakly to anything else. The re-sult is excellent mixing properties and short correlationlengths.This translates into a high sampling efficiency anda relatively small number of samples required for con-vergence. To quantify this, we adopt the widelyused Gelman-Rubin R statistic (Gelman & Rubin 1992),which is the ratio between two variance estimates. If theMarkov chains have converged, the two estimates shouldagree, and their ratio, R , should be close to unity. Atypical recommendation is that R should be less than 1.1to claim convergence, given that the chains were initiallyover-dispersed, although smaller numbers are clearly bet-ter.Computing this statistic for the five chains above, whilediscarding the first 200 samples, we find that R is lessthan 1.01 for the CMB power spectrum up to ℓ ∼ § Component separation results
We now turn to the marginal distributions of the es-timated signal parameters, and focus first on the sig-nal components. The CMB power spectrum is discussedseparately in the next section. The sky map results aresummarized in Figure 11 in terms of the marginal pos-terior means, standard deviations, and differences be-tween the posterior means and the input maps. Table 1lists the monopole and dipole results. The dust tem-plate amplitude posterior mean and standard deviationis b = 1 . ± . Fig. 11.—
Marginal posterior sky signals from the WMAP simulation. The left column shows the posterior mean for each pixel, themiddle column shows the posterior standard deviation, and the right column shows the difference between the estimated posterior meanand the known input signal. From top to bottom, the rows show 1) the CMB solution; 2) the low-frequency foreground amplitude solution;and 3) the low-frequency spectral index solution.
Fig. 12.—
Mean χ map computed over posterior samples. Avalue of χ = 15 corresponds to a χ that is high at the 99%significance level. pelling. No obvious foreground residuals are observedin the CMB map, familiar structures such as the NorthGalactic Spur and Gum Nebula are seen in the fore-ground amplitude map, and the spectral index map dis-tinguishes clearly between the known synchrotron andfree-free regions.These visual considerations are quantified in the rightcolumn, where the input maps has been subtracted fromthe posterior means . We see that the CMB map hasresiduals at the ∼ µ K RMS level, with a peak-to-peakamplitude of ± µ K. Little of these residuals is corre-lated on the sky except for a few patches near the galacticplane. Most of the differences are simply due to instru- For the foreground amplitude, the input map was estimatedby fitting a single power law to the sum of the synchrotron andfree-free components.
TABLE 1Monopole and dipole posterior statistics
Monopole Dipole X Dipole Y Dipole ZBand ( µ K) ( µ K) ( µ K) ( µ K)K-band − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . − . ± . Note . — Means and standard deviations of the marginalmonopole and dipole posteriors. mental noise.For the foreground amplitude, more distinct correlatedpatches are seen, in particular in regions with strong free-free emission. This is due to the fact that a single powerlaw is not a sufficiently good approximation to the sumof the free-free and synchrotron components, relative tothe statistical uncertainty.Finally, even the the spectral index difference mapshows clearly correlated regions, and additionally a neg-ative bias of about − .
1. This bias is primarily due totwo effects. First, as reported at the beginning of thissection, the dust template amplitude is over-estimatedby 1–2%, mainly because of mis-specification of the dustspectrum. As a result, slightly too much signal is sub-tracted from the higher frequency channels, and this inturn steepens the spectral index of the remaining signal.Second, at high latitudes the data are noise dominated,and the β = − ± . ∼ − . ∼ − . ∼ . ∼ ± σ . The expected peak-to-peak range in a differ-ence plot is therefore roughly three times the RMS er-ror.) This is due to modelling errors in two forms. Firstand foremost, we neglected the smoothed instrumentalnoise in our data model, and this causes a significant un-modelled uncertainty at the smoothing scale. However,being random with zero mean, it does not induce sig-nificant structure on larger scales, and it therefore hasnegligible impact on the scales of cosmological interest( ℓ ≤ − µ K and0 µ K, respectively. In general, these values are recon-structed reasonably well, although the error estimatesare somewhat underestimated for the Ka- and Q-bandmonopoles. We see that the orthogonality constraint de-scribed in § ℓ modes in the powerspectrum, even though it looks visually dominating in amarginal RMS map. Second, the masked Galactic planehas a very high uncertainty, although not infinite; therequirement of isotropy implies that the modes insidethis plane is to some extent restricted, at least on largeangular scales. Finally, as expected there is a (weaker)correlation between the foreground amplitude and spec-tral index maps and the CMB RMS map.Figure 12 shows the average χ computed over the 4000accepted samples. A value of 15 in this plot correspondsto a model that is excluded at the 99% confidence level.Two points are worth noticing in this plot. First, theecliptic plane stands out with higher χ values. As de-scribed above, this is due to the unmodelled, smoothedinstrumental noise. At a smoothing scale of 3 ◦ FWHM, this component is not fully negligible for the 3-yr WMAPdata relative to the CMB signal, and therefore causes aslight bias at the smallest scales, ℓ & ℓ ≤
50, and in this range the in-strumental signal-to-noise ratio exceeds 100 everywhere.This term does not affect the CMB signal of primaryinterest.The actual foreground-induced modelling errors arevery small. Indeed, despite the fact that we approximatethe sum of two different power laws with a single com-ponent, and assume an incorrect dust spectrum, the χ distribution is essentially perfect near the ecliptic poles,and the residuals are very small even close to the Galacticplane.However, the χ for the global solution as a whole issomewhat poor, with a reduced χ of 1.68. This largevalue is largely dominated by unmodelled smoothed in-strumental noise in the ecliptic plane, as discussed above;when analyzing the same data set at a smoothing scale of6 ◦ , rather than 3 ◦ , we found a reduced χ ∼ .
1. Thus,when using the products from this analysis in subsequentstudies (e.g., for cosmological parameter estimation), itis important to include only those scales that are unaf-fected by the degradation process itself.”In summary, the overall results are very promising in-deed. The CMB sky signal is reconstructed to withina few percent everywhere, as is the foreground ampli-tude. Further, the spectral indices are accurate to the ∼ . ℓ ≤
30. This is the topic of the next section.
CMB power spectrum and cosmological parameters
We now consider the CMB power spectrum posteriorwith the goal of understanding the impact of both resid-ual foregrounds and error propagation on the final re-sults. To do so, we consider both the identical simula-tion described in the previous section and a similar onein terms of instrumental properties and sky coverage, butexcluding foregrounds both from the simulated data andthe model. Comparing the two against each other allowsus to disentangle the effects of foregrounds and sky cut.The top panel of Figure 13 shows the posterior meanrealization specific spectrum, σ ℓ , for three different cases.First, the true full-sky spectrum is plotted as a red line.Second, the cut-sky but CMB-only spectrum is shown8 σ l l ( l + ) / π ( µ K ) Full sky, no noise spectrumPosterior mean without foregroundsPosterior mean with foregrounds ( σ l - σ l r e f ) l ( l + ) / π ( µ K ) σ l fg - σ l true σ l fg - σ l no fg Fig. 13.—
Top panel: Comparison of the posterior mean realiza-tion power spectra, σ ℓ , 1) as computed on the full sky (i.e., truerealization spectrum; red line); 2) as computed on the cut sky, butnot including foregrounds either in the simulation or in the model(blue line); and 3) as computed with Commander on the full simu-lation, including the foreground complexity (black line). The grayarea indicates the 1 σ confidence region for the case that includesforegrounds. Bottom panel: Difference between the spectra shownabove; blue line shows the difference with and without foregrounds,and the red line shows the difference between the spectra includingforegrounds and a sky cut and the true spectrum . ∆ C l l ( l + ) / π ( µ K ) Cosmic varianceNoise and sky cutForegrounds
Multipole moment, l ∆ C l / ∆ C l cv Fig. 14.—
Top panel: Contributions to the total power spec-trum uncertainty from cosmic variance (dotted line), sky cut andinstrumental noise (dashed line) and foregrounds (solid line), re-spectively. Bottom panel: Ratio of noise and sky cut errors (dashedline) and foreground errors (solid line) to cosmic variance. as blue curve, and finally, the cut-sky and “foreground-contaminated” spectrum is shown as a black curve. The1 σ confidence region about the latter is marked as a grayregion. The bottom panel shows the difference betweenthe foreground-contaminated spectrum and the full-skyspectrum (red), and difference between the two cut-skyspectra (blue).In terms of absolute differences, we see that the fore-ground errors (blue curve in Figure 13) are in generalless than 50 µ K at ℓ ≤
30, with a few occasional peaksat ∼ µ K , and without any striking biases. Alreadyat this point, we may thus predict that the absolute ef-fect of residual temperature foregrounds on cosmologicalparameters will be small at large angular scales, when us-ing the component separation method presented in thispaper.Next, we consider the foreground induced uncertain-ties. First we note that if the total CMB spectrum un-certainty has been properly estimated, then the full-skydifference spectrum (red curve in Figure 13) should bedistributed according to the uncertainties indicated bythe gray region. Except for some noticeable correlatedfeatures around ℓ ≈
50, this agreement is quite reason-able. Second, the blue curve shows the differences due toforegrounds alone. This term should thus be describedby a corresponding increase in the total uncertainty whenincluding foregrounds in the analysis.To understand the relative magnitude of these contri-butions, it is instructive to compute the relative magni-tudes of the errors due to cosmic variance, sky cut andinstrumental noise, and foregrounds. These can be esti-mated from quantities ready at hand. First, the standardexpression for the cosmic variance isVar(cosmic variance) = 22 ℓ + 1 C ℓ . (46)Second, the uncertainty due to the mask and instrumen-tal noise alone is given by the variance of σ iℓ ,Var(mask,noise) = h ( σ iℓ ) − h σ iℓ i i , (47)where σ iℓ are generated in the analysis without fore-grounds. Similarly, the uncertainty due to the combinedeffect of the mask, instrumental noise, and foregroundsis given by the same expression but computed from thesamples that also include foregrounds. To establish anorder of magnitude approximation of the foreground-induced uncertainty alone, we assume that the variancesadd in quadrature,Var(fg) = Var(mask,noise,fg) − Var(mask,noise) (48)This is of course not strictly correct, because the errorsin question are quite non-Gaussian, but it is a sufficientapproximation for our purposes.These three functions are shown in the top panel ofFigure 14 for the data set described above. In the bot-tom panel, we show the ratio of the mask and noise errorand the foreground error, respectively, to cosmic vari-ance. First we note that the foreground error is alwayssmaller than the mask and noise induced error, exceptat the very highest ℓ ’s, where the estimates are any-way not reliable. However, at ℓ ∼
40 these two arealmost comparable in magnitude, both at the 25–50%level of the cosmic variance. Once again assuming that9 l = 2 l = 3 l = 5 l = 10 l = 50 l = 40 l = 30 l = 20 Power spectrum, C l l ( l + 1)/2 π ( µ K ) P o s t e r i o r d i s t r i bu ti on Fig. 15.—
Slices through the CMB power spectrum likelihood. Black lines show the distributions from the foreground-free simulation,and red lines show the same for the simulation that did include foregrounds. The ensemble-averaged spectrum is indicated by verticalsolid lines, and the true realization-specific spectrum by dashed lines. Note that at very low ℓ ’s, the distributions are essentially identical,because of cosmic variance domination. At ℓ & these errors add in quadrature, neglecting a 20% errorterm implies underestimating the full errors by about2% ( √ . − . ℓ ≤ ℓ ≤
30, simply because thecosmic variance is so totally dominating. However, thesame does not necessarily hold true on smaller scales.At ℓ ≈
40, the foreground error increases the total un-certainty by several percent, and this is likely to increasefurther to smaller scales . It could constitute a very sig-nificant fraction of the total error in the range between ℓ ∼ ℓ = 2 and 50, for the twocases that exclude (black curves) and include (red curves)foregrounds. In these plots, we see the same behaviouras discussed above: At very low ℓ ’s, the widths of the dis-tributions with and without foregrounds are essentiallyidentical. However, at ℓ = 40 the effect of the foregrounduncertainties starts to become visible, as the red distri-bution is noticeably wider than the black distribution.Still, it is also very clear from this figure that the effectof neglecting foreground errors in the temperature spec-trum at ℓ .
50 in terms of cosmological parameters willbe minimal.To demonstrate this statement, we define a simple two- The peculiarly low foreground error at ℓ ≈
50 may be con-nected with the lack of features in the dust template at these an-gular scales; we do not believe it is a general feature. parameter power spectrum model, C ℓ = q (cid:18) ℓ (cid:19) n C in ℓ , (49)where q is a free overall amplitude, n is a spectral tiltparameter, and C in ℓ is the actual theoretical power spec-trum used to generate the simulation. We then map outthe corresponding two-dimensional posterior using theBlackwell-Rao estimator (Chu et al. 2005). The analysisis restricted to the range between ℓ = 2 and 30, whichis the primary target range for our first WMAP analysis(Eriksen et al. 2007c).Figure 16 shows the results in terms of two sets ofcontours. The dashed contours show the results from theanalysis excluding foregrounds, and the solid contoursshow the results including foregrounds. The true valueof ( q, n ) = (1 ,
0) is marked by a cross. The agreementbetween the two sets of results is excellent.Two conclusions may be drawn from this exercise.First, the method presented in this paper is fully ca-pable of extracting the valuable cosmological signal fromthe 3-yr WMAP data at large angular scales in quite re-alistic simulations, even when using the simplified fore-ground model described earlier. Second, the increaseduncertainty in cosmological parameters due to these fore-grounds at ℓ .
30 is negligible. CONCLUSIONS
We have presented an algorithm for joint componentseparation and CMB power spectrum estimation. Thisalgorithm is a natural extension of the CMB Gibbssampler previously developed by Jewell et al. (2004),Wandelt et al. (2004) and Eriksen et al. (2004b), and theforeground sampler described by Eriksen et al. (2006).The basic product from this algorithm is a set of jointsamples drawn from the full joint posterior P ( C ℓ , s , θ | d ),where C ℓ is the CMB power spectrum, s is the CMBsky signal, and θ denotes the set of all parameters in0 q -0.4-0.20.00.20.4 S pe c t r a l i nde x , n Fig. 16.—
Joint posterior distributions for the two-parametermodel defined by Equation 49. Dashed contours show the resultsfor the analysis without foregrounds, and solid contours show thesame for the analysis with foregrounds. In both cases, the contoursare where − P ( q, n | d ) rises by 0.1, 2.3, 6.17 and 11.8 from itsminimum value, corresponding (for Gaussian distributions) to thepeak and the 1, 2, and 3 σ confidence regions. The cross marks thetrue input value, ( q, n ) = (1 , the foreground model. With this tool, exact marginal-ization over very general foreground models is feasible,and proper foreground uncertainties may be propagatedseamlessly through to the CMB power spectrum and,therefore, to cosmological parameters.There are some potential pitfalls the user of the methodneeds to be aware of before applying it to real data. Inparticular, one has to pay attention to possible degen-eracies in the parametric signal model fitted to the data.Such degeneracies are not uncommon in models relevantto CMB foreground analysis. Two specific examples aresynchrotron and free-free emission, with spectral indicesof β s ∼ − β ff = − .
15, respectively, and the degen-eracy between unknown offsets at all frequency bandsand the foreground zero-level. In order to obtain reliableone-dimensional marginal estimates of each componentindividually, one must either make sure that the datahave sufficient power to resolve the model, or impose pri-ors to break the degeneracies. Fortunately, since only thesum over all foregrounds is relevant to the actual CMBreconstruction, these issues are of little concern to thecosmological interpretation of the data.The primary target of this work is Planck, scheduledto be launched in late 2008, and which will observe themicrowave sky at nine frequencies from 30 to 857 GHz.Combined with the five WMAP frequencies, these four-teen sky maps will constitute an outstanding data setfor both cosmological and Galactic studies. Using themethods described in this paper, it will be possible toconstrain three or four foreground components pixel-by-pixel, and even more if adopting spatial templates aspriors (e.g., the H α template as a tracer for free-free). A more immediate application is the analysis of the 3-yr WMAP temperature data, which is presented in a sep-arate Letter by Eriksen et al. (2007c). As demonstratedin the present paper, the method is fully capable of ex-tracting the valuable cosmological signal at large angularscales from this data set, both in terms of the CMB powerspectrum and cosmological parameters. Further, a verygeneral foreground model may be constrained to withina few percent in all parameters.In its present form, the method assumes identical beamresponses at each frequency band. This limits its appli-cation to the lowest angular resolution of a given dataset. However, this is not a fundamental limitation of themethod, but only of our current implementation. Specif-ically, it is straightforward to rewrite the sampling equa-tions presented in § ℓ . ℓ ∼ APPENDIX
THE TEMPLATE AMPLITUDE COUPLING MATRIX
In Section 3.2.1 we described the sampling algorithm for the conditional Gaussian distribution P ( s , a ν,i , b j , c k | C ℓ , θ k , d ). This involved calculating the joint mean,ˆ x = A P ν A t N − ν dt tν,j N − ν d P ν f j ( ν ) f tj N − d P ν g k ( ν ; θ k ) N − ν d , (A1)and multiplying with the inverse covariance matrix, A − = S − + A t N − A A t N − T A t N − F A t N − GT t N − A T t N − T T t N − F T t N − GF t N − A F t N − T F t N − F F t N − GG t N − A G t N − T G t N − F G t N − G . (A2)We now write each element in this matrix explicitly, for the benefit of readers who want to implement the algorithmthemselves. The elements in the first row of blocks are given by, S − + A t N − A ≡ S − + X ν A tν N − ν A ν (A3) A t N − T ≡ A tν N − ν t ν,j (A4) A t N − F ≡ X ν A tν N − ν f j ( ν ) f j (A5) A t N − G ≡ X ν A tν N − ν g k ( ν ; θ k ) (A6)The upper part of the second row is T t N − T ≡ t tν,j N − t ν,k (A7) T t N − F ≡ X ν t tν,j N − ν f k ( ν ) f j (A8) T t N − G ≡ X ν t tν,j N − ν g k ( ν ; θ k ) . (A9)Note that in the above, T T N − T is the block containing the second derivatives of the log density with respect tothe amplitudes ( a ν,j , a ν ′ ,k ), but by the assumed independence of noise at different frequency channels, the terms with ν = ν ′ vanish. The elements of the upper part of the third row are F t N − F ≡ X ν f j ( ν ) f tj N − ν f k f k ( ν ) (A10) F t N − G ≡ X ν f j ( ν ) f tj N − g k ( ν ; θ k ) , (A11)and, finally, for the last block on the diagonal we have G t N − G = X ν g tj ( ν ; θ j ) N − ν g k ( ν ; θ k ) . (A12) JEFFREYS’ IGNORANCE PRIOR FOR A SPECTRAL INDEX In § a ( ν ).The data model is in this case d ν = Aa ( ν ) (cid:18) νν (cid:19) β + n ν , (B1)where n ν is a noise term with variance σ ν = (cid:10) n ν (cid:11) . We assume no noise correlations between frequencies. Thelog-likelihood then reads − L ( β ) ∝ X ν (cid:18) d ν − Aa ( ν )( ν/ν ) β σ ν (cid:19) . (B2)2Jeffreys’ ignorance prior is defined by P J ( β ) ∼ p F ββ = s − (cid:28) ∂ ln L ∂ β (cid:29) , (B3)where the averaging brackets denote an ensemble average. We therefore differentiate the log-likelihood twice and find − ∂ ln L ∂ β = 4 X ν σ ν ( A a ( ν ) (cid:18) νν (cid:19) β ln (cid:18) νν (cid:19) − (cid:20) d ν − Aa ( ν ) (cid:18) νν (cid:19)(cid:21) Aa ( ν ) (cid:18) νν (cid:19) β ln (cid:18) νν (cid:19)) (B4)Taking the ensemble average of this expression simply means setting h d ν i = Aa ( ν )( ν/ν ) β , and the second termtherefore vanishes, − (cid:28) ∂ ln L ∂ β (cid:29) ∝ X ν A a ( ν ) σ ν (cid:18) νν (cid:19) β ln (cid:18) νν (cid:19) , (B5)neglecting irrelevant constants. Thus, the full expression for Jeffreys’ ignorance prior for a spectral index β reads P J ( β ) ∝ vuutX ν a ( ν ) σ ν (cid:18) νν (cid:19) β ln (cid:18) νν (cid:19) (B6) TEMPLATE ORTHOGONALITY CONSTRAINTS
Most of the discussion in § χ .To break this degeneracy, two approaches were proposed. First, if external calibration is possible at one or morefrequencies, such information should be exploited and conditioned upon. The second approach was to demand thatthe free offsets do not have frequency component similar to that of the foreground, by effectively fitting out thecorresponding spectrum from the offsets.These issues apply more generally to any fixed template with a free amplitude at all frequency bands. For mosttypical applications, this includes also three unknown dipole components, in addition to the familiar offset or monopole.In the following, we simply consider a general collection of templates denoted by T , which is an N pix × N t matrixconsisting of N t templates listed in its columns. Coupled to this, we define a coefficient vector a ν = { a ν,i } containingthe template amplitudes for each frequency and template. The sky response at frequency ν is thus given by Ta ν .We now derive the joint orthogonality constraints for N t templates, for a sky model that includes two componentswith a free amplitude at each pixel, namely a CMB sky signal and a foreground component with a given frequencyspectrum. We denote the vectors of free template coefficients for the CMB and foreground terms by b and c , respec-tively, and the foreground spectrum is defined by the N pix × N pix diagonal matrix F ν having entries equal to g ( ν ; θ )on the diagonal. Note that the frequency spectrum of the CMB component is constant, and the corresponding matrixis therefore the identity. It is omitted in the following.With this notation, the χ to be minimized reads χ = X ν (cid:0) a tν T t − b t T t − c t F tν T t (cid:1) N − ν ( Ta − Tb − TF ν c ) (C1)Equating the derivatives of this expression with respect to b and c to zero gives ∂χ ∂ b = − X ν T t N − ν ( Ta − Tb − TF ν c ) = 0 (C2) ∂χ ∂ c = − X ν F tν T t N − ν ( Ta − Tb − TF ν c ) = 0 (C3)These two sets of equations provide a general expression for a as a function of b and c , and at this point we have donenothing more than performed a partial change of basis.We now impose the prior that breaks the degeneracy between the template and the foreground amplitudes, byrequiring c = 0. The two above equations then has a unique solution. In particular, a is given by X ν ( B ν − DC − A ν ) a = 0 , (C4)3where we, for notational transparency, have defined four ancillary matrices A ν = T t N − ν T (C5) B ν = F t T t N − ν T (C6) C = X ν A ν (C7) D = X ν B ν . (C8)Note that Equation C4 corresponds to N t separate constraints on a , and, in particular, each row of the matrix inparentheses defines one orthogonality vector q ν . These constraints are thus imposed in the CG solver using the sameprojection operator method that was described in § REFERENCESAde, P., et al. 2007, ApJ, submitted, [arXiv:0705.2359]Baccigalupi, C., Burigana, C., Perrotta, F., De Zotti, G., LaPorta, L., Maino, D., Maris, M., & Paladini, R. 2001, A&A,372, 8Baccigalupi, C. 2003, New Astronomy Review, 47, 1127Baccigalupi, C., Perrotta, F., de Zotti, G., Smoot, G. F.,Burigana, C., Maino, D., Bedini, L., & Salerno, E. 2004,MNRAS, 354, 55Banday, A. J., G´orski, K. M., Bennett, C. L., Hinshaw, G.,Kogut, A., & Smoot, G. F. 1996, ApJ, 468, L85Barreiro, R. B., Hobson, M. P., Banday, A. J., Lasenby, A. N.,Stolyarov, V., Vielva, P., & G´orski, K. M. 2004, MNRAS, 351,515Bennett, C. L. et al. 2003a, ApJS, 148, 1Bennett, C. L., et al. 2003b, ApJS, 148, 97Bond, J. R., Jaffe, A. H., & Knox, L. 1998, Phys. Rev. D, 57, 2117Bouchet, F. R., & Gispert, R. 1999, New Astronomy, 4, 443Box, G. E. P, & Tiao, G. C. 1992,
Bayesian Inference inStatistical Analysis , Wiley Classics Library edition, John Wileyand SonsBrandt, W. N., Lawrence, C. R., Readhead, A. C. S.,Pakianathan, J. N., & Fiola, T. M. 1994, ApJ, 424, 1Chu, M., Eriksen, H. K., Knox, L., G´orski, K. M., Jewell, J. B.,Larson, D. L., O’Dwyer, I. J., & Wandelt, B. D. 2005,Phys. Rev. D, 71, 103002Delabrouille, J., Cardoso, J.-F., & Patanchon, G. 2003, MNRAS,346, 1089Dickinson, C., Davies, R. D., & Davis, R. J. 2003, MNRAS, 341,369Donzelli, S., et al. 2006, MNRAS, 369, 441Efstathiou, G. 2004, MNRAS, 349, 603Eriksen, H. K., Banday, A. J., G´orski, K. M., & Lilje, P. B.2004a, ApJ, 612, 633Eriksen, H. K., et al. 2004b, ApJS, 155, 227Eriksen, H. K., et al. 2006, ApJ, 641, 665Eriksen, H. K., et al. 2007a, ApJ, 656, 641Eriksen, H. K., Huey, G., Banday, A. J., G´orski, K. M., Jewell,J. B., O’Dwyer, I. J., & Wandelt, B. D. 2007b, ApJ, 665, L1Eriksen, H. K., Dickinson, C., Jewell, J. B., Banday, A. J.,G´orski, K. M., & Lawrence, C. R. 2007c, ApJ, submitted,[arXiv:0709.1037]Finkbeiner D.P., Davis M., & Schlegel D.J. 1999, ApJ, 524, 867Finkbeiner, D. P. 2003, ApJS, 146, 407 Gelfand, A. E., & Smith, A. F. M. 1990, J. Am. Stat. Asso., 85,398Gelman, A., & Rubin, D. 1992, Stat. Sci., 7, 457Giardino, G., Banday, A. J., G´orski, K. M., Bennett, K., Jonas,J. L., & Tauber, J. 2002, A&A, 387, 82G´orski, K. M. 1994, ApJ, 430, L85G´orski, K. M., Banday, A. J., Bennett, C. L., Hinshaw, G.,Kogut, A., Smoot, G. F., & Wright, E. L. 1996, ApJ, 464, L11G´orski, K. M. 1997, Proc. XVIth Moriond Astrophysics Meeting,Microwave Background Anisotropies, Editions Frontieres,Gif-sur-Yvette, p.77, [astro-ph/9701191]G´orski, K. M., Hivon, E., Banday, A. J., Wandelt, B. D., Hansen,F. K., Reinecke, M., & Bartelmann, M. 2005, ApJ, 622, 759Haslam, C. G. T., Salter, C. J., Stoffel, H., & Wilson, W. 1982,A&AS, 47, 1Hinshaw, G., et al. 2007, ApJS, 170, 288Hivon, E., G´orski, K. M., Netterfield, C. B., Crill, B. P.,Prunet,S., & Hansen, F. 2002, ApJ, 567, 2Hobson, M. P., Jones, A. W., Lasenby, A. N., & Bouchet, F. R.1998,MNRAS, 300, 1Jeffreys, H. 1961,
Theory of probability , third edition, Oxford,Clarendon PressJewell, J., Levin, S., & Anderson, C. H. 2004, ApJ, 609, 1Kuo, C. L., et al. 2007, ApJ, 664, 687Larson, D. L., Eriksen, H. K., Wandelt, B. D., G´orski, K. M.,Huey, G., Jewell, J. B., & O’Dwyer, I. J. 2007, ApJ, 656, 653Lewis, A., & Bridle, S. 2002, Phys. Rev. D, 66, 103511Liu, J. S. 2001,
Monte Carlo Strategies in Scientific Computing ∼ quake-papers/painless-conjugate-gradient.psSievers, J. L., et al. 2007, ApJ, 660, 976Smith, K. M., Zahn, O., & Dor´e, O. 2007, Phys. Rev. D, 76,043510Spergel, D. N., et al. 2007, ApJS, 170, 377Stivoli, F., Baccigalupi, C., Maino, D., & Stompor, R. 2006,MNRAS, 372, 6154