Astrophysical data analysis with information field theory
AAstrophysical data analysis with information field theory
Torsten Enßlin
Max Planck Institut für Astrophysik (Karl-Schwarzschild-Straße 1, D-85748 Garching, Germany), andLudwig-Maximilians-Universität München (Geschwister-Scholl-Platz 1, D-80539 München, Germany)
Abstract.
Non-parametric imaging and data analysis in astrophysics and cosmology can be addressed by information fieldtheory (IFT), a means of Bayesian, data based inference on spatially distributed signal fields. IFT is a statistical field theory,which permits the construction of optimal signal recovery algorithms. It exploits spatial correlations of the signal fields evenfor nonlinear and non-Gaussian signal inference problems. The alleviation of a perception threshold for recovering signals ofunknown correlation structure by using IFT will be discussed in particular as well as a novel improvement on instrumental self-calibration schemes. IFT can be applied to many areas. Here, applications in in cosmology (cosmic microwave background,large-scale structure) and astrophysics (galactic magnetism, radio interferometry) are presented.
INFORMATION FIELD THEORY
Information field theory (IFT) is information theory for fields, describing in mathematical language how informationon spatially distributed quantities following some physical laws can optimally be extracted from data. IFT exploitsknown or inferred correlation structures of the field of interest s ( s is regarded as a fuction s = s ( x ) and as a vector s = ( s x ) x ∈ Ω in a function space) over some domain Ω = { x } in order to regularize the otherwise ill-posed inverseproblem of determining virtually infinitely many field degrees of freedom from a finite dataset d = ( d , . . . d n ) T = ( d i ) i , n ∈ N . What distinguishes it from many non-parametric inference methods is that an IFT is defined over continuousspaces, and any pixelization of the field used in actual computations must preserve this continuum limit and recover itfor infinitely small pixels. A concise introduction into IFT can be found Ref. [1], exhaustive ones in Refs. [2, 3], andthe numerical issues of properly discretized fields are addressed in Refs. [4, 5].Signal field estimation in IFT relies on Bayes theorem that gets recast into a (statistical) field theoretical language: P ( s | d ) = P ( d | s ) P ( s ) P ( d ) ≡ e − H ( d , s ) Z d . (1)This defines the information Hamiltonian and its partition function to be H ( d , s ) ≡ − ln P ( d , s ) = − ln P ( d | s ) − ln P ( s ) and (2) Z d ≡ (cid:90) D s e − H ( d , s ) = (cid:90) D s P ( d , s ) = P ( d ) , THE MEASUREMENT PROBLEM
In a generic linear measurement the data depend linearly on the signal, d = R s + n = (cid:18) (cid:90) Ω dx R ix s x + n i (cid:19) i , (4) a r X i v : . [ a s t r o - ph . I M ] M a y ith R the instrument response operator, which describes how the instrument senses the signal, and n the noise. In thesimplest case, signal and noise priors are independent Gaussian distributions, P ( n , s ) = G ( s , S ) G ( n , N ) , with G ( φ , Φ ) = (cid:112) | π Φ | exp (cid:18) − φ † Φ − φ (cid:19) , (5)with S and N known signal and noise covariances, respectively, and † denoting the adjoint of a vector or field. Then,the signal posterior is Gaussian as well, P ( s | d ) = G ( s − m , D ) , with m = D j , D = (cid:0) S − + R † N − R (cid:1) − , and j = R † N − d . (6)Here, m is the signal posterior mean or Wiener filter estimate, D the signal uncertainty covariance or informationpropagator, and j the information source [3].A typical signal in astronomy is the intensity distribution I x of light of a certain wavelength on the celestial sphere Ω = S . Since I x is strictly positive and can vary over orders of magnitude, the logarithmic brightness s x = log ( I x / I ) would be a natural variable that can be as well positive as negative. The Maximum Entropy principle singles out aGaussian prior for s if only information up to the second moment of the s -distribution is available a priori or should betaken into account. Furthermore, the likelihood might be a Poisson distribution, if individual photons are counted. Allthis complicates the inference problem and introduces so-called interaction terms into the information Hamiltonian, butcan in principle be dealt with the toolbox of IFT [3, 6, 7]. For simplicity, we assume in the following the measurementEq. (4).In practical applications of IFT, it is often the case that the signal covariance S , the noise covariance N , the response R , or combinations thereof are not sufficiently known for applying the Wiener filter given by Eq. (6). We denote allsuch undetermined parameters with the parameter vector p . Its uncertainty has to be taken into account in the signalinference. A p -marginalized joint probability of data and signal might be calculated analytically, P ( d , s ) = (cid:90) D p P ( d , s , p ) . (7)However, the corresponding information Hamiltonian H ( d , s ) = − ln P ( d , s ) is usually a complicated function of thesignal, preventing an easy calculation of the posterior mean m . Simple MAP signal estimators, resulting from solving ∂ H ( d , s ) / ∂ s = s , can be highly biased due to the strong skewness of the marginalized posterior [8].The Empirical Bayes approach performs much better. There, the s -marginalized parameter posterior P ( d , p ) = (cid:82) D s P ( d , s , p ) is used to choose a good point estimate p (cid:63) , which then is assumed to be correct in a signal meaninference based on P ( s | d , p (cid:63) ) . The reason for Empirical Bayes to be a logically consistent approximation can beobserved in the following reformulation of the posterior mean signal, m ≡ (cid:104) s (cid:105) ( s , p | d ) ≡ (cid:90) D p (cid:90) D s P ( s , p | d ) s = (cid:90) D p P ( p | d ) (cid:90) D s P ( s | d , p ) s = (cid:104)(cid:104) s (cid:105) ( s | d , p ) (cid:105) ( p | d ) ≈ (cid:104) s (cid:105) ( s | d , p (cid:63) ) . (8)In the last step a delta-approximation, P ( p | d ) ≈ δ ( p − p (cid:63) ) , was used. Thus, Empirical Bayes can be regarded asthe zeroth order term of a perturbation expansion in the posterior parameter uncertainty. The next order term mightbe obtained by using a Gaussian approximation of P ( p | d ) . A more systematic expansion can be obtained usingthe minimal Gibbs free energy approach [6]. However, already the empirical Bayes approximation leads to well-working signal inference recipes in case of unknown hyper-parameters for the problems of unknown covariancesand/or response. UNKNOWN SIGNAL COVARIANCE
The signal covariance S might be unknown, however, statistical homogeneity can often be assumed, i.e., the correlation (cid:104) s x ¯ s y (cid:105) ( s ) = S xy = C s ( x − y ) is only a function of the distance between x , and y . This implies that S is diagonal in Fourierspace, S kk (cid:48) = ( π ) u δ ( k − k (cid:48) ) P s ( k ) , (9)with k , k (cid:48) being u -dimensional Fourier wave-vectors. In this case, a suitable hyper-prior for the unknown signal powerspectrum P s ( k ) might be given by a term penalizing deviations from power-law spectra and inverse Gamma priors for .11101000.01 0.1 1 10 100critical filter x = ( P d / P N )( k i ) y = p i P Q ( k i ) PUREMAP spectrumclassicaljoint MAP - ϱ /(2 ϱ +4) δε MAP spectrumPUREcriticalclassicaljoint MAP no p e r ce p ti on t r e s ho l dp e r ce p ti on t r e s ho l d Source: Figures from [8]. the individual Fourier modes, P ( P s ) ∝ exp (cid:34) − (cid:90) dk (cid:18) ∂ log P s ( k ) σ k ∂ log k (cid:19) (cid:35)(cid:124) (cid:123)(cid:122) (cid:125) ≡− log P † s T log P s ∏ k [ P s ( k )] − α k exp (cid:18) − q k P s ( k ) (cid:19) . (10)The hyper-parameters σ k , α k , and q k determine how strongly spectral curvature is penalized and how informative thespectral prior is for the individual modes. α k = q k = Empirical Bayes approximation we have to deduce a point estimate of the spectrum. Using the MAP approximation inlog P s ( k ) for this [7], we obtain P (cid:63) s ( k ) = q k + Tr (cid:104)(cid:0) m (cid:63) m (cid:63) † + D (cid:63) (cid:1) ( k ) (cid:105) α k − + ρ k + ( T log P (cid:63) s ) k , (11)with ( k ) a projection onto all Fourier modes with the same power-spectrum as k (e.g., on spheres in Fourier space,in case of statistical isotropy) and ρ k = Tr [ k ] the number of such modes. m (cid:63) and D (cid:63) are the mean and uncertaintycovariance, respectively, calculated for the Gaussian inference problem assuming P (cid:63) s ( k ) to be self-consistently thecorrect power spectrum.It is instructive to investigate this signal filter operation (the self-consistent calculation of m (cid:63) and P (cid:63) s ) in the contextof similar filters. For this, we simplify the problem by assuming that signal and noise covariances, as well as responseare diagonal in Fourier space, and that our spectral prior is completely non-informative ( σ k = ∞ , α k =
1, and q k = k ). A generic spectral estimator (to be calculated consistently with the corresponding signal estimator m (cid:63) = D (cid:63) j ) isthen P (cid:63) s ( k ) = Tr (cid:104)(cid:0) m (cid:63) m (cid:63) † + δ D (cid:63) (cid:1) ( k ) (cid:105) ρ k + ε , (12)where the parameters δ and ε have been introduced in order to see how differently signal and spectrum estimatorsperform. The above derived non-informative filter corresponds to ( δ , ε ) = ( , ) , but other assumptions or approxi-mations might lead to different values. These are depicted in the top right figure on this page. For example, the MAPestimator using the power spectrum marginalized posterior leads to ( δ , ε ) = ( , ) (labeled “classical”) and a param-eter uncertainty renormalized estimator (PURE) can be projected to ( δ , ε ) = ( , − ρ / ( ρ + )) . The above estimatorwith ( δ , ε ) = ( , ) is called critical filter , because it resides on a critical line for perception thresholds. All the es-timators to the left of this line exhibit a perception threshold. They do not respond at all to data modes with powerbelow some critical value. This is depicted in the left figure, where the power in the resulting reconstructions of thesefilters is shown as a function of the ratio of data to noise power. Filters with perception threshold require the datao have more variance than expected from the noise. For example, the classical filter with ( δ , ε ) = ( , ) does notaccount for the missing power in the reconstruction. As a consequence it has a strong perception threshold. It requiresthe data to have four times the variance of the noise before it recognizes a signal. The critical filter has only a marginalperception threshold at the point where the data variance is exactly the noise variance. The PURE filter tries even toextract information from data with less variance than expected from the noise level. UNKNOWN NOISE COVARIANCE
The critical filter needs to know precisely the noise level of the data in order to tune itself optimally. However, inmany measurement situations this is also not reliably known and the data have to provide this information as well.As noise and signal variance are somehow degenerately encoded in the data, an informative prior for the noise levelis mandatory whenever a non-informative prior for the signal power spectrum is used. The noise covariance might bedecomposed as N = ∑ i η i N i , where the N i denote block matrices in data space that contain the best available guessesfor the error covariances. We model our noise prior knowledge with a multivariate inverse-gamma prior, P ( η ) ∝ ∏ i η − β i i exp (cid:18) − r i η i (cid:19) , (13)by choosing β i > r i > N i appropriately.The resulting signal estimation scheme is as before, just with an additional estimator for the noise parameters, givenby η (cid:63) i = r i + Tr (cid:8)(cid:2) ( d − R m (cid:63) ) ( d − R m (cid:63) ) † + R D (cid:63) R † (cid:3) N − i (cid:9) β i − + µ i , (14)where N − i is the pseudo-inverse of N i and µ i = Tr ( N i N − i ) the dimensionality of the noise blocks [9]. This extendedcritical filter has successfully been used to reconstruct an all sky map of the galactic magnetic field [10]. UNKNOWN CALIBRATION
Finally, it might also well be that the response in Eq. (4) is not fully known. The process and the result of determiningthe response is called calibration . Ideally, the measurement of a well-known external calibration signal is used.However, often this is not sufficient, since the instrument response might change with time. In this case, the scientificsignal of interest might be used as a calibrator itself. The resulting self-calibration scheme ( selfcal ) assumes somecalibration, infers the signal with a reconstruction method, assumes this signal to be correct in order to calibrate onthis, and iterates until convergence. Although the selfcal scheme is widely used, in particular in radio astronomy, aninformation theoretical investigation and a proof about its convergence was only provided recently [11].There it has been shown that if the imaging and calibration algorithms used can be regarded as MAP estimators,the classical selfcal scheme corresponds to a joint MAP estimation of signal and calibration. Similar to the case ofunknown signal covariance, we expect such a joint MAP estimation to be sub-optimal in an L -error norm sense.Specifically, we assume the unknown calibration parameters γ = ( γ i ) i to affect the response linearly, R γ = B + ∑ a γ a B a , (15)with B and B a known and γ -independent, and P ( γ ) = G ( γ , Γ ) the calibration prior with some known calibrationuncertainty matrix Γ . Using the Empirical Bayes approach, we find that the signal reconstruction m (cid:63) should use thecalibration point estimate given by Ref. [11] self-consistently γ (cid:63) = ∆ (cid:63) h (cid:63) , with (16) ∆ (cid:63) − ab = Γ − ab + Tr (cid:104)(cid:0) m (cid:63) m (cid:63) † + D (cid:63) (cid:1) B a † N − B b (cid:105) , and h (cid:63) b = m (cid:63) † B b † N − d − Tr (cid:104)(cid:0) m (cid:63) m (cid:63) † + D (cid:63) (cid:1) B N − B b (cid:105) . .00.50.00.51.01.5 s i g n a l & r e c o n s t r u c t i o n signal domain [position] r e c o n s t r u c t i o n e rr o r original signalclassical selfcalnew selfcal g a i n & c a li b r a t i o n data domain [time] c a li b r a t i o n e rr o r original gains classical selfcalnew selfcal Source: Figures from [11].
This is a new selfcal scheme, which improves over classical selfcal by taking the posterior signal uncertainty D (cid:63) = (cid:104) ( s − m (cid:63) ) ( s − m (cid:63) ) † (cid:105) ( s | d , γ (cid:63) ) into account.An example of selfcal can be seen in the figure on this page. There, an unknown signal is shown, which wasscanned three times with an instrument with varying gains (also shown) according to the measurement equation d t = ( + γ t ) s x t + n t with x t = t mod1. The solutions of the classical selfcal and the new selfcal schemes are comparedto this. Both solve Eq. (6) and Eq. (16) simultaneously, while classical selfcal ignores D (cid:63) in Eq. (16). The figure showsthat the new selfcal scheme alleviates partly a bias of classical selfcal for high gain solutions. ASTROPHYSICAL APPLICATIONS
The versatility of IFT to deal with various measurement problems, to take uncertainties on covariances and responsesinto account and to provide implementable algorithms have let to a number of concrete IFT applications in astronomyand astrophysics. Many of them build on NIFT Y , Numerical Information Field Theory [4, 5], a publicly available,object-oriented library that permits the user to program IFT algorithms abstractly, irrespective of the space discretiza-tion used. IFT was already applied to a number of areas, which we briefly discuss in the following.The cosmic microwave background statistic is nearly Gaussian. However, different inflationary scenarios of cos-mology predict different levels of non-Gaussian signatures. Optimal non-Gaussianity estimators could be constructedusing Feynman diagrams [3] and by a saddle point approximation [12].The cosmic large-scale structure of the matter distribution in the Universe is traced by galaxies and can thereforebe reconstructed from galaxy surveys [13, 14, 15, and others]. The power spectrum of the large-scale structure can bemeasured as well [14] and analyzed for its information content on cosmology. Although the cosmic density field is astrictly positive quantity, it is often treated as a Gaussian random field. A log-normal model would be more accurate3, 16, 17]. For its usage, it is necessary to translate between linear and logarithmic spectra [18]. Galactic magnetism can be studied using the Faraday rotation of extragalactic radio sources. The construction ofall-sky Faraday maps from such individual measurements towards the directions of these sources was possible thanksto the critical filter [19] and the extended critical filter [20].
Interferometric radio astronomy faces the problem that an interferometer array measures only some part of theFourier transformed sky brightness. The other part has to be reconstructed by the imaging algorithm.
RESOLVE , anovel IFT based imaging method, which assumes the sky brightness to follow a log-normal distribution with unknownpower spectrum, seems to be superior in reconstructing diffuse emission compared to classical algorithms (e.g.
CLEAN and
MAXENT ) [21].
Gamma- and X-ray astronomy have to deal with extended emission, point sources, Poisson statistics, inhomoge-neous sky exposure, and complex point spread functions. The IFT based D PO algorithm for denoising, deconvolving,and decomposition of photon observations handles these and produces maps of the extended emission, catalogs of thepoint sources, and angular power spectra of the diffuse flux [22].These applications demonstrate that IFT is a versatile framework to develop and analyze measurement and imagingproblems not only in astronomy, but also in cosmology and other areas.
ACKNOWLEDGMENTS
First of all, I should apologize for exclusively reviewing IFT work I was involved in. There is a large amount ofrelated work I had to ignore for brevity. I want to thank my IFT co-investigators of the here discussed works, MichaelBell, Sebastian Dorn, Mona Frommert, Maksim Greiner, Jens Jasche, Henrik Junklewitz, Righi Khatri, FranciscoShu Kitaura, Niels Oppermann, Martin Reinecke, Georg Robbers, Marco Selig, and Cornelius Weig. Furthermore, Iacknowledge helpful comments on the manuscript by Vanessa Böhm, Sebastian Dorn, and Marco Selig.
REFERENCES
1. T. Enßlin, “Information field theory,” in
American Institute of Physics Conference Series , edited by U. von Toussaint, 2013,vol. 1553 of
American Institute of Physics Conference Series , pp. 184–191, .2. J. C. Lemm,
Bayesian Field Theory , Johns Hopkins University Press, 2003.3. T. A. Enßlin, M. Frommert, and F. S. Kitaura,
Phys.Rev.D , 105005 (2009), .4. M. Selig, M. R. Bell, H. Junklewitz, N. Oppermann, M. Reinecke, M. Greiner, C. Pachajoa, and T. A. Enßlin, A&A , A26(2013), .5. M. Selig, “The NIFT Y way of Bayesian signal inference,” in Bayesian Inference and Maximum Entropy Methods in Scienceand Engineering , 2013, vol. these proceedings.6. T. A. Enßlin, and C. Weig,
Phys.Rev.E , 051112 (2010), .7. N. Oppermann, M. Selig, M. R. Bell, and T. A. Enßlin, Phys.Rev.E , 032136 (2013), .8. T. A. Enßlin, and M. Frommert, Phys.Rev.D , 105014 (2011), .9. N. Oppermann, G. Robbers, and T. A. Enßlin, Phys.Rev.E , 041118 (2011), .10. N. Oppermann et al., A&A , A93 (2012), .11. T. A. Enßlin, H. Junklewitz, L. Winderling, and M. Selig,
ArXiv e-prints (2013), .12. S. Dorn, N. Oppermann, R. Khatri, M. Selig, and T. A. Enßlin,
Phys.Rev.D , 103516 (2013), .13. F. S. Kitaura, and T. A. Enßlin, MNRAS , 497–544 (2008), .14. J. Jasche, F. S. Kitaura, B. D. Wandelt, and T. A. Enßlin,
MNRAS , 60–85 (2010), .15. F.-S. Kitaura, P. Erdoˇgdu, S. E. Nuza, A. Khalatyan, R. E. Angulo, Y. Hoffman, and S. Gottlöber,
MNRAS , L35–L39(2012), .16. F. S. Kitaura et al.,
MNRAS , 183–203 (2009), .17. C. Weig, and T. A. Enßlin,
MNRAS , 1393–1411 (2010), .18. M. Greiner, and T. A. Enßlin,
ArXiv e-prints (2013), .19. N. Oppermann, H. Junklewitz, G. Robbers, and T. A. Enßlin,
A&A , A89+ (2011), .20. N. Oppermann, H. Junklewitz, G. Robbers, M. R. Bell, T. A. Enßlin, A. Bonafede, R. Braun, J. C. Brown, T. E. Clarke, I. J.Feain, B. M. Gaensler, A. Hammond, L. Harvey-Smith, G. Heald, M. Johnston-Hollitt, U. Klein, P. P. Kronberg, S. A. Mao,N. M. McClure-Griffiths, S. P. O’Sullivan, L. Pratley, T. Robishaw, S. Roy, D. H. F. M. Schnitzeler, C. Sotomayor-Beltran,J. Stevens, J. M. Stil, C. Sunstrum, A. Tanna, A. R. Taylor, and C. L. Van Eck,
A&A , A93 (2012), .21. H. Junklewitz, M. R. Bell, M. Selig, and T. A. Enßlin,
ArXiv e-prints (2013), .22. M. Selig, and T. Enßlin,
ArXiv e-prints (2013),1311.1888