[PDF] Cosmological Inference from Galaxy-Clustering Power Spectrum: Gaussianization and Covariance Decomposition

Abstract

Likelihood fitting to two-point clustering statistics made from galaxy surveys usually assumes a multivariate normal distribution for the measurements, with justification based on the central limit theorem given the large number of overdensity modes. However, this assumption cannot hold on the largest scales where the number of modes is low. Whilst more accurate distributions have previously been developed in idealized cases, we derive a procedure suitable for analysing measured monopole power spectra with window effects, stochastic shot noise, and the dependence of the covariance matrix on the model being fitted all taken into account. A data transformation is proposed to give an approximately Gaussian likelihood, with a variance--correlation decomposition of the covariance matrix to account for its cosmological dependence. By comparing with the modified- t likelihood derived under the usual normality assumption, we find in numerical tests that our new procedure gives more accurate constraints on the local non-Gaussianity parameter f NL , which is sensitive to the large-scale power. A simple data analysis pipeline is provided for straightforward application of this new approach in preparation for forthcoming large galaxy surveys such as DESI and Euclid.

Full PDF

MMNRAS , 951–965 (2019) Preprint 2019 April 23 Compiled using MNRAS L A TEX style ﬁle v3.0

Cosmological inference from galaxy-clustering power spectrum:Gaussianization and covariance decomposition

Mike (Shengbo) Wang , (cid:63) Will J. Percival,

Santiago Avila , Robert Crittenden and Davide Bianchi Institute of Cosmology and Gravitation, University of Portsmouth, Burnaby Road, Portsmouth PO1 3FX, UK Department of Physics and Astronomy, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada Perimeter Institute for Theoretical Physics, 31 Caroline Street North, Waterloo, Ontario N2L 2Y5, Canada Institut de Ciències del Cosmos, University of Barcelona, IEEC-UB, Martí i Franqués, 1, E-08028 Barcelona, Spain

Accepted 2019 March 18. Received 2019 March 15; in original form 2018 November 19

ABSTRACT

Likelihood ﬁtting to two-point clustering statistics made from galaxy surveys usually assumesa multivariate normal distribution for the measurements, with justiﬁcation based on the cen-tral limit theorem given the large number of overdensity modes. However, this assumptioncannot hold on the largest scales where the number of modes is low. Whilst more accuratedistributions have previously been developed in idealized cases, we derive a procedure suit-able for analysing measured monopole power spectra with window eﬀects, stochastic shotnoise and the dependence of the covariance matrix on the model being ﬁtted all taken intoaccount. A data transformation is proposed to give an approximately Gaussian likelihood, witha variance–correlation decomposition of the covariance matrix to account for its cosmologicaldependence. By comparing with the modiﬁed- t likelihood derived under the usual normalityassumption, we ﬁnd in numerical tests that our new procedure gives more accurate constraintson the local non-Gaussianity parameter f NL , which is sensitive to the large-scale power. Asimple data analysis pipeline is provided for straightforward application of this new approachin preparation for forthcoming large galaxy surveys such as DESI and Euclid . Key words: methods: data analysis – methods: statistical – large-scale structure of Universe.

The matter power spectrum, which measures the two-point corre-lation in Fourier space, is an important statistic for describing thelarge-scale structure of the Universe. It contains all of the informa-tion about a Gaussian random ﬁeld, which describes cosmic densityﬂuctuations on large scales where non-linearities are negligible.Galaxies are linearly biased tracers of the underlying matter dis-tribution on large scales, and thus measurements of the comovinggalaxy-clustering power spectrum can provide a wealth of infor-mation about fundamental cosmological parameters. Moreover, asthe angular positions and redshifts of galaxies are observed, match-ing the expected comoving clustering oﬀers a geometrical test ofthe Universe through the distance–redshift relationship, and mea-surements of redshift space distortions (RSD) provide a powerfulprobe of structure growth. Upcoming large-volume surveys, includ-ing the Dark Energy Spectroscopic Instrument (DESI Collabo-ration; Aghamousa et al. 2016) and Euclid (Euclid Consortium; (cid:63) Email: [email protected] Laureijs et al. 2011), will be able to tightly constrain cosmologi-cal models with unprecedented precision, but the accuracy of theseconstraints relies on performing careful statistical analyses.The multivariate normal distribution is ubiquitous in modellingcosmological observables thanks to the central limit theorem, andthis normality assumption is commonly found in likelihood analysesof power spectrum measurements from galaxy surveys (e.g. Alamet al. 2017; Abbott et al. 2018). Given a theoretical model for thedata, the key ingredient of a multivariate normal distribution is itscovariance matrix, and in the past many eﬀorts have been devotedto the accurate estimation of covariance matrices subject to limitedcomputational resources.Unbiased covariance matrix estimates are often made from aset of mock galaxy catalogues synthesized using algorithms rangingfrom fast but approximate perturbation-theory algorithms to slowyet detailed N -body simulations, or diﬀerent combinations of those(e.g. Manera et al. 2013; Kitaura et al. 2016; Avila et al. 2018). Anoverview and comparison of those methods is provided in a series ofpapers by Lippich et al. (2019), Blot et al. (2019), and Colavincenzoet al. (2019). One could further reduce the computational costs andenhance the precision in this estimation step through various statis-tical techniques, e.g. the shrinkage method for combining empirical © 2019 The Authors a r X i v : . [ a s t r o - ph . C O ] A p r M. S. Wang et al. estimates and theoretical models (Pope & Szapudi 2008) and thecovariance tapering method (Kaufman et al. 2008; Paz & Sánchez2015).However, there are multiple caveats to using an estimated co-variance matrix. As noted by Hartlap et al. (2007) and known instatistics (see e.g. Anderson 2003), the inverse of an unbiased co-variance matrix estimate is biased with respect to the true precision(inverse covariance) matrix that appears in the likelihood function.Therefore, one needs to include a multiplicative correction that isdependent on the data dimension and the number of catalogue sam-ples used for estimation. Further, the error in covariance estimationmust be properly propagated to inferred parameter uncertainties,and achieving desired precision requires a large number of cata-logue samples (Dodelson & Schneider 2013; Taylor et al. 2013;Percival et al. 2014). Rather than correcting the derived errors, onecould adopt the Bayesian principle and treat both the unknown un-derlying covariance matrix and its estimate as random variables, andmarginalize over the former given the latter using Bayes’ theorem(Sellentin & Heavens 2016). In addition, as mock catalogues arecomputationally expensive, they are often produced at ﬁxed ﬁducialcosmological parameters, whereas in reality the covariance matrixmay have cosmological dependence and thus has to vary with themodel being tested (see e.g. Eiﬂer et al. 2009, in the context ofcosmic shear). Failure to account for any of these factors could ad-versely impact parameter estimation, which should be an importantconcern to future surveys possessing greater statistical power.Another fundamental issue to be addressed is the normalityassumption itself when the premises of the central limit theorem arenot fulﬁlled. In this scenario, an arbitrarily precise covariance ma-trix estimate is not suﬃcient for accurate parameter inference dueto signiﬁcant higher moments in the non-Gaussian likelihood, asdemonstrated by Sellentin & Heavens (2018) in the context of weaklensing. This also happens to galaxy-clustering measurements on thelargest survey scales, where fewer overdensity modes are availabledue to the ﬁnite survey size; if one is to infer parameters sensi-tive to these large-scale measurements while assuming a Gaussianlikelihood, the parameter estimates are likely to be erroneous.Indeed, in the past there have been eﬀorts to go beyond theGaussian likelihood approximation in various contexts (e.g. recentlySellentin et al. 2017; Seljak et al. 2017). Motivated by constraintsimposed by non-negativity of the power spectrum, Schneider &Hartlap (2009), Keitel & Schneider (2011) and Wilking & Schnei-der (2013) have found a transformation that improves the normalityassumption for the bounded conﬁguration-space correlation func-tion which has a non-normal distribution. Sun et al. (2013) haveconsidered the gamma distribution for the power spectrum mul-tipoles and the log-normal plus Gaussian approximation, and as-sessed their eﬀects on parameter estimation in comparison withthe Gaussian approximations. Kalus et al. (2016) have investigatedthis problem for the three-dimensional power spectrum by derivingthe probability distribution of a single-mode estimator for Gaussianrandom ﬁelds, and comparing it with approximations inspired bysimilar studies for the cosmic microwave background (e.g. Verdeet al. 2003; Percival & Brown 2006; Smith et al. 2006; Hamimeche& Lewis 2008). However, these previous works are either limited tothe univariate or bivariate distribution, or have neglected windowfunctions due to survey geometry and selection eﬀects, which couldcorrelate independent overdensity modes.In this work, we focus on the power spectrum monopole,following the Feldman–Kaiser–Peacock approach (Feldman et al.1994, hereafter FKP). Our work can also be trivially extended tothe second and fourth power-law moments measured using the Ya- mamoto estimator (Yamamoto et al. 2006). The two approaches areequivalent for the monopole moment. Alternatives to using theseestimators on large scales would be to either directly ﬁt the observedoverdensity modes or to perform an analysis based on the quadraticmaximum likelihood (QML) method (Tegmark 1997; Tegmark et al.1998), which would be more optimal given the large-scale windoweﬀects in the power spectrum, but computationally more demandingdue to evaluation of the large data inverse covariance matrix.For power-spectrum monopole measurements made using theFKP procedure, we derive the underlying non-normal probabilitydistribution for the windowed galaxy-clustering power spectrum inthe linear regime with random shot noise. The multivariate normaldistribution is reinstated through a Gaussianizing transformationthat improves data normality, and cosmological dependence of thecovariance matrix is fully included by the variance–correlation de-composition. Note that our transformation predicts a new variablewhose expectation is not the same transformation of the model. In-stead it is simply designed to give a likelihood that is multivariateGaussian in the data vector. The work is presented as follows.(i) We review the FKP framework for analysing galaxy-clustering measurements in Section 2, and derive the probabilitydistribution of the windowed power spectrum for a Gaussian ran-dom ﬁeld.(ii) A Gaussianization scheme is presented in Section 3, whichgives a new power-spectrum likelihood approximation with bothcosmological dependence and random scatter in the estimated co-variance matrix taken into consideration.(iii) We numerically test our procedure and demonstrate its su-periority to the traditional likelihood treatments in Section 4 byperforming inference on the local non-Gaussianity parameter f NL and comparing the shape of the new likelihood with that of the truelikelihood from simulated data sets.(iv) A simple pipeline is provided in Section 5 for straightforwardapplication of this method. We discuss in Section 6 the applicabilityof this new approach and motivate future work.We also provide a summary of notations (Table 1) used in this work,which may be of particular use to the pipeline detailed in Section 5. We assume that the galaxy redshifts measured in a given surveyhave been converted to comoving distances using a ﬁducial cosmo-logical model at ﬁxed cosmological parameter(s) θ = θ f . This isrequired to measure the comoving local galaxy density, and hencethe comoving power spectrum. Rather than recalculating the powerspectrum for each model to be tested, we include the cosmologicalmodel dependence of this translation in the model to be ﬁtted to thedata.Let n g ( r ) be the observed galaxy number density ﬁeld and n s ( r ) be the number density ﬁeld in an unclustered random cataloguewith expectation ¯ n ( r ) = (cid:10) n g ( r ) (cid:11) = α (cid:10) n s ( r ) (cid:11) , where α matches theobserved and catalogue mean densities. Following FKP, one candeﬁne the zero-mean ﬁeld (cid:98) F ( r ) = w ( r )√ I (cid:2) n g ( r ) − α n s ( r ) (cid:3) (1) MNRAS486

Notations used in the proposed ﬁnal likelihood analysis pipeline.Symbol Meaning Remarks θ Cosmological parameter(s) – · ( f ) Quantities at the ﬁducial cosmology as a superscript in Section 5 · ( d ) Measurements/data realizations as a superscript in Section 5 N s ≡ m + P ( k ) Power spectrum model – (cid:101) P ( k ) Band power spectrum model – a = , . . ., p Index for k -bins – ( R a , η a ) Gamma distribution shape–scale parameters varying with the cosmological model ν a Transformation parameter ﬁxed at ν = / Z Gaussianized data vector – µ a , σ a Mean and variance for Gaussianized band power varying with the cosmological model (cid:98) Σ Estimated covariance matrix for Gaussianized data rescaled with varying cosmology L ( θ ) Likelihood function – for a weighted mask w ( r ) , where the normalization constant is I = ∫ d r w ( r ) ¯ n ( r ) . (2)The underlying galaxy-clustering power spectrum P true ( k ) isequivalent to the Fourier transform of the conﬁguration-space two-point correlation ξ ( r ) by the Wiener–Khinchin theorem (see e.g.Gabrielli et al. 2005). This power spectrum is convolved (denotedwith a tilde) with a window due to the mask, (cid:101) P ( k ) = (cid:10)(cid:12)(cid:12) (cid:98) F ( k ) (cid:12)(cid:12) (cid:11) = ∫ d q ( π ) | G ( k − q )| P true ( q ) + P shot , (3)where the function G ( k ) = √ I ∫ d r e i k · r w ( r ) ¯ n ( r ) , (4)and there is an additional scale-independent shot noise power (seeappendix A) P shot = + α V ∫ d r ¯ n ( r ) − . (5)The convolved power spectrum may be expanded in the basisof Legendre polynomials L (cid:96) ( ∆ ) , (cid:101) P ( k ) = ∞ (cid:213) (cid:96) = (cid:101) P (cid:96) ( k , ∆ ) L (cid:96) ( ∆ ) , (6)where ∆ is cosine of the angle between the wavevector k and theline of sight. With L ≡

1, the monopole of the convolved powerspectrum is a spherical average over the shell V k at radius k , (cid:101) P ( k ) = ∫ V k d k V k (cid:101) P ( k ) = ∫ d q ( π ) W ( k , q ) P true ( q ) + P shot , (7)where we have introduced the window function W ( k , q ) = ∫ V k d k V k | G ( k − q )| . (8)Expanding the power spectrum in multipoles allows the computa-tional demand of the convolution to be reduced (Wilson et al. 2017).For a standard linear power spectrum model that is complete withthe ﬁrst three even power-law moments, the convolution could berewritten requiring three window matrices whose entries are of theform W (cid:96) ( k , q ) . However, in order to keep our equations compact,we retain the more general form and focus on the monopole here. The windowed power spectrum monopole measured in p binsconstitutes the band power data vector (cid:101) Y ≡ (cid:2) (cid:98) P ( k a ) (cid:3) pa = , (9)where a hat denotes a realization or an estimator. We can constructanother vector Y ≡ (cid:2)(cid:12)(cid:12) ˆ δ ( q i ) (cid:12)(cid:12) (cid:3) ri = (10)that estimates the unconvolved power with shot noise at r discretewavenumbers, where the galaxy overdensity ﬁeld estimator isˆ δ ( r ) = n g ( r ) − α n s ( r ) ¯ n ( r ) = √ I w ( r ) ¯ n ( r ) (cid:98) F ( r ) . (11)The discretized analogue to equation (7) for the measured power isthen (cid:101) Y = B Y , (12)where the window function is encoded in the mixing matrix B ai = W ( k a , q i ) , (13)which may be suitably normalized so that ∫ d q ( π ) W ( k , q ) = ∀ k ⇔ (cid:213) i B ai = ∀ a . (14) In this subsection we consider the probability distribution of a bandpower measurement in a single bin at scale k a . This will be extendedto a multivariate distribution including correlations between bins atdiﬀerent scales in Section 3. If ˆ δ ( r ) is directly drawn from a Gaussian random ﬁeld, then thesquare amplitude of a single Fourier overdensity mode provides anestimator (cid:98) P i for the unconvolved power P i ≡ P ( q i ) that is expo-nentially distributed with the probability density function (PDF) P (cid:2) (cid:98) P i (cid:3) = P − i exp (cid:16) − (cid:98) P i (cid:14) P i (cid:17) , (15)and has expectation E (cid:2) (cid:98) P i (cid:3) = P i and variance Var (cid:2) (cid:98) P i (cid:3) = P i . MNRAS , 951–965 (2019)

M. S. Wang et al.

However, galaxy formation is a discrete point process, meaningthat the overdensity ﬁeld realization ˆ δ ( r ) is a Poisson sample ofthe underlying Gaussian overdensity ﬁeld (Peebles 1980; Feldmanet al. 1994). Consequently, each mode-power estimator Y i has anadditional independent shot noise component (cid:15) i , Y i = (cid:98) P i + (cid:15) i . (16)In the large galaxy number limit, the shot noise (cid:15) i is also exponen-tially distributed (see appendix A).For the bin centred at k a , the window function mixes mutu-ally independent exponential variables (cid:8) (cid:98) P i (cid:9) and { (cid:15) i } into the bandpower (cid:101) Y a = r (cid:213) i = B ai Y i = r (cid:213) i = B ai (cid:0) (cid:98) P i + (cid:15) i (cid:1) . (17)This is an exponential mixture that follows the hypo-exponentialdistribution (see appendix B) P (cid:2)(cid:101) Y a ; { λ i (cid:48) } (cid:3) = (cid:213) i (cid:48) (cid:18) (cid:214) j (cid:48) (cid:44) i (cid:48) − λ j (cid:48) / λ i (cid:48) (cid:19) λ − i (cid:48) exp (cid:16) − λ − i (cid:48) (cid:101) Y a (cid:17) , (18)with positive scale parameters λ i (cid:48) ∈ { B ai P i , B ai P shot : i = , . . . , r } (19)being the individual contributions of the unconvolved power andshot noise power to the a -th bin. It is understood that a well-deﬁnedlimit is taken in equation (18) in the case λ i (cid:48) = λ j (cid:48) .We now see that the band power measurement (cid:101) Y a = (cid:98) P ( k a ) is hypo-exponentially distributed for a Poisson-sampled Gaussianoverdensity ﬁeld. By the central limit theorem, as the number ofcontributing modes in equation (17) increases, the hypo-exponentialvariable (cid:101) Y a converges in distribution to a normal variable. This isthe basis for the normality assumption often used in power spec-trum analyses: the underlying power as well as shot noise becomesnormally distributed, with the latter subtracted as a deterministicquantity to recover the former. However, on the largest scales in asurvey where the number of overdensity modes is the fewest, thereis clear deviation between the normal distribution and the hypo-exponential distribution. The univariate PDF given by equation (18) is in a diﬃcult formto manipulate as it involves many uncompressed scale parameters { λ i (cid:48) } . A robust approximation is the exponentially modiﬁed gammadistribution with one shape parameter R and two scale parame-ters ( τ, η ) (see Golubev 2016), but determining these parametersinvolves solving a cubic algebraic equation which is cumbersome,making the procedure followed in this paper computationally de-manding or even unfeasible. We can adopt a simpler approximationwith the gamma distribution P Γ (cid:2)(cid:101) Y a ; R , η (cid:3) = η − R Γ ( R ) (cid:101) Y R − a e − (cid:101) Y a / η , (20)where by matching the mean and variance of the hypo-exponentialdistribution we can easily write down the shape and scale parameters R = E (cid:2)(cid:101) Y a (cid:3) (cid:46) Var (cid:2)(cid:101) Y a (cid:3) = (cid:18) (cid:213) i (cid:48) λ i (cid:48) (cid:19) (cid:30)(cid:213) i (cid:48) λ i (cid:48) ,η = Var (cid:2)(cid:101) Y a (cid:3) (cid:46) E (cid:2)(cid:101) Y a (cid:3) = (cid:18) (cid:213) i (cid:48) λ i (cid:48) (cid:19) (cid:30)(cid:213) i (cid:48) λ i (cid:48) . (21) In place of the mean and variance parameters of a normal dis-tribution, these two gamma distribution parameters determine the non-normal distribution of the band power measurement in eachbin, and have a natural interpretation in our context: the shape R isthe eﬀective number of independent overdensity modes contribut-ing to the bin under window function mixing, and the scale η isthe eﬀective convolved power in that bin. In the limiting case thatthere is only a single non-vanishing mode (e.g. pure shot noise), thegamma distribution coincides exactly with the hypo-exponentialdistribution, both of which reduce to the exponential distribution.In principle, the shape and scale parameters ( R , η ) could be cal-culated for each bin using analytical expressions of the band powerexpectation E (cid:2)(cid:101) Y a (cid:3) and variance Var (cid:2)(cid:101) Y a (cid:3) , with knowledge of thefull window mixing matrix B and all observed overdensity modes (cid:8) ˆ δ ( q i ) (cid:9) . Practically, these quantities are not always available, as fastwindow convolution is now often performed in conﬁguration spacewith ever larger surveys and higher resolution requirements, and thewindowed power spectrum is computed via a Hankel transform ofthe convolved two-point correlation multipoles (cid:101) ξ (cid:96) ( r ) (Beutler et al.2017; Wilson et al. 2017). Under these circumstances, the vari-ance of the measured band power Var (cid:2)(cid:101) Y a (cid:3) may be estimated frommock catalogues, provided the error in this estimation is subdomi-nant compared to other sources of uncertainty. However, it is worthpointing out ongoing eﬀorts in developing accurate analytic covari-ance matrices that could possibly evade the problem of covarianceestimation altogether, e.g. Li et al. (2019). For the sake of completeness and for reference, we write down theunivariate normal distribution for the band power (cid:101) Y a in terms of theshape–scale parameters ( R , η ) for the a -th bin, P N (cid:2)(cid:101) Y a ; R η, R η (cid:3) = (cid:112) π R η exp (cid:34) − (cid:0)(cid:101) Y a − R η (cid:1) R η (cid:35) , (22)so that its expectation and variance match those of the exact hypo-exponential and approximate gamma distributions. A multivariate PDF transformation for Y B (cid:55)−→ (cid:101) Y is captured by theJacobian factor J = det ( BB (cid:124) ) , where the distribution of Y is theproduct of independent exponential components. The window mix-ing matrix B ∈ R p × r + is non-square ( p < r ), and expressing thetransformed PDF explicitly in terms of the band power vector (cid:101) Y ,whose components are correlated, requires the inversion of B . Onecould introduce ( r − p ) ‘helper components’ in (cid:101) Y to pad B into asquare matrix, and eventually marginalize out these additional ran-dom variables. However, the linear transformation induced by thepadded square matrix will generally map the domain of the randomvector Y from R r + to a diﬀerent domain in R r , making marginaliza-tion diﬃcult and susceptible to the ‘curse’ of dimensionality, whichis likely as r (cid:29) p for massive data compression.Instead of trying to determine a full multivariate transformationfor the windowed band power, we subscribe to a component-wise Gaussianization strategy, and reinstate the multivariate normalityassumption in the

Gaussianized data vector Z ← (cid:0) (cid:101) Y . The reasoningbehind this is two fold. First, each component of the random vector isnow certainly univariate normal, as should be the case for a bona ﬁde MNRAS486 , 951–965 (2019) aussianization and covariance decomposition multivariate normal distribution. Secondly, if the cross-bin correla-tion is weak and the covariance matrix has a narrow-band structure,the components become approximately independent and univari-ate Gaussianization is equivalent to multivariate Gaussianization;this can be achieved with a suitable binning choice where the binwidth is chosen to be greater than the known window correlationlength. Although sophisticated full Gaussianization schemes exist(e.g. Laparra et al. 2011), they are often iteratively applied basedon empirical data and may not be suitable for forward modellingof theoretical models. We thus leave more advanced multivariateGaussianization methods for future investigations.We propose a simple Gaussianization scheme using the Box–Cox transformation. This scheme has already been applied in cos-mology to deal with non-Gaussian parameter spaces (Joachimi &Taylor 2011; Schuhmann et al. 2016), but our context and imple-mentation diﬀer from these studies. An alternative scheme directlytransforming a non-normal distribution into the standard normal dis-tribution exists by matching the cumulative distribution function;however, computationally costly numerical integration is necessaryfor calculating transformed moments, so we relegate this scheme toappendix C for reference. As we shall see later, the Box–Cox trans-formation suppresses higher order moments to achieve approximate Gaussianization and is suﬃciently accurate for our purposes. To per-form the transformation, we use the ﬁducial shape–scale parameters ( R f , η f ) which are determined by the power spectrum model P f ( k ) at ﬁxed cosmological parameter(s) θ = θ f . We deﬁne the Box–Cox transformation (Box & Cox 1964) for eachcomponent of Y (index suppressed for brevity) by Z = (cid:101) Y ν , ν > , (23)where positive ν is chosen to ensure regularity. The transformedPDF in Z is now P (cid:2) Z (cid:0)(cid:101) Y (cid:1) ; R , η (cid:3) = (cid:12)(cid:12) J Z (cid:0)(cid:101) Y (cid:1)(cid:12)(cid:12) − η − R Γ ( R ) (cid:101) Y R − e − (cid:101) Y / η , (24)where J Z (cid:0)(cid:101) Y (cid:1) = ν (cid:101) Y ν − is the Jacobian for the transformation (cid:101) Y (cid:55)→ Z . The transformed K th moment is given by E (cid:2) Z K (cid:3) = Γ ( R + K ν ) Γ ( R ) η K ν , (25)and we can write down the Gaussianized mean and variance µ ( R , η ) ≡ E [ Z ] = Γ ( R + ν ) Γ ( R ) η ν ,σ ( R , η ) ≡ Var [ Z ] = Γ ( R + ν ) Γ ( R ) − Γ ( R + ν ) Γ ( R ) η ν . (26)To determine the transformation parameter ν , we demand thethird central moment vanish for the ﬁducial model parameters θ f ,0 = E (cid:2) Z (cid:3) − E (cid:2) Z (cid:3) E (cid:2) Z (cid:3) + E (cid:2) Z (cid:3) = (cid:26) Γ ( R f + ν ) Γ ( R f ) − Γ ( R f + ν ) Γ ( R f ) Γ ( R f + ν ) Γ ( R f ) + (cid:20) Γ ( R f + ν ) Γ ( R f ) (cid:21) (cid:27) η ν . (27)The dependence of ν on R f required to satisfy this constraintis shown in Fig. 1 as a numerical solution (dashed blue line). The ν numerical solutionﬁtting formula10 R f − − % e rr o r Figure 1.

The dependence of the optimal Box–Cox transformation param-eter, ν , on the ﬁducial gamma-distribution shape parameter R f . Top panel :comparison of the numerical solution to equation (27) ( solid line ) with theempirical ﬁtting formula given by equation (30) ( dash–dotted line ) for deter-mining the optimal transformation parameter ν as a function of the ﬁducialshape parameter R f . Bottom panel : the percentage error of the ﬁtting formulafrom the numerical solution ( dashed line ) is at the sub per cent level. observed asymptotic behaviour can be understood by consideringthe expansion of gamma function ratios (Burić & Elezović 2012) Γ ( A + b ) Γ ( A ) ∼ A b (cid:104) + ( b − ) b A − + ( b − )( b − )( b − ) b A − + · · · (cid:105) , (28)so as R f → ∞ with η R f < ∞ (ﬁnite mean), equation (27) becomes0 (cid:39) R − ν ( ν − ) . (29)The non-trivial solution is ν = /

3. The precise value of ν mat-ters less for increasing R f owing to the suppression factor R − , amanifestation of asymptotic normality in the limit R f (cid:29) ν ≈ + . (cid:104) − exp (cid:16) . / R . (cid:17)(cid:105) . (30)Comparing this ﬁt to the true solution in Fig. 1 shows that thisﬁtting formula performs well, being accurate to sub per cent levels.As we shall see in Section 4, in fact simply assuming the ﬁxed value ν = / ν with the ﬁducial parameter R f , i.e.the choice of ﬁducial cosmology. In this subsection we discuss two general treatments of estimatedcovariance matrices which could be in the band power data (cid:101) Y or inthe Gaussianized data Z , although our treatments are applied to theGaussianized data in Section 5. Now that we have a Gaussianizing transformation, the model de-pendence of the covariance matrix still needs to be considered. This

MNRAS , 951–965 (2019)

M. S. Wang et al. sub-subsection shows how the parameter dependence can be in-cluded in the covariance matrix estimate analytically. In Section 2,we have ignored the integral constraint in the modelling of the powerspectrum, which biases the power spectrum measured due to estima-tion of the mean galaxy number density from the galaxy catalogueitself; since this oﬀset in the measured power can be subtracted (seee.g. Peacock & Nicholson 1991; Beutler et al. 2014), we do notconsider its contribution here in this work.A generic covariance matrix Σ may be decomposed into thediagonal matrix Λ = ( diag Σ ) / and the correlation matrix C , Σ = ΛCΛ , (31)where Λ is the diagonal matrix of the variances. For the bandpower spectrum on large scales, whether Gaussianized or not , theoﬀ-diagonal correlation in C is solely induced by the window func-tion as encoded in the mixing matrix B . Whilst this mixing ma-trix does depend on the ﬁducial cosmological model through thedistance–redshift relation, crucially it does not depend on the modelbeing tested through the power spectrum. This insight makes thevariance–correlation decomposition particularly useful, for the de-composition into Λ and C is precisely the separation of any cosmol-ogy dependence from cosmology independence in the covariancematrix Σ . Therefore one may obtain a covariance matrix estimate (cid:98) Σ f = Λ f CΛ f from mock catalogues produced at a ﬁxed cosmology θ = θ f with the ﬁducial power spectrum model P f ( k ) , and calibratethis estimate by rescaling with the diagonal variances to allow forvarying cosmology, (cid:98) Σ ( θ ) = Λ ( θ ) Λ − (cid:98) Σ f Λ − Λ ( θ ) . (32)For instance, for the Gaussianized band power Z at cosmologicalparameter(s) θ , the entries in the diagonal variance matrix arediag Λ ( θ ) = Var [ Z ] = (cid:2) σ a ( R a , η a ) (cid:3) pa = (33)as given by equation (26), where the gamma shape–scale parame-ters ( R a , η a ) for each bin depend on cosmological parameter(s) θ through the power (see equations 19 and 21). Covariance estimation from mock catalogues gives an inherentlyrandom quantity. Instead of directly substituting the estimated co-variance in a probability distribution, a more principled approachis to marginalize out the unknown underlying covariance matrixusing Bayes’ theorem (Sellentin & Heavens 2016, hereafter SH).Let (cid:98) Σ f be an unbiased covariance matrix estimate calculated from N s ≡ m + Σ f ,Π [ Σ f ] ∝ | Σ f | −( p + )/ , (34)and the fact that an empirical covariance matrix estimate (cid:98) Σ f has theWishart distribution conditional on Σ f , P (cid:2)(cid:98) Σ f (cid:12)(cid:12) Σ f (cid:3) ∝ (cid:12)(cid:12)(cid:98) Σ f (cid:12)(cid:12) ( m − p − )/ | Σ f | m / exp (cid:104) − m (cid:16) Σ − (cid:98) Σ f (cid:17)(cid:105) , (35) A caveat is that, in principle, there could be redshift-space eﬀects thatrender the overall window function dependent on the underlying cosmology,e.g. a survey boundary deﬁned by a maximum redshift. one could show that the posterior distribution of Σ f (cid:12)(cid:12)(cid:98) Σ f is inverseWishart with the PDF P (cid:2) Σ f (cid:12)(cid:12)(cid:98) Σ f (cid:3) ∝ (cid:12)(cid:12)(cid:98) Σ f (cid:12)(cid:12) m / | Σ f | ( m + p + )/ exp (cid:104) − m (cid:16)(cid:98) Σ f Σ − (cid:17)(cid:105) . (36)It is this distribution that one needs to marginalize over to replacethe unknown true covariance Σ f with its estimate (cid:98) Σ f . The samederivation follows exactly for the rescaled covariance estimate (cid:98) Σ ( θ ) ,since our decomposition does not aﬀect the SH marginalizationprocedure (see appendix D).In addition to the covariance matrix estimate for the Gaussian-ized data Z , if one estimates the gamma distribution parameters { R a } and { η a } (see equation 21) from the empirical covariancematrix estimate of the band power data (cid:101) Y , then these parametersshould also be considered as random variables; in particular, theshape adopted for the likelihood is itself an estimated quantity, andthis could in principle also bias the recovered likelihood. In Sec-tion 4, we will see that in practice this is not an issue: for estimatedcovariance matrices with signiﬁcant uncertainties, the eﬀect of thecovariance estimation (allowed for by the SH marginalization proce-dure) itself dominates. For estimated covariance matrices with lownoise, the two eﬀects can become comparable, but in this situationthe size of both eﬀects is small. When the mean vector µ for the Gaussian data vector Z and itscovariance matrix Σ are known exactly, the multivariate normalPDF simply reads P N [ Z ; µ , Σ ] = (cid:12)(cid:12) π Σ (cid:12)(cid:12) − / exp (cid:20) − χ ( Z ; µ , Σ ) (cid:21) , (37)where the quantity χ ( Z ; µ , Σ ) ≡ ( Z − µ ) (cid:124) Σ − ( Z − µ ) . (38)This is the multivariate version of equation (22) but now for the Gaussianized band power Z . The SH procedure replaces the under-lying covariance Σ with an estimate (cid:98) Σ , and changes this to a modiﬁed t -distribution (see appendix D) P t (cid:2) Z ; µ , (cid:98) Σ , m (cid:3) = c p (cid:12)(cid:12)(cid:98) Σ (cid:12)(cid:12) − / (cid:34) + χ (cid:0) Z ; µ , (cid:98) Σ (cid:1) m (cid:35) −( m + )/ , (39)where the normalization constant is c p = ( m π ) − p / Γ [( m + )/ ] Γ [( m − p + )/ ] . (40)The PDF given in equation (39), when regarded as a function ofcosmological parameter(s) θ through the dependence of µ and (cid:98) Σ onthe power spectrum model as functions of {( R a , η a )} , L (cid:0) θ ; Z , (cid:98) Σ f , m (cid:1) = P t (cid:2) Z ; µ ( θ ) , (cid:98) Σ ( θ ) , m (cid:3) , (41)is a key result of this work.In the next section, we shall test diﬀerent aspects of our proce-dure used to derive the new likelihood, using Monte Carlo simula-tions matched to the speciﬁcations of future survey data. This leadsus to formulating a simple pipeline for likelihood analysis of galaxysurveys, which we present in Section 5. MNRAS486

Distribution of the band power spectrum in the lowest- k bin,centred at k ≈ . h Mpc − . The eﬀective number of independent modesin this bin is R ≈

4. We compare the exact hypo-exponential distribution( ﬁlled region ) sampled from 40 000 realizations, the gamma distributionapproximation ( solid line ) and the normal distribution assumption with thesame mean and variance ( dashed line ) for the band power measurement.

In order to test our proposed likelihood form, we create MonteCarlo simulations of data vectors by generating exponentially dis-tributed overdensity mode power and shot noise, and then con-volve with a chosen survey window before extracting the measuredpower spectra. Given the survey volume of DESI and

Euclid , weconsider scales k = . × − –1 . × − h Mpc − covering anorder of magnitude, and select overdensity wavenumbers q usingan inverse-volume distribution P [ q ] ∝ | q | . The chosen windowfunction has a Gaussian shape with full width at half-maximum(FWHM) 1 . × − h Mpc − . We divide the scales into p = T ( k ) by Eisenstein& Hu (1998), with the large-scale galaxy linear bias ﬁxed at b = .

87 and other cosmological parameters set to

Planck n = × − h Mpc − . The numberdensities predicted for DESI and Euclid are higher (e.g. Duﬀy 2014),so our analysis is conservative with respect to the eﬀect of shot noiseon large scales.

In Fig. 2, we compare the sampled hypo-exponential distribution(equation 18), the gamma distribution approximation (equation 20)and the normal distribution assumption (equation 22) with the samemean and variance for the band power in the bin centred at k ≈ . h Mpc − , which has an eﬀective number of independentmodes R ≈

4. The assumed normal distribution has a peak shiftedfrom the underlying hypo-exponential distribution; on the otherhand, the gamma distribution is a good approximation that matchesboth the peak and the tails well.One may wish to quantify the improvement in multivariate normality our component-wise Gaussianization could bring to theband power data vector. A key deﬁning property of a multivariatenormal variable X is that any projection X (cid:55)→ t (cid:124) X ∈ R , for somevector t , gives a univariate normal variable. Hence as a simplemultivariate normality test, given a set of samples { x i } for X , onecould randomly choose some directions t and perform univariatenormality tests on the projected samples { t (cid:124) x i } (see e.g. Shao &Zhou 2010).We perform the D’Agostino–Pearson normality test(D’Agostino & Pearson 1973) on 10 000 random projections of40 000 samples of the band power vector (cid:101) Y , which returns the p -value that characterizes how extreme the sample realizations areunder the null hypothesis that the underlying distribution were in-deed normal. It must be emphasized here that the p -value itself isnot a meaningful indicator of normality, as it varies depending onthe sample size; rather it is the comparison of the p -values with thesame sample that signiﬁes relative departure from normality. Weﬁnd p = .

01 for (cid:101) Y without Gaussianization; with Gaussianization (cid:101) Y (cid:55)→ Z , however, we ﬁnd improved p = .

08 given these samples.

To test the variance–correlation decomposition proposed in Sec-tion 3.2, we generate one set of 40 000 band power data realizationswith the Hubble parameter set to H = .

4, and an additional set of40 000 realizations generated with H = .

2. The former set givesa sampled ‘true’ covariance matrix (cid:98) Σ , and the latter gives a ‘ﬁducial’covariance estimate (cid:98) Σ f which is then rescaled using equation (32) tomatch the ‘true’ cosmology. For both band power (cid:101) Y without Gaus-sianization and Gaussianized band power Z , Fig. 3 shows that thediﬀerences between the directly sampled ‘true’ covariance matricesand the rescaled covariance estimates are small; this validates thedecomposition as a means to include cosmological dependence ofthe covariance matrix.Since covariance marginalization and decomposition do notmutually aﬀect each other, we do not test the eﬀect of the SHprocedure (the modiﬁed- t likelihood) separately here but point thereader to Sellentin & Heavens (2016) and Sellentin & Heavens(2018) for reference. The ultimate goal of power-spectrum likelihood analysis is to con-strain cosmological parameters, so the primary aim of our numericalsimulations is to test our new likelihood function after Gaussianiza-tion and covariance rescaling.To summarize, we have proposed the following steps for deriv-ing the new likelihood function (41), which we now tweak to isolatetheir eﬀects:(i)

Gaussianization – the data vector can remain un-Gaussianized (cid:101) Y , Gaussianized Z / with a ﬁxed parameter ν = / Z ν with a ﬁtted transformation parameter ν givenby the formula in equation (30);(ii) Covariance rescaling – the ﬁducial covariance matrix esti-mate is either ﬁxed at (cid:98) Σ f when calculating the likelihood, or rescaledto (cid:98) Σ ( θ ) using equation (32) to account for parameter dependence;(iii) Covariance marginalization – the Hartlap-debiased preci-sion matrix estimate (Hartlap et al. 2007) can either be substituteddirectly into the Gaussian likelihood L G ( θ ) (see equation 37), or theSH marginalization procedure can be used to give the modiﬁed- t likelihood L t ( θ ) (see equation 39). MNRAS , 951–965 (2019)

M. S. Wang et al. C o v (cid:2) e Y (cid:3) [ × ] simulated 0.00.20.40.60.81.01.21.4 rescaled 0.00.20.40.60.81.01.21.4 residual 0.00000.00250.00500.00750.0100 C o v [ Z ][ × ] Figure 3.

Validation of the variance–correlation decomposition to account for the parameter dependence of the covariance matrix by comparing directly sampledcovariance matrices with rescaled covariance matrix estimates.

Left-hand panel : covariance matrices sampled at a ‘true’ cosmology input ( H = . middlepanel : covariance estimates at the ‘ﬁducial’ cosmology ( H = .

2) rescaled using equation (32) to match the ‘true’ cosmology; right-hand panel : residuals(element-wise absolute diﬀerences) between the covariance matrices in the left-hand and middle panels. The top row and the bottom row are produced from40 000 realizations of the band power (cid:101) Y and the Gaussianized band power Z , respectively. Diﬀerent combinations of these choices give the likelihoods tabu-lated in Table 2. All these likelihoods can be compared with the truelikelihood constructed from the exponentially distributed modes andshot noise power prior to convolution , L true ( θ ; Y ) = r (cid:214) i = e − Y i / P i − e − Y i / P shot P i − P shot , (42)which is inaccessible in a realistic survey in the presence of thewindow function.Since our methods mostly aﬀect measurements on the largestsurvey scales, the local non-Gaussianity parameter f NL , which issensitive to the large-scale power, is a well-motivated test parameter(Sun et al. 2013; Kalus et al. 2016). f NL enters the galaxy powerspectrum by modifying the constant linear galaxy bias on largescales (Dalal et al. 2008; Matarrese & Verde 2008; Slosar et al.2008), b (cid:55)→ b + f NL A ( k )( b − ) , (43)which introduces scale dependence via A ( k ) = Ω m δ c k T ( k ) (cid:18) H c (cid:19) . (44)Here c is the speed of light, Ω m is the matter density parameter,and δ c ≈ .

686 is the spherical collapse critical overdensity today.Henceforth we will identify θ with θ = f NL . We emphasize thatthe choice of f NL as a test parameter is entirely based on likelihoodconsiderations; our work does not serve as a stringent constraint onprimordial non-Gaussianity. To leading order in f NL , which is smallas constrained by Planck f NL ; this will eventually break down onnear the Hubble horizon scale (see e.g. Tellarini et al. 2015).To properly examine the ensemble behaviour of the likelihoodslisted in Table 2 with diﬀerent treatments, we now produce 250 000data realizations at some ‘true’ input cosmology, and a ﬁxed set of N s = f NL =

0. The latter provides a covariance matrix estimate for boththe band power and the Gaussianized data. The prior range for f NL is set to be [− , ] and scanned through with a resolution of ∆ f NL = . To get an initial intuition, we compare in Fig. 4 the true like-lihood L true , the modiﬁed- t likelihood L t , fc (cid:2)(cid:101) Y (cid:3) without Gaussian-ization and covariance rescaling, and the new likelihood L t , rc (cid:2) Z ν (cid:3) derived using our full procedure with ﬁtted transformation param-eter ν , all averaged over data realizations produced at f NL =

0. It isclear that our methods produce a superior likelihood approximationto the true likelihood.

For diﬀerent true f NL parameter inputs and the same ﬁducial cos-mology at f NL =

0, we compare both frequentists’ and Bayes esti-mators calculated from the likelihoods L true , L t , rc (cid:2)(cid:101) Y (cid:3) , L t , rc (cid:2) Z ν (cid:3) , L t , fc (cid:2)(cid:101) Y (cid:3) and L t , fc (cid:2) Z ν (cid:3) (see Table 2 for deﬁnitions). The results We have chosen a much wider prior range than the

Planck f NL = . ± .486

Planck f NL = . ± .486 , 951–965 (2019) aussianization and covariance decomposition Table 2.

List of likelihoods with diﬀerent combinations of Gaussianization and covariance rescaling treatments, as well as diﬀerent functional forms.Data variable No Gaussianization With GaussianizationFixed ν = / ν Functional form Modiﬁed- t Gaussian Modiﬁed- t Modiﬁed- t Rescaled covariance estimate (rc) L t , rc (cid:2)(cid:101) Y (cid:3) L G , rc (cid:2) Z / (cid:3) L t , rc (cid:2) Z / (cid:3) L t , rc (cid:2) Z ν (cid:3) Fixed covariance estimate (fc) L t , fc (cid:2)(cid:101) Y (cid:3) – L t , fc (cid:2) Z / (cid:3) L t , fc (cid:2) Z ν (cid:3) L ( f N L ) L true L t , rc [ Z ν ] L t , fc [ e Y ] −

200 0 200 f NL − ∆ L Figure 4.

A direct comparison between the true likelihood L true ( solid line ),the modiﬁed- t likelihood L t , rc (cid:2) Z ν (cid:3) ( dashed line ) with Gaussianization andcovariance rescaling, and the modiﬁed- t likelihood L t , fc (cid:2)(cid:101) Y (cid:3) ( dotted line ) without Gaussianization and covariance rescaling, all averaged over datarealizations produced at f NL =

0. The bottom panel shows the diﬀerences ofthe likelihoods from L true . have been marginalized over our data realizations to assess theirensemble behaviour. Maximum likelihood estimator – This is a frequentists’ estimator,given by (cid:98) θ = arg max L ( θ ) . (45)The estimates are compared in Table 3. Their uncertainties are thestandard deviations estimated from the ensemble of data realiza-tions. Posterior median estimator – With ﬂat priors, the posterior P [ θ | X ] on the cosmological parameter(s) θ given any observations X issimply the likelihood L ( θ ; X ) suitably normalized. Common MonteCarlo analyses usually return the posterior median or mean as thebest estimate (see e.g. Trotta 2008; Hogg & Foreman-Mackey 2018).Here we choose the absolute loss function (Berger 1985)loss ( a , θ ) = | a − θ | (46)and minimize its expectation to obtain the Bayes estimator that isthe posterior median (cid:98) θ = arg min a E θ | X [ (cid:96) ( a , θ )] . (47) Table 3.

Maximum likelihood estimates of f NL from likelihoods L true , L t , rc (cid:2)(cid:101) Y (cid:3) , L t , rc (cid:2) Z ν (cid:3) , L t , fc (cid:2)(cid:101) Y (cid:3) , and L t , fc (cid:2) Z ν (cid:3) (see Table 2 for deﬁni-tions), averaged over 250 000 data realizations with diﬀerent true f NL inputsand the same ﬁducial cosmology at f NL =

0. The 1- σ error bounds are givenas the estimated standard deviations.Input Maximum likelihood estimates L true L t , rc (cid:2)(cid:101) Y (cid:3) L t , rc (cid:2) Z ν (cid:3) L t , fc (cid:2)(cid:101) Y (cid:3) L t , fc (cid:2) Z ν (cid:3)

50 49 ±

42 42 ±

44 48 ±

42 49 ±

44 51 ± ±

39 2 ±

41 8 ±

40 9 ±

39 11 ± − ± − ± − ± − ±

38 1 ± − − ± − ± − ± − ± − ± − − ± − ± − ± − ± − ± Table 4.

Posterior median estimates of f NL from likelihoods L true , L t , rc (cid:2)(cid:101) Y (cid:3) , L t , rc (cid:2) Z ν (cid:3) , L t , fc (cid:2)(cid:101) Y (cid:3) , and L t , fc (cid:2) Z ν (cid:3) (see Table 2 for deﬁnitions), averagedover 250 000 data realizations with diﬀerent true f NL inputs and the sameﬁducial cosmology at f NL =

0. The 1- σ error bounds are quoted as theequal-tailed 68 . L true L t , rc (cid:2)(cid:101) Y (cid:3) L t , rc (cid:2) Z ν (cid:3) L t , fc (cid:2)(cid:101) Y (cid:3) L t , fc (cid:2) Z ν (cid:3)

50 53 + − + − + − + − + −

10 13 + − + − + − + − + − + − − + − + − − + − + − − − + − − + − − + − − + − − + − − − + − − + − − + − − + − − + − The associated 1- σ uncertainties can be quoted as the equal-tailed68 . L t , rc (cid:2) Z ν (cid:3) performsthe best in producing closer best estimates as well as error bounds tothose from the true likelihood, whether we perform frequentists’ es-timation or Bayesian inference. We have found that Gaussianizationwith a ﬁxed transformation parameter ν = /

3, i.e. L t , rc (cid:2) Z / (cid:3) ,gives similar results to Gaussianization with ν ﬁtted by the formulain equation (30); likewise, assuming a wrong ﬁducial cosmology forGaussianization, even when it is signiﬁcantly deviant from the truecosmology, has negligible impact on recovered parameters. Thisdemonstrates that our Gaussianization scheme is robust to variationof the ﬁducial cosmological model. MNRAS , 951–965 (2019) M. S. Wang et al.

A graphical comparison of likelihood shapes is the quantile–quantile (Q–Q) probability plot (Wilk & Gnanadesikan 1968) oftheir respective posteriors. We show the f NL percentiles inferredfrom all likelihoods in Table 2 except L G , rc (cid:2) Z / (cid:3) against f NL per-centiles of the true likelihood L true in Fig. 5, where we contrastno Gaussianization against Gaussianization, and ﬁxed covarianceestimates against rescaled covariance estimates.There are two trends that match our expectation: the new like-lihoods in the Gaussianized data variable Z matches the shape ofthe true likelihood L true better than the ones in data variable (cid:101) Y without Gaussianization, especially away from the peak and nearthe tails of the distribution; not rescaling the covariance matrix inparameter space to account for its dependence on cosmology no-ticeably distorts the error bounds. Again we have also found thatGaussianization with ﬁxed ν = /

3, i.e. L t , rc (cid:2) Z / (cid:3) , producesnearly indistinguishable results from Gaussianization with ﬁtted ν ,i.e. L t , rc (cid:2) Z ν (cid:3) .Another quantitative measure of ‘statistical distance’ betweena true probability distribution f ( θ ) and an approximate probabilitydistribution g ( θ ) is the Kullback–Leibler (KL) divergence (Kullback& Leibler 1951) D KL ( f (cid:107) g ) ≡ ∫ d θ f ( θ ) ln f ( θ ) g ( θ ) . (48)For instance, if we take f to be the posterior of the true likelihoodagainst which we compare the posterior g of the new likelihood,then the expected KL divergence over the entire ensemble of datarealizations could quantify the ‘information loss’ due to replacementof the true likelihood with the new.In Table 5, we list the KL divergence values for all likelihoodsin Table 2, except for L G , rc (cid:2) Z / (cid:3) , from the true likelihood L true ,averaged over our data realizations generated at diﬀerent true f NL inputs. The evidence again suggests that the likelihood L t , rc (cid:2) Z ν (cid:3) in Gaussianized data with the rescaled covariance estimate matchesthe full shape of true likelihood very well, and this remains the casewhen we use Gaussianization at ﬁxed ν = /

3, i.e. L t , rc (cid:2) Z / (cid:3) . Although the major sources of impact on parameter inference,namely distribution non-normality and parameter dependence of thecovariance matrix, have been identiﬁed and mitigated by Gaussian-ization and variance–correlation decomposition respectively, thereare other sources of error which we now consider.The ﬁrst potential concern is how the correlations betweenband powers aﬀect the Gaussianization. We have proposed thatthe Gaussianization be performed using only the univariate distri-butions, and hence this will work the best when the oﬀ-diagonalcorrelations in the covariance matrix are relatively weak. Thus fora given window function we want to minimize the number of bandpowers to be included in the data vector so that the covariance ma-trix is strongly dominated by the diagonal entries. For future surveyswith increasing volume, this will be easier as the window functionswill be narrower in Fourier space. However, reducing the number ofband powers will also mean that more overdensity modes contributeto each bin, making the statistics more Gaussian. The limit to howfew band powers should be included in the data vector is that weneed to make sure that there are suﬃciently many bins to retain thecosmological information in the data.The second potential concern is that when the band power variance cannot be analytically calculated from the window func-tion mixing matrix and observed overdensity modes, the gammadistribution parameters have to be obtained from mock catalogues,and this comes with additional statistical scatter owing to the estima-tion of band power variance (see Section 2.2.2). Ideally it needs tobe marginalized out together with the unknown full covariance ma-trix in the SH procedure, but this unfortunately makes the likelihoodanalytically intractable after Gaussianization.To this end, we would like to assess the relative impact be-tween covariance estimation of the Gaussianized band power andthe estimation of the band power variance for evaluating gammadistribution parameters ( R , η ) . Both estimations are made from thesame set of mock catalogues, so we need to consider an ensem-ble of mock catalogue sets. For 25 000 data realizations, we nowgenerate N s = each of them. Inthe reproduced Q–Q probability plots (Fig. 6) for likelihoods in the Gaussianized data Z with ﬁxed transformation parameter ν = / L G , rc (cid:2) Z / (cid:3) . This scenariocorresponds to the dashed lines.(ii) The unbiased covariance matrix estimate is marginal-ized with the SH procedure and thus the modiﬁed- t likelihood L t , rc (cid:2) Z / (cid:3) is used. This scenario corresponds to the dash–dottedlines.(iii) A high-precision covariance estimate is used as a proxy forthe exact covariance matrix in the Gaussian likelihood L G , rc (cid:2) Z / (cid:3) .This scenario corresponds to the dotted lines.The covariance estimate is rescaled for varying cosmology for eachof these scenarios, and we compare using analytically calculatedshape–scale parameters ( R , η ) (without ‘+’ markers) with using ( R , η ) obtained from estimated band power variance (with ‘+’ mark-ers). In addition, we explore the eﬀect of the mock catalogue sizeon the relative impacts between the two estimations, by adding thesame plots (in the right-hand panel) for the case N s =

50. Thedeviation from the true likelihood in these scenarios are shown asresiduals (numerical diﬀerences) in the bottom panel of Fig. 6.The evidence indicates that the SH procedure (the modiﬁed- t likelihood) indeed accounts for the statistical scatter of covarianceestimation, but this eﬀect is subdominant to the band power vari-ance estimation for evaluating ( R , η ) if we use a set of N s = ( R , η ) are estimated or exact, in the left-hand panel of Fig. 6. If we reduce the sample size of catalogues to N s =

50, the impact of covariance estimation becomes greater thanthat of ( R , η ) estimation – this is evident as the lines correspondingto estimated covariance matrices deviate the most from all otherlines. However, both eﬀects are far less signiﬁcant than the distribu-tion non-normality and cosmological dependence of the covariancematrix, as we have seen in Fig. 5, which are the focal problemsaddressed in this work. In particular, the errors due to these esti-mations with N s = t likelihood for parameter inference, even when we cannotdo the same for the gamma distribution parameters calculated fromthe estimated band power variance. MNRAS486

50 0 50 100 − r e s i d u a l s true likelihood f NL percentiles l o w e r m e d i a n u pp e r ﬁxed covariance matrix L true L t , fc [ e Y ] L t , fc [ Z ν ] L t , fc [ Z ] −

50 0 50 100

Figure 5.

Q–Q plots comparing posterior distributions of the likelihoods without Gaussianization (data variable (cid:101) Y , dashed line ), with ﬁtted- ν Gaussianization(data variable Z ν , dash–dotted lines ) and with ﬁxed- ν Gaussianization (data variable Z / , dotted lines ) against the true posterior from L true as a reference( solid lines with unit slope). See Table 2 for deﬁnitions of the likelihoods. Top panel : inferred f NL percentiles from the diﬀerent likelihoods against the truelikelihood L true percentiles, averaged over 250 000 data realizations at f NL = Bottom panel : the residuals (numerical diﬀerences) of corresponding linesfrom the reference line.

Left-hand panel : the ﬁducial covariance estimate is rescaled (‘rc’).

Right-hand panel : the ﬁducial covariance estimate is correct butﬁxed in parameter space (‘fc’). The dotted vertical lines show the f NL posterior median estimate and the 1- σ uncertainties from the true likelihood. Table 5.

Kullback–Leibler (KL) divergence values of f NL posteriors of all likelihoods in Table 2, except for L G , rc (cid:2) Z / (cid:3) , from that of the true likelihood L true ,averaged over 250 000 data realizations with diﬀerent true f NL inputs and the same ﬁducial cosmology at f NL = D KL from the true posterior L t , rc (cid:2)(cid:101) Y (cid:3) L t , rc (cid:2) Z / (cid:3) L t , rc (cid:2) Z ν (cid:3) L t , fc (cid:2)(cid:101) Y (cid:3) L t , fc (cid:2) Z / (cid:3) L t , fc (cid:2) Z ν (cid:3)

50 0 .

12 0 .

02 0 .

35 0 .

05 0 . .

11 0 .

02 0 .

18 0 .

04 0 .

040 0 .

10 0 .

02 0 .

17 0 .

05 0 . −

10 0 .

02 0 .

17 0 .

05 0 . −

50 0 .

11 0 .

03 0 .

23 0 .

10 0 . We now present the proposed ﬁnal pipeline for the straightforwardapplication of our methods. A comprehensive list of the notationsused can be found in Table 1 in Section 1; note that in this sectionwe use the superscript (f) to denote quantities evaluated at theﬁducial cosmology, and superscript (d) for measurements or datarealizations.

Gamma distribution parameters – We model the band power dis- tribution as a gamma distribution in shape–scale parametrization ( R , η ) . Given band power measurements or realizations (cid:101) P ( d ) ( k a ) at some cosmology θ , the shape–scale parameters are determinedfrom its mean and variance R a ( θ ) = E (cid:2) (cid:101) P ( d ) ( k a ) (cid:3) (cid:46) Var (cid:2) (cid:101) P ( d ) ( k a ) (cid:3) ,η a ( θ ) = Var (cid:2) (cid:101) P ( d ) ( k a ) (cid:3) (cid:46) E (cid:2) (cid:101) P ( d ) ( k a ) (cid:3) . (49)In the absence of analytic expressions for the band power vari- MNRAS , 951–965 (2019) M. S. Wang et al. − − − i n f e rr e d f N L p e r c e n t il e s l o w e r m e d i a n u pp e r N s = L true : reference L G , rc [ Z ] : est. cov., est. ( R , η ) L G , rc [ Z ] : est. cov., exact ( R , η ) L t , rc [ Z ] : marg. cov., est. ( R , η ) L t , rc [ Z ] : marg. cov., exact ( R , η ) L G , rc [ Z ] : exact cov., est. ( R , η ) L G , rc [ Z ] : exact cov., exact ( R , η ) −

50 0 50 100 − r e s i d u a l s true likelihood f NL percentiles − − − l o w e r m e d i a n u pp e r N s = −

50 0 50 100 − Figure 6.

Q–Q plots comparing the relative impact of covariance estimation and band power variance estimation for evaluating gamma distribution parameters ( R , η ) . All likelihoods compared are in the Gaussianized data Z / with the covariance matrix estimate rescaled for varying cosmology. The covariancematrix is either estimated without SH marginalization (Gaussian likelihood L G , rc (cid:2) Z / (cid:3) , dashed lines ), SH marginalized (modiﬁed- t likelihood L t , rc (cid:2) Z / (cid:3) , dash–dotted lines ) or exact (Gaussian likelihood L G , rc (cid:2) Z / (cid:3) , dotted lines ); the distribution parameters are either estimated ( with ‘+’ markers ) or exact ( without‘+’ markers ). Top panel : inferred f NL percentiles from the diﬀerent cases against percentiles of the true likelihood L true , averaged over 25 000 data realizationsat f NL = Bottom panel : the residuals (numerical diﬀerences) of corresponding lines from the reference line (the true likelihood L true ). Left-hand panel : mockcatalogues contain N s = Right-hand panel : mock catalogues contain N s =

50 samples. The dotted vertical lines show the f NL posterior medianestimate and the 1- σ uncertainties from the true likelihood. Note – the scales in the bottom panels diﬀer, and also diﬀer from those of the bottom panels inFig. 5. ance, this should be replaced by a ﬁducial estimate (cid:99)

Var (cid:2) (cid:101) P ( d , f ) ( k a ) (cid:3) calculated from mock catalogues and suitably rescaled with thecosmology θ , i.e. (cid:99) Var (cid:104) (cid:101) P ( d ) ( k a , θ ) (cid:105) =  (cid:101) P ( k a , θ ) (cid:101) P ( f ) ( k a )  (cid:99) Var (cid:104) (cid:101) P ( d , f ) ( k a ) (cid:105) , (50)which leads to corresponding rescaling for the distribution param-eters in equation (49). Note that this rescaling cancels out for theshape parameter R a , which is in fact independent of θ . This is ex- pected as the eﬀective number of modes is a model-independentquantity. Data transformation – To make the data distribution approximatelymultivariate normal, we adopt the Box–Cox transformation wherebythe band power measurements are univariately Gaussianized, (cid:101) P ( d ) ( k a ) (cid:55)→ Z a ≡ (cid:104) (cid:101) P ( d ) ( k a ) (cid:105) ν a . (51)Whilst the transformation parameters ν a for each bin can be deter-mined using the ﬁtting formula given by equation (30) as a functionof the ﬁducial shape parameter R ( f ) a , we have found little gain over MNRAS486

3, which we favour for reasons ofsimplicity. After the transformation, the mean µ a ( θ ) and variance σ a ( θ ) of the Gaussianized band power Z a at cosmology θ are givenby equation (26) for each bin. Likelihood evaluation – The remaining quantity needed for like-lihood evaluation is the covariance matrix estimate (cid:98) Σ ( θ ) for the Gaussianized data Z , which is allowed to vary with cosmology byrescaling the ﬁducial estimate (cid:98) Σ ( f ) from N s ≡ m + (cid:98) Σ ( θ ) = D (cid:98) Σ ( f ) D , (52)where the diagonal matrix D consists of entries D aa = σ a ( θ ) (cid:14) σ a (cid:0) θ ( f ) (cid:1) . (53)We recommend using the modiﬁed- t distribution obtained with SHmarginalization as the ﬁnal likelihood (see equations 39 and 41), L (cid:0) θ ; Z , (cid:98) Σ ( f ) , m (cid:1) = P t (cid:2) Z ; µ ( θ ) , (cid:98) Σ ( θ ) , m (cid:3) . (54)Our simulations in Section 4 have shown that the impact from theerrors in the poorly determined covariance matrix entries dominatesover problems caused by the estimated gamma distribution param-eters in our likelihood. Consequently, it is worth marginalizing outscatter of the estimated covariance matrix using the SH procedureeven if we cannot simultaneously perform an equivalent procedurefor the gamma distribution parameters.Standard Bayesian inference can be readily performed now toextract cosmological parameter estimates and associated uncertain-ties, or to sample the posterior distribution in a multidimensionalparameter space using Monte Carlo techniques. In preparation for next-generation galaxy surveys such as DESI and

Euclid , we have revisited the Gaussian likelihood assumption com-monly found in galaxy-clustering likelihood analyses, which mayadversely impact cosmological parameter inference from measure-ments limited by sample size on the largest survey scales. Extendingprevious work by Schneider & Hartlap (2009), Keitel & Schneider(2011), Wilking & Schneider (2013), Sun et al. (2013) and Kaluset al. (2016), we have carefully derived the distribution of the bandpower spectrum (windowed power spectrum monopole) in the lin-ear regime while taking window eﬀects and random shot noise intoaccount; in particular, we have(i) devised a Gaussianization scheme using the Box–Cox trans-formation to improve data normality;(ii) proposed a variance–correlation decomposition of the co-variance matrix to allow for varying cosmology;(iii) presented a simple pipeline for straightforward applicationof this new methodology (Section 5).We always recommend rescaling the covariance matrix estimate us-ing our decomposition as its parameter dependence has a signiﬁcantimpact on parameter estimation. Although below the largest surveyscales the normal distribution may be a good approximation forthe band power measurements, we still recommend the use of ourGaussianization scheme for its simplicity.With numerical simulations, we have tested the likelihood de-rived from the new procedure for both point estimation and shapecomparison with the true likelihood inaccessible in real surveys. By focusing on the local non-Gaussianity f NL , which is a sensitive pa-rameter for the large-scale power spectrum, we have demonstratednoticeable improvement in parameter inference brought by Gaus-sianization and covariance rescaling. Whilst Gaussianizing trans-formations are not new, our set-up, motivation and implementationdiﬀer from previous works by, for instance, Wilking & Schneider(2013) and Schuhmann et al. (2016).However, an all-encompassing formalism for galaxy-clusteringpower spectrum analysis is still out of reach. Towards the non-linearregime where overdensity modes are no longer independent butcoupled due to gravitational evolution, the power-spectrum covari-ance structure is fundamentally more complex, and non-negligibleshot noise can also deviate from the Poisson sampling prescrip-tion (Bernardeau et al. 2002). The analysis covered in this paperfocuses on the windowed power spectrum monopole in the FKPframework, but this could also be applied to power-law momentestimators with even exponents in the local plane-parallel approxi-mation (Yamamoto et al. 2006). We leave further extensions to thecurrent analysis to future work. ACKNOWLEDGEMENTS

MSW, WJP, and DB acknowledge support from the European Re-search Council (ERC) through the Darksurvey grant 614030. SAacknowledges support from the UK Space Agency through grantST/K00283X/1. RC is supported by the Science and TechnologyFacilities Council (STFC) grant ST/N000668/1.Numerical computations are performed on the Sciama HighPerformance Computing (HPC) cluster which is supported by the In-stitute of Cosmology and Gravitation (ICG), the South East PhysicsNetwork (SEPnet) and the University of Portsmouth.

REFERENCES

Abbott T. M. C. et al., 2018, Phys. Rev. D, 98, 043526Ade P. A. R. et al., 2016, A&A, 594, A17Aghamousa A. et al., 2016, preprint ( arXiv:1611.00036 )Aghanim N. et al., 2018, preprint ( arXiv:1807.06209 )Alam S. et al., 2017, MNRAS, 470, 2617Anderson T. W., 2003, An Introduction to Multivariate Statistical Analysis,3rd edn. WileyAvila S. et al., 2018, MNRAS, 479, 94Berger J. O., 1985, Statistical Decision Theory and Bayesian Analysis.Springer-VerlagBernardeau F., Colombi S., Gaztañaga E., Scoccimarro R., 2002,Phys. Rep., 367, 1Beutler F., Saito S., Seo H.-J., Brinkmann J., Dawson K. S., EisensteinD. J., Font-Ribera A. et al., 2014, MNRAS, 443, 065Beutler F. et al., 2017, MNRAS, 466, 2242Blot L., et al., 2019, MNRAS, 485, 2806Box G. E. P., Cox D. R., 1964, J. Royal Stat. Soc., 26, 211Burić T., Elezović N., 2012, Integral Transforms Spec. Funct., 23, 355Colavincenzo M. et al., 2019, MNRAS, 482, 4883D’Agostino R., Pearson E. S., 1973, Biometrika, 60, 613Dalal N., Dore O., Huterer D., Shirokov A., 2008, Phys. Rev. D, 77, 123514Dodelson S., Schneider M. D., 2013, Phys. Rev. D, 88, 063537Duﬀy A. R., 2014, Ann. Phys., 526, 283Eiﬂer T., Schneider P., Hartlap J., 2009, A&A, 502, 721Eisenstein D. J., Hu W., 1998, ApJ, 496, 605Feldman H. A., Kaiser N., Peacock J. A., 1994, ApJ, 426, 23Gabrielli A., Sylos Labini F., Joyce M., Pietronero L., 2005, StatisticalPhysics for Cosmic Structures. Springer-VerlagGolubev A., 2016, J. Theor. Biol., 393, 203MNRAS , 951–965 (2019) M. S. Wang et al.

Gupta A., Nagar D., 2000, Matrix Variate Distributions. Chapman andHall/CRCHamimeche S., Lewis A., 2008, Phys. Rev. D, 77, 103013Hartlap J., Simon P., Schneider P., 2007, A&A, 464, 399Hogg D. W., Foreman-Mackey D., 2018, ApJS, 236, 11Joachimi B., Taylor A. N., 2011, MNRAS, 416, 1010Kalus B., Percival W. J., Samushia L., 2016, MNRAS, 455, 2573Kaufman C. G., Schervish M. J., Nychka D. W., 2008, J. Am. Stat. Assoc.,103, 1545Keitel D., Schneider P., 2011, A&A, 534, A76Kitaura F.-S. et al., 2016, MNRAS, 456, 4156Kullback S., Leibler R. A., 1951, Ann. Math. Stat., 22, 79Laparra V., Camps-Valls G., Malo J., 2011, IEEE Trans. Neural Netw., 22,537Laureijs R. et al., 2011, preprint ( arXiv:1110.3193 )Li Y., Singh S., Yu B., Feng Y., Seljak U., 2019, J. Cosmol. Astropart.Phys., 1901, 016Lippich M. et al., 2019, MNRAS, 482, 1786Manera M. et al., 2013, MNRAS, 428, 1036Matarrese S., Verde L., 2008, ApJ, 677, L77Neuts M. F., 1981, Matrix-Geometric Solutions in Stochastic Models: anAlgorthmic Approach. Dover Publications Inc.Paz D. J., Sánchez A. G., 2015, MNRAS, 454, 4326Peacock J. A., Nicholson D., 1991, MNRAS, 253, 307Peebles P. J. E., 1980, The Large-Scale Structure of the Universe.Princeton University PressPercival W. J., Brown M. L., 2006, MNRAS, 372, 1104Percival W. J. et al., 2014, MNRAS, 439, 2531Pope A. C., Szapudi I., 2008, MNRAS, 389, 766Schneider P., Hartlap J., 2009, A&A, 504, 705Schuhmann R. L., Joachimi B., Peiris H. V., 2016, MNRAS, 459, 1916Seljak U., Aslanyan G., Feng Y., Modi C., 2017, J. Cosmol. Astropart.Phys., 1712, 009Sellentin E., Heavens A. F., 2016, MNRAS, 456, L132Sellentin E., Heavens A. F., 2018, MNRAS, 473, 2355Sellentin E., Jaﬀe A. H., Heavens A. F., 2017, preprint( arXiv:1709.03452 )Shao Y., Zhou M., 2010, J. Multivar. Anal., 101, 2637Slosar A., Hirata C., Seljak U., Ho S., Padmanabhan N., 2008, J. Cosmol.Astropart. Phys., 0808, 031Smith S., Challinor A., Rocha G., 2006, Phys. Rev. D, 73, 023517Sun L., Wang Q., Zhan H., 2013, ApJ, 777, 75Taylor A., Joachimi B., Kitching T., 2013, MNRAS, 432, 1928Tegmark M., 1997, Phys. Rev. D, D55, 5895Tegmark M., Hamilton A. J. S., Strauss M. A., Vogeley M. S., Szalay A. S.,1998, ApJ, 499, 555Tellarini M., Ross A. J., Tasinato G., Wands D., 2015, J. Cosmol.Astropart. Phys., 1507, 004Trotta R., 2008, Contemp. Phys., 49, 71Verde L. et al., 2003, ApJS, 148, 195Wilk M. B., Gnanadesikan R., 1968, Biometrika, 55, 1Wilking P., Schneider P., 2013, A&A, 556, A70Wilson M. J., Peacock J. A., Taylor A. N., de la Torre S., 2017, MNRAS,464, 3121Wishart J., 1928, Biometrika, 20A, 32Yamamoto K., Nakamichi M., Kamino A., Bassett B. A., Nishioka H.,2006, PASJ, 58, 93

APPENDIX A: SHOT NOISE POWER AND ITSDISTRIBUTION

Here we derive the amplitude of the shot noise power and consider itsdistribution, which aﬀects the power spectrum likelihood. Followingthe calculations in Peebles (1980) and Feldman et al. (1994) for thetwo-point correlation of the Poisson-sampled overdensity ﬁeld ˆ δ (see equation 11), we have (cid:10) ˆ δ ( q ) ˆ δ ∗ ( q (cid:48) ) (cid:11) = ∫ d r d r (cid:48) e i ( q − q (cid:48) ) · r e i q (cid:48) · ( r − r (cid:48) ) × (cid:20) ξ ( r − r (cid:48) ) + + α ¯ n ( r ) δ D ( r − r (cid:48) ) (cid:21) = P true ( q ) δ K q , q (cid:48) + + α V ∫ d r e i ( q − q (cid:48) ) · r ¯ n ( r ) − , (A1)where δ D ( r ) is the Dirac delta function and δ K q , q (cid:48) is the Kroneckerdelta function. This expectation value contains both the underlyingpower spectrum and the scale-invariant shot noise power P shot = + α V ∫ d r ¯ n ( r ) − . (A2)To determine the distribution of the stochastic shot noise, weconsider the scenario where N galaxies are randomly located at { x i } Ni = in a ﬁnite volume. In this set-up, the overdensity ﬁeld andits Fourier transform are δ ( r ) = VN N (cid:213) i = δ D ( r − x i ) − , δ ( k ) = N N (cid:213) i = e i k · x i , (A3)where in δ ( k ) we have dropped a Dirac delta term that vanishes for k (cid:44) . In the large galaxy number limit N → ∞ , regardless of thedetailed distribution of the summands exp ( i k · x i ) where { x i } Ni = are independently uniformly distributed, δ ( k ) becomes a Gaussianrandom ﬁeld by the central limit theorem (Peacock & Nicholson1991). Hence the shot noise power is also exponentially distributed(cf. Section 2.2.1), and will overlay the exponential distribution ofany underlying power if there is any intrinsic structure in galaxyclustering. APPENDIX B: HYPO-EXPONENTIAL DISTRIBUTION

Here we derive the form of the hypo-exponential PDF (equation 18)introduced in Section 2. We will also show that the sum of indepen-dently identically distributed exponential random variables followsthe gamma distribution. This motivates the gamma distribution ap-proximation of the hypo-exponential distribution in Section 2.2.2.Let X be the sum of independent exponential variables { X i } ri = with PDF P i [ X i = x i ] = β i exp (− β i x i ) , (B1)where (cid:8) β − i (cid:9) ri = are their respective scale parameters. Then the PDFof X is the convolution of the individual PDFs P i , P [ X = x ] = (cid:18) r ∗ i = P i (cid:19) [ x ] ≡ ∫ r (cid:214) i = d t i P [ x − t ]× r (cid:214) j = P j [ t j − − t j ] = r (cid:213) i P i [ x ] r (cid:214) j (cid:44) ij = β j β j − β i . (B2)We prove the last equality by induction on r : the initial statementfor r = MNRAS486 , 951–965 (2019) aussianization and covariance decomposition step P [ X = x ] = ∫ x d t r − (cid:213) i = (cid:169)(cid:173)(cid:171) r − (cid:214) j (cid:44) i β j β j − β i (cid:170)(cid:174)(cid:172) P i [ t ] P r [ x − t ] = e − β r x r − (cid:213) i = (cid:169)(cid:173)(cid:171) r − (cid:214) j (cid:44) i β j β j − β i (cid:170)(cid:174)(cid:172) β i β r ∫ x d t e −( β i − β r ) t = r − (cid:213) i = (cid:169)(cid:173)(cid:171) r − (cid:214) j (cid:44) i β j β j − β i (cid:170)(cid:174)(cid:172) β i P r [ x ] − β r P i [ x ] β i − β r = r (cid:213) i = (cid:169)(cid:173)(cid:171) r (cid:214) j (cid:44) i β j β j − β i (cid:170)(cid:174)(cid:172) P i [ x ] . (B3)This is an example of the hypo-exponential family of distri-butions, sometimes also referred to as the generalized Erlang dis-tribution (Neuts 1981). We note here the particular case β i = β j for some i (cid:44) j , when two variables are also identically distributed.Using the formula above by taking the limit ∆ β ≡ β i − β j →

0, thePDF of (cid:0) X i + X j (cid:1) is P [ X i + X j = x ] = lim (cid:15)β → β i ( β i − ∆ β ) ∆ β e − β i x (cid:16) − e x ∆ β (cid:17) = β i x e − β i x . (B4)We recognize this as a gamma distribution Γ (cid:0) , β − i (cid:1) in the shape–scale parametrization.This recovers the usual result that the sum of independently identically distributed exponential random variables follows thegamma distribution; it also motivates our gamma distribution ap-proximation of the hypo-exponential distribution. APPENDIX C: ERROR-FUNCTION TRANSFORMATION

We consider an alternative to the Box–Cox transformation adoptedas our default Gaussianization scheme. This is derived by matchingthe cumulative distribution functions (CDF) and involves the (com-plementary) error function. Whilst this scheme is exact in principle,it requires computationally costly numerical integrations for calcu-lating transformed moments.To this end, we seek an invertible transformation (cid:101) Y (cid:55)→ Z ofthe gamma random variable with ﬁducial shape–scale parameters ( R f , η f ) , where Z ∼ N ( , ) is a standard normal variable with zeromean and unit variance, by matching the CDFs ∫ (cid:101) Y d t P Γ [ t ; R f , η f ] = ∫ Z −∞ d t P N [ t ; 0 , ] , (C1)where the gamma PDF is given by equation (20) and the normalPDF with zero mean and unit variance is P N [ t ; 0 , ] = √ π e − t / . (C2)The solution with d Z (cid:46) d (cid:101) Y > Z = −√ − (cid:34) γ ( R f , (cid:101) Y / η f ) Γ ( R f ) (cid:35) , (C3)where erfc − ( x ) is the inverse of the complementary error functionerfc ( x ) ≡ √ π ∫ ∞ x d t e − t . (C4) For a gamma variable (cid:101) Y with diﬀerent shape–scale parameters ( R , η ) transformed using equation (C3), the PDF in Z is now P [ z ; R , η ] = P N [ t ; 0 , ] P Γ [ ˜ y ( z ) ; R , η ] P Γ [ ˜ y ( z ) ; R f , η f ] , (C5)with mean and variance given by µ ( R , η ) = ∫ ∞−∞ d z z P [ z ; R , η ] ,σ ( R , η ) = ∫ ∞−∞ d z z P [ z ; R , η ] − µ ( R , η ) . (C6)These results are analogous to equation (26) for the Box–Cox Gaus-sianization scheme; however, in this case accurate evaluation oftransformed moments requires computationally expensive numeri-cal integration for each parameter pair ( R , η ) . For this reason we donot implement this scheme in our pipeline. APPENDIX D: COVARIANCE MARGINALIZATION

We now show that the covariance matrix decomposition, whichis used in rescaling the ﬁducial covariance estimate to allow forcosmological dependence, does not aﬀect the SH procedure formarginalizing out the scatter due to covariance estimation usingsimulated data samples.Let us consider the unbiased estimator of the true covariancematrix Σ ∈ R p × p for ( m + ) samples { X i } m + i = of a random vector X ∈ R p , (cid:98) Σ = m + (cid:213) i (cid:0) X i − X (cid:1) (cid:0) X i − X (cid:1) (cid:124) , where X = m + (cid:213) i X i . (D1)The distribution of (cid:98) Σ conditional on Σ is Wishart W p ( Σ / m , m ) withthe PDF (Wishart 1928) P W (cid:2)(cid:98) Σ (cid:12)(cid:12) Σ (cid:3) = | Σ / m | − m / mp / Γ p ( m / ) (cid:12)(cid:12)(cid:98) Σ (cid:12)(cid:12) ( m − p − )/ × exp (cid:104) − m (cid:16) Σ − (cid:98) Σ (cid:17)(cid:105) , (D2)where the multivariate gamma function Γ p is deﬁned by Γ p ( x ) ≡ π ( p − ) p / p (cid:214) j = Γ (cid:18) x + − j (cid:19) . (D3)We also introduce the inverse Wishart distribution W − p (cid:0) m (cid:98) Σ , m (cid:1) with the PDF P W -1 (cid:2) Σ (cid:12)(cid:12)(cid:98) Σ (cid:3) = (cid:12)(cid:12) m (cid:98) Σ (cid:12)(cid:12) m / mp / Γ p ( m / ) | Σ | −( m + p + )/ × exp (cid:104) − m (cid:16)(cid:98) ΣΣ − (cid:17)(cid:105) . (D4)Both the Wishart and inverse Wishart distribution possess the fol-lowing invariance property (Gupta & Nagar 2000): given a non-singular matrix D ∈ R p × p ,(i) if Ψ ∼ W p ( A , m ) , then D (cid:124) ΨD ∼ W p ( D (cid:124) AD , m ) ;(ii) if Ψ ∼ W − p ( A − , m ) , then D (cid:124) ΨD ∼ W p ( D (cid:124) A − D , m ) .Since in our covariance matrix decomposition the rescaling MNRAS , 951–965 (2019) M. S. Wang et al. matrix is the diagonal matrix of standard deviations (see equa-tions 31 and 32), the cosmology-varying covariance matrix Σ ( θ ) has the inverse Wishart posterior distribution W − p (cid:0) m (cid:98) Σ ( θ ) , m (cid:1) , P (cid:0) Σ ( θ ) (cid:12)(cid:12)(cid:98) Σ ( θ ) (cid:1) = (cid:12)(cid:12) m (cid:98) Σ (cid:12)(cid:12) m / mp / Γ p ( m / ) | Σ | −( m + p + )/ × exp (cid:104) − m (cid:16)(cid:98) ΣΣ − (cid:17)(cid:105) . (D5)This shows that the variance–correlation decomposition does notchange the SH marginalization step in Section 3.2.Finally, by marginalizing the normal distribution over the pos-terior distribution of Σ (cid:12)(cid:12)(cid:98) Σ derived above, we can replace the unknowncovariance matrix Σ ( θ ) with an unbiased estimate (cid:98) Σ ( θ ) from ( m + ) data samples. This leads to the modiﬁed t -distribution (equation 39)introduced in Sellentin & Heavens (2016), which we recommendusing as the likelihood form in our analysis pipeline. This paper has been typeset from a TEX/L A TEX ﬁle prepared by the author. MNRAS486