[PDF] FAST-PT II: an algorithm to calculate convolution integrals of general tensor quantities in cosmological perturbation theory

Abstract

Cosmological perturbation theory is a powerful tool to predict the statistics of large-scale structure in the weakly non-linear regime, but even at 1-loop order it results in computationally expensive mode-coupling integrals. Here we present a fast algorithm for computing 1-loop power spectra of quantities that depend on the observer's orientation, thereby generalizing the FAST-PT framework (McEwen et al., 2016) that was originally developed for scalars such as the matter density. This algorithm works for an arbitrary input power spectrum and substantially reduces the time required for numerical evaluation. We apply the algorithm to four examples: intrinsic alignments of galaxies in the tidal torque model; the Ostriker-Vishniac effect; the secondary CMB polarization due to baryon flows; and the 1-loop matter power spectrum in redshift space. Code implementing this algorithm and these applications is publicly available at this https URL .

Full PDF

PPrepared for submission to JCAP

FAST-PT

II: an algorithm tocalculate convolution integrals ofgeneral tensor quantities incosmological perturbation theory

Xiao Fang, Jonathan A. Blazek, Joseph E. McEwen, andChristopher M. Hirata

Center for Cosmology and AstroParticle Physics, Department of Physics, The Ohio StateUniversity, 191 W Woodruﬀ Ave, Columbus OH 43210, USAE-mail: [email protected], [email protected], [email protected],[email protected]

Abstract.

FAST-PT framework (McEwen et al. , 2016) that was originally developed forscalars such as the matter density. This algorithm works for an arbitrary input power spec-trum and substantially reduces the time required for numerical evaluation. We apply thealgorithm to four examples: intrinsic alignments of galaxies in the tidal torque model; theOstriker-Vishniac eﬀect; the secondary CMB polarization due to baryon ﬂows; and the 1-loop matter power spectrum in redshift space. Code implementing this algorithm and theseapplications is publicly available at https://github.com/JoeMcEwen/FAST-PT . a r X i v : . [ a s t r o - ph . C O ] F e b ontents J αβ J J Jk ( k ) Integral 62.2.2 Summary of the Algorithm 82.3 Removing Possible Divergences 92.3.1 Divergence From Kernel Expansions 92.3.2 Divergence From Periodic Power Spectrum and Choice of Bias Indices 9

FAST-PT

Format 123.2 Ostriker-Vishniac Eﬀect 133.2.1 Theory 133.2.2 Conversion to

FAST-PT

Format 153.3 Kinetic polarization of the CMB 163.3.1 Theory 163.3.2 Conversion to

FAST-PT

Format 173.4 Redshift Space Distortions 173.4.1 Theory 173.4.2 Conversion to

FAST-PT

Format 20

A.1 Spherical Harmonics and Legendre Polynomials 28A.2 Wigner j and j Symbols 29

B Derivations 30

B.1 Derivation of Eq. (2.3) 30B.2 Derivation of Eq. (2.10) and (2.11) 31

C Proof of Feasibility of Series Expansion 31 – 1 –

Introduction

Observational cosmology has entered a new era of precision measurement. Current and up-coming surveys [1–5] are enabling us to probe large-scale structure in more detail and overlarger volumes, and hence to better constrain the underlying cosmological model. A paralleleﬀort is underway to understand the astrophysical eﬀects that are both signals and contami-nants in these measurements. For example, weak gravitational lensing has become a powerfuland direct probe of the dark matter distribution [6, 7], but it also suﬀers from systematicuncertainties, such as galaxy intrinsic alignments (IA), which must be mitigated in orderto make use of high-precision measurements. Similarly, connecting observable tracers ( e.g. in spectroscopic surveys) with the underlying dark matter requires a description of the biasrelationship [8–12] and the eﬀect of redshift-space distortions (RSDs) [13–15]. Developmentsin CMB measurements provide another illustration, as the range of observables has expandedfrom early initial detections of temperature anisotropies by COBE [16–24]. Current and fu-ture measurements [25–30] will be able to investigate more subtle eﬀects, such as the kineticSunyaev-Zel’dovich (kSZ) [31, 32] and CMB spectral distortions [33, 34].While modern cosmology has advanced signiﬁcantly using our understanding from linearperturbation theory, nonlinear contributions become signiﬁcant at late times and at smallerscales. In the quasi-linear regime, many relevant cosmological observables are usefully de-scribed using perturbation theory at higher order. Signiﬁcant eﬀort has been devoted tounderstanding structure formation via a range of perturbative techniques ( e.g. [35–45]). Inthis work, we consider integrals in standard perturbation theory (SPT), although the methodsand code we develop have a broader range of applications.The next-to-leading-order (“1-loop”) corrections in these perturbative expansions are typ-ically expressed as two-dimensional mode-coupling convolution integrals, which are genericallytime consuming to evaluate numerically. Recent algorithmic developments have dramaticallysped up these computations for scalar quantities – those with no dependence on the directionof the observer, such as the matter density or real-space galaxy density. The new algorithms[46, 47] take advantage of the locality of evolution in perturbation theory, the scale invarianceof cold dark matter (CDM) structure formation, and the Fast Fourier Transform (FFT); andwork is underway to apply them to 2-loop power spectra as well [48]. In a previous paper, weintroduced the

FAST-PT implementation of these methods in Python [46].However, there are many interesting 1-loop convolution integrals for tensor quantities –those with explicit dependence on the observer line of sight, such as those arising for redshift-space distortions. In this case, we need convolution integrals with “tensor” kernels: I ( k ) = (cid:90) d q (2 π ) K ( ˆ q · ˆ q , ˆ q · ˆ k , ˆ q · ˆ k , q , q ) P ( q ) P ( q ) , (1.1)where K ( ˆ q · ˆ q , ˆ q · ˆ k , ˆ q · ˆ k , q , q ) is a tensor mode-coupling kernel, k = q + q , k = | k | ,and P ( q ) is the input signal – typically the linear matter power spectrum – logarithmicallysampled in q . Due to the dependence on the direction of k , the decomposition of these kernelsis more complicated than in the scalar case. In this work, we generalize our FAST-PT algorithm The kernel K can be expressed as a sum of polynomials in the relevant dot products. “Tensor” refersto the general transformation properties of the cosmological quantities being considered under a symmetryoperation – in this case, rotations in SO(3). For instance, the momentum density is a rank 1 tensor (a vector)while the IA ﬁeld is a rank 2 tensor. The scalar case (rank 0) considered in [46] is thus a speciﬁc applicationof this more general framework. – 2 –o evaluate these tensor convolution integrals, achieving O ( N log N ) performance as in thescalar case.This paper is organized as follows: in §2 we provide the mathematical basis for ourmethod (§2.1), introduce our algorithm (§2.2), and discuss divergences that may arise andhow they are resolved (§2.3). In section §3 we apply our method to several examples: thequadratic intrinsic alignment model (§3.1); the Ostriker-Vishniac eﬀect (§3.2); the kineticpolarization of CMB (§3.3); and the 1-loop redshift-space power spectrum (§3.4). Section§4 summarizes the results. An appendix contains derivations of the relevant mathematicalidentities. The Python code implementing this algorithm and the examples presented in thispaper is publicly available at https://github.com/JoeMcEwen/FAST-PT . In this section we extend the

FAST-PT framework to include the computation of convolutionintegrals with tensor kernels in the form of Eq. (1.1)Our approach is similar to the scalar version of

FAST-PT . We ﬁrst expand the kernel intoseveral Legendre polynomial products – the explicit dependence on the direction ˆ k requires anexpansion in three angles rather than one (as shown in Eq. 2.1 and 2.2). Second, products ofLegendre polynomials are written in spherical harmonics using the addition theorem, wherethe required combinations of spherical harmonics are constrained by Wigner j symbols andpreserve angular momentum (as in Eq. 2.3). Third, in conﬁguration space, the integral of eachterm in the expansion can be further transformed into a product of several one-dimensionalintegrals (as in Eq. 2.14 and 2.15), which can be quickly performed by assuming a (biased)log-periodic power spectrum and employing FFTs (as in Eq. 2.18 and 2.22).We will ﬁrst provide the theory in §2.1 and then brieﬂy introduce our algorithm in §2.2.Finally, in §2.3 we will discuss physical divergence problems that can arise and the way tosolve them through the choice of appropriate biasing of the log-periodic power spectrum. In general, the kernel function K can be decomposed as a summation of terms K ( ˆ q · ˆ q , ˆ q · ˆ k , ˆ q · ˆ k , q , q ) = (cid:88) (cid:96) ,(cid:96) ,(cid:96),α,β A αβ(cid:96) (cid:96) (cid:96) P (cid:96) ( ˆ q · ˆ q ) P (cid:96) (ˆ k · ˆ q ) P (cid:96) (ˆ k · ˆ q ) q α q β , (2.1)where P (cid:96) are the Legendre polynomials, and the A αβ(cid:96) (cid:96) (cid:96) coeﬃcients specify the componentsof a particular kernel. For general angular dependences the sum may require an inﬁnitenumber of terms. However the kernels that appear in CDM perturbation theory and galaxybiasing theory are composed of a ﬁnite number of terms in a polynomial expansion. Thisdecomposition leads us to consider integrals of the form f ( k ) = (cid:90) d q (2 π ) P (cid:96) ( ˆ q · ˆ q ) P (cid:96) (ˆ k · ˆ q ) P (cid:96) (ˆ k · ˆ q ) q α q β P ( q ) P ( q ) . (2.2)The product of Legendre polynomials can be decomposed into spherical harmonics bythe addition theorem. Using the result presented in Appendix B.1, we can write the productof three Legendre polynomials in terms of spherical harmonics and Wigner j symbols: P (cid:96) ( ˆ q · ˆ q ) P (cid:96) ( ˆ q · ˆ k ) P (cid:96) ( ˆ q · ˆ k )= (cid:88) J ,J ,J k C J J J k (cid:96) (cid:96) (cid:96) (cid:88) M ,M ,M k Y J M ( ˆ q ) Y J M ( ˆ q ) Y JkMk (ˆ k ) (cid:18) J J J k M M M k (cid:19) , (2.3)– 3 –ith coeﬃcients given by C J J J k (cid:96) (cid:96) (cid:96) =(4 π ) / ( − (cid:96) + (cid:96) + (cid:96) × (cid:112) (2 J + 1)(2 J + 1)(2 J k + 1) (cid:18) J (cid:96) (cid:96) (cid:19) (cid:18) (cid:96) J (cid:96) (cid:19) (cid:18) (cid:96) (cid:96) J k (cid:19) (cid:26) J J J k (cid:96) (cid:96) (cid:96) (cid:27) , (2.4)where we have used the j and j symbols, denoted by ( ) and { }, respectively. The integers M , M , M k satisfy the selection rule M + M + M k = 0 . The coeﬃcients C J J J k (cid:96) (cid:96) (cid:96) mapthe product of spherical harmonics in Eq. (2.3), written in terms of the J , J , J k basis, tothe original (cid:96) , (cid:96) , (cid:96) basis of Legendre polynomials. Upon replacing the product of Legendrepolynomials in Eq. (2.2) with Eq. (2.3) (omitting the coeﬃcients C J J J k (cid:96) (cid:96) (cid:96) ), we arrive at anintegral over the product of three spherical harmonics, which we will denote as I αβ J J Jk ( k ) . Foreach combination of J , J , J k , we have I αβ J J Jk ( k ) = (cid:88) M M M k (cid:90) d q (2 π ) P ( q ) P ( q ) Y J M ( ˆ q ) Y J M ( ˆ q ) Y JkMk (ˆ k ) q α q β (cid:18) J J J k M M M k (cid:19) ≡ (cid:88) M k Y JkMk (ˆ k ) T αβ J J JkMk ( k ) , (2.5)where we have deﬁned T αβ J J JkMk ( k ) ≡ (cid:88) M M (cid:18) J J J k M M M k (cid:19) H αβ J M J M ( k ) and (2.6) H αβ J M J M ( k ) ≡ (cid:90) d q (2 π ) P ( q ) P ( q ) Y J M ( ˆ q ) Y J M ( ˆ q ) q α q β . (2.7)We can separate H αβJ M J M ( k ) into a product of two integrals, respectively over q and q ,by Fourier transforming to conﬁguration space H αβ J M J M ( r ) = (cid:90) d q (2 π ) d q (2 π ) e i ( q + q ) · r q α q β P ( q ) P ( q ) Y J M ( ˆ q ) Y J M ( ˆ q )= ¯ H αβ J J ( r ) Y J M ( ˆ r ) Y J M ( ˆ r ) , (2.8)where we have used the plane wave expansion (Eq. A.5) together with orthogonality relations(Eq. A.3) to arrive at the equality. We have also deﬁned ¯ H αβ J J ( r ) ≡ (4 π ) i J J (2 π ) (cid:90) ∞ dq q α P ( q ) j J ( q r ) (cid:90) ∞ dq q β P ( q ) j J ( q r ) , (2.9)where j J ( qr ) are the spherical Bessel functions. Substituting Eq. (2.8) into the deﬁnition of T αβ J J JkMk we obtain T αβ J J JkMk ( r ) = (cid:88) M M (cid:18) J J J k M M M k (cid:19) H αβ J M J M ( r )= ¯ H αβ J J ( r ) (cid:88) M M (cid:18) J J J k M M M k (cid:19) Y J M ( ˆ r ) Y J M ( ˆ r )= ¯ H αβ J J ( r ) a J J Jk Y ∗ JkMk ( ˆ r ) , (2.10)– 4 –here a J J Jk ≡ (cid:115) (2 J + 1)(2 J + 1)4 π (2 J k + 1) (cid:18) J J J k (cid:19) . (2.11)The derivation of Eqs. (2.10) and (2.11) is provided in Appendix (B.2). Fourier transformingback to k -space, we obtain T αβ J J JkMk ( k ) = (cid:90) d rT αβ J J JkMk ( r ) e − i k · r = a J J Jk (cid:90) r dr ¯ H αβ J J ( r ) (cid:90) d ˆ r Y ∗ JkMk ( ˆ r ) e − i k · r = a J J Jk (cid:90) r dr ¯ H αβ J J ( r ) (cid:90) d ˆ r Y ∗ JkMk ( ˆ r )4 π (cid:88) (cid:96) (cid:48) m (cid:48) ( − i ) (cid:96) (cid:48) j (cid:96) (cid:48) ( kr ) Y ∗ (cid:96) (cid:48) m (cid:48) (ˆ k ) Y (cid:96) (cid:48) m (cid:48) ( ˆ r )= a J J Jk (cid:90) r dr ¯ H αβ J J ( r )4 π (cid:88) (cid:96) (cid:48) m (cid:48) ( − i ) (cid:96) (cid:48) j (cid:96) (cid:48) ( kr ) Y ∗ (cid:96) (cid:48) m (cid:48) (ˆ k ) δ (cid:96) (cid:48) Jk δ m (cid:48) Mk =4 π ( − i ) Jk a J J Jk (cid:90) r dr ¯ H αβ J J ( r ) j Jk ( kr ) Y ∗ JkMk (ˆ k ) , (2.12)where in the third equality we have used the plane wave expansion (Eq. A.5), and in the fourthequality used the orthogonality relation between spherical harmonics (Eq. A.3). Combiningthe results from Eq. (2.9), (2.12), (2.11), we arrive at I αβ J J Jk ( k ) = 4 π ( − i ) Jk a J J Jk (cid:90) r dr ¯ H αβ J J ( r ) j Jk ( kr ) (cid:88) M k Y JkMk (ˆ k ) Y ∗ JkMk (ˆ k )= ( − i ) Jk (2 J k + 1) a J J Jk (cid:90) r dr ¯ H αβ J J ( r ) j Jk ( kr )= ( − J k +( J + J + J k ) / (cid:114) (2 J + 1)(2 J + 1)(2 J k + 1)64 π (cid:18) J J J k (cid:19) × (cid:90) r drJ αβ J J ( r ) j Jk ( kr ) , (2.13)where J + J + J k must be even for the j symbol to be non-zero, and J αβ J J ( r ) is deﬁned by J αβ J J ( r ) ≡ (cid:20)(cid:90) ∞ dq q α P ( q ) j J ( q r ) (cid:21) (cid:20)(cid:90) ∞ dq q β P ( q ) j J ( q r ) (cid:21) . (2.14)Combining Eq. (2.13) and (2.3) we can rewrite the integral (2.2) as (cid:90) d q (2 π ) P (cid:96) ( ˆ q · ˆ q ) P (cid:96) (ˆ k · ˆ q ) P (cid:96) (ˆ k · ˆ q ) q α q β P ( q ) P ( q )= (cid:88) J ,J ,J k C J J J k (cid:96) (cid:96) (cid:96) I αβ J J Jk ( k ) = (cid:88) J ,J ,J k B J J J k (cid:96) (cid:96) (cid:96) (cid:90) r drJ αβ J J ( r ) j Jk ( kr ) , (2.15)– 5 –here the coeﬃcients B J J J k (cid:96) (cid:96) (cid:96) are given by B J J J k (cid:96) (cid:96) (cid:96) ≡ C J J J k (cid:96) (cid:96) (cid:96) ( − J k +( J + J + J k ) / (cid:114) (2 J + 1)(2 J + 1)(2 J k + 1)64 π (cid:18) J J J k (cid:19) =( − (cid:96) + J J Jk × (2 J + 1)(2 J + 1)(2 J k + 1) π × (cid:18) J (cid:96) (cid:96) (cid:19) (cid:18) (cid:96) J (cid:96) (cid:19) (cid:18) (cid:96) (cid:96) J k (cid:19) (cid:18) J J J k (cid:19) (cid:26) J J J k (cid:96) (cid:96) (cid:96) (cid:27) . (2.16)The evaluation of J αβ J J ( r ) is similar to the analogous quantity in scalar FAST-PT . For notationalsimplicity, we deﬁne the last integral in Eq. (2.15) as J αβ J J Jk ( k ) = (cid:90) r drJ αβ J J ( r ) j Jk ( kr ) . (2.17)Eq. (2.17) is similar in structure to Eq. (2.19) of [46]. As such, we can easily generalize the FAST-PT framework to evaluate integrals in the form of Eq. (2.17).Note that some (scalar) 2-loop integrals have similar structure to the tensor 1-loopintegrals considered here. In recent work, Ref. [48] employed similar techniques involvingWigner j symbols to deal with these 2-loop integrals, although the implementations aresomewhat diﬀerent. J αβ J J Jk ( k ) Integral

We adopt the discrete Fourier transformation of the power spectrum as discussed in the ﬁrst

FAST-PT paper [46], c m = W m N − (cid:88) q =0 P ( k q ) k ν q e − πimq/N → P ﬁltered ( k q ) = N/ (cid:88) m = − N/ c m k ν + iη m q , (2.18)where N is the size of the input power spectrm, η m = m × π/ ( N ∆) , m = − N/ , − N/ , ..., N/ − , N/ , ν is the bias index, and ∆ is the linear spacing, i.e. k q = k exp( q ∆) with k being the smallest value in the k array. Similarly, c (cid:48) n are the Fourier coeﬃcients ofthe power spectrum with bias index ν . The physics of the bias has been discussed in [46] and the choice of its value will be discussed in §2.3.2. For a real power spectrum the Fouriercoeﬃcients obey c ∗ m = c − m , c (cid:48)∗ n = c (cid:48)− n . W m is a window function used to smooth the edges ofthe Fourier coeﬃcient array of the biased power spectrum ( e.g. from the cutoﬀs in k ), hencesmoothing over the noise and sharp features in the power spectrum, as well as prevent themfrom propagating non-locally in the “ﬁltered” power spectrum. The “ﬁltered” power spectrumis then treated as the input power spectrum and its c m ’s are used for calculations afterwards. The bias is introduced to solve the numerical divergences arising from the Fourier transform. By per-forming the Fourier transform, we assume the input power spectrum to be periodic, so that there are inﬁnite“satellite” power spectra on both low and high k sides. To avoid inﬁnite contribution from the satellites,appropriate bias values are required. The window function we use is a smoothing function described in Appendix C of [46]. – 6 –ollowing Eq. (2.17) in [46], we can write Eq. (2.14) as J αβ J J ( r ) = π N/ (cid:88) m = − N/ N/ (cid:88) n = − N/ c m c (cid:48) n g αm g βn Q αm + Q βn r − − ν − ν − α − β − iη m − iη n , (2.19)where g αm ≡ g ( J + , Q αm ) , g βn ≡ g ( J + , Q βn ) , Q αm ≡ + ν + α + iη m , Q βn ≡ + ν + β + iη n , and g ( µ, κ ) ≡ Γ[( µ + κ + 1) / µ − κ + 1) / . (2.20)The integral then becomes J αβ J J Jk ( k q ) ≡ (cid:90) ∞ dr r J αβ J J ( r ) j Jk ( k q r )= π N/ (cid:88) m = − N/ N/ (cid:88) n = − N/ c m g αm c (cid:48) n g βn Q αm + Q βn (cid:90) ∞ dr j Jk ( k q r ) r − − ν − ν − α − β − iη m − iη n = π N/ (cid:88) m = − N/ N/ (cid:88) n = − N/ c m g αm c (cid:48) n g βn Q αm + Q βn k Q αm + Q βn q (cid:90) ∞ dr j Jk ( r ) r − − ν − ν − α − β − iη m − iη n = (cid:16) π (cid:17) N/ (cid:88) m = − N/ N/ (cid:88) n = − N/ c m g αm c (cid:48) n g βn Q αm + Q βn k Q αm + Q βn q × (cid:90) ∞ dr J Jk + 12 ( r ) r − − ν − ν − α − β − iη m − iη n = (cid:16) π (cid:17) N/ (cid:88) m = − N/ N/ (cid:88) n = − N/ c m g αm c (cid:48) n g βn Q αm + Q βn k Q αm + Q βn q × − − ν − ν − α − β − iη m − iη n g (cid:18) J k + 12 , − − ν − ν − α − β − iη m − iη n (cid:19) = π / N/ (cid:88) m = − N/ N/ (cid:88) n = − N/ c m g αm c (cid:48) n g βn k Q αm + Q βn q g (cid:18) J k + 12 , − − ν − ν − α − β − iη m − iη n (cid:19) . (2.21)We deﬁne τ h ≡ η m + η n and Q h ≡ Q αm + Q βn , which only depends on the sum m + n . Wewrite the double summation over m and n as a discrete convolution, indexed by h , such that The major step is substituting the expansions of the power spectra into Eq. (2.14), and utilizing theformula: (cid:82) ∞ dt t κ J µ ( t ) = 2 κ g ( µ, κ ) for (cid:60) κ < / , (cid:60) ( κ + µ ) > − , where the Bessel function of the ﬁrst kind J µ is related to the spherical Bessel function by J µ ( t ) = (cid:112) t/πj µ − / ( t ) , and g ( µ, κ ) is deﬁned in Eq. (2.20). – 7 – = m + n = {− N, − N + 1 , · · · , N − , N } . This leads to J αβ J J Jk ( k q ) = π / N/ (cid:88) m = − N/ N/ (cid:88) n = − N/ c m g αm c (cid:48) n g βn k Q h q g (cid:18) J k + 12 , − Q h − (cid:19) = π / (cid:88) h [ c m g αm ⊗ c (cid:48) n g βn ] h k Q h q g (cid:18) J k + 12 , − Q h − (cid:19) = π / k ν + ν + α + βq (cid:88) h C h exp( iτ h ln k ) exp( iτ h q ∆) g (cid:18) J k + 12 , − Q h − (cid:19) = π / k ν + ν + α + βq IFFT (cid:20) C h g (cid:18) J k + 12 , − Q h − (cid:19)(cid:21) , (2.22)where C h is deﬁned as the convolution in the second equality, and IFFT is the discrete inverseFast Fourier Transform. This derivation is similar to Eq. (2.21) in [46].In the algorithm, for each set of ( J , J , J k ) there are 3 FFT operations and 1 convo-luton. In our public code, we use the scipy.signal.fftconvolve routine [49] to perform theconvolution, which uses the convolution theorem, resulting in 3 additional FFT operations.Thus, for each set of ( J , J , J k ) there are 6 FFT operations executed in total. From Eq (2.15), the tensor convolution integral (1.1) can be decomposed as I ( k ) = (cid:88) (cid:96) ,(cid:96) ,(cid:96),α,β A αβ(cid:96) (cid:96) (cid:96) (cid:88) J ,J ,J k B J J J k (cid:96) (cid:96) (cid:96) (cid:90) r drJ αβ J J ( r ) j Jk ( kr ) . (2.23)Our algorithm is thus as follows:1. Given an integral in the form of Eq. (1.1), expand it in terms of Eq. (2.2) to obtain allthe non-zero coeﬃcients A αβ(cid:96) (cid:96) (cid:96) ;2. For each combination of (cid:96) , (cid:96) , (cid:96) , use Eq. (2.16) to calculate all the possible combinationsof J , J , J k and their corresponding (non-zero) coeﬃcients B J J J k (cid:96) (cid:96) (cid:96) ;3. For all the possible combinations of J , J , J k , calculate J αβ J J ( r ) and perform the Hankeltransform integration (see §2.2.1 for the detailed implementation);4. Sum up all the terms to obtain the result.The criteria for non-zero B J J J k (cid:96) (cid:96) (cid:96) can be obtained from the properties of the Wigner j symbols. From Eq. (2.16) we have | (cid:96) − (cid:96) | ≤ J k ≤ (cid:96) + (cid:96) , | (cid:96) − (cid:96) | ≤ J ≤ (cid:96) + (cid:96) , | (cid:96) − (cid:96) | ≤ J ≤ (cid:96) + (cid:96) , (2.24) | J − J | ≤ J k ≤ J + J , (2.25)and J + (cid:96) + (cid:96) = even , (cid:96) + J + (cid:96) = even , (cid:96) + (cid:96) + J k = even . (2.26)The condition that “ J + J + J k = even ” is redundant since it can be infered from theconditions (Eq. 2.26). . Summing up the three equations in Eq. (2.26) we have J + J + J k + 2( (cid:96) + (cid:96) + (cid:96) ) = even , which leadsto J + J + J k = even . – 8 – .3 Removing Possible Divergences Note that the algorithm we have presented in this section is only for the “ P ( k ) ”-type in-tegrals, i.e. containing two power spectra P ( q ) P ( q ) in the integrand as in Eq. (1.1). In§3.4.2 we will encounter integrals containing P ( q ) P ( k ) or P ( q ) P ( k ) , which can be reducedto one-dimensional integrals, analogous to P ( k ) in 1-loop SPT (for details on our algorithmof P and P , see [46]). We ﬁrst focus on the P ( k ) -type integrals, where two potentialtypes of divergence may emerge in this algorithm. When we expand the kernel into the Legendre polynomial form, the integral (2.2) can bedivergent for some combinations of (cid:96), (cid:96) , (cid:96) , α, β , even though the sum of all terms will beconvergent for physical observables. If the input power spectrum is the linear matter powerspectrum P lin ( k ) , for q (cid:29) k , q ≈ − q , and the power spectra, P lin ( q ) and P lin ( q ) , bothscale as q − . Thus the integral (2.2) is proportional to (cid:82) dq q α + β − for (cid:96) = (cid:96) . Convergencerequires that α + β < . For (cid:96) (cid:54) = (cid:96) , this constraint is relaxed due to suppression from theangular integral.For q (cid:28) k , q ≈ k , so that P lin ( q ) ∝ q n s and P lin ( q ) ∝ k n eﬀ ( k ) , where n s ∼ is theprimordial spectral index of the matter power spectrum, and n eﬀ ( k ) is the eﬀective spectralindex at k . The integral is then proportional to (cid:82) dq q α + n s +21 , leading to the requirement: α > − − n s for (cid:96) = (cid:96) . Similarly, for q small, we get β > − − n s for (cid:96) = (cid:96) . As before,these constraints are relaxed if (cid:96) (cid:54) = (cid:96) or (cid:96) (cid:54) = (cid:96) .Violations of these criteria have to be removed by regularization, speciﬁcally cancelingthe divergent parts. None of the examples in the next section have such a divergence (althoughsee §3.4.2 for a discussion of a separate numerical divergence which is treated analytically). As discussed in [46], the use of FFTs enforces a periodic power spectrum which can lead tounphysical divergences for certain choices of the power-law bias. This generalized implemen-tation of

FAST-PT has more freedom in the choice of bias indices ν , ν , compared with theoriginal “scalar” version. First, it allows the use of two diﬀerent bias indices ν , ν for the twoinput power spectra, instead of one ﬁxed ν . Second, it allows the bias indices to change fordiﬀerent Legendre integrals (2.2). We now discuss our choice of ν , ν .In FAST-PT , we expand the input power spectra P lin ( q ) , P lin ( q ) into sums over power-law spectra q ν + iη m and q ν + iη n . The real parts of the exponents, i.e. the bias indices ν , ν ,will aﬀect the convergence of the integrals.Using a similar argument as in the previous subsection, for large q , we will have P lin ( q ) ∝ q ν , P lin ( q ) ∝ q ν . Working out the integral, we end up with the criterion: ν + α + ν + β < − for (cid:96) = (cid:96) . For small q , we get α + ν > − for (cid:96) = (cid:96) ; simi-larly for small q , we get β + ν > − for (cid:96) = (cid:96) . These constraints are relaxed if (cid:96) (cid:54) = (cid:96) or (cid:96) (cid:54) = (cid:96) . We plot the convergence region in Figure 1.In our code, we take ν = − − α and ν = − − β for all cases to satisfy the aboveconditions. Note that the choice of diﬀerent bias values for diﬀerent components of a givenobservable is technically non-physical since the choice of bias speciﬁes the properties of the“universe in which the calculation is done. However, if the input k -range (or zero-padding) is– 9 – + ν α + ν O -3 -3 Figure 1 . The convergence region of the bias indices ν , ν is indicated by the shaded region. suﬃcient, this eﬀect is negligible on scales of interest . The ﬁxed biasing scheme ( ν = − )employed for scalar quantities in [46] avoids this issue. However, because one component of P violates α + ν > − (for (cid:96) = 0 ) under this ﬁxed biasing, we required analytic regularizationto enforce Galilean invariance and remove the formally inﬁnite contribution to displacementsfrom k → modes. Those integrals can be performed using the new scheme without theanalytic regularization, although in this case a larger input range in k (or additional zero-padding) is required for numerical convergence. In this section we apply the

FAST-PT tensor algorithm to several cosmological applications: thequadratic intrinsic alignment model (§3.1); the Ostriker-Vishniac eﬀect (§3.2); the kinetic po-larization of CMB (§3.3); and the 1-loop redshift-space distortion power spectrum (§3.4). Ineach subsection we ﬁrst brieﬂy review the theory behind the application before expanding therelevant integral(s) into the form of Eq. (2.2) and comparing the output for each case with theresults from conventional (and signiﬁcantly slower) two-dimensional cubature integration. To In principle, diﬀerent bias indices could lead to slightly diﬀerent integral results due to contributions fromthe periodic “satellite” power spectra. However, when the input k -range or zero-padding is suﬃcient, theseartiﬁcial contributions become negligible. When the bias indices are chosen inside the convergence region inFig.1, we can always ﬁnd a suﬃcient k -range, while outside the region, there may be no suﬃcient range. Totest the stability of the results, we compared the OV power spectrum (Eq. 3.9) obtained using the bias indices ν = − − α, ν = − − β to the result obtained with the indices ν = − . − α, ν = − . − β , and foundthat the maximum fractional diﬀerence over the range 0.003-10 h/ Mpc is less than 3 × − . – 10 –emonstrate the performance of the code, we provide this comparison out to high wavenum-bers ( k = 10 h/ Mpc). We caution that the underlying perturbative models are not applicableto the real Universe beyond the the mildly nonlinear regime ( k ∼ few × − h/ Mpc), eventhough

FAST-PT can still accurately compute the perturbation theory integrals. We envisionthese examples both as results in and of themselves, and, more importantly, as reference ma-terial for other cosmologists who may want to compute 1-loop power spectra with their ownkernels and convert them to

FAST-PT format.Our input linear power spectrum was generated by CAMB [50], assuming a ﬂat Λ CDMcosmology corresponding to the Planck 2015 results [51]. We used Python version 3.5.1, numpy scipy

Weak gravitational lensing has become one of the most promising probes of the dark matterdistribution [5, 52]. The observed shapes of galaxies are weakly distorted (“sheared”) bythe gravitational potential of the large-scale structure along the line of sight. Correlationsin observed shapes tell us about the projected matter distribution. However, weak lensingsuﬀers from several systematic eﬀects, one of which is intrinsic correlations between galaxyellipticities, known as “intrinsic alignments” (IA) [53, 54]. In the weak lensing regime, theintrinsic shapes of galaxies dominate the observed shapes ( i.e. are much larger than the lensingshear contribution). While the dominant uncorrelated component of intrinsic ellipticitiesdoes not aﬀect the correlation of shapes beyond adding noise, the component correlating theellipticity with the underlying tidal ﬁeld can bias cosmological inference from weak lensingmeasurements [55]. On the other hand, IA can also serve as a probe of the the cosmologicaldensity ﬁeld as well as the astrophysics of galaxies and halos [56].On large scales, there are two types of physically-motivated intrinsic galaxy alignmentmodels, the tidal (linear) and quadratic alignment models [57, 58]. The tidal alignmentmodel is based on the assumption that large-scale correlations in the intrinsic ellipticity ﬁeldof triaxial elliptical galaxies are linearly related to ﬂuctuations in the primordial gravitationaltidal ﬁeld in which the galaxy formed. In quadratic models (often referred to as “tidaltorquing”), the observed ellipticity of spiral galaxies comes from the inclination of the diskwith respect to the line of sight, and hence from the direction of its angular momentum. Inthis scenario, the tidal ﬁeld from the large-scale structure will both “spin-up” the galaxy aswell as provide a torque, contributing to the mean intrinsic ellipticity at second order. Ingeneral, once nonlinear eﬀects are included, both tidal alignment and tidal torquing modelshave contributions from mode coupling integrals of the form of Eq. 1.1 [59]. More generally,these models can be viewed as components in an “eﬀective expansion” of IA [60], analogousto treatments of galaxy biasing [61].In the quadratic alignment model [57], the intrinsic alignment

E/B -mode power spec-trum P ( EE,BB )˜ γ I ( k ) contains a convolution integral in the form of P ( EE,BB )IA , quad ( k ) = 2 (cid:90) d q (2 π ) h E,B ) ( ˆ q , ˆ q ) P lin ( q ) P lin ( q ) , (3.1) Similar results are obtained from assuming that intrinsic shapes are “instantaneously” set by the tidalﬁeld at the time of observation (see [59] for further discussion). – 11 –here k = q + q and h E and h B are tensor kernels. If we choose the coordinate systemsuch that ˆ k = ˆ z and ˆ x points to the observer, h ( E,B ) can be expressed as h E ( ˆ q , ˆ q ) = h zz ( ˆ q , ˆ q ) − h yy ( ˆ q , ˆ q )=( ˆ q · ˆ q )( ˆ q · ˆ k )( ˆ q · ˆ k ) −

13 ( ˆ q · ˆ k ) −

13 ( ˆ q · ˆ k ) − ( ˆ q · ˆ q )( ˆ q · ˆ y )( ˆ q · ˆ y ) + 13 ( ˆ q · ˆ y ) + 13 ( ˆ q · ˆ y ) , (3.2) h B ( ˆ q , ˆ q ) =2 h zy ( ˆ q , ˆ q )= (cid:20) ( ˆ q · ˆ q )( ˆ q · ˆ k ) −

23 ( ˆ q · ˆ k ) (cid:21) ( ˆ q · ˆ y )+ (cid:20) ( ˆ q · ˆ q )( ˆ q · ˆ k ) −

23 ( ˆ q · ˆ k ) (cid:21) ( ˆ q · ˆ y ) (3.3)where we can see that h ( E,B ) have ˆ k dependence. We have made the Limber approximationin assuming that only modes transverse to the line of sight will contribute to observed cor-relations, hence ˆ n = ˆ x . Note that our choice of the coordinate system is diﬀerent from theconventions in some previous work where ˆ z is chosen to be along the line of sight. Because theintegrand has an azimuthal symmetry around k , independent of the line-of-sight direction, itis more convenient to work in our coordinate system, although the ﬁnal results do not dependon this choice. FAST-PT

Format

In spherical coordinates, we have ˆ q i = (sin θ i cos φ i , sin θ i sin φ i , cos θ i ) for i = 1 , . Note that φ = φ − π ≡ φ because q and q add up to k which is on the z − axis. We obtain ˆ q · ˆ y = sin θ sin φ, ˆ q · ˆ y = − sin θ sin φ, ˆ q · ˆ q = cos θ cos θ − sin θ sin θ , ( ˆ q · ˆ y )( ˆ q · ˆ y ) = (cid:16) ˆ q · ˆ q − ( ˆ q · ˆ k )( ˆ q · ˆ k ) (cid:17) sin φ, ( ˆ q · ˆ y ) + ( ˆ q · ˆ y ) = (cid:16) − ( ˆ q · ˆ k ) − ( ˆ q · ˆ k ) (cid:17) sin φ. (3.4)Now we can rewrite h E as h E ( ˆ q , ˆ q ) =( ˆ q · ˆ q )( ˆ q · ˆ k )( ˆ q · ˆ k )(1 + sin φ ) − ( ˆ q · ˆ q ) sin φ −

13 (1 + sin φ ) (cid:104) ( ˆ q · ˆ k ) + ( ˆ q · ˆ k ) (cid:105) + 23 sin φ = µµ µ (1 + sin φ ) − µ sin φ −

13 (1 + sin φ )( µ + µ ) + 23 sin φ, (3.5)where we deﬁne µ ≡ ˆ q · ˆ q , µ ≡ ˆ q · ˆ k , µ ≡ ˆ q · ˆ k (following the convention where each angleis labeled by the subscript for the opposite side in the triangle).– 12 – (cid:96) (cid:96) A E ) (cid:96) (cid:96) (cid:96) A B ) (cid:96) (cid:96) (cid:96) / − / / − / / − / / − / − /

60 59 / − /

15 16 / / − / / − / / − / − /

10 2 / / — Table 1 . The coeﬃcient of each term in the Legendre polynomial expansion of h E and h B kernels(without the factor of 2 in front of the integral Eq. 3.1). Due to symmetry, we need only keep termswith (cid:96) ≥ (cid:96) (multiplying the value by two where relevant). Taking square of h E and then averaging over φ , we obtain h E = 16 − µ + 38 µ −

718 ( µ + µ ) + 712 µ ( µ + µ ) + 1972 ( µ + µ )+ 76 µµ µ − µ µ µ − µ ( µ µ + µ µ ) + 1936 µ µ + 198 µ µ µ = (cid:88) (cid:96) ,(cid:96) ,(cid:96)(cid:96) ≥ (cid:96) A E ) (cid:96) (cid:96) (cid:96) P (cid:96) ( µ ) P (cid:96) ( µ ) P (cid:96) ( µ ) , (3.6)where we apply the symmetry between q and q and only keep terms with (cid:96) ≥ (cid:96) . Sim-ilarly, we can write h B kernel in the same form with coeﬃcients A B ) (cid:96) (cid:96) (cid:96) . The coeﬃcient ofeach term is listed in Table 1. Now each term has been expressed in the required form of q α q β P (cid:96) ( µ ) P (cid:96) ( µ ) P (cid:96) ( µ ) , with α = β = 0 .In Figure 2, we show the FAST-PT result of P ( EE,BB )IA , quad ( k ) (Eq. 3.1) and the fractionaldiﬀerence comparing to the results from conventional methods. The plot shows excellentagreement between two methods, with fractional accuracy better than × − up to k =10 h/ Mpc.

After CMB photons leave the surface of last scattering, they can experience further inter-actions, leading to secondary anisotropies. One of the most important is re-scattering oﬀ offree electrons after reionization in which photons can be shifted to higher or lower frequenciesdue to motions of the electrons. The thermal Sunyaev-Zel’dovich eﬀect (tSZ) results fromthermal motion of the electrons, usually in galaxy clusters as these are the hottest regions.Bulk hydrodynamic motions produce the kinetic Sunyaev-Zel’dovich (kSZ) eﬀect (in clusters) Averaging over the azimuthal angle, we have (cid:104) cos φ (cid:105) = π (cid:82) π dφ cos φ = 1 / , (cid:104) cos φ (cid:105) = π (cid:82) π dφ cos φ = 3 / . More generally, (cid:104) cos n φ (cid:105) = π − Γ( n + ) / Γ( n + 1) for any non-negative integer n , known as the Wallis formula. – 13 – − P EE / BB I A , qu a d ( k ) [ M p c / h ] P EE IA , quad ( k ) P BB IA , quad ( k ) k [ h /Mpc] − × − − × − − × − × − × − × − fractional difference Figure 2 . The

FAST-PT result for the intrinsic alignment integrals P ( EE,BB )IA , quad ( k ) in Eq. (3.1) (upperpanel) and the fractional diﬀerence compared to the conventional method (lower panel). or the Ostriker-Vishniac (OV) eﬀect (in large-scale structure). In this section, we considerthe second-order perturbation theory analysis of the Ostriker-Vishniac eﬀect.The fractional temperature perturbation in the direction ˆ n on the sky is given by [62–64] Θ( ˆ n ) = − (cid:90) η dw g ( w ) ˆ n · q ( w ) , (3.7)where q ( w ) ≡ [1 + δ ( w )] v ( w ) , v ( w ) is the bulk velocity at position w ≡ w ˆ n at a comovingdistance w (or a conformal time η − w ), g ( w ) is the visibility function specifying the proba-bility distribution for scattering from reionized electrons, given by g ( w ) = ( dτ /dw ) e − τ , and τ is the optical depth.At 1-loop, the angular power spectrum of Θ produced by the OV eﬀect, C ΘΘ (cid:96) (equivalentto P p ( κ ) in [64]), requires the calculation of the Vishniac power spectrum, which is a tensorconvolution integral. In a ﬂat Universe, C ΘΘ (cid:96) = 116 π (cid:90) η ( a ( w ) g ( w )) w (cid:32) ˙ DDD (cid:33) S ( (cid:96)/w ) dw , (3.8)where D and D are the growth factors at w and at present, respectively. Choosing the samecoordinate system as in the IA calculation above, i.e. ˆ z = ˆ k and ˆ x pointing to the observer,the integral is given by S ( k ) = 4 π (cid:90) d q (2 π ) (cid:18) q x q + q x q (cid:19) P lin ( q ) P lin ( q ) , (3.9)which is consistent with Eq. (21) in [64]. Our interest here is in fast computation of S ( k ) .– 14 – S ( k ) [ M p c / h ] k [ h /Mpc] − × − − × − − × − × − × − × − fractional difference Figure 3 . The

FAST-PT result for the Ostriker-Vishniac eﬀect integral S ( k ) in Eq. (3.9) (upperpanel) and the fractional diﬀerence compared to the conventional method (lower panel). FAST-PT

Format

First noting that the integral S ( k ) is symmetric under the exchange q ↔ q and that q x = − q x , we can expand Eq. (3.9) as S ( k ) = 4 π (cid:90) d q (2 π ) (cid:18) q x q − q x q q (cid:19) P lin ( q ) P lin ( q ) . (3.10)In the spherical coordinate system, q x = q sin θ cos φ , which becomes q sin θ afteraveraging over φ . The kernel is thus (cid:16) − (ˆ k · ˆ q ) (cid:17) (cid:18) q − q (cid:19) = 23 [ P ( µ ) − P ( µ )] (cid:18) q − q (cid:19) , (3.11)where µ ≡ ˆ k · ˆ q . There are 4 terms in this case: A − , = 2 / , A − , = − / , A , − = − / , A , − = 2 / . In Figure 3, we show the

FAST-PT result of S ( k ) integral (Eq. 3.9) and the fractionaldiﬀerence from a conventional method. The plot shows excellent agreement between twomethods with accuracy better than × − up to k = 10 h/ Mpc. It is possible to write the integral S ( k ) in other forms without breaking the q ↔ q symmetry, e.g. towrite the kernel as (cid:0) − µ (cid:1) (cid:16) q − q + q q (cid:17) . However, the q − terms suﬀer from divergence at small q (see§2.3). The divergence is artiﬁcial because − µ → when q → , which makes physical sense, but it cancause instability in the FAST-PT code. – 15 – .3 Kinetic polarization of the CMB3.3.1 Theory

The kSZ eﬀect can induce a secondary linear polarization in the CMB via the quadraticDoppler eﬀect and Thomson scattering [65, 66]. Due to the motion of baryons, an isotropicCMB appears to have a quadrupole anisotropy component in the rest frame of the scatteringbaryons, as seen from the expansion

Θ = (cid:113) − v b − ˆ n · v b − (cid:39) ˆ n · v b + ( ˆ n · v b ) − v b , (3.12)where Θ is the fractional temperature ﬂuctuation of CMB in the direction of ˆ n as seen bythe scattering electron. The relation between the quadrupole anisotropy at position x andthe CMB temperature angular distribution seen by the scatter is given by Q ( m ) ( x ) = − (cid:90) d Ω Y ∗ m ( ˆ n ) √ π Θ( x , ˆ n ) , (3.13)where m = 0 , ± , ± . In the Rayleigh-Jeans limit, the observed power spectra of E - and B -mode polarizations are related to the power spectra of Q (0 , ± and Q ( ± , respectively, by C EE(cid:96) = 3 π (cid:96) (cid:90) dw g D A (cid:32)

34 ∆ Q ( k ) + 18 (cid:88) m = ± ∆ m ) Q ( k ) (cid:33) and C BB(cid:96) = 3 π (cid:96) (cid:90) dw g D A (cid:32) (cid:88) m = ± ∆ m ) Q ( k ) (cid:33) , (3.14)where ∆ m ) Q ( k ) = k P ( m ) ( k ) / (2 π ) is the variance of Q ( m ) per unit range in ln k , the spheri-cal harmonics in Eq. (3.13) are evaluated with k on the z -axis, g is the visibility function, andthe comoving angular distance D A = w (the comoving distance) in a ﬂat Universe. Since thequadrupole anisotropy arises from the quadratic Doppler eﬀect, in Fourier space with ˆ k = ˆ z ,we have Q ( m ) ( k ) = − (cid:90) d Ω Y ∗ m ( ˆ n ) √ π (cid:90) d q (2 π ) ˆ n · v b ( q ) ˆ n · v b ( q ) , (3.15)where v b is the baryon bulk velocity. In linear theory ˙ δ = − ∇ · v b a = f Hδ , (3.16)where f ≡ d ln G/d ln a for growth factor G and scale factor a . Taking the Fourier transformand assuming no vorticity, we obtain v b ( k ) = iaf H δ ( k ) k ˆ k ≡ iT δ ( k ) k ˆ k . (3.17) This limit is necessary to justify saying that temperature is scattered – really it is the intensity, butat low frequencies the two are proportional. As noted in Ref. [65], the kinetic polarization has a speciﬁcnon-blackbody spectral shape, which can be used to scale from the Rayleigh-Jeans limit to any frequency ofinterest. – 16 –ubstituting Eq. (3.17) into Eq. (3.15) and applying identities (A.1, A.11), we have Q ( m ) ( k ) = T √ π (cid:90) d q (2 π ) δ ( q ) δ ( q ) q q (cid:90) d Ω Y ∗ m ( ˆ n ) ( ˆ n · ˆ q ) ( ˆ n · ˆ q )= T √ π (cid:90) d q (2 π ) δ ( q ) δ ( q ) q q (cid:90) d Ω Y ∗ m ( ˆ n ) (cid:18) π (cid:19) (cid:88) m m Y m ( ˆ q ) Y ∗ m ( ˆ n ) Y m ( ˆ q ) Y ∗ m ( ˆ n )= T √ π (cid:90) d q (2 π ) δ ( q ) δ ( q ) q q (cid:18) π (cid:19) (cid:88) m m Y m ( ˆ q ) Y m ( ˆ q ) (cid:114) π (cid:18) (cid:19) (cid:18) m m m (cid:19) = T √ π (cid:18) π (cid:19) (cid:114) π (cid:90) d q (2 π ) δ ( q ) δ ( q ) q q (cid:88) m m Y m ( ˆ q ) Y m ( ˆ q ) (cid:18) m m m (cid:19) . (3.18)Following the deﬁnition that (cid:104) Q ( m ) ( k ) Q ( m ) ( k (cid:48) ) (cid:105) = (2 π ) P Q ( m ) ( k ) δ D ( k + k (cid:48) ) , we have P ( m ) ( k ) ≡ P Q ( m ) ( k )4(4 π ) T = (cid:90) d q (2 π ) P lin ( q ) P lin ( q ) q q (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:88) m m Y m ( ˆ q ) Y m ( ˆ q ) (cid:18) m m m (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) , (3.19)which is a tensor convolution integral in the form of Eq. (2.2). FAST-PT

Format

Since k (cid:107) ˆ z , the kernels for each m can be written in terms of µ, µ , µ . Note that m , m can only be 0 or ± , so we can explicitly write down all the spherical harmonics and Wigner j symbols in the summation and transform to Legendre polynomial products as before: m = 0 : q − q − π [2 + 6 P ( µ ) + P ( µ ) + 6 P ( µ ) P ( µ ) − P ( µ ) P ( µ ) P ( µ )] ,m = ± q − q − π [1 − P ( µ ) + 9 P ( µ ) P ( µ ) P ( µ ) − P ( µ ) P ( µ )] ,m = ± q − q − π [1 − P ( µ ) + P ( µ ) P ( µ )] . (3.20)Note that the symmetry between µ and µ has been used to simplify the kernels. Thecoeﬃcients A αβ(cid:96) (cid:96) (cid:96) are now trivially seen.In Figure 4, we show the FAST-PT result of P ( m ) ( k ) integrals (Eq. 3.19) for m = 0 , ± , ± ,respectively, and the fractional diﬀerence from a conventional method. The plots show excel-lent agreement between two methods with accuracy better than × − in the k range from . to h/ Mpc.

Cosmological surveys map large-scale structure in three dimensions, using galaxies or otherluminous tracers of the total matter distribution (e.g. [1–5]). To determine distance along theline-of-sight, surveys typically use redshift information and are thus actually making a mapin “redshift space.” In order to compare theory to galaxy redshift survey data, models mustbe translated into redshift space. – 17 – − − − − P ( m ) ( k ) [ M p c / h ] P ( ) ( k ) k [ h /Mpc] − × − − × − − × − × − × − × − fractional difference P ( ± ) ( k ) k [ h /Mpc] P ( ± ) ( k ) k [ h /Mpc] Figure 4 . The

FAST-PT results for the kinectic CMB polarization integrals P ( m ) ( k ) in Eq. (3.19)(upper panels) and the fractional diﬀerence compared to the conventional method (lower panels). Tracers tend to infall towards overdense regions and, due to the Doppler eﬀect, will thushave observed redshifts that deviate from those predicted by pure cosmological expansion.These deviations cause “redshift-space distortions” (RSDs) in the observed tracer distribution.Although at highly nonlinear scales RSDs are no longer well-described by perturbation theory, e.g. the “Fingers of God” (FoG) eﬀect [67], we can still explore the mildly nonlinear regimevia perturbation theory, avoiding time-consuming numerical simulations.The “textbook” model for linear RSDs, the Kaiser eﬀect [13], relates the matter powerspectrum in redshift space matter to that in real space matter with an angular-dependentbias factor related to the growth rate of structure. Subsequently, [14] improved the Kaisermodel by distinguishing P δθ and P θθ from P δδ , where θ is the divergence of velocity ﬁeld. Inthe linear regime of standard perturbation theory, these three power spectra are equal to eachother.The TNS model [15] accounts for the nonlinear mode coupling between density andvelocity ﬁelds, improving the modeling of the matter power spectrum in redshift space acrossa range of scales (including the BAO scale). Fixing k along the ˆ z direction, and deﬁning θ n as the angle between ˆ n (the line-of-sight direction) and k , with µ n ≡ cos θ n , the densitypower spectrum in the redshift space can be written: P ( S ) ( k, µ n ) = D FoG [ kµ n f σ v ] (cid:8) P δδ ( k ) + 2 f µ n P δθ ( k ) + f µ n P θθ ( k ) + A ( k, µ n ) + B ( k, µ n ) (cid:9) , (3.21)where D FoG [ kµ n f σ v ] encapsulates the contribution from the FoG eﬀect. The A, B terms are– 18 –ensor convolution integrals given by ¯ A ( k, µ n ) ≡ A ( k, µ n ) kµ n f = (cid:90) d q (2 π ) q n q [ B σ ( q , q , − k ) − B σ ( q , k , − k − q )] , (3.22) ¯ B ( k, µ n ) ≡ B ( k, µ n )( kµ n f ) = (cid:90) d q (2 π ) F ( q ) F ( q ) , (3.23)where k = q + q , and the subscript “ n ” denotes the projection onto ˆ n , e.g. q n ≡ q · ˆ n , and F ( q ) = q n q (cid:18) P δθ ( q ) + f q n q P θθ ( q ) (cid:19) . (3.24)The cross bispectra B σ is deﬁned by (cid:28) θ ( k ) (cid:20) δ ( k ) + f k n k θ ( k ) (cid:21) (cid:20) δ ( k ) + f k n k θ ( k ) (cid:21)(cid:29) = (2 π ) δ D ( k + k + k ) B σ ( k , k , k ) . (3.25)The convolution integrals ¯ A ( k, µ n ) and ¯ B ( k, µ n ) are particularly time-consuming ( e.g. [68])and are ideal applications for our algorithm. ¯ B Term

Substituting the F ( q ) kernel into the ¯ B ( k, µ n ) integral, we obtain ¯ B ( k, µ n ) = (cid:90) d q (2 π ) q n q n q q (cid:20) P δθ ( q ) P δθ ( q ) + f q n q n q q P θθ ( q ) P θθ ( q ) + 2 f q n q P δθ ( q ) P θθ ( q ) (cid:21) = (cid:90) d q (2 π ) ( ˆ q · ˆ n )( ˆ q · ˆ n ) q q P lin ( q ) P lin ( q ) + f (cid:90) d q (2 π ) ( ˆ q · ˆ n ) ( ˆ q · ˆ n ) q q P lin ( q ) P lin ( q )+ 2 f (cid:90) d q (2 π ) ( ˆ q · ˆ n )( ˆ q · ˆ n ) q q P lin ( q ) P lin ( q ) . (3.26)As previously mentioned, P δθ , P θθ , P δδ are all equal to P lin at the leading order. Since termsin the form of ( ˆ q · ˆ n ) p ( ˆ q · ˆ n ) p with non-negative integers p , p can always be decomposedas a polynomial in terms of ˆ k · ˆ n after longitude angle averaging (see Appendix C for a proof),it is natural to write ¯ B as ¯ B ( k, µ n ) = (cid:88) i =0 B i ( k ) µ in , (3.27)where each B i ( k ) is a tensor convolution integral that can be written in terms of products ofLegendre polynomials. ¯ A Term

The cross bispectrum satisﬁes B σ ( k , k , k ) = B σ ( − k , − k , − k ) = B σ ( k , k , k ) , so wecan write the ¯ A integral as ¯ A ( k, µ n ) = (cid:90) d q (2 π ) q n q [ B σ ( q , q , − k ) − B σ ( − q , k + q , − k )] . (3.28)Changing the dummy variable q to − q in the second term, we have ¯ A ( k, µ n ) = (cid:90) d q (2 π ) q n q [ B σ ( q , q , − k ) + B σ ( q , k − q , − k )] = 2 (cid:90) d q (2 π ) q n q B σ ( q , q , − k ) . (3.29)– 19 – (cid:96) (cid:96) A αβ(cid:96) (cid:96) (cid:96) i = 0 i = 2 i = 4 i = 6 B i − / − f / − f /

20 3 / + f /

20 3 f / − f /

20 131 f / f / + f / − f − f / f / − f /

10 47 f / − f /

20 21 f / − f /

20 231 f / (1+ f ) / + f / − / + f / − f / − f / − f / − f / − f / f + f / − f / + f / − f / f / − f /

12 53 f / − f / Table 2 . The coeﬃcient of each term in the Legendre polynomial expansion of kernels of B i ( k ) . α = β = − for all the terms. Due to symmetry, we need only keep terms with (cid:96) ≥ (cid:96) (multiplyingthe value by two where relevant). Empty entries are equal to the previous row. Expanding the left-hand side of Eq. (3.25) to the leading order, we have B σ ( q , q , − k ) =2 (cid:18) q n q f (cid:19) (cid:18) k n k f (cid:19) G ( q , − k ) P lin ( q ) P lin ( k )+ 2 (cid:18) k n k f (cid:19) (cid:18) F ( q , − k ) + q n q f G ( q , − k ) (cid:19) P lin ( q ) P lin ( k )+ 2 (cid:18) q n q f (cid:19) (cid:18) F ( q , q ) + k n k f G ( q , q ) (cid:19) P lin ( q ) P lin ( q ) . (3.30)Similarly, we can expand the integral ¯ A as a polynomial in terms of µ n : ¯ A ( k, µ n ) = (cid:88) i =0 A i ( k ) µ in . (3.31)Each A i ( k ) can be separated into two parts: A i ( k ) = A I i ( k ) + A II i ( k ) , (3.32)where A I i ( k ) is a convolution integral with P lin ( q ) P lin ( q ) in the integrand, while A II i ( k ) hasan integrand with P lin ( q ) P lin ( k ) or P lin ( q ) P lin ( k ) , which is similar to the P integral andcan be treated in a similar fashion. FAST-PT

Format

The B i ( k ) and A I i ( k ) integrals are standard convolution integrals, which can be decomposedinto the form of Eq. (2.2). The associated coeﬃcients A αβ(cid:96) (cid:96) (cid:96) are listed in Tables 2 and 3.The A II i ( k ) integrals are ﬁrst decomposed into the form of P αβγ(cid:96) (cid:96) (cid:96) ( k ) = (cid:90) d q (2 π ) q α q β k γ P (cid:96) ( ˆ q · ˆ k ) P (cid:96) ( ˆ q · ˆ k ) P (cid:96) ( ˆ q · ˆ q ) P lin ( q ) P lin ( k ) , (3.33)with coeﬃcients A αβγ(cid:96) (cid:96) (cid:96) given by Table 4, so that for each A II i integral, A II i ( k ) = (cid:88) A αβγ(cid:96) (cid:96) (cid:96) P αβγ(cid:96) (cid:96) (cid:96) ( k ) . (3.34)Note that for P lin ( q ) P lin ( k ) terms one can always exchange the indices (1 ↔

2) of q and (cid:96) inthe integrand to recover the form above. For the special case that β = (cid:96) = (cid:96) = 0 and (cid:96) (cid:54) = 0 ,– 20 – β (cid:96) (cid:96) (cid:96) A αβ(cid:96) (cid:96) (cid:96) i = 1 i = 3 i = 5 A I i − / + f / f / + f / f / − f /

21 340 f / − f /

21 260 f / f / − f / + f / − f / − f f / − f f / / + f / f / + f / − f / − f /

21 80 f / − f /

21 160 f / f / − f / + f / − f / − f / − f / + f / − f / f / f / − f f / − f f / f / − f / + f / − f / Table 3 . The coeﬃcient of each term in the Legendre polynomial expansion of kernels of A I i ( k ) . Theempty entries mean that they equal to the previous row. the integral vanishes. These P -like integrals can be further reduced to one-dimensionalintegrals and quickly calculated using discrete convolutions as done for P in [46]. P αβγ(cid:96) (cid:96) (cid:96) ( k ) = k γ P lin ( k ) (cid:90) d q (2 π ) q α q β P (cid:96) (cid:18) k − q µ q (cid:19) P (cid:96) ( µ ) P (cid:96) (cid:18) kµ − q q (cid:19) P lin ( q )= k γ P lin ( k )(2 π ) (cid:90) ∞ dq q α P lin ( q ) (cid:90) − dµ q β P (cid:96) (cid:18) k − q µ q (cid:19) P (cid:96) ( µ ) P (cid:96) (cid:18) kµ − q q (cid:19) , (3.35)where q = (cid:112) k + q − kq µ . The angular ( µ ) integration can be performed analytically. Summing the components, we ﬁnd: A II i ( k ) = k P lin ( k )672 π (cid:90) ∞ drP lin ( kr ) Z i ( r ) , i = 1 , , , (3.36)where Z ( r ) = 18 fr − − f + (192 − f ) r − (72 − f ) r + (cid:20) fr + 36(1 − f ) r − − f ) r + 36(3 − f ) r − − f ) r (cid:21) ln (cid:12)(cid:12)(cid:12)(cid:12) − r r (cid:12)(cid:12)(cid:12)(cid:12) , (3.37) Z ( r ) = 18 f (1 + f ) r − f − f + (318 f − f ) r − (126 f − f ) r + (cid:20) f (1 + f ) r + 36 f (1 − f ) r − f (3 − f ) r + 36 f (5 − f ) r − f (7 − f ) r (cid:21) ln (cid:12)(cid:12)(cid:12)(cid:12) − r r (cid:12)(cid:12)(cid:12)(cid:12) ,Z ( r ) = 18 f r − f + 126 f r − f r + (cid:20) f r − f r + 72 f r − f r (cid:21) ln (cid:12)(cid:12)(cid:12)(cid:12) − r r (cid:12)(cid:12)(cid:12)(cid:12) . There are several ways to do this; a brute-force approach is to write µ in terms of q (at ﬁxed k and q ),which turns the integral into a linear combination of power laws in q . – 21 –he integral (3.36) is a convolution. Upon making the substitution r = e − s , Eq. (3.36)becomes A II i ( k ) = k P lin ( k )672 π (cid:90) ∞−∞ ds e − s P lin ( e log k − s ) Z i ( e − s )= k P lin ( k )672 π (cid:90) ∞−∞ ds G i ( s ) F (log k − s ) , (3.38)where G i ( s ) ≡ e − s Z i ( e − s ) and F ( s ) ≡ P lin ( e s ) . We can convert to a discrete convolutionwith the substitutions ds → ∆ , log k n = log k + n ∆ , and s m = log k + m ∆ (where k is thesmallest value in the k array): (cid:90) ∞−∞ ds G i ( s ) F (log k − s ) → ∆ N − (cid:88) m =0 G Di ( m ) F D ( n − m ) , (3.39)where in the ﬁnal line we deﬁne the discrete functions G Di ( m ) ≡ G i ( s m ) and F D ( m ) ≡ F ( m ∆) . We then have A II i ( k n ) = k n P lin ( k n )∆672 π [ G Di ⊗ F D ][ n ] , i = 1 , , . (3.40)Thus A II i ( k ) , which at ﬁrst appears to involve order N steps (an integral over N samples ateach of N output values k n ), can in fact be computed for all output k n in O ( N log N ) steps .Note that some integrals P αβγ(cid:96) (cid:96) (cid:96) ( k ) suﬀer from a divergence due to contributions fromsmall-scales. When summing to get A II i , the divergent parts cancel each other precisely.Taking q to be large, so that q → − q and P lin ( q ) ∝ q − , we have P αβγ(cid:96) (cid:96) (cid:96) ( k ) → ( − (cid:96) + (cid:96) δ (cid:96) (cid:96) k γ P lin ( k )(2 (cid:96) + 1)2 π (cid:90) dq q α + β P lin ( q ) ∝ (cid:90) dq q α + β − , (3.41)so that the divergence appears when (cid:96) = (cid:96) and α + β ≥ . In Table 4 there are 5 terms thatsuﬀer from this divergence problem. However these divergences cancel in A II i ; in our case, thecancellation occurs when doing the sum over ( α, β, γ, (cid:96) , (cid:96) , (cid:96) ) to derive Z i ( r ) .In Figures 5, we show the FAST-PT results of A + B terms in the TNS model (Eq. 3.21)for f = 1 and µ n = 0 . , . , . , respectively, as well as the fractional diﬀerence comparedto our conventional method. The plots show excellent agreement between two methods withaccuracy at the − level for most of the k range from . to h/ Mpc. Note that theindividual A and B terms agree to signiﬁcantly higher precision ( ∼ − ). Cancellationsamong terms in the total A + B amplify the fractional diﬀerence, especially at high k andnear the zero-crossing. In this paper we have extended the

FAST-PT algorithm to treat 1-loop convolution integralswith tensor kernels (explicitly dependent on the direction of the observed mode). The general-ized algorithm has many applications – we have presented quadratic intrinsic alignments, the In principle, N is the size of the input k array. However, to suppress the possible ringing and alisingeﬀects, we need to apply appropiate window functions, zero-padding or extend the input power spectrum intoa larger range. The true value of N is usually a few times larger than the original value, depending on theuser’s inputs and options. – 22 – α β (cid:96) (cid:96) (cid:96) A αβγ(cid:96) (cid:96) (cid:96) i = 1 i = 3 i = 5 A II i − − f /

35 36 f / − f /

35 32 f / − f /

35 32 f / f / − f / + f / − f / f / − f / + f / − f / − / − f /

105 80 f / − f /

105 4 f / / − f /

147 1012 f / − f /

147 788 f / − f /

245 64 f / − f /

245 64 f / f / − f / + f / − f / f / − f / + f / − f / − − / − f / − f / f / − f / + f / − f / f / − f / + f / − f / − f f − f f − − f / − f + f / − f f / − f + f / − f − f / f / − f / f / − f / f / − f / f / − − / − f / − f / f / − f / + f / − f / f / − f / + f / − f / − f f − f f − − − f / − f + f / − f f / − f + f / − f − f / f / − f / f / − f / f / − f / f / Table 4 . The coeﬃcient of each term in the Legendre polynomial expansion of kernels of A II i ( k ) . Ostriker-Vishniac eﬀect, kinetic CMB polarizations, and a sophisticated model for redshiftspace distortions. Our algorithm and code achieve high precision for all of these applications.We have tested the output of the code to high wavenumber ( k = 10 h/ Mpc), although wereiterate that the smaller scales considered are beyond the range of validity of the underlyingperturbative models. The reduction in evaluation time is similar as for the scalar

FAST-PT .For instance, execution time is ∼ . seconds for 600 k values in all our examples. In theresults shown here, the input power spectrum was sampled at 100 points per log interval.We ﬁnd that much of the noise (in comparisons with the conventional method) is driven bythe exact process by which the CAMB power spectrum is interpolated before it is used in FAST-PT .There are underlying physical concepts and symmetries that make the eﬃciency of thisalgorithm possible. For example, the locality of the gravitational interactions allows us toseparate diﬀerent modes in conﬁguration space. Since the structure evolution under gravityonly depends on the local density and velocity divergence ﬁelds, in Fourier space the 1-looppower spectra of the matter density as well as its tracers (assuming local biasing theories)must be in form of Eq. (1.1), where the kernels can always be written in terms of dot productsof diﬀerent mode vectors. Without this locality, it may not be possible to write the desired– 23 – − − A + B [ M p c / h ] f = . µ n = . A + B − ( A + B ) k [ h /Mpc] − × − − × − × − × − fractional difference f = . µ n = . k [ h /Mpc] f = . µ n = . k [ h /Mpc] Figure 5 . The

FAST-PT result for the redshift space distortion nonlinear corrections A ( k, µ n ) + B ( k, µ n ) in the TNS model, Eq. (3.21) (upper panels) and the fractional diﬀerence compared to theconventional method result (lower panels). power spectrum as a sum of terms that can be calculated with this algorithm. The scaleinvariance of the problem also indicates that we should decompose the input power spectruminto a set of power-law spectra and make full use of the FFT algorithm. There are alsorotational symmetries that allow us to reduce the 3-dimensional integrals to 1-dimension.This algorithm, and implementations of the examples presented here, are publicly avail-able as a Python code package at https://github.com/JoeMcEwen/FAST-PT . Acknowledgments

XF is supported by the Simons Foundation, JB is supported by a CCAPP Fellowship, JM issupported by NSF grant AST1516997, and CH by the Simons Foundation, the US Departmentof Energy, the Packard Foundation, and NASA.

References [1]

DESI collaboration, M. Levi et al.,

The DESI Experiment, a whitepaper for Snowmass 2013 , .[2] K. S. Dawson, D. J. Schlegel, C. P. Ahn, S. F. Anderson, É. Aubourg, S. Bailey et al., TheBaryon Oscillation Spectroscopic Survey of SDSS-III , Astron. J. (2013) 10, [ ].[3] R. Laureijs, J. Amiaux, S. Arduini, J. . Auguères, J. Brinchmann, R. Cole et al.,

EuclidDeﬁnition Study Report , ArXiv e-prints (Oct., 2011) , [ ]. – 24 –

4] D. Spergel, N. Gehrels, J. Breckinridge, M. Donahue, A. Dressler, B. S. Gaudi et al.,

Wide-Field InfraRed Survey Telescope-Astrophysics Focused Telescope Assets WFIRST-AFTAFinal Report , ArXiv e-prints (2013) , [ ].[5]

DES collaboration, T. Abbott et al.,

The Dark Energy Survey: more than dark energy - anoverview , Mon. Not. Roy. Astron. Soc. (2016) , [ ].[6] M. Bartelmann and P. Schneider,

Weak gravitational lensing , Phys. Rept. (2001) 291–472,[ astro-ph/9912508 ].[7] Y. Mellier,

Probing the universe with weak lensing , Ann. Rev. Astron. Astrophys. (1999)127–189, [ astro-ph/9812172 ].[8] SDSS collaboration, U. Seljak, A. Makarov, R. Mandelbaum, C. M. Hirata, N. Padmanabhan,P. McDonald et al.,

SDSS galaxy bias from halo mass-bias relation and its cosmologicalimplications , Phys. Rev.

D71 (2005) 043511, [ astro-ph/0406594 ].[9] P. McDonald,

Clustering of dark matter tracers: Renormalizing the bias parameters , Phys. Rev.

D74 (2006) 103512, [ astro-ph/0609413 ].[10] P. McDonald and A. Roy,

Clustering of dark matter tracers: generalizing bias for the comingera of precision LSS , J. Cosmo. Astropart. Phys. (Aug., 2009) 020, [ ].[11] T. Baldauf, U. Seljak, L. Senatore and M. Zaldarriaga, Galaxy Bias and non-Linear StructureFormation in General Relativity , JCAP (2011) 031, [ ].[12] U. Seljak,

Bias, redshift space distortions and primordial nongaussianity of nonlineartransformations: application to Ly- α forest , J. Cosmo. Astropart. Phys. (Mar., 2012) 004,[ ].[13] N. Kaiser, Clustering in real space and in redshift space , Mon. Not. R. Astron. Soc. (1987)1–21.[14] R. Scoccimarro,

Redshift-space distortions, pairwise velocities and nonlinearities , Phys. Rev.

D70 (2004) 083007, [ astro-ph/0407214 ].[15] A. Taruya, T. Nishimichi and S. Saito,

Baryon Acoustic Oscillations in 2D: ModelingRedshift-space Power Spectrum from Perturbation Theory , Phys. Rev.

D82 (2010) 063522,[ ].[16] I. A. Strukov, A. A. Brukhanov, D. P. Skulachev and M. V. Sazhin,

Anisotropy of themicrowave background radiation , Soviet Astronomy Letters (1992) 153.[17] G. F. Smoot, C. Bennett, A. Kogut, E. Wright, J. Aymon et al., Structure in the COBEdiﬀerential microwave radiometer ﬁrst year maps , Astrophys.J. (1992) L1–L5.[18] J. Kovac, E. Leitch, C. Pryke, J. Carlstrom, N. Halverson et al.,

Detection of polarization in thecosmic microwave background using DASI , Nature (2002) 772–787, [ astro-ph/0209478 ].[19] A. Readhead, S. Myers, T. J. Pearson, J. Sievers, B. Mason et al.,

Polarization observationswith the Cosmic Background Imager , Science (2004) 836, [ astro-ph/0409569 ].[20]

WMAP collaboration, C. Bennett et al.,

Nine-Year Wilkinson Microwave Anisotropy Probe(WMAP) Observations: Final Maps and Results , Astrophys.J.Suppl. (2013) 20,[ ].[21]

SPT collaboration, A. T. Crites et al.,

Measurements of E-Mode Polarization andTemperature-E-Mode Correlation in the Cosmic Microwave Background from 100 SquareDegrees of SPTpol Data , Astrophys. J. (2015) 36, [ ].[22]

ACTPol collaboration, S. Naess et al.,

The Atacama Cosmology Telescope: CMB Polarizationat < (cid:96) < , JCAP (2014) 007, [ ]. – 25 – POLARBEAR collaboration, P. A. R. Ade et al.,

Evidence for Gravitational Lensing of theCosmic Microwave Background Polarization from Cross-correlation with the Cosmic InfraredBackground , Phys. Rev. Lett. (2014) 131302, [ ].[24]

BICEP2, Planck collaboration, P. A. R. Ade et al.,

Joint Analysis of BICEP2/

KeckArray and

P lanck

Data , Phys. Rev. Lett. (2015) 101301, [ ].[25]

Planck collaboration, R. Adam et al.,

Planck 2015 results. I. Overview of products andscientiﬁc results , .[26] A. Kogut, D. J. Fixsen, D. T. Chuss, J. Dotson, E. Dwek, M. Halpern et al., The PrimordialInﬂation Explorer (PIXIE): a nulling polarimeter for cosmic microwave backgroundobservations , J. Cosmo. Astropart. Phys. (2011) 025, [ ].[27] J. Bock, A. Aljabri, A. Amblard, D. Baumann, M. Betoule, T. Chui et al., Study of theExperimental Probe of Inﬂationary Cosmology (EPIC)-Intemediate Mission for NASA’sEinstein Inﬂation Probe , ArXiv e-prints (2009) , [ ].[28] J. Lazear, P. A. R. Ade, D. Benford, C. L. Bennett, D. T. Chuss, J. L. Dotson et al.,

ThePrimordial Inﬂation Polarization Explorer (PIPER) , in

Millimeter, Submillimeter, andFar-Infrared Detectors and Instrumentation for Astronomy VII , vol. 9153, p. 91531L, 2014. . DOI.[29]

PRISM collaboration, P. Andre et al.,

PRISM (Polarized Radiation Imaging and SpectroscopyMission): A White Paper on the Ultimate Polarimetric Spectro-Imaging of the Microwave andFar-Infrared Sky , .[30] PRISM collaboration, P. Andre et al.,

PRISM (Polarized Radiation Imaging and SpectroscopyMission): An Extended White Paper , JCAP (2014) 006, [ ].[31] R. A. Sunyaev and Y. B. Zeldovich,

The Observations of Relic Radiation as a Test of theNature of X-Ray Radiation from the Clusters of Galaxies , Comments on Astrophysics andSpace Physics (Nov., 1972) 173.[32] J. E. Carlstrom, G. P. Holder and E. D. Reese, Cosmology with the Sunyaev-Zel’dovich eﬀect , Ann. Rev. Astron. Astrophys. (2002) 643–680, [ astro-ph/0208192 ].[33] J. Chluba and R. A. Sunyaev, The evolution of CMB spectral distortions in the early Universe , Mon. Not. Roy. Astron. Soc. (2012) 1294–1314, [ ].[34] R. Khatri and R. A. Sunyaev,

Beyond y and µ : the shape of the CMB spectral distortions inthe intermediate epoch, 1.5 × (cid:46) z (cid:46) × , JCAP (2012) 016, [ ].[35] F. Bernardeau, S. Colombi, E. Gaztanaga and R. Scoccimarro,

Large scale structure of theuniverse and cosmological perturbation theory , Phys. Rept. (2002) 1–248,[ astro-ph/0112551 ].[36] N. S. Sugiyama,

Using Lagrangian perturbation theory for precision cosmology , Astrophys. J. (2014) 63, [ ].[37] M. Crocce and R. Scoccimarro,

Renormalized cosmological perturbation theory , Phys. Rev.

D73 (2006) 063519, [ astro-ph/0509418 ].[38] P. McDonald,

Dark matter clustering: a simple renormalization group approach , Phys. Rev.

D75 (2007) 043514, [ astro-ph/0606028 ].[39] P. McDonald,

What the "simple renormalization group" approach to dark matter clusteringreally was , .[40] B. Audren and J. Lesgourgues, Non-linear matter power spectrum from Time RenormalisationGroup: eﬃcient computation and comparison with one-loop , J. Cosmo. Astropart. Phys. (2011) 037, [ ]. – 26 –

41] D. Baumann, A. Nicolis, L. Senatore and M. Zaldarriaga,

Cosmological Non-Linearities as anEﬀective Fluid , JCAP (2012) 051, [ ].[42] J. J. M. Carrasco, M. P. Hertzberg and L. Senatore,

The Eﬀective Field Theory ofCosmological Large Scale Structures , JHEP (2012) 082, [ ].[43] E. Pajer and M. Zaldarriaga, On the Renormalization of the Eﬀective Field Theory of LargeScale Structures , JCAP (2013) 037, [ ].[44] M. P. Hertzberg,

Eﬀective ﬁeld theory of dark matter and structure formation: Semianalyticalresults , Phys. Rev.

D89 (2014) 043521, [ ].[45] D. Blas, M. Garny, M. M. Ivanov and S. Sibiryakov,

Time-Sliced Perturbation Theory for LargeScale Structure I: General Formalism , JCAP (2016) 052, [ ].[46] J. E. McEwen, X. Fang, C. M. Hirata and J. A. Blazek,

FAST-PT: a novel algorithm tocalculate convolution integrals in cosmological perturbation theory , JCAP (2016) 015,[ ].[47] M. Schmittfull, Z. Vlah and P. McDonald,

Fast large scale structure perturbation theory usingone-dimensional fast Fourier transforms , Phys. Rev.

D93 (2016) 103528, [ ].[48] M. Schmittfull and Z. Vlah,

FFT-PT: Reducing the 2-loop large-scale structure power spectrumto one-dimensional, radial integrals , .[49] E. Jones, T. Oliphant, P. Peterson et al., SciPy: Open source scientiﬁc tools for Python , 2001–.[50] A. Lewis, A. Challinor and A. Lasenby,

Eﬃcient computation of CMB anisotropies in closedFRW models , Astrophys. J. (2000) 473–476, [ astro-ph/9911177 ].[51]

Planck collaboration, P. A. R. Ade et al.,

Planck 2015 results. XIII. Cosmological parameters , .[52] H. Hildebrandt et al., KiDS-450: Cosmological parameter constraints from tomographic weakgravitational lensing , .[53] M. A. Troxel and M. Ishak, The Intrinsic Alignment of Galaxies and its Impact on WeakGravitational Lensing in an Era of Precision Cosmology , Phys. Rept. (2014) 1–59,[ ].[54] B. Joachimi et al.,

Galaxy alignments: An overview , Space Sci. Rev. (2015) 1–65,[ ].[55] E. Krause, T. Eiﬂer and J. Blazek,

The impact of intrinsic alignment on current and futurecosmic shear surveys , Mon. Not. Roy. Astron. Soc. (2016) 207–222, [ ].[56] N. E. Chisari and C. Dvorkin,

Cosmological Information in the Intrinsic Alignments ofLuminous Red Galaxies , JCAP (2013) 029, [ ].[57] C. M. Hirata and U. Seljak,

Intrinsic alignment-lensing interference as a contaminant ofcosmic shear , Phys. Rev.

D70 (2004) 063526, [ astro-ph/0406275 ].[58] P. Catelan, M. Kamionkowski and R. D. Blandford,

Intrinsic and extrinsic galaxy alignment , Mon. Not. Roy. Astron. Soc. (2001) L7–L13, [ astro-ph/0005470 ].[59] J. Blazek, Z. Vlah and U. Seljak,

Tidal alignment of galaxies , JCAP (2015) 015,[ ].[60] J. Blazek, M. Troxel and N. MacCrann,

Intrinsic Alignment Modeling for Mixed GalaxyPopulations , in preparation .[61] P. McDonald and A. Roy, Clustering of dark matter tracers: generalizing bias for the comingera of precision LSS , J. Cosmo. Astropart. Phys. (2009) 020, [ ]. – 27 –

62] J. P. Ostriker and E. T. Vishniac,

Generation of microwave background ﬂuctuations fromnonlinear perturbations at the ERA of galaxy formation , Astrophys. J. Lett. (1986)L51–L54.[63] E. T. Vishniac,

Reionization and small-scale ﬂuctuations in the microwave background , Astrophys. J. (1987) 597–604.[64] A. H. Jaﬀe and M. Kamionkowski,

Calculation of the Ostriker-Vishniac eﬀect in cold darkmatter models , Phys. Rev.

D58 (1998) 043001, [ astro-ph/9801022 ].[65] R. A. Sunyaev and I. B. Zeldovich,

The velocity of clusters of galaxies relative to the microwavebackground - The possibility of its measurement , Mon. Not. R. Astron. Soc. (1980)413–420.[66] W. Hu,

Reionization revisited: secondary cmb anisotropies and polarization , Astrophys. J. (2000) 12, [ astro-ph/9907103 ].[67] J. C. Jackson,

Fingers of God: A critique of Rees’ theory of primoridal gravitational radiation , Mon. Not. Roy. Astron. Soc. (1972) 1P–5P, [ ].[68] B. Bose and K. Koyama,

A Perturbative Approach to the Redshift Space Power Spectrum:Beyond the Standard Model , .[69] M. Abramowitz and I. A. Stegun, Handbook of mathematical functions: with formulas, graphs,and mathematical tables . No. 55. Courier Corporation, 1964.[70] “NIST Digital Library of Mathematical Functions.” http://dlmf.nist.gov/, Release 1.0.10 of2015-08-07.[71] F. W. J. Olver, D. W. Lozier, R. F. Boisvert and C. W. Clark, eds.,

NIST Handbook ofMathematical Functions . Cambridge University Press, New York, NY, 2010.

A Mathematical Identities

In this work we have used a number of common mathematical identities. These identities areeasily found in any standard mathematical physics text or handbook, (e.g. [69–71]). However,to make our paper self-contained we list those relevant to our paper.

A.1 Spherical Harmonics and Legendre Polynomials • The addition theorem P (cid:96) ( ˆ q · ˆ q ) = 4 π (cid:96) + 1 (cid:96) (cid:88) m = − (cid:96) Y (cid:96)m ( ˆ q ) Y ∗ (cid:96)m ( ˆ q ); (A.1) • The special case thereof, (cid:96) (cid:88) m = − (cid:96) Y (cid:96)m ( ˆ q ) Y ∗ (cid:96)m ( ˆ q ) = 2 (cid:96) + 14 π ; (A.2) • The orthonormality relation (cid:90) S d ˆ q Y (cid:96)m ( ˆ q ) Y ∗ (cid:96) (cid:48) m (cid:48) ( ˆ q ) = δ (cid:96)(cid:96) (cid:48) δ mm (cid:48) ; (A.3)– 28 – The symmetry Y (cid:96)m ( ˆ q ) = ( − m Y ∗ (cid:96), − m ( ˆ q ); (A.4) • The expansion/decomposition of a plane wave: (cid:90) S d ˆ q Y ∗ (cid:96)m ( ˆ q ) e i q · r = 4 πi (cid:96) j (cid:96) ( qr ) Y ∗ (cid:96)m ( ˆ r ) ↔ e i q · r = 4 π (cid:88) (cid:96) i (cid:96) j (cid:96) ( qr ) (cid:96) (cid:88) m = − (cid:96) Y ∗ (cid:96)m ( ˆ q ) Y (cid:96)m ( ˆ r ) . (A.5) A.2 Wigner j and j Symbols

The deﬁnitions of Wigner j and j symbols, denoted by ( ) and { }, respectively, are long andcan be easily found online or in handbooks. Here we only list some properties and identitiesneeded in our derivations. • Assuming j , j , j satisfy the triangle conditions, we have the special case (cid:18) j j j (cid:19) =  , J odd , ( − J/ (cid:16) ( J − j )! ( J − j )! ( J − j )!( J +1)! (cid:17) / ( J ) ! ( J − j ) ! ( J − j ) ! ( J − j ) ! , J even , (A.6)where J ≡ j + j + j ; • The permutation and reﬂection symmetry (cid:18) j j j m m m (cid:19) = (cid:18) j j j m m m (cid:19) = (cid:18) j j j m m m (cid:19) , (cid:18) j j j m m m (cid:19) = ( − j + j + j (cid:18) j j j m m m (cid:19) ; (A.7) (cid:18) j j j m m m (cid:19) = ( − j + j + j (cid:18) j j j − m − m − m (cid:19) ; (A.8) • The orthogonality relation (cid:88) m m (2 j + 1) (cid:18) j j j m m m (cid:19) (cid:18) j j j (cid:48) m m m (cid:48) (cid:19) = δ j ,j (cid:48) δ m ,m (cid:48) , (A.9) • Relation to spherical harmonics Y (cid:96) m ( ˆ q ) Y (cid:96) m ( ˆ q ) = (cid:88) (cid:96),m (cid:114) (2 (cid:96) + 1)(2 (cid:96) + 1)(2 (cid:96) + 1)4 π (cid:18) (cid:96) (cid:96) (cid:96)m m m (cid:19) Y ∗ (cid:96)m ( ˆ q ) (cid:18) (cid:96) (cid:96) (cid:96) (cid:19) ; (A.10) (cid:90) d ˆ q Y (cid:96) m ( ˆ q ) Y (cid:96) m ( ˆ q ) Y (cid:96) m ( ˆ q ) = (cid:114) (2 (cid:96) + 1)(2 (cid:96) + 1)(2 (cid:96) + 1)4 π (cid:18) (cid:96) (cid:96) (cid:96) (cid:19) (cid:18) (cid:96) (cid:96) (cid:96) m m m (cid:19) ; (A.11)– 29 – j j j m m m (cid:19) (cid:26) j j j (cid:96) (cid:96) (cid:96) (cid:27) = (cid:88) m (cid:48) m (cid:48) m (cid:48) ( − (cid:96) + (cid:96) + (cid:96) + m (cid:48) + m (cid:48) + m (cid:48) × (cid:18) j (cid:96) (cid:96) m m (cid:48) − m (cid:48) (cid:19) (cid:18) (cid:96) j (cid:96) − m (cid:48) m m (cid:48) (cid:19) (cid:18) (cid:96) (cid:96) j m (cid:48) − m (cid:48) m (cid:19) . (A.12) B Derivations

B.1 Derivation of Eq. (2.3)

Applying identities (A.1, A.4, A.7, A.8, A.10, A.12), we obtain P (cid:96) ( ˆ q · ˆ q ) P (cid:96) ( ˆ q · ˆ k ) P (cid:96) ( ˆ q · ˆ k )= (4 π ) (2 (cid:96) + 1)(2 (cid:96) + 1)(2 (cid:96) + 1) (cid:88) m,m ,m ( − m + m + m × Y (cid:96)m ( ˆ q ) Y (cid:96), − m ( ˆ q ) Y (cid:96) m ( ˆ q ) Y (cid:96) , − m (ˆ k ) Y (cid:96) m ( ˆ q ) Y (cid:96) , − m (ˆ k )=(4 π ) (cid:88) J ,J ,J k (cid:112) (2 J + 1)(2 J + 1)(2 J k + 1) (cid:18) (cid:96) (cid:96) J (cid:19) (cid:18) (cid:96) (cid:96) J (cid:19) (cid:18) (cid:96) (cid:96) J k (cid:19) × (cid:88) M ,M ,M k ( − M + M + M k Y J M ( ˆ q ) Y J M ( ˆ q ) Y JkMk (ˆ k ) × (cid:88) m,m ,m ( − m + m + m (cid:18) (cid:96) (cid:96) J m m − M (cid:19) (cid:18) (cid:96) (cid:96) J − m m − M (cid:19) (cid:18) (cid:96) (cid:96) J k − m − m − M k (cid:19) =(4 π ) / ( − (cid:96) + (cid:96) + (cid:96) × (cid:88) J ,J ,J k (cid:112) (2 J + 1)(2 J + 1)(2 J k + 1) (cid:18) J (cid:96) (cid:96) (cid:19) (cid:18) (cid:96) J (cid:96) (cid:19) (cid:18) (cid:96) (cid:96) J k (cid:19) (cid:26) J J J k (cid:96) (cid:96) (cid:96) (cid:27) × ( − J + J + J k (cid:88) M ,M ,M k Y J M ( ˆ q ) Y J M ( ˆ q ) Y JkMk (ˆ k ) (cid:18) J J J k M M M k (cid:19) , (B.1)where we can deﬁne a coeﬃcient C J J J k (cid:96) (cid:96) (cid:96) ≡ (4 π ) / ( − (cid:96) + (cid:96) + (cid:96) + J + J + J k × (cid:112) (2 J + 1)(2 J + 1)(2 J k + 1) (cid:18) J (cid:96) (cid:96) (cid:19) (cid:18) (cid:96) J (cid:96) (cid:19) (cid:18) (cid:96) (cid:96) J k (cid:19) (cid:26) J J J k (cid:96) (cid:96) (cid:96) (cid:27) , (B.2)Note that when we combine two spherical harmonics into one, the triangle conditions of the j symbols imply that M = m + m , M = − m + m , M k = − m − m , (B.3)so that M , M , M k satisfy M + M + M k = 0 . (B.4)According to the condition (2.26), we have J + J + J k = even , leading to ( − J J Jk = 1 .Hence, Eq. (2.3) is recovered. – 30 – .2 Derivation of Eq. (2.10) and (2.11) Applying Eqs. (A.10) and (A.9), we obtain (cid:88) M M (cid:18) J J J k M M M k (cid:19) Y J M ( ˆ r ) Y J M ( ˆ r )= (cid:88) M M (cid:18) J J J k M M M k (cid:19) (cid:88) (cid:96) (cid:48) m (cid:48) (cid:114) (2 J + 1)(2 J + 1)(2 (cid:96) (cid:48) + 1)4 π (cid:18) J J (cid:96) (cid:48) M M m (cid:48) (cid:19) Y ∗ (cid:96) (cid:48) m (cid:48) ( ˆ r ) (cid:18) J J (cid:96) (cid:48) (cid:19) = (cid:88) (cid:96) (cid:48) m (cid:48) (cid:115) (2 J + 1)(2 J + 1)4 π (2 (cid:96) (cid:48) + 1) Y ∗ (cid:96) (cid:48) m (cid:48) ( ˆ r ) (cid:18) J J (cid:96) (cid:48) (cid:19) (cid:88) M M (2 (cid:96) (cid:48) + 1) (cid:18) J J J k M M M k (cid:19) (cid:18) J J (cid:96) (cid:48) M M m (cid:48) (cid:19) = (cid:88) (cid:96) (cid:48) m (cid:48) (cid:115) (2 J + 1)(2 J + 1)4 π (2 (cid:96) (cid:48) + 1) Y ∗ (cid:96) (cid:48) m (cid:48) ( ˆ r ) (cid:18) J J (cid:96) (cid:48) (cid:19) δ (cid:96) (cid:48) Jk δ m (cid:48) Mk = (cid:115) (2 J + 1)(2 J + 1)4 π (2 J k + 1) Y ∗ JkMk ( ˆ r ) (cid:18) J J J k (cid:19) , (B.5)where we can deﬁne the coeﬃcient as a J J Jk ≡ (cid:115) (2 J + 1)(2 J + 1)4 π (2 J k + 1) (cid:18) J J J k (cid:19) . (B.6)Hence, Eqs. (2.10) and (2.11) are demonstrated. C Proof of Feasibility of Series Expansion

In this section we will prove the series expansion of ¯ A ( k, µ n ) and ¯ B ( k, µ n ) are feasible. Suppose p , p are non-negative integers, we want to show the following ﬁnite series expansion alwaysexists, D ( k, µ n ) ≡ (cid:90) d q (2 π ) ( ˆ q · ˆ n ) p ( ˆ q · ˆ n ) p = (cid:88) i =0 D i ( k ) µ in . (C.1)In spherical coordinates where the z − axis is chosen along ˆ k and ˆ n on the x − z plane, thekernel will be F ( p , p ) ≡ ( ˆ q · ˆ n ) p ( ˆ q · ˆ n ) p =(sin θ sin θ n cos φ + cos θ cos θ n ) p ( − sin θ sin θ n cos φ + cos θ cos θ n ) p = p (cid:88) r =0 (cid:18) p r (cid:19) sin r θ sin r θ n cos r φ cos p − r θ cos p − r θ n × p (cid:88) r =0 ( − r (cid:18) p r (cid:19) sin r θ sin r θ n cos r φ cos p − r θ θ cos p − r θ n = p (cid:88) r =0 p (cid:88) r =0 ( − r (cid:18) p r (cid:19)(cid:18) p r (cid:19) cos r + r φ sin r θ sin r θ sin r + r θ n cos p − r θ cos p − r θ × cos p + p − r − r θ n . (C.2)– 31 –veraging over the azimuthal angle φ , we are only left with terms with r + r = even , since (cid:104) cos m φ (cid:105) φ = 0 for odd integer m . The kernel then becomes (cid:104) F ( p , p ) (cid:105) = p (cid:88) r =0 p (cid:88) r =0 ( − r (cid:18) p r (cid:19)(cid:18) p r (cid:19) (cid:104) cos r + r φ (cid:105) sin r θ sin r θ cos p − r θ cos p − r θ × (cid:0) − µ n (cid:1) r r µ p + p − r − r n . (C.3)Since ( r + r ) / and p + p − r − r are both non-negative integers, we can futher expandit as a polynomial of µ n . Thus, the expansion (C.1) is always feasible.Furthermore, from Eq. (C.3) we obtain two properties of the expansion:1. The power of µ n goes up to p + p , so that the series is ﬁnite. And it goes as p + p − , p + p − , · · · , down to 0 or 1 depending on the parity of p + p .2. The part with cos p − r θ cos p − r θ can always be written as products of Legendrepolynomials of µ and µ . The only apparent problem comes from sin θ and sin θ .However, since r + r is even, r − r must be even as well. Suppose r ≥ r , thepotentially problematic term becomes: sin r θ sin r θ = (sin θ sin θ ) r sin r − r θ = (cos θ cos θ − ˆ q · ˆ q ) r (cid:0) − cos θ (cid:1) r − r , (C.4)so that each term can be written in terms of the products of cos θ , cos θ and ˆ q · ˆ q2