[PDF] A convolutional-neural-network estimator of CMB constraints on dark matter energy injection

Abstract

We show that the impact of energy injection by dark matter annihilation on the cosmic microwave background power spectra can be apprehended via a residual likelihood map. By resorting to convolutional neural networks that can fully discover the underlying pattern of the map, we propose a novel way of constraining dark matter annihilation based on the Planck 2018 data. We demonstrate that the trained neural network can efficiently predict the likelihood and accurately place bounds on the annihilation cross-section in a \textit{model-independent} fashion. The machinery will be made public in the near future.

Full PDF

AA convolutional-neural-network estimator of CMB constraints ondark matter energy injection

Wei-Chih Huang, ∗ Jui-Lin Kuo, † and Yue-Lin Sming Tsai

3, 4, ‡ CP -Origins, University of Southern Denmark,Campusvej 55 5230 Odense M, Denmark Institute of High Energy Physics, Austrian Academy of Sciences,Nikolsdorfergasse 18, 1050 Vienna, Austria Key Laboratory of Dark Matter and Space Astronomy,Purple Mountain Observatory, Chinese Academy of Sciences, Nanjing 210033, China Department of Physics, National Tsing Hua University, Hsinchu 300, Taiwan (Dated: January 27, 2021)

Abstract

We show that the impact of energy injection by dark matter annihilation on the cosmic mi-crowave background power spectra can be apprehended via a residual likelihood map. By resortingto convolutional neural networks that can fully discover the underlying pattern of the map, wepropose a novel way of constraining dark matter annihilation based on the Planck 2018 data. Wedemonstrate that the trained neural network can eﬃciently predict the likelihood and accuratelyplace bounds on the annihilation cross-section in a model-independent fashion. The machinery willbe made public in the near future. ∗ [email protected] † [email protected] ‡ [email protected] a r X i v : . [ a s t r o - ph . C O ] J a n . INTRODUCTION The nature of dark matter (DM) remains one of the biggest unsolved enigmas in decades.One simple, minimum assumption about DM is that DM couples to Standard Model (SM)particles and its density follows the thermal distribution. In this scenario, thermal DM, whenthe temperature drops below the DM mass, the Boltzmann-suppressed DM-SM interactionscan no longer keep up with the expansion of the universe and DM decouples from thethermal bath, aka freeze-out. To reproduce the correct density, the strength of the couplingshould be similar to the weak interaction in the SM for the DM mass around the electroweakscale O (100 GeV) [1–3]. Such a coincidence is known as the “WIMP miracle” where WIMPstandards for Weakly Interacting Massive Particle. To hunt for DM, various experimentshave been developed, which can be categorized into direct detection, indirect detection, andsearches at colliders. Unfortunately, the long-awaited miracle has not been observed sofar, resulting in stringent limits, especially from direct detection, on the parameter spaceof the minimum DM theory. To date, the DM mass for a successful freeze-out scenariois only allowed in some particular regions, such as a mass window of O (MeV–GeV) [4–6],mainly due to the kinetic threshold which leads to a sharp plummet in sensitivity towardthe (sub-)GeV mass range in direct detection experiments considering nuclear recoil. Onecan also avoid the stringent bounds by disconnecting DM freeze-out processes from thoseof detection, for example, with the help of the resonance enhancement [7–10] or involvingslightly heavier excited states into annihilation processes to increase the cross-section duringthe period of freeze-out, namely co-annihilation [11–14].Besides the direct detection, the indirect searches for the consequent high-energy SMparticles from DM annihilation are also of great interest. In the early universe, these SMparticles can interact and thermalize with the CMB photons, which leads to distortions onits black-body spectrum [15]. Furthermore, in the post-recombination epoch, the injectedenergy can alter the evolution of the gas temperature and ionization fraction, in turn leavingimprints on the temperature and polarization power spectra of the CMB [16–19]. It is worthmentioning that the CMB measurements are not subject to uncertainties on the local DMdensity and the kinetic threshold that brings down the sensitivity of direct detection. AsCMB experiments evolved and advanced signiﬁcantly from the pre-WMAP epoch [20–22] tothe WMAP [23] and the Planck satellite [24], we are now entering the precision era of CMB2easurements. These data can provide independent, competitive bounds on the annihilationcross-section, especially on the (sub-)GeV mass range. Note that other late-time indirectdetection experiments such as Fermi-LAT [25–27] and AMS-02 [28] also experience the lossof sensitivity [29] toward the low-mass regime because of the instrumental threshold. All inall, the CMB provides a distinct, complementary probe into the nature of the thermal DM,especially important for low-mass DM and for models featuring s -wave annihilation that isnot velocity suppressed.The standard procedure to derive the CMB exclusion limit is ﬁrst to translate the DMannihilation energy spectra into the evolution of the ionization fraction and gas temperature.Second, with the help of the Markov chain Monte Carlo (MCMC) method, one maps outthe likelihood distribution by including proper priors (if any) in the hyperspace consisting ofboth cosmological and nuisance parameters. Finally, the likelihood function can be projectedonto the plane of the DM mass and the cross-section by either marginalizing over (Bayesian)or proﬁling out (Likelihoodist) other parameters. Then, the exclusion limit can be easilyderived, e.g. , by the p -value. The whole procedure, unfortunately, is very time-consumingand technically involved. Besides, the task becomes unmanageable for scenarios where DMannihilates into various ﬁnal states at the same time. In this case, one has to generate theinput annihilation spectrum for each set of the DM mass, cross-section and branching ratios.It has been pointed out in Refs. [30–33] that the DM-induced energy injection rate canbe quantiﬁed by an eﬀective parameter p ann = f ( z ) (cid:104) σv rel (cid:105) /m χ where f ( z ) is the energydeposition eﬃciency, (cid:104) σv rel (cid:105) is the thermal-averaged annihilation cross-section, and m χ isthe DM mass, respectively. The dependency on the DM model is all encapsulated in p ann ; asa result, p ann can be directly constrained by the CMB measurements. When the likelihoodtable as a function of p ann is built via MCMC scans once and for all, one can simply deriveconstraints on (cid:104) σv rel (cid:105) for given m χ , if f ( z ) is known; see Refs. [32, 33] for derivation of f ( z ).This method is adopted by Planck collaboration [34, 35] where f ( z ) is approximated asredshift-independent and only depends on annihilation channels and the DM mass. We notethat Ref. [36] attempted to resolve the model-dependency of f ( z ) by utilizing the principalcomponent analysis to approximately parameterize the impact on the CMB power spectrawith the DM-induced energy spectra. In this way, CMB constraints can be derived in amodel-independent fashion . One still needs diﬀerent treatments for DM decay, annihilation or other exotic processes such as a mixed

3n this work, following the spirit of deriving CMB constraints in a model-independent waydiscussed above, we use the perturbations on the evolution of the ionization fraction dx DM e /dz and gas temperature dT DM g /dz , rather than p ann or the energy spectra, to characterize theeﬀects of DM-induced energy injection as they are more physically intuitive and general forany sort of energy injection. In other words, we manage to build a likelihood map uponthese quantities, with the aim to provide a single tool of inferring the CMB constraints ongeneric models with multiple annihilation channels and arbitrary branching ratios. Thatis, the tool can predict the value of likelihood regardless of details of DM models once thevalues of dx DM e /dz and dT DM g /dz are provided.Given the complexity of diﬀerent ﬁnal states and branching fractions, it is not practicalto manually construct an enormous interpolation table to cope with a large parameter space.We therefore resort to the power of Convolutional Neural Network (CNN), to provide anvery eﬃcient way of placing constraints on DM models of multiple annihilation channels with arbitrary branching ratios. As we have known, artiﬁcial intelligence (AI) has been ubiquitouswith numerous applications and has resulted in far-reaching inﬂuences in our daily lives. Dueto the availability of a tremendous amount of digital data, deep learning is a booming branchin AI, which has developed a large variety of neural networks and algorithms. Out of manytypes of deep neural works, CNN is renowned for the capabilities of discovering patterns,shapes and correlations among input and output parameters (especially powerful when thedimension of input parameters is large), and is employed here in light of the sequentialnature of the input parameters, dx DM e /dz and dT DM g /dz . For a pedagogical introduction onCNN, see, e.g. , Refs. [37, 38].Our workﬂow is sketched out as follows. First, we compute dx DM e /dz and dT DM g /dz following the methodology in Refs. [39–41], and with those quantities we can attain theCMB angular power spectra via the Boltzmann solver CLASS [42]. Four typical annihilationchannels are studied, e − e + , b ¯ b , W − W + , µ − µ + , each with diﬀerent combinations of ( m χ , (cid:104) σv rel (cid:105) ). Secondly, with the help of MontePython [43, 44], a parameter inference packagefor cosmology, the power spectra from

CLASS are used to map out the likelihood in thehyperspace of cosmological and nuisance parameters, given a set of the DM mass and cross-section. We repeat this step for each set of ( m χ , (cid:104) σv rel (cid:105) ). Thirdly, for each ( m χ , (cid:104) σv rel (cid:105) )we proﬁle out the cosmological and nuisance parameters by selecting the point with the channel considered in this work. dx DM e /dz and dT DM g /dz , with the target variable being the likelihood of the selected data points.Finally, we apply the trained network on unseen data generated from a new scenario withDM annihilating into all of the four channels mentioned above.As we shall see below, the network yields consistent predictions for the new channelwith those from MCMC scans. Similar performance can be expected even for completelynew ﬁnal states, as long as their energy spectra are not contrasting with those seen bythe network. In addition, the model is very scalable to diﬀerent scenarios; for instance,by adding data points from decaying DM models to the training dataset, the network canhandle both annihilating and decaying cases without extra tweaks – more training data ofdiﬀerent types, better performance and broader applicability. Furthermore, one can trainthe network to predict not only the likelihood but also the corresponding best-ﬁt values ofthe cosmological parameters by simply including these parameters as the target variables.The rest of the paper is organized as follows. In Sec. II, we begin by summarizing thecalculation of the ionization fraction and gas temperature evolution in the presence of DMenergy injection. Then, we demonstrate via a residual likelihood map that the DM-inducedchanges on the ionization fraction and gas temperature can closely characterize the eﬀectson the CMB. We furthermore explain how the CMB power spectra are computed and howMCMC scans over the parameter space are performed with the existing tools. In Sec. III,we discuss the network structure and data training. In Sec. IV, the good performance of thenetwork in terms of the accuracy of predictions and the CMB exclusion limits is displayed,and the decent performance happens to unseen data as well. Finally, we conclude in Sec. V. II. CMB CONSTRAINTS ON ENERGY INJECTION FROM DM

DM signals have not been detected by the measurement of the CMB anisotropy thus far,which constrains the energy injection sourced through DM to be small, compared to that ofthe standard cosmology and statistical uncertainties. In this section, we start with elaborat-ing on how we calculate the eﬀect of DM energy deposition on the ionization fraction andgas temperature evolution. Then we show the statistical importance of DM contributions The target variable is the feature of a dataset on which the network make the prediction. z − − − − − x e m χ = 10 GeV , h σv rel i = 10 − cm / s ΛCDM χ ¯ χ → e − e + χ ¯ χ → b ¯ b z T g ( K ) m χ = 10 GeV , h σv rel i = 10 − cm / s T CMB

ΛCDM χ ¯ χ → e − e + χ ¯ χ → b ¯ b FIG. 1. Exemplary evolutions of ionization fraction (left panel) and gas temperature (right panel)for both ΛCDM and cases with the energy injection from DM annihilation. The benchmark valuesof DM parameters are m χ = 10 GeV and (cid:104) σv rel (cid:105) = 10 − cm / s. The ΛCDM scenario, the e − e + and b ¯ b channels are shown by black, red and blue curves, respectively. We can clearly see theadditional ionization and heating caused by DM annihilation. at diﬀerent redshifts can be captured by a residual likelihood map in a model-independentway. Finally, we detail the conventional global ﬁtting approach that utilizes the MCMCmethod to scan over a hyperparameter space of the cosmological parameters given DMmodel parameters to obtain robust CMB constraints on DM annihilation. A. DM contribution to ionization and heating

For calculation of DM-induced eﬀects on the evolution of the ionization fraction x e andthe gas temperature T g , we solely consider s -wave annihilation as the cross-section is notsuppressed by the DM relative velocity v rel . The DM contribution to ionization and heatingcan be written as [39–41, 45] − (cid:20) dx DM e dz (cid:21) s -wave = (cid:88) F Br F (cid:90) z dz (cid:48) H ( z (cid:48) )(1 + z (cid:48) ) n χ ( z (cid:48) ) (cid:104) σv rel (cid:105) n H ( z (cid:48) ) m χ E RY dχ Fi ( m χ , z, z (cid:48) ) dz , − (cid:34) dT DM g dz (cid:35) s -wave = (cid:88) F Br F (cid:90) z dz (cid:48) H ( z (cid:48) )(1 + z (cid:48) ) n χ ( z (cid:48) ) (cid:104) σv rel (cid:105) n H ( z (cid:48) ) m χ dχ Fh ( m χ , z, z (cid:48) ) dz , (1)where H ( z ), n χ ( z ) and n H ( z ) are the Hubble parameter, DM number density, and hydrogenatom number density, respectively, as functions of redshift z . Notice that z (cid:48) stands for theredshift at which the energy is injected, while z is the redshift when the injected energy is6bsorbed. In this way, the diﬀerential ionization fraction and gas temperature at redshift z account for accumulative eﬀects from all energy deposition that happened at earlier redshifts, z (cid:48) > z . The Rydberg energy E RY ≡ . F denotes diﬀerent ﬁnal states with branching ratio Br F . The terms dχ Fi,h ( m χ , z, z (cid:48) ) /dz in Eq. (1), which represent the fraction of injected energy going intoionization ( i ) and heating ( h ), are given by dχ Fi,h ( m χ , z, z (cid:48) ) dz = (cid:90) dE Em χ (cid:34) dN Fe dE dχ ei,h ( E, z, z (cid:48) ) dz + dN Fγ dE dχ γi,h ( E, z, z (cid:48) ) dz (cid:35) . (2)Calculation of the fractions dχ e,γi,h ( E, z, z (cid:48) ) /dz is explained in Refs. [39–41] and the ﬁnalenergy spectra of electrons and photons per DM annihilation dN Fe,γ /dE are obtained fromthe package

LikeDM [46]. It interpolates three dimensional

PPPC4 [47] tables which take intoaccount electroweak corrections computed in Ref. [48].We should point out that the existing CMB constraints [41] derived based on Eq. (1)are in accord with results of other studies [19, 30, 32, 33] that employ a diﬀerent methodof computing the DM energy injection. Moreover, the cosmological boost factor discussedin Ref. [45] is not considered here as the epoch of interest is still in the linear regime. InFig. 1, we demonstrate the evolution of x e and T g for s -wave annihilation into e − e + and b ¯ b with m χ = 10 GeV and (cid:104) σv rel (cid:105) = 10 − cm / s.In general, the energy injection rate of DM annihilation is proportional to the velocity-averaged cross-section (cid:104) σv rel (cid:105) multiplied by the DM number density squared n χ . One cansimply Taylor expand (cid:104) σv rel (cid:105) in powers of v rel . Only the velocity-independent component( s -wave) is explored here, but there exist v -dependent component ( p -wave) [49–51] andannihilation through resonance (Breit-Wigner enhancement) [52–54], which are also phe-nomenologically interesting. However, we do not explore these scenarios as they are eithersubdominant due to velocity suppression or more complicated in light of additional modelparameters such as the mass and width of the intermediate particle.Apart from annihilation, DM can inject energy into the thermal plasma by decayinginto SM particles. In this case, the energy injection rate of DM decay is proportional to n χ (scales as (1 + z ) ), which results in dx DM e /dz ∝ (1 + z ) − / as opposed to dx DM e /dz ∝ (1 + z ) / for s -wave annihilating DM. Therefore, for decaying DM, energy deposition atlow redshifts becomes more important than that at high redshifts. For the mass region of7eV–TeV considered in this study, constraints on decaying DM from indirect search [55–64]are more stringent than the CMB bound [36]. Nevertheless, the constraints from CMB andreionization [50] are still crucial for sub-GeV DM decay, which will be pursued in the future.As a ﬁnal remark of this section, although we only consider s -wave annihilation, we notethat the methodology of computing the diﬀerentials of x e and T g described in Refs. [39–41] can be applied to all the diﬀerent scenarios mentioned above. For completeness, thederivations of formulas similar to Eq. (1) are given in App. A for DM decay, in App. B for p -wave annihilation, and in App. C for annihilation via resonance. B. CMB residual likelihood map

From a residual likelihood map , one can easily see at which redshift intervals the changesin the evolution of x e and T g lead to signiﬁcant eﬀects on the CMB power spectra. To someextent, the map represents the underlying pattern the neural network will discover during thetraining process. In terms of model-independence, we ﬁnd that the quantities dx DM e /dz and dT DM g /dz are more suited to represent the DM impact than the parameters of DM modelslike the mass and cross-section. We assume the DM contribution is only perturbation to thestandard cosmology, which is well-justiﬁed for the parameter space close to the exclusionlimit, where the eﬀect on the CMB power spectra is linearly proportional to dx DM e /dz and dT DM g /dz . To demonstrate the dependency, we further ignore dT DM g /dz in constructingthe residual likelihood map because its eﬀect on the CMB power spectra is rather minorcompared to that of dx DM e /dz , simplifying the map from three dimensions ( z , dx DM e /dz , dT DM g /dz ) to two dimensions ( z , dx DM e /dz ).The statistics strength χ ( m χ , (cid:104) σv rel (cid:105) ) = − L ( m χ , (cid:104) σv rel (cid:105) ) is computed by contrast-ing the expected CMB angular power spectra to the measured one, and we in general need toperform the full numerical calculation of CMB power spectra to obtain the likelihood for eachset of the DM parameters. The results from the full numerical calculation are presumablymore accurate but very time-consuming. For the purpose of the dependency demonstra-tion, we simply reconstruct the likelihood in terms of dx DM e /dz . First, we discretize a given dx DM e /dz curve into n log-spaced bins in redshifts. The small DM contribution assumption The term “residual” refers to the deviation from the background value. χ ≈ (cid:88) i,j c ij (cid:113) χ i χ j , (3)where c ij is the element of the covariance matrix c with i and j representing the i -th and j -th bin. The individual χ i = − L i can be obtained by performing a full numericalcalculation of CMB power spectra, with DM contribution being a kernel function (cid:20) dx DM e dz (cid:21) i =  N DM , if z i ≤ z ≤ z i +1 , , else . (4)Here N DM is treated as a model-independent parameter. For any given DM annihilationchannel, the value of N DM can be computed from Eq. (1). We tabulate N DM and theirresulting χ i for later usage. In case of no correlation among the bins, the value of c ij is justa normalization factor 1 /n . By contrast, the components can be negative for i (cid:54) = j in thepresence of correlation. To construct the residual likelihood map, for simplicity we assume c ij = 0 for i (cid:54) = j , which is adequate as long as the DM contribution is small.The residual likelihood map for the individual redshift bins is presented in Fig. 2 togetherwith exemplary dx DM e /dz curves from diﬀerent DM annihilation channels. Here, we deﬁne δχ ≡ χ − χ , (5)where χ is the statistics strength for the standard ΛCDM cosmology. As indicatedby regions with a darker color in Fig. 2, the CMB power spectra are more sensitive tothe energy injection in the redshift range [600 , dx DM e /dz is indeed a suitable quantity for delineating, in a model-independentfashion, the eﬀect of an arbitrary DM model on the CMB. C. Cosmological scans for DM annihilation

Although the simpliﬁed CMB residual likelihood map can depict the DM impacts, itis still crucial to consider the correlation among energy injections from diﬀerent redshifts9

10 100 1000 z -10.0-9.5-9.0-8.5-8.0-7.5-7.0-6.5-6.0-5.5-5.0-4.5-4.0-3.5-3.0-2.5-2.0 l og ( d x D M e / d z ) m χ = 100 GeV, h σv rel i e − e + = 10 − cm / s m χ = 1 GeV, h σv rel i e − e + = 10 − cm / s m χ = 100 GeV, h σv rel i b ¯ b = 10 − cm / s m χ = 10 GeV, h σv rel i b ¯ b = 10 − cm / s − l og ( δ χ ) FIG. 2. The residual CMB likelihood map δχ ( z, dx DM e /dz ). We also plot the dx DM e /dz curves for e − e + and b ¯ b channels with various choices of ( m χ , (cid:104) σv rel (cid:105) ) for comparison. The energy injectionfrom DM annihilation can signiﬁcantly aﬀect the CMB power spectra in z = [600 , to infer a robust, precise CMB constraint. The standard procedure of achieving that isto perform a global ﬁtting of DM models as follows. First, we calculate dx DM e /dz and dT DM g /dz according to the formulas in Sec. II A. The tabulated dx DM e /dz and dT DM g /dz arethen inserted to a Boltzmann equation solver, e.g . CAMB [65, 66] or

CLASS [42], to obtainthe CMB angular power spectra. We calculate the likelihood based on the resulting CMBangular power spectra and the Planck 2018 data [67], out of which we exclusively involvethe baseline high- (cid:96) power spectra ( TT , TE , and TE ), low- (cid:96) power spectrum TT , low- (cid:96) HFIpolarization power spectrum EE , and lensing power spectrum lensing .In practice, to determine the proper statistical strength with the systematic uncertaintiesof ΛCDM for each of ( m χ , (cid:104) σv rel (cid:105) ) sets, one has to simultaneously scan over all cosmologicalparameters C = (Ω b h , Ω χ h , θ s , h, ln 10 A s , n s , τ ) as well as any additional nuisanceparameters N associated with the data sets involved. The cosmological parameters are10he reduced Hubble constant h , baryon density parameter Ω b , DM density parameter Ω χ ,the angle subtended by the sound horizon θ s , the primordial curvature power spectrum A s at k = 0 .

05 Mpc − , the scalar spectrum power-law index n s , and the Thomson scatteringoptical depth due to reionization τ , respectively. In this work, we utilize MontePython [43, 44]to compute the likelihood probability density L i ( m χ , (cid:104) σv rel (cid:105) , C , N ) with the CMB powerspectra computed by CLASS , and scan the space of cosmological and nuisance parameters viaa MCMC method, Metropolis-Hastings. Note that scans over the nuisance parameters arehandled diﬀerently from the cosmological ones according to the method of fast sampling [44]as jumps along the former need not involve the time-consuming Boltzmann code.For each scan of ( m χ , (cid:104) σv rel (cid:105) ), we proﬁle out the cosmological and nuisance parametersby choosing a single point with the maximum likelihood, whose value is recorded togetherwith the corresponding cosmological parameters, namely L ( m χ , (cid:104) σv rel (cid:105) ) = max C , N [ L i ( m χ , (cid:104) σv rel (cid:105) , C , N )] . (6)In this work, χ ( m χ , (cid:104) σv rel (cid:105) ) in Eq. (5) is deﬁned as − L ( m χ , (cid:104) σv rel (cid:105) ), which is legit-imate as L i ( m χ , (cid:104) σv rel (cid:105) , C , N ) on the right-hand side of Eq. (6) are well-approximated bythe Gaussian distribution. Moreover, our statistic strength δχ in Eq. (5) can be understoodas the null-signal approach. Therefore, the 95% conﬁdence level for (cid:104) σv rel (cid:105) given m χ cor-responds to δχ = 2 .

71, assuming a one-side χ -distribution. The derived constraints fromthe MCMC scans will be discussed later in Sec. IV together with the network performance. III. NETWORK ARCHITECTURE AND DATA TRAINING

In this section, we detail how we choose the network structure as well as the trainingprocedures. We utilize

TensorFlow which is an end-to-end, open-source machine learningplatform that has built-in Keras , a deep learning application programming interface writtenin Python . As demonstrated in Sec. II B, the evolution of ionization fraction x e and gastemperature T g are capable of characterizing, in a model-independent way, impacts of theDM-induced energy injection into the thermal plasma – larger deviation from the predictionsof standard cosmology, larger χ from CMB angular power spectra and hence more likely https://keras.io/ dx DM e /dz and dT DM g /dz are chosen as the input parametersinstead of the DM mass, cross-section, and annihilation channel. In this way, we assistnetworks with the knowledge of physics by choosing proper input parameters.For the format of input parameters, each curve of log ( dx DM e /dz ) and log ( dT DM g /dz )are discretized into 30 equally-spaced points in the log-scale of (1 + z ). The reason whywe use the logarithm is that values of the diﬀerentials vary over many orders of magnitude.Taking their exponents as the input parameters signiﬁcantly reduces the parameter range,which dramatically expedites the network’s optimization algorithm. In this approximation,the continuous values of the diﬀerentials are represented by the values of individual pointsin these 30 bins. We then preprocess the data by calculating the standard score ( z -score)for the whole data set in each of the 30 bins. Namely, x ji → ( x ji − (cid:104) x j (cid:105) ) /σ x j , where x j denotes all data points located in the j -th bin while (cid:104) x j (cid:105) and σ x j correspond to the meanand the standard deviation of points x ji , respectively. All in all, for each data point, theinput parameter is a two-dimensional array with a shape of (30 , χ found by MCMC scans in MontePython , given the DMmass and annihilation cross-section. In practice, δχ = χ − χ ( (cid:28) χ ) is usedinstead, to facilitate network optimization such that layers with small values of weightssuﬃce for accurate predictions. Otherwise, the optimization algorithm has to spend moretime searching in a larger parameter space. Note that MontePython also yields best-ﬁt valuesfor the cosmological parameters C . In principle, one can train networks to make predictionson all the best-ﬁt values provided by MontePython . The best-ﬁt values of the cosmologicalparameters across the dataset, however, are quite centralized with tiny variation; thus, weconcentrate on the prediction of δχ .We are now in a position to discuss the network structure. In light of the redshift depen-dence of the dx DM e /dz and dT DM g /dz , the sequence and correlation of the input parametersand matters. That is, swamping values of diﬀerent redshifts represent diﬀerent physicalscenarios. As generic deep neural networks will not be able to properly capture the natureof the sequence, one-dimensional CNNs, Conv1D layers, are employed to extract correlationand features of the input parameters as shown in Fig. 3 that pictorially displays the network One can simply construct an individual network for each of the cosmological parameters with the sameprocedure described here. Alternatively, one can contrive a more complicated network architecture forsimultaneous predictions on all the parameters. filters=256 kernel_size=3Conv1DInput Features Max-Poolpool_size=2 Conv1Dfilters=128 kernel_size=3 filters=64 kernel_size=3Conv1D Max-Pool Flatten

Flatten

Fully connected layer - l n ( L i ke li h ood ) Layer1: (28,512)

Layer2: (26,256)

Layer: (13,256)

Layer1: (11,128)

Layer2: (9,128)

Layer1: (7,64)

Layer2: (5,64)

Layer: (2,64) (128) (88) (112) (112) (1)

FIG. 3. The architecture of our machine. See main text for more details. The output shape of thelayer is speciﬁed by the values in the parentheses which is also the input shape of the next layer.The structure of networks is analogous to a previous work [68] which deals with cosmic ray spectrain a similar manner. architecture and exhibits the evolution of the input array’s shape through diﬀerent layers.Applying a ﬁlter of size 3 (with 2 channels corresponding to the depth of 2 in the inputarray for dx DM e /dz and dT DM g /dz respectively) converts an input array of dimension (30 , , , Conv1D layer of 512 ﬁlters. The pooling layers , MaxPooling1D are deployed among

Conv1D layers to enhance extracted features by choosing the maximum values within thewindow of pooling kernels, at the same time reducing the size of the outputs from

Conv1D layers. We refer readers to websites of

Keras or TensorFlow for technical details of CNNsand other types of layers used in this work.Finally, the convoluted 2-D array is ﬂattened into a 1-D array and fed into two fully-connected layers , Dense , with 88 and 112 neurons, respectively. The output layer consistsof a single neuron for predicting the value of δχ . The second to last layer is the Lambda layer, which simply rescales the output of the previous layer by a factor of the maximumvalue of δχ of the entire data set. The insertion of the Lambda layer follows the same logicof using δχ as the target variable – conﬁning searches of minimization to a small parameterspace accelerates the optimization process. Furthermore, the number of neurons for the two13ully-connected layers are determined by Keras Tuner that enables search for the best setof hyperparameters of networks automatically and eﬃciently. The activation function of ReLU , i.e. , x → x ( x →

0) for x > x ≤ Conv1D and

Dense layers whilethe output layer assumes the linear activation function ( x → x ).For the training procedure, the total 39421 points from four independent channels χ ¯ χ → e − e + , b ¯ b , W − W + and µ − µ + – each data point corresponds to a set of ( m χ , (cid:104) σv rel (cid:105) ) and thecorresponding δχ – are randomly shuﬄed and split into a training set (80% of 39421 points)and a validation set (20%). As we shall see below, the evaluation of the trained networkperformance is carried out with unseen data from a mixed channel where DM particlesannihilate into all of the four ﬁnal states. We choose the mean squared error for the lossfunction, which quantiﬁes the diﬀerence between the network prediction and value of thetarget variable. In addition, the root mean squared error as the metrics is used to monitorthe network performance during the training. The Adam algorithm is adopted to minimizethe loss function with a learning rate of 10 − .To prevent overﬁtting to the training data, we resort to callbacks , which can ceasethe training process when the loss on the validation data stops improving and save thelayer weights with the best performance on the validation set. Overﬁtting refers to thephenomenon that a trained network performs very well on the data set that it was trainedon but fails to generalize to diﬀerent data. One of the main reasons is that the networkis overtrained and picks up information from noise or statistical ﬂuctuations from the datarather than learning the underlying pattern. The evolution of the loss functions is shown inFig. 4. The loss of the training data drops signiﬁcantly until 20 epochs or so and levels oﬀaround 40 epochs with a much smaller decreasing rate. On the other hand, the validationloss begins with a lower value and ﬂattens at a much earlier time. After roughly 75 epochs,the curve of validation turns upward and catches up with the training one, a herald ofoverﬁtting, and the training soon is stopped by callbacks .Before discussing our results, it is worthwhile to point out that, as we shall see below,there exists intrinsic noise in the MCMC scan results due to the diﬃculty in ﬁnding the trueminimum in a high-dimensional parameter space, e.g. , the MCMC chain gets stuck in a local Hyperparameters, diﬀerent from parameters of a network, e.g. , weights of hidden layers, are those relatedto network architecture, such as the number and width of hidden layers, and to the learning algorithm,such as the learning rate for gradient descent. Adam optimization [69] is a method of stochastic gradient descent according to the adaptive estimation ofﬁrst-order and second-order moments.

15 30 45 60 75 90Training epoch10 L o ss Training dataValidation data

FIG. 4. The evolution of the loss on the training and validation data with respect to the trainingepoch. The loss of the validation data plateaus out earlier than the training loss, but ceases toimprove around 70 epochs. It instead becomes worse than the loss of training, triggering callbacks to stop the training process. minimum. The noise, to some extent, averts the overﬁtting. Thus, two common techniquesto avoid overﬁtting – dropout , which randomly drops out part of neurons in a layer duringtraining, and regularization , which adds penalties on large layer weights to the loss function– have not been deployed in our network.

IV. RESULTS

In this section we present our results and demonstrate the predictive power of the networktrained on the data comprising four annihilation channels: e − e + , b ¯ b , W − W + and µ − µ + .First, we show the one-dimensional χ distribution for e − e + channel with m χ = 290 MeV(left panel) and m χ = 290 GeV (right panel) in Fig. 5. The black dot denotes the resultof the full MCMC scan, while the red solid line represents the network prediction. Thecorresponding χ for the ΛCDM-only scenario (green dashed line) and 95% C.L. exclusionlimit (blue dashed line) are also shown for comparison. It is noticeable that the value of15 − − − − − − − − − h σv rel i e − e + (cm / s) χ t e s t

95% C.L. exclusionΛCDM only m χ = 290 MeVFull MCMCMachine Prediction − − − − − − − − − h σv rel i e − e + (cm / s) χ t e s t

95% C.L. exclusionΛCDM only m χ = 290 GeVFull MCMCMachine Prediction FIG. 5. The one-dimensional χ distribution for the e − e + channel, from the full MCMC scans(black dot) and the network predictions (red solid curve) for two benchmark values of m χ , 290MeV (left panel) and 290 GeV (right panel). The blue dashed line marks the 95% C.L. exclusionlimit on (cid:104) σv rel (cid:105) given m χ . χ ﬂuctuates among nearby data points due to the intricacy of minimization in high-dimensional space that leads to the innate noise mentioned above. As the adjacent pointshave the similar cross-section and thus provide similar values of the input parameters to thenetwork; some of them, however, have the distinctive output, preventing the network fromﬁtting all the points perfectly, namely suppressing overﬁtting. In this case, by minimizingthe loss function, the network manages to ﬁnd general, overall correlation and trend out ofthe training data and naturally yields much smoother prediction curves, as shown in Fig. 5.When the annihilation cross-section is small, the existence of the DM-induced energyinjection helps ﬁt better to the CMB observables . On the other hand, DM contributionsgradually become important compared to those from the background when (cid:104) σv rel (cid:105) is largeenough, with the increment in χ proportional to (cid:104) σv rel (cid:105) . The network successfully capturesthe overall correlation between (cid:104) σv rel (cid:105) and χ with the smoother interpolation, diminishingthe noise in the data. The predicted exclusion limits for both light and heavy DM are fairlyclose to those of the scans, as can be seen from Fig. 5.In Fig. 6, we show 95% C.L. constraints on channels of e − e + and b ¯ b derived from the We have also conﬁrmed that χ will eventually increase and become equal to χ if one keeps de-creasing the cross-section. m χ (GeV) − − − − − − − h σ v r e l i e − e + ( c m / s ) χ ¯ χ → e − e + Full MCMCMachine PredictionCang et. al (2020)Slayter (2015) m χ (GeV) − − − − − h σ v r e l i b ¯ b ( c m / s ) χ ¯ χ → b ¯ b Full MCMCMachine PredictionSlayter (2015)

FIG. 6. Comparison of the 95% C.L. constraints inferred from the full MCMC scans (black dot)and network predictions (red solid curve) for e − e + (left panel) and b ¯ b (right). We also includesome existing bounds from Refs. [33, 70]. One can see that the constraints derived in this workfrom both the full MCMC scan and the network agree well with the existing bounds. full MCMC scans (black dots) and network predictions (red solid curves). These channelsare selected as they have distinct energy spectra from DM annihilation, while the spectra of W − W + and µ − µ + are quantitatively similar to that of b ¯ b . Some of the existing bounds in theliterature (dashed curves) are also included for comparison: the constraints from Ref. [33],that is based on Planck 2015 data, for both e − e + and b ¯ b , and the limit derived in Ref. [70],based on Planck 2018 data, for e − e + . Note that for the b ¯ b channel we require m χ ≥ dN Fe,γ /dE .Three comments are in order as follows. First, the constraints derived from the MCMC scanresults agree very well with the existing ones, reinforcing the validity of our implementationof DM-induced contributions into

CLASS and

MontePython . Second, consistency betweenthe MCMC results and network predictions corroborates again that the trained networksuccessfully learns the underlying patterns from the training data. Therefore, it can be usedas an eﬃcient estimator of chi-square values and a satisfactory method of deriving exclusionlimits on DM annihilation into SM particles. Finally, the discrepancy between inferredconstraints from Planck 2015 and 2018 data is quite small, and hence the constraints basedon diﬀerent data set are consistent.After demonstrating how well the trained network can decipher the data, on part of17 m χ (GeV) − − − − h σ v r e l i ( c m / s ) Br ( e,b,W,µ ) = (0 . , . , . , . Full MCMC (mixed)Machine Prediction (mixed)Machine Prediction ( e − e + )Machine Prediction ( b ¯ b ) FIG. 7. Comparison of the 95% C.L. constraints inferred from the full MCMC scans (blackdot) and network predictions (black curve) for the mixed channel of branching ratios Br ( e,b,W,µ ) =(0 . , . , . , . which it has been trained. We take one step further to appraise the network performanceon unseen data generated from a diﬀerent mixed channel with the branching ratios ofBr ( e,b,W,µ ) = (0 . , . , . , . b ¯ b and e − e + channels as expected (recall the bounds on the W − W + and µ − µ + channels are similar to b ¯ b channel due to the similar energy spectra). There also exist wiggles on the curves of predic-tion that originate from the noise on the training data. In this case, we have demonstratedthat the trained network is capable of producing quantitatively correct exclusion limits on achannel with arbitrary branching ratios. It is plausible that similar performance can also beattained in completely new annihilation channels such as τ − τ + or u ¯ u , as their energy spectraare not too diﬀerent from the four channels considered here. Moreover, to infer the exclusionlimit on the mixed channel, we generated 595 data points which take MontePython roughly10 days when running with 256 CPUs in parallel. On the other hand, it only takes thenetwork less than a minute to predict values of δχ for these 595 data points, underscoring18 ∆ (cid:105) σ ∆ Training data -0.0404 0.849Validation data -0.0540 0.850 (cid:104) σv rel (cid:105) on four channels -0.151 0.0967 (cid:104) σv rel (cid:105) on mixed channel 0.0162 0.124 TABLE I. The summary of the diﬀerence between the network predictions (denoted by P below)and MCMC ( M ) data. We quantify the deviation by the mean (second column) and standarddeviation (third column) of the diﬀerences. The second and third rows represent the diﬀerence on δχ , i.e , ∆ ≡ ( δχ ) P − ( δχ ) M for the training data and validation data. The last two rows showthe relative error of the 95% bound on (cid:104) σv rel (cid:105) , i.e. , ∆ ≡ ( (cid:104) σv rel (cid:105) P − (cid:104) σv rel (cid:105) M ) / (cid:104) σv rel (cid:105) P . the striking eﬃciency of the neural network as an estimator of the CMB constraints.To conclude, the neural network has been proven to be a competent, time-saving methodto obtain CMB constraints on DM annihilation. When properly trained with an suﬃcientamount of data, it delivers consistent results even on unseen data, avoids enormous workloadof MCMC scans, and greatly reduces the time needed to attain the results. We summarizeour results in Table. I, showing the diﬀerence between values of the true and predicted δχ on the training and validation data, as well as the relative error of the bound on (cid:104) σv rel (cid:105) .The discrepancy is partially attributed to the intrinsic ﬂuctuations in the data associatedwith diﬃculties of minimization in the high-dimensional space. Furthermore, degrees ofdiscrepancy are quite similar between the training and validation data, ensuring the trainingis not plagued with overﬁtting. Overall, we observe good agreement between the networkpredictions and MCMC results. V. CONCLUSIONS

In this work, we propose a novel way – a CNN estimator – to infer CMB bounds onDM-induced energy injection. We closely conform to the philosophy of model-independencepromoted in existing works to bypass or considerably reduce time-consuming MCMC scansover a large hyperspace space of cosmological parameters and degrees of freedom in DMmodels. More importantly, we take full advantage of neural networks’ power in discoveringunderlying patterns of input and output parameters to provide a fast and eﬃcient method ofinferring the CMB limits for annihilating DM models with several ﬁnal states and arbitrary branching ratios.To realize the goal, we select the DM-induced changes of the ionization fraction and gas19emperature as the variables to quantify DM eﬀects on the CMB. As conﬁrmed by the sim-pliﬁed residual likelihood map, the impacts can be genuinely reﬂected in terms of the chosenquantities. Next, we use various combinations of the DM mass and annihilation cross-sectionfrom four independent channels, e − e + , b ¯ b , µ − µ + and W − W + , and compute correspondingCMB power spectra which are aﬀected by DM energy injection. The maximum likelihood foreach set of the mass and cross-section can be attained via proﬁling out the cosmological andnuisance parameters. The dataset containing input parameters ( dx DM e /dz and dT DM g /dz )and the target variable (likelihood) are split into the training data and validation data andfed into a CNN for training.The decent performance on both the training and validation data has been achieved –the diﬀerence in chi-square between the full MCMC scans and network predictions is lessthan one unit on average. It takes only a matter of seconds , instead of days, for the networkto make prediction. More strikingly, the equally good performance also manifests itself at anew mixed channel where DM annihilates simultaneously into the four ﬁnal states. In otherwords, the CNN estimator is proven to be a powerful and eﬃcient tool for inferring CMBbounds on DM models.This work is a valuable stepping-stone to a fully developed CNN estimator that canconstrain in a matter of seconds energy injection from any types of DM models or exoticscenarios beyond the standard cosmology based on the CMB measurements. It can alsopredict other important cosmological parameters at the same time when including theseparameters into the training data. ACKNOWLEDGMENTS

We thank Masahiro Kawasaki, Kazunori Nakayama, and Toyokazu Sekiguchi for provid-ing us the tables and Chien Lin for early collaboration. We are very grateful to FlorianNiedermann for extremely helpful discussions on

CLASS and

MontePython . W.-C. Huangis supported by the Independent Research Fund Denmark, grant number DFF 6108-00623.J.-L. Kuo is supported by the Austrian Science Fund FWF under the Doctoral ProgramW1252-N27 Particles and Interactions. Y.-L. S. Tsai was funded by the Ministry of Scienceand Technology Taiwan under Grant No. 109-2112-M-007-022-MY3. The authors wouldlike to acknowledge that this work was performed using the UCloud computing and storage20esources, managed and supported by eScience center at SDU.

Appendix A: Decay

For DM decay, the energy injection rate is proportional to n χ /τ DM , where τ DM is thelifetime of DM. The contribution of DM decay to ionization and heating is thus expressedas − (cid:20) dx DM e dz (cid:21) decay = (cid:88) F Br F (cid:90) z dz (cid:48) H ( z (cid:48) )(1 + z (cid:48) ) n χ ( z (cid:48) )2 n H ( z (cid:48) ) τ DM m χ E RY dχ Fi ( m χ , z, z (cid:48) ) dz , − (cid:34) dT DM g dz (cid:35) decay = (cid:88) F Br F (cid:90) z dz (cid:48) H ( z (cid:48) )(1 + z (cid:48) ) n χ ( z (cid:48) )3 n H ( z (cid:48) ) τ DM m χ dχ Fh ( m χ , z, z (cid:48) ) dz . (A1) Appendix B: p -wave annihilation The p -wave annihilation cross-section (cid:104) σv rel (cid:105) is proportional to v , which can be writtenas (cid:104) σv rel (cid:105) = (cid:104) v χ (cid:105) v (cid:104) σv rel (cid:105) , (B1)with DM velocity at any time v χ , a reference velocity v and a reference cross-section (cid:104) σv rel (cid:105) for p -wave annihilation when (cid:104) v χ (cid:105) = v . We follow the convention [49] and choose v = 100 km / s, which is roughly the velocity dispersion of DM in halos at z = 0, for easycomparison with the result of indirect detection. After DM kinetic decouples and becomesnon-relativistic, its temperature T χ redshifts as (1 + z ) , thus we can deduce that (cid:104) v χ (cid:105) v = (cid:18) z z ref (cid:19) , (B2)where z ref is the redshift that the root-mean-square velocity v rms ≡ (cid:113) (cid:104) v χ (cid:105) = (cid:112) T χ /m χ of DM equals to v . Note that we have assumed DM is an ideal gas and adopted theenergy equipartition. It is useful to express z ref as a function of the DM kinetic decouplingtemperature T kd . At DM kinetic decoupling ( T χ = T kd ), Eq. (B2) could be transformed into1 + z ref = v (1 + z kd ) (cid:114) m χ T kd , (B3)21here z kd is the redshift of DM kinetic decoupling. Furthermore, z kd can be written in termsof current CMB temperature T γ, = 2 . × − MeV as1 + z kd = T kd T γ, (cid:39) . × (cid:18) T kd MeV (cid:19) . (B4)Putting together Eq. (B3) and Eq. (B4), we obtain1 + z ref = 2 . × (cid:18) T kd MeV (cid:19) / (cid:16) m χ GeV (cid:17) / . (B5)Finally, the thermal-averaged p -wave annihilation cross-section can be expressed as (cid:104) σv rel (cid:105) = (cid:18) z z ref (cid:19) (cid:104) σv rel (cid:105) . (B6)Similar to s -wave annihilation, we can write the DM contribution to ionization and heatingas − (cid:20) dx e dz (cid:21) p -wave = (cid:88) F Br F (cid:90) z dz (cid:48) (1 + z (cid:48) ) H ( z (cid:48) )(1 + z ref ) n χ ( z (cid:48) ) (cid:104) σv rel (cid:105) n H ( z (cid:48) ) m χ E RY dχ Fi ( m χ , z, z (cid:48) ) dz , − (cid:20) dT b dz (cid:21) p -wave = (cid:88) F Br F (cid:90) z dz (cid:48) (1 + z (cid:48) ) H ( z (cid:48) )(1 + z ref ) n χ ( z (cid:48) ) (cid:104) σv rel (cid:105) n H ( z (cid:48) ) m χ dχ Fh ( m χ , z, z (cid:48) ) dz . (B7) Appendix C: Breit-Wigner enhancement

In the Breit-Wigner enhancement scenario, DM annihilates to SM particles via a nar-row resonance. Following Ref. [52], we consider a scalar resonance φ and the general DMannihilation cross-section via the resonance ( χ ¯ χ → φ → f ¯ f ) can be written as σ = 16 πE ¯ β i β i m φ Γ φ ( E − m φ ) + m φ Γ φ × B i B f , (C1)where E cm is the center-of-mass energy, m φ and Γ φ are the mass and decay rate of theresonance φ . The initial state and ﬁnal state space phase factors are ¯ β i = (cid:113) − m χ /m φ evaluated at the resonance and β i = (cid:113) − m χ /E at E cm of the collision. The branchingratio of φ decaying to χ ¯ χ and f ¯ f are B χ and B f , respectively.Considering DM are non-relativistic at the recombination epoch and later times, we are22llowed to adopt the Maxwell-Boltzmann velocity distribution for DM and use the Gaussianaverage to compute the thermal-averaged annihilation cross-section, which reads (cid:104) σv rel (cid:105) = 1(2 πv / (cid:90) d(cid:126)v χ (cid:90) d(cid:126)v ¯ χ e − v χ + v χ ) / (2 v ) × σv rel , (C2)where (cid:126)v χ, ¯ χ are the velocities of initial states and | (cid:126)v χ, ¯ χ | = v χ, ¯ χ . In the non-relativistic limit( v χ , v ¯ χ (cid:28) E ∼ m χ ), the center-of-mass energy can be expanded as E = 4 m χ + m χ v , and the relative velocity can be approximated as v rel (cid:39) β i .The condition of annihilation near a narrow resonance is fulﬁlled when m φ = 4 m χ (1 − δ ) , | δ | (cid:28) , (C3)Note that positive (negative) δ will imply DM annihilates below (above) the pole in Eq. (C1)when ¯ β i = 0 (when m φ = 2 m χ ). The pole becomes unphysical when δ > m φ < m χ ), butit can be regarded as analytic continuations of those quantities from the physical region as δ <

0. With Eq. (C3) and deﬁning γ ≡ Γ φ /m φ , we can write Eq. (C1) as σ = 16 πm φ ¯ β i β i γ ( δ + v / + γ × B i B f . (C4)Ref. [52] provides us a good approximation for the Gaussian average, (cid:104) σv rel (cid:105) (cid:39) πm φ ¯ β i γ [ δ + v / (3 √ + γ × B i B f , (C5)applicable for the parameter region v rms (cid:28) δ < (cid:104) σv rel (cid:105) as (cid:104) σv rel (cid:105) = δ + γ [ δ + v / (3 √ + γ (cid:104) σv rel (cid:105) , (C6)where (cid:104) σv rel (cid:105) for the Breit-Wigner enhancement is deﬁned as the thermal-averaged annihi-lation cross-section at T χ = 0 ( v rms = 0), written as (cid:104) σv rel (cid:105) = 32 πB i B f m φ ¯ β i γ δ + γ . (C7)23herefore, for the Breit-Wigner enhancement, the DM contribution to ionization andheating is formulated as − (cid:20) dx e dz (cid:21) BW = (cid:88) F Br F (cid:90) z dz (cid:48) H ( z (cid:48) )(1 + z (cid:48) ) n χ ( z (cid:48) ) (cid:104) σv rel (cid:105) n H ( z (cid:48) ) γ + δ [ δ + v / (3 √ + γ m χ E RY dχ Fi ( m χ , z, z (cid:48) ) dz , − (cid:20) dT g dz (cid:21) BW = (cid:88) F Br F (cid:90) z dz (cid:48) H ( z (cid:48) )(1 + z (cid:48) ) n χ ( z (cid:48) ) (cid:104) σv rel (cid:105) n H ( z (cid:48) ) δ + γ [ δ + v / (3 √ + γ m χ dχ Fh ( m χ , z, z (cid:48) ) dz . (C8)The redshift dependence of v rms can be derived as v rms ( z ) = (cid:115) T χ ( z ) T kd = 1 + z z kd (cid:39) . × − (1 + z ) (cid:18) T kd MeV (cid:19) − , (C9)where Eq. (B4) is used. [1] G. Jungman, M. Kamionkowski, and K. Griest, Phys. Rept. , 195 (1996), arXiv:hep-ph/9506380.[2] L. Bergstr¨om, Rept. Prog. Phys. , 793 (2000), arXiv:hep-ph/0002126.[3] G. Bertone, D. Hooper, and J. Silk, Phys. Rept. , 279 (2005), arXiv:hep-ph/0404175.[4] M. Pospelov, A. Ritz, and M. B. Voloshin, Phys. Lett. B , 53 (2008), arXiv:0711.4866[hep-ph].[5] R. T. D’Agnolo and J. T. Ruderman, Phys. Rev. Lett. , 061301 (2015), arXiv:1505.07107[hep-ph].[6] S. Matsumoto, Y.-L. S. Tsai, and P.-Y. Tseng, JHEP , 050 (2019), arXiv:1811.03292 [hep-ph].[7] S. Matsumoto, S. Mukhopadhyay, and Y.-L. S. Tsai, JHEP , 155 (2014), arXiv:1407.1859[hep-ph].[8] S. Matsumoto, S. Mukhopadhyay, and Y.-L. S. Tsai, Phys. Rev. D , 065034 (2016),arXiv:1604.02230 [hep-ph].[9] P. Athron et al. (GAMBIT), Eur. Phys. J. C , 38 (2019), arXiv:1808.10465 [hep-ph].[10] E. Bagnaschi et al. , Eur. Phys. J. C , 895 (2019), arXiv:1905.00892 [hep-ph].

11] P. Binetruy, G. Girardi, and P. Salati, Nucl. Phys. B , 285 (1984).[12] K. Griest and D. Seckel, Phys. Rev. D , 3191 (1991).[13] S. Banerjee, S. Matsumoto, K. Mukaida, and Y.-L. S. Tsai, JHEP , 070 (2016),arXiv:1603.07387 [hep-ph].[14] C.-T. Lu, V. Q. Tran, and Y.-L. S. Tsai, JHEP , 033 (2020), arXiv:1912.08875 [hep-ph].[15] J. Chluba and R. A. Sunyaev, Mon. Not. Roy. Astron. Soc. , 1294 (2012), arXiv:1109.6552[astro-ph.CO].[16] J. A. Adams, S. Sarkar, and D. W. Sciama, Mon. Not. Roy. Astron. Soc. , 210 (1998),arXiv:astro-ph/9805108.[17] X.-L. Chen and M. Kamionkowski, Phys. Rev. D , 043502 (2004), arXiv:astro-ph/0310473.[18] N. Padmanabhan and D. P. Finkbeiner, Phys. Rev. D , 023508 (2005), arXiv:astro-ph/0503486.[19] T. R. Slatyer, N. Padmanabhan, and D. P. Finkbeiner, Phys. Rev. D , 043526 (2009),arXiv:0906.1197 [astro-ph.CO].[20] G. F. Smoot et al. (COBE), Astrophys. J. Lett. , L1 (1992).[21] C. Netterﬁeld et al. (Boomerang), Astrophys. J. , 604 (2002), arXiv:astro-ph/0104460.[22] J. Kovac, E. Leitch, C. Pryke, J. Carlstrom, N. Halverson, and W. Holzapfel, Nature ,772 (2002), arXiv:astro-ph/0209478.[23] C. L. Bennett et al. (WMAP), Astrophys. J. , 1 (2003), arXiv:astro-ph/0301158.[24] (2006), arXiv:astro-ph/0604069.[25] M. Ackermann et al. (Fermi-LAT), Phys. Rev. Lett. , 231301 (2015), arXiv:1503.02641[astro-ph.HE].[26] S. Hoof, A. Geringer-Sameth, and R. Trotta, (2018), 10.1088/1475-7516/2020/02/012,arXiv:1812.06986 [astro-ph.CO].[27] L. Oakes et al. , PoS ICRC2019 , 012 (2020), arXiv:1909.06310 [astro-ph.HE].[28] M. Boudaud, J. Lavalle, and P. Salati, Phys. Rev. Lett. , 021103 (2017), arXiv:1612.07698[astro-ph.HE].[29] L. Roszkowski, E. M. Sessolo, and S. Trojanowski, Rept. Prog. Phys. , 066201 (2018),arXiv:1707.06277 [hep-ph].[30] T. R. Slatyer, Phys. Rev. D , 123513 (2013), arXiv:1211.0283 [astro-ph.CO].[31] M. S. Madhavacheril, N. Sehgal, and T. R. Slatyer, Phys. Rev. D , 103508 (2014), rXiv:1310.3815 [astro-ph.CO].[32] T. R. Slatyer, Phys. Rev. D , 023521 (2016), arXiv:1506.03812 [astro-ph.CO].[33] T. R. Slatyer, Phys. Rev. D , 023527 (2016), arXiv:1506.03811 [hep-ph].[34] P. Ade et al. (Planck), Astron. Astrophys. , A13 (2016), arXiv:1502.01589 [astro-ph.CO].[35] N. Aghanim et al. (Planck), (2018), arXiv:1807.06209 [astro-ph.CO].[36] T. R. Slatyer and C.-L. Wu, Phys. Rev. D , 023010 (2017), arXiv:1610.06933 [astro-ph.CO].[37] K. O’Shea and R. Nash, arXiv e-prints , arXiv:1511.08458 (2015), arXiv:1511.08458 [cs.NE].[38] S. Albawi, T. A. Mohammed, and S. Al-Zawi, in (2017) pp. 1–6.[39] T. Kanzaki and M. Kawasaki, Phys. Rev. D78 , 103004 (2008), arXiv:0805.3969 [astro-ph].[40] T. Kanzaki, M. Kawasaki, and K. Nakayama, Prog. Theor. Phys. , 853 (2010),arXiv:0907.3985 [astro-ph.CO].[41] M. Kawasaki, K. Nakayama, and T. Sekiguchi, Phys. Lett.

B756 , 212 (2016),arXiv:1512.08015 [astro-ph.CO].[42] J. Lesgourgues, arXiv e-prints , arXiv:1104.2932 (2011), arXiv:1104.2932 [astro-ph.IM].[43] B. Audren, J. Lesgourgues, K. Benabed, and S. Prunet, JCAP , 001 (2013),arXiv:1210.7183 [astro-ph.CO].[44] T. Brinckmann and J. Lesgourgues, (2018), arXiv:1804.07261 [astro-ph.CO].[45] K. Cheung, J.-L. Kuo, K.-W. Ng, and Y.-L. S. Tsai, Phys. Lett.

B789 , 137 (2019),arXiv:1803.09398 [astro-ph.CO].[46] X. Huang, Y.-L. S. Tsai, and Q. Yuan, Comput. Phys. Commun. , 252 (2017),arXiv:1603.07119 [hep-ph].[47] M. Cirelli, G. Corcella, A. Hektor, G. Hutsi, M. Kadastik, P. Panci, M. Raidal, F. Sala,and A. Strumia, JCAP , 051 (2011), [Erratum: JCAP1210,E01(2012)], arXiv:1012.4515[hep-ph].[48] P. Ciafaloni, D. Comelli, A. Riotto, F. Sala, A. Strumia, and A. Urbano, JCAP , 019(2011), arXiv:1009.0224 [hep-ph].[49] R. Diamanti, L. Lopez-Honorez, O. Mena, S. Palomares-Ruiz, and A. C. Vincent, JCAP , 017 (2014), arXiv:1308.2578 [astro-ph.CO].[50] H. Liu, T. R. Slatyer, and J. Zavala, Phys. Rev. D , 063507 (2016), arXiv:1604.02457[astro-ph.CO].

51] H. An, M. B. Wise, and Y. Zhang, Phys. Lett. B , 121 (2017), arXiv:1606.02305 [hep-ph].[52] M. Ibe, H. Murayama, and T. Yanagida, Phys. Rev. D , 095009 (2009), arXiv:0812.0072[hep-ph].[53] W.-L. Guo and Y.-L. Wu, Phys. Rev. D , 055012 (2009), arXiv:0901.1450 [hep-ph].[54] X.-J. Bi, P.-F. Yin, and Q. Yuan, Phys. Rev. D85 , 043526 (2012), arXiv:1106.6027 [hep-ph].[55] H. Yuksel and M. D. Kistler, Phys. Rev. D , 023502 (2008), arXiv:0711.2906 [astro-ph].[56] S. Palomares-Ruiz, Phys. Lett. B , 50 (2008), arXiv:0712.1937 [astro-ph].[57] L. Zhang, C. Weniger, L. Maccione, J. Redondo, and G. Sigl, JCAP , 027 (2010),arXiv:0912.4504 [astro-ph.HE].[58] M. Cirelli, P. Panci, and P. D. Serpico, Nucl. Phys. B , 284 (2010), arXiv:0912.0663[astro-ph.CO].[59] N. F. Bell, A. J. Galea, and K. Petraki, Phys. Rev. D , 023514 (2010), arXiv:1004.1008[astro-ph.HE].[60] L. Dugger, T. E. Jeltema, and S. Profumo, JCAP , 015 (2010), arXiv:1009.5988 [astro-ph.HE].[61] M. Cirelli, E. Moulin, P. Panci, P. D. Serpico, and A. Viana, Phys. Rev. D , 083506 (2012),arXiv:1205.5283 [astro-ph.CO].[62] K. Murase and J. F. Beacom, JCAP , 043 (2012), arXiv:1206.2595 [hep-ph].[63] R. Essig, E. Kuﬂik, S. D. McDermott, T. Volansky, and K. M. Zurek, JHEP , 193 (2013),arXiv:1309.4091 [hep-ph].[64] Y. Mambrini, S. Profumo, and F. S. Queiroz, Phys. Lett. B , 807 (2016), arXiv:1508.06635[hep-ph].[65] A. Lewis, A. Challinor, and A. Lasenby, Astrophys. J. , 473 (2000), arXiv:astro-ph/9911177 [astro-ph].[66] C. Howlett, A. Lewis, A. Hall, and A. Challinor, JCAP , 027 (2012), arXiv:1201.3654[astro-ph.CO].[67] N. Aghanim et al. (Planck), Astron. Astrophys. , A5 (2020), arXiv:1907.12875 [astro-ph.CO].[68] Y.-L. S. Tsai, Y.-L. Chung, Q. Yuan, and K. Cheung, (2020), arXiv:2011.11930 [astro-ph.HE].[69] D. P. Kingma and J. Ba, arXiv e-prints , arXiv:1412.6980 (2014), arXiv:1412.6980 [cs.LG].[70] J. Cang, Y. Gao, and Y.-Z. Ma, Phys. Rev. D , 103005 (2020), arXiv:2002.03380 [astro- h.CO].h.CO].