Inflation after Planck: and the winners are
Jérôme Martin
Institut d'Astrophysique de Paris, UMR 7095-CNRS, Université Pierre et Marie Curie, 98bis boulevard Arago, 75014 Paris (France)
We review the constraints that the recently released Cosmic Microwave Background (CMB) Planck data put on inflation and we argue that single field slow-roll inflationary scenarios (with minimal kinetic term) are favored. Then, within this class of models, by means of Bayesian inference, we show how one can rank the scenarios according to their performances, leading to the identification of "the best models of inflation".
1 Introduction

The theory of inflation 1,2,3,4,5,6 is currently the favorite paradigm for describing the physical conditions that prevailed in the very early Universe. The recently released Planck 2013 data 7,8 are in good agreement with its main predictions. The Universe is found to be spatially flat, Ω_K = −0.0005 (+0.0065, −0.0066) at 95% CL, which is of course very consistent with inflation, and the cosmological fluctuations are adiabatic (at 95% CL) and Gaussian, f_NL^loc = 2.7 ± 5.8, f_NL^eq = −42 ± 75 and f_NL^ortho = −25 ± 39 7. Another important message of the Planck data 8 is the fact that a tilt in the power spectrum has now been detected at a significant statistical level, n_S = 0.9603 ± 0.0073, that is to say a deviation from exact scale invariance at more than 5σ. In addition, neither a significant running nor a significant running of the running have been detected, since it was found that dn_S/d ln k = −0.013 ± 0.009 (Planck+WP) and d²n_S/d(ln k)² = 0.02 ± 0.016 (WMAP+WP), with a pivot scale chosen at k_* = 0.05 Mpc⁻¹.

Based on the above discussion, it is clear that single field slow-roll models (with a minimal kinetic term) are favored from an observational point of view, since this class of models precisely predicts no entropy perturbations and negligible non-Gaussianities. Of course, this does not mean that other inflationary scenarios are ruled out, but simply that they are not needed to explain the data. Inflation therefore appears as a simple and non-trivial, but non-exotic, theory. It should however be clear that, even if we restrict our considerations to this simple class of models, there still remains a very large number of possible models 9. Then come the questions of how one can constrain these models, estimate their performances and rank them, in a statistically well-defined fashion, in order to find "the best model(s) of inflation". Once a well-justified method has been designed, it can be applied to all inflationary models in order to actually identify which scenario is favored by the Planck data. Answering and discussing these questions is the main subject of the present paper.

This article is organized as follows. In the next section, Sec. 2, we briefly review slow-roll inflation. Then, in Sec. 3, we define and discuss what is meant by "a model A is better than a model B". For this purpose, we review the Bayesian model comparison approach, we quickly recall how the Bayesian evidence of a slow-roll inflationary model can be estimated, and we present the results of Ref. 10, which give the model winners. Finally, in the conclusion, Sec. 4, we summarize our results.

2 Slow-roll inflation

Slow-roll inflation is a very simple system.
It consists in one scalar field with a minimal kinetic term and a potential V(φ), and its behavior is controlled by the Friedmann-Lemaître and Klein-Gordon equations, namely

    H^2 = \frac{1}{3 M_\mathrm{Pl}^2} \left[ \frac{\dot{\phi}^2}{2} + V(\phi) \right], \qquad \ddot{\phi} + 3 H \dot{\phi} + V_\phi = 0,    (1)

where H ≡ ȧ/a denotes the Hubble parameter, a(t) being the Friedmann-Lemaître-Robertson-Walker (FLRW) scale factor and ȧ its derivative with respect to cosmic time t. M_Pl ≡ (8πG)^{−1/2} denotes the reduced Planck mass. A subscript φ means a derivative with respect to the inflaton field. Therefore, the only unknown function is the potential and, here, we try to constrain its shape using the Planck data.

When the potential is no longer flat enough (this usually happens when the system approaches its ground state, i.e. the minimum of the potential), inflation stops, the inflaton field decays 11,
12, the decay products thermalize 13, and this is how inflation is smoothly connected to the standard hot Big Bang phase. Let ρ and P be the energy density and pressure of the effective fluid dominating the Universe during reheating and w_reh ≡ P/ρ the corresponding "instantaneous" equation of state. One can also define the mean equation of state parameter, w̄_reh, by 14

    \bar{w}_\mathrm{reh} \equiv \frac{1}{\Delta N} \int_{N_\mathrm{end}}^{N_\mathrm{reh}} w_\mathrm{reh}(n)\, \mathrm{d}n,    (2)

where ΔN ≡ N_reh − N_end is the total number of e-folds during reheating, N_end being the number of e-folds at the end of inflation and N_reh the number of e-folds at which reheating is completed and the radiation dominated era begins. Then, one introduces a new parameter 14

    \ln R_\mathrm{rad} \equiv \frac{\Delta N}{4} \left( -1 + 3 \bar{w}_\mathrm{reh} \right).    (3)

As discussed in detail in Ref. 14, this parameter completely characterizes the reheating phase and its knowledge is necessary in order to work out the inflationary predictions for the CMB. In particular, it can be related to the so-called reheating temperature through 14

    T_\mathrm{reh} = \left( \frac{30\, \rho_\mathrm{end}}{\pi^2 g_\star} \right)^{1/4} R_\mathrm{rad}^{3 (1 + \bar{w}_\mathrm{reh}) / (1 - 3 \bar{w}_\mathrm{reh})},    (4)

where ρ_end is the energy density at the end of inflation, which is known when V(φ) has been chosen, and g_⋆ is the number of degrees of freedom at that time.

Let us now turn to the description of inflationary perturbations. Two types of fluctuations are relevant for inflation: density perturbations and primordial gravity waves. The density perturbations are described in terms of the Mukhanov-Sasaki variable v(η, x). In the Schrödinger approach, the quantum state of the system is described by a wavefunctional, Ψ[v(η, x)], which can be factorized into mode components as 15

    \Psi\left[ v(\eta, \mathbf{x}) \right] = \prod_{\mathbf{k}} \Psi_{\mathbf{k}} \left( v_{\mathbf{k}}^\mathrm{R}, v_{\mathbf{k}}^\mathrm{I} \right) = \prod_{\mathbf{k}} \Psi_{\mathbf{k}}^\mathrm{R} \left( v_{\mathbf{k}}^\mathrm{R} \right) \Psi_{\mathbf{k}}^\mathrm{I} \left( v_{\mathbf{k}}^\mathrm{I} \right),    (5)

where v_k^R denotes the real part of v_k and v_k^I its imaginary part. Each wavefunction obeys a Schrödinger equation with a Hamiltonian that can be deduced from a second order expansion of the action "gravity + inflaton field".
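As a brief aside, the reheating bookkeeping of Eqs. (2)-(4) can be sketched numerically. The snippet below is a toy illustration only: the values of ΔN, w̄_reh, ρ_end and g_⋆ are arbitrary placeholders (ρ_end expressed in reduced Planck units), not numbers taken from any specific model.

```python
import numpy as np

def ln_R_rad(delta_N, wbar):
    """ln R_rad = (Delta N / 4)(-1 + 3 wbar_reh), Eq. (3)."""
    return 0.25 * delta_N * (-1.0 + 3.0 * wbar)

def T_reh(rho_end, g_star, delta_N, wbar):
    """Reheating temperature of Eq. (4); valid for wbar != 1/3.

    rho_end is in reduced Planck units, so T_reh comes out in units of M_Pl.
    """
    R_rad = np.exp(ln_R_rad(delta_N, wbar))
    prefactor = (30.0 * rho_end / (np.pi**2 * g_star)) ** 0.25
    exponent = 3.0 * (1.0 + wbar) / (1.0 - 3.0 * wbar)
    return prefactor * R_rad**exponent

# Toy numbers: matter-like reheating (wbar = 0) lasting Delta N = 10 e-folds
print(T_reh(rho_end=1e-16, g_star=100.0, delta_N=10.0, wbar=0.0))
```

Note that for w̄_reh = 1/3 the exponent is singular: in that limit ln R_rad = 0 and a radiation-like reheating phase cannot be distinguished from the subsequent radiation era. For a matter-like reheating (w̄_reh = 0), a longer reheating phase lowers T_reh, as expected.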
Then, one can show that the solution is explicitly time-dependent and given by a Gaussian (η being the conformal time)

    \Psi_{\mathbf{k}}^{\mathrm{R},\mathrm{I}} \left( \eta, v_{\mathbf{k}}^{\mathrm{R},\mathrm{I}} \right) = N_{\mathbf{k}}(\eta)\, \mathrm{e}^{-\Omega_{\mathbf{k}}(\eta) \left( v_{\mathbf{k}}^{\mathrm{R},\mathrm{I}} \right)^2},    (6)

where the functions N_k(η) and Ω_k(η) can be expressed as 15

    \left| N_{\mathbf{k}} \right| = \left( \frac{2\, \Re\mathrm{e}\, \Omega_{\mathbf{k}}}{\pi} \right)^{1/4}, \qquad \Omega_{\mathbf{k}} = -\frac{i}{2} \frac{f_k'}{f_k}.    (7)

The function f_k obeys the equation of motion of a parametric oscillator, namely f_k'' + ω² f_k = 0, where the time-dependent frequency of this oscillator is given by ω²(η, k) = k² − (a√ε₁)''/(a√ε₁), k being the wavenumber of the mode under consideration and ε₁ ≡ −Ḣ/H² the first slow-roll parameter characterizing the cosmological expansion during inflation. For gravitational waves, one also obtains a Gaussian wavefunction, except that the fundamental frequency of the oscillator f_k is now given by ω² = k² − a''/a.

One of the great advantages of inflation is that it is possible to choose well-justified initial conditions. In brief, this is because, at the beginning of inflation, the physical wavelengths of Fourier modes of cosmological relevance today are much smaller than the Hubble radius. These modes do not feel spacetime expansion and, as a consequence, it is natural to choose the vacuum state as their initial state. Technically, this amounts to taking Ω_k = k/2 initially.

The evolution during inflation is conveniently described by the hierarchy of slow-roll parameters, defined by

    \epsilon_{n+1} \equiv \frac{\mathrm{d} \ln |\epsilon_n|}{\mathrm{d} N}, \qquad n \geq 0,    (8)

where ε₀ ≡ H_ini/H. The slow-roll conditions refer to a situation where all the ε_n's satisfy ε_n ≪
1. From this definition, we see that ω²(k, η) for density perturbations depends on ε₁, ε₂ and ε₃ while, for gravity waves, it only depends on ε₁. Notice that, since H(φ) and V(φ) are related through the Einstein equations, the parameters ε_n can also be expressed in terms of the successive derivatives of the potential, namely

    \epsilon_1 \simeq \frac{M_\mathrm{Pl}^2}{2} \left( \frac{V_\phi}{V} \right)^2,    (9)

    \epsilon_2 \simeq 2 M_\mathrm{Pl}^2 \left[ \left( \frac{V_\phi}{V} \right)^2 - \frac{V_{\phi\phi}}{V} \right],    (10)

    \epsilon_2 \epsilon_3 \simeq 2 M_\mathrm{Pl}^4 \left[ \frac{V_{\phi\phi\phi} V_\phi}{V^2} - 3 \frac{V_{\phi\phi}}{V} \left( \frac{V_\phi}{V} \right)^2 + 2 \left( \frac{V_\phi}{V} \right)^4 \right].    (11)

The slow-roll approximation also allows us to solve the equation that controls the evolution of the function f_k and, therefore, of the wavefunction. Since the initial conditions are also completely specified (see the above discussion), the function f_k and, hence, the wavefunction, is completely known. One can then calculate the two-point correlation function of the Mukhanov-Sasaki variable or, in Fourier space, the power spectrum^a. This involves a double expansion. The power spectrum is first expanded around a chosen pivot scale k_* such that

    \frac{\mathcal{P}(k)}{\mathcal{P}_0} = a_0 + a_1 \ln \left( \frac{k}{k_*} \right) + \frac{a_2}{2} \ln^2 \left( \frac{k}{k_*} \right) + \dots,    (13)

where, for density perturbations, \mathcal{P}_{\zeta 0} = H^2 / \left( 8 \pi^2 \epsilon_1 M_\mathrm{Pl}^2 \right) and, then, the coefficients a_n are expanded in terms of the slow-roll parameters. Concretely, for scalar perturbations, at second order in the slow-roll approximation, one obtains 16,17

    a_0 = 1 - 2 (C + 1) \epsilon_{1*} - C \epsilon_{2*} + \left( 2 C^2 + 2 C + \frac{\pi^2}{2} - 5 \right) \epsilon_{1*}^2 + \left( C^2 - C + \frac{7 \pi^2}{12} - 7 \right) \epsilon_{1*} \epsilon_{2*} + \left( \frac{C^2}{2} + \frac{\pi^2}{8} - 1 \right) \epsilon_{2*}^2 + \left( -\frac{C^2}{2} + \frac{\pi^2}{24} \right) \epsilon_{2*} \epsilon_{3*},    (14)

    a_1 = -2 \epsilon_{1*} - \epsilon_{2*} + 2 (2 C + 1) \epsilon_{1*}^2 + (2 C - 1) \epsilon_{1*} \epsilon_{2*} + C \epsilon_{2*}^2 - C \epsilon_{2*} \epsilon_{3*},    (15)

    a_2 = 4 \epsilon_{1*}^2 + 2 \epsilon_{1*} \epsilon_{2*} + \epsilon_{2*}^2 - \epsilon_{2*} \epsilon_{3*},    (16)

where C ≡ γ_E + ln 2 − 2 ≈ −0.7296, γ_E being the Euler constant. ε_{n*} denotes the value of the function ε_n at Hubble radius crossing during inflation. For gravitational waves, the power spectrum has the same structure but the expressions of the coefficients a_n differ.
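As an illustration of Eqs. (9) and (10), the snippet below evaluates the first two slow-roll parameters for a large-field monomial potential V(φ) ∝ φ^p, a standard textbook example (not tied to any specific model discussed here), for which ε₁ = (p²/2)(M_Pl/φ)² and ε₂ = 2p (M_Pl/φ)²:

```python
M_PL = 1.0  # reduced Planck mass, set to one

def eps1(V, V_phi, phi):
    """First slow-roll parameter, Eq. (9): eps1 ~ (M_Pl^2/2)(V_phi/V)^2."""
    return 0.5 * M_PL**2 * (V_phi(phi) / V(phi)) ** 2

def eps2(V, V_phi, V_phiphi, phi):
    """Second slow-roll parameter, Eq. (10): eps2 ~ 2 M_Pl^2 [(V_phi/V)^2 - V_phiphi/V]."""
    return 2.0 * M_PL**2 * ((V_phi(phi) / V(phi)) ** 2 - V_phiphi(phi) / V(phi))

# Large-field potential V ~ phi^p with p = 2 (the overall scale drops out of eps_n)
p = 2
V        = lambda phi: phi**p
V_phi    = lambda phi: p * phi**(p - 1)
V_phiphi = lambda phi: p * (p - 1) * phi**(p - 2)

phi = 15.0  # field value in Planck units
print(eps1(V, V_phi, phi), eps2(V, V_phi, V_phiphi, phi))
# eps1 = p^2/(2 phi^2) = 2/225, eps2 = 2p/phi^2 = 4/225: both << 1, deep in slow roll
```

For this potential, ε₁ reaches unity (the end of inflation) at φ = p/√2 in Planck units, consistent with the statement above that inflation stops when the potential is no longer flat enough.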
^a For density perturbations, the definition of the power spectrum reads

    \mathcal{P}_\zeta(k) \equiv \frac{k^3}{4 \pi^2 M_\mathrm{Pl}^2} \left| \frac{v_k}{a \sqrt{\epsilon_1}} \right|^2.    (12)

In order to make concrete predictions, we must calculate the numerical values of the quantities ε_{n*}. In order to do so, one needs to know the slow-roll trajectory and to calculate accurately when inflation stops. As a result, ε_{n*} usually depends on θ_inf, the parameters of the potential V(φ), and on the reheating temperature: ε_{n*} = ε_{n*}(θ_inf, T_reh).

The above considerations explain how the CMB can tell us something about inflation. Indeed, CMB measurements constrain the power spectrum, that is to say, given the expression of P(k) above, the values of the parameters ε_{n*}(θ_inf, T_reh). These parameters carry information about the shape of the potential (recall the expression of the slow-roll parameters in terms of the derivatives of the potential) and about the reheating temperature. As a consequence, one can infer the properties of the inflaton potential V(φ) and learn about the physical conditions that prevailed in the early universe.

3 Bayesian model comparison

In the previous section, we have described how one can calculate the predictions of a given inflationary model. However, we would also like to compare the performances of the different inflationary scenarios, and one way to achieve this program is to compare the quality of the fits provided by the different models.

Let us now briefly describe how this can be achieved 18,19,
20. Let us call M₁ and M₂ two competing models aiming at explaining some data D (here, of course, we have in mind the Cosmic Microwave Background measurements), model one depending on one parameter, θ, and model two depending on two parameters, α and β. Their likelihood functions can be written as

    \mathcal{L}_1(D|\theta) = \mathcal{L}_{1,\mathrm{max}}\, \mathrm{e}^{-\chi_1^2(\theta)/2}, \qquad \mathcal{L}_2(D|\alpha, \beta) = \mathcal{L}_{2,\mathrm{max}}\, \mathrm{e}^{-\chi_2^2(\alpha, \beta)/2},    (17)

where χ² is the effective chi-squared of the corresponding model, which we do not need to specify at this stage. The quality of the fits can be estimated by computing the ratio of the maxima of the two likelihoods. However, this does not give us information regarding the complexity of the two models^b. If, for instance, model M₂ achieves a very good fit only at the price of a fine-tuning, while M₁ "naturally" performs well, one may wish to penalize M₂ for its complexity. This "Occam's razor" criterion is automatically included if one characterizes a model by its Bayesian evidence 19. The Bayesian evidence is the integral of the likelihood function over the prior space. Concretely, for M₁ and M₂, this leads to

    E_1 = \int \mathcal{L}_1(D|\theta)\, \pi(\theta)\, \mathrm{d}\theta, \qquad E_2 = \int \mathcal{L}_2(D|\alpha, \beta)\, \pi(\alpha, \beta)\, \mathrm{d}\alpha\, \mathrm{d}\beta.    (18)

The prior distributions π(θ) and π(α, β), satisfying ∫π(θ) dθ = 1 [and a similar expression for π(α, β)], encode what we know about the parameter θ before our information is updated when we learn about the data D. Let us notice that the likelihood functions are not normalized, in the sense that ∫L(D|θ) dθ ≠ 1. For simplicity, let us now assume that the prior π(θ) is flat in the range [θ_min, θ_max] and vanishes elsewhere. Because the distribution is normalized, one has π(θ) = 1/Δθ with Δθ = θ_max − θ_min. Let us also assume that the likelihood function has a bell shape (for instance, but not necessarily, a Gaussian function) characterized by the width δθ.
Let us finally suppose that the data give more information than the prior, in other words that the likelihood is more peaked than the prior. In that case, the Bayesian evidence of model M₁ can be approximated by

    E_1 \simeq \mathcal{L}_{1,\mathrm{max}}\, \frac{\delta\theta}{\Delta\theta}.    (19)

^b In the following, we will introduce a quantity called the "Bayesian complexity". Here, we use the word "complexity" in the standard sense, i.e. a model is more complicated than another if, for instance, it has more parameters or more fine-tuning. At this stage, it should not be confused with the Bayesian complexity.

In the same fashion, with the same assumptions (and obvious notations), the evidence of model M₂ can be expressed as

    E_2 \simeq \mathcal{L}_{2,\mathrm{max}}\, \frac{\delta\alpha}{\Delta\alpha}\, \frac{\delta\beta}{\Delta\beta}.    (20)

Then, applying Bayes' theorem, the probability of model M₁ is given by p(M₁|D) = E₁ π(M₁)/p(D), and a similar formula holds for p(M₂|D). In this expression, π(M₁) represents the prior of model M₁ and the quantity p(D) is a normalization factor. If we say that, initially, the two models are equally probable, that is to say π(M₁) = π(M₂), then the ratio of their posterior probabilities, the so-called Bayes factor, can be expressed as

    B_{21} \equiv \frac{p(M_2|D)}{p(M_1|D)} = \frac{E_2}{E_1} = \frac{\mathcal{L}_{2,\mathrm{max}}}{\mathcal{L}_{1,\mathrm{max}}}\, \frac{\delta\alpha}{\Delta\alpha}\, \frac{\delta\beta}{\Delta\beta}\, \frac{\Delta\theta}{\delta\theta}.    (21)

We see that the Bayes factor is controlled by the ratio L_{2,max}/L_{1,max}, but now weighted by a factor, the so-called Occam factor, which penalizes the more complicated model, M₂, for any wasted parameter space. If, for instance, we take δα/Δα = δβ/Δβ = δθ/Δθ = 0.01, then B₂₁ = 0.01 L_{2,max}/L_{1,max}, and the more complicated model can win only if its likelihood at the "best fit point" is two orders of magnitude larger than that of M₁. So the best model is the model which achieves the best compromise between simplicity and quality of the fit.

From the previous considerations, we see that the Bayesian evidence is an ideal tool to rank models and to find the best model. Nevertheless, it has the following property, which could be considered as a shortcoming. Suppose we define a model M₃ such that it is in fact model M₂ but with a third parameter, say γ, such that this new parameter does not affect in any way the fit to the data; in other words, such that the likelihood is flat along γ. In that case, the evidence of model M₃ is given by

    E_3 = \int \mathcal{L}(D|\alpha, \beta, \gamma)\, \pi(\alpha)\, \pi(\beta)\, \pi(\gamma)\, \mathrm{d}\alpha\, \mathrm{d}\beta\, \mathrm{d}\gamma = \int \mathcal{L}(D|\alpha, \beta)\, \pi(\alpha)\, \pi(\beta)\, \pi(\gamma)\, \mathrm{d}\alpha\, \mathrm{d}\beta\, \mathrm{d}\gamma    (22)

    = \int \mathcal{L}(D|\alpha, \beta)\, \pi(\alpha)\, \pi(\beta)\, \mathrm{d}\alpha\, \mathrm{d}\beta \int \pi(\gamma)\, \mathrm{d}\gamma = E_2.    (23)

Therefore, the two models have the same evidence, despite the fact that M₂ is obviously simpler than M₃. In order to break this degeneracy, one has to introduce another quantity, the Bayesian complexity 18, which allows us to distinguish M₂ and M₃.

In order to discuss the definition of the complexity, we work with a one-parameter model only, i.e. M₁ (the generalization to an arbitrary number of parameters is straightforward), and we explicitly assume that the likelihood of the model is a Gaussian, namely

    \mathcal{L}(D|\theta) = \mathcal{L}_{1,\mathrm{max}}\, \mathrm{e}^{-(\theta - d)^2/(2 \sigma^2)},    (24)

where d represents a measurement of the parameter θ. Regarding the prior, instead of considering a flat distribution as before, we now assume that it is given by a Gaussian centered at θ = μ,

    \pi(\theta) = \frac{1}{\Sigma \sqrt{2\pi}}\, \mathrm{e}^{-(\theta - \mu)^2/(2 \Sigma^2)}.    (25)

We can check that this distribution is properly normalized.
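With the Gaussian likelihood of Eq. (24) and the Gaussian prior of Eq. (25), the evidence integral of Eq. (18) can be done in closed form, as shown below. The following sketch cross-checks the closed form of Eq. (29) against a brute-force quadrature, and also evaluates the Bayesian complexity of Eq. (32); the values of d, σ, μ and Σ are arbitrary toy numbers:

```python
import numpy as np

def evidence_numerical(d, sigma, mu, Sigma, L_max=1.0, n=200001, half_width=50.0):
    """Brute-force quadrature of E1 = integral of L(D|theta) pi(theta) dtheta."""
    theta = np.linspace(mu - half_width, mu + half_width, n)
    L = L_max * np.exp(-((theta - d) ** 2) / (2.0 * sigma**2))
    prior = np.exp(-((theta - mu) ** 2) / (2.0 * Sigma**2)) / (Sigma * np.sqrt(2.0 * np.pi))
    f = L * prior
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(theta)))  # trapezoid rule

def evidence_analytic(d, sigma, mu, Sigma, L_max=1.0):
    """Closed form, Eq. (29): E1 = L_max sigma/sqrt(sigma^2+Sigma^2) exp(-(mu-d)^2/[2(sigma^2+Sigma^2)])."""
    return (L_max * sigma / np.sqrt(sigma**2 + Sigma**2)
            * np.exp(-((mu - d) ** 2) / (2.0 * (sigma**2 + Sigma**2))))

def complexity(sigma, Sigma):
    """Bayesian complexity for this Gaussian case, Eq. (32): C_b = 1/(1 + sigma^2/Sigma^2)."""
    return 1.0 / (1.0 + sigma**2 / Sigma**2)

# Toy values with the data more informative than the prior (sigma << Sigma)
d, sigma, mu, Sigma = 0.3, 0.1, 0.0, 2.0
print(evidence_numerical(d, sigma, mu, Sigma), evidence_analytic(d, sigma, mu, Sigma))
print(complexity(sigma, Sigma))  # close to 1: the parameter is well measured
```

Since σ/Σ = 0.05 here, the evidence is suppressed relative to L_max by roughly the Occam factor σ/Σ, illustrating the penalty for wasted prior volume discussed around Eq. (21).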
These new assumptions are made for convenience only and do not change the above discussion (in fact, not quite exactly, see below). In particular, now, δθ is clearly given by σ and Δθ by Σ, so that the condition that the data are more informative than the prior, δθ ≪ Δθ, corresponds to σ ≪ Σ. Then one can calculate the posterior distribution of the parameter θ,

    p(\theta|D) = \frac{1}{E_1}\, \mathcal{L}(D|\theta)\, \pi(\theta)    (26)

    = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{1}{\Sigma^2} + \frac{1}{\sigma^2}}\, \exp\left[ -\frac{1}{2} \left( \frac{1}{\Sigma^2} + \frac{1}{\sigma^2} \right) \left( \theta - \frac{d + \mu\, \sigma^2/\Sigma^2}{1 + \sigma^2/\Sigma^2} \right)^2 \right],    (27)

which is a properly normalized Gaussian with mean and standard deviation respectively given by

    \frac{d + \mu\, \sigma^2/\Sigma^2}{1 + \sigma^2/\Sigma^2}, \qquad \frac{\sigma}{\sqrt{1 + \sigma^2/\Sigma^2}}.    (28)

On the other hand, the evidence of the model can be expressed as

    E_1 = \mathcal{L}_{1,\mathrm{max}}\, \frac{\sigma}{\sqrt{\sigma^2 + \Sigma^2}}\, \mathrm{e}^{-(\mu - d)^2/[2 (\sigma^2 + \Sigma^2)]}.    (29)

This result is compatible with the previous discussion. Indeed, if the likelihood is more informative than the prior, then Σ/σ ≫ 1 and E₁ ∼ L_{1,max} σ/Σ, which is equivalent to L_{1,max} δθ/Δθ and shows that the Occam factor is simply σ/Σ.

We now come to the definition of the Bayesian complexity, denoted by C_b in what follows. It reads 18

    C_\mathrm{b} = \left\langle \chi^2(\theta) \right\rangle - \chi^2\left( \langle \theta \rangle \right),    (30)

where the symbol ⟨···⟩ means an average with a weight given by the posterior p(θ|D). In the above expression, the effective χ² is defined by −2 ln L, which, in the present case, reads

    \chi^2(\theta) = \frac{(\theta - d)^2}{\sigma^2} - 2 \ln \mathcal{L}_{1,\mathrm{max}}.    (31)

Then, using the explicit expression for the posterior distribution, see Eq. (26), and the previous expression for the χ², one obtains the following formula for the Bayesian complexity

    C_\mathrm{b} = \int p(\theta|D)\, \chi^2(\theta)\, \mathrm{d}\theta - \chi^2\left[ \int p(\theta|D)\, \theta\, \mathrm{d}\theta \right] = \frac{1}{1 + \sigma^2/\Sigma^2}.    (32)

Therefore, if σ ≪ Σ, one has C_b ≃ 1. In other words, since the likelihood function is much more peaked than the prior, the parameter θ is well measured and the complexity is one. If, on the contrary, σ ≫ Σ, then C_b ≃ 0 and the data do not allow us to measure the parameter θ. In the multidimensional case (i.e. a model with n parameters), one has C_b = Σ_{i=1}^{n} 1/(1 + σ_i²/Σ_i²), and the complexity gives the number of parameters that have been measured with the data D or, in other words, the number of eigendirections in which the likelihood is more informative than the prior.

Finally, to conclude this section, let us try to derive the complexity for another very simple one-parameter model, similar to the example we treated at the beginning of this article. This will help us to understand the meaning of complexity in another context 21. We assume that the likelihood is flat, centered at θ = 0, with a width given by δθ and a height L_max. We also assume that the prior is flat in the range [−Δθ/2, Δθ/2] and has height 1/Δθ (and is less informative than the likelihood). In that case, it is straightforward to estimate the evidence of the model, which is E = L_max δθ/Δθ. On the other hand, the posterior on the parameter θ can be expressed as

    p(\theta|D) = \frac{\mathcal{L}_\mathrm{max}}{\Delta\theta\, E} = \frac{1}{\delta\theta}, \qquad \text{for} \quad -\frac{\delta\theta}{2} < \theta < \frac{\delta\theta}{2},    (33)

and vanishes otherwise. As a consequence, one finds that the complexity can be written as

    C_\mathrm{b} = -\int_{-\delta\theta/2}^{\delta\theta/2} \frac{2}{\delta\theta} \ln \mathcal{L}\, \mathrm{d}\theta + 2 \ln \mathcal{L}_\mathrm{max} = 0.    (34)

We see that one can no longer interpret the complexity as we did before. The reason is that the model we have used is too far from a Gaussian model, and the concept of complexity cannot really be defined in that case. This illustrates the limitation of this statistical tool, which is efficient only if the underlying statistics is not too far from Gaussian. This is a warning that should be kept in mind in the following.

|ln B_{i,REF}|     Odds        Strength of evidence
< 1.0              < 3:1       Inconclusive
1.0 - 2.5          ~ 3:1       Weak evidence
2.5 - 5.0          ~ 12:1      Moderate evidence
> 5.0              ~ 150:1     Strong evidence
Table 1: Jeffreys scale for evaluating the strength of evidence when comparing two models, M_i versus a reference model M_REF.

Following the above considerations, it should now be clear that one way to estimate the performances of inflationary models (in explaining the recently released Planck data) is to calculate their evidence and their complexity. Then, one can rank them in a statistically consistent way and find the best scenarios. The predictions of all single field scenarios have been worked out and compared to Planck data in Encyclopædia Inflationaris 9. For each model M_i, one computes the Bayes factor

    B_{i,\mathrm{REF}} \equiv \frac{E(D|\mathcal{M}_i)}{E(D|\mathcal{M}_\mathrm{REF})},    (35)

where the reference model was taken to be the Starobinsky model. The "Jeffreys scale", see Table 1, gives an empirical prescription for translating the values of B_{i,REF} into strengths of belief.

One can summarize our results as follows. Firstly, for convenience, one can change the reference point of the Bayes factor and estimate the quantity B_{i,BEST} ≡ E(D|M_i)/E(D|M_BEST) (rather than B_{i,REF} as before) with non-committal model priors. Then, one uses the Jeffreys scale with B_{i,BEST}, instead of B_{i,REF}, and counts the number of models in the "inconclusive", "weak evidence", "moderate evidence" and "strong evidence" zones. The models in the "inconclusive" category can be viewed as the best models. We have found that this is the case for 52 models out of a total of 193 models, that is to say 26% of the models. Therefore, this means that ≃ 73% of the inflationary scenarios can now be considered as disfavored and/or ruled out by the Planck data.

Secondly, one determines the number of unconstrained parameters, N_uc^i, which is the number of parameters of model M_i, N_param^i, minus its complexity C_b^i:

    N_\mathrm{uc}^i = N_\mathrm{param}^i - C_\mathrm{b}^i.    (36)

Then, among the models in the "inconclusive" region, one should prefer models for which N_uc^i ≃ 0. If one retains the criterion 0 < N_uc^i < 1, then one reduces the number of "good models" to 17, that is to say ≃ 9% of the Encyclopædia Inflationaris scenarios.

These results are summarized in Fig. 1, which shows the histogram corresponding to the number of models in each Jeffreys category with a given value of N_uc^i. A complete analysis and the list of the best models can be found in Ref. 10.

Figure 1 – Histogram representing the number of inflationary models after Planck 2013 according to the Jeffreys category and the number of unconstrained parameters.

4 Conclusions

In these proceedings, we have analyzed the implications of the recently released Planck data for inflation. We have argued that single field slow-roll scenarios with a minimal kinetic term are favored by Planck 2013. Then, we have designed specific Bayesian tools to further constrain the models within the class of favored scenarios. We have shown that Planck 2013 can then single out about ∼ 10% of the models, thus strongly reducing the inflationary landscape compatible with the astrophysical observations. Our results demonstrate concretely that CMB data can constrain the physics of the early universe in an efficient way. In the near future, the next release of Planck measurements should allow us to learn even more about inflation.
Acknowledgments
I would like to thank C. Ringeval and V. Vennin for careful reading of the manuscript.
References
1. A. Starobinsky, Phys. Lett. B 91, 99 (1980).
2. A. Guth, Phys. Rev. D 23, 347 (1981).
3. V. Mukhanov and G. Chibisov, JETP Lett. 33, 532 (1981).
4. A. Linde, Phys. Lett. B 108, 389 (1982).
5. A. Starobinsky, Phys. Lett. B 117, 175 (1982).
6. J. Martin, Lect. Notes Phys. 738, 193 (2008).
7. P. Ade et al., arXiv:1303.5084.
8. P. Ade et al., arXiv:1303.5076.
9. J. Martin, C. Ringeval and V. Vennin, arXiv:1303.3787.
10. J. Martin, C. Ringeval, R. Trotta and V. Vennin, arXiv:1312.3529.
11. M. Turner, Phys. Rev. D 28, 1243 (1983).
12. L. Kofman, A. Linde and A. Starobinsky, Phys. Rev. D 56, 3258 (1997).
13. D. Podolsky, G. Felder, L. Kofman and M. Peloso, Phys. Rev. D 73, 023501 (2006).
14. J. Martin and C. Ringeval, Phys. Rev. D 82, 023511 (2010).
15. J. Martin, V. Vennin and P. Peter, Phys. Rev. D 86, 103524 (2012).
16. D. Schwarz and C. Terrero-Escalante, JCAP 0408, 003 (2004).
17. J. Martin, C. Ringeval and V. Vennin, JCAP 1306, 021 (2013).
18. M. Kunz, R. Trotta and D. Parkinson, Phys. Rev. D 74, 023503 (2006).
19. R. Trotta, Contemp. Phys. 49, 71 (2008).
20. J. Martin, C. Ringeval and R. Trotta, Phys. Rev. D 83, 063524 (2011).
21. F. Feroz, K. Cranmer, M. Hobson, R. de Austri and R. Trotta, JHEP 1106, 042 (2011).
22. C. Ringeval, arXiv:1312.2347.