Comparison between different methods of model selection in cosmology

Abstract

There are several methods for model selection in cosmology, and they pursue at least two major goals: finding the correct model, or predicting well. In this work we study well-known model selection methods, namely the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC) and the Bayesian evidence, and discuss how each paradigm pursues these goals. We also apply cross-validation, a model selection method that is less common in the cosmological literature. Using these methods, we compare two different scenarios in cosmology: the ΛCDM model and dynamical dark energy. We show that the methods lead to different results in model selection. BIC and the Bayesian evidence disfavor the dynamical dark energy scenarios with 2 or 3 extra degrees of freedom. In contrast, DIC and cross-validation prefer these dynamical models to the ΛCDM model. Considering the numerical results of the different analyses and combining the cosmological and statistical aspects of the subject, we propose cross-validation as an interesting method for model selection in cosmology, one that can lead to results different from those of the usual methods.


arXiv [astro-ph.CO]
Draft version February 23, 2021

COMPARISON BETWEEN DIFFERENT METHODS OF MODEL SELECTION IN COSMOLOGY

MEHDI REZAEI AND MOHAMMAD MALEKJANI

Department of Physics, Bu-Ali Sina University, Hamedan 65178, Iran
Iran Meteorological Organization, Hamedan Research Center for Applied Meteorology, Hamedan, Iran

INTRODUCTION

The first evidence for the accelerated expansion of the current Universe came from extensive surveys of high-redshift type Ia supernovae (SnIa) in the late 1990s (Riess et al. 1998; Perlmutter et al. 1999). Recent advances in observational cosmology have also confirmed this accelerating phase. To explain it, cosmologists have proposed many different cosmological models. These models can be divided into two main groups according to their theoretical assumptions about the cause of the accelerated expansion.
In the first group, the accelerated expansion is explained by introducing an unknown component with negative pressure, dubbed dark energy (DE). In the second group, the expansion is attributed to some modification of the standard theory of gravity on extragalactic scales (see Carroll et al. 2004; Kowalski et al. 2008; Nojiri & Odintsov 2011). In the framework of general relativity (GR), the cosmological constant Λ is the simplest and most likely candidate for DE, and it accounts for most of the energy budget of the universe (Bennett et al. 2003; Peiris et al. 2003; Spergel et al. 2003). The recent accelerated expansion of the universe is easily explained by assuming the cosmological constant (Λ) together with cold dark matter (CDM). Despite its observational success, the ΛCDM model suffers from theoretical problems such as the fine-tuning and coincidence problems (Weinberg 1989; Padmanabhan 2003; Copeland et al. 2006).

Furthermore, the ΛCDM model is plagued by observational tensions in the estimates of some key cosmological parameters. The first tension comes from the Lyman-α forest measurement of the baryon acoustic oscillations (Delubac et al. 2015), which suggests a smaller value of the matter density parameter than that obtained from CMB data. A second tension concerns the discrepancy between large-scale structure data (Macaulay et al. 2013) and the too-large value of σ_8 predicted by ΛCDM. A third is the statistically significant disagreement between the value of H_0 obtained from the classical distance ladder and that measured from the Planck CMB data (Freedman 2017). Quantitatively, the ΛCDM cosmology inferred from Planck CMB data gives H_0 ≈ 67 km/s/Mpc (Aghanim et al. 2018), while the Cepheid-calibrated SnIa (Riess et al. 2019) give H_0 ≈ 74 km/s/Mpc.
Some investigations in the literature have used model-independent cosmographic approaches. They find large tensions between the results of ΛCDM and those of the model-independent approaches, which supports the claim that we should look for good alternatives to ΛCDM (see also Rezaei & Malekjani 2017; Yang et al. 2019; Khadka & Ratra 2019; Lusso et al. 2019; Benetti & Capozziello 2019).

Going beyond the ΛCDM model, many kinds of DE models have been proposed. The simplest generalization of ΛCDM is the so-called wCDM model, which treats DE as a perfect fluid with a constant equation-of-state parameter w that differs from −1. Beyond it lie the various dynamical DE scenarios in the literature (see also Veneziano 1979; Erickson et al. 2002; Thomas 2002; Caldwell 2002; Padmanabhan 2002; Gasperini & Veneziano 2002; Elizalde et al. 2004; Gomez-Valent & Sola 2015). However, solving some of the tensions in ΛCDM cosmology requires more than late-time deviations from ΛCDM. For instance, solving the Hubble tension requires modifications prior to recombination that lower the sound horizon, so as to match BAO and uncalibrated SnIa observations, as discussed in (Lloyd Knox 2020).

Roughly all of these models can account for the expansion history of the universe, especially the current accelerated expansion, but only some of them can solve or alleviate the problems of ΛCDM. To compare these models and find the best one, one can confront them with different observational data and examine their compatibility with the observations. In the literature, different approaches are commonly used to select the best model among competing ones. The approaches most used by cosmologists include:

- the least squares method;
- the likelihood-ratio test;
- the Akaike information criterion (AIC) (Akaike 1974);
- the Bayesian information criterion (BIC) (Schwarz 1978);
- the deviance information criterion (DIC) (David J. Spiegelhalter & Linde 2002);
- the Bayesian evidence (see also Malekjani et al. 2017; Rezaei et al. 2017; Malekjani et al. 2018; Lusso et al. 2019; Rezaei 2019a; Lin et al. 2019; Rezaei et al. 2019, 2020a,b).

On the other hand, the Bayesian model comparison approach as applied to cosmological models has been strongly criticized by Cousins (2008, 2017). A comparison between the cosmological constant and dynamical DE shows that the Bayesian evidence is of interest for model selection only if the models and priors are physically well motivated (Efstathiou 2008).
Nevertheless, the Bayesian evidence remains the standard tool for model comparison in the field, and it is used much more than other methods. New methods have also been proposed to overcome the limitations of the Bayesian evidence, for instance the open likelihoods method discussed in (Gariazzo ????). It is therefore useful to look for more suitable approaches to model selection. In statistics, there is another conventional approach to model selection, called cross-validation. Different variants of this approach have been investigated in (Dabbs & Junker 2016), and their results were compared with AIC and BIC. That study showed that cross-validation performs more accurate model selection, and avoids over-fitting better, than any of the other model selection methods it considered.

In this work, we use the cross-validation method for model selection and compare it with other approaches common in cosmology. To this end, we assume two different parametrizations of the equation of state (EoS) parameter of DE and compare their results with those of the concordance ΛCDM model. These parametrizations help us study the main effects of dynamical DE in comparison with the cosmological constant. More details about them are given in Section 3. These DE scenarios were previously compared in (Rezaei et al. 2017) against different combinations of observational data, using both the AIC and BIC criteria. The aim of this paper is to compare these models with the cross-validation method. The paper is organized as follows:

- In Sect. 2, we briefly introduce cross-validation and the other model selection approaches commonly used in cosmology, together with some details of the procedure applied in our analysis.
- In Sect. 3, we first describe the basic equations of the cosmological models under study and then present our data samples.
- We present our results and discussion in Sect. 4.
- Sect. 5 briefly gives the conclusions.
HOW TO SELECT THE BEST MODEL?

To compare different cosmological models against observational data, we need procedures that can numerically determine the goodness of fit and the best-fit values of the free parameters. One of the simplest methods commonly applied in cosmology is the least squares method. It is a statistical procedure that finds the best fit to a set of data points by minimizing the sum of the squared offsets, or residuals, of the data points from the model values (Aldrich 2007). With this method we can find the best-fit parameters of a model and its goodness of fit. Increasing the number of free parameters can improve the fit of a model to the data, but it also increases the complexity of the model.

The least squares method cannot account for the effect of additional free parameters. Therefore, when comparing models with different numbers of free parameters, we need another approach. Several criteria have been proposed to solve this problem, based on Occam's razor, which says "among competing hypotheses, the one with the fewest assumptions should be selected". Given a choice of competing theories, Occam's razor is the principle that directs us to pick the simplest one as the most likely to be correct (Ralph 2015). Based on this principle, Akaike proposed a criterion, the so-called Akaike information criterion (AIC) (Akaike 1974). Historically, AIC is the first penalized maximum likelihood rule. One can read Akaike's early papers and resolve their ambiguities in light of the pioneering theoretical work of Shibata (1984). Doing so makes clear that the objective is to choose a model that optimally predicts the data on the dependent variables in an exact new replicate of the design of the given data. In other words, AIC is optimized over models to obtain the best predictive model. In the χ² convention of Eq. (11) used in this paper, this means preferring the model with the smallest AIC.
Furthermore, Shao's theorem (Shao 1997) shows that AIC can help us identify the most useful model from the point of view of prediction (Dutta et al. 2015). The Bayesian information criterion (BIC) (Schwarz 1978) is the second penalized maximum likelihood rule. It provides a convenient approximation that may be interpreted as a penalized maximum likelihood corresponding to a model. BIC

k equal-size subsets. In turn, each of the k subsets is retained as the validation set while the remaining k − 1 folds form the training set, and the average prediction error of each candidate model is obtained (Arlot 2010). The literature contains conflicting recommendations on the data splitting ratio for cross-validation (Arlot 2010). Since the data can be divided into k groups in many ways, the split introduces additional variance in the estimates. This variance can be reduced by repeating k-fold cross-validation several times with different permutations of the data division. As k increases, the training error decreases, whereas the cross-validation error first decreases and then starts to increase. We therefore look for the optimum value of k, the one that minimizes the cross-validation error. To find it, we repeated our analysis for different values of k in the range … < k < …. We find that the cross-validation error reaches its minimum for values of k between 8 and 11. This result agrees with those obtained in the literature (see also Arlot 2010; A. Vehtari 2015; Kohavi 1995; et al. 2015). Kohavi (1995) focused only on accuracy estimation in all of its numerical work. That work is very well known. Its recommendation that 10-fold cross-validation is the best method for model selection has been followed by many in computer science, statistics and other fields (et al. 2015).

Based on our results and in line with this literature, we use 10-fold cross-validation to select the best model. To this end, we divide our data points into ten subsamples as follows:

1. Sort all of the data points by redshift.
2. Take the 1st to 10th data points from the list and put each of them into a different subsample.
3. Repeat this procedure for the 11th to 20th data points of the list, and so on to the end of the list.
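The splitting and scoring steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in this work:

- It assumes uncorrelated Gaussian errors, so the Pantheon covariance matrix and any BAO correlations are ignored.
- It replaces the per-fold MCMC fit, described in the next paragraph, with a weighted polynomial least-squares fit on synthetic data.
- The function names `interleaved_folds` and `cv_chi2_tot` are hypothetical.

```python
import numpy as np


def interleaved_folds(z, k=10):
    """Fold label of every point: sort by redshift, then deal the sorted
    points out cyclically, so points 1..k go to folds 0..k-1, points
    k+1..2k again go to folds 0..k-1, and so on to the end of the list."""
    order = np.argsort(z, kind="stable")
    labels = np.empty(len(z), dtype=int)
    labels[order] = np.arange(len(z)) % k
    return labels


def chi2(y, y_model, sigma):
    """Chi-square for uncorrelated Gaussian errors."""
    return float(np.sum(((y - y_model) / sigma) ** 2))


def cv_chi2_tot(z, y, sigma, degree, k=10):
    """k-fold cross-validation score chi2_tot: fit on the k-1 training folds,
    evaluate chi2_t on the held-out test fold, and sum over all k folds.
    A weighted polynomial fit of the given degree stands in for the
    per-fold MCMC best fit (a stand-in, not a cosmological model)."""
    labels = interleaved_folds(z, k)
    total = 0.0
    for fold in range(k):
        train, test = labels != fold, labels == fold
        coeffs = np.polyfit(z[train], y[train], degree, w=1.0 / sigma[train])
        total += chi2(y[test], np.polyval(coeffs, z[test]), sigma[test])
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 1092  # same total as the 983 + 109 points quoted for the real sample
    z = rng.uniform(0.01, 2.3, n)  # synthetic redshifts (toy data)
    sigma = np.full(n, 0.1)
    y = 1.0 + 2.0 * z - 0.3 * z**2 + rng.normal(0.0, sigma)  # toy observable
    for degree in (1, 2, 6):
        print(f"degree {degree}: chi2_tot = {cv_chi2_tot(z, y, sigma, degree):.1f}")
```

Running the script prints χ²_tot for polynomial degrees 1, 2 and 6. The degree-1 model cannot follow the quadratic trend, so it should score markedly worse on the held-out folds. The more flexible degree-6 fit gains little or nothing over degree 2.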
After preparing the subsamples, each step of the analysis leaves one subsample out (the test sample) and uses the others (the training sample) to constrain the free parameters of the model. We then compare the theoretical values predicted by the model at these best-fit values with the test sample, and compute the resulting χ²_t. This process is repeated until every subsample has been used once as the test sample. Finally, we sum the χ²_t values to obtain χ²_tot. Repeating the procedure for each model yields a χ²_tot value that quantifies the predictive performance of that model. The model with the lowest cross-validation score (χ²_tot in our analysis) performs best on the test data and achieves a balance between under-fitting and over-fitting.

To compare cross-validation with the commonly used AIC and BIC criteria, we perform a Markov chain Monte Carlo (MCMC) analysis. From its results we compute the AIC and BIC values of each model under study. For more details on the MCMC analysis, we refer the reader to Mehrabi et al. (2015) (see also Malekjani et al. 2017; Rezaei et al. 2017; Rezaei 2019a). We also compute two further criteria for the different cosmological scenarios: DIC and the well-known Bayesian evidence (for more details see Rezaei et al. 2020a). In the next section, we introduce the cosmological models under investigation and the observational data sets used in our analysis.

MODELS AND DATA SETS

One of the simplest ways to study the EoS parameter of dynamical DE models is via a parameterization, and many different EoS parameterizations can be found in the literature. The earliest is the Taylor expansion of the EoS with respect to z up to first order (Maor et al. 2001; Riess et al. 2004). This simple parametrization has also been generalized by including the second-order term of the Taylor series (Bassett et al. 2008).
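For concreteness, the first- and second-order Taylor forms mentioned above can be written as follows. The coefficient labels are our own and may differ from the notation of the cited papers. Because z is unbounded toward the past, |w_d| grows without limit as z → ∞ unless every coefficient beyond w_0 vanishes. In contrast, 1 − a = z/(1 + z) stays below 1. This is one reason why expansions in (1 − a), such as Eq. (2), stay finite at high redshift for generic parameter values.

```latex
w_d^{(1)}(z) = w_0 + w_1 z, \qquad
w_d^{(2)}(z) = w_0 + w_1 z + w_2 z^2, \qquad
1 - a = \frac{z}{1+z} \xrightarrow{\; z \to \infty \;} 1 .
```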
These two parameterizations describe the dynamical properties of DE in a very simple way, but they diverge at higher redshifts. Investigations have therefore continued in search of more suitable parameterizations, which should capture the dynamical properties of the DE EoS while avoiding any divergence at high redshifts. To achieve this goal, some purely phenomenological parameterizations have been proposed in the literature (Efstathiou 1999; Jassal et al. 2005; Barboza et al. 2009). Here we consider two well-known parameterizations, the CPL and PADE parameterizations, as alternatives to the cosmological constant. For a well-behaved function f(x), the PADE approximation of order (i, j) has the following form (Pade 1892; Baker & Graves-Morris 1996; Adachi & Kasai 2012):

$$ f(x) = \frac{a_0 + a_1 x + a_2 x^2 + \dots + a_j x^j}{b_0 + b_1 x + b_2 x^2 + \dots + b_i x^i}, \tag{1} $$

where all the coefficients are constants. In this study, for the PADE parameterization we expand the EoS parameter w_d with respect to (1 − a) up to order (1, 1) of Eq. (1):

$$ w_d(a) = \frac{w_0 + w_1 (1 - a)}{1 + w_2 (1 - a)}. \tag{2} $$

To study the background evolution of the universe in PADE cosmology, we assume an isotropic, homogeneous and spatially flat FRW cosmology. In this cosmology, driven by non-relativistic matter and DE, the first Friedmann equation reads

$$ H^2 = \frac{8\pi G}{3} \left(\rho_m + \rho_d\right), \tag{3} $$

where H ≡ ȧ/a, and ρ_m and ρ_d are the energy densities of dark matter and DE, respectively. In the absence of interaction among the cosmic fluid components, the energy densities obey the following differential equations:

$$ \dot{\rho}_m + 3H\rho_m = 0, \tag{4} $$

$$ \dot{\rho}_d + 3H(1 + w_d)\rho_d = 0, \tag{5} $$

where the over-dot denotes a derivative with respect to cosmic time t. Inserting Eq. (2) into Eq. (5), we find the DE density of the PADE parameterization (see also Rezaei et al. 2017; Rezaei 2019b):

$$ \rho_d^{(P)} = \rho_{d0}\, a^{-\frac{3(1 + w_0 + w_1 + w_2)}{1 + w_2}} \left[1 + w_2 (1 - a)\right]^{-\frac{3(w_1 - w_0 w_2)}{w_2 (1 + w_2)}}. \tag{6} $$

Combining Eq. (6) with Eq. (3), we obtain the dimensionless Hubble parameter E = H/H_0 (see also Rezaei et al. 2017):

$$ E^2 = \Omega_{m0}\, a^{-3} + \Omega_{d0}\, a^{-\frac{3(1 + w_0 + w_1 + w_2)}{1 + w_2}} \left(1 + w_2 - a w_2\right)^{-\frac{3(w_1 - w_0 w_2)}{w_2 (1 + w_2)}}, \tag{7} $$

where Ω_m0 is the matter density parameter and Ω_d0 = 1 − Ω_m0 is the DE density parameter. The second DE parametrization in this work is the CPL parameterization, to which Eq. (2) reduces when w_2 = 0. We note that, unlike the CPL parameterization, the PADE EoS parameter with w_2 ≠ 0 avoids the divergence in the far future (a → ∞). Following the same steps, for the CPL parameterization it is easy to find

$$ \rho_d^{(C)} = \rho_{d0}\, a^{-3(1 + w_0 + w_1)} \exp\left[-3 w_1 (1 - a)\right], \tag{8} $$

and

$$ E^2 = \Omega_{m0}\, a^{-3} + \Omega_{d0}\, a^{-3(1 + w_0 + w_1)} \exp\left[-3 w_1 (1 - a)\right]. \tag{9} $$

To test the standard ΛCDM model and the above parameterizations against observations, we use three recent data samples:

• Pantheon: This sample includes the apparent magnitudes of 1048 type Ia supernovae in the range … < z < … (Scolnic et al. 2018). Comparing their apparent luminosities constrains the cosmological parameters.
• Hubble data: The second sample in our analysis is the latest set of cosmic-chronometer measurements of the Hubble parameter, H(z), from (Farooq et al. 2017).
• Baryon acoustic oscillations (BAO): To constrain late-time new physics, we use 6 data points from the available BAO measurements of the 6dFGS, MGS, BOSS DR12, SDSS DR12, BOSS DR14 and DES collaborations (Beutler et al. 2011; Alam et al. 2017; Ross et al. 2015; Gil-Marín et al. 2018; Abbott et al. 2019; Bautista et al. 2017).

RESULTS AND DISCUSSIONS

To obtain the AIC and BIC values of the DE models under study, we analyze the data samples described above with an MCMC algorithm. In particular, the total chi-square χ² has the following form:

$$ \chi^2(\mathbf{p}) = \chi^2_{\rm SN} + \chi^2_{H} + \chi^2_{\rm BAO}, \tag{10} $$

where p is the vector of free parameters of the model under study. The parameter vectors of the cosmological models in this work are:

• PADE: p = {Ω_m0, h, w_0, w_1, w_2}
• CPL: p = {Ω_m0, h, w_0, w_1}
• ΛCDM: p = {Ω_m0, h}

From the minimum value of χ² (χ²_min) we obtain the AIC and BIC values of each model under study:

$$ {\rm AIC} = \chi^2_{\rm min} + 2K, \tag{11} $$
$$ {\rm BIC} = \chi^2_{\rm min} + K \ln N, \tag{12} $$

where K and N are the number of free parameters and the total number of data points, respectively. The DIC criterion employs concepts from both Bayesian statistics and information theory (David J. Spiegelhalter & Linde 2002), and it is expressed as (Liddle 2007)

$$ {\rm DIC} = D(\bar{p}) + 2 C_B, \tag{13} $$

where $C_B = \overline{D(p)} - D(\bar{p})$ is the Bayesian complexity and overbars denote mean values. Here D(p) is the Bayesian deviance, which for the exponential class of distributions can be expressed as D(p) = χ²(p) (see Trashorras et al. 2016; Liddle 2007 for more details). The Bayesian complexity is closely related to K, the number of effective degrees of freedom, i.e., the number of parameters that actually affect the fit. Less strictly, it can be regarded as a measure of the spread of the likelihood (Anagnostopoulos et al. 2020).

As a further model selection method, we compute the Bayesian evidence of the scenarios considered in this paper. With Θ the free parameters of model M and D the data set, the Bayesian evidence ε is given by

$$ \epsilon = p(D|M) = \int p(\Theta|M)\, p(D|\Theta, M)\, d\Theta. \tag{14} $$

This integral may have an analytic solution in low-dimensional cases. In high-dimensional problems it is analytically intractable, so numerical methods are needed to evaluate it. Here we use the sequential Monte Carlo (SMC) algorithm to sample the posterior. The evidence is a crucial quantity for model selection in the Bayesian framework and for comparing two models. In this paper, we use the Jeffreys scale (Jeffreys 1961) to measure the significance of the difference between two models. For two models M_1 and M_2, the Jeffreys scale in terms of Δ ln ε = ln ε_{M_1} − ln ε_{M_2} reads (Nesseris et al. 2013):

• for Δ ln ε < …, there is weak evidence against M_2;
• for … < Δ ln ε < …, there is definite evidence against M_2;
• for Δ ln ε > …, there is strong evidence against M_2.

We report the numerical results of our analysis for the models under consideration in Table 1.

Table 1. The statistical results for the cosmological models studied in this work, obtained from the MCMC analysis. The values of ΔAIC, ΔBIC and ΔDIC are calculated relative to ΛCDM. The last column gives the Bayesian evidence result, Δ ln ε = ln ε_ΛCDM − ln ε_Model.

Model | K | N | χ²_min | AIC | ΔAIC | BIC | ΔBIC | DIC | ΔDIC | Δ ln ε
PADE  | 5 | … | … | … | … | … | … | … | … | …
CPL   | 4 | … | … | … | … | … | … | … | … | …
ΛCDM  | 2 | … | … | … | … | … | … | … | … | …

As Table 1 shows, the χ²_min value of each model indicates that the model fits the observational data well. However, as mentioned before, the models have different numbers of free parameters, so χ²_min is not suitable for comparing them. In the next step we therefore use the AIC values for model comparison:

- ΛCDM, with AIC ≈ 1065, has the lowest AIC, so by this criterion ΛCDM is the best model and PADE is the worst.
- ΔAIC quantifies how far the best model lies from the others in the AIC sense. For the two other models ΔAIC < …, which indicates significant support for the CPL and PADE models.

We thus conclude from the AIC results that ΛCDM is the best model, while the other models under study also receive significant support. For more details concerning ΔAIC and the corresponding level of support for a model, we refer the reader to (Kass & Raftery 1995; Rezaei 2019a).

The BIC column also gives ΛCDM the minimum value. From the BIC point of view, ΛCDM is the best model and PADE is the worst, as expected given the larger number of extra free parameters in the PADE parameterization. However, the ΔBIC values lead to conclusions completely different from those of ΔAIC:

- PADE: ΔBIC_PADE ≈ 16, which indicates very strong evidence against the PADE parameterization.
- CPL: ΔBIC_CPL ≈ 10, which likewise indicates very strong evidence against this cosmology.

The reader can find more details on ΔBIC and the strength of evidence against each candidate model in (Kass & Raftery 1995; Rezaei 2019a). Here we observe a significant conflict between the conclusions drawn from the AIC and BIC criteria.

The ΔDIC values lead to results different from those of AIC and BIC. The value ΔDIC ≈ 12 indicates very strong evidence against ΛCDM, while PADE, with the minimum value of DIC, is the best model. From the Bayesian evidence point of view, ΛCDM is the best model, and there is definite evidence against both the PADE and CPL parameterizations.

Now we focus on the results of the cross-validation method. As mentioned before, our analysis was done in ten steps. Each step involves the following:

1. We perform an MCMC analysis on the training sample (983 data points from the Pantheon, H(z) and BAO samples, which were selected randomly) and find the best-fit parameters of the DE models.
2. Setting the free parameters equal to these best-fit values, we compute χ²_t on the test sample (the remaining 109 data points of the Pantheon, H(z) and BAO samples).

Tables 2-4 show the results of these steps for the ΛCDM model and the CPL and PADE parameterizations, respectively. We note that each χ²_t value is computed with parameters fitted without using the observations in the corresponding test sample. We also compute the mean values of the free parameters over the ten steps. The results for each model are:

- ΛCDM: as Table 2 shows, the sum of the χ²_t values is χ²_tot ≈ 1171. The mean values are Ω̄_m0 = 0.… and h̄ = 0.…, in full agreement with those obtained from the MCMC analysis of the complete data set.
- CPL: the ten test samples give χ²_tot ≈ 1055, i.e., Δχ²_tot ≈ −116 with respect to ΛCDM. This means that CPL is more consistent with the data than the standard model of cosmology. The mean values of the free parameters are Ω̄_m0 = 0.…, h̄ = 0.…, w̄_0 = −0.… and w̄_1 = 0.….
- PADE: we obtain χ²_tot ≈ 1066, with Ω̄_m0 = 0.…, h̄ = 0.…, w̄_0 = −0.…, w̄_1 = 0.… and w̄_2 = 0.….

Among the three models, the CPL parameterization has the minimum value of χ²_tot. From the cross-validation point of view, the ranking is therefore:

1. CPL, with Δχ²_tot ≈ −116 with respect to ΛCDM, is the best model.
2. PADE, with Δχ²_tot ≈ −105 with respect to ΛCDM, occupies the second position.
3. ΛCDM, with the largest value of χ²_tot, sits at the bottom of the ranking.

The left panels of Fig. 1 show the evolution of the Hubble parameter H(z) for each model under study. Each panel shows curves for three sets of parameters: the best fits obtained from the training samples (Tables 2, 3 and 4), the mean values of the free parameters, and the best fits from the MCMC analysis. Similarly, the right panels of Fig. 1 show the evolution of the theoretical distance modulus μ_the(z) for the different models, where

$$ \mu_{\rm the}(z) = 5\log_{10}\left[(1+z)\int_0^z \frac{dx}{E(x)}\right] - 5\log_{10} h + 42.38. $$

The H(z) plots (left panels) show that the evolution of H for the MCMC best-fit parameters closely follows that for the mean parameters of the training samples. Both curves lie between those obtained from the individual training samples. The same holds for the different μ_the(z) curves, which means that all of the models behave similarly with their best-fit parameters.

As a main cosmological parameter, the EoS of the DE models under study is plotted in Fig. 2. The left (right) panel shows w_d for the different parameter sets of the CPL (PADE) parameterization. Both panels also show w_d = −1, the EoS of ΛCDM, for comparison. In both panels, the MCMC results lead to larger values of w_d than the mean parameter values. Nevertheless, w_d behaves in the same manner for all parameter sets of both the CPL and PADE parameterizations. In all cases w_d(z = 0) is very close to that of the cosmological constant (w_d = −1), and at higher redshifts w_d departs from the phantom line and increases with z.
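The background quantities behind Figs. 1 and 2 can be evaluated directly from Eqs. (2) and (6)-(9) and the expression for μ_the(z) above. The Python sketch below is an illustration, not the authors' code, and it rests on these assumptions:

- The function names are hypothetical.
- The universe is flat, with Ω_d0 = 1 − Ω_m0.
- The integrals use a plain trapezoidal rule.
- The PADE formulas require w_2 ≠ 0, w_2 ≠ −1 and 1 + w_2(1 − a) > 0.

The sketch also provides a helper that checks the closed form of Eq. (6) against a direct numerical integration of the continuity equation (5).

```python
import numpy as np

# Offset in mu_the(z): 5 log10(c / 100 km/s) + 25 with distances in Mpc (about 42.38).
MU_OFFSET = 5.0 * np.log10(299792.458 / 100.0) + 25.0


def w_pade(z, w0, w1, w2):
    """PADE (1,1) EoS of Eq. (2), written in redshift through a = 1/(1+z)."""
    x = 1.0 - 1.0 / (1.0 + z)  # x = 1 - a
    return (w0 + w1 * x) / (1.0 + w2 * x)


def w_cpl(z, w0, w1):
    """CPL EoS, i.e. Eq. (2) with w2 = 0."""
    return w_pade(z, w0, w1, 0.0)


def rho_pade(a, w0, w1, w2):
    """Closed form of Eq. (6): rho_d / rho_d0 (needs w2 != 0 and w2 != -1)."""
    p = -3.0 * (1.0 + w0 + w1 + w2) / (1.0 + w2)
    q = -3.0 * (w1 - w0 * w2) / (w2 * (1.0 + w2))
    return a**p * (1.0 + w2 * (1.0 - a)) ** q


def rho_cpl(a, w0, w1):
    """Closed form of Eq. (8): rho_d / rho_d0."""
    return a ** (-3.0 * (1.0 + w0 + w1)) * np.exp(-3.0 * w1 * (1.0 - a))


def E_pade(z, om0, w0, w1, w2):
    """Eq. (7): E = H/H0 for the PADE parameterization in a flat universe."""
    a = 1.0 / (1.0 + z)
    return np.sqrt(om0 * a**-3 + (1.0 - om0) * rho_pade(a, w0, w1, w2))


def E_cpl(z, om0, w0, w1):
    """Eq. (9); w0 = -1, w1 = 0 recovers flat LambdaCDM."""
    a = 1.0 / (1.0 + z)
    return np.sqrt(om0 * a**-3 + (1.0 - om0) * rho_cpl(a, w0, w1))


def rho_d_numeric(a, w_of_a, n=20001):
    """rho_d(a)/rho_d0 from Eq. (5): exp(3 * integral_a^1 (1 + w)/a' da')."""
    grid = np.linspace(a, 1.0, n)
    f = 3.0 * (1.0 + w_of_a(grid)) / grid
    return float(np.exp(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid))))


def mu_the(z, E, h, n=4001):
    """Theoretical distance modulus with a trapezoidal integral of 1/E."""
    x = np.linspace(0.0, z, n)
    f = 1.0 / E(x)
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
    return 5.0 * np.log10((1.0 + z) * integral) - 5.0 * np.log10(h) + MU_OFFSET
```

For example, `mu_the(0.1, lambda x: E_cpl(x, 0.3, -1.0, 0.0), 0.7)` evaluates the ΛCDM distance modulus at z = 0.1 for Ω_m0 = 0.3 and h = 0.7 (about 38.3). These parameter values are illustrative and are not the fits reported in this paper.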
CONCLUSIONIn this work we investigate different scenarios of DE, thecosmological constant as a main candidate for playing therole of DE and two DE parameterizations as alternatives forcosmological constant. We use the CPL and PADE param-eterizations for describing the form EoS parameter of DE.In order to compare these different cosmologies and selectthe best model, there are many different approaches whichhave been used in literature. As the main goal in this pa-per we select some of these well known approaches, theAIC, BIC, DIC criteria and the Bayesian evidence, and com-pare them with another approach for model selection namedcross-validation. Among different methods which there arefor cross-validation, we use k -fold cross-validation methodby setting k = 10 . In this way we apply three sets of lat-est observational data points, the Pantheon sample as the lat-est 1048 type Ia supernovae observations from (Scolnic et al.2018) and the latest measurements from cosmic chronome-ters for H ( z ) from (Farooq et al. 2017) and the BAO sample.Firstly, in order to compute the values of AIC, BIC and DIC, ODEL SELECTION Table 2 . The statistical results for Λ CDM model obtained in different steps of analysis using different training samples. Note that we calculatethe values of χ t using test sample in each of the steps.Sample Parameter step-1 step-2 step-3 step-4 step-5 step-6 step-7 step-8 step-9 step-10 Ω m0 . . . . . . . . . . Training h . . . . . . . . . . χ min .

98 938 .

72 926 .

11 940 .

11 949 .

38 926 .

59 974 .

65 935 .

23 936 .

82 935 . Test χ t .

30 121 .

42 134 .

33 119 .

72 111 .

92 133 .

20 86 .

22 124 .

03 122 .

41 124 . χ tot . Table 3 . The statistical results for CPL parameterization obtained in different steps of analysis using different training samples. Note that wecalculate the values of χ t using test sample in each of the steps..Sample Parameter step-1 step-2 step-3 step-4 step-5 step-6 step-7 step-8 step-9 step-10 Ω m0 . . . . . . . . . . h . . . . . . . . . . Training w − . − . − . − . − . − . − . − . − . − . w . . . . . . . . . . χ min .

77 934 .

96 923 .

05 938 .

12 945 .

11 969 .

59 972 .

73 955 .

59 956 .

77 958 . Test χ t .

17 121 .

62 133 .

39 119 .

36 112 .

44 87 .

32 86 .

45 102 .

18 100 .

63 99 . χ tot . Table 4 . The statistical results for PADE parameterization obtained in different steps of analysis using different training samples. Note that wecalculate the values of χ t using test sample in each of the steps.Sample Parameter step-1 step-2 step-3 step-4 step-5 step-6 step-7 step-8 step-9 step-10 Ω m0 . . . . . . . . . . h . . . . . . . . . . Training w − . − . − . − . − . − . − . − . − . − . w . . . . . . . . . . w . . − . . − . . . − . . − . χ min .

81 934 .

35 923 .

66 937 .

02 945 .

41 969 .

12 970 .

25 955 .

70 956 .

27 958 . Test χ t .

51 121 .

82 134 .

49 118 .

40 112 .

23 87 .

24 96 .

08 102 .

36 100 .

56 100 . χ tot . R EZAEI ET AL . H ( z ) z Best fit of training Best fit of MCMC Average of trainings Observations (cid:1) ( z ) z CV"b2.dat" us 1:3"b3.dat" us 1:3"b4.dat" us 1:3"b5.dat" us 1:3"b6.dat" us 1:3"b7.dat" us 1:3"b8.dat" us 1:3"b9.dat" us 1:3Best fit of training Best fit of MCMC Average of trainings (cid:1) (cid:2) obs H ( z ) z Best fit of training Best fit of MCMC Average of trainings Observations (cid:1) ( z ) z CV"b2.dat" us 1:3"b3.dat" us 1:3"b4.dat" us 1:3"b5.dat" us 1:3"b6.dat" us 1:3"b7.dat" us 1:3"b8.dat" us 1:3"b9.dat" us 1:3Best fit of training Best fit of MCMC Average of trainings (cid:1) obs H ( z ) z Best fit of training Best fit of MCMC Average of trainings Observations (cid:1) ( z ) z CV"b2.dat" us 1:3"b3.dat" us 1:3"b4.dat" us 1:3"b5.dat" us 1:3"b6.dat" us 1:3"b7.dat" us 1:3"b8.dat" us 1:3"b9.dat" us 1:3Best fit of training Best fit of MCMC Average of trainings (cid:1) obs

Figure 1. Top left (right) panel: evolution of the Hubble parameter (distance modulus) for the ΛCDM model. Middle left (right) panel: evolution of the Hubble parameter (distance modulus) for the CPL parameterization. Bottom left (right) panel: evolution of the Hubble parameter (distance modulus) for the PADE parameterization. The curves are plotted for the best-fit values of the free parameters obtained in the different analyses. The observational data points are also plotted for comparison.
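Tables 3 and 4 appear to follow a ten-step (k-fold) scheme: in each step the model is fitted to a training sample (giving χ²_min), and that best fit is then scored on the held-out test sample (giving χ²_t). The Python sketch below illustrates this loop on a toy straight-line model, with weighted least squares standing in for the cosmological fits. The function names, the toy model, the random fold assignment and the assumption that χ²_tot is the sum of the ten χ²_t values are illustrative choices, not details taken from the paper.

```python
import random


def fit_line(xs, ys, sigmas):
    """Weighted least-squares fit of the toy model y = a + b*x.

    Returns the (a, b) that minimise chi^2 on the given points.
    """
    w = [1.0 / s ** 2 for s in sigmas]
    s0 = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = s0 * sxx - sx ** 2
    a = (sxx * sy - sx * sxy) / det
    b = (s0 * sxy - sx * sy) / det
    return a, b


def chi2(params, xs, ys, sigmas):
    """chi^2 of the toy model with parameters (a, b) on the given points."""
    a, b = params
    return sum(((y - (a + b * x)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))


def kfold_chi2(xs, ys, sigmas, k=10, seed=0):
    """k-fold cross-validation: train on k-1 folds, score chi^2 on the held-out fold.

    Returns the list of per-fold test chi^2 values and their sum (chi^2_tot).
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)        # random fold assignment (an assumption)
    folds = [idx[i::k] for i in range(k)]   # k disjoint folds covering every point
    per_fold = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # "Training" step: best fit using only the training points.
        params = fit_line([xs[i] for i in train], [ys[i] for i in train],
                          [sigmas[i] for i in train])
        # "Test" step: chi^2 of that best fit on points it never saw.
        per_fold.append(chi2(params, [xs[i] for i in test], [ys[i] for i in test],
                             [sigmas[i] for i in test]))
    return per_fold, sum(per_fold)


if __name__ == "__main__":
    # Hypothetical data: 50 points scattered around y = 1 + 2x.
    rng = random.Random(1)
    xs = [0.1 * i for i in range(50)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    sigmas = [0.1] * len(xs)
    per_fold, total = kfold_chi2(xs, ys, sigmas, k=10)
    print([round(c, 2) for c in per_fold], round(total, 2))
```

Because every χ²_t is evaluated on points the fit never used, extra parameters that merely absorb noise in the training sample are not rewarded in χ²_tot, which is the independence property the text attributes to cross-validation.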

[Figure 2 panels: w_d(z) versus z for 0 ≤ z ≤ 3, CPL (left) and PADE (right); curves: ΛCDM, best fit of training, best fit of MCMC, average of trainings.]

Figure 2. Evolution of w_d for the different best-fit parameters obtained in the different analyses, for the CPL (left panel) and PADE (right panel) parameterizations.

we implement a joint likelihood analysis using all of the data points. As a result, we observe that the PADE parameterization, with χ²_min = 1056.…, has the minimum value of χ², and ΛCDM, with χ²_min = 1061.…, has the greatest value of χ². Using these values, we compute the AIC and BIC for the different DE models under study. Based on the AIC results, ΛCDM cosmology is the best model, but the other models also perform very well: according to the AIC, there is Significant support for the CPL and PADE models. From the computed BIC values, ΛCDM, with BIC = 1075.…, is the best model and the PADE parameterization, with BIC = 1091.…, is the worst one; these values give Very Strong evidence against both the CPL and PADE parameterizations. Thus, although the AIC results indicate that all of the models under study are consistent with the data samples used, from the BIC point of view we find large differences among the performances of the models. Calculating the DIC for the models under study, we observe a sharp conflict between the results of the various criteria. In this case, PADE, with DIC = 1065.…, is the best model, while ΛCDM, with DIC = 1078.…, is the worst model. The value ΔDIC = 12.… indicates that there is Strong evidence against ΛCDM. In the Bayesian evidence analysis, ΛCDM is the best model, while for the CPL (PADE) parameterization we have Δln ε = 2.… (Δln ε = 2.…), which means there is Definite evidence against these models.

Finally, we carry out our new analysis, cross-validation, to compare the models. As can be seen in Tables 2-4, the cross-validation results indicate that CPL cosmology, with χ²_tot = 1055.…, has the minimum value of χ²_tot among the investigated models and is thus the best model. On the other hand, the ΛCDM model, with χ²_tot = 1171.…, has the worst performance among the models. While we found a conflict between the results of AIC and BIC and those of DIC, we now find an even greater conflict between the Bayesian evidence results and those of cross-validation. The main question here is "which of these model selection methods is the best?" or, equivalently, "which of these cosmologies is the best one?". From the cosmological point of view, there are several theoretical problems with the cosmological constant which dynamical DE can solve or alleviate. Furthermore, as mentioned in Sec. 1, Λ cosmology is plagued by observational tensions in the estimation of some of the main cosmological parameters, and some studies have found that dynamical DE scenarios can reduce or even resolve some of these tensions (see also Rezaei et al. 2019; Pan et al. 2019). On the other hand, from the model selection point of view, we show that AIC can only help us identify the most useful model for prediction, while BIC provides good results if the set of models under consideration contains one favored model. Furthermore, BIC is obtained from the value of χ²_min, which is calculated at the values of the free parameters constrained by the observational data.
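The information criteria compared above can all be computed from χ² values. The Python sketch below assumes a Gaussian likelihood, so that −2 ln L equals χ² up to a constant, and uses the standard definitions of AIC, BIC and DIC; the function names are hypothetical and this is not the authors' code.

```python
import math


def aic(chi2_min, k):
    """AIC = -2 ln L_max + 2k, i.e. chi2_min + 2k for a Gaussian likelihood."""
    return chi2_min + 2 * k


def bic(chi2_min, k, n):
    """BIC = -2 ln L_max + k ln N, with N the number of data points."""
    return chi2_min + k * math.log(n)


def dic(chi2_chain, chi2_at_mean):
    """DIC = D_bar + p_D, taking the deviance D = chi^2 (constant dropped).

    chi2_chain   -- chi^2 evaluated at every posterior (MCMC) sample
    chi2_at_mean -- chi^2 at the posterior-mean parameters, D(theta_bar)
    p_D = D_bar - D(theta_bar) is the effective number of parameters, which is
    how DIC penalizes only the parameters the data actually constrain.
    """
    d_bar = sum(chi2_chain) / len(chi2_chain)
    p_d = d_bar - chi2_at_mean
    return d_bar + p_d
```

For N of order 1000 the BIC penalty per parameter is ln N ≈ 6.9, against 2 for the AIC, so three extra parameters cost about 21 in BIC but only 6 in AIC. This is one way to see why the two criteria can rank the same models differently.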
It is important to note that the data samples used to constrain the free parameters are exactly the same as those used to compute χ²_min. This point can affect the results we obtain with BIC. In contrast with the AIC and BIC criteria, DIC uses the whole posterior sample rather than just the best-fit likelihood. Furthermore, AIC and BIC penalize all of the parameters involved, while DIC penalizes only those parameters that actually contribute to the fit. On the other hand, Bayesian inference not only determines the free parameters of the model, but also provides a direct way to compare different models; moreover, the Bayesian evidence has been widely used for model selection in cosmology.

As the last method of model selection, we have used cross-validation. In the cross-validation method, the data sample used to compute the χ² value is independent of the one used to train the model. This feature can significantly strengthen the position of cross-validation relative to the other approaches, especially the BIC and AIC criteria. Combining all of these points with the numerical results obtained in this work, we can say that choosing cross-validation or DIC for model selection leads to a preference for the dynamical DE scenarios, while AIC, BIC and the Bayesian evidence lead to a preference for Λ cosmology. Therefore, we conclude that using cross-validation and DIC for model selection can lead to different results from those of BIC and AIC. Our results can be examined using other data samples, or by applying cross-validation to compare other cosmological models. Cross-validation, alongside other methods, can be considered a potential candidate for model selection in cosmology in further investigations.

ACKNOWLEDGEMENTS
The authors gratefully thank the Referee for the constructive comments and recommendations, which definitely helped to improve the readability and quality of the paper.

REFERENCES

A. Vehtari, A. Gelman, J. G. 2015, arXiv:1507.04544
Abbott, T. M. C., et al. 2019, Mon. Not. Roy. Astron. Soc., 483, 4866
Adachi, M., & Kasai, M. 2012, Prog. Theor. Phys., 127, 145
Aghanim, N., et al. 2018, arXiv:1807.06209
Akaike, H. 1974, IEEE Transactions on Automatic Control, 19, 716
Alam, S., et al. 2017, Mon. Not. Roy. Astron. Soc., 470, 2617
Aldrich, J. 2007, International Statistical Review, 66(1), 61
Anagnostopoulos, F. K., Basilakos, S., & Saridakis, E. N. 2020, arXiv:2012.06524
Arevalo, F., Cid, A., & Moya, J. 2017, Eur. Phys. J., C77, 565
Arlot, S., C. A. 2010, Statistics Surveys, 4, 40
Baker, A., & Graves-Morris, P. 1996, Pade Approximants (Cambridge University Press)
Barboza, E. M., Alcaniz, J. S., Zhu, Z. H., & Silva, R. 2009, Phys. Rev., D80, 043521
Bassett, B. A., Brownstone, M., Cardoso, A., et al. 2008, JCAP, 0807, 007
Bautista, J. E., et al. 2017, Astron. Astrophys., 603, A12
Benetti, M., & Capozziello, S. 2019, JCAP, 1912, 008
Bennett, C., et al. 2003, ApJS, 148, 1
Beutler, F., Blake, C., Colless, M., et al. 2011, MNRAS, 416, 3017
Caldwell, R. R. 2002, Phys. Lett. B, 545, 23
Carroll, S. M., Duvvuri, V., Trodden, M., & Turner, M. S. 2004, Phys. Rev. D, 70, 043528
Copeland, E. J., Sami, M., & Tsujikawa, S. 2006, International Journal of Modern Physics D, 15, 1753
Cousins, R. D. 2008, Phys. Rev. Lett., 101, 029101
Cousins, R. D. 2017, Synthese, 194, 395
Dabbs, B., & Junker, B. 2016, arXiv preprint arXiv:1605.03000
David J. Spiegelhalter, Nicola G. Best, B. P. C., & Linde, A. V. D. 2002, Journal of the Royal Statistical Society, 64, 583
Delubac, T., et al. 2015, Astron. Astrophys., 574, A59
Dutta, R., Bogdan, M., & Ghosh, J. K. 2015, Model Selection and Multiple Testing - A Bayesian and Empirical Bayes Overview and some New Results, arXiv:1510.00547
Efstathiou, G. 1999, Mon. Not. Roy. Astron. Soc., 310, 842
Efstathiou, G. 2008, Mon. Not. Roy. Astron. Soc., 388, 1314
Elizalde, E., Nojiri, S., & Odintsov, S. D. 2004, Phys. Rev., D70, 043539
Erickson, J. K., Caldwell, R., Steinhardt, P. J., Armendariz-Picon, C., & Mukhanov, V. F. 2002, Phys. Rev. Lett., 88, 121301
et. al., Z. 2015, Journal of Econometrics, 187(1)
Farooq, O., Madiyar, F. R., Crandall, S., & Ratra, B. 2017, Astrophys. J., 835, 26
Freedman, W. L. 2017, Nat. Astron., 1, 0121
Gariazzo, S. ????, EPJC, 80, 552
Gasperini, M., & Veneziano, F. P. G. 2002, Phys. Rev. D, 65, 023508
Ghojogh, B., & Crowley, M. 2019, The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial, arXiv:1905.12787
Gil-Marín, H., et al. 2018, Mon. Not. Roy. Astron. Soc., 477, 1604
Gomez-Valent, A., & Sola, J. 2015, Mon. Not. Roy. Astron. Soc., 448, 2810
Jassal, H. K., Bagla, J. S., & Padmanabhan, T. 2005, Mon. Not. Roy. Astron. Soc., 356, L11
Jeffreys, H. 1961, Theory of Probability
Kass, R. E., & Raftery, A. E. 1995, J. Am. Statist. Assoc., 90, 773
Khadka, N., & Ratra, B. 2019, arXiv:1909.01400
Kohavi, R. 1995, In Proceedings of the 14th International Joint Conference on Artificial Intelligence
Kowalski, M., Rubin, D., Aldering, G., et al. 2008, ApJ, 686, 749
Kurek, A., & Szydlowski, M. 2008, Astrophys. J., 675, 1
Liddle, A. R. 2007, Mon. Not. Roy. Astron. Soc., 377, L74
Lin, W., Mack, K. J., & Hou, L. 2019, arXiv:1910.02978
Lloyd Knox, M. M. 2020, Phys. Rev. D, 101, 4, 043533
Lusso, E., Piedipalumbo, E., Risaliti, G., et al. 2019, Astron. Astrophys., 628, L4
Macaulay, E., Wehus, I. K., & Eriksen, H. K. 2013, Physical Review Letters, 111, 161301
Malekjani, M., Basilakos, S., Davari, Z., Mehrabi, A., & Rezaei, M. 2017, Mon. Not. Roy. Astron. Soc., 464, 1192
Malekjani, M., Rezaei, M., & Akhlaghi, I. A. 2018, Phys. Rev., D98, 063533
Maor, I., Brustein, R., & Steinhardt, P. J. 2001, Phys. Rev. Lett., 86, 6, [Erratum: Phys. Rev. Lett. 87, 049901 (2001)]
March, M. C., Starkman, G. D., Trotta, R., & Vaudrevange, P. M. 2011, Mon. Not. Roy. Astron. Soc., 410, 2488
Mehrabi, A., Basilakos, S., & Pace, F. 2015, MNRAS, 452, 2930
Nesseris, S., Basilakos, S., Saridakis, E. N., & Perivolaropoulos, L. 2013, Phys. Rev., D88, 103010
Nojiri, S., & Odintsov, S. D. 2011, Phys. Rept., 505, 59
Pade, H. 1892, Ann. Sci. Ecole Norm. Sup., 9(3), 1
Padmanabhan, T. 2002, Phys. Rev. D, 66, 021301
Padmanabhan, T. 2003, PhR, 380, 235
Pan, S., Yang, W., Di Valentino, E., Shafieloo, A., & Chakraborty, S. 2019, arXiv:1907.12551
Peiris, H. V., et al. 2003, Astrophys. J. Suppl., 148, 213
Perlmutter, S., Aldering, G., Goldhaber, G., et al. 1999, ApJ, 517, 565
Ralph, W. 2015, The Combinatorics of Occam's Razor, arXiv:1504.07441
Rezaei, M. 2019a, Mon. Not. Roy. Astron. Soc., 485, 550
Rezaei, M. 2019b, Mon. Not. Roy. Astron. Soc., 485, 4841
Rezaei, M., & Malekjani, M. 2017, Phys. Rev. D, 96, 063519
Rezaei, M., Malekjani, M., Basilakos, S., Mehrabi, A., & Mota, D. F. 2017, Astrophys. J., 843, 65
Rezaei, M., Malekjani, M., & Sola, J. 2019, Phys. Rev., D100, 023539
Rezaei, M., Naderi, T., Malekjani, M., & Mehrabi, A. 2020a, Eur. Phys. J. C, 80, 374
Rezaei, M., Pour-Ojaghi, S., & Malekjani, M. 2020b, Astrophys. J., 900, 70
Riess, A. G., Casertano, S., Yuan, W., Macri, L. M., & Scolnic, D. 2019, Astrophys. J., 876, 85
Riess, A. G., Filippenko, A. V., Challis, P., et al. 1998, AJ, 116, 1009
Riess, A. G., et al. 2004, ApJ, 607, 665
Ross, A. J., Samushia, L., Howlett, C., et al. 2015, Mon. Not. Roy. Astron. Soc., 449, 835
Schwarz, G. 1978, Annals of Statistics, 6, 461
Scolnic, D. M., et al. 2018, Astrophys. J., 859, 101
Shao, J. 1997, Statistica Sinica, 221
Shibata, R. 1984, Biometrika, 71, 43
Spergel, D., et al. 2003, ApJS, 148, 175
Thomas, S. 2002, Physical Review Letters, 89, 081301
Trashorras, M., Nesseris, S., & Garcia-Bellido, J. 2016, Phys. Rev. D, 94, 063511