Bayes Factors for Peri-Null Hypotheses

Abstract

A perennial objection against Bayes factor point-null hypothesis tests is that the point-null hypothesis is known to be false from the outset. Following Morey and Rouder (2011) we examine the consequences of approximating the sharp point-null hypothesis by a hazy ‘peri-null’ hypothesis instantiated as a narrow prior distribution centered on the point of interest. The peri-null Bayes factor then equals the point-null Bayes factor multiplied by a correction term which is itself a Bayes factor. For moderate sample sizes, the correction term is relatively inconsequential; however, for large sample sizes the correction term becomes influential and causes the peri-null Bayes factor to be inconsistent and approach a limit that depends on the ratio of prior ordinates evaluated at the maximum likelihood estimate. We characterize the asymptotic behavior of the peri-null Bayes factor and discuss how to construct peri-null Bayes factor hypothesis tests that are also consistent.

arXiv [math.ST]

Alexander Ly

University of Amsterdam; Centrum Wiskunde & Informatica

Eric-Jan Wagenmakers

University of Amsterdam

Keywords: Consistency, Peri-null correction factor, Asymptotic sampling distribution

PERI-NULL BAYES FACTORS ARE INCONSISTENT

vagueness leads nowhere. (Jeffreys, 1937)

Introduction

In the Bayesian paradigm, the support that data $y^n := (y_1, \ldots, y_n)$ provide for an alternative hypothesis $\mathcal{H}_1$ versus a point-null hypothesis $\mathcal{H}_0$ is given by the Bayes factor $\mathrm{BF}_{10}(y^n)$:

$$
\underbrace{\frac{p(y^n \mid \mathcal{H}_1)}{p(y^n \mid \mathcal{H}_0)}}_{\mathrm{BF}_{10}(y^n)}
= \overbrace{\frac{P(\mathcal{H}_1 \mid y^n)}{P(\mathcal{H}_0 \mid y^n)}}^{\text{Posterior model odds}}
\Bigg/ \overbrace{\frac{P(\mathcal{H}_1)}{P(\mathcal{H}_0)}}^{\text{Prior model odds}} \tag{1}
$$

$$
\mathrm{BF}_{10}(y^n) = \frac{\int_{\Theta_1} f(y^n \mid \theta_1)\, \pi(\theta_1 \mid \mathcal{H}_1)\, \mathrm{d}\theta_1}{\int_{\Theta_0} f(y^n \mid \theta_0)\, \pi(\theta_0 \mid \mathcal{H}_0)\, \mathrm{d}\theta_0}, \tag{2}
$$

where the first line indicates that the Bayes factor quantifies the change from prior to posterior model odds (Wrinch & Jeffreys, 1921), and the second line indicates that this change is given by a ratio of marginal likelihoods, that is, a comparison of prior predictive performance obtained by integrating the parameters $\theta_j$ out of the $j$th model’s likelihood $f(y^n \mid \theta_j)$ at the observations $y^n$ with respect to the prior density $\pi(\theta_j \mid \mathcal{H}_j)$ (Jeffreys, 1935, 1939; Kass & Raftery, 1995). Although the general framework applies to the comparison of any two models (as long as the models make probabilistic predictions; Dawid, 1984; Shafer & Vovk, 2019), the procedure developed by Harold Jeffreys in the late 1930s was explicitly designed as an improvement on $p$-value null-hypothesis significance testing. In the prototypical scenario, a null-hypothesis $\mathcal{H}_0$ has $p_0$ free parameters, whereas an alternative hypothesis $\mathcal{H}_1$ has $p_1 = p_0 + 1$ free parameters; the additional free parameter in $\mathcal{H}_1$ is the one that is test-relevant. For instance, in Jeffreys’s $t$-test the test-relevant parameter $\delta = \mu/\sigma$ represents standardized effect size; after assigning prior distributions to the model parameters we may compute the Bayes factor for $\mathcal{H}_0: \delta = 0$ with free parameter $\theta_0 = \sigma \in (0, \infty)$ versus $\mathcal{H}_1: \theta_1 = (\delta, \sigma) \in \mathbb{R} \times (0, \infty)$, where $\delta$ is unrestricted and $\sigma$ denotes the common nuisance parameter.
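
Eq. (1) can be read as an updating rule: posterior model odds equal the Bayes factor times the prior model odds. The following minimal sketch makes that arithmetic explicit. The function name `posterior_probability` and the default prior probability of 0.5 are illustrative assumptions, not part of the original text, and the sketch assumes only two hypotheses are under consideration.

```python
def posterior_probability(bf10: float, prior_prob_h1: float = 0.5) -> float:
    """Posterior probability of H1 implied by a Bayes factor BF10, via Eq. (1).

    prior_prob_h1 is P(H1); P(H0) is taken to be 1 - P(H1), i.e. the sketch
    assumes that H0 and H1 are the only two hypotheses under consideration.
    """
    if not 0.0 < prior_prob_h1 < 1.0:
        raise ValueError("prior_prob_h1 must lie strictly between 0 and 1")
    prior_odds = prior_prob_h1 / (1.0 - prior_prob_h1)
    posterior_odds = bf10 * prior_odds  # Eq. (1), rearranged
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # With equal prior probabilities the posterior probability is BF / (1 + BF).
    for bf10 in (1.0, 3.0, 10.0, 100.0):
        print(f"BF10 = {bf10:6.1f} -> P(H1 | data) = {posterior_probability(bf10):.4f}")
```

With equal prior model probabilities the posterior probability reduces to $\mathrm{BF}_{10}/(1 + \mathrm{BF}_{10})$, which is the conversion used whenever a Bayes factor is reported as a posterior model probability.
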
When $\mathrm{BF}_{01}(y^n) = 1/\mathrm{BF}_{10}(y^n)$ is larger than 1, the data provide evidence that the ‘general law’ $\mathcal{H}_0$ can be retained; when $\mathrm{BF}_{01}(y^n)$ is smaller than 1, the data provide evidence that it ought to be replaced by $\mathcal{H}_1$, the model that relaxes the general law. The larger the deviation from 1, the stronger the evidence. Importantly, in Jeffreys’s framework the test-relevant parameter is fixed under $\mathcal{H}_0$ and free to vary under $\mathcal{H}_1$. The hypothesis $\mathcal{H}_0$ is generally known as a ‘point-null’ hypothesis.

A perennial objection against point-null hypothesis testing, whether Bayesian or frequentist, is that in most practical applications the point-null is never true exactly (e.g., Bakan, 1966; Berkson, 1938; Edwards, Lindman, & Savage, 1963; Jones & Tukey, 2000; Kruschke & Liddell, 2018; see also Laplace, 1774/1986, p. 375). If this argument is accepted and $\mathcal{H}_0$ is deemed to be false from the outset, then the test merely assesses whether or not the sample size was sufficiently large to detect the non-zero effect. This objection was forcefully made by Tukey:

“Statisticians classically asked the wrong question, and were willing to answer with a lie, one that was often a downright lie. They asked “Are the effects of A and B different?” and they were willing to answer “no.” All we know about the world teaches us that the effects of A and B are always different, in some decimal place, for any A and B. Thus asking “Are the effects different?” is foolish.” (Tukey, 1991, p. 100)

This perennial objection has been rebutted in several ways (e.g., Jeffreys, 1937, 1961; Kass & Raftery, 1995); in the current work we focus on the most common rebuttal, namely that the point-null hypothesis is a mathematically convenient approximation to a more realistic ‘peri-null’ (Tukey, 1995) hypothesis $\mathcal{H}_{\tilde{0}}$ that assigns the test-relevant parameter a distribution tightly concentrated around the value specified by the point-null hypothesis (e.g., Good, 1967, p. 416; Berger & Delampady, 1987; Cornfield, 1966, 1969; Dickey, 1976; Edwards et al., 1963; Gallistel, 2009; George & McCulloch, 1993; Jeffreys, 1935, 1936; Rouder, Speckman, Sun, Morey, & Iverson, 2009). For instance, in the case of the $t$-test the peri-null $\mathcal{H}_{\tilde{0}}$ could specify $\delta \sim \pi(\delta \mid \mathcal{H}_{\tilde{0}}) = \mathcal{N}(0, \kappa_0^2)$, where the width $\kappa_0$ is set to a small value.

Previous work has suggested that the approximation of a point-null hypothesis by an interval is reasonable when that interval is half a standard error in width (Berger & Delampady, 1987) or one standard error in width (Jeffreys, 1935). Here we explore the consequences of replacing the point-null hypothesis $\mathcal{H}_0$ by a peri-null hypothesis $\mathcal{H}_{\tilde{0}}$ from a different angle. We alter only the specification of the null-hypothesis $\mathcal{H}_0$, which means that the alternative hypothesis $\mathcal{H}_1$ now overlaps with $\mathcal{H}_{\tilde{0}}$. Below we show, first, that the effect on the Bayes factor of replacing $\mathcal{H}_0$ with $\mathcal{H}_{\tilde{0}}$ is given by another Bayes factor, namely that between $\mathcal{H}_0$ and $\mathcal{H}_{\tilde{0}}$ (cf. Morey & Rouder, 2011, p. 411). This ‘peri-null correction factor’ is usually near 1, unless sample size grows large. In the limit of large sample sizes, we demonstrate that the Bayes factor for the peri-null $\mathcal{H}_{\tilde{0}}$ versus the alternative $\mathcal{H}_1$ is bounded by the ratio of the prior ordinates evaluated at the maximum likelihood estimate. This proves earlier statements from Morey and Rouder (2011, pp. 411-412) and confirms suggestions in Jeffreys (1961, p. 367) and Jeffreys (1973, p. 39, Eq. 2). In other words, the Bayes factor for the peri-null hypothesis is inconsistent. We end with suggestions on how a consistent method for hypothesis testing can be obtained without fully committing to a point-null hypothesis.

The Peri-Null Correction Factor

Consider the three hypotheses discussed earlier: the point-null hypothesis $\mathcal{H}_0$ sets the test-relevant parameter to a fixed value (e.g., $\delta = 0$); the peri-null hypothesis $\mathcal{H}_{\tilde{0}}$ assigns the test-relevant parameter a distribution that is tightly centered around the value of interest (e.g., $\delta \sim \pi(\delta \mid \mathcal{H}_{\tilde{0}}) = \mathcal{N}(0, \kappa_0^2)$ with $\kappa_0$ small); and the alternative hypothesis $\mathcal{H}_1$ assigns the test-relevant parameter a relatively wide prior distribution, $\delta \sim \pi(\delta \mid \mathcal{H}_1)$. The Bayes factor of interest is between $\mathcal{H}_1$ and $\mathcal{H}_{\tilde{0}}$, which can be expressed as the product of two Bayes factors involving $\mathcal{H}_0$:

$$
\underbrace{\frac{p(y^n \mid \mathcal{H}_1)}{p(y^n \mid \mathcal{H}_{\tilde{0}})}}_{\text{Peri-null } \mathrm{BF}_{1\tilde{0}}}
= \underbrace{\frac{p(y^n \mid \mathcal{H}_1)}{p(y^n \mid \mathcal{H}_0)}}_{\text{Point-null } \mathrm{BF}_{10}}
\times \underbrace{\frac{p(y^n \mid \mathcal{H}_0)}{p(y^n \mid \mathcal{H}_{\tilde{0}})}}_{\text{Correction factor } \mathrm{BF}_{0\tilde{0}}}. \tag{3}
$$

In words, the Bayes factor for the alternative hypothesis against the peri-null hypothesis equals the Bayes factor for the alternative hypothesis against the point-null hypothesis, multiplied by a correction factor (cf. Morey & Rouder, 2011, p. 411). This correction factor quantifies the extent to which the point-null hypothesis outpredicts the peri-null hypothesis. With data sets of moderate size, and $\kappa_0$ small, the peri-null and point-null hypotheses will make similar predictions, and consequently the correction factor will be close to 1. In such cases, the point-null can indeed be considered a mathematically convenient approximation to the peri-null.

Example
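
As a first, purely numerical illustration of Eq. (3), the sketch below checks the decomposition in a deliberately simplified setting. It assumes (these are assumptions made for the sketch, not the $t$-test setup of the paper) normal data with known unit variance, a normal peri-null prior $\mathcal{N}(0, \kappa_0^2)$, and a normal alternative prior $\mathcal{N}(0, \kappa_1^2)$ in place of a Cauchy prior, so that every marginal likelihood is available in closed form through the sample mean. All function names are hypothetical.

```python
import math


def log_marginal(ybar: float, n: int, kappa: float) -> float:
    """Log density of the sample mean when delta ~ N(0, kappa^2) and Y_i ~ N(delta, 1).

    The sample mean is then N(0, kappa^2 + 1/n); kappa = 0 gives the point-null.
    Factors of the full likelihood that do not involve delta cancel in any
    Bayes factor, so working with the sample mean suffices.
    """
    var = kappa**2 + 1.0 / n
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * ybar**2 / var


def log_bf(ybar: float, n: int, kappa_num: float, kappa_den: float) -> float:
    """Log Bayes factor of the hypothesis with width kappa_num over kappa_den."""
    return log_marginal(ybar, n, kappa_num) - log_marginal(ybar, n, kappa_den)


def decomposition(ybar: float, n: int, kappa0: float, kappa1: float):
    """Return the logs of the peri-null BF, point-null BF, and correction factor."""
    peri = log_bf(ybar, n, kappa1, kappa0)      # alternative vs peri-null
    point = log_bf(ybar, n, kappa1, 0.0)        # alternative vs point-null
    correction = log_bf(ybar, n, 0.0, kappa0)   # point-null vs peri-null
    return peri, point, correction


if __name__ == "__main__":
    for n in (90, 10_000):
        peri, point, corr = decomposition(ybar=0.2, n=n, kappa0=0.01, kappa1=1.0)
        print(f"n = {n}: log peri = {peri:.4f}, log point + log corr = "
              f"{point + corr:.4f}, correction factor = {math.exp(corr):.4g}")
```

In this toy setting the correction factor stays close to 1 for the moderate sample size and moves far away from 1 for the large one, which is the qualitative pattern described above.
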

Consider the hypothesis that “more desired objects are seen as closer” (Balcetis & Dunning, 2011). In the authors’ Study I, 90 participants had to estimate their distance to a bottle of water. Immediately prior to this task, 47 ‘thirsty’ participants had consumed a serving of pretzels, whereas 43 ‘quenched’ participants had drunk as much as they wanted from four 8-oz glasses of water. In line with the authors’ predictions, “Thirsty participants perceived the water bottle as closer ($M$ = 25.[…], $SD$ = 7.3) than quenched participants did ($M$ = 28.[…], $SD$ = 6.[…]) […] $t$ = 2.00 and $p$ = .[…] […] $t$-test concerning the test-relevant parameter $\delta$ may contrast $\mathcal{H}_0: \delta = 0$ versus $\mathcal{H}_1$ with a Cauchy distribution with mode 0 and interquartile range $\kappa_1$, the common default value $\kappa_1 = 1/\sqrt{2}$ […] = 1.[…] $\mathcal{H}_1$. We may also compute a peri-null correction factor by contrasting $\mathcal{H}_0: \delta = 0$ against $\mathcal{H}_{\tilde{0}}: \delta \sim \mathcal{N}(0, \kappa_0^2)$, with $\kappa_0 = 0.01$, say. The resulting peri-null correction factor is $\mathrm{BF}_{0\tilde{0}}$ = 0.[…] $\kappa_0 = 0.05$, we have $\mathrm{BF}_{0\tilde{0}}$ = 0.[…] factor of […] 1.259 to $\mathrm{BF}_{1\tilde{0}} = 1.167$ is utterly inconsequential.

Footnote: Calculated using the Summary Stats module in JASP (e.g., Ly et al., 2018; jasp-stats.org), and based on Gronau, Ly, and Wagenmakers (2020).

The difference between the peri-null and point-null Bayes factor remains inconsequential for larger values of $t$. When we change $t = 2.00$ to $t = 4.00$, the point-null Bayes factor equals $\mathrm{BF}_{10} = 174$, which according to Jeffreys’s classification of evidence (e.g., Jeffreys, 1961, Appendix B) is considered compelling evidence for $\mathcal{H}_1$. With $\kappa_0 = 0.01$, the peri-null correction factor equals $\mathrm{BF}_{0\tilde{0}} = 0.986$ and consequently the peri-null Bayes factor equals about 172 in favor of $\mathcal{H}_1$ over $\mathcal{H}_{\tilde{0}}$. With $\kappa_0$ = 0.[…] $\mathrm{BF}_{0\tilde{0}} = 0.713$ and $\mathrm{BF}_{1\tilde{0}} \approx$ […] $P(\mathcal{H}_1 \mid y^n) = 174/175 \approx .994$ versus $124/125 = 0.992$ […] $\mathcal{H}_1$ and $\mathcal{H}_{\tilde{0}}$ at the maximum likelihood estimate.

The Peri-Null Bayes Factor is Inconsistent

Historically, the main motivation for the development of the Bayes factor was the desire to be able to obtain arbitrarily large evidence for a general law: “We are looking for a system that will in suitable cases attach probabilities near 1 to a law.” (Jeffreys, 1977, p. 88; see also Etz & Wagenmakers, 2017; Ly et al., 2020; Wrinch & Jeffreys, 1921).

Statistically, this desideratum means that we want Bayes factors to be consistent, which implies that, as sample size increases, $\mathrm{BF}_{10}(Y^n)$ (i) grows without bound when the data are generated under the alternative model $\mathcal{H}_1$, and (ii) tends to zero when the data are generated under the null model, that is,

$$
\mathrm{BF}_{10}(Y^n) \overset{P_\theta}{\to} \infty \ \text{ for } P_\theta \in \mathcal{H}_1, \quad \text{and} \quad \mathrm{BF}_{10}(Y^n) \overset{P_\theta}{\to} 0 \ \text{ for } P_\theta \in \mathcal{H}_0, \tag{4}
$$

thus, regardless of the chosen prior model probabilities $P(\mathcal{H}_0), P(\mathcal{H}_1) \in (0, 1)$,

$$
P(\mathcal{H}_1 \mid Y^n) \overset{P_\theta}{\to} 1 \ \text{ for } P_\theta \in \mathcal{H}_1, \quad \text{and} \quad P(\mathcal{H}_0 \mid Y^n) \overset{P_\theta}{\to} 1 \ \text{ for } P_\theta \in \mathcal{H}_0, \tag{5}
$$

where $P_\theta$ refers to the data generating distribution, here, $Y_i \overset{\text{iid}}{\sim} P_\theta$, and where $X_n \overset{P_\theta}{\to} X$ denotes convergence in probability, that is, $\lim_{n \to \infty} P_\theta(|X_n - X| > \epsilon) = 0$ for every $\epsilon > 0$, as usual.

Suppose that the parameter $\theta \in \mathbb{R}^p$ can be separated into a test-relevant parameter $\delta \in \mathbb{R}$ and nuisance parameters $\sigma \in \mathbb{R}^{p-1}$. Below we prove that when the point-null hypothesis $\mathcal{H}_0: \delta = 0$ is replaced by a distribution over $\delta$, i.e., the peri-null hypothesis $\mathcal{H}_{\tilde{0}}: \delta \sim \pi(\delta \mid \mathcal{H}_{\tilde{0}})$, the resulting peri-null Bayes factor $\mathrm{BF}_{1\tilde{0}}(Y^n)$ is inconsistent (cf. suggestions by Jeffreys, 1961, p. 367; Jeffreys, 1973, p. 39, Eq. 2; and the statements by Morey & Rouder, 2011, pp. 411-412).

The inconsistency of peri-null Bayes factors follows quite directly from Laplace’s method (Laplace, 1774/1986) for nested model comparisons, and consistency of the maximum likelihood estimator (MLE).
Both Laplace’s method and consistency of the MLE hold under weaker conditions than stated here, namely, for absolutely continuous priors (e.g., van der Vaart, 1998, Chapter 10), and regular parametric models (e.g., van der Vaart, 1998, Chapter 7; Ly, Marsman, Verhagen, Grasman, & Wagenmakers, 2017, Appendix E). These models only need to be once differentiable with respect to $\theta$ in quadratic mean and have non-degenerate Fisher information matrices that are continuous in $\theta$ with determinants that are bounded away from zero and infinity. The inconsistency of the peri-null Bayes factor is therefore expected to hold more generally.

We show that under the stronger conditions of Kass, Tierney, and Kadane (1990), the asymptotic sampling distribution of peri-null Bayes factors can be easily derived. These stronger conditions imply that the model is regular, for which we know that the MLE is not only consistent, but also locally asymptotically normal with a variance equal to the inverse of the observed Fisher information matrix at $\hat{\theta}$, which has entries

$$
[\hat{I}(\hat{\theta})]_{a,b} = -\frac{1}{n} \sum_{i=1}^{n} \left( \frac{\partial^2}{\partial \theta_a \partial \theta_b} \log f(Y_i \mid \theta) \right) \Bigg|_{\theta = \hat{\theta}}, \tag{6}
$$

see for instance Ly et al. (2017) for details.

Theorem 1 (Limit of a peri-null Bayes factor). Let $Y^n = (Y_1, \ldots, Y_n)$ be independently and identically distributed random variables with common distribution $P_\theta \in \mathcal{P}_\Theta$, where $\mathcal{P}_\Theta$ is an identifiable family of distributions that is Laplace-regular (Kass et al., 1990). This implies that $\mathcal{P}_\Theta$ admits densities $f(y^n \mid \theta)$ with respect to the Lebesgue measure that are six times continuously differentiable in $\theta$ at the data-governing parameter $\theta \in \Theta \subset \mathbb{R}^p$ and $\Theta$ open with non-empty interior. Furthermore, assume that the (peri-null) prior densities $\pi(\theta \mid \mathcal{H}_{\tilde{0}})$ and $\pi(\theta \mid \mathcal{H}_1)$ assign positive mass to a neighborhood at the data-governing parameter $\theta$ and are four times continuously differentiable at $\theta$; then

$$
\mathrm{BF}_{1\tilde{0}}(Y^n) \overset{P_\theta}{\to} \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})}. \qquad \diamond
$$

Proof.

The condition that the model is Laplace-regular allows us to employ the Laplace method to approximate the numerator and the denominator of the peri-null Bayes factor by

$$
p(Y^n \mid \mathcal{H}_j) = f(Y^n \mid \hat{\theta}) \left( \frac{2\pi}{n} \right)^{p/2} |\hat{I}(\hat{\theta})|^{-1/2}\, \pi(\hat{\theta} \mid \mathcal{H}_j) \times \left( 1 + \frac{C_1(\hat{\theta} \mid \mathcal{H}_j)}{n} + \frac{C_2(\hat{\theta} \mid \mathcal{H}_j)}{n^2} + O(n^{-3}) \right), \tag{7}
$$

where $C_1(\hat{\theta} \mid \mathcal{H}_j)$ and $C_2(\hat{\theta} \mid \mathcal{H}_j)$ for $j = \tilde{0}, 1$ […]. The factors in Eq. (7) that do not depend on the prior are common to both marginal likelihoods and cancel in the ratio, so that

$$
\mathrm{BF}_{1\tilde{0}}(Y^n) = \frac{\pi(\hat{\theta} \mid \mathcal{H}_1) \left[ 1 + \frac{C_1(\hat{\theta} \mid \mathcal{H}_1)}{n} + O(n^{-2}) \right]}{\pi(\hat{\theta} \mid \mathcal{H}_{\tilde{0}}) \left[ 1 + \frac{C_1(\hat{\theta} \mid \mathcal{H}_{\tilde{0}})}{n} + O(n^{-2}) \right]}. \tag{8}
$$

Identifiability and the regularity conditions on the model imply that the maximum likelihood estimator is consistent, thus, $\hat{\theta} \overset{P_\theta}{\to} \theta$ (e.g., van der Vaart, 1998, Chapter 5). As all functions of $\hat{\theta}$ in Eq. (8) are smooth at $\theta$, the continuous mapping theorem applies and the assertion follows.

Theorem 1 implies that $\mathrm{BF}_{1\tilde{0}}(Y^n)$ is inconsistent; for all data-governing parameter values that have a neighborhood that receives positive mass from both priors, the peri-null Bayes factor approaches a limit that is given by the ratio of prior densities evaluated at the data-governing $\theta$ as $n$ increases. Note that this holds in particular for the test point of interest, e.g., $\delta = 0$, which has a neighborhood that the peri-null prior assigns positive mass to. The limit in Theorem 1 can also be derived using the generalized Savage-Dickey density ratio (Verdinelli & Wasserman, 1995) and exploiting the transitivity of the Bayes factor. Theorem 1, however, can be more straightforwardly extended to characterize the asymptotic behavior of the peri-null Bayes factor.

The limiting value of the peri-null Bayes factor is not representative when $n$ is small or moderate. Theorem 2 below shows that the sampling mean of $\log \mathrm{BF}_{1\tilde{0}}(Y^n)$ is expected to be of smaller magnitude than its limiting value.
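
The behavior described by Theorem 1 can be made tangible in a toy model where no Laplace approximation is needed. The sketch below assumes (for illustration only) normal data with known unit variance and normal priors $\mathcal{N}(0, \kappa_0^2)$ and $\mathcal{N}(0, \kappa_1^2)$ under the peri-null and the alternative, and it plugs in the data-governing value for the sample mean, so sampling noise is ignored. Function names and parameter values are hypothetical.

```python
import math


def log_normal_pdf(x: float, var: float) -> float:
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * x**2 / var


def log_bf_peri(delta: float, n: int, kappa0: float, kappa1: float) -> float:
    """Log peri-null Bayes factor (alternative over peri-null).

    The sample mean is set equal to delta (plug-in), and the marginal
    distribution of the sample mean under a N(0, kappa^2) prior is
    N(0, kappa^2 + 1/n).
    """
    return (log_normal_pdf(delta, kappa1**2 + 1.0 / n)
            - log_normal_pdf(delta, kappa0**2 + 1.0 / n))


def log_bf_point(delta: float, n: int, kappa1: float) -> float:
    """Log point-null Bayes factor (alternative over delta = 0), same plug-in."""
    return log_normal_pdf(delta, kappa1**2 + 1.0 / n) - log_normal_pdf(delta, 1.0 / n)


def log_limit(delta: float, kappa0: float, kappa1: float) -> float:
    """Limit in Theorem 1: log ratio of the two prior densities at delta."""
    return log_normal_pdf(delta, kappa1**2) - log_normal_pdf(delta, kappa0**2)


if __name__ == "__main__":
    delta, kappa0, kappa1 = 0.2, 0.05, 1.0
    print("limit:", log_limit(delta, kappa0, kappa1))
    for n in (100, 10_000, 1_000_000):
        print(n, log_bf_peri(delta, n, kappa0, kappa1), log_bf_point(delta, n, kappa1))
```

In this sketch the peri-null log Bayes factor climbs towards the log ratio of prior densities and stays below it, whereas the point-null log Bayes factor keeps growing with $n$. For a data-governing value of 0 the peri-null log Bayes factor instead decreases towards its limit from above.
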
In other words, the limit in Theorem 1 should be viewed as an upper bound under the alternative and a lower bound under the null.

This theorem exploits the fact that without a point-null hypothesis the gradients of the densities $\pi(\theta \mid \mathcal{H}_1)$ and $\pi(\theta \mid \mathcal{H}_{\tilde{0}})$ are of the same dimension, which implies that the gradient $\frac{\partial}{\partial \theta} \log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right)$ is well-defined. As such, the delta method can be used to show that the peri-null Bayes factor inherits the asymptotic normality property of the MLE.

To state the theorem we write $D$ for the differential operator with respect to $\theta$, e.g., $[D^1 \pi(\theta \mid \mathcal{H}_j)] = \frac{\partial}{\partial \theta} \pi(\theta \mid \mathcal{H}_j)$ denotes the gradient, and $[D^2 \pi(\theta \mid \mathcal{H}_j)] = \frac{\partial^2}{\partial \theta \partial \theta^T} \pi(\theta \mid \mathcal{H}_j)$ denotes the Hessian matrix.

Theorem 2 (Asymptotic sampling distribution of a peri-null Bayes factor). Under the regularity conditions stated in Theorem 1 and for all data-governing parameters $\theta$ for which

$$
\dot{v}(\theta) := \left[ D^1 \log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right) \right] \neq 0, \tag{9}
$$

the asymptotic sampling distribution of the logarithm of the peri-null Bayes factor is normal, that is,

$$
\sqrt{n} \left( \log \mathrm{BF}_{1\tilde{0}}(Y^n) - \log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right) - E(\theta, n) \right) \overset{P_\theta}{\rightsquigarrow} \mathcal{N}\left( 0, \dot{v}(\theta)^T I^{-1}(\theta)\, \dot{v}(\theta) \right), \tag{10}
$$

where $\overset{P_\theta}{\rightsquigarrow}$ denotes convergence in distribution under $P_\theta$ and where

$$
E(\theta, n) = \log \left( \frac{1 + C_1(\theta \mid \mathcal{H}_1)/n + C_2(\theta \mid \mathcal{H}_1)/n^2}{1 + C_1(\theta \mid \mathcal{H}_{\tilde{0}})/n + C_2(\theta \mid \mathcal{H}_{\tilde{0}})/n^2} \right), \tag{11}
$$

is a bias term that is asymptotically negligible.

For all $\theta$ for which $\dot{v}(\theta) = 0$, but $\ddot{v}(\theta) := \left[ D^2 \log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right) \right] \neq 0$, the asymptotic distribution of $\log \mathrm{BF}_{1\tilde{0}}(Y^n)$ has a quadratic form, that is,

$$
n \left( \log \mathrm{BF}_{1\tilde{0}}(Y^n) - \log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right) - E(\theta, n) \right) \overset{P_\theta}{\rightsquigarrow} \tfrac{1}{2}\, Z^T I^{-1/2}(\theta)\, \ddot{v}(\theta)\, I^{-1/2}(\theta)\, Z, \tag{12}
$$

where $Z \sim \mathcal{N}(0, I)$ with $I \in \mathbb{R}^{p \times p}$ the identity matrix. $\diamond$

Proof. The proof depends on (another) Taylor series expansion, see Appendix A for full details. Firstly, we recall that $\sqrt{n}(\hat{\theta} - \theta) \overset{P_\theta}{\rightsquigarrow} \mathcal{N}(0, I^{-1}(\theta))$. To relate this asymptotic distribution to that of $\log \mathrm{BF}_{1\tilde{0}}(Y^n)$, we note that Eq. (8) is, up to a decreasing error in $n$, a smooth function of the maximum likelihood estimator. The goal is to ensure that the error terms $1 + C_1(\theta \mid \mathcal{H}_j)/n + C_2(\theta \mid \mathcal{H}_j)/n^2$ are asymptotically negligible. A Taylor series expansion at the data-governing $\theta$ shows that

$$
\log \mathrm{BF}_{1\tilde{0}}(Y^n) = \log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right) + E(\theta, n) + (\hat{\theta} - \theta)^T \left( \dot{v}(\theta) + [D^1 E(\theta, n)] \right) + (\hat{\theta} - \theta)^T\, \frac{\ddot{v}(\theta) + [D^2 E(\theta, n)]}{2}\, (\hat{\theta} - \theta) + O_P(n^{-3/2}). \tag{13}
$$

The asymptotic normality result follows after rearranging Eq. (13), a multiplication of $\sqrt{n}$ on both sides, and an application of Slutsky’s lemma. Similarly, when $\dot{v}(\theta)$ is zero, but $\ddot{v}(\theta)$ is not, we have

$$
n \log \mathrm{BF}_{1\tilde{0}}(Y^n) = n \left( \log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right) + E(\theta, n) \right) + \sqrt{n} (\hat{\theta} - \theta)^T\, \frac{\ddot{v}(\theta) + O(n^{-1})}{2}\, \sqrt{n} (\hat{\theta} - \theta) + O_P(n^{-1/2}). \tag{14}
$$

Since $\sqrt{n}(\hat{\theta} - \theta) \overset{P_\theta}{\rightsquigarrow} \mathcal{N}(0, I(\theta)^{-1})$, the second order result follows.

To conclude that the bias term is indeed asymptotically negligible, note that $\log(1 + x/n) \approx x/n$ as $n \to \infty$ and therefore

$$
D^k E(\theta, n) = O\left( \frac{1}{n} D^k \left( C_1(\theta \mid \mathcal{H}_1) - C_1(\theta \mid \mathcal{H}_{\tilde{0}}) \right) \right)
$$

for all $k \le 3$. The approximation $\log(1 + x/n) \approx x/n$ requires $C_k(\theta \mid \mathcal{H}_j)$ for $k = 1,$ […] $j = \tilde{0},$ […] $\kappa_0$ is relatively small compared to $\kappa_1$. The bias is, therefore, expected to decay much more slowly.

Theorem 2 also shows that under the alternative hypothesis, $\log \mathrm{BF}_{1\tilde{0}}(Y^n)$ is expected to increase towards the limiting value $\log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right)$ as $n \to \infty$ whenever $E(\theta, n) < 0$. The bias is negative, because if the data-governing parameter $\delta$ is far from zero, but the peri-null prior is specified such that it is peaked at zero, the Laplace approximations become less accurate. In other words, for fixed $n$ and $\delta \neq 0$, we typically have $C_1(\theta \mid \mathcal{H}_1) \le C_1(\theta \mid \mathcal{H}_{\tilde{0}})$ and $C_2(\theta \mid \mathcal{H}_1) \le C_2(\theta \mid \mathcal{H}_{\tilde{0}})$ and, therefore, $E(\theta, n) < 0$.

Example

We consider a Bayesian $t$-test and for the peri-null Bayes factor use the priors

$$
\pi(\delta, \sigma \mid \mathcal{H}_1) \propto \text{Cauchy}(\delta; 0, \kappa_1)\, \sigma^{-1} \quad \text{and} \quad \pi(\delta, \sigma \mid \mathcal{H}_{\tilde{0}}) \propto \mathcal{N}(\delta; 0, \kappa_0^2)\, \sigma^{-1}. \tag{15}
$$

Note that $\pi(\delta, \sigma \mid \mathcal{H}_1)$ is chosen as in the default Bayesian $t$-test (Jeffreys, 1948; Ly, Verhagen, & Wagenmakers, 2016a, 2016b; Rouder et al., 2009), where $\kappa_1 >$ […] $\delta = \mu/\sigma$, and $\sigma \propto \sigma^{-1}$ implies that the prior on the standard deviation common to both models is proportional to $\sigma^{-1}$ (for advantages of this choice see Grünwald, de Heide, & Koolen, 2019; Hendriksen, de Heide, & Grünwald, in press). For data-governing parameters $\theta = (\mu, \sigma)$, where $\mu$ is the population mean, Theorem 1 shows that as $n \to \infty$

$$
\log \mathrm{BF}_{1\tilde{0}}(Y^n; \kappa_0, \kappa_1) \overset{P_\theta}{\to} \log \left( \frac{\sqrt{2}\, \kappa_0 \exp\left( \frac{\mu^2}{2 \kappa_0^2 \sigma^2} \right)}{\sqrt{\pi}\, \kappa_1 \left( 1 + \left[ \frac{\mu}{\kappa_1 \sigma} \right]^2 \right)} \right) =: v(\theta). \tag{16}
$$

Direct calculations show that $\dot{v}(\theta) = 0$ only when $\mu = 0$. Hence, under the alternative $\mu \neq 0$, the logarithm of these peri-null Bayes factor $t$-tests is asymptotically normal with an approximate variance of

$$
\frac{(\mu^4 + 2 \mu^2 \sigma^2) \left( \mu^2 + (\kappa_1^2 - 2\kappa_0^2) \sigma^2 \right)^2}{2 \kappa_0^4 \sigma^4 (\mu^2 + \kappa_1^2 \sigma^2)^2\, n}. \tag{17}
$$

To characterize the asymptotic mean we also require the bias term $E(\theta, n)$, which for the problem at hand comprises the coefficients $C_1(\theta \mid \mathcal{H}_1)$ (18), $C_2(\theta \mid \mathcal{H}_1)$ (19), $C_1(\theta \mid \mathcal{H}_{\tilde{0}})$ (20), and $C_2(\theta \mid \mathcal{H}_{\tilde{0}})$ (21), each a rational function of $\mu$, $\sigma$, $\kappa_0$, and $\kappa_1$ […].

More concretely, under $\mu = 0.167$ and $\sigma = 1$, $\log \mathrm{BF}_{1\tilde{0}}(Y^n; 0.05, 1)$ converges in probability to $\log(10)$. This limit is depicted as the brown dashed horizontal curve in the top left subplot of Fig. 1.

Figure 1 (four panels; columns $\kappa_0 = 0.05$ and $\kappa_0 = 0.10$; horizontal axis $n$, vertical axis log BF). Under the alternative, the logarithm of the peri-null Bayes factor $t$-test is asymptotically normal with a mean (i.e., the solid curves) that increases to the limit, e.g., $\log \mathrm{BF}_{1\tilde{0}} = \log(10)$ and $\log \mathrm{BF}_{1\tilde{0}} = \log(30)$ in the top and bottom row respectively. The black and red curves correspond to the simulated and asymptotic normal sampling distribution respectively. The dotted curves show the 97.5% and 2.5% quantiles of the respective sampling distribution. Note that the convergence to the upper bound is slower when the peri-null is more concentrated, e.g., compare the left to the right column.

This subplot also shows the mean (solid red curve) and the 97.5% and 2.5% quantiles (dotted red curves above and below the solid curve respectively) based on the asymptotic normal result of Theorem 2. The black curves represent the analogous quantities based on simulated normal data with $\mu = 0.167$ and $\sigma = 1$ based on 1,000 replications at sample sizes $n = 100,$ […] $p(Y^n \mid \mathcal{H}_{\tilde{0}})$ is still inaccurate. As expected, the Laplace approximation becomes accurate sooner whenever the peri-null prior is less concentrated. The top right subplot depicts results of $\log \mathrm{BF}_{1\tilde{0}}(Y^n; 0.10, 1)$ under $\mu = 0.314$ and $\sigma = 1$, which converges in probability to $\log(10)$.

Similarly, the asymptotic normal distribution becomes adequate at a smaller sample size for larger population means $\mu$. The bottom left subplot corresponds to $\log \mathrm{BF}_{1\tilde{0}}(Y^n; 0.05, 1)$ under $\mu = 0.182$ and $\sigma = 1$, whereas the bottom right subplot corresponds to $\log \mathrm{BF}_{1\tilde{0}}(Y^n; 0.10, 1)$ under $\mu = 0.348$ and $\sigma = 1$. The logarithms of both peri-null Bayes factors converge in probability to $\log(30)$.

In sum, the plots show that under the alternative hypothesis the asymptotic normal distribution approximates the sampling distribution of the logarithm of the peri-null Bayes factor quite well, and it approximates better when the peri-null prior is less concentrated.

Under the null hypothesis $\mu = 0$, the gradient $\dot{v}(0, \sigma) = 0$, and so is the Hessian, except for the first entry of $\ddot{v}$, that is,

$$
\frac{\partial^2}{\partial \mu^2} v(\mu, \sigma) \bigg|_{\mu = 0} = \frac{\kappa_1^2 - 2\kappa_0^2}{\kappa_0^2 \kappa_1^2 \sigma^2}. \tag{22}
$$

As such, $\log \mathrm{BF}_{1\tilde{0}}(Y^n)$ has a shifted asymptotically $\chi^2(1)$-distribution, i.e.,

$$
n \left( \log \mathrm{BF}_{1\tilde{0}}(Y^n; \kappa_0, \kappa_1) - \log \left( \frac{\pi(\theta \mid \mathcal{H}_1)}{\pi(\theta \mid \mathcal{H}_{\tilde{0}})} \right) - E(\theta, n) \right) \overset{P_{0,\sigma}}{\rightsquigarrow} \frac{\kappa_1^2 - 2\kappa_0^2}{2 \kappa_0^2 \kappa_1^2}\, Z^2, \tag{23}
$$

where $Z \sim \mathcal{N}(0, 1)$. Under $\mu = 0$ and $\sigma = 1$, $\log \mathrm{BF}_{1\tilde{0}}(Y^n; 0.05, 1) \overset{P_{0,\sigma}}{\to} -3.22$, whereas $\log \mathrm{BF}_{1\tilde{0}}(Y^n; 0.10, 1)$ converges in probability to $-2.53$. Both cases yield evidence for the null hypothesis, but the evidence is stronger for the peri-null that is more tightly concentrated around 0. The approximation based on the asymptotic $\chi^2(1)$-distribution (in red) and the simulations (in black) are shown in Fig. 2. In the left subplot, the curves based on the asymptotic $\chi^2(1)$-distribution only start from $n = 185$, because only for $n \ge 185$ does $\log(1 + C_1(0, 1 \mid \mathcal{H}_{\tilde{0}})/n + C_2(0, 1 \mid \mathcal{H}_{\tilde{0}})/n^2)$ have a non-negative argument; for $\kappa_0 = 0.05$ we have that $C_{[…]}(0, 1 \mid \mathcal{H}_{\tilde{0}}) = -$[…].83. Note that under the null hypothesis, the Laplace approximations are accurate sooner than under the alternative hypothesis, because the priors are already concentrated at zero. Under the null hypothesis the general observation remains true that for reasonable sample sizes the expected peri-null Bayes factor is far from the limiting value.

Figure 2 (two panels; $\kappa_0 = 0.05$ left and $\kappa_0 = 0.10$ right; horizontal axis $n$, vertical axis log BF). Under the null, the logarithm of the peri-null Bayes factor $t$-test has a shifted asymptotically $\chi^2(1)$-distribution with a mean (i.e., the solid curves) that decreases to the limit, e.g., $\log \mathrm{BF}_{1\tilde{0}} = -3.22$ and $\log \mathrm{BF}_{1\tilde{0}} = -2.53$ in the left and right plot respectively. The black and red curves correspond to the simulated and asymptotic $\chi^2(1)$ sampling distribution respectively. The dotted curves show the 97.5% and 2.5% quantiles of the respective sampling distribution. Note that the convergence to the lower bound is slower when the peri-null is more concentrated, e.g., compare the left to the right plot.

Unlike the peri-null Bayes factor, the (default) point-null Bayes factor is consistent. Fig. 3 shows the simulated sampling distribution of the point-null and peri-null Bayes factors in blue and black respectively. As before, the 97.5% quantile (top dotted curve), the average (solid curve), and the 2.5% quantile (bottom dotted curve) are depicted as well.

The top left subplot of Fig. 3 shows that under $\mu = 0.167$ and $\sigma = 1$ the point-null and peri-null Bayes factor behave similarly up to $n = 30$. Furthermore, the average point-null log Bayes factor crosses the peri-null upper bound of $\log(10)$ at around $n = 380$, whereas the peri-null Bayes factor remains bounded even in the limit, and is therefore inconsistent. The top right subplot shows, under $\mu = 0.314$ and $\sigma = 1$, that the discrepancy between the point-null and peri-null Bayes factor becomes apparent sooner when the peri-null prior is less concentrated, i.e., $\kappa_0 = 0.10$ instead of $\kappa_0 = 0.05$. Also note that under these alternatives, the logarithm of the point-null Bayes factor grows linearly (e.g., Bahadur & Bickel, 2009; Johnson & Rossell, 2010). Hence, the point-null Bayes factor has a larger power to detect an effect than that afforded by the peri-null Bayes factor.

The bottom row of Fig. 3 paints a similar picture; under the null the point-null Bayes factor accumulates evidence for the null hypothesis without bound as $n$ grows. For $\kappa_0 = 0.05$ the behavior of the peri-null and the point-null Bayes factor is similar up to $n = 200$ and it takes about $n = 1{,}000$ samples before the average point-null log Bayes factor crosses the peri-null lower bound of $-3.22$. For $\kappa_0 = 0.10$ only $n = 270$ samples are needed before the log Bayes factor for the point-null hypothesis crosses the peri-null lower bound of $-2.53$.

Towards Consistent Peri-Null Bayes Factors

There are at least three methods to adjust the peri-null Bayes factor in order to avoid inconsistency. The first method changes both the point-null hypothesis $\mathcal{H}_0$ and the alternative hypothesis $\mathcal{H}_1$. Specifically, one may define the hypotheses under test to be non-overlapping (e.g., Chandramouli & Shiffrin, 2019). The resulting procedure is usually known as an ‘interval-null hypothesis’, where the interval-null is defined as a (renormalized) slice of the prior distribution for the test-relevant parameter under an alternative hypothesis (e.g., Morey & Rouder, 2011).

Figure 3 (four panels; columns $\kappa_0 = 0.05$ and $\kappa_0 = 0.10$; horizontal axis $n$, vertical axis log BF). (Default) point-null Bayes factor $t$-tests (depicted in blue) are consistent under both the alternative and null, e.g., top and bottom row respectively, as opposed to peri-null Bayes factors (depicted in black). Note that the peri-null and the default point-null Bayes factors behave similarly when $n$ is small. The domain where the two types of Bayes factors behave similarly is smaller when the peri-null is less concentrated, e.g., compare the right to the left column.

For instance, in the case of a $t$-test an encompassing hypothesis $\mathcal{H}_e$ may assign effect size $\delta$ a Cauchy distribution with mode 0 and interquartile range $\kappa_e$; from this encompassing hypothesis one may construct two rival hypotheses by restricting the Cauchy prior to particular intervals: the interval-null hypothesis truncates the encompassing Cauchy to an interval centered on $\delta = 0$: $\delta \sim \text{Cauchy}(0, \kappa_e)\, I(-a, a)$, whereas the interval-alternative hypothesis is the conjunction of the remaining two intervals, $\delta \sim \text{Cauchy}(0, \kappa_e)\, I(-\infty, -a)$ and $\delta \sim \text{Cauchy}(0, \kappa_e)\, I(a, \infty)$. The resulting peri-null Bayes factor is then consistent in accordance with subjective interval belief; for all data-governing parameters $\delta$ in the interior of the interval-null, $\lim_{n \to \infty} \mathrm{BF}_{1\tilde{0}} = 0$, and for $\delta$ in the interior of the sliced out alternative $\lim_{n \to \infty} \mathrm{BF}_{\tilde{0}1} = 0$.
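
The interval-null construction can be sketched numerically by using the fact that a Bayes factor between two truncations of the same encompassing prior equals the posterior odds of the two regions divided by their prior odds, both computed under the encompassing model. To keep everything in closed form, the sketch below departs from the text in two labeled ways: it assumes a normal encompassing prior $\mathcal{N}(0, \kappa_e^2)$ instead of a Cauchy prior, and normal data with known unit variance whose sample mean is plugged in directly. All names are hypothetical.

```python
import math


def interval_probs(mean: float, sd: float, a: float):
    """Return (P(|delta| < a), P(|delta| >= a)) for delta ~ N(mean, sd^2).

    erfc is used so that very small probabilities keep their relative accuracy.
    """
    m = abs(mean)  # the interval (-a, a) is symmetric, so only |mean| matters
    scale = sd * math.sqrt(2.0)
    p_out = 0.5 * math.erfc((a - m) / scale) + 0.5 * math.erfc((a + m) / scale)
    p_in = 0.5 * math.erfc((m - a) / scale) - 0.5 * math.erfc((m + a) / scale)
    return p_in, p_out


def log_bf_interval_null(ybar: float, n: int, a: float, kappa_e: float) -> float:
    """Log Bayes factor of the interval-null (|delta| < a) over the interval-alternative.

    Assumes Y_i ~ N(delta, 1) and an encompassing prior delta ~ N(0, kappa_e^2),
    so the encompassing posterior is normal (conjugate update).
    """
    prior_in, prior_out = interval_probs(0.0, kappa_e, a)
    post_precision = 1.0 / kappa_e**2 + n
    post_mean = n * ybar / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    post_in, post_out = interval_probs(post_mean, post_sd, a)
    # Posterior odds divided by prior odds of the two regions.
    return (math.log(post_in) - math.log(post_out)) - (math.log(prior_in) - math.log(prior_out))


if __name__ == "__main__":
    # Sample mean inside the interval: evidence for the interval-null keeps growing.
    # For much larger n the tail probabilities underflow in double precision.
    for n in (10, 100, 1000, 10_000):
        print(n, log_bf_interval_null(ybar=0.7, n=n, a=1.0, kappa_e=1.0))
```

Under these assumptions the log Bayes factor in favor of the interval-null increases with $n$ when the plugged-in mean lies inside $(-a, a)$ and decreases when it lies outside, in line with the consistency property described above.
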
In particular, when $a = 1$ and the data-governing $\delta = 0.7$, then this Bayes factor will eventually show unbounded evidence for the interval-null. Apart from the need to specify the width of the interval (Jeffreys, 1961, p. 367), the disadvantages of this method are twofold. Firstly, the prior distributions for the rival interval hypotheses are of an unusual shape: a continuous distribution up to the point of truncation, where the prior mass abruptly drops to zero. It is debatable whether such artificial forms would ever result from an elicitation effort. The second disadvantage is that it seems somewhat circuitous to parry the critique “the null hypothesis is never true exactly” by adjusting both the null hypothesis and the alternative hypothesis.

Footnote: For consistency to hold the standard condition is assumed that the interval-null or sliced up prior assigns positive mass to a neighborhood of $\delta$ in the respective intervals.

The second method to specify a (partially) consistent peri-null Bayes factor is to change the point-null hypothesis to a peri-null hypothesis by supplementing rather than supplanting the spike with a distribution (Morey & Rouder, 2011). In other words, the point-null hypothesis is upgraded to include a narrow distribution around the spike. This mixture distribution is generally known as a ‘spike-and-slab’ prior, but here the slab represents the peri-null hypothesis and is relatively peaked. This mixture model $\mathcal{H}'_0$ may be called a ‘hybrid null hypothesis’ (Morey & Rouder, 2011), a ‘mixture null hypothesis’, or a ‘peri-point null hypothesis’. Thus, $\mathcal{H}'_0 = \xi \mathcal{H}_0 + (1 - \xi) \mathcal{H}_{\tilde{0}}$, with $\xi \in (0, 1)$ the mixture weights and, say, $\xi$ = .[…] Because $\xi > 0$ […] $\mathcal{H}'_0$ to $\mathcal{H}_1$ will be consistent when the data come from $\mathcal{H}_0$; and because $\mathcal{H}'_0$ also has mass away from the point under test, the presence of a tiny true non-zero effect will not lead to the certain rejection of the null hypothesis as $n$ grows large. The data determine which of the two peri-point components receives the most weight. As before, for modest sample sizes and small $\kappa_0$, the distinction between point-null, peri-null, and peri-point null is immaterial. The main drawback of the peri-point null hypothesis is that it is consistent only when the data come from $\mathcal{H}_0$; when the data come from $\mathcal{H}_1$ or $\mathcal{H}_{\tilde{0}}$, the Bayes factor remains bounded as before (i.e., Eq. 8).

The third method is to define a peri-null hypothesis whose width $\kappa_0$ slowly decreases with sample size (i.e., a ‘shrinking peri-null hypothesis’). For the $t$-test, one can take $\kappa_0 = c\sigma/\sqrt{n}$ for some constant $c >$ […] $\kappa_0$ shrinks too quickly. The representation Eq. (3) shows that this consistency fix is equivalent to keeping the peri-null correction Bayes factor $\mathrm{BF}_{0\tilde{0}}$ close to one regardless of the data. Note that this is attainable as $\kappa_0 \to$ […]

Concluding Comments

The objection that “the null hypothesis is never true” may be countered by abandoning the point-null hypothesis in favor of a peri-null hypothesis. For moderate sample sizes and relatively narrow peri-nulls, this change leaves the Bayes factor relatively unaffected. For large sample sizes, however, the change exerts a profound influence and causes the Bayes factor to be inconsistent, with a limiting value given by the ratio of prior ordinates evaluated at the maximum likelihood estimate (cf. Jeffreys, 1961, p. 367 and Morey & Rouder, 2011, pp. 411–412). Here we also derived the asymptotic sampling distribution of the peri-null Bayes factor and showed that its limiting value is essentially an upper bound under the alternative and a lower bound under the null. The asymptotic distributions also provide insight into typical values of the peri-null Bayes factor at a finite n. Note that there exist several Bayes factor methods that have replaced point-null hypotheses with either peri-null hypotheses (e.g., Stochastic Search Variable Selection; SSVS; George & McCulloch, 1993) or with other hypotheses that have a continuous prior distribution close to zero (e.g., the sceptical prior proposed by Pawel & Held, 2020). As far as evidence from the marginal likelihood is concerned, these methods are therefore inconsistent.

Footnote (on SSVS): “A similar setup in this context was considered by Mitchell and Beauchamp (1988), who instead used “spike and slab” mixtures. An important distinction of our approach is that we do not put a probability mass on β_i = 0.” (George & McCulloch, 1993, p. 883).

Inconsistency may not trouble subjective Bayesians: if the peri-null hypothesis truly reflects the belief of a subjective sceptic, and the alternative hypothesis truly reflects the belief of a subjective proponent, then the Bayes factor provides the relative predictive success for the sceptic versus the proponent, and it is irrelevant whether or not this relative success is bounded. Objective Bayesians, however, develop and apply procedures that meet various desiderata (e.g., Bayarri, Berger, Forte, & García-Donato, 2012; Consonni, Fouskakis, Liseo, & Ntzoufras, 2018), with consistency a prominent example. As indicated above, the desire for consistency was the primary motivation for the development of the Bayesian hypothesis test (Wrinch & Jeffreys, 1921). For objective Bayesians then, it appears the point-null hypothesis is more than just a mathematically convenient approximation to the peri-null hypothesis (Jeffreys, 1961, p. 367). The peri-point mixture model (consistent only under the point-null hypothesis) and the shrinking peri-point model (incoherent because the prior width depends on sample size) may provide acceptable compromise solutions.

Regardless of one’s opinion on the importance of consistency, it is evident that seemingly inconsequential changes in model specification may asymptotically yield fundamentally different results. Researchers who entertain the use of peri-null hypotheses should be aware of the asymptotic consequences; in addition, it generally appears prudent to apply several tests and establish that the conclusions are relatively robust.
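The contrast between the two limiting behaviours can be illustrated numerically. The sketch below is not the t-test setup of the paper: it assumes a normal mean with known standard deviation and replaces the Cauchy priors with normal priors so that both Bayes factors are available in closed form. The function names, the prior widths, and the idealisation that the sample mean equals the true mean exactly are all assumptions made for illustration.

```python
import math


def log_normal_pdf(x, var):
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * x * x / var


def log_bf_point_null(ybar, n, kappa1, sigma=1.0):
    """log BF of H1: mu ~ N(0, kappa1^2) against the point null mu = 0."""
    se2 = sigma ** 2 / n  # sampling variance of the sample mean
    return log_normal_pdf(ybar, kappa1 ** 2 + se2) - log_normal_pdf(ybar, se2)


def log_bf_peri_null(ybar, n, kappa0, kappa1, sigma=1.0):
    """log BF of H1 against the peri-null mu ~ N(0, kappa0^2)."""
    se2 = sigma ** 2 / n
    return (log_normal_pdf(ybar, kappa1 ** 2 + se2)
            - log_normal_pdf(ybar, kappa0 ** 2 + se2))


def log_prior_ordinate_ratio(mu_hat, kappa0, kappa1):
    """Log ratio of the two prior densities evaluated at the estimate."""
    return log_normal_pdf(mu_hat, kappa1 ** 2) - log_normal_pdf(mu_hat, kappa0 ** 2)


if __name__ == "__main__":
    mu, kappa0, kappa1 = 0.2, 0.05, 1.0  # hypothetical values
    print("limit:", log_prior_ordinate_ratio(mu, kappa0, kappa1))
    for n in (10, 1_000, 100_000, 10_000_000):
        # Idealisation: the sample mean is set equal to the true mean.
        print(n, log_bf_point_null(mu, n, kappa1),
              log_bf_peri_null(mu, n, kappa0, kappa1))
```

Under these assumptions the point-null log Bayes factor grows roughly linearly in n, whereas the peri-null log Bayes factor levels off at the log ratio of prior ordinates.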

Acknowledgements

This research was supported by the Netherlands Organisation for Scientific Research (NWO; grant

References

Bahadur, R. R., & Bickel, P. J. (2009). An optimality property of Bayes’ test statistics. Lecture Notes-Monograph Series, 18–30.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 423–437.
Balcetis, E., & Dunning, D. (2011). Wishful seeing: More desired objects are seen as closer. Psychological Science, 147–152.
Bayarri, M. J., Berger, J. O., Forte, A., & García-Donato, G. (2012). Criteria for Bayesian model choice with application to variable selection. The Annals of Statistics, 1550–1577.
Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 317–352.
Berkson, J. (1938). Some difficulties of interpretation encountered in the application of the chi-square test. Journal of the American Statistical Association, 526–536.
Chandramouli, S. H., & Shiffrin, R. M. (2019). Commentary on Gronau and Wagenmakers. Computational Brain & Behavior, 12–21.
Consonni, G., Fouskakis, D., Liseo, B., & Ntzoufras, I. (2018). Prior distributions for objective Bayesian analysis. Bayesian Analysis, 627–679.
Cornfield, J. (1966). A Bayesian test of some classical hypotheses—with applications to sequential clinical trials. Journal of the American Statistical Association, 577–594.
Cornfield, J. (1969). The Bayesian outlook and its application. Biometrics, 617–657.
Dawid, A. P. (1984). Present position and potential developments: Some personal views: Statistical theory: The prequential approach (with discussion). Journal of the Royal Statistical Society Series A, 278–292.
Dickey, J. M. (1976). Approximate posterior distributions. Journal of the American Statistical Association, 680–689.
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 193–242.
Etz, A., & Wagenmakers, E.-J. (2017). J. B. S. Haldane’s contribution to the Bayes factor hypothesis test. Statistical Science, 313–329.
Gallistel, C. R. (2009). The importance of proving the null. Psychological Review, 439–453.
George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 881–889.
Good, I. J. (1967). A Bayesian significance test for multinomial distributions. Journal of the Royal Statistical Society, Series B (Methodological), 399–431.
Gronau, Q. F., Ly, A., & Wagenmakers, E.-J. (2020). Informed Bayesian t-tests. The American Statistician, 137–143.
Grünwald, P., de Heide, R., & Koolen, W. (2019). Safe testing. arXiv preprint arXiv:1906.07801.
Hendriksen, A., de Heide, R., & Grünwald, P. (in press). Optional stopping with Bayes factors: A categorization and extension of folklore results, with an application to invariant situations. Bayesian Analysis.

Isserlis, L. (1918). On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 134–139.
Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. Proceedings of the Cambridge Philosophical Society, 203–222.
Jeffreys, H. (1936). Further significance tests. In (Vol. 32, pp. 416–445).
Jeffreys, H. (1937). Scientific method, causality, and reality. Proceedings of the Aristotelian Society, 61–70.
Jeffreys, H. (1939). Theory of probability (1st ed.). Oxford, UK: Oxford University Press.
Jeffreys, H. (1948). Theory of probability (2nd ed.). Oxford, UK: Oxford University Press.
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press.
Jeffreys, H. (1973). Scientific inference (3rd ed.). Cambridge, UK: Cambridge University Press.
Jeffreys, H. (1977). Probability theory in geophysics. Journal of the Institute of Mathematics and its Applications, 87–96.
Johnson, V. E., & Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 143–170.
Jones, L. V., & Tukey, J. W. (2000). A sensible formulation of the significance test. Psychological Methods, 411–414.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 773–795.
Kass, R. E., Tierney, L., & Kadane, J. B. (1990). The validity of posterior expansions based on Laplace’s method. In S. Geisser, J. S. Hodges, S. J. Press, & A. Zellner (Eds.), Bayesian and likelihood methods in statistics and econometrics: Essays in honor of George A. Barnard (Vol. 1, pp. 473–488). Elsevier.
Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review, 178–206.
Laplace, P.-S. (1774/1986). Memoir on the probability of the causes of events. Statistical Science, 364–378.
Ly, A., Komarlu Narendra Gupta, A. R., Etz, A., Marsman, M., Gronau, Q. F., & Wagenmakers, E.-J. (2018). Bayesian reanalyses from summary statistics and the strength of statistical evidence. Advances in Methods and Practices in Psychological Science, (3), 367–374. doi: 10.1177/2515245918779348
Ly, A., Marsman, M., Verhagen, A. J., Grasman, R. P. P. P., & Wagenmakers, E.-J. (2017). A tutorial on Fisher information. Journal of Mathematical Psychology, 40–55.
Ly, A., Stefan, A., van Doorn, J., Dablander, F., van den Bergh, D., Sarafoglou, A., . . . Wagenmakers, E.-J. (2020). The Bayesian methodology of Sir Harold Jeffreys as a practical alternative to the p-value hypothesis test. Computational Brain & Behavior, (3), 153–161.
Ly, A., Verhagen, A. J., & Wagenmakers, E.-J. (2016a). Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 19–32.
Ly, A., Verhagen, A. J., & Wagenmakers, E.-J. (2016b). An evaluation of alternative methods for testing hypotheses, from the perspective of Harold Jeffreys. Journal of Mathematical Psychology, 43–55.
McCullagh, P. (2018). Tensor methods in statistics. Courier Dover Publications.
Mitchell, T. J., & Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 1023–1032.
Morey, R. D., & Rouder, J. N. (2011). Bayes factor approaches for testing interval null hypotheses. Psychological Methods, 406–419.
Morey, R. D., & Rouder, J. N. (2018). BayesFactor 0.9.12-4.2. Comprehensive R Archive Network. Retrieved from http://cran.r-project.org/web/packages/BayesFactor/index.html
Pawel, S., & Held, L. (2020). The sceptical Bayes factor for the assessment of replication success. Manuscript submitted for publication. Retrieved from https://arxiv.org/abs/2009.01520
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 225–237.
Shafer, G., & Vovk, V. (2019). Game-theoretic foundations for probability and finance (Vol. 455). John Wiley & Sons.
Tukey, J. W. (1991). The philosophy of multiple comparisons. Statistical Science, 100–116.
Tukey, J. W. (1995). Controlling the proportion of false discoveries for multiple comparisons: Future directions. In V. S. L. Williams, L. V. Jones, & I. Olkin (Eds.), Perspectives on statistics for educational research: Proceedings of a workshop (pp. 6–9). Research Triangle Park, NC: National Institute of Statistical Sciences.
van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge University Press.
Verdinelli, I., & Wasserman, L. (1995). Computing Bayes factors using a generalization of the Savage–Dickey density ratio. Journal of the American Statistical Association, 614–618.
Wrinch, D., & Jeffreys, H. (1921). On certain fundamental principles of scientific inquiry. Philosophical Magazine, 369–390.

Appendix

A. Laplace Approximation

The Laplace approximation uses a (multivariate) Taylor expansion for which we introduce notation. Let $h : \Theta \subset \mathbb{R}^p \to \mathbb{R}$, i.e., $h(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \log f(y_i \mid \theta)$, and we write $\hat\theta$ for the point in its domain where $h$ takes its global minimum. Furthermore, we use subscripts to denote partial derivatives, whereas superscripts refer to components of a vector, or more generally an array. For instance, $\pi_a = \frac{\partial}{\partial\theta^a} \pi(\hat\theta)$ refers to the $a$-th component of the vector of partial derivatives $[D^1 \pi(\hat\theta)]$ of the prior $\pi$ evaluated at the MLE. Similarly, we write $h_{abc} = \frac{\partial^3}{\partial\theta^a \partial\theta^b \partial\theta^c} h(\hat\theta)$ for the $abc$-th component of the three-dimensional array $[D^3 h(\hat\theta)]$. Hence, the number of indices in the subscript corresponds to the number of derivatives of $h$ and the indices, each in $1, 2, \ldots, p$, provide the location of the component.

We use superscripts to refer to the component of a vector. For instance, $u^a = (\theta^a - \hat\theta^a)$ represents the $a$-th component of the difference vector $u = \theta - \hat\theta$, thus, equivalently $u^a := e_a^T u$, where $e_a$ is the unit (column) vector with entry 1 at index $a$ and zero elsewhere. Similarly, $\varsigma^{abcd}$ denotes the $abcd$-th component of a four-dimensional array.

Moreover, we employ Einstein’s summation convention and suppress the sum whenever an index occurs in both the sub- and superscript. For instance,

$$ h_a u^a := \sum_{a=1}^{p} h_a u^a, \tag{24} $$

$$ h_{abc} u^a u^b u^c := \sum_{a=1}^{p} \sum_{b=1}^{p} \sum_{c=1}^{p} h_{abc} u^a u^b u^c. \tag{25} $$

The former defines an inner product between the gradient of $h$ and deviations $u$, whereas $h_{abc} = [D^3 h]_{abc}$ refers to the $a$-th row, $b$-th column, and $c$-th depth of the three-dimensional array consisting of partial derivatives of $h$ of order three.
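The summation convention can be spelled out in a few lines of code. The sketch below is illustrative only: the helper name `contract` and the numerical values of the gradient, the third-derivative array, and the deviation vector are hypothetical, chosen so that the results can be checked by hand.

```python
from itertools import product


def contract(tensor, u, order):
    """Sum tensor[a1]...[ak] * u[a1] * ... * u[ak] over all index tuples.

    `tensor` is a nested list with `order` levels, `u` a list of length p.
    This is the explicit sum that the summation convention suppresses.
    """
    p = len(u)
    total = 0.0
    for idx in product(range(p), repeat=order):
        entry = tensor
        weight = 1.0
        for a in idx:
            entry = entry[a]   # walk down to the component h_{a1...ak}
            weight *= u[a]     # accumulate u^{a1} * ... * u^{ak}
        total += entry * weight
    return total


# Hypothetical example with p = 2.
u = [0.5, 3.0]
h1 = [2.0, -1.0]  # plays the role of the gradient h_a
# Plays the role of h_abc; the entries a + b + c are made up.
h3 = [[[float(a + b + c) for c in range(2)] for b in range(2)] for a in range(2)]

inner = contract(h1, u, 1)  # h_a u^a
cubic = contract(h3, u, 3)  # h_abc u^a u^b u^c
```

With these made-up values, `inner` is 2(0.5) − 1(3) = −2, and `cubic` sums eight terms, one for each index triple.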
Lastly, we use the shorthand notation

$$ h_a h_b u^a u^b := \sum_{a} \sum_{b} h_a h_b u^a u^b, \tag{26} $$

to denote the nested sum which is needed for Cauchy products $(h_a u^a)(h_b u^b)$. For instance, with $p = 2$,

$$ (h_1 u^1 + h_2 u^2)(h_1 u^1 + h_2 u^2) = h_1 u^1 h_1 u^1 + 2 h_1 u^1 h_2 u^2 + h_2 u^2 h_2 u^2, \tag{27} $$

which is equivalent to

$$ h_1 h_1 u^1 u^1 + h_1 h_2 u^1 u^2 + h_2 h_1 u^2 u^1 + h_2 h_2 u^2 u^2. \tag{28} $$

With these notational conventions a multivariate Taylor approximation is denoted as

$$ h(\theta) = h(\hat\theta) + h_a u^a + \tfrac{1}{2} h_{ab} u^a u^b + \tfrac{1}{6} h_{abc} u^a u^b u^c + \tfrac{1}{24} h_{abcd} u^a u^b u^c u^d + O(|u|^5), \tag{29} $$

and note the similarity to its one-dimensional counterpart.

Theorem 3 (Laplace expansion with error term). Let $P_\Theta$ be a collection of density functions that are six times continuously differentiable in $\theta \in \Theta \subset \mathbb{R}^p$, and $\pi(\theta)$ a prior density that is four times continuously differentiable. Let $Y_i \overset{\text{iid}}{\sim} f(y \mid \theta)$ for certain $\theta$, then with $\hat\theta$ the MLE

$$ p(y^n) = \int_\Theta f(y^n \mid \theta)\, \pi(\theta)\, \mathrm{d}\theta \tag{30} $$

$$ = \left( \frac{2\pi}{n} \right)^{p/2} f(y^n \mid \hat\theta)\, \pi(\hat\theta)\, |\hat{I}(\hat\theta)|^{-1/2} \left[ 1 + \frac{C^{(1)}(\hat\theta)}{n} + \frac{C^{(2)}(\hat\theta)}{n^2} + O(n^{-3}) \right], \tag{31} $$

where $|\cdot|$ denotes the determinant and

$$ C^{(1)}(\hat\theta) = \frac{\pi_{ab}}{2\pi(\hat\theta)} \varsigma^{ab} - \left( \frac{h_{abcd}}{24} + \frac{h_{abc} \pi_d}{6 \pi(\hat\theta)} \right) \varsigma^{abcd} + \frac{h_{abc} h_{def}}{72} \varsigma^{abcdef}, \tag{32} $$

$$ \begin{aligned} C^{(2)}(\hat\theta) ={}& \frac{\pi_{abcd}}{24 \pi(\hat\theta)} \varsigma^{abcd} - \frac{\pi(\hat\theta) h_{abcdef} + 6 h_{abcde} \pi_f + 15 h_{abcd} \pi_{ef} + 20 h_{abc} \pi_{def}}{720\, \pi(\hat\theta)} \varsigma^{abcdef} \\ &+ \frac{5 \pi(\hat\theta) h_{abcd} h_{efgh} + 8 \pi(\hat\theta) h_{abcde} h_{fgh} + 40 h_{abc} \left( h_{defg} \pi_h + h_{def} \pi_{gh} \right)}{5760\, \pi(\hat\theta)} \varsigma^{abcdefgh} \\ &- \frac{3 \pi(\hat\theta) h_{abcd} h_{efg} h_{hij} + 4 h_{abc} h_{def} h_{ghi} \pi_j}{5184\, \pi(\hat\theta)} \varsigma^{abcdefghij} + \frac{h_{abc} h_{def} h_{ghi} h_{jkl}}{31104} \varsigma^{abcdefghijkl}, \end{aligned} \tag{33} $$

where $\varsigma^{ab}$, $\varsigma^{abcd}$, $\varsigma^{abcdef}$, $\varsigma^{abcdefgh}$, $\varsigma^{abcdefghij}$, and $\varsigma^{abcdefghijkl}$ represent the $ab$-th component of the second, the $abcd$-th component of the fourth, the $abcdef$-th component of the sixth, the $abcdefgh$-th component of the eighth, the $abcdefghij$-th component of the tenth, and the $abcdefghijkl$-th component of the twelfth moment of the $p$-dimensional random vector $Q \sim N_p(0, \hat{I}(\hat\theta)^{-1})$, respectively. ⋄

Proof.

The proof is based on (i) Taylor-expanding the log-likelihood in the exponent to fifth order around $\hat\theta$, (ii) the definition of the exponential as a series and Taylor-expanding $\pi$ to third order at the same point $\hat\theta$, and (iii) properties of the normal distribution.

Step (i)

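Let $h(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \log f(y_i \mid \theta)$, then since $h \in C^6(\Theta)$ we know that there exists $\delta > 0$ such that on the ball $B_{\hat\theta}(\delta) \subset \mathbb{R}^p$ of radius $\delta$ centered at $\hat\theta$ the (negative) average log-likelihood $h(\theta)$ is well-approximated by a Taylor expansion of order 5. This combined with $\hat\theta$ being the MLE, so that the first-order term vanishes, and the notation $\tilde{q} = \theta - \hat\theta$ yields

$$ p(y^n) = \int_\Theta e^{-n h(\theta)} \pi(\theta)\, \mathrm{d}\theta = \int_{B_{\hat\theta}(\delta)} e^{-n h(\hat\theta) - \frac{n}{2} h_{ab} \tilde{q}^a \tilde{q}^b - \tilde{R}(\tilde{q})} \pi(\tilde{q})\, \mathrm{d}\tilde{q}, \tag{34} $$

$$ = f(y^n \mid \hat\theta) \int_{B_{\hat\theta}(\delta)} e^{-\frac{n}{2} h_{ab} \tilde{q}^a \tilde{q}^b} e^{-\tilde{R}(\tilde{q})} \pi(\tilde{q})\, \mathrm{d}\tilde{q}, \tag{35} $$

where

$$ \tilde{R}(\tilde{q}) = n \left[ \tfrac{1}{6} h_{abc} \tilde{q}^a \tilde{q}^b \tilde{q}^c + \tfrac{1}{24} h_{abcd} \tilde{q}^a \tilde{q}^b \tilde{q}^c \tilde{q}^d + \tfrac{1}{120} h_{abcde} \tilde{q}^a \tilde{q}^b \tilde{q}^c \tilde{q}^d \tilde{q}^e + O(|\tilde{q}|^6) \right] \tag{36} $$

is the bounded remainder term since $h \in C^6(\Theta)$. The replacement of $\Theta$ by $B_{\hat\theta}(\delta)$ in the integral is justified if the mass is concentrated at $\hat\theta$, thus, whenever the integral over the complement of the ball falls off quadratically, that is, if

$$ |n \hat{I}(\hat\theta)|^{1/2} \int_{\Theta \setminus B_{\hat\theta}(\delta)} e^{-n (h(\theta) - h(\hat\theta))} \pi(\theta)\, \mathrm{d}\theta = O(n^{-2}), \tag{37} $$

which is the case when the likelihood is unimodal. When it is not unimodal, but $\hat\theta$ is a global maximum, then the condition amounts to the requirement that the contribution of the other maxima is not too big.

Step (ii)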

After centering the integral at $\hat\theta$ we scale with respect to $\sqrt{n}$, that is, we apply the change of variable $q = \sqrt{n}\, \tilde{q}$, thus, $\int n^{-p/2}\, \mathrm{d}q = \int \mathrm{d}\tilde{q}$ and therefore

$$ p(y^n) = \left( \frac{2\pi}{n} \right)^{p/2} f(y^n \mid \hat\theta)\, |\hat{I}(\hat\theta)|^{-1/2} \int_{B_{\hat\theta}(\sqrt{n}\delta)} \tilde\varphi(q)\, e^{-R(q)}\, \tilde\pi(q)\, \mathrm{d}q, \tag{38} $$

where $\tilde\varphi$ is the density of a multivariate normal distribution centered at 0 with covariance matrix $\Sigma = \hat{I}^{-1}(\hat\theta)$, and where $\tilde\pi(q)$ is the Taylor approximation of $\pi$ at the MLE, that is,

$$ \tilde\pi(q) = \pi(\hat\theta) + \frac{\pi_a(\hat\theta)\, q^a}{n^{1/2}} + \frac{\pi_{ab}(\hat\theta)\, q^a q^b}{2n} + \frac{\pi_{abc}(\hat\theta)\, q^a q^b q^c}{6 n^{3/2}} + O(n^{-2}), \tag{39} $$

and where the remainder term is now

$$ R(q) = \frac{h_{abc}\, q^a q^b q^c}{6 n^{1/2}} + \frac{h_{abcd}\, q^a q^b q^c q^d}{24 n} + \frac{h_{abcde}\, q^a q^b q^c q^d q^e}{120 n^{3/2}} + \frac{h_{abcdef}\, q^a q^b q^c q^d q^e q^f}{720 n^2} + O(n^{-5/2}). $$

To exploit the properties of Gaussian integrals we replace the integration domain $B_{\hat\theta}(\sqrt{n}\delta)$ by $\mathbb{R}^p$, which is justified when $n$ is large because the tails of a normal density fall off exponentially.

Writing $e^{-R(q)}$ as a series, and ignoring the exponentially small error from extending the integration domain,

$$ p(y^n) \approx \left( \frac{2\pi}{n} \right)^{p/2} f(y^n \mid \hat\theta)\, |\hat{I}(\hat\theta)|^{-1/2} \int_{\mathbb{R}^p} \tilde\varphi(q) \left[ 1 - R(q) + \tfrac{1}{2} R^2(q) - \tfrac{1}{6} R^3(q) + O(|R(q)|^4) \right] \tilde\pi(q)\, \mathrm{d}q. \tag{40} $$

From here onwards we focus on the integral in Eq. (40), which after some straightforward but tedious computations can be shown to be of the form

$$ \int_{\mathbb{R}^p} \tilde\varphi(q) \left[ A_0 + A_1 n^{-1/2} + A_2 n^{-1} + A_3 n^{-3/2} + A_4 n^{-2} + O(n^{-3}) \right] \mathrm{d}q, \tag{41} $$

where the $A_j$ terms are functions of $q$ and $\hat\theta$ defined by the series representation of $e^{-R(q)}$ and $\tilde\pi(q)$.

Step (iii)

The terms $A_j$ are given below. Of the following results only the exact values of $A_0$, $A_2$ and $A_4$ matter; what matters for $A_1$ and $A_3$ is that they only involve odd powers of $q$:

$$ A_0 = \pi(\hat\theta), \tag{42} $$

$$ A_1 = \pi_a q^a - \frac{h_{abc}\, \pi(\hat\theta)}{6} q^a q^b q^c, \tag{43} $$

$$ A_2 = \frac{\pi_{ab}}{2} q^a q^b - \left( \frac{\pi(\hat\theta) h_{abcd}}{24} + \frac{h_{abc} \pi_d}{6} \right) q^a q^b q^c q^d + \frac{\pi(\hat\theta) h_{abc} h_{def}}{72} q^a q^b q^c q^d q^e q^f, \tag{44} $$

$$ \begin{aligned} A_3 ={}& \frac{\pi_{abc}}{6} q^a q^b q^c - \frac{6 h_{abcde} \pi(\hat\theta) + 30 h_{abcd} \pi_e + 60 h_{abc} \pi_{de}}{720} q^a q^b q^c q^d q^e \\ &+ \frac{h_{abc} h_{defg} \pi(\hat\theta) + 2 h_{abc} h_{def} \pi_g}{144} q^a q^b q^c q^d q^e q^f q^g - \frac{\pi(\hat\theta) h_{abc} h_{def} h_{ghi}}{1296} q^a q^b q^c q^d q^e q^f q^g q^h q^i, \end{aligned} \tag{45} $$

$$ \begin{aligned} A_4 ={}& \frac{\pi_{abcd}}{24} q^a q^b q^c q^d \\ &- \frac{\pi(\hat\theta) h_{abcdef} + 6 h_{abcde} \pi_f + 15 h_{abcd} \pi_{ef} + 20 h_{abc} \pi_{def}}{720} q^a q^b q^c q^d q^e q^f \\ &+ \frac{5 \pi(\hat\theta) h_{abcd} h_{efgh} + 8 \pi(\hat\theta) h_{abcde} h_{fgh} + 40 h_{abc} \left( h_{defg} \pi_h + h_{def} \pi_{gh} \right)}{5760} q^a q^b q^c q^d q^e q^f q^g q^h \\ &- \frac{3 \pi(\hat\theta) h_{abcd} h_{efg} h_{hij} + 4 h_{abc} h_{def} h_{ghi} \pi_j}{5184} q^a q^b q^c q^d q^e q^f q^g q^h q^i q^j \\ &+ \frac{\pi(\hat\theta) h_{abc} h_{def} h_{ghi} h_{jkl}}{31104} q^a q^b q^c q^d q^e q^f q^g q^h q^i q^j q^k q^l. \end{aligned} \tag{46} $$

Since for $k$ odd $A_k$ only involves odd powers of $q$, we conclude that their integral with respect to $\tilde\varphi(q)$ vanishes. Hence,

$$ p(y^n) = \left( \frac{2\pi}{n} \right)^{p/2} f(y^n \mid \hat\theta)\, |\hat{I}(\hat\theta)|^{-1/2}\, \pi(\hat\theta) \left[ 1 + \frac{E[A_2]}{n\, \pi(\hat\theta)} + \frac{E[A_4]}{n^2\, \pi(\hat\theta)} + O(n^{-3}) \right], \tag{47} $$

where $E[A_2]$ and $E[A_4]$ are expectations with respect to $Q \sim N(0, \hat{I}(\hat\theta)^{-1})$. This implies that the order $n^{-1}$ and $n^{-2}$ terms in the assertion are $C^{(1)}(\hat\theta) = E[A_2]/\pi(\hat\theta)$ and $C^{(2)}(\hat\theta) = E[A_4]/\pi(\hat\theta)$.

The components of higher moments can be expressed in terms of the covariances $\varsigma^{ab} = \text{Cov}(Q^a, Q^b)$ using Isserlis’ formula (Isserlis, 1918; McCullagh, 2018). For moments $\varsigma^{a_1 \cdots a_w}$, that is, a component of the $w$th moment of $Q$ with $w = 2v$ even, the following holds

$$ \varsigma^{a_1 \cdots a_w} = \sum_{u \in P_w} \prod_{(i,j) \in u} \varsigma^{ij}, \tag{48} $$

where $P_w$ is the collection of all partitions of the $w$ indices into $v$ pairs. For instance, for $w = 4$, $\varsigma^{abcd}$ is a sum of 2-products of pairs, for $w = 6$, $\varsigma^{abcdef}$ is a sum of 3-products of pairs, and so forth.
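More specifically,

$$ \varsigma^{abcd} = \varsigma^{ab} \varsigma^{cd} + \varsigma^{ac} \varsigma^{bd} + \varsigma^{ad} \varsigma^{bc}, \tag{49} $$

$$ \begin{aligned} \varsigma^{abcdef} ={}& \varsigma^{ab} \varsigma^{cd} \varsigma^{ef} + \varsigma^{ab} \varsigma^{ce} \varsigma^{df} + \varsigma^{ab} \varsigma^{cf} \varsigma^{de} \\ &+ \varsigma^{ac} \varsigma^{bd} \varsigma^{ef} + \varsigma^{ac} \varsigma^{be} \varsigma^{df} + \varsigma^{ac} \varsigma^{bf} \varsigma^{de} \\ &+ \varsigma^{ad} \varsigma^{bc} \varsigma^{ef} + \varsigma^{ad} \varsigma^{be} \varsigma^{cf} + \varsigma^{ad} \varsigma^{bf} \varsigma^{ce} \\ &+ \varsigma^{ae} \varsigma^{bc} \varsigma^{df} + \varsigma^{ae} \varsigma^{bd} \varsigma^{cf} + \varsigma^{ae} \varsigma^{bf} \varsigma^{cd} \\ &+ \varsigma^{af} \varsigma^{bc} \varsigma^{de} + \varsigma^{af} \varsigma^{bd} \varsigma^{ce} + \varsigma^{af} \varsigma^{be} \varsigma^{cd}, \end{aligned} \tag{50} $$

where all indexes $a, b, c, d, e, f = 1, 2, \ldots, p$. The expressions of $\varsigma^{abcdefgh}$, $\varsigma^{abcdefghij}$, and $\varsigma^{abcdefghijkl}$ define sums of $105 = 3 \times 5 \times 7$, $945 = 3 \times 5 \times 7 \times 9$, and $10{,}395 = 3 \times 5 \times 7 \times 9 \times 11$ terms, respectively.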
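The pairing sums and the resulting correction terms can be checked numerically. The sketch below enumerates pairings to obtain Gaussian moments and then evaluates the one-dimensional, flat-prior specialisation of the two correction terms for the toy function h(θ) = θ − log θ, which is minimised at θ = 1 with second derivative 1. This toy function, the flat prior, and all function names are assumptions made for illustration; the numerical constants (24, 72, 720, 5760, 5184, 31104) are those obtained by expanding the series in the proof. For this h the integral of exp(−n h) is Γ(n+1)/n^(n+1), so the correction terms should reproduce the Stirling series coefficients 1/12 and 1/288.

```python
import math


def pairings(items):
    """Yield every way of splitting `items` (even length) into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def gaussian_moment(indices, cov):
    """E[Q^{a1} ... Q^{aw}] for Q ~ N(0, cov), as a sum over all pairings."""
    if len(indices) % 2:
        return 0.0  # odd moments of a centred normal vanish
    return sum(math.prod(cov[i][j] for i, j in pairing)
               for pairing in pairings(list(indices)))


def stirling_coefficients():
    """Flat-prior, p = 1 correction terms for h(theta) = theta - log(theta)."""
    # Derivatives of h at the minimiser theta = 1 (orders 3 to 6).
    h3, h4, h5, h6 = -2.0, 6.0, -24.0, 120.0
    cov = [[1.0]]  # inverse of the second derivative h'' (1) = 1
    s = {w: gaussian_moment((0,) * w, cov) for w in (4, 6, 8, 10, 12)}
    # All prior-derivative terms drop out because the prior is flat.
    c1 = -h4 / 24 * s[4] + h3 ** 2 / 72 * s[6]
    c2 = (-h6 / 720 * s[6]
          + (5 * h4 ** 2 + 8 * h3 * h5) / 5760 * s[8]
          - 3 * h4 * h3 ** 2 / 5184 * s[10]
          + h3 ** 4 / 31104 * s[12])
    return c1, c2


if __name__ == "__main__":
    for w in (4, 6, 8, 10, 12):
        print(w, sum(1 for _ in pairings(list(range(w)))))
    print(stirling_coefficients())
```

The enumeration is exponential in the number of indices, which is acceptable for the moments up to order twelve needed here but would not scale to much higher orders.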