On Statistical Non-Significance

Abstract

Significance tests are probably the most widespread form of inference in empirical research, and significance is often interpreted as providing greater informational content than non-significance. In this article we show, however, that rejection of a point null often carries very little information, while failure to reject may be highly informative. This is particularly true in empirical contexts where data sets are large and where there are rarely reasons to put substantial prior probability on a point null. Our results challenge the usual practice of conferring on point null rejections a higher level of scientific significance than on non-rejections. In consequence, we advocate a visible reporting and discussion of non-significant results in empirical practice.

Alberto Abadie, MIT, March 2018

1. Introduction

Non-significant empirical results (usually in the form of $t$-statistics smaller than 1.96) relative to some null hypotheses of interest (usually zero coefficients) are notoriously hard to publish in professional/scientific journals (see, e.g., Ziliak and McCloskey, 2008). This state of affairs is in part maintained by the widespread notion that non-significant results are non-informative. After all, lack of statistical significance derives from the absence of extreme or surprising outcomes under the null hypothesis. In this article, we argue that this view of statistical inference is misguided. In particular, we show that non-significant results are informative, and argue that they are more informative than significant results in scenarios that are common, even prevalent, in empirical practice.

Alberto Abadie, Department of Economics, MIT. We thank Joshua Angrist, Gary Chamberlain, Amy Finkelstein, Guido Imbens, and Ben Olken for comments and discussions. A version of this article aimed at an economics readership has circulated under the title "Statistical Non-Significance in Empirical Economics".

To discuss the informational content of different statistical procedures, we formally adopt a limited information Bayes perspective. In this setting, agents representing journal readership or the scientific community have priors, $\mathcal{P}$, over some parameters of interest, $\theta \in \Theta$. That is, a member $p$ of $\mathcal{P}$ is a probability density function (with respect to some appropriate measure) on $\Theta$. While agents are Bayesian, we will consider a setting where journals report frequentist results, in particular, statistical significance. Agents construct limited information Bayes posteriors based on the reported results of significance tests. We will deem a statistical result informative when it has the potential to substantially change the prior of the agents over a large range of values for $\theta$.

Notice that, like Ioannidis (2005) and others, we restrict our attention to the effect of statistical significance on beliefs. We adopt this framework not because we believe it is (always) representative of empirical practice (in fact, journals typically report additional statistics, beyond statistical significance), but because isolating the informational content of statistical significance has immediate implications for how we should interpret its occurrence or lack of it. Correct interpretation of statistical significance is important because, while many other statistics are reported in practice, the scientific discussion of empirical results is often framed in terms of statistical significance of some parameters of interest, and non-significant results may be under-reported as discussed above.

2. A Simple Example

In this section, we consider a simple example with Normal priors and data that captures the essence of our argument. In section 3 we will consider the case where the priors and the distribution of the data are not restricted to be in a particular parametric family.

Assume an agent has a prior $\theta \sim N(\mu, \sigma^2)$ on $\theta$, with $\sigma^2 > 0$. A researcher observes $n$ independent measurements of $\theta$ with Normal errors mutually independent and independent of $\theta$, and with variance normalized to one. That is, $x_1, \ldots, x_n$ are independent $N(\theta, 1)$, so that the sample mean satisfies

$$\widehat{\theta} = \frac{1}{n}\sum_{i=1}^{n} x_i \sim N(\theta, 1/n).$$

The estimate $\widehat{\theta}$ is deemed significant if $\sqrt{n}\,|\widehat{\theta}| > c$, for some $c > 0$. In empirical practice, $c$ is often equal to 1.96, the 0.975 quantile of the standard Normal distribution. We next derive the limited information posteriors of $\theta$ conditional on $\sqrt{n}\,|\widehat{\theta}| > c$ and $\sqrt{n}\,|\widehat{\theta}| \le c$. First, notice that

$$\Pr\big(\sqrt{n}\,|\widehat{\theta}| > c \,\big|\, \theta\big) = \Pr\big(\widehat{\theta} > c/\sqrt{n} \,\big|\, \theta\big) + \Pr\big(-\widehat{\theta} > c/\sqrt{n} \,\big|\, \theta\big) = \Phi(\sqrt{n}\,\theta - c) + \Phi(-\sqrt{n}\,\theta - c).$$

Therefore,

$$\Pr\big(\sqrt{n}\,|\widehat{\theta}| > c\big) = \Phi\!\left(\frac{\sqrt{n}\,\mu - c}{\sqrt{1 + n\sigma^2}}\right) + \Phi\!\left(\frac{-\sqrt{n}\,\mu - c}{\sqrt{1 + n\sigma^2}}\right). \tag{1}$$

This calculation uses the following fact of integration:

$$\int \Phi\!\left(\frac{\lambda - \theta}{\xi}\right) \frac{1}{\sigma}\,\phi\!\left(\frac{\theta - \mu}{\sigma}\right) d\theta = \Phi\!\left(\frac{\lambda - \mu}{\sqrt{\sigma^2 + \xi^2}}\right)$$

for arbitrary real $\lambda$ and $\mu$ and positive $\sigma$ and $\xi$. Alternatively, the result can be easily derived after noticing that the distribution of $\widehat{\theta}$ integrated over the prior is Normal with mean $\mu$ and variance $\sigma^2 + 1/n$.

The limited information posteriors given significance and non-significance are:

$$p\big(\theta \,\big|\, \sqrt{n}\,|\widehat{\theta}| > c\big) = \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{\theta - \mu}{\sigma}\right)\Big(\Phi(\sqrt{n}\,\theta - c) + \Phi(-\sqrt{n}\,\theta - c)\Big)}{\Phi\!\left(\dfrac{\sqrt{n}\,\mu - c}{\sqrt{1 + n\sigma^2}}\right) + \Phi\!\left(\dfrac{-\sqrt{n}\,\mu - c}{\sqrt{1 + n\sigma^2}}\right)},$$

and

$$p\big(\theta \,\big|\, \sqrt{n}\,|\widehat{\theta}| \le c\big) = \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{\theta - \mu}{\sigma}\right)\Big(1 - \Phi(\sqrt{n}\,\theta - c) - \Phi(-\sqrt{n}\,\theta - c)\Big)}{1 - \Phi\!\left(\dfrac{\sqrt{n}\,\mu - c}{\sqrt{1 + n\sigma^2}}\right) - \Phi\!\left(\dfrac{-\sqrt{n}\,\mu - c}{\sqrt{1 + n\sigma^2}}\right)}.$$

The two posteriors, along with the Normal prior, are plotted in Figure 1 for $\mu = 1$, $\sigma = 1$, $c = 1.96$, and $n = 10$. This figure illustrates the informational value of a significance test. Rejection of the null carves probability mass around zero in the limited information posterior, while failure to reject concentrates probability mass around zero. Notice that failure to reject carries substantial information, even in the rather under-powered setting generated by the values of $\mu$, $\sigma$, $c$, and $n$ adopted for Figure 1, which by equation (1) imply $\Pr(\sqrt{n}\,|\widehat{\theta}| > c) \approx 0.70$.

Figure 1: Posterior Distributions After a Significance Test (curves: prior, posterior with significance, posterior with no significance)

Figure 2 compares the prior with the posterior under significance for different sample sizes. When $n$ is small, significance affects the posterior over a large range of values. When $n$ is large, significance provides only local to zero information. That is, significance is not informative in large samples. This is explained by the fact that the probability of rejection in equation (1) converges to one as the sample size increases. By the law of total probability, it follows that conditional on non-significance probability mass concentrates around zero as $n$ increases. That is, the occurrence of an event that is very unlikely given the prior has a large effect on beliefs.

Figure 2: Prior and Posterior with Significance for Different Sample Sizes

The full information posterior is

$$p\big(\theta \,\big|\, x_1, \ldots, x_n\big) = \frac{1}{\sigma_n}\,\phi\!\left(\frac{\theta - \mu_n}{\sigma_n}\right),$$

where

$$\mu_n = \frac{\mu + n\sigma^2\,\widehat{\theta}}{1 + n\sigma^2}, \qquad \sigma_n^2 = \frac{\sigma^2}{1 + n\sigma^2}.$$

So, in this very particular context, knowledge of the $t$-ratio ($\sqrt{n}\,\widehat{\theta}$) is sufficient to go back to the full information posterior.
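The formulas of this section can be checked numerically. The following is a minimal sketch, assuming only the Normal model above and using Python's standard library; the function names and the trapezoid helper are illustrative choices, not part of the article. It evaluates equation (1) and the two limited information posteriors at the Figure 1 parameter values.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard Normal: STD.cdf is Phi, STD.pdf is phi


def power(theta, c, n):
    """Pr(sqrt(n)|theta_hat| > c | theta) in the Normal model."""
    return STD.cdf(sqrt(n) * theta - c) + STD.cdf(-sqrt(n) * theta - c)


def rejection_prob(mu, sigma, c, n):
    """Equation (1): rejection probability integrated over the N(mu, sigma^2) prior."""
    scale = sqrt(1.0 + n * sigma ** 2)
    return STD.cdf((sqrt(n) * mu - c) / scale) + STD.cdf((-sqrt(n) * mu - c) / scale)


def prior_pdf(theta, mu, sigma):
    """Density of the N(mu, sigma^2) prior."""
    return STD.pdf((theta - mu) / sigma) / sigma


def posterior_significant(theta, mu, sigma, c, n):
    """Limited information posterior given sqrt(n)|theta_hat| > c."""
    return prior_pdf(theta, mu, sigma) * power(theta, c, n) / rejection_prob(mu, sigma, c, n)


def posterior_nonsignificant(theta, mu, sigma, c, n):
    """Limited information posterior given sqrt(n)|theta_hat| <= c."""
    return (prior_pdf(theta, mu, sigma) * (1.0 - power(theta, c, n))
            / (1.0 - rejection_prob(mu, sigma, c, n)))


def integrate(f, lo, hi, steps=4000):
    """Plain trapezoid rule (illustrative helper, not from the article)."""
    h = (hi - lo) / steps
    inner = sum(f(lo + i * h) for i in range(1, steps))
    return h * (0.5 * (f(lo) + f(hi)) + inner)


mu, sigma, c, n = 1.0, 1.0, 1.96, 10  # parameter values used for Figure 1
print("Pr(significant):", round(rejection_prob(mu, sigma, c, n), 4))
print("posterior mass given significance:",
      round(integrate(lambda t: posterior_significant(t, mu, sigma, c, n), -9.0, 11.0), 6))
```

Both posteriors should integrate to one, and at $\theta = 0$ the density given non-significance should lie above the prior while the density given significance lies below it, which is the pattern described for Figure 1.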
The combined information given by the $P$-value, $2\Phi(-\sqrt{n}\,|\widehat{\theta}|)$, and the sign of $\widehat{\theta}$ is likewise sufficient to recover the full information posterior.

These results have immediate counterparts in large sample settings with asymptotically Normal distributions. They can also be generalized to non-parametric settings, as we demonstrate in the next section.

3. General Case

To extend the results of the previous section beyond Normal priors and data, we will consider a test statistic, $\widehat{T}_n$, such that

$$\Pr\big(\widehat{T}_n > c \,\big|\, \theta = 0\big) \to \alpha,$$

and

$$\Pr\big(\widehat{T}_n > c \,\big|\, \theta\big) \to 1 \quad \text{for } \theta \neq 0.$$

That is, we consider significance tests that are consistent under fixed alternatives and have asymptotic size equal to $\alpha$. Let $p(\cdot)$ be a prior on $\theta$, and $p(\cdot \,|\, \widehat{T}_n > c)$ and $p(\cdot \,|\, \widehat{T}_n \le c)$ be the limited information posteriors under significance and non-significance, respectively.

We will first assume a prior that is absolutely continuous with respect to the Lebesgue measure, with a version of the density that is positive and continuous at zero. By dominated convergence, we obtain:

$$\Pr\big(\widehat{T}_n > c\big) \to 1.$$

We first derive the posterior densities under significance,

$$p(0 \,|\, \widehat{T}_n > c) = \frac{\Pr\big(\widehat{T}_n > c \,\big|\, \theta = 0\big)}{\Pr\big(\widehat{T}_n > c\big)}\, p(0) \to \alpha\, p(0),$$

and

$$p(\theta \,|\, \widehat{T}_n > c) = \frac{\Pr\big(\widehat{T}_n > c \,\big|\, \theta\big)}{\Pr\big(\widehat{T}_n > c\big)}\, p(\theta) \to p(\theta),$$

for $\theta \neq 0$. So, again, significance only changes beliefs locally around zero. The posterior densities after non-significance are

$$p(0 \,|\, \widehat{T}_n \le c) = \frac{\Pr\big(\widehat{T}_n \le c \,\big|\, \theta = 0\big)}{\Pr\big(\widehat{T}_n \le c\big)}\, p(0) \to \infty,$$

and

$$p(\theta \,|\, \widehat{T}_n \le c) = \frac{\Pr\big(\widehat{T}_n \le c \,\big|\, \theta\big)}{\Pr\big(\widehat{T}_n \le c\big)}\, p(\theta)$$

for $\theta \neq 0$.
Typically, for $\theta \neq 0$ (using large deviation results)

$$-\frac{1}{n}\log\Big(\Pr\big(\widehat{T}_n \le c \,\big|\, \theta\big)\Big) \to d_\theta,$$

with $0 < d_\theta < \infty$. Therefore, $\Pr(\widehat{T}_n \le c \,|\, \theta)$ converges to zero exponentially for $\theta \neq 0$. Let

$$\beta_n(\theta) = \Pr(\widehat{T}_n \le c \,|\, \theta)$$

be the probability of Type II error (one minus the power). Assume that

$$\int \liminf_{n\to\infty} \beta_n(z/\sqrt{n})\, dz > 0.$$

This rules out perfect local asymptotic power. Then, by the change of variable $z = n^{1/2}\theta$ and Fatou's lemma, we obtain

$$\begin{aligned}
\liminf_{n\to\infty}\, n^{1/2} \Pr(\widehat{T}_n \le c) &= \liminf_{n\to\infty}\, n^{1/2}\int \beta_n(\theta)\, p(\theta)\, d\theta \\
&= \liminf_{n\to\infty} \int \beta_n(z/\sqrt{n})\, p(z/\sqrt{n})\, dz \\
&\ge \int \liminf_{n\to\infty} \big(\beta_n(z/\sqrt{n})\, p(z/\sqrt{n})\big)\, dz \\
&= \int \liminf_{n\to\infty} \beta_n(z/\sqrt{n}) \lim_{n\to\infty} p(z/\sqrt{n})\, dz \\
&= p(0)\int \liminf_{n\to\infty} \beta_n(z/\sqrt{n})\, dz > 0.
\end{aligned}$$

For the second to last equality, notice that if $a_n \ge 0$ and $b_n \to b > 0$ as $n \to \infty$, then

$$\liminf_{n\to\infty}\,(a_n b_n) = \liminf_{n\to\infty} a_n \lim_{n\to\infty} b_n.$$

Because $n^{1/2}\beta_n(\theta)$ converges to zero for fixed $\theta \neq 0$ while $n^{1/2}\Pr(\widehat{T}_n \le c)$ stays bounded away from zero, it follows that

$$p(\theta \,|\, \widehat{T}_n \le c) \to 0,$$

for $\theta \neq 0$. That is, like in the Normal case of section 2, conditional on non-significance the posterior converges to a degenerate distribution at zero.

Figure 3: Limit of $p(\theta \,|\, \widehat{T}_n > c)/p(\theta)$ as a function of $q$ ($\theta \neq 0$, $\alpha = 0.05$)

We now consider the case when the prior has probability mass $q$ at zero, with $0 < q < 1$. Then

$$\Pr\big(\widehat{T}_n > c\big) \to q\alpha + (1 - q) \in (\alpha, 1).$$

Now, the posterior after significance is,

$$p(0 \,|\, \widehat{T}_n > c) = \frac{\Pr\big(\widehat{T}_n > c \,\big|\, \theta = 0\big)}{\Pr\big(\widehat{T}_n > c\big)}\, p(0) \to \left(\frac{\alpha}{q\alpha + (1 - q)}\right) q \le q,$$

and

$$p(\theta \,|\, \widehat{T}_n > c) = \frac{\Pr\big(\widehat{T}_n > c \,\big|\, \theta\big)}{\Pr\big(\widehat{T}_n > c\big)}\, p(\theta) \to \left(\frac{1}{q\alpha + (1 - q)}\right) p(\theta) \ge p(\theta),$$

for $\theta \neq 0$. Now, in contrast to the continuous prior case, significance changes beliefs away from zero in large samples. In particular, if we start with a prior that assigns a large probability to $\theta = 0$, then significance greatly affects beliefs for values of $\theta$ different from zero. Notice, however, that for moderate values of $q$ the effect of significance on beliefs may be negligible. Figure 3 shows the limit of $p(\theta \,|\, \widehat{T}_n > c)/p(\theta)$ as a function of $q$, for $\theta \neq 0$ and $\alpha = 0.05$. This limit is close to one for modest values of $q$. In order for significance to at least double the probability of $\theta \neq 0$ we need $q \ge 1/(2(1 - \alpha)) \approx 0.526$. The choice of $\alpha$ does not substantially change the value of the limit of $p(\theta \,|\, \widehat{T}_n > c)/p(\theta)$, except for very large values of $q$. For example, with $\alpha = 0.005$ (as advocated in Benjamin et al., 2017), for significance to at least double the probability of $\theta \neq 0$ we need $q \ge 1/(2(1 - \alpha)) \approx 0.503$. In either case, $q$ needs to be bigger than 0.5 in order for significance to double the probability density function of beliefs at non-zero values of $\theta$.

The posterior after non-significance is,

$$p(0 \,|\, \widehat{T}_n \le c) = \frac{\Pr\big(\widehat{T}_n \le c \,\big|\, \theta = 0\big)}{\Pr\big(\widehat{T}_n \le c\big)}\, p(0) \to \frac{1 - \alpha}{q(1 - \alpha)}\, q = 1,$$

and for $\theta \neq 0$,

$$p(\theta \,|\, \widehat{T}_n \le c) = \frac{\Pr\big(\widehat{T}_n \le c \,\big|\, \theta\big)}{\Pr\big(\widehat{T}_n \le c\big)}\, p(\theta) \to 0.$$

Again, non-significance seems to have a stronger effect on beliefs than significance.

Some remarks about priors with probability mass at a point null are in order. First, prior beliefs that assign probability mass to point nulls may not be adequate in certain settings. For example, beliefs on the average effect of an anti-poverty intervention may sometimes concentrate probability smoothly around zero, but more rarely in such a way that a large probability mass at zero is a good description of a reasonable prior. Moreover, priors with probability mass at a point null generate a drastic discrepancy, known as Lindley's paradox, between frequentist and Bayesian testing procedures (see, e.g., Berger, 1985). Lindley's paradox arises in settings with a fixed value of $\widehat{T}_n$ and a large $n$. In those settings, frequentists would reject the null hypothesis when $\widehat{T}_n > c$. Bayesians, however, would typically find that the posterior probability of the point null far exceeds the posterior probability of the alternative. Lindley's paradox can be explained by the fact that, as $n$ increases, the distribution of the test statistic under the alternative diverges. Therefore, a fixed value of the test statistic as $n$ increases can only be explained by the null hypothesis.
Notice that conditioning on the event $\{\widehat{T}_n \le c\}$ (as opposed to conditioning on the value of $\widehat{T}_n$) is not subject to Lindley's paradox, and it may be the natural choice to evaluate a testing procedure for which significance depends only on whether the event $\{\widehat{T}_n \le c\}$ occurs.
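The large-sample limits for a prior with probability mass $q$ at zero are simple enough to tabulate directly. The sketch below is illustrative only; the function names are assumptions introduced here, and the formulas are the limits stated in this section.

```python
def limit_ratio_significant(q, alpha):
    """Limit of p(theta | T_n > c) / p(theta) for theta != 0, prior mass q at zero."""
    return 1.0 / (q * alpha + (1.0 - q))


def limit_mass_at_zero_significant(q, alpha):
    """Limit of the posterior mass at theta = 0 after a significant result."""
    return alpha * q / (q * alpha + (1.0 - q))


def doubling_threshold(alpha):
    """Smallest q for which significance at least doubles beliefs at theta != 0."""
    return 1.0 / (2.0 * (1.0 - alpha))


for alpha in (0.05, 0.005):
    print("alpha =", alpha, "doubling threshold for q:", round(doubling_threshold(alpha), 4))
for q in (0.1, 0.25, 0.5, 0.9):
    print("q =", q, "limit ratio at alpha = 0.05:", round(limit_ratio_significant(q, 0.05), 3))
```

As a consistency check, the limiting mass at zero plus $(1 - q)$ times the limiting ratio equals one, so the limiting posterior under significance is a proper distribution.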

4. Testing an Interval Null

In view of the lack of informativeness of significance in large samples (under a point null), one could instead try to reinterpret significance tests as tests of the implicit null "$\theta$ is close to zero".

To accommodate this possibility, we will now concentrate on the problem of testing the null that the parameter $\theta$ is in some interval around zero. Under the null hypothesis, $\theta \in [-\delta, \delta]$, where $\delta$ is some positive number. Under the alternative hypothesis, $\theta \notin [-\delta, \delta]$. Consider the Normal model of section 2. To obtain a test of size $\alpha$ we control the supremum of the probability of Type I error:

$$\Pr\big(\sqrt{n}\,|\widehat{\theta}| > c \,\big|\, |\theta| = \delta\big) = \Phi(\sqrt{n}\,\delta - c) + \Phi(-\sqrt{n}\,\delta - c).$$

Therefore, we choose $c$ such that $\Phi(\sqrt{n}\,\delta - c) + \Phi(-\sqrt{n}\,\delta - c) = \alpha$. While there is no closed-form solution for $c$, its value can be calculated numerically for any given value of $\sqrt{n}\,\delta$, and a very accurate approximation for large $\sqrt{n}\,\delta$ is given by

$$c = \Phi^{-1}(1 - \alpha) + \sqrt{n}\,\delta.$$

That is, controlling size in this setting implies that the critical value has to increase with the sample size at a root-$n$ rate. In turn, this implies that the probability of rejection, $\Pr(\sqrt{n}\,|\widehat{\theta}| > c \,|\, \theta) = \Phi(\sqrt{n}\,\theta - c) + \Phi(-\sqrt{n}\,\theta - c)$, converges to one if $\theta \notin [-\delta, \delta]$, and converges to zero if $\theta \in (-\delta, \delta)$. As a result, the large sample posterior distributions with and without significance are truncated versions of the prior, with the prior's mass on $(-\delta, \delta)$ removed under significance, and its mass on $(-\infty, -\delta) \cup (\delta, \infty)$ removed under no significance. If $\delta$ is large both significance and non-significance are informative. If, however, $\delta$ is small, we go back to the setting where significance carries only local-to-zero information. Figure 4 reports posterior distributions for three values of $\delta$ at a fixed size $\alpha$, with $n = 10000$.

Figure 4: Posterior After a Test of the Null $\theta \in [-\delta, \delta]$ ($n = 10000$; curves: prior, posterior with significance, posterior with no significance)
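The critical value for the interval null has no closed form, but it is easy to compute. The following is a minimal sketch, assuming a simple bisection search (one of several possible root-finding choices, not prescribed by the article) and hypothetical function names. It solves the size equation of this section and compares the result with the large-$\sqrt{n}\,\delta$ approximation.

```python
from statistics import NormalDist

STD = NormalDist()  # standard Normal: STD.cdf is Phi, STD.inv_cdf is its inverse


def size_at_boundary(c, root_n_delta):
    """Rejection probability at |theta| = delta, as a function of the critical value c."""
    return STD.cdf(root_n_delta - c) + STD.cdf(-root_n_delta - c)


def critical_value(alpha, root_n_delta, tol=1e-12):
    """Solve size_at_boundary(c) = alpha by bisection.

    The size equals one at c = 0 and is strictly decreasing in c,
    so the root is bracketed by [0, root_n_delta + 10] for usual alpha.
    """
    lo, hi = 0.0, root_n_delta + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if size_at_boundary(mid, root_n_delta) > alpha:
            lo = mid  # size still too large: move the critical value up
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_critical_value(alpha, root_n_delta):
    """Large sqrt(n)*delta approximation stated in the text."""
    return STD.inv_cdf(1.0 - alpha) + root_n_delta


for rnd in (0.0, 0.5, 1.0, 5.0):
    exact = critical_value(0.05, rnd)
    approx = approx_critical_value(0.05, rnd)
    print("sqrt(n)*delta =", rnd, "exact c:", round(exact, 4), "approximation:", round(approx, 4))
```

With $\sqrt{n}\,\delta = 0$ the exact solution reduces to the familiar two-sided point-null critical value, while the approximation is only intended for large $\sqrt{n}\,\delta$.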
5. Conditioning on the Sign of the Estimated Coefficient

In previous sections we have shown that statistical significance may carry very little information in large samples. As a result, the values of other statistics should be taken into account along with significance when the null is rejected in a significance test. As discussed above, in a Normal (or asymptotically Normal) setting it does not take much to go back to full information (e.g., $P$-value and the sign of $\widehat{\theta}$). Here we consider the question of whether minimally augmenting the information on significance with the sign of $\widehat{\theta}$ results in informativeness when the null is rejected. This exercise is motivated by the possibility that the sign of the estimated coefficient is implicitly taken into account in many discussions of results from significance tests.

For concreteness, we will concentrate on the case of a positive coefficient estimate, $\widehat{\theta} > 0$. That is, the limited information posterior under significance and positive $\widehat{\theta}$ conditions on the event $\sqrt{n}\,\widehat{\theta} > c$. The case with negative $\widehat{\theta}$ is analogous. Using similar calculations as in section 2, we obtain:

$$p\big(\theta \,\big|\, \sqrt{n}\,\widehat{\theta} > c\big) = \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{\theta - \mu}{\sigma}\right)\Phi(\sqrt{n}\,\theta - c)}{\Phi\!\left(\dfrac{\sqrt{n}\,\mu - c}{\sqrt{1 + n\sigma^2}}\right)},$$

and

$$p\big(\theta \,\big|\, 0 < \sqrt{n}\,\widehat{\theta} \le c\big) = \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{\theta - \mu}{\sigma}\right)\Big(1 - \Phi(\sqrt{n}\,\theta - c) - \Phi(-\sqrt{n}\,\theta)\Big)}{1 - \Phi\!\left(\dfrac{\sqrt{n}\,\mu - c}{\sqrt{1 + n\sigma^2}}\right) - \Phi\!\left(\dfrac{-\sqrt{n}\,\mu}{\sqrt{1 + n\sigma^2}}\right)}.$$

Figure 5 reproduces the setting of Figure 1 but for the case when the posterior is conditional on the sign of the estimate in addition to significance. Like in Figure 1, failure to reject carries substantial information. In fact, both outcomes of the significance test carry additional information, with respect to the setting in Figure 1, which of course is explained by the additional information in the sign of $\widehat{\theta}$.

Figure 5: Posterior Distributions Conditional on Significance and Coefficient Sign (curves: prior, posterior with significance, posterior with no significance)

Notice that, in this case, under significance, the ratio between the posterior and the prior converges to

$$\lim_{n\to\infty} \frac{p(\theta \,|\, \sqrt{n}\,\widehat{\theta} > c)}{p(\theta)} = \begin{cases} 0 & \text{if } \theta < 0, \\ \Phi(-c)/\Phi(\mu/\sigma) & \text{if } \theta = 0, \\ 1/\Phi(\mu/\sigma) & \text{if } \theta > 0. \end{cases}$$

Without significance, the ratio between the posterior and the prior converges to

$$\lim_{n\to\infty} \frac{p(\theta \,|\, 0 < \sqrt{n}\,\widehat{\theta} \le c)}{p(\theta)} = \begin{cases} 0 & \text{if } \theta \neq 0, \\ \infty & \text{if } \theta = 0. \end{cases}$$

That is, as $n \to \infty$ non-significance is highly informative. Under significance, the posterior of $\theta$ converges to the prior truncated at zero. As a result, in this case the informational content of significance depends on the value of $\Pr(\theta > 0) = \Phi(\mu/\sigma)$. If this quantity is small, significance with a positive sign is highly informative. Unsurprisingly, when $\mu/\sigma$ is large (that is, in cases where there is little uncertainty about the sign of the parameter of interest), a positive sign of $\widehat{\theta}$ does not add much to the informational content of the test. Moreover, the limit of $p(\theta \,|\, \sqrt{n}\,\widehat{\theta} > c)$ cannot be more than double the value of $p(\theta)$ as long as $\mu$ is non-negative, because then $\Phi(\mu/\sigma) \ge 1/2$. This is relevant to many instances where there are strong beliefs about the sign of the estimated coefficients (e.g., the slope of the demand function, or the effect of schooling on wages) and specifications reporting "wrong" signs for the coefficients of interest are rarely reported or published.
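The limits in this section can be compared with the finite-sample ratio implied by the Normal model. The sketch below is illustrative, with hypothetical function names; it evaluates the posterior-to-prior ratio under significance with a positive estimate for a given $n$ and the corresponding large-sample limit.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard Normal: STD.cdf is Phi


def ratio_positive_significant(theta, mu, sigma, c, n):
    """Finite-n ratio p(theta | sqrt(n)*theta_hat > c) / p(theta) in the Normal model."""
    scale = sqrt(1.0 + n * sigma ** 2)
    return STD.cdf(sqrt(n) * theta - c) / STD.cdf((sqrt(n) * mu - c) / scale)


def limit_ratio_positive_significant(theta, mu, sigma, c):
    """Large-sample limit of the ratio above."""
    prob_positive = STD.cdf(mu / sigma)  # prior probability that theta > 0
    if theta < 0:
        return 0.0
    if theta == 0:
        return STD.cdf(-c) / prob_positive
    return 1.0 / prob_positive


c = 1.96
for mu in (0.0, 1.0, 3.0):
    finite = ratio_positive_significant(0.5, mu, 1.0, c, 10 ** 6)
    limit = limit_ratio_positive_significant(0.5, mu, 1.0, c)
    print("mu =", mu, "ratio at n = 10^6:", round(finite, 3), "limit:", round(limit, 3))
```

With $\mu = 0$ the limit at positive $\theta$ equals its upper bound of two for non-negative $\mu$, and it approaches one as $\mu/\sigma$ grows, in line with the discussion above.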

6. Conclusions

Significance testing on a point null is the most widespread form of inference. In this article, we have shown that rejection of a point null often carries very little information, while failure to reject is highly informative. This is especially true in empirical contexts where data sets are large and where there are no reasons to put substantial prior probability on a point null. Our results challenge the usual practice of conferring on point null rejections a higher level of scientific significance than on non-rejections. In consequence, we advocate a visible reporting and discussion of non-significant results in empirical practice (e.g., as in Angrist et al., 2017; Cantoni, 2018).

References

Angrist, J. D., V. Lavy, J. Leder-Luis, and A. Shany (2017). Maimonides rule redux. NBER Working Paper 23486.

Benjamin, D. J., J. O. Berger, M. Johannesson, B. A. Nosek, E.-J. Wagenmakers, R. Berk, K. A. Bollen, B. Brembs, L. Brown, C. Camerer, et al. (2017). Redefine statistical significance. Nature Human Behavior, 6–10.

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag.

Cantoni, E. (2018). Got ID? The zero effects of voter ID laws on county-level turnout, vote shares, and uncounted ballots, 1992-2014. Working paper.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine 2(8), 696–701.

Kennedy, P. E. (2005). Oh no! I got the wrong sign! What should I do? The Journal of Economic Education 36(1), 77–92.

Leamer, E. E. (1978). Specification searches: Ad hoc inference with nonexperimental data. New York: Wiley.

Pratt, J. W. (1965). Bayesian interpretation of standard inference statements. Journal of the Royal Statistical Society. Series B (Methodological) 27(2), 169–203.

Robert, C. P. (2014). On the Jeffreys-Lindley paradox. Philosophy of Science 81(2), 216–232.

Wasserstein, R. L. and N. A. Lazar (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician 70(2), 129–133.

Wooldridge, J. M. (2016). Introductory econometrics: A modern approach. Boston: Cengage Learning.

Ziliak, S. and D. McCloskey (2008).