Almost Similar Tests for Mediation Effects and other Hypotheses with Singularities
Kees Jan van Garderen
Amsterdam School of Economics, University of Amsterdam

Noud van Giersbergen
Amsterdam School of Economics, University of Amsterdam
First Version: February 2020
Abstract
Testing for mediation effects is empirically important and theoretically interesting. It is important in psychology, medicine, economics, accountancy, and marketing, for instance, generating over 90,000 citations to a single key paper in the field. It also leads to a statistically interesting and long-standing problem that this paper solves. The no-mediation hypothesis, expressed as H₀: θ₁θ₂ = 0, defines a manifold that is non-regular in the origin, where rejection probabilities of standard tests are extremely low. We propose a general method for obtaining near similar tests using a flexible g-function to bound the critical region. We prove that no similar test exists for mediation, but using our new varying g-method we obtain a test that is all but similar and easy to use in practice. We derive tight upper bounds to similar and nonsimilar power envelopes and derive an optimal test. We extend the test to higher dimensions and illustrate the results in a trade union sentiment application.

Keywords: Varying g-method, Mediation, Indirect Effect, Power Envelope, Similar Tests, Invariant Tests, Optimal Tests

∗ The authors thank Isaiah Andrews, Tim Armstrong, Peter Boswijk, Geert Dhaene, James Duffy, Jean-Marie Dufour, Patrik Guggenberger, Grant Hillier, Max King, Gael Martin, Sophocles Mavroeidis, Geert Mesters, Frank Kleibergen, Anna Mikusheva, Ulrich Müller, Bent Nielsen, Adam McCloskey, Peter Phillips, Mikkel Plagborg-Møller, Richard Smith, Frank Windmeijer, Tiemen Woutersen, and other participants at seminars at the University of Oxford, Monash University, KU Leuven, University of Amsterdam, and the conferences Advances in Econometrics 2018 in London, ANZESG 2019 in Wellington, and (EC)².

Introduction
Testing for mediation effects is empirically extremely important in various scientific disciplines. A key paper in psychology, Baron and Kenny (1986), has more than 90,000 citations and is used in many other fields. Mediation testing is important in accounting, e.g. Coletti et al. (2005), marketing, e.g. MacKenzie et al. (1986), sociology, e.g. Alwin and Hauser (1975), who used the term indirect effect, in epidemiology, e.g. Freedman and Schatzkin (1992), who coined the term intermediate endpoint effect, and in econometrics, e.g. Heckman and Pinto (2015a,b) on treatment effects and production technology. This minimal selection is hardly representative of the vast body of literature on mediation analysis. It only illustrates the breadth of its empirical relevance. Tests for mediation effects can have extremely low power, especially when the effect is small or estimated with large variance. The primary purpose of this paper is to provide a new and powerful test.

The aim of mediation testing is to discover if an independent variable (X) causes a dependent variable (Y) via an intervening or mediating variable (M). The mediating variable is exogenous in the common experimental settings in psychology and other fields, but is also considered exogenous in other settings where assignments are random or constitute a natural experiment. The basic model is:

    Y = τX + βM + u,    (1)
    M = αX + v,         (2)

where all variables are taken in deviation from their means or, more generally, after partialing out other exogenous effects. The disturbances u and v are assumed to be independent because of an experimental setup and, more generally, because no influence of Y on M is assumed in this type of model. This independence is a crucial identification condition. The parameter β cannot be estimated consistently if M is endogenous. We will further make the distributional assumption (uᵢ, vᵢ)′ ∼ IIN(0, diag(σ₁², σ₂²)), i = 1, ..., n, with n the number of observations.
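The model (1)–(2) can be simulated and estimated by OLS equation by equation. A minimal sketch follows; the parameter values (τ = 0.3, β = 0.7, α = 0.5) and the sample size are arbitrary illustrations:

```python
import numpy as np

# Simulate the basic mediation model and estimate it by OLS equation by
# equation. Parameter values and sample size are arbitrary illustrations.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)            # (2): M = alpha X + v
y = 0.3 * x + 0.7 * m + rng.normal(size=n)  # (1): Y = tau X + beta M + u

# OLS for (1): regress y on (x, m); OLS for (2): regress m on x
tau_hat, beta_hat = np.linalg.lstsq(np.column_stack([x, m]), y, rcond=None)[0]
alpha_hat = (x @ m) / (x @ x)

# OLS regression of y on x alone; the three sets of estimates satisfy the
# exact algebraic identity tau*_hat = tau_hat + alpha_hat * beta_hat
tau_star_hat = (x @ y) / (x @ x)
print(abs(tau_star_hat - (tau_hat + alpha_hat * beta_hat)) < 1e-10)  # True
```

The final check is the standard omitted-variable identity for least squares, which holds exactly in any sample, not just asymptotically.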
This facilitates a likelihood analysis, but has no consequences for the asymptotic normality of the t-statistics that will be used.

MacKinnon et al. (2002) give a literature review and compare 14 different methods for testing the effects of a mediation variable. These methods are based on standardized measures of the product of two coefficients (αβ) or on the difference of two related coefficients (τ* − τ) in equations (1) and (3):

    Y = τ*X + w.    (3)

If there is a mediation effect, then X influences M, such that α ≠ 0, and M influences Y, such that β ≠ 0. If there is no mediation by M, then the effect of X on Y is not altered by the inclusion of M, such that τ* − τ = 0. Since model (3) is a restricted version of (1) with β = 0, it is straightforward to show that the OLS estimates for the three models satisfy τ̂* = τ̂ + α̂β̂, and the relation τ* − τ = αβ also holds in model interpretation terms; see Appendix A. (Baron and Kenny (1986) was cited 90,147 times on 15 January 2020 and 79,205 times on 22 October 2018. A simple extension is to add other explanatory variables Σₖ γₖZₖ to the model; if the covariates Zₖ are added to all three models, then the degrees of freedom of the relevant t-tests are reduced by K.)

The best-known test is the Sobel test based on the ratio α̂β̂/σ̂_αβ, with σ̂_αβ an estimate of the standard error of the product α̂β̂. It is available in standard statistical packages such as SPSS, SAS, and R. It has good properties when either α or β is large and the standard errors of α̂ and β̂ are small, but if the two t-statistics for testing α = 0 and β = 0 tend to be small, properties deteriorate. For parameter values under the null, the Null Rejection Probability (NRP) can be very close to zero and, under the alternative, power can fall far below the size (highest NRP) of 5% that we use throughout. Other tests considered in MacKinnon et al. (2002) suffer from the same problems.

The origin is exceptional even under the null.
The null hypothesis αβ = 0 defines a manifold that is almost everywhere continuously differentiable, with the exception of the origin, which is a singular point. The problematic behavior of the Wald test under the null with singularities is well known and as yet unresolved. Dufour et al. (2017) provide an extensive characterization of the asymptotic null distribution of Wald-type statistics for testing restrictions given by polynomial functions with local singularities. We refer to their comprehensive review of the literature on the problems of Wald tests with singularities. In the case of a single restriction, such as mediation testing, they provide limit distributions and bounds. This only shows the extent of the problems, but does not solve them or show how the Wald test can be salvaged. We will construct a new test that has good power properties uniformly, even in a neighborhood of the singularity.

The distributions of all test statistics considered in the literature depend on the value of the parameters under the null. As a consequence, none of these tests is similar, meaning that its rejection probability is not constant on the boundary of the null hypothesis. In fact, rejection probabilities under alternatives close to the origin can be much lower than the size. These tests are therefore seriously biased, since power can be much less than the NRPs for certain parameter values. The Wald-type (Sobel) test's dependence on the parameter value is extreme in the sense that the asymptotic critical value when both α = 0 and β = 0 is χ₁²(0.95)/4 = 0.96, while for any other value under the null it equals χ₁²(0.95) = 3.84.

This paper proposes a new general method for constructing near similar tests: the varying g-method. It varies a function g that defines the boundary of the critical region to obtain a test that has NRPs as close as possible to 5%. This new method is not limited to the mediation hypothesis, but can be applied to many other testing problems with nuisance parameters to obtain near similar tests more generally.

For the mediation hypothesis we construct this boundary in the space of the two common t-statistics for testing α = 0 or β = 0. We develop a numerical method that does not use simulations to determine this critical region, which has NRPs that are extremely close to 5% for all values of α and β under the null. This requires some computing effort on our part initially, but once completed, our results can be easily implemented in practice using the table or the computer code provided. This test has much better power properties than the Wald and LR tests, but a natural question is whether one can do even better. Determining the quality of the test in absolute terms requires an appropriate power envelope. The power envelope cannot be constructed by point optimal invariant tests based on a simple application of the Neyman-Pearson lemma because the null and alternative are composite. Andrews and Ploberger (1994) address this issue by optimizing weighted power, and recent econometric contributions, including Andrews et al. (2006, 2008), Elliott et al. (2015), and Guggenberger et al. (2019), to name but a few, have considered null and/or alternative weighted mixture distributions such that the Neyman-Pearson lemma can be applied to the resulting point null and point alternative distributions. Within the specified class of mixture distributions, the least favorable distribution is then constructed and a critical value calculated. Any other test in this class has power no higher than the test constructed, resulting in an optimality property.
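The weighted-mixture construction can be sketched as follows. Under the normal approximation T ∼ N(µ, I₂), the composite null is replaced by a weighted mixture over points on the axes, and the Neyman-Pearson lemma is applied to the resulting pair of simple hypotheses. The null grid, the equal weights, and the point alternative below are hypothetical illustrations, not the choices made in any of the cited papers:

```python
import numpy as np
from scipy.stats import norm

# Neyman-Pearson with a weighted mixture replacing the composite null.
# All grid points, weights, and the alternative are hypothetical choices.
rng = np.random.default_rng(1)
mu_alt = np.array([2.0, 2.0])
null_mus = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 4.0]])
weights = np.full(len(null_mus), 0.25)

def density(t, mu):
    # N(mu, I2) density evaluated at the rows of t
    return norm.pdf(t[:, 0] - mu[0]) * norm.pdf(t[:, 1] - mu[1])

def lik_ratio(t):
    f0 = sum(w * density(t, mu) for w, mu in zip(weights, null_mus))
    return density(t, mu_alt) / f0

# Critical value: 95% quantile of the ratio under the mixture null
idx = rng.choice(len(null_mus), size=200_000, p=weights)
t0 = null_mus[idx] + rng.normal(size=(200_000, 2))
cv = np.quantile(lik_ratio(t0), 0.95)

# The rejection rate is 5% on average over the mixture, but the NRP at an
# individual null point need not equal 5%: the test is not similar.
nrp_origin = np.mean(lik_ratio(rng.normal(size=(200_000, 2))) >= cv)
```

Computing `nrp_origin` at other null points shows how the rejection probability varies along the null, which is the non-similarity discussed next.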
There is no guarantee, however, that the resulting test is similar, and it can still be seriously biased, as we have confirmed (but not reported) in the mediation context for a variety of mixtures.

We take a more direct approach to constructing the power envelope and maximize power for a grid of points in the alternative. We introduce a class of near similar invariant tests Γ_ε with 0.05 − ε ≤ NRP ≤ 0.05 for a grid of points under H₀ and ε small. The algorithm generates a different test for each grid point in the alternative that is an approximately similar invariant test maximizing power for that point alternative. This provides a power envelope (upper bound) for similar tests. Using the same algorithm we can further determine a power envelope for nonsimilar tests by discarding the near similarity restriction 0.05 − ε ≤ NRP.

We use the near similar power envelope to construct an optimal test within Γ_ε that minimizes the total power difference from the envelope on a grid of points. Its power deviates from the envelope by only a small margin, showing that the potential power loss due to the similarity requirement is small.

Andrews (2012) shows that an exact similar test exists in the related one-sided testing problem H₀: α ≥ 0, β ≥ 0, but that test is randomized and has very low power. Perlman and Wu (1999) coined the term "Emperor's New Tests" for similar tests with very poor properties. Insistence on similarity can render Likelihood Ratio (LR) tests α-inadmissible, cf. Lehmann and Romano (2005, Section 6.7), but Perlman and Wu (1999) give examples where similar tests have extremely undesirable properties yet inadmissible LR tests still provide reasonable answers. In the mediation setting the LR test is much better than the Wald test, as we will show, but still suffers from poor power properties close to the origin and is inadmissible. Our test is non-randomized, has good power properties uniformly superior to the Wald, LR, and LM tests considered here, and would please even statistically erudite emperors.

Moreira and Mourão (2016) consider random critical values. Such an interpretation can be given quite generally to any critical region in a higher-dimensional space, since the boundary of the critical region for one statistic can be expressed as a function of the remaining statistics. Our solution can be framed in terms of a random critical value for the minimum of the absolute t-values. This critical value is a function of the maximum of the absolute t-values.
The critical region that we construct is fixed, however, and not at all random. Our approach appears to lend itself better to multivariate extensions.

Our empirical illustration requires such an extension to three dimensions, and we consider general hypotheses of the form H₀: θ₁ · · · θ_K = 0. In order to derive the critical region for dimensions three and higher, we exploit the symmetries of the testing problem further. The testing problem is invariant to orderings (permutations) of parameters and statistics and to sign changes (reflections), giving rise to a finite group with eight transformations on the parameter and sample space in two dimensions and K!2^K elements in K dimensions. We give the relevant distribution of the maximal invariant and use it to derive a critical region explicitly in two and three dimensions. For three dimensions we use a method to obtain the solution in dimension K from the preceding solution in dimension K − 1. These solutions are dimensionally coherent in the sense that, for extremely large values of a number, k say, of t-statistics, the solution reduces to the (K − k)-dimensional solution, since in such cases it is essentially known that k parameters are non-zero and rejection of the null depends on the remaining (K − k) t-statistics.

An empirical illustration on union sentiment among southern nonunion textile workers in Section 6 shows the practical implementation in two and three dimensions and leads to different conclusions than standard tests. For practitioners the major advantage of our test is that there is a better chance of formally showing that there is a mediation effect. Our test has better power, especially when the two channeling effects are small or less accurately estimated. Given the enormous interest in testing for mediation and the fact that our test can have close to 5% more power than existing tests, many unpublished examples will exist where it can now be concluded that there is a statistically significant mediation effect.

The joint density of (
Y, M) given X can be written as f(Y, M | X; λ) = f(Y | M, X; λ₁) f(M | X; λ₂) with λ₁ = (τ, β, σ₁²)′ and λ₂ = (α, σ₂²)′. The parameters λ₁ and λ₂ vary freely as a result of the triangular structure of the model. The mediation variable is the endogenous variable in (2) but is strongly exogenous for β in (1), since Y is not causal for M. For a sample of n independent observations the loglikelihood equals the sum of the two normal loglikelihoods corresponding to (1) and (2):

    ℓ(λ) ∝ − (1/(2σ₁²)) Σᵢ₌₁ⁿ (yᵢ − τxᵢ − βmᵢ)² − (n/2) log(σ₁²) − (1/(2σ₂²)) Σᵢ₌₁ⁿ (mᵢ − αxᵢ)² − (n/2) log(σ₂²).    (4)

(This can easily be extended to include more regressors/covariates. Instrumental variables can also be used, but note that X and M appear in both equations and that in the standard setup u and v are independent because of the experimental interpretation of M.)
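The loglikelihood (4) separates into the two regressions and can be evaluated directly. A minimal sketch (simulated data, arbitrary parameter values; the variance estimates divide by n, as the ML estimators do):

```python
import numpy as np

def loglik(y, m, x, tau, beta, alpha, s1sq, s2sq):
    # Loglikelihood (4): sum of the normal loglikelihoods of equations
    # (1) and (2), up to an additive constant.
    n = len(y)
    r1 = y - tau * x - beta * m
    r2 = m - alpha * x
    return (-0.5 * (r1 @ r1) / s1sq - 0.5 * n * np.log(s1sq)
            - 0.5 * (r2 @ r2) / s2sq - 0.5 * n * np.log(s2sq))

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.7 * m + rng.normal(size=n)

# ML estimates, obtained equation by equation as OLS
th = np.linalg.lstsq(np.column_stack([x, m]), y, rcond=None)[0]
a = (x @ m) / (x @ x)
s1 = ((y - np.column_stack([x, m]) @ th) ** 2).mean()
s2 = ((m - a * x) ** 2).mean()

at_mle = loglik(y, m, x, th[0], th[1], a, s1, s2)
perturbed = loglik(y, m, x, th[0] + 0.2, th[1], a, s1, s2)
print(at_mle > perturbed)  # True: the MLE attains a higher loglikelihood
```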
As a consequence, the Maximum Likelihood Estimators (MLEs) for α and β are the usual OLS estimators for the two equations separately. Furthermore, both the observed and expected Fisher information matrices will be block diagonal in terms of λ₁ and λ₂, as well as in (τ, β)′, σ₁², α, and σ₂². As a result, the standard t-statistics T₁ and T₂ for α and β respectively are asymptotically independent and normally distributed with means µ₁ ≡ α₀/σ_α and µ₂ ≡ β₀/σ_β, where α₀, β₀ denote the true parameter values and σ_α, σ_β the standard deviations of the OLS estimators: (T − µ) →d N(0, I₂). Throughout the rest of the paper we will use the normal distribution for the t-statistics,

    (T − µ) ≡ (T₁ − µ₁, T₂ − µ₂)′ ∼ N(0, I₂),

with the understanding that this is an asymptotic approximation, but exploited as if it were the exact distribution. This is analogous to the assumption in the weak instruments literature that the covariance matrix is known (e.g. Andrews et al. (2006)). The finite-sample distribution involves t-distributions with different degrees of freedom and is complicated by the fact that σ_β depends on M. A strong justification for restricting attention to T, even in finite samples, is the exact result that T is maximal invariant with respect to an appropriate group of (location-scale) transformations that leave the testing problem invariant. Appendix C proves this result and provides further distributional details relevant for the model.

Standard test statistics used in practice have distributions that depend on the parameter values under the null. The rejection probabilities are therefore not constant, and the tests are biased, with power dropping below the size of the test, especially in a neighborhood of the origin.
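The approximate independence and standard-normal behavior of T₁ and T₂ under α = β = 0 can be illustrated by a small Monte Carlo. This is a rough sketch; the sample size, replication count, and the fixed design below are arbitrary choices:

```python
import numpy as np

# Monte Carlo under alpha = beta = 0: the t-statistics T1 (for alpha) and
# T2 (for beta) are approximately independent standard normals.
rng = np.random.default_rng(3)
n, reps = 100, 2000
x = rng.normal(size=n)
t1 = np.empty(reps)
t2 = np.empty(reps)
for r in range(reps):
    m = rng.normal(size=n)             # (2) with alpha = 0
    y = 0.4 * x + rng.normal(size=n)   # (1) with beta = 0, tau = 0.4
    # t-statistic for alpha in the regression of m on x
    a = (x @ m) / (x @ x)
    s2v = ((m - a * x) ** 2).sum() / (n - 1)
    t1[r] = a / np.sqrt(s2v / (x @ x))
    # t-statistic for beta in the regression of y on (x, m)
    Z = np.column_stack([x, m])
    G = np.linalg.inv(Z.T @ Z)
    b = G @ Z.T @ y
    s2u = ((y - Z @ b) ** 2).sum() / (n - 2)
    t2[r] = b[1] / np.sqrt(s2u * G[1, 1])

print(round(abs(float(np.corrcoef(t1, t2)[0, 1])), 2))  # close to 0
```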
We illustrate the issue for the classic trinity of Wald, LR, and LM methods for constructing test statistics.

The Wald test for testing H₀: αβ = 0, together with its asymptotic distribution, is given by Glonek (1993) and further analyzed in Drton and Xiao (2016):

    W = T₁²T₂² / (T₁² + T₂²)  →d  χ₁²    if α = 0 or β = 0, but not both,
                                  χ₁²/4  if α = β = 0.    (5)

The widely used Sobel (1982) test equals √W. The discrete jump in the asymptotic distribution from the origin to any other fixed parameter value is remarkable and shows explicitly that the distribution depends heavily on the parameter values under the null. The critical value is the usual 3.84 in all cases other than the origin. For an NRP of 5% at the origin the critical value should be 0.96, but this would lead to over-rejection for other values under the null and the test would be oversized (size > 5%).

The LR test statistic is

    LR = min{|T₁|, |T₂|},    (6)

and rejects when both H_α: α = 0 and H_β: β = 0 are rejected. In MacKinnon et al. (2002) this is referred to as the test for joint significance, but not identified as the LR test. The rejection probability is P[LR ≥ cv] = P[|T₁| ≥ cv ∩ |T₂| ≥ cv] = P[|T₁| ≥ cv] · P[|T₂| ≥ cv] by independence of T₁ and T₂. These rejection probabilities are monotonically increasing in the absolute values of α and β. Correct size is therefore obtained by choosing the critical value of the test by letting α → ∞ when β = 0, or β → ∞ if α = 0, to guarantee that the rejection probability under the null is always smaller than or equal to the nominal size. The asymptotic 5% critical value is therefore the usual 1.96. The NRP will depend on the values of α and β and vary between the following two extremes:

    P[LR ≥ z_{0.025}] = 0.05            if α → ∞ ∧ β = 0, or β → ∞ ∧ α = 0,
                        0.05² = 0.0025  if α = 0 ∧ β = 0,

where z_{0.025} is the upper 2.5% percentile of the standard normal distribution. For an NRP of 5% at the origin (α, β) = (0, 0), the critical value should equal cv_LR = 1.22, but the test would then be oversized (NRP > 5%) for other parameter values under the null.

The LM test is not uniquely defined, since the null can be specified as H_{ᾱβ}: α = 0 ∧ β ≠ 0, H_{αβ̄}: β = 0 ∧ α ≠ 0, or H_{ᾱβ̄}: α = 0 ∧ β = 0. Explicit expressions for the three LM tests are given in Appendix A. These LM tests are essentially squared t-tests with restricted variance estimates. The three versions can be combined into a single statistic, but its distribution will depend on the true parameter values under the null.

All these classic tests are functions of the two t-statistics, and their distributions, as well as their NRPs, clearly depend on the parameter values under the null; the tests are not similar. A test is called similar on the boundary of H₀ if the probability of rejection of the null is constant for all parameter values on the boundary of H₀ and H₁:

Definition 1
(Similar test.) Let ω ⊂ Θ be the boundary between H₀: θ ∈ Θ₀ and H₁: θ ∈ Θ \ Θ₀. A test is similar on the boundary ω if the null rejection probability does not depend on θ ∈ ω.

For the null hypothesis of no mediation, the boundary consists of the horizontal and vertical axes of the (α, β) space and is equal to H₀ itself. None of the classic tests is similar, and in a neighborhood of the origin the NRPs are close to zero. As a result, the power in a neighborhood of the origin is also close to zero and far below the size of the test, and the tests are biased, since there are parameter values with probability of rejection under the alternative lower than under the null.

Critical Regions

The behavior and construction of the classic test statistics is problematic. Given that no satisfactory adjustments of classic test statistics have been found, despite considerable efforts over recent decades, a different approach is required.

In order to derive an alternative test procedure we shift the focus from the test statistic to the critical region. A critical region defines a test statistic of course, but choosing a class of tests, such as Wald, LR, or LM tests, restricts the shape of the critical region. For the same reason, the tests focusing on improving the standard error of α̂β̂ or (τ̂* − τ̂) analyzed in MacKinnon et al. (2002) limit the shape.

We construct a new test procedure by constructing the critical region directly in the two-dimensional sample space of the t-statistics used in the construction of the tests. We consider critical regions that are bounded by a measurable function g(·) and give the following definition.

Definition 2
(Boundary function of the critical region.) A function g: R₊ → R₊ defines the:

    Critical Region:   CR_g = {(T₁, T₂) ∈ R² : |T₂| ≥ g(|T₁|) ∩ |T₁| ≥ g(|T₂|)},
    Acceptance Region: AR_g = {(T₁, T₂) ∈ R² : |T₂| < g(|T₁|) ∪ |T₁| < g(|T₂|)}.

The justification for considering the t-statistics is threefold. First, the MLE λ̂ = (τ̂, β̂, σ̂₁², α̂, σ̂₂²)′ is a complete minimal sufficient statistic, because the model constitutes a full exponential model given the dimensional equality of the minimal sufficient statistic and the parameter space; see van Garderen (1997). Second, T₁ and T₂ have distributions under the null that are independent of the nuisance parameters τ, σ₁², and σ₂². Finally, T = (T₁, T₂)′ is a maximal invariant under an appropriate group of transformations generalizing the scale invariance of the t-statistics.

In the mediation problem there are further symmetries and invariances. The null hypothesis is not changed if α and β are permuted or their signs changed. Consequently, we can permute the t-statistics and change their signs without affecting the problem. As a consequence, only 1/8th of the two-dimensional sample space of T needs consideration, and we define the critical region in the first octant (east to northeast). The other seven parts follow by symmetry. The test defined by CR_g is indeed invariant to permutations, reflections, and scale transformations. The domain of g(·) can therefore be restricted to the non-negative real line and bounded by the 45° line: g(x) ≤ x.
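For a nondecreasing boundary with g(x) ≤ x, Definition 2 reduces to the simple rule "reject when min(|T₁|, |T₂|) ≥ g(max(|T₁|, |T₂|))". A minimal sketch; the constant boundary used as an example corresponds to the LR (joint significance) test, and the numerical inputs are illustrative:

```python
# Membership in CR_g of Definition 2, and the equivalent min/max rule that
# holds when g is nondecreasing with g(x) <= x.
def in_critical_region(t1, t2, g):
    a1, a2 = abs(t1), abs(t2)
    return a2 >= g(a1) and a1 >= g(a2)

def in_cr_minmax(t1, t2, g):
    lo, hi = sorted((abs(t1), abs(t2)))
    return lo >= g(hi)

# Illustrative boundary: g(x) = min(1.96, x) reproduces the LR test,
# which rejects when both |t|-values exceed 1.96.
g_lr = lambda x: min(1.96, x)

print(in_critical_region(2.5, 2.1, g_lr))  # True: both |t| exceed 1.96
print(in_critical_region(2.5, 1.5, g_lr))  # False: min |t| below 1.96
print(all(in_critical_region(a, b, g_lr) == in_cr_minmax(a, b, g_lr)
          for a in (-3.0, -1.0, 0.5, 2.0, 4.0)
          for b in (-3.0, 0.0, 1.9, 2.5)))  # True: the two rules agree
```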
In Section 5 we show that the ordered absolute t-statistic is a maximal invariant not only for this testing problem but also for its generalization to higher dimensions.

We can put the general definition of a similar test in terms of the boundary function g(·), noting that H₀ is itself the boundary ω of H₀ and H₁:

Definition 3 g(·) is said to be a similar boundary function if the probability of the critical region CR_g defined by g is constant under H₀:

    P[T ∈ CR_g | H₀] = constant  ∀(α, β) ∈ R² with αβ = 0.

Appendix C shows that this also holds exactly in finite samples.

Figure 1 shows the critical regions in terms of (T₁, T₂) for the Wald and LR tests. The LM test is not illustrated because it is not properly and uniquely defined. We show the boundaries for two critical values: one such that for large |α| or |β| the NRP is 5% asymptotically. This value is the usual 3.84 for the Wald test and 1.96 for the LR test. The second, smaller critical value is such that the NRP is 5% when α = β = 0. This value is 0.96 for the Wald test and 1.22 for the LR test. The rejection probabilities are shown as a function of the noncentrality parameter µ₁ = α/σ_α for given µ₂ = 0, such that H₀ holds. For the LR test with critical value 1.96 the NRP goes to 0.05² = 0.0025 when α = β = 0. In the second case, with the smaller critical value 1.22, the NRP is 5% by construction when α = β = 0, but for other values the NRPs are much higher than the nominal size of the test and hence they are not valid 5% tests. The same situation will occur when constructing point optimal invariant tests. The Wald test is considerably worse, with lower NRP over a wider range of µ₁; see Figures 1c and 1d.

The trinity of classic tests is clearly nonsimilar. The question is whether we can do much better. Does there exist a similar test, or is this a problem that is intrinsically unsolvable? Our main theoretical contribution, Theorem 4, states that no similar test exists. The practical answer, however, is that we can get very close to similarity and can do much better in terms of power than existing tests.

Theorem 4
No similar boundary function g(·) exists for testing H₀: αβ = 0.

The proof of this main theorem is given in Appendix B and exploits the symmetries of the problem and the completeness of the normal distribution. We use 5% significance throughout, but it is immediate from the proof that there is no significance level for which there exists a similar boundary function, apart from two trivial exceptions. A size of 0% would yield g(t) = t and AR_g = R², such that the test would never reject. The other trivial solution is g(t) = 0, defining g⁻¹(0) = ∞ accordingly. This test always rejects, leading to an NRP of 100% for all parameter values.

Andrews (2012) proves constructively that a similar test exists for the "one-sided" testing problem H₀: µ ≥ 0 against H₁: µ ≱ 0, but his test is randomized and he makes the point that it has very low power. Andrews shows that on the negative parts of the axes the NRP is 5%, which is (trivial) power in his setting and correct size in ours. One-sided alternatives destroy the symmetry of the problem that we exploit and use to prove the non-existence. We do not consider randomized tests, and our new test below has very good power properties close to the power envelope.

Despite the negative non-existence result of Theorem 4, we construct a critical region that is all but similar, with NRPs that do not differ from 5% in practical terms for any parameter value under the null, and that has good power. This new test is easy to implement using Table 1. We also provide R-code in Appendix E.

The new test is obtained in two steps. We propose a new general method for constructing near similar tests. In the first step we use this method to derive a near similar test for the mediation hypothesis. We derive the power envelope for near similar tests, which can be used to show that the test has good power properties. In the second step, however, we use this power envelope to optimize the test and maximize power within the class of near similar tests.

[Figure 1: Critical regions for Wald (Sobel) and LR tests and their rejection probabilities. Panels: (a) CR Wald (Sobel) test; (b) CR LR test; (c) NRP Wald (Sobel) test; (d) NRP LR test. White areas are the critical regions for the valid 5% tests.]

The Varying g-Method

The new general method for the construction of near similar tests is easily described by three generic steps:

1. Define a flexible boundary g for the critical region in the relevant sample space.
2. Define a criterion function Q(g) that penalizes the deviation of the NRP from 5% for a grid of parameter values under the null (and possibly restrictions on g and other aspects deemed relevant).
3.
Systematically vary and determine g such that it minimizes the criterion function and is therefore as close to similarity as possible in the metric defined by Q.

The relevant sample space is determined by the particular testing problem at hand and may have been reduced by sufficiency, invariance, or other principles, to dimension k, say. The boundary g of the critical and acceptance region is then of dimension (k − 1). There are many ways to parameterize g flexibly, but we will use splines. The criterion function may include aspects other than similarity, for instance smoothness and monotonicity of g, convexity of the critical or acceptance regions, or even rejection probabilities under alternatives. Consequently, Step 3 will generally be a constrained optimization problem. The systematic variation of g is intended to be in line with the optimization routine used to minimize Q as in, e.g., a Newton-Raphson-type procedure.

An explicit implementation of the varying g-method for the mediation problem is given next.

The Basic g-Test

The initial step in the varying g-method is to determine the relevant sample space for the testing problem. The mediation model allows a reduction of the sample space by sufficiency to the MLE of the five parameters. A further reduction to (T₁, T₂) in two dimensions follows from location-scale invariance, as proved in Appendix C. Permutation and reflection symmetries reduce the sample space to one octant, because the absolute order statistic defined as

    (|T|_(1), |T|_(2)) = (min(|T₁|, |T₂|), max(|T₁|, |T₂|))

is a maximal invariant. It has a distribution that depends only on the ordered absolute noncentrality parameter, which is the corresponding maximal invariant in the parameter space:

    (|µ|_(1), |µ|_(2)) = (min(|µ₁|, |µ₂|), max(|µ₁|, |µ₂|)),  with (µ₁, µ₂) = (α/σ_α, β/σ_β).
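The three generic steps of the varying g-method, combined with a spline boundary, can be sketched in a simulation-based toy version. The paper's actual algorithm is numerical and simulation-free; the knot placement, the null grid, the Monte Carlo evaluation of the NRP, and the Nelder-Mead optimizer below are all simplifying assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Toy varying g-method: linear-spline boundary g on fixed knots, fitted so
# that NRPs on a grid of null points are as close to 5% as possible.
rng = np.random.default_rng(4)
draws = rng.normal(size=(20_000, 2))        # (T1, T2) noise, N(0, I2)
knots = np.array([0.0, 1.0, 1.5, 2.0, 2.5, 3.0, 6.0])
null_grid = [0.0, 0.5, 1.0, 2.0, 3.0, 5.0]  # values of mu2 on the null axis

def nrp(vals, mu2):
    # Reject iff min(|T1|,|T2|) >= g(max(|T1|,|T2|)), g piecewise linear
    t = np.abs(draws + np.array([0.0, mu2]))
    lo, hi = t.min(axis=1), t.max(axis=1)
    return np.mean(lo >= np.interp(hi, knots, vals))

def Q(vals):
    # Step 2: penalize deviations of the NRP from 5% on the null grid
    return sum((nrp(vals, mu2) - 0.05) ** 2 for mu2 in null_grid)

v0 = np.minimum(knots, 1.96)                # start from the LR boundary
res = minimize(Q, v0, method="Nelder-Mead",  # Step 3: vary g to minimize Q
               options={"maxiter": 400})
print(res.fun <= Q(v0))                     # True: criterion not worsened
```

Side conditions such as monotonicity of g and g(x) ≤ x are omitted here for brevity; in a constrained implementation they would enter Step 3 as restrictions.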
[Figure 2: Construction of the basic g-function with J = 6, so 8 knots in all, and the resulting CR boundary in (T₁, T₂) space.]

The g-boundary is generally determined by an algorithm. Appendix D shows the basic implementation of the varying g-method using linear splines with J + 2 knots, with the first and last knots fixed. In spite of its simplicity, it leads to big improvements even for small values of J. Figure 2 illustrates the construction of the g-function for a fixed number of grid points J = 6 and the resulting CR_g in the sample space of (T₁, T₂).

Figure 3 shows the NRPs of the test in comparison to the LR and Wald (Sobel) tests. There is a remarkable gain in the lowest NRP, and therefore local power, from 0.25% to more than 4% already for J = 2. We started with J = 2; after J = 16 the improvements were very small, and with J = 32 there was essentially no improvement. The figure shows a slight over-rejection for some parameter values under H₀, and hence the test is not a valid 5% test. We will correct for this subsequently by imposing strict side conditions on the NRP, because simply increasing penalties on over-rejection will not solve the issue.

Insistence on similarity can have negative consequences for power in general, but not here. Even the basic test with J = 32 has good power, especially in comparison with the Sobel and LR tests. It is uniformly better for all values of the noncentrality parameter µ₁, and in a neighborhood of the origin with µ₂ = 0 it is essentially 5 percentage points higher.

[Figure 3: NRPs of basic g-tests for several values of J, compared to the LR and Wald (Sobel) tests.]

[Figure 4: Power comparison of basic g-tests, LR and Wald (Sobel) tests along µ₁ = µ₂ (= µ).]

Denote the rejection probability of the g-test as a function of the noncentrality parameters (µ₁, µ₂) = (α/σ_α, β/σ_β) by:

    π_g(µ₁, µ₂) = P[T ∈ CR_g | (µ₁, µ₂)].    (7)

If µ₁ and/or µ₂ equal 0, then the null hypothesis is true and π_g is the NRP. When both are non-zero, H₀ is false and π_g is the power of the test defined by CR_g. Figure 4 illustrates the power in the 45° direction µ₁ = µ₂, but in other directions power is also superior to the Wald (Sobel) and LR tests.

There is a straightforward explanation for the additional power. The Wald and LR tests both reject much less than 5% near the origin. The critical region can therefore be extended, and the power increased, without failing the size condition. In the origin the NRPs are close to 0% for the LR and Wald (Sobel) tests. By extending the critical region we can therefore gain almost 5 percentage points of power without violating the size condition.
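The rejection probability of the LR (joint significance) test, and hence the 0.25% figure at the origin, follows in closed form from the independence of T₁ and T₂ under the normal approximation. A small sketch:

```python
from scipy.stats import norm

def tail(mu, cv):
    # P[|T| >= cv] for T ~ N(mu, 1)
    return norm.sf(cv - mu) + norm.cdf(-cv - mu)

def lr_reject(mu1, mu2, cv=1.96):
    # P[min(|T1|, |T2|) >= cv] = P[|T1| >= cv] * P[|T2| >= cv]
    return tail(mu1, cv) * tail(mu2, cv)

print(round(lr_reject(0.0, 0.0), 4))  # 0.0025: the NRP at the origin
print(round(lr_reject(0.0, 5.0), 3))  # 0.05: the NRP far out on an axis
```

The gap between 0.0025 and the nominal 0.05 at the origin is exactly the slack that the extended critical region of the g-test converts into power.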
Nevertheless, the LR test has some attractive features, including that rejection for a particular (t₁, t₂) implies rejection for larger values of t₁ and/or t₂, which is intuitive since the evidence against the null is increasing. A disadvantage, however, is that one never rejects when either t₁ or t₂ is smaller than 1.96, and this causes a conservativeness that can be resolved by adding area to the critical region. We elaborate on this after deriving the optimal g-test.

Comparison to the Sobel and LR tests is of limited value because they are very poor for small values of µ. The absolute quality, or even near optimality, of the new g-test can only be assessed by comparing the power surface of the test to the power envelope, or a tight upper bound thereof, for a class of tests that satisfy appropriate invariance, size, and almost-similarity restrictions. Since no exact similar invariant test exists, we introduce a class Γ_ε of near similar tests with NRPs that deviate less than ε from the 5% level, and an operational (super)class Γ_ε^M ⊇ Γ_ε, as follows.

Definition 5
The class Γ_ε of near similar boundary functions with ε > 0 is defined by:

Γ_ε = { g : R → R | sup_{µ2 ≥ 0} P[CR_g | (0, µ2)] ≤ 0.05 and inf_{µ2 ≥ 0} P[CR_g | (0, µ2)] ≥ 0.05 − ε }.

The class Γ_ε^M, with M = {(0, µ2^(ι))}_{ι=1,...,Υ} a set containing Υ points under H0, is defined by:

Γ_ε^M = { g : R → R | sup_{(0,µ2) ∈ M} P[CR_g | (0, µ2)] ≤ 0.05 and inf_{(0,µ2) ∈ M} P[CR_g | (0, µ2)] ≥ 0.05 − ε }.

For ε = 0 the boundary functions in Γ_0 would be similar and, since no such boundary exists by Theorem 1, Γ_0 would be empty. For ε = 0.05, on the other hand, Γ_0.05 contains all tests that satisfy the size condition. For ε close to 0, Γ_ε contains boundaries that are almost similar. The minimum value of ε for which Γ_ε is not empty depends in general on the testing problem.

The class Γ_ε^M can be thought of as a discretization of Γ_ε in the sense that only a grid of points under the null is considered. It imposes fewer restrictions, enforcing the near-similarity conditions at a finite number of points only. As a consequence it may contain boundaries that do not satisfy the size condition at points that are not in M. Obviously Γ_ε ⊆ Γ_ε^M, since for elements of Γ_ε the size and NRP conditions in particular hold at the points in M.

Within the class Γ_ε there is no unique solution. As a consequence one is forced to choose a boundary function from Γ_ε, or in practice from Γ_ε^M, to obtain an operational test. For the construction of the power envelope we can select the test that maximizes the power against a particular point (µ1, µ2) in the alternative. This test is a Point Optimal Invariant Near Similar (POINS) test. Its critical region varies with (µ1, µ2) and no uniformly most powerful test exists within the class Γ_ε. It can be used, however, to construct an upper bound for the power envelope.

Definition 6
The power envelope of a near similar invariant test with ε > 0 is defined as:

π*(µ1, µ2) = max_{g ∈ Γ_ε} P[CR_g | (µ1, µ2)].

For a given set of points M = {(0, µ2^(ι))}_{ι=1,...,Υ} define an upper bound to the power envelope by:

π̄(µ1, µ2) = max_{g ∈ Γ_ε^M} P[CR_g | (µ1, µ2)].

For notational simplicity we have suppressed the dependence on ε and M. Since Γ_ε ⊆ Γ_ε^M and elements of Γ_ε^M do not necessarily satisfy the size condition for all parameter values, it follows that π̄(µ1, µ2) ≥ π*(µ1, µ2), because fewer conditions are imposed. Choosing a finer grid for M will force π̄(µ1, µ2) closer to π*(µ1, µ2), at least in the additional points in M where the size condition is now required to hold. Also note that the "point" optimal g that maximizes power at the point (µ1, µ2) may have undesirable features, such as including parts of the axes in the critical region, even though such observations are perfectly in line with the null hypothesis.

We determine π̄(µ1, µ2) numerically by maximizing the power directly: we select critical region points in the sample space that maximize the probability of rejection when the true density has parameter (µ1, µ2), under the side conditions that the NRP ∈ [0.05 − ε, 0.05] for all parameters (0, µ2) ∈ M. The sample space is decomposed into 285,150 squares, and for each square it is determined whether it should be included in the critical or acceptance region in order to maximize the power while at the same time satisfying the approximate similarity condition. This is repeated for a grid of (µ1, µ2) points. So for each point on the grid the POINS critical region is determined and the power recorded. Appendix D gives details of the algorithm and the optimization routine, which can deal with a large number of variables and side conditions.

By dropping the near similarity restriction 0.05 − ε ≤ NRP in the same algorithm, we can construct a power envelope for nonsimilar tests. The maximal difference from the (higher) nonsimilar power surface is 2% points when power is around 40%, showing that the power loss due to the similarity requirement is small.

The power surface of the basic test based on 32 knots, g_{J=32}, can now be compared to the power envelope upper bound. They are very close over the whole of the parameter space. The test g_{J=32} is oversized, however, and even though the over-rejection seems practically irrelevant, theoretically the test is not a valid 5% test. The power envelope enables us to construct a correctly sized optimal test, derived next, and will show that the upper bound is tight.

The optimal g-test

Having determined an upper bound to the power envelope, we can determine a g-boundary function with a power surface as close as possible to this upper bound. This optimal test is found using the algorithm given in Appendix D. We parsimoniously simplify the g-function to just three clamped splines joined by three linear parts. This function is given in Appendix E and R-code is also provided there. For ease of implementation we give values of g(t) in Table 1. Figure 5 shows the optimal g-boundary test for the mediation problem. The optimal CR_g includes a narrow region close to the 45° line where both t-statistics are of the same magnitude.
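The square-by-square optimization described above can be illustrated with a drastically simplified greedy sketch. This is not the paper's Appendix D routine: the coarse grid, the single alternative point (2, 2), the tiny stand-in set of null points, and the power-per-null-mass greedy rule are all our own simplifications, and only the one-sided size constraint NRP ≤ 0.05 is imposed.

```python
import math

def phi(z):
    # Standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def cell_prob(x, y, h, mu1, mu2):
    # Probability of the square [x, x+h) x [y, y+h) under T ~ N((mu1, mu2), I2),
    # approximated by midpoint density times area (crude, but fine for a sketch).
    return phi(x + h / 2 - mu1) * phi(y + h / 2 - mu2) * h * h

h = 0.25
grid = [(i * h, j * h) for i in range(-24, 24) for j in range(-24, 24)]
target = (2.0, 2.0)                               # alternative whose power is maximized
nulls = [(0.0, m) for m in (0.0, 1.0, 2.0, 4.0)]  # small stand-in for the set M

# Rank cells by power gain per unit of worst-case null rejection mass.
cells = []
for (x, y) in grid:
    p_alt = cell_prob(x, y, h, *target)
    p_null = max(cell_prob(x, y, h, m1, m2) for (m1, m2) in nulls)
    cells.append((p_alt / (p_null + 1e-300), x, y))
cells.sort(reverse=True)

# Greedily add cells to the critical region while every NRP stays below 5%.
nrp = [0.0] * len(nulls)
power = 0.0
for _, x, y in cells:
    add = [cell_prob(x, y, h, m1, m2) for (m1, m2) in nulls]
    if all(n + a <= 0.05 for n, a in zip(nrp, add)):
        nrp = [n + a for n, a in zip(nrp, add)]
        power += cell_prob(x, y, h, *target)
print(round(power, 3), [round(n, 3) for n in nrp])
```

The full problem additionally imposes the lower bound 0.05 − ε on each NRP and solves the selection jointly rather than greedily, which is what requires the large-scale routine of Appendix D.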
This is expedient because mediation requires both α and β to be non-zero.

Figure 5: Optimal g-boundary function. The dashed line is the LR boundary.

Figure 6: NRP as a function of the noncentrality parameter µ2. The solid line is the optimal test and is strictly between 0.04999 and 0.05. The dotted horizontal line is the uniform 5% of the unattainable similar test. The dashed wave is the NRP of the basic g_{J=32}, which is marginally over-sized.

Table 1: g-function: table entries are g(t) values for the t-value given by the first-column value plus the first-row value. Compare the smallest absolute t-statistic with g(largest absolute t-statistic). Linear interpolation results in an NRP that deviates negligibly from 0.05.

First, the additional power is concentrated around the 45° line, as illustrated by the power surface in Figure 7 showing highest power on the diagonal. Second, the near similarity condition requires additional critical region area in the left corner of the octant, because NRPs are particularly low for small parameter values. The increased power is naturally linked to the increase in Type I error, but correct size of a test by definition merely requires that this is not larger than 5%. Nevertheless, a size (NRP)/power trade-off exists, as well as other compromises that can be assessed using critical region analysis. For instance, it may seem counterintuitive that rejection is not monotonic in t1 and t2, since an increase in both t1 and t2 represents increased evidence against the null. The LR and Wald tests are monotonic in this sense, but this leads to a reduction in power to nearly zero for small parameter values. No observed value t of T will ever lie exactly on the horizontal or vertical axis, and any observed t is therefore more likely under some alternative parameter value than under a value from the null. It is therefore desirable to add area to the LR critical region, even if this results in a non-convex critical region or acceptance region. One could debate whether the acceptance region should not continue along the very narrow region close to the diagonal until e.g. 1.2, but the new g-boundary is the optimal solution to a well-defined problem.

The narrow region of the optimal CR_g is a strict extension of CR_LR, which itself is strictly larger than the Sobel (Wald) critical region CR_W. Since the new test is constructed to satisfy the size condition, we have the following:

Theorem 7
The Sobel/Wald test and the LR test are inadmissible.
Proof. CR_W ⊂ CR_LR ⊂ CR_g, hence P[CR_W] < P[CR_LR] < P[CR_g] ≤ 0.05. The optimal g-test has uniformly higher power and is correctly sized by construction. □

The NRP as a function of the noncentrality parameter µ2 is shown in Figure 6. The difference from 5% is less than 10⁻⁵ and so small that the scale had to be magnified greatly, to an extent that prevents comparison with the LR and Sobel tests in the same graph. We include the NRPs for the basic g_{J=32}, which shows its slight over-rejection.

The power surface of the optimal g-test is very close to the power envelope (upper bound). The maximal difference is so small that the upper bound and the power surface of the optimal g-test look almost identical when graphed. Figure 7 therefore shows only the power surface of the optimal g-test. Finally, the new g-test is optimal for all intents and purposes in a larger class of tests. It is optimal by construction within the class of near similar tests Γ_ε^M, but given the closeness of its power surface to the (non)similar power envelope, there cannot exist a near similar test with materially more power: the new g-test has good properties for all parameter values.

The power surface in Figure 7 shows only the first quadrant of the parameter space of (µ1, µ2). The other quadrants follow by simple permutations and reflections of the parameters.

Figure 7: Power surface of the optimal g-test.

If mediation runs through a chain of effects X → M(0) → ··· → M(K−2) → Y, then K parameters are required to be non-zero for this channel to operate. The empirical example in the next section requires an extension to three dimensions, but there are many other problems in econometrics that involve restrictions that at least one parameter is zero, see e.g. Dufour et al. (2017).
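The higher-dimensional construction below exploits the symmetry group of coordinate permutations and sign changes. A small sketch (helper names are ours) enumerates this group, checks the K!·2^K count, and verifies that the absolute order statistic is invariant under every group element.

```python
import itertools
import math

def group_elements(K):
    # All compositions of a coordinate permutation and coordinate sign changes.
    for perm in itertools.permutations(range(K)):
        for signs in itertools.product((1, -1), repeat=K):
            yield perm, signs

def act(elem, t):
    # Apply a group element h = (perm, signs) to the statistic t.
    perm, signs = elem
    return tuple(s * t[p] for s, p in zip(signs, perm))

def abs_order(t):
    # Absolute order statistic: sorted absolute values.
    return tuple(sorted(abs(x) for x in t))

K = 3
G = list(group_elements(K))
assert len(G) == math.factorial(K) * 2 ** K   # K! * 2^K = 48 for K = 3

# The absolute order statistic is unchanged by every group element:
t = (1.5, -0.3, 2.2)
assert all(abs_order(act(h, t)) == abs_order(t) for h in G)
print(len(G))
```

This only checks invariance; maximality of the invariant (that any two points with the same absolute order statistic are connected by a group element) is the content of Lemma 8 below.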
In K dimensions the null hypothesis is that at least one parameter is zero, and the alternative that all K parameters are non-zero:

H0 : θ1 θ2 ··· θK = 0    H1 : θ1 θ2 ··· θK ≠ 0.

As before we assume that the estimator θ̂ is normally distributed with known covariance matrix Ω = diag(σ1², ..., σK²), such that Ω^(−1/2)(θ̂ − θ) = (T − µ) ∼ N(0, I_K), with elements T_k = θ̂_k/σ_k and noncentrality parameters µ_k = θ_k/σ_k.

In higher dimensions it is even more important to exploit the invariance and symmetry properties of the problem, because they reduce the domain of integration by a factor K!·2^K. The testing problem is invariant to reordering the parameters (permutations) and to sign changes (reflections) of the K parameters {θ_i}_{i=1,...,K}. There is an associated group of transformations on T that leaves the distribution invariant. It can be decomposed into two proper subgroups: the group of permutations, G1 say, with K! elements, and the group of sign changes, G2 say, with 2^K elements (two possible signs for each element). The groups G1 and G2 have only the identity element in common, but are otherwise non-overlapping. The full group G generated by G1 and G2 therefore has K!·2^K elements. In two dimensions this equals eight, in three dimensions 48, in four dimensions 384, etc., with a multiplicative factor 2K to obtain dimension K from the one before.

The density after a sign change in T_k is obtained by a corresponding sign change in µ_k, and for a permutation of T, µ permutes accordingly. Hence for any element h ∈ G we have h·T ∼ N(h·µ, I_K), or P_{hµ}[hT ∈ A] = P_µ[T ∈ A], so the distribution is invariant; see Lehmann and Romano (2005).

Define the absolute order statistic {|T|(1), ..., |T|(K)} as the reordered absolute values of the t-statistics, such that |T|(1) < |T|(2) < ··· < |T|(K), and the absolute order parameter {|µ|(1), ..., |µ|(K)} as the reordered absolute values of the parameters µ_k.

Lemma 8 If T ∼ N(µ, I_K), then the absolute order statistic {|T|(1), ..., |T|(K)} is a maximal invariant statistic and the absolute order parameter {|µ|(1), ..., |µ|(K)} is a maximal invariant parameter under the group of transformations G = G1 × G2. The distribution of {|T|(1), ..., |T|(K)} depends only on {|µ|(1), ..., |µ|(K)}.

Lemma 9
The probability density function of the absolute order statistic is given by:

f(|t|(1), ..., |t|(K)) = perm [ χ(|t|(i), |µ|(j)) ]_{i,j=1,...,K},  (8)

with perm(A) the permanent of the square matrix A, here with (i, j) element χ(|t|(i), |µ|(j)), and χ(x, λ) the noncentral chi distribution with one degree of freedom and noncentrality parameter λ.

Note that the null hypothesis implies |µ|(1) = ··· = |µ|(k) = 0 for some k ≥ 1. We will use density (8) on the relevant domain 0 ≤ |t|(1) ≤ ··· ≤ |t|(K) < ∞ to calculate rejection probabilities based on the critical region defined by the boundary function g(|t|(2), ..., |t|(K)).

Dimensional Coherency.
If it is known that θK ≠ 0, then the null hypothesis reduces to H0 : θ1 θ2 ··· θ(K−1) = 0. This implies that the critical region for the remaining K − 1 t-statistics must reduce to the solution found for K − 1 dimensions when |µK| is large. For large values of |TK| (p-values very small) it is essentially known that µK and θK are non-zero, and the probability of rejection will effectively depend only on the K − 1 remaining t-values. In two dimensions this means that as t2 → ∞ the boundary function g(t2) → 1.96, which is the one-dimensional solution for testing H0 : θ1 = 0. In three dimensions it means that the solution must reduce to the g-test derived in Section 4. For three dimensions we have used a multivariate spline generalization, using barycentric coordinates, of the basic spline version of the varying g-method in two dimensions, and imposed dimensional coherency. This resulted in a maximum of 0.2% points difference from 5%, and hence not as close to similarity as in two dimensions.

The permanent is defined as perm(A) = Σ_{σ ∈ S_n} Π_{i=1}^n a_{i,σ(i)}, with the sum over all permutations σ of the numbers 1, ..., n, akin to the determinant but without the ± signature of the permutation.

The noncentral chi distribution with k degrees of freedom and noncentrality parameter λ has density f(x, k, λ) = e^{−(x²+λ²)/2} x^{k/2} λ^{(2−k)/2} I_{k/2−1}(λx), λ > 0, x > 0, with I_{k/2−1}(·) the modified Bessel function of the first kind.

Figure 8: The g-boundary in 3D. The critical region is furthest removed from the origin and includes e.g. (4, 4, 4). The edges show the 2D solution, since one t-statistic is very large; if two t-statistics are very large, it reduces to 1.96, the 1D solution.

With increasing K, the dimension of the integral and the number of knots needed to define the function g increase. The problem becomes progressively more involved and suffers from the curse of dimensionality. We can determine the solution in dimension K using the solution in dimension K − 1 and dimensional coherency. For K = 3 the solution is given in Figure 8. It uses the solution in two dimensions and an optimized weight function, which leads to a maximum of 0.13% points difference from 5%. In four dimensions we also determined a solution using this method, based on optimized weights and dimensional coherency, but do not report it. Employing this method one could, in principle, recursively determine the (K + 1)-dimensional solution on the basis of the K-dimensional solution, but this is left for future research.

For a numerical illustration, we consider the recursive model of union sentiment among southern nonunion textile workers used by Bollen and Stine (1990). The model is a simplified version of McDonald and Clelland (1984), discussed in some detail by Bollen (1989, p. 82–93):

y  = τ x1 + β1 m1 + β2 m2 + u,
m1 = α1 x2 + v1,  (9)
m2 = β3 m1 + α2 x2 + v2.

It analyses the direct and indirect effects of tenure and age on union sentiment via deference and/or labor activism. Tenure x1 is measured in log years working in a particular textile mill and age x2 is measured in years. The variables sentiment towards unions y, deference (submissiveness) to managers m1, and support for labor activism m2 are measures based on 7, 4, and 9 survey questions respectively. The disturbances (u, v1, v2) are assumed to be uncorrelated across equations and individuals. When they are normally distributed, ML estimation of the system reduces to OLS applied to each equation separately, due to the recursive structure.

Figure 9: Union sentiment mediation graph (tenure and age as causes; deference and activism as mediators).

Table 2: OLS estimates and t-statistics for the union sentiment model (N = 100)

Parameter      α1       α2       β3       β2       β1       τ
Estimate     −0.050    0.057   −0.283    0.987   −0.215    0.720
t-statistic  −1.902    2.709   −3.582    7.120   −1.838    1.777

We use a selection of 100 observations out of the original 173 and focus on three alternative theories of the indirect effects from age to union sentiment: two competing parallel effects, the first being that the age effect is mediated by increased deference, in which case i1 = α1β1 quantifies the indirect effect.
The alternative mediation channel is that activism mediates the effect, such that i2 = α2β2 is the indirect effect. The third channel is a serial effect: age affects deference, which in turn affects activism, which in turn affects union sentiment, such that i3 = α1β3β2 measures the indirect effect. Figure 9 illustrates the three mediation channels. The OLS estimates of the coefficients of the structural equations and their t-statistics are shown in Table 2.

The point estimates of the indirect effects and their t-statistics based on the delta method are shown in Table 3. For the g-test we need the absolute order statistics, evaluate g, and compare. For H0 : i1 = 0 we observe |t(β̂1)| = 1.838 > 0.774 = g(1.902) = g(|t(α̂1)|), and hence reject. For H0 : i2 = 0 we have |t(α̂2)| = 2.709 > 0.960 = g(7.120) = g(|t(β̂2)|) and also reject. Testing the last null hypothesis H0 : i3 = 0 requires the three-dimensional solution given in Figure 8. We have |t(α̂1)| = 1.902 < 1.96 = g(3.582, 7.120) = g(|t(β̂3)|, |t(β̂2)|) and do not reject.

Table 3: Indirect effects with Sobel t-statistics and g-test. * indicates significance at 5%

                Estimate   Sobel t-statistic   g-test
i1 = α1β1        0.011        1.322            1.838* > 0.774 = g(|−1.902|)
i2 = α2β2        0.056        2.532*           2.709* > 0.960 = g(7.120)
i3 = α1β3β2      0.014        1.635            1.902 ≯ 1.96 = g(3.582, 7.120)

The Sobel test with critical value 1.96 concludes that i2 is significant, but does not find enough evidence for the i1 mediation channel. The new g-test, in contrast, concludes that i1 is also significant. Both t-values in this case are smaller than 1.96, so the LR test would not reject either; but the two t-values are of comparable magnitude and the g-test finds a significant mediation effect. For implementation of the g-test only the relevant t-statistics are required. The absolute values are ordered and the smallest value is compared with the value of the g-function evaluated at the largest absolute t-value. This can be looked up in Table 1 (possibly using linear interpolation), or one can use the spline function detailed in Table 4 and coded in R in Appendix E.

For i3 both tests draw the same conclusion. The three t-values involved are not of comparable magnitude, and the t-statistics for β3 and β2 are so large that rejecting the null H0 : α1β3β2 = 0 essentially depends on whether α1 is zero. The corresponding absolute t-value of 1.90 is too small to warrant such a conclusion.
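The application's numbers can be reproduced from the t-statistics alone. The sketch below uses the delta-method identity for the t-statistic of a product of coefficients and the g-boundary values read off above (0.774, 0.960 and the three-dimensional 1.96); the helper names are ours.

```python
import math

# t-statistics from Table 2: age->deference, age->activism,
# deference->activism, activism->sentiment, deference->sentiment
t_a1, t_a2, t_b3, t_b2, t_b1 = -1.902, 2.709, -3.582, 7.120, -1.838

def sobel_t(*ts):
    # Magnitude of the delta-method (Sobel-type) t-statistic of a product of
    # coefficients, expressed through the component t-statistics:
    # 1 / sqrt(sum_i 1/t_i^2); for two terms this equals |t1 t2|/sqrt(t1^2+t2^2).
    return 1.0 / math.sqrt(sum(1.0 / (t * t) for t in ts))

print(round(sobel_t(t_a1, t_b1), 3))         # i1 = alpha1*beta1
print(round(sobel_t(t_a2, t_b2), 3))         # i2 = alpha2*beta2
print(round(sobel_t(t_a1, t_b3, t_b2), 3))   # i3 = alpha1*beta3*beta2

def g_test_reject(ts, g):
    # Reject H0 when the smallest absolute t-statistic exceeds the g-boundary
    # evaluated at the remaining (larger) absolute t-statistics.
    s = sorted(abs(t) for t in ts)
    return s[0] > g(*s[1:])

# Boundary values taken from the tables above: g(1.902) = 0.774, g(7.120) = 0.960.
assert g_test_reject((t_a1, t_b1), lambda t: 0.774)    # 1.838 > 0.774: reject
assert g_test_reject((t_a2, t_b2), lambda t: 0.960)    # 2.709 > 0.960: reject
# Three-dimensional case: g(3.582, 7.120) = 1.96 and 1.902 < 1.96: do not reject
assert not g_test_reject((t_a1, t_b3, t_b2), lambda t1, t2: 1.96)
```

In practice the boundary would of course be evaluated from Table 1 or the spline in Appendix E rather than hard-coded; the lambdas above simply freeze the values relevant to this data set.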
This paper has addressed the mediation problem, which is empirically extremely important, with thousands of applications per year in many different fields including economics, business, marketing, and accounting, and over 90,000 citations to a key reference. Theoretically it is an interesting statistical problem that has generated results dating back to Craig (1936) and continues today, with contributions on the poor performance of the Wald statistic and the construction of similar tests, involving many different hypotheses with singularities in econometrics and elsewhere.

We have proposed a new general method for constructing tests that are as near as possible to similarity. This varying-g method proposes a flexible critical region boundary and minimizes the difference from 5% of the rejection probabilities at a number of points on the boundary of the null hypothesis. Conceptually and practically this is very simple and straightforward to implement. It does not require a choice of mixture distribution, nor the construction of least favorable distributions, which may lead to nonsimilar solutions. Numerically it is also attractive in terms of convergence properties.

(The bootstrap is a popular alternative for testing mediation. Because of the asymmetry of the distribution involved, this is carried out through alternative confidence intervals of the indirect effect; see e.g. MacKinnon et al. (2004) and Preacher and Hayes (2008). It is well known, however, that the bootstrap is not valid here. Simulations we carried out showed that bootstrap tests for mediation based on the generally preferred BCa confidence intervals can have sizes of 8% when n = 100, and higher for smaller n.)

The optimal g-test satisfies the size condition. Its critical region is strictly larger than the LR and Wald critical regions, and the test is therefore strictly and uniformly more powerful: the classic tests are not admissible. For large values of the coefficients the power difference becomes negligible, but when mediation effects are small or have relatively big standard errors, the power can be close to 5% points higher than that of these classic tests. This has important consequences for empirical work: it enables researchers to establish mediation effects in circumstances where one could not show mediation before, due to the extreme conservativeness of standard tests near the origin.

Appendix A Theory
A.1 Elementary Relation
Let y = (y1, ..., yn)′, m = (m1, ..., mn)′, x = (x1, ..., xn)′ be vectors of observables in deviation from their means, such that ȳ = 0, m̄ = 0, x̄ = 0, and let u = (u1, ..., un)′, v = (v1, ..., vn)′ be disturbance vectors. The model is then:

y_i = τ x_i + β m_i + u_i,  (10)
m_i = α x_i + v_i,  (11)

and the restricted version of equation (10) with β = 0 equals:

y_i = τ* x_i + w_i.  (12)

The claim τ̂* = τ̂ + α̂β̂ follows from a standard exercise relating restricted and unrestricted OLS estimators:

τ̂* = (x′x)⁻¹x′y = (x′x)⁻¹x′(x τ̂ + m β̂ + û) = τ̂ + (x′x)⁻¹x′m β̂ + (x′x)⁻¹x′û,

where α̂ = (x′x)⁻¹x′m is the OLS estimator in equation (11) and x′û = 0, since û are the OLS residuals from equation (10) and orthogonal to x.

τ* − τ = αβ follows by substituting (11) in (10):

y_i = τ x_i + β m_i + u_i = (τ + βα) x_i + (β v_i + u_i) = τ* x_i + w_i.

It follows that H0 : αβ = 0 ⇔ H0 : τ* = τ.

A.2 Likelihood
The joint density of (y, m) given x is f(y, m | x) = f(y | m, x) f(m | x), and according to the model:

y_i | m_i, x_i ∼ N(τ x_i + β m_i, σ1²),
m_i | x_i ∼ N(α x_i, σ2²).

Hence the log-likelihood

ℓ = ℓ(τ, β, σ1², α, σ2²) = log f(y, m | x; τ, β, σ1², α, σ2²) = log f(y | m, x; τ, β, σ1²) + log f(m | x; α, σ2²),

for n independent observations equals equation (4), which can be written as:

ℓ ∝ −(1/(2σ1²)) y′y + (τ/σ1²) y′x + (β/σ1²) y′m − (τβ/σ1² − α/σ2²) x′m − ½(β²/σ1² + 1/σ2²) m′m − ½(τ²/σ1² + α²/σ2²) x′x − (n/2) log(σ1²σ2²)
  = η′r − κ(η),

with:

η = ( −1/(2σ1²), τ/σ1², β/σ1², −(τβ/σ1² − α/σ2²), −½(β²/σ1² + 1/σ2²) )′,
r = ( y′y, y′x, y′m, x′m, m′m )′,

and κ some function of η and x′x, which is fixed. Since dim(η) = dim(r), the model is a full exponential model of dimension five by the Koopman–Fisher–Darmois theorem (see van Garderen (1997)), and r is a complete sufficient statistic. The score s(τ, β, σ1², α, σ2²) = s = (s1′, s2′)′ is analogous to the scores of the two separate regression models, since (τ, β, σ1²) appears in the first equation only and (α, σ2²) appears in the second equation only. So:

s = ( (y − τx − βm)′x/σ1², (y − τx − βm)′m/σ1², (y − τx − βm)′(y − τx − βm)/(2σ1⁴) − n/(2σ1²), (m − αx)′x/σ2², (m − αx)′(m − αx)/(2σ2⁴) − n/(2σ2²) )′,

and the Maximum Likelihood Estimator (MLE) equals the MLE for the two equations separately:

(τ̂, β̂)′ = ((x : m)′(x : m))⁻¹ (x : m)′y;  σ̂1² = (1/n) y′M_X y;
α̂ = (x′x)⁻¹ x′m;  σ̂2² = (1/n) m′M_x m,

with M_A = I − A(A′A)⁻¹A′ and X = [x : m] an n × 2 matrix; r is a minimal sufficient and complete statistic.
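The elementary relation τ̂* = τ̂ + α̂β̂ of A.1 and the equation-by-equation OLS form of the MLE can be checked numerically. A minimal sketch with simulated data follows; the true values τ = 0.3, β = 0.7, α = 0.5 are arbitrary, and the identity holds exactly for any sample.

```python
import random

random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + random.gauss(0, 1) for xi in x]                          # m = alpha*x + v
y = [0.3 * xi + 0.7 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]   # y = tau*x + beta*m + u

def demean(v):
    mu = sum(v) / len(v)
    return [vi - mu for vi in v]

x, m, y = demean(x), demean(m), demean(y)
dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))

alpha_hat = dot(x, m) / dot(x, x)          # OLS of m on x (equation (11))
# OLS of y on (x, m) via the 2x2 normal equations (equation (10))
sxx, sxm, smm = dot(x, x), dot(x, m), dot(m, m)
sxy, smy = dot(x, y), dot(m, y)
det = sxx * smm - sxm * sxm
tau_hat = (smm * sxy - sxm * smy) / det
beta_hat = (sxx * smy - sxm * sxy) / det
tau_star = dot(x, y) / dot(x, x)           # restricted OLS of y on x alone (12)

# Elementary relation: restricted slope = direct + indirect effect, exactly
assert abs(tau_star - (tau_hat + alpha_hat * beta_hat)) < 1e-10
print(tau_hat, alpha_hat * beta_hat, tau_star)
```

The assertion holds to floating-point precision because τ̂* = τ̂ + α̂β̂ is an algebraic identity of the OLS estimators, not a large-sample result.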
A.3 Classic Tests

Wald test.
Under H0 : r(α, β) = αβ = 0. Then R(α, β) = ∂r(α, β)/∂(α, β)′ = (β, α)′, and evaluated at the (unrestricted) MLE this equals R(α̂, β̂) = (β̂, α̂)′. The Wald test therefore becomes:

W = α̂β̂ [ (β̂, α̂) diag(σα², σβ²) (β̂, α̂)′ ]⁻¹ α̂β̂ = α̂²β̂²/(α̂²σβ² + β̂²σα²) = T1²T2²/(T1² + T2²).

The Sobel test equals √W and is usually expressed as the square root of the middle expression: α̂β̂/√(α̂²σβ² + β̂²σα²).

LR test.
The maximum value of the log-likelihood can be expressed in terms of the OLS residual sums of squares of the first and second equations, RSS1 and RSS2 respectively:

ℓ(τ̂, β̂, σ̂1², α̂, σ̂2²) ∝ −n/2 − (n/2) log(RSS1/n) − n/2 − (n/2) log(RSS2/n).

Denote the restricted residual sums of squares by RSS̃1 when β = 0 and RSS̃2 when α = 0, and the restricted maximized log-likelihoods by:

ℓ_{α=0}(τ̂, β̂, σ̂1², 0, σ̃2²) ∝ −n/2 − (n/2) log(RSS1/n) − n/2 − (n/2) log(RSS̃2/n),
ℓ_{β=0}(τ̃, 0, σ̃1², α̂, σ̂2²) ∝ −n/2 − (n/2) log(RSS̃1/n) − n/2 − (n/2) log(RSS2/n).

The LR test of the full model with five parameters against the model with the single restriction β = 0 equals:

LR_{β=0} = 2( −(n/2) log(RSS1/n) + (n/2) log(RSS̃1/n) ) = n log( 1 + T2²/n ),

since RSS̃1 = RSS1 + β̂² m′M_x m and T2² = β̂² m′M_x m/σ̂1² = (RSS̃1 − RSS1)/(RSS1/n). Similarly, the LR test for the single restriction α = 0 equals:

LR_{α=0} = n log( 1 + T1²/n ).

The likelihood ratio test for H0 : α = 0 and/or β = 0 uses the maximized log-likelihood under the alternative (the same in both cases) and under the null, which means minimizing over LR_{α=0} and LR_{β=0}, hence:

LR = min{ LR_{α=0}, LR_{β=0} },

which is equivalent to rejecting for large values of min{T1², T2²}, or min{|T1|, |T2|}.

LM tests.
The score version of the LM statistic,

LM = s(λ̃)′ I_λ⁻¹ s(λ̃),

requires the score vector evaluated under the null, and there are three cases: (i) α = 0 ∧ β ≠ 0, (ii) β = 0 ∧ α ≠ 0, (iii) α = 0 ∧ β = 0. The non-zero score components are:

s(λ̃_{α=0}) : ṽ′x/(ṽ′ṽ/n);  s(λ̃_{β=0}) : ũ′M_x m/(ũ′ũ/n);  s(λ̃_{α=0 ∧ β=0}) : both of the above,

with ũ and ṽ the residuals under the respective null. The inverse information matrix is block diagonal, with blocks corresponding to the two regression equations:

I_λ⁻¹ = diag( σ1²(X′X)⁻¹, 2σ1⁴/n, σ2²(x′x)⁻¹, 2σ2⁴/n ).

Hence the three score versions of the LM test equal:

LM_{α=0} = n (x′m)² / (x′x · m′m),
LM_{β=0} = n (ũ′M_x m)² / (ũ′ũ · v̂′v̂),
LM_{α=0 ∧ β=0} = n (x′m)² / (x′x · m′m) + n (ũ′M_x m)² / (ũ′ũ · ṽ′ṽ),

where v̂ = M_x m, so that v̂′v̂ = m′M_x m.

Appendix B Proof of Theorem 1

The proof is slightly more transparent in terms of the acceptance region. The probability of not rejecting H0 should equal 0.95 uniformly over H0:

P[AR_g | ∀α ∈ R ∧ β = 0] = 0.95 = P[AR_g | ∀β ∈ R ∧ α = 0].

Without loss of generality we set β = 0. Then

P[AR_g | α] = P[ |T2| < g(|T1|) ∪ |T1| < g(|T2|) | β = 0 ∧ α ∈ R ].

Under H0 : β = 0 and α ∈ R, T1 ∼ N(µ, 1) with µ = α/σα and T2 ∼ N(0, 1). By independence and symmetry we have, with φ(·) denoting the standard normal density (see also Figure 2 for the areas of integration):

P[AR_g | µ] = 2 ∫_{−∞}^{+∞} φ(t1 − µ) [ ∫_0^{g(t1)} φ(t2) dt2 + ∫_{g⁻¹(t1)}^{+∞} φ(t2) dt2 ] dt1
  = 2 ∫_{−∞}^{+∞} φ(t1 − µ) [ Φ(g(t1)) − Φ(g⁻¹(t1)) + ½ ] dt1 = 0.95.

This implies restrictions on g(·). Define F(t1) = 2·[ Φ(g(t1)) − Φ(g⁻¹(t1)) + ½ − 0.95/2 ]; then the restrictions become:

0 = ∫_{−∞}^{+∞} φ(t1 − µ) F(t1) dt1  ∀ µ ∈ R.

The normal distribution N(µ, 1) is a one-parameter full exponential family and therefore complete. Hence F(T1) ≡ 0 is the only function with expectation 0 for all values of µ. Consequently g(t) must satisfy:

Φ(g(t)) − Φ(g⁻¹(t)) = −0.025  ∀ t ∈ R.

But g(0) = 0 implies g⁻¹(0) = 0 and hence Φ(g(0)) − Φ(g⁻¹(0)) = 0 ≠ −0.025, and no similar boundary exists.
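The conclusion of the theorem can be illustrated numerically. For the LR-type rule that rejects when min(|T1|, |T2|) > 1.96 (equivalent to the LR test, as shown in A.3), the NRP along the null varies from about 0.25% at the origin to almost 5% far away, so no single boundary of this kind is similar. The sketch below uses the closed form of the NRP rather than the integral above.

```python
import math

def Phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def nrp_minabs(mu, c=1.96):
    # NRP of the test rejecting when min(|T1|, |T2|) > c, under H0 with
    # T1 ~ N(mu, 1) and T2 ~ N(0, 1); by independence it factorizes:
    p1 = 1.0 - (Phi(c - mu) - Phi(-c - mu))   # P[|T1| > c]
    p2 = 1.0 - (Phi(c) - Phi(-c))             # P[|T2| > c], about 0.05
    return p1 * p2

print(round(nrp_minabs(0.0), 4))   # ~0.0025 at the origin
print(round(nrp_minabs(5.0), 4))   # ~0.05 far from the origin
```

The wide range of null rejection probabilities for a fixed boundary is precisely the non-similarity that the varying g-method is designed to reduce.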
Notes

1. The proof can be extended explicitly to a function g(t) that is defined for t ≥ L only, and undefined on [0, L). Implicitly this is already covered by defining g(t) = 0 ∀t ∈ [0, L), since this horizontal line segment contributes no probability to the NRP.

2. We use 5% significance throughout, but it is immediate from the proof that there is no significance level for which a similar boundary function exists, apart from two trivial exceptions. A size of 0% would yield g(t) = t and AR = R², such that the test would never reject. The other trivial solution is g(t) = 0, with g⁻¹(0) = ∞ defined accordingly, which is a test that always rejects, leading to an NRP of 100% for all parameter values.

Appendix C Invariance

When testing the no-mediation hypothesis H0 : αβ = 0, the parameters τ, σ1², σ2² are nuisance parameters. Their values have no influence on whether the null is true or not, and we therefore want a test that is invariant with respect to an appropriate group of transformations that leaves the relevant distributions and hypotheses invariant. All distributions are conditional on x, since it is strictly exogenous, but m is a random variable that depends on x, and y depends on both x and m. In the notation, conditioning on x is implicit, but conditioning on m is explicit. So conditional on x, and under the normality assumption on (u, v), we have the common OLS results, with all variables in deviation from their means:

(τ̂, β̂)′ | m ∼ N( (τ, β)′, σ1² (X′X)⁻¹ ), and s1²/σ1² ∼ χ²,
α̂ = (x′x)⁻¹x′m ∼ N( α, σ2² (x′x)⁻¹ ), and s2²/σ2² ∼ χ²,

with X = [x : m], s1² = y′M_X y = nσ̂1², s2² = m′M_x m = nσ̂2², and the χ² distributions having the appropriate degrees of freedom.
The conditional variance of $(\hat\tau, \hat\beta)'$ depends on $m$ only through $\hat\alpha$ and $s_1^2$ because
$$\left([x:m]'[x:m]\right)^{-1} = \frac{1}{s_1^2\,(x'x)} \begin{bmatrix} m'm & -x'm \\ -m'x & x'x \end{bmatrix} = \begin{bmatrix} (x'x)^{-1} + \hat\alpha^2/s_1^2 & -\hat\alpha/s_1^2 \\ -\hat\alpha/s_1^2 & 1/s_1^2 \end{bmatrix},$$
using:
$$|X'X| = x'x\, m'm - x'm\, m'x = x'x\left(m'm - m'x (x'x)^{-1} x'm\right) = x'x\, m'M_x m = x'x \cdot s_1^2,$$
$$m'm = s_1^2 + m'x (x'x)^{-1} x'm = s_1^2 + x'x\left((x'x)^{-1}x'm\right)^2 = s_1^2 + x'x\, \hat\alpha^2, \qquad x'm = x'x\, \hat\alpha.$$
This implies that $s_2^2$ depends on $m$ only through $\hat\alpha$ and $s_1^2$ since
$$s_2^2 = y'M_X y = y'y - \begin{pmatrix} \hat\tau \\ \hat\beta \end{pmatrix}' \begin{pmatrix} x'x & x'm \\ m'x & m'm \end{pmatrix} \begin{pmatrix} \hat\tau \\ \hat\beta \end{pmatrix}.$$
Hence $s_2^2/\sigma_2^2 \,|\, m \equiv s_2^2/\sigma_2^2 \,|\, \hat\alpha, s_1^2, x \sim \chi^2_{n-3}$. Conditional on $m$, $s_2^2$ is also distributed independently of $(\hat\tau, \hat\beta)'$. Further note that conditional on $m$, or $(\hat\alpha, s_1^2)$, the distribution of $\hat\beta \,|\, \hat\alpha, s_1^2 \sim N(\beta, \sigma_2^2/s_1^2)$ and $\hat\tau \,|\, \hat\alpha, s_1^2, \hat\beta \sim N\!\left(\tau - \hat\alpha(\hat\beta - \beta),\ \sigma_2^2 (x'x)^{-1}\right)$.

Writing the joint density of the sufficient statistics as the product of conditional and marginal distributions we obtain the representation
$$f(\hat\tau, \hat\beta, s_2^2, \hat\alpha, s_1^2) = N\!\left(\tau - \hat\alpha(\hat\beta - \beta),\ \sigma_2^2 (x'x)^{-1}\right) \times N(\beta, \sigma_2^2/s_1^2) \times \left(\sigma_2^2 \chi^2_{n-3}\right) \times N\!\left(\alpha, \sigma_1^2 (x'x)^{-1}\right) \times \left(\sigma_1^2 \chi^2_{n-2}\right),$$
which is equivalent to the likelihood, given in logs in equation (4).

The transformations $s_1^2 \to a_1 s_1^2$ and $s_2^2 \to a_2 s_2^2$ with $a_1, a_2 > 0$ keep $(s_1^2, s_2^2)$ in the same family with $(\sigma_1^2, \sigma_2^2)$ replaced by $(a_1\sigma_1^2, a_2\sigma_2^2)$ and have no (Footnote: Most invariance results in this section are in collaboration with, and thanks to, Hillier (2019).)
effect on the hypotheses. But $\sigma_1^2$ and $\sigma_2^2$ are present in the other components, and we need to transform the remaining variables accordingly as $\hat\alpha \to \sqrt{a_1}\,\hat\alpha$ and $\hat\beta \to \sqrt{a_2/a_1}\,\hat\beta$, with densities
$$\sqrt{a_1}\,\hat\alpha \sim N\!\left(\sqrt{a_1}\,\alpha,\ a_1\sigma_1^2 (x'x)^{-1}\right) \quad \text{and} \quad \sqrt{a_2/a_1}\,\hat\beta \,\big|\, a_1 s_1^2 \sim N\!\left(\sqrt{a_2/a_1}\,\beta,\ a_2\sigma_2^2/(a_1 s_1^2)\right)$$
respectively. Finally, since $\tau$ is not involved in the inference problem, we may transform $\hat\tau \to \sqrt{a_2}(\hat\tau + a_0)$ so that:
$$\sqrt{a_2}(\hat\tau + a_0) \,\big|\, \sqrt{a_1}\,\hat\alpha,\ \sqrt{a_2/a_1}\,\hat\beta,\ a_1 s_1^2 \sim N\!\left(\sqrt{a_2}(\tau + a_0) - \sqrt{a_1}\,\hat\alpha\,\sqrt{a_2/a_1}\left(\hat\beta - \beta\right),\ a_2\sigma_2^2 (x'x)^{-1}\right),$$
which has the same form as before the transformation. These transformations preserve the family of distributions for the sufficient statistics (and MLEs), and transform the mediation effect as $\alpha\beta \to \sqrt{a_2}\,\alpha\beta$. They do not change whether the null hypothesis is true or false, i.e. $H_0$ is true before the transformation iff it is true after. We may therefore state:

Proposition 10
The testing problem is invariant under the group $K$ of transformations acting on $(\hat\tau, \hat\beta, s_2^2, \hat\alpha, s_1^2)$ defined by
$$(\hat\tau, \hat\beta, s_2^2, \hat\alpha, s_1^2) \to \left(\sqrt{a_2}(\hat\tau + a_0),\ \sqrt{a_2/a_1}\,\hat\beta,\ a_2 s_2^2,\ \sqrt{a_1}\,\hat\alpha,\ a_1 s_1^2\right), \qquad a_0 \in \mathbb{R},\ a_1, a_2 \in \mathbb{R}_+ .$$
The induced group of transformations $\bar{K}$ acting on the parameter space is
$$(\tau, \beta, \sigma_2^2, \alpha, \sigma_1^2) \to \left(\sqrt{a_2}(\tau + a_0),\ \sqrt{a_2/a_1}\,\beta,\ a_2\sigma_2^2,\ \sqrt{a_1}\,\alpha,\ a_1\sigma_1^2\right).$$

Proposition 11
A maximal invariant statistic under the group of transformations $K$ is the vector of $t$-statistics:
$$T = (T_1, T_2)' = \left( \hat\alpha \Big/ \sqrt{\tfrac{s_1^2}{n-2}(x'x)^{-1}},\ \ \hat\beta \Big/ \sqrt{\tfrac{s_2^2/s_1^2}{n-3}} \right)' .$$
A parameter-space maximal invariant under the induced group $\bar{K}$ is
$$\mu = (\mu_1, \mu_2)' = \left( \alpha \Big/ \sqrt{\sigma_1^2 (x'x)^{-1}},\ \ \beta \Big/ \sqrt{\sigma_2^2/(n\sigma_1^2)} \right)' .$$
The distribution of $(T_1, T_2)'$ depends only on $(\mu_1, \mu_2)'$.

Proof.
The transformations on $\hat\tau$ are transitive, so no invariant test can depend on $\hat\tau$. We will therefore restrict further analysis to the four remaining statistics $(\hat\beta, s_2^2, \hat\alpha, s_1^2)$ and use $k$ and $\bar{k}$ to denote the transformations restricted to these four statistics. Invariance of $(T_1, T_2)$ follows immediately upon substitution. Now $T = (T_1, T_2)$ is a maximal invariant if $T_1(\hat\beta, \hat\alpha, s_1^2, s_2^2) = T_1(\tilde\beta, \tilde\alpha, \tilde{s}_1^2, \tilde{s}_2^2)$ and $T_2(\hat\beta, \hat\alpha, s_1^2, s_2^2) = T_2(\tilde\beta, \tilde\alpha, \tilde{s}_1^2, \tilde{s}_2^2)$ imply that there exists a group element $k$ such that $(\tilde\beta, \tilde\alpha, \tilde{s}_1^2, \tilde{s}_2^2) = k(\hat\beta, \hat\alpha, s_1^2, s_2^2)$:
$$T_1:\ \frac{\hat\alpha}{\sqrt{\tfrac{s_1^2}{n-2}(x'x)^{-1}}} = \frac{\tilde\alpha}{\sqrt{\tfrac{\tilde{s}_1^2}{n-2}(x'x)^{-1}}} \ \Rightarrow\ \tilde\alpha = \frac{\sqrt{\tilde{s}_1^2}}{\sqrt{s_1^2}}\,\hat\alpha = \sqrt{a_1}\,\hat\alpha,$$
$$T_2:\ \frac{\hat\beta}{\sqrt{\tfrac{s_2^2/s_1^2}{n-3}}} = \frac{\tilde\beta}{\sqrt{\tfrac{\tilde{s}_2^2/\tilde{s}_1^2}{n-3}}} \ \Rightarrow\ \tilde\beta = \sqrt{\frac{\tilde{s}_2^2/\tilde{s}_1^2}{s_2^2/s_1^2}}\,\hat\beta = \sqrt{a_2/a_1}\,\hat\beta,$$
and therefore $a_1 = \tilde{s}_1^2/s_1^2$ and $a_2 = \tilde{s}_2^2/s_2^2$. The two values $a_1$ and $a_2$ give the correct transformation for $\hat\alpha$ and $\hat\beta$, and the same holds for $s_1^2$ and $s_2^2$: $\tilde{s}_1^2 = a_1 s_1^2$ and $\tilde{s}_2^2 = a_2 s_2^2$. So there is indeed a group element $k$ such that $(\tilde\beta, \tilde\alpha, \tilde{s}_1^2, \tilde{s}_2^2) = k(\hat\beta, \hat\alpha, s_1^2, s_2^2)$. The same argument applies to the parameter space. The last statement is a well-known property of maximal invariants. □

Note that $(T_1, T_2)$ are the basic $t$-statistics for testing $\alpha = 0$ and $\beta = 0$ when treating the two equations separately and estimating by OLS. The estimated standard error for $\hat\alpha$ is the standard formula $\sqrt{\tfrac{s_1^2}{n-2}(x'x)^{-1}}$ and, using the Frisch-Waugh theorem, the estimated standard error for $\hat\beta$ conditional on $m$ and $x$ is $\sqrt{\tfrac{s_2^2}{n-3}(m'M_x m)^{-1}} = \sqrt{\tfrac{s_2^2/s_1^2}{n-3}}$. These exact invariance results provide a strong justification for restricting attention to the two $t$-statistics for any sample size, finite or asymptotic, since it is natural to restrict the problem to procedures that are scale invariant and do not depend on $\tau$. The testing problem has further symmetries.
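To make the statistics concrete, the following sketch (our illustration, with simulated data and unit error variances; the degrees of freedom $n-2$ and $n-3$ follow the demeaned two-equation model above) computes $\hat\alpha$, $\hat\beta$, $s_1^2$, $s_2^2$ and the two $t$-statistics:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta, tau = 200, 0.5, 0.3, 0.2  # illustrative parameter values

# simulate the mediation model: m-equation and y-equation
x = rng.standard_normal(n)
m = alpha * x + rng.standard_normal(n)            # error variance sigma1^2 = 1
y = tau * x + beta * m + rng.standard_normal(n)   # error variance sigma2^2 = 1

# all variables in deviation from their means, as in the text
x, m, y = x - x.mean(), m - m.mean(), y - y.mean()

xx = x @ x
alpha_hat = (x @ m) / xx
s1sq = m @ m - (x @ m) ** 2 / xx                  # s1^2 = m' M_x m

X = np.column_stack([x, m])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # (tau_hat, beta_hat)
beta_hat = coef[1]
s2sq = y @ y - y @ X @ np.linalg.solve(X.T @ X, X.T @ y)  # s2^2 = y' M_X y

# the two t-statistics (the maximal invariant)
T1 = alpha_hat / np.sqrt(s1sq / (n - 2) / xx)
T2 = beta_hat / np.sqrt(s2sq / (n - 3) / s1sq)
print(T1, T2)
```

These match the usual OLS $t$-ratios from estimating the two equations separately, which is the point of Proposition 11.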
Lemma 8

The problem is invariant to changing the signs (reflections) of $T_1$ and $T_2$, or permuting them. This leads to maximal invariants with a sample and parameter space that is only part of $\mathbb{R}^K$.

Proof. (of Lemma 8) $\{|T|_{(1)}, \ldots, |T|_{(K)}\}$ is obviously invariant to changes in sign and permutation, as a consequence of the absolute values and subsequent sorting. It is a maximal invariant because $\{|T|_{(1)}, \ldots, |T|_{(K)}\} = \{|\tilde{T}|_{(1)}, \ldots, |\tilde{T}|_{(K)}\}$ can only hold for two vectors $T$ and $\tilde{T}$ if $\tilde{T}$ is a permutation of $T$ with a number of sign changes. Hence there will exist a transformation $h = h_1 \cdot h_2 \in G_1 \times G_2$ such that $\tilde{T} = h \cdot T$. The same argument holds for $\{|\mu|_{(1)}, \ldots, |\mu|_{(K)}\}$, since the group of transformations on the parameter space is the same as on the sample space. That the distribution of $\{|T|_{(1)}, \ldots, |T|_{(K)}\}$ depends only on $\{|\mu|_{(1)}, \ldots, |\mu|_{(K)}\}$ is again a property of maximal invariants. Lemma 9 gives an explicit expression that further shows that the distribution is invariant under $G_1 \times G_2$. □

Proof. (of Lemma 9) The absolute value of the normal variate $T_k$ with mean $\mu_k$ and variance 1 follows a noncentral chi-distribution with one degree of freedom. The $K$ distributions $\chi_1\!\left(|t|_{(k)}, |\mu|_{(k)}\right)$ are independent. The result is then a direct application of Vaughan and Venables (1972, eq. 6). □

Appendix D Algorithms

The construction of the optimal $g$-test is in two steps. The first step is a basic implementation of the general varying-$g$ method. This generates a near similar test that deviates less than 0.01 percentage points from 5%. We use this $\epsilon$ as a starting value for determining an upper bound to the power envelope. The second step uses this upper bound to derive an optimal $g$-test that minimizes the distance between the power surface and the power envelope for tests in $\Gamma^M_\epsilon$.

Implementation of the varying-$g$ method: Basic $g$-function algorithm
1. Define $g(\cdot)$ nonparametrically as a linear spline defined by $J + 2$ knots $\{(t^{(j)}, g^{(j)})\}_{j=0}^{J+1}$, i.e. by $J + 2$ values $g^{(j)}$ on a regular grid of points $t^{(j)}$. The first and last knots are fixed at $(0, 0)$ and $(2.5, z_{0.025})$ respectively, so there are $J$ knots to be chosen. For points $t$ not on the grid, $g(t)$ is obtained by linear interpolation, and $g(t) = z_{0.025} \approx 1.96$ for $t > 2.5$.
2. The criterion function $Q(g)$ is the accumulated NRP deviation from 5%, as measured by the asymmetric loss function $q$ over a grid of points $\{\mu_0^{(\iota)}\}_{\iota=1}^{\Upsilon}$ with $\Upsilon > J$ and $\mu_0^{(1)} = 0$:
$$Q(g) = \sum_{\iota=1}^{\Upsilon} q\!\left( NRP_g\!\left(\mu_0^{(\iota)}\right) - 0.05 \right), \qquad q(x) = \begin{cases} -x & : x \leq 0 \\ w\,x & : x > 0 \end{cases}$$
with $w$ a large penalty weight, and where $NRP_g(\mu_0) = P\left[ T \in CR_g \,\big|\, \mu_1 = 0, \mu_2 = \mu_0 \right]$.

3. Minimize $Q(g)$ by varying $g(\cdot)$:

(a) Initialize $g(\cdot)$ with four knots corresponding to the LR boundary. The first and last knots are fixed at $(0, 0)$ and $(2.5, z_{0.025})$, and the middle two are varied when optimizing $Q(g)$.

(b) For given $g(\cdot)$, calculate the NRPs by numerical integration for the grid of $\Upsilon$ noncentrality parameter points $\{(0, \mu_2^{(\iota)})\}_{\iota=1}^{\Upsilon}$ under the null, with $\Upsilon \geq J$, and calculate $Q(g)$.

(c) Vary $g(\cdot)$ by changing the $J$ knots and minimize the criterion function $Q(g)$, subject to:
i. $0 \leq g^{(j+1)} - g^{(j)} < \delta$: monotonicity and limited increase;
ii. $g(t) \leq t$: logical restriction since the maximal invariant is an absolute order statistic;
iii. $g^{(J+1)} = z_{0.025}$: dimensional coherence requires reduction to a one-dimensional solution (see Section 5).

(d) Increase the number of knots $J$ and iterate until convergence.

Comments
1. The grid points $\{t^{(j)}\}_{j=1}^{J}$ are chosen equally spaced between 0 and $t^{(J)}$. The first and last knots, $(0, 0)$ and $(t^{(J+1)}, g^{(J+1)} = z_{0.025})$, remain fixed. For the illustration we have chosen $t^{(J)} = 1.96$ and $t^{(J+1)} = 2.5$. For large enough $|T|_{(2)}$ it is essentially known that $\beta \neq 0$, and rejection depends only on whether $\alpha = 0$ is rejected. The corresponding 5% critical value for $|T|_{(1)}$ based on the normal distribution is the usual $z_{0.025} \approx 1.96$ as $|T|_{(2)} \to \infty$.
2. For $J$ small there are big gains in reducing the deviation from 5% by varying the knots $\{(t^{(j)}, g^{(j)})\}_{j=1}^{J}$, and also by increasing $J$; see Figure 3.

3. The number $\Upsilon$ of $\mu_0^{(\iota)}$ points used to check similarity was chosen to be $\Upsilon = 76 > J$: 60 points equally spaced between 0 and 6, and 16 points equally spaced between 6 and 20. This imposes 152 side conditions. Step 3(c) imposes approximately a further $3J$ restrictions for every choice of $J$, about 100 when $J = 32$.
4. The loss function $q$ was chosen such that it puts a large penalty on positive values of $(NRP - 0.05)$ that violate the size condition. Even extreme penalties still lead to NRPs that are over 5% for some values of $\mu_2$, and this therefore renders an invalid (oversized) test. Even though these NRP transgressions are very minor, we address this issue in the optimal test in Section 4 and use the following algorithm.

Optimal $g$-function

In order to find the optimal $g$, we minimize the sum of differences between $g$'s power surface and the power envelope on a grid of points, subject to the size and $\epsilon$-similarity conditions. The criterion function further includes a roughness penalty on $g$ based on numerical second derivatives $\Delta^2 g(t^{(i)})$, and we impose monotonicity $g(t^{(i+1)}) \geq g(t^{(i)})$ and, since by definition of the absolute order statistic $|T|_{(1)} \leq |T|_{(2)}$, we logically restrict $g$ to $0 \leq g(t) \leq t$.

Optimal $g$-function algorithm
1. Define $g(\cdot)$ nonparametrically as the linear spline defined above.

2. Define the criterion function $Q^*_\epsilon(g)$ as the accumulated power difference over the triangular grid of points $M_1 = \{(\mu_1^{(\gamma,\kappa)}, \mu_2^{(\gamma,\kappa)})\}_{1 \leq \gamma < \kappa \leq \Upsilon_1}$:
$$Q^*_\epsilon(g) = \sum_{\kappa=1}^{\Upsilon_1} \sum_{\gamma \leq \kappa} \left( \bar\pi\!\left(\mu_1^{(\gamma,\kappa)}, \mu_2^{(\gamma,\kappa)}\right) - P\!\left[ CR_g \,\big|\, \left(\mu_1^{(\gamma,\kappa)}, \mu_2^{(\gamma,\kappa)}\right) \right] \right) + \lambda \sum_{i=1}^{J} \left(\Delta^2 g\!\left(t^{(i)}\right)\right)^2 .$$
3. Minimize $Q^*_\epsilon(g)$ by varying $g(\cdot)$:

(a) Start with $g(\cdot)$ equal to the previously determined basic $g$-function.

(b) For given $g(\cdot)$, calculate $Q^*_\epsilon(g)$ by numerical integration.

(c) Vary $g(\cdot)$ by changing the $J$ knots and minimize the criterion function $Q^*_\epsilon(g)$, subject to:
i. $0.05 - \epsilon \leq P\!\left[CR_g \,\big|\, (0, \mu_0^{(\iota)})\right] \leq 0.05$, $\forall \iota = 1, \cdots, \Upsilon$: near similarity and size restrictions;
ii. $0 = g(0) \leq g(t^{(j)}) \leq g(t^{(j+1)}) \leq t^{(j+1)}$: monotonicity;
iii. $g^{(j+1)} - g^{(j)} < t^{(j+1)} - t^{(j)}$: limited increase and derivative;
iv. $g(t) \leq t$: logical restriction since the argument is an absolute order statistic;
v. $g^{(J+1)} = z_{0.025}$: dimensional coherence.

(d) Increase the number of knots $J$ and iterate until convergence.

The regularization parameter $\lambda$ was set to a small fixed value. The basic implementation algorithm solved for the optimal $g$-boundary by minimizing $\epsilon$. Once $\epsilon$ is determined, the current algorithm is akin to solving a dual problem that uses $\epsilon$ for the inequality restrictions and maximizes power. It minimizes the total difference from the power envelope.
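The core computation in both algorithms is the NRP of the region $CR_g = \{|T|_{(1)} > g(|T|_{(2)})\}$ by numerical integration, together with the asymmetric criterion $Q(g)$. The following sketch (our illustration: the two-knot boundary and the penalty weight $w$ are arbitrary choices, not the paper's optimal $g$) evaluates both by two-dimensional quadrature:

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

z = 1.959963984540054  # z_{0.025}

def g(t, knots_t, knots_g):
    # piecewise-linear boundary; np.interp holds the last knot value beyond it
    return np.interp(t, knots_t, knots_g)

def nrp(mu2, knots_t, knots_g, grid=2001, tmax=12.0):
    """P[ |T|_(1) > g(|T|_(2)) ] at (mu1, mu2) = (0, mu2) by 2-D quadrature."""
    t1 = np.linspace(-tmax, tmax, grid)
    t2 = np.linspace(mu2 - tmax, mu2 + tmax, grid)
    f1 = stats.norm.pdf(t1)
    f2 = stats.norm.pdf(t2, loc=mu2)
    a1, a2 = np.meshgrid(np.abs(t1), np.abs(t2), indexing="ij")
    lo, hi = np.minimum(a1, a2), np.maximum(a1, a2)
    reject = lo > g(hi, knots_t, knots_g)
    inner = trapezoid(reject * f2, t2, axis=1)   # integrate out t2
    return trapezoid(inner * f1, t1)             # then t1

def Q(knots_t, knots_g, mu_grid, w=100.0):
    # asymmetric loss: undershoot counts as -x, overshoot penalized by w*x
    dev = np.array([nrp(m, knots_t, knots_g) for m in mu_grid]) - 0.05
    return np.sum(np.where(dev > 0, w * dev, -dev))

kt, kg = [0.0, 2.5], [0.0, z]   # straight line to (2.5, z), constant z after
for m2 in (0.0, 2.0, 6.0):
    print(m2, round(nrp(m2, kt, kg), 4))
print("Q =", Q(kt, kg, np.linspace(0.0, 8.0, 9)))
```

An optimizer would now vary the interior knot values to drive $Q$ down, exactly as in step 3(c); this naive straight-line boundary is badly oversized near the origin, which is why the knots must be optimized.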
Power Envelopes
We calculate two power envelopes: one for near similar tests in $\Gamma_\epsilon$ and a second for nonsimilar tests. The algorithm for calculating the power envelope is related to Chiburis (2009) and implemented in Julia, see Bezanson et al. (2017), using Gurobi, an optimization package that can handle many side restrictions; see Gurobi Optimization (2019). We maximize power subject to size and near similarity restrictions on a grid of $\Upsilon_0$ parameter points under the null: $0.05 - \epsilon \leq NRP(\mu_0^{(\iota)}) \leq 0.05$ for $\iota = 1, \ldots, \Upsilon_0$. The upper bounds ensure correct size, at least for the points considered. The lower bounds constitute the near similarity restriction. The power envelope is obtained by repeating this maximization on a grid of points $(\mu_1, \mu_2)$ under the alternative.

For the nonsimilar power envelope we can discard the lower bound restrictions $0.05 - \epsilon \leq NRP(\mu_0^{(\iota)})$. The power can only increase (or remain the same), and the difference between the two power envelopes is the power loss one suffers from insisting on similarity. This turns out to be less than 2 percentage points, and it should be stressed that this overstates the loss since no single test achieves the power envelope.

Denote the parameter space for the ordered absolute noncentrality parameters as
$$\Xi = \left\{ (\mu_1, \mu_2) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid 0 \leq \mu_1 \leq \mu_2 \right\}.$$
We will use a bounded (triangular) subset of this octant $\Xi$ defined as
$$\Xi_B = \left\{ (\mu_1, \mu_2) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid 0 \leq \mu_1 \leq \mu_2 \leq \mu_{\max} \right\}$$
and partition it into a null and an alternative parameter set,
$$\Xi_0 = \left\{ (\mu_1, \mu_2) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid \mu_1 = 0 \leq \mu_2 \leq \mu_{\max} \right\} \quad \text{and} \quad \Xi_1 = \left\{ (\mu_1, \mu_2) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid 0 < \mu_1 \leq \mu_2 \leq \mu_{\max} \right\}$$
respectively. Analogously, define the sample space of the maximal invariant/absolute order statistic as $\mathcal{T} = \{ (t_1, t_2) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid t_1 \leq t_2 \}$. Very large values of $t_1$ and $t_2$ are of limited interest, and for computational purposes we can restrict ourselves to a bounded triangular subset of the sample space:
$$\mathcal{T}_B = \left\{ (t_1, t_2) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid t_1 \leq t_2 \leq t_{\max} \right\}.$$

Power Envelope Algorithm
1. Discretize $\Xi_0$ into $\Upsilon_0$ points under $H_0$: $M_0 = \{(0, \mu_0^{(\iota)})\}_{\iota=1}^{\Upsilon_0}$.
2. Discretize $\Xi_1$ by choosing a triangular array of $\tfrac{1}{2}\Upsilon_1(1 + \Upsilon_1)$ points under $H_1$: $M_1 = \{(\mu_1^{(\gamma,\kappa)}, \mu_2^{(\gamma,\kappa)})\}_{1 \leq \gamma \leq \kappa \leq \Upsilon_1}$.
3. Partition $\mathcal{T}_B$ into squares $s_{ij}$ with $1 \leq i \leq j \leq J$ such that $\bigcup_{1 \leq i \leq j \leq J} s_{ij} = \mathcal{T}_B$ and $s_{ij} \cap s_{kl} = \emptyset$ for all $(i, j) \neq (k, l)$.
4. Under $H_0$, for $\iota = 1, \cdots, \Upsilon_0$ calculate $p_{ij}^{\iota} = P\left[ (|T|_{(1)}, |T|_{(2)}) \in s_{ij} \,\big|\, (0, \mu_0^{(\iota)}) \in \Xi_0 \right]$.
5. For each $1 \leq \gamma \leq \kappa \leq \Upsilon_1$ choose $\mu^{(\gamma,\kappa)} = (\mu_1^{(\gamma,\kappa)}, \mu_2^{(\gamma,\kappa)}) \in M_1$ under the alternative. For this $\mu$:

(a) Calculate $p_{ij}^{\gamma\kappa} = P\left[ (|T|_{(1)}, |T|_{(2)}) \in s_{ij} \,\big|\, \mu^{(\gamma,\kappa)} = (\mu_1^{(\gamma,\kappa)}, \mu_2^{(\gamma,\kappa)}) \right]$ for each $s_{ij} \in \mathcal{T}_B$.

(b) Determine the critical region to maximize the power
$$\max_{\{\phi_{ij}^{\gamma\kappa},\ 1 \leq i \leq j \leq J\}} \ \sum_{1 \leq i \leq j \leq J} p_{ij}^{\gamma\kappa}\, \phi_{ij}^{\gamma\kappa}$$
by selecting indicators $\phi_{ij}^{\gamma\kappa} = CR(s_{ij})$, equal to 1 if $s_{ij}$ is part of the critical region and 0 if part of the acceptance region, subject to the near similarity and size restrictions on the NRPs:
$$0.05 - \epsilon \ \leq \sum_{1 \leq i \leq j \leq J} p_{ij}^{\iota}\, \phi_{ij}^{\gamma\kappa} \ \leq\ 0.05 \quad \text{for } \iota = 1, \ldots, \Upsilon_0 .$$

Comments. Optimizer: Gurobi; $t_{\max} = 11$ and the squares $s_{ij}$ have equal side lengths, giving cardinality $|\mathcal{T}_B| = 285150$. For power calculations we use for $M_1$ a regular grid of points $(\mu_1, \mu_2)$ with $\mu_1 \leq \mu_2$. For the size and near similarity restrictions we use a regular grid of $\mu_0$ values, and for near similarity $\epsilon = 10^{-4}$.

Appendix E g-Function R Code

R Code

g <- function(y) {
  SplinePredict <- function(x, BP, coef) {
    if (x >= BP[2]) { idx <- 2 } else { idx <- 1 }
    h <- x - BP[idx]
    return(coef[idx, 1] + coef[idx, 2]*h + coef[idx, 3]*h^2 + coef[idx, 4]*h^3)
  }
  z0025 <- 1.9599639845400542355
  BP <- c(0.05, 0.16, 0.2, 1.2, 1.3, 1.5, 2.0, 2.1, 2.2)
  w <- c(0.05, 0.12375880123418455, 0.15698848058671158,
         1.1569884851127599, 1.247794136737431, 1.3715940041517778,
         1.8715940058845608, 1.9440175462611924, z0025)
  coef1 <- rbind(c(0.05, 1.0, -6.0947848621912755, 28.178586075656632),
                 c(0.12375880123418455, 0.6820300048642552,
                   3.204148542775413, 12.841273273689932))
  coef2 <- rbind(c(1.1569884851127599, 1.0, 0.06613363957554252, -9.85568477108435),
                 c(1.247794136737431, 0.7175561847825775,
                   -2.8905717917497653, 11.988937765977742))
  coef3 <- rbind(c(1.8715940058845608, 1.0, -2.4006862861725504, -3.5695967616429662),
                 c(1.9440175462611924, 0.41277483991620023,
                   -3.4715653146654413, 9.38460743389627))
  x <- abs(y)
  if (x <= BP[1]) {
    return(x)
  } else if (x < BP[3]) {
    return(SplinePredict(x, BP[1:3], coef1))
  } else if (x <= BP[4]) {
    return(w[3] + (x - BP[3])/(BP[4] - BP[3])*(w[4] - w[3]))
  } else if (x < BP[6]) {
    return(SplinePredict(x, BP[4:6], coef2))
  } else if (x <= BP[7]) {
    return(w[6] + (x - BP[6])/(BP[7] - BP[6])*(w[7] - w[6]))
  } else if (x < BP[9]) {
    return(SplinePredict(x, BP[7:9], coef3))
  } else {
    return(z0025)
  }
}

t^(i-1)  t^(i)  piece     a_i       b_i        c_i        d_i
0        0.05   l(t)      0.000000  1.000000
0.05     0.16   s(t)      0.050000  1.000000  -6.094785  28.178586
0.16     0.20   s(t)      0.123759  0.682030   3.204149  12.841273
0.20     1.20   l(t)      0.156988  1.000000
1.20     1.30   s(t)      1.156988  1.000000   0.066134  -9.855685
1.30     1.50   s(t)      1.247794  0.717556  -2.890572  11.988938
1.50     2.00   l(t)      1.371590  1.000000
2.00     2.10   s(t)      1.871594  1.000000  -2.400686  -3.569597
2.10     2.20   s(t)      1.944018  0.412775  -3.471565   9.384607
2.20     ∞      constant  z_{0.025}

Table 4: The optimal $g$ function in spline representation. Coefficients of the linear splines $l_i(t)$ and clamped cubic splines $s_i(t)$: $l_i(t) = a_i + b_i(t - t^{(i-1)})$ and $s_i(t) = a_i + b_i(t - t^{(i-1)}) + c_i(t - t^{(i-1)})^2 + d_i(t - t^{(i-1)})^3$.

References
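The cell-selection step 5(b) of the Power Envelope Algorithm in Appendix D becomes a linear program once the indicators are relaxed to $\phi_{ij} \in [0, 1]$ (randomized tests). The following minimal sketch (ours, not the paper's implementation) uses scipy.optimize.linprog in place of Gurobi, with a much coarser grid and a looser $\epsilon$ than the paper's $10^{-4}$:

```python
import numpy as np
from scipy import stats
from scipy.optimize import linprog

eps, size = 1e-3, 0.05             # eps loosened for this coarse illustrative grid
edges = np.linspace(0.0, 8.0, 81)  # cell edges on [0, t_max] with t_max = 8 here

def cell_probs(mu1, mu2):
    """P[(|T|_(1), |T|_(2)) in s_ij] for all cells i <= j, as a flat vector."""
    def p(mu, a, b):  # P(a <= |N(mu,1)| <= b): folded-normal interval probability
        return (stats.norm.cdf(b - mu) - stats.norm.cdf(a - mu)
                + stats.norm.cdf(b + mu) - stats.norm.cdf(a + mu))
    p1 = p(mu1, edges[:-1], edges[1:])
    p2 = p(mu2, edges[:-1], edges[1:])
    J = np.outer(p1, p2)              # P(|T1| in bin i, |T2| in bin j)
    P = np.triu(J.T) + np.triu(J, 1)  # fold both orderings onto cells i <= j
    return P[np.triu_indices_from(P)]

def envelope_power(mu_alt, mu0_grid):
    """Maximal power at mu_alt over randomized tests phi_ij in [0, 1],
    subject to 0.05 - eps <= NRP <= 0.05 on the null grid."""
    c = -cell_probs(*mu_alt)          # maximize power <=> minimize -power
    A_ub, b_ub = [], []
    for m2 in mu0_grid:
        q = cell_probs(0.0, m2)
        A_ub.append(q);  b_ub.append(size)        # NRP <= 0.05
        A_ub.append(-q); b_ub.append(eps - size)  # NRP >= 0.05 - eps
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=(0.0, 1.0), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun

print(envelope_power((2.0, 3.0), np.linspace(0.0, 6.0, 13)))
```

Repeating the maximization over a grid of alternatives $(\mu_1, \mu_2)$ traces out the (near similar) power envelope; dropping the lower-bound rows yields the nonsimilar envelope, as described in Appendix D.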
Alwin, D. F. and R. M. Hauser (1975): “The decomposition of effects in path analysis,” American Sociological Review, 37–47.

Andrews, D. W. K. (2012): “Similar-on-the-boundary tests for moment inequalities exist, but have poor power,” Tech. rep., Cowles Foundation Discussion Paper.

Andrews, D. W. K., M. J. Moreira, and J. H. Stock (2006): “Optimal two-sided invariant similar tests for instrumental variables regression,” Econometrica, 74, 715–752.

——— (2008): “Efficient two-sided nonsimilar invariant tests in IV regression with weak instruments,” Journal of Econometrics, 146, 241–254.

Andrews, D. W. K. and W. Ploberger (1994): “Optimal tests when a nuisance parameter is present only under the alternative,” Econometrica, 1383–1414.
Baron, R. M. and D. A. Kenny (1986): “The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations,” Journal of Personality and Social Psychology, 51, 1173.

Bezanson, J., A. Edelman, S. Karpinski, and V. B. Shah (2017): “Julia: A fresh approach to numerical computing,” SIAM Review, 59, 65–98.

Bollen, K. and R. Stine (1990): “Direct and indirect effects: Classical and bootstrap estimates of variability,” Sociological Methodology, 115–140.

Bollen, K. A. (1989): Structural Equations with Latent Variables, John Wiley & Sons.
Chiburis, R. C. (2009): “Approximately most powerful tests for moment inequalities,” in Essays on Treatment Effects and Moment Inequalities, Ph.D. thesis, Department of Economics, Princeton University, chap. 3.

Coletti, A. L., K. L. Sedatole, and K. L. Towry (2005): “The effect of control systems on trust and cooperation in collaborative environments,” The Accounting Review, 80, 477–500.

Craig, C. C. (1936): “On the frequency function of xy,” The Annals of Mathematical Statistics, 7, 1–15.

Drton, M. and H. Xiao (2016): “Wald tests of singular hypotheses,” Bernoulli, 22, 38–59.

Dufour, J.-M., E. Renault, and V. Zinde-Walsh (2017): “Wald tests when restrictions are locally singular,” Tech. rep., arxiv.org/abs/1312.0569v1.
Elliott, G., U. K. Müller, and M. W. Watson (2015): “Nearly optimal tests when a nuisance parameter is present under the null hypothesis,” Econometrica, 83, 771–811.

Freedman, L. S. and A. Schatzkin (1992): “Sample size for studying intermediate endpoints within intervention trials or observational studies,” American Journal of Epidemiology, 136, 1148–1159.

Glonek, G. F. V. (1993): “On the behaviour of Wald statistics for the disjunction of two regular hypotheses,” Journal of the Royal Statistical Society: Series B (Methodological), 55, 749–755.

Guggenberger, P., F. Kleibergen, and S. Mavroeidis (2019): “A more powerful subvector Anderson Rubin test in linear instrumental variables regression,” Quantitative Economics, 10, 487–526.

Gurobi Optimization, L. (2019): “Gurobi Optimizer Reference Manual.”
Heckman, J. and R. Pinto (2015a): “Causal analysis after Haavelmo,” Econometric Theory, 31, 115–151.

——— (2015b): “Econometric mediation analyses: Identifying the sources of treatment effects from experimentally estimated production technologies with unmeasured and mismeasured inputs,” Econometric Reviews, 34, 6–31.

Hillier, G. H. (2019): Personal communication.

Lehmann, E. L. and J. P. Romano (2005): Testing Statistical Hypotheses, Springer Science & Business Media.

MacKenzie, S. B., R. J. Lutz, and G. E. Belch (1986): “The role of attitude toward the ad as a mediator of advertising effectiveness: A test of competing explanations,” Journal of Marketing Research, 23, 130–143.
MacKinnon, D. P., C. M. Lockwood, J. M. Hoffman, S. G. West, and V. Sheets (2002): “A comparison of methods to test mediation and other intervening variable effects,” Psychological Methods, 7, 83.

MacKinnon, D. P., C. M. Lockwood, and J. Williams (2004): “Confidence limits for the indirect effect: Distribution of the product and resampling methods,” Multivariate Behavioral Research, 39, 99–128.

McDonald, J. A. and D. A. Clelland (1984): “Textile workers and union sentiment,” Social Forces, 63, 502–521.

Moreira, M. J. and R. Mourão (2016): “A critical value function approach, with an application to persistent time-series,” arXiv preprint arXiv:1606.03496.

Perlman, M. D. and L. Wu (1999): “The emperor's new tests,” Statistical Science, 14, 355–369.
Preacher, K. J. and A. F. Hayes (2008): “Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models,” Behavior Research Methods, 40, 879–891.

Sobel, M. E. (1982): “Asymptotic confidence intervals for indirect effects in structural equation models,” Sociological Methodology, 13, 290–312.

van Garderen, K. J. (1997): “Curved exponential models in econometrics,” Econometric Theory, 13, 771–790.

van Giersbergen, N. P. A. (2014): “Inference about the indirect effect: A likelihood approach,” Tech. rep., Universiteit van Amsterdam, UvA-Econometrics Discussion Papers 2014/10.

Vaughan, R. J. and W. N. Venables (1972): “Permanent expressions for order statistic densities,”