A Test for Kronecker Product Structure Covariance Matrix
Patrik Guggenberger∗ (Department of Economics, Pennsylvania State University)
Frank Kleibergen (Amsterdam School of Economics, University of Amsterdam)
Sophocles Mavroeidis (Department of Economics, University of Oxford)

First Version: October 2019. Revised: October 22, 2020.
Abstract
We propose a test that a covariance matrix has Kronecker Product Structure (KPS). KPS implies a reduced rank restriction on an invertible transformation of the covariance matrix, and the new procedure is an adaptation of the Kleibergen and Paap (2006) reduced rank test. KPS is a generalization of homoskedasticity and allows for more powerful subvector inference in linear Instrumental Variables (IV) regressions than can be achieved under general covariance matrices. Re-examining sixteen highly cited papers conducting IV regressions, we find that KPS is not rejected in 24 of 30 specifications for moderate sample sizes at the 5% nominal size.

Keywords: covariance matrix, heteroskedasticity, Kronecker product structure, linear instrumental variables regression model, reduced rank, weak identification

JEL codes: C12, C26
∗ Guggenberger gratefully acknowledges the hospitality of the EUI in Florence while parts of the paper were drafted. Mavroeidis gratefully acknowledges the research support of the European Research Council via Consolidator grant number 647152. We would like to thank Lewis McLean for research assistance.

Introduction

The robustness properties provided by nonparametric covariance matrix estimators, like those proposed by White (1980) and the heteroskedasticity and autocorrelation consistent (HAC) robust ones by, for example, Newey and West (1987) and Andrews (1991), have enabled the current default of robust inference in empirical work.

This adaptation especially concerns the degrees of freedom of the χ² limiting distribution of the test statistic, which is not directly obvious. (Another adaptation of the KP reduced-rank statistic is by Donald et al. (2007), who develop a test for singularity, or reduced rank, of a symmetric matrix.) We apply the new KPS test to the different specifications of linear IV models employed in sixteen highly cited empirical studies published in top-ranked economics journals. We find that for the specifications with moderate numbers of observations KPS is not rejected in 24 of 30 cases at the
5% significance level, while for smaller numbers of observations it is rejected in 14 of 28 cases. The relatively high number of nonrejections illustrates the importance of the KPS test for applied work. In related work, we show how the KPS test can be used as a pre-test for weak-identification-robust subset tests while preserving the overall size, see Guggenberger et al. (2020).

KPS, or separability, which is how other fields sometimes refer to KPS, of the covariance matrix is also studied in the statistics and signal processing literature. The distance to a covariance matrix with KPS is considered in Genton (2007) and Velu and Herman (2017), while Lu and Zimmermann (2005) and Mitchell et al. (2006) analyze the likelihood ratio test of KPS of the covariance matrix of Normally distributed data. They estimate the elements of the KPS covariance matrix using a switching algorithm. Exploiting the reduced rank restriction imposed on the reordered covariance matrix by KPS is also done in Werner et al. (2008). Their results are, however, based on a complex Gaussian distribution for the data, which leads to a degrees of freedom parameter of the χ² limiting distribution of their test that is not comparable to the one derived here. KPS is an example of dimension reduction of a covariance matrix. Other examples result from shrinking the covariance matrix to one with (much) fewer unrestricted elements to estimate, for example, (a scalar multiple of) the identity matrix, see e.g. Ledoit and Wolf (2012), or by shrinking the population eigenvalues, see e.g. Ledoit and Wolf (2015, 2018).

The paper is organized as follows. In the second section, we introduce the new test for a KPS covariance matrix. The third and fourth sections conduct simulation studies to analyze the size and power of the new KPS test. The fifth section summarizes the extensive analysis of testing for a KPS reduced-form covariance matrix in a considerable number of prominent articles.
The final sixth section concludes. Proofs and detailed empirical results are given in the Appendix at the end.

We use the vec operator of the matrix A, so vec(A) = (a_1' ... a_k')' ∈ R^{mk} for an m × k dimensional matrix A = (a_1 ... a_k).

The test for a KPS covariance matrix

We propose a test for Kronecker product structure (KPS) of a covariance matrix R_n ∈ R^{kp×kp}. The covariance matrix results from a sample of n observations,

    R_n := E( (1/n) Σ_{i=1}^n f_i f_i' ),    (1)

for mean zero, independently distributed random vectors f_i ∈ R^{kp}, i = 1, ..., n, which are the Kronecker product of two uncorrelated random vectors, so f_i = (V_i ⊗ Z_i) with V_i ∈ R^p and Z_i ∈ R^k uncorrelated random vectors. The specification of f_i fits, for example, a setting where V_i contains the errors of a number of regression equations and Z_i contains the regressors, so that R_n is then the covariance matrix of the sample covariance between these errors and the regressors. More general specifications of f_i are also feasible but, because these are of less interest for our purpose, we do not consider them. An index n has been added to the covariance matrix to allow for the (average) covariance matrix to evolve with the sample size.

The covariance matrix has a block structure

    R_n = [ R_11 ··· R_1p ]
          [  ⋮    ⋱    ⋮  ]    (2)
          [ R_p1 ··· R_pp ],

where the R_jl ∈ R^{k×k} are symmetric matrices, j, l = 1, ..., p, since R_jl = E( (1/n) Σ_{i=1}^n V_ij V_il Z_i Z_i' ) = R_lj', for V_i = (V_i1 ... V_ip)', is symmetric. We are interested in testing if the covariance matrix R_n has KPS:

    H_0 : R_n = G_1 ⊗ G_2,    (3)

with G_1 ∈ R^{p×p} and G_2 ∈ R^{k×k} symmetric positive semi-definite matrices of which one has a diagonal element equal to one (say the upper left element of G_1), against the alternative hypothesis of not having KPS.
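To illustrate the setting, a small numpy sketch (not from the paper; the dimensions, G_1, G_2 and the sample size are arbitrary illustrative choices): when V_i and Z_i are independent with second-moment matrices G_1 and G_2, the covariance matrix of f_i = V_i ⊗ Z_i is exactly G_1 ⊗ G_2, i.e. R_n has KPS under H_0.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 2, 3, 200_000

# Independent V_i and Z_i with known second moments G1 and G2.
G1 = np.array([[1.0, 0.3], [0.3, 2.0]])          # E[V V'], upper-left element normalized to 1
G2 = np.array([[1.0, 0.2, 0.0],
               [0.2, 1.5, 0.4],
               [0.0, 0.4, 0.8]])                 # E[Z Z']
V = rng.multivariate_normal(np.zeros(p), G1, size=n)
Z = rng.multivariate_normal(np.zeros(k), G2, size=n)

# f_i = V_i ⊗ Z_i, stacked row-wise; R_hat = (1/n) sum_i f_i f_i'.
F = np.einsum('ij,il->ijl', V, Z).reshape(n, p * k)
R_hat = F.T @ F / n

# With independent V and Z, the population covariance of f_i is exactly G1 ⊗ G2.
print(np.abs(R_hat - np.kron(G1, G2)).max())
```

The maximum entry-wise deviation shrinks at the usual 1/√n Monte Carlo rate.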
To measure the distance of the sample covariance matrix estimator below from a KPS covariance matrix, we use a convenient (invertible) transformation proposed by Van Loan and Pitsianis (1993). For a matrix A ∈ R^{kp×kp} with block structure as in (2), define

    R(A) := (A_1' ... A_p')' ∈ R^{p²×k²},  with  A_j := (vec(A_1j) ... vec(A_pj))' ∈ R^{p×k²},    (4)

for j = 1, ..., p. One can easily show that

    R(G_1 ⊗ G_2) = vec(G_1) vec(G_2)',    (5)

and by Theorem 2.1 in Van Loan and Pitsianis (1993) we have ||R̂_n − G_1 ⊗ G_2||_F = ||R(R̂_n) − vec(G_1)vec(G_2)'||_F, with ||·||_F the Frobenius or trace norm of a matrix, ||A||²_F := tr(A'A) = vec(A)'vec(A), for any rectangular matrix A. Because R(G_1 ⊗ G_2) is a matrix of rank one, this provides an easier hypothesis to test for compared to directly testing for KPS of the untransformed covariance matrix.

Consider the covariance matrix estimator

    R̂_n := (1/n) Σ_{i=1}^n f̂_i f̂_i' ∈ R^{kp×kp},    (6)

which uses the sample values, f̂_i, of the random vectors f_i, which are assumed to converge to f_i, f̂_i = f_i + o_p(1). Define the distance from a KPS covariance matrix by the Frobenius norm:

    DS := min_{G_1>0, G_2>0} ||R(R̂_n) − vec(G_1)vec(G_2)'||_F.    (7)

Theorem 1. The distance measure DS in (7) equals the square root of the sum of squares of all but the largest singular value of R(R̂_n) ∈ R^{p²×k²}, i.e. DS = ( Σ_{i=2}^{min(p²,k²)} σ̂_i² )^{1/2}, where σ̂_1 ≥ ... ≥ σ̂_{min(p²,k²)} are the singular values of R(R̂_n).

Proof. See the Appendix.

We use the distance between R(R̂_n) and a matrix of rank one to test for KPS of R_n. There are pk(p+1)(k+1)/4 unique elements in R̂_n.
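A sketch of the rearrangement and of Theorem 1 in numpy (illustrative dimensions; the block ordering follows (4), and the claims are checked on an arbitrary symmetric matrix):

```python
import numpy as np

def rearrange(A, p, k):
    # R(A): stack vec(A_ij)' for the k x k blocks A_ij of A, ordered as in (4).
    blocks = [A[i*k:(i+1)*k, j*k:(j+1)*k] for j in range(p) for i in range(p)]
    return np.array([b.flatten(order='F') for b in blocks])   # p^2 x k^2

rng = np.random.default_rng(1)
p, k = 2, 3
G1 = np.eye(p) + 0.3                      # arbitrary symmetric matrices
G2 = np.eye(k) + 0.1

# Identity (5): R(G1 ⊗ G2) = vec(G1) vec(G2)'.
rhs = np.outer(G1.flatten(order='F'), G2.flatten(order='F'))
assert np.allclose(rearrange(np.kron(G1, G2), p, k), rhs)

# Van Loan-Pitsianis: the rearrangement preserves the Frobenius distance.
B = rng.standard_normal((k*p, k*p)); B = (B + B.T) / 2
assert np.isclose(np.linalg.norm(B - np.kron(G1, G2)),
                  np.linalg.norm(rearrange(B, p, k) - rhs))

# Theorem 1: DS is the root of the sum of all but the largest squared singular
# value, i.e. the distance from R(B) to its best rank-one (SVD) approximation.
L, s, Nt = np.linalg.svd(rearrange(B, p, k))
DS = np.sqrt((s[1:]**2).sum())
assert np.isclose(DS, np.linalg.norm(rearrange(B, p, k) - s[0]*np.outer(L[:, 0], Nt[0])))
print(DS)
```

The rearrangement is a pure permutation of entries, which is why the Frobenius-norm identity holds exactly.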
Let r̂_n be the vector that collects these unique elements. (The matrix R_n consists of p(p+1)/2 different symmetric blocks; each of these blocks has k(k+1)/2 different elements, so the number of unique elements in R_n is pk(p+1)(k+1)/4.) Under mild conditions on f_i (like, for example, mean zero and independently distributed with finite eighth moments), r̂_n satisfies a central limit theorem:

    √n (r̂_n − r_n) →_d N(0, V_{r_n}),    (8)

where V_{r_n} denotes the pk(p+1)(k+1)/4 × pk(p+1)(k+1)/4 dimensional covariance matrix and r_n the population value. We test the significance of the distance between R(R̂_n) and a rank one matrix using the KP rank statistic. To describe the KP rank statistic, consider first a singular value decomposition (SVD) of R(R̂_n):

    R(R̂_n) = L̂ Σ̂ N̂',    (9)

where Σ̂ := diag(σ̂_1 ... σ̂_{min(p²,k²)}) denotes a p² × k² dimensional diagonal matrix with the singular values σ̂_j (j = 1, ..., min(p², k²)) on the main diagonal ordered non-increasingly, and with L̂ ∈ R^{p²×p²} and N̂ ∈ R^{k²×k²} orthonormal matrices. Decompose

    L̂ = [ L̂_11 L̂_12 ],  Σ̂ = [ σ̂_1  0  ],  N̂ = [ N̂_11 N̂_12 ],    (10)
        [ L̂_21 L̂_22 ]        [  0  Σ̂_2 ]        [ N̂_21 N̂_22 ]

with L̂_11 : 1×1, L̂_12 : 1×(p²−1), L̂_21 : (p²−1)×1, L̂_22 : (p²−1)×(p²−1), σ̂_1 : 1×1, Σ̂_2 : (p²−1)×(k²−1), N̂_11 : 1×1, N̂_12 : 1×(k²−1), N̂_21 : (k²−1)×1, N̂_22 : (k²−1)×(k²−1). For Ĝ_1 and Ĝ_2 connected to the largest singular value, we have:

    vec(Ĝ_1)   := (L̂_11' ⋮ L̂_21')' / L̂_11 : p² × 1,
    vec(Ĝ_1)_⊥ := (L̂_12' ⋮ L̂_22')' L̂_22^{-1} (L̂_22 L̂_22')^{1/2} : p² × (p²−1),
    vec(Ĝ_2)'  := L̂_11 σ̂_1 (N̂_11' ⋮ N̂_21')'' = L̂_11 σ̂_1 N̂_1' : 1 × k²,
    vec(Ĝ_2)'_⊥ := (N̂_22 N̂_22')^{1/2} (N̂_22')^{-1} (N̂_12' ⋮ N̂_22')'' : (k²−1) × k²,    (11)

where N̂_1 denotes the first column of N̂. (If A is a positive semi-definite symmetric matrix, A = E L² E', where L is a diagonal matrix containing the square roots of the eigenvalues of A and E is a matrix that contains the orthonormal eigenvectors of A, define A^{1/2} := E L E' and A^{-1/2} := E_1 L_1^{-1} E_1', where L_1 is a diagonal matrix containing the square roots of the non-zero eigenvalues of A and E_1 consists of the corresponding eigenvectors. A^{-} denotes the generalized inverse of a matrix A.) Define

    Λ̂ := (L̂_22 L̂_22')^{-1/2} L̂_22 Σ̂_2 N̂_22' (N̂_22 N̂_22')^{-1/2} : (p²−1) × (k²−1).    (12)

It can be shown that Λ̂ = vec(Ĝ_1)'_⊥ R(R̂_n) vec(Ĝ_2)_⊥, see Kleibergen and Paap (2006). Theorems 5.3–5.6 from Van Loan and Pitsianis (1993) show that if R̂_n is a symmetric positive semi-definite matrix then so are Ĝ_1 and Ĝ_2. We then have

    R(R̂_n) = vec(Ĝ_1) vec(Ĝ_2)' + vec(Ĝ_1)_⊥ Λ̂ vec(Ĝ_2)'_⊥.    (13)

The KP rank test statistic is a quadratic form of the vectorization of Λ̂.
Its specification directly extends to the new KPS test, but determining the degrees of freedom parameter for its χ² limiting distribution is not straightforward. We define the statistic KPST for testing H_0 in (3) as

    KPST := n × vec(Λ̂)' ( Ĵ' V̂ Ĵ )^{-} vec(Λ̂),  where
    Ĵ := ( [vec(Ĝ_2)]_⊥ ⊗ [vec(Ĝ_1)]_⊥ ),  V̂ := ĉov(vec(R(R̂_n))) ∈ R^{p²k² × p²k²},    (14)

and ĉov(vec(R(R̂_n))) is a re-arrangement of

    ĉov(vec(R̂_n)) = (1/n) Σ_{i=1}^n ξ̂_i ξ̂_i',  ξ̂_i := f̂_i ⊗ f̂_i − vec(R̂_n),

which conforms with the re-arrangement of R̂_n to R(R̂_n).

Theorem 2. a. For mean zero, independently distributed random vectors f_i ∈ R^{kp} with finite eighth moments, KPST defined in (14) has a χ²_a limiting distribution as n → ∞ with

    a := pk(p+1)(k+1)/4 − p(p+1)/2 − k(k+1)/2 + 1    (15)

degrees of freedom.
b. Under joint limit sequences of p, k and n, alongside the conditions in part a, a sufficient condition for the limiting distribution of KPST to be unaffected is for these sequences to be such that

    (pk)^4 / n → 0.    (16)

Proof. See the Appendix.

The new KPST test rejects H_0 in (3) at nominal size α if KPST > χ²_{a,1−α}, where χ²_{a,1−α} denotes the 1−α quantile of a χ²_a distribution.

The covariance matrix estimator ĉov(vec(R(R̂_n))) results from the sample covariance matrix of r̂_n, V̂_{r_n}. Since R(R̂_n) has more elements than r̂_n, ĉov(vec(R(R̂_n))) is a singular matrix. The proof of Theorem 2a shows how this singularity is accounted for in the degrees of freedom parameter a of the limiting distribution of KPST.

Theorem 2b provides a sufficient condition for uniform convergence of Λ̂ and its covariance matrix estimator for settings where p, k, and n jointly go to infinity, so that the limiting distribution of KPST remains unaltered. It is needed to assess the validity of the asymptotic approximation for settings where p and k are relatively large compared to the number of observations n.
For such settings, it would then allow for the use of a studentized version of KPST to avoid using critical values that result from a χ² distribution with a very large degrees of freedom parameter.

The conditions in Theorem 2b are slightly less strict than those in Newey and Windmeijer (2009). They prove the validity of the asymptotic approximation of test statistics when the number of observations grows faster than the cube of the number of moment restrictions. The number of moment restrictions here is proportional to (pk)², so their rate would be (pk)^6/n → 0.

Clustered data
In case of clustered data, we assume there are n clusters of N_i observations each, so the total number of data points is Σ_{i=1}^n N_i:

    f_i = Σ_{j=1}^{N_i} f_ij,    (17)

for mean zero kp dimensional random vectors f_ij, j = 1, ..., N_i, i = 1, ..., n. Observations f_ij within cluster i can be arbitrarily dependent, i.e., E(f_ij f_is') is unrestricted for all j, s = 1, ..., N_i, while observations across clusters are independent. The kp × kp dimensional (positive semi-definite) covariance matrix of the sample moments then results as:

    R_n = (1/n) Σ_{i=1}^n E(f_i f_i').    (18)

Size simulation

We evaluate the accuracy of the limiting distribution in Theorem 2 to set critical values for testing for KPS. We do so in a small simulation experiment using the linear regression model:

    Y_i = Z_i' Π + V_i,  i = 1, ..., n,    (19)

where Y_i is a p dimensional vector with dependent variables, Z_i is a k dimensional vector with explanatory (exogenous) variables and V_i is a p dimensional vector with errors. The test statistic results from the moment vector

    f̂_i = C_2' V̂_i ⊗ C_1' Z_i,    (20)

where C_1 and C_2 are used for normalization, see also Kleibergen and Paap (2006). For example, C_2 C_2' = ( (1/n) Σ_{i=1}^n V̂_i V̂_i' )^{-1} and C_1 C_1' = ( (1/n) Σ_{i=1}^n Z_i Z_i' )^{-1}. We further set Π to zero (which is without loss of generality since KPST uses the residual vectors) and generate the Z_i's independently from N(0, I_k) distributions and V_i given Z_i independently from a N(0, h(Z_i) I_p) distribution.

    Data Generating Process:         homoskedastic          scalar hetero
    p  k      n     a   m        10%    5%    1%       10%    5%    1%
    2  2   1626     4   9       10.0   5.1   1.0       9.7   4.4   0.7
    2  3  14130    10  18       10.0   5.0   0.8       9.3   4.2   0.7
    2  4  65536    18  30        9.4   5.0   0.9       9.7   4.9   0.9
    3  2  14130    10  18       10.2   5.0   0.9      10.0   4.7   0.9

Table 1: Rejection frequencies (in percentages) of KPST at various significance levels, using χ²_a critical values. n = ⌊(pk)^{16/3}⌋; a: number of restrictions given in eq. (15); m: number of estimated parameters. 10000 MC replications.
We consider two different specifications of h(Z_i). The first leads to homoskedasticity and has h(Z_i) = 1, while the second leads to (scalar) heteroskedasticity and has h(Z_i) = ||Z_i||²/k. For each case, we compute null rejection probabilities using a nominal significance level of 5%. The null rejection probabilities (NRPs) are computed using 10000 Monte Carlo replications for the KPST that uses the asymptotic critical values resulting from Theorem 2. Table 1 reports the NRPs when the sample size depends on the dimensions p and k, specifically n = (kp)^{16/3}, in accordance with Theorem 2. We notice only a slight underrejection in some cases, but the rest of the NRPs are not significantly different from the test's nominal levels. Table 2 reports NRPs with a smaller sample size, n = (pk)^4. In this case, we find some modest deviations from the nominal size but these are generally quite small.

Figures 1 and 2 show the NRPs as a function of the sample size n for different settings of p and k. Depending on the value of the latter, the NRPs are close to the nominal level for values of n much smaller than (pk)^4. For example, when p = k = 2 and testing at the 5% significance level, the NRP is close to the nominal level for a sample size of around 100. More striking is the case p = 2 and k = 5, for which KPSTs using a 5% significance level have NRPs close to the nominal size for values of n around two hundred.
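The computation of KPST described above can be sketched end-to-end in numpy. This is a simplified illustration, not the authors' code: the orthogonal complements are taken directly from the SVD rather than with the normalization in (11) (the statistic is a quadratic form with a generalized inverse, so the particular complement basis should not matter numerically), and the data are drawn from the homoskedastic DGP of this section.

```python
import numpy as np

def dof(p, k):
    # Degrees of freedom from eq. (15).
    return p*k*(p+1)*(k+1)//4 - p*(p+1)//2 - k*(k+1)//2 + 1

def rearrange(A, p, k):
    blocks = [A[i*k:(i+1)*k, j*k:(j+1)*k] for j in range(p) for i in range(p)]
    return np.array([b.flatten(order='F') for b in blocks])

def kpst(F, p, k):
    """KPS test statistic from an n x pk matrix whose rows are the f_i (a sketch)."""
    n = F.shape[0]
    R_hat = F.T @ F / n
    RR = rearrange(R_hat, p, k)                     # p^2 x k^2
    L, s, Nt = np.linalg.svd(RR)
    G1perp, G2perp = L[:, 1:], Nt.T[:, 1:]          # orthonormal complement bases
    Lam = G1perp.T @ RR @ G2perp                    # (p^2-1) x (k^2-1)
    J = np.kron(G2perp, G1perp)                     # so that vec(Lam) = J' vec(R(R_hat))
    # xi_i = f_i ⊗ f_i - vec(R_hat), re-arranged like R_hat, gives V_hat.
    Xi = np.einsum('ia,ib->iab', F, F).reshape(n, -1) - R_hat.flatten(order='F')
    Xi_r = np.array([rearrange(x.reshape(p*k, p*k), p, k).flatten(order='F') for x in Xi])
    V_hat = Xi_r.T @ Xi_r / n
    lam = Lam.flatten(order='F')
    return n * lam @ np.linalg.pinv(J.T @ V_hat @ J) @ lam

rng = np.random.default_rng(2)
p, k, n = 2, 2, 400
Z = rng.standard_normal((n, k))
V = rng.standard_normal((n, p))                     # homoskedastic null DGP
F = np.einsum('ij,il->ijl', V, Z).reshape(n, p*k)
stat, a = kpst(F, p, k), dof(p, k)
print(stat, a)  # under H0, stat is approximately chi2(a); the 5% critical value for a = 4 is 9.49
```

The `dof` values reproduce the a column of Tables 1 and 2 (e.g. a = 4 for p = k = 2 and a = 70 for p = 3, k = 5).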
To analyze the power of the KPST test, we analyze settings where the covariance matrix of the moments R_n ∈ R^{kp×kp} is local to KPS:

    R_n = (G_1 ⊗ G_2) + (1/√n) A,    (21)

    Data Generating Process:         homoskedastic          scalar hetero
    p  k      n     a   m        10%    5%    1%       10%    5%    1%
    2  2    256     4   9       11.2   5.3   0.9      11.4   4.8   0.5
    2  3   1296    10  18       10.2   4.9   0.9       9.3   4.0   0.5
    2  4   4096    18  30        9.9   5.1   1.0       9.1   4.2   0.8
    2  5  10000    28  45        9.7   4.6   0.8       8.8   4.0   0.6
    3  2   1296    10  18        9.9   4.8   0.7       9.0   3.7   0.5
    3  3   6561    25  36        9.8   5.0   0.9       9.6   4.4   0.7
    3  4  20736    45  60       10.7   5.6   1.2      10.2   5.1   0.9
    3  5  50625    70  90       10.4   5.2   1.0      10.2   5.0   0.7

Table 2: Rejection frequencies (in percentages) of the KPST test at various significance levels, using χ²_a critical values. n = (pk)^4; a: number of restrictions given in eq. (15); m: number of estimated parameters. 10000 MC replications.
Figure 1: Null rejection probabilities of the KPST test at different sample sizes and DGPs (panels: p = 2, k = 2, 3, 4, 5; hom: conditionally homoskedastic, het: scalar heteroskedastic, at the 10%, 5% and 1% levels). Computed using 40000 MC replications.
Figure 2: Null rejection probabilities of the KPST test at different sample sizes and DGPs (panels: p = 3, k = 2, 3, 4, 5; hom: conditionally homoskedastic, het: scalar heteroskedastic, at the 10%, 5% and 1% levels). Computed using 40000 MC replications.

where G_1 ∈ R^{p×p} and G_2 ∈ R^{k×k} are symmetric positive definite matrices and

    A := [ A_11 ··· A_1p ]
         [  ⋮    ⋱    ⋮  ] ∈ R^{kp×kp}    (22)
         [ A_p1 ··· A_pp ]

is a fixed symmetric matrix, where A_ij ∈ R^{k×k} for i, j = 1, ..., p. The re-arranged matrix R(R_n) used to pin down the KPS is:

    R(R_n) = vec(G_1) vec(G_2)' + (1/√n) R(A)
           = vec(Ḡ_{1,n}) vec(Ḡ_{2,n})' + (1/√n) vec(Ḡ_{1,n})_⊥ Λ_n vec(Ḡ_{2,n})'_⊥,    (23)

with Ḡ_{1,n} ∈ R^{p×p} and Ḡ_{2,n} ∈ R^{k×k} symmetric positive definite matrices, potentially different from G_1 and G_2 but converging to them as n goes to infinity. The decomposition in the last line of (23) is identical to the one in (13). The specification in (23) results from a generic SVD of R(A). For Ḡ_{1,n} and Ḡ_{2,n} to equal G_1 and G_2, one needs vec(G_1)' R(A) vec(G_2) = 0, which we do not assume here but which does hold in the next example.

Theorem 3. Assume that

    δ := lim_{n→∞} vec(Λ_n)' [ ([vec(Ḡ_{2,n})]_⊥ ⊗ [vec(Ḡ_{1,n})]_⊥)' cov(vec(R(R̂_n))) ([vec(Ḡ_{2,n})]_⊥ ⊗ [vec(Ḡ_{1,n})]_⊥) ]^{-} vec(Λ_n)    (24)

exists. Then, under local to KPS sequences of covariance matrices as in (21) and for mean zero, independently distributed random vectors f_i ∈ R^{kp} with finite eighth moments, KPST has a χ²_a(δ) limiting distribution as n → ∞ (with k, p fixed).

Proof.
Follows directly from the proof of Theorem 2 in the Appendix.
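The local-to-KPS structure in (21)–(23) can be verified numerically in a small example (a sketch; G_1 = G_2 = I_2, and the perturbation direction A is chosen so that vec(G_1)' R(A) vec(G_2) = 0, as in the power simulation below):

```python
import numpy as np

def rearrange(A, p, k):
    blocks = [A[i*k:(i+1)*k, j*k:(j+1)*k] for j in range(p) for i in range(p)]
    return np.array([b.flatten(order='F') for b in blocks])

p = k = 2; n = 400.0
G1 = G2 = np.eye(2)
A = np.diag([1.0, -1.0, -1.0, 1.0])        # fixed symmetric perturbation direction

Rn = np.kron(G1, G2) + A / np.sqrt(n)      # local-to-KPS covariance, as in (21)
RA = rearrange(A, p, k)
g1, g2 = G1.flatten(order='F'), G2.flatten(order='F')

# First line of (23): the rearrangement is linear.
assert np.allclose(rearrange(Rn, p, k), np.outer(g1, g2) + RA / np.sqrt(n))

# vec(G1)' R(A) vec(G2) = 0 here, so the Gbar's coincide with G1, G2 and
# Lambda_n = vec(G1)_perp' R(A) vec(G2)_perp reproduces the second line of (23).
assert abs(g1 @ RA @ g2) < 1e-12
Q, _ = np.linalg.qr(np.column_stack([g1, np.eye(4)[:, :3]]))
Gperp = Q[:, 1:]                           # orthonormal complement of vec(G1) = vec(G2)
Lam = Gperp.T @ RA @ Gperp
assert np.allclose(rearrange(Rn, p, k),
                   np.outer(g1, g2) + Gperp @ Lam @ Gperp.T / np.sqrt(n))
print(np.round(Lam, 4))                    # a single nonzero entry in the (1,1) position
```

Because R(A) here is rank one and orthogonal to vec(G_1) and vec(G_2), Λ_n is rank one and does not depend on n.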
Power simulation
We simulate the power of the KPST test using the asymptotic χ² critical values stated in Theorem 2. The Data Generating Process (DGP) is a model with p = k = 2 in which Y_i = Z_i' Π + V_i and Π = 0, see (19). The two dimensional vectors containing the regressors Z_i and errors V_i are simulated according to:

    V_i ~ iid N(0, Ω_1),  Z_i ~ iid N(0, Q_zz,1),  i = 1, ..., [n/2],
    V_i ~ iid N(0, Ω_2),  Z_i ~ iid N(0, Q_zz,2),  i = [n/2]+1, ..., n,    (25)

with Ω_1 = diag(b, 1), Ω_2 = diag(1, b), Q_zz,1 = diag(1, c), Q_zz,2 = diag(c, 1), and

    b := 1 + σ/(2√n) − (1/2)√( (σ/√n)(σ/√n + 8) ),
    c := 1 + σ/(2√n) + (1/2)√( (σ/√n)(σ/√n + 8) ),    (26)

for σ ∈ [0, √n). The covariance matrix R_n is then such that:

    R_n = (1/n) var( Σ_{i=1}^n (V_i ⊗ Z_i) ) = (1/2) diag(b+c, 1+bc, 1+bc, b+c)
        = I_2 ⊗ I_2 + (σ/(2√n)) × diag(1, −1, −1, 1),    (27)

where I_2 ⊗ I_2 = G_1 ⊗ G_2 and G_1 = G_2 = I_2. Since

    R(diag(1, −1, −1, 1)) = [  1  0  0 −1 ]
                            [  0  0  0  0 ]    (28)
                            [  0  0  0  0 ]
                            [ −1  0  0  1 ]

and vec(G_1)' R(diag(1, −1, −1, 1)) vec(G_2) = 0, the re-arranged specification of R_n in (23) has Ḡ_{1,n} and Ḡ_{2,n} coinciding with G_1 and G_2:

    R(R_n) = vec(G_1) vec(G_2)' + (σ/(2√n)) R(diag(1, −1, −1, 1))
           = vec(G_1) vec(G_2)' + (1/√n) vec(G_1)_⊥ Λ_n vec(G_2)'_⊥,    (29)

with

    vec(G_1)_⊥ = vec(G_2)_⊥ = [  1/√2  0  0 ]
                              [   0    1  0 ]    ,  Λ_n = σ e_1 e_1',  e_1 = (1, 0, 0)'.    (30)
                              [   0    0  1 ]
                              [ −1/√2  0  0 ]

In this case Λ_n does not depend on n because Ḡ_{1,n} and Ḡ_{2,n} coincide with G_1 and G_2. For σ = 0, R_n has KPS, so the null hypothesis in (3) holds. For the limiting case of σ = √n: b = 0, so Ω_1 and Ω_2 are singular.

We compute the power function for the KPST test at three significance levels, 10%, 5% and 1%, for a sample of size n = 1626 ≈ (kp)^{16/3}, using 10000 Monte Carlo replications. The results are reported in Figure 3. Alongside the simulated power curve, Figure 3 also shows its asymptotic approximation that results from Theorem 3. The non-centrality parameter of this asymptotic approximation results from noting that

    (e_1 ⊗ e_1)' [ ([vec(Ḡ_{2,n})]_⊥ ⊗ [vec(Ḡ_{1,n})]_⊥)' cov(vec(R(R̂_n))) ([vec(Ḡ_{2,n})]_⊥ ⊗ [vec(Ḡ_{1,n})]_⊥) ]^{-} (e_1 ⊗ e_1) = 1/4,    (31)

where Ḡ_{i,n} = G_i = I_2 for i = 1, 2, and e_1 = (1, 0, 0)'. Since vec(Λ_n) = σ (e_1 ⊗ e_1), the noncentrality parameter is

    δ = σ²/4.    (32)

We see that the asymptotic approximation to the power of the KPST is reasonable, if somewhat optimistic, especially at the 1% level of significance.

Empirical applications

We investigate whether KPS covariance matrices are relevant for applied work by analyzing the covariance matrices of estimators in published empirical studies to see if they satisfy KPS. We have therefore taken sixteen highly cited papers conducting instrumental variables regressions from top journals in economics and test for KPS of the joint covariance matrix of the (unrestricted reduced form) least squares estimators which result from regressing all endogenous variables on the
Figure 3: Power of the KPST test with χ²_a critical values (red solid) and asymptotic power from Theorem 3 (blue dashed), at the 10%, 5% and 1% significance levels. 10000 MC replications. σ measures the deviation from KPS in Frobenius norm. Sample size is n = 1626.

instruments. The involved papers, and the acronyms we use to refer to them, are listed in Table 3. Tables 6 and 7 in the Appendix report our KPS test results for the hundred and sixteen different specifications we analyzed. Table 6 does so for the studies using independent data while Table 7 lists the results for studies with clustered data. Since these tables are rather extensive, Tables 4 and 5 report a summary of our findings on the KPS tests.

Table 4, summarizing our results on KPS tests for the papers using independent data, shows considerable support for KPS covariance matrices, especially when the number of observations is not too large. For the fifty-eight different specifications using independent data reported in Table 4, we reject KPS at the 5% significance level for about one third of them: twenty-two.

Table 5, summarizing our results for papers using clustered data, shows that for the fifty-eight different specifications with clustered data, we reject KPS at the 5% significance level for forty-eight specifications when using the unrestricted covariance matrix estimator (6) and for forty when using the clustered covariance matrix estimator (18). The number of observations in the involved papers using clustered data is typically much larger than for the papers using independent observations, which largely explains our different findings for independent compared to clustered observations.

Our analysis of the KPS of covariance matrices of moment condition vectors in a considerable number of prominent empirical studies shows that KPS is often not rejected, especially for moderate sample sizes.
Both the endogenous variables and the instruments are first regressed on the control, or included exogenous, variables and only the residuals from these regressions are used.

    Acronym     Paper
    ACJR 11     Acemoglu et al. (2011)
    AD 13       Autor and Dorn (2013)
    ADH 13      Autor et al. (2013)
    AGN 13      Alesina et al. (2013)
    AJ 05       Acemoglu and Johnson (2005)
    AJRY 08     Acemoglu et al. (2008)
    DT 11       Duranton and Turner (2011)
    HG 10       Hansford and Gomez (2010)
    JPS 06      Johnson et al. (2006)
    MSS 04      Miguel et al. (2004)
    Nunn 08     Nunn (2008)
    PSJM 13     Parker et al. (2013)
    TCN 10      Tanaka et al. (2010)
    V et al 12  Voors et al. (2012)
    Yogo 04     Yogo (2004)

Table 3: List of papers used in the empirical applications.
    Paper     specifications   KPS rejections     n
    TCN 10          2               none          moderate
    Nunn 08         4                4            small
    AJ 05          24               10            small
    HG 10           2                2            huge
    AGN 13          6                1            moderate
    Yogo 04        22                5            moderate

Table 4: Summary of results of 5 percent significance KPST tests for specifications in papers using independent observations.
Conclusions

We propose a straightforward test for a KPS covariance matrix of an estimator. The test is an extension of the KP rank test and is easy to use. We apply it to data used in a considerable number of prominent applied studies conducting IV regressions and find that KPS of the covariance matrix of the least squares estimator of the unrestricted reduced form is mostly not rejected for moderate sample sizes. In linear IV regression, a KPS covariance matrix brings considerable advantages for both computation and inference in weakly identified settings. Given the common occurrence of weak identification in applications, our empirical findings underscore the contribution that the use of KPS covariance matrices can make in applied work. In a subsequent paper, Guggenberger et al. (2020), we therefore develop a two-step test procedure that in the first step uses our KPS covariance matrix test and, depending on its outcome, in the second step conducts a weak-identification-robust test on a subset of the structural parameters. The two-step procedure is constructed such that the overall size of the test is controlled.

    Paper       specif.   KPS rej. (unrestricted)   n          KPS rej. (clustered)   n
    DT 11          8              6                 large               5             moderate
    AJRY 08        9              7                 large               5             moderate
    JPS 06         4              4                 huge                4             huge
    PSJM 13        2              2                 huge                2             huge
    ADH 13        18             18                 large              13             small
    AD 13          7              7                 huge                7             small
    ACJR 11        1              1                 small               1             very small
    MSS 04         3              0                 large               3             small
    V et al 12     6              2                 moderate            0             small

Table 5: Summary of results of 5 percent significance KPST tests for specifications in papers using clustered observations.
Appendix

A Proofs
Proof of Theorem 1:
For

    R(R̂_n) = L̂ Σ̂ N̂',    (33)

the singular value decomposition (SVD) of R(R̂_n), with Σ̂ = diag(σ̂_1 ... σ̂_{min(l,q)}) an l × q dimensional diagonal matrix (l = p², q = k²) with the singular values ordered non-increasingly on the main diagonal, and L̂ and N̂ l × l and q × q dimensional orthonormal matrices, we have

    ||R(R̂_n) − vec(G_1)vec(G_2)'||²_F = Σ_{i=1}^{min(l,q)} σ̂_i² − 2 vec(G_1)' L̂ Σ̂ N̂' vec(G_2) + vec(G_1)'vec(G_1) vec(G_2)'vec(G_2),

which shows that it is minimized with respect to G_1, G_2 at vec(Ĝ_1) = L̂_1/L̂_11, vec(Ĝ_2) = L̂_11 σ̂_1 N̂_1, with L̂_1 and N̂_1 the first columns of L̂ and N̂ respectively, so (DS)² = Σ_{i=2}^{min(l,q)} σ̂_i², see also Van Loan and Pitsianis (1993).

Proof of Theorem 2: a.
To obtain Λ̂, r̂_n gets transformed to R̂_n, R̂_n to R(R̂_n), and R(R̂_n) to (Ĝ_1, Ĝ_2, Λ̂). Since the relationship between r̂_n and (Ĝ_1, Ĝ_2, Λ̂) is invertible, the total number of unique elements in (Ĝ_1, Ĝ_2, Λ̂) equals the number of elements of r̂_n. When solving for (Ĝ_1, Ĝ_2, Λ̂) from R(R̂_n), first the unique p(p+1)/2 + k(k+1)/2 − 1 elements of Ĝ_1 and Ĝ_2 get solved from the largest singular value and its singular vectors of R(R̂_n). (The unique elements of the symmetric Ĝ_1 consist of all but its lower (or upper) off-diagonal elements minus one, which is used for normalizing: p(p+1)/2 − 1. The unique elements of Ĝ_2 consist of all but its lower (or upper) off-diagonal elements: k(k+1)/2.) Hereafter, the pk(p+1)(k+1)/4 − p(p+1)/2 − k(k+1)/2 + 1 unique elements of Λ̂ get solved from the remaining smaller singular values and their singular vectors. This implies that the rank of the matrix which transforms vec(R(R̂_n)) into

    vec(Λ̂) = ( [vec(Ĝ_2)]_⊥ ⊗ [vec(Ĝ_1)]_⊥ )' vec(R(R̂_n))

equals a.

The spectral decomposition of a general m × m dimensional positive semi-definite matrix A can be specified as A = P S P' = P_1 S_1 P_1' + P_2 S_2 P_2', for S = diag(S_1, S_2) a diagonal m × m dimensional matrix containing the characteristic roots in descending order and P the m × m dimensional matrix that contains the orthonormal eigenvectors. Consider the case that r eigenvalues are equal to zero and let S_2 be the r × r dimensional diagonal matrix which contains all the zero eigenvalues, so it is a matrix of zeros, and P_2 the m × r dimensional matrix that contains the orthonormal eigenvectors associated with the zero eigenvalues. For S_1 the (m−r) × (m−r) diagonal matrix that contains the positive eigenvalues and P_1 the m × (m−r) dimensional matrix that contains the orthonormal eigenvectors associated with these positive eigenvalues, the generalized inverse of A is:

    A^{-} = P_1 S_1^{-1} P_1' = P_1 (P_1' A P_1)^{-1} P_1'.

Write Ĵ = ([vec(Ĝ_2)]_⊥ ⊗ [vec(Ĝ_1)]_⊥). Since the rank of Ĵ' cov(vec(R(R̂_n))) Ĵ equals a, we can specify its generalized inverse as

    P_1 [ P_1' Ĵ' cov(vec(R(R̂_n))) Ĵ P_1 ]^{-1} P_1',

for P_1 the (p²−1)(k²−1) × a dimensional matrix that contains the orthonormal eigenvectors associated with the non-zero eigenvalues. The unique elements of Λ̂ can be represented by an a × (p²−1)(k²−1) dimensional selection matrix S_{Λ,kp}, so that S_{Λ,kp} vec(Λ̂) consists of the a unique elements of Λ̂. This selection matrix is such that S'_{Λ,kp} = P_1 B, with B an invertible a × a dimensional matrix. When specified on the unique elements of Λ̂, the KP rank test reads

    KPST = n vec(Λ̂)' S'_{Λ,kp} [ S_{Λ,kp} Ĵ' cov(vec(R(R̂_n))) Ĵ S'_{Λ,kp} ]^{-} S_{Λ,kp} vec(Λ̂)
         = n vec(Λ̂)' P_1 B [ B' P_1' Ĵ' cov(vec(R(R̂_n))) Ĵ P_1 B ]^{-1} B' P_1' vec(Λ̂)
         = n vec(Λ̂)' P_1 [ P_1' Ĵ' cov(vec(R(R̂_n))) Ĵ P_1 ]^{-1} P_1' vec(Λ̂)
         = n vec(Λ̂)' [ Ĵ' cov(vec(R(R̂_n))) Ĵ ]^{-} vec(Λ̂),

which is the expression of the KPST test in Theorem 2.

b. Under joint limit sequences of k, p and n, we have to consider all components of vec(Λ̂) and its covariance matrix estimator.
We can specify vec(Λ̂) as in (12). Write Δ_1 = vec(Ĝ_1)_⊥ − vec(G_1)_⊥ and Δ_2 = vec(Ĝ_2)_⊥ − vec(G_2)_⊥. Then

vec(Λ̂) = (vec(Ĝ_1)_⊥ ⊗ vec(Ĝ_2)_⊥)' vec(R(R̂_n))
= ([vec(G_1)_⊥ + Δ_1] ⊗ [vec(G_2)_⊥ + Δ_2])' vec( R(R_n) + [R(R̂_n) − R(R_n)] ).

Expanding this product and using that R(R_n) = vec(G_2) vec(G_1)', so that vec(G_2)'_⊥ R(R_n) = 0 and R(R_n) vec(G_1)_⊥ = 0, every term that pairs R(R_n) with vec(G_1)_⊥ or vec(G_2)_⊥ on the corresponding side drops out; for example, (Δ_1 ⊗ I_{p²−1})' vec(vec(G_2)'_⊥ R(R_n)) = 0 and (I_{k²−1} ⊗ Δ_2)' vec(R(R_n) vec(G_1)_⊥) = 0. We are left with

vec(Λ̂) = a + b + c,

where

a = (vec(G_1)_⊥ ⊗ vec(G_2)_⊥)' vec(R(R̂_n) − R(R_n)),

b = (Δ_1 ⊗ vec(G_2)_⊥)' vec(R(R̂_n) − R(R_n)) + (vec(G_1)_⊥ ⊗ Δ_2)' vec(R(R̂_n) − R(R_n)) + (Δ_1 ⊗ Δ_2)' vec(R(R_n)),

c = (Δ_1 ⊗ Δ_2)' vec(R(R̂_n) − R(R_n)).

From (8) it follows that R(R̂_n) − R(R_n) = O_p(n^{−1/2}), and the same convergence rate then applies to Δ_1 and Δ_2; see Kleibergen and Paap (2006). Since vec(Ĝ_1)_⊥ and vec(Ĝ_2)_⊥ are solved from R(R̂_n), the quantities R(R̂_n) − R(R_n), Δ_1 and Δ_2 are all jointly dependent. In a limiting sequence where the dimensions p and k grow jointly with the sample size n, we then have the following convergence rates:

(vec(G_1)_⊥ ⊗ vec(G_2)_⊥)' vec(R(R̂_n) − R(R_n)) = O_p(n^{−1/2}),
(Δ_1 ⊗ vec(G_2)_⊥)' vec(R(R̂_n) − R(R_n)) = O_p(k²/n),
(vec(G_1)_⊥ ⊗ Δ_2)' vec(R(R̂_n) − R(R_n)) = O_p(p²/n),
(Δ_1 ⊗ Δ_2)' vec(R(R_n)) = O_p((pk)²/n),
(Δ_1 ⊗ Δ_2)' vec(R(R̂_n) − R(R_n)) = O_p((pk)²/(n√n)).

These orders result from the number of components we sum over in the product of the Kronecker-product matrix and the vectorized matrix. For the q-th element of each of the four error components above:
1. [(Δ_1 ⊗ vec(G_2)_⊥)' vec(R(R̂_n) − R(R_n))]_q = Σ_{i=1}^{p²} Σ_{j=1}^{k²} [Δ_1]_{jm} [vec(G_2)_⊥]_{il} [vec(R(R̂_n) − R(R_n))]_{(j−1)p²+i}, for m = 1 + ⌊(q−1)/(p²−1)⌋ and l = q − (m−1)(p²−1), with ⌊b⌋ the entier function of a scalar b. The inner sum over i yields elements of vec(G_2)'_⊥ (R(R̂_n) − R(R_n)), which are O_p(n^{−1/2}); multiplying these by the k² elements of column m of Δ_1, which are O_p(n^{−1/2}) as well and not necessarily independent of them, gives the order O_p(k²/n).

2. [(vec(G_1)_⊥ ⊗ Δ_2)' vec(R(R̂_n) − R(R_n))]_q = Σ_{i=1}^{p²} Σ_{j=1}^{k²} [vec(G_1)_⊥]_{jm} [Δ_2]_{il} [vec(R(R̂_n) − R(R_n))]_{(j−1)p²+i}, which, by the same argument, now summing over the p² elements of column l of Δ_2, is of order O_p(p²/n).

3. [(Δ_1 ⊗ Δ_2)' vec(R(R_n))]_q = Σ_{i=1}^{p²} Σ_{j=1}^{k²} [Δ_1]_{jm} [Δ_2]_{il} [vec(R(R_n))]_{(j−1)p²+i}, which is of order O_p((pk)²/n). This order results from the double sum over the p² random variables in Δ_2 and the k² random variables in Δ_1, each of which is O_p(n^{−1/2}).

4. [(Δ_1 ⊗ Δ_2)' vec(R(R̂_n) − R(R_n))]_q = Σ_{i=1}^{p²} Σ_{j=1}^{k²} [Δ_1]_{jm} [Δ_2]_{il} [vec(R(R̂_n) − R(R_n))]_{(j−1)p²+i}, which is of order O_p((pk)²/(n√n)), since each of the three random factors in every summand is O_p(n^{−1/2}).

For the limit behavior of Λ̂ to result solely from the limit behavior of a, it is therefore sufficient to consider joint limiting sequences that satisfy (pk)²/√n → 0.
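The two facts the derivation rests on, that the rearrangement of a Kronecker product covariance matrix is a rank-one matrix and that the rearranged sample covariance matrix converges at the n^{−1/2} rate, can be illustrated numerically. The rearrange helper below is a sketch of the Van Loan and Pitsianis (1993) rearrangement under an assumed column-major block ordering; it is an illustration, not the paper's implementation, and all names are hypothetical.

```python
import numpy as np

def rearrange(S, p, k):
    # Illustrative rearrangement R(S): row (j*k + i) is the column-major
    # vec of the (i, j) p x p block of the (kp x kp) matrix S.
    rows = []
    for j in range(k):
        for i in range(k):
            rows.append(S[i * p:(i + 1) * p, j * p:(j + 1) * p].flatten(order="F"))
    return np.array(rows)  # k^2 x p^2

rng = np.random.default_rng(0)
p, k = 2, 3
A = rng.standard_normal((k, k)); G1 = A @ A.T + k * np.eye(k)  # k x k, p.d.
B = rng.standard_normal((p, p)); G2 = B @ B.T + p * np.eye(p)  # p x p, p.d.
Sigma = np.kron(G1, G2)            # covariance with Kronecker product structure
R0 = rearrange(Sigma, p, k)

# KPS makes the rearrangement rank one: here R(Sigma) = vec(G1) vec(G2)'
assert np.allclose(R0, np.outer(G1.flatten(order="F"), G2.flatten(order="F")))
print(np.linalg.matrix_rank(R0))   # 1

# the rearranged sample covariance converges to R0 at the root-n rate
L = np.linalg.cholesky(Sigma)
errs = []
for n in (100, 10_000):
    X = rng.standard_normal((n, k * p)) @ L.T
    errs.append(np.linalg.norm(rearrange(X.T @ X / n, p, k) - R0))
print(errs)  # the second error is roughly an order of magnitude smaller
```

The rank-one property is what turns testing KPS into a reduced-rank test on an invertible transformation of the covariance matrix.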
For the estimator of the covariance matrix of Λ̂ we further have, writing Δ_1 = vec(Ĝ_1)_⊥ − vec(G_1)_⊥ and Δ_2 = vec(Ĝ_2)_⊥ − vec(G_2)_⊥,

(vec(Ĝ_1)'_⊥ ⊗ vec(Ĝ_2)'_⊥) ĉov(vec(R(R̂_n))) (vec(Ĝ_1)_⊥ ⊗ vec(Ĝ_2)_⊥)
= ([vec(G_1)_⊥ + Δ_1]' ⊗ [vec(G_2)_⊥ + Δ_2]') ( cov(vec(R(R̂_n))) + [ĉov(vec(R(R̂_n))) − cov(vec(R(R̂_n)))] ) ([vec(G_1)_⊥ + Δ_1] ⊗ [vec(G_2)_⊥ + Δ_2])
= (vec(G_1)'_⊥ ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (vec(G_1)_⊥ ⊗ vec(G_2)_⊥) + U
= A + B_1 + B_2 + B_3 + C_1 + C_2 + C_3 + C_4 + C_5 + C_6 + D_1 + …,

where we show below that the maximal convergence rates besides the zeroth-order component A are O_p(n^{−1/2}), O_p((pk)²/n), O_p(k²/n), O_p(p²/n) and O_p(k²p²/(n√n)). All these rates appear again in the inverse of the estimator of the covariance matrix. When taking this inverse and accounting for the summations over the (p²−1)(k²−1) components of vec(Λ̂), we obtain a slightly stronger condition than the one needed for Λ̂ alone:

(pk)⁵/n → 0,

which results from the O_p(n^{−1/2}) components of the covariance matrix estimation paired with the O_p((pk)²/n) components of Λ̂, corrected for the multiplication by n and the double summation over the components of vec(Λ̂). The rate that would result from Λ̂ alone is (pk)⁴/n → 0, equivalently (pk)²/√n → 0. Our rate lies in between the rates implied by Newey and Windmeijer (2009), which would be (pk)⁴/n for Λ̂ and (pk)⁶/n for convergence of the test statistic; the latter is slightly stricter than our rate of (pk)⁵/n → 0. Below, we state the rates of the different
A, B, C and D (third-order error) components, where we only give the rate for one of the D components, since we just showed that they do not lead to the largest error rate: their O_p(k²p²/(n√n)) order is smaller than the O_p(p²k²/n) order that results from some of the C components. With Δ_1 = vec(Ĝ_1)_⊥ − vec(G_1)_⊥ and Δ_2 = vec(Ĝ_2)_⊥ − vec(G_2)_⊥:

A = (vec(G_1)'_⊥ ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (vec(G_1)_⊥ ⊗ vec(G_2)_⊥) = O(1)

B_1 = (Δ_1' ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (vec(G_1)_⊥ ⊗ vec(G_2)_⊥) + (vec(G_1)'_⊥ ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (Δ_1 ⊗ vec(G_2)_⊥) = O_p(n^{−1/2})

B_2 = (vec(G_1)'_⊥ ⊗ Δ_2') cov(vec(R(R̂_n))) (vec(G_1)_⊥ ⊗ vec(G_2)_⊥) + (vec(G_1)'_⊥ ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (vec(G_1)_⊥ ⊗ Δ_2) = O_p(n^{−1/2})

B_3 = (vec(G_1)'_⊥ ⊗ vec(G_2)'_⊥) [ĉov(vec(R(R̂_n))) − cov(vec(R(R̂_n)))] (vec(G_1)_⊥ ⊗ vec(G_2)_⊥) = O_p(n^{−1/2})

C_1 = (Δ_1' ⊗ Δ_2') cov(vec(R(R̂_n))) (vec(G_1)_⊥ ⊗ vec(G_2)_⊥) + (vec(G_1)'_⊥ ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (Δ_1 ⊗ Δ_2) = O_p((pk)²/n)

C_2 = (Δ_1' ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (Δ_1 ⊗ vec(G_2)_⊥) = O_p(k²/n)

C_3 = (vec(G_1)'_⊥ ⊗ Δ_2') cov(vec(R(R̂_n))) (vec(G_1)_⊥ ⊗ Δ_2) = O_p(p²/n)

C_4 = (Δ_1' ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (vec(G_1)_⊥ ⊗ Δ_2) + (vec(G_1)'_⊥ ⊗ Δ_2') cov(vec(R(R̂_n))) (Δ_1 ⊗ vec(G_2)_⊥) = O_p(pk/n)

C_5 = (Δ_1' ⊗ vec(G_2)'_⊥) [ĉov(vec(R(R̂_n))) − cov(vec(R(R̂_n)))] (vec(G_1)_⊥ ⊗ vec(G_2)_⊥) + (vec(G_1)'_⊥ ⊗ vec(G_2)'_⊥) [ĉov(vec(R(R̂_n))) − cov(vec(R(R̂_n)))] (Δ_1 ⊗ vec(G_2)_⊥) = O_p(pk/n)

C_6 = (vec(G_1)'_⊥ ⊗ Δ_2') [ĉov(vec(R(R̂_n))) − cov(vec(R(R̂_n)))] (vec(G_1)_⊥ ⊗ vec(G_2)_⊥) + (vec(G_1)'_⊥ ⊗ vec(G_2)'_⊥) [ĉov(vec(R(R̂_n))) − cov(vec(R(R̂_n)))] (vec(G_1)_⊥ ⊗ Δ_2) = O_p(pk/n)

D_1 = (Δ_1' ⊗ Δ_2') cov(vec(R(R̂_n))) (Δ_1 ⊗ vec(G_2)_⊥) + (Δ_1' ⊗ vec(G_2)'_⊥) cov(vec(R(R̂_n))) (Δ_1 ⊗ Δ_2) = O_p(k²p²/(n√n))

B Detailed empirical results
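As an illustrative cross-check on the degrees of freedom of the χ² limiting distribution, the p-values reported in Table 6 can be reproduced by evaluating a χ² survival function at the KPST statistic. The degrees-of-freedom formula used below, (p(p+1)/2 − 1)(k(k+1)/2 − 1), is reverse-engineered from the reported values and should be treated as an assumption rather than a statement quoted from the paper; the helper names are likewise hypothetical.

```python
import math

def kpst_df(p, k):
    # assumed degrees-of-freedom formula, inferred from Table 6:
    # a symmetric block leaves p(p+1)/2 (resp. k(k+1)/2) free elements
    return (p * (p + 1) // 2 - 1) * (k * (k + 1) // 2 - 1)

def chi2_sf_even(x, df):
    # survival function of a chi-squared distribution with EVEN df
    # (closed form: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!)
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2) / j
        total += term
    return math.exp(-x / 2) * total

# TCN 10, T5.P2.C1: p = k = 2, KPST = 4.944, reported p-value 0.293
print(round(chi2_sf_even(4.944, kpst_df(2, 2)), 3))  # 0.293
# AJ 05, T4.P1.C1: p = 3, k = 2, KPST = 8.18, reported p-value 0.611
print(round(chi2_sf_even(8.18, kpst_df(3, 2)), 3))   # 0.611
```

With p = k = 2 this gives 4 degrees of freedom and with p = 3, k = 2 it gives 10, both matching the reported p-values to three decimals.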
Tables 6 and 7 give detailed empirical results for the applications considered, with non-clustered and clustered data, respectively.

Table 6: Applications of KPST.
Paper | Specif. | Y | Z | p | k | n | KPST | p val
TCN 10 | T5.P2.C1 | Value function curvature, Income | Rainfall, Head of Household Cannot Work (dummy variable) | 2 | 2 | 181 | 4.944 | 0.293
 | T5.P2.C2 | Value function curvature, Relative Income, Mean Income | Rainfall, Head of Household Cannot Work (dummy variable) | 3 | 2 | 181 | 14.859 | 0.137
Nunn 08 | T4.C1 | Log income in 2000, Slave exports | Atlantic distance, Indian distance, Saharan distance, Red Sea distance | 2 | 4 | 52 | 32.307 | 0.02
 | T4.C2 | Log income in 2000, Slave exports (X: Colonization effect) | Atlantic distance, Indian distance, Saharan distance, Red Sea distance | 2 | 4 | 52 | 30.922 | 0.029
 | T4.C3 | Log income in 2000, Slave exports (X: Col. effect, geographical controls) | Atlantic distance, Indian distance, Saharan distance, Red Sea distance | 2 | 4 | 52 | 34.597 | 0.011
 | T4.C4 | Log income in 2000, Slave exports (X: Col. effect, geographical controls) | Atlantic distance, Indian distance, Saharan distance, Red Sea distance | 2 | 4 | 42 | 28.263 | 0.058
AJ 05 | T4.P1.C1 | Log GDP per capita, legal formalism, constraint on executive | English legal origin, settler mortality | 3 | 2 | 51 | 8.18 | 0.611
 | T4.P1.C2 | Log GDP per capita, legal formalism, constraint on executive | English legal origin, population density 1500 | 3 | 2 | 60 | 25.969 | 0.004
 | T4.P1.C3 | Log GDP per capita, constraint on executive, procedural complexity | English legal origin, settler mortality | 3 | 2 | 60 | 5.574 | 0.85
 | T4.P1.C4 | Log GDP per capita, constraint on executive, number of procedures | English legal origin, settler mortality | 3 | 2 | 61 | 10.916 | 0.364
 | T4.P1.C5 | Log GDP per capita, legal formalism, average protection against risk of expropriation | English legal origin, settler mortality | 3 | 2 | 51 | 7.075 | 0.718
 | T4.P1.C6 | Log GDP per capita, legal formalism, private property | English legal origin, settler mortality | 3 | 2 | 52 | 8.646 | 0.566
 | T4.P2.C1 | Investment-GDP ratio, legal formalism, constraint on executive | English legal origin, settler mortality | 3 | 2 | 51 | 13.068 | 0.22
 | T4.P2.C2 | Investment-GDP ratio, legal formalism, constraint on executive | English legal origin, population density 1500 | 3 | 2 | 60 | 36.298 | 0
 | T4.P2.C3 | Investment-GDP ratio, constraint on executive, procedural complexity | English legal origin, settler mortality | 3 | 2 | 61 | 16.838 | 0.078
 | T4.P2.C4 | Investment-GDP ratio, constraint on executive, number of procedures | English legal origin, settler mortality | 3 | 2 | 62 | 14.82 | 0.139
 | T4.P2.C5 | Investment-GDP ratio, legal formalism, average protection against risk of expropriation | English legal origin, settler mortality | 3 | 2 | 51 | 13.75 | 0.185
 | T4.P2.C6 | Investment-GDP ratio, legal formalism, private property | English legal origin, settler mortality | 3 | 2 | 52 | 8.582 | 0.572
 | T5.P1.C1 | Private credit, legal formalism, constraint on executive | English legal origin, settler mortality | 3 | 2 | 51 | 9.296 | 0.504
 | T5.P1.C2 | Private credit, legal formalism, constraint on executive | English legal origin, population density 1500 | 3 | 2 | 60 | 31.406 | 0.001
 | T5.P1.C3 | Private credit, constraint on executive, procedural complexity | English legal origin, settler mortality | 3 | 2 | 60 | 13.721 | 0.186
 | T5.P1.C4 | Private credit, constraint on executive, number of procedures | English legal origin, settler mortality | 3 | 2 | 61 | 11.605 | 0.312
 | T5.P1.C5 | Private credit, legal formalism, average protection against risk of expropriation | English legal origin, settler mortality | 3 | 2 | 51 | 12.206 | 0.272
 | T5.P1.C6 | Private credit, legal formalism, private property | English legal origin, settler mortality | 3 | 2 | 52 | 19.304 | 0.037
 | T5.P2.C1 | Stock market capitalization, legal formalism, constraint on executive | English legal origin, settler mortality | 3 | 2 | 50 | 19.178 | 0.038
 | T5.P2.C2 | Stock market capitalization, legal formalism, constraint on executive | English legal origin, population density 1500 | 3 | 2 | 59 | 19.405 | 0.035
 | T5.P2.C3 | Stock market capitalization, constraint on executive, procedural complexity | English legal origin, settler mortality | 3 | 2 | 59 | 34.566 | 0
 | T5.P2.C4 | Stock market capitalization, constraint on executive, number of procedures | English legal origin, settler mortality | 3 | 2 | 59 | 28.06 | 0.002
 | T5.P2.C5 | Stock market capitalization, legal formalism, average protection against risk of expropriation | English legal origin, settler mortality | 3 | 2 | 50 | 35.531 | 0
 | T5.P2.C6 | Stock market capitalization, legal formalism, private property | English legal origin, settler mortality | 3 | 2 | 51 | 21.344 | 0.019
HG 10 | T1.C2 | Democratic vote share, turnout, turnout * partisan composition, turnout * Republican incumbent | Rainfall, rainfall * partisan composition, rainfall * Republican incumbent | 4 | 3 | 27401 | 507.919 | 0
 | T1.C3 | Democratic vote share, turnout, turnout * partisan composition, turnout * Republican incumbent | Rainfall, rainfall * partisan composition, rainfall * Republican incumbent | 4 | 3 | 27401 | 457.962 | 0
AGN 13 | T8.P3.C1 | Female LF participation, Traditional plough use | Plough-neg. environment, Plough-pos. environment | 2 | 2 | 160 | 6.191 | 0.185
 | T8.P3.C2 | Female LF participation, Traditional plough use | Plough-neg. environment, Plough-pos. environment | 2 | 2 | 160 | 4.939 | 0.294
 | T8.P3.C3 | Share firm ownership female, Traditional plough use | Plough-neg. environment, Plough-pos. environment | 2 | 2 | 122 | 3.586 | 0.465
 | T8.P3.C4 | Share firm ownership female, Traditional plough use | Plough-neg. environment, Plough-pos. environment | 2 | 2 | 122 | 6.785 | 0.148
 | T8.P3.C5 | Share political position female, Traditional plough use | Plough-neg. environment, Plough-pos. environment | 2 | 2 | 140 | 9.29 | 0.054
 | T8.P3.C6 | Share political position female, Traditional plough use | Plough-neg. environment, Plough-pos. environment | 2 | 2 | 140 | 10.982 | 0.027
Yogo 04 | AUL | cons growth, risk-free rtn | Twice lagged nominal interest rate, inflation, consumption growth, and log dividend-price ratio | 2 | 4 | 114 | 16.628 | 0.549
 | | cons growth, stk mkt rtn | | 2 | 4 | 114 | 22.879 | 0.195
 | CAN | cons growth, risk-free rtn | | 2 | 4 | 115 | 24.078 | 0.152
 | | cons growth, stk mkt rtn | | 2 | 4 | 115 | 32.528 | 0.019
 | FRA | cons growth, risk-free rtn | | 2 | 4 | 113 | 28.015 | 0.062
 | | cons growth, stk mkt rtn | | 2 | 4 | 113 | 25.608 | 0.109
 | GER | cons growth, risk-free rtn | | 2 | 4 | 79 | 25.452 | 0.113
 | | cons growth, stk mkt rtn | | 2 | 4 | 79 | 31.24 | 0.027
 | ITA | cons growth, risk-free rtn | | 2 | 4 | 106 | 18.266 | 0.438
 | | cons growth, stk mkt rtn | | 2 | 4 | 106 | 25.889 | 0.102
 | JAP | cons growth, risk-free rtn | | 2 | 4 | 114 | 22.835 | 0.197
 | | cons growth, stk mkt rtn | | 2 | 4 | 114 | 16.132 | 0.583
 | NTH | cons growth, risk-free rtn | | 2 | 4 | 86 | 20.969 | 0.281
 | | cons growth, stk mkt rtn | | 2 | 4 | 86 | 21.762 | 0.243
 | SWD | cons growth, risk-free rtn | | 2 | 4 | 116 | 18.967 | 0.394
 | | cons growth, stk mkt rtn | | 2 | 4 | 116 | 29.714 | 0.04
 | SWT | cons growth, risk-free rtn | | 2 | 4 | 91 | 14.889 | 0.67
 | | cons growth, stk mkt rtn | | 2 | 4 | 91 | 43.768 | 0.001
 | UK | cons growth, risk-free rtn | | 2 | 4 | 115 | 30.148 | 0.036
 | | cons growth, stk mkt rtn | | 2 | 4 | 115 | 19.94 | 0.336
 | US | cons growth, risk-free rtn | | 2 | 4 | 114 | 18.478 | 0.425
 | | cons growth, stk mkt rtn | | 2 | 4 | 114 | 22.373 | 0.216
Specification: T: table; P: panel; C: column.

Table 7: Applications of cluster KPST.
Specif. | Y | Z | p | k | n | KPST | p val | n_c | KPST_c | p val
AJRY 08
T5.C5 | Freedom House measure of democracy, Log GDP per capita in t-1 | Savings rate in t-2, Democracy in t-1 | 2 | 2 | 891 | 23.86 | 0.000 | 134 | 20.204 | 0.001
T5.C7 | Freedom House measure of democracy, Log GDP per capita in t-1 | Savings rate in t-2, labour share of income | 2 | 2 | 471 | 21.85 | 0.000 | 98 | 6.037 | 0.303
T5.C8.S1 | Freedom House measure of democracy, Log GDP per capita in t-1 | Savings rate in t-2, democracy in t-1 (X: democracy in t-2, t-3) | 2 | 2 | 471 | 17.21 | 0.002 | 98 | 13.500 | 0.019
T5.C8.S2 | Freedom House measure of democracy, Log GDP per capita in t-1 | Savings rate in t-2, democracy in t-2 (X: democracy in t-1, t-3) | 2 | 2 | 471 | 14.96 | 0.005 | 98 | 11.738 | 0.039
T5.C8.S3 | Freedom House measure of democracy, Log GDP per capita in t-1 | Savings rate in t-2, democracy in t-3 (X: democracy in t-1, t-2) | 2 | 2 | 471 | 6.83 | 0.145 | 98 | 4.388 | 0.495
T5.C9 | Freedom House measure of democracy, Log GDP per capita in t-1 | Savings rate in t-2, t-3 | 2 | 2 | 796 | 12.14 | 0.016 | 125 | 18.960 | 0.002
T6.C5 | Freedom House measure of democracy, Log GDP per capita in t-1 | Trade-weighted (tw) log GDP in t-1, democracy in t-1 | 2 | 2 | 796 | 4.71 | 0.318 | 125 | 12.970 | 0.024
T6.C7 | Freedom House measure of democracy, Log GDP per capita in t-1 | tw log GDP in t-1, tw democracy in t-1 | 2 | 2 | 796 | 10.18 | 0.037 | 125 | 11.808 | 0.038
T6.C9 | Freedom House measure of democracy, Log GDP per capita in t-1 | tw log GDP in t-1, t-2 | 2 | 2 | 796 | 12.83 | 0.012 | 125 | 12.121 | 0.033
JPS 06
T4.P1.C5 | Dollar change in strict non-durables, rebate in t+1, t | I (rebate t+1), I (rebate t) | 3 | 2 | 12730 | 1062.30 | 0.000 | 6253 | 386.388 | 0.000
T4.P1.C6 | Dollar change in non-durable goods, rebate in t+1, t | I (rebate t+1), I (rebate t) | 3 | 2 | 12730 | 1062.05 | 0.000 | 6253 | 377.982 | 0.000
T4.P2.C5 | Dollar change in strict non-durables, rebate in t+1, t, t-1 | I (rebate t+1), I (rebate t), I (rebate t-1) | 4 | 3 | 15022 | 1635.13 | 0.000 | 6295 | 1128.150 | 0.000
T4.P2.C6 | Dollar change in non-durable goods, rebate in t+1, t, t-1 | I (rebate t+1), I (rebate t), I (rebate t-1) | 4 | 3 | 15022 | 1666.13 | 0.000 | 6295 | 1140.060 | 0.000
PSJM 13
T4.P1.C5 | Nondurable spending, ESP by check, ESP by electronic transfer | I (ESP by check), I (ESP by electronic transfer) | 3 | 2 | 17281 | 457.30 | 0.000 | 8038 | 314.724 | 0.000
T4.P1.C6 | All spending, ESP by check, ESP by electronic transfer | I (ESP by check), I (ESP by electronic transfer) | 3 | 2 | 17281 | 458.98 | 0.000 | 8038 | 288.445 | 0.000
ADH 13
T10.P3.C1 | ∆ mfg empl, ∆ trade US-China net input pw (nipw) | ∆ trade other-China, ∆ net input other-China | 2 | 2 | 1444 | 20.00 | 0.001 | 48 | 27.125 | 0.000
T10.P3.C2 | ∆ nonmfg empl, ∆ trade US-China nipw | ∆ trade other-China, ∆ net input other-China | 2 | 2 | 1444 | 22.95 | 0.000 | 48 | 24.312 | 0.000
T10.P3.C3 | ∆ mfg log wage, ∆ trade US-China nipw | ∆ trade other-China, ∆ net input other-China | 2 | 2 | 1444 | 31.27 | 0.000 | 48 | 19.553 | 0.002
T10.P3.C4 | ∆ mfg log wage, ∆ trade US-China nipw | ∆ trade other-China, ∆ net input other-China | 2 | 2 | 1444 | 19.40 | 0.001 | 48 | 22.269 | 0.000
T10.P3.C5 | ∆ nonmfg log wage, ∆ trade US-China nipw | ∆ trade other-China, ∆ net input other-China | 2 | 2 | 1444 | 100.88 | 0.000 | 48 | 10.514 | 0.062
T10.P3.C6 | ∆ log transfers, ∆ trade US-China nipw | ∆ trade other-China, ∆ net input other-China | 2 | 2 | 1444 | 21.82 | 0.000 | 48 | 16.716 | 0.005
T10.P4.C1 | ∆ mfg empl, ∆ US-China net imports pw | ∆ trade other-China, ∆ net exports other-China | 2 | 2 | 1444 | 16.52 | 0.002 | 48 | 10.187 | 0.070
T10.P4.C2 | ∆ nonmfg empl, ∆ US-China net imp pw | ∆ trade other-China, ∆ net exports other-China | 2 | 2 | 1444 | 18.44 | 0.001 | 48 | 10.014 | 0.075
T10.P4.C3 | ∆ mfg log wage, ∆ US-China net imp pw | ∆ trade other-China, ∆ net exports other-China | 2 | 2 | 1444 | 37.44 | 0.000 | 48 | 13.290 | 0.021
T10.P4.C4 | ∆ nonmfg log wage, ∆ US-China net imp pw | ∆ trade other-China, ∆ net exports other-China | 2 | 2 | 1444 | 11.21 | 0.024 | 48 | 11.072 | 0.050
T10.P4.C5 | ∆ log transfers, ∆ US-China net imp pw | ∆ trade other-China, ∆ net exports other-China | 2 | 2 | 1444 | 41.77 | 0.000 | 48 | 9.138 | 0.104
T10.P4.C6 | ∆ avg household inc, ∆ US-China net imp pw | ∆ trade other-China, ∆ net exports other-China | 2 | 2 | 1444 | 18.08 | 0.001 | 48 | 13.395 | 0.020
T10.P6.C1 | ∆ mfg empl, ∆ net trade factor (ntf) US-China | ∆ ntf other-China, ∆ net export factor (nef) other-China | 2 | 2 | 1444 | 16.57 | 0.002 | 48 | 14.213 | 0.014
T10.P6.C2 | ∆ nonmfg empl, ∆ ntf US-China | ∆ ntf other-China, ∆ nef other-China | 2 | 2 | 1444 | 43.88 | 0.000 | 48 | 15.611 | 0.008
T10.P6.C3 | ∆ mfg log wage, ∆ ntf US-China | ∆ ntf other-China, ∆ nef other-China | 2 | 2 | 1444 | 24.54 | 0.000 | 48 | 12.087 | 0.034
T10.P6.C4 | ∆ nonmfg log wage, ∆ ntf US-China | ∆ ntf other-China, ∆ nef other-China | 2 | 2 | 1444 | 10.81 | 0.029 | 48 | 18.869 | 0.002
T10.P6.C5 | ∆ log transfers, ∆ ntf US-China | ∆ ntf other-China, ∆ nef other-China | 2 | 2 | 1444 | 15.56 | 0.004 | 48 | 16.692 | 0.005
T10.P6.C6 | ∆ avg household inc, ∆ ntf US-China | ∆ ntf other-China, ∆ nef other-China | 2 | 2 | 1444 | 16.46 | 0.002 | 48 | 29.073 | 0.000
AD 13
T5.P2.C1 | Growth of service employment, Share of routine employment (t-1) | 1950 employment share by commuting zone excluding those corresponding to observation: 1980; 1990; 2000 | 2 | 3 | 2166 | 141.50 | 0.000 | 48 | 57.891 | 0.000
T5.P2.C2 | Growth of service employment, Share of routine employment (t-1) | 1951 employment share by commuting zone excluding those corresponding to observation: 1980; 1990; 2000 | 2 | 3 | 2166 | 122.97 | 0.000 | 48 | 41.735 | 0.000
T5.P2.C3 | Growth of service employment, Share of routine employment (t-1) | 1952 employment share by commuting zone excluding those corresponding to observation: 1980; 1990; 2000 | 2 | 3 | 2166 | 140.57 | 0.000 | 48 | 52.603 | 0.000
T5.P2.C4 | Growth of service employment, Share of routine employment (t-1) | 1953 employment share by commuting zone excluding those corresponding to observation: 1980; 1990; 2000 | 2 | 3 | 2166 | 118.33 | 0.000 | 48 | 47.893 | 0.000
T5.P2.C5 | Growth of service employment, Share of routine employment (t-1) | 1954 employment share by commuting zone excluding those corresponding to observation: 1980; 1990; 2000 | 2 | 3 | 2166 | 106.08 | 0.000 | 48 | 47.248 | 0.000
T5.P2.C6 | Growth of service employment, Share of routine employment (t-1) | 1955 employment share by commuting zone excluding those corresponding to observation: 1980; 1990; 2000 | 2 | 3 | 2166 | 146.81 | 0.000 | 48 | 43.400 | 0.000
T5.P2.C7 | Growth of service employment, Share of routine employment (t-1) | 1956 employment share by commuting zone excluding those corresponding to observation: 1980; 1990; 2000 | 2 | 3 | 2166 | 101.50 | 0.000 | 48 | 32.647 | 0.002
ACJR 11
T6.P.3.C2 | Urbanization in Germany, reform index | French presence in 1850, 1875 and 1900 | 2 | 3 | 74 | 12.74 | 0.239 | 13 | 112.422 | 0.000
MSS 04
T4.C5 | Civil conflict
V etal 12
T3.C6 | Degree of altruism scale, Percentage dead in attacks | Distance to Bujumbura (log), Altitude (log) | 2 | 2 | 278 | 9.45 | 0.051 | 35 | 8.054 | 0.153
T4.C6 | Risk preference, Percentage dead in attacks | Distance to Bujumbura (log), Altitude (log) | 2 | 2 | 213 | 12.28 | 0.015 | 35 | 1.349 | 0.930
T5.C6 | Discount rate, Percentage dead in attacks | Distance to Bujumbura (log), Altitude (log) | 2 | 2 | 266 | 6.69 | 0.153 | 35 | 5.622 | 0.345
T6.C4 | Degree of altruism scale, Percentage dead in attacks | Distance to Bujumbura (log), Altitude (log) | 2 | 2 | 212 | 6.36 | 0.174 | 35 | 6.931 | 0.226
T6.C5 | Risk preference, Percentage dead in attacks | Distance to Bujumbura (log), Altitude (log) | 2 | 2 | 158 | 18.69 | 0.028 | 35 | 6.860 | 0.231
T6.C6 | Discount rate, Percentage dead in attacks | Distance to Bujumbura (log), Altitude (log) | 2 | 2 | 205 | 2.34 | 0.673 | 35 | 4.451 | 0.487
Specification: T: table; P: panel; C: column. n_c: number of clusters; KPST_c: cluster KPST statistic.

References

Acemoglu, D., D. Cantoni, S. Johnson, and J. A. Robinson (2011). The Consequences of Radical Reform: The French Revolution.
American Economic Review 101(7), 3286–3307.
Acemoglu, D. and S. Johnson (2005). Unbundling Institutions. Journal of Political Economy 113(5), 949–995.
Acemoglu, D., S. Johnson, J. A. Robinson, and P. Yared (2008). Income and Democracy. American Economic Review 98(3), 808–842.
Alesina, A., P. Giuliano, and N. Nunn (2013). On the Origins of Gender Roles: Women and the Plough. The Quarterly Journal of Economics 128(2), 469–530.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59(3), 817–858.
Autor, D. H. and D. Dorn (2013). The Growth of Low-Skill Service Jobs and the Polarization of the US Labor Market. American Economic Review 103(5), 1553–1597.
Autor, D. H., D. Dorn, and G. H. Hanson (2013). The China Syndrome: Local Labor Market Effects of Import Competition in the United States. American Economic Review 103(6), 2121–2168.
Dahl, G. B. and L. Lochner (2012). The Impact of Family Income on Child Achievement: Evidence from the Earned Income Tax Credit. American Economic Review 102(5), 1927–1956.
Donald, S. G., N. Fortuna, and V. Pipiras (2007). On Rank Estimation in Symmetric Matrices: The Case of Indefinite Matrix Estimators. Journal of Econometrics, 1217–1232.
Duranton, G. and M. A. Turner (2011). The Fundamental Law of Road Congestion: Evidence from US Cities. American Economic Review 101(6), 2616–2652.
Genton, M. G. (2007). Separable approximations of space-time covariance matrices. Environmetrics, 681–695.
Guggenberger, P., F. Kleibergen, and S. Mavroeidis (2019). A more powerful subvector Anderson Rubin test in linear instrumental variable regression. Quantitative Economics. Forthcoming.
Guggenberger, P., F. Kleibergen, and S. Mavroeidis (2020). A Powerful Subvector Anderson Rubin Test in Linear Instrumental Variables Regression with Conditional Heteroskedasticity. Working paper.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.
Hansen, L. P., J. Heaton, and A. Yaron (1996). Finite sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262–280.
Hansford, T. G. and B. T. Gomez (2010). Estimating the Electoral Effects of Voter Turnout. American Political Science Review 104(2), 268–288.
Johnson, D. S., J. A. Parker, and N. S. Souleles (2006). Household Expenditure and the Income Tax Rebates of 2001. American Economic Review 96(5), 1589–1610.
Kleibergen, F. (2020). Efficient size correct subset inference in homoskedastic linear instrumental variables regression. Journal of Econometrics. Forthcoming.
Kleibergen, F. and R. Paap (2006). Generalized reduced rank tests using the singular value decomposition. Journal of Econometrics 133(1), 97–126.
Ledoit, O. and M. Wolf (2012). Nonlinear shrinkage estimation of large-dimensional covariance matrices. Annals of Statistics, 1024–1060.
Ledoit, O. and M. Wolf (2015). Spectrum Estimation: a unified approach for covariance estimation and PCA in large dimensions. Journal of Multivariate Analysis, 360–384.
Ledoit, O. and M. Wolf (2018). Optimal estimation of a large-dimensional covariance matrix under Stein's loss. Bernoulli, 3791–3832.
Lu, N. and D. L. Zimmermann (2005). The likelihood ratio test for a separable covariance matrix. Statistics and Probability Letters, 449–457.
Miguel, E., S. Satyanath, and E. Sergenti (2004). Economic Shocks and Civil Conflict: An Instrumental Variables Approach. Journal of Political Economy 112(4), 725–753.
Mitchell, M., M. Genton, and M. Gumpertz (2006). A likelihood ratio test for separability of covariances. Journal of Multivariate Analysis, 1025–1043.
Newey, W. and F. Windmeijer (2009). GMM with many weak moment conditions. Econometrica 77(3), 687–719.
Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55(3), 703–708.
Nunn, N. (2008). The Long-Term Effects of Africa's Slave Trades. The Quarterly Journal of Economics 123(1), 139–176.
Parker, J. A., N. S. Souleles, D. S. Johnson, and R. McClelland (2013). Consumer Spending and the Economic Stimulus Payments of 2008. American Economic Review 103(6), 2530–2553.
Tanaka, T., C. F. Camerer, and Q. Nguyen (2010). Risk and Time Preferences: Linking Experimental and Household Survey Data from Vietnam. American Economic Review 100(1), 557–571.
Van Loan, C. and N. Pitsianis (1993). Approximation with Kronecker products. In Linear Algebra for Large Scale and Real-Time Applications, NATO Adv. Sci. Inst. Ser. E Appl. Sci. 232, pp. 293–314. Kluwer Academic Publishers.
Velu, R. and K. Herman (2017). Separable Covariance Matrices and Kronecker Approximations. Procedia Computer Science, 1019–1029.
Voors, M. J., E. E. Nillesen, P. Verwimp, E. H. Bulte, R. Lensink, and D. P. Van Soest (2012). Violent Conflict and Behavior: A Field Experiment in Burundi. American Economic Review 102(2), 941–964.
Werner, K., M. Jansson, and P. Stoica (2008). On estimation of covariance matrices with Kronecker product structure. IEEE Transactions on Signal Processing, 478–491.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48(4), 817–838.
Yogo, M. (2004). Estimating the Elasticity of Intertemporal Substitution when Instruments are Weak. Review of Economics and Statistics 86(3), 797–810.