arXiv [econ.EM], Nov

Dependence-Robust Inference Using Resampled Statistics∗

Michael P. Leung†

December 2, 2020
Abstract.
We develop inference procedures robust to general forms of weak dependence. The procedures use test statistics constructed by resampling data in a manner that does not depend on the unknown correlation structure of the data. We prove that the statistics are asymptotically normal under the weak requirement that the target parameter can be consistently estimated at the parametric rate. This holds for regular estimators under many well-known forms of weak dependence and justifies the claim of dependence-robustness. We consider applications to settings with unknown or complicated forms of dependence, with various forms of network dependence as leading examples. We develop tests for both moment equalities and inequalities.
JEL Codes: C12, C31
Keywords: resampling, dependent data, social networks, clustered standard errors

∗ First draft: Sept. 2017. I thank three anonymous referees for comments that improved the quality of the paper. I also thank Eric Auerbach, Vittorio Bassi, Jinyong Hahn, Roger Moon, Hashem Pesaran, Kevin Song, and seminar audiences at UC Davis, USC INET, the 2018 Econometric Society Winter Meetings, the 2018 Econometrics Summer Masterclass and Workshop at Warwick, and the NetSci2018 Satellite on Causal Inference and Design of Experiments. This research is supported by NSF Grant SES-1755100.
† Department of Economics, University of Southern California. E-mail: [email protected].

This paper builds on randomized subsampling tests due to Song (2016) and proposes inference procedures for settings in which the dependence structure of the data is complex or unknown. This is useful for a variety of applications using, for example, network data, clustered data when cluster memberships are imperfectly observed, or spatial data with unknown locations. The proposed procedures compare a test statistic, constructed using a set of resampled observations, to a critical value constructed either using a normal approximation or by resampling. Computation is the same regardless of the dependence structure of the data. We prove that our procedures are asymptotically valid under the weak requirement that the target parameter can be consistently estimated at the $\sqrt{n}$ rate, a condition satisfied by most forms of weakly dependent data for regular estimators. In this sense, inference using our resampled statistics is robust to quite general forms of weak dependence.

To illustrate the idea, consider the simple problem of inference on a scalar population mean $\mu_0$. Typically we assume that the sample mean $\bar{X}$ is asymptotically normal:
$$\sqrt{n}\,(\bar{X} - \mu_0) \xrightarrow{d} N(0, \Sigma).$$
However, in a setting with complex or unknown forms of dependence, it may be difficult or unclear how to estimate $\Sigma$ due to the covariance terms.
A simple statistic we propose for inference in this setting is
$$\tilde{T}_M = \sqrt{R_n}\,\hat{\Sigma}^{-1/2}(\bar{X}^* - \mu_0),$$
where $\bar{X}^*$ is the mean of $R_n$ draws with replacement from the data and $\hat{\Sigma} = n^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$ is the (naive) sample variance. Using the identity
$$\tilde{T}_M = \underbrace{\tilde{T}_M - E[\tilde{T}_M \mid X]}_{[I]} + \underbrace{E[\tilde{T}_M \mid X]}_{[II]},$$
where $X$ is the data, we show that $[I] \xrightarrow{d} N(0,1)$ and the bias term $[II]$ is asymptotically negligible if $R_n$ is chosen to diverge at a sufficiently slow rate. We can therefore compare $\tilde{T}_M$ to a normal critical value to conduct inference on $\mu_0$, regardless of the underlying dependence structure. As we later discuss, larger values of $R_n$ generate higher power through a faster rate of convergence for $[I]$ but also a larger bias term $[II]$, which generates size distortion. Thus for practical implementation, we suggest a rule of thumb for $R_n$ that accounts for this trade-off.

Song (2016) proposes randomized subsampling statistics for testing moment equalities when the data satisfies a particular form of weak dependence known as "local dependence," in which the dependence structure of the data is characterized by a graph. The novelty of his procedure is that its implementation and asymptotic validity do not require knowledge of this graph. We study essentially the same test statistic he proposes but generalize his theoretical results, showing asymptotic validity under the substantially weaker requirement of estimability at a $\sqrt{n}$ rate, which significantly broadens the applicability of the method. Additionally, we propose new methods for testing moment inequalities, which are important, for example, in certain network applications.

Many resampling methods are available for inference using spatial, temporal, and clustered data when the dependence structure is known (e.g. Cameron et al., 2008; Lahiri, 2013; Politis et al., 1999).
Knowledge of the dependence structure is commonly exploited by resampling blocks of neighboring observations, but this requires knowledge of the neighborhood structure. These procedures are used to construct critical values for a test statistic computed on the original dataset. For the critical values to be asymptotically valid, resampling has to be implemented in a way that mimics the dependence structure of the data, which requires information about this structure. In contrast, our procedures involve computing a resampled test statistic and critical values based on its limiting distribution conditional on the data. Hence, there is no need to mimic the actual dependence structure of the data, which is why our procedures are dependence-robust.

Of course, the broad applicability of our procedures comes at a cost. The main drawback is inefficiency due to the fact that the test statistics are essentially computed from a subsample of observations. In contrast, the test statistic under conventional subsampling, for example, utilizes the full sample. Thus, in settings where inference procedures exist, our resampled statistics suffer from slower rates of convergence, which yield tests with lower power and can exacerbate finite-sample concerns such as weak instruments. We interpret this as the cost of dependence robustness. Our objective is not to propose a procedure that is competitive with existing procedures but rather to provide a broadly applicable and robust inference procedure that can be useful when little is known about the dependence structure or when this structure is complex and no inference procedure is presently available.

[Footnote: The U-type statistic we study below is due to Song (2016). Our results for the mean-type statistic $\tilde{T}_M$ were posted prior to the 2018 version of his paper, which adds results on an "M-type" statistic analogous to $\tilde{T}_M$.]

We consider four applications. The first is regression with unknown forms of weak dependence.
This setting is relevant when the dependent or independent variables are functions of a social network, as in network regressions (Chandrasekhar and Lewis, 2016). Another special case is cluster dependence when the level of clustering is unknown and the number of clusters is small, settings in which conventional clustered standard errors can perform poorly (Cameron and Miller, 2015). The second application is estimating treatment spillovers on a partially observed network. The third is inference on network statistics, a challenging setting because different network formation models induce different dependence structures. The fourth application is testing for a power-law distribution, a problem that has received a great deal of attention in economics, network science, biology, and physics (Barabási and Albert, 1999; Gabaix, 2009; Newman, 2005). Widely used methods in practice assume that the underlying data is i.i.d. (Clauset et al., 2009; Klaus et al., 2011), which is often implausible in applications involving spatial, financial, or network data.

The outline of the paper is as follows. The next section introduces our inference procedures and provides intuition for why they work. We then discuss four applications in §3, followed by an empirical illustration on testing for power-law degree distributions in §4. Next, §5 states formal results on the asymptotic validity of our procedures. In §6, we present simulation studies using four different data-generating processes, each corresponding to one of the discussed applications. Lastly, §7 concludes.

We begin with a description of our proposed inference procedures. Throughout, let $X = \{X_i\}_{i=1}^n \subseteq \mathbb{R}^m$ be a set of $n$ identically distributed random vectors with possibly dependent row elements. Denote the sample mean of $X$ by $\bar{X}$. The goal is to conduct inference on some parameter $\mu_0 \in \mathbb{R}^m$.
A simple example is the population mean $\mu_0 = E[X_1]$, but we will also consider other parameters when discussing asymptotically linear estimators. Our main assumption will require $X$ to be weakly dependent in the sense that $\bar{X}$ is $\sqrt{n}$-consistent for $\mu_0$.

We first consider testing the null hypothesis that $\mu_0 = \mu$ for some $\mu \in \mathbb{R}^m$ and constructing confidence regions for $\mu_0$. Let $R_n$ be a positive integer and $\Pi$ the set of all bijections (permutation functions) on $\{1, \dots, n\}$. Let $\{\pi_r\}_{r=1}^{R_n}$ be a set of $R_n$ i.i.d. uniform draws from $\Pi$ and $\pi = (\pi_1, \dots, \pi_{R_n})$. Define the sample variance matrix $\hat{\Sigma} = n^{-1}\sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})'$.

We focus on two test statistics. The first is the mean-type statistic, given by
$$T_M(\mu; \pi) = \tilde{T}_M(\mu; \pi)'\,\tilde{T}_M(\mu; \pi), \quad\text{where}\quad \tilde{T}_M(\mu; \pi) = \frac{1}{\sqrt{R_n}}\sum_{r=1}^{R_n} \hat{\Sigma}^{-1/2}\big(X_{\pi_r(1)} - \mu\big).$$
That is, $\tilde{T}_M(\mu; \pi)$ is computed by drawing $R_n$ observations with replacement from $\{\hat{\Sigma}^{-1/2}(X_i - \mu)\}_{i=1}^n$, then taking the average and scaling up by $\sqrt{R_n}$. Note that we compute $\hat{\Sigma}$ using the full sample.

The second test statistic is the U-type statistic, which is given by
$$T_U(\mu; \pi) = \frac{1}{\sqrt{mR_n}}\sum_{r=1}^{R_n} \big(X_{\pi_r(1)} - \mu\big)'\,\hat{\Sigma}^{-1}\big(X_{\pi_r(2)} - \mu\big)$$
and essentially follows Song (2016). Unlike the mean-type statistic, here we resample pairs of observations with replacement and compute a quadratic form.

Inference Procedures.
We prove that if $\bar{X}$ is $\sqrt{n}$-consistent for $\mu_0$, then (under regularity conditions)
$$\tilde{T}_M(\mu_0; \pi) \xrightarrow{d} N(0, I_m) \ \text{ if } R_n/n \to 0, \quad\text{and}\quad T_U(\mu_0; \pi) \xrightarrow{d} N(0, 1) \ \text{ if } \sqrt{R_n}/n \to 0, \tag{1}$$
where $I_m$ is the $m \times m$ identity matrix. The requirement of $\sqrt{n}$-consistency can be verified using CLTs for a wide range of notions of weak dependence, including mixing and near-epoch dependence. Also see §3 for references for CLTs for network data. There are some examples of dependent data that violate $\sqrt{n}$-consistency. One is cluster dependence with many clusters, large cluster sizes, and strongly dependent observations within clusters (Hansen and Lee, 2019). We discuss in Remark 4 how other rates of convergence can be accommodated by adjusting $R_n$.

Result (1) enables us to construct critical values for testing. For example, to test the null that $\mu_0 = \mu$ against a two-sided alternative, we can use
$$1\{T_M(\mu; \pi) > q_{1-\alpha}\} \quad\text{or}\quad 1\{T_U(\mu; \pi) > z_{1-\alpha}\}, \tag{2}$$
where $z_{1-\alpha}$ and $q_{1-\alpha}$ are respectively the $(1-\alpha)$-quantiles of the standard normal distribution and chi-square distribution with $m$ degrees of freedom. Note that the U-type statistic is two-sided in nature, which is why in (2) we simply compare $T_U(\mu; \pi)$, rather than its absolute value, to a normal quantile. To test one-sided alternatives with the U-type statistic, we can additionally exploit the sign of $\bar{X}$, as done in the application in §3.4 and the moment inequality test below. In particular, we can choose to reject only if the test statistic exceeds its critical value and the sign of $\bar{X}$ is positive, for instance.

In the case where $m = 1$, the limit of the mean-type statistic yields the following simple confidence interval for $\mu_0$:
$$\frac{1}{R_n}\sum_{r=1}^{R_n} X_{\pi_r(1)} \pm z_{1-\alpha/2}\,\frac{\hat{\Sigma}^{1/2}}{\sqrt{R_n}}. \tag{3}$$
Alternatively, we can use the U-type statistic to obtain a confidence interval by test inversion.

Why This Works.
To see the intuition behind (1), consider the mean-type statistic. Define
$$W_{M,r} = \hat{\Sigma}^{-1/2}\big(X_{\pi_r(1)} - \mu_0\big). \tag{4}$$
The statistic decomposes as
$$\tilde{T}_M(\mu_0; \pi) = \underbrace{\frac{1}{\sqrt{R_n}}\sum_{r=1}^{R_n}\big(W_{M,r} - E[W_{M,r} \mid X]\big)}_{[I]} + \underbrace{\frac{1}{\sqrt{R_n}}\sum_{r=1}^{R_n} E[W_{M,r} \mid X]}_{[II]}. \tag{5}$$
Some algebra shows that $[II] = (R_n/n)^{1/2}\,\hat{\Sigma}^{-1/2}\,n^{-1/2}\sum_{i=1}^n (X_i - \mu_0)$. Since $R_n/n = o(1)$ and $n^{-1/2}\sum_{i=1}^n (X_i - \mu_0) = O_p(1)$ under the $\sqrt{n}$-consistency condition, we have $[II] = o_p(1)$, provided the sample variance converges to a positive-definite matrix. Since the random permutations are i.i.d. conditional on $X$, $[I] \xrightarrow{d} N(0, I_m)$. The proof for $T_U(\mu_0; \pi)$ follows a similar logic.

Choice of $R_n$ and Statistic. In choosing the tuning parameter $R_n$, we face the following trade-off. A larger value of $R_n$ corresponds to using a larger number of observations to construct the resampled statistic, which translates to higher power through a faster rate of convergence for part $[I]$ of decomposition (5). On the other hand, smaller values of $R_n$ ensure that the bias term $[II]$ in (5) is negligible, which is important for size control. As is clear from the previous proof sketch, for the mean-type statistic, the bias and rate of convergence are respectively of order $(R_n/n)^{1/2}$ and $R_n^{-1/2}$. For the U-type statistic, they are instead $\sqrt{R_n}/n$ and $R_n^{-1/4}$. We consider choosing $R_n$ to minimize the sum of these terms, yielding
$$R_n^M = \sqrt{n} \quad\text{and}\quad R_n^U = (n/2)^{4/3} \tag{6}$$
for the mean- and U-type statistics, respectively. The choice of minimizing the sum of the two is only heuristic but at least reflects an asymptotic trade-off between control of Type I and Type II errors. We only seek to provide a practical recommendation that accounts in some way for the trade-off and leave more sophisticated and data-dependent choices of $R_n$ to future work.
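To make the pieces above concrete, the following is a minimal sketch of the two statistics, the rejection rules in (2), the interval (3), and the rule of thumb (6), restricted to the scalar case $m = 1$. It is an illustration under stated assumptions rather than the paper's implementation: the function names are hypothetical, $R_n$ is rounded to the nearest integer (the text does not specify a rounding rule), and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist

def rule_of_thumb(n):
    """Rule of thumb (6): R_n^M = sqrt(n), R_n^U = (n/2)^(4/3); rounding is an assumption."""
    return round(math.sqrt(n)), round((n / 2) ** (4 / 3))

def naive_variance(x):
    """Naive sample variance Sigma-hat, computed on the full sample."""
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x) / len(x)

def mean_type(x, mu, R, rng):
    """Return (T-tilde_M, resampled mean) from R draws with replacement."""
    draws = [x[rng.randrange(len(x))] for _ in range(R)]  # X_{pi_r(1)}
    xstar = sum(draws) / R
    t_tilde = math.sqrt(R) * (xstar - mu) / math.sqrt(naive_variance(x))
    return t_tilde, xstar

def u_type(x, mu, R, rng):
    """Return T_U for m = 1 from R resampled pairs of distinct observations."""
    var = naive_variance(x)
    total = 0.0
    for _ in range(R):
        i, j = rng.sample(range(len(x)), 2)  # (pi_r(1), pi_r(2))
        total += (x[i] - mu) * (x[j] - mu) / var
    return total / math.sqrt(R)

def two_sided_tests(x, mu, alpha=0.05, seed=0):
    """Apply the rejection rules in (2) and the interval (3) with R_n from (6)."""
    rng = random.Random(seed)
    R_M, R_U = rule_of_thumb(len(x))
    z = NormalDist().inv_cdf
    t_tilde, xstar = mean_type(x, mu, R_M, rng)
    t_u = u_type(x, mu, R_U, rng)
    half = z(1 - alpha / 2) * math.sqrt(naive_variance(x) / R_M)
    return {
        # For m = 1 the chi-square(1) quantile equals z_{1-alpha/2} squared.
        "reject_mean_type": t_tilde ** 2 > z(1 - alpha / 2) ** 2,
        "reject_u_type": t_u > z(1 - alpha),
        "ci": (xstar - half, xstar + half),
    }
```

Because each $\pi_r$ is a uniform permutation, $(\pi_r(1), \pi_r(2))$ is a uniformly drawn ordered pair of distinct indices, which is why the sketch samples two indices without replacement within a draw and independently across draws.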
We also note that simulation results in §6 show that the tests perform similarly for a variety of choices of $R_n$ around (6).

The rates of convergence of the mean- and U-type statistics when choosing $R_n$ according to (6) are respectively $n^{-1/4}$ and $n^{-1/3}$. In general, the U-type statistic has better power properties, as shown theoretically in Song (2016) in the context of locally dependent data. Our simulations confirm this for (6) across a wider range of dependence structures, which leads us to recommend use of the U-type over the mean-type statistic for smaller samples. The main appeal of the mean-type statistic is the ease of confidence interval construction (3).

Asymptotically Linear Estimators.
Suppose we observe identically distributed data $Z = \{Z_i\}_{i=1}^n$, and we are interested in a parameter $\beta_0 \in \mathbb{R}^d$. Suppose there exists a parameter $\theta_0$ and a function $\psi$ satisfying $E[\psi(Z_1; \beta_0, \theta_0)] = 0$, and let $\hat{\theta}$ be an estimate of $\theta_0$. Consider an estimator $\hat{\beta}$ that is asymptotically linear in the sense that
$$\sqrt{n}\,(\hat{\beta} - \beta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(Z_i; \beta_0, \hat{\theta}) + o_p(1).$$
For example, in the case of maximum likelihood, $\hat{\theta}$ is the sample Hessian, and $\psi$ is the score function times the inverse Hessian. We can then apply our procedures to conduct inference on $\beta_0$ by defining
$$X_i = \psi(Z_i; \beta_0, \hat{\theta}) \tag{7}$$
and $\mu_0 = E[\psi(Z_1; \beta_0, \theta_0)] = 0$. Note that in this example, $\mu_0$ is not the population mean of the "data" $\{X_i\}_{i=1}^n$. As discussed below in Remark 7, under regularity conditions, our procedures are asymptotically valid if $\hat{\beta}$ and $\hat{\theta}$ are $\sqrt{n}$-consistent for $\beta_0$ and $\theta_0$, respectively.

Remark 1.
As pointed out to us by Eric Auerbach, an alternative dependence-robust test (here for the case $m = 1$) is to reject when $|Z + \sqrt{n}(\bar{X} - \mu_0)/h_n| > z_{1-\alpha/2}$, where $Z \sim N(0, 1)$ is independent of $X$ and $h_n$ is a diverging sequence. Since $h_n$ is eventually larger than the asymptotic variance, the second term vanishes, and the test has asymptotic size $\alpha$ under the null. Under the alternative, the power tends to one at rate $\sqrt{n}/h_n$, which is always slower than $\sqrt{n}$. Thus, this test has similar power properties to our test, and first-order asymptotics do not distinguish between them. Nonetheless, we do not view this test as a serious practical alternative. Choosing the tuning parameter $h_n$ literally corresponds to choosing the size of the standard error, which is clearly problematic in practice. Indeed, for any fixed choice of $h_n$, this test is almost always either conservative or anti-conservative. In contrast, our conditions suggest that for the U-type statistic (for example), $R_n$ should not be chosen larger than $n^2$ for the claim of size control to be considered credible in finite samples, given that $\sqrt{R_n}/n \to 0$ is required for asymptotic validity. We also provide guidance for choosing $R_n$ in practice (6) and validate this choice across a wide range of dependence structures in extensive simulation experiments.

Remark 2.
Since the test statistics are random conditional on the data due to the permutation draws $\pi$, different researchers can reach different conclusions with the same dataset. This occurs with small probability for $n$ large, but for smaller samples, it is useful to have a procedure less sensitive to $\pi$. In his §3.5, Song (2016) proposes a procedure that allows the researcher to make the influence of $\pi$ as small as desired, which we reproduce here. Let $\{\tilde{\pi}_{r\ell} : \ell = 1, \dots, L;\ r = 1, \dots, R_n\}$ be i.i.d. uniform draws from $\Pi$ and $\tilde{\pi}_\ell = (\tilde{\pi}_{1\ell}, \dots, \tilde{\pi}_{R_n\ell})$. Define the "randomized confidence function"
$$f_L(\mu; \alpha) = \frac{1}{L}\sum_{\ell=1}^L 1\{T_U(\mu; \tilde{\pi}_\ell) \le z_{1-\alpha}\}.$$
We can also use $T_M(\mu; \tilde{\pi}_\ell)$ and $q_{1-\alpha}$ in place of $T_U(\mu; \tilde{\pi}_\ell)$ and $z_{1-\alpha}$, respectively. Note that by taking $L$ as large as desired, we can make $f_L(\mu; \alpha)$ arbitrarily close to a nonrandom function of the data by the law of large numbers, which solves the randomness problem. To see how this function can be used for inference, for any small $\beta \in (0, \alpha)$ chosen by the econometrician, define the confidence region
$$C_L(\alpha; \beta) = \{\mu \in \mathbb{R}^m : f_L(\mu; \alpha - \beta) \ge 1 - \alpha\}.$$
Using (1), it is straightforward to show that
$$\lim_{n \to \infty}\lim_{L \to \infty} P\big(\mu_0 \in C_L(\alpha; \beta)\big) \ge 1 - \alpha,$$
so the confidence region has the desired asymptotic coverage. For the case of locally dependent data, this follows from Corollary 3.1 of Song (2016). It immediately generalizes to other forms of weak dependence by applying our Theorem 1.

This subsection considers testing the null $\mu_0 \le 0$ for $\mu_0 = E[X_1]$, where "$\le$" denotes component-wise inequality. This is relevant, for example, for inference in strategic models of network formation (Sheng, 2020) and models of social interactions (Li and Zhao, 2016). Let $T_{U,k}(\mu_k; \pi)$ be the U-type statistic applied to data $\{X_{i,k}\}_{i=1}^n$, where $X_{i,k}$ is the $k$th component of $X_i$ and $\mu_k \in \mathbb{R}$. Also let $\bar{X}_k$ be the $k$th component of $\bar{X}$ and $\hat{\Sigma}_{kk}$ the $k$th diagonal of $\hat{\Sigma}$.
We propose the test statistic
$$Q_n(\pi) = \max_{1 \le k \le m}\big\{T_{U,k}(0; \pi) - \hat{\lambda}_k 1\{\bar{X}_k < 0\}\big\},$$
where
$$\hat{\lambda}_k = \bar{X}_k\,\hat{\Sigma}_{kk}^{-1}\,\frac{1}{\sqrt{mR_n}}\sum_{r=1}^{R_n}\big(X_{\pi_r(1),k} + X_{\pi_r(2),k}\big) - \sqrt{\frac{R_n}{m}}\,\hat{\Sigma}_{kk}^{-1}\,\bar{X}_k^2.$$
To construct the critical value, define
$$\tilde{Q}_n(\pi) = \max_{1 \le k \le m} T_{U,k}(\bar{X}_k; \pi). \tag{8}$$
Let $c_{1-\alpha}$ be the $(1-\alpha)$-quantile of the conditional-on-$X$ distribution of $\tilde{Q}_n(\pi)$. Our proposed test is to reject if and only if $\phi_n = 1$ for
$$\phi_n \equiv 1\{Q_n(\pi) > c_{1-\alpha}\}. \tag{9}$$
In practice, we can approximate $c_{1-\alpha}$ arbitrarily well by resampling $\pi$ $L$ times, computing $\tilde{Q}_n(\pi)$ for each draw, and then taking the appropriate sample quantile of this set of statistics. Formally, this leads to the feasible critical value
$$c_{L,1-\alpha} \equiv \inf\Big\{c > 0 : \frac{1}{L}\sum_{\ell=1}^L 1\big\{\tilde{Q}_n(\tilde{\pi}_\ell) > c\big\} \le \alpha\Big\}, \tag{10}$$
where $\tilde{\pi}_\ell = (\tilde{\pi}_{1\ell}, \dots, \tilde{\pi}_{R_n\ell})$ and $\{\tilde{\pi}_{r\ell} : \ell = 1, \dots, L;\ r = 1, \dots, R_n\}$ are i.i.d. uniform draws from $\Pi$.

We show in §5.2 that (9) uniformly controls size. The intuition behind the test is as follows. Some algebra shows that
$$Q_n(\pi) = \max_{1 \le k \le m}\big\{T_{U,k}(\bar{X}_k; \pi) + \hat{\lambda}_k 1\{\bar{X}_k \ge 0\}\big\}, \tag{11}$$
which is similar to $\tilde{Q}_n(\pi)$, except for the presence of the $\hat{\lambda}_k 1\{\bar{X}_k \ge 0\}$ term. The indicator in the latter serves to detect the sign of $\mu_{0k}$, the $k$th component of $\mu_0$, and thus give the test power. To see this, first note that in the appendix, we show in (A.3) and (A.4) in the proof of Proposition 1 that, for any $k$, $\hat{\lambda}_k \approx (R_n^{1/4}\mu_{0k})^2\,m^{-1/2}\,\Sigma_{kk}^{-1}$ and $R_n^{1/4}\bar{X}_k \approx R_n^{1/4}\mu_{0k}$, respectively. We apply these two "facts" to the case $m = 1$ for illustration.
• Under a fixed null, $\hat{\lambda}_k 1\{\bar{X}_k \ge 0\} = \hat{\lambda}_k 1\{R_n^{1/4}\bar{X}_k \ge 0\} \approx 0$, since either $R_n^{1/4}\mu_{0k} = 0$, in which case $\hat{\lambda}_k \approx 0$ by the first fact above, or $R_n^{1/4}\mu_{0k} < 0$, in which case the indicator is eventually zero by the second fact above.
Consequently, $Q_n(\pi)$ and $\tilde{Q}_n(\pi)$ have the same asymptotic distributions, and the test controls size.
• Under the alternative, $R_n^{1/4}\bar{X}_k$ is instead eventually positive, so $\hat{\lambda}_k 1\{R_n^{1/4}\bar{X}_k \ge 0\} \approx \hat{\lambda}_k$, which is positive with high probability. Indeed, for a fixed alternative, $\hat{\lambda}_k$ diverges by the first fact above and therefore so does $Q_n(\pi)$. On the other hand, $\tilde{Q}_n(\pi)$ has a non-degenerate limit distribution, so the test is consistent.

Remark 3.
For the case $m = 1$, dropping the subscript $k$, we have directly by Theorem A.2 in the appendix that $\tilde{Q}_n(\pi) \xrightarrow{d} N(0, 1)$. Then in place of $\phi_n$ we can also use the test
$$\tilde{\phi}_n = 1\big\{T_U(0; \pi) - \hat{\lambda}\,1\{\bar{X} < 0\} > z_{1-\alpha}\big\},$$
whose left-hand side is $Q_n(\pi)$ when $m = 1$. This is similar to (9), except it uses the asymptotic critical value $z_{1-\alpha}$ rather than the permutation critical value $c_{1-\alpha}$. This has the advantage of being computationally simpler.

We next discuss four applications of the proposed methods.
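Before turning to the applications, the moment inequality test in (9) and (10) can be sketched in code. This is a hedged illustration, not the author's implementation: the function names are hypothetical, the data are assumed to be a list of $n$ rows with $m$ components each, and $T_{U,k}$ is normalized by $\sqrt{mR_n}$ so that it matches the normalization appearing in $\hat{\lambda}_k$ and identity (11) holds exactly.

```python
import math
import random

def draw_pairs(n, R, rng):
    """R draws of (pi_r(1), pi_r(2)): ordered pairs of distinct indices."""
    return [tuple(rng.sample(range(n), 2)) for _ in range(R)]

def column_stats(X):
    """Component-wise means and naive variances (the diagonal of Sigma-hat)."""
    n, m = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(m)]
    variances = [sum((row[k] - means[k]) ** 2 for row in X) / n for k in range(m)]
    return means, variances

def t_u_k(X, k, mu_k, var_k, pairs):
    """U-type statistic for component k (normalization sqrt(m * R_n) is an assumption)."""
    m, R = len(X[0]), len(pairs)
    s = sum((X[i][k] - mu_k) * (X[j][k] - mu_k) for i, j in pairs)
    return s / (var_k * math.sqrt(m * R))

def lambda_hat(X, k, mean_k, var_k, pairs):
    """The correction term lambda-hat_k defined alongside Q_n."""
    m, R = len(X[0]), len(pairs)
    s = sum(X[i][k] + X[j][k] for i, j in pairs)
    return mean_k * s / (var_k * math.sqrt(m * R)) - math.sqrt(R / m) * mean_k ** 2 / var_k

def q_stat(X, pairs):
    """Test statistic Q_n(pi)."""
    means, variances = column_stats(X)
    return max(
        t_u_k(X, k, 0.0, variances[k], pairs)
        - lambda_hat(X, k, means[k], variances[k], pairs) * (means[k] < 0)
        for k in range(len(means))
    )

def q_tilde(X, pairs):
    """Recentered statistic (8), used only to build the critical value."""
    means, variances = column_stats(X)
    return max(t_u_k(X, k, means[k], variances[k], pairs) for k in range(len(means)))

def critical_value(draws, alpha):
    """Feasible critical value (10): smallest c >= 0 with at most alpha*L draws above c."""
    L = len(draws)
    k = L - math.floor(alpha * L)
    return max(0.0, sorted(draws)[k - 1])

def inequality_test(X, R, L, alpha, rng):
    """Return (reject, Q_n, c_{L,1-alpha}) for the null that every component mean is <= 0."""
    n = len(X)
    q = q_stat(X, draw_pairs(n, R, rng))
    crit = critical_value([q_tilde(X, draw_pairs(n, R, rng)) for _ in range(L)], alpha)
    return q > crit, q, crit
```

The same set of resampled pairs is shared across components $k$ within a draw, since $\pi$ is common to all $T_{U,k}$; fresh pairs are drawn for each of the $L$ replications of $\tilde{Q}_n$.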
Let $Y$ be an $n$-dimensional outcome vector and $D$ an $n \times k$ matrix of covariates. Let $D_i$ denote the $i$th row of $D$. Consider the standard linear regression model
$$Y_i = D_i\beta_0 + \varepsilon_i.$$
Our goal is inference on the $j$th component of $\beta_0$, denoted by $\beta_{0j}$.

We assume $\{(D_i, \varepsilon_i)\}_{i=1}^n$ is identically distributed but possibly dependent. Common sources of dependence are clustering (Bertrand et al., 2004), spatial autocorrelation (Barrios et al., 2012; Bester et al., 2011), and network autocorrelation (Acemoglu et al., 2015). Often the precise form of dependence may be unknown, or there may be insufficient data to compute conventional standard errors, for example if we do not fully observe the clusters, the spatial locations of the observations, or the network. If the OLS estimator is $\sqrt{n}$-consistent, however, inference on $\beta_{0j}$ is possible using resampled statistics.

In the context of network applications, a common exercise is to regress an outcome on some measure of the network centrality of a node (Chandrasekhar and Lewis, 2016). Such measures are inherently correlated across nodes even when links are i.i.d. Worse, different models of network formation can lead to different expressions for the asymptotic variance. To our knowledge, there is no universal inference method for weakly dependent network data.

To apply our procedure, we write the estimator in the form of a sample mean. Let $W = n(D'D)^{-1}D'$ and $W_{ji}$ its $ji$th component. Then the OLS estimator for $\beta_{0j}$ can be written as
$$\frac{1}{n}\sum_{i=1}^n W_{ji}Y_i.$$
This fits into the setup of (7) if we define $Z_i = (Y_i, D_i)$, $\hat{\theta} = n^{-1}D'D$, and
$$\psi(Z_i; \beta_0, \hat{\theta}) = W_{ji}Y_i - \beta_{0j}.$$
Thus, when computing our test statistics, we resample elements of $X$ for $X_i = W_{ji}Y_i - \beta_{0j}$.

The main assumption required for the asymptotic validity of our procedure is $\sqrt{n}$-consistency of the least-squares estimator and $\hat{\theta}$ for their population analogs.
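As an illustration of the construction above, the sketch below builds $X_i = W_{ji}Y_i - \beta_{0j}$ for the slope of a regression on an intercept and a single covariate and then computes the U-type statistic with hypothesized mean zero. The restriction to one covariate (so that $(D'D)^{-1}$ has a closed form), the function names, and the simulated data are assumptions made purely for the example.

```python
import math
import random

def slope_influence_terms(y, d, beta_j):
    """X_i = W_ji * Y_i - beta_j for the slope in a regression of y on (1, d)."""
    n = len(y)
    s1, s2 = sum(d), sum(di * di for di in d)
    det = n * s2 - s1 * s1  # determinant of D'D when D_i = (1, d_i)
    # Slope row of W = n (D'D)^{-1} D'.
    w = [n * (n * di - s1) / det for di in d]
    return [wi * yi - beta_j for wi, yi in zip(w, y)]

def u_type(x, R, rng):
    """U-type statistic for scalar X_i with hypothesized mean zero (m = 1)."""
    n = len(x)
    xbar = sum(x) / n
    var = sum((xi - xbar) ** 2 for xi in x) / n  # naive variance, full sample
    total = 0.0
    for _ in range(R):
        i, j = rng.sample(range(n), 2)  # (pi_r(1), pi_r(2))
        total += x[i] * x[j] / var
    return total / math.sqrt(R)

# Hypothetical usage on simulated data with true slope 0.5.
rng = random.Random(0)
d = [rng.gauss(0, 1) for _ in range(200)]
y = [1.0 + 0.5 * di + rng.gauss(0, 1) for di in d]
x = slope_influence_terms(y, d, beta_j=0.5)
R = round((len(x) / 2) ** (4 / 3))  # rule of thumb (6), rounded
print(u_type(x, R, rng))            # compare with the normal quantile z_{1-alpha}
```

By construction the sample mean of $W_{ji}Y_i$ equals the OLS slope, so the mean of the returned terms is the OLS estimate minus the hypothesized $\beta_{0j}$.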
For cluster dependence, this holds under conventional many-cluster asymptotics, where the number of observations in each cluster is small, but the number of clusters is large. However, unlike with clustered standard errors, we need not know the right level of clustering or even observe cluster memberships. We can also allow the number of clusters to be small (fixed), perhaps even equal to one, so long as the data is weakly dependent within clusters in the sense of $\sqrt{n}$-consistency of the estimator. Within-cluster weak dependence is also required by Bakirov and Székely (2006), Canay et al. (2017), Ibragimov and Müller (2010), and Ibragimov and Müller (2016), who propose novel inference procedures for cluster dependence when the number of clusters is small. An advantage of using resampled statistics is that we can allow for only a single cluster and do not require knowledge of cluster memberships.

For spatial dependence, the required conditions can be shown using CLTs for mixing or near-epoch dependent data (Jenish and Prucha, 2012). CLTs for network statistics are mentioned in §3.3.

Suppose we observe data from a randomized experiment on a single network, where for each node $i$, we observe an outcome $Y_i$, a binary treatment assignment $D_i$, the number of nodes connected to $i$ ("network neighbors") $\gamma_i$, and the number of treated network neighbors $T_i$. Consider the following outcome model studied in Leung (2020):
$$Y_i = r(D_i, T_i, \gamma_i, \varepsilon_i)$$
(also see Aronow and Samii, 2017). This departs from the conventional potential outcomes model by allowing $r(\cdot)$ to depend on $T_i$ and $\gamma_i$, violating the stable unit treatment value assumption. The object of interest is the following measure of treatment/spillover effects:
$$E[r(d, t, \gamma, \varepsilon_1) \mid \gamma_1 = \gamma] - E[r(d', t', \gamma, \varepsilon_1) \mid \gamma_1 = \gamma], \tag{12}$$
where $t, t' \le \gamma \in \mathbb{N}$.
The conditioning on $\gamma$ controls for the number of friends. Variation across $d, d'$ identifies the direct causal effect of the treatment, while variation across $t, t'$ identifies a spillover effect. Leung (2020) provides conditions on the network and dependence structure of $\{\varepsilon_i\}_{i=1}^n$ under which the sample analog of (12) is $\sqrt{n}$-consistent. For example, we can allow $\varepsilon_i$ and $\varepsilon_j$ to be correlated if $i$ and $j$ are connected. Therefore, the setup falls within the scope of our assumptions.

Now, suppose the econometrician obtains data $\{W_i\}_{i=1}^n$ for $W_i = (Y_i, D_i, T_i, \gamma_i)$ by snowball-sampling 1-neighborhoods. That is, she first obtains a random sample of units, from which she gathers $(Y_i, D_i)$, and then she obtains the network neighbors of those units and their treatment assignment, from which she gathers $(T_i, \gamma_i)$. This is a very common method of network sampling. However, standard error formulas provided by Aronow and Samii (2017) and Leung (2020) require knowledge of the path distances between observed units for certain error correlation structures. Unfortunately, these are typically unknown under this form of sampling.

Our proposed procedures can be used in this setting. Let $1_i(d, t, \gamma) = 1\{D_i = d, T_i = t, \gamma_i = \gamma\}$. The frequency estimator for the average treatment/spillover effect is given by
$$\frac{\sum_{i=1}^n Y_i 1_i(d, t, \gamma)}{\sum_{i=1}^n 1_i(d, t, \gamma)} - \frac{\sum_{i=1}^n Y_i 1_i(d', t', \gamma)}{\sum_{i=1}^n 1_i(d', t', \gamma)}.$$
This fits into the setup of (7) by defining $Z_i = W_i$,
$$\hat{\theta} = \left(\frac{1}{n}\sum_{i=1}^n 1_i(d, t, \gamma),\ \frac{1}{n}\sum_{i=1}^n 1_i(d', t', \gamma)\right),$$
and
$$\psi(Z_i; \beta_0, \hat{\theta}) = \frac{Y_i 1_i(d, t, \gamma)}{n^{-1}\sum_{i=1}^n 1_i(d, t, \gamma)} - \frac{Y_i 1_i(d', t', \gamma)}{n^{-1}\sum_{i=1}^n 1_i(d', t', \gamma)} - \beta_0,$$
where $\beta_0$ is the hypothesized value of the true average treatment/spillover effect (12).

Inference methods for network statistics are important for network regressions, as discussed in §3.1, and strategic models of network formation (Sheng, 2020).
They are also of independent interest, as stylized facts about the structure of real-world social networks motivate much of the networks literature (Barabási, 2015; Jackson, 2010). These facts are obtained by computing various summary statistics from network data. However, little attempt has been made to account for the sampling variation of these point estimates, perhaps due to the wide variety of network formation models, which induce different dependence structures. This motivates the use of resampled statistics, which can be used to conduct inference on network statistics without taking a stance on the network formation model.

We next consider two stylized facts that have arguably received the most attention in the literature: clustering and power-law degree distributions. This subsection focuses on the former, while the latter is discussed in a more general context in §3.4. For a set of $n$ nodes, let $A$ be a symmetric, binary adjacency matrix that represents a network. Its $ij$th entry $A_{ij}$ is thus an indicator for whether $i$ and $j$ are linked. Define the individual clustering for a node $i$ under network $A$ as
$$Cl_i(A) = \frac{\sum_{j \ne i,\, k \ne j,\, k \ne i} A_{ij}A_{ik}A_{jk}}{\sum_{j \ne i,\, k \ne j,\, k \ne i} A_{ij}A_{ik}},$$
with $Cl_i(A) \equiv 0$ if $i$ has at most one link. The numerator counts the number of pairs $(j, k)$ linked to $i$ that are themselves linked, while the denominator counts the number of pairs linked to $i$. The average clustering coefficient of $A$ is defined as $n^{-1}\sum_{i=1}^n Cl_i(A)$.

This statistic is a common measure of transitivity or clustering, the tendency for individuals with partners in common to associate. A well-known stylized fact in the network literature is that most social networks exhibit nontrivial clustering, where "nontrivial" is defined relative to the null model in which links are i.i.d. (Jackson, 2010). Under the null model, when $n$ is large, the average clustering coefficient is close to the probability of forming a link.
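The individual clustering and average clustering coefficient defined above can be computed directly from an adjacency matrix. The sketch below assumes the network is stored as a list of lists of 0/1 entries with a zero diagonal; the function names are illustrative only.

```python
def individual_clustering(A, i):
    """Cl_i(A): share of ordered pairs of i's neighbors that are themselves linked."""
    n = len(A)
    nbrs = [j for j in range(n) if j != i and A[i][j]]
    if len(nbrs) <= 1:
        return 0.0  # convention: Cl_i(A) = 0 if i has at most one link
    linked = sum(A[j][k] for j in nbrs for k in nbrs if j != k)
    return linked / (len(nbrs) * (len(nbrs) - 1))

def average_clustering(A):
    """Average clustering coefficient: the mean of Cl_i(A) over nodes."""
    return sum(individual_clustering(A, i) for i in range(len(A))) / len(A)

# A triangle on nodes 0, 1, 2 with a pendant node 3 attached to node 0.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print([individual_clustering(A, i) for i in range(4)])
print(average_clustering(A))
```

Both the numerator and the denominator count ordered pairs $(j, k)$, matching the sums in the definition, so the ratio is the same as if unordered pairs were counted.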
Yet, the average clustering coefficient typically appears to be considerably larger than the empirical linking probability in practice, hence the stylized fact (Barabási, 2015, Ch. 3).

In order to assess formally whether average clustering is significantly different from the probability of link formation, we can use the tests given by (2) with
$$X_i = Cl_i(A) - n^{-1}\sum_{j=1}^n A_{ij}.$$
Then $\bar{X}$ is the difference between the average clustering coefficient and the empirical linking probability. To verify $\sqrt{n}$-consistency, we can apply CLTs derived by, for example, Bickel et al. (2011) and Leung and Moon (2020).

Testing for whether data follows a power-law distribution is of wide empirical interest in economics, finance, network science, neuroscience, biology, and physics (Barabási, 2015; Gabaix, 2009; Klaus et al., 2011; Newman, 2005). By "power law" we mean that the probability density or mass function of the data $f(x)$ is proportional to $x^{-\alpha}$ for some positive exponent $\alpha$. Many methods are available for estimating $\alpha$, for example maximum likelihood or regression estimators (Ibragimov et al., 2015). When the data is dependent, the former becomes pseudo-maximum likelihood, but the estimator is still consistent under weak dependence. With an estimate of the power law exponent in hand, it is of interest to test how well the data accords with or deviates from a power law. Standard methods assume that the underlying data is i.i.d. (Clauset et al., 2009), but this is unrealistic for spatial, financial, and network data, motivating the use of resampled statistics.

In the networks literature, for example, a well-known stylized fact is that real-world social networks have power-law degree distributions (Barabási and Albert, 1999), where a network's degree distribution is the distribution of degrees (a node's number of connections) across nodes in the network.
However, statistical evidence for this fact is commonly obtained visually from log-log plots of degree distributions, rather than from formal tests (Holme, 2018). A recent paper by Broido and Clauset (2019) implements formal tests for power laws on a wide variety of network datasets, but their methods assume independent observations, despite the fact that network degrees are typically correlated.

The null hypothesis we next consider testing is motivated by Klaus et al. (2011), which is that the power law fits no better than some reference null distribution, for example exponential or log-normal. This is operationalized using a Vuong test of the null that the expected log-likelihood ratio is zero. The numerator of the likelihood ratio is the power-law distribution with an estimated exponent, and the denominator is the estimated null distribution. Under general misspecification, the log-likelihood ratio is zero if both models poorly fit the data, and less (greater) than zero if the null distribution fits better (worse) (Pesaran, 1987; Vuong, 1989).

For i.i.d. data, in the case of non-nested hypotheses, we test the null by comparing the absolute value of the normalized log-likelihood ratio with a normal critical value. If it exceeds the critical value and the log-likelihood ratio exceeds zero, then we reject in favor of the power law. If it exceeds the critical value and the log-likelihood ratio is less than zero, then we reject in favor of the null distribution. Otherwise, we conclude that the models are equally good.

We modify this procedure to account for dependence using the U-type statistic as follows. For identically distributed data $\{Z_i\}_{i=1}^n$, let $\ell_{PL}(Z_i, \alpha)$ be the likelihood of observation $i$ under a power law and $\ell_0(Z_i, \gamma)$ the likelihood under the null distribution, which is parameterized by $\gamma$. Then the null hypothesis is
$$E[\log \ell_{PL}(Z_i, \alpha) - \log \ell_0(Z_i, \gamma)] = 0.$$
This fits into our setup (7) by defining $\hat\theta = (\hat\alpha, \hat\gamma)$, estimates of $(\alpha, \gamma)$, and
\[ X_i = \log \ell_{PL}(Z_i, \hat\alpha) - \log \ell_0(Z_i, \hat\gamma). \]
We can compute the U-type statistic using these $X_i$’s and compare it to a normal critical value. If the statistic does not exceed the critical value, then the models are equally good, so we do not reject the null. If it does, then we reject in favor of the power law (null distribution) if $\bar X$, the estimated log-likelihood ratio computed on the full dataset, is strictly greater (less) than zero.

Jackson and Rogers (2007) propose a model of network formation that generates a degree distribution parameterized by $r$, which interpolates between the exponential and power-law distributions. Their model provides microfoundations for the different distributions. When $r \to \infty$, the network is formed primarily through random meetings, and the distribution is exponential. When $r \to 0$, the network is formed primarily through “network-based meetings,” as nodes are more likely to meet friends of nodes that were previously met randomly. Since high-degree nodes are more likely to be met through the network-based meetings, this corresponds to a “rich-get-richer” or “preferential-attachment” mechanism that generates a power-law degree distribution.

The authors estimate $r$ using data on six distinct social networks and informally assess the extent to which the estimated distributions depart from a power law. See their paper for descriptions of the data. In this section, we use the same datasets to implement the test described in §3.4, using the exponential distribution as the null. We set the lower support points of the exponential and power-law distributions at one and estimate the parameters of the distribution using (pseudo) maximum likelihood.

Table 1: Results

        Coauthor     Radio        Prison       Romance      Citation     WWW
Exp.    1.77         1.46         1.67         1.94         1.43         1.79
LL      -36.45       -1.75        -6.13        -13.75       -3.84        62.46
Naive   E            E            E            E            E            P
r
n
        R_n   RS     R_n   RS     R_n   RS     R_n   RS     R_n   RS     R_n   RS
        60k   E      37    N      65    E      1130  E      692   E      60k   P
        80k   E      50    N      86    E      1507  E      923   N      80k   P
RS      100k  E      62    N      108   E      1884  E      1154  N      100k  P
        120k  E      74    N      130   E      2261  E      1385  E      120k  P
        140k  E      87    N      151   E      2638  E      1616  E      140k  P

“Exp.” = estimated power law exponent. “LL” = the normalized log-likelihood ratio. “RS” = conclusion of our test, and “Naive” = conclusion of i.i.d. test, where P = power law, E = exponential, and N = fail to reject.

Table 1 displays the results of the tests. Row “Exp.” displays the estimated power law exponent, “LL” the normalized log-likelihood ratio, “$r$” the estimated value of $r$ from Jackson and Rogers (2007), and $n$ the sample size. “Naive” displays the conclusion of the conventional Vuong test that assumes the data are i.i.d., with the conclusion “P” denoting power law, “E” denoting exponential, and “N” meaning the null is not rejected. Finally, the bottom five rows display the conclusions of our test for different values of $R_n$ to assess the robustness of the conclusions. Recalling the definition of $R_n^U$ from (6), the rows correspond to $R_n = \min\{R_n^U \cdot \epsilon, 100\text{k}\}$ for $\epsilon \in \{0.6, 0.8, 1, 1.2, 1.4\}$. Thus, the middle of the five bottom rows is our suggested choice $R_n^U$. We truncate $R_n$ at 100k, since this is already large enough to draw a robust conclusion.

The results of all three methods are in agreement for the coauthor, prison, romance, and WWW networks. This is due to the large values of the normalized log-likelihood ratios, which make the conclusion rather obvious regardless of the test used. For the other datasets, however, the methods draw different conclusions.

For the ham radio network, Jackson and Rogers (2007) estimate $r$ to be 5.
As they note, this means network-based meetings are about eight times more common in the WWW network than in the ham radio network, so the ham radio degree distribution should be closer to exponential than to a power law. However, our test finds insufficient evidence to reject the null for the ham radio network due to the very small sample size. The Vuong test rejects in favor of the exponential distribution, despite the normalized log-likelihood ratio being comparatively small in magnitude (-1.75). This may be because the test assumes i.i.d. data, so the sample variance of the log-likelihoods is perhaps an underestimate, and the test may be anti-conservative.

Jackson and Rogers (2007) estimate $r$ to be close to zero for the citation network, which favors the power law. In contrast, the Vuong test rejects in favor of the exponential distribution, while our test either concludes exponential or fails to reject, depending on the value of $R_n$. This is due to the negative normalized log-likelihood ratio of -3.84, which is large enough in magnitude for the i.i.d. test to draw a clear conclusion but perhaps not large enough after adjusting for dependence, which explains the ambiguity in the conclusion of our test. Still, neither test concludes the data are consistent with a power law, unlike the estimate of Jackson and Rogers (2007).

This section considers a generalization of the setup in §2 in which $X$ is a triangular array. That is, $X_i$ and $\mu$ may implicitly depend on $n$, but we suppress this in the notation. This is important to accommodate network applications, since, for example, when the network is sparse, the linking probability decays to zero with $n$. All proofs can be found in the appendix. For any vector $v$, let $\|v\|$ denote its sup norm. The next theorem shows that the U-type (mean-type) statistic is asymptotically normal (chi-square).

Theorem 1.
For every $n$, let $R_n \ge 2$ be an integer. Suppose the following conditions hold under asymptotics sending $n$ to infinity.

(a) $n^{-1/2} \sum_{i=1}^n (X_i - \mu) = O_p(1)$.
(b) There exists a positive-definite matrix $\Sigma$ such that $\hat\Sigma \overset{p}{\longrightarrow} \Sigma$.
(c) $n^{-1} \sum_{i=1}^n \|X_i\|^{2+\delta} = O_p(1)$ for some $\delta > 0$.

If $R_n \to \infty$ and $R_n/n = o(1)$, then
\[ T_M(\mu; \pi) \overset{d}{\longrightarrow} \chi^2_m \quad\text{and}\quad T_M(\bar X; \pi) \overset{d}{\longrightarrow} \chi^2_m, \]
where $\chi^2_m$ is the chi-square distribution with $m$ degrees of freedom. If $\sqrt{R_n}/n = o(1)$, then
\[ T_U(\mu; \pi) \overset{d}{\longrightarrow} N(0, 1) \quad\text{and}\quad T_U(\bar X; \pi) \overset{d}{\longrightarrow} N(0, 1). \]
Furthermore, conditional on the data, the CDFs of these statistics each uniformly converge in probability to the CDF of the standard normal distribution.
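The two statistics in Theorem 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the definitions of $T_M$ and $T_U$ are given in §2, which is not reproduced here, so the formulas below are inferred from the expressions used in the appendix proofs (a normalized sum of $\hat\Sigma^{-1/2}(X_{\pi_r(1)} - \theta)$ for the mean-type statistic and of $(X_{\pi_r(1)} - \theta)'\hat\Sigma^{-1}(X_{\pi_r(2)} - \theta)$ for the U-type statistic, with $\hat\Sigma$ the sample covariance with divisor $n$). The function names are hypothetical.

```python
import numpy as np


def draw_index_pairs(n, R, rng):
    """Draw R i.i.d. pairs (pi_r(1), pi_r(2)) of distinct indices.

    The first two entries of a uniformly random permutation of {0,...,n-1}
    form a uniformly random ordered pair of distinct indices, so the full
    permutation never has to be generated.
    """
    first = rng.integers(0, n, size=R)
    shift = rng.integers(1, n, size=R)   # values in {1, ..., n-1}
    second = (first + shift) % n         # always differs from `first`
    return first, second


def _prepare(X, theta):
    """Return X as an (n, m) array, theta as (m,), and the inverse of Sigma-hat."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    centered = X - X.mean(axis=0)
    sigma_hat = centered.T @ centered / X.shape[0]   # assumed form of Sigma-hat
    return X, theta, np.linalg.inv(sigma_hat)


def mean_type_statistic(X, theta, R, rng):
    """T_M: quadratic form of the normalized sum of R resampled deviations."""
    X, theta, sigma_inv = _prepare(X, theta)
    first, _ = draw_index_pairs(X.shape[0], R, rng)
    s = (X[first] - theta).sum(axis=0)
    return float(s @ sigma_inv @ s) / R


def u_type_statistic(X, theta, R, rng):
    """T_U: normalized sum of cross products of R resampled pairs."""
    X, theta, sigma_inv = _prepare(X, theta)
    n, m = X.shape
    first, second = draw_index_pairs(n, R, rng)
    d1, d2 = X[first] - theta, X[second] - theta
    cross = np.einsum("ri,ij,rj->", d1, sigma_inv, d2)
    return float(cross) / np.sqrt(m * R)
```

Under the theorem's rate conditions, `mean_type_statistic` would be compared with a $\chi^2_m$ critical value (with `R` small relative to `n`) and `u_type_statistic` with a standard normal critical value (with `R` small relative to `n**2`). Note that neither function uses any information about the dependence structure of `X`.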
Remark 4.
While we focus on the standard case of $\sqrt{n}$-consistency, our methods can be adjusted to accommodate slower rates of convergence. This involves choosing an asymptotically smaller value of $R_n$. To see this, suppose $\bar X$ is $n^{\delta}$-consistent for some $\delta \in (0, 1)$, and consider decomposition (5) for the mean-type statistic. Term $[I]$ is still asymptotically normal. Term $[II]$ equals $\sqrt{R_n}\, \hat\Sigma^{-1/2} n^{-1} \sum_{i=1}^n (X_i - \mu)$. We still typically expect $\hat\Sigma \overset{p}{\longrightarrow} \Sigma$ positive definite. Thus, we need
\[ \frac{\sqrt{R_n}}{n^{\delta}} \left( \frac{n^{\delta}}{n} \sum_{i=1}^n (X_i - \mu) \right) = o_p(1). \]
Given $n^{\delta}$-consistency, we therefore require $\sqrt{R_n}\, n^{-\delta} = o(1)$, which for $\delta < 0.5$ is slower than the rate for $R_n$ required in Theorem 1.

Remark 5.
Theorem 1 provides limit distributions for $T_M(\bar X; \pi)$ and $T_U(\bar X; \pi)$, which give us an alternate way of constructing critical values using the “permutation distribution.” Let $\{\tilde\pi_{r\ell} : \ell = 1, \dots, L;\ r = 1, \dots, R_n\}$ be i.i.d. uniform draws from $\Pi$ and $\tilde\pi_\ell = (\tilde\pi_{1\ell}, \dots, \tilde\pi_{R_n \ell})$. Following §3.2 of Song (2016), the permutation critical value for the test in (2) using the U-type statistic is
\[ c_{L,1-\alpha} = \inf\left\{ c > 0 : \frac{1}{L} \sum_{\ell=1}^L \mathbf{1}\left\{ T_U(\bar X; \tilde\pi_\ell) > c \right\} \le \alpha \right\}, \]
the $1-\alpha$ quantile of the permutation distribution. Permutation critical values for the mean-type statistic are obtained analogously, replacing $T_U(\bar X; \tilde\pi_\ell)$ with $T_M(\bar X; \tilde\pi_\ell)$ in the previous expression.

Remark 6.
The generality of our procedures comes at the cost of having power against fewer sequences of alternatives. If the econometrician could consistently estimate the asymptotic variance of $\bar X$, then the usual trinity of tests would have power against local alternatives $\mu_n = \mu + h/\sqrt{n}$. In contrast, the test in (2) using the mean-type statistic only has nontrivial asymptotic power against alternatives $\mu_n = h/\alpha_n$, where $\alpha_n \to \infty$ but $\alpha_n R_n^{-1/2} \to c \in [0, \infty)$. This is immediate from the rate of convergence and bias discussed in §2. For the test using the U-type statistic, we have instead $\alpha_n R_n^{-1/4} \to c \in [0, \infty)$, following the argument in Theorem 3.3 of Song (2016). Due to the rate conditions on $R_n$, this implies that tests using our resampled statistics have lower power than conventional tests, which we interpret as the cost of dependence-robustness. Note that these calculations do not imply the mean-type statistic is more powerful than the U-type statistic because $R_n$ is chosen differently for the two. It can grow faster with $n$ for the U-type statistic, which is why the latter has better power properties, as discussed in §2.

Remark 7.
Assumption (a) of Theorem 1 allows $X_i$ to depend on a “first-stage” estimator, which is important for many of the applications in §3. Consider the setup for asymptotically linear estimators (7). The following are primitive conditions for Theorem 1:

(i) $n^{-1/2} \sum_{i=1}^n \psi(Z_i; \beta_0, \theta_0)$ and $\sqrt{n}(\hat\theta - \theta_0)$ are $O_p(1)$.
(ii) There exists $\hat S$ consistent for $S = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n E[\psi(Z_i; \beta_0, \theta_0) \psi(Z_i; \beta_0, \theta_0)']$, and the latter is positive-definite.
(iii) $n^{-1} \sum_{i=1}^n \|\psi(Z_i; \beta_0, \theta_0)\|^{2+\delta} = O_p(1)$ for some $\delta > 0$.
(iv) $\sup_{\theta \in \Theta} \| n^{-1} \sum_{i=1}^n (\nabla_\theta \psi(Z_i; \beta_0, \theta) - E[\nabla_\theta \psi(Z_i; \beta_0, \theta)]) \| = o_p(1)$.

We next state formal results on the uniform size and power properties of test (9). Let $\lambda_{\min}(M)$ denote the smallest eigenvalue of a matrix $M$ and $\|M\| = \max_{i,j} |M_{ij}|$. Define $\mu(P) = E_P[X_1]$, where $E_P[\cdot]$ denotes the expectation under the data-generating process (DGP) $P$, and let $\mu_k(P)$ be the $k$th component of $\mu(P)$. Define $\Sigma_{P_n} = n^{-1} \sum_{i=1}^n E_{P_n}[(X_i - \mu)(X_i - \mu)']$.

Theorem 2.
Let $\mathcal{P}$ be the set of DGPs such that for any sequence $\{P_n\}_{n \in \mathbb{N}} \subseteq \mathcal{P}$ the following conditions hold.

(a) $n^{-1/2} \sum_{i=1}^n (X_i - \mu(P_n)) = O_{P_n}(1)$.
(b) $\limsup_{n\to\infty} \|\Sigma_{P_n}\| < \infty$, $\liminf_{n\to\infty} \lambda_{\min}(\Sigma_{P_n}) > 0$, and $\|\hat\Sigma - \Sigma_{P_n}\| = o_p(1)$.
(c) $n^{-1} \sum_{i=1}^n \|X_i\|^{2+\delta} = O_{P_n}(1)$ for some universal constant $\delta > 0$ that only depends on $\mathcal{P}$ and not the sequence $\{P_n\}_{n \in \mathbb{N}}$.

If $R_n \to \infty$ and $\sqrt{R_n}/n = o(1)$, then under the null that $\mu(P_n) \le 0$ for all $n$, $\sup_{P \in \mathcal{P}} E_P[\phi_n] \to \alpha$, where $\phi_n$ is the test given in (9).

This shows that the test uniformly controls size. The theorem follows fairly directly from the next proposition, which also provides results on the power of the test.
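The rate condition $\sqrt{R_n}/n = o(1)$ can be explored numerically. The sketch below is illustrative only: the polynomial choice $R_n = n^a$ and the function name are hypothetical, and the local-alternative rate $R_n^{-1/4}$ for the U-type statistic is taken as an assumption based on the rate discussion in Remark 6. For each exponent $a$ it reports the size of the bias factor $\sqrt{R_n}/n$ and the order of the smallest detectable alternative.

```python
import math


def u_type_rate_diagnostics(n, exponent):
    """For the hypothetical choice R_n = n**exponent, return
    (R_n, sqrt(R_n) / n, R_n ** (-1/4)).

    The second entry must vanish for the test to be valid; the third is the
    assumed order of the local alternatives the U-type test can detect.
    """
    r_n = max(2, round(n ** exponent))
    bias_scale = math.sqrt(r_n) / n
    local_alt_rate = r_n ** -0.25
    return r_n, bias_scale, local_alt_rate


if __name__ == "__main__":
    for exponent in (1.0, 1.5, 1.9, 2.0):
        for n in (10**3, 10**5):
            r_n, bias, rate = u_type_rate_diagnostics(n, exponent)
            print(f"a={exponent:.1f} n={n:>6} R_n={r_n:>11} "
                  f"sqrt(R_n)/n={bias:.4f} R_n^(-1/4)={rate:.4f} "
                  f"n^(-1/2)={n ** -0.5:.4f}")
```

Any exponent below 2 drives the bias factor to zero, but the detectable alternatives then shrink more slowly than $n^{-1/2}$. At exponent 2 the two rates coincide, but the bias factor no longer vanishes.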
Proposition 1.
Fix a sequence of DGPs $\{P_n\}_{n \in \mathbb{N}}$, and let $\delta_k^* = \lim_{n\to\infty} R_n^{1/4} \mu_k(P_n) \in \mathbb{R} \cup \{-\infty, \infty\}$. Suppose the assumptions of Theorem 2 hold.

(a) If $\max_k \delta_k^* \le 0$ (null / “local-to-null” case), then $E_{P_n}[\phi_n] \to \alpha$.
(b) If $\max_k \delta_k^* = \infty$ (fixed alternative case), then $E_{P_n}[\phi_n] \to 1$.
(c) If $\max_k \delta_k^* \in (0, \infty)$ (local alternative case), then $E_{P_n}[\phi_n] \to \beta > \alpha$.

Part (a) gives the size under any sequence of null DGPs. Parts (b) and (c) describe the test’s power, with (c) showing that the test has power against local alternatives that vanish no faster than rate $R_n^{-1/4}$. Since we need $\sqrt{R_n}/n \to 0$, this implies that the test does not have power against $\sqrt{n}$ local alternatives.

Remark 8.
Remark 6 discusses a sort of bias-variance trade-off for choosing $R_n$ in the equality-testing case. A similar trade-off occurs here, since we use the same U-type statistic. Consider the case $m = 1$ and drop the subscript $k$. By (A.2) in the proof of Proposition 1, $\hat\lambda$ depends on the term $\sqrt{R_n}\, (\hat\Sigma^{-1/2} (\bar X - \mu(P_n)))^2$. If $\sqrt{R_n}/n \to 0$ as required, then this term vanishes. However, if $R_n$ were too large, say equal to $n^2$, then the term would instead be asymptotically chi-square, and our test would have incorrect size. Validity of our test therefore requires $\sqrt{R_n}/n \to 0$ to eliminate a “bias” term. On the other hand, the rate of convergence of the test is $R_n^{-1/4}$, as shown in Proposition 1, which reflects the same trade-off under equality testing.

Remark 9.
Consider the conventional moment inequalities setting in which $X$ is i.i.d. For simplicity, suppose $m = 1$, and consider the test statistic $n^{-1/2} \hat\Sigma^{-1/2} \sum_{i=1}^n X_i$. The well-known difficulty with constructing critical values for this statistic is that while
\[ \frac{1}{\sqrt{n}} \sum_{i=1}^n \hat\Sigma^{-1/2} (X_i - \mu(P_n)) \overset{d}{\longrightarrow} N(0, I_m), \]
it is impossible to consistently estimate $\sqrt{n}\, \Sigma^{-1/2} \mu(P_n)$. Much of the moment-inequalities literature boils down to finding clever ways to conservatively bound this nuisance parameter from above (Canay and Shaikh, 2017). In contrast, in our setting, the nuisance parameter is essentially $\sqrt{R_n}\, (\Sigma^{-1/2} \mu(P_n))^2$ (A.3), which can be consistently estimated by $\sqrt{R_n}\, (\hat\Sigma^{-1/2} \bar X)^2$, since $\sqrt{R_n}/n \to 0$ (A.4). This is why our test is asymptotically exact.

This section presents results from four simulation studies, each corresponding to one of the applications in §3. We use asymptotic critical values to implement the tests and multiple values of $R_n$ to assess the robustness of the methods to the tuning parameter. The results are broadly summarized as follows. The size is largely close to the target level of 5 percent across all designs, more so for larger sample sizes. Power can be low in small samples, as expected from the convergence rates discussed in Remark 6. The U-type statistic has significantly better power properties than the mean-type, which leads us to recommend use of the former. Finally, the results are similar across different values of $R_n$.

Cluster Dependence.
Let $c$ index cities, $f$ index families, and $i$ index individuals. We generate outcomes according to the random effects model
\[ Y_{ifc} = \theta + \alpha_f + \varepsilon_{ifc}, \]
where $\alpha_f$ and $\varepsilon_{ifc}$ are each drawn i.i.d. from a normal distribution. Under this dependence structure, the correct level of clustering is at the family level. The true value of $\theta$ is one. Let $n_i$ be the number of individuals, $n_f$ the number of families, $n_c$ the number of cities, and $N = (n_c, n_f, n_i)$. Families have equal numbers of individuals and cities equal numbers of families.

We present results for resampled statistics and compare them to $t$-tests using clustered standard errors at each level of clustering. Table 2 displays simulation results for the size and power of our tests, computed using 6000 simulations. The first two rows display rejection percentages for size (testing $H_0$ at the true value $\theta = 1$) and power (testing $H_0$ at a value of $\theta$ different from the truth), respectively. To show the robustness of our test, we display results for five different values of $R_n$. For the U-type statistic, the columns correspond to $R_n = R_n^U \cdot \epsilon$ for $\epsilon \in \{0.6, 0.8, 1, 1.2, 1.4\}$ and $R_n^U$ defined in (6). Thus, the middle of the five columns for both sample sizes corresponds to our suggested choice $R_n^U$. We do the same for the mean-type statistic, except we use $R_n = R_n^M \cdot \epsilon$. Finally, Table 3 displays analogous results for clustered $t$-tests. The columns display the level of clustering with $c$ for city, $f$ for family, $i$ for individual.

The results show that the $t$-test overrejects when clustering at too coarse a level and the number of clusters is small (clustering at the city level). It also overrejects when clustering at too fine a level (clustering at the individual level) because this assumes more independence in the data than is warranted. In contrast, tests using resampled statistics properly control size. On the other hand, the $t$-test is clearly more powerful. U-type statistics show a significant power advantage over mean-type statistics.
Our results are also similar across values of $R_n$.

Table 2: Cluster Dependence: Our Tests

M   Size    5.37    6.02    6.30    6.33    6.73      5.22    5.18    5.68    5.77    5.42
M   Power   17.55   22.62   27.88   32.52   36.70     33.35   44.35   51.63   58.77   65.78
    R_n     278     371     464     557     650       2381    3175    3969    4763    5557
Averages over 6000 simulations. N “ p n c , n f , n i q . M “ mean-type test, U “ U-type test.
Table 3: Cluster Dependence: $t$-Tests

Cluster Lvl   c       f       i       c       f       i
Size          6.93    5.27    11.37   7.02    5.18    11.22
Power         98.23   98.35   99.22   100     100     100
Averages over 6000 simulations.
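The cluster-dependence design can be sketched as a small Monte Carlo experiment. This is a minimal illustration, not the code behind Tables 2 and 3. The following are assumptions made for the sketch: unit variances for the family effect and the idiosyncratic error, the sample sizes and the value of `R` (the paper's choice $R_n^U$ from (6) is not reproduced here), the scalar form of the U-type statistic, and a one-sided rejection rule against the 95 percent standard normal quantile. Cities are omitted because the outcome does not depend on the city index.

```python
import numpy as np

Z_95 = 1.6448536269514722  # 95th percentile of the standard normal


def simulate_outcomes(n_families, family_size, theta, rng):
    """Draw Y = theta + alpha_f + eps with a shared family effect alpha_f."""
    alpha = rng.normal(size=n_families)                # assumed unit variance
    eps = rng.normal(size=(n_families, family_size))   # assumed unit variance
    return (theta + alpha[:, None] + eps).ravel()


def u_type_statistic(y, theta0, R, rng):
    """Scalar U-type statistic built from R random pairs of distinct units."""
    n = y.size
    sigma2_hat = np.mean((y - y.mean()) ** 2)
    first = rng.integers(0, n, size=R)
    second = (first + rng.integers(1, n, size=R)) % n  # never equals `first`
    cross = (y[first] - theta0) * (y[second] - theta0)
    return cross.sum() / (sigma2_hat * np.sqrt(R))


def rejection_rate(theta_true, theta0, n_families, family_size, R,
                   n_sims, seed=0):
    """Share of simulations in which H0: theta = theta0 is rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        y = simulate_outcomes(n_families, family_size, theta_true, rng)
        rejections += int(u_type_statistic(y, theta0, R, rng) > Z_95)
    return rejections / n_sims


if __name__ == "__main__":
    size = rejection_rate(1.0, 1.0, n_families=250, family_size=4,
                          R=300, n_sims=500)
    power = rejection_rate(1.0, 0.5, n_families=250, family_size=4,
                           R=300, n_sims=500)
    print(f"size: {size:.3f}, power: {power:.3f}")
```

The statistic never uses family membership, which is the sense in which the procedure is the same regardless of the dependence structure.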
Network Statistics.
We generate a network according to a strategic model ofnetwork formation, following the simulation design of Leung (2019). There are n nodes, and each node i is endowed with a type p X i , Z i q , i.i.d. across nodes, where Z i „ Ber p . q and X i „ U pr , s q with Z i KK X i . Let ρ be the function such that ρ p δ q “ if δ ď and equal to otherwise. Potential links satisfy A ij “ ! θ ` p Z i ` Z j q θ ` max k G ik G jk θ ´ ρ p r ´ n k X i ´ X j k q ` ζ ij ą ) , where k ¨ k is the Euclidean norm on R d and ζ ij iid „ N p , θ q is independent of types.We set θ “ p´ , . , . , q and r n “ p . { n q { and use the selection mechanism inthe design of Leung (2019); see his paper for details.We are interested in two statistics that are functions of the network, the aver-age clustering coefficient (defined in §3.3) and the average degree. Leung and Moon(2020) prove ? n -consistency of the sample statistics for their population analogs. Let θ be the expected value of the network statistic. Tables B.1 and B.2 in the appendix24 ependence-Robust Inference display rejection percentages for average clustering and degree for two different nulls.The first is the null that θ equals its true value, which estimates the size. The secondis the null that θ equals the true value plus the number indicated in the table, whichestimates power. We use 6000 simulation draws each to simulate θ and rejectionpercentages.For this design, our tests exhibit substantial size distortion in small samples,although the size does tend toward the nominal level as n grows. For the averagedegree, we still see size distortion at larger samples for the U-type statistic. TheU-type statistic is significantly more powerful than the mean-type statistic, withrejection percentages sometimes more than twice as large. Treatment Spillovers.
Consider the setup in §3.2. We assign units to treatmentwith probability 0.3, and draw the network from the same model used for the networkstatistics above. We generate outcomes according to the linear model Y i “ β ` β D i ` β T i ` β γ i ` ε i . For ν j iid „ N p , q , we set ε i “ ř j A ij ν j { ř j A ij , which represents exogenous peereffects in unobservables and generates network autocorrelation in the errors. We set p β , β , β , β q “ p , . , ´ , . q .We consider a linear regression estimator of Y i on p , D i , T i , γ i q . We test twohypotheses, β “ ´ to estimate the size and β “ ´ . to estimate the power.Table B.3 in the appendix displays rejection percentages, computed using 6000 sim-ulations. As in the network statistics application, we display multiple values of R n corresponding to R n “ R Un ¨ ǫ for the U-type statistic and R n “ R Mn ¨ ǫ for the mean-type statistics, for the same values of ǫ above. In this design, our tests control sizeextremely well across all sample sizes, and the U-type statistic is substantially morepowerful than the mean-type statistic. Power Laws.
We implement the test in §3.4, which uses the U-type statistic. We set $R_n = R_n^U$. Following the notation in that section, we draw data $\{Z_i\}_{i=1}^n$ i.i.d. from either an exponential distribution or a power-law distribution with exponent 2. The lower support point for both is set at 1. Table B.4 in the appendix reports rejection percentages from 6000 simulations under both alternatives (exponential and power law). Row “LL” displays the average normalized log-likelihood ratio, “Favor Exp” the percentage of simulations in which we reject in favor of the null distribution, and “Favor PL” the percentage of simulations in which we reject in favor of the power law. The power is around 55–60 percent and 86–97 percent for the two sample sizes considered, respectively.

We develop tests for moment equalities and inequalities that are robust to general forms of weak dependence. The tests compare a resampled test statistic to an appropriate asymptotic critical value, in contrast to conventional resampling procedures, which compare a test statistic constructed using the original dataset to a resampled critical value. The validity of conventional procedures requires resampling in a way that mimics the dependence structure of the data, which in turn requires knowledge about the type of dependence. In contrast, our procedure is implemented the same way regardless of the dependence structure. We show that our procedure is asymptotically valid under the weak requirement that the target parameter can be estimated at a $\sqrt{n}$ rate. To illustrate the broad applicability of our procedure, we discuss four applications, with a focus on varieties of network dependence, including regression with unknown dependence, treatment effects with network interference, and testing network stylized facts.

References
Acemoglu, D., C. Garcia-Jimeno, and J. Robinson, “State Capacity and Economic Development: A Network Approach,” American Economic Review, 2015, (8), 2364–2409.

Aronow, P. and C. Samii, “Estimating Average Causal Effects Under General Interference, with Application to a Social Network Experiment,” Annals of Applied Statistics, 2017, (4), 1912–1947.

Bakirov, N. and G. Székely, “Student’s t-Test for Gaussian Scale Mixtures,” Journal of Mathematical Sciences, 2006, (3), 6497–6505.

Barabási, A., Network Science, Cambridge University Press, 2015.

Barabási, A. and R. Albert, “Emergence of Scaling in Random Networks,” Science, 1999, (5439), 509–512.

Barrios, T., R. Diamond, G. Imbens, and M. Kolesár, “Clustering, Spatial Correlations, and Randomization Inference,” Journal of the American Statistical Association, 2012, (498), 578–591.

Bertrand, M., E. Duflo, and S. Mullainathan, “How Much Should We Trust Differences-in-Differences Estimates?,” The Quarterly Journal of Economics, 2004, (1), 249–275.

Bester, C., T. Conley, and C. Hansen, “Inference with Dependent Data Using Clustering Covariance Matrix Estimators,” Journal of Econometrics, 2011, (2), 137–151.

Bickel, P., A. Chen, and E. Levina, “The Method of Moments and Degree Distributions for Network Models,” The Annals of Statistics, 2011, (5), 2280–2301.

Broido, A. and A. Clauset, “Scale-free Networks are Rare,” Nature Communications, 2019, (1), 1–10.

Cameron, C. and D. Miller, “A Practitioner’s Guide to Cluster-Robust Inference,” Journal of Human Resources, 2015, (2), 317–372.

Cameron, C., J. Gelbach, and D. Miller, “Bootstrap-Based Improvements for Inference with Clustered Errors,” Review of Economics and Statistics, 2008, (3), 414–427.

Canay, I. and A. Shaikh, “Practical and Theoretical Advances for Inference in Partially Identified Models,” in B. Honore, A. Pakes, M. Piazzesi, and L. Samuelson, eds., Advances in Economics and Econometrics: 11th World Congress (Econometric Society Monographs), 2017, pp. 271–306.

Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, (3), 1013–1030.

Chandrasekhar, A. and R. Lewis, “Econometrics of Sampled Networks,” Stanford University working paper, 2016.

Clauset, A., C. Shalizi, and M. Newman, “Power-Law Distributions in Empirical Data,” SIAM Review, 2009, (4), 661–703.

Gabaix, X., “Power Laws in Economics and Finance,” Annual Review of Economics, 2009, (1), 255–294.

Hansen, B. and S. Lee, “Asymptotic Theory for Clustered Samples,” Journal of Econometrics, 2019, (2), 268–290.

Holme, P., “Power-Laws and Me,” https://petterhol.me/2018/01/12/me-and-power-laws/ January 2018. Accessed: 2020-02-02.

Ibragimov, M., R. Ibragimov, and J. Walden, Heavy-Tailed Distributions and Robustness in Economics and Finance, Vol. 214, Springer, 2015.

Ibragimov, R. and U. Müller, “t-Statistic Based Correlation and Heterogeneity Robust Inference,” Journal of Business and Economic Statistics, 2010, (4), 453–468.

Ibragimov, R. and U. Müller, “Inference with Few Heterogeneous Clusters,” Review of Economics and Statistics, 2016, (1), 83–96.

Jackson, M., Social and Economic Networks, Princeton University Press, 2010.

Jackson, M. and B. Rogers, “Meeting Strangers and Friends of Friends: How Random Are Social Networks?,” American Economic Review, 2007, (3), 890–915.

Jenish, N. and I. Prucha, “On Spatial Processes and Asymptotic Inference Under Near-Epoch Dependence,” Journal of Econometrics, 2012, (1), 178–190.

Klaus, A., S. Yu, and D. Plenz, “Statistical Analyses Support Power Law Distributions Found in Neuronal Avalanches,” PloS one, 2011, (5), e19779.

Lahiri, S., Resampling Methods for Dependent Data, Springer Science and Business Media, 2013.

Lehmann, E. and J. Romano, Testing Statistical Hypotheses, Springer Science & Business Media, 2006.

Leung, M., “A Weak Law for Moments of Pairwise-Stable Networks,” Journal of Econometrics, 2019, (1), 182–195.

Leung, M., “Treatment and Spillover Effects Under Network Interference,” Review of Economics and Statistics, 2020, 368–380.

Leung, M. and R. Moon, “Normal Approximation in Large Network Models,” arXiv preprint arXiv:1904.11060, 2020.

Li, T. and L. Zhao, “A Partial Identification Subnetwork Approach to Discrete Games in Large Networks: An Application to Quantifying Peer Effects,” Vanderbilt University working paper, 2016.

Newman, M., “Power laws, Pareto distributions and Zipf’s law,” Contemporary physics, 2005, (5), 323–351.

Pesaran, H., “Global and Partial Non-Nested Hypotheses and Asymptotic Local Power,” Econometric Theory, 1987, (1), 69–97.

Politis, D., J. Romano, and M. Wolf, Subsampling, Springer Series in Statistics, New York, NY: Springer New York, 1999.

Sheng, S., “A Structural Econometric Analysis of Network Formation Games Through Subnetworks,” Econometrica, 2020, (5), 1829–1858.

Song, K., “Ordering-Free Inference from Locally Dependent Data,” arXiv preprint arXiv:1604.00447, 2016.

Vuong, Q., “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses,” Econometrica, 1989, pp. 307–333.
A Proofs
Proof of Theorem 1.
Convergence in distribution is a direct corollary of Theorems A.1 and A.2 below. These theorems show that the test statistics can be decomposed into a part $[I]$ that converges in distribution conditional on the data and a bias term $[II]$ that is $o_p(1)$. By Pólya’s Theorem (e.g. Lehmann and Romano, 2006, Theorem 11.2.9), the CDF of $[I]$, conditional on the data, uniformly converges to a normal CDF a.s. Hence, the last claim of the theorem follows.

Theorem A.1 (Mean-Type Statistic). For every $n$, let $R_n \ge 2$ be an integer. Suppose the following conditions hold.

(a) $R_n \to \infty$ and $R_n/n = o(1)$.
(b) $n^{-1/2} \sum_{i=1}^n (X_i - \mu) = O_p(1)$.
(c) There exists a positive-definite matrix $\Sigma$ such that $\hat\Sigma \overset{p}{\longrightarrow} \Sigma$.
(d) $n^{-1} \sum_{i=1}^n \|X_i\|^{2+\delta} = O_p(1)$ for some $\delta > 0$.

Then $\tilde T_M(\mu; \pi) \overset{d}{\longrightarrow} N(0, I_m)$ and $\tilde T_M(\bar X; \pi) \overset{d}{\longrightarrow} N(0, I_m)$.

Proof.
Recall decomposition (5). Let $\tilde X_i = X_i - \mu$. By definition of $\pi_r$,
\[
\begin{aligned}
E[\tilde T_M(\mu; \pi) \mid X]
&= \sqrt{R_n}\, \hat\Sigma^{-1/2} \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \tilde X_{\pi(1)}
 = \sqrt{R_n}\, \hat\Sigma^{-1/2} \frac{1}{n!} \sum_{\pi \in \Pi} \tilde X_{\pi(1)} \\
&= \sqrt{R_n}\, \hat\Sigma^{-1/2} \frac{1}{n!} \sum_{i=1}^n \tilde X_i\, (n-1)!
 = \sqrt{\frac{R_n}{n}}\, \hat\Sigma^{-1/2} \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde X_i .
\end{aligned}
\]
Hence, $[II]$ in (5) is $O_p((R_n/n)^{1/2})$ and therefore $o_p(1)$ by our assumptions.

The remainder of the proof establishes a normal limit for $[I]$ in (5). We condition on the data, treating it as fixed, and apply a Lindeberg CLT, since the statistic is a sum of conditionally independent random vectors.

First we show that the asymptotic variance is $I_m$. We have
\[ W_{M,r} - E[W_{M,r} \mid X] = \hat\Sigma^{-1/2} \tilde X_{\pi_r(1)} - \frac{1}{n} \sum_{i=1}^n \hat\Sigma^{-1/2} \tilde X_i = \hat\Sigma^{-1/2} (X_{\pi_r(1)} - \bar X). \]
Its conditional second moment is
\[ \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \hat\Sigma^{-1/2} (X_{\pi(1)} - \bar X)(X_{\pi(1)} - \bar X)' (\hat\Sigma^{-1/2})' = \frac{1}{n!} \sum_{i=1}^n \hat\Sigma^{-1/2} (X_i - \bar X)(X_i - \bar X)' (\hat\Sigma^{-1/2})' (n-1)! = I_m. \]
Since $\mathrm{Var}(W_{M,r} \mid X)$ is identical across $r$, it follows that $\mathrm{Var}([I] \mid X) = I_m$.

Similar calculations yield, for $\delta$ in assumption (d),
\[ E\left[ \| W_{M,r} - E[W_{M,r} \mid X] \|^{2+\delta} \mid X \right] = \frac{1}{n} \sum_{i=1}^n \| \hat\Sigma^{-1/2} (X_i - \mu) - \hat\Sigma^{-1/2} (\bar X - \mu) \|^{2+\delta}, \]
where $\|\cdot\|$ denotes the sup norm for vectors. This is $O_p(1)$ by assumptions (b)–(d) and Minkowski’s inequality, which verifies the Lindeberg condition.

Theorem A.2 (U-Type Statistic). For every $n$, let $R_n \ge 2$ be an integer. Suppose (a) $R_n \to \infty$, $\sqrt{R_n}/n = o(1)$, and (b)–(d) assumptions (b)–(d) of Theorem A.1 hold. Then $T_U(\mu; \pi) \overset{d}{\longrightarrow} N(0, 1)$ and $T_U(\bar X; \pi) \overset{d}{\longrightarrow} N(0, 1)$.

Proof.
Step 1.
We first show that $T_U(\bar X; \pi) = T_U(\mu; \pi) + o_p(1)$. We have
\[ T_U(\bar X; \pi) = T_U(\mu; \pi) + \frac{1}{\sqrt{m R_n}} \sum_{r=1}^{R_n} \left( -(\bar X - \mu)' \hat\Sigma^{-1} X_{\pi_r(2)} - X_{\pi_r(1)}' \hat\Sigma^{-1} (\bar X - \mu) + \bar X' \hat\Sigma^{-1} \bar X - \mu' \hat\Sigma^{-1} \mu \right). \]
From the right-hand side, add and subtract
\[ \sqrt{R_n/m} \left( -(\bar X - \mu)' \hat\Sigma^{-1} (-\mu) - (-\mu)' \hat\Sigma^{-1} (\bar X - \mu) \right) \]
to obtain
\[ T_U(\bar X; \pi) = T_U(\mu; \pi) - (\bar X - \mu)' \hat\Sigma^{-1} \frac{1}{\sqrt{m R_n}} \sum_{r=1}^{R_n} \left( (X_{\pi_r(1)} - \mu) + (X_{\pi_r(2)} - \mu) \right) + \frac{\sqrt{R_n/m}}{n} \sqrt{n} (\bar X - \mu)' \hat\Sigma^{-1} \sqrt{n} (\bar X - \mu). \]
By assumption (b), the third term on the right-hand side is $o_p(1)$. Call the second term on the right-hand side $A_n$. Notice that $A_n$ equals $-1$ times the sum of two similar terms, one of which is
\[ (\bar X - \mu)' \hat\Sigma^{-1} \underbrace{\frac{1}{\sqrt{m R_n}} \sum_{r=1}^{R_n} (X_{\pi_r(1)} - \mu)}_{B_n}. \]
As shown in the proof of Theorem A.1, $B_n = O_p(1)$. Thus, the previous expression is $o_p(1)$ by assumption (b).

Step 2.
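Decompose
\[ T_U(\mu; \pi) = \left( T_U(\mu; \pi) - E[T_U(\mu; \pi) \mid X] \right) + E[T_U(\mu; \pi) \mid X]. \tag{A.1} \]
We show that $E[T_U(\mu; \pi) \mid X] \overset{p}{\longrightarrow} 0$:
\[
\begin{aligned}
E[T_U(\mu; \pi) \mid X]
&= \frac{1}{\sqrt{m R_n}} \sum_{r=1}^{R_n} E\left[ (X_{\pi_r(1)} - \mu)' \hat\Sigma^{-1} (X_{\pi_r(2)} - \mu) \,\middle|\, X \right] \\
&= \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \sqrt{\frac{R_n}{m}}\, (X_{\pi(1)} - \mu)' \hat\Sigma^{-1} (X_{\pi(2)} - \mu) \\
&= \sqrt{\frac{R_n}{m}}\, \frac{1}{n!} \sum_{\pi \in \Pi} (X_{\pi(1)} - \mu)' \hat\Sigma^{-1} (X_{\pi(2)} - \mu) \\
&= \sqrt{\frac{R_n}{m}}\, \frac{1}{n(n-1)} \sum_{i=1}^n \sum_{j \ne i} (X_i - \mu)' \hat\Sigma^{-1} (X_j - \mu).
\end{aligned}
\]
From the last line, add and subtract
\[ \sqrt{\frac{R_n}{m}}\, \frac{1}{n(n-1)} \sum_{i=1}^n (X_i - \mu)' \hat\Sigma^{-1} (X_i - \mu) \]
to obtain
\[ \frac{\sqrt{R_n/m}}{n-1} \left( \sqrt{n} (\bar X - \mu)' \hat\Sigma^{-1} \sqrt{n} (\bar X - \mu) - \frac{1}{n} \sum_{i=1}^n (X_i - \mu)' \hat\Sigma^{-1} (X_i - \mu) \right). \]
This is $o_p(1)$ by assumptions (a), (b), and (c).

Step 3.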
It remains to establish a normal limit for the term $T_U(\mu; \pi) - E[T_U(\mu; \pi) \mid X]$ in decomposition (A.1). We condition on the data, treating it as fixed, and apply a Lindeberg CLT, since the statistic is a sum of conditionally independent random variables. Define
\[ W_{U,r} = m^{-1/2} (X_{\pi_r(1)} - \mu)' \hat\Sigma^{-1} (X_{\pi_r(2)} - \mu). \]
First consider the variance. We have
\[ \mathrm{Var}(T_U(\mu; \pi) \mid X) = E[W_{U,r}^2 \mid X] - E[W_{U,r} \mid X]^2, \]
where the second term on the right-hand side is $o_p(1)$ by step 2 above. On the other hand, $E[W_{U,r}^2 \mid X]$ equals
\[
\begin{aligned}
\frac{1}{|\Pi|} \sum_{\pi \in \Pi} m^{-1} \left( (X_{\pi(1)} - \mu)' \hat\Sigma^{-1} (X_{\pi(2)} - \mu) \right)^2
&= \frac{1}{n(n-1)} \sum_{i=1}^n \sum_{j \ne i} m^{-1} \left( (X_i - \mu)' \hat\Sigma^{-1} (X_j - \mu) \right)^2 \\
&= \frac{1}{n} \sum_{i=1}^n m^{-1} (X_i - \mu)' \hat\Sigma^{-1} \left( \frac{1}{n-1} \sum_{j \ne i} (X_j - \mu)(X_j - \mu)' \right) \hat\Sigma^{-1} (X_i - \mu),
\end{aligned}
\]
which converges in probability to one, as desired.

Finally, we show that, for $\delta$ in assumption (d),
\[ E\left[ \left| W_{U,r} - E[W_{U,r} \mid X] \right|^{2+\delta} \,\middle|\, X \right] = O_p(1). \]
This is enough to verify the Lindeberg condition, since $W_{U,r}$ is identically distributed across $r$. The left-hand side of the previous equation is equal to a constant times
\[
\frac{1}{n(n-1)} \sum_{i=1}^n \sum_{j \ne i} \left| (X_i - \mu)' \hat\Sigma^{-1} (X_j - \mu) - E[W_{U,r} \mid X] \right|^{2+\delta}
\le \left( \left( \frac{1}{n(n-1)} \sum_{i=1}^n \sum_{j \ne i} \left| (X_i - \mu)' \hat\Sigma^{-1} (X_j - \mu) \right|^{2+\delta} \right)^{1/(2+\delta)} + \left| E[W_{U,r} \mid X] \right| \right)^{2+\delta}
\]
by Minkowski’s inequality. This is $O_p(1)$, since
\[ \frac{1}{n(n-1)} \sum_{i=1}^n \sum_{j \ne i} \left| (X_i - \mu)' \hat\Sigma^{-1} (X_j - \mu) \right|^{2+\delta} = O_p(1) \]
by assumptions (c) and (d).

Proof of Theorem 2. We prove by contradiction that $\sup_{P \in \mathcal{P}} E_P[\phi_n]$ converges to a limit no greater than $\alpha$. Suppose not. Then we can find some null sequence $\{P_n\}_{n \in \mathbb{N}} \subseteq \mathcal{P}$ such that $\liminf_{n\to\infty} E_{P_n}[\phi_n] > \alpha$. This contradicts conclusion (a) of Proposition 1. Since $\mathcal{P}$ includes a DGP $P$ under which $E_P[X_1] \le 0$, setting $P_n = P$ for all $n$ yields $E_P[\phi_n] \to \alpha$ by conclusion (a) of Proposition 1. Hence, the limit equals $\alpha$.

Proof of Proposition 1.
We first establish the asymptotic behavior of $\hat\lambda_k$. Note that
\[
\begin{aligned}
m^{1/2} \hat\lambda_k ={}& (\bar X_k - \mu_k(P_n))\, \hat\Sigma_{kk}^{-1} \frac{1}{\sqrt{R_n}} \sum_{r=1}^{R_n} \left( (X_{\pi_r(1),k} - \mu_k(P_n)) + (X_{\pi_r(2),k} - \mu_k(P_n)) \right) \\
& - \sqrt{R_n}\, \hat\Sigma_{kk}^{-1} (\bar X_k - \mu_k(P_n))^2 \\
& + \mu_k(P_n)\, \hat\Sigma_{kk}^{-1} \frac{1}{\sqrt{R_n}} \sum_{r=1}^{R_n} \left( (X_{\pi_r(1),k} - \mu_k(P_n)) + (X_{\pi_r(2),k} - \mu_k(P_n)) \right) \\
& + \bar X_k\, \hat\Sigma_{kk}^{-1} \sqrt{R_n}\, \mu_k(P_n) + \sqrt{R_n}\, \hat\Sigma_{kk}^{-1} \left( \mu_k(P_n)^2 - \bar X_k\, \mu_k(P_n) \right).
\end{aligned}
\tag{A.2}
\]
As shown in the proof of Theorem A.1,
\[ \frac{1}{\sqrt{R_n}} \sum_{r=1}^{R_n} \left( (X_{\pi_r(1)} - \mu(P_n)) + (X_{\pi_r(2)} - \mu(P_n)) \right) = O_{P_n}(1). \]
Since $\hat\Sigma_{kk}$ is asymptotically bounded away from zero and infinity under our assumptions, the first and second lines on the right-hand side of (A.2) are $o_{P_n}(1)$; the third line is $\mu_k(P_n) \cdot O_{P_n}(1)$; and the last line equals $\sqrt{R_n}\, \hat\Sigma_{kk}^{-1} \mu_k(P_n)^2$. Then
\[ \hat\lambda_k = \sqrt{\frac{R_n}{m}}\, \hat\Sigma_{kk}^{-1} \mu_k(P_n)^2 + \mu_k(P_n) \cdot O_{P_n}(1) + o_{P_n}(1). \tag{A.3} \]
We also note for later that
\[ R_n^{1/4} \bar X_k = R_n^{1/4} \mu_k(P_n) + \sqrt{\frac{R_n^{1/2}}{n}}\, \sqrt{n} (\bar X_k - \mu_k(P_n)) = R_n^{1/4} \mu_k(P_n) + o_{P_n}(1). \tag{A.4} \]
Now we turn to each of the claims (a)–(c) of the proposition.

Claim (a).
Recall that $\delta_k^* = \lim_{n \to \infty} R_n^{1/4} \mu_k(P_n)$. Suppose $\max_k \delta_k^* \leq 0$. Without loss of generality, assume $\delta_k^* = 0$ for all $k = 1, \dots, \ell - 1$, and $\delta_k^* < 0$ for all $k = \ell, \dots, m$. Then for any $k = 1, \dots, \ell - 1$,
\[
\big| \hat\lambda_k \mathbf{1}\{ R_n^{1/4} \bar X_k \geq 0 \} \big| \leq |\hat\lambda_k| \overset{p}{\longrightarrow} 0 \tag{A.5}
\]
by (A.3). For any $k = \ell, \dots, m$, we have $\hat\lambda_k \mathbf{1}\{ R_n^{1/4} \bar X_k \geq 0 \} \overset{p}{\longrightarrow} 0$ because for any $\epsilon > 0$, by the law of total probability,
\[
P_n\big( |\hat\lambda_k \mathbf{1}\{ R_n^{1/4} \bar X_k \geq 0 \}| > \epsilon \big)
\leq \underbrace{P_n\big( |\hat\lambda_k \mathbf{1}\{ R_n^{1/4} \bar X_k \geq 0 \}| > \epsilon \,\cap\, R_n^{1/4} \bar X_k < 0 \big)}_{0} + P_n\big( R_n^{1/4} \bar X_k \geq 0 \big) \to 0
\]
by (A.4) because $\delta_k^* < 0$. Therefore, by (11) and the requirement $R_n > 0$,
\[
P_n\big( Q_n(\pi) > c_{1-\alpha} \big) = P_n\Big( \max_{1 \leq k \leq m} \big\{ T_{U,k}(\bar X_k; \pi) + o_{P_n}(1) \big\} > c_{1-\alpha} \Big). \tag{A.6}
\]
By definition, $c_{1-\alpha}$ is the $(1-\alpha)$-quantile of $\tilde Q_n(\pi)$. Moreover, $\tilde Q_n(\pi)$ is a continuous function of $(T_{U,k}(\bar X_k; \pi))_{k=1}^m$, whose CDF converges uniformly to that of a normal random vector by a minor extension of Theorem A.2. Hence,
\[
\text{(A.6)} = P_n\big( \tilde Q_n(\pi) + o_{P_n}(1) > c_{1-\alpha} \big) \to \alpha.
\]

Claim (b).
Suppose $\max_k \delta_k^* = \infty$. Then for some $k = 1, \dots, m$, $\hat\lambda_k \mathbf{1}\{ \bar X_k \geq 0 \} \overset{p}{\longrightarrow} \infty$ by (A.3). Since $(T_{U,k}(\bar X_k; \pi))_{k=1}^m$ has a tight limit distribution, as established in claim (a), we have $Q_n(\pi) \overset{p}{\longrightarrow} \infty$ by (11). On the other hand, $c_{1-\alpha}$ converges in probability to a positive constant. Hence, the rejection probability tends to one.

Claim (c).
Suppose $\max_k \delta_k^* \in (0, \infty)$. Without loss of generality suppose that $\delta_k^*$ is finite and strictly positive for $k = 1, \dots, \ell$ and non-positive for $k = \ell + 1, \dots, m$. Then for $k = 1, \dots, \ell$, $\hat\lambda_k \mathbf{1}\{ R_n^{1/4} \bar X_k \geq 0 \} \overset{p}{\longrightarrow} \lambda_k^* \in (0, \infty)$ by (A.3) and (A.4). For $k = \ell + 1, \dots, m$, $\hat\lambda_k \mathbf{1}\{ R_n^{1/4} \bar X_k \geq 0 \} \overset{p}{\longrightarrow} 0$, as shown in claim (a). Therefore,
\[
Q_n(\pi) = \max\Big\{ \max_{1 \leq k \leq \ell} \big\{ T_{U,k}(\bar X_k; \pi) + \lambda_k^* + o_{P_n}(1) \big\}, \; \max_{\ell < k \leq m} \big\{ T_{U,k}(\bar X_k; \pi) + o_{P_n}(1) \big\} \Big\}. \tag{A.7}
\]
On the other hand, $c_{1-\alpha}$ is the $(1-\alpha)$-quantile of the distribution of
\[
\tilde Q_n(\pi) = \max\Big\{ \max_{1 \leq k \leq \ell} T_{U,k}(\bar X_k; \pi), \; \max_{\ell < k \leq m} T_{U,k}(\bar X_k; \pi) \Big\}.
\]
As discussed above in claim (a), both have tight limit distributions obtained by replacing $(T_{U,k}(\bar X_k; \pi))_{k=1}^m$ with a normal random vector. Since the $\lambda_k^*$s in (A.7) are strictly positive, $P_n(Q_n(\pi) > c_{1-\alpha})$ converges to a nonzero constant strictly larger than $\alpha$.

B Additional Tables
This section contains simulation results referenced in §6.

Table B.1: Average Clustering

n     Row            Mean-Type Test                         U-Type Test
100   Size     5.35   6.67   7.58   7.50   8.93     8.88   9.75  10.08  10.47  10.57
      Power   13.80  17.90  20.63  23.38  26.05    32.40  35.75  38.67  39.85  41.68
      R_n        13     18     22     26     31      944   1259   1574   1889   2204
1k    Size     4.93   5.32   5.92   5.30   6.00     5.85   6.02   6.65   6.10   6.32
      Power   25.73  32.97  38.12  44.50  47.92    91.87  95.02  96.55  97.48  98.12
      R_n        19     25     31     37     43     2381   3174   3968   4762   5555
Averages over 6000 simulations. "Size" rows obtained from testing $H_0\colon \theta_0 = \theta^*$, where $\theta^*$ = true expected value of average clustering. "Power" rows obtained from testing $H_0\colon \theta_0 = \theta^* + \ldots$

Table B.2: Average Degree

n     Row            Mean-Type Test                         U-Type Test
100   Size     7.22   7.43   8.18   9.10   9.77     9.20  10.22  10.83  10.88  12.07
      Power   26.58  31.98  36.72  41.08  45.93    62.28  67.07  69.87  72.50  74.45
      R_n        13     18     22     26     31      944   1259   1574   1889   2204
1k    Size     5.65   5.92   5.40   5.85   6.07     6.05   6.43   6.82   6.63   6.75
      Power   49.17  59.52  68.52  75.10  80.45    99.93  99.97  99.98  99.98    100
      R_n        19     25     31     37     43     2381   3174   3968   4762   5555
Averages over 6000 simulations. "Size" rows obtained from testing $H_0\colon \theta_0 = \theta^*$, where $\theta^*$ = true expected value of average degree. "Power" rows obtained from testing $H_0\colon \theta_0 = \theta^* + \ldots$

Table B.3: Treatment Spillovers

    n                    100                               500                               1k
M   Size    5.55   5.95   5.83   5.87   5.88     4.98   4.33   5.68   5.20   5.27     5.05   4.95   4.93   4.67   4.65
    Power   8.08  10.12  11.62  13.57  15.85    14.85  19.52  22.12  25.93  30.10    19.93  25.43  31.40  35.03  41.98
    R_n      110    147    184    221    258      944   1259   1574   1889   2204     2381   3174   3968   4762   5555

Averages over 6000 simulations. M = mean-type statistic, U = U-type statistic.

Table B.4: Power Law Test

                      Exponential                      Power Law
Favor Exp      56.93     97.07     99.93       0.00     0.00     0.00
Favor PL        0.00      0.00      0.00      60.47    86.30    93.83
LL           -370.48   -786.58  -1103.83     301.72   432.20   498.63
R_n              184      1574      3968        184     1574     3968
n                100       500      1000        100      500     1000

Averages over 6000 simulations. "LL" = average normalized log-likelihood ratio. "Favor Exp" = % rejections in favor of exponential.
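
As an illustration of the U-type statistic analysed in Appendix A, the following is a minimal Python sketch, not the code used to produce the tables. It assumes i.i.d. standard normal data, the sample covariance as $\hat\Sigma$, pairs $(\pi_r(1), \pi_r(2))$ drawn uniformly over ordered pairs of distinct indices, and a one-sided test that rejects when the statistic exceeds the standard normal $(1-\alpha)$-quantile. The function names `u_type_statistic` and `rejection_rate` and the default values of `n`, `m`, and `R` are hypothetical choices made for this example.

```python
from statistics import NormalDist

import numpy as np


def u_type_statistic(X, mu, R, rng):
    """R^{-1/2} * sum_r W_{U,r}, with
    W_{U,r} = m^{-1/2} (X_i - mu)' Sigma_hat^{-1} (X_j - mu), i != j.

    X is an (n, m) array, mu an (m,) hypothesized mean, R the number of
    resampled pairs. Sigma_hat is the sample covariance (an assumption).
    """
    n, m = X.shape
    sigma_hat = np.atleast_2d(np.cov(X, rowvar=False))
    sigma_inv = np.linalg.inv(sigma_hat)
    # Draw R ordered pairs (i, j) with j != i, independently and uniformly.
    i = rng.integers(0, n, size=R)
    j = (i + rng.integers(1, n, size=R)) % n  # shift by 1..n-1, so j != i
    A = X[i] - mu
    B = X[j] - mu
    W = np.einsum("ra,ab,rb->r", A, sigma_inv, B) / np.sqrt(m)
    return W.sum() / np.sqrt(R)


def rejection_rate(n=500, m=2, R=1574, alpha=0.05, n_sims=1000, seed=0):
    """Monte Carlo rejection frequency under H0: E[X_i] = 0 (i.i.d. design)."""
    rng = np.random.default_rng(seed)
    crit = NormalDist().inv_cdf(1 - alpha)  # one-sided normal critical value
    mu0 = np.zeros(m)
    rejections = 0
    for _ in range(n_sims):
        X = rng.standard_normal((n, m))
        if u_type_statistic(X, mu0, R, rng) > crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print(f"empirical size: {rejection_rate():.3f}")
```

Because the resampling step never uses the dependence structure, the same function could be applied unchanged to dependent data; only the data-generating step in `rejection_rate` is specific to the i.i.d. design assumed here.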